UBC Theses and Dissertations

A comparison of artificial neural network and linear statistical models for the detection of glaucoma… Parfitt, Craig Michael 1997

Full Text

A Comparison of Artificial Neural Network and Linear Statistical Models for the Detection of Glaucoma Based on Measurements of the Optic Nerve Head Collected with a Scanning Laser Ophthalmoscope

by

Craig Michael Parfitt

B.A.Sc. in Mechanical Engineering, University of British Columbia, 1991

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF APPLIED SCIENCE
in
THE FACULTY OF GRADUATE STUDIES
BIOMEDICAL ENGINEERING
Department of Mechanical Engineering

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
March, 1997
© Craig Michael Parfitt, 1997

Authorization

In presenting this thesis in partial fulfillment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Mechanical Engineering
University of British Columbia, Vancouver, Canada.
21 March 1997

Abstract

The scanning laser ophthalmoscope is a device used by ophthalmologists to obtain topographic images of patients' optic nerve heads (ONHs). Measurements are taken from these images that quantitatively describe the shape of the ONH. Glaucoma involves the loss of retinal nerve fibers, which in turn produces a change in the ONH shape. However, it is not known which shape parameters are most relevant for the diagnosis of glaucoma. To solve this problem, a forward stepping linear discriminant function analysis (DFA) and a non-linear feedforward artificial neural network (ANN) were used to build classification models for patient data. Patients were first independently classified using perimetry data (visual fields) into normal and abnormal (glaucomatous) groups. The DFA built a classification function based upon individual entry of variables into the model, using the F statistic as the entry criterion at each step. The classification function was validated using a jackknife validation method, which removes bias from the model. The ANN was trained with error back-propagation using the conjugate gradient method to update the synaptic weights. The entire data set (45 normals and 44 abnormals) was used for cross validation, withholding one normal and one glaucoma case each time. The error rate of the training set was required to be less than 20% to ensure valid network convergence. The DFA had a diagnostic precision of 88.8% {86.7% specificity and 90.9% sensitivity} with a jackknife validation of 87.6% {86.7% specificity and 88.6% sensitivity}. The ANN trained with an average training classification rate of 87.8%, with a cross validated classification rate of 87.8% {86.7% specificity and 88.8% sensitivity}. Receiver operating characteristic (ROC) curve analysis showed that both classification models were comparable; the ROC area was 0.916 for the DFA and 0.879 for the ANN. Both the DFA and the ANN classifications generalized well, which indicates that the ONH measurements are useful for the detection of glaucoma.

Table of Contents

Abstract  ii
List of Tables  ix
List of Figures  xii
List of Abbreviations  xvii
Glossary  xix
Acknowledgment  xxi
Preface  xxii

1.
Introduction  1  1.1 Glaucoma  1  1.2 Clinical Diagnosis  3  1.2.1 Intraocular Pressure  3  1.2.2 Visual Field Testing  4  1.2.3 Optic Nerve Head Examination  5  1.3 Treatment  6  1.4 Early Diagnosis  7  1.5 Methods for Examining ONH Structure  8  1.6 The Heidelberg Retina Tomograph  8  1.7 Pattern Classification  9  iii  1.7.1 Discriminant Function Analysis  10  1.7.2 Artificial Neural Networks  11  1.7.3 ROC Curves  11  2. Background Information and Studies  12  2.1 Physiology of Vision  12  2.1.1 Light's Journey in the Eye  12  2.1.2 Neurons and Neuronal Circuits  14  2.1.3 The Optic Nerve Head  18  2.2 Historical Review of Glaucoma Screening  22  2.2.1 Visual Field Testing  22  2.2.2 Examination of the Optic Nerve Head  26  2.2.2.1 Ophthalmoscopy and Drawings  26  2.2.2.2 Ophthalmic Photography  26  2.2.2.3 Image Analyzers  27  2.2.3 Intraocular Pressure  2.3 The Analysis of Screening Methods  28  28  2.3.1 Multivariate Analysis: Discriminant Function Analysis  29  2.3.2 Artificial Neural Networks: Supervised Learning  31  2.3.3 ROC Analysis  32  2.4 Review of Previous Work from the Analysis of Glaucoma Predictors  35  iv  3. Methods and Theory  :  38  3.1 Patient Data Collection  38  3.1.1 Patient Selection  38  3.1.1.1 Perimetry  40  3.1.1.2 Tonometry  43  3.1.2 Collection of the Optic Nerve Head Image Data  44  3.1.2.1 Confocal Scanning Laser Ophthalmoscopy  44  3.1.2.2 The Heidelberg Retina Tomograph  46  3.1.2.3 Procedure Using the Heidelberg Retina Tomograph  47  3.1.2.4 HRT Image Acquisition  48  3.1.2.5 Topographic Image Formation  49  3.1.2.6 Relative and Tilted Coordinates  51  3.1.2.7 Calculating ONH Image Parameters  54  3.1.2.8 Reference Plane  54  3.1.2.9 Contour Line or ONH Boundary  56  3.1.2.10 Corrected Contour Line  56  3.1.2.11 Curved Surface  58  3.1.2.12 Stereometric Parameters or ONH Shape Parameters  58  3.2 Statistical Analysis., 3.2.1 Data Screening  63 63  3.2.1.1 Univariate Descriptive Statistical Checks  63  3.2.1.2 Univariate and Multivariate Outliers  64  v  3.2.1.3 Testing for Normality  65  3.2.1.4 Testing for Heteroscedasticity  66  3.2.1.5 Significant Variable Relationships  66  3.2.2 Multivariate Analyses and the General Linear Model  67  3.2.2.1 Multivariate Analysis of Variance  68  3.2.2.2 Principal Components Analysis  68  3.2.2.3 Discriminant Function Analysis  69  3.2.2.3.1 The Classification Functions  70  3.2.2.3.2 Validity of Results  71  3.3 Artificial Neural Network Analysis  74  3.3.1 Artificial Neural Networks Architecture  ..75  3.3.2 Back Propagation Learning Algorithm  76  3.3.3 Weight Changes Using the Conjugate Gradient Method  78  3.3.4 Training Considerations  80  3.3.4.1 Synaptic Weight Considerations  80  3.3.4.2 Example Presentation  81  3.3.4.3 Iteration and Stopping Criteria  82  3.3.5 A N N Generalization  82  3.3.6 A N N Configuration for Generalization  84  3.3.7 Cross Validation  86  3.3.8 A N N Classification Function 3.3.9 Simulation of the A N N  :  87 89  vi  3.4 ROC Analysis  89  4. 
Results  91  4.1 Statistical Screening  91  4.1.1 Descriptive Statistics  91  4.1.2 Bivariate Correlation Analysis  94  4.1.3 Normality  95  4.1.4 Homoscedasticity  97  4.1.5 Regression Analysis  97  4.1.6 Squared Multiple Correlations  98  4.1.7 Significant Group Differences  98  4.2 Multivariate Statistical Models  100  4.2.1 Principal Components Analysis  100  4.2.2 Discriminant Function Analysis  101  4.2.2.1 Screening  101  4.2.2.2 Classification Model  102  4.3 Artificial Neural Network Model  105  4.4 Receiver Operating Characteristic Curves  115  5. Discussion  117  5.1 HRT Measurements  117  vii  5.2 How Useful is Discriminant Function Classification?  119  5.3 Artificial Neural Network Classification  124  5.4 Both Models Considered  135  5.5 Application to Screening  137  6. Conclusions and Recommendations  139  6.1 Concluding Remarks  139  6.2 Recommendations for Future Work  141  Bibliography  142  Appendices  151  A. Scanning Laser Ophthalmoscope Parameters  151  B. Chapter 4 Tables  153  viii  List of Tables Table 2-1. A two-by-two contingency table to illustrate sensitivity and specificity  29  Table 2-2. The discriminant function classifier (Caprioli, 1992)  30  Table 3-1. Group assignment criteria for pre-classification of patients. Note: Abnormal are glaucomatous Table 3-2. Humphrey Field Analyzer test spot intensities  39 42  Table 3-3. Three conditions used to diagnose glaucomatous visual field loss. Note: Edge points are points located on the boundary of the test field  43  Table 3-4. HRT coordinate systems  52  Table 3-5. Angular position of anatomical ONH positions  57  Table 3-6. The HRT's stereometric parameter definitions  59  Table 3-7. Optic nerve head segment definitions  63  Table 3-8. The values for F-to-enter and F-to-remove during a the forward stepwise DFA. The significanceall F values was p<0.01  72  Table 4-1. The Number of weighted connections according to the number of hidden nodes and hidden layers (with 13 inputs, one output, and bias)  106  Table 5-1. HRT relative error applied to the predictors. Good separation between group means still exists as the errors do not overlap  119  Table 5-2. Results of Caprioli's discriminant function classifier  123  Table J3-1. Descriptive Statistics for Fourteen Stereometric Parameters  153  Table B-2a. Normals Correlation Matrix for Fourteen Stereometric Parameters Before Transformations  154  ix  Table B-2b. Normals Correlation Matrix for Thirteen Stereometric Parameters After Transformations  154  Table B-2c. Abnormals Correlation Matrix for Thirteen Stereometric Parameters After Transformations  155  Table B-3. Testing for normality of varible distributions with 45 normals  156  Table B-4. Regression of variables with age; normal group only  156  Table B-5. Squared multiple correlations for both groups  157  Table B-6. Significance tests for reliable group mean differences  158  Table B-7. The significance of the Hotelling's T test showing reliable group differences (d.f. 2  13,75)  159  Table B-8a. Principal components for the normal group, n=45, with associated statistics. Factors have unrotated loading coefficients  159  Table B-8b. Principal components for the both groups, n=89, with associated statistics. Factors have unrotated loading coefficients  160  Table B-8c. Principal components for the abnormal group, n=44, with associated statistics. Factors have unrotated loading coefficients  160  Table B-9. Discriminant function analysis results  161  Table B-10. 
D F A statistics showing significance of function  161  Table B - l 1. Variable entry statistics  161  Table B-l2. F statistics for each variable entered into D F A  162  Table B - l 3. McNemar's % calculations for a p<0.05 with one d.f, B and C are number incorrect 2  in one step and not the other Table B-14. Coefficients for the classification functions from the DFAs  162 162  x  Table B-l5. Data for hidden node effects, I , 2" layer. Random number generator was seeded s  with 1, zero minimum training iterations, and randomized starting weights. Adjusted reference plane data was used; Adj. Ref. T, after statistical transformations (13 inputs) and Adj. Ref. Before (14 inputs)  ....163  Table B-l6. Seed number effect; the number which seeds the random number generator before randomizing the starting weights. There was zero minimum training iterations, and 2 hidden nodes were used with randomized starting weights. The Adj. Ref. data was used (14 inputs)  163  Table B-17. Data set format. Trained with a minimum training criteria of 85%, 2 hidden units in the one hidden layer, and zero minimum training iterations  163  Table B-18. Effect of setting a minimum to the number training iterations completed before testing. All training was done with 2 hidden units and an 80% training crtierion. The Adj. Ref. T data had target values of 0.1 and 0.9 compared with the values of 0 and 1 for Adj. Ref. 164 Table B-l9. Effect of removing bias unit connections; connections are number of hidden units in 1 layer and the actual connections in place, b-bias, h-hidden, o-output. Seed number st  was 1, minimum iterations were 250, with an 80% criteria for training  164  Table B-20. Effect of varying the target values for the groups. The seed was 1 and the number of hidden nodes was 2 in the 1 layer st  Table B-21. Training criteria, % correct of training set. All trained with 2 hidden units  164 164  xi  List of Figures Figure 1-1. Glaucomatous cupping due to the loss of retinal nerve fibers (RNFs);sections 1 to 3 show the progression of the RNF loss due to elevated intraocular pressure and the resultant optic nerve head cupping  3  Figure 1-2. Nerve fiber bundle defects shown in the central 30° of the visual field. A superior paracentral scotoma (left), central absolute defect surrounded by a relative scotoma, and an arcuate area defect (right) are depicted; the blind spot is the region at 15° along the nasal mid-line (right side) (Shields, 1992)  5  Figure 2-1. Side view of the human brain highlighting the visual system (Hubel, 1988)  13  Figure 2-2. Horizontal cross section of the normal eye  13  Figure 2-3. The path light takes to reach receptor cells (Hubel, 1988)  14  Figure 2-4. The cell layers of the retina (Hubel, 1988)  15  Figure 2-5. Section of a nerve cell with synaptic inputs from other cells. Note: the synaptic buttons (a.k.a. boutons) are the site of the synapses  16  Figure 2-6. Sectional view of the brain from the ventral side (Hubel, 1988)  17  Figure 2-7. Division of the eyes' visual fields showing the path of the optic nerve fibers  18  Figure 2-8. The central 30° of the retina exhibiting RNF axon paths in the right eye.Inset: The nerve fibers closer to the optic nerve pile up on top of the peripheral fibers (Shields, 1992) Figure 2-9. Section of the optic nerve at the back of the eye, with top view of optic disc  19 19  Figure 2-10. 
The optic nerve head: (A) nerve fiber layer at the surface of the retina; (B) pre-laminar region; (C) lamina cribrosa region; (D) retro-laminar region (Shields, 1992)  20  xn  Figure 2-11. Variation in the shape of the physiological cup (c): type I. small funnel-shaped; type JJ. temporal (T), cylindrical; type i n . central, trough-shaped; type IV. steep nasal (N) wall with sloping of temporal wall toward disc (d) margin; type V. developmental anomalies (not shown) (Shields, 1992)  21  Figure 2-12. Traquair's island of vision. (A) peripheral limits, (B) profile of contour representing sensitivity, (bs) blind spot, (f) fixation point (Shields, 1992) Figure 2-13. An example of an artificial neural network (Haykin, 1994)  25 31  Figure 2-14. The model underlying ROC analysis: overlapping distributions in a discriminatory system (Metz, 1986)  33  Figure 2-15. Varying the confidence threshold (Metz, 1986)  34  Figure 2-16. A typical ROC curve (Metz, 1986)  34  Figure 2-17. A set of ROC curves (Swets, 1988)  35  Figure 3-1. Test spot pattern used by the central 30-2 threshold test program  41  Figure 3-2. Psychometric function, showing the threshold for the probable detection of light  42  Figure 3-3. Applanation and the Imbert-Fick Law (Shields, 1992)  44  Figure 3-4. Confocal imaging allows the removal of: a) out-of -focus light reflections, and b) scattered light. Note: System refraction is due to patient and equipment refraction Figure 3-5. Heidelberg retina tomograph system  46 46  Figure 3-6. Tomographic image series (poor reproduction in gray scale). The top left is the highest tomographic image and the lower right is the deepest; reading order is left to right, row by row  48  Figure 3-7. Tomographic image series alignment  49  Figure 3-8. Tomographic image series light intensity profile  50  xiii  Figure 3-9. The gray scale topographic image (left) is based upon pixel surface height; the reflectivity image (right) is based upon the tomographic image series' pixel intensity values  51  Figure 3-10. Reference ring position within a 10° image  52  Figure 3-11. The retinal surface plane as defined by the reference ring (T-temporal, N-nasal)  53  Figure 3-12. Optic nerve head in a three dimensional mesh view. Ripples are blood vessels. The depression in the center is the central region (cup) of the ONH  54  Figure 3-13. Cross section of the optic nerve head showing the reference plane, parallel to the retinal surface plane and separated by a distance Href (T - temporal, N - Nasal)  55  Figure 3-14. Drawing the contour line using the reflectivity image (right) as a guide; the topographic image is shown on the left  56  Figure 3-15. ONH image (right eye) with the associated peripapillary height variation diagram along the contour line. The height variation diagram is plotted moving clockwise (right eye) and counter clockwise (left eye) around the contour starting from the 9 or 3 o'clock position, respectively  57  Figure 3-16. A normal ONH with shallow cup  61  Figure 3-17. An abnormal ONH with a deep cup  61  Figure 3-18. Stereometric parameters calculated from regions defined in Table 3-7  62  Figure 3-19. Multilayered artificial neural network (Haykin, 1994)  75  Figure 3-20. The logistic (sigmoid) non-linear activation function (Haykin, 1994)  77  Figure 3-21. Curve fitting: (left) properly fitted, (right) overfitted (Haykin, 1994)  83  Figure 3-22. 
Regions formed by two distinct data groups, (a) linearly separable regions, (b) nonlinearly separable regions (Haykin, 1994)  85  xiv  Figure 3-23. A N N Structure (Haykin, 1994).  88  Figure 4-1. The histograms for age (a), and the fourteen stereometric parameters before any transformations: (b) ag, (c) eag, (d) abrg, (e) hie, (f) mhcg, (g) pheg, (h) hvc, (i) vbsg, (j) vasg, (k) vbrg, (1) varg, (m) mdg, (n) tmg, (o) mr. Both groups are displayed in the histograms:black bars=normal and white bars=abnormal  114-7  Figure 4-2. The histograms for the four variables which were transformed to produce normality in the normal healthy subject group:(a) sqrtabrg,(b) sqrtvbsg,(c) logvasg,(d) sqrtvbrg.... 119 Figure 4-3. The distribution of group scores for the third central moment after correction for age effects  121  Figure 4-4. The resulting classifications from the D F A canonical discriminant function  127  Figure 4-5. The separation between valid generalization and memorization is illustrated  130  Figure 4-6. The effects of using a different number to seed the random number generator. The same network configuration was used  132  Figure 4-7. The format of the data set to which the artificial neural network was presented: raw adjusted plane data (Adj. Ref), normalized Adj. Ref. (Norm'd), and standardized Adj. Ref. (Std'd)  133  Figure 4-8. The effect of controlling the minimum number of iterations before testing validation cases had upon generalization  134  Figure 4-9. Effect of removing bias unit connections; number of hidden units in the 1 layer and the st  actual connections in place, b-bias, h-hidden, o-output  135  Figure 4-10. Effect of varying the target values for the groups. The seed was 1 and the number of hidden nodes was 2 in the 1 layer, training criteria was either 80% or 85% st  136  xv  Figure 4-11. The effect of varying the minimum percent correct in the training set before validation. Each run was trained with 2 hidden units Figure 4-12. ROC curves for discriminant function and artificial neural network analysis  138 139  Figure 5-1. A cross section of the optic nerve head showing the relative features. In this example the adjusted reference plane is located 50pm below the retinal surface. The volume above reference is the same as the rim volume, a is the mean height of the curved surface and b is the cup depth (Zangwill et al., 1996)  143  Figure 5-2. (Fig. 3-22 repeated) Regions formed by two distinct data groups, (a) linearly separable regions, (b) non-linearly separable regions  149  Figure 5-3. Hyperplanes that divide the data space into regions: (a) simplest convex; (b) undivided; (c) 3 regions with 2 hyperplanes; (aa) 7 regions with 3 hyperplanes Figure 5-4. 
The decision boundary between two overlapping normal distributions  150 159  xvi  List of Abbreviations ANN  artificial neural network RGC  retinal ganglion cell  RNF  retinal nerve fiber  confocal scanning laser ophthalmoscope  RNFL  retinal nerve fiber layer  CV  cross validation  ROC  receiver operating characteristic  DFA  discriminant function analysis  RONHA  Rodenstock Optic Nerve Head Analyzer  FPF  false positive fraction  SD  standard deviation  GLM  general linear model  SLO  scanning laser ophthalmoscope  GS  Glaucoma Scope  SMC  squared multiple correlation  GVFD  glaucomatous visual field defects  TPF  true positive fraction  HRA  Humphrey Retinal Analyzer  VF  visual field  HRT  Heidelberg Retina Tomograph  VHHSC  Vancouver Hospital and Health Sciences Centre  HTG  high tension glaucoma HRef  reference plane height  IOP  intraocular pressure sqrt  squareroot  LGN  lateral geniculate nucleus log  logarithm  LTG  low tension glaucoma corr  corrected  LTS  Laser Tomographic Scanner ag  area, global  eag  effective area, global  abrg  area below reference, global  hie  height in contour  mhcg  mean height of contour, global  pheg  peak height of contour, global  CNS CSLO  MVA NTG OHT ONH  central nervous system  multivariate analysis normal tension glaucoma ocular hypertensive optic nerve head  PCA  principal components analysis  POAG  primary open angle glaucoma xvii  hvc  height variation of contour  vbsg  volume below surface, global  vasg  volume above surface, global  vbrg  volume below reference, global  varg  volume above reference, global  mdg  maximum depth, global  tmg  third moment, global  mr  mean radius  xviii  Glossary ametropia poor vision because the image is not focused upon the retina aphakic - without lens. apostilb - a measure of illuminance (the incident flux per unit area); roughly equivalent to 1 lux or lumen/m of incident surface. axon - the process of a nerve cell which conducts a signal to synapse with other neurons bivariate scatter plot - a plot which makes apparent any non-linear relationships between variables campimetry - measurement of the visual field on a flat surface. contour line - the boundary of the optic disc cup shape measure - the third central moment for the frequency distribution of the cup depth values heteroscedastic - one of the variables is non-normal or there is an indirect relationship among the variables homoscedastic - the relative variability of scores for one variable is roughly the same for the other variables hyperplane - a plane formed in multiple dimensions which divides space multivariate - multiple variable nerve fiber - a ganglion cell axon which conducts signals over long distances in the nervous system papillomacular bundle - the temporal area of the optic disc which is thought to change last in the progression of glaucoma perimetry - measurement of the visual field on a curved surface; replaced campimetry in modem clinical practice. perimetry, kinetic - a series of spots indicating where a particular test object intensity was first seen, connecting these spots produces an isopter plot for that test object intensity, repeated for several different test object intensities resulting in series of isopter plots mapping out the hill (island) of vision. perimetry, static - reports a threshold intensity of a particular spot in the visual field, after test completion a numerical threshold value is obtained for each spot tested.  xix  phakic - with lens. 
pseudo-exfoliation syndrome - frequently in association with glaucoma, the separation of capsular membranes reflectivity image - a 'real' image made up of intensity values refractive error - a focusing deficit resulting from failure of the focal point of the eye not falling upon the retina risk factors - indices usually associated with a disease which are more qualitative than quantitative sceral canal - the sheath which surrounds the optic nerve at the back of the eye. scotomas - a loss of vision shown by a localized loss of retinal nerve fiber tissue sensitivity - the percentage of patients correctly identified with the disease. specificity - the percentage of patients correctly identified without the disease. tomographic image - an image from a single plane sliced through a structure topographic image - an image made up of height values representing topography trabecular meshwork - a sieve like structure which bridges the scleral sulcus and converts it into a tube univariate - single variable  xx  Acknowledgment I wish to thank my co-supervisors, Prof. S. Green and Prof. N. V . Swindale, for motivating me and for their patience.  I especially want to thank Nick for his exhaustive proof reading and  mentoring. This research was made possible through an M R C grant to NVS and with the Margaret L. Adamson Award for Research in Ophthalmology. Although this degree was somewhat longer than planned, the time has been a period of personal growth. I am still in process and thank God for His continued presence and gentle hand in my life. The walk that lies ahead I pursue with joy and hope that Christ will continue to be my strength and focus.  I thank all of my friends and family for their encouragement during a seemingly endless  time, with very special thanks to EMS for.her loving friendship.  xxi  Preface In the 1950's, while a postdoctoral fellow at Harvard University, Marvin Minsky developed a light microscope which enabled him to view sections of a specimen without first having to dissect (Minsky, 1988). The "double-focusing stage-scanning microscope", as it was patented in 1961, did not gain much recognition for its value until 30 years later. This approach to optical imaging, now commonly known as confocal microscopy, has caught on and is now being used in many different areas of science. The improvement of the laser as a light source, its components becoming smaller and inexpensive, and the advancement of computers, increasing their processing capacity of image data, have both benefited this approach enormously. The advantages of the laser and the computer have been incorporated into the confocal imaging system producing a very powerful diagnostic tool. This system which allows our viewing and storing of section images without damaging tissue has improved our ability to view specimens in vivo. Three dimensional images are formed, which are then viewed, manipulated, and analyzed. Now, with the application of confocal microscopy in ophthalmology, it is possible to quantitatively measure retinal tissue with a similar system.  The system, commonly known as a  scanning laser ophthalmoscope, is used to gather measurements of retinal structures. The direction of these efforts is to determine the usefulness of these measurements for the identification of disease.  xxii  1. 
Introduction

1.1 Glaucoma

Glaucoma is a disease in which there is a progressive loss of retinal ganglion cells caused by intraocular pressure (IOP), the internal pressure of the eye, becoming elevated above the individual's normal physiological level. The disease is the cause of 10% of all blindness in the world. It affects 2% to 3% of the population over 35 years of age in North America. It has enormous consequences, affecting people's ability to function within society and the quality of life. The costs to North American society each year are great (Ganley et al., 1983): the loss of productive work time is estimated at $235 million; direct treatment is estimated at $1.5 billion; 95,000 patients each year lose some significant degree of sight; 5,500 patients go blind; and 300,000 new cases of glaucoma occur. In 1991 it was estimated that there were one million people in North America with undetected glaucoma (Tielsch, 1991). The significance that glaucoma has in society is not difficult to appreciate, and the need for improved, cost-effective methods of detection and treatment to fight the losses due to the disease is obvious.

Glaucoma actually includes a conglomerate of disease entities which share three denominators leading to blindness: (a) the IOP is too high for the normal functioning of the retinal ganglion cells; (b) the degeneration of the optic nerve head is associated with visual field loss; (c) this progressive visual field loss leads to blindness.

The mechanisms by which these pathological conditions develop and cause nerve fiber degeneration are not fully understood, but it has been proposed that the IOP is inexorably linked to an individual's risk of developing glaucoma. The various types of glaucoma are classified according to the age of onset (childhood or adult), etiology (primary or secondary condition), and the pathogenic mechanism (open-angle or closed-angle) (Becker, 1975; Kolker, 1976). The most common form is primary open-angle glaucoma (POAG), hereafter referred to as glaucoma unless otherwise stated, which is: (a) a primary glaucoma, resulting from developmental or degenerative abnormalities affecting the channels responsible for aqueous outflow (Anderson, 1972); and (b) open-angle, in that the anterior chamber angle between the iris root and the trabecular meshwork is from 20 to 45 degrees.

In POAG the trabecular space is collapsed from degenerative changes and becomes covered with cellular or fibrovascular membranes, or clogged by particles or cells (Armaly, 1975). This restricts the aqueous humor outflow and the IOP becomes elevated. Nerve fibers (cell axons) are more easily injured at increased IOP, and through retrograde degeneration the ganglion cell body in the retina dies (Quigley, 1993). The floor of the optic nerve head (ONH) cup recedes from the retina's surface (Fig. 1-1); the excavation of the ONH results from the collapse of the adjacent connective-tissue plates of the lamina cribrosa and their rotation at their point of insertion into the sclera. This is apparently caused by the differential pressure between the intraocular and extraocular compartments. This mechanism of damage is thought to be common to all forms of glaucoma, although the actual mechanism for the development of this pathological condition has yet to be agreed upon. Low tension glaucoma (LTG), a specific type of POAG also referred to as normal tension glaucoma (NTG), is a special case where degeneration of the ONH occurs in people with an IOP within the normal population range.
Figure 1-1. Glaucomatous cupping due to the loss of retinal nerve fibers (RNFs); sections 1 to 3 show the progression of the RNF loss due to elevated intraocular pressure and the resultant optic nerve head cupping. [Panel labels: 1. normal eye pressure, normal optic nerve; 2. increased eye pressure destroys nerve fibers, hollowing out the nerve; 3. continued high pressure further destroys nerve fibers.]

1.2 Clinical Diagnosis

As mentioned earlier, elevated intraocular pressure, visual field loss, and optic nerve head damage are the three common denominators leading to blindness in glaucoma. They provide the basis for our understanding of glaucoma and represent the focus of clinicians' efforts to diagnose the disease.

1.2.1 Intraocular Pressure

The IOP has a Gaussian distribution in the general population and is usually in the range from 10 to 20 mm Hg. The pressure can be recorded by a few different methods: indentation tonometry, applanation tonometry, and non-contact tonometry. The common instrument used is the tonometer; the tonometer measures the IOP using a relation between the deformation of the eye's globe and the force responsible for the deformation. The Schiotz tonometer indents the cornea, whereas the Goldmann applanation tonometer flattens a standard area of the cornea, measuring the force required. Another type of applanation tonometer, the Maklakov-type, measures the area flattened using a standard force. More modern instruments, such as pneumatic tonometers, electronically record the pressure. Non-contact tonometers use a puff of air as a standard force and automatically record the time required for the deformation of the cornea.

There are many factors which influence the IOP; some cause long-term changes and some only short-term fluctuations. The long-term influences on pressure are genetics, age, gender, refractive error, and race. The fluctuations in eye pressure are due to body position; movement of the eye and eyelid; physical exertion; the time of day; ocular and systemic conditions; and foods and drugs. The variability of the pressure measurement is high due to the influence of the above factors. Tonometry is neither sufficiently sensitive nor specific enough to be used alone as a screening test.

1.2.2 Visual Field Testing

The normal visual field may be depicted as a three-dimensional surface, representing areas of relative retinal sensitivity and characterized by a peak at the point of fixation, an absolute depression corresponding to the optic nerve head (blind spot), and a sloping of the remaining areas to the boundaries of the field. Early glaucomatous damage may produce a generalized depression of this surface which can be demonstrated with several psychophysical tests. However, the specific visual field changes of glaucoma are localized defects that correspond to loss of retinal nerve fiber bundles and include paracentral and arcuate scotomas above and below fixation and step-like defects along the nasal mid-line (nasal step).
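The applanation principle mentioned in section 1.2.1 rests on the Imbert-Fick law, which the thesis illustrates later (Figure 3-3): for a thin, dry membrane the internal pressure equals the applied force divided by the applanated area. The worked example below is a minimal sketch; the 3.06 mm probe diameter and the 1.5 g force are illustrative values assumed here, not figures quoted in this chapter.

    % Imbert-Fick law: internal pressure = applied force / applanated area
    P = \frac{F}{A}, \qquad A = \pi r^{2}
    % assumed probe diameter d = 3.06 mm, so A \approx 7.35 \times 10^{-6}\ \mathrm{m^{2}};
    % an applied force of about 1.5 g-force is F \approx 1.47 \times 10^{-2}\ \mathrm{N}, giving
    P \approx \frac{1.47 \times 10^{-2}\ \mathrm{N}}{7.35 \times 10^{-6}\ \mathrm{m^{2}}} \approx 2.0\ \mathrm{kPa} \approx 15\ \mathrm{mm\ Hg}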
The targets are presented against a background screen; the advancement from the flat  screen of the campimeter to the arc or bowl perimeter increases the reliability of the measurements. The targets are controlled either manually or automatically; the change from manually moved test objects to a projected or screen displayed light source make easier the control of the test environment. The adoption of computers in the 1970s, decreased the test and analysis time to approximately 30 minutes, and increased the sensitivity to glaucomatous visual field loss. (Silverstone, 1986) 1.2.3 Optic Nerve Head Examination Clinical examination of the ONH started with the introduction of the ophthalmoscope in 1850 and its use to observe the optic disc by von Graefe in 1854.  The success of human fundus  photography in 1886 combined with the extensive fundus drawings available, such as von Jaeger's atlas of 1869, paved the way for the diagnoses of ocular disease by referring to examples of drawings or photographs. However, the classifications of Elschnig and the glaucomatous optic disc 5  descriptions by Elliot were what allowed Pickard to begin semi-quantitative evaluation of optic disc excavation. In 1966, when Holm and Krakau used stereo-photogrammetry to measure the optic disc cup, there was a resurgence of interest in the ONH and its relation to the glaucomatous process of nerve fiber loss. The next two decades saw an increased use of ONH photographs and images, obtained with simple cameras, video cameras, and charged couple device cameras, for the measurement of optic disc parameters. In 1980, the scanning laser ophthalmoscope began to be used in ophthalmology and was, five years later, combined with confocal imaging. The result was that topographic digital images of the ONH were collected from measurements recorded using a laser beam. These scanned images could then be analyzed using computer algorithms. This recent step has changed the focus from interpretation of what a human sees in the visual field to the measurement of parameters which are a direct reflection of the loss of retinal ganglion cell axons. 1.3 Treatment There is no known treatment which will restore lost vision after the blindness of glaucoma has occurred. However, the losses take several years. Glaucomatous blindness is preventable in most cases but requires early detection and proper treatment. The detection depends upon the ability to recognize the early clinical manifestations of the disease. Visual field testing is limited because most tests only sample a portion of the field. The time it would take to test the entire visual field properly is enormous and requires patient responses which degrade due to attention span and physical fatigue. The intraocular pressure is a good general guideline for risk, but as a predictor it is extremely limited because there is no knowledge of what level of IOP will lead to glaucomatous damage from one patient to the next. The appearance of the ONH and the retinal nerve fiber layer is the only objective clinical evidence for glaucomatous damage. The measurement of the ONH topography is a quantitative 6  method of describing the shape. However, the type of topography measure selected must be based upon its usefulness for: (1) the detection of early glaucomatous damage; and (2) the efficacy of therapy. The most useful measures are determined by comparing the measurements made in the absence and presence of the disease to resolve the separability of the groups. 
The ability to separate the two groups is based upon the diagnosis obtained by an independent test, defined as the 'gold standard' for diagnosis. This classification can have its shortfalls because of errors or limitations inherent in the current standard, which is currently based upon visual field indices. 1.4 Early Diagnosis The objective of the new technologies is to detect glaucoma. Their performance is judged against presently acceptable clinical standards for the detection of glaucoma, which each have their shortcomings.  The disease and its diagnosis follow a fairly common path: the biological onset,  detection by screening test, clinical diagnosis based on characteristic visual field defects, and a slow progression over several years leading to blindness. The first two steps are in the pre-clinical phase and diagnosis marks the beginning of the clinical phase. The difference between the time of early detection and the normal clinical appearance is called the lead time. The goal is to increase the lead time as much as possible because the damage to the retinal nerve fibers precedes the loss of vision (Quigley et al., 1989) due partly to the redundancy of the visual neural pathways and also because of the brain's ability to "fill in" any deficits in the visual field. As the time of detection comes closer to the biological onset, the clinicians are given more time to more effectively slow or arrest the progress of the degeneration by administering pressure lowering drugs and thus, the loss of vision. The quantification of ONH excavation using measured parameters introduced a new standard in testing for glaucoma, challenging the accepted standard of visual function measurements.  The •7  cup/disc ratio (Snydacker, 1964) was the first of many O N H parameters used to identify glaucomatous damage. The glaucomatous deviation of ONH parameters from normal population values has increased the interest in finding predictors for glaucoma using ONH topography. The accuracy and reproducibility of the acquisition method for the topography images used to calculate these parameters have become important issues because they affect the reliability of glaucoma tests that use them. 1.5 Methods for Examining ONH Structure Although there are many computerized devices that quantify ONH topography, there are two basic techniques to acquire the measurements: (1) stereo-photogrammetry, and (2) confocal imaging. The computerized image analyzers measure the topography using the methods of stereophotogrammetry to extract depth measurements from stereo pairs of O N H images.  Confocal  scanning laser ophthalmoscopes (CSLOs) use a laser beam that can be focused at different focal planes using confocal optics to extract the three dimensional structure of the ONH directly. 1.6 The Heidelberg Retina Tomograph Confocal microscopy has been applied in ophthalmology by a handful of companies such as Heidelberg, and Laser Diagnostic Technologies. The instruments are designed to view the retina and the laser is of low enough power that no damage to the eye can occur. The device is mounted in much the same way as an ophthalmic camera. Commonly known as a confocal scanning laser ophthalmoscope or a laser scanning tomograph, it acquires images of the structures within the eye that can be quantitatively measured and analyzed. 
The acquisition of images and information retrieved this way have certain attractive features: (a)  it is safe, painless, and requires minimal preparation of the patient;  (b)  the acquisition time, and thus discomfort for the patient, is very short; 8  (c)  quantitative results are produced, rather than the highly qualitative ones of previous methods;  (d)  the high initial cost of the instrument will quickly be offset by the reduced time taken in patient visits and data analyses;  (e)  research presently being conducted with these instruments is producing many improvements to the image acquisition, reproducibility, and accuracy. The Heidelberg Retina Tomograph, a scanning laser ophthalmoscope, is a tool which produces  quantitative measures of the ONH. The measures are hoped to be used for the early detection of glaucoma by building a mathematical model which classifies the shape of a patient's ONH. 1.7 Pattern Classification  Pattern recognition has long been used to reduce the dimensions of a data set: this allows easier interpretation of the data and identification of features hidden within the data, which can be used to separate classes.  The design of a pattern recognition system has five different stages:  1) the  collection of the data, 2) the formation of the data classes, 3) the selection of features, 4) specification of the classification algorithm, and 5) estimation of the classification error (Raudys, 1991) using a new set of test data. Features, to be useful in discriminating between classes, should contain relevant information, and should be insensitive to irrelevant variability in the input. The number of features should reflect the computational requirements for the training method and the amount of training data needed (Lippman, 1989). The test data should not be used to estimate classifier parameters or to determine classifier structure.  Otherwise, the estimate of the  classification error and the assessment of the generalization of the classifier will be overly optimistic (Lippman, 1989).  9  Both discriminant function analysis (DFA), a multivariate statistical analysis technique, and supervised learning with artificial neural networks (ANNs) lend themselves well to the classification of the patterns. A forward stepping linear D F A produces linear functions of features for the separation of data classes. The forward step adds one new feature at each analysis step. The classification algorithm consists of a set of discriminant functions.  In contrast, an A N N using  supervised learning builds a non-linear classification equation which best separates the classes using weighted features.  The structure of the A N N , the nodes' activation functions, and the  connection weights define the actual classification algorithm.  In the following sections are  discussions of the general principles behind each pattern classifier and examples of their use, as well as a general method which evaluates each classifier's discriminant ability: receiver operating characteristic curves (Swets, 1986; Swets, 1988). 
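To make the five-stage pipeline concrete, and to illustrate the warning that test data must not be used to estimate classifier parameters or structure, the sketch below fits a simple two-group linear discriminant on a training split and then estimates sensitivity and specificity only on held-out cases. This is a minimal Python sketch under stated assumptions: the synthetic data and the 60/29 split are placeholders standing in for the 89 patients and their ONH shape parameters, and the classifier is a plain Fisher-style discriminant, not the thesis's forward stepping DFA or its jackknife validation (both described in Chapter 3).

    import numpy as np

    def fit_linear_discriminant(X, y):
        """Two-group linear discriminant: project onto w = Sw^-1 (m1 - m0) and
        threshold at the midpoint of the projected group means (equal priors)."""
        X0, X1 = X[y == 0], X[y == 1]          # 0 = normal, 1 = glaucomatous
        m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
        Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)  # summed within-group covariance
        w = np.linalg.solve(Sw, m1 - m0)
        threshold = w @ (m0 + m1) / 2.0
        return w, threshold

    def classify(X, w, threshold):
        return (X @ w > threshold).astype(int)

    # Placeholder data standing in for 89 patients x 14 ONH shape parameters.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(89, 14))
    y = (np.arange(89) < 44).astype(int)

    # Stage 5: the error estimate comes only from cases never used for fitting.
    order = rng.permutation(89)
    train, test = order[:60], order[60:]
    w, thr = fit_linear_discriminant(X[train], y[train])
    pred = classify(X[test], w, thr)
    sensitivity = np.mean(pred[y[test] == 1] == 1)   # diseased correctly identified
    specificity = np.mean(pred[y[test] == 0] == 0)   # healthy correctly identified
    print(sensitivity, specificity)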
1.7.1 Discriminant Function Analysis The most common form of discriminant analysis proceeds in a stepwise manner: (1) all variables are measured for each input pattern, (2) the variables are considered individually for their ability to separate classes, (3) the most significant variable is entered into the discriminant equation and then the remaining variables are assessed, (4) the procedure repeats until no variable contributes any further beyond a 5 % level of significance, (5) the coefficients of the variables in the discriminant equation are calculated to produce the best separation between classes. This method is known as a forward stepping discriminant function analysis. The application to medicine usually involves normal distributions of data; the general linear model uses a parametric model to produce a linear classifier. One well-known medical application of discriminant analysis showed the joint dependence of coronary heart disease risk on serum cholesterol levels, systolic blood pressure, and  10  other risk factors (Cornfield, 1962). Another example of the use of discriminant analysis is human chromosome classification (Ledley et al., 1980). 1.7.2 Artificial Neural Networks A N N classifiers can perform three different tasks: (1) identify which class best represents an input pattern; (2) simulate associative memory; (3) vector quantify or cluster N inputs into M clusters (Lippman, 1987). The type of A N N chosen, its structure, and method of training will define the task it will perform. In medicine and biology, pattern classes are often not known, and examples of clusters are required to enable the classifier to generalize going beyond just memorization. There are many examples of ANNs which used supervised learning for medical and biological classification problems (Pizzi et al., 1995; Lapuerta et al., 1995; Maddalena et al., 1995; Errington et al. 1993; Lamiell et al., 1993; Syu et al., 1993).  A N N are also readily used for  computerized analysis of medical images (Chan et al., 1994; Kippenhan et al., 1992). 1.7.3 ROC Curves Receiving operating characteristic (ROC) curves may be used to judge the discrimination ability of different modeling methods, such as D F A and ANNs, which utilize various parameters from an imaging system for predictive purposes. A meaningful and objective evaluation of a diagnostic imaging system's performance must compare the image based diagnosis with the actual state of disease and health. The comparison must distinguish between the inherent diagnostic capacity of the system and any inclusion of noise into the system. ROC curves provide the only known basis for distinguishing between these two factors to determine diagnostic performance. The ROC curve indicates the tradeoffs the diagnostic system makes between correctly identifying the diseased and healthy states that describe the inherent discrimination capacity of that system. This method will be introduced further in Chapter 2. 11  2. Background Information and Studies 2.1 Physiology of Vision The eye is a sensory organ which converts photon stimuli into electrical impulses that are passed onto higher areas of the central nervous system (CNS). The eye conveys visual information to the brain from the visual field such as colour, orientation, direction, depth, texture, size, motion, form with colour, and form with motion. It is agreed that 90% of the information around us reaches our brain via sight (Havener, 1995). 
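Returning to the forward stepping procedure outlined in section 1.7.1 above: the five steps amount to repeatedly entering the remaining variable with the largest partial F statistic, computed here from the ratio of Wilks' lambda values before and after entry, and stopping when no candidate exceeds the entry threshold. The Python sketch below is a simplified, forward-only illustration; the F-to-enter value of 4.0 is an assumed placeholder, and the thesis's own analysis also applies an F-to-remove criterion and the significance levels given in Chapter 3.

    import numpy as np

    def wilks_lambda(X, y):
        """Wilks' lambda = |W| / |T| for the entered variables (columns of X), two groups."""
        X = np.asarray(X, dtype=float)
        Xc = X - X.mean(axis=0)
        T = Xc.T @ Xc                                  # total SSCP matrix
        W = np.zeros_like(T)
        for g in np.unique(y):
            Xg = X[y == g] - X[y == g].mean(axis=0)
            W += Xg.T @ Xg                             # pooled within-group SSCP
        return np.linalg.det(W) / np.linalg.det(T)

    def forward_stepwise_dfa(X, y, f_to_enter=4.0):
        """Enter one variable per step while the best candidate's partial F
        exceeds the F-to-enter threshold (two-group case)."""
        n, n_vars = X.shape
        entered, remaining = [], list(range(n_vars))
        lam = 1.0                                      # lambda with no variables entered
        while remaining:
            scores = []
            for j in remaining:
                lam_j = wilks_lambda(X[:, entered + [j]], y)
                partial_f = (n - 2 - len(entered)) * (lam / lam_j - 1.0)  # two-group partial F
                scores.append((partial_f, j, lam_j))
            best_f, best_j, best_lam = max(scores)
            if best_f < f_to_enter:
                break                                  # step 4: no variable contributes enough
            entered.append(best_j)
            remaining.remove(best_j)
            lam = best_lam
        return entered

Run on the patient data, the list returned by forward_stepwise_dfa would contain the indices of the ONH shape parameters retained by the model, in their order of entry; the discriminant coefficients (step 5) would then be computed from the entered variables only.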
The eyes are situated in the orbits of the skull and are connected to the brain via the optic nerves (Fig. 2-1).  The eye is a little over an inch in diameter, with two compartments. The posterior  compartment consists of the vitreous humor, a semi-gelatinous material, with a volume of about 4 ml and the anterior compartment consists of the aqueous humor with a volume of about 0.25 ml. 2.1.1 Light's Journey in the Eye Light enters the eye (Fig. 2-2) passing through many transparent layers before reaching its final destination at the back of the eye: the cornea is crossed first, where two thirds of the total refraction in the eye occurs; then the aqueous humor, a fluid which allows the iris, the central aperture, freedom to constrict or dilate to limit or increase the amount of light entering the eye; the lens, a flexible and transparent material, then supplies the remainder of the refraction for the eye (the ciliary muscles contract increasing the lens' curvature, thus decreasing the focal length and bringing near objects into focus; when the ciliary muscles are at rest the lens is focused in the distance); and finally the vitreous humor is traversed before the light reaches the retina at the back of the eye.  12  Lateral geniculate body  Optic radiations  Figure 2-1. Side view of the human brain highlighting the visual system (Hubel, 1988). Cornea ' I  Pupil  Iris Canal of ScbIcmm.  A n t e r i o r chamber ,  ^.rrrr-rr-~^,' '-/y^ ^<£'J^—  \  Filtration angle  <" S5K,--'' Posterior V _ — - t  When light approaches the retina it passes through two layers of cells before reaching the  ;  C i l i a r y body - .  posterior of the retina where the receptor cells (Fig. 2-3) are located. Vitreout  Even though light has to  travel through several layers of nerve cells (Fig. 24), vision is not appreciably blurred because the cells are transparent. The receptor cells use lightsensitive pigment supplied by the pigmented cells  Figure 2-2. Horizontal cross section  to convert photon stimuli into electrical signals. The  of the normal eye.  pigmented cells (Fig. 2-4) contain melanin which absorbs light that has passed through the retina and stops it from reflecting back or scattering around inside the eye. There are two types of light detecting receptor cells in the back of the retina; the rods, responsible for vision in low level light, and the cones, responsible for colour vision and 13  the ability to see fine detail, which do not respond in low light. The rods are far more numerous than cones in the retina except in the fovea, the center of the visual field. The fovea contains only cones on the retinal surface for extremely detailed vision with the other two layers of cells displaced to the side, creating a small pit about 1/2 mm in diameter in the retina (Fig. 2-2). The fine details apparent while seeing with the fovea could probably not be achieved if the two transparent cells layers were not displaced to the side.  Figure 2-3. The path light takes to reach receptor cells (Hubel, 1988).  2.1.2 Neurons and Neuronal Circuits The electrical impulses from the receptor cells are passed forward through the first neuroretinal cell layer containing amacrine, horizontal, and bipolar cells. The information gathered by the first layer of cells is propagated forward to the retinal ganglion cells mostly through the bipolar cells. 
This information first collected by the many rod and cone receptor cells, approximately 125 million in each eye, is thus preprocessed by the many neurons that the electrical impulses have reached, and the information is transformed and passed onto the brain by the ganglion cells, of which there are about one million per eye. This relatively simple neuronal network accomplishes many tasks for the CNS such as preliminary colour processing (Bronzino, 1995). This image sensor of the eye  14  maps about 3.5° of the visual field to every 1 mm of surface and accomplishes a very large amount of data compression (125:1, receptor cells to ganglion cells) and signal processing.  Pigmenced cell  Rod  •  Cone  Horizontal cell Amacritic cell bipolar cell —  Ganglion cell-  Figure 2-4. The cell layers of the retina (Hubel, 1988).  Neurons (Fig. 2-5), including the retinal ganglion cells, pass on information chemically through synaptic junctions or electrically by tight junctions between cells. The cell body, the soma, contains the nucleus and from it originates processes called dendrites. The dendrites reach out and contact surrounding cells to receive chemical or electrical signals. The signals are then integrated in the soma and if a threshold is reached an action potential (electrical signal) is generated at the axon hillock.  The action potential is then conducted along a long process called the axon which  terminates in synaptic terminals or boutons.  The synaptic bouton is where the information  15  contained within the action potential is passed on through a synaptic or tight junction to the next neuron in the neuronal circuit.  Dendrites Axon hillock—initial segment (action potential originates here)  Axon  Synaptic buttons  Soma (cell body)  Figure 2-5. Section of a nerve cell with synaptic inputs from other cells. Note: the synaptic buttons (a.k.a. boutons) are the site of the synapses.  The axons of the retinal ganglion cells are extremely long in comparison to the cell body. The retinal ganglion cells (RGCs) are commonly referred to as retinal nerve fibers (RNFs). The cell bodies of the nerve fibers are located in the innermost, layer of the retina where the dendrites receive signals from the bipolar cells (Fig. 2-3). The axons travel from the ganglion cell bodies, along the surface of the retina, into the optic nerve (Fig. 2-2), and then on to synapse with other neurons in the thalamus of the brain.  The entire collection of nerve fibers from the retina is  contained within the associated optic nerve (Fig. 2-6) which carries the complete neural output from each eye. After leaving the posterior pole of the eye through the optic nerve, the axons pass through the optic chiasm and decussate (split) to form the optic tracts which consist of fibers from both eyes (Fig. 2-7).  The nerve fiber axons from each optic tract continue on and synapse (contact) with  neurons in the associated lateral geniculate nucleus (LGN).  The L G N neurons form the optic  16  radiations which synapse with neurons in the different areas of the visual cortex where further processing of the visual information takes place.  Figure 2-6. Sectional view of the brain from the ventral side (Hubel, 1988).  17  Figure 2-7. Division of the eyes' visual fields showing the path of the optic nerve fibers.  2.1.3 The Optic Nerve Head The retinal nerve fibers exit the eye through a hole in the sclera at the back of the eye: the scleral canal. Before leaving the eye the million or so RNFs must converge upon this exit. 
As the axons converge (Fig. 2-8) they accumulate and the retinal nerve fiber layer (RNFL) thickness grows. Thus, near the optic disc (Fig. 2-9), the area of the opening in the choroid and sclera, the RNFL thickness increases rapidly and bulges upward forming a neural rim before all of the RNFs enter the optic nerve. The center depression of the optic disc is the optic cup. This entire structure formed by the convergence of RNFs at the optic disc is the optic nerve head (ONH) (Fig. 2-10). Each eye has an ONH which consists of all the RNFs leaving each eye, therefore, the ONH is a compact representation of the retina's RGCs, hence, the entire visual field. As the fibers converge, nerve fiber bundles are formed (Fig. 2-10) which are supported in an underlying region of the ONH by sheets making up the lamina cribrosa.  18  Retina  5*321 Nerve  Figure 2-8. The central 30° of the retina exhibiting RNF axon paths in the right eye. Inset: The nerve fibers closer to the optic nerve pile up on top of the peripheral fibers (Shields, 1992). Superior timporol and .y\ nasal r e t i n a l arteries  retinal vein  Figure 2-9. Section of the optic nerve at the back of the eye, with top view of optic disc.  19  Optic Nerve Head  *  1  Figure 2-10. The optic nerve head: (A) nervefiberlayer at the surface of the retina; (B) pre-laminar region; (C) lamina cribrosa region; (D) retro-laminar region (Shields, 1992).  The shape of the ONH differs within the normal population and three ONH features vary in shape accordingly; the optic disc, the optic cup, and the neural (retinal) rim. The optic disc margin is defined by the scleral canal's inner edge; the scleral canal is the opening in the sclera at the optic nerve. Definition of the margin of the optic cup varies among clinicians because the slope of the cup wall is not constant, changing amongst individuals. Thus, there is no standard reference point on the cup wall, which is agreed upon by clinicians, for an exact definition of the margin. The cup is the volume of RNF tissue missing in the center of the ONH, the depression, but due to variation in the shape (the depth, the base area, the upper area, the slope of the walls and the symmetry) there are many ways to define the location of the cup's limits. A good guideline for this is the point at which the retinal arteries and retinal veins change direction from the surface of the retina into the optic nerve (Fig. 2-9). The physiological cup for healthy eyes was classified by Elschnig in 1899 and by contour mapping (Jonas et al., 1988; Schwartz, 1973), which illustrates the different shapes of ONH possible (Fig. 2-11). The ONH's neural rim is bounded by the optic disc on the outside and the  20  optic cup on the inside. The shape of the neural rim also varies widely in the population, but the height of the rim changes directly with the amount of RNF tissue and with the diameter of the scleral canal. II  Figure 2-11. Variation in the shape of the physiological cup (c): type I. small funnel-shaped; type II. temporal (T), cylindrical; type III. central, trough-shaped; type IV. steep nasal (N) wall with sloping of temporal wall toward disc (d) margin; type V. developmental anomalies (not shown) (Shields, 1992).  The ONH is subject to changes in its structural shape due to the loss of RNFs. There is gradual irreplaceable loss of nerve fibers, estimated at roughly 5600 axons per year (Balazsi et al., 1984; Mikelberg et al., 1989), over a life time. 
This loss which amounts to about 400,000 fibers during a 70 year life span, does not appreciably impair a person's vision as the remaining 1.2 million (Balazsi et al., 1984) fibers will still provide good visual function.  However, there are many  diseases of the eye which cause abnormal degeneration of the RNFL and thus, the ONH. Glaucoma is the predominant disease causing ONH degeneration.  21  2.2 Historical Review of Glaucoma Screening Although there were numerous technologically different developments in the last two centuries for the detection and identification of glaucoma, the directions in which progress was made can be followed down two different paths: testing visual function and examining the retina . Each path 2  focused on superficial screening, qualitative descriptions of the disease, and quantitative measures of the progress of the disease. However, ultimately an absolute measure of the disease is sought upon which a standard can be defined and tested against, but even to the present day there is no universally accepted "gold standard" for the disease of glaucoma. 2.2.1 Visual Field Testing There are different manners in which to test the visual function of the eye; testing the completeness of the field of vision has been the most obvious since Hippocrates'(460-380 BC) first recordings of defects in the visual field and Ptolemy's (150 BC) first attempts at visual field measurement.  The visual field encompasses everything that can be seen while the eye remains  fixed; normally 60° superior, 75° inferior, 100° temporal, and 60° nasal. Since Mariotte in 1668 A D noted the discontinuity in the visual field that relates to the blind spot, and Boerhaave in 1708 noted the existence of localized loss of sight in the visual field, known as scotomata, clinicians have examined the integrity of the visual field. However, the visual field was not accurately measured until Young did so in 1801.  He  measured the limits of the normal visual field and blind spot exactly; Purkinje repeated these measurements in 1825. Although localized visual field loss was observed by Galen (131-179 AD), the disease of glaucoma and the loss of vision associated were first defined by Mackenzie in 1830.  1  All historical data was gathered using Varma and Spaeth (1993), and Silverstone and Hirsch (1986).  22  The techniques and equipment used to acquire visual field measurements have been evolving for a long time.  Albrecht von Graefe, 1856, introduced visual field testing in clinical  ophthalmology with his "campimeter". He remarked on the equal importance of measuring the visual field as well as central visual acuity. Exploiting the qualitative or diagnostic information obtained from peripheral the visual field, he recognized the contraction of the visual field that occurred with glaucoma and documented that this occurred prior to the loss of central acuity. De Wecker, in 1867, designed a campimeter screen similar to von Graefe's , but with radial markings instead of Cartesian squares. Forster, in 1869, developed the first practical perimetry instrument: a 180° arc pivoted on a stand; test objects of different sizes were moved in from the sides. Bjerrum introduced the concept of quantitative visual field testing using the Bjerrum screen in 1889. The screen was divided into sectors and was used to test a patient's ability to detect a stimulus projected upon it. 
Bjerrum was able to: increase the sensitivity and accuracy of visual field testing because the test objects subtended very small visual angles due to longer testing distances and smaller objects.  He also  distinguished between relative and absolute scotomas, and characterized the scotomas of glaucoma that circled the fovea and included the blind spot. He was the first to correct for refractive errors, but ignored variation in the pupil size and its effects. The Bjerrum screen remains the standard tool for campimetry. In the 1930s, Aimark introduced light projection arc perimeters and in 1945, Goldmann's hemispheric projection perimeter began modem quantitative perimetry. The advantages were the following: the hemi-spherical shape allowed easier testing of the visual field; the bowl over the arc  The intraocular pressure has been used as a glaucomatous indicator but as discussed above because of normal variability is rarely regarded as diagnostic.  2  23  provided more even illumination; complete freedom of test object movement; and size, brightness and colour could be accurately controlled. The concept of the visual field as an island of vision (a contour defined by visual sensitivity) within a sea of darkness (the areas of the visual field where there is no vision) was defined by Traquair in 1949 and is depicted in Figure 2-12. This was important for conceptualizing what the field of vision looked like and thus, developing more accurate methods to measure it. Harms and Aulhornis introduced the Tubingen perimeter in 1962: the first practical instrument for static perimetry to study glaucomatous visual field defects using static profile perimetry which increased light intensity in steps. Kinetic perimetry could also be performed when a defect was found using static techniques. The Visual Field Analyzer by Friedmann, 1966, allowed quantitative campimetry: a strobe light was located behind a metal plate with holes, then the brightness and position of the test target were varied through the use of shutters and filters. Computers were combined with visual field testing in the 1970s, automated perimeters such as: the television campimeter of Lynn and Tate, the Octopus device of Fankhauser, and the Competer of Heijl and Krakau.  Many more  automated perimeters were introduced in the next decade.  24  Figure 2-12. Traquair's island of vision. (A) peripheral limits, (B) profile of contour representing sensitivity, (bs) blind spot, (f) fixation point (Shields, 1992).  The selective loss of axons is usually greatest in the upper and lower poles of the optic disc at the earliest stage of glaucoma. This region is where the mid-peripheral RGC axons enter the ONH (Quigley et al., 1985). There is a preferential loss of the larger ganglion cells; smaller ganglion cells are characteristically lost due to compressive tumors, demyelinative disorders, and ischemic optic neuropathy (Quigley et al., 1982). Generally, visible changes of the optic disc (optic nerve damage) precede the onset of reproducible visual defects (Quigley et al., 1985). The glaucomatous field loss is neither entirely diffuse nor extremely localized (Malmo, 1991). This is because of a dispersion of the ganglion cell axons passing through the same area of the optic disc and is due to an intermixing of axonal bundles in the peripapillary retina. After the optic disc has developed cupping and atrophy, it becomes abnormally vulnerable to further damage even at normal IOP levels (Abedin et al., 1982). 
This observation can be used to explain low tension glaucoma: the level of the IOP does not have to be elevated above normal levels in order to cause optic nerve damage.  25  2.2.2 Examination of the Optic Nerve Head 2.2.2.1 Ophthalmoscopy and Drawings Herman von Helmholtz, in 1850, invented the ophthalmoscope which helped clinicians diagnose eye disease through visual inspection. Soon after, 1854, Albrecht von Graefe made the first observations of the optic disc in glaucoma with the ophthalmoscope and concluded the glaucomatous optic disc was raised. The next year von Graefe corrected his initial statement of 1854 with the help of Weber: the glaucomatous optic disc was excavated not raised. He later corrected another statement, "excavation occurred without increased IOP", with advice from Donders: disc excavation actually occurred with elevated IOP. Eduard von Jaeger published an atlas of the diseases of the ocular fundus in 1869; fundus drawings which on average took 30 sessions, 2.5 hrs. each, to draw. Thirty years later, in 1899, Elschnig classified types of normal O N H excavation based upon ophthalmic observations of the cup and substantiated these by histological examination. Pickard pioneered semiquantitative evaluation of the optic disc through ophthalmic drawings by measuring the excavation of the optic disc using an overlayed transparent grid from 1921-48. While Elliot, in 1922, described the basic signs of the glaucomatous optic disc: cupping, and pallor or atrophy. Much later, 1960, Colenbrader used a series of concentric circles to estimate cupping of the disc. In 1963, Ford and Sarwar made the first qualitative descriptions of the optic disc since Elliot in 1922. Armaly, in 1969, related the cup/disc ratio to open angle glaucoma 2.2.2.2 Ophthalmic Photography In 1863, Liebrecht, in Berlin, and Noyes, in New York, separately attempted fundus photography but both with limited success. The next year, Rosebrough, in Toronto, photographed the fundus of a cat successfully with a camera attached to an ophthalmoscope.  Jackman and 26  Webster in England reported the first success in photographing the human fundus in 1886; the 2.5 min exposure produced a poor quality picture. Later, in 1907, Dimmer and Pillat published a photographic atlas of the ocular fundus. Not until 1926, when Nordinson, in Upsala, was the first commercial fundus camera made available.  In 1964, Snydacker introduced the concept of the  cup/disc ratio for the evaluation of normal eyes by ophthalmoscopy and photography. Holm and Krakau, in 1966, used a photogrammetric method, stereo-photography, for determining the volume of tumors and the optic disc cup. This began attempts to obtain exact measurements of the optic cup 2.2.2.3 Image Analyzers Following Holm and Krakau, Schwartz, in 1973, pioneered the use of stereo-photographs for the measurement of ONH parameters and created the market for computerized image analyzers such as the Rodenstock Optic Nerve Head Analyzer, the Topcon IS-2000, and the Humphrey Retinal Analyzer. McCormick and associates, in Chicago, began work with computerized image analyzers that were used in clinicians' offices by 1975.  Most instruments measured optic disc  cupping but could also measure pallor. The PAR image analyzer, now known as the Topcon Imagenet system, used a digital interactive mapping program to determine the parallax differences and depth measurements from simultaneous stereoscopic images. Depth measurements were then used to calculate various topographic parameters. 
The first scanning laser ophthalmoscope using two dimensional scanning was developed in Boston, in 1980 (Webb et al., 1980). followed closely by developments in Heidelberg, Germany, in 1982.  This was  The next generation of  scanning laser ophthalmoscopes incorporated a confocal set-up and were realized in 1984 and 1987, respectively.  Finally, in 1989, the Heidelberg Retina Tomograph (Webb et al., 1989) was  introduced. Its images were of high quality and highly reproducible. 27  2.2.3 Intraocular Pressure Since the introduction of the Schiotz indentation tonometer in 1905 and the Goldmann applanation tonometer in 1954, there have been many studies involving the IOP in relation to ocular disease. Leydhecker (1958) found that the IOP varied in the general population; the mean was 15.5 mm Hg with standard deviation of 2.57 mm Hg (Leydhecker, 1958). Two standard deviations from the mean, 20.6 mm Hg, is commonly defined as the upper limit of normal pressure (Shields, 1992; Colton et al., 1980). People with an IOP above this upper limit, with no visual field loss, and no visible ONH damage are classified as ocular hypertensive (OHT) and are known to be at risk for developing glaucoma (Kolker et al., 1977; Phelps, 1977). Patients with glaucomatous visual field loss, are subdivided into two categories on the basis of IOP measurement: high-tension glaucoma (HTG) having an IOP > 21 mm Hg, and low-tension or normal-tension glaucoma (LTG/NTG) having an IOP < 21 mm Hg. 2.3 The Analysis of Screening Methods A combination of a diagnostic imaging procedure and an image reader constitutes a diagnostic system.  The effectiveness of a particular diagnostic imaging procedure depends upon who  interprets the images, and the diagnostic setting in which images are to be used. The diagnostic accuracy is a measure of the performance, i.e. percent correct. This measure does not reveal the relative frequencies of false positive and false negative errors, which can have substantially different clinical implications. The sensitivity, the fraction of patients actually having the disease in question that is correctly diagnosed as positive, and the specificity, the fraction of patients actually without the disease that is correctly diagnosed as negative, both define the performance of the system.  28  The percentage of patients correctly identified with a disease by a test measure is referred to as the 'sensitivity', and the percentage of people correctly identified without the disease is referred to as the 'specificity'. Table 2-1 illustrates the derivation of these values: 'a' is the number of true positives, 'b' is the number of false positives, 'c' is the number of false negatives, and ' J ' is the number of true negatives. The overall classification of patients, system accuracy, is (a + d)/N. The objective of any screening method is to maximize the sensitivity, i.e. not allowing patients with glaucoma to go unnoticed.  However, any people without the disease which are identified  incorrectly, false positives, will incur extra cost to the system through further testing and incorrect treatment, and for this reason the specificity should be maximized to reduce the cost for the health care system. It is also unnecessarily frightening for the person misdiagnosed. The cost is a relevant issue because the 2% prevalence of glaucoma implies that, by far, a screening method will be handling more people without the disease than with it. 
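To make the interplay between sensitivity, specificity, and disease prevalence concrete, the short Python sketch below computes the performance measures defined above from raw counts and then applies them to a hypothetical screening population with the 2% glaucoma prevalence cited above (the counts correspond to the entries of Table 2-1 below). The population size and the 90% / 85% test characteristics are illustrative assumptions, not results from this study.

```python
def performance(tp, fp, fn, tn):
    """Sensitivity, specificity and accuracy from the four contingency counts."""
    sensitivity = tp / (tp + fn)                 # diseased patients correctly called positive
    specificity = tn / (fp + tn)                 # healthy patients correctly called negative
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # overall fraction correct
    return sensitivity, specificity, accuracy

# Hypothetical screening programme: 1000 people, 2% prevalence,
# and a test with 90% sensitivity and 85% specificity (illustrative values).
n, prevalence, sens, spec = 1000, 0.02, 0.90, 0.85
diseased = n * prevalence                        # 20 people with glaucoma
healthy = n - diseased                           # 980 people without
true_pos = sens * diseased                       # 18 detected cases
false_pos = (1.0 - spec) * healthy               # 147 healthy people flagged
print(f"true positives: {true_pos:.0f}, false positives: {false_pos:.0f}")
```

Even with respectable sensitivity and specificity, the low prevalence means the false positives (147) far outnumber the true positives (18), which is the cost argument made above.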
"Gold Standard" Diagnosis Positive  Negative  Total  a  b  a+b  true positive  false positive  c  d  false negative  true negative  Total  a+c  b+d  a+b+c + d=N  Performance  sensitivity  specificity  accuracy  a/(a + c)  d/(b + d)  (a + d)/N  Positive  Negative  c +d  Table 2-1. A two-by-two contingency table to illustrate sensitivity and specificity.  2.3.1 Multivariate Analysis: Discriminant Function Analysis Numerous studies have utilized discriminant function analysis (DFA) to identify the predictors of glaucomatous visual field loss using risk factors, ONH parameters, visual field indices, or a  29  combination of them. One study (Drance, 1975; Drance, 1976) using six disc variables obtained a sensitivity of 84 % and a specificity of 83 %. One difficulty however was the standardization of the parameter calculation methods. This study was later followed up (Susanna and Drance, 1978) and it was found that 8 of the 36 false positives and one of the 36 true negatives, went on to develop glaucomatous visual field loss; these patients were all thought to be disease free. A related study (Drance et al., 1978) used 7 of 33 available parameters to classify 93 % of the patients correctly, a sensitivity of 92 % and a specificity of 95 %, using the training set model. The most important parameters, which were consistently the highest ranked were the structural measurements of the ONH; rim abnormality and cup/disc ratio. Another follow up study (Drance et al., 1981) showed that the discriminant function of 1978 did relatively poorly on a new set of patients: the overall classification was 81 %, 48 % sensitivity and 92 % specificity. The D F A was recalculated for the new population and classified 79 % overall with a sensitivity of 71 % and a specificity of 81 %, showing that there was either a lack of classifier generalization or an error in producing the same measurements. Another study (Caprioli, 1992) used a D F A to classify patients using well defined structural and functional parameters. The results are given in Table 2-2.  %  Overall  Sensitivity  Specificity  Combined DF  87 + 3  90 ± 5  76 + 5  Structural DF  76 ± 2  88±4  35 ± 12  Functional DF  77 ± 3  99 ± 1  6±6  Table 2-2. The discriminant function classifier (Caprioli, 1992).  The results were good but the analysis was biased; the visual field indices used to define the classes were used in the DFA.  30  2.3.2 Artificial Neural Networks: Supervised Learning Since the introduction of the back-propagation algorithm by Rumelhart and McLelland (1986) there have been numerous uses of these multi-layered artificial neural networks for pattern classification (Pizzi et al., 1995; Lapuertaet al., 1995; Maddalenaet al., 1995; Errington et al. 1993; Lamiell et al., 1993; Syu et al., 1993). Networks are defined by the inputs and the desired outputs, the number of layers and nodes in the network, the activation function of the nodes, the connection weights, and the learning rule by which the weights are changed.  An example multilayered  artificial neural network is pictured in Figure 2-13.  Input layer  First hidden layer  Second hidden layer  Output layer  Figure 2-13. An example of an artificial neural network (Haykin, 1994).  Supervised learning is when the desired outputs are used to 'teach' the network. 
The learning rule is the process by which the free parameters of an artificial neural network are adapted through a continuing process of pattern presentation from the environment; the type of learning is determined by the method through which the parameters change. In error-correction learning, the desired response is compared to the actual response to generate an error term, which is then adjusted and propagated back through the network, hence the name error back-propagation. Thus, pattern recognition is developed without explicitly defining the classification algorithm. There has been much work reported on how to determine the structure of the network, which will be discussed in the next chapter.

The generalization of an ANN is its ability to learn the discriminating features defining the relationship amongst a set of example patterns and then correctly identify a new pattern using these features. This ability to generalize differentiates an ANN that has learned the underlying relationship from one which has only memorized the example set of patterns, and enables the ANN to recognize new patterns or cases. The general method is to divide a data set into three parts: for example, 40% used as a training set, 10% as a validation set, and the remaining 50% put aside as an independent test set. The validation set is used to test the various ANN configurations taught using the training set, and the final ANN structure is determined using the ANN's diagnostic precision as a measure of performance. The validation set is then combined with the training set, which is used to train the final ANN configuration. The independent test set is used to verify the diagnostic performance of the pattern classification model.

2.3.3 ROC Analysis

The quantitative measures used to describe an ROC curve are derived assuming that the data can be represented by two separate but usually overlapping Gaussian distributions (Fig. 2-14).

Figure 2-14. The model underlying ROC analysis: overlapping distributions of actually negative and actually positive patients along a "confidence in a positive decision" axis, divided by one possible setting of the confidence threshold; the areas to the right of the threshold give FPF = 1 - specificity and TPF = sensitivity (Metz, 1986).

In Figure 2-14, the true positive fraction, i.e. the fraction of actually positive patients that is correctly diagnosed as positive, is represented by the area under the right-hand distribution to the right of the threshold. The false positive fraction, i.e. the fraction of actually negative patients that is incorrectly diagnosed as positive, is represented by the area under the left-hand distribution to the right of the threshold. Changing the confidence threshold causes sensitivity and specificity values to vary inversely, generating pairs of sensitivity / specificity values (Fig. 2-15). If the threshold is varied continuously, a smooth curve is swept out which constitutes an ROC curve (Fig. 2-16); a larger area under the ROC curve demonstrates greater discrimination capacity and a lower ROC area indicates less discrimination capacity (Fig. 2-17). For example, if the distributions are identical and completely overlapped (non-discriminable), the area is 0.5.

Figure 2-15. Varying the confidence threshold: four different confidence thresholds used at the same time divide the decision axis into five categories (Metz, 1986).

Figure 2-16. A typical ROC curve: true positive fraction (TPF) plotted against false positive fraction (FPF) (Metz, 1986).
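The construction just described, sweeping the confidence threshold across two overlapping Gaussian distributions and recording the resulting (FPF, TPF) pairs, can be sketched numerically. The distribution means, standard deviation, and trapezoid-rule integration below are arbitrary illustrative choices, not values or methods from this study.

```python
import numpy as np
from math import erf, sqrt

def norm_sf(x, mean, sd):
    """Area under a Gaussian to the right of x (survival function)."""
    return 0.5 * (1.0 - erf((x - mean) / (sd * sqrt(2.0))))

neg_mean, pos_mean, sd = 0.0, 1.5, 1.0          # illustrative actually-negative / actually-positive distributions
thresholds = np.linspace(-5, 7, 500)            # sweep of the confidence threshold
fpf = np.array([norm_sf(t, neg_mean, sd) for t in thresholds])   # false positive fraction
tpf = np.array([norm_sf(t, pos_mean, sd) for t in thresholds])   # true positive fraction

# Area under the ROC curve by the trapezoid rule; 0.5 means no discrimination.
order = np.argsort(fpf)
auc = np.trapz(tpf[order], fpf[order])
print(f"ROC area: {auc:.3f}")
```

When the two distributions coincide, fpf and tpf are identical at every threshold and the computed area collapses to 0.5, matching the non-discriminable case noted above.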
Figure 2-17. A set of ROC curves of differing areas, plotted as true-positive proportion against false-positive proportion (Swets, 1988).

ROC analysis is based on statistical decision theory and was first developed in signal detection theory, that development having been motivated initially by problems in radar. The potential advantages of ROC analysis in studies of medical decision making were first suggested by Lusted (1971). The attention to the conceptual basis for ROC analysis in medical diagnosis has increased (Griner et al., 1981) and specifically in medical imaging (Lusted, 1978).

2.4 Review of Previous Work from the Analysis of Glaucoma Predictors

Since the introduction of the cup/disc ratio, the significant deviation of a measurement from the normal population value has been used as a predictor for the presence of glaucoma. The predictors are not always structural parameters; risk factors such as IOP or functional parameters such as visual field defects are also used. The variability of the cup/disc ratio, the rim area, and the individual depth measurements from stereo pair images produced by the Humphrey Retinal Analyzer was examined for healthy subjects and ocular hypertensive (OHT) subjects (with and without glaucomatous visual field loss) (Dandona et al., 1989). The cup/disc ratio was significantly different (p=0.01) between the healthy and OHT subjects, 0.33 ± 0.06 and 0.59 ± 0.06, respectively, but the difference was not statistically significant (p=0.07) for the disc rim area, 2.043 ± 0.125 and 1.653 ± 0.157 mm², respectively. The depth measurements within the ONH were significantly different (p<0.05) between the healthy subjects and the OHT subjects with glaucomatous visual field loss, 165.6 ± 8.7 and 305.0 ± 49.4 µm, respectively.
The sensitivities and specificities of these parameters were reported as visual field  36  loss progressed over a period of about 10 years. This study showed that certain parameters were sensitive to progressive ONH structural damage but not to detection of disease onset. Thus, indices differ in their usefulness dependent upon the stage of the disease in which they were measured.  37  3. Methods and Theory 3.1 Patient Data Collection Patients suspected of having glaucoma were referred by doctors to the V H H S C U B C Pavilions Eye Clinic and the 12 investigation.  th  and Oak Pavilion Eye Care Center on a continuing basis for further  This usually included measuring the visual field and recording the intraocular  pressure. Additional tests include psychophysical and electro-diagnostic testing for disturbances in glaucoma (Mikelberg et al., 1995). As a result there is a large patient database which contains many patients with, or suspected of having, glaucoma. Researchers at the University of British Columbia, in conjunction with the teaching hospitals, formed this ongoing glaucoma study. Dr. S. M . Drance and Dr. F. S. Mikelberg have, since 1992, used a scanning laser ophthalmoscope (SLO) to obtain images of their patients' optic nerve heads (ONH). 3.1.1 Patient Selection The usefulness of the SLO's ONH image for the early detection of glaucoma can only be judged against present "gold standard" which is based on the psychophysical and physiological tests. There is no single agreed upon diagnostic criterion for detecting glaucoma.  However, a  combination of visual field testing (perimetry) and intraocular pressure measurement (tonometry) together with clinical judgment are common ways to diagnose glaucoma.  In this study, the  appearance of the ONH was not used to classify patients, in order to avoid any bias with the diagnostic determination from the SLO's ONH images. Classification of patients was based upon independent and dissimilar forms of data. The patients were screened for inclusion into the study provided they met the following criteria:  38  (i) they were suspect of having primary open-angle glaucoma (POAG), commonly referred by an ophthalmologist; (ii) they were at an early stage of development; (iii) they had no other ocular disease, with the exception of having either pigmentary dispersion syndrome or pseudo-exfoliation syndrome associated with POAG; (iv) they had no known family history of the disease; (v) only one eye from each patient was tested, although most POAG patients' condition is bilateral. The patients were classified using the criteria for group membership given in Table 3-1. The perimetry results were used as the primary method for this classification (Table 3-3 defines an abnormal visual field). All patients were identified as low-tension (< 21 mm Hg) or high-tension (> 21 mm Hg) on the basis of the IOP measurement. This was then used to differentiate between normal and suspect patients with normal visual fields; patients with abnormal visual fields were all classified as abnormal regardless of IOP. Visual Field  Intraocular Pressure (IOP)  Group Membership  normal  <21 mm Hg  Normal  normal  >21 mm Hg  Suspect  abnormal  any pressure  Abnormal  Table 3-1. Group assignment criteria for pre-classification of patients. Note: Abnormal are glaucomatous.  The normal control group was formed from volunteers within the hospital clinics and the research staff. 
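Stated as code, the pre-classification rule of Table 3-1 above is a two-way decision on the perimetry result and the IOP. The sketch below is a plain restatement of that table; the function name and argument names are ours, and the visual-field flag is assumed to come from the perimetry criteria of Table 3-3 (Section 3.1.1.1).

```python
def assign_group(visual_field_abnormal: bool, iop_mm_hg: float) -> str:
    """Pre-classification of patients following Table 3-1."""
    if visual_field_abnormal:
        return "Abnormal"            # glaucomatous, regardless of IOP
    if iop_mm_hg >= 21.0:
        return "Suspect"             # normal field but elevated pressure
    return "Normal"                  # normal field, IOP below 21 mm Hg

print(assign_group(False, 17.0))     # Normal
print(assign_group(False, 24.0))     # Suspect
print(assign_group(True, 15.0))      # Abnormal (e.g. a low-tension glaucoma patient)
```

Patients with abnormal fields are classified as abnormal regardless of pressure, which is why the field check comes first.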
Each volunteer was screened for eye disorders and risk factors to ensure they had good vision with no pathological conditions that could affect the shape of the ONH. The optic disc appearance was not used to establish a volunteer's acceptability for the normal group, i.e. no normal was rejected on the basis of an ONH exam. The two types of test used to determine if a patient  39  exhibited signs of early glaucomatous damage, perimetric and tonometric, are described in more detail in the following sections. 3.1.1.1 Perimetry Perimetry is the measurement of the peripheral limits of the visual field and the relative visual sensitivity of points within the limits. Thus, the visual field can be mapped out as depicted in Traquair's island of vision (Fig. 2-12). The height of the island's surface is the relative visual sensitivity of the retina at a particular point in the visual field. Since its inception, perimetry has been evolving and many different procedures have been developed, manual and automated, to map the visual field. The machine used in this study was the Humphrey Field Analyzer (model 610), utilizing the central 30-2 threshold test program (Fig. 3-1). This is widely used in clinical practice for testing the visual field.  The pupils were dilated to  approximately 3.0 mm in diameter or better using topical drugs. Visual sensitivity decreases when pupil diameter is less than 2.0 mm, but remains constant from 2.25 to 5.0 mm (Silverstone and Hirsch, 1986). A quantitative threshold strategy was used to map the sensitivity of the retinal area being tested. This strategy is best applied in cases where early loss is expected (Silverstone and Hirsch, 1986), as in our study.  Quantitative strategies yield the most reliable values for the  threshold and reduce the time involved in the testing procedure. There are faster methods but the trade-off in accuracy for effort was not appropriate in this study.  40  Figure 3-1. Test spot pattern used by the central 30-2 threshold test program.  The threshold is the luminance at which a target light can be identified above the background luminance on 50% of the trials (Fig. 3-2). This involves many trials at many retinal locations and would be time consuming. Thus, a different method was employed to identify the threshold values: a quantitative strategy.  Stimulus brightness was decreased in 4dB steps until it was no longer  visible.. The light intensity was then increased in steps of 2dB until the light was again visible. The process was repeated several times using the smaller 2dB steps. The intensity at which the light changed from being detectable to non-detectable was recorded as the threshold. Information from nearby points and from the corresponding point in a previous examination were used to speed up the process of convergence. This procedure takes from 20 - 30 minutes to perform. Table 3-2 shows the dB range associated with the apostilb. The apostilb is a measure of illuminance (the incident flux per unit area); roughly equivalent to 1 lux or lumen/m of incident surface. It is the 2  unit of light intensity most often used in automated visual field testing.  41  threshold  intensity  LIGHT INTENSITY Figure 3-2. Psychometric function, showing the threshold for the probable detection of light.  Apostilb (Asb)  Log Asb  Sensitivity (dB)  0.1  -1.0  50  1  0.0  40  10  1.0  30  100  2.0  20  1000  3.0  10  10000  4.0  0  Table 3-2. Humphrey Field Analyzer test spot intensities.  
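Table 3-2 implies a simple logarithmic relationship between stimulus luminance in apostilbs and the sensitivity scale in decibels: each 10 dB step corresponds to a tenfold attenuation of the 10,000 asb maximum stimulus. A small conversion sketch follows; it is our reading of the table, not the Humphrey instrument's internal code.

```python
from math import log10

MAX_ASB = 10_000.0                       # brightest stimulus, 0 dB in Table 3-2

def asb_to_db(asb: float) -> float:
    """Sensitivity in dB for a stimulus of the given luminance (apostilbs)."""
    return 10.0 * log10(MAX_ASB / asb)

def db_to_asb(db: float) -> float:
    """Stimulus luminance (apostilbs) corresponding to a sensitivity in dB."""
    return MAX_ASB / (10.0 ** (db / 10.0))

for asb in (0.1, 1.0, 10.0, 100.0, 1000.0, 10000.0):
    print(f"{asb:8.1f} asb -> {asb_to_db(asb):4.0f} dB")   # reproduces Table 3-2
```

The 4 dB / 2 dB staircase described above is then a search along this scale: the stimulus is attenuated in 4 dB steps until it disappears and brightened in 2 dB steps until it reappears, and the crossing point is recorded as the threshold.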
The central field examination (central 30-2 threshold test pattern, Fig. 3-1) includes: the arcuate areas, i.e. between 5 and 25 degrees radially, with an even and dense distribution of test spots, the central 5 degrees, the blind spot, and both sides of the nasal horizontal meridian. These are all of interest in testing for glaucoma (Silverstone and Hirsch, 1986). The entire test pattern encompasses the central 30 degrees of the visual field. If the results of the perimetry test indicated any of the three conditions outlined in Table 3-3, the patient was classified as having an abnormal (glaucomatous) visual field.  42  No. of adjacent points down by at least:  Special conditions  5dB  10 dB  3  lof3  anywhere but no edge points  2  anywhere but no edge points  3  immediately above or below nasal horizontal meridian including edge points  Table 3-3. Three conditions used to diagnose glaucomatous visual field loss. Note: Edge points are points located on the boundary of the test field. 3.1.1.2 Tonometry The intraocular pressure (IOP) is a measure of the internal pressure of the eye, due to the aqueous outflow resistance and the aqueous humor production of the eye. Devices developed to measure this pressure are called tonometers and make use of the relationship between an applied force and the resultant deformation of the eyeball.  The tonometer used in this study was the  Goldmann applanation tonometer. The flattening of the globe is simple to describe mathematically because of the constant shape of the surface (Fig. 3-3). The Goldmann applanation tonometer, introduced in 1954 (Shields, 1992), measures the force required to flatten the corneal surface. The Goldmann instrument was based on a modification of the Imbert-Fick Law (also known as the Malakov-Fick Law) (Goldmann, 1954). The modified law states the following: W +S=P xAi+B  (3.1)  t  where, A = area flattened by W, Aj = inner area, B = force required to bend cornea, P = pressure in t  sphere , S = surface tension , W = P x A = external force. t  43  w  Figure 3-3. Applanation and the Imbert-Fick Law (Shields, 1992).  As mentioned earlier , the IOP varies in the general population: the mean IOP is 15.5 mm Hg (Leydecker, 1958); two standard deviations above the mean is 20.6 mm Hg and is interpreted as the upper limit of normal pressure (Shields, 1992; Colton and Ederer, 1980). The IOP was measured for all patients in this study and was used to help define a criteria for patient groupings. A normal IOP was defined as being less than 21 mm Hg. 3.1.2 Collection of the Optic Nerve Head Image Data The images of the O N H were obtained using a Heidelberg confocal scanning laser ophthalmoscope.  The imaging method, procedures used with the equipment, theory relevant to  equipment operation, image acquisition, and software calculations are described in the following sections. 3.1.2.1 Confocal Scanning Laser Ophthalmoscopy The confocal scanning laser ophthalmoscope's (CSLO) design principles are similar to those of the scanning confocal microscope. A small aperture allows only the light originating from the area  44  of the retina being illuminated to pass to the detector for the formation of an image (Varma and Spaeth, 1993; Nasemann and Burk, 1990; Heidelberg, 1994). Reflections from planes that are out of focus are removed by the diaphragm (Fig. 3-4), hence, only the plane being imaged is picked up by the detector. 
Light which is scattered by opacities in the lens, by refractive index variations in the media it travels through, or from retinal structures it passes through does not come into focus at the aperture plane of the diaphragm or its position (Fig. 3-4). The aperture size directly controls the amount of unwanted light removed and reduces the scan's depth of field.  With depth of field  reduction, imaging layer by layer is more easily controlled and gives more consistent results. The suppression of the unwanted reflected light increases the contrast of the resulting images making it easier to visualize retinal structures. Many different companies, such as Laser Diagnostic Technologies, Heidelberg Engineering, and Rodenstock, make SLOs based upon this principle. The Heidelberg Retina Tomograph was purchased for its proven reliability and the company's development direction.  45  Confocal Laser Scanning System detector  focal plane  beam scanner  refraction of system  Figure 3-4. Confocal imaging allows the removal of: a) out-of -focus light reflections, and b) scattered light. Note: System refraction is due to patient and equipment refraction.  3.1.2.2 The Heidelberg Retina Tomograph The Heidelberg Retina Tomograph (HRT) consists of a laser scanning camera, a standard patient eye examination stand with chin rest, and a computer system with control panel (Fig. 3-5).  Figure 3-5. Heidelberg retina tomograph system.  46  3.1.2.3 Procedure Using the Heidelberg Retina Tomograph The HRT's operator is able to align the camera to view a live image of the fundus. The HRT corrects for ametropia (poor vision because the image is not focussed upon the retina) and contact lenses can be left in. The location of the focal plane can be changed in coarse 1 diopter increments and fine 0.25 diopter increments to focus upon the retina. The focal plane location shifts within a ±12 diopter range (see Appendix A for specifications and limitations of the HRT). The patient's pupils do not require dilation for recording image data; a 1 mm diameter pupil was found to be sufficient (Heidelberg, 1994).  The preference was to use as large a diameter pupil to obtain  adequate illumination. However, the signal/noise ratio increases with increased pupil diameter because the amount of detected light increases. As a result, the only patients who had their pupils dilated were on miotics; pupil size was usually between 2 mm and 3 mm in diameter. The scan size was selected to be 10° by 10°: this area contains most ONHs (Heidelberg, 1994). The operator adjusted the total depth of the scan by specifying the location of the first and last focal planes of the scan series. The scan depth was made large enough to contain all of the retinal structure being imaged (i.e., the ONH) and was adjusted between 0.5 mm and 4.0 mm in increments of 0.5 mm; the scan depth of a normal ONH is 1.5 mm and a glaucomatous ONH typically requires a scan depth of 2.5 mm (Heidelberg, 1994). After image acquisition, described in more detail below, the resulting tomographic image series (Fig. 3-6) had to meet the following three requirements: (1) the first four images should appear dark; (2) the last four images should appear dark; (3) the center images appeared brighter, increasing in intensity toward the middle.  47  Figure 3-6. Tomographic image series (poor reproduction in gray scale). The top left is the highest tomographic image and the lowerrightis the deepest; reading order is left toright,row by row.  
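The three acceptance requirements listed above amount to a check on the brightness profile of the thirty-two section images: dark at both ends of the scan and brightest near the middle. A rough, illustrative check of that kind is sketched below; the array shapes, thresholds, and synthetic test series are assumptions made for the example and are not part of the Heidelberg software.

```python
import numpy as np

def series_acceptable(series: np.ndarray, dark_fraction: float = 0.25) -> bool:
    """Heuristic check of a tomographic series (32 x 256 x 256, 8-bit).

    Requires the first and last four section images to be dark relative to the
    brightest image, and the brightest image to lie away from either end."""
    mean_intensity = series.reshape(series.shape[0], -1).mean(axis=1)
    peak = mean_intensity.max()
    ends_dark = (mean_intensity[:4].max() < dark_fraction * peak and
                 mean_intensity[-4:].max() < dark_fraction * peak)
    peak_in_middle = 4 <= int(mean_intensity.argmax()) <= series.shape[0] - 5
    return bool(ends_dark and peak_in_middle)

# Synthetic example: a bell-shaped intensity profile along the scan depth.
z = np.arange(32, dtype=float)
profile = np.exp(-((z - 16.0) ** 2) / 30.0)
fake_series = np.broadcast_to((profile * 200).astype(np.uint8)[:, None, None], (32, 256, 256))
print(series_acceptable(fake_series))    # True for this well-centred scan
```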
This ensured that the entire depth of the ONH cup and the highest point of the neuroretinal rim were contained within the image series. If these requirements were not met, the scan was not used and the process was repeated after making appropriate changes to the location of the first and/or last focal plane. 3.1.2.4 HRT Image Acquisition The Heidelberg camera uses a 670 nm wavelength (visible red light) diode laser light source to illuminate the retina. The amount of reflected light is recorded by a charged couple device detector. Using this light source, the Heidelberg SLO produces 256 pixel by 256 pixel, two-dimensional, optical section images of the retina with 8-bit intensity resolution. These images are formed from individual point measurements and each takes 0.032 seconds to record at a repetition rate of 20 Hz (specifications in Appendix A.). A series of thirty-two equally spaced two-dimensional section images is taken this way over a total depth, 0.5 to 4.0 mm, predetermined by the operator to contain the entire optic nerve head. Thus, the spacing between image planes was from 16 um to 130 um and this uniform distribution of sectional images formed a three-dimensional image of the ONH.  48  The thirty-two section images took 1.6 seconds to acquire, during which time the patient was required to remain stationary minimizing eye movement and blinking.  The thirty-two section  images were then aligned (Fig. 3-7) by using blood vessels as landmarks, and a three-dimensional tomographic image series was formed. The aligned image series was then transformed into a threedimensional depth image.  Alignment was necessary in order to correct for any shifts between  individual section images due to eye movements during the acquisition time.  Confocal Imaging  y  Optical Section Images  Figure 3-7. Tomographic image series alignment. 3.1.2.5 Topographic Image Formation The formation of a three-dimensional image requires interpretation of the 8-bit intensity profiles along the optical axis (z-axis). The (x,y) planes (focal planes) are perpendicular to the optical axis, and thus, with the z-axis, define a three-dimensional image I(x,y,z). The light intensity, /, measured at each spatial (x,y,z) location represented the amount of reflected light at each point. For every fixed (xg.yg) location there exists a one-dimensional intensity distribution Ifx^y^z) called a confocal z-profile (Heidelberg, 1994) (Fig. 3-8).  If there is only one light reflecting surface the z-profile,  49  theoretically, will have its maximum at the position of the surface, and a full width of approximately 300 um (Heidelberg, 1994) at half of the maximum, depending on the aperture of the system.  However, variation in refractive power of the eye affects the localization of this  maximum.  Confocal Z-Profile intensity 4  maximum position «=> height  z (image plane)  Figure 3-8. Tomographic image series light intensity profile. The determination of the height of the retinal surface is dependent upon two assumptions: (1) all reflected light originates at one location; (2) there exists a well defined maximum from within the set of section images. The reflectivity image is the sum of the thirty-two two-dimensional section images; the value at each pixel is the reflectivity of the structure at the corresponding location. The reflectivity image is a 'real image' compared with the topographic image which contains the information on the threedimensional shape of the ONH (Fig. 3-9). 
The height of the reflecting surface in the topographic image is defined by the location of the maximum of the intensity distribution. Once the maximum was found along the z-profiles of all of the (x,y) points, they were recorded as a matrix of numbers  50  Z(x,y) representing a measured position along the optical axis or height of the curved surface of the ONH examined. The accuracy of the Z measurement is typically ± 1 0 um (Heidelberg, 1994). The Z(x,y) matrix forms the basis of the topographic map's height measurements in the standard coordinate system; the origin is defined by the focal plane of the eye.  Beside the standard  coordinate system, a relative and tilted coordinate system can be defined which takes the tilt of the retinal surface into account.  Figure 3-9. The gray scale topographic image (left) is based upon pixel surface height; the reflectivity image (right) is based upon the tomographic image series' pixel intensity values.  3.1.2.6 Relative and Tilted Coordinates The relative coordinate system is defined relative to the retina's orientation. The origin of the zaxis is located at the mean height of the retinal surface of the eye examined. The (x,y) planes are parallel to the retinal surface which is generally tilted with respect to the focal plane of the eye and no longer perpendicular to the optical axis. Table 3-4 summarizes the two coordinate systems.  51  Definitions Standard  •  origin of the z-axis is at the focal plane of the eye  •  z-axis is parallel to the optical axis  •  (x,y) planes are perpendicular to the optical axis and parallel to the focal plane of the eye, retinal curvature taken into account  Relative and • Tilted  origin of the z-axis is at the mean height of the retinal surface of the eye (relative coordinates)  •  (x,y) planes are parallel to the retinal surface (tilted coordinates) Table 3-4. HRT coordinate systems.  The relative and tilted coordinate system was the method by which all calculations were made. The advantage of this coordinate system is that it adjusts for natural variations in tilt of the retina. To define the relative and tilted coordinate system, the absolute mean height and the tilt of the retinal surface must be determined. This was done by defining a reference ring (Fig. 3-10) around the boundary of the topographic image of the ONH. The reference ring was image centered and had an outer diameter of 94% of the image size and an inner diameter of 91% of the image size, as shown in Figure 3-10.  Image Size 10° x 10°  Outer Ring Diameter 9.4° (2.7 mm)  Inner Ring Diameter 8.8° (2.5 mm)  Ring Width 0.08 mm  Figure 3-10. Reference ring position within a 10° image.  52  The diameter and location of the reference ring were chosen based upon the extents of the peripapillary retina in the image (Heidelberg, 1994). A plane was fitted to the retinal surface height around the reference ring. The mean height of the plane which best fit the retinal surface was equal to the mean height of the reference ring. This defined the origin of the z-axis for the relative and tilted coordinate system. The two tilt angles, relative to the optical axis, of the surface normal to the plane which best fit the reference ring were used to define the tilt of the coordinate system (Fig. 311).  contour line  contour line  Figure 3-11. The retinal surface plane as defined by the reference ring (T - temporal, N - nasal).  
In summary, for the relative and tilted coordinate system used, the height measurements are relative to the mean height of the retinal surface of the reference ring, and the (x,y) planes are tilted to be parallel to actual the retinal surface. The new z-axis is perpendicular to the mean retinal surface. The tilt corrects for misalignment of the laser beam. The relative coordinates keep the magnitude of the height measurements relative to the retinal surface.  53  3.1.2.7 Calculating O N H Image Parameters The topographic images that were a result of the above post-acquisition-processing consisted of a 256 x 256 array of 8 bit height values. Figure 3-12 is a three-dimensional mesh diagram of a typical ONH image. These diagrams qualitatively describe the ONH to a trained ophthalmologist, but represent an extremely large amount of information about the relative height of the points imaged on the ONH surface.. In order to simplify this information and to quantitatively describe the ONH, parameters were measured from these images using proprietary software supplied by Heidelberg Engineering. Heidelberg researchers conferred with glaucoma clinicians who proposed measures of the ONH which seemed a priori likely to be useful. These measurements are known as stereometric parameters.  Figure 3-12. Optic nerve head in a three dimensional mesh view. Ripples are blood vessels. The depression in the center is the central region (cup) of the ONH.  3.1.2.8 Reference Plane The calculation of the stereometric parameters required the definition of a reference plane, an ONH boundary or contour line, and a curved surface. The standard reference plane was defined, in version 1.10 of the Heidelberg software which was used, to be 320 um (HRef) posterior to the mean height of the retinal surface as defined earlier (Fig. 3-13). That is, the standard reference plane was  54  z = 0.320 mm in the relative and tilted coordinate system. The software also allowed the option of defining the reference plane to be any height, in absolute (the focal plane of the eye) or relative coordinates. The effect of changing this reference plane on the diagnostic value of the resulting stereometric parameters was examined. Airaksinen and Tuulonen (1993) and Heidelberg (1994) proposed that the height of the retinal surface at the papillomacular bundle be used as the zero z-coordinate value. The advantage of using this method is that there is a lower probability of changes occurring to the retinal surface height at the papillomacular bundle during the progression of glaucoma. The calculations using this origin were compared with those made using the standard reference plane in preliminary studies to determine which of the two reference planes would produce ONH parameters which could be best used to detect glaucoma.  cup  Figure 3-13. Cross section of the optic nerve head showing the reference plane, parallel to the retinal surface plane and separated by a distance Href (T - temporal, N - Nasal).  55  3.1.2.9 Contour Line or ONH Boundary Before calculating the ONH parameters, the region had to be outlined.  Using a hand held  mouse a clinician or an operator skilled in recognizing the boundary of the ONH drew an outline in the ONH image displayed on the computer screen. The reflectivity image was used in conjunction with the actual topographic image to outline the ONH (Fig. 3-14). 
The reflectivity image is similar to what the clinician sees when looking through an ophthalmoscope at a patient's ONH, and it was therefore easier for a clinician to use the reflectivity image, rather than the topographic image, to identify the ONH boundary. In the present study only one clinician (FSM) outlined the ONHs. He had no prior knowledge of the patients' diagnoses, which removed any variation in method due to technique, experience, or knowledge.

Figure 3-14. Drawing the contour line using the reflectivity image (right) as a guide; the topographic image is shown on the left.

3.1.2.10 Corrected Contour Line

The height variation diagram (Fig. 3-15) shows the measured height of the surface of the retina along the contour line. The scale displayed on the right side of the diagram is the relative height (relative and tilted coordinate system), denoted z, and the scale on the left side of the diagram is the height relative to the mean absolute height of the entire contour line, denoted dz. The horizontal axis is scaled in degrees and the height has been measured along the contour line from zero to three hundred and sixty degrees in one degree increments. Table 3-5 shows the angular positions of anatomical locations around the ONH.

Figure 3-15. ONH image (right eye) with the associated peripapillary height variation diagram along the contour line (relative height z in mm plotted against contour position from 0° to 360°). The height variation diagram is plotted moving clockwise (right eye) and counter clockwise (left eye) around the contour starting from the 9 or 3 o'clock position, respectively.

Angular Position    Description
0° (360°)           temporal
90° (-270°)         superior
180° (-180°)        nasal
270° (-90°)         inferior

Table 3-5. Angular position of anatomical ONH positions.

These positions are independent of which eye is examined as they are defined in relation to anatomical landmarks which are mirrored in each eye. The contour line height variation diagram typically shows a double maximum, due to the variation in the thickness of the retinal nerve fiber layer around the ONH. The peripapillary height variation has been observed to decrease during the development of glaucoma (Heidelberg, 1994). The contour line also varied in height due to local elevations caused by retinal blood vessels. These features were removed by a smoothing function within the proprietary software, the details of which were not disclosed. The remaining variations in the corrected contour line height over the retinal surface were due to variations in the thickness of the nerve fiber layer.

3.1.2.11 Curved Surface

The ONH was cut by a 'curved surface' defined as follows:
(1) the surface was bounded by the (corrected) contour line;
(2) the boundary's height was equal to the associated corrected contour line height;
(3) the height of the surface's center was the mean height of the corrected contour line;
(4) the surface was sectioned by straight lines from the center to the boundary points;
(5) the surface was not a plane, but followed the corrected contour line height variation; it resembles an unfolded filter paper.

The reference plane, the corrected contour line, and the curved surface were the basis of the parameter calculations describing the shape of the ONH.
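The curved surface defined above can be sketched numerically: each ray from the contour centre to a boundary point runs from the mean corrected contour height at the centre to the corrected contour height at that boundary angle. The code below assumes linear interpolation along each ray, which the text does not state explicitly, and works on a contour sampled at one-degree increments as in the height variation diagram; the synthetic double-hump contour is for illustration only.

```python
import numpy as np

def curved_surface_height(angle_deg: float, radius_frac: float,
                          contour_height: np.ndarray) -> float:
    """Height of the curved surface at a point inside the contour line.

    contour_height: corrected contour-line heights at 0..359 degrees (mm).
    radius_frac: distance from the contour centre, 0 (centre) to 1 (boundary).
    Linear interpolation along each ray is an assumption of this sketch."""
    centre = contour_height.mean()                     # height of the surface centre
    boundary = np.interp(angle_deg % 360.0,
                         np.arange(360.0), contour_height, period=360.0)
    return (1.0 - radius_frac) * centre + radius_frac * boundary

# Example with a synthetic double-maximum contour profile.
theta = np.radians(np.arange(360))
contour = 0.10 + 0.05 * np.sin(2 * theta)              # mm, double maximum
print(curved_surface_height(90.0, 0.5, contour))
```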
3.1.2.12 Stereometric Parameters or ONH Shape Parameters The types of stereometric measurements made are described in Table 3-6.  Figure 3-11 and  Figure 3-13 give pictorial descriptions of the ONH and its boundaries.  58  Parameter Area  Variable ag  Description of the stereometric parameter. The total global area within the contour line, i.e. the disc area.  Effective area  eag  The total global area of those parts within the contour line which are located below the curved surface, i.e. an estimate of the cup area.  Area below reference  abrg  The total global area of those parts within the contour line which are located below the current reference plane.  Height in contour  hie  The mean Z coordinate of all parts inside the contour line.  Mean height of contour  mhcg  The mean global Z coordinate of the corrected contour line.  Peak height of contour  pheg  The global Z coordinate of the maximum elevation of the corrected line.  Height variation contour  hvc  The difference in height between the most elevated and the most depressed point of the corrected contour line. When the contour line is placed around an ONH, this describes the height variation of the surface of the peripapillary retina which is due to the thickness variation of the nerve fiber layer. This parameter typically has a value of 300 hundred microns or less.  Volume below surface  vbsg  The total global volume of those parts within the contour line which are located below the curved surface, i.e. an estimate of the cup volume. The curved surface is used as the upper limit for the measurement.  Volume above surface  vasg  The total global volume of the parts within the contour line which are located above the curved surface, i.e. an estimate of the neuro-retinal rim volume. The curved surface is used as the lower limit for the measurement.  Volume below reference  vbrg  The total global volume of those parts within the contour line which are located below the reference plane. The reference plane is used as the upper limit for the measurement.  Volume above reference  varg  The total global volume of those parts within the contour which are located above the reference plane. The reference plane is used as the lower limit for the measurement.  Maximum depth  mdg  The global mean depth of the 5% of pixels with the highest depth values within the contour line, i.e. maximum cup depth. The depth is determined relative to the curved surface.  Third moment  tmg  The global third central moment (skewness) of the frequency distribution of depth values relative to the curved surface of those parts located inside the contour line. Only structures located below the curved surface contribute. This is a function of the overall shape of the ONH. Values are typically negative in normal eyes (flat cup with small depth values) and positive in glaucomatous eyes (high slopes at the cup boundary, deep cup, high depth values most frequent).  Mean radius  mr The mean radius of the contour line by interpolation. Table 3-6. The HRT's stereometric parameter definitions.  59  The third moment was the most complex calculation and was defined as the normal third central moment of the frequency distribution of the depth values in the cup of the ONH. 
As this measure turned out to be one of the most useful, its calculation will be described in more detail, as follows:
(1) the curved surface bounds the upper surface of the cup;
(2) within the (corrected) contour line, the local depth of the cup is measured for each pixel as the difference between the measured height and the curved surface;
(3) the frequency distribution of these local depth values describes the characteristic shape of the cup;
(4) the third moment is the skewness, a measure of the asymmetry of this distribution;
(5) the normalized central third moment is determined by the following equation:

    third moment = ∫ (t - D)³ f(t - D) dt / [ D³ ∫ f(t - D) dt ]          (3.2)

where t = local cup depth, D = one half the maximum cup depth, and f(t) = frequency distribution;
(6) the third central moment is a dimensionless number.

Small depth values are more frequent in a shallow cup and high depth values are more frequent in a cup with high slopes at the edges. The value of the third moment is negative if small depth values are more frequent than high depth values. Figure 3-16 shows a normal ONH with a shallow cup and a negative third moment value (referred to as cup shape measure in the figures). The third moment is positive if high depth values are more frequent than small depth values. Figure 3-17 shows an abnormal ONH with a deep cup and steep sides which has a positive third moment value.

An example output of these calculations is shown in Figure 3-18. It should be noted that the spreadsheet shows only some of the values for the entire ONH, namely those for the predefined segments which are described in Table 3-7 below (Table 3-5 defines the anatomical locations). The global values of the stereometric parameters were used in the analyses described in Sections 2.3.1 and 2.3.2. The global stereometric parameters and the diagnostic criteria summarized in Table 3-1 form the data used for each patient in the following analyses.

Figure 3-18. Stereometric parameters (as listed in Table 3-6) calculated globally and for the segments defined in Table 3-7, reported in the tilted, relative coordinate system (HRef = 0.412 mm for this example).

Segment          Section defined by the segment                    Range of segment
global           Whole interior of the (corrected) contour line    0° to 360°
temporal (tmp)   Temporal quadrant (tmp = 0°)                      -45° to +45°
tmp/superior     Temporal superior octant                          +45° to +90°
nsl/superior     Nasal superior octant                             +90° to +135°
nasal (nsl)      Nasal quadrant (nsl = 180°)                       +135° to +225°
nsl/inferior     Nasal inferior octant                             -135° to -90°
tmp/inferior     Temporal inferior octant                          -90° to -45°

Table 3-7. Optic nerve head segment definitions.
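A numerical sketch of the third moment (cup shape measure) follows, using the depth of each pixel below the curved surface and the reconstruction of Equation 3.2 given above, with D taken as one half the maximum cup depth and a discrete sum standing in for the integrals over the depth frequency distribution. The synthetic depth samples are illustrative only and are not drawn from patient images.

```python
import numpy as np

def third_moment(depths: np.ndarray) -> float:
    """Normalized third central moment of cup depths (sketch of Eq. 3.2).

    depths: local depth below the curved surface for pixels inside the contour
    line; only pixels below the surface (depth > 0) contribute. D is one half
    of the maximum cup depth and the result is dimensionless."""
    t = depths[depths > 0]
    D = 0.5 * t.max()
    return float(np.sum((t - D) ** 3) / (D ** 3 * t.size))

# A shallow, flat cup: small depths dominate, so the measure comes out negative.
shallow = np.random.default_rng(1).uniform(0.0, 0.3, 5000) ** 2
# A deep cup with steep walls: depths near the maximum dominate, so it is positive.
steep = 0.6 - np.random.default_rng(2).uniform(0.0, 0.3, 5000) ** 2
print(third_moment(shallow), third_moment(steep))
```

The sign behaviour matches the description above: negative for the flat, shallow cup typical of normal eyes and positive for the deep, steep-walled cup typical of glaucomatous eyes.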
3.2 Statistical Analysis

3.2.1 Data Screening

The data from the perimetric and tonometric classification were collected using methods that have been standard in glaucoma testing for a number of years (Shields, 1992); the Humphrey Field Analyzer was used for visual field testing as in most clinical practices (Silverstone, 1986). The data acquired using the HRT were screened for any errors made in collection and/or entry; the statistical program BMDP1D (Dixon, 1988) was used.

3.2.1.1 Univariate Descriptive Statistical Checks

The fourteen stereometric parameters were evaluated using univariate statistical indices to help determine the best multivariate analysis method. The ranges were examined to ensure data were entered correctly and as a preliminary step in checking for outliers. Histograms of the variables showed the frequency distribution of the data for each individual variable and gave an indication of the type of distribution present in the sample normal and glaucomatous populations. The arithmetic means, standard deviations, modes, and the sample variances were checked for any indication of errors in data or abnormalities for both the normal and abnormal groups.

3.2.1.2 Univariate and Multivariate Outliers

As with any disease measure, there can be expected to be a variation in the data values proportionate to the severity of the condition. Given a large sample size, the measurements from the normal group were expected to be normally distributed. The abnormal group was also expected to be normally distributed, but with a different mean from the normal group, thus separating the two groups. The distribution of each variable was examined for both groups to identify any values which lay outside the expected range: a value greater than three standard deviations was used as an indicator for possible outliers. The standard scores were calculated using the program BMDP1D (Dixon, 1988).

Sometimes a case may have an unusual pattern of scores which, considered singly, may be within the expected range for a standardized score. However, this pattern becomes obvious in multiple dimensions. A within-group test, the Mahalanobis distance, was used to identify such cases for each group. A conservative probability estimate for a case being an outlier was used: an α of p<0.001. The Mahalanobis distance, a measure of a case's distance from the mean of all cases, was evaluated as

D² = (x - m)' S⁻¹ (x - m)

where x is the case's vector of scores, m is the vector of variable means, and S is the within-group variance-covariance matrix, with the degrees of freedom, v, equal to the number of variables; the critical value is thus χ²(α, v). The Mahalanobis distance is equivalent to chi-squared because, under multivariate normality, D² has the same distribution as chi-square. D² was calculated using the program BMDPAM (Dixon, 1988).

Four multivariate outliers were found (p<0.001, 13 degrees of freedom, χ²>34.528) using the Mahalanobis distance (D²). The patient case numbers were 2362, 880, 1141, and 2417 with D² of 40.976, 51.639, 79.436, and 42.716. The only case removed from the data set was 1141 because, after examining the variables individually, it was also found to be a univariate outlier in three: hie, mhcg, and pheg, with standard scores of 6.12, 6.20, and 6.15. When examined further it was found that there was an error made when outlining the optic nerve head which explained the extreme values in these variables.
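The multivariate outlier screening just described was carried out with BMDPAM; purely as an illustration of the calculation, a sketch in Python is given below. The function name and the random data standing in for the thirteen stereometric parameters are hypothetical.

    import numpy as np
    from scipy.stats import chi2

    def mahalanobis_outliers(X, alpha=0.001):
        """Flag multivariate outliers within one group (sketch of Section 3.2.1.2).

        X is an (n_cases, n_variables) array for a single group.  A case is flagged
        when its squared Mahalanobis distance from the group mean exceeds the
        chi-squared critical value with n_variables degrees of freedom at `alpha`.
        """
        X = np.asarray(X, dtype=float)
        mean = X.mean(axis=0)
        cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
        diffs = X - mean
        d2 = np.einsum('ij,jk,ik->i', diffs, cov_inv, diffs)   # squared distances
        critical = chi2.ppf(1.0 - alpha, df=X.shape[1])
        return d2, d2 > critical

    # Example with random data standing in for the 13 stereometric parameters.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(45, 13))
    d2, flagged = mahalanobis_outliers(X)
    print(chi2.ppf(0.999, df=13))   # ~34.53, the critical value quoted in the text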
3.2.1.3 Testing for Normality

The underlying assumption of all the statistical procedures used in the analyses was that all of the variables and all linear combinations of the variables were normally distributed. The assumption of multivariate normality, when met, resulted in the residuals of analysis also being normally distributed and independent. It was impossible to test all possible linear combinations of the variables for normality, although the statistical tests were derived with this assumption.

Shapiro and Wilk's (1985) W statistic is a measure of the normality of a distribution. The W statistic is positive with a maximum value of one; values greater than 0.9 are expected with normal distributions, but the p-value indicates whether there is significant departure from normality.

Normality of variables was also tested using two indices of distribution: skewness and kurtosis. The skew is a measure of asymmetry of the distribution. The expected value is zero for a normal distribution and, when divided by its standard error, the skew can be roughly read as a standardized score for a normal distribution with a significance level. The standard error is determined as

Standard Error of Skew = (6/N)^(1/2)    (3.3)

where N is the number of cases. The kurtosis of the distribution is a measure of the relative frequency of cases in the tails compared to the center of the distribution: positive kurtosis has a higher than normal frequency of cases in the center and negative kurtosis has a higher than normal frequency of cases in the tails. The expected value of the kurtosis is three for a normal distribution; however, most statistical packages adjust this to zero. The standard error for kurtosis can be calculated similarly, and when the kurtosis is divided by its standard error the result is a normal score with an associated significance level. The standard error is determined as

Standard Error of Kurtosis = (24/N)^(1/2)    (3.4)

where N is the number of cases. When a variable was found to significantly differ from a normal distribution using one of the three statistics described above, the variables were transformed using one or a combination of the following relationships: square root, logarithmic, inverse, or reflect. The W statistic, skewness and kurtosis values were calculated using the program BMDP2D (Dixon, 1988).

3.2.1.4 Testing for Heteroscedasticity

When the assumption of multivariate normality was met, the relationship between variables was homoscedastic, i.e. the relative variability of scores for one variable was roughly the same for the other variables. However, if one of the variables was non-normal or there was an indirect relationship among the variables, the relationship became heteroscedastic. This would cause a perfectly good mathematical relationship between variables to be missed or not fully captured by the analysis. Any linear relationship between variables may still have been captured, but information that might have led to even higher predictability among variables would have been lost. The analysis would not be invalidated but merely weakened because of lost information. The variables in this study were examined using bivariate scatter plots, which made apparent any non-linear relationships among the variables. It was not necessary to correct for non-linear relationships using variable transformations.
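The univariate normality checks of Section 3.2.1.3 (the W statistic and the standardized skewness and kurtosis scores of Eqns. 3.3 and 3.4) can be sketched as follows. The thesis used BMDP2D, so this Python version with SciPy is illustrative only, and the simulated variable and function name are hypothetical.

    import numpy as np
    from scipy.stats import shapiro, skew, kurtosis

    def normality_report(x):
        """W statistic plus skewness and kurtosis divided by their standard errors."""
        x = np.asarray(x, dtype=float)
        n = x.size
        w, p = shapiro(x)
        se_skew = np.sqrt(6.0 / n)             # Eqn. 3.3
        se_kurt = np.sqrt(24.0 / n)            # Eqn. 3.4
        return {
            'W': w, 'p': p,
            'skew_z': skew(x) / se_skew,
            'kurt_z': kurtosis(x) / se_kurt,   # excess kurtosis: 0 for a normal
        }

    # A positively skewed variable (like vbsg) and its square-root transform.
    rng = np.random.default_rng(2)
    raw = rng.lognormal(mean=-1.5, sigma=0.6, size=45)
    print(normality_report(raw))
    print(normality_report(np.sqrt(raw)))      # the transform pulls the skew score down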
3.2.1.5 Significant Variable Relationships

The variables were analyzed for any statistically significant correlations with each other and with common risk factors. It has been shown in previous studies that patient risk factors such as age, gender, and the eye of the examination can be correlated with glaucoma (Kass, 1978; Jonas et al., 1989; Jonas and Naumann, 1989; Klein, 1992). These risk factors were examined for their effect upon patient groupings through group significance tests and a correlation matrix with the program BMDP3D (Dixon, 1988). Specifically, it has been shown that age has an effect upon the retinal nerve fiber layer (Jonas et al., 1989) and that the thickness of the retinal nerve fiber layer decreases with age (Balazsi, 1984; Jonas and Naumann, 1989). The effect of the patient age upon the stereometric parameters was removed through linear regression using the program BMDP1R (Dixon, 1988).

Multicollinearity is said to occur when correlations of 0.90 and above exist between variables, and singularity occurs when a variable is perfectly correlated with a combination of one or more of the other variables. These conditions lead to problems when matrix inversion is attempted; the solution is made unstable by multicollinearity and prohibited by singularity. The program BMDPAM (Dixon, 1988), used during preliminary data screening for missing values, computed squared multiple correlations (SMCs) for each of the variables with all of the other variables. Thus, the offending variables could be identified and removed. A variable was removed only if it had an SMC greater than 0.9999; none were removed using the SMC as a criterion.

3.2.2 Multivariate Analyses and the General Linear Model

It is hard to analyze and make inferences from multivariate data. When there are mixtures of discrete variables (the patient grouping) and continuous variables (the stereometric parameters), the number of possible ways to analyze the data increases. Multivariate statistical techniques were introduced to handle this situation and have developed into very useful tools. They minimize the number of steps that once were taken analyzing complicated data sets using univariate statistics, but they have increased the complexity of the results and are more difficult to interpret. Fortunately, univariate statistics and multivariate statistics are based upon the same general linear model, at least when dealing with parametric data.

The general linear model (GLM) (Tabachnick and Fidell, 1989) is based upon the assumptions that the variables have a linear relationship with each other and that they are additive. Thus, all multivariate models are composed of a series of weighted terms added together. These assumptions do not exclude variables with non-linear relationships from the models but will limit the amount of useful information gained from a linear analysis. The choice of strategy for dealing with overlapping variance between variables is dependent upon the type of solution sought. The data in the study were primarily used for their ability to predict patient group membership, and any variance explained by more than one variable needed to be taken into account.

3.2.2.1 Multivariate Analysis of Variance

A multivariate analysis of variance (MANOVA) was used to evaluate the difference between the centroids of the set of dependent variables (stereometric parameters) for an independent grouping variable (disease, gender, or eye). If a reliable difference was found, the independent variable was further examined to assess its influence upon patient grouping. Hotelling's T² test was used because the groupings were dichotomous.
Hotelling's T² test was performed using the program BMDP3D (Dixon, 1988).

3.2.2.2 Principal Components Analysis

A principal components analysis (PCA) was performed on the data set to determine if there were combinations of the variables which made clustering of data more obvious. The variables were condensed into a smaller number of uncorrelated variables, the principal components. This was a useful exercise because there were initially no specific hypotheses as to which stereometric parameters were of any value for the prediction of group membership. The variables in the study were examined for the presence of these factors using the BMDP4M program (Dixon, 1988).

3.2.2.3 Discriminant Function Analysis

The purpose of discriminant function analysis (DFA) is to predict group membership using a set of predictor variables. The group identification must be known for each case used in the analysis. The combination of predictor variables is called the classification function. This function can then be used to classify new cases whose group membership is unknown. The function is based upon the separation of groups due to the variance attributable to differences between groups and the variance attributable to differences within groups. The variables used in computing the linear classification functions can be chosen in a stepwise manner, i.e. adding one at a time. Specifying forward or backward selection of variables allows entry or removal of variables at each step from the classification function; the variable that adds the most separation of the groups is entered or the variable that adds the least is removed.

The analysis produces two types of classification functions: canonical discriminant functions and classification equations. The number of discriminant functions is determined by the lesser of the number of predictors or the number of groups less one. These functions discriminate between combinations of groupings. The number of classification equations is determined by the number of groups. These equations give a score for each case for each group, the highest of which wins the classification.

Discriminant functions are like regression equations; a discriminant function score was calculated from the sum of the input features, each weighted by a coefficient. The mean for each discriminant function, D_i, over all cases is zero, because the mean for each of the standardized input features used was also zero. Following this logic, the standard deviation for each D_i is unity. The group means of the discriminant functions were measured in standard deviation units from the overall discriminant function mean. The discriminant functions essentially reduce the spacing between groups from that of the predictors' dimension to a single dimension, the discriminant function's. Discriminant functions can be evaluated for significance, and usually the first contributes most to the separation of groups. There were only two groups and therefore only one discriminant function. A case was classified into a group dependent upon whether its D_i score was positive or negative.

A classification equation was developed for each group and then cases were evaluated for group assignment. The classification equations were the sums of the predictors, each weighted by a coefficient, plus a constant. The coefficients were found from the means of the predictors and the pooled within-group variance-covariance.
The group classification scores were calculated for each case and the case was then assigned to the group with the highest classification score. The classification functions were developed using a forward stepping discriminant function analysis with the program BMDP7M (Dixon, 1988).

3.2.2.3.1 The Classification Functions

The number of canonical discriminant functions is limited by the number of groups or the number of predictors: it is either the number of predictors used or the degrees of freedom for groups, whichever is smaller. The degrees of freedom for the groups is equal to the number of groups, k, minus one. There are two groups in this study and thus there will always be only one discriminant function. A positive value indicates membership in one group and a negative value indicates membership in the other. The following is the general form of the discriminant functions:

D_i = Σ (j = 1 to p) d_ij · z_j + K    (3.5)

where D_i = canonical discriminant function, d_ij = discriminant function coefficients, z_j = stereometric parameters, i = number of functions, p = number of predictors, and K is a constant which was added when the inputs had not been standardized.

To assign cases into groups, a classification equation is developed for each group. Thus, this study needed only two classification equations, one for the normals and one for the glaucomas. The group function with the highest value dictates the group membership of a case. The following is the general form of the classification equations:

C_j = c_j0 + Σ (i = 1 to p) c_ji · x_i    (3.6)

where C_j = classification function, c_j0 = constant, c_ji = the classification function coefficient, x_i = the stereometric parameter value for each case classified, i = variable number, j = group number, and p = total number of variables. The coefficient vectors C_j were determined from the following equation:

C_j = W⁻¹ · M_j    (3.7)

where W⁻¹ = the inverse of the within-group variance-covariance matrix and M_j = the column matrix of the predictors' average values, i.e. the means of each variable for each group. The constant, c_j0, was determined as

c_j0 = -(1/2) · C_j · M_j    (3.8)

and was the constant for group j. (A small numerical sketch of Eqns. 3.6 to 3.8 is given after Table 3-8 below.)

3.2.2.3.2 Validity of Results

Stepwise Addition of Variables

The measure used to determine a variable's entry into the classification function was the F statistic. This statistic is the same as the ratio used in analysis of variance for testing the equality of group means. The statistic is calculated using the Mahalanobis distance statistic and is proportional to a distance measure. In the discriminant analysis the F-to-enter for each predictor is calculated at step zero. It is the univariate F for testing the reliability of the mean difference between the group singled out and the others. At each step the variable with the highest F-to-enter is entered into the classification function. Once entered, an F-to-remove is calculated which reflects the reduction in prediction that would result if a predictor were removed from the equation. It is the unique contribution the predictor makes, in this particular set of predictors, to the separation of groups. Thus, the variance explained with its removal will reflect this proportion of variance in the data set. Table 3-8 lists the criteria used to determine a variable's entry into or removal from the classification function.

Step Number    Degrees of Freedom    F critical (p<0.01)
1              87                    6.977
2              86                    4.898
3              85                    4.055
4              84                    3.582

Table 3-8. The values for F-to-enter and F-to-remove during the forward stepwise DFA. The significance of all F values was p<0.01.
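As promised above, the following sketch works through Eqns. 3.6 to 3.8 on synthetic data. The actual analysis was performed with BMDP7M; the Python functions, the random data, and the three-predictor toy example are illustrative only and omit the stepwise selection and the unequal-group-size adjustment discussed next.

    import numpy as np

    def classification_functions(X, y):
        """Linear classification equations (sketch of Eqns. 3.6-3.8).

        X is (n_cases, n_predictors), y holds group labels {0, 1}.  For each group
        the coefficient vector is C_j = W^-1 M_j (Eqn. 3.7) and the constant is
        c_j0 = -0.5 * C_j . M_j (Eqn. 3.8), with W the pooled within-group
        variance-covariance matrix and M_j the vector of group means.
        """
        X, y = np.asarray(X, float), np.asarray(y)
        groups = np.unique(y)
        pooled = sum((np.sum(y == g) - 1) * np.cov(X[y == g], rowvar=False)
                     for g in groups) / (len(y) - len(groups))
        W_inv = np.linalg.inv(pooled)
        coefs, consts = {}, {}
        for g in groups:
            M = X[y == g].mean(axis=0)
            coefs[g] = W_inv @ M
            consts[g] = -0.5 * coefs[g] @ M
        return coefs, consts

    def classify(x, coefs, consts):
        """Assign a case to the group with the highest classification score (Eqn. 3.6)."""
        scores = {g: consts[g] + coefs[g] @ x for g in coefs}
        return max(scores, key=scores.get), scores

    # Toy example with three predictors standing in for tmgcorr, varg and hie.
    rng = np.random.default_rng(3)
    X = np.vstack([rng.normal(0.0, 1.0, (45, 3)), rng.normal(1.0, 1.0, (44, 3))])
    y = np.array([0] * 45 + [1] * 44)
    coefs, consts = classification_functions(X, y)
    print(classify(X[0], coefs, consts))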
Unequal Group Size

When the groups are of unequal size, the classification equations are adjusted for the difference in group size. Eqn. 3.9 differs from Eqn. 3.6 by an extra term which adjusts for the group size, n_j, using the natural logarithm of the ratio of the group size to the total number of cases, N:

C_j = c_j0 + Σ (i = 1 to p) c_ji · x_i + ln(n_j / N)    (3.9)

McNemar's Test for Reliability

When variables are entered into the classification functions, although they may provide a higher classification rate, they may not reliably improve the classification model's ability to predict group membership. McNemar's repeated-measures test for change indicates if there was reliable improvement in classification with the addition of variables to the classification function. Equation 3.10 is the calculation for the χ² statistic with one degree of freedom:

χ² = (|B - C| - 1)² / (B + C)    (3.10)

where B is the number of incorrect cases in the earlier step which were correct in the later step when the variables were added, and C is the number of correct cases in the earlier step which were incorrect in the later step after the variables were added. The critical value for χ²(1), at a significance level of p<0.05, is 3.84.

Jackknife Validation

The classification functions are derived using all of the cases; however, when this approach is used, bias enters the classification because the coefficients are derived in part from each case. In a jackknifed classification, the coefficients used to assign a case to a group are computed with that case left out. Thus, each case has its own set of coefficients derived from all of the other cases. This gives a more realistic estimate of the ability of the predictors to separate groups. In stepwise discriminant function analysis not all of the bias is removed, but it is greatly reduced; the entry of variables into the classification function is still based upon the entire data set. After the final step in the discriminant analysis, the Mahalanobis distance from each group mean to each case is calculated, along with the posterior probabilities for each case. A jackknife procedure (Lachenbruch and Mickey, 1968) computes the Mahalanobis distance as the distance from the eliminated case to the groups formed by the remaining cases. The result represents a validation of the classification model's ability to predict group membership for new cases using the associated variables.
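Equation 3.10 is easy to check by hand; the sketch below does so in Python with hypothetical counts of cases fixed (B) and broken (C) by the added predictors.

    def mcnemar_change(before_correct, after_correct):
        """McNemar's test for a reliable change in classification (sketch of Eqn. 3.10).

        The inputs are equal-length sequences of booleans saying whether each case
        was classified correctly before and after extra predictors were added.
        The statistic has one degree of freedom; the critical value at p < 0.05 is 3.84.
        """
        B = sum((not b) and a for b, a in zip(before_correct, after_correct))
        C = sum(b and (not a) for b, a in zip(before_correct, after_correct))
        chi_sq = (abs(B - C) - 1) ** 2 / (B + C) if (B + C) else 0.0
        return chi_sq, chi_sq > 3.84

    # Hypothetical example: adding predictors fixes 10 cases and breaks 2.
    before = [False] * 12 + [True] * 77
    after  = [True] * 10 + [False] * 2 + [True] * 77
    print(mcnemar_change(before, after))   # (~4.08, True): reliable improvement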
3.3 Artificial Neural Network Analysis

Artificial neural network research has been motivated from its inception by the recognition that the brain computes in a different way than conventional computers. The brain is a highly complex, parallel computer which has the capability of performing many tasks much faster than present-day serial computers. The human vision system is a good example: it takes 100-200 ms to recognize a face, a task which might take a serial computer days (Churchland and Sejnowski, 1986). The brain builds its own rules from birth, which define how the neurons that make up that great structure communicate. Synapses, connections between the cells, form and are changed through experience.

An artificial neural network is an analogy to the brain. It is a parallel distributed processor that has a natural capacity for storing experiential knowledge. This is similar to the brain because the knowledge is gained through a learning process and the interneuron connection strengths, known as synaptic weights, are used to store this knowledge. The neurons in the brain fire once certain stimulus thresholds are reached and the synapses are thought to be strengthened through repeated firing (Hebb, 1949). The synapses can be either excitatory or inhibitory in nature, but not both. Artificial neural networks are organized in a similar manner. The nodes each have an activation function which defines how a node fires when it receives an input. The nodes are connected through synapses of variable strength, usually called weights, which are adjusted to alter the nature of the internode connections. The structure of the network, the number of nodes and the number of layers, can be varied according to the task the network is asked to solve.

3.3.1 Artificial Neural Network Architecture

The manner in which the neurons of an artificial neural network (ANN) are structured is closely linked to the learning algorithm used to train the network. The simplest network contains an input layer of source nodes that project onto an output layer of computational nodes. This order defines the network as feedforward, i.e. there are no projections from the higher level of the output to the lower level of the input. Multilayered feedforward networks (Fig. 3-19) are distinguished by the presence of one or more hidden layers, whose computation nodes are called hidden units. The function of the hidden neurons is to intervene between the input and the network output.

Figure 3-19. Multilayered artificial neural network with an input layer of source nodes, a layer of hidden neurons, and a layer of output neurons (Haykin, 1994).

Each hidden layer added enables the network to extract higher-order statistics, which is valuable when the dimension of the input layer is large (Churchland and Sejnowski, 1992). The source nodes in the input layer apply the respective parts of the activation pattern to the first hidden layer. The outputs of the computation units in the first hidden layer are used as inputs to the second hidden layer. This continues through all the layers of the network until the output layer is reached. The output signal of the output layer constitutes the overall response of the network to the activation pattern.

3.3.2 Back Propagation Learning Algorithm

Multilayered networks have been applied successfully to solve some difficult and diverse problems by training them in a supervised manner with the error back-propagation algorithm. Rumelhart, Hinton, and Williams (1986) developed the back-propagation learning algorithm, which has emerged as the most popular learning algorithm for training multilayer networks. This algorithm was discovered independently in two other places (Parker, 1985; LeCun, 1985) but the original work was described by Werbos (1974). The algorithm is based upon the error correction learning rule.

The work presented by Rumelhart et al. (1986) solved the problem of how to automatically adjust hidden unit weights for a feedforward network. A squashing function was applied to the input of each hidden unit which mapped the hidden unit's input to a smoothly varying step function. This smoothly varying function meant that small changes to the weights of the synaptic inputs allowed the hidden units to abstract high-order properties. The net effect of this was that the input space could be divided by a non-linear decision border. Cowan (1967) had introduced the "sigmoid" firing characteristic and the smooth firing condition for a neuron that was based on the logistic function.
This non-linear function was applied as the squashing or activation function of the computation nodes. Figure 3-20 shows the logistic non-linearity and Eqn. 3.11 describes that function mathematically,

φ(v) = 1 / (1 + e^(-a·v))    (3.11)

where φ(v) is the activation function, v is the net internal activity level of the node, and a is the slope parameter. The inputs to the node are weighted and summed and a bias is added (or a threshold subtracted), producing the net internal activity of the node, which is then passed through the logistic function. The output is close to zero for large negative inputs, roughly linear near zero input, and saturates at one for large positive values.

Figure 3-20. The logistic (sigmoid) non-linear activation function (Haykin, 1994).

Basically, the error back-propagation process consists of a forward and a backward pass through the different layers of the network. During the forward pass an activation pattern (input vector) is applied to the sensory nodes of the network and its effect is propagated through the network; the weights are fixed during this time. The actual response of the network is calculated using the activation function of each output node j using Eqn. 3.11, where a is 1 and the actual output of node j, y_j, is φ(v) at node j. Then, during the backward pass, the synaptic weights are adjusted according to the error correction rule. The error correction rule uses an error signal which is generated from a comparison between the desired outputs and the actual outputs. This error signal is propagated back through the network against the synaptic connections. The error signal is used to adjust the actual response of the network closer to the desired response.

The error signal, e_j, at the output of neuron j for pattern n is defined by

e_j(n) = d_j(n) - y_j(n)    (3.12)

where d_j is the desired output and y_j is the actual output. The instantaneous squared error for neuron j is (1/2)·e_j²(n), and the instantaneous value E(n) of the sum of squared errors is obtained by summing (1/2)·e_j²(n) over all neurons in the output layer, set J, thus

E(n) = (1/2) Σ (j ∈ J) e_j²(n)    (3.13)

The average squared error E_avg is obtained by summing E(n) over all n patterns, then normalizing with respect to the pattern set size N,

E_avg = (1/N) Σ (n = 1 to N) E(n)    (3.14)

and represents the cost function as the measure of training set learning performance. The average squared error is a function of all the free parameters (synaptic weights and thresholds) of the network. The learning process involves adjusting the network free parameters to minimize the average squared error, E_avg. The adjustments to the weights are made in accordance with the respective errors computed for each pattern presented to the network. The average of these individual weight changes over the entire training set is an estimate of the true change that would occur by modifying the weights based on minimizing the cost function, E_avg.
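A compact numerical illustration of Eqns. 3.11 through 3.14 is given below; the array shapes, the slope parameter of 1, and the three toy patterns are assumptions made for the example and do not come from the thesis data.

    import numpy as np

    def logistic(v, a=1.0):
        """Sigmoid activation of Eqn. 3.11."""
        return 1.0 / (1.0 + np.exp(-a * v))

    def average_squared_error(desired, actual):
        """Cost function of Eqns. 3.12-3.14.

        `desired` and `actual` are (N_patterns, N_outputs) arrays.  The error for
        each output neuron is e_j(n) = d_j(n) - y_j(n); E(n) sums 0.5*e_j^2 over
        the output layer, and E_avg averages E(n) over the N training patterns.
        """
        e = np.asarray(desired, float) - np.asarray(actual, float)
        E_n = 0.5 * np.sum(e ** 2, axis=1)
        return E_n.mean()

    # One output unit, three training patterns.
    desired = np.array([[0.9], [0.1], [0.9]])
    net_internal_activity = np.array([[2.0], [-1.0], [0.5]])
    actual = logistic(net_internal_activity)
    print(average_squared_error(desired, actual))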
3.3.3 Weight Changes Using the Conjugate Gradient Method

The goal of training the network is to find the set of weights that minimizes the cost function or network error. An error surface can be defined in multidimensional space: one axis represents the magnitude of the error and the other axes are the magnitudes of the weights. When moving around on the error surface, one will find that the network error changes for each set of weights, mapping out a smooth surface. Thus, there must exist a global minimum for this error, and with a non-linear network there will possibly exist local minima.

The method by which the weights are changed during training is determined by the search for the minima on the error surface. A method which has been shown (Kramer and Sangiovanni-Vincentelli, 1989; Johansson et al., 1990) to require fewer epochs (the number of training iterations) than the standard back-propagation algorithm is back-propagation learning based upon the conjugate-gradient method for supervised training of multilayered networks.

The conjugate-gradient method uses the gradient vector (first-order partial derivatives) and the Hessian matrix (second-order partial derivatives) of the cost function E_avg(w). The gradient vector is calculated for a set of weights on the error surface and is used to adjust the weights. The two choices to be made in adjusting the weights are the direction in which the change is to be made and the magnitude of the change, or step size. The conjugate gradient method chooses the direction which is conjugate to the previous directions. This means that the minimization along the new direction does not undo minimization already achieved along previous directions. A line search is used with the conjugate gradient method because directions change radically, and consequently so do appropriate step sizes. The weight vector w(n) at iteration n is updated in accordance with the rule

w(n+1) = w(n) + η(n)·p(n)    (3.15)

where η(n) is the learning-rate parameter and p(n) is the direction vector. The learning-rate parameter η(n) is defined by the following equation

η(n) = arg min over η of { E_avg(w(n) + η·p(n)) }    (3.16)

which searches for the particular value of η for which the cost function E_avg(w(n) + η·p(n)) is minimized. The computation of the learning-rate parameter using Eqn. 3.16 is a line search. The accuracy of the line search was shown by Johansson et al. (1990) to have a profound influence on the performance of the conjugate-gradient method. The conjugate-gradient method uses the Hessian matrix in its derivation, but the algorithm is formulated in a way that the estimation and storage of the Hessian matrix are avoided.

3.3.4 Training Considerations

3.3.4.1 Synaptic Weight Considerations

The first step in back-propagation learning is the selection of the synaptic weights. Prior information can be used to choose an initial set of values, for example using a standard set of weights loaded in at the start of each training session. However, most commonly, as in this case, the free parameters of the network are set to random numbers uniformly distributed within a small range. The wrong choice of initial weights can lead to a phenomenon known as premature saturation, which has been investigated by Lee (1991, 1993). If the magnitude of the net internal activity level of an output neuron is large, and if the activation function has limiting values of {0, 1} (e.g. a sigmoid function), then the corresponding slope of the activation function will be small and the output will be close to {0, 1}. The neuron is said to be saturated. If the output is {1, 0} when the desired output is {0, 1}, respectively, the neuron is said to be incorrectly saturated. When this occurs the adjustment to the synaptic weights will be very small and, although the network error will be large, the network will have a hard time leaving that area on the error surface.
Thus, the sum of squared errors will remain relatively constant for some time, during which the network may be assumed to have reached a local minimum and training ceased.

Lee (1991) stated that incorrect saturation rarely occurs when the neurons of the network operate in their linear regions, i.e. nearer to the middle of the activation function (Fig. 3-20). Rumelhart and McClelland (1986) suggested that the desired outputs be set to {0.1, 0.9} compared with the usual {0, 1}. The reason was that the system cannot reach its extreme values without infinitely large weights causing network instability. The two results combine very well because the shift of the desired outputs will cause the output neurons to operate closer to their linear range. This effect was investigated using differing desired outputs.

The normalization and standardization of the input values, keeping the magnitudes of the net internal activity levels of all neurons within similar ranges, was examined to control for premature saturation. If a neuron has a considerably larger net internal activity level than another neuron, the first neuron becomes saturated while the other neuron remains unsaturated, converging rapidly upon its minimum. The instantaneous sum of squared errors may remain constant long enough for the network training stopping criteria to be met. However, the first neuron did not have sufficient time to escape from the saddle point on the error surface. Thus, a neuron may remain saturated, giving improper solutions to the minimization of the network error and resulting in premature saturation. The transformation of the input values, x_i, using normalization (Eqn. 3.17) and standardization (Eqn. 3.18) was compared with using the raw input values while training the network:

x_i(normalized) = (x_i - u_i) / u_i    (3.17)

x_i(standardized) = (x_i - u_i) / s_i    (3.18)

where u_i is the mean of the values for variable i and s_i is the standard deviation for variable i.

3.3.4.2 Example Presentation

In the process of training the ANN, the presentation of examples can be made in a pattern mode or a batch mode. In pattern mode (on-line operation) the weights are updated after the presentation of each example. In batch mode a group of examples is presented before the weights are updated. The group can be from two examples to the entire training set and is defined as an epoch. The weights are adjusted, on the basis of the average network error, after the complete presentation of the training examples constituting a training epoch. The batch process was used because it gives a better estimate of the gradient vector. However, pattern mode is less likely to be trapped in a local minimum because it is more like a stochastic search.

3.3.4.3 Iteration and Stopping Criteria

The ANN was presented with epochs of training examples repeatedly until the network free parameters stabilized and the average sum squared error computed over the training set was at a minimum or had reached an acceptably small value. When the cost function reached a stationary value, where the absolute rate of change in average squared error for each epoch was sufficiently small (from 0.1% to 1%), the network stopped training.
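The batch presentation of Section 3.3.4.2 and the stopping rule of Section 3.3.4.3 can be sketched together as below. To keep the example short it trains a single sigmoid unit with plain batch gradient descent rather than the conjugate-gradient method actually used in this work; the tolerance, learning rate, and toy data are arbitrary, and the desired outputs of {0.1, 0.9} follow the Rumelhart and McClelland suggestion mentioned above.

    import numpy as np

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    def train(X, d, eta=0.5, tol=0.01, max_epochs=10_000):
        """Batch training that stops when the relative change in E_avg is below `tol`."""
        rng = np.random.default_rng(0)
        w = rng.uniform(-0.1, 0.1, X.shape[1])      # small random initial weights
        e_prev = None
        for epoch in range(max_epochs):
            y = sigmoid(X @ w)
            err = d - y                              # Eqn. 3.12 for every pattern
            e_avg = 0.5 * np.mean(err ** 2)          # Eqns. 3.13-3.14, one output unit
            if e_prev is not None and abs(e_prev - e_avg) / e_prev < tol:
                return w, e_avg, epoch               # stationary cost: stop training
            e_prev = e_avg
            grad = -(err * y * (1.0 - y)) @ X / len(d)   # batch gradient of E_avg
            w -= eta * grad
        return w, e_avg, max_epochs

    # Toy data: a bias column plus two standardized inputs, desired outputs {0.1, 0.9}.
    rng = np.random.default_rng(1)
    X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
    d = np.where(X[:, 1] + X[:, 2] > 0, 0.9, 0.1)
    print(train(X, d)[1:])                           # final E_avg and epoch count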
3.3.5 ANN Generalization

An artificial neural network is said to generalize when input patterns never used in training or creating the network have their output correctly computed by the network. The training of the network may be viewed as a curve fitting problem where the network learns to fit a function to the data. Figure 3-21 illustrates the fitting of two curves to the same data. Assume that the left curve is the actual function that we are attempting to approximate with the non-linear input-output mapping produced by the network. The right curve has fitted a function which has memorized the data points or patterns presented, but an arbitrary point from the left curve will most likely not be approximated well by the right curve. The left curve with properly fitted data has good generalization, whereas the right curve with overfitted data has poor generalization. Generalization occurs when a new pattern fits the function approximated by the network.

Figure 3-21. Curve fitting: (left) properly fitted, (right) overfitted (Haykin, 1994).

Generalization is influenced by the size of the training set, the configuration of the network, and the physical complexity of the problem at hand. The complexity of the problem is something which cannot be controlled and in this study is not even known. However, the size of the training set and the network configuration can usually be influenced. The size of the data set in this study is fixed by the number of patients available. However, a constraint on the number of patterns necessary for generalization has been proposed by Baum and Haussler (1989) for a network with one hidden layer. The number of patterns, N, necessary in the training set is bounded by the number of weights, W, the number of hidden nodes, M, and the error permitted, e, on a test of the approximated function. The training set error is less than e/2, and thus N is bounded by

N ≥ (32W/e) · ln(32M/e)    (3.19)

An approximation for this function is simply N ≥ W/e, indicating that the number of patterns necessary is 1/e times the number of weights.

3.3.6 ANN Configuration for Generalization

As mentioned in the last section, network generalization is affected by the configuration of the ANN. It has been shown that the decision regions formed by single and multiple layer ANNs depend upon the number of hidden nodes and layers (Lippman, 1987). Thus, by varying the number of hidden layers and hidden nodes, the classification regions can be manipulated (Lippman, 1987). The resulting decision surface topological characteristics will determine the ability of the ANN to generalize for future classification of data. "The hope is that the network so designed will generalize to predict correctly on future examples of the same problem" (Baum and Haussler, 1989).

The Vapnik and Chervonenkis (VC) dimension is used to determine the maximum number of dichotomies that can be induced on the input features. Here N is the number of input patterns; d is the number of input features; k is the number of hidden units in the first hidden layer unless otherwise specified; e is the dimension of the hypercube output space, the number of output units; W is the total number of weights whose values define the mapping of the ANN's state to the classification decision hypersurface, the input space; and e is the total fraction of misclassified training examples. The upper bound for this number is N ≥ O((W/e)·log2(W/e)), for which the network will correctly classify 1 - e/2 of the examples, with confidence that there will be future correct classification of 1 - e of the examples. The lower bound, below which this choice will fail to correctly classify 1 - e of future examples, is Ω(W/e) (Baum and Haussler, 1989).
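To get a feel for Eqn. 3.19, the arithmetic below evaluates the bound (and the simpler N ≥ W/e approximation) for networks with thirteen inputs, a single output unit, and a few choices of hidden layer size. The weight count, which includes one bias per computational node, the natural logarithm (as written in Eqn. 3.19 here), and the 10% error figure are assumptions of this sketch. The resulting pattern counts are far larger than the 89 available cases, which is part of the motivation for the leave-two-out validation described in Section 3.3.7.

    import math

    def baum_haussler_bound(n_inputs, n_hidden, n_outputs, error=0.1):
        """Rough pattern-count bounds for one hidden layer (sketch of Eqn. 3.19)."""
        W = (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs  # weights incl. biases
        M = n_hidden
        full = (32.0 * W / error) * math.log(32.0 * M / error)      # Eqn. 3.19
        return W, full, W / error                                   # N >= W/e approximation

    # Thirteen stereometric inputs, a handful of hidden units, one output node.
    for k in (2, 3, 5):
        W, full, approx = baum_haussler_bound(13, k, 1)
        print(k, W, round(approx), round(full))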
The formation of different decision regions (Lippman, 1987) can be achieved using some simple guidelines with an error back-propagation network which uses sigmoidal activation functions. A simple convex region can be formed with a single hidden layer in the network; the number of hidden nodes in the first layer defines the number of sides to that region. A complex hypercube can be formed with more than one hidden layer. The number of hidden nodes in the first hidden layer should be twice the number of inputs. The second layer actually performs a logical AND on the outputs of the first hidden layer. The minimum number of hidden nodes in the second layer should equal the number of disconnected regions in the input space. There must be at least three hidden nodes in the first hidden layer in order to form convex regions in the second hidden layer. A third layer is not necessary in a feedforward classification ANN due to the complexity of the regions formed.

The number of hidden nodes necessary to divide the decision surface into arbitrary dichotomies is (N-1) (Huang and Huang, 1991; Nilsson, 1965; Sartori and Antsaklis, 1991). The hyperplanes formed by the hidden nodes with sigmoid computing elements in d dimensions cannot be constructed to cut more than d specific segments. Thus, N/d hyperplanes are necessary to ensure separation of an arbitrary dichotomy (Baum, 1988).

Figure 3-22. Regions formed by two distinct data groups, (a) linearly separable regions, (b) non-linearly separable regions (Haykin, 1994).

Each hidden node in the first hidden layer forms a hyperplane in the d-dimensional input space separating simple dichotomies (logistic node functions). Upper hidden nodes combine regions defined in the input space by the lower hidden layer's nodes; the second hidden layer performs a logical AND on the first layer nodes, and the output layer would perform a logical OR on the second hidden layer nodes. A third hidden layer is not necessary for classification of data due to its high degree of complexity and thus non-generalizing form. Convex regions are formed by more than one hidden node in the first layer. There must always be at least three hidden nodes in the first layer when using nodes in upper layers, otherwise complex regions will not be closed. A hypercube's complexity is dependent upon the number of first layer hidden nodes and is formed from at least one second hidden layer node (Pedroni and Yariv, 1993).

Generalization for an ANN is dependent upon the input space being fully defined for all group regions; thus, as N increases, so does the chance of valid generalization. For a small sample size, N, the ANN cannot be expected to generalize, but as N grows and the hypothesis is not supported, an agreement on the samples is probable and the ANN should be expected to generalize (Abu-Mostafa, 1989).

3.3.7 Cross Validation

Cross validation is commonly used in statistics (Stone, 1974; Goldbaum et al., 1994) and removes bias from a model when training data have to be used to test the model. Generally, the data set is partitioned into a modeling set and an evaluation set; the modeling set is further partitioned into a training set and a validation set. The architecture of the ANN defines the classification model. The modeling set is used to determine the optimum configuration of the ANN. The evaluation set is used to test the generalization of the resulting classification model. The ANN configuration being tested is taught with the training set and tested with the validation set.
The A N N configuration being tested is taught with the training set and tested with the validation set. This is 86  repeated for different A N N configurations. After the optimum classification model is determined the A N N is trained using the complete modeling set.  The classification model's generalization  ability is then evaluated using the test set. The data set was small, 89 cases, which with thirteen or fourteen input dimensions provides a very sparsely defined input space.  Therefore, a variation to the cross validation method was  instituted. A masked training session was used to develop the classification model. Two examples, one normal and one abnormal, were removed from the modeling set and then the rest were used to train the network. The two examples were then used to validate the networks performance and the results noted. This two step process was repeated until the entire modeling set had been used to validate the A N N configuration. This reduced any bias because the network parameters were not derived in part from any cases in the testing set.  The process was then repeated for all  configurations of the network which were examined and the results recorded.  There was no  evaluation set available to test the classification model's true generalization ability. However, the validation rate was expected to be a good approximation for the generalization of the classification model. 3.3.8 A N N Classification Function Figure 3-23 is a node diagram of the A N N structure from which the model equations were derived. Although a second hidden layer was included in the figure, the classification model was not expected to include a second layer because of the theoretical considerations presented in Section 3.3.7.  87  Input layer  First hidden layer  Second hidden layer  Output layer  Figure 3-23. ANN Structure (Haykin, 1994).  The activation of each hidden and output node was governed by the following sigmoid function, which was determined from Eqn. 3.11,  /<z)  = TT^  '  (3  where z = the sum of all the weighted inputs and thresholds.  20)  Thus, for node j in the first  computational layer, the summed input of that neuron is  i=0  where p is the number of inputs to node j,  is the connection weight from the i input of node j, -th  and Xj is the input feature value to node j (Xj = 1.0, all bias units are set to an activation of unity). t  0  The activation, f-, of the node j in the first computational layer is 1 fj ~  1  +  <-(i-O)-w,o-X;r»0-i-  '  (3.22)  where / . is the activation of node k in the second computational layer t  (3.23)  88  The output up to node k in the second layer is  fk = ~  7 (-(1.0).w -  >  —  l  .„. _ . .„. ..  t0  o  x  l+e  i  r  )  (3.24) W  H  -  )  '  Hence, for a single hidden layered network with one output unit, f  k  is the overall response of the  A N N to a single input pattern. 3.3.9 Simulation of the A N N Version 3.1 of the Xerion Neural Network Simulator, written by Drew van Camp (1993) at the University of Toronto, was used to simulate the artificial neural network.  The error back  propagation model was used for the network simulation and the simulator's script language was used to program the training, validation, and testing instructions. 3.4 ROC Analysis The inherent discrimination capacity of a classification system depends upon the extent to which the distributions are separated or overlapped. The classification function maps its inputs to a single dimension to separate the groups distributions.  
3.4 ROC Analysis

The inherent discrimination capacity of a classification system depends upon the extent to which the distributions are separated or overlapped. The classification function maps its inputs to a single dimension to separate the groups' distributions. The observer must adopt a confidence threshold for the mapping to select the most appropriate diagnosis. The confidence threshold or decision criterion will depend upon both the observer's estimates of prior probabilities and value judgments regarding the consequences of various kinds of correct and incorrect decisions. An ROC curve is traced out as the confidence threshold is varied; there is no need to know the shape of the distributions. A numerical threshold of abnormality must be selected explicitly when diagnostic tests yield single numerical results.

Both the DFA and the ANN produced patient classifications using their associated models. The DFA produced a canonical discriminant function which produced a numerical value for patient scores. The value was expected to vary between ±3 because the canonical function is roughly an approximation to a standardized score of a normal distribution. The threshold is normally located at zero, but when generating the {sensitivity, specificity} pairs the value was varied from the lowest score to the highest. The artificial neural network produced an overall network output for each set of patient scores. The value range {0, 1} was as expected because the network output is based upon the logistic function. The threshold is normally 0.5, but was varied over {0, 1} to generate {sensitivity, specificity} pairs. For both analyses the step between thresholds was chosen to be sufficiently small to define a curve. The areas under each curve were calculated to produce a measure of the overall discriminant ability of each classification model.

4. Results

4.1 Statistical Screening

4.1.1 Descriptive Statistics

After removal of one outlier (see Section 3.2.1.2), the data set consisted of forty-five normal subjects (mean age of 51.6 ± 15.0 years) and forty-four glaucomatous subjects (mean age of 61.0 ± 12.2 years). Of the eighty-nine subjects, forty-six were male and forty-three female. There were forty-five left-eye images and forty-four right-eye images. Table B-1 summarizes the means, standard deviations, standard errors of the means, and coefficients of variation for the data set and the disease groupings. There was no significant difference for the gender and eye groupings. Figure 4-1 shows the distributions for each variable in the normal and diseased groups.
Figure 4-1. The histograms for age (a), and the fourteen stereometric parameters before any transformations: (b) ag, (c) eag, (d) abrg, (e) hie, (f) mhcg, (g) pheg, (h) hvc, (i) vbsg, (j) vasg, (k) vbrg, (l) varg, (m) mdg, (n) tmg, (o) mr. Both groups are displayed in the histograms: black bars = normal (healthy) and white bars = abnormal (glaucomatous).

4.1.2 Bivariate Correlation Analysis

The normal group was examined for any significant correlations between variables. The Pearson product-moment correlation coefficient r was used as the measure of association. The correlation measured the size and direction of any relationship between any two variables. Table B-2 is the complete correlation matrix for the variables before any transformations of the data. Usually a value of r>0.70 will weaken an analysis (Tabachnick and Fidell, 1989) and both variables should not then be included. The calculations were expected to be related because the study was focused on finding the best discriminant variables, and this involved trying variations of the calculations. With correlations of r>0.90 multicollinearity becomes a concern; multicollinearity causes instability when inverting matrices during calculations. Singularity occurs when the correlation is equal to 1. For these reasons variables with a correlation r>0.99 were removed. The only variable removed from the analysis was the mean radius, mr, because it was correlated with
Although the skew of abrg did not exceed 2 it was found visually to be positively skewed; the other three variables were also positively skewed. The expected value for kurtosis is zero for a normal distribution and also can be interpreted as a standard score when divided by the standard error; deviations from a normal distribution are indicated by a value greater than 2, longer tails, and by a value less than -2, shorter tails. 95  Following this analysis, four variables were transformed to correct for their deviations from  normal distributions: abrg, vbsg, vasg, and vbrg. The square root transformation was used for all except vasg which was transformed using  logio. The values for all variables can be found in Table  B-3. Figure 4-2 shows the associated distributions after transformation.  Squareroot of the Global Area Below Reference  14  12  • Noi mal.s  • Noi nuils  12  Squareroot of the Global Volume Below Surface  10  • Abnornuls  • Abnormal.s  10 3  66  h _  CN  ° o~  t  1i  I/-)  t " - >/->  rd d  n d o  vo  Figure 4-2 (a), (b)  OO  0\  d d  Squareroot of the Global Volume Below Reference  Logarithm of the Global Volume Above Surface • Normals  7 6  •  \bnormals  f  1  Mil IT)  CN oo  CN  U">  $2  oo  d  ^  I d ^ o 1  9 Figure 4-2 (c), (d)  Figure 4-2. The histograms for the four variables which were transformed to produce normality in the normal healthy subject group: (a) sqrtabrg, (b) sqrtvbsg, (c) logvasg, (d) sqrtvbrg.  96  4.1.4 Homoscedasticity The assumption of homoscedasticity is that the variability in scores for one variable is approximately the same at all values of another variable. To meet the assumption of multivariate normality, the relationship between variables must be homoscedastic: the assumption that each variable and all linear combinations of the variables are normally distributed. Heteroscedasticity is caused by either non-normality of one of the variables or there is a non-linear relationship between variables; it is not fatal to an analysis. The analysis is weakened by missing any relationships not captured by a linear analysis, but not invalidated.  In the present case, the variables were all  normally distributed after transformations. However, the data were not examined for non-linear relationships between all of the combinations of variables. 4.1.5 Regression Analysis There was a significant difference (p=0.0017) between the ages of the healthy and the glaucomatous subjects. Therefore, a regression analysis was done using age as the independent variable and all other parameters individually as the dependent variable. The only variable found to be significantly (p<0.01) linearly related with age was the third central moment, tmg (r=0.3917, rrag=0.00198*<3ge-0.33082, P2 ,/=0.01). Table B-4 shows the significance values of the entire ta  variable list. The age effect was removed with the following equation: tmgcorr = tmg + 0.00198*(50-age)  (4.1)  The age of 50 was arbitrarily chosen for the rotation point of the tmg vs. age slope line. Figure 4-3 shows the distribution for the third moment after correction for age effects.  97  Global Third Central Moment Corrected for Age 18 16 14 12  -  6 4 2 0  -  • Normals • Abnormals  XUJL  m r-  oo  d o  o  >/->  o  Figure 4-3. The distribution of group scores for the third central moment after correction for age effects.  
4.1.6 Squared Multiple Correlations The squared multiple correlation (SMC) of a dependent variable with other independent variables serves as a guide to how highly related it is to linear combinations of the other variables. A value of 1 represents a perfect relationship and the usual tolerance levels for the S M C are between 0.99 and 0.9999 before a variable is removed from the analysis.  To prevent  multicollinearity from a variable being too highly related to other variables, a tolerance of 0.9999 was used, again because the nature of the calculations and their expected correlation. Table B-5 shows the complete list of SMCs for the variables after all of the transformations were made. No variables were removed because they all were below the tolerance level. 4.1.7 Significant Group Differences The data groupings (disease, eye, gender) were examined for significant differences.  The  pooled t test (Student's t test) was used in conjunction with the separate variance t test (Welch) to examine if there were significant differences between all of the variables individually. The pooled test assumes that population variances for both groups are equal and its denominator contains a 98  pooled sample standard deviation. The Levene F for variability was used to determine if there was any significant group difference in variability (p<0.01) which then invalidated the use of the pooled r test. The separate variance test tends to be conservative if the population variances are equal because it does not make the additional assumption. Both the eye and gender groupings had no significant differences for any of the transformed variables, including age, between the left and right eye, and the males and females; also there were no significant group differences in variability. Table B-6 summarizes the results for these tests for the disease grouping. There was no significant group difference in variability between any of the variables for the healthy and glaucomatous subjects.  However, there were significant group differences (p<0.01) for all of the transformed  variables, including age, except for global area (ag, ^=0.7624), height variation of the contour (hvc, /?=0.1776), the logarithm of the global volume above surface (logvasg, p=0.017), and the global mean depth (mdg, p=0.0129). The pooled test and the separate variance test agreed on all of the variables. The equality of the multivariate group means, without age, were tested using the multivariate Hotelling's T and Mahalanobis' D . 2  2  Both statistics express the same information about the  multivariate distance between the groups, and were transformed into an F statistic to generate pvalues.  All the groupings were again used for completeness so as not to miss any information.  Hotelling's T was used to determine each of the independent variable's effect upon the dependent 2  variables in separate analyses. Hotelling's T is a special case of multivariate analysis of variance 2  where the independent variable has only two groups and there are several dependent variables. The result reveals differences between the centroids of the two groups on the combined averages of the dependent variables.  This prevents unnecessary inflation of Type I error due to multiple  significance tests with correlated dependent variables. Table B-7 summarizes the results which  99  found again, only the disease grouping to be significant: 7^=107.4029, D =4.7735, F=7.1222, 2  degrees of freedom =13 and 75, p<0.00005. 
4.2 Multivariate Statistical Models 4.2.1 Principal Components Analysis The large number of variables and the unknown structure of the data set raised the question of how the variables clustered and if there was the same amount of information contained within a smaller number of independent variables.  A set of thirteen uncorrelated components was  determined using principal components analysis. The axes for the transformed data after principal components analysis are orthogonal, and the components are uncorrelated, ranked in order of decreasing variance. The reduced set of variables was further analyzed with a discriminant function analysis to assess their value in predicting the onset of glaucoma. Table B-8a shows the first four principal components with their unrotated factor loadings and the variance explained by each for the normal healthy group; Table B-8b shows the same for the entire data set, healthy and glaucomatous groups. The first four factors explained 85.04% of the cumulative variance in data space with the components generated using the normal group and 86.12% with the components generated using both groups. Therefore, it was not expected that the discriminant function analysis would do better than 85% using the minimized number of components.  Because eight principal  components were needed to explain greater than 99% of the variance in data space for both sets, there is unlikely to be any appreciable gain using principal components analysis. Orthogonal rotation was not used to minimize the number of variables that each factor reflected because such a transformation shifts the variance explained by the first few factors to subsequent factors, increasing the number of variables necessary to explain most of the variance in the data. The principal components of the glaucomatous group were not used in a discriminant function 100  analysis because the variance explained by the first component was very low (variance = 4.8285, 37.14% of variance in the data). This agrees with the assumption that deviation from a normal healthy optic nerve head due to degeneration will not occur in one direction in the data space, i.e. along a principal component. 4.2.2 Discriminant Function Analysis 4.2.2.1 Screening Discriminant function analysis was used to decide whether to use data determined using the standard reference plane or data determined using the adjusted reference plane. As mentioned in Section 3.1.2.8, both approaches used data before any transformations.  Also, the discriminant  analysis was used to screen predictive models based upon the principal components of the adjusted reference plane data and on the data after transformations of Section 4.1.  The goal of principal  components analysis is to extract maximum variance from the data set with each component which is an independent coherent subset of the original variables. Two principal component sets were formed, one from the normal group and one from both groups, as mentioned in the preceding section. Table B-9 summarizes the results for the discriminant screening. The data calculated using the adjusted reference plane was better at producing a predictive model for disease than the data calculated using the standard reference plane; 86.5% correct for the adjusted reference plane data and 84.6% correct for the standard reference plane data.  The  Heidelberg company used this finding (reported by Mikelberg et al. 
(1995)) in their updated software (version 1.11 and up); from here on, the data mentioned were calculated using the adjusted reference plane, as were the data discussed in Section 4.1. The principal components classifications were 77.5% and 80.9% for the combined set and the normal set, respectively. These results were expected because the variance explained by the first four components of each set was not greater than 85%. Only the first two components were used in the respective discriminant functions, and they were not likely to contain all of the information hidden within the data necessary to separate the two groups. Also, the principal components did not minimize the number of variables used, as they were all dependent upon the full set of stereometric parameters. Thus, further analysis was completed using the data set calculated from the adjusted reference plane with the transformations discussed in Section 4.1, referred to as the Final data set.

4.2.2.2 Classification Model
A hierarchical discriminant function was obtained, using a forward stepwise discriminant function analysis, to assess prediction of case membership in the two disease groups using the thirteen stereometric parameters mentioned earlier. The forward stepping criteria for the F statistic given in Chapter 3, Table 3-8, were used to decide whether an independent variable, a predictor, was added to determine group membership. The predictors were the global area, global effective area, square root of the global area below reference, height in the contour, global mean height of the contour, global peak height of the contour, height variation of the contour, square root of the global volume below surface, logarithm of the global volume above surface, square root of the global volume below reference, global volume above reference, global third central moment corrected for age effects, and global maximum depth. There was statistically significant separation between the two groups from three of the variables, F(3,85)=24.901, p<0.001; see Table B-10 for the reliability of group separation. Using these three variables in the classification equations, 88.8% of the cases were predicted correctly (specificity=86.7%, sensitivity=90.9%). This result was validated, based upon the jackknife procedure which reduced bias in the classification, with 87.6% correctly classified (specificity=86.7%, sensitivity=88.6%). The three predictors were the global third central moment corrected for age effects (tmgcorr), the global volume above reference (varg), and the mean height in the contour (hic). In the first step tmgcorr was entered into the equation with an F value of 41.397, then varg was entered in the second step with an F value of 16.338, and in the third step hic was entered with an F value of 5.936; see Table B-11 for the complete listing. No variables were removed, and the variables all entered under the F criteria in Table 3-8. Table B-12 reports the F-to-remove values for each predictor and also reports that the remaining predictors all had F-to-enter values of less than 3.4, which is below the criterion of F(4,84)=3.58, p=0.01, for the fourth step.
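The discriminant analysis and its jackknife validation were run in BMDP7M; as a rough modern sketch of the same idea (leave-one-out validation of a linear discriminant classifier), the following assumes NumPy arrays X (cases by the three predictors tmgcorr, varg, hic) and y (0 = normal, 1 = glaucomatous) prepared elsewhere, and uses scikit-learn rather than the thesis software.

```python
# Minimal sketch of jackknife (leave-one-out) validation of a linear discriminant
# classifier, in the spirit of the procedure described above.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut

def jackknife_classification_rate(X, y):
    correct = 0
    for train_idx, test_idx in LeaveOneOut().split(X):
        # Refit the classifier with the held-out case removed; priors could instead be
        # set to the group proportions, as in the correction discussed below.
        lda = LinearDiscriminantAnalysis(priors=[0.5, 0.5])
        lda.fit(X[train_idx], y[train_idx])
        correct += int(lda.predict(X[test_idx])[0] == y[test_idx][0])
    return correct / len(y)

# rate = jackknife_classification_rate(X, y)   # the thesis jackknife rate was 87.6%
```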
McNemar's test for change indicated reliable improvement in classification, p<0.05 with χ²(1)>3.84, with the addition of both varg and hic as predictors: χ²(1)=5.88, p<0.05. However, when the entry of the variables was examined separately, McNemar's test failed to indicate reliable improvement in classification: entry of varg at step 2, χ²(1)=3.5, and entry of hic at step 3, χ²(1)=1.33. Table B-13 is a summary of these calculations.

As mentioned in Section 3.2.2.3.2, the group classification equations were corrected for disproportionate group size with the addition of the term ln(nj/N), where nj is the group size and N is the total sample size. This was calculated automatically by BMDP7M, but for completeness the two terms added to the classification equations were ln(45/89) = -0.6819 for the normals group equation and ln(44/89) = -0.7044 for the abnormals group equation. The resulting coefficients for the two equations are listed in Table B-14. The canonical discriminant function for the disease grouping is listed below; its coefficients are also listed in Table B-14.

D = -2.75*hic + 3.69*varg - 6.19*tmgcorr - 1.59    (4.2)

There is only one discriminant function because there are only two groups. The mean value of D was 0.917 for the healthy subjects and -0.937 for the glaucomatous subjects. A value less than zero classifies the case as glaucomatous and a value greater than zero classifies the case as healthy. Figure 4-4 plots the group distributions of D. The scores are equivalent to standard deviations, and thus both group means lie nearly one standard deviation from zero.

[Figure 4-4: histograms of the canonical discriminant function scores (Function Range) for the healthy and glaucomatous groups.]

Figure 4-4. The resulting classifications from the DFA canonical discriminant function.

The three predictors were not highly correlated in the normal group (hic:varg r=-0.2911, hic:tmgcorr r=0.4111, varg:tmgcorr r=-0.1067) or the abnormal group (hic:varg r=-0.2358, hic:tmgcorr r=0.2832, varg:tmgcorr r=-0.3361); see Tables B-2b and A.4-2c, respectively. This was expected because the calculations of these three variables are quite different: hic is a height, varg is a volume, and tmgcorr is a moment. The volume above reference is equivalent to the neuroretinal rim tissue, which decreases with ongoing loss of retinal nerve fibers. The mean height in the contour is the maximum depth in the optic cup, which is again expected to be greater in glaucomatous images due to loss of nerve fibers. The third central moment is a measure of the frequency distribution of depth values in the cup, as mentioned in Section 3.1.2.12: negative with shallow depth values (an ONH with very little or no cupping) and positive with high depth values (an ONH with cupping due to loss of nerve fibers). Thus, all three predictors are likely to be sensitive to the loss of retinal nerve fibers.
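Applying Eqn. 4.2 to a new case is a one-line calculation; the sketch below uses the coefficients reported in Table B-14, with hypothetical input values only for illustration (the inputs must be in the same transformed, age-corrected units used to fit the function).

```python
# Applying the canonical discriminant function of Eqn. 4.2 to one case.
def discriminant_score(hic, varg, tmgcorr):
    return -2.75 * hic + 3.69 * varg - 6.19 * tmgcorr - 1.59

def classify(hic, varg, tmgcorr):
    d = discriminant_score(hic, varg, tmgcorr)
    return "healthy" if d > 0 else "glaucomatous"   # D > 0 -> healthy, D < 0 -> glaucomatous

# Hypothetical case values (not from the data set):
print(classify(hic=0.05, varg=0.30, tmgcorr=-0.20))
```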
4.3 Artificial Neural Network Model
The structure of the artificial neural network was determined by optimizing the model's performance in predicting case group membership. The average training classification rate of the model and the cross-validated classification rate were compared to determine the generalization performance of the network; the average training classification rate was always higher than the cross validation rate. The following are the steps taken to determine the optimum training method and the optimum configuration of the network. All graphs report the average training classification rate, Training, together with the cross validation classification rate, CV:

(i) Hidden Layers and Nodes
The number of hidden layers was varied from zero to two. Without a hidden layer the network's capacity to memorize the training patterns was diminished, because there were just fourteen weighted connections (13 inputs and one bias to the output). With one hidden layer there were fifteen weighted connections for every hidden node, plus one more (13 inputs and one bias to each hidden node, one connection from each hidden node to the output, and one bias to the output). With two hidden layers, the count was the number of connections for the corresponding one-hidden-layer network plus one, times the number of second-layer hidden nodes, all plus one (13 inputs and one bias to each 1st-layer hidden node; each 2nd-layer hidden node had inputs from the bias and from each 1st-layer hidden node, as well as one connection to the output; and one bias to the output). Table 4-1 summarizes the number of weighted connections for each network configuration reported in Figure 4-5. Increasing the number of weighted connections increases the capacity of the network to memorize patterns; however, each successive hidden layer passes the results of the previous layer through a sigmoidal operator. These operations produce successively more complex decision regions in input data space.

Nodes in 1st Hidden Layer   Nodes in 2nd Hidden Layer   Total Number of Connections
0                           0                           14
1                           0                           16
2                           0                           31
3                           0                           46
4                           0                           61
2                           1                           33
2                           2                           65
3                           1                           48
3                           2                           95

Table 4-1. The number of weighted connections according to the number of hidden nodes and hidden layers (with 13 inputs, one output, and bias).

In Chapter 3 it was mentioned that three hidden layers are very rarely necessary, and with this data set it was found that two hidden layers were also not necessary, largely because the limited number of training patterns and the high number of dimensions in the input data space produced very sparsely defined group regions. This is graphically represented in Figure 4-5 (all values are from Table B-15), which shows the difference between the average training classification rate (Training) and the cross validation classification rate (CV). This separation is due to the network memorizing the training patterns and not being able to extrapolate generalizations about the data set and apply them to new patterns being tested. As the training rate drops below 100%, the specific network configuration's memorization capacity for the number of training patterns has been reached. The CV rate was found to increase as the network was forced to learn more patterns with less memory capacity, and hence to generalize as to the features which best separate the two groups. As the CV rate approached the training rate, the network was found to reach its theoretical limit for memorizing patterns without losing its generalization capacity for new patterns. The seed number for all configurations in Figure 4-5 was 1, with randomized starting weights. For the first nine configurations, an 80% training criterion with target values of 0.1 and 0.9 was used with the adjusted reference plane data after statistical transformations (13 inputs); for the last four configurations, an 85% training criterion with target values of 0 and 1 was used with the adjusted reference plane data before statistical transformations (14 inputs).
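A short sketch of the connection-counting rule behind Table 4-1 is given below; this is simply the rule described in the text (not code from the thesis), parameterized by the number of hidden nodes in each layer.

```python
# Reproduce the connection counts of Table 4-1 (13 inputs, one output, one bias unit).
def connections(h1, h2, n_inputs=13):
    if h1 == 0:                               # no hidden layer: inputs + bias -> output
        return n_inputs + 1
    one_layer = (n_inputs + 1) * h1 + h1 + 1  # inputs+bias to each hidden node,
                                              # hidden -> output, bias -> output
    if h2 == 0:
        return one_layer
    return (one_layer + 1) * h2 + 1           # two-hidden-layer rule used in Table 4-1

for h1, h2 in [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (2, 1), (2, 2), (3, 1), (3, 2)]:
    print(h1, h2, connections(h1, h2))        # 14, 16, 31, 46, 61, 33, 65, 48, 95
```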
[Figure 4-5: Training and CV classification rates as the number of hidden units and layers is varied; the x-axis lists the configurations 0,0; 1,0; 2,0; 3,0; 4,0; 2,1; 2,2; 3,1; 3,2; 1,0; 2,0; 3,0; 4,0 (number of hidden units in the 1st and 2nd layers).]

Figure 4-5. The separation between valid generalization and memorization is illustrated.

As the theory in Chapter 3 indicated, there exists one hyperplane for every 1st-layer hidden node, and each 2nd-layer hidden node performs a logical AND on all of the 1st-layer hidden nodes. Figure 4-5 shows that the network minimized the separation between the training rate and the CV rate with configurations of 0 hidden nodes, 2 hidden nodes in the 1st layer only, and 2 hidden nodes in the 1st layer with 1 hidden node in the 2nd. The highest CV rate was obtained with 2 hidden nodes in the 1st layer only: a training rate of 88.4% and a cross validation rate of 86.7%. This configuration not only had the highest CV rate but also the least separation. The capacities of the two-hidden-layer networks and of the one-hidden-layer networks with more than 2 hidden nodes were too high for the size of the data set; all of these configurations had a training rate greater than 95%, when the minimum training rate was 80%, and a CV rate of less than 79%. The two-hidden-layer network with 2 nodes in layer one and 1 in layer two behaved very similarly to the single-hidden-layer networks with 2 hidden nodes; these three examples were the best of those tested. The network with no hidden nodes performed very well, the overall transfer function being a simple sigmoid of the inputs; however, improvement in generalization with this configuration was not expected because the training rate was already below 90%, showing very little additional capacity for generalization.

(ii) Starting Weights
When the network is trained, the weighted connections between nodes must have initial values. Three methods were examined for setting these weights: randomizing them at the start using a random number generator, loading the same set at every start (i.e. always starting in the same position in weight space), and not resetting the values after initially training the network (i.e. starting at the location in weight space which was last calculated). Only the first method produced results which were useful for training the network. Both the second and third methods would regularly fall into a local minimum in the weight space and, with the learning methods used, the weights would remain at this location, thus stalling the learning process. The method of randomizing the initial weights was adopted; a random number generator was used which was reset each time with a seed number given by the user.

(iii) Seed Number
The seed is used to seed the random number generator when randomizing the weights, as mentioned in the preceding paragraph. Table B-16 and Figure 4-6 summarize the effects that changing the seed number had on generalization. There is no evidence from the data to indicate that different seed numbers would produce better or worse generalization results. It was observed that, under one initialization of the Xerion simulator, the same seed number would produce exactly the same results under the same training conditions and network configuration. This became useful when one wanted to reproduce results. However, once the simulator was quit and restarted, different results would be produced from the same training conditions and network configuration; this is evident in the last three runs, with identical seeds but different initializations of the simulator.

[Figure 4-6: Training and CV classification rates for runs differing only in the seed number given to the random number generator.]

Figure 4-6. The effects of using a different number to seed the random number generator. The same network configuration was used.
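As a minimal sketch (NumPy rather than the Xerion simulator) of the configuration selected in items (i)-(iii), the network below has 13 inputs, 2 sigmoidal hidden nodes, one sigmoidal output, bias connections, and seeded random initialization of the starting weights; the uniform initialization range is an assumption for illustration.

```python
# Minimal 13-2-1 feedforward network with bias units and seeded random weights.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SmallMLP:
    def __init__(self, n_inputs=13, n_hidden=2, seed=1):
        rng = np.random.default_rng(seed)                 # seed plays the role described above
        self.w_hidden = rng.uniform(-0.5, 0.5, (n_hidden, n_inputs + 1))  # +1 for bias
        self.w_output = rng.uniform(-0.5, 0.5, n_hidden + 1)

    def forward(self, x):
        x = np.append(x, 1.0)                             # bias input
        h = sigmoid(self.w_hidden @ x)
        h = np.append(h, 1.0)                             # bias to output
        return sigmoid(self.w_output @ h)                 # output toward 0 = normal, toward 1 = glaucoma

net = SmallMLP(seed=1)
print(net.forward(np.zeros(13)))                          # untrained output for a dummy pattern
```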
(iv) Data Format
The data set was presented to the network in three different formats: the raw adjusted reference plane data before any statistical transformations, the same data after normalization, and the same data after standardization, as explained in Section 3.3.4.1. As can be seen in Figure 4-7, when the network was trained identically, the raw adjusted reference plane data performed better than either the data after normalization or the data after standardization. Table B-17 summarizes the data presented in Figure 4-7; each training run was performed under identical conditions, with only the fourteen stereometric parameters in different forms, and with seed numbers of 2 and 1, respectively, used to seed the random number generator for each data set format. Standardization and normalization of the adjusted reference plane data were not pursued further because there was no gain in the separation of the groups.

[Figure 4-7: Training and CV classification rates for the three data set formats: raw adjusted reference plane data (Adj. Ref.), normalized (Norm'd), and standardized (Std'd).]

Figure 4-7. The format of the data set to which the artificial neural network was presented: raw adjusted plane data (Adj. Ref.), normalized Adj. Ref. (Norm'd), and standardized Adj. Ref. (Std'd).

(v) Minimum Number of Training Iterations
A minimum was set on the number of training iterations made before the network could test on the validation cases. This was done to prevent the network from falling into a local minimum prematurely. If the network did not reach this minimum number of iterations before reaching the learning criteria, then the network was forced to start again from another point in weight space. Table B-18 and Figure 4-8 summarize the results for four pairs of runs, each with the same initial conditions and during the same initialization of the Xerion simulator, except for the minimum number of training iterations to be reached before testing could take place. There was found to be no gain in setting a minimum on the number of training iterations to be completed before testing the validation cases.

[Figure 4-8: Training and CV classification rates for paired runs differing only in the minimum number of training iterations (x-axis: seed number, number of training iterations).]

Figure 4-8. The effect of controlling the minimum number of iterations before testing validation cases had upon generalization.

(vi) Bias Unit Connections
The effect the bias unit had upon the classification equation's ability to generalize was examined by (a) removing the bias unit connection to the hidden node, (b) removing the bias unit connections to both the hidden and output nodes, (c) leaving the bias connections intact, and (d) using no hidden node, thus just a bias unit to output connection. The results are summarized in Table B-19 and Figure 4-9. There was no appreciable gain in the CV rate, but there was some variance in the training rate. The removal of bias unit connections did not affect the network's ability to generalize, but did affect the network's capacity to memorize; the extra connections increased the number of patterns memorized.

[Figure 4-9: Training and CV classification rates for the bias unit configurations 1h, b-o; 1h; 1h, b-h,o; and 0h, b-o.]

Figure 4-9. Effect of removing bias unit connections; the number of hidden units in the 1st layer and the actual connections in place are indicated (b - bias, h - hidden, o - output).
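For reference, the two rescalings compared in item (iv) above can be sketched as follows, assuming the usual definitions (min-max normalization and z-score standardization); the exact definitions used in the thesis are those of Section 3.3.4.1.

```python
# Column-wise rescalings of the matrix of stereometric parameters (assumed definitions).
import numpy as np

def normalize(X):
    """Min-max normalization of each column to the range [0, 1]."""
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return (X - x_min) / (x_max - x_min)

def standardize(X):
    """Z-score standardization: zero mean and unit variance per column."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```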
(vii) Target Values
Training the artificial neural network requires that the correct classification be given for each pattern so that the network can learn. However, the output of the network is a sigmoidal function which varies over {0, 1}. Usually, one group would be given the value 0 and the other group the value 1. In a discussion by Rumelhart and McLelland (1986), the desired outputs are given values inside the transfer function's limits. The purpose is to allow the actual values to vary on either side of the target while training, thus giving a more normal distribution of values around the target value after training. Table B-20 and Figure 4-10 summarize the results of varying the target values for the network. The four cases which were better at minimizing the separation between the CV rate and the training rate all used target values of 0.1 for the normals and 0.9 for the abnormals. They also had the highest CV rates, which emphasized their ability to generalize.

[Figure 4-10: Training and CV classification rates for target value pairs (normal, abnormal) of 0.25/1.0, 0.25/0.75, 0.1/0.9, and 0/1.]

Figure 4-10. Effect of varying the target values for the groups. The seed was 1 and the number of hidden nodes was 2 in the 1st layer; the training criterion was either 80% or 85%.

(viii) Minimum Training Classification Rate
One method has already been mentioned which was examined for its ability to improve the generalization of the network: imposing a minimum number of iterations before proceeding to validation. This method did not have any advantages and was prone to producing slightly worse results. At each step during cross validation the network calculated the number of correct cases in the training set and compared this with a percentage. This was done to make sure that the network had actually memorized the training patterns to some degree. The average of these training rates was calculated for each run and compared with the CV rate to assess the degree of separation between the two, and was used as a guide to the network's ability to generalize. The effect of varying the minimum percentage correct in the training set before validation was also examined; the results are summarized in Table B-21 and Figure 4-11. Unless stated otherwise, the maximum training classification rate allowed was 100%. The graph does not illustrate any specific trend in the range of 80% to 90%. However, when the network was allowed to test with a training rate of less than 80%, the generalization capability dropped off quickly. As can be seen from the cases requiring a 90% minimum, there was a large separation between the CV rate and the training rate. The last two cases, 80(51) and 85(51), had identical results, which showed that by starting with a seed of 51 the network training rate never dropped below 85% and had an average of 95.7%. This illustrates that sometimes the network solution in data space can provide good memorization of patterns but poor generalization capabilities. There was no limitation in allowing the training rate to be 80% rather than 85%; however, when 90% was approached the network tended to over-learn, not allowing feature extraction. A minimum training rate of 80% was found to be sufficient to make the network learn adequately but not over-learn. Other ranges, e.g. 85-90%, did not perform better and caused long training sessions; invalid training sessions of greater than 90% were common. Thus, nothing was gained by setting an upper limit.

[Figure 4-11: Training and CV classification rates as the minimum training classification rate required before validation is varied (x-axis: % minimum-maximum, with the seed number in parentheses).]

Figure 4-11. The effect of varying the minimum percent correct in the training set before validation. Each run was trained with 2 hidden units.
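The cross-validation gate described in item (viii) can be sketched as follows; this is an illustration of the procedure, not the Xerion scripts. It assumes that each fold holds out one normal and one glaucomatous case, that train_network and predict stand in for the conjugate-gradient training run and the forward pass, and that weights are re-randomized (new seed) when the 80% training-rate criterion is not met.

```python
# Cross validation with a minimum training classification rate before each held-out
# pair is tested (one normal and one glaucomatous case per fold).
import numpy as np

def cross_validate(X, y, train_network, predict, min_train_rate=0.80, max_restarts=10):
    normals = np.where(y == 0)[0]
    abnormals = np.where(y == 1)[0]
    correct, tested, train_rates = 0, 0, []
    # zip() pairs cases one-to-one; with 45 normals and 44 glaucomas the leftover
    # case would need separate handling, omitted in this sketch.
    for i, j in zip(normals, abnormals):
        train_idx = np.setdiff1d(np.arange(len(y)), [i, j])
        for restart in range(max_restarts):              # re-randomize weights if the gate fails
            net = train_network(X[train_idx], y[train_idx], seed=restart + 1)
            train_rate = np.mean(predict(net, X[train_idx]) == y[train_idx])
            if train_rate >= min_train_rate:
                break
        train_rates.append(train_rate)
        correct += np.sum(predict(net, X[[i, j]]) == y[[i, j]])
        tested += 2
    return correct / tested, np.mean(train_rates)         # CV rate, average training rate
```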
4.4 Receiver Operating Characteristic Curves
The receiver operating characteristic (ROC) curves in Figure 4-12 represent the two best models produced by each analysis method: DFA and ANN. The area under the DFA curve was 0.916, which is greater than the area under the ANN curve by 4% and indicates a higher probability of the DFA predicting patient group membership. The DFA produced jackknife results which classified 87.6% of the patients correctly (86.7% specificity and 88.6% sensitivity) using the Mahalanobis distance. The ANN, using cross validation, classified 87.8% of the patients correctly (86.7% specificity and 88.9% sensitivity), which was equivalent to the DFA. The canonical discriminant function used to produce the ROC curve below for the DFA actually classified 88.8% of the patients correctly (86.7% specificity and 90.9% sensitivity). However, an equivalent equation could not be produced for the ANN, so the cross validation results were used instead, which is possibly the reason the area under that curve is smaller. Most statistical papers state the latter results for the DFA, but for completeness the DFA and ANN were compared using their validation results. The actual average training classification rate for the ANN was 88.9%, which was equivalent to the DFA.

[Figure 4-12: ROC curves for the ANN and DFA classifiers; false positive fraction (1 - specificity) on the x-axis, true positive fraction on the y-axis. Area under the ROC curve: ANN = 0.879, DFA = 0.916.]

Figure 4-12. ROC curves for discriminant function and artificial neural network analysis.

5. Discussion
Optic nerve head images quantitatively describe the retinal nerve fiber bundles which represent the completeness of the visual field. The models in this study successfully utilized measurements from these images to detect the presence of glaucomatous damage. Although the discriminant function analysis was a linear model, it performed as well as the non-linear model defined by the artificial neural network. Both of these models validated their results using methods which removed bias when testing cases. Each model had a decision threshold which was varied, and the ROC curve of each was plotted. The flexibility of the decision threshold is important because with any diagnostic system there are tradeoffs: in this study, the tradeoffs are the cost (dependent on prevalence) of treating false positives and the cost to individuals with the disease who are missed. As can be seen, many factors affect the building of a classification model, and the method by which the model's results are validated represents its ability to extract features from the data which can then be applied to the general population.
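The ROC curves referred to above are obtained by sweeping the decision threshold over the model scores; a minimal sketch of that construction and of the trapezoidal area estimate is given below. It assumes arrays of scores that increase with the likelihood of disease (for the DFA, the negative of the canonical score D would serve) and labels with 1 = glaucomatous, 0 = normal.

```python
# Trace an ROC curve by sweeping the decision threshold and estimate its area.
import numpy as np

def roc_curve(scores, labels):
    order = np.argsort(-scores)                        # decreasing "disease" score
    labels = np.asarray(labels)[order]
    tpr = np.cumsum(labels) / labels.sum()             # sensitivity as threshold is lowered
    fpr = np.cumsum(1 - labels) / (1 - labels).sum()   # 1 - specificity
    return np.concatenate(([0.0], fpr)), np.concatenate(([0.0], tpr))

def auc(fpr, tpr):
    return np.trapz(tpr, fpr)                          # trapezoidal area under the curve
```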
5.1 HRT Measurements
The measurements obtained with the Heidelberg Retina Tomograph are reported to have a theoretical reproducibility of approximately ±10 microns for each image pixel (Heidelberg, 1994). However, this accuracy is unlikely to be achieved in practice for the reasons described below:
(1) lateral eye movements: although these are compensated for by the SLO software, blood flow causes the retina to move longitudinally and blood vessels to change in diameter;
(2) accuracy and reproducibility are affected by the quality of the acquired image; an image with reduced reliability results if the scan parameters are not set properly;
(3) if the camera was not aligned with the center of the pupil, image tilt would be present; the corrections made for tilt affected the reproducibility of measurements;
(4) in order for the signal-to-noise ratio to be maintained when dealing with small pupil diameters or lens opacities, the sensitivity of the light detection system must be set to a high value to increase brightness; electronic and photic noise increase when the sensitivity is set high, and this decreases image quality and reproducibility.
Therefore, the effect of these sources of error on the measurements should be examined. Unfortunately, this is not possible in the clinical setting and is difficult with human autopsy or animal tissue, because shape changes occur in biological tissue between the time the image is acquired and the time of physical measurement. However, a plastic model eye was used in two studies to verify the accuracy of two different instruments which measure ONH topography: the Rodenstock Optic Nerve Head Analyzer (Shields, 1989) and the Heidelberg Retina Tomograph (Dreher and Weinreb, 1991). Shields (1989) found that, after correcting the magnification error of the computer-measured cup diameter with a refractive error factor for the phakic (with lens) model of the eye and an axial length factor for the aphakic (without lens) model, the average errors were 4.6% and 7.5%, respectively. In comparison, Dreher and Weinreb (1991) reported that the cup diameter had an average relative error of 2.0% and 3.6% for the phakic and aphakic models, respectively. Dreher and Weinreb (1991) also reported the average relative errors of the depth measurements for the phakic and aphakic models, which were 11.7% and 10.1%, respectively. However, Weinreb (1990) calculated the accuracy of the HRT using depth measurements and found that the relative error did not exceed 3.1% for a phakic model of the eye.

Using the highest error reported, 12% (Dreher and Weinreb, 1991), the three parameters used as predictors in the discriminant function were evaluated for the effect that the error has upon the separation of group means. Even if the variance due to error is taken into account, Table 5-1 shows that the separation between group means is still sufficient to allow group distinction.

                   Third Moment        Volume Above Reference    Height in Contour
                   normal   abnormal   normal    abnormal        normal    abnormal
Mean               -0.229   -0.101     0.368     0.190           0.106     0.295
Max (12% error)     0.027    0.012     0.044     0.044           0.035     0.035
Range - low        -0.256   -0.113     0.323     0.145           0.0706    0.260
Range - high       -0.202   -0.0889    0.412     0.234           0.141     0.330
Between groups     -0.202   -0.113     0.323     0.234           0.141     0.260

Table 5-1. HRT relative error applied to the predictors. Good separation between group means still exists as the errors do not overlap.

5.2 How Useful is Discriminant Function Classification?
The preliminary discriminant function analyses showed which of the two reference planes was the better choice. The adjusted reference plane produced parameter calculations which were better at separating the two classes, normal and glaucomatous patients. The findings of this work have been published (Mikelberg, et al., 1995) and Heidelberg incorporated the adjusted reference plane into the version 1.11 of the HRT software. A forward stepping linear discriminant function analysis was performed on the adjusted reference plane data after statistical transformations were done to normalize the distributions for four of the variables, abrg, vasg, vbrg, vbsg, and to correct for age effects on the variable tmg. One variable, mr, was removed from the analysis due to a high correlation with ag. The result of the analysis was to show that three variables could be used to define a classification function which separated the two classes with an overall accuracy of 88.8% (86.7% specificity, 90.9% sensitivity). These results were verified with a jackknife validation which reduced the bias by building 119  independent classifications from the test data using Mahalanobis distances; the jackknife classification rate was 87.6% (86.7% specificity, 88.6% sensitivity).  The three discriminant  variables were the third central moment corrected for age, tmgcorr, the volume above the reference plane, varg, and the height in contour, hie. The third moment (tmg) is a shape description of the optic nerve head determined by using the third central moment of the frequency distribution of all the height values in the contour. The volume above reference (varg) is the same as the neuroretinal rim volume pictured in Figure 5-1. The height in contour (hie) is the mean height of all parts in the contour.  The separation of the two groups with these three variables was significant,  F(3,85)=24.9 with /?<0.001, and none were highly correlated with each other indicating independent relationships.  Figure 5-1. A cross section of the optic nerve head showing the relative features. In this example the adjusted reference plane is located 50pm below the retinal surface. The volume above reference is the same as the rim volume, a is the mean height of the curved surface and b is the cup depth (Zangwill et al., 1996).  These three variables are likely to be affected by nerve fiber loss. The third moment is an indirect measure of the shape, and as the depth values increase, the third moment increases from a  120  normally negative value to a positive value.  The neuroretinal rim volume decreases with  degeneration of the nerve fiber layer which occurs as the optic nerve head becomes more depressed. The height in the contour measures the average height of all parts within the contour and is again sensitive to depression of the optic nerve head due to nerve fiber loss. A multivariate analysis study was designed to identify the parameters which would predict the ocular hypertensive (OHT) patients who would develop glaucoma (Airaksinen et al., 1991). The data set consisted of 96 OHT patients which were divided into two groups; the model was determined using one group and then tested using the other group. The factors which separated stable OHT patients from ones who developed glaucoma were initial rim area, change of rim area, and standard deviation of the mean defect of the visual field. This model correctly classified 81% of the patients (87% specificity, 72% sensitivity). 
These results indicate that even visual field data are not 100% accurate at detecting glaucoma when tested long term.

Drance (1975, 1976, 1978a, 1978b, 1981) was involved in a series of studies which examined the long term use of predictors of glaucoma. One study (Drance, 1975, 1976), using six disc variables measured from ophthalmic photographs, correctly identified 62 (84% sensitivity) of the discs associated with visual field defects and 62 (83% specificity) of those discs that were normal; thus, there were 12 false positives. The method by which the parameters were calculated was well defined. This study was followed up (Susanna and Drance, 1978a), and it was found that 8 of the 36 false positives and one of the 36 true negatives went on to develop glaucomatous visual field loss; these patients were all initially thought to be disease free. The generalization of the discriminant function was poor because the cases followed up had been misclassified.

A related study (Drance et al., 1978) used 7 of 33 available parameters to classify 93% of the patients correctly (92% sensitivity, 95% specificity). The parameters which were consistently the highest ranked were the structural measurements of the ONH: rim abnormality and cup/disc ratio. The study used 219 patients with glaucoma and 100 normal patients. The groupings were stratified according to the prevalence of low-tension glaucoma and of hypertension in the normal population. However, the types of parameters used were difficult to collect, and the study was biased because the way in which the patients were classified conflicted with the parameters used as predictors, e.g. stereo examination of the discs and a full ocular exam. Another follow-up study (Drance et al., 1981) showed that the discriminant function of 1978 did relatively poorly when applied to a new set of patients: the initial overall classification was 81% (48% sensitivity and 92% specificity). The DFA was then applied to the new population and classified 79% overall (71% sensitivity, 81% specificity), showing that there was either a lack of classifier generalization or an error in reproducing the same measurements. These studies emphasize the need for long term follow-up to provide a real account of a classification model's ability to generalize. As is obvious from these results, none of the models has a long term follow-up rate which matches the original success rate.

In a preliminary study done as part of the present project, a different classification function was calculated using three variables, tmg, varg, and hvc, with 85.7% overall classification (84.4% specificity, 87.0% sensitivity) (Mikelberg et al., 1995). That classification function has been incorporated into version 2.01 of the HRT software by Heidelberg. The one parameter different from Eqn. 4.2, hvc (height variation contour), describes the maximum variation along the contour; it did not contribute significantly to the classification. Caprioli (1992) also used a DFA to classify patients using well defined structural and functional parameters. The results are given in Table 5-2.

                  Overall (%)   Sensitivity (%)   Specificity (%)
Combined DF       87 ± 3        90 ± 5            76 ± 5
Structural DF     76 ± 2        88 ± 4            35 ± 12
Functional DF     77 ± 3        99 ± 1            6 ± 6

Table 5-2. Results of Caprioli's discriminant function classifier.

The results were good, but the analysis was biased because visual field indices were used to define the classes against which the DFA results were judged.
This study was repeated by Brigatti et al.(1996) using an artificial neural network to build the classification model. The results were better and will be discussed in the next section. The low specificities are representative of the smaller normal group which only made up 23% of the population sample, i.e. 54 out of a total of 185 glaucoma and 54 normal patients. Uchida et al. (1996) used a discriminant function analysis to build a classification model which would separate 43 normal patients from 53 glaucoma patients based upon nine scanning laser ophthalmoscope parameters: cup area, cup-to-disc area ratio, rim area, height variation contour, cup volume, rim volume, cup shape measure, mean retinal nerve fiber layer height, and retinal nerve fiber layer cross-section area.  As in the Caprioli study, the results were biased because the  appearance of the optic disc was used to classify the patients. The model was determined using two thirds of the data, with one third used for testing; the test set was rotated in a manner similar to cross validation. In spite of the bias the model did not perform very well; the diagnostic precision of the model was 81% (81% specificity, 81% sensitivity).  123  5.3 Artificial Neural Network Classification The artificial neural network was best trained using error back-propagation with the conjugate gradient algorithm to adjust the weights while minimizing network error. The conjugate gradient determined the direction of minimization on the error surface. A line search was used because directions change radically with conjugate gradient, and consequently so does the appropriate step size. The initial direction on the error surface was defined by the initial weight vector. Training ceased when the network error stabilized, remaining within a defined tolerance range. The optimal network architecture for this classification problem was determined by maximizing the classification rate of the validation cases; the validation cases were not used to determine network parameters during training. The ability of the network to generalize was measured by its success after training in correctly classifying new cases. The algorithm explains the data from our population sample and is successfully applied to the general population, i.e. new cases. The factors which affect network generalization are as follows: (a) the best cross validation (CV) rate was obtained with two hidden units in the first layer and without a second hidden layer. According to theory this would form a simple convex region from two hyperplanes in data space, where the computation nodes use sigmoidal activation functions. Thus, the groups were best separated with the simplest convex region , i.e. two hyperplanes; (b) the best cross validation rates, with the least separation between the training and C V rates, were obtained when data presented to the network were calculated using the adjusted reference plane.  The least separation between the C V rate and the training rate was  obtained with the statistically transformed adjusted reference plane data (13 inputs); the highest cross validation rate was 84.1% (86.7% specificity, 81.4% sensitivity). 
However, 124  the adjusted reference plane data without statistical transformations (14 inputs) performed the best overall, with a cross validation rate of 87.8% (86.7% specificity, 88.9% sensitivity); (c) the randomization of the starting weights produced the most reliable network convergence; (d) the values mentioned in (b) were obtained with two different sets of target values {0.1,0.9} and {0,1}, respectively. The target values which produced the least separation between the C V rate and the training rate was {0.1,0.9}.  However, the targets {0,1} produced the  highest C V rate but with greater separation. This allows for some improvement; (e) a minimum training rate of 80% was sufficient to ensure valid network convergence. The seed number and the minimum number of iterations were both factors which were not found to affect the ability of the network to generalize. Varying the connectivity of the bias units did not affect the cross validation rate of the network. However, as expected the memorization capacity of the network decreased when the number of network weights decreased. Artificial neural network generalization is sought with pattern classification systems so the network's application will not be limited to the training data. Lippman (1989) defines the effect of using sigmoidal computation nodes in a network to be that of dividing the data space into hyperplanes. The number of hyperplanes is determined by the number of hidden nodes used in the first hidden layer (Baum, 1988). Lippman (1987) found that a network with only one computation node, an output, divides the data space with a hyperplane; a network with one hidden layer forms convex open or closed regions; a network with two hidden layers or more forms arbitrary regions for which the complexity is limited by the number of nodes. The closed regions formed by one hidden layer are bounded by convex sets of hyperplanes or within parallel hyperplanes. However, the closed regions formed by a two hidden layer network have a combination of convex and  125  concave edges. The regions shown in Figure 5-2(b) are only separable by a combination of convex and concave planes, thus requiring a two hidden layer network.  Figure 5-2 (Fig. 3-22 repeated). Regions formed by two distinct data groups, (a) linearly separable regions, (b) non-linearly separable regions.  Abe et al. (1993) determined that a minimum of four hyperplanes was needed to divide ten classes. Their network was trained, network parameters were tuned for generalization, and then an algorithm was extracted from the network for testing. The accuracy of the network increased with increased number of hidden nodes. This was expected because the storage capacity of a network increases with the number of connections. However, the accuracy of the algorithm decreased when the number of hidden nodes increased. This showed that there is an upper limit on the number of hidden nodes above which generalization will not occur, or the network over-fits the functions being approximated. The results of Section 4.3 (i) were similar to that of Abe et al.(1993). The number of hidden nodes increased as the network training rate increased, but the cross validation peaked and then decreased with increasing number of hidden nodes.  This is shown clearly in Figure 4-5.  As  126  network complexity increases, e.g. the number of hidden nodes, the number of hyperplanes in data space increases and costs network generalization. 
When the number of hyperplanes used in data space for separating classes increases, the networks begin to memorize all of the data features. However, when the classes are separated by the minimum number of hyperplanes the network is forced to only extract the relevant features from the data, which increases the chance of correctly identifying a new pattern using the appropriate feature. Therefore, with two classes, normal and glaucomatous, only one hyperplane should have been needed.  However, when the separation  between classes becomes less and the distributions become more complex, the number of hyperplanes needed increases (Fig. 5-2). In Figure 5-3(aa) three hyperplanes divide the seven patterns into seven regions; the network has memorized the patterns using all three digits as features.  If the patterns were only to be  separated by their number, only one hyperplane would be needed. The three hyperplanes would still identify each pattern correctly, but when a new pattern is added there will be higher chance of misclassification because the feature of interest, the number, may not be the only feature filtered by the network, i.e. too much information is kept and generalization is decreased.  127  Figure 5-3. Hyperplanes that divide the data space into regions: (a) simplest convex; (b) undivided; (c) 3 regions with 2 hyperplanes; (aa) 7 regions with 3 hyperplanes.  As stated earlier, the number of hidden units controls the generalization of the network, i.e. the number of hyperplanes dividing the data space. Two hyperplanes produce a simple convex region in data space Figure 5-3(a). The two groups are expected to form two distinct regions, however, this may not be the case with this study's data. Figure 5-3(c) is similar to (a) if the hyperplanes were extended, however, if the two hyperplanes are parallel the data space is divided into three distinct regions. It is possible that this study data could be distributed in this manner. Lippman (1987) identified this as a closed region formed by one hidden layer. Lippman's (1987) observation of a single computation node network combined with Baum's (1988) relationship between the number of hidden nodes and the number of hyperplanes confirms the results presented in Section 4.3 (vi).  There was no difference in the cross validation  performance of the network with or without one hidden unit because the result in data space was the same: only one hyperplane divided the regions.  However, the connectivity of the bias unit  determined the inclusion of thresholds into the overall network response function and thus limited  128  the possible orientations of the hyperplane dividing the regions. Figure 5-2(a) shows a hyperplane dividing two regions but if the orientation or location of the hyperplane is changed, e.g. a missing threshold, cases from either region could be misclassified by the incorrectly placed hyperplane. This affects the memorization capacity of the network, as illustrated in Figure 4-9. In general, a network's ability to identify the patterns that it was trained on is the network's memorization capacity.  The number of hyperplanes necessary to guarantee memorization of k  patterns is k-1 (Huang and Huang, 1991). However, this does not take into account the number of classes of objects. 
Using the same logic applied to the data, one hyperplane would be needed to separate two classes or distributions provided the two are separated by a sufficient distance to allow placement of the hyperplane without cutting into either of the distributions. When there is not adequate distance more hyperplanes are necessary to construct a non-linear boundary that will separate the classes. When Huang and Huang's criterion is used with classes we are defining a new mapping for the network which is not to memorize all of the data features but to form a mapping which only preserves the features necessary to separate the classes. Patterns are points in data space but classes form distributions which can be cut by decision regions formed by hyperplanes. Thus, when mapping mutli-dimensional data to a lower dimension, another criterion comes into effect: the number of hyperplanes necessary to guarantee separation of N patterns in d dimensions is N/d (Baum, 1988). Thus, with the data , iV=87 (89 - 2 validation cases) and d=\3 (worst case), N/d=l hyperplanes. This illustrates that as the number of dimensions increases the distributions of data are more sparsely defined. Therefore, as the number of input features increases the number of data required to adequately define the data space increases.  The data distributions are better defined  with a large number of patterns. However, as the number of input dimensions, i.e. the parameters,  129  increases, the easier it is to memorize all of the patterns using fewer hyperplanes to separate the higher dimensional data. The ability of a network to generalize is measured by its success at predicting new cases using a model formed from independent training data.  Generalization occurs when a network only  memorizes the features necessary for separation of object classes, but it is decreased when other features are memorized which degrade the classification. Thus, when patterns are memorized all of the features of the data are stored, i.e. generalization cannot be guaranteed. Baum and Haussler (1989) proposed a guide for the number of patterns necessary to define the data space upon which a classification can be built using a sufficient number of hyperplanes to separate object classes. The number of training patterns, N, is greater than the number of connection weights in the network, W, divided by the generalization error rate , e, which is twice the allowable training error rate; N>W/e. The data trained on the network with two hidden nodes , 31 connection weights (Table 4-1), and the error rate found, 0.12 (100%-87.8%), actually requires 258 patterns for generalization to occur. Eqn. 3.19 includes the number of the hidden nodes used as well. Both equations indicate that as the number of connection weights increases generalization.  so does the number of patterns necessary for  This shows a dependency of network generalization on the complexity of the  network and also on the number of input dimensions. As the data is spread out more in higher dimensions fewer hyperplanes are necessary to separate data and as the number of weights increase in a network which is a reflection of the number of inputs and the number of hyperplanes used the number of patterns necessary to guarantee generalization increases. Figure 2 (aa) illustrates that when data is distributed in a manner complementary to data clustering, fewer hyperplanes are necessary to memorize patterns or separate object classes. 
This explains the conflict between Baum and Haussler's requirement of 258 patterns for generalization 130  and the actual data set size of 87 training patterns: certain types of clustering allow easier separation in data space. Abe et al. (1993) also examined the effect of increasing the network tolerance, allowable training error, upon the network's generalization. They found that as the tolerance increased the classification rate decreased. necessary  This showed that sufficient minimization of network error is  to arrive upon a satisfactory network parameter state for successful  network  generalization. This was confirmed in this study by using a minimum training classification rate to ensure that the network weights had reached an acceptable minimum on the error surface. However, when the minimum was raised too high, e.g. 90%, the network was found to over-fit the data and generalized poorly (Fig. 4-11). Abe et al. (1993) found that the size of the training set affected the generalization of the network, i.e. as the number of patterns increased the classification rate increased, verifying the comments made earlier in regard to definition of the input data space. Partridge (1996) found that the randomization of the initial weights and the number of hidden nodes affects a network's ability to generalize. The effect of varying the starting point on the error surface, i.e. the initial randomized values of the weights, was that the generalization of the network varied. This can be explained because as the starting position on the error surface is varied, the path to the global minimum or a sufficient local minimum is changed. This change may actually restrict the ability of the network training method to reach that minimum depending upon the topographic features blocking the way. By varying the initial weights with the random number generator seed, a solution was converged upon. The effects of changing the seed number can be seen in Figure 4-6. The resultant generalization rate varied with the different starting points on the error surface. The effects of premature saturation were discussed by Lee et al. (1991, 1993).  Incorrect  saturation was observed to occur (a) when the initial weights were not uniformly distributed within 131  a small range of values; (b) when the number of nodes was not maintained low for satisfactory operation of the network; and (c) when the nodes operated in the non-linear regions. The initial weights have already been discussed, but restated, if the network starts in a poor location in error space defined by the network weights, then the minimum is difficult to find. In relation to weight magnitude, if the weights are high, the gradient will be low and there will be a good chance of remaining in a poor local minimum for the problem. The network error may have to increase to escape from the local minimum and this would be restricted by the gradient method because the direction needed would be up which does not decrease network error. The effect of the number of hidden nodes has already been discussed in depth. Keeping the nodes operating in their linear region was looked at and Figure 4-10 shows the effect of varying the target values. There is no conclusive evidence that with the target values set at {0.1, 0.9} the network generalizes better than when targets were set at {0, 1}. The two pairs of targets used different format data sets, thus, the factors that may produce better generalization were not isolated.  
Therefore, more investigation  should be done with the data set format, the adjusted reference plane with and without statistical transformations, and the target values' effect on where each neuron is operating, either in the linear or non-linear range. As Rumelhart and McLelland (1986) discussed, the targets set at the extremes of {0, 1} force the connection weights to have high values with the actual targets being impossible to attain. The distribution of the output values would also be non-normally distributed with targets set at the limits, but better distributed with the targets inside the operating range of the neurons, thus, representing a more valid distribution of the data in output space, i.e. the classification of patients being normally distributed like most population measurements. Overall, the network was seen to generalize better when the network was forced to use fewer hyperplanes to divide the data space into regions. This usually meant that the memorization rate of 132  the network decreased but the cross validation rate increased.  The maximum generalization  observed was attained when the average training rate equaled the cross validation rate of 87.8%. A threshold seems to have been reached with the data set and the configuration of the ANN. The limiting factor was the size of the data set. If the number of hyperplanes used in data space was increased, the network memorized better but the cross validation rate decreased, thus, over-fitting of the data occurred.  Other studies have used back-propagation artificial neural networks for the evaluation of glaucoma predictors. Goldbaum et al. (1994) used visual field indices as glaucoma predictors, and classified the optic discs using their clinical appearance. The model was cross validated with a relatively poor classification rate of 67% (71% specificity, 65% sensitivity). The model used 120 patients (60 glaucomatous and 60 normal), and 54 visual field indices, the patient age and the eye of examination as network inputs. The number of dimensions in input space was so high that the 120 patient data points would have been extremely sparsely distributed. Forcing the network to use one hidden layer with four hidden units would therefore not be expected to produce good results as the study showed.  Brigatti et al. (1996) used functional and structural data in combination and separately for the prediction of glaucoma. They used a test set which was one third of the entire data set for cross validation. The data set consisted of 185 glaucomatous and 54 normal subjects. The subjects were classified using IOP and visual field defects. The appearance of the optic disc was used to verify the classification. The structural variables used were cup/disc area ratio, rim area, cup volume, an average of 64 height measurements around the disc, and an average of 8 height measurements in both the inferior and superior poles of the disc, separately. These measurements were made using  133  the Rodenstock Optic Nerve Head Analyzer. The three functional measurements (visual field indices not used to classify the patients) were mean defect, corrected loss variance, and short term fluctuation. For the combination of structural and functional data, an overall classification of 88% (84% specificity, 90% sensitivity) was obtained. 
For the structural measurements only, an overall classification of 80% (56% specificity, 87% sensitivity) was found, and for the functional measurements only, the overall classification was 84% (86% specificity, 84% sensitivity). Like many studies reviewed here, the results were biased because functional measurements which are theoretically related to the visual field parameters were used to classify the patients, and the optic disc appearance was used to filter out any abnormal-looking glaucomatous or normal patients. The patient groups were mismatched in size, 185 glaucomas to 54 normals, and there was no mention of analysis effects being corrected for, as mentioned in Section 3.2.2.3.2. The structural results show this error because the specificity (the proportion of correctly classified normals) was poor at 56%. This did not affect the overall classification because the normals make up only ~23% of the data set. The way to provide a good validation of these models would be to use a large independent test set consisting of equal numbers of normal and glaucomatous patients.

Uchida et al. (1996) studied the use of confocal scanning laser ophthalmoscope parameters for the prediction of glaucoma. The patients were classified using IOP, visual field indices, and the appearance of the optic disc. The data set consisted of 43 normal and 53 glaucomatous patients, and was divided into three subsets: two subsets were used to train the network and one to test it. Nine optic disc measures were obtained: cup area, cup-to-disc area ratio, rim area, height variation contour, cup volume, rim volume, cup shape measure, mean retinal nerve fiber layer height, and retinal nerve fiber layer cross-section area. The variables were not corrected for age effects although there was a difference in the group means: 50.9 ± 13.6 years for the normal group and 56.1 ± 10.0 years for the glaucomatous group. A network with one hidden layer containing four hidden nodes was used. The result was an overall classification rate of 92% (91% specificity, 92% sensitivity). Bias was introduced into this study by using the appearance of the optic disc to classify the patients in the first place. However, the most important observation is that the network used four hidden nodes with nine inputs and two classes. The capacity of such a network is very large and it is hard to believe that valid generalization was obtained with such a small data set; the data must therefore have been clustered close together as a consequence of the classification criteria, i.e. as a result of the bias. The new parameters in this study were the retinal nerve fiber layer thickness measurements, which were shown to have very little predictive power compared with the cup shape measure: the area under the ROC curve was 0.93 for cup shape measure, 0.69 for mean retinal thickness, and 0.62 for retinal cross-section area. The study used idealized groups of subjects to produce ideal results. Similar work was completed by Parfitt et al. (1995b) and presented at a conference before the above work was published.

5.4 Both Models Considered

The two methods with which the data were analyzed each have advantages and disadvantages. The data were all derived from human subjects, and such data will usually be normally distributed over a population. The training set was small, so this sample could not be shown to represent a normal distribution, but it was assumed to be distributed at least approximately normally.
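This assumption can at least be checked descriptively with the W test of Shapiro and Wilk (1965), the statistic reported for each parameter in Appendix B. The sketch below is illustrative only: it assumes SciPy is available and uses made-up values standing in for one measured parameter in each group; only the group sizes (45 normal, 44 glaucomatous) are taken from the study.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical stand-ins for one stereometric parameter in each group.
normal_vals = rng.normal(loc=0.63, scale=0.21, size=45)
abnormal_vals = rng.normal(loc=0.37, scale=0.22, size=44)

for label, values in (("normal", normal_vals), ("abnormal", abnormal_vals)):
    w, p = stats.shapiro(values)  # Shapiro-Wilk W statistic and its p-value
    verdict = "no evidence against normality" if p > 0.05 else "departs from normality"
    print(f"{label:>8}: W = {w:.4f}, p = {p:.4f} -> {verdict}")

A small p-value indicates a departure from normality; with samples of this size the test has limited power, which is why the assumption could only be made approximately.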
The two groups should have been easily separable if there was a correlation between the parameters and the pathological changes in the ONH due to glaucoma. Although a higher-level feature analysis of the raw ONH images would possibly simulate the ability of a clinician to diagnose glaucomatous images within a mixed data set, the actual features that differentiate between the healthy and diseased states are not known. The set of measurements available was a plausible starting point for an investigation of which features of the optic nerve head are important. The discriminant function analysis was based upon the assumptions that the data were normally distributed and that the multivariate distributions were separable by a linear function. In contrast, the artificial neural network offered a non-linear solution to the same problem. Figure 5-4 shows two overlapping normal distributions of data: there is no possible way to achieve 100% specificity and 100% sensitivity with the mapping produced by such a classification model's transfer function. Thus, with the data used in this study, it is possible that no useful mapping exists which will adequately separate the groups. The fact that the validated results for both models were similar suggests that the solutions may be the best that can be extracted from the parameters.

Figure 5-4. The decision boundary between two overlapping normal distributions.

In summary, the discriminant function offered insight into which parameters contributed most to the classification of the disease groups, i.e. the shape changes to the optic nerve head most prominent during the nerve fiber degeneration of glaucoma. The artificial neural network offered a non-linear approach with the ability to separate groups, as in Figure 5-2 (b), which would normally not be separable with a linear classification function. The artificial neural network model offers the most promise for further investigation; however, the limiting factor for both methods is the adequate definition of the data space. The size of the data set needs to be much larger, by roughly an order of magnitude (on the order of a thousand cases), for the methods to be fully explored. For example, the ANN was limited to a network with one hidden layer because of the data set size; the types of regions that a second hidden layer can define in data space might provide a better solution to the problem.

5.5 Application to Screening

The population sample consisted of approximately 50% normal and 50% glaucomatous patients. As mentioned in the introduction, the prevalence of glaucoma in the North American population is about 3%; thus, the sample is not an accurate representation of the general population. Both analyses can be performed taking unbalanced group numbers into account, but the actual distributions which accurately represent the populations that would be screened using classification models built upon scanning laser ophthalmoscope data are not known. However, if the general population were mass screened for glaucoma it would be very costly, with little reward. At present, the usual route of admission to a glaucoma clinic is referral for a serious ophthalmic condition requiring a specialist. This is the most probable way in which a person would be screened for glaucoma due to vision problems; by this point some degree of retinal degeneration has usually occurred.
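The "little reward" of mass screening can be quantified with a positive-predictive-value calculation. The sketch below is illustrative only: it applies Bayes' rule to the cross-validated sensitivity and specificity reported in this thesis (88.6% and 86.7%) at the quoted prevalence of about 3%, and the function name is hypothetical.

def positive_predictive_value(sensitivity, specificity, prevalence):
    # P(glaucoma | positive test) by Bayes' rule.
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Balanced clinic-style sample, roughly as in this study, versus an unselected population.
print(positive_predictive_value(0.886, 0.867, 0.50))  # about 0.87
print(positive_predictive_value(0.886, 0.867, 0.03))  # about 0.17

At 3% prevalence roughly five of every six positive results would be false alarms even with both error rates below 15%, whereas in a referred population with a much higher prior probability of disease the same classifier is far more informative.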
The patients who will ultimately be screened will most likely come from a sample of the population who have been referred to a specialist eye doctor because of some type of vision problem. If this sample could be approximated, then a classification model could be developed that takes the population dynamics into account. Again, there are tradeoffs between the specificity (the correctly diagnosed normals) and the sensitivity (the correctly diagnosed glaucomatous patients): the false positives cost the system money when they are treated unnecessarily or recommended for further testing, and the false negatives may not be correctly diagnosed until a considerable amount of damage has been done. Both models maximized specificity and sensitivity simultaneously; however, the proportion of misdiagnosed cases would be costly both to the system and to the patients if this were implemented as a primary screening device. The clinician's judgment and experience far outweigh the implementation of any one screening device, but there is a definite need for the earlier detection of glaucoma.

6. Conclusions and Recommendations

6.1 Concluding Remarks

The focus of this work has been to assess the performance of optic nerve head shape measurements obtained with the Heidelberg Retina Tomograph for the early detection of glaucomatous damage to the eye. The work involved extensive data collection from optic nerve head images; repeated analysis of the data set using statistical analyses and artificial neural networks; and development of code for the analysis instruction sets. The thesis reports on the progress made in these steps and the results of these efforts.

Two types of analysis were performed: discriminant function analysis and artificial neural network analysis. The discriminant function analysis classified 88.8% of the subjects correctly, with a specificity of 86.7% and a sensitivity of 90.9%. These results were validated with a jackknife procedure which classified 87.6% of the subjects correctly, with a specificity of 86.7% and a sensitivity of 88.6%. The artificial neural network analysis classified 87.8% of the subjects correctly, with a specificity of 86.7% and a sensitivity of 88.8%, using cross validation; the average training classification rate was 87.8%. The discriminant function analysis was a linear analysis which performed as well as the non-linear artificial neural network analysis when comparing the validated results. The discriminant function did perform better when using the model's actual classification rate as the measure of performance: 88.8% compared with the 87.8% average training classification rate of the network. The area under the receiver operating characteristic curve was higher for the DFA, 0.916, than for the ANN, 0.879. Classification was also more robust with the DFA model because the rate was less affected by the placement of the decision threshold, illustrating that the group distributions are more easily separated within the model. The results show that for this data set the linear discriminant function analysis was sufficient, and the more common availability of statistical packages makes it the choice for modeling. Non-linear techniques usually involve more intensive calculations, and neural networks also require more setup for analysis. Therefore, the extra effort required to do artificial neural network analysis was not warranted by the results.
However, as the data set grows and the data space becomes more fully defined, non-linear techniques can be expected to show better results because of their flexibility in building the classification regions. The discriminant function analysis has a large advantage over the artificial neural network analysis because it uses only a few parameters, and it gives insight into which shape features of the optic nerve head are most affected by glaucoma. The parameters which contributed most to the classification were, in descending order, the third moment of the frequency distribution of height values corrected for age effects, the volume of tissue above the reference plane (i.e. the neuroretinal rim volume), and the average height of the optic nerve head surface within the contour line.

It is unlikely at present that the HRT will replace the standard methods of visual field testing, because there is no increase in performance and the cost of the equipment is high. However, performance is a flawed measure for this comparison because the results of the HRT were judged against visual field indices, which themselves may not be completely accurate. The instrument does have some advantages over present methods, which are listed below:

(i) the results are quantitative measures which do not require clinical interpretation, thus removing some error introduced by the clinician;
(ii) the time taken to obtain the images is far less than for the visual field tests, which minimizes the patient's and the technician's time;
(iii) the instrument can be used for further research which will continue to provide insights into the mechanisms of physical ONH damage in glaucoma;
(iv) visual field testing may be less accurate because it depends on perceptual measures which are susceptible to variation, e.g. if the patient is elderly and gives unreliable responses; it is also inherently variable because of the probabilistic nature of perception close to the detection threshold.

6.2 Recommendations for Future Work

Further investigation using the scanning laser ophthalmoscope as a predictive tool is necessary. The following is a list of the directions which should be taken:

(i) investigate the use of the sector values (the ONH shape in different sectors) as predictors;
(ii) increase the data set size to further define the model, and investigate other analysis methods for model building;
(iii) introduce new optic nerve head measures using insight from prior analysis and other glaucoma studies;
(iv) improve the optic nerve head outlining procedure to reduce human-introduced error and automate the analysis process;
(v) look for a reference plane location which is more constant during the progression of glaucoma.

Bibliography

Abe S, Kayama M, Takenaga H, Kitamura T.1993:Extracting Algorithms From Pattern Classification Neural Networks. Neural Networks. ;6:729-735.

Abu-Mostafa YS.1989:The Vapnik-Chervonenkis Dimension: Information versus Complexity. Neural Computation. ;1:312-317.

Airaksinen PJ, A. T, Alanko HI. 1991:Prediction of Development of Glaucoma in Ocular Hypertensive Patients. In: Krieglstein GK, ed. Glaucoma Update IV. Berlin: Springer; 183-6.

Airaksinen SJ, Tuulonen A. 1993:Reference plane determination.: Department of Ophthalmology University of Oulu, Finland.

Armaly MF, Krueger DE, Maunder L, Becker B, Hetherington J, Kolker AE, Levene RZ, Maumenee AE, Pollack IP, Shaffer RN.1980:Biostatistical Analysis of the Collaborative Glaucoma Study: I.
Summary Report of the Risk Factors for Glaucomatous Visual-Field Defects. Arch Ophthalmol. ;98(Dec):2163-71. Balazsi A G , Rootman J, Drance S M , Schulzer M , Douglas GR.1984:The Effect of Age on the Nerve Fiber Population of the Human Optic Nerve. American Journal of Ophthalmology. ;97(6):760-766. Baum EB.1988:On the Capabilities of Multilayer Perceptrons. Journal of Complexity. ;4:193-215. Baum E B , Haussler D.1989:What Size Net Gives Valid Generalization? Neural Computation. ;1:151-160. Bradley JV.1984:The complexity of nonrobustness effects. Bulletin of the Psychonomic Society. ;22(3):250-253. Brigatti L, Hoffman D, Caprioli J.1996:Neural Networks to Identify Glaucoma With Structural and Functional Measurements. American Journal of Ophthalmology. ;121(5):511-521. Bronzino JD. 1995: Vision. Biomedical Engineering Handbook. Hartford: CRC Press; Ch. 4. Caprioli J, Klingbeil U , Sears M , Pope B.1986:Reproducibility of Optic Disc Measurements With Computerized Analysis of Stereoscopic Video Images. Arch Ophthalmol. ;104(July): 1035-9. Caprioli J.1992:Discrimination Between Normal Ophthalmology & Visual Science. ;33(1): 153-9.  and Glaucomatous  Eyes. Investigative  Chan K H , Johnson K A , Becker JA, Satlin A, Mendelson J, Garada B, Holman BL.1994:A Neural Network Classifier for Cerebral Perfusion Imaging. JNucl Med. ;35(5):771 -4. 142  Chandler PA, Grant W M . 1965.Lectures on Glaucoma. Philadelphia: Lea & Febiger. Churchland PS, Sejnowski Cambridge: MIT; 61-137.  TJ. 1992Computational  Overview. The Computational Brain.  Cioffi GA.1993:Editorial: Optic Nerve Head Analysis in the 1990s. Journal of Glaucoma. ;2(2):779. Colton T, Ederer F.1980:The distribution of intraocular pressures in the general population. Surv Ophthal. ;25:123. Cornfield J. 1962:Joint dependence of risk of coronary heart disease on serum cholesterol and systolic blood pressure: A discriminant function analysis. Proc Fed Am Soc Exp Biol. ;21:58-62. Cowan JD. 1967:A Mathematical Theory of Central Nervous Activity. London: University of London. Dandona L , Quigley H A , Jampel HD.1989:Reliability of Optic Nerve Head Topographic Measurements With Computerized Image Analysis. American Journal of Ophthalmology. ;108(4):414-21. Dandona L, Quigley HA, Jampel HD. 1989:Variability of Depth Measurements of the Optic Nerve Head and Peripapillary Retina With Computerized Image Analysis. Arch Ophthalmol. ;107(December): 1786-92. Dixon WJ, ed. BMDP Statistical Software Manual. 1988 ed. Berkeley, C A : University of California Press; 1988. Drance SM.1975:Correlation of optic nerve and visual field defects in simple glaucoma. Trans Ophthalmol Soc UK. ;95:288-96. Drance SM.1976:Correlation between optic disc changes and visual field defects in chronic openangle glaucoma. Trans Am Acad Ophthalmol Otolaryngol. ;81:224-6. Drance S M , Schulzer M , Douglas GR, Sweeney VP.1978:Use of Discriminant Analysis: II. Identification of Persons With Glaucomatous Visual Field Defects. Arch Ophthalmol. ;96(Sept): 1571-3. Drance SM, Schulzer M , Thomas B, Douglas GR. 1981 Multivariate Analysis in Glaucoma: Use of Discriminant Analysis in Predicting Glaucomatous Visual Field Damage. Arch Ophthalmol. ,99.1019-22. Dreher AW, Weinreb RN. 1991: Accuracy of Topographic Measurements in a Model Eye with the Laser Tomographic Scanner. Investigative Ophthalmology & Visual Science. ;32(11):2992-2996. 
143  Dreher A W , Tso PC, Weinreb RN.1991:Reproducibility of Topographic Measurements of the Normal and Glaucomatous Optic Nerve Head With the Laser Tomographic Scanner. American Journal of Ophthalmology. ;lll(February):221-229. Errington PA, Graham J. 1993Classification of Chromosomes using a Combination of Neural Networks. IEEE International Conference on Neural Networks; 1236-1241. Ganley JP, Roberts J. 1983:Eye conditions and related need for medical care among persons 1-74 years of age: United States, 1971-72. Washington: Vital Health and Statistics. Goldbaum M H , Sample PA, White H , Cote B, Raphaelian P, Fechtner RD, Weinreb RN.1994:Interpretation of Automated Perimetry for Glaucoma by Neural Network. Investigative Ophthalmology & Visual Science. ;35(9):3362-3373. Goldmann MH.1954:Un nouveau tonometre a aplanation. Bull Soc Franc Ophthal. ;67:474. Griner PF, Mayewski RJ, Mushlin A L , Greenland P. 1981 Selection and interpretation of diagnostic tests and procedures: Principles and applications. Annals Int Med. ;94:553-592. Hanley JA, McNeil BJ.1982:The Meaning and Use of the Area under the Receiver Operating Characteristic (ROC) Curve. Radiology. ;143(April):29-36. Hart W M , Yablonski M , Kass M A , Becker B.1979:Multivariate Analysis of the Risk of Glaucomatous Visual Field Loss. Arch Ophthalmol. ;97(Aug).T455-8. Havener WH. \91\:Synopsis of Ophthalmology. 3rd ed. St. Louis: Mosby. Haykin S. \99A\Neural Networks. Englewoods Cliffs, NJ: Macmillan. Haykin S. \994\Neural Networks: A Comprehensive Foundation. Englewood Cliffs: Macmillan College Publishing Co. Hebb DO. 1949.The Organization of Behavior: A Neuropsychological Theory. New York: Wiley. Heidelberg. \994:Heidelberg Retina Tomograph Operation Manual (Revision 1.10). Heidelberg: Heidelberg Engineering. Higginbotham EJ.1993:Symposium: Diagnostic and Therapeutic Projections for Glaucoma Management into the 21st Century. Journal of Glaucoma. ;2(2): 128-51. Hoskins HD, Hetherington J, Glenday M , Samuels SJ, Verdooner SR.1994:Repeatability of the Glaucoma-Scope Measurements of Optic Nerve Head Topography. Journal of Glaucoma. ;3(1): 1727. Huang S-C, Huang Y-F. 1991 :Bounds on the Number of Hidden Neurons in Multilayer Perceptrons. IEEE Transactions on Neural Networks. ;2(l):47-55. 144  Hubel DH. \9SS:Eye, Brain, and Vision. New York: W. H. Freeman and Company. Humphrey. 1983:77?e Humphrey Field Analyzer Model 610 Owner's Manual. San Leandro, California: Humphrey Instruments. Johansson E M , Dowla FU, Goodman D M . 1990:Back-propagation learning for multi-layer feedforward neural networks using the conjugate gradient method. Lawrence Livermore National Laboratory: Livermore. Jonas JB, Gusek GC, Naumann GOH.1988:Optic disc, cup and neuroretinal rim size, configuration and correlations in normal eyes. Investigative Ophthalmology & Visual Science. ;29:1151. Jonas JB, Naumann G0.1989:Parapapillary retinal vessel diameter in normal and glaucoma eyes. II. Correlations. Investigative Ophthalmology & Visual Science. ;30(7): 1602-1611. Jonas JB, Nguyen N X , Naumann G.1989:The Retinal Nerve Fiber Layer in Normal Eyes. Ophthalmology. ;96(5):627-632. Kass M A , Zimmerman TJ, Alton E, Lemon L, Becker B.1978:Intraocular pressure adn glaucoma in the Zuni indians. Archives of Ophthalomogy. ;96(12):2212-3. Kippenhan JS, Barker WW, Pascal S, Nagel J, Duara R.1992:Evaluation of a Neural-Network Classifier for PET Scans of Normal and Alzheimer's Disease Subjects. J Nucl Med. ;33(8): 1459-67. 
Klein BE, Klein R, Sponsel WE, Franke T, Cantor LB, Martone J, Menage MJ.1992:Prevalence of glaucoma. The Beaver Dam Eye Study. Ophthalmolgy. ;99(10): 1499-504. Kolker A E , Becker B. 1977:'Ocular hypertension' vs. open-angle glaucoma: a different view. Arch Ophthal. ;95:586. Kramer A H , Sangiovanni-Vincentelli A . 1989:Efficient parallel learning algorithms for neural networks. Advances in Neural Information Processing Systems 7. San Mateo: Morgan Kaufmann; 40-48. Kronfeld PC.1974:Glaucoma and the Optic Nerve: A Historical Review. Survey of Ophthalmology. ;19(3): 154-165. Kruse FE, Burk ROW, Volckerr H-E, Zinser G, Harbarth U.1989:Reproducibility of Topographic Measurements of the Optic Nerve Head with Laser Tomographic Scanning. Ophthalmology. ;96(9): 1320-4. Lachenbruch PA, Mickey MR. 1968Estimation of Error Rates in Discriminant Analysis. Technometrics. ;10(1):1-11.  145  Lamiell J M , Ward JA, Hilliard JK. 1993:Detection of Type-Specific Herpesvirus by Neural Network Classification of Western Blot Densitometer Scans. IEEE International Conference on Neural Networks; 1731-1737. Lapeurta P, Azen SP, LaBree L.1995:Use of neural networks in predicting the risk of coronary artery disease. Computers and Biomedical Research. ;28(l):38-52. LeCun Y.1985:Une procedure d'apprentissage pour risseau a seuil assymetrique. Cognitiva. ;85:599-604. Ledley RS, Ing PS, Lubs HA.1980:Human Chromosome Classification Using Discriminant Analysis andBayesian Probability. Computational Biological Medicine. ;10:209-218. Lee Y, Oh S, Kim M . 1991:The effect of initial weights on premature saturation in backpropagation learning. International Joint Conference on Neural Networks. Seattle, WA; 765-770. Lee Y, Oh S-H, Kim MW. 1993:An Analysis of Premature Saturation in Back Propagation Learning. Neural Networks. ;6:719-728. Leydhecker W, Akiyama K, Neumann HG.1958:Der intraokulare Druck gesunder menschlicher Augen. Klin Monatsbl Augenheilkd. ;133:662. Lichtman JW. 1994:Confocal Microscopy. Scientific American; 40-45. Lippman RP. 1987:An Introduction to Computing with Neural Nets. IEEE ASSP Magazine. (April):4-21. Lippman RP.1989:Pattern Classification Magazine. (November):47-64.  Using Neural Networks.  IEEE Communications  Lusky M , Taylor J, Bosem M E , Weinreb RN.1992:Reproducibility of Topographic Measurements of the Optic Nerve Head With the Retina Tomograph. Investigative Ophthalmology & Visual Science. ;33(4):885. Lusky M , Bosem M E , Weinreb RN. 1993 Reproducibility of Optic Nerve Head Topography Measurements in Eyes with Undilated Pupils. Journal of Glaucoma. ;2(2): 104-109. Lusted LB. 1971:Signal detectability and medical decision-making. Science. ;171:1217-1219. Lusted LB.1978:General problems in medical decision making, with comments on ROC analysis. Seminars Nucl Med. ;8:299-306. Maddalena DJ, Johnston GA. 1995 Prediction of receptor properties and binding affinity of ligands to benzodiazepine / G A B A A receptors using artificial neural networks. Journal of Medicinal Chemistry. ;38(4):715-24. 146  Malmo A H . 1991:Some Characteristics of Glaucomatous Visual Field Loss. Glaucoma Update IV. Heidelberg: Springer-Verlag; 133-139. Manly BFJ. l986:Multivariate Statistical Methods: A Primer. New York: Chapman and Hall. Metz CE.1986:ROC Methodology in Radiologic Imaging. Invest Radiol. ;21(9):720-733. Mikelberg FS, Drance SM, Schulzer M , Yidelgigne H M , Weis MM.1989:The Normal Human Optic Nerve: Axon Count and Axon Diameter Distribution. Ophthalmology. ;96(9): 1325-8. 
Mikelberg FS, Wijsman K, Schulzer M.1993:Reproducibility of Topographic Parameters Obtained with the Heidelberg Retina Tomograph. Journal of Glaucoma. ;2(2): 101-103. Mikelberg FS, Yidegiligne HM.1993:Axonal loss in band atrophy of the optic nerve in craniopharyngioma: a quantitative analysis. Canadian Journal of Ophthalmology. ;28(2):69-71. Mikelberg FS, Parfitt C M , Swindale NV, Graham SL, Drance SM, Gosine R. 1995:Ability of the Heidelberg Retina Tomograph to Detect Glaucomatous Visual Field Loss. Journal of Glaucoma. ;4(3):242-247. Minsky M.1988:Memoir on Inventing the Confocal Scanning Microscope. Scanning. ;10(4):128138. Nasemann JE, Burk ROW, eds. Scanning Laser Ophthalmoscopy and Tomography. Berlin: Quintessenz; 1990. Nilsson NJ. \965:Learning Machines. New York: McGraw-Hill. Parfitt C M , Mikelberg FS, Swindale NV. 1995a:The Detection of Glaucoma Using An Artificial Neural Network. 17th Annual International Conference of the IEEE Engineering in Medicine and Biology Society & 21st Canadian Medical and Biological Engineering Conference. Montreal: IEEE. Parfitt C M , Mikelberg FS, Swindale NV, Green S, Graham SL, Drance SM.1995b:The Use of Artificial Neural Networks to Identify Optic Nerve Head Degeneration Due to Glaucoma. Investigative Ophthalmology and Visual Science. ;36(4):S628 Abstract 2885. Parker DB. 1985:Learning logic: Casting the cortex of the human brain in silicon. Cambridge: Center for Computational Research in Economics and Management Science, MIT. Partridge D.1996:Network Generalization Differences Quantified. Neural Networks. ;9(2):263-271. Pedroni V A , Yariv A. 1993:Learning in the Hypercube. IEEE International Conference on Neural Networks; 1168-1171. Phelps CD. 1977:Ocular hypertension: to treat or not to treat? Arch Ophthal. ;95:588. 147  Pizzi N, Choo LP, Mansfield J, Jackson M , Halliday W C , Mantsch HH, Somorjai RL.1995:Neural Network Classification of Infrared Spectra of Control and Alzheimer's Diseased Tissue. Artificial Intelligence in Medicine. ;7(l):67-79. Quigley HA, Addicks EM.1982:Quantitative Studies of Retinal Nerve Fiber Layer Defects. Arch Ophthalmol. ;100(May):807-814. Quigley H A , Dunkelberger GR, Green WR.1989:Retinal ganglion cell atrophy correlated with automated perimetry in human eyes with glaucoma. American Journal of Ophthalmology. ;107:453. Quigley HA. 1993:Open-angle glaucoma. The New England Journal of Medicine. ;328(15):10971106. Raudys SJ, Jain AK.1991:Small Sample Size Effects in Statistical Pattern Recognition: Recommendations for Practitioners. IEEE Transactions on Pattern Analysis and Machine Intelligence. ;13(3):252-264. Rumelhart D E , McClelland JL. \9&6:Parallel Distributed Processing. Cambridge, M A : MIT Press. Rumelhart D E , Hinton G E , Williams RJ. 1986:Learning integral representations by error propagation. Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Cambridge: MIT Press; 316-362. Russo AP. 1991:Neural networks for sonar signal processing, Tutorial No. 8. IEEE Conference on Neural Networks for Ocean Engineering. Washington, DC. Sartori M A , Antsaklis PJ.199LA Simple Method to Derive Bounds on the Size and to Train Multilayer Neural Networks. IEEE Transactions on Neural Networks. ;2(4):467-471. Schwartz B. 1973:Cupping and pallor of the optic disc. Arch Ophthal. ;89:272. Shapiro SS, Wilk MB.1965:An analysis of variance test for normality (complete samples). Biometrika. ;52:591-611. 
Shields M B , Martone JF, Shelton AR, Ollie AR, MacMillan J.1987:Reproducibility of Topographic Measurements With the Optic Nerve Head Analyzer. American Journal of Ophthalmology. ;104(December):581-586. Shields MB, Tiedman JS, Miller KN, Hickingbotham D, Ollie AR.1989:Accuracy of Topographic Measurements With the Optic Nerve Head Analyzer. American Journal of Ophthalmology. ;107(March):273-279.  148  Shields MB.1989:Editorial: The Future of Computerized Image Analysis in the Management of Glaucoma. Am J of Ophthalmol. ;108(3):319-23. Shields M B . \992:Textbook of Glaucoma. 3rd ed. Baltimore, Maryland: Williams & Wilkins. Silverstone DE, Hirsch J. \9S6Automated Visual Field Testing; Techniques of Examination and Interpretation. East Norwalk, CT: Prentice-Hall. Sommer A , Pollack I, Maumenee A E . 1979a: Optic Disc Parameters and Onset of Glaucomatous Field Loss: I. Methods and Progressive Changes in Disc Morphology. Arch Ophthalmol. ;97(Aug): 1444-8. Sommer A, Pollack I, Maumenee AE.1979b:Optic Disc Parameters and Onset of Glaucomatous Field Loss: U. Static Screening Criteria. Arch Ophthalmol. ;97(Aug): 1449-54. Stone M.1974:Cross-validatory choice and assessment of statistical predictions. J Roy Statist Soc SerB. ;36:111-147. Sugar HS, Barbour FA.1949:Pigmentary glaucoma. A rare clinical entity. American Journal of Ophthalmology. ;32:90. Susanna R, Drance SM.1978:Use of Discriminant Analysis: I. Prediction of Visual Field Defects From Features of the Glaucoma Disc. Arch Ophthalmol. ;96(Sept): 1568-70. Swets JA.1986a:Indices of Discrimination or Diagnostic Accuracy: Their ROCs and Implied Models. Psychological Bulletin. ;99(1): 100-117. Swets JA.1986b:Form of Empirical ROCs in Discrimination and Diagnostic Tasks: Implications for Theory and Measurement of Performance. Psychological Bulletin. ;99(2): 181-198. Swets JA.1988:Measuring Accuracy of Diagnostic Systems. Science. ;240:1285-1293. Syu M J , Tsao GT. . 1993:Neural Network Approach to Identify Batch Cell Growth. IEEE International Conference on Neural Networks; 1742-1747. Tabachnick BG, Fidell LS. \9%9:Using Multivariate Statistics. 2nd ed. New York, New York: Harper Collins. Thomas JV. 1992:General Considerations. In: Thomas JV, Belcher CD, Simmons JR, Cook LL, eds. Glaucoma Surgery. St. Louis: Mosby; 1-2. Tielsch JM, Sommer A, Katz J, Royall R M , Quigley H A , Javitt J.1991:Racial variations in the prevalence of primary open-angle glaucoma: the Baltimore Eye Survey. JAMA. ;266:369-74.  149  Uchida H , Brigatti L , Caprioli J.1996:Detection of Structural Damage From Glaucoma With Confocal Laser Image Analysis. Investigative Ophthalmology & Visual Science. ;37(12):23932401. van Camp D. 1993:A Users Guide for The Xerion Neural Network Simulator Version 3.1. Toronto: University of Toronto. Varma R, Spaeth GL, Parker KW. Lippincott Co.  \993:The Optic Nerve in Glaucoma. Philadelphia, PA: J. B.  Vaughan D, Asbury T. \9%6.General Ophthalmology. Eleventh ed. Connecticut: Lange. Webb RH, Hughes GW, Pomerantzeff O.\980Applied Optics. ;19:2991-2997. Webb RH, Hughes GW, Delori YCA989Applied Optics. ;26:1492-1498. Weinreb RN, Dreher A W . 1990:Reproducibility and Accuracy of Topographic Measurements of the Optic Nerve Head with the Laser Tomographic Scanner. In: Nasemann JE, Burk ROW, eds. Scanning Laser Ophthalmoscopy and Tomography. Berlin: Quintessenz; 177-181. Weinreb RN, Lusky M , Bartsch D-U, Morsman D.1993:Effect of Repetitive Imaging on Topographic Measurements of the Optic Nerve Head. Arch Ophthalmol. ;lll(May):636-638. 
Weinreb RN.1995:Editorial: Diagnosing and Monitoring Glaucoma with Confocal Scanning Laser Ophthalmoscopy. Journal of Glaucoma. ;4(4):225-7. Werbos PJ. 1974:Beyond regression: New tools for prediction and analysis in the behavioral sciences. Cambridge: Harvard. Zangwill L M , van Horn S, Lima MDS, Sample PA, Weinreb RN.1996:Optic Nerve Head Topography in Ocular Hypertensive Eyes Using Confocal Scanning Laser Ophthalmoscopy. American Journal of Ophthalmology. ;122(October):520-525.  150  Appendices A. Scanning Laser Ophthalmoscope Parameters (i) Scanning Laser Camera •  diode laser source wavelength  670 nm  •  maximum irradiance at retina  0.5 mW/cm  •  scanning speed horizontal line frequency  8 kHz  vertical image frequency  20 Hz  (ii) Image Acquisition •  field  of view transverse (actual image dimensions) minimum  10° x 10°  maximum  20° x 20°  longitudinal (total depth scan) minimum to maximum •  focus range  •  digitized image size  0.5 mm to 4.0 mm -12 to +12 diopters  2D image  256 x 256 pixels  3D image  256 x 256 x 32 voxels  •  line scan frequency  •  pixel frequency  •  scan time per image  8 000 Hz 4 MHz  2D image 3D image series •  1.6 sec  optical resolution transverse longitudinal  •  0.032 sec  10 pm 300 pm  digital resolution transverse  10 to 20 um/pixel  longitudinal  16 to 128 um/image  •  intensity resolution  8 bit  •  3D image file size  2 MByte  (iii) Topography Image 256 x 256 pixels  •  image size  •  total size  •  digital resolution  65 536 pixels  transverse maximum (10° x 10° image size)  10 um/pixel  minimum (20° x 20° image size)  20 um/pixel  longitudinal  •  maximum (0.5 mm scan depth)'  2 um  minimum (4.0 mm scan depth)  16 urn  absolute accuracy and reproducibility of the height measurements per pixel  •  computing time for topographic image  30 pm 60 sec  B. Chapter 4 Tables Variable  Level  age normal abnormal 36 i  .?.?£  normal abnormal normal abnormal  abrg. normal abnormal hie normal abnormal mhcg normal abnormal pheg normal ! abnormal hvc normal j abnormal vbsg normal abnormal vasg normal abnormal vbrg i normal j abnormal .YES.  mdg.  normal abnormal normal abnormal  tmg normal abnormal mr normal abnormal Table B-l.  Mean  i  Standard ; Standard Error Coefficient of Deviation of Mean Variation 56.247 i 14.397 1.5261 0.25596 51.600 14.964 2.2307 0.29000 61.000 12.221 1.8424 0.20034 2.212 j 0.441 j 0.0467 0.19942 j 2.226 j 0.396 0.0590 0.17775 2.197 0.487 0.0735 0.22180 1.344 0.495 0.0525 0.36819 1.168 0.444 0.0661 0.37986 1.525 i 0.484 0.0729 0.31728 0.912 0.491 0.0520 0.53786 0.655 1 0.379 0.0565 0.57855 j 1.176 0.453 0.0683 0.38555 0.174 0.0185 0.87445 0.199 j 0.106 i 0.129 0.0192 1.21630 i 0.295 ! 0.164 0.0246 0.55354 0.079 j 0.121 0.0128 1.53781 0.036 0.065 0.0097 1.80770 0.122 j 0.147 0.0222 1.20541 -0.095 0.127 0.0135 -1.33334 -0.138 0.069 0.0103 -0.50381 -0.052 0.156 0.0235 -2.99952 0.369 0.107 j 0.0113 0.28901 0.384 0.088 0.0131 0.22836 j 0.353 0.122 j 0.0184 0.34548 0.347 0.222 0.0235 0.63856 0.244 0.170 0.0253 0.69508 0.452 0.221 0.0332 0.48791 0.068 0.050 j 0.0053 0.73409 0.079 0.059 0.0087 0.73760 0.056 0.035 ! 
Table B-1. Descriptive Statistics for Fourteen Stereometric Parameters.

[The remaining rows of Table B-1 and the whole of the table that follows it (Table B-2, which from its surviving row and column labels appears to give the correlations among the stereometric parameters) are garbled beyond reliable reconstruction in this extraction and are not reproduced here.]
q  HH  oo  HH  c5 HH  V) oo r- IT) C N rCO VO Os Os Os v-> vo O co co Os C N oo VO r- co o VO H  d dI d d d dI d d d d • 1  oo VO o o © CO CO VO VO o N CN 1-H V O Os CO co 00 o o C o CO VO V0 co rrNC N »—' r- C N vo o CN o oo o C d d d d d i i d d •d d d d  6G eS U  H-'  i  60  o o  o q  VO OS 00 ,—i ,—i oo VO VO oo VO CO VO I/O o T Os oo VO VO Os T f H N O Tt vO VO o \ OO C '—i VO C N C N V T T o O d ,—1  '—1  d d d  •d d d d d d d d I  CO o (N O v o VO O CN VO vo VO oo O roo V oo oj O CO oo o • ^ r CO O 60 q o m C3 q O d d d d o  I 60!  1 01  1 60!  o  CN oo Os Os Os V O VO CN p- v O VO oo oo co co d CN co o i  d d d d q i  Test of Normality W Statistic  Significance Level  Skewness ] Value/Std.Err.  Original  Kurtosis ] Value/Std.Err. j  0.9432 0.9867 0.9780 0.9524 0.9807 0.9730 0.9727 0.9609 0.9115 0.8118 0.8633 0.9471 0.9729 0.9724 0.9850  0.0419 -0.13 0.9385 j 0.21 0.6860 j -0.26 • 0.0980 1.87 0.7809 0.30 0.5132 -1.02 0.5036 -1.01 P..?.g hvc 0.2063 0.34 vbsg 0.0020 3.15 vasg 0 5.64 vbrg 0 4.44 0.0605 0.11 Yarg. 0.5105 -0.76 mdg 0.4954 0.39 tmg mr 0.9033 -0.38 Transformed 1 1 ! sqrtabrg 0.9807 0.7811 -0.79 sqrtvbsg 0.9816 0.8083 0.05 logvasg 0.9902 0.9840 i 0.27 sqrtvbrg 0.9838 0.8730 0.61 tmgeorr 0.9922 0.9949 0.03 Table B-3. Testing for normality of varible distributions with 45 normals. a  ge  ag. ?ag aprg hie mhcg  :  h  -1.74 -0.38 -0.85 0.34 -0.34 -0.45 -0.06 -1.35 1.99 7.93 4.88 -1.23 -0.70 -0.70 -0.43 0.15 0.02 -0.34 -0.05 -0.15  Dependent i  F Ratio ]  ag  j  4.8771  Variable age 1 Std. Error j P2TA1L Intercept ! Coefficient 1 1 0.0326] 2.66108] -0.0084] 0.0038] 0.03  eag  |  3.219]  0.0798]  1.57163]  sqrtabrg  0.855;  0.3603]  hie  0.2451  0.6235]  mhcg  5.735J  pheg  0.913;  hvc  0.237]  sqrtvbsg  PTAIL  -0.0078]  0.0044]  0.08  0.89091]  -0.0023]  0.0025]  0.36  0.07234]  6.46E-04]  1.31E-03]  0.62  0.0211]  -0.04083]  1.49E-03]  6.21E-04]  0.02  0.3446]  -0.17226]  6.69E-04]  7.00E-04]  0.34  0.6289]  0.40629]  -4.34E-04]  8.91E-04]  0.63  1.434;  0.2376]  0.57169]  -0.0021]  0.0018]  0.24  logvasg  2.514]  0.1202]  -1.42497]  0.0045]  0.0028]  0.12  sqrtvbrg  1.118]  0.2963]  0.41843]  -0.0017]  0.0016]  0.3  0.275]  0.45394]  -0.0017]  0.0015]  0.27  !  0.1135]  0.75202]  -0.0035]  0.0022]  0.11  7.794]  0.0078]  -0.33082]  1.98E-03]  7.10E-04]  0.01  varg  ]  1.223]  mdg  1  2  tmg  ]  -  6 1  :  Table B-4. Regression of variables with age; normal group only.  Normals Abnormals 0.92665 0.93199 ag. eag 0.97597 0.98465 sqrtabrg 0.95924 0.96743 hie 0.99575 0.99709 0.98652 mhcg 0.99283 pheg 0.95203 0.94486 hvc 0.82953 0.93936 sqrtvbsg 0.99331 0.99772 logvasg 0.90425 0.92893 sqrtvbrg 0.97240 0.97157 varg. 0.93482 0.95681 0.90778 0.96759 mdg. tmgeorr 0.83227 0.80648 Table B-5. Squared multiple correlations for both groups.  1 Levene'sF 1 p-value ! Pooled t Test j p-value 1 Separate t Test j d.f.(87) ! 1 d.f.(l,87) ! Variability i Disease ! ! i I I j 4.60 0.0348 -3.24 0.0017 -3.25 1 g. . 0.75 1 0.3904 0.30 0.7624 0.30 g 0.07 0.7992 1 -3.63 1 0.0005 1 -3.63 1 .eag. sqrtabrg 0.22 0.6398 -5.95 0.0000 -5.96 ic 0.78 0.3790 I -6.09 0.0000 -6.08 mhcg 0.79 0.3756 -3.59 0.0005 -3.57 2.21 0.1407 1 -3.36 1 0.0012 i -3.33 1 Phcg vc 1.03 0.3133 1.36 0.1776 1.35 sqrtvbsg 0.00 0.9573 -5.14 0.0000 -5.14 logvasg 0.00 0.9810 1 2.43 I 0.0170 1 2.43 I sqrtvbrg 1.78 0.1860 1 -5.81 0.0000 -5.80 2.87 ! 0.0939 6.21 0.0000 1 6.22 1 Y.«F& "... i mdg 1.44 0.2332 -2.54 0.0129 -2.54 tmgcorr 0.58 0.4484 -6.43 0.0000 1 -6.42 Eye i l l j i 6.08 1 0.0157 ! 
-0.18 age 0.8595 1 -0.18 2.95 0.0896 I -0.20 | 0.8392 -0.20 ag eag 0.86 0.3569 0.53 I 0.5966 1 0.53 sqrtabrg 0.31 0.5768 0.82 1 0.4154 1 0.82 1 ic 1.47 0.2290 1.59 0.1145 1 1.59 1 mhcg 1 5.73 I 0.0188 1.00 0.3195 0.99 I 2.12 0.1486 1.07 0.2862 1.07 P.h.9.g vc 0.14 0.7074 -1.05 0.2963 1 -1.05 sqrtvbsg 0.01 0.9386 1.35 0.1799 1 1.35 1 logvasg 1 0.33 I 0.5646 -0.44 0.6603 I -0.44 sqrtvbrg 0.07 0.7948 1.19 0.2364 1.19 varg 0.95 0.3322 1 -0.72 0.4709 -0.72 mdg 0.04 0.8368 1.41 0.1630 1.41 tmgcorr 1 0.34 | 0.5622 1.66 0.1005 1.66 Gender j j j 1 j j ag? 0.06 0.8104 0.36 0.7189 0.36 0.00 0.9851 -0.08 ag 0.9349 1 -0.08 eag 1 2.27 I 0.1357 -0.26 0.7938 1 -0.27 sqrtabrg 1.13 0.2917 0.22 I 0.8302 1 0.22 ic 0.30 0.5836 -0.30 0.7665 -0.30 mhcg 0.85 0.3586 0.19 1 0.8480 0.19 0.73 I 0.3959 0.52 0.6026 0.51 P .?.g vc 1 3.68 I 0.0583 -1.09 0.2782 -1.08 sqrtvbsg 1.59 0.2107 -0.77 0.4444 -0.77 logvasg 0.60 0.4388 0.63 0.62 0.5335 sqrtvbrg 0.24 0.6240 -0.13 | 0.8934 1 -0.13 I 0.4934 1 0.47 1 -1.07 0.2896 -1.06 yarg mdg 0.81 0.3703 -1.17 0.2462 -1.17 tmgcorr 0.16 0.6943 -0.13 0.8986 -0.13 Table B-6. Significance tests for reliable group mean differences.  p-value  1 d.f.  f o r  a  e  a  h  1 0.0017 0.7630 0.0005 0.0000 0.0000 0.0007 0.0015 0.1795 0.0000 0.0170 0.0000 0.0000 0.0128 0.0000  I 0.8599 0.8397 0.5977 0.4160 0.1160 0.3251 0.2903 0.2971 0.1802 0.6608 0.2364 0.4715 0.1630 0.0999  1 84.3 1 82.7 1 86.0 1 85.3 1 81.6 1 58.8 1 59.1 1 77.9 1 87.0 1 87.0 1 84.7 1 83.8 1 85.8 1 81.4 I 1 80.8 1 79.5 1 81.7 1 85.7 1 79.4 1 55.4 1 64.7 1 85.2 1 86.4 1 85.3 I 87.0 1 85.2 1 87.0 i 85.2 j  0.7188 0.9347 0.7916 0.8288 0.7672 0.8508 0.6088 0.2840 0.4413 0.5357 0.8932 0.2914 0.2435 0.8986  1 86.8 1 87.0 1 81.8 1 84.2 1 85.0 1 65.0 1 68.0 1 75.5 1 85.4 1 83.1 1 87.0 1 84.5 1 86.0 1 86.4  158  Group Class Gender Eye  Mahalanobis D Hotelling's T F Value 4.7735 107.4029 7.1222 0.5025 6.3231 0.4193 0.2749 11.3063 0.7498 Table B-7. The significance of the Hotelling's T test showing reliable group differences (d.f. 13,75). 2  2  p Value 0.0000 0.9583 0.7087  2  1 FACTOR 1 0.325 1 0.887 ! 0.853 ! 0.906 0.555 0.494 0.021 0.955 -0.671 j 0.926 -0.418 i 0.736 i 0.381 6.0756 0.4674  FACTOR 2 0.587 0.197 0.207 -0.198 -0.453 -0.72 0.58 0.237 0.297 0.204 0.535 0.375 -0.099 2.1484 0.6326  FACTOR 3 -0.407 -0.236 -0.366 0.353 0.529 0.137 0.73 0.071 -0.006 -0.179 0.537 0.269 0.205 1.7522 0.7674  FACTOR 4 0.5 0.009 0.058 0.023 0.25 0.265 -0.177 -0.027 0.465 -0.048 0.27 -0.292 0.533 1.0795 0.8504  ag eag sqrtabrg hie mhcg pheg hvc sqrtvbsg logvasg sqrtvbrg varg mdg tmgeorr Variance Explained (VP) Cumulative Proportion of Variance in Data Space Cumulative Proportion j 0.5495 0.7439 0.9024 1 of Variance in Factor Space Table B-8a. Principal components for the normal group, n=45, with associated statistics. Factors have unrotated loading coefficients.  I Factor 1 1 0.284 i 0.783 I 0.895 I 0.82 I 0.402 ! 0.451 ! -0.292 i 0.916 ! -0.609 1 0.939 I -0.671 j 0.608 1 0.667 ! 5.9711 j 0.4593 ; j i 0.5334  I Factor 2 ! -0.719 I -0.476 ! -0.297 I 0.477 1 0.778 I 0.789 i -0.109 j -0.286 ! -0.112 i -0.188 i -0.244 j -0.112 1 0.038 j 2.5017 j 0.6518  i  Factor j 3 i 0.097 i 0.034 i -0.147 j 0.294 ! 0.336 j 0.027 j 0.826 ! 
0.219 \ 0.217 ; -0.016 1 0.571 i 0.559 j -0.12 i 1.662 j 0.7796 j  Factor 4 0.588 0.186 0.102 0.033 0.312 0.351 -0.184 -0.086 0.502 -0.08 0.153 -0.353 0.013 1.0602 0.8612  ag eag sqrtabrg hie mhcg pheg hvc sqrtvbsg logvasg sqrtvbrg varg mdg tmgcorr Variance Explained Cumulative Proportion of Variance in Data Space Cumulative Proportion 0.9053 ! 1 | 0.7568 of Variance in Factor Space Table B-8b. Principal components for the both groups, n=89, with associated statistics. Factors have unrotated loading coefficients. FACTOR 1 0.619 0.789 0.937 0.282 -0.223 -0.068 -0.452 0.902 -0.447 0.927 -0.487 0.342 0.618 4.8285 0.3714  : FACTOR j 2 -0.481 -0.344 -0.111 0.862 0.847 0.909 j -0.251 j -0.064 I -0.3 | 0.146 ! -0.436 0.154 0.146 3.0641 0.6071  FACTOR I FACTOR 3 4 0.306 0.505 0.201 0.341 -0.124 0.221 0.398 0.009 0.345 0.316 0.087 0.361 0.666 ! -0.141 0.37 -0.164 0.389 j 0.4 0.063 -0.189 0.673 0.043 0.618 -0.59 -0.101 1 -0.058 2.0166 I 1.2462 0.7622 0.8581  ag | eag j sqrtabrg hie mhcg pheg j hvc 1 sqrtvbsg i logvasg j sqrtvbrg j varg | mdg I tmgcorr Variance Explained Cumulative Proportion of Variance in Data Space j 0.8883 j 1 Cumulative Proportion 0.4328 0.7075 of Variance in Factor Space Table B-8c. Principal components for the abnormal group, n=44, with associated statistics. Factors have unrotated loading coefficients.  Classification Matrix j ! % Correct j normal i abnormal 1 86.7 ! 86.4 j 80 ! 89.1 j 75.6 1 79.5 i 75.6 1 86.4 j 86.7 90.9 Table B-9. Discriminant  Data Set Adj. Ref Std. Ref PCA both PCA normal Final  Jackknife Validation % Correct j total normal abnormal i 86.5 I 84.4 86.4 84.6 77.8 87 77.5 75.6 79.5 ! 75.6 84.1 1 80.9 88.8 86.7 88.6 j function analysis results. ;  total 85.4 82.4 77.5 79.8 87.6  3, 87 df 3, 85 df jU-Statistic (Wilks'j Approximate I Lambda) i F-Statistic i p<o.ooi Adj. Ref 0.498611 28.491 Std. Ref 0.581395 31.68 PCA both 0.630614 25.188 PCA normal 0.649244 i 23.231 Final 0.532239 24.901 Table B-10. DFA statistics showing significance of function. I Step No. Adj. Ref 1 2 3 Std. Ref 1 2 PCA both 1 2 PCA normal 1 2 Final 1 2 3  i Variable  ! tmg  .  v a r g  hie  1  i Approximate F to Enter j U-Statistic j F-Statistic j D.F. j j | 56.9711 56.9721 0.6043 j 1,87 0.5313! 37.928! 2, 86 11.808! 5.579! 0.4986! 28.491! 3,85 j I j !  tmg mdg j  I pel pc2  55.367! 5.311! j 34.789! 11.419!  0.6165! 0.5814!  0.7143! 0.6306! j j pel 0.7327! 31.741 pc2 11.054! 0.6492! ! 1 I 1 tmgcorr 41.397! 0.6776! varg 16.338! 0.5694! hie 0.5322! 5.936: Table B-11. Variable entry statistics.  55.367! 31.68! i 34.789! 25.188! j 31.741 23.231! 1 41.397! 32.517! 24.901!  1,87 2, 86 1, 87 2, 86 1,87 2, 86 1,87 2, 86 3, 85  j Data Set j Adj. Ref ! Std. Ref i PCA both : PCA normal  j Variables j j F to Remove F to Enter Step 1 Step 2 Step 3 Rest tmg! varg! hie! 15.73! 7.27! 5.58! < 2.13 tmgj mdg! ! 54.02! 5.31! <1.06 pel! pc2! 50.18! 11.42! <0.96 pel! pc2] |  45.97! 11.05 j < 1.87 tmgcorr; varg! hicj 9.37! <3.4 10.19! 5.94! Table B-l2. F statistics for each variable entered into DFA. Final  Step  B  C  1 to 2  3  0  11  3  3.5  14  3  5.88  2 to 3 1 to 3  j  !  x  2  1.333  Table B-13. McNemar's X calculations for a p<0.05 with one d.f., B and C are number incorrect in one step and not the other. 2  ! ! ! j ! Centroids j ! Variable 1 j Variable 2 i Variable 3 1 Constant ; normal abnormal Adj. Ref hie! varg! tmg! Canonical Function -2.54238; 3.08371! -7.46081! -1.59215! 0.98037! -1.00265 Normals C.E. 14.96594! 18.88105! -35.8871! -9.06237! Abnormals C.E. 20.00756! 12.76597! -21.0921! -5.92719! 
Std. Ref tmgj 1 g! Canonical Function -1.80502; -11.8107! 0.84842! -0.82117! -0.82998 Normals C.E. 12.72031! -32.6664! -8.06513! Abnormals C.E. 15.74987! -12.8433! -6.6714! PCA both pel! pc2] Canonical Function -1.04752! -1.04327! 1.68983! 0.74824! -0.76525 Normals C.E. 0.78357! -5.01087! -8.80665! Abnormals C.E. ! 2.36898! -3.43189! -11.3771! PCA normal pel! pc2; Canonical Function -0.91974! 1.29041! 1.69261! 0.71859! -0.73493 Normals C.E. 1.23955! 7.08888! -11.2246! 2.57642! 5.21324J -13.6967J Abnormals C.E. tmgcorr! Final hie! vargj 3.68917! -6.18957! -1.58569! 0.91652! -0.93735 Canonical Function I -2.75338! -39.4653! -9.77284! Normals C.E. 16.68881! 19.68431! 21.793241 12.84504! -27.9906! -6.85248; Abnormals C.E. Table B-14. Coefficients for the classification functions from the DFAs. m d  162  Data Format Hidden Nodes ! sliO ! CV Rate Avg. Training Rate Specificity Sensitivity Adj. Ref, T 0 81.8 1 80 j 89.5 80 83.7 Adj. Ref, T 1 78.4 ! 80 ! 92.1 80 76.7 Adj. Ref, T 2 I 80 j 84.1 93.9 86.7 81.4 Adj. Ref, T 3 77.3 ! 80 j 97.6 80 74.4 Adj. Ref, T 4 76.1 I 80 j 98.6 80 72.1 Adj. Ref, T 21 I 80 j 83 94 84.4 81.4 Adj. Ref, T 22 78.4 ! 80 | 94.8 82.2 74.4 Adj. Ref, T 31 77.3 96.2 1 80 j 77.8 76.7 Adj. Ref, T 32 ! 80 j 77.3 96.8 80 74.4 Adj. Ref. 1 I 85 j 83.3 88.2 80 86.7 Adj. Ref. 2 I 85 j 86.7 88.4 86.7 86.7 Adj. Ref. 3 84.1 I 85 j 88.8 84.1 84.1 - Adj. Ref. 4 82.2 92.1 ! 90 j 82.2 82.2 Table B-15. Data for hidden node effects, 1 , 2 layer. Random number generator was seeded with 1, zero minimum training iterations, and randomized starting weights. Adjusted reference plane data was used; Adj. Ref. T, after statistical transformations (13 inputs) and Adj. Ref. Before (14 inputs). st  nd  Training % Min. j Seed j CVRate j Training Rate I CV Specificity CV Sensitivity 85 84.4 1 88.3 j 84.4 84.4 ! 9 i 85 84.4 i 88.5 86.7 j 82.2 I 6 i 85 j 5 j 86.7 j 88.4 I 88.9 j 84.4 85 88.5 82.2 j 82.2 1 4 1 82.2 i 85 84.4 j 88.5 84.4 84.4 ; 3 85 84.4 88.8 84.4 j 84.4 I 2 i 85 j 1 j 86.7 i 88.4 86.7 j 86.7 80 84.4 i 88.3 86.7 I 7 I 82.2 80 83.3 j 88.5 82.2 84.4 ! 6 j 80 j 83.3 j 88.4 84.4 1 5 1 82.2 80 j 88.7 82.2 84.4 I 4 1 83.3 80 j 1 81.1 I 88.3 77.8 84.4 80 87.8 j 87.8 86.7 88.9 i 1 80 j 1 j 83.3 88.0 ! 84.4 j 82.2 Table B-l6. Seed number effect; the number which seeds the random number generator before randomizing the starting weights. There was zero minimum training iterations, and 2 hidden nodes were used with randomized starting weights. The Adj. Ref. data was used (14 inputs). Format I Seed I CVRate 1 Training Rate [ CV Specificity I CV Sensitivity Norm'd 2 ! 73.3 i 96.4 1 73.3 I 73.3 Norm'd 1 74.4 96.3 68.9 77.8 Adj. Ref. i 2 j 84.4 88.8 ! 84.4 84.4 Adj. Ref. 1 1 86.7 i 88.4 ! 86.7 86.7 2 ! Std'd 78.9 I 94.9 I 75.6 82.2 Std'd 1 72.2 95 1 66.7 77.8 Table B-17. Data set format. Trained with a minimum training criteria of 85%, 2 hidden units in the one hidden layer, and zero minimum training iterations.  163  Data Format  Seed No., j CV Rate Training Rate j CV Specificity j CV Sensitivity Minimum Iterations Adj. Ref. 84.4 | 7,0 i 88.3 i 86.7 82.2 Adj. Ref. ! 7,200 82.2 88.4 1 82.2 I 82.2 Adj. Ref. I 6,0 83.3 i 88.5 82.2 84.4 Adj. Ref. [ 6,200 ! 83.3 88.6 86.7 80.0 Adj. Ref. i 5,0 ! 83.3 i 88.4 ! 84.4 ! 82.2 Adj. Ref. I 5,200 81.1 87.8 82.2 80.0 Adj. Ref. T | 84.1 1,0 ; 93.9 86.7 81.4 Adj. Ref. T 1,250 79.5 94.6 1 84.4 74.4 Table B-l8. Effect of setting a minimum to the number training iterations completed before testing. All training was done with 2 hidden units and an 80% training crtierion. 
The Adj. Ref. T data had target values of 0.1 and 0.9 compared with the values of 0 and 1 for Adj. Ref. Data Format i Connections CVRate j Training Rate CV Specificity 1 CV Sensitivity Adj. wAge lh, b-o 82.2 92.9 80.0 84.4 Adj. wAge lh 81.1 j 87.7 80.0 1 82.2 Adj. wAge lh, b-h,o 80.0 j 92.9 77.8 ! 82.2 Adj. Ref. T Oh, b-o 81.8 89.5 80.0 83.7 Table B-19. Effect of removing bias unit connections; connections are number of hidden units in 1 layer and the actual connections in place, b-bias, h-hidden, o-output. Seed number was 1, minimum iterations were 250, with an 80% criteria for training. st  Data Format Adj. Ref. T Adj. Ref. T Adj. Ref. T Adj. Ref. T Adj. Ref. T Adj. Ref. T Adj. Ref. T  i I !  i j ] I !  Minimum j % ; Normal j Abnormal j CVRate j Training j CV j CV Iterations j min! Rate j Specificity ! Sensitivity 0 1 85 ! 0.25 j 1 73.9 I 93.5 j 66.7 81.4 0 85 i 0.25 j 0.75 71.6 I 93.2 i 68.9 74.4 0 I 85 i 0.1 0.9 84.1 93.9 86.7 81.4 0 i 80.i 0.1 .{. 0.9 84.1 93.9 86.7 81.4 0 j 80 j 0 1 75 95.3 1 77.8 72.1 250 80 I 0.1 0.9 79.5 94.6 i 84.4 74.4 200 80 ! 0.1 j 0.9 ! 79.5 94.5 1 84.4 j 74.4 Table B-20. Effect of varying the target values for the groups. The seed was 1 and the number of hidden nodes was 2 in the 1 layer. st  Data Format i Criteria j Seed j CVRate Training Rate Adj. Ref. I 80 I 1 81.1 I 88.3 Adj. Ref. 80 1 87.8 j 87.8 Adj. Ref. 83.3 I 80 j 1 1 88 Adj. Ref. T 80 1 84.1 j 93.9 Adj. Ref. 86.7 I 88.4 I 85 ! 1 1 Adj. Ref. j 85-90 j 85.6 j 87.9 Adj. Ref. T I 85 i 1 j 84.1 j 93.9 Adj. Ref. 90 1 80 91.9 Adj. Ref. T I 5 j 77.3 80 j ! 95.3 84.4 Adj. Ref. 89.5 i 88 j 7 i Adj. Ref. 8 I 81.1 89.6 I 88 j Adj. Ref. 90 10 78.9 ! 91.7 Adj. wAge 80 51 j 78.9 1 95.7 Adj. wAge 85 51 78.9 95.7 Table B-21. Training criteria, % correct of training set.  I I j j  I \ j  j All  CV Specificity j CV Sensitivity 77.8 84.4 86.7 88.9 84.4 I 82.2 86.7 81.4 86.7 I 86.7 84.4 86.7 86.7 81.4 77.8 I 82.2 80 74.4 86.7 j 82.2 77.8 j 84.4 75.6 j 82.2 75.6 i 82.2 75.6 j 82.2 trained with 2 hidden units.  
