UBC Theses and Dissertations

A statistical analysis of follow-up data on untreated hypertensives Besler, Murray Jack Andrew 1982

A STATISTICAL ANALYSIS OF FOLLOW-UP DATA ON UNTREATED HYPERTENSIVES

by

MURRAY JACK ANDREW BESLER

B.A. Hon., University of Regina, 1976

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

in the Department of Mathematics

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
September 1982
© Murray J.A. Besler, 1982

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Mathematics
The University of British Columbia
1956 Main Mall
Vancouver, Canada V6T 1Y3

Date: September 28, 1982

ABSTRACT

Hypertension (commonly known as "high blood pressure") is a disorder whose effects - either direct or indirect - are now widely recognized as capable of creating serious health problems in the segment of the population so affected. Estimates of the size of this segment depend on such factors as the criteria employed in the diagnosis of hypertension, but an overall figure of around 10% of the adult population is commonly quoted - along with the observation that the prevalence is greater among blacks, the elderly, and males in general. The major targets of hypertensive damage are the heart, the brain, the kidneys, and the retinae; consequently, heart attack and heart failure, stroke, and kidney failure are common causes of death among these patients.
Unfortunately, the cause of the elevated arterial pressure can be "positively" identified in only some eight to twelve percent of all cases seen; the remaining patients are classified as "primary" or "essential" hypertensives. Research involving essential hypertension has been hampered by the complexity of the (as yet) poorly understood hemodynamic mechanisms of the body, and any sort of consensus here seems to be very slow in developing. Meanwhile, the disease is now being treated - upon discovery - with any one or a combination of three basic anti-hypertensive agents, all of which involve inconvenience, expense, and some rather unpleasant side effects. The wide-spread acceptance of these drugs has also had an unhappy effect on the evolution of our understanding of the clinical course of untreated hypertension, since now only the "mildest" cases may be studied in their natural form.

Thus, many questions concerning hypertensive disease remain without satisfactory answers; an appreciation of the degree of uncertainty surrounding many of the issues may be had by examining the work of as few as three or four authors on the subject. Some of the unresolved (or only partially settled) issues include: the relative prognostic significance of the various signs and symptoms that are often associated with high blood pressure; the inter-relations that exist among these signs and symptoms; the pattern of change of the various symptoms over time and the relationship of such changes to the patient's outlook; and, finally, the nature and extent of the part played by hypertension within the broader problem of cardiovascular disease in general - and atherosclerosis in particular.
Data on 48 male and 50 female primary hypertensives who received essentially no anti-hypertensive drug therapy, and who were followed until death (or for a maximum of ten years) are analyzed in this thesis, with the aim of illuminating the issues raised above - and others as well. Factors such as age, sex, blood pressure (average as well as extreme values), heart, brain, kidney, and retinal symptoms are examined, both in terms of their association with survival time as well as their inter-relations; sex differences and time trends are also explored. The bulk of the exploratory work is done with ordinary multiple regression and linear discriminant models. Considerable effort is devoted to attempts to reduce the instability that results from the presence of highly correlated variables within the prediction equation. Finally, the relatively new hazard function methodology proposed by Cox is used to obtain an objective and comprehensive formula for use in creating (or identifying) groups of patients who share a similar prognosis. Such a criterion would, for example, be helpful in designing a study to compare different forms of treatment of hypertension.

TABLE OF CONTENTS

ABSTRACT
LIST OF TABLES
LIST OF FIGURES AND ILLUSTRATIONS
ACKNOWLEDGEMENTS

Chapter
1 INTRODUCTION
2 A REVIEW OF THE MEDICAL PROBLEM
  2.1 Introduction and Definitions
  2.2 The Etiology of Hypertension
  2.3 The Epidemiology of Essential Hypertension
  2.4 Pathology
  2.5 The Clinical Course of Essential Hypertension
  2.6 The Prognosis of Essential Hypertension
  2.7 Treatment
3 THE SOURCE AND STRUCTURE OF THE DATA
  3.1 The Sample
  3.2 The Variables
4 AN OUTLINE OF THE STATISTICAL METHODS USED
  4.1 The Goals of the Analysis
  4.2 The Sequence of Analyses
    4.2.1 Life Table Analysis
    4.2.2 Analysis of Missing Data
    4.2.3 Preliminary Analyses
    4.2.4 Multiple Linear Regression
    4.2.5 Discriminant Analysis
    4.2.6 Logistic Regression
    4.2.7 Repeated Measures ANOVA
    4.2.8 Growth Curve Analysis
    4.2.9 Models for the Hazard Function
5 A SUMMARY OF THE RESULTS
  5.1 Preliminary Survival Analysis
  5.2 The Missing Data Problem
  5.3 Exploratory Stratifications
  5.4 Analysis of Time Zero Variables
    5.4.1 Preliminary Explorations
    5.4.2 Using Partially Re-expressed Data
    5.4.3 Principal Components Analysis
    5.4.4 Additional Analyses
  5.5 Predictions in Male-Female Strata
    5.5.1 Analyses Using Original and Partially Re-expressed Data
    5.5.2 Using Within-groups Principal Components
    5.5.3 Additional Analyses
  5.6 Male-Female Comparisons
  5.7 Analyses with Pulse Pressure and Range Variables
  5.8 Predicting Papilledema Symptoms
  5.9 Analyses using Time Two Data
    5.9.1 Complete Sample Results
    5.9.2 Analysis of Male and Female Subsamples
  5.10 Analyses using Time Five Data
    5.10.1 Time Five Symptom Variables
    5.10.2 Inclusion of Time Zero Data
    5.10.3 Repeated-Measures ANOVA
    5.10.4 Growth Curve Analysis
  5.11 Modelling the Hazard Function
    5.11.1 Searching for a Model
    5.11.2 Time Zero Data
    5.11.3 Time Two Data
6 CONCLUSIONS
  6.1 Statistical Methodology
  6.2 The Medical Issues
    6.2.1 The Original Problems
    6.2.2 Additional Problems
  6.3 Concluding Remarks

BIBLIOGRAPHY

APPENDICES
A LIKELIHOOD OF THE SAMPLE OF OBSERVED AGES
B DERIVATION OF "MINVAR" COEFFICIENTS
C DERIVATION OF PRINCIPAL COMPONENTS
D CORRELATION MATRIX FOR TIME-ZERO DATA
E CORRELATION MATRIX FOR TIME-ZERO DATA: MALES
F CORRELATION MATRIX FOR TIME-ZERO DATA: FEMALES
G CORRELATION MATRIX FOR WITHIN-GROUPS COMPONENTS: MALES
H CORRELATION MATRIX FOR WITHIN-GROUPS COMPONENTS: FEMALES
I CORRELATION MATRIX OF ORTHOGONALIZED WITHIN-GROUPS COMPONENTS: MALES
J CORRELATION MATRIX OF ORTHOGONALIZED WITHIN-GROUPS COMPONENTS: FEMALES
K ESTIMATED COEFFICIENTS OF THE GROWTH CURVE MODEL
L CORRELATION CHART FOR COMBINED SAMPLE, TIME ZERO DATA

LIST OF TABLES

Table
I Chicago Heart Association Detection Project in Industry (1967-1972)
II Ten-year, Age-adjusted Death Rates per 1000 Men
III Summary of Mortality by Severity Group
IV Summary of the Variables
V Comparison of Full and Conditional MLE's for X and y: (A) Severity Group 2 (outlier removed); (B) Severity Group 3
VI Preliminary Stratifications: Severity Groups 1 and 2 only
VII Ten Strongest Correlations (with Survival Time) and their Ranks
VIII Regression Results for the Combined Sample: (A) INTERV (n = 98); (B) INTERM (n = 98); (C) STATUS (n = 98); (D) TIME (46 non-survivors); (E) INTERM (Wt'd); (F) INTERV (severity groups 1 & 2, n = 83); (G) STATUS (severity groups 1 & 2); (H) TIME (33 non-survivors, severity groups
IX Biweight Regression Results for INTERV
X Discriminant Analysis Results: (A) Using a Random Subsample (n = 73); (B) Using the Entire Sample (n = 98); (C) Using "Normalized" Scores (n = 98)
XI Quadratic Discriminant Results
XII MINVAR-inspired Re-expressions: (A) MAXSYS and MINDIA Converted to Residuals; (B) MINDIA, KSPN0, RVES0 Converted to Residuals
XIII Regression Results using Re-expressions of Extreme Blood Pressures
XIV Regression Results using Re-expressions of Extreme Blood Pressures and Kidney Variables
XV Predicting INTERV with Partially Re-expressed Data
XVI Results of Stepwise Discriminant Analysis: MAXDIA, MINSYS, MINDIA Re-expressed
XVII Eigenvectors of 17 Standardized Time Zero Variables
XVIII Coefficients of Variables Obtained from Regression on Principal Components
XIX Principal Components within Groups of Symptoms
XX Correlations among Selected Components of Table XIX
XXI Regression Results for the Male Subsample (n = 48): (A) INTERV; (B) INTERM; (C) INTMED; (D) TIME (Wt'd); (E) INTERV (severity groups 1 & 2, n = 38)
XXII Discriminant Analysis Results (Male Subsamples): (A) Using a Random Subsample (n = 40); (B) Using the Entire Subsample (n = 48)
XXIII Regression Results for the Female Subsample (n = 50): (A) INTERV; (B) INTERM; (C) INTMED; (D) TIME (Wt'd); (E) INTERM
XXIV Discriminant Analysis Results (Female Subsample): (A) Using a Random Subsample (n = 42); (B) Using the Entire Subsample (n = 50) and Re-expressions
XXV Principal Components within Groups of Symptoms: Males
XXVI Principal Components within Groups of Symptoms: Females
XXVII All Subsets Regression Results for Males: INTERM on the Components of Table XXV
XXVIII All Subsets Regression Results for Females: INTERM on the Components of Table XXVI
XXIX All Subsets Regression Results for Males: INTERM on "Uncorrelated" Components
XXX Stepwise Discriminant Analysis Results for Males: Status by "Uncorrelated" Components
XXXI All Subsets Regression Results for Females: INTERM on "Partially" and "Fully" Orthogonalized Components
XXXII Stepwise Discriminant Analysis Results for Females: Status by "Uncorrelated" Components
XXXIII Comparison of Male and Female Subsamples
XXXIV Correlations and Summaries for Upper Half-range and Pulse Pressure Variables
XXXV Regressions with INTERM using Upper Half-range Variables: (A) Complete Sample (n = 98); (B) Males (n = 48); (C) Females (n = 50)
XXXVI Regressions with INTERM using Pulse Pressure Variables: (A) Males (n = 48); (B) Females (n = 50)
XXXVII Partial Correlation Analyses: (A) INTERM, SMXMM, DMXMM Given Age, MINSYS, MINDIA, (SEX); (B) INTERM, SR, DR Given Age, SYSO, DIAO, SEX (n = 42); (C) INTERM, PPMIN, PPMAX Given Age, SYSO, DIAO, (SEX)
XXXVIII Regression Results for INTERM with Adjusted MINIMA: (A) Males; (B) Females
XXXIX Regression Results using Time Two Data, Combined Samples: (A) STATUS (n = 83); (B) INTERV (n = 83); (C) TIME (31 non-survivors)
XL Regression Results using Time Two Data: Combined Sample, Target: INTERM (n = 80)
XLI Regression Results for INTERM using Time Two Data and Subsamples: (A) Males (n = 36); (B) Females (n = 44)
XLII Discriminant Analysis with Time Two Data: Male Subsample (n = 36)
XLIII Discriminant Analysis with Time Two Data: Female Subsample (n = 44)
XLIV Coefficients for Selected Within-groups Principal Components: Male Subsample
XLV Stepwise Regression using INTERM and Time Two Data Re-expressed as Within-groups Components: Males
XLVI Regression Equation for INTERM using Residual Components: Males
XLVII Stepwise Discriminant Results using Residual Components: Males
XLVIII Coefficients for Selected Within-groups Principal Components: Female Subsample
XLIX Stepwise Regression using INTERM and Time Two Data Re-expressed as Within-groups Components: Females
L Regression Equation for INTERM using Residual Components: Females
LI Stepwise Discriminant Results using Residual Components: Females
LII Regression Results using Time Five Variables in Judgment Components
LIII Importance of Time Zero and (Time Five minus Time Zero) Variables as Predictors (by Sex)
LIV Regression and Discriminant Analysis Results for Time Zero and (Time Five minus Time Zero) Data (by Sex)
LV Results of Repeated Measures ANOVA for 13 Symptoms
LVI Results of Growth Curve Analysis: Linear and Quadratic Coefficients
LVII Hazard Model Results: Combined Sample, Time Zero Data, 98 Cases
LVIII Hazard Model Results: Male, Female Groups, Original Time Zero Data, 48, 50 Cases
LIX Hazard Model Results: Female Group, Time Zero Residuals, 50 Cases
LX Hazard Model Results: Male, Female Groups, Time Zero, Orthogonalized Components
LXI Hazard Model Results: Male, Female Groups, Time Zero, Time Two Increment Data
LXII Hazard Model Results: Male, Female Groups, Orthogonalized Time Zero, Time Two Increments

LIST OF FIGURES

Figure
1 Typical Blood Pressure Chart of a Hypertensive Patient
2 Product-limit Estimate for S(t) (by Severity Group)
3 Life-table Estimates of Cumulative Hazard: (A) Severity Group 2 (outlier removed); (B) Severity Group 3
4 Truncated-Weibull Plot of Ages at Death: (A) Severity Group 2 (a = 35 years); (B) Severity Group 3 (a = 15 years)
5 Truncated-Weibull Plot of Known Durations: (A) Severity Group 2 (outlier removed); (B) Severity Group 3
6 Average Symptom at Years 0, 2, 5, by Sex and Status: (A) SYS; (B) DIA; (C) HSYM; (D) HSIZ; (E) HECG; (F) BACHE; (G) BCVA; (H) KSYM; (I) KPRO; (J) KSPN; (K) RVES; (L) RRET; (M) RPAP
7 Truncated-Weibull Plot of Follow-up Times: Males
8 Truncated-Weibull Plot of Follow-up Times: Females
9 Check for Linearity of X and lnX versus SEV

ACKNOWLEDGEMENTS

The present work owes much to the painstaking efforts of Dr. K.A. Evelyn and his associates, who designed and carried out the original follow-up study from which the data were derived. In addition, Dr. Evelyn, through his cooperation in answering questions and supplying literature on the medical issues, gave the present study of his data both direction and impetus in the crucial early stages. His passing during the final stages of preparation of this thesis is deeply regretted, and his expertise in the field of hypertension will be greatly missed.

I am also indebted to my supervisor, Dr. M. Schulzer, whose insight led to the revival of the present dataset and whose patience and persistence contributed much to its thorough exploration. Thanks are also due to Dr. J. Petkau who, with customary amiability, accepted a last-minute enlistment as a second reader.

Finally, the material contributions of the National Research Council of Canada and of the Department of Mathematics, U.B.C. are gratefully recognized.

Chapter 1

INTRODUCTION

Research into the causes, effects, and nature of high blood pressure began after 1910 - about the time the measurement of blood pressure in humans became possible. Although the prognostic significance of severe elevations of arterial pressure was fairly well known by the 1920's, it was not until the 1930's that Goldblatt's work, involving artificially-induced hypertension, opened the era of intensive experimental research into this disease.
In 1946, the first effective drugs for the treatment (but not necessarily the cure) of arterial hypertension became available; by 1950, it was unethical to withhold such drugs in severe cases, and the study of the natural history of untreated hypertension ended soon after. The termination of such studies at that time left many questions unanswered, since:

(a) hypertensive disease appears to have a 20-year duration on the average, and is asymptomatic in its early stages; since follow-up periods in excess of 15 years are rare, and since the onset of high blood pressure was believed for many years to be associated with advancing age, the later stages of the disease had received the most attention prior to 1950;

(b) the body's hemodynamic mechanisms are complex (and still incompletely understood today); this complexity includes an apparent sensitivity to many exogenous and endogenous stimuli, and has created a difficult problem in those areas of research where a universally accepted and operational definition of the disease entity itself is necessary. (See Chapter 2 for more details.)

In the last three decades, the ethical obligation to treat cases judged sufficiently "severe" has contributed to a shift in emphasis toward the early and "pre-hypertensive" stages of the disease, where it is still possible to observe untreated patients; however, patients are once again being studied during only a part of the course of the disease. In spite of such difficulties, though, several issues have been illuminated over the years.
Today, few authorities still believe that only severe hypertension is associated with a poor prognosis; in fact, the use of the term "benign" is now proscribed in discussions of "mild" degrees of hypertension, since increases in morbidity and mortality have been observed at all degrees of "abnormality" of the blood pressure. Many researchers have also recognized hypertension as a major risk factor in the sclerotic degeneration of the arteries; and epidemiologic studies have shown that over ten per cent of the adult population of the United States is significantly hypertensive. Thus, the gravity of this disease has been well defined.

Many factors have been found to be associated with hypertension: salt intake, obesity, and family history are but a few examples. In addition, many theories on the underlying causes of the elevated pressure have been proposed and are being debated in the literature. Notwithstanding such clues, however, the origins of the elevated pressure in the vast majority of cases must still be classified as "unknown".

Fortunately, new and better drugs for the control of high blood pressure have been discovered in the past 25 years. By 1967, the careful work of the Veterans Administration Cooperative Study Group [19, 20, 21] had eliminated most doubts about the effectiveness of these drugs in reducing the blood pressure and decreasing the risk of certain cardiovascular complications in patients having average diastolic pressures above 115 mm. Hg. However, even today, the evidence accumulated in support of a similar effectiveness at lower average pressures is not yet conclusive; as a result, physicians often hesitate to prescribe drugs whose side effects often appear to outweigh the benefits - especially when the average diastolic pressure is below 105 mm. Hg.
For this reason, and a variety of others as well, an estimated 87% of hypertensives do not have their blood pressure within the generally accepted "normal" range. Much effort is currently being devoted to public awareness programmes in an effort to publicize the potentially serious consequences of untreated hypertension.

Thus, a statement made by Geiger and Scotch in 1963 is still valid today:

"The disease continues to pose stubborn and complex problems for almost every basic and applied medical discipline - the biochemist and physiologist no less than the epidemiologist and internist." [24, p.1151]

A case in point is a problem that arises in studies comparing the effectiveness of different forms of therapy for hypertension. Evelyn et al. have remarked that "various authors employ a wide variety of criteria of therapeutic benefit in essential hypertension, without having arrived at general agreement as to the relative importance of each factor". [13, p.592] A very similar problem is discussed by Freis et al. [20, p.116] in relation to their definition of a severity index for hypertension; the index involves a weighted sum of the same "factors" referred to in [13], namely: blood pressure level, heart, brain, kidney, and retinal symptoms. What is not known in each case is this: how should medical experience with this disease be translated into a set of numbers (or weights) indicating the prognostic value of each factor when all relevant information on the patient is to be considered simultaneously?

First the medical framework of the problem will be established in some detail.
After describing the data to be analyzed, the present report will attempt to provide statistical answers to the problem discussed above, and others as well. Regression models (for the most part) will be used to analyze the data, which resulted from a ten-year follow-up study (completed before 1960) of 98 untreated hypertensives. The association of clinically-observable symptoms with survival time, as well as the inter-correlations of the symptoms themselves, will be examined in detail. Since examinations of the patients were made at four points in the course of the study (where possible) the data also provide information about the evolution of symptoms over time. Male and female subsamples will be analyzed separately, in view of the apparently different course of the disease in these two groups. Finally a model for the hazard function (force of mortality) will provide an objective criterion on which to base a severity index.

Chapter 2

A REVIEW OF THE MEDICAL PROBLEM

The goal of this chapter is to present a non-technical overview of some of the medical aspects of essential hypertension. Special emphasis will be placed on those areas that are required for a full appreciation of the statistical analyses of later chapters.

2.1 Introduction and Definitions

Hypertension is a general term referring to any abnormal elevation of arterial pressure. The rhythmical contraction of the heart creates a pressure pulse in the arterial system, with the maximum pressure occurring at the limit of contraction, or systole, of the left ventricle, and the minimum pressure being attained during the period of relaxation, or diastole, of this ventricle.
Using a sphygmomanometer - the cuff/auscultation method of measuring blood pressure - it is possible to obtain approximations to only the two extremes of pressure in the system. The systolic (maximum) pressure normally lies in the range of 110 to 130 millimeters of mercury, while the diastolic (minimum) pressure is usually between 75 and 85 mm. Hg in the normal, resting adult. The pulse pressure is defined as the systolic minus the diastolic pressure, and the mean arterial pressure (over the entire pulse) is estimated from flow studies as the diastolic plus one-third of the pulse pressure (that is, an average of the extreme pressures that puts twice as much weight on the diastolic).

A meaningful definition of what constitutes an "abnormal" elevation of blood pressure is much harder to give, and has, in the past, posed a serious problem for epidemiologists and others. An excellent summary of this issue was given by Goldring and Chasis back in 1944:

"It is difficult to set a fine dividing line between normal and abnormal arterial blood pressure because: (a) the systolic and diastolic pressures may vary 10 mm. Hg in a succession of five heart beats; (b) there is a marked variation among different examiners; (c) the level of blood pressure varies with conditions under which the measurement is made; (d) the blood pressure level varies with the criteria and methods employed; (e) the blood pressure level is the resultant of the interplay of dynamic variables, namely systolic discharge [the cardiac output per beat], heart rate, elasticity of the central arteries [especially of the aorta], peripheral resistance [controlled largely by the arterioles], and viscosity and volume of the blood . . . variables that obviously cannot be controlled in every instance." [25, p.8] (Comments within square brackets have been added for clarity.)

The links among points (a), (c), and (e) above may be clarified by consideration of the activities of the cardiac and vasomotor reflex centres, all of which are located in the medulla oblongata of the central nervous system. These centres send out nervous impulses which regulate cardiac output and the calibre of the small arteries and arterioles; the centres themselves are stimulated by afferent impulses initiated at various pressure- and chemical-sensitive sites in the body, as well as by higher centres in the brain. Furthermore, the tone of the arterioles may be affected directly by such local factors as temperature, oxygen concentration, and the presence of hormones produced by the endocrine glands (especially the adrenal glands). Hence, the mean level of arterial pressure varies greatly in response to such common stimuli as exercise and emotions; the elevation of the systolic pressure in either case may be as much as 40 mm. Hg.

Such considerations have an especially great effect on the interpretation of casual blood pressures, as recorded (by sphygmomanometer) in the clinic or doctor's office. (The term "casual blood pressure" is used in opposition to "resting blood pressure", which usually implies that the patient has been confined to bed for a considerable period).
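
The definitions of pulse pressure and mean arterial pressure given earlier in this section reduce to simple arithmetic. The following is a minimal sketch in Python and is not part of the original thesis; the function names and the sample reading of 120/80 mm. Hg are illustrative assumptions.

```python
def pulse_pressure(systolic, diastolic):
    """Pulse pressure (mm Hg): systolic minus diastolic pressure."""
    return systolic - diastolic


def mean_arterial_pressure(systolic, diastolic):
    """Estimated mean arterial pressure (mm Hg).

    Diastolic plus one-third of the pulse pressure, which is the same as
    a weighted average of the two extremes with twice as much weight on
    the diastolic: (systolic + 2 * diastolic) / 3.
    """
    return diastolic + pulse_pressure(systolic, diastolic) / 3.0


if __name__ == "__main__":
    # Hypothetical reading, chosen only for illustration.
    sys_bp, dia_bp = 120.0, 80.0
    print("pulse pressure:", pulse_pressure(sys_bp, dia_bp))
    print("mean arterial pressure:", mean_arterial_pressure(sys_bp, dia_bp))
```

The two forms in the docstring are algebraically identical, which is why the estimate always lies closer to the diastolic than to the systolic reading.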
But even if "casual" effects were completely eliminated, the measurement of arterial pressure by the usual method would still be subject to certain criticisms:

(a) the influence of arm circumference is significant, as shown by comparison of sphygmomanometric readings with more direct measurements of arterial pressure (that is, involving actual intra-arterial pressure recordings, such as with the use of the sensitive pressure transducer [23, p.8]): the former method tends to overestimate the pressure in obese individuals;

(b) observer error is also important, since the auscultatory method depends on determining the moment of appearance and disappearance of certain sounds produced within the partially compressed brachial artery;

(c) observer bias may manifest itself, either through digit preference and rounding off, or else through the technique of multiple determinations of the pressure in search of a "normal" reading; the doctor may also be influenced by knowledge of the patient's condition.

Finally, even when adjustments have been made for both physiological fluctuation and error in measurement, it is apparent that there is no convenient gap separating "high normal" from "mildly elevated" blood pressures. Epidemiologic surveys and life insurance mortality studies have been evoked in defence of significantly different points of view in this regard: the view supported by the epidemiologic studies relies mainly on large-sample information concerning the usual range of casual pressures in various age/sex groups; the life insurance studies, on the other hand, attempt to establish ideal limits based on the prognostic significance of even "mild" elevations of blood pressure.
In the final analysis, however, the choice of a dividing line is necessarily arbitrary, and "a commonly used criterion of hypertension is a reading of over 145 systolic and/or 95 diastolic, presumably on the basis of a single measurement taken in the lying position after the patient has rested in this position for a few minutes" [12, p.243]. This definition reflects the view that an elevated diastolic pressure should not be the sole defining criterion for hypertension - although, of course, it may be.

2.2 The Etiology of Hypertension

From the foregoing discussion, it might be suspected that hypertension may at times be related to any of a number of conditions in the body. When an underlying cause can be positively identified, a diagnosis of secondary hypertension is made; otherwise, the abnormal condition is simply referred to as primary, or essential hypertension. (Another name for essential hypertension is "hypertensive vascular disease".) Notwithstanding the many advances that have been made in the study of the causes of secondary hypertension, it is estimated that over 90% of all cases of hypertension must be categorized (by the process of exclusion) as essential hypertension.

According to a well-known principle of hydraulics, under ideal conditions the pressure in a tube is directly proportional to both the flow through the tube and the resistance offered by the tube; having observed that cardiac output (flow) is within normal limits in the majority of cases, researchers have consequently directed most of their attention to studying the causes of increased resistance in the essential hypertensive.
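
The hydraulic principle just cited can be illustrated numerically. The sketch below is not from the thesis: it assumes the idealized relation pressure = flow x resistance, and the function name, the units, and the sample values (mean pressures of 93 and 130 mm. Hg at a cardiac output of 5 litres per minute) are hypothetical, chosen only to show that at a fixed, normal flow a raised pressure implies a proportionally raised resistance.

```python
def peripheral_resistance(mean_pressure, cardiac_output):
    """Resistance implied by the idealized relation pressure = flow * resistance.

    mean_pressure is in mm Hg and cardiac_output in litres per minute, so the
    result is in mm Hg per (litre/minute). Both units are illustrative choices.
    """
    if cardiac_output <= 0:
        raise ValueError("cardiac output must be positive")
    return mean_pressure / cardiac_output


# Hypothetical values: the same (normal) flow at two different mean pressures.
flow = 5.0
normal_resistance = peripheral_resistance(93.0, flow)
elevated_resistance = peripheral_resistance(130.0, flow)

# With flow held fixed, resistance scales in direct proportion to pressure.
ratio = elevated_resistance / normal_resistance

if __name__ == "__main__":
    print("resistance ratio:", ratio)
```

Because the flow cancels in the ratio, the comparison depends only on the two pressures, which mirrors the reasoning that normal cardiac output points to resistance as the abnormal term.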
Since the arterioles effectively control peripheral resistance, and since the smooth muscle in the walls of these small blood vessels responds to a number of constrictor stimuli, the following theories of the origination of essential hypertension have been put forth: (a) the arterioles of the subject with hypertensive vascular disease show a "hyper-reactivity" to constrictor stimuli - either due to a defective system of destruction or binding of the stimulating substances, or else as a result of a basic abnormality of the smooth muscle metabolism itself; the unknown defect is thought to be inherited; (b) changes in renal metabolism may lead to increased production of a substance in the kidney called renin; in turn, renin eventually leads to increased amounts of the powerful vasoconstrictor, angiotensin II; (c) changes in the adrenal cortex may produce increased amounts of aldosterone, which eventually leads to either increased tonicity of smooth muscle or to its hyper-reactivity to normal concentrations of norepinephrine (noradrenaline); (d) the renin-angiotensin-aldosterone theory, which combines the major premises of (b) and (c) above; (e) the system's baroreceptors (automatic pressure regulators located primarily in the aorta and carotid arteries) may become "reset" at higher levels [24, pp.1158-1160]. Unfortunately, none of these formulations has yet received general acceptance, and both the controversy and the research continue.

From this point forward, major emphasis will be on essential hypertension since, clearly, this is by far the most important form of the disorder and is not directly "treatable".
Nevertheless, it may be enlightening, in view of the preceding paragraph, to summarize the most common known causes of secondary hypertension as follows: (a) nonrenal causes: overactivity of the adrenal cortex; overactivity of the adrenal medulla (pheochromocytoma); pregnancy (as in toxemia of pregnancy); coarctation (narrowing) of the aorta; toxins (such as lead); contraceptive pills; (b) renal causes: acute and chronic glomerulonephritis (inflammation of the glomerulus, the blood-filtering unit of the kidney); pyelonephritis (inflammation of the kidney and pelvis); diabetic kidney; polycystic kidney.

2.3 The Epidemiology of Essential Hypertension

Numerous epidemiologic studies of high blood pressure have been carried out in various parts of the world; unfortunately, the use of a variety of measurement conditions and cut-off points (for classification of an individual as "hypertensive") reduces the comparability of these studies and ultimately their value - to some degree. Another problem involves the definition and goal of the study itself - that is, whether it is a study of high blood pressure in general, or of essential hypertension in particular; it should be noted that tests for the various known causes of elevated blood pressure are generally time-consuming, and are successful in demonstrating the existence of one of the conditions listed at the end of section 2.2 in only a small fraction of hypertensives. Notwithstanding such problems, however, certain patterns have become apparent, and suggest that the prevalence of elevated arterial pressure varies greatly over the world and seems to be associated with race, and possibly environmental factors, as well.
Fairly recent estimates, based on national health surveys in the United States, suggest that more than 10% of the adult population is hypertensive. The strength of the race factor within the United States is apparent in the prevalence rates of hypertensive disease among individuals aged 18 to 79 years: the figure is 9% for whites and 16% for blacks, based on the World Health Organization's criterion of a diastolic pressure that is consistently 95 mm. Hg or higher. Figures obtained through the Chicago Heart Association Detection Project in Industry (1967-1972) are summarized in Table I [44, p.4]. The importance of age, race, and sex factors in the prevalence of hypertension in the United States is made clear by this table.

In addition to these, many other factors have been found to be associated with hypertensive vascular disease. A short description of them follows [see 24, 44, 12]: (a) family history: the hereditary nature of essential hypertension has been fairly well established, in spite of problems arising from possible effects of a common environment; (b) occasional high readings in youth: this factor may be associated with the "hyper-reactivity" (to the various stimuli that normally induce an elevation in blood pressure) which is believed by some researchers to be the forerunner of established hypertension; (c) obesity and salt intake: the tendency to develop high blood pressure seems to be proportional to the gain in weight from the young adult stage to middle age; reduction in salt intake often lowers the blood pressure, regardless of any simultaneous weight loss; (d) body build: clearly related to points (a) and (c), this factor is described as the tendency of individuals with a short stocky build to develop high blood pressure; (e) psychosomatic factors and urbanization:
notoriously difficult to measure quantitatively, these factors represent an attempt to account for the effects of "stress" on the level of arterial pressure; most researchers are presently reluctant to assign a significant role to such factors in the development of hypertensive vascular disease.

Table I: Chicago Heart Association Detection Project in Industry (1967-1972): Prevalence rate (percent) of high blood pressure, defined as systolic > 160 and/or diastolic > 95 mm. Hg. (The project involved 29,153 employees of almost 100 companies. Sample sizes varied from 69 black women aged 55-64 to 4,946 white men aged 25-34.)

                      AGE GROUP (Years)
RACE    SEX       25-34   35-44   45-54   55-64
White   Male       10.0    15.7    26.0    36.3
White   Female      3.5     8.7    18.1    29.3
Black   Male       14.0    20.6    36.2    48.3
Black   Female      5.2    14.9    28.9    35.8

It should be evident from even this incomplete list that essential hypertension poses formidable problems to the research worker, regardless of the particular approach adopted in studying this common disorder.

2.4 Pathology

Many vascular and organic changes are known to be associated with essential hypertension. It is still a matter of controversy, however, as to exactly how these changes are related to the original (but unknown) disorder; moreover, the controversy is likely to persist until more is known about the pathogenesis of vascular disease in general.
For the time being, therefore, it is not clear whether some changes should correctly be regarded as an integral part of the syndrome of hypertensive vascular disease, or whether they are, in fact, secondary to the elevated arterial pressure (and thus indirect effects of the original disorder), or, finally, whether the changes are merely coincidental - as a result of a general susceptibility to vascular disease, for example. The following statement, made by W.S. Peart in a fairly recent textbook of medicine, seems to indicate the current trend of opinion regarding the foregoing issue:

"It seems quite clear that most manifestations of disease in patients with raised arterial pressure are the consequences of, or are made worse by, the presence of raised pressure. This applies, for example, to the incidence of atheroma and other vascular disease. ... The fact that some patients withstand high arterial pressure without showing very many significant changes in their vascular systems compared with other patients with lower pressures and more serious vascular diseases does not invalidate this thesis. One individual's blood vessels may be of better quality than another's." [41, p.981]

By drawing attention to the absence of a close correlation between blood pressure level and the rate of progression of other changes, statements such as that above may also lead one to ponder the meaning of the term "severity" as applied to essential hypertension. (Indeed, the definition of the disease entity itself remains obscure.)
Thus, while every effort should be made to distinguish between the degree of elevation of the blood pressure (severity of hypertension) and the overall degree of deterioration of the patient's condition (severity of hypertensive vascular disease), such an effort is not consistently made by all authors. These problems must be borne in mind while reading any detailed discussion of essential hypertension.

Patients whose arterial pressures are consistently above 170-180 mm. Hg systolic or 100-110 mm. Hg diastolic (roughly) are very likely to develop cardiac complications if this degree of hypertension is allowed to continue unchecked for several years. The left ventricle of the heart is almost always involved here since it is continually called upon to pump blood against the higher pressure in the vascular tree; this results in hypertrophy (an increase in size) of the muscle fibres in the left ventricle, causing the ventricle wall to become thicker. Dilatation (stretching, to cause a permanent increase in diameter) of the left ventricle follows the limiting phase of hypertrophy, and this is thought by some to be the heart's final attempted adjustment to the increased (or increasing) level of arterial pressure: after that, normal stroke volume may no longer be attained if the blood pressure remains elevated or increases still further, or if the blood supply to the heart muscle itself is somehow impaired; as the pressure "backs up" through the left heart, then through the pulmonary system and eventually into the venous side of the circulation, the signs and symptoms of congestive heart failure (see Section 2.5) will be seen.
Although details of the exact mechanism are still being sought, this same degree of hypertension is also thought to lead to, or at least promote, arteriolosclerosis, a condition in which the arteriole becomes thicker and stiffer, and the lumen (the space within the vessel) is narrowed. The thickening is mainly due to hyperplasia (overgrowth) of connective tissue in the inner lining of the vessel, although hypertrophy and proliferation of smooth muscle cells, as well as their migration to the inner lining (intima), may also be involved [43]. Significant narrowing of the lumens of the glomerular arterioles can lead to a gradual decline in the performance of the kidney as the individual filtering units - the nephrons - become impaired. In the eye, the thicker walls of the arterioles in the retina may be distinguished in two ways: (a) the clearly-visible column of blood within the arteriole is narrower; and (b) the sclerotic arteriole compresses the venule where the two cross - a phenomenon known as arteriovenous (AV) nicking.

One of the principal themes of the 1975 international symposium on hypertensive vascular disease, held in Liege, is evident in the following statements by one of the participants, Simon Koletsky:

"It is now widely accepted that high blood pressure is a major factor in initiating and promoting the development of atherosclerosis. Many clinical and experimental observations support this view."
[43, p.4]

However, universal acceptance of such statements awaits much more research into the details concerning the development of atherosclerosis, a vascular disorder characterized by deposits of cholesterol and other fats in the walls of the major arteries. E.D. Freis has stated that "atherosclerosis of the coronary arteries causes more deaths in this country [the United States] than any other disease" [23, p.76]. Most of the deaths are a result of myocardial infarction, a condition in which part of the heart muscle dies because the arteries supplying it are blocked off (usually by a blood clot).

The most severe form of hypertensive vascular disease is often called malignant or accelerated hypertension to emphasize the serious and rapidly-progressive nature of the changes associated with it. This rare form of the disease is also characterized by a more consistent relationship between arterial pressure level and the rate of vascular and organic deterioration, as Peart notes in the following:

"Despite occasional suggestions to the contrary, the best evidence points to the malignant phase (necrotizing or accelerated phase) of hypertension as being mainly related to the absolute height of the mean pressure or the rate of rise of the mean pressure. Efforts to segregate malignant hypertension as a peculiar disease in its own right have been unrewarding..." [41, p.982]

Usually, sustained diastolic pressure levels above 120 mm. Hg are required to touch off the series of rapid changes in the cardiovascular system that, if left untreated, often lead to death within two or three years. In some instances, vigorous methods of treatment are capable of substantially improving this very grim prognosis [7].
The major pathological change associated with malignant hypertension is fibrinoid necrosis, which refers to death of smooth muscle cells in the walls of the arterioles, after which the degenerated cells resemble a plasma protein called fibrin. If the weakened wall of the arteriole so affected allows red blood cells to pass through it, small hemorrhages will occur around the arteriole; plasma leakage, on the other hand, causes edema around the necrotic arteriole. Furthermore, clots may form within the arteriole, partially or completely blocking it off.

The arterioles of the brain and kidney are particularly susceptible to fibrinoid necrosis. In the brain, a condition known as "acute hypertensive encephalopathy" develops as a direct result (it is believed) of multiple small hemorrhages, focal or generalized edema and severe constriction of the cerebral arterioles. The kidneys undergo an accelerated form of the changes mentioned earlier with respect to arteriolosclerosis, as the glomeruli, each supplied by a single arteriole, degenerate in the absence of a sufficient blood supply. Examination of the retinae with the use of an ophthalmoscope often provides a useful clue as to the state of the arterioles elsewhere in the body: the hemorrhages observed there are often streaky in appearance, while the sites of plasma leakage are soft and fluffy-looking and are thus called "cotton-wool" or "soft" exudates. The head of the optic nerve may also develop edema, turn a darker pink colour, and become noticeably elevated; this condition is called papilledema, and is almost always seen in fully-developed cases of accelerated hypertension.
When the disease has reached this stage, the prognosis of death within six months or a year (usually due to uremia, a toxic condition associated with retention in the blood of nitrogenous substances that are excreted by the normal kidney) is quite consistent unless intensive treatment is begun immediately.

2.5 The Clinical Course of Essential Hypertension

It is now commonly believed that the hypertension associated with hypertensive vascular disease usually begins before the age of 40 - that is, this disorder is no longer to be associated with only middle or old age. It is true, however, that most of the morbidity and mortality resulting from (or associated with) essential hypertension becomes apparent - in the absence of severe hypertension - only during the later stages of the course of the disease. Because the earliest stages of this disease are so devoid of symptoms, it is notoriously difficult to document its onset (except in rather rare cases); this reason, and the belief (nevertheless) that essential hypertension has an average duration of about 20 years, may explain why a complete, detailed picture of the natural history of this disorder is not available.

The term, "variability", is central to any description of the clinical course of hypertensive vascular disease. The striking variability in severity has already been alluded to in the previous section, in connection with accelerated hypertension; at the other extreme, one finds cases with "benign" hypertension, which may involve only mild health problems during its course of more than 30 years.
Moreover, "the degree of involvement of different organs may be so unequal that the clinical picture in a given patient may be completely dominated by the manifestations of damage to one or the other of the vital systems, so that "cardiac", "cerebral" and "renal" forms of the disease are sometimes described" [12, p.245]. The signs and symptoms presented by the hypertensive patient and observed by the clinician may also have various interpretations in terms of the underlying physiological changes that are capable of producing them. It is therefore essential to interpret a given clinical manifestation in the light of the results of a complete work-up of the patient.

K.A. Evelyn [15] has made an extensive study of the pattern of change in blood pressure over time. Figure 1 is a reproduction of "an unusually complete example of the type of blood pressure chart that can sometimes be obtained by pooling the records of several physicians. . . [Figure 1] illustrates most of the features which are characteristic of patients who are currently considered to be examples of essential hypertension" [15, p.93].
The most important characteristics are the following: (a) even over small time intervals, the blood pressure shows a considerable amount of lability, and this variability seems to increase with the mean level for both systolic and diastolic pressures; (b) there is a phase during which the blood pressure is well within "normal" limits; this phase ends around the forty-second year in the case depicted in Figure 1; (c) following this "normotensive" phase is a period of transition, in which only occasional readings are above 140/90; (d) the transition phase gives way to a period of fairly steady increase in both systolic and diastolic pressure (ages 48 to 51 in the example given); note that the diastolic increases more slowly than the systolic, so that a simultaneous rise in pulse pressure is also apparent here; (e) both blood pressure readings seem to "level off" (after the age of 52 in Figure 1), although a great deal of fluctuation about the mean is common.

[Figure 1: Typical Blood Pressure Chart of a Hypertensive Patient (showing the four phases). Vertical axis: blood pressure, scale marked at 100, 200 and 300; horizontal axis: age, 38 to 64. Note: Diastolic pressure is shown with open dots, systolic with full dots.]

Evelyn presents other charts in support of his view that this "plateau effect" is a consistent finding (and one of the very few) in hypertensive men and women of all ages, and of various degrees of blood pressure elevation.
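Point (d) above rests on simple arithmetic: pulse pressure is the difference between systolic and diastolic pressure, so it widens whenever the systolic rises faster than the diastolic. A minimal Python sketch follows; the function name and the readings are hypothetical, chosen only for illustration, and are not taken from Figure 1.

```python
def pulse_pressure(systolic, diastolic):
    """Return pulse pressure (mm Hg) as systolic minus diastolic."""
    return systolic - diastolic

# Hypothetical readings as (age, systolic, diastolic); NOT data from Figure 1.
# Systolic climbs 15 mm Hg per year, diastolic only 5 mm Hg per year.
readings = [(48, 150, 95), (49, 165, 100), (50, 180, 105), (51, 195, 110)]

pulse_by_age = {age: pulse_pressure(s, d) for age, s, d in readings}

for age, pp in pulse_by_age.items():
    print(f"age {age}: pulse pressure {pp} mm Hg")
```

With these made-up values the pulse pressure grows each year, which is the pattern described for the phase of steady increase.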
These findings take on even greater significance when it is noted that the clinical record of the evolution of hypertensive vascular disease has been, and remains, heavily dependent upon sphygmomanometric readings - almost to the point where the most easily-measured symptom, hypertension, is equated with the (rather vague) underlying disease process. However, the blood pressure may fall to an approximately normal level during the course of the disease, sometimes as a result of cardiac complications, but also for no known reason in some cases. Such irregularities, as well as other problems with over-emphasis on arterial pressure readings, will be further discussed in Section 2.6.

The signs and symptoms associated with damage to the four main target organs (the brain, the heart, the kidney, and retina) of the hypertensive patient are often placed in two different categories: one contains those signs and symptoms which are believed to be a direct result of the elevated blood pressure; the other includes those manifestations which may be more directly related to the atherosclerosis that often accompanies hypertension.

Among the many kinds of headache complained of by hypertensives, it is felt that only the variety that occurs upon awakening in the morning and is located in the back of the head (occipital headache) is directly related to hypertension; this same view holds that true hypertensive headache is rare in the absence of severe hypertension (diastolic levels above 120 mm. Hg) [41]. In cases of greater overall severity (that is, involving vascular damage), the syndrome of hypertensive encephalopathy may be observed.
Of a more transient nature than the acute form discussed in the previous section, this syndrome is probably caused by intermittent severe contractions of the cerebral vessels in association with high arterial pressure [41]; it manifests itself through prolonged, unusually severe occipital headache, dizziness, nausea, vomiting, transient paralysis or numbness, transient loss of vision, and even coma and convulsions [12]. Major cerebral events may occur as well and are of two main kinds: those involving hemorrhage - either cerebral or subarachnoid (beneath the middle of three membranes that cover the brain and spinal cord); and those involving thrombosis (a blood clot) - a result which is felt to be more directly related to the existence of atherosclerotic plaques in the cerebral or the carotid arteries.

As noted in the last section, cardiac complications are common in hypertensives, and are of two main kinds: (a) the sequence of hypertrophy, dilatation, and eventual failure of the left ventricle, due to its increased work load; and (b) local, temporary obstruction of blood flow (ischemia) to the myocardium (heart muscle), resulting in angina pectoris (pain and oppression around the heart), or even infarction. A number of symptoms are related to these complications and in some cases, difficulty arises in distinguishing their origins.
Dyspnea (difficult, laboured breathing), cough, increased nocturia (urination during the night), and eventually swelling of the feet are some of the early signs of congestive heart failure, while orthopnea (breathing almost impossible while in the lying position), paroxysmal nocturnal dyspnea (sudden, periodic attacks of severe dyspnea during the night), and, later, pulmonary edema occur as the condition progresses. Angina pectoris may also be directly linked to the high arterial pressure [41], although it is customarily attributed to sclerotic narrowing of the coronary arteries. Abnormalities of the electrocardiogram (ECG) are more reliable indices of the early stages of left ventricular hypertrophy than are the results of roentgenographic examination [23], since often only the more obvious changes caused by dilatation are noticed in the chest x-ray.

Freis has noted the absence of "readily applied tests to detect minor degrees of renal damage" [23, p.104]. Also, minor degrees of renal impairment often accompany hypertension of "moderate" severity (170-180/100-110 and more), while the more serious changes, including uremia, are usually part of the syndrome of accelerated hypertension. Thus, kidney symptoms, when they are detectable, are likely to carry serious prognostic significance. The overall condition of the kidneys is generally studied by examining the patient for nocturia (refer to the last paragraph for another interpretation), proteinuria (protein, usually albumin, in the urine), reduced urinary concentrating power (specific gravity), the ability to excrete phenolsulfonphthalein (PSP), and for the amount of non-protein nitrogen in the blood (NPN).
In the case of the retina, the pathological changes and the signs and symptoms overlap to a great degree (see Section 2.4). In summary, the arteriosclerotic changes include increased reflection of light by the arterioles, and arteriovenous nicking - both a result of stiff, thick-walled vessels. So-called "benign" hypertensive abnormalities include generalized or localized arteriolar narrowing and twisting, while the more severe changes involve some degree of retinopathy (retinal hemorrhages and edema, "soft" and "hard" exudates), and eventually the formation of a star-shaped figure radiating out from the macula (a yellow spot on the retina). Papilledema is often seen in cases of extremely elevated arterial pressure, but the latter condition is not necessary for the appearance of this symptom.

The limitations of the present review make it impossible to give more detail on the many and varied manifestations of hypertensive disease. However, the foregoing should be sufficient for an understanding of the variables and their statistical analysis that are presented in following chapters.

2.6 The Prognosis of Essential Hypertension

Life insurance studies, many of them done before 1960, established (in large samples of individuals) the existence of a strong correlation between casual blood pressures and subsequent mortality rate. Table II [44, p.7] shows the ten-year age-adjusted rates per 1000 men, both for coronary heart disease deaths and all causes of death combined. Such studies were largely responsible for dispelling any lingering beliefs that mild elevations of blood pressure are "benign" in nature.
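One way to read Table II is to express each group's death rate relative to that of the lowest diastolic group. The Python sketch below does this with the rates printed in Table II; the variable names, the choice of the lowest group as reference, and the reading of the top group's label as "105 and over" are assumptions made here. The ratios are purely descriptive, with no allowance for sampling error.

```python
# Diastolic pressure groups (mm Hg) and 10-year age-adjusted death rates
# per 1000 men, as printed in Table II. The last label is an assumed reading.
groups = ["<75", "75-84", "85-94", "95-104", ">=105"]
all_deaths = [50, 54, 82, 84, 158]
chd_deaths = [26, 20, 38, 45, 70]

def rate_ratios(rates, reference_index=0):
    """Divide every rate by the rate of the chosen reference group."""
    reference = rates[reference_index]
    return [rate / reference for rate in rates]

all_ratios = rate_ratios(all_deaths)
chd_ratios = rate_ratios(chd_deaths)

for group, a, c in zip(groups, all_ratios, chd_ratios):
    print(f"{group:>7}: all causes x{a:.2f}, CHD x{c:.2f}")
```

On these figures the highest diastolic group has roughly three times the all-cause death rate of the lowest group, while the 75-84 group has a slightly lower coronary death rate than the reference group.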
However valuable such data may be, though, Evelyn [15] has pointed out three shortcomings of the actuarial approach: (a) the samples studied did not include the more severe grades of hypertension, since such cases were felt to be very poor insurance risks; (b) casual blood pressure readings are subject to the sources of variation discussed in Section 2.1, and occurrences of very large drops in arterial pressure upon hospitalization are not rare; (c) the effects of the often co-existing atherosclerosis in the hypertensive must also be taken into account in the prognosis. In summary, Evelyn states that there is "clinical evidence that the height of the casual blood pressure in patients with substantial degrees [footnote 3] of hypertension is rather a poor guide to the prognosis of the individual" [15, p.98].

Table II: 10-Year, Age-Adjusted Death Rates Per 1000 Men, National Cooperative Pooling Project. White Males, Age 30-59 at Entry. Sample Sizes for Each Blood Pressure Group Given in Parentheses.

                   DIASTOLIC PRESSURE (mm. Hg)
               < 75     75-84    85-94    95-104   ≥ 105
All Deaths       50       54       82       84      158
Chd Deaths       26       20       38       45       70
              (1271)   (2752)   (2125)    (940)    (493)

NOTE: Chd stands for "Coronary Heart Disease".

Attempts have been made to correlate the various kinds and degrees of cardiovascular complications with the outlook for the hypertensive patient. Through a study of 250 unselected and untreated hypertensives, G.A. Perera drew the following conclusions in his 1948 article:

"In general, the initial height of blood pressure, headaches and other symptoms, cardiac hypertrophy and retinal changes bore no relationship to ultimate outlook.
Progressive rise in blood pressure, cerebral vascular accidents, retinitis, coronary artery disease, congestive failure or albuminuria indicated a relatively poor prognosis though with notable exceptions." [42, p.22]

Other authors have also noted the somewhat ambiguous nature of the changes in the retinae; the problem seems to arise from the observation that, while severely elevated blood pressure often involves serious retinal manifestations, the reverse statement seems to be valid less frequently. Moreover, even hemorrhages, soft exudates, and papilledema can recede "spontaneously".

Footnote 3: Blood pressure levels exceeding about 180-190/110-120 mm. Hg are implied here.

Evelyn, while working with 98 untreated hypertensives (as part of a study on the value of a certain surgical procedure for essential hypertension), became convinced "of the dependence of the prognosis of hypertension on the presence of vascular complications rather than on the height of the casual blood pressure" [15, p.109]. This series of 48 men and 50 women with rather severe grades of hypertensive vascular disease was divided into three groups, based only on initial evidence of cardiovascular-renal complications: the group labelled "A" had no serious complications; those in group "B" had at least one of the complications, other than papilledema; and group "C" consisted of any patients having papilledema. Table III gives the corresponding ten-year death rates and mean survival times of the patients who died [15, p.110]. The very poor outlook for those patients with papilledema in this series is consistent with the remarks made in the last section.

Fortunately, only about 5% of all hypertensives eventually enter the accelerated phase of this disorder, and rarely is there a transition from the normotensive state directly into the "malignant" state.
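As a rough check on Table III, the three group death rates can be combined into an overall rate for the series by weighting each rate by its number of cases. This Python sketch uses the published percentages, which are rounded, so the result is only approximate; the variable names are illustrative and the calculation is not part of the source.

```python
# Values from Table III: group -> (ten-year death rate in percent, number of cases)
table_iii = {"A": (16, 31), "B": (54, 52), "C": (88, 15)}

total_cases = sum(n for _, n in table_iii.values())

# Case-weighted average of the group death rates (percent).
overall_rate = sum(rate * n for rate, n in table_iii.values()) / total_cases

print(f"{total_cases} cases, overall death rate about {overall_rate:.1f}%")
```

The total of 98 cases agrees with the 48 men and 50 women in the series, and the weighted rate comes to a little under half of the patients dying within ten years.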
Evelyn describes a "typical" case of accelerated hypertension: ". . . the patient is relatively young and a rather rapid rate of increase of the blood pressure to sustained high diastolic levels is associated, over a relatively short period of time, with acute arteriolar damage affecting the kidneys and retina." [15, p.106] Studies of untreated hypertensives with various degrees of elevation of the arterial pressure have shown accelerated hypertension to be the cause of death only about 5% of the time, while congestive heart failure, myocardial infarction, cerebral hemorrhage, and gradual renal failure account for 40-50%, 10-20%, 10-20%, and 4-5% of hypertensive deaths respectively [42, p.422]; [12, pp.248, 249]. Such a pattern is consistent with the discussion of the pathology of hypertensive vascular disease given in Section 2.4.

Table III: Summary of Mortality by Severity Group

SEVERITY GROUP        A         B         C
Death Rate (%)        16        54        88
Mean Survival Time    89 mo.    50 mo.    7 mo.
Number of Cases       31        52        15

2.7 Treatment

During a period of roughly 25 years, ending about 1960, a surgical procedure known as sympathectomy was used in the treatment of essential hypertension. This rather delicate operation involves excision of certain ganglia (masses of nerve tissue) in the sympathetic division of the autonomic nervous system, and is followed by a convalescence of at least three months. It will be important in the sequel to recall that "most surgeons prefer not to operate on patients over 55 years of age or on those who have had major complications such as congestive heart failure, myocardial infarction or cerebrovascular accident. Serious impairment of renal function is universally regarded as an absolute contraindication" [12, p.257].
Sympathectomy is rarely performed now, since it tends to result in a lasting, significant reduction of blood pressure in only about one-third of the patients who are still alive five years after their operation; the operation is also costly and produces annoying side effects. Moreover, the Veterans Administration Cooperative Study [19, 20, 21] has left little doubt about the effectiveness of currently-available antihypertensive drugs in preventing (or delaying) some types of cardiovascular complications in men with average diastolic pressures above 105 mm. Hg. Although proof of a similar effectiveness for men in the 90-105 mm. Hg range, as well as for women in general, is still awaited, it has nevertheless been considered unethical for more than 25 years not to use the drugs available.

There are three classes of antihypertensive drugs at present; because they act in different ways, they also present different side effects, many of which can be minimized by a careful combination of two or more types of drugs. The three classes are:
(a) diuretics, which decrease plasma and extracellular volumes and reduce sympathetic nervous activity;
(b) vasodilators, which lower peripheral resistance by causing the arterioles to dilate;
(c) sympathetic inhibiting agents, which block the activity of the sympathetic nervous system at various sites.
Hydrochlorothiazide, hydralazine, and methyldopa are fairly common examples of each of the three types, respectively. It must be emphasized that such drugs have been designed to control the "leading" symptom of hypertensive vascular disease; as such they may not be eliminating the disease entity itself.
It seems clear that more research into the causes, nature, and extent of this entity is urgently needed.

Chapter 3
THE SOURCE AND STRUCTURE OF THE DATA

The data to be analyzed in the sequel are part of the results of a case-control study which was undertaken some 40 years ago and published in 1960 by K.A. Evelyn et al under the title, "Effect of Thoracolumbar Sympathectomy on the Clinical Course of Primary (Essential) Hypertension" [14]. The study - to which reference has already been made in Section 2.6 - involved 198 essential hypertensives whose condition had been diagnosed before 1945; the patients were followed until death, or for a maximum of ten years, and a fairly complete record of complications involving the four major target organs is available for almost all the patients. Since the 98 control subjects received essentially only symptomatic treatment - only two patients received any antihypertensive drug therapy while on study - the follow-up data collected from this group constitute a valuable body of information on the clinical course of untreated primary hypertension.

3.1 The Sample

Although the data collected on the patients who underwent surgery are essentially ignored in the sequel, the pairing procedure used in the original study makes it necessary to consider the characteristics of the surgical sample before discussing those of the control sample.
The patients who underwent thoracolumbar sympathectomy were the first 50 of each sex to be accepted for this operation between the years 1940 and 1945 at Massachusetts General Hospital, with the following exceptions: non-residents of the state and patients with secondary hypertension (see Section 2.2) were excluded at the outset; those who later died of causes not related to hypertensive disease, along with three patients who were lost to follow-up for various reasons, were replaced by patients who eventually yielded more complete sets of data - a system that eliminated two types of "censored" data (see Chapter 4) from the final results [14, p.190]. Thus, the surgical group is a sample that is not fully random, and consists of primary hypertensives residing in Massachusetts around 1940 and considered eligible for sympathectomy.

The control data "were obtained from the clinical records of nearly 1,500 patients who had received symptomatic therapy only [footnote 1], and who had been followed up in one of three cooperating Hypertension Clinics [footnote 2] for a minimum of ten years or until death [footnote 3]" [14, p.219]. Once the surgical series was established, 98 of the patients in this group were paired with a suitable control subject. A "preoperative period" was defined for the surgical cases as the six-month period prior to the operation; for the purposes of pairing, the analogous period for the controls was understood to be the six months following the patient's first visit to a clinic.
On the basis of the subject's clinical record during the "preoperative period", Evelyn et al assigned a grade of 0, 1, 2, 3 or 4 to each of 13 "items" (variables) designed to reflect the degree of involvement of the heart (three items), the brain (two items), the kidneys (three items), and the retinae (three items); the average (casual) systolic and diastolic pressures were graded as well. It was on the basis of age, sex, and 12 of the 13 preoperative grades - headache was ignored - that surgical and control cases were matched. Since stringent criteria had to be met before a proposed match was considered acceptable, two very young male members of the surgical group were not matched. As a result of this careful pairing process, the final control group - 48 men and 50 women - displays the same general characteristics as the surgical series.

[Footnote 1: Described as any of the standard treatments of the day that were known to relieve the discomfort produced in any of the various target organs - with the exception of antihypertensive drugs.]
[Footnote 2: Massachusetts General Hospital, Boston; Columbia-Presbyterian Medical Center, New York; Royal Victoria Hospital, Montreal.]
[Footnote 3: This implies that the study was essentially retrospective.]
In summary then:
(a) the controls represent well-defined cases of essential hypertension;
(b) with several exceptions, the control patients are somewhat more severe cases than usual, since sympathectomy was a rather extreme form of therapy (see Section 2.7);
(c) there was no evidence of serious kidney impairment (nitrogen retention) at the outset in any of the controls; severe heart symptoms (congestive heart failure and myocardial infarction) and major cerebrovascular complications were similarly absent;
(d) all controls were under the age of 55 at the start of the study;
(e) ten male and five female controls had papilledema at the outset;
(f) since they were drawn from only those patients whose clinical records were reasonably complete, the control sample probably represents the more "compliant" segment of the hypertensive population.

3.2 The Variables

The reader is directed to the 1960 article of Evelyn et al [14] for a detailed description of the follow-up results of the original study. A summary of the variables relevant to the present analysis is included here.

(1) Age: the age, in years, of the patient at the time of "operation".

(2) Family history of hypertension: coded 4 (++ in the original data table) if there was a history of hypertension in both parents; coded 3 (+) if there was a history of hypertension in one parent or in at least one of the patient's siblings; coded 2 (?) if the family history was doubtfully positive; and coded 1 (-) if there were no known cases of hypertension among the patient's first-degree relatives, or if no information was available.
(3) Known duration of the disease: due to the asymptomatic nature of the early stages of essential hypertension, this variable must be regarded as being inherently unreliable; indeed, many cases are simply recorded as "six months", the length of the preoperative period.

(4), (5) "Preoperative" maximum systolic, diastolic blood pressures: recorded in the original units (mm. Hg), these two variables "represent the highest systolic and diastolic reading recorded under "casual" conditions during the six-month preoperative period" [14, p.201].

(6), (7) "Preoperative" minimum systolic, diastolic blood pressures: these two variables were meant to record the lowest blood pressure readings obtained under "resting" conditions during the "preoperative" period; however, since the initial examination of 56 of the 98 control patients was done on an out-patient basis, no "resting" measurements were performed for these cases; in the published data, an asterisk indicates that only a "casual" minimum has been recorded.

(8), (9), (10), (11) Average (casual) systolic blood pressure grades: corresponding to four pre-determined points in the course of the ten-year study, these variables, recorded on the 0-4 scale mentioned in Section 3.1, were designed to indicate the average systolic pressure based on "casual" readings; the "time zero" grade was based on the average of all "casual" readings taken during the "preoperative period", while the "time two" grade reflects the average of all "casual" readings made over a period of one year centered at the second anniversary of the "operation"; the "time five" and "time ten" grades were similarly defined in relation to the fifth and tenth anniversaries respectively. The grades were defined as follows:
    0 : less than 140 mm. Hg
    1 : 140 - 169 mm. Hg
    2 : 170 - 199 mm. Hg
    3 : 200 - 229 mm. Hg
    4 : more than 229 mm. Hg

(12), (13), (14), (15) Average (casual) diastolic blood pressure grades: these four variables were defined in the same way as the corresponding systolic grades, except that the intervals are now the following:
    0 : less than 90 mm. Hg
    1 : 90 - 104 mm. Hg
    2 : 105 - 119 mm. Hg
    3 : 120 - 134 mm. Hg
    4 : more than 134 mm. Hg
It should be noted that the lowest two grades (0 and 1) were rarely used for the control group, their original purpose being to allow for dramatic drops in arterial pressure in the surgical series following the operation.

(16), (17), (18), (19) Cardiac symptoms (grade): this item was recorded at years 0, 2, 5, and 10 in the course of the study (where possible), as were all other graded (0-4) items; here, the grades cover heart symptoms of two basic kinds: those resulting from various degrees of myocardial insufficiency, from slight dyspnea to congestive heart failure; and those resulting from coronary heart disease (ranging from slight angina to myocardial infarction) [14, p.192]; the grades progress from no symptoms of either kind (0) through slight dyspnea or retrosternal discomfort (1) and finally to crippling degrees of left ventricular or congestive failure, or angina decubitus, or myocardial infarction (4) [14, p.191]. Unfortunately, no attempt was made to account for the two types of symptoms separately, although Evelyn et al have recognized the desirability of distinguishing between the direct and the more indirect consequences of hypertension (see the discussion of heart symptoms in Section 2).
(20), (21), (22), (23) Heart size (grade): abnormalities of the shape and the size of the heart (cardiothoracic ratio) were recorded on the basis of a chest x-ray; grade 0 refers to a normal shape and a ratio of less than 55%, while grade 4 denotes a ratio of over 64% (enlarged heart).

(24), (25), (26), (27) Abnormalities of the electrocardiogram (grade): this item records changes in the T waves and ST segments, grade 3 being reserved for patterns characteristic of left ventricular hypertrophy or of acute coronary insufficiency, and grade 4 assigned to "arterial fibrillation or pattern characteristic of old or recent myocardial infarction" [14, p.192]. Often, no grade could be assigned to this item because of the uncertainty caused by digitalis therapy.

(28), (29), (30), (31) Headache (grade): an attempt was made to apply the 0-4 scale to this highly qualitative variable; grade 0 refers to headache of "normal" severity and frequency, while grade 4 is reserved for headache that is "incapacitating, requiring bedrest or opiates for relief" [14, p.192]. No grade was assigned for cases of classic migraine.

(32), (33), (34), (35) Cerebrovascular accidents (grade): this item is broadly defined to include all those sensory and motor manifestations not included in the previous item; thus, symptoms progressing from numbness or clumsiness, to strokes with varying duration of loss of function, as well as hypertensive encephalopathy (from acute to chronic) are accounted for here; grade 3 also includes cases of subarachnoid hemorrhage (see [14, p.192]). It is important to note that, while enlargement of the heart is a fairly lasting state, a stroke is an isolated event whose effects may be temporary or permanent.
The item described in the last paragraph records the incidence of a stroke and its severity, but makes little attempt to evaluate the persistent effects of the accident at subsequent time points. For example, a stroke that was graded 3 at time two years in the study would be followed by a grade of 0 at time five if the patient suffered no further strokes in this interval [14, p.192].

(36), (37), (38), (39) Renal symptoms (grade): grades 0 to 2 are concerned with the frequency of nocturia, while grades 3 and 4 are reserved for the symptoms of uremia (early and advanced, respectively). The reader will recall from Section 2.5 that nocturia may also be an early symptom of congestive heart failure.

(40), (41), (42), (43) Proteinuria (grade): for this item, the 0-4 scale chosen for the study corresponds to the scale used in the clinical laboratory; as always, 0 indicates a normal state, while a 4 indicates advanced abnormality.

(44), (45), (46), (47) Renal function (grade): this item "was graded on the basis of urinary concentrating power (SG), the ability to excrete intravenously injected phenolsulfonphthalein (PSP), and the concentration of non-protein nitrogen in the blood (NPN)" [14, p.192]; grades 3 and 4 are devoted exclusively to NPN concentrations.

(48), (49), (50), (51) Abnormalities of the retinal vessels (grade): the retinal vessels were examined for both hypertensive (narrowing and twisting of arterioles) and arteriosclerotic (increased light reflex, AV nicking) changes; this item records the average of the two separate grades, with rounding toward the hypertensive grade [14, p.193]. Almost half the control patients were examined by the same ophthalmologist.
(52), (53), (54), (55) Retinopathy (grade): this item records the degree of hypertensive change in the retinae, including retinal hemorrhages, edema, soft and hard exudates (see Section 2.4); grades 3 and 4 record the development of a "star figure" at the macula (see Section 2.5).

(56), (57), (58), (59) Papilledema (grade): refers to the degree of elevation of the nerve head (see Section 2.4); grade 4 refers to an elevation of 4 or more diopters.

(60), (61) Cause of death: 46 of the 98 patients died while on study, and the major and secondary causes of death (if any) are recorded in these two items; the arbitrary code used for the present study is:
    0 : patient still alive (or no secondary cause)
    1 : heart failure
    2 : myocardial infarction
    3 : hypertensive cardiovascular disease
    4 : cerebral hemorrhage
    5 : cerebrovascular accident
    6 : uremia
    7 : malignant hypertension

(62) Survival time postoperatively: for those patients coded 0 in item 60, the maximum time on study (10 years) is recorded; otherwise, the survival time, in years, is given.

(63) Sex: coded 2 for males and 4 for females.

(64) Severity group: this item was added to those derived from the published data table by applying the definition of severity groups as stated in the original article [14]; groups A, B, and C were coded 1, 2, and 3 respectively (see Section 2.6 or [14, p.206]).

Table IV presents a list of all 64 original variables together with the abbreviation of each item name most commonly used in the sequel, its range of values, and the number of control subjects for whom no data were recorded for that variable, at different times. The number of missing variables for a given patient varies from 0 to 36.
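The grade intervals given under items (8)-(15) above can be written as a small lookup. The following is only an illustrative sketch, not part of the original study: the names `SYSTOLIC_CUTS`, `DIASTOLIC_CUTS` and `pressure_grade` are hypothetical, and the treatment of non-integer averages (for example 169.5 mm. Hg, which falls between the printed intervals) is an assumption, since the source does not say how such values were graded.

```python
from bisect import bisect_right

# Lower bounds (mm Hg) of grades 1, 2, 3 and 4, read off the intervals
# listed under items (8)-(11) and (12)-(15).
SYSTOLIC_CUTS = (140, 170, 200, 230)
DIASTOLIC_CUTS = (90, 105, 120, 135)


def pressure_grade(mm_hg, cuts):
    """Return the 0-4 grade for an average casual pressure.

    Assumption: a value lying between two printed intervals
    (e.g. 169.5) is given the lower grade.
    """
    return bisect_right(cuts, mm_hg)


# Example: an average casual reading of 205/118 mm Hg.
systolic_grade = pressure_grade(205, SYSTOLIC_CUTS)    # grade 3
diastolic_grade = pressure_grade(118, DIASTOLIC_CUTS)  # grade 2
```

Because grade 4 is open-ended, the grade cannot be inverted to recover the original average; this is the loss of information that accompanies any such grading system.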
There are several reasons for the use of the dash symbol "--" in the original data table. First, it is clear that a patient who died after four years on study would be coded "--" for all time 5 and time 10 variables; however, Evelyn et al adopted the rule that, in such a case, the columns originally reserved for the five-year data would hold instead a record of the patient's condition just prior to death; the analogous rule applied regardless of the point in the course of the study at which the patient died. Other missing data are the result of the authors' rules regarding interpolation: in cases where a thorough examination of the patient had not been made during the month closest to a given anniversary of the "operation", interpolation based on data obtained before and after that month was allowed - but only if the "before" and "after" grades were identical [14, p.197]. A final source of missing data originates in "the reliability and completeness of the original records and follow-up examinations, and, in the case of subjective symptoms such as headache and dyspnea, . . . the patient's reliability as a witness" [14, p.193].

Although there is inevitably a certain loss of information involved in the use of a system of grades, especially when the open-ended extreme grade is given, Evelyn et al point out the major advantage of this system as being its "relative freedom from ambiguity which makes it possible for independent observers to assign almost identical grades when assessing the same record" [14, p.193].

TABLE IV: Summary of the Variables

NUMBER  NAME                        ABBREVIATION   RANGE        NO. MISSING
1       Age at operation (yrs.)     AGE            15.8-53.2    0
2       Family history              HIST           1,2,3,4      0
3       Known duration (yrs.)       DUR            0.5-16.0     0
4       Preop. maximum systolic     MAXSYS         180-300      0
5       Preop. maximum diastolic    MAXDIA         115-180      0
6       Preop. minimum systolic     MINSYS         115-260      0
7       Preop. minimum diastolic    MINDIA         70-140       0
8-11    Average casual systolic     SYS0,2,5,10    0,1,2,3,4    0,18,7,6
12-15   Average casual diastolic    DIA0,2,5,10    0,1,2,3,4    0,18,7,6
16-19   Cardiac symptoms            HSYM0,2,5,10   0,1,2,3,4    0,15,8,5
20-23   Heart size                  HSIZ0,2,5,10   0,1,2,3,4    7,36,19,2
24-27   Electrocardiogram           HECG0,2,5,10   0,1,2,3,4    12,42,24,17
28-31   Headache                    BACH0,2,5,10   0,1,2,3,4    0,13,6,10
32-35   Cerebrovascular accident    BCVA0,2,5,10   0,1,2,3,4    0,8,4,4
36-39   Renal symptoms              KSYM0,2,5,10   0,1,2,3,4    0,16,10,9
40-43   Proteinuria                 KPRO0,2,5,10   0,1,2,3,4    0,25,9,8
44-47   Renal function              KSPN0,2,5,10   0,1,2,3,4    3,28,13,9
48-51   Retinal vessels             RVES0,2,5,10   0,1,2,3,4    5,29,17,2
52-55   Retinopathy                 RRET0,2,5,10   0,1,2,3,4    5,31,15,12
56-59   Papilledema                 RPAP0,2,5,10   0,1,2,3,4    5,31,15,11
60      Cause of death (major)      CAUSE          0-7          0
61      Cause of death (second)     SCAUS          0-7          0
62      Survival time (yrs.)        TIME           0.17-10.00   0
63      Sex                         SEX            2,4          0
64      Severity group              SEV            1,2,3        0

More important than the grading system itself, perhaps, is the fact that no record of the number of individual measurements, upon which averages or extremes were based, is available today. Moreover, it became clear that there was considerable patient-to-patient variability in the number of such measurements, and this necessarily introduces an element of ambiguity into the definition of such an item. Items 4 to 7 - the extreme blood pressure readings - may be particularly affected by this problem.

Finally, it should be noted that the original data table contains postoperative extreme blood pressure readings, defined in the same way as items 4 to 7.
However, in the absence of information on the dates of occurrence of these extremes, it was felt that little use could be made of these variables in the analysis that follows.

Chapter 4
AN OUTLINE OF THE STATISTICAL METHODS USED

4.1 The Goals of the Analysis

When listing the goals for the analysis of any set of data, it is important to have a clear understanding of the limitations inherent in that particular dataset. In the present case, it must be recognized that the control data described in Chapter 3 were collected under some of the constraints mentioned in Chapter 1. In particular: the study lasted for only ten years, and 47% of the patients died while on study; only clinically-observable signs and symptoms of the progression of the disease were recorded, and today's researcher might regret the absence of other variables now considered relevant - exercise and dietary habits, for example; as pointed out earlier, the dataset does not include certain information required for a careful statistical analysis - such as the number of measurements being summarized by an average, and a measure of their variability.

The foregoing limitations may, of course, be seen as arising from the major goal of the original study, which was: to compare the effectiveness of sympathectomy to that of symptomatic treatment alone for the amelioration or postponement of the major cardiovascular complications usually associated with the later stages of hypertension. Thus, the coded symptom variables were used in the original study mainly for purposes of matching and comparison - although they were obviously selected for their apparent prognostic significance as well (see Section 2.6).
With regard to this last point, the original article by Evelyn et al mentions a study in which 117 patients were divided "into two groups on the basis of the presence or absence of certain specified hypertensive "complications" (abnormal electrocardiograms, cardiac enlargement, cerebro-vascular accident and focal encephalopathy), and [it was shown] that the mortality rate was strikingly higher in the patients in whom one or more of the "complications" was present at the start of the follow-up period" [14, p.206]. Although the whole question of the relationship between such complications and the ultimate outlook for the patient is of obvious interest (especially in comparative trials), Evelyn et al did not pursue the question much beyond an examination of mortality rates and mean survival time in each of the three "severity" groups, again defined solely on the presence or absence of certain complications. These results have already been presented in Table III of Chapter 2. The authors have noted that most of the deaths occur in the lower portions of the data table, corresponding to the patients having the highest blood pressure levels, but are careful to add some examples illustrating "the limitations inherent in any method of predicting the prognosis of hypertension on the basis of the clinical findings at the start of the period of observation" [14, p.215]. Almost no mention is made in the article of the relationship of changes in symptoms (over time) to the patient's outlook.

It was the realization that the control data had not been adequately explored that led to renewed interest in this body of data.
Since the variables recorded there are easily observable and apparently closely associated with prognosis, it was felt at the outset that the following problems might be approached:
(1) Which symptoms bear the "closest" relationship to survival and how might this relationship be expressed quantitatively?
(2) How are the symptoms themselves interrelated? Are these interrelations stronger within a group of symptoms (such as the group of three heart symptoms) than among different groups?
(3) How do the symptoms change over time, and which changes appear to carry the greatest prognostic significance?
(4) Can a severity index be defined that will overcome some of the problems associated with previous efforts of this type? (See the introduction, Chapter 1.)

Besides the foregoing ones, several other questions arose in the course of the analysis and were given various degrees of consideration. Some of these problems may best be described in the next section of this chapter, which outlines the major statistical methods used in the analysis; others will be discussed in greater detail in Chapter 5.

4.2 The Sequence of Analyses

The data table from [14, pp. 194-196, 198-200], including the surgical cases [footnote 1], was first entered into a computer file using the numerical coding outlined in Chapter 3 for the categorical variables (family history, cause of death, etc.). Summary statistics and histograms were produced for each variable to aid in the detection of errors and possible outliers, and to see how the data were distributed. Here, and throughout the analysis, the BMDP Biomedical Computer Programs [6] proved to be adequate for such standard methods.

[Footnote 1: Little use was made of these data here.]
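As a complement to the summary statistics and histograms mentioned in Section 4.2, one simple way to catch keying errors is to check every coded value against the codes defined in Chapter 3. The sketch below is a different and much cruder technique than the BMDP output actually used, and it is offered only under stated assumptions: the record layout (a dictionary keyed by the Table IV abbreviations, with `None` standing for a missing entry) and the function name `coding_errors` are hypothetical.

```python
# Allowed codes, taken from the variable definitions in Chapter 3.
GRADE_CODES = {0, 1, 2, 3, 4}
GRADED_STEMS = {
    "SYS", "DIA", "HSYM", "HSIZ", "HECG", "BACH", "BCVA",
    "KSYM", "KPRO", "KSPN", "RVES", "RRET", "RPAP",
}
ALLOWED = {
    "HIST": {1, 2, 3, 4},
    "CAUSE": set(range(8)),
    "SCAUS": set(range(8)),
    "SEX": {2, 4},
    "SEV": {1, 2, 3},
}


def coding_errors(record):
    """Return the names of variables whose value is not a legal code.

    `record` maps Table IV abbreviations (e.g. "SYS0", "HSYM10") to
    values; None marks a missing entry (an assumed convention).
    """
    errors = []
    for name, value in record.items():
        if value is None:
            continue
        allowed = ALLOWED.get(name)
        # Graded items carry a time suffix (0, 2, 5 or 10) after the stem.
        if allowed is None and name.rstrip("0123456789") in GRADED_STEMS:
            allowed = GRADE_CODES
        if allowed is not None and value not in allowed:
            errors.append(name)
    # Survival time is recorded in years, with a maximum of 10.
    time = record.get("TIME")
    if time is not None and not (0 < time <= 10):
        errors.append("TIME")
    return errors
```

A check of this kind only flags impossible codes; implausible but legal values (possible outliers) still require the distributional summaries described above.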
4.2.1 Life Table Analysis

It seemed natural to start the analysis with an examination of variables 1, 3, and 62 (see Table IV, Chapter 3), combined to give the patient's age at the end of the follow-up period (1 added to 62), and the total known duration of hypertension (3 + 62). Letting the random variable, $T$, denote either one of these times, we have the following standard definitions:

(a) $F(t)$ denotes the probability that the time, $T$, does not exceed the value $t$: $F(t) = P(T \le t)$;

(b) $S(t)$ is the "survival function": $S(t) = P(T > t)$;

(c) $f(t)$ is the density corresponding to $F$: $f(t) = \frac{d}{dt} F(t)$, where defined;

(d) if $f(t)$ exists, the hazard function, $\lambda(t)$, is defined as
$$\lambda(t) = \lim_{\Delta t \to 0} \frac{P(t < T < t + \Delta t \mid T > t)}{\Delta t} = \frac{f(t)}{S(t)}$$
($\lambda(t)$, a conditional density function, is also called the failure rate or the force of mortality);

(e) $\Lambda(t)$ denotes the cumulative hazard function: $\Lambda(t) = \int_0^t \lambda(x)\,dx$.

The following relationships follow immediately:
$$\Lambda(t) = -\ln[S(t)]; \qquad S(t) = \exp\left\{-\int_0^t \lambda(x)\,dx\right\}.$$

Since the present data contain observations that are incomplete (or censored), due to the presence of 52 survivors at the end of the ten-year follow-up period, estimates for the functions (a) to (e) above were obtained using either (or both) of the following non-parametric methods:

(i) the life-table (or actuarial) method involves dividing the range of possible values of $T$ into several intervals and estimating the conditional probability of dying in a given interval, given survival to the beginning of the interval, as follows: the number of deaths in that interval is simply divided by the total number of patients at risk of dying in the interval;

(ii) the non-parametric maximum likelihood estimate when there is censoring (the Kaplan-Meier or product-limit estimate) changes value (or "jumps") at the observed times to death, rather than at the end-points of the arbitrary life-table intervals; if the $d$ observed times to death are denoted $t_{(1)} < t_{(2)} < \cdots < t_{(d)}$, then $S(t)$ is estimated by
$$\hat{P}(t) = \prod_{i} \frac{r_i - 1}{r_i},$$
where $r_i$ represents the number of patients remaining in the study at time $t_{(i)}$, including any that died at the time $t_{(i)}$, and the product is taken over all $i$ for which $t_{(i)} \le t$. More details, as well as formulas for the estimated variance of each estimate, are given in [26]; the original article of Kaplan and Meier, [29], may also be consulted.

The results of such non-parametric techniques are generally of greatest use when expressed in terms of the hazard and cumulative hazard function estimates, since these functions often have a more easily recognizable form than does $S(t)$ [footnote 1]. Other, more specialized, graphical methods also exist for guiding the researcher to an appropriate parametric model, such as the gamma or Weibull density; one such method, called the Weibull plot, is discussed in the last section of this chapter.

[Footnote 1: For example, $S(t) = \exp(-\lambda t^{\gamma})$ for the Weibull distribution, whereas $\lambda(t) = \lambda \gamma t^{\gamma - 1}$, for parameters $\lambda$ and $\gamma$.]

All these techniques must, however, be used more cautiously when truncation is present: when $T$ represents age at death, it must be realized that the sampling restriction [footnote 2] combined with the pre-determined ten-year duration of the study implies that any observable time to death will necessarily be less than 65 years; and in the case of total known duration, it is known in advance that all times will exceed one half of a year - the length of the requisite "preoperative" period.
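The product-limit estimate described in (ii) above can be sketched in a few lines. This is a minimal illustration rather than the BMDP routine used in the analysis: the function names and the toy data are hypothetical, and tied death times are handled with the usual generalization $(r_i - d_i)/r_i$, where $d_i$ deaths occur at $t_{(i)}$, which reduces to $(r_i - 1)/r_i$ when the death times are distinct. Observations censored at a death time are assumed to be still at risk at that time.

```python
import math
from collections import Counter


def kaplan_meier(times, died):
    """Product-limit estimate of S(t) at each distinct death time.

    times : follow-up time for each patient
    died  : True if the time is a death, False if censored
    Returns a list of (t_(i), estimated S(t_(i))) pairs.
    """
    leaving = Counter(times)  # everyone whose follow-up ends at t
    deaths = Counter(t for t, d in zip(times, died) if d)
    at_risk = len(times)      # r_i: patients still under observation
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        if deaths[t]:
            surv *= (at_risk - deaths[t]) / at_risk
            curve.append((t, surv))
        at_risk -= leaving[t]  # deaths and censored cases leave the risk set
    return curve


def cumulative_hazard(curve):
    """Estimate the cumulative hazard via the relation -ln[S(t)]."""
    return [(t, -math.log(s)) for t, s in curve if s > 0]


# Toy example (not thesis data): five patients, two of them censored.
toy_curve = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
```

For the toy data the estimate drops to 4/5 at time 1, to (4/5)(3/4) at time 2, and to (4/5)(3/4)(1/2) at time 3; the censored case at time 4 produces no jump. In the present dataset the `died` flag would correspond to a non-zero code in item 60.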
When $T$ is age at death (the former case), the distribution is truncated on the right at $R = 65$ years, and the truncated hazard function, $\lambda(t \mid R)$, is related to that of the complete distribution as follows:
$$\lambda(t \mid R) = \lambda(t)\left[1 + \frac{S(R)}{S(t) - S(R)}\right] \quad \text{for } t < R;$$
thus, $\lambda(t \mid R)$ approaches infinity as $t$ approaches $R$ and, in general, resembles $\lambda(t)$ only near the origin. When $T$ is total known duration (the latter case), truncation is on the left at $L = 0.5$ year, and it is easily seen here that $\lambda(t \mid L) = \lambda(t)$ for $t > L$. Additional problems with the interpretation of such plots arise when censored observations are mixed with those from a right-truncated distribution, since it is generally not reasonable to assume that the unobserved times will all be less than some value, $R$. These points will be more fully discussed in the next chapter.

Footnote 2: Only cases under the age of 55 were accepted.

Once a parametric model for the distribution of times has been chosen, maximum likelihood estimates for its parameters may be found. This was done in the present analysis only for the case where $T$ represents the patient's age at death. It is a simple matter to write down the likelihood of the sample of observed and censored times when the details of the sampling procedure are ignored; however, when an attempt is made to account for the fact that the $i$th case was observed not from birth, but from the age $a_i$ onward, for a maximum of ten years, the derivation of the likelihood becomes more complex. Details will be left to Appendix A.

4.2.2 Analysis of Missing Data

The pattern of missing data for the entire control sample was examined using the Biomedical Computer Program, BMD:PAM [6].
Included in the results of this program is a version of the original data matrix in which missing observations are coded "M" and the available values are left blank. Various summaries are also included, such as the percentage of missing data per variable and per case, various correlation matrices, and so on.

Once the nature of the missing data problem had been clarified, various methods of coping with this problem were considered; these techniques are outlined in [18] and [3]. Since the output from the first application of BMD:PAM indicated the presence of significant correlations among the variables, the technique of estimating a missing value by the mean of that variable was considered unacceptable. Instead, the method of stepwise regression (discussed in a later section of this chapter) with two forward steps was used; in most cases, many more variables could have been used to predict the missing value, but the number of missing values to be estimated and the degree of correlation among the potential predictors made the two-step method seem most reasonable.

Since the number of variables that could possibly be recorded for a patient is a function of his or her survival time, the entire sample was divided into three groups, depending on whether the patient died before the end of two years, between two and five years, or after five years on study. Separate two-step regression estimates were made in each group, and examined for consistency with the values of known variables; in particular, values outside the 0-4 range of the coded symptom variables were adjusted somewhat in the appropriate direction, and all estimates were rounded to one decimal place.
Finally, histograms and summary statistics for each completed variable were examined to evaluate the effects of the estimation process.

4.2.3 Preliminary Analyses

Since several criteria for defining subsamples were available a priori, a number of "exploratory" analyses were done with the aim of checking the homogeneity of the complete sample and of examining relationships among certain variables. The criteria, which were used (in view of the moderate total sample size of 98) to create two subsamples or strata in each case, include: presence or absence of papilledema (associated with "accelerated" hypertension), severity group (as defined in [14]), age of the patient at the start of the study, family history of hypertension, and the patient's systolic range (defined as MAXSYS - MINSYS). The program BMD:P3D was used to calculate Hotelling's $T^2$ statistic (see [37, pp. 136-139]) as well as the corresponding univariate statistics, and a separate histogram was produced for each variable in each of the two subsamples.

With respect to the problem of identifying prognostically significant strata in the sample, much more sophisticated methods are available (such as the Automatic Interaction Detection procedure). Such methods allow subsamples to be defined by crossing two or more variables, but are of rather limited use in datasets (such as the present one) where many closely related variables have been recorded on a relatively small number of individuals: a problem with very small subsample sizes is quickly encountered. Other methods were therefore considered for a finer analysis of the present data.
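For readers without access to BMD:P3D, the two-sample Hotelling $T^2$ comparison used above can be sketched in a few lines of Python with NumPy. This is only an illustration under the usual assumption of a common covariance matrix; the function name `hotelling_t2` and the small data set are hypothetical and are not taken from the thesis.

```python
import numpy as np


def hotelling_t2(a, b):
    """Two-sample Hotelling T^2 with a pooled covariance matrix.

    a, b: arrays of shape (n1, p) and (n2, p), one row per patient.
    Returns (T2, F, df1, df2), where F = T2 * (n1+n2-p-1) / (p * (n1+n2-2))
    is referred to the F distribution with (df1, df2) degrees of freedom.
    """
    a, b = np.asarray(a, float), np.asarray(b, float)
    n1, n2, p = a.shape[0], b.shape[0], a.shape[1]
    diff = a.mean(axis=0) - b.mean(axis=0)
    pooled = ((n1 - 1) * np.atleast_2d(np.cov(a, rowvar=False))
              + (n2 - 1) * np.atleast_2d(np.cov(b, rowvar=False))) / (n1 + n2 - 2)
    t2 = (n1 * n2) / (n1 + n2) * (diff @ np.linalg.solve(pooled, diff))
    f = t2 * (n1 + n2 - p - 1) / (p * (n1 + n2 - 2))
    return float(t2), float(f), p, n1 + n2 - p - 1


# Hypothetical single-variable check: T^2 reduces to the squared pooled t statistic.
t2, f, df1, df2 = hotelling_t2([[1.0], [2.0], [3.0]], [[2.0], [4.0], [6.0]])
```

With one variable the statistic is the square of the ordinary two-sample $t$, which gives a convenient sanity check before applying the function to several variables at once.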
4.2.4 Multiple Linear Regression

When several "independent" or "predictor" variables are to be related simultaneously to a single "dependent" or "target" variable, the methods of multiple linear regression are immediately suggested. Montgomery [35, p.304] and Armitage [2, p.302], among many others, provide a good introduction to this widely-used technique. For a review of some of the more subtle problems encountered in the application of regression methods in general, Mosteller and Tukey [39] should be consulted. Since some of these problems arose in the present analysis, some detail concerning the solutions attempted will be given here.

During the first stage of the analysis, the dependent variable was taken to be survival time (TIME), or some re-expression of this variable, and the independent variables included all the data available at the beginning ("time zero") of the study. Thus, for a total sample size of $n = 98$, there are 21 time zero variables: AGE, SEX, HIST, DUR, MAXSYS, MAXDIA, MINSYS, MINDIA, SYS0, DIA0, HSYM0, HSIZ0, HECG0, BACHE0, BCVA0, KSYM0, KPRO0, KSPN0, RVES0, RRET0, RPAP0. (See page 41 for variable definitions.)

The expression of the dependent variable posed a problem, since the time limit for the study resulted in 52 of the 98 times being recorded as ten years; thus, various "interval" variables were defined as follows:

$$\text{STATUS} = \begin{cases} 0 & \text{if TIME} \ge 10 \\ 1 & \text{if TIME} < 10 \end{cases}$$

$$\text{INTERV} = \begin{cases} 1 & \text{if TIME} < 5 \\ 2 & \text{if } 5 \le \text{TIME} < 10 \\ 3 & \text{if TIME} \ge 10 \end{cases}$$

$$\text{INTERM} = \begin{cases} 1 & \text{if TIME} < 2 \\ 2 & \text{if } 2 \le \text{TIME} < 5 \\ 3 & \text{if } 5 \le \text{TIME} < 10 \\ 4 & \text{if TIME} \ge 10 \end{cases}$$

Analyses were also done with censored cases weighted 0 (i.e., survivors excluded) or 0.3; the latter weighting was chosen solely to reduce the effect of having 52 observations coded "4" when INTERM was the target variable.
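The interval codings above are simple enough to state as code. The sketch below is a plain Python rendering of the three definitions; the function names are hypothetical, and the treatment of the boundary values (a time of exactly 2, 5 or 10 years falling in the upper category) is an assumption made for the illustration.

```python
def status(time):
    """0 for survivors of the ten-year follow-up, 1 for deaths on study."""
    return 0 if time >= 10 else 1


def interv(time):
    """Three-level coding: death before 5 years, between 5 and 10, or survival."""
    if time < 5:
        return 1
    if time < 10:
        return 2
    return 3


def interm(time):
    """Four-level coding matching the follow-up points at 2, 5 and 10 years."""
    if time < 2:
        return 1
    if time < 5:
        return 2
    if time < 10:
        return 3
    return 4


# Hypothetical survival times in years.
example_times = [0.8, 2.0, 4.9, 7.3, 10.0]
codes = [(status(t), interv(t), interm(t)) for t in example_times]
```

Applied to the five illustrative times, the codes step through every category of INTERM while STATUS separates only the survivor from the deaths.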
Expressed in a general form, the multiple linear regression model is:
$$y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + e_i \tag{1}$$
where:

$y_i$ is the value of the dependent variable for the $i$th case, $i = 1, 2, \ldots, n$;
$x_{ij}$ is the value of the $j$th independent variable for the $i$th case, $j = 1, 2, \ldots, k$;
$\beta_j$ is the partial regression coefficient for variable $j$;
$e_i$ is a random variable having mean 0 and variance $\sigma^2$, for all $i$; $e_i$ and $e_j$ are uncorrelated if $i \ne j$.

Letting $y' = (y_1, y_2, \ldots, y_n)$, $e' = (e_1, e_2, \ldots, e_n)$, $\beta' = (\beta_1, \beta_2, \ldots, \beta_k)$, $X = (x_1, x_2, \ldots, x_k)$, and $x_j' = (x_{1j}, x_{2j}, \ldots, x_{nj})$ for $j = 1, 2, \ldots, k$, model (1) can be expressed in matrix form as:
$$y = X\beta + e, \quad \text{where } E[e] = 0 \text{ and } \mathrm{COV}[e] = \sigma^2 I.$$

If $X$ is of full rank, $k$, the least squares estimate of $\beta$, denoted $\hat{\beta}$, is given by:
$$\hat{\beta} = (X'X)^{-1} X'y.$$

It can be shown quite directly that $\hat{\beta}$ is an unbiased estimator for $\beta$ with variance-covariance matrix given by $\sigma^2 (X'X)^{-1}$. Thus, if the vectors $x_i$ and $x_j$ are not orthogonal (i.e. $x_i'x_j \ne 0$), the entry in row $i$, column $j$ of $(X'X)^{-1}$ will in general be non-zero, and the estimates $\hat{\beta}_i$ and $\hat{\beta}_j$ will be correlated.

If $X'X$ has the form
$$X'X = \begin{pmatrix} S_{11} & 0' \\ 0 & M \end{pmatrix}$$
for scalar $S_{11}$, $(k-1) \times 1$ vector $0$, and $(k-1) \times (k-1)$ matrix $M$, then
$$(X'X)^{-1} = \begin{pmatrix} S_{11}^{-1} & 0' \\ 0 & M^{-1} \end{pmatrix}.$$
In this case $\hat{\beta}_1$ has variance $\sigma^2 / S_{11} = \sigma^2 / \sum_{i=1}^{n} x_{i1}^2$.

If $x_1$ is not orthogonal to all the other $x_j$ (so that the first row of $X'X$ has more than one non-zero entry), a new vector may be defined:
$$r_1 = x_1 - WA = x_1 - \hat{x}_1 \tag{2}$$
which is the residual after regression of $x_1$ on $x_2, x_3, \ldots, x_k$, with $W = (x_2, x_3, \ldots, x_k)$ and $A' = (a_2, a_3, \ldots, a_k)$. Using $A = (W'W)^{-1} W'x_1$, it is easily seen that $W'r_1 = 0$, so that the residual vector, $r_1$, is orthogonal to each of the remaining $k-1$ vectors.
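The least squares formula and the orthogonality of the residual vector in (2) can be checked numerically. The following sketch uses NumPy with simulated data; the sample size of 98 mirrors the study, but the predictors, coefficients and random seed are arbitrary assumptions made only for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 98, 4
X = rng.normal(size=(n, k))
X[:, 0] += 0.9 * X[:, 1]                      # make x_1 and x_2 non-orthogonal
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(size=n)

# Least squares estimate: solve (X'X) b = X'y rather than inverting X'X.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Residual of x_1 after regression on W = (x_2, ..., x_k), as in (2).
W = X[:, 1:]
A = np.linalg.solve(W.T @ W, W.T @ X[:, 0])
r1 = X[:, 0] - W @ A
orthogonality = W.T @ r1                      # should be numerically zero
```

Solving the normal equations directly is adequate for a small, well-conditioned example; with strongly correlated predictors a QR or SVD based routine such as `np.linalg.lstsq` is usually the safer choice.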
If $x_1$ is now replaced by $r_1$ in the original model (1), the variance of the corresponding regression coefficient is given by
$$\sigma^2 \Big/ \sum_{i=1}^{n} r_{i1}^2. \tag{3}$$
Also, since
$$\beta_1 r_1 = \beta_1 \Big(x_1 - \sum_{j=2}^{k} a_j x_j\Big) = \beta_1 x_1 - \sum_{j=2}^{k} (\beta_1 a_j) x_j,$$
$\hat{\beta}_1$ is the same as the value found for $x_1$, but the other regression coefficients will be modified by the addition of the product $\hat{\beta}_1 a_j$. Since $\sum_i r_{i1}^2 = \sum_i (x_{i1} - \hat{x}_{i1})^2$ is the error sum of squares for $x_1$ regressed on $x_2, \ldots, x_k$, the denominator in (3) will be small whenever $x_1$ is well-predicted by some subset of the other $k-1$ variables; in turn, the variance of $\hat{\beta}_1$ will be very large. In short, non-orthogonality among the independent vectors leads both to correlated regression coefficient estimates and to a large (and potentially misleading) variance for at least one of these estimates. (Some of these points are at least mentioned in [4].)

Beaton and Tukey [4] also point out that the coefficient of any single variable, say $x_1$, can be modified by transforming the other vectors as follows:
$$x_j^* = x_j - c_j x_1, \quad j = 2, 3, \ldots, k.$$
If $y$ is then regressed on $x_1, x_2^*, x_3^*, \ldots, x_k^*$, the new regression coefficients will be $\hat{\beta}_1^*, \hat{\beta}_2, \hat{\beta}_3, \ldots, \hat{\beta}_k$; that is, only the coefficient of $x_1$ changes, becoming $\hat{\beta}_1^* = \hat{\beta}_1 + \sum_{j=2}^{k} c_j \hat{\beta}_j$. (This result follows from the fact that the sets $\{x_1, x_2, \ldots, x_k\}$ and $\{x_1, x_2^*, \ldots, x_k^*\}$ generate the same subspace of $n$-dimensional space, and the projection of $y$ onto this subspace is unique.)

Beaton and Tukey go on to note that, when each $c_j$ is the regression coefficient for $x_j$ regressed on $x_1$ (so that each $x_j^*$ is a residual), the variance of the resulting coefficient, $\hat{\beta}_1^*$, will be minimized, a procedure the authors have named "minvar modification". A detailed, algebraic proof of this result, quite unlike the justification given in [4], appears in Appendix B; it is also shown in this appendix that the ratio of the minimum variance obtainable (for the coefficient of $x_1$) to the original variance is given by $\sum_i r_{i1}^2 / \sum_i x_{i1}^2$, with $r_1$ as in (2) above. When all $k$ variables, $x_1, \ldots, x_k$, have mean 0, this ratio is just 1 minus the coefficient of determination when $x_1$ is regressed on the other $k-1$ independent variables. It is worth noting that this ratio has been called the "usable fraction" of the variable $x_1$ in [4].

The procedure of minvar modification was designed to yield some insight into the degree of interdependence existing among the estimated regression coefficients, and to suggest linear combinations of these coefficients that are better determined (in the sense that their estimated variances are small). The existence of strong correlations among the time zero independent variables in the present analysis created an opportunity to use such techniques; the problem was especially acute when the six blood pressure variables (MAXSYS, MAXDIA, MINSYS, MINDIA, SYS0, DIA0) were used in their original form. However, Beaton and Tukey themselves have admitted that "knowing minvar modifications does improve our insight considerably - though often not enough" [4, p.176]. For this reason, various alternative methods were considered in an effort to find a well-determined set of estimated regression coefficients within a "good" prediction equation.
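The algebra of minvar modification and the usable fraction can be verified numerically. The sketch below, assuming NumPy and simulated (not thesis) data, checks that only the coefficient of $x_1$ changes, that it changes by $\sum c_j \hat{\beta}_j$, and that the ratio of the two variance factors equals $\sum r_{i1}^2 / \sum x_{i1}^2$. All variable names are introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 98
X = rng.normal(size=(n, 3))
X[:, 0] = 0.8 * X[:, 1] + 0.6 * X[:, 2] + 0.3 * rng.normal(size=n)  # x_1 nearly predictable
X -= X.mean(axis=0)                          # centre every column (mean 0)
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

x1, W = X[:, 0], X[:, 1:]

# Residual of x_1 on the other predictors, as in (2), and the usable fraction.
a, *_ = np.linalg.lstsq(W, x1, rcond=None)
r1 = x1 - W @ a
usable_fraction = (r1 @ r1) / (x1 @ x1)

# Minvar modification: replace each other x_j by its residual on x_1.
c = (W.T @ x1) / (x1 @ x1)
X_star = np.column_stack([x1, W - np.outer(x1, c)])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_star, *_ = np.linalg.lstsq(X_star, y, rcond=None)

var_factor_orig = np.linalg.inv(X.T @ X)[0, 0]            # Var(beta_1) / sigma^2
var_factor_min = np.linalg.inv(X_star.T @ X_star)[0, 0]   # after minvar modification
```

A small usable fraction signals that most of the variation in $x_1$ is already carried by the other predictors, which is the situation described above for the six blood pressure variables.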
Some of the most important of these methods will now be described.

In datasets such as the present one, where the number, $k$, of potential predictor variables is large, one or more variables may be left out of the regression equation for one of the following reasons:

(a) Certain regression coefficients in the model, $y_i = \sum_j \beta_j x_{ij} + e_i$, may in fact be 0, or nearly 0; that is, a variable, $x_j$, may not be linearly related to $y$.

(b) A certain variable $x_j$ may indeed be related to $y$ in a linear manner (the sample coefficient of linear correlation, $r(x_j, y)$, is significantly different from 0), but the residual, $x_j - \hat{x}_j = x_j - \sum_{i \ne j} a_i x_i$, may be "small"; that is, the $j$th variable adds little to the information already carried by the other predictors. Since $x_j$ is well-approximated by $\hat{x}_j$, it follows that $\beta_j x_j$ will be close to $\beta_j \hat{x}_j = \sum_{i \ne j} \beta_j a_i x_i$ and that $\hat{y} = \sum_{i=1}^{k} \beta_i x_i$ will differ little from $\sum_{i \ne j} (\beta_i + \beta_j a_i) x_i$. Moreover, it is easily seen in this case that any other predictor, $x_m$, for which $a_m \ne 0$, will be well-approximated by $\sum_{i \ne m} (-a_i / a_m) x_i$, taking $a_j = -1$. Thus the choice of a variable to delete from the model may be quite arbitrary.

The situation just described may occur because there is a real and reasonable linear relationship among the independent variables, as one might expect in the case of the three heart variables; alternatively, this phenomenon might be better explained as being an artifact of a number of independent variables, $k$, that is large relative to the sample size, $n$ (see [37, p.109] for a proof of the fact that $E[R^2]$ approaches 1 as $k$ approaches $n$).
In either case, it is undesirable to try to fit all $k$ coefficients because, as seen earlier in this section, the resulting estimates will be unstable. In extreme cases, it may even be impossible to calculate the inverse of the matrix $X'X$. Some techniques that may be used to eliminate unnecessary variables will now be listed.

(a) Stepwise Regression

There are two basic types of stepwise regression: forward stepping, in which predictors are entered into the regression equation one at a time, with the order of entry based on the greatest reduction in the residual sum of squares, $\mathrm{RSS} = \sum_i (y_i - \hat{y}_i)^2$; and backward stepping, which removes predictors from the full regression equation one at a time according to the order of smallest increase in the residual sum of squares. In both methods, the ratio of the change in RSS to the mean squared error for all $k$ predictors, $\mathrm{MSE} = \sum_i (y_i - \hat{y}_i)^2 / (n - k)$, is compared informally to some previously chosen value of the $F$-distribution, in order to arrive at a practical stopping rule. For more details, see [38, p.388], [2, p.314], or [6, pp. 375, 379].

Stepwise regression is not without its drawbacks, as the following example illustrates. Suppose the sample correlation of $x_1$ and $y$, denoted $r(x_1, y)$, is 0.68, while $r(x_2, y) = 0.70$ and $r(x_1, x_2) = 0.85$. If $r(x_j, y)$ is less than 0.70 for $j = 3, 4, \ldots, k$, then a simple forward stepwise procedure will select $x_2$ at the first step, while $x_1$ may never enter the equation at all, despite its similarly strong correlation with $y$. Such an omission is not only difficult to explain to a user of the statistical model, it may also affect the ultimate performance of the resulting regression equation.
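The numerical example above can be followed one step further using only the three quoted correlations. The short Python sketch below computes the partial correlation of $x_1$ with $y$ after $x_2$ has entered, and the resulting gain in $R^2$; the variable names are introduced here for illustration, and the calculation assumes a model containing just these two predictors.

```python
import math

r1y, r2y, r12 = 0.68, 0.70, 0.85     # correlations quoted in the example

# Step 1: x_2 enters first because it has the larger correlation with y.
r2_step1 = r2y ** 2

# Partial correlation of x_1 with y, given x_2 is already in the equation.
partial = (r1y - r2y * r12) / math.sqrt((1 - r2y ** 2) * (1 - r12 ** 2))

# R^2 with both predictors, and the gain contributed by x_1 at step 2.
r2_both = r2_step1 + (1 - r2_step1) * partial ** 2
gain = r2_both - r2_step1
```

Under these assumptions the partial correlation is roughly 0.23 and $R^2$ rises only from 0.49 to about 0.52, which is why $x_1$ may fail an $F$-to-enter test even though its simple correlation with $y$ is nearly as large as that of $x_2$.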
While improved stepwise algorithms are now available (some combining both forward and backward stepping), these methods are by no means the complete solution to the problem of variable reduction, especially in the presence of strong correlations among predictors.

(b) All Subsets Regression

Computer programs that examine a very large number of subsets of the original $k$ predictors, and report a user-specified number of "best" subsets, are now available at moderate cost. For the present analysis, the program used was BMD:P9R [6, p.418], in which "best" may be defined in any one of three ways:

(i) in terms of the coefficient of determination, $R^2$;
(ii) in terms of the "adjusted" $R^2$, equal to $R^2 - p(1 - R^2)/(n - k)$; $p$ is the number of variables in the subset (including the constant vector, if an intercept is desired), $n$ is the sample size and $k$ is as above;
(iii) in terms of Mallows' $C_p$ criterion [33, p.661], defined as $\mathrm{RSS}/\mathrm{MSE} - (n - 2p)$; $n$ and $p$ are as above, RSS is the residual sum of squares for the subset under consideration, and MSE is the mean squared error when all $k$ predictors are used. Mallows has shown that $C_p$ is an estimate of the expected value of the scaled sum of squared errors,
$$E\left[\sum_{i=1}^{n} (x_i'\hat{\beta}_p - x_i'\beta)^2 / \sigma^2\right],$$
where $x_i' = (x_{i1}, x_{i2}, \ldots, x_{ik})$ is the data for the $i$th sampled element, and $\hat{\beta}_p$ is the estimate of $\beta$ containing a zero where the corresponding variable has been deleted from the regression equation.
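Criterion (iii) is easy to compute for every subset when $k$ is small. The following sketch, assuming NumPy and simulated data, evaluates $C_p = \mathrm{RSS}/\mathrm{MSE} - (n - 2p)$ for all non-empty column subsets; the function names `rss` and `mallows_cp` are hypothetical and the code is not a reconstruction of BMD:P9R, which uses a far more efficient search.

```python
import itertools

import numpy as np


def rss(X, y):
    """Residual sum of squares from the least squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def mallows_cp(X, y):
    """C_p = RSS_p / MSE_full - (n - 2p) for every non-empty column subset of X."""
    n, k = X.shape
    mse_full = rss(X, y) / (n - k)
    out = {}
    for p in range(1, k + 1):
        for cols in itertools.combinations(range(k), p):
            out[cols] = rss(X[:, list(cols)], y) / mse_full - (n - 2 * p)
    return out


# Simulated data: a constant column plus three predictors, only one of which matters.
rng = np.random.default_rng(4)
n = 98
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([2.0, 1.0, 0.0, 0.0]) + rng.normal(size=n)
cp = mallows_cp(X, y)
best = min(cp, key=cp.get)       # subset with the smallest C_p
```

For the full set of $k$ columns the criterion equals $k$ exactly, so subsets are usually judged by how close $C_p$ comes to $p$ as well as by how small it is.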
Although Mallows' criterion generally appears to be a good choice for values of $k$ less than 30, Mallows himself [33, p.669] has pointed out that this method still falls short of solving the problem of determining the "correct" number of predictors to use; once again, problems arise when the predictors are highly intercorrelated. Nevertheless, the "all subsets" method does provide many more good, alternative regression equations than do the stepwise techniques. From the availability of so many solutions, the data analyst may arrive at a fuller appreciation of the ambiguities caused by the presence of correlated predictors; if regression coefficients and their estimated standard errors are also available for the reported subsets, the nature of the ambiguity may be clarified somewhat.

(c) Regression on Principal Components

In view of the problems caused by correlations among the predictors, $x_1, \ldots, x_k$, it would seem natural to seek an "equivalent" set of variables, $v_1, \ldots, v_k$, that are uncorrelated. By "equivalent" is meant that, for any $k \times 1$ vector, $c$, there is a $k \times 1$ vector, $b$, such that $(x_1, \ldots, x_k)c = (v_1, \ldots, v_k)b$. Since most regression equations include an intercept (which can be thought of as the partial regression coefficient for a variable, $x_0 = (1, 1, \ldots, 1)'$), the problem becomes more tractable if each $x_i$ is transformed by subtracting its mean from each component:
$$x_{iT} = x_i - \bar{x}_i x_0, \quad \text{for } \bar{x}_i = \frac{1}{n} \sum_{u=1}^{n} x_{ui}.$$
Then $\{x_0, x_1, \ldots, x_k\}$ and $\{x_0, x_{1T}, \ldots, x_{kT}\}$ are clearly equivalent sets of predictors.
Dropping the $T$ subscript now, and letting $X = (x_1, x_2, \ldots, x_k)$ as before, the problem consists of finding coefficient vectors, $c_1, c_2, \ldots, c_k$, each of length $k$, such that
$$v_i = \sum_{j=1}^{k} c_{ij} x_j = X c_i \quad \text{for } i = 1, 2, \ldots, k, \quad \text{and} \quad r(v_i, v_j) = 0 \text{ for } i \ne j.$$
Since it is now assumed that each $x_j$ has mean 0, so does each $v_i$, and $v_i$ and $v_j$ will be uncorrelated if and only if $v_i'v_j = 0$. By applying Gram-Schmidt orthogonalization to $x_1, \ldots, x_k$, it is easily seen that many solutions exist for the problem as posed above; to obtain a unique solution, one might impose the additional conditions that $v_1$ have maximal variance among all linear combinations of the $x_j$ (with coefficient vector of unit length), and that $v_j$ have maximal variance among all such linear combinations that are orthogonal to each of $v_1, v_2, \ldots, v_{j-1}$ (for $j = 2, 3, \ldots, k$). The $k$ linear combinations that result are called the principal components of $X$. See Appendix C for some of the details of the calculation procedure here.

Besides overcoming some of the problems caused by correlated predictors, principal components regression usually leads to a reduction in the number of new variables fitted, as well. A component is commonly deleted from the multiple regression equation on the basis of either its variance (the reasoning being that a component with very small variance is unlikely to make any meaningful contribution to prediction) or its correlation with the dependent variable, $y$. It should be noted that low-variance components occasionally show fairly strong correlations with $y$.

Most of the problems commonly encountered in the application of principal components regression became apparent during the present analysis.
They include:

(i) difficulties caused by widely differing units of the original variables; the extreme blood pressures, for example, were recorded in the original units of mm Hg, while the 13 symptom variables were coded on a 0-4 scale. To avoid dominance of the components by the variables having the largest units, standardization of all the original predictors was done first; another approach consisted of extracting principal components from each of several groups of similar variables, although inter-group correlations still existed;
(ii) the difficulty in interpreting all but the first few components in a meaningful, practical way; this problem is particularly unsettling when such "uninterpretable" components show strong correlations with $y$;
(iii) the fact that elimination of components from the regression equation usually does not result in any of the original variables disappearing as well; thus, not only must the value of each $x_i$ be known, but their use in the principal components regression equation requires additional arithmetic first; if the principal components equation is transformed into one involving the original variables, the result may have little intuitive appeal.

In the present analysis, the basic ideas and methods of principal components regression proved to be most useful when used in combination with other techniques, of which one more remains to be described.

(d) "Subjective" Linear Combinations

Mosteller and Tukey [39, p.394] have suggested a more informal approach to the problem of recombining the original predictors into a better set: they recommend that linear combinations of the original $x_i$ be based on subjective criteria, relating to the nature of the $x_i$. For example, if $x_1$, $x_2$, and $x_3$ are "similar" measurements, one might consider $z_1 = (x_1 + x_2 + x_3)/3$, $z_2 = (x_1 + x_2 - 2x_3)$, and $z_3 = (x_1 - 2x_2 + x_3)$. These new variables should satisfy as many of the following criteria as possible:

(i) they should have a reasonably simple or practical interpretation in terms of the original data;
(ii) they should be less correlated among themselves than the original variables;
(iii) a good prediction of the dependent variable should be obtainable with a smaller number of the new variables.

Mosteller and Tukey warn that this technique should be employed on an "a priori" basis only, if it is to be effective in producing more stable estimates. Thus, the data analyst should draw on general insights and past experience with similar datasets in treating the new variables, rather than turning to the data at hand for guidance (see [39, p.396]).

When the data from the two-year point in the study were added to the list of predictors, it was decided that each pair of original measurements on a given symptom (SYS0 and SYS2, for example) would be replaced by the "base" level of the symptom and the increment over the two years (SYS0 and SYSI = SYS2 - SYS0). The decision to examine prognosis using "baseline" and "change" variables was also made with criteria (i) and (ii) (above) in mind.
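The effect of the "baseline" and "change" re-expression on criterion (ii) can be illustrated with simulated grades. The sketch below assumes NumPy; the 0-4 grades are randomly generated stand-ins for a symptom variable and are not the study data, and the lower-case names `sys0`, `sys2` and `sysi` merely echo the thesis variables.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 98

# Hypothetical 0-4 symptom grades at time zero and at two years.
sys0 = rng.integers(0, 5, size=n).astype(float)
sys2 = np.clip(sys0 + rng.integers(-1, 2, size=n), 0, 4)

# Re-expression as baseline level and two-year increment.
sysi = sys2 - sys0

r_pair = np.corrcoef(sys0, sys2)[0, 1]   # correlation of the original pair
r_new = np.corrcoef(sys0, sysi)[0, 1]    # correlation of baseline with change
```

No information is lost by the re-expression, since the follow-up grade is recovered as baseline plus increment, but in this simulation the new pair is much less strongly correlated than the original pair.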
4.2.5 Discriminant Analysis

The prediction of a binary variable such as STATUS (the 0-1 variable with 1 indicating death while on study) is not handled well with ordinary multiple regression techniques; although it is possible to regard this variable as a rather cryptic coding of the original TIME variable, the continuous nature of the predicted values, along with the fact that these values may lie outside of the interval $[0, 1]$, suggests that STATUS may be more profitably regarded as a variable indicating membership in one of two populations: non-survivors and survivors. With the latter point of view, the technique known as linear discriminant analysis is appropriate, since it is designed to predict group membership on the basis of independent variables, $x_1, \ldots, x_k$. This technique has been widely used in medical applications and amply discussed in the literature (see, for example, [1, p.126] or [37, p.230]). As a basis for a discussion of some of the problems encountered in the application of discriminant analysis, a few essential points must be mentioned here.

In ordinary two-group discriminant analysis, it is assumed that each individual belongs to one of two populations, $\pi_0$ or $\pi_1$, and that the $k$ measurements available on each individual have some multivariate distribution with corresponding mean vector $\mu_0$ or $\mu_1$, and covariance matrix, $\Sigma$, that is independent of the population. An individual with observation vector $x_i' = (x_{i1}, x_{i2}, \ldots, x_{ik})$ is classified by comparing the posterior probability of membership in $\pi_1$ to that of $\pi_0$, given the observation vector. That is, an individual is predicted to belong to $\pi_1$ if and only if
$$\frac{P[\pi_1 \mid x_i]}{P[\pi_0 \mid x_i]} > M.$$
Using Bayes' Rule, letting $p_j$ stand for $P[\pi_j]$, and taking logarithms, the rule for classification as $\pi_1$ becomes:
$$\ln[L(x_i \mid \pi_1)] - \ln[L(x_i \mid \pi_0)] > \ln M + \ln p_0 - \ln p_1.$$
If it is further assumed that an observation vector from $\pi_j$ ($j = 0, 1$) has a multinormal distribution, the left-hand side of the last inequality becomes:
$$(\mu_1 - \mu_0)'\Sigma^{-1} x_i - \tfrac{1}{2} (\mu_1 - \mu_0)'\Sigma^{-1} (\mu_1 + \mu_0).$$
In practice, $\mu_0$, $\mu_1$, and $\Sigma$ are replaced by their usual sample estimates, as are $p_0$ and $p_1$ if a random sample was drawn from the combined populations. Defining $C(i \mid j)$ as the cost incurred in classifying an individual as belonging to population $\pi_i$ when, in fact, the true population is $\pi_j$, the expected cost of misclassification will be minimized if the constant, $M$, is taken to be $C(1 \mid 0)/C(0 \mid 1)$ above. (In the present application, where 0 denotes survival, it would seem reasonable to assign some value less than 1 to $M$.)

As with multiple regression, computer programs employing stepwise techniques are available to calculate and evaluate the discriminant function. Re-expression of the independent variables again proved useful, but other problems, involving the basic assumptions of homogeneity of variance and normality, had to be approached as well. One solution to these problems involves examination of (estimated) discriminant scores of the original samples, that is, the values $(\hat{\mu}_1 - \hat{\mu}_0)'\hat{\Sigma}^{-1} x_i$ for $i = 1, 2, \ldots, n = n_0 + n_1$.
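The classification rule and the discriminant scores just described can be sketched with NumPy as follows. The function name `linear_discriminant`, the argument `log_threshold` (standing for $\ln M + \ln p_0 - \ln p_1$) and the simulated two-variable samples are assumptions made for this illustration; the group sizes of 52 and 46 simply mirror the numbers of survivors and deaths in the study.

```python
import numpy as np


def linear_discriminant(x0, x1, log_threshold=0.0):
    """Sample linear discriminant for two groups with a pooled covariance matrix.

    x0, x1: arrays of shape (n0, k) and (n1, k) drawn from pi_0 and pi_1.
    log_threshold: ln M + ln p0 - ln p1.
    Returns (w, cut); an observation x is assigned to pi_1 when w @ x > cut.
    """
    m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
    n0, n1 = len(x0), len(x1)
    pooled = ((n0 - 1) * np.cov(x0, rowvar=False)
              + (n1 - 1) * np.cov(x1, rowvar=False)) / (n0 + n1 - 2)
    w = np.linalg.solve(pooled, m1 - m0)         # Sigma^{-1} (mu_1 - mu_0)
    cut = 0.5 * (w @ (m1 + m0)) + log_threshold
    return w, cut


# Simulated samples: 52 "survivors" (pi_0) and 46 "deaths" (pi_1).
rng = np.random.default_rng(5)
survivors = rng.normal(loc=[0.0, 0.0], size=(52, 2))
deaths = rng.normal(loc=[1.0, 0.5], size=(46, 2))

w, cut = linear_discriminant(survivors, deaths)
scores = np.vstack([survivors, deaths]) @ w      # estimated discriminant scores
```

Histograms of `scores` within each group are the natural way to inspect the normality and equal-spread assumptions for the two sets of scores.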
If the assumption of multivariate normal populations is valid, the distribution of discriminant scores for each sample should have an approximately normal shape; and, if the assumption of a common covariance matrix is realistic, the two sample distributions should not show significantly different spreads. In cases where the normality assumption is not satisfied, it is sometimes possible to apply a "normalizing" transformation such as the logarithmic transformation (see [2, p.351]).

When the homogeneity assumption remains invalid (even after successful application of a normalizing operation), the problem becomes one of assigning an observation to one of two univariate normal populations having unequal variances [footnote 3]. In this case, the dividing point, $c$ (lying between the means of the two populations), may be adjusted in several ways; two ways will now be described:

(i) making the probabilities of misclassification equal.

Let the univariate normal populations have mean and variance equal to $\mu_0$, $\sigma_0^2$ and $\mu_1$, $\sigma_1^2$, respectively, and suppose $\mu_1 > \mu_0$. If $x$ is an observation from one of the populations, then a value $c$ in $(\mu_0, \mu_1)$ is required such that:
$$P(x > c \mid \mu_0, \sigma_0^2) = P(x < c \mid \mu_1, \sigma_1^2).$$
That is,
$$1 - \Phi([c - \mu_0]/\sigma_0) = \Phi([c - \mu_1]/\sigma_1)$$
[footnote 4]. It follows immediately from the symmetry of the normal density that $(c - \mu_0)/\sigma_0 = -(c - \mu_1)/\sigma_1$, so that $c = (\sigma_1 \mu_0 + \sigma_0 \mu_1)/(\sigma_0 + \sigma_1)$, a simple weighted average of the population means.

(ii) minimizing the overall probability of misclassification.

A value $c$ is required such that $f(c) = P(x > c \mid \mu_0, \sigma_0^2) + P(x < c \mid \mu_1, \sigma_1^2)$ is minimized (with the same assumptions as in (i) above).
Rewriting $f(c)$ as $1 - \Phi([c - \mu_0]/\sigma_0) + \Phi([c - \mu_1]/\sigma_1)$, it can be seen that $df/dc = 0$ when
$$\frac{\sigma_1}{\sigma_0} = \frac{\phi[(c - \mu_1)/\sigma_1]}{\phi[(c - \mu_0)/\sigma_0]},$$
where $\phi$ denotes the standard normal density. Taking logarithms, $c$ is found to be a root of the equation $Ac^2 + Bc + D = 0$ with
$$A = \sigma_1^2 - \sigma_0^2, \qquad B = 2(\mu_1 \sigma_0^2 - \mu_0 \sigma_1^2), \qquad D = (\mu_0 \sigma_1)^2 - (\mu_1 \sigma_0)^2 - 2(\sigma_0 \sigma_1)^2 \ln(\sigma_1/\sigma_0).$$

In actual practice, the formulas given in both (i) and (ii) above must be used with sample estimates replacing $\mu_0$, $\mu_1$, $\sigma_0^2$, and $\sigma_1^2$.

Footnote 3: These methods assume that a linear discriminant function already exists and that the two resulting (univariate) populations of discriminant scores are normal. The relevant variables are thus condensed into a single observation. These methods were suggested by my thesis supervisor, Dr. M. Schulzer.

Footnote 4: $\Phi$ denotes the standard normal cumulative distribution.

A second solution to the problem of differing covariance matrices leads to the calculation of a quadratic discriminant function. By allowing $\Sigma_0$ and $\Sigma_1$ to be different covariance matrices in the likelihood ratio, $L(x \mid \pi_1)/L(x \mid \pi_0)$, and letting $K = \ln M + \ln p_0 - \ln p_1$, it may be seen that $\ln[L(x \mid \pi_1)] - \ln[L(x \mid \pi_0)] > K$ if and only if:
$$x'Dx + 2x'(\Sigma_1^{-1}\mu_1 - \Sigma_0^{-1}\mu_0) > 2K - \ln(|\Sigma_0| / |\Sigma_1|) - \mu_0'\Sigma_0^{-1}\mu_0 + \mu_1'\Sigma_1^{-1}\mu_1, \qquad D = \Sigma_0^{-1} - \Sigma_1^{-1}.$$
The first term in the last inequality accounts for the quadratic nature of the discriminant function. Besides being less tractable from a computational point of view, the quadratic discriminant function is also prone to problems of stability and interpretability of the estimated coefficients. In the present study, a quadratic discriminant function was calculated, but not investigated in any detail.
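The two dividing points for univariate normal scores with unequal variances, (i) equal error probabilities and (ii) minimum total error, can be compared numerically. The Python sketch below uses only the standard library; the function names and the example parameters ($\mu_0 = 0$, $\sigma_0 = 1$, $\mu_1 = 3$, $\sigma_1 = 2$) are assumptions chosen for illustration, and the quadratic is written with the coefficients obtained by taking logarithms of the condition $df/dc = 0$.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def equal_error_cut(mu0, sd0, mu1, sd1):
    """Dividing point that makes the two misclassification probabilities equal."""
    return (sd1 * mu0 + sd0 * mu1) / (sd0 + sd1)


def min_total_error_cut(mu0, sd0, mu1, sd1):
    """Root of A c^2 + B c + D = 0 lying between the means (assumes mu1 > mu0).

    Returns None when neither root falls strictly between mu0 and mu1.
    """
    if sd0 == sd1:
        return 0.5 * (mu0 + mu1)          # equal variances: the midpoint
    a = sd1 ** 2 - sd0 ** 2
    b = 2.0 * (mu1 * sd0 ** 2 - mu0 * sd1 ** 2)
    d = ((mu0 * sd1) ** 2 - (mu1 * sd0) ** 2
         - 2.0 * (sd0 * sd1) ** 2 * math.log(sd1 / sd0))
    disc = math.sqrt(b * b - 4.0 * a * d)
    roots = [(-b + disc) / (2.0 * a), (-b - disc) / (2.0 * a)]
    inside = [c for c in roots if mu0 < c < mu1]
    return inside[0] if inside else None


def total_error(c, mu0, sd0, mu1, sd1):
    """f(c) = P(x > c | population 0) + P(x < c | population 1)."""
    return 1.0 - normal_cdf((c - mu0) / sd0) + normal_cdf((c - mu1) / sd1)


# Hypothetical score distributions for the two groups.
params = (0.0, 1.0, 3.0, 2.0)
c_equal = equal_error_cut(*params)
c_min = min_total_error_cut(*params)
```

In this example the equal-error point sits closer to the mean of the less variable population than the minimum-error point does, and the total error at `c_min` is no larger than at `c_equal`.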
4.2.6 Logistic Regression

In view of the rather restrictive assumptions (concerning the distribution of the predictor variables) required by the linear discriminant method, an alternative model for the analysis of a binary dependent variable was considered. A model which is known to possess desirable properties - both from the medical as well as the theoretical point of view - is the logistic model:

$\Pr[\text{individual belongs to } \pi_1 \mid \text{predictors } (x_1, \ldots, x_p)] = 1/(1 + \exp[\beta_0 + \sum_{j=1}^{p}\beta_j x_j]) = f(\beta, x)$

... where $x' = (x_1, \ldots, x_p)$, the vector of predictor variables for the individual; and $\beta' = (\beta_0, \beta_1, \ldots, \beta_p)$.

An estimate for the coefficient vector, $\beta$, may be obtained by using some iterative method (such as the Gauss-Newton procedure) to minimize:

$g(\beta) = \sum_{i=1}^{n} [y_i - f(\beta, x_i)]^2$

with $y_i = 1$ if individual i belongs to $\pi_1$, and $y_i = 0$ if individual i belongs to $\pi_0$.

This model has been extensively discussed in the literature (see, for example, [32, p.785] or [11, p.892]), and its advantages have been compared with those of the linear discriminant. In summary, the logistic model has been shown to be valid whenever the joint distribution of the predictor variables belongs to the exponential family - a very mild restriction; however, Efron has shown that, when the predictor variables have a joint distribution that is multivariate normal, logistic regression is "between one half and two thirds as effective as normal discrimination for statistically interesting values of the parameters" [11, p.892]. This conclusion is based on a study of asymptotic relative efficiencies of the two methods.
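The least-squares fit of the logistic model described above may be sketched for a single predictor as follows. This is an illustration only, under stated assumptions: the data are invented, the function names are hypothetical, and the step-halving safeguard is an addition that is not described in the text.

```python
import math


def logistic_f(b0, b1, x):
    """Model of the text: Pr[pi_1 | x] = 1 / (1 + exp(b0 + b1 * x))."""
    u = max(-700.0, min(700.0, b0 + b1 * x))  # guard against overflow
    return 1.0 / (1.0 + math.exp(u))


def sse(b0, b1, xs, ys):
    """g(beta): sum of squared differences between y and the fitted value."""
    return sum((y - logistic_f(b0, b1, x)) ** 2 for x, y in zip(xs, ys))


def gauss_newton_logistic(xs, ys, iterations=50):
    """Gauss-Newton minimization of g(beta), with step halving."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate J'J and J'r, where J holds df/db0 and df/db1.
        s00 = s01 = s11 = g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            f = logistic_f(b0, b1, x)
            w = -f * (1.0 - f)  # df/db0; df/db1 equals w * x
            r = y - f
            s00 += w * w
            s01 += w * w * x
            s11 += w * w * x * x
            g0 += w * r
            g1 += w * x * r
        det = s00 * s11 - s01 * s01
        if abs(det) < 1e-12:
            break
        d0 = (s11 * g0 - s01 * g1) / det
        d1 = (s00 * g1 - s01 * g0) / det
        # Halve the step until the sum of squares actually decreases.
        step, current = 1.0, sse(b0, b1, xs, ys)
        while step > 1e-8 and sse(b0 + step * d0, b1 + step * d1, xs, ys) >= current:
            step /= 2.0
        if step <= 1e-8:
            break
        b0, b1 = b0 + step * d0, b1 + step * d1
    return b0, b1


# Invented data: y = 1 (membership in pi_1) becomes rarer as x increases.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1, 1, 0, 1, 0, 1, 0, 0]
b0_hat, b1_hat = gauss_newton_logistic(xs, ys)
```

Because the exponent enters the model with a positive sign, a positive slope estimate corresponds to a probability of membership in $\pi_1$ that decreases with x.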
Moreover, involving iterative procedures as they do, logistic methods are less tractable from the point of view of computation. In the present study, the logistic model was explored and the results compared to those of the linear discriminant. The logistic model was not used extensively, however, and further comments will be left to a later chapter.

4.2.7 Repeated Measures ANOVA

Measurements on all 13 graded symptom variables (ranging from SYS to RPAP) were made, where possible, at each of the four follow-up points in the study (years 0, 2, 5, and 10). Thus, we have a situation in which the number of potential predictor variables, p, is increasing as more follow-up intervals are considered, while the total sample size, n, is declining due to deaths. In fact, at the end of the study period, n = 52 and p is over 60. The problems caused by a singular (or nearly singular) covariance matrix, X'X, are already very serious at the year 5 stage, especially when the original sample is divided into more homogeneous groups. Although the previously-discussed methods of variable selection and re-expression are still applicable in theory, there is inevitably an arbitrary quality to the results obtained in practice. Thus, different techniques were employed when the year 5 data were included.
One of these is known as the analysis of variance with repeated measures: a separate analysis was done for each of the 13 symptom variables measured at times 0, 2, and 5 years; the variables, SEX (male-female) and STATUS at year 10 (alive-dead) appear as crossed factors in the design; the three factors, SEX, STATUS, and STAGE (0, 2, or 5), are thus fixed-effects factors, while the factor assigned to cases (replications) is a random-effects factor crossed with STAGE, but nested within SEX and STATUS. The data may be laid out as a two-way table with the STATUS levels (ALIVE, DEAD) as columns and the SEX levels (MALE, FEMALE) as rows, each cell containing the observations $y_{ijkl}$.

The associated model may be written as follows:

$y_{ijkl} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \gamma_k + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \rho_{l(ij)} + (\gamma\rho)_{kl(ij)}$

where $\alpha_i$ (i = 1, 2), $\beta_j$ (j = 1, 2), $\gamma_k$ (k = 1, 2, 3) are the terms for the fixed-effects factors, SEX, STATUS, and STAGE, respectively, $(\alpha\beta)_{ij}$, $(\alpha\gamma)_{ik}$, $(\beta\gamma)_{jk}$ and $(\alpha\beta\gamma)_{ijk}$ are the two- and three-way interaction terms, $\rho_{l(ij)}$ represents the random-effects factor, replications (patients) within the grouping factors, SEX and STATUS, and finally $(\gamma\rho)_{kl(ij)}$ represents the interaction of STAGE and replications within SEX and STATUS (see [46]).

With such a model, the main effects of SEX, STATUS, and STAGE on a given symptom may be tested, along with their interactions (for example, is there a different time trend for the symptom in the two STATUS groups?). The repeated measures factor, STAGE, is analyzed through orthogonal polynomial decomposition; as explained in the documentation of the computer package used: "P2V computes two analyses of variance. The first is for $P_0 = (Y_1 + Y_2 + Y_3)/\sqrt{3}$. The second can be decomposed into two parts: $P_1 = (Y_1 - Y_3)/\sqrt{2}$ and $P_2 = (Y_1 - 2Y_2 + Y_3)/\sqrt{6}$.
$P_1$ and $P_2$ are the orthogonal polynomial decomposition (footnote 5) of $Y_1, Y_2, Y_3$ into linear and quadratic components ($P_0$ is the mean). An analysis of variance is performed for both $P_1$ and $P_2$ ... P2V pools the results for $P_1$ and $P_2$ by adding the sums of squares for each effect from the individual analyses." [6, p.552] In the present application, the repeated measures factor is to be equated with STAGE. Thus, linear and quadratic effects of the STAGE factor may be tested, for each symptom (Y) in turn.

Footnote 5: The orthogonality is only approximate in this case, since the time points are not quite equally spaced.

4.2.8 Growth Curve Analysis

A natural complement to the ANOVA described above is the fitting of a second-degree polynomial to the data obtained by measuring each symptom at three times in the course of the study. Although one might consider applying simple polynomial regression with the given symptom as the dependent variable and the independent variable being the STAGE in the study (i.e. the time points), such an approach violates the basic assumption that the observations at one level of x be uncorrelated with those at any other level. What is required is a regression model with correlated error variables. Morrison describes such a model in [37, pp. 216-222]. Called a "growth curve" model because of its early applications to the study of growth, this method has the further advantage of allowing for the imposition of an experimental design on the subjects. The design used here is very similar to the previous ANOVA design; thus, separate regression coefficients are calculated in each SEX by STATUS class.
The model used for each symptom is: $Y' = XBA' + \varepsilon'$ where:

Y' is the 3-row, 66-column matrix of observations (one row for each time point; see footnote 6), arranged as follows: $[Y_{1,1}, \ldots, Y_{30,1}; Y_{1,2}, \ldots, Y_{22,2}; Y_{1,3}, \ldots, Y_{8,3}; Y_{1,4}, \ldots, Y_{6,4}]$ ... that is: female survivors, male survivors, female non-survivors, male non-survivors.

X is the 3 by 3 matrix of independent variables, transformed to reduce intercorrelations; the first column is the constant vector of ones; the second is the STAGE variable, (0, 2, 5)', transformed to (-7, -1, 8)' by subtracting the mean and multiplying by 3; and the third column is the second degree term for STAGE, (0, 4, 25)', transformed by taking the residual after regression on (-7, -1, 8)' and multiplying by 3 (i.e. $[T^2 - b(T - \bar{T})] \cdot 3$) - the result being (36.1050, 17.1578, 33.7369)'.

B is the 3 by 4 matrix of regression coefficients (one column for each SEX by STATUS class).

A' is the 4 by 66 design matrix with ones in the first 30 columns of row one, in columns 31 to 52 of row two, in columns 53 to 60 of row 3, and in columns 61 to 66 of row 4 and zeroes everywhere else.

$\varepsilon'$ is the 3 by 66 matrix of error variables; it is assumed that each column of $\varepsilon'$ has a multivariate normal distribution with mean 0 and covariance matrix $\Sigma$; moreover, the columns of $\varepsilon'$ are independent.

Footnote 6: At the five-year point, 66 patients were left in the study: 38 females and 28 males.

Thus, we have: $Y_{ij} = Xb_j + e_{ij}$ with $i = 1, 2, \ldots, n_j$; $j = 1, 2, 3, 4$; $n_1 = 30$, $n_2 = 22$, $n_3 = 8$, $n_4 = 6$. That is, $Y_{ij} = b_{1j}x_1 + b_{2j}x_2 + b_{3j}x_3 + e_{ij}$ where $x_1$, $x_2$, and $x_3$ are the constant, linear, and quadratic term, respectively. The least-squares estimate for B is given by:

$\hat{B} = (X'D^{-1}X)^{-1} X'D^{-1} Y'A (A'A)^{-1}$ ...
which is similar to the ordinary regression estimate, except for the presence of the 3 by 3 matrix, $D = Y'[I - A(A'A)^{-1}A']Y$ (corrected, within groups sum of squares). The application of this technique to the present data was preceded by certain variance-reducing transformations in some cases. In addition, no packaged programs were available to perform the calculations, so significance tests were not attempted with respect to B; this omission is not an important one, since the ANOVA of the previous section effectively answered the same questions.

4.2.9 Models for the Hazard Function

The use of ordinary multiple regression models with a censored dependent variable is at best awkward, and may even lead (as seen earlier) to a violation of the model's basic assumptions. Although such considerations do not detract significantly from the usefulness of such techniques for exploratory purposes, better methods have recently been proposed for analyzing the relationship between survival time and the available explanatory variables. In his 1972 article, Cox takes a novel approach to the problem "of assessing the relation between the distribution of failure time and Z" [10, p. 189]. Here, $Z = (Z_1, \ldots, Z_p)$ refers to the vector of predictor variables, which may be continuous, discrete, functions of time, and so on. Given the random variable defined as the survival time from some point, Cox's model assumes that the hazard function depends on the predictors, Z, as well as on the time, t: $\lambda(t, Z) = \lambda_0(t)\exp(Z\beta)$; $\lambda_0(t)$ and $\beta$ are to be estimated from the data.
Although there is a brief discussion of some simpler models, in which $\lambda_0(t)$ is specified in advance, most of Cox's paper concentrates "on exploring the consequences of allowing $\lambda_0(t)$ to be arbitrary, main interest being in the regression parameters" [10, p.190]. This added generality is at the expense of efficiency of estimation of $\beta$, and Cox admits that little is known about the magnitude of this loss. Moreover, at the time when Cox's model was being considered for application to the present set of data, no easy way of computing the estimates, $\hat{\beta}$ and $\hat{\lambda}_0(t)$ (the latter involving a separate maximum likelihood procedure at each time of death), was available. Thus, it was decided to attempt, for the present analysis, to find a suitable expression for $\lambda_0(t)$ from consideration of the data.

The obvious first step, however, was the choice of the starting point for the measurement of the random variable, T (see p.46 in this chapter). Since the hazard function corresponding to T was to be modelled as a function of available data on the patient, and since these data were recorded no earlier than the beginning of the study, this point was the obvious choice - at least at the outset. When data from later points in the study were considered, the definition of T was suitably modified.

It should be noted that any hazard function model, $\lambda_i(t) = \lambda_0(t)h(\beta, Z_i)$
, $i = 1, 2, \ldots, n$, implies that the i-th individual is drawn from a population having the density:

$f_i(t) = \lambda_0(t)\,h(\beta, Z_i)\exp[-h(\beta, Z_i)\int_0^t \lambda_0(u)\,du]$

That is, the implicit assumption is that the original sampled population is a mixture of as many subpopulations as there are different values of $h(\beta, Z)$ in that population. However, even if the density of each sub-population may be expressed as $f(t, \theta)$, where the form of f does not vary, it does not follow that the mixed population also has a density of the same form, f (even for a mixture of exponentially-distributed sub-populations). This fact complicates the process of choosing an appropriate hazard function model, and may be one of the best arguments in favour of leaving $\lambda_0(t)$ unspecified.

Notwithstanding this last observation, however, an attempt was made to gain some insight into the form of $\lambda_0(t)$ through consideration of subsamples. Since the two-parameter Weibull density ($f(t) = \lambda\gamma t^{\gamma-1}\exp(-\lambda t^{\gamma})$) often provides a good fit to survival data, and is a generalization of the exponential density, a version of the Weibull plot (see [26, p.105]) was done for each sex separately. When considering time-censored data, with a single time of censoring, $t_0$, the observed times to failure (death, in this case) come from a truncated density of the form:

$g(t) = f(t)/F(t_0)$ if $0 \le t \le t_0$; $g(t) = 0$ if $t \notin [0, t_0]$

... where f(t) and F(t) are respectively the density and cumulative distribution function corresponding to the uncensored failure times. When this distribution is Weibull with parameters $\lambda$ and $\gamma$ (both positive), the survival function associated with the truncated density, g, is:

$S(t) = K[\exp(-\lambda t^{\gamma}) - \exp(-\lambda t_0^{\gamma})]$ if $t \le t_0$; $S(t) = 0$ if $t > t_0$

with $1/K = 1 - \exp(-\lambda t_0^{\gamma})$.
Rewriting S(t) as $K\exp(-\lambda t^{\gamma}) \times [1 - \exp\{-\lambda(t_0^{\gamma} - t^{\gamma})\}]$ for t in $[0, t_0]$, and taking logarithms twice, it follows that $\ln\{\ln[1/S(t)]\} = \ln\{\lambda t^{\gamma} - \ln K - \ln[1 - \exp\{-\lambda(t_0^{\gamma} - t^{\gamma})\}]\}$. Thus, the right-hand side will be well-approximated by $\ln\lambda + \gamma\ln t$ for small values of t, and approach $+\infty$ as t approaches $t_0$ - provided, of course, that the Weibull model is appropriate.

To construct this truncated-Weibull plot, the ordered times to failure, $t_{(1)} < t_{(2)} < \cdots < t_{(d)} < t_0$, are transformed into the plotting positions on the horizontal axis, and a version of the empirical distribution function (transformed slightly to remain positive at the last failure time), namely $\hat{S}(t_{(i)}) = (d + 1 - i)/(d + 1)$, provides the ordinates. If the shape of this graph is close to the one predicted for the Weibull, then a rough estimate for $\gamma$ may be made; for values of $\gamma$ near 1, an exponential model is appropriate.

For the present data, the exponential distribution appeared to be an adequate fit - although when the sample is stratified on the basis of only one explanatory variable (SEX) the truncated-Weibull plot is only roughly appropriate. However, the assumption of a constant hazard rate is convenient from a computational point of view and seems to have some intuitive appeal in this case, where a relatively short time period is being considered. Breslow [5] has proposed two different models for the assumed constant (over time) hazard, $\lambda$:

(1) $\lambda = 1/(Z\beta)$ or $1/\lambda = Z\beta$

(2) $\lambda = \exp(Z\beta)$ or $\ln(\lambda) = Z\beta$

where, in each case, $Z = (Z_0, Z_1, \ldots, Z_p)$ and $\beta' = (\beta_0, \beta_1, \ldots, \beta_p)$; of course, one of the explanatory variables, $Z_0$ say, may be the constant variable.
To decide which model is appropriate, Breslow suggests: stratifying the sample on the basis of each variable, $Z_i$ ($i = 1, \ldots, p$), in turn; estimating $\lambda$ in each stratum; and plotting $1/\hat{\lambda}$ and $\ln\hat{\lambda}$ against the appropriate values of $Z_i$. The graph yielding the closer approximation to a straight line will indicate the more appropriate model. This method is again rather rough, and may conceivably lead to different conclusions for different variables, $Z_i$ and $Z_j$. However, it is far easier to apply than Kay's check for proportional hazards [30], for example, and in the present application, the results were fairly conclusive, supporting model (2).

Having decided to model the hazard function for the i-th individual in the sample as:

$\lambda_i = \lambda_0\exp(Z_{i1}\beta_1 + \cdots + Z_{ip}\beta_p) = \exp(\beta_0 + Z_{i1}\beta_1 + \cdots + Z_{ip}\beta_p) = \exp(Z_i\beta)$

the next step was the computation of $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p)'$. Under such a model, the likelihood of the censored sample is easily seen to be proportional to:

$\prod_{i=1}^{n} \lambda_i^{\delta_i}\exp(-\lambda_i t_i)$

where $\delta_i = 1$ if $t_i$ is a time to death, and $\delta_i = 0$ if the observation is censored. The logarithm of this likelihood is then:

$\ln L = K + \sum_{i=1}^{n}\delta_i Z_i\beta - \sum_{i=1}^{n} t_i\exp(Z_i\beta)$

where Z and $\beta$ now include the constant. The maximum likelihood estimate for $\beta$ is then obtained as the solution to the following system:

$\partial\ln L/\partial\beta = \sum_{i=1}^{n}\delta_i Z_i - \sum_{i=1}^{n} t_i Z_i\exp(Z_i\beta) = 0$

Such a system may be solved by iterative methods, as noted earlier. Alternatively, the likelihood may be maximized by using a non-linear least-squares program to minimize:

$\sum_{i=1}^{n} (y_i - f(Z_i, \beta))^2$

with $y_i$ identically 0, and $f(Z_i, \beta) = (-\ln L(t_i, Z_i, \beta))^{1/2}$, as suggested in [6, p.862]. The actual computational technique used here was based on a "derivative-free" non-linear regression program, described in [6, p.507]. Here again, a maximum likelihood problem is being converted into one involving non-linear least-squares. Since the program used also provides for the calculation of the likelihood under the given model, the large-sample likelihood ratio test (see [36, p.440]) may be used to check the significance of the proposed predictor variables in the model.

One final comment must be made here in view of the considerable attention given in previous sections to the problem of correlated predictor variables. As Mosteller and Tukey point out in [39, p.422], when fitting a non-linear function, $f(x, \beta)$, the independent variables are effectively $\partial f(x, \beta)/\partial\beta_j$ for $j = 0, 1, 2, \ldots, p$. Whenever these derivatives each involve more than one of the original predictors, re-expression of the original predictors will generally be ineffective in reducing the inter-correlations among the actual independent variables. This point will be raised again in the next chapter.

Chapter 5

A SUMMARY OF THE RESULTS

The following account of the results obtained by application of the analytic techniques described in the last chapter will not be a complete one, due to the sheer volume of these results; in some cases, only the most interesting or illustrative outcomes will be mentioned, and the reader who seeks more detail is referred to the complete indexed collection of results available from the author.
5.1 Preliminary Survival Analysis

It was seen in Chapter 4 that difficulties in studying the distribution of a "time" random variable, T, arise when experimental units have not been observed from the defined starting point for measurement of T. This situation seems to be unavoidable when T represents the age at death of an individual who, typically, is not identified as belonging to the relevant population before the age of 20 years (at least); and, when T is taken as the total known duration of hypertension, there exist, in addition to the possibility of follow-up times exceeding 25 years, problems with defining and then identifying the start of the disease in an individual. In these cases, it is usually necessary to assume that the available samples somehow approximate random samples from the survival distributions of interest; such assumptions are questionable, of course, but seem inevitable when the experimental units are human beings.

Letting T denote the age at death, the first look at the data consisted of obtaining estimates for $S(t)$, $\lambda(t)$, and $\Lambda(t)$ using the life-table and/or product-limit options of the computer program BMD:P1L. The next step was to repeat the analysis with the original sample of 98 broken down into the three severity groups originally defined in [14]. Figure 2 shows the product-limit estimate of S(t) for the stratified sample.
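For reference, the product-limit estimate of S(t) mentioned above may be sketched as follows. This is a minimal illustration and not the BMD:P1L implementation: the function name and data are hypothetical, and it is assumed that every individual is observed from time zero, so the delayed entry discussed in this section is not handled.

```python
def product_limit(times, died):
    """Product-limit estimate of S(t) at each distinct death time.

    times: observed time for each individual.
    died:  1 if the time is a death, 0 if the observation is censored.
    Returns a list of (time, estimated survival) pairs.
    """
    data = sorted(zip(times, died))
    n_at_risk = len(data)
    survival = 1.0
    estimate = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all observations sharing the same time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            estimate.append((t, survival))
        # Deaths and censored cases both leave the risk set after time t.
        n_at_risk -= leaving
    return estimate


# Hypothetical data: five individuals, two of them censored.
km = product_limit([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

An observation censored at the same time as a death is counted as still at risk at that time, which is the usual convention.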
Besides affording a more detailed picture of the differences in mortality among these three sub-populations, the stratified analysis also served to sort out the censored observations: since only five of the 31 members of severity group 1 had been observed until death, the analysis of this group was omitted in favour of groups 2 and 3, where censored observations account for 24 of 52 and two of 15 cases respectively. In severity group 2, the analysis was repeated after removal of a likely outlier - a patient whose complications consisted only of a code 1 for RVES and a code 2 for HECG, and who was still alive at the age of 25.8 years when the study ended; it was felt that such a patient was "at best" only a borderline member of group 2.

Some of the foregoing analyses had to be redone when it was realized that the data consist of a mixture of truncated observations (the sampling method ensured that all observed times to death would be less than 65 years) and censored observations from a full (non-truncated) distribution. Figure 3(A) shows the life-table estimates for $\Lambda(t)$ in severity group 2 (outlier removed); as should be the case according to the calculations of Chapter 4, there is a much steeper rise when truncated observations alone are used. The two graphs from severity group 3 (see Figure 3(B)) are very similar to one another, due to the small fraction of censored observations in this sub-sample. Unfortunately, the effect of truncation on the shape of these graphs makes it difficult to judge the true concavity of $\Lambda(t)$ for the full distribution; one might conjecture, however, that $\lambda(t)$ is an increasing function of t, in which case $\Lambda(t)$ should be concave upward.

Figure 3: Life-table Estimates of Cumulative Hazard: (A) Severity Group 2 (outlier removed). Note: "n = 28" refers to the non-survivors, or "truncated" observations.

Figure 3(B): Severity Group 3. Note: "n = 13" refers to the non-survivors, or "truncated" observations.

One density that shows an increasing hazard rate for certain values of its parameters, and which would appear flexible enough to fit the present data is the three-parameter Weibull: $f(t) = \lambda\gamma(t-a)^{\gamma-1}\exp[-\lambda(t-a)^{\gamma}]$ where $\lambda$, $\gamma$, and a are all positive and are the scale, shape, and "assured lifetime" (or location) parameters respectively (t > a, of course). Estimation of a by the usual methods is troublesome, and a is generally taken to be some value near the smallest observed value of T in the sample. With the data transformed by subtraction of a, a truncated-Weibull plot ($\ln t$ vs $\ln(\ln[1/S(t)])$) was done for severity groups 2 and 3 (see Figures 4(A) and (B)). As with the truncated hazard plots, these Weibull plots also tend to infinity as t approaches the truncation point, R; the estimation of the slope (i.e., of $\gamma$) must be done with this fact in mind.

Examination of Figure 4(A) suggests that a truncated Weibull provides a rather good fit to the data from severity group 2, and a rough estimate of the slope yields $\hat{\gamma} = 1.3$. The truncated Weibull would also seem adequate as a model for the group 3 data, but, here, the paucity of cases makes the estimation of the slope rather arbitrary; for what it is worth, the slope of the free-hand regression line is between 1.2 and 1.3.
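The construction of the truncated-Weibull plot can be sketched as follows, using the plotting positions defined in Chapter 4. The function names and the synthetic data are hypothetical; in particular, the least-squares slope through the earliest points is only a stand-in for the free-hand line used in the text, and is an assumption of this sketch.

```python
import math


def truncated_weibull_points(death_times, a=0.0):
    """Return (ln t, ln ln[1/S]) for the ordered death times, after
    subtracting the assured lifetime a (all times must exceed a).
    S at the i-th ordered time is taken as (d + 1 - i) / (d + 1)."""
    ts = sorted(t - a for t in death_times)
    d = len(ts)
    points = []
    for i, t in enumerate(ts, start=1):
        s = (d + 1 - i) / (d + 1)
        points.append((math.log(t), math.log(math.log(1.0 / s))))
    return points


def slope_of_first_points(points, k):
    """Least-squares slope through the k smallest-t points, where the
    upward drift caused by truncation is weakest."""
    xs = [p[0] for p in points[:k]]
    ys = [p[1] for p in points[:k]]
    mx = sum(xs) / k
    my = sum(ys) / k
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic check: times chosen so that the points lie exactly on the
# line ln(lam) + gam * ln(t), with hypothetical lam = 0.05 and gam = 1.3.
lam, gam, d = 0.05, 1.3, 6
demo_times = [(math.log((d + 1) / (d + 1 - i)) / lam) ** (1.0 / gam)
              for i in range(1, d + 1)]
demo_points = truncated_weibull_points(demo_times)
demo_slope = slope_of_first_points(demo_points, d)
```

With real truncated data the later points bend upward, so the slope would be taken from the earlier part of the plot only (a smaller k).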
Turning now to the problem of maximum likelihood estimation of the parameters, $\lambda$ and $\gamma$, for the Weibull, it is not difficult to show that the conditional MLE's, given the ages at the outset of the study, satisfy:

$\hat{\lambda} = d / \sum_{i=1}^{n}(t_i^{\gamma} - a_i^{\gamma})$

and

$\sum_{i=1}^{n}(t_i^{\gamma}\ln t_i - a_i^{\gamma}\ln a_i) / \sum_{i=1}^{n}(t_i^{\gamma} - a_i^{\gamma}) - 1/\gamma = (1/d)\sum_{i=1}^{n}\delta_i\ln t_i$

... using the notation of Appendix A (footnote 1), along with $d = \sum_{i=1}^{n}\delta_i$ = the number who died on study. Treating the data as if they were an ordinary progressively-censored sample, the likelihood is simply $\prod_{i=1}^{n}\{f(t_i)\}^{\delta_i}\{S(t_i)\}^{1-\delta_i}$, with $\delta_i$ as above, and $t_i$ = the greatest observed age, minus the assured lifetime, a. In this case, $\hat{\lambda}$ and $\hat{\gamma}$ are found from a pair of equations that may be derived from those above by removing all occurrences of the $a_i^{\gamma}$ and $a_i^{\gamma}\ln a_i$ terms.

Footnote 1: Except that both $t_i$ and $a_i$ are now adjusted for a.

Figure 4: Truncated-Weibull Plot of Ages at Death: (A) Severity Group 2 (a = 35 years). Horizontal axis: ln(t).

Figure 4(B): Severity Group 3 (a = 15 years). Horizontal axis: ln(t).

Table V compares the conditional and full likelihood MLE's obtained by solving the above equations with the aid of specially-written computer programs; the effects of changing a were also explored to some extent, and are really quite predictable. Since the Weibull hazard function is $\lambda(t) = \lambda\gamma(t-a)^{\gamma-1}$, it is clear that the value of $\gamma$ is crucial here. In this respect, the reader will note that the conditional estimates of $\gamma$ are much lower than the corresponding full likelihood estimates - as should be the case, since the full likelihood entirely disregards the truncated nature of the sample.

Table V: Comparison of Full and Conditional MLE's for $\lambda$ and $\gamma$

(A) Severity Group 2 (n = 51)

a | FULL | CONDITIONAL
35 | $\hat{\lambda}$ = 0.0023063, $\hat{\gamma}$ = 2.007 | not done, due to starting ages < 35
30 | $\hat{\lambda}$ = 0.0000681, $\hat{\gamma}$ = 2.96405 | $\hat{\lambda}$ = 0.00487, $\hat{\gamma}$ = 1.80725
25 | $\hat{\lambda}$ = 0.0000023, $\hat{\gamma}$ = 3.80000 (*) | $\hat{\lambda}$ = 0.00129, $\hat{\gamma}$ = 2.11323

(*) n = 52: outlier included

(B) Severity Group 3 (n = 15)

a | FULL | CONDITIONAL
15 | $\hat{\lambda}$ = 0.0000943, $\hat{\gamma}$ = 2.751 | $\hat{\lambda}$ = 0.2864, $\hat{\gamma}$ = 1.0661
12.5 | $\hat{\lambda}$ = 0.0000108, $\hat{\gamma}$ = 3.308 | $\hat{\lambda}$ = 0.1545, $\hat{\gamma}$ = 1.2158
10 | $\hat{\lambda}$ = 0.0000017, $\hat{\gamma}$ = 3.77 | (none given)

An unfortunate side effect of using conditional MLE's is their greater variance relative to full likelihood estimates. Calculation of the asymptotic covariance matrix of the conditional estimates for group 2 (with a = 30) yields: $\sqrt{\mathrm{VAR}(\hat{\lambda})} = 0.007$, $\sqrt{\mathrm{VAR}(\hat{\gamma})} = 0.418$; and the estimated correlation of $\hat{\lambda}$ and $\hat{\gamma}$ is -0.9898. Similar calculations were carried out for severity group 3, but are of little use with such a small sample size (n = 15), and are omitted; it should also be noted that maximum likelihood estimates are often quite biased in samples of size 15, but alternative estimates that remain valid with truncated data of this kind appear to be difficult to develop.

Turning now to the (censored) sample of known durations of hypertension, the reader will recall from previous comments that the durations recorded at the start of the ten-year follow-up period are to be viewed with suspicion; a clue to the attitude of the original researchers vis-a-vis the reliability of this variable may be found in the absence of variable 3 from the list of "items" used to match surgical and control cases.
It is further noted that nine male and nine female patients had a known duration of hypertension of six months when the study began, indicating that their medical history began with their first visit to one of the hypertension clinics; in other cases, durations of up to 16 years were recorded. In view of these negative considerations and the experience with the age-at-death variable, it was felt that only a brief examination of the known duration data was warranted here.

As with the age-at-death data, product-limit estimates of the survival function were made for severity groups 2 (n = 51) and 3 (n = 15). It was noted earlier that truncation on the left has relatively little effect on the resulting hazard and cumulative hazard functions; in particular, there is no tendency to drift off toward infinity and the required adjustments are straightforward. Since a generally increasing hazard function was suspected here again, a type of left-truncated Weibull plot was constructed for each of severity groups 2 and 3 (see Figures 5(A) and (B)). In this case, as t approaches the truncation point, L, the ordinate, $\ln[\ln(1/S(t))]$, approaches $-\infty$, and slope estimates should weight points progressively less as t diminishes. However, even allowing for this behaviour, the Weibull does not appear to provide a particularly good fit to the group 2 data; and, as before, the group 3 data are rather limited for such analyses. The "subjectively-weighted" free-hand regression line estimates for $\lambda$ and $\gamma$ in groups 2 and 3 are, respectively, $\hat{\lambda} = 0.033$, $\hat{\gamma} = 1.22$ and $\hat{\lambda} = 0.062$, $\hat{\gamma} = 1.38$.
Conditional MLE's, given the durations at the start of the study, could have been found exactly as for the age-at-death variable, but it was decided, instead, to pursue more interesting variables.

Figure 5: Truncated-Weibull Plot of Known Durations: (A) Severity Group 2 (outlier removed). Horizontal axis: ln(t).

Figure 5(B): Severity Group 3. Horizontal axis: ln(t).

5.2 The Missing Data Problem

Section 4.2 contains an outline of the methods used to deal with the problem of missing data among the coded symptom variables. Section 3.2, in addition to discussing the various reasons behind the missing data, also lists (for each variable) the number of cases for which no observation was recorded. The following points should further illustrate the nature of the problem:

(a) As noted earlier, the number of missing observations per case varied from 0 to 36, and a complete set of data was available for 23.5% of the entire sample.

(b) The cases with the largest number of missing observations include one with 36 missing variables, two with 25, one with 24, one with 23, and three with 22. These, and other cases with a high percentage of missing data were closely examined in an effort to understand why they fell into this group. It was found that, out of the worst 13 cases, only five or six presented serious problems for succeeding analyses, since a large percentage of the missing data involved the "condition prior to death" variable that was included among the coded symptom variables (little use was made of this information in the analyses that followed).
Of these five or six cases, only one ("88C" in the original data matrix of [14]) was omitted from the estimation procedure, because 26 of the 36 missing observations did not involve the "condition prior to death" variable.

(c) The most incomplete variables are: HECG2 (42.8% missing), HSIZ2 (36.7%), RRET2 and RPAP2 (31.6% each), RVES2 (29.6%), KSPN2 (28.6%), KPRO2 (25.5%), and HECG5 (24.5%). Thus, it would appear that the greatest problem is with the heart and retinal symptoms at the two-year point in the study, a result that is likely due to the failure of many cases to be examined at some time near this point in their follow-up.

(d) One of the correlation matrices produced by the BMD:PAM program showed that there was a tendency for data recorded at the same time point of the study to be either mostly present or mostly absent; that is, a case with one time-two variable missing (present) is very likely to have several other time-two variables missing (present). This fact implies that the variables used to provide an estimate for a given missing observation will more frequently belong to a different time point of the study than does the missing observation; such a process would tend to strengthen the correlation among variables recorded at different time points.
(e) When the complete sample was divided into the three groups prior to the generation of stepwise regression estimates, it was noted that:
   (i) in group 1, for which no data could have been recorded at the five and ten year points, 18.6% of the remaining 26 coded symptom variables were missing; this subsample is of size n = 17;
   (ii) the corresponding percentage in group 2, which now includes the time 5 variables, was 12.3%; n = 14 here;
   (iii) for group 3, where all 4 time points could have been used, and n = 67, almost 15% of the potential total of 67(13)(4) = 3484 observations were missing.

(f) For each of the above three groups, a transition frequency matrix was constructed for each of the 13 coded symptoms; such a matrix contains counts of the number of times a given symptom at some level, x (= 0, 1, 2, 3, or 4), changes to another level, y, at the succeeding time point. These matrices showed, for example, that heart size (HSIZ) very rarely decreased with the passage of time, whereas symptoms such as blood pressure level and headache were considerably more variable in this sample. Such information allowed an objective appraisal of the quality of the regression estimates generated by BMD:PAM, and suggested alternatives when these appeared to be quite unreasonable (as was occasionally the case).

(g) Finally, the program BMD:P2D was used with the "completed" data sets; by producing frequency distributions, histograms, and measures of location and variation for each variable, this program suggested that the final estimates were quite reasonable and consistent; also illustrated were the evolution of a given symptom over time and the symmetry (or lack thereof) of the sample histograms. These results were followed up in later examinations of the data.
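The transition frequency matrices of point (f) are simple to tabulate. The following is a minimal sketch, not the procedure actually used in the study: the input layout (one list of coded levels per case, ordered by time point, with None for a missing observation) and the function names are assumptions made for illustration.

```python
def transition_matrix(series_by_case, levels=5):
    """Count moves from level x (row) to level y (column) between successive time points.

    series_by_case: iterable of per-case lists of coded levels (0..levels-1),
    ordered by time point; None marks a missing observation (assumed layout).
    """
    counts = [[0] * levels for _ in range(levels)]
    for series in series_by_case:
        for prev, nxt in zip(series, series[1:]):
            if prev is None or nxt is None:
                continue  # a transition is counted only when both ends are observed
            counts[prev][nxt] += 1
    return counts


def count_decreases(counts):
    """Number of transitions to a lower level (entries below the diagonal)."""
    return sum(counts[x][y] for x in range(len(counts)) for y in range(x))


if __name__ == "__main__":
    # Hypothetical coded values for one symptom at times 0, 2, 5 and 10 years.
    cases = [[0, 1, 1, 2], [1, None, 2, 2], [3, 3]]
    matrix = transition_matrix(cases)
    for row in matrix:
        print(row)
    print("decreases:", count_decreases(matrix))
```

A symptom that rarely improves, as reported for HSIZ, would show almost all of its counts on or above the diagonal, so a regression estimate implying a drop in level could be flagged for review.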
5.3 Exploratory Stratifications

The use of the program BMD:P3D, with various criteria for division of the original sample into two subsamples in each case, proved to be useful for many reasons. First, it was a natural sequel to the complete-sample descriptions provided by BMD:P2D and discussed above (Section 5.2). Second, it provided an opportunity to check (on a univariate basis) for the existence of non-overlapping, separate sub-samples; when such groups are present in a sample, spurious sample correlations that result may cause problems for later analyses. Finally, among the many variables available for this sample are several whose relationships to prognosis and/or other symptoms were of interest on an a priori basis; that is, they were suggested by the medical theory. Of course, this tool is largely an exploratory one, designed to produce clues rather than conclusions.
When divided on the basis of papilledema symptoms (associated with accelerated hypertension) at time zero, that is, severity groups one and two versus severity group three, the data yielded the following results:
- all six blood pressure measurements (the extremes as well as the coded averages), all three kidney variables, and all three retinal variables had significantly larger average values in the papilledema group; the P-values for the univariate t-statistics here were above 0.005 in only one case, the KSYMO variable;
- the age, family history, known duration, heart, and brain variables were not significantly different, on the average, in these two groups;
- examination of the accompanying histograms allowed one to assess the effects of outliers on these test results, and showed that there was a considerable overlapping of ranges between the two subsamples, even for those variables having highly different means.

Uremia as a cause of death seems to be associated with the papilledema group. If the presence of papilledema is accepted as a reliable criterion for the diagnosis of accelerated hypertension in this sample, then the foregoing results appear to support the belief that this form of the disease has its most serious effects on the kidneys, and that these effects are strongly associated with severely elevated blood pressure.

Table VI summarizes the results of the remaining stratifications that will be discussed in this section.
The first column (severity group 1 - group 2) shows that these groups are well-distinguished with respect to all variables except HIST, DUR and SEX; in fact, the results here are more striking than for the previously-discussed stratification (using papilledema), and support the suggestion that accelerated hypertension could reasonably be regarded as a very severe form of hypertensive vascular disease rather than a separate disease unto itself.

Table VI: Preliminary Stratifications of Severity Group 1 and 2 Data
(P-values, in percent form, for t-tests)

STRATA            SEVERITY         AGE              HISTORY          SYS. RANGE
                  GRP.1 - GRP.2    YOUNG-OLD        NEG.-POS.        LOW-HIGH
STRATA SIZES      31, 52           36, 47           39, 44           37, 46

AGE               *   1.2 <        NA               *   2.6 >           16.5 <
HIST                 28.0 >           23.1 >        NA                   6.2 <
DUR                  57.9 >        *   1.0 <           85.8 >           85.8 >
MAXSYS            **  0.3 <        **  0.2 <           99.2 <        **  0.0 <
MAXDIA            *   1.9 <           29.4 <        *   2.0 <        **  0.1 <
MINSYS            *   4.1 <        *   1.3 <        *   2.9 >            6.6 >
MINDIA            **  0.8 <           24.5 <           59.9 >           40.7 >
SYSO              **  0.0 <            6.4 <           78.7 <        **  0.0 <
DIAO              *   1.2 <           77.7 <           71.8 <        **  0.0 <
HSYMO             **  0.0 <            5.5 <           29.3 >           12.1 <
HSIZO             **  0.0 <           16.5 <           45.9 >           20.1 <
HECGO             **  0.0 <        **  0.1 <           50.3 <           28.1 <
BACHO             *   1.5 <           64.3 >           65.5 <           66.9 >
BCVAO             **  0.0 <        **  0.2 <           20.7 >           55.2 >
KSYMO             **  0.0 <        *   4.2 <           46.6 >           47.0 <
KPROO             **  0.0 <        *   1.6 <           97.7 >           15.1 <
KSPNO             **  0.2 <        *   3.1 <           62.1 >           49.0 >
RVESO             **  0.0 <        **  0.1 <           98.9 >        **  0.5 <
RRETO             **  0.0 <        *   2.0 <           82.6 >           23.7 <
SEX                  93.1 >           82.1 >           71.0 >            7.5 <

MORTALITY         16.13%-53.85%    30.56%-46.81%    35.90%-43.18%    27.02%-50.00%
MEAN SURV. TIME   7.33 - 4.33      5.04 - 4.59      4.49 - 4.92      5.59 - 4.37
(NON-SURVIVORS, YR)

KEY: (i)   ** : P-value < 1%;  * : 1% < P-value < 5%
     (ii)  NA : not applicable (stratification variable)
     (iii) < (>) : the mean of the first subsample was lower (higher) than the mean of the second.

The division on the basis of age created an under-forty group [2] of size 36 and an over-forty group with 47 individuals.
The existence of significant differences between the means of the various systolic variables and the absence of such differences for the diastolic variables is of interest and recalls the belief that it is primarily the systolic pressure that increases with age. As for the other variables, there is some tendency, though by no means a striking one, for heart, brain, kidney, and retinal symptoms to be worse in older patients. Thus, even with severity group 3 excluded here, the results suggest that youth does not preclude serious organic complications when hypertension is present.

With respect to the family history strata ("negative or doubtful" versus "positive"), it is clear that one of the following is true: either one's family history of this disease has little effect on its resulting severity, or the data available contain so many inaccuracies that any effect is almost completely obscured. In view of current knowledge concerning the hereditary aspects of hypertension, one might be tempted to subscribe to the former explanation, though the present data are far from convincing here.

Finally, the systolic range criterion (defined as MAXSYS - MINSYS) was used to create strata that possibly reflect the idea of high and low variability of blood pressure; this analysis represents a crude first step in a series of attempts to explore the implications of variability of blood pressure on the patient's outlook. Apart from some effect on the retinal vessels, the only other result to note here is that a high systolic range is associated with a high average value of MAXSYS and a somewhat low average value of MINSYS; the possibility that this analysis is confounding range and level effects of blood pressure led to more refined analyses in the sequel.

[2] 40 was a round number close to the overall mean of the sample of ages, which was skewed to the left.

The relationship of the various stratification criteria to prognosis is summarized in the last two rows of Table VI. In terms of both mortality rate and mean survival time of the patients that died on study, the severity criterion is clearly the most significant of the four; it is interesting to note that AGE is in third place in this respect, after systolic range.

It should be clear that the criteria discussed here represent only a small and rather arbitrary subset of those that would be of interest for one reason or another; for example, it will be seen later that division on the basis of sex makes a valuable contribution to the analysis. However, as noted in the last chapter, better methods exist for data sets like the present one, although they may not possess the same straightforward, intuitive appeal as stratification. It should be noted, too, that even the limited analyses of this section suggest that the present data set is generally of a high quality, insofar as their results appear to be consistent with certain widely-accepted medical views on hypertension.

5.4 Analysis of Time Zero Variables

This section describes some of the results obtained by applying the fairly standard techniques of multiple regression and discriminant analysis outlined in the last chapter; hazard function regression models are left to a later section.
5.4.1 Preliminary Explorations

The first group of analyses was carried out on the original time zero variables described in Chapter 3. Questions involving the choice of dependent variable, y, and possible division of the 98 cases into subsamples caused this group of analyses to be a very large one; the following tables are intended to convey some idea of the various directions taken at this early stage of the multi-variable analysis.

Table VII is a short description of the pattern of correlation between the time zero variables and various re-expressions of the TIME variable. The following points are noted here:
- whether severity group 3 is omitted or not, there is a remarkable consistency in the variables that are most strongly correlated with the three targets, STATUS, INTERV, and INTERM, even though STATUS is a much coarser re-expression than INTERM;
- for the complete sample (n = 98), the retinal, blood pressure, and kidney variables dominate; in the reduced sample (n = 83), retinal symptoms have a somewhat lower ranking, and heart symptoms are now among the top ten (although just barely);
- the extra detail added by the INTERM re-expression appears to have picked up some correlation with the RPAPO variable, as would be expected from the medical theory concerning papilledema;
- the correlation of a patient's age with his or her survival time is weak relative to that of the leading symptom variables;
- among the blood pressure variables, MINSYS generally has the weakest correlation with survival time; MINDIA also has a fairly weak correlation.

Table VII: Ten Strongest Correlations (with Survival Time) and Their Ranks

Variable (at Time 0): MAX-SYS, MAX-DIA, MIN-SYS, MIN-DIA, SYS, DIA, HSYM, HSIZ, HECG, BACH, BCVA, KSYM, KPRO, KSPN, RVES, RRET, RPAP, SEX

Entries: correlation (rank), in column order, for the ten variables ranked in each row.

TARGET, SAMPLE
(A) INTERV (n=98):  -.48 (7)  -.53 (3)  -.42 (9)  -.49 (6)  -.51 (4)  -.40 (10)  -.50 (5)  -.56 (1)  -.53 (2)  -.44 (8)
(B) INTERM (n=98):  -.47 (8)  -.52 (4)  -.44 (9)  -.48 (7)  -.52 (5)  -.43 (10)  -.49 (6)  -.58 (1)  -.58 (2)  -.55 (3)
(C) STATUS (n=98):   .41 (7)   .47 (2)   .37 (9)   .43 (6)   .46 (3)   .38 (8)    .45 (4)   .47 (1)   .44 (5)   .33 (10)
(D) TIME (n=46)*:   -.50 (4)  -.44 (8)  -.38 (9)  -.37 (10) -.50 (5)  -.53 (2)   -.44 (7)  -.57 (1)  -.50 (6)  -.51 (3)
(E) INTERM (WT'D):  -.51 (6)  -.51 (5)  -.42 (9)  -.48 (7)  -.53 (4)  -.38 (10)  -.42 (8)  -.59 (1)  -.55 (2)  -.53 (3)
(F) INTERV (n=83):  -.43 (4)  -.49 (1)  -.44 (3)  -.47 (2)  -.33 (9)  -.31 (10)  -.42 (6)  -.38 (8)  -.42 (5)  -.38 (7)  -
(G) STATUS (n=83):   .36 (7)   .43 (1)   .39 (4)   .42 (2)   .32 (9)   .30 (10)   .41 (3)   .37 (5)   .36 (6)   .33 (8)  -
(H) TIME (n=33)*:   -.52 (3)  -.48 (4)  -.55 (2)  -.57 (1)  -.28 (10) -.28 (9)   -.33 (6)  -.29 (7)  -.29 (8)  -.43 (5)  -

NOTES:
(i)   INTERV = 1 if TIME ∈ (0,4]; 2 if TIME ∈ (4,10); 3 if TIME > 10 yr.
(ii)  INTERM = 1 if TIME ∈ (0,2]; 2 if TIME ∈ (2,5]; 3 if TIME ∈ (5,10); 4 if TIME > 10 yr.
(iii) STATUS = 1 if TIME < 10 yr; 0 if TIME > 10 yr.
(iv)  * : non-survivors only.
(v)   n = 98: complete sample.
(vi)  n = 83, 33: severity group 3 omitted.
(vii) WT'D: survivors weighted 0.3.

The results of applying the standard techniques of multiple linear regression to the variables of Table VII are summarized in Table VIII [3]. The "all subsets regression" program, BMD:P9R, was useful here since the correlation matrix (see Appendix D) revealed that the predictor variables are strongly interrelated in these samples. Table VIII reproduces the "best" three subsets for each target, with "best" defined in terms of Mallows' C_p criterion (the value of R² and the adjusted R² also appear for each set of predictors). The row headed "MIN. TOLER." contains the smallest tolerance or usable fraction (that is, 1 - R²) among the predictors in the given set, along with the name of the corresponding predictor; this information is intended to provide a rough idea of the degree of multicollinearity existing in each set.

[3] The variables AGE, HIST, and DUR were included in the analyses, but are generally omitted from the tables to save space. "ASR" refers to all subsets regressions; "% TOTAL R²" refers to the percentage of the R² value for the entire set of predictors that is achieved with the given subset.

A few more detailed comments about Table VIII follow:
- the sets given in parts (A) to (C) of the table indicate that, like the correlation results above, the regression results are not very sensitive to the particular re-expression of the TIME variable used; among the minor differences here are the replacement of RVESO by RPAPO in going from INTERV to INTERM, and the absence of all retinal variables from the sets reported for STATUS;
- the variable, MINSYS, is always accompanied by other blood pressure
Table VIII: Regression Results, g , (eg) * 1000, for the Combined Sample VARIABLES & SUMMARIES < ASR 'BEST' :A) INTERV (n=< ASR #2 )8) ASR #3 (B ASR 'BEST' ) INTERM (n=98) ASR #2 ASR #3 AGE HIST DUR MAXSYS MAXDIA MINSYS MINDIA -15 (3) 13 (4) -18 (7) -12 (3) 13 (4) -16 (7) -16 (3) 13 (4) -18 (7) -20 (3) 19 (5) -29 (8) -16 (4) 19 (5) -26 (8) -19 (3) 18 (5) -27 (8) SYSO DIAO -169 (115) -239 (146) HSYMO HSIZO HECGO BACHO BCVAO -177 ( 56) -173 ( 56) -149 ( 63) -250 ( 73) -245 ( 72) -193 ( 80) KSYMO KPROO KSPNO -445 (110) -428 (110) -188 (122) -483 (102) -508 (136) -483 (136) -247 (156) -469 (137) RVESO RRETO RPAPO SEX INTERCEPT -122 ( 77) 298 ( 63) 5174 -119 ( 77) 290 ( 63) 4871 315 ( 63) 5264 -374 (128) 345 ( 82) 7079 -369 (127) 335 ( 82) 6643 -350 (128) 360 ( 82) 6839 R 2(%) % TOTAL R^ ADJ'D R 2(%) Msi 60.6 95.6 57.6 -0.25 0.3309 61.5 97.0 58.1 -0.19 0.3266 60.5 95.5 57.5 -0.11 0.3315 63.7 93.7 60.9 16.8 0.5392 65.0 95.6 61.6 17.7 0.5294 64.7 95.1 61.6 17.9 0.5305 MIN. TOLER. VARIABLE 0.3065 MINSYS 0.3065 MINSYS 0.3054 MINSYS 0.3091 MINSYS 0.3088 MINSYS 0.3031 MINSYS Table VIII, Continued VARIABLES & SUMMARIES ASR 'BEST' (C) STATUS (n=< ASR #2 J8) ASR #3 (D) TIM ASR 'BEST' E (NON-SURVIVORJ ASR #2 5, n=46) ASR #3 AGE HIST DUR 8 (5) MAXSYS MAXDIA MINSYS MINDIA 5 (2) -8 (3) 11 (4) 6 (2) -8 (3) 10 (4) 8 (2) -7 (3) 11 (4) -32 ( 17) -31 ( 17) -46 ( 18) 47 ( 20) -56 ( 31) SYSO DIAO 137 ( 77) 115 ( 76) -1456 (564) -1380 (557) -1653 (572) HSYMO HSIZO HECGO BACHO BCVAO 102 ( 37) 91 ( 36) 63 ( 41) KSYMO KPROO KSPNO 241 ( 67) 263 ( 66) 121 ( 81) 252 ( 67) RVESO RRETO RPAPO -1,107 (332) 847 (365) -570 (363) -808 (351) -666 (351) SEX INTERCEPT -121 ( 41) -1096 -122 ( 41) -983 -132 ( 41) -1066 853 (300) 16874 712 (306) 16505 943 (320) 18377 R 2(%) % TOTAL R 2 ADJ'D R 2(%) Msi 48.2 95.3 43.6 -0.36 0.1420 46.8 92.5 42.7 -0.20 0.1442 46.8 92.4 42.7 -0.17 0.1443 56.8 82.4 52.6 -2.59 3.8542 59.3 86.0 54.2 -2.53 3.7208 64.0 93.4 57.9 -2.51 3.4212 MIN. TOLER. 
VARIABLE 0.308 MINSYS 0.309 MINSYS 0.305 MINSYS 0.641 MAXSYS NOT AVAILABLE NOT AVAILABLE Table VIII: Continued VARIABLES & SUMMARIES (E) INTERM ASR 'BEST' (SURVIVORS WEI ASR #2 GHTED 0.3) ASR #3 (F) INTERV (S ASR 'BEST' EV. GRP. 3 OMIT ASR #2 TED, n = 83) ASR #3 AGE HIST DUR MAXSYS MAXDIA MINSYS MINDIA -17 (4) 22 (5) -26 (8) -15 (4) 22 (5) -28 (8) -18 (4) 21 (5) -28 (8) -12 (3) 14 (4) -15 (8) -10 (3) 15 (4) -16 (8) -15 (3) 14 (4) -18 (8) SYSO DIAO -309 (139) -353 (139) -298 (138) -184 (121) -186 (120) HSYMO HSIZO HECGO BACHO BCVAO -202 ( 67) -185 ( 67) -216 ( 67) -139 ( 69) -131 ( 68) -135 ( 69) KSYMO KPROO KSPNO -237 (128) 163 (100) -308 (134) -303 (150) -473 (121) -260 (152) -439 (123) -337 (149) -482 (122) RVESO RRETO RPAPO -183 ( 94) -266 (103) -243 ( 89) -312 (101) -212 ( 94) -294 (104) -135 ( 93) SEX INTERCEPT 369 ( 77) 6,588 362 ( 78) 6,480 354 ( 77) 6,951 255 ( 68) 4,471 255 ( 67) 4,244 266 ( 68) 4,822 R 2(%) % TOTAL R 2 ADJ'D R 2(%) Msi 68.2 95.6 64.9 15.42 0.4637 66.9 93.8 63.9 15.72 0.4764 69.1 96.9 65.6 15.91 0.45503 53.8 92.4 48.9 3.47 0.3248 55.1 94.7 49.6 3.55 0.3200 52.4 90.0 48.0 3.60 0.3304 MIN. TOLER. VARIABLE 0.288 MINSYS 0.3 MINSYS 0.28 MINSYS 0.330 MINSYS 0.32 MINSYS 0.34 MINSYS Table VIII, Continued VARIABLES & SUMMARIES (G) STATUE ASR 'BEST' > (SEV.GRP. 3 ASR #2 OMITTED, n=83] ASR #3 ) (H) TIME (N 1 ASR 'BEST' 3N-SURVIVORS OF and 2, n = 33) ASR #2 SEV. 
GRPS ASR #3 AGE HIST DUR MAXSYS MAXDIA MINSYS MINDIA 7 (2) -9 (3) 13 (5) 5 (2) -8 (3) 11 (6) 7 (2) -9 (3) 12 (5) 39 ( 16) -41 ( 22) 45 ( 17) 32 ( 16) SYSO DIAO 107 ( 82) -2862 (643) -2669 (688) -2980 (668) HSYMO HSIZO HECGO BACHO BCVAO 56 ( 47) -617 (333) -815 (349) -614 (337) -741 (368) -809 (364) KSYMO KPROO KSPNO 287 ( 92) 281 ( 83) 269 ( 93) 275 ( 82) 236 (101) 290 ( 82) RVESO RRETO RPAPO* -895 (437) -902 (458) SEX INTERCEPT -106 ( 45) -906 -100 ( 45) -701 -116 ( 46) -924 904 (352) 8347 850 (351) 15280 868 (366) 9216 R²(%) % TOTAL R² ADJ'D R²(%) MSE 41.8 86.9 37.2 0.59 0.1523 43.0 89.4 37.7 1.07 0.1510 42.8 89.0 37.5 1.28 0.1515 61.8 80.2 52.9 -0.66 3.4731 61.0 79.1 52.0 -0.28 3.5456 56.7 73.5 48.7 -0.24 3.7873 MIN. TOLER. VARIABLE 0.331 MINSYS 0.3 MINSYS 0.3 MINSYS 0.589 DIAO N.A. 0.6 DIAO
* not used here

variables (usually MAXSYS and MINDIA) and always has a coefficient whose sign is opposite to the one expected (in view of its significant negative correlation with TIME); MINSYS is very often the variable that is most closely related to the other predictors in the set;
- although the variables BACHO and SEX have relatively weak correlations with the various target variables, they also have large usable fractions; this fact probably accounts for their quite consistent appearance in the regression equations here, in preference to the heart symptom variables (which never appear);
- the re-expression, INTERM, seems to have some advantages over INTERV, in that the former is a finer partition of the TIME variable and has a greater percentage of its variance accounted for by the same number of predictors;
- although the prediction of TIME in a sample of patients who are known to survive less than ten years might seem to be of limited practical use, it should be noted that the results for the STATUS variable could be used as a first step to identify patients who are very unlikely to be ten-year survivors; it then seems quite reasonable to obtain more detail by application of the TIME equations with such patients;
- with weighted least squares regression, as in part (E) of Table VIII, the minimand is $\sum_{i=1}^{n} w_i (y_i - \hat{y}_i)^2$, where $w_i$, in this case, is 0.3 for the survivors and 1 otherwise; weighting was used here as a compromise between working with the complete sample and using only the non-survivors, and it also allows the fitting process to perform better with the data representing actual observed times; comparing parts (B) and (E) of the table, it is seen that the weighting technique has not created any major differences, although an extra retinal variable is retained in the weighted solutions;
- elimination of severity group 3 from the sample has the rather understandable effect of decreasing the importance of retinal symptoms as predictors, but this is the only major change;
- regression of the TIME variable for the 33 non-survivors in severity groups 1 and 2 represents the greatest departure from the pattern that is generally in evidence in Table VIII: that is, inclusion of MAXSYS, MINSYS, MINDIA, DIAO (sometimes), BACHO, one or two kidney variables (especially KSPNO), one or two retinal variables (especially RVESO), and SEX; the major difference in part (H) is the replacement of a kidney variable by BCVAO.

Before leaving the description of regression results involving the original time zero variables, another form of weighted analysis that was done should be mentioned.
This technique, called "biweight regression", is described in [4, p.151] and is a robust-resistant form of least-squares regression; the iterative process starts with equal weighting of all cases, but in subsequent steps assigns to case i the weight $w_i = (1 - u_i^2)^2$ if $|u_i| < 1$ (and 0 otherwise), with $u_i$ = the residual, $y_i - \hat{y}_i$, from the preceding step, divided by some measure of variability of the last set of residuals. In the present application, the measure of variability was taken to be twice the inter-quartile range of the residuals, and six steps were used. Table IX provides some detail on the results of this process. The target variable here is INTERV, and because of the necessity of calculating the residuals after each step, a relatively small set of predictors was chosen here.

TABLE IX: Biweight Regression Results for INTERV (Coefficients × 10^6)

VARIABLES &    ORDINARY     BIWEIGHT
SUMMARIES      LEAST SQ.    FIRST      SECOND     THIRD      FOURTH     FIFTH      SIXTH
INTERCEPT      4665400      4735500    4762600    4775230    4781480    4784470    4786100
MAXSYS          -10908       -11690     -12020     -12204     -12007     -12353     -12380
BACHO          -178664      -203340    -211073    -215772    -217963    -219152    -219691
KSPNO          -436031      -437850    -422516    -410301    -402140    -397231    -394469
RVESO          -153157      -151790    -155141    -158746    -161276    -162895    -163802
SEX             277482       322450     340848     352275     358295     361678     363391

R² (%)           55.87        68.00      68.86      69.91      70.25      70.47      70.54
F-STAT           23.29        39.105     40.69      42.74      43.46      43.90      44.07
MSERROR          .3626        .2580      .2529      .2452      .2429      .2414      .2410

ANALYSIS OF RESIDUALS AFTER EACH REGRESSION:

               RES 1        RES 2      RES 3      RES 4      RES 5      RES 6      RES 7
|MAXIMUM|      1.5369       1.6335     1.6829     1.7147     1.7327     1.7429     1.7484
|MINIMUM|      0.0043       0.0003     0.0087     0.0099     0.0118     0.0130     0.01376
RANGE          2.9913       3.1125     3.1625     3.1927     3.2086     3.2175     3.22197
2S             1.9310       1.93902    1.88312    1.87510    1.86868    1.86860    1.86888
VARIANCE^(1/2) 0.58647      0.58867    0.5907     0.59237    0.5934     0.5940     0.5943
% | | < 0.50   55.1         57.2       58.2       58.2       58.2       58.2       58.2
% | | < 0.25   26.5         28.6       28.6       28.6       28.6       28.6       28.6

NOTE: (i)   $w_i = (1 - u_i^2)^2$ if $|u_i| < 1$, and 0 otherwise.
      (ii)  $u_i = (y_i - \hat{y}_i) / (2S)$.
      (iii) $S = Q_3 - Q_1$, the interquartile range of the residuals.
      (iv)  % | | : percent with absolute value.

It will be noted that the coefficients of step 6 are not greatly different from those of the ordinary least-squares method and the improvement in the percentage of residuals that are "small" is fairly modest. One may perhaps conclude from this that ordinary least-squares regression is adequate for the type of data being considered here. In any event, the absence of an efficient, single program to carry out the calculations precluded any more extensive use of biweighting in the present analysis.

Turning now to the alternate form of analysis for the STATUS variable, as indicated in Chapter 4, the reader will find some of the results summarized in Table X, parts (A) to (C). Part (A) gives the final result of a stepwise linear discriminant analysis that was carried out with the original sample randomly divided into groups of size 73 and 25; the larger subsample provided the discriminant function, and the smaller set of cases was held in reserve to provide a check on the function found. It is readily apparent that this function, using no prior probabilities or costs of misclassification, does not succeed very well with patients who died while on study.
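The weakness with non-survivors can be read directly from the classification counts reported in Table X(A). As a small illustrative sketch (the function name and the tuple layout are assumptions, not part of the original analysis), the per-group and overall percentages are obtained as follows:

```python
def percent_correct(alive_row, dead_row):
    """Percent correctly classified for each true group and overall.

    Each row is (number predicted alive, number predicted dead) for the
    cases whose true status is given by the row name (assumed layout).
    """
    alive_ok = 100.0 * alive_row[0] / sum(alive_row)
    dead_ok = 100.0 * dead_row[1] / sum(dead_row)
    total = sum(alive_row) + sum(dead_row)
    overall = 100.0 * (alive_row[0] + dead_row[1]) / total
    return alive_ok, dead_ok, overall


if __name__ == "__main__":
    # Counts as reported in Table X(A): jackknifed training set and omitted set.
    training = percent_correct((36, 3), (12, 22))
    omitted = percent_correct((11, 2), (7, 5))
    print([round(p, 1) for p in training])
    print([round(p, 1) for p in omitted])
```

In both subsamples the percentage for the "dead" group is far below that for the "alive" group, which is the imbalance the adjustments of Table X(B) are meant to address.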
To correct this undesirable situation, the entire sample was used in re-calculating the coefficients for the set of variables found in (A), and this time the constant, K, was adjusted using the sample proportions of alive and dead cases ($q_A$, $q_D$) and a cost of misclassification ratio that (arbitrarily) stipulates a 1.5 times higher cost of incorrectly classifying a patient who will in fact die within ten years. The results, which appear in Table X(B)(i), are not jackknifed results and indicate that the percentage of correctly classified non-survivors still lags behind that of the survivors.

Table X: Discriminant Analysis Results

(A) Using the linear rule that classes a case as a 10-year survivor if and only if:

    0.07285 (MAXDIA) + 0.70508 (BACHO) + 1.79003 (KSPNO) - 0.94844 (SEX) - 10.03429 < 0

    (Based on a random subsample of size 73)

(i) CLASSIFICATION MATRICES: "TRAINING SET" (n=73), JACKKNIFED*

             PREDICTED ALIVE   PREDICTED DEAD   PERCENT CORRECT
    Alive          36                 3              92.3
    Dead           12                22              64.7
                                                     79.5% overall

(ii) "OMITTED SET" (n=25)

             PREDICTED ALIVE   PREDICTED DEAD   PERCENT CORRECT
    Alive          11                 2              84.6
    Dead            7                 5              41.7
                                                     64.0% overall

*Each case is classified using a function calculated from the data of all cases other than the one being classified.

Table X, Continued

(B) Using the entire sample (n=98) with the variables found in Part (A), the rule is:

    SCORE = 0.06862 (MAXDIA) + 0.5682 (BACHO) + 1.6924 (KSPNO) - 0.50081 (SEX) - 10.178 < K

(i) With $K = -0.285321 = \ln\left[\dfrac{C(D|A)\,q_A}{C(A|D)\,q_D}\right]$, where $C(D|A)/C(A|D) = 2/3$, $q_A = .53$, $q_D = .47$:

             PREDICTED ALIVE   PREDICTED DEAD   PERCENT CORRECT
    Alive          42                10              80.77
    Dead           12                34              73.91
                                                     77.55% overall

(ii) With K = -0.32051, the adjustment for equal probabilities of misclassification:

             PREDICTED ALIVE   PREDICTED DEAD   PERCENT CORRECT
    Alive          40                12              76.92
    Dead           11                35              76.09
                                                     76.53% overall

(iii) With K = 0.11562, the adjustment for an overall minimum probability of misclassification:

             PREDICTED ALIVE   PREDICTED DEAD   PERCENT CORRECT
    Alive          46                 6              88.46
    Dead           14                32              69.57
                                                     79.59% overall

Table X, Continued

(C) Using "normalized" scores: class a case as a 10-year survivor if and only if:

    $\log_{10}(\text{score} + 7) \le C$, with score as in Table X(B).

(i) With C = 0.828245, the point midway between the mean normalized scores for the 2 groups, the result is identical to that of Table X(B)(i).

(ii) With C = 0.82151, the adjustment for equal probabilities of misclassification:

             PREDICTED ALIVE   PREDICTED DEAD   PERCENT CORRECT
    Alive          40                12              76.92
    Dead           11                35              76.09
                                                     76.53% overall

(iii) With C = 0.83346, the adjustment for an overall minimum probability of misclassification:

             PREDICTED ALIVE   PREDICTED DEAD   PERCENT CORRECT
    Alive          45                 7              86.54
    Dead           12                34              73.91
                                                     80.61% overall

(iv) With C = 0.816, to produce an opposite effect to (iii):

             PREDICTED ALIVE   PREDICTED DEAD   PERCENT CORRECT
    Alive          39                13              75.00
    Dead            9                37              80.43
                                                     77.55% overall
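The constant used in Table X(B)(i) can be checked numerically. The sketch below assumes the cutoff has the form K = ln[C(D|A) q_A / (C(A|D) q_D)], which reproduces the tabulated value, and applies the tabulated linear score; the function names, the example patient values, and the numeric coding of SEX are hypothetical, since the coding is not given in this section.

```python
import math


def cutoff(cost_ratio, q_alive, q_dead):
    """K = ln[(C(D|A)/C(A|D)) * (q_A/q_D)] (assumed form of the adjustment)."""
    return math.log(cost_ratio * q_alive / q_dead)


def linear_score(maxdia, bach0, kspn0, sex):
    """Linear discriminant score with the coefficients of Table X(B)."""
    return (0.06862 * maxdia + 0.5682 * bach0 + 1.6924 * kspn0
            - 0.50081 * sex - 10.178)


def predicted_survivor(score, k):
    """A case is classed as a 10-year survivor if and only if score < K."""
    return score < k


if __name__ == "__main__":
    k = cutoff(2.0 / 3.0, 0.53, 0.47)
    print(round(k, 4))
    # Two hypothetical patients (values invented for illustration).
    mild = linear_score(maxdia=100, bach0=0, kspn0=0, sex=1)
    severe = linear_score(maxdia=150, bach0=2, kspn0=2, sex=0)
    print(predicted_survivor(mild, k), predicted_survivor(severe, k))
```

Because the cost ratio of 2/3 makes K negative, the boundary shifts so that more cases are classed as non-survivors than under a cutoff of zero, which trades some accuracy among survivors for better detection of patients at risk.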
Since an examination of the two sets of discriminant scores produced by the function in (B) indicated a substantial difference in variance, the adjustments for equal and minimum probabilities of misclassification were explored; these results form parts (ii) and (iii) of Table X(B). It is clear that the overall minimum probability of misclassification is achieved at the expense of the percentage of non-survivors correctly classified. The fact that the probabilities of misclassification in (ii) are still not quite equal was originally thought to be related in some way to the skewness of the distributions of sample discriminant scores; the latter appeared more normal after a logarithmic transformation, and the adjustments of part (B) were repeated using the means and variances of "normalized" scores. The results appear in Table X(C). Although no effect was observed for the equalizing adjustment, the same cannot be said of the adjustment in (iii), although the overall gain is a modest 1%. Finally, an adjustment in the opposite direction to that of (iii) was considered (see part (C)(iv)), and it appears from this that a good percentage of non-survivors can be identified without greatly decreasing the overall percentage of correct classifications.

Table XI presents a summary of the discriminant results obtained by relaxing the restriction of equal covariance matrices in the two populations. Since the computations involved here are rather time-consuming, only one quadratic discriminant function was calculated, using the same four variables considered with the linear functions.
Part (i) of this table may be compared with Table X(B)(i), for example, and the degree of improvement judged. Lacking a BMDP program to carry out the jackknifing, it was decided to test this quadratic discriminant function on the available surgical data, even though the latter constitute a sample from a population whose ten-year mortality rate should be lower than that of untreated patients. The results appear in part (ii) of Table XI and, not surprisingly, show that the function incorrectly classifies many patients who actually survived the study period.

Table XI: Quadratic Discriminant Results

Using the quadratic rule that classes a case as a 10-year survivor if and only if:

    -0.00058 (w²) + 0.18674 (w) + 0.33378 (x²) - 4.0047 (x)
    + 4.1621 (y²) - 17.905 (y) + 0.15365 (z²) - 2.5004 (z)
    + 0.04448 (wx) + 0.15674 (wy) + 0.01178 (wz)
    + 0.52090 (xy) - 0.75132 (xz) - 1.44562 (yz) ≤ 16.452552

where w = MAXDIA, x = BACHO, y = KSPNO, z = SEX

(i) Classification Matrix (98 controls)

             PREDICTED   PREDICTED   PERCENT
             ALIVE       DEAD        CORRECT
    Alive       45           7        86.54
    Dead        10          36        78.26
                                      82.65% overall

(ii) Classification Matrix (98 surgical cases)

             PREDICTED   PREDICTED   PERCENT
             ALIVE       DEAD        CORRECT
    Alive       38          19        66.67
    Dead        12          29        70.73
                                      68.37% overall

One final look at the relationship between TIME and the variables MAXDIA, BACHO, KSPNO, and SEX resulted from re-expressing TIME in a manner suitable for use in a logistic model; although the STATUS variable could have been used here, an attempt to provide more detail led to the definition of the new target, TSTAT = 1 - (TIME/10), which is 0 for the survivors and approaches 1 as TIME decreases.
The coefficients calculated by a non-linear regression program of the BMDP library are, with standard errors in parentheses: 0.0591 (0.0035) for MAXDIA, 0.5059 (0.1704) for BACHO, 1.4226 (0.3031) for KSPNO, -0.5464 (0.1798) for SEX, and the constant was -9.5044. One advantage of using the logistic model with this equivalent re-expression of TIME is that all predicted values will belong to the original [0, 1] interval, thereby sidestepping one of the less desirable aspects of the regression models discussed earlier. However, problems do arise in comparing results obtained using TSTAT in a logistic model with ordinary regression and discriminant outcomes. For what it is worth, the residual mean square, MSE = Σ(y - ŷ)²/ν, is 0.064 for the above logistic result, compared with a value of 0.142 for ordinary regression using the STATUS variable.

5.4.2 Using Partially Re-expressed Data

The appearance of rather small usable fractions among even the relatively small sets of predictors selected by all subsets regression (and reported in the last section) led to a search for alternate sets of predictors possessing comparable predictive power and more stable estimated coefficients. This section presents some solutions in which as many of the original variables as possible are retained - for the sake of interpretability. The first solutions considered involved MINVAR transformations centering on MINSYS, the variable possessing the smallest usable fraction in many of the sets chosen for prediction of INTERV.
MINVAR calculations showed that the coefficient for MINSYS changes from +0.013 to -0.011 when the other predictors, MAXSYS, MINDIA, BACHO, KSPNO, RVESO, and SEX are replaced by their residuals after regression on MINSYS; with this set of predictors the residual of RVESO is now the variable having the smallest usable fraction (0.619). Table XII(A) describes one of many "partial" MINVAR transformations attempted, using regression on MINSYS for the residuals, once again. Since most of the problem with the small usable fraction of MINSYS stemmed from this variable being highly correlated with MAXSYS and MINDIA, a fairly interpretable and stable solution was obtained by replacing these variables with their residuals. Unfortunately, the predictor, RVESO, which is very strongly related to the target variable, is left with a low tolerance and a large standard error.
The solution presented in part (B) of Table XII demonstrates that MINSYS can be eliminated as an explicit member of this set of predictors by replacing MINDIA, KSPNO, and RVESO by their residuals; although the six-variable set that would result is not given in the table, it is noted here that the value of R² for this set would be 0.6057, and the tolerance of MAXSYS would improve considerably from 0.450. Problems with RVESO are again evident here, however.

Table XII: MINVAR-inspired Re-expressions

(A)
    VARIABLE     COEF*    ST.ERROR*   P-VALUE**   TOLERANCE
    TMXSYS        -15        3.0         0.0        0.714
    MINSYS         -5        2.5         5.5        0.841
    TMNDIA        -18        6.5         0.7        0.949
    BACHO        -177       56.2         0.2        0.894
    KSPNO        -445      109.8         0.0        0.730
    RVESO        -122       77.1        11.6        0.543
    SEX           298       62.8         0.0        0.857
    INTERCEPT    5174

    R² = 0.6061
    TMXSYS = MAXSYS - 0.68973 (MINSYS)
    TMNDIA = MINDIA - 0.41585 (MINSYS)
    * x 1000; ** x 100

(B)
    VARIABLE     COEF*    ST.ERROR*   P-VALUE**   TOLERANCE
    MAXSYS        -15        3.0         0.0        0.450
    MINSYS          1        3.1        75.8        0.525
    TMNDIA        -18        6.5         0.7        0.949
    BACHO        -177       56.2         0.2        0.894
    TKSPNO       -445      109.8         0.0        0.785
    TRVESO       -122       77.1        11.6        0.619
    SEX           298       62.8         0.0        0.857
    INTERCEPT    5174

    R² = 0.6061
    TMNDIA as in (A)
    TKSPNO = KSPNO - 0.00637 (MINSYS)
    TRVESO = RVESO - 0.01387 (MINSYS)

Another approach that was taken to overcome the problems caused by highly-interrelated groups of predictors involved the use of principal components - on a small scale: the highly correlated variables, MINSYS and MINDIA, were replaced by linear combinations that approximate their principal components, and MAXDIA was replaced by its residual, RMXDIA, after regression on the two components; KSYMO and KPROO were similarly replaced by their "components", KC1 and KC2,⁴ and KSPNO by its residual, RKSPN. The results, similar to those presented in Section 5.4.1 above, appear in Tables XIII to XVI.
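The re-expressions of Table XII leave the fitted equation unchanged; only the coefficient of MINSYS absorbs the adjustment. As a rough arithmetic check, the sketch below recomputes the MINSYS coefficients of parts (A) and (B) from the rounded original-variable coefficients (x 1000) quoted in the text and the table. It assumes that TMNDIA is the residual of MINDIA on MINSYS; the function name and the dictionary layout are illustrative and not part of the original analysis.

```python
# Coefficients (x 1000) of the original-variable equation for INTERV,
# as quoted in the text and in Table XII (rounded values).
original = {"MAXSYS": -15, "MINSYS": 13, "MINDIA": -18,
            "KSPNO": -445, "RVESO": -122}

# Slopes from regressing each predictor on MINSYS (notes to Table XII).
# Assumption: the 0.41585 slope belongs to MINDIA.
slope_on_minsys = {"MAXSYS": 0.68973, "MINDIA": 0.41585,
                   "KSPNO": 0.00637, "RVESO": 0.01387}


def minsys_coefficient(replaced):
    """MINSYS coefficient after the listed predictors become residuals.

    If X is replaced by TX = X - s * MINSYS, then b * X = b * TX + b * s * MINSYS,
    so the MINSYS coefficient changes by b * s for each replaced predictor.
    """
    return original["MINSYS"] + sum(
        original[name] * slope_on_minsys[name] for name in replaced)


part_a = minsys_coefficient(["MAXSYS", "MINDIA"])          # Table XII (A)
part_b = minsys_coefficient(["MINDIA", "KSPNO", "RVESO"])  # Table XII (B)
print(round(part_a, 1), round(part_b, 1))
```

Because the inputs are rounded to the precision shown in the table, the results agree with the tabulated MINSYS coefficients (-5 and 1) only to the nearest unit.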
It should be noted that the results in Table XIII were produced by a forward-backward stepwise regression program (BMD:P2R) in an ultimately futile attempt to pare down the computing costs incurred by using all subsets regression⁵ (as in BMD:P9R). The sets that appear in Table XIII correspond roughly to the "all subsets regression" results reported in Table VIII (C), (A), and (B), and it is interesting to note that these solutions appeared near or at the end of the stepping procedures.

⁴ The coefficients for these re-expressions were actually derived from the female sample only.
⁵ All subsets regression produced many more interesting solutions than the stepwise routine - for about the same price.

Table XIII: Regression Results, β̂ (σ̂_β̂) x 1000, Using Re-expressions of Extreme Blood Pressures

    VARIABLES &            (A) STATUS (n=98)       (B) INTERV (n=98)       (C) INTERM (n=98)
    SUMMARIES              SWR #12    SWR #13      SWR #9     SWR #10      SWR #14    SWR #15

    MAXSYS                   5 (2)      6 (2)      -15 (3)    -17 (3)      -16 (4)    -20 (3)
    PC1                     -3 (2)     -3 (2)        5 (3)      5 (3)        7 (4)      6 (4)
    PC2                     13 (5)     13 (5)      -22 (7)    -23 (7)      -32 (9)    -34 (9)
    HSYMO HSIZO HECGO
      BACHO BCVAO          102 (37)    91 (36)    -177 (56)  -196 (55)    -245 (72)  -250 (73)
    KSYMO KPROO KSPNO      241 (67)   263 (66)    -445 (110) -521 (100)   -483 (136) -508 (136)
    SEX                   -121 (41)  -122 (41)     298 (63)   307 (63)     335 (82)   345 (82)
    INTERCEPT             -1096       -983         5174       5465         6643       7079

    Entries in partially filled rows, in printed order:
    AGE HIST DUR           8 (5)
    SYSO DIAO              137 (77), 115 (76), -239 (146)
    RVESO RRETO RPAPO      -122 (77), -369 (127), -374 (128)

    R²(%)                   48.2       46.8         60.6       59.5         64.8       63.7
    ADJ'D R²(%)             43.6       42.7         57.6       56.8         61.6       60.9
    MSE                     0.1420     0.1442       0.3309     0.3364       0.5294     0.5393
    MIN. TOLER.             0.33       0.37         0.45       0.57         0.37       0.56
    VARIABLE                MAXSYS     MAXSYS       MAXSYS     MAXSYS, PC1  MAXSYS     MAXSYS

    NOTE: (i)   PC1 = 0.9148 (MINSYS) + 0.4039 (MINDIA)
          (ii)  PC2 = -0.4039 (MINSYS) + 0.9148 (MINDIA)
          (iii) RMXDIA = MAXDIA - 0.3187 (PC1) - 0.0729 (PC2)

Since the sets presented in Table XIII are equivalent to some of the solutions in Table VIII, the coefficients obtained for MINSYS and MINDIA by reversing the transformation to PC1 and PC2 should be identical to those found using the original variables; that this is so is easily checked. This of course would not be true if the sets had excluded one of the variables, PC1 or PC2, and it is interesting to note that both are retained - although PC1 is not highly significant in most of the sets. As for the tolerances, it is now MAXSYS that has the smallest usable fraction, although PC1 is not much better at around 0.56. PC2 is almost completely unrelated to the other predictors retained, with a tolerance of around 0.93. These usable fractions would have improved with the substitution of RMXDIA for MAXSYS, but such a solution did not appear here.

Table XIV describes "all subsets regression" results using dependent variables that were not discussed earlier: INTMED replaces the codes used for the INTERM intervals by the midpoints of the intervals (the last one being quite arbitrary), and the weighted TIME analysis is an attempt to work with the original data. While the results for weighted TIME are very similar to those of Table XIII, the INTMED sets retain RMXDIA rather than PC1, with a corresponding improvement in the usable fractions of the blood pressure variables. Comparing the INTMED and INTERM re-expressions, however, shows that the latter has a slightly higher percentage of its variance accounted for by regression on all 21 time zero predictors (68.4% versus 65.2%).
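The check on Table XIII mentioned above, that reversing the transformation to PC1 and PC2 recovers the coefficients of MINSYS and MINDIA, can be sketched as follows. The rotation weights are those given in the notes to Table XIII; the PC1 and PC2 coefficients (5 and -22, x 1000) are read from the INTERV, SWR #9 column under the assumption that its three blood pressure entries are MAXSYS, PC1, and PC2 in that order. The function and constant names are illustrative.

```python
# Rotation defining the two "components" (notes to Table XIII).
PC1_WEIGHTS = {"MINSYS": 0.9148, "MINDIA": 0.4039}
PC2_WEIGHTS = {"MINSYS": -0.4039, "MINDIA": 0.9148}


def back_transform(b_pc1, b_pc2):
    """Coefficients of MINSYS and MINDIA implied by coefficients on PC1, PC2.

    b1 * PC1 + b2 * PC2 is expanded in terms of the original variables by
    collecting the weight each component places on MINSYS and on MINDIA.
    """
    return {name: b_pc1 * PC1_WEIGHTS[name] + b_pc2 * PC2_WEIGHTS[name]
            for name in ("MINSYS", "MINDIA")}


# Rounded coefficients (x 1000) assumed for PC1 and PC2, INTERV, SWR #9.
implied = back_transform(5, -22)
print({name: round(value, 1) for name, value in implied.items()})
```

The implied values can be compared with the original-variable coefficients quoted elsewhere in the text (about +13 for MINSYS and -18 for MINDIA, x 1000); agreement is limited by the rounding of the tabulated coefficients.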
Table XIV: Regression Results, β̂ (σ̂_β̂) x 1000, Using Re-expressions of Extreme Blood Pressures and Kidney Variables

    VARIABLES &          (A) INTMED                                (B) TIME: WEIGHTED 1/3 FOR SURVIVORS (n=98)
    SUMMARIES            ASR 'BEST'     ASR #2        ASR #3       ASR 'BEST'    ASR #2        ASR #3

    MAXSYS                -64 (13)      -55 (14)      -62 (13)     -45 (14)      -39 (14)      -46 (14)
    RMXDIA                -60 (25)      -53 (25)      -63 (26)
    PC1                                                             21 (10)       18 (11)       18 (11)
    PC2                  -133 (39)     -126 (39)     -136 (40)     -88 (28)      -91 (28)      -87 (28)
    HSYMO HSIZO HECGO
      BACHO BCVAO        -925 (329)    -845 (330)   -1105 (324)   -640 (225)    -535 (219)    -560 (238)
    SEX                  1257 (345)    1261 (342)    1216 (351)   1031 (250)    1037 (253)    1049 (251)
    INTERCEPT            30,780        28,836        31,078       19,137        18,526        19,372

    Entries in partially filled rows, in printed order:
    SYSO DIAO            -1317 (451), -1420 (452), -1238 (457)
    KC1 KC2 RKSPNO       1304 (621), -2362 (621), 1305 (616), -2088 (625), -2150 (625), -755 (435), 431 (422), -843 (443)
    RVESO RRETO RPAPO    -1358 (534), -684 (425), -1097 (554), -1317 (544), -795 (299), -645 (336), -941 (290), -756 (335), -798 (299), -661 (337)

    R²(%)                 62.7          63.8          60.9         67.4          66.3          67.8
    ADJ'D R²(%)           59.4          60.1          57.8         64.0          63.2          64.1
    % of TOTAL R²         96.2          97.9          93.4         98.7          97.1          99.3
    Cp                    9.36          10.03         10.39        0.22          0.90          1.29
    MSE                   9.9273        9.7531        10.3035      4.943         5.0552        4.941
    MIN. TOLER.           0.71          not available 0.7          0.38          0.4           0.4
    VARIABLE              MAXSYS        MAXSYS        MAXSYS       MAXSYS, PC1   MAXSYS        MAXSYS

    NOTE: (i)   PC1, PC2, RMXDIA as in Table XIII
          (ii)  KC1 = 0.4612 (KSYMO) + 0.8873 (KPROO)
          (iii) KC2 = -0.8873 (KSYMO) + 0.4612 (KPROO)
          (iv)  RKSPN = KSPNO - 0.21775 (KC1) + 0.15175 (KC2)
          (v)   INTMED = 1.0 if TIME ∈ (0,2], 3.5 for (2,5], 7.5 for (5,10), 13 for [10,∞)

To remedy the low tolerance problems evident with the sets of Table XIII, a new pair of "components" for MINSYS and MINDIA were formed, and MAXSYS was replaced by its residual, RMXSYS, after regression on these components (also called PC1, PC2). Two interesting solutions from near the end of a stepwise procedure using these new variables and INTERV, are presented in Table XV.
Solution (B) has a particularly good set of tolerances and uses only six variables to produce a value of R² that is almost 94% of that obtained using all 21 variables; its major drawback is perhaps the absence of the good predictor, RVESO.

Finally, Table XVI presents the last two linear discriminant functions produced by a stepwise program using the PC1, PC2, RMXDIA re-expressions of Table XIII. These results were the ones expected in view of the functions obtained with the original data. It is evident from the "percentage correct" column of each solution that a downward adjustment of the constant, K, (from 0) would improve the percentage of non-survivors correctly identified, but such modifications were not pursued further at this stage.

5.4.3 Principal Components Analysis

One drawback to the regression results presented so far is the tendency for only some of the variables, from a group of similar variables that share a strong correlation with the dependent variable, to appear in the regression equation. An example here is provided by the presence of RVESO as the lone retinal variable when predicting INTERV, although RRETO would also be an effective predictor. (The discussion of stepwise regression in Section 4.2.4 includes a comment on this problem.) The solution that first comes to mind involves the use of averages for each symptom group, to reduce the possibility of any single representative of the group performing poorly in a prediction equation. Since the first principal component of a group of strongly correlated variables is generally a type of weighted average, principal component re-expressions were a natural choice at this point.
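The remark that the first principal component of a group of strongly correlated variables acts as a weighted average can be illustrated with a small sketch. The correlation matrix below is hypothetical (it is not taken from the data of this study), and power iteration is used here only as a simple, dependency-free way to obtain the leading eigenvector; it is not the method of the BMDP program used in the analysis.

```python
import math


def first_component(matrix, iterations=200):
    """Leading eigenvector of a symmetric matrix, found by power iteration."""
    n = len(matrix)
    v = [1.0] + [0.0] * (n - 1)  # arbitrary starting vector
    for _ in range(iterations):
        # Multiply by the matrix, then rescale to unit length.
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical correlation matrix for three positively correlated symptoms.
corr = [[1.0, 0.7, 0.6],
        [0.7, 1.0, 0.5],
        [0.6, 0.5, 1.0]]

weights = first_component(corr)
print([round(w, 3) for w in weights])
```

With all correlations positive, every weight of the first component has the same sign, so the component score is a weighted sum of the standardized variables in the group.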
Table XV: Predicting INTERV with Partially Re-expressed Data

(A)
    VARIABLE     COEF*    ST.ERROR*   TOLER.   F-STAT.
    RMXSYS        -15         3        0.72     25.38
    PC1            -5         2        0.82      4.72
    PC2            13         7        0.97      3.18
    BACHO        -177        56        0.89      9.89
    KSPNO        -445       110        0.73     16.42
    RVESO        -122        77        0.54      2.52
    SEX           298        63        0.86     22.50
    INTERCEPT    5174

    R² = 0.6061
    * x 1000

(B)
    VARIABLE     COEF*    ST.ERROR*   TOLER.   F-STAT.
    RMXSYS        -17         3        0.90     40.95
    PC1            -6         2        0.89      7.35
    PC2            12         7        0.97      3.04
    BACHO        -196        55        0.94     12.48
    KSPNO        -521       100        0.90     27.24
    SEX           307        63        0.86     23.77
    INTERCEPT    5465

    R² = 0.5951

    NOTE: (i)   PC1 = 0.7613 (MINSYS) + 0.6484 (MINDIA)
          (ii)  PC2 = 0.4166 (MINSYS) - 0.9091 (MINDIA)
          (iii) RMXSYS = MAXSYS - 0.6429 (PC1) - 0.6994 (PC2)
                       = MAXSYS - 0.7808 (MINSYS) + 0.2190 (MINDIA)

Table XVI: Results of Stepwise Discriminant Analysis
MAXDIA, MINSYS, MINDIA Replaced by RMXDIA, PC1, PC2 as in Table XIII

(A) Classification function from Step 14 (out of a total of 15):

    0.06023 (MAXSYS) - 0.0196 (PC1) + 0.09831 (PC2) + 0.67427 (BACHO) + 1.99168 (KSPNO) - 0.91544 (SEX) - 11.98467

    CLASSIFICATION MATRICES (JACKKNIFED VALUES ARE IN PARENTHESES)

             PREDICTED   PREDICTED   PERCENT
             ALIVE       DEAD        CORRECT
    Alive    46 (43)      6 (9)      88.5 (82.7)
    Dead     10 (13)     36 (33)     78.3 (71.7)
                                     83.67 (77.55) overall

(B) Classification function from Step 15:

    0.04567 (MAXSYS) + 0.09018 (PC2) + 0.63021 (BACHO) + 1.81847 (KSPNO) - 0.78453 (SEX) - 12.8843

    CLASSIFICATION MATRICES (AS IN (A))

             PREDICTED   PREDICTED   PERCENT
             ALIVE       DEAD        CORRECT
    Alive    45 (43)      7 (9)      86.5 (82.7)
    Dead     12 (13)     34 (33)     73.9 (71.7)
                                     80.61 (77.55) overall

Table XVII gives an overview of the principal component structure of the 17 standardized symptom variables; the reader should note that the computer program used here presented only those eigenvectors whose corresponding eigenvalues are greater than 0.2. The first column of the table shows that component 1 is an overall average that assigns a low weight to BCVAO.
While the nature of this component was expected, its low percentage of the total variance of the group - only 38.6% - was somewhat surprising. Moreover, eleven components are required to obtain over 90% of the total variance of the original 17 variables. Only components 1, 13, and 8 have a correlation with the target, INTERV, whose absolute value exceeds 0.1; they are -0.6897, -0.1733, and 0.1036 respectively. The regression equation for INTERV using these three components is:

    -0.2281 (PC1) + 0.1160 (PC8) - 0.2695 (PC13) + 2.2755

with a total R² of 0.5165. Table XVIII converts this equation into one involving only the original variables. Consideration of these results made it apparent that, in addition to the problems of interpretability mentioned in Chapter 4, this solution also suffers by comparison with R² values obtained earlier with the same target; if the SEX variable was uncorrelated with all three of the components in the above equation, the total R² would change to 0.5627 with the inclusion of this fourth variable (the actual value would, of course, be lower); this is to be compared with the R² of 0.6061 of Table XV (A), for example. Whether the approximate 5% decrease in variance accounted for by regression is offset by the increased "stability" of the resulting equation is largely a matter of opinion.
In any event, another technique was considered here, as well; its description follows.

Table XVII: Eigenvectors of 17 Standardized Time Zero Variables (x 10^4)

                              Eigenvectors
                      1       2       3       4       5       6       7
     1 MAXSYS      2945   -2559   -1840    -592     -98    2725    1315
     2 MAXDIA      2953   -2078    -798   -1498    -985    2827   -3345
     3 MINSYS      2622   -3322     443    1485    2890   -3268     555
     4 MINDIA      2417   -2801    1183    1970    3652   -3736   -1123
     5 SYS         3071   -2675   -1207   -1385    -579    -177    1059
     6 DIA         3010   -2746   -1603   -1913    -658    1053     403
     7 HSYM        1959    1974   -4423    -369   -2157   -3307    3817
     8 HSIZ        1930    3404   -3621     -93    -754   -1547    -495
     9 HECG        1894    1051   -2083    4837   -3506   -1499   -2915
    10 BACH        1538    3661    -995   -2590    4510   -2149   -2931
    11 BCVA         666    1432   -1482    6275    4376    4026    2805
    12 KSYM        2210    3464    -903   -2246    2592    2026    -844
    13 KPRO        2610    1242    2570    1664    -755     694   -4538
    14 KSPN        2128    1402    3580    2023   -3176   -1490      30
    15 RVES        2954    1324    1266     718   -1475    2795    1916
    16 RRET        2780    1928    3357   -1641      64    1754    1122
    17 RPAP        2179    1591    4128   -1001      93   -2171    4268
    % TOTAL
    VARIANCE     (38.6)  (50.1)  (58.9)  (66.1)  (72.0)  (76.5)  (80.5)

                              Eigenvectors
                      8       9      10      11      12      13      14
     1 MAXSYS      2217     934    1326    -317   -3228   -3714     275
     2 MAXDIA      -167    -741   -2495     909     805    2941    4994
     3 MINSYS       216    2097    -684    -435   -1223   -3323    1007
     4 MINDIA     -1984     839   -1809   -2053    2698    2403    -958
     5 SYS          698   -1195    4062     739    -389    -649   -3323
     6 DIA        -1134   -2080    -162    1779    -384    3404    -619
     7 HSYM         310   -1975   -1475    1634    5067   -2058    1360
     8 HSIZ       -1373    6937    -431    1479   -2983    2004     440
     9 HECG        2430   -3116   -1110   -4249   -2661      68    -385
    10 BACH        2419   -2196    5082    -594     -19     944    1901
    11 BCVA         446   -1212     656    2336       5     940    1524
    12 KSYM       -4069   -2303   -3963   -1421   -1436   -1906   -4106
    13 KPRO        2800    1026    -751    5419    2593   -2400   -2884
    14 KSPN       -5965   -1376    3535    1473   -1388    -483    1876
    15 RVES        1297    2794    2342   -3991    3647    2933   -2774
    16 RRET          91    1061   -1059   -3025    1226   -3247    4103
    17 RPAP        3701   -1585   -2554    1897   -3561    3123    -512
    % TOTAL
    VARIANCE     (83.9)  (86.8)  (89.4)  (91.9)  (93.9)  (95.7)  (97.2)

Table XVIII: Coefficients of Variables Obtained from Regression on Principal Components

    Index of components entering        1            13           8
    Residual sum of squares          36.46919     34.37985     33.63271
    F-values: regression model        87.11        48.61        33.47
    F-values: component to enter      87.11         5.77         2.09
    R²                                0.4757       0.5058       0.5165
    CONSTANT                          5.2357       5.4294       5.4287

    VARIABLES
     1 MAXSYS                        -0.0023       0.0011       0.0020
     2 MAXDIA                        -0.0041      -0.0090      -0.0091
     3 MINSYS                        -0.0023       0.0011       0.0012
     4 MINDIA                        -0.0039      -0.0085      -0.0101
     5 SYS                           -0.0881      -0.0661      -0.0763
     6 DIA                           -0.0883      -0.2062      -0.2231
     7 HSYM                          -0.05316      0.01287      0.01715
     8 HSIZ                          -0.05073     -0.11297     -0.13133
     9 HECG                          -0.05077     -0.05291     -0.01978
    10 BACH                          -0.03192     -0.05509     -0.02955
    11 BCVA                          -0.01630     -0.04348     -0.04903
    12 KSYM                          -0.08420      0.00162     -0.07721
    13 KPRO                          -0.06758      0.00585      0.04273
    14 KSPN                          -0.07796     -0.05706     -0.16821
    15 RVES                          -0.06556     -0.14247     -0.12783
    16 RRET                          -0.05903      0.02312      0.02410
    17 RPAP                          -0.07022     -0.18916     -0.12849

    NOTE: See Table XVII for definition of components.

Table XIX describes the results of carrying out a principal component analysis on each of six relatively homogeneous groups of symptoms: extreme blood pressures, average blood pressures, heart, brain, kidney, and retinal symptoms. The variables were not standardized in this case. Table XX gives the correlation matrix for 12 of the components so obtained. Correlations between the first component of each group and the blood pressure components, EBP1 and BP1, are of particular interest, as is the very high correlation between R1 and K1 (0.699).

Turning now to the relationship of these components with the dependent variable, INTERV, Table XX shows that the six strongest correlations are with, in descending order: R1, BP1, K1, EBP1, H1, B1. "All subsets regression" was carried out on the variables of Table XX, and the "best" set retained the following: EBP3, BP1, H1, R1 with respective tolerances: 0.96, 0.69, 0.79, 0.70; the two-tailed P-value for the coefficient of H1 was 0.125. Since the value of R² for this set was a disappointing 0.4786, no further detail is warranted here.
Improvements in techniques involving within-groups principal components were eventually realized, and will be described in a later section.

5.4.4 Additional Analyses

The foregoing account of preliminary regression and discriminant analysis results using time zero data is by no means complete. It is not possible to give here more than the following short description of a few of the most interesting explorations that were also carried out at this stage:

Table XIX: Principal Components within Groups of Symptoms (Coefficients x 10^4)

(A)           MAXSYS   MAXDIA   MINSYS   MINDIA   % VAR.   CORR.
    EBP1       7025     3166     5849     2533     72.9     -512
    EBP2       6020     2061    -6660    -3894     17.6      -88
    EBP3      -3342     8773    -1961     2831      6.5     -277
    EBP4       1800    -2959    -4194     8391      3.0     -201

(B)           SYSO     DIAO     % VAR.   CORR.
    BP1        7167     6974     90.4     -543
    BP2       -6974     7167      9.6      -52

(C)           HSYMO    HSIZO    HECGO    % VAR.   CORR.
    H1         5969     6116     5193     62.1     -404
    H2         2399     4816    -8429     22.7       38
    H3        -7656     6277     1407     15.2       -4

(D)           BACHO    BCVAO    % VAR.   CORR.
    B1         9691     2465     59.2     -317
    B2        -2465     9691     40.8        7

(E)           KSYMO    KPROO    KSPNO    % VAR.   CORR.
    K1         3102     8448     4361     64.3     -514
    K2        -9220     3791     -785     20.4      232
    K3        -2317    -3777     8965     15.3     -171

(F)           RVESO    RRETO    RPAPO    % VAR.   CORR.
    R1         6315     6938     3463     75.9     -603
    R2        -7599     4650     4541     14.7       12
    R3         1540    -5449     8209      9.4      -85

    NOTE: (i)  "% VAR." is the percentage of total variance accounted for by the component
          (ii) "CORR." is the correlation of the component with the target, INTERV (x 10^3).
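Since the rows of each panel of Table XIX are principal component coefficient vectors, they should be (up to rounding) mutually orthogonal and of unit length. The sketch below checks this for the extreme blood pressure group, panel (A). The dictionary layout and helper name are illustrative; the coefficients are those tabulated above, in the order MAXSYS, MAXDIA, MINSYS, MINDIA.

```python
from itertools import combinations

# Coefficients (x 10^4) for the extreme blood pressure group, Table XIX (A).
EBP = {
    "EBP1": [7025, 3166, 5849, 2533],
    "EBP2": [6020, 2061, -6660, -3894],
    "EBP3": [-3342, 8773, -1961, 2831],
    "EBP4": [1800, -2959, -4194, 8391],
}


def dot(u, v):
    """Inner product after rescaling the x 10^4 coefficients to unit scale."""
    return sum(a * b for a, b in zip(u, v)) / 1e8


# Squared lengths should be close to 1, cross products close to 0.
norms = {name: dot(vec, vec) for name, vec in EBP.items()}
cross = {(a, b): dot(EBP[a], EBP[b]) for a, b in combinations(EBP, 2)}

for name, value in norms.items():
    print(name, round(value, 4))
for pair, value in cross.items():
    print(pair, round(value, 4))
```

A component score for a case is then the inner product of its four extreme blood pressure readings with the corresponding row of coefficients (the variables were not standardized for this table).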
Table XX: Correlations Among Selected Components of Table XIX (x 10^4)

                    EBP1
                      1
    EBP1     1    10000
    EBP2     2       -0
    EBP3     3        0
    BP1      4     8149
    H1       5     3716
    H2       6    -1065
    B1       7     1750
    B2       8      885
    K1       9     4539
    K2      10     -266
    R1      11     5244
    R2      12    -1140
    INTERV  13    -5119

                    EBP2    EBP3     BP1
                      2       3       4
    EBP2     2    10000
    EBP3     3       -1   10000
    BP1      4     1336    1691   10000
    H1       5     1464     672    3974
    H2       6      717    -669    -146
    B1       7      -36     571    1751
    B2       8     -566   -1530    -561
    K1       9     -229    2940    4310
    K2      10    -1458    -832   -1247
    R1      11      885    1591    5018
    R2      12    -1860     149   -1368
    INTERV  13     -882   -2773   -5435

                     H1      H2      B1
                      5       6       7
    H1       5    10000
    H2       6      -38   10000
    B1       7     3830    1591   10000
    B2       8      980   -1901      23
    K1       9     4316   -1447    3565
    K2      10    -2453   -2975   -3590
    R1      11     4004     -48    3528
    R2      12    -2224    1151     647
    INTERV  13    -4044     377   -3169

                     B2      K1      K2
                      8       9      10
    B2       8    10000
    K1       9      615   10000
    K2      10      230      25   10000
    R1      11       74    6990   -2023
    R2      12    -2333     451    -945
    INTERV  13       72   -5140    2318

                     R1      R2   INTERV
                     11      12      13
    R1      11    10000
    R2      12       70   10000
    INTERV  13    -6027     123   10000

    EIGENVALUES AND CUMULATIVE PROPORTION OF TOTAL VARIANCE OF INDEPENDENT VARIABLES

                        1         2         3         4         5         6
    Eigenvalue       3.33442   1.65976   1.36169   1.14669   1.03624   0.77520
    Cum. proportion  0.27787   0.41619   0.52966   0.62522   0.71158   0.77618

                        7         8         9        10        11        12
    Eigenvalue       0.71014   0.64215   0.49906   0.47052   0.22168   0.14255
    Cum. proportion  0.83536   0.88887   0.93046   0.96967   0.98815   1.00000

- all subsets regression using INTERV was done on two random subsets of the 98 controls, one of size 44, the other of size 63; the results were supportive of those given in Table VIII (A);
- various regressions were done with one or more of the extreme blood pressure variables left out, as part of the study of the interrelatedness of this set of variables; one of the findings was that MINSYS and MINDIA seem to work best as a "team", insofar as MINDIA was only marginally significant, at best, in the absence of MINSYS;
- a stepwise discriminant analysis was done on the INTERV variable; the greatest percentage of correct classifications was for the survivors (coded 3) followed by the short-term group (coded 1), and then the middle group (coded 2); this last group was very poorly classified;
- some of the analyses described in Sections 5.4.1 and 5.4.2 were done with the SEX variable left out; the results form part of
the motivation for the next series of analyses.

5.5 Predictions in Male-Female Strata

The presence of the SEX variable in all the sets chosen by stepwise and "all subsets" methods (using the combined male and female data), along with the changes observed when this variable was left out of certain analyses, suggested that a better understanding of these data might be obtained by separating the cases into sex groups. Further support for this division came from a regression analysis in which the coefficients of a "reasonable" set of predictors - MAXSYS, MINSYS, MINDIA, BACHO, and KSPNO - for the target, INTERV, were compared in the two groups. The analysis of variance of regression coefficients over groups was significant (the P-value for the F-statistic with 6 and 86 degrees of freedom was 0.00057), indicating that the two sets of coefficients cannot reasonably be assumed identical; examination of the coefficients themselves showed that those for MINSYS and MINDIA were only about one third as large in the male subsample as in the female group.

Thus, many of the more enlightening analyses of Section 5.4 were repeated with the sample stratified by sex. Since the problems encountered here - especially with the female group - were similar to those seen with the combined sample, this section will closely parallel the preceding one.

5.5.1 Analyses Using Original and Partially Re-expressed Data

Tables XXI to XXIV summarize the major regression and discriminant analysis results, using either the original variables or "partial" principal components and residuals (as in Section 5.4.2) together with a variety of dependent variables.
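The comparison of regression coefficients between the two sex groups reported in Section 5.5 has 6 and 86 degrees of freedom, which is what the usual F-test for equality of all coefficients in two groups gives for five predictors plus an intercept and 98 cases. The sketch below shows that standard form of the test; whether the program used for the thesis computed it in exactly this way is an assumption, and the residual sums of squares in the example are made up purely to exercise the function.

```python
def coefficient_equality_f(rss_pooled, rss_group1, rss_group2, n_total, n_params):
    """F statistic for equality of all regression coefficients in two groups.

    rss_pooled  : residual sum of squares from one regression on all cases
    rss_group1/2: residual sums of squares from the separate group regressions
    n_params    : number of coefficients per regression, intercept included

    The statistic has n_params and n_total - 2 * n_params degrees of freedom.
    """
    rss_separate = rss_group1 + rss_group2
    df_num = n_params
    df_den = n_total - 2 * n_params
    f_stat = ((rss_pooled - rss_separate) / df_num) / (rss_separate / df_den)
    return f_stat, df_num, df_den


# Five predictors plus an intercept, 98 cases in all.
# The residual sums of squares are hypothetical values, not thesis results.
f_stat, df_num, df_den = coefficient_equality_f(40.0, 14.0, 16.0, 98, 6)
print(df_num, df_den, round(f_stat, 2))
```

A large value of the statistic relative to the F distribution with these degrees of freedom indicates that a single set of coefficients does not fit both groups.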
A few comments are i n order here: while i n the male subsample, at most four v a r i a b l e s are selected (MAXSYS, BACHO, KSPNO, RPAPO), the sets for the female group contain f i v e to seven predictors ( e s s e n t i a l l y MAXDIA, MINSYS, MINDIA, HSYMO, KSPNO, RRETO); 2 values of R are s i m i l a r for the two groups when the target i s INTERM (around 65%), but are considerably larger i n the female group when INTERV i s predicted with the o r i g i n a l v a r i a b l e s (68% versus 60%); i t should be noted that Table XXIII (E), besides in v o l v i n g p a r t i a l l y Table XXI: Regression Results, 3»(°g) x 1000, for the Male Subsample (n=48) VARIABLES & SUMMARIES ASR 'BEST' (A) INTERV ASR #2 ASR #3 ASR 'BEST' (B) INTERM ASR #2 ASR #3 AGE HIST DUR MAXSYS MAXDIA MINSYS MINDIA -16 (3) -15 (3) -16 (3) -17 (4) -19 (4) -19 (4) SYSO DIAO HSYMO HSIZO HECGO -BACHO BCVAO -188 (85) -256 (111) -311 (114) KSYMO KPROO KSPNO -602 (142) -573 (137) -296 (168) -545 (142) -526 (199) -464 (214) -464 (202) -749 (183) RVESO RRETO RPAPO -377 (165) -398 (164) INTERCEPT 6096 5867 6053 7364 7581 7793 R 2(%) ADJ'D R 2(%) C P 2 % of Total R z MSE 55.9 53.9 0.91 83.4 0.4108 60.3 57.6 1.03 90.0 0.3783 58.8 56.0 2.32 87.8 0.3922 65.2 61.9 7.08 87.3 0.6152 64.7 61.4 7.55 86.7 0.6229 60.9 58.2 7.62 81.6 0.6747 MIN. TOLER. VARIABLE 0.937 BOTH 0.880 MAXSYS NOT AVAILABLE 0.644 RPAPO 0.656 RPAPO 0.880 MAXSYS Table XXI, Continued: (Male Subsample) VARIABLES & SUMMARIES (C) I ASR 'BEST' NTMED ASR #2 (D) TIME (SUB ASR 'BEST' IV. WT'D 1/3) ASR #2 (E) INTER1 OMITTl SWR #3 / (SEV. GRP. 
3 SD, n=38) SWR #4 AGE HIST DUR MAXSYS MAXDIA MINSYS MINDIA -79 ( 18) -74 (18) -70 ( 14) -72 ( 14) -12 (4) -14 (4) SYSO DIAO HSYMO HSIZO HECGO -887 (437) BACHO BCVAO -1217 (467) -1059 (471) -619 (313) -244 (108) KSYMO KPROO KSPNO -3135 (750) -2499 (844) -1174 (536) -1433 (535) -645 (216) -618 (228) RVESO RRETO RPAPO -1080 (697) -945 (408) -798 (420) INTERCEPT 28,758 27,530 23,555 24,139 5,358 5,748 R 2(%) ADJ'D R 2(%) c p % of Total R 2 MSE 61.4 58.7 -0.97 86.4 11.3972 63.4 60.0 0.13 89.2 11.0471 64.5 61.2 7.85 83.7 5.4635 64.4 61.0 8.04 83.5 5.4890 50.1 45.7 NA 0.4143 42.5 39.2 NA 0.4635 MIN. TOLER. VARIABLE 0.880 MAXSYS 0.644 RPAPO 0.706 RPAPO 0.644 RPAPO 0.870 MAXSYS 0.962 BOTH 138 Table XXII: Discriminant Analysis Results (Male Subsample) (A) Using the l i n e a r rule that classes a case as a 10-year survivor i f and only i f : 2.30807 (DIAO) + 0.78958 (BACHO) + 1.66546 (KSPNO) -8.81088 £-0.16034 = l n ( q A / q D ) ( i ) CLASSIFICATION MATRICES: 'TRAINING SET" (n=40), JACKKNIFED A l i v e Dead PREDICTED ALIVE 16 4 PREDICTED DEAD 2 18 PERCENT CORRECT 88.9 81.8 85.0% o v e r a l l ( i i ) A l i v e Dead "OMITTED SET" (n=8) PREDICTED ALIVE PREDICTED DEAD 4 3 0 1 PERCENT CORRECT 100.0 25.0 62.5% o v e r a l l I 139 Table XXII, Continued (Male Subsample) (B) Using the entire subsample (n=48) ( i ) CLASSIFICATION FUNCTION FROM STEP #2: 0.05714 (MAXSYS) + 2.04472 (KSPNO) - 13.92056 j< -0.16034 (No C l a s s i f i c a t i o n Matrix Available) ( i i ) CLASSIFICATION FUNCTION FROM THE FINAL STEP (#5) 0.05506 (MAXSYS) + 0.82685 (BACHO) + 2.11678 (KSPNO) - 14.28473 < -0.16034 CLASSIFICATION MATRIX (Jackknifed Values i n Parentheses) PREDICTED PREDICTED PERCENT ALIVE DEAD CORRECT A l i v e 19 (19) 3 (3) 86.4 (86.4) Dead 3 (3) 23 (23) 88.5 (88.5) 87.5 87.5 o v e r a l l Table XXIII: Regression Results, §»(a") x 1000, for the Female Subsample (n=50) VARIABLES & SUMMARIES ASR 'BEST' (A) INTERV ASR #2 ASR #3 ASR 'BEST' (B) INTERM ASR #2 ASR #3 AGE HIST DUR MAXSYS MAXDIA 
MINSYS MINDIA -21 (5) 18 (4) -22 (7) -21 (5) 18 (5) -22 (8) -6 (4) -15 (6) 22 (5) -24 (7) -28 (7) 23 (6) -27 (11) -29 (7) 20 (6) -23 (10) -28 (7) 23 (6) -26 (10) SYSO DIAO HSYMO HSIZO HECGO -300 ( 95) -343 ( 96) -297 ( 93) -329 (132) -274 (130) BACHO BCVAO KSYMO KPROO KSPNO -276 (128) -300 (126) -436 (179) -359 (176) RVESO RRETO RPAPO -219 ( 72) -248 ( 73) -192 ( 72) -356 (101) -338 (102) -319 ( 99) INTERCEPT 5219 5265 5591 6865 6928 6804 R 2(%) ADJ'D R 2(%) C P 2 % of Total R z MSE 68.3 63.8 13.90 92.3 0.2227 64.8 60.8 14.04 87.6 0.2414 70.3 65.3 15.53 95.0 0.2137 61.6 57.3 13.33 84.3 0.4540 61.4 57.1 13.55 84.0 0.4564 65.0 60.2 13.68 88.9 0.4235 MIN. TOLER. VARIABLE 0.502 MINSYS 0.502 MINSYS 0.377 MAXSYS 0.502 MINSYS 0.528 MINSYS 0.502 MINSYS Table XXIII, Continued (Female Subsample) VARIABLES & SUMMARIES (C) ASR 'BEST' INTMED ASR #2 (D) TIME (S ASR "BEST' URV. WT'D 1/3) ASR #2 (E) II SWR #6 \fTERM SWR #7 MAXSYS RMXDIA PCI PC2 -132 ( 29) -175 ( 46) -125 ( 30) -173 ( 48) -79 ( 21) -119 ( 33) -77 ( 21) -115 ( 34) -30 (7) -31 ( 11) -30 (7) -27 ( 11) SYSO DIAO -1269 (455) -1390 (465) HSYMO HSIZO HECGO -1521 (534) -1703 (551) -1060 (359) -1058 (370) -166 (143) BACHO BCVAO -687 (263) -664 (271) KC1 KC2 RKSPNO -1697 (762) -942 (489) 346 (220) -416 (176) 477 (190) -452 (174) RVESO RRETO RPAPO -1289 (393) -1318 (409) -752 (256) -726 (263) -317 ( 092) -323 ( 92) INTERCEPT 27,670 26,911 23,535 23,347 7,060 6,905 R 2(%) ADJ'D R 2(%) c p % of Total R 2 MSE 66.2 62.4 2.89 91.0 7.514 62.4 59.1 3.94 85.8 8.175 77.6 73.8 9.00 94.7 3.2431 75.6 72.2 9.18 92.3 3.4479 67.3 62.7 92.1 0.3964 66.2 62.4 90.6 0.3996 MIN. TOLER. 
VARIABLE 0.800 (RRETO) NOT AVAILABLE NOT AVAILABLE 0.632 (RRETO) NOT AVAILABLE NOT AVAILABLE 0.654 (HSYMO) 0.776 (RRETO)

NOTE:
    PC1    = 0.40393 (MINDIA) + 0.91479 (MINSYS)
    PC2    = -0.40393 (MINSYS) + 0.91479 (MINDIA)
    KC1    = 0.46118 (KSYMO) + 0.88731 (KPROO)
    KC2    = -0.88731 (KSYMO) + 0.46118 (KPROO)
    RMXDIA = MAXDIA - 0.31867 (PC1) - 0.07292 (PC2)
    RKSPNO = KSPNO - 0.21775 (KC1) + 0.15175 (KC2)

Table XXIV: Discriminant Analysis Results (Female Subsample)

Using the linear rule that classes a case as a 10-year survivor if and only if:

    0.15350 (MAXDIA) - 0.12147 (MINSYS) + 0.16997 (MINDIA) + 2.74908 (HSYMO) - 22.68004 < 0.40547 = ln(qA/qD)

CLASSIFICATION MATRICES:

(i) "TRAINING SET" (n=42), JACKKNIFED

                       Alive     Dead
    PREDICTED ALIVE      18        2
    PREDICTED DEAD        6       16
    PERCENT CORRECT     75.0     88.9      81.0 overall

(ii) "OMITTED SET" (n=8)

                       Alive     Dead
    PREDICTED ALIVE       5        0
    PREDICTED DEAD
    PERCENT CORRECT     83.3    100.0      87.5 overall

Table XXIV, Continued (Female Subsample)

Using the entire subsample (n=50) with RMXDIA, PC1, PC2 as in Table XXIII

(i) CLASSIFICATION FUNCTION FROM STEP #4:

    0.13764 (RMXDIA) + 0.20607 (PC2) + 2.33107 (HSYMO) + 2.01577 (KSPNO) - 20.49559 < 0.40547

    CLASSIFICATION MATRIX (Jackknifed Results)

               PREDICTED    PREDICTED    PERCENT
               ALIVE        DEAD         CORRECT
    Alive      28 (27)       2 (3)       93.3 (90.0)
    Dead        3 (3)       17 (17)      85.0 (85.0)
                                         90.0 (88.0)

(ii) CLASSIFICATION FUNCTION FROM STEP (#5):

    0.08812 (AGE) + 0.15185 (RMXDIA) + 0.23188 (PC2) + 2.17753 (HSYMO) + 1.98465 (KSPNO) - 25.80594 < 0.40547

    CLASSIFICATION MATRIX (Jackknifed Results)

               PREDICTED    PREDICTED    PERCENT
               ALIVE        DEAD         CORRECT
    Alive      27 (26)       3 (4)       90.0 (86.7)
    Dead        1 (3)       19 (17)      95.0 (85.0)
                                         92.0 (86.0) overall

re-expressed predictors, also has a different target than the one in Table XXI (E); the results of using INTERV with the five female members of severity group 3 omitted were very similar to those given in Table XXIII (A) for the complete female sample;
considering the less biased jackknifed results, it would seem that the overall percentages of correct classifications in the two subsamples are very similar although, here again, fewer variables were retained for the male group. Appendices E and F give the correlation matrix of the time zero predictors for each subsample.

5.5.2 Using Within-groups Principal Components

As in the combined-sample analysis of Section 5.4.3, the results of using principal components calculated from the 17 ungrouped symptom variables were rather disappointing for both the male and female subsample. More interesting and interpretable solutions were obtained as follows: the four extreme blood pressure variables were first "standardized" so that each had a mean of three and a standard deviation of one - making the units comparable to those of SYSO and DIAO; principal components were then calculated within each of the five groups: blood pressure, heart, brain, kidney, and retinal symptoms (see Tables XXV and XXVI); the correlation matrices of these two sets of within-groups components were examined, and strong correlations among first components were noted (see Appendices G and H); the relationship of these components with the target INTERM was explored: in both samples, the strongest correlations were with R1, BP1, and K1, in descending order;

Table XXV: Principal Components Within Groups of Symptoms (Male Subsample) (Coefficients x 10^4)

(A)     SMXSYS  SMXDIA  SMNSYS  SMNDIA    SYSO    DIAO   % VAR.   CORR.
BP1       4440    4211    4436    4181    3755    3361     73.3    -627
BP2       3071    4761   -4831   -5996     918    2788     12.3     -49
BP3      -5735    6563   -1891    3583   -2725     437      6.3      84
BP4       5345    2537     643     710   -5573   -5746      3.7      55
BP5       2655   -2128   -7278    5682    1745    -266      2.7    -294
BP6       1513   -2358     166     968   -6598    6903      1.7    -129

    SMXSYS = (MAXSYS - 143.83)/29.558;  SMXDIA = (MAXDIA - 100.45)/16.136
    SMNSYS = (MINSYS - 81.824)/30.100;  SMNDIA = (MINDIA - 58.004)/15.929

(B)      HSYMO   HSIZO   HECGO   % VAR.   CORR.
H1        6569    5898    4697     64.4    -361
H2       -4400   -2059    8741     22.8     -11
H3        6122   -7808    1243     12.8     104

(C)      BACHO   BCVAO   % VAR.   CORR.
B1        9354    3537     63.8    -428
B2       -3537    9354     36.2     151

(D)      KSYMO   KPROO   KSPNO   % VAR.   CORR.
K1        1616    8309    5325     69.1    -585
K2       -9457    2845   -1570     18.8     256
K3       -2819   -4782    8318     12.1     -66

(E)      RVESO   RRETO   RPAPO   % VAR.   CORR.
R1        6609    6113    4353     77.8    -709
R2       -7043    3050    6410     14.6     -20
R3        2591   -7303    6321      7.6    -101

NOTE: "CORR" is the correlation of the component with the target, INTERM (cf Table XIX), x 10^3.

Table XXVI: Principal Components Within Groups of Symptoms (Female Subsample) (Coefficients x 10^4)

(A)     SMXSYS  SMXDIA  SMNSYS  SMNDIA    SYSO    DIAO   % VAR.   CORR.
BP1       4714    4485    4337    3626    3409    3753     63.1    -485
BP2       3121    3837   -4517   -6971    1633    1967     17.5    -290
BP3       4709   -4665    5537   -4668    -409   -1857      8.4     337
BP4        164   -6128   -2790    1010    6408    3544      5.4      25
BP5       6710   -1232   -4710    3929   -2178   -3330      3.7    -153
BP6        901   -2082    -393     127   -6304    7412      2.0      86

    SMXSYS = (MAXSYS - 158.86)/28.420;  SMXDIA = (MAXDIA - 98.630)/16.696
    SMNSYS = (MINSYS - 104.33)/21.285;  SMNDIA = (MINDIA - 66.290)/12.177

(B)      HSYMO   HSIZO   HECGO   % VAR.   CORR.
H1        5277    6083    5929     61.9    -390
H2       -1033    7377   -6676     23.2     -20
H3       -8435    2929    4503     15.0      59

(C)      BACHO   BCVAO   % VAR.   CORR.
B1        9956    -936     54.8    -308
B2         936    9956     45.2     -58

(D)      KSYMO   KPROO   KSPNO   % VAR.   CORR.
K1        4556    8370    3031     63.5    -467
K2       -4798    5177   -7084     20.9     380
K3       -7499    1773    6374     15.7     198

(E)      RVESO   RRETO   RPAPO   % VAR.   CORR.
R1        5699    7906    2240     77.8    -624
R2       -8206    5334    2051     15.6      32
R3         427   -3007    9528      6.7     -38

NOTE: "CORR" is the correlation of the component with the target, INTERM, x 10^3.

"all subsets" regression was carried out using the original components, along with AGE, HIST, and DUR: these results appear in Tables XXVII and XXVIII; due to the presence of low tolerances in the selected sets of original components, the 11 non-blood pressure symptoms were replaced by their residuals after (multiple) regression on BP1, BP2, BP3, BP4, BP5, and BP6 (this was denoted by preceding the original component name with an R); all subsets regression was repeated with the transformed set of components; the process continued with the conversion of RH, RB, and RK variables to their residuals (denoted SH1, SH2, etc.) after regression on RR1, RR2, and RR3; then the SH and SB variables became TH1, TH2, TH3, TB1, TB2 - their residuals after regression on SK1, SK2, SK3; finally, TB1 and TB2 were replaced by UB1 and UB2, their residuals after regression on the TH variables.

The success of this orthogonalization procedure may be judged from the final correlation matrices, given in Appendices I and J: although a few weak "within-group" correlations have crept in as a result of using residuals, all "among-group" correlations have vanished. The residuals also appear to be more weakly correlated with INTERM than were the original components. One unfortunate drawback to this process is the extremely complicated nature of the "higher order" residuals: while the coefficients for their calculation are available, they are not presented here. However, the order in which residuals were taken was designed to make it more
likely that only "lower order" residuals would be retained in the final sets of predictors.

Table XXVII: All Subsets Regression Results for Males: INTERM on the Components of Table XXV (along with AGE, HIST, DUR)

(A) "BEST" Set: R² = 63.4, ADJ'D R² = 60.0, Cp = 0.94, MSE = 0.6460

    VARIABLE     COEF*   ST.ERROR*   P-VALUE (%)   TOLERANCE
    BP1           -226       74          0.4         0.643
    BP5           -663      324          4.7         0.934
    BP6           -676      390          9.0         0.993
    R1            -385       96          0.0         0.613
    INTERCEPT     5423

(B) SECOND Set: R² = 63.3, ADJ'D R² = 59.9, Cp = 1.04, MSE = 0.6475

    VARIABLE     COEF*   ST.ERROR*   P-VALUE (%)
    BP1           -251       75          0.1
    BP5           -606      327          6.4
    K2             374      220          8.9
    R1            -338       97          0.1
    INTERCEPT     5298

    Two-tail P-VALUES: PROB(|t_v| > COEF/ST.ERROR), v = d.f.
    *: x 1000

Table XXVIII: All Subsets Regression Results for Females: INTERM on the Components of Table XXVI

(A) "BEST" Set: R² = 69.7, ADJ'D R² = 64.7, Cp = -1.40, MSE = 0.3752

    VARIABLE     COEF*   ST.ERROR*   P-VALUE (%)   TOLERANCE
    BP1           -178       57          0.3         0.715
    BP2           -263       96          0.9         0.919
    BP3            427      135          0.3         0.958
    BP5           -442      209          4.0         0.922
    B1            -177       88          5.1         0.858
    K2             594      169          0.1         0.909
    R1            -220       84          1.3         0.597
    INTERCEPT     5260

(B) SECOND Set: R² = 70.7, ADJ'D R² = 64.9, Cp = -0.39, MSE = 0.3727

    VARIABLE     COEF*   ST.ERROR*   P-VALUE (%)
    BP1           -170       57          0.3
    BP2           -265       96          0.6
    BP3            498      148          0.1
    BP5           -402      212          5.7
    B1            -209       92          2.3
    K2             562      171          0.1
    R1            -214       84          1.1
    AGE            -14       13         25.8
    INTERCEPT     5840

    NOTE: See Table XXVII footnotes
    *: x 1000

Table XXIX: All Subsets Regression Results for Males: INTERM on "Uncorrelated" Components

"BEST" Set: R² = 61.1, ADJ'D R² = 58.4, Cp = 1.41, MSE = 0.6713

    VARIABLE     COEF*   ST.ERROR*   P-VALUE (%)   TOLERANCE
    BP1           -405       61          0.0         1.000
    BP5           -998      319          0.3         1.000
    RR1           -388      101          0.0         1.000
    INTERCEPT     5055

    RR1 = R1 - 0.46415 (BP1) + 0.1448 (BP2) + 0.21198 (BP3) + 0.64916 (BP4) - 0.86854 (BP5) + 0.34733 (BP6)

SECOND Set: R² = 62.9, ADJ'D R² = 59.5, Cp = 1.48, MSE = 0.6549

    VARIABLE     COEF*   ST.ERROR*   P-VALUE (%)   TOLERANCE
    BP1           -405       60          0.0         1.000
    BP5           -998      315          0.2         1.000
    UB1           -209      144         14.7         1.000
    RR1           -388       99          0.0         1.000
    INTERCEPT     5015

    *: x 1000

Table XXX: Stepwise Discriminant Analysis Results for Males: Status by "Uncorrelated" Components

(A) Step 13 Result:

    CLASSIFICATION FUNCTION:* 0.90305 (BP1) + 1.82933 (BP5) + 0.57200 (RR1) - 5.61262

    CLASSIFICATION MATRIX (Jackknifed Results)

               PREDICTED    PREDICTED    PERCENT
               ALIVE        DEAD         CORRECT
    Alive      19 (18)       3 (4)       86.4 (81.8)
    Dead        7 (7)       19 (19)      73.1 (73.1)
                                         79.2 (77.1)

(B) Step 12 Result:

    CLASSIFICATION FUNCTION:* 0.95587 (BP1) + 1.93634 (BP5) + 0.95080 (SK1) + 0.60545 (RR1) - 5.38644

    CLASSIFICATION MATRIX (Jackknifed Results)

               PREDICTED    PREDICTED    PERCENT
               ALIVE        DEAD         CORRECT
    Alive      19 (19)       3 (3)       86.4 (86.4)
    Dead        7 (8)       19 (18)      73.1 (69.2)
                                         79.2 (77.1)

    *as in Table XXII

Tables XXIX and XXX present the regression and discriminant analysis results for the male group, using the final set of (almost) uncorrelated components. Both the R² values and the percentage of correct classifications are distinctly lower than in earlier solutions, although the tolerances are perfect (see Tables XXI (B) and XXII (B)); in addition, the "best" subset is quite simple to interpret.

These results are to be contrasted with those for the female subsample, presented in Tables XXXI and XXXII. Clearly, the set given in Table XXXI (A), involving only "first order" residuals, is already a very satisfactory solution; the nearly perfect tolerances of the set in part (B) make the more complex solutions of part (C) less interesting, and the high value of R² for the former set is also noteworthy. The discriminant analysis results (Table XXXII) also compare favourably with earlier solutions.

5.5.3 Additional Analyses

As was the case with the combined-sample analysis of Section 5.4, it is not possible to present here a complete account of all the analyses done with the male and female subsamples.
Only one additional set of results will be mentioned at this point: the results of applying logistic regression to STATUS and the set of variables selected by previous stepwise discriminant analyses (as presented in Section 5.5.1).

For the male group, logistic regression produced the same classification matrix as the one in Table XXII (B), using the same predictors, MAXSYS, BACHO, KSPNO. Using RMXDIA, PC2, HSYMO, and KSPNO in the female subsample resulted in a classification matrix that is identical to the non-jackknifed results of Table XXIV (B)(i); other sets of three or four variables resulted in a lower overall percentage of correct classifications.

Table XXXI: All Subsets Regression Results for Females: INTERM on "Partially" and "Fully" Orthogonalized Components

(A) H1-H3, B1, B2, K1-K3, and R1-R3 Replaced by their Residuals after Regression on BP1 to BP6.

"BEST" Set: R² = 69.1, ADJ'D R² = 63.9, Cp = -0.66, MSE = 0.3838

    VARIABLE     COEF*   ST.ERROR*   P-VALUE (%)   TOLERANCE
    BP1           -276       49          0.0         1.000
    BP2           -313       93          0.2         1.000
    BP3            525      134          0.0         1.000
    BP5           -360      203          8.3         1.000
    RB1           -185       91          5.0         0.934
    RK2            570      182          0.3         0.918
    RR1           -228       90          1.5         0.868
    INTERCEPT     5365

(B) As in (A), But RH1-RH3, RB1, RB2, RK1-RK3 Replaced by their Residuals after Regression on RR1-RR3.

"BEST" Set: R² = 69.0, ADJ'D R² = 63.8, Cp = -0.61, MSE = 0.3843

    VARIABLE     COEF*   ST.ERROR*   P-VALUE (%)   TOLERANCE
    BP1           -276       49          0.0         1.000
    BP2           -313       93          0.2         1.000
    BP3            525      134          0.0         1.000
    BP5           -360      203          8.3         1.000
    SB1           -180       97          6.9         0.999
    SK2            597      186          0.3         0.999
    RR1           -351       84          0.0         1.000
    INTERCEPT     5336

    *: x 1000

Table XXXI, Continued

(C) INTERM on "Uncorrelated" Components: BP1-BP6, TH1-TH3, UB1, UB2, SK1-SK3, RR1-RR3

(i) "BEST" Set: R² = 68.4, ADJ'D R² = 63.1, Cp = 0.06, MSE = 0.3921

    VARIABLE     COEF*   ST.ERROR*   P-VALUE (%)   TOLERANCE
    BP1           -276       49          0.0         1.000
    BP2           -313       94          0.2         1.000
    BP3            525      135          0.0         1.000
    BP5           -360      205          8.6         1.000
    UB1           -183      114         11.5         1.000
    SK2            609      188          0.2         1.000
    RR1           -351       85          0.0         1.000
    INTERCEPT     5403

(ii) SECOND Set: R² = 66.4, ADJ'D R² = 61.7, Cp = 0.16, MSE = 0.4105

    VARIABLE     COEF*   ST.ERROR*   P-VALUE (%)   TOLERANCE
    BP1           -276       50          0.0         1.000
    BP2           -313       96          0.2         1.000
    BP3            525      138          0.0         1.000
    BP5           -360      209          8.7         1.000
    SK2            609      191          0.2         1.000
    RR1           -351       86          0.0         1.000
    INTERCEPT     4959

    *: x 1000

Table XXXII: Stepwise Discriminant Analysis Results for Females: Status by "Uncorrelated" Components

Step 5 Result:

    CLASSIFICATION FUNCTION: 1.01662 (BP1) - 2.59939 (BP3) + 1.45157 (TH1) - 3.02569 (SK2) + 1.04659 - 8.99071

    CLASSIFICATION MATRIX (Jackknifed Results)

               PREDICTED    PREDICTED    PERCENT
               ALIVE        DEAD         CORRECT
    Alive      26 (25)       4 (5)       86.7 (83.3)
    Dead        2 (4)       18 (16)      90.0 (80.0)
                                         88.0 (82.0)

Step 6 Result:

    CLASSIFICATION FUNCTION: 1.13677 (BP1) + 1.12867 (BP2) - 2.90681 (BP3) + 1.62308 (TH1) - 3.38324 (SK2) + 1.17027 (RR1) - 9.79390

    CLASSIFICATION MATRIX (Jackknifed Results)

               PREDICTED    PREDICTED    PERCENT
               ALIVE        DEAD         CORRECT
    Alive      27 (27)       3 (3)       90.0 (90.0)
    Dead        2 (2)       18 (18)      90.0 (90.0)
                                         90.0 (90.0)

These results were obtained with a BMDP non-linear regression program called P-3R; currently available stepwise logistic regression programs would have facilitated a more thorough exploration of the present data through the logistic
model, although from the available results, it is doubtful that a major improvement over discriminant results would have been obtained. Such considerations led to the use of a smaller variety of analytical tools in the sequel.

5.6 Male-Female Comparisons

It is of interest to note that the most frequently seen sets of predictors for the male and female subsamples have only the variable, KSPNO, in common (see Tables XXI and XXIII, for example). These results prompted a somewhat more thorough analysis of the differences existing between the available male and female subsamples.

Table XXXIII describes the distribution of the time zero variables in the male and female subsamples separately, and then again after removal of cases belonging to severity group 3. In the part of the table that deals with the complete subsamples, it can be seen that the female means are slightly higher than the male ones - with the exception of MINSYS, MINDIA, HECGO, KSPNO, and, of course, RPAPO (the MAXDIA means are almost identical); when the cases with papilledema are omitted, only HIST and HECGO have higher means in the male group. The fact that the mortality rate is lower for the female group in either case is all the more significant when it is discovered that there is a strong tendency for the distributions in the male subsamples (both n = 48 and n = 38) to show a greater degree of skewness to the right than the corresponding female histograms; for example, consider SYSO in the right-hand part of the table: the positive skewness of the male distribution indicates that more than half of the sample lies below the mean; the opposite is true in the female sample, where the skewness is negative. Moreover, only for the variables, MINSYS, MINDIA, and RPAPO, is there any large difference in the spread of the distributions, a fact that facilitates such comparisons as those above.

Table XXXIII: Comparison of Male and Female Subsamples

                        COMPLETE SUBSAMPLES                       SEV. GROUP 3 OMITTED
               Male (n=48)          Female (n=50)         Male (n=38)          Female (n=45)
SUMMARIES      x̄      s     sk      x̄      s     sk      x̄      s     sk      x̄      s     sk
AGE          39.2    8.4  -0.68   39.9    8.4  -0.62   39.0    8.6  -0.62   40.6    7.8  -0.54
HIST          2.3    1.1  -0.05    2.3    0.9  -0.36    2.3    1.1  -0.08    2.2    1.0  -0.18
DUR           4.2    3.5   1.11    4.4    3.8   0.99    3.9    3.6   1.28    4.5    3.8   0.96
MAXSYS      232.5   29.6  -0.13  244.1   28.4  -0.41  226.7   29.2   0.15  241.4   28.2  -0.34
MAXDIA      148.9   16.1   0.23  148.7   16.7   0.05  146.4   16.0   0.36  146.9   16.1   0.06
MINSYS      172.1   30.1   0.49  168.2   21.3   0.10  165.3   28.5   1.02  167.2   21.9   0.19
MINDIA      105.6   15.9   0.23  102.8   12.2  -0.19  101.8   13.9   0.14  102.5   11.4  -0.61
SYSO          2.8    0.8   0.39    3.0    0.8  -0.34    2.6    0.8   0.75    3.0    0.7  -0.33
DIAO          3.1    0.8  -0.24    3.2    0.8  -0.64    3.0    0.8   0.00    3.2    0.8  -0.54
HSYMO         0.5    0.9   1.34    0.7    0.8   0.98    0.5    0.9   1.44    0.7    0.8   1.04
HSIZO         0.7    0.8   0.76    1.0    0.9   0.31    0.6    0.8   1.00    1.0    0.9   0.39
HECGO         1.4    0.8   0.08    1.2    0.9  -0.23    1.3    0.9   0.26    1.2    0.9  -0.18
BACHO         1.0    1.1   0.90    1.4    1.1   0.30    0.9    1.0   0.92    1.3    1.0   0.26
BCVAO         0.4    0.9   2.15    0.5    1.0   1.57    0.4    1.0   1.99    0.5    1.0   1.51
KSYMO         0.3    0.6   2.08    0.5    0.6   0.85    0.2    0.5   2.35    0.4    0.5   0.92
KPROO         0.5    0.9   1.80    0.7    0.8   1.04    0.3    0.5   1.45    0.6    0.7   0.81
KSPNO         0.5    0.7   1.14    0.4    0.6   1.12    0.3    0.5   1.66    0.3    0.6   1.40
RVESO         1.7    1.1   0.16    1.8    0.9  -0.02    1.3    0.9   0.07    1.7    0.9  -0.15
RRETO         0.6    1.0   1.54    0.7    1.1   1.49    0.2    0.5   2.11    0.5    0.9   1.90
RPAPO         0.4    0.9   2.50    0.1    0.5   4.25     -      -      -      -      -      -
MORT. RATE         54.17%               40.00%               42.11%               37.78%

NOTE: x̄: mean = Σx/n
      s: Standard Deviation = sqrt(Σ(x - x̄)²/(n - 1))
      sk: Skewness = Σ(x - x̄)³/(n s³)
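The three summaries defined in the note to Table XXXIII can be computed as in the following minimal sketch. The function names and the small data sets are hypothetical illustrations, not values from the study; the second helper simply counts the share of a sample lying below its mean, which is the quantity the skewness argument above appeals to.

```python
import math

def summarize(xs):
    """Return (mean, standard deviation, skewness) using the table's formulas."""
    n = len(xs)
    mean = sum(xs) / n
    # Standard deviation with the (n - 1) divisor, as in the table note.
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    # Skewness as the summed cubed deviations divided by n * s^3.
    sk = sum((x - mean) ** 3 for x in xs) / (n * s ** 3)
    return mean, s, sk

def fraction_below_mean(xs):
    """Share of observations strictly below the sample mean."""
    mean = sum(xs) / len(xs)
    return sum(1 for x in xs if x < mean) / len(xs)

# Hypothetical right-skewed sample: a long upper tail pulls the mean upward.
right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 9]
# Hypothetical symmetric sample for comparison.
symmetric = [1, 2, 3, 4, 5]

print(summarize(right_skewed), fraction_below_mean(right_skewed))
print(summarize(symmetric), fraction_below_mean(symmetric))
```

For this particular right-skewed sample, more than half of the values fall below the mean; that is typical of skewed unimodal data, though it is not guaranteed for every distribution.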
It is interesting to note that, in the original article by Evelyn et al. [14, pp. 214, 215], the analysis of differences in mortality rate rests almost entirely on the increased frequency of papilledema among the male patients; the RPAPO row of Table XXXIII suggests this argument is a valid one, but the findings when severity group 3 cases are omitted hint at more profound differences here.

To take the comparison a step further, stepwise discriminant analyses were carried out with the aim of discovering those variables that effectively distinguish male from female patients. The five best discriminators and their F-statistics are: MAXSYS (3.94), KSYMO (3.72), BACHO (3.31), HSIZO (3.18), RPAPO (2.70). The final set, after forward and backward stepping, is: MAXSYS, MAXDIA, HECGO, KSYMO, KPROO - with only MAXSYS and the kidney variables having positive signs in the classification function that assigns a patient to the female population for large values of the function.

The analysis was then repeated with the blood pressure variables re-expressed as: PPMAX = MAXSYS - MAXDIA, PPMIN = MINSYS - MINDIA, PPAVE = SYSO - DIAO, MAXDIA, MINDIA, DIAO, where the prefix, PP, stands for pulse pressure (these re-expressions are discussed in greater detail in the next section). With these variables, PPMAX is now the best single discriminator (F = 7.56) followed by PPAVE (3.97), and then KSYMO, BACHO, HSIZO, and RPAPO, as before; the final classification function contains PPMAX (+), KSYMO (+), and RPAPO (-), and the jackknifed results show an overall percentage of correct classifications of 69.4% (compared with 70.4% for the six-variable set above).
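The "jackknifed" percentages quoted throughout are leave-one-out results: each case is classified by a rule fitted to the remaining cases. The following is a minimal sketch of that idea for a two-group linear discriminant function with a pooled covariance matrix. It is not the BMDP stepwise program used for the analyses here; the function names, the equal-prior default, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def fit_lda(X, y):
    """Two-group linear discriminant: returns weights w and constant c."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-group covariance matrix, divisor n - 2.
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    w = np.linalg.solve(S, m1 - m0)
    c = -0.5 * w @ (m0 + m1)
    return w, c

def predict(w, c, X, log_prior_ratio=0.0):
    """Assign group 1 when w.x + c exceeds ln(q0/q1); 0.0 means equal priors."""
    return (X @ w + c > log_prior_ratio).astype(int)

def jackknife_accuracy(X, y):
    """Leave-one-out proportion of correct classifications."""
    hits = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i           # drop case i
        w, c = fit_lda(X[keep], y[keep])        # refit without it
        hits += int(predict(w, c, X[i:i + 1])[0] == y[i])
    return hits / len(X)

# Hypothetical, well-separated toy data (two predictors, five cases per group).
X0 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2]])
X = np.vstack([X0, X0 + 5.0])
y = np.array([0] * 5 + [1] * 5)
print(jackknife_accuracy(X, y))
```

Because each case is held out of the fit that classifies it, the jackknifed rate is usually at or below the resubstitution rate, which is why the text treats it as the less biased figure.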
Thus, female patients of the type represented here may be distinguished from the males fairly effectively by: a large six-month maximum systolic pressure relative to the maximum diastolic; by the presence of more severe kidney symptoms; and by the mildness of papilledema symptoms.

Returning now to the comparison of predictors of survival time in the two subsamples, it is noted that F-to-enter values for MAXSYS are considerably larger in the male sample than in the female group: 27.18 versus 11.30 for INTERM, and 21.7 versus 6.2 for STATUS are typical examples; and the opposite is true for the MAXDIA variable: 28.37 for females in predicting INTERM, versus 12.85 for males; and the corresponding values for STATUS are 17.5 versus 10.2. Moreover, these differences persist when severity group 3 cases are removed from each subsample, the ratios for the target, INTERM, being 2.1 for the MAXSYS F-statistics and 2.3 for the MAXDIA values. Such differences are rather puzzling, considering that the correlation of the variables, MAXSYS and MAXDIA, is 0.709 in the male subsample, and an almost equally large 0.684 for the females. Scatter plots of MAXSYS and MAXDIA versus TIME were examined in each subsample separately, and the only anomaly found was the complete absence of any survival times in the four to seven year range for the males. Whatever effects this gap might have on the correlations in the male group, it is clear that no such reasoning could apply to the results in the female subsample.

One final comment here is derived from a partial correlation analysis of the variables, INTERM, MAXSYS, and MAXDIA, removing the linear effects of AGE, SYSO, and DIAO for each subsample separately.
For the male group, the residual correlation between MAXSYS and INTERM drops to -0.185, and that of MAXDIA and INTERM to 0.001 - neither of which is significant at the 5% level. In the female subsample, although the residual correlation of MAXSYS and INTERM is only -0.131, the corresponding value for MAXDIA and INTERM is a still large -0.428 (p < 0.01). This result is consistent with the appearance of three blood pressure variables in the sets of predictors selected for the females, and only one such variable for the males.

5.7 Analyses with Pulse Pressure and Range Variables

The analyses of this section were motivated in part by a question posed a number of years ago in an article on the epidemiology of hypertension:

"It is clear that degree of variability increases with mean blood pressure; if lability is studied without controlling for mean blood pressure, any difference in prognosis may simply reflect the well-known prognosis of the blood pressure itself. The investigative question must be: does fluctuating blood pressure have a worse prognosis than stable blood pressure of the same mean blood pressure level?" [24, p.1164]

A quick examination of the list of variables available for the present sample reveals that, in the absence of recorded variances, any measures of variability used here must necessarily involve the extreme pressures: MAXSYS, MAXDIA, MINSYS, and MINDIA. Closer study of these last two variables indicates a potentially serious inconsistency: for 24 males and 18 females, the recorded minimum pressures are "resting" values (see Chapter 2), while only "casual" minima are available for the other 56 cases.
As a first attempt to deal with this problem, "range" variables were defined avoiding the minimum pressures; thus, a systolic "upper half-range", denoted SMXMM, is defined as MAXSYS - CSYSO, where CSYSO is simply a conversion of SYSO back to its original units; DMXMM is similarly defined as MAXDIA - CDIAO.

Table XXXIV presents the separate-sample correlations of SMXMM and DMXMM with other available time zero variables. For convenience of presentation, the table also includes the results for the three "pulse pressure" variables, PPMAX = MAXSYS - MAXDIA, PPMIN = MINSYS - MINDIA, and PPAVE = CSYSO - CDIAO, although the variability measured by a pulse pressure is not to be confused with the lability referred to in the first paragraph of the present section; it should also be noted that these variables cannot be considered "genuine" pulse pressures since, strictly speaking, a pulse pressure is calculated from a pair of readings made at the same point in time, and there is no guarantee that, for example, a patient's maximum systolic and maximum diastolic pressures occurred on the same reading. The reader might prefer to regard PPMAX, PPMIN, and PPAVE as part of a set of "judgement components" having at least a superficial meaning.

Table XXXIV: Correlations and Summaries for Upper Half-range and Pulse Pressure Variables

MALE, FEMALE RESULTS* (n=48, 50)

VARIABLES &      SMXMM         DMXMM         PPMAX         PPMIN         PPAVE
SUMMARIES        M      F      M      F      M      F      M      F      M      F
INTERM         -19    -10     -6    -41    -49    -11    -27      5    -46    -32
AGE             41     27    -11     -3     47     44     24     34     10     23
HIST             9      8     23     27     -5     -9    -11    •32     -3     -4
DUR             -1     -2     -8     -2     19      0     23     -8     18      2
MAXSYS             NA            NA            NA            NA            NA
MAXDIA             NA            NA         22     13     41     29    -47     36
MINSYS          11     27     17      9        NA            NA            NA
MINDIA           2     -2     24      7     35     14     47     10     58     28
SYSO            -7    -13     15      9     58     46     61     35        NA
DIAO            10     10     10      5     44     37     51     35     52     43
HSYMO            6     -2      2     -2     28     16     11     29     31     24
HSIZO            9     11     10     16     15     14      9     21     16     16
HECGO           17     13     40      3     14     30     10     25     23     26
BACHO          -17      3     -8     27     24    -23     14     -3     45    -16
BCVAO           18     21     -3      0     17     17      6      9      2     -5
KSYMO           -1     12     -9     44      8      3     -7     20      6     21
KPROO          -12     28     22     53     13      7     27     23     43     12
KSPNO          -14    -16     14     11      9      2     15     14     35     32
RVESO           12     22      6     46     52     13     29     18     57     22
RRETO          -13     33     14     40     15     28     22     26     42     23
RPAPO          -11     30    -11     28     31     11     30     14     45     -5
MEAN          23.8   27.9   18.7   17.4   83.6   95.4   66.3   65.4   78.6   84.9
ST. DEV.      18.0   20.3   10.2   11.1   21.4   20.9   19.2   16.3   16.6   14.8
SKEW.          0.5    0.4    0.5    0.4   -0.3   -0.0    1.2    0.3    0.2   -0.1
MAX.            65     75     37     48    125    146    140    100    102    117
MIN.            -5     -5      2     -3     40     54     32     35     57     57

Further Correlations (Male, Female)

            SMXMM        DMXMM
SMXMM       100, 100
DMXMM        25, 28      100, 100

            PPMAX        PPMIN        PPAVE
PPMAX       100, 100
PPMIN        53, 50      100, 100
PPAVE        56, 40       55, 26      100, 100

* Correlations x 100
NOTE: NA: NOT AVAILABLE

A few comments on Table XXXIV follow: negative values for SMXMM and DMXMM occurred in a few instances as a result of the discrete nature of the SYSO and DIAO variables and the widths of the coded intervals; there are only a few strong correlations involving the upper half-range variables, most of them involving DMXMM in the female sample; in particular, survival time tends to be less for female cases with large values of DMXMM - an observation that is likely related to the partial correlation results mentioned at the end of Section 5.6; large values of DMXMM are also associated with more serious kidney and retinal symptoms - especially in the female sample; SMXMM and DMXMM have only weak correlations with the other blood pressure variables for which results are available; the pulse pressure variables appear to be better predictors of survival time in the male than in the female subsample; this may be due, in
part, to the somewhat stronger correlations that exist between each pulse pressure variable and the original diastolic variables for the male group; almost all of the remaining strong correlations in the right-hand part of Table XXXIV involve the male group; in particular, PPAVE has significant positive correlations with all the retinal variables, two of the kidney variables, and BACHO, in the male subsample; it is not clear from this analysis how much of this correlation may be accounted for by the strong pulse pressure - diastolic (or systolic) relationship noted above.

Table XXXV: Regression with INTERM using Upper Half-range Variables, R,(cg) x 1000.

VARIABLES & SUMMARIES; (A) COMPLETE SAMPLE (n=98): ASR 'BEST', ASR #2; (B) MALE SUBSAMPLE (n=48): ASR 'BEST', ASR #2; (C) FEMALE SUBSAMPLE (n=50): ASR 'BEST', ASR #2

SMXMM DMXMM MINSYS MINDIA -18 (4) 20 (5) -28 (8) -17 (4) 19 (5) -26 (8) -20 (6) -23 (6) -21 (9) 22 (6) -21 (10) -23 (9) 24 (6) -25 (10)
CSYSO CDIAO -22 (4) -21 (4) -29 (11) -12 (6) -37 (10) -33 (10)
HSYMO HSIZO HECGO -247 (135)
BACHO BCVAO -245 (74) -187 (82) -320 (111) -307 (119)
KSYMO KPROO KSPNO -484 (141) -250 (157) -443 (142) -552 (199) -583 (204) -426 (178) -360 (177)
RVESO RRETO RPAPO -376 (128) -352 (128) -433 (159) -412 (166) -355 (103) -332 (101)
SEX 350 (83) 365 (82)

                    (A) BEST   (A) #2   (B) BEST   (B) #2   (C) BEST   (C) #2
INTERCEPT             7252      7021      7826      6593      7480      7168
R²(%)                 63.9      64.9      68.1      66.3      62.8      65.6
% of TOTAL R²         93.5      94.9      91.3      88.8      85.9      89.8
ADJ'D R²(%)           60.7      61.3      64.3      62.3      57.6      59.8
Cp                   14.65     15.22      2.95      4.82     10.08     10.12
MSE                 0.5426    0.5333    0.5766    0.6084    0.4505    0.4272

MIN. TOLER. VARIABLE 0.306 MINSYS 0.3 MINSYS 0.651 RRAP NOT AVAILABLE NOT AVAILABLE 0.500 MINSYS NOT AVAILABLE NOT AVAILABLE

NOTE: (i) SMXMM = MAXSYS - CSYSO
      (ii) DMXMM = MAXDIA - CDIAO
      (iii) CSYSO = 30 (SYSO) + 125
      (iv) CDIAO = 15 (DIAO) + 83

Table XXXVI: Regression with INTERM using Pulse Pressure Variables 8,(0 x 1000

VARIABLES & SUMMARIES; (A) MALE (n=48): ASR 'BEST', ASR #2, ASR #3; (B) FEMALE (n=50): ASR 'BEST', ASR #2, ASR #3

PPMAX MAXDIA PPMIN MINDIA -17 (6) -17 (8) -14 (6) -23 (6) -22 (8) -29 (6) 23 (6) -35 (7) 20 (6) -35 (7) 18 (6)
PPAVE DIAO -394 (184) 19 (10)
HSYMO HSIZO HECGO -272 (129)
BACHO BCVAO -256 (112) -234 (114) -325 (115) -165 (91)
KSYMO KPROO KSPNO -525 (213) -551 (210) -574 (209) -362 (174) 287 (139) -502 (174) 344 (140) -521 (170)
RVESO RRETO RPAPO -378 (171) -329 (170) -431 (168) -319 (98) -397 (102) -359 (102)

                    (A) BEST   (A) #2   (A) #3   (B) BEST   (B) #2   (B) #3
INTERCEPT             7373      5838      7279      6594      7437      7801
R²(%)                 65.2      65.1      67.9      64.9      64.7      67.2
% of TOTAL R²         87.4      87.2      91.0      88.8      88.5      91.9
ADJ'D R²(%)           61.0      61.0      63.2      60.9      60.7      62.7
Cp                    6.08      6.13      6.16      4.82      4.99      5.30
MSE                 0.6298    0.6306    0.5942    0.4154    0.4173    0.3968

MIN. TOLER. VARIABLE 0.614 RRAPO NOT AVAILABLE NOT AVAILABLE NOT AVAILABLE NOT AVAILABLE 0.712 RRETO NOT AVAILABLE NOT AVAILABLE NOT AVAILABLE NOT AVAILABLE

NOTE: (i) PPMAX = MAXSYS - MAXDIA
      (ii) PPMIN = MINSYS - MINDIA
      (iii) PPAVE = CSYSO - CDIAO
      (iv) See Table XXXV for Definition of CSYSO, CDIAO

Tables XXXV and XXXVI present the "all subsets regression" results using the variables of Table XXXIV. Comparison of Tables VIII (B) and XXXV (A) suggests that little is gained by the use of SMXMM and DMXMM for the combined sample; in particular, the reader's attention is drawn to the near equivalence of the "best" solutions in each case. With the male subsample, an improvement of about 3% in the value of R² (from 0.65 to 0.68), without a serious change in the minimum usable fraction, is obtained using the upper half-ranges. The re-expressions of Table XXIII (E) are far more effective than those of Table XXXV (C) for the female group.
Throughout Table XXXV, the reader will note a very strong tendency for the original maximum blood pressure (either MAXSYS or MAXDIA) to be recoverable by recombining SMXMM (or DMXMM) and either of the highly correlated averages, CSYSO or CDIAO. The situation is reversed in Table XXXVI: the pulse pressure variables tend to recombine to yield something close to MAXSYS again, in the male sets (although PPAVE is marginally significant on its own in the third set); in the female subsample, improvements in the tolerances are achieved using the pulse pressure variables (as was expected), and higher values of R² are obtained with sets of the same size as those containing only the original variables. Analysis of the combined sample using pulse pressure variables was omitted.

At this point, the group of 42 cases for whom resting minimum pressures were available was used to study the effects of the "complete range" and "lower half-range" variables with respect to survival time; these variables are defined as follows: SR = MAXSYS - MINSYS, DR = MAXDIA - MINDIA, LRNGS = CSYSO - MINSYS, and LRNGD = CDIAO - MINDIA. Alternatively, the 56 cases having only "casual" minima might have been considered (or the two groups compared), but considerations of time as well as uniformity of measurement conditions led to the choice of the "resting" set. The following correlations with INTERM were noted here: MAXSYS (-0.34), MAXDIA (-0.37), MINSYS (-0.22), MINDIA (-0.40), SYSO (-0.32), DIAO (-0.44), SMXMM (-0.12), DMXMM (-0.06), LRNGS (-0.02), LRNGD (0.23), SR (-0.10), and DR (0.07). The correlations between SEX (coded 2 for males, 4 for females) and SR, SMXMM, SYSO, LRNGS are, respectively: 0.35, 0.04, -0.02, 0.31; the corresponding values for the diastolic counterparts are: 0.17, -0.15, -0.04, 0.31.
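The re-expressed blood pressure variables used in this section are simple differences, and a short sketch may make their relationships easier to follow. It uses the definitions given in the text and in the notes to Tables XXXV and XXXVI (CSYSO = 30 (SYSO) + 125, CDIAO = 15 (DIAO) + 83). The function name and the example patient are hypothetical and are not taken from the study data.

```python
def derived_pressures(maxsys, maxdia, minsys, mindia, syso, diao):
    """Compute the range and pulse pressure variables from one patient's record."""
    csyso = 30 * syso + 125      # coded average systolic back in original units
    cdiao = 15 * diao + 83       # coded average diastolic back in original units
    return {
        "CSYSO": csyso,
        "CDIAO": cdiao,
        "SMXMM": maxsys - csyso,   # systolic upper half-range
        "DMXMM": maxdia - cdiao,   # diastolic upper half-range
        "LRNGS": csyso - minsys,   # systolic lower half-range
        "LRNGD": cdiao - mindia,   # diastolic lower half-range
        "SR": maxsys - minsys,     # systolic complete range
        "DR": maxdia - mindia,     # diastolic complete range
        "PPMAX": maxsys - maxdia,  # "pulse pressure" of the maxima
        "PPMIN": minsys - mindia,  # "pulse pressure" of the minima
        "PPAVE": csyso - cdiao,    # "pulse pressure" of the averages
    }

# Hypothetical patient record, chosen only for illustration.
v = derived_pressures(maxsys=230, maxdia=150, minsys=170, mindia=105, syso=3, diao=3)
for name, value in v.items():
    print(name, value)
```

By construction each complete range is the sum of its two half-ranges (SR = SMXMM + LRNGS and DR = DMXMM + LRNGD), so a sex difference in SR can be attributed to one half-range or the other.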
Thus, a first glance at the results for this rather "special" set of cases reveals no clear relationship between survival time and any of the "range" variables; and, with respect to sex differences here, the foregoing correlations seem to indicate that the greater variability in blood pressure among female cases (as measured by SR or DR) is mainly due to the greater "lower half-ranges" (LRNGS and LRNGD) in the female group; that is, the female hypertensives in this sample seem capable of achieving lower minimum pressures than the males, although the mean levels are comparable. A closer examination of SR and DR uncovered the following correlations with MAXSYS, MAXDIA, SYSO, and DIAO for the 42 cases with resting minima: 0.33, 0.16, 0.17, and 0.25 for SR, respectively, and 0.18, 0.46, -0.03, and 0.09 for DR, respectively. Thus, there is only a weak tendency for systolic range to increase with the mean level of blood pressure, and almost no tendency for the diastolic range to do likewise, in this sample.

Detailed scatter plots of SR and DR versus TIME were then done in an attempt to understand the nature of the weak linear relationships here. It was immediately clear that cases belonging to severity group 3, or who later developed papilledema, showed a strong tendency toward both low systolic and low diastolic ranges - a fact which, when combined with their generally short survival times, significantly weakens any negative linear relationship that may exist between the two variables. (The same may be said for the positive relationship between the range variables and SYSO or DIAO.)
Moreover, several cases survived the ten-year study with high values of SR and DR, and further investigation showed most of these cases to be in their early thirties, six to eight years below the average age. Unfortunately, the problem of small sample sizes did not allow such clues to be followed up with more penetrating analyses - at least in this reduced sample.

Working again with the complete sample (n = 98), average values of TIME were calculated within the classes created by crossing the variables SEX, SEV (severity group 1, 2, or 3), SMXMM (low-high), and SYSO (codes 1 and 2 combined); the analysis was, of course, repeated with DMXMM and DIAO replacing the systolic values. While such a technique originally seemed promising as a way of studying the effects of variability of blood pressure on the prognosis, while controlling for the effects of average level, the need (as suggested by earlier results) to allow for SEX and severity group effects inevitably created classes with very small numbers of cases; moreover, the use of the variables SR and DR, rather than just the upper half-ranges, would have been preferable here, but with the mixture of "casual" and "resting" minima in the complete sample, problems with interpreting the results were anticipated.

At this point, a number of partial correlation analyses were done as part of a second approach to the problem of separating the variability and level effects of blood pressure on prognosis. First, the correlations among INTERM, SMXMM, and DMXMM were calculated, for the male and female groups separately, adjusting for the linear effects of AGE, MINSYS, and MINDIA; the analysis was repeated for the combined sample, adjusting for SEX as well.
The results appear in Table XXXVII (A), and are very similar to the original correlations. Next, using only the 42 cases with "resting" minima, the correlations among SR, DR, and INTERM were considered after adjustment for AGE, SYSO, DIAO, and SEX; from Table XXXVII (B), it is apparent that only the relationship with SR appears stronger after the adjustment, although the correlation is still not significant at the 5% level. Finally, considering the maximum and minimum pulse pressure variables after the same adjustment as in part (B), it is seen (Table XXXVII (C)) that only PPMAX in the male sample retains its negative relationship with INTERM following the adjustment. As for the presence of the positive values in the right-hand part of table (C), three appear to be well explained by the presence of an extreme outlier, while the value for PPMIN in the female sample - significant at the 5% level - is rather puzzling, but consistent with previous results involving the original variables, MINSYS and MINDIA.

These explorations are perhaps of greatest use in demonstrating the generally smaller prognostic importance of the range variables considered here relative to the "level" variables, SYSO, DIAO, etc. Once again, though, the absence of more reliable measures of variability is regretted.
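The adjusted correlations discussed above are partial correlations: each variable is first regressed on the adjusting variables, and the residuals are then correlated. The following is a minimal sketch of that idea using NumPy; the function name and the simulated data are hypothetical and are not drawn from the study.

```python
import numpy as np


def partial_corr(x, y, covariates):
    """Correlation between x and y after removing the linear effects
    of the covariates (a 1-D array or an (n, k) array) from both."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: intercept plus the adjusting variables.
    Z = np.column_stack([np.ones(len(x)), covariates])
    # Residuals of x and y from their least-squares fits on Z.
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))


# Simulated illustration: x and y are related only through z, so the
# raw correlation is large while the partial correlation is near zero.
rng = np.random.default_rng(0)
z = rng.normal(size=2000)
x = z + 0.5 * rng.normal(size=2000)
y = z + 0.5 * rng.normal(size=2000)
print(np.corrcoef(x, y)[0, 1], partial_corr(x, y, z))
```

In the setting of the text, x and y would be INTERM and a range variable such as SR, and the covariates would be AGE, SYSO, DIAO, and SEX.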
Table XXXVII: Partial Correlation Analyses

(A) INTERM, SMXMM, DMXMM, adjusting for AGE, MINSYS, MINDIA, (SEX)

              ORIGINAL CORRELATIONS     ADJUSTED CORRELATIONS
              WITH INTERM               WITH INTERM
              SMXMM      DMXMM          SMXMM      DMXMM
MALE          -0.190     -0.062         -0.134      0.049
FEMALE        -0.098     -0.405         -0.137     -0.412
COMBINED      -0.116     -0.231         -0.128     -0.192

(B) INTERM, SR, DR, "RESTING" SET (n=42), adjusting for AGE, SYSO, DIAO, SEX

              ORIGINAL CORRELATIONS     ADJUSTED CORRELATIONS
              SR         DR             SR         DR
              -0.097      0.067         -0.179      0.050

(C) INTERM, PPMIN, PPMAX, adjusting for AGE, SYSO, DIAO, (SEX) (complete samples)

              ORIGINAL CORRELATIONS     ADJUSTED CORRELATIONS
              PPMIN      PPMAX          PPMIN      PPMAX
MALE          -0.268     -0.489          0.186*    -0.193
FEMALE         0.052     -0.107          0.300      0.146*
COMBINED      -0.137     -0.240          0.220*    -0.023

*These values are much closer to zero with the removal of an extreme outlier.

Since so much attention has already been given here to the inconsistency of the minimum blood pressure variables, it would seem inappropriate to ignore the possible effects of this inconsistency on the regression results of previous sections. One set of analyses carried out to evaluate such effects involves the use of "all subsets regression" within the "resting" set (n = 42). Using the original time zero variables, the "best" set was the following:

-0.020 (MAXSYS) + 0.023 (MINSYS) - 0.025 (MINDIA) - 0.294 (BACHO) - 0.909 (KSPNO) + 0.543 (SEX) + 5.630

where the dependent variable is INTERM and the value of R^2 is 0.673. This result should be compared with that of Table VIII (B). It is of interest to note that larger "good" sets reported by the program include RRETO, bringing the solutions for the two different samples (n = 98 and n = 42) even closer together.
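For readers who wish to see how the "best" resting-set equation quoted above is applied, the following sketch simply evaluates that linear predictor. The coefficients are those given in the text; the function name and the example inputs are hypothetical, and no claim is made about the scale or coding of INTERM beyond what the thesis states.

```python
# Coefficients of the "best" set for the resting sample (n = 42),
# as quoted in the text; the dependent variable is INTERM.
COEFFICIENTS = {
    "MAXSYS": -0.020,
    "MINSYS": 0.023,
    "MINDIA": -0.025,
    "BACHO": -0.294,
    "KSPNO": -0.909,
    "SEX": 0.543,
}
INTERCEPT = 5.630


def predict_interm(case):
    """Evaluate the quoted linear equation for one case (a dict of values)."""
    return INTERCEPT + sum(coef * case[name]
                           for name, coef in COEFFICIENTS.items())


# Hypothetical input values, for illustration only.
case = {"MAXSYS": 200, "MINSYS": 140, "MINDIA": 90,
        "BACHO": 0, "KSPNO": 0, "SEX": 1}
print(round(predict_interm(case), 3))
```

The opposite signs on MINSYS and MINDIA are visible directly in the coefficient table, which is the feature the surrounding discussion draws attention to.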
Finally, further assurance of the validity of previous results was gained by carrying out what became a rather complex adjustment procedure, for each SEX group separately: the minimum blood pressure variables were adjusted for the "casual effect" - the tendency for casual minima to be larger than resting minima; however, this effect was apparent only after making a preliminary "level" adjustment (by SYSO for the males, DIAO for the females); this was necessary because patients with resting minima, having been hospitalized, tended to be more severely hypertensive, and thus have higher, rather than lower, minima in relation to the out-patient group. "All subsets regression" results after the final adjustments yielded solutions that were very similar (if not identical) to those of Tables XXI (B) and XXIII (B). The rather complex re-expressions in the solution given in Table XXXVIII (B) appear to be worth the effort as far as R^2 and tolerance values are concerned, and the solution provides for a fairly intuitive adjustment for casual readings as well. Also of interest here are the relatively low correlations of RMNSYS and RMNDIA with INTERM, and their high level of significance in the final equation; here, perhaps, more than at any other point in the analysis, it appears that MINSYS and MINDIA together act like a single variable.

5.8 Predicting Papilledema Symptoms

The association of papilledema with the "accelerated" form of hypertension, and the well-known serious prognostic implications of the latter condition, prompted the following brief examination of the available data for symptoms related to the onset of papilledema.
The target of the first set of analyses done here was denoted PRESPAP, which is 1 if the patient showed any papilledema symptoms at any time in the course of the study, and 0 otherwise. Correlation and regression analyses for PRESPAP were carried out on both the entire sample (n = 98) and on those without papilledema at time zero (n = 83), using all available time zero variables other than RPAPO. For the complete sample, the strongest correlations with PRESPAP are: RRETO (0.487), DIAO (0.455), MAXDIA (0.452), MINDIA (0.455), RVESO (0.440), KSPNO (0.414), KPROO (0.398), MAXSYS (0.372), SYSO (0.354), MINSYS (0.348), KSYMO (0.328). The "best" set produced by "all subsets regression" had R^2 = 0.4516 and MSE = 0.1177, and contained: AGE, MAXSYS, MINDIA, SYSO, DIAO, HSYMO, KSPNO, with AGE and SYSO having negative coefficients in the equation. Unfortunately, this equation is

Table XXXVIII: Regression Results for INTERM With Adjusted MINIMA

(A) All subsets regression results for INTERM, Male Subsample: Same as Table XXI (B).

(B) Female Subsample, "BEST" SET: R^2 = 66.4, MSE = 0.4063

VARIABLE      COEF*    ST. ERROR*    P-VALUE (%)    TOLERANCE
MAXDIA         -23        6            0.1            0.750
RMNSYS          25        6            0.0            0.686
RMNDIA         -24       10            2.0            0.695
HSYMO         -256      127            5.0            0.850
KSPNO         -371      173            3.7            0.872
RRETO         -319       97            0.2            0.713
INTERCEPT     6277

* x 1000

The above solution involves the following re-expressions:

CASUAL = 1 if only casual MINIMA are available, 0 if resting MINIMA were obtained.
RCAS = CASUAL + 0.132098 (DIAO) = adjustment for unequal mean level effects
RMNSYS = MINSYS - 13.850416 (DIAO) - 9.21691 (RCAS) = adjustment for level effect, then 'casual' effect
       = MINSYS - 15.067951 (DIAO) - 9.21691 (CASUAL)
RMNDIA = MINDIA - 6.736566 (DIAO) - 4.00157 (RCAS)
       = MINDIA - 7.265165 (DIAO) - 4.00157 (CASUAL)

CORRELATIONS:
            RMNSYS    RMNDIA    INTERM
RMNSYS      1
RMNDIA      0.526     1
INTERM      0.194     -0.033    1
MAXSYS      0.243     -0.016    -0.437
MAXDIA      0.047     0.034     -0.609

subject to the uncertainties that result from low usable fractions (0.244 for SYSO, 0.296 for MAXSYS, and 0.302 for DIAO) and would profit from some attempts to re-express the blood pressure variables. For the reduced sample, the role of the diastolic variables is again apparent in the correlation ordering: MAXDIA (0.375), DIAO (0.368), MINDIA (0.339), HSYMO (0.314), MAXSYS (0.242), SYSO (0.232), KSYMO (0.199), KSPNO (0.180), MINSYS (0.169), HECGO (0.168). In this case, the "best" set includes MAXDIA, MINSYS, MINDIA, and HSYMO, with only MINSYS having a negative coefficient. The value of R^2 is now only 0.260, and MINSYS and MINDIA have tolerances of 0.397 and 0.404 respectively. This second case, involving only patients who were free of papilledema symptoms at time zero, is perhaps the more appealing from a practical point of view.

The second set of analyses done here involved weighting papilledema symptoms (that is, the coded values, 0 to 4) by their time of first occurrence during the study; thus, if a patient developed papilledema of grade 2 at the five-year point of the study, the corresponding value of the target variable, denoted ADJDPAP, would be 2/5, and so on, with time zero grades left unadjusted. The weighting was designed to allow for an assumed lesser predictability of events occurring farther in the future. The ten largest correlations with ADJDPAP in the complete sample are: RRETO (0.610), RVESO (0.482), KSPNO (0.441), KPROO (0.435), DIAO (0.348), SYSO (0.344), KSYMO (0.309), MAXSYS (0.306), MAXDIA (0.294), and MINSYS (0.284).
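The time weighting that defines ADJDPAP can be sketched as follows. This is only an illustration of the rule as described in the text (grade divided by the time of first occurrence in years, with time zero grades left unadjusted); the function name is hypothetical, and the treatment of a patient who never develops symptoms as grade 0 is an assumption.

```python
def adjdpap(grade, years_to_first_occurrence):
    """Weight a papilledema grade (0 to 4) by its time of first occurrence.

    A grade already present at time zero is left unadjusted; otherwise the
    grade is divided by the number of years elapsed when it first appeared.
    """
    if years_to_first_occurrence == 0:
        return float(grade)
    return grade / years_to_first_occurrence


print(adjdpap(2, 5))   # the example in the text: grade 2 at year five
print(adjdpap(3, 0))   # a grade present at time zero stays as it is
```

Later events therefore contribute less to the target, which matches the stated aim of allowing for the lesser predictability of events farther in the future.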
The "best" set of predictors for ADJDPAP produced an R^2 value of 0.474 using HIST, MAXDIA, DIAO, KPROO, RRETO, and SEX; MAXDIA and SEX had negative coefficients, and the diastolic variables possessed the smallest usable fractions: 0.330 for MAXDIA, and 0.401 for DIAO. When the reduced sample is considered, only the extreme and the average blood pressures remain significantly correlated with ADJDPAP, the order being: DIAO (0.254), MAXDIA (0.235), SYSO (0.232), and MAXSYS (0.209). The regression equations here include MINSYS (-), DIAO (+), and RRETO (+), and produce values of R^2 that are less than 0.15.

5.9 Analyses using Time Two Data

In the original comparative study of Evelyn et al., data were collected at the two-year mark of the study period in order to detect any occurrences of rapidly worsening symptoms, such as are often seen when the patient develops accelerated hypertension. In attempting to integrate the 13 time-two symptom variables into the present analysis, a number of points had to be considered:

- for several time two
variables, more than one quarter of the observations are missing; the most incomplete variables are: HECG2 (42.3% missing), HSIZ2 (36.1%), RRET2 (30.9%), RPAP2 (30.9%), RVES2 (28.9%), KSPN2 (27.8%), and KPRO2 (24.7%);

- the total available sample now includes 80 cases (36 males, 44 females), after the exclusion of 17 cases who died within two years of the start of the study and one case for whom the data were too incomplete to allow estimation; the remaining cases should now be regarded as a selected sample;

- along with the smaller sample sizes at this stage, there exists, as well, a need to modify some of the target variables (re-expressions of the TIME variable) considered earlier; in particular, if we continue to measure survival time from the start of the study (time zero), the first category of the INTERM variable disappears now; since correlations are not affected by linear re-expressions of the variables, the policy adopted here was to leave the various target variables in their original forms, using time zero as the starting point for measurement of survival;

- with the number of independent variables (33, including SEX) approaching the size of the male subsample (36), it was clear from the outset that the problems with multicollinearity experienced earlier would be more acute at this stage; moreover, the 13 time two variables have already been seen to be strongly correlated with their time zero counterparts; thus, the use of "judgement components" was considered from the beginning of the time two analysis.
5.9.1 Complete Sample Results

Although the results of the time zero analysis suggested a division of the complete sample into male and female groups, it was felt that, with the new variables and altered sample of the two-year point, the desirability of such a division should be re-established. To this end, a series of "all subsets regression" analyses was carried out, on an exploratory basis, using the targets STATUS, INTERV, and TIME - the last one here being used only with a sample of non-survivors. It should be noted that, owing to an effort to make use of data recorded near the two-year point of the study and thereby increase the sample sizes somewhat, the results presented in Table XXXIX are based on samples that include

Table XXXIX: Regression Results using Time Two Data, Combined Samples
Coefficients (standard errors) x 1000

VARIABLES & SUMMARIES; columns: (A) STATUS (n=83): ASR 'BEST', ASR #2; (B) INTERV (n=83): ASR 'BEST', ASR #2; (C) TIME (NON-SURVIVORS ONLY): ASR 'BEST' (II), ASR 'BEST' (I)

AGE MAXSYS MINSYS -49 (16) 31 (13)
SYSI DIAI 825 (443) 1231 (437)
HSYMI HSIZI HECGI 269 (77) 307 (76) -1109 (357) 632 (354)
KPROI KSPNI 78 (43) -198 (74) -149 (78) 893 (371)
RVESI RPAPI -513 (171) -454 (172)
SYS2 DIA2 114 (60) 139 (54) -154 (89) -2304 (524) -2333 (631)
HSYM2 HECG2 144 (47) 136 (48) -295 (64) -254 (68) -1215 (476)
BCVA2 146 (54) 162 (54)
RVES2 RRET2 RPAP2 -299 (68) -268 (69) -978 (408) -1874 (462)
SEX INTERCEPT -250 -318 129 (57) 2557 143 (57) 2888 779 (289) 19,569 19,301

               (A) STATUS            (B) INTERV            (C) TIME
               'BEST'    #2          'BEST'    #2          'BEST' (II)   'BEST' (I)
R^2            54.4      52.4        63.6      65.0        74.3          76.1
ADJ'D R^2      51.4      49.9        61.2      62.2        67.9          68.9
Cp             9.12      9.39        10.83     10.94       12.10         -0.45
MSE            0.1151    0.1186      0.2388    0.2329      2.1279        2.0649

MIN. TOLER.
VARIABLE 0.559 DIA2 NOT AVAILABLE NOT AVAILABLE 0.664 KSPNI NOT AVAILABLE NOT AVAILABLE 0.530 MINSYS 0.371 KSPNI

NOTE: "I" at the end of a variable name indicates an increment: TIME 2 value minus TIME 0 value

three cases who died between 1.8 and 2.0 years after the start of the follow-up - so that the data recorded for them in the time two columns could reasonably be treated as time two data. Also of importance here is the re-expression of the 26 time zero and time two variables in terms of increments (time two value minus time zero value) and the original time two variables; careful consideration of the correlation matrix (for all the original variables as well as the increments) later led to the replacement of the time two data by that of time zero.

A final comment here applies as well to all analyses involving more than about 25 independent variables: since the "all subsets regression" programs available at the time placed an upper limit on the number of independent variables that could be processed at one time, it was necessary, in the present situation, to proceed in two stages, all variables of possible interest after the first analysis being added to the group that was originally omitted, in order to create a "final" set of predictors for entry into the program. It is recognized that this process is not theoretically equivalent to an analysis involving all of the variables at once, but in the applications to be described here, this shortcoming would appear to have little practical importance; of course, the selection of variables for the first stage of analysis had to be done with a certain amount of foresight.
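The increment re-expression noted above (time two value minus time zero value) can be sketched as follows. The naming convention in the code (a trailing "O" for time zero, "2" for time two, and "I" for the increment) follows the variable names as they appear in this text; the function name, the use of None for a missing observation, and the example values are assumptions made for illustration.

```python
def increments(time_zero, time_two):
    """Return increments (time two minus time zero) keyed by '<stem>I'.

    time_zero maps names such as 'HSYMO' to values, and time_two maps
    names such as 'HSYM2' to values. A variable whose time two value is
    absent or None is skipped rather than imputed.
    """
    result = {}
    for name0, value0 in time_zero.items():
        stem = name0[:-1]                 # drop the trailing 'O'
        value2 = time_two.get(stem + "2")
        if value0 is None or value2 is None:
            continue
        result[stem + "I"] = value2 - value0
    return result


# Hypothetical coded values for one case.
case0 = {"HSYMO": 1, "KSPNO": 0, "RRETO": 2}
case2 = {"HSYM2": 3, "KSPN2": 0, "RRET2": None}
print(increments(case0, case2))
```

Because an increment is a difference of two strongly correlated measurements, it tends to be less correlated with the time zero value than the time two value is, which is the motivation given in the text for working with time zero variables plus increments.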
Returning now to Table XXXIX, only a few comments are warranted here:

- comparing these results to similar ones from the time zero analysis, there is clearly less consistency among the selected sets of Table XXXIX; only DIA2, HSYM2, and SEX appear with any regularity here, although substitution of a given symptom by another from the same group (RVES2 and RRET2, for example) is in evidence; the absence of SEX and all retinal variables from the sets selected for prediction of STATUS is notable;

- part (C) of the table differs from the other parts in that two "best" sets are presented - one from the second stage of the analysis and the other from the first stage; that a first-stage result should have a higher value of R^2 than a second-stage "best" set may be explained by the dependence of the criterion on the residual sum of squares for all variables being considered;

- as in the time zero results, the variable MINSYS, when it appears, has a positive coefficient in predicting survival time, as well as a fairly small usable fraction.

The analysis summarized in Table XL was carried out after an examination of the relevant correlations indicated that the association of a given increment with the time zero variable was almost always weaker than the increment - time two correlation; moreover, inter-correlations among the time zero variables were generally weaker than for the time two symptoms. The target chosen here was INTERM, and the sample was not "augmented" (as was the case in Table XXXIX). Of particular interest here are the importance of RPAPI and KSYMI in the prediction equations, and the significance of the SEX variable.
This last observation and the larger values of R^2 obtained here, relative to those for INTERV, motivated the analyses presented in the next subsection.

Table XL: Regression Results Using Time Two Data: Combined Sample, Target: INTERM (n=80)

VARIABLES &      ASR 'BEST'                             ASR #2                        ASR #3
SUMMARIES        COEF*   ST. ERR*   T-STAT   TOLER.     COEF*   ST. ERR*   T-STAT     COEF*   ST. ERR*   T-STAT
MAXSYS             -8       2       -3.22    0.527        -7       3       -2.66
MINSYS              6       3        2.20    0.588         6       3        2.16
HSYMI            -251      76       -3.30    0.887      -240      76       -3.17      -326      84       -3.90
KSYMI            -497     124       -4.01    0.826      -469     124       -3.77
KSPNI                                                                                 -161      83       -1.93
RRETI            -234      91       -2.57    0.880      -293      98       -2.97      -306     103       -2.96
RPAPI            -756     167       -4.52    0.926      -729     167       -4.37      -681     172       -3.96
HSYMO                                                                                 -243      81       -3.02
KSYMO            -355     125       -2.84    0.834      -311     127       -2.44
KSPNO            -361     105       -3.42    0.868      -341     105       -3.24      -292     112       -2.62
RRETO                                                   -138      94       -1.47      -259      87       -2.97
SEX               168      57        2.93    0.852       166      57        2.91       100      57        1.74
INTERCEPT        4177                                   3974                          3681
R^2 (%)          67.6                                   68.6                          65.5
ADJ'D R^2 (%)    63.5                                   64.1                          61.6
Cp               15.61                                  16.58                         17.01
MSE              0.2217                                 0.2181                        0.2330

* x 1000

5.9.2 Analysis of Male and Female Subsamples

The two-stage "all subsets regression" results for INTERM in the male and female subsamples are summarized in Table XLI. As in the complete sample analysis for this target, the variables used at the first stage included 11 of the 13 coded time zero symptoms (SYSO and DIAO were omitted) and all 13 increments. Unfortunately, the covariance matrix for these 24 variables was essentially singular in the male subsample, and two variables had to be removed; they were BCVAI and HECGO, with correlations with INTERM of -0.029 and -0.238 respectively. In the second stage, the six time zero blood pressure variables, along with AGE, HIST, and DUR, were joined by all the variables of interest from the first stage.
Despite the inconvenience of using this procedure in the present situation, "all subsets regression" was employed here in preference to a stepwise method because of the larger number of interesting solutions it generally produces and the observed tendency of good "stepwise" solutions to be a subset of the "all subsets" solutions. A few observations on Table XLI are in order here:

- there is no explicit overlap at all between the male and female sets, although both contain at least one heart variable increment and - at least in the larger male sets - there is a common tendency for a time zero retinal symptom to appear;

- the number of variables selected and the resulting value of R^2 are distinctly smaller for the males than for the females; these findings should be considered in the light of the different sizes of the subsamples;

Table XLI: Regression Results for INTERM using Time Two Data and Subsamples

VARIABLES & SUMMARIES; columns: (A) MALES (n=36): ASR 'BEST', ASR #2, ASR #3; (B) FEMALES (n=44): ASR 'BEST', ASR #2, ASR #3

MAXSYS -10 (3)
HSYMI HSIZI HECGI -476 (96) -504 (97) -518 (94) -337 (116) -152 (77) -355 (112) -144 (74) -349 (117)
BACHI KSPNI -231 (82) -94 (52) -246 (80) -100 (53) -264 (83)
RRETI RPAPI -570 (99) -587 (135) -604 (98) -537 (134) -646 (99) -584 (137)
SYSO HSYMO KSPNO RVESO RRETO -661 (161) -395 (110) -631 (165) -270 (127) -604 (160) -181 (100) -194 (78) -324 (71) -168 (77) -358 (71) -143 (79) -389 (72)

                 (A) MALES (n=36)               (B) FEMALES (n=44)
                 'BEST'    #2       #3          'BEST'    #2       #3
INTERCEPT        6063      4731     4643        4113      4078     4049
R^2 (%)          71.6      71.1     73.9        81.8      83.4     81.6
ADJ'D R^2 (%)    68.9      68.4     70.5        78.2      79.6     78.0
Cp               2.63      3.05     3.36        12.49     12.57    12.85
MSE              0.2182    0.2215   0.208       0.1157    0.1086   0.1170

MIN. TOLER.
VARIABLE 0.883 KSPNO 0.85 KSPNO NOT AVAILABLE NOT AVAILABLE 0.742 RRETI NOT AVAILABLE NOT AVAILABLE NOT AVAILABLE NOT AVAILABLE

- problems with multicollinearity are not too serious here even with the large sets of the female results; thus it appears that the choice of judgment components for these analyses was an adequate one.

Because of the different samples and the re-expressions involved in Table XLI, it is difficult to judge the consistency of these results with the corresponding time zero findings and to gauge the degree of improvement in the predictions obtained - if any. Some information was obtained here by first regressing INTERM on the 20 time zero variables of Table XXI, for the cases that survived more than two years, and then repeating the analysis with the 13 time zero coded variables replaced by the corresponding time two variables. For the male subsample, the time zero analysis produced the following "best" set: MAXSYS, KPROO, KSPNO, with R^2 = 0.521; a second set, with R^2 = 0.513, chose BACHO instead of KPROO, which gives one of the sets appearing in Table XXI (A). The time two best set includes MAXSYS, HSYM2, and KSPN2, with R^2 = 0.620. In the female subsample, an R^2 of 0.508 was obtained using MAXDIA, MINSYS, MINDIA, and HSYMO, while the set HSYM2, KSPN2, RRET2, and RPAP2 produced an R^2 of 0.686; moreover, a larger time two set for the female group included MINSYS (+) and MINDIA (-) in addition to the four variables of the "best" set, and the resulting value of R^2 was 0.728. These results are quite consistent with those of Table XXIII and clearly indicate the desirability of integrating the time two data into the analysis.

The results of stepwise discriminant analyses of STATUS using all 33 independent variables (re-expressed as in Table XLI) may be found in Tables XLII and XLIII.
The jackknifed percentages of correct classifications here reflect the relatively low success rate for classifying patients who did not survive the study period. This situation would likely improve with a downward adjustment of the constant in each case, and perhaps some of the more sophisticated adjustments and transformations used in Section 5.4.1 are indicated here as well. For present purposes, however, it sufficed to notice, in both subsamples, the consistency between the regression results of Table XLI and many of the "good" discriminating sets obtained from the stepwise discriminant analysis.

The final step in the time two analysis involved principal component re-expressions of the available data, by analogy with the time zero analysis. Once again, components were extracted from groups of similar variables (blood pressure, heart, brain, kidney, and retinal), but with time two variables replaced by the increments, in an effort to reduce inter-group correlations somewhat. The most important difference at this stage, however, is the number of components obtained: 30 here, as compared with only 17 previously. With such a large number of variables to work with, the step-by-step orthogonalization technique employed previously no longer seemed worthwhile here. Instead, selected members of a "promising" group of predictors for each subsample were replaced by their residuals in an effort to produce usable fractions of around 90% or more, high values of R^2, and high levels of significance for all variables retained. These goals proved easier to attain in the male than in the female subsample.
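The effect of replacing a predictor by its residuals can be illustrated numerically. The sketch below assumes that the "usable fraction" of a predictor is its tolerance, that is, one minus the R^2 obtained by regressing it on the other predictors; the function names and the small data set are hypothetical and serve only to show that residualizing raises the tolerance.

```python
import numpy as np


def residualize(target, on):
    """Residuals of `target` after least-squares regression on `on`
    (a 1-D array or an (n, k) array), with an intercept included."""
    target = np.asarray(target, dtype=float)
    Z = np.column_stack([np.ones(len(target)), on])
    return target - Z @ np.linalg.lstsq(Z, target, rcond=None)[0]


def tolerances(X):
    """Tolerance of each column of X: 1 - R^2 from regressing that
    column on all the remaining columns."""
    X = np.asarray(X, dtype=float)
    result = []
    for j in range(X.shape[1]):
        resid = residualize(X[:, j], np.delete(X, j, axis=1))
        centered = X[:, j] - X[:, j].mean()
        result.append(float(resid @ resid / (centered @ centered)))
    return result


# Two correlated predictors: x3 shares most of its variation with x1.
x1 = np.array([-1.0, -1.0, 1.0, 1.0])
x2 = np.array([-1.0, 1.0, -1.0, 1.0])
x3 = x1 + 0.5 * x2
print(tolerances(np.column_stack([x1, x3])))

# Replacing x3 by its residual on x1 removes the shared part.
rx3 = residualize(x3, x1)
print(tolerances(np.column_stack([x1, rx3])))
```

The price of the higher tolerance is interpretability: the residual is a combination of several original variables, so the fitted coefficients must be converted back if they are to be read in terms of the original components.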
Tables XLIV to XLVII outline the results of the principal component re-expression process in the male sample: a few of the

Table XLII: Discriminant Analysis with Time Two Data: Male Subsample (n=36)

Step #9 Result: Classify a case as an eight-year survivor if and only if
2.22377 (SYSO) + 3.12509 (KSPNO) + 3.23658 (HSYMI) - 8.69268 < 0.44731

CLASSIFICATION MATRICES (Jackknifed Results)

          PREDICTED ALIVE    PREDICTED DEAD    PERCENT CORRECT
Alive     20 (20)            2 (2)             90.9 (90.9)
Dead      2 (3)              12 (11)           85.7 (78.6)
                                               88.9 (86.1) overall

Step #8 Result:
2.85734 (SYSO) + 3.51533 (KSPNO) + 1.89696 (DIAI) + 2.90670 (HSYMI) - 10.15852 < 0.44731

CLASSIFICATION MATRICES (Jackknifed)

          PREDICTED ALIVE    PREDICTED DEAD    PERCENT CORRECT
Alive     21 (21)            1 (1)             95.5 (95.5)
Dead      2 (4)              12 (10)           85.7 (71.4)
                                               91.7 (86.1) overall

Table XLIII: Discriminant Analysis with Time Two Data: Female Subsample (n=44)

Step #16 Result: Classify a case as an eight-year survivor if and only if:
2.91604 (RRETO) + 3.83782 (HSIZI) + 1.37076 (BACHI) + 3.2528 (KSPNI) + 5.47257 (RRETI) + 3.22568 (RPAPI) - 6.0778 < 0.75377

CLASSIFICATION MATRICES (Jackknifed Results)

          PREDICTED ALIVE    PREDICTED DEAD    PERCENT CORRECT
Alive     30 (29)            0 (1)             100.0 (96.7)
Dead      3 (4)              11 (10)           78.6 (71.4)
                                               93.2 (88.6) overall

Step #15 Result:
1.95428 (KSPNO) + 2.82163 (RRETO) + 3.96921 (HSIZI) + 1.25064 (BACHI) + 2.99344 (KSPNI) + 5.76381 (RRETI) + 4.03092 (RPAPI) - 6.97523 < 0.75377

CLASSIFICATION MATRICES (Jackknifed)

          PREDICTED ALIVE    PREDICTED DEAD    PERCENT CORRECT
Alive     30 (29)            0 (1)             100.0 (96.7)
Dead      2 (3)              12 (11)           85.7 (78.6)
                                               95.5 (90.9) overall

original components are described in Table XLIV; two representative stepwise regression results for INTERM, using AGE, HIST, DUR and the original 30 within-groups components, appear in Table XLV; the best residual solution obtained is summarized in Table XLVI; and, finally, Table XLVII presents some of
the discriminant analysis results using the "residual" components. Comparing Tables XLV and XLVI, it can be seen that using residuals has significantly improved the value of R^2 while producing a prediction equation in which variables from all five groups are present. The discriminant analysis results, however, are somewhat disappointing - although there are some signs of improvement in the jackknifed percentages of non-survivors correctly classified.

The corresponding results for the female subsample appear in Tables XLVIII to LI. The regression solution of Table XLIX (A) is notable for the presence of only one tolerance below the 80% level (K2 at 0.732), along with a value of R^2 that compares quite favourably with those of Table XLI (B). The unappealing complexity of the residual solution (Table L) indicates the difficulty with which relatively small improvements in R^2 were achieved by this process; however, the minimum tolerance is now about 89%, indicating that an apparently more stable solution has been obtained. Finally, the reader should compare the jackknifed results in Tables XLIII and LI before deciding which discriminant function is the most desirable from a practical point of view.

5.10 Analyses using Time Five Data

The incorporation of the data collected at the five-year point in the study exacerbated most of the problems mentioned in Section 5.9

Table XLIV: Coefficients for Selected Within-groups Principal Components: Male Subsample (Coefficients x 10^4)

VARIABLE         BP1      BP2      BP4      BP7
SYSO             3531     2287     3170     -2799
DIAO             3312     1958     -2015    -5826
SYSI             -783     -6455    -1925    -599
DIAI             -478     -6271    2073     -3141
SMXSYS           4428     -791     5648     1499
SMXDIA           4294     -100     -6178    3427
SMNSYS           4471     2374     1457     4636
SMNDIA           4206     -1918    -2387    -3537
% OF VARIANCE    60.4     14.4     6.1      1.5
CORR.
(INTERM) -0.466 0.1752 -0.160 0.411 SMXSYS = (MAXSYS - 195.91742)/29.22147 SMXDIA = (MAXDIA - 129.66266)/16.25401 SMNSYS = (MINSYS - 135.84422)/29.68356 SMNDIA = (MINDIA - 87.16627)/14.02817 COMP. HSYMO HSIZO HECGO HSYMI HSIZI HECGI % CORR. H3 H6 3223 -4071 1220 -13 -7350 4061 -4232 -2951 -2316 -3628 3291 6712 17.2 3.0 0.279 0.090 COMP. BACHO BCVAO BACHI BCVAI % CORR. B4 -4185 5288 -5005 5428 2.2 -0.422 COMP. KSYMO KPROO KSPNO KSYMI KPROI KSPNI % CORR. Kl K4 380 -3899 -4067 -2005 -2443 5778 -3998 3453 -2579 4112 -7396 -4307 51.8 7.9 0.646 -0.165 Adjusted to have mean = St.Dev. = 1 COMP. RVESO RRETO RPAPO RVESI RRETI RPAPI % CORR. RI 9194 2797 496 -2273 -1326 693 51.2 -0.389 R2 -1168 7827 236 5928 -1331 632 15.5 -0.353 Table XLV: Stepwise Regression Using INTERM and Time Two Data Re-expressed as Within-groups Components: Male Subsample (A) Step #9 Result: R 2 = 65.4, ADJ'D R 2 = 59.6, MSE = 0.2832 VARIABLE COEF* ST. ERROR** F-STAT TOLERANCE BP2 -246 113 4.76 0.723 H2 -425 144 8.69 0.387 H3 516 136 14.36 0.693 Kl 340 131 6.76 0.418 DUR -68 30 4.99 0.660 INTERCEPT 4898 Step #10 Result: R 2 = = 59.9, ADJ'D R 2 = 54.7, MSE = 0.3175 VARIABLE COEF* ST.ERROR* F-STAT TOLERANCE H2 -382 151 6.38 0.394 H3 398 132 9.05 0.822 Kl 279 135 4.25 0.438 DUR -50 31 2.56 0.716 INTERCEPT 4452 * x 1000 e XLVI: Regression Equation f o r INTERM using Residual Components: Male Subsample VARIABLE COEF* ST. ERROR** P-VALUE TOLER. 
BPl -208 41 0.0 0.988 BP4 -357 133 1.2 0.953 BP7 1173 263 0.0 0.986 H3 319 104 0.5 0.917 RB4 -949 314 0.5 0.962 RK1 379 117 0.3 0.946 RK4 -727 214 0.2 0.948 INTERCEPT 4972 * x 1000 R 2 = 0.7514, MSE = 0.218 RB4 = B4 + 0.233 (BP7) + 0.098 (H3) RK1 = Kl + 0.23771 (BPl) - 1.15479 (BP7) + 0.89579 (H6) + 0.51676 (B4) + 0.11192 (Rl) + 0.37606 (R3) RK4 = K4 - 0.517632 (H6) Conversion to O r i g i n a l C o e f f i c i e n t s : VARIABLE COEFFICIENT ( x10 4) BPl -1179 BP 4 -3569 BP7 5139 H3 2261 H6 7160 B4 -7527 Kl 3790 K4 -7273 Rl 424 R3 1425 191 Table XLVII: Stepwise Discriminant Results using Residual Components: Male Subsample (A) Step #7 Result: C l a s s i f y a case as an eight-year survivor i f and only i f : 0.73905 (BP1) + 2.0026 (BP4) + 2.77368 (H2) - 3.24519 (H3) - 7.14398 (K6) - 1.4885 (RK1) + 3.66578 (RK4) - 5.19315 < 0.44731 CLASSIFICATION MATRICES (Jackknifed Results) Alive Dead PREDICTED ALIVE 21 (19) 2 (3) PREDICTED DEAD 1 12 (3) ( I D PERCENT CORRECT 95.5 85.7 (86.4) (78.6) 91.7 (83.3) o v e r a l l (B) Step #8 Result: 1.18605 (BP1) + 1.72108 (BP4) + 2.93334 (H2) - 3.76113 (H3) - 2.18996 (B3) - 8.45251 (K6) - 1.75435 (4K1) + 4.31765 (RK4) - 5.96440 < 0.44731 CLASSIFICATION MATRICES (Jackknifed) PREDICTED PREDICTED PERCENT ALIVE DEAD CORRECT Alive 21 (17) 1 (5) 95.5 (77.3) Dead 2 (2) 12 (12) 85.7 (85.7) 91.7 (80.6) NOTE: ( i ) H2 = 0.3878 (HSYMO) + 0.3452 (HSIZO) + 0.0831 (HECGO) + 0.5432 (HSYMI) + 0.2954 (HSIZI) + 0.5841 (HECGI) ( i i ) B3 = 0.5105 (BACHO) + 0.5351 (BCVAO) + 0.5536 (BACHI) + 0.3828 (BCVAI) ( i i i ) K6 = - 0.4633 (KSYMO) + 0.6874 (KPROO) - 0.1867 (KSPNO) - 0.3997 (KSYMI) + 0.2665 (KPROI) - 0.2170 (KSPNI) 192 Table XLVIII: C o e f f i c i e n t s f o r Selected Within-Groups P r i n c i p a l Components: Female Subsample (C o e f f i c i e n t s x 10 4) VARIABLE BPl BP2 BP6 SYSO 3227 -1494 701 DIAO 3698 -1603 -6042 SYSI -507 -332 -4503 DIAI -988 1064 4933 SMXSYS 4674 -3045 679 SMXDIA 4305 -4363 3157 SMNSYS 4469 4045 2471 SMNDIA 3785 7020 
-1358 % OF VARIANCE CORR. (INTERM) 56.1 -0.356 13.7 0.114 3.5 0.01' SMXSYS = (MAXSYS - 212.36165)/28.11562 SMXDIA = (MAXDIA- 130.49066)/15.55479 SMNSYS = (MINSYS - 146.42724)/20.82276 SMNDIA = (MINDIA- 91.28247)/ll.28571 Adjusted to have mean = st.dev. = 1 (B) COMP. HSYMO HSIZO HECGO HSYMI HSIZI HECGI % CORR. H3 H6 -436 -3215 985 1922 -4149 -818 -7411 -2857 -2792 8747 -4348 791 17.7 5.8 0.513 -0.083 (C) COMP. BACHO BCVAO BACHI BCVAI 1 % CORR. B4 -1974 7293 -731 6510 1 6.2 -0.516 (D) COMP. KSYMO KPROO KSPNO KSYMI KPROI KSPNI % CORR. K2 K5 -2501 1374 -2168 2097 -3330 -1392 -2424 -9016 -4276 3168 -7335 676 33.3 4.7 0.541 -0.073 COMP. RVESO RRETO RPAPO RVESI RRETI RPAPI % CORR. Rl 7117 6842 434 -1267 -649 557 49.5 -0.384 R2 -5036 4614 511 -251 -7107 -1585 26.5 0.476 R3 4711 -4694 969 2156 -5359 -4634 9.6 0.461 Table XLIX: Stepwise Regression using INTERM and Time Two Data Re-expressed as Within-groups Components: Female Subsample (A) Step #7 Result: R 2 = 80.9, ADJ'D R 2 MSE = 0.1214 = 77.2, VARIABLE COEF* ST. ERROR* F-STAT TOLER. BP6 350 126 7.73 0.852 H4 147 99 2.18 0.866 B2 72 47 2.41 0.888 K2 370 70 27.72 0.732 Rl -284 55 26.67 0.924 R2 337 78 18.48 0.848 R3 795 126 39.61 0.901 INTERCEPT 4301 (B) STEP #8 RESULT: R 2 = 82.4, ADJ'D R 2 = 78.4, MSE = 0.1151 VARIABLE COEF* ST. ERROR* F-STAT TOLER. BP6 298 126 5.55 0.802 H3 135 79 2.97 0.744 H4 174 98 3.14 0.844 B2 64 46 1.95 0.877 K2 328 73 20.25 0.647 Rl -258 55 21.72 0.860 R2 322 77 17.50 0.836 R3 737 128 33.34 0.837 INTERCEPT 4322 * x 1000 NOTE: ( i ) H4 = - 0.6899 (HSYMO) + 0.5086 (HSIZO) + 0.0327 (HECGO) + 0.3512 (HSYMI) - 0.2200 (HSIZI) - 0.3043 (HECGI) ( i i ) B2 = 0.6011 (BACHO) + 0.1398 (BCVAO) - 0.7844(BACHI) - 0.0624 (BCVAI) 194 Table L: Regression Equation for INTERM using Residual Components: Female Subsample VARIABLE COEF* ST. ERROR* P-VALUE (%) TOLER. 
RH3 452 70 0.0 0.958 RH5 -273 132 4.6 0.917 RB4 -634 101 0.0 0.979 RK2 287 67 0.0 0.967 RK5 299 169 8.4 0.955 RRl -189 61 0.4 0.894 RR2 309 81 0.1 0.927 RR3 758 132 0.0 0.938 INTERCEPT 4194 * x 1000 R = 0.8309, MSE = 0.111 RH3 = H3 - 0.02605 (BP2) - 0.15360 (BP 6) + 0.31554 (B4) RH5 = H5 - 0.05911 (BP2) + 0.12541 (BP6) + 0.27068 (B4) RB4 = B4 + 0.07347 (BP2) - 0.15577 (BP6) RK2 = K2 + 0.10439 (BP2) + 0.52799 (BP6) - 0.39256 (H3) + 0.34663 (H5) + 0.27436 (B4) RK5 = K5 + 0.03329 (BP2) - 0.03723 (BP6) + 0.06417 (H3) - 0.00534 (H5) - 0.19790 (B4) RRl = R l - 0.00757 (BP2) - 0.52560 (BP6) + 0.39960 (H3) - 0.18997 (H5) - 0.68045 (B4) - 0.09700 (K2) + 0.44300 (K5) RR2 = R2 - 0.19494 (BP2) + 0.08141 (BP6) - 0.10769 (H3) - 0.28474 (H5) + 0.16992 (B4) - 0.25100 (K2) - 0 .06700 (K5) RR3 = R3 - 0.01696 (BP2) - 0.06909 (BP6) - 0.12530 (H3) - 0.06051 (H5) + 0.15067 (B4) + 0.09800 (K2) + 0 .33400 (K5) Table L, Continued CONVERSION TO ORIGINAL COEFFICIENTS VARIABLE COEFFICIENT ( * 10 4 ) BP2 -740 BP6 2076 H3 1547 H5 -2731 B4 -2504 K2 3021 K5 4477 RI -1888 R2 3091 R3 7582 196 TABLE LI: Stepwise Discriminant Results using Residual Components: Female Subsample (A) Step #18 Result: C l a s s i f y a case as an eight-year survivor i f and only i f : - 5.29335 (RH3) + 3.72381 (RH5) - 1.39468 (RB2) + 4.94301 (RB4) - 5.50302 (RK2) + 2.62235 (RR1) - 3.28189 (RR2) - 8.54369 (RR3) - 11.66879 < 0.75377 CLASSIFICATION MATRICES (Jackknifed Results) PREDICTED PREDICTED PERCENT ALIVE DEAD CORRECT Alive 30 (28) 0 (2) 100.0 (93.3) Dead 2 (2) 12 (12) 85.7 (85.7) 95.5 (90.9) o v e r a l l (B) Step #19 Result: - 4.93639 (RH3) - 1.08236 (RB2) + 4.36477 (RB4) - 5.04904 (RK2) + 2.61749 (RR1) - 3.22651 (RR2) - 8.02713 - 11.14732 < 0.75377 CLASSIFICATION MATRICES (Jackknifed) PREDICTED PREDICTED PERCENT ALIVE DEAD CORRECT Alive 29 (29) 1 (1) 96.7 (96.7) Dead 3 (3) 11 ( I D 78.6 (78.6) 90.9 (90.9) NOTE: RH3, RH5, RB4, RK2, RR1, RR2, RR3 are as i n Table L RB2 = B2 - 0.040969 (BP2) + 0.180395 (BP6) 197 with 
respect to the time two data. When considering cases f o r whom v a l i d time f i v e data were a v a i l a b l e , the t o t a l sample siz e i s reduced to 66 (28 males and 38 females) and the t o t a l number of variables on each i s now 47 (39 coded symptoms plus eight "background" v a r i a b l e s ) . For t h i s reason, regression analyses using a l l possible independent v a r i -ables were not attempted - even for the combined sample. Rather, the time f i v e data were f i r s t examined separately, and then i n combination with the time zero data, the variables being re-expressed i n terms of time zero data and (time f i v e minus time zero) increments. The time zero data were used here i n preference to t h e i r time two counterparts because of the missing data problems among the l a t t e r v a r i a b l e s . However, data from a l l of the f i r s t three examination points of the ten-year follow-up were considered i n the repeated measures analysis of variance and i n the growth curve a n a l y s i s , discussed i n Sections 5.10.3 and 5.10.4. . 5.10.1 Time Five Symptom Variables The analyses of t h i s section involve the 13 coded symptom va r i a b l e s , as measured f i v e years a f t e r the s t a r t of the study, plus the eight "background" v a r i a b l e s , AGE, HIST, DUR, MAXSYS, MAXDIA, MINSYS, MINDIA, and SEX. The independent v a r i a b l e here w i l l be e i t h e r the "alive-dead" v a r i a b l e , STATUS, or the o r i g i n a l s u r v i v a l time v a r i a b l e , TIME; with the l a t t e r target, the survivors were weighted 0.3, since they are almost four times as numerous as the non-survivors i n t h i s sample. Considering f i r s t the combined sample of 66 five-year survivors, the following ordering of c o r r e l a t i o n s with "weighted time" was observd: 198 KSYM5 (-0.739), RRET5 (-0.733), HSYM5 (-0.729), KPR05 (-0.660), KSPN5 (-0.641), RVES5 (-0.636), HECG5 (-0.581), HSIZ5 (-0.476). 
The reader will note the absence of any blood pressure variables from the above list; in fact, among these, SYS5 was most highly correlated with weighted time, with r = -0.447. Because of the strong correlations among the predictors in this sample, the regression analysis using the original 21 variables was not very satisfactory. A better solution, using judgment components (or their residuals after regression on the weighted average of retinal symptoms, R1), is presented in Table LII (A). Of interest here are the presence of the MINSYS, MINDIA pair with opposite signs, and the importance of the retinal re-expression, R1. Parts (B) and (C) of Table LII show a similar tendency for the average of the three retinal symptoms to play a central role in the "best" predictor sets chosen by all subsets techniques. Unfortunately, the male subsample is hardly larger in size than the set of predictors that were used here, and the resulting solution should therefore be viewed with some reservation.

Table LII: Regression Results using Time Five Variables in Judgment Components

A. Combined Sample (n=66)
"Best" predictors for weighted time:
0.018 (MIN2) - 0.634 (RH1) - 0.166 (H2) - 0.450 (RK1) - 1.143 (R1) - 0.193 (RR2) + 11.060
(R^2 = 79.9, minimum tolerance = 0.82, RK1)
MIN2 = 11.0 (MINSYS) - 25.0 (MINDIA)
RH1 = [1.2 (HSYM5) + 1.0 (HSIZ5) + 0.9 (HECG5)]/3.1 - 0.650 (R1)
H2 = 2.0 (HSYM5) - 1.4 (HSIZ5) - 1.4 (HECG5)
RK1 = [1.2 (KSYM5) + 1.0 (KPRO5) + 1.3 (KSPN5)]/3.5 - 0.767 (R1)
R1 = [0.9 (RVES5) + 1.2 (RRET5) + 0.5 (RPAP5)]/2.6
RR2 = - 1.6 (RVES5) + 2.0 (RRET5) - 1.6 (RPAP5) - 0.362 (R1)

B. Male Subsample (n=28)
"Best" predictors for weighted time:
0.421 (RDIA5) - 0.909 (H2) - 0.191 (H1) + 8.647
(R^2 = 70.3, minimum tolerance = 0.931, RDIA5)
RDIA5 = DIA5 - 0.5228 (R1)
H2 = 2.0 (HSYM5) - 1.4 (HSIZ5) - 1.4 (HECG5)
R2 = [1.1 (RVES5) + 1.3 (RRET5) + 0.8 (RPAP5)]/3.2

C. Female Subsample (n=38)
"Best" predictors for weighted time:
- 0.529 (RH1) - 0.214 (RH2) - 0.454 (RK1) - 1.854 (R1) + 10.855
(R^2 = 85.6, minimum tolerance = 0.973, RK1)
RH1 = [1.1 (HSYM5) + 1.0 (HSIZ5) + 1.0 (HECG5)]/3.1 - 0.707 (R1)
RH2 = 2.0 (HSYM5) - 1.0 (HSIZ5) - 1.0 (HECG5) - 0.483 (R1)
RK1 = [1.3 (KSYM5) + 1.2 (KPRO5) + 1.3 (KSPN5)]/3.8 - 0.933 (R1)
R1 = [0.7 (RVES5) + 1.1 (RRET5) + 0.1 (RPAP5)]/1.9

A stepwise discriminant analysis of the STATUS variable was done for the combined sample only, and, once again, the data were re-expressed as the judgment components of Table LII. Using MIN2, H1, K2, and R1 (with K2 = 2.0 (KSYM5) - 1.0 (KPRO5) - 1.0 (KSPN5)), the following classification matrix was observed both with jackknifing and without:

          PREDICTED ALIVE   PREDICTED DEAD   PERCENT CORRECT
ALIVE          51                 1               98.1
DEAD            5                 9               64.3
                                                  90.9 overall

The relative ease with which survivors are correctly classified is again apparent in the above table.

One other target variable was also explored using the predictors of this section. Abbreviated SEV10, this variable represents an attempt to summarize the patient's overall condition at either the ten-year point in the study or at the time of death (assumed to be after the five-year point). Involving as it does a rather arbitrary weighting of heart, brain, kidney, and retinal symptoms, this rough "index" will not be discussed in detail here.
However, it was interesting to note, in the combined sample, that the variables having the seven highest correlations with SEV10 are precisely those that are most highly correlated with "weighted TIME". Less arbitrary indices of "final condition" were considered as well, but will not be discussed here.

5.10.2 Inclusion of Time Zero Data

Again, the dependent variables considered are STATUS and "weighted time", but the analyses were carried out only on the male and female subsamples, in view of the differences between parts (B) and (C) of Table LII. A summary of the relationship of the re-expressed predictors (numbering 33 in all) to each of the targets appears in Table LIII. It is fairly clear from these results that the blood pressure variables have been reduced to a position of secondary importance as predictors of survival, relative to changes in heart, retinal, and kidney symptoms. The main differences between the male and female results appear to be the greater importance, once again, of kidney variables in the female group, and the dominance of "increment" variables in the list of the "best" predictors for the males.

Table LIII: Importance of Time Zero and (Time Five Minus Time Zero) Variables as Predictors (by Sex)

     For STATUS (F-STAT.)                 For WEIGHTED TIME (CORR)
MALE             FEMALE              MALE               FEMALE
HSYMI (10.25)    RRETI (17.53)       HSYMI (-0.681)     RRETI (-0.820)
RVESI (9.27)     HSYMI (12.25)       RRETI (-0.647)     KSYMI (-0.708)
RRETI (8.40)     KSPNI (8.98)        HECGI (-0.643)     KSPNI (-0.624)
KPROI (8.24)     KSPNO (6.56)        KSYMI (-0.600)     KSYMO (-0.550)
KSPNI (6.00)     KPROI (6.07)        KPROI (-0.590)     KSPNO (-0.536)
HECGI (4.92)     KSYMI (5.20)        KSPNI (-0.557)     KPROI (-0.525)
KSYMI (4.16)     HSIZI (4.56)        RVESI (-0.541)     HSYMO (-0.524)
RPAPI (3.61)     HSYMO (4.18)        HSIZI (-0.498)     HSYMI (-0.497)
HSIZI (3.27)     HIST (4.12)         RPAPI (-0.480)     HECGO (-0.465)
                                     BACHO (-0.407)     RVESI (-0.437)

NOTE:
(i) "I" after a variable name indicates an increment: time 5 minus time 0 value
(ii) Sample sizes: 28 males, 38 females
(iii) Weighted time: 1.0 for non-survivors, 0.3 for survivors

Table LIV is a summary of the typical stepwise regression and discriminant analysis results obtained using the time zero and increment variables, along with the seven "background" measures. The fact that the inclusion of eight variables in the discriminant function for the male group was enough to achieve a "perfect" classification throughout can be taken as further evidence that the size of the subsample is seriously limiting its usefulness. The results for the female group appear to be considerably more consistent, with the HSYM, KPRO, and RRET variables appearing in both solutions presented in part (B) of the table.

Improvements upon the solutions presented in Table LIV were sought, using principal component re-expressions of the independent variables. As in Section 5.9 (Tables XLIV and XLVIII), principal components were calculated for each group of symptoms respectively, using time zero and increment variables.
In both the male and the female groups, the first principal component of each symptom group was essentially an average of the increment variables; this suggests that each subsample is relatively homogeneous with respect to the levels of the time zero variables, but that a variety of rates of change in these symptoms occurred over the first five years of study. This tendency will be confirmed graphically in the next subsection.

Since the most obvious problem in Table LIV (A) involved very low usable fractions among the retinal variables, these were replaced by their first three principal components, R1, R2, and R3. However, a stepwise regression analysis with these re-expressions showed that nine predictors were required before the R^2 values reached those of Table LIV (A); and, with so many predictors and only 28 cases, the inevitable result was another group of low tolerances. Since this type of analysis did not seem worthwhile in the male subsample, and since the results already obtained for the female group were quite satisfactory, these methods were abandoned in favour of those to be described in the next subsections.

Table LIV: Regression and Discriminant Analysis Results for Time Zero and (Time Five minus Time Zero) Data (by Sex)

(A) Male Subsample (n=28)

(a) Stepwise Regression Result for Weighted TIME (Final Step)
- 0.028 (MAXDIA) + 0.844 (DIAO) - 0.423 (RVESO) - 0.771 (HSYMI) - 0.388 (BCVAI) - 1.797 (RRETI) + 2.241 (RPAPI) + 12.010
(R^2 = 0.934, MINIMUM TOLERANCE = 0.11, RPAPI, RRETI)

(b) Discriminant Analysis Result Using: HECGO, BCVAO, KSPNO, HSYMI, BCVAI, RRETI, RPAPI

CLASSIFICATION MATRICES (with Jackknifing in parentheses)

          PREDICTED ALIVE   PREDICTED DEAD   PERCENT CORRECT
Alive         5 (5)            1 (1)           83.3 (83.3)
Dead          0 (1)           22 (21)         100.0 (95.5)
                                               96.4 (92.9) overall

(B) Female Subsample (n=38)

(a) Stepwise Regression Result for Weighted TIME (Final Step)
- 0.693 (HSYMO) - 0.636 (RRETO) - 0.641 (HSYMI) - 0.537 (KPROI) - 0.814 (RRETI) + 10.303
(R^2 = 0.878, MINIMUM TOLERANCE = 0.477, RRETI)

(b) Discriminant Analysis Result Using: KPROO, RRETO, HSYMI, RVESI, RRETI

CLASSIFICATION MATRICES (Jackknifed)

          PREDICTED ALIVE   PREDICTED DEAD   PERCENT CORRECT
Alive         6 (5)            2 (3)           75 (62.5)
Dead          0 (0)           30 (30)         100 (100)
                                               94.7 (92.1)

5.10.3 Repeated-Measures ANOVA

This subsection summarizes the results of the analysis of variance with repeated measures described in subsection 4.2.7. The variables and trends being explored here are easily understood from an examination of Figure 6, parts (A) to (M). A few comments are warranted at this stage:

There is a strong tendency for the average symptom, at all three time points, to be higher in the female group than in the male group; this is true respectively for both those who survived the study, and those who did not, although the tendency is stronger among the former patients; the exceptions to this tendency among the non-survivors show another sort of consistency: the average symptom at times zero and two is higher for the female group, while the male subsample shows the more serious symptom average at the five-year point; this is true of the variables KSYM, KSPN, RVES, RRET, and RPAP.

A second indication of male-female differences is derived from the observation that, for eight of the 13 symptoms, the time zero average is lower for the group of male non-survivors than for the females who survived the study period; for the BCVA symptom, this is true for all three time points, and in other cases, the two time zero averages are quite close, with the males being somewhat more severe; this is consistent with earlier results showing a greater "resistance" among the females of this sample.

The behavior of blood pressure and brain symptoms over time differs considerably from the general tendency for all four subsamples to show increasing symptom averages; for SYS, the average levels decline somewhat in all groups except the male non-survivors (where there is a slight increase); with DIA, the overall trend is to decreasing averages for all four subsamples and this is true for BACH as well; the V-shaped trend for all groups except the male non-survivors is an interesting observation concerning the BCVA variable and possibly suggests a five-year recurrence pattern of cerebrovascular accidents.

Figure 6: Average Symptom at Years 0, 2, 5, by Sex and Status. Parts: (A) SYS, (B) DIA, (C) HSYM*, (D) HSIZ*, (E) HECG, (F) BACH*, (G) BCVA, (H) KSYM*, (I) KPRO*, (J) KSPN*, (K) RVES, (L) RRET*, (M) RPAP. Horizontal axis: YEAR (0, 2, 5).
Note: f.n. = female non-survivors (n = 8); m.n. = male non-survivors (n = 6); f.s. = female survivors (n = 30); m.s. = male survivors (n = 22)
* The symptom is transformed by adding one and then taking the base 10 logarithm.

Table LV gives the P-values, as percentages, for the various tests of significance carried out by the repeated-measures ANOVA.
The columns involving the measurement-time effect, MT, are of greatest interest here, and are consistent with the more subjective analysis of the preceding paragraph. For example, with the BCVA variable, there is a highly significant quadratic effect of measurement time on symptom average. The final two columns are a type of summary for the whole table: the MT column shows that, with the exception of SYS and DIA, the symptom averages for the combined sample change significantly over time, in either a linear or quadratic fashion (or a mixture of both); the MT by STATUS interaction effect indicates whether the change over time is of the same nature in the two subsamples, survivors and non-survivors; the results here may be interpreted as saying that the evolution of symptom severity is strongly associated with survival for all but the blood pressure and brain symptoms; that is, non-survivors show more rapidly worsening symptoms.

Table LV: Results of Repeated Measures ANOVA for 13 Symptoms [Measurement Time*** (MT), STATUS, SEX]: P-VALUES (x 100)

                          STATUS   MT       MT LINEAR   MT LINEAR   MT                 MT x
EFFECTS   STATUS   SEX    x SEX    LINEAR   x STATUS    x SEX       QUADRAT.   MT     STATUS
SYS          1      10      66       37        24          53          95       64      29
DIA          7      11      49        5        65          50          52        9      13
*HSYM        0       5      89        0         0          29          69        0       0
*HSIZ        3       1      20        0        11          77           0        0       6
HECG         0      88      62        0         0          63           7        0       1
BACH        30       9      60        0        54          42           3        0      70
BCVA        90      19      38       20        95          74           0        2      99
*KSYM        0      28      52        0         2          33          46        0       1
*KPRO        0       4      48        2         0           7          14        1       0
*KSPN        0      44      58        0         1          19          30        0       0
**RVES       2      35      67        0         0           6          47        0       0
*RRET        0      38      65        0         0          42          99        0       0
**RPAP       8      15       5        3         2           3           1        1       1

* Transformed by adding 1.0 and then taking the base 10 logarithm
** Significant MT x Sex interaction observed
*** Previously referred to as "STAGE"
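The linear and quadratic measurement-time effects discussed in this subsection are trend contrasts over three unequally spaced examination times (years 0, 2, and 5). As a minimal sketch, assuming the usual orthogonal-polynomial construction (the exact coding used by the analysis program is not stated here), such contrasts can be built by centring the times and removing projections. The function name `orthogonal_contrasts` is hypothetical and is introduced only for illustration.

```python
from fractions import Fraction

def orthogonal_contrasts(times):
    """Return (linear, quadratic, b) for the given time points.

    linear    : centred times (orthogonal to the constant term)
    quadratic : t^2 with its projections on the constant and on the
                linear contrast removed, i.e. t^2 - b*t + c
    b         : coefficient of t in that quadratic contrast
    """
    t = [Fraction(x) for x in times]
    n = len(t)
    mean_t = sum(t) / n
    lin = [x - mean_t for x in t]
    sq = [x * x for x in t]
    mean_sq = sum(sq) / n
    b = sum(l * s for l, s in zip(lin, sq)) / sum(l * l for l in lin)
    quad = [s - mean_sq - b * l for s, l in zip(sq, lin)]
    return lin, quad, b

lin, quad, b = orthogonal_contrasts([0, 2, 5])
scaled_linear = [int(3 * x) for x in lin]   # linear contrast as integers
print(scaled_linear, float(b))
```

For times 0, 2, and 5 the coefficient b works out to 98/19, which is the same number as the constant 588/114 = 5.1579 quoted in Section 5.10.4; this agreement suggests, but does not prove, that a construction of this kind underlies the growth curve coefficients.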
5.10.4 Growth Curve Analysis

Estimates for the coefficients of the growth curve model described in Section 4.2.9 may be found in Table LVI; the constant terms have been omitted to save space, but a complete listing of the results appears in Appendix K. Using this table the predicted symptom at time T, for times in the interval from zero to five years, is obtained from the following equation:

\[ \text{PREDICTED SYMPTOM}(T) = \bigl[\hat\beta_0 + 7\,(b\hat\beta_2 - \hat\beta_1)\bigr] + (3\hat\beta_1 - 3b\hat\beta_2)\,T + 3\hat\beta_2 T^2 \]

where $\hat\beta_0$, $\hat\beta_1$, and $\hat\beta_2$ are the estimated constant, linear, and quadratic coefficients, respectively, and b is simply the constant, 588/114 = 5.1579. The reader's attention is drawn to the fact that, for seven of the symptoms, the predicted value of the untransformed symptom is found only after raising the base ten to a power given by the above equation, and then subtracting one from the result; these symptoms include HSYM, HSIZ, BACH, KSYM, KPRO, KSPN, and RRET.

Table LVI: Results of Growth Curve Analysis: Linear and Quadratic Coefficients ($\hat\beta_1$, $\hat\beta_2$, Times 10^4)

          FEMALE SURV.    MALE SURV.    FEMALE NON-SURV.   MALE NON-SURV.
          LIN.   QUAD.    LIN.   QUAD.    LIN.   QUAD.      LIN.   QUAD.
SYS       -175    105     -143     -5      -59    -58        102    -56
DIA       -154    106      -72    160     -237    -75       -121    -65
*HSYM       32    -14       36     28      135      7        208    -39
*HSIZ       61    -17       66    -17      115    -28        130    -67
HECG       337     64      217    110      668    165        664    -19
*BACH      -79     34      -46      6     -103      9        -71     59
BCVA      -160    196     -104    111     -191    269       -102     56
*KSYM       34      7       26     15       70     -1        144     -3
*KPRO      -23     15        2    -18       41    -41        131    -15
*KSPN       31     19       28    -16       82    -26        177     -8
RVES       101     18      187     55      402     68        879    -65
*RRET        2      1       -9     -1      170      4        124     -5
RPAP       -34    -27       12     -7       17    -19        261   -156

* Transformed by adding 1.0, then taking the base 10 logarithm

In view of the results of Section 5.10.3, a practical application of these coefficients might be the following: for a patient who has been examined at two separate times, within five years of each other, an estimate of the rate of change in symptom severity could be made (linearly) and the result compared with each of the slopes, $3\hat\beta_1 - 3b\hat\beta_2$, for the survivors and non-survivors of the same sex; a markedly higher rate of change than that of the survivor group would indicate a poor prognosis. This comparison could be carried out for the following variables (for example): HSYM, KSPN, RRET, and the consistency of the results noted.

5.11 Modelling the Hazard Function

This section summarizes the results obtained by application of the methods described in Section 4.2.9. Because of the non-linear nature of the model being used here, certain problems were encountered along the way, only some of which could be dealt with using the experience gained from previous (linear) analyses.

5.11.1 Searching for a Model

The techniques employed here are discussed in Section 4.2.9. Figures seven and eight are the truncated-Weibull plots in the male and female subsamples respectively; both plots have a shape that is reasonably close to the one predicted for a truncated-Weibull survival curve, although the upward curve (toward infinity) for the values of $t_{(j)}$ near ten is not as regular or as fast as expected.
In each case, a straight line was fitted (by hand) to the "linear" portion of the graph (near the origin) in order to estimate the Weibull parameter, $\gamma$; although this process is quite subjective, it seems unlikely that estimates above 1.1 would be obtained by other analysts. However, a more objective manner of analyzing such plots would be desirable.

Figure 7: Truncated-Weibull Plot of Follow-up Times: Males (horizontal axis: ln(t))

Figure 8: Truncated-Weibull Plot of Follow-up Times: Females

At this point, the implications of the exponential model suggested by the Weibull plots were examined. This model states that the hazard function for the i-th individual, $\lambda_i(t)$, depends on the severity of the various symptoms at the start of the study period, and not on the amount of time, t, already survived. Thus, the hazard rate is assumed constant over time, given the severity of the symptoms; this implies that the individual's "constant" hazard rate could be re-evaluated following a change in some of the symptoms, but would not increase "automatically" over the course of time. The intuitive appeal of such a model is enhanced by previous results showing the secondary importance of AGE as a predictor of survival time, relative to blood pressure and target organ symptoms. In short, more specific information about "wear out" is provided by the state of the essential components of the "system" than by the system's age alone.

The next step was the application of Breslow's method for deciding between two competing models for the constant hazard $\lambda$, namely: $\lambda = (Z\beta)^{-1}$ and $\lambda = \exp(Z\beta)$.
Single-variable stratifications were done on the basis of the following: SEV, SEX, AGE, DUR, DIAO, BACHO, principal components H1, K1, and R1, MAXSYS, MAXDIA, HSYMO, HSIZO, and HECGO. Plots of the cumulative hazard function (= -ln[S(t)]) were produced for each stratum (using BMD:P1L) and estimates for $\lambda$ were made near the origin, to discount the effects of censoring. At this point, problems arose in determining the appropriate spacing of the strata codes along the horizontal axis, given the original coded nature of most of the symptoms used here and the distribution of data in the various strata. As a result, only rather rough indications were obtained here. Figure 9 is presented here only as an unusually regular and clear example of the trends observed for the other variables; that is, the exponential model consistently led to plots that were more nearly linear than those of the "inverse" model.

Besides pointing to an adequate model, the stratifications of the last paragraph also provided a useful summary of the effects of the various symptoms on the survival curve. Only in the case of the heart symptom variables was there a clear departure from the expected ordering of heights of the intra-strata survival curves: the two "middle" curves (out of four in all) were interchanged in this case. The AGE stratification revealed a consistently higher probability of survival for patients in the "young" group, but the curves for the "middle" and "old" groups suggest, if anything, a slightly lower curve for the middle-aged group. Unfortunately, considerations of space did not allow such plots to be reproduced here.
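Under a constant-hazard (exponential) model the cumulative hazard is -ln[S(t)] = λt, so the slope of each stratum's cumulative hazard plot near the origin estimates λ. The following is a minimal numerical sketch of the same comparison, assuming the standard maximum-likelihood estimate of a constant hazard (deaths divided by total follow-up time) in place of the graphical estimates described above. It then checks whether ln(λ) or 1/λ is closer to linear in the stratum code, which corresponds to the exponential and "inverse" models respectively. The data are invented purely for illustration and are not taken from the study; `hazard_by_stratum` and `linearity` are hypothetical helper names.

```python
import math

def hazard_by_stratum(records):
    """records: iterable of (stratum_code, follow_up_years, died) tuples.
    Returns {stratum_code: deaths / total follow-up time}."""
    deaths, exposure = {}, {}
    for code, years, died in records:
        deaths[code] = deaths.get(code, 0) + (1 if died else 0)
        exposure[code] = exposure.get(code, 0.0) + years
    return {c: deaths[c] / exposure[c] for c in sorted(deaths)}

def linearity(xs, ys):
    """Squared Pearson correlation of xs and ys (1.0 means exactly linear)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

# Invented illustration data: (stratum code, years followed, died?)
records = [
    (1, 10.0, False), (1, 10.0, False), (1, 8.0, True), (1, 10.0, False),
    (2, 10.0, False), (2, 6.0, True), (2, 9.0, True), (2, 10.0, False),
    (3, 4.0, True), (3, 7.0, True), (3, 10.0, False), (3, 2.0, True),
]
lam = hazard_by_stratum(records)
codes = list(lam)
# Exponential model: ln(lambda) should be linear in the stratum code.
r2_exponential = linearity(codes, [math.log(lam[c]) for c in codes])
# "Inverse" model: 1/lambda should be linear in the stratum code.
r2_inverse = linearity(codes, [1.0 / lam[c] for c in codes])
```

With real data the spacing of the stratum codes matters, as noted in the text, so a comparison of this kind gives only a rough indication.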
5.11.2 Time Zero Data

The remainder of the results presented in this chapter were obtained by repeated use of the BMDP program, BMD:P3R, modified to find maximum likelihood estimates for $\beta$ in the model $\lambda = \exp(Z\beta)$. Various sets of independent variables, Z, were considered, largely on the basis of the results of previous regression analyses. Variables were eliminated from the model on the basis of high P-values for the likelihood-ratio test. Thus, the technique employed here was basically a step-down procedure (with the occasional forward step) with variable deletion guided by the size of the asymptotic standard error relative to the estimated coefficient of the variable. Unfortunately, the limited flexibility of the computational procedure used here did not encourage the exploration of a greater variety of independent variables; however, available results do suggest that good sets of predictors from the previous linear analyses are at least adequate when used in the present model. In support of this last remark, there was a notable absence of difficulty in achieving convergence of the iterative estimates with BMD:P3R even when large sets of variables were used.

Two sets of complete sample (n = 98) results are presented in Table LVII. Part (A) is largely a continuation of the model checking of the previous subsection, and indicates that SEV, SEX, and AGE are all useful predictors, whereas AGE squared is not. In Part (B), a small set of re-expressed predictors was checked and seen to produce coefficients and standard errors that are consistent with the standardized results of linear analyses.
Unfortunately, the tolerances in Table LVII (B) are considerably lower than they would be in an ordinary regression analysis (see Table XII (B)), since the i-th predictor in the present model is effectively exp(Zβ). Comparison of the likelihoods of the two sets of predictors, (A) and (B), suggests that better results are obtained by considering individual symptoms, rather than AGE and SEV alone. Finally, the high level of significance of the transformed SEX variable in model (B) was noted, and the decision made to pursue the remaining analyses with male and female groups separated.
Table LVIII summarizes the results of using previously-established "good" sets of predictors in each subsample. The final solutions and P-values obtained here are remarkably similar to those of Table XXI (B) for the male group, and Table XXIII (B) for the females.

Table LVII: Hazard Model Results: Combined Sample, Time Zero Data, 98 cases

(A)
VARIABLE (Z)  COEFFICIENT (β̂)*  ST. ERROR*  TOLERANCE  Z̄        s
CONSTANT      -2599              155         0.889      -        -
AGE             427              182         0.826      39.558   8.405
CSEX            342              152         0.924      -        -
CSEV           1725              271         0.768      -        -

VARIABLE DEFINITIONS:
CSEX = -1 for Female, +1 for Male
CSEV = SEV - 2
TESTS:
(1) AGE: χ²₁ = 7.05; P < 0.01
(2) AGE²: χ²₁ = 0.89; P > 0.10
(3) CSEX: χ²₁ = 5.34; P < 0.025
-ln (LIKELIHOOD) = 144.8658

(B)
VARIABLE (Z)  COEFFICIENT (β̂)*  ST. ERROR*  TOLERANCE  Z̄        s
CONSTANT      -2909              215         0.552      -        -
MAXSYS         1004              255         0.558      238.430  29.42
RMNDIA          379              183         0.737      33.531   9.159
BACHO           384              198         0.772      1.224    1.099
RKSPNO          570              187         0.727      -0.644   0.600
RRVESO          321              228         0.662      -0.620   0.963
CSEX            479              179         0.798      -        -
* x 1000

VARIABLE DEFINITIONS:
RMINDIA = MINDIA - 0.41585 (MINSYS)
RKSPNO = KSPNO - 0.00637 (MINSYS)
RRVESO = RVESO - 0.01387 (MINSYS)
TEST:
CSEX: χ²₁ = 9.626; P < 0.05
-ln (LIKELIHOOD) = 129.8709
NOTE: Variables for which the mean, Z̄, and standard deviation, s, are given were standardized before entry into the model.

Table LVIII: Hazard Model Results: Male, Female Groups, Original Time Zero Data, 48, 50 Cases

(A) MALE GROUP
VARIABLE (Z)  COEFFICIENT (β̂)*  ST. ERROR*  TOLERANCE  Z̄        s
CONSTANT      -2448              320         0.444      -        -
MAXSYS          719              218         0.462      232.5    29.558
BACHO           301              270         0.642      1.021    1.101
KSPNO           634              322         0.339      0.456    0.681
RPAPO           361              388         0.346      0.375    0.866

TESTS:
(1) AGE, MINSYS, MINDIA: χ²₃ = 2.77; P > 0.15
(2) RPAPO: χ²₁ = 2.95; P < 0.09
(3) HSIZO: χ²₁ = 0.20; P > 0.20
(4) RRETO: χ²₁ = 1.58; P > 0.10
-ln (LIKELIHOOD) = 66.3901

(B) FEMALE GROUP
VARIABLE (Z)  COEFFICIENT (β̂)*  ST. ERROR*  TOLERANCE  Z̄        s
CONSTANT      -3587              542         0.373      -        -
MAXDIA          977              475         0.404      148.72   16.697
MINSYS        -1523              708         0.235      168.18   21.285
MINDIA         1075              640         0.290      102.82   12.177
KSPNO           538              409         0.760      0.384    0.565
RRETO           487              396         0.458      0.654    1.114
HSYMO           833              438         0.367      0.740    0.778
* x 1000

TESTS:
(1) AGE, MINSYS, MINDIA: χ²₃ = 9.22; P [illegible]
(2) AGE, BACHO: χ²₂ = 3.68; P > 0.10
(3) HSYMO: χ²₁ = 9.39; P < 0.005
(4) MAXDIA: χ²₁ = 10.11; P < 0.005
(5) DIAO: χ²₁ = 8.65; P < 0.01
(6) RRETO: χ²₁ = 4.75; P < 0.04
(7) KSPNO: χ²₁ = 5.99; P < 0.025
(8) KPROO: χ²₁ = 3.62; P < 0.05
-ln (LIKELIHOOD) = 55.7023

Attempts to introduce variables of a type not already represented in the set were largely futile in both cases, reinforcing the impression that variable selection could be guided by previous regression results.
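The P-values attached to the likelihood-ratio tests in these tables can be checked by hand. The sketch below is an illustration only: it assumes the usual large-sample result that twice the difference in -ln(LIKELIHOOD) between a reduced model and the fuller model containing it is approximately chi-square distributed, and it evaluates the upper tail for 1, 2, or 3 degrees of freedom from closed forms. The function names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with 1, 2, or 3 df."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    raise ValueError("only df = 1, 2, 3 are handled in this sketch")

def likelihood_ratio_statistic(neg_loglik_reduced, neg_loglik_full):
    """Twice the drop in -ln(likelihood) when the fuller (nested) model is fitted."""
    return 2 * (neg_loglik_reduced - neg_loglik_full)

# Statistics reported for the one-variable tests in Table LVII (A).
for label, stat in [("AGE", 7.05), ("AGE squared", 0.89), ("CSEX", 5.34)]:
    print(label, round(chi2_sf(stat, 1), 4))
```

The statistic applies only to nested models; solutions (A) and (B) of Table LVII use different predictor sets, so their likelihoods can be compared informally but not tested against each other in this way.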
Tables LIX and LX are the results of efforts to improve upon the solutions of Table LVIII - particularly with regard to the tolerances. The residual solution of Table LIX, involving only the female data, is still far from ideal; in fact, some of the tolerances here are lower than in Table LVIII (B). Some improvement is apparent with the within-groups principal components of Table LX, although the tolerances are still well below those reported for the linear analyses - especially in the female subsample, where more variables are retained. A comparison of the likelihoods within each subsample indicates that a substantial improvement was obtained with principal component re-expressions only in the male subsample, although the component solutions retained the same numbers of variables as did the respective original-variable solutions. Table LX may be further compared with Tables XXIX and XXXI.

Table LIX: Hazard Model Results: Female Group, Time Zero Residuals, 50 Cases

VARIABLE (Z)  COEFFICIENT (β̂)*  ST. ERROR*  TOLERANCE  Z̄        s
CONSTANT      -3659              559         0.388      -        -
RMXDIA         1061              486         0.600      84.55    15.00
PC2            -950              556         0.452      26.12    8.63
HSYMO           709              484         0.341      0.740    0.777
RKSPNO          619              386         0.810      0.187    0.523
KC2            -304              606         0.294      -0.112   0.503
RRETO           423              470         0.356      0.654    1.114
* x 1000

VARIABLE DEFINITIONS:
PC1 = 0.40393 (MINDIA) + 0.91479 (MINSYS)
PC2 = - 0.40393 (MINSYS) + 0.91479 (MINDIA)
RMXDIA = MAXDIA - 0.31867 (PC1) - 0.07292 (PC2)
KC1 = 0.46118 (KSYMO) + 0.88731 (KPROO)
KC2 = - 0.88731 (KSYMO) + 0.46118 (KPROO)
RKSPNO = KSPNO - 0.21775 (KC1) + 0.15175 (KC2)
TESTS:
(1) PC1, KC1: χ²₂ = 0.07; P > 0.50
(2) HSYMO: χ²₁ = 6.46; P < 0.015
(3) RRETO: χ²₁ = 3.37; P < 0.08
-ln (LIKELIHOOD) = 54.9889

Table LX: Hazard Model Results: Male, Female Groups, Time Zero, Orthogonalized Components

(A) MALE GROUP
VARIABLE (Z)  COEFFICIENT (β̂)*  ST. ERROR*  TOLERANCE  Z̄        s
CONSTANT      -2447              273         0.868      -        -
BP1            1271              330         0.712      7.286    1.967
BP5             568              221         0.711      0.086    0.375
SK1             464              279         0.869      -0.566   0.654
RR1             713              333         0.626      -1.991   1.187

VARIABLE DEFINITIONS: See Tables XXV and XXIX (A)
SK1 = K1 - 0.282 (BP1) + 0.127 (BP2) - 0.358 (BP3) + 0.391 (BP4) - 0.343 (BP5) + 0.337 (BP6) - 0.450 (RR1) - 0.034 (RR2) - 0.109 (RR3)
TESTS:
(1) BP6, RR2: χ²₂ = 1.31; P > 0.15
(2) TH3, UB2: χ²₂ = 1.37; P > 0.15
(3) UB1: χ²₁ = 0.45; P > 0.20
(4) SK2: χ²₁ = 1.57; P > 0.12
(5) SK1: χ²₁ = 8.27; P < 0.01
-ln (LIKELIHOOD) = 64.6199

(B) FEMALE GROUP
VARIABLE (Z)  COEFFICIENT (β̂)*  ST. ERROR*  TOLERANCE  Z̄        s
CONSTANT      -3785              640         0.282      -        -
BP1            1416              642         0.579      7.393    1.810
BP2             539              494         0.771      -0.229   0.953
BP3            1113              527         0.563      -0.448   0.662
TH1             547              448         0.344      0.775    0.746
SK2            -846              607         0.543      0.410    0.477
RR1             557              390         0.552      -0.454   1.055
* x 1000

VARIABLE DEFINITIONS: See Table XXVI
RR1 = R1 - 0.36547 (BP1) - 0.30677 (BP2) + 0.33840 (BP3) + 0.65297 (BP4) - 0.36686 (BP5) + 0.41574 (BP6)
SK2 = K2 + 0.032 (BP1) - 0.036 (BP2) - 0.064 (BP3) + 0.244 (BP4) - 0.218 (BP5) - 0.219 (BP6) + 0.142 (RR1) - 0.083 (RR2) + 0.231 (RR3)
TH1 = RH1 - 0.341 (RR1) + 0.543 (RR2) - 0.147 (RR3) - 0.849 (SK1) + 0.872 (SK2) + 0.823 (SK3)*
TESTS:
(1) HIST, BP6: χ²₂ = 0.40; P > 0.20
(2) BP5, B1: χ²₂ = 1.73; P > 0.20
(3) BP2: χ²₁ = 3.77; P = 0.05
(4) H1: χ²₁ = 5.72; P < 0.02
-ln (LIKELIHOOD) = 55.0023
*Complete details of the more complex re-expressions are available, but too lengthy to present here.

5.11.3 Time Two Data
The final sequence of analyses incorporates data from year two of the study into the hazard model. Since a constant hazard rate (with respect to time) is once again assumed, the only modification at this stage involves the replacement of the original TIME variable by TIME - 2. As before, the time two variables are replaced by the increments, time two minus time zero value. Results involving the time zero and increment variables appear in Table LXI, which may be compared with Table XLI.

Table LXI: Hazard Model Results: Male, Female Groups, Time Zero, Time Two Increment Data

(A) MALE GROUP (n=36)
VARIABLE (Z)  COEFFICIENT (β̂)*  ST. ERROR*  TOLERANCE  Z̄        s
CONSTANT      -3459              547         0.465      -        -
AGE            1007             1448         0.145      38.539   9.218
SYSO            790              643         0.465      2.583    0.770
KSPNO           498              755         0.183      0.303    0.521
HSYMI          1231              830         0.355      0.258    0.835

TESTS:
(1) BACHO, RPAPI: χ²₂ = 1.98; P > 0.20
(2) AGE: χ²₁ = 5.81; P < 0.02
-ln (LIKELIHOOD) = 30.5653

(B) FEMALE GROUP (n=44)
VARIABLE (Z)  COEFFICIENT (β̂)*  ST. ERROR*  TOLERANCE  Z̄        s
CONSTANT      -3670              584         0.466      -        -
HSYMO           943              481         0.831      0.659    0.713
HSIZI           625              440         0.379      0.405    0.489
HECGI           736              399         0.913      0.211    0.735
KSPNI           478              422         0.703      0.175    0.703
RPAPI           987             1504         0.659      0.027    0.405
* x 1000

TESTS:
(1) DIAO: χ²₁ = 0.70; P > 0.20
(2) MINSYS: χ²₁ = 0.53; P > 0.20
(3) RRETO: χ²₁ = 0.54; P > 0.20
(4) RRETI: χ²₁ = 1.42; P > 0.10
(5) KSPNI: χ²₁ = 4.05; P < 0.05
(6) RPAPI: χ²₁ = 10.89; P < 0.005
-ln (LIKELIHOOD) = 37.0522

Besides the usual problems with low tolerances, the reader will notice the retention of AGE in the male group's set, and the presence of only five predictors in the solution reported for the female subsample.
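As a reminder of what the tolerance column measures in an ordinary regression setting, the following sketch computes the tolerance of one predictor as 1 - R², where R² comes from regressing that predictor on all the others. This is the textbook definition and is offered only as an assumption about the general idea; the tolerances printed by BMD:P3R for the hazard model are evidently computed on a different footing, which is why they fall below the linear-analysis values. The function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def tolerance(columns, j):
    """Tolerance of predictor j: residual over total sum of squares when
    column j is regressed (with a constant) on the remaining columns."""
    n = len(columns[0])
    centred = [[v - sum(col) / n for v in col] for col in columns]
    y = centred[j]
    others = [c for k, c in enumerate(centred) if k != j]
    xtx = [[sum(u * v for u, v in zip(p, q)) for q in others] for p in others]
    xty = [sum(u * v for u, v in zip(p, y)) for p in others]
    coef = solve(xtx, xty)
    resid = [y[i] - sum(c * col[i] for c, col in zip(coef, others)) for i in range(n)]
    return sum(r * r for r in resid) / sum(v * v for v in y)
```

A fully orthogonal set of predictors gives a tolerance of 1 (that is, 100%) for every variable, while a predictor that shares half its variance with another gives 0.5. The sketch fails with a division error if the remaining predictors are exactly collinear.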
Of course, it should be remembered that, among the 2^33 - 1 possible different subsets of predictors that might have been tried here, only a very small fraction could be tested using the available "manual" backward-stepping procedure.
The last results produced here appear in summary form in Table LXII. The principal components and residuals used there are those of Tables XLVI and XLIX respectively, and the present sets of predictors may be compared to the ones in those tables. The presence of a second blood pressure component, BP2, in the hazard model set for the female group is notable here, as is the futility of attempts to improve tolerances by re-expressing a single pair of variables. In fact, the previous analyses suggest that only when a set of predictors is replaced by a completely orthogonalized set - that is, tolerances of 100% everywhere - will the hazard model tolerances improve appreciably.
These results represent the culmination of present efforts to produce an objective alternative to the SEV criterion as a means of identifying patients having a similar prognosis. Summaries and conclusions of a broader nature are left to the next, and final, chapter.

Table LXII: Hazard Model Results: Male, Female Groups, Orthogonalized Time Zero, Time Two Increments

(A) MALE GROUP (n=36)
VARIABLE (Z)  COEFFICIENT (β̂)*  ST. ERROR*  TOLERANCE  Z̄        s
CONSTANT      -3381              920         0.243      -        -
BP1            1268              789         0.454      3.638    1.921
BP4            1018              717         0.454      0.049    0.610
BP7           -1099              836         0.414      -1.779   0.303
H3             -970              740         0.193      -0.908   0.794
RB4            -953              914         0.570      -0.555   0.257
RK1            -729              586         0.391      2.574    0.696
RK4             637              509         0.546      -0.089   0.379
* x 1000

VARIABLE DEFINITIONS: (See also Table XLIV)
RK1 = K1 + 0.2377 (BP1) - 1.1548 (BP7) + 0.8958 (H6) + 0.5168 (B4) + 0.1119 (R1) + 0.3761 (R3)
RK4 = K4 - 0.5176 (H6)
RB4 = B4 + 0.233 (BP7) + 0.098 (H3)
TESTS:
RB4: χ²₁ = 8.98; P < 0.01
BP4: χ²₁ = 7.87; P < 0.01
RB4, BP4: χ²₂ = 11.50; P < 0.01
-ln (LIKELIHOOD) = 29.5741

(B) FEMALE GROUP (n=44)
VARIABLE (Z)  COEFFICIENT (β̂)*  ST. ERROR*  TOLERANCE  Z̄        s
CONSTANT      -4726             2107         0.077      -        -
BP2            -518             1203         0.610      -0.585   0.901
BP6            1327             1616         0.099      -1.189   0.457
H4             -321              734         0.777      0.003    0.575
B2            -1321             1327         0.262      1.267    1.210
K2            -2064             1552         0.058      -0.470   0.883
R1              834              956         0.159      1.407    1.006
R3            -2004             1744         0.115      0.500    0.443
* x 1000

VARIABLE DEFINITIONS: See Tables XLVIII, XLIX
TESTS:
(1) H5, R2: χ²₂ = 0.57; P > 0.20
(2) BP6: χ²₁ = 7.51; P < 0.01
(3) H3: χ²₁ = 0.10; P > 0.20
(4) K2: χ²₁ = 12.53; P < 0.005
-ln (LIKELIHOOD) = 30.5989
NOTE: Substitution of RBP6 = BP6 + 0.12644 (K2) for BP6 changed the tolerances to 0.115 (RBP6) and 0.087 (K2). Coefficients and standard errors also changed little.

Chapter 6
CONCLUSIONS
6.1 Statistical Methodology
The goal of this section is to review and evaluate the various techniques that have been used here to explore and analyze the available data.
The techniques employed include: elementary description, stratification, correlation analysis (including multiple and partial correlation), multiple regression (with various methods of variable selection), linear discriminant analysis, re-expression of the independent variables (using the MINVAR criterion, or principal components, or judgment components), non-linear methods (quadratic discriminant, logistic regression, hazard function model), nonparametric methods (for survival curve estimates), and "repeated measures" methods (analysis of variance and the growth curve model). Throughout the analysis, the use of these tools was initiated and guided by the desire to approach the data from more than one point of view in hopes of avoiding some of the traps that arise from the peculiarities of either the data or the analyst.
Ordinary multiple regression was by far the most useful method for exploring the relationship between symptoms and survival; as noted earlier, the sample size and number of independent variables in the set precluded the full use of multi-way stratification - a technique that enjoys a more direct intuitive appeal than multiple regression. It must be noted that the appeal of ordinary multiple regression was somewhat diminished in this case due to the presence of a large number of strongly inter-related predictors - which created the usual problems with interpretation of the estimated regression coefficients. A combination of variable selection and variable re-expression techniques was used to try to solve this problem.
Among the former, the "all subsets" technique with Mallows' criterion seemed to provide the best picture of the "redundancy" problem at a (computing) cost that compares favourably with that of the standard stepwise methods; of course, when so many different subsets of the predictors are explored in this way, the analyst must be on guard for the eventuality of a spurious solution. In general, variable re-expressions offered further insights, especially when solutions of a "global" nature were abandoned in favour of re-expressions within a group of similar variables: for example, principal components within each group of symptoms (blood pressure, heart, brain, etc.) - or "judgment" components of the same nature - or the discovery of re-expressions with more direct physical interpretations (like pulse pressures and ranges). It must be said, however, that, illuminating as they often were, no single technique or combination thereof was fully satisfactory as a means of understanding the (apparently) complex relationships among the many variables of this dataset.
Turning now to some of the other techniques that were employed, it was seen that ordinary linear discriminant analysis performed well enough with the given data that alternatives such as logistic regression and the quadratic discriminant did not seem worth the extra computational effort. On the other hand, normalizing transformations and adjustment of the constant (to either minimize or equate the misclassification probabilities) seem worthy of consideration (at least) as sequels to a careful linear discriminant analysis.
The analysis of variance with repeated measures and the growth curve analysis seemed to perform well with the given data, insofar as they agreed closely both with one's intuitive impressions of the graphical results and with each other. With respect to the growth curve model, this is one instance where the extra computational effort required appears to have been amply rewarded.
Although hazard function models are generally used to adjust for the effects of uncontrolled variables when comparing survival curves, this basic method also proved useful for the present analysis, where comparison was not the main concern. The general model is particularly well-suited to the type of censoring encountered in the present data and, assuming the acceptability of the exponential model adopted, it provides an estimate for the probability of surviving for various periods of time (less than 10 years) given the severity of symptoms. It is of interest that variables found to be important predictors in earlier, linear analyses also showed a strong tendency to be retained in the non-linear hazard model solutions.
Among the various problems encountered in using the relatively new hazard function methodology, three appear to be of a sufficiently general nature to warrant inclusion here:
(1) improved computational techniques, perhaps involving a stepwise algorithm, would facilitate a more thorough exploration of the given set of predictors;
(2) only rather rough checks of the adequacy of the specific model under consideration seem practical at this point - especially when many independent variables are available;
(3) attempts to improve the stability of the estimated coefficients via re-expression of the predictors are largely unrewarding.
The combined effect of these three problems on the quality (and, hence, the usefulness) of the final estimate is potentially serious and definitely worthy of concern and further study.
6.2 The Medical Issues
The medical literature on hypertension is strewn with phrases like: "...it is not known if..."; "...it seems quite clear that..."; "...there appears to be..."; "...it is likely that..."; and so on. The indefinite and/or ill-defined state of many important issues concerning this disorder makes it a difficult task to establish a set of principles as a basis for discussion of the issues. Articles of an authoritative nature - medical textbooks, papers by such authors as Freis, Evelyn, and Chasis, etc. - were taken as a source of "commonly-held and well-reasoned opinions" on the subject. Some of these opinions will be referred to explicitly in the following subsections.
6.2.1 The Original Problems
This subsection discusses the problems that were proposed in chapters 1 and 4 as the initial framework for the analysis.
(1) The first problem was to identify the symptoms having greatest prognostic significance and to work out the quantitative relationships involved. The results relative to this problem were found to depend on the sample used (combined, males alone, females alone) and, to a lesser extent, on the choice of re-expression of the survival time variable. With respect to this last point, it was noted that some important differences exist between the set of (regression) predictors chosen for the "alive-dead" re-expression (STATUS) and those chosen for re-expressions that use sub-intervals of the given ten-year follow-up period (INTERV and INTERM). The fact that these differences are far less pronounced in the correlation table (see Table VII) is significant and may be explained by the tendency of a symptom with high predictive power (as measured by its correlation with the dependent variable) to be left out (or "partialed out") of a regression equation that is saturated with other good predictors, acting (individually or together) as a "substitute" for the omitted symptom. The same reasoning explains the presence in the prediction equation of relatively weak predictors: they are carrying information that is largely independent of that supplied by the other selected symptoms. The difference between the set of variables showing the strongest correlations with survival and the final set of predictors retained in the regression equation often provides a rough guide as to the strength of the relationships among the various symptoms themselves. These points will now be amplified.
The correlation tables for the time zero data clearly point to the retinal symptoms - the state of the retinal vessels (RVES) and degree of retinopathy (RRET), to be precise - as having the strongest association with survival time. This brings to mind Gifford's comment, as recorded in The Hypertension Handbook: "The appearance of the retina is a more important and more reliable index to the severity and prognosis of hypertension than are casual blood pressure readings." [44, p.71]
In the full sample of 98 controls, the retinal variables seem to be followed in importance by (in descending order): the maximum and average diastolic variables (MAXDIA and DIAO), the renal function variable (KSPNO) and systolic variables; heart and brain symptoms are found near the bottom of the list. This trend is echoed in the correlation tables using within-groups principal components, whether each sex is considered separately or not; in each case, the most highly correlated components are weighted averages (that is, first components) of retinal, blood pressure, and kidney symptoms, in descending order. The patient's age (AGE) was repeatedly seen to be inferior as a predictor of survival time relative to variables reflecting the state of the target organs.
The presence of extreme blood pressures (six-month maxima and minima) among the best predictors of survival is interesting in view of their associated measurement problems: for example, Dr. K.A. Evelyn has stated that the maximum reading is likely to be the first one taken during the initial examination period.
Although much attention has been given in the literature to the significance of casual blood pressure readings relative to resting measurements, few authors besides Evelyn have given much thought to the importance of casual "peaks" in pressure, or to the minimum pressure the patient is able to achieve. Researchers currently exploring the prognostic significance of "spiking" (occasional high readings) in youth might shed some more light on this phenomenon by extending their study into higher age groups.
Turning now to the lingering debate over the relative importance of systolic versus diastolic pressure in prognosis, the position taken by Peart is worth recording here: "Too much has been made of the greater importance of diastolic hypertension in morbidity. It is highly likely that rise in the mean pressure [one-third of the systolic plus two-thirds of the diastolic] is of greatest importance whatever way it is produced." [41, p.981]. If the highly significant systolic-diastolic correlations observed for the present sample are typical of a broader spectrum of hypertensives (or even normotensives), then the fact that Peart (and others) should arrive at such a conclusion is quite understandable. Perhaps the point that should be debated now is whether the height of the diastolic pressure is more directly related to prognosis than that of the systolic. Statistical analyses can do no more than offer clues to such problems. In the present case, the data appear to be pointing towards a sex effect: systolic pressures seem to have greater prognostic significance than the diastolic readings for the males while the reverse is true for the females.
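The mean pressure in Peart's remark, one-third of the systolic plus two-thirds of the diastolic, is easy to state as a formula. The short sketch below simply evaluates it, together with the pulse pressure (systolic minus diastolic); the function names and the example readings are hypothetical and are not taken from the study data.

```python
def mean_pressure(systolic, diastolic):
    """Mean pressure as quoted: one-third systolic plus two-thirds diastolic."""
    return (systolic + 2 * diastolic) / 3

def pulse_pressure(systolic, diastolic):
    """Pulse pressure: systolic minus diastolic."""
    return systolic - diastolic

# Hypothetical reading of 180/120 mm Hg.
print(mean_pressure(180, 120), pulse_pressure(180, 120))
```

Written this way, the mean pressure is the diastolic reading plus one-third of the pulse pressure, so two patients can share the same mean pressure while differing in which component is raised.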
If this were the case in general, it is quite easy to see how the combined-sample (both sexes) results of a correlation study might lead to a statement like Peart's above.
As far as the variables available in the study could be used to assess the variability of a patient's blood pressure, it would appear from the present data that blood pressure variability is of only minor prognostic significance after adjusting for the well-known effects of blood pressure level (that is, variability tends to increase as the average level of the pressure increases). However, it is recognized that the existence of a better measure of variability (like the standard deviation) might have led to a different conclusion here.
Turning now to the time zero regression results, the reader should not be surprised to find only a brief review of the major results here, given the variety of solutions that were produced. In all the samples considered, extreme blood pressure, kidney, and retinal variables were present in most of the final prediction equations; in both the male and the combined sample, the headache variable (BACHO) was retained as well; in the female sample, headache symptoms are replaced by heart symptoms (HSYMO) and, as in the combined sample, the minimum blood pressure variables (MINSYS and MINDIA) appear with opposite signs. Attempts to judge the order of relative importance of the predictors in any of the original equations are stymied by the degree of inter-relatedness of the predictors (as alluded to at the beginning of this subsection and in the statistical section) which creates a situation in which "importance" is shared among more than one predictor.
Many sorts of re-expressions of the original variables (weighted averages, differences, residuals) were used to provide solutions where "relative importance" problems are easier to resolve, but then the predictors themselves became harder to interpret, in general. Nevertheless, it seems reasonable to retain at least some faith in the predictive power of the resulting equations, although, ultimately, only experience with independent sets of data can show how well-founded that faith is.
The appearance of heart variables and especially a brain variable in the regression equations illustrates the tendency for symptoms of secondary prognostic importance (as measured by the correlation coefficient) to play a role in prediction as carriers of relatively "independent" pieces of information. It is substantially more difficult to explain the presence of the two minimum blood pressures (MINSYS and MINDIA) in the regression solutions obtained for both the combined and the female samples. It is tempting to conclude that the opposite signs for the regression coefficients (MINSYS has a positive coefficient, but a significant negative correlation with survival time) suggest a role for a type of "minimum pulse pressure" (PPMIN = MINSYS - MINDIA) in prognosis. Indeed, the female data reveal a weak positive association between such recombinations of the minimum pressures and survival time; and the behaviour of these two pressures was basically unchanged throughout a great variety of data re-expressions, samples, and analytic techniques (both linear and non-linear).
Moreover, a partial correlation analysis revealed that, after taking into account the patient's age and average pressures (AGE, SYSO, DIAO), a significant positive correlation between minimum pulse pressure and survival time emerged for the female sample. It appears then that this effect is not easily explained away by statistical considerations of multicollinearity - that is, MINSYS (for example) being too strongly related to other predictors selected for the regression; perhaps this difference of minimum pressures really does have some practical medical significance - such as a reflection of the patient's basal pulse pressure, for example.
Turning briefly to results involving data collected at the two- and five-year points of the study, the reader is again reminded of the problems with missing data, selected samples, and decreasing sample sizes that were experienced at these stages of the analysis. Such difficulties notwithstanding, though, certain results seem fairly consistent here: retinal symptoms (or weighted averages thereof) still seem to be the best individual predictors of survival, but the blood pressure variables are now reduced to a role of secondary importance relative to target organ symptoms. Heart symptoms, on the other hand, seem to increase in importance as time passes. It is interesting to recall here that the members of the sample being studied were generally severely hypertensive, but had no serious heart (congestive heart failure, myocardial infarction), brain (cerebrovascular accident), or kidney symptoms at the study's beginning; they obviously did not remain free of such complications.
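The partial correlation analysis referred to earlier in this passage can be sketched as follows. This is a minimal illustration assuming the standard recursive formula for partial correlation, applied to a derived minimum pulse pressure PPMIN = MINSYS - MINDIA; it is not a reconstruction of the program actually used, and the function names are hypothetical.

```python
import math

def corr(x, y):
    """Ordinary product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, controls):
    """Correlation of x and y after removing the linear effect of every
    variable in controls, using the recursive first-order formula."""
    if not controls:
        return corr(x, y)
    z, rest = controls[0], controls[1:]
    rxy = partial_corr(x, y, rest)
    rxz = partial_corr(x, z, rest)
    ryz = partial_corr(y, z, rest)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def minimum_pulse_pressure(minsys, mindia):
    """PPMIN = MINSYS - MINDIA, computed patient by patient."""
    return [s - d for s, d in zip(minsys, mindia)]
```

Given per-patient lists, a call such as `partial_corr(minimum_pulse_pressure(minsys, mindia), survival_time, [age, syso, diao])` would give the quantity described in the text; the list names in that call are placeholders for data not reproduced here.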
In summary then, the results of the present analysis seem to agree with an opinion expressed by Peart, Chasis, and several others, who feel that the severity of the cardiovascular complications (that often co-exist with changes considered to be more directly related to hypertension) can develop greater prognostic significance than the height of the blood pressure itself; as the patient's history of hypertension lengthens, this effect appears to become more pronounced in the present sample.
(2) The second of the original goals proposed for the present analysis was to examine the inter-relations among the various symptoms that were recorded. The strength of these relations has already been alluded to, and is further illustrated by the observation that, in the combined sample, about one third of the complete set of 22 time-zero predictors were able to achieve 94% of the R² value observed when all 22 variables were present; in other words, there is a great deal of redundancy - of the implicit kind - among the various symptoms available. Correlation tables and charts (such as those of the appendix) are helpful in understanding the nature of the redundancy. Not surprisingly, the blood pressure variables are strongly related among themselves, and, in general, symptoms of the same type are highly correlated with one another. Of greater interest is the central role played by retinal symptoms (or components), particularly in relation to the kidney and blood pressure variables; in fact, all of the original renal variables seem to be at least as strongly associated with the retinal variables (especially retinopathy and the state of the retinal vessels) as they are among themselves.
(To be more precise, the first component of the kidney group has a correlation of around 0.7 with the corresponding retinal component - whether sexes are combined or not.) On the other hand, brain symptoms generally showed the weakest links to the other symptom groups, while heart symptoms fell somewhere in the middle, with respect to inter-group correlations. Peart [41, p.986] has also noted the absence of a strong association between blood pressure level and hypertensive headache (abbreviated BACH).
The strength of the inter-relations observed here might be interpreted as support for the view that "hypertension" is actually a syndrome, for which a somewhat more informative name is "hypertensive vascular disease". Moreover, while there exists much evidence suggesting that the abnormally high level of arterial pressure is central in initiating certain organic changes, the results of the present analysis suggest that retinal changes are deserving of considerable attention as well, insofar as they appear to be a rather consistent reflection of the progress of the disease in other parts of the body - especially the kidneys.
(3) The prognostic significance of changes in symptom severity over time will now be discussed in the light of the present results. Re-expression of the follow-up data, collected at the two- or five-year point of the study, in terms of "increments" (change in symptom grade since time zero) provided detailed regression results incorporating the concept of rate of change of symptom severity.
Considering the problem from the point of view of a single variable at a time, the graphical analysis of individual symptom averages at year zero, two, and five of the study proved interesting; the follow-up analyses of this descriptive stage suggested that the changes bearing the greatest prognostic significance are, in (roughly) descending order: heart symptoms (HSYM)*, proteinuria (KPRO)*, kidney function (KSPN)*, retinal vessels (RVES), retinopathy (RRET), electrocardiographic abnormalities (HECG), kidney symptoms (KSYM)* and papilledema (RPAP). The blood pressure averages for the various sub-samples considered were remarkably stable over time, even though the 0 to 4 grading system was not yet "overtaxed" in any of the samples. This last observation is consistent with the previously-reported tendency for blood pressure symptoms to diminish in prognostic significance over time relative to target organ symptoms. As for brain symptoms, average headache grades tended to decline over time in all the subsamples, while the "other" brain symptoms (cerebrovascular accidents, etc.) showed a similar consistency with respect to a different pattern: a V-shaped graph suggesting (rather surprisingly) a five-year recurrence pattern in these symptoms; the reader is once again reminded of the rather peculiar nature of this last variable (BCVA) - especially its mandate to record the incidence of events, rather than any long-lasting effects they might have had.

*: Indicates that the symptom was transformed by taking logarithms.

(4) The final original goal was the creation of a severity index for the disorder that would be "comprehensive" - in the sense that the patient's various symptoms would be given appropriate weights within a single formula. The ultimate "source" of the weights was, of course, the association between symptoms and survival; this implies a definition of "severity" in which length of life assumes a dominant role. Although earlier regression and discriminant analysis results do, in fact, provide a variety of severity index formulae of the desired type, it was felt that the hazard function method was capable of yielding extra detail - at the same time as it solved the problem of censored data (that is, 52 patients survived the study period). The last section of Chapter 5 gives the details of the formulae for each sex: once the appropriate numerical value for each of the chosen symptoms has been determined, the equation can be used to estimate the value of the parameter, $\lambda$, in the formula, $S(t) = \exp(-\lambda t)$, which gives the probability that the patient will survive the next $t$ years; the formula should not be used for values of $t$ much beyond 10 years, which is the period of time covered by the study. The reader is reminded that the hazard function is only "pseudo-constant": even though it does not depend directly on time, changes in the patient's symptoms will generally be reflected in a re-evaluation of $\lambda$, and a whole new survival curve for the patient. In comparative studies, patients could be grouped according to estimated hazard rate at the outset of a follow-up period and the surviving patient's final $\lambda$ value could be used as a summary of his or her condition at the study's end.
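The survival formula $S(t) = \exp(-\lambda t)$ is easy to evaluate once a hazard rate is in hand. The sketch below is a minimal illustration, not the fitted model: the hazard values are made up, and the function names are hypothetical. It shows how a re-evaluated $\lambda$ produces a whole new curve over the ten-year range covered by the study.

```python
import math
from typing import List


def survival_probability(hazard: float, t: float) -> float:
    """S(t) = exp(-hazard * t): chance of surviving the next t years
    when the hazard rate is treated as constant."""
    if hazard < 0:
        raise ValueError("hazard rate must be non-negative")
    if t < 0:
        raise ValueError("time must be non-negative")
    return math.exp(-hazard * t)


def survival_curve(hazard: float, horizon_years: int = 10) -> List[float]:
    """Survival probabilities at t = 0, 1, ..., horizon_years.

    The default horizon of 10 years mirrors the study period; the formula
    should not be trusted much beyond it.
    """
    return [survival_probability(hazard, t) for t in range(horizon_years + 1)]


if __name__ == "__main__":
    # Two hypothetical hazard rates, e.g. before and after a change in symptoms.
    for hazard in (0.05, 0.20):
        curve = survival_curve(hazard)
        print(f"lambda = {hazard:.2f}: S(5) = {curve[5]:.3f}, S(10) = {curve[10]:.3f}")
```

Because the hazard is only "pseudo-constant", the curve would be recomputed whenever a patient's symptom grades, and hence the estimated hazard, change.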
A final comment on the hazard function results is a cautionary one: although the nature of the model for $\lambda$ guarantees a positive estimate for this parameter, no matter what values are substituted for the symptom grades, this flexibility should not be abused; that is, the rough checks of the adequacy of the model that were carried out here are valid only for the type of data represented by the present sample, and other "kinds" of hypertensives (very mild ones, for example) may fit into a different model. Moreover, as has just been suggested, the appropriateness of the model adopted for the available data is by no means completely established itself, and further data and/or experience could eventually lead to refinements.

6.2.2 Additional Problems

To be discussed in this subsection are some of the more significant problems or discoveries that came to light in the course of the analysis.

(1) The fact that many of the conclusions previously drawn in this chapter are sex-specific has already suggested the importance of male-female differences with respect to the course of hypertensive disease. Apparent differences between the sexes in the relative prognostic significance of systolic and diastolic pressures have already been discussed (subsection 6.2.1). A review of the regression results reveals that, in general, more predictors were retained in the equations for the female group than for the males, and the quality of prediction appears to be superior for the former patients.
More than one series of analyses carried out here has suggested that female hypertensives are, as a group, capable of enduring more severe symptoms than are males; this statement is largely based on the observed differences in both symptom average grades (consistently higher for females) and mortality rates (lower for females). Finally, there is some evidence suggesting that women are capable of achieving lower systolic pressures than are men having the same average levels of arterial pressure. Such observations are of potential interest to researchers trying to uncover the role of hormones in the development of vascular diseases in general; animal experiments have pointed to an association between the male hormone, testosterone, and increased sclerotic degeneration of the arteries, and systolic pressure is widely believed to increase with greater "hardening" of the arteries. (Here again, a better understanding of the significance of minimum blood pressures for the course of the disease would be helpful.)

(2) A second criterion that was used to create subsamples of the complete set of 98 cases is the presence of papilledema (RPAP), which was taken to be a strong indication that the hypertension had progressed to the accelerated (or malignant) stage in the patients under study here. The data suggested that accelerated hypertension could reasonably be regarded as the "severe end" of the hypertension spectrum, rather than a separate disease entity, and for that reason, only a few analyses were carried out with accelerated cases excluded. In addition, the available data appear to support widely-held beliefs concerning the role of age and blood pressure level (particularly the diastolic pressures) in the development of this form of the disorder.
Finally, careful analysis of the role of blood pressure variability (discussed earlier) indicated that, in cases showing papilledema, there is a tendency for the range of pressures to be small, considering the high average level of the pressure; in other words, patients with symptoms suggestive of accelerated hypertension appear to have not only relatively high levels of arterial pressure, but little relief from these pressures, as well.

(3) An area of research that is currently very active (see [43]) involves a study of the role of elevated arterial pressure within the overall picture of cardiovascular disease in general. In order of decreasing certainty, it is believed that high blood pressure plays a role (at least) in the hypertrophy and eventual failure of the left ventricle in the heart, arteriolosclerosis, and atherosclerosis. The polemic that still exists here is illustrated by the following quotations from Chasis:

"...the events that complicate or terminate the [hypertensive] disease are vascular in origin and the factors that affect the progress of vascular disease other than the level of systemic arterial pressure are multiple and complex. The primary event in the vascular wall that initiates arterial disease is unknown and factors such as diet, genetics, environment and hormones have been suggested as playing a pathogenetic role." [7, p.8]

... and from Freis:

"...the burden of evidence today from both animal experiments and clinical studies is that it is the hypertension which produces the vascular disease."
[22, p.10]

The final resolution of this controversy is likely to have important consequences for the popularity of treatment regimes that concentrate on reducing the blood pressure level; indeed, any treatment that limits itself to the control of the leading, or most visible, symptom of the disorder, but which leaves the underlying disease process free to pursue its course is completely unacceptable. At present, general statements seem to be premature, considering the variety and complexity of vascular diseases; that is, while recent results support Freis' claims with respect to some types of vascular disease, other types - especially those involving atherosclerosis - are still shrouded in uncertainty. It is thus regrettable that the variables available in the present study can not, by their nature, shed more light on the role of blood pressure in initiating and/or worsening atherosclerotic complications; it must also be remembered that there is considerable variability in the duration of hypertension among these patients, and that only a rough idea of this duration is available in many cases. As an example of what can be gathered from the present data, the heart symptom (HSYM) variable will be considered here (see section 2 of Chapter 3 for the definition of this "two-pronged" variable).
The correlation tables of the appendix reveal that heart symptoms at the two-year point of the study are most strongly associated (in the complete sample) with: time-zero heart symptoms (correlation of 0.60), maximum systolic (MAXSYS) and average systolic (SYS0) pressures (both around 0.40), diastolic (DIA0) pressure (0.39), proteinuria (KPRO0, r = 0.34), heart size (HSIZ0, r = 0.32), electrocardiographic abnormalities (HECG0) and maximum diastolic (MAXDIA) pressure (0.31 each). From the point of view of change in heart symptoms over the first two years of the study (HSYM2 minus HSYM0), the highest correlation with a non-heart variable is 0.17 with maximum systolic pressure (MAXSYS), followed by 0.12 with maximum diastolic pressure (MAXDIA), and 0.11 for both average diastolic (DIA0) pressure and proteinuria (KPRO0). Unfortunately, these rather tenuous suggestions of the importance of blood pressure in "producing"¹ atherosclerotic complications are too easily dismissed (for the reasons outlined above) to warrant further discussion here, and were not pursued in the main analysis.

¹ A statistical analysis of correlations, of course, can never establish a cause and effect relationship.

6.3 Concluding Remarks

In addition to the support they offer for a variety of "established" medical opinions on the disorder, the foregoing analyses of the present data have also provided new insights into the progression and prognosis of hypertensive disease in that segment of the population represented by the available sample. Three results stand out as being particularly consistent, and potentially valuable in understanding and treating this disease.
First, it seems clear that male hypertensives react to their disease in ways that differ substantially from those of female hypertensives. Second, the associations of retinal symptoms - which are directly and easily observable - with both survival time and target organ symptoms (especially those of the kidney) appear to be strong enough to warrant a more consistent exploitation by the medical profession. Finally, the analyses have suggested that average levels of blood pressure (diastolic or systolic), while useful and prognostically significant summaries of the behaviour of the patient's arterial pressure, should nevertheless be accompanied by other statistical descriptions of the distribution of pressures - such as the standard deviation, a measure of skewness, and the extreme values observed; such additional information seems indispensable if one is to estimate the percentage of time that the patient is subject to the damaging effects of unusually high arterial pressure.

BIBLIOGRAPHY

[1] Anderson, T.W., Introduction to Multivariate Statistical Analysis, Wiley, New York, 1958.
[2] Armitage, P., Statistical Methods in Medical Research, Blackwell Scientific Publications, Oxford, 1971.
[3] Beale, E.M.L. and R.J.A. Little, "Missing Values in Multivariate Analysis", Journal of the Royal Statistical Society, B37 (1975), 129-145.
[4] Beaton, A.E. and J.W. Tukey, "The Fitting of Power Series, Meaning Polynomials, Illustrated on Band-Spectroscopic Data", Technometrics, 16 (1974), 147-192.
[5] Breslow, N., "Covariance Analysis of Censored Survival Data", Biometrics, 30 (1974), 89-99.
[6] Brown, M.B. and W.J. Dixon (editors), BMDP-77 Biomedical Computer Programs (P Series), University of California Press, Berkeley, 1977.
[7] Chasis, H., "Appraisal of Antihypertensive Drug Therapy", Circulation, 50 (1974), 4-8.
[8] Chiang, C.L., Introduction to Stochastic Processes in Biostatistics, Wiley, New York, 1968.
[9] Child, D., The Essentials of Factor Analysis, Holt, Rinehart and Winston, 1973.
[10] Cox, D.R., "Regression Models and Life Tables", Journal of the Royal Statistical Society, B34 (1972), 187-220.
[11] Efron, B., "The Efficiency of Logistic Regression Compared to Normal Discriminant Analysis", Journal of the American Statistical Association, 70 (1975), 892-898.
[12] Evelyn, K.A., "The Pathogenesis, Clinico-Pathological Correlation and Treatment of Hypertension", McGill Medical Journal, 1946, 225-257.
[13] Evelyn, K.A. et al., "Effect of Sympathectomy on Blood Pressure in Hypertension", Journal of the American Medical Association, 140 (1949), 592-601.
[14] Evelyn, K.A. et al., "Effect of Thoracolumbar Sympathectomy on the Clinical Course of Primary (Essential) Hypertension", The American Journal of Medicine, 28 (1960), 188-221.
[15] Evelyn, K.A., "The Natural History of Hypertension", Transactions of the Association of Life Insurance Medical Directors of America, 53 (1969), 84-111.
[16] Feinstein, A.R., Clinical Biostatistics, C.V. Mosby, Saint Louis, 1977.
[17] Finnerty, F.A., "Aggressive Drug Therapy in Accelerated Hypertension", American Journal of Nursing, 74 (1974), 2176-2180.
[18] Frane, J.W., "Some Simple Procedures for Handling Missing Data in Multivariate Analysis", Psychometrika, 41 (1976), 409-415.
[19] Freis, E.D. et al., "A Double Blind Control Study of Antihypertensive Agents", Archives of Internal Medicine, 106 (1960), 81-96.
[20] Freis, E.D. et al., "Effects of Treatment on Morbidity in Hypertension", Journal of the American Medical Association, 202 (1967), 116-122.
[21] Freis, E.D. et al., "Effects of Treatment on Morbidity in Hypertension II", Journal of the American Medical Association, 213 (1970), 1143-1152.
[22] Freis, E.D., "Rebuttal: Appraisal of Antihypertensive Drug Therapy", Circulation, 50 (1974), 9-10.
[23] Freis, E.D., Introduction to the Nature and Management of Hypertension, Robert J. Brady Co., Bowie, Maryland, 1974.
[24] Geiger, J. and N.A. Scotch, "The Epidemiology of Essential Hypertension - I", Journal of Chronic Diseases, 16 (1963), 1151-1182.
[25] Goldring, W. and H. Chasis, Hypertension and Hypertensive Disease, The Commonwealth Fund, New York, 1944.
[26] Gross, A.J. and V.A. Clark, Survival Distributions: Reliability Applications in the Biomedical Sciences, Wiley, New York, 1975.
[27] Harvey, A.C., "Some Comments on Multicollinearity in Regression", Journal of Applied Statistics, 26 (1977), 188-191.
[28] Humerfelt, S. Bj., An Epidemiological Study of High Blood Pressure, Scandinavian University Books, Oslo, Norway, 1963.
[29] Kaplan, E.L. and P. Meier, "Nonparametric Estimation from Incomplete Observations", Journal of the American Statistical Association, 53 (1958), 457-481.
[30] Kay, R., "Proportional Hazard Regression Models and the Analysis of Censored Survival Data", Journal of Applied Statistics, 26 (1977), 227-237.
[31] Killip, T., "The Problem: Atherosclerosis", Textbook of Medicine, Fourteenth Edition, (Beeson, P.B. and W. McDermott, editors), 1975, 981-992.
[32] Krzanowski, W.J., "Discrimination and Classification Using Both Binary and Continuous Variables", Journal of the American Statistical Association, 70 (1975), 782-790.
[33] Mallows, C.L., "Some Comments on $C_p$", Technometrics, 15 (1973), 661-675.
[34] Mansfield, E.R., J.T. Webster and R.F. Gunst, "An Analytic Variable Selection Technique for Principal Component Regression", Journal of Applied Statistics, 26 (1977), 34-40.
[35] Montgomery, D.C., Design and Analysis of Experiments, Wiley, New York, 1976.
[36] Mood, A.M., F.A. Graybill and D.C. Boes, Introduction to the Theory of Statistics, Third Edition, McGraw-Hill, New York, 1974.
[37] Morrison, D.F., Multivariate Statistical Methods, Second Edition, McGraw-Hill, New York, 1976.
[38] Morrison, D.G., "On the Interpretation of Discriminant Analysis", Journal of Marketing Research, 6 (1969), 156-163.
[39] Mosteller, F. and J.W. Tukey, Data Analysis and Regression, Addison-Wesley, Don Mills, Ontario, 1977.
[40] Oglesby, P. (editor), Epidemiology and Control of Hypertension, Stratton Intercontinental Medical Book Corporation, New York, 1975.
[41] Peart, W.S., "Arterial Hypertension", Textbook of Medicine, Fourteenth Edition, (Beeson, P.B. and W. McDermott, editors), 1975, 981-992.
[42] Perera, G.A., "Diagnosis and Natural History of Hypertensive Vascular Disease", American Journal of Medicine, 416 (1948), 416-422.
[43] Rorive, G. and Henry Van Cauwenberge (editors), The Arterial Hypertensive Disease (A Symposium at Liege), Masson Inc., New York, 1976.
[44] Stamler, J. et al., The Hypertension Handbook, Merck Sharp & Dohme, West Point, Pa., 1974.
[45] Turnbull, B.W., "The Empirical Distribution Function with Arbitrarily Grouped, Censored, and Truncated Data", Journal of the Royal Statistical Society, B38 (1976), 290-295.
[46] Winer, B.J., Statistical Principles in Experimental Design, Second Edition, McGraw-Hill, New York, 1971.
Appendix A
LIKELIHOOD OF THE SAMPLE OF OBSERVED AGES

The experiment consists of drawing a sample of size $n$ from the specified population of hypertensives, recording the age, $A$, of each, following the sample for a pre-determined maximum number of years, $t_0$, recording the greatest observed age, $T^*$, of each, and noting whether $T^*$ is an age at death ($\Delta = 1$) or a censored observation ($\Delta = 0$). Thus, the data for the $i$th sampled element consist of a value $a_i$ of $A_i$, a value $t_i^*$ of $T_i^*$, and a value $\delta_i$ of $\Delta_i$.

Since little can be said about the joint density of the sample of ages, $\mathbf{a} = (a_1, \ldots, a_n)$ - other than to mention that this density is truncated on the right at 55 years - it was decided to work with the conditional likelihood of $\mathbf{t}^* = (t_1^*, \ldots, t_n^*)$ and $\boldsymbol{\delta} = (\delta_1, \ldots, \delta_n)$ given $\mathbf{a}$. That is, $L(\boldsymbol{\theta}; \mathbf{t}^*, \boldsymbol{\delta} \mid \mathbf{A} = \mathbf{a})$ is sought, where $\boldsymbol{\theta}$ is the vector of parameters for the density of ages at death.

First note that $L(\boldsymbol{\theta}; \mathbf{t}^*, \boldsymbol{\delta} \mid \mathbf{a}) = L(\boldsymbol{\theta}; \mathbf{t}^* \mid \boldsymbol{\delta}, \mathbf{a}) \cdot L(\boldsymbol{\theta}; \boldsymbol{\delta} \mid \mathbf{a})$. Dropping $\boldsymbol{\theta}$ from the notation, it is seen that

\[ f(t_i^* \mid \delta_i, a_i) = \frac{f(t_i^*)}{F(a_i + t_0) - F(a_i)} \quad \text{if } \delta_i = 1, \]

since, in this case, $t_i^*$ is an age at death from a density truncated on the left at the given age, $a_i$, and on the right at the maximum observable age, $a_i + t_0$; if $\delta_i = 0$, knowledge of $a_i$ means that $t_i^*$ must equal $t_0 + a_i$. Thus:

\[ L(\mathbf{t}^* \mid \boldsymbol{\delta}, \mathbf{a}) = \prod_{i=1}^{n} \left\{ \frac{f(t_i^*)}{F(a_i + t_0) - F(a_i)} \right\}^{\delta_i}. \]

Turning now to $L(\boldsymbol{\delta} \mid \mathbf{a})$, consider first

\[ P(\Delta_i = 1 \mid a_i) = P(i\text{th case dies on study} \mid \text{the } i\text{th case was aged } a_i \text{ years at the start of the study}) = \frac{F(a_i + t_0) - F(a_i)}{S(a_i)}, \]

where $F$ is once again the c.d.f. of the age at death; similarly, it is seen that $P(\Delta_i = 0 \mid a_i) = S(a_i + t_0)/S(a_i)$, so that

\[ L(\boldsymbol{\delta} \mid \mathbf{a}) = \prod_{i=1}^{n} \left\{ \frac{F(a_i + t_0) - F(a_i)}{S(a_i)} \right\}^{\delta_i} \left\{ \frac{S(a_i + t_0)}{S(a_i)} \right\}^{1 - \delta_i}. \]

Finally,

\[ L(\mathbf{t}^*, \boldsymbol{\delta} \mid \mathbf{a}) = \prod_{i=1}^{n} \frac{\{f(t_i^*)\}^{\delta_i} \{S(a_i + t_0)\}^{1 - \delta_i}}{S(a_i)}. \]

The logarithm of this conditional likelihood can then be differentiated with respect to the parameter vector $\boldsymbol{\theta}$; setting the result equal to $\mathbf{0}$, and solving the resulting system of equations (perhaps using iteration), a "conditional" MLE, $\hat{\boldsymbol{\theta}}$, of $\boldsymbol{\theta}$ can be found. Cox [10] has shown that such estimators share the same large-sample properties as those derived from a "full" likelihood. In particular, the asymptotic covariance matrix, $V(\hat{\boldsymbol{\theta}}) = \Sigma$, is given by:

\[ \Sigma^{-1} = \left[ -E\left( \frac{\partial^2 \log L}{\partial \theta_j \, \partial \theta_k} \right) \right], \quad j, k = 1, \ldots, p, \]

if $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_p)$.

Appendix B
DERIVATION OF "MINVAR" COEFFICIENTS

Let $\hat\beta_1, \hat\beta_2, \ldots, \hat\beta_k$ be the original regression coefficients for the predictors, $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k$. The existence of a vector $\mathbf{c}' = (c_2, c_3, \ldots, c_k)$, such that the variance of $\hat\beta_1^* = \hat\beta_1 + \sum_{j=2}^{k} c_j \hat\beta_j$ (denoted $\mathrm{VAR}[\hat\beta_1^*]$) is minimized will be demonstrated.

Suppose $\mathrm{VAR}[\hat{\boldsymbol{\beta}}] = \sigma^2 (X'X)^{-1}$ is denoted $V$ and partitioned as

\[ V = \begin{pmatrix} v_{11} & \mathbf{v}^{*\prime} \\ \mathbf{v}^* & V^* \end{pmatrix}, \]

where $v_{11} = \mathrm{VAR}[\hat\beta_1]$; $\mathbf{v}^{*\prime} = (\mathrm{COV}[\hat\beta_1, \hat\beta_2], \ldots, \mathrm{COV}[\hat\beta_1, \hat\beta_k])$; $V^* = \mathrm{VAR}[\hat{\boldsymbol{\beta}}_2] = E[(\hat{\boldsymbol{\beta}}_2 - \boldsymbol{\beta}_2)(\hat{\boldsymbol{\beta}}_2 - \boldsymbol{\beta}_2)']$; and $\boldsymbol{\beta}_2 = (\beta_2, \beta_3, \ldots, \beta_k)'$. Then, letting

\[ f(\mathbf{c}) = \mathrm{VAR}[\hat\beta_1^*] = \mathrm{VAR}[\hat\beta_1 + \mathbf{c}'\hat{\boldsymbol{\beta}}_2] = v_{11} + 2\mathbf{c}'\mathbf{v}^* + \mathbf{c}'V^*\mathbf{c}, \]

the minimizing vector, $\mathbf{c}$, may be found by solving $df/d\mathbf{c} = \mathbf{0}$ in $\mathbf{c}$. Now:

\[ \frac{df}{d\mathbf{c}} = 2\mathbf{v}^* + 2V^*\mathbf{c} = \mathbf{0} \quad \text{if and only if} \quad \mathbf{c} = -(V^*)^{-1}\mathbf{v}^*. \]

To find $(V^*)^{-1}$, a well-known identity for the inverse of a partitioned matrix (see [37, p.68]) may be used:

\[ V^{-1} = (1/\sigma^2) X'X = \begin{pmatrix} (v_{11} - \mathbf{v}^{*\prime}(V^*)^{-1}\mathbf{v}^*)^{-1} & \mathbf{m}' \\ \mathbf{m} & M \end{pmatrix}, \]

where $\mathbf{m}' = -(v_{11} - \mathbf{v}^{*\prime}(V^*)^{-1}\mathbf{v}^*)^{-1} \mathbf{v}^{*\prime}(V^*)^{-1}$ and $M = (V^*)^{-1} - (V^*)^{-1}\mathbf{v}^*\mathbf{m}'$.

Recalling that the $ij$th element of the matrix $X'X$ is denoted $S_{ij}$, it follows that

(1) $(v_{11} - \mathbf{v}^{*\prime}(V^*)^{-1}\mathbf{v}^*)^{-1} = (1/\sigma^2) S_{11}$, and

(2) $\mathbf{m} = (1/\sigma^2)(S_{21}, \ldots, S_{k1})' = (1/\sigma^2)\mathbf{s}^*$.

Substituting (1) into (2) and simplifying yields:

\[ \mathbf{c} = -(V^*)^{-1}\mathbf{v}^* = (1/S_{11})\mathbf{s}^*. \]

That is, $c_j = S_{j1}/S_{11}$, the coefficient when $\mathbf{x}_j$ is regressed on $\mathbf{x}_1$; thus, $\mathbf{x}_j^* = \mathbf{x}_j - c_j\mathbf{x}_1 = \mathbf{x}_j - \hat{\mathbf{x}}_j$ is the residual when $\mathbf{x}_j$ is regressed on $\mathbf{x}_1$, so that $\mathbf{x}_j^{*\prime}\mathbf{x}_1 = 0$.

Letting $X^* = (\mathbf{x}_2^*, \mathbf{x}_3^*, \ldots, \mathbf{x}_k^*)$ and $\mathbf{A}^* = (X^{*\prime}X^*)^{-1}X^{*\prime}\mathbf{x}_1$, it follows from the last remark above that $\mathbf{A}^* = \mathbf{0}$. Now define $\hat{\mathbf{x}}_1 = \sum_{j=2}^{k} a_j\mathbf{x}_j$ as in (2) of Section 4.2.4, and $\hat{\mathbf{x}}_1^* = \sum_{j=2}^{k} a_j^*\mathbf{x}_j^*$; then by (3) of Section 4.2.4, it is seen that $\mathrm{VAR}[\hat\beta_1^*]/\mathrm{VAR}[\hat\beta_1]$, the ratio of the minimum to the original variance of the coefficient of $\mathbf{x}_1$, is just:

\[ \frac{\sigma^2 / \sum_i (x_{i1} - \hat{x}^*_{i1})^2}{\sigma^2 / \sum_i (x_{i1} - \hat{x}_{i1})^2} = \frac{\sum_i (x_{i1} - \hat{x}_{i1})^2}{\sum_i x_{i1}^2} \quad (\text{since each } a_j^* = 0). \]

This last fraction may be rewritten as $1 - \sum_i \hat{x}_{i1}^2 / \sum_i x_{i1}^2$. Whenever all $k$ variables, $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k$, have mean 0, this ratio is just 1 minus the coefficient of determination ($R^2$) for $\mathbf{x}_1$ regressed on the remaining predictors, $\mathbf{x}_2, \ldots, \mathbf{x}_k$.

Appendix C
DERIVATION OF PRINCIPAL COMPONENTS

Let $X = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k)$ be the matrix of original predictors transformed so that each variable has mean 0. The first principal component, $\mathbf{v}_1 = X\mathbf{c}_1$, is found by maximizing $f(\mathbf{c}_1) = \mathrm{VAR}[\mathbf{v}_1] = \mathbf{c}_1'A\mathbf{c}_1$, where $A = [1/(n-1)]X'X$ is the usual sample covariance matrix, subject to the constraint $\mathbf{c}_1'\mathbf{c}_1 = 1$. Using a Lagrange multiplier to express the latter, we arrive at:

\[ f^*(\mathbf{c}_1) = \mathbf{c}_1'A\mathbf{c}_1 - \lambda(\mathbf{c}_1'\mathbf{c}_1 - 1). \]

Then

\[ \frac{df^*(\mathbf{c}_1)}{d\mathbf{c}_1} = 2A\mathbf{c}_1 - 2\lambda\mathbf{c}_1 = \mathbf{0} \quad \text{if and only if} \quad A\mathbf{c}_1 = \lambda\mathbf{c}_1. \]

Thus $\mathbf{c}_1$ is an eigenvector of the matrix $A$, and $\lambda$ is the associated eigenvalue.
Moreover, the variance of the associated component $\mathbf{v}_1$ is $\mathbf{c}_1'A\mathbf{c}_1 = \mathbf{c}_1'\lambda\mathbf{c}_1 = \lambda$, using the constraint; if this variance is to be a maximum, $\lambda$ must be the largest eigenvalue of $A$, denoted $\lambda_1$. When finding the second component, there is an additional constraint, $\mathbf{c}_2'\mathbf{c}_1 = 0$, and so on for the remaining components (i.e. $\mathbf{c}_3'\mathbf{c}_1 = \mathbf{c}_3'\mathbf{c}_2 = 0$, etc.). The solution vectors, $\mathbf{c}_2, \ldots, \mathbf{c}_k$, are found, as above, to be the remaining eigenvectors of $A$, with $\mathbf{c}_j$ corresponding to the $j$th largest eigenvalue, $\lambda_j$; again, $\lambda_j = \mathrm{VAR}[\mathbf{v}_j]$. Since $\lambda_1 + \cdots + \lambda_k$ is the trace of $A$, it follows that the fraction $\lambda_j / \sum_i \lambda_i$ is the proportion of the total variance (of the $k$ original variables) accounted for by the $j$th component. The reader is referred to [37, p.266] and [1, p.272] for more details.

APPENDIX D: CORRELATION MATRIX FOR TIME-ZERO DATA

(? = digit illegible in this copy)

               AGE    HIST     DUR  MAXSYS  MAXDIA  MINSYS  MINDIA     SYS     DIA    HSYM    HSIZ    HECG    BACH
                 1       2       3       4       5       6       7       8       9      10      11      12      13
AGE      1   1.000
HIST     2  -0.122   1.000
DUR      3   ?.???  -?.???   1.000
MAXSYS   4   0.375   0.052   0.092   1.000
MAXDIA   5   0.07?   0.1?2   0.042   0.681   1.000
MINSYS   6   0.?54  -0.205   0.110   0.608   0.490   1.000
MINDIA   7   0.113  -0.121   0.107   0.419   0.460   0.762   1.000
SYS      8   0.191  -0.006   0.118   0.759   0.646   0.615   0.523   1.000
DIA      9   0.165   0.030   0.099   0.722   0.761   0.562   0.494   0.808   1.000
HSYM    10   0.310  -0.036   0.105   0.333   0.269   0.192   0.120   0.388   0.385   1.000
HSIZ    11   0.235  -0.001  -0.077   0.280   0.260   0.152   0.101   0.252   0.254   0.539   1.000
HECG    12   0.3??   ?.???   0.034   0.3?3   0.314   ?.???   0.2?2   0.274   0.257   0.399   0.351   1.000
BACH    13  -0.135   0.130  -0.011   0.133   0.163   0.112   0.139   0.198   0.144   0.289   0.394   0.119   1.000
BCVA    14   0.261  -0.134   0.117   0.127  -0.014   0.133   0.151  -0.005  -0.004   0.106   0.149   0.244   0.090
KSYM    15   0.053   0.044   0.054   0.2?0   0.367   0.141   0.189   0.280   0.341   ?.???   0.466   0.180   0.516
KPRO    16   0.052  -0.023   0.086   0.359   0.486   0.367   0.365   0.366   0.346   0.173   0.292   0.386   0.283
KSPN    17   0.209  -0.015   0.112   0.179   0.274   0.265   0.301   0.343   0.267   0.171   0.218   0.328   0.101
REVES   18   0.273  -0.048   ?.???   0.520   0.484   0.350   ?.???   ?.???   0.456   0.327   0.387   0.378   0.252
RRET    19   0.122   0.049   0.113   0.409   0.454   0.339   0.325   0.401   0.381   0.220   0.284   0.183   0.348
RPAP    20  -0.011   0.116   0.053   0.260   0.240   0.325   0.290   0.294   0.290   0.211   0.158   0.138   0.271
SEX     22   ?.???   ?.???   0.028   0.?98  -0.004  -0.076  -?.???   0.157   ?.???   ?.???   0.?79  -0.095   0.183
INTERV  23  -0.213  -0.084  -0.097  -0.478  -0.527  -0.324  -0.416  -0.486  -0.514  -0.303  -0.285  -0.302  -0.289

              BCVA    KSYM    KPRO    KSPN   REVES    RRET    RPAP     SEX  INTERV
                14      15      16      17      18      19      20      22      23
BCVA    14   1.000
KSYM    15   ?.???   1.000
KPRO    16   0.144   0.331   1.000
KSPN    17   0.073   0.274   0.488   1.000
REVES   18   0.1??   0.401   0.531   0.475   1.000
RRET    19   0.033   0.531   0.543   0.507   0.670   1.000
RPAP    20   0.007   0.313   0.457   0.435   0.475   0.609   1.000
SEX     22   0.067   0.193   0.0?9  -0.058   0.032   0.013  -0.165   1.000
INTERV  23  -0.095  -0.399  -0.386  -0.498  -0.559  -0.534  -0.444   0.215   1.000

APPENDIX E: CORRELATION MATRIX FOR TIME ZERO DATA: MALES

AGE H I S T CURAT MAXSYS MAXDIA K I N S Y S MI NDIA SYS DIA HSYM H S U HECG BACH ECVA KS YM KPRO KSPN RFVb'S PRET RPAP TIME SEX INTERM AGE H I S T DUR AT MAXSYS MAXDI4 MINSYS M I N D I A 1.0000 G.C299 C . 2 1 8 0 0 . 4 2 3 0 0. 1512 9 10 11 0.2538 0.1968 C.2094 0.3057 0.30*2 0.2500 1.1CO0 0.0372 0.0194 0.1062 1 2 13 14 1 5 16 17 0.3058 0.0263 0.1 106 6.1268 0.0961 0.2627 18 19 20 21 22 23 0.3386 0. 2056 0.1077 -0.3033 0.0 -0.2975 - 0 . 1 9 8 2 - 0 . 2 3 9 0 - 0 . 0 4 4 2 - 0 . 0 5 5 1 0.06 2 2 0 . 0 4 6 1 0 . 1 1 7 5 0 . 1 1 8 1 -Q., 1 0 7 5 0 . 0 0 8 6 - 0 . 1 5 0 3 - 0 . 1 3 5 3 1.0000 0 . 1 4 2 7 0 . 0 1 6 0 0.2310 0.1584 0.1631 0.0900 0.0325 -0.0888 1 . 0 0 0 0 0 . 7 O 8 5 0.6807 0.5599 0.7949 0. 
7440 0.3807 0.2971 -0.1639 0.0291 -0,0180 0.1380 0.1006 C.1023 -0.1456 0.0670 0.0757 0.0127 0.0 0.0019 0 . 1 6 7 3 0 . 1 7 0 2 0 . 1 6 8 3 - 0 . 1 2 4 9 0.0 -0..2.3JL. 0.3322 0.2749 0. 1052 6.1274 0.3351 0.2517 _UQO_0D_ 0. 5561 0.5574 0.6325 0.7793 0.3227 0.3438 0.4295 0.1322 -0.0387 0.1301 0.4418 0.3485 1.0000 0.8265 0.7362 0.6258 0.1706 0.1494 6 0 1 8 33 80 3 2 8 3 6 2 1 8 0 .6094-0.4147 0.4216 0.1913 -0.4878 0.0 -0.4673 0. 1 9 1 1 0.2355 0.1351 0.0144 0.447C 0.3252 1 . 0 0 0 0 0 . 6 5 4 7 0 . 5 6 5 1 0. 1 8 9 5 _ 0 j _ L Z 5 £ . 0.4066 0.3897 0.42 50.. -0.4851 0.0 -J.4559 0.2468 0.2756 0.1824 0.1052 0.5253 _0JA296_ 0.4245 0.4721 0.4392 -0.5498 0.0 -0.5392 H S U HECG 11 12 l-SIZ 1 1 1.0000 HECG 1 2 i". 3963 1. OCfiC _ E A C H ..-.13 C . 3 8 1 5 0 . 1 6 7 7 BCVA 14 0.0048 0 . 2 6 1 8 K S YM 1 5 f.2e89 0 . 0 3 3 9 KPHC 16 ^. 37"> 2 0 . 4 5 3 3 KSPN 17 0.2071 0 . 2 6 1 6 R^VHS 18 C. 4 22 7 0 . 3 3 2 3 RKFT 19 0 . 3 3 5 0 0 .113 5 "RPTP 0.2172 C . 1 5 7 0 TIMt 21 -Co 3 39 5 - 0 . 2 5 9 7 SEX ? 7 0.0 . _ o . o INTIfRM 23 -r.3558 -6". 2 4 1 1 BACH BCVA 13 KSYM KPRO 14 KSPN 15 16 17 SYS CIA HSYM 1.0000 0.8185 0.4144 0.2730 0.4504 -0.0012 C.1600 0.4880 _i.40.41_ 0.6338 0.4993 0.4696 -0.6131 0.0 -0.5908 . 1.0000 C. 4362 0 .3893 0.2484 0.3219 -0.0238.. 0.2563 0. 42 52 0.3605 1.0000 0.6150 0.5301 0. 4656 0 . 3622 - 0 . 6264 0 . 0 - 0 . 5 9 7 6 0 .3514 0 . 3 541 0.0.718 0 .2306 C.2806 0-1091 0.3435 0 .1975 0 .2 804 - 0 . 3 0 0 4 0 .0 -0 .2718 REVES RRET RPAP 18 19 20 1.30CC 0 . 1 8 6 2 0.5O45 .2 4 03_ 0 . 1 6 0 1 0 . 3 4 3 5 0 . 4 1 2 6 0.304.1 - C „ 4 2 9 8 .0.0 -i: .45 30 1 . 0 0 0 0 0 . 0 1 0 5 _0_. 1.73.4. 0 . 0 4 6 6 0.0 32 6 - O M 1 2 3 0 . 0 2 4 1 - 0 . 1 0 3 0 0 .0 - 3 . 0 5 7 0 1.0 OX' 0...183.9.. U.OjQQ.C C . 2 5 0 1 0 . 6 4 8 5 l.OOOC 0 . 2 1 4 2 0 . 5 5 6 1 0 . 5 5 8 9 1 . 0 0 0 0 C . 4 5 0 4 0 . 5 9 0 9 0 . 7 2 6 5 0 . 7 0 8 2 1 . 0 0 0 0 . . 0.28 2 7 0 . 5 3 9 6 0 . 5 3 3 2 0 . 5 4 1 5 0. 6 9 9 8 1.0000 - C . 3 5 39 -U.5 70 3 - 0 . 5 7 0 2 - 0 . 6 5 1 9 - 0 . 
6 0 9 0 - 0 . 5 9 3 4 .JLuQ _">.0 0.0 0.0 0.0 - 0 . 4 0 0 1 -0 .501 1 - 0 . 5 5 3 8 - 0 . 6 4 2 6 - 0 . 6 2 0 0 - 0 . 6 0 4 1

APPENDIX F: CORRELATION MATRIX FOR TIME ZERO DATA: FEMALES

(? = digit illegible in this copy)

                AGE     HIST    DURAT   MAXSYS   MAXDIA   MINSYS   MINDIA      SYS      DIA     HSYM
                  1        2        3        4        5        6        7        8        9       10
AGE      1   1.0000
HIST     2  -0.2940   1.0000
DURAT    3   0.3826  -0.0672   1.0000
MAXSYS   4   0.3260   0.0914   0.0375   1.0000
MAXDIA   5  -0.0011   0.2659   0.0642   0.6843   1.0000
MINSYS   6   0.2717  -0.2192  -0.0316   0.5939   0.4257   1.0000
MINDIA   7   0.0222   0.0517   0.0576   0.3164   0.3583   0.6447   1.0000
SYS      8   0.16?5   0.0393   0.0689   0.7066   0.6292   0.5025   0.4116   1.0000
DIA      9   0.0297   0.1237   0.1048   0.7124   0.7459   0.5146   0.4370   0.8063   1.0000
HSYM    10   0.3125  -0.1639   0.1760   0.2471   0.2191   0.2557   0.0553   0.3310   0.3276   1.0000
HSIZ    11   0.2159  -0.0518  -0.0786   0.2146   0.1966   0.200?   0.0639   0.1744   0.1292   0.4491
HECG    12   0.4426   0.0236   0.2079   0.3478   0.2131   0.3188   0.2253   0.3168   0.2762   0.4843
BACH    13  -0.3145   0.1483  -0.0575  -0.0795   0.1517  -0.0168   0.0169  -0.1222  -0.0391   0.1829
BCVA    14   0.3896  -0.1640   0.2243   0.1275   0.0066   0.1509   0.1388  -0.0289   0.0074   0.1283
KSYM    15  -0.0255   0.0824  -0.0230   0.3661   0.5846   0.3492   0.3475   0.3540   0.4089   0.4804
KPRO    16  -0.0008   0.1324   0.0686   0.3685   0.5353   0.2772   0.1770   0.2128   0.2609   0.0261
KSPN    17   0.1545   0.1489   0.1274   0.1269   0.1938   0.1611   0.0974   0.2982   0.1726   0.7737
REVES   18   0.1939   0.0899   0.1932   0.4312   0.5752   0.2639   0.2215   0.3407   0.3753   0.3013
RRET    19   0.0459   0.0301   0.0639   0.4875   0.4831   0.2957   0.1679   0.3129   0.3063   0.2451
RPAP    20  -0.1943   0.2084  -0.1055   0.2655   0.3539   0.0677  -0.0678   0.0937   0.2369   0.1616
TIME    21  -0.0507  -0.1917  -0.1426  -0.4451  -0.6233  -0.1537  -0.2820  -0.5203  -0.5071  -0.4290
SEX     22      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0
INTERM  23  -0.0355  -0.2218  -0.1454  -0.4365  -0.6095  -0.1056  -0.2544  -?.????  -0.4781  -0.340?

               HSIZ     HECG     BACH     BCVA     KSYM     KPRO     KSPN    REVES     RRET     RPAP
                 11       12       13       14       15       16       17       18       19       20
HSIZ    11   1.0000
HECG    12   0.3583   1.0000
BACH    13   0.3671   0.1117   1.0000
BCVA    14   0.2490   0.2446  -0.0180   1.0000
KSYM    15   0.5767   0.3465   0.4941   0.2907   1.0000
KPRO    16   0.2479   0.3413   0.3099   0.1073   0.4???   1.0000
KSPN    17   0.2634   0.3981   0.0586   0.1118   0.3401   0.3015   1.0000
REVES   18   0.3576   0.4492   0.1381   0.3671   0.6277   0.4989   0.3602   1.0000
RRET    19   0.2?63   0.2432   0.2955   0.1504   0.6118   0.5009   0.2803   0.6428   1.0000
RPAP    20   0.17?6   0.0859   0.3511   0.0130   0.5165   0.3994   0.2462   0.3896   0.5693   1.0000
TIME    21  -0.2951  -0.3177  -0.3164  -0.0?47  -0.6536  -0.3176  -0.3919  -0.5650  -0.5932  -0.3723
SEX     22      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0
INTERM  23  -0.3083  -0.2789  -0.3121  -0.0260  -0.6034  -0.2936  -0.3915  -0.5423  -0.5804  -0.3595

APPENDIX G: CORRELATION MATRIX FOR WITHIN-GROUPS COMPONENTS: MALES

P P ] n°2 RP3 RP4 BP 5 RP6 HI BP ! BO 2 PP 3 BP * BP 5 BP6 H2 H3 Bl B2 K l K2 1 0.000 0.000 -P.OPO o . o o o l.oor b.ooc -O.OPP o.ooc l .ooo -0.000 l . O O P o.ooo o.ooo o . 
i o o HI H2 H3 rr B2 _ K l K2 K3 "P. 1 R2 P3 HIST OUR "TNTERM ' W O . 4 1 6 0 . 0 7 5 -0.000 1 . 0 0 0 0 .000 9 0."61 T O ~ » T . 7 T 9 ~ 11 - 0 . 0 6 9 T! »*0.52 3 13 0.029 14 - 0 . 0 4 3 t 0.274 -0.014 0.016 -0.199 O.P37 0.1 9 5 - 0 . ! 39 ;nTT4"5" - 0 . 0 2 0 - 0 . 0 5 8 It 0 . 3 0 5 - 0 . 0 3 6 "oTJoT" • * 0 . 3 0 3 1 5 16 17 TT 2 0 21 *fr075"83 — - n . i 0 6 -0.083 0.159 1 8 * * - 0 o 6 2 7 - 0 . 0 9 6 - 0 . 1 5 5 - 0 . 0 3 4 "=0.007" * - 0 . 3 2 0 - 0 . 1 0 0 «X # • 0 . 3 4 7 - 0 . 1 5 3 - 0 . 0 4 9 0 . 1 9 4 0 . 0 2 1 0.022 7rl.07S 0.208 •0.503 0 . 0 3 2 •0.160 0 . 0 8 4 - 0 . 1 6 1 0.142 - O . O l f l - 0 . 1 8 2 - 0 . 1 5 4 0 . 0 1 9 0 . 1 4 4 0 . 1 1 9 - 0 . 0 1 9 0 . 0 5 5 0 . 0 0 0 0.1 §1 - 0 . 0 3 1 6. 103 "oTJ^o" 0 . 0 1 5 0.121 - 0 . 1 7 8 0 . 0 2 5 ~6. 208 - 0 . 1 4 9 0. 102 0.0 84 - 0 . 0 3 8 - 0 . 0 5 8 - 6 .294 1.000 O.Olfl -0. 182 H I . i 2 a - o T I o T °-ulli 0.096 -0.193 -0.051 -0 .06 7 0.031 0.028 It 0.334 -0.178 -0.012 -0.129 1.000 - 0 . 0 0 0 0.000 *1 0 .388 1  - 0 . 0 5 6 000 boo »*0.402 -0.080 -0.218 « 0 . 3 9 9 -0.183 0.162 *^0.355 0.089 -0.C74 Jtfij » ^ - 0 . 3 6 1 0, 0. _0. -0. -0. • f t 0, -o. -0, TJTTT 2JZ. 1.000 208 238 030 016 129 084 085 075 166 01 1 - 0 . 0 2 4 -0.047 0.113 -0.062 ~ C . 1 0 4 " 0.073 » 0 . 2 8 8 0.072 0.032 0.110 0.104 ,000 .000 *-0. = 0. * " Q . 0. -0. 0, 0. **-o. TBTT 350 228 364" 010 °£2_ 1 .QQO 0 . 0 1 5 0 . 2 4 6 - 0 . 0 1 6 - 0 . 2 0 9 - 0 . 0 8 6 • 0._2J>9_ 055 0 . 0 9 6 077 - 0 . 1 5 9 021 - 0 . 0 3 1 428 0 .151 1.000 - 0 . 0 0 0 o.ooo _L3_ 1.000 -P iOOQ **6.729 - 0 . 1 7 4 0 . 0 9 5 ^ - 0 . 1 7 6 0 . 1 7 0 - 0 . 1 5 4 „ 0 , 1 1 9 **-0.585 - 0 . 1 2 8 - 0 . 0 5 3 - 0 . 1 0 5 0 . 2 5 6 K3 P I ! 4 15 AGS K3 14 " T T l .onp 0.124 p -> R 3 AGE HIST DIJP INTERM 1 6 1 _7_ ' 1 9 20 21 18 0 . P i o - P . 1 7 9 T 7 l 9 " 4 " ' - 0 . 0 2 7 - 0 . 0 1 9 - 0 . 0 6 6 1.000 17 1.000 -O.OOO 1.000 P. POP - c . o o o  * 0 . 273 - C , 219. . 0 . 0 0 6 - 0 . 0 2 5 » n . ? 6 7 - 0 . 1 0 8 , 0 . 1 9 1 n . 0 1 8 0 . 
0 2 5 " - 0 . 7 1 ; - C . n 2 n - O . l P l HIST DUR .19. ! . 0 0 1 0 . r 30 0. 218 - P . 2 9 7 20 1.000 0 . 0 3 7 0.00 2 21 I KTERM ._ 18 1.000 - 0 . 1 2 3 1 . 0 0 0 APPENDIX H: CORRELATION MATRIX FOR WITHIN-GROUPS COMPONENTS: FEMALES _ RP1 BP 2 1 BP4 R P 5 BP6 HI H2 H3 B l B2 10 K l _ i l _ _L2_ K2 _L2_ BP1 BP2 "RP 3" RP 4 BP5 -0.000 -P.noo 0.000 1 . 1 0 0 6 . 0 0 0 - O . 1 0 C -o.oor 1 ."00 - 0 . 0 0 0 - 0 . O C 0 i .nno -o.ioo l .ooo BP 6 HI H2 H 3" Rl R2 ~KT~ K2 K3 R r P2 P3 -O.onn t)»0.362 -0.12R "o ."O.i 5 -o.oon o.ooo o.non 0.061 -0. 1 i 7 0.143 -0.025 0 > i O -0.016 10 11 13 -0.107 14 -0.212 T 5 • ~ B r T?o5 16 -0.109 17 -1.073 0.02C -0.130 AGE HIST DUR INTERM 20 0.073 21 0.057 1 R M - 0 . 4 R 5 » 0. 130™ 0.063 0.029 0. 218 -0.006 0. 229 : o n r 0.228 0.037 -0.290 1 84 096 C„030 -0.154 - r . i 11 "*-0."5*>l -0.110 -6.074 " -0.040 * 0.2 86 "TjToTr ' o. 15ft 1.010 -0.067 -0.075 -0.083 -757 oTA 0.074 -0.124 0.078 0.! 5R ~T.0T7 0.218 -0.0 57 « <I~0T4 30 M-0.371 -0.112 *• 6 . 3 37 0.349 -0.238 0.175 7-"o72'5T 0.040 -0.123 -0.014 0.175 -0.067 -bTTi"9—' 0.070 -f'.Q8 3 0.150 0.129 -0.132 -?J709~9~ 0.032 0.246 0.038 -0.078 0.073 0.025 0. 149 • 0. 259 0.0 36 -0.153 -0. 130 0.049 0.064 0.086 1.000 0.000 O.oor. •6.263 * 0.390 1.000 -O.OOO 6 . 2 2 7 0.040 »*0.455 |»s0.3 94 0.1 90_ jS6.403 -0.2 53 -0.002 -0.003 -0.067 *-0.298_ ' "^3.7519 0.102 0.086 1 . 0 00 0 .022 0 .239 0 .226 0._213_ " 6 . 0 4 3 - 0 . 1 6 6 - 0 . 0 4 0 5*0.410 -0.072 0.119 - 0 . 1 9 8 - 0 . 0 4 9 • -0.263 "HtO.396" - 0 . 0 2 0 0 .043 0 .182 - 0 . 0 9 6 0 .059 1.000 - 0 . 0 0 0 - 0 . 0 4 9 > -0 .325 /" 0 . 2 5 4 0 .205 I .000 1.225 - 0 . 1 6 0 _ - 0 j 2 0 0 * 0 . 2 6 9 X - 0 . 2 8 6 0 . 1 8 9 - 0 . 0 5 7 * - 0 . 3 4 6 0 . 1 6 2 0 . 0 7 6 * - 0 . 3 0 8 * 67356 - 0 . 1 4 8 0 . 218 - 0 . 0 5 8 1.000 0 . 0 0 3 0 . 0 0 4 #*0.678 - 0 . 0 9 1 O . ) 4 0 0 . 0 2 0 0 . 1 5 0 0 .06 8 1 .000 0 .000 - 0 . 1 8 0 0 . 0 9 7 -•a»o_Z3_. - 0 . 1 0 1 - 0 . 0 4 8 - 0 . 0 2 6 0 . 4 6 7 * * 0 . 
3 8 0 PI 14 15 R3 1 6 AGE 17 HIST 1 9 OUR 20 21 INTERM 18 KJ_ PI R2 P3 HI ST DUR INTPPM • 00<"> 15 -0.233 16 17 ~lT 2 0 21 1 8 -n..->31 -0.0 5 5 0„075 0.142 0.198 1. Jflp 0.000 1.000 -0.000 0.001 1.000 0.089 -0.229 •-0.254 0.072 -0.047 1.234 0.108 -0.195 -0.162 **1.624 l.<"32 -0.038 1 . 0 0 1 | x - 0 . 294 ••'P.383 - i > . 0 36 1 .000 -0.067 -0.222 1.000 -0.145 1.000 K 3 APPENDIX I: CORRELATION MATRIX OF ORTHOGONALIZED WITHIN-GROUPS COMPONENTS MALES \ BPl 1 BP2 2 EP3 3 e P 4 4 HP 5 5 BP6 6 T H l 7 TH2 R TH3 9 UBl 10 UB2 11 SKI 17 SK2 1 "3 BPl BP 2 1 2 1 .000 O.OOC 1.000 —< BP3 BP4 BP 5 3 4 C 0.000 -0.000 0 .000 0.000 -0.000 0. 100 1 .000 -C.000 -C.000 1.000 O.ono 1.0 I K BP6 THl TH2 6 7 8 0.000 0.000 0.000 0.000 0.000 -0.000 -0.000 -0.000 C.OOO 0.000 -0.000 -0.000 COCO 0.000 0.000 1.000 -0.000 C.OOO 1.000 -0.155 1,000 TH3 UBl UB2 9 10 11 0.000 0.000 0.000 0.000 0.000 -0.000 C.000 0.000 -C.OOO -0.000 0.000 0.000 0.000 - 0 . 0 0 0 0 . 0 0 0 -O.OOC - C O JO 0.000 -0.047 0.000 0.000 - 0 . 0 2 7 -C.OOO 0.000 1.000 -0.000 - 0 . 0 0 0 1 . 0 0 0 0.29Q 1 . 0 0 0 SKI SK 2 SK3 12 13 11 -0.000 - 0 . 0 0 0 0.000 -0.000 - 0 . 0 0 0 0,000 C.OOO C.OOO -C.OOO - 0 . 0 0 0 -o.ooo 0.000 -0.000 0.000 0. 0 0 0 -O.OOC -O.OOC -0.. 0 0 0 -o.ooo -0.000 -0.000 -0.000 O.OCO -0.000 -0.000 -O.OCO -o.ooo -O.OOO - 0 . 0 0 0 0 . 0 0 0 - 0 . 0 0 0 0 , 0 0 0 0 . 0 0 0 1 . 0 0 0 0 . 1 4 ? - 0 . 1 3 4 1 . 0 0 0 0 . 1 1 2 R R l RR 2 RR 3 15 16 17 0.000 - 0 . 0 0 0 -0.000 0.000 - 0 . 0 0 0 0.000 -0.000 -C.OOO -C.OOO 0.000 -0.000 0.000 -0.000 -0.000 - 0 . 0 0 0 O.OOC -O.OJO O.OOC -o.ooo 0.000 o.ooo -0.000 0.000 -0.000 - 0 . 0 0 0 0 . 0 0 0 0 . 0 0 0 0 . 0 0 0 0 . 0 0 0 - 0 . 0 0 0 0 . 0 0 0 0 . 0 0 0 - 0 . 0 0 0 0 . 0 0 0 - 0 . 0 0 0 - 0 . 0 0 0 - 0 . 0 0 0 0 . 0 0 0 0 . 0 0 0 A G E HIST OUR 19 20 21 0.300 - 0.083 0.159 0.053 0.347 -0.153 -0.274 0.032 - 0 . 1 6 0 0.144 0.1 19 -0.019 0.064 -0.038 -0.05 8 0.334 -0.178 -0.012 0.314 0.158 -0.143 0.138 0.180 -0.155 0.147 -0.013 0 . 148 - 0 . 
1 5 6 0.092 - 0 , 0 8 3 0 . 0 0 4 0.027 - 0 , 0 Q 7 0 . 0 8 8 - 0 . 1 8 3 - 0 *0Q4_ - 0 . 0 0 5 0 . 1 1 6 - 0 . 1 1 5 INTERN 18 - 0 . 6 2 7 -0.049 0.084 0.055 -0.294 -0.129 0.047 - 0 . 0 0 7 0.121 -0.135 0.083 - 0 . 0 9 8 0 . 1 1 6 SK3 14 RRl 15 RR2 16 RR3 17 AGE I i HIST 20 OUR 21 INTERM 18 SK 3 RR 1 14 15 1 .000 0.000 1.000 RR 2 RR 3 AGE 16 17 19 0.000 -0.000 0 . 1 » 2 0. l i s - 0 . 136 0, 143 l.OCO 0. 1 26 -C.ioo. 1.000 -0=1.5.3 1..U.C0 HIST OUR INTERM 2C 2 1 18 -0.048 -0.050 -0.037 0.061 rt. 1 2 2 - 0 . 363 0.427 I! .009 - C ] 69 -0.054 -0.086 0.005 0. 0 30 1. . 2 1 3 - J . 2 9 7 l.CiJC 0 . 0 3 7 0.002 1.000 - 0 . 1 2 3 1 . 0 0 0 K 3 APPENDIX J: CORRELATION MATRIX OF ORTHOGONALIZED WITHIN-GROUPS COMPONENTS: FEMALES S RP1 1 BP 2 2 BP3 •a BP4 4 BP 5 5 e P 6 6 THl 7 TH2 8 TH 3 9 UB1 10 UB2 11 SKI 12 SK2 1 3 EP1 BP2 1 2 1 . 0 0 0 -0.000 1 .000 —s BP3 BP4 BP 5 4 5 -0.000 -0.000 0.000 0.000 -0.000 -0.000 l .ono -c.ooo -o.ooo l.OOC - 0 . 0 0 0 1 . 0 0 0 BP6 THl TH2 6 7 6 -0.000 0.000 0.000 -O.OOO 0.000 0.000 o.oco 0.000 c.ooo 0.000 COCO 0.000 0.000 - 0 . 0 0 0 -o.ooo 1 . 0 0 0 - 0 . 0 0 0 - 0 . 0 0 0 1.000 -0.0 99 1.000 TH3 UB1 UB2 9 10 11 -0.000 0.000 0.000 -0.000 0.000 o.ooo -c.ooo -c.ooo -c.ooo -C.OOO 0.000 0.000 0.000 - 0 . 0 0 0 -0.000 c.ooc -0.000 - 0.000 -0.028 -O.OOO -0.000 0.128 0.000 -0.000 1.0 00 -0.000 -COCO 1.000 -0.130 1.000 SKI SK2 SK3 12 13 14 0.000 -0.000 0.000 -0.000 0.000 0.000 0.000 0.000 -c.ooo -0.000 0.000 0.000 0.000 -0.000 -0.000 0.000 o.ooc -0.00 0 0.000 -0.000 0.000 0.000 C.OOO 0.000 -0.000 0.000 -0.000 -0.000 0.000 -0.000 0.000 -0.000 0.000 1.000 0.218 XUA54 1.000 0.007 RR 1 RR 2 RR3 15 16 17 -0.000 0.000 -n.ooo 0.000 0.000 0.000 c o c o -0.000 -c.ooo -0.000 0.000 0.000 0.000 -0.000 -0.000 -o.ooc -0.000 -0.000 -C.OOO -0.000 -o.ooo -0.000 -0.000 0.000 0.000 0.000 -0.000 0.000 -0.000 0.000 -0.000 0.000 o.ooo 0.000 0.000 -0.000 -0.000 -0.000 0.000 AGE HIST CUR 19 20 21 0.182 0.073 0.057 -0.012 0. 
228 0.037 0.430 -0.371 -0.112 0.038 -0.078 0.073 0. 149 0. 259 0.036 -0.130 0.049 0. 064 C.279 -0.008 0.114 -0.121 -0.031 -Q.16 9 -0.074 C.132 -0,193 6.018 -0.227 0.144 . 0.021 -0.140 0.234 -0.115 „..:. 0.211 0.066 0.010 0.046 ...... 0.056^  0.016 -0.133 -0.107 O^ QQfl 0.282 INTERM 18 -0.485 -0.29C 0.337 0.025 - 0 . 1 5 3 0.086 -0.114 -O.OIO SK3 14 RR1 15 RR2 It RR3 17 AGE 19 HIST 20 OUR 21 INTERM 18 SK 3 RR 1 14 15 1 .000 -o.ooo l.OOC RR2 RR3 AGE 16 1 7 19 -0.000 - 0 . 0 0 0 0.075 0.084 -0.017 -0.015 1 . 0 0 0 C O 10 - 0.321 1 . 0 0 0 - 0.177 1..J00 HIST OUR INTERM 20 21 18 0.192 0.177 -O.nni - 0 . 0 8 5 0.115 - 0 . 3 5 ? 0.028 - 0.178 - 0 . 1 - 9 2 6.179 - 0.191 - 0.021 -G.2 9 4 C. 383 - 0.036 l.COO -0.06 7 -0.222 l .ooo - 0.145 1.000 ( S i Appendix K: Estimated Co e f f i c i e n t s of the Growth Curve Model ( x 10 ) FEMALE SURV. MALE SURV. FEMALE NON--SURV. MALE NON-SURV CONST. LIN. QUAD. CONST. LIN. QUAD. CONST. LIN. QUAD. CONST. LIN. QUAD. SYS 22994 -175 105 22344 -143 -5 31692 -59 -58 29389 102 -56 DIA 24428 -154 106 21008 -72 160 34800 -237 -75 29824 -121 -65 *HSYM 2089 32 -14 1 36 28 3316 135 7 3660 208 -39 *HSIZ 3136 61 -17 2376 66 -17 5319 115 -28 4352 130 -67 HECG 9664 337 64 9155 217 110 14957 668 165 18815 664 -19 *BACH 1259 -79 34 861 -46 6 2244 -103 9 155 -71 59 BCVA 3516 -160 196 -1556 -104 111 -4814 -191 269 -1056 -102 56 *KSYM 686 34 7 -237 26 15 2032 70 -1 1923 144 -3 *KPR0 669 -23 15 1081 2 -18 3870 41 -41 1948 131 -15 *KSPN 1195 31 19 10428 28 -16 36104 82 -26 25252 177 -8 RVES 14362 101 18 10206 187 55 17485 402 68 20157 879 -65 *RRET 688 2 1 255 -9 -1 1732 170 4 1861 124 -5 RPAP 1915 -34 -27 753 12 -7 1850 17 -19 16562 261 -156 * Transformed by adding 1.0, then taking the base 10 logarithm APPENDIX L: CORRELATION CHART FOR COMBINED SAMPLE, TIME ZERO DATA POSITIVE CORRELATION VARIABLE 0.10-0.21 0.0-0.10 0.00-0.08 0.08-0.16 0.16-0.24 0.24-0.32 0.32-0.40 0.40-0.48 0.48-0.56 0.56-0.64 0.64-0.72 0.72-0.81 AGE (AG) HT, BA RP XD, KY, KP SE ND, RR 
SY, DI, HI KN DU, NS, HY BC, RV XS, HE HIST (HT) BC, AG, NS ND DU, SY, HY HI, KP, KN XS, DI, SE HE, KY, RR BA, RP XD DUR (DU) HT, BA, HI XD, HE, KY RP, SE XS, NS, ND SY, DI, HY2 RV AG MAXSYS (XS) HT DU, BA, BC KN, SE HI, HE, KY RP AG, HY, KP ND, RR RV NS XD SY, DI MAXDIA (XD) BC, SE AG, DU HT, BA, RP HY, HI, HE KN KY ND, RR NS, KP, RV XS, SY DI MINSYS (NS) HT SE DU, HI, BA BC, KY HY AG, HE, KN KP, RV, RR RP XD XS, SY, DI ND MINDIA (ND) HT, SE AG, DU, HY HI, BA, BC KY HE, KN, RP KP, RV, RR XS, XD SY, DI NS SYSO (SY) HT, BC DU, SE AG, BA HI, HE, KY RP HY, KP, KN RR ND, RV NS XD XS, DI DIAO (DI) BC HT, SE DU, BA AG HI, HE, KN RP HY, KY, KP RR RV ND NS XS, XD, SY HSYMO (HY) HT DU, ND, BC SE NS, KP, KN RR, RP AG, XD, BA XS, SY, DI HE, KY, RV HI HSIZO (HI) HT, DU NS, ND, BC SE, RP AG, SE, KN XS, XD, SY DI, RR, KP HE, RV, BA KY HY HECGO (HE) SE HT, DU BA, RP KY, RR XS, XD, NS ND, SYj DI 3 AG, HY, HI KP, KN, RV BACHO (BA) AG DU HT, XS, NS ND , DI, HE 4 XD, SY, SE HY, KP, RV RP HI, RR KY BCVAO (BC) HT XD, SY, DI KN, RR, RP SE DU, XS, NS ND, HY, HI 5 KY, RV AG, HE KSYMO (KY) AG, HT, DU NS ND, HE, BC SE XS, SY, KN RP XD, DI, HY KP HI, RV BA, RR KPROO (KP) HT AG, SE DU, BC HY HI, BA XS, NS, ND SY, DI, HE 6 RP XD, KN, RR RV KSPNO (KN) HT, SE BC DU, BA AG, XS, HY HI XD, NS, ND DI, KY SY, HE RV, RP KP, RR RVESO (RV) HT SE DU, BC AG, BA NS, ND, HY HI, HE DI, KY, KN RP XS, XD, SY KP RR RRETO (RR) HT, BC, SE AG, DU HY, HE HI NS, ND, DI BA XS, XD, SY KY, KP, KN RP RV RPAPO (RP) SE AG DU, BC HT, HI, HE HY XS, XD, ND SY, DI, BA6 NS KP, KN, RV RR SEX (SE) ND, RP XD, NS, HE KN AG, HT, DU DI, BC, KP' SY, HY XS, HI, BA KY -INTERV BC HT, DU AG, SE HY, HI, HE BA NS, KY, KP XS, ND, RP XD, SY, DI KN, RV, RR 0.10-0.21 0.00-0.10 0.00-0.08 0.08-0.16 0.16-0.24 0.24-0.32 0.32-0.40 0.40-0.48 0.48-0.56 0.56-0.64 0.64-0.72 0.72-0.81 RV 2BC, KP, KN, RR 3BC 4BC, KN 5BA, KP °KY RV, RR 
