STATISTICAL ANALYSIS OF SURVIVAL DATA: AN APPLICATION TO PERIPHERAL VASCULAR BYPASS SURGERY

BY

PREETHI NIRMALIE KOTTEGODA

B.Sc., The University of Sri-Jayawardenepura, Sri Lanka, 1981

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

in

THE FACULTY OF GRADUATE STUDIES
(The Department of Statistics)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
November, 1985
(c) Preethi Nirmalie Kottegoda, 1985

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Statistics
The University of British Columbia
1956 Main Mall
Vancouver, Canada V6T 1Y3

ABSTRACT

A retrospective study was carried out on 535 patients who underwent bypass surgery for peripheral vascular disease. Survival data for 303 of these 535 cases are subjected to quantitative analysis. The main interest is in the survival of these patients, in order to identify the risk factors. The importance of the type of grafting technique in long-term survival is also considered. Statistical methods used to ascertain the important prognostic variables include Cox's proportional hazards model, stepwise regression and all subsets regression in the proportional hazards model discussed by Kuk (1984).

In descending order of significance, the most important variables are myocardial infarction, presence or absence of hypertension, sex and whether or not a revision operation was done. The variable history of a previous coronary bypass graft is highly correlated with survival, but its significance cannot be compared with that of the other significant variables using Cox's model. Age is also related to survival in this data set. However, since there is no control group, one cannot draw a strong conclusion about the effect of age on survival of the patients who have had surgery for peripheral vascular disease.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENT
INTRODUCTION
Chapter 1  DETAILS AND BACKGROUND OF DATA
    1.1  MEDICAL ASPECTS
    1.2  SOURCE OF DATA AND HOW IT WAS COLLECTED
    1.3  CLEANING UP OF DATA
    1.4  SUMMARY STATISTICS
Chapter 2  COX'S REGRESSION MODEL
    2.1  GENERAL THEORY FOR COX'S MODEL
    2.2  APPLICATIONS AND RESULTS FROM COX'S MODEL
    2.3  THEORY FOR STEPWISE REGRESSION IN COX'S MODEL
    2.4  RESULTS FROM STEPWISE REGRESSION
    2.5  CHECKING FOR PROPORTIONALITY ASSUMPTION AND ADEQUACY OF THE FIT IN COX'S MODEL
Chapter 3  ALL POSSIBLE SUBSETS REGRESSION IN COX'S MODEL
    3.1  THEORY
    3.2  APPLICATIONS AND RESULTS
Chapter 4  CHECKING FOR INFLUENTIAL OBSERVATIONS
Chapter 5  CONCLUSIONS
BIBLIOGRAPHY
APPENDIX 1
APPENDIX 2
LIST OF TABLES

TABLE
I      Variables Associated with the Study
II     Frequency Distribution of Variables
III    Frequency Distribution of Operation Type
IV     Frequency Distribution of Follow-up Information
V      Regression Coefficients for Cox's Model
VI     Regression Coefficients for Cox's Model; Varying Month of Death and/or Operation
VII    Regression Coefficients for Cox's Model; Survival Time in Years and True Survived Patients
VIII   Regression Coefficients for Cox's Model; Survival Time in Months and True Survived Patients
IX     Estimated Correlation Matrix
X      Parameter Estimates from Stepwise Regression
XI     Two-way Contingency Tables
XII    Summary of Influence Function Values
XIII   Proportional Hazards Regression Model

LIST OF FIGURES

FIGURE
1   Log minus log survival function for MI
2   Log minus log survival function for AGE
3   Log minus log survival function for HYPT
4   Log minus log survival function for SEX
5   Log minus log survival function for ADDOP
6   Log minus log survival function for D2
7   Residual plot for checking proportional hazards model

ACKNOWLEDGEMENT

I would like to thank Dr. N. Reid for her guidance and assistance in producing this thesis. I am also indebted to Dr. M. Schulzer for his careful reading and helpful criticisms of this thesis.
I would like to express my gratitude to Dr. M.T. Janusz, who kindly provided the data and offered advice on the medical aspects of this thesis. The financial support of the University of British Columbia is gratefully acknowledged.

INTRODUCTION

Bypass surgery for peripheral vascular disease has been gaining wide acceptance as an effective alternative to amputation. Although there are controversies about the surgical techniques, less attention has been directed to the evaluation of risk factors. Because many people are interested in various survival studies, most surgical centres keep follow-up records of the survival experience of their patients. In this study, retrospectively obtained records of one such centre are subjected to quantitative analysis in order to identify factors affecting survival.

Clinical details of the bypass procedures are presented in section 1.1, while the background of the data is given in section 1.2. In Chapters 2 and 3, answers to the following questions are sought:

1) What factors are the most important in predicting survival?
2) How does each bypass technique affect survival?

The statistical methods employed to answer these questions are Cox's proportional hazards regression, stepwise regression, all subsets regression in the proportional hazards model and contingency table analysis. A method of detecting any influential observations is discussed in Chapter 4.
Conclusions and suggestions are given in Chapter 5.

Chapter 1

DETAILS AND BACKGROUND OF DATA

Section 1.1 MEDICAL ASPECTS

Diseases involving peripheral blood vessels, that is, blood vessels in the arms and legs, are known as peripheral vascular diseases. Bypass surgery for peripheral vascular disease is a highly accepted surgical treatment. It reduces the number of amputations, which had been the most common surgical procedure available. Different types of bypass procedures are used depending on the patient's condition. Each surgeon has somewhat different criteria in selecting patients. Another bias introduced is the surgeon's preference for one surgical technique over another. While the results of these operative procedures have been studied extensively, less attention has been directed to the evaluation of risk factors.

In this study we are interested in the survival of patients undergoing surgery for peripheral vascular disease, in order to identify the risk factors. We are particularly interested in the survival of the patients with deaths due to cardiac disease, in order to identify high, medium, and/or low risk patient groups in the hope of identifying populations who are likely to benefit from aggressive investigation of their heart.

Aortobifemoral bypass grafting has become the procedure of choice for most patients with occlusive disease of the aortic bifurcation, which is the junction where the abdominal aorta divides into the left and right branches. These two branches are the left and right common iliac arteries. In this technique, the graft is extended to the left femoral artery (in the left leg) and the right femoral artery (in the right leg) because aortic flow will be better when both sides are revascularized.
By taking the graft to the femoral arteries, most of the disease is bypassed. The most popular and commonly used grafting material is Dacron, which can be woven or knitted. Usually, a Dacron tube with two limbs is used for Aortobifemoral grafting. The proximal end of the graft is sutured to a small hole cut in the front of the aorta. This process is called an end-to-side anastomosis. Sometimes the aorta can be completely divided and the proximal end of the graft anastomosed end-to-end. Distally, one limb of the graft is sutured end-to-side to a hole cut in the right femoral artery and, similarly, the other limb to the left femoral artery.

The Femoropopliteal bypass procedure is used to bypass occlusion of the superficial femoral artery when there is an adequate flow in the popliteal artery in the leg. The most acceptable grafting material currently available is the reversed saphenous vein. This vein possesses valves which only allow the flow of blood towards the heart. It is therefore necessary to remove an appropriate length of the vein and to reverse its direction before grafting it to the artery. One end of the reversed vein is stitched to a small longitudinal incision made in the popliteal artery and the other end to a similar cut made in the common femoral artery. Both anastomoses are performed end-to-side.

An aneurysm is an abnormal dilatation of a blood vessel, usually forming a pulsating tumour. Abdominal Aortic Aneurysm is the most commonly seen aneurysm. It consists of a weakening of the arterial wall of the aorta, so that it is likely to be stretched by the force of arterial blood pressure. When the wall is weakened, the whole vessel tends to dilate, but if the vessel wall is weaker over one area, that part of the vessel is liable to blow out and form an aneurysm.
Tube graft, end-to-end bifurcation graft from the aorta to the right common iliac artery, or end-to-side bifurcation graft to the left external iliac artery are some of the possible types of reconstruction for this disease. Usually, a woven Dacron graft is preferred as the grafting material.

Other types of peripheral vascular operations include the Axillofemoral bypass graft, in which the axillary artery (in the arm) and the common femoral artery are involved. One end of the graft is stitched on to a small cut made in the axillary artery and the other end to a similar cut made in the common femoral artery. When one iliac artery in a leg is severely occluded and the iliac artery in the other leg is a suitable donor vessel, blood can be delivered to the ischemic end via a Femoral-Femoral bypass, Iliac-Iliac bypass or Iliac-Femoral bypass.

There are several other operation techniques and bypass procedures for peripheral vascular disease, but the ones described above are the most common. In fact, Aortobifemoral and Femoropopliteal procedures account for the majority of bypasses in peripheral vascular disease.

Section 1.2 SOURCE OF DATA AND HOW IT WAS COLLECTED

The data analysed here are a collection of observations and measurements from reports on patients who had undergone peripheral vascular surgery at St. Paul's Hospital (Vancouver, B.C.) between 1975 and 1977. The data are recorded both on data sheets and on individual patient cards. The information contained on them is almost the same, except that the latter hold only a summary.
A retrospective study on 535 patients was performed in October 1981. The information collected on each patient is name, age, sex, type of operation (Aortobifemoral grafting (ABF), Femoropopliteal grafting (FP), Abdominal Aortic Aneurysm (AAA) or other peripheral vascular operations), the patient's preoperative symptoms (ischemia or claudication), whether the patient had a previous vascular operation and whether revisions of peripheral vascular operations were performed. Also recorded are the presence or absence of angina, history of a previous myocardial infarction or a previous coronary bypass graft, and the presence or history of diabetes or hypertension. Patient deaths are recorded as being "early", which is within 30 days of surgery, or "late", which is beyond 30 days. Cause of death is recorded on the data sheets and noted on the cards as being cardiac or non-cardiac. The date of operation and date of death are recorded by year and month (in 341 cases out of 535), although in some cases the day is also recorded.

The data were recorded manually on data sheets and then a summary of these details was noted on patient cards, which are easy to read and handle. One data sheet may hold information on more than one patient, whereas each patient has exactly one patient card. In February 1985, records of these 535 patients were converted to computer files. Reprints of the data sheet, the patient card and the format used for converting to computer files are included in Appendix 1.

Section 1.3 CLEANING UP OF DATA

When the statistical analysis was carried out, not all of the 535 patients, nor all of the variables, were used, for several reasons. There were 89 cases excluded from the study because their year of operation and/or death was unknown. Another 143 cases were deleted because some of their variables had missing observations.
It was noted that some patients had more than one operation type at the initial operation. Hence operation type was partitioned into 15 mutually exclusive subsets, as shown in Table III. The type OTHER includes peripheral vascular operations other than ABF, FP and AAA. Subsets 9, 10, 11 and 14 were automatically excluded because the patients belonging to those subsets were among the 232 deleted cases. Subsets 5, 6, 7 and 8 were pooled together, and four indicator variables were defined to represent the differences in survival rates between the five categories of operation type. Pooling was done to avoid having too many variables in the model.

Patients within a data sheet were ordered alphabetically and their records were entered into the computer file in this order. In the analysis each patient was identified by two labels, namely the sequence number and the page number. The former is the order in which they were entered into the computer file and the latter is the number corresponding to their data sheet.

Survival times were measured in months rather than in years because the former are more spread out. As noted in section 2.2, using the month or the year did not make any drastic changes in the significance of variables nor in the estimated coefficients.

There were 72 cases in which the month of operation and/or death was not recorded, and in such situations it was assumed that the month was June. This was done to avoid further deletion of cases, which would have made the sample size small. As noted in section 2.2, assuming the unknown month to be January or December did not make any drastic changes in the significance of variables nor in the estimated coefficients. Hence throughout the study the unknown month of operation and/or death was assumed to be June.
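The June convention can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `survival_months`, its arguments and the example patient are hypothetical and do not come from the thesis or from the BMDP input format, and the thesis does not state exactly how the month difference was computed.

```python
def survival_months(op_year, op_month, end_year, end_month, default_month=6):
    """Months between operation and death (or last follow-up).

    A month given as None is treated as unrecorded and replaced by
    default_month (6 = June), mirroring the convention described above.
    All names here are hypothetical.
    """
    start = op_month if op_month is not None else default_month
    end = end_month if end_month is not None else default_month
    return (end_year - op_year) * 12 + (end - start)


# Hypothetical patient: operated in 1976 (month unknown), died in March 1979.
# Varying the assumed month shows how sensitive the survival time is to it.
for assumed in (1, 6, 12):  # January, June, December
    print(assumed, survival_months(1976, None, 1979, 3, default_month=assumed))
```

With this rule a same-year record with a late known month and an unknown second month could give a negative time; how such cases were handled in the thesis is not stated.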
As we are particularly interested in survival with respect to cardiac disease, all non cardiac deaths and alive patients were treated as censored observations. Out of the 303 cases there were 45 deaths, 255 censored observations and 3 patients lost to follow-up. The 255 censored observations comprised 58 non cardiac deaths and 197 alive patients. Even if the true survived patients were used, the results do not change drastically. This is noted in section 2.2.

The data file in its final form had 15 variables as well as all the follow-up information. Table I gives the variable names and their descriptions.

TABLE I. Variables Associated with the Study

VARIABLE NAME      VARIABLE DESCRIPTION
AGE                Age; range is 30 to 97 years
SEX                Sex; 0 = males, 1 = females
ISCH               Symptoms of ischemia; 1 = yes, 0 = no
CLAUD              Symptoms of claudication; 1 = yes, 0 = no
PVOP               A previous vascular operation done; 1 = yes, 0 = no
ANGINA             Presence or absence of angina; 1 = present, 0 = absent
MI                 History of myocardial infarction; 1 = yes, 0 = no
DIAB               History of diabetes; 1 = yes, 0 = no
HYPT               History of hypertension; 1 = yes, 0 = no
ADDOP              Revisions of peripheral vascular operations; 1 = yes, 0 = no
PCBG               Previous coronary bypass graft done; 1 = yes, 0 = no
(D1, D2, D3, D4)   (0,0,0,0) if FP only
                   (1,0,0,0) if ABF only
                   (0,1,0,0) if OTHER only
                   (0,0,1,0) if AAA only
                   (0,0,0,1) if ANY TWO
D1                 Indicator variable representing the difference between operation type FP and ABF
D2                 Indicator variable representing the difference between operation type FP and OTHER
D3                 Indicator variable representing the difference between operation type FP and AAA
D4                 Indicator variable representing the difference between operation type FP and ANY TWO

Section 1.4 SUMMARY STATISTICS

Statistics given in the following tables are based on the 15 variables and 535 cases. The values in parentheses correspond to the 303 cases used in the analysis.

TABLE II. Frequency Distribution of Variables

VARIABLE NAME             Present      Absent       Missing
ISCH                      101 (80)     434 (223)    0 (0)
CLAUD                     96 (55)      439 (248)    0 (0)
PVOP                      106 (72)     429 (231)    0 (0)
ANGINA                    79 (45)      456 (258)    0 (0)
MI                        100 (65)     435 (238)    0 (0)
DIAB                      46 (30)      489 (273)    0 (0)
HYPT                      150 (87)     382 (216)    3 (0)
ADDOP                     120 (49)     415 (254)    0 (0)
PCBG                      15 (6)       520 (297)    0 (0)
D1,D2,D3,D4 = 0,0,0,0     128 (80)     386 (223)    21 (0)
            = 1,0,0,0     97 (60)      417 (243)    21 (0)
            = 0,1,0,0     53 (34)      461 (269)    21 (0)
            = 0,0,1,0     73 (40)      441 (263)    21 (0)
            = 0,0,0,1     163 (89)     351 (214)    21 (0)

SEX                       males = 388 (217), females = 147 (86)

TABLE III. Frequency Distribution of Operation Type

Subset   Subset Name           Frequency
1        FP only               128 (80)
2        ABF only              97 (60)
3        OTHER only            53 (34)
4        AAA only              73 (40)
5        ABF + FP              15 (8)     (pooled as ANYTWO)
6        ABF + OTHER           52 (30)    (pooled as ANYTWO)
7        ABF + AAA             30 (10)    (pooled as ANYTWO)
8        FP + OTHER            66 (41)    (pooled as ANYTWO)
9        FP + AAA              2
10       OTHER + AAA           10
11       ABF + FP + OTHER      3
12       ABF + FP + AAA        0
13       FP + OTHER + AAA      0
14       ABF + OTHER + AAA     6
15       ALL FOUR              0

TABLE IV.
Frequency Distribution of Follow-up Information

CAUSE OF DEATH   Early Death   Late Death   Alive        Unknown   Total
Non Cardiac      25 (23)       38 (35)      0 (0)        0 (0)     63 (58)
Cardiac          7 (4)         54 (41)      0 (0)        0 (0)     61 (45)
Alive            0 (0)         0 (0)        387 (197)    0 (0)     387 (197)
Unknown          0 (0)         8 (0)        0 (0)        16 (3)    24 (3)
Total            32 (27)       100 (76)     387 (197)    16 (3)    535 (303)

Chapter 2

COX'S REGRESSION MODEL

Section 2.1 GENERAL THEORY FOR COX'S MODEL

There have been many articles in the recent literature on the application of regression analysis to data with censored observations (e.g. Cox, 1972; Miller, 1981; Kalbfleisch and Prentice, 1980).

Let $T$ denote the random failure time with density function $f(t)$ and distribution function $F(t)$. The survival function $S(t)$ is defined to be the cumulative probability of survival past time $t$ and is given by

$$S(t) = \Pr\{T > t\} = 1 - F(t)$$

The hazard function $\lambda(t)$ has the interpretation

$$\lambda(t)\,dt = \Pr\{t \le T \le t + dt \mid t \le T\}$$

Then,

$$\lambda(t) = \frac{f(t)}{1 - F(t)}$$

Hence we have

$$S(t) = \exp\left\{-\int_0^t \lambda(x)\,dx\right\}$$

One of the important goals is to estimate the survival function. If the parametric form of $f(t)$ is known, then once we have the maximum likelihood estimates for the parameters, $S(t)$ can be estimated. For example, if $f(t) = \mu e^{-\mu t}$, then $\lambda(t) = \mu$ and $S(t) = e^{-\mu t}$. If the maximum likelihood estimate of $\mu$ is $\hat\mu$, then the maximum likelihood estimate of $S(t)$ is $e^{-\hat\mu t}$. However, if the parametric form of $f(t)$ is unknown, a non parametric estimate for $S(t)$ can be obtained using the empirical survival function. If there is no censoring, the empirical survival function based on a sample of size $n$ is given by

$$\hat S(t) = \frac{1}{n}\,\{\text{number of observations} \ge t\}, \qquad t \ge 0$$

When dealing with censored data, this equation has to be modified. Consider $n$ individuals and assume that $t_1 < t_2 < \cdots < t_K$ are $K$ ($\le n$) distinct times at which deaths occur.
Let

$d_i$ = number of deaths at time $t_i$,
$n_i$ = number of individuals "at risk" at time $t_i$, that is, the number of individuals alive just prior to time $t_i$.

In addition to the life times $t_1, t_2, \ldots, t_K$, there are also censoring times $c_j$ for individuals whose life times are not observed. Then an estimate of $S(t)$ is defined as

$$\hat S(t) = \prod_{i:\, t_i < t} \left(1 - \frac{d_i}{n_i}\right)$$

This is called the Kaplan-Meier estimate of the survival function and is a kind of non parametric maximum likelihood estimate (Kaplan and Meier, 1958). This estimate is a step function with a unit value at $t = 0$ which drops by a factor $(1 - d_i/n_i)$ after $t = t_i$. It does not change at the $c_j$'s. However, the effect of the censoring times is incorporated into the $n_i$'s and hence into the sizes of the jumps in $\hat S(t)$.

Typically, the failure time depends upon quantitative or qualitative explanatory variables known as covariates, such as age, sex and type of medical treatment. Effects of these covariates on the life times can be studied using a kind of regression model called Cox's model. Let $Z$ be the vector of covariates and $\beta$ be a vector of unknown coefficients. Then Cox's model specifies

$$\lambda(t; Z) = \lambda_0(t) \exp\{Z^T \beta\},$$

where $\lambda(t; Z)$ is the hazard rate with covariate vector $Z$ and $\lambda_0(t)$ is the hazard rate with $Z = 0$ (Cox, 1972). The regressor variables here are the covariates, and changes in these change the hazard function in a multiplicative way. Such a model is called a proportional hazards model. When $\beta$ is estimated and tested for significance, one can finally select a set of significant covariates that would predict the hazard rate.

Estimates of the regression parameters are obtained by maximizing the partial likelihood function given by (Cox, 1975)

$$L(\beta) = \prod_{i=1}^{K} \frac{\exp(Z_i^T \beta)}{\sum_{j \in R_i} \exp(Z_j^T \beta)}$$

where $Z_i$
= covariate vector of the $i$th individual (the one who dies at $t_i$),
$R_i$ = set of individuals at risk just prior to $t_i$.

When there are ties among the death times, the partial likelihood function proposed by Breslow (1974),

$$L(\beta) = \prod_{i=1}^{K} \frac{\exp(S_i^T \beta)}{\left[\sum_{j \in R_i} \exp(Z_j^T \beta)\right]^{d_i}},$$

is maximized. Here, $d_i$ is the number of deaths at time $t_i$ and $S_i$ is the vector sum of the covariates of the $d_i$ individuals.

Section 2.2 APPLICATIONS AND RESULTS FROM COX'S MODEL

The sample size for this analysis was 303. The variable HISTORY OF PREVIOUS CORONARY BYPASS GRAFT was perfectly ordered with time; that is, all people who had had a previous coronary bypass had survival times less than 28 months and the patients who had not undergone a coronary bypass graft had survival times greater than 28 months. Hence it is clear that HISTORY OF PREVIOUS CORONARY BYPASS GRAFT is a variable which is highly correlated with survival. Because the variable was ordered with time, the partial likelihood is maximized at infinity, and therefore the coefficient cannot be estimated. Since Cox's model cannot be used with such a variable in the model, it was excluded from the computer analyses. The other 14 variables used for this analysis were AGE, SEX, ISCH, CLAUD, PVOP, ANGINA, MI, DIAB, HYPT, ADDOP, D1, D2, D3 and D4, where D1, D2, D3 and D4 are dummy variables defined in Table I.

The regression analysis was carried out using the computer package BMDP program 2L. The logarithm of the maximized partial likelihood function, the global chi-square and its p-value, as well as the estimated coefficients, their asymptotic standard errors and the standardized coefficients for each covariate, are presented in Table V. Here, the unknown month of operation and/or death was assumed to be June.
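The statement that the partial likelihood is "maximized at infinity" for a covariate that is perfectly ordered with time can be illustrated numerically. The sketch below evaluates the Breslow log partial likelihood of Section 2.1 for a single covariate; it is a minimal illustration, not the BMDP 2L algorithm, and the function name and the six-observation toy data set are assumptions made up for this example (they are not thesis data).

```python
import math


def breslow_log_partial_likelihood(times, events, z, beta):
    """Breslow log partial likelihood for one covariate.

    times  -- observed times (death or censoring)
    events -- 1 if the time is a death, 0 if censored
    z      -- covariate value of each individual
    beta   -- coefficient at which the function is evaluated
    """
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    loglik = 0.0
    for t in death_times:
        # covariates of the d_i individuals dying at t (their sum is S_i)
        dead = [zj for tj, ej, zj in zip(times, events, z) if ej == 1 and tj == t]
        # R_i: everyone still under observation just prior to t
        at_risk = [zj for tj, zj in zip(times, z) if tj >= t]
        denom = sum(math.exp(beta * zj) for zj in at_risk)
        loglik += beta * sum(dead) - len(dead) * math.log(denom)
    return loglik


# Toy data (hypothetical): every z = 1 individual dies before any
# z = 0 individual, so z is perfectly ordered with time.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 0, 1]
z = [1, 1, 1, 0, 0, 0]

for beta in (0.0, 1.0, 2.0, 5.0, 10.0):
    print(beta, round(breslow_log_partial_likelihood(times, events, z, beta), 4))
```

For these toy data the printed values keep increasing with beta and approach, but never reach, an upper bound, so no finite maximizer exists. This is the same behaviour that prevented a coefficient from being estimated for the coronary bypass variable.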
The global chi-square statistic tests the hypothesis that all coefficients are identically zero. This statistic is defined as

$$U(0)^T\, I(0)^{-1}\, U(0)$$

where $U(0)$ represents the vector of first derivatives of the log partial likelihood function evaluated at $\beta = 0$ and $I(0)$ denotes the observed information matrix evaluated at $\beta = 0$. The global chi-square has an asymptotic chi-square distribution with degrees of freedom equal to the number of covariates in the model.

The regression coefficient indicates the relationship between the covariate and the hazard function. The effect of a unit change in a variable $X_i$ on the hazard function is estimated by $e^{\hat\beta_i}$, with all other $X$'s held fixed. A positive coefficient increases the value of the hazard function, and therefore survival deteriorates with increasing values of the variable, provided that the covariates are reasonably independent of one another. (A negative coefficient has the reverse interpretation.)

TABLE V. Regression Coefficients for Cox's Model

Log likelihood = -218.2350
Global chi-square = 53.2400, D.F. = 14, p-value = 0.0000

VARIABLE NAME   COEFFICIENT   STANDARD ERROR   STANDARDIZED COEFFICIENT   P-VALUE
AGE              0.0368       0.0163            2.27                      0.005
SEX             -0.8954       0.4302           -2.08                      0.009
ISCH             0.2854       0.3689            0.77                      0.180
CLAUD           -0.5094       0.4932           -1.03                      0.200
PVOP            -0.2491       0.3819           -0.65                      0.780
ANGINA           0.2599       0.3865            0.67                      0.340
MI               1.1720       0.3346            3.50                      0.000
DIAB             0.4045       0.4135            0.99                      0.110
HYPT             0.9028       0.3354            2.69                      0.004
ADDOP           -0.7097       0.5018           -1.41                      0.09
D1              -0.4222       0.5568           -0.76                      0.24
D2               0.7896       0.4536            1.74                      0.04
D3              -0.8628       0.5624           -1.53                      0.07
D4              -0.2016       0.4242           -0.48                      0.34

The global chi-square p-value of 0.0000 indicates that not all the coefficients are zero. AGE, SEX, MI, HYPT and D2 are highly significant, whereas D3 and ADDOP appear to have a fairly significant effect on the hazard function.

The above regression analysis was carried out similarly with the unknown month of death and/or operation assumed to be January and December, and the results are shown in Table VI. (The values in parentheses correspond to December.) It is clear that the values of the coefficients do not change very much when compared to the values given in Table V. The significant variables turn out to be the same. Hence all further analyses are done with the unknown month of death and/or operation assumed to be June.

The same regression analysis was repeated, once with survival times measured in years and true survived patients, and again with survival times measured in months and true survived patients. The corresponding results are presented in Table VII and Table VIII respectively. It is clear that in both these tables the values of the coefficients do not change very much when compared to the values given in Table V. The significant variables turn out to be the same. Hence, we use the survival times in months and consider non cardiac deaths and alive patients as censored observations.

TABLE VI.
Regression Coefficients for Cox's Model; Varying Month of Death and/or Operation

Log Likelihood = -222.3807 (-210.2653)
Global chi-square = 52.32 (53.05), D.F. = 14, p-value = 0.00 (0.00)

VARIABLE NAME   COEFFICIENT          STANDARD ERROR     STANDARDIZED COEFFICIENT
AGE              0.0355 ( 0.0358)    0.0162 (0.0160)     2.19 ( 2.24)
SEX             -0.8929 (-0.8820)    0.4281 (0.4365)    -2.09 (-2.02)
ISCH             0.2523 ( 0.3553)    0.3677 (0.3705)     0.67 ( 0.96)
CLAUD           -0.5596 (-0.4384)    0.4921 (0.4886)    -1.14 (-0.90)
PVOP            -0.2202 (-0.2479)    0.3783 (0.3884)    -0.58 (-0.64)
ANGINA           0.2458 ( 0.2414)    0.3850 (0.3893)     0.64 ( 0.62)
MI               1.1895 ( 1.1427)    0.3332 (0.3394)     3.58 ( 3.37)
DIAB             0.4273 ( 0.3208)    0.4139 (0.4187)     1.03 ( 0.77)
HYPT             0.8415 ( 0.9040)    0.3312 (0.3399)     2.54 ( 2.66)
ADDOP           -0.6064 (-0.9413)    0.4909 (0.5243)    -1.24 (-1.80)
D1              -0.4991 (-0.3987)    0.5532 (0.5559)    -0.90 (-0.72)
D2               0.7520 ( 0.7632)    0.4483 (0.4521)     1.68 ( 1.69)
D3              -0.8791 (-0.8325)    0.5609 (0.5609)    -1.57 (-1.48)
D4              -0.2490 (-0.1465)    0.4183 (0.4290)    -0.60 (-0.34)

TABLE VII. Regression Coefficients for Cox's Model; Survival Time in Years and True Survived Patients

Log Likelihood = -243.8209
Global chi-square = 59.94, D.F. = 14, p-value = 0.0000

VARIABLE NAME   COEFFICIENT   STANDARD ERROR   STANDARDIZED COEFFICIENT
AGE              0.0362       0.0156            2.32
SEX             -0.8399       0.4123           -2.04
ISCH             0.2206       0.3562            0.62
CLAUD           -0.5209       0.4812           -1.08
PVOP            -0.2106       0.3568           -0.59
ANGINA           0.2303       0.3684            0.63
MI               1.1806       0.3244            3.64
DIAB             0.3916       0.4084            0.96
HYPT             0.9097       0.3107            2.93
ADDOP           -0.6639       0.4912           -1.35
D1              -0.4417       0.5477           -0.81
D2               0.7892       0.4442            1.78
D3              -0.8270       0.5224           -1.58
D4              -0.2354       0.4109           -0.57

TABLE VIII. Regression Coefficients for Cox's Model; Survival Time in Months and True Survived Patients

Log Likelihood = -233.5218
Global chi-square = 54.82, D.F.
= 14, p-value = 0.0000

VARIABLE NAME   COEFFICIENT   STANDARD ERROR   STANDARDIZED COEFFICIENT
AGE              0.0364       0.0156            2.33
SEX             -0.8876       0.4325           -2.05
ISCH             0.2630       0.3618            0.72
CLAUD           -0.5318       0.4812           -1.11
PVOP            -0.2311       0.3880           -0.65
ANGINA           0.2581       0.3922            0.66
MI               1.1650       0.3210            3.63
DIAB             0.3921       0.4025            0.97
HYPT             0.9130       0.3218            2.84
ADDOP           -0.6528       0.4931           -1.32
D1              -0.4350       0.5529           -0.78
D2               0.7725       0.4512            1.71
D3              -0.8510       0.5583           -1.52
D4              -0.2182       0.4217           -0.52

TABLE IX. Estimated Correlation Matrix

             (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14)
AGE (1)      1.0
SEX (2)      -.11 1.0
ISCH (3)     -.05 -.10 1.0
CLAUD (4)    -.09
PVOP (5)     .02 .01 -.05 .10 1.0
             .08 .16 1.0
             .05 .07 -.01 1.0
ANGINA (6)   -.09 .01
MI (7)       -.05 .09 -.02
DIAB (8)     -.11 .07 -.10 -.09 -.13 .04 -.01 -.09 1.0
HYPT (9)     .03 -.02
D1 (10)      .04 -.04 -.07 -.02 -.01 -.08 -.07 .16 .05 1.0
D2 (11)      .10 .16 .07
D3 (12)      -.11 .06
D4 (13)      -.09 .02 -.10
ADDOP (14)   .05 .08 -.15 -.18 .05 -.10 1.0
             .07 .05 -.22 -.07 -.20 -.11 .06 .14 .13 .16 .11 -.05 1.0
             .02 .03 1.0
             .11 -.11 .09 -.12 .17 .19 1.0
             .13 -.10 -.15 -.14 .12 -.13 .09 .41 0.02 1.0
             .05 -.05 .02 .07 .10 -.04 -.16 .07 -.09 .09 .06 1.0

According to medical reports it was suspected that ISCH and AGE were correlated. To examine this first order association, a simple contingency table was constructed and the null hypothesis of independence of ISCH and AGE was tested by Pearson's chi-square goodness of fit test. For the purpose of this analysis AGE was categorized into 4 groups. The 2x4 contingency table for ISCH vs. AGE is

                         AGE
ISCH        < 40 yr   41-60 yr   61-80 yr   > 80 yr   Total
absent      6         139        275        14        434
present     0         26         70         5         101
Total       6         165        345        19        535

with a Pearson $\chi^2_3$ of 3.58 and a significance level of 0.31. From Table IX we have the correlation coefficient between AGE and ISCH as -0.05. Significance of this sample correlation coefficient can be tested using the following test (Anderson, 1984, p.109).
If $\gamma$ is the sample correlation coefficient between two variables, then the null hypothesis that the population correlation between the two variables equals zero is rejected if

$$\sqrt{N-2}\,\frac{|\gamma|}{\sqrt{1-\gamma^2}} > t_{N-2}(\alpha)$$

where $N$ is the sample size and $t_{N-2}(\alpha)$ is the two-tailed significance point of the t-distribution with $(N-2)$ degrees of freedom for significance level $\alpha$.

Using this test for the sample correlation coefficient between AGE and ISCH, the significance level turned out to be 0.2. Similarly the other coefficients between each pair of variables were tested and the significance levels appeared to be in the range of 0.1 to 0.3. A stepwise logistic regression was also carried out using BMDP program LR, with ISCH as the binary response variable and AGE as the independent variable. This stepwise procedure did not select AGE as a significant variable since the p-value was 0.62. Hence there is no evidence for any association between AGE and ISCH for this data set.

Section 2.3 THEORY FOR STEPWISE REGRESSION IN COX'S MODEL

As a more efficient way of identifying the independent variables which are significantly related to the hazard function, a stepwise regression procedure was used. In the stepwise process significance probabilities are computed on the basis of a large sample partial likelihood ratio test using the chi-square value calculated from the log of the ratio of two maximized partial likelihood functions. This is known as the MPLR method. Let $M$ represent the set of indices of the covariates in the regression model at any given step and let $L_M$ denote the maximized partial likelihood function based on the covariates belonging to set $M$.
The MPLR method removes the variable corresponding to the index $K \in M$ for which

$$\chi^2_1 = -2\ln\left(L_{M^-}/L_M\right); \quad M^- = M \setminus \{K\}$$

is smallest if $\Pr(\chi^2_1) >$ limit to remove, or enters the variable corresponding to the index $K \notin M$ for which

$$\chi^2_1 = -2\ln\left(L_M/L_{M^+}\right); \quad M^+ = M \cup \{K\}$$

is largest if $\Pr(\chi^2_1) <$ limit to enter. The remove and enter limits used for this analysis are 0.15 and 0.10 respectively.

Section 2.4 RESULTS FROM STEPWISE REGRESSION

Computer package BMDP program 2L was used to carry out the analysis. Following this procedure MI was the first variable to enter the model with a $\chi^2_1$ of 19.24. With MI in the model, the variable that was added next is AGE. The $\chi^2_1$ for this stage was 6.72 with a significance level of about 0.009. The next variable to enter was D2 and the $\chi^2_1$ was 3.58 with a significance level of 0.058. HYPT was entered at the fourth step with a $\chi^2_1$ of 3.67, significance level 0.055. The variable SEX, which had a $\chi^2_1$ of 4.18 and significance level 0.041, was entered at the fifth stage, and the stepwise process terminated after the sixth step in which ADDOP was entered with a $\chi^2_1$ of 2.85, significance level 0.092.

The coefficient values, their asymptotic standard errors and the standardized coefficients are given in Table X. These values do not change drastically when compared to the values given in Table V. Thus at this stage we choose as the model

$$\lambda(t; Z) = \lambda_0(t)\exp\{1.33\,\mathrm{MI} + 0.04\,\mathrm{AGE} + 0.77\,\mathrm{HYPT} + 0.89\,\mathrm{D2} - 0.82\,\mathrm{SEX} - 0.76\,\mathrm{ADDOP}\} \qquad (**)$$

Recall that

SEX   = 0 ; males
      = 1 ; females
ADDOP = 0 ; revision operation not done
      = 1 ; revision operation done

Hence it is clear that the hazard rate for males is roughly twice that for females ($e^{-0.82} = 0.44$) and performing a revision operation tends to halve the hazard rate ($e^{-0.76} = 0.47$).
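The hazard ratios quoted above follow from exponentiating the coefficients of model (**). The short Python sketch below is only an illustration of that arithmetic; the dictionary and function names are hypothetical, and the coefficients are the rounded values printed in (**).

```python
import math

# Rounded coefficients as printed in model (**); the names here are illustrative.
COEFFICIENTS = {
    "MI": 1.33,
    "AGE": 0.04,
    "HYPT": 0.77,
    "D2": 0.89,
    "SEX": -0.82,
    "ADDOP": -0.76,
}


def hazard_ratio(coefficient, delta=1.0):
    """Multiplicative change in the hazard for a `delta`-unit increase in a covariate."""
    return math.exp(coefficient * delta)


if __name__ == "__main__":
    for name, beta in COEFFICIENTS.items():
        print(f"{name:6s} hazard ratio per unit = {hazard_ratio(beta):.2f}")
```

For a continuous covariate such as AGE the second argument gives the ratio over a wider gap, for example `hazard_ratio(0.04, 10)` for a ten-year age difference.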
Patients who have had femoropopliteal grafting technique (FP) have a better survival than the patients who had undergone any peripheral vascular surgery belonging to the category "OTHER". This is indicated by the estimated coefficient of D2 ($e^{0.89} = 2.44$), which is a measurement of the difference in hazard rates between operation type FP and OTHER. The estimated coefficients for MI and HYPT are positive, as expected, since presence of these is related to poorer patient functioning. A positive coefficient for age implies that older people tend to die earlier than young ones. However, since this study does not have a control group, that is, there are no age matched patients who have not been operated on for the disease, one cannot make a strong conclusion about the effect of age on the survival of the patients who had undergone surgery for peripheral vascular disease.

TABLE X. Parameter Estimates from Stepwise Regression

VARIABLE    COEFFICIENT    STANDARD ERROR    STANDARDIZED
NAME                                         COEFFICIENT
MI           1.3261         0.3031             4.38
AGE          0.0400         0.0149             2.68
HYPT         0.7661         0.3168             2.42
D2           0.8893         0.3995             2.23
SEX         -0.8171         0.4217            -1.94
ADDOP       -0.7619         0.4938            -1.54

Section 2.5 CHECKING FOR PROPORTIONALITY ASSUMPTION AND ADEQUACY OF THE FIT IN COX'S MODEL

Recall that the survival function $S(t; Z)$ is given by

$$S(t; Z) = \exp\left\{-\int_0^t \lambda(x; Z)\,dx\right\}$$

Hence with Cox's proportional hazards model we get

$$-\ln S(t; Z) = -\ln S_0(t)\,\exp\{Z^T\beta\}$$

$$\ln[-\ln S(t; Z)] = \ln[-\ln S_0(t)] + Z^T\beta$$

Thus, if the proportionality assumption is true, the logarithm of the minus logarithm of the survival function for a particular covariate pattern, when plotted against time, is the baseline curve $\ln[-\ln S_0(t)]$ shifted vertically by the constant $Z^T\beta$.
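The identity $\ln[-\ln S(t; Z)] = \ln[-\ln S_0(t)] + Z^T\beta$ says that, under proportional hazards, the log minus log curves of two covariate patterns are separated by the same vertical distance at every time point. The Python sketch below illustrates this numerically. The Weibull baseline and its parameters are assumptions made purely for illustration and are not taken from the thesis; the value -0.82 is the SEX coefficient from model (**), and all function names are hypothetical.

```python
import math


def baseline_survival(t, scale=60.0, shape=1.3):
    """Hypothetical Weibull baseline S0(t), chosen only for illustration."""
    return math.exp(-((t / scale) ** shape))


def survival(t, linear_predictor):
    """S(t; Z) = S0(t) ** exp(Z'beta) under the proportional hazards model."""
    return baseline_survival(t) ** math.exp(linear_predictor)


def log_minus_log(s):
    """ln(-ln s) for a survival probability 0 < s < 1."""
    return math.log(-math.log(s))


def vertical_gaps(times, lp_a, lp_b):
    """Vertical distance between the two log minus log curves at each time."""
    return [
        log_minus_log(survival(t, lp_a)) - log_minus_log(survival(t, lp_b))
        for t in times
    ]


if __name__ == "__main__":
    months = range(8, 73, 8)  # same month grid as the plots: 8, 16, ..., 72
    for month, gap in zip(months, vertical_gaps(months, 0.0, -0.82)):
        print(month, round(gap, 6))
```

Every printed gap equals the difference of the two linear predictors, which is why parallel curves are read as support for the proportionality assumption.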
When the log minus log curves are plotted on the same scale for the categories of a particular variable, such as males and females of variable SEX, the two curves should be parallel if the proportionality assumption holds for that variable. Figures 1 through 6 show the plots of the logarithm of the minus logarithm of the estimated survival function for the six significant variables, evaluated with the mean covariate vector (Kalbfleisch and Prentice, 1980, p. 92). The mean covariate vector has elements which are equal to the mean of each covariate, and it was used to avoid having too many plots which would correspond to each possible value of the six variables. The proportional hazards assumption is met by the variables AGE, SEX, MI and HYPT as the corresponding curves are parallel. The proportionality assumption does not seem to hold for D2 and ADDOP since those curves have slight departures from parallelism.

Once the model is fitted, the overall adequacy of the model can be checked by plotting the survival curve estimates computed from the residuals. The estimated residual for the $i$-th individual is given by

$$\hat{e}_i = -\ln \hat{S}(t_i; Z_i); \quad i = 1, 2, \ldots, n$$

where $\hat{S}(t_i; Z_i)$ is the estimated survival function for the $i$-th individual (Kalbfleisch and Prentice, 1980, p.96). If the model fits the data, the $\hat{e}_i$'s should behave as a random sample of censored unit exponential variates. Thus when the survival curve estimates based on these residuals are plotted on a log scale, the resulting plot should yield approximately a straight line with slope -1. In this analysis, the estimated residuals were obtained from the output of BMDP program 1L, which computed the Kaplan-Meier survival curve estimates based on the residuals.
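The residual check described above can be sketched in a few lines of Python. This is a minimal illustration and not the BMDP procedure used in the study: the Kaplan-Meier routine is a plain textbook implementation, the function names are hypothetical, and the residuals are a deterministic stand-in (unit exponential quantiles with no censoring) rather than residuals from the fitted model.

```python
import math


def kaplan_meier(values, events):
    """Kaplan-Meier estimate; returns (value, survival) at each distinct event value.

    `events` holds 1 for an observed failure and 0 for a censored observation.
    """
    data = sorted(zip(values, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current = data[i][0]
        deaths = 0
        leaving = 0
        while i < len(data) and data[i][0] == current:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((current, survival))
        at_risk -= leaving
    return curve


def slope_through_origin(curve):
    """Least-squares slope of ln S(e) against e, with the line forced through the origin."""
    points = [(e, math.log(s)) for e, s in curve if s > 0.0]
    return sum(e * y for e, y in points) / sum(e * e for e, _ in points)


def exponential_scores(n):
    """Stand-in residuals: unit exponential quantiles (illustrative, uncensored)."""
    return [-math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]


if __name__ == "__main__":
    residuals = exponential_scores(200)
    curve = kaplan_meier(residuals, [1] * len(residuals))
    print("slope of ln S against residual:", round(slope_through_origin(curve), 3))
```

With residuals from an adequate model the slope should be close to -1; a marked departure would suggest lack of fit.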
The residual plot for this analysis, which is illustrated in Figure 7, supports the adequacy of the fit.

FIGURES 1 and 2. Log minus log survival function plots
FIGURE 3. Log minus log survival function for HYPT (strata: NO = A, YES = B; horizontal axis: MONTH, 0 to 72)
FIGURE 4. Log minus log survival function for SEX (strata: MALE = A, FEMALE = B; horizontal axis: MONTH, 0 to 72)
FIGURE 5. Log minus log survival function for ADDOP (strata: NO = A, YES = B; horizontal axis: MONTH, 0 to 72)
FIGURE 6. Log minus log survival function for D2 (strata: NO = A, YES = B; horizontal axis: MONTH, 0 to 72)
FIGURE 7. Residual plot for checking proportional hazards model
Chapter 3 ALL POSSIBLE SUBSETS REGRESSION IN COX'S MODEL

Section 3.1 THEORY

Although stepwise procedures are often used to select significant variables in regression with censored data, all possible subsets regression is preferred as a more reliable and informative method, provided that it is computationally feasible (Kuk, 1984; Draper and Smith, 1981). This is because stepwise procedures lead to a single subset of variables and do not suggest alternative good subsets. A criterion that is based on the Wald statistic and which is equivalent to Mallows' $C_p$ statistic is used for selecting the best subset.

Consider Cox's proportional hazards model discussed in section 2.1. Let $\beta^T = (\beta_1^T, \beta_2^T)$ and let model $a$ correspond to $\beta_2 = 0$. Then $W_a$, the Wald statistic of the full model against model $a$, is defined as

$$W_a = \hat\beta_2^T C_{22}^{-1} \hat\beta_2$$

where $\hat\beta^T = (\hat\beta_1^T, \hat\beta_2^T)$ is obtained from the full model and $C_{22}$ is the block, corresponding to $\beta_2$, of $C$, the estimated covariance matrix of $\hat\beta$ obtained from the full model. So, to get $W_a$ from the full fit, extract the second component of $\hat\beta$ and the bottom corner of $C$; this last needs to be inverted. Then a selection criterion $V_a$, suggested by Kuk, is given as

$$V_a = W_a + 2p_a$$

where $p_a$ is the number of covariates in the model $a$.

To begin with, the following matrix

$$\begin{pmatrix} A & A\hat\beta \\ \hat\beta^T A & (N-p-1) + \hat\beta^T A \hat\beta \end{pmatrix} \qquad (3.1.1)$$

where again $\hat\beta$ is obtained from the full fit, $A = C^{-1}$, $C$ is the estimated covariance matrix of $\hat\beta$, and $N$ is an arbitrary integer $> p$, was constructed by Kuk in order to show the equivalence of $V_a$ and the $C_p$ statistic.
If $x$, $y$ are the independent and dependent variables from an ordinary multiple regression and $M$ is the matrix of corrected sums of squares and crossproducts defined as

$$M = \begin{pmatrix} x^T x & x^T y \\ y^T x & y^T y \end{pmatrix}$$

then the residual sum of squares, RSS, is

$$\mathrm{RSS} = y^T y - y^T x\,(x^T x)^{-1} x^T y.$$

By treating (3.1.1) as if it were a matrix of corrected sums of squares and crossproducts of independent and dependent variables computed from a sample of size $N$, the residual sum of squares obtained from this matrix is

$$\mathrm{RSS(full)} = (N-p-1) + \hat\beta^T A \hat\beta - \hat\beta^T A A^{-1} A \hat\beta = (N-p-1) + \hat\beta^T A \hat\beta - \hat\beta^T A \hat\beta = (N-p-1)$$

The residual sum of squares for the model $a$ is

$$\mathrm{RSS}(a) = \mathrm{RSS(full)} + \hat\beta_2^T C_{22}^{-1} \hat\beta_2 \qquad (3.1.2)$$

and the Mallows' $C_p$ statistic for the model $a$ is

$$C_p(a) = \frac{\mathrm{RSS}(a)}{s^2} + 2(p_a + 1) - N$$

where

$$s^2 = \frac{\mathrm{RSS(full)}}{N-p-1} = 1$$

by the choice of (3.1.1). Substituting (3.1.2) for RSS($a$), the above equation for $C_p(a)$ can be simplified as

$$\begin{aligned} C_p(a) &= \mathrm{RSS(full)} + \hat\beta_2^T C_{22}^{-1} \hat\beta_2 + 2(p_a + 1) - N \\ &= (N-p-1) + (2-N) + \hat\beta_2^T C_{22}^{-1} \hat\beta_2 + 2p_a \\ &= -p + 1 + W_a + 2p_a \\ &= V_a - p + 1 \end{aligned}$$

Hence it is clear that the criterion $V_a$ is formally equivalent to Mallows' $C_p$. The problem can now be handled by the standard statistical package BMDP program 9R, which does all possible subsets linear regression. The subset that minimizes $C_p$ is chosen to be the best subset.

Section 3.2 APPLICATIONS AND RESULTS

For this analysis all 14 variables were used. The estimated coefficients and the estimated covariance matrix of $\hat\beta$ were obtained from the output of BMDP program 2L. The estimated covariance matrix was then inverted with the help of a Fortran subroutine (see Appendix 2). Using this inverted matrix and $\hat\beta$, the matrix (3.1.1) was constructed. In this study, (3.1.1) was a 15 x 15 symmetric matrix. The matrix is used as a covariance matrix for input to the BMDP program P9R.
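The equivalence $C_p(a) = V_a - p + 1$ derived in Section 3.1 can be checked numerically. The sketch below is written in Python rather than the BMDP and Fortran tools used in the study. The coefficient vector and covariance matrix are made-up illustrative numbers, not values from this data set, and every function name is hypothetical. One function computes $V_a$ directly from the Wald statistic; the other builds the pieces of matrix (3.1.1) and applies the usual least-squares algebra.

```python
from itertools import combinations


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting, for small dense matrices."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def submatrix(m, rows, cols):
    return [[m[i][j] for j in cols] for i in rows]


def quad_form(vec, m):
    """vec' m vec for a square matrix m."""
    size = len(vec)
    return sum(vec[i] * m[i][j] * vec[j] for i in range(size) for j in range(size))


def wald_criterion(beta, cov, kept):
    """V_a = W_a + 2 p_a for the submodel that keeps the covariates indexed by `kept`."""
    dropped = [j for j in range(len(beta)) if j not in kept]
    if not dropped:
        return 2.0 * len(kept)
    c22_inv = invert(submatrix(cov, dropped, dropped))
    w_a = quad_form([beta[j] for j in dropped], c22_inv)
    return w_a + 2.0 * len(kept)


def mallows_cp(beta, cov, kept, n_cases):
    """C_p of the same submodel, obtained by treating (3.1.1) as a crossproducts matrix."""
    p = len(beta)
    a = invert(cov)  # A = C^{-1}
    a_beta = [sum(a[i][j] * beta[j] for j in range(p)) for i in range(p)]
    yy = (n_cases - p - 1) + sum(b * ab for b, ab in zip(beta, a_beta))
    rss = yy
    if kept:
        xx_inv = invert(submatrix(a, kept, kept))
        rss -= quad_form([a_beta[i] for i in kept], xx_inv)
    return rss + 2 * (len(kept) + 1) - n_cases  # s^2 = 1 by construction


# Illustrative (made-up) full-model estimates for three covariates.
EXAMPLE_BETA = [1.2, 0.05, -0.7]
EXAMPLE_COV = [
    [0.09, 0.001, 0.01],
    [0.001, 0.0004, -0.0005],
    [0.01, -0.0005, 0.16],
]

if __name__ == "__main__":
    p = len(EXAMPLE_BETA)
    for size in range(p + 1):
        for kept in combinations(range(p), size):
            v_a = wald_criterion(EXAMPLE_BETA, EXAMPLE_COV, kept)
            c_p = mallows_cp(EXAMPLE_BETA, EXAMPLE_COV, kept, n_cases=50)
            print(kept, round(v_a, 4), round(c_p, 4), round(v_a - p + 1, 4))
```

The last two printed columns agree for every subset, and the result does not depend on the value chosen for `n_cases`, which mirrors the statement that $N$ is arbitrary.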
In the control language for this program, the value of the sample size N should be specified in the INPUT paragraph (see Appendix 2). The best subset selected by this method was SEX, MI, HYPT, D2 and ADDOP, which had a Cp value of 5.18. The second best was the model with AGE, SEX, MI, HYPT, D2 and ADDOP, with a Cp value of 5.58. The second best was the subset selected by the stepwise procedure. The difference between the Cp values for the best subset and the second best subset is very small. The coefficient for age in the second best subset was 0.03^6 and the corresponding standardized coefficient was 2.61. When these values are compared to the corresponding values obtained from stepwise regression, it is clear that AGE is a significant variable.

From the results (discussed in section 2.2) on the significance of correlation coefficients, it appears that there is no evidence for any association between AGE and the other variables. Separate stepwise logistic regressions were carried out with each variable taken as a binary response and AGE as the independent variable. All these regressions indicated a p-value greater than 0.6 for AGE. Several contingency tables were constructed for AGE vs the other five significant variables and a Pearson's chi-square goodness of fit test was carried out. According to the results presented in Table XI, there is no evidence for any association between AGE and other variables in this data set. MI, AGE, HYPT, D2, SEX and ADDOP were selected as the significant variables for the final model.

TABLE XI. Two-way Contingency Tables

                             AGE
SEX         < 40 yr   41-60 yr   61-80 yr   > 80 yr   Total
Male           4         68        139         6       217
Female         1         23         57         5        86
Total          5         91        196        11       303

Pearson's $\chi^2_3$ = 2.23, significance level = 0.53

                             AGE
HYPT        < 40 yr   41-60 yr   61-80 yr   > 80 yr   Total
No             4         72        132         8       216
Yes            1         19         64         3        87
Total          5         91        196        11       303

Pearson's $\chi^2_3$ = 4.41, significance level = 0.22

                             AGE
D2          < 40 yr   41-60 yr   61-80 yr   > 80 yr   Total
No             4         79        176        10       269
Yes            1         12         20         1        34
Total          5         91        196        11       303

Pearson's $\chi^2_3$ = 0.99, significance level = 0.80

                             AGE
ADDOP       < 40 yr   41-60 yr   61-80 yr   > 80 yr   Total
No             5         76        162        11       254
Yes            0         15         34         0        49
Total          5         91        196        11       303

Pearson's $\chi^2_3$ = 3.29, significance level = 0.35

                             AGE
MI          < 40 yr   41-60 yr   61-80 yr   > 80 yr   Total
No             4         73        154         7       238
Yes            1         18         42         4        65
Total          5         91        196        11       303

Pearson's $\chi^2_3$ = 1.61, significance level = 0.66

Chapter 4 CHECKING FOR INFLUENTIAL OBSERVATIONS

In some data sets, one of the cases may have sufficient impact upon the regression such that, if that case were deleted, different results would have been obtained. Such cases are known as influential observations. It is suggested that empirical influence functions, computed for each covariate and each observation in the proportional hazards regression model, can be useful to identify these influential observations (Reid and Crepeau, 1985). The theory and method discussed in the above reference were applied to this study. Influence function values are computed for each case (patient) and each covariate. Since it is difficult to consider influence function values for all the 14 variables and 303 observations, attention was restricted to the six significant variables. The estimated coefficients were obtained from BMDP program 2L and a Fortran program was used to calculate the influence function.
From the summary in Table XII it is seen that case 160 (the case numbers are with respect to all 535 cases) has the largest value of the influence function for covariates HYPT and D2. Observation 1 had the smallest value for covariates MI and D2. Table XIII summarizes the proportional hazards regression models; the first using all observations and the others excluding different cases.

The magnitude of the influence function for each case is roughly consistent with the magnitude of $(\hat\beta - \hat\beta_{(i)})$, where $\hat\beta_{(i)}$ is the estimated coefficient when the $i$-th case is deleted. In this study, one unit on the influence function scale corresponds to $|\hat\beta - \hat\beta_{(i)}|$ approximately equal to 0.003. From the values given in Table XIII, it is clear that the estimated coefficients and their standard errors do not change very much, and this indicates that none of these specified cases seems to have a very strong influence on the estimated parameters. This also agrees with the proportional hazards plots and the residual plot of section 2.5 because none of these cases show up on either of these plots.

TABLE XII. Summary of Influence Function Values

Covariate    Maximum of influence      Minimum of influence
             function (case no.)       function (case no.)
AGE            0.5069 (431)              -1.2242 (324)
SEX           44.9926 (225)             -24.0415 (257)
MI            17.2703 (106)             -23.3969 (1)
HYPT          20.1358 (160)             -19.6145 (30)
D2            33.6886 (160)             -45.0780 (1)
ADDOP         64.2133 (46)              -40.4665 (162)

TABLE XIII. Proportional Hazards Regression Model

Estimated coefficient (standard error)

                        AGE       SEX       MI        HYPT      D2        ADDOP
All data with          0.0400   -0.8171    1.3261    0.7661    0.8893   -0.7619
model (**)            (0.015)   (0.422)   (0.303)   (0.317)   (0.400)   (0.494)

Case 160 deleted       0.0394   -0.7770    1.3636    0.6943    0.7575   -0.7376
                      (0.015)   (0.423)   (0.306)   (0.322)   (0.421)   (0.495)

Case 1 deleted         0.0393   -0.8290    1.3894    0.7474    1.0316   -0.7916
                      (0.015)   (0.421)   (0.306)   (0.315)   (0.401)   (0.494)

Case 46 deleted        0.0398   -0.7916    1.3817    0.8113    0.9340   -0.7502
                      (0.015)   (0.423)   (0.307)   (0.320)   (0.402)   (0.495)

Case 162 deleted       0.0404   -0.8355    1.3525    0.7962    0.8827   -0.6446
                      (0.015)   (0.422)   (0.303)   (0.316)   (0.399)   (0.494)

All cases specified    0.0495   -0.9765    1.4662    0.6816    1.1120   -0.7984
in Table XII deleted  (0.016)   (0.487)   (0.332)   (0.344)   (0.430)   (0.555)

Chapter 5 CONCLUSIONS

An outline of the surgical techniques of bypass surgery for peripheral vascular disease is presented. The data analysed in this study are based on 303 patients surgically treated for peripheral vascular disease at St. Paul's Hospital, Vancouver, B.C., between 1975 and 1977. A subset of the recorded variables was used for the analysis due to problems with incomplete records. When the month of death and/or operation was unknown, it was assumed to be June.

Statistical procedures such as Cox's regression, stepwise regression, all subsets regression for the proportional hazards model, as well as contingency tables, are used to isolate important variables in predicting survival and to discover associations among variables. The conclusions of these analyses are:

1) the most important variables in descending order of their significance are myocardial infarction, presence or absence of hypertension, sex and whether or not a revision operation was done.
History of a previous coronary bypass graft is highly correlated with survival, but the comparison of its significance to the other significant variables is not possible since the coefficient corresponding to history of a previous coronary bypass graft could not be estimated.

2) age is also related to survival in this data set. However, since there is no control group, that is, we do not have a group of age matched patients who have not undergone surgery for peripheral vascular disease, one cannot make a strong conclusion about the effect of age on survival of the patients who have had surgery for peripheral vascular disease.

3) patients who have had femoropopliteal grafting technique have a better survival than the patients who had undergone any peripheral vascular surgery belonging to the category "OTHER".

4) in this data set the hazard rate for males is roughly twice that for females.

5) performing a revision operation tends to halve the hazard rate.

6) presence of myocardial infarction or hypertension is related to poorer patient functioning.

7) although pairwise correlation between some of the variables (for example, age and ischemia, ischemia and claudication) is suspected, the tests used in this study did not indicate it.

One of the difficulties in this study was that there was no control group available. Hence strong conclusions could not be made in certain instances. The other problem was that the data were not completely recorded, especially the date of death and/or operation. Although there were data for 535 patients, 89 cases had to be deleted because their year of death and/or operation was not known. Then, another 143 cases were excluded from the study since their variables had missing values. Hence, if we had more complete and accurate data, the results could have been more accurate. Also, the type of operation should be clearly specified. In this data set, only the types ABF, FP and AAA were clearly noted. The operation types belonging to category "OTHER" were noted very poorly; if the specific operation type in this category had been more accurately recorded, one could have seen a difference in survival rates between those types. One could have also checked for interactions between operation type and other variables. The other problem was that in certain cases the handwriting in the data sheets and patient cards was illegible. It would have been much better if the people who were engaged in the survey or the medical staff could have entered the records into computer files and then given them to the statistician for statistical analyses.

BIBLIOGRAPHY

Anderson, T.W. (1984). An Introduction to Multivariate Statistical Analysis. 2nd Ed. New York: Wiley.

Cooperman, M., Pflug, B., Martin, E.W. and Evans, W.E. (1978). Cardiovascular risk factors in patients with peripheral vascular disease. Surgery 84, 505-509.

Cox, D.R. (1972). Regression models and life tables (with discussion). J.R. Statist. Soc. B 34, 187-202.

Draper, N.R. and Smith, H. (1981). Applied Regression Analysis.

Gaspar, M.R. and Barker, W.F. (1981). Peripheral Vascular Disease. 3rd Ed.

Kalbfleisch, J.D. and Prentice, R.L. (1980). The Statistical Analysis of Failure Time Data. New York: Wiley.

Kuk, A.Y.C. (1984). All subsets regression in proportional hazards model. Biometrika 71, 587-92.

Lawless, J.F. (1982). Statistical Models and Methods for Lifetime Data. New York: Wiley.

Miller, R.G. (1982). Survival Analysis. New York: Wiley.

Reid, N. and Crepeau, H. (1985). Influence functions for proportional hazards regression. Biometrika 72, 1-9.

Weisberg, S. (1980). Applied Linear Regression.

Wright, C.B. (1983). Vascular Grafting: Clinical Applications and Techniques.

APPENDIX 1

PATIENT CARD

CAD-PVD Study                          Page No.
Name                                   Age        Sex
Operation:             ABF / FP / AAA / Other
ASOD
Symptoms:              Isch. / Claudication / Other
Prev. Vasc. Op.
Angina
MI
CAD:                   ACBG
Diabetes
Hypertension
Duration of follow-up
Early Death:           Cardiac / Non-Cardiac
Late Death:            Cardiac / Non-Cardiac

FORMAT FOR COMPUTER FILES

(1)  SEQUENCE NUMBER
(2)  PAGE NUMBER
(3)  AGE:                  99 = Missing
(4)  SEX:                  0 = Male, 1 = Female
(5)  OPERATION TYPE:       ABF      0 = NO   1 = YES
                           FP       0 = NO   1 = YES
                           AAA      0 = NO   1 = YES
                           OTHER    0 = NO   1 = YES
(6)  ADDITIONAL SURGERY:   0 = NO   1 = YES
(7)  SYMPTOMS:             Ischemia                      0 = NO   1 = YES
                           Claudication                  0 = NO   1 = YES
(8)  HISTORY:              Previous vascular operation   0 = NO   1 = YES
                           Angina                        0 = NO   1 = YES
                           Myocardial infarction         0 = NO   1 = YES
                           Previous coronary bypass      0 = NO   1 = YES
                           Diabetes                      0 = NO   1 = YES
                           Hypertension                  0 = NO   1 = YES
(9)  STATUS OCTOBER '81:   Early death = 0
                           Late death  = 1
                           Still alive = 2
                           Unknown     = 9
(10) CAUSE OF DEATH:       Non cardiac = 0
                           Cardiac     = 1
                           Still alive = 2
                           Unknown     = 9
(11) DATE OF DEATH:        DD/MM/YR
                           if DD unknown leave blank
                           if MM unknown leave blank
                           if YR unknown type 99
                           if patient is still alive leave all columns blank
(12) DATE OF OPERATION:    DD/MM/YR
                           if DD unknown leave blank
                           if MM unknown leave blank
                           if YR unknown type 99

If information on (3) to (8) is known to be missing, then this was indicated with a code of 9.

APPENDIX 2

FORTRAN SUBROUTINE FOR MATRIX INVERSION

      REAL*8 DA, DT, DDET, DCOND
      DIMENSION DA(8,8), DT(10,10), IPERM(16)
C     NDIMA AND NDIMT ARE ASSUMED TO BE THE DECLARED LEADING
C     DIMENSIONS OF DA AND DT; THE LISTING DOES NOT SET THEM
      NDIMA = 8
      NDIMT = 10
C     **READ IN MATRIX DATA**
      READ (5,10) N
   10 FORMAT (I2)
      READ (5,20) ((DA(I,J), I=1,N), J=1,N)
   20 FORMAT (F5.0)
C     **FIND THE INVERSE**
      CALL INV (N, NDIMA, DA, IPERM, NDIMT, DT, DDET, JEXP, DCOND)
C     A ZERO DETERMINANT MEANS THE INVERSION FAILED
      IF (DDET) 25, 30, 25
C     **WRITE OUT RESULTS**
   25 WRITE (6,40) N, DDET, JEXP, DCOND
   40 FORMAT ('N=', I2, 5X, 'DETERM= ', G10.3, '*10**', I2/'INVERSE')
      WRITE (6,50) ((DT(I,J), I=1,N), J=1,N)
   50 FORMAT (1X, 14G10.3)
      STOP
   30 WRITE (6,60)
   60 FORMAT ('INVERSION FAILED')
      STOP
      END

CONTROL LANGUAGE FOR BMDP:9R PROGRAM

/ PROBLEM    TITLE IS 'PVD DATA'.
/ INPUT      UNIT = 9.
             CASES = 303.
             VARIABLES = 15.
             TYPE = COVA.
             SHAPE = SQUARE.
             FORMAT IS '(15F8.3)'.
/ VARIABLE   NAMES ARE AGE, SEX, ISCH, CLAUD, PVOP, ANGINA, MI, DIAB,
             HYPT, D1, D2, D3, D4, ADDOP, SURVIVAL.
/ REGRESS    DEPENDENT IS SURVIVAL.
             INDEPENDENT ARE 1 TO 14.
             METHOD = CP.
             TOLERANCE = 0.0001.
             PENALTY = 2.
             NUMBER = 3.
             ZERO.
/ END
Title | Statistical analysis of survival data : an application to perhipheral vascular bypass surgery |
Creator |
Kottegoda, Preethi Nirmalie |
Publisher | University of British Columbia |
Date Issued | 1985 |
Description | A retrospective study was carried out on 535 patients who underwent bypass surgery for peripheral vascular disease. Survival data for 303 patients out of these 535 cases are subjected to quantitative analysis. The main interest is in survival of these patients in order to identify the risk factors. The importance of types of grafting technique in long-term survival is also considered. Statistical methods used to ascertain the important prognostic variables include Cox's proportional hazards model, stepwise regression and all subsets regression in proportional hazards model discussed by Kuk (1984). In descending order of significance, the most important variables are myocardial infarction, presence or absence of hypertension, sex and whether or not a revision operation was done. The variable, history of a previous coronary bypass graft is highly correlated with survival but the comparison of its significance to the other significant variables is not possible with Cox's model. Age is also related to survival in this data set. However, since there is no control group, one cannot make a strong conclusion about the effect of age on survival of the patients who have had surgery for peripheral vascular disease. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-06-20 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0096717 |
URI | http://hdl.handle.net/2429/25912 |
Degree |
Master of Science - MSc |
Program |
Statistics |
Affiliation |
Science, Faculty of Statistics, Department of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |