STATISTICAL ANALYSIS OF SURVIVAL DATA: AN APPLICATION TO PERIPHERAL VASCULAR BYPASS SURGERY

BY PREETHI NIRMALIE KOTTEGODA
B.Sc., The University of Sri-Jayawardenepura, Sri Lanka, 1981

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES (The Department of Statistics)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
November, 1985
(c) Preethi Nirmalie Kottegoda, 1985

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Statistics
The University of British Columbia
1956 Main Mall
Vancouver, Canada V6T 1Y3

ABSTRACT

A retrospective study was carried out on 535 patients who underwent bypass surgery for peripheral vascular disease. Survival data for 303 of these 535 patients are subjected to quantitative analysis. The main interest is in the survival of these patients, in order to identify risk factors. The importance of the type of grafting technique for long-term survival is also considered. Statistical methods used to identify the important prognostic variables include Cox's proportional hazards model, stepwise regression, and all-subsets regression in the proportional hazards model as discussed by Kuk (1984). In descending order of significance, the most important variables are history of myocardial infarction, presence or absence of hypertension, sex, and whether or not a revision operation was done.
The variable history of a previous coronary bypass graft is highly correlated with survival, but its significance cannot be compared with that of the other significant variables using Cox's model. Age is also related to survival in this data set. However, since there is no control group, no strong conclusion can be drawn about the effect of age on the survival of patients who have had surgery for peripheral vascular disease.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGMENT
INTRODUCTION
Chapter 1 DETAILS AND BACKGROUND OF DATA
  1.1 MEDICAL ASPECTS
  1.2 SOURCE OF DATA AND HOW IT WAS COLLECTED
  1.3 CLEANING UP OF DATA
  1.4 SUMMARY STATISTICS
Chapter 2 COX'S REGRESSION MODEL
  2.1 GENERAL THEORY FOR COX'S MODEL
  2.2 APPLICATIONS AND RESULTS FROM COX'S MODEL
  2.3 THEORY FOR STEPWISE REGRESSION IN COX'S MODEL
  2.4 RESULTS FROM STEPWISE REGRESSION
  2.5 CHECKING FOR PROPORTIONALITY ASSUMPTION AND ADEQUACY OF THE FIT IN COX'S MODEL
Chapter 3 ALL POSSIBLE SUBSETS REGRESSION IN COX'S MODEL
  3.1 THEORY
  3.2 APPLICATIONS AND RESULTS
Chapter 4 CHECKING FOR INFLUENTIAL OBSERVATIONS
Chapter 5 CONCLUSIONS
BIBLIOGRAPHY
APPENDIX 1
APPENDIX 2

LIST OF TABLES

TABLE I    Variables Associated with the Study
      II   Frequency Distribution of Variables
      III  Frequency Distribution of Operation Type
      IV   Frequency Distribution of Follow-up Information
      V    Regression Coefficients for Cox's Model
      VI   Regression Coefficients for Cox's Model; Varying Month of Death and/or Operation
      VII  Regression Coefficients for Cox's Model; Survival Time in Years and True Survived Patients
      VIII Regression Coefficients for Cox's Model; Survival Time in Months and True Survived Patients
      IX   Estimated Correlation Matrix
      X    Parameter Estimates from Stepwise Regression
      XI   Two-way Contingency Tables
      XII  Summary of Influence Function Values
      XIII Proportional Hazards Regression Model

LIST OF FIGURES

FIGURE 1  Log minus log survival function for MI
       2  Log minus log survival function for AGE
       3  Log minus log survival function for HYPT
       4  Log minus log survival function for SEX
       5  Log minus log survival function for ADDOP
       6  Log minus log survival function for D2
       7  Residual plot for checking proportional hazards model

ACKNOWLEDGEMENT

I would like to thank Dr. N. Reid for her guidance and assistance in producing this thesis. I am also indebted to Dr. M. Schulzer for his careful reading and helpful criticisms of this thesis. I would like to express my gratitude to Dr. M.T. Janusz, who kindly provided the data and offered advice on the medical aspects of this thesis. The financial support of the University of British Columbia is gratefully acknowledged.

INTRODUCTION

Bypass surgery for peripheral vascular disease has been gaining wide acceptance as an effective alternative to amputation. Although there are controversies about the surgical techniques, less attention has been directed to the evaluation of risk factors.
Because of the wide interest in survival studies, most surgical centres keep follow-up records of the survival experience of their patients. In this study, retrospectively obtained records from one such centre are subjected to quantitative analysis in order to identify factors affecting survival. Clinical details of the bypass procedures are presented in Section 1.1, and the background of the data is given in Section 1.2. In Chapters 2 and 3, answers to the following questions are sought:

1) Which factors are the most important in predicting survival?
2) How does each bypass technique affect survival?

The statistical methods employed to answer these questions are Cox's proportional hazards regression, stepwise regression, all-subsets regression in the proportional hazards model, and contingency table analysis. A method of detecting influential observations is discussed in Chapter 4. Conclusions and suggestions are given in Chapter 5.

Chapter 1 DETAILS AND BACKGROUND OF DATA

Section 1.1 MEDICAL ASPECTS

Diseases involving the peripheral blood vessels, that is, the blood vessels in the arms and legs, are known as peripheral vascular diseases. Bypass surgery is a widely accepted surgical treatment for peripheral vascular disease. It reduces the number of amputations, which had been the most common surgical procedure available. Different types of bypass procedures are used depending on the patient's condition. Each surgeon has somewhat different criteria for selecting patients, and a further bias is introduced by each surgeon's preference for one surgical technique over another. While the results of these operative procedures have been studied extensively, less attention has been directed to the evaluation of risk factors.
In this study we are interested in the survival of patients undergoing surgery for peripheral vascular disease, in order to identify risk factors. We are particularly interested in survival with respect to deaths due to cardiac disease, in order to identify high-, medium- and low-risk patient groups, in the hope of identifying populations likely to benefit from aggressive investigation of the heart.

Aortobifemoral bypass grafting has become the procedure of choice for most patients with occlusive disease of the aortic bifurcation, the junction where the abdominal aorta divides into left and right branches, the left and right common iliac arteries. In this technique, the graft is extended to the left femoral artery (in the left leg) and the right femoral artery (in the right leg), because aortic flow will be better when both sides are revascularized. Taking the graft to the femoral arteries bypasses most of the disease. The most popular and commonly used grafting material is Dacron, which can be woven or knitted. Usually, a Dacron tube with two limbs is used for Aortobifemoral grafting. The proximal end of the graft is sutured to a small hole cut in the front of the aorta; this is called an end-to-side anastomosis. Sometimes the aorta is completely divided and the proximal end of the graft is anastomosed end-to-end. Distally, one limb of the graft is sutured end-to-side to a hole cut in the right femoral artery and, similarly, the other limb to the left femoral artery.

The Femoropopliteal bypass procedure is used to bypass occlusion of the superficial femoral artery when there is adequate flow in the popliteal artery in the leg. The most acceptable grafting material currently available is the reversed saphenous vein. This vein has valves that allow blood to flow only towards the heart.
It is therefore necessary to remove an appropriate length of the vein and reverse its direction before grafting it to the artery. One end of the reversed vein is stitched to a small longitudinal incision made in the popliteal artery and the other end to a similar cut made in the common femoral artery. Both anastomoses are performed end-to-side.

An aneurysm is an abnormal dilatation of a blood vessel, usually forming a pulsating tumour. The Abdominal Aortic Aneurysm is the most commonly seen aneurysm. It results from a weakening of the arterial wall of the aorta, so that the wall is likely to be stretched by the force of arterial blood pressure. When the wall is weakened, the whole vessel tends to dilate, but if the vessel wall is weaker over one area, that part of the vessel is liable to blow out and form an aneurysm. A tube graft, an end-to-end bifurcation graft from the aorta to the right common iliac artery, or an end-to-side bifurcation graft to the left external iliac artery are some of the possible types of reconstruction for this disease. Usually, a woven Dacron graft is preferred as the grafting material.

Other types of peripheral vascular operations include the Axillofemoral bypass graft, which involves the axillary artery (in the arm) and the common femoral artery. One end of the graft is stitched to a small cut made in the axillary artery and the other end to a similar cut made in the common femoral artery. When the iliac artery in one leg is severely occluded and the iliac artery in the other leg is a suitable donor vessel, blood can be delivered to the ischemic side via a Femoral-Femoral, Iliac-Iliac or Iliac-Femoral bypass. There are several other operative techniques and bypass procedures for peripheral vascular disease, but the ones described above are the most common. In fact, Aortobifemoral and Femoropopliteal procedures account for the majority of bypasses in peripheral vascular disease.
Section 1.2 SOURCE OF DATA AND HOW IT WAS COLLECTED

The data analysed here are a collection of observations and measurements from reports on patients who underwent peripheral vascular surgery at St. Paul's Hospital (Vancouver, B.C.) between 1975 and 1977. The data are recorded both on data sheets and on individual patient cards; the information on the two is almost the same, except that the cards contain only a summary. A retrospective study of 535 patients was performed in October 1981. The information collected on each patient was: name; age; sex; type of operation, whether Aortobifemoral grafting (ABF), Femoropopliteal grafting (FP), Abdominal Aortic Aneurysm (AAA) or another peripheral vascular operation; the patient's preoperative symptoms, whether ischemia or claudication; whether the patient had a previous vascular operation; and whether revisions of peripheral vascular operations were performed. Also recorded are the presence or absence of angina, a history of previous myocardial infarction or of a previous coronary bypass graft, and the presence or history of diabetes or hypertension. Patient deaths are recorded as "early" (within 30 days of surgery) or "late" (beyond 30 days). The cause of death is recorded on the data sheets and noted on the cards as cardiac or non-cardiac. The dates of operation and death are recorded by year and month (in 341 of the 535 cases), although in some cases the day is also recorded. The data were recorded manually on data sheets, and a summary of these details was then noted on patient cards, which are easy to read and handle. One data sheet may contain information on more than one patient, whereas each patient has exactly one patient card. In February 1985, the records of these 535 patients were converted to computer files. Reprints of the data sheet, the patient card and the format used for converting to computer files are included in Appendix 1.
Section 1.3 CLEANING UP OF DATA

Not all of the 535 patients, nor all of the variables, were used in the statistical analysis, for several reasons. Eighty-nine cases were excluded because their year of operation and/or death was unknown, and another 143 cases were deleted because some of their variables had missing observations, leaving 303 cases.

Some patients had more than one operation type at the initial operation. Operation type was therefore partitioned into 15 mutually exclusive subsets, as shown in Table III. The type OTHER includes peripheral vascular operations other than ABF, FP and AAA. Subsets 9, 10, 11 and 14 were automatically excluded because all the patients in those subsets were among the 232 deleted cases. Subsets 5, 6, 7 and 8 were pooled into one category, ANY TWO, and four indicator variables were defined to represent the differences in survival rates between the five categories of operation type. Pooling was done to avoid having too many variables in the model.

Patients within a data sheet were ordered alphabetically, and their records were entered into the computer file in this order. In the analysis each patient was identified by two labels, the sequence number and the page number: the former is the order in which the records were entered into the computer file, and the latter is the number of the corresponding data sheet.

Survival times were measured in months rather than in years because the former are more spread out. As noted in Section 2.2, using months or years made no drastic change in the significance of the variables or in the estimated coefficients. In 72 cases the month of operation and/or death was not recorded, and in such cases the month was assumed to be June. This was done to avoid further deletion of cases, which would have made the sample size small.
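The month arithmetic behind this convention can be sketched as follows. This assumes the survival time in months is simply the difference between the (year, month) of operation and the (year, month) of death or end of follow-up; the exact rule is not stated, so treat the helper `survival_months` (a hypothetical name) as an illustration only.

```python
JUNE = 6  # assumed month when the month of operation or death is unrecorded


def survival_months(op_year, op_month, end_year, end_month, default_month=JUNE):
    """Months from operation to death (or end of follow-up).

    Assumption: time is the difference of (year, month) pairs and any
    recorded day is ignored. A month of None is replaced by default_month.
    """
    m0 = default_month if op_month is None else op_month
    m1 = default_month if end_month is None else end_month
    return 12 * (end_year - op_year) + (m1 - m0)


# Operation in 1976 with the month unrecorded; death in March 1980.
for label, month in (("June", 6), ("January", 1), ("December", 12)):
    print(label, survival_months(1976, None, 1980, 3, default_month=month))
# June 45, January 50, December 39
```

Under this assumed rule, when both months are missing the assumed month cancels, so only cases with exactly one unrecorded month are affected by choosing June rather than January or December.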
As noted in Section 2.2, assuming the unknown month to be January or December instead made no drastic change in the significance of the variables or in the estimated coefficients. Hence, throughout the study, an unknown month of operation and/or death was assumed to be June.

As we are particularly interested in survival with respect to cardiac disease, all non-cardiac deaths and all patients alive were treated as censored observations. Of the 303 cases, there were 45 cardiac deaths, 255 censored observations and 3 losses to follow-up. The 255 censored observations consist of 58 non-cardiac deaths and 197 patients alive. Even if only the patients who truly survived are treated as censored, the results do not change drastically; this is noted in Section 2.2.

The data file in its final form had 15 variables as well as follow-up information. Table I gives all the variable names and their descriptions.

TABLE I. Variables Associated with the Study

VARIABLE NAME      VARIABLE DESCRIPTION
AGE                Age; range is 30 to 97 years
SEX                Sex; 0 = male, 1 = female
ISCH               Symptoms of ischemia; 1 = yes, 0 = no
CLAUD              Symptoms of claudication; 1 = yes, 0 = no
PVOP               A previous vascular operation done; 1 = yes, 0 = no
ANGINA             Presence or absence of angina; 1 = present, 0 = absent
MI                 History of myocardial infarction; 1 = yes, 0 = no
DIAB               History of diabetes; 1 = yes, 0 = no
HYPT               History of hypertension; 1 = yes, 0 = no
ADDOP              Revisions of peripheral vascular operations; 1 = yes, 0 = no
PCBG               Previous coronary bypass graft done; 1 = yes, 0 = no
(D1, D2, D3, D4)   (0,0,0,0) if FP only
                   (1,0,0,0) if ABF only
                   (0,1,0,0) if OTHER only
                   (0,0,1,0) if AAA only
                   (0,0,0,1) if ANY TWO
D1                 Indicator variable representing the difference between operation types FP and ABF
D2                 Indicator variable representing the difference between operation types FP and OTHER
D3                 Indicator variable representing the difference between operation types FP and AAA
D4                 Indicator variable representing the difference between operation types FP and ANY TWO

Section 1.4 SUMMARY STATISTICS

The statistics given in the following tables are based on the 15 variables and all 535 cases. Values in parentheses correspond to the 303 cases used in the analysis.

TABLE II. Frequency Distribution of Variables

VARIABLE NAME                Present      Absent       Missing
ISCH                         101 (80)     434 (223)    0 (0)
CLAUD                         96 (55)     439 (248)    0 (0)
PVOP                         106 (72)     429 (231)    0 (0)
ANGINA                        79 (45)     456 (258)    0 (0)
MI                           100 (65)     435 (238)    0 (0)
DIAB                          46 (30)     489 (273)    0 (0)
HYPT                         150 (87)     382 (216)    3 (0)
ADDOP                        120 (49)     415 (254)    0 (0)
PCBG                          15 (6)      520 (297)    0 (0)
(D1,D2,D3,D4) = (0,0,0,0)    128 (80)     386 (223)   21 (0)
              = (1,0,0,0)     97 (60)     417 (243)   21 (0)
              = (0,1,0,0)     53 (34)     461 (269)   21 (0)
              = (0,0,1,0)     73 (40)     441 (263)   21 (0)
              = (0,0,0,1)    163 (89)     351 (214)   21 (0)

SEX: males = 388 (217), females = 147 (86)

TABLE III. Frequency Distribution of Operation Type

Subset   Subset Name           Frequency
1        FP only               128 (80)
2        ABF only               97 (60)
3        OTHER only             53 (34)
4        AAA only               73 (40)
5*       ABF + FP               15 (8)
6*       ABF + OTHER            52 (30)
7*       ABF + AAA              30 (10)
8*       FP + OTHER             66 (41)
9        FP + AAA                2
10       OTHER + AAA            10
11       ABF + FP + OTHER        3
12       ABF + FP + AAA          0
13       FP + OTHER + AAA        0
14       ABF + OTHER + AAA       6
15       ALL FOUR                0

* Subsets 5 to 8 are pooled as ANY TWO.

TABLE IV. Frequency Distribution of Follow-up Information

                                       FOLLOW-UP
CAUSE OF DEATH   Early Death   Late Death   Alive       Unknown   Total
Non Cardiac      25 (23)        38 (35)       0 (0)     0 (0)      63 (58)
Cardiac           7 (4)         54 (41)       0 (0)     0 (0)      61 (45)
Alive             0 (0)          0 (0)      387 (197)   0 (0)     387 (197)
Unknown           0 (0)          8 (0)        0 (0)    16 (3)      24 (3)
Total            32 (27)       100 (76)     387 (197)  16 (3)     535 (303)

Chapter 2 COX'S REGRESSION MODEL

Section 2.1 GENERAL THEORY FOR COX'S MODEL

There have been many articles in the recent literature on the application of regression analysis to data with censored observations (e.g.
Cox, 1972; Miller, 1981; Kalbfleisch and Prentice, 1980).

Let T denote the random failure time, with density function f(t) and distribution function F(t). The survival function S(t) is the probability of survival past time t:

$$S(t) = \Pr\{T > t\} = 1 - F(t).$$

The hazard function $\lambda(t)$ has the interpretation

$$\lambda(t)\,dt = \Pr\{t \le T < t + dt \mid t \le T\},$$

so that

$$\lambda(t) = \frac{f(t)}{1 - F(t)}.$$

Hence

$$S(t) = \exp\left\{-\int_0^t \lambda(x)\,dx\right\}.$$

One of the important goals is to estimate the survival function. If the parametric form of f(t) is known, S(t) can be estimated once the maximum likelihood estimates of the parameters are available. For example, if $f(t) = \mu e^{-\mu t}$, then $\lambda(t) = \mu$ and $S(t) = e^{-\mu t}$; if the maximum likelihood estimate of $\mu$ is $\hat\mu$, the maximum likelihood estimate of S(t) is $e^{-\hat\mu t}$. If the parametric form of f(t) is unknown, however, a nonparametric estimate of S(t) can be obtained from the empirical survival function. With no censoring, the empirical survival function based on a sample of size n is

$$\hat S(t) = \frac{1}{n}\,\bigl(\text{number of observations} > t\bigr), \qquad t \ge 0.$$

With censored data this has to be modified. Consider n individuals and suppose that $t_1 < t_2 < \cdots < t_K$ are the $K\ (\le n)$ distinct times at which deaths occur. Let

$d_i$ = number of deaths at time $t_i$,
$n_i$ = number of individuals "at risk" at time $t_i$, that is, the number of individuals alive just prior to time $t_i$.

In addition to the lifetimes $t_1, t_2, \ldots, t_K$, there are censoring times $c_j$ for individuals whose lifetimes are not observed. An estimate of S(t) is then

$$\hat S(t) = \prod_{i:\, t_i \le t}\left(1 - \frac{d_i}{n_i}\right).$$

This is the Kaplan-Meier estimate of the survival function (Kaplan and Meier, 1958), a kind of nonparametric maximum likelihood estimate. It is a step function with value 1 at t = 0 that drops by the factor $(1 - d_i/n_i)$ at each death time $t = t_i$.
It does not change at the censoring times $c_j$; however, the effect of censoring is incorporated into the $n_i$ and hence into the sizes of the jumps in $\hat S(t)$.

Typically, the failure time depends on quantitative or qualitative explanatory variables, known as covariates, such as age, sex and type of medical treatment. The effects of these covariates on the lifetimes can be studied with a regression model known as Cox's model. Let Z be the vector of covariates and $\beta$ a vector of unknown coefficients. Cox's model specifies

$$\lambda(t; Z) = \lambda_0(t)\exp\{Z^T\beta\},$$

where $\lambda(t; Z)$ is the hazard rate for covariate vector Z and $\lambda_0(t)$ is the hazard rate for Z = 0 (Cox, 1972). The regressor variables are the covariates, and changes in them change the hazard function multiplicatively; such a model is called a proportional hazards model. Once $\beta$ has been estimated and tested for significance, a set of significant covariates that predict the hazard rate can be selected.

Estimates of the regression parameters are obtained by maximizing the partial likelihood function (Cox, 1975)

$$L(\beta) = \prod_{i=1}^{K}\frac{\exp(Z_i^T\beta)}{\sum_{j \in R_i}\exp(Z_j^T\beta)},$$

where $Z_i$ is the covariate vector of the individual who dies at $t_i$ and $R_i$ is the set of individuals at risk just prior to $t_i$. When there are ties among the death times, the partial likelihood proposed by Breslow (1974),

$$L(\beta) = \prod_{i=1}^{K}\frac{\exp(S_i^T\beta)}{\left[\sum_{j \in R_i}\exp(Z_j^T\beta)\right]^{d_i}},$$

is maximized instead. Here $d_i$ is the number of deaths at time $t_i$ and $S_i$ is the vector sum of the covariates of those $d_i$ individuals.

Section 2.2 APPLICATIONS AND RESULTS FROM COX'S MODEL

The sample size for this analysis was 303. The variable HISTORY OF PREVIOUS CORONARY BYPASS GRAFT (PCBG) was perfectly ordered with time: all patients who had had a previous coronary bypass had survival times of less than 28 months, and the patients who had not undergone a coronary bypass graft had survival times greater than 28 months.
It is therefore clear that HISTORY OF PREVIOUS CORONARY BYPASS GRAFT is highly correlated with survival. Because the variable is ordered with time, the partial likelihood is maximized at infinity, so its coefficient cannot be estimated. Since Cox's model cannot be fitted with such a variable included, it was excluded from the computer analyses. The other 14 variables used in this analysis were AGE, SEX, ISCH, CLAUD, PVOP, ANGINA, MI, DIAB, HYPT, ADDOP, D1, D2, D3 and D4, where D1, D2, D3 and D4 are the indicator variables defined in Table I.

The regression analysis was carried out using BMDP program 2L. The logarithm of the maximized partial likelihood, the global chi-square and its p-value, and the estimated coefficients, their asymptotic standard errors and the standardized coefficients of the covariates are presented in Table V. Here the unknown month of operation and/or death was assumed to be June.

The global chi-square statistic tests the hypothesis that all coefficients are identically zero. It is defined as

$$U^T(0)\, I^{-1}(0)\, U(0),$$

where U(0) is the vector of first derivatives of the log partial likelihood evaluated at $\beta = 0$ and I(0) is the observed information matrix evaluated at $\beta = 0$. The global chi-square has an asymptotic chi-square distribution with degrees of freedom equal to the number of covariates in the model.

A regression coefficient indicates the relationship between the covariate and the hazard function. The effect of a unit change in variable $X_i$ on the hazard function, with all other X's held fixed, is estimated by the multiplicative factor $e^{\hat\beta_i}$.
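As an illustration of the Breslow partial likelihood and of the global (score) chi-square $U^T(0) I^{-1}(0) U(0)$ described above, here is a minimal NumPy sketch. It is not the BMDP 2L implementation: the function names `breslow_loglik` and `score_test_at_zero` are hypothetical, and the four-subject data set is invented. Its single covariate perfectly orders the failure times, as PCBG did in this data set, so the log partial likelihood keeps increasing in $\beta$ and no finite maximum exists.

```python
import numpy as np


def breslow_loglik(beta, time, event, Z):
    """Log partial likelihood with Breslow's treatment of tied death times.

    time, event: 1-D arrays (event = 1 for a death, 0 for a censored time);
    Z: (n, p) covariate matrix; beta: length-p coefficient vector.
    """
    eta = Z @ np.asarray(beta, dtype=float)        # Z_j^T beta for every subject
    ll = 0.0
    for t in np.unique(time[event == 1]):           # distinct death times t_i
        dead = (time == t) & (event == 1)           # the d_i deaths at t_i
        at_risk = time >= t                         # risk set R_i
        # S_i^T beta - d_i * log(sum over R_i of exp(Z_j^T beta))
        ll += eta[dead].sum() - dead.sum() * np.log(np.exp(eta[at_risk]).sum())
    return ll


def score_test_at_zero(time, event, Z):
    """Global chi-square U(0)^T I(0)^{-1} U(0) for H0: beta = 0."""
    p = Z.shape[1]
    U = np.zeros(p)
    info = np.zeros((p, p))
    for t in np.unique(time[event == 1]):
        dead = (time == t) & (event == 1)
        d_i = dead.sum()
        Zr = Z[time >= t]                           # covariates in the risk set
        zbar = Zr.mean(axis=0)                      # at beta = 0 every weight is 1
        U += Z[dead].sum(axis=0) - d_i * zbar       # score contribution
        centred = Zr - zbar
        info += d_i * centred.T @ centred / len(Zr)  # d_i times risk-set covariance
    return U, info, float(U @ np.linalg.solve(info, U))


# Toy data: both subjects with Z = 1 fail before both subjects with Z = 0.
time = np.array([1.0, 2.0, 3.0, 4.0])
event = np.array([1, 1, 1, 1])
Z = np.array([[1.0], [1.0], [0.0], [0.0]])

U, info, stat = score_test_at_zero(time, event, Z)
print(round(stat, 4))                               # 2.8824 (= 49/17), on 1 d.f.
for b in (0.0, 2.0, 5.0, 10.0):
    # log L rises toward -ln 4 as b grows but never attains it
    print(b, breslow_loglik([b], time, event, Z))
```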
A positive coefficient increases the hazard, so survival deteriorates with increasing values of the variable, provided that the covariates are reasonably independent of one another; a negative coefficient has the reverse interpretation.

TABLE V. Regression Coefficients for Cox's Model
Log likelihood = -218.2350
Global chi-square = 53.2400, D.F. = 14, p-value = 0.0000

VARIABLE   COEFFICIENT   STANDARD   STANDARDIZED   P-VALUE
NAME                     ERROR      COEFFICIENT
AGE          0.0368      0.0163       2.27         0.005
SEX         -0.8954      0.4302      -2.08         0.009
ISCH         0.2854      0.3689       0.77         0.180
CLAUD       -0.5094      0.4932      -1.03         0.200
PVOP        -0.2491      0.3819      -0.65         0.780
ANGINA       0.2599      0.3865       0.67         0.340
MI           1.1720      0.3346       3.50         0.000
DIAB         0.4045      0.4135       0.99         0.110
HYPT         0.9028      0.3354       2.69         0.004
ADDOP       -0.7097      0.5018      -1.41         0.09
D1          -0.4222      0.5568      -0.76         0.24
D2           0.7896      0.4536       1.74         0.04
D3          -0.8628      0.5624      -1.53         0.07
D4          -0.2016      0.4242      -0.48         0.34

A p-value of 0.0000 for the global chi-square statistic indicates that the coefficients are not all zero. AGE, SEX, MI, HYPT and D2 are highly significant, whereas D3 and ADDOP appear to have a fairly significant effect on the hazard function.

The above regression analysis was repeated with the unknown month of death and/or operation assumed to be January and December; the results are shown in Table VI (values in parentheses correspond to December). The coefficients change very little compared with those in Table V, and the significant variables are the same. Hence all further analyses assume that an unknown month of death and/or operation is June. The same regression analysis was also repeated, once with survival times measured in years and only the truly surviving patients treated as censored, and again with survival times measured in months and only the truly surviving patients treated as censored.
The corresponding results are presented in Tables VII and VIII respectively. In both tables the coefficients change very little compared with those in Table V, and the significant variables are the same. Hence we use survival times in months and treat non-cardiac deaths and patients alive as censored observations.

TABLE VI. Regression Coefficients for Cox's Model; Varying Month of Death and/or Operation
(Unknown month assumed to be January; December values in parentheses)
Log likelihood = -222.3807 (-210.2653)
Global chi-square = 52.32 (53.05), D.F. = 14, p-value = 0.00 (0.00)

VARIABLE   COEFFICIENT           STANDARD ERROR      STANDARDIZED COEFFICIENT
NAME
AGE         0.0355 ( 0.0358)     0.0162 (0.0160)      2.19 ( 2.24)
SEX        -0.8929 (-0.8820)     0.4281 (0.4365)     -2.09 (-2.02)
ISCH        0.2523 ( 0.3553)     0.3677 (0.3705)      0.67 ( 0.96)
CLAUD      -0.5596 (-0.4384)     0.4921 (0.4886)     -1.14 (-0.90)
PVOP       -0.2202 (-0.2479)     0.3783 (0.3884)     -0.58 (-0.64)
ANGINA      0.2458 ( 0.2414)     0.3850 (0.3893)      0.64 ( 0.62)
MI          1.1895 ( 1.1427)     0.3332 (0.3394)      3.58 ( 3.37)
DIAB        0.4273 ( 0.3208)     0.4139 (0.4187)      1.03 ( 0.77)
HYPT        0.8415 ( 0.9040)     0.3312 (0.3399)      2.54 ( 2.66)
ADDOP      -0.6064 (-0.9413)     0.4909 (0.5243)     -1.24 (-1.80)
D1         -0.4991 (-0.3987)     0.5532 (0.5559)     -0.90 (-0.72)
D2          0.7520 ( 0.7632)     0.4483 (0.4521)      1.68 ( 1.69)
D3         -0.8791 (-0.8325)     0.5609 (0.5609)     -1.57 (-1.48)
D4         -0.2490 (-0.1465)     0.4183 (0.4290)     -0.60 (-0.34)

TABLE VII. Regression Coefficients for Cox's Model; Survival Time in Years and True Survived Patients
Log likelihood = -243.8209
Global chi-square = 59.94, D.F. = 14, p-value = 0.0000

VARIABLE   COEFFICIENT   STANDARD ERROR   STANDARDIZED COEFFICIENT
NAME
AGE          0.0362       0.0156            2.32
SEX         -0.8399       0.4123           -2.04
ISCH         0.2206       0.3562            0.62
CLAUD       -0.5209       0.4812           -1.08
PVOP        -0.2106       0.3568           -0.59
ANGINA       0.2303       0.3684            0.63
MI           1.1806       0.3244            3.64
DIAB         0.3916       0.4084            0.96
HYPT         0.9097       0.3107            2.93
ADDOP       -0.6639       0.4912           -1.35
D1          -0.4417       0.5477           -0.81
D2           0.7892       0.4442            1.78
D3          -0.8270       0.5224           -1.58
D4          -0.2354       0.4109           -0.57

TABLE VIII.
Regression Coefficients for Cox's Model; Survival Time in Months and True Survived Patients
Log likelihood = -233.5218
Global chi-square = 54.82, D.F. = 14, p-value = 0.0000

VARIABLE   COEFFICIENT   STANDARD ERROR   STANDARDIZED COEFFICIENT
NAME
AGE          0.0364       0.0156            2.33
SEX         -0.8876       0.4325           -2.05
ISCH         0.2630       0.3618            0.72
CLAUD       -0.5318       0.4812           -1.11
PVOP        -0.2311       0.3880           -0.65
ANGINA       0.2581       0.3922            0.66
MI           1.1650       0.3210            3.63
DIAB         0.3921       0.4025            0.97
HYPT         0.9130       0.3218            2.84
ADDOP       -0.6528       0.4931           -1.32
D1          -0.4350       0.5529           -0.78
D2           0.7725       0.4512            1.71
D3          -0.8510       0.5583           -1.52
D4          -0.2182       0.4217           -0.52

TABLE IX. Estimated Correlation Matrix

               (1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)   (9)  (10)  (11)  (12)  (13)  (14)
AGE     (1)    1.0
SEX     (2)   -.11   1.0
ISCH    (3)   -.05  -.10   1.0
CLAUD   (4)   -.09   .02   .10   1.0
PVOP    (5)    .01  -.05   .08   .16   1.0
ANGINA  (6)   -.09   .01   .05   .07  -.01   1.0
MI      (7)   -.05   .09  -.02   .04  -.01  -.09   1.0
DIAB    (8)   -.11   .07  -.10  -.09  -.13   .05  -.10   1.0
HYPT    (9)    .03  -.02   .08  -.15  -.18   .07   .11  -.05   1.0
D1     (10)    .04  -.04  -.07  -.02  -.01  -.08  -.07   .16   .05   1.0
D2     (11)    .10   .05  -.22  -.07  -.20  -.11   .02   .16   .07   .03   1.0
D3     (12)   -.11   .06   .14   .13   .16   .11  -.11   .09  -.12   .17   .19   1.0
D4     (13)   -.09   .02  -.10   .13  -.10  -.15  -.14   .12  -.13   .09   .41   .02   1.0
ADDOP  (14)    .05   .06   .10  -.04  -.16   .07  -.09   .05  -.05   .02   .07   .09   .06   1.0

According to medical reports, ISCH and AGE were suspected to be correlated. To examine this first-order association, a simple contingency table was constructed and the null hypothesis of independence of ISCH and AGE was tested with Pearson's chi-square test. For this analysis AGE was categorized into four groups. The 2x4 contingency table of ISCH versus AGE is

                        AGE
ISCH        < 40 yr   41-60 yr   61-80 yr   > 80 yr   Total
present        6        139        275        14        434
absent         0         26         70         5        101
Total          6        165        345        19        535

with a Pearson chi-square of 3.58 on 3 degrees of freedom and a significance level of 0.31. From Table IX, the correlation coefficient between AGE and ISCH is -0.05.
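The Pearson test of independence for the 2x4 ISCH-by-AGE table above can be reproduced with a few lines of standard-library Python. This is a sketch, not the software used in the thesis; `pearson_chi2` and `chi2_sf` are hypothetical helper names, and the upper-tail probability uses the closed-form chi-square survival function for integer degrees of freedom.

```python
import math


def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n    # expected count under independence
            stat += (observed - expected) ** 2 / expected
    return stat, (len(row_tot) - 1) * (len(col_tot) - 1)


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df."""
    h = x / 2.0
    if df % 2 == 0:
        return math.exp(-h) * sum(h ** k / math.factorial(k) for k in range(df // 2))
    tail = math.erfc(math.sqrt(h))                   # the df = 1 term
    tail += math.exp(-h) * sum(h ** (k - 0.5) / math.gamma(k + 0.5)
                               for k in range(1, (df - 1) // 2 + 1))
    return tail


# ISCH rows (as labelled in the table) by AGE group: <40, 41-60, 61-80, >80.
isch_by_age = [[6, 139, 275, 14],
               [0, 26, 70, 5]]
stat, df = pearson_chi2(isch_by_age)
p = chi2_sf(stat, df)
print(f"chi-square = {stat:.2f} on {df} d.f., p = {p:.2f}")
# chi-square = 3.58 on 3 d.f., p = 0.31
```

Three of the eight expected counts are below 5 (the smallest is about 1.1), so the chi-square approximation is only rough for this table.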
The significance of the sample correlation coefficient between AGE and ISCH can be tested as follows (Anderson, 1984, p. 109). If r is the sample correlation coefficient between two variables, the null hypothesis that the population correlation between them is zero is rejected if

$$\frac{\sqrt{N-2}\,|r|}{(1 - r^2)^{1/2}} > t_{N-2}(\alpha),$$

where N is the sample size and $t_{N-2}(\alpha)$ is the two-tailed significance point of the t-distribution with N - 2 degrees of freedom at significance level $\alpha$. Applying this test to the sample correlation coefficient between AGE and ISCH gave a significance level of 0.2. The correlation coefficients between the other pairs of variables were tested similarly, and their significance levels were between 0.1 and 0.3. A stepwise logistic regression was also carried out using BMDP program LR, with ISCH as the binary response variable and AGE as the independent variable. This stepwise procedure did not select AGE as a significant variable (p-value 0.62). Hence there is no evidence of any association between AGE and ISCH in this data set.

Section 2.3 THEORY FOR STEPWISE REGRESSION IN COX'S MODEL

As a more efficient way of identifying the independent variables that are significantly related to the hazard function, a stepwise regression procedure was used. In the stepwise process, significance probabilities are computed from a large-sample partial likelihood ratio test, using the chi-square value calculated from the logarithm of the ratio of two maximized partial likelihoods. This is known as the MPLR method. Let M denote the set of indices of the covariates in the regression model at a given step, and let $L_M$ denote the maximized partial likelihood based on the covariates in M. The MPLR method
The MPLR method removes the variable corresponding to the index $K \in M$ for which

$$\chi^2_K = -2\ln\left(\frac{L_{M^-}}{L_M}\right), \qquad M^- = M - \{K\},$$

is smallest, if $\Pr(\chi^2_K) >$ limit to remove, or enters the variable corresponding to the index $K \notin M$ for which

$$\chi^2_K = -2\ln\left(\frac{L_M}{L_{M^+}}\right), \qquad M^+ = M \cup \{K\},$$

is largest, if $\Pr(\chi^2_K) <$ limit to enter. Here $\Pr(\chi^2_K)$ is the upper-tail probability of a chi-square variable with one degree of freedom. The limits to remove and to enter used for this analysis are 0.15 and 0.10, respectively.

Section 2.4 RESULTS FROM STEPWISE REGRESSION

BMDP program 2L was used to carry out the analysis. Following this procedure, MI was the first variable to enter the model, with $\chi^2 = 19.24$. With MI in the model, the variable added next was AGE; $\chi^2$ for this stage was 6.72, with a significance level of about 0.009. The next variable to enter was D2, with $\chi^2 = 3.58$ and significance level 0.058. HYPT was entered at the fourth step with $\chi^2 = 3.67$ (significance level 0.055). SEX, with $\chi^2 = 4.18$ and significance level 0.041, was entered at the fifth step, and the stepwise process terminated after the sixth step, in which ADDOP was entered with $\chi^2 = 2.85$ (significance level 0.092). The coefficient values, their asymptotic standard errors and the standardized coefficients are given in Table X. These values do not change drastically when compared with the values given in Table V. Thus at this stage we choose as the model

$$\lambda(t; Z) = \lambda_0(t)\exp\{1.33\,\mathrm{MI} + 0.04\,\mathrm{AGE} + 0.77\,\mathrm{HYPT} + 0.89\,\mathrm{D2} - 0.82\,\mathrm{SEX} - 0.76\,\mathrm{ADDOP}\} \qquad (**)$$

Recall that

SEX = 0 for males, 1 for females
ADDOP = 0 if a revision operation was not done, 1 if a revision operation was done

Hence the hazard rate for males is about twice that for females ($e^{-0.82} = 0.44$, so the male-to-female ratio is about 1/0.44, or 2.3), and performing a revision operation tends to halve the hazard rate ($e^{-0.76} = 0.47$). Patients who have had the femoropopliteal grafting technique (FP) have better survival than patients who had undergone any peripheral vascular surgery belonging to the category "OTHER".
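The MPLR removal and entry rule of Section 2.3 can be written as a short loop. The sketch below is a hypothetical pure-Python illustration, not the BMDP 2L implementation: it assumes a caller-supplied function `loglik(subset)` that returns the maximized log partial likelihood $\ln L_M$ for a set of covariate names, and it uses one-degree-of-freedom chi-square tail probabilities with the limits used in this analysis (0.15 to remove, 0.10 to enter).

```python
import math
from typing import Callable, FrozenSet, Iterable, List, Tuple


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))


def stepwise_mplr(
    candidates: Iterable[str],
    loglik: Callable[[FrozenSet[str]], float],
    p_enter: float = 0.10,
    p_remove: float = 0.15,
    max_steps: int = 50,
) -> Tuple[FrozenSet[str], List[Tuple[str, str, float]]]:
    """Forward/backward selection using partial likelihood ratio chi-squares."""
    pool = list(candidates)
    model: FrozenSet[str] = frozenset()
    history: List[Tuple[str, str, float]] = []
    for _ in range(max_steps):
        # Removal: chi2_K = -2 ln(L_{M-K} / L_M); drop the smallest if its p > p_remove.
        if model:
            drop = {k: 2.0 * (loglik(model) - loglik(model - {k})) for k in model}
            k = min(drop, key=drop.get)
            if chi2_sf_1df(drop[k]) > p_remove:
                model = model - {k}
                history.append(("remove", k, drop[k]))
                continue
        # Entry: chi2_K = -2 ln(L_M / L_{M+K}); add the largest if its p < p_enter.
        add = {k: 2.0 * (loglik(model | {k}) - loglik(model)) for k in pool if k not in model}
        if not add:
            break
        k = max(add, key=add.get)
        if chi2_sf_1df(add[k]) >= p_enter:
            break
        model = model | {k}
        history.append(("enter", k, add[k]))
    return model, history


# Hypothetical log partial likelihood in which each covariate adds a fixed amount.
gain = {"MI": 9.62, "AGE": 3.36, "DIAB": 0.10}
model, steps = stepwise_mplr(gain, lambda s: -250.0 + sum(gain[k] for k in s))
for action, name, chi2 in steps:
    print(action, name, round(chi2, 2), round(chi2_sf_1df(chi2), 4))
```

The increments in `gain` are made up so that the first two steps reproduce the chi-square values reported above (19.24 for MI, then 6.72 for AGE); with them, DIAB ($\chi^2 = 0.2$) never enters. Because the entry limit is smaller than the removal limit, a variable that has just entered cannot immediately qualify for removal, and `max_steps` guards against cycling in unusual cases.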
The better survival of the FP group is indicated by the estimated coefficient 0.89 of D2 ($e^{0.89} = 2.44$), which measures the difference in hazard rates between operation types FP and OTHER. The estimated coefficients for MI and HYPT are positive, as expected, since the presence of these conditions is related to poorer patient functioning. A positive coefficient for AGE implies that older patients tend to die earlier than younger ones; but since this study does not have a control group, that is, there are no age-matched patients who have not been operated on for the disease, one cannot make a strong conclusion about the effect of age on the survival of patients who had undergone surgery for peripheral vascular disease.

TABLE X. Parameter Estimates from Stepwise Regression

VARIABLE                  STANDARD    STANDARDIZED
NAME       COEFFICIENT       ERROR     COEFFICIENT
MI              1.3261      0.3031            4.38
AGE             0.0400      0.0149            2.68
HYPT            0.7661      0.3168            2.42
D2              0.8893      0.3995            2.23
SEX            -0.8171      0.4217           -1.94
ADDOP          -0.7619      0.4938           -1.54

Section 2.5 CHECKING FOR PROPORTIONALITY ASSUMPTION AND ADEQUACY OF THE FIT IN COX'S MODEL

Recall that the survival function $S(t; Z)$ is given by

$$S(t; Z) = \exp\left\{-\int_0^t \lambda(u; Z)\,du\right\}$$

Hence with Cox's proportional hazards model we get

$$-\ln S(t; Z) = -\ln S_0(t)\,\exp\{Z^T\beta\}$$

$$\ln[-\ln S(t; Z)] = \ln[-\ln S_0(t)] + Z^T\beta$$

Thus, if the proportionality assumption is true, the logarithm of minus the logarithm of the survival function for a particular covariate pattern, plotted against time, is the baseline curve $\ln[-\ln S_0(t)]$ shifted vertically by the constant $Z^T\beta$. When this is plotted on the same scale for the categories of a particular variable, such as males and females for the variable SEX, the two curves should therefore be parallel if the proportionality assumption holds for that variable.
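The identity $\ln[-\ln S(t; Z)] = \ln[-\ln S_0(t)] + Z^T\beta$ can be checked numerically. The sketch below assumes made-up survival probabilities for males (other covariates held fixed; these are not estimates from this study) and uses the SEX coefficient of Table X. It shows that the female log minus log curve is the male curve shifted by the constant $-0.8171$ at every time point, and it also prints $\exp(\beta)$ for each coefficient in Table X.

```python
import math

# Coefficients of model (**) from Table X.
TABLE_X = {"MI": 1.3261, "AGE": 0.0400, "HYPT": 0.7661,
           "D2": 0.8893, "SEX": -0.8171, "ADDOP": -0.7619}


def log_minus_log(surv: float) -> float:
    """ln(-ln S) for a survival probability strictly between 0 and 1."""
    return math.log(-math.log(surv))


def ph_survival(reference, linear_predictor):
    """S(t; Z) = S_ref(t) ** exp(Z'beta) under the proportional hazards model."""
    return [s ** math.exp(linear_predictor) for s in reference]


# Hypothetical survival probabilities for males (SEX = 0) at increasing times.
male = [0.97, 0.90, 0.82, 0.71, 0.63]
female = ph_survival(male, TABLE_X["SEX"] * 1)        # SEX = 1
gaps = [log_minus_log(f) - log_minus_log(m) for f, m in zip(female, male)]
print([round(g, 4) for g in gaps])                    # the same constant at every time

for name, beta in TABLE_X.items():
    print(f"{name}: exp(beta) = {math.exp(beta):.2f}")
```

A constant vertical gap is exactly what "parallel curves" means in the plots of Figures 1 to 6; a gap that drifts with time would indicate non-proportional hazards for that variable.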
Figures 1 through 6 show plots of the logarithm of minus the logarithm of the estimated survival function for the six significant variables, evaluated at the mean covariate vector (Kalbfleisch and Prentice, 1980, p. 92). The mean covariate vector has elements equal to the mean of each covariate; it was used to avoid having too many plots, one for each possible combination of values of the six variables. The proportional hazards assumption is met by the variables AGE, SEX, MI and HYPT, as the corresponding curves are parallel. The proportionality assumption does not seem to hold for D2 and ADDOP, since those curves show slight departures from parallelism.

Once the model is fitted, its overall adequacy can be checked by plotting survival curve estimates computed from the residuals. The estimated residual for the $i$th individual is given by

$$\hat e_i = -\ln \hat S(t_i; Z_i), \qquad i = 1, 2, \ldots, n$$

where $\hat S(t_i; Z_i)$ is the estimated survival function for the $i$th individual (Kalbfleisch and Prentice, 1980, p. 96). If the model fits the data, the $\hat e_i$'s should behave as a random sample of censored unit exponential variates. Thus, when the survival curve estimates based on these residuals are plotted on a log scale, the resulting plot should be approximately a straight line with slope -1. In this analysis, the estimated residuals were obtained from the output of BMDP program 1L, which computed the Kaplan-Meier survival curve estimates based on the residuals. The corresponding plot, illustrated in Figure 7, supports the adequacy of the fit.
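The residual check can be sketched in a few lines. The following pure-Python code is a hypothetical illustration of the procedure (BMDP 1L performed it in this study): it computes the Kaplan-Meier estimate from residuals and their censoring indicators, then fits a line through the origin to $\ln \hat S(e)$ against $e$, which should have slope close to -1 for a well-fitting model. The residuals are simulated censored unit exponential variates, not the study data, and the cut-off `min_surv` that drops the sparse tail is an assumption of this sketch.

```python
import math
import random


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, S(time)) at each distinct event time.

    `events[i]` is 1 for an observed failure and 0 for a censored observation.
    Censored cases tied with failures are kept in the risk set at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def slope_through_origin(curve, min_surv=0.05):
    """Least-squares slope of ln S(e) on e with no intercept, ignoring the sparse tail."""
    pts = [(e, math.log(s)) for e, s in curve if s >= min_surv]
    return sum(e * ls for e, ls in pts) / sum(e * e for e, _ in pts)


# Simulated residuals: unit exponential with independent exponential censoring,
# i.e. what a correctly specified model should produce (not the thesis data).
rng = random.Random(1985)
true_e = [rng.expovariate(1.0) for _ in range(2000)]
cens = [rng.expovariate(0.3) for _ in range(2000)]
resid = [min(e, c) for e, c in zip(true_e, cens)]
event = [1 if e <= c else 0 for e, c in zip(true_e, cens)]
km = kaplan_meier(resid, event)
print(f"slope of ln S(e) against e: {slope_through_origin(km):.2f}")  # expected near -1
```

For real data, `resid` would hold $\hat e_i = -\ln \hat S(t_i; Z_i)$ from the fitted model and `event` the original censoring indicators.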
[FIGURE 1. Log minus log survival function]
[FIGURE 2. Log minus log survival function]
[FIGURE 3. Log minus log survival function for HYPT; strata NO (A) and YES (B); horizontal axis MONTH, 0 to 72]
[FIGURE 4. Log minus log survival function for SEX; strata MALE (A) and FEMALE (B); horizontal axis MONTH, 0 to 72]
[FIGURE 5. Log minus log survival function for ADDOP; strata NO (A) and YES (B); horizontal axis MONTH, 0 to 72]
[FIGURE 6. Log minus log survival function for D2; strata NO (A) and YES (B); horizontal axis MONTH, 0 to 72]
[FIGURE 7. Residual plot for checking proportional hazards model]
Chapter 3 ALL POSSIBLE SUBSETS REGRESSION IN COX'S MODEL

Section 3.1 THEORY

Although stepwise procedures are often used to select significant variables in regression with censored data, all possible subsets regression is preferred as a more reliable and informative method, provided that it is computationally feasible (Kuk, 1984; Draper and Smith, 1981). This is because stepwise procedures lead to a single subset of variables and do not suggest alternative good subsets. A criterion that is based on the Wald statistic and is equivalent to Mallows' $C_p$ statistic is used for selecting the best subset.

Consider Cox's proportional hazards model discussed in Section 2.1. Let $\beta^T = (\beta_1^T, \beta_2^T)$ and let model $a$ correspond to $\beta_2 = 0$. Then $W_a$, the Wald statistic of the full model against model $a$, is defined as

$$W_a = \hat\beta_2^T \hat C_{22}^{-1} \hat\beta_2$$

where $\hat\beta^T = (\hat\beta_1^T, \hat\beta_2^T)$ is obtained from the full model and

$$\hat C = \begin{pmatrix} \hat C_{11} & \hat C_{12} \\ \hat C_{21} & \hat C_{22} \end{pmatrix}$$

is obtained from the full model as the estimated covariance matrix of $\hat\beta$. So, to get $W_a$ from the full fit, extract the second component of $\hat\beta$ and the bottom-right block of $\hat C$; the latter needs to be inverted. Then the selection criterion $V_a$ suggested by Kuk is

$$V_a = W_a + 2p_a$$

where $p_a$ is the number of covariates in model $a$.

To show the equivalence of $V_a$ and the $C_p$ statistic, Kuk constructed the matrix

$$\begin{pmatrix} A & A\hat\beta \\ \hat\beta^T A & (N - p - 1) + \hat\beta^T A \hat\beta \end{pmatrix} \qquad (3.1.1)$$

where again $\hat\beta$ is obtained from the full fit, $A = \hat C^{-1}$ is the inverse of the estimated covariance matrix of $\hat\beta$, $p$ is the number of covariates in the full model, and $N$ is an arbitrary integer greater than $p$.
If $x$ and $y$ are the independent and dependent variables of an ordinary multiple regression and $M$ is the matrix of corrected sums of squares and cross-products,

$$M = \begin{pmatrix} x^T x & x^T y \\ y^T x & y^T y \end{pmatrix},$$

then the residual sum of squares is

$$\mathrm{RSS} = y^T y - y^T x\,(x^T x)^{-1} x^T y.$$

By treating (3.1.1) as if it were a matrix of corrected sums of squares and cross-products of independent and dependent variables computed from a sample of size $N$, the residual sum of squares obtained from this matrix is

$$\mathrm{RSS(full)} = (N - p - 1) + \hat\beta^T A \hat\beta - \hat\beta^T A\,A^{-1}A\,\hat\beta = (N - p - 1) + \hat\beta^T A \hat\beta - \hat\beta^T A \hat\beta = N - p - 1.$$

The residual sum of squares for model $a$ is

$$\mathrm{RSS}(a) = \mathrm{RSS(full)} + \hat\beta_2^T \hat C_{22}^{-1} \hat\beta_2 \qquad (3.1.2)$$

and Mallows' $C_p$ statistic for model $a$ is

$$C_p(a) = \frac{\mathrm{RSS}(a)}{s^2} + 2(p_a + 1) - N$$

where $s^2 = \mathrm{RSS(full)}/(N - p - 1) = 1$ by the choice of (3.1.1). Substituting (3.1.2) for $\mathrm{RSS}(a)$, the above equation for $C_p(a)$ can be simplified as

$$C_p(a) = \mathrm{RSS(full)} + \hat\beta_2^T \hat C_{22}^{-1} \hat\beta_2 + 2(p_a + 1) - N = (N - p - 1) + (2 - N) + W_a + 2p_a = -p + 1 + W_a + 2p_a = V_a - p + 1.$$

Hence the criterion $V_a$ is formally equivalent to Mallows' $C_p$: the two differ only by the constant $1 - p$, so they rank the subsets identically. The problem can now be handled by the standard statistical package BMDP program 9R, which performs all possible subsets linear regression. The subset that minimizes $C_p$ (equivalently, $V_a$) is chosen as the best subset.

Section 3.2 APPLICATIONS AND RESULTS

For this analysis all 14 variables were used. The estimated coefficients and the estimated covariance matrix of $\hat\beta$ were obtained from the output of BMDP program 2L. The estimated covariance matrix was then inverted with the help of a Fortran subroutine (see Appendix 2). Using this inverted matrix and $\hat\beta$, the matrix (3.1.1) was constructed. In this study, (3.1.1) was a 15 × 15 symmetric matrix. The matrix is used as a covariance matrix for input to the BMDP program P9R.
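The equivalence derived in Section 3.1 can be checked numerically. The pure-Python sketch below uses hypothetical full-model estimates (not the thesis values) and illustrative function names. It computes $V_a = W_a + 2p_a$ directly from $\hat\beta$ and $\hat C$, separately computes $C_p(a)$ by treating matrix (3.1.1) as corrected sums of squares and cross-products with $A = \hat C^{-1}$, and confirms $C_p(a) = V_a - p + 1$ for every subset. It enumerates the subsets directly instead of calling BMDP 9R.

```python
from itertools import combinations


def inverse(m):
    """Gauss-Jordan inverse of a small dense matrix given as a list of lists."""
    n = len(m)
    a = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[piv] = a[piv], a[col]
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def quad(v, m):
    """Quadratic form v' m v."""
    k = len(v)
    return sum(v[i] * m[i][j] * v[j] for i in range(k) for j in range(k))


def kuk_v(beta, cov, kept):
    """Kuk's V_a = W_a + 2 p_a for the model keeping covariates `kept` (others set to 0)."""
    dropped = [j for j in range(len(beta)) if j not in kept]
    w = 0.0
    if dropped:
        c22 = [[cov[i][j] for j in dropped] for i in dropped]
        w = quad([beta[j] for j in dropped], inverse(c22))
    return w + 2.0 * len(kept)


def mallows_cp(beta, cov, kept, n):
    """C_p(a) obtained by treating matrix (3.1.1) as corrected SSCP of (x, y)."""
    p = len(beta)
    a = inverse(cov)                                                   # x'x  <->  A
    ab = [sum(a[i][j] * beta[j] for j in range(p)) for i in range(p)]  # x'y  <->  A beta
    yy = (n - p - 1) + quad(beta, a)                                   # y'y
    s2 = (yy - quad(ab, cov)) / (n - p - 1)                            # RSS(full)/(N-p-1) = 1
    rss = yy
    if kept:
        xx = [[a[i][j] for j in kept] for i in kept]
        rss -= quad([ab[i] for i in kept], inverse(xx))
    return rss / s2 + 2 * (len(kept) + 1) - n


# Hypothetical full-model estimates for three covariates.
beta = [0.5, -1.2, 0.1]
cov = [[0.04, 0.01, 0.00],
       [0.01, 0.09, 0.02],
       [0.00, 0.02, 0.25]]
p, n = len(beta), 303
for size in range(p + 1):
    for kept in combinations(range(p), size):
        v = kuk_v(beta, cov, kept)
        cp = mallows_cp(beta, cov, kept, n)
        print(kept, round(v, 3), round(cp, 3), round(v - p + 1, 3))
```

For the 14 covariates of this study the same enumeration covers $2^{14} = 16{,}384$ subsets, which is small enough to evaluate directly; the subset with the smallest $V_a$ is the one a $C_p$ search would choose.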
In the control language for BMDP program P9R, the value of the sample size N must be specified in the INPUT paragraph (see Appendix 2). The best subset selected by this method was SEX, MI, HYPT, D2 and ADDOP, with a Cp value of 5.18. The second best was the model AGE, SEX, MI, HYPT, D2 and ADDOP, with a Cp value of 5.58; this is the subset selected by the stepwise procedure. The difference between the Cp values of the best and second-best subsets is very small. The coefficient for AGE in the second-best subset was 0.03^6 and the corresponding standardized coefficient was 2.61. When these values are compared with the corresponding values obtained from stepwise regression, it is clear that AGE is a significant variable.

From the results on the significance of the correlation coefficients (discussed in Section 2.2), there appears to be no evidence of any association between AGE and the other variables. Separate stepwise logistic regressions were carried out for each variable, taken as a binary response, with AGE as the independent variable. All these regressions gave a p-value greater than 0.6 for AGE. Contingency tables were also constructed for AGE vs. each of the other five significant variables, and Pearson's chi-square test was carried out. According to the results presented in Table XI, there is no evidence of any association between AGE and the other variables in this data set. MI, AGE, HYPT, D2, SEX and ADDOP were selected as the significant variables for the final model.

TABLE XI. Two-way Contingency Tables

                AGE
                   < 40 yr  41-60 yr  61-80 yr   > 80 yr     Total
SEX   Male               4        68       139         6       217
      Female             1        23        57         5        86
      Total              5        91       196        11       303
Pearson's $\chi^2_3$ = 2.23, significance level = 0.53

                AGE
                   < 40 yr  41-60 yr  61-80 yr   > 80 yr     Total
      No                 4        72       132         8       216
      Yes                1        19        64         3        87
      Total              5        91       196        11       303
Pearson's $\chi^2_3$ = 4.41, significance level = 0.22

                AGE
                   < 40 yr  41-60 yr  61-80 yr   > 80 yr     Total
D2    No                 4        79       176        10       269
      Yes                1        12        20         1        34
      Total              5        91       196        11       303
Pearson's $\chi^2_3$ = 0.99, significance level = 0.80

                AGE
                   < 40 yr  41-60 yr  61-80 yr   > 80 yr     Total
ADDOP No                 5        76       162        11       254
      Yes                0        15        34         0        49
      Total              5        91       196        11       303
Pearson's $\chi^2_3$ = 3.29, significance level = 0.35

                AGE
                   < 40 yr  41-60 yr  61-80 yr   > 80 yr     Total
MI    No                 4        73       154         7       238
      Yes                1        18        42         4        65
      Total              5        91       196        11       303
Pearson's $\chi^2_3$ = 1.61, significance level = 0.66

Chapter 4 CHECKING FOR INFLUENTIAL OBSERVATIONS

In some data sets, a single case may have sufficient impact on the regression that, if it were deleted, different results would be obtained. Such cases are known as influential observations. Reid and Crepeau (1985) suggest that empirical influence functions, computed for each covariate and each observation in the proportional hazards regression model, can be useful for identifying these influential observations. The theory and method discussed in that reference were applied to this study. Influence function values were computed for each case (patient) and each covariate. Since it is difficult to consider influence function values for all 14 variables and 303 observations, attention was restricted to the six significant variables. The estimated coefficients were obtained from BMDP program 2L, and a Fortran program was used to calculate the influence function. From the summary in Table XII, it is seen that case 160 (case numbers refer to all 535 cases) has the largest value of the influence function for covariates HYPT and D2, and observation 1 has the smallest value for covariates MI and D2. Table XIII summarizes the proportional hazards regression models, the first using all observations and the others excluding different cases. The magnitude of the influence function for each case is roughly consistent with the magnitude of $\hat\beta - \hat\beta_{(i)}$, where $\hat\beta_{(i)}$ is the estimated coefficient when the $i$th case is deleted.
In this study, one unit on the influence function scale corresponds to $|\hat\beta - \hat\beta_{(i)}|$ of approximately 0.003. From the values given in Table XIII, it is clear that the estimated coefficients and their standard errors do not change very much, which indicates that none of these cases has a very strong influence on the estimated parameters. This also agrees with the proportional hazards plots and the residual plot of Section 2.5, since none of these cases stands out in either of these plots.

TABLE XII. Summary of Influence Function Values

Covariate   Maximum of influence      Minimum of influence
            function (case no.)       function (case no.)
AGE           0.5069 (431)             -1.2242 (324)
SEX          44.9926 (225)            -24.0415 (257)
MI           17.2703 (106)            -23.3969 (1)
HYPT         20.1358 (160)            -19.6145 (30)
D2           33.6886 (160)            -45.0780 (1)
ADDOP        64.2133 (46)             -40.4665 (162)

TABLE XIII. Proportional Hazards Regression Model: Estimated Coefficient (Standard Error)

                                                AGE      SEX       MI     HYPT       D2    ADDOP
All data with model (**)                     0.0400  -0.8171   1.3261   0.7661   0.8893  -0.7619
                                            (0.015)  (0.422)  (0.303)  (0.317)  (0.400)  (0.494)
Case 160 deleted                             0.0394  -0.7770   1.3636   0.6943   0.7575  -0.7376
                                            (0.015)  (0.423)  (0.306)  (0.322)  (0.421)  (0.495)
Case 1 deleted                               0.C893  -0.8290   1.3894   0.7474   1.0316  -0.7916
                                            (0.015)  (0.421)  (0.306)  (0.315)  (0.401)  (0.494)
Case 46 deleted                              0.0398  -0.7916   1.3817   0.8113   0.9340  -0.7502
                                            (0.015)  (0.423)  (0.307)  (0.320)  (0.402)  (0.495)
Case 162 deleted                             0.0404  -0.8355   1.3525   0.7962   0.8827  -0.6446
                                            (0.015)  (0.422)  (0.303)  (0.316)  (0.399)  (0.494)
All cases specified in Table XII deleted     0.0495  -0.9765   1.4662   0.6816   1.1120  -0.7984
                                            (0.016)  (0.487)  (0.332)  (0.344)  (0.430)  (0.555)

Chapter 5 CONCLUSIONS

An outline of the surgical techniques of bypass surgery for peripheral vascular disease is presented. The data analysed in this study are based on 303 patients surgically treated for peripheral vascular disease at St. Paul's Hospital, Vancouver, B.C., between 1975 and 1977.
A subset of the recorded variables was used for the analysis because of problems with incomplete records. When the month of death and/or operation was unknown, it was assumed to be June. Statistical procedures such as Cox's regression, stepwise regression and all subsets regression for the proportional hazards model, as well as contingency tables, were used to isolate the variables important in predicting survival and to discover associations among variables. The conclusions of these analyses are:

1) the most important variables, in descending order of significance, are myocardial infarction, presence or absence of hypertension, sex, and whether or not a revision operation was done. History of a previous coronary bypass graft is highly correlated with survival, but its significance cannot be compared with that of the other significant variables, since the coefficient corresponding to history of a previous coronary bypass graft could not be estimated.
2) age is also related to survival in this data set. However, since there is no control group, that is, no group of age-matched patients who have not undergone surgery for peripheral vascular disease, one cannot make a strong conclusion about the effect of age on the survival of patients who have had surgery for peripheral vascular disease.
3) patients who have had the femoropopliteal grafting technique have better survival than patients who had undergone any peripheral vascular surgery belonging to the category "OTHER".
4) in this data set, the hazard rate for males is about twice that for females.
5) performing a revision operation tends to halve the hazard rate.
6) presence of myocardial infarction or hypertension is related to poorer patient functioning.
7) although pairwise correlation between some of the variables (for example, age and ischemia, or ischemia and claudication) was suspected, the tests used in this study did not indicate it.
One of the difficulties in this study was that no control group was available; hence strong conclusions could not be made in certain instances. Another problem was that the data were not completely recorded, especially the date of death and/or operation. Although there were data for 535 patients, 89 cases had to be deleted because their year of death and/or operation was not known. Then another 143 cases were excluded from the study because some of their variables had missing values. With more complete and accurate data, the results could have been more accurate. Also, the type of operation should be clearly specified. In this data set, only the types ABF, FP and AAA were clearly noted. The operation types belonging to the category "OTHER" were recorded very poorly; if the specific operation type in this category had been recorded more accurately, one could have seen whether there was a difference in survival rates between those types. One could also have checked for interactions between operation type and other variables. A further problem was that in certain cases the handwriting on the data sheets and patient cards was illegible. It would have been much better if the people engaged in the survey, or the medical staff, had entered the records into computer files and then given them to the statistician for analysis.

BIBLIOGRAPHY

Anderson, T.W. (1984). An Introduction to Multivariate Statistical Analysis. 2nd Ed. New York: Wiley.
Cooperman, M., Pflung, B., Martin, E.W. and Evans, W.E. (1978). Cardiovascular risk factors in patients with peripheral vascular disease. Surgery 84, 505-509.
Cox, D.R. (1972). Regression models and life tables (with discussion). J. R. Statist. Soc. B 34, 187-220.
Draper, N.R. and Smith, H. (1981). Applied Regression Analysis. 2nd Ed. New York: Wiley.
Gaspar, M.R. and Barker, W.F. (1981). Peripheral Vascular Disease. 3rd Ed.
Kalbfleisch, J.D. and Prentice, R.L. (1980). The Statistical Analysis of Failure Time Data. New York: Wiley.
Kuk, A.Y.C. (1984). All subsets regression in proportional hazards model. Biometrika 71, 587-592.
Lawless, J.F. (1982). Statistical Models and Methods for Lifetime Data. New York: Wiley.
Miller, R.G. (1982). Survival Analysis. New York: Wiley.
Reid, N. and Crepeau, H. (1985). Influence functions for proportional hazards regression. Biometrika 72, 1-9.
Weisberg, S. (1980). Applied Linear Regression. New York: Wiley.
Wright, C.B. (1983). Vascular Grafting: Clinical Applications and Techniques.

APPENDIX 1

[Scanned form]

PATIENT CARD: CAD-PVD Study
Page No. ____    Name ____    Age ____    Sex ____
ASOD
  Operation: ABF, FP, AAA, Other
  Symptoms: Isch., Claudication, Other, Prev. Vasc. Op.
CAD: Angina, MI, ACBG
Diabetes    Hypertension
Duration of follow-up ____
Early Death: Cardiac, Non-Cardiac
Late Death: Cardiac, Non-Cardiac

FORMAT FOR COMPUTER FILES

(1) SEQUENCE NUMBER
(2) PAGE NUMBER
(3) AGE: 99 = Missing
(4) SEX: 0 = Male, 1 = Female
(5) OPERATION TYPE: ABF, FP, AAA, OTHER (each coded 0 = NO, 1 = YES)
(6) ADDITIONAL SURGERY: 0 = NO, 1 = YES
(7) SYMPTOMS and (8) HISTORY (each item coded 0 = NO, 1 = YES): Ischemia; Claudication; Previous vascular operation; Angina; Myocardial infarction; Previous coronary bypass; Diabetes; Hypertension
(9) STATUS OCTOBER '81: Early death = 0, Late death = 1, Still alive = 2, Unknown = 9
(10) CAUSE OF DEATH: Non cardiac = 0, Cardiac = 1, Still alive = 2, Unknown = 9
(11) DATE OF DEATH: DD/MM/YR. If DD is unknown, leave blank; if MM is unknown, leave blank; if YR is unknown, type 99; if the patient is still alive, leave all columns blank.
(12) DATE OF OPERATION: DD/MM/YR. If DD is unknown, leave blank; if MM is unknown, leave blank; if YR is unknown, type 99.

If information on (3) to (8) is known to be missing, this was indicated with a code of 9.

APPENDIX 2

FORTRAN SUBROUTINE FOR MATRIX INVERSION

      REAL*8 DA, DT, DDET, DCOND
C     ARRAYS SIZED FOR MATRICES UP TO 15 X 15; THE ORIGINAL 8 X 8 AND
C     10 X 10 ARRAYS CANNOT HOLD THE 14 X 14 COVARIANCE MATRIX.
C     IPERM IS SIZED 2*NDIMA BY ANALOGY WITH THE ORIGINAL 16 = 2*8 (ASSUMED).
      DIMENSION DA(15,15), DT(15,15), IPERM(30)
C     LEADING DIMENSIONS PASSED TO INV (UNDEFINED IN THE ORIGINAL LISTING).
      NDIMA = 15
      NDIMT = 15
C **READ IN MATRIX DATA** (LIST-DIRECTED, SO SMALL COVARIANCES ARE NOT
C   TRUNCATED BY A FIXED F5.0 FIELD)
      READ (5,*) N
      READ (5,*) ((DA(I,J), I=1,N), J=1,N)
C **FIND THE INVERSE**
      CALL INV (N, NDIMA, DA, IPERM, NDIMT, DT, DDET, JEXP, DCOND)
      IF (DDET) 25, 30, 25
C **WRITE OUT RESULTS** (FORMAT 40 NOW HAS A DESCRIPTOR FOR DCOND)
   25 WRITE (6,40) N, DDET, JEXP, DCOND
   40 FORMAT ('N=', I2, 5X, 'DETERM= ', G10.3, '*10**', I4, 5X,
     1        'COND= ', G10.3/'INVERSE')
      WRITE (6,50) ((DT(I,J), I=1,N), J=1,N)
   50 FORMAT (1X, 14G10.3)
      STOP
   30 WRITE (6,60)
   60 FORMAT ('INVERSION FAILED')
      STOP
      END

CONTROL LANGUAGE FOR BMDP:9R PROGRAM

/ PROBLEM   TITLE IS 'PVD DATA'.
/ INPUT     UNIT = 9. CASES = 303. VARIABLES = 15. TYPE = COVA. SHAPE = SQUARE.
            FORMAT IS '(15F8.3)'.
/ VARIABLE  NAMES ARE AGE, SEX, ISCH, CLAUD, PVOP, ANGINA, MI, DIAB, HYPT,
            D1, D2, D3, D4, ADDOP, SURVIVAL.
/ REGRESS   DEPENDENT IS SURVIVAL. INDEPENDENT ARE 1 TO 14. METHOD = CP.
            TOLERANCE = 0.0001. PENALTY = 2. NUMBER = 3. ZERO.
/ END
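The INPUT paragraph above reads the 15 × 15 matrix (3.1.1) with FORMAT '(15F8.3)'. A hypothetical helper for preparing such an input file is sketched below: it builds (3.1.1) from $\hat\beta$ and $A = \hat C^{-1}$ (the inverse produced by the Fortran routine) and writes one matrix row per record in fixed-width F8.3 fields, refusing any value that would overflow an 8-character field. The function names, and the assumption of one row per record, are illustrative rather than taken from the original work.

```python
def kuk_matrix(beta, a, n):
    """Matrix (3.1.1): [[A, A b], [b'A, (N - p - 1) + b'A b]] for b = beta, A = C^{-1}."""
    p = len(beta)
    ab = [sum(a[i][j] * beta[j] for j in range(p)) for i in range(p)]
    corner = (n - p - 1) + sum(beta[i] * ab[i] for i in range(p))
    rows = [list(a[i]) + [ab[i]] for i in range(p)]
    rows.append(ab + [corner])          # A is symmetric, so b'A equals (A b)'
    return rows


def to_f8_3(rows):
    """Render each row as one record of F8.3 fields (8 characters, 3 decimals)."""
    lines = []
    for row in rows:
        fields = [f"{float(v):8.3f}" for v in row]
        too_wide = [f for f in fields if len(f) > 8]
        if too_wide:
            raise ValueError(f"values do not fit F8.3: {too_wide}")
        lines.append("".join(fields))
    return "\n".join(lines)


# Hypothetical two-covariate example (not the study's estimates).
matrix = kuk_matrix([1.0, 2.0], [[2.0, 0.0], [0.0, 1.0]], n=10)
print(to_f8_3(matrix))
```

Failing loudly on overflow is deliberate: a Fortran F8.3 field that is too narrow would otherwise print asterisks or misalign the following fields.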
Title | Statistical analysis of survival data : an application to perhipheral vascular bypass surgery |
Creator | Kottegoda, Preethi Nirmalie |
Publisher | University of British Columbia |
Date Issued | 1985 |
Genre | Thesis/Dissertation |
Type | Text |
Language | eng |
Date Available | 2010-06-20 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
IsShownAt | 10.14288/1.0096717 |
URI | http://hdl.handle.net/2429/25912 |
Degree | Master of Science - MSc |
Program | Statistics |
Affiliation | Science, Faculty of; Statistics, Department of |
Degree Grantor | University of British Columbia |
Campus | UBCV |
Scholarly Level | Graduate |