STATISTICAL ANALYSIS OF SURVIVAL DATA: AN APPLICATION TO CORONARY BYPASS SURGERY

by

NANCY REID

B. Math., University of Waterloo, 1974

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

in

THE FACULTY OF GRADUATE STUDIES

Institute of Applied Mathematics and Statistics

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

August, 1976

© Nancy Reid, 1976

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Mathematics
The University of British Columbia
2075 Wesbrook Place
Vancouver, Canada V6T 1W5

ABSTRACT

The survival data for two hundred patients who underwent coronary bypass surgery are subjected to quantitative analysis. The questions of interest are: (i) the long-term survival rates of these patients, (ii) the prognostic factors influencing survival, and (iii) the importance of types of grafting in long-term survival. Statistical methods used to ascertain the important prognostic variables include contingency table analysis and discriminant analysis. It is found that left ventricular function, age, risk classification, and extent of occlusion of the diseased artery are the most influential variables. The relationship of these variables to survival is analysed in detail using the proportional hazards model discussed by Cox (1972).

ACKNOWLEDGEMENTS

I would like to thank Dr. J. V.
Zidek for suggesting the topic of this thesis, and for his generous time and assistance given during the writing. I am also indebted to Dr. J. E. Koziol for his careful reading and helpful criticisms of this thesis. I would like to express my gratitude to Dr. U. S. Page of Portland, Oregon, who kindly provided the data and offered advice on the medical aspects of this thesis. The financial support of the National Research Council of Canada and the University of British Columbia is gratefully acknowledged.

Introduction

Bypass surgery for coronary artery disease has been gaining wide acceptance as an effective alternative to medical treatment. However, the efficacy of surgery remains controversial, and its long-term benefits are extensively debated. As a result, most surgical centres keep follow-up records of the survival experience of their patients. In this paper, the records of one such centre are subjected to quantitative analysis in order to assess the long-term survival rates and the factors affecting survival. Clinical details of bypass surgery and a history of the data are presented in Section I.

Retrospective data from undesigned experiments present many problems of analysis. The data collected has been subject to many biases, and the assumptions necessary for even simple parametric models, such as linear regression, may not be fulfilled. There are many variables measured on relatively few patients, and it is necessary to isolate a manageable number of important variables. In addition, many survival times are not observed, either because cases were lost to medical follow-up or because the study terminated with many cases still alive. This problem of censored observations is typical of medical studies.

In Section II answers to the following questions are sought:
1) What factors are the most important in predicting survival?
2) How does grafting technique affect survival?
The statistical methods employed to answer these questions are contingency table analysis, stepwise discriminant analysis, cluster analysis, and Sonquist and Morgan's interaction detector. It is found that left ventricular function, age, risk classification, and extent of occlusion in the diseased arteries are the most important prognostic variables. These conclusions agree with the results of similar studies.

In Section III the relationship between survival and the important independent variables is examined in more detail via the proportional hazards model. This model combines regression analysis with the life table methods developed for data with censored observations. It is shown how a separate survival distribution (life table) can be estimated for each patient, based on the survival experience of the whole sample as well as on that patient's own measurements on the independent variables.

I. Background of the data

Coronary artery bypass surgery has provided an effective method for treatment of myocardial ischemia, and has been particularly successful in relieving anginal pain. The survival experience of medically and surgically treated patients has been extensively investigated. (See Mundth and Austen, 1975, for a review.) Although surgery is associated with successful short-term results and improved functional status, there are few reports of long-term follow-up of surgical patients. Most studies are retrospective, and do not have a suitable group of medically treated patients with which to compare surgical results. This is due in part to the ethical and practical difficulties associated with prospective, randomized studies, such as the one in progress since 1972 at twelve Veterans Administration Hospitals in the United States, for which preliminary results are just now being reported (Mathur and Guinn, 1975). The shortage of long-term follow-up is due to a variety of factors.
The necessary surgical procedures have been developed only recently, and few centres have been performing operations for more than six years. In addition, surgical mortality is known to be related to the surgical experience of the centre (Mundth and Austen, 1975), and surgical techniques have been substantially refined over the past six years (Tecklenberg et al., 1975). Thus the long-term results of the earliest patients cannot be regarded as representative of the long-term results achievable with current practice, and so they have not been thoroughly investigated. There are also practical difficulties in finding and recording long-term data. It is very time-consuming to obtain regular follow-up information on all patients, and the data files soon become so large as to be unmanageable without a computer.

Another important unanswered question concerns the relative advantages and disadvantages of the various surgical techniques in use. Although many changes and supposed improvements have been made in surgical practice since the techniques were first developed, little comparative data has been reported.

The most common type of bypass is the saphenous vein aortocoronary bypass first described by Favaloro et al. (1968). An appropriate length of saphenous vein is removed from the leg. One end of the vein is grafted to the aorta, and the other end to the distal portion of the occluded artery (i.e. past the area of stenosis, or narrowing). Thus blood can reach the heart muscle through the graft, and the heart can receive the necessary oxygen. Such a procedure is called direct myocardial revascularization. (Indirect revascularization involves implanting a freely bleeding artery, and is not often used.) Separate grafts are used for each area of stenosis. Most patients receive one or two bypass grafts, but some receive three or four.

The direct internal mammary artery bypass proposed by Green et al.
(1972) has also been used successfully, and several studies have reported better results with the mammary artery than with the saphenous vein (e.g. Siegel et al., 1975; Mark et al., 1975; McCormick et al., 1975). The mammary artery has its own blood supply from the aorta, so when the area of stenosis of the coronary artery is suitably located, the free end of the mammary artery can be grafted to the distal portion of the occluded artery. In some cases the mammary artery is too narrow to provide adequate flow, or too far from the occlusion to provide a suitable bypass, and the saphenous vein bypass must be used instead. The radial artery has also been used as a bypass (U. S. Page, personal communication).

The bypass vein or artery is usually grafted to the distal portion of the occluded artery by stitching the end of the bypass to a small hole cut in the artery (L. H. Cohn, 1973). Such a graft is called an end-to-side anastomosis (Trapp and Bisarya, 1975). The other end is then grafted to the aorta by a similar anastomosis (unless the internal mammary artery is used). The aorta is often cross-clamped during this stage of the operation.

Sometimes the side of the bypass vein is joined to the side of the artery in a side-to-side anastomosis. Because of the hemodynamics involved, this method might be expected to provide more efficient revascularization. Side-to-side grafts are further differentiated as longitudinal and transverse, depending on the angle at which the bypass is grafted to the artery. If a side-to-side anastomosis is used, it is possible to bypass more than one occlusion with one segment of vein by using several side-to-side grafts and ending with an end-to-side graft. This procedure is called a sequential bypass.

Figure 1.1 is a sketch of the heart showing the three main coronary arteries and their branches (courtesy of Northwest Surgical Associates), and the various types of grafts that might be used (L. H. Cohn, 1973; U. S. Page, personal communication).

FIGURE 1.1 TYPES OF BYPASS GRAFT
[Diagram from a Good Samaritan Hospital and Medical Center (Portland, Oregon) coronary artery graft form. Labelled vessels: left internal mammary (LIMA), circumflex, left anterior descending (LAD), obtuse marginal, posterior descending, acute marginal. Legend: individual vein graft, end-to-side; left internal mammary artery graft; sequential side-to-side graft; side-to-side longitudinal graft; side-to-side transverse graft.]

The data analysed here is not the result of a randomized study, but rather a collection of observations and measurements from reports on patients of Northwest Surgical Associates (Portland, Oregon) who have undergone bypass surgery since March, 1969. Each patient's record provided information on 163 variables in seven categories: new patient information, patient history and risk factors, catheterization data, surgical data, graft data, hospital course, and follow-up information. A reprint of the data sheet is included in Appendix A. The data was recorded manually on a variety of record sheets until January, 1975, when conversion to computer files began. Patient data is now entered directly into the computer. As of May, 1975, the computer file held the records of the first two hundred patients (March, 1969 through November, 1971) and the most recent one hundred patients (December, 1974 through March, 1975).

The principal problem of interest in this study was the determination of long-term survival rates and the influence of surgical graft procedure on survival. The mammary artery vs. saphenous vein issue has been discussed by several surgical groups, and is still widely debated. Little has been reported on the success rates with other types of grafts.
Although there are indications that side-to-side grafts have been more successful than individual grafts, the evidence has not been analysed quantitatively.

The second problem was to uncover any systematic biases that may have been present, and to analyse the results with such biases taken into consideration. A systematic bias could cause a false association between graft type and survival, or could mask a true one.

The third problem of interest was to determine the pre-operative variables influencing survival and, in particular, how variables subject to direct control affect patient prognosis. In studies reported to date in the literature, the most influential variables are left ventricular performance (as measured by end-diastolic pressure, ejection fraction, and dyskinesia) (Oldham et al., 1972), presence of congestive heart failure and valvular disease (Tecklenberg et al., 1975), extent of revascularization, recent myocardial infarction, and surgical experience (Mundth and Austen, 1975). Other variables thought to affect survival are extent of occlusion in the diseased arteries, hypertension, and amount of post-operative bleeding. Variables of general interest are smoking, family history, and level of serum cholesterol.

The final objective of the analysis is the determination of associations among the variables. This information would help to assess the importance of prognostic variables and the extent of systematic bias, as well as indicate how the questionnaire might be improved.

For the investigation of survival rates the most recent one hundred patients were not used, as there was no follow-up information available for them. Eight other patients were deleted because their graft type was not recorded, or their records contained errors or inconsistencies. The final sample size was 192.

The questionnaire has undergone many changes as the study has progressed.
As a result, many variables on the questionnaire have not been recorded for the earlier patients; the response recorded for such variables is the "unknown" category. As an initial data reduction step, the univariate frequency table for each variable was examined, and variables with 75% or more of the total responses "unknown" were deleted from the data file, since a variable recorded for only a few patients provides little relevant information. Only ten patients received four bypass grafts, so the deleted variables included all measurements on the fourth graft. Table 1.1 lists all variables that were sufficiently complete to be retained in the analysis. The remaining variables still had some missing observations, which were replaced by the mode for that variable (the response given by the highest proportion of patients).

For some variables, categories were grouped to facilitate analysis. For example, Table 1.2 is the frequency table for variable SMOKE, and Table 1.3 is the reduced table after grouping. The other variables similarly grouped were INFINT, DIAB, HYPERT, LVEJECT, DISCOR, and ARRYTH. The data file in its final form had 192 cases and 62 variables, as well as follow-up information. The analysis presented in Section II attempts to uncover relationships among these variables and to isolate the important ones.

TABLE 1.1 Variables not deleted

QUESTION   VARIABLE     VARIABLE
NUMBER     NAME         DESCRIPTION
2          PTNUMB       patient number
3          DATE         date of entry
9          ANGENT       angina at entry
10         SEX          sex
10         AGE          age
11         ANGDUR       duration of angina
12         ANGCH        recent anginal change
13         INFHIST      history of infarctions
14         INFINT       interval between infarctions
15         CHF          congestive heart failure
16         ECG ANT      electrocardiogram (anterior)
17         ECG INF      electrocardiogram (inferior)
18         ECG LAT      electrocardiogram (lateral)
30         DIAB         diabetes
31         HYPERT       hypertension
36         FAMILY       family history
37         SMOKE        smoking
52         LVE          left ventricular emptying
53         LVEJECT      left ventricular ejection fraction
54         LVMANT       l.v. motion (anterior)
55         LVMAPI       l.v. motion (apical)
56         LVMINF       l.v. motion (inferior)
57         CORSTAT 1    coronary status (anterior)
58         CORSTAT 2    coronary status (circumflex)
59         CORSTAT 3    coronary status (right)
69         RISK         risk classification
70         DISCOR       diseased coronary not grafted
74         PERFPR       perfusion prime
75         FILTER       filter on oxygenator
76         VENT         venting of heart
78         HCT          hematocrit on bypass
82         CLAMP        minutes aorta clamped
90         NUMGRFT      number of grafts
91         AREA 1-3     area of graft
92         SITE 1-3     site of anastomosis
93         TYPE 1-3     surgical technique
94         SIZE 1-3     diameter of graft
96         FLOW 1-3     flow through graft
98         STEN 1-3     proximal stenosis
99         CONF 1-3     graft "confidence"
105        STAT 1-3     graft status on discharge
107        ARRYTH       arrhythmias
109        POBLD        post-op bleeding
110        CARDOP       decreasing cardiac output
116        SGOT         serum enzyme level
118        MI           myocardial infarction in ungrafted area
122-123    DAYS 1       days from admission to operation
124        DAYS 2       days from operation to discharge
125        COAG         anti-coagulant
108        TRANSF       transfusions
130        SURV         survival status on discharge

TABLE 1.2 Original Frequency Distribution of SMOKE

Category Name      Frequency
unknown              125
never                 28
quit                  12
< 10 pk/yr.            1
10-40 pk/yr.          18
> 40 pk/yr.            8

TABLE 1.3 Reduced Frequency Distribution of SMOKE

Category Name                        Frequency
unknown                                125
non-smoker at time of operation         40
smoker at time of operation             27

II. Analysis of the Data

Contingency tables, discriminant functions, cluster analysis and an interaction detection procedure are used in an attempt to reduce the large number of variables to a manageable subset, and to answer the following questions:
1) Which variables influence survival?
2) Is a systematic bias affecting the selection of patients for a particular type of graft?
The analyses indicate that such a selection bias is not in effect with respect to functional measurements or risk category, but that there is a selection bias over time. Conclusions in the literature about important prognostic variables are supported by this study.

1. Contingency Table Analysis

An n × r contingency table examines first-order associations between two variables. The null hypothesis of independence of rows and columns, conditional on the observed marginal totals, is tested by a $\chi^2_{(n-1)(r-1)}$ statistic.

Categorical variables are differentiated as nominal, ordinal, and interval. The values assigned to a nominal variable have no meaning in themselves, but only serve to label the elements of the sample. Examples of nominal variables are SEX and ECG. On the other hand, the values of an interval variable are directly related to the quantity being measured. For example, CLAMP is coded 1 through 7 according to the number of minutes that the aorta was cross-clamped during the operation. In this case, there is a mathematical relationship between the code assigned to each category and the value of the underlying variable. Another such example is AGE, which for purposes of analysis was categorized in ten-year age groups. Ordinal variables assume values that measure a ranking of the variable.
Examples of ordinal variables are SMOKE (responses non-smoker, smoker, and heavy smoker) and RISK (responses "Class 1" through "Class 5" represent increasing degree of risk involved in surgery). Different measures of association have been developed for the three classes of variables. The product-moment correlation coefficient, ρ, is calculated as

$$\rho = \frac{\sum_i (r_i - \bar{r})(c_i - \bar{c})}{\left[\sum_i (r_i - \bar{r})^2 \sum_i (c_i - \bar{c})^2\right]^{1/2}}$$

and is used to measure association between interval variables. However, with nominal and ordinal variables ρ is not informative, since their values are respectively labels and ranks. Freeman's "coefficient of determination", θ, is used when one variable is nominal and the other ordinal. It is based on Wilcoxon's signed rank test. Association between ordinal variables is measured by a coefficient, γ, developed by Goodman and Kruskal, based on the extent to which the rank of one measurement can predict the rank of the other. When both variables are nominal, Guttman's "symmetric correlation coefficient", λ, measures the frequency of agreements and inversions between the categories.

In addition to calculating these measures of association and estimating their significance, we can test the hypothesis of row-column independence by Pearson's chi-square goodness-of-fit test or by the log-likelihood ratio chi-square. Pearson's chi-square is

$$X^2 = \sum_{i=1}^{n} \sum_{j=1}^{r} \frac{(e_{ij} - f_{ij})^2}{e_{ij}} \;\sim\; \chi^2_{(n-1)(r-1)}$$

where $e_{ij}$ = expected frequency of row i, column j (under independence, the product of the row i and column j totals divided by the total sample size), and $f_{ij}$ = observed frequency of row i, column j. The log-likelihood chi-square, also distributed approximately as $\chi^2_{(n-1)(r-1)}$ in large samples, is given by

$$G^2 = 2 \sum_{i=1}^{n} \sum_{j=1}^{r} f_{ij} \ln\!\left(\frac{f_{ij}}{e_{ij}}\right) \;\sim\; \chi^2_{(n-1)(r-1)}$$

The statistics described above are used extensively in the social sciences and are available in most statistical packages.
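As a minimal sketch (Python, standard library only; the function names are illustrative and are not those of any statistical package used in the original analysis), both test statistics can be computed directly from a table of observed counts. Expected counts are formed from the margins under independence, cells with a zero observed count contribute nothing to G², and the code assumes no row or column total is zero.

```python
import math


def expected_counts(table):
    """Expected counts under row-column independence:
    e_ij = (row i total) * (column j total) / N.
    Assumes no row or column total is zero."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    return [[r * c / n for c in col_tot] for r in row_tot]


def degrees_of_freedom(table):
    """(n - 1)(r - 1) for an n x r table."""
    return (len(table) - 1) * (len(table[0]) - 1)


def pearson_chi2(table):
    """Pearson X^2 = sum over cells of (e_ij - f_ij)^2 / e_ij."""
    exp = expected_counts(table)
    x2 = sum((e - f) ** 2 / e
             for f_row, e_row in zip(table, exp)
             for f, e in zip(f_row, e_row))
    return x2, degrees_of_freedom(table)


def loglik_chi2(table):
    """G^2 = 2 * sum over cells of f_ij * ln(f_ij / e_ij); empty cells add 0."""
    exp = expected_counts(table)
    g2 = 2 * sum(f * math.log(f / e)
                 for f_row, e_row in zip(table, exp)
                 for f, e in zip(f_row, e_row) if f > 0)
    return g2, degrees_of_freedom(table)


# Usage: the B vs. C table from the Simpson's paradox example that follows.
table_bc = [[27, 10],
            [12, 29]]
x2, df = pearson_chi2(table_bc)
g2, _ = loglik_chi2(table_bc)
print(f"X^2 = {x2:.2f}, G^2 = {g2:.2f}, d.f. = {df}")
```

Significance levels are left out to keep the sketch dependency-free; a χ² tail function such as `scipy.stats.chi2.sf(x2, df)` could supply them.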
If the hypothesis of row-column independence is not supported by the data, it may be that the two variables are indeed correlated; but it may also be the case that both variables are correlated only with a third variable. Controlling for this third variable may remove the apparent correlation between the original two. It is then necessary to consider partial correlations in an attempt to uncover the masked variable. On the other hand, the hypothesis of independence may be falsely accepted because a third variable is hiding the association between the first two. This latter phenomenon is referred to as Simpson's paradox (Blyth, 1973).

A simple example of Simpson's paradox is provided by the following contingency tables of A vs. B, B vs. C, and A vs. B and C:

A vs. B
            B = 1   B = 2   Total
A = 1         19      18      37
A = 2         20      21      41
Total         39      39      78

B vs. C
            B = 1   B = 2   Total
C = 1         27      10      37
C = 2         12      29      41
Total         39      39      78

There is no evidence here that A and B are correlated, but when the table is stratified by C the null hypothesis is rejected:

A vs. B, C = 1
            B = 1   B = 2   Total
A = 1          9      10      19
A = 2         18       0      18
Total         27      10      37

A vs. B, C = 2
            B = 1   B = 2   Total
A = 1         11       7      18
A = 2          1      22      23
Total         12      29      41

In spite of these difficulties, the examination of first-order correlations is a useful preliminary investigation into the nature of the relationships among the variables, and will be used as a basis for further analysis.

One of the most important questions in this study is the extent of association between survival and the other variables, especially type of graft. Survival experience was somewhat arbitrarily categorized as either less than three years or greater than three years. It was necessary to restrict the number of categories to two to avoid many empty cells, since the sample size is small relative to the number of variables. The cut-off point of three years was chosen because the study was especially concerned with late survival, and in bypass surgery studies three years is considered long-term (see e.g. Tecklenberg et al., 1975).
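The Simpson's paradox example earlier in this subsection can be checked numerically. The sketch below (Python, illustrative only) computes Pearson's X² for the pooled A vs. B table and for each stratum of C, using the counts as printed, and compares each with 3.841, the upper 5% point of χ² on one degree of freedom. One cell of the C = 1 stratum has an expected count below 5, so the χ² approximation there is rough.

```python
CRITICAL_5PCT_DF1 = 3.841  # upper 5% point of chi-square on 1 d.f.


def pearson_chi2(table):
    """Pearson X^2 for an r x c table of counts (expected counts from margins)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    x2 = 0.0
    for row, r in zip(table, row_tot):
        for f, c in zip(row, col_tot):
            e = r * c / n
            x2 += (f - e) ** 2 / e
    return x2


tables = {
    "A vs. B (pooled)": [[19, 18], [20, 21]],
    "A vs. B, C = 1": [[9, 10], [18, 0]],
    "A vs. B, C = 2": [[11, 7], [1, 22]],
}

results = {name: pearson_chi2(t) for name, t in tables.items()}
for name, x2 in results.items():
    verdict = "reject" if x2 > CRITICAL_5PCT_DF1 else "do not reject"
    print(f"{name}: X^2 = {x2:.2f}, {verdict} independence at the 5% level")
```

The pooled table gives a statistic far below the critical value, while each stratum exceeds it, which is the pattern the text describes.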
Twenty-eight patients were excluded from this analysis because three-year information was not available. Of the 164 remaining, 84 survived past three years and 80 did not.*

Theoretically, θ varies between 0 and +1, and γ between -1 and +1. In practice, however, values of θ > .4 and |γ| > .5 were seldom observed, even when the χ² tests were highly significant and the rows and columns clearly dependent. For example, the 2 × 2 table for SURV vs. PTNUMB is

                   PTNUMB
SURV        1 (0-100)   2 (100-200)   Total
< 3 yr.         28           52          80
> 3 yr.         60           24          84
Total           88           76         164

with a Pearson χ²₁ of 20.72 (α = .00) and a log-likelihood χ²₁ of 22.66 (α = .00); yet the coefficient of association, θ, was only .36603. Similarly, for SURV vs. RISK, the 2 × 5 table is

                          RISK
SURV       Cl-1   Cl-2   Cl-3   Cl-4   Cl-5   Total
< 3 yr.       7     27     23     16      7      80
> 3 yr.      14     41     21      5      3      84
Total        21     68     44     21     10     164

with a Pearson χ²₄ of 13.82 (α = .008) and a log-likelihood χ²₄ of 14.31 (α = .007); yet the coefficient of association, γ, is only -.38957.

However, a relative measure of the coefficient's importance is available: the distribution of γ under the null hypothesis is approximated by an expression involving the error function, so its significance level, α, can be calculated. For the above table α is .001. The Freeman coefficient θ is approximately distributed as Pearson's χ². (The significance test may be invalidated by too many empty cells, however.) So the absolute values of these correlation coefficients are not informative by themselves, but an assessment of their relative importance is available. Table II.1 presents the coefficients of association and the χ² values for the variables found to be significantly correlated with survival.

* Some patients lost to follow-up were inadvertently included in the group of 80 coded as not surviving past three years. While correcting for these patients changes the numerical results of Section II.1, it does not affect the conclusions.
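Goodman and Kruskal's γ can be computed as (C - D)/(C + D), where C and D are the numbers of concordant and discordant pairs of patients and pairs tied on either variable are ignored. The minimal Python sketch below (the function name is illustrative) applies this definition to the SURV vs. RISK table above, with rows ordered < 3 yr., > 3 yr. and columns ordered by risk class, and recovers the quoted value of about -.3896.

```python
def goodman_kruskal_gamma(table):
    """Goodman-Kruskal gamma = (C - D) / (C + D) for a table whose rows and
    columns are both in increasing order. C counts concordant pairs (one
    patient higher on both variables), D discordant pairs; tied pairs are
    ignored."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # a strictly higher row
                for m in range(n_cols):
                    if m > j:
                        concordant += table[i][j] * table[k][m]
                    elif m < j:
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)


# SURV vs. RISK as printed above: rows < 3 yr., > 3 yr.; columns Class 1..5.
surv_vs_risk = [[7, 27, 23, 16, 7],
                [14, 41, 21, 5, 3]]
print(round(goodman_kruskal_gamma(surv_vs_risk), 4))  # -0.3896
```

The negative sign reflects that patients in higher risk classes fall more often in the shorter-survival row.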
Variables LVMANT, LVMAPI, LVEJECT, CORSTAT1, and RISK are all measures of heart function before the operation. Studies similar to this one have also concluded that these variables have a significant effect on survival experience. The variables in Table II.1 below SMOKE are all measures of surgical techniques used (with the exception of FAMILY). Although they are not very significant as a whole, they are of interest because they are more open to control by the surgeon. Three variables require more detailed examination: SMOKE, FAMILY, and TYPE 1; the first two because their frequency distributions present some anomalies, discussed below, and the third because it is a variable of primary interest to this study.

TABLE II.1 Variables correlated with three-year survival

Variable   Coefficient of       Pearson χ²      Log-likelihood χ²
name       association          (sig. level)    (sig. level)
           (sig. level)
LVMANT     γ = -.541 (.000)     24.54 (.000)    26.61 (.000)
LVEJECT    γ =  .512 (.000)     16.11 (.000)    18.00 (.000)
CORSTAT1   γ = -.404 (.000)     20.09 (.017)    24.12 (.004)
RISK       γ = -.390 (.001)     13.82 (.008)    14.31 (.007)
PTNUMB     θ =  .366            20.72 (.000)    22.66 (.000)
ARRYTH     θ =  .171             9.11 (.027)    10.09 (.017)
SMOKE      γ =  .533 (.000)     15.33 (.001)    15.74 (.000)
FAMILY     γ =  .338 (.0013)     6.62 (.084)     6.75 (.079)
LVMAPI     γ =  .318 (.030)      7.56 (.108)     7.79 (.098)
TYPE 1     θ =  .246            14.39 (.006)    16.77 (.001)
FILTER     θ =  .230             6.91 (.074)     8.15 (.042)
VENT       θ =  .227             9.32 (.009)     9.47 (.009)
NUMGRFT    θ =  .161             6.37 (.093)     6.43 (.091)
CLAMP      θ =  .169            10.63 (.155)    10.95 (.140)

Smoking appears to be highly correlated with survival, but the large number of observations in the unknown category leaves this conclusion open to criticism:

                      SMOKE
SURV       unknown   non-smoker   smoker   Total
< 3 yr.       63         11          6       80
> 3 yr.       42         25         17       84
Total        105         36         23      164
χ² = 15.33

If the unknown category is deleted, the resulting table is:

                 SMOKE
SURV       non-smoker   smoker   Total
< 3 yr.        11          6       17
> 3 yr.        25         17       42
Total          36         23       59
γ = .211, χ²₁ = 1.01

and the correlation is no longer significant. In fact, this variable would have been deleted prior to analysis by the criteria outlined in Section I, except that it is a variable of great interest.

FAMILY is another such variable, but deleting the "unknown" category here has the opposite effect. Before deletion the table is:

                          FAMILY
SURV       unknown   negative   mild   severe   Total
< 3 yr.       55        12        6       7       80
> 3 yr.       43        13       12      16       84
Total         98        25       18      23      164
χ² = 6.62 (α = .084)

while the reduced table is:

                    FAMILY
SURV       negative   mild   severe   Total
< 3 yr.       12        6       7       25
> 3 yr.       13       12      16       41
Total         25       18      23       66
χ² = 21.15 (α < .001)

However, it is not valid to base conclusions on the reduced tables unless the group for which the variable is unknown does not differ significantly from the group for which a response is available. To investigate this issue, two-way tables of SMOKE vs. all other variables were constructed, with SMOKE categorized as "known" and "unknown". For each table the hypothesis of row-column independence was tested. If this hypothesis is not supported by the data, then there is evidence that the patients for whom SMOKE is known differ from the patients with missing data with respect to the column variable. In fact, this is the case for the variables RISK, NUMGRFT, CLAMP, FILTER, VENT, SURV, TYPE 1, PTNUMB, and LVE. The same procedure was applied to FAMILY, with the same results. It was concluded that SMOKE and FAMILY do not provide enough relevant information for valid inferences about their association with SURV, and they were eliminated from further analyses.

Graft type was recorded for the first, second, and third grafts. Only TYPE 1 was examined in detail, because many patients received only one graft, and because the type of the first graft largely determines the type of the second graft. Table II.1 shows that TYPE 1 is significantly correlated with survival.
However, the contingency table has some empty or nearly empty cells, which invalidates the χ² test of significance:

                                TYPE 1
SURV       individual   sequential   seq. side-side   seq. side-side   Total
           vein graft   end-side     transverse       lateral
< 3 yr.        41           28             5                6            80
> 3 yr.        63           17             0                4            84
Total         104           45             5               10           164

Since the second, third and fourth categories are all sequential grafts, a valid chi-squared test may be obtained by grouping them to achieve sufficiently large cell totals. The resulting table is:

               TYPE 1
SURV       individual   sequential   Total
< 3 yr.        41           39         80
> 3 yr.        63           21         84
Total         104           60        164
θ = .457, χ²₁ = 6.17 (α = .015)

The reduced table supports the association indicated by the original table, so we conclude that the type of graft has a significant first-order correlation with survival. It now remains to determine whether or not this correlation is spurious. For example, it may have occurred because patients were preferentially selected for one type of graft.

Table II.2 summarizes the results of the analyses of two-way contingency tables of TYPE 1 vs. all other variables. For each table TYPE 1 was grouped as above into individual vein and sequential grafts. As expected, TYPE 1 is highly correlated with TYPE 2; in most cases, one determines the other. TYPE 1 is also weakly correlated with many of the surgical variables of Table II.1, but not with the variables measuring left ventricular function (LVMANT, LVEJECT, LVMAPI, LVE). Particularly noteworthy is the observation that TYPE 1 is highly correlated with PTNUMB. It can be seen in Table II.3a that earlier patients were more often selected for individual grafts than patients entering the study towards the end of the two-year period. If the first-order correlation between survival and graft type is due to a common association with patient number, then this correlation should disappear when patient number is taken into account. Table II.4 shows the three-way table of SURV vs. TYPE 1, controlled for PTNUMB.
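The checks described here (small expected counts in the original TYPE 1 table, pooling of the sequential categories, and the stratified test of Table II.4 given below) can be sketched in a few lines of Python. This is an illustration under stated assumptions: the counts are taken as printed, "small" is taken to mean an expected count below 5 (a common rule of thumb, not a criterion stated in the text), and 3.841 is the upper 5% point of χ² on one degree of freedom. Since the footnote to this section notes that some counts were later corrected, recomputed statistics need not match every printed value.

```python
CRITICAL_5PCT_DF1 = 3.841  # upper 5% point of chi-square on 1 d.f.


def expected(table):
    """Expected counts under independence, from the row and column totals."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    return [[r * c / n for c in col_tot] for r in row_tot]


def min_expected(table):
    """Smallest expected cell count in the table."""
    return min(min(row) for row in expected(table))


def pearson_chi2(table):
    """Pearson X^2 = sum over cells of (f - e)^2 / e."""
    return sum((f - e) ** 2 / e
               for f_row, e_row in zip(table, expected(table))
               for f, e in zip(f_row, e_row))


# SURV (< 3 yr., > 3 yr.) by TYPE 1: individual vein graft, sequential
# end-side, sequential side-side transverse, sequential side-side lateral.
type1 = [[41, 28, 5, 6],
         [63, 17, 0, 4]]

# Pool the three sequential categories into a single column.
pooled = [[row[0], sum(row[1:])] for row in type1]

# Table II.4: SURV by TYPE 1 (individual, sequential) within PTNUMB strata.
strata = {
    "PTNUMB 0-100": [[26, 2], [54, 6]],
    "PTNUMB 100-200": [[15, 37], [9, 15]],
}

print("smallest expected count, 2 x 4 table:", round(min_expected(type1), 2))
print("smallest expected count, pooled 2 x 2:", round(min_expected(pooled), 2))
for name, t in strata.items():
    x2 = pearson_chi2(t)
    print(f"{name}: X^2 = {x2:.2f}, significant at 5%: {x2 > CRITICAL_5PCT_DF1}")
```

Neither stratum reaches the 5% critical value, consistent with the statement that SURV and TYPE 1 are uncorrelated once PTNUMB is controlled. The first stratum still has a small expected count in its sequential column, so its test is only approximate.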
SURV and TYPE 1 are uncorrelated when the population is stratified by PTNUMB. This observation leads to the conclusion that a selection bias was indeed in effect, inasmuch as the first patients in the study generally received an individual vein graft and later patients a sequential graft.

At this point a difficulty arises in the interpretation of the results. It is not clear whether (i) the apparent superiority of individual grafts is due to better survival of the first one hundred patients, or (ii) the first one hundred patients did better because most of them had individual grafts. Later patients would be expected to have better survival times, because PTNUMB measures, among other things, improvement in surgical techniques over time. (This issue is discussed more fully in Section III.) So conclusion (ii) above is more consistent with prior expectations. However, there is no evidence to indicate that individual grafts were superior. In addition, patients requiring only one graft all received individual grafts. In view of the slight association of NUMGRFT with SURV (cf. Table II.1), conclusion (i) above may in fact have more support. The correct interpretation of Tables II.3 and II.4 is still an open question.

TABLE II.2 Variables correlated with TYPE 1

Variable   Coeff. of     Pearson χ²       Log-likelihood χ²
name       assoc.        (sig. level)     (sig. level)
TYPE 2     λ = .550      147.82 (.000)    189.05 (.000)
PTNUMB     λ = .529       59.83 (.000)     66.99 (.000)
RISK       θ = .038       19.67 (.001)     22.59 (.000)
VENT       λ = .248       27.97 (.000)     28.17 (.000)
FILTER     λ = .182       26.41 (.000)     26.83 (.000)
CLAMP      λ = .145       44.79 (.000)     46.22 (.000)
NUMGRFT    λ = .186       53.57 (.000)     64.66 (.000)
AREA 1     λ = .104       16.59 (.002)     16.36 (.000)

TABLE II.3

(a) TYPE 1 vs. PTNUMB
                 PTNUMB
TYPE 1      0-100   100-200   Total
indiv.        80       24       104
seq.           8       52        60
Total         88       76       164

(b) SURV vs. PTNUMB
                 PTNUMB
SURV        0-100   100-200   Total
< 3 yr.       28       52        80
> 3 yr.       60       24        84
Total         88       76       164

TABLE II.4 Three-way table SURV vs. TYPE 1 and PTNUMB

PTNUMB: 0-100
                  TYPE 1
SURV        indiv.   sequen.   Total
< 3 yr.       26        2        28
> 3 yr.       54        6        60
Total         80        8        88
(χ² < 1; λ = .02)

PTNUMB: 100-200
                  TYPE 1
SURV        indiv.   sequen.   Total
< 3 yr.       15       37        52
> 3 yr.        9       15        24
Total         24       52        76
(χ² = .87; λ = .00)

Part of the difficulty is due to the effect of an arbitrary cut-off time in analysing survival data of this nature. Suppose two groups, A and B, have survival curves that cross at some time t*. If the cut-off time is less than t*, there will be evidence that Group A had better survival experience than Group B, but if the cut-off time is greater than t* the opposite result will obtain. Figure II.1 shows the product-limit survival curves for individual and sequential grafts. There is some indication that the curves do intersect, at approximately 3^ months. Indeed, if tables of two-year survival vs. PTNUMB and TYPE 1 are constructed, there is no longer a significant correlation between survival and TYPE 1 or between survival and PTNUMB.

FIG. II.1 PRODUCT-LIMIT SURVIVAL CURVES FOR INDIVIDUAL AND SEQUENTIAL GRAFTS
[Plot against survival time in months (horizontal axis, 5 to 65); vertical axis from about .6 to 1.0; Δ = sequential graft, O = individual graft.]

2. Discriminant Analysis

The structure imposed by a parametric model may provide further insights into the relationships among the variables. We assume that the vector of observations for patients surviving more than (respectively, less than) three years has, at least approximately, a multivariate normal distribution with mean vector μ₁ and dispersion matrix Σ (respectively, mean vector μ₂ and the same dispersion matrix Σ), so that discriminant analysis may be used to find the variables which best predict survival.
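A discriminant function may be estimated from the data for possible later use as a means of classifying future patients into one of the two normal populations. We assume the costs of misclassification are equal, and that the prior probability that an individual belongs to population one ($\pi_1$) is the same as the prior probability that he belongs to $\pi_2$. Then, following Anderson (1958), for known mean vectors $\mu_1$ and $\mu_2$ and known dispersion matrix $\Sigma$ the minimax classification procedure is the following:

$$
\begin{aligned}
R_1 &: \; x'\Sigma^{-1}(\mu_1-\mu_2) \ge \tfrac{1}{2}(\mu_1+\mu_2)'\Sigma^{-1}(\mu_1-\mu_2),\\
R_2 &: \; x'\Sigma^{-1}(\mu_1-\mu_2) < \tfrac{1}{2}(\mu_1+\mu_2)'\Sigma^{-1}(\mu_1-\mu_2),
\end{aligned}
\tag{II.1}
$$

where $R_1$ and $R_2$ are the regions of classification into $\pi_1$ and $\pi_2$, respectively, and $x$ is the vector of observations. When $\mu_1$, $\mu_2$ and $\Sigma$ are not known, they may be replaced in (II.1) by sample estimates. Let

$$
\bar x_1 = \frac{1}{n_1}\sum_{j=1}^{n_1} x_j^{(1)}, \qquad
\bar x_2 = \frac{1}{n_2}\sum_{j=1}^{n_2} x_j^{(2)},
$$

$$
S = \frac{1}{n_1+n_2-2}\left[\sum_{j=1}^{n_1}(x_j^{(1)}-\bar x_1)(x_j^{(1)}-\bar x_1)' + \sum_{j=1}^{n_2}(x_j^{(2)}-\bar x_2)(x_j^{(2)}-\bar x_2)'\right],
$$

where $n_1$ and $n_2$ are the sizes of the samples from $\pi_1$ and $\pi_2$. Then criterion (II.1) becomes

$$
\begin{aligned}
R_1 &: \; x'S^{-1}(\bar x_1-\bar x_2) \ge \tfrac{1}{2}(\bar x_1+\bar x_2)'S^{-1}(\bar x_1-\bar x_2),\\
R_2 &: \; x'S^{-1}(\bar x_1-\bar x_2) < \tfrac{1}{2}(\bar x_1+\bar x_2)'S^{-1}(\bar x_1-\bar x_2).
\end{aligned}
\tag{II.2}
$$

The term $x'S^{-1}(\bar x_1-\bar x_2)$ is Fisher's discriminant function. Although (II.2) is not the minimax solution, it seems a reasonable large-sample approximation to it, $\bar x_1$, $\bar x_2$, and $S$ being unbiased estimates of $\mu_1$, $\mu_2$, and $\Sigma$.

In addition, we can estimate the probability of misclassification. This is done using the result that if $X$ is a random observation to be classified into $\pi_1$ or $\pi_2$, then

$$
U = X'\Sigma^{-1}(\mu_1-\mu_2) - \tfrac{1}{2}(\mu_1+\mu_2)'\Sigma^{-1}(\mu_1-\mu_2)
\tag{II.3}
$$

is distributed as a $N(\tfrac{1}{2}\alpha, \alpha)$ variate if $X$ is from $\pi_1$, and as a $N(-\tfrac{1}{2}\alpha, \alpha)$ variate if $X$ is from $\pi_2$, where $\alpha = (\mu_1-\mu_2)'\Sigma^{-1}(\mu_1-\mu_2)$ is the (squared) Mahalanobis distance between the populations. Since $U \ge 0$ defines $R_1$, each of the two probabilities of misclassification under (II.1) equals $\Phi(-\tfrac{1}{2}\sqrt{\alpha})$, where $\Phi$ is the standard normal distribution function. In the case of unknown means and dispersion, we estimate $U$ by replacing $\mu_1$, $\mu_2$, and $\Sigma$ in (II.3) by $\bar x_1$, $\bar x_2$, and $S$.

The "best" set of predictor variables with which to classify observations may be determined in a stepwise fashion analogous to that involved in stepwise regression.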
At the first step, the variable that, by itself, provides the best classification function is chosen to enter the linear discriminant function. At step two, the variable that provides, in combination with the first, the best classification function based on two variables is added to the discriminant function. At this stage the equation is re-examined to see what results obtain if these two variables are entered in reverse order. The procedure continues in this manner, one variable entering at each step and superfluous variables being removed whenever necessary. When addition of any one of the remaining variables does not significantly improve the classification function, the procedure is terminated (Draper and Smith, 1966). The best classification function maximizes the estimated variance between groups and minimizes the estimated variance within groups. A partial F-statistic is compared to a pre-selected F value to measure the significance of the contribution of an additional variable.

Table II.5 summarizes the results obtained by applying the stepwise analysis described above, for significance criteria of .05 and .10. The results are in reasonable agreement with those of Table II.1. Variables LVMANT, PTNUMB, RISK, and CORSTAT1 are all correlated with survival according to the non-parametric measures of association γ and λ. These variables also enter the discriminant function as highly significant predictors. There are some discrepancies between the two sets of results as well, in particular with respect to the variables ECGANT, MI, and AGE.

Table II.6 shows how successful the subsets of predictors are in classifying the observations. The improvement from Step 5 to Step 11 is slight: the five predictor variables entered at the .05 level already give remarkably good discrimination. It is of interest to note that LVMANT alone has an overall success rate of 64.67%.
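The classification rule (II.2), the error rate implied by (II.3), and the forward part of the stepwise selection can be sketched in Python with NumPy and SciPy. This is a minimal sketch under stated assumptions, not the program that produced Tables II.5 and II.6: the function names are hypothetical, the partial F-to-enter is computed from Wilks' Λ as in common stepwise discriminant programs (an assumption about how the original program measured a variable's contribution), and the variable-removal step is omitted.

```python
import numpy as np
from math import erf, sqrt
from scipy.stats import f as f_dist


def fit_discriminant(x1, x2):
    """Sample rule (II.2): classify x into pi_1 when x @ w >= c (hypothetical helper).

    Returns w = S^{-1}(xbar1 - xbar2), the cut-off c, and the estimated
    Mahalanobis distance alpha = (xbar1 - xbar2)' S^{-1} (xbar1 - xbar2).
    """
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    n1, n2 = len(x1), len(x2)
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    # Pooled, unbiased estimate S of the common dispersion matrix.
    d1, d2 = x1 - m1, x2 - m2
    s = (d1.T @ d1 + d2.T @ d2) / (n1 + n2 - 2)
    w = np.linalg.solve(s, m1 - m2)
    c = 0.5 * (m1 + m2) @ w
    alpha = (m1 - m2) @ w
    return w, c, alpha


def misclassification_prob(alpha):
    """Phi(-sqrt(alpha)/2): each error rate of the rule when U follows (II.3)."""
    z = -0.5 * sqrt(alpha)
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def wilks_lambda(X, y, cols):
    """det(W)/det(T) for the chosen columns; 1.0 when no column is chosen."""
    if not cols:
        return 1.0
    Z = X[:, cols]
    dev = Z - Z.mean(axis=0)
    T = dev.T @ dev                      # total SSCP matrix
    W = np.zeros_like(T)                 # within-groups SSCP matrix
    for g in np.unique(y):
        dg = Z[y == g] - Z[y == g].mean(axis=0)
        W += dg.T @ dg
    return np.linalg.det(W) / np.linalg.det(T)


def forward_stepwise(X, y, p_enter=0.05):
    """Forward selection by partial F-to-enter (removal step omitted)."""
    n, k = X.shape
    g = len(np.unique(y))
    chosen = []
    while len(chosen) < k:
        lam_old = wilks_lambda(X, y, chosen)
        df2 = n - g - len(chosen)
        best_j, best_p = None, 1.0
        for j in range(k):
            if j in chosen:
                continue
            lam_new = wilks_lambda(X, y, chosen + [j])
            # Partial F for adding column j, with (g - 1, n - g - p) d.f.
            F = df2 / (g - 1) * (lam_old - lam_new) / lam_new
            p = f_dist.sf(F, g - 1, df2)
            if p < best_p:
                best_j, best_p = j, p
        if best_j is None or best_p >= p_enter:
            break
        chosen.append(best_j)
    return chosen
```

For example, with two small made-up samples that are mirror images of each other, `fit_discriminant` puts the cut-off c at 0; on simulated data in which only the first column separates the groups, `forward_stepwise` selects that column first.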
Two of the variables entered at the five-percent level merit closer examination. The frequency tables for ECGLAT and MI are presented in Table II.7. The variables are highly skewed, with most of the patients falling in the "normal" category. So although complete, the available data is not very informative. In addition, these two variables are the only nominal ones entered at the five-percent level. Discriminant analysis uses the Pearson product-moment correlation coefficient to decide how to enter the right-hand variables into the discriminant function. However, for nominal variables with more than two categories the product-moment partial correlation coefficient may not be a good reflection of the true partial correlation, since its value depends on the arbitrary numerical codes assigned to the categories. As a consequence of these considerations, it is difficult to interpret the results of the discriminant analysis, and it seems desirable, therefore, to eliminate ECGLAT and MI from consideration.

TABLE II.5 Stepwise discriminant analysis: three-year survival as dependent variable

(a) cut-off criterion .05
Step   Variable entered   F-prob
1      LVMANT             .0000
2      PTNUMB             .0000
3      ECGLAT             .0088
4      MI                 .0077
5      AGE                .0051

(b) cut-off criterion .10
Step   Variable entered   F-prob
1      LVMANT             .0000
2      PTNUMB             .0000
3      ECGLAT             .0088
4      MI                 .0077
5      AGE                .0051
6      RISK               .0559
7      DAYS 1             .0520
8      FILT               .0929
9      CORSTAT1           .0452
10     LVE                .0710
11     COAG               .0940

TABLE II.6 Percent of cases classified into each group

(a) five variables entered
               predicted < 3   predicted > 3
actual < 3         79.01           21.00
actual > 3         23.26           76.74
overall classification rate 77.84%

(b) eleven variables entered
overall classification rate 80.49%

TABLE II.7 Frequency distributions of ECGLAT and MI

(a) ECGLAT
normal          126
acute inf.        1
old inf.         13
questionable     15
ST wave           9
Total           164

(b) MI
none            153
anterior          1
lateral           2
inferior        [illegible]
Total           164

Table II.8 presents the variables entered at the 5% level and 10% level when ECGLAT and MI are excluded.
One new variable, ANGDUR, is entered, while FILT, CORSTAT1, and COAG are no longer selected. The classification rate for the 5% cut-off is 78.05%; the probability of misclassification has not been increased by excluding the two variables. In spite of the fact that the underlying assumption of two normally distributed populations may not be satisfied, and in spite of the problems associated with stepwise selection procedures, the discriminant analysis confirms the contingency table analysis, and it appears that left ventricular function, age, risk classification and time of entry into the study are important prognostic indicators. This is encouraging because it supports the conclusions of other such studies (see Section I).

In the discriminant analyses that follow, TYPE 1 is the classificatory variable. These analyses were done in order to detect any selection biases that might exist, by isolating variables that predict graft type. The results are summarized in Table II.9. As expected, TYPE 2 and PTNUMB are the most important predictors of TYPE 1. It is difficult to interpret the influence of AREA 3; only 44 patients received three grafts. It might be an indirect indication of the number of grafts, which in turn influences graft type; however, NUMGRFT itself enters the equation only at the 10% level.
TABLE II.8 Stepwise discriminant analysis: ECGLAT and MI excluded

(a) .05 cut-off
Step   Variable entered   F-prob
1      LVMANT             .0000
2      PTNUMB             .0000
3      AGE                .0130
overall classification rate 78.05%

(b) .10 cut-off
Step   Variable entered   F-prob
1      LVMANT             .0000
2      PTNUMB             .0000
3      AGE                .0130
4      RISK               .0638
5      LVE                .0150
6      DAYS 1             .0581
7      ANGDUR             .0894
overall classification rate 73.78%

TABLE II.9 Stepwise discriminant analysis: TYPE 1 as dependent variable

(a) .05 cut-off
Step   Variable entered   F-prob
1      TYPE 2             .0000
2      PTNUMB             .0000
3      AREA 3             .0004

(b) .10 cut-off
Step   Variable entered   F-prob
1      TYPE 2             .0000
2      PTNUMB             .0000
3      AREA 3             .0004
4      SITE 3             .0561
5      ECGLAT             .0633
6      ARRYTH             .0631
7      NUMGRFT            .0430
8      CONF 3             .0314
9      SIZE 2             .0209
10     PERFPR             .0928
11     CLAMP              .0740
12     SITE 1             .0941
13     ANGDUR             .0885
14     HYPERT             .0872
15     SIZE 1             .0721

With the exception of PTNUMB, ARRYTH, ECGLAT and HYPERT, all the variables in the discriminant function are graft data; that is, they are measures of the surgical techniques used and of the functional level of the grafted area. So it does not appear that patients are preferentially selected for one type of graft. Of course, they are selected on PTNUMB, as noted previously, because individual vein grafts were used almost exclusively when bypass surgery was first developed. The classification rates associated with this discriminant analysis are remarkably high. They are presented in Table II.10. In fact, the classification achieved with TYPE 2 and PTNUMB only is as good as that achieved after 15 steps; the overall percent of cases correctly classified by these two variables is 97.56%.
Since TYPE 2 and PTNUMB are so important in predicting TYPE 1, a discriminant analysis was run in which these two variables were not allowed to enter the equation. Table II.11 presents the results; all the variables except ECGLAT measure surgical details. The conclusion of part 1, that patients are preferentially selected only on PTNUMB, is supported by this analysis.

TABLE II.10 Percent of cases classified into each group (TYPE 1: indiv. vs. sequen.)

(a) .05 cut-off: overall classification rate 96.95%
(b) .10 cut-off: overall classification rate 97.56%

TABLE II.11 Stepwise discriminant analysis: TYPE 1 as dependent variable, TYPE 2 and PTNUMB excluded

(a) .05 cut-off
Step   Variable entered   F-prob
1      NUMGRFT            .0000
2      CLAMP              .0000
3      VENT               .0001
4      PERFPR             .0005
5      ECGLAT             .0001
6      COAG               .0091
7      CONF 3             .0224
8      TRANSF             .0488

(b) Percent of cases classified into groups (indiv. vs. sequen.): overall classification rate 90.85%

3. Detection of Interaction Effects

Discriminant analysis depends on the assumption of an underlying normal distribution for the population, with equal dispersions for each class. Although in practice the technique seems to be fairly robust to departures from the assumptions, there is no theoretical justification for expecting valid results when the assumptions do not hold. In addition, it is difficult to interpret the results when the independent variables are nominal. A data-analytic method that avoids both assumptions about the underlying distributions and the calculation of correlation coefficients was suggested by Morgan and Sonquist (1963).
It was developed as a tool for analysing survey data when the goal is to obtain qualitative conclusions about the interactions among the variables. Morgan and Sonquist describe the method as a "formalization of an extensive data sort". By separating the sample into groups in some optimum way, some insight is gained into the relationships among the observed variables. For a given dependent variable $Y$ the procedure chooses a best predictor variable to partition the sample so that the sum of squares within the groups so formed,

$$\sum_{g}\sum_{i \in g}(Y_i - \bar Y_g)^2,$$

is as small as possible, and the sum of squares across the groups,

$$\sum_{g} n_g(\bar Y_g - \bar Y)^2,$$

is as large as possible. Since these two sums add up to the total sum of squares of the group being split, the two requirements are equivalent. These two groups are in turn split by the next best predictor, and the process continues until either (i) the sum of squares within groups is smaller than some specified fraction of the total sum of squares, or (ii) the group sizes are not larger than a specified number. This technique produces a tree of binary splits.

The binary tree for the surgical data is sketched in Figure II.2. Each node of the tree shows the predictor variable on which the group was split, and the values of the variable in each partition. Variables LVMANT, PTNUMB, AGE, ECGLAT and CORSTAT1 are again selected as the important predictors of SURV. Figure II.2 provides support for the conclusions of parts 1 and 2.

FIGURE II.2 Results of an application of Sonquist and Morgan's "Automatic Interaction Detector" (binary tree: the total sample is first split on LVMANT, < 3 vs. > 3; lower nodes split on PTNUMB (0-100 vs. 100-200), AGE, SITE 2 (minor branch vs. major artery), CORSTAT 1 and CORSTAT 2 (no disease vs. some disease; ≤ 70% vs. > 70% narrowing), ECGLAT and ECGINF (none or old infarction vs. recent infarction)).

4. Cluster Analysis

Cluster analysis is another method of separating the sample into groups with similar characteristics.
The correlations between pairs of variables are used as measures of the distances between the pairs, and the two variables closest together (that is, most highly correlated) are combined and treated as one cluster. Then the correlations between this cluster and the remaining variables are used to add a third variable to the cluster. This procedure continues until the correlation between the cluster and all variables outside the cluster is smaller than a specified minimum. The remaining variables are then clustered among themselves, and when all variables have been put into one of the clusters the procedure stops. The variables within the groups so formed are correlated, while variables in different groups are at most weakly correlated.

Unfortunately, the available programs for the cluster analysis of large sets of data all use the Pearson product-moment correlation coefficient. As discussed previously, this need not be a suitable measure of association between nominal variables. Nor is it clear what correlation coefficient should be used when a cluster consists of both nominal and ordinal variables.

Table II.12 shows the five clusters formed, with the average correlation of each cluster. The results are somewhat inconclusive, although an overall description of some of the clusters is possible. The first 17 variables entered in cluster 1 all relate to graft data.
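The greedy procedure just described can be sketched in Python with NumPy. This is a sketch, not the program that produced Table II.12: `cluster_variables` is a hypothetical name, it works with absolute Pearson correlations, and it takes the correlation between a cluster and an outside variable to be the mean absolute correlation with the cluster members. That last choice is an assumption, since the text does not say how the available programs define it.

```python
import numpy as np


def cluster_variables(R, min_corr):
    """Greedy clustering of variables from a correlation matrix R (hypothetical helper).

    Cluster-to-variable similarity is the mean absolute correlation between
    the variable and the current cluster members (an assumption).
    Returns a list of clusters, each a list of column indices.
    """
    A = np.abs(np.asarray(R, float))
    remaining = list(range(A.shape[0]))
    clusters = []
    while remaining:
        if len(remaining) == 1:
            clusters.append(remaining)
            break
        # Seed: the two remaining variables that are most highly correlated.
        sub = A[np.ix_(remaining, remaining)]   # a copy, so A is untouched
        np.fill_diagonal(sub, -1.0)
        i, j = np.unravel_index(np.argmax(sub), sub.shape)
        cluster = [remaining[i], remaining[j]]
        outside = [v for v in remaining if v not in cluster]
        # Grow the cluster while some outside variable is close enough.
        while outside:
            sims = [A[cluster, v].mean() for v in outside]
            best = int(np.argmax(sims))
            if sims[best] < min_corr:
                break
            cluster.append(outside.pop(best))
        clusters.append(cluster)
        remaining = outside
    return clusters
```

With a made-up correlation matrix in which variables 0 and 1 correlate at .9, variables 2 and 3 at .8, and all cross-correlations are .1, a minimum of .5 yields the two clusters [0, 1] and [2, 3], while a minimum of .05 merges everything into one cluster.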
TABLE II.12 Classification of variables by cluster analysis

Cluster 1: number of variables 23; average correlation .320
  variables entered: SIZE 3, STEN 3, CONF 3, AREA 3, STAT 3, TYPE 3, SITE 3, FLOW 3, TYPE 1, NUMGRFT, TYPE 2, SIZE 2, SITE 2, AREA 2, STEN 2, CONF 2, STAT 2, CORSTAT 1, CORSTAT 2, CORSTAT 3, FILTER, FLOW 2, LVMINF
Cluster 2: number of variables 10; average correlation .165
  variables entered: DAYS 1, DAYS 2, PTNUMB, VENT, SURV, TRANSF, SGOT, ANGCH, ANGDUR
Cluster 3: number of variables 13; average correlation .7258
  variables entered: LVE, LVMAPI, LVMANT, RISK, CHF, CARDOP, ARRYTH, INFHIST, CONF 1, DISCOR, FLOW 1, SIZE 1, COAG
Cluster 4: number of variables 7; average correlation .169
  variables entered: AREA 1, SITE 1, HCT, SEX, ECGANT, STEN 1, LVEJECT
Cluster 5: number of variables 6; average correlation .127
  variables entered: ECGINF, ECGLAT, PERFPR, INFINT, STAT 1, CLAMP

Cluster 2 contains some of the variables found to be predictors of survival (DAYS 1, PTNUMB, ANGDUR), but measures of left ventricular function and risk are noticeably absent. The first three variables in cluster 3 measure left ventricular function, which is known to be correlated with RISK, the fourth variable; yet LVEJECT is in a separate cluster (#4). Similarly, two ECG readings are in one cluster (#5) but the third is in cluster 4.

The clusters of Table II.12 could be used to reduce the number of variables used in subsequent analyses, or even to reduce the number of variables recorded. Some representative variables for each cluster could be chosen; since variables within a cluster are correlated, this would result in minimal loss of information. For example, the graft data variables might be represented by TYPE 1, NUMGRFT, CORSTAT 1 and SIZE 2.

5. Summary

The data provides evidence that left ventricular function, age, and risk are the most important predictors of survival.
In addition, prognosis improved over the course of the study, as indicated by the influence of patient number on survival. An apparent association between surgical graft type and survival was shown to be caused only by a common correlation of these variables with patient number. The conclusions reached in this section are used in Section III for a more detailed parametric analysis of the survival distribution.

III. Regression Analysis of Survival Times

There has been much discussion in the recent literature on the application of regression analysis to data with censored observations (e.g. Cox, 1972; Breslow, 1974 and 1975; Feigl and Zelen, 1965; Krall, Uthoff and Harley, 1975; R. G. Miller, 1974). The models proposed incorporate the life table methods of Kaplan and Meier.

Let $T$ denote the random failure time. We are interested in finding the cumulative probability of survival past time $t$:

$$F(t) = \Pr\{T \ge t\}.$$

Define the instantaneous failure rate, i.e. the hazard function, by

$$\lambda(t) = \lim_{\Delta t \to 0^+} \frac{\Pr\{t \le T \le t + \Delta t \mid t \le T\}}{\Delta t}.$$

Then we have

$$F(t) = \exp\left\{-\int_0^t \lambda(u)\,du\right\}.$$

The usual estimate of $F(t)$ is Kaplan and Meier's product-limit estimate. The estimate of the probability of dying in an interval is chosen to agree with the observed proportion of deaths in the interval. The assumption must be made that causes of death and causes of censoring are independent. Let $t_{(i)}$ be the $i$th ordered failure time, $m_{(i)}$ the number of failures at $t_{(i)}$, and $r_{(i)}$ the number of individuals "at risk" at $t_{(i)}$; that is, $r_{(i)}$ is the number of individuals alive and not lost to observation at time $t_{(i)}$. Then

$$\hat\lambda(t) = \sum_{i=1}^{k} \frac{m_{(i)}}{r_{(i)}}\,\delta(t - t_{(i)}),$$

where $t_{(k)}$ is the last observed failure time and $\delta(\cdot)$ is the Dirac delta function. Then

$$\hat F(t) = \prod_{t_{(i)} < t}\left[1 - \frac{m_{(i)}}{r_{(i)}}\right].$$

These estimates have been shown to be the maximum likelihood estimates (Kaplan and Meier, 1958).
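A minimal Python sketch of the product-limit estimate just defined follows. The name `product_limit` is hypothetical, and so is the tie convention it uses (a case censored exactly at $t_{(i)}$ is counted as at risk at $t_{(i)}$); Breslow's adjustment used later in this section treats such withdrawals differently.

```python
def product_limit(times, died):
    """Kaplan-Meier product-limit estimate (hypothetical helper).

    times: observed times; died: 1 for a failure, 0 for a censored case.
    Returns (t_(i), m_(i), r_(i), F_hat) for each distinct failure time,
    where F_hat is the product over failure times up to and including t_(i),
    i.e. the estimated probability of surviving beyond t_(i).
    """
    failure_times = sorted({t for t, d in zip(times, died) if d})
    f_hat = 1.0
    rows = []
    for t in failure_times:
        m = sum(1 for ti, di in zip(times, died) if di and ti == t)  # failures at t
        r = sum(1 for ti in times if ti >= t)                         # at risk at t
        f_hat *= 1.0 - m / r
        rows.append((t, m, r, f_hat))
    return rows
```

For five made-up cases with times 1, 2, 2, 3, 4 and the second case at time 2 and the case at time 4 censored, the factors are 4/5, 3/4 and 1/2, so the estimate falls to .8, .6 and .3.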
The models that have been proposed by Cox, Feigl and Zelen, and others combine life table analysis with regression analysis by letting the hazard function depend on some regressor variables, $x$, in the following way:

$$\lambda(t;x) = \lambda_0(t)\,g(x,\beta). \tag{III.1}$$

Here $\lambda_0(t)$ is the underlying hazard function for some baseline value of $x$, such as $x = 0$, and $g(x,\beta)$ is a function of the observed $x$ and of some coefficients $\beta$, to be estimated. Then the probability of surviving past time $t$ is given by

$$F(t;x) = \exp\left\{-\int_0^t \lambda(u;x)\,du\right\}.$$

The model incorporates information from censored observations via the product-limit estimate of $\lambda_0(t)$. This is especially useful in medical follow-up studies, where some observations are lost to follow-up, and some are censored due to termination of the study.

The most common form for $g(\cdot)$ is one proposed by Feigl and Zelen (1965) and generalized by Cox (1972), viz:

$$g(x,\beta) = \exp(x'\beta).$$

The model $\lambda(t;x) = \lambda_0(t)\exp(x'\beta)$ is called the "proportional hazards model" because changes in the regressor variables change the hazard function in a multiplicative way (Cox, 1972): for two patients with regressor vectors $x_1$ and $x_2$, the ratio $\lambda(t;x_1)/\lambda(t;x_2) = \exp\{(x_1 - x_2)'\beta\}$ does not depend on $t$. Since survival data is usually discrete, rather than continuous, Cox proposed a discrete analogue to (III.1) in order to estimate $\beta$ and $\lambda_0(t)$. This involves conditioning on the known failure times and using the conditional likelihood function to solve for $\beta$, then using $\hat\beta$ and iterating at each failure time to find $\lambda_0(t)$. No parametric assumptions about $\lambda_0(t)$ are made. The discrete form of (III.1) is

$$\frac{\lambda(t;x)}{1 - \lambda(t;x)} = \frac{\lambda_0(t)}{1 - \lambda_0(t)}\exp(x'\beta).$$

Breslow (1972) proposed a modification of Cox's approach in order to simplify the estimation procedure. He assumed that the hazard function $\lambda_0(t)$ is constant between failure times:

$$\lambda_0(t) = \lambda_k \quad \text{for } t \in (t_{(k-1)}, t_{(k)}].$$

If there are no tied observations, Breslow's model gives the same results as Cox's model; it is a good approximation with large sets of data (Breslow, 1974).
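Breslow's piecewise-constant model can be fitted with a general-purpose optimizer. The sketch below (hypothetical function names; NumPy and SciPy assumed) maximizes the log partial likelihood with Breslow's treatment of tied failure times, which is what fitting the piecewise-constant model amounts to for $\beta$. It then estimates each $\lambda_k$ by $\hat\lambda_k = m_{(k)} / \{(t_{(k)} - t_{(k-1)})\sum_{j \in R_k}\exp(x_j'\hat\beta)\}$. As assumptions, $t_{(0)} = 0$, and withdrawals are treated as having occurred at the preceding failure time. This is an illustration only, not Breslow's FORTRAN IV program.

```python
import numpy as np
from scipy.optimize import minimize


def risk_set(times, died, t):
    """Failures at or after t, censored cases strictly after t (withdrawals moved back)."""
    return np.where(died == 1, times >= t, times > t)


def neg_log_partial_lik(beta, X, times, died):
    """Minus the log partial likelihood with Breslow's treatment of ties."""
    eta = X @ beta
    ll = 0.0
    for t in np.unique(times[died == 1]):
        dead = (times == t) & (died == 1)
        at_risk = risk_set(times, died, t)
        ll += eta[dead].sum() - dead.sum() * np.log(np.exp(eta[at_risk]).sum())
    return -ll


def fit_breslow(X, times, died):
    """Return beta_hat, the maximized log partial likelihood, and [(t_(k), lambda_k)]."""
    X = np.asarray(X, float)
    times = np.asarray(times, float)
    died = np.asarray(died, int)
    res = minimize(neg_log_partial_lik, np.zeros(X.shape[1]),
                   args=(X, times, died), method="BFGS")
    beta = res.x
    eta = X @ beta
    baseline, prev = [], 0.0
    for t in np.unique(times[died == 1]):
        m = ((times == t) & (died == 1)).sum()
        denom = (t - prev) * np.exp(eta[risk_set(times, died, t)]).sum()
        baseline.append((float(t), float(m / denom)))
        prev = t
    return beta, -res.fun, baseline
```

When all regressors are zero the fitted coefficient stays at 0 and, with failures one time unit apart, each $\hat\lambda_k$ reduces to $m_{(k)}/r_{(k)}$, the product-limit hazard.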
Breslow has written a FORTRAN IV program to implement his model, and this program was used in the analysis presented here.

The first step in the analysis was to choose the regressor variables $x$. These were chosen with the aid of the stepwise discriminant analysis discussed in Section II. Table III.1 shows the variables entered into the discriminant analysis at the 10% level; the variables ECGLAT and MI were excluded for reasons discussed in Section II. Of these, the following five were selected: (1) LVMANT (left ventricular motion, anterior), (2) PTNUMB (patient number), (3) AGE, (4) RISK, and (5) LVE (left ventricular emptying). The sixth regressor variable was TYPE 1 (graft type), the variable of interest.

The sample size for this analysis was larger than that for previous analyses, since no cases had to be discarded due to censoring. The most recent one hundred cases were not used: a large number of these observations were censored at time zero, and such observations would not be expected to provide very much information about survival times, and could lead to problems with convergence.

The results of the regression analyses are presented in Table III.2. The log-likelihood and estimated coefficients (standard errors in parentheses) are given for each equation fitted. The usefulness of adding one or more variables to the model is measured by twice the difference in the log-likelihoods, distributed approximately as $\chi^2$ with degrees of freedom equal to the number of variables added (Breslow, 1975).

TABLE III.1 Variables in the discriminant function

Step   Variable entered   F-prob
1      LVMANT             .0000
2      PTNUMB             .0000
3      AGE                .0130
4      RISK               .0638
5      LVE                .0150
6      DAYS 1             .0581
7      ANGDUR             .0894

TABLE III.2 Regression coefficients for the proportional hazards model
(columns: number of variables; number of iterations; log-likelihood; coefficients, with standard errors in parentheses, for LVMANT, AGE, RISK, LVE, TYPE and PTNUMB; the entries of each column are listed in printed order)

Log-likelihood: 142.732, 134.047, 140.421, 130.186, 130.724, 141.847, 140.820, 127.940, 129.910, 127.040, 129.767, 128.662, [illegible], 125.324, 126.481, 125.850, 124.252, 124.070, 123.778, 123.650, 122.730, 122.310
LVMANT: .597 (.135), .319, .152, .336, .505, .359, .331
AGE: .042 (.02), .033, .041, .048, .056, .068, .047, .065, .062
RISK: .814 (.161), .690, .758, .591, .790, .819, .579, .521, .563, .598, .477, .499, .551, .524, .456, .502
LVE: .840 (.171), .492, .380, .537, .506, .462, .581, .306, .521, .344, .301
TYPE: -.356 (.303), -.252, -.288, -.403, -.421, -.461
PTNUMB: -.007 (.003), -.006, -.005, -.006, -.006

The coefficients are estimated for centered values of the regressor variables. The effect of a unit change in variable $x_i$ on the hazard function is measured by $e^{\beta_i}$: the hazard is multiplied by this factor. The magnitude of the coefficient depends on the range of the corresponding variable; for this reason the coefficient for AGE is an order of magnitude smaller than the coefficients for heart function variables, and the coefficient for PTNUMB is another order of magnitude smaller. This could have been avoided by transforming the variables AGE and PTNUMB. However, no problems of interpretation arise as long as the range of each variable is kept in mind when the coefficients are compared (see Figure III.1).

The sign of the coefficient has the following interpretation: a positive coefficient implies a deteriorating prognosis with increasing values of the variable, and vice versa. Recall that $\Pr\{\text{patient lives} \ge t \text{ months}\} = F(t;x)$ and

$$F(t;x) = \exp\left\{-\int_0^t \lambda(u;x)\,du\right\} = \exp\left\{-\int_0^t \lambda_0(u)\,e^{x'\beta}\,du\right\},$$

which is a decreasing function of $x_i$ when $\beta_i$ is positive. The estimated coefficients for LVE, LVMANT, and RISK are positive, as expected, since larger values of these variables are related to poorer patient functioning.
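The likelihood-ratio comparisons and the interpretation of $e^{\beta_i}$ can be checked with a few lines of Python (SciPy assumed; function names hypothetical). Following Table III.2, the log-likelihoods are entered as the printed magnitudes, so the statistic is twice the printed value for the smaller model minus that for the larger one; 142.732 is the value from which the step-up calculations start. The last function uses the fact that, under the proportional hazards model, $F(t;x) = F_0(t)^{\exp(x'\beta)}$, where $F_0(t) = \exp\{-\int_0^t \lambda_0(u)\,du\}$ is the baseline survival.

```python
from math import exp
from scipy.stats import chi2


def lr_statistic(loglik_small, loglik_large):
    """Twice the drop in the printed log-likelihood magnitude (hypothetical helper)."""
    return 2.0 * (loglik_small - loglik_large)


def lr_p_value(stat, df):
    """Approximate significance level from the chi-squared distribution."""
    return chi2.sf(stat, df)


def hazard_ratio(beta, delta=1.0):
    """Factor by which the hazard is multiplied when x_i increases by delta."""
    return exp(beta * delta)


def survival_given_baseline(f0, eta):
    """F(t;x) = F0(t) ** exp(x'beta), with F0 the baseline survival at t."""
    return f0 ** exp(eta)
```

With these definitions, `lr_statistic(142.732, 130.186)` reproduces the $\chi^2_1$ of 25.092 quoted for RISK, `lr_p_value(6.292, 1)` is about .012, and `hazard_ratio(0.597)` is about 1.817.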
This analysis supports the hypothesis that increased age is a liability in open heart surgery. And, as was noted in Section II, higher patient number means improved prognosis, reflecting the time trend in the success-failure rate. The fact that the coefficient for graft type is negative provides some evidence that individual vein grafts actually gave better results than sequential grafts.

FIGURE III.1 Variable values and their meanings

Variable   Possible values   Interpretation
LVMANT     1-5               decreasing ventricular function with increasing values
AGE        30-87
RISK       1-5               increasing risk with increasing values
LVE        1-5               decreasing l.v. function with increasing values
TYPE 1     1-4               1 individual vein; 2 sequential end-side; 3 sequential side-side transverse; 4 sequential side-side longitudinal
PTNUMB     1-200             increasing values with time of entry into the study

It might be expected that some variables could be deleted from the model without a great loss in precision, since some of them are highly correlated (LVMANT, LVE, and RISK, in particular). One way of choosing the most efficient subset of the variables was suggested by Krall, Uthoff and Harley (1975). Their method, called a "step-up" procedure, is analogous to forward subset selection in multiple regression analysis. The first variable to enter the equation is the one that, by itself, reduces the magnitude of the log-likelihood by the largest amount. Once this variable has been entered, all two-variable combinations with this variable are calculated, and again the pair that reduces the log-likelihood the most is chosen to enter the regression equation. The significance of the $i$th variable's contribution is approximately measured by

$$-2(\log L_i - \log L_{i+1}) \sim \chi^2_1,$$

where $L_i$ is the maximized likelihood with $i$ variables in the equation. Following this procedure, it is seen from Table III.2 that RISK should be the first variable to enter the equation, with a $\chi^2_1$ of 25.092. With RISK in the equation, the variable that should be added next is LVE.
The $\chi^2_1$ value for this stage is 6.292, with a significance level of about .012. The next variable to enter is AGE, and the $\chi^2_1$ is 7.352, again highly significant. The next variable to enter is LVMANT, with a $\chi^2_1$ of 2.508, a significance level greater than .1. If this step-up procedure were followed rigorously, we would stop here and choose as the model

$$\lambda(t;x) = \lambda_0(t)\exp\{.041\,\mathrm{AGE} + .521\,\mathrm{RISK} + .537\,\mathrm{LVE}\}.$$

If we continue in spite of the low $\chi^2$ and enter LVMANT, we find that the next variable to enter is PTNUMB, with a $\chi^2_1$ of 3.388, not significant at the 5% level. However, if we jump from three variables to five, entering LVMANT and PTNUMB simultaneously, the $\chi^2_2$ statistic is

$$2(127.040 - 123.650) = 6.78,$$

which is significant at the 5% level. This discrepancy probably arises because twice the difference in log-likelihoods is not exactly $\chi^2$ distributed, except at the first stage (Krall, Uthoff and Harley, 1975). This result makes choosing a subset of regressor variables difficult, since the optimum action is not at all clear. It seems reasonable, however, that the more conservative approach is better suited for this type of data, since the computations involved are not too tedious. So the first-order model chosen is

$$\lambda(t;x) = \lambda_0(t)\exp\{.331\,\mathrm{LVMANT} + .062\,\mathrm{AGE} + .502\,\mathrm{RISK} + .301\,\mathrm{LVE} - .006\,\mathrm{PTNUMB}\}. \tag{2}$$

Since left ventricular motion, left ventricular emptying, and risk are correlated, it would not be surprising if an interaction involving these variables made a significant contribution to the model. The results summarized in Table III.3 seem to lend evidence to this hypothesis. The coefficient of each of the three variables is reduced in size when another of them is added to the model.
This is most striking in the case of LVMANT: by itself, a unit change in LVMANT results in an almost two-fold increase in the hazard function ($e^{.597} = 1.817$); however, with RISK or LVE in the equation as well, the effect of LVMANT is much less pronounced ($e^{.319} = 1.376$ and $e^{.224} = 1.251$, respectively).

For these reasons, some models including second-order interaction terms were fitted. The results are presented in Table III.4. When the LVMANT × RISK interaction is included in the model, its coefficient is not too large, and the coefficients of LVMANT and RISK do not change appreciably from those in Table III.2. The LVMANT × LVE interaction effect is somewhat larger, but again the first-order coefficients are not very different from their previous values. This is not the case with the LVMANT × AGE interaction, however. Here the interaction has a large influence on the hazard function, and the main effect of LVMANT is negligible once this interaction has been accounted for. In terms of reduction of log-likelihood, none of the subsets of variables in Table III.4 is as good as the subsets of the same size in Table III.2. Although there is evidence that LVMANT and AGE interact, inclusion of the interaction term does not significantly improve the model. Table III.5 compares the first-order model (2) with the second-order model including the LVMANT × AGE interaction. For purposes of prediction the first-order model was chosen.

TABLE III.3 Variable subsets suggesting interaction effects

No. vars   No. iter   Log-likelihood   LVMANT   RISK   LVE
1          5          -134.047         .597
1          4          -130.186                  .814
1          5          -130.724                         .840
2          5          -127.940         .319     .690
2          5          -127.040                  .591   .492
2          5          -129.319         .224            .660

TABLE III.4 Interaction terms included in the model
(columns: number of variables; number of iterations; log-likelihood; coefficients for LVMANT, AGE, RISK, LVE, LVMANT × AGE, LVMANT × RISK and LVMANT × LVE; the coefficient entries of each row are listed in printed order)

3   5   -127.891   .449, .715, -.026
    5   -130.047   .117, .627, .028
    5   -128.579   [illegible], -.028, 1.012
5   5   -123.706   1.25, .041, .182, 1.07, -1.23

TABLE III.5 Comparison of first-order and second-order models

No. vars   No. iter   Log L      LVMANT   AGE     RISK    LVE    PTNUMB   LVMANT × AGE
5          5          -122.310   .331     .062    .502    .301   -.006
6          5          -121.471   1.144    -.024   1.602   .307   -.006    .919

Twice the difference in log-likelihoods: $\chi^2_1 = 1.778$; $\chi^2_{1,.05} = 3.84$, $\chi^2_{1,.10} = 2.71$.

The underlying hazard function, $\lambda_0(t)$, is estimated simultaneously with the coefficients, $\beta$. In Breslow's model

$$\lambda_0(t) = \lambda_i = e^{\alpha_i}, \qquad t \in (t_{(i-1)}, t_{(i)}],$$

and it is estimated by

$$\hat\lambda_i = \frac{m_{(i)}}{(t_{(i)} - t_{(i-1)})\sum_{j \in R_i}\exp(x_j'\hat\beta)},$$

where $R_i$ is the set of patients at risk at $t_{(i)}$. Withdrawals which occurred in $(t_{(i-1)}, t_{(i)}]$ were adjusted to have occurred at $t_{(i-1)}$, as suggested by Breslow. The time unit was one month, and one month was added to each survival time. This was suggested by Krall, Uthoff and Harley (1975) as a means of avoiding possible convergence problems associated with negative expected survival time for patients who lived less than one month.

The underlying hazard function is presented in Table III.6, along with the Kaplan-Meier hazard function. Both are graphed in Figure III.2. The Kaplan-Meier estimates are equivalent to the maximum likelihood estimates with $\beta = 0$. The Kaplan-Meier survival estimates are consistently lower than the maximum likelihood estimates with $\hat\beta \ne 0$. This indicates that averaging the regressor variables over the risk set and including such a term in the estimate improves a patient's prognosis.

TABLE III.6 Product-limit and maximum likelihood estimates of survival distribution

t_(i) (months)   m_(i) (multiplicity)   P_i     q_i     P_i^PL   q_i^PL
1                14                     .9686   .0314   .9282    .0718
2                 1                     .9656   .0031   .9230    .0056
7                 1                     .9623   .0034   .9178    .0057
8                 1                     .9589   .0035   .9122    .0060
12                1                     .9554   .0037   .9067    .0061
18                2                     .9480   .0077   .8951    .0128
21                1                     .9437   .0046   .8883    .0075
23                1                     .9392   .0047   .8816    .0076
25                1                     .9344   .0051   .8749    .0076
30                1                     .9284   .0065   .8665    .0096
40                1                     .9184   .0107   .8498    .0192
48                2                     .8893   .0317   .8012    .0571
49                1                     .8744   .0167   .7770    .0303
54                1                     .8376   .0421   .7215    .0714
61                1                     .7586   .0943   .5772    .2000

$P_i = \hat F(t_{(i)}; x)$, $q_i = \hat\lambda_i(x)$, $P_i^{PL} = \prod_{k \le i}\left[1 - m_{(k)}/r_{(k)}\right]$, $q_i^{PL} = m_{(i)}/r_{(i)}$, $r_{(k)} = \operatorname{card} R_{(k)}$.

FIGURE III.2 Product-limit and maximum likelihood estimates of probability of survival.

By using this model, the survival distribution for each patient can be estimated, the estimate being based on the survival rates of similar patients in the group analysed above. Dr. U. S. Page (personal communication) is currently using a similar technique. He estimates the survival distribution for each risk classification by the product-limit method, based on the survival experience of his previous patients, and obtains an estimate of a patient's survival distribution function according to the risk assessed for that patient. The usefulness of this method is limited, however, because it will not usually be feasible to cross-classify on more than two variables. The present model, unlike that used by Page, allows comparison of survival distributions on arbitrarily many variables (we have chosen five, as outlined above). We measure the right-hand variables of (2) for the patient under consideration, and form the discrete hazard function $\lambda_i(x)$ for each $t_{(i)}$, from which the cumulative survival distribution can be estimated as

$$\hat F(t_{(i)}) = \prod_{\ell=1}^{i}\left(1 - \lambda_\ell(x)\right).$$

For purposes of prediction, patient number was not incorporated into $\lambda_i(x)$. Patient number increases linearly with time, and the effect of patient number on survival reflects a time trend in the survival experience of the first two hundred patients. This time trend is largely a result of improvement in surgical expertise, and has been reported in the medical literature (Mundth and Austen, 1975).
It may also reflect such influences as improvement in materials, or in patient attitude as bypass surgery becomes more widely used and accepted. For a new patient, whose patient number necessarily exceeds 200, including patient number would decrease the estimated hazard function, but this more favourable prognosis could be misleading. Extrapolating the time trend to patient numbers outside the range (0, 200] is not justified. No information is available to decide whether the effect continued over time, levelled off, or even started to decline at some point. When the data for the first 1000 patients are available, a re-analysis will provide more information about this effect. If patient number then remains an important explanatory variable, the prediction model could be changed to incorporate it. At this time, however, we have the somewhat unusual situation that the model which best explains the data is not the best model for predicting new values.

How does this model affect a particular patient's prognosis? For illustrative purposes, survival distributions were estimated for two patients chosen at random from the group previously excluded from the analysis (patient numbers 901-1000). Coincidentally, all of one patient's right-hand variables were higher than average and all of the other's were below average (Table III.7), so we are comparing a relatively "healthy" patient with a relatively "unhealthy" one. Figure III.3 shows the Kaplan-Meier survival distribution (the standard actuarial approach), the underlying survival curve, and the predicted survival curves for the two patients. Including the right-hand variables had a marked effect on prognosis. Figure III.4 compares the survival curves based on risk classification alone with the survival curves predicted by the model. For the "healthier" patient, with risk classification 2, including his other variables improves his prognosis. For the patient in risk category 4, however, including the right-hand variables worsens his prognosis. The classification method has already run into difficulty here. So few of the first two hundred patients were in risk category 4 that the survival distribution has only two distinct points.

TABLE III.7 Values of regressors for two "trial" patients

              LVMANT   LVE    AGE     RISK
Patient 937   3        3      56      4
Patient 983   1        1      46      2
Mean          1.81     1.74   53.55   2.57

FIG. III.3 Estimated survival distribution for two patients chosen at random (horizontal axis: survival time, months).

FIG. III.4 Estimated survival distribution based on risk and on four prognostic variables (horizontal axis: survival time, months).

In conclusion, the proportional hazards model was fitted to the data, so a survival distribution can now be estimated for each patient. The model is log-linear. It is based on patient age, risk, and left ventricular function, and on the survival experience of the sample. Examples of the predicted survivor function are presented. It remains to test this model on new data, to obtain a measure of its accuracy, and to revise the model in light of those results.

IV. Conclusions

An outline of the surgical techniques of bypass surgery is presented, together with the conclusions reached by similar studies. The data analysed in the study are based on the reports of 192 patients surgically treated for coronary artery disease by Northwest Surgical Associates of Portland, Oregon, between March 1969 and December 1971. Because of incomplete records, only a subset of the recorded variables was used in the analysis. Standard statistical procedures for retrospective studies (including contingency tables and discriminant analysis) are used to isolate variables that are important in predicting survival and to discover associations among the variables.
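As a rough illustration of how a contingency table can screen a candidate prognostic variable, the Python sketch below computes Pearson's chi-square statistic for association between a categorical variable and survival status. The counts are hypothetical and were invented for this example. They are not the data analysed in this study. The function name `chi_square_statistic` is also illustrative. The sketch does not claim to reproduce the particular tables or test statistics used in the analysis.

```python
# Pearson chi-square test of independence for an r x c contingency table.
# The counts below are HYPOTHETICAL (invented for illustration); they are not
# the Northwest Surgical Associates data analysed in this study.


def chi_square_statistic(table):
    """Return (X^2, degrees of freedom) for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            x2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df


# Hypothetical rows: risk class 1 to 4; columns: (alive, dead) at end of follow-up.
counts = [
    [40, 2],
    [60, 6],
    [50, 10],
    [20, 10],
]
x2, df = chi_square_statistic(counts)
print(f"X^2 = {x2:.2f} on {df} df")
```

With these invented counts, the statistic exceeds $\chi^2_{3,.05} = 7.81$, so risk class and survival status would be judged to be associated. The expected number of deaths in the last row is below 5, however, so with as few deaths as in a study of this size the chi-square approximation is only rough.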
These analyses lead to the following conclusions:
1) time of entry into the study (relative to March 1969) is strongly correlated with the grafting technique used and with survival experience;
2) the most important prognostic indicators are left ventricular motion, left ventricular emptying, left ventricular ejection fraction, extent of occlusion of the diseased artery, age, and risk classification;
3) the patients in the study were not preferentially selected for one surgical graft procedure over another on the basis of pre-operative data.

These conclusions are used to develop a proportional hazards model, which combines log-linear regression with actuarial survival distribution analysis. With this model, relationships among the important prognostic variables are examined in detail. A separate life table can be estimated for each individual, and some examples of such tables are given. This method has two advantages over others of a similar nature:
1) the sample is not split for purposes of prediction, so all the sample information is used in each prediction; and
2) the simultaneous effect of several variables may be incorporated into the predictions.

REFERENCES

Anderson, T. W. (1958): An Introduction to Multivariate Statistical Analysis. New York: John Wiley and Sons, Inc.
Blyth, C. R. (1973): Simpson's Paradox and Mutually Favorable Events. J. American Statistical Association 68: 746.
Breslow, N. (1972): Contribution to the discussion of the paper by D. R. Cox, cited below.
Breslow, N. (1974): Covariance Analysis of Censored Survival Data. Biometrics 30: 89-99.
Breslow, N. (1975): Analysis of Survival Data under the Proportional Hazards Model. International Statistical Review 43: 43-54.
Cohn, L. (1973): The Surgical Treatment of Acute Myocardial Ischemia. New York: Mt. Kisco Ltd.
Cox, D. R. (1972): Regression Models and Life Tables (with discussion). J. Royal Statistical Society B 34: 187-220.
Draper, N. R. and Smith, H. (1966): Applied Regression Analysis. New York: John Wiley and Sons, Inc.
Favaloro, R. G. (1968): Saphenous vein graft in the surgical treatment of coronary artery disease: operative technique. J. Thoracic and Cardiovascular Surgery 58: 178-185.
Feigl, P. and Zelen, M. (1965): Estimation of exponential survival probabilities with concomitant information. Biometrics 21: 826-838.
Green, G. E. (1972): Internal mammary artery-to-coronary anastomosis: three year experience with 165 patients. Annals Thoracic and Cardiovascular Surgery 14: 260-271.
Kaplan, E. L. and Meier, P. (1958): Nonparametric estimation from incomplete observations. J. American Statistical Association 53: 457-481.
Krall, J. M., Uthoff, V. A., and Harley, J. B. (1975): A step-up procedure for selecting variables associated with survival. Biometrics 31: 49-57.
Mark, A. L. et al. (1975): Patency of internal mammary artery grafts (abstract). Circulation 52 (suppl. II).
Mathur, V. S. and Guinn, G. A. (1975): Prospective randomized study of coronary bypass surgery in stable angina: the first 100 patients. Circulation 51 and 52 (suppl. I): I-133-I-140.
McCormick, J. R., Kanebo, M., Bane, A. E., Geha, A. S. (1975): Blood flow and vasoactive effects in internal mammary and venous bypass grafts. Circulation 51 and 52 (suppl. I): I-72-I-70.
Miller, R. G. (1974): Least squares regression with censored data. Technical Report No. 3, Division of Biostatistics, Stanford University, Stanford, California.
Morgan, J. N. and Sonquist, J. A. (1963): Problems in the analysis of survey data, and a proposal. J. American Statistical Association 58: 415-435.
Mundth, E. D. and Austen, W. G. (1975): Surgical methods for coronary heart disease. New England Journal of Medicine 293: 13-19, 75-80, 124-130.
Oldham et al. (1972): Risk factors in coronary artery bypass surgery. Archives of Surgery 105: 918-923.
Siegel, W. et al. (1975): Comparison of internal mammary artery and saphenous vein grafts for myocardial revascularization by exercise testing (abstract). Circulation 52 (suppl. II).
Tecklenberg, P. L., Alderman, E. L., Miller, D. C., Shumway, N. E., and Harrison, D. C. (1975): Changes in survival and symptom relief in a longitudinal study of patients after bypass surgery. Circulation 51 and 52 (suppl. I): I-98-I-104.
Trapp, W. G. and Bisarya, R. (1975): Placement of coronary artery bypass surgery without pump oxygenator. Annals of Thoracic Surgery 19: 1-9.

APPENDIX. PATIENT DATA QUESTIONNAIRE

NEW PATIENT INFORMATION DATA SHEET (Hospital ID imprint)
1. HOSPITAL UNIT NUMBER
2. PATIENT NUMBER: M or S prefix + 4 digits
3. DATE: (6 digits) Day Mo Year
4. PATIENT PREVIOUSLY ENTERED: (if yes, enter M or S number)
5. PATIENT NAME: (20 spaces allowed)
6. PATIENT ADDRESS
7. PHONE: (7 digits)
8. DOCTOR CODE: (2 digit code) NAME
9. ANGINA AT ENTRY: 0=Unk, 1=No, 2=Cl-2, 3=Cl-3, 4=Cl-4, 5=Cl-5
10. AGE & SEX: (M or F + 2 digits)
11. ANGINA DURATION: 0=Unk, 1=<1wk, 2=1-6wk, 3=6-12wk, 4=>3mos
12. ANGINA A PAST MO: 0=Unk, 1=No, 2=Mild, 3=Sev
13. INFARCTION HX: 0=Unk, 1=No, 2=1x, 3=2x, 4=3 or more
14. INFARCTION INTERVAL: 0=Unk or N/A, 1=<6hr, 2=6hr-1wk, 3=<3wk, 4=<3mo, 5=>3mo
15. CHF: 0=Unk, 1=No, 2=Cl-2, 3=Cl-3, 4=Cl-4
16. RESTING ECG, ANT: 0=Unk, 1=Nor, 2=Acute Rec MI, 3=Old SE INF, 4=Old FT INF, 5=Can't Eval, 6=Persist STA
17. RESTING ECG, INF: (Same choices as #16)
18. RESTING ECG, LAT: (Same choices as #16)
19. RECENT BRUCE STRESS TEST: 0=Unk, 1=No, 2=Yes-, 3=Yes+
20. STRESS DURATION: 0=Unk or N/A, 1=<3m, 2=3-6m, 3=6-9m, 4=>9m
21. STRESS-MAX HR: 0=Unk or N/A, 1=<100, 2=101-120, 3=121-140, 4=141-160, 5=>160
29. REVISION DATE: 08 01 75

PATIENT HISTORY & RISK FACTORS (Patient's number)
30. DIABETES: 0=Unk, 1=No, 2=Mild(-Rx), 3=Sev(Rx+)
31. HYPERTENSION: 0=Unk, 1=No, 2=Mild, 3=Mod(Mild LVH), 4=Sev(Gr LVH)
32. LVH (w/o Hypertension): 0=Unk, 1=No, 2=Mod, 3=Sev
33. SERUM CHOLESTEROL: 0=Unk, 1=<250, 2=<300, 3=<350, 4=>350
34. LIPOPROTEIN PHENOTYPE: 0=Unk, 1=Nor, 2=Type II, 3=Type IV, 4=Oth
35. TRIGLYCERIDE LEVEL: 0=Unk, 1=Nor, 2=175-350, 3=350-700, 4=>700
36. FAMILY HX: 0=Unk, 1=Neg, 2=Mild, 3=Sev
37. SMOKING HX: 0=Unk, 1=Never, 2=Quit>1 yr, 3=<10 Pk/yr, 4=<40 Pk/yr, 5=>40 Pk/yr
38. PREV CARDIAC SURG: 0=Unk or No, 1=Valve(s), 2=V'berg, 3=Cor Revasc, 4=Cong Repair, 5=Comb of above, 6=Aneurysm
39. NON-CARDIAC VASC DIS: 0=Unk or No, 1=Carotid, 2=Aorto-Iliac, 3=Fem-Pop, 4=Renal, 5=Comb, 6=Oth
40. PREV NON-CARDIAC VASC SURG: 0=Unk or No, 1=Yes
41. CARDIOMEGALY (BY XRAY): 0=(Unk) or None, 1=Mild, 2=Mod, 3=Sev

CATHETERIZATION DATA SHEET (Patient's number)
50. IS THIS A FOLLOW UP: 0=No, Letter=Yes (If yes, use interval letter + Re Study Number)
51. STUDY DATE: Day Mo Yr
52. GEN LV EMPTYING: 0=Unk, 1=Nor, 2=Mild↓, 4=Sev↓
53. LV EJECT FX: 0=Unk, 1=<15%, 2=15-20%, 3=21-25%, 4=26-30%, 5=31-40%, 6=41-50%, 7=51-60%, 8=>60%
54. LV MOTION, ANT: 0=Unk, 1=Nor, 2=Mild↓, 3=Mod↓, 4=Sev↓, 5=Parad
55. LV MOTION, API: 0=Unk, 1=Nor, 2=Mild↓, 3=Mod↓, 4=Sev↓, 5=Parad
56. LV MOTION, INF: 0=Unk, 1=Nor, 2=Mild↓, 3=Mod↓, 4=Sev↓, 5=Parad
57. CORONARY STATUS, ANT: 0=Nor, 1=Small, ?Signif., 2=Loc Prox Dis <50%, 3=Loc Prox Dis 50-75%, 4=Loc Prox Dis 76-99%, 5=Diff or Distal <50%, 6=Diff or Distal 50-75%, 7=Diff or Distal 76-99%, 8=Occl w/ Collateral, 9=Occl w/o Coll or ReCanal w/o Dist Branches
58. CORONARY STATUS, CIR: (Same choices as #57)
59. CORONARY STATUS, RIGHT: (Same choices as #57)
60. LEFT MAIN STENOSIS: 0=Unk, 1=No, 2=<50%, 3=51-76%, 4=76-99%, 5=Occl
61. VALVULAR DIS: 0=Unk, 1=No, 2=MI (Pap Muscle), 3=MI (Rheum), 4=MS, 5=AS, 6=AI, 7=M+A Dis, 8=3V Dis
62. CATH COMP: 0=Unk, 1=No, 2=Bleed (Transfused), 3=Req Thrombectomy, 4=Arrhythmia Rx'd, 5=MI, 6=Oth
63. CONGENITAL ANOMALY?: 0=Unk or No, 1=Of Cor, 2=ASD-VSD etc, 3=Ischemic VSD
64. GRAFT STATUS: 0=No Graft; for each of Grafts 1 to 4: 1=Open, 2=Sl. Stenosis, 3=Severe Sten, 4=Closed
65. STUDY DONE AT: 0=GSH, 1=Salem Mem, 2=Yakima, 3=Eugene (Sac. Heart), 4=Other
69. PATIENT RISK: 0=Unk, 1=1, 2=2, 3=3, 4=4, 5=5 (1=Nor V, 1 Cor; 2=Nor V, 2 Cor; 3=Diff Dis/Imp Vent; 4=Diff Dis/Sev Imp Vent; 5=Comb Valve & Coronary Surg.)

SURGICAL DATA SHEET (Patient's number)
70. DISEASED COR NOT GRAFTED: 0=No, 1=Sl-Ant, 2=Sl-Lat, 3=Sl-Inf, 4=Mod-Ant, 5=Mod-Lat, 6=Mod-Inf, 7=Sev-Ant, 8=Sev-Lat, 9=Sev-Inf
71. ASSOC VALVE SURG: 0=No, 1=M Comm, 2=A Comm, 3=MVR, 4=AVR, 5=2VR, 6=Congenital repair, 7=Other
72. ANEURYSM: 0=No, 1=Yes (No Rx), 2=Yes-Plicated, 3=Yes-Resected
73. OXYGENATOR: 0=Q-100, 1=Q-200, 2=Disc
74. PERFUSION PRIME: 0=Tot Dil, 1=Blood, 2=Plasma, 3=Alb, 4=Dex
75. FILTERS: 0=None, 1=CS, 2=Art, 3=CS+Art
76. VENT: 0=No, 1=LV Direct, 2=LV-Atrial
77. GRAFT PREP: 0=Saline, 1=Hep V Bl, 2=Hep Art Bl, 3=In Situ, 4=IMM
78. HCT ON BYPASS: 0=Unk, 1=<22, 2=22-26, 3=>26
79. FIBRILLATION: 0=Unk, 1=No, 2=Cont, 3=Intermittent
80. PERFUSION TEMP: 0=Unk, 1=<32°C, 2=32°C, 3=34°C, 4=37°C
81. EXT CARDIAC COOLING: 0=Unk, 1=No, 2=Yes
82. TOTAL X CLAMP TIME: 0=Unk or None, 1=<10m, 2=<20m, 3=<30m, 4=<45m, 5=<60m, 6=<75m, 7=>75m
INDIVIDUAL GRAFT DATA (Patient number)
90. NUMBER OF GRAFTS: 0=Unk, otherwise actual number
Fields 91-105, recorded for each graft (graft number 1, 2, 3, 4):
GRAFT AREA: 0=Unk, 1=Ant, 2=Lat, 3=Inf
GRAFT SITE: 0=None, 1=AD, 2=PD, 3=Mar, 4=Cir, 5=Dia, 6=RT, 7=Inter, 8=PLD, 9=Oth
TYPE OF GRAFT: 0=None, 1=Ind V, 2=Seq E-S, 3=S-ST, 4=S-SL, 5=LIMA, 6=RIMA, 7=RA E-S, 8=RA S-S, 9=Other
CORONARY ART SIZE: 0=Unk, 1=>2mm, 2=1½-2mm, 3=<1½mm
COR. QUAL AT SITE: 0=Unk, 1=Nor, 2=Sl Dis, 3=Sev Dis, 4=End-Good, 5=End-Poor
GRAFT FLOW: 0=Unk, 1=<25cc, 2=25-50cc, 3=51-75cc, 4=76-125cc, 5=>125cc
GRAFT RES: 0=Unk, 1=<200, 2=200-400, 3=401-600, 4=601-800, 5=801-1000, 6=1001-1500, 7=1501-2000, 8=>2000
PROX STENOSIS: 0=Unk, 1=Occ, 2=76-99%, 3=50-75%, 4=<50%
GRAFT CONF: 0=Unk, 1=Good, 2=Fair, 3=Poor
TECHNICAL PROBLEMS: 0=Unk, 1=No, 2=Yes-Sl, 3=Yes-Great
MI IN AREA (Post or Intra Op): 0=Unk, 1=No, 2=Prob (ECG), 3=Def (ECG)
GRAFT STATUS (ON DISCH): 0=Unk, 1=Open, 2=Sl Sten, 3=Sev Sten, 4=Closed

HOSPITAL COURSE (Patient number)
106. POST OP STUDY: 0=No, 1=Yes, 2=PM (If yes, 1 is followed by cath number, i.e. 1, 0000)
107. ARRHYTHMIAS: 0=Unk, 1=No, 2=PVC's, 3=Supravent Beats, 4=Both, 5=VT or VF, 6=Recurrent VT or VF
108. TRANSFUSIONS: 0=Unk, 1-8=actual #, 9=9 or more
109. P.O. BLEED: 0=Unk or <500cc, 1=501-1000cc, 2=>1000cc, 3=Reop 1x, 4=Reop 2x or more, 5=Tamponade
110. ↓ CARDIAC OUTPUT: 0=No, 1=Mild↓, 2=Mod↓ (Pressor X 6hr), 3=Sev↓ (2 Pressors X 24hr), 4=Sev↓ (IABA Req)
111. NEURO COMP: 0=No, 1=Yes-Perf, 2=Yes-Non Perf
112. WOUND COMP: 0=None or N/A, 1=Minimal, 2=Sev Inf, 3=Dehis w/o Inf
113. RENAL COMP: 0=No, 1=Mod↓Out, 2=Sev↓Out, 3=Dialysis Req
114. PULM COMP: 0=No, 1=Mild Shock Lung, 2=Sev Shock Lung w/o Trach, 3=Shock Lung w/ Trach, 4=Pneumo w/ Trach, 5=Pulm Emb
115. CPK: 0=Unk, 1=<80, 2=81-150, 3=151-300, 4=301-400, 5=401-500, 6=501-600, 7=601-700, 8=701-800, 9=>800
116. SGOT: 0=Unk, 1=<20, 2=<40, 3=<60, 4=<80, 5=<100, 6=<120, 7=<140, 8=<160, 9=>160
117. LDH: 0=Unk, 1=<100, 2=<200, 3=<300, 4=<400, 5=<500, 6=<600, 7=<700, 8=<800, 9=>800
118. MI-UNGRAFTED AREA: 0=Unk, 1=No, 2=Ant, 3=Lat, 4=Inf, 5=2 areas
119. COMP OF MI: 0=None or N/A, 1=C. Shock, 2=VSD, 3=MI, 4=CHF (not fr MI)
120. IABA USED: 0=No, 1=Before Study, 2=After Study, 3=After Surg
121. TOTAL HOSP DAYS: actual number (00=Unk) (99=99+)
122. DAYS, ADMIT->CATH: (0=No cath, 9=9+) actual number
123. DAYS, CATH->SURG: (0=No Surg, 9=9+) actual number
124. DAYS, SURG->DISCH: (0=OR Death) actual number (If medical patient, use Cath->Disch)
125. COAGULATION: 0=Unk, 1=No, 2=Hep, 3=Coum, 4=Mini-Hep
130. SURVIVAL: 0=Dis-Alive, 1=Died IH (Cardiac), 2=Died OR, 3=Died IH-Non Cardiac, 4=LT Med FU (If "4" coded, add surgical number; if death, add date of death)

FOLLOW UP DATA SHEET (Patient name, patient number)
1. INTERVALS
2. ANGINA IN INTERVAL: 0=Unk, 1=No, 2=Cl-2, 3=Cl-3, 4=Cl-4
3. CHF IN INTERVAL: 0=Unk, 1=No, 2=Cl-2, 3=Cl-3, 4=Cl-4
4. MI IN INTERVAL: 0=Unk, 1=No, 2=Ant, 3=Lat, 4=Inf
5. BRUCE STRESS IN INT: 0=Unk, 1=No, 2=Yes-, 3=Yes+
6. STRESS HR: 0=Unk or N/A, 1=<100, 2=<120, 3=<140, 4=<160, 5=>160
7. WORK STATUS-INT END: 0=Unk, 1=No, 2=Reduced, 3=Nor
8. GRAFT STATUS-INT END: 0=Unk, 1=Open, 2=Sl Sten, 3=Sev Sten, 4=Closed
9. ANGIO IN INT: 0=Unk, 1=No, 2=Autopsy, Letter=Yes (letter used is interval letter followed by cath #)
PATIENT STATUS-INT END: 0=Unk, 1=Alive, 2=Died-Cardiac, 3=Died-Non Cardiac, 4=LT Med FU, 3=Trans to Surg Series (S followed by surg #) (If Pt died, the number is followed by death date)
Item Metadata
Title | Statistical analysis of survival data : an application to coronary bypass surgery |
Creator | Reid, Nancy |
Publisher | University of British Columbia |
Date Issued | 1976 |
Subject | Cardiovascular system -- Diseases -- Mortality; Heart -- Surgery |
Genre | Thesis/Dissertation |
Type | Text |
Language | eng |
Date Available | 2010-02-08 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0093698 |
URI | http://hdl.handle.net/2429/19818 |
Degree | Master of Science - MSc |
Program | Statistics |
Affiliation | Science, Faculty of; Statistics, Department of |
Campus | UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |