
UBC Theses and Dissertations


Predicting short term disability benefit days for the Workers’ Compensation Board of British Columbia Fukui Innes, Junko 2004


Full Text

PREDICTING SHORT TERM DISABILITY BENEFIT DAYS FOR THE WORKERS' COMPENSATION BOARD OF BRITISH COLUMBIA

By JUNKO FUKUI INNES
B.Sc. (Statistics), Simon Fraser University, 2001

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN BUSINESS ADMINISTRATION
In THE FACULTY OF GRADUATE STUDIES
SAUDER SCHOOL OF BUSINESS

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
April 2004
© Junko Fukui Innes, 2004

Library Authorization

In presenting this thesis in partial fulfillment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Name of Author (please print): Junko Fukui Innes
Date (dd/mm/yyyy): 13/04/2004
Title of Thesis: PREDICTING SHORT TERM DISABILITY BENEFIT DAYS FOR THE WORKERS' COMPENSATION BOARD OF BRITISH COLUMBIA
Degree: Master of Science
Department of Sauder School of Business
The University of British Columbia, Vancouver, BC, Canada
Year: 2004

Abstract

The objective of this study was to predict Short Term Disability (STD) benefits paid by the Workers' Compensation Board of British Columbia. An STD benefit is defined as the total number of wage days that a worker is unable to work because of an injury incurred at the workplace. Factors which may explain variability in STD days paid at the claim level were studied with regression analysis and ranked according to the level of importance. This analysis indicated that injury related factors have more of an effect on STD days than other factors such as age and occupation.
Regression analysis was used to study total STD days paid within the truncation periods and to predict the total STD days in the coming 6, 12, and 18 months at the area office and WCB levels. The model developed for total STD days paid within the truncation periods showed the relative difference in STD days within each factor. For example, this model showed that the older the claimant is, the more STD days they are likely to receive. The model to predict the total STD days in the coming 6, 12, and 18 months at the area office and WCB levels yielded accurate results as measured by a comparison of the estimated STD days and the actual STD days paid in the past.

Survival analysis was used to estimate additional and total STD days given that a claim has already received more than a certain number of STD days. The model showed that the more STD days a claimant has received so far, the more additional and total STD days the claimant will receive.

Knowing the influential factors on STD days will help WCB seek possible methods to reduce STD days. The predictive models for the coming 6, 12, and 18 months will help the process of budgeting. Finally, based on the predictive model of the additional and total STD days, WCB can anticipate how many STD days will be paid on a particular type of claim given that a claim has already received more than a certain number of days of payment.

Table of Contents

Abstract
List of Tables
List of Figures
Acknowledgement
1. Introduction
  1.1 Company Background
  1.2 Rehabilitation and Compensation Services
  1.3 Claims
  1.4 Project Objective
  1.5 Related Literature
2. Data Structure and Preliminary Analysis
  2.1 Data
  2.2 STD Claim Conditions
  2.3 Claim Reopenings
  2.4 Truncation of STD Days
  2.5 Cluster Analysis
    2.5.1 The Comparison of Predictive Power between the Clusters and the Aggregations of WCB
3. Regression Analysis
  3.1 Multiple Regression Analysis
    3.1.1 Cross Validation
  3.2 Determining Factors that May Explain Variation in STD Days
  3.3 Models for STD Days Paid Based on Regression Analysis
    3.3.1 Model for Total STD Days Paid Within Truncation Period at the Claim Level
    3.3.2 Model to Predict Total STD Days Paid for Open Claims at the Claim Level, Area Office Level, and WCB Level
4. Survival Analysis
  4.1 Background
  4.2 Censoring
  4.3 Regression Model for Survival Analysis
  4.4 Total and Additional STD Days Paid to a Claim
5. Results
  5.1 Determining Factors that May Explain Variation in STD Days
  5.2 Model for Total STD Days Paid Within Truncation Period at the Claim Level
  5.3 Model for Total STD Days Paid for Open Claims at the Claim Level, Area Office Level, and WCB Level
    5.3.1 Model for Total STD Days Paid in the Next Year at the Claim Level
    5.3.2 Predictive Model for Total STD Days Paid in the Next Year at the Area Office
    5.3.3 Predictive Model for Total STD Days Paid in the Next Year at the WCB Level
  5.4 Model for Estimated Total and Additional STD Days with Survival Analysis
    5.4.1 Determination of the Distribution for Survival Analysis
    5.4.2 The Predictive Model of Total and Additional STD Days
6. Discussion
7. Future Work
References
Appendix A: Description of Factors Used in Analysis
Appendix B: Potentially Important Factors and Missing Values
Appendix C: Descriptive Statistics and Examples of Clusters
Appendix D: MSE_CV, Root MSE_CV, Relative MSE_CV, and Rank of Potentially Important Factors
Appendix E: Model for Total STD Days Paid Within Truncation Period at the Claim Level
Appendix F: Model to Predict STD Days in Coming Year for Open Claims at Claim Level
Appendix G: The Predicted Coefficients of the Model to Predict Additional and Total STD Days for a Claim which Already Has Received a Certain Number of STD Days

List of Tables

Table 2.2.1: Descriptive Statistics of Total STD Days whose Injury Years were Specified Years
Table 2.3.1: Frequency of Reopened Claims
Table 2.3.2: Mean, Median and Maximum Time (Days) Between Closure and Reopening
Table 2.4.3: The Number of Claims Closed Within 6, 12, 18 and 24 Months Period
Table 2.4.4: Mean, Median, and Maximum STD Days for Each Truncation
Table 2.5.1: Cluster Formation of ICD9
Table 2.5.2: The Number of Original and Clustered Categories for Factors
Table 3.1.1: Baselines for Categorical Variables Used in this Analysis
Table 3.2.1: Factors Studied for Important Factors
Table 3.3.1.1: Factors Included in Model for Total STD Days Paid Within Truncation Period at the Claim Level
Table 3.3.2.1: Factors Included in Model to Predict Total STD Days for Open Claims at the Claim Level
Table 3.3.2.2: The Number of Claims Open at Area Offices Between 1997 and 2002
Table 3.3.2.3: Time Interval of Prediction for STD Days
Table 4.2.1: Independent Variables in Survival Analysis
Table 5.2.1: Factors which are Known at the Time of Registration or the First STD Payment
Table 5.2.2: R-Squared, Standard Deviation, and Mean for the Models in Each Truncation Period
Table 5.2.3: The Estimated Additional STD Days for ICD9 Cluster
Table 5.2.4: The Estimated Additional STD Days for Industry Cluster
Table 5.2.5: The Estimated Additional STD Days for Occupation Type Cluster
Table 5.2.6: The Estimated Additional STD Days for Area Offices
Table 5.2.7: The Estimated Additional STD Days for Weekday
Table 5.2.8: The Estimated Additional STD Days for Month
Table 5.3.1.1: MSE and R-squared for STD Days in Previous 12 Months and ICD9 Models
Table 5.3.2.1: Actual and Predicted Total STD Days in 2001 and 2002 for 6 Months Interval
Table 5.3.2.2: Actual and Predicted Total STD Days in 2001 and 2002 and Predicted Total STD Days in 2003 for 12 Months Interval
Table 5.3.2.3: Actual and Predicted Total STD Days in 2001 and 2002 and Predicted Total STD Days in 2003 for 18 Months Interval
Table 5.3.3.1: Actual and Predicted Total STD for 6 Months at WCB Level
Table 5.3.3.2: Actual and Predicted Total STD Days for 12 Months at WCB Level
Table 5.3.3.3: Actual and Predicted Total STD for 18 Months at WCB Level
Table 5.4.1.1: Log Likelihood, AIC, and BIC for Models with Distributions
Table 5.4.2.1: Conditions of Examples
Table A.1: Description of Factors Used in Analysis
Table B.1: Frequencies and Percentages of Missing Values for Potentially Important Factors
Table C.1: Accident Type Cluster
Table C.2: Body Part Cluster
Table C.3: ICD9 Cluster
Table C.4: Industry Type Cluster
Table C.5: Nature of Injury Type Cluster
Table C.6: Occupation Type Cluster
Table C.7: Source of Injury Type Cluster
Table D.1: MSE_CV, Root MSE_CV, Relative MSE_CV, and Rank of Important Factors with 6 Months Truncation with Starting Point of Registration Day
Table D.2: MSE_CV, Root MSE_CV, Relative MSE_CV, and Rank of Important Factors with 12 Months Truncation with Starting Point of Registration Day
Table D.3: MSE_CV, Root MSE_CV, Relative MSE_CV, and Rank of Important Factors with 18 Months Truncation with Starting Point of Registration Day
Table D.4: MSE_CV, Root MSE_CV, Relative MSE_CV, and Rank of Important Factors with 24 Months Truncation with Starting Point of Registration Day
Table D.5: MSE_CV, Root MSE_CV, Relative MSE_CV, and Rank of Important Factors with 6 Months Truncation with Starting Point of 1st STD Payment Day
Table D.6: MSE_CV, Root MSE_CV, Relative MSE_CV, and Rank of Important Factors with 12 Months Truncation with Starting Point of 1st STD Payment Day
Table D.7: MSE_CV, Root MSE_CV, Relative MSE_CV, and Rank of Important Factors with 18 Months Truncation with Starting Point of 1st STD Payment Day
Table D.8: MSE_CV, Root MSE_CV, Relative MSE_CV, and Rank of Important Factors with 24 Months Truncation with Starting Point of 1st STD Payment Day
Table E.1: The R-Square, MAE_CV, MSE, and Root MSE for Backward Stepwise Method of Models for 6 and 12 Months Truncation
Table E.2: The R-Square, MAE_CV, MSE, and Root MSE for Backward Stepwise Method of Models for 18 and 24 Months Truncation
Table E.3: Coefficient of Truncation Model
Table E.4: Additional STD Days to the Baseline Based On 6, 18, and 24 Months Truncation Models for ICD9 Cluster
Table E.5: Additional STD Days to the Baseline Based On 6, 18, and 24 Months Truncation Models for Industry Type Cluster
Table E.6: Additional STD Days to the Baseline Based On 6, 18, and 24 Months Truncation Models for Occupation Type Cluster
Table E.7: Additional STD Days to the Baseline Based On 6, 18, and 24 Months Truncation Models for Area Offices
Table E.8: Additional STD Days to the Baseline Based On 6, 18, and 24 Months Truncation Models for Weekday
Table E.9: Additional STD Days to the Baseline Based On 6, 18, and 24 Months Truncation Models for Month
Table E.10: Additional STD Days to the Baseline Based On 6, 18, and 24 Months Truncation Models for Gender
Table E.11: Additional STD Days to the Baseline Based On 6, 18, and 24 Months Truncation Models for Year
Table F.1: The R-Square, MSE and Root MSE for Backward Stepwise Method of Models for 6 Time Interval for Open Claims
Table F.2: The R-Square, MSE and Root MSE for Backward Stepwise Method of Models for 12 Time Interval for Open Claims
Table F.3: The R-Square, MSE and Root MSE for Backward Stepwise Method of Models for 18 Time Interval for Open Claims
Table F.4: The Predicted Coefficient of Models for 6, 12, and 18 Months Time Interval for Open Claims
Table G.1: The Predicted Coefficients of the Model to Predict Additional and Total STD Days for a Claim which Already Has Received a Certain Number of STD Days

List of Figures

Figure 2.2.1: Distribution of the Number of Total STD Days by Year
Figure 2.3.1: Time Interval Between Openings for Claims
Figure 2.3.2: Distribution of Time (Days) Between Any Closure and a Subsequent Reopening
Figure 2.4.1: Claim Evaluation in Truncation 6 Months
Figure 2.4.2: An Example of a Claim which Stays Open Longer than 6-Months Truncation
Figure 2.4.3: Six Months Truncation with Different Starting Points (Truncation 1: Injury Day, Truncation 2: Registration Day, Truncation 3: 1st STD Payment Day)
Figure 2.5.1: Process of Creating Clusters
Figure 2.5.2: Comparison of Predictive Power Between Clusters and WCB Aggregations
Figure 3.1.1: Binary Coding for Categorical Variable
Figure 3.3.2.1: Methodology to Predict Total STD Days for Open Claims at Area Offices and WCB Level
Figure 3.3.2.2: The Prediction Methodology of the Model
Figure 4.1.1: The Censoring Rule for Claims
Figure 5.1.1: Relative MSE_CV (the First STD Payment as the Starting Point of Truncations)
Figure 5.1.2: Relative MSE_CV (the Registration Day as the Starting Point of Truncations)
Figure 5.2.1: The Estimated Additional STD Days for Age
Figure 5.3.2.1: Actual and Predicted Total STD Days in 2001 and 2002 for 6 Months Interval
Figure 5.3.2.2: Actual and Predicted Total STD Days in 2001 and 2002 and Predicted Total STD Days in 2003 for 12 Months Interval
Figure 5.3.2.3: Actual and Predicted Total STD Days in 2001 and 2002 and Predicted Total STD Days in 2003 for 18 Months Interval
Figure 5.4.1.1: The Actual Average Additional STD Days to be Paid
Figure 5.4.1.2: The Average of Actual Total STD Days to be Paid
Figure 5.4.2.1: Estimated Additional STD Days to Be Paid
Figure 5.4.2.2: Estimated Total STD Days to Be Paid
Figure B.1: Change of Potentially Important Factors' Missing Values in Percentage
Figure E.1: Change in Root MSE in the Process of Backward Stepwise Method for Models with 6, 12, 18 and 24 Months Truncation
Figure E.2: Additional STD Days to the Baseline Based On 6, 18, and 24 Months Truncation Models for Age
Figure F.1: Change in Root MSE in the Process of Backward Stepwise Method for Models with 6, 12, 18 and 24 Time Interval for Open Claims

Acknowledgement

First of all, I would like to acknowledge and thank Professor Alex Carvalho for being my advisor on this project. You taught me a great deal about the theoretical and practical use of statistics. I would also like to thank Professor Martin L. Puterman, Professor Jonathan Berkowitz, and Mats Gerschman for their sincere advice and consideration for my work. I would also like to show my appreciation to my project member, Andrew Gray, and Project Advisor, Fredrik Odegaard, for their assistance on my project. I would also like to acknowledge and thank Steve Barnett, Brian Erickson, Parm Hothi, Ernest Urbanovich, and Ian Edwards from the Workers' Compensation Board of BC for their support for this project. My experience with WCB through the summer of 2003 was one of my greatest memories at UBC. Finally, I would like to thank my family. Neil, my husband, supported me throughout the completion of my Master's degree. Because of your support and encouragement I was able to earn this degree. I would also like to extend special thanks to Monto, my little nephew, who gave me a smile whenever I needed one.

1. Introduction

1.1 Company Background

The Workers' Compensation Board of British Columbia (WCB) is the administrative organization which encourages a safe and healthy workplace for workers and employers in British Columbia. WCB serves about 2 million workers and about 172,000 employers in British Columbia. WCB has 14 regional offices in British Columbia. The Workers' Compensation Act for the B.C.
Ministry of Skill Development and Labour authorizes WCB to enforce occupational safety and health standards, provide legislated rehabilitation benefits and compensation to injured workers or their dependents, and collect funds from businesses to operate the workers' compensation system. WCB protects employers from lawsuits that their workers might otherwise bring over work-related injuries or diseases. In addition, WCB protects workers and their dependents from financial hardship due to work-related injuries and diseases by awarding eligibility for compensation benefits.

1.2 Rehabilitation and Compensation Services

The Rehabilitation and Compensation Services Division (WCB/RC) makes decisions about the entitlement of benefits for workers who have work-related injuries and provides assistance to injured workers returning to their workplaces. It is also responsible for administering health benefit payments, as well as providing appropriate vocational rehabilitation services for injured or diseased workers.

1.3 Claims

There are several types of benefits workers can receive as a result of a request for compensation.

Short Term Disability (STD) benefits, also known as Wage Loss Duration (WLD), are defined as the total number of wage days that a worker is unable to work because of an injury that was incurred at his or her workplace. From now on, we will use STD to describe WLD. Those awarded STD are expected to go back to work after recovering from their injuries. The cost of STD benefits in 2002 for WCB was $253 million, which accounts for about 19% of the claim costs.

Vocational Rehabilitation (VR) is the compensation to help injured workers go back to their pre-injury employment or a comparable occupation category, to assist the workers to overcome the immediate and long-term vocational impact of the injury or disease, and to encourage, reassure, and provide counseling to the workers.
The cost of VR in 2002 for WCB was $130.5 million, which accounts for about 10% of the claim costs.

Long Term Disability (LTD) benefits are for a worker who is permanently totally disabled or partially disabled due to work-related injuries or diseases. This benefit is payable for the lifetime of the worker. The cost of LTD benefits in 2002 for WCB was $738.3 million, which accounts for about 54% of the claim costs.

Health Care (HC) benefits are compensation for the cost of health care for work-related injuries and diseases, such as the cost of hospitalization, treatment, prescription drugs, and so on. Claims can receive HC along with the other benefits mentioned above. Some claims received only HC, and these are called Health Care Only (HCO) claims. The cost of HC benefits in 2002 for WCB was $237.6 million, which is about 17% of claim costs.

1.4 Project Objective

STD is one of the main factors that WCB/RC considers in its decision-making process. Reduction of STD days reflects the WCB's mandate to obtain the early recovery of claimants from injury and to assist in their timely return to work. In addition, a reduction of STD days would lead to large financial savings. For example, for every one-day reduction in the average STD claim duration, WCB saves approximately $6 million in annual costs.

Prediction of STD days and determination of factors which influence STD days would produce a number of benefits for WCB/RC. By determining the driving factors of STD, WCB/RC would be able to seek possible methods to reduce STD days. The determination of a predictive model and influential factors of STD days would help the process of budgeting. Finally, determining which influential factors of STD days are controllable may help reduce and predict the STD days in an effective way for both the individual worker and his or her employer.

The study described in this thesis focused on STD benefits.
The main objective of this study was to predict STD days paid at the claim level, area office level, and WCB level, as well as to determine the influential factors on the duration of STD. Potentially important factors in explaining variability in STD days of claims were suggested by WCB. These factors were analyzed and ranked according to the level of influence on the number of STD days at the claim level.

Three models were developed for different purposes. The first model was created to study the claim characteristics based on the STD days. The second model was created to predict the total STD days in the coming 6, 12, and 18 months at the area office and WCB levels. The claims used for the second model were those which were open at the end of the month before the interval began. The third model was created to estimate additional and total STD days given that a claim has already received more than a certain number of STD days.

1.5 Related Literature

In this section, we review some studies related to our work as well as studies using survival analysis in business and working condition related fields.

First, we review WCB related studies. Urbanovich et al. (2003) developed logistic regression models to determine the risk that STD claims would be converted into VR claims or LTD claims and to forecast the potential financial impact on the WCB. According to this study, about 4.2% of claims were converted to VR or LTD claims, but received 64.3% ($1.2 billion) of the total payments and awards in 1999. Claims which were either closed permanently or received HC or STD payment after the first final STD payment made up 95.8% of all the claims, but received 35.7% ($651 million) of the payments and awards. The logistic regression models were developed separately for the 10 most common natures of injury, which together covered 95.72% of claims.
In these models, which used the nature of injury as a classification, STD days paid and the age of the claimant were statistically significant. The study also found that the probability of conversion increased as the number of STD days paid increased and as the age of the injured worker increased.

Mason (1993) studied claim duration using Analysis of Variance. The objective of the study was to identify the relationship between duration and the factors which had an effect on it. He determined that age, gender, marital status, injury type, accident type, subclass (e.g. logging), and accident year were each statistically significant predictors of claim duration.

Next we review studies that applied survival analysis in business related fields. Hennessey and Muller (1995) used survival analysis to determine the effect of vocational rehabilitation efforts and work incentive programs. They analyzed the effect of demographics, the programs offered, and additional knowledge of the program, such as the trial work period or extended Medicare coverage, on the vocational rehabilitation recipients' time to return to work, that is, the time taken to go back to work after receiving VR. In terms of the demographics of the recipients, the outcome indicated that the recipients who were younger, had higher education levels, and had lower amounts of primary insurance returned to work sooner. It also indicated that recipients who were male and single returned to work sooner. In addition, among five VR services (physical therapy, job or vocational training, job counseling, general education, and assistance in job placement), all except job counseling were positively related to returning to work sooner. Finally, for the knowledge of the program, they also discovered that the trial work period before returning to work had a positive relationship with the time to return to work.
On the other hand, the knowledge of extended Medicare coverage had a negative relationship with the time to return to work.

Trevor (2001) developed a survival model to analyze the association of time to voluntary turnover with job satisfaction, job availability, and movement capital (represented by cognitive ability, which was measured by the Armed Forces Qualifications Test), occupation-specific training (defined as the duration of specific vocational preparation to perform the average work in an occupation), and education level. Voluntary turnover means that people change their jobs voluntarily, rather than by being laid off. Trevor discovered that there was an interaction among the factors studied in this analysis. First, the higher the education level, cognitive ability, and occupation-specific training, the larger the negative impact of job satisfaction was on time to voluntary turnover. On the other hand, for individuals with low education levels, low cognitive ability, and low occupation-specific training, the rate of unemployment had an important effect on their time to voluntary turnover.

Parker, Peters, and Turetsky (2002) used survival analysis to study the survival likelihood of distressed firms based on various corporate governance attributes and financial characteristics. The sample comprised 176 U.S. companies that experienced a decrease in cash flow, from positive to negative, in successive fiscal years between July 1988 and June 1996, but did not appear to be in financial distress. The survival model indicated that a company that replaced its CEO with an outsider was twice as likely to go bankrupt. In addition, the larger the levels of stockholder and insider ownership were, the less likely the company was to experience bankruptcy.

2. Data Structure and Preliminary Analysis

2.1 Data

Claims with injury dates between January 1st, 1997 and December 31st, 2002 were used for this study. We had a total of 419,360 claims.
We decided to use these claims because in 1996 WCB had a major structural change and, as a result, some of the information regarding each claim had not been collected in the same manner as it previously was. In addition, an interval of 15 years would give us a good idea of claim life. Finally, the newer the information is, the more likely it is that we capture the trend of STD days paid to claims.

WCB staff and the Center for Operations Excellence (COE) team discussed potentially important factors affecting the STD days of claims at the beginning of the project. The factors they identified were then extracted by WCB from their data warehouse. We originally received the data in MS ACCESS form with 22 tables. As the project progressed, we requested additional data from WCB as we required it. By the end, we had received a total of 33 tables. Some of the factors in the tables were used as they were, and others were used to create additional factors for analysis. In total, we used 20 original factors from WCB and created 22 new factors based on the factors given to us by WCB. Appendix A gives details of the factors used in this study.

Several factors which were considered potentially important contained missing values. Table B.1 in Appendix B shows the number of missing values for each factor. Figure B.1 in Appendix B shows the change in the number of missing values in the factors over the six years. Half of the factors had about 10% of their data missing. Overall, the number of missing values was stable over the 6 years of study, but in 2000, all of the factors except multiple injuries had more missing values. The multiple injuries factor had the largest share of missing values, at 16.7%, and the majority of its missing values were in 1997. We did not remove claims even if some of their factors were missing. If we had removed every claim with at least one missing factor, we would have lost a large share of the sample.
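The cost of dropping incomplete claims can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that values are missing independently across factors, which the thesis does not claim; `complete_case_fraction` is a hypothetical helper, and the 10% rate is the approximate figure quoted above for half of the factors.

```python
def complete_case_fraction(missing_rate: float, n_factors: int) -> float:
    """Expected share of claims with no missing value among n_factors,
    ASSUMING each factor is missing independently at missing_rate."""
    if not 0.0 <= missing_rate <= 1.0:
        raise ValueError("missing_rate must be between 0 and 1")
    if n_factors < 0:
        raise ValueError("n_factors must be non-negative")
    return (1.0 - missing_rate) ** n_factors


if __name__ == "__main__":
    # Illustrative factor counts only; the real overlap of missing values
    # across WCB factors is not reported in this section.
    for k in (1, 5, 10):
        share = complete_case_fraction(0.10, k)
        print(f"{k:>2} factors at 10% missing: {share:.1%} of claims kept")
```

Under this independence assumption the retained share shrinks geometrically with the number of factors, which is one way to see why removing incomplete claims would discard a large part of the sample.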

SAS (Statistical Analysis System - Statistical Software) was used in this study. MS ACCESS data tables from WCB were imported into SAS. After importing, the tables were merged and new factors were created as required. Maple was also used for some complex calculations.

2.2 STD Claim Conditions

Predicting STD days is difficult because several aspects of a claim's life are unknown in advance:
• How long a claim stays open
• Whether a claim reopens
• If a claim reopens, how long it stays open
• How long a claim is closed before the next reopening

The mean, median, and standard deviation of the number of STD days of claims are shown in Table 2.2.1. Claims were grouped based on their injury year. The summary statistics were determined based on the number of days claims were open between their injury days and April 30th, 2003. The overall decline in the mean, standard deviation, and maximum number of STD days occurs because older claims have had more time to mature or stay open, whereas newer claims have had less. Claims with injury days in 2002 have fewer STD days than those with injury days in 1997. As a result, the variability in the number of STD days is smaller in 2002 than in 1997: the standard deviation in 2002 is much smaller than that in 1997. The number of claims declined over the six years.

The means of STD days are about 40 days, but the medians are about 10 days in the six years. These differences indicate that the distribution of total STD days of claims is skewed to the right. The skewness can be observed in Figure 2.2.1, which shows the distribution of the total STD days for each year. Notice that the majority of claims received fewer than 20 STD days, and only a few received more than 100 STD days.
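The gap between mean and median described above can be checked directly from the values reported in Table 2.2.1. The following is a minimal sketch, not part of the original SAS analysis; the dictionary name and the helper `mean_to_median_ratio` are hypothetical, and the numbers are copied from the table.

```python
# (mean, median, standard deviation) of total STD days by injury year,
# copied from Table 2.2.1.
TABLE_2_2_1 = {
    1997: (43, 10, 109),
    1998: (45, 10, 105),
    1999: (43, 10, 95),
    2000: (41, 10, 86),
    2001: (38, 11, 72),
    2002: (32, 10, 52),
}


def mean_to_median_ratio(stats):
    """Return {year: mean / median}; a ratio well above 1 suggests right skew."""
    return {year: mean / median for year, (mean, median, _sd) in stats.items()}


if __name__ == "__main__":
    for year, ratio in mean_to_median_ratio(TABLE_2_2_1).items():
        print(f"{year}: mean is {ratio:.1f} times the median")
```

In every injury year the mean is more than three times the median, which is consistent with a long right tail of claims with many STD days.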

Table 2.2.1 Descriptive Statistics of Total STD Days whose Injury Years were Specified Years

Year   Number of Claims   Mean   Median   Standard Deviation   Minimum   Maximum
1997   74,260             43     10       109                  1         2,087
1998   73,435             45     10       105                  1         1,725
1999   71,980             43     10       95                   1         1,512
2000   72,273             41     10       86                   1         1,173
2001   67,095             38     11       72                   1         825
2002   60,317             32     10       52                   1         466

Figure 2.2.1 Distribution of the Number of Total STD Days by Year
[Figure: "The Number of STD Days Paid Per Year"; one panel per injury year, 1997 to 2002; horizontal axis: STD Days Paid; vertical axis ticks up to 60,000.]

2.3 Claim Reopenings

One of the uncertainties involved in a claim's life is whether a claim reopens and, if it does, how long it stays open. It is also possible that claims reopen several times. Figure 2.3.1 shows an example of the time interval between opening and reopening. The arrow indicates the time interval between the previous closure and the following reopening.

Figure 2.3.1 Time Interval Between Openings for Claims
[Figure: timeline of a claim with two "Closed" periods marked.]

Claim reopenings were studied briefly in this analysis. Claims were considered to be closed for the first time when the first final STD payment occurred. When a claim received the first STD payment of a second open period, it was considered to be reopened. It was then considered to be closed again when it received the second final STD payment.

Table 2.3.1 Frequency of Reopened Claims

# of Reopenings   Frequency   %
0                 365,009     87.04
1                 44,830      10.69
2                 7,192       1.72
3 or more         2,310       0.55

Table 2.3.1 shows the number of claims reopened 0, 1, 2, and 3 or more times. More than 85% of claims never reopened, and close to 99.5% of claims reopened two or fewer times.
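The percentages quoted for Table 2.3.1 follow from the frequencies by simple arithmetic. The sketch below recomputes them; the names `REOPEN_COUNTS` and `reopening_shares` are hypothetical, and the counts are copied from the table.

```python
# Frequencies copied from Table 2.3.1 (number of reopenings -> claims).
REOPEN_COUNTS = {"0": 365_009, "1": 44_830, "2": 7_192, "3 or more": 2_310}


def reopening_shares(counts):
    """Return each category's share of all claims, in percent."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


if __name__ == "__main__":
    shares = reopening_shares(REOPEN_COUNTS)
    for label, pct in shares.items():
        print(f"{label:>9} reopenings: {pct:.2f}%")
    two_or_fewer = shares["0"] + shares["1"] + shares["2"]
    print(f"two or fewer reopenings: {two_or_fewer:.2f}%")
```

The cumulative share for two or fewer reopenings is what the text rounds to "close to 99.5%".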

Figure 2.3.2 shows the distribution of time between any closure (the final STD payment of a period) and a subsequent reopening (the first STD payment of a period) by day. The x-axis indicates the number of days between any closure and a subsequent reopening, and the y-axis indicates the relative frequency of claims which reopened at least once. Very few claims reopen if they have been closed more than 90 days.

Figure 2.3.2 Distribution of Time (Days) Between Any Closure and a Subsequent Reopening
[Figure: "Distribution of Length of Closure-Reopening Period, All Closure-Reopening Periods"; horizontal axis: Days; vertical axis: relative frequency, 0.00% to 70.00%.]

Table 2.3.2 shows the mean, median, and maximum number of days between a closure and subsequent reopening if a claim reopens. The mean of the time interval increases for claims with more reopenings. However, the medians indicate that the time intervals are around one to two months. Medians are robust to outliers and to skewed distributions such as this time interval, so it is reasonable to use the median to study the centre of the distribution. Based on the medians, if a claim reopens, it is likely to reopen within two months of the previous closure.

Table 2.3.2 Mean, Median and Maximum Time (Days) Between Closure and Reopening

Time Interval                                        Mean   Median   Maximum
Any Time Interval                                    103    25       2196
First Closure and First Reopening                    91     22       2196
Second Closure and Second Reopening                  145    36       1995
Third or More Closure and Third or More Reopening    161    55       1645

2.4 Truncation of STD Days

In Section 2.2, the unknown factors in claim life were discussed. In order to deal with such unknown factors, one of the methods employed in this study was truncation of the claim period. This approach creates intervals of a certain number of months with a particular starting point. STD days paid within the intervals can be studied as a result.
By using truncation, we can see a snapshot of the claim life within a defined time interval. The truncation periods studied were defined by:
• Time interval of truncation: 6 months, 12 months, 18 months, and 24 months
• Starting point of truncation: the injury day, the registration day, and the first STD payment day.

Figure 2.4.1 is an example of a 6-month truncation. The starting point can be the injury date, the registration date or the first STD payment date. During the truncation period, a claim may stay open the whole time, close and reopen, or close and never reopen.

Figure 2.4.1 Claim Evaluation in Truncation
[Figure: 6-month timeline from the start of truncation to the end of truncation, with a claim that is open, closed, open again, and closed again within the period.]

The advantage of truncation is that, because only the duration of a claim within a fixed time interval is studied, uncertainty about the total lifetime, the chance of reopening, and the length of reopening does not have to be considered.

One disadvantage of the truncation methodology is that not all claims will close permanently within the truncation period. Figure 2.4.2 shows an example of a claim which stays open longer than the 6-month truncation period.

Figure 2.4.2 An Example of a Claim which Stays Open Longer than 6 Months Truncation
[Figure: timeline in months with the start and the end of truncation marked; the claim is still open at the end of truncation.]

Another disadvantage is that we can only examine the life of a claim within a truncation period. We will not know how the claim developed outside of the truncation period.

For each claim, we determined the number of STD days paid within particular truncation periods using the following process. WCB records the type of benefit and its payment for each claim monthly. One of the tables we received contained the benefit types, their payments, and the dates of the payments for individual claims. First, we determined the end of a truncation of, for example, 6 months starting at the first STD payment date, by counting 6 months from the first STD payment date.
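The truncation calculation described in this passage (fix a starting point, count forward a number of months, and add up the STD days of payments dated inside that window) can be sketched as follows. The payment records, the helper names, and the choice to treat the window as including the start date and excluding the end date are assumptions made for illustration; the thesis does not state the record layout or the boundary convention.

```python
from datetime import date
import calendar


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start` (day clamped to month end)."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def std_days_in_truncation(payments, start, months):
    """Sum STD days over payments dated in [start, start + months).

    `payments` is a list of (payment_date, std_days) pairs for one claim.
    The half-open window is an assumption of this sketch.
    """
    end = add_months(start, months)
    return sum(days for paid_on, days in payments if start <= paid_on < end)


# Hypothetical claim: (payment date, STD days covered by that payment).
payments = [
    (date(2000, 2, 15), 10),
    (date(2000, 3, 15), 20),
    (date(2000, 9, 1), 5),  # outside a 6-month window starting 2000-02-15
]
first_std_payment = min(paid_on for paid_on, _ in payments)

print(std_days_in_truncation(payments, first_std_payment, 6))
print(std_days_in_truncation(payments, first_std_payment, 12))
```

Because days are attributed to the date on which they are paid, a payment that falls just after the window is excluded even if some of the days it covers were accrued inside the window.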
Then we determined the STD payment days that occurred within the 6 months, and the number of STD days which were paid on each payment day. The sum is the number of STD days paid within the 6-month truncation with the first STD payment date as the starting point. The number of STD days paid within the other truncation periods was determined similarly, and these variables are shown in Table A.1 in Appendix A. For example, the variable trunc_6_lst_std_months represents the number of STD days paid in the 6 months starting from the first STD payment.

The frequency of claims which closed permanently within 6, 12, 18 and 24 months starting from the injury day, the registration day and the first STD payment day was determined. Table 2.4.3 shows the result. For preliminary analysis purposes, claims were considered to be closed permanently if they did not reopen between the first final STD payment and April 30th, 2003. Notice that more than 90% of claims close within 6 months of their opening no matter which starting point the truncation has. This indicates that a truncation of 6, 12, 18, or 24 months covers most of the claims' STD days paid.

Table 2.4.3 The Number of Claims Closed Within 6, 12, 18 and 24 Months Period

Active Period   Injury Day            Registration Day      First STD Payment
(months)        Frequency   %         Frequency   %         Frequency   %
0-6             382,734     92.29     385,533     92.97     390,954     94.27
0-12            403,557     97.31     404,160     97.46     405,430     97.76
0-18            409,943     98.85     410,205     98.91     410,667     99.03
0-24            414,706     100.00    414,706     100.00    414,706     100.00

Table 2.4.4 shows the mean, median, and maximum STD days paid for each truncation period. The mean ranges between 27 days and 39 days. Medians are stable regardless of the truncation. However, maximum STD days are very different depending on the truncation.

Table 2.4.4 Mean, Median, and Maximum STD Days for Each Truncation

Truncation Period   Starting Point      Mean   Median   Max. STD
6 Months            Injury Day          27.8   9.0      224
                    Registration Day    28.6   10.0     711
                    First STD Payment   30.7   10.0     1288
12 Months           Injury Day          33.8   10.0     412
                    Registration Day    34.2   10.0     751
                    First STD Payment   35.5   10.0     1288
18 Months           Injury Day          36.6   10.0     583
                    Registration Day    36.8   10.0     1068
                    First STD Payment   37.7   10.0     1288
24 Months           Injury Day          38.1   10.0     763
                    Registration Day    38.2   10.0     1068
                    First STD Payment   38.9   10.0     1305

Figure 2.4.3 shows the 6-month truncation with different starting points. In the study, STD payments occurring within a truncation period are analyzed. The starting point of Truncation 1 is the injury day, that of Truncation 2 is the registration day, and that of Truncation 3 is the first STD payment day. A, B, and C indicate the days on which the claimant received payment for the STD days accrued between the previous payment and that day. Truncation 1 and Truncation 2 capture the STD payments A and B. Although the claim is open and accrues some STD days after payment B, Truncation 1 and Truncation 2 do not include those days because they are not paid until C, which is outside those truncations. Truncation 3 captures the STD payments A, B, and C. As a result, a truncation with the first STD payment as the starting point covers more STD days paid than a truncation with the injury day or registration day as the starting point.

In the further analysis, claims with an injury date between January 1st, 1997 and December 31st, 2001 were used for the 6 and 12 months truncations, and claims with an injury date between January 1st, 1997 and December 31st, 2000 were used for the 18 and 24 months truncations. The data set was chosen in this manner so that each claim would have the full truncation period to mature. For example, by using claims whose injury date was up to the end of December, 2000, the claim condition in 2001 and 2002 could be observed in the given data set.
This, as a result, covers the entire truncation of 18 and 24 months.

Figure 2.4.3 Six Months Truncation with Different Starting Points (Truncation 1: Injury Day, Truncation 2: Registration Day, Truncation 3: First STD Payment Day)
[Figure: timeline from Jan to Dec showing the injury day, the registration day and the 1st STD payment day, the three 6-month truncation windows, the STD payments, and the periods in which the claim is open or closed.]

2.5 Cluster Analysis

A number of factors which could potentially affect the number of STD days paid were studied. The majority were represented by categorical variables. In order to develop predictive models with categorical variables as independent variables, we had to create binary variables for each category within a factor. For example, if there were 3 occupations within an occupational factor, we set the variable for a particular occupation equal to 1 if an individual had that occupation, and 0 otherwise. As a result, we would have had a matrix of size 3 x 419,360 for the occupation type. With more than one categorical variable of interest, the analysis would become very complex.

Some of the factors used in the predictive models contain many categories. Therefore, cluster analysis was used to reduce the number of categories so that predictive models could be developed relatively easily. The WCB already has three levels of aggregation for some categorical variables of interest. Level 1 is the most detailed level and level 3 is the most aggregated level. The numbers of categories for both level 1 and level 3 are shown in Table 2.5.2. After forming clusters, we therefore compared the predictive power of the clustered factors with that of the WCB aggregation.

Claims with an injury date between January 1st, 1997 and December 31st, 2000 were used to form clusters. The method of clustering is shown in Figure 2.5.1 below.
We used these claims to form clusters for all factors because they had had at least 2 years to mature, so that the clusters would be more representative of the STD characteristics of each factor.

Figure 2.5.1 Process of Creating Clusters
[Figure: flow diagram. From the claims with injury dates between January 1st, 1997 and December 31st, 2002, those between January 1st, 1997 and December 31st, 2000 are selected. Categories with at least 100 claims are clustered on the 5, 10, 20, 35, 50, 65, 80, 90, and 95 percentiles of STD days paid; categories with fewer than 100 claims are assigned to clusters using the 50th percentile of STD days paid.]

The categories within a factor were divided into two groups. The first group contains the categories which appeared at least 100 times, and the second contains the categories which appeared fewer than 100 times. For example, in the case of ICD9, if at least 100 claims had a category called back injury, then this category is in the first group. If fewer than 100 claims had a category called fracture of jaw, then this category is in the second group.

After the categories had been divided into these two groups, we determined the 5, 10, 20, 35, 50, 65, 80, 90, and 95 percentiles of STD days for each category in the group with at least 100 occurrences. Then, using SAS, the values of these percentiles were compared, and categories which had similar percentile values were gathered together to form a cluster. The comparison was based on the average distances between the nine percentiles among the categories. The categories which had the shortest average distance between them were clustered together. Refer to Khattre and Naik (2000) for more details.

After the clusters were created, we determined the 50th percentile of total STD days paid for each category belonging to the group which appeared fewer than 100 times. We then compared this percentile with the 50th percentile of STD days of the claims belonging to each cluster, and assigned each category to the cluster which had the closest 50th percentile.
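The two-stage procedure described above can be illustrated with a small, self-contained sketch. It is not the SAS procedure used in the study: the nearest-rank percentile definition, the mean absolute difference between percentile profiles, the average-linkage merging rule, and all category names and values below are assumptions chosen to keep the example short.

```python
from statistics import median

PERCENTILES = (5, 10, 20, 35, 50, 65, 80, 90, 95)


def percentile(values, q):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, -(-q * len(ordered) // 100))  # ceil(q * n / 100)
    return ordered[rank - 1]


def profile(values):
    """Nine-percentile profile of STD days for one category."""
    return [percentile(values, q) for q in PERCENTILES]


def profile_distance(p, q):
    """Mean absolute difference between two percentile profiles (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


def linkage(cluster_a, cluster_b, profiles):
    """Average pairwise distance between the categories of two clusters."""
    dists = [profile_distance(profiles[a], profiles[b])
             for a in cluster_a for b in cluster_b]
    return sum(dists) / len(dists)


def cluster_categories(std_days_by_category, n_clusters, min_claims=100):
    frequent = {c: v for c, v in std_days_by_category.items() if len(v) >= min_claims}
    rare = {c: v for c, v in std_days_by_category.items() if len(v) < min_claims}

    # Stage 1: merge the two closest clusters until n_clusters remain.
    profiles = {c: profile(v) for c, v in frequent.items()}
    clusters = [[c] for c in frequent]
    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]], profiles))
        clusters[i] += clusters.pop(j)

    # Stage 2: attach each rare category to the cluster with the closest median.
    cluster_medians = [median([d for c in cl for d in frequent[c]]) for cl in clusters]
    for category, days in rare.items():
        m = median(days)
        nearest = min(range(len(clusters)), key=lambda k: abs(cluster_medians[k] - m))
        clusters[nearest].append(category)
    return clusters


# Hypothetical STD days per category; min_claims is lowered so the toy data qualify.
toy = {
    "back strain": [5, 6, 7, 8, 9, 10],
    "neck strain": [5, 6, 7, 8, 9, 11],
    "leg fracture": [40, 45, 50, 55, 60, 70],
    "arm fracture": [42, 44, 52, 56, 61, 65],
    "jaw fracture": [48, 52],
    "bruise": [6],
}
result = cluster_categories(toy, n_clusters=2, min_claims=5)
print(result)
```

In this toy data the two strain categories and the two fracture categories form the clusters, and the rare categories join the cluster whose median is closest to their own.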
We used this method because only a few claims had these categories, so we considered that it was not possible to determine the 5, 10, 20, 35, 50, 65, 80, 90, and 95 percentiles of STD days reliably, as was done for the categories appearing at least 100 times.

One disadvantage of this method is that some of the categories in factors did not appear between 1997 and 2000, but did appear in 2001 or 2002. We did not include these categories in clusters because their claims did not have enough time to mature and, as a result, the STD days received may not have represented the actual severity of the injuries or diseases related to the categories.

SAS was used to determine the clusters of factors. The number of clusters was determined using the cubic clustering criterion (CCC) and the pseudo t² statistic (PST2). Refer to Khattre and Naik (2000) and SAS Institute Inc (1983) for more details on the determination of CCC and PST2. CCC is determined from the observed R-squared and the expected R-squared. In general, the higher the CCC value is, the better the clustering is. Usually, a CCC greater than 3 is considered to be good. The PST2 statistic measures the separation between the two clusters most recently joined. The larger the PST2, the less similar the categories within the merged cluster are.

We looked for a local maximum of CCC for each factor. At the same time, we looked for changes in PST2 during the clustering process; if PST2 suddenly increased dramatically, we considered that the categories that had just been merged were not sufficiently similar in terms of their percentiles. We chose the final number of clusters so that CCC was at a local maximum and PST2 was the value just before a big increase. For example, Table 2.5.1 shows the CCC and PST2 of ICD9. When 20 clusters were formed, CCC was 1.44 and PST2 was 6.2; PST2 then jumped to 115 when 19 clusters were formed.
However, we wanted to have a smaller number of clusters, so we continued to form fewer clusters. Notice that the next jump in PST2 occurs between 11 and 10 clusters. In addition, the CCC when 11 clusters were formed is higher than when either 12 or 10 clusters were formed. Therefore, we decided that the total number of clusters for ICD9 was 11.

Table 2.5.1 Cluster Formation of ICD9

# of Clusters   Clusters Joined       FREQ          CCC           PST2
25              CL35   CL38           17            0.66          16.5
24              CL39   OB40           5             0.85          4.1
23              CL47   CL31           4             1.04          3.1
22              OB8    OB42           2             1.36
21              OB9    CL33           5             1.6           3.3
20              CL24   CL32           9             1.44          6.2
19              CL36   CL48           73            -3.5          115
18              CL30   CL40           6             -3.2          [illegible]
17              CL42   OB5            18            -2.8          10.3
16              CL26   OB97           4             -2.2          1.9
15              CL29   CL28           4             -1.7          2.3
14              CL25   CL37           [illegible]   -1.4          7.9
13              [illegible]   CL16    5             -0.73         1.8
12              CL20   CL50           11            [illegible]   4.5
11              CL23   CL27           [illegible]   0.45          [illegible]
10              CL17   CL14           38            -1.1          47.7
9               CL22   CL21           7             -0.65         7.9
8               CL13   CL43           7             0.26          6.4
7               CL15   CL12           15            0.97          11
6               CL18   CL11           12            1.29          12.5
5               CL7    CL8            22            0.98          24.3
4               CL19   CL10           111           -4.5          329
3               CL5    CL6            34            -2.9          42.2
2               CL3    CL9            41            -0.32         53.4
1               CL4    CL2            152           0             392

Table 2.5.2 shows the number of categories for each factor before and after clustering, and under the WCB aggregation where available. Notice that without aggregation or clustering, each factor has more than 100 categories. Aggregation or clustering reduces the number of categories, which makes the factors easier to analyze. The number of claims and example categories belonging to each cluster are shown in Tables C.1 to C.7 in Appendix C.
Table 2.5.2 The Number of Original and Clustered Categories for Factors

Factors                          # of categories        # of categories of WCB        # of clusters
                                 (WCB level 1 code)     aggregation (level 3 code)
ICD9                             900                    n/a                           11
Nature of injury                 306                    6                             9
Accident type                    221                    8                             11
Body part                        171                    8                             14
Source of injury                 1112                   10                            10
Industry type                    959                    12                            12
Occupation type (Stats Canada)   506                    11                            13

2.5.1 The Comparison of Predictive Power Between the Clusters and the Aggregations of WCB

The predictive power of the clusters and that of the WCB aggregations were compared. Multiple linear regression and cross validation were used to determine the predictive power of each method for the factors. Section 3 describes the methodology of linear regression, cross validation, and root MSE_CV.

Figure 2.5.2 shows the root mean square error determined by cross validation (root MSE_CV) for the factors. The Nature of Injury cluster shows a slightly smaller root MSE_CV than the associated WCB aggregation. The other clusters show a very small difference in root MSE_CV from that of the associated WCB aggregation. This indicates that the clusters have slightly better predictive power for STD days than the WCB aggregation does. We therefore decided to use clusters to conduct the analysis and develop models in this study.

The disadvantage of using the clusters instead of the WCB aggregation is that WCB would have to recategorize the claims according to the clusters formed for each factor in order to run the predictive models developed in this study. Further, the clusters may not be easily interpretable. For future study, we recommend using the WCB aggregation when available and developing a meaningful aggregation for ICD9.

Figure 2.5.2 Comparison of Predictive Power between Clusters and WCB Aggregations
[Figure: "Root MSE_CV of Clusters vs. WCB Aggregations"; x-axis: Factors (Accident Type, Body Part, Industry Type, Nature of Injury, Occupation Type, Source of Injury); y-axis: root MSE_CV; one line each for the WCB aggregation and the clusters at 6, 12, 18 and 24 months truncation.]

3. Regression Analysis

3.1 Multiple Regression Analysis

In multiple regression analysis, several predictors are used to explain variability in a single response variable. For this study, the response variable is STD days paid on a claim. However, the claims used and the form of the response variable vary depending on the model, so the response variable of total STD days paid was determined for each model. For example, in the model using the truncation method described in Section 2.4, the response variable was total STD days paid within a truncation of 6, 12, 18 or 24 months. For each model, the type of response variable (the STD days that will be predicted) and its independent variables are specified.

In this setting, multiple regression analysis works as follows. There are n claims of interest. The independent variables for the model consist of l continuous and m categorical variables. Each of the categorical variables has Q_j levels, where j = 1, ..., m.

For the categorical variables, we created binary variables to indicate the category of a factor to which an individual belongs. We set the baselines as shown in Table 3.1.1. We created a matrix of 0's and 1's for all the categorical variables in the models. Taking occupation type as an example, the variable for each cluster was set to 1 if an individual has an occupation type which belongs to that cluster, and 0 otherwise. However, if the individual's occupation type belongs to cluster 13, the baseline of this factor, then all the binary variables are equal to zero. Figure 3.1.1 shows the case of the occupation type cluster. There are 13 clusters for this factor.
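The binary coding with a baseline described above can be sketched in a few lines. The function name and its defaults are illustrative; only the number of occupation clusters (13) and the baseline (cluster 13) come from the text.

```python
def encode_cluster(cluster, n_clusters=13, baseline=13):
    """Return one binary indicator per non-baseline cluster.

    A claim in the baseline cluster is coded as all zeros, so its effect
    is absorbed by the intercept of the regression model.
    """
    if not 1 <= cluster <= n_clusters:
        raise ValueError("cluster must be between 1 and n_clusters")
    levels = [k for k in range(1, n_clusters + 1) if k != baseline]
    return [1 if cluster == k else 0 for k in levels]


# Hypothetical claims in clusters 1, 6 and 13 (the baseline).
for cluster in (1, 6, 13):
    print(cluster, encode_cluster(cluster))
```

With 13 clusters and one baseline, every claim is represented by 12 indicators, at most one of which is 1.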
In the end, each individual has a vector of 12 binary entries, which contains one 1 and eleven 0's when the occupation type belongs to one of clusters 1 to 12, and only 0's when it belongs to the baseline. The effect of the baseline is captured by the intercept of the model.

Table 3.1.1 indicates that missing values were set as baselines for some factors. As mentioned earlier, we did not remove claims which did not have complete information on all factors of interest. Therefore, some claims contain missing values in some of the factors. We used the missing value as one of the levels in some clusters because we hoped that the missing value itself would show us some characteristics of claims. For example, missing values may tend to appear in claims which are registered at a certain area office, or in a certain month, and so on.

Figure 3.1.1 Binary Coding for Categorical Variable

cluster:   1 2 3 4 5 6 7 8 9 10 11 12
claim 1  [ 1 0 0 0 0 0 0 0 0 0  0  0 ]
claim 2  [ 0 0 0 0 0 1 0 0 0 0  0  0 ]
claim 3  [ 0 0 0 0 0 0 0 0 0 0  0  0 ]

Claim 1 belongs to cluster 1, claim 2 belongs to cluster 6, and claim 3 belongs to the baseline, cluster 13.

Table 3.1.1 Baselines for Categorical Variables Used in this Analysis

Factor                                     Baseline
ICD9 cluster                               Missing Value
Occupation type cluster                    Cluster 13
Industry type cluster                      Missing Value
Nature of injury type cluster              Missing Value
Accident type cluster                      Missing Value
Body part cluster                          Missing Value
Source of injury cluster                   Missing Value
Age                                        Older than 60
Gender                                     Male
Area office                                Missing Value
Injury month                               Missing Value
Injury day of a week                       Sunday
Previous 6 or 12 or 18 month STD days      30 days or less

The total number of coefficients coming from each categorical variable is:

\[ R_j = \begin{cases} Q_j & \text{if the baseline is a missing value} \\ Q_j - 1 & \text{if the baseline is a non-missing value} \end{cases} \]

Then the total number of coefficients in the model, excluding the intercept, is:

\[ p = l + \sum_{j=1}^{m} R_j \]

The multiple linear regression model for this study, in general, is:

\[ Y = X\beta + \varepsilon \tag{3.1.2} \]

where Y and \varepsilon are (n_k x 1) vectors:

\[ Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_{n_k} \end{pmatrix} \tag{3.1.3} \]

\[ \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_{n_k} \end{pmatrix} \tag{3.1.4} \]

n_k is the total number of claims for model k.
\beta is defined to be the parameter vector of size (p + 1) x 1, where p is the number of coefficients:

\[ \beta = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_p \end{pmatrix} \tag{3.1.5} \]

X is defined to be an n_k x (p + 1) matrix:

\[ X = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1p} \\ 1 & x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n_k 1} & x_{n_k 2} & \cdots & x_{n_k p} \end{pmatrix} \tag{3.1.6} \]

Matrix X contains the observed values for the continuous independent variables, and 0 or 1 for the categorical independent variables. Assuming that each \varepsilon_i is normally distributed, \varepsilon can be written as \varepsilon ~ N(0, \sigma^2 I_{n_k}), where I_{n_k} is the (n_k x n_k) identity matrix.

In order to determine the parameters of the models, parameter estimates are chosen to minimize a quantity called the residual sum of squares (RSS):

\[ RSS(\beta) = \sum_{i=1}^{n_k} (y_i - x_i^T \beta)^2 = (Y - X\beta)^T (Y - X\beta) \tag{3.1.7} \]

where x_i^T is the ith row of X, i = 1, ..., n_k.

The least squares estimate, \hat{\beta}, can be determined by differentiating (3.1.7) with respect to each \beta_j and setting the derivatives to zero, which gives:

\[ \hat{\beta} = (X^T X)^{-1} X^T Y \tag{3.1.8} \]

where \hat{\beta} is a (p + 1) x 1 vector:

\[ \hat{\beta} = \begin{pmatrix} \hat{\beta}_0 \\ \vdots \\ \hat{\beta}_p \end{pmatrix} \tag{3.1.9} \]

Mean square error is determined as:

\[ MSE = \frac{(Y - X\hat{\beta})^T (Y - X\hat{\beta})}{n_k - (p + 1)} \tag{3.1.10} \]

We applied the backward stepwise elimination method to reduce models. When we built predictive models with multiple regression analysis, we first included all the factors we had decided to use, giving a full model. Then we removed the factors one at a time based on their rank of importance, determined in Section 3.2, and developed each reduced model with multiple regression and cross validation. In general, when the backward stepwise elimination method is used to develop models, factors which are not significant are removed first. However, in this study we used the rank of important factors because we had many categorical variables as factors, and the number of categories in each factor was large. As a result, it would take a lot of work to remove each non-significant category of the factors of interest. In addition, because our model did not take any interactions into account, we could not determine whether the categories were non-significant due to the interactions.
Therefore, we decided that it would be reasonable to remove a factor as a whole based on the rank of important factors rather than to remove individual categories. For example, even if several months in the factor month of injury are not significant enough to be part of a model, we would not remove these months individually. Instead, we would remove the factor month of injury as a whole, after all of the factors ranked lower than month of injury had been removed, and compare the resulting model with the more saturated model.

3.1.1 Cross Validation

Cross validation was applied to some of the analyses. It was used to study model performance and to choose the best model. The data were divided into five groups randomly, and the records of the STD days of claims belonging to group 1 were removed. Then the records of the STD days of claims belonging to the rest of the groups were used to develop a regression model, and the STD days of claims belonging to group 1 were predicted. The process was repeated for all five groups. After all the claims had predicted values of STD days, the Mean Square Error for cross validation (MSE_CV) and the Mean Absolute Error for cross validation (MAE_CV) were determined using the following equations:

\[ MSE\_CV = \frac{1}{n} \sum_{j=1}^{5} \sum_{i=1}^{n_j} (actualSTD_{ij} - predictedSTD_{ij})^2 \tag{3.1.11} \]

\[ MAE\_CV = \frac{1}{n} \sum_{j=1}^{5} \sum_{i=1}^{n_j} \left| actualSTD_{ij} - predictedSTD_{ij} \right| \tag{3.1.12} \]

where n_j is the number of claims in group j and n is the total number of claims.

The best regression model is defined as the one which had the smallest MSE_CV and MAE_CV. When MSE_CV and MAE_CV did not agree, we used MSE_CV as the measurement to define the best regression model because MSE_CV penalizes large deviations between actual and predicted STD days more than MAE_CV does.

3.2 Determining Factors that May Explain Variability in STD Days

Factors which have an influence on STD days were analyzed and ranked based on their influential power as determined in this study. The 18 factors in Table 3.2.1 were analyzed in this study.
Seventeen of the factors below were suggested by WCB as potentially important factors. In addition, we created a factor called "adjusted multiple injuries". In order for a claim to be in the database, the claimant has to have at least one injury. Therefore, we assumed that all the claims that had missing values for the multiple injuries factor had one injury, and named the resulting factor "adjusted multiple injuries". We also studied the impact of this factor on STD days paid in this analysis. The description of the factors can be found in Appendix A.

Table 3.2.1 Factors Studied for Important Factors

Factor                                              Type of Factor   Number of Categories (Q_j)
ICD9 cluster                                        Categorical      11
Occupation type cluster                             Categorical      13
Industry type cluster                               Categorical      11
Nature of injury type cluster                       Categorical      9
Accident type cluster                               Categorical      11
Body part cluster                                   Categorical      14
Source of injury cluster                            Categorical      10
Age                                                 Categorical      6
Gender                                              Categorical      2
Area office                                         Categorical      15
Injury year                                         Continuous       n/a
Injury month                                        Categorical      12
Injury day of a week                                Categorical      7
Multiple injuries                                   Continuous       n/a
Adjusted multiple injuries                          Categorical      n/a
Interval (injury day ~ 1st STD payment)             Continuous       n/a
Interval (1st disablement day ~ 1st STD payment)    Continuous       n/a
Interval (registration day ~ 1st STD payment)       Continuous       n/a

Multiple regression analysis was used to rank the factors. We used each potentially important factor as a predictor and developed a separate model for each of them. The dependent variable is the number of STD days paid to a claim within truncations of 6, 12, 18 and 24 months, with the registration day and the first STD payment day as starting points. The details of the truncation method and the process of determining the number of STD days paid within a particular truncation were described in Section 2.4.
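A minimal sketch of how one single-factor model might be scored with five-fold cross validation is given below. It relies on the fact that a least squares model with an intercept and one categorical predictor predicts the training mean of each category. The toy data, the fallback to the overall mean for categories unseen in training, the random seed, and dividing the summed errors by the number of claims are assumptions of this sketch, not details taken from the study.

```python
import random
from collections import defaultdict


def cross_validate_single_factor(levels, std_days, n_folds=5, seed=0):
    """Return (MSE_CV, MAE_CV) for a model with one categorical predictor."""
    n = len(std_days)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    fold_of = {idx: pos % n_folds for pos, idx in enumerate(order)}

    predictions = [0.0] * n
    for fold in range(n_folds):
        train = [i for i in range(n) if fold_of[i] != fold]
        sums, counts = defaultdict(float), defaultdict(int)
        for i in train:
            sums[levels[i]] += std_days[i]
            counts[levels[i]] += 1
        overall_mean = sum(std_days[i] for i in train) / len(train)
        for i in range(n):
            if fold_of[i] == fold:
                level = levels[i]
                # Least squares fit with one categorical predictor = category mean.
                if counts[level]:
                    predictions[i] = sums[level] / counts[level]
                else:
                    predictions[i] = overall_mean  # assumed fallback

    errors = [a - p for a, p in zip(std_days, predictions)]
    mse_cv = sum(e * e for e in errors) / n
    mae_cv = sum(abs(e) for e in errors) / n
    return mse_cv, mae_cv


# Hypothetical data: severity drives STD days, the second factor does not.
severity = ["minor" if i % 2 == 0 else "major" for i in range(40)]
other = ["a" if (i // 2) % 2 == 0 else "b" for i in range(40)]
days = [(5 if s == "minor" else 50) + i % 3 for i, s in enumerate(severity)]

mse_severity, mae_severity = cross_validate_single_factor(severity, days)
mse_other, mae_other = cross_validate_single_factor(other, days)
print(mse_severity, mse_other)
```

A factor with more influence on STD days yields the smaller cross-validated error, which is the basis on which factors can be ranked.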
Claims with injury dates between January 1st, 1997 and December 31st, 2001 were used for the truncations of 6 and 12 months, and those with injury dates between January 1st, 1997 and December 31st, 2000 were used for the truncations of 18 and 24 months in this analysis.

If a factor has a greater influence on STD days paid, the model with that factor will estimate the STD days paid more accurately than the model with a factor whose influential power is smaller. In other words, the MSE of a model of a factor with a strong influence on STD days will be smaller than that of a factor with a weak influence on STD days. Cross validation, described in Section 3.1.1, was applied, and the resulting MSE_CV (3.1.11) and MAE_CV (3.1.12) were compared to determine which factors were more influential.

Because we used different claims for the truncations of 6 and 12 months and the truncations of 18 and 24 months, we needed to standardize MSE_CV and MAE_CV to compare the performance of factors across the different lengths of truncation. Therefore, after the process above, relative MSE_CV (R_MSE_CV) and relative MAE_CV (R_MAE_CV) were determined in the following way, and the potentially important factors were ranked based on them:

\[ R\_MSE\_CV_i = \frac{MSE\_CV_i}{\sum_{t=1}^{s} MSE\_CV_t} \tag{3.1.13} \]

\[ R\_MAE\_CV_i = \frac{MAE\_CV_i}{\sum_{t=1}^{s} MAE\_CV_t} \tag{3.1.14} \]

where s is the total number of regression models built for the analysis. In this case, s = 19 (models for 18 different factors and one model with only an intercept).

3.3 Models for STD Days Paid Based on Regression Analysis

The following two types of models were developed with regression analysis.

Model 1: Model for Total STD Days Paid Within Truncation Period at the Claim Level
Model 2: Model for Total STD Days for Open Claims at the Claim Level, Area Office Level, and WCB Level.

3.3.1 Model for Total STD Days Paid Within Truncation Period at the Claim Level

This model was developed to compare STD days between different claim types.
For example, this model could be used to assess whether forestry industry workers would be likely to have more STD payments than fishery industry workers. Multiple regression analysis and cross validation were again used to develop the models. The response variable was STD days paid for a claim within a truncation period of 6, 12, 18, and 24 months with the first STD payment as the starting point. The factors included as independent variables in the models are shown in Table 3.3.1.1. Baselines for the categorical variables are shown in Table 3.1.1.

Table 3.3.1.1 Factors Included in Model for Total STD Days Paid Within Truncation Period at the Claim Level

Factor                                     Type of Factor   Number of Categories (Q_j)
ICD9 cluster                               Categorical      11
Occupation type cluster                    Categorical      13
Industry type cluster                      Categorical      11
Age                                        Categorical      6
Gender                                     Categorical      2
Area office                                Categorical      15
Injury year: (injury year - 2000) * 10     Continuous       n/a
Injury month                               Categorical      12
Injury day of a week                       Categorical      7

A model including all the factors above was run, and the estimated coefficients of the factors were determined. The backward stepwise elimination method was applied to determine the best model. A full model was first built with the method described in Section 3.1, and then cross validation, described in Section 3.1.1, was applied to measure the model performance.

3.3.2 Model to Predict Total STD Days for Open Claims at the Claim Level, Area Office Level, and WCB Level

For WCB to budget for the coming year, it is necessary to know how many STD days will occur in that year. There are three types of claims for which STD days have to be predicted for the coming year.

Type 1: New claims which will open in the coming year
Type 2: Existing active claims in the current year which will stay open in the coming year
Type 3: Existing inactive claims in the current year which will reopen in the coming year.

We focused on type 2 claims and developed a model to predict total STD days for such claims.
First, we developed a model to predict the total STD days that will be paid to each type 2 claim. Then we determined which area office each claim belonged to and added up the predicted STD days of the claims belonging to each area office (area office level prediction). Finally, we aggregated all the claims to predict total STD days for the coming year at the WCB level. Figure 3.3.2.1 shows the process of the prediction in this section.

Figure 3.3.2.1: Methodology to Predict Total STD Days for Open Claims at Area Offices and WCB Level
[Figure: hierarchy in which claim-level predictions are aggregated to area offices, and area offices are aggregated to the WCB level.]

Three different time intervals were studied, as shown in Figure 3.3.2.2.

1) The prediction of total STD days of claims which were open on June 30th, 2002 for the next 6 months, between July 1st, 2002 and December 31st, 2002. The total STD days of claims in time interval C were predicted.
2) The prediction of 2003 total STD days of claims which were open at the end of December, 2002. This interval is shown as D in Figure 3.3.2.2.
3) The prediction of total STD days of claims which were open on June 30th, 2002 for the next 18 months, between July 1st, 2002 and December 31st, 2003. In other words, the total STD days of claims in the time interval C+D were predicted.

Figure 3.3.2.2 The Prediction Methodology of the Model
[Figure: timeline divided into four intervals. A: 07/01/2001 to 01/01/2002; B: 01/01/2002 to 07/01/2002; C: 07/01/2002 to 01/01/2003; D: 01/01/2003 to 01/01/2004.]

Originally, we developed a model for the scenario described in 1) above, and we later developed models for 2) and 3) as requested by WCB. Multiple linear regression analysis was used to develop the models. The eleven factors included as independent variables in the model are shown in Table 3.3.2.1. We used claims with an injury date between January 1st, 1997 and December 31st, 2001 to develop the models.

The dependent variables for scenario 1 were the total number of STD days paid for a claim for 6 months starting June 30th of 1998, 1999, 2000, 2001 and 2002.
The dependent variables for scenario 2 were the total number of STD days paid for a claim for 12 months starting January 1st of 1998, 1999, 2000, and 2001. The dependent variables for scenario 3 were the total number of STD days paid for a claim for 18 months starting June 30th of 1998, 1999, 2000, and 2001.

We did not include claims which were open at the end of June or December 1997 in the data because we suspected that fewer claims were open at the end of those two months than in other years. This is because the claims open at the end of June and December 1997 were only those whose injury year was 1997, whereas the claims open at the end of June and December in 1998 and later were 1) claims opened in the particular year (for example, in the case of 1998, those opened in 1998), and 2) claims opened before the particular year (for example, in the case of 1998, those opened in 1997).

STD days paid in the previous 12 months is a categorical factor. There are 13 categories in this factor: 0 to 30 STD days paid in the previous 12 months, 30 to 60 STD days paid in the previous 12 months, and so on, up to 360 days. We set this factor as a categorical variable because its relationship with the dependent variable, the STD days paid in the coming year, was assumed to be non-linear. If we set this factor as a continuous variable, we would have to assume that STD days paid in the previous 12 months and STD days to be paid in the coming year have a linear relationship. By setting the factor as categorical, we are able to capture the non-linear relationship between the dependent variable and this factor.

Multiple injuries is used as a factor for this model. The majority of missing values for multiple injuries were in 1997. Although some claims with an injury date in 1997 were open at the times of interest mentioned earlier, we assumed that such claims were scarce.
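The categorical coding of STD days paid in the previous 12 months described above can be sketched as a simple binning function. The text gives a bin width of 30 days, 13 categories, and a baseline of 30 days or less; treating each bin as including its upper bound, and using the thirteenth category for anything above 360 days, are assumptions made here for illustration only.

```python
def previous_std_category(days_paid, width=30, n_categories=13):
    """Map STD days paid in the previous 12 months to a category index.

    Category 1 covers 0 to 30 days (the baseline in Table 3.1.1),
    category 2 covers 31 to 60 days, and so on. The last category is
    assumed to collect every value above (n_categories - 1) * width days.
    """
    if days_paid < 0:
        raise ValueError("days_paid must be non-negative")
    index = max(1, (days_paid - 1) // width + 1)
    return min(index, n_categories)


# Hypothetical values of STD days paid in the previous 12 months.
for days in (0, 30, 31, 200, 360, 365):
    print(days, previous_std_category(days))
```

Each category then receives its own regression coefficient, so the model does not have to assume that the response changes by a constant amount per previous STD day.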
Table 3.3.2.1 Factors Included in Model to Predict Total STD Days for Open Claims at the Claim Level

Factor                                     Type of Factor   Number of Categories (Qj)
ICD9 cluster                               Categorical      11
Occupation type cluster                    Categorical      13
Industry type cluster                      Categorical      11
Age                                        Categorical      6
Gender                                     Categorical      2
Area office                                Categorical      15
Injury year: (injury year - 2000)*10       Continuous       n/a
Injury month                               Categorical      12
Injury day of a week                       Categorical      7
Multiple injuries                          Continuous       n/a
STD days paid in the previous 12 months    Categorical      13

In order to decide which claims were open, we checked two conditions based on the record called "active claims": first, whether the claim was open one month before the start of the time interval of interest; and second, whether the claim received any STD payments in that month. If both conditions are satisfied, the claim is considered open. Checking both conditions is necessary because some claims are recorded as still open although they do not receive any STD payments. Such claims are generally considered closed; this happens merely because the record of the claim status has not been updated for some reason.

Table 3.3.2.2 shows the number of open claims at each area office between 1997 and 2002. Claims were considered to be open at the end of either June or December of a specified year if they were open at the end of the particular month and had received an STD payment in that month. Otherwise, claims were considered to be closed. The number of open claims increases because the data used include claims whose injury dates were between January 1st, 1997 and December 31st, 2002. Therefore, for example, in 1997, the claims included were only those with an injury date between January 1st, 1997 and December 31st, 1997, whereas the claims included in 1998 were those with an injury date between January 1st, 1997 and December 31st, 1998, and so forth.
Therefore it is natural for the total number of claims to increase every year. However, the total number of claims decreased in 2001 and 2002. This indicated that fewer claims were opened in 2001 and 2002 than in 1998, 1999, and 2000.

At the end of 2002, the offices in Cranbrook and Vernon were closed and the claims which belonged to these two offices were transferred to the offices in Nelson, Kelowna or Kamloops. In addition, the Vancouver-South and Vancouver-Richmond offices were merged at the end of 2002, and the merged office is now called the Vancouver-Richmond office. In the data table obtained from WCB, some of the claims transferred at the end of 2002 already had the new area office recorded as their office, although they were originally registered at the pre-transfer office before the end of 2002; others still had their pre-transfer office as their area office. Therefore, the numbers of claims for the affected area offices (Nelson, Kelowna, Kamloops, Vancouver-South and Vancouver-Richmond) were not accurate for 2002.

Table 3.3.2.2 The Number of Claims Open at Area Offices Between 1997 and 2002

Area Office               1997          1998          1999          2000          2001          2002
                       06/30  12/31  06/30  12/31  06/30  12/31  06/30  12/31  06/30  12/31  06/30  12/31
Abbotsford               205    301    360    446    429    447    424    450    419    460    360    400
Burnaby                  374    447    588    731    747    688    645    694    699    706    520    539
Central Services          29     41     44     59     63     68     76     92     94     78     88     84
Coquitlam                296    367    429    513    557    486    430    402    409    464    404    347
Courtenay                267    339    388    440    454    543    503    539    458    448    321    292
Cranbrook                 66    146    156    195    173    185    169    175    119    115     62     10
Kamloops                 187    328    318    425    441    496    447    463    409    391    346    414
Kelowna                  197    330    411    474    470    538    461    469    380    409    346    400
Nanaimo                  260    443    408    511    442    486    456    516    424    396    331    338
Nelson                    88    130    132    181    194    198    188    196    192    235    215    243
Prince George            277    379    403    414    442    486    357    335    333    360    330    277
Surrey                   454    683    739    914    980    980    803    812    783    860    745    716
Terrace                  117    170    198    259    259    315    259    259    192    180    140    133
Van Central & North      421    616    691    801    815    811    705    759    791    730    635    597
Vancouver South          343    498    573    621    601    604    600    569    482    462    326     44
Vancouver-Richmond       488    699    740    899    925    869    855    975    974   1014    923   1176
Vernon                   106    144    156    176    197    239    209    186    151    165     81     18
Victoria                 317    433    481    617    614    658    614    653    576    606    522    576
Total                   4492   6494   7215   8676   8803   9097   8201   8544   7885   8079   6695   6604

We created a factor called status_year to indicate whether a claim was open at the end of each year. If a claim was open, for example, at the end of 1998, then status_year equals 1998. Based on the 11 independent variables and the dependent variable mentioned above, three models were developed. We then extracted the total STD days of claims in 2003 by taking all the claims open at the end of 2002. We extracted these estimates based on a status_year of 2002.
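The roll-up from claim-level predictions to the area office level and then to the WCB level amounts to two sums. The following is a minimal sketch under stated assumptions: the record layout (an area office name paired with a predicted number of STD days) and the function name are hypothetical, and the numbers in the usage example are made up for illustration rather than taken from the data.

```python
from collections import defaultdict


def aggregate_predictions(claims):
    """Sum claim-level predicted STD days by area office and overall.

    `claims` is an iterable of (area_office, predicted_std_days) pairs;
    this record layout is hypothetical, not the WCB data format.
    """
    by_office = defaultdict(float)
    for office, predicted_days in claims:
        by_office[office] += predicted_days
    # The WCB-level figure is simply the sum over all area offices.
    wcb_total = sum(by_office.values())
    return dict(by_office), wcb_total


if __name__ == "__main__":
    # Illustrative values only.
    example = [("Kamloops", 10.0), ("Nelson", 5.5), ("Kamloops", 2.5)]
    office_totals, total = aggregate_predictions(example)
    print(office_totals, total)
```

In practice the input would first be restricted to the claims with the status_year of interest, so that only claims open at the chosen date contribute to the totals.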
Table 3.3.2.3 Time Interval of Prediction for STD Days

Model       Status year   Claims Open        Prediction
12 months   2000          Dec 31st, 2000     Jan 1st, 2001 - Dec 31st, 2001
            2001          Dec 31st, 2001     Jan 1st, 2002 - Dec 31st, 2002
            2002          Dec 31st, 2002     Jan 1st, 2003 - Dec 31st, 2003
18 months   2000          June 30th, 2000    July 1st, 2000 - Dec 31st, 2001
            2001          June 30th, 2001    July 1st, 2001 - Dec 31st, 2002
            2002          June 30th, 2002    July 1st, 2002 - Dec 31st, 2003
6 months    2000          June 30th, 2000    July 1st, 2000 - Dec 31st, 2000
            2001          June 30th, 2001    July 1st, 2001 - Dec 31st, 2001
            2002          June 30th, 2002    July 1st, 2002 - Dec 31st, 2002

Validation of the model was done by comparing the estimated and actual STD days paid in 2001 and 2002. For example, in order to estimate the total STD days paid in 2002, we kept claims which were open at the end of 2001 or earlier. Then, we removed all the actual STD days paid to the claims which were open at the end of 2001, and estimated these values by using the information of the rest of the claims. After that, we compared the estimates and the actual STD days paid for 2002. The percentage difference and absolute difference were used to compare the estimated and actual STD days paid.

\[ \text{Percentage difference} = \frac{|\text{Actual STD days paid} - \text{Estimated STD days paid}|}{\text{Actual STD days paid}} \tag{3.3.2.1} \]

\[ \text{Absolute difference} = |\text{Actual STD days paid} - \text{Estimated STD days paid}| \tag{3.3.2.2} \]

The backward stepwise elimination method was applied to determine the best model.

4. Survival Analysis

4.1 Background

Survival analysis is used to measure the effect of predictors on the time until an event occurs. It has been used extensively in the analysis of medical data and in equipment failure analysis. In this study, the event is a claimant going back to work after being away from work due to an injury or disease incurred at work.
We also considered how some factors, such as occupation type and ICD9, affect the time until a claimant goes back to work.

As mentioned earlier, there are several unknown factors involved in a claim life, such as how long a claim stays open, whether it reopens, and if it reopens, how long it stays open. Survival analysis can take such uncertainties into account by allowing for censoring. A specific observation is considered censored if the event has not yet occurred at the time we stop observing it. In this project, a claim is considered censored if it is still open at the time we stop collecting information (the censoring time) about a specific employee.

4.2 Censoring

Censoring is complicated by the reopening of claims. Any claim always has a chance to reopen. Even if a claim is closed at the time of censoring, it is possible for the claim to reopen after the censoring time. Therefore, the following rule for censoring was developed to determine whether claims are closed permanently or will reopen.

In this study, the data used for the survival analysis are claims with injury dates between January 1st, 1997 and December 31st, 2002. The dataset contains the information about the claim status until April 30th, 2003. According to Section 3.3, 87% of claims never reopened if they had been closed for more than two months. We assumed that claims are closed permanently if they have not reopened within two months of the previous closure. Otherwise, currently closed claims are considered likely to reopen in the near future. Therefore, February 28th, 2003, two months before April 30th, 2003, was chosen as the censoring time for claims.

Based on the reopening assumption above, there are two types of claims. Type 1 and Type 2 claims are specified in Figure 4.1.1.
Type 1: Claims which were considered to be closed permanently (A, B)
Claims closed on or before February 28th, 2003 that did not reopen in the next two months.

Type 2: Claims which stayed open or would reopen (C, D)
Claims that stayed open after February 28th, 2003 or claims that reopened between February 28th, 2003 and April 30th, 2003.

A claim is considered to be open at the end of February, 2003 if the claim is denoted as open at the end of the month and it received an STD payment in February, 2003. There are claims which were open at the end of February, 2003 but did not receive any STD payment. These claims were considered to be closed at the end of February, 2003.

Figure 4.1.1 The Censoring Rule for Claims
[Figure: timelines of claims A, B, C, and D relative to February 28, 2003 and April 30, 2003, showing open and closed periods; claims A and B are uncensored, claims C and D are censored.]

Among 419,360 claims, about 98.9% were closed before February 28th, 2003 and therefore were not censored. The remaining 1.1% were open or reopened after February 28th, 2003 and were considered to be censored.

4.3 Regression Model for Survival Analysis

Suppose $Y_i$ is a random variable representing the duration of STD days (total STD days), and $t$ is the censoring time, February 28th, 2003, for the $i$th claimant. Additionally, let $X_{i1}, \ldots, X_{ik}$ be the values of $k$ covariates (predictors) for claimant $i$. The cumulative distribution function of $Y$ is:

\[ F_Y(y) = \mathrm{prob}(Y_i \le y) \tag{4.1} \]

where $Y_i \in (0, +\infty)$. Equation (4.1) is the probability that $Y_i$ will be less than or equal to a certain $y$. Based on (4.1), the probability of surviving beyond $y$ can be determined with a survival function:

\[ S_Y(y) = P(Y_i > y) = 1 - F_Y(y) \tag{4.2} \]

where $0 \le S_Y(y) \le 1$ and $S_Y(0) = 1$.

Given a random sample of $n$ independent observations $Y_i$, $i = 1, \ldots, n$, the probability or likelihood associated with the data is

\[ L = \prod_{i=1}^{n} [f_Y(y_i)]^{1-\delta_i} [S_Y(y_i)]^{\delta_i} \tag{4.3} \]

where $f_Y$ is the density function of $Y$ and $\delta_i$ is an indicator variable with $\delta_i = 0$ if a claim is not censored and $\delta_i = 1$ if a claim is censored. For our study, the sample size $n$ is equal to 419,360.
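The likelihood in (4.3) can be made concrete with a small sketch. The exponential distribution is used here only because its censored likelihood has a closed-form maximizer; that choice, the function names, and the example numbers are assumptions for illustration and do not represent the fitted model. The censoring indicator follows the convention in the text: 1 if the claim is censored, 0 otherwise.

```python
import math


def exponential_censored_loglik(rate, durations, censored):
    """Log of likelihood (4.3) for an exponential model with the given rate.

    durations[i] is the observed STD days y_i; censored[i] is delta_i,
    1 if the claim is censored and 0 otherwise (the convention in the text).
    """
    loglik = 0.0
    for y, delta in zip(durations, censored):
        log_density = math.log(rate) - rate * y   # log f_Y(y)
        log_survival = -rate * y                  # log S_Y(y)
        # Uncensored claims contribute the density, censored ones the survival.
        loglik += (1 - delta) * log_density + delta * log_survival
    return loglik


def exponential_censored_mle(durations, censored):
    """Closed-form maximizer: number of uncensored claims over total days."""
    n_uncensored = sum(1 - d for d in censored)
    return n_uncensored / sum(durations)


if __name__ == "__main__":
    # Illustrative values only: four claims, the third one censored.
    days = [10, 20, 30, 40]
    delta = [0, 0, 1, 0]
    rate_hat = exponential_censored_mle(days, delta)
    print(rate_hat, exponential_censored_loglik(rate_hat, days, delta))
```

A censored claim still contributes information: its observed days enter the denominator of the estimate, which is what keeps long, still-open claims from being ignored.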
In many situations, it may be convenient to assume that the function $f_Y(y)$ belongs to a known parametric family of density functions, indexed by an unknown parameter vector $\theta$. The survival function $S_Y(y)$ then also has a known form, indexed by the same parameter $\theta$. For example, $f_Y(\cdot)$ can be a log-normal density with unknown $\theta = (\mu, \eta)$, or it can be a negative exponential density with unknown parameter $\theta = \lambda$. Note that in the first case we have two unknown parameters, while in the second we have only one. Given the parametric representation $f_Y(y) = f_Y(y; \theta)$, we can rewrite equation (4.3) as follows:

\[ L(\theta) = \prod_{i=1}^{n} [f_Y(y_i; \theta)]^{1-\delta_i} [S_Y(y_i; \theta)]^{\delta_i} \tag{4.4} \]

By maximizing the log-likelihood function, $\log L(\theta)$, where $L(\theta)$ is as in (4.4), we can obtain estimates for the unknown parameter vector $\theta$ and perform statistical inference.

Survival analysis will fit the distribution function that yields an interpretable and reasonable estimate of the effect of the factors of interest on total STD days for claimants, based on (4.3) and the method of maximum likelihood estimation. Therefore, we first had to determine a suitable distribution for our survival model. We developed survival models without any independent factors to see which distribution would likely be the best. After determining the suitable distribution, we developed the model to estimate STD days with some independent variables.

We developed the models with log-normal, exponential, Weibull, and gamma distributions. The best fitted model was determined based on two measurements, the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), and then compared with the actual additional STD days. AIC and BIC are defined by:

\[ \mathrm{AIC} = -2 \log \text{likelihood} + 2K \tag{4.5} \]

\[ \mathrm{BIC} = -2 \log \text{likelihood} + K \log T \tag{4.6} \]

where $K$ is the number of parameters in a model and $T$ is the total number of individuals (observations) in the sample used to build the model.

The criteria AIC and BIC can be used when we compare models with different numbers of parameters because they incorporate a penalty for reducing degrees of freedom while still revealing how well the models fit. If model A contains more parameters than model B, then $K_a$, the number of parameters in model A, is larger than $K_b$, and as a result $2K_a$ will be larger than $2K_b$ for AIC. Similarly, $K_a \log T$ will be larger than $K_b \log T$ for BIC. A model is considered to be better if its AIC and BIC are smaller. The more coefficients a model has, the larger $2K$ or $K \log T$ is, and the smaller the degrees of freedom of the model. Therefore, by taking into account the number of coefficients in a model, AIC and BIC penalize the reduction of degrees of freedom. Refer to Griffiths (1993) and Greene (1997) for more detail.

Along with AIC and BIC, we used the actual additional STD days to decide the proper distribution for the survival model. The actual additional STD days were the means of the additional and total STD days actually paid to a claimant who has already been away from work for a certain number of days. We compared the actual additional STD days with the estimated additional STD days from the models based on the four distributions mentioned earlier. Claims with injury dates between January 1st, 1997 and December 31st, 2000 were used to determine the means.

First, we determined the number of additional days after a claim has received 5, 10, 15, and so on, up to 180 days by subtracting the number of days a claim has already received from the total STD days for the claim listed in the database. If the result of the subtraction was negative, we did not count it. For example, if the total STD days for a claim was 10 days, then the claim received an additional 5 days after it received 5 days.
However, there are no more additional days once a claim has received 10 STD days. After determining the additional days in the manner above, we took the average of the additional STD days for claims after receiving 5 days, 10 days, and so on. We then determined the total number of STD days received by claims after a claimant was away from work for a certain number of days (5, 10, 15, and so on up to 180) by adding that number of days to the average additional STD days determined above.

After determining the best distribution for the survival model, we then developed a model with the independent variables. The independent variables used in this analysis are shown in Table 4.2.1 below.

Table 4.2.1 Independent Variables in Survival Analysis

Factor                                   Type of Factor   Number of Categories
ICD9 cluster                             Categorical      11
Occupation type cluster                  Categorical      13
Industry type cluster                    Categorical      11
Age                                      Categorical      6
Gender                                   Categorical      2
Area office                              Categorical      15
Injury year: (injury year - 2000)*10     Continuous       n/a
Injury month                             Categorical      12
Injury day of a week                     Categorical      7

4.4 Total and Additional STD Days Paid to a Claim

After determining the best model with a certain distribution based on the methodology discussed in Section 4.2, the conditional expectation, that is, the number of additional STD days given that a claimant has been away from work for more than $x$ days, is determined in the following way. First, the probability that the total STD days paid to claim $i$ is larger than $y$, given that claim $i$ has already received more than $x$ STD days, is the following (for $y \ge x$):

\[ \mathrm{prob}[Y_i > y \mid Y_i > x] = \frac{\mathrm{prob}[Y_i > y,\, Y_i > x]}{\mathrm{prob}[Y_i > x]} = \frac{\mathrm{prob}[Y_i > y]}{\mathrm{prob}[Y_i > x]} = \frac{1 - F_Y(y)}{1 - F_Y(x)} \tag{4.7} \]

The probability that the total number of STD days paid is less than or equal to $y$, given that the claimant received more than $x$ STD days, can be determined with the following equation based on (4.7).
\[ \mathrm{prob}[Y_i \le y \mid Y_i > x] = 1 - \frac{1 - F_Y(y)}{1 - F_Y(x)} \tag{4.8} \]

Let $W_i$ be the additional STD days that the claimant gets given that the claimant received more than $x$ STD days. In other words, $W_i = Y_i - x$ where $W_i > 0$. Based on (4.8), the distribution function for the additional STD days paid given that the claimant received more than $x$ STD days is:

\[ F_W(w) = 1 - \frac{1 - F_Y(w + x)}{1 - F_Y(x)} \tag{4.9} \]

The probability density function (pdf) of (4.9) is:

\[ f_W(w) = \frac{\partial F_W(w)}{\partial w} = \frac{f_Y(w + x)}{1 - F_Y(x)} \tag{4.10} \]

The expected value of the additional STD days paid given that the claimant has already received more than $x$ STD days is:

\[ E(W) = \int_0^{\infty} w\, f_W(w)\, dw \tag{4.11} \]

The expected value of the total STD days given that the claimant has been away from work for more than $x$ STD days is determined by adding $x$ to (4.11).

5. Results

5.1 Determining Factors that May Explain Variation in STD Days

The influential power of 18 factors was studied and ranked according to their predictive power. It is assumed that if a factor has better predictive power on total truncated STD days than other factors, then the factor is more influential on overall STD days paid. The method of ranking influential factors is explained in Section 3.2. Tables D.1 to D.8 in Appendix D show MSE_CV, root MSE_CV, relative MSE_CV, MAE_CV, R-squared, and the rank of each factor with different truncation periods. The smaller the relative MSE_CV and relative MAE_CV are, the better the predictive power. On the other hand, the larger the R-squared is, the better the predictive power.

The rankings of factors for truncations of 6, 12, 18, and 24 months, with the first STD payment as the starting point and with the registration day as the starting point, are shown in Figure 5.1.1 and Figure 5.1.2, respectively. Both figures show similar results. ICD9 cluster has the best predictive power under every truncation methodology, followed by nature of injuries. The figures clearly show that ICD9 cluster is a much better predictor than the other predictors.
The rest of the factors have very similar predictive power. The ranking of some factors changes with different truncation methodologies. ICD9 cluster and nature of injuries cluster show a clear increase in relative MSE_CV. Further, ICD9, nature of injury, body part, source of injury, and accident type are all injury related factors. All of these injury or disease related factors had more impact on STD days paid than the other types of factors. In addition, industry type cluster and occupation type cluster predict STD days well, following the injury related factors.

Besides the injury related factors, the adjusted multiple injuries factor showed a strong impact on STD days. Some of the claims did not have multiple injuries information and, as mentioned earlier, we created the adjusted multiple injuries factor by filling in one injury for the claims whose multiple injuries factor was missing, assuming that all claims have at least one injury in order to be on record at WCB. Based on the model with the first STD payment date as the starting point, the rank for the non-adjusted (raw) multiple injuries factor is 12th for truncations of 6 and 12 months, and 19th for truncations of 18 and 24 months. On the other hand, the adjusted multiple injuries factor is the fourth best predictor for the 6 months truncation and the third best predictor for the other truncation periods. Therefore, if there were fewer missing values in multiple injuries, this factor would have been a good predictor for STD days.

The timing of injuries or diseases, such as the injury month, injury year, and injury day of the week, shows relatively small influence on STD days paid. The time intervals between the injury day, disablement day, or registration day and the first STD payment day also did not have much of an impact on STD days paid. Note that the constant (intercept-only) model was not the worst in some cases. This was because we were comparing models on the basis of cross validation measures.
Overall, the magnitudes of influence of the majority of factors on STD days paid were very similar. We made the following important decisions based on this analysis.

First of all, since many clusters represent some type of injury or disease related factor, they are highly correlated with each other. Including correlated factors in a predictive model would make the model unstable. Therefore, one factor had to be chosen among the injury and disease related factors. As mentioned earlier, ICD9 showed the most influence on STD days. Therefore, ICD9 was chosen to represent the injury and disease of claimants in this study.

Secondly, although it was a very good predictor, the adjusted multiple injuries factor was not used in most of the predictive models. As shown in Table B.1 in Appendix B, the multiple injuries factor had many more missing values in claims whose injury date was in 1997 than in claims whose injury date was in any other year. Most of the models we built included the claims with an injury date in 1997, and we were worried that including such unbalanced data might distort the outcome. However, we included this factor in the model to predict total STD days in the coming year for claims which are open at the end of the current year, because this model was built based on claims which were active between 1998 and 2001.

Finally, we did not include in any models the three intervals studied in this section: disablement day to the first STD payment, registration day to the first STD payment, and injury day to the first STD payment. WCB suggested that these intervals change as the claims processing system within WCB changes. Therefore, these intervals would not necessarily represent their system, and their effect might not have been shown accurately by the models.

Figure 5.1.1. Relative MSE_CV (the First STD Payment as the Starting Point of Truncations)
* The lower the values, the better the prediction
[Figure: "Relative Mean Square Error for Explanatory Variables with the 1st STD Payment Day as the Starting Point". Horizontal axis: truncation period (6, 12, 18, and 24 months). Vertical axis: relative mean square error, from 0.045 to 0.055. One line per model: Intercept Only; Injury Year (Rescaled); Injury Month; Injury Weekday; Gender; Area Office; Multiple Injuries; Disablement-Payment Interval; Registration-Payment Interval; Occupation Type - Cluster; Age; Injury-Payment Interval; Source of Injury - Cluster; Industry Type - Cluster; Accident Type - Cluster; Body Part Type - Cluster; Multiple Injuries - Adjusted; Nature of Injury - Cluster; ICD9 Code - Cluster.]

Figure 5.1.2. Relative MSE_CV (the Registration Day as the Starting Point of Truncations)
* The lower the values, the better the prediction
[Figure: "Relative Mean Squared Error for Explanatory Variables with Registration Day as the Starting Point of Truncations". Horizontal axis: truncation period (6, 12, 18, and 24 months). Vertical axis: relative mean squared error, from 0.042 to 0.056. One line per model: Intercept Only; Injury Year (Rescaled); Injury Month; Injury-Payment Interval; Disablement-Payment Interval; Injury Weekday; Gender; Area Office; Registration-Payment Interval; Multiple Injuries; Occupation Type - Cluster; Age; Industry Type - Cluster; Source of Injury - Cluster; Body Part Type - Cluster; Accident Type - Cluster; Multiple Injuries - Adjusted; Nature of Injury - Cluster; ICD9 Code - Cluster.]

5.2 Model for Total STD Days Paid Within Truncation Period at the Claim Level

The model provides an estimate of total truncated STD days paid in 6, 12, 18, and 24 months for a claim with certain characteristics. Originally we were considering studying the truncation with both the registration day and the first STD payment day.
The factors known at the point of registration of a claim and at the first STD payment day of a claim are shown in Table 5.2.1.

Table 5.2.1 Factors which are Known at the Time of Registration or the First STD Payment

Factor                       Registration Day   The 1st STD Payment Day
ICD9 Cluster                                    X
Occupational Type Cluster                       X
Industry Type Cluster                           X
Age                          X                  X
Gender                       X                  X
Area Office                                     X
Injury Year                  X                  X
Injury Month                 X                  X
Injury Day of a Week         X                  X

Notice that not many factors are known at the time of registration of a claim. More importantly, ICD9, which was determined to be the most influential factor on STD days, is not known at the time of registration of a claim. Therefore, the registration day was not used as a starting point, and the first STD payment day was the only starting point of interest. In addition, the first STD payment is used as the starting point of truncations because, as was indicated in Section 2.4, the truncation with the first STD payment day contains more claims with longer STD days paid.

The model was developed based on the method discussed in Sections 3.1 and 3.1.1. The MSE_CV, MAE_CV, and R-squared for this analysis are shown in Tables E.1 and E.2 in Appendix E. The factors included in the final models for all truncation periods were the same as those in Table 3.3.1.1. The coefficients of the factors for all truncation periods are shown in Table E.3 in Appendix E.

Table 5.2.2 shows the R-squared, standard deviation and mean of the final models for each truncation period. The model with a shorter truncation period explains more variability in the data. The longer the truncation period, the more outliers were included, and as a result, the standard deviations were much larger than those with a shorter truncation period. Overall, the model with the 6 months truncation period was the best in terms of prediction accuracy, but it did not include the claims with many STD days, which cost WCB more.
Table 5.2.2 R-squared, Root MSE, and Mean for the Models in Each Truncation Period

Truncation   6 Months   12 Months   18 Months   24 Months
R-Squared    19.18      17.78       16.01       15.12
Root MSE     42.94      58.94       71.06       78.46
Mean         30.69      35.51       37.74       38.91

Based on this model, assuming all other factors are controlled, the estimated additional STD days paid for each level of a factor compared to the baseline level can be determined. The estimated additional STD days give some idea of the relation among the levels of a factor. The following is an example of the description of additional STD days for each factor in the model with the 12 months truncation. The additional STD days of factors in the 6, 18, and 24 months truncations are shown in Tables E.3.4 to E.3.11 and Figure E.3.2 in Appendix E.

1) ICD9

Table 5.2.3 shows the additional STD days for each ICD9 cluster compared to the baseline. Notice that the range of additional STD days is very large. The top two clusters receive an average of more than 100 days more than the baseline cluster. In addition, half of the ICD9 clusters receive more than 50 additional days. Therefore, there are some injuries which are much more severe than others, and workers have to be away from work for a long period of time.

Table 5.2.3 The Estimated Additional STD Days for ICD9 Cluster

Cluster   ICD9 Cluster (e.g.)                                    Additional STD days
11        Intervertebral Disc Disorder, Injury To Spinal Cord    156.12
5         Displacement of Thoracic/Lumbar Disc                   118.70
9         Fracture Of Ankle, Closed                              92.70
8         Dislocation Of Knee                                    82.62
4         Carpal Tunnel Syndrome                                 61.35
10        Strain Rotator Cuff, Shoulder & Upper Arm              54.38
6         Fracture Of Clavicle/Collar Bone, Closed               45.57
3         Strain Shoulder & Upper Arm                            25.81
7         Concussion                                             23.70
2         Lower Back Strain                                      16.35
1         Open Wound Of Finger(s)/Thumb(s)/Nail(s)               0.00

2) Industry Type Code

Table 5.2.4 shows the additional STD days for each Industry cluster relative to the baseline, cluster 1. Examples of industry types in cluster 1 are restaurants and the public sector.
Compared to people working in the Restaurant/Public Sector, those working in the Farming industry have close to 50 more STD days, followed by those in Coal Mining, who take about 46 additional STD days. Overall, there is a relatively small difference in the additional STD days among different industry types compared to ICD9.

Table 5.2.4 The Estimated Additional STD Days for Industry Cluster

Cluster   Industry Cluster (e.g.)     Additional STD Days
11        Farming                     48.55
12        Coal Mining                 46.43
10        Natural Resource            44.79
9         Fishing                     38.08
8         Pipeline Construction       29.15
7         Road Construction/Mining    20.67
5         Forest Management           17.58
4         Trucking                    16.49
3         Building Construction       11.31
6         Manufacturing               8.68
2         Private Service             4.93
1         Restaurant/Public Sector    0.00

3) Occupation Type Code

Table 5.2.5 shows the additional STD days for each Occupation Type cluster compared to the baseline. Examples of the baseline occupation type are teachers and system analysts. Compared to the ICD9 and Industry clusters, the occupation type clusters do not have a large range of additional STD days. For example, the largest additional STD days were received by those whose occupations were related to logging and forestry. They received an additional 23 STD days compared to the baseline occupations. The majority of occupation clusters receive up to 10 additional STD days compared to the baseline.

Table 5.2.5 The Estimated Additional STD Days for Occupation Type Cluster

Cluster   Occupation Cluster                                              Additional STD days
8         Logging and Forestry                                            23.11
12        Fishing and Trapping Occupations/Natural Resources Occupations  10.62
6         Contractors                                                     8.76
5         Forestry Workers                                                8.24
4         Drivers/Nurses                                                  7.35
7         Heavy Machinery                                                 6.35
10        Machine Operators                                               4.64
9         Labourers in Manufacturing                                      3.55
3         Cleaners/Carpenters                                             2.56
1         Food Services                                                   1.04
11        Technical                                                       0.21
9         Teachers/System Analysts                                        0.00

4) Area Office

Table 5.2.6 indicates the estimated additional STD days for each Area Office. The baseline was Vernon.
Kamloops has the most additional STD days, followed by Courtenay and Central Service. As mentioned earlier, the offices in Cranbrook and Vernon were closed and Vancouver South and Vancouver-Richmond were merged in 2002. In the database from WCB, some of the claims were assigned to the new area office, but others still had their original area offices.

Table 5.2.6 The Estimated Additional STD Days for Area Offices

Area Office           Additional STD Days
Kamloops              9.16
Courtenay             7.58
Central Service       6.60
Vancouver Richmond    5.14
Nelson                4.40
Nanaimo               4.20
Surrey                4.19
Vancouver Central     4.03
Burnaby               4.01
Terrace               3.71
Kelowna               3.65
Prince George         3.44
Victoria              2.85
Abbotsford            2.82
Coquitlam             2.69
Vancouver South       2.59
Cranbrook             2.11
Vernon                0.00

5) Weekday of Injury

Table 5.2.7 shows the estimated additional STD days for the weekday on which the injury occurred. Injuries that occurred on Saturday have the most additional STD days compared to the baseline, Tuesday. Overall, if an injury or disease occurred around the weekend, the claim tends to have more additional STD days.

Table 5.2.7 The Estimated Additional STD Days for Weekday

Weekday      Additional STD days
Saturday     4.52
Sunday       3.62
Friday       3.20
Thursday     1.56
Wednesday    1.06
Monday       0.27
Tuesday      0.00

6) Injury Month

Table 5.2.8 indicates the estimated additional STD days for the month in which the injury occurred. The baseline is February. STD days increase in summer and fall, and are relatively low in early spring.

Table 5.2.8 The Estimated Additional STD Days for Month

Month        Additional STD days
October      1.69
September    1.24
November     1.01
December     0.95
April        0.94
July         0.80
August       0.62
June         0.55
January      0.45
May          0.32
March        0.30
February     0.00

7) Age

Figure 5.2.9 shows the estimated additional STD days for each age group. The baseline is claimants who are 20 years old or under. The additional STD days increase as the age of claimants increases.
Figure 5.2.9 The Estimated Additional STD Days for Age
[Figure: "Additional STD days with Age, 12 months truncation". Horizontal axis: age group (20 and under, 21-30, 31-40, 41-50, 51-60, 60 and above). Vertical axis: additional STD days, from 0 to 30.]

8) Gender

Females have, on average, five more STD days than males.

9) Injury Year

For every additional year, there are about 0.4 additional STD days.

The advantage of this analysis is that it gives us a precise estimate of level effects within the factors. For example, the model helped us understand which area office has claimants with more STD days. The disadvantage is that, because the model did not include interaction terms between factors, it cannot explain why STD days differ within a factor, although the differences themselves can be studied. For example, some area offices receive more STD days than the rest, and the model does not explain why this is the case. It may be because of the efficiency of the people at the office, the type of industry, or the demographics of the people the office is serving.

5.3 Model for Total STD Days Paid for Open Claims at the Claim Level, Area Office Level, and WCB Level

As mentioned earlier, our focus in this analysis was claims which stay open through the end of the current year. We developed three models to predict the total STD days to be paid for these types of claims in the coming year at the claim level, area office level and WCB level.

Model 1: To predict total STD days of claims which were open on June 30th, 2002 for the next 6 months, between July 1st, 2002 and December 31st, 2002.
Model 2: To predict total STD days of claims which were open at the end of December, 2002 for the next 12 months, between January 1st and December 31st, 2003.
Model 3: To predict total STD days of claims which were open on June 30th, 2002 for the next 18 months, between July 1st, 2002 and December 31st, 2003.
5.3.1 Model for Total STD Days Paid in the Next Year at the Claim Level

The model is developed to predict the total STD days paid in the next year for a claim which was open at the end of the previous year. As mentioned earlier, three models for different time intervals were developed. Multiple linear regression analysis was used to develop the models. The factors included to build the models were the same as those shown in Table 3.3.2.2. As mentioned, we included multiple injuries as a factor in this study because we used claims with injury year between 1998 and 2002 to build a model. The backward stepwise elimination method was used to reduce the model. However, we did not use the cross validation mentioned in Section 3.1.1. We used MSE to determine the best model.

Although we used the rank of factors determined in Section 5.1 for most of the factors in the model, we have an additional factor, STD days in the previous 12 months, which was not a part of the rank. In order to estimate the rank of this factor, we ran a regression model with only the factor STD days in the previous 12 months as the independent variable, and STD days of open claims for 6, 12, and 18 months as dependent variables, as described in Section 3.1. Then, we determined the MSE for these three models. Next, we ran a regression model with ICD9 in the same manner and determined the MSE. Table 5.3.1.1 shows the MSE of these models. For all three intervals, the model for STD days in the previous 12 months has a smaller MSE and a larger R-squared than the model for ICD9. Therefore, we decided that the factor STD days in the previous 12 months would be the last factor to be removed in the backward stepwise elimination method in this analysis.
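The elimination procedure described in this section can be sketched in code. The following is only an illustration under stated assumptions, not the implementation used in this study: it assumes numeric factor columns (most of the real factors are categorical and would need dummy coding), a removal order supplied by the analyst with the least important factor first, and MSE defined as the residual sum of squares divided by the residual degrees of freedom. All function names, factor names, and data are hypothetical.

```python
import numpy as np

def fit_mse(X, y):
    """Fit ordinary least squares with an intercept.

    Returns SSE / (n - number of coefficients), one common definition of MSE.
    """
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid) / (n - design.shape[1])

def backward_elimination(data, y, removal_order):
    """Drop factors one at a time in a fixed order and record the MSE.

    data: dict mapping factor name -> 1-D array.
    removal_order: factor names, least important first. The last name is
    the last factor left in the model, so it is never removed.
    Returns a list of (factors kept, MSE), starting with the full model.
    """
    kept = list(removal_order)
    path = []
    while kept:
        X = np.column_stack([data[name] for name in kept])
        path.append((tuple(kept), fit_mse(X, y)))
        kept = kept[1:]  # remove the least important remaining factor
    return path

# Hypothetical data: one weak factor and one strong factor.
t = np.arange(40, dtype=float)
data = {"weekday_effect": np.sin(t), "prev_12m_days": t}
y = 2.0 * t + 3.0 * np.sin(t) + 5.0 * np.cos(1.7 * t)

path = backward_elimination(data, y, ["weekday_effect", "prev_12m_days"])
best_factors, best_mse = min(path, key=lambda item: item[1])
print(best_factors, best_mse)
```

Fixing the removal order in advance mirrors the use of the factor ranking from Section 5.1; a data-driven stepwise search would instead choose the factor to drop at each step.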
Table 5.3.1.1 MSE and R-squared for STD Days in Previous 12 Months and ICD9

                                  6 Months              12 Months             18 Months
Models                            MSE       R-squared   MSE       R-squared   MSE        R-squared
STD days in previous 12 months    2816.70   0.15        8286.37   0.12        14382.00   0.13
ICD9                              3149.64   0.05        9025.52   0.05        15684.00   0.05

After deciding the order of removing factors for the backward stepwise elimination method, we developed a model for open claims of 6, 12, and 18 months. We chose the full model as the final model for all three intervals since its MSE was the smallest of any of the models. In addition, we also thought that the full model was suitable in this situation because we were predicting the next year's STD days, rather than STD days over the next 5 or 10 years. A model with more coefficients may overfit the data. However, it also tends to work better than a reduced model when we are predicting values for the near future. It also captures the characteristics of several factors of interest. The MSE and R-squared of the three models are shown in Tables F.1, F.2, and F.3 in Appendix F. Figure F.1 in Appendix F shows the change in root MSE. The coefficients of the models for 6, 12, and 18 months are shown in Table F.4 in Appendix F.

5.3.2 Predictive Model for Total STD Days Paid in the Next Year at the Area Office

After we determined the prediction of total STD days to be paid for an individual claim, as shown in Figure 3.3.2.1, we aggregated these claims to their area office to predict total STD days to be paid at the area office level in the coming year. Table 5.3.2.1 shows the actual and predicted total STD days paid for the 6 month interval. The minimum of the percentage difference between July 1st, 2000 and December 31st, 2000 is 0.04% for the Nanaimo office, the maximum is 47.22% for Central Services, and the average is 9.74%. For July 1st, 2001 and December 31st, 2001, the minimum percentage difference is 0.12% for the Nelson office, the maximum is 46.92% for Central Services, and the average is 10.06%.
For the period between July 1st, 2002 and December 31st, 2002, the minimum percentage difference is 0.48% for the Prince George office, the maximum is 71.09% for Central Services, and the average is 13.87%. The percentage difference of Vernon is large, possibly because the Vernon office was closed and all of its claims were transferred to the Nelson, Kelowna, or Kamloops office. However, some of the claims in the data did not have a new area office name, as mentioned earlier.

[Table 5.3.2.1: actual and predicted total STD days paid by area office, 6 month interval. Table contents are not recoverable.]
[Figure: Total STD Days by area office, actual and predicted, 6 month interval.]
Table 5.3.2.2 (12 month interval). The 2001 actual, 2001 predicted, and 2002 predicted columns are illegible and are omitted; illegible cells are marked "...".

Area Office            2001 % Diff   2001 Abs Diff   2002 Actual   2002 % Diff   2002 Abs Diff   2003 Prediction
Abbotsford             26.12         11,395.40       41,633        6.76          2,815           29,780
Burnaby                19.88         13,533.62       60,039        6.76          4,058           44,208
Central Services       52.37         6,598.69        11,093        44.36         4,921           8,710
Coquitlam              14.70         5,282.79        40,980        10.02         4,107           27,610
Courtenay              0.79          466.20          41,291        25.89         10,692          28,951
Cranbrook              21.63         3,445.92        6,730         56.70         3,816           Office Closed
Kamloops               8.48          4,211.19        41,406        3.90          1,614           41,520
Kelowna                3.47          1,743.53        37,627        12.04         4,530           35,337
Nanaimo                2.94          1,425.84        38,375        5.44          2,089           30,191
Nelson                 3.56          800.81          30,405        16.23         4,934           27,434
Prince George          18.34         6,796.51        36,100        1.21          ...             25,963
Surrey                 15.05         11,977.84       75,975        1.60          1,212           59,229
Terrace                7.52          2,003.70        18,528        6.11          1,132           13,413
Van Central & North    17.58         13,420.01       66,437        2.49          1,653           50,906
Vancouver South        25.09         12,551.67       31,248        8.68          2,712           Office Closed
Vancouver-Richmond     19.38         17,108.83       99,584        16.02         15,957          99,049
Vernon                 9.89          1,736.62        9,408         46.31         4,356           Office Closed
Victoria               7.49          5,230.06        57,374        3.06          ...             50,586
[Figure 5.3.2.2: Total STD Days by area office, actual and predicted, 12 month interval.]

Table 5.3.2.2 and Figure 5.3.2.2 show predicted and actual total STD days for 2002, and predicted total STD days for 2003, at each Area Office for 12 months. Notice that the actual and predicted STD days in 2002 seem very close. The average percentage difference in 2001 is 15.24%, and 15.20% in 2002. The minimum percentage difference in 2001 is 0.79% for Courtenay, and 1.21% for Prince George in 2002. However, the maximum percentage differences for both years are very large. In 2001, Central Service has a maximum percentage difference of 52.37%, and in 2002, Vernon has a maximum percentage difference of 46.31%.

Table 5.3.2.3 and Figure 5.3.2.3 show the actual and predicted total STD days in 2002 and predicted total STD days in 2003 for an 18 month interval. The percentage difference between the prediction and the actual STD days is larger than for the 12 month model. This is because as the time interval gets larger, there are more outliers in the model. These outliers make the model unstable, and as a result, the prediction is not as accurate as the model with shorter intervals. The minimum of the percentage difference between July 1st, 2000 and December 31st, 2001 is 0.49% for the Nanaimo office, the maximum is 50.18% for Central Services, and the average is 15.45%. For July 1st, 2001 and December 31st, 2002, the minimum percentage difference is 0.36% for the Nanaimo office, the maximum is 53.39% for Central Services, and the average is 19.01%.

The difference between the prediction and the actual STD days for some offices is large for both 2001 and 2002. There are several possible reasons why these differences occurred. One of the reasons is that the distribution of total STD days to be paid is skewed.
The majority of the area offices which had a relatively large difference between the prediction and the actual STD days paid had fewer than about 150 claims in total. The claim level model estimates the mean of total STD days paid for claims with the same condition. The model tended to underestimate the total STD days paid for claims with a large number of STD days paid in the past, and to overestimate for the claims with a smaller number of STD days paid in the past. The aggregation of estimated total STD days tends to be close to the aggregation of the actual total STD days paid if there are more claims to aggregate, because the overestimates and underestimates cancel each other when they are added.

The Central Service office has the largest percentage difference in the 6, 12, and 18 month intervals. A possible reason is that the Central Service office receives specialized claims such as industrial diseases (for example, a disease caused by asbestos), and sensitive claims (for example, an injury caused by a worker being attacked at work by another worker). These cases are rare, and it is difficult to predict STD days paid based on such rare cases.

[Table 5.3.2.3: actual and predicted total STD days paid by area office, 18 month interval. Table contents are not recoverable.]
[Figure 5.3.2.3: Total STD Days by area office, actual and predicted, 18 month interval.]
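The cancellation argument in this section can be illustrated with a small calculation. This is a sketch only: the claims below are made-up numbers, the function names are hypothetical, and the percentage difference is assumed to be the absolute difference divided by the actual value. The predictions are deliberately too high for short claims and too low for long claims, as described for the claim level model.

```python
def percentage_difference(actual, predicted):
    """Absolute difference as a percentage of the actual value (assumed definition)."""
    return abs(predicted - actual) / actual * 100.0

def office_level_difference(claims):
    """claims: list of (actual_days, predicted_days) pairs for one office."""
    total_actual = sum(actual for actual, _ in claims)
    total_predicted = sum(predicted for _, predicted in claims)
    return percentage_difference(total_actual, total_predicted)

def mean_claim_level_difference(claims):
    """Average of the claim-by-claim percentage differences."""
    return sum(percentage_difference(a, p) for a, p in claims) / len(claims)

# Hypothetical office with six open claims: (actual days, predicted days).
larger_office = [(5, 25), (12, 30), (20, 32), (35, 38), (48, 45), (120, 60)]
# Hypothetical office with only two claims, one short and one long.
smaller_office = [(5, 25), (120, 60)]

print(round(office_level_difference(larger_office), 2))      # 4.17
print(round(mean_claim_level_difference(larger_office), 2))  # 112.47
print(round(office_level_difference(smaller_office), 2))     # 32.0
```

In this toy example the individual predictions are poor, but the office total for the larger office is within about 4% because the errors offset each other. With only two claims there is less opportunity for errors to offset, which is consistent with the larger differences observed for offices with few claims.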
5.3.3 Predictive Model for Total STD Payment in the Next Year at the WCB Level

After we determined the prediction of total STD days to be paid at the area office level, we aggregated these predictions to the WCB level. The predicted and actual total STD days for July 1st, 2000 to December 31st, 2000, July 1st, 2001 to December 31st, 2001, and July 1st, 2002 to December 31st, 2002 (6 month period) at the WCB level are shown in Table 5.3.3.1. The predicted total STD days at WCB for July 1st, 2002 to December 31st, 2002 is 616,953 days and the actual total STD days paid was 596,967 days. The percentage difference is 3.35%, which indicates that the model predicts total STD days in 2002 very well. The actual STD days at the WCB level for July 1st, 2001 to December 31st, 2001 is 542,432 days, and the predicted STD days is 577,212 days. The percentage difference for this time interval is still small at 6.03%.

Table 5.3.3.2 shows predicted and actual total STD days for 2001, 2002, and 2003 (12 month period) at the WCB level. The predicted total STD days at the WCB level in 2003 is 574,064 days. The actual STD days at the WCB level in 2002 is 744,233 days, and the predicted STD days at the WCB level in 2002 is 744,075 days. Notice that the percentage difference for 2002 is only 0.02%. This indicates that the prediction is very accurate.

Finally, Table 5.3.3.3 shows predicted and actual total STD days for July 1st, 2000 to December 31st, 2001, July 1st, 2001 to December 31st, 2002, and July 1st, 2002 to December 31st, 2003 (18 month period) at the WCB level.
The predicted total STD days at WCB for July 1st, 2002 to December 31st, 2003 is 670,616 days. The actual STD days at the WCB level for July 1st, 2001 to December 31st, 2002 is 877,668 days, and the predicted STD days is 796,335 days. The percentage difference for this time interval is 9.27%. The model predicts total STD days for 18 months at the WCB level very well.

Overall, the percentage difference between the predictions and the actual STD days paid at the WCB level has an average of about 5%, and the WCB level prediction is very accurate. As it stands, this model predicts only the total STD days paid in 2003 for claims which were open at the end of 2002 at WCB. As mentioned earlier, the model at WCB can be used to predict total STD days for other years besides 2003 if the coefficient of the previous year of STD days paid in the claim level model is updated.

[Tables 5.3.3.1, 5.3.3.2, and 5.3.3.3: actual and predicted total STD days at the WCB level for the 6, 12, and 18 month periods. Table contents are not recoverable.]
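The percentage differences quoted for the WCB level can be checked directly. The sketch below assumes that the percentage difference is the absolute difference between predicted and actual days divided by the actual days; the text does not state the formula, but this definition reproduces the 3.35%, 0.02%, and 9.27% figures. The function name and dictionary labels are hypothetical.

```python
def percentage_difference(actual, predicted):
    """Absolute difference as a percentage of the actual value (assumed definition)."""
    return abs(predicted - actual) / actual * 100.0

# (actual days, predicted days) at the WCB level, as quoted in Section 5.3.3.
wcb_level = {
    "6 months, Jul 2002 to Dec 2002": (596_967, 616_953),
    "12 months, 2002": (744_233, 744_075),
    "18 months, Jul 2001 to Dec 2002": (877_668, 796_335),
}

for label, (actual, predicted) in wcb_level.items():
    print(f"{label}: {percentage_difference(actual, predicted):.2f}%")
# 6 months, Jul 2002 to Dec 2002: 3.35%
# 12 months, 2002: 0.02%
# 18 months, Jul 2001 to Dec 2002: 9.27%
```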
5.4 Model for Estimated Total and Additional STD Days with Survival Analysis

5.4.1 Determination of the Distribution for Survival Analysis

Survival analysis requires a distribution function that yields an interpretable and reasonable estimate of the effect of the factors of interest on total STD days for claimants. Therefore, we had to determine the distribution for the survival model. First, we developed survival models without any predictive factors. These models were developed with log normal, exponential, Weibull and gamma distributions.

Table 5.4.1.1 Log Likelihood, AIC, and BIC for Models with Distributions

                                      Log Normal    Exponential   Weibull       Gamma
Log Likelihood                        -790039.84    -954445.97    -832867.07    -783625.88
Parameters                            2             1             1             2
BIC: (-2*Log Likelihood + K log T)    1580523.86    1909330.51    1666172.70    1567695.94
AIC: (-2*Log Likelihood + 2K)         1580237.67    1909047.95    1665890.14    1567409.76

Table 5.4.1.1 shows the log likelihood, AIC, and BIC of the models developed. The gamma distribution has the smallest AIC and BIC, followed by log normal. However, the values for gamma and log normal were very close. Therefore, we decided to compare the predicted values based on the survival models with gamma and log normal distributions with the actual additional and total STD days of claims which have already received a certain number of STD days. Figures 5.4.1.1 and 5.4.1.2 show this comparison.
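The ranking implied by Table 5.4.1.1 can be reproduced with a few lines of code. This is a sketch that only sorts the reported values; it does not refit any model, and the dictionary layout and function name are hypothetical. Smaller AIC and BIC values indicate a better trade-off between fit and number of parameters.

```python
# Values copied from Table 5.4.1.1.
fits = {
    "log normal":  {"log_likelihood": -790039.84, "aic": 1580237.67, "bic": 1580523.86},
    "exponential": {"log_likelihood": -954445.97, "aic": 1909047.95, "bic": 1909330.51},
    "Weibull":     {"log_likelihood": -832867.07, "aic": 1665890.14, "bic": 1666172.70},
    "gamma":       {"log_likelihood": -783625.88, "aic": 1567409.76, "bic": 1567695.94},
}

def rank_by(criterion):
    """Distribution names ordered from best (smallest value) to worst."""
    return sorted(fits, key=lambda name: fits[name][criterion])

aic_ranking = rank_by("aic")
bic_ranking = rank_by("bic")

# Gap between the two best candidates, relative to the best AIC.
aic_gap = fits["log normal"]["aic"] - fits["gamma"]["aic"]
relative_gap_percent = aic_gap / fits["gamma"]["aic"] * 100.0

print(aic_ranking)
print(bic_ranking)
print(round(relative_gap_percent, 2))
```

Both criteria give the same order, and the gamma and log normal values differ by less than 1% in relative terms, which is why the choice between them is made on predictive behaviour rather than on AIC and BIC alone.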
Figure 5.4.1.1 The Average of Actual Additional STD Days to Be Paid
[Figure: Comparison of Distributions and Actual Values. Series: Observed (up to 1998), Observed (up to 1999), Observed (up to 2000), log normal, gamma. x-axis: # of STD days, 0 to 180.]

As we can see, the survival model with the gamma distribution over-estimates the additional STD days much more than the model with the log normal distribution. The log normal distribution model prediction of the additional STD days was very close to the actual additional STD days.

Figure 5.4.1.2 The Average of Actual Total STD Days to Be Paid
[Figure: Comparison of Distributions and Actual Total STD Days. Series: Observed (up to 1998), Observed (up to 1999), Observed (up to 2000), log normal, gamma. x-axis: Number of Days, 0 to 180.]

Figure 5.4.1.2 shows the average of actual total STD days paid to claims. Again, the survival model with the gamma distribution over-estimated the total STD days of claims which have already received a certain number of STD days. The model with the log normal distribution performed well. Therefore, although the AIC and BIC of the survival model with the gamma distribution were slightly smaller, we decided to use the log normal model to develop a model with the independent variables mentioned in Section 4.2. The probability density function of the log normal is:

f_Y(y) = \frac{1}{y \sqrt{2\pi}\,\sigma} \exp\left( -\frac{1}{2\sigma^{2}} \left( \log(y) - X^{T}\beta \right)^{2} \right)    (5.4.1)

where
y: duration of the claim (total STD days)
X: independent variables (gender, age, ICD9, etc.)
\beta: coefficients of factors
\sigma: scale parameter (standard deviation of \log(y))

Equation (5.4.1) will be used to determine the conditional probabilities with the method discussed in Section 4.3.

5.4.2 The Predictive Model for Total and Additional STD Days

The model to predict total and additional STD days for a claim which has already received a certain number of STD days was built with survival analysis.
The final model to predict the average of total and additional STD days included all the factors shown in Table 4.3.1. The estimated coefficients for the model are listed in Appendix G. After the coefficients of the model were determined, the conditional distribution was determined based on the method described in Section 4.4. This model can provide the estimate of total and additional STD days for a claim with a particular condition. The following is an example of two types of claims.

Table 5.4.2.1 Conditions of Examples

Factor                                   Claimant A            Claimant B
ICD9 cluster                             Strained back         Broken knee cap
Occupation type cluster                  Kitchen helper        Vessel captain
Industry type cluster                    Restaurant            Fishing
Age                                      31                    58
Gender                                   Male                  Female
Area office                              Vancouver-Richmond    Nanaimo
Injury year: (injury year - 2000)*10     1997                  1998
Injury month                             July                  October
Injury day of a week                     Tuesday               Saturday

The condition of Claimant A is minor compared to the condition of Claimant B. For example, Section 5.2 indicated that the additional average STD days paid for a broken knee cap, which belongs to cluster 6, is about 45 days more than for back injuries, which belong to cluster 1. Industry cluster 9 includes fishing and has on average an additional 9 days more than industry cluster 1, which includes restaurants. Occupation cluster 12 has relatively minor injuries compared to those of Claimant B. The estimated averages of additional and total STD days paid for these two claims are shown in Figure 5.4.2.1 and Figure 5.4.2.2 respectively.

Figure 5.4.2.1 Estimated Additional STD Days to Be Paid
[Figure: Additional STD Days. Series: Claimant A, Claimant B. x-axis: Number of Days.]

The graph indicates that when Claimant A has been away from work more than 20 days, he will receive an average of 34 additional days. Claimant B, on the other hand, will receive an average of 349 additional days if she has been away from her work more than 20 days.
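One way to read "additional STD days for a claimant who has already been away more than 20 days" is as a conditional mean under the log normal model of equation (5.4.1). The sketch below assumes that reading, since the method of Sections 4.3 and 4.4 is not reproduced here. It uses the standard closed form for a log normal variable, E[Y | Y > t] = exp(mu + sigma^2/2) * Phi((mu + sigma^2 - log t)/sigma) / Phi((mu - log t)/sigma), where mu stands for the linear predictor of the claim. The function names and the parameter values are hypothetical and are not the fitted coefficients from Appendix G, so the output does not reproduce the 34 and 349 days quoted above.

```python
import math

def std_normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def expected_total_days(days_paid, mu, sigma):
    """E[Y | Y > days_paid] when log(Y) is Normal(mu, sigma^2)."""
    log_t = math.log(days_paid)
    unconditional_mean = math.exp(mu + 0.5 * sigma ** 2)
    upper_part = std_normal_cdf((mu + sigma ** 2 - log_t) / sigma)
    survival = std_normal_cdf((mu - log_t) / sigma)  # P(Y > days_paid)
    return unconditional_mean * upper_part / survival

def expected_additional_days(days_paid, mu, sigma):
    """Expected further days, given that days_paid days were already received."""
    return expected_total_days(days_paid, mu, sigma) - days_paid

# Hypothetical claim: linear predictor mu = 3.0 on the log scale, sigma = 1.5.
for days in (20, 40, 60):
    total = expected_total_days(days, 3.0, 1.5)
    print(days, round(total, 1), round(total - days, 1))
```

Under this reading, the expected total is always larger than the days already paid and never decreases as the days already paid increase.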
Figure 5.4.2.2 Estimated Total STD Days to Be Paid
[Figure: Total STD Days. Series: Claimant A, Claimant B. x-axis: Number of STD Days, 20 to 180.]

According to Figure 5.4.2.2, when Claimant A has been away from work more than 20 days, he will receive an average total of 54 days. Claimant B, on the other hand, will receive an average total of 369 days if she has been away from her work more than 20 days.

Overall, depending on the condition of a claimant, the total and additional STD days to be paid will vary. The more severe the claimant's condition, the more additional and total STD days he or she will receive. In addition, the longer a claimant has been away from work, the more additional STD days he or she will receive. As a result, such individuals will receive a larger number of total STD days in the end. This pattern was also observed in Figure 5.4.1.1 and Figure 5.4.1.2, which were constructed based on the actual number of STD days. The prediction of total and additional STD days based on this analysis can be created for any type of disaggregation by selecting a particular combination of characteristics of interest for a claim.

6. Discussion

The objective of this study was to predict STD days paid on a claim. Predicting STD days involves the following unknown factors: how long a claim stays active, whether a claim reopens, how long it stays active if it reopens, and how long it stays closed before it reopens. We had to build predictive models with these unknown factors in mind.

Based on the preliminary analysis, we discovered several points. First of all, the distribution of STD days paid to claims is skewed to the right. Over 90% of claims closed within 6 months after the first STD payment occurred. However, there were a few claims that were active for more than 5 years. Over 85% of claims never reopened once they were closed.
The distribution of the time between the previous closure and the following reopening is skewed to the right. The median time between the first closure and the following reopening was one month. The median time between the third or later closure and the following reopening increased to three months. The mean time between closures and the following reopenings increased as claims reopened more often.

We used the truncation method to overcome some of the unknown factors of claims mentioned above. Truncations of 6 months, 12 months, 18 months and 24 months starting at the registration day and at the first STD payment day were studied. As mentioned earlier, the majority of claims closed within 6 months of their registration day or the first STD payment. The medians of STD days for these truncation periods are about 10 days, but the means increased from about 27 days to 39 days as the truncation period increased.

Cluster analysis was applied to some of the categorical variables that contained a large number of categories. The clusters were formed based on the 5th, 10th, 20th, 35th, 50th, 65th, 80th, 90th, and 95th percentiles of STD days. After the clustering, we compared the predictive power of the clusters and WCB level aggregations based on their MSE. Clusters showed slightly better predictive power, and we decided to use clusters for analysis throughout this study.

After the preliminary analysis, we analyzed potentially important factors provided by WCB in order to rank them according to their impact on STD days. We analyzed 17 factors by using linear regression analysis. Based on the analysis, we discovered that medically-related factors such as ICD9 or nature of injury have more influence on STD days than the rest of the factors.

Finally, we developed three types of predictive models for STD days to be paid.
1) Model for Total STD Days Paid Within the Truncation Period at the Claim Level, developed with linear regression analysis

We used the truncation method to develop models. Models with truncations of 6, 12, 18 and 24 months with the first STD payment as the starting point were developed. The shorter the truncation was, the better the prediction was. In addition, the majority of claims closed within 6 months of the first STD payment. However, a model with a shorter truncation period will not include the claims that stay open for a long time. These claims are costly to WCB. Based on the models, we could study the effect of factors; for example, the occupation type of logging has an additional 23 days compared to the occupation type of system analyst. Based on this model, WCB can learn about the relationship between the number of STD days paid and a particular type of claim.

2) Model to predict Total STD Days for Open Claims at the Claim Level, Area Office Level, and WCB Level, developed with linear regression analysis

With this model, we predicted total STD days to be paid in the coming year for claims which were open at the end of the current year. Models to predict total STD days for 6, 12, and 18 months were developed. At first, we developed models at the claim level, then we aggregated the predicted STD days to the area office level and finally to the WCB level. The percentage difference between the actual and predicted total STD days for the area offices was between about 1% and 50%. The percentage differences for WCB were between about 0.02% and 10%. This model will help budgeting for the coming year for those STD claims which are open at the end of the current year.

3) Model to predict total and additional STD days with survival analysis

This model gives the prediction of total and additional STD days for a claimant who has already received a certain number of STD days.
We discovered that the more STD days a claimant has already received, the more additional STD days he or she will receive. This model can be disaggregated based on the characteristics of claims of interest. This model provides an indication of the number of additional and total STD days that will be paid to a particular type of claim when it has been in the system for a while. 7. Future W o r k 1) Predictive model development of STD days for claims that reopen in the next year In this study, we developed the model to predict total STD days in the coming year for claims open at the end of the current year. In order for W C B to budget for the coming year, it is necessary to predict claims which are closed at the end of the current year, but will reopen in the coming year. 2) Predictive model development of STD days for claims which w i l l open in the next year In addition to the model mentioned above, W C B needs to predict total STD days for new claims in the next year to budget for entire claims which will receive STD days in the next year. 56 3) Effect of caseloads of claim managers on STD days Claim managers have a varying caseloads at WCB, and the effect of the caseloads of claim managers on STD days paid can be studied. If any effect is detected, then W C B will be able to adjust caseloads so that claims will be dealt with in a suitable manner by claim managers. For example, i f an optimal number of cases (claims) for a case manager with a certain amount of experience can be determined, claims will be monitored more efficiently and can be closed on time. 57 Reference Allison, P. D. 2001. Survival Analysis Using The SAS System: A Practical Guide. North Carolina: SAS Institute Inc. The Association of Workers' Compensation Boards of Canada, 1998. History / Overview, http://www.awcbc.org/english/history.htm The Association of Workers' Compensation Boards of Canada, 2003. Definition for Key Statistical Measure. 
http://www.awcbc.org/english/board_pdfs/Definitions_2003.pdf

Carvalho, A. and Skoulakis, G. 2003. Time Series Mixtures of Generalized t Experts, The University of British Columbia: 1-41

Delwiche, L. D. and Slaughter, S. J. 2003. The Little SAS Book. North Carolina: SAS Institute Inc.

Dobson, A. J. 1997. An Introduction to Generalized Linear Models. New York: Chapman & Hall/CRC

Government of British Columbia, 2003. Workers' Compensation Act, http://www.qp.gov.bc.ca/statreg/stat/W/96492_01.htm

Greene, W. H. 1997. Econometric Analysis. 3rd ed. New Jersey: Prentice-Hall, Inc.

Griffiths, W. E., Hill, R. C., and Judge, G. G. 1993. Learning and Practicing Econometrics. New York: John Wiley & Sons, Inc.

Hennessey, J. C., and Muller, S. 1995. The Effect of Vocational Rehabilitation and Work Incentives on Helping the Disabled-Worker Beneficiary Back to Work, Social Security Bulletin, 58, 1: 15-28

Hosmer, D. W., and Lemeshow, S. 1999. Applied Survival Analysis: Regression Modeling of Time to Event Data. New York: John Wiley & Sons, Inc.

Koehler, A. B. and Murphree, E. S. 1988. A Comparison of the Akaike and Schwarz Criteria for Selecting Model Order, Applied Statistics: 37, 2: 187-195

Mason, K. 1993. Claim Duration, President's Quarterly Report Second Quarter 1993, Workers' Compensation Board of British Columbia: 92-100

Motulsky, H. Multicollinearity in Multiple Regression, www.graphpad.com/articles/multicollinearity.htm

Parker, S., Peters, G., and Turetsky, H. 2002. Corporate Governance and Corporate Failure: a Survival Analysis, Corporate Governance, 2: 4-12

SAS Institute Inc. 1983. SAS Technical Report A-108 Cubic Clustering Criterion, Cary, NC: SAS Institute Inc., http://support.sas.com/publishing/pubcat/techreports/5903.pdf

Trevor, C. 2001. Interactions among Actual Ease-of-Movement Determinants and Job Satisfaction in the Prediction of Voluntary Turnover. Academy of Management Journal, 44: 621-638

Urbanovich, E., Young, E. E., Puterman, M. L., and Fattedad, S. O. 2003. Identifying High-Risk Claims within the Workers' Compensation Board of British Columbia's Claim Inventory by Using Logistic Regression Modeling, Interfaces: 33, 4: 15-27

Weisberg, S. 1985. Applied Linear Regression. 2nd ed. New York: John Wiley & Sons, Inc.

Workers' Compensation Board of British Columbia 2002. 2002 Annual Report, http://www.worksafebc.com/publications/reports/annual_reports/assets/pdf/ar2002.pdf

Workers' Compensation Board of British Columbia 2003, Riskwatch Newsletter, http://www.worksafebc.com/publications/newsletters/riskwatch/Assets/PDF/riskwatch01 03 senior.pdf

Work Loss Duration Institute, 2003. Official Disability Guidelines Main Section Table of Contents ICD9 Major Categories, http://www.disabilitydurations.com/icd9top.htm

Appendix A: Description of Factors Used in Analysis

Table A.1: Description of Factors Used in Analysis

1) Claim Master Data           Type   Description
claim number                   Num    Code for individual claim
claim injury date              Num    Injury date of the employee
Injury Year                    Num    Injury year of the employee
cal num week day               Num    Injury day of the week of the employee
pay period from dt             Num    The first day people missed work
cr claim regi date             Num    The registration date of the claim
area office code               Char   Area office code
inj wkr age qty                Num    Age of the injured employee
sex type code                  Char   Gender of the employee
first std pay dt               Num    First STD payment date
first fnlstd paydt             Num    First final STD payment date
std days paid qty              Num    Number of STD days paid
cnt for csms id                Num    Industry identification of the employer that the cost was charged to
osh location                   Char   Operating location where the person was injured
nature inj type cd             Char   Nature of injury type
body part type cd              Char   Injured body part
icd9 med dgns cd               Char   ICD9 code
srijt type code                Char   Source of injury
accident type code             Char   Type of accident
occpt statscan cd              Char   Stats Canada occupation code

2) Factors Created             Type   Description
cluster accident               Char   Cluster of accident type cd
cluster bodypart               Char   Cluster of body part type cd
cluster icd9                   Char   Cluster of icd9 med dgns cd
cluster industry               Char   Cluster of industry type
cluster nature inj             Char   Cluster of nature inj type cd
cluster occupation             Char   Cluster of occpt statscan cd
cluster source inj             Char   Cluster of srijt type code
injury month                   Num    Injury month extracted from claim injury date
interval dis pay               Num    first std pay dt - pay period from dt
interval inj pay               Num    first std pay dt - claim injury date
interval reg_pay               Num    first std_pay dt - cr claim regi date
n of injuries                  Num    Number of injuries each claim has
trunc 6 1st std months         Num    6 months truncation starting 1st STD payment day
trunc std 12 1st std months    Num    12 months truncation starting 1st STD payment day
trunc 18 1st std months        Num    18 months truncation starting 1st STD payment day
trunc 24 1st std months        Num    24 months truncation starting 1st STD payment day
trunc std 6 months             Num    6 months truncation starting registration date
trunc std 12 months            Num    12 months truncation starting registration date
trunc std 18 months            Num    18 months truncation starting registration date
trunc std 24 months            Num    24 months truncation starting registration date
trunc std 6 inj months         Num    6 months truncation starting injury date
trunc std 12 inj months        Num    12 months truncation starting injury date
trunc std 18 inj months        Num    18 months truncation starting injury date
trunc std 24 inj months        Num    24 months truncation starting injury date
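The created interval and month factors in Table A.1 are simple date arithmetic. The following is a minimal sketch of that arithmetic, not the author's SAS code. It assumes the intervals are measured in whole days, writes the variable names with underscores, and uses made-up dates purely for illustration.

```python
from datetime import date


def derive_factors(claim_injury_date, pay_period_from_dt,
                   cr_claim_regi_date, first_std_pay_dt):
    """Derive the month and interval factors listed under 'Factors Created'.

    Assumption: intervals are counted in whole days between two dates.
    """
    return {
        # Month of injury, taken from the injury date
        "injury_month": claim_injury_date.month,
        # First STD payment date minus first day of missed work
        "interval_dis_pay": (first_std_pay_dt - pay_period_from_dt).days,
        # First STD payment date minus injury date
        "interval_inj_pay": (first_std_pay_dt - claim_injury_date).days,
        # First STD payment date minus claim registration date
        "interval_reg_pay": (first_std_pay_dt - cr_claim_regi_date).days,
    }


# Hypothetical claim; these dates are invented for the example.
example = derive_factors(
    claim_injury_date=date(2001, 3, 5),
    pay_period_from_dt=date(2001, 3, 6),
    cr_claim_regi_date=date(2001, 3, 12),
    first_std_pay_dt=date(2001, 3, 26),
)
print(example)
```

Keeping the three intervals as separate fields mirrors the table, where each one is later evaluated as its own candidate factor.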
Appendix D: MSE_CV, Root MSE_CV, Relative MSE_CV, and Rank of Important Factors

Table D.1: MSE_CV, Root MSE_CV, Relative MSE_CV, and Rank of Important Factors with 6 Months Truncation with a Starting Point of Registration Day

Factor                            MSE_CV  Root MSE_CV  Relative MSE_CV  MAE_CV  Rank
ICD9 Code - Cluster              1500.57        38.74           0.0462   26.49     1
Nature of Injury - Cluster       1555.68        39.44           0.0479   27.13     2
Multiple Injuries - Adjusted     1680.26        40.99           0.0517   28.98     3
Accident Type - Cluster          1680.88        41.00           0.0518   28.70     4
Body Part Type - Cluster         1680.94        41.00           0.0518   28.74     5
Source of Injury - Cluster       1685.25        41.05           0.0519   28.87     6
Industry Type - Cluster          1692.95        41.15           0.0521   29.05     7
Age                              1724.51        41.53           0.0531   29.42     8
Occupation Type - Cluster        1725.75        41.54           0.0531   29.46     9
Multiple Injury - Original       1735.69        41.66           0.0534   29.69    10
Registration-Payment Interval    1753.59        41.88           0.0540   29.82    11
Area Office                      1755.43        41.90           0.0540   29.87    12
Gender                           1755.97        41.90           0.0541   29.87    13
Injury Weekday                   1756.76        41.91           0.0541   29.88    14
Disability-Payment Interval      1757.88        41.93           0.0541   29.86    15
Injury-Payment Interval          1758.33        41.93           0.0541   29.88    16
Injury Month                     1759.24        41.94           0.0542   29.91    17
Injury Year (Rescaled)           1759.28        41.94           0.0542   29.91    18
Intercept Only                   1759.97        41.95           0.0542   29.92    19

Table D.2: MSE_CV, Root MSE_CV, Relative MSE_CV, and Rank of Important Factors with 12 Months Truncation with a Starting Point of Registration Day

Factor                            MSE_CV  Root MSE_CV  Relative MSE_CV  MAE_CV  Rank
ICD9 Code - Cluster              3176.46        56.36           0.0465   33.86     1
Nature of Injury - Cluster       3303.82        57.48           0.0483   34.76     2
Multiple Injuries - Adjusted     3498.48        59.15           0.0512   36.92     3
Body Part Type - Cluster         3547.42        59.56           0.0519   36.76     4
Accident Type - Cluster          3553.91        59.61           0.0520   36.74     5
Industry Type - Cluster          3565.66        59.71           0.0521   37.13     6
Source of Injury - Cluster       3570.84        59.76           0.0522   36.98     7
Age                              3628.33        60.24           0.0531   37.61     8
Occupation Type - Cluster        3636.88        60.31           0.0532   37.69     9
Multiple Injury - Original       3651.93        60.43           0.0534   38.10    10
Area Office                      3684.46        60.70           0.0539   38.16    11
Gender                           3691.25        60.76           0.0540   38.16    12
Injury Weekday                   3692.43        60.77           0.0540   38.18    13
Registration-Payment Interval    3693.46        60.77           0.0540   38.18    14
Disability-Payment Interval      3695.14        60.79           0.0540   38.19    15
Injury Month                     3695.87        60.79           0.0541   38.21    16
Injury Year (Rescaled)           3696.29        60.80           0.0541   38.21    17
Injury-Payment Interval          3696.57        60.80           0.0541   38.22    18
Intercept Only                   3696.67        60.80           0.0541   38.22    19

Table D.3: MSE_CV, Root MSE_CV, Relative MSE_CV, and Rank of Important Factors with 18 Months Truncation with a Starting Point of Registration Day

Factor                            MSE_CV  Root MSE_CV  Relative MSE_CV  MAE_CV  Rank
ICD9 Code - Cluster              4769.06        69.06           0.0471   38.17     1
Nature of Injury - Cluster        4955.7        70.40           0.0489   39.23     2
Multiple Injuries - Adjusted     5125.36        71.59           0.0506   41.19     3
Body Part Type - Cluster         5264.76        72.56           0.0520   41.27     4
Accident Type - Cluster          5276.13        72.64           0.0521   41.25     5
Industry Type - Cluster          5292.65        72.75           0.0522   41.67     6
Source of Injury - Cluster       5300.05        72.80           0.0523   41.50     7
Age                              5370.95        73.29           0.0530   42.14     8
Occupation Type - Cluster        5384.53        73.38           0.0531   42.27     9
Area Office                      5439.68        73.75           0.0537   42.73    10
Gender                           5454.01        73.85           0.0538   42.73    11
Injury Weekday                   5455.57        73.86           0.0538   42.75    12
Registration-Payment Interval    5459.45        73.89           0.0539   42.78    13
Disability-Payment Interval      5459.69        73.89           0.0539   42.77    14
Injury Month                     5459.75        73.89           0.0539   42.78    15
Injury-Payment Interval          5460.25        73.89           0.0539   42.79    16
Injury Year (Rescaled)           5460.37        73.89           0.0539   42.79    17
Intercept Only                    5460.7        73.90           0.0539   42.79    18
Multiple Injury - Original       5461.88        73.90           0.0539   43.10    19

Table D.4: MSE_CV, Root MSE_CV, Relative MSE_CV, and Rank of Important Factors with 24 Months Truncation with a Starting Point of Registration Day

Factor                            MSE_CV  Root MSE_CV  Relative MSE_CV  MAE_CV  Rank
ICD9 Code - Cluster              5957.16        77.18           0.0473   40.61     1
Nature of Injury - Cluster       6183.43        78.63           0.0491   41.72     2
Multiple Injuries - Adjusted     6348.87        79.68           0.0505   43.74     3
Body Part Type - Cluster         6544.37        80.90           0.0520   43.88     4
Accident Type - Cluster          6560.05        80.99           0.0521   43.86     5
Industry Type - Cluster          6576.75        81.10           0.0523   44.29     6
Source of Injury - Cluster       6591.23        81.19           0.0524   44.14     7
Age                              6670.24        81.67           0.0530   44.79     8
Occupation Type - Cluster        6687.88        81.78           0.0532   44.93     9
Area Office                      6746.78        82.14           0.0536   45.41    10
Gender                           6767.23        82.26           0.0538   45.40    11
Injury Weekday                   6768.83        82.27           0.0538   45.43    12
Injury-Payment Interval          6768.91        82.27           0.0538   45.43    13
Injury Month                     6773.28        82.30           0.0538   45.46    14
Registration-Payment Interval    6774.04        82.30           0.0538   45.47    15
Injury Year (Rescaled)           6774.09        82.30           0.0538   45.46    16
Disability-Payment Interval      6774.18        82.31           0.0538   45.47    17
Intercept Only                   6774.23        82.31           0.0538   45.47    18
Multiple Injury - Original       6776.57        82.32           0.0539   45.90    19

Table D.5: MSE_CV, Root MSE_CV, Relative MSE_CV, and Rank of Important Factors with 6 Months Truncation with a Starting Point of 1st STD Payment Day

Factor                            MSE_CV  Root MSE_CV  Relative MSE_CV  MAE_CV  Rank
ICD9 Code - Cluster              1899.95        43.59           0.0460   28.32     1
Nature of Injury - Cluster       1970.68        44.39           0.0477   29.07     2
Accident Type - Cluster          2139.77        46.26           0.0518   30.92     3
Body Part Type - Cluster         2140.89        46.27           0.0519   30.98     4
Multiple Injuries - Adjusted     2148.43        46.35           0.0520   31.32     5
Source of Injury - Cluster       2157.71        46.45           0.0523   31.22     6
Industry Type - Cluster          2160.93        46.49           0.0523   31.36     7
Injury-Payment Interval          2171.61        46.60           0.0526   31.97     8
Registration-Payment Interval    2191.06        46.81           0.0531   32.09     9
Disability-Payment Interval      2193.34        46.83           0.0531   32.21    10
Age                              2201.51        46.92           0.0533   31.79    11
Occupation Type - Cluster        2209.15        47.00           0.0535   31.89    12
Multiple Injury - Original       2227.57        47.20           0.0540   32.12    13
Area Office                      2241.03        47.34           0.0543   32.32    14
Gender                           2243.79        47.37           0.0543   32.32    15
Injury Weekday                   2245.50        47.39           0.0544   32.34    16
Injury Month                     2248.50        47.42           0.0545   32.37    17
Injury Year (Rescaled)           2248.82        47.42           0.0545   32.37    18
Intercept Only                   2249.17        47.43           0.0545   32.38    19

Table D.6: MSE_CV, Root MSE_CV, Relative MSE_CV, and Rank of Important Factors with 12 Months Truncation with a Starting Point of 1st STD Payment Day

Factor                            MSE_CV  Root MSE_CV  Relative MSE_CV  MAE_CV  Rank
ICD9 Code - Cluster              3574.10        59.78           0.0464   35.16     1
Nature of Injury - Cluster       3716.95        60.97           0.0482   36.11     2
Multiple Injuries - Adjusted     3955.94        62.90           0.0513   38.53     3
Body Part Type - Cluster         4002.30        63.26           0.0519   38.29     4
Accident Type - Cluster          4009.04        63.32           0.0520   38.26     5
Industry Type - Cluster          4027.05        63.46           0.0523   38.71     6
Source of Injury - Cluster       4038.11        63.55           0.0524   38.58     7
Injury-Payment Interval          4094.75        63.99           0.0531   39.47     8
Age                              4099.27        64.03           0.0532   39.22     9
Occupation Type - Cluster        4114.66        64.15           0.0534   39.36    10
Registration-Payment Interval    4118.47        64.18           0.0534   39.60    11
Disability-Payment Interval      4125.21        64.23           0.0535   39.71    12
Multiple Injury - Original       4133.06        64.29           0.0536   39.76    13
Area Office                      4162.49        64.52           0.0540   39.83    14
Gender                           4172.43        64.59           0.0541   39.82    15
Injury Weekday                   4174.22        64.61           0.0542   39.85    16
Injury Month                     4178.20        64.64           0.0542   39.88    17
Injury Year (Rescaled)           4178.83        64.64           0.0542   39.89    18
Intercept Only                   4178.92        64.64           0.0542   39.90    19

Table D.7: MSE_CV, Root MSE_CV, Relative MSE_CV, and Rank of Important Factors with 18 Months Truncation with a Starting Point of 1st STD Payment Day

Factor                            MSE_CV  Root MSE_CV  Relative MSE_CV  MAE_CV  Rank
ICD9 Code - Cluster              5170.85        71.91           0.0470   39.25     1
Nature of Injury - Cluster       5373.18        73.30           0.0488   40.35     2
Multiple Injuries - Adjusted     5582.34        74.72           0.0507   42.53     3
Body Part Type - Cluster         5724.54        75.66           0.0520   42.56     4
Accident Type - Cluster          5735.38        75.73           0.0521   42.52     5
Industry Type - Cluster          5758.86        75.89           0.0523   43.01     6
Source of Injury - Cluster       5772.82        75.98           0.0525   42.85     7
Age                              5845.96        76.46           0.0531   43.50     8
Injury-Payment Interval          5847.98        76.47           0.0531   43.72     9
Occupation Type - Cluster        5867.13        76.60           0.0533   43.67    10
Registration-Payment Interval    5875.57        76.65           0.0534   43.86    11
Disability-Payment Interval      5885.10        76.71           0.0535   43.98    12
Area Office                      5921.21        76.95           0.0538   44.13    13
Gender                           5938.72        77.06           0.0540   44.13    14
Injury Weekday                   5941.12        77.08           0.0540   44.16    15
Injury Month                     5945.86        77.11           0.0540   44.19    16
Injury Year (Rescaled)           5946.52        77.11           0.0540   44.20    17
Intercept Only                   5946.73        77.12           0.0540   44.20    18
Multiple Injury - Original       5971.53        77.28           0.0543   44.57    19

Table D.8: MSE_CV, Root MSE_CV, Relative MSE_CV, and Rank of Important Factors with 24 Months Truncation with a Starting Point of 1st STD Payment Day

Factor                            MSE_CV  Root MSE_CV  Relative MSE_CV  MAE_CV  Rank
ICD9 Code - Cluster              6293.61        79.33           0.0473   41.42     1
Nature of Injury - Cluster       6532.47        80.82           0.0491   42.57     2
Multiple Injuries - Adjusted     6728.90        82.03           0.0506   44.75     3
Body Part Type - Cluster         6927.94        83.23           0.0521   44.85     4
Accident Type - Cluster          6943.08        83.33           0.0522   44.81     5
Industry Type - Cluster          6966.06        83.46           0.0523   45.30     6
Source of Injury - Cluster       6985.62        83.58           0.0525   45.15     7
Age                              7066.47        84.06           0.0531   45.81     8
Injury-Payment Interval          7076.87        84.12           0.0532   46.03     9
Occupation Type - Cluster        7089.93        84.20           0.0533   45.98    10
Registration-Payment Interval    7106.31        84.30           0.0534   46.18    11
Disability-Payment Interval      7117.21        84.36           0.0535   46.30    12
Area Office                      7147.98        84.55           0.0537   46.46    13
Gender                           7171.12        84.68           0.0539   46.45    14
Injury Weekday                   7173.49        84.70           0.0539   46.48    15
Injury Month                     7178.37        84.73           0.0539   46.51    16
Injury Year (Rescaled)           7179.20        84.73           0.0540   46.52    17
Intercept Only                   7179.24        84.73           0.0540   46.53    18
Multiple Injury - Original       7204.10        84.88           0.0541   46.98    19

[Figure: Root MSE plotted against the factors Intercept Only, ICD9, Industry, Age Group, Occupation, Area Office, Gender, Injury Weekday, Injury Month, and Injury Year, with one series each for the 6, 12, 18, and 24 Months Truncation models; the caption is not recoverable from the extracted text.]
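The Root MSE_CV and Rank columns in the Appendix D tables follow directly from the MSE_CV column: the root is the square root of MSE_CV, and rank 1 goes to the smallest MSE_CV. The sketch below checks this on four rows copied from Table D.1. It is only an arithmetic illustration (the function and variable names are hypothetical) and does not reproduce the cross-validation itself or the Relative MSE_CV column.

```python
import math

# (factor, MSE_CV, reported Root MSE_CV) copied from Table D.1
rows = [
    ("Intercept Only", 1759.97, 41.95),
    ("Age", 1724.51, 41.53),
    ("ICD9 Code - Cluster", 1500.57, 38.74),
    ("Nature of Injury - Cluster", 1555.68, 39.44),
]


def root_mse(mse):
    """Root MSE is the square root of the mean squared error."""
    return math.sqrt(mse)


# Each reported root should match sqrt(MSE_CV) to two decimals.
matches = [abs(root_mse(mse) - reported) < 0.0051 for _, mse, reported in rows]

# Rank factors from most to least useful: smaller MSE_CV ranks first.
ranked = [name for name, mse, _ in sorted(rows, key=lambda r: r[1])]

print(matches)
print(ranked)
```

Sorting on MSE_CV rather than Root MSE_CV avoids ties that appear only because the roots are rounded to two decimals.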
o B rt c r Vi Table E.3: Coefficients of Truncation Model F a c t o r P a r a m e t e r E s t i m a t e s 6 M o n t h s 12 M o n t h s 18 M o n t h s 24 M o n t h s I n t e r c e p t 89.62 120.73 130.75 138.52 1 -0 5 1 -3.39 -4 Xh -7 66 2 14.02 12.95 12.33 9.95 3 20.68 22.42 23 .02 21.46 4 48.34 57.95 . 63 .68 6.4.87 5 88.05 115.30 131.80 137.01 I C D 9 C o d e C l u s t e r 6 37.37 42 .17 43 .82 42 .99 7 16.34 20 .30 23 .57 24.16 8 64.41 79.23 89 .19 90.86 9 70.78 89.31 101.83 107.06 10 39.82 50.98 57.62 59.47 11 110.35 152.72 182.89 197.68 1 -49.60 -72.22 -79 .58 -86 .19 2 -45.58 -67 .29 -74 .00 -80 .37 3 -40.50 -60.91 -66 .86 -72 .90 4 -36.51 -55.73 -61 .17 -66.73 5 -36.10 -54.64 -59 .17 -64 .26 I n d u s t r y T y p e C l u s t e r 6 -42.60 -63.54 -69 .46 -75.33 7 -33.32 -51.54 -55 .07 -60 .26 8 -24.89 -43.07 -46 .60 -52 .59 9 -20.70 -34.14 -34.51 -37 .67 10 -8.85 -27.42 -43 .96 -49.43 11 -8.71 -23.66 -20 .52 -24.82 12 -15.83 -25 .79 -21 .42 -22 .77 20 and under -19.19 -24.70 -26 .84 -27.71 21 -30 -14.76 -19.21 -20 .73 -21 .27 A g e G r o u p 31 -40 -10.13 -13 .07 -13 .53 -13 .49 41 -50 -8.47 -10.94 -11 .15 -10 .99 51-60 -6.29 -7.61 -6 .76 -6 .34 1 -18.07 -19 .19 -20 .04 -18 .57 2 -16.11 -16.68 -17 .05 -15.43 3 -16.92 -17 .67 - 1 8 . 3 9 -16.83 4 -12.95 -12.88 -12 .96 -11.13 5 -12.40 -11.98 -13 .92 -12.31 O c c u p a t i o n T y p e ( C l u s t e r ) 6 -11.63 -11 .47 -10 .84 -9.38 7 -14.77 -13 .87 - 1 2 . 3 9 -8.68 8 -0.81 2.88 7.94 13.60 9 -19.71 -20.23 -18 .48 -16.95 10 -16.27 -15 .59 -17 .26 -15 .14 11 -17.99 -20.01 -23.81 -23.23 12 • -6 .67 -9.60 -V.60 -12 .27 | I Factors are not significant 77 Table E.3: Coefficients of Truncat ion M o d e l - continued Factor Parameter Estimates 6 Months 12 M o n t h s 18 Months 24 Months A b b o t s f o r d -2.43 -4 .76 -6 .70 -7.18 B u r n a b y -1.52 -3 .57 -5 .12 -5.61 Cen t r a l Service 2.98 -0.98 -3.31 -3.71 C o q u i t l a m -2.06 -4 .89 -6 .66 -7 .22 C r a n b r o o k . 
-2 .99 -5 .47 -4 .63 -4.95 K a m l o o p s 1.21 1.59 2.07 2.82 K e l o w n a -2 .49 -3.93 -3 .26 -2.91 N a n a i m o -1 .09 -3.38 -4 .22 -4 .69 A r e a Office N e l s o n -2.56 -3.18 -4 .28 -3.13 Pr ince Georgp -1.72 -4 .14 -5 .14 -5 .40 Surrey -1 .19 -3 .39 -4 .37 -4 .66 Terrace -2.66 -3 .87 -3 .50 -2.73 V a n c o u v e r Cen t ra l -0.91 -3.55 -5 .64 -6.33 V a n c o u v e r R i c h m o n d -0 .69 -2.43 -4.31 -4 .64 V a n c o u v e r South -1.56 -4 .99 -7 .19 -8 .29 V e r n o n -4.45 -7.58 -8 .66 -9 .48 V i c t o r i a -3.06 -4.73 -5.55 -5 .66 G e n d e r Female 4.51 5.19 5.88 6.02 M o n d a y -3.12 -3 .35 -3 .29 -3.15 Tuesday -3.31 -3 .62 -3.71 -3 .67 Injury Weekday Wednesday -2.61 -2 .56 -2 .29 -2 .12 T h u r s d a y -2 .09 -2 .06 -2.03 -1 .94 F r i d a y -0.5X -0.42 -0 OX 0.03 Saturday O.X5 0 90 x \ 1 1 9 1 27 February -0.75 -0 45 -0 OS | |J)3 M a r c h -0 32 -0 15 -0 19 -0.08 A p r i l 0.0S 0.49 0.93 1.00 M a y -0.51 -0 .12 0.36 0.41 June -0.16 0.10 1- 0.52 0.13 Injury M o n t h J u l y 0 0-1 0.35 0.68 0 54 A u g u s t 0 09 0.17 0.52 0 47 September 0 - | 0.79 1.27 M".23 Octobe r I OS 1.2-1 1.24 1.I3 N o v e m b e r 0 sy 0.57 0 79 0.53 December 0.16 0.50 1 26 1.19 Injury Y e a r : (injury year - 2000)*10 -0.01 -0 .04 -0.08 -0.11 | I Factors are not significant 78 Table E.4: Additional STD Days to the Baseline Based On 6,18, and 24 Months Truncation Models for ICD9 Cluster Rank Truncation 6 18 24 Cluster Additional STD days Cluster Additional STD days Cluster Additional STD days 1 11 110.86 11 187.75 11 205.34 2 5 88.57 5 136.66 5 144.67 3 9 71.29 9 106.69 9 114.72 4 8 64.92 8 94.04 8 98.52 5 4 48.85 4 68.54 4 72.53 6 10 40.34 10 62.48 10 67.13 7 6 37.88 6 48.68 6 50.64 8 3 21.19 7 27.88. 
7 31.82 9 7 16.85 3 28.42 3 29.12 10 2 14.53 2 17.18 2 17.61 11 1 0.00 1 0.00 1 0.00 Table E.5: Additional STD Days to the Baseline Based On 6,18, and 24 Months Truncation Models for Industry Type Cluster Rank Truncation 6 18 24 Cluster Additional STD days Cluster Additional STD days Cluster Additional STD days 1 11 40.89 11 59.06 12 63.42 2 10 40.76 12 58.16 11 61.37 3 12 33.77 9 45.07 9 48.52 4 9 28.90 10 35.61 10 36.76 5 8 24.71 8 32.97 8 33.60 6 7 16.28 7 24.51 7 25.93 7 5 13.50 5 20.41 5 21.93 8 4 13.09 4 18.41 4 19.46 9 3 9.10 3 12.71 3 13.29 10 6 7.00 6 10.12 6 10.86 11 2 4.03 2 5.57 2 5.82 12 1 0.00 1 0.00 1 0.00 7 9 Table E.6: Additional STD Days to the Baseline Based On 6,18, and 24 Months Truncation Models for Occupation Type Cluster Rank Truncation 6 18 24 Cluster Additional STD davs Cluster Additional STD days Cluster Additional STD days 1 8 18.90 8 31.75 8 36.83 2 12 13.04 12 14.21 7 14.55 3 6 8.08 6 12.97 6 13.86 4 5 7.32 7 11.41 4 12.11 5 4 6.76 4 10.85 12 10.96 6 7 4.94 5 9.89 5 10.92 7 2 3.60 2 6.75 10 8.09 8 10 3.45 10 6.55 2 7.80 9 3 2.80 3 5.42 3 6.40 10 11 1.72 9 5.33 9 6.28 11 1 1.64 1 3.77 1 4.66 12 9 0.00 11 0.00 11 0.00 Table E.7: Additional STD Days to the Baseline Based On 6,18, and 24 Months Truncation Models for Area Offices Rank Truncation 6 18 24 Area Office Additional STD days Area Office Additional STD days Area Office Additional STD days 1 Central Service 7.43 KamlooDS 10.74 Kamloops 12.30 2 Kamloops 5.66 Courtenav 8.66 Courtenav 9.48 3 Courtenav 4.45 Kelowna 5.40 Terrace 6.75 4 Vancouver Richmond 3.76 Central Service 5.35 Kelowna 6.57 5 Vancouver Central 3.54 Terrace 5.16 Nelson 6.36 6 Nanaimo 3.36 Nanaimo 4.44 Central Service 5.77 7 Surrey 3.26 Nelson 4.38 Vancouver Richmond 4.84 8 Burnaby 2.93 Vancouver Richmond 4.35 Surrey 4.83 9 Vancouver South 2.89 Surrey 4.30 Nanaimo 4.79 10 Prince George 2.73 Cranbrook 4.04 Cranbrook 4.54 11 Coquitlam 2.39 Burnabv 3.54 Prince George 4.09 12 Abbotsford 2.02 Prince George 3.52 Burnabv 3.87 
13 Kelowna 1.96 Victoria 3.11 Victoria 3.83 14 Nelson 1.89 Vancouver Central 3.03 Vancouver Central 3.16 15 Terrace 1.79 Coquitlam 2.00 Abbotsford 2.31 16 Cranbrook 1.46 Abbotsford 1.97 Coquitlam 2.26 17 Victoria 1.39 Vancouver South 1.47 Vancouver South 1.20 18 Vernon 0.00 Vernon 0.00 Vernon 0.00 80 Table E.8: Additional STD Days to the Baseline Based On 6,18, and 24 Months Truncation Models for Weekday Rank Truncation 6 18 24 Weekday Additional STD days Weekday Additional STD davs Weekday Additional STD davs 1 Saturday 4.15 Saturday 4.90 Saturday 4.94 2 Sunday 3.31 Sunday 3.71 Friday 3.70 3 Friday 2.72 Friday 3.63 Sunday 3.67 4 Thursday 1.22 Thursday 1.68 Thursday 1.73 5 Wednesday 0.69 Wednesday 1.42 Wednesday 1.55 6 Monday 0.18 Monday 0.42 Monday 0.52 7 Tuesday 0.00 Tuesday 0.00 Tuesday 0.00 Table E.9: Additional STD Days to the Baseline Based On 6,18, and 24 Months Truncation Models for Month Rank Truncation 6 18 24 Month Additional STD davs Month Additional STD davs Month Additional STD davs 1 October 1.80 September 1.47 September 1.31 2 September 1.45 December 1.46 December 1.27 3 November 1.34 October 1.44 October 1.20 4 December 1.21 April 1.12 April 1.08 5 August 0.83 November 0.98 July 0.62 6 April 0.80 July 0.88 November 0.61 7 July 0.79 August 0.71 August 0.55 8 January 0.75 June 0.71 June 0.51 9 June 0.59 May 0.56 May 0.49 10 March 0.43 January 0.19 February 0.11 11 May 0.24 February 0.14 January 0.08 12 February 0.00 March 0.00 March 0.00 Table E.10: Additional STD Days to the Baseline Based On 6,18, and 24 Months Truncation Models for Gender Truncation Rank 6 18 24 Gender Additional STD davs Gender Additional STD davs Gender Additional STD davs 1 Female 4.51 Female 5.88 Female 6.02 2 Male 0.00 Male 0.00 Male 0.00 81 Figure E.2: Additional S T D Days to the Baseline Based O n 6,18, and 24 Months Truncation Models for Age Additional STD Days for Age Group *—6 Months Truncation —e— 18 Months Truncation A 24 Months Truncation 30 -r 20 and under 21-30 
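The "Additional STD days" tables appear to be a re-expression of the Table E.3 coefficients: within each factor, the level with the lowest coefficient becomes the 0.00 baseline and every other level is reported as its difference from that baseline. The sketch below reproduces part of the 6 Months column of Table E.7 under that reading. The function name is hypothetical, and treating Courtenay as the omitted reference level with coefficient 0 is an assumption inferred from its absence in Table E.3.

```python
def additional_days(coefficients):
    """Re-express coefficients as days above the lowest-coefficient level.

    Returns (level, additional days) pairs sorted from largest to smallest.
    """
    baseline = min(coefficients.values())
    shifted = {level: round(c - baseline, 2) for level, c in coefficients.items()}
    return sorted(shifted.items(), key=lambda item: item[1], reverse=True)


# 6 Months Truncation coefficients for a few area offices from Table E.3.
area_office_6m = {
    "Central Service": 2.98,
    "Kamloops": 1.21,
    "Courtenay": 0.0,   # assumed reference level (not listed in Table E.3)
    "Vernon": -4.45,    # lowest coefficient, so it becomes the baseline
}

table = additional_days(area_office_6m)
for office, days in table:
    print(f"{office:<18}{days:>6.2f}")
```

Because the published coefficients are rounded to two decimals, some recomputed values can differ from the printed tables by 0.01.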
Table E.11: Additional STD Days to the Baseline Based on 6, 18, and 24 Months Truncation Models for Year

As Year Increases   6 Months Truncation   18 Months Truncation   24 Months Truncation
Decrease by         0.05 days             0.39 days              1.10 days

[Rotated table for the 6 Months Interval from Jul/01/02; its contents are not recoverable from the extracted text.]

Table F.2: The R-Squared, MSE and Root MSE for the Backward Stepwise Method of Models for 12 Time Interval for Open Claims

12 Months Interval from Jan/01/03 (the R-Squared values are not legible in the extracted text)

MSE       RMSE    Coefficients in the Model
9456.65   97.25   Intercept
8286.37   91.03   Intercept, Previous STD days
8096.65   89.98   Intercept, Previous STD days, ICD9
7814.96   88.40   Intercept, Previous STD days, ICD9, Multiple Injuries
7765.85   88.12   Intercept, Previous STD days, ICD9, Multiple Injuries, Industry
7747.36   88.02   Intercept, Previous STD days, ICD9, Multiple Injuries, Industry, Age Group
7742.68   87.99   Intercept, Previous STD days, ICD9, Multiple Injuries, Industry, Age Group, Occupation
7723.58   87.88   Intercept, Previous STD days, ICD9, Multiple Injuries, Industry, Age Group, Occupation, Area Office
7721.58   87.87   Intercept, Previous STD days, ICD9, Multiple Injuries, Industry, Age Group, Occupation, Area Office, Gender
7721.19   87.87   Intercept, Previous STD days, ICD9, Multiple Injuries, Industry, Age Group, Occupation, Area Office, Gender, Injury Weekday
7716.60   87.84   Intercept, Previous STD days, ICD9, Multiple Injuries, Industry, Age Group, Occupation, Area Office, Gender, Injury Weekday, Injury Month
7670.79   87.58   Intercept, Previous STD days, ICD9, Multiple Injuries, Industry, Age Group, Occupation, Area Office, Gender, Injury Weekday, Injury Month, Injury Year

[Rotated table for the 18 Months Interval from Jul/01/02; its contents are not recoverable from the extracted text.]

Figure F.1: Change in Root MSE in the Process of Backward Stepwise Method for Models with 6, 12, 18 and 24 Time Interval for Open Claims
[Chart titled "Change of Root MSE for Dynamic Model"; x-axis: Removed Coefficient from Previous Model; series: 6 Months Interval from July/01/02, 18 Months Interval from July/01/02, 12 Months Interval from January/01/03.]

Table F.4: The Predicted Coefficients of Models for 6, 12, and 18 Months Time Interval for Open Claims

Cells marked [illegible] could not be read reliably from the extracted text.

                                                 Interval Starts from
                                                 Jul/01/02     Jan/01/03     Jul/01/02
Factor                     Level                 6 Months      12 Months     18 Months
Intercept                                        71.58         73.12         92.92
Previous STD days          30 < STD <= 60        9.34          12.10         17.36
                           60 < STD <= 90        19.28         28.73         34.84
                           90 < STD <= 120       24.77         38.44         45.72
                           120 < STD <= 150      30.41         44.06         59.27
                           150 < STD <= 180      31.71         47.40         58.37
                           180 < STD <= 210      33.41         48.14         58.83
                           210 < STD <= 240      36.97         61.37         64.84
                           240 < STD <= 270      34.83         47.54         65.54
                           270 < STD <= 300      53.54         64.45         107.48
                           300 < STD <= 330      55.31         77.82         96.11
                           330 < STD <= 360      57.90         89.25         98.08
                           > 360                 62.09         87.38         118.44
ICD9 Code Cluster          1                     -10.12        -3.60         [illegible]
                           2                     -5.94         -3.31         [illegible]
                           3                     [illegible]   8.71          8.05
                           4                     4.85          16.44         [illegible]
                           5                     [illegible]   28.00         25.13
                           6                     3.60          9.61          [illegible]
                           7                     [illegible]   26.57         [illegible]
                           8                     [illegible]   14.71         [illegible]
                           9                     8.84          27.11         [illegible]
                           10                    11.59         32.39         33.99
                           11                    19.73         43.53         49.37
Multiple Injuries                                5.58          12.66         18.00
Industry Type Cluster      1                     -21.08        -31.08        -39.23
                           2                     -17.83        -27.30        -34.83
                           3                     -12.88        -20.20        -25.39
                           4                     -15.21        -24.76        -29.58
                           5                     -15.11        -20.04        -23.97
                           6                     -17.63        -26.48        -30.29
                           7                     -17.69        -19.71        -30.26
                           8                     -14.05        -24.08        -27.57
                           9                     -5.31         [illegible]   -18.28
                           10                    -14.87        -22.46        -44.94
                           11                    -12.23        [illegible]   -12.13
                           12                    -5.00         -11.34        [illegible]
Age Group                  20 and under          -14.70        -18.08        -16.86
                           21-30                 -5.05         [illegible]   -0.54
                           31-40                 [illegible]   1.27          [illegible]
                           41-50                 [illegible]   1.33          [illegible]
                           51-60                 -0.62         [illegible]   [illegible]
Occupation Type (Cluster)  1                     -1.60         -11.20        [illegible]
                           2                     [illegible]   [illegible]   [illegible]
                           3                     [illegible]   [illegible]   -10.87
                           4                     0.38          -4.35         -3.70
                           5                     -2.45         -4.81         [illegible]
                           6                     -0.68         -6.20         -6.18
                           7                     2.13          -3.68         -1.14
                           8                     [illegible]   21.97         16.80
                           9                     -16.47        4.87          -36.11
                           10                    6.73          5.47          20.41
                           11                    [illegible]   [illegible]   [illegible]
                           12                    -16.11        [illegible]   -45.40
Area Office                Abbotsford            -7.43         -11.74        -16.00
                           Burnaby               -7.32         -10.33        [illegible]
                           Central Service       -1.00         0.85          -6.70
                           Coquitlam             -11.09        -9.96         [illegible]
                           Cranbrook             -10.01        -4.04         [illegible]
                           Kamloops              [illegible]   1.36          -1.65
                           Kelowna               -2.18         [illegible]   -6.00
                           Nanaimo               -5.11         -6.67         -12.71
                           Nelson                -3.05         8.85          0.03
                           Prince George         -6.93         -8.78         -21.79
                           Surrey                -5.66         -7.85         -13.67
                           Terrace               -6.33         -1.81         -5.71
                           Vancouver Central     -8.97         -9.00         -24.38
                           Vancouver Richmond    -4.10         -6.82         -15.44
                           Vancouver South       -11.76        -16.60        -30.13
                           Vernon                -11.50        -11.67        -24.29
                           Victoria              -1.43         [illegible]   [illegible]
Gender                     Female                2.76          3.15          1.15
Injury Weekday             Monday                -1.34         -5.07         -1.48
                           Tuesday               -1.39         -4.80         -3.44
                           Wednesday             [illegible]   -2.43         [illegible]
                           Thursday              -2.02         -4.39         [illegible]
                           Friday                -0.84         -4.88         -1.14
                           Saturday              [illegible]   -4.64         [illegible]
Injury Month               February              2.47          4.45          [illegible]
                           March                 [illegible]   8.10          1.96
                           April                 1.92          [illegible]   5.60
                           May                   [illegible]   [illegible]   -1.21
                           June                  -5.06         7.23          [illegible]
                           July                  -4.84         [illegible]   [illegible]
                           August                -4.68         9.26          [illegible]
                           September             -1.12         4.51          [illegible]
                           October               2.11          [illegible]   [illegible]
                           November              -1.18         3.33          [illegible]
                           December              0.55          1.25          -2.93
Injury Year: (injury year - 2000)*10             -0.19         -0.61         -0.96

Legend in the original: shaded cells mark factors that are not significant.

Appendix G: The Predicted Coefficients of the Model to Predict Additional and Total STD Days for a Claim which Already Has Received a Certain Number of STD Days

Table G.1: The Predicted Coefficients of the Model to Predict Additional and Total STD Days for a Claim which Already Has Received a Certain Number of STD Days

Cells marked [illegible] could not be read reliably from the extracted text. Legend in the original: shaded cells mark factors that are not significant.

Factor                     Level                 Coefficient
Intercept                                        4.1712
ICD9 Code Cluster          1                     -0.2784
                           2                     0.5254
                           3                     0.7262
                           4                     1.622
                           5                     2.3872
                           6                     1.3871
                           7                     0.3722
                           8                     1.9908
                           9                     2.1726
                           10                    1.213
                           11                    2.796
Industry Type Cluster      1                     -1.4558
                           2                     -1.3004
                           3                     -1.1456
                           4                     -0.9175
                           5                     -0.9283
                           6                     -1.1463
                           7                     -0.8696
                           8                     -0.6779
                           9                     -0.555
                           10                    -0.2991
                           11                    -0.2675
                           12                    -0.3677
Age Group                  20 and under          -0.7301
                           21-30                 -0.5348
                           31-40                 -0.3403
                           41-50                 -0.2618
                           51-60                 -0.1757
Occupation Type (Cluster)  1                     -0.4761
                           2                     -0.3934
                           3                     -0.4046
                           4                     -0.2726
                           5                     -0.1894
                           6                     -0.176
                           7                     -0.2984
                           8                     0.072
                           9                     -0.6983
                           10                    -0.4059
                           11                    -0.4851
                           12                    [illegible]
Area Office                Abbotsford            -0.0775
                           Burnaby               -0.015
                           Central Service       -0.0013
                           Coquitlam             -0.0234
                           Cranbrook             -0.1241
                           Kamloops              0.0344
                           Kelowna               -0.1579
                           Nanaimo               -0.0278
                           Nelson                -0.0708
                           Prince George         -0.0825
                           Surrey                -0.0291
                           Terrace               -0.0863
                           Vancouver Central     0.0098
                           Vancouver Richmond    0.05
                           Vancouver South       0.0101
                           Vernon                -0.2237
                           Victoria              -0.1471
Gender                     Female                0.0635
Injury Weekday             Monday                -0.1091
                           Tuesday               -0.1391
                           Wednesday             -0.1454
                           Thursday              -0.1486
                           Friday                0.0367
                           Saturday              0.0169
Injury Month               February              -0.0289
                           March                 0.0016
                           April                 [illegible]
                           May                   [illegible]
June 0.0018 Injury M o n t h Ju ly 0.0165 A u g u s t 0.0254 September 0.0261 O c t o b e r 0.0306 N o v e m b e r 0 .0282 December 0.0444 Injury Y e a r : (injury year - 2000)*10 -0.0006 Scale 1.4194 ] Factors are not significant 91 
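As a rough illustration of how the coefficients in Table G.1 might be combined, the Python sketch below sums the intercept and the coefficients for one hypothetical claim. It rests on several assumptions that this appendix does not state: that the coefficients add up on the model's linear-predictor scale (the usual arrangement for dummy-coded factors), that levels missing from the table are reference levels contributing zero, and that the injury-year covariate is coded as (injury year - 2000)*10. The names `linear_predictor`, `COEFFICIENTS` and `claim` are invented for the example, and only a few cleanly legible coefficients are copied in.

```python
# Sketch only: additive combination of a few Table G.1 coefficients.
# The claim profile below is hypothetical and the additive form is an assumption.

INTERCEPT = 4.1712

# (factor, level) -> coefficient, copied from legible entries of Table G.1.
COEFFICIENTS = {
    ("icd9_cluster", 5): 2.3872,
    ("industry_cluster", 3): -1.1456,
    ("age_group", "31-40"): -0.3403,
    ("occupation_cluster", 4): -0.2726,
    ("area_office", "Surrey"): -0.0291,
    ("gender", "Female"): 0.0635,
    ("injury_weekday", "Monday"): -0.1091,
    ("injury_month", "June"): 0.0018,
}

# Coefficient for the injury-year term, assumed to multiply (injury year - 2000) * 10.
INJURY_YEAR_COEF = -0.0006


def linear_predictor(claim, injury_year):
    """Return the intercept plus the coefficient of each factor level in the claim."""
    total = INTERCEPT
    for factor, level in claim.items():
        # Levels not listed here are treated as reference levels (assumption).
        total += COEFFICIENTS.get((factor, level), 0.0)
    total += INJURY_YEAR_COEF * (injury_year - 2000) * 10
    return total


# Hypothetical claim used only to exercise the function.
claim = {
    "icd9_cluster": 5,
    "industry_cluster": 3,
    "age_group": "31-40",
    "occupation_cluster": 4,
    "area_office": "Surrey",
    "gender": "Female",
    "injury_weekday": "Monday",
    "injury_month": "June",
}

eta = linear_predictor(claim, 2002)
print(round(eta, 4))
```

The result is a value on the model's own scale. Converting it into predicted STD days depends on the model form and on the role of the Scale parameter, both of which are defined in the body of the thesis rather than in this appendix.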
