"Education, Faculty of"@en . "Educational Studies (EDST), Department of"@en . "DSpace"@en . "UBCV"@en . "Jackes, Kerri"@en . "2017-01-21T04:10:52"@en . "2009-04-20"@en . "Evaluation shares many characteristics with other fields within the social sciences, but\r\nhas struggled to be recognized as a stand alone academic discipline. Reasons include an\r\novershadowing emphasis on practice rather than theory and modeling. Despite this challenge,\r\nmuch advancement has occurred in evaluation stemming from early works in program\r\nevaluation. These methods, theories and early practice have influenced adult education\r\npractitioners who borrowed and adapted concepts so to assess workplace training and human\r\nresources development (HRD) programs. HRD evaluation literature has seen little advancement\r\nsince the highly noted inaugural work of Kirkpatrick in 1959. His famous four-level model of\r\nhas remained largely unchanged, is often used as a benchmark for other HRD models and has\r\ninfluenced other field contributions. Notably, this includes the work Phillips and his message of\r\nthe ultimate level of evaluation: return on investment (ROI). Many organizations are clamoring\r\nto adopt the Phillips process, yet there is little documented evidence that supports the application\r\nor the utility of the model. Although ROI seems to translate easiest in the corporate world, many\r\npublic sector organizations are taking up the challenge as well in a bid to provide results-based\r\nevidence in support of accountability.\r\nThe Canada Revenue Agency (CRA) chose to adopt the Phillips model for its HRD and\r\ntraining evaluation program. The experience and results were documented in the public review of\r\nits technical training by the Auditor General of Canada. Overall, CRA garnered positive reviews\r\nbut were told to not have made adequate use of the Phillips model in providing evidence of\r\neffectiveness. The descriptive analysis uncovers other factors, such as staffing, funding and\r\nsuitability as important in determining the extent in which a model should be applied. In the case\r\nof CRA, targets and methods were unrealistic to sustain."@en . "https://circle.library.ubc.ca/rest/handle/2429/59881?expand=metadata"@en . "EVALUATING THE EVALUATION OF TRAINING AT CANADA REVENUE AGENCY: A DESCRIPTIVE ANALYSIS OF MODELS by Kerri Jackes A GRADUATING PROJECT SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF EDUCATION IN THE FACULTY OF GRADUATE STUDIES ADULT EDUCATION Approved by: Dr. Lesley Andres Dr. Tom Sork April 20, 2009 THE UNIVERSITY OF BRITISH COLUMBIA April 20, 2009 Abstract Evaluation shares many characteristics with other fields within the social sciences, but has struggled to be recognized as a stand alone academic discipline. Reasons include an overshadowing emphasis on practice rather than theory and modeling. Despite this challenge, much advancement has occurred in evaluation stemming from early works in program evaluation. These methods, theories and early practice have influenced adult education practitioners who borrowed and adapted concepts so to assess workplace training and human resources development (HRD) programs. HRD evaluation literature has seen little advancement since the highly noted inaugural work of Kirkpatrick in 1959. His famous four-level model of has remained largely unchanged, is often used as a benchmark for other HRD models and has influenced other field contributions. 
Notably, this includes the work of Phillips and his message of the ultimate level of evaluation: return on investment (ROI). Many organizations are clamoring to adopt the Phillips process, yet there is little documented evidence that supports the application or the utility of the model. Although ROI seems to translate most easily in the corporate world, many public sector organizations are taking up the challenge as well in a bid to provide results-based evidence in support of accountability. The Canada Revenue Agency (CRA) chose to adopt the Phillips model for its HRD and training evaluation program. The experience and results were documented in the public review of its technical training by the Auditor General of Canada. Overall, the CRA garnered positive reviews but was told it had not made adequate use of the Phillips model in providing evidence of effectiveness. The descriptive analysis uncovers other factors, such as staffing, funding and suitability, as important in determining the extent to which a model should be applied. In the case of the CRA, targets and methods were unrealistic to sustain.

Dedication

I would like to dedicate this graduating paper to Dr. Lesley Andres for agreeing to advise me on this process and taking up the challenge of working with me at a distance. I'd also like to take this opportunity to thank Dr. Tom Sork for being a second reader on this paper and offering advice to improve the quality of the discussion. I would also like to dedicate this work to my parents, as they were always willing to pick me up at the library and return my overdue borrowed books, no questions. Thank you all for your support!

Acknowledgements

This paper would not have been possible without the contribution of industry knowledge and the interest of my colleagues and managers at the Canada Revenue Agency. Their support of learning through educational assistance and professional development has allowed me to complete this M.Ed. and expand my knowledge of adult education theory and practice. You introduced me to this field and motivated me to pursue it professionally. Thank you.

Table of Contents

Abstract ..... 2
Dedication ..... 3
Acknowledgements ..... 4
Table of Contents ..... 5
Chapter One: Introduction ..... 6
Chapter Two: Literature Review ..... 9
    Evaluation defined ..... 9
    Link to adult education ..... 12
    Roots of evaluation research ..... 14
    Leading theorists and noted shifts in focus ..... 17
    Theory and practice ..... 19
Chapter Three: Evaluation in Human Resources Development ..... 23
    Kirkpatrick's four-level model ..... 25
    Phillips' five-level framework ..... 30
    Six-stages evaluation and the Success-case method ..... 37
Chapter Four: Case Study: HRD evaluation at Canada Revenue Agency ..... 42
    Office of the Auditor General ..... 43
    Learning at CRA ..... 45
    Current evaluation model employed ..... 48
    OAG recommendations ..... 51
    Lessons learned ..... 53
Chapter 5: Conclusion ..... 55
References ..... 58

Chapter One: Introduction

Research and evaluation share many of the same characteristics. What tends to distinguish evaluation from traditional research is that evaluators are often responsible for forming concluding opinions that place value on, or determine the worth of, what is under examination. For many, asking who the evaluator is can be as important as asking what it is they are evaluating. In most social science research disciplines the researcher can be regarded as a conduit for the questions and investigation underway. In traditional social science study, researchers begin with a question that needs to be answered and formulate hypotheses. Evaluators, on the other hand, are driven by the program being administered rather than by research questions (Weiss, 1998, p. 15); as such, evaluators often rely on judgment rather than hypothesis testing. Although the two share characteristics such as methodology and design, evaluators continue to seek a professional designation for their work, which in some academic circles has been dubbed "a lower order of research" (Weiss, 1998, p. 17).
Despite this hierarchical order of research, evaluation theory and practice is derived itself from social science research whereby evaluators are often asked to assess social programs and policies. This is largely the motivational factor that has led to the expansion of evaluation as a field, though pundits claim it lacks the academic recognition to be considered its own autonomous discipline. What is difficult to argue is that evaluation, in part due to its roots in . social inquiry and progress, is often associated with education and draws from education practice. For instance, learners of formal education programs are often tested, graded and ranked in terms of their achievements. Adult learners who participate in non-formal education may be evaluated under a different set of criteria. Many times the evaluation may focus on the intervention rather than the performance of the adult learners. Perhaps this is why evaluation and 6 adult education, when examined together, are frequently linked to organizational improvement and human resources development (HRD) and become disassociated from the altruistic roots of social inquiry. The practice of evaluating HRD training programs as a direct attribute of organizational performance is gaining the attention of corporate senior managers who are asking if the training investment was worth the cost. Once characterized as a fledgling research tradition, program evaluation is being pressed forward by industry magazines that sell organizations the idea they need to prove their training is effective, consequently promoting books and seminars on models claiming to produce the most accurate evaluation results. For some organizations the ultimate worth of a training program may be based on intangible benefits like job satisfaction while for others, the return on capital investment is paramount. A prime example of this type of industry support is seen in the recognition the Phillips return on investment (ROI) model for evaluating training receives from professional training societies, like the Canadian and American Society for Training and Development. The popularity of ROI can be attributed to direct marketing and in part to a lack of direct competition of an equally omnipresent and profitable evaluation model to corporate organizations. This exposition is a descriptive analysis of the evolution of the evaluation research tradition and its link to adult education via human resources development. It acts as a historical account of the literature and major contributors to the field first through program evaluation roots followed by a look into popular HRD training evaluation models. The literature review serves to define evaluation, investigate the relati.onship between theory and practice, and demonstrate how research paradigms have changed since becoming widespread in the 1960s to modem applications in support ofHRD. Following the literature review and brief comparison ofHRD 7 evaluation models is a descriptive case study on the application ofHRD evaluation in the Canada Revenue Agency (CRA). The case study was chosen to illustrate how an evaluation model can be put into action and how models are interpreted by organizations. The CRA touts itself as a supporter of adult education, lifelong learning and has a generous training and learning buqget. 
The CRA presents an interesting example in the use of applied evaluation models as demonstrated through a recent independent audit of its technical training program where the use of the Phillips ROI model was scrutinized by the Office of the Auditor General. It is the first federal organization to have its corporate and programs training audited and can be interpreted as a comparison baseline for other federal departments and agencies. While the utility of the applied evaluation model is questionable, the case study leaves little doubt that, given the high costs of running HRD programming in the federal public sector, HRD training needs to be evaluated. In addition, how models serve to guide that process require assessment. 8 Chapter Two: Literature Review Evaluation by its very nature is multidisciplinary. It touches many \"contexts, fields and types of evaluands\" (Davidson, 2005, p. xvi). There are rarely offices or faculties of evaluation dedicated to the research of evaluation. Evaluation research tends to be carried out under the guise of another discipline, like education, public administration, healthcare or psychology (King, 2003). As a relatively new tradition, approaching 50 years now, many aspects of the knowledge base are still in their infancy. This chapter outlines the growth of program evaluation as a research tradition in the social sciences and its links to adult education. It discusses the dichotomy between theory and practice, concepts in evaluation, leading theorists and their contributing ideas to the evolution of evaluation as a field and a profession. The literature review concludes with thoughts of where the tradition is predicted to shift amid current trends and inquiries in evaluation. Evaluation defin ed Evaluation is a term that is defined mostly by its purpose. To interpret its meaning is to ask why and when to evaluate. Currently there is not a single definition that has been endorsed by the community of evaluators (Shadish, Cook & Leviton, 1991; King, 2003). Guba and Lincoln (1989) defend their choice not to define evaluation because it is many different things to different stakeholders in different circumstances. For them, \"there is no point in asking\" (p. 21). what evaluation is since they defy having found a credible answer. For the purposes of this conversation on program evaluation, a definition is required to ground the case study presented in Chapter Four. 9 The Compact Oxford English Dictionary describes evaluation as a numerical expression of finding value or assigning an amount to a subject of study. This definition is in line with other modem dictionary definitions; however, it could mislead the reader to wrongly conclude that to evaluate is to only perform quantitative research and analysis. While it would be easiest to equate evaluation with quantitative methodology, many practitioners and theorists in the field employ qualitative research methods to evaluate, which do not fit within this frame of reference. A more specific definition that denotes the purpose and timing of evaluation is necessary. Scriven is considered an authority on the topic of evaluation and has been a celebrated contributor to the theoretical development of the field and evaluation specific methodology (Davidson, 2005, p. xi) . He describes professional evaluation as \"the systematic determination of the quality or value of something\" (Scriven, 1991, p.12). 
Scriven's definition is given high regard and often cited in evaluation texts because he uses a systems approach and focuses on valuing the evaluand. For others, defining evaluation is also about discussing method and utility and not solely valuing. In his review of evaluation models, Stufflebeam (2001) defines program evaluation as \"a study designed and conducted to assist some audience to assess an object's merit and worth\" (p.ll). Like Scriven's, this definition is a precise summary of the main concepts of why evaluate for various types of assessments in a variety of contexts, which could include adult education programs. Here we find Stufflebeam's definition addressing use and value of a program without characterizing appropriate methods. Stufflebeam (2003) uses the Context, Input, Process and Product (CIPP) Model to emphasize that \"evaluation's most important purpose is not to prove, but to improve,\" which is a reworked idea introduced by Guba in 1971 (p. 4). CIPP was introduced to counter the evaluation orthodoxy of the 1960s where evaluations were intended \"to 10 detennine if objectives had been achieved and met the requirements of experimental design\" (Stufflebeam, 2003, p. 30). While there is some debate whether conducting an evaluation is more of an art than a science, many classic definitions attempt to clarify the field of evaluation by employing rigorous research designs. Also, there may be motivation to conduct evaluations beyond merit or worth, like reasons of political will or for window-dressing. It is important to note that Stufflebeam (2001) would classify alternatives to judging worth and merit as \"pseudoevaluations that promote invalid or incomplete findings\" (p. 11). While to-the-point, Stufflebeam's definition is decision-oriented and does not account for the variety of methods that can be used (AIkin, 2004). Caffarella (2002) was also influenced by program evaluation research but her work comes from a perspective of adult education and program planning. She defined evaluation as \"a process used to detennine whether the design and delivery of a program were effective and whether the proposed outcomes were met\" (p. 225). Caffarella emphasizes first the importance of strategically planning for evaluation before a program has been developed and second acknowledging that many evaluation activities can be infonnal and largely unplanned. This description may be too specific for the intended purposes of this discussion and fails to reflect evaluation that is not guided by predetennined program objectives, such as in goal-free evaluation. Many evaluators rely on a goal oriented approach for their evaluations. While Caffarella is not unique with this respect, it does exclude other categories of evaluation. Other definitions encompass the ideas of assigning worth and merit but also fGCUS on methods and the context of evaluation. Weiss is largely influenced by research methodology and traditional experimental and quasi-experimental research designs. Weiss (1998) defines evaluation as 11 A systematic assessment of the operation and/or the outcomes of a program or policy, compared to a set of explicit and implicit standards, as a means of contributing to the improvement of the program or policy (p.4). Weiss further expresses evaluation as a process for continuous improvement that can be formative or summative, meaning to evaluate during implementation or concluding implementation, respectively. 
Another interesting aspect of Weiss' work is that she also examines the context in which the evaluation is taking place, although this is not reflected in her definition. Evaluations tend to be highly political. The context is an important aspect to consider because the results of an evaluation may influence political support, funding, or may be used for the dismissal of stakeholders rather than the improvement of the program. What separates Weiss' concept of evaluation is her consideration for the political environment, which suits this discussion of applications of evaluation since adult education and human resources development tend to rely on funding which is rarely guaranteed. Weiss adds focus to Stufflebeam's definition by including the purpose of evaluation as making programs work better and allocating resources for their improvement (Weiss, 1998). Scriven, Stufflebeam and Caffarella tend to focus on outcome evaluation rather than continuous improvement. The critique of the previous definitions shows that this discussion is best suited to adopt Weiss' definition in order to account for continuous improvement and quality assessment of adult education programs. Link to adult education Program evaluation and adult education are both interdisciplinary. While evaluation struggles to gain independent academic acceptance, adult education is recognized as a credible field of study and evaluation a research tradition within it. Non-accredited learning, self-di~ected 12 learning, informal learning and basic education beyond school age are often used to characterize adult education. Adults choose to participate out of general interest or to improve their skills to increase their competitiveness in the often-touted knowledge-based economy. While most informal adult education can be considered self-funded, much of non-formal adult education is provided through social justice outreach, workshops sponsored through government projects and community-based initiatives. Other non-formal adult education activities can be regarded as job-related training or employer sponsored work-place training for the purpose of improving performance on the job. Governments often make evaluation a requirement of grants, obligating many adult education practitioners to conduct evaluations as a condition of continued funding for their programs, whereas businesses want to know if investing in training is making a difference to their bottom line. Like all practitioners, adult education practitioners readily draw from research and studies on evaluation to support their work and programs in schools and communities. Most textbooks dedicated to practitioners include chapters on evaluation and knowledge transfer (Caffarella, 2002; Saks & Haccoun, 2007). Many of the practitioners come from the field of human resources development (HRD) and have access to a variety of books written on the topic. The biggest difference being that HRD books and articles tend to focus on linking training to performance (Swanson, 1998; Dionne, 1996; Phillips, 1996; Brinkerhoff, 1987) and adult educators tend to favor program improvement. In a literature review of evaluation oftraining, Hoyle (1984) found it necessary to differentiate between educational evaluation and training evaluation due to the different approaches used in each case. 
He determined that training or HRD evaluation \"is usually concerned with the assessment of brief programmes for adults, in non-school situations, on subjects which are not susceptible to objective measurement\" (Hoyle, 1984, p. 275). This 13 analogy also ties to the infonnal connotation of adult education. Hoyle found that schools and higher education institutions applied educational evaluation with scientifically based methods and measurable objectives. HRD training on the other hand is problematic to evaluate because of the difficulty in measuring objectives subjectively and the lack of management support for evaluation. This discussion will follow training program evaluation and use work-place educational events for adults to expand on theories and applications of program evaluation. Chapter Three takes a further look at HRD evaluation and introduces models and theories specific to this context. Roots of evaluation research There is some debate over the genesis of the tradition of evaluation research. Some scholars point out that we evaluate ourselves on a daily basis: how we perfonn, decisions we make and reactions we receive, to continuously improve ourselves (Weiss, 1998). Guba and Lincoln (1989) argue that fonnal evaluation can be traced back to the school systems in France and England in the mid-nineteenth century. Dubbing this the first generation of evaluation, early work was dedicated to the measurement and classification of results. In fact, it is from this early period of evaluation research history that the world famous IQ Test was developed and originally implemented (Guba & Lincoln, 1989). Rossi (1999) draws from examples ofinfonnal social health interventions that date from the seventeenth century. Since these documented occurrences, shifts in focus have led to what many consider the initiation or explosion of modem program evaluation research, where current writings that apply a combination of methods, placing value and use are traced to the studies, theories and publications dating back to the 1960s. In the 1960s, evaluation was considered a newly emerging field and a growth industry (AIkin, 2004; Rossi, 1999). Rossi (1999) described this as the boom years of evaluation research 14 due to government and industry demand for the kinds of reports that described the merit and worth of programs. Patton (2008) refers to these opportunities erupting from the \"Great Society\" legislation of the time (p. 14). It was the practice of and the need for evaluation that ultimately drove the development of research and theory. Many studies conducted at that time were based on earlier ideas of progressive education, like that of Tyler (1942) and Campbell's (1957) work on research methods. Evaluation of government-sponsored programs was in high demand in the early 1960s as funding dedicated to post-war building, cultural and social programs was increasing (Rossi, 1999; King, 2003; Patton, 2008). The resources being disbursed needed to be justified in order to sustain these programs. AIkin (2004) describes the early foundation of evaluation research being rooted in social inquiry and accountability and control. Governments especially rely on accountability in the \"broadest sense\" to provide the rationale for resource management (Alkin, 2004, p. 12). Accountability is most often associated with reporting, justifying analyses, and answerability (AIkin, 2004). This type of evaluation finds faults and identifies persons or offices that can be held responsible to address them. 
Program evaluation received much attention during government eras of fiscal conservatism that followed large government social program expenditure (Rossi, 1999). Despite a withdrawal in accountability-based research over time, a thrust has been observed in both private and public sector organizations interested in enhancing performance and calculating the return on investment (Phillips, 1996). This is seen in the amplified cases of businesses and training units producing human resources development evaluations and in the work performed at the Office of the Auditor General (OAG) of Canada and the United States Government Accounting Office (GAO) (Rossi, 1999; Alkin, 2004; OAG, 2007). The case study presented in Chapter Four documents the work of the OAG assessing a federal public sector training program. While accountability and control remain important for supporting evaluation practice in government and business, the tradition of evaluation research and modeling was largely built on social inquiry (Alkin, 2004). Alkin relates social inquiry to the recognition of the "unique social dimension" of people acting in groups (2004, p. 15). This pillar of evaluation research began in a positivist research methods frame. Rossi (1999) explains that the "diversity of social interventions" has influenced the expansion of evaluation research within social science departments and became conventional as preferred methodologies changed (p. 9). Research in this domain has contributed to the theoretical catalogue, philosophical aspects and contemporary methodology that drive practice. The literature reveals that much of the academic publication on evaluation considered to be the building blocks of the tradition was written under the guise of social inquiry. The research diverges into three streams, enriched since the pioneer works of the 1960s: methods, valuing and use.

Table 1
Descriptions of research streams in program evaluation

Methods: Research dedicated to finding appropriate experimental and quasi-experimental research designs for knowledge construction. Relies on predetermined objectives and the use of methods like control groups. Influenced by social science research methods and the positivist paradigm, more precisely educational measurement.

Valuing: Concerned with the evaluator's ability to make judgments and choose outcomes to evaluate. Research in this category contributed to the development of program-evaluation-specific theories and the role of the evaluator juxtaposed to the researcher, e.g., Scriven (1967).

Use: Most often associated with the practice of evaluation, decision-making and addressing the needs of stakeholders. Adapted according to decision-oriented theories. Evaluations based on use are designed to be informative, where results directly impact decisions, e.g., the Program Evaluation Standards (American Evaluation Association) and Patton's (2008) utilization-focused evaluation.

Note. Adapted from "An Evaluation Theory Tree" by Alkin, M. C. and Christie, C. A., 2004, in Alkin, M. C. (Ed.), Evaluation roots. Thousand Oaks, CA: Sage. pp. 12-66.

Leading theorists and noted shifts in focus

A number of key scholars can be singled out for having been cited continuously throughout the boom of evaluation literature. Some names, like Campbell, Scriven, Stufflebeam, Guba and Lincoln, for example, have been mentioned and cited already in this discussion as contributors of theory, models and practice of evaluation that transcend disciplines.
Most in the field tend to be working out of British or American universities and continue to be active producers of new knowledge and are active within professional evaluation associations. Their works are seen as core pieces in evaluation research and offered ideas that shifted the direction of research over the last 50 years . In a dedication to the sixth edition of his popular textbook, Evaluation: A Systematic Approach, Rossi (1999) describes Campbell as the \"architect of evaluation theory and practice\" (n.p.). Campbell is best known in evaluation and experimentation circles for his work in the 1960s on eliminating bias while conducting field research (Alkin, 2004). Campbell was a trained psychologist but his work informed studies early in the development of program evaluation literature. His peers consider him a great social science methodologist of his time (AIkin, 2004; Shadish & Luellen, 2004). For these reasons, Campbell is classified as the grandfather of the methods research stream of evaluation research for his groundbreaking experimental approach to evaluation and work on cause and effect, which are firmly grounded in positivist thinking (Shadish & Luellen, 2004). AIkin (2004) describes Experimental and Quasi-experimental Designs for Research (Campbell & Stanley, 1966) as one of the articles that contributed directly to the advancement of social science research methods. Scriven is noted as the first in academia to publish a general theory of valuing. Others after him continue to use his work as a baseline (AIkin & Christie, 2004). His early scholarly 17 writing presented a shift within the tradition diverting research away from its core in methodology towards valuing. Scriven is a multidisciplinary theorist with over 300 works spanning education, philosophy, mathematics, psychology as well as other social disciplines, but is best known for his contribution to evaluation research. He held presidential posts for the American Educational Research Association and American Evaluation Association. Scriven's thinking on the role of the evaluator as the prime determiner of the worth of the program being examined cleared the way for knowledge construction in valuing. Scriven is also recognized for introducing an alternative to experimental design, what he called the \"modus operandi,\" whereby the evaluator determines a causal chain of events (AIkin & Christie, 2004, p. 33). In 1972,' Scriven published Pros and Cons of Goal-free Evaluation outlining a controversial approach to evaluation that argued against using objectives at a time when most evaluators depended on pre-determined program objectives to perform their work. Scriven advocates for the evaluator to decide what the necessary program outcomes were in order to identify the accomplishments of the program. A better-known model to affect the use stream of evaluation research is Stufflebeam et al. 's CIPP model introduced in 1966 for large educational proj ects and evo lving thereafter. AIkin and Christie (2004) describe the CIPP model as representing four types of evaluation: context, input, process, and product. The model can be applied in pieces or in full. The model is intended as a framework for guiding summative and formative evaluations that support the decision-making process of identified stakeholders. Stufflebeam expresses this model as a work in progress classified under improvement-oriented evaluation that can also be considered as studies in decision-making and accountability (Stufflebeam, 2002; 2001; 2004). 
The processes to support decision-making for the purpose of delivering better programs to clients focus on the use 18 of evaluation. Stufflebeam's CIPP model is used most commonly in the field of education to analyze education programs (Casey, 2006). The work of Guba and Lincoln (1989) offers insight into the shifts of research and influence that occurred in the tradition. Early research and study came from a positivist scientific paradigm and impacted the work of other researchers that followed. In Fourth Generation Evaluation (1989), Guba and Lincoln began to express ideas that exhibit a postmodem influence. Their contribution is the perception that the evaluator is not the sole researcher or determiner of value, as endorsed by Scriven and others, instead the evaluator is a \"constructivist investigator\" whose role is \"to tease out the constructions that various actors in a setting hold, and so far as possible, to bring them into conjunction\" (Guba & Lincoln, 1989, p. 142). They claim that the evaluator is but one perspective and hence can offer only one reality. Negotiation thus becomes a fourth stream of evaluation research since evaluators often work with multiple stakeholders who each hold their own realities of the program. Studies in evaluation moved from true experimentation to constructivist methods of using \"maximum variation sampling to identify the broadest scope of stakeholders who are interviewed sequentially\" to prove the range of individual constructions or realities (AIkin & Christie, 2004, p. 42). Guba and Lincoln demonstrate the variety of thinking and approaches that occurred within evaluation research and the paradigm shift that transpired. Theory and practice Evaluation is a field that is securely connected to practice. Many practitioners have had little exposure to established theories. Most find themselves in evaluator roles out of necessity to the organization, even though they have been trained in other disciplines (Christie, 2003). While the collection of literature has continued to expand, including textbooks and handbooks 19 dedicated to practitioners, there seems to be a great disconnect between practice and theory to the extent that King (2003) describes program evaluation as a field in need of validated theories. This statement is supported by research of the biggest collective of practitioners, the American Society of Training and Development (ASTD), who found that their members do not often follow the recommendations of evaluation literature, citing reasons of finding the literature's advice neither applicable nor useful. Most of the research seems to follow a pattern of after-the-fact research, based on experience. Stufflebeam's CIPP model has evolved to fill a theory gap in evaluation research over time in what he describes reflects prolonged effort and a modicum of progress to achieve the still distant goal of developing a sound evaluation theory, that is, a coherent set of conceptual, hypothetical, pragmatic, and ethicai principles forming a general framework to guide the study and practice of evaluation (Stufflebeam, 2004). According to Horsch (1999) this lack of theory and support for research is one of the reasons that Scriven describes program evaluation as having earned its recognition as a separate discipline within the social sciences slowly. The emphasis on evaluation seems to always be on conducting one rather than theorizing about it (King, 2003). 
That being said, the community of evaluation theorists is small and they are heavily influenced by each other's work, adding to its slow progress (AIkin, 2003) In The Challenge of Studying Evaluation Theory, King (2003) asserts that evaluators are most often working for their clients rather than testing theories. The key for successful evaluation tends to focus on more pragmatic results than academic achievement. She attributes this general lack of a theoretical base to six underlying issues: 1. Lack of general conceptual consensus 2. Tendency of favoring the practical focus 3. Focus on evaluation models and methods over theory 4. Overshadowing focus of program theory 20 5. Lack of research support 6. Relatively young field Of particular interest to this discussion is how evaluators are seeking to address specific political contexts with broad models that exacerbate the practice / theory dichotomy. According to King, the development of alternative models, related again to the issue of practicality trumping theory, is a primary driver away from theory development and characteristic of a field in its infancy. This may explain the lack of funding available to test theories. King questions whether the study of evaluation theory deserves funding \"before developing models that deserve examination\" (King, 2003, p. 59). Popular theories that exist include the dichotomy between the universalist view and the contingency view. Chen (2004) offers the contingency view in support of practitioners because it advocates that the choice of evaluation strategy and approach is dependent on the situation and advocates that there is no blanket model for all types of evaluation. Other common theories centre on scientific credibility and stakeholder credibility. The latter case also tends to be more helpful to practitioners who are obligated to prioritize the opinions of the many stakeholders involved in the development and delivery of a program over the method used to collect data. Despite the recorded disconnect between theory and practice, Chen (2004) advocates that to rightly determine the worth and merit of a program, the process must be guided by program theory to explain the causal processes that affect change (p. 156). As Visser (n.d.) points out, without academic critical analysis and adequate supporting theories, there is a risk that \"evaluations may evolve into even more politically charged events than they already are, and the evaluator becomes just one of the stakeholders\" (n.p). Christie (2003) points out that while most practitioners operate without a solid theoretical framework, they db proceed with some conceptions or instincts they have about the work they 21 are perfonning. If current theories are deemed inapplicable by today's practitioners it may be of interest to conduct studies into \"the implicit or folk theories of evaluation that exist in the field,\" as recommended by Christie (2003, p. 92). The question remains of what guides the work of evaluators (Christie, 2003). The literature review of program evaluation shows an evolution from a fledgling field to a profession has been slow and arduous but has yielded success and recognition. Like many of the social science disciplines, evaluation has its roots in positivist research approaches influenced by the natural sciences. The developments in the field reflect the debated paradigm wars, where research approaches became more pragmatic and began incorporating constructivist methods. 
The research tradition does seem to suffer from an identity crisis because most work is conducted on behalf of other disciplines and only a handful of universities have a faculty dedicated to its research and advancement. Nevertheless, evaluation has rightfully earned a place within social science research for the contribution of practical models, and ideas centered on methods, value and use. Since program evaluation is interdisciplinary, concepts often are borrowed and explored under the guise of another field of study, and the literature is continuously updated. Adult educators draw on advances in program evaluation for their work in HRD. The tradition may benefit from a movement to attain a common language and definition to what it means to evaluate. If these debates can be settled earlier then more time can be dedicated to conceiving and testing theories that are useful on practical and scholarly levels. 22 Chapter Three: Evaluation in Human Resources Development The previous chapter examined the research tradition of evaluation from both academic and practice perspectives. It was demonstrated that evaluation research has come into view mostly through the work of professionals needing evaluation solutions for their education interventions and programs. The thrust in research comes from two avenues: public social programming and human resource development (HRD), where the latter tends to combine more adult education theories and practice with economic and organizational psychology theories. Statistics Canada (2001) reports that along with learners who finance their studies, employers are equally the largest providers of financial support for adult education, with tuition reimbursement, paid education leave and workplace learning. With this type of investment, companies are more often than before putting emphasis on the evaluation of training. The most solid evidence of the work and contributions from the HRD field is published in industry magazines such as T+D and Training and supported by industry associations like the American and Canadian Societies for Training and Development. The requirement of training directors and instructors to provide substantiation for training and HRD investment helped push forward the creation of theories and models of evaluation with an industry perspective, although training or HRD evaluation has not experienced that same degree of growth or attention as more general program evaluation. Stufflebeam would generally classify HRD evaluation studies as improvement-accountability oriented evaluation and cost-benefit analysis approaches. The first emphasizes the involvement of multiple stakeholders to determine the worth and merit of a program whereas the second uses a strictly quantitative measure of worth and merit. While these types of evaluations are not limited to HRD, some of the best examples of their use are from this field. The most frequently cited contributors have 23 been credited with the development of the most pragmatic evaluation models, used by many Fortune 500 companies. HRD evaluation models also tend to commonly use direct assessment strategies to identify the merit, value and worth of training programs (Casey, 2006). The work of Kirkpatrick, Phillips and Brinkerhoff continue to gain attention in HRD publications. Research and literature in HRD evaluation reveals that significant barriers exist when organizations attempt to evaluate training. 
In fact, Holton (2005) claims that despite best efforts, \"only about half of the training programs are evaluated for objective performance outcomes. Additionally, less than one third of training programs are evaluated in any way that measure changes in organizational goals or profitability\" (p. 258). Holton (2005) believes that this is evidence for the poor state of training evaluation research and inadequate HRD practice (p. 258). Holton (2005) argues that current training evaluation models are thus \"not good decision-making models for HRD in organizations\" and so advanced evaluation tends to not be needed (p. 258). Dionne (1996) suggests that the best attempt to evaluate within an organization is to merge the efforts of researchers, management and trainers to develop an understanding of the requirements of each stakeholder group and create efforts for a concerted approach. This idea is not unlike Patton's (2008) Utilization-focused Evaluation where stakeholders must work together in order to produce useful reports for implementation rather than collecting dust in library stacks. Dionne listed the reasons for the complexity of evaluating training to include the absence of a unifying model and theory, unreliable current research, inconsistent methodologies and measurement, and how organizational constraints continue to limit implementation of a study (Dionne, 1996). Dionne recommends that some concessions by researchers, managers and trainers are required if progress is to be made in this field. First, he states that true experimental approaches are unrealistic and do not reflect the post-training environment, and second, trainers and managers 24 need to build training that is well-supported after the initial intervention. The theme, according to Dionne, is to balance the validity of the research findings with the credibility of providing positive results. Many models dedicated to the evaluation ofHRD, workplace training specifically, have been published since the rise of evaluation research in the 1960s, with a resurgence occurring in the 1990s. In recent decades, attempting to isolate the benefit of programs in terms of financial merit and cost analysis has become more popular. Most models tend to build on the work of. others while some approaches have become more influential to practitioners in the field. Evaluation methods and data collection tools used to support these models tend to evolve with the availability of new technology. The following section briefly details models commonly applied by organizations to evaluate training effectiveness, merit and worth within industry and business organizations. Kirkpatrick 's four-level model Individuals who conduct evaluation research in HRD and training tend to be overlooked in academic program evaluation literature since their work is not rigorously peer reviewed before being published. If a unifying model could be proposed, the exception would be that of Kirkpatrick, who has been repeatedly cited in the literature as being the grandfather of evaluation of training effectiveness, whose model has remained largely unchanged for 50 years. Kirkpatrick is best known for his contribution of a hierarchical four-level model of evaluation that is focused on outcomes, where each succeeding level provides more important information than the last to the extent that he claims each level has a causal effect on the next (Saks & Haccoun, 2007). Many practitioners consider it as the gold standard for HRD evaluation. 
The idea of the four-level framework was born when D. Kirkpatrick needed to describe how he planned to evaluate a counseling program he developed as part of his Ph.D. dissertation (Kirkpatrick & Kirkpatrick, 2006). The model has evolved and has been revisited over the decades since first being published in the Journal of the American Society of Training Directors in a series of articles called "Techniques for Evaluating Training Programs" in 1959, while Kirkpatrick was teaching at the University of Wisconsin. D. Kirkpatrick's son, J. Kirkpatrick, who aligned the levels of evaluation with strategic business planning, provides the most recent modifications (Kirkpatrick, 2007). In fact, D. Kirkpatrick did not modify or revisit his original concept until the first edition of his book Evaluating Training Programs: The Four Levels was published in 1994.

Table 2
Kirkpatrick's four-level model

Level 1, Reaction: Data at this level measures how the learners reacted to the training. Questions typically asked: Did they like it? Are they satisfied with the materials and the instructor? Data is typically collected by a survey, called a "smile sheet," distributed at the end of training.

Level 2, Learning: The second level attempts to measure the learning gain achieved as a direct result of the training intervention. Data is typically collected by pre- and post-tests and through learner self-reporting. Evaluation here moves beyond satisfaction towards reporting on advances in skills, knowledge and attitudes.

Level 3, Behavior: This level determines the transfer of skills, knowledge, attitudes and changed behaviors that are attributed directly to the training. This evaluation most often represents the actual degree of a program's effectiveness by verifying that trainees are applying what they learned in their work. Data is collected by survey, interview and observation.

Level 4, Results: The final stage, which is an accumulation of the previous three levels, concludes what the effect of the training was on organizational outcomes. The results typically translate to increased productivity, higher profits, decreased accidents, etc. Quantitative data collection tools such as performance records are used at this level. Despite being the motive for investing in HRD training, results at this level are often difficult to measure.

Note. The levels as described by Kirkpatrick have only slightly evolved over the decades since being introduced in 1959.

D. Kirkpatrick's original aim was to encourage training directors to "increase their efforts in evaluating training programs" (Kirkpatrick, 1996, p. 54). Today, the model has been widely adopted by organizations to guide training program evaluation. Bates (2005) attributes the popularity of the model to the systematic method practitioners can use to classify program outcomes, the way practitioners in for-profit sectors can use information on results to relay success in business terms, and the way the model simplifies the evaluation process by streamlining variables and reducing the burden of collecting pre-course performance indicators (p. 221). Kirkpatrick also helped focus evaluation on outcomes and promoted the idea that single-outcome results are not useful when evaluating complex organizational programs (Bates, 2005). Kirkpatrick's approach has been met with criticism in academic circles, mostly due to its excessive popularity among practitioners despite its lack of theoretical grounding.
Most critics of the four-level model have accused it of being a taxonomy of evaluation rather than a model (Holton, 1996a). Casey (2006) explains that another disadvantage is that it can take too many resources to apply the model for an organization when resources for evaluation tend to be limited. As well, this direct assessment method of evaluation requires learners to be observed, surveyed, tested, which takes a lot of time and reduces productivity (Casey, 2006). Other limitations of the model centre on its incompleteness, the \"assumption of causality\", and the \"assumption of increasing information as the levels of outcomes are ascended\" (Bates, 2005, p. 221). In his analysis, Holton describes the failure of the four-level model as a model due to \"incomplete implementation and little empirical testing\" (Holton, 1996a, p.6). Holton (1996a) argues that the Kirkpatrick model is representative of a lack of research committed to building explanatory theories in the field, which he identifies as stunting the continued growth ofHRD 27 evaluation. This, he claims, is occurring despite HRD evaluation being crucial in a time of increased international economic competition. Bates (2005) also explains that research over the last few decades shows that there are many organizational and environmental characteristics that can have an impact on training either before, during or after the intervention. The model offers an \"oversimplified view\" (Bates, 2005, p. 222) that does not adequately account for the context in which the training was developed and delivered. Kirkpatrick's model implies that issues such as personnel support, access to resources, morale, values, and others, are factors not \"essential for effective evaluation\" (Bates, 2005, p. 222). Critics like Holton and Bates base their position on the few studies and literature reviews conducted on the application of the Kirkpatrick four-level model. The work of Alliger and Janak (1989) uncovers some of the assumptions of causality related to implementing Kirkpatrick's four-level model. Alliger and Janak report, and oflate, Alliger, Tennenbaum, Bennett, Traver, and Shortland (Bates, 2005), having found little evidence that illustrates a hierarchical correlation between levels and offer an alternative model of causality. This means that ideas such as learners needing to have reacted positively in order for the training to be successful, have been largely unfounded. In most cases the researchers believed that authors of evaluation studies assume the Kirkpatrick model to be true without testing its underlying assumptions of causality. The final criticism is based on the assumption that the more advanced the level of evaluation obtained, the stronger and more informative the results will be. Bates (2005) argues that in practice, the \"conceptual links\" (p. 222) of the model are weak and the resulting data collected from the studies does not substantiate that claim. Despite the inconsistency between theory and practice, Alliger and Janak claim that most other HRD evaluation models offered to practitioners are similar in presentation and bare similar criticisms ofthe Kirkpatrick model, 28 meaning, they found that no ideal evaluation model existed at the time of their study. It can be argued that no other models have been able to compete against the simplicity of Kirkpatrick's ideas and presentation to date. 
In response to the criticism, Kirkpatrick claims that the initial techniques were never offered as a model, but rather it was others in the field that interpreted it to be a model (Kirkpatrick, 1996). Despite his early motivation, D. Kirkpatrick and J. Kirkpatrick currently market the levels as Kirkpatrick's Four-Level Model as evidenced in the article ofthe same title published in the January 1996 issue of Training and Development. The problem at issue is that the model falsely leads practitioners to assume there is a relationship among the levels first identified by Kirkpatrick. Holton believes that the Kirkpatrick model offers a too simplistic illustration of the causal connections between reactions, learning, behavior and impact and ignores factors such as motivation and external events, among others. Holton suggests dedicating resources to developing a true model of evaluation for the benefit of HRD evaluation (Holton, 1996b). Kirkpatrick's greatest contribution to the field of evaluation has been to ground evaluation by providing a universal language and widely accepted framework for approaching evaluation (Brauchle & Schmidt, 2004). Noted by Alliger and Janak, Kirkpatrick has succeeded in terms of offering a \"global heuristic for training evaluation\" (1989, p. 339). Kirkpatrick identified four stages at which evaluators can capture data ranging from simplest to more complex with emphasis moving from the individual towards impact on the organization. The model continues to dominate HRD evaluation almost 50 years later arguably due to its simple approach. Most of the practical work today in HRD evaluation claims to build on or be 29 influenced by Kirkpatrick's four levels. Such influenced researchers have suggested one to two additional levels to account for external influences and financial analysis. Phillips' five-level framework Cost or financial analysis has reemerged in HRD evaluation where evaluators within organizations are being asked more often to produce evidence of training impact represented by a monetary value. This type of analysis can assist when communicating results with clients and for \"gaining consensus about potential accomplishments and costs\" (Parsons, 1997). Most supporters of this approach to evaluation tend to put training in the same light as other large investment costs for organizations, which require justification. There are many ways to calculate the value of the training investment. Murray and Efendioglu (2007) claim that there are four methods commonly used today and Phillips' return on investment (ROI) method is one of those. Bottom line approaches to report training results are not unfamiliar to industries and private sector organizations. Case studies have been found that date back to the early 20th century with the earliest being the accounting model developed by DuPont in 1919 (Wang, Dou & Li, 2002). Essentially it is a study of the \"causality of company training and company performance\" (p. 204). ROI as an approach to HRD evaluation has been routinely documented since the early 1960s (Wang, Dou & Li, 2002); a trend pushed by large-scale job training programs in the United States of the time, which coincided with the growth of social program evaluation research. Despite the growing demand to know the value of corporate training, many of the studied experimental, non-experimental and quasi-experimental designs were determined to be limited when applied to the HRD field because they lacked focus on performance improvement. 
Stufflebeam (2001) classifies these types of evaluations as cost-benefit analysis approaches. Reasons for the increasing use of cost analysis of education and training include improving efficiency in allocating resources, projecting costs, comparing alternative methods of meeting program objectives, and testing the expenditures proposed for program expansion (Barker, 2001). Studies may also attempt to determine the cost to the organization of not offering the training in the first place. The purpose of these approaches, according to Stufflebeam, is to help organizations determine the costs associated with program inputs, determine the monetary value of the program outcomes, compute benefit-cost ratios, compare the computed ratios to those of similar programs, and ultimately judge a program's productivity in economic terms (2001, p. 31). Cost analyses always reflect the organization's investment rather than the learner's contribution. These types of studies can be of value particularly to outsiders interested in replicating the returns in their own companies. Analyses can range from comparing the outcomes of similar programs per dollar spent to estimating the effect of the training investment on the organization. In terms of HRD, it is easier to apply such a model to large-scale development programs because they are more often designed with quantifiable outcomes. While Stufflebeam considers this type of analysis an "important but problematic consideration in program evaluations" (2001, p. 32), since the 1980s interest in this approach by industry practitioners has gained momentum and is permeating into the public sector.
Major hurdles for most evaluators seeking training ROI are the difficulty of obtaining accurate accounting details for the program's inputs and of placing a monetary value on anticipated and unexpected outcomes. These challenges are supported by the study of Murray and Efendioglu (2007), who attempted to compare popular valuing methods. They found that calculated ROI figures were plagued with too many assumptions and omissions and lacked consistent calculations, which prevented the figures from being compared to alternative valuing methods and to other organizational capital investments. Their study concluded that there were many "problems in evaluating the true impact of training" (Murray & Efendioglu, 2007, n.p.). Barker (2001) reminds us that the bottom-line impact for firms is uncertain: although training seems to provide discernible positive effects in areas such as employee turnover, staff morale and product/service quality, it is difficult to gauge the effects on productivity, and there are difficulties in establishing links between firm performance and training (p. 16).
Phillips is celebrated as the current leading author and champion of ROI. Phillips generated considerable support for ROI beginning with a series of articles he wrote for the 1996 volume of Training & Development, published by the American Society for Training and Development (ASTD). This followed case studies and a multivolume 1994 casebook, In Action, written by Phillips and also sponsored and distributed by ASTD. The partnership with ASTD provided access to the largest community of practitioners looking for business-related approaches to program evaluation to help them defend their training budgets in a time of industry downsizing.
Currently, Phillips is chairman of the ROI Institute, which markets itself as the official certifying body for the ROI Professional designation. He offers his method as meeting the requirements of practitioners, management and evaluation researchers, and as representative of a paradigm shift away from training for activity towards training for results as an accountability measure (Phillips, 1997). Phillips has continuously revisited and marketed the ROI process model/methodology he refined in the 1990s. According to ASTD, Phillips' process is the most documented, with over a hundred case studies using this methodology. Globally, over 2,500 individuals have been certified in the methodology, and the 15 books on the process have been published in 25 languages (ASTD, 2007, p. B-3).
Phillips' ROI method is characterized as the addition of a fifth level to the commonly accepted Kirkpatrick four-level model (see Table 3). The difference, according to Phillips, is that he developed a process model of how the data are collected, measured, quantified and integrated at each level. The final, and "ultimate level of evaluation", is to calculate the ROI (see Figures 1 and 2) (Phillips, 2007, p. 5).
Table 3
Phillips' five-level framework
Level 1 - Reaction & planned action: Measures participants' reaction to the program and outlines specific plans for implementation
Level 2 - Learning: Measures changes in skills, knowledge or attitudes
Level 3 - Application & implementation: Measures changes in behavior on the job and specific application and implementation
Level 4 - Business impact: Measures the business impact of the program
Level 5 - Return on investment: Compares the monetary value of the results with the costs of the program, usually expressed as a percentage
Note. Adapted from Phillips, J. J. (2007a). "Table 4: Five-level framework." Measuring ROI: The process, current issues, and trends. ROI Institute. p. 5.
The ROI calculation is a monetary conversion, expressed as a percentage, of the training's business impact on the organization (Kirkpatrick level four). Worth noting is that to advance from one step to the next, Phillips requires data to be captured and analyzed at all the previous levels, similar to the Kirkpatrick model. He does note that analyses at the lower levels of evaluation do not need to be comprehensive if an ROI is being planned (Phillips, 2007b). Because of this similarity, it can be deduced that Phillips relies on the same assumptions of causal relationships among Kirkpatrick's four levels, which have been deemed a problematic approach to evaluation (Alliger & Janak, 1989; Holton, 1996a; Bates, 2005). Phillips' ROI method has drawn additional criticism for operating in the absence of an analytical framework and for its reliance on judgment-based approaches (Wang, Dou & Li, 2002).
Figure 1
Formula used to calculate ROI
ROI = ((Total benefit - Total cost) / Total cost) x 100%
Note. Adapted from Phillips, J. J. (2007). "ROI Model." In Return on investment in training and performance improvement programs (pp. 25-41). Houston, TX: Gulf Publishing.
Figure 2
Phillips ROI Model
Note. Reproduced from Phillips, J. J. (2007). "Figure 1: The ROI Model." Measuring ROI: The process, current issues, and trends. ROI Institute. p. 9.
Wang, Dou and Li (2002) argue that there is a dilemma with ROI measurement, to the extent that they claim current ROI approaches "are inadequate to meet the needs of HRD practices" (p. 210).
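Returning to the formula in Figure 1, its arithmetic can be illustrated with purely hypothetical figures that are not drawn from any study cited in this paper: if the total cost of a program is $50,000 and the monetary benefits attributed to it total $65,000, then ROI = (($65,000 - $50,000) / $50,000) x 100% = 30%. Expressed as the benefit-cost ratio described earlier, the same figures give 65,000 / 50,000 = 1.3; the ROI percentage simply restates the net benefit per dollar of cost, so a 30% ROI means each dollar invested is estimated to return the dollar itself plus 30 cents in net benefit.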
Organizations that measure progress in terms of intangible measures, like client satisfaction and employee motivation, are often excluded from HRD literature since their success cannot be assigned a dollar value. These organizations, which often include government agencies and non-profit bodies, tend to measure the effectiveness of training in relation to meeting the desired program objectives, not returned profit. Another obstacle is that current models built on business impact do not use methods that sufficiently isolate HRD impacts from other existing variables. Although Wang, Dou and Li note that Phillips has created general guidelines to address external variables and intangible measures, his ROI method falls short of supporting these business objectives.
A final concern addressed by Wang, Dou and Li (2002) is that many ROI approaches rely on employees and managers to assign dollar values to the results of training. This judgment-based estimation is problematic because the evaluator cannot guarantee that the participant has the skills or qualifications required to make this kind of judgment. This type of data gathering can be considered ironic for HRD, since training tends to centre on improving competency-based performance. Wang, Dou and Li add that previous research by industrial and organizational psychologists into judgment-based approaches has shown that the resulting calculation is often inflated due to overconfidence, with participants focusing on what went well rather than on the hurdles or difficult topics in the training. Phillips attempts to tackle these criticisms of subjectivity by including a confidence rating in his ROI methodology. At level four, Phillips asks participants to estimate the extent to which their post-training success on the job is a direct result of the training received. Participants are then asked a follow-up question to determine their degree of confidence (as a percentage) in their previous response. Even though Phillips errs on the side of caution and advocates that only the most conservative estimates be used, Wang, Dou and Li point out that what often results are inflated financial figures that are supposed to represent the ROI of a training program. Given the unreliability associated with ROI calculations, some critics shrug off ROI studies as window dressing that contains no concrete evaluation merit, value or use.
Other challenges with the Phillips model were identified by Casey (2006). Casey suggests that the ROI figure is inadequate because it does not identify the organization's objectives. Further to that idea, Parsons (1997) believes that financial analysis tends to overlook important program characteristics and fails to incorporate organizational values into its conclusions. According to Holton and Naquin (2005), this is why evaluation does not contribute to decision making and is rarely done. Holton and Naquin consider the Phillips model to be an example, most likely the best documented example, of a rational-economic evaluation model for HRD, in which the process is rational and data-driven to assist with decision making (Holton & Naquin, 2005, p. 262). At issue here is that the limited evolution of the models and tools to date has focused on few questions, constructed a limited view of what evaluations can offer organizations, and generally failed to prove their value (Holton & Naquin, 2005, p. 266). Casey (2006) further suggests that ROI is but a figure at a moment in time, not a stable measure.
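The judgment-based estimation at the heart of these criticisms can be made concrete with a hypothetical illustration; the figures, and the step of discounting an estimate by both percentages, are offered here as an interpretation of the attribution and confidence questions described above rather than as a calculation reported in the sources cited. If a participant estimates that an on-the-job improvement is worth $20,000 a year, attributes 50% of that improvement to the training, and reports being 80% confident in the estimate, the conservatively adjusted benefit claimed for the training would be $20,000 x 0.50 x 0.80 = $8,000. Even with this discounting, as Wang, Dou and Li point out, the underlying estimates themselves may still be inflated.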
The figure can also be misleading due to factors not associated with the training, such as changes in market pressures. Although the Phillips approach has been determined to be problematic by researchers, its popularity among HRD practitioners continues to increase and to draw the attention of business managers. Its popularity may be due to the straightforward figures that translate well in business meetings and in fiscal planning for training budgets, but as Murray and Efendioglu found, the calculation is flawed. In spite of their criticism of ROI evaluation, Wang, Dou and Li acknowledge that Phillips attempts to show how the four-level framework can be measured, and Murray and Efendioglu admit that methods are becoming more rigorous in "assigning direct and indirect costs of training" (2007, n.p.). The complete use of ROI remains infrequent, while practitioners continue to measure levels informally and the emphasis continues to rest on assessing learner reactions.
Phillips should be acknowledged for presenting a process model in support of Kirkpatrick's long-accepted classification system that still retains its original air of simplicity. For most practitioners, the benefits of the ROI model offered by Phillips are its explicit method and the reliability of survey and collection tools from levels one to four that have been tested in over 100 organizations. The model is not a diagnostic tool to explain why programs may need improvement or why they were successful, but for most HRD practitioners who have limited exposure to rigorous evaluation design, Phillips provides tested tools that need little to no customization so they can deliver an evaluation report to management. This tested approach can offer credible results for the organization from the perspective of an HRD evaluation novice in explaining the outcome of a program, but not where the program failed or succeeded.
Six-stages evaluation and the Success-case method
Brinkerhoff is credited with offering a more precise and comprehensive purpose for evaluation in terms of training and development (Marsden, 1991) through the Six-stages model and his Success-case method (Brinkerhoff, 1983). Introduced in 1983, the Success-case method is largely viewed as an alternative to the hierarchical level frameworks. It remains outcome oriented but relinquishes the requirement that learners know the intervention's "clearly defined and agreed-on goals beforehand" (Holton & Naquin, 2005, p. 265). In the Six-stages evaluation, which followed in 1987, Brinkerhoff identified a series of six logical stages leading towards the overarching purpose of HRD, which according to him is organizational pay-off. Brinkerhoff's Six-stages model looks similar to the Kirkpatrick and Phillips models in that it can also be considered a rational-economic evaluation model, whereas the Success-case method is classified under bounded rationality (Holton & Naquin, 2005). In the Six-stages evaluation, Brinkerhoff adds a focus on the initial needs of the organization through planned needs assessments and design; however, the model is intended to be applied as a cycle of stages rather than a hierarchical ladder. What all the models discussed so far share with the Six-stages model is the common idea that to move from one stage to the next, the questions embedded at the previous stage must be answered.
Table 4
Brinkerhoff's six stages of effective HRD evaluation
Stage 1 - Goal setting and needs analysis: To determine that an identified problem represents a training need and to determine the real goals of the program
Stage 2 - Program design: To determine the most appropriate training strategy
Stage 3 - Implementation and operation of HRD: To determine if the chosen strategy is successfully implemented
Stage 4 - Immediate outcomes: To determine if learning occurred and to what extent
Stage 5 - Endurance and application of immediate outcomes: To determine usage outcomes (at the individual level)
Stage 6 - Organizational benefits: To determine impacts and worth (at the organizational level)
Note. Adapted from Brinkerhoff, R. O. (1987). "The six stages of effective HRD evaluation." Achieving results from training. San Francisco, CA: Jossey-Bass; Marsden, M. J. (1991). Evaluation: Towards a definition and statement of purpose. Australian Journal of Educational Technology, 7(1), 31-38.
While stage one of the Six-stages model can be equated to a learning needs assessment and stage two to instructional planning, the model's overall goal is identical to that of other HRD evaluation models: to determine individual and organizational impact. All things considered, the Brinkerhoff approach offers a more useful approach to training evaluation than Kirkpatrick or Phillips, but it is more comprehensive and complex despite its advantage of being specific to training. The Six-stages model has been mostly ignored in HRD evaluation practice, perhaps because the available alternatives appeared simpler, although Brinkerhoff's ideas are often explored in HRD evaluation literature. According to Marsden (1991), the Kirkpatrick model "falls short of an ideal model because it is entirely outcome-oriented whereas the Six-stages model is integrated to include instructional activities in the planning, design and implementation stages of the instructional process" (n.p.). It is implied that her criticisms apply to the Phillips model as well. Also, because the model tries to indirectly quantify organizational impact, Brinkerhoff's model endures the same criticisms as other rational-economic models: it requires too many resources to be fully implemented, can be misleading, and is limited in terms of contributing to the decision-making process of organizations (Casey, 2006; Holton & Naquin, 2005).
The Success-case method uses learner accounts of success to identify outcomes. After the training, a few examples where the training seems to have really made an impact are identified and selected because these learners appear to have benefited the most from the intervention. The cases are selected using the "intuitive judgment of the trainer" (Brinkerhoff, 1983). The purpose of the approach is to collect information to answer the following questions:
1. How have you used the training?
2. What benefits can be attributed to use of training?
3. What problems did you encounter in using the training?
4. What were the negative consequences of the training and/or its use?
5. What criteria did you use to decide if you were using the training correctly or incorrectly? (Brinkerhoff, 1983, p. 58)
Brinkerhoff (1983) claims that what separates this approach from the others discussed so far is that it "seeks information about a few subjects rather than seeking thinner, quantifiable data about many subjects" (p. 58). This is important because, according to Brinkerhoff (2006), "training programs produce reliable results" (p. 24),
yet instances where learners achieve great success or realize none are averaged together in the data, showing "quite mediocre results" (p. 24). The Success-case method can be used to leverage predictable results and "make a business case for taking specific and concrete actions to support training" (Brinkerhoff, 2006, p. 25). Despite this difference, the Success-case method is criticized for relying on trainers to decide what the critical success factors on the job may be, and it may not identify what learners have problems with when returning to work (Casey, 2006). The method relies on judgments, which Wang, Dou and Li (2002) had identified as problematic.
The review of HRD evaluation models and frameworks shows that little progress has occurred since Kirkpatrick introduced his four-level model in 1959. Most research traditions go through some degree of growth and adaptation over a span of 50 years, as can be seen in the progress of the sciences. This lack of evolution may be due to several reasons. To speculate: little research has been dedicated to improving HRD evaluation models, and evaluation has generally not been considered a priority by training departments. It may be too expensive to study, and many organizations are more concerned with results than with process. Christie explained that practice tends to overshadow modeling and theory testing. Most HRD research draws from scholarly literature in program evaluation, economics, and industrial and organizational psychology. The question of why greater improvements to the field have not been realized remains unanswered; however, most researchers agree that better theories are required to ground models, make them more applicable, and help explain program outcomes (Holton, 1996a; Ottoson, 2000). Until such a theory exists, practitioners will continue to use the methods available to them that can be implemented in a straightforward manner and at the lowest cost to their organizations. The case study presented in Chapter Four is an account of HRD evaluation within the Canadian public sector. The federal agency examined used the Phillips model to assess and report on ROI and to evaluate its training program.
Chapter Four: Case Study: HRD evaluation at Canada Revenue Agency
A renewed era of fiscal conservatism and government restructuring in the mid-1990s prompted the Canadian government to review its practices and streamline business processes to reflect management trends in the private sector. This was due to many factors, including, but not limited to, the realities of high government debt and the erosion of public confidence in government. This shift represented a move towards greater accountability in the public sector and a focus on results. The emphasis on accountability can be seen in government initiatives such as results-based management, Program Activity Architectures that link programs to strategic outcomes, and the requirement for departmental performance reports (Hunt, 2004). Although still lagging behind private sector organizations, the Canadian federal government is slowly moving towards visible resource accounting as part of its control and accountability framework. This shift in public management can also be seen in many aspects of government, including a growing trend towards accountability-oriented evaluation of HRD and workplace training.
This chapter explores the Canada Revenue Agency (CRA) HRD training and evaluation framework as an example of federal organizations moving towards results-based reporting founded on current training evaluation models. The Office of the Auditor General (OAG) of Canada released the October 2007 Report to the House of Commons on October 30, 2007, which included a performance audit of technical training and learning in key program areas of CRA. Despite some positive reviews, the results of the Auditor General's enquiry expose some inefficiencies and gaps within CRA's HRD practice, including the evaluation of its technical training. The case study presented here offers a descriptive analysis of the current model being used at CRA to evaluate technical training and proposes some lessons learned for improved evaluation practices at the Agency and at other federal organizations ready to develop their capacity to evaluate training.
Office of the Auditor General
The OAG acts as an independent and reliable source for an objective account of how the government spends and manages taxpayer dollars. The responsibilities of the Auditor General, as an officer of Parliament, are legislated in the Auditor General Act and the Financial Administration Act (OAG, 2005). Apart from auditing federal departments, agencies and some crown corporations, the Auditor General has the responsibility of reporting audit results publicly at least four times a year and of testifying before parliamentary committees on the status and consequences of those audits. The OAG conducts financial audits, which examine departmental financial statements, and performance audits, which focus on management, controls and reporting within federal organizations. While Parliament gives the OAG discretion in determining what to audit, the subjects of performance audits are chosen according to the risks that government organizations and citizens face in the implementation of programs. Often, these are programs that carry a high cost to taxpayers or that involve an issue threatening the health and safety of Canadians (OAG, 2005). The OAG also investigates ways in which the management of funds can be made more efficient, as well as topics of interest to members of Parliament. According to the OAG (2005), these audits examine whether government programs are being managed with due regard for economy, efficiency, and environmental impact, and with measures in place to determine their effectiveness. Covering a wide range of topics, these audits contain recommendations that can serve as a springboard to lasting and positive change in the way government functions. Follow-up audits are conducted to determine whether the government has made satisfactory progress in implementing the Office's recommendations.
While the OAG serves as Canada's watchdog to ensure that government spending is kept accountable to its budget and that the prescribed results are being achieved, it does not critique the merit or worth of government programs and initiatives. The mandate of the OAG is to assess whether government is doing things right, but not necessarily whether it is doing the right things. Evaluation and auditing perform different functions in government. As pointed out by Hunt in a presentation to the Canadian Evaluation Society, "audit looks at how we are doing things; evaluation examines whether we are doing the right things and if we are doing them in a cost-effective manner" (2004, n.p.).
Evaluation tends to focus on the effectiveness and efficiency of programs and policies and requires different skill sets for collecting and processing information than audits do (Hunt, 2004). Evaluations tend to seek out evidence-based information as direct input into reports. Hunt specifies that auditing is an exercise to uncover management malpractice in program implementation, while evaluators tend to be interested in the outcomes of the program. Within the federal government, the Treasury Board Secretariat houses the Centre of Excellence for Evaluation, which promotes the alignment of strategic evaluation with program objectives and maintains the federal Evaluation Policy, which shapes the work of departments and agencies in the federal system. It operates at a high level of government, at arm's length from evaluation practice within departments and agencies. The activities of the Centre of Excellence for Evaluation include ensuring that departments and agencies have the capacity to use evaluation data in support of decision making, mostly to uphold responsible expenditure management and cost-effectiveness. The Centre of Excellence for Evaluation is mandated to monitor quality, feed evaluation information into cabinet decision-making priorities, and strengthen the government's capacity to evaluate its programs and policies (Hunt, 2004). The programs and policies most often targeted for evaluation affect the most senior ranks of government, namely deputy ministers, and are often tied to performance agreements. Results of evaluation are typically aggregated by the submitting organization to provide a wide, high-level picture of department or agency programs and tend not to reflect HR practices. The Evaluation Policy does not require departments and agencies to follow a Treasury Board Secretariat (TBS) model of evaluation so long as evaluators can ensure that the evaluation design is rigorous enough to withstand scrutiny.
Learning at CRA
The role of the CRA is to administer Canada's tax legislation, regulations and the various benefit programs within the tax system (CRA, 2007). The skills and tax knowledge of its employees allow the Agency to adequately collect the funds that pay for all federal government initiatives. In 1999 the Department of Revenue Canada formally became an agency, giving it status as a separate employer in the Public Service of Canada. As an agency, CRA has more flexibility and control over its human resources policies and programs to serve its 30 000 employees, which, according to CRA's website, has helped modernize the organization. One of the transformations stemming from its transition to an agency was CRA's commitment to becoming a learning organization. This commitment to learning is reflected in CRA's planned approach to ensure that policies support adult education and educational assistance and that training and learning at CRA are consistent and effective (see Figure 3). The Conference Board of Canada (2007) describes the four pillars of a learning organization as communicating a clear vision of the organization's strategic direction, fostering an environment that supports a culture of risk taking and experimentation, allowing employees to learn hands-on, and ensuring that knowledge is well managed and accessible to employees.
Figure 3
CRA - Foundation for a learning organization ("Building a Learning Organization - The Foundation")
Note. Copied from Training and Learning Directorate. (2004, November).
CRA report card on learning: Becoming a learning organization: Five years of progress. Canada Revenue Agency.
Since 1999, CRA's investment in formal training and learning has represented 5% to 6% of payroll costs (personnel salary plus the employee benefit program). The average CRA employee spends roughly 10 days in training or engaged in some type of learning activity (OAG, 2007; CRA, 2004). In comparison, a Conference Board of Canada study that investigated the learning and development outlook for Canadian businesses found that comparable large Canadian businesses spend, on average, roughly 1.8% of annual payroll costs on formal training (Hughes & Grant, 2007). It can be assumed that this figure may not account for actual salary costs. Other data from the same study show that government ranks in the middle in terms of spending on employee learning when compared to Canadian industries. In 2005-2006, CRA disbursed $140 million to cover the salary costs of participating in learning, whether with accredited education institutions or in non-formal workplace learning events (OAG, 2007); these figures exclude language training. Participant salary costs continue to be CRA's largest expenditure on learning.
CRA also supports learning through its official Learning Policy, designed to support "learning all the time ... everywhere" (OAG, 2007, p. 9). Initiatives under the Learning Policy include managing CRA's investment in learning, sponsoring employee development, supporting accredited learning, and engaging managers and employees to communicate CRA's philosophy on learning (OAG, 2007). Currently, there are more than 700 national English and French learning products available to CRA employees that were either designed by the Agency or purchased from vendors. The courses are delivered using classroom, online, self-study, and blended variations of these solutions (McCallister & Gillis, 2006). Seventy-eight percent of respondents to CRA's 2005 Employee Survey said that they received the training they required to perform their job, while 69% believed they received the coaching necessary to improve their skills. These statistics mirror the results of the OAG October 2007 Report, where information collected in interviews and focus groups revealed a desire and a culture to learn among employees. The Report states that, as a result,
[CRA employees] are eager to participate in formal and informal learning opportunities that help them broaden their knowledge. They also enjoy sharing their knowledge and experience with colleagues and learning from their colleagues when they have the opportunity to do so (p. 6).
Overall, the OAG gave CRA recognition for the strides made in its transition to a learning organization. Its research pointed to a positive learning culture supported by the organization. This is reflected in what CTV reported on the release of the Auditor General's Report:
Fraser [the Auditor General] also gave a positive review to the Canada Revenue Agency for its efforts to keep its employees well-trained when it comes to emerging tax issues. CRA officials often find themselves up against highly specialized private sector tax accountants and lawyers, the report notes, but said there is a corporate culture at the Canada Revenue Agency that encourages employees to learn more and improve their skills when handling complex tax cases (Akin, 2007, n.p.).
Despite these encouraging statistics, the OAG found monitoring and evaluation of learning to be inconsistent and underused in assessing whether learning was meeting the needs of employees and was aligned with business goals. These conclusions clashed with the OAG's initial expectations of adequate processes being in place for CRA to assess its operational learning requirements, along with a system of evaluation to assess the effectiveness of a training and learning program that meets both operational and individual learning needs (OAG, 2007). Among other issues, the recommendation for the evaluation of training was for CRA to consistently apply its chosen evaluation model, the Phillips ROI Process Model, for regular reporting and to ensure the effectiveness of its learning programs.
Current evaluation model employed
The transition from federal department to agency, among other benefits, aimed to modernize CRA's approach to learning in ways that had not existed under its previous mandate as a department. These strategies included linking learning to career management, ensuring that learning is aligned with corporate priorities and objectives, and consistently measuring and reporting on the results of learning (CRA, 2004). The CRA committed itself to "evaluate courses, measure changes in the capability gap and assess the return on investment of training" (CRA, 2004, p. 18). From these priorities, the Training and Learning Directorate at CRA emphasized the collection of evaluation data at level one, learner reactions, for all of its paper-based and online learning products. In 2005, the reaction questionnaires were enhanced to include data collection on perceived learning, in line with what Kirkpatrick and Phillips discuss as "Level 2" information on learning. The OAG uncovered in its audit that, despite its intentions, the CRA failed to consistently collect the data, and it neither provided routine reports on course evaluations nor went on to examine learning transfer or organizational impact, except for a few ad hoc examples in management training. However, during interviews with CRA team leaders, the OAG did find some evidence that employee performance improved following training.
The Kirkpatrick model was the standard for course evaluation and influenced the CRA, like most large organizations, to evaluate according to levels of increasing complexity. Prior to 2005, paper questionnaires were distributed at the conclusion of learning events and mailed to Headquarters in Ottawa; now, statistics and comments are compiled into an Evaluation Database (McCallister & Gillis, 2006). A separate evaluation database automatically records evaluation data from online and blended learning products. In 2006, over 10 000 evaluations were input into the Evaluation Database; however, no formal requests for evaluation reports were made or processed beyond requests for statistics on the perceived learning gain achieved and overall satisfaction with the learning product. Neither of these statistics reflects the value or impact of the training on CRA, nor does either describe when, how or even whether a transfer of learning from the classroom to the office took place. In short, reaction level questionnaires cannot determine the effectiveness of the training, since the reactions of learners can be influenced by factors such as the likeability of the instructor or the complexity of the instructional content.
In 2004, the Training and Learning Directorate began a certification process to enable its Learning Officers to employ the Phillips method to evaluate courses beyond learner reactions, since the collection of information on learning achievement and transfer had been ad hoc or nonexistent prior to 2005 (McCallister & Gillis, 2006). CRA's decision to use the Phillips method was based on the ROI Institute's international brand recognition, the considerable number of case studies that existed, and its tested and certified data collection tools. CRA was the first federal government organization to obtain Phillips' ROI certification, in 2005, and the fourth organization in Canada to do so (McCallister & Gillis, 2006). According to McCallister, a senior design specialist at CRA speaking in a CSTD webinar, the main drivers for adopting the Phillips ROI Process Model were the inexplicably increasing training costs revealed in a 2002 internal horizontal review that identified the need for an evaluation framework, the fact that training was not being aligned with corporate planning, and the increasing requirement for senior management to know the "bottom line" of training (McCallister & Gillis, 2006, slide 9). Ultimately, there was "overwhelming interest and support in the CRA" for the ROI of training (McCallister & Gillis, 2006, slide 30). To stay true to the methodology, CRA committed to "30 percent of learning events to be evaluated at level 3, 10 percent at level 4, and 5 percent at level 5," as recommended by Phillips (OAG, 2007, p. 18).
As noted by the OAG, progress in routinely evaluating at Levels 3 to 4 has been slow. The OAG found that "of more than 700 courses in the Directory of Learning Products, to date the Agency has done only one level 5 evaluation and two level 3 evaluations; only the level 5 evaluation involved a technical training course" (OAG, 2007, p. 18). The Auditor General was not informed of the Level 4 Kirkpatrick study that had taken place in the Pacific Region years earlier. Between the gap in the evaluation of training identified by the 2002 internal review and the performance audit completed in 2007, the CRA was not able to validate its chosen framework or model of evaluation. In response, CRA said that the cost of evaluation beyond the Level 1 and 2 questionnaires made the Level 3 to 5 targets difficult to maintain and that "more realistic targets" were needed (OAG, 2007, p. 18).
Unbeknownst to the Training and Learning Directorate in Ottawa, the training and learning office located in the Pacific Region (British Columbia and Yukon Territory) of CRA conducted an evaluation of its regional Jump-start Orientation Program for new managers at the same time that Headquarters was attempting to institutionalize the Phillips method. The evaluation was published as a case study in D. Kirkpatrick and J. Kirkpatrick's 2006 edition of Evaluating Training Programs: The Four Levels. The evaluator used reaction level questionnaires, content analysis questionnaires, random follow-up of a few participants and post-session focus groups (Kirkpatrick & Kirkpatrick, 2006). The results of the study focused on what Phillips considers to be intangible benefits. The evaluator of the orientation program noted that the participants remarked on changed behaviors such as improved morale and improved teamwork, "which itself was reflected in improved production" (Barron, 2006, p. 220).
Despite the campaign for widespread use of the Phillips method, this case study shows that the various regional training offices tend to operate independently and apply other methods.
OAG recommendations
From her performance audit of technical training at CRA, the Auditor General made a broad recommendation for CRA to ensure that learning in the Agency is effective, because "important improvements are needed before the Agency can be fully assured that it is getting value for its investment" (OAG, 2007, p. 21). According to the October 2007 Report, the Canada Revenue Agency should establish targets for evaluating the effectiveness of training and learning events using the model it has adopted and a reasonable time frame for meeting those targets, complete the evaluations, and report the results to senior management (p. 18). CRA agreed with the OAG's recommendation for a more rigorous and routine evaluation of the national training offered to its employees. In response, CRA plans to establish targets for evaluations of formal learning products, including blended and e-learning products, by 31 March 2008; the Agency will develop an action plan with time frames to meet these targets, and the action plan will be aligned with available resources (OAG, 2007, p. 26).
The reasons that led to the Auditor General's recommendations remain largely a matter of speculation, since no documented analyses of CRA evaluation practices exist that account for the challenges encountered in implementing the Phillips ROI model. McCallister and Gillis' 2006 presentation to CSTD offers some clues, through their lessons learned, as to why implementing the model may have been difficult for the Agency. First, to sustain the model's momentum, CRA may have needed more employees trained in the ROI methodology to ensure constant and consistent evaluation; second, the implementation plan for evaluation stopped once a Level 5 ROI pilot study was completed, giving CRA an ROI professional designation from the ROI Institute. McCallister and Gillis may not have been aware of the regional preference for other HRD evaluation models, such as the application of Kirkpatrick's four-level framework in the Pacific Region. Otherwise, there is no evidence indicating that marketing, funding, or staff were made available to maintain the framework and apply the Phillips ROI Process Model beyond its pilot phase of 2004-2005. McCallister and Gillis did not specify that full endorsement of the model would be reviewed after the pilot. It could be argued that the error occurred in the absence of a planning schedule for future evaluations. For example, McCallister and Gillis do not discuss their client stakeholders, nor do they specify how they plan to choose which courses and HRD programs are to be evaluated and to what levels.
Lessons learned
CRA's early experience with the evaluation of its technical training shows that the Agency was largely unguided in how to collect and analyze information but, despite such an uncoordinated approach, felt that evaluation was important to the organization. The Canadian Centre for Management Development (2001) offered this advice to Government of Canada organizations in a discussion paper:
organizations should take a strategic view of measuring the impact of instructional programs. Assessment at all levels for all programs is probably not cost-effective ...
An evaluation process for instructional programs really requires a strategic approach that clearly identifies what programs will be assessed at what levels. These decisions would weigh the cost/benefit of the level of assessment vis-a-vis the cost and desired impact of the program on organizational performance (p. 5).
The challenge for CRA was to approach evaluation strategically, with its partners and stakeholders, and to determine what kind of evaluation best suited the organization. Instead, blanket models like Kirkpatrick's and Phillips' were tested and later applied as the Agency's approach of choice, it would seem without much consideration for the required resources. Echoing much of the criticism Kirkpatrick has met in the years since his four-level framework was introduced, it would seem the CRA knew what information it needed but did not have the tools or expertise to collect the correct data, analyze it, or report the results effectively. The interest in applying the more advanced levels of evaluation prior to 2004 is undocumented or nonexistent in this case.
Despite some success with the Kirkpatrick four-level framework in the regions, the adoption of the Phillips ROI Process Model was supposed to ease the burden on evaluation practice by providing tested tools, especially for collecting the high-level data needed for advanced Level 3 to 5 evaluations. The motivation for adopting the Phillips method was that collected data would be more credible and would be collected regularly, satisfying the concerns exposed in the horizontal review conducted in 2002. While these goals were identified in an implementation plan and accomplished, the plan did not outline the resources required to support the ongoing evaluation of training. It could be deduced that either the magnitude of the targets scrutinized by the OAG created an unmanageable workload or that staff turnover led to a decrease in knowledgeable employees able to adequately manage the added work. Some things remain uncertain, such as CRA's plan to train its evaluators in the methodology and whether it intended to continue implementation after 2005. The lesson learned from this experience is to account for the resources needed to make the full application of the Phillips ROI process sustainable. A necessary commitment to HRD evaluation through appropriate staffing and funding will ensure more adequate and regular monitoring and reporting of training impact. Also, as seen in CRA's response to the OAG, it may be necessary for some organizations to customize targets and strategies to make the models fit their organizations.
Chapter Five: Conclusion
This exploration into the evolution of HRD and workplace evaluation found its roots in the theory and practice of program evaluation. The period from the 1960s to the 1970s proved to be rich with research activities that enhanced the credibility of the tradition and garnered some academic acceptance within the social sciences. Authors such as Scriven and Stufflebeam were able to generate ideas and categorize methods and research streams to better coordinate research efforts and accommodate the immediate evaluation needs of educators. Neither scholar was schooled in program evaluation; both arrived at it via other academic vehicles. Without a doubt, program evaluation is interdisciplinary and touches many professional fields and scholarly disciplines; however, the emphasis has traditionally been put on the evaluation practitioners.
For instance, adult educators and programmers tend to find themselves in evaluation roles without a clear sense of what they are supposed to be doing. Thus far, the program evaluation and HRD evaluation literature has not been used adequately to support the field practitioner, despite attempts to reconceptualize evaluation models over the decades. Most adult education practitioners use evaluation results to determine the merit and worth of their programs, to improve the training offered, or to fulfill an obligation to program sponsors. Since practitioners may devote more time to developing instructional materials or teaching, evaluation often takes a backseat in the process, causing practitioners to rely on reaction level questionnaires that ask whether learners enjoyed the experience. While this information is useful, Kirkpatrick located learner reactions as the lowest form of useful evaluation data in his four-level model. The advanced levels focused on documenting changed behaviors and organizational impact. This concept generated a great deal of discussion in the HRD and training literature, although it has remained largely unchanged since 1959. Supporting literature has surfaced suggesting that HRD evaluation needs to be more rigorous and follow traditional social science research methods. Authors such as Phillips even adapted Kirkpatrick's concept of levels to make evaluation more appealing to adult educators in businesses and organizations so that the data could stand up to scrutiny. The idea was to describe evaluation in the language of investment so that HRD and training could be taken more seriously by management. Phillips has been able to sell many books based on ROI; nonetheless, most training evaluations continue to capture data only on learner reactions towards the intervention rather than on impact or ROI (Barker, 2001).
The best example of these good intentions for rigorous and routine evaluation is documented in the OAG's 2007 technical training audit of the CRA. The CRA is the largest federal government organization and is responsible for collecting the funds that support all programs and policies in Canada. For this reason, the OAG wanted to audit technical training to ensure that CRA staff were equipped with the knowledge and skills for collecting and disbursing government funds. The audit revealed that CRA is showing signs of having an organizational culture that supports learning. The same study also demonstrated that the Agency does not adequately evaluate its HRD, despite the CRA having noticed a similar gap and decided to pilot the use of the Phillips ROI model three years earlier. While the OAG did not criticize the model itself, it did make recommendations for CRA to take advantage of what the model offered. The CRA had not been able to implement the model after its pilot, perhaps due to a lack of staff and funding, and admitted that its evaluation targets were unrealistic for its available resources. The CRA had continued to rely on data pertaining to learner reactions, though there was no evidence of how those data were reported prior to the audit. The lesson learned for other public sector organizations seeking to solidify HRD evaluation is to customize the Phillips model so that it meets the specific needs of the organization. There is no literature that discusses a variance in effectiveness if processes are altered to suit organizational needs.
By their very nature, government organizations may not be required to judge the worth or merit of their HRD based on the investment value returned, but may choose to place more emphasis on intangible measures like job satisfaction. The analysis demonstrates that there is value and political interest in uncovering the invested worth of HRD, as shown in the Auditor General's choice of auditing CRA's training and learning program.
References
Akin, D. (2007, October 30). AG report raises concerns about security threats. CTV News. Retrieved from http://www.ctv.ca/servlet/ArticleNews/story/CTVNews/20071030/ag_report_071030/20071030
Alkin, M. C. (2003). Evaluation theory and practice: Insights and new directions. New Directions for Evaluation, (97), 81-89.
Alliger, G. M., & Janak, E. A. (1989). Kirkpatrick's levels of training criteria: Thirty years later. Personnel Psychology, 42, 331-342.
American Society for Training and Development. (2007). Measuring and evaluating learning. Unpublished participant guide. Alexandria, VA: ASTD Publications Department.
Barker, K. (2001, May). Return on training investment: Environmental scan. Vancouver, BC: FuturEd.
Bartel, A. P. (2000). Measuring the employer's return on investments in training: Evidence from the literature. Industrial Relations, 39(3), 502-525.
Bates, R. (2005). Kirkpatrick four-level evaluation model. In S. Mathison (Ed.), Encyclopedia of evaluation (pp. 221-222). Thousand Oaks, CA: Sage Publications.
Brauchle, P. E., & Schmidt, K. (2004). Contemporary approaches for assessing outcomes on training, education, and HRD programs. Journal of Industrial Teacher Education, 41(3).
Brinkerhoff, R. O. (1983, August). The success case: A low-cost, high yield evaluation. Training and Development Journal, 37(8), 58-60.
Brinkerhoff, R. O. (1987). Achieving results from training. San Francisco, CA: Jossey-Bass.
Brinkerhoff, R. O. (2006, May). Getting real about evaluation. T+D, 60(5), 24-25.
Caffarella, R. S. (2002). Planning programs for adult learners. San Francisco, CA: Jossey-Bass.
Canadian Centre for Management Development. (2001). A learning measurement tool for the Government of Canada instructional programs. Unpublished discussion paper.
Casey, M. S. (2006). Problem-based inquiry: An experimental approach to training evaluation. Unpublished doctoral dissertation, University of Akron.
Christie, C. A. (2003). Understanding evaluation theory and its role in guiding practice: Formal, folk, and otherwise. New Directions for Evaluation, (97), 91-93.
Davidson, J. E. (2005). Evaluation methodology basics. Thousand Oaks, CA: Sage Publications.
Dionne, P. (1996). The evaluation of training activities: A complex issue involving different stakes. Human Resource Development Quarterly, 7(3), 279-286.
Holton, E. F., III. (1996a). The flawed four-level evaluation model. Human Resource Development Quarterly, 7(1), 5-22.
Holton, E. F., III. (1996b). Final word: Response to reaction to Holton article. Human Resource Development Quarterly, 7(1), 27-29.
Holton, E. F., III, & Naquin, S. (2005). A critical analysis of HRD evaluation models from a decision-making perspective. Human Resource Development Quarterly, 16(2), 258-280.
Horsch, K. (1999). Interview with Michael Scriven. The Evaluation Exchange, 5(2/3).
Hoyle, A. R. (1984). Evaluation of training: A review of the literature. Public Administration and Development, 4(3), 275-282.
Hughes, P. D., & Grant, M. (2007). Learning and development outlook 2007. Conference Board of Canada.
Hunt, T. (2004, December).
Innovations in accountability: Federal government directions in strengthening accountability and the central role of evaluation. Treasury Board of Canada Secretariat. Retrieved from http://www.tbs-sct.gc.ca/eval/ppt/dec04-002_e.asp
King, J. A. (2003). The challenge of studying evaluation theory. New Directions for Evaluation, (97), 57-67.
Kirkpatrick, D. L. (1996). Invited reaction: Reaction to Holton article. Human Resource Development Quarterly, 7(1), 23-26.
Kirkpatrick, D. L., & Kirkpatrick, J. D. (2006). Evaluating training programs: The four levels. San Francisco, CA: Berrett-Koehler Publishers, Inc.
Kirkpatrick, J. (2007, August). The hidden power of Kirkpatrick's four levels of evaluation. T+D, 61(8), 34-37.
Marsden, M. J. (1991). Evaluation: Towards a definition and statement of purpose. Australian Journal of Educational Technology, 7(1), 31-38. Retrieved from http://www.ascilite.org.au/ajet/ajet7/marsden.html
McCallister, J., & Gillis, L. (2006, May 1). ROI Online Learning Series: How the CRA received ROI process certification. Canadian Society for Training and Development. Retrieved from http://www.cstd.ca/networking/cop/roi_network.html
Murray, L. W., & Efendioglu, A. M. (2007). Valuing the investment in organizational training. Industrial and Commercial Training, 39(7), 372-379.
Office of the Auditor General of Canada. (2007, October). Chapter 7: Technical training and learning - Canada Revenue Agency. Retrieved from http://www.oag-bvg.gc.ca/domino/reports.nsf/html/07menu_e.html
Ottoson, J. M. (2000). Evaluation of continuing professional education: Toward a theory of our own. New Directions for Adult and Continuing Education, (86), 43-53.
Parsons, J. G. (1997). Values as a vital supplement. Human Resource Development Quarterly, 8(1), 5-13.
Patton, M. Q. (2008). Utilization-focused evaluation (4th ed.). Thousand Oaks, CA: Sage Publications.
Phillips, J. J. (2007). Measuring ROI: The process, current issues, and trends. Birmingham, AL: ROI Institute.
Ramlow, M. E. (n.d.). The program evaluation standards: Summary of the standards of the American Evaluation Association. American Evaluation Association. Retrieved from http://www.eval.org/EvaluationDocuments/progeval.html
Rossi, P. H., Freeman, H. E., & Lipsey, M. W. (1999). Evaluation: A systematic approach. Thousand Oaks, CA: Sage Publications.
Saks, A. M., & Haccoun, R. R. (2007). Managing performance through training. Toronto, ON: Thomson.
Scriven, M. (1991). Evaluation thesaurus (4th ed.). Newbury Park, CA: Sage.
Shadish, W. R., Jr., Cook, T. D., & Leviton, L. C. (1991). Foundations of program evaluation. Thousand Oaks, CA: Sage.
Shadish, W. R., & Luellen, J. K. (2004). Donald Campbell: The accidental evaluator. In M. C. Alkin (Ed.), Evaluation roots (pp. 80-87). Thousand Oaks, CA: Sage Publications.
Stufflebeam, D. L. (2002, June). CIPP evaluation model checklist. Western Michigan University: The Evaluation Centre. Retrieved from http://www.wmich.edu/evalctr/checklists/cippchecklist.htm
Stufflebeam, D. L. (2003). The CIPP model for evaluation. Paper presented at the Annual Conference of the Oregon Program Evaluators Network (OPEN). Retrieved from http://www.wmich.edu/evalctr/pubs/CIPP-ModelOregon10-03.pdf
Stufflebeam, D. L. (2004). The 21st century CIPP model. In M. Alkin (Ed.), Evaluation roots (pp. 245-266). Thousand Oaks, CA: Sage.
Swanson, R. A. (1998). Demonstrating the financial benefit of human resources development: Status and update on the theory and practice.
Human Resource Development Quarterly, 9(3), 285-295.
Visser, R. M. S. (n.d.). Trends in program evaluation literature: The emergence of pragmatism. Texas Centre for the Advancement of Literacy & Learning. Retrieved from http://www-tcall.tamu.edu/orp/orp5.htm
Wang, G. G., Dou, Z., & Li, N. (2002). A systems approach to measuring return on investment for HRD interventions. Human Resource Development Quarterly, 13(2), 203-224.
Weiss, C. H. (1998). Evaluation (2nd ed.). Upper Saddle River, NJ: Prentice Hall.