ASSESSING PERFORMANCE AND CONSTRUCT VALIDITY OF LAPAROSCOPIC SURGICAL SIMULATORS

by

JOANNE LIM
B.Sc. (Eng), The University of Guelph, 2001

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES (Mechanical Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA

December 2006

© Joanne Lim, 2006

Abstract

The objective of this work is to assess the construct and performance validity of two laparoscopic surgical simulators. The current evaluation of surgeons is considered subjective and unreliable, which is one reason surgical educators have been studying surgical simulators as a means of quantitatively assessing surgeons. First, however, we must determine whether these simulators are valid and reliable methods for training and assessing surgeons. We have designed an experimental surgical tool and data collection system to quantitatively measure surgeon motor behaviour in the operating room (OR). Our experimental system collects kinematics and force/torque data from sensors, and we have developed a sensor fusion algorithm to extract high frequency, continuous kinematics data. We collected data from surgical residents (PGY4) and compared it to expert surgeon data to investigate the construct validity of both a physical simulator and a virtual reality (VR) simulator. We also studied the performance validity of both simulators by comparing measurable quantities, such as force and kinematics, on the simulators with those collected in the OR. To examine differences between contexts, we used the Kolmogorov-Smirnov statistic. In our intrasubject intersetting (OR, VR, physical) comparisons, we see large differences between the OR and the VR simulator, leading to a conclusion of poor performance validity. Conversely, we see smaller differences between the physical simulator and the OR, indicating fair performance validity. In our interlevel (expert vs. resident) comparisons, the VR simulator shows poor construct validity, with little difference detected between skill levels, while the physical simulator is able to detect differences in some performance measures and can be considered to show fair construct validity.
Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
Chapter 1: Introduction and Literature Review
  1.1 Introduction and Objectives
  1.2 Minimally Invasive Surgery
    1.2.1 The Challenges of MIS for Surgeons
    1.2.2 The Challenges of MIS for Surgical Educators
    1.2.3 Reasons for Using Surgical Simulators
      1.2.3.1 Surgeon Certification
      1.2.3.2 Equipment Design and Evaluation
      1.2.3.3 Transfer of Training
  1.3 Current Training Methods
  1.4 Current Methods of Surgical Performance Assessment
  1.5 Simulator Validation
    1.5.1 Construct, Performance, and Concurrent Validity
  1.6 Research Question
  1.7 Developing a Quantitative Assessment Method
    1.7.1 Kinematics
    1.7.2 Forces/Torques
  1.8 Project Goals
Chapter 2: The Hybrid Experimental Laparoscopic Surgical Tool for Performance Measure Assessment
  2.1 Introduction
  2.2 Laparoscopic Surgical Tool
  2.3 Performance Measures
    2.3.1 Kinematics
      2.3.1.1 Optoelectronic Position Tracking
        2.3.1.1.1 Other Kinematics Options
      2.3.1.2 Electromagnetic Position Tracking
    2.3.2 Force/Torque
    2.3.3 Sensor Bracket Design
      2.3.3.1 Force Balance
    2.3.4 Grip Force
  2.4 Force/Grip Data Processing
    2.4.1 Grip Calibration
    2.4.2 Gravity Effects Calibration
  2.5 Kinematics Data Fusion
    2.5.1 Data Fusion Introduction
    2.5.2 General Data Fusion
      2.5.2.1 Fusion Methods
    2.5.4 Kinematics Data Fusion Technique
      2.5.4.1 Data Fusion Technique Details
    2.5.5 Error Analysis
      2.5.5.1 Analysis Method
      2.5.5.2 Results of Error Analysis
        2.5.5.2.1 Computer Generated Data
        2.5.5.2.2 Laboratory Data
        2.5.5.2.3 Operating Room Data
    2.5.6 Discussion of Kinematics Data Fusion
    2.5.7 Conclusions for Kinematics Data Fusion
  2.6 Discussion and Recommendations
    2.6.1 Kinematics
    2.6.2 Force
    2.6.3 Recommendations
Chapter 3: Experimental Methods for Assessing Validity of Laparoscopic Surgical Simulators
  3.1 Introduction and Objectives
  3.2 Subjects and Settings
    3.2.1 Settings
      3.2.1.1 Operating Room
      3.2.1.2 Virtual Reality Simulator
      3.2.1.3 Physical Simulator
  3.3 Performance Measures
    3.3.1 Kinematics
    3.3.2 Forces
  3.4 Equipment Used
    3.4.1 Video Data
    3.4.2 System Component Integration
    3.4.3 Data Acquisition Software
  3.5 Data Collection
    3.5.1 Operating Room Study
    3.5.2 Simulator Data Collection
  3.6 Data Post-Processing
    3.6.1 Kinematics Data Registration and Calibration
    3.6.2 Force/Torque Data Registration and Calibration
      3.6.2.1 Force/Torque Data Registration
    3.6.3 Raw Data Synchronization
  3.7 Electrosurgery Unit
    3.7.1 ESU Effects
      3.7.1.1 Removal of ESU Effects
  3.8 Task Comparisons
    3.8.1 The Dissection Stage
    3.8.2 Data Segmentation
      3.8.2.1 Data Segmenting
  3.9 Setting Comparisons
    3.9.1 Kolmogorov-Smirnov Statistic
    3.9.2 Comparisons
    3.9.3 Assigning Confidence Intervals
    3.9.4 Dependent Data and Moving Block Bootstrap
      3.9.4.1 Measurement Resolution
  3.10 Discussion
Chapter 4: Results of a Quantitative Study to Assess Laparoscopic Surgical Simulator Validity
  4.1 Introduction
  4.2 Results
    4.2.1 Context Comparisons
      4.2.1.1 Surgical Residents
      4.2.1.2 Expert Surgeons
    4.2.2 The D-Value
    4.2.3 Presentation of Results
      4.2.3.1 A1: Intrasubject Intraprocedural OR comparisons
        4.2.3.1.1 Resident 1: Intrasubject Intraprocedural OR
        4.2.3.1.2 Resident 2: Intrasubject Intraprocedural OR
        4.2.3.1.3 Resident 3: Intrasubject Intraprocedural OR
      4.2.3.2 A2: Intrasubject Intertrial VR simulator
      4.2.3.3 A3, A4, and A5: Intersubject Intrasetting Comparisons
      4.2.3.4 A6: Intrasubject Intersetting
      4.2.3.5 Expert vs. Resident Comparisons
        4.2.3.5.1 Interlevel Intrasetting OR
        4.2.3.5.2 Interlevel Intrasetting Physical Simulator
        4.2.3.5.3 Interlevel Intrasetting VR Simulator
  4.3 Discussion
    4.3.1 Context Comparisons
      4.3.1.1 Intraprocedural Operating Room Variability
      4.3.1.2 Intrasubject Intertrial VR Variability
      4.3.1.3 Intersubject Intrasetting Comparisons
        4.3.1.3.1 Operating Room
        4.3.1.3.2 Virtual Reality Simulator
        4.3.1.3.3 Physical Simulator
      4.3.1.4 Intrasubject Intersetting Comparison
      4.3.1.5 Interlevel Intrasetting
        4.3.1.5.1 Operating Room
        4.3.1.5.2 Virtual Reality Simulator
        4.3.1.5.3 Physical Simulator
        4.3.1.5.4 Experts vs. Residents
    4.3.2 Performance Measure Reliability
  4.4 Conclusions
Chapter 5: Conclusions and Recommendations
  5.1 Introduction
  5.2 Review of Research
    5.2.1 Experimental Surgical Tool
    5.2.2 Data Collection
      5.2.2.1 The Operating Room
      5.2.2.2 The Experimental Surgical Tool
      5.2.2.3 Simulators
    5.2.3 Data Fusion
    5.2.4 Performance Measures
    5.2.5 Context Comparisons
    5.2.6 Simulator Validation
  5.3 Recommendations
    5.3.1 Software
    5.3.2 Hardware
    5.3.3 OR Data Collection
    5.3.4 Simulators
    5.3.5 Other Recommendations
  5.4 Partner & Future Studies
List of Terms
Bibliography
Appendix A: OR Study Experimental Protocol and Data Acquisition Procedures
Appendix B: Operational Definitions
Appendix C: University of British Columbia CREB Approval
Appendix D: Medicine Meets Virtual Reality Conference Submission
Appendix E: SAGES Conference Submission
Appendix F: Transfer of Training from Simulator to Operating Room

List of Tables

Table 1.1 - Types of validity definitions
Table 2.1 - Criteria for design of the sensor mounting bracket
Table 2.2 - Advantages of data fusion
Table 3.1 - Performance measures available from the three contexts
Table 4.1 - Summary of successful data collection from each context
Table B.1 - Hierarchical subtask dissection definition
Table B.2 - Kinematics and force performance measures

List of Figures

Figure 1.1 - Typical minimally invasive surgery operating room setup
Figure 1.2 - A typical laparoscopic cholecystectomy operation
Figure 1.3 - Reduced DOF of motion of the MIS tool tip
Figure 1.4 - Physical and VR simulators
Figure 1.5 - Performance measures
Figure 1.6 - Are laparoscopic surgical simulators valid?
Figure 2.1 - Maryland dissector tip
Figure 2.2 - Tool tip reference frame
Figure 2.3 - NDI Polaris optoelectronic position tracking system
Figure 2.4 - MDMArray
Figure 2.5 - Polhemus Fastrak magnetic position tracking system
Figure 2.6 - F/T system of Rosen (1999)
Figure 2.7 - ATI Mini 40 F/T transducer
Figure 2.8 - Two cut views of a typical laparoscopic tool shaft
Figure 2.9 - Sensor mounting bracket on surgical tool
Figure 2.10 - Force/torque sensor bracket two segments
Figure 2.11 - Force load path through sensor bracket and F/T sensor
Figure 2.12 - Laparoscopic trocar
Figure 2.13a - Overall view of the tool, split into three sections for FBD analysis
Figure 2.13b - Effective tip forces & moments
Figure 2.13c - Free body diagram of distal end of surgical tool and force sensor
Figure 2.13d - Free body diagram of force sensor and stationary tool handle
Figure 2.13e - Free body diagram of tool handle and strain gauges
Figure 2.14 - Strain gauge circuit diagram for half-bridge circuit
Figure 2.15 - Strain gauges mounted on tool handle
Figure 2.16 - Mechanics of laparoscopic tool
Figure 2.17 - Interaction between strain gauges and force sensor
Figure 2.18 - Results from one calibration test
Figure 2.19 - Friction in the surgical tool handle and bracket
Figure 2.20 - Grip compensation
Figure 2.21 - Data fusion steps
Figure 2.22a - Noisy magnetic data
Figure 2.22b - GCV smoothed magnetic data
Figure 2.23 - Laboratory data
Figure 2.24 - Interpolate the magnetic data
Figure 2.25 - Interpolated magnetic data
Figure 2.26 - Difference curve
Figure 2.27 - Interpolated difference curve
Figure 2.28 - Fused data
Figure 2.29 - Computer generated magnetic and optical data
Figure 2.30 - RMS error for computer-generated data
Figure 2.31 - Laboratory collected magnetic and optical data
Figure 2.32 - RMS error for laboratory collected data
Figure 2.33 - Real OR magnetic and optical data
Figure 2.34 - RMS error for OR data
Figure 3.1 - Diagram of goals for this project
Figure 3.2 - Tool tip reference frame
Figure 3.3 - Components of the performance measurement system
Figure 3.4 - Custom designed data acquisition software
Figure 3.5 - Data post-processing
Figure 3.6 - Data synchronization process
Figure 3.7 - Strain gauge data with electrocautery noise
Figure 3.8 - Raw and noise removed strain gauge data
Figure 3.9 - Electrocautery affected magnetic data
Figure 3.10 - Electrocautery affected F/T data
Figure 3.11 - VR simulator vs. physical simulator vs. OR
Figure 3.12 - Kolmogorov-Smirnov CPD
Figure 3.13 - Moving block bootstrap
Figure 3.14 - Confidence intervals for D-values
Figure 3.15 - CPD of D-values
Figure 4.1 - Context comparisons for surgical residents
Figure 4.2 - Interlevel context comparisons for experts and residents
Figure 4.3 - Resident 1 intraprocedure OR CPD
Figure 4.4 - Resident 1 intraprocedure OR D-values
Figure 4.5 - Resident 2 intraprocedure OR CPD
Figure 4.6 - Resident 2 intraprocedure OR D-values
Figure 4.7 - Resident 3 intraprocedure OR CPD
Figure 4.8 - Resident 3 intraprocedure OR D-values
Figure 4.9 - Resident 1 intertrial VR simulator CPD
Figure 4.10 - Resident 1 intertrial VR simulator D-value comparisons
Figure 4.11 - Resident 2 intertrial VR simulator CPD
Figure 4.12 - Resident 2 intertrial VR simulator D-value comparisons
Figure 4.13 - Resident 3 intertrial VR simulator CPD
Figure 4.14 - Resident 3 intertrial VR simulator D-value comparisons
Figure 4.15 - Intersubject intrasetting (OR) CPD
Figure 4.16 - Intersubject intrasetting (OR) D-value comparisons
Figure 4.17 - Intersubject intrasetting (VR simulator) CPD
Figure 4.18 - Intersubject intrasetting (VR simulator) D-value comparisons
Figure 4.19 - Intersubject intrasetting (physical simulator) CPD
Figure 4.20 - Intersubject intrasetting (physical simulator) D-value comparisons
Figure 4.21 - Resident 1 intersetting CPD
Figure 4.22 - Resident 1 intersetting D-values
Figure 4.23 - Resident 2 intersetting CPD
Figure 4.24 - Resident 2 intersetting D-values
Figure 4.25 - Resident 3 intersetting CPD
Figure 4.26 - Resident 3 intersetting D-values
Figure 4.27 - Lumped interlevel OR CPD
Figure 4.28 - Interlevel OR individual CPD
Figure 4.29 - D-values for the two experts and three residents in the OR
Figure 4.30 - Lumped interlevel physical simulator CPD
Figure 4.31 - Interlevel physical simulator individual CPD
Figure 4.32 - D-values for the two experts and three residents in the physical simulator
Figure 4.33 - Lumped interlevel VR simulator CPD
Figure 4.34 - Interlevel VR simulator individual CPD
Figure 4.35 - D-values for the two experts and three residents in the VR simulator
Figure 5.1 - New performance measures
Figure 5.2 - Concurrent research projects at the Neuromotor Control Laboratory
Figure A.1 - University of British Columbia operating room experimental set-up
Figure B.1 - Five levels of the hierarchical decomposition
Figure B.2 - Five phases of laparoscopic surgery
Figure B.3 - Stage level diagram for cystic duct dissection (CDD) and gallbladder dissection (GBD)

Acknowledgements

I would like to acknowledge and thank all those who have supported me from the first day I began this journey. Firstly, my thanks to my supervisor, Dr. Antony Hodgson, for his enthusiasm and optimism in this project, even when things were at their bleakest. I could always count on him to put a positive spin on every aspect, and for this I am most grateful. I am most appreciative of the participating surgical residents, whom I harassed endlessly for their time when their time is so limited. Thanks to Dr. Ed Chang, Dr. Naisan Garraway, and Dr. Kathy Hsu. Thanks also to Dr. Hamish Hwang for twisting the arms of his resident friends to participate in this project. I would also like to acknowledge Dr. Alex Nagy and Dr. Neely Panton for sharing their knowledge and expertise, and for supervising the surgical residents in the operating room.
Many thanks go to Marlene Purvey and her surgical staff, and also to Betty Whincup and the staff at the Sterile Supply Department. Next I want to give a big high-five to Catherine Kinnaird and Iman Brouwer. You guys are the best! No one else understands as well as you what we went through to finish this project. As a wise Iman once said, "I thought this day would never come." I also need to give a big pat on the back to the orthopod girls: Stacy Bullock, Carolyn Sparrey, Carolyn Greaves and Christina Niosi. Thank goodness for coffee break is all I have to say about that. Also, thanks to the rest of the NCL crew. Keep up the good work! I also want to say thanks to Val Roy for being such a good suburb buddy. I am very grateful to my friends at CESEI: Ferooz, Vanessa, Marlene, Humberto and Dr. Qayumi. Thanks for the comic relief, the fabulous printer/copier, and all the food. The students that get to work with you next are very lucky indeed. Many thanks go to Ryan Jennings for being at home to lend an ear to listen to my whining, for pushing me constantly to work harder, and for never letting me give up. To my Dad, who gave me good advice about completing graduate studies, and continued support and encouragement. And lastly, thanks to my Mom, without whom I could not have done this project; I am eternally grateful for her unwavering support, no matter what.

Chapter 1
Introduction and Literature Review

1.1 Introduction and Objectives

Minimally invasive surgery is an increasingly popular approach that uses smaller incisions and results in much shorter recovery periods for patients. Unfortunately, the surgery is substantially more demanding for the surgeon, who must learn a new set of skills: using long instruments inside the body while viewing the surgical field on a monitor outside the body. Simulators offer the surgeon an opportunity for unlimited practice, and for practice on unusual cases. For the training to be useful, the simulator must accurately reflect the skill set required in surgery. The goal of this project was to validate both a physical and a virtual reality simulator in terms of the kinematics and forces used, in comparison to those used during surgery.

Surgeons must learn to operate with both skill and safety. The use of surgical simulators has become more widespread and important in the training of surgical residents, and it is important that researchers direct their efforts to the areas of most significance to patients and surgeons alike. Objective measurements of a surgeon's performance are much more readily obtained in a simulator than during a live operation; this matters both for training residents and for evaluating trained surgeons. New tool designs and improvements could also be tested in a simulator, saving operating room time and money.

Surgical education has lagged behind other educational areas where simulators are commonplace for teaching and training novices. Other professions, such as aviation, have successfully incorporated simulation training into their educational programs. The success in the pilot training industry has pushed surgical educators to continue research in this area.
In a 1999 survey, 92% of program directors agreed that there is a need for technical skills training outside of the OR (Haluck 2001). This is a clear sign that other methods of surgical education must be explored.

The overall objective of our lab's research was to create and apply a quantitative method of assessing surgical performance in order to evaluate two laparoscopic surgical simulators. The shorter-term goals included a study of the validity and reliability of these surgical simulators, and a study of the minimum technological requirements of a virtual reality surgical simulator. We aimed to establish whether these simulators are reliable measurement devices. The primary objective of the work presented in this project was to assess the validity of both virtual reality and physical laparoscopic surgical simulators. The second goal was to develop a new experimental tool and system capable of collecting and analyzing the performance measures used in the simulator validity assessments. Operating room data was compared to analogous tasks in the simulator settings. The new methods provide a standard for future simulator assessments.

1.2 Minimally Invasive Surgery

Minimally invasive surgery (MIS) has become a routine method of performing many types of surgical procedures. MIS is also known as minimal access surgery (MAS) or keyhole surgery. Because of advances in technology and medicine, many open surgical procedures can now be performed using MIS. The notion of MIS first arose in the early 20th century (Nagy 1992). After World War II, the two most important inventions related to endoscopy and MIS were developed: the rod-lens system and fibreoptics. After much development in surgical technique and camera technology, the first laparoscopic cholecystectomy using video was performed on a human in 1987 in Lyons, France (Mishra 2004). Within that year, many other surgeons on both sides of the Atlantic were performing their first laparoscopic cholecystectomies on humans. Since the late 1980s, MIS has become commonplace in modern general surgery; its use in abdominal surgical procedures in the United States has reached 60-80% (Taylor 1995). A typical minimally invasive surgery operating room set-up can be seen in Figure 1.1.

Figure 1.1: Typical minimally invasive surgery operating room set-up. Notice the video monitors in the background and situated around the OR. The surgeons rely on these monitors to view the surgical field within the patient; the monitors show a direct video feed from the laparoscopic camera.

Laparoscopic surgery has allowed surgeons to perform many of the same procedures as in traditional open surgery, but using small incisions (5-15 mm) instead of large abdominal incisions (7-15 cm) (Huntsville 2002). The increased use of MIS techniques over the years has led to benefits for patients: studies have shown reduced postoperative pain, smaller scars, reduced hospital stays, and a quicker return to normal physical activities and therefore a quicker return to work (Treat 1996, Perissat 1995). It is common, and proven to be safe, for routine cholecystectomy procedures to be day surgeries, with the patient coming into the hospital in the morning and leaving for home in the afternoon (Prasad 1996). Other patients are usually discharged from the hospital 1 or 2 days after the cholecystectomy, with low complication rates (Lujan 1998).
A typical laparoscopic cholecystectomy operation set-up can be seen in Figure 1.2.

[Figure removed due to copyright]
Figure 1.2: A typical laparoscopic cholecystectomy operation.

Although there are many obvious benefits for patients in MIS, the surgeon requires a specialized skill set much different from that used in traditional open surgical techniques. Laparoscopic tools very often limit the surgeon's dexterity and range of motion, and surgeons adopt uncomfortable postures to complete tasks (Person 2001). Laparoscopic tools are considered ergonomically poorly designed, awkward, and not easy to use (Berguer 1999, Emam 2001, Treat 1996). Also, the time to complete a laparoscopic procedure can be up to 30% longer than for the same open surgical procedure (Glinatsis 1992, Treat 1996). Conversely, other studies have shown either no significant difference in surgical times, or that the laparoscopic approach may actually be shorter (Pessaux 2001).

1.2.1 The Challenges of MIS for Surgeons

The special skill set required of surgeons for laparoscopic surgery is especially difficult for the trainee to learn. One of the aspects that a novice surgeon must adapt to is what is known as the fulcrum effect (Jordan 2001). In laparoscopic surgery, this arises when the surgical tool is inserted into the abdomen, creating a fulcrum at the entry point: the surgeon experiences a motion reversal. For example, when the surgeon moves their hand to the left outside the body, the tool tip moves to the right inside the abdomen. Compensating for this is a basic motor skill that novice surgeons must learn.

Another issue is video-hand-eye coordination (Ballantyne 2002, Perkins 2002). The surgeon is no longer directly viewing the surgical field, but rather a 2D video monitor showing what is happening inside the abdomen. The surgeon works with their hands outside of the abdomen, using longer surgical tools than in open surgery, while watching a video feed of what the tools are doing inside. This results in a lack of depth perception and makes tasks such as suturing and knot tying more difficult. Tactile feedback is also reduced in MIS, creating yet another problem for surgeons to overcome.

In laparoscopic surgery, the surgical tool has a reduced number of degrees of freedom (DOF). The laparoscopic tool has only 4 DOF, as opposed to the open surgical tool, which has 6 DOF. This limits the surgeon's dexterity and range of motion (Tendick 1995, Ballantyne 2002). The tip movement is limited to pitch, yaw, roll and plunge (i.e., in/out of the abdomen), as shown in Figure 1.3 (Person 2000).

Figure 1.3: Reduced DOF of motion of the MIS tool tip. The DOF are roll, pitch, and yaw about the fulcrum created by the entry portal, and plunge through the portal. (Modified from source: Person 2000.)

Researchers are studying methods to deal with these limitations of laparoscopic surgery by looking into new technologies such as 3D vision systems (Jones 1996, McDougall 1996, Chan 1997, Hanna 1998), robotic surgery (Dakin 2003, Hubens 2003, Ruurda 2003, Ruurda 2002, Vuilleumier 2003), telerobotic surgery (Ballantyne 2002, Marescaux 2001, Perez 2003), and interactive image guidance (Harms 2001, Herline 2000, Stefansic 2002).
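To make the motion reversal of the fulcrum effect described above concrete, a simple rigid-lever approximation can be written down (an illustration added here, not taken from the thesis): treating the tool as a rigid rod pivoting about the entry portal,

```latex
% Rigid-lever approximation of the fulcrum effect (illustrative only).
% A lateral hand displacement outside the body produces a tip displacement
% that is reversed in direction and scaled by the ratio of the tool lengths
% inside (L_in) and outside (L_out) the entry portal:
\Delta x_{\mathrm{tip}} \approx -\frac{L_{\mathrm{in}}}{L_{\mathrm{out}}}\,\Delta x_{\mathrm{hand}}
```

The negative sign captures the left-right reversal, and the length ratio shows why the apparent gain of the tool changes as it is plunged deeper or withdrawn.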
1.2.2 The Challenges of MIS for Surgical Educators

Because of the inherent limitations of performing MIS, the surgical education community must face the challenge of deciding where to train surgeons and how to evaluate them. These issues are of importance to the surgical community and the public alike, in that it is imperative that surgeon trainees finish their education with the ability to operate safely and effectively. Researchers unanimously agree that the current training and evaluation of surgeons is subjective, unreliable and costly (Feldman 2004, Lentz 2002, Rosser 1998, Winckel 1994). This is one reason why there has been pressure to investigate the feasibility of using surgical simulators for training and evaluation.

1.2.3 Reasons for Using Surgical Simulators

Surgical simulators have many potentially useful applications. Surgeon certification and tool evaluation are just two of the possible uses of validated simulators.

1.2.3.1 Surgeon Certification

The ability to quantitatively assess surgical performance is important to the training and certification of both novice and expert surgeons. The methods we have developed will allow performance measurement in the OR, followed by a comparison to a surgical reference database of performance measures from surgeons of varying skill levels. This will allow for a quantitative analysis of skill level and a method for identifying where improvements are needed. For example, when a novice surgeon seems to be having difficulty with a certain task, this could ideally be identified and advice given specifically to address the problem.

1.2.3.2 Equipment Design and Evaluation

Tool and equipment designers could evaluate the performance of their new instrumentation in a validated simulated environment, and could ideally be confident that the new tools will give the same performance in the OR. The evaluation of new tools would be an iterative process in which the new tool is compared to a reference database of performance measures for past tool designs.

1.2.3.3 Transfer of Training

If a simulator is shown to be valid (see Section 1.5 for further details), the next step in furthering the push for simulators to be used in surgical education programs is to address the transfer-of-training question: do novice surgeons who practice in simulators show a significant improvement in the operating room? In other words, if a novice surgeon spends a given amount of time practicing on a simulator, will there be a quantifiable improvement in OR performance compared to a similar novice who has no simulator training?

The original goal of this project was to study the transfer of training from simulator to the human operating room, as this is a subject that needs analysis in the surgical education and simulator fields of study. Unfortunately, due to many logistical nightmares such as patient recruitment and scheduling, among others, this project was converted to a simulator validity study. This was the most logical step, as the appropriate OR and simulator data had already been collected. Further information on the transfer of training from simulator to OR can be found in Appendix F.

1.3 Current Training Methods

Success in laparoscopic surgery is very dependent on the surgeon's proficiency and experience (Perissat 1995). The apprenticeship-training model is still the most commonly used for providing experience to surgical residents.
In this model, the surgical resident shadows the expert surgeon and learns the tools and tricks of the trade through observation, questions, and some hands-on practice. The disadvantage of this approach is that the surgical educator has no control over which patients require surgery, potentially limiting a novice's exposure to a small variety of cases. Consequently, the novice surgeon may only be exposed to a limited pool of anatomy and pathology.

Human cadavers have been used as a training model in surgeon training programs with some success (Martin 2003). However, cadaveric models have their own disadvantages: they are expensive, subject to availability, have different tissue properties than a live human, and there is some concern over transmission of disease (Nelson 1990). Animal models may avoid some of the stresses and time constraints of apprenticeship training, but the anatomy often differs from that of humans, the disease state often cannot be reproduced in the animal, and an animal care facility is expensive. There are also many moral and ethical issues related to training on live animals; the United Kingdom has banned the use of animals for surgical training (Lirici 1997, Moorthy 2003).

Surgical simulators, both physical and virtual reality (Figure 1.4), are becoming more widely used and accepted in surgical education, although their use is still limited in the University of British Columbia surgical training program, where there is currently no prescribed simulator training. The use of virtual reality (VR) systems with haptic (force-feedback) interfaces has garnered much interest. Simulators are designed to highlight the psychomotor skills (e.g., clipping and suturing skills), the cognitive aspects of surgery (e.g., decisions about the steps to follow during a procedure), or both. Simulator training is safe and highly available, unlimited practice is possible, and no supervision is necessary when a novice is using these simulations.

Figure 1.4: Physical and VR simulators. The left picture shows a physical simulator using regular laparoscopic tools and an inanimate model. The right picture shows a VR simulator with computer-generated models.

Recently, the surgical education community has expressed considerable interest specifically in virtual reality simulators, and most of the current studies were done with VR simulators. Although VR simulators have a comparatively high initial cost compared to bench-top simulators, they do have advantages: they can be programmed to include variant anatomy (pathologies, rare occurrences) and temporal changes (patient status, bleeding), and of course they provide objective measurements (e.g., time, errors, kinematics) through the computer software.

Surgical educators have come to the realization that surgical training programs should become more structured, and that surgical models and simulators should have a more important role in training, evaluating, and certifying surgeons (Feldman 2004). For attending expert surgeons to be willing to change to this new paradigm of teaching, it is imperative to eventually demonstrate that time spent in a simulator can replace time spent in the operating room. This is important not only to educators but to hospital administrators and taxpayers alike. In the US in 1997, the estimated cost of training 1014 general surgery residents in the OR was $53 million (Bridges 1999).
This cost was mostly attributed to the extra time (2480 hours) spent in the OR when a resident is operating. Financially, then, simulators may save time in the OR and therefore money in training surgeons.

1.4 Current Methods of Surgical Performance Assessment

A clear and objective method to assess performance and skill in laparoscopic procedures is potentially useful for many aspects of surgery, including surgical resident evaluation, simulator validation, and surgical tool evaluation. Since the early 1970s, when Kopta developed one of the first methods for performance evaluation, the surgical education community has become quite interested in this topic (Kopta 1971). Current evaluation methods are known to be subjective and possibly unreliable, so there is a need for objective methods to measure surgical performance (Rosser 1998, Winckel 1994, Lentz 2002, Chung 1998, Feldman 2004).

One of the more commonly used methods for surgeon evaluation is the structured skills assessment form. These forms can be a type of checklist or a form where the evaluator must describe or fill in specific areas. This type of form allows for a complete intra-operative performance evaluation, which can analyze both the psychomotor and cognitive skills of a surgeon. Many researchers have used this type of evaluation in various studies (Winckel 1994, Eubanks 1999, Reznick 1997), and many studies have been done to show the validity and reliability of these structured skills forms (Martin 1997, Goff 2001, MacRae 2000, Cohen 1990, Regehr 1998, Faulkner 1996). The shortcomings of these forms include patient variability, stress associated with the OR environment, and the difficulty of recognizing the level of technical skill; the surgical skills themselves are not specifically quantified during these structured skills assessments.

Another very common measure used to quantify surgeon performance is the time to complete a task. The time required to perform a procedure is easy to measure and has been used in many studies (Derossis 1998, Fried 1999, Hanna 1998, Hodgson 1999, Rosser 1997, Starkes 1998, Szalay 2000, Taffinder 1999).

Quality of performance has also been used as a method of evaluation. This measure is generally evaluated using subjective methods such as checklists and global assessment ratings (Eubanks 1999, Feldman 2004). Global assessment ratings are a type of subjective evaluation in which an evaluator rates the subject on a scale (e.g., 1 = poor to 5 = excellent). The Objective Structured Assessment of Technical Skill (OSATS) is one of the more commonly used and researched qualitative assessment techniques (Martin 1997); it is a set of operation-specific checklists that is specific to a physical simulator. Quality is a subjective performance measure that can usually be implemented easily in any type of evaluation method.

Error measures have also been studied with some interest. Although most surgeons do not like to speak about errors or injuries occurring during surgery, errors and injuries do occur (Francoeur 2003, Way 2003). The methods of evaluating error vary from objective measures, usually in simulated settings (Francis 2002, Grantcharov 2003, O'Toole 1999), to subjectively observed measures (Bann 2003, Joice 1998, Seymour 2002).

Forces and torques are another measure that can be analyzed. More recently, Rosen and colleagues successfully completed a study in a porcine model analyzing force/torque signatures at the surgical tool tip (Rosen 2001).
The researchers used a Markov modeling method (a method for detecting patterns), along with a structured process for classifying tool movements, to evaluate surgical performance. They showed they could correctly categorize surgeons into two different experience levels (novice and expert) based on similarities derived from their Markov models. Other researchers have also incorporated force measurements into their measurement and training systems (deVisser 2002, Hanna 1997, Morimoto 1997, O'Toole 1999, Wagner 2002, Verner 2002, Yamauchi 2002).

1.5 Simulator Validation

Validity is a general term with many definitions. The American Psychological Association developed a set of standard definitions to aid in validity studies (APA 1974). From these standards, we are most interested in behavioural correspondence validity (hereafter referred to simply as validity), which concerns how the human operator treats the simulator as compared to the real situation. It can be tested by comparing human operator behaviour in the simulator and in the real situation during analogous tasks (Blaauw 1982).

It is of utmost importance that the simulators used for surgical skill training, assessment, and certification be validated. The test in validating surgical simulators is to prove that performance in the simulator represents performance in the OR. To ensure a valid simulation, we must make certain that a surgeon treats the simulation, in as many applicable and quantifiable aspects as possible, the same way they treat a live patient. Many research groups have put considerable time into validating the currently available surgical simulation systems (Adrales 2003, Bloom 2003, Feldman 2004, Paisley 2001, Schijven 2003, Strom 2003, Taffinder 1998).

There are five common levels of validity, from least to most rigorous: face, content, construct, concurrent and predictive (Table 1.1). Face validity is assessed by experts' review of the contents of the simulator. It is a subjective test, as it is based on expert opinion, and is usually done in the initial phases of validity testing. Content validity is an extension of face validity, in which the expert uses a checklist to reduce rater subjectivity; it tests whether the simulator contains the steps and skills used in the real procedure. These simple validity tests are also the most subjective.

Construct validity is tested by discrimination between skill levels. It tests the degree to which the simulator "identifies the quality, ability or trait it was designed to measure" (APA 1974). This is another common test applied to surgical simulators.

Concurrent validity correlates performance with the current gold standard. For surgical simulators, the gold standard is operating room performance by expert surgeons. Currently, the gold standard measurement is made with performance-specific checklists in the OR (Feldman 2004); this approach is generally time consuming and is still considered subjective.

Predictive validity is whether the simulator can predict actual performance in the real setting. This type of validity is rather controversial, as decisions about junior surgeons may be based on simulator performance: if predictive validity is shown, a poor simulator performance may remove juniors from continuing in their surgical training (Gallagher 2003).
In a project parallel to the one described here, a fellow lab member, Catherine Kinnaird, investigated some aspects of the validity of both physical and virtual reality surgical simulators with expert surgeon subjects (Kinnaird 2004). In Kinnaird's work, a new type of validity, performance validity, was introduced. Performance validity is a quantitative assessment of measurable quantities of performance in the OR (i.e., kinematics and force profiles); if these measures are the same as in the surgical simulator, then the simulator can be considered valid. This new type of validity allows for objective assessments using the same measurable quantities in many different environments, giving uniformity and consistency when making evaluations in the OR or in simulators.

Table 1.1: Types of validity definitions (Gallagher 2003)

Validity    | Definition                                                | Studies
Face        | Expert opinion                                            | Haluck 2001, McCarthy 1999
Content     | Checklist of matching elements                            | Paisley 2001, Schijven 2002
Construct   | Differentiates between skill levels                       | Adrales 2003, Datta 2002, Gallagher 2004, Grantcharov 2002, Taffinder 1998, Schijven 2003
Concurrent  | Correlates with gold standard                             | Ahlberg 2002, Feldman 2004, Grantcharov 2004
Performance | Quantifiable performance measures same as "real" setting  | Present study, Kinnaird 2004
Predictive  | Predicts future results                                   | N/A

1.5.1 Construct, Performance, and Concurrent Validity

A very important step in the evaluation of surgical simulators is to establish construct validity. Construct validity is established when performance scores on a simulator reflect the ability of the person performing the actual procedure; an expert should therefore score higher than a novice. Researchers have studied the construct validity of various types of simulators, such as arthroscopy and gastrointestinal endoscopy simulators (Bloom 2003, Srivastava 2004). The concept of construct validity is often regarded as a central theme in validation studies (Gabberson 1997).

In the laparoscopic simulator field, there has been extensive research into the validity of the MIST-VR simulation system (Mentice Medical Simulation AB, Gothenburg, Sweden). The construct validity of this particular system has been established in several studies (Gallagher 2002, Gallagher 2001, McNatt 2001). The latest study on the MIST-VR showed that the system has "discriminative validity" and was capable of evaluating the psychomotor skills necessary in laparoscopic surgery and discriminating experts from novices (Gallagher 2004). The MIST-VR system has been shown to distinguish the performances of subjects with similar experience and similar skill levels, so that subjects can be grouped according to psychomotor skill level. Discriminative validity is a further refinement of construct validity.

Construct validity has also been shown in physical simulators such as the McGill Inanimate System for Training and Evaluation of Laparoscopic Skills (MISTELS) (Fried 2004). This was an in-depth study with over 200 participating surgeons and trainees in 5 countries. The MISTELS system is the physical simulator used in the Society of American Gastrointestinal and Endoscopic Surgeons (SAGES) Fundamentals of Laparoscopic Surgery (FLS) program.

The current "gold standard" for concurrent validity studies is OR performance; the problem is that the methods used to evaluate this OR behaviour (i.e., checklists) are subjective.
1.6 Research Question

Because previous investigations of surgical simulator validity have relied on subjective assessments, there is a need to further the study of simulator validity using quantitative measures. What we would like to know is whether or not motor behaviour in the simulator is analogous to that in the OR. This will allow us to determine whether the simulator is a good training and evaluation environment.

In a complementary study to this project, Catherine Kinnaird (2004) began the investigation into simulator validity by evaluating expert surgeons in the OR and in both physical and VR simulators. That study examined the performance validity of these simulators by comparing data from the OR with data from the simulators, and it motivated a further investigation of simulator validity, which is the subject of this manuscript. The primary objective of this project was to investigate the performance, construct, and concurrent validity of both a physical and a VR surgical simulator; the construct validity study used the expert surgeon data analyzed by Kinnaird (2004). The secondary objective was to develop a system capable of collecting and analyzing quantitative data from the human OR.

1.7 Developing a Quantitative Assessment Method

The development of a quantitative method to assess surgeons in the human OR required much thought and preparation. To study the validity of surgical simulators and gather the performance measures that allow for various context comparisons, we needed to improve and elaborate upon performance measures previously established within our lab. The performance measures used previously include time, kinematics, joint angle and event sequencing (McBeth 2002). As shown in Figure 1.5, known in our lab as the "Wheel of Performance", there are other measures that can be made and incorporated into our system of performance evaluation.

Figure 1.5: Performance measures. The wheel comprises postural, quality, kinematics, event sequencing, force/torque, and error frequency measures. The measures in bold and in the solid-line box (kinematics and force/torque) are the ones used in this study; the measures shown in the dotted boxes are not specifically studied in this thesis.

Due to the time constraints of this project, we focused our study on the following measures: kinematics and force/torque. The quality measure can easily be included via checklists or questionnaires of some type, and, as mentioned above, postural and event sequencing measures were successfully implemented in a previous study in our lab.

1.7.1 Kinematics

For this study, we continued the work of McBeth (2002) to gather and analyze kinematics data for the surgical tool tip during laparoscopic surgery. We required a high frequency tracking system that would give us three-dimensional position and orientation data. For this type of tracking, many commercial systems are available (optoelectronic, magnetic, ultrasonic), each with its own advantages and disadvantages. Optoelectronic systems can provide wireless high frequency data, and can be sterilized for OR use. Hybrid systems that are able to track both passive (wireless) and active (infrared) markers are useful in many circumstances. Disadvantages include line-of-sight problems and interference from external infrared sources.
McBeth (2002) used an optoelectronic system and did have line-of-sight problems: some procedures had virtually no usable data, which led to unreliable results. The optical system also had a low sampling frequency (30 Hz), which led to difficulties in producing velocity, acceleration and jerk profiles. In this project, we wanted to improve upon McBeth's method and produce high frequency, continuous kinematics data.

A tracking system that was available to us and seemed to overcome the problems of the optoelectronic system was an electromagnetic system. Electromagnetic sensors provide higher frequency data sampling and are not affected by line-of-sight issues, but each receiver is connected to the interface unit by a wire. External ferrous materials and electromagnetic fields also detrimentally influence these sensors. Because of these issues, an electromagnetic system could not be used on its own in the OR environment.

This study created a kinematics data collection system that incorporates both the optoelectronic and electromagnetic tracking systems. This overcomes the line-of-sight and low sampling rate problems of the optical sensor, and the low accuracy and metal interference problems of the magnetic sensor. By combining the two position sensors, we are able to achieve a continuous, high frequency kinematics dataset.

1.7.2 Forces/Torques

To measure forces and torques, there are again commercially available systems; force and torque measurements are made with specially designed force/torque sensors. For this project, it was important to find a sensor that was small enough, yet robust enough, to be used in the OR. Strain gauge based force sensors are commonly used, easily available, and can be gas sterilized for use in the OR. We followed the lead of Rosen (2001) with their technique of mounting a force sensor onto the shaft of a surgical tool. We also used strain gauges mounted on the surgical tool handle to aid in the calibration of the force sensor and to measure grip forces.

1.8 Project Goals

Surgery and surgical education are at a point where the traditional "see one, do one, teach one" teaching technique is no longer acceptable. Surgical education experts have more recently looked into the possibility of using simulators to train, test and certify surgeons. Before these simulators can be used in widespread practice, a thorough evaluation of the systems must be done. Validation of these physical and virtual reality simulators is of utmost importance, as a valid simulator will provide an environment that closely approximates the environment where the task will eventually be performed (Prystowski 1999).

The primary goal of this project is to assess the construct and performance validity of two surgical simulators: virtual reality and physical (Figure 1.6). Construct validity refers to the concept that the context actually recreates the environment it intends to recreate. A method of testing this in a surgical setting is to see whether expert surgeons perform better in these simulators than resident surgeons; a simulator that shows construct validity will be able to detect the skill level differences between experts and novices. Performance validity holds when behaviour in the simulator is the same as in the OR: if a subject produces the same quantitative measures (such as kinematics or force) in a simulator as in the OR, the simulator is said to show performance validity.
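Chapter 3 quantifies how far apart two such distributions of a performance measure are using the Kolmogorov-Smirnov (K-S) statistic. As a minimal illustration of this kind of comparison (a sketch with made-up sample data, not the thesis's analysis code, which also involves moving block bootstrap confidence intervals):

```python
# Sketch: comparing distributions of a performance measure from two
# contexts with the two-sample Kolmogorov-Smirnov D-statistic.
# The sample data below are hypothetical stand-ins for measured values.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(seed=0)
speed_or = rng.gamma(shape=2.0, scale=15.0, size=500)   # e.g., tool tip speed in the OR (mm/s)
speed_sim = rng.gamma(shape=2.0, scale=18.0, size=500)  # same measure in a simulator

d_value, p_value = ks_2samp(speed_or, speed_sim)
# D is the maximum vertical distance between the two cumulative
# distributions: 0 means identical, 1 means completely disjoint.
print(f"D = {d_value:.3f}, p = {p_value:.3g}")
```

In these terms, a small D-value between simulator and OR distributions supports performance validity, while a large D-value between expert and resident distributions within a setting supports construct validity.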
We also begin an investigation into quantitatively assessing the concurrent validity of both the VR and physical simulators. We are able to make a quantitative "gold standard" measurement in the OR with expert surgeons (data analyzed by Kinnaird 2004), and gather the same performance measures in all other contexts (i.e., both VR and physical simulators). The results from this study could then be used in the design of new simulators, surgical tools and techniques, and in surgeon training and evaluation.

Figure 1.6: Are laparoscopic surgical simulators valid? We are looking for similar motor behaviours between the simulator and the OR to investigate the validity of laparoscopic surgical simulators. The orange represents a physical simulator, where the task was to peel the skin off the orange and remove a few segments; the bottom left picture is of a VR simulator interface.

As mentioned previously, kinematics and forces were the quantitative performance measures used in our validity study. We required a continuous, high frequency signal for both of these measures, and our existing lab system did not allow for this (i.e., occlusions in the optical data). Therefore, the secondary goal of this project was to develop a new tool that would allow us to obtain these continuous, high frequency measures. The new data collection and analysis system incorporates a fusion of the two kinematics data streams that eliminates the problem of occluded optical data. Previous attempts to combine kinematics data in the surgical environment have been made (Birkfellner 1998, Nakamoto 2000), but none of them was a true fusion of kinematics data.

Chapter 2 describes the design of the new experimental surgical tool and of all the subsystems required for data collection and analysis. It provides a thorough description of the data fusion technique that combines two kinematics data streams to create high frequency, continuous performance measures from the data gathered in the OR and physical simulator, as well as the force measurement considerations and calibrations required to extract force performance measures. Chapter 3 describes the experimental methods used to collect data in the OR and with the two simulators; a description of the equipment used and details of the data post-processing are also included. Chapter 4 contains the results of the experimental testing and a discussion of these results; the reliability of the chosen performance measures and the subject and context variability relating to the validity of the surgical simulators are investigated. Chapter 5 is a summary of the findings, conclusions and recommendations for future work; the conclusions relate to current and complementary studies in surgical education and simulation.

Chapter 2
Experimental Laparoscopic Surgical Tool for Performance Measure Assessment

2.1 Introduction

Minimally invasive surgery is now a common and essential component of modern surgical medicine. Unfortunately, developments in surgical education and assessment have not kept pace. The current methods of surgical assessment have been shown to be subjective and unreliable (Chung 1998, Feldman 2004, Lentz 2002, Rosser 1998, Winckel 1994); therefore, it is agreed that there is a need for an objective method to assess surgical performance. For many years, the notion that operative skills should be evaluated has been brought up repeatedly (Kopta 1971).
Surgical simulators may provide an excellent venue for performance evaluation, as performance can be measured objectively. Bench-top trainers, virtual reality (VR) systems and animal models are all currently used in surgical education programs. The performance measures that have been used by researchers include completion time, errors, force/torque signatures, event sequencing, and tool tip kinematics (Chung 1998, deVisser 2002, Derossis 1998, Hanna 1998, McBeth 2002, Rosen 2002, Way 2003, Yamauchi 2002).

The longer-term goal of the lab's projects is to create a surgical skills database where surgeons could look up their performance as compared to others. A surgical resident would be able to compare their performance to others of their own level and see where they need to improve, or where they excel. But in order to do this, research must be done to validate the surgical simulators and prove that training in a simulator does improve OR performance. Studies have shown that expert surgeons perform better in simulators than novices, and that practicing in a simulator leads to improvement in the simulator (Derossis 1998, Fried 2004, Rosser 1997). It has also been shown that assessments made in a simulator can be used to monitor progress (Derossis 1999, Fried 2004), and that practice in a porcine model leads to OR performance improvement (Fried 1999). Even more recently, breakthrough studies have shown that practice in a simulator does indeed lead to improvements in the human OR (Seymour 2002, Grantcharov 2004). This tells us that skills learned in a simulator could be used to replace OR time for learning.

This chapter describes the design and considerations for creating a new tool and data collection system to measure OR and simulator data used in studying the validity of both physical and virtual reality simulators. The objective is to improve upon the current tools used to gather the performance measures, and to add force measurement to the system originally created by former lab member Paul McBeth (2002). A new technique was also created to fuse our two gathered streams of kinematics data into a high frequency, continuous kinematics data stream.

2.2 Laparoscopic Surgical Tool

The laparoscopic surgical tool used in these studies was a Maryland dissector, as seen in Figure 2.1. This tool was chosen, on the recommendation of an expert surgeon participating in our studies, because it is used the most during the initial parts of the laparoscopic cholecystectomy procedure to dissect the surrounding tissues away from the cystic duct and artery. It is used to pull, spread, and tear away extraneous body tissues, and when connected to the electrosurgical unit, it is capable of burning and cauterizing tissues. We obtained a commercially available tool through Storz Endoscopy. These tools have an interchangeable tool tip insert; other tool tips may be purchased and used instead of the Maryland dissector insert. This is a good feature for future work, as different tips, and therefore different motions and forces, will be available for data collection.

Figure 2.1: Maryland dissector tip. Used for dissecting away surrounding tissues.

2.3 Performance Measures

There is a wide range of performance measures available for assessing surgical skill.
Based on consultation with expert surgeons, literature searches, and the protocol from the previous study done in our lab by Paul McBeth (2002), we continue with the chosen performance metric of tool tip kinematics, with the addition of tool tip force/torque. We no longer include the completion time, ergonomics/joint angle and event sequencing measures that were previously implemented, but they can easily be re-incorporated into the system. The following sections further describe the selected performance metrics and the methods we used to collect the data.

2.3.1 Kinematics

The use of tool tip kinematics in assessing surgical performance has become more common in surgical performance measurement systems (McBeth 2002, Rosen 1999). Rosen's group has created the BlueDragon system, which measures kinematics of the tool tip in vivo in a porcine model (Rosen 2002). Another group has incorporated electromagnetic trackers to measure distance, number of movements, and speed of a surgeon's hand movements in a laboratory setting (Taffinder 1998, Smith 2002). In a previous study within our lab (McBeth 2002), kinematics data was collected using an optoelectronic position tracking system. Our group continued McBeth's work and elaborated and improved the system to measure tool tip kinematics data, investigating tool tip velocities, accelerations, and jerk in the following tool tip directions: axial, grasp, translation, transverse, absolute, and roll about the tool axis (Figure 2.2).

Figure 2.2: Tool tip reference frame. Tool tip directions with respect to the tool handle. The axial (z) direction is along the tool shaft, the grasp (y) direction is in line with the tool jaws, and the translate (x) direction is perpendicular to the y and z axes.

2.3.1.1 Optoelectronic Position Tracking

In the previous study in our lab by Paul McBeth (2002), an optoelectronic motion tracking system was used to collect the kinematics data. According to the product manual, the Polaris Hybrid Tracking System (Northern Digital Inc., Waterloo, ON, Canada) (Figure 2.3) is capable of tracking the 3D positions of both infrared light emitting diodes (IREDs) and passive reflective markers with an accuracy of approximately 0.2-0.3 mm. In our study, we used only passive markers. This optoelectronic system was originally chosen because surgeons have seen such systems in operating rooms and are familiar with their presence, the parts are easily sterilizable, a system was available, and we were primarily interested in postural data.

Figure 2.3: NDI Polaris optoelectronic position tracking system. The top picture is the camera unit, and the lower picture is the tool interface unit.

The Polaris system uses an infrared camera to track the desired markers. It requires three passive markers (retro-reflective balls) to establish an array, or reference frame; Polaris records the position and orientation of the reference frame with respect to the camera. A Multi-Directional Marker Array (MDMArray), custom designed and made by McBeth (2002), was attached to the experimental tool to track tool movement. This specially designed array was created to make the tool visible from many more angles than a standard planar array, which was one of the original problems with this system. The MDMArray has five geometrically unique faces, and can be rotated in many directions while still allowing the Polaris camera to track one face at a time.
This allows for improved visibility of the passive markers to the camera and more continuous data collection, as intermittent data, and therefore gaps in the data stream, is a significant problem. See Figure 2.4 for a picture of the MDMArray.

Figure 2.4: MDMArray. Halo of optical passive marker balls used for optical position tracking. The infrared camera tracks faces (3 balls) of the array.

The study conducted previously in our lab showed that the Polaris optoelectronic system is usable in the OR, but some limitations were discovered. Because the Polaris depends on line-of-sight from the camera to the marker arrays, the arrays can become occluded from the camera's view by surgeon movements, interrupting and leaving gaps in the data stream. It was found that during typical manipulation tasks, the clipping tool was visible only 78 +/- 12% of the time, even with the MDMArray (McBeth 2002).

2.3.1.1.1 Other Kinematics Options

Because our goal was a system that could gather continuous, high frequency data for our performance measures, we considered various options, such as:

• Re-designing/modifying the current marker array to allow more positions of the array to be seen
• Adding more optical tracking cameras so that the arrays could be seen from multiple angles, increasing visibility
• Incorporating a second motion tracking system:
  o Accelerometer/gyroscope
  o ShapeTape™ (a flexible, tape-like position sensor that reports its shape)
  o Electromagnetic system

The options of changing the marker array or adding more cameras to the optical tracking system were discarded, as they might reduce the problem of occlusions/gaps but would likely not solve it completely; the sampling frequency would also remain relatively low. The accelerometer/gyroscope option was considered, but was not easily available in our lab, and the same was true of the ShapeTape™; neither system could have been designed and debugged in a reasonable amount of time. An electromagnetic system was available through inter-departmental collaborations, and would provide the high frequency, continuous data stream that we required.

2.3.1.2 Electromagnetic Position Tracking

Electromagnetic tracking systems have been used in surgical applications in the past, but problems such as electrical noise and interference have been reported (Datta 2002, Frantz 2003, Smith 2002). Other researchers have attempted to combine sensors in the surgical environment. A study completed by Birkfellner and colleagues (Birkfellner 1998) at the University of Vienna successfully combined and calibrated a hybrid (optical and electromagnetic) tracking system. Their motivation for merging the two tracking systems was similar to ours, in that they were concerned with the optical system's line-of-sight limitations, especially in a crowded environment like the OR; the electromagnetic system provided a continuous stream of data. Their hybrid tracker employed a simple switching protocol: if the optical system was in view and available to collect data, it was used; if not, data was requested from the magnetic tracker. Only one piece of data was collected at each time interval, either optical or magnetic, so no true fusion was performed. This system was tested in an OR test set-up, but not during an actual operation on a human.
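A minimal sketch of such a switching protocol is shown below; the optical and magnetic reader objects and their read() methods are hypothetical placeholders, not Birkfellner's implementation.

```python
# Minimal sketch of a hybrid switching protocol: prefer the optical
# reading whenever the markers are visible, otherwise fall back to the
# magnetic reading. The tracker objects are hypothetical placeholders.

def read_pose(optical, magnetic):
    """Return one pose per time step from whichever tracker is available."""
    pose = optical.read()        # assumed to return None when occluded
    if pose is not None:
        return pose, "optical"
    return magnetic.read(), "magnetic"
```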
The main contribution of this group was to investigate to what extent ferromagnetic materials in the OR affect a magnetic tracking system; they created a calibration look-up table to compensate for the interference, and found the calibration to remain useful after multiple registration attempts under varying OR conditions. This is an idea that holds promise for future studies. It was not used in our study because we could collect two separate data streams (optical and electromagnetic sensors) with relative ease, and creating a switching protocol would have been more time consuming.

Another group, led by Nakamoto (2000), also created a hybrid system involving both optical and electromagnetic tracking. This group recognized that the sources of many inaccuracies in a magnetic system in an operating room are the OR table and surgical instruments. Because of space and time constraints, it is also very difficult to calibrate for these distortions during or before an operation. The group developed a calibration method that allowed the magnetic transmitter to be moved intraoperatively, and allowed for optimal physical placement of the transmitter, by using an optical sensor to track the magnetic transmitter. An interesting discovery by this group was that the distance between the magnetic transmitter and receiver must be relatively short to maintain acceptable accuracy: they found that the transmitter-receiver distance must be within 20 cm to hold the error to 2 mm in and around OR equipment. This group did not appear to fuse the data, but simply used the optical system to track the magnetic transmitter.

Our goal was to fuse the optical and magnetic data to create one continuous, high frequency dataset. This was the most reasonable and feasible option at the time, as we were able to collect both the optical and magnetic data easily. We wanted to rely on the accuracy of the optical system, but use the continuous high frequency data from the magnetic system. By performing a data fusion, we were able to take advantage of the good qualities of both systems.

The electromagnetic system we used was the Polhemus Fastrak (Polhemus Inc., Colchester, VT, USA), chosen as the complementary tracking system to the Polaris optoelectronic tracking system. The Fastrak is a magnetically based tracking system in which a fixed transmitter sends out low frequency magnetic fields that allow the moving receiver to determine its position; six degrees of freedom (position and orientation) can be measured. The Fastrak does not suffer from line-of-sight issues, and has a much higher sampling frequency (120 Hz). Magnetic systems do, however, have their own disadvantages: they suffer from drift and from interference from ferrous metals in the environment, and they are electrically wired. The Polhemus user manual gives an accuracy of 2 mm within the 1 m³ working volume, but one study found that this accuracy could only be achieved within a transmitter-receiver distance of 22 cm (Milne 1996). The Polhemus Fastrak electromagnetic tracking system can be seen in Figure 2.5.

Figure 2.5: The Polhemus Fastrak magnetic position tracking system. This picture shows the tool interface unit, power supply, transmitter, receiver and the stylus.
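Given these reported distance limits, one simple screening step is to flag magnetic samples collected too far from the transmitter before using them. A minimal sketch follows; the 0.25 m threshold is an illustrative value within the ~20-30 cm range cited above, not a calibrated constant.

```python
import numpy as np

def flag_far_samples(receiver_pos_m, transmitter_pos_m, max_dist_m=0.25):
    """Flag magnetic samples taken beyond max_dist_m from the transmitter.

    receiver_pos_m: (N, 3) receiver positions; transmitter_pos_m: (3,)
    transmitter location, both in the same frame and in metres.
    Returns a boolean mask marking rows of reduced expected accuracy.
    The threshold is illustrative (see Milne 1996 and the text above).
    """
    dist = np.linalg.norm(receiver_pos_m - transmitter_pos_m, axis=1)
    return dist > max_dist_m
```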
2.3.2 Force/Torque

The adequate and appropriate use of forces/torques (F/T) in any surgical procedure is a skill that must be learned by the novice and practiced carefully by all. Surgical procedures require a certain amount of finesse and knowledge when applying forces and torques to human tissues, and it is important that a surgeon is aware of this aspect and takes it into account during any procedure. Our goal was to collect continuous, high frequency F/T data measuring the surgical tool tip-tissue interaction forces/torques during live human surgery.

Rosen and colleagues have successfully measured forces and torques in vivo in a porcine model, and were able to classify surgeons' skill level using force/torque signatures (Rosen 1999). Their F/T data was collected using two separate sensors: a tri-axial F/T sensor and a strain gauge system mounted on the surgical tool handle (Figure 2.6). Their sensor is a custom-made tri-axial F/T transducer that mounts directly onto a laparoscopic tool shaft (via a hole through the center of the transducer).

Figure 2.6: F/T system of Rosen (1999). A custom-designed F/T sensor, with a hole through its center, was mounted directly onto the surgical tool shaft. A strain gauge system is mounted onto the tool handle.

To measure the forces and torques associated with the surgical tool tip, we mounted a Mini40™ (ATI Industrial Automation, Apex, NC, USA) force/torque sensor (Figure 2.7) on our experimental Maryland dissector tool. This is a strain gauge based transducer, able to withstand the forces and torques used in laparoscopic cholecystectomies. Forces and torques in all three axes (Fx, Fy, Fz, Tx, Ty, Tz) were recorded at 120 Hz in counts per unit force, and this data is collected continuously, directly into a Matlab file. Willem Atsma, also a member of the Neuromotor Control Laboratory, wrote the Matlab drivers for streaming the F/T data. The Mini40 was chosen because it was available in our lab, and because it is compact enough to be mounted onto the surgical tool without much interference and without significantly affecting the weight of the tool.

Figure 2.7: ATI Mini40 F/T transducer. It is 40 mm in diameter, 12.2 mm thick, and weighs 50 g. Sensing range: Fx, Fy +/- 80 N; Fz +/- 240 N.

2.3.3 Sensor Bracket Design

To attach the optical MDMArray, the magnetic receiver, and the F/T sensor to the Maryland dissector surgical tool, some type of mounting bracket was required. Many considerations were taken into account in the design of this mounting bracket (Table 2.1).

Table 2.1: Criteria for the design of the sensor mounting bracket.
• Force bearing/load path: to allow the force path to travel through the F/T sensor
• Lightweight: to not affect the surgical tool weight and balance
• Small: to keep the surgical tool shaft length as long as possible
• Non-conductive material: to allow electrocautery current to pass through the tool shaft and not through the sensors (especially the F/T sensor)
• Non-obtrusive: to allow the surgeon as normal a tool function as possible
• No sharp edges: to prevent surgical staff from cutting gloves

Special measures had to be taken so that the F/T sensor could function properly and measure the tool tip forces/torques; we wanted to ensure that all forces would be transmitted through the innermost shaft of the surgical tool.
To do this, the outer shaft of the tool was cut to allow these forces to be transmitted along the innermost shaft (Figure 2.8), through the bracket, and then through the F/T sensor. This compromised the original electrical isolation coating of the surgical tool shaft (seen as the thin black coating on the tool shaft), and care had to be taken to minimize the area of the electrically live shaft that is exposed. The bracket was designed to prevent accidental contact between the human and the exposed shaft.

Figure 2.8: Two cut views of a typical laparoscopic tool shaft. It consists of two layers of tubes (thin and thick lines) and the innermost shaft (dotted fill). The outer layer is a protective, electrically insulating covering; the middle tube is the metal structure of the tool shaft; and the innermost shaft is connected to the tool tip and the tool handle. The innermost shaft and the middle tube are electrically live when electrocautery current is applied.

After much iteration, the bracket was finally designed to mount all sensors and to satisfy the criteria (Figure 2.9). The bracket was designed in conjunction with a volunteer lab engineer (Brandon Lee).

Figure 2.9: Sensor mounting bracket on surgical tool. The inset picture is a close-up of the sensors mounted on the bracket.

The final design of the bracket consists of two parts, top and bottom segments, as seen in Figure 2.10. The bracket parts are mounted in between the two parts of the original surgical tool. The top segment is directly attached to the outer shaft of the surgical tool, and all forces on this top segment are sensed. The force sensor was not mounted inline, as with Rosen's device shown earlier, because our force sensor did not have a hole drilled through it to allow passage of the central rod.

Figure 2.10: The two segments of the force/torque sensor bracket. The ATI force/torque transducer is mounted below the tool shaft via the custom designed bracket (Source: Kinnaird 2004).

The design file was submitted in STL format (generated by SolidWorks) to technologists at the British Columbia Institute of Technology (BCIT), who constructed the bracket out of a medical grade ABS plastic on a rapid prototyping machine. A non-conductive material was required because of the electrical current that is transmitted through the shaft for tissue cutting and coagulation; all the sensors, as well as the user and patient, must be protected from this current. The design and material of the bracket also set the magnetic receiver as far away as possible from any metallic elements that could potentially introduce errors into the magnetic sensor readings. The wires coming from the Fastrak, the F/T sensor, and the strain gauges can all be gathered to one side of the bracket and tied together to minimize obstruction to the surgeons; this is done before each experimental surgical procedure. The detailed drawing and specifications of the mounting bracket can be found in Appendix C.

2.3.3.1 Force Balance

The MDMArray optical halo and F/T sensor connect the two segments. Since the force sensor is in the path connecting the segments, it registers any forces acting between them (Figure 2.11).

Figure 2.11: Force load path through sensor bracket and F/T sensor. The force travels bi-directionally along the tool shaft: through the bottom segment, through the sensor, and through the top segment, or vice versa.
In an OR situation, a trocar (a tubular object used to hold the surgical tool near the operating site) is inserted into the abdomen, and the surgical tool is inserted through this trocar (Figure 2.12). This allows smoother movement of the tool and provides stability for the long laparoscopic tools. The surgical tool can be pushed down or pulled back along the length of the trocar to access deeper tissues. Because it is sealed, the trocar also keeps the abdominal inflation gases inside; these gases are required in laparoscopic surgery to allow for better internal visualization.

Figure 2.12: Laparoscopic trocar. The trocar, inserted through the abdominal wall, provides stability for the long laparoscopic surgical tools. There are force interactions between the tool shaft and the trocar that are sensed by the force sensor.

In our subsequent data analysis, we require an estimate of the forces the surgeon applies to the tissues using the tool. In this section, we present a free body diagram (FBD) analysis of the loads applied to the tool and demonstrate how the tip forces are estimated. These FBDs are shown in Figures 2.13a-e.

Figure 2.13a: Overall view of the tool (rod, tool tip, tool shaft, force sensor, rigidly attached tool handle, moving tool handle), which is then split into 3 sections (2.13c, d, e) for FBD analysis. The dashed line at "A" represents the cut used to create sections 2.13c and 2.13d. The "B" dashed line is used to create figures 2.13d and 2.13e.

In the following figures (2.13c-e), the following abbreviations are used: Fta (actual tissue-tip interaction force), Ft (effective tissue-tip interaction force), Fa (force along the shaft, i.e., trocar forces), Fr (tool rod force), Fs (sensor force), Fg (gravity force), Fh (hinge forces), and Ff (grip force of the hand on the tool handles). The respective moments are also included.

The gravitational force (Fg) is assumed to be in the negative y-direction for illustrative purposes; in general, it will be a function of the tool's attitude. The effect of gravity forces on the force sensor is accounted for using a calibration method fully described in Kinnaird's thesis (Kinnaird 2004) and introduced in section 2.4 below. In the following FBDs, we identify the gravitational forces on the tool, but in the subsequent analysis we assume that the sensor readings have been adjusted to take these forces into account, and therefore set the gravitational forces to zero.

There are two force-sensing elements in the tool: the force sensor collects forces and moments in all 3 directions (x, y and z), and the strain gauge pair is used to estimate the bending moment in the handle used to apply grasping forces. From these sensor readings, we are able to estimate the tip-tissue interaction forces as described below. The Ft and Mt values are what we consider the effective tip force (i.e., the combination of the actual tip-tissue interaction forces and any forces along the tool shaft). Unfortunately, because the interactions between the surgical tool shaft and the trocar do not occur at a well defined point and are not sensed separately from the tip forces, the trocar interaction forces were not specifically modelled in this study.
Directly estimating these trocar interaction forces would require a model that could account for the movement of the surgical tool along the trocar, but this is difficult because the trocar does not act on the tool at one specific point, but along a 7-10 cm portion of the tool shaft. The characterization of the trocar-tool interaction forces could be investigated further in future studies. We did assume that the axial trocar forces were likely not to be very large in comparison with the tip-tissue interaction forces, because the tool can slide through the trocar under its own weight. It is more difficult to justify a claim that the lateral forces are low because, although the abdominal wall is compliant and the tool is rarely used as a "pry bar", those forces could be comparable in magnitude. Nonetheless, since the point of application of the trocar forces changes, it is difficult to cleanly separate the two, which is why we have decided to represent the forces as equivalent tip forces and moments (i.e., tip forces + trocar forces = effective tip force), as shown in Figure 2.13b. Equations (a)-(f) show how the effective tip forces are affected by the presence of trocar forces.

Figure 2.13b: The effective tip forces (Ft) and moments (Mt) are a combination of the trocar interaction forces (Fa) and moments (Ma) and the actual tip forces (Fta) and moments (Mta); dt is the distance between the tip and the trocar.

The equations used to find the effective tip forces and moments are:

a) Ftx = Ftax + Fax
b) Fty = Ftay + Fay
c) Ftz = Ftaz + Faz
d) Mtx = Mtax + Max - Fay(dt)
e) Mty = Mtay + May + Fax(dt)
f) Mtz = Mtaz + Maz

Figure 2.13c: Free body diagram of the distal end of the surgical tool and force sensor. The effective tip forces are represented. The "d" values are the perpendicular distances of the forces used in the moment equations.

The equilibrium equations (assuming acceleration is comparatively low and can be neglected) are:

1) ΣFx = 0 = -Ftx + Fr + Fsx
2) ΣFy = 0 = -Fty - Fg1 + Fsy
3) ΣFz = 0 = Ftz + Fsz

Summing moments about the center of the force sensor (black circle on figure):

4) ΣMx = 0 = Ftz·d1 - Mtx + Msx
5) ΣMy = 0 = Ftz·d2 - Mty + Msy
6) ΣMz = 0 = Ftx·d1 - Fr·d1 + Fg1·d3 + Fty·d2 + Mtz + Msz

Our goal here is to express the effective tip forces and moments in terms of the measured forces and the rod force:

• From eq. 1: Ftx = Fr + Fsx (eq. A)
• Applying gravity compensation (Fg1 = 0) to eq. 2: Fty = Fsy (eq. B)
• From eq. 3: Ftz = -Fsz (eq. C)
• From eq. 4: Mtx = Ftz·d1 + Msx (eq. D)
• From eq. 5: Mty = Ftz·d2 + Msy (eq. E)
• From eq. 6 and gravity compensation: Mtz = -Fty·d2 + Fr·d1 - Ftx·d1 - Msz (eq. F)

We have 6 equations but 7 unknowns. The force sensor gives values for Fs and Ms; Fr is derived from the analysis of Figure 2.13e, described below.

Figure 2.13d: Free body diagram of the force sensor and stationary tool handle. The "d" values are the perpendicular distances of the forces used in the moment equations, and are different for each section figure. Note that this diagram is not used in the analysis, but is shown for completeness.
7) ΣFx = 0 = Fsx - Ff1x - Fhx
8) ΣFy = 0 = Fsy - Fg2 - Ff1y + Fhy
9) ΣFz = 0 = Fsz + Ff1z - Fhz

Summing moments about the center of the force sensor (black circle on figure):

10) ΣMx = 0 = Msx - Mhx + Ff1y·d4 - Ff1z·d6
11) ΣMy = 0 = Msy + Fhx·d5 - Ff1z·d8 - Mhy
12) ΣMz = 0 = Fhy·d5 - Fg2·d5 + Ff1y·d8 - Ff1x·d6 - Fhx·d7 - Msz

Figure 2.13e: Free body diagram of the tool handle and strain gauges. The "d" values are the perpendicular distances of the forces used in the moment equations, and are different for each section figure. These equations are derived to show that the act of gripping results in the rod force, Fr.

13) ΣFx = 0 = -Fr + Fhx + Ff2x
14) ΣFy = 0 = -Fg3 - Fhy + Ff2y
15) ΣFz = 0 = Fhz + Ff2z

Summing moments about the hinge:

16) ΣMx = 0 = Mf2x - Ff2z·d11
17) ΣMy = 0 = Mf2y + Ff2z·d10
18) ΣMz = 0 = Fr·d9 + Ff2y·d10 + Ff2x·d11 - Fg3·d12 - Mf2z

• From eq. 13: Fr = Ff2x + Fhx (eq. G)
• Applying gravity compensation (Fg3 = 0) to eq. 14: Ff2y = Fhy (eq. H)
• From eq. 18 and gravity compensation: Fr·d9 = -Ff2y·d10 - Ff2x·d11 + Mf2z (eq. I)

From eq. 18, if we assume that Ff2 (the grip force of the fingers) in Figure 2.13e is applied at the same fixed spot (as the finger holes of the tool are not large), and we take the sum of the moments around the hinge, we find that Fr = f(Ff2, Mf2). Our strain gauge pair senses the bending moment in the handle at the gauge location, (Ff2)(grip-to-strain-gauge distance) + Mf2. We also believe that Mf2 ≈ 0, because it is physically difficult to apply a pure couple here. Therefore we can assume that the strain gauge pair's output is proportional to Ff2. So, in principle, we can compute Fr directly and substitute it back into eqs. 1-6 and the equations derived from them. In fact, it is more straightforward to observe the effect of grip forces on the force sensor output and to directly correct the force sensor readings as a function of the strain gauge pair's output, as described below in section 2.4; details are contained in Kinnaird's thesis (Kinnaird 2004).

Therefore, the final equations for estimating the effective tip forces and moments are:

19) Ftx = Fr + Fsx
20) Fty = Fsy
21) Ftz = -Fsz
22) Mtx = Ftz·d1 + Msx
23) Mty = Ftz·d2 + Msy
24) Mtz = -Fty·d2 + Fr·d1 - Ftx·d1 - Msz

2.3.4 Grip Force

The measured grip forces are used to correct the force readings to better estimate the tool shaft loads. The grip force is measured by two strain gauges mounted on the surgical tool handle (Figure 2.14); the gauges can be seen on the tool in Figure 2.15. In this half-bridge configuration, the gauges measure the forces exerted on the handle perpendicular to its axis, and we can correlate this force to forces at the surgical tool tip.

Figure 2.14: Strain gauge circuit diagram for the half-bridge configuration. The "F" represents the surgeon's grip force exerted on the tool handle; one gauge reads RG + ΔR (tension) and the other RG - ΔR (compression). Vo = output voltage, VEx = excitation voltage, GF = gauge factor, ε = strain, RG = nominal resistance of the strain gauge, ΔR = strain-induced change in resistance, R1 and R2 = reference resistors.

The two gauges used were standard 120 Ω Vishay Micro-Measurements gauges. They are fed into an instrumentation amplifier with built-in gain and offset control; this signal conditioner, which contains the reference resistors, also compensates for temperature. The grip strains can then be extracted from the total F/T measurement to give a more accurate tool tip force measurement. This is explained further in section 2.4.1.
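To make the bookkeeping of equations 19-24 concrete, a minimal sketch is given below. It assumes the sensor readings are already gravity-compensated, and uses a hypothetical linear grip constant in place of the full calibration of section 2.4.1.

```python
import numpy as np

def effective_tip_wrench(Fs, Ms, strain_V, grip_constant, d1, d2):
    """Estimate effective tip forces/moments from sensor readings (eqs 19-24).

    Fs, Ms: (3,) force and moment vectors from the F/T sensor, assumed
    already gravity-compensated. strain_V is the strain gauge pair output;
    Fr = grip_constant * strain_V stands in for the grip calibration of
    section 2.4.1. d1, d2 are the moment arms of Figure 2.13c (metres).
    """
    Fr = grip_constant * strain_V                   # rod force from gripping
    Ftx = Fr + Fs[0]                                # eq. 19
    Fty = Fs[1]                                     # eq. 20
    Ftz = -Fs[2]                                    # eq. 21
    Mtx = Ftz * d1 + Ms[0]                          # eq. 22
    Mty = Ftz * d2 + Ms[1]                          # eq. 23
    Mtz = -Fty * d2 + Fr * d1 - Ftx * d1 - Ms[2]    # eq. 24
    return np.array([Ftx, Fty, Ftz]), np.array([Mtx, Mty, Mtz])
```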
See Figure 2.15 for a picture of the strain gauges mounted on the surgical tool handle.

Figure 2.15: Strain gauges mounted on the tool handle. The figure on the left is a general diagram of the surgical tool. The picture on the right shows a side view, where two gauges are attached on opposite sides of the tool handle.

2.4 Force/Grip Data Processing

In the previous section, we showed that the force sensor also responds to grip forces. Here, we explain how we use the strain gauges to separate grip forces from tool interaction forces. The following sections describe our concerns with the force sensor calibration, and what was done to extract force data as a performance measure.

2.4.1 Grip Calibration

To fully understand and use the data received from the F/T sensor, an understanding of the mechanics of the surgical tool is needed (Figure 2.16). A typical laparoscopic surgical tool shaft has a few layers, as described above in section 2.3.3. The innermost long shaft is attached to both the tool handles and the tool tip, and controls the opening and closing of the tool tip jaws. When the tool handles are opened by the surgeon, the inner shaft moves and shortens, causing the jaws to open via the built-in pivot mechanism.

Figure 2.16: Mechanics of the laparoscopic tool. When the handles are opened, the tool tip jaws are also opened, due to a shortening of the innermost tool shaft. (Modified from source: Kinnaird 2004.)

The F/T transducer senses this movement and records it accordingly. Because of the design of our F/T sensor, the surgical tool shaft had to be cut and the special bracket mounted, as discussed previously in section 2.3.3. All the loads on the inner shaft are transferred through the bracket and sensed by the transducer. The interaction between the strain gauges and the force sensor is depicted in Figure 2.17. Through calibration, the strain gauge data is used to separate the grip forces from the actual tissue manipulation forces.

Figure 2.17: Interaction between strain gauges and force sensor. 1) Surgeon closes handle. 2) Strain gauges sense strain in tool handle. 3) Tool tip jaws close. 4) Tool shaft goes into compression, and the force sensor (dotted fill) senses this force against the tool bracket (solid black fill).

Discussion of the calibration algorithm used to separate the grip forces from the tissue manipulation forces can be found in Kinnaird's thesis (2004). The tool was held in a neutral position, and the tool handle was opened and shut while data was recorded from both the force sensor and the strain gauges. The results from one of Kinnaird's (2004) calibration tests are shown in Figure 2.18.

Figure 2.18: Results from one calibration test (sample calibration grip data, one direction: force sensor output versus strain data (V)). The friction loop (dotted line) and the arrows indicate the direction of motion. This loop is not consistent, and varies with grip strength. (Source: Kinnaird 2004.)

Ideally, the force reading would be linearly related to the grip strain, but there is clearly some nonlinearity. The force versus strain graph shows flat sections at both ends, where the strain reading increases while the force reading remains constant. The flat part at the lower left is likely due to friction within the tool handle (see Figure 2.19): the strain gauges detect the initial forces required to overcome the friction before any load is transmitted to the force sensor.
There is also a large amount of hysteresis during the release after the squeeze, as shown by the dashed line in Figure 2.18. This loop is not constant, and its location varies with the strength of the squeeze. The upper right portion of the plot demonstrates another flat section, this one due to saturation of the force sensor. The force sensor has overload protection, but occasionally the surgeon's grip reaches the maximum sensing range of the sensor, which causes the strain reading to increase while the force reading stays constant. Conversation with Catherine Kinnaird and a visual inspection of the force data verified that the saturation problems (the upper flat part of the curve) were not very significant: based on that visual inspection, it is believed that less than 2% of the force readings hit this saturation area.

Figure 2.19: Friction in the surgical tool handle and bracket. There was friction in the tool hinge handle and at the tool rod/bracket interface.

The mounting bracket also moved/slipped slightly along the tool shaft due to an improper fit. This movement may have contributed to the hysteretic loop in Figure 2.18, and to tool handle movements not being sensed by the force sensor. These problems led to the need for a somewhat more complicated grip compensation algorithm than could have been used if the relationship between strain gauge output and force sensor output exhibited neither hysteresis nor saturation. A mean grip constant was calculated for the linear portion of the hysteretic loop (Figure 2.20). When the force data was within the linear range (inside the dotted oval), grip forces were removed; outside of this range, no compensation was applied. On the upper side of this linear region, the force sensor is no longer responsive to changes in the tip force, and applying no compensation there leads to misleadingly high force peaks, especially in the axial direction. In conversation with Catherine Kinnaird, and by visual inspection of the force data, it was believed that about 20% of the force data might lie outside the linear region (outside the dotted oval). Kinnaird (2004) completed an error study of the compensation algorithm and found that it reduces the RMS tip force error by about 50% in a typical manipulation (in a simulator environment). While it was correct not to apply any compensation to the force sensor reading when the grip force was at the low end of the curve, the correct action in the case of saturation would have been to set the tip force reading to zero and, if possible, to exclude such data from subsequent analysis. However, we erroneously left the readings uncorrected; we believe this ultimately had a small effect on our conclusions, because saturation occurred relatively infrequently.

Figure 2.20: Grip compensation. Grip is removed from the force data only in the linear region, outlined by the dashed oval (tool friction at the lower end, force saturation at the upper end). This leads to misleadingly higher force peaks in the force data stream.

With regard to tool design, the next iteration of the mounting bracket should be better fitted to the tool shaft to prevent the "slipping" movement mentioned above. The force sensor chosen should also be able to read the higher forces without saturating.
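A minimal sketch of this piecewise compensation is shown below; the strain thresholds delimiting the linear region are illustrative placeholders, not Kinnaird's calibrated values.

```python
def compensate_grip(force_N, strain_V, grip_constant,
                    strain_lo=0.5, strain_hi=4.0):
    """Remove grip-induced force only inside the linear strain region.

    Below strain_lo, handle friction absorbs the grip, so no correction
    is needed; above strain_hi, the force sensor saturates and no
    correction is applied (the source of the misleadingly high force
    peaks noted above). Thresholds are illustrative, not the calibrated
    values from the grip calibration.
    """
    if strain_lo <= strain_V <= strain_hi:
        return force_N - grip_constant * strain_V
    return force_N
```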
2.4.2 Gravity Effects Calibration

The effects of gravity on the F/T sensor are significant enough that they must also be compensated for. The sensor is inherently quite sensitive, and is mounted off the axis of the surgical tool shaft, so the mass of the surgical tool alone creates a force/torque reading. These F/T readings can vary by up to ~5 N when the surgical tool is held or placed in different roll (rotation about the z axis in the tool tip frame (Figure 2.2)) and pitch (rotation about the y axis in the tool tip frame) orientations. Rotation in yaw (about the x axis in the tool tip frame) does not affect the F/T readings in the neutral position, as gravity naturally acts perpendicular to this axis of rotation. To improve the F/T readings, these effects of gravity must be compensated for. A mathematical model was created to compensate for roll and pitch; the details can be found in Catherine Kinnaird's thesis (2004). This model led to an almost 2X decrease in RMS error when compared to a simple mean subtraction method, in which the mean force value is subtracted from the total force.

2.5 Kinematics Data Fusion

Because we used two different position sensors in our data collection process, we make use of both datasets for our kinematics measure, and have created a technique to fuse them.

2.5.1 Data Fusion Introduction

In a previous study in our lab by McBeth (2002), the Polaris optoelectronic tracking system was used to collect the 3D position data for the method of quantitatively measuring surgical performance. The optical data was collected at 20 Hz in the McBeth study, a suitable sampling frequency for measuring human movement, particularly for postural studies (Woltring 1986). However, as mentioned previously, one drawback of the optical sensor is that it is susceptible to line-of-sight problems, which lead to occlusion of the optical markers and consequently to gaps in the optical data stream. McBeth also suggested that a maximum gap size of 0.5 s could be interpolated successfully (McBeth 2002); in the OR, marker occlusions longer than 0.5 s occur quite frequently. These are the main reasons for wanting to improve the position tracking system for quantifying motor performance. Various options were considered (e.g., ShapeTape, accelerometers, gyroscopes), as discussed in section 2.3.1.1.1, but in the end we chose to combine an electromagnetic position tracking system with the optical sensor. Due to availability and ease of use, the Fastrak electromagnetic position tracking system was chosen to complement the Polaris. The Fastrak is a three-dimensional (position and orientation) magnetic tracking system; such systems are free of line-of-sight issues, collect data continuously, and sample at a high frequency (~120 Hz). According to the product manual, the 3D positions and orientations of the receivers can be measured with an accuracy of 2 mm and 0.15° within a 1 m³ working volume surrounding the magnetic transmitter, but we have found that in reality the accuracy is not as good as the manual suggests: once the transmitter-receiver distance is greater than ~20-30 cm, the data become less accurate and tend to fluctuate around the actual value.

2.5.2 General Data Fusion

To obtain a useful estimate of the tool position, the two data streams must be fused. The general process of combining multiple data sources is well studied in a wide variety of applications (Challa 2004).
Sensor fusion is defined as "the combination of sensory data or data derived from sensory data such that the resulting information is in some sense better than would be possible when these sources were used individually" (Elmenreich 2002). In short, we would like to combine more than one source of data to create information that is better than either source alone.

Table 2.2: Advantages of data fusion (Elmenreich 2002).
• Robustness & Reliability: inherent redundancy provides data even when one source fails
• Extended Spatial & Temporal Coverage: one sensor can see where another cannot, and provide data when another cannot
• Increased Confidence: the measurement of one sensor is confirmed by measurements from other sensors of the same domain
• Reduced Ambiguity & Uncertainty: joint information reduces the set of ambiguous interpretations of the measured value
• Robustness Against Interference: increasing the measurement space makes the system less vulnerable to interference
• Improved Resolution: multiple independent measurements are taken of the same property

Sensor fusion can be implemented at various levels of interpretation, depending on the application. Low-level fusion (or raw data fusion) combines various sources of raw data to produce new data that is intended to provide more information than the original inputs. Intermediate-level fusion (or feature-level fusion) combines features such as edges, corners, lines and textures into a feature map that is then used for segmentation and detection. High-level fusion (or decision fusion) combines decisions from several experts; methods include voting, fuzzy logic and statistical methods. In our case, we concentrate on low-level (raw data) fusion, as two raw kinematics datasets are combined into one. Because the notion of data fusion covers so many levels of interpretation and types of sensors, there is no single model of fusion that works for all applications; it is key to find a model that is optimal for a specific application.

2.5.2.1 Fusion Methods

There are many different types of sensor fusion, each with its own pros and cons. Areas where sensor fusion is used include military applications (e.g., target tracking), satellite positioning, and image processing (Challa 2004). The more common methods of sensor fusion in these areas include:

• Bayesian inference - uses probabilities to attach weightings, e.g., automotive sensor fusion applications (Coue 2003)
• Dempster-Shafer inference - similar to Bayesian, but more computationally intensive; allows for more unknowns, as it relies on "beliefs" and "masses", e.g., new uses in human-computer interaction (Wu 2002)
• Artificial neural networks - e.g., perception studies (Johnson 1998)
• Kalman filtering - a prediction/correction filter, often used in navigation

2.5.4 Kinematics Data Fusion Technique

Before the collected data can be fused, it must be synchronized in time and registered into the same reference frame, as summarized in Chapter 3 section 3.6.1. The details of synchronizing and registering the two data streams were presented in the complementary thesis of Kinnaird (2004). Once the two positional datasets are synchronized and registered, they are put through the fusion process, sketched below and described in detail in the following sections.
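As an illustration of the overall pipeline, a minimal sketch of the fusion steps is given below. It assumes one coordinate at a time, strictly increasing timestamps, and already synchronized and registered streams; scipy's cubic spline stands in for the interpolation step, and the GCV filtering of the magnetic data is assumed to have been applied already.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def fuse_streams(t_mag, x_mag, t_opt, x_opt):
    """Fuse dense magnetic data with sparse/gappy optical data (one axis).

    t_mag, x_mag: dense (~120 Hz) magnetic samples, already GCV-filtered.
    t_opt, x_opt: sparse optical samples, with occluded frames removed.
    Returns the fused estimate at the magnetic sample times.
    """
    # 1. Evaluate the magnetic stream at the optical sample times.
    mag_at_opt = np.interp(t_opt, t_mag, x_mag)
    # 2. Difference curve: optical minus magnetic at the matched times.
    diff = x_opt - mag_at_opt
    # 3. Spline-interpolate the difference onto the magnetic time base.
    diff_dense = CubicSpline(t_opt, diff)(t_mag)
    # 4. Fused data = interpolated difference + magnetic data.
    return x_mag + diff_dense
```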
Position and orientation measurements from the optical sensor are considered correct and accurate when the optical markers are visible, and the magnetic measurements are used to provide estimates of the shape and detail of the sensor's trajectory, especially during times when the optical sensor has missing data (i.e., optical marker occlusions). We take these two data streams and fuse them into one continuous, high frequency data stream. By relying on the accuracy of the optical data and the continuity of the magnetic data, we exploit the advantages of both systems. We first filter the magnetic data. The filtered magnetic data is then evaluated at the times of the optical samples to estimate its value at the optical sampling times. The time-matched optical and magnetic data are then subtracted from each other to create a difference curve. Next, we interpolate this difference curve to estimate the errors at each magnetic sample time. The fused data is the sum of the interpolated difference curve and the original magnetic data. A flow diagram of the data fusion steps is shown in Figure 2.21.

Figure 2.21: Data fusion steps. 1) Filter the magnetic data. 2) Evaluate/interpolate the (magnetic) data at the times of the optical samples. 3) Form the difference curve (optical - magnetic). 4) Spline-interpolate the difference. 5) Fused data = splined difference + magnetic.

2.5.4.1 Data Fusion Technique Details

The original magnetic data was sampled at 120 Hz, much faster than the 30 Hz sampling rate of the optical data. The magnetic data was first filtered using a Generalized Cross Validation (GCV) approach (Woltring 1986) to smooth the dataset, as the magnetic data can be rather noisy (Figures 2.22a and 2.22b).

Figure 3.15: CPD of D-values. By finding a CPD of D-values between the CPDmeas and CPDref (the reference CPD of length N is resampled r = 1000 times to form CPD(Drs-ref)), we can assess the relevance of any measured D-value. If Dmeas is greater than the Dcr value (the 0.95 level) of CPD(Drs-ref), then the two CPDs under consideration are different (Source: Kinnaird 2004).

3.10 Discussion

The Maryland dissector tool was chosen as the experimental tool in consultation with expert surgeons. This particular tip was selected as it is frequently and commonly used in laparoscopic cholecystectomies. The Maryland tool tip is interchangeable, as future studies may require other tool tips. A custom designed and built bracket was created to mount the various sensors required for this project. By creating this bracket and mounting it on the surgical tool shaft, all sensors were mounted securely, and kinematics and F/T measures could be extracted for study. The use of the electrosurgical unit (ESU) in the OR caused a significant amount of distortion and noise in our raw data; the magnetic, F/T and strain gauge sensors were all adversely affected. This effect required data removal before the performance measures could be analyzed. A technique monitoring velocity changes was used to successfully remove the affected sections of noisy data, and the remaining ESU-affected data could then be removed manually. Forces and torques at the surgical tool tip were collected using the experimental set-up and a tri-axial transducer mounted on the bracket. Many hours of calibration and data registration were done in post-processing to remove gravity effects, grip effects, and electrosurgery unit effects.
Chapter 4: Results of a Quantitative Study to Assess Laparoscopic Surgical Simulator Validity

4.1 Introduction

In this chapter, we present pilot study data illustrating intersubject variability, intrasubject differences, and the reliability of our chosen performance measures. The performance and behaviours of the novice surgeons were compared to each other in the OR and the simulators (i.e., performance validity), and then compared to the experts (i.e., construct validity). We also analyze the concurrent validity of the simulators based on our performance measures, with the OR as the gold standard. The implications of the analysis are then discussed as they concern the reliability of our data collection system and the construct and performance validity of the simulators. The protocol outlined in Chapter 3 section 3.5 and the lengthy post-processing of Chapter 3 section 3.6 were followed for each of the three OR procedures and for the physical simulator data collections. This resulted in time-synchronized, post-processed kinematics and force data referenced to a common reference frame at the surgical tool tip. The dissection task data of the surgical procedure was also broken down into segments, as discussed in Chapter 3 section 3.8.2.

4.2 Results

We successfully collected OR data three times (one surgery from each resident). We also collected data in the virtual reality (VR) and physical simulators. A summary of the data collections is shown below in Table 4.1.

Table 4.1: Summary of successful data collections from each context.
              OR    VR Simulator    Physical Simulator
Resident 1     1          3                  1
Resident 2     1          3                  1
Resident 3     1          3                  1

4.2.1 Context Comparisons

The comparisons between the contexts and the subjects are made in several different areas. We have collected data from the surgical residents in each context (OR, physical simulator, VR simulator), and will make comparisons within these. This data will then be compared against the expert data previously presented by Kinnaird (2004).

4.2.1.1 Surgical Residents

The first comparisons presented (Figure 4.1) are intrasubject comparisons within each procedure. Each procedure is divided into segments, as discussed in Chapter 3 section 3.8.2, and these segments are compared to each other (A1); this investigates intrasubject intraprocedural variability and repeatability. Next, the intrasubject intertrial VR comparisons (A2) are shown, to investigate the repeatability of the residents in the VR simulator. Thirdly, the intersubject intrasetting results (A3, A4, A5) are analyzed: the residents are compared to one another in each of the three settings (OR, physical simulator, VR simulator) to evaluate consistency at their skill level. Lastly, the intrasubject intersetting comparisons (A6) compare each resident's behaviour in the OR to their behaviour in the simulators, to assess the performance validity of the two simulators. The comparisons are:

A1: Intrasubject intraprocedural OR
A2: Intrasubject intertrial VR simulator
A3: Intersubject intrasetting OR
A4: Intersubject intrasetting physical simulator
A5: Intersubject intrasetting VR simulator
A6: Intrasubject intersetting (OR versus VR, OR versus physical)

Figure 4.1: Context comparisons for surgical residents. The numbered circles in the figure represent the respective comparisons A1-A6.
4.2.1.2 Expert Surgeons

To study construct validity, as stated in our objectives, we need to compare our surgical resident data to that of the expert surgeons. The expert surgeon data was collected, and its performance measures extracted, by Catherine Kinnaird (2004). Our resident-to-expert comparisons are called "interlevel" comparisons (Figure 4.2). These comparisons are interlevel intrasetting: they show the results of the experts compared to the residents in each of the contexts (OR, VR simulator, physical simulator). A7 is a new method of evaluating concurrent validity, as we have the OR expert data as the gold standard and the same performance measures available in each of the other contexts (at the resident skill level and in the simulators); this allows us to make quantitative suggestions about the concurrent validity of the simulators. A8 and A9 allow us to investigate the construct validity of both the physical and the VR simulator, as we are trying to detect skill level differences.

A7: Interlevel intrasetting OR
A8: Interlevel intrasetting physical simulator
A9: Interlevel intrasetting VR simulator

Figure 4.2: Interlevel context comparisons for experts and residents.

4.2.2 The D-Value

The KS statistic D-value is calculated for all context comparisons. The D-value depends on the sizes of the original samples of the distributions. Our sample sizes are all on the order of several thousand data points, and the larger the sample size, the smaller the D-value must be for the distributions to be considered "similar". Generally, when a dataset is resampled from itself (Drs-ref), D-values are about 0.02-0.05. (Recall that a D-value of 0 indicates identical distributions, and a D-value of 1 indicates maximum difference.) When two CPDs are different, we usually see values of 0.8-1.

4.2.3 Presentation of Results

The performance measures, as discussed earlier, are velocity, acceleration, jerk and force in the six tool tip directions: axial (z), grasp (y), translation (x), transverse (√(x² + y²)), absolute (√(x² + y² + z²)), and roll about the tool axis. The performance measure of distance from the mean (d) is presented only in the absolute and roll-about-the-tool-axis directions, as it is sensitive to the choice of location of the global reference frame. The cumulative probability distributions (CPDs) of all twenty-six performance measures in all directions are presented in a large plot with twenty-six subplots. Only data up to the 75th percentile is shown, for better visualization of the results: this region contains the critical areas of the CPDs, and the important differences between CPDs are always in this region. The CPDs all have long tails, and if the entire CPD were shown, the critical areas would appear vertical and the important differences would not be easily seen. The D-values of the comparisons are also calculated and presented in another plot. The D-values are shown with confidence intervals, and the critical confidence interval for finding a statistical difference (CPD(DRS-ref)) is also shown.

4.2.3.1 A1: Intrasubject Intraprocedural OR Comparisons

Each of the surgical residents performed a laparoscopic cholecystectomy with an expert surgeon in attendance and supervising. Each resident had one session of data collection in the OR, and the results from each are presented in the following sections.
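Before turning to the individual comparisons, a minimal sketch of how a D-value and the resampled reference distribution CPD(DRS-ref) of section 4.2.2 might be computed is given below. A naive i.i.d. bootstrap is shown for brevity; dependent time-series data generally calls for a block bootstrap instead. The value r = 1000 follows Figure 3.15.

```python
import numpy as np

def ks_d(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: maximum vertical CPD distance."""
    grid = np.sort(np.concatenate([sample_a, sample_b]))
    cpd_a = np.searchsorted(np.sort(sample_a), grid, side="right") / len(sample_a)
    cpd_b = np.searchsorted(np.sort(sample_b), grid, side="right") / len(sample_b)
    return np.max(np.abs(cpd_a - cpd_b))

def d_reference(sample, r=1000, seed=0):
    """CPD of D-values from resampling a dataset against itself (Drs-ref).

    A naive i.i.d. bootstrap, for illustration only; dependence in the
    data would call for resampling in blocks.
    """
    rng = np.random.default_rng(seed)
    n = len(sample)
    return np.sort([ks_d(sample, rng.choice(sample, size=n)) for _ in range(r)])

# A measured D is then judged against, e.g., the 95th percentile of
# d_reference(reference_sample).
```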
Each surgical dissection task was divided into three segments to examine intraprocedural repeatability. The first segment consisted of anatomy exploration and identification; the second segment was the cystic duct and artery dissection; and the third segment was the removal of the gallbladder from the liver bed. We investigate the repeatability of each resident within one procedure.

4.2.3.1.1 Resident 1: Intrasubject Intraprocedural OR

The performance measures for the three segments of the OR procedure were extracted (Figure 4.3). On an initial visual inspection, the CPDs are relatively similar in shape and range. The kinematics measures of velocity, acceleration and jerk in all tool tip directions show the most similarity in shape. The force, distance from mean (d), and transverse and absolute tip direction measures show the most variability. Segment 3 is the most different from the other two segments of the procedure, and this is seen in all tool tip directions. The axial forces are the largest in value, as they are a combination of the axial tip force and the grip forces not removed through the calibration process. This large axial force measure coincides with what was found for the expert surgeons (Kinnaird 2004).

Each segment is then compared to a lumping of the data from the other two segments. These D-values and the corresponding confidence intervals signify the variability between segments (Figure 4.4). Each of the segments in an OR procedure represents a different portion of the dissection task. The reference distribution (DRS-ref) is created by resampling the reference CPD from itself; the closer an experimental D-value is to the 95th percentile of the CPD(DRS-ref), the more "similar" the compared distributions are considered. The performance measure CPDs indicated that segments 1 and 2 are similar, and the D-value calculation verifies this. As expected, segments 1 and 2 are more similar to each other, and segment 3 is the most different from the other two. The D-values represent the variability between segments for this procedure. They give us an idea of intraprocedural repeatability, as each of the segments has a different goal in the OR even though all are considered part of the dissection task. In general, we can say that Resident 1 is repeatable intraprocedurally.

Figure 4.3: Resident 1 intraprocedural OR CPDs, segments 1, 2 & 3. Each of the individual graphs represents a performance measure (velocity, acceleration, jerk, force) in a particular tool tip direction (axial (z), grasp (y), translate (x), transverse, absolute, rotation).

Figure 4.4: Resident 1 intraprocedural OR D-values, segments 1, 2 & 3 (Dseg1-ref, Dseg2-ref, Dseg3-ref, Dseg2-seg3, and CPD90(DRS-ref)). The horizontal error bars represent the confidence interval on the D-value.

4.2.3.1.2 Resident 2: Intrasubject Intraprocedural OR

The CPDs of the calculated performance measures again appear quite similar in shape and range for all measures (Figure 4.5). We see some small differences in segment 3 in the transverse and absolute tool tip directions, as was seen previously with Resident 1. Again, the force and d measures show the most differences in CPD shape. The D-values were calculated and lend support to what was seen in the CPDs of the performance measures (Figure 4.6). The kinematics performance measures show small differences between all segments, with the majority of D-values below 0.3.
We see here that segment 1 has many D-values that fall within the CPD(DRS-ref), indicating that the values are essentially the same. Also, for this subject, there are no D-values greater than 0.6. The kinematics performance measures all have D-values below 0.2, demonstrating very repeatable behaviour within this procedure.

Figure 4.5: Resident 2 intraprocedural OR CPDs, segments 1, 2 & 3. Each of the individual graphs represents a performance measure in a particular direction at the tool tip.

Figure 4.6: Resident 2 intraprocedural OR D-values, segments 1, 2 & 3.

4.2.3.1.3 Resident 3: Intrasubject Intraprocedural OR

For Resident 3, we see results (Figure 4.7) very similar to those seen with Resident 1 and Resident 2. The three segments show a great deal of similarity in the CPD performance measures. Segment 1 shows some difference in the jerk measure in the transverse and absolute tool tip directions. We again see the most difference in the d and force measures. The similarity between segments is confirmed by the D-values (Figure 4.8); the force and d measures again have the largest differences. This OR trial shows the least intersegment variability, with all D-values below 0.3 except for the force in the translate (x) direction. Resident 3 demonstrated the most repeatable behaviour within a single OR procedure.

Figure 4.7: Resident 3 intraprocedural OR CPDs, segments 1, 2 & 3. Each of the individual graphs represents a performance measure in a particular direction at the tool tip.

Figure 4.8: Resident 3 intraprocedural OR D-values, segments 1, 2 & 3.

4.2.3.2 A2: Intrasubject Intertrial VR Simulator

The three residents each performed the cystic duct dissection module on the VR simulator three times. The performance measures extracted from the VR simulator are less comprehensive than those from the OR or the physical simulator: there are only 17 performance measures, with no roll direction and no component force values. It should be noted that, as of the time of this manuscript, the VR force values are pending change. The manufacturer's hardware calibration value was not quoted correctly, and the VR force values will therefore need to be multiplied by a factor still to be determined. We do know that this factor will be less than 2, and it will therefore not significantly affect the comparison results.
Intrasubject intertrial variability was examined, and little variability was seen in either the range or the shapes of the CPDs for all three residents (Figures 4.9, 4.11, 4.13). Also apparent from the VR simulator data are the low absolute force values and their small range; the residents also tended to spend about half of the time at very low forces. The D-values for the three trials of each resident were compared. These D-values (Figures 4.10, 4.12, 4.14) coincide with the visual observations from the CPD plots, indicating very little difference in the majority of measures. The largest differences are seen in the absolute force and distance from mean performance measures. The variability is so low in this contextual comparison that many D-values between the three trials are below 0.1. Each of the three residents is very repeatable over three trials in the VR simulator.

Figure 4.9: Resident 1 intertrial VR simulator CPDs. Each of the individual graphs represents a performance measure in a particular direction at the tool tip.

Figure 4.10: Resident 1 intertrial VR simulator D-value comparisons (trial 1 vs. 2, trial 1 vs. 3, trial 2 vs. 3, and CPD90(DRS-ref)).

Figure 4.11: Resident 2 intertrial VR simulator CPDs. Each of the individual graphs represents a performance measure in a particular direction at the tool tip.

Figure 4.12: Resident 2 intertrial VR D-value comparisons.

Figure 4.13: Resident 3 intertrial VR simulator CPDs. Each of the individual graphs represents a performance measure in a particular direction at the tool tip.

Figure 4.14: Resident 3 intertrial VR D-value comparisons.

4.2.3.3 A3, A4, and A5: Intersubject Intrasetting Comparisons

The three residents' performance measure CPDs are compared in each of the three contexts: the operating room, the virtual reality simulator, and the physical simulator (Figures 4.15, 4.17, 4.19). These comparisons let us examine consistency at the skill level. At first glance, the CPDs for the three subjects in each context look rather similar.
The operating room CPDs show the most differences, especially for Resident 2. The VR simulator CPDs show very similar shapes and ranges for all performance measures other than d. The physical simulator CPDs again show similar shapes and ranges in all measures other than the forces and d. The differences between the residents are analyzed in Figures 4.16, 4.18 and 4.20. These D-values confirm the initial visual inspection of the CPDs. The data for the physical simulator show that Resident 2 and Resident 3 have more similar patterns when compared to Resident 1, with most measures below D = 0.3. For the VR simulator, the three residents show much more similarity, with the majority of the D-values below 0.15, demonstrating remarkable consistency at their skill level. The OR comparisons indicate a larger range of D-values, spread throughout the range. A point to remember is that the VR simulator is a very repeatable and structured environment, while the OR context is inherently much more variable; the physical simulator sits somewhere in between these two contexts in terms of variability and repeatable structure.

Figure 4.15: Intersubject intrasetting (OR) CPD for Residents 1, 2 and 3.

Figure 4.16: Intersubject intrasetting (OR) D-value comparisons (Resident 1 vs. 2, Resident 1 vs. 3, Resident 2 vs. 3, against CPD90(DRS-ref)).

Figure 4.17: Intersubject intrasetting (VR simulator) CPD.

Figure 4.18: Intersubject intrasetting (VR simulator) D-value comparisons.

Figure 4.19: Intersubject intrasetting (physical simulator) CPD.

Figure 4.20: Intersubject intrasetting (physical simulator) D-value comparisons.

4.2.3.4 A6: Intrasubject Intersetting

Each of the three residents had data collected in the three contexts: OR, VR simulator, and physical simulator. Comparisons were done for each subject across the settings (i.e. intrasubject intersetting) (Figures 4.21, 4.23, 4.25). These comparisons will help us in our investigation of the performance validity of the two surgical simulators. If the quantitative measures in a simulator are similar to those in the OR, then the simulator can be considered to show performance validity.
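The CPD90(DRS-ref) marker that appears on each D-value figure is a 90% reference threshold: D-values below it are indistinguishable from the variation expected within the reference data itself. Because the tool-tip time series are serially dependent, such a threshold is typically obtained with a moving block bootstrap rather than an ordinary bootstrap. The sketch below conveys the general idea only; the block length, replicate count, and this particular construction of the reference distribution are assumptions for illustration, not the procedure used in this thesis. It reuses ks_d_value from the earlier sketch.

```python
import numpy as np

def moving_block_bootstrap(x, block_len, rng):
    """Resample a dependent series by concatenating randomly chosen
    contiguous blocks, preserving short-range correlation."""
    x = np.asarray(x, dtype=float)
    n_blocks = -(-x.size // block_len)                  # ceiling division
    starts = rng.integers(0, x.size - block_len + 1, n_blocks)
    return np.concatenate([x[s:s + block_len] for s in starts])[:x.size]

def cpd90_threshold(reference, block_len=50, n_boot=500, seed=1):
    """90th percentile of D-values between block-bootstrap replicates
    of a reference sample and the sample itself (assumed construction)."""
    rng = np.random.default_rng(seed)
    d_vals = [ks_d_value(moving_block_bootstrap(reference, block_len, rng),
                         reference)
              for _ in range(n_boot)]
    return float(np.percentile(d_vals, 90))
```

A D-value from a real comparison that falls below this threshold, as most of the kinematics measures above do, can then be read as "essentially the same" behaviour.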
It can be seen from the CPDs that the kinematics measures in all tool tip directions for the OR and the physical simulator are more similar to each other than to the VR simulator. It would seem that the residents move more slowly in the VR simulator than in the physical simulator and the OR. Another significant visual feature is the absolute force measure, which is very low in the VR simulator. It is so much lower than in the physical simulator or OR settings that it is not easily seen on the plots.

The D-value analysis provides further evidence of the differences and similarities seen in the CPDs (Figures 4.22, 4.24, 4.26). The largest differences are seen between the force values, where the D-value is often 1.0, the maximum absolute difference. In the d performance measure, there are also a few D-values at 1.0. Another interesting observation is the comparison between the physical simulator and the OR, where many of the D-values are below 0.4. Conversely, when we compare the VR simulator to either the OR or physical simulator, the D-values are generally larger than 0.3. We consistently see that the OR vs. physical simulator comparisons show lower D-values than the OR vs. VR simulator comparisons.

Figure 4.21: Resident 1 intersetting CPD: OR, VR simulator, physical simulator.

Figure 4.22: Resident 1 intersetting D-values (OR vs. VR, OR vs. physical, VR vs. physical, against CPD90(DRS-ref)).

Figure 4.23: Resident 2 intersetting CPD: OR, VR simulator, physical simulator.

Figure 4.24: Resident 2 intersetting D-values: OR, VR simulator, physical simulator.

Figure 4.25: Resident 3 intersetting CPD: OR, VR simulator, physical simulator.

Figure 4.26: Resident 3 intersetting D-values: OR, VR simulator, physical simulator.

4.2.3.5 Expert vs. Resident Comparisons

The data from the three residents were lumped together to create a large data set for resident surgeons in each of the three settings: OR, VR and physical simulators. The expert surgeon data (two experts), collected and analyzed in a concurrent study by Catherine Kinnaird, were likewise lumped into a dataset to represent the expert surgeons (Kinnaird 2004).
These two datasets in each setting, expert and resident respectively, could then be compared to each other to begin an investigation into the construct validity of the two simulators. If a simulator is able to detect skill level differences, it is said to show construct validity. We are also able to demonstrate a new method for evaluating concurrent validity. This type of validity is usually assessed by a comparison to the "gold standard", which is expert OR behaviour. This "gold standard" has traditionally been evaluated using checklists and rating scales in the OR. In our study, we are able to make the same assessments in all contexts, whether OR or simulators, and across differing skill levels.

Due to intrasubject differences that cannot be clearly seen once the data have been lumped, and our small sample sizes for both experts and residents, we also investigated differences amongst the individuals. D-value comparisons are shown to analyze differences amongst the two experts and three residents. Each expert is compared to each resident individually for a more thorough construct validity investigation.

4.2.3.5.1 Interlevel Intrasetting OR

The performance measure CPDs for the lumped experts and residents were evaluated and plotted (Figure 4.27). The shapes of the kinematics measures of velocity, acceleration and jerk are somewhat similar, but the ranges do vary. We also see visually larger differences in the force and d CPDs.

Figure 4.27: Lumped interlevel OR CPD.

We then investigate the individual differences for the two experts and three surgical residents (Figure 4.28). We see here the actual variation between all five subjects. We generally see similar shapes in the kinematics measures, and more variability in the d and force measures.

Figure 4.28: Interlevel OR individual CPD: two experts and three residents.

An analysis of the D-values confirms what is seen in the CPDs (Figure 4.29). There is a wide range of D-values, ranging from close to 0 to the maximum difference of 1. Again, as was seen earlier in the resident comparisons, the force and d measures frequently have a D-value of 1. Here we see that expert 2 vs. resident 3 generally has D-values below 0.4, while expert 1 vs. resident 1 and expert 1 vs. resident 2 have all D-values greater than 0.2.

Figure 4.29: D-values for the two experts and three residents in the OR (all six expert-resident pairings, against CPD90(DRS-ref)).

4.2.3.5.2 Interlevel Intrasetting Physical Simulator

Interlevel comparisons let us evaluate the construct validity of the physical simulator; we are looking for skill level differences. The CPDs of the interlevel physical simulator trials are shown in Figure 4.30. Here we see that the kinematics measures in all directions except roll seem to be relatively similar in shape and range. We again see the largest differences in the force and d measures.
Figure 4.30: Lumped interlevel physical simulator CPD.

For the physical simulator, we again analyze the individual differences between each expert and resident (Figure 4.31). We do see general trends in the shape and range of the performance measures. The CPDs are slightly more similar than those seen in the OR comparisons.

Figure 4.31: Interlevel physical simulator individual CPD.

By looking at the D-values for all the experts and residents, we can more clearly see the individual differences (Figure 4.32). There is a large spread of D-values throughout the range. And again, we see the largest differences in the d and force measures, with a few at the maximum difference of 1. It is also interesting that in the comparison of expert 1 vs. resident 2, almost all the D-values are below 0.2, indicating that these two subjects are more similar in their behaviours.

Figure 4.32: D-values for the two experts and three residents in the physical simulator.

4.2.3.5.3 Interlevel Intrasetting VR Simulator

Again, we are able to investigate the construct validity of the VR simulator by looking for skill level differences. The CPD comparison between the experts and residents shows the most similar profiles of all the interlevel comparisons (Figure 4.33), relative to those of the physical simulator and OR. The largest variations are seen in the d and absolute force profiles. There are also differences in the transverse and absolute tool tip directions.

Figure 4.33: Lumped interlevel VR simulator CPD.

We then investigate the individual differences in the VR simulator for all residents and experts (Figure 4.34). Here we see a lot of similarity in all performance measures. The variability between experts and residents looks to be quite small according to the CPD.

Figure 4.34: Interlevel VR simulator individual CPD.

The differences in the VR simulator are all much lower than what is seen in the physical simulator and in the OR (Figure 4.35). All D-values are below 0.4 except for the d of expert 2 vs. resident 2. In this simulator, it would be difficult to distinguish between the experts and residents, as the differences are all small.
Figure 4.35: D-values for the two experts and three residents in the VR simulator.

4.3 Discussion

This project was chosen as a complementary and follow-up study to that of Kinnaird (2004). The results obtained further support and answer the questions initially posed in Kinnaird's project. Further results in the realm of comparisons between expert and resident surgeons have also led to more questions and preliminary answers. This work begins the investigation of the construct and performance validity of the physical and virtual reality (VR) simulators. We have also created a new method for evaluating concurrent validity by using the same performance measures in all contexts, with expert OR data as the "gold standard". The motor behaviour of the surgical tool tip was the model used to extract the quantitative measures that allowed for comparisons. We will analyze and discuss the results to help us understand the comparisons that have been made, and to further investigate the overall objectives of our validity studies.

4.3.1 Context Comparisons

4.3.1.1 Intraprocedural Operating Room Variability

The intraprocedural intrasubject operating room results show that there can be differences between the three dissection segments, but generally speaking, the three segments had a D-value of less than 0.3. This is an interesting result, as each segment has a different objective (i.e. exploration, dissection, etc.). Each of these segments is a different section of a larger dissection task. The overall goal at the end of the dissection task is to have clipped and cut the cystic duct and artery to isolate the gallbladder, and this goal can be reached using a variety of kinematics and forces. In two of the three trials, segment 3 was found to have the largest differences from the other two segments. In the other trial, segment 1 showed the most difference from segments 2 and 3. It is interesting to note that in these three OR trials, the three segments chosen were not always of the same tasks, as this was not possible.

Resident 1 spent the entire data collection period in the exploration and dissection phase, and we were never able to observe any clipping or cutting of the ducts/artery. Due to a chronically inflamed gallbladder, this exploration and dissection took more than 20 minutes, when normally it would take only about 10 minutes. It is seen in the results that segment 3 shows the most differences from the other two segments. It is possible that during this particular segment, some particularly difficult or different anatomy of the gallbladder was causing the resident to vary behaviours.

Resident 2's data showed the most difference in segment 3. This OR trial followed the "normal" segment protocol of exploration, setting and clipping the duct and artery, and a final gallbladder dissection segment. This protocol most closely follows that of Kinnaird (2004), but contrary to what she found, segment 3 showed the most difference as opposed to segment 1.

Resident 3's data followed the normal course of exploration, dissection, and the setting of two clips and cutting of the cystic duct and artery. But segment 3 was the preparation of the cystic duct for cholangiogram examination (an x-ray examination of the gallbladder and ducts). The experimental surgical tool was used to insert the catheter for cystic duct exploration and verification.
It is interesting to see that even though this is a completely different task than was done in any other OR trial, the performance measures show some differences, but not as large as would be expected.

For the three OR trials, the levels of differences varied as expected. Resident 3 showed the least amount of intraprocedural variability, with D-values below 0.3, while Resident 2 had the largest amount of variability, with slightly more D-values over 0.3; Resident 1 had D-values in between these two. Generally, the three residents showed good repeatability in the OR intraprocedural comparisons.

4.3.1.2 Intrasubject Intertrial VR Variability

Intertrial variability in the VR simulator was found to be quite low, with D-values ranging from close to 0 to 0.5, and most of the D-values were less than 0.2. This result was as predicted, since the VR simulator is not an inherently variable situation. Also, all three VR trials were conducted consecutively on the same day, and each trial was the same scenario as the previous trials, so the resident knew exactly what to expect. The results also coincide with and verify the finding of Kinnaird (2004) that only a small number of trials are needed for each subject to study simulator performance. The largest intertrial differences were found for d and the absolute force values, similar to what was found for the intraprocedural intrasubject OR trials. The three residents are very repeatable in their VR simulator performances.

4.3.1.3 Intersubject Intrasetting Comparisons

The intersubject intrasetting comparisons investigate consistency at the skill level within each context. We are specifically looking at PGY4 surgical residents. Our results for intersubject intrasetting differences verify those found by Kinnaird (2004). The intersubject intrasetting differences decreased from OR to physical to VR. This result coincides with the level of structure inherent in each context, from least structured to most structured. The OR environment has many different variables that can lead to many differences, whereas the VR simulator environment has few variables and is a predictable and repeatable environment.

4.3.1.3.1 Operating Room

The intersubject OR differences generally were in the range of 0.2-0.4 in all measures except force, where they were generally larger. This tells us that the residents will use relatively similar tool motor patterns to achieve the same end result. The force patterns used by the residents differed more, and again the same gallbladder removal procedure was completed successfully. The three residents show fair consistency in the OR context.

4.3.1.3.2 Virtual Reality Simulator

Intersubject VR simulator differences are lower than in the OR trials. This is an expected result, as the intrasubject intertrial VR differences were also very low. The majority of D-values were below 0.1, showing striking intersubject similarity; the three residents are very consistent with each other. It is an interesting result in that each of the residents received no training on the VR simulator, yet they treated the simulator in a fashion predictable and repeatable relative to each other. The residents also commented that they thought the VR simulator was like a video game, and that certain tasks would be useful to train on a VR simulator. But this particular dissection task was not very realistic, and did not reflect the way they would behave in a real OR situation.
These comments coincide with the comments from Kinnaird's experts on the face validity of the VR simulator (Kinnaird 2004). Neither residents nor experts felt that this VR dissection task was very good for training or evaluation of skills.

4.3.1.3.3 Physical Simulator

Intersubject physical simulator differences were also relatively low in all measures, with most D-values below 0.3. These D-values fall in between the OR and VR simulator differences, and this result is the same as found with Kinnaird's expert data (Kinnaird 2004). We see the largest differences in the force and d values, the same as was found for the OR trial differences. It is also interesting to note that the comparison between resident 2 and resident 3 resulted in all D-values below 0.3 except in d. The three residents are fairly consistent in the physical simulator.

Again, a quick face validity study was conducted, and opinions varied amongst the residents. Although none of them found the mandarin orange dissection particularly realistic, their opinions varied on how well they thought their motor patterns or force exertions matched those in the OR. Another factor in the residents' opinions was the juiciness of the orange itself. Some mandarin oranges were quite juicy, and the skin did not peel off easily, making for a more difficult dissection task. If the mandarin orange was generally drier, the dissection task was easier, and the residents were more easily able to complete it.

4.3.1.4 Intrasubject Intersetting Comparison

The three residents were compared in the three environments of the OR, VR and physical simulators. This comparison gives us an indication of the performance validity of the simulators, as we compare each to the OR environment. Specifically, these three contexts were compared to each other: OR to physical, OR to VR, and VR to physical. These intersetting comparisons result in larger differences than the intrasetting comparisons. The calculated D-values run the entire range from close to 0 to 1 (similar to different). We see the largest differences between the VR simulator and both the OR and physical simulator settings. All three residents show a similar pattern in the kinematics measures, where the VR simulator had slower velocity, acceleration and jerk measures when compared to the OR and physical simulator. The most striking difference was between the absolute force measures of the VR simulator and those of the physical simulator and the OR (D = 1 for a few force measures for all three residents), which is most visible when looking at the CPD. The residents did comment that they found the VR simulator to be a "low force" environment compared to a typical OR scenario.

4.3.1.5 Interlevel Intrasetting

By using the data collected and analysed in this project, together with the data analyzed by Kinnaird (2004), we are able to begin an investigation into the construct validity of the VR and physical simulators. A simulator showing construct validity will be able to detect differences between skill levels. In this analysis, the data from the two experts were lumped together to create an "expert" group, and the three residents' data were lumped together for a "resident" group. This is an efficient and easy method to detect immediate differences between the two skill levels.
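As an illustration of this lumping step, the sketch below pools hypothetical per-subject samples within each skill level, then computes both a lumped interlevel D-value and the individual expert-vs-resident D-values discussed next, reusing ks_d_value from the earlier sketch. All names and numbers here are invented for illustration, not drawn from the study data.

```python
import numpy as np

# Hypothetical per-subject samples of one performance measure
# (e.g. absolute tool-tip velocity in a single setting).
rng = np.random.default_rng(2)
experts = [rng.normal(40, 10, 1500), rng.normal(45, 12, 1500)]
residents = [rng.normal(55, 15, 1500) for _ in range(3)]

# Lump the subjects within each skill level into one dataset,
# then compare the two lumped groups with a single D-value.
expert_group = np.concatenate(experts)
resident_group = np.concatenate(residents)
d_interlevel = ks_d_value(expert_group, resident_group)

# Individual expert-vs-resident comparisons, as in the pairwise
# D-value figures above.
d_individual = {(i + 1, j + 1): ks_d_value(e, r)
                for i, e in enumerate(experts)
                for j, r in enumerate(residents)}
```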
We also looked at the differences between all five subjects, and can see how each of the three residents compared to the two expert surgeons. This is an interesting comparison, as opposed to looking at the lumped data, since we can see the more detailed differences between these groups.

4.3.1.5.1 Operating Room

Immediately on analysis, we can detect differences between the expert and resident data in the OR. Interestingly, the residents seem to be moving faster (velocity) than the experts. One would think that a surgical resident would be more tentative and move more slowly, but as the data show, this is not the case. We see large differences in the force data, where the expert surgeons use high forces more frequently than the residents. This could be a sign of the tentativeness of the residents: they may not feel comfortable enough in the OR to "pull and tug" with a lot of force. When we look at the three residents and two experts individually, we see the differences cover the entire range from close to 0 to 1 (similar to different). We do see that the force measures are 0.2