ASSESSING PERFORMANCE AND CONSTRUCT VALIDITY OF LAPAROSCOPIC SURGICAL SIMULATORS

by

JOANNE LIM

B.Sc. (Eng), The University of Guelph, 2001

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES (Mechanical Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA

December 2006

© Joanne Lim, 2006

Abstract

The objective of this work is to assess the construct and performance validity of two laparoscopic surgical simulators. The evaluation of surgeons is currently considered subjective and unreliable, which is one reason surgical educators have been studying surgical simulators as a means of quantitatively assessing surgeons. Before simulators can fill this role, however, we must determine whether they are valid and reliable tools for training and assessment. We have designed an experimental surgical tool and data collection system to quantitatively measure surgeon motor behaviour in the operating room (OR). The system collects kinematics and force/torque data from sensors, and we have developed a sensor fusion algorithm to extract high-frequency, continuous kinematics data. We collected data from surgical residents (PGY4) and compared it to expert surgeon data to investigate the construct validity of both a physical simulator and a virtual reality (VR) simulator. We also studied the performance validity of both simulators by comparing measurable quantities, such as forces and kinematics, on the simulators with those collected in the OR. To examine differences between contexts, we used the Kolmogorov-Smirnov statistic. In our intrasubject intersetting (OR, VR, physical) comparisons, we see large differences between the OR and the VR simulator, indicating poor performance validity, whereas the smaller differences between the physical simulator and the OR indicate fair performance validity. In our interlevel (expert vs. resident) comparisons, the VR simulator shows poor construct validity, with little difference detected between skill levels, while the physical simulator detects differences in some performance measures and can be considered to show fair construct validity.
Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
Chapter 1: Introduction and Literature Review
1.1 Introduction and Objectives
1.2 Minimally Invasive Surgery
1.2.1 The Challenges of MIS for Surgeons
1.2.2 The Challenges of MIS for Surgical Educators
1.2.3 Reasons for Using Surgical Simulators
1.2.3.1 Surgeon Certification
1.2.3.2 Equipment Design and Evaluation
1.2.3.3 Transfer of Training
1.3 Current Training Methods
1.4 Current Methods of Surgical Performance Assessment
1.5 Simulator Validation
1.5.1 Construct, Performance and Concurrent Validity
1.6 Research Question
1.7 Developing a Quantitative Assessment Method
1.7.1 Kinematics
1.7.2 Forces/Torques
1.8 Project Goals
Chapter 2: The Hybrid Experimental Laparoscopic Surgical Tool for Performance Measure Assessment
2.1 Introduction
2.2 Laparoscopic Surgical Tool
2.3 Performance Measures
2.3.1 Kinematics
2.3.1.1 Optoelectronic Position Tracking
2.3.1.1.1 Other Kinematics Options
2.3.1.2 Electromagnetic Position Tracking
2.3.2 Force/Torque
2.3.3 Sensor Bracket Design
2.3.3.1 Force Balance
2.3.4 Grip Force
2.4 Force/Grip Data Processing
2.4.1 Grip Calibration
2.4.2 Gravity Effects Calibration
2.5 Kinematics Data Fusion
2.5.1 Data Fusion Introduction
2.5.2 General Data Fusion
2.5.2.1 Fusion Methods
2.5.4 Kinematics Data Fusion Technique
2.5.4.1 Data Fusion Technique Details
2.5.5 Error Analysis
2.5.5.1 Analysis Method
2.5.5.2 Results of Error Analysis
2.5.5.2.1 Computer Generated Data
2.5.5.2.2 Laboratory Data
2.5.5.2.3 Operating Room Data
2.5.6 Discussion of Kinematics Data Fusion
2.5.7 Conclusions for Kinematics Data Fusion
2.6 Discussion and Recommendations
2.6.1 Kinematics
2.6.2 Force
2.6.3 Recommendations
Chapter 3: Experimental Methods for Assessing Validity of Laparoscopic Surgical Simulators
3.1 Introduction and Objectives
3.2 Subjects and Settings
3.2.1 Settings
3.2.1.1 Operating Room
3.2.1.2 Virtual Reality Simulator
3.2.1.3 Physical Simulator
3.3 Performance Measures
3.3.1 Kinematics
3.3.2 Forces
3.4 Equipment Used
3.4.1 Video Data
3.4.2 System Component Integration
3.4.3 Data Acquisition Software
3.5 Data Collection
3.5.1 Operating Room Study
3.5.2 Simulator Data Collection
3.6 Data Post-Processing
3.6.1 Kinematics Data Registration and Calibration
3.6.2 Force/Torque Data Registration and Calibration
3.6.2.1 Force/Torque Data Registration
3.6.3 Raw Data Synchronization
3.7 Electrosurgery Unit
3.7.1 ESU Effects
3.7.1.1 Removal of ESU Effects
3.8 Task Comparisons
3.8.1 The Dissection Stage
3.8.2 Data Segmentation
3.8.2.1 Data Segmenting
3.9 Setting Comparisons
3.9.1 Kolmogorov-Smirnov Statistic
3.9.2 Comparisons
3.9.3 Assigning Confidence Intervals
3.9.4 Dependent Data and Moving Block Bootstrap
3.9.4.1 Measurement Resolution
3.10 Discussion
Chapter 4: Results of a Quantitative Study to Assess Laparoscopic Surgical Simulator Validity
4.1 Introduction
4.2 Results
4.2.1 Context Comparisons
4.2.1.1 Surgical Residents
4.2.1.2 Expert Surgeons
4.2.2 The D-Value
4.2.3 Presentation of Results
4.2.3.1 A1: Intrasubject Intraprocedural OR Comparisons
4.2.3.1.1 Resident 1: Intrasubject Intraprocedural OR
4.2.3.1.2 Resident 2: Intrasubject Intraprocedural OR
4.2.3.1.3 Resident 3: Intrasubject Intraprocedural OR
4.2.3.2 A2: Intrasubject Intertrial VR Simulator
4.2.3.3 A3, A4, and A5: Intersubject Intrasetting Comparisons
4.2.3.4 A6: Intrasubject Intersetting
4.2.3.5 Expert vs. Resident Comparisons
4.2.3.5.1 Interlevel Intrasetting OR
4.2.3.5.2 Interlevel Intrasetting Physical Simulator
4.2.3.5.3 Interlevel Intrasetting VR Simulator
4.3 Discussion
4.3.1 Context Comparisons
4.3.1.1 Intraprocedural Operating Room Variability
4.3.1.2 Intrasubject Intertrial VR Variability
4.3.1.3 Intersubject Intrasetting Comparisons
4.3.1.3.1 Operating Room
4.3.1.3.2 Virtual Reality Simulator
4.3.1.3.3 Physical Simulator
4.3.1.4 Intrasubject Intersetting Comparison
4.3.1.5 Interlevel Intrasetting
4.3.1.5.1 Operating Room
4.3.1.5.2 Virtual Reality Simulator
4.3.1.5.3 Physical Simulator
4.3.1.5.4 Experts vs. Residents
4.3.2 Performance Measure Reliability
4.4 Conclusions
Chapter 5: Conclusions and Recommendations
5.1 Introduction
5.2 Review of Research
5.2.1 Experimental Surgical Tool
5.2.2 Data Collection
5.2.2.1 The Operating Room
5.2.2.2 The Experimental Surgical Tool
5.2.2.3 Simulators
5.2.3 Data Fusion
5.2.4 Performance Measures
5.2.5 Context Comparisons
5.2.6 Simulator Validation
5.3 Recommendations
5.3.1 Software
5.3.2 Hardware
5.3.3 OR Data Collection
5.3.4 Simulators
5.3.5 Other Recommendations
5.4 Partner & Future Studies
List of Terms
Bibliography
Appendix A: OR Study Experimental Protocol and Data Acquisition Procedures
Appendix B: Operational Definitions
Appendix C: University of British Columbia CREB Approval
Appendix D: Medicine Meets Virtual Reality Conference Submission
Appendix E: SAGES Conference Submission
Appendix F: Transfer of Training from Simulator to Operating Room

List of Tables

Table 1.1 - Types of validity definitions
Table 2.1 - Criteria for design of the sensor mounting bracket
Table 2.2 - Advantages of data fusion
Table 3.1 - Performance measures available from the three contexts
Table 4.1 - Summary of successful data collection from each context
Table B.1 - Hierarchical subtask dissection definition
Table B.2 - Kinematics and force performance measures

List of Figures

Figure 1.1 - Typical minimally invasive surgery operating room setup
Figure 1.2 - A typical laparoscopic cholecystectomy operation
Figure 1.3 - Reduced DOF of motion of the MIS tool tip
Figure 1.4 - Physical and VR simulators
Figure 1.5 - Performance measures
Figure 1.6 - Are laparoscopic surgical simulators valid?
Figure 2.1 - Maryland dissector tip
Figure 2.2 - Tool tip reference frame
Figure 2.3 - NDI Polaris optoelectronic position tracking system
Figure 2.4 - MDMArray
Figure 2.5 - Polhemus Fastrak magnetic position tracking system
Figure 2.6 - F/T system of Rosen (1999)
Figure 2.7 - ATI Mini 40 F/T transducer
Figure 2.8 - Two cut views of a typical laparoscopic tool shaft
Figure 2.9 - Sensor mounting bracket on surgical tool
Figure 2.10 - Force/torque sensor bracket two segments
Figure 2.11 - Force load path through sensor bracket and F/T sensor
Figure 2.12 - Laparoscopic trocar
Figure 2.13a - Overall view of the tool, which is then split into 3 sections for FBD analysis
Figure 2.13b - Effective tip forces & moments
Figure 2.13c - Free body diagram of distal end of surgical tool and force sensor
Figure 2.13d - Free body diagram of force sensor and stationary tool handle
Figure 2.13e - Free body diagram of tool handle and strain gauges
Figure 2.14 - Strain gauge circuit diagram for half-bridge circuit
Figure 2.15 - Strain gauges mounted on tool handle
Figure 2.16 - Mechanics of laparoscopic tool
Figure 2.17 - Interaction between strain gauges and force sensor
Figure 2.18 - Results from one calibration test
Figure 2.19 - Friction in the surgical tool handle and bracket
Figure 2.20 - Grip compensation
Figure 2.21 - Data fusion steps
Figure 2.22a - Noisy magnetic data
Figure 2.22b - GCV smoothed magnetic data
Figure 2.23 - Laboratory data
Figure 2.24 - Interpolate the magnetic data
Figure 2.25 - Interpolated magnetic data
Figure 2.26 - Difference curve
Figure 2.27 - Interpolated difference curve
Figure 2.28 - Fused data
Figure 2.29 - Computer generated magnetic and optical data
Figure 2.30 - RMS error for computer-generated data
Figure 2.31 - Laboratory collected magnetic and optical data
Figure 2.32 - RMS error for laboratory collected data
Figure 2.33 - Real OR magnetic and optical data
Figure 2.34 - RMS error for OR data
Figure 3.1 - Diagram of goals for this project
Figure 3.2 - Tool tip reference frame
Figure 3.3 - Components of the performance measurement system
Figure 3.4 - Custom designed data acquisition software
Figure 3.5 - Data post-processing
Figure 3.6 - Data synchronization process
Figure 3.7 - Strain gauge data with electrocautery noise
Figure 3.8 - Raw and noise removed strain gauge data
Figure 3.9 - Electrocautery affected magnetic data
Figure 3.10 - Electrocautery affected F/T data
Figure 3.11 - VR simulator vs. physical simulator vs. OR
Figure 3.12 - Kolmogorov-Smirnov CPD
Figure 3.13 - Moving block bootstrap
Figure 3.14 - Confidence intervals for D-values
Figure 3.15 - CPD of D-values
Figure 4.1 - Context comparisons for surgical residents
Figure 4.2 - Interlevel context comparisons for experts and residents
Figure 4.3 - Resident 1 intraprocedure OR CPD
Figure 4.4 - Resident 1 intraprocedure OR D-values
Figure 4.5 - Resident 2 intraprocedure OR CPD
Figure 4.6 - Resident 2 intraprocedure OR D-values
Figure 4.7 - Resident 3 intraprocedure OR CPD
Figure 4.8 - Resident 3 intraprocedure OR D-values
Figure 4.9 - Resident 1 intertrial VR simulator CPD
Figure 4.10 - Resident 1 intertrial VR simulator D-value comparisons
Figure 4.11 - Resident 2 intertrial VR simulator CPD
Figure 4.12 - Resident 2 intertrial VR simulator D-value comparisons
Figure 4.13 - Resident 3 intertrial VR simulator CPD
Figure 4.14 - Resident 3 intertrial VR simulator D-value comparisons
Figure 4.15 - Intersubject intrasetting (OR) CPD
Figure 4.16 - Intersubject intrasetting (OR) D-value comparisons
Figure 4.17 - Intersubject intrasetting (VR simulator) CPD
Figure 4.18 - Intersubject intrasetting (VR simulator) D-value comparisons
Figure 4.19 - Intersubject intrasetting (physical simulator) CPD
Figure 4.20 - Intersubject intrasetting (physical simulator) D-value comparisons
Figure 4.21 - Resident 1 intersetting CPD
Figure 4.22 - Resident 1 intersetting D-values
Figure 4.23 - Resident 2 intersetting CPD
Figure 4.24 - Resident 2 intersetting D-values
Figure 4.25 - Resident 3 intersetting CPD
Figure 4.26 - Resident 3 intersetting D-values
Figure 4.27 - Lumped interlevel OR CPD
Figure 4.28 - Interlevel OR individual CPD
Figure 4.29 - D-values for the two experts and three residents in the OR
Figure 4.30 - Lumped interlevel physical simulator CPD
Figure 4.31 - Interlevel physical simulator individual CPD
Figure 4.32 - D-values for the two experts and three residents in the physical simulator
Figure 4.33 - Lumped interlevel VR simulator CPD
Figure 4.34 - Interlevel VR simulator individual CPD
Figure 4.35 - D-values for the two experts and three residents in the VR simulator
Figure 5.1 - New performance measures
Figure 5.2 - Concurrent research projects at the Neuromotor Control Laboratory
Figure A.1 - University of British Columbia operating room experimental set-up
Figure B.1 - Five levels of the hierarchical decomposition
Figure B.2 - Five phases of laparoscopic surgery
Figure B.3 - Stage level diagram for cystic duct dissection (CDD) and gallbladder dissection (GBD)

Acknowledgements

I would like to acknowledge and thank all those who have supported me from the first day I began this journey. Firstly, my thanks to my supervisor, Dr. Antony Hodgson, for his enthusiasm and optimism in this project, even when things were at their bleakest. I could always count on him to put a positive spin on every aspect, and for this I am most grateful. I am most appreciative to the participating surgical residents that I harassed endlessly for their time, when their time is so limited. Thanks to Dr. Ed Chang, Dr. Naisan Garraway, and Dr. Kathy Hsu. Thanks also to Dr. Hamish Hwang for twisting the arms of his resident friends to participate in this project. I would also like to acknowledge Dr. Alex Nagy and Dr.
Neely Panton for sharing their knowledge and expertise, and for supervising the surgical residents in the operating room. Many thanks go to Marlene Purvey and her surgical staff, and also to Betty Whincup and the staff at the Sterile Supply Department. Next I want to give a big high-five to Catherine Kinnaird and Iman Brouwer. You guys are the best! No one else understands as well as you what we went through to finish this project. As a wise Iman once said, "I thought this day would never come." I also need to give a big pat on the back to the orthopod girls: Stacy Bullock, Carolyn Sparrey, Carolyn Greaves and Christina Niosi. Thank goodness for coffee break is all I have to say about that. Also, thanks to the rest of the NCL crew. Keep up the good work! I also want to say thanks to Val Roy for being such a good suburb buddy. I am very grateful to my friends at CESEI: Ferooz, Vanessa, Marlene, Humberto and Dr. Qayumi. Thanks for the comic relief, the fabulous printer/copier, and all the food. The students who get to work with you next are very lucky indeed. Many thanks go to Ryan Jennings for being at home to lend an ear to my whining, for pushing me constantly to work harder, and for never letting me give up. To my Dad, who gave me good advice about completing graduate studies, along with continued support and encouragement. And lastly, thanks to my Mom, without whom I could not have completed this project; I am eternally grateful for her unwavering support, no matter what.

Chapter 1: Introduction and Literature Review

1.1 Introduction and Objectives

Minimally invasive surgery is an increasingly popular approach that uses smaller incisions and results in much shorter recovery periods for patients. Unfortunately, the surgery is substantially more demanding for the surgeon, who must learn a new set of skills: using long instruments inside the body while viewing the surgical field on a monitor outside of it. Simulators offer the surgeon an opportunity for unlimited practice, including practice on unusual cases. For the training to be useful, however, the simulator must accurately reflect the skill set required in surgery. The goal of this project was to validate both a physical and a virtual reality simulator in terms of the kinematics and forces used, in comparison to those used during surgery.

Surgeons must learn to operate both skilfully and safely. The use of surgical simulators has become more widespread and important in the training of surgical residents, and it is important that researchers direct their efforts to the areas of most significance to patients and surgeons alike. Objective measurements of a surgeon's performance are much more readily available in a simulator than during a live operation, which matters both for training residents and for evaluating trained surgeons. New tool designs and improvements could also be tested in a simulator, saving operating room time and money.

Surgical education has lagged behind other fields in which simulators are commonplace for teaching and training novices. Other professions, such as aviation, have successfully incorporated simulation training into their educational programs, and the success of simulation in pilot training has pushed surgical educators to continue research in this area.
In a 1999 survey, 92% of program directors agreed that there is a need for technical skills training outside of the OR (Haluck 2001), a clear sign that other methods of surgical education must be explored.

The overall objective of our lab's research was to create and apply a quantitative method of assessing surgical performance in order to evaluate two laparoscopic surgical simulators. The shorter-term goals included a study of the validity and reliability of these surgical simulators, and a study of the minimum technological requirements of a virtual reality surgical simulator. We aimed to establish whether these simulators are reliable measurement devices. The primary objective of the work presented in this project was to assess the validity of both virtual reality and physical laparoscopic surgical simulators. The second goal was to develop a new experimental tool and system capable of collecting and analyzing the performance measures used in the simulator validity assessments. Operating room data were compared to data from analogous tasks in the simulator settings. The new methods provide a standard for future simulator assessments.

1.2 Minimally Invasive Surgery

Minimally invasive surgery (MIS), also known as minimal access surgery (MAS) or keyhole surgery, has become a routine method of performing many types of surgical procedures. Because of advances in technology and medicine, many open surgical procedures can now be performed using MIS. The notion of MIS first emerged in the early 20th century (Nagy 1992). After World War II, the two most important inventions related to endoscopy and MIS were developed: the rod-lens system and fibreoptics. After much development in surgical technique and camera technology, the first video laparoscopic cholecystectomy was performed on a human in 1987 in Lyons, France (Mishra 2004). Within that year, many other surgeons were performing their first cholecystectomies on humans on both sides of the Atlantic. Since the late 1980s, MIS has become commonplace in modern general surgery, and its use in abdominal surgical procedures in the United States has reached 60-80% (Taylor 1995). A typical minimally invasive surgery operating room set-up can be seen in Figure 1.1.

Figure 1.1: Typical minimally invasive surgery operating room set-up. Notice the video monitors in the background and situated around the OR. The surgeons rely on these monitors, which show a direct video feed from the laparoscopic camera, to view the surgical field within the patient.

Laparoscopic surgery has allowed surgeons to perform many of the same procedures as in traditional open surgery, but using small incisions (5-15 mm) instead of large abdominal incisions (7-15 cm) (Huntsville 2002). The increased use of MIS techniques over the years has brought benefits for patients: studies have shown reduced postoperative pain, smaller scars, reduced hospital stays, and a quicker return to normal physical activities and therefore to work (Treat 1996, Perissat 1995). It is common, and proven to be safe, for routine cholecystectomy procedures to be day surgeries, with the patient coming into the hospital in the morning and leaving for home in the afternoon (Prasad 1996). Other patients are usually discharged from the hospital 1 or 2 days after the cholecystectomy, with low complication rates (Lujan 1998).
A typical laparoscopic cholecystectomy operation set-up can be seen in Figure 1.2.

Figure 1.2: A typical laparoscopic cholecystectomy operation. (Figure removed due to copyright.)

Although there are many obvious benefits of MIS for patients, the surgeon requires a specialized skill set that is much different from that of traditional open surgical techniques. Laparoscopic tools very often limit the surgeons' dexterity and range of motion, and surgeons adopt uncomfortable postures to complete tasks (Person 2001). Laparoscopic tools are considered ergonomically poorly designed, awkward, and not easy to use (Berguer 1999, Emam 2001, Treat 1996). Also, the time to complete a laparoscopic procedure can be up to 30% longer than the same open surgical procedure (Glinatsis 1992, Treat 1996). Conversely, other studies have shown either no significant difference in surgical times, or that the laparoscopic approach may actually be shorter in duration (Pessaux 2001).

1.2.1 The Challenges of MIS for Surgeons

The special skill set required of surgeons for laparoscopic surgery is especially difficult for the trainee to learn. One of the aspects a novice surgeon must adapt to is what is known as the fulcrum effect (Jordan 2001): when the surgical tool is inserted into the abdomen, the entry port creates a fulcrum, and the surgeon experiences a motion reversal. For example, when the surgeon moves their hand to the left outside the body, the tool tip moves to the right inside the abdomen (a small numeric sketch of this geometry follows at the end of this section). This is a basic motor skill that novice surgeons must learn.

Another issue is video-hand-eye coordination (Ballantyne 2002, Perkins 2002). The surgeon no longer directly views the surgical field, but rather a 2D video monitor of what is happening inside the abdomen: the surgeon works with their hands outside of the abdomen, using longer surgical tools than in open surgery, while watching a video feed of what the tools are doing inside. This leads to a lack of depth perception and makes tasks such as suturing and knot tying more difficult. Tactile feedback is also reduced in MIS, creating yet another problem for surgeons to overcome.

In laparoscopic surgery, the surgical tool also has a reduced number of degrees of freedom (DOF). The laparoscopic tool has only 4 DOF, as opposed to the open surgical tool's 6 DOF, which limits the surgeon's dexterity and range of motion (Tendick 1995, Ballantyne 2002). The tip movement is limited to pitch, yaw, roll, and plunge (i.e., in/out of the abdomen), as shown in Figure 1.3 (Person 2000).

Figure 1.3: Reduced DOF of motion of the MIS tool tip. The DOF are roll, pitch, and yaw about the fulcrum created by the entry portal, and plunge through the portal. (Modified from source: Person 2000.)

Researchers are studying methods to deal with these limitations of laparoscopic surgery by looking into new technologies such as 3D vision systems (Jones 1996, McDougall 1996, Chan 1997, Hanna 1998), robotic surgery (Dakin 2003, Hubens 2003, Ruurda 2003, Ruurda 2002, Vuilleumier 2003), telerobotic surgery (Ballantyne 2002, Marescaux 2001, Perez 2003), and interactive image guidance (Harms 2001, Herline 2000, Stefansic 2002).
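To make the fulcrum effect concrete, here is a minimal sketch (illustrative only; the tool lengths are made up and this is not part of the thesis's measurement system) of the motion reversal and scaling produced by a tool pivoting about the entry port:

```python
# Illustrative sketch of the fulcrum effect (not from the thesis).
# A laparoscopic tool pivots about the entry port, so lateral hand motion is
# mirrored and scaled at the tip by the ratio of inside to outside length.

def tip_displacement(hand_dx_mm, outside_len_mm, inside_len_mm):
    """Small-angle approximation of lateral tip motion for a pivoting tool."""
    return -hand_dx_mm * inside_len_mm / outside_len_mm  # sign flip = reversal

# Hypothetical geometry: 150 mm of shaft outside the body, 250 mm inside.
print(tip_displacement(10.0, 150.0, 250.0))  # a 10 mm hand move to the left
# -> about -16.7 mm: the tip moves ~1.7x farther, in the opposite direction.
```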
1.2.2 The Challenges of MIS for Surgical Educators

Because of the inherent limitations of performing MIS, the surgical education community faces the challenge of deciding where to train surgeons and how to evaluate them. These issues matter to the surgical community and the public alike: it is imperative that surgical trainees finish their education with the ability to operate safely and effectively. Researchers unanimously agree that the current training and evaluation of surgeons is subjective, unreliable, and costly (Feldman 2004, Lentz 2002, Rosser 1998, Winckel 1994). This is one reason for the pressure to investigate the feasibility of using surgical simulators for training and evaluation.

1.2.3 Reasons for Using Surgical Simulators

Surgical simulators have many potential applications; surgeon certification and tool evaluation are just two possible uses of validated simulators.

1.2.3.1 Surgeon Certification

The ability to quantitatively assess surgical performance is important to the training and certification of both novice and expert surgeons. The methods we have developed will allow performance measurement in the OR, followed by comparison to a reference database of performance measures from surgeons of varying skill levels. This will allow a quantitative analysis of skill level and a method for identifying where improvements are needed. For example, when a novice surgeon seems to be having difficulty with a certain task, this could ideally be identified and advice given specifically to address the problem.

1.2.3.2 Equipment Design and Evaluation

Tool and equipment designers could evaluate the performance of their new instrumentation in a validated simulated environment, ideally confident that the new tools will give the same performance in the OR. The evaluation of new tools would be an iterative process in which the new tool is compared to a reference database of performance measures for past tool designs.

1.2.3.3 Transfer of Training

If a simulator is shown to be valid (see Section 1.5 for further details), the next step in furthering the push for simulators in surgical education programs is to address the transfer-of-training question: do novice surgeons who practice in simulators show a significant improvement in the operating room? In other words, if a novice surgeon spends a given amount of time practicing on a simulator, will there be a quantifiable improvement in OR performance compared to a similar novice without simulator training?

The original goal of this project was to study the transfer of training from simulator to the human operating room, a subject that needs analysis in the surgical education and simulation fields. Unfortunately, due to logistical obstacles such as patient recruitment and scheduling, the project was converted to a simulator validity study. This was the most logical step, as the appropriate OR and simulator data had already been collected. Further information on the transfer of training from simulator to OR can be found in Appendix F.

1.3 Current Training Methods

Success in laparoscopic surgery is highly dependent on the surgeon's proficiency and experience (Perissat 1995). The apprenticeship-training model is still the most commonly used for providing experience to surgical residents.
In this model, the surgical resident shadows the expert surgeon and learns the tools and tricks of the trade through observation, questions, and some hands-on practice. The disadvantage of this approach is that the surgical educator has no control over which patients require surgery, potentially limiting a novice's exposure to a small variety of cases; the novice surgeon may thus encounter only a limited pool of anatomy and pathology.

Human cadavers have been used as a training model in surgeon training programs with some success (Martin 2003). However, cadaveric models have their own disadvantages: they are expensive, subject to availability, and have different tissue properties from a live human, and there is some concern over transmission of disease (Nelson 1990). Animal models may avoid some of the stresses and time constraints of apprenticeship training, but the anatomy often differs from that of humans, the disease state often cannot be reproduced in the animal, and an animal care facility is expensive. There are also many moral and ethical issues related to training on live animals; the United Kingdom has banned the use of animals for surgical training (Lirici 1997, Moorthy 2003).

Surgical simulators, both physical and virtual reality (Figure 1.4), are becoming more widely used and accepted in surgical education, although their use is still limited in the University of British Columbia surgical training program, where there is currently no prescribed simulator training. The use of virtual reality (VR) systems with haptic (force-feedback) interfaces has garnered much interest. Simulators are designed to exercise the psychomotor skills of surgery (e.g., clipping and suturing skills), the cognitive aspects (e.g., decisions about the steps to follow during a procedure), or both. Simulator training is safe and highly available, unlimited practice is possible, and no supervision is necessary when a novice is using these simulations.

Figure 1.4: Physical and VR simulators. The left picture shows a physical simulator using regular laparoscopic tools and an inanimate model. The right picture shows a VR simulator with computer-generated models.

Recently, the surgical education community has expressed particular interest in virtual reality simulators, and most current studies have used them. Although VR simulators have a comparatively high initial cost relative to bench-top simulators, they do have advantages: they can be programmed to include variant anatomy (pathologies, rare occurrences) and temporal changes (patient status, bleeding), and they provide objective measurements (e.g., time, errors, kinematics) through the computer software.

Surgical educators have come to the realization that surgical training programs should become more structured, and that surgical models and simulators should play a more important role in training, evaluating, and certifying surgeons (Feldman 2004). For attending expert surgeons to be willing to change to this new teaching paradigm, it is imperative to eventually demonstrate that time spent in a simulator can replace time spent in the operating room. This matters not only to educators but to hospital administrators and taxpayers alike: in the US in 1997, the estimated cost of training 1014 general surgery residents in the OR was $53 million (Bridges 1999).
This cost was mostly attributed to the extra time (2480 hours) spent in the OR when a resident is operating. Financially, then, simulators may save time in the OR and therefore money in training surgeons.

1.4 Current Methods of Surgical Performance Assessment

A clear and objective method for assessing performance and skill in laparoscopic procedures is potentially useful for many aspects of surgery, including surgical resident evaluation, simulator validation, and surgical tool evaluation. Since the early 1970s, when Kopta developed one of the first methods for performance evaluation, the surgical education community has become quite interested in this topic (Kopta 1971). Current evaluation methods are known to be subjective and possibly unreliable, so there is a need for objective methods to measure surgical performance (Rosser 1998, Winckel 1994, Lentz 2002, Chung 1998, Feldman 2004).

One of the more commonly used methods for surgeon evaluation is the structured skills assessment form. These can be checklists or forms in which the evaluator must describe or fill in specific areas. This type of form allows a complete intra-operative performance evaluation, analyzing both the psychomotor and cognitive skills of a surgeon, and many researchers have used it in various studies (Winckel 1994, Eubanks 1999, Reznick 1997). Many studies have also shown the validity and reliability of these structured skills forms (Martin 1997, Goff 2001, MacRae 2000, Cohen 1990, Regehr 1998, Faulkner 1996). Their shortcomings include patient variability, the stress associated with the OR environment, and the difficulty of recognizing the level of technical skill; the surgical skills themselves are not specifically quantified during these structured assessments.

Another very common measure used to quantify surgeon performance is the speed of completing a task. The time required to perform a procedure is easy to measure and has been used in many studies (Derossis 1998, Fried 1999, Hanna 1998, Hodgson 1999, Rosser 1997, Starkes 1998, Szalay 2000, Taffinder 1999).

Quality of performance has also been used as a method of evaluation. This measure is generally evaluated using subjective methods such as checklists and global assessment ratings (Eubanks 1999, Feldman 2004). Global assessment ratings are a type of subjective evaluation in which an evaluator rates the subject on a scale (e.g., 1-poor to 5-excellent). The Objective Structured Assessment of Technical Skill (OSATS) is one of the more commonly used and researched qualitative assessment techniques (Martin 1997); it is a set of operation-specific checklists tied to a physical simulator. Quality is a subjective performance measure that can usually be incorporated easily into any type of evaluation method.

Measures of error have also been studied with some interest. Although most surgeons do not like to speak about errors or injuries occurring during surgery, errors and injuries do occur (Francoeur 2003, Way 2003). The methods of evaluating error vary from objective measures, usually in simulated settings (Francis 2002, Grantcharov 2003, O'Toole 1999), to subjectively observed measures (Bann 2003, Joice 1998, Seymour 2002).

Forces and torques are another measure that can be analyzed. More recently, Rosen and colleagues successfully completed a study in a porcine model analyzing force/torque signatures at the surgical tool tip (Rosen 2001).
The researchers used a Markov modeling method (a pattern-detection technique) along with a structured classification of tool movements to evaluate surgical performance. They showed that they could correctly categorize surgeons into two experience levels (novice and expert) based on similarities derived from their Markov models. Other researchers have also incorporated force measurements into their measurement and training systems (deVisser 2002, Hanna 1997, Morimoto 1997, O'Toole 1999, Wagner 2002, Verner 2002, Yamauchi 2002).

1.5 Simulator Validation

Validity is a general term with many definitions. The American Psychological Association developed a set of standard definitions to aid in validity studies (APA 1974). From these standards, we are most interested in behavioural correspondence validity (hereafter simply "validity"): whether the human operator treats the simulator the way they treat the real situation. This can be tested by comparing human operator behaviour in the simulator and in the real situation during analogous tasks (Blaauw 1982).

It is of utmost importance that the simulators used for surgical skill training, assessment, and certification be validated. The test in validating surgical simulators is to show that performance in the simulator represents performance in the OR. To ensure a valid simulation, we must make certain that a surgeon treats the simulation, in as many applicable and quantifiable aspects as possible, the same way they treat a live patient. Many research groups have put considerable time into validating the currently available surgical simulation systems (Adrales 2003, Bloom 2003, Feldman 2004, Paisley 2001, Schijven 2003, Strom 2003, Taffinder 1998).

There are five common levels of validity, from least to most rigorous: face, content, construct, concurrent, and predictive (Table 1.1). Face validity is assessed by experts' review of the contents of the simulator; it is a subjective test, as it is based on expert opinion, and is usually done in the initial phases of validity testing. Content validity is an extension of face validity in which the expert uses a checklist to reduce rater subjectivity; it tests whether the simulator contains the steps and skills used in the real procedure. These simple validity tests are also the most subjective.

Construct validity is tested by discrimination between skill levels. It tests the degree to which the simulator "identifies the quality, ability or trait it was designed to measure" (APA 1974). This is another common test applied to surgical simulators.

Concurrent validity correlates performance with the current gold standard. For surgical simulators, the gold standard is operating room performance by expert surgeons; currently, the gold standard measurement is made with performance-specific checklists in the OR (Feldman 2004), an approach that is generally time consuming and still considered subjective.

Predictive validity is whether the simulator can predict actual performance in the real setting. This type of validity is rather controversial, as decisions about junior surgeons may be based on simulator performance: if predictive validity is shown, a poor simulator performance might remove juniors from continuing in their surgical training (Gallagher 2003).
In a project parallel to the one described here, a fellow lab member, Catherine Kinnaird, investigated some aspects of the validity of both physical and virtual reality surgical simulators with expert surgeon subjects (Kinnaird 2004). In Kinnaird's work, a new type of validity, performance validity, was introduced: a quantitative assessment of measurable quantities of performance in the OR (i.e., kinematics and force profiles). If these measures are the same in the surgical simulator as in the OR, the simulator can be considered valid. This new type of validity allows objective assessments using the same measurable quantities in many different environments, giving uniformity and consistency to evaluations made in the OR or in simulators.

Table 1.1: Types of validity definitions (Gallagher 2003)

Validity      Definition                                    Studies
Face          Expert opinion                                Haluck 2001, McCarthy 1999
Content       Checklist of matching elements                Paisley 2001, Schijven 2002
Construct     Differentiates between skill levels           Adrales 2003, Datta 2002, Gallagher 2004, Grantcharov 2002, Taffinder 1998, Schijven 2003
Concurrent    Correlates with gold standard                 Ahlberg 2002, Feldman 2004, Grantcharov 2004
Performance   Quantifiable performance measures same as     Present study, Kinnaird 2004
              "real" setting
Predictive    Predicts future results                       N/A

1.5.1 Construct, Performance, and Concurrent Validity

A very important step in the evaluation of surgical simulators is to establish construct validity: performance scores on the simulator should reflect the ability of the person performing the actual procedure, so an expert should score higher than a novice. Researchers have studied the construct validity of various types of simulators, such as arthroscopy and gastrointestinal endoscopy trainers (Bloom 2003, Srivastava 2004). The concept of construct validity is often regarded as a central theme in validation studies (Gabberson 1997).

In the laparoscopic simulator field, there has been extensive research into the validity of the MIST-VR simulation system (Mentice Medical Simulation AB, Gothenburg, Sweden). The construct validity of this system has been established in several studies (Gallagher 2002, Gallagher 2001, McNatt 2001). The latest study showed that MIST-VR has "discriminative validity": it was capable of evaluating the psychomotor skills necessary in laparoscopic surgery and of discriminating experts from novices (Gallagher 2004). The MIST-VR system has been shown to discriminate between the performances of subjects with similar experience and similar skill levels, so that subjects can be grouped according to psychomotor skill level. Discriminative validity is a further refinement of construct validity.

Construct validity has also been shown in physical simulators such as the McGill Inanimate System for Training and Evaluation of Laparoscopic Skills (MISTELS) (Fried 2004). This was an in-depth study with over 200 participating surgeons and trainees in 5 countries. The MISTELS system is the physical simulator used in the Society of American Gastrointestinal and Endoscopic Surgeons (SAGES) Fundamentals of Laparoscopic Surgery (FLS) program.

The current "gold standard" for concurrent validity studies is OR performance; the problem is the subjective methods (i.e., checklists) used to evaluate that OR behaviour.
1.6 Research Question

Because previous investigations of surgical simulator validity have relied on subjective assessments, there is a need to extend the study of simulator validity using quantitative measures. What we would like to know is whether motor behaviour in the simulator is analogous to that in the OR; this will allow us to determine whether the simulator is a good training and evaluation environment. In a complementary study, Catherine Kinnaird (2004) began this investigation by evaluating expert surgeons in the OR and on both physical and VR simulators, examining the performance validity of these simulators by comparing OR data with simulator data. That expert surgeon study led us to investigate the validity of these simulators further, and the project described in this manuscript extends that validity study.

The primary objective of this project was to investigate the performance, construct, and concurrent validity of both a physical and a VR surgical simulator. The construct validity study used the expert surgeon data analyzed by Kinnaird (2004). The secondary objective was to develop a system capable of collecting and analyzing quantitative data from the human OR.

1.7 Developing a Quantitative Assessment Method

The development of a quantitative method to assess surgeons in the human OR required much thought and preparation. To study the validity of surgical simulators and gather the performance measures that allow various context comparisons, we needed to improve and elaborate upon performance measures previously established within our lab. The performance measures used previously include time, kinematics, joint angle, and event sequencing (McBeth 2002). As shown in Figure 1.5, known in our lab as the "Wheel of Performance", other measures can also be made and incorporated into our system of performance evaluation.

Figure 1.5: Performance measures. The wheel comprises kinematics, force/torque, quality, postural, event sequencing, and error frequency measures. The measures in bold and in the solid-line box (kinematics and force/torque) are the ones used in this study; the measures shown in the dotted boxes are not specifically studied in this thesis.

Due to the time constraints of this project, we focused our study on kinematics and force/torque. Quality could easily be included through checklists or questionnaires of some type, and, as mentioned above, postural and event sequencing measures were successfully implemented in a previous study in our lab.

1.7.1 Kinematics

For this study, we continued the work of McBeth (2002) in gathering and analyzing kinematics data for the surgical tool tip during laparoscopic surgery. We required a high-frequency tracking system that would give us three-dimensional position and orientation data. Many types of commercial systems are available for this kind of tracking, such as optoelectronic, magnetic, and ultrasonic systems, each with its own advantages and disadvantages.

Optoelectronic systems can provide wireless high-frequency data and can be sterilized for OR use; hybrid systems able to track both passive (wireless) and active (infrared) markers are useful in many circumstances. Their disadvantages include line-of-sight problems and interference from external infrared sources. McBeth (2002) used an optoelectronic system and did have line-of-sight problems: some procedures yielded virtually no usable data, which led to unreliable results. The optical system also had a low sampling frequency (30 Hz), which made it difficult to produce velocity, acceleration, and jerk profiles. In this project, we wanted to improve upon McBeth's method and produce high-frequency, continuous kinematics data.

A tracking system that was available to us and seemed to overcome the problems of the optoelectronic system was an electromagnetic system. Electromagnetic sensors provide higher-frequency data sampling and are not affected by line-of-sight issues, but each receiver is connected by a wire to the interface unit, and external ferrous materials and electromagnetic fields detrimentally influence the measurements. Because of these issues, an electromagnetic system could not be used on its own in the OR environment.

This study therefore created a kinematics data collection system that incorporates both the optoelectronic and electromagnetic tracking systems. This overcomes the line-of-sight and low-sampling-rate problems of the optical sensor, and the lower accuracy and metal-interference problems of the magnetic sensor. By combining the two position sensors, we are able to achieve a continuous, high-frequency kinematics dataset.
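To illustrate why a continuous, high-frequency position stream matters for these measures, the sketch below (not the thesis's actual pipeline; the sampling rate and trajectory are made up) derives velocity, acceleration, and jerk profiles from fused tool-tip positions by repeated numerical differentiation. Each differentiation stage amplifies noise and widens any gaps, which is why a 30 Hz signal with occlusions is a poor starting point.

```python
# Illustrative derivation of velocity/acceleration/jerk from fused tool-tip
# positions (hypothetical 120 Hz stream; not the thesis's actual code).
import numpy as np

fs = 120.0                             # assumed fused sampling rate, Hz
t = np.arange(0.0, 2.0, 1.0 / fs)      # 2 s of data
pos = np.column_stack([np.sin(t), np.cos(t), 0.1 * t])  # fake x, y, z in mm

vel = np.gradient(pos, 1.0 / fs, axis=0)    # mm/s
acc = np.gradient(vel, 1.0 / fs, axis=0)    # mm/s^2
jerk = np.gradient(acc, 1.0 / fs, axis=0)   # mm/s^3

speed = np.linalg.norm(vel, axis=1)         # absolute tool-tip speed
print(round(speed.mean(), 3))
```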
McBeth (2002) used an optoelectronic system and did have problems with line-of-sight where some procedures had virtually no usable data, which led to unreliable results. The optical system also had a low sampling frequency (30Hz), and this led to difficulties in producing velocity, acceleration and jerk profiles. In this project, we wanted to improve upon McBeth 's method, and produce high frequency continuous kinematics data. A tracking system that was available to us that seemed to overcome the problems of the optoelectronic system was an electromagnetic system. Electromagnetic sensors give higher frequency data sampling and are not affected by line-of-sight issues, but a wire to the interface unit connects each receiver. External ferrous materials and electromagnetic fields also detrimentally influence these sensors. Because of these issues, an electromagnetic system could not be used solely in the O R environment. The study created a kinematics data collection system that incorporates both the optoelectronic and electromagnetic tracking systems. This overcomes the line-of-sight and low sampling rate problems of the optical sensor, and the low accuracy, metal interference problems of the magnetic sensor. B y using a combination of the two position sensors, we are able to achieve a continuous high frequency kinematics dataset. 17 1.7.2 Force/Torques To measure forces and torques, again there are commercially available systems. Force and torque measurements are made with specially designed force/torque sensors. For use in this project, it was important that we find a sensor that was small enough, yet robust enough, to be used in the OR. Strain gauge based force sensors are commonly used, and easily available, and can be gas sterilized for use in the OR. We followed the lead of Rosen (2001) with their technique of mounting a force sensor onto the shaft of a surgical tool. We also used strain gauges mounted to the surgical tool handle to aid in the calibration of the force sensor and to measure grip forces. 1.8 Project Goals Surgery and surgical education are at a point where the traditional "see one, do one, teach one" teaching technique is no longer acceptable. Surgical education experts have more recently looked into the possibility of using simulators to train, test and certify surgeons. Before these simulators can be used in widespread practice, a thorough evaluation of the systems must be done. Validation of these physical and virtual reality simulators is of utmost importance, as a valid simulator w i l l provide an environment that closely approximates the environment where the task wi l l eventually be performed (Prystowski 1999). The primary goal of this project is to assess construct and performance validity of two surgical simulators: virtual reality and physical (Figure 1.6). Construct validity refers to the concept that the context actually recreates the environment that it intends to recreate. A method of testing this in a surgical setting is to see whether expert surgeons perform better in these simulators than resident surgeons. A simulator that shows construct validity w i l l be able to detect the skill level differences between experts and novices. Performance validity of a simulator is where the simulator's behaviour is the same as in the OR. If a subject performs the same quantitative measures (such as kinematics or force) in a simulator as in the OR, the simulator is said to show performance validity. 
We also begin an investigation into quantitatively assessing the concurrent validity of both the VR and physical simulators. We are able to make a quantitative "gold standard" measurement in the OR with expert surgeons (data analyzed by Kinnaird 2004) and gather the same performance measures in all other contexts (i.e., both the VR and physical simulators). The results from this study could then be used in the design of new simulators, surgical tools, and techniques, and in surgeon training and evaluation.

Figure 1.6: Are laparoscopic surgical simulators valid? We look for similar motor behaviours between the simulator and the OR to investigate the validity of laparoscopic surgical simulators. The orange represents a physical simulator, where the task was to peel the skin off the orange and remove a few segments; the bottom left picture shows a VR simulator interface.

As mentioned previously, kinematics and forces served as the quantitative measures for our validity study. We required a continuous, high-frequency signal for both measures, and our existing lab system did not allow for this (e.g., occlusions in the optical data). Therefore, the secondary goal of this project was to develop a new tool that would give us these continuous, high-frequency measures. The new data collection and analysis system incorporates a fusion of the two kinematics data streams that eliminates the problem of occluded optical data. Previous combinations of kinematics data in the surgical environment have been attempted (Birkfellner 1998, Nakamoto 2000), but none of them was a true fusion of kinematics data.

Chapter 2 describes the design of the new experimental surgical tool and of all the subsystems required for data collection and analysis. It provides a thorough description of the data fusion technique that combines two kinematics data streams into high-frequency, continuous performance measures from the data gathered in the OR and the physical simulator, as well as the force measurement considerations and calibrations required to extract force performance measures. Chapter 3 presents the experimental methods used to collect data in the OR and with the two simulators, along with a description of the equipment used and details of the data post-processing. Chapter 4 contains the results of the experimental testing and a discussion of these results; the reliability of the chosen performance measures and the subject and context variability relating to the validity of the surgical simulators are investigated. Chapter 5 summarizes the findings, conclusions, and recommendations for future work; the conclusions relate to current and complementary studies in surgical education and simulation.

Chapter 2: The Hybrid Experimental Laparoscopic Surgical Tool for Performance Measure Assessment

2.1 Introduction

Minimally invasive surgery is now a common and essential component of modern surgical medicine. Unfortunately, developments in surgical education and assessment have not kept pace. The current methods of surgical assessment have been shown to be subjective and unreliable (Chung 1998, Feldman 2004, Lentz 2002, Rosser 1998, Winckel 1994); it is therefore agreed that an objective method of assessing surgical performance is needed. For many years, the notion that operative skills should be evaluated has been raised repeatedly (Kopta 1971).
Surgical simulators may provide an excellent venue for performance evaluation, as performance there can be measured objectively. Bench-top trainers, virtual reality (VR) systems, and animal models are all currently used in surgical education programs. The performance measures that researchers have used include completion time, errors, force/torque signatures, event sequencing, and tool tip kinematics (Chung 1998, deVisser 2002, Derossis 1998, Hanna 1998, McBeth 2002, Rosen 2002, Way 2003, Yamuchi 2002).

The longer-term goal of the lab's projects is to create a surgical skills database in which surgeons could look up their performance as compared to others. A surgical resident would be able to compare their performance to others of their own level and see where they need to improve or where they excel. In order to do this, however, research must be done to validate the surgical simulators and prove that training in a simulator does improve OR performance. Studies have shown that expert surgeons perform better in simulators than novices, and that practicing in a simulator leads to improvement in the simulator (Derossis 1998, Fried 2004, Rosser 1997). It has also been shown that assessments made in a simulator can be used to monitor progress (Derossis 1999, Fried 2004), and that practice in a porcine model leads to OR performance improvement (Fried 1999). Even more recently, breakthrough studies have shown that practice in a simulator does indeed lead to improvements in the human OR (Seymour 2002, Grantcharov 2004). This tells us that skills learned in a simulator could be used to replace OR time for learning.

This chapter describes the design and considerations for creating a new tool and data collection system to measure the OR and simulator data used in studying the validity of both physical and virtual reality simulators. The objective is to improve upon the current tools used to gather the performance measures, and to add force measurement to the system originally created by former lab member Paul McBeth (2002). A new technique was also created to fuse our two gathered streams of kinematics data into a single high-frequency, continuous kinematics data stream.

2.2 Laparoscopic Surgical Tool

The laparoscopic surgical tool used in these studies was a Maryland dissector, shown in Figure 2.1. This tool was chosen because it is used the most during the initial parts of the laparoscopic cholecystectomy procedure to dissect the surrounding tissues away from the cystic duct and artery. It is used to pull, spread, and tear away extraneous body tissue, and when connected to the electrosurgical unit it is capable of burning and cauterizing tissue. The tool was chosen on the recommendation of an expert surgeon participating in our studies. We obtained a commercially available tool through Storz Endoscopy. These tools have an interchangeable tool tip insert, and other tool tips may be purchased and used instead of the Maryland dissector insert. This is a good feature for future work, as different tips, and therefore different motions and forces, will be available for data collection.

Figure 2.1: Maryland dissector tip. Used for dissecting away surrounding tissues.

2.3 Performance Measures

There is a wide range of performance measures available for assessing surgical skill.
In consultation with expert surgeons, through literature searches, and following the protocol from the previous study done in our lab by Paul McBeth (2002), we continue with the previously chosen performance metric of tool tip kinematics, with the addition of tool tip force/torque. We no longer include the completion time, ergonomics/joint angles and event sequencing measures that were previously collected, but they can easily be re-implemented in the system. The following sections further describe the selected performance metrics and the methods we used to collect this data.

2.3.1 Kinematics

The use of tool tip kinematics in assessing surgical performance has become more common in surgical performance measurement systems (McBeth 2002, Rosen 1999). Rosen's group has created the BlueDragon system, which measures kinematics of the tool tip in vivo in a porcine model (Rosen 2002). Another group has incorporated electromagnetic trackers to measure distance, number of movements, and speed of a surgeon's hand movements in a laboratory setting (Taffinder 1998, Smith 2002). In a previous study within our lab by McBeth (2002), kinematics data was collected using an optoelectronic position tracking system. Our group continued with McBeth's work and further elaborated and improved the system to measure tool tip kinematics data, investigating tool tip velocities, acceleration, and jerk in the following tool tip directions: axial, grasp, translation, transverse, absolute, and roll about the tool axis (Figure 2.2).

Figure 2.2: Tool tip reference frame. Tool tip directions with respect to the tool handle. The axial (z) direction is along the tool shaft. The grasp (y) direction is in line with the tool jaws. The translate (x) direction is perpendicular to the y and z axes.

2.3.1.1 Optoelectronic Position Tracking

In a previous study in our lab by Paul McBeth (2002), an optoelectronic motion tracking system was used to collect the kinematics data. According to the product manual, the Northern Digital (NDI Northern Digital Inc., Waterloo, ON, Canada) Polaris Hybrid Tracking System (Figure 2.3) is capable of tracking the 3D positions of both infrared light emitting diodes (IREDs) and passive reflective markers with an accuracy of ~0.2-0.3mm. In our study, we only used passive markers. This optoelectronic system was originally chosen because surgeons have seen such systems in the operating room and are familiar with their presence, the parts are easily sterilizable, a system was available, and we were primarily interested in postural data.

Figure 2.3: NDI Polaris optoelectronic position tracking system. The top picture is the camera unit, and the lower picture is the tool interface unit.

The Polaris system uses an infrared camera to track the desired markers. It requires three passive markers (retro-reflective balls) to establish an array, or reference frame. Polaris records the position and orientation of the reference frame with respect to the camera. A Multi-Directional Marker Array (MDMArray) was custom designed and made by McBeth (2002), and was attached to the experimental tool to track tool movement. This specially designed array was created to make the tool visible from many more angles than a standard planar array; the standard array was one of the original problems with this system. The MDMArray has five geometrically unique faces, and can be rotated in many directions to allow the Polaris camera to track one face at a time.
This improves the visibility of the passive markers to the camera and allows more continuous data to be collected, as intermittent data, and therefore gaps in the data stream, is a significant problem. See Figure 2.4 for a picture of the MDMArray.

Figure 2.4: MDMArray. Halo of optical passive marker balls used for optical position tracking. The infrared camera tracks faces (3 balls) of the array.

The study conducted previously in our lab showed that the Polaris optoelectronic system is usable in the OR, but some limitations were discovered. Because the Polaris depends on line-of-sight from the camera to the marker arrays, the arrays can become occluded from the camera's view by surgeon movements, interrupting and leaving gaps in the data stream. It was found that during typical manipulation tasks, the clipping tool was visible only 78 +/- 12% of the time, even with the MDMArray (McBeth 2002).

2.3.1.1.1 Other Kinematics Options

Because our goal was to have a system that could gather continuous high frequency data for our performance measures, we considered various options such as:

• Re-designing/modifying the current marker array to allow more positions of the array to be seen
• Adding more optical tracking cameras to allow the arrays to be seen from multiple angles, therefore increasing visibility
• Incorporating a second motion tracking system:
  o Accelerometer/gyroscope
  o ShapeTape™ (a flexible tape-like position sensor which reports its shape)
  o Electromagnetic system

The options of changing the marker array or adding more cameras to the optical tracking system were discarded, as these might reduce the problem of occlusions/gaps but would likely not solve it completely. Also, the sampling frequency would still remain relatively low. The accelerometer/gyroscope option was considered, but was not readily available in our lab, and the same was true of the ShapeTape™; neither of these systems could have been designed and debugged in a reasonable amount of time. The electromagnetic system was available for our use through inter-departmental collaborations, and would provide the high frequency and continuous data stream that we required.

2.3.1.2 Electromagnetic Position Tracking

Electromagnetic tracking systems have been used in surgical applications in the past, but problems such as electrical noise and interference have been reported (Datta 2002, Frantz 2003, Smith 2002). Other researchers have attempted to combine sensors in the surgical environment. A study completed by Birkfellner and colleagues (Birkfellner 1998) at the University of Vienna successfully combined and calibrated a hybrid (optical and electromagnetic) tracking system. Their motivation for merging the two tracking systems was similar to ours, in that they were concerned with the optical system's line-of-sight limitations, especially in a crowded environment like the OR. The electromagnetic system provided a continuous stream of data. Their hybrid tracker employed a simple switching protocol: if the optical system was in view and available to collect data, it was used; if not, data was requested from the magnetic tracker. Only one piece of data was collected at each time interval, either optical or magnetic, so no true fusion was performed. This system was tested in an OR test set-up but not during an actual operation on a human.
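For illustration, the following is a minimal MATLAB sketch of such a switching scheme; the variable names are hypothetical, and this is not Birkfellner's actual implementation.

% Switching protocol sketch (hypothetical names; not Birkfellner's code).
% pOpt and pMag are Nx3 matrices of time-matched optical and magnetic
% positions; optVisible is an Nx1 logical flag marking samples where the
% optical markers were in the camera's view.
pHybrid = pMag;                                 % default: continuous magnetic data
pHybrid(optVisible, :) = pOpt(optVisible, :);   % use optical fixes when visible
% Only one source is used per sample, so no true fusion is performed.

Such a scheme inherits the optical accuracy only at the visible samples; between them it relies entirely on the magnetic sensor, which is part of what motivates a true fusion instead.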
The main contribution of this group was to investigate to what extent ferromagnetic materials in the OR affect a magnetic tracking system; they created a calibration look-up table to compensate for the interference. They also found the calibration to be useful after multiple registration attempts under varying OR conditions. This idea holds promise for future studies, but was not used in our study because we could collect two separate data streams (optical and electromagnetic sensors) with relative ease; creating a switching protocol would have been more time consuming.

Another group, led by Nakamoto (2000), also created a hybrid system involving both optical and electromagnetic tracking systems. This group recognized that major sources of inaccuracy for a magnetic system in an operating room are the OR table and surgical instruments. Because of space and time constraints, it is also very difficult to calibrate for these distortions during or before an operation. This group developed a calibration method that allowed the magnetic transmitter to be moved intraoperatively, and allowed for optimal physical placement of the transmitter by using an optical sensor to track the magnetic transmitter. An interesting discovery by this group was that the distance between the magnetic transmitter and receiver must be relatively short to maintain acceptable accuracy: they found that the transmitter-receiver distance must be within 20cm for an error of 2mm in and around OR equipment. This group did not appear to fuse the data, but simply used the optical system to track the magnetic transmitter.

Our goal was to fuse the optical and magnetic data to create one continuous and high frequency dataset. This was the most reasonable and feasible option at the time, as we were able to collect both the optical and magnetic data easily. We wanted to rely on the accuracy of the optical system, but use the continuous high frequency data from the magnetic system. By performing a data fusion, we were able to take advantage of the good qualities of both systems.

The Polhemus Fastrak (Polhemus Inc., Colchester, VT, USA) electromagnetic tracking system was chosen to complement the Polaris optoelectronic tracking system. The Fastrak is a magnetically based tracking system built around a fixed transmitter that sends out low frequency magnetic fields, allowing the moving receiver to determine its position; six degrees of freedom (position and orientation) can be measured. The Fastrak does not suffer from line-of-sight issues, and has a much higher sampling frequency (120Hz). Magnetic systems do, however, have their own disadvantages: they suffer from drift and from interference from ferrous metals in the environment, and the receiver is electrically wired. The Polhemus user manual gives an accuracy of 2mm within the 1m³ working volume, but one study found that this accuracy could only be achieved within a transmitter-receiver distance of 22cm (Milne 1996). The Polhemus Fastrak electromagnetic tracking system can be seen below in Figure 2.5.

Figure 2.5: The Polhemus Fastrak magnetic position tracking system. This picture shows the tool interface unit, power supply, transmitter, one receiver and the stylus.
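Given this sensitivity to transmitter-receiver distance, it is prudent to flag magnetic samples collected beyond the trusted range. The following is a minimal MATLAB sketch; the variable names are hypothetical, and the 22cm threshold is taken from the Milne (1996) finding cited above.

% Flag Fastrak samples beyond a trusted transmitter-receiver distance.
% pRecv is an Nx3 matrix of receiver positions in the transmitter frame (mm).
maxTrusted = 220;                          % mm (~22cm, per Milne 1996)
recvDist = sqrt(sum(pRecv.^2, 2));         % transmitter-receiver distance per sample
suspect = recvDist > maxTrusted;           % samples of potentially reduced accuracy
fprintf('%.1f%% of samples beyond trusted range\n', 100*mean(suspect));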
2.3.2 Force/Torque

The adequate and appropriate use of forces/torques (F/T) in any surgical procedure is a skill that must be learned by a novice and practiced carefully by all. Surgical procedures require a certain amount of finesse and knowledge when applying forces and torques to human tissues, and it is important that a surgeon is aware of this aspect and takes it into account during any procedure. Our goal was to collect continuous high frequency F/T data measuring the surgical tool tip-tissue interaction forces/torques during live human surgery. Rosen and colleagues have successfully measured forces and torques in vivo in a porcine model, and were able to classify surgeons' skill level using force/torque signatures (Rosen 1999). Their F/T data was collected using two separate sensors: a tri-axial F/T sensor and a strain gauge system mounted on the surgical tool handle (Figure 2.6). Their sensor is a custom-made tri-axial F/T transducer that mounts directly onto a laparoscopic tool shaft (with a hole through the center of the transducer).

Figure 2.6: F/T system of Rosen (1999). A custom-designed F/T sensor was mounted directly onto the surgical tool shaft. This sensor has a hole through the center. A strain gauge system is mounted onto the tool handle.

To measure the forces and torques associated with the surgical tool tip, we mounted a Mini40™ (ATI Industrial Automation, Apex, NC, USA) force/torque sensor (Figure 2.7) on our experimental Maryland dissector tool. This is a strain gauge based transducer, able to withstand the forces and torques used in laparoscopic cholecystectomies. Forces and torques about all three axes (Fx, Fy, Fz, Tx, Ty, Tz) were recorded at 120Hz in counts per unit force. This data is continuously collected directly into a Matlab file; Willem Atsma, also a member of the Neuromotor Control Laboratory, wrote the Matlab drivers for streaming the F/T data. The Mini40 was chosen as it was available in our lab, is compact enough to be mounted onto the surgical tool without much interference, and does not affect the weight of the surgical tool significantly.

Figure 2.7: ATI Mini40 F/T transducer. It is 40mm in diameter, 12.2mm thick, and weighs 50g. Sensing range: Fx, Fy +/- 80N; Fz +/- 240N.

2.3.3 Sensor Bracket Design

To attach the optical MDMArray, the magnetic receiver, and the F/T sensor onto the Maryland dissector surgical tool, some type of mounting bracket was required. Many considerations were taken into account in the design of this mounting bracket (Table 2.1).

Table 2.1: Criteria for design of the sensor mounting bracket.
Criterion - Reason
Force bearing/load path - To allow the force path to travel through the F/T sensor
Lightweight - To not affect the surgical tool weight and balance
Small - To keep the surgical tool shaft length as long as possible
Non-conductive material - To allow electrocautery current to pass through the tool shaft and not through the sensors (especially the F/T sensor)
Non-obtrusive - To allow the surgeon as normal tool function as possible
No sharp edges - To prevent surgical staff from cutting gloves

Special measures had to be taken to allow the F/T sensor to function properly and measure the tool tip forces/torques. We wanted to ensure that all the forces would be transmitted through the innermost shaft of the surgical tool.
To do this, the outer shaft of the tool was cut to allow these forces to be transmitted along the innermost shaft (Figure 2.8), through the bracket and then through the F/T sensor. This cut compromised the original electrical insulation coating of the surgical tool shaft (seen as the thin black coating on the tool shaft), and care had to be taken to minimize the exposed area of the electrically live shaft. The bracket was designed to prevent accidental contact between the user and the exposed shaft.

Figure 2.8: Two cut views of a typical laparoscopic tool shaft. It consists of two layers of tubes (thin and thick lines), and the innermost shaft (dotted fill). The outer layer is a protective and electrically insulating covering. The middle tube is the metal structure of the tool shaft. The innermost shaft is connected to the tool tip and the tool handle. The innermost shaft and the middle tube are electrically live when electrocautery current is applied.

After much iteration, the bracket was finally designed to mount all sensors and satisfy the criteria (Figure 2.9). The bracket was designed in conjunction with volunteer lab engineer Brandon Lee.

Figure 2.9: Sensor mounting bracket on surgical tool. The inset picture is a close-up of the sensors mounted on the bracket.

The final design of the bracket consists of two parts: top and bottom segments, as seen in Figure 2.10. The bracket parts are mounted in between the two parts of the original surgical tool. The top segment is directly attached to the outer shaft of the surgical tool, and all forces acting on this top segment are sensed. The force sensor was not mounted inline, as with Rosen's device shown earlier, because our force sensor did not have a hole drilled through it to allow passage of the central rod.

Figure 2.10: The two segments of the force/torque sensor bracket. ATI force/torque transducer mounted below the tool shaft via the custom designed bracket (Source: Kinnaird 2004).

The file was submitted in STL format (generated by SolidWorks) to technologists at the British Columbia Institute of Technology (BCIT), who constructed the bracket out of a medical grade ABS plastic on a rapid prototyping machine. A non-conductive material was required because of the electrical current that is transmitted through the shaft for tissue cutting and coagulation; all the sensors, as well as the user and patient, must be protected from this electrical current. The design and material of the bracket also set the magnetic receiver as far away as possible from any metallic elements that could potentially lead to errors in the magnetic sensor readings. The wires coming from the Fastrak, the F/T sensor, and the strain gauges can all be gathered to one side of the bracket and tied together to minimize obstruction to the surgeons; this is done before each experimental surgical procedure. The detailed drawing and specifications of the mounting bracket can be found in Appendix C.

2.3.3.1 Force Balance

The MDMArray optical halo and F/T sensor connect the two segments. Since the force sensor is in the path connecting the segments, it registers any forces acting between them (Figure 2.11).

Figure 2.11: Force load path through sensor bracket and F/T sensor. The force travels bidirectionally along the tool shaft: through the bottom segment, through the sensor, and through the top segment, or vice versa.
In an OR situation, a trocar (a tubular object used to hold the surgical tool near the operating site) is inserted into the abdomen, and the surgical tool is inserted through this trocar (Figure 2.12). This allows smoother movement of the tool and provides stability for the long laparoscopic tools. The surgical tool can be pushed down or pulled back along the length of the trocar to access deeper tissues. The trocar is sealed and also keeps the abdominal inflation gases inside; these gases are required in laparoscopic surgery to allow better internal visualization.

Figure 2.12: Laparoscopic trocar. The trocar provides stability for the long laparoscopic surgical tools. There are force interactions between the tool shaft and the trocar that are sensed by the force sensor.

In our subsequent data analysis, we require an estimate of the forces the surgeon is applying to the tissues using the tool. In this section, we present a free body diagram (FBD) analysis of the loads applied to the tool and demonstrate how the tip forces are estimated. These FBDs are shown in Figures 2.13a-e.

Figure 2.13a: Overall view of the tool, which is then split into 3 sections (2.13c, d, e) for FBD analysis. The dashed line at "A" represents the cut to create sections 2.13c and 2.13d. The "B" dashed line is used to create figures 2.13d and 2.13e.

In the following figures (2.13c-e), the following abbreviations are used: Fta (actual tissue-tip interaction force), Ft (effective tissue-tip interaction force), Fa (force along the shaft, i.e., trocar forces), Fr (tool rod force), Fs (sensor force), Fg (gravity force), Fh (hinge forces), and Ff (grip force of the hand on the tool handles). The respective moments are also included.

The gravitational force (Fg) is assumed to be in the negative y-direction for illustrative purposes; in general, it will be a function of the tool's attitude. The effect of gravity forces on the force sensors is accounted for using a calibration method fully described in Kinnaird's thesis (Kinnaird 2004) and introduced in section 2.4 below. In the following FBDs, we identify the gravitational forces on the tool, but in the subsequent analysis we assume that the sensor readings have been adjusted to take these forces into account and therefore set the gravitational forces to zero. There are two force-sensing elements in the tool: the force sensor collects forces and moments in all 3 directions (x, y and z), and the strain gauge pair is used to estimate the bending moment in the handle used to apply grasping forces. From these sensor readings, we are able to estimate the tip-tissue interaction forces as described below.

The Ft and Mt values are what we consider the effective tip force (i.e., the combination of both the actual tip-tissue interaction forces and any forces along the tool shaft). Because the interactions between the surgical tool shaft and the trocar do not occur at a well-defined point and are not directly sensed separately from the tip forces, the trocar interaction forces were not specifically modelled in this study.
Directly estimating these trocar interaction forces would require a model that could account for the movement of the surgical tool along the trocar, but this is difficult because the trocar does not act on the tool at one specific point, but along a 7-10cm portion of the tool shaft. The characterization of the trocar-tool interaction forces could be investigated further in future studies. We did assume that the axial trocar forces were likely not to be very large in comparison with the tip/tissue interaction forces, because the tool could slide through the trocar under its own weight. It is more difficult to justify a claim that the lateral forces are low because, although the abdominal wall is compliant and the tool is rarely used as a "pry bar", those forces could be comparable in magnitude. Nonetheless, since the point of application of the trocar forces changes, it is difficult to cleanly separate the two, which is why we have decided to represent the forces as equivalent tip forces and moments (i.e., tip forces + trocar forces = effective tip force), as shown in Figure 2.13b. Equations (a)-(f) show how the effective tip forces are affected by the presence of trocar forces.

Figure 2.13b: Effective tip forces and moments are a combination of the trocar interaction forces and moments and the actual tool-tip forces and moments. (Labels: trocar forces (Fa) and moments (Ma); effective tip forces (Ft) and moments (Mt); actual tip forces (Fta) and moments (Mta); distance between tip and trocar (dt).)

The equations used to find the effective tip forces and moments are:

a) Ftx = Ftax + Fax
b) Fty = Ftay + Fay
c) Ftz = Ftaz + Faz
d) Mtx = Mtax + Max
e) Mty = Mtay + May - Faz(dt)
f) Mtz = Mtaz + Maz + Fay(dt)

Figure 2.13c: Free body diagram of the distal end of the surgical tool and force sensor. Effective tip forces are represented. The "d" are the perpendicular distances of the forces used in the moment equations.

The equilibrium equations (assuming acceleration is comparatively low and can be neglected):

1) ΣFx = 0 = -Ftx + Fr + Fsx
2) ΣFy = 0 = -Fty - Fg1 + Fsy
3) ΣFz = 0 = -Ftz - Fsz

Summing moments about the center of the force sensor (black circle on figure):

4) ΣMx = 0 = Ftz d1 - Mtx + Msx
5) ΣMy = 0 = Ftz d2 - Mty + Msy
6) ΣMz = 0 = Ftx d1 - Fr d1 + Fty d2 + Mtz + Msz

Our goal here is to express the effective tip forces and moments in terms of the measured forces and the rod force.

• From eq. 1: Ftx = Fr + Fsx (eq. A)
• Applying gravity compensation (Fg1 = 0) to eq. 2: Fty = Fsy (eq. B)
• From eq. 3: Ftz = -Fsz (eq. C)
• From eq. 4: Mtx = Ftz d1 + Msx (eq. D)
• From eq. 5: Mty = Ftz d2 + Msy (eq. E)
• From eq. 6 and gravity compensation: Mtz = -Fty d2 + Fr d1 - Ftx d1 - Msz (eq. F)

We have 6 equations but 7 unknowns. The force sensor gives values for Fs and Ms; Fr is derived from the analysis of Figure 2.13e, described later.

Figure 2.13d: Free body diagram of the force sensor and stationary tool handle. The "d" are the perpendicular distances of the forces used in the moment equations, and are different for each section figure. Note that this diagram is not used in the analysis, but is shown for completeness.
7) ΣFx = 0 = Fsx - Ff1x - Fhx
8) ΣFy = 0 = Ff1y - Fg2 - Fsy + Fhy
9) ΣFz = 0 = -Fsz + Ff1z - Fhz

Summing moments about the center of the force sensor (black circle on figure):

10) ΣMx = 0 = Msx - Mhx + Ff1y d4 - Ff1z d6
11) ΣMy = 0 = Msy + Fhx d5 - Ff1z d8 - Mhy
12) ΣMz = 0 = Fhy d5 - Fg2 d5 + Ff1y d8 - Ff1x d6 - Fhx d7 - Msz

Figure 2.13e: Free body diagram of the tool handle and strain gauges. The "d" are the perpendicular distances of the forces used in the moment equations, and are different for each section figure. These equations are derived to show that the act of gripping results in the rod force, Fr.

13) ΣFx = 0 = -Fr + Fhx + Ff2x
14) ΣFy = 0 = -Fg3 - Fhy + Ff2y
15) ΣFz = 0 = Fhz + Ff2z

Summing moments about the hinge:

16) ΣMx = 0 = Mf2x - Ff2z d11
17) ΣMy = 0 = Mf2y + Ff2z d10
18) ΣMz = 0 = Fr d9 + Ff2y d10 + Ff2x d11 - Fg3 d12 - Mf2z

• From eq. 13: Fr = Ff2x + Fhx (eq. G)
• Applying gravity compensation (Fg3 = 0) to eq. 14: Ff2y = Fhy (eq. H)
• From eq. 18 and gravity compensation: Fr d9 = -Ff2y d10 - Ff2x d11 + Mf2z (eq. I)

From eq. 18, if we make the assumption that Ff2 (the grip force of the fingers) in Figure 2.13e is applied at the same fixed spot (as the finger holes of the tool are not large), and we take the sum of the moments around the hinge, we find that Fr = f(Ff2, Mf2). Our strain gauge pair senses the bending moment in the handle at the gauge location: (Ff2)(grip-to-strain-gauge distance) + Mf2. We also believe that Mf2 ~ 0, because it is physically difficult to apply a pure couple here. Therefore we can make the assumption that the strain gauge pair's output is proportional to Ff2. So, in principle, we can compute Fr directly and substitute it back into eqs. 1-6 and the equations derived from them. In fact, it is more straightforward to observe the effect of grip forces on the force sensor output and to directly correct the force sensor readings as a function of the strain gauge pair's output, as described below in section 2.4; details are contained in Kinnaird's thesis (Kinnaird 2004). Therefore, the final equations for estimating the effective tip forces and moments are:

19) Ftx = Fr + Fsx
20) Fty = Fsy
21) Ftz = -Fsz
22) Mtx = Ftz d1 + Msx
23) Mty = Ftz d2 + Msy
24) Mtz = -Fty d2 + Fr d1 - Ftx d1 - Msz

2.3.4 Grip Force

The grip forces measured are used to correct the force readings to better estimate the tool shaft loads. The grip force is measured by two strain gauges mounted on the surgical tool handle (Figure 2.14); the gauges can be seen on the tool in Figure 2.15. In this half-bridge configuration, the gauges measure the forces exerted perpendicular to the handle axis, and we can correlate this force with forces at the surgical tool tip.

Figure 2.14: Strain gauge circuit diagram for half-bridge configuration. The "F" represents the surgeon's grip force exerted on the tool handle. (Vo = output voltage, VEx = excitation voltage, GF = gauge factor, e = strain, RG = nominal resistance of strain gauge, ΔR = strain-induced change in resistance, R1 and R2 = reference resistors. One gauge reads RG - ΔR (tension) and the other RG + ΔR (compression).)

The two gauges used were standard 120 ohm Vishay Micro-Measurements gauges. These two gauges are fed into an instrumentation amplifier with built-in gain and offset control. This signal conditioner also compensates for temperature, as the reference resistors are housed within it. These grip strains can then be extracted from the total F/T measurement to give a more accurate tool tip force measurement. This is explained further in section 2.4.1.
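To make the force balance concrete, the following is a minimal MATLAB sketch of equations 19)-24); the function and variable names are hypothetical, and it assumes that gravity and grip compensation have already been applied to the sensor readings and that the rod force Fr has been estimated from the strain gauge output via the proportionality argued above.

% Minimal sketch of equations 19)-24) (hypothetical names). Fs and Ms are
% 1x3 gravity- and grip-compensated sensor force/moment readings, Fr is the
% rod force estimated from the strain gauge pair, and d1, d2 are the
% measured lever arms.
function [Ft, Mt] = effectiveTipLoads(Fs, Ms, Fr, d1, d2)
    Ft = [Fr + Fs(1), Fs(2), -Fs(3)];                 % eqs 19-21
    Mt = [Ft(3)*d1 + Ms(1), ...                       % eq 22
          Ft(3)*d2 + Ms(2), ...                       % eq 23
          -Ft(2)*d2 + Fr*d1 - Ft(1)*d1 - Ms(3)];      % eq 24
end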
See Figure 2.15 for a picture of the strain gauges mounted on the surgical tool handle.

Figure 2.15: Strain gauges mounted on the tool handle. The figure on the left is a general diagram of the surgical tool. The picture on the right shows a side view, where two gauges are attached on opposite sides of the tool handle.

2.4 Force/Grip Data Processing

In an earlier section, we showed that the force sensor also responds to grip forces. Here, we explain how we use the strain gauges to separate grip forces from tool interaction forces. The following sections describe our concerns with the force sensor calibration, and what was done to extract force data as a performance measure.

2.4.1 Grip Calibration

To fully understand and use the data received from the F/T sensor, an understanding of the mechanics of the surgical tool is needed (Figure 2.16). A typical laparoscopic surgical tool shaft has a few layers, as described above in section 2.3.3. The innermost long shaft is attached to both the tool handles and the tool tip, and controls the opening and closing of the tool tip jaws. When the tool handles are opened by the surgeon, the inner shaft moves and shortens, causing the jaws to open via the built-in pivot mechanism.

Figure 2.16: Mechanics of the laparoscopic tool. When the handles are opened, the tool tip jaws are also opened due to a shortening of the innermost tool shaft. (Modified from source: Kinnaird 2004)

The F/T transducer senses this movement and records it accordingly. Because of the design of our F/T sensor, the surgical tool shaft had to be cut and the special bracket mounted, as discussed previously in section 2.3.3. All the loads on the inner shaft are transferred through the bracket and sensed by the transducer. The interaction between the strain gauges and the force sensor is depicted in Figure 2.17. Through calibration, the strain gauge data is used to separate the grip forces from the actual tissue manipulation forces.

Figure 2.17: Interaction between strain gauges and force sensor. 1) Surgeon closes handle. 2) Strain gauges sense strain in tool handle. 3) Tool tip jaws close. 4) Tool shaft goes into compression, and the force sensor (dotted fill) senses this force against the tool bracket (solid black fill).

Discussion of the calibration algorithm used to separate the grip forces from tissue manipulation forces can be found in Kinnaird's thesis (2004). The tool was held in a neutral position, and the tool handle was opened and shut while recording data from both the force sensor and the strain gauges. The results from one of Kinnaird's (2004) calibration tests are shown in Figure 2.18.

Figure 2.18: Results from one calibration test. The friction loop (dotted line) and the arrows indicate the direction of motion. This loop is not consistent, and varies with grip strength. (Source: Kinnaird 2004)

Ideally, the force reading would be linearly related to grip strain, but there is clearly some nonlinearity. The force versus strain graph shows flat sections at both ends, where the strain reading increases while the force reading remains constant. The flat part at the lower left is likely due to friction within the tool handle (see Figure 2.19): the strain gauges detect the initial forces required to overcome the friction before any load is transmitted to the force sensor.
There also seems to be a large amount of hysteresis during the release after the squeeze, as shown by the dashed line in Figure 2.18. This loop is not constant, and its location varies with different strength squeezes. The upper right portion of the plot also demonstrates a flat section, due to saturation of the force sensor. The force sensor has overload protection, but occasionally the surgeon's grip can reach the maximum sensing range of the sensor, which causes the strain reading to increase while the force reading stays constant. In conversation with Catherine Kinnaird, and through a visual inspection of the force data, it was verified that the saturation problems (upper flat part of the curve) were not very significant; based on this inspection, it is believed that less than 2% of the force readings hit this saturation area.

Figure 2.19: Friction in the surgical tool handle and bracket. There was friction in the tool hinge handle and at the tool rod/bracket interface.

The mounting bracket also slipped slightly along the tool shaft due to an improper fit. This movement may have contributed to the hysteretic loop in Figure 2.18, and to tool handle movements not being sensed by the force sensor. These problems led to the need for a somewhat more complicated grip compensation algorithm than would be required if the relationship between strain gauge output and force sensor output exhibited neither hysteresis nor saturation. A mean grip constant was calculated for the linear portions of the hysteretic loop (Figure 2.20). When the force data was within the linear range (inside the dotted oval), grip forces were removed; outside of this range, no compensation was applied. On the upper side of this linear region, the force sensor is no longer responsive to changes in the tip force; since no compensation was applied there, this leads to misleadingly high force peaks, especially in the axial direction. In conversation with Catherine Kinnaird, and by visual inspection of the force data, it was believed that about 20% of the force data might be outside the linear region (the region outside the dotted oval). Kinnaird (2004) completed an error study of the compensation algorithm and found that it reduces the RMS tip force error by about 50% in a typical manipulation (in a simulator environment). While it was correct not to apply any compensation to the force sensor reading when the grip force was in the low end of the curve, the correct thing to have done in the case of saturation would have been to set the tip force reading to zero and, if possible, to have excluded such data from subsequent analysis. However, we erroneously left the readings uncorrected; we believe this ultimately had a small effect on our conclusions because saturation occurred relatively infrequently.

Figure 2.20: Grip compensation. The force data has grip removed only in the linear region, as outlined by the dashed oval. This leads to misleadingly higher force peaks in the force data stream.

With regard to tool design, the next iteration of the mounting bracket should be better fitted to the tool shaft to prevent the "slipping" movement mentioned above. The force sensor chosen should also be able to read the higher forces without saturating.
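As an illustration of the compensation scheme just described, the following is a minimal MATLAB sketch; the function and variable names are hypothetical, and the calibration constants would come from a calibration run like that shown in Figure 2.18.

% Minimal sketch of the piecewise grip compensation (hypothetical names).
% axialForce and strain are time-synchronized vectors from the force sensor
% and strain gauge pair; kGrip (N/V) is the mean grip constant from the
% linear portion of the calibration loop, and [strainLo, strainHi] bound
% that linear region.
function tipForce = compensateGrip(axialForce, strain, kGrip, strainLo, strainHi)
    tipForce = axialForce;
    inLinear = (strain >= strainLo) & (strain <= strainHi);
    % Inside the linear region, subtract the grip contribution.
    tipForce(inLinear) = axialForce(inLinear) - kGrip * strain(inLinear);
    % Below the region (tool friction) and above it (sensor saturation), no
    % compensation is applied; as noted above, saturated samples would
    % ideally be zeroed or excluded instead.
end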
2.4.2 Gravity Effects Calibration

The effects of gravity on the F/T sensor are significant enough that they must also be compensated for. This sensor is inherently quite sensitive, and is mounted off the axis of the surgical tool shaft. This creates a force/torque reading from the mass of the surgical tool alone. These F/T readings can vary by up to ~5N when the surgical tool is held or placed in different roll (rotation about the z axis in the tool tip frame (Figure 2.2)) and pitch (rotation about the y axis in the tool tip frame) orientations. Rotation in yaw (about the x axis in the tool tip frame) does not affect F/T readings in the neutral position, as gravity naturally acts perpendicular to this axis of rotation. In order to improve the F/T readings, these effects of gravity must be compensated for. A mathematical model was created to compensate for roll and pitch; the details can be found in Catherine Kinnaird's thesis (2004). This model led to an almost 2x decrease in RMS error when compared to a simple mean subtraction method, where the mean force value was subtracted from the total force.

2.5 Kinematics Data Fusion

Because we used two different position sensors in our data collection process, we make use of both datasets for our kinematics measure, and have created a technique to fuse them.

2.5.1 Data Fusion Introduction

In a previous study in our lab by McBeth (2002), the Polaris optoelectronic tracking system was used to collect 3D position data for quantitatively measuring surgical performance. The optical data was collected at 20Hz in the McBeth study, which is a suitable sampling frequency for measuring human movement, particularly for postural studies (Woltring 1986). However, as mentioned previously, one drawback of the optical sensor is that it is susceptible to line-of-sight problems, which lead to occlusion of the optical markers and consequently gaps in the optical data stream. McBeth also suggested that a maximum gap size of 0.5s could be interpolated successfully (McBeth 2002). In the OR, marker occlusions longer than 0.5s occur quite frequently. These are the main reasons for wanting to improve the position tracking system for quantifying motor performance. Various options were considered (e.g., ShapeTape, accelerometers, gyroscopes), as discussed in section 2.3.1.1.1, but in the end we chose to combine an electromagnetic position tracking system with the optical sensor. Due to availability and ease of use, the Fastrak electromagnetic position tracking system was chosen to complement the Polaris. The Fastrak is a three-dimensional (position and orientation) magnetic tracking system; such systems are known to be free of line-of-sight issues, collect data continuously, and sample at high frequency (~120Hz). According to the product manual, the 3D positions and orientations of the receivers can be measured with an accuracy of 2mm and 0.15° within a 1m³ working volume surrounding the magnetic transmitter, but we have found that in reality the accuracy is not as good as the manual suggests: once the transmitter-receiver distance is greater than ~20-30cm, the data becomes less accurate and tends to fluctuate around the actual value.

2.5.2 General Data Fusion

To obtain a useful estimate of the tool position, the two data streams must be fused. The general process of combining multiple data sources is well studied in a wide variety of applications (Challa 2004).
Sensor fusion is defined as "the combination of sensory data or data derived from sensory data such that the resulting information is in some sense better than would be possible when these sources were used individually" (Elmenreich 2002). In short, we would like to combine more than one source of data to create information that is better than either alone.

Table 2.2: Advantages of data fusion (Elmenreich 2002)
Advantage - Reason
Robustness & Reliability - Inherent redundancy provides data even when one source fails
Extended Spatial & Temporal Coverage - One sensor can see where another cannot, and provide data when another cannot
Increased Confidence - Measurement of one sensor confirmed by measurements from other sensors of the same domain
Reduced Ambiguity & Uncertainty - Joint information reduces the set of ambiguous interpretations of the measured value
Robustness Against Interference - Increasing the measurement space makes the system less vulnerable to interference
Improved Resolution - Multiple independent measurements taken of the same property

Sensor fusion can be implemented at various levels of interpretation depending on the application. Low-level fusion (or raw data fusion) combines various sources of raw data to produce new data that is expected to provide more information than the original inputs. Intermediate-level fusion (or feature-level fusion) combines features such as edges, corners, lines and textures into a feature map that is then used for segmentation and detection. High-level fusion (or decision fusion) combines decisions from several experts; methods include voting, fuzzy logic and statistical methods. In our case, we concentrate on low-level or raw data fusion, as two raw kinematics data sets are combined into one. Because the notion of data fusion covers so many levels of interpretation and types of sensors, there is no single model of fusion that works for all applications; it is key to find a model that is optimal for a specific application.

2.5.2.1 Fusion Methods

There are many different types of sensor fusion, and each one has its pros and cons. Areas where sensor fusion is used include military applications (e.g., target tracking), satellite positioning, and image processing (Challa 2004). The more common methods of sensor fusion in these areas include:

• Bayesian Inference - using probabilities to attach weightings, e.g., sensor fusion in automotive applications (Coue 2003)
• Dempster-Shafer Inference - similar to Bayesian, but more computationally intensive; allows for more unknowns, as it relies on "beliefs" and "masses", e.g., a recent use in human-computer interaction (Wu 2002)
• Artificial Neural Networks - e.g., perception studies (Johnson 1998)
• Kalman Filtering - a prediction/correction filter, often used in navigation

2.5.4 Kinematics Data Fusion Technique

Before the collected data can be fused, it must be synchronized in time and registered into the same reference frame, as summarized in Chapter 3, section 3.6.1. The details of synchronizing and registering the two data streams were presented in the complementary thesis of Kinnaird (2004). Once the two positional data sets are synchronized and registered, they are put through the fusion process.
Position and orientation measurements from the optical sensor are considered to be correct and accurate when the optical markers are visible, and the magnetic measurements are used to provide estimates of the shape and detail of the sensor's trajectory, especially during times when the optical sensor has missing data (i.e., optical marker occlusions). We take these two data streams and fuse them into one continuous, high frequency data stream. By using the accuracy of the optical data and the continuity of the magnetic data, we take advantage of both systems to get the data we want. We first filter the magnetic data. This filtered magnetic data is then evaluated at the times of the optical data to estimate its value at the optical sampling times. The time-matched magnetic data is then subtracted from the optical data to create a difference curve. Next, we interpolate this difference curve to estimate the errors at each magnetic sample time. The fused data is the sum of the interpolated difference curve and the original magnetic data. A flow diagram of the data fusion steps is shown in Figure 2.21.

Filter magnetic data → Evaluate/interpolate (magnetic) @ times of optical → Difference curve (optical - magnetic) → Spline interpolation (difference) → Fused data = splined difference + magnetic

Figure 2.21: Data fusion steps

2.5.4.1 Data Fusion Technique Details

The original magnetic data was sampled at 120Hz, which is much faster than the 30Hz sampling rate of the optical data. The magnetic data was first filtered using a Generalized Cross Validation (GCV) approach by Woltring (1986) to smooth the dataset, as the magnetic data can be rather noisy (Figure 2.22a and Figure 2.22b).

Figure 2.22b: GCV smoothed magnetic data. The bottom graph is a magnification of the smoothed magnetic data.

The algorithm iterates to find the optimal smoothing parameter by considering each data point and all the other data points to find a model that reproduces that one data point (i.e., the minimum GCV is least affected by any single point). The GCV algorithm is of specific use in our application as it accommodates unequally time-sampled data and can handle multiple datasets. We hypothesize that this noise is caused by the OR environment of metals and medical instrumentation systems, as the same amount of noise was not seen in the laboratory setting. The noisy spikes of data caused by the electrosurgical unit (ESU) are also removed, as will be described in Chapter 3, section 3.7.1.1. The magnetic data points were then mathematically reduced to the same number of points as the optical data by down-sampling to time-match the optical data. A difference curve was generated between the optical and interpolated magnetic data. This difference curve was then interpolated to estimate the errors at each magnetic sample time. The interpolated difference curve is then added to the original magnetic curve to produce the corrected/fused position estimate. A demonstration of the data fusion technique is shown in the following figures, using data collected in the laboratory to mimic typical operating room movements (Figure 2.23). One should also take note of the discrepancy in the registration of the magnetic and optical data (see Kinnaird's thesis 2004).
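Before walking through the individual steps in the figures below, the following is a minimal MATLAB sketch of the full pipeline of Figure 2.21; the function and variable names are hypothetical, and it assumes the two streams are already time-synchronized, registered, and (for the magnetic data) GCV-smoothed.

% Minimal sketch of the fusion pipeline in Figure 2.21 (hypothetical names).
% tOpt/pOpt: optical sample times and positions (~30Hz, with gaps);
% tMag/pMag: magnetic sample times and positions (~120Hz, smoothed,
% synchronized and registered into the optical frame).
function pFused = fuseKinematics(tOpt, pOpt, tMag, pMag)
    % Step 1: evaluate the magnetic data at the optical sample times.
    pMagAtOpt = interp1(tMag, pMag, tOpt, 'spline');
    % Step 2: difference curve = error of the magnetic estimate at the
    % available optical fixes.
    diffCurve = pOpt - pMagAtOpt;
    % Step 3: interpolate the difference curve back to every magnetic
    % sample time, bridging optical gaps with a spline.
    diffAtMag = interp1(tOpt, diffCurve, tMag, 'spline');
    % Step 4: fused estimate = magnetic data plus interpolated difference.
    pFused = pMag + diffAtMag;
end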
This registration discrepancy was one problem with the overall data registration system that our data fusion technique corrected, as will be demonstrated in the following sections.

Figure 2.23: Laboratory data. Optical and magnetic data that have been time-synchronized and registered. Optical gaps are seen in the circled areas.

The first step of GCV filtering the magnetic data was not done in this case, as the magnetic data was quite smooth and not noisy. This is typical, as noise is usually seen more often in the operating room and rarely in the laboratory. The next step is to interpolate the magnetic data to estimate its value at the optical sampling times (Figure 2.24).

Figure 2.24: Interpolating the magnetic data. The red magnetic data (line) is down-sampled to time-match the optical data, which is sampled at a lower frequency (~30Hz). The red dots indicate the down-sampled magnetic data. This is simulated data.

The Matlab function interp1 is used to down-sample the magnetic data. interp1 is a one-dimensional data interpolation function, here based on a cubic spline algorithm; it acts like a lookup table to find the wanted data. This creates a magnetic data stream that is time-matched with the optical data (Figure 2.25). Note that this process produces estimates only at times when optical data was available; if the optical sensor was occluded and a gap in the optical data stream resulted, there will be a corresponding gap in the down-sampled magnetic data stream.

Figure 2.25: Interpolated magnetic data. The line with open dots is the magnetic data evaluated at the optical sample times. There are also gaps in the optical data (top line, solid dots), as circled.

The third step is to create a difference curve by subtracting the down-sampled magnetic data from the optical data (Figure 2.26). The difference curve represents the error in the magnetic data estimate at the available optical sample times.

Figure 2.26: Difference curve. The light blue line is the difference curve. This represents the error in the magnetic estimate.

The fourth step is to interpolate the difference curve with the Matlab interp1 function to estimate the errors at the original magnetic sample times (Figure 2.27). At times when optical data is missing (e.g., at ~3.7-4 seconds), the error is estimated by interpolating the difference across the gap. If the magnetic and optical data were perfectly calibrated, registered and time-synchronized, then this difference curve would be constant. Any deviations from these assumptions will generally produce relatively low frequency and low magnitude deviations from this constant difference, so, in the absence of more specific information about how the difference curve varies in time, it is reasonable to simply bridge the gaps between optical fixes with a spline estimate. This process produces difference estimates not only across gaps in the optical data stream, but between sequential optical fixes as well.

Figure 2.27: Interpolated difference curve. The solid line is the interpolated difference curve.
Finally, we create the fused data stream by adding the interpolated difference curve back to the original magnetic data. This 'fusion' process treats the optical data as "fixes", but it also fills in any optical gaps and produces a high frequency, continuous data stream at the sampling rate of the magnetic sensor (Figure 2.28). This effective increase in sampling rate aids in extracting the performance measures of velocity, acceleration and jerk.

Figure 2.28: Fused data. The solid black line through the optical data is the fused position estimate. It is a high frequency, continuous dataset. Note how the data in the gaps is filled in based on the shape of the magnetic data, and how a simple point-to-point connection of the optical data points across the data gaps produces significantly erroneous results.

2.5.5 Error Analysis

To demonstrate the value of this data fusion technique, we compare the errors it produces to those of our previous optical interpolation technique using three sources of data: computer simulated data, laboratory collected data of typical surgical movements, and real OR data.

2.5.5.1 Analysis Method

The following technique was used for all three situations (computer simulated, laboratory collected, OR); a minimal sketch of the procedure is given at the end of this section:

1) Collect a complete (without occlusions) optical data set for approximately 1-3 minutes.
2) Create artificial gaps in the optical data ranging from 0-10 seconds; after each application of the analysis, the gap is advanced by 0.1s and the calculations repeated.
3) Compute (a) interpolated optical and (b) fused data sets.
4) Calculate RMS error across the gaps in both cases (a) and (b).
5) Compare (a) and (b).

The previously implemented optical interpolation algorithm simply used the GCV algorithm to fill in the optical gaps. McBeth (2002) originally chose the GCV parameters to be used for optical gap interpolation, and we used the same parameters.

2.5.5.2 Results of Error Analysis

Three sets of data were collected to conduct error analyses on: 1) computer generated data, 2) data simulating typical surgical movements collected in the lab, and 3) actual OR collected data.

2.5.5.2.1 Computer Generated Data

The simulated data is a sine wave with a frequency of 1Hz and an amplitude of 5mm. These values were chosen because we felt they were of a similar frequency and amplitude to surgeon movements in the OR. The optical data is sampled at 30Hz and the magnetic data at 120Hz, which are the same sampling frequencies used in the operating room experiments (Figure 2.29), and we add a 30mm offset to the magnetic data. This demonstrates the simplest form of our data fusion algorithm.

Figure 2.29: Computer-generated magnetic and optical data. The lower line is the optical data, while the upper line is the magnetic data. This plot only shows 1 second of data for better visualization of the sine curve.

The RMS error analysis after using the interpolated optical data, and then after using the data fusion technique, is shown in Figure 2.30 below. The RMS error analysis involved taking the interpolated optical or fused data and comparing it to the original; the differences between the original and the interpolated or fused values are used in the calculation of RMS error.
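As a concrete illustration of the analysis method above, the following is a minimal MATLAB sketch; the names are hypothetical, gcvInterp stands in for the lab's GCV-based optical interpolation, and fuseKinematics is the fusion sketch given earlier. For clarity, pOpt here is a single position component.

% Minimal sketch of the gap/RMS error analysis of section 2.5.5.1
% (hypothetical names). tOpt/pOpt is a gap-free optical set (one position
% component); tMag/pMag is the matching magnetic set.
gapSizes = 0.1:0.1:10;                 % artificial gap sizes (s)
tStart = tOpt(1) + 5;                  % hypothetical gap start time (s)
rmsInterp = zeros(size(gapSizes));
rmsFused = zeros(size(gapSizes));
for k = 1:numel(gapSizes)
    gap = tOpt > tStart & tOpt < tStart + gapSizes(k);   % samples to withhold
    pTrue = pOpt(gap);                                   % withheld ground truth
    % (a) interpolate the optical data across the artificial gap
    pInterp = gcvInterp(tOpt(~gap), pOpt(~gap), tOpt(gap));
    % (b) fuse the gapped optical data with the magnetic data, then sample
    % the fused estimate inside the gap
    pFused = fuseKinematics(tOpt(~gap), pOpt(~gap), tMag, pMag);
    pFusedGap = interp1(tMag, pFused, tOpt(gap));
    rmsInterp(k) = sqrt(mean((pInterp - pTrue).^2));
    rmsFused(k) = sqrt(mean((pFusedGap - pTrue).^2));
end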
The process was continued with larger gap sizes. The optical interpolation method has much larger RMS error values as the optical data gap size increases. The data fusion technique's error is so small that it is negligible, and is shown as zero on the plot. In reality (i.e., in the OR), the amplitudes of the magnetic and optical data would not match this precisely, so this performance is unlikely, but it illustrates the concept.

Figure 2.30: RMS error for computer-generated data.

2.5.5.2.2 Laboratory Data

Data was collected in the laboratory using movements similar to those that would be performed in the operating room. This gives us an idea of how our data fusion technique would work with typical data, but without having to set up and collect data during a live operation, and without having to deal with electrosurgery unit (ESU) effects or other general noise created by the OR environment. Almost 70 seconds of data was collected for error analysis (Figure 2.31).

Figure 2.31: Laboratory collected magnetic and optical data. The bottom figure is a closer view of a 6-7 second interval.

Again, we compared using only the interpolated optical technique against our data fusion technique. The results again show a large improvement in the RMS error. This reduction in error is significant, as the data fusion process filled in the gaps of the optical data, and was especially effective across the larger gap sizes (Figure 2.32).

Figure 2.32: RMS error for laboratory collected data.

We also note that the error of the fused data in the laboratory data is below 5mm RMS, even with a gap size of 10 seconds. At a gap size of 10 seconds, the fused RMS error is 3.9mm, while the interpolated error is 73.2mm. This demonstrates that the magnetic sensor is able to capture the high-frequency variations in the position signal across the gap, whereas the simple interpolation algorithm essentially assumes a simple spline shape across the gap, thereby producing significant error.

2.5.5.2.3 Operating Room Data

Operating room data was extracted from one of our experiments to see how well the data fusion technique worked with real OR data. Approximately 200 seconds of data was extracted for error analysis (Figure 2.33).

Figure 2.33: Real OR magnetic and optical data.

We see that the RMS error is again lower with our data fusion technique when compared to the interpolated optical data (Figure 2.34). The fused error at the 10 second gap size was 2.38mm, while the interpolated data error was 28.15mm. There is a large improvement in RMS error with use of the data fusion technique.

Figure 2.34: RMS error for OR data.

2.5.6 Discussion of Kinematics Data Fusion

Although, in our experience, we did not see gaps of over 10s in our real OR data, a check was done to see what happened to the RMS error at gap sizes of 20s and 30s. The RMS error with the fused data continued to increase slowly, while the RMS error of the interpolated optical signal also continued to rise, but with a much steeper slope. The optical error is related to the magnitude of tool excursion, and the magnetic/fused error is a function of the intrinsic accuracy of the magnetic sensor.
Therefore, by fusing the data from both sensors, we are able to come up with an accurate and high-frequency estimate of the tool's position. For typical OR movements in the range of 20mm, the kinematics fusion algorithm can give us a 10-fold decrease in RMS error. In the OR, we also see much finer positional movements (i.e., small millimetre movements) as compared to our other settings. And with large movement excursions (50mm), as seen in the laboratory data, we get an even larger reduction in error with the use of our data fusion algorithm. We specifically conducted our error analysis of the data fusion for translations, and this shows good results; a specific analysis was not done for rotational data.

2.5.7 Conclusions for Kinematics Data Fusion

For our experiments, the final objective was to obtain high frequency, continuous estimates of velocity, acceleration and jerk of the surgical tool tip. We recorded data from both the optical and magnetic systems and applied a novel data fusion technique. By using our data fusion technique, we have collected a more complete, high frequency, continuous data set. This is much improved over the past technique used in our lab of using only optical tracking systems. The data fusion technique also compensated for discrepancies in the registration of the data (as discussed in Kinnaird's thesis 2004). This newly proposed data fusion technique is simple and effective for the purpose of fusing two streams of similar position and orientation data. It allows for the combination of an optical and a magnetic tracking system, improving the error over interpolation of the optical data alone. Missing optical data, due to loss of sight of the markers, previously led to difficulties in calculating derivatives of the position data; by using the magnetic tracking system, which has uninterrupted data collection, and warping it to the optical data, we have solved our issues of missing optical data. The interpolation of the optical data was sufficient for the previous studies, but the data fusion technique we created produces vast improvements over the interpolation method. As with most data fusion applications, it is hard to pick one fusion technique that will be optimal for all situations, but for our situation this new method seems to work very well. For its simplicity and novelty, our data fusion technique meets our objective to create a high frequency data set.

2.6 Discussion and Recommendations

The purpose of this project was to create a system able to collect quantitative performance measures so we could objectively assess the construct and performance validity of both physical and virtual reality simulators. This involved modifying an existing system to be capable of collecting high frequency, continuous kinematics and force/torque data from a laparoscopic tool during live human surgery. One contribution was the development of a system able to collect continuous high-frequency kinematics and F/T data in a live human OR with minimal disturbance. The use of a combination of sensors, including optical and magnetic position sensors, an F/T transducer, and strain gauges, allowed us to achieve our goal. The fusion of the optical and magnetic tracking systems was a novel method to overcome the previous problems of marker occlusion with the optical system alone.
By joining these two position-tracking systems, we were able to retain the positive features of both sensors: the accuracy of the optical system, and the high frequency and continuity of the magnetic system. Below we offer recommendations for the continuation of this work, as well as steps that could be taken to improve the overall system.

2.6.1 Kinematics

The kinematics system has been in use in our lab for the past 5 years, and we have continuously improved it to gather reliable and accurate kinematics measures in the human OR. Because of the optical data gaps caused by occlusions of the markers in the OR, we chose to include a second position tracking system. The data streams of the two position tracking systems (optical and electromagnetic) are fused together to create a high-frequency, continuous data set. The new kinematics fusion technique demonstrates a large improvement over the previously implemented optical interpolation technique for filling in the optical gaps.

2.6.2 Force

The force measures are a new addition to our performance measure data collection system. We have created a system involving both a force sensor and strain gauges to analyze tissue interaction forces. This is the first system able to collect tool forces in the human OR during MIS. However, some revisions need to be made to derive more accurate force measures.

The grip calibration scheme should be revised. The grip compensation was more complex than originally thought (problems with surgical tool friction and force sensor saturation). The algorithm could not compensate for all forces, and may have led to misleadingly high force estimates. Rosen (1999) published work with raw force data similar to that found in our study; they were also unable to properly distinguish the grip force from the force sensor data. In theory, the idea works quite well, but we believe that with a few improvements to the overall system, better force estimates could be made. A redesign of the mounting bracket, and a force sensor that could be mounted directly onto the tool shaft, could help eliminate issues such as the movement of the bracket on the tool shaft.

Dr. Blake Hannaford (2004) of the University of Washington mentioned [personal communication] that he believes almost 80% of the forces sensed by the force transducer come from the trocar interaction with the surgical tool. We believe that lateral forces could contribute that amount, but axial forces at the trocar would not be as significant, as the surgical tool does not experience much axial friction going through the trocar. Dr. Hannaford's group has done extensive study in this area, as they were one of the first groups to mount a force sensor onto a surgical tool (Rosen 1999). This is a possibly significant contribution to the force measures, and could be taken into account. On the other hand, we have measured all tool interaction forces, regardless of source, and this in itself is an interesting measure. Laboratory studies could be conducted and models created to determine exactly how much force is created by the interaction between the surgical tool shaft and the trocar. Compensation algorithms could then take these trocar forces into account to reveal more accurate tool-tissue interaction forces.
2.6.3 Recommendations

The data collection and analysis system that we have developed for the quantitative assessment of surgical performance is a unique system, and is one of the first to be used for this kind of data collection in the human operating room. Because this system is a first attempt at such a difficult endeavour, there are some improvements that could be made for even easier data collection and analysis. The recommendations include:

• Sensor mounting bracket:
  o The location of each of the sensors currently changes the "weighting" of the normal surgical tool, causing unnatural rotation of the tool tip. The bracket seems to be "bottom heavy", and wants to swing into one position. Rearranging where each of the sensors is mounted may help with this issue.
  o Possible friction issues where the bracket meets the inner tool shaft cause "stickiness" in the movement of the tool handles.
  o There is flex and warping of the sensor mounting bracket; the material should be stiffer.
  o The current position of the magnetic receiver occludes the optical markers. A bracket redesign or repositioning of the receiver would help.
• Force/Torque sensor:
  o A physically smaller sensor would take up less space on the mounting bracket, and would also help with the "weighting" issue.
• Data acquisition:
  o One-button operation from one computer would alleviate some problems with space and time in the operating room.
  o A different software system, designed for data acquisition with multiple serial ports, could be used.

The experimental data collection and analysis system allowed the collection of data in the human operating room. With a few improvements, the system could be made easier for widespread use in larger studies.

Chapter 3
Experimental Methods for Assessing Validity of Laparoscopic Surgical Simulators

3.1 Introduction and Objectives

Historically, surgical education has been based on an apprenticeship style of training, where a senior surgeon would mentor the novice surgeons. It is now widely accepted in the surgical education field that this method of teaching is no longer adequate (Feldman 2004, Rosser 1998, Winckel 1994). Inspired by the flight simulation programs that were used successfully in pilot training, surgical trainers and simulators were created. The use of surgical simulators has come to the forefront of surgical education, and simulators have been incorporated into some surgeon training programs (Fried 2004, Wentink 2003), allowing for unlimited unsupervised surgical practice. But it has yet to be quantitatively shown how motor behaviour and patterns compare between the human operating room and surgical simulators, whether physical or virtual reality based. This study compares these motor behaviours in resident and expert surgeons using a custom-designed experimental system.

The primary objective is to study the performance, construct and concurrent validity of two types of surgical simulators, physical and virtual reality (VR), using surgical residents and experts as our subjects. Using our unique data collection and assessment system, we are able to quantitatively and objectively assess surgeon motor behaviours, and make comparisons to simulators using the same performance measures in all contexts.
Our objective is to use the custom-designed surgical tool and data collection system described in the last chapter to collect motor behaviour data from novice surgeons in the human OR, and compare this to analogous tasks in surgical simulators to study the performance validity of two surgical simulators. Also, if we can show that these simulators can distinguish between resident and expert surgeons, we can conclude that the simulators demonstrate construct validity.

For a diagrammatic representation of the goals in this project, refer to Figure 3.1, which relates the three settings (operating room, VR simulator, physical simulator) to the two skill levels (surgical resident, expert surgeon) through four comparisons:

1 - Performance validity (residents)
2 - Construct validity (experts vs. residents)
3 - Concurrent validity: based on performance measures
4 - Performance validity (experts): completed by Kinnaird (2004)

Figure 3.1: Diagram of goals for this project. Each of the comparisons demonstrates another type of validity. Goals 1-2 (in bold in the figure) are considered the main objectives. The dotted arrow represents data from the study by Kinnaird (2004).

In the balance of this chapter, we describe the experimental methods used to study our objectives. We describe the subjects, the settings, and the equipment used to collect and analyse the data. The methods of post-processing the collected data are also covered, as well as our context comparisons to study performance, construct and concurrent validity.

3.2 Subjects and Settings

Three University of British Columbia PGY-4 (post-graduate year 4) surgical residents were assessed in three different settings (human operating room, virtual reality simulator, physical simulator) over a period of 5 months (March - July 2004). The residents consisted of 2 males and 1 female, all under the age of 35 and right-hand dominant. The surgical residents signed consent forms approved by the institutional review board to participate in our study. Two expert surgeons' data corresponding to the resident data was analyzed by Catherine Kinnaird (2004) in a recent study and shared with this author. The expert surgeon data was collected by both this author and Kinnaird.

3.2.1 Settings

All of the subjects were evaluated in three settings: OR, VR simulator, and physical simulator.

3.2.1.1 Operating Room

The three surgical residents each performed a laparoscopic cholecystectomy under the direct supervision of an expert surgeon. Data was collected using the custom-designed data collection system with the experimental tool (as described in Chapter 2) during each of these procedures at the University of British Columbia Hospital between March and April 2004. The University of British Columbia (UBC) ethical review board gave approval for this data collection. Equipment was sterilized as appropriate by ethylene oxide, and approved for use in the OR by the UBC Biomedical Engineering department. For the OR experiments, no prior selection of patients or staff was made. All patients were required to have signed an informed consent form for the data collection prior to their procedure.

3.2.1.2 Virtual Reality Simulator

The virtual reality (VR) simulator used in our experiments consists of the Reachin™ Laparoscopic Training Package (Stockholm, Sweden) haptic feedback software and the Immersion® (San Jose, CA, USA) hardware systems. The Immersion surgical station has two laparoscopic tools with interchangeable handles.
These tools are similar to real laparoscopic tools in that they have four haptic degrees of freedom and a rotating tool tip. The hardware and software systems are complementary, and are combined to make our force-feedback VR laparoscopic simulator. Generally, novices progress logically through increasingly difficult skill levels in training. The training package does not simulate an entire procedure, but the smaller tasks involved. There are tasks specific to the laparoscopic cholecystectomy procedure, such as camera placement, clip and cut, and dissection.

The specific module used in our experiments was the cystic duct dissection task of a laparoscopic cholecystectomy. The subject was to bimanually dissect away the surrounding fat and tissue from the cystic duct to expose it fully. The Maryland dissector is used in the right hand, and the surgeon's choice of grasper in the left hand. This is the typical surgical tool arrangement in the operating room.

3.2.1.3 Physical Simulator

The physical simulator used was a newly developed mandarin orange dissection. A literature search did not find any other laparoscopic surgical simulator that uses a mandarin orange.
Existing commercial physical simulators were not chosen, as none readily represented the analogous dissections found in the OR or the VR simulator. The orange was chosen in consultation with expert surgeons, and met our requirement of a non-meat material, as the same tool had to be used in the human OR (safety requirements do not allow surgical tools that have been used on any animal to be used in the human OR). The removal of orange segments also represented the OR dissection most closely. The surgeons believed that this simulation was similar enough to real dissection tasks in terms of required movements and forces. The data from this simulator was collected by our custom-designed system as described in Chapter 2.

The subject used the instrumented Maryland dissector in their right hand, and was free to choose a standard tool for their left hand. The left-hand tool was usually some type of grasper, as similarly used in the OR. Generally, for right-handed surgeons, the active tool is in the right hand, while the left hand is used more often for grasping and holding. The laparoscopic camera in these experiments was handled by one of the researchers, with the subject directing where to move and view. Using a standard laparoscopic set-up (laparoscopic tower and camera) in a standard box trainer, the subject was asked to remove the peel and dissect out several segments of the orange using the experimental tool. They were specifically told to be cautious and to do as little damage as possible to the surrounding orange segments. As an indicator of the face validity of this task, the head of surgical training at the University of British Columbia has decided to include this mandarin orange physical simulation in the surgical education program.

3.3 Performance Measures

The fundamental data available from our various systems include position and force data. The optoelectronic and electromagnetic systems provide us with 3D position and orientation data of the surgical tool. The force sensor and strain gauges give us force information. From this collected data, we then extracted a set of kinematics and force measures as described in the following sections.

The performance measures available from our collection system for the OR and the physical simulator are similar to those from the VR simulator (Table 3.1). Our data collection system was designed to allow for a broad range of measures to be taken. The VR simulator has built-in software that also gives many measures, but has limitations in the roll and tool tip force data, as mentioned previously. We selected a variety of performance measures, and in the end decided to study a total of 26 measures to get a thorough understanding of the surgeon's motor behaviour. (The VR simulator gives us a total of 17 performance measures.) The motions are all described in a reference frame at the surgical tool tip, as seen in Figure 3.2.

3.3.1 Kinematics

3D kinematics data from the simulators and the OR are a performance measure studied in this project. The position data was differentiated to generate velocity, acceleration, and jerk data. This data can then be used to make comparisons between the three settings (OR, VR simulator, physical simulator). Specifically, the following kinematics performance measures were analyzed: velocity, acceleration, and jerk, in the axial, grasp, translate, transverse, absolute and roll tool tip directions. The VR simulator is limited in the tool tip roll direction and the force data, as mentioned previously.
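As a sketch of how these derivative measures can be obtained from the fused position data, the following Matlab fragment smooths the positions and differentiates them repeatedly. The 120 Hz rate, the Butterworth low-pass (Signal Processing Toolbox) with a 6 Hz cut-off, and the N x 3 array name pos are illustrative assumptions, not the exact parameters of our pipeline.

    fs = 120;  dt = 1/fs;                       % assumed common sampling rate (Hz)
    [bLP, aLP] = butter(2, 6/(fs/2));           % illustrative 6 Hz low-pass filter
    p = filtfilt(bLP, aLP, pos);                % pos: N x 3 fused tip positions (zero-phase smoothing)
    v = zeros(size(p));  acc = v;  jrk = v;
    for k = 1:3
        v(:,k)   = gradient(p(:,k),   dt);      % velocity component
        acc(:,k) = gradient(v(:,k),   dt);      % acceleration component
        jrk(:,k) = gradient(acc(:,k), dt);      % jerk component
    end
    vAbs   = sqrt(sum(v.^2, 2));                % absolute (resultant) velocity
    vTrans = sqrt(sum(v(:,1:2).^2, 2));         % transverse velocity (x-y plane)

Smoothing before each differentiation matters because differentiation amplifies high-frequency noise; by the third derivative (jerk), unfiltered sensor noise would dominate the signal.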
3.3.2 Forces

Force data from the simulators and the OR were compared using the post-processed OR force data and the force data obtained from the simulators. Individual force components (axial, grasp, translate) and the transverse and absolute measures were analyzed. These components were calculated for the OR and physical simulator, and were available directly from the VR simulator software for the VR data. Force data in the direction around the surgical tool axis (roll) was analyzed for the physical simulator and OR datasets, but was not available from the VR simulator software.

Table 3.1: Performance measures available from the three contexts. All measures are available in the physical simulator and OR contexts as data was collected with the experimental tool. The VR simulator is limited in roll and tool tip forces. Future software upgrades will allow for these measurements. See Figure 3.2 for tool tip directions. (Modified from Kinnaird 2004)

                              Operating Room   VR Simulator   Physical Simulator
  Tip Distance from Mean:
    Absolute                        X               X                X
    Roll                            X                                X
  Tip Velocity:
    x, y, z                         X               X                X
    Transverse                      X               X                X
    Absolute                        X               X                X
    Roll                            X                                X
  Tip Acc'n:
    x, y, z                         X               X                X
    Transverse                      X               X                X
    Absolute                        X               X                X
    Roll                            X                                X
  Tip Jerk:
    x, y, z                         X               X                X
    Transverse                      X               X                X
    Absolute                        X               X                X
    Roll                            X                                X
  Tip Force:
    x, y, z                         X                                X
    Transverse                      X                                X
    Absolute                        X               X                X
    Roll                            X                                X

Figure 3.2: Tool tip reference frame (x - translate, y - grasp, z - axial, with roll about the tool axis). The performance measures are taken relative to the surgical tool tip reference frame.

3.4 Equipment Used

Each of the components of the sensor equipment used to collect the data was previously described in Chapter 2, section 2.3. To recap, for the OR and the physical simulator, we used an instrumented tool that incorporated optoelectronic and electromagnetic sensors to track position, while the force data was collected by a force sensor and strain gauge system. For the VR simulator, we used the data collected by the Reachin system. The remaining systems and their integration are described in the following sections.

3.4.1 Video Data

In addition to the sensors in the operating room, we also recorded videos of the surgery. Both the internal abdominal laparoscope camera view and an external video camcorder focused on the surgeon were recorded. The two videos were time-stamped and recorded onto standard VHS tapes using video-editing equipment. From these time-stamped videos, correlations in time can be made with the collected kinematics and F/T data for analysis, enabling us to identify the start and end points of the targeted tasks. The laparoscope video aided in the segmenting of the data, and start-stop points could be picked out for data segmentation, as discussed in section 3.8.2.1. The external camcorder video allowed us to synchronize the data streams and to determine the characteristic synchronization movements, as described in section 3.6.3.

3.4.2 System Component Integration

There are many components to this system that needed to be integrated to create a user-friendly system. These included all the sensors (F/T, optical, magnetic, strain gauge), the video (laparoscopic and camcorder), and the computers to run these sensors (Figure 3.3).
All the systems except the electromagnetic tracking system (Fastrak) were connected to a standard desktop computer (2.4 GHz AMD Duron processor) with custom-designed data acquisition software written in Matlab (The Mathworks, Massachusetts, USA). Because of difficulties using multiple serial ports in one Matlab program, the Fastrak was connected to a separate laptop computer (minimum 800 MHz), and the FTGUI software supplied with the Fastrak tracking system was used to collect its data.

Figure 3.3: Components of the performance measurement system.

The standard laparoscopic surgical equipment was used in each OR procedure. The laparoscope system consisted of a standard 10 mm, 0° surgical laparoscope, camera and illuminator (Stryker Endoscopy). All equipment used for this study was approved by the Biomedical Engineering Department at the University of British Columbia Hospital, and was sterilized where appropriate with ethylene oxide.

3.4.3 Data Acquisition Software

Because of the variety of sensors used for data collection, various types of software were needed. Matlab was used as the primary data collection software because of its availability and usefulness in data collection and analysis. The optical data was collected at 30 Hz via an RS-232 serial port interface using existing custom-designed software implemented by McBeth in a previous study (2002). The graphical user interface (GUI) allowed the user to see when the optical markers were visible or occluded, which allowed for better placement of the optical camera prior to the OR data collection. Magnetic data was collected using the manufacturer-supplied (Polhemus) data collection software (FTGUI) on a laptop computer. This data was collected at 120 Hz through an RS-232 serial port interface. The analog signal data from the strain gauges was gathered and converted to a digital signal using a Measurement Computing PCI data acquisition board. This board is supported by the Matlab Data Acquisition Toolbox and allowed for streaming strain gauge data at 120 Hz. Custom-designed Windows operating system drivers and Matlab functions previously created by Willem Atsma, a PhD student in our lab, were used to collect the ATI force/torque sensor data from the ISA data acquisition board that comes with the F/T sensor (Atsma 2001). Streaming forces and torques could be collected at 120 Hz. Modifications were made to the original optical tracking software by McBeth to allow for data collection of the optical, F/T, and strain gauge data all within the same GUI (Figure 3.4).

Figure 3.4: Custom-designed data acquisition software. The Polaris (optical tracking) GUI (original version by McBeth 2002) gathers streaming optical, strain gauge and force/torque sensor data with one button.

3.5 Data Collection

Study data was collected in the operating room and from both the virtual reality and physical simulators.

3.5.1 Operating Room Study

Each of the surgical residents performed a laparoscopic cholecystectomy at the University of British Columbia Hospital with an expert surgeon supervising. There were two researchers in the operating room (OR) for each experiment. One researcher scrubbed into the surgery to prepare the modified laparoscopic tool for use in vivo.
This required cutting out and attaching a small, thin section of OpSite™ surgical dressing to be used as a liquid barrier on the force/torque sensor. It was important to seal all crevices in the sensor so that no moisture could seep in. Also, a small piece of Mepore™ was wrapped around the surgical tool handle where the strain gauges were mounted. This prevented the surgeon's fingers from getting caught on any edges or the strain gauge wiring, and kept the area clean and free of foreign substances. The second researcher would help with the set-up of the video camera and Polaris optical camera system, and then operate the computers and required software. The scrubbed-in researcher would also pass off the sensor wires from the surgical tool to the other researcher to be connected to the various computers and systems. If at any time the surgeon felt uncomfortable using the modified surgical tool, they could switch to a traditional non-modified surgical tool.

Immediately postoperatively, calibration "poses" were needed with the modified tool. This calibration was used to synchronize and register the various streams of data. Also, a gravity vector was established with the tool in a neutral horizontal position to allow us to remove the gravity effects from the raw F/T data. The scrubbed-in researcher would hold and manipulate the tool in the required positions as data was collected. A more detailed explanation of the operating room protocol can be found in Appendix A.

3.5.2 Simulator Data Collection

Both surgical simulators were located in the Center of Excellence for Surgical Education and Innovation (CESEI) at Vancouver General Hospital (VGH). Each surgical resident came in on separate days and completed the data collection on one of the two simulators on each visit. The VR simulator data was collected first, followed by the physical simulator at a later date. The VR simulator data was collected three times in one session, taking approximately 20 minutes for all three trials. The physical simulator data was collected once, taking approximately 15 minutes to complete. The number of trials on each simulator was limited because the surgical residents had no more time available to come in.

The resident was asked to stand in a natural and comfortable position centred in front of the simulator, and to treat the simulation as an operative procedure. Before the start of each simulator data collection session, the surgeon was allowed a short familiarization and training session (~10 minutes) on each simulator. This allowed the surgeon to become comfortable with the individual simulators and with the goals of the task, but did not allow for extensive practice or training. Each resident completed the required task as we collected kinematics and force data with either our system (physical) or built-in software (VR). Post-processing of the raw VR data was done with software designed by Iman Brouwer (2004), and produced continuous streams of kinematics and force data. This formatted VR data was similar to that gathered with our intraoperative system and allows for similar performance measure extraction, except that the VR data was limited in the force measures and roll in the tool tip direction.
As mentioned earlier, the x, y, and z direction force measures are available in the simulator-defined "world" frame coordinates, but due to software complications, the proper transformation matrix was not saved, and we could not transform the data to our tool tip reference frame. Therefore, we could only use the absolute force measure, and not the components. Roll torque is not available in the VR simulator.

3.6 Data Post-Processing

After data was collected in the operating room with the experimental surgical tool, many steps had to be taken to format the raw data into a usable form. See Figure 3.5 for a diagram of the post-processing steps: the optical and magnetic data are synchronized, cleaned of cautery effects, fused, differentiated and transformed to the tool tip; the F/T and strain gauge data are cleaned of cautery effects, corrected for gravity and grip effects, and transformed to the tool tip, yielding the performance measures.

Figure 3.5: Data post-processing.

3.6.1 Kinematics Data Registration and Calibration

We need to be able to represent the positional data gathered from both the optical and magnetic sensors in the same location in 3D space. Specifically, we want to know where the surgical tool tip is with respect to a world frame. As described in Chapter 2, section 2.3.1, each of the position sensors tracks a 3D position using its own tracking method. The Polaris optical tracking system can track five geometrically unique faces, each defined by three passively reflecting marker spheres custom-mounted on the array halo. Each of these faces represents a 3D frame in space, and the tracking camera tracks one of their locations in space at a time, with respect to the camera reference frame. The Polhemus magnetic tracking system receiver also has its own representation as a reference frame, and its location is tracked with respect to the transmitter reference frame.

As discussed earlier, we would like to fuse the two data streams from the optical and magnetic sensors, but to do this properly, they need to represent the same locations in space. When using the experimental surgical tool, we are tracking the location of the surgical tool tip frame with respect to an anatomical body frame. A thorough discussion of the data registration between the optical camera and magnetic transmitter reference frames can be found in the thesis of Catherine Kinnaird (2004). Detailed information on component and OR system calibration and reference frame registration is also addressed in that work.

3.6.2 Force/Torque Data Registration and Calibration

This section briefly outlines the registration procedures for the 3D force/torque data. This is necessary to produce force and torque measurements referenced to the surgical tool tip. The strain gauge data is used for estimating grip force and removing it from the tip force estimates, as previously described in Chapter 2, section 2.4.

3.6.2.1 Force/Torque Data Registration

After the raw F/T data was adjusted to account for gravity effects and grip forces, the F/T data was transformed to the surgical tool tip reference frame. The tool tip frame was established using optical and magnetic point probes and a calibration rig. The tip frame created here was established to be the same as in the kinematics registration. Further details can be found in Catherine Kinnaird's thesis (Kinnaird 2004).
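A compact Matlab sketch of these two adjustments (gravity removal, then transformation to the tip frame) is given below. The per-sample sensor-to-world rotations Rall, the calibrated tool mass m and centroid offset r, and the registered sensor-to-tip rotation R_ts with tip-to-sensor offset p_ts are all assumed placeholder names; the real pipeline also includes the grip-force removal of Chapter 2.

    g = [0; 0; -9.81];                            % gravity in the world frame (m/s^2)
    n = size(Fraw, 1);                            % Fraw, Traw: n x 3 raw F/T samples
    Ftip = zeros(n, 3);  Ttip = zeros(n, 3);
    for i = 1:n
        R  = Rall(:,:,i);                         % sensor-to-world rotation at sample i
        Fg = R' * (m * g);                        % tool weight expressed in the sensor frame
        F  = Fraw(i,:)' - Fg;                     % gravity-compensated force
        T  = Traw(i,:)' - cross(r, Fg);           % gravity-compensated torque
                                                  % (r: sensor origin to mass centroid)
        Ftip(i,:) = (R_ts * F)';                  % rotate force into the tip frame
        Ttip(i,:) = (R_ts * T + cross(p_ts, R_ts * F))';  % transport torque to the tip origin
    end

The torque line uses the standard rigid-body transport rule: the torque about a new reference point equals the torque about the old point plus the cross product of the offset between the points with the force.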
3.6.3 Raw Data Synchronization

Because of the variety of sensors and computers used in this data collection procedure, the data streams were not initially synchronized, and steps had to be taken to ensure that we could align the data streams in time to extract time-matched data. We therefore designed algorithms to allow us to synchronize the various sensors (Figure 3.6). As described in the operating room protocol (Appendix A), a large characteristic movement was made by the surgeon at the end of the surgery, which enabled us to find corresponding times in the position datasets. This characteristic move was much larger than anything seen during typical surgical movements. Also as part of the protocol, the surgical tool was held in a horizontal stationary position before and after the characteristic large move to further differentiate this synching movement from the surgeon's regular tool movements. The now-synched position data was synched to the ATI force data by a "hit" against a surface. Lastly, the synched position and force data was synched to the strain gauge data by a large "squeeze" of the tool handles.

Figure 3.6: Data synchronization process. The position sensors are synched first by visual inspection of the data for the large characteristic move. This is then synched to the F/T data by looking for the large "hit". This synched data is then time-synched with the strain gauge data by the large squeeze that is seen in both the strain and force data, and all data is now synchronized in time.

To synch the optical and magnetic kinematics data, a visual inspection of the position data during the large characteristic move is done. A small segment of time is chosen from both positional datasets, and an optimization routine is executed. From this small window in time, an initial guess of Δt is made and input into the algorithm. The true Δt is then calculated using a non-linear least squares optimization routine from the Matlab Optimization Toolbox (a minimal sketch follows at the end of this section). Further details can be found in Catherine Kinnaird's thesis (2004).

The synched kinematics data is then synched to the F/T data, again by visual inspection within a chosen window of time. The large characteristic move also includes a "hit", as described earlier, which is recorded by the sensors. This "hit" is larger than any typical force in surgery, so the F/T data can be synched with the kinematics data. Finally, the strain data was synched to the previously synched kinematics and F/T data. The characteristic move includes a large squeeze of the tool handles, which comes after the big "hit" and is usually larger than any squeeze a surgeon would perform.
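For the optical-magnetic offset, the optimization amounts to finding the shift Δt that best aligns the two position streams over the chosen window. A minimal sketch using lsqnonlin from the Matlab Optimization Toolbox is shown below; the window vectors (tOpt, pOpt and tMag, pMag, holding sample times and one position coordinate) and the visually chosen initial guess dt0 are assumed names.

    % Residual: magnetic stream shifted by dt, resampled at the optical times
    resid = @(dt) interp1(tMag + dt, pMag, tOpt, 'linear', 'extrap') - pOpt;
    dtHat = lsqnonlin(resid, dt0);        % non-linear least squares estimate of the offset

Because the characteristic move is much larger than ordinary tool motion, the cost surface around dt0 is well conditioned and a local least-squares search converges reliably.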
The external camcorder video and the internal laparoscopic video also needed to be time-synched with the collected data. These two videos are collected separately, and could be synchronized because the surgeon can be seen inserting the laparoscopic camera into the trocar in both videos. Our external camcorder was focused on the main surgeon, and all their movements were recorded. The internal laparoscopic camera also recorded continuously. Once this insertion movement was identified, video-editing equipment could be used to time-stamp both the internal laparoscopic video and the external camcorder video. These videos then had to be synchronized with the collected sensor data. By visually inspecting the magnetic positional data, we could see when the surgeon removed the experimental tool and laid it down; for example, when the surgeon removes the experimental Maryland dissector to use the clipping tool. These times could be seen both in the external camcorder video and in the magnetic positional data stream, and this information could be used to synchronize them together.

3.7 Electrosurgery Unit

An electrosurgery unit (ESU) is a common and typical piece of equipment in today's modern OR. It allows the surgeon to cut through tissues while coagulating any blood vessels at the same time. This is beneficial for both the patient and the surgeon: the patient loses less blood when cuts are made this way, and the surgeon is able to operate in an almost blood-free environment.

The ESU delivers radio frequency (RF) currents, which allow the surgeon to cut, cauterize or coagulate live human tissues. An electrical wire from the ESU is attached to the surgical tool through the port. The electrical current passes down through this connection, down the innermost shaft of the surgical tool, and through to the desired tissues via the surgical tool tip. The monopolar type of ESU was used in these experiments. A monopolar ESU requires that the electrical current pass from the active electrode through the body, and exit through a passive electrode attached pre-operatively to the patient's body.

There are a number of settings that can be chosen on a typical ESU, with two general modes: cut and coagulate. In these applications, the ESU cut mode is capable of a 400 kHz (1200 V) output, and in the coagulation mode, 250 kHz (3500 V) in 40 kHz bursts is available for the surgeon to coagulate tissue. In the OR, a surgeon will usually request the "blend" setting, which is a combination of the cut and coagulation settings. This allows for cutting through the tissues while coagulating any blood vessels along the way.

3.7.1 ESU Effects

Because of the variety of instrumentation and sensors attached to the experimental tool, and because we had cut the original tool to add our modifications, we needed to ensure that our sensors would not be damaged and that the current would pass uninterrupted through the tool. Preliminary tests with a typical OR ESU, borrowed from Vancouver General Hospital Biomedical Engineering, were completed to see the effects on all the sensors. The first tests were completed to determine whether the use of cautery would damage the sensors, as we were unsure if the electrical current was strong enough to cause permanent damage. We incrementally increased the voltage and current output of the ESU to maximum, recorded the data, and checked the sensors' operation. We found that no sensors would be damaged, but the readings from the strain gauges, the magnetic position system, and the F/T sensor would all be affected by a significant amount of noise while using the coagulation and blend modes. The cut setting did not have any noticeable effect on the data.
3.7.1.1 Removal of ESU Effects

The magnetic, strain gauge and F/T data are all adversely affected by the ESU. Although the amount and degree of noise differs for each sensor, the basic approach to removing the noise and extracting the proper data is very similar for each, with small modifications and adjustments made for each sensor. According to our experimental OR data, cautery may be applied for as long as 15 seconds at a time. The effect of ESU activity on strain gauge data is shown in Figure 3.7.

Figure 3.7: Strain gauge data with electrocautery noise.

It is obvious that when the ESU is applied, the strain gauge data is completely distorted, and we felt (after some experimentation) that no amount of filtering would produce a useful signal. We decided to simply remove these noisy sections from our data. The data removal algorithm is based on looking at small increments of time (~1/10 s) and comparing each to the small time segment before it. If the difference between these windows is beyond a given threshold (chosen by comparing known good data with noisy data), the noisy data is removed (Figure 3.8); a sketch of this windowed procedure is given at the end of this section. The threshold values varied depending on the data, and were chosen by examining the data surrounding the noisy section. If a large amount of data was affected in a block, then the whole block was removed manually. This ensured that minimal data would be removed. We were always careful during data removal, and generally under-removed data as opposed to over-removing it.

Figure 3.8: Raw and noise-removed strain gauge data. Large and small sections of noise are removed.

The magnetic position tracker is also affected by the ESU (Figure 3.9), but a built-in feature of the Polhemus FTGUI software is its ability to track when errors occur and make a record of these errors. This is seen in the raw data output, and allows for an easier preliminary data removal in the magnetic stream, as the erroneous data can be removed directly. Any remaining noise artefacts can be removed manually or by using the filtering technique of monitoring sudden noisy changes, as described above.

Figure 3.9: Electrocautery-affected magnetic data.

Lastly, the F/T dataset was also affected by the use of the ESU. As can be seen in Figure 3.10, it is quite obvious when the cautery current is used, as the data suddenly changes, showing large spikes. Data removal was done manually or by using the sudden-change filtering method similar to that used for the other sensors.

Figure 3.10: Electrocautery-affected F/T data (coagulation power = 75 W).
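One plausible Matlab form of the windowed change detection described in section 3.7.1.1 is sketched below. The window length, the threshold rule, and the names (x for one raw sensor stream, fs for its sampling rate, thr for a threshold chosen from clean data) are illustrative assumptions rather than our exact implementation.

    w   = round(0.1 * fs);                       % ~1/10 s window, in samples
    nw  = floor(numel(x) / w);                   % number of whole windows
    mw  = mean(reshape(x(1:nw*w), w, nw), 1);    % mean of each window
    bad = [false, abs(diff(mw)) > thr];          % windows that jump from the previous one
    for k = find(bad)
        x((k-1)*w+1 : k*w) = NaN;                % flag the noisy window
    end
    x = x(~isnan(x));                            % excise the flagged samples

Comparing window statistics rather than individual samples makes the detector robust to isolated outliers, while still reacting within roughly a tenth of a second of cautery onset.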
3.8 Task Comparisons

All surgical procedures are highly variable, and we need a method of making comparisons between these types of procedures. We also need a method to compare these OR measurements to the measurements from both the VR and physical simulators to be able to assess the validity of the simulators and our performance measures. In our OR studies, data was gathered from laparoscopic cholecystectomies (gallbladder removals) without prior selection of patients or operating room staff, thereby increasing the variability between procedures. In contrast, the VR simulator provides a very structured, rigid and repetitive environment for teaching and evaluating surgical skills. The tasks are broken down and each can be practiced separately (i.e., clipping task, dissection task, etc.). The physical simulator is also a relatively repetitive and repeatable environment. But to compare these three settings, we need to be able to extract similar data from each context.

3.8.1 The Dissection Stage

The experimental tool, chosen in consultation with expert surgeons, was the Maryland dissector (or grasper). This tool was selected as it is used extensively throughout the first portion of most laparoscopic cholecystectomies, and most surgeons are comfortable and familiar with its use. We selected a commonly completed stage in the operative procedure to demonstrate our approach to performance evaluation: the dissection stage, a key component of the laparoscopic cholecystectomy procedure. The dissection stage of the laparoscopic cholecystectomy involves the removal of extraneous tissues and fat surrounding the gallbladder and the vessels (cystic artery and cystic duct), and the isolation of these vessels for clipping and cutting. Both the VR and physical simulators have analogous tasks that can be compared to this dissection stage in the actual human operation. The VR simulator has a cystic duct dissection simulation where the surgeon must dissect away the fat surrounding the gallbladder, cystic duct and artery (Figure 3.11). The physical simulation is the dissection of a mandarin orange using the hybrid experimental tool and data collection system.

Figure 3.11: VR simulator vs. physical simulator vs. OR. From left to right: VR simulation of cystic duct dissection; the physical simulator (an orange); the actual OR dissection task.

3.8.2 Data Segmentation

In a typical OR experiment, we would collect approximately 10-15 minutes of OR data from all our sensors, leading to very large raw datasets. In order to better manage these large amounts of data, and to break down the procedure according to the hierarchical decomposition, data segmentation was used. The post-processed raw data could be segmented with the help of the time-stamped internal laparoscopic video and the external camcorder video. We were specifically concerned with the dissection task of the procedure. After data segmentation, performance measures could be extracted and then compared with analogous tasks in the VR and physical simulators.

3.8.2.1 Data Segmenting

The internal laparoscopic synchronized and time-stamped OR video was used to create a start and end point for each dissection task. Each start point was taken as the moment in time when the experimental surgical tool first contacts the tissues. The end point was when the tool was removed from the tissue and taken out of the trocar. These start and end points were then used to segment out the dissection task from both the formatted kinematics and F/T data (a minimal indexing sketch follows at the end of this section).

For each procedure, the dissection task of a typical laparoscopic cholecystectomy is of interest. This dissection task is decomposed further into segments to allow for easier data manipulation. These segments are identified by visual observation of the kinematics, F/T and video data to see when the surgical tool tip is in contact with the tissues and is being actively used. Generally, the first segment, when the surgeon has first entered the surgical tool into the abdomen, is exploration and anatomy identification (i.e., cystic duct, cystic artery, surrounding vessels). The cystic artery and cystic duct are then identified, separated, clipped and cut. Our experimental tool is used extensively for these portions of the procedure. Once the cystic duct is cut, the surgeon tends to use a hook or spatula tool, rather than the experimental tool, to dissect and remove the gallbladder from the liver. The times and parts of each procedure where the surgeon uses the experimental tool differ between surgeons, and can vary between procedures. Usually, the most variable portions of the procedure are isolating Calot's triangle and dissecting the gallbladder. This was usually due to a chronically inflamed gallbladder resulting from a patient waiting a long time before having the surgery, which sometimes led to longer operating times.

For the VR and physical simulators, no data segmenting was done, as the entire task was considered to be dissection. We considered all three contexts to be analogous in that we were able to separate out the dissection tasks in the OR data, and the two simulators only included dissection task data.
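Once start and end times have been read off the time-stamped video, the segmentation itself reduces to simple indexing on the common synchronized timebase. A minimal sketch with assumed names (t for the synchronized sample times, kin and ft for the formatted data arrays):

    idx    = t >= tStart & t <= tEnd;   % samples inside the dissection interval
    kinSeg = kin(idx, :);               % segmented kinematics samples
    ftSeg  = ft(idx, :);                % segmented force/torque samples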
3.9 Setting Comparisons

After data was collected and formatted from the three settings (OR, VR and physical simulator), comparisons could be made between these contexts. We wanted to examine intersubject, intrasubject and context differences and similarities. The novice surgeon data also needed to be compared to the expert surgeon data previously collected and analysed (Kinnaird 2004). The raw data consisted of time histories of displacement, velocity, jerk and force acquired over intervals of up to approximately 20 minutes. In short, a large amount of data was collected from somewhat different, unstructured contexts (with variability in performance), so we cannot make detailed point-by-point comparisons. To assess differences, we chose a statistic that is sensitive to any differences between the cumulative probability distributions (CPDs) of the performance measures. The Kolmogorov-Smirnov (KS) statistic was used in previous studies in our lab, and we have decided to continue with its use.

3.9.1 Kolmogorov-Smirnov Statistic

The Kolmogorov-Smirnov (KS) statistic is a parameter-free measure of the difference between two CPDs. It requires the data from two cumulative probability distributions (CPDs) of performance measures such as velocity, force, etc., and it is the maximum absolute vertical difference (D) between the two CPDs (Figure 3.12). The D-value ranges from 0 (similar) to 1 (different). Another advantage of the KS statistic is that it makes no a priori assumptions about the shape of the CPDs (i.e., they do not have to be Gaussian), and it is relatively insensitive to outliers in the data (Hodgson 2002). This characteristic is especially important as our data does include many outliers, even after filtering. For these reasons, the KS statistic has been found to be a valuable tool in evaluating behaviours in different environments (Boer 1996, McBeth 2001), so we have chosen to use it for all our contextual comparisons.

Figure 3.12: Kolmogorov-Smirnov CPD. Comparison of cumulative probability distributions to find the D-value of the KS statistic.
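Concretely, the D-value for any pair of performance-measure samples can be computed in a few lines of Matlab. In this sketch, x1 and x2 are placeholder names for two samples; both empirical CPDs are evaluated on the pooled sample and the largest vertical gap is taken, which matches the statistic returned by kstest2 in the Statistics Toolbox.

    xs = sort([x1(:); x2(:)]);                   % pooled evaluation points
    F1 = arrayfun(@(v) mean(x1 <= v), xs);       % empirical CPD of sample 1
    F2 = arrayfun(@(v) mean(x2 <= v), xs);       % empirical CPD of sample 2
    D  = max(abs(F1 - F2));                      % KS D-value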
3.9.2 Comparisons

In order to better understand the usefulness of the KS statistic in our application, a brief description of the comparisons done is needed. (A more thorough description of these comparisons can be found in Chapter 4, section 4.2.1.) We have collected data from 5 subjects in 3 settings (OR, VR and physical simulator). The 5 subjects are divided into 2 levels: resident and expert. We have made intrasubject, intersubject, interlevel, and intersetting comparisons. For example, if we wanted to compare one resident to another on the performance measure of force, we would create the CPD of force for each resident, and then find the KS D-value. The D-value gives us the difference between these two subjects: the larger the D-value, the larger the difference between subjects.

3.9.3 Assigning Confidence Intervals

When we express a difference between two CPDs, we must also compute a corresponding confidence interval (CI) on the difference measure. Although difficult to do analytically, it is comparatively straightforward to compute using bootstrapping methods. Bootstrapping is a computationally intensive method that uses the sample actually obtained as an estimate of the underlying distribution, randomly resampling the dataset and re-computing the KS statistic at each bootstrapping cycle; this gives us a measure of the accuracy of the D-value by assigning a confidence interval to it. Bootstrapping tries to recreate the relationship between "population" and "sample" by assuming that the sample available (i.e., the OR and simulator data) is representative of the underlying population (Efron 1986). The bootstrap estimates are used to give an estimate of the measurement error on the D-value calculated for each specific case.

3.9.4 Dependent Data and the Moving Block Bootstrap

Simple bootstrapping methods are based on the assumption that the data are completely independent of each other. In our case, however, the value of each data point is highly correlated with its neighbours (x_{j-1}, x_{j-2}, ..., x_{j-m}, where m is unknown) because they come from a continuous stream of data. This temporal correlation implies that there are effectively fewer independent data points, so applying the standard bootstrap method will result in unrealistically tight confidence intervals. The general bootstrap method is more complex with time-correlated data, but the basic ideas remain. The moving block bootstrap (MBB) method was chosen as this technique accounts for dependent datasets (Kunsch 1991, Liu 1992). The MBB resamples blocks of consecutive data at a time, as opposed to resampling a single observation as is done in the standard technique. This means the dependent structure of the original data is retained within each resampled block.

The MBB is applied to our dependent data as shown in Figure 3.13. The original dataset X_n = {X_1, ..., X_n} is partitioned into N = n - l + 1 overlapping blocks of length l to create a set of blocks {β_1, ..., β_N}. From this set of overlapping blocks, a suitable number of blocks k is resampled with replacement to make a resampled set of blocks {β*_1, ..., β*_k}. The new data X* is then assembled from the elements of β*. The value of k is chosen so that each bootstrap sample is the same length as the original sample data (n = k·l). The block length l increases with the length of the original sample, and should be on the order of n^(1/5) (Hall 1995).

Figure 3.13: Moving block bootstrap. The MBB method breaks the dependent dataset that is to be resampled into N = n - l + 1 overlapping blocks. These blocks are then randomly resampled with replacement, k at a time, and the resampled dataset is then assembled from the resampled blocks, thereby preserving the dependent structure of the original dataset.
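The block resampling itself is straightforward to implement. The following Matlab sketch shows one MBB replicate loop for a single data stream x compared against a fixed sample xref; it assumes a helper function ksD wrapping the D-value computation shown in section 3.9.1, uses prctile from the Statistics Toolbox, and the block and replicate counts are illustrative.

    n = numel(x);
    l = max(1, round(n^(1/5)));                % block length on the order of n^(1/5)
    N = n - l + 1;                             % number of overlapping blocks
    k = ceil(n / l);                           % blocks per bootstrap replicate
    B = 1000;
    Dboot = zeros(B, 1);
    for b = 1:B
        starts = randi(N, k, 1);               % random block starts, with replacement
        idx    = bsxfun(@plus, starts, 0:l-1)';% l x k matrix of consecutive indices
        xb     = x(idx(1:n));                  % reassembled replicate, trimmed to length n
        Dboot(b) = ksD(xb, xref);              % D-value against the comparison sample
    end
    ci = prctile(Dboot, [2.5 97.5]);           % 95% confidence interval on D

Because whole blocks of consecutive samples are copied, the short-range temporal correlation of the original stream survives inside each replicate, which is exactly what the standard bootstrap destroys.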
3.9.4.1 Measurement Resolution

We have calculated many D-values from our CPDs, and we need to know how reliable those D-values are. We use the MBB method as described above to assign a confidence interval to each of our D-values. Using the MBB, we resample each CPD; this results in many (i.e., 1000) D-values, from which we can create a CPD of D-values. We can then assign a confidence interval on D_{1-2} from the 2.5th to 97.5th percentiles of this CPD of D-values (Figure 3.14). This confidence interval shows the range of D-values that are likely to occur with the underlying distribution. The size of the confidence interval depends on the effective size of the dataset and the variability within the distribution. Even taking data dependency into consideration, we still have large datasets, so small confidence intervals are expected for a D-value calculated between two distributions.

The confidence intervals give us an estimate of the measurement error involved in calculating the D-value for each comparison. As an analogy, if we measured the circumference of one green apple and one red apple with an inaccurate tape measure, the confidence interval for this measurement would tell us about the measurement technique; it would not let us make any claims about the circumferences of all red and green apples. We should keep this in mind when examining our calculated confidence intervals.

Figure 3.14: Confidence intervals for D-values. To assign a confidence interval to a D-value, each CPD is resampled to create a CPD of D-values (Source: Kinnaird 2004).

A measure of statistical significance is necessary to judge how different our performance values are. One of the CPDs is assigned as the reference (CPD_ref); each performance measure (i.e., velocity, force, etc.) has its own CPD_ref. This CPD_ref is then resampled, and each resampled distribution (CPD_RS) is compared to the CPD_ref. This is done many times (~1000) to get a distribution of D-values between the reference and the resampled CPDs (D_RS-ref). The D-value at the 95th percentile of the CPD(D_RS-ref) is the critical D-value (D_cr) (Figure 3.15). The D_cr is used to identify a statistical difference: a measured D-value (D_meas) is considered different from the reference if D_meas is greater than D_cr. For example, if we are investigating the force profiles of two surgeons in the OR, surgeon 1 is considered the reference and is resampled to get CPD(D_RS-ref). If D_surgeon1-surgeon2 is outside the 95th percentile of CPD(D_RS-ref), then the two surgeons' force behaviours are different.

Figure 3.15: CPD of D-values. By finding a CPD of D-values between the resampled and reference CPDs, we can assess the relevance of any measured D-value. If D_meas is greater than the D_cr value of CPD(D_RS-ref), then the two CPDs under consideration are different (Source: Kinnaird 2004).
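The critical value D_cr can be obtained with the same machinery. A minimal Matlab sketch follows, again using the hypothetical ksD helper; a plain resample is shown for brevity, whereas our dependent data would call for the MBB resampling of section 3.9.4.

    n   = numel(xref);
    R   = 1000;
    Drs = zeros(R, 1);
    for r = 1:R
        xr     = xref(randi(n, n, 1));   % resample the reference from itself
        Drs(r) = ksD(xr, xref);          % D-value against the original reference
    end
    Dcr = prctile(Drs, 95);              % a measured D > Dcr indicates a difference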
3.10 Discussion

The Maryland dissector tool was chosen as the experimental tool in consultation with expert surgeons. This particular tip was selected as it is frequently and commonly used in laparoscopic cholecystectomies. The Maryland tool tip is interchangeable, as future studies may require other tool tips. A custom-designed and custom-built bracket was created to mount the various sensors required for this project. By creating this bracket and mounting it to the surgical tool shaft, all sensors were mounted securely, and kinematics and F/T measures could be extracted for study.

The use of the electrosurgical unit (ESU) in the OR caused a significant amount of distortion and noise in our raw data. The magnetic, F/T and strain gauge sensors were all adversely affected by the ESU. This effect required data removal before performance measures could be analyzed. A technique monitoring sudden changes in the signal was used to successfully remove the affected sections of noisy data, and the remaining ESU-affected data could then be removed manually.

Forces and torques at the surgical tool tip were collected using the experimental set-up and a tri-axial transducer mounted on the bracket. Many hours of calibration and data registration were done in post-processing to remove gravity effects, grip effects, and electrosurgery unit effects.

Chapter 4
Results of a Quantitative Study to Assess Laparoscopic Surgical Simulator Validity

4.1 Introduction

In this chapter, we present pilot study data illustrating intersubject variability, intrasubject differences, and the reliability of our chosen performance measures. The performance and behaviours of the novice surgeons were compared to each other in the OR and simulators (i.e., performance validity), and then compared to the experts (i.e., construct validity). We also analyze the concurrent validity of the simulators based on our performance measures, with the OR as the gold standard. The implications of the analysis are then discussed as they concern the reliability of our data collection system and the construct and performance validity of the simulators.

The protocol outlined in Chapter 3, section 3.5, and the lengthy post-processing (Chapter 3, section 3.6) were followed for each of the three OR procedures and the physical simulator data collections. This resulted in time-synchronized and post-processed kinematics and force data referenced to a common reference frame at the surgical tool tip. The dissection task data of the surgical procedure was also broken down into segments, as discussed in Chapter 3, section 3.8.2.1.

4.2 Results

We were able to successfully collect OR data 3 times (one surgery from each resident). We also collected data in the virtual reality (VR) and physical simulators. A summary of the data collections is shown below in Table 4.1.

Table 4.1: Summary of successful data collections from each context

                OR    VR Simulator    Physical Simulator
  Resident 1     1          3                  1
  Resident 2     1          3                  1
  Resident 3     1          3                  1
Chapter 4: Results of a Quantitative Study to Assess Laparoscopic Surgical Simulator Validity

4.1 Introduction

In this chapter, we present pilot study data illustrating intersubject variability, intrasubject differences, and the reliability of our chosen performance measures. The performance and behaviours of the novice surgeons were compared to each other in the OR and the simulators (i.e., performance validity), and then compared to the experts (i.e., construct validity). We also analyze the concurrent validity of the simulators based on our performance measures, with the OR as the gold standard. The implications of the analysis are then discussed as they concern the reliability of our data collection system and the construct and performance validity of the simulators.

The protocol outlined in Chapter 3 section 3.5, and the lengthy post-processing of Chapter 3 section 3.6, were followed for each of the three OR procedures and the physical simulator data collections. This resulted in time-synchronized and post-processed kinematics and force data referenced to a common reference frame at the surgical tool tip. The dissection task data of the surgical procedure is also broken down into segments as discussed in Chapter 3 section 3.8.2.

4.2 Results

We were able to successfully collect OR data 3 times (one surgery from each resident). We also collected data in the virtual reality (VR) and physical simulators. A summary of the data collections is shown in Table 4.1.

Table 4.1: Summary of successful data collections from each context

                 OR    VR Simulator    Physical Simulator
    Resident 1    1          3                  1
    Resident 2    1          3                  1
    Resident 3    1          3                  1

4.2.1 Context Comparisons

The comparisons between the contexts and the subjects are made in many different areas. We have collected data from the surgical residents in each context (OR, physical simulator, VR simulator), and will make comparisons within these. This data will then be compared against the expert data previously presented by Kinnaird (2004).

4.2.1.1 Surgical Residents

The first comparisons presented (Figure 4.1) are intrasubject comparisons within each procedure. Each procedure is divided into segments as discussed in Chapter 3 section 3.8.2, and these segments are compared to each other (A1). This investigates intrasubject intraprocedural variability and repeatability. Next, the intrasubject intertrial VR comparisons (A2) are shown to investigate the residents' repeatability in the VR simulator. Thirdly, the intersubject intrasetting results (A3, A4, A5) are analyzed. Each of the residents is compared in the three settings (OR, physical simulator, VR simulator) to evaluate consistency at the skill level. And lastly, the intrasubject intersetting results (A6) compare each resident's behaviour in the OR to the simulators, to assess the performance validity of the two simulators.

A1: Intrasubject intraprocedural OR
A2: Intrasubject intertrial VR simulator
A3: Intersubject intrasetting OR
A4: Intersubject intrasetting physical simulator
A5: Intersubject intrasetting VR simulator
A6: Intrasubject intersetting (OR versus VR, OR versus physical)

Figure 4.1: Context comparisons for surgical residents.

4.2.1.2 Expert Surgeons

To study construct validity as stated in our objectives, we need to compare our surgical resident data to that of the expert surgeons. The expert surgeon data was collected, and its performance measures extracted, by Catherine Kinnaird (2004). Our resident-to-expert comparisons will be called "interlevel" comparisons (Figure 4.2). These comparisons are interlevel intrasetting: they show the results of the experts compared to the residents in each of the contexts (OR, VR simulator, physical simulator). A7 is a new method of evaluating concurrent validity, as we have expert OR data as the "gold standard" and the same performance measures available in each of the other contexts (resident skill level and simulators). In this way we are able to quantitatively make suggestions about the concurrent validity of the simulators. A8 and A9 allow us to investigate the construct validity of both the physical and VR simulators, as we are trying to detect skill level differences.

A7: Interlevel intrasetting OR
A8: Interlevel intrasetting physical simulator
A9: Interlevel intrasetting VR simulator

Figure 4.2: Interlevel context comparisons for experts and residents.

4.2.2 The D-Value

The KS statistic D-value is calculated for all context comparisons. The D-value depends on the sizes of the original samples of the distributions. Our sample sizes are all on the order of several thousand data points, and the larger the sample size, the smaller a D-value must be to be considered "similar". Generally, when a dataset is resampled from itself (DRS-ref), D-values are about 0.02-0.05. (Recall that a D-value of 0 indicates identical distributions, and a D-value of 1 is the maximum difference.) When two CPDs are different, we usually see values of 0.8-1.
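These magnitudes are easy to reproduce with the ks_d sketch from Chapter 3 section 3.9.4.1 on synthetic data; the numbers below are illustrative, not measured values.

    import numpy as np

    rng = np.random.default_rng(1)
    a = rng.normal(0.0, 1.0, 5000)    # two samples drawn from the same distribution
    b = rng.normal(0.0, 1.0, 5000)
    c = rng.normal(3.0, 1.0, 5000)    # a clearly shifted distribution

    print(ks_d(a, b))   # ~0.02-0.03: "similar", on the order of resampling noise
    print(ks_d(a, c))   # ~0.87: approaching the maximum difference of 1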
4.2.3 Presentation of Results

The performance measures, as discussed earlier, are velocity, acceleration, jerk and force in the six tool tip directions: axial (z), grasp (y), translation (x), transverse (√(x² + y²)), absolute (√(x² + y² + z²)), and roll about the tool axis. The performance measure of distance from the mean (Dmean) is presented only in the absolute and roll directions, as it is sensitive to the choice of location of the global reference frame. The cumulative probability distributions (CPDs) of all twenty-six performance measures in all directions are presented in a large plot with twenty-six subplots. Only the lower 75th percentile of the data is shown, for better visualization, as this region contains the critical areas of the CPDs. The important differences between CPDs are always in this region. The CPDs all have long tails, and if the entire CPD were shown, the critical areas would appear to be vertical and the important differences would not be easily seen. The D-values of the comparisons are also calculated and presented in another plot. The D-values are shown with confidence intervals, and the critical confidence interval, CPD(DRS-ref), is also shown for determining statistical difference.

4.2.3.1 A1: Intrasubject Intraprocedural OR Comparisons

Each of the surgical residents performed a laparoscopic cholecystectomy with an expert surgeon in attendance and supervising. Each resident had one session of data collection in the OR, and the results from each are presented in the following sections. Each surgical dissection task was divided into three segments to examine intraprocedural repeatability. The first segment consisted of anatomy exploration and identification; the second segment was the cystic duct and artery dissection; and the third segment was the gallbladder removal from the liver bed. We investigate the repeatability of the resident within one procedure.

4.2.3.1.1 Resident 1: Intrasubject Intraprocedural OR

The performance measures for the three segments of the OR procedure are extracted (Figure 4.3). On an initial visual inspection, the CPDs are relatively similar in shape and range. The kinematics measures of velocity, acceleration, and jerk in all tool tip directions show the most similarity in shape. The force, distance from mean (d), and the transverse and absolute tip direction measures show the most variability. Segment 3 is the most different from the other two segments of the procedure, and this is seen in all tool tip directions. The axial forces are the largest in value, as this is a combination of the axial tip force and the grip forces not removed through the calibration process. This large axial force measure coincides with what was found for the expert surgeons (Kinnaird 2004).

Each segment is then compared to a lumping of the data from the other two segments. These D-values and the corresponding confidence intervals signify the variability between segments (Figure 4.4). Each of the segments in an OR procedure represents a different portion of the dissection task. The reference distribution CPD(DRS-ref) is created by resampling the reference CPD from itself; an experimental D-value that falls within the 95th percentile of CPD(DRS-ref) indicates the distributions are considered "similar". The performance measure CPDs indicated that segments 1 and 2 are similar, and the D-value calculation verifies this. As expected, segments 1 and 2 are more similar, and segment 3 is the most different from the other two segments. The D-values represent the variability between segments for this procedure. This gives us an idea of intraprocedural repeatability, as each of the segments has a different goal in the OR, even though they are all considered part of the dissection task. In general, we can say that Resident 1 is repeatable intraprocedurally.

Figure 4.3: Resident 1 intraprocedure OR CPD. Segments 1, 2 & 3. Each of the individual graphs represents a performance measure in that particular direction at the tool tip.
Figure 4.4: Resident 1 intraprocedure OR D-values. Segments 1, 2 & 3. The horizontal error bars represent the confidence interval on the D-value.

4.2.3.1.2 Resident 2: Intrasubject Intraprocedural OR

The CPDs of the calculated performance measures again seem to be quite similar in shape and range for all measures (Figure 4.5). We see some small differences in segment 3 in the transverse and absolute tool tip directions, as was seen previously with Resident 1. Again, the force and d measures show the most differences in CPD shape. The D-values are calculated and lend support to what was seen in the CPDs of the performance measures (Figure 4.6). The kinematics performance measures have small differences between all segments, with the majority of D-values below 0.3. We see here that segment 1 has many D-values that fall within CPD(DRS-ref), indicating the values are essentially the same. Also, for this subject, there are no D-values greater than 0.6. The kinematics performance measures all have D-values below 0.2, demonstrating very repeatable behaviour within this procedure.

Figure 4.5: Resident 2 intraprocedure OR CPD. Segments 1, 2 & 3. Each of the individual graphs represents a performance measure in that particular direction at the tool tip.

Figure 4.6: Resident 2 intraprocedure OR D-values. Segments 1, 2 & 3.

4.2.3.1.3 Resident 3: Intrasubject Intraprocedural OR

For Resident 3, we see results (Figure 4.7) very similar to what was seen with Resident 1 and Resident 2. The three segments show a lot of similarity when looking at the CPD performance measures. Segment 1 shows some difference in the jerk measure in the transverse and absolute tool tip directions. We again see the most difference in the d and force measures. The similarity between segments is confirmed by the D-values (Figure 4.8). Force and d measures again have the largest differences. This OR trial shows the least amount of intersegment variability, with all D-values below 0.3 except for the force in the translate (x) direction. Resident 3 demonstrated the most repeatable behaviour within a single OR procedure.

Figure 4.7: Resident 3 intraprocedure OR CPD. Segments 1, 2 & 3. Each of the individual graphs represents a performance measure in that particular direction at the tool tip.

Figure 4.8: Resident 3 intraprocedure OR D-values. Segments 1, 2 & 3.
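The segment-versus-lumped-segments comparison behind Figures 4.4, 4.6 and 4.8 can be written compactly by reusing the helpers sketched in Chapter 3 section 3.9.4.1. The arrays below are synthetic stand-ins, since the measured segment data is not reproduced here.

    import numpy as np

    rng = np.random.default_rng(2)
    segments = [rng.normal(20, 8, 4000),    # synthetic stand-ins for one performance
                rng.normal(21, 8, 4000),    # measure (e.g. absolute velocity) in
                rng.normal(28, 12, 4000)]   # segments 1-3 of a procedure

    for i, seg in enumerate(segments, start=1):
        rest = np.concatenate([s for j, s in enumerate(segments, start=1) if j != i])
        lo, hi = d_interval(seg, rest)
        print(f"segment {i} vs. lumped rest: D = {ks_d(seg, rest):.2f} "
              f"(95% CI {lo:.2f}-{hi:.2f})")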
4.2.3.2 A2: Intrasubject Intertrial VR Simulator

The three residents each performed the cystic duct dissection module on the VR simulator three times. The performance measures extracted from the VR simulator are less comprehensive than those from the OR or physical simulator; there are only 17 performance measures, with no roll direction and no component force values. It should be noted that, as of the time of this manuscript, the VR force values are pending change: the manufacturer's hardware calibration value was not quoted correctly, and the VR force values will therefore need to be multiplied by a factor still to be determined. We do know that this factor will be less than 2, and it will therefore not significantly affect the comparison results.

Intrasubject intertrial variability was examined, and little variability was seen in either the range or the shapes of the CPDs for all three residents (Figures 4.9, 4.11, 4.13). Also seen in the VR simulator data are the low absolute force values and their small range; the residents tended to spend about half of the time at very low forces. The D-values for each resident's three trials were compared. These D-values (Figures 4.10, 4.12, 4.14) coincide with the visual observations from the CPD plots, representing very little difference in the majority of measures. The largest differences are seen in the absolute force and distance from mean performance measures. The variability is so low in this contextual comparison that many D-values between the three trials are below 0.1. Each of the three residents is very repeatable over three trials in the VR simulator.

Figure 4.9: Resident 1 intertrial VR simulator CPD. Each of the individual graphs represents a performance measure in that particular direction at the tool tip.

Figure 4.10: Resident 1 intertrial VR simulator D-value comparisons.

Figure 4.11: Resident 2 intertrial VR simulator CPD. Each of the individual graphs represents a performance measure in that particular direction at the tool tip.

Figure 4.12: Resident 2 intertrial VR D-value comparisons.

Figure 4.13: Resident 3 intertrial VR simulator CPD.
Each of the individual graphs represents a performance measure in that particular direction at the tool tip.

Figure 4.14: Resident 3 intertrial VR D-value comparisons.

4.2.3.3 A3, A4, and A5: Intersubject Intrasetting Comparisons

The three residents' performance measure CPDs are compared in each of the three contexts of the operating room, virtual reality simulator, and physical simulator (Figures 4.15, 4.17, 4.19). We are able to examine consistency at the skill level by making these comparisons. At first glance, the CPDs for the three subjects in each context look rather similar. The operating room CPDs show the most differences, especially for Resident 2. The VR simulator CPDs show very similar shapes and ranges for all performance measures other than d. The physical simulator CPDs again show similar shapes and ranges in all measures other than force and Dmean.

The differences between the residents are analyzed in Figures 4.16, 4.18 and 4.20. These D-values confirm the initial visual inspection of the CPDs. The data for the physical simulator shows that Resident 2 and Resident 3 have more similar patterns when compared to Resident 1, with most measures below D = 0.3. For the VR simulator, the three residents show much more similarity, with the majority of the D-values below 0.15, demonstrating remarkable consistency at their skill level. The OR comparisons indicate a larger range of D-values, spread throughout the range. A point to remember is that the VR simulator is a very repeatable and structured environment, while the OR context is inherently much more variable; the physical simulator is somewhere in-between these two contexts in terms of variability and repeatable structure.

Figure 4.15: Intersubject intrasetting (OR) CPD.

Figure 4.16: Intersubject intrasetting (OR) D-value comparisons.

Figure 4.17: Intersubject intrasetting (VR simulator) CPD.

Figure 4.18: Intersubject intrasetting (VR simulator) D-value comparisons.

Figure 4.19: Intersubject intrasetting (physical simulator) CPD.
Figure 4.20: Intersubject intrasetting (physical simulator) D-value comparisons.

4.2.3.4 A6: Intrasubject Intersetting

Each of the three residents had data collected in the three contexts: OR, VR simulator, and physical simulator. Comparisons were done for each subject across the settings (i.e., intrasubject intersetting) (Figures 4.21, 4.23, 4.25). These comparisons help us in our investigation of the performance validity of the two surgical simulators: if the quantitative measures in a simulator are similar to those in the OR, then the simulator can be considered to show performance validity. It can be seen from the CPDs that the kinematics measures in all tool tip directions for the OR and the physical simulator are more similar to each other than to the VR simulator. It would seem that the residents move more slowly in the VR simulator relative to the physical simulator and the OR. Another significant visual feature is the absolute force measure, which is very low in the VR simulator; it is so much lower than in the physical simulator or OR settings that it is not easily seen on the plots.

The D-value analysis provides further evidence of the differences and similarities seen in the CPDs (Figures 4.22, 4.24, 4.26). The largest differences are seen between the force values, where the D-value is often 1.0, the maximum absolute difference. In the d performance measure, there are also a few D-values at 1.0. Another interesting note is the comparison between the physical simulator and the OR, where many of the D-values are below 0.4. Conversely, when we compare the VR simulator to either the OR or the physical simulator, the D-values are generally larger than 0.3. We consistently see that the OR vs. physical simulator comparisons show lower D-values than the OR vs. VR simulator comparisons.

Figure 4.21: Resident 1 intersetting CPD. OR, VR simulator, physical simulator.

Figure 4.22: Resident 1 intersetting D-values. OR, VR simulator, physical simulator.

Figure 4.23: Resident 2 intersetting CPD. OR, VR simulator, physical simulator.

Figure 4.24: Resident 2 intersetting D-values. OR, VR simulator, physical simulator.
Figure 4.25: Resident 3 intersetting CPD. OR, VR simulator, physical simulator.

Figure 4.26: Resident 3 intersetting D-values. OR, VR simulator, physical simulator.

4.2.3.5 Expert vs. Resident Comparisons

The data from the three residents was lumped together to create a large data set for resident surgeons in each of the three settings: OR, VR and physical simulators. The expert surgeon data (2 experts), collected and analyzed in a concurrent study by Catherine Kinnaird, was likewise lumped into a dataset to represent the expert surgeons (Kinnaird 2004). These two datasets in each setting, expert and resident respectively, could then be compared to each other to begin an investigation into the construct validity of the two simulators. If a simulator is able to detect skill level differences, it is said to show construct validity. We are also able to demonstrate a new method for evaluating concurrent validity. This type of validity is usually assessed by a comparison to the "gold standard", which is expert OR behaviour. This "gold standard" has previously been evaluated using checklists and rating scales in the OR. In our study, we are able to make the same assessments in all contexts, whether OR or simulators, and at differing skill levels.

Due to intrasubject differences that cannot be clearly seen once the data has been lumped, and our small sample sizes for both experts and residents, we also investigated differences amongst the individuals. D-value comparisons are shown to analyze differences amongst the two experts and three residents. Each expert is compared to each resident individually for a more thorough construct validity investigation.

4.2.3.5.1 Interlevel Intrasetting OR

The performance measure CPDs for the lumped experts and residents were evaluated and plotted (Figure 4.27). The shapes of the kinematics measures of velocity, acceleration and jerk are somewhat similar, but the ranges do vary. We also see visibly larger differences in the force and Dmean CPDs.

Figure 4.27: Lumped interlevel OR CPD.

We then investigate the individual differences for the two experts and three surgical residents (Figure 4.28). Here we see the actual variation between all five subjects. We generally see similar shapes in the kinematics measures, and more variability in the d and force measures.

Figure 4.28: Interlevel OR individual CPD. Two experts and three residents.

An analysis of the D-values confirms what is seen in the CPDs (Figure 4.29). There is a wide range of D-values, ranging from close to 0 to the maximum difference of 1. Again, as was seen earlier in the resident comparisons, the force and Dmean measures frequently have a D-value of 1. Here we see that the expert 2 vs. resident 3 comparisons generally have D-values below 0.4, while the expert 1 vs.
resident 1 and resident 2 comparisons have all D-values greater than 0.2.

Figure 4.29: D-values for the two experts and three residents in the OR.

4.2.3.5.2 Interlevel Intrasetting Physical Simulator

Interlevel comparisons let us evaluate the construct validity of the physical simulator; we are looking for skill level differences. The CPDs of the interlevel physical simulator trials are shown in Figure 4.30. Here we see that the kinematics measures in all directions except roll seem to be relatively similar in shape and range. We again see the largest differences in the force and d measures.

Figure 4.30: Lumped interlevel physical simulator CPD.

For the physical simulator, we again analyze the individual differences between each expert and resident (Figure 4.31). We do see general trends in the shape and range of the performance measures, and slightly more similar CPDs than we saw in the OR comparisons.

Figure 4.31: Interlevel physical simulator individual CPD.

By looking at the D-values for all the experts and residents, we can more clearly see the individual differences (Figure 4.32). There is a large spread of D-values throughout the range. And again, we see the largest differences with the Dmean and force measures, with a few at the maximum difference of 1. It is also interesting to see that in the comparison of expert 1 vs. resident 2, almost all D-values are below 0.2, indicating they are more similar in their behaviours.

Figure 4.32: D-values for the two experts and three residents in the physical simulator.

4.2.3.5.3 Interlevel Intrasetting VR Simulator

Again, we are able to investigate the construct validity of the VR simulator by looking for skill level differences. Of the interlevel comparisons across the three contexts, the CPD comparison between the experts and residents in the VR simulator shows the most similar profiles (Figure 4.33). The largest variations are seen in the d and absolute force profiles. There are also differences in the transverse and absolute tool tip directions.

Figure 4.33: Lumped interlevel VR simulator CPD.

We then investigate the individual differences in the VR simulator for all residents and experts (Figure 4.34).
Here we see a lot of similarity in all performance measures. The variability between experts and residents looks to be quite small according to the CPDs.

Figure 4.34: Interlevel VR simulator individual CPD.

The differences in the VR simulator are all much lower than what is seen in the physical simulator and in the OR (Figure 4.35). All D-values are below 0.4 except for the d of expert 2 vs. resident 2. In this simulator, it would be difficult to distinguish between the experts and residents, as the differences are all small.

Figure 4.35: D-values for the two experts and three residents in the VR simulator.

4.3 Discussion

This project was chosen as a complementary and follow-up study to that of Kinnaird (2004). The results that have been obtained further support and answer the questions initially posed in Kinnaird's project. Further results in the realm of comparisons between expert and resident surgeons have also led to more questions and preliminary answers. This work begins the investigation of the construct and performance validity of the physical and virtual reality (VR) simulators. We have also created a new method for evaluating concurrent validity by having the same performance measures in all contexts, with expert OR data as the "gold standard". The motor behaviour of the surgical tool tip was the model used to extract quantitative measures that allowed for comparisons. We will analyze and discuss the results to help us understand the comparisons that have been made, and to further investigate the overall objectives of our validity studies.

4.3.1 Context Comparisons

4.3.1.1 Intraprocedural Operating Room Variability

The intraprocedural intrasubject operating room results show that there can be differences between the three dissection segments, but generally speaking, the three segments had D-values of less than 0.3. This is an interesting result, as each segment has a different objective (i.e., exploration, dissection, etc.). Each of these segments is a different section of a larger dissection task. The overall goal at the end of the dissection task is to have clipped and cut the cystic duct and artery to isolate the gallbladder, and this goal can be reached using a variety of kinematics and forces. In two of the three trials, Segment 3 was found to have the largest differences from the other two segments; in the other trial, Segment 1 showed the most difference from Segments 2 and 3. It is worth noting that in these three OR trials, the three segments chosen were not always of the same tasks, as this was not possible.

Resident 1 spent the entire data collection period in the exploration and dissection phase, and we were never able to observe any clipping or cutting of the ducts/artery. Due to a chronically inflamed gallbladder, this exploration and dissection took more than 20 minutes, when normally it would take only about 10 minutes. It is seen in the results that Segment 3 shows the most differences from the other two segments.
It is possible that during this particular segment, some particularly difficult or unusual anatomy of the gallbladder was causing the resident to vary behaviours.

Resident 2's data showed the most difference in segment 3. This OR trial followed the "normal" segment protocol of exploration, setting and clipping the duct and artery, and a final gallbladder dissection segment. This protocol most closely follows that of Kinnaird (2004), but contrary to what she found, segment 3 showed the most difference as opposed to segment 1.
The force patterns used by the residents were more different, and again the same gall bladder removal procedure was completed successfully. The three residents show fair consistency in the O R context. 4.3.1.3.2 Virtual Reality Simulator Intersubject V R simulator differences are lower than the O R trials. This is an expected result, as the intrasubject intertrial V R differences were very low also. The majority o f D-values were below 0.1 showing incredible intersubject similarities. The three residents are very consistent to each other. It is an interesting result in that each o f the residents received no training on the V R simulator, but would treat the simulator in a predictable and repeatable fashion to each other. The residents also commented that they thought the V R simulator was like a video game, and that certain tasks would be useful to train on a V R simulator. But this particular dissection task was not very realistic, and was not the same way they would behave in a real O R situation. These comments coincide with those o f Kinnaird's experts' data comments on the face validity o f the V R simulator (Kinnaird 2004). Neither residents nor experts felt that this V R dissection task was very good for training or evaluation o f skil ls. 4.3.1.3.3 Physical Simulator Intersubject physical simulator differences were also relatively low in all measures with most D-values below 0.3. These D-values fall in-between the O R and V R simulator differences, and this result is the same as found with Kinnaird's expert data (Kinnaird 2004). We see the largest differences in the force and d difference values, and this is the same as was found for the O R trials' differences. It is also interesting to note that the comparison between resident 2 and 136 resident 3 resulted in all D-values below 0.3 except in d. The three residents are fairly consistent in the physical simulator. Again, a quick face validity study was conducted, the opinions varied amongst the residents'. Although none o f them found the mandarin orange dissection incredibly realistic, their opinions did vary on how well they thought their motor patterns or force exertions were similar to in the OR. Another factor in the residents' opinion was the "juicy-ness" o f the orange itself. Some mandarin oranges were quite juicy, and the skin did not peel o f f easily making for a more difficult dissection task. If the mandarin orange was generally "drier" , the dissection task was easier, and the residents' were more easily able to complete the task. 4.3.1.4 Intrasubject Intersetting Comparison The three residents were compared in the three environments o f the OR, V R and physical simulators. This comparison gives us an indication o f performance validity o f the simulators, as we compare each to the O R environment. Specifically, these three contexts were compared to each other: OR to physical, O R to V R , V R to physical. These intersetting comparisons result in larger differences than the intrasetting comparisons. The D-values calculated run the entire range from close to 0 to 1 (similar to different). We see the largest differences between the V R simulator and both the O R and physical simulator settings. The most striking difference was between a few o f the force measures o f the V R and physical simulators with the three residents (D=l ) . 
A l l three residents show the similarity in differences in the kinematics measures where the V R simulator had slower velocities, accelerations and jerk measures when compared to the O R and physical simulator. The most striking difference was between the absolute force measures o f the V R simulator compared to the physical simulator and the OR, which is most visible when looking at the C P D . The residents did find and comment that the V R simulator to be a " low force" environment compared to a typical O R scenario. 4.3.1.5 Interlevel Intrasetting By using the data collected and analysed in this project, and the data analyzed by Kinnaird (2004), we are able to begin an investigation into the construct validity o f the V R and physical 137 simulators. A simulator showing construct validity will be able to detect differences between skill levels. In this analysis, the data from the two experts was lumped together to create an "expert" group, and the three residents' data was lumped together for a "resident" group. This is an efficient and easy method to detect immediate differences between the two skill levels. We also looked at the differences between all 5 subjects, and can see how each of the three residents compared to the two expert surgeons. This is an interesting comparison as opposed to looking at the lumped data. We can see the more detailed differences between these groups. 4.3.1.5.1 Operating Room Immediately on analysis, we can detect differences between the expert and resident data in the OR. Interestingly, the residents seem to be moving faster (velocity) than the experts. One would think that a surgical resident would be more tentative, and move slower, but as the data shows, this is not the case. We see large differences in the force data, where the expert surgeons use high forces more frequently than the residents. This could be a sign of the tentativeness of the residents. They may not feel comfortable in the OR to "pull and tug" with a lot of force. When we look at the three residents and two experts individually, we see the differences cover the entire range from close to 0 to 1 (similar to different). We do see that the force measures are 0.2<D<10. This tells us that the experts compared to the residents use different force patterns when in the OR. In the end, the same end result is reached, but the method to reach that point does vary significantly. The kinematics measures do tend to stay below 0.6, which indicates some more similarities in these motor behaviour patterns. 4.3.1.5.2 Virtual Reality Simulator The interlevel intrasetting differences in the V R simulator are the smallest of the three contexts. In this context, it would be more difficult to make a conclusion on construct validity of the V R simulator, as the D values are small (<0.3). 138 When we compare the subjects individually, we see very little difference in all measures. Both expert 1 and expert 2 show similar kinematics and force patterns to all three residents. This is a significant result as we are trying to detect differences between experts and residents. And according to this, we do not see significant differences between the skill levels. Therefore, the V R simulator does not pass the construct validity test. 4.3.1.5.3 Physical Simulator Here in the physical simulator, we have interlevel difference levels in-between what was seen in the OR and V R simulator contexts. The physical simulator is more able to detect the differences between the two skill levels. 
We do see more differences in the force and d measures, as was a common theme in all our context comparisons. In the individual comparisons, our physical simulator is the "middle of the road" setting, where differences are between that of the OR and V R simulator. The physical simulator can detect the skill level differences in a fair manner. 4.3.1.5.4 Experts vs. Residents Now that we have collected and analysed data from both surgical experts and residents, and made some comparisons, can we conclude that i f a resident behaves like an expert that they must be an expert? Being able to perform the same tool motor behaviours as an expert does not necessarily make you an expert. Our expert surgeons have been practicing surgery for many years, while the surgical residents are just at the beginning of their careers. So there must be other factors that determine whether a resident is of an expert's calibre. Possibilities that could be studied include linking behaviour and outcome. Some of these outcome measures could include: surgical complications, mortality, loss of function, recovery time, and post-operative pain. Another study could be surgical errors, where surgeons could be doing the same behaviours, but one has more errors, and therefore increasing the risk. Our study has given insight into the motor behaviour patterns of the experts and residents, but we have not investigated the outcomes. These types of outcome studies could provide further evidence on what determines an expert. 139 4.3.2 Performance Measure Reliability The results found in this study further the reliability o f our chosen performance measures. Our intrasetting kinematics and force measures, especially in the V R simulator, were very consistent. We also see similar, although not to the same degree, consistencies in the O R and physical simulator settings. As was first noted by Kinnaird in the study o f expert surgeons (Kinnaird 2004), the force performance measure showed the most variability in intersubject and intrasetting comparisons. The results presented here agree with this, and further support the fact that the force measure is sensitive. The distance from mean measures also showed larger variability than the other measures. 4.4 C o n c l u s i o n s Using the hybrid experimental tool and data collection system, we were able to successfully collect data from the human OR, and the V R and physical simulators for three surgical residents. The KS statistic (D-value) was used to make comparisons between settings and subjects to quantitatively assess motor behaviour, and simulator validity. The reliability o f our performance measures was shown by low variability in the intraprocedural intrasubject comparisons. We also saw low variability in the V R intertrial comparisons for all three residents. The V R simulation is a very repeatable environment, and our performance measures also agree with this repeatability. Our intrasubject intersetting (OR, V R & physical simulators) showed much larger differences suggesting poor performance validity o f the V R simulator, as the residents do not treat this context similarly. The physical simulator suggested an indication o f fair performance validity as it was treated more similarly to the O R by all three residents. We also investigated interlevel differences to study the construct validity o f the simulators. Some differences were noted between the ski l l levels (expert and resident), so it can be suggested that the physical simulator showed fair construct validity. 
The V R simulator differences were very small, so it would be difficult to conclude that it also shows construct validity. With our limited sample sizes, it is not possible to make firm conclusions. But this is a pilot study, and first attempt at a quantitative investigation o f simulator validity. We have been 140 successful in collecting O R data and making effective comparisons to both V R and physical simulators for surgical residents and experts. Our experimental tool and quantitative analysis system is a novel and unique method to assess surgical performance in various environments. 141 Chapter 5 Conclusions and Recommendations 5.1 Introduction The goals of the research presented in this document include a quantitative evaluation of the validity of two types of laparoscopic surgical simulators. And to do this, we developed an experimental tool to collect data in the human operating room, and developed a method to fuse the collected kinematics data. A standard laparoscopic surgical tool was modified, and a bracket designed to accommodate the various sensors used for data collection to collect surgeon motor behaviour in the operating room. Over a period of five months, performance measure data was collected from the operating room, virtual reality simulator and physical simulator for three surgical residents. This data was compared within and between subjects and contexts. By comparing the simulator behaviour to the OR behaviour, we were able to investigate the performance validity of the simulators. This surgical resident data was then compared to expert surgeon behaviour as analysed by Kinnaird (2004). This comparison aided in the evaluation of construct validity of the two simulators. The overall system was initially developed by McBeth (2002), improved upon by our group (Brouwer 2004, Kinnaird 2004), and will be furthered by Sayra M . Cristancho to achieve our overall goal of creating a surgical performance measure database. 5.2 Review of Research The following sections review and summarize the research conducted in this project. We have covered many areas of study, and will present each in a summarized section. 5.2.1 Experimental Surgical Tool The design and development of a experimental surgical tool was completed in partnership with Catherine Kinnaird, to create a total system capable of measuring and collecting high frequency continuous kinematics and force/torques of the surgical tool tip. From literature searches, it is thought this was the first time that this variety of sensors was attached to a surgical tool for use in the human operating room. 142 There were a few different criteria for the hybrid surgical tool to be created. The incorporation of delicate sensors, and the acceptance of the tool for use in the OR by surgeons and the OR staff were of utmost importance. The biggest challenge was to be able to mount the force/torque (F/T) sensor onto the surgical tool shaft. With the aid of volunteer Brandon Lee (engineering graduate), we were able to create a bracket to mount the F/T sensor off-axis and still be able to transmit forces through the sensor without changing the function of the laparoscopic tool. This bracket also allowed for the mounting of the kinematics sensors. The custom-designed experimental tool allowed for high frequency continuous data collection of kinematics and F/T measures. The kinematics portion of the system consists of both optoelectronic and electromagnetic position tracking systems. 
These two data streams are collected separately with their respective tracking systems and software. Another objective of this project was to be able to combine these two data sets into one continuous high frequency stream. In this fashion, we can take advantage of the accuracy of the optical system and the high frequency continuity of the magnetic system. This fusion of the datasets is a simple yet efficient method to obtain accurate continuous high frequency kinematics performance measures. It is also a large improvement over the previous kinematics data collection system previously used in our lab. The force/torque system is the newest part of the total data collection system. This component was incorporated into the quantitative performance measure system, and will need some improvement in future studies. Issues with friction in the tool shaft and bracket design problems have led possibly to misleadingly high force data. A redesign of the bracket and possibly a new F/T sensor that can be mounted on the tool shaft would help in the problems that we dealt with. There is also the issue of trocar interaction forces that was not included in this study. These interaction forces could contribute significantly to the force values that we have measured. Another issue that caused problems with our data collection and processing system was the use of electrocautery during the surgical procedures. The surgeons commonly use cautery to cut and coagulate tissues to minimize the amount of blood in the surgical field. But our sensors were affected by this high frequency high voltage electrocautery current, and would lead to a lot o f noise in the data o f both the kinematics (magnetic sensor) and F/T (strain gauges). We developed post-processing techniques to remove these noisy portions o f data, and also manually removed some parts also. 5.2.2 Data Collection 5.2.2.1 The Operating Room Attempting to collect data during a live human operation is a difficult undertaking that is fraught with logistical nightmares: equipment failure, patient consent, surgeon scheduling, hospital strike, and other numerous problems that seemed to crop up weekly. The original plan was to collect data at least twice per week. But instead, we were only able to collect data once every few weeks. Due to these problems, this was the main reason on why we had to switch our focus from a transfer o f training study to a validity study. The created data collection and analysis system is a good start into the realm o f surgeon motor behaviour analysis and measurement. But in its present state, it is not feasible to collect data often or to process a large amount o f data in a reasonable amount o f time. A n average o f 15 hours minimum was required to process the acquired 15-30minutes o f O R data into a usable form. Although we tried to minimize the disturbance in the OR, our large amount o f equipment, and the two researchers required to operate the system, did receive complaints from the O R staff. The actual size o f the O R is relatively small, and by adding the extra equipment and people, we were sometimes " in the way", and created a hassle for the staff. We had also planned to do calibrations immediately post-operatively, but due to logistics, this was not always possible. 5.2.2.2 The Experimental Surgical Tool Our custom-designed experimental surgical tool is one o f the first such tools to be used in a human operating room to monitor surgeon motor behaviour during a laparoscopic cholecystectomy. 
This tool was used to collect data successfully in the O R a total o f three times, although four trials were attempted. This tool was designed in consultation with expert surgeons, and was designed with the ease o f use for the surgeon in mind. So although we tried to meet the criteria set by the surgeons, the end result was an "awkward" tool as commented by all the surgeons, expert and surgical residents. The main concern was the size and weight o f the 1 4 4 mounting bracket. The bracket was designed to be as lightweight as possible but due to the placement o f the sensors, it tended to be weighted significantly on one side, impeding the normal roll direction around the tool shaft. A lso because o f the wires coming from the multiple sensors, they also tended to keep the tool from roll ing around, and always swinging back to the original position. This experimental tool is a very good first step in the creation o f an instrumented surgical tool capable o f collecting kinematics and F/T measurements in a human OR. 5.2.2.3 Simulators The physical simulator data collection process utilized the same system as for the operating room data collection but without the same logistical problems. The data was easier to post- process, as there were not issues o f electrocautery noise. The physical simulator consisted o f the dissection o f a mandarin orange using standard laparoscopic setup (tower and camera), and was conducted in the Centre o f Excellence for Surgical Education and Innovation (CESEI ) at Vancouver General Hospital ( VGH ) . The virtual reality (VR) simulator data collection process was comparatively simple, although it is noted that Iman Brouwer spent a lot o f time configuring and calibrating this simulator. The continuous high frequency kinematics and force/torque data is directly extracted from and formatted by the V R simulator software. This data was also collected with the aid o f Iman Brouwer in CESE I at V G H . 5.2.3 Data Fusion One o f the objectives o f this project was to create a high frequency continuous data stream o f kinematics data. We are able to achieve this goal by taking the data gathered from our two position sensors, and fusing them into one data set. So after the data is gathered from the operating room or physical simulator contexts, registered and time synchronized, the fusion process is started. It is a simple, yet effective method. By using the advantages o f both systems, we are able to create a data set that is accurate, high frequency and continuous. We have found a large decrease in error over the previously implemented interpolation technique. 145 5.2.4 Performance Measures The quantitative measurement of surgeon performance is of utmost importance to both the public and surgical community. It is necessary to know how our surgeons are performing, and not the simple fact that they can do these procedures. Some of the quantitative measures used to assess surgical performance include completion time, force/torques, kinematics, and ergonomics (Chung 1998, Hanna 1998, McBeth 2002, Rosen 2001, Sackier 1998). Our system to capture kinematics and force/torque data in vivo is very innovative. There were twenty-six performance measures that were investigated: velocity, acceleration, jerk, distance from mean (D mean), and force in the following tool tip directions (axial, grasp, translation, transverse, absolute, roll). 
The performance measures that we have chosen seem to be reliable as there was little variability between surgical residents in the same environment. This further supports the data found by Kinnaird (2004). We had a total of 26 performance measures to analyze. It is possible that we may not need this wide of a selection of measures, as we were able to make generalizations by looking at the force, d, and kinematics measures. Just looking at the velocity, d and force measures may give us enough detail to conduct comparisons. Also we chose to look at five tool tip directions for these measures. This also may not be necessary, and we could choose to just analyze one tip direction. The force measures were the only one that showed differences in all five tool tip directions. Perhaps in the future, the measures could be reduced to as shown in Figure 5.1. Axial (z) Grasp (y) Translate (x) Transverse Absolute Rotation D mean ft ft ft Velocity ft ft ft Force ft ft ft ft ft ft Figure 5.1: New performance measures. We may be able to reduce the number of performance measures. This will decrease post-processing time, and make comparisons easier and more generalized. 146 5.2.5 Context Comparisons Comparisons were made over the three settings and amongst the subjects. These comparisons helped to establish our construct and performance validity assessments. The Kolmogorov- Smirnov (K.S) statistic was used to calculate the differences between these contexts. By looking at these D-values from the KS statistic, we can quantitatively compare the differences without making any assumptions about the distribution o f the data. We collected data from surgical residents, and used the expert data collected by Kinnaird (2004) for our various comparisons. The comparisons that were made led us to the fol lowing conclusions and new ideas: • O R intraprocedural context can make a difference but is not consistent between subjects in which segments are similar (i.e., segment 1 not always the most different than segment 2 and segment 3 as was found by Kinnaird (2004)) o Each segment has a different goal in mind o Other variables could affect (e.g., patient anatomy, complications) • Residents show very low intertrial variability in the V R simulator o V R simulator is very repeatable and structured environment o Each test is the same as previous, and residents complete task in similar fashion • Intersubject intrasetting comparisons show increasing differences from V R simulator to physical simulator to O R (most repeatable to least repeatable environments) • Intersetting comparisons show that V R simulator is the most different from the O R and physical simulator contexts, and the physical simulator is relatively similar to the OR. Physical simulator shows fair performance validity. • Interlevel differences are seen leading to a suggestion o f fair construct validity for the physical simulator • V R simulator differences were very low between ski l l levels, so does not show construct validity • Performance measures o f force and distance from mean ( d) show the most sensitivity in context comparisons 147 5.2.6 Simulator Validation A valid simulator is one that correctly represents the setting in which it is trying to emulate. By making comparisons between the O R and the two simulated settings ( V R and physical), we can see i f either o f the simulators does a reasonable job o f re-creating OR kinematics and forces. 
5.2.6 Simulator Validation

A valid simulator is one that correctly represents the setting it is trying to emulate. By making comparisons between the OR and the two simulated settings (VR and physical), we can see if either of the simulators does a reasonable job of re-creating OR kinematics and forces.

When making intersetting comparisons between the VR and physical simulators and investigating performance validity, we found that the VR simulator was the most different from the OR context. The residents treated the physical simulator more like the OR. This is an important note, as Kinnaird (2004) found that the expert surgeons treated the two simulators as about equally different from the OR context. In this project, we can suggest that the residents treat the physical simulator much the same as the OR, and this simulator does show fair performance validity. For the surgical education program, this is a significant finding, as the residents are practicing similar kinematics in the physical simulator as in the OR. Another important factor here is the cost of each simulator: the VR simulator is ~$50 000, while the physical simulator is ~$1. We did find that the residents tend to move more slowly and use much less force in the VR simulator than in the other two contexts.

One objective was to investigate the construct validity of both the physical and VR simulators. The method we chose is to study the differences between expert surgeons and surgical residents in both environments. If a simulator can detect differences between skill levels, it is considered to show construct validity. In our interlevel comparisons, we see some differences in the physical simulator. The VR simulator shows very few interlevel differences. It is interesting to note that the VR simulator shows small differences between the two skill levels (D < 0.3) in the kinematics measures, so the residents and experts could almost be considered "similar" there. Even though both simulators received mixed reviews from the residents and experts, according to our data the physical simulator would be the better context for training, as it does show fair construct validity.

5.3 Recommendations

Recommendations and improvements were suggested in consultation with fellow researchers, surgeons, and operating room staff. Although we have created a good first approach to gathering and analyzing surgeon motor behaviour in the operating room, future studies could be further improved with some modifications.

5.3.1 Software

• "One button" operation for collection of all data from all sensors from one laptop computer. This was the original plan, but due to inherent multiple serial port issues with Matlab, it was not possible.
• Custom-designed software to automate the post-processing (data registration and calibration). This was a laborious and tedious task for each set of OR data. This minimum 15-hour task could be shortened to a more reasonable timeframe.
• Automatic data synchronization programs are commercially available, and would remove the human error involved in synchronizing data manually and visually.
• Custom-designed software to automatically recognize when electrocautery is used during surgery (either during or post-process), and to compensate for these parts of the data stream by either filtering or automatically removing the noisy data (see the sketch following this list).
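A rough sketch of how such electrocautery compensation might look is given below. It flags samples whose sample-to-sample jumps are implausibly large (the ESU couples into the force/strain channels as large high-frequency bursts) and bridges the flagged spans by interpolation. This is a hypothetical sketch, not part of the current post-processing code; the threshold and mask width are assumptions that would need tuning against real ESU-contaminated recordings.

```matlab
% Hypothetical ESU-artifact removal sketch.  t is the time vector of a
% force channel f; flagged burst samples are bridged by interpolation.
function f_clean = remove_esu_bursts(t, f)
    t = t(:);  f = f(:);
    jump   = [0; abs(diff(f))];            % sample-to-sample jump size
    thresh = mean(jump) + 6*std(jump);     % crude burst threshold (assumed)
    mask   = jump > thresh;
    mask   = conv(double(mask), ones(9,1), 'same') > 0;  % widen the mask

    % Bridge the flagged spans using the surrounding clean samples.
    f_clean = f;
    f_clean(mask) = interp1(t(~mask), f(~mask), t(mask), 'linear', 'extrap');
end
```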
5.3.2 Hardware

• Redesign the sensor bracket to be more compact, allowing completely normal use of the surgical tool, especially roll about the tool axis. It should be made of a more rigid non-conductive material to prevent wear and allow better force transmission to the F/T sensor.
• Wireless sensors would be optimal, as this would remove the issue of having many wires hanging down and affecting the weighting and turning of the surgical tool.
• Improvements to the strain gauge system, as it seemed to be the most affected by the electrocautery during the surgical procedure. A different configuration or instrumentation amplifier may be able to solve these problems.
• A variety of surgical tool tips could be purchased. This would allow more tasks of a procedure to be analyzed, not only the dissection. The electrocautery hook or spatula would be the next most used tool during a laparoscopic cholecystectomy.

5.3.3 OR Data Collection

• Keep a dedicated OR cart with all data collection equipment, and a stand for the Polaris camera and the video camcorder.
• Minimize disturbance to the OR staff by having only one researcher in the room for the entire procedure. The second researcher should leave the room once the computer system is up and running.
• Arrive early for all OR trials to double- and triple-check that all equipment is functioning.
• Always book the first operation of the morning in the largest OR. This allows more time to set up and check equipment.
• Allow for intraoperative dynamic tracking of the magnetic transmitter. This would allow optimal placement of the transmitter. A passive optical marker attached to the transmitter may be useful.

5.3.4 Simulators

Physical simulator improvements would be similar to the software and hardware recommendations. Also, choosing the proper mandarin orange for the dissection task is important: the orange must not be too juicy or too firm, as this makes for a difficult and messy dissection task. Another improvement would be to create a permanent mounting surface on which to place the mandarin orange.

Virtual reality simulator improvements would need to be discussed with the commercial manufacturers. One immediate modification would be to improve the force feedback effects.

5.3.5 Other Recommendations

• Account for F/T trocar interaction forces on the surgical tool shaft.
• Automated performance measure extraction.

5.4 Partner & Future Studies

Our new experimental surgical tool and analysis system is a worthy first contribution to the study of surgeon motor behaviour. The methods we have created are feasible, and with some improvements a better system could be created. The next step would be to implement as many of the recommendations as possible, and to collect more OR data.

Some areas of immediate study that could be investigated with an improved system include:

• Which performance measures are the most sensitive, valid and reliable for assessing surgeons?
• Does training in the virtual reality and/or physical simulator lead to improved performance in the operating room? (This was our original objective question, and still needs to be addressed.)

The longer-term goal of the research in our lab (Neuromotor Control Laboratory, University of British Columbia, Canada) is to eventually create a surgical skills database. This would involve collecting performance measures from many different surgical skill levels, ranging from the very novice to the expert surgeon. As this database increases in size, a surgeon of any skill level could see how they compare to others of their own skill level. For example, a PGY2 resident could see how they compare to other residents of the same year. This could be done for both operating room performance measures as well as those from the simulators.
A surgeon could also see the specific areas in which they need to improve, or where they excel compared to others.

In conjunction with this research project, there are two other projects within our lab studying different aspects of surgical simulators. These two projects and the one described in this thesis fit together (Figure 5.2) to set up the framework to eventually create the surgical skills database mentioned above. Catherine Kinnaird began the investigation of the validity of both VR and physical simulators with expert surgeon subjects (Kinnaird 2004). Iman Brouwer studied the minimum technological requirements for a virtual reality simulator, and how haptic quality affects simulator performance (Brouwer 2004).

[Figure 5.2: Concurrent research projects at the Neuromotor Control Laboratory (University of British Columbia). The studies are: 1) Transfer of training from simulator to operating room, 2) Validation of physical and virtual reality simulators, 3) Minimum technological requirements for a virtual reality simulator, 4) Skill level database.]

In a larger, more elaborate study, Sayra M. Cristancho will be continuing our projects by using the experimental tool and data collection system to assess more surgeons in the OR. This will help toward our final global goal of creating the surgical skill level database. Currently, our data analysis methods are time consuming, so Ms. Cristancho will be working on automating the analysis process to be able to collect and process more data in a reasonable amount of time. Many more procedures could then be analyzed, and a better picture of surgeon motor behaviour obtained.

Using the data collected in this study and that of Catherine Kinnaird, surgical resident Dr. Hamish Hwang will do an analysis of surgical error during a laparoscopic cholecystectomy. He is analyzing the laparoscopic video and the tool tip kinematics and force/torque data to draw conclusions about qualitative video and quantitative performance measures as they relate to surgical errors. Another surgical resident, Dr. Hanna Piper, will be looking at the feasibility of using our hybrid experimental tool to analyze the new training modules of the general surgery curriculum. These modules will use our physical simulator mandarin orange model as one of the surgical education modules for the University of British Columbia surgical resident training program.

As is seen with our partner and future studies, our project goals have been met and exceeded. The results presented here and in our partner studies provide a significant contribution to the realm of surgical simulator assessment and education. The foundation for a method and system to collect quantitative human operating room performance measures, and to assess construct and performance validity of laparoscopic surgical simulators, has been created and used with success.
List of Terms

Abs - absolute tool tip direction
Accel - acceleration
BCIT - British Columbia Institute of Technology
Bi - block of length l of dependent data
Bi* - block of length l randomly resampled from the original block set
{B1 ... BN} - blocks of dependent data created from the original dependent data set
{B1* ... Bk*} - resampled blocks of dependent data drawn from the original block set
CA - cystic artery
CESEI - Centre of Excellence for Surgical Education and Innovation
CPD - cumulative probability distribution
CPD(DRS-ref) - cumulative probability distribution for the bootstrapped data resampled from itself and compared to the reference
CD - cystic duct
CDD - cystic duct dissection
D - Kolmogorov-Smirnov statistic difference measure
Dcr - critical D-measure at the 95th percentile of CPD(DRS-ref)
D1-2 - Kolmogorov-Smirnov statistic between two CPDs
D mean - distance from mean
ESU - electrosurgical unit
Ex 1 - expert surgeon 1
Ex 2 - expert surgeon 2
F/T - force/torque
GB - gallbladder
GBD - gallbladder dissection
GCV - generalized cross-validation
GUI - graphical user interface
Hz - hertz
k - number of blocks in the resampled block set (k = n/l)
KS - Kolmogorov-Smirnov
l - length of an individual block for MBB
MBB - moving block bootstrap
mag - magnetic data
mm - millimeters
MDMArray - Multi-Dimensional Marker Array
N - newtons
N - length of the block set (by default N = n - l + 1)
NCL - Neuromotor Control Laboratory
n - length of the original data set
Nm - newton meters
OR - operating room
opt - optical data
PC - personal computer
PGY - post-graduate year (for surgical residents)
Phy - physical simulator
RF - radio frequency
RMS - root mean square (error)
rad - radians
r - number of bootstrapping cycles
Res 1 - surgical resident 1
Res 2 - surgical resident 2
Res 3 - surgical resident 3
Roll - rotation (about the experimental tool axis)
s - seconds
synch - synchronization
Trans - transverse tool tip direction
V - volts
vel - velocity
VR - virtual reality simulator
x - translation direction of the tool tip frame
Xi - data at point i
Xi* - resampled data at point i
{X1 ... Xn} - original data set
{X1* ... Xn*} - resampled data set, created from the resampled block set
UBC - University of British Columbia
y - grasping direction of the tool tip frame
z - axial direction of the tool tip frame
3D - three-dimensional

Bibliography

Adrales, G.L., Chu, U.B., Witzke, D.B., Donnelly, M.B., Hoskins, D., Mastrangelo, M.J., Jr., Gandsas, A., Park, A.E. (2003). Evaluating minimally invasive surgery training using low-cost mechanical simulations. Surgical Endoscopy 17, 580-585.

Adrales, G.L., Park, A.E., Chu, U.B., Witzke, D.B., Donnelly, M.B., Hoskins, J.D., Mastrangelo, M.J., Jr., Gandsas, A. (2003). A valid method of laparoscopic simulation training and competence assessment. The Journal of Surgical Research 114, 156-162.

Ahlberg, G., Heikkinen, T., Iselius, L., Leijonmarck, C.E., Rutqvist, J., Arvidsson, D. (2002). Does training in a virtual reality simulator improve surgical performance? Surgical Endoscopy 16, 126-129.

Anastakis, D.J., Regehr, G., Reznick, R.K., Cusimano, M., Murnaghan, J., Brown, M., Hutchison, C. (1999). Assessment of technical skills transfer from the bench training model to the human model. American Journal of Surgery 177, 167-170.

Atsma, W. ATI Force Transducer driver. http://www.mech.ubc.ca/~watsma/ATI-FT driver/ Last accessed April 12, 2004.

Auffrey, A.L., Mirabella, A., Siebold, G.L. (2001).
Transfer of Training Revisited. Advanced Training Methods Research Unit, U.S. Army Research Institute for the Behavioural and Social Sciences, ARI Research Note 2001-10, July 2001.

Ballantyne, G.H. (2002). The pitfalls of laparoscopic surgery: challenges for robotics and telerobotic surgery. Surgical Laparoscopy, Endoscopy & Percutaneous Techniques 12, 1-5.

Bann, S., Datta, V., Khan, M., Darzi, A. (2003). The surgical error examination is a novel method for objective technical knowledge assessment. American Journal of Surgery 185, 507-511.

Berguer, R., Forkey, D.L., Smith, W.D. (1999). Ergonomic problems associated with laparoscopic surgery. Surgical Endoscopy 13, 466-468.

Birkfellner, W., Watzinger, F., Wanschitz, F., Ewers, R., Bergmann, H. (1998). Calibration of tracking systems in a surgical environment. IEEE Transactions on Medical Imaging 17, 737-742.

Blaiwes, A.S. (1984). Training Effectiveness Evaluation and Utilization Demonstration of a Low Cost Cockpit Procedures Trainer (Report No. NAVTRAEQUIPCEN 78-C-001301). Pensacola, Fla.: Seville Training Systems.

Bloom, M.B., Rawn, C.L., Salzberg, A.D., Krummel, T.M. (2003). Virtual reality applied to procedural testing: the next era. Annals of Surgery 237, 442-448.

Bridges, M., Diamond, D.L. (1999). The financial impact of teaching surgical residents in the operating room. American Journal of Surgery 177, 28-32.

Brouwer, I. (2004). Cost-performance trade-offs in haptic hardware design. MASc Thesis, University of British Columbia, Vancouver, BC, Canada.

Cao, C.G., MacKenzie, C.L., Ibbotson, J.A., Turner, L.J., Blair, N.P., Nagy, A.G. (1999). Hierarchical decomposition of laparoscopic procedures. Studies in Health Technology and Informatics 62, 83-89.

Challa, S., Koks, D. (2004). Bayesian and Dempster-Shafer fusion. Sadhana 29, 145-174.

Chan, A.C., Chung, S.C., Yim, A.P., Lau, J.Y., Ng, E.K., Li, A.K. (1997). Comparison of two-dimensional vs. three-dimensional camera systems in laparoscopic surgery. Surgical Endoscopy 11, 438-440.

Cohen, R., Reznick, R.K., Taylor, B.R., Provan, J., Rothman, A. (1990). Reliability and validity of the objective structured clinical examination in assessing surgical residents. American Journal of Surgery 160, 302-305.

Coue, C., Fraichard, T., Bessiere, P., Mazer, E. (2003). Using Bayesian programming for multi-sensor multi-target tracking in automotive applications. Int'l Conference on Robotics and Automation, Taipei, Taiwan, May 12-17, 2003.

Dakin, G.F., Gagner, M. (2003). Comparison of laparoscopic skills performance between standard instruments and two surgical robotic systems. Surgical Endoscopy 17, 574-579.

Datta, V., Chang, A., Mackay, S., Darzi, A. (2002). The relationship between motion analysis and surgical technical assessments. American Journal of Surgery 184, 70-73.

Derossis, A.M., Fried, G.M., Abrahamowicz, M., Sigman, H.H., Barkun, J.S., Meakins, J.L. (1998). Development of a model for training and evaluation of laparoscopic skills. American Journal of Surgery 175, 482-487.

Derossis, A.M., Antoniuk, M., Fried, G.M. (1999). Evaluation of laparoscopic skills: a 2-year follow-up during residency training. Canadian Journal of Surgery 42, 293-296.

Derossis, A.M., Bothwell, J., Sigman, H.H., Fried, G.M. (1998). The effect of practice on performance in a laparoscopic simulator. Surgical Endoscopy 12, 1117-1120.

de Visser, H., Heijnsdijk, E.A., Herder, J.L., Pistecky, P.V. (2002). Forces and displacements in colon surgery.
Surgical Endoscopy 16, 1426-1430.

Efron, B., Tibshirani, R. (1986). Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science 1, 54-77.

Elmenreich, W. (2002). An introduction to sensor fusion: research report. http://www.vmars.tuwien.ac.at/frame-papers.html Last accessed March 20, 2004.

Emam, T.A., Frank, T.G., Hanna, G.B., Cushieri, A. (2001). Influence of handle design on the surgeon's upper limb movements, muscle recruitment, and fatigue during endoscopic suturing. Surgical Endoscopy 15, 667-672.

Eubanks, T.R., Clements, R.H., Pohl, D., Williams, N., Schaad, D.C., Horgan, S., Pellegrini, C. (1999). An objective scoring system for laparoscopic cholecystectomy. Journal of the American College of Surgeons 189, 566-574.

Faulkner, H., Regehr, G., Martin, J., Reznick, R. (1996). Validation of an objective structured assessment of technical skill for surgical residents. Academic Medicine: Journal of the Association of American Medical Colleges 71, 1363-1365.

Feldman, L.S., Hagarty, S.E., Ghitulescu, G., Stanbridge, D., Fried, G.M. (2004). Relationship between objective assessment of technical skills and subjective in-training evaluations in surgical residents. Journal of the American College of Surgeons 198, 105-110.

Figert, P.L., Park, A.E., Witzke, D.B., Schwartz, R.W. (2001). Transfer of training in acquiring laparoscopic skills. Journal of the American College of Surgeons 193, 533-537.

Flexman, R.E., Roscoe, S.N., Williams, A.C., Jr., Williges, B.H. (1972). Studies in pilot training: the anatomy of transfer. Aviation Research Monographs 2(1). Champaign, IL: University of Illinois, Aviation Research Laboratory.

Ford, J.K., Weissbein, D.A. (1997). Transfer of training: an updated review and analysis. Performance Improvement Quarterly 10, 22-41.

Foxon, M. (1993). A process approach to the transfer of training: the impact of motivation and supervisor support on transfer maintenance. Australian Journal of Educational Technology 9, 130-143.

Francis, N.K., Hanna, G.B., Cuschieri, A. (2002). The performance of master surgeons on the Advanced Dundee Endoscopic Psychomotor Tester: contrast validity study. Archives of Surgery 137, 841-844.

Francoeur, J.R., Wiseman, K., Buczkowski, A.K., Chung, S.W., Scudamore, C.H. (2003). Surgeons' anonymous response after bile duct injury during cholecystectomy. American Journal of Surgery 185, 468-475.

Frantz, D.D., Wiles, A.D., Leis, S.E., Kirsch, S.R. (2003). Accuracy assessment protocols for electromagnetic tracking systems. Physics in Medicine and Biology 48, 2241-2251.

Fried, G.M., Derossis, A.M., Bothwell, J., Sigman, H.H. (1999). Comparison of laparoscopic performance in vivo with performance measured in a laparoscopic simulator. Surgical Endoscopy 13, 1077-1081, discussion 1082.

Fried, G.M., Feldman, L.S., Vassiliou, M.C., Fraser, S.A., Stanbridge, D., Ghitulescu, G., Andrew, C.G. (2004). Proving the value of simulation in laparoscopic surgery. Annals of Surgery 240, 518-528.

Gallagher, A.G., Richie, K., McClure, N., McGuigan, J. (2001). Objective psychomotor skills assessment of experienced, junior, and novice laparoscopists with virtual reality. World Journal of Surgery 25, 1478-1483.

Gallagher, A.G., Satava, R.M. (2002). Virtual reality as a metric for the assessment of laparoscopic psychomotor skills; learning curves and reliability measures. Surgical Endoscopy 16, 1746-1752.
Gallagher, A.G., Lederman, A.B., McGlade, K., Satava, R.M., Smith, C.D. (2004). Discriminative validity of the Minimally Invasive Surgical Trainer in Virtual Reality (MIST-VR) using criteria levels based on expert performance. Surgical Endoscopy 18, 660-665.

Glinatsis, M.T., Griffith, J.P., McMahon, M.J. (1992). Open versus laparoscopic cholecystectomy: a retrospective comparative study. Journal of Laparoendoscopic Surgery 2, 81-86.

Goff, B.A., Lentz, G.M., Lee, D., Fenner, D., Morris, J., Mandel, L.S. (2001). Development of a bench station objective structured assessment of technical skills. Obstetrics and Gynecology 98, 412-416.

Grantcharov, T.P., Bardram, L., Funch-Jensen, P., Rosenberg, J. (2002). Assessment of technical surgical skills. European Journal of Surgery 168, 139-144.

Grantcharov, T.P., Bardram, L., Funch-Jensen, P., Rosenberg, J. (2003). Learning curves and impact of previous operative experience on performance on a virtual reality simulator to test laparoscopic surgical skills. American Journal of Surgery 185, 146-149.

Grantcharov, T.P., Kristiansen, V.B., Bendix, J., Bardram, L., Rosenberg, J., Funch-Jensen, P. (2004). Randomized clinical trial of virtual reality simulation for laparoscopic skills training. British Journal of Surgery 91, 146-150.

Gustafsson, F. (2003). http://www.control.isy.liu.se/~fredrik/isis/positioning.html Last accessed August 21, 2004.

Hall, P., Horowitz, J.L., Jing, B.Y. (1995). On blocking rules for the bootstrap with dependent data. Biometrika 82, 561-574.

Haluck, R.S., Marshall, R.L., Krummel, T.M., Melkonian, M.G. (2001). Are surgery training programs ready for virtual reality? A survey of program directors in general surgery. Journal of the American College of Surgeons 193, 660-665.

Hamilton, E.C., Scott, D.J., Fleming, J.B., Rege, R.V., Laycock, R., Bergen, P.C., Tesfay, S.T., Jones, D.B. (2002). Comparison of video trainer and virtual reality training systems on acquisition of laparoscopic skills. Surgical Endoscopy 16, 406-411.

Hanna, G.B., Drew, T., Clinch, P., Shimi, S., Dunkley, P., Hau, C., Cuschieri, A. (1997). Psychomotor skills for endoscopic manipulations: differing abilities between right and left-handed individuals. Annals of Surgery 225, 333-338.

Hanna, G.B., Shimi, S.M., Cuschieri, A. (1998). Randomised study of influence of two-dimensional versus three-dimensional imaging on performance of laparoscopic cholecystectomy. Lancet 351, 248-251.

Hannaford, B. (2004). Private discussion, September 2004, at the University of British Columbia, Vancouver, BC, Canada.

Harms, J., Feussner, H., Baumgartner, M., Schneider, A., Donhauser, M., Wessels, G. (2001). Three-dimensional navigated laparoscopic ultrasonography: first experiences with a new minimally invasive diagnostic device. Surgical Endoscopy 15, 1459-1462.

Herline, A., Stefansic, J.D., Debelak, J., Galloway, R.L., Chapman, W.C. (2000). Technical advances toward interactive image-guided laparoscopic surgery. Surgical Endoscopy 14, 675-679.

Hodgson, A.J., Person, J.G., Salcudean, S.E., Nagy, A.G. (1999). The effects of physical constraints in laparoscopic surgery. Medical Image Analysis 3, 275-283.

Hubens, G., Coveliers, H., Balliu, L., Ruppert, M., Vaneerdeweg, W. (2003).
A performance study comparing manual and robotically assisted laparoscopic surgery using the da Vinci system. Surgical Endoscopy 17, 1595-1599.

Huntsville Gastroenterology Associates (2002). http://www.huntsville-gastroenterology.com/laparoscopic_cholecystectomy.shtml Last accessed January 12, 2004.

Hyltander, A., Liljegren, E., Rhodin, P.H., Lonroth, H. (2002). The transfer of basic skills learned in a laparoscopic simulator to the operating room. Surgical Endoscopy 16, 1324-1328.

Johnson, J.L., Schamschula, M.P., Inguva, R., Caulfield, H.J. (1998). Pulse-coupled neural network sensor fusion. Proceedings of SPIE 3376, 219-226.

Joice, P., Hanna, G.B., Cuschieri, A. (1998). Errors enacted during endoscopic surgery - a human reliability analysis. Applied Ergonomics 29, 409-414.

Jones, D.B., Brewer, J.D., Soper, N.J. (1996). The influence of three-dimensional video systems on laparoscopic task performance. Surgical Laparoscopy & Endoscopy 6, 191-197.

Jordan, J.A., Gallagher, A.G., McGuigan, J., McClure, N. (2001). Virtual reality training leads to faster adaptation to the novel psychomotor restrictions encountered by laparoscopic surgeons. Surgical Endoscopy 15, 1080-1084.

Kinnaird, C. (2004). A Multifaceted Quantitative Validity Assessment of Laparoscopic Surgical Simulators. MASc Thesis, University of British Columbia, Vancouver, BC, Canada.

Kopta, J.A. (1971). An approach to the evaluation of operative skills. Surgery 70, 297-303.

Kunsch, S.N. (1991). Second order optimality of stationary bootstrap. Statistics and Probability Letters 11, 335-341.

Lahiri, S.N. (2003). Resampling Methods for Dependent Data. Springer-Verlag, New York.

Liu, R.Y., Singh, K. (1992). Moving blocks jackknife and bootstrap capture weak dependence. In: Exploring the Limits of the Bootstrap. Wiley, New York, 225-248.

Lujan, J.A., Parrilla, P., Robles, R., Marin, P., Torralba, J.A., Garcia-Ayllon, J. (1998). Laparoscopic cholecystectomy vs open cholecystectomy in the treatment of acute cholecystitis: a prospective study. Archives of Surgery 133, 173-175.

Marescaux, J., Smith, M.K., Folscher, D., Jamali, F., Malassagne, B., Leroy, J. (2001). Telerobotic laparoscopic cholecystectomy: initial clinical experience with 25 patients. Annals of Surgery 234, 1-7.

MacRae, H., Regehr, G., Leadbetter, W., Reznick, R.K. (2000). A comprehensive examination for senior surgical residents. American Journal of Surgery 179, 190-193.

Martin, J.A., Regehr, G., Reznick, R., MacRae, H., Murnaghan, J., Hutchison, C., Brown, M. (1997). Objective structured assessment of technical skill (OSATS) for surgical residents. British Journal of Surgery 84, 273-278.

Martin, M., Scalabrini, B., Rioux, A., Xhignesse, M.A. (2003). Training fourth-year medical students in critical invasive skills improves subsequent patient safety. The American Surgeon 69, 437-440.

McBeth, P.B. (2002). A Methodology for Quantitative Performance Evaluation in Minimally Invasive Surgery. MASc Thesis, University of British Columbia, Vancouver, BC, Canada.

McCarthy, A., Harley, P., Smallwood, R. (1999). Virtual arthroscopy training: do the "virtual skills" developed match the real skills required? Studies in Health Technology and Informatics 62, 221-227.

McDougall, E.M., Soble, J.J., Wolf, J.S., Jr., Nakada, S.Y., Elashry, O.M., Clayman, R.V. (1996). Comparison of three-dimensional and two-dimensional laparoscopic video systems. Journal of Endourology 10, 371-374.

McNatt, S.S., Smith, C.D. (2001).
A computer-based laparoscopic skills assessment device differentiates experienced from novice laparoscopic surgeons. Surgical Endoscopy 15, 1085-1089.

Milne, A.D., Chess, D.G., Johnson, J.A., King, G.J.W. (1996). Accuracy of an electromagnetic tracking device: a study of optimal operating range and metal interference. Journal of Biomechanics 29, 791-793.

Mishra, R.K. http://www.laparoscopyhospital.com/history_of_laparoscopy.htm Last accessed January 12, 2004.

Moore, K. (2002). http://www.bleep.demon.co.uk/SimHist1.html Last accessed January 15, 2004.

Moorthy, K., Munz, Y., Sarker, S.K., Darzi, A. (2003). Objective assessment of technical skills in surgery. BMJ 327, 1032-1037.

Morimoto, A.K., Foral, R.D., Kuhlman, J.L., Zucker, K.A., Curet, M.J., Bocklage, T., MacFarlane, T.I., Kory, L. (1997). Force sensor for laparoscopic Babcock. Studies in Health Technology and Informatics 39, 354-361.

Nagy, A.G., Poulin, E.C., Girotti, M.J., Litwin, D.E., Mamazza, J. (1992). History of laparoscopic surgery. Canadian Journal of Surgery 35, 271-274.

Nakamoto, M., Sato, Y., Tamaki, Y., Nagano, H., Miyamoto, M., Sasama, T., Monden, M., Tamura, S. (2000). Magneto-optic hybrid 3D sensor for surgical navigation. Lecture Notes in Computer Science 1935, 839-848.

Nelson, M.S. (1990). Models for teaching emergency medicine skills. Annals of Emergency Medicine 19, 333-335.

O'Toole, R.V., Playter, R.R., Krummel, T.M., Blank, W.C., Cornelius, N.H., Roberts, W.R., Bell, W.J., Raibert, M. (1999). Measuring and developing suturing technique with a virtual reality surgical simulator. Journal of the American College of Surgeons 189, 114-127.

Paisley, A.M., Baldwin, P.J., Paterson-Brown, S. (2001). Validity of surgical simulation for the assessment of operative skill. British Journal of Surgery 88, 1525-1532.

Perez, A., Zinner, M.J., Ashley, S.W., Brooks, D.C., Whang, E.E. (2003). What is the value of telerobotic technology in gastrointestinal surgery? Surgical Endoscopy 17, 811-813.

Perissat, J. (1995). Laparoscopic surgery in gastroenterology: an overview of recent publications. Surgical Endoscopy 27, 106-118.

Perkins, N., Starkes, J.L., Lee, T.D., Hutchison, C. (2002). Learning to use minimal access surgical instruments and 2-dimensional remote visual feedback: how difficult is the task for novices? Advances in Health Sciences Education: Theory and Practice 7, 117-131.

Person, J.G. (2000). A Foundation for the Design and Assessment of Improved Instruments for Minimally Invasive Surgery. MASc Thesis, University of British Columbia, Vancouver, BC, Canada.

Pessaux, P., Regenet, N., Tuech, J.J., Rouge, C., Bergamaschi, R., Arnaud, J.P. (2001). Laparoscopic versus open cholecystectomy: a prospective comparative study in the elderly with acute cholecystitis. Surgical Laparoscopy, Endoscopy, & Percutaneous Techniques 11, 252-255.

Poulin, F., Amiot, L.P. (2002). Interference during the use of an electromagnetic tracking system under OR conditions. Journal of Biomechanics 35, 733-737.

Prasad, A., Foley, R.J. (1996). Day care laparoscopic cholecystectomy: a safe and cost effective procedure. European Journal of Surgery 162, 43-46.

Prystowski, J.B. (1999). A virtual reality simulator for intravenous catheter placement. American Journal of Surgery 177, 171-175.

Regehr, G., MacRae, H., Reznick, R.K., Szalay, D. (1998). Comparing the psychometric properties of checklists and global rating scales for assessing performance on an OSCE-format examination.
Academic Medicine 73, 993-997.

Reznick, R., Regehr, G., MacRae, H., Martin, J., McCulloch, W. (1997). Testing technical skill via an innovative "bench station" examination. American Journal of Surgery 173, 226-230.

Rose, F.D., Attree, E.A., Brooks, B.M., Parslow, D.M., Penn, P.R., Ambihaipahan, N. (2000). Training in virtual environments: transfer to real world tasks and equivalence to real task training. Ergonomics 43, 494-511.

Rosen, J., MacFarlane, M., Richards, C., Hannaford, B., Sinanan, M. (1999). Surgeon-tool force/torque signatures - evaluation of surgical skills in minimally invasive surgery. Studies in Health Technology and Informatics 62, 290-296.

Rosen, J., Hannaford, B., Richards, C.G., Sinanan, M.N. (2001). Markov modeling of minimally invasive surgery based on tool/tissue interaction and force/torque signatures for evaluating surgical skill. IEEE Transactions on Biomedical Engineering 48(5), 579-591.

Rosen, J., Solazzo, M., Hannaford, B., Sinanan, M. (2002). Task decomposition of laparoscopic surgery for objective evaluation of surgical residents' learning curve using hidden Markov model. Computer Aided Surgery 7, 49-61.

Rosser, J.C., Wood, M., Payne, J.H., Fullum, T.M., Lisehora, G.B., Rosser, L.E., Barcia, P.J., Savalgi, R.S. (1997). Telementoring: a practical option in surgical training. Surgical Endoscopy 11, 852-855.

Rosser, J.C., Rosser, L.E., Savalgi, R.S. (1997). Skill acquisition and assessment for laparoscopic surgery. Archives of Surgery 132, 200-204.

Risucci, D., Cohen, J.A., Garbus, J.E., Goldstein, M., Cohen, M.G. (2001). The effects of practice and instruction on speed and accuracy during resident acquisition of simulated laparoscopic skills. Current Surgery 58, 230-235.

Ruurda, J.P., Visser, P.L., Broeders, I.A. (2003). Analysis of procedure time in robot-assisted surgery: comparative study in laparoscopic cholecystectomy. Computer Aided Surgery 8, 24-29.

Ruurda, J.P., Broeders, I.A., Simmermacher, R.P., Rinkes, I.H., Van Vroonhoven, T.J. (2002). Feasibility of robot-assisted laparoscopic surgery: an evaluation of 35 robot-assisted laparoscopic cholecystectomies. Surgical Laparoscopy, Endoscopy, and Percutaneous Techniques 12, 41-45.

Schijven, M., Jakimowicz, J. (2002). Face, expert, and referent validity of the Xitact LS500 laparoscopy simulator. Surgical Endoscopy 16, 1764-1770.

Schijven, M., Jakimowicz, J. (2003). Construct validity: experts and novices performing on the Xitact LS500 laparoscopy simulator. Surgical Endoscopy 17, 803-810.

Scott, D.J., Bergen, P.C., Rege, R.V., Laycock, R., Tesfay, S.T., Valentine, R.J., Euhus, D.M., Jeyarajah, D.R., Thompson, W.M., Jones, D.B. (2000). Laparoscopic training on bench models: better and more cost effective than operating room experience? Journal of the American College of Surgeons 191, 272-283.

Seymour, N.E., Gallagher, A.G., Roman, S.A., O'Brien, M.K., Bansal, V.K., Andersen, D.K., Satava, R.M. (2002). Virtual reality training improves operating room performance: results of a randomized, double-blinded study. Annals of Surgery 236, 458-463, discussion 463-464.

Smith, S.G., Torkington, J., Brown, T.J., Taffinder, N.J., Darzi, A. (2002). Motion analysis. Surgical Endoscopy 16, 640-645.

Smith-Jentsch, K.A., Salas, E., Brannick, M.T. (2001). To transfer or not to transfer? Investigating the combined effects of trainee characteristics, team leader support, and team climate. The Journal of Applied Psychology 86, 279-292.
Starkes, J.L., Payk, I., Hodges, N.J. (1998). Developing a standardized test for the assessment of suturing skill in novice microsurgeons. Microsurgery 18, 19-22.

Stefansic, J.D., Bass, W.A., Hartmann, S.L., Beasley, R.A., Sinha, T.K., Cash, D.M., Herline, A.J., Galloway, R.L., Jr. (2002). Design and implementation of a PC-based image-guided surgical system. Computer Methods and Programs in Biomedicine 69, 211-224.

Strom, P., Kjellin, A., Hedman, L., Johnson, E., Wredmark, T., Fellander-Tsai, L. (2003). Validation and learning in the Procedicus KSA virtual reality surgical simulator. Surgical Endoscopy 17, 227-231.

Szalay, D., MacRae, H., Regehr, G., Reznick, R. (2000). Using operative outcome to assess technical skill. American Journal of Surgery 180, 234-237.

Taffinder, N., Darzi, A., Smith, S., Taffinder, N. (1999). Assessing operative skill: needs to become more objective. BMJ 318, 887-888.

Taffinder, N., Sutton, C., Fishwick, R.J., McManus, I.C., Darzi, A. (1998). Validation of virtual reality to teach and assess psychomotor skills in laparoscopic surgery: results from randomised controlled studies using the MIST VR laparoscopic simulator. Studies in Health Technology and Informatics 50, 124-130.

Teague, R.C., Gittelman, S.S., Park, O. (1994). A review of the literature on part-task and whole-task training and context dependency (ARI Technical Report 1010). Alexandria, VA: U.S. Army Research Institute for the Behavioural and Social Sciences.

Tracey, M.R., Lathan, C.E. (2001). The interaction of spatial ability and motor learning in the transfer of training from a simulator to a real task. Studies in Health Technology and Informatics 81, 521-527.

Treat, M. (1996). A surgeon's perspective on the difficulties of laparoscopic surgery. In: Computer-Integrated Surgery (Taylor, R.H., Lavallee, S., Burdea, G.C., Mosges, R., eds.), MIT Press, Cambridge, MA, 559-560.

Verner, L., Oleynikov, D., Holtmann, Haider, Zhukov, L. (2002). Measurements of the level of surgical expertise using flight path analysis from da Vinci robotic surgery system. http://webmedia.unmc.edu/medicine/morien/mis/FlightAnalysis.pdf Last accessed January 21, 2004.

Vuilleumier, H., Halkic, N. (2003). Implementation of robotic laparoscopic cholecystectomy in a university hospital. Swiss Medical Weekly 133, 347-349.

Way, L.W., Stewart, L., Gantert, W., Liu, K., Lee, C.M., Whang, K., Hunter, J.G. (2003). Causes and prevention of laparoscopic bile duct injuries: analysis of 252 cases from a human factors and cognitive psychology perspective. Annals of Surgery 237, 460-469.

Wentink, M., Stassen, L.P., Alwayn, I., Hosman, R.J., Stassen, H.G. (2003). Rasmussen's model of human behavior in laparoscopy training. Surgical Endoscopy 17, 1241-1246.

Williams, A.C., Jr., Flexman, R.E. (1949). Evaluation of the school link as an aid in primary flight instruction. University of Illinois Bulletin 46(71) (Aeronautics Bulletin No. 5). University of Illinois.

Woltring, H.J. (1986). A Fortran package for generalized cross-validatory spline smoothing and differentiation. Advances in Engineering Software 8(2), 142-151.

Wu, H., Siegel, M., Stiefelhagen, R., Yang, J. (2002). Sensor fusion using Dempster-Shafer theory. IEEE Instrumentation and Measurement Technology Conference, Anchorage, AK, USA, May 21-23, 2002.

Yamauchi, Y., Yamashita, J., Morikawa, O., Hashimoto, R., Mochimaru, M., Fukui, Y., Uno, H., Yokoyama, K. (2002).
Surgical skill evaluation by force data for endoscopic sinus surgery training system. Lecture Notes in Computer Science 2488, 44-51.

Zeyada, Y., Hess, R.A. (2000). Modeling human pilot cue utilization with applications to simulator fidelity assessment. Journal of Aircraft 37, 588-597.

Appendix A
OR Study Experimental Protocol and Data Acquisition Procedures

A.1 Experimental Protocol

Attending Surgeons: Dr. Alex Nagy, Dr. Neely Panton
Surgical Residents: Dr. Ed Chang, Dr. Naisan Garraway, Dr. Kathy Hsu
Researchers: Joanne Lim, Catherine Kinnaird, Iman Brouwer (stand-by), Sayra M. Cristancho (stand-by)
Location: University of British Columbia Hospital
Procedure: MIS cholecystectomy

Study Protocol:
This is the protocol for the laparoscopic surgery performance evaluation study of the three specified surgical residents performing a MIS cholecystectomy, with an attending surgeon available. Before the patient arrives in the OR, all the equipment is checked and initialized. The resident is asked to scrub and enter the OR while the patient is being anesthetized. One of the researchers also scrubs, in order to affix Opsite™ and Mepore™ to the force/torque sensor mounted on the modified laparoscopic surgical tool. This is to prevent foreign liquids and substances from contaminating the force/torque sensor. The researcher remains scrubbed to be available to make any adjustments to the tool, and to hand off the wires from the surgical tool to the other researcher outside of the sterile field.

When the modified tool is ready to be used in the surgery, as noted by the attending surgeon or resident, the researcher outside the sterile field performs a test to ensure the motion capture equipment is functioning. Once the equipment is tested and confirmed operational, the researcher informs the surgeon that they are ready to begin recording. The researcher begins collecting data from the sensors and begins recording the operation using both the external video camera and the laparoscopic camera. The equipment records for the entire surgery. When the laparoscopic portion of the surgery is completed, the surgeon informs the researcher, and data collection can be stopped.

As the patient is being sutured, if possible and not too intrusive, the scrubbed researcher holds and manipulates the surgical tool in various positions for synchronization purposes. The surgical tool is held in a horizontal position, and a vertical movement is made to strike the tool against a hard surface (e.g., the surgical bed). This movement is recorded on both kinematics sensors and is used for time synchronization. The tool handles are also squeezed while the tool is moved, to aid in force synchronization. After these calibrations are completed, all the systems are shut down.

If the surgeon feels uncomfortable using the modified laparoscopic tool at any time during the surgery, they are free to stop the experiment and return to using the traditional non-modified surgical tool. Approval for this experiment was granted through the University of British Columbia Clinical Research Ethics Board and the University of British Columbia Hospital. The Sterile Supply Department and the Biomedical Engineering Department approved all equipment and instrumentation.
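The time synchronization in this study was performed manually and visually. As a rough illustration of how the recorded strike could be used to automate it, the Matlab sketch below finds the lag that best aligns the tap transient in the two kinematics streams. The function name and the assumption that both streams have been resampled to the same rate fs over the same interval (equal lengths) are hypothetical.

```matlab
% Illustrative strike-based synchronization: the tool tap appears as a
% sharp transient in both sensors' vertical position streams z1 and z2
% (assumed equal length, common sample rate fs); the lag maximizing
% their cross-correlation estimates the clock offset.
function offset_s = strike_sync_offset(z1, z2, fs, max_lag_s)
    a = diff(z1(:));  a = a - mean(a);      % emphasize the tap transient
    b = diff(z2(:));  b = b - mean(b);
    L = round(max_lag_s * fs);
    best_c = -Inf;  best_lag = 0;
    for lag = -L:L                           % brute-force lag search
        if lag >= 0
            c = sum(a(1+lag:end) .* b(1:end-lag));
        else
            c = sum(a(1:end+lag) .* b(1-lag:end));
        end
        if c > best_c
            best_c = c;  best_lag = lag;
        end
    end
    offset_s = best_lag / fs;                % seconds stream 2 lags stream 1
end
```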
A.2 Equipment List

The following is the list of hardware and software components required for the OR experiments:

Hardware:
1 - Canon ZR60 digital video camcorder with Canon AC adaptor
1 - Mini DV cassette
1 - VHS video cassette
1 - Portable desk/trolley
1 - Polaris Tool Interface Unit
1 - Polaris Position Sensor cable
1 - Polaris power cable
1 - Polhemus Fastrak Interface Unit
1 - Polhemus Fastrak Transmitter
1 - Polhemus Fastrak power supply/cable
1 - ATI force/torque sensor MUX box
2 - Serial port cables
1 - Logitech web cam
1 - 6' USB extension cord
1 - Strain gauge power supply
1 - Strain gauge instrumentation amplifier
2 - Tripods
1 - PC, 2.4 GHz AMD Duron (tower, keyboard, mouse, monitor)
1 - Laptop PC (minimum 800 MHz)
1 - Digital camera
1 - Mepore™*
1 - OpSite™*
1 - 2 mm Allen key*
1 - 3 mm Allen key*
1 - 4 mm Allen key*
1 - Small scissors*
4 - Spare reflective balls for Polaris MDMA*
1 - Modified laparoscopic surgical tool (Maryland dissector)*
3 - Pieces of mounting bracket
1 - Polhemus Fastrak Receiver
1 - ATI Mini40 Force/Torque sensor
1 - Polaris Position Sensor (MDMA)
Nuts and bolts for attachment and mounting
* Equipment requiring sterilization

Software:
Windows 2000
Matlab 6.0 R12
Tera Term Pro V2.3
Logitech QuickCam V5.4.1
FTGUI
Matlab programs: PMCS.m

A.3 OR Procedure

The following are the procedures for data collection in the OR with the experimental hybrid tool.

A.3.1 Pre-operative Set-up

Required time: approximately 30-60 min with two people
Suggested start time: evening before surgery, or 0700h for an 0800h surgical start

Set up the equipment as shown in Figure A.1.

[Figure A.1: University of British Columbia operating room experimental set-up, showing the anesthesia cart, anesthetist, computer cart, modified surgical tool/researcher table, position sensor and tripod, video camera, MIS cart, and researcher positions.]

A.3.2 Pre-operative Set-up and Software Initialization

Required time: 10-30 min
Suggested start time: 7:00 for an 8:00 procedure

A.3.2.1 Start-up Procedure (Teraterm)

1) Run Teraterm
2) Turn on the Polaris Tool Interface Unit (wait 20 s for the beep; RESETBE6F will appear in the command window)
3) Type COMM50000 - Reply: OKAYA896
4) Teraterm window: Setup - Serial Port: change baud rate to 115200
5) Type: INIT_ then Enter (note: _ means space bar) - Reply: OKAYA896
6) Teraterm window: File - Exit

A.3.2.2 OR Data Collection Procedures

1) Start the digital video camcorder (focus the camera on the surgeon's arm and the experimental tool)
2) Start the MIS VCR to record the laparoscope
3) Start Matlab R12
4) Set the current directory to the folder where data is to be collected (e.g., OR_test3)
5) In the command window, type: PMCS
6) The graphical user interface will appear
7) Select the radio buttons for all faces
8) STOP: check equipment connections before initializing the Polaris
9) Press the Initialize Polaris button when ready; the Polaris will beep in acknowledgment
10) Wait until all status bars are illuminated in yellow (check for error messages in the command window)
11) When the surgeon indicates that she is ready to use the experimental tool, the sensor cords are plugged into the tool interface units under the surgical table.
12) On the laptop, start FTGUI
13) In FTGUI, select the "logging to port" radio button, and select the appropriate folder
14) In FTGUI, select the continuous data collection radio button
15) In FTGUI, push the Options button - Output data - Metric
16) In FTGUI, push the Options button - Hemispheres - set hemispheres to 0, 0, -1
17) In FTGUI, push Record Data
18) In PMCS, select the appropriate tool used by the surgeon (Tool 2 in our case)
19) Monitor the status bars and adjust equipment if necessary (e.g., camera). (Red status bar: missing marker; yellow status bar: loading tool identification files; green status bar: tracked marker)
20) At the end of the procedure, in FTGUI press Stop Data Recording, then press Stop Tracker, followed by Shut-down Polaris (MUST BE IN THIS ORDER)

Appendix B
Operational Definitions

B.1 Hierarchical Decomposition Operational Definitions (McBeth 2002)

A hierarchical decomposition, modified from Cao by McBeth, was used to organize the OR data and also to find tasks that are analogous in the simulators and the OR. The five-level decomposition describes the procedure in terms of surgical phases and stages, tool tasks and subtasks, and fundamental tool actions (Figure B.1). We are looking primarily at the Task and Subtask levels of the hybrid experimental surgical tool during dissection tasks.

[Figure B.1: Five levels of the hierarchical decomposition]

B.1.1 Phase Level

Phases are the fundamental levels of a procedure, forming the backbone and the foundation for further decomposition. A laparoscopic cholecystectomy procedure is divided into five distinct phases, as shown in Figure B.2. Each phase has a particular goal associated with it that must be accomplished in order to proceed to the next phase. This study dealt with the Stages, Tasks and Subtasks of the cystic duct and gallbladder dissections only. Future work may be able to incorporate all aspects of the procedure by having multiple tool tips.

[Figure B.2: Five phases of a laparoscopic cholecystectomy]

B.1.2 Stage Level

The phase levels are further divided into stages, which have goals, but the goals do not have to be successfully completed before proceeding to the next stage. The stages of the cystic duct and gallbladder dissection (CDD and GBD) phases are shown below (Figure B.3). All stage level definitions are based on video observations.

[Figure B.3: Stage level diagram for cystic duct dissection (CDD) and gallbladder dissection (GBD)]

B.1.3 Task Level

A task is a set of movements performed with a single tool to achieve a desired effect. A number of tasks may be required to successfully complete a stage within a procedure. A task segment is defined from the time the tool tip is placed in the distal end of the trocar until the tool is pulled out through the same trocar. We chose the dissection task to investigate.

B.1.4 Subtask Level

The subtask level defines how the experimental tool tip is moving inside the patient. The subtasks for a dissection task are shown in Table B.1.
Table B.1: Hierarchical subtask dissection definition

Free Space Movement - Approach*
  Definition: Tool is moving toward tissue upon entry into the trocar
  Start: Entry of the tool tip into the distal end of the trocar
  Stop: Tool tip in contact with tissue

Tissue Manipulation
  Definition: Tool is in contact with the tissue
  Start: Initial contact of the tool tip with the tissue
  Stop: Final contact of the tool tip with the tissue

Free Space Movement - Withdrawal
  Definition: Tool is moving away from the tissue, being pulled out of the trocar
  Start: Final contact of the tool tip with the tissue
  Stop: Exit of the tool tip from the distal end of the trocar

* tool moving in free space (no tissue manipulation)

B.1.5 Action Level

The action states are made up of 12 types of distinct tool movements. The action level was not examined in this study. It is possible for tool movements to be a combination or a collection of actions. There are a total of 72 feasible combinations of the 12 action states (McBeth 2002).

B.2 Performance Measure Definitions

The definitions of the performance measures presented in Chapter 4 are shown in Table B.2. The individual components of the performance measures are calculated by projecting the tool path vectors onto the tool axis vectors, which allows us to compare performance measures across different settings.

Table B.2: Kinematics and force performance measures (subscript i denotes the i-th sample; overbars denote means; dots denote time derivatives; subscript f denotes force components)

Distance from mean:
  Absolute (mm): $\sqrt{(x_i-\bar{x})^2+(y_i-\bar{y})^2+(z_i-\bar{z})^2}$
  Roll (rad): $Roll_i - \overline{Roll}$

Velocity:
  Axial (mm/s): $\dot{z}_i$
  Grasp (mm/s): $\dot{y}_i$
  Translate (mm/s): $\dot{x}_i$
  Transverse (mm/s): $\sqrt{\dot{x}_i^2+\dot{y}_i^2}$
  Absolute (mm/s): $\sqrt{\dot{x}_i^2+\dot{y}_i^2+\dot{z}_i^2}$
  Roll (rad/s): $\dot{Roll}_i$

Acceleration:
  Axial (mm/s²): $\ddot{z}_i$
  Grasp (mm/s²): $\ddot{y}_i$
  Translate (mm/s²): $\ddot{x}_i$
  Transverse (mm/s²): $\sqrt{\ddot{x}_i^2+\ddot{y}_i^2}$
  Absolute (mm/s²): $\sqrt{\ddot{x}_i^2+\ddot{y}_i^2+\ddot{z}_i^2}$
  Roll (rad/s²): $\ddot{Roll}_i$

Jerk:
  Axial (mm/s³): $\dddot{z}_i$
  Grasp (mm/s³): $\dddot{y}_i$
  Translate (mm/s³): $\dddot{x}_i$
  Transverse (mm/s³): $\sqrt{\dddot{x}_i^2+\dddot{y}_i^2}$
  Absolute (mm/s³): $\sqrt{\dddot{x}_i^2+\dddot{y}_i^2+\dddot{z}_i^2}$
  Roll (rad/s³): $\dddot{Roll}_i$

Force:
  Axial (N): $z_f$
  Grasp (N): $y_f$
  Translate (N): $x_f$
  Transverse (N): $\sqrt{x_f^2+y_f^2}$
  Absolute (N): $\sqrt{x_f^2+y_f^2+z_f^2}$
  Roll torque (N m): torque about the tool axis
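As an illustration of how these definitions map onto the recorded data, the Matlab sketch below computes several of the Table B.2 measures from a tool-tip trajectory already expressed in the tool-tip frame. The thesis differentiated the kinematics with GCV quintic splines (Woltring 1986); for brevity this sketch uses plain central differences, and all names are illustrative.

```matlab
% P is an n-by-3 matrix of tool-tip positions in the tool-tip frame
% (columns: x = translate, y = grasp, z = axial), sampled at rate fs (Hz).
function m = tip_measures(P, fs)
    v = colwise_deriv(P, 1/fs);            % velocity      (mm/s)
    a = colwise_deriv(v, 1/fs);            % acceleration  (mm/s^2)
    j = colwise_deriv(a, 1/fs);            % jerk          (mm/s^3)

    % Distance from mean position (absolute direction), mm
    Pbar = repmat(mean(P,1), size(P,1), 1);
    m.d_mean_abs = sqrt(sum((P - Pbar).^2, 2));

    % Directional and combined measures, per Table B.2
    m.vel_axial      = v(:,3);
    m.vel_transverse = sqrt(v(:,1).^2 + v(:,2).^2);
    m.vel_abs        = sqrt(sum(v.^2, 2));
    m.acc_abs        = sqrt(sum(a.^2, 2));
    m.jerk_abs       = sqrt(sum(j.^2, 2));
end

function D = colwise_deriv(X, dt)
    % Central-difference time derivative of each column.
    D = zeros(size(X));
    for c = 1:size(X,2)
        D(:,c) = gradient(X(:,c), dt);
    end
end
```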
Appendix C
University of British Columbia CREB Approval

Appendix D
Medicine Meets Virtual Reality Conference Submission

This document was submitted to, and presented in poster form at, The 11th Annual Medicine Meets Virtual Reality Conference in Newport Beach, California, USA, in January 2003.

Quantitative measures of transfer of training and validation of laparoscopic surgical simulators

Catherine Kinnaird1, Joanne Lim1, Antony J. Hodgson1 PhD, Alex G. Nagy2 MD, Karim Qayumi2 MD PhD, Lance Rucker3 DDS, Karon MacLean4 PhD
Departments of Mechanical Engineering1, Surgery2, Oral Health Sciences3, and Computer Science4, University of British Columbia, Vancouver, BC, V6T 1Z4, CANADA

Abstract

Objective measures of surgical performance in minimally invasive surgery are of interest to students, surgeons and the public alike. Current assessments of surgical performance in the operating room are subjective and potentially unreliable (Rosser, 1998). Surgical simulators have been recognized as a potential source of objective assessment. However, until these simulators have been shown to be a valid and reliable measurement source, their use in surgical education remains minimal.

The goal of this project is to use a multi-faceted approach to surgical assessment in the operating room, and to compare these measures to performance in analogous tasks on surgical simulators, both bench-top and virtual reality. In order to organize this research, a hierarchical decomposition of a laparoscopic cholecystectomy is used to divide the surgery into many component tasks (McBeth, 2002). The operating room assessment will utilize performance measures previously shown to be reliable, such as time and postural data (McBeth, 2002), as well as incorporating a new measure of force/torque and a new method to measure kinematics. The force/torque measures will use a similar approach to that of Rosen (Rosen, 2001), whereby a 3-dimensional load cell measures forces and torques on a laparoscopic tool. Postural data will be gathered using an optical tracking system. Finally, kinematics data will be gathered using a fusion of two types of sensors: optical and magnetic tracking systems. The optically tracked points will act as fixes, whereby the magnetic sensor data, with its faster update rates, will be fit to these optically tracked points.

One specific aspect of this project involves assessing the transfer of training from the simulator to the operating room. A control group and an intervention group, comprised of surgeons at various skill levels, will perform surgical tasks in the operating room. The intervention group will then receive simulator training. Both groups will be re-tested in the operating room. Comparisons between the groups using the aforementioned performance measures will then be used to assess the transfer of training effects.

Validation of laparoscopic surgical simulators is yet another component of this project. We want to quantify the subjective measure of face validity using the performance measures described. Through research with expert surgeons in the operating room and the two types of simulators, bench-top and virtual reality, we plan to quantitatively assess simulator validity, as well as establish a method to effectively assess other surgical simulators.

The novelty of this research lies in the multi-pronged approach to quantitatively assess surgical performance in the operating room in order to validate surgical simulators. Many of these measures have been used alone in previous work, but this would be the first time they have been combined in this way, as far as we know. This work is done in conjunction with the Center of Excellence for Surgical Education and Innovation (CESEI), which is organized by the University of British Columbia and the Vancouver Hospital and Health Sciences Center. The mission of the CESEI is to provide a multi-disciplinary academic educational center through the use of modern electronic technology. Validated simulators provide a potential source of training and certifying surgeons, as well as designing and evaluating tool designs.

References

McBeth, P. (2002). Thesis: A methodology for quantitative performance evaluation in minimally invasive surgery. University of British Columbia, Vancouver, BC, Canada.

Rosen, J., Hannaford, B., Richards, C.G., Sinanan, M.N. (2001). Markov modeling of minimally invasive surgery based on tool/tissue interaction and force/torque signatures for evaluating surgical skill. IEEE Transactions on Biomedical Engineering 48(5), 579-591.

Rosser, J.C., Rosser, L.E., Savalgi, R.S. (1998).
Objective evaluation of a laparoscopic surgical skill program for residents and senior surgeons. Archives of Surgery 133(2), 657-661.

Appendix E
Society of Gastrointestinal Endoscopic Surgeons (SAGES) Conference Submission

This document was submitted to, and presented in poster form (by Dr. Hamish Hwang) at, The 10th World Congress of Endoscopic Surgery in Denver, Colorado, USA, in April 2004.

Objective Multi-Modal Surgical Performance Analysis

Joanne Lim1, Catherine Kinnaird1, Antony J. Hodgson1 PhD, Alex G. Nagy2 MD, Karim Qayumi2 MD PhD
Departments of Mechanical Engineering1 and Surgery2, University of British Columbia, Vancouver, BC, V6T 1Z4, CANADA

Abstract

Objective measures of surgical performance in minimally invasive surgery are of interest to students, surgeons and the public. The goal of this project is to use a multi-faceted approach to surgical assessment in the operating room, and to compare these measures to performance in analogous tasks on surgical simulators. The operating room assessment will use performance measures such as time, tool tip forces and torques (newly added), and tool kinematics. A commercial 3-D load cell mounted on a laparoscopic tool measures the forces and torques. Strain gauges are mounted onto the tool handle to measure surgeon grip levels. Kinematics data is gathered using both optical and magnetic sensors, and the resulting data streams are fused to improve accuracy and reliability. This data fusion is done using a simple yet effective algorithm we have recently developed. The optical sensor data is regarded as extremely accurate, but it is subject to occlusion and has a comparatively low sampling rate. The magnetic data is acquired more frequently and is never occluded, so we fuse the magnetic data to the optical data for the entire data stream. This gives a complete set of data even when there are optical data gaps. As shown in the figure, the fused estimate is roughly 6-8X more accurate when optical data is missing than an estimate based on interpolating across the gap with optical data alone.

[Figure: RMS error for optical and fused data, plotted as a function of gap size]

The novelty and uniqueness of this research lies in the multi-pronged approach to quantitatively assessing surgical performance in the operating room. Although some of these measures have been used individually in previous work, to our knowledge they have not previously been combined in this fashion, nor have tool tip forces been measured throughout a live surgery.

Appendix F
Transfer of Training from Simulator to Operating Room

The original goal of this project was to study the issue of transfer of training from the simulator to the human operating room, as this is a subject that needs analysis in the surgical education and simulator arenas. However, due to many logistical nightmares, such as patient recruitment, scheduling, and many others, this project had to be converted to the study described in this manuscript.

F.1 Transfer of Training

The subject of transfer of training is widely studied in many fields, not only in surgery. Likely the most widely known research in this area is with flight simulators. The first, simple flight simulator was created around 1910 in France, where a young student pilot could practice simple controls in a smaller, modified type of plane (Moore 2002).
After many technological advances since that time, computer-based flight simulators have become commonly used in the training of pilots (Wentink 2003, Zeyada 2000). It is time the medical community also took a closer look at using surgical simulators in the training and credentialing of surgeons.

Studies have been conducted outside the flight training and surgical training venues to examine what transfer of training is and what affects it. There is also a difference between learning and training transfer that many do not realize: true transfer of training occurs when behaviour transfers between two distinct and novel situations, while learning occurs when behaviour transfers between two identical situations (Auffrey 2001). There are also the concepts of near and far transfer: near transfer occurs between nearly identical situations, while far transfer occurs between novel contexts. "True" training transfer is thought to occur quite rarely, and teaching is thought to be most useful when it is specific and practiced in an environment similar to the intended situation (Auffrey 2001). To encourage successful learning and training transfer, it is necessary to vary the conditions of practice, using part versus whole task methods (Auffrey 2001). Part methods focus on breaking the whole task into significant pieces, which are then practiced individually and explained in terms of how they fit into the whole task. Whole methods focus on repetition of the task as a whole. Whole methods are acceptable for simple tasks, while part methods should be used for individual difficulties or for complex, time-consuming tasks (Teague 1994).

The conditions of transfer of training include the generalization of knowledge and skills acquired in training, and the maintenance of that learning over time (Ford 1997). Three key factors can impact training outcomes and transfer: 1) training design, 2) trainee characteristics, and 3) work environment factors (Ford 1997). These researchers also see the need for multiple performance measures (other than self-report) to develop a more complete understanding of training transfer. It is reasonable to expect that an individual's personality might affect not only future performance but also the individual's enthusiasm to learn, the learning strategies used, the rate of skill acquisition and, of course, the transfer of training. Environmental factors such as support, work climate and opportunity are also important influences on training transfer (Ford 1997, Foxon 1993).

F.2 Assessing Transfer of Training

The assessment of the transfer of training from one environment to another is not a concept unique to surgical education. It has long been used in the flight training industry, as most if not all commercial pilots are trained and assessed in flight simulators. Early evaluations of flight simulators demonstrated both training effectiveness and cost-effectiveness (Flexman 1972, Williams 1949). This industry also takes advantage of a concept known as the transfer effectiveness ratio (TER) (Blaiwes 1984). A typical TER is 0.75, meaning that four hours of simulator training yields a three-hour reduction in in-flight training.
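In the form commonly used in the flight-training literature (stated here as general background rather than as Blaiwes's exact formulation), the TER is the in-aircraft training time saved per unit of simulator time:

$$\mathrm{TER} = \frac{Y_c - Y_x}{X}$$

where $Y_c$ is the in-flight training time needed to reach criterion without simulator training, $Y_x$ is the in-flight time needed with simulator training, and $X$ is the time spent in the simulator. The figures above are consistent with this definition: $\mathrm{TER} = 3\,\mathrm{h} / 4\,\mathrm{h} = 0.75$.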
F.3 Research Questions

We have formulated several research questions that revolve around our main objective. Answering these questions will give us a better view of what we are trying to investigate.

F.3.1 Do novices who practice in simulators get better in the OR?

"See one. Do one. Teach one." This traditional surgical education mantra was not far from the truth. A surgical student would spend time with an experienced surgeon in the operating room, observing and noting the particulars of surgery. The next step would be to try the surgery oneself, and then, logically, to teach the next batch of up-and-coming novice surgeons. As absurd as this sounds, it is the way many surgeons have learned their trade. It is obviously not an acceptable method of education, and things have started to change; most recently, surgical simulators have come to the forefront as a promising method for surgical education.

F.3.2 What do we want to know?

Do novice surgeons who practice in simulators show a significant improvement in the operating room? If a novice surgeon spends a given amount of time practicing on a simulator, will there be a quantifiable improvement in operating room performance compared with a similar novice who has had no simulator training? This, then, is the ultimate question: do novices who practice in simulators quantitatively improve their surgical performance in the operating room?

F.3.3 Why do we need to know?

Why is it important for surgical educators to find out whether training in a simulator transfers to the operating room? It has been shown that intra-operative assessments are subjective (Lentz 2001), whereas in the simulator the assessments are all objective and quantifiable. If a novice surgeon could do the majority of practice in a simulator, many thousands of dollars might be saved in operating room expenses, as mentioned earlier in this manuscript. It would be optimal for a novice to stay in the surgical simulator until all skills have been learned and practiced to an expert level, and only then move into the operating room. All the psychomotor and many of the cognitive skills would already be honed, and less operating room time would be needed to practice basic tasks. In the simulator, the skills would all be objectively assessed, as opposed to the subjective operating room assessments usually administered by the attending expert surgeon.

F.3.4 What we know

What do we know now? It has been shown that experienced surgeons perform better in simulators than novices (McNatt 2001), and that novices who practice in simulators do show improvement (Risucci 2001). Seymour and associates, in a breakthrough study in the human operating room, published one of the more recent and respected studies supporting the theory of transfer of training between simulator and operating room (Seymour 2002). They were one of the first groups to conduct a true transfer-of-skill study from simulator to clinical operating room. Surgical residents (PGY1-4) were randomly assigned to two groups: one group received virtual reality (VR) training in addition to standard training (ST), and the other group received only ST. All subjects completed a series of tests of visuo-spatial and perceptual abilities prior to training. Psychomotor abilities and VR training were tested on the Minimally Invasive Surgical Trainer-Virtual Reality (MIST-VR). All operative procedures were videotaped. Their measurements were based on explicitly defined, observable operative errors (eight defined errors) and length of time.
They found that the ST group made six times as many of the defined errors as the VR group. The ST group also spent more time completing the task, though this difference was not statistically significant. This study was one of the first to demonstrate that training in a simulator correlates with improved performance in the human operating room.

Grantcharov and associates published the most recent study of transfer of training to the operating room in 2004 (Grantcharov 2004). They investigated whether laparoscopic skills acquired in a VR simulator could be transferred to operations, which would also validate the role of VR simulation as a tool for surgical skills training. The study participants were 20 surgeons with limited laparoscopic cholecystectomy experience (0-8 procedures, median 4.5). All participants performed a baseline laparoscopic cholecystectomy under the supervision of an experienced surgeon. The trainees were then randomized to receive either VR training or no additional training (control group). The VR group trained on the MIST-VR, performing 10 repetitions of all 6 tasks (of progressive complexity) available in the system. Within 14 days of the baseline laparoscopic cholecystectomy, all participants performed another laparoscopic cholecystectomy. These procedures were videotaped and assessed by two senior surgeons using predefined rating scales. The results show that the VR-trained group performed the laparoscopic cholecystectomy faster than the control group, and showed improvements in error score and economy of movement. The limitations of this study, as noted by the authors, include the subjectivity of the OR performance scoring (although this was minimized by defining objective and easily assessed scoring criteria) and the small sample size. This study further supported the idea of transfer of training between simulator and operating room.
