Open Collections

UBC Theses and Dissertations

A methodology for quantitative performance evaluation in minimally invasive surgery - McBeth, Paul Bradley (2002)


Full Text

A METHODOLOGY FOR QUANTITATIVE PERFORMANCE EVALUATION IN MINIMALLY INVASIVE SURGERY

by

PAUL BRADLEY MCBETH

B.Sc., The University of Calgary, 1999

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES (Department of Mechanical Engineering)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

February 2002

© Paul Bradley McBeth, 2002

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Mechanical Engineering
The University of British Columbia
Vancouver, Canada
Date: March 6

Abstract

The objective of this work is to establish a methodology for reliably quantifying the performance of an expert surgeon in laparoscopy, with the long-term goal of validating surgical simulations. A validated simulation will allow us to quantitatively assess surgeon performance and evaluate new tool designs. Quantitative performance and skill assessments are critical for evaluating the progress of surgical residents and the efficacy of different training programs. Current evaluation methods are subjective and potentially unreliable, so there is a need for objective methods to evaluate surgical performance.
We identify a feasible method to measure kinematic and postural data in the live operating room setting. We used an optoelectronic motion analysis system to acquire postural data and tool tip trajectories of one expert surgeon over a period of four months. To assess reliability of performance measures, we created a hierarchical decomposition diagram describing the procedure in terms of surgical tasks, tool sequences and fundamental tool actions. Using tool tip kinematic data and postural data, we extracted characteristic measures of performance and compared these measured distributions using the Kolmogorov-Smirnov statistic. For the most part, our performance measures (with the exception of kinematic measures) show consistent reliability over time by a trained surgeon and little effect from patient variability, and so are likely reliable measures of performance. An expanded set of reliable kinematic measures will form the basis for quantifying surgical skill and should be useful in validating surgical simulations for use in training, certifying surgeons and designing and evaluating new surgical tools.
Table of Contents

Abstract ii
Table of Contents iii
List of Tables vii
List of Figures viii
Acknowledgements xi

Chapter 1 - Introduction and Literature Review 1
1.1 Introduction and Objectives 1
1.2 Overview of Minimally Invasive Surgery (MIS) 2
1.3 Limitations in Minimally Invasive Surgery 4
1.4 Performance Evaluation 5
1.5 Training and Simulators 6
1.6 Simulator Validation 8
1.7 Ergonomics 9
1.8 Analysis of Performance Measures 10
1.9 Kinematic Assessment Tools 11
1.10 Research Questions 12

Chapter 2 - Quantitative Methodology of Evaluating Surgeon Performance in Laparoscopic Surgery 14
2.1 Introduction 14
2.2 Methods 15
2.2.1 Framework 15
2.2.1.1 Project Methodology 15
2.2.1.2 Hierarchical Decomposition 20
2.2.1.3 Performance Measures 21
2.2.1.3.1 Time 22
2.2.1.3.2 Kinematics 22
2.2.1.3.3 Ergonomic / Postural 23
2.2.1.3.4 Event Sequencing 24
2.2.1.4 Difference Measures 24
2.2.1.4.1 Kolmogorov-Smirnov Statistic 24
2.2.1.4.2 Sequencing Delta Measure 26
2.2.1.5 Evaluating Reliability 27
2.2.1.5.1 Procedural Specific Variability 28
2.2.1.5.2 Context Comparisons 30
2.2.1.6 Clinical Interpretation 31
2.2.2 Experimental Protocol and Data Acquisition and Processing 32
2.2.2.1 Equipment 32
2.2.2.2 Protocol 34
2.2.2.3 Missing Data and Signal Processing 35
2.2.2.4 Joint Angle Processing 36
2.2.3 Reliability Analysis - Clipping Task 37
2.2.4 Context Comparison - Clipping Task 37
2.2.5 Clinical Interpretation - Clipping Task 38
2.3 Results 38
2.3.1 Feasibility 38
2.3.2 Reliability 40
2.3.3 Context Comparisons 46
2.3.4 Clinical Interpretation 48
2.4 Discussion 48
2.4.1 Data Acquisition 49
2.4.2 Hierarchical Decomposition 49
2.4.3 Reliability Analysis 50
2.4.3.1 Procedural Specific Variability 50
2.4.3.2 Context Comparisons 52
2.4.4 Clinical Utility 53
2.4.5 Future Work 55
2.5 Conclusion 56

Chapter 3 - Repeatability of Biomechanical Stresses in Laparoscopic Surgery 57
3.1 Introduction 57
3.2 Methods 58
3.2.1 Equipment 58
3.2.2 Kinematic Arm Model, Calibration and Joint Angle Calculations 58
3.2.3 Comparative Analysis (KS Statistic) 59
3.2.4 Ergonomic Stress Scoring System 60
3.2.4.1 Modified Discrete RULA Method 62
3.2.4.2 Modified Continuous RULA Method 62
3.2.5 Post-Operative Calculations 64
3.3 Results 64
3.3.1 Marker Visibility 64
3.3.2 Comparative Analysis 65
3.3.2.1 Measurement Errors 66
3.3.2.2 Distribution Profiles 69
3.3.2.3 Patient / Procedure Factors 71
3.3.2.4 Difference Measures 72
3.3.3 Posture Scores 73
3.4 Discussion 76
3.4.1 Tool Tracking 76
3.4.2 Joint Angle Comparison 76
3.4.3 Biomechanical Stress Levels 77
3.4.4 Future Work 78
3.5 Conclusions 79

Chapter 4 - Conclusions and Recommendations 80
4.1 Introduction 80
4.2 Review of Present Research 80
4.2.1 Motion Capture System Feasibility 80
4.2.2 Performance Measure Reliability for the Assessment of Surgical Performance in the OR 83
4.2.3 Repeatability of Ergonomic Measures in MIS Surgery 84
4.2.4 Overall Performance Assessment 85
4.3 Future Research Recommendations 87
4.3.1 Motion Capture Measurements 87
4.3.2 Performance Measures and Statistical Analysis 88
4.4 Future Studies 88
4.4.1 Simulator Validation 89
4.4.2 Surgical Skills Assessment 90

List of Terms 92
Bibliography 95
Appendix A - OR Study Experimental Protocol and Data Acquisition Procedures 102
Appendix B - Measurement Equipment and Data Acquisition Protocols for use in the Operating Room 111
Appendix C - Signal Processing and Data Formatting for Optoelectronic Marker Array data from the OR 116
Appendix D - Operational Definitions 124
Appendix E - Preliminary Selection of Performance Measures and Development of Comparative and Reliability Analysis Techniques 132
Appendix F - Collection of Performance Measure Analysis Results 142
Appendix G - BioRome Conference Submission 155
Appendix H - Medicine Meets Virtual Reality Conference Submission 162
Appendix I - S.A.G.E.S. Conference Submission 170

List of Tables

Table 2.1 - Summary of data available from each procedure 39
Table 3.1 - General comments regarding each procedure 71
Table 3.2 - The average time taken for the major surgical tasks 74

List of Figures

Figure 1.1 - The use of reliable measures of surgical performance 2
Figure 1.2 - Illustration of minimally invasive surgery 3
Figure 1.3 - Reduced DOF of motion of the MIS tool tip 4
Figure 1.4 - Illustrations of commonly used surgical simulators 7
Figure 1.5 - The Kolmogorov-Smirnov statistic 10
Figure 2.1 - Typical uses of the comparative analysis methodology 16
Figure 2.2 - A demonstration of our validation assessment technique 19
Figure 2.3 - Laparoscopic cholecystectomy hierarchical decomposition 21
Figure 2.4 - The tool tip and x,y,z frame of the MDM Array 23
Figure 2.5 - The Kolmogorov-Smirnov statistic 25
Figure 2.6 - Example of a state transition diagram and probability transition matrix 26
Figure 2.7 - Performance measure reliability analysis 29
Figure 2.8 - Global performance evaluation measures for a clipping task 31
Figure 2.9 - Data collection components 33
Figure 2.10 - Model of the surgeon's arm 34
Figure 2.11 - Graphical representation of missing data interpolation 40
Figure 2.12 - Stage transition diagram for the subtask sequence 41
Figure 2.13 (a) - Completion time reliability analysis 43
Figure 2.13 (b) - Completion time reliability analysis 43
Figure 2.14 (a) - Joint angle distribution reliability analysis 44
Figure 2.14 (b) - Joint angle distribution reliability analysis 44
Figure 2.15 (a) - Similarity sequence of clipping subtasks 45
Figure 2.15 (b) - Similarity sequence of clipping subtasks 45
Figure 2.16 (a) - Kolmogorov-Smirnov difference statistic of completion time performance measures 46
Figure 2.16 (b) - Kolmogorov-Smirnov difference statistic of kinematic performance measures 47
Figure 2.16 (c) - Kolmogorov-Smirnov difference statistic of mean membership action variables 47
Figure 2.17 - Performance measure reduction for a clipping task 48
Figure 2.18 - Example of measuring the performance of a new surgeon by comparing performance in a single procedure to a reference database from our expert surgeon 54
Figure 2.19 - Resolution of completion time 55
Figure 3.1 - The Kolmogorov-Smirnov statistic 60
Figure 3.2 - R.U.L.A. posture zones. Zero reference positions are also shown 61
Figure 3.3 - Discrete and continuous modified RULA posture scores 63
Figure 3.4 - The values labeled on each marker represent the percentage of time it was tracked over all seven procedures after interpolation 65
Figure 3.5 - Graphical representation of cumulative probability distributions of each joint and for each procedure 66
Figure 3.6 - Graphical representation of the cumulative probability distributions of wrist ulnar / radial deviation 67
Figure 3.7 - Variations in neutral posture measurements found experimentally in the lab and from the OR data 68
Figure 3.8 - Graphical representation of the cumulative probability distributions of elbow flexion and extension 69
Figure 3.9 - Graphical representation of normalized cumulative probability distributions of each joint and for each procedure 70
Figure 3.10 (a) - Graphical representation of cumulative probability distributions correlating procedures to groupings of joint angle distributions 71
Figure 3.10 (b) - Graphical representation of cumulative probability distributions showing types of variability from procedure to procedure 72
Figure 3.11 - Benchmarking the difference measures using the K-S statistic on a difference scale 73
Figure 3.12 - Cystic duct dissection (CDD) stress levels for each joint angle normalized and averaged over seven procedures, including the continuous and discrete NWPSS 74
Figure 3.13 - Gallbladder dissection (GBD) stress levels for each joint angle normalized and averaged over seven procedures, including the continuous and discrete NWPSS 75
Figure 3.14 - Gallbladder removal (GBR) stress levels for each joint angle normalized and averaged over seven procedures, including the continuous and discrete NWPSS 75
Figure 3.15 - Shifted continuous and discrete NWPSS averaged over seven procedures for each joint in each surgical phase 78

Acknowledgements

I would like to thank all of those who have supported me throughout the course of this project. I am grateful to my supervisor, Dr. Antony Hodgson, for his knowledge, insight and enthusiasm. I am particularly grateful to Dr. Alex Nagy, for sharing with me his wealth of knowledge in the area of laparoscopy and for putting up with relentless OR experiments and seemingly painful joint calibrations. I would like to thank Dr. Karim Qayumi for his assistance in helping analyze data and providing clinical interpretation. Many thanks to Jeff Stanley and others from Northern Digital Inc. (Waterloo, ON) for their assistance with the Polaris, and to Mona Charman, Betty Barns, and all the nurses and staff at Vancouver Hospital for their cooperation and interest in my project.

I would like to thank all my colleagues of the Neuromotor Control Laboratory. I am indebted to many of them, and especially grateful to Cam Shute, Scott Illsley, Chris Plaskos, and Willem Astma. Cam, thank you for reminding me of my priorities and keeping things in perspective; those days of skiing and climbing always seemed to bring clarity. Scott, thank you for getting up early on those rainy winter mornings to help with my OR experiments. I could not have done it without you. Chris, thank you for reminding me of the subtleties of life and what academia is all about. Willem, thank you for all those enlightening philosophical discussions on just about everything. Many thanks to all of my friends with whom I bounced ideas around or to whom I may have whined on occasion, and to everyone else who helped me along the way. Most of all, I would like to thank my family, and especially my parents, for their support and absolute confidence in me. This thesis is dedicated to them; without them, none of this would have been possible.
This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).

Chapter 1 - Introduction and Literature Review

1.1 Introduction and Objectives

Over the past decade, minimally invasive surgery (MIS) has become an important part of modern medicine. New surgical techniques and protocols and developments in surgical equipment have allowed many conventionally open procedures to be done using MIS techniques. Although MIS has been widely adopted over the past decade, many of these new procedures are more stressful for the surgeons. These procedures generally take longer and require surgeons to use awkward postures, and the tools limit surgeons' dexterity and range of motion. Surgical simulators are playing an increasingly important role in training and credentialing surgeons and have a potentially significant role to play in designing and evaluating new surgical equipment. As simulators come to serve a more crucial role in the early training of a surgeon, it is important to ensure that the skills trainees learn are directly relevant to the practice of surgery in the operating room (OR). This is important because, given the comparative ease of making objective measurements in a simulator, performance there is used as a significant component in evaluating or certifying surgeons. Also, if we can show that tools are used in essentially the same way in the simulator as in the OR, then we can improve and test new tool designs before introducing them into the OR. The long-term goal of this research is to establish a methodology for quantifying the validity of surgical simulators for the purpose of evaluating new tool designs and assessing skill and performance of surgeons (see Figure 1.1). We would like to show that time spent practicing motor skills in surgical simulators translates into quantifiable improvements in surgical skill in the OR and that performance in the simulator is a reliable measure of surgical skill.
Furthermore, we would like to show quantifiable differences between different tool designs. To do this, we propose to identify corresponding measures of performance in the OR and surgical simulators.

Figure 1.1 - The use of reliable measures of surgical performance in simulator validation, surgeon certification and equipment evaluation and design.

The focus of the research presented in this document is to identify ways of measuring the performance of an expert surgeon during laparoscopic cholecystectomies (gallbladder removal). The goal is to identify reliable measures of performance and to benchmark these measures in a non-stereotyped complex motor task performed by an expert surgeon in the OR. The benchmarked performance measures will be useful for establishing the level of performance of an 'expert surgeon' and for identifying performance measures which are reliable enough to use in simulator validation and in evaluating new surgical equipment.

1.2 Overview of Minimally Invasive Surgery (MIS)

The concept of minimally invasive surgery (MIS) is not new; in fact, the idea has been around since the beginning of the 20th century (Nagy 1992). However, it wasn't until the development of fiber optic light sources in the 1960s that the potential for laparoscopy was realized. Further developments in camera technology and tool design have led to widespread use of laparoscopic procedures since the late 1980s. Since that time, MIS has advanced into an important part of modern general surgery. In the United States, 60-80% of all abdominal surgical procedures are now conducted using MIS techniques (Taylor 1995). Figure 1.2 shows a typical operating room setup for laparoscopic procedures.

Figure 1.2 - Illustration of minimally invasive surgery.
A typical OR layout during a laparoscopic procedure. (Source: www.laparoscopy.com)

The development of MIS techniques has resulted in significant benefits for patients. For example, in comparison with open techniques, laparoscopy offers the patient (1) minimal scarring, (2) reduced hospital stays, (3) reduced postoperative pain, and (4) faster recovery (Treat 1996, Perissat 1995). Patients are often discharged from the hospital one or two days after a procedure, and complication rates for laparoscopic cholecystectomy are low, ranging from 2% to 11% (Catalano 1997). Surgeons, on the other hand, must deal with limitations in their ability to perform relatively simple motor tasks. Commonly used laparoscopic tools limit surgeons' dexterity and range of motion, thus forcing them to use awkward postures to carry out relatively simple tasks (Person 2001). Laparoscopic instruments are considered by many surgeons to be difficult to use and "non-user-friendly" (Treat 1996). These problems, coupled with increased operative times (approximately 30% longer than standard open procedures (Treat 1996)), have caused the use of laparoscopy to be restricted primarily to comparatively simple procedures such as cholecystectomies and hernia repairs.

1.3 Limitations in Minimally Invasive Surgery

In laparoscopy, the reduction in the number of degrees of freedom of the surgical tool limits the surgeon's dexterity and range of motion (Tendick 1995). A laparoscopic tool has only 4 DOF, as compared to the 6 DOF of an open tool. The motion of the tip is constrained to pitch, yaw, roll, and plunge, as shown in Figure 1.3 (Person 2000). Not only does the surgeon experience motion reversal with the tools, but they also experience changes in the location of the fulcrum as the tool is moved in and out of the abdominal cavity (Treat 1996). Surgeons have an indirect view of the surgical site via a camera and a 2D video display.
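The 4-DOF fulcrum constraint described above can be illustrated with a small forward-kinematics sketch. The parameterization and function below are illustrative assumptions, not taken from this thesis: with the port fixed, only pitch, yaw, and plunge move the tip, while roll about the shaft axis leaves the tip position unchanged.

```python
import numpy as np

def tool_tip_position(pitch, yaw, plunge, port=np.zeros(3)):
    """Tip position of a laparoscopic tool pivoting about a fixed port.

    The entry port removes two translational DOF: the tip can only pitch
    and yaw about the entry point, roll about the shaft axis (which does
    not move the tip), and plunge along the shaft direction.
    """
    # Unit vector along the tool shaft, expressed via pitch/yaw angles
    # (an assumed convention; the thesis does not specify one).
    direction = np.array([
        np.cos(pitch) * np.sin(yaw),
        np.sin(pitch),
        -np.cos(pitch) * np.cos(yaw),  # shaft points into the abdomen
    ])
    return port + plunge * direction

# With zero pitch and yaw the tip lies straight below the port,
# at a depth equal to the insertion (plunge) distance.
tip = tool_tip_position(pitch=0.0, yaw=0.0, plunge=0.10)
```

Because the handle lies on the opposite side of the fulcrum from the tip, any handle motion maps to a mirrored tip motion in this model, which is the motion reversal noted above.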
The lack of depth perception places limits on difficult maneuvers such as suturing and knot tying. Tactile feedback is significantly reduced in laparoscopic surgery, which often leaves the surgeon guessing at various aspects of the surgery. These drawbacks dissuade surgeons from undertaking more complex procedures using MIS techniques and so have prompted research to improve surgeon performance and efficiency during MIS procedures.

Figure 1.3 - Reduced DOF of motion of the MIS tool tip. DOF are roll, pitch, and yaw about the fulcrum created by the entry portal and plunge through the port. (Source: Tendick 1995)

Laparoscopic surgery is considered technology-dependent and costly (Straface 1995). Limitations in video system technology and the dexterity of the tools cause difficulties in laparoscopy (Hunter 1997, Pellegrini 1997, Way 1995). New technologies are being developed to address limitations of conventional laparoscopic surgery, including improved vision systems (Tendick 1995), multifunctional and more dexterous instruments with force feedback (Dautzenberg 1995, Faraz 1997, Herder 1997, Mukherjee 1996), robotic camera systems (Finlay 1995, Sackier 1994, Taylor 1995) and teleoperators (Green 1995, Neisius 1995, Ohgami 1998, Ottensmeyer 1996, Payandeh 1996, Rovetta 1996).

1.4 Performance Evaluation

A means of assessing performance and skill in MIS procedures is potentially useful for surgeon trainee evaluation, simulator validation, and surgical tool evaluation. Current evaluation methods are subjective and potentially unreliable, so there is a need for objective and standardized methods to evaluate surgical performance (Rosser 1998, Winckel 1994, Lentz 2002, Chung 1998). Structured skills assessment forms represent the gold standard for surgical evaluation. These forms allow comprehensive intraoperative performance measurement, which addresses both the planning level and psychomotor skills of a surgeon.
Winckel used this method to show the difference between junior and senior residents (Winckel 1994). Eubanks used a similar procedure to evaluate videotapes of a procedure (Eubanks 1999). However, patient variability and the stressful environment of the OR make it difficult to quantify the level of technical surgical skill based solely on these evaluation methods. It is also difficult to identify the level of a surgeon's technical surgical skills because these are not explicitly quantified during a structured assessment using the same kinds of measurements typically made in simulators (e.g., task completion time, precision measures, etc.).

The efficiency of a procedure can be evaluated in terms of speed and quality. Some aspects of performance are easy to quantify while others are difficult. The time required to perform a procedure is easy to measure, and has been used by several research groups in an attempt to evaluate surgical performance (Chung 1998, Derossis 1998, Fried 1999, Hanna 1998, Hodgson 1999, Keyser 2000, Rosser 1997, Starkes 1998, Taffinder 1999).

A number of measures of quality have been suggested, some of which are more easily measured than others. Fried investigated surgeon performance in both simulated tasks and animal models (Fried 1999). His assessment is based on a penalty scoring system in which scores are assigned for various errors executed during the test, such as tool tip deviations, presence of bleeding, and excess suturing. Joice's team investigated surgical performance using a Human Reliability Analysis to identify inter- and intrastep errors. These are errors made at the procedural and motor skill levels. These types of analyses are useful and can co-exist with our techniques. One valuable approach for performance evaluation was recently demonstrated on a porcine model and is based on an analysis of tool-tip force/torque signatures (Rosen 2001).
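The Markov-model idea behind this force/torque-signature approach can be sketched in miniature: estimate a transition matrix over low-level tool-action states for each reference group, then classify a new sequence by which model assigns it the higher likelihood. The states, sequences, and add-one smoothing below are invented for illustration and are not taken from the cited study.

```python
import numpy as np

# Hypothetical low-level tool-action states (not Rosen's actual set).
STATES = ["idle", "grasp", "pull", "spread"]

def transition_matrix(seq, states=STATES):
    """Estimate a first-order Markov transition matrix from a state sequence."""
    idx = {s: i for i, s in enumerate(states)}
    counts = np.ones((len(states), len(states)))  # add-one smoothing
    for a, b in zip(seq, seq[1:]):
        counts[idx[a], idx[b]] += 1
    return counts / counts.sum(axis=1, keepdims=True)  # rows sum to 1

def log_likelihood(seq, P, states=STATES):
    """Log-probability of a sequence's transitions under transition matrix P."""
    idx = {s: i for i, s in enumerate(states)}
    return sum(np.log(P[idx[a], idx[b]]) for a, b in zip(seq, seq[1:]))

# Reference models built from labeled sequences (made-up training data),
# then a new sequence is classified by comparing model likelihoods.
expert_P = transition_matrix(["idle", "grasp", "pull", "idle", "grasp", "pull"])
novice_P = transition_matrix(["idle", "grasp", "idle", "grasp", "spread", "idle"])
new_seq = ["idle", "grasp", "pull", "idle"]
label = ("expert" if log_likelihood(new_seq, expert_P) >
         log_likelihood(new_seq, novice_P) else "novice")
```

In practice the real models are trained on many labeled procedures, and the state labels come from a structured classification of tool-tissue interactions rather than being hand-assigned.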
The study uses a Markov modeling technique in conjunction with a structured classification of tool actions to classify and assess surgeon performance. They showed that they could correctly classify surgeons into two experience levels based on the similarity of the Markov models representing their low-level tool-tissue interactions to models derived from reference groups representing the two experience levels. Rosen identifies other feedback data, such as tool position, as likely alternative or supplementary bases of performance evaluation.

1.5 Training and Simulators

Success in laparoscopy is highly dependent on the surgeon's proficiency and experience (Perissat 1995). Apprenticeship training is the most common method of providing experience and improving proficiency of new surgeons. Unfortunately, the surgical trainer cannot control which patients enter the hospital, potentially limiting a trainee's exposure to a range of surgical problems. Training on animal models may avoid some of the stresses and time constraints of apprenticeship training, although the anatomy often differs from that of humans, one cannot often replicate a disease state in the animal, and maintaining an animal care facility is expensive. There are also moral and ethical concerns related to training on live animals. Some places, for example the United Kingdom, have banned the use of animals for surgical training (Lirici 1997). Surgical simulators are becoming more widely used, as they can represent a wide range of physical models and virtual reality computer simulations. Recently, virtual reality systems with haptic interfaces have been developed. Simulators are designed to emphasize either or both of psychomotor skills (e.g., suturing drills) and the more cognitive, process-related aspects of surgery (e.g., the CyberPatient process simulator program). Figure 1.4 shows examples of simulators commonly used.

Figure 1.4 - Illustrations of commonly used surgical simulators.
Computer-based virtual reality systems (left), anatomical models (middle), and bench-top simulators (right). (Source: www.laparoscopy.com)

Surgical training programs are moving towards a more structured curriculum in which surgical models and simulators play a more important role, both in developing surgical skill and in evaluating and certifying trainees. To increase the acceptance and integration of simulators into training programs, it is important to show that time spent on simulators can actually replace OR experience. For the evaluation and certification of trainees on simulators, it is important to show that a performance evaluation based on a simulated task strongly correlates with the appropriate aspects of live surgical performance. It is crucial to ensure that the lessons learned on the simulator and the measurements of technical skill made there are truly relevant and applicable to performance in the operating room.

1.6 Simulator Validation

A validated surgical simulation is important for (1) surgical skill training, assessment and certification, and (2) equipment evaluation and design. The challenge in validating a surgical simulation is to guarantee that performance results on the simulator represent performance in the OR. For the purpose of surgical skill training, assessment and certification, the value of a simulation depends on how closely it captures the elements of perceptual-motor and cognitive skills used in an actual task (Sanders 1991). To validate a simulation, we must ensure that a surgeon treats the simulation, in as many relevant and measurable respects as possible, the same way they treat a live patient. A number of research groups are attempting to validate surgical simulators, primarily for the purpose of training and evaluating new surgeons. Most groups select some mechanical manipulation tasks which they consider comparable to real surgical tasks.
Completion time is the most common measure of performance (Chung 1998, Derossis 1998, Fried 1999, Hanna 1998, Hodgson 1999, Rosser 1997, Starkes 1998, Taffinder 1999). Some groups have used error and precision measures as methods to quantitatively assess performance (Derossis 1998, Fried 1999, Hanna 1998, Keyser 2000, Taffinder 1999). It has typically been shown that more experienced surgeons score better on the trainers than novice surgeons (O'Toole 1998). Comparatively little work has been done to quantitatively show the relationship between performance on simulators and performance in the OR. Current methods for establishing correspondence between simulated and actual OR tasks are poor. The authors tend to agree that the chosen simulated tasks are relevant to OR tasks and that performance in simulated and real tasks are related; this notion that there should be substantial similarity between simulations and real situations is known as "face validity" (Reznick 1993) and is typically considered the first and least rigorous of about four validation steps (least rigorous because it is typically not quantified, but simply emerges from a consensus of application experts). It is my contention, however, that by using motion capture instrumentation for measuring the details of how a surgeon moves, we can produce a quantitative similarity measure between simulated and real tasks - the first quantitative measure of face validity.

The research reported here uses detailed quantitative measurements of surgical tool motions and surgeon posture from both live operative procedures and surgical simulations, broken down by surgical context and motor task classification, to establish correspondences between the simulation and the operative tasks. The goal is to develop simple desktop simulations (perhaps based on existing ones - e.g., Derossis 1998) which simulate tool and arm motions similar to those surgeons use in actual surgery.
Such detailed kinematic comparisons cannot be done using manual measurement techniques and so will require the kind of intraoperative measurement technique we have described here.

1.7 Ergonomics

In addition to skill evaluation, postural assessment is important in evaluating improved instrumentation and validating simulators. Postural assessments are common in repetitive-task workplace settings, but only a small number of studies have been done in the operating room. Radermacher et al. were among the first to assess minimally invasive surgeries and suggested improvements for workplace design. Berguer showed, using video and force plate measurement techniques, that the postures in laparoscopic surgery are more stressful than in conventional open surgery (Berguer 1997). In these past studies, visual or video-based observations have formed the basis of postural assessment. Recently, Person conducted a pilot study to investigate the feasibility of using an optoelectronic system to measure joint angle position and assess ergonomic stress during a live OR procedure (Person 2001). From his study it remains unclear how strongly ergonomic stress is affected by procedural and patient-specific factors. Surgeon injuries resulting from poor instrument design are also common (Horgan 1997). Several musculoskeletal disorders have been associated with laparoscopic surgery, including thumb pain ("laparoscopic surgeon's thumb") (Neuhaus 1997), hand and elbow pain, lower back pain, and shoulder pain (Buschbacher 1994). Surgeon fatigue, digital neuropathies, and finger discomfort are common conditions among laparoscopic surgeons (Mueller 1993). Research to improve surgeon comfort has resulted in improved handle designs. Different handle designs are available, including: (1) axial handles with trigger rings, (2) inline handles with trigger rings, and (3) pistol grip handles with a flat shank or trigger rings (Matern 1999).
Many of these new tool designs have been shown to benefit surgeons by reducing fatigue and numbness in hands and arms (Hasson 1993, Neuhaus 1997, Horgan 1997).

1.8 Analysis of Performance Measures

The goal of this research is to establish reliable measures of performance which can be used to validate surgical simulators, assess performance of surgeons, and quantify performance of new surgical instrumentation. As discussed in Section 1.4, there are a variety of measures which can be used to assess performance. In this research we investigate time, kinematic, joint angle, and sequencing performance measures. To compare performance data across two settings (i.e., OR vs. simulator, or Tool A vs. Tool B in a simulator or the OR), we use a hierarchical decomposition technique similar to Cao (1996) to decompose the entire operation into phases, stages, tasks, subtasks, and actions. We extract performance measures from individual tasks and subtasks, and assign a measure of dissimilarity to the data sets obtained from the two settings. The Kolmogorov-Smirnov (K-S) statistic D is used to quantify the discrepancy between the two distributions of the measured performance variable. The K-S statistic indicates the maximum vertical difference between two cumulative distribution functions (see Figure 1.5).

Figure 1.5 - The Kolmogorov-Smirnov statistic, D, is defined to be the maximum vertical difference between two cumulative distribution functions. The cumulative distribution function for a finite data set is represented as a staircase.
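The two-sample K-S statistic D is straightforward to compute directly from the empirical (staircase) CDFs; a short sketch follows, with the completion-time samples made up for illustration.

```python
import numpy as np

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D: the maximum vertical
    distance between the two empirical cumulative distribution functions."""
    a = np.sort(np.asarray(sample_a, dtype=float))
    b = np.sort(np.asarray(sample_b, dtype=float))
    # Evaluate both empirical (staircase) CDFs at every observed value;
    # the maximum gap can only occur at one of these points.
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return np.max(np.abs(cdf_a - cdf_b))

# E.g., subtask completion times (s) measured in two settings (invented data):
d = ks_statistic([12.1, 10.3, 15.2, 11.8], [14.0, 16.5, 15.9, 13.2])
```

D is dimensionless and lies between 0 (identical empirical distributions) and 1 (completely non-overlapping samples), which is what makes it convenient as a difference measure across heterogeneous performance variables.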
Since our two data sets are obtained from two separate tasks, we would expect the two distributions to be different. For our analysis we are interested in the measure of the difference between the two distributions. Significant differences will be interpreted as indicators of where we need to refine our simulation to better match the corresponding operating room (OR) task, to quantify differences in performance levels, and to evaluate new tool designs.

1.9 Kinematic Assessment Tools

We have chosen an approach complementary to the work of Rosen by investigating performance measures derived from kinematic data (Rosen 2001). Our approach is to investigate kinematic measures related to tool position and upper limb position. To track surgical tools and limbs in space, we require a high frequency tracking system. Several commercial systems are available for tracking the position and orientation of markers in space, including magnetic, ultrasonic, and optoelectronic trackers, and accelerometer/gyroscope based systems. In addition, goniometers can be used to produce joint angle estimates. Each system has its own benefits and drawbacks. Magnetic systems provide continuous, high frequency sampling (60 Hz); however, each tracker requires a fixed wire attachment to the tool interface system. In addition, magnetic systems are strongly influenced by external ferrous materials and electromagnetic fields, which are often present in the OR. Ultrasonic systems are continuous, high frequency, wireless systems which use high frequency sound waves to track objects; they face a line-of-sight requirement between transmitter and receivers and are also influenced by ambient noise sources. Optoelectronic systems provide wireless high frequency data collection and are easily sterilized. Hybrid systems are capable of tracking both passive (wireless) and active (IRED) markers.
These systems are limited by a line-of-sight restriction and by interference from external infrared light sources. Accelerometer/gyro based systems also provide high frequency sampling, although they require fixed wire attachments and often require extensive calibration.

1.10 Research Questions

The long term goal of this research is to quantify the fidelity of surgical simulators, with applications to surgical training, surgical skill evaluation, equipment evaluation, and equipment design for minimally invasive surgery. The objective of the portion of this research reported in this thesis is to establish a methodology to quantitatively assess the performance of an expert surgeon during a laparoscopic cholecystectomy procedure. A number of questions are addressed in this research:

1. What is a feasible way to quantitatively measure surgeon performance? What measures of surgeon performance can be used to reliably quantify performance in a non-stereotyped OR task? What are the characteristic performance measures which describe the performance of an expert surgeon, and how can these measures be presented in a clinically useful way?

2. What is the repeatability of biomechanical stresses on an expert surgeon performing a laparoscopic cholecystectomy, measured using a quantitative motion analysis technique? Is a motion analysis based measure of stress strongly affected by procedural and patient specific factors? How can we quantitatively assess the performance of a surgeon with new surgical instrumentation?

Chapter 2 presents a feasible method to measure kinematic data in the live operating room setting. An assessment of the effectiveness of an optoelectronic motion analysis system in obtaining kinematic data in the OR is provided. In addition, we address the effect of interpolated and extrapolated data on measurement accuracy. This chapter also presents research in the quantitative evaluation of surgeon performance.
We investigated the similarity of kinematic performance measures based on a hierarchical decomposition of surgical tasks in a laparoscopic cholecystectomy. In addition, event sequencing is used to quantify the repeatability of progressing through the logical steps of a procedure. Chapter 3 presents the repeatability of biomechanical stresses on an expert surgeon performing a laparoscopic cholecystectomy. The effect of patient specific factors on measurements of biomechanical stress is investigated and tested to see if a single procedure provides a reliable estimate of the stress experienced by a surgeon. Chapter 4 summarizes the findings of the two main sections of the thesis and proposes future studies using the methodology developed here to validate surgical simulations.

Chapter 2 Quantitative Methodology of Evaluating Surgeon Performance in Laparoscopic Surgery

2.1 Introduction

Over the past decade, minimally invasive surgery has become an integral part of modern medicine. However, developments in teaching and evaluation methods have not kept pace. Quantitative performance and skill assessments are critical for evaluating the progress of surgical residents and the efficacy of different training programs. They are also important for validating surgical simulations and for assessing new surgical tool sets and techniques. Current evaluation methods are subjective and potentially unreliable, so there is a need for objective methods to evaluate surgical performance (Chung 1998, Rosser 1998, Winckel 1994). Surgical simulators are playing an increasingly important role in the training and objective evaluation of surgeons. Commonly used surgical simulators include bench top trainers, animal models, and virtual reality systems.
A wide selection of performance measures has been suggested in the literature to quantify performance, including: completion time, frequency of errors, force/torque signatures, low level event sequencing, and tool tip positioning (Chung 1998, Derossis 1998, Fried 1999, Hanna 1998, Hodgson 1999, Rosser 1997, Starkes 1998, Taffinder 1999, Torkington 2001). One valuable approach for performance evaluation was recently demonstrated on a porcine model and is based on an analysis of tool tip force/torque signatures (Rosen 2001). That study uses a Markov modeling technique in conjunction with a structured classification of tool actions to classify and assess surgeon performance. Rosen identifies other feedback data, such as tool position, as a basis for performance evaluation. As simulators come to serve a more crucial role in the early training of a surgeon, it is important to ensure that the skills trainees learn are directly relevant to the practice of surgery in the operating room (OR). This is important because of the comparative ease of making objective measurements in a simulator. The long-term objective of this research is to validate surgical simulations by showing that time spent practicing psychomotor skills in simulators relates to improvements in surgical skill in the live operating room setting, and that measured performance in the simulator is a reliable measure of surgical skill. In general, current techniques for showing validation demonstrate that experienced surgeons perform better than novices in simulated tasks and that practice leads to improved simulator performance (Rosser 1997, Starkes 1998, Chung 1998, Derossis 1998, Hanna 1998, Fried 1999, O'Toole 1999). In some cases these assessment methods have shown that simulators can be used to evaluate progress (Starkes 1998, Chung 1998) and, even more compelling, that practice in the simulator alone leads to improved performance in a pig model (Fried 1999).
This implies that carefully chosen simulated tasks can be used to replace OR time for learning psychomotor skills. Our goal is to quantitatively establish the performance level of any particular surgeon based on measurements taken in a simulated setting. This chapter outlines a methodology for evaluating surgeon performance and establishes a framework for a quantitative analysis of the performance of an expert surgeon performing a laparoscopic cholecystectomy (gallbladder removal). The objective is to identify a feasible method to measure kinematic data of a surgical tool and postural data in the live operating room setting, and to assess the reliability of this analysis method based on a hierarchical decomposition of surgical tasks. We investigate the effect of patient and procedural variability on our assessment of performance and show examples of how this method can be used to validate surgical simulators.

2.2 Methods

2.2.1 Framework

2.2.1.1 Project Methodology

The goal of this research is to develop a methodology for validating surgical simulations. The methodology developed here is intended as a tool to measure and highlight differences in surgical performance in different settings. Primarily, we are interested in using this method for the purpose of validating surgical simulations. However, we have also identified its utility for clinical certification and for equipment evaluation and design. Consider the following example: to help validate a surgical simulation, we may look at the performance of an expert surgeon in the live OR and compare this to his or her performance in a surgical simulation. If we can show that performance for all relevant aspects in a simulation is essentially the same as performance in the OR, then we have a validated simulation, or face validity (Allen, 1986).
Finding discrepancies between two settings can help us identify and correct problems with the simulation design, help an instructor to correct a trainee's technique, or help tool designers make improvements to instrumentation. To demonstrate our comparative analysis approach, consider the following examples where this method might be applied (see Figure 2.1):

Figure 2.1 - Typical uses of the comparative analysis methodology for simulator validation, certification, and equipment evaluation & design. The circled segments represent contributions of this thesis.

Simulator Validation: For the purpose of validating surgical simulations, we look for discrepancies in selected measures of performance from the OR and the simulator. Having identified these discrepancies, we can go back, make adjustments to the simulator, and conduct a new evaluation. The process is continued until we are satisfied that the simulator matches the relevant aspects of the OR procedure we are interested in.

Surgeon Certification: Accurately quantifying surgeon performance is important to the training and certification of trainees and surgeons. Using our methodology, we can extract performance measures of a trainee in the OR (or simulator) and compare his or her performance to a reference database of performance measures from trainees at various levels in their training programs. This will allow us to quantitatively assess skill level and identify problematic areas where improvements are required. For example, if we look at a particular surgical task, such as applying a clip, we may notice a surgeon spending more time on a particular aspect of the clip application.
Using our methodology, we can highlight specific problematic areas and suggest ways of making improvements.

Instrumentation Design and Evaluation: Using a validated simulation, tool designers can evaluate the performance of new instrumentation and be confident that the measured performance will transfer to the OR. As in the validation process for surgical simulators, the evaluation of new instrumentation is an iterative process in which the performance of the new tool is compared with an existing database of performance measures for past tool designs.

For the purpose of making these comparisons, we establish an organizational framework describing a surgical procedure. Surgery in general can be thought of as a collection of tasks organized in a particular sequence, but diversions may occur due to unexpected or variable events or conditions, such as bleeding. A hierarchical decomposition allows us to study differences in tasks even when the procedures are not exactly the same (Cao, 1996). These differences are found in the low level details of tool movements which are common to all procedures. The hierarchical decomposition is useful as an organizational framework for making various context comparisons. For example, simulators are typically designed to model only specific tasks rather than the entire procedure. The decomposition approach will allow us to make comparisons between surgical tasks and simulations of those tasks. The basis of our methodology for comparative analysis requires quantifying face validity. To quantify face validity we select meaningful measures of performance and compare their distributions across settings (see Figure 2.2, top). A useful performance measure will demonstrate reliability and incorporate details of surgical performance which are important to the quality and efficiency of a procedure.
Establishing the reliability of a particular performance measure will depend on its behavior in different settings (comparatively unaffected by variations between patients and procedural differences, and yet sensitive to real differences such as training effects and improved instrumentation) and on its subjective acceptance by clinicians. Under ideal conditions we would expect a reliable performance measure to behave essentially the same in settings which are identical in all relevant aspects. To quantify the difference between performance measure distributions across various settings we use the Kolmogorov-Smirnov (K-S) statistic (Press 1992). The K-S statistic provides a difference measure between two distributions, producing values from 0 (similar) to 1 (different); see Figure 2.2 (bottom). The K-S statistic was chosen because it considers the entire distribution, unlike statistics which test only for differences in mean (t-test) or variance (F-test) (Von Mises, 1964).

Figure 2.2 - A demonstration of our validation assessment technique. Performance measures are extracted from different settings and compared with each other using the K-S statistic. The K-S statistic provides a measure of difference which is used to show how different two distributions are.

By developing a reference database of comparative analyses (OR vs. simulator, surgeon A vs. surgeon B, etc.) we establish meaning or context for the difference scale. For example, if we evaluated an expert surgeon from day to day on a surgical simulation, we would expect the distributions of performance measures to be extremely similar, since the simulator has not changed and the expert surgeon is fully trained. On the other hand, if we evaluated a novice surgeon, we would expect some degree of difference attributable to learning effects.
If we evaluated an expert surgeon only in the OR and compared performance measure distributions from procedure to procedure, we would expect to see differences attributable to patient and procedural variability. These measures of difference provide meaning to the difference scale and thus establish reference points for clinical interpretation. Finally, we are concerned with our ability to distinguish differences among our performance measures. This is particularly important when trying to resolve differences between surgeons of different skill in the OR or simulator. In the OR, our measurements are influenced by procedural and patient variability. As a result, we want to determine the resolution of our measurements so we can reliably detect differences between performance measures. The contributions of the methodology presented in this chapter include: (1) a method to reliably measure completion time, kinematic, event sequencing, and joint angle performance measures; (2) establishing the reliability of selected performance measures based on the evaluation of an expert surgeon during live OR procedures and showing the influence of procedure-specific factors on variability; and (3) providing examples of context comparisons to illustrate the utility of this method. To demonstrate the application of our proposed methodology, we selected an example task (applying a clip) which is performed comparatively frequently (on average seven times per procedure) and compared the characteristic measures associated with its component actions across the clipping segments recorded.

2.2.1.2 Hierarchical Decomposition

Many of our chosen performance measures are specific to only certain segments of the procedure. For example, the distance for a tool tip relocation maneuver can only be defined for low-level movement segments.
To provide an organizational structure for our data analysis, we created a hierarchical decomposition describing the procedure in terms of surgical phases and stages, tool tasks and subtasks, and fundamental tool actions, with each level containing more specific details than the previous (see Figure 2.3). This technique is based on a decomposition approach originally described by Cao, with our system modified to improve generality and to incorporate additional kinematic features of low-level tool movements (Cao, 1996). The decomposition design was based on the standard elements of laparoscopic cholecystectomies and was developed in consultation with two expert surgeons. The decomposition has five levels: the phase level (1) outlines the global goals of the procedure; the stage level (2) outlines local goals required to complete each phase; the task level (3) involves the use of a single tool; the subtask level (4) describes how the surgical tool is moving inside the patient; and, finally, actions (5) describe kinematic features (or potentially force/torque signatures) of stereotyped fundamental movements such as reaching or sweeping. The twelve fundamental actions are: push, spread, hold, translate, grasp, reach, sweep, release, retract, orient, idle, and pull. This five-level hierarchical decomposition provides the foundation for a quantitative analysis of surgeon performance since it allows us to quantify standard aspects of procedures, thereby reducing variability in assessing a procedure as a whole. This framework is useful for making comparisons between simulators and the OR, as many simulators only model small segments of the entire procedure (i.e., Derossis's mechanical manipulation tasks). Operational definitions such as start and end points are presented in Appendix D.

Figure 2.3 - Laparoscopic Cholecystectomy Hierarchical Decomposition. The decomposition has five levels, each containing more specific details than the previous.
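As a rough illustration of how a five-level decomposition of this kind can be represented for analysis, the sketch below encodes a small fragment of a clipping task as nested records. The phase, stage, and subtask labels are illustrative only; the actual decomposition and its operational definitions are given in Appendix D.

```python
# Sketch: a fragment of the five-level hierarchy (phase > stage > task >
# subtask > action) encoded as nested dicts. Labels are illustrative only.
decomposition = {
    "phase": "Dissection",
    "stages": [{
        "stage": "Isolate cystic duct",
        "tasks": [{
            "task": "Clip application",
            "tool": "clip applier",
            "subtasks": [
                {"subtask": "free space movement", "actions": ["reach", "orient"]},
                {"subtask": "clip application", "actions": ["push", "hold", "release"]},
            ],
        }],
    }],
}

def actions_in_task(task):
    """Flatten the action labels used by all subtasks of a task."""
    return [a for st in task["subtasks"] for a in st["actions"]]

task = decomposition["stages"][0]["tasks"][0]
print(actions_in_task(task))  # ['reach', 'orient', 'push', 'hold', 'release']
```

A structure like this makes it straightforward to attach performance measures at whichever level they are defined, e.g. completion time per task but travel distance per subtask.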
2.2.1.3 Performance Measures

A wide range of performance measures exists for evaluating surgeon performance. In consultation with two expert surgeons, we have identified completion times, tool tip kinematics, joint angles, and event sequencing as potentially relevant measures of performance. Other measures, such as force/torque signatures and frequency of errors, as reported in the literature, can easily be integrated into our methodology. The following sections outline the performance measures we selected to assess performance.

2.2.1.3.1 Time

The most commonly used performance measure in past studies has been task completion time (Chung 1998, Derossis 1998, Fried 1999, Hanna 1998, Hodgson 1999, Keyser 2000, Rosser 1997, Starkes 1998, Taffinder 1999). A time-based measure alone, however, will not necessarily capture all aspects of motor performance. As a result, we examine additional performance measures which we feel capture important aspects of motor performance. The hierarchical decomposition is used to define start and stop points at each level, as outlined in Appendix D.

2.2.1.3.2 Kinematics

Using kinematic measures to assess surgeon performance is relatively new. Torkington's group uses electromagnetic trackers to measure distance, number of movements, and speed of a surgeon's hand movements in simulated tasks, and Rosen has suggested the importance of kinematics for evaluating skill (Torkington 2001, Rosen 2001). In our approach we use tool tip kinematic data to extract: travel distance; mean, minimum, and peak tool tip velocity; minimum and peak acceleration; jerk cost (Nelson, 1986); and RMS straight-line deviations. In addition to the kinematic performance measures, we investigated the composition of tool tasks and subtasks in terms of the 12 basic and fundamental tool actions. In total there are 72 feasible combinations of these 12 actions.
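As a sketch of how such measures can be computed from sampled tool tip positions, the snippet below uses standard finite-difference definitions on uniformly sampled data. The function name is ours, and the jerk cost here is simply the integral of squared jerk magnitude; the exact normalization used by Nelson (1986) may differ.

```python
import numpy as np

def kinematic_measures(pos, dt):
    """Basic tool-tip kinematics from an (n, 3) array of positions
    sampled every dt seconds."""
    vel = np.gradient(pos, dt, axis=0)        # velocity by finite differences
    speed = np.linalg.norm(vel, axis=1)
    acc = np.gradient(vel, dt, axis=0)        # acceleration
    jerk = np.gradient(acc, dt, axis=0)       # jerk
    steps = np.diff(pos, axis=0)
    # RMS deviation from the straight line joining the start and end points
    chord = pos[-1] - pos[0]
    u = chord / np.linalg.norm(chord)
    rel = pos - pos[0]
    perp = rel - np.outer(rel @ u, u)         # component perpendicular to chord
    return {
        "distance": np.sum(np.linalg.norm(steps, axis=1)),
        "mean_speed": speed.mean(),
        "peak_speed": speed.max(),
        "jerk_cost": np.sum(np.linalg.norm(jerk, axis=1) ** 2) * dt,
        "rms_deviation": np.sqrt(np.mean(np.sum(perp ** 2, axis=1))),
    }
```

For a perfectly straight, constant-speed path the travel distance equals the chord length and the RMS straight-line deviation is zero, which gives a quick sanity check on the implementation.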
Figure 2.4 outlines the kinematic details of selected states. We described each of the 12 basic actions using fuzzy membership functions based on instantaneous tool tip position and velocity. For the purpose of establishing performance measures, we use the mean action membership values of tasks and subtasks in each of the 12 prototypical actions. The mean action membership value was chosen because of the difficulty of automatically detecting segments of action states. In Cao's work, verbal kinematic descriptions of tool actions were derived from video data (Cao, 1996). The analysis of video data is an extremely laborious and time consuming process, which led to our approach of using action membership functions. Appendix E outlines the details of calculating membership values.

Figure 2.4 - The tool tip and x,y,z frame is calibrated to the MDMArray on the tool handle. The tool tip coordinate frame is used to define kinematic details of each state.

2.2.1.3.3 Ergonomic / Postural

Postural assessments in minimally invasive surgery are rare, despite widespread acknowledgement that strain injuries do occur to surgeons (Radermacher 1996, Berguer 1997, Person 2001). Training surgeons to use correct posture is important for long-term occupational health. We evaluate posture to ensure subjects use the same stance and arm positioning in simulators as they would in the OR. A postural evaluation will allow us to compare joint angles for selected tasks and make adjustments to the position of the simulator to match the OR. Based on upper limb modeling by Person (2001), we calculate the following joint angles of the dominant arm: shoulder abduction/adduction, flexion/extension, and internal/external rotation; elbow flexion/extension and pronation/supination; and wrist flexion/extension and radial/ulnar deviation.
2.2.1.3.4 Event Sequencing

Finally, we investigated high-level (procedural) surgical event sequences as measures of performance. Our approach to event sequence modeling differs from the low-level Markov modeling demonstrated by Rosen (Rosen, 2001). A Markov model is a stochastic model describing the probability of moving from one state to another based on the transition probabilities and previously visited states at regularly sampled intervals. The Markov modeling technique explicitly represents time, which is not necessarily important for describing the sequencing of surgical events. Our approach is to separate time and sequencing into individual performance measures. This provides greater applicability for identifying relevant surgical sequencing. In contrast to Rosen's approach of using low-level details of stochastic force/torque signatures, we propose an alternative method based on transition probabilities of structured (non-random) event sequences. This measure operates at the level of the surgical process and allows us to identify variations in the course of a surgical procedure.

2.2.1.4 Difference Measures

2.2.1.4.1 Kolmogorov-Smirnov Statistic

Our methodology relies on extracting distributions of performance measures from different contexts. For any particular variable of interest we use the Kolmogorov-Smirnov (K-S) statistic D to quantify the discrepancy between two distributions of the measured variable. The K-S statistic D represents the maximum vertical difference between cumulative probability distributions (see Figure 2.5); this measure ranges from 0 (similar) to 1 (different). The associated p value expresses the probability that the two measured distributions arise from the same underlying distribution and provides an indication of the significance of any differences found between the two distributions.
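A minimal sketch of the two-sample K-S statistic, together with a simple bootstrap to gauge the spread of D, is shown below. The function names and toy data are ours; this is an illustrative implementation, not the analysis code used in this work.

```python
import random

def ks_d(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum vertical
    distance between the two empirical CDF staircases."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:                       # candidate step locations
        fx = sum(1 for t in xs if t <= v) / len(xs)
        fy = sum(1 for t in ys if t <= v) / len(ys)
        d = max(d, abs(fx - fy))
    return d

def bootstrap_d(x, y, n=1000, seed=0):
    """Resample both data sets with replacement to get a spread of D values."""
    rng = random.Random(seed)
    return [ks_d([rng.choice(x) for _ in x],
                 [rng.choice(y) for _ in y]) for _ in range(n)]

# Identical samples give D = 0; samples with disjoint ranges give D = 1.
print(ks_d([1, 2, 3], [1, 2, 3]))      # 0.0
print(ks_d([1, 2, 3], [10, 11, 12]))   # 1.0
```

The percentiles of the bootstrap distribution of D then serve as an approximate confidence interval on the measured difference.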
Figure 2.5 - The Kolmogorov-Smirnov statistic, D, is defined to be the maximum vertical difference between two cumulative distribution functions. The cumulative distribution function for a finite data set is represented as a staircase.

The K-S statistic is useful because it is non-dimensional, makes no a priori assumptions about the statistical distributions of the data, and is comparatively insensitive to the presence of outliers (Hodgson, 2002). The K-S statistic is commonly used to test the null hypothesis that two sets of data were drawn from the same underlying distribution. In our case we use the K-S statistic as a measure of the difference between two distributions and use this measure to quantify differences in performance measures. Our data are based on two discrete sets of measures; thus, the D value we find is an estimate of the difference between the two sets. In order to assign a confidence interval to our estimate of D, we use a bootstrapping method based on the single set of measurements. This method involves finding the distribution of D values by randomly resampling the dataset and applying the K-S statistic n times, where n >> 0. Bootstrapping is a way to test the reliability of our dataset (Efron, 1986). Due to time and cost constraints we often have only a finite collection of performance measures. Analysis of the K-S statistic suggests that when the two finite sets of measures are small or when the difference in means is low, our estimated D value is consistently greater than the actual value (Hodgson, 2002). Our goal is to compensate for the size of the data set to enable direct comparison across D measures. Using numerical experimentation, the following function gives an approximation of D̂ (our estimated D value):

    D̂ = D + D₀(N)·(1 − (0.7742·D + 0.0078))    [2.1]

    where ln D₀(N) = k₁ + k₂·ln N, k₁ = 0.0261, k₂ = −0.471.
Given D̂ (our estimate), we can solve Equation 2.1 for D, which is adjusted for N, where N is the number of elements in the data vector. This approximation for D is based on comparing equally sized distributions and normally distributed data.

2.2.1.4.2 Sequencing Delta Measure

Transition state diagrams were used to represent these event sequences and describe the probability of transitions between different states (Hennie, 1968). These diagrams provide an indication of how often various transitions occur during an event sequence (see Figure 2.6). The transition probabilities are calculated by dividing the number of transitions between one state and a second state by the total number of transitions from the first state to all other possible states. The transition probabilities of the state diagram are reduced to a Transition Probability Matrix (TPM). Each cell of the transition probability matrix (TPM_ij) represents the transition probability from one event to another (event i to event j).

           A      B      C
    A    0.20   0.60   0.20
    B    0.75   0.00   0.25
    C    0.00   0.50   0.50

Figure 2.6 - Example of a state transition diagram and probability transition matrix.

We developed a sequencing discrepancy measure to quantify differences between event sequences. For any particular event sequence (i.e., a sequence of subtasks during a clipping task) we use a sequence difference measure δ to quantify the discrepancy between two event sequences. For example, consider comparing a single procedure event sequence (P_i) to a reference event sequence (P_ref) to establish a measure of difference δ. The sequencing discrepancy measure δ is given by:

    δ = (1/n) Σ(i=1..n) wᵢ·|P_ref,i − P_sample,i|    [2.2]

where P_ref,i and P_sample,i are the ith transition probabilities of TPM_ref and TPM_sample respectively, wᵢ represents the frequency of the ith event in the reference sequence, and n is the total number of transition probabilities.
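The TPM construction and a frequency-weighted discrepancy of the kind described above can be sketched as follows. This illustrative version uses mean absolute differences of transition probabilities weighted by reference event frequency; the function names are ours and the sketch is not guaranteed to match Equation 2.2 term for term.

```python
from collections import Counter

def tpm(seq, states):
    """Row-normalized transition probability matrix from an event sequence."""
    counts = Counter(zip(seq, seq[1:]))     # count each observed transition
    m = {}
    for s in states:
        total = sum(counts[(s, t)] for t in states)
        m[s] = {t: (counts[(s, t)] / total if total else 0.0) for t in states}
    return m

def delta(ref_seq, sample_seq, states):
    """Sequencing discrepancy: frequency-weighted mean absolute difference
    of transition probabilities between a reference and a sample sequence."""
    ref, sam = tpm(ref_seq, states), tpm(sample_seq, states)
    freq = Counter(ref_seq)
    w = {s: freq[s] / len(ref_seq) for s in states}   # reference event frequency
    n = len(states) ** 2                              # number of transition probabilities
    return sum(w[s] * abs(ref[s][t] - sam[s][t])
               for s in states for t in states) / n
```

By construction a sequence compared with itself gives delta = 0, and any sequencing deviation from the reference produces a positive value, mirroring the zero-to-one similar-to-dissimilar scale described in the text.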
The sequence difference measure δ is analogous to our K-S statistic D, as it is non-dimensional and varies on a scale of zero to one (similar to dissimilar). In addition, our measure satisfies the definition of a metric (i.e., for sequences S, T, and U: δ(S,T) ≥ 0, non-negativity; δ(S,T) = 0 ⇔ S = T, coincidence; δ(S,T) = δ(T,S), symmetry; δ(S,T) + δ(T,U) ≥ δ(S,U), triangle inequality) (Mannila, 1998). The sequence difference measure is based on two finite sequences, resulting in an estimate of the difference between the two. We use a bootstrapping approach to assign a confidence interval to the estimate.

2.2.1.5 Evaluating Reliability

To demonstrate the reliability of the proposed analysis procedure, we must show that the performance measures are comparatively unaffected by variations between patients and procedural differences and yet sensitive to real differences such as training effects and improved instrumentation. A database of performance measure data from surgeons of various skill levels in the OR and simulator is required for establishing overall reliability. In this document we (1) quantify the effects of procedural and patient specific variability for a single surgeon and (2) use contextual comparisons to show sensitivity, in an attempt to establish the reliability of our proposed performance measures for an expert surgeon.

2.2.1.5.1 Procedural Specific Variability

To demonstrate the effect of procedural specific variability, we use the K-S statistic to measure the difference between a distribution of D values related to single-procedure comparisons (with patient variability) and our best estimate of the distribution of D values without procedural specific variability. Any difference between these distributions is an estimate of procedural variability. To illustrate this analysis technique, consider the total completion time for a clipping task.
Each procedure produces a distribution of clipping completion times, represented as a cumulative probability distribution (CPD). The underlying distribution of completion times is established by grouping together the clipping task completion times of all procedures to get a total cumulative probability distribution (CPD_T), as shown in Figure 2.7 (left). The CPD_i of each procedure is compared with CPD_Ti (the CPD for all procedures excluding CPD_i) to produce a distribution of measured D values (CDF(D_measured)). The CDF(D_measured) represents the range of D values we would expect to get if we measured the same surgeon in the OR on an arbitrary patient (see dotted line in Figure 2.7, right). To establish our best estimate of a D distribution in which we would expect no procedural specific contributors to variability, the CPD_T is randomly resampled in lengths equivalent to the number of occurrences in each procedure (for example, a clip task occurs on average seven times per procedure, so the resample length is seven) and compared against itself without replacement for r resamples, where r >> 0. This produces a distribution of reference D values (CDF(D_reference)), which is the range of D values we would expect to get if we measured this surgeon on a "standard" (identical) patient with no specific contributors to variability (see solid line in Figure 2.7, right).

Figure 2.7 - Performance measure reliability analysis. Cumulative probability distributions (CPDs) of total clipping time for individual procedures (CPD_i) and an accumulated grouping (CPD_T) (left). CPDs of D_reference and D_measured for an increasing number of procedures analyzed (right). The calculation process for evaluating the D_reference and D_measured distributions (bottom). The shift between D_reference and D_measured is an indication of the effect of patient variability. This provides a measure of our ability to detect differences with the same surgeon and different procedures.
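The two constructions described above, leave-one-procedure-out comparisons for CDF(D_measured) and pooled resampling for CDF(D_reference), can be sketched as follows. The function names are ours, the K-S helper is repeated so the sketch stands alone, and the resampling details are simplified relative to the procedure in Appendix E.

```python
import random

def ks_d(x, y):
    """Two-sample K-S statistic between two empirical distributions."""
    xs, ys = sorted(x), sorted(y)
    return max(abs(sum(t <= v for t in xs) / len(xs) -
                   sum(t <= v for t in ys) / len(ys)) for v in xs + ys)

def measured_ds(procedures):
    """Leave-one-procedure-out: compare each procedure's times with the
    pool of all other procedures (patient/procedure variability included)."""
    return [ks_d(p, [t for q in procedures for t in q if q is not p])
            for p in procedures]

def reference_ds(procedures, r=500, seed=0):
    """Resample the pooled times in procedure-sized chunks against the
    remainder: the baseline with no procedure-specific effects."""
    rng = random.Random(seed)
    pooled = [t for p in procedures for t in p]
    m = len(procedures[0])   # resample length, e.g. ~7 clips per procedure
    ds = []
    for _ in range(r):
        shuffled = rng.sample(pooled, len(pooled))  # draw without replacement
        ds.append(ks_d(shuffled[:m], shuffled[m:]))
    return ds
```

The horizontal shift between the distributions returned by these two functions is then the estimate of procedure-specific variability described in the text.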
The K-S statistic is used to measure the difference between Dreference and Dmeasured (D(Dreference, Dmeasured)) and to check whether procedural and patient-specific contributors are significant. Significance is established using both analytical and bootstrapping techniques outlined in Appendix E. In combination with the K-S statistic we also measured the horizontal shift (Dhorz) to quantify the magnitude of procedure-specific variability. Dhorz is measured by subtracting the means of each distribution of D's. To further investigate our ability to resolve differences and to identify effects of procedure-specific variability, we applied the above analysis technique to multiple procedures. In this case CDF(Dmeasured) is generated by comparing CPDi2C to CPDT2C, where CPDi2C represents all two-procedure combinations of CPDi and CPDT2C represents all procedures excluding CPDi2C. The corresponding CDF(Dreference) is calculated by resampling in lengths equivalent to the number of occurrences in two procedures and comparing against itself without replacement. Figure 2.7 (right) illustrates the effect of multiple procedure comparisons. A technique analogous to this method is used to evaluate the procedure-specific variability of event sequencing performance measures. This modified technique replaces the D distributions (CDF(Dmeasured) and CDF(Dreference)) with δ distributions (CDF(δmeasured) and CDF(δreference)). All other steps are the same. A demonstration of this technique is presented in Appendix E.

2.2.1.5.2 Context Comparisons

To demonstrate reliability of our selected performance measures (in our case the application of the K-S analysis technique), we assess differences by comparing distributions of performance measures obtained from multiple executions of a particular task or subtask in two different contexts.
These contextual comparisons allow us to highlight differences which may, for example, show the procedural variability, which is useful for establishing meaning on our difference scale.

2.2.1.6 Clinical Interpretation

For the purpose of surgical education it is useful to describe overall performance rather than individual performance measure evaluations, because our analysis produces so many of them. In this section we provide a method of reducing our performance measure collection to an overall evaluation parameter. Figure 2.8 shows a conceptual illustration of this proposed analysis technique. Assessing a surgeon's skill level requires comparing their collection of performance measures to a reference database, resulting in a collection of D-values.

Figure 2.8 - Global performance evaluation measures for a clipping task. A collection of D and δ values is presented for each performance measure comparison. A weighted average based on performance measure reliability is used to reduce each performance measure to a single performance value for each performance measure type (time, kinematics, posture, sequencing, etc.). Each performance measure type is plotted in a multidimensional performance space.

To start, we must show reliability of our performance measures in distinguishing differences in various settings such as the OR and simulators for both experts and novices. From these analyses we extract rank-ordered lists of performance measures based on their reliability. For example, consider our collection of kinematic performance measures (tool tip travel distance, peak velocity, etc.). These measures are ranked according to their ability to distinguish performance differences, their influence from patient variability, and subjective clinical importance (i.e., clinically, smoothness may be more important than tool path distance). The rank-ordered lists are established for each performance measure type (i.e.,
kinematics and posture) within each subtask, resulting in four performance measure types (time, kinematics, posture, and sequencing). For each subtask within a task, the performance measures are summarized according to clinical importance. The summary is based either on maximum values (for the purpose of highlighting areas requiring improvement) or a weighted average. The weighting of each subtask is related to its importance for overall task completion and its potential for risk of injury. For example, there may be more risk of injury in the tissue manipulation and clip application subtasks than in the approach and withdrawal, thus giving them a higher weighting. The summarized measures are plotted in a multidimensional performance space. These values are averaged to give an overall evaluation parameter. This method is useful for comparing performance of surgeons of different skill levels in both the OR and simulators. It is beneficial for highlighting areas requiring improvement, thus allowing surgical educators to correct a trainee's technique by suggesting specific areas needing attention.

2.2.2 Experimental Protocol and Data Acquisition and Processing

2.2.2.1 Equipment

An optoelectronic motion analysis system was used to acquire postural data and tool tip trajectories at frequencies of ~20 Hz. For this study, we used a Northern Digital Polaris Hybrid Optical Tracking System capable of tracking the 3D position of both active infra-red light emitting diodes (IREDs) and passive retro-reflective markers with an accuracy of ~0.2-0.3 mm. The optoelectronic system was chosen over other motion capture systems because it is well tolerated by surgeons, accurate, and parts in the OR field are easily sterilizable. The Polaris system was connected to a standard PC (800 MHz AMD Duron) running custom-designed data acquisition software written in Matlab.
In addition to the optoelectronic system, we recorded video images of the surgery using both a laparoscope and an external camera focused on the surgeon. The images from these two sources were time stamped and recorded onto standard VHS tape; from these images, we could later determine the stage of the operation and correlate it with the detailed motion measurements. The laparoscope system comprised a standard 10 mm, 0° surgical laparoscope, camera, and illuminator (Stryker Endoscopy). These components are arranged and connected in the OR as illustrated in Figure 2.9. All equipment used in the operating room was tested and approved by Vancouver Hospital's Biomedical Engineering Department and sterilized with ethylene oxide when appropriate.

Figure 2.9 - Data collection components: Polaris Hybrid Optical Tracking System and video recording equipment (left); custom-designed data acquisition software written in Matlab (right).

The surgical instrumentation used for the study consisted of a standard set of reusable laparoscopic tools designed for cholecystectomy (gall bladder removal) procedures. The same tool set was used in each procedure. None of the tools were specifically designed to enhance ergonomic comfort.

2.2.2.2 Protocol

One expert surgeon was evaluated in seven clinical laparoscopic cholecystectomies over a period of four months at Vancouver Hospital (January - April, 2001). For each procedure, clinical assistants and OR staff varied. There was no a priori selection of the patient group, which included both males and females with varying weights.
We attached sterilized marker arrays to the surgeon's torso and dominant arm (on the proximal and distal forearm and on the hand), as shown in Figure 2.10 (left, middle). This set of marker arrays enabled us to track and record the surgeon's joint angles at the shoulder, elbow, and wrist. Arrays were secured to the surgeon using elastic and Velcro® harnesses. The torso marker sling, passive marker arm cuffs, and adhesive attachments to the hands were found to be acceptably immobile relative to the torso, arms, and hands in subjective tests (Person 2000).

Figure 2.10 - Model of the surgeon's arm (left). Marker array mounting (schematic and as mounted in the operating room) (middle). Multi-Directional Marker Array (MDMArray) (schematic and as mounted on a laparoscopic surgical tool) (right). The values labeled on each marker represent the percentage of time it was visible during manipulation segments.

Custom-designed Multi-Directional Marker Arrays (MDMArrays), designed to enhance trackability by increasing the angle at which they could be viewed (relative to planar marker arrays), were attached to each laparoscopic tool handle used by the dominant hand, as shown in Figure 2.10 (right). A planar marker array was attached to the non-dominant hand tool and the laparoscope. The MDMArrays were equipped with a quick-release clip to allow easy attachment and removal. Six identical MDMArrays were built and attached to six different laparoscopic tools for the entire procedure (grasper, L-grasper, ratcheted grasper, clipper, scissors, spatula). One researcher scrubbed into the surgery and attached the sterilized marker arrays to the surgeon's hands, lower arms, torso, and laparoscopic tools. This researcher remained to assist the surgeon with any adjustments to the marker arrays or support cuffs.
If the surgeon felt uncomfortable wearing the markers at any time during the surgery, they were free to stop the experiment and remove the markers. Immediately postoperatively, the surgeon adopted a reference posture which defined the neutral or zero point for all angles and then performed a set of standard isolated joint motions to provide data used in estimating the joint locations. The tip position of each surgical tool was found using a sphere-fitting calibration procedure in which the tip was placed in a small depression on a fixed surface while the tool was manipulated about that point. In addition to our marker tracking study, we used video archives from an additional eight procedures. The archived videos extended back two years and all were of the same expert surgeon. The archived video recordings were used only for extracting completion time and event sequencing performance measures.

2.2.2.3 Missing Data and Signal Processing

Optoelectronic systems have inherent line-of-sight limitations causing segments of missing data. To alleviate the problems of missing data we developed custom-designed tool marker arrays with multiple faces visible from several directions (the MDMArrays mentioned above). The non-equally sampled data (~20 Hz) was fitted with a cubic spline using an optimal smoothing parameter generated by a Generalized Cross Validation (GCV) technique (Woltring, 1982). Spline coefficients were used to resample position data and find higher-order derivatives. This method is described in Appendix C. To determine what order of derivatives could be reliably measured, a frequency spectral analysis was completed using a Fast Fourier Transform (FFT) to show signal power levels of the position signal and higher-order derivatives.
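The spline resampling step can be sketched as below. This is only an illustrative stand-in: SciPy has no built-in GCV criterion, so an interpolating cubic spline replaces the GCV-smoothed spline of Woltring (1982); the principle of evaluating derivatives from the spline coefficients is the same.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def resample_with_derivatives(t, x, fs=20.0):
    """Fit a cubic spline to non-uniformly sampled position data, resample
    on a uniform grid at rate fs, and evaluate velocity and acceleration
    from the spline. (A GCV-smoothed spline would replace CubicSpline for
    noisy data; this interpolating version is a simplification.)"""
    cs = CubicSpline(t, x)
    tu = np.arange(t[0], t[-1], 1.0 / fs)
    return tu, cs(tu), cs(tu, 1), cs(tu, 2)
```

Higher-order derivatives (jerk) follow the same pattern with `cs(tu, 3)`.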
To establish the feasibility of using an optoelectronic system for tool tip and posture tracking in the operating room, we measured the frequency of marker occlusions and estimated the errors associated with interpolating missing data.

2.2.2.4 Joint Angle Processing

The Polaris system reports the spatial position of the marker arrays in space. Joint angles are calculated from the measured position of the marker arrays located on the surgeon's hand, forearm, upper arm, and torso. We used a simplified seven degree-of-freedom (DOF) rigid-body model of the upper limb, as shown in Figure 2.10 (left). The model allows us to represent shoulder abduction/adduction, flexion/extension, and internal/external rotation; elbow flexion/extension and pronation/supination; and wrist flexion/extension and radial/ulnar deviation. Using this model, we applied a kinematic calibration method based on circle- and sphere-fitting techniques to locate the approximate centre of each of the shoulder, elbow, and wrist joints. The method locates a joint axis or centre of rotation by fitting a circle (or sphere) to a cloud of raw data points collected from limb markers and expressed in a reference frame attached to an adjacent limb segment. These techniques are well known in the biomechanics and robotics literature (e.g., Halvorsen 1999). Using the joint centre data and a neutral posture reference, we calculated joint angles for each joint at each sample instant. For the purposes of our analysis we use the joint angle distributions at the task level as measures of performance for a clipping task.

2.2.3 Reliability Analysis - Clipping Task

To demonstrate and quantify the effect of procedure-specific variability we use the K-S reliability assessment method described in Section 2.2.1.5.1. This evaluation method is applied to completion time distributions at the clipping task and subtask levels. The method is also used for each joint angle distribution for the clipping task.
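The sphere-fitting step can be illustrated with a standard algebraic least-squares fit; this is a sketch of the general technique, and the thesis's actual implementation (and the Halvorsen 1999 variant) may differ in detail.

```python
import numpy as np

def fit_sphere(points):
    """Least-squares sphere fit for a centre of rotation: given marker
    positions expressed in the adjacent segment's frame, solve the linear
    system derived from |p - c|^2 = r^2 for centre c and radius r."""
    p = np.asarray(points, dtype=float)
    # |p|^2 - 2 p.c + |c|^2 = r^2  ->  2 p.c + (r^2 - |c|^2) = |p|^2
    A = np.hstack([2.0 * p, np.ones((len(p), 1))])
    b = np.sum(p * p, axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    centre, k = sol[:3], sol[3]          # k = r^2 - |c|^2
    radius = np.sqrt(k + centre @ centre)
    return centre, radius
```

The same linear formulation in 2D gives the circle fit used for hinge-like joints such as the elbow.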
Due to the limited amount of tool tip kinematic data collected, we were unable to apply this analysis technique to our kinematic performance measures. A technique analogous to the K-S reliability assessment is used to evaluate the procedure-specific variability of event sequencing performance measures for the subtask event sequence. By quantifying the effect of procedural variability we are able to establish the resolution of our measurements (our ability to detect differences in different contexts). We expect that as the number of procedures evaluated increases, the effect of patient variability is reduced, allowing us to resolve smaller differences in comparisons of performance measures.

2.2.4 Context Comparison - Clipping Task

We performed several comparative analyses related to the clip application task for completion time and kinematic performance measures. Most commonly, clips are used in the vessel separation (cystic duct (CD) and cystic artery (CA)) stage of the procedure; however, they are also occasionally used to control bleeding throughout the procedure. We would expect there to be no significant difference between performance measures taken from alternate clip applications, as both patient variability and any learning effects will be equally represented in the two data sets. Since the surgeon is fully trained, we would hope for little variation between the first four and the last three procedures studied; any differences would be primarily due to patient variability and the results would represent the reliability of our analysis procedure. Finally, we prospectively investigated several other divisions of the data set in hopes of demonstrating the existence of significant and interesting differences; for example, we considered whether there might be differences between clips applied for: vessel separation vs. control of bleeding; separating the cystic duct vs. the cystic artery; first placement vs.
subsequent placement on the same vessel.

2.2.5 Clinical Interpretation - Clipping Task

Using the performance measure consolidation method outlined in Section 2.2.1.6, we compare the surgeon's performance in a single procedure to all other procedures where we collected motion capture data. For each procedure the performance measure reduction is based on the completion times for both the subtask and task levels and a comparison of joint angle distributions and event sequencing at the task level. We do not use the kinematic measures due to lack of reliability.

2.3 Results

2.3.1 Feasibility

A certain amount of position data was lost during the course of the surgery due to marker occlusion or internal localization errors. Figure 2.10 (left) illustrates the marker array visibility averaged over the seven procedures for individual markers. Using video analysis to separate manipulation from non-manipulation tasks, we found the majority of missing data could be attributed to marker occlusions during segments not directly related to performing a surgical task (e.g., changing instruments). During tissue manipulation tasks, we had complete joint angle data 80% (SD: 10%) of the time and at least one sample per second 92% (SD: 4%) of the time. The trackability of the dominant hand tool depends on which tool is being used (for the last four procedures we tracked the tool 78% (SD: 12%) of the time, before interpolation, during the clipping task). This average was highly procedure specific: 12 of the 45 tasks had full data sets and 20 successfully recorded more than 90% of the samples, whereas in some procedures we obtained virtually no data.
Table 2.1 provides a summary of data available for analysis:

Table 2.1 - Summary of data available from each procedure

              Video Data                 Kinematic Data
Procedure     Task Time    Sequencing    Ergonomic    Tool tip
1             X            X             X
2             X            X             X
3             X*           X             X
4             X            X             X            X
5             X            X             X            X
6             X            X             X            X
7             X            X             X            X
A             X            X
B             X
C             X            X
D             X            X
E             X            X
F             X
G             X*           X
H             X*

* No subtask time available

We used a generalized cross validation (GCV) filtering technique to find an optimal smoothing parameter for fitting a cubic spline to position data. The data was resampled at constant intervals and missing data segments interpolated. The RMS error associated with interpolating across varying gap sizes was calculated and is shown in Figure 2.11. An RMS error of 1 mm was chosen as an acceptable error, which suggests that a maximum gap size of 0.5 s (10 samples) can be interpolated. We were able to reduce the amount of missing joint angle data by 20% (SD: 2%) (using our interpolation technique with a maximum 1 mm RMS error). Using the same interpolation, we reduced the amount of missing data for the right and left hand tool data by 6% (SD: 2%) and 14% (SD: 3%) respectively. The trackability of the right hand tool after interpolation is 82% (SD: 12%). An analysis of the signal spectral content found each signal to have approximately the same cutoff frequency and noise properties. Using the Fast Fourier Transform (FFT) of the position data we found the underlying frequency content (fc) to be ~4 Hz. The accumulated power of the position, velocity, acceleration, and jerk signals suggests each signal is appropriate for our analysis, as each contains at least 93% of its total power at or below 4 Hz.

Figure 2.11 - Graphical representation of missing data interpolation (left). RMS error associated with missing data gap size (right).
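The accumulated-power check described above can be sketched as follows; this is illustrative Python (the signal and cutoff values in the test are examples, not the thesis data).

```python
import numpy as np

def power_fraction_below(signal, fs, f_cut):
    """Fraction of total spectral power at or below f_cut, computed from
    the FFT of a uniformly sampled, mean-removed signal. Used to check
    that a signal retains most of its power below a cutoff (~4 Hz here)."""
    sig = np.asarray(signal, dtype=float)
    sig = sig - sig.mean()                      # drop the DC component
    spec = np.abs(np.fft.rfft(sig)) ** 2        # one-sided power spectrum
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / fs)
    return spec[freqs <= f_cut].sum() / spec.sum()
```

Applying the same check to numerically differentiated signals shows how much high-frequency noise each successive derivative amplifies.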
2.3.2 Reliability

Despite our efforts to track continuous tool tip data, we found our data to have sizable missing data segments, making it difficult to extract useful kinematic performance measures. Due to the limited amount of useful kinematic data we are unable to show reliability of our kinematic measures. To represent the structure of the procedure, we use state transition diagrams to represent the probabilities of moving from one state to another. The state transition diagram shown in Figure 2.12 shows the transition probabilities of a clipping task where the states represent the subtasks of a clip application. The subtask sequence for a clipping task is well defined, as the exiting transition probabilities for the following sequence are all relatively high (greater than 85%): start - approach - tissue manipulation - clip application - withdrawal - finish.

Figure 2.12 - State transition diagram for the subtask sequence of a clipping task. Transition probabilities are shown and represent the probability of moving to the next state. FSM: Free Space Movement. (Based on clipping subtask sequences from 15 procedures.)

Having identified performance measures from the available clipping task data, we conduct a comparative analysis to establish measures of reliability and context similarity. Figure 2.13 shows the results of our procedural variability analysis of clipping completion time for the task and subtask levels. We measured the difference between CPD(Dreference) and CPD(Dmeasured) by evaluating the K-S statistic (vertical distance) and the horizontal shift in means of CPD(Dreference) and CPD(Dmeasured) for multiple procedural evaluations. The results of the bootstrapped analysis of the uncorrected and corrected D values are shown in Figure 2.13(a). We used the K-S statistic to test the null hypothesis that CPD(Dreference) and CPD(Dmeasured) were drawn from the same underlying distribution.
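The transition probabilities behind a state transition diagram like Figure 2.12 can be estimated by counting observed subtask-to-subtask moves. A minimal sketch (the state names and sequences in the test are illustrative, not the thesis data):

```python
import numpy as np

def transition_probabilities(sequences, states):
    """Estimate a row-stochastic transition matrix from observed state
    sequences. P[i, j] is the probability of moving from states[i] to
    states[j], estimated by normalized transition counts."""
    index = {s: i for i, s in enumerate(states)}
    counts = np.zeros((len(states), len(states)))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[index[a], index[b]] += 1
    totals = counts.sum(axis=1, keepdims=True)
    # rows with no observed exits stay all-zero rather than dividing by 0
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
```

A well-defined sequence shows up as near-1 probabilities along the expected start-to-finish path.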
Here, the null hypothesis of the D-value distributions arising from the same distribution is rejected for all cases. This result is verified by a bootstrapping approach outlined in Appendix E. These results suggest our performance measure is influenced by patient variability and that the effect of patient variability is reduced with increasing number of procedures considered. The horizontal distribution shifts are evaluated by subtracting the mean of each distribution, as illustrated in Figure 2.13(b). The uncorrected D values and differences in mean show a decreasing trend with increasing number of procedures analyzed. With the exception of the clip application and withdrawal subtasks, the remaining levels show a general uniformity along the difference scale. The remaining results presented deal only with the task level. Results for the joint angle and sequencing performance measure analyses are illustrated in Figures 2.14 and 2.15. The confidence bounds on the joint angle and sequencing measures decrease with increasing number of procedures analyzed. In all cases the difference in distribution means is decreasing. These results suggest that as the number of procedures considered increases, our ability to detect differences in performance measure distributions improves because the amount of patient variability is reduced.

Figure 2.13 (a) - Completion time reliability analysis for task and subtask levels of a clip application. K-S (D) value between CPD(Dreference) and CPD(Dmeasured). The corrected D value is the analytical correction based on the size of the data set.

Figure 2.13 (b) - Completion time reliability analysis for task and subtask levels of a clip application.
Horizontal difference between CPD(Dreference) and CPD(Dmeasured). (TT: total time, AP: approach, TM: tissue manipulation, CP: clip application, WD: withdrawal.)

Figure 2.14 (a) - Joint angle distribution reliability analysis for the task level of a clip application (shoulder, elbow FE, wrist FE, wrist RU, forearm PS). K-S (D) value between CPD(Dreference) and CPD(Dmeasured). The corrected D value is the analytical correction based on the size of the data set.

Figure 2.14 (b) - Joint angle distribution reliability analysis for the task level of a clip application. Horizontal difference between CPD(Dreference) and CPD(Dmeasured).

Figure 2.15 (a) - Similarity sequence of clipping subtasks. K-S (D) value between CPD(Dreference) and CPD(Dmeasured). The corrected D value is the analytical correction based on the size of the data set.

Figure 2.15 (b) - Similarity sequence of clipping subtasks. Horizontal difference between CPD(Dreference) and CPD(Dmeasured).

2.3.3 Context Comparisons

Contextual comparisons of completion time and kinematic performance measures were evaluated at the task and subtask levels for the clip application task. Using the K-S statistic we compared odd vs. even and first half vs. second half (1st vs. 2nd) clip applications. We also investigated separation of the cystic artery vs. the cystic duct (CA vs.
CD); control of bleeding vs. vessel separation (bleed vs. VS); and first placement vs. subsequent placement (init. vs. sub.) on the same vessel. Figure 2.16a illustrates the clipping task and subtask completion time difference scores. For each comparison except bleed vs. VS, the lower confidence bound is zero, which suggests we should notice differences in completion time only between clips applied for the purpose of vessel separation and clips applied to control bleeding. For the remaining kinematic performance measures we present data only from the task level, as there is little difference between the difference measures for each subtask. Figures 2.16b and 2.16c show the difference measures for kinematic performance measures and the mean membership action states, respectively.

Figure 2.16 (a) - Kolmogorov-Smirnov difference statistic of completion time performance measures for the task and subtask levels in different contexts. The confidence intervals marked indicate 5% and 95% confidence and are based on a bootstrap estimate.

Figure 2.16 (b) - Kolmogorov-Smirnov difference statistic of kinematic performance measures (distance, average velocity, maximum/minimum velocity, maximum/minimum acceleration, jerk cost) in different contexts. The confidence intervals marked indicate 5% and 95% confidence and are based on a bootstrap estimate.

Figure 2.16 (c) - Kolmogorov-Smirnov difference statistic of mean membership action variables. The confidence intervals marked indicate 5% and 95% confidence and are based on a bootstrap estimate.
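Bootstrap confidence intervals on a context comparison, like those quoted in the figure captions above, can be estimated along these lines. This is an illustrative sketch using SciPy's two-sample K-S routine; the thesis analysis was done in Matlab and may differ in detail.

```python
import numpy as np
from scipy.stats import ks_2samp

def ks_bootstrap_ci(a, b, n_boot=2000, alpha=0.10, rng=None):
    """Two-sample K-S statistic between performance measures from two
    contexts, with a bootstrap percentile interval (5%-95% for alpha=0.10).
    Each bootstrap replicate resamples both groups with replacement."""
    rng = np.random.default_rng(rng)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    ds = [ks_2samp(rng.choice(a, len(a)), rng.choice(b, len(b))).statistic
          for _ in range(n_boot)]
    lo, hi = np.percentile(ds, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return ks_2samp(a, b).statistic, lo, hi
```

A lower bound at (or near) zero indicates the two contexts are not distinguishable at this sample size.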
2.3.4 Clinical Interpretation

The results of the performance measure reduction are shown in Figure 2.17 for each of the seven procedures with position data. To improve clarity the confidence bounds are not shown. Completion times appear to be influenced by procedural variability more than the postural and sequencing measures.

Figure 2.17 - Performance measure reduction for a clipping task for seven procedures. Performance measures from each procedure are compared to a reference group consisting of all other performance measures. The average performance measure value for each performance measure type (completion time, postural, and sequencing) (left). The overall averaged performance measure for each procedure (right).

2.4 Discussion

The primary purpose of this study was to present a method for evaluating surgeon performance and to demonstrate the reliability of our measurement equipment and our selected performance measures. We also showed the utility of such measurements in making clinical performance assessments. We selected the laparoscopic cholecystectomy as our baseline procedure for study, as it is the most common and well-defined minimally invasive procedure (Traverso 1997), and used an optoelectronic measurement system because it is generally well tolerated by surgeons. We also selected the clipping task as a more specific example, as clips are frequently applied (on the order of seven per procedure) during a standard procedure.

2.4.1 Data Acquisition

Our primary contribution was the development of a methodology for quantifying and comparing surgeon performance in different contexts. To achieve this goal, we used an optoelectronic motion capture system and video to collect tool kinematic data, joint angle position, event sequencing, and time data.
Unfortunately, we found limitations in our instrumentation which prevented us from investigating all aspects of performance. We did, however, collect a considerable amount of useful kinematic, joint angle, sequencing, and time data. We found that during tissue manipulation tasks (as opposed to instrument exchanges or maintenance), sampling rates were roughly 20 Hz and joint angle markers were fully visible 80% (SD: 10%) of the time; when these markers did become obscured or otherwise unreadable, the resulting gaps in the data typically lasted for only a fraction of a second (we obtained postural data at least once per second 92% (SD: 4%) of the time). This sampling rate is perfectly adequate for postural analyses (Bhattacharya et al., 1999). In combination with postural data, we successfully recorded tool tip position 82% (SD: 12%) of the time (after interpolation) while the surgeon was performing clipping tasks. Using a generalized cross validation interpolation technique, we were able to interpolate half-second gaps with estimated RMS errors less than 1.0 mm. Despite our efforts to track continuous tool tip data, we found our data to have sizable missing data segments because of marker occlusions, making it difficult to extract useful kinematic performance measures and thereby reducing our ability to establish reliability.

2.4.2 Hierarchical Decomposition

In our comparative analyses, we found the hierarchical decomposition useful as an organizational tool for evaluating and comparing performance measures at the phase, stage, task, and subtask levels. Based on work done in the lab, we found it difficult to automatically extract action states based on our kinematic data. The action membership functions often suggest membership in more than one state at any particular instant, making it difficult to detect the duration of action states.
Our method of extracting kinematic data is an improvement over Cao's video-based system but shows limited potential for reliably detecting action states automatically. In future studies, the hierarchical decomposition will be beneficial for relating various aspects of surgery to simulators, since many bench-top systems are designed for specific surgical tasks rather than the entire procedure (Derossis 1998).

2.4.3 Reliability Analysis

2.4.3.1 Procedure-Specific Variability

We used a modified K-S analysis technique to investigate the effect of patient variability over multiple trials. The purpose of this analysis was to show evidence that procedure-specific variability is significant and to estimate the extent of inter-procedural variability. We measured the vertical distance (using the K-S approach) and the horizontal difference in means between the two difference measure (D and δ) distributions. The difference measure distributions represent procedural variability (Dmeasured) and our best estimate without procedural variability (Dreference). Using the K-S statistic we found evidence of procedural variability in all our comparisons of performance measures. We expect to see a certain degree of procedural variability, and by sampling more procedures the variability of our performance measures should decrease. We found the confidence intervals of the D estimates decreased and, as expected, all upper bound limits decreased with increasing number of procedures considered, with the exception of the joint angle distribution comparisons. The mean corrected D values displayed some rather counterintuitive results, as in many cases the value did not consistently decrease. This result is difficult to explain and is possibly a consequence of our approach to correcting D values for sample size. The empirically based correction equation is intended for comparing equally sized distributions and has not yet been verified for unequally sized distributions.
In our case we used the effective sample size Ne = N1N2/(N1+N2) for non-equally sized distributions. Another possible explanation for the non-decreasing behavior of the mean corrected D values is the lack of data. In comparing these distributions using the horizontal difference in means, we notice a general decreasing trend with increasing number of procedures considered. In some cases we notice a negative difference (Dmeasured is to the left of Dreference), which we interpret as a random resampling effect.

The large differences in our joint angle difference distributions can be explained by our approach of using distributions of joint angles rather than an individual performance measure for each clipping task. In calculating CPD(Dreference) we use a large collection of joint angles across seven procedures. The resampling process produces CPD(Dreference), whose confidence bounds and mean value decrease with increasing number of procedures. The mean value of CPD(Dreference) is small compared with CPD(Dmeasured), which is to be expected since we compare distributions which are essentially the same. The large values of Dref_meas are expected, as these distributions are certainly different. The confidence bounds on Dref_meas are all decreasing, and in all cases the difference in means of the D distribution is decreasing. The range of CPD(Dmeasured) values is small (~0.2-0.3), which suggests these are appropriate measures of performance. Overall, we found that the extent of inter-procedural variability is generally reduced, as shown by the decreasing trend of the upper confidence bounds and differences in distribution means with increasing number of procedures analyzed, suggesting an improvement in our resolution to detect differences in performance. The tradeoff of reducing variability is that we need to examine more procedures, which is a time-consuming and costly process.
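The two-sample K-S distance and the effective sample size Ne described above can be sketched in a few lines of numpy. The data here are synthetic, and the empirical small-sample correction equation mentioned in the text is not reproduced (only Ne is):

```python
import numpy as np

def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D: the maximum vertical
    distance between the empirical cumulative distributions of a and b."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

def effective_n(n1, n2):
    """Effective sample size Ne = N1*N2/(N1+N2) for unequal-sized samples."""
    return n1 * n2 / (n1 + n2)

rng = np.random.default_rng(0)
# D between two samples from the same distribution is small...
same = ks_distance(rng.normal(size=500), rng.normal(size=300))
# ...and grows when one distribution is shifted.
diff = ks_distance(rng.normal(size=500), rng.normal(1.0, 1.0, 300))
```

With samples of 500 and 300, `effective_n` gives Ne = 187.5, the sample size one would use when correcting D for comparisons of unequal-sized distributions.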
To our knowledge there has been no attempt to quantify surgeon performance in the OR (only subjective scoring evaluations) or to quantify the effect of procedure-specific variability. Performance evaluations in the OR have traditionally been done through subjective evaluation, which has "poor reliability and unknown validity" (Lentz, 2001). Our method shows promise as an effective way to quantitatively assess surgeon performance. Our reliability analysis of the completion time, joint angle and sequencing performance measures suggests these are reliable measures of performance. From our analysis of event sequencing it is unclear whether the results suggest the surgeon is extremely consistent or the event sequence is inherent to the way a procedure is completed. At particular levels within the hierarchy we suspect event sequencing is dependent on the structure of the procedure rather than being a measure of skill. Further measurements of surgeons with differing skill levels are required to make this evaluation. Perhaps a more appropriate model for our application would allow us to highlight deviations from an 'ideal' procedural sequence or incorporate details of past events. We found our sequencing model to be generally well suited for describing high-level differences between procedures, in contrast to Rosen's Markov modeling approach, which is appropriate for characterizing low-level state transition rates, as illustrated on low-medium-high force/torque states (Rosen, 2001). One potential limitation of Rosen's sequencing work is its inability to reveal high-level details of surgeon performance. Together, Markov modeling and our event sequencing method can coexist and be useful for identifying differences in performance.

2.4.3.2 Context Comparisons

The K-S statistic was used to test for differences in performance measures in different contexts in order to demonstrate its utility. We discovered negligible differences between the time measures in the odd vs.
even and 1st half vs. 2nd half clips, suggesting there are no obvious learning effects and little effect of patient variability. The completion time difference measures are on the order of 0.2 on the difference scale, which suggests time is a reliable measure of performance. Our kinematic measures at this point appear to be unreliable, as the upper bounds of the context comparisons are on the order of 0.9 on the difference scale, which implies that to detect differences we must see a difference greater than 0.9. Therefore our current kinematic measures show poor reliability. More data are required to identify whether kinematic measures are reliable for distinguishing differences in performance. In future studies it might be useful to consolidate data across comparable tasks in different contexts (i.e., find performance measures for all tool approaches, not just clipping). There appears to be a slight difference between clips applied to control bleeding and clips applied for vessel separation. This result provides insight into how we can develop new simulators, as we may wish to train surgeons differently for clips applied in the context of vessel separation than for controlling bleeding. Overall, the time and ergonomic measures seem to show the most promise for distinguishing differences.

2.4.4 Clinical Utility

We presented an approach to consolidating performance measures, which is important for assessing global performance. Consolidating completion time, postural and event sequencing performance measures from each procedure, we found that completion time is most affected by procedural variability. Our postural and event sequencing measures appear to be most reliable. Further work is required to establish appropriate weighting measures for the reliability of these and other performance measures. Appropriate weighting measures will be based on the reliability and importance of each measure in assessing performance.
To establish the resolution of our measurements in distinguishing differences among surgeons, consider comparing the performance of a new surgeon to our current database of a single expert surgeon. If we evaluate the new surgeon on any arbitrary patient in the OR, we would establish a new distribution for any particular performance measure (CPDspec), for example time. Comparing CPDspec and CPDT (from our expert surgeon) we get a difference measure Dspec; if Dspec is less than the 95th percentile of CPD(Dmeasured), then we conclude the performance of the new surgeon is essentially the same (see Figure 2.18). We use the 95th percentile of CPD(Dmeasured) as our critical value (Dcr) for distinguishing differences among surgeons.

Figure 2.18 - Example of measuring the performance of a new surgeon by comparing performance in a single procedure to a reference database from our expert surgeon.

We can also convert back from the non-dimensional D values to dimensional values, which may be more familiar to a surgeon or surgical educator. Consider using the Dcr value to establish how large a change in a particular performance measure is required to detect a difference. To make this quantification, we simply shift the CPDT until the D value between the shifted and non-shifted CPDT is equal to Dcr. The amount of shift corresponds to the difference in performance measure value. Figure 2.19 shows the resolution of our time measurements for detecting differences across multiple procedures. From this demonstration we notice our resolution improves over multiple procedures, although the rate of improvement decreases with increasing procedures. It appears the analysis of two procedures is most appropriate at this point for establishing measures of difference.

Figure 2.19 - Resolution of completion time in detecting differences for increasing number of procedures analyzed.
2.4.5 Future Work

The long-term goal of this work is to develop a validated simulation for applications in training, performance evaluation and equipment design. The value of this simulation depends on how closely it captures the elements of perceptual-motor skills used in an actual task (Sanders 1991). We plan both to develop simple desktop simulations (perhaps based on existing ones, e.g., Derossis 1998) which demonstrate tool and arm motions similar to those used in actual surgery by our expert surgeon, and to obtain access to commercial simulator units. Using our data collection techniques (where we expect to see acquisition improvements in the controlled setting of a simulator) and statistical analysis procedures, we will establish the reliability of our performance measures in a simulated setting. We also need to measure the performance of surgeons at various skill levels in both the OR and the simulator to develop a database of performance measures across skill levels. Using this collection of data we would hope to show that on average our inter-procedural variability is small compared with inter-surgeon performance differences. In selecting performance measures, we are considering investigating the distribution of performance measures rather than individual performance measures for different levels of the decomposition (i.e., distribution of velocities vs. average velocity) because we want to increase the reliability of kinematic measures.

2.5 Conclusion

We used an automated high-frequency tool tracking and postural measurement system to identify kinematic features of specific surgical actions in laparoscopic cholecystectomies as defined in a hierarchical decomposition diagram.
The results demonstrate that optoelectronic and video motion analysis, particularly in conjunction with a missing data interpolation technique, is appropriate for postural measurements and reasonable for recording kinematic data in the operating room, despite occasional periods of missing data. Based on the feasibility of using an optoelectronic motion capture system, we developed a methodology for validating surgical simulators and classifying surgical performance. The methodology is based on comparing performance measure distributions in different contexts using the K-S statistic. For the most part, our comparisons of performance measure distributions showed consistent reliability over time for a trained surgeon, and so these are likely reliable measures of performance. The kinematic measures require further comparative analysis and more data to show reliability. From our inter-procedural reliability analysis we showed the significance of procedural variability and found that as the number of evaluated procedures increases, the resolution of our ability to detect differences improves. The presented reliable measures and data will form the basis for quantifying surgical skill, and should be useful in validating surgical simulations for use in training, certifying surgeons, and designing and evaluating new surgical tools.

Chapter 3
Repeatability of Biomechanical Stresses in Laparoscopic Surgery

3.1 Introduction

The use of minimally invasive techniques has become widespread over the last decade. Improvements in surgical instrumentation have made it possible for surgeons to undertake increasingly complex laparoscopic surgical procedures. Unfortunately, the effects of these instruments on surgeon performance, comfort and safety have not been well evaluated.
Minimally invasive surgeons are often forced to use uncomfortable and potentially harmful postures, which may lead to long-term strain injuries and musculoskeletal disorders such as laparoscopic surgeon's thumb (Horgan 1997) and carpal tunnel syndrome (Buschbacher 1994). To identify and rectify such problems, ergonomists will typically assess working postures using either direct observation or instruments such as goniometers or motion analysis systems (Genaidy 1994). Some researchers have looked at the joint forces of sustained and repetitive postures (Viikari-Juntura 1999). Deviations from neutral posture have been correlated with injury risks, so ergonomic stress ratings can be assigned to the history of joint angles observed during a task in a surgical setting. The Rapid Upper Limb Assessment (RULA) system, which was designed for light manual tasks, is an appropriate scoring system for evaluating the biomechanical stress of a surgeon (McAtamney 1993). Ergonomic assessments in minimally invasive surgery are rare, despite widespread acknowledgement that strain injuries do occur to surgeons. Postural assessments are common in repetitive-task workplace settings, but only a small number of studies have been done in the operating room (OR). Radermacher et al. (1996) were among the first to assess minimally invasive surgeries and outlined improvements for workplace design. Berguer showed that postures in laparoscopic surgery are more stressful than in conventional open surgery (Berguer 1997) and that instrument design affects fatigue levels (Berguer 1999). In these past studies, visual or video based observations have formed the basis of postural assessment. The effect of patient and procedural variability on surgeon stress remains poorly understood.
In this study we expand on the efforts of Person's "proof of concept" pilot study, which demonstrated the feasibility of using an optoelectronic and video motion analysis system to perform a continuous ergonomic posture analysis in the OR (Person, 2000). In this research we investigate the performance and postural stress of an expert surgeon over multiple procedures. The objective of this study is to provide an assessment of the biomechanical stresses experienced by an expert surgeon over seven laparoscopic cholecystectomy procedures. More specifically, we investigate the effect of patient- and procedure-specific factors on the measurement of biomechanical stress and discuss whether a single procedure provides a reliable estimate of the stress experienced by a surgeon. This objective is met by investigating the similarity of joint angle distributions and by identifying the similarity of biomechanical stress scores.

3.2 Methods

3.2.1 Equipment

Refer to Section 2.2.2 in Chapter 2 for information on the experimental protocol and the data acquisition system.

3.2.2 Kinematic Arm Model, Calibration and Joint Angle Calculations

The Polaris system reports the spatial position of the marker arrays. Joint angles are calculated from the measured position of the marker arrays located on the surgeon's hand, forearm, upper arm and torso. We used a simplified seven degree-of-freedom (DOF) rigid-body model of the upper limb. The model allows us to represent shoulder abduction/adduction, flexion/extension and internal/external rotation; elbow flexion/extension; forearm pronation/supination; and wrist flexion/extension and radial/ulnar deviation. Using this model, we applied a kinematic calibration method based on circle and sphere fitting techniques to locate the approximate centre of each of the shoulder, elbow, and wrist joints.
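A common way to implement such a sphere fit is the linear (algebraic) least-squares formulation below. The thesis does not specify its exact fitting algorithm, so this formulation is an assumption, and the marker cloud is synthetic:

```python
import numpy as np

def fit_sphere(points):
    """Algebraic least-squares sphere fit.

    Uses ||p||^2 = 2 p.c + (r^2 - ||c||^2), which is linear in the
    centre c and the auxiliary term (r^2 - ||c||^2)."""
    A = np.hstack([2.0 * points, np.ones((points.shape[0], 1))])
    b = np.sum(points ** 2, axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    centre = sol[:3]
    radius = np.sqrt(sol[3] + centre @ centre)
    return centre, radius

# Synthetic wrist-marker cloud on a sphere about a known joint centre,
# expressed in the adjacent segment's frame (mm), with 0.3 mm noise.
rng = np.random.default_rng(2)
true_centre = np.array([10.0, -5.0, 30.0])
true_radius = 25.0
dirs = rng.normal(size=(400, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
pts = true_centre + true_radius * dirs + rng.normal(0.0, 0.3, (400, 3))
centre_est, radius_est = fit_sphere(pts)
```

With a few hundred well-spread marker samples, the recovered centre is typically within a fraction of a millimetre of the true joint centre; a circle fit for a hinge joint follows the same idea in 2-D.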
The method locates a joint axis or centre of rotation by fitting a circle (or sphere) to a cloud of raw data points collected from limb markers and expressed in a reference frame attached to an adjacent limb segment. These techniques are well known in the biomechanics and robotics literature (e.g., Halvorsen 1999). We used the joint angle definitions provided by Anglin (1993) to ensure consistency with the angles used in the RULA scoring system.

3.2.3 Comparative Analysis (K-S Statistic)

We would expect to see some degree of difference in joint angle distributions as a result of patient, procedural and measurement variability; however, since the surgeon is fully trained, we expect that his approach to each patient will be as consistent as possible. We would hope that there is sufficiently little variation between procedures that measurements obtained during a single procedure will be reliable indicators of the physical stress experienced by the surgeon. To demonstrate the repeatability of stress scores for a surgeon over multiple surgeries, we report the differences between the distributions of joint angles obtained in each procedure taken singly and those obtained from the remaining six surgeries. The Kolmogorov-Smirnov (K-S) statistic, D, provides a measure of difference between two distributions: it is the maximum vertical difference between the pair of cumulative distribution functions, as shown in Figure 3.1, and ranges from 0 (similar) to 1 (different). Using a bootstrapping approach outlined in Appendix E we establish a 90% confidence bound on our estimate of D. The associated p value expresses the probability of observing a D this large if the two measured distributions arose from the same underlying distribution.

Figure 3.1 - The Kolmogorov-Smirnov statistic, D, is defined to be the maximum vertical difference between two cumulative probability distributions. The cumulative distribution function for a finite data set is represented as a staircase.
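The single-procedure-versus-remainder comparison and its confidence bound can be sketched with a simple percentile bootstrap. The exact resampling scheme of Appendix E is not reproduced here, and the joint angle samples are synthetic:

```python
import numpy as np

def ks_distance(a, b):
    """Two-sample K-S statistic D between the empirical CPDs of a and b."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    return float(np.max(np.abs(
        np.searchsorted(a, grid, side="right") / a.size -
        np.searchsorted(b, grid, side="right") / b.size)))

def bootstrap_d_bounds(a, b, n_boot=500, level=0.90, seed=3):
    """Percentile-bootstrap confidence bounds on the D estimate:
    resample each sample with replacement and recompute D."""
    rng = np.random.default_rng(seed)
    ds = [ks_distance(rng.choice(a, a.size), rng.choice(b, b.size))
          for _ in range(n_boot)]
    q_lo, q_hi = 100 * (1 - level) / 2, 100 * (1 + level) / 2
    return np.percentile(ds, [q_lo, q_hi])

# Hypothetical joint angles (deg): one procedure vs. the pooled
# remaining six procedures.
rng = np.random.default_rng(4)
one_procedure = rng.normal(20.0, 12.0, 600)
other_six = rng.normal(22.0, 12.0, 3600)
d_hat = ks_distance(one_procedure, other_six)
d_lo, d_hi = bootstrap_d_bounds(one_procedure, other_six)
```

Here `(d_lo, d_hi)` plays the role of the 90% confidence bound on D; a naive bootstrap of the K-S statistic is somewhat biased upward for small samples, which is one motivation for the sample-size corrections discussed in Chapter 2.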
The difference measure calculated from the K-S statistic for each joint of each procedure provides a way of quantitatively assessing the difference between two distributions. Since we performed this experiment using the same tool set over seven procedures, the difference measures essentially provide a benchmark for that particular tool set. In future analyses we may be interested in evaluating new tool designs and their effect on surgeon posture. A comparison of the difference measures from the new tool design with the benchmarked values of the old tool set will provide insight into how the new tool influences surgeon posture. For example, consider developing a tool to reduce stress in the wrist. Our goal would be to design a tool which minimizes the frequency of wrist flexion/extension and ulnar/radial deviation. If we collect joint angle data of a surgeon using this new tool and compare these distributions with our database of joint angle measures (old tool design and the same surgeon), we would expect to see a difference in distributions. The K-S statistic provides a means of quantitatively assessing the difference between the two distributions of joint angles, and from this we quantify the improvement of the new tool.

3.2.4 Ergonomic Stress Scoring System

To rate the stress experienced by a surgeon we used a modified version of the Rapid Upper Limb Assessment (RULA) technique (McAtamney 1993). The Ovako Working Posture Analysis System (OWAS) for postural evaluation was considered but deemed inappropriate for our application (Karhu 1997). The OWAS technique was originally designed for assessment of workers in heavy industry and would require significant modifications to be appropriate for our application. The RULA technique is a posture evaluation method designed for assessing upper limb postures during light manipulation tasks.
Stress exposure is evaluated using body posture diagrams and scoring tables, which specify the different posture zones shown in Figure 3.2. The posture limit guidelines and scores are validated and based on the findings of several ergonomic studies (McAtamney 1993). We used a modified version of the RULA scoring system, which incorporates continuous scoring measures to improve the sensitivity of our assessments. The RULA scoring system is based on a discrete score for selected joint angle regions; using a continuous scoring system we are able to define stress more accurately across a range of joint angles. Since the standard RULA scales have different ranges for each joint, we normalized all scores such that 0.0 corresponds to the minimum RULA value for a joint and 1.0 corresponds to the maximum RULA value.

Figure 3.2 - RULA posture zones. Zero reference positions are also shown.

3.2.4.1 Modified Discrete RULA Method

We modified the RULA scoring system slightly to resolve the ambiguity in the scheme's use of the clinical terms 'abduction/adduction' and 'flexion/extension'. We adopted the angle convention used by Keyserling (1986), where shoulder flexion/abduction is classified by identifying elbow elevation <45° from the neutral (anatomical) position as neutral, 45°-90° as mild elevation, and >90° as severe elevation. A score similar to the RULA method is applied to indicate the posture level: neutral (1), mild elevation (2), severe elevation (3). We also modified the wrist flexion/extension and radial/ulnar deviation penalty zones. In the original RULA system, any non-neutral wrist postures (assessed visually) are considered stressful, but since measurements from a motion analysis system will always contain some noise, we must choose a non-zero finite range to represent neutral posture; we have therefore defined neutral wrist angles as those lying within ±5° of the neutral position.
Finally, in order to numerically classify the range of forearm pronation/supination, we adopted the posture classification presented by Genaidy (1994), where pronation or supination outside a ±15° range from neutral is considered poor posture.

3.2.4.2 Modified Continuous RULA Method

The modified discrete RULA method described in the previous section produces discontinuous jumps in the RULA score between posture zones. The original RULA system was designed for visual or video observation, where continuous joint angle measurements are not available. The actual stress on the soft tissues, and the consequent risk of injury, will be a continuous function of the joint angles. A more appropriate scoring model retains the same scoring approach but replaces the discontinuities with continuous transition functions, which represents a more realistic method of scoring stress. For example, if we consider flexion of the wrist, 5.1° of wrist flexion is only slightly more stressful than wrist flexion of 4.9°. Using the discrete modified RULA method we see completely different stress scores for these two joint angles, whereas a continuous scoring system gives a much smoother transition, which is more representative of joint stress. Figure 3.3 shows the discrete scores and continuous transition scoring functions chosen for the modified RULA scoring system of each joint.

Figure 3.3 - Discrete and continuous modified RULA posture scores adopted for the current study of continuous ergonomic posture analysis during minimally invasive surgery.
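The continuous transition idea can be sketched for wrist flexion/extension. The ±5° neutral zone follows the text and the ±15° outer boundary follows standard RULA wrist zones, but the 2° ramp width and the piecewise-linear interpolation are illustrative assumptions; scores here are raw 1-3 RULA-like levels, before the 0-1 normalization described above:

```python
import numpy as np

def wrist_fe_score(angle_deg, ramp=2.0):
    """Continuous stand-in for the discrete wrist flexion/extension score:
    1 inside the +/-5 deg neutral zone, 2 in the mid zone, 3 beyond
    ~15 deg, with linear ramps of width `ramp` replacing the jumps."""
    a = abs(float(angle_deg))
    xp = [5.0, 5.0 + ramp, 15.0, 15.0 + ramp]  # zone edges with ramps
    fp = [1.0, 2.0, 2.0, 3.0]                  # scores at those edges
    return float(np.interp(a, xp, fp))

# 4.9 deg and 5.1 deg now score almost identically (1.0 vs. 1.05),
# instead of jumping a full level as in the discrete scheme.
pair = (wrist_fe_score(4.9), wrist_fe_score(5.1))
```

Normalizing such a score to the 0-1 scale used in the text amounts to mapping the joint's minimum and maximum RULA values to 0.0 and 1.0 respectively, e.g. `(score - 1.0) / 2.0` for a 1-3 scale.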
3.2.5 Post-Operative Calculations

Post-operatively, we calculated the joint centres relative to the attached marker arrays, then calculated joint angles and the associated discrete and continuous RULA scores at each sampling interval throughout the procedure. Using the correlated video data we reviewed the entire procedure and divided it into phases based on the hierarchical decomposition outlined in Chapter 2. The three phases of interest are the cystic duct dissection, gallbladder dissection and gallbladder removal, as classified by Traverso et al. (1997). For each major phase of the surgery, we computed the fraction of time the surgeon spent at each stress level on the discrete RULA scale. We also compared these distributions across phases of the surgery to determine which phases were most stressful. Using the normalized continuous RULA values, we compute a normalized weighted postural stress score (NWPSS) for a period of observation by integrating the instantaneous scaled RULA scores (sRULA(t)) over the observation period, T, and then dividing by the period length to produce an average score:

NWPSS^j = (1/T) ∫_0^T sRULA^j(t) dt

where the superscript j refers to a particular joint.

3.3 Results

3.3.1 Marker Visibility

A certain amount of position data is lost during the course of the surgery due to marker occlusion or internal localization errors (e.g., Polaris is unable to find a marker within 0.5 mm of its expected location). These data were excluded in determining the total percentage of time spent in a given posture. Figure 3.4 illustrates the marker array visibility averaged over the seven procedures. On average, 36% (SD: 7%) of all joint angle data were missing during the entire surgery over the seven trials (before interpolation).

Figure 3.4 - The values labeled on each marker represent the percentage of time it was tracked over all seven procedures after interpolation.
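For sampled scores, the NWPSS integral reduces to a time-average computed with the trapezoidal rule. The score trace below is a toy example (the sRULA values are invented):

```python
import numpy as np

def nwpss(t, s_rula):
    """Normalized weighted postural stress score for one joint:
    NWPSS = (1/T) * integral of sRULA(t) dt over the observation
    period T, computed with the trapezoidal rule."""
    T = t[-1] - t[0]
    return float(np.sum(0.5 * (s_rula[1:] + s_rula[:-1]) * np.diff(t)) / T)

t = np.arange(0.0, 60.0, 0.05)               # 20 Hz over one minute
s = 0.3 + 0.2 * (np.sin(0.5 * t) > 0)        # toy scaled score in [0, 1]
score = nwpss(t, s)
```

A constant sRULA of 0.4 gives an NWPSS of exactly 0.4, so the score reads directly as "average fraction of the worst-case postural stress" over the observation period.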
We used a generalized cross validation (GCV) filtering technique to filter, interpolate and resample position data at a constant 20 Hz sampling rate (Woltring 1986). We were able to reduce the amount of missing joint angle data by 20% (SD: 2%) using our interpolation technique, with a maximum 1 mm RMS error across a maximum gap size of 0.5 s (10 samples). Using the correlated video images we separated manipulation from non-manipulation tasks and found that the majority of missing data could be attributed to marker occlusions during segments not directly related to performing a surgical task (e.g., defogging the laparoscope lens, changing instruments, etc.). During tissue manipulation tasks and after interpolation, we had complete joint angle data 80% (SD: 10%) of the time and at least one sample per second 92% (SD: 4%) of the time.

3.3.2 Comparative Analysis

For each joint, we calculated and plotted the cumulative probability distribution (CPD) of joint angles during each procedure, as shown in Figure 3.5. The CPD is useful for visualizing the distribution of the joint angle data and differences among individual distributions.

Figure 3.5 - Graphical representation of cumulative probability distributions of each joint and for each procedure. Each curve in each plot represents the joint angle distribution of one procedure.

3.3.2.1 Measurement Errors

In our system, measurement errors can be attributed to: (1) internal localization errors, (2) marker shifts, and (3) the neutral posture reference. The internal localization errors are inherent to the system and cause positional and angular orientation errors on the order of 0.35 mm and 1.1° respectively. These errors show up as noise in the position and orientation signals and are filtered out using the GCV filter described in Section 3.4.3.2. Marker shifts occur when a marker is bumped into a different position.
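The gap-limited interpolation step can be sketched as follows; an ordinary cubic spline stands in for Woltring's GCV smoothing spline, and the signal and gap placement are synthetic:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def fill_short_gaps(t, x, max_gap=0.5):
    """Fill NaN gaps in x(t) with a cubic spline fitted to the valid
    samples, but only for gaps spanning less than `max_gap` seconds
    (a stand-in for the thesis's GCV-spline interpolation)."""
    valid = ~np.isnan(x)
    spline = CubicSpline(t[valid], x[valid])
    out = x.copy()
    i = 0
    while i < x.size:
        if np.isnan(x[i]):
            j = i
            while j < x.size and np.isnan(x[j]):
                j += 1
            # Fill only interior gaps shorter than max_gap seconds.
            if 0 < i and j < x.size and t[j] - t[i - 1] <= max_gap:
                out[i:j] = spline(t[i:j])
            i = j
        else:
            i += 1
    return out

t = np.arange(0.0, 5.0, 0.05)                 # 20 Hz, 5 s of data
x_true = np.sin(2 * np.pi * 0.5 * t)          # toy 0.5 Hz joint signal
x_missing = x_true.copy()
x_missing[40:48] = np.nan                      # a 0.4 s occlusion gap
filled = fill_short_gaps(t, x_missing)
rms = float(np.sqrt(np.mean((filled[40:48] - x_true[40:48]) ** 2)))
```

Gaps longer than `max_gap` are left as missing rather than interpolated, mirroring the 0.5 s (10 sample) cutoff reported above; for smooth, slowly varying joint signals the interpolation error over such short gaps stays small.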
Great care was taken to position and align the markers in such a way that they were not disturbed during the procedure. Only one documented marker shift occurred in all of the procedures analyzed: in procedure 4 the hand marker shifted at minute 6 (total procedural time 44 min) as a result of poor marker adhesion to the hand. The profile of the CPD in Figure 3.6 shows evidence of a shift in the wrist RU angle distribution. We are currently developing algorithms based on an automatic joint centre identification method reported recently by O'Brien which will help compensate and correct for marker shifts (O'Brien 2000).

Figure 3.6 - Graphical representation of the cumulative probability distributions of wrist radial/ulnar deviation. Procedure 4 shows evidence of a marker shift.

The neutral posture reference taken at the end of each procedure is somewhat subjective in that the surgeon is simply asked to adopt a neutral posture. The neutral posture reference is the largest source of error for each joint angle measurement. The data collected for the neutral position are used as the absolute reference for all joint angle calculations for a particular procedure. We conducted an in-lab experiment to investigate the variability of the neutral posture reference position. A subject with no prior knowledge of the system was chosen to repeat a neutral posture reference position 30 times. For each trial the posture was recorded and the joint angles calculated. Since this subject repeated the neutral postures at short intervals, the results are probably better than those of the surgeon, who adopted these postures at intervals of up to several weeks. Figure 3.7 shows the variability of our neutral posture measurements found experimentally in the lab and from the OR data.
Figure 3.7 - Variations in neutral posture measurements found experimentally in the lab and from the OR data (shoulder, elbow, wrist FE, wrist RU and forearm PS).

3.3.2.2 Distribution Profiles

The CDF of each joint angle has its own unique profile and horizontal shift. The profile (or shape) itself provides insight into the characteristic behavior of each joint, suggesting the frequency of different angular positions during the procedure. Due to variability in defining the neutral posture for each joint, we shifted each distribution so that they share a common mean; the differences in the distributions are therefore primarily due to differences in shape rather than in mean values. Figure 3.8 shows a comparison between the joint angle distributions of the elbow before and after the shift, and Figure 3.9 shows a summary of all shifted joint angle distributions. The K-S statistic, D, is applied to the collection of shifted joint angle distributions.

Figure 3.8 - Graphical representation of the cumulative probability distributions of elbow flexion and extension. (L) Non-shifted data. (R) Shifted data (normalized about the mean of all distributions).

Figure 3.9 - Graphical representation of normalized cumulative probability distributions of each joint and for each procedure. Each distribution is normalized to the collective mean of joint data from all procedures. The bottom right figure shows the joint angle distribution shifts for each joint of each procedure.

3.3.2.3 Patient / Procedure Factors

Each procedure introduces a certain amount of variability into the data set. The same expert surgeon was evaluated in all procedures, and each case involved a new patient and operating staff.
Table 3.1 summarizes the patient and procedural factors which influenced the outcome of each procedure, and Figure 3.10(a) shows distinct profile differences for the joint angle distributions of the wrist. Comparing the distribution profiles to the patient and procedural factors outlined in Table 3.1 shows little correspondence between distribution profile and procedural variability.

Table 3.1 - General comments regarding each procedure.

Procedure | Patient Weight | Difficulty with Internal Anatomy | Procedural Factors / Comments
1 | Normal | Normal | Standard procedure
2 | Heavy | Moderate | Standard procedure
3 | Heavy | Normal | Problems with cautery tool
4 | Heavy | Difficult | Difficult procedure (large liver)
5 | Normal | Normal | Standard procedure
6 | Heavy | Normal | Difficult procedure
7 | Heavy | Normal | Standard procedure

Figure 3.10(a) - Graphical representation of cumulative probability distributions correlating procedures to groupings of joint angle distributions. Normalized cumulative probability distributions of the wrist joint for each procedure (left - wrist flexion/extension; right - wrist radial/ulnar). Procedures 1, 3, 5 and 6 group separately from procedures 2, 4 and 7.

In Figure 3.10(b) we notice two types of variability: (1) horizontal shifts of the distributions and (2) differences in the shape of the distributions. Both the horizontal shifts and the differences in shape suggest patient or procedural variability.

Figure 3.10(b) - Graphical representation of cumulative probability distributions showing types of variability from procedure to procedure: horizontal shifts of the distributions (left) and differences in the shape of the distributions (right).
The horizontal shifts may also indicate measurement errors; in particular, these errors are likely caused by variability in the neutral posture reference.

3.3.2.4 Difference Measures

The difference measure D calculated from the K-S statistic for each joint of each procedure provides a way of quantitatively assessing the difference between two distributions. Plotting the D values for each procedure on a difference scale from 0 to 1 gives us an idea of the variability of these distributions for an expert surgeon using a standard toolset in a laparoscopic cholecystectomy. The position of the D values on the difference scale provides context to the scale itself; in other words, we have a known, benchmarked difference scale from which further comparisons can be made. Figure 3.11 shows the range of D values for each joint resulting from comparing the joint angle distributions of each procedure to the concatenated distribution of the remaining six procedures, along with the average D value and standard deviation for each joint for shifted and non-shifted data sets. The most proximal joints exhibit the least variability (D ~ 0.10-0.13, on average), while the wrist joint exhibits the most (D ~ 0.2, on average). Since the surgeon is fully trained, these values represent the minimum variations we might expect to find from procedure to procedure when using the same surgical tools.

Figure 3.11 - Benchmarking the difference measures using the K-S statistic on a difference scale (shifted and non-shifted data for the shoulder, elbow FE, wrist FE, wrist RU, and forearm PS).

3.3.3 Posture Scores

Joint postures are calculated for the three main minimally invasive phases of the standardized laparoscopic cholecystectomy operation as defined by Traverso et al.
(1997): cystic duct dissection (CDD), gallbladder dissection (GBD), and gallbladder removal (GBR). Table 3.2 lists the time spent in each of these phases by our surgeon, as well as typical times required.

Table 3.2 - The average time taken for the major surgical tasks performed in seven procedures.

Operative Task           Time (mins ± std. dev.)     Typical Time (mins ± std. dev.)
Total operation time     68.7 ± 11.0 (100%)          72 ± 28 (100%)
Cystic duct dissection   18.8 ± 5.5 (27.5 ± 7.1%)    15 ± 11 (21%)
Gallbladder dissection   15.5 ± 4.8 (22.3 ± 4.8%)    14 ± 8 (19%)
Remove gallbladder       8.7 ± 1.8 (12.8 ± 2.9%)     12 ± 6 (17%)

We computed the modified discrete and continuous RULA posture scores illustrated in Figure 3.3 for the three operative phases (CDD, GBD, and GBR). Figures 3.12, 3.13 and 3.14 show the average contribution of the discrete ergonomic stress levels for each shifted joint angle distribution of interest during the three major phases of the surgery. A normalized weighted postural stress score (NWPSS) is calculated for each phase of the procedure for both the discrete and continuous scoring systems. For each joint, the variance is lower using the continuous scoring system than the discrete system. The NWPSS provides a directly comparable measure of overall postural stress in each joint.
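The contrast between the discrete and continuous scoring systems can be illustrated with a small sketch. The band edges below are illustrative assumptions, not the actual RULA thresholds; the continuous variant interpolates linearly between band edges rather than jumping at them, which is why it tends to produce less variance near band boundaries. The normalization of the NWPSS to a percentage of the maximum score is likewise an assumption:

```python
import numpy as np

# Illustrative joint-angle bands (degrees from neutral); the real
# RULA bands differ -- these thresholds are assumptions.
BAND_EDGES = [0.0, 20.0, 45.0, 90.0]   # edges between stress levels 1..4
MAX_SCORE = 4.0

def discrete_score(angle):
    """Step-function score: 1 (neutral posture) to 4 (highly stressed)."""
    a = abs(angle)
    score = 1
    for edge in BAND_EDGES[1:]:
        if a > edge:
            score += 1
    return score

def continuous_score(angle):
    """Linear interpolation of the same bands, avoiding jumps at edges."""
    a = min(abs(angle), BAND_EDGES[-1])
    return float(np.interp(a, BAND_EDGES, [1.0, 2.0, 3.0, 4.0]))

def nwpss(angles, score_fn):
    """Normalized weighted postural stress score over a joint-angle
    time series, expressed as a percentage of the maximum score."""
    scores = np.array([score_fn(a) for a in angles])
    return 100.0 * (scores.mean() - 1.0) / (MAX_SCORE - 1.0)
```

For example, an angle of 30 degrees scores a flat 2 on the discrete scale but 2.4 on the continuous scale, reflecting how far into the band the joint actually is.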
Figure 3.12 - Cystic duct dissection (CDD) stress levels for each joint angle normalized and averaged over seven procedures, including the continuous and discrete NWPSS. Continuous NWPSS (shoulder, elbow, wrist FE, wrist RU, forearm PS): 13.8±2.5%, 56.5±7.3%, 54.9±12.3%, 71.5±10.1%, 32.1±16.4%; discrete NWPSS: 7.8±5.2%, 68.6±8.8%, 50.9±11.9%, 69.4±11.8%, 27.9±19.1%.

Figure 3.13 - Gallbladder dissection (GBD) stress levels for each joint angle normalized and averaged over seven procedures, including the continuous and discrete NWPSS. Continuous NWPSS: 12.2±0.7%, 39.5±8.5%, 60.2±16.4%, 65.7±14.2%, 59.9±20.5%; discrete NWPSS: 4.6±1.8%, 39.7±11.9%, 57.6±18.1%, 62.6±17.9%, 57.1±25.4%.

Figure 3.14 - Gallbladder removal (GBR) stress levels for each joint angle normalized and averaged over seven procedures, including the continuous and discrete NWPSS. Continuous NWPSS: 12.3±3.1%, 43.3±5.3%, 56.8±18.2%, 73.8±7.4%, 32.8±10.8%; discrete NWPSS: 6.9±4.5%, 50.5±7.3%, 53.8±18.7%, 72.2±8.5%, 28.9±12.3%.
3.4 Discussion

The primary purpose of this study was to assess the repeatability and level of biomechanical stresses experienced by an expert surgeon over multiple laparoscopic cholecystectomy procedures and to establish the foundation of a method for quantitatively assessing new tool designs. We selected the laparoscopic cholecystectomy because it is the most common and well-defined minimally invasive procedure (Traverso 1997), and we used an optoelectronic measurement system because it is well tolerated by surgeons and reasonably easy to implement.

3.4.1 Tool Tracking

During tissue manipulation tasks, sampling rates were 20 Hz and the markers were fully visible almost 80% of the time; when markers did become obscured or otherwise unreadable, the resulting gaps in the data typically lasted only a fraction of a second. After interpolation we obtained at least one reading per second 92% of the time. This sampling rate is adequate for postural analyses; Bhattacharya et al. (1999), for example, sampled carpenters' postures at 1 Hz, and Person (2001) sampled at 1 Hz 96% of the time.

3.4.2 Joint Angle Comparison

A comparison of joint angle distributions was used to investigate the effect of procedure-specific factors on measurements of biomechanical stress and to test whether a single procedure provides a reliable estimate of the stresses experienced by a surgeon. An estimate of the similarity of the joint angle distributions is made using the K-S (D) statistic. The horizontal shifts of the cumulative probability distributions (CPDs) indicate variability caused both by measurement error and by patient and procedural influences. The most significant cause of measurement error was the neutral posture reference position. To investigate the effect of patient and procedural variability, each CPD was normalized to a common mean, providing a means to investigate the shape similarity of each distribution.
The normalized (shifted) CPDs allow investigation of the distribution profiles, which most likely illustrate the effect of patient and procedural variability. This may underestimate D, but it does put a lower bound on expected variability. The low values of D for the shoulder, elbow, and forearm suggest these distributions are only slightly affected by patient variability. The variance of D for wrist flexion/extension and wrist radial/ulnar deviation suggests moderate effects of patient and procedural variability; in particular, difficult patient types cause greater differences in D values.

Using the K-S (D) statistic to calculate measures of difference among joint angle distributions, we established a point on the scale for future assessments of new tool designs, which can now be evaluated quantitatively on the difference scale. For example, a new tool designed to improve wrist ergonomics would most likely show similar D values for the shoulder, elbow, and forearm and significantly different D values for the wrist joint angle distributions, coupled with a lower NWPSS.

Patient variability had a moderate effect on the joint angle distributions. The joint most significantly affected is the wrist, in both flexion/extension and radial/ulnar deviation. For future analyses of new tool designs it may be appropriate to evaluate performance on a single patient, provided the patient is as 'normal' as possible and we require differences in D greater than ~0.2.

3.4.3 Biomechanical Stress Levels

The normalized stress score (NWPSS) for all joints is used to assess the biomechanical stresses experienced by an expert surgeon over seven laparoscopic cholecystectomies. The averaged NWPSS for all joints is comparatively high throughout all the procedures. Figure 3.15 shows the averaged shifted continuous and discrete NWPSS values.
The variance of the continuous scores is consistently less than that of the discrete NWPSS scores, which suggests a difference in our ability to predict ergonomic stress and indicates that the continuous RULA scoring system is a more appropriate method for evaluating stress. The results also show that the wrist and elbow are consistently stressed in all phases of the procedures, which agrees with other reports (Berguer 1998). The forearm is stressed to a lesser degree, although the GBD phase shows more ergonomic stress than the CDD and GBR phases. During the GBD phase the spatula tool is used most often, and since the spatula has no handle the surgeon is forced to hold the tool in awkward ways, resulting in high ergonomic stress during this phase.

Figure 3.15 - Shifted continuous and discrete NWPSS averaged over seven procedures for each joint in each surgical phase (CDD, GBD, GBR).

3.4.4 Future Work

Our motion capture system has proven to be a useful ergonomic analysis tool; however, several improvements could be made in the future. First, a constraining jig for the upper limb and arm should be developed for the neutral posture reference. This issue may also be addressed by implementing O'Brien's automated joint centre and joint axis locating techniques (O'Brien 2000). Secondly, we experienced frequent short-term occlusion of the markers, which prevented us from performing dynamics calculations. This problem could be alleviated somewhat by positioning the camera directly overhead, using a second camera, increasing the number of faces on each marker array to enhance visibility, or complementing the optical system with magnetic trackers.
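Short-term occlusion of the kind noted above can be bridged by interpolating only the brief gaps while leaving longer dropouts unfilled. A minimal sketch (the fixed sample grid and the maximum bridgeable gap length are assumptions, not the thesis's actual parameters):

```python
import numpy as np

def fill_short_gaps(signal, max_gap=5):
    """Linearly interpolate NaN gaps no longer than max_gap samples;
    longer occlusions are left as NaN so they are not silently invented."""
    x = np.asarray(signal, dtype=float)
    isnan = np.isnan(x)
    if not isnan.any():
        return x
    idx = np.arange(len(x))
    filled = x.copy()
    # Interpolate every missing sample first ...
    filled[isnan] = np.interp(idx[isnan], idx[~isnan], x[~isnan])
    # ... then re-blank contiguous NaN runs longer than max_gap.
    run_start = None
    for i, flag in enumerate(np.append(isnan, False)):
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            if i - run_start > max_gap:
                filled[run_start:i] = np.nan
            run_start = None
    return filled
```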
Finally, we would like to investigate the effects of fatigue resulting from prolonged near-stationary postures and rapid posture transitions (i.e., fatigue vs. repetition). Since we already have joint position as a function of time, this analysis would not require additional OR data.

3.5 Conclusions

We successfully used an automated high-frequency postural measurement system to assess the biomechanical stresses experienced by an expert surgeon over seven laparoscopic cholecystectomy procedures. We found that patient variability primarily affects certain joint angle distributions, in particular those of the wrist and elbow joints. Ergonomic stress levels are high in all phases of surgery; in particular, the elbow and wrist are significantly more stressed than the shoulder and forearm. Using the K-S statistic we created a difference scale to benchmark the effects of procedural variability. This new benchmarking scale can be used to evaluate new tool designs, validate surgical simulations, and assess the performance of surgeons. This quantitative assessment method allows us to identify the segments of surgery which are most stressful to the surgeon.

Chapter 4 Conclusions and Recommendations

4.1 Introduction

The goal of the research presented in this thesis was to establish a methodology for quantifying the validity of surgical simulators for the purpose of assessing the skill and performance of surgeons and evaluating new tool designs. A technique using a motion analysis system to measure surgeon performance during live surgeries was developed. Seven live procedures were recorded (along with video records of an additional eight procedures), and from these data a range of performance parameters was derived. A method was also developed for showing the reliability of our measurements and of our established performance parameters.
This system is intended to be used in future studies to validate surgical simulations and to develop a database of performance measures from surgeons at various skill levels for surgeon certification.

4.2 Review of Present Research

4.2.1 Motion Capture System Feasibility

We used an automated high-frequency tool tracking and postural measurement system to measure the movements of surgical tools and the posture of a surgeon during seven laparoscopic cholecystectomy procedures. To our knowledge, this is the first time such a system has been implemented in the OR for the purpose of performance evaluation; in fact, it is the first time anyone has taken a quantitative approach to evaluating performance in the OR. The results demonstrate that optoelectronic and video motion analysis, particularly in conjunction with a missing-data interpolation scheme, is a reasonable method for recording kinematic data in the operating room, despite occasional periods of missing data. The system is particularly well suited to measuring joint angles at low frequencies. When tracking the tool tip, however, it often misses sizable data segments, reducing our ability to extract useful performance measures; for our analysis purposes, tool tip tracking requires continuous high-frequency data acquisition.

Using this motion capture system in the OR environment presented many challenges. Not only did we have to deal with equipment logistics, marker sterilization concerns, and patient scheduling, but during the time of our experiments we were also faced with a province-wide nursing strike. These challenges prevented us from taking measurements on a weekly basis (the typical schedule of our surgeon); instead we collected data, on average, every two weeks over a four-month period.
An estimated 25-30 hours of preparation work went into every procedure (equipment checks, marker sterilization, and equipment transport), and at least two researchers were required in the OR to operate the data collection system. For the recordings to be successful we required the co-operation of the operating staff and the surgeon, and essentially flawless operation of our equipment. In general, we found the operating staff to be very helpful, and on all days the surgeon was willing to participate in the experiment.

Our first experiment was a real learning experience. We found our postural markers worked well for calculating joint angles, but our tool trackers needed improvement (our first experiment used planar marker arrays for tracking the tool handle). The results from this first experiment suggested we needed an improved marker array design, which led to the development of the MDMArray. On one occasion our data acquisition software was improperly shut down, resulting in formatting problems in the raw data; a lengthy post-operative data analysis was required to rectify it. Improvements to the software and GUI interface prevented future mishaps.

In its present state, the feasibility of using this system to evaluate a large number of surgeons in the OR is poor, and it is impractical for commercial applications. The effects of the system on surgeon performance are difficult to quantify. Our surgeon expressed concerns over the size and weight of the instrumentation used, suggesting its presence may have affected his overall performance. In all of our recordings the surgeon was advised that if he felt uncomfortable wearing the markers at any time during the surgery, he was free to stop the experiment and remove the markers. In none of our recordings was the experiment prematurely interrupted. However, at the end of three procedures the surgeon commented that the system had a distracting effect and added considerable stress to the procedure.
On a number of occasions the surgeon also expressed that he suffered from "stage fright". At this point it is difficult to tell whether the instrumentation, the patient, or some other external factor led to the additional stress experienced by the surgeon. We assume these distractions would reduce the reliability of our overall performance assessment and would be captured in our analysis of procedural variability. Additional work is required to minimize the impact of such a system on the performance of a surgeon. Smaller instrumentation would likely be of most benefit, but ideally we would like to take our measurements without the surgeon even knowing.

In its present state, this system requires a considerable amount of manual input. For example, identifying the segments of the procedure containing manipulation tasks required the researcher to painstakingly review each frame of the video data. This human input not only introduces a certain amount of subjectivity (although we tried to be as consistent as possible) but also reduces the system's feasibility for use in a widespread research study. A considerable amount of time was also spent calibrating the marker arrays and surgical instrumentation. Our surgeon also expressed considerable opposition to the postoperative joint calibration procedure (which could be replaced with an automated system for joint centre identification). Further work is required to automate this system, thereby allowing us to expand our data collection capabilities.

Finally, there was some concern about the precision of our tool tip location. The tool tip location was found from the instantaneous position of the tool marker, which is located at the tool handle. A tool-tip transformation from the tool marker was found using sphere fitting techniques at the end of each procedure.
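The sphere-fitting idea rests on the fact that, if the tool tip is held stationary while the handle is pivoted, the tracked marker positions lie on a sphere centred at the tip. A standard algebraic least-squares sketch of this kind of calibration (not necessarily the exact fit used in the thesis):

```python
import numpy as np

def fit_sphere_center(points):
    """Least-squares sphere fit via the linearization of
    ||p - c||^2 = r^2: solve [2p, 1] . [c, r^2 - |c|^2] = |p|^2."""
    P = np.asarray(points, dtype=float)
    A = np.hstack([2.0 * P, np.ones((len(P), 1))])
    b = (P ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    centre = sol[:3]
    radius = np.sqrt(sol[3] + centre @ centre)
    return centre, radius

# Example: noiseless points on a sphere of radius 2 centred at (1, 2, 3).
rng = np.random.default_rng(0)
d = rng.normal(size=(50, 3))
d /= np.linalg.norm(d, axis=1, keepdims=True)
pts = np.array([1.0, 2.0, 3.0]) + 2.0 * d
centre, radius = fit_sphere_center(pts)
# centre is approximately [1, 2, 3]; radius is approximately 2.
```

With noisy marker data the residual of this fit gives a direct estimate of the tip-location precision.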
The precision of the tool tip location is on the order of ±1.5-2.0 mm, depending on the noise of the marker array. The kinematic details of the tool are affected by the positional accuracy of the tool tip, which raises the issue of the accuracy of our kinematic measures. The poor reliability of our kinematic measures may be the result of both poor tool tip accuracy and a lack of data.

4.2.2 Performance Measure Reliability for the Assessment of Surgical Performance in the OR

Having established the feasibility of using an optoelectronic motion capture system, we developed a methodology for validating surgical simulators and classifying surgical performance. A hierarchical decomposition was developed to provide an organizational framework for comparing performance measures in different contexts. Based on work done in the lab, we found it difficult to automatically extract action states from our kinematic data. Our approach used fuzzy membership functions to define the membership of each state at any particular instant; however, the membership functions often suggested membership in more than one state at the same instant, making it difficult to detect the duration of action states. Our method of extracting kinematic data is an improvement over Cao's video-based system and shows limited potential for reliably detecting action states automatically (Cao 1996).

Our methodology is based on comparing performance measure distributions in different contexts using the Kolmogorov-Smirnov (K-S) statistic. The K-S statistic is most commonly used to test the null hypothesis that two sets of data were drawn from the same underlying distribution. In our case, however, we know that the two data sets are obtained from two separate tasks, so we expect the two distributions to be different.
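The two-sample K-S D is simply the largest vertical gap between the two empirical CDFs. A minimal sketch (equivalent in spirit to library routines such as scipy.stats.ks_2samp):

```python
import numpy as np

def ks_d(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: the maximum vertical distance
    between the two empirical CDFs. Nondimensional, so it applies to
    any performance measure regardless of the units of measurement."""
    a = np.sort(np.asarray(sample_a, dtype=float))
    b = np.sort(np.asarray(sample_b, dtype=float))
    grid = np.concatenate([a, b])
    # Empirical CDFs evaluated on the pooled sample values.
    cdf_a = np.searchsorted(a, grid, side='right') / len(a)
    cdf_b = np.searchsorted(b, grid, side='right') / len(b)
    return float(np.abs(cdf_a - cdf_b).max())
```

Identical samples give D = 0 and completely non-overlapping samples give D = 1, which is what makes the 0-to-1 difference scale possible.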
Instead, we use the K-S statistic as a measure of the difference between the two distributions and interpret significant differences as indicators of differences in performance or skill level. The K-S statistic is an improvement over traditionally used statistical tools such as the t-test and F-test, which test for significance in means and variances respectively (Von Mises, 1964). The K-S statistic was useful because it is nondimensional, making it directly applicable to any measured variable regardless of the units used in the measurement, and because it makes no a priori assumptions about the nature of the statistical distributions of the data. The K-S (D) value gives us an idea of how different two distributions are regardless of how they differ, which is important in our application of performance assessment.

For the most part, our comparisons of performance measure distributions showed that the trained surgeon performed consistently over the seven procedures, so the performance measures we investigated are reasonably reliable measures of performance. Because of difficulties in obtaining sufficient tool tip data, we cannot yet conclude that the kinematic measures were reliable. From our inter-procedural reliability analysis, we showed the significance of procedural variability and found that, as the number of procedures evaluated increases, the resolution of our ability to detect differences improves. The significance of procedural variability was shown for all of our performance measures (time, kinematics, joint angles, and event sequencing). Improving the resolution of our measurements is important for detecting differences between surgeons of different skill levels. However, the resolution for detecting a certain level of difference trades off against the cost of measuring additional procedures.
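This resolution-versus-cost trade-off can be explored by bootstrap resampling in the spirit of the RS procedure defined in the List of Terms: repeatedly draw m values from the pooled distribution, compute D against the pool, and read off a high percentile as the smallest D distinguishable from sampling noise at that m. A sketch with assumed parameter names and counts (not the thesis code):

```python
import numpy as np

def detectable_d(pooled, m, cycles=500, level=95, seed=0):
    """Bootstrap estimate of the smallest K-S D distinguishable from
    sampling noise when only m measurements are available."""
    pool = np.sort(np.asarray(pooled, dtype=float))
    rng = np.random.default_rng(seed)

    def d_against_pool(sample):
        # Two-sample K-S D between the resample and the full pool.
        s = np.sort(sample)
        grid = np.concatenate([s, pool])
        cdf_s = np.searchsorted(s, grid, side='right') / len(s)
        cdf_p = np.searchsorted(pool, grid, side='right') / len(pool)
        return np.abs(cdf_s - cdf_p).max()

    ds = [d_against_pool(rng.choice(pool, size=m, replace=True))
          for _ in range(cycles)]
    return float(np.percentile(ds, level))

# More samples give finer resolution (a smaller detectable D).
pooled = np.random.default_rng(1).normal(size=1000)
coarse = detectable_d(pooled, m=6)
fine = detectable_d(pooled, m=60)
```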
If we find we need a certain level of resolution and achieving it requires measuring 10 procedures, then the system is impractical, as the cost of measuring 10 procedures is high. Not only is the cost prohibitive, but the subject will probably make considerable, undetected performance gains over those 10 procedures, defeating the purpose of our analysis. In our case, for completion time we need about 6 measurements (i.e., one procedure) to distinguish a moderate D value of ~0.3. At this point this seems appropriate, although further work is required to see if we can measure differences between surgeons.

4.2.3 Repeatability of Ergonomic Measures in MIS Surgery

In this study, we used the same optoelectronic motion capture system to assess the biomechanical stresses experienced by an expert surgeon. This was the first high-frequency automated posture measurement system to be implemented in the OR and stands as a substantial improvement over the traditional checklist-based approach. We used a modified version of the RULA technique for evaluating surgeon joint stress. The original RULA scoring system, based on discrete scores, was developed as a visual checklist assessment of postural stress. Since we have high-frequency joint angle data, we modified the RULA scores into a continuous scoring system. We found the standard deviation of stress scores across the seven procedures to be lower using the continuous system than the discrete system, suggesting the continuous system is more appropriate for high-frequency joint angle data. From our analysis we noticed that patient variability affects certain joint angle distributions more than others, particularly those of the wrist and elbow joints. More data are required to say definitively whether specific patient variability affects certain joint angle distributions in particular ways.
At this point it is difficult to tell whether a surgeon behaves differently (in terms of posture) because of patient-specific factors (i.e., weight, sex, etc.). From our data we found that postural stress levels are high in all phases of surgery, and that the elbow and wrist are significantly more stressed than the shoulder and forearm. This quantitative assessment method allows us to identify the segments of the surgery which are most stressful for the surgeon. Using this knowledge to identify stressful states in the OR, we can improve instrumentation for surgeons during laparoscopic surgery and recommend changes in posture to improve comfort.

It is important to note that the RULA scoring system does not take into consideration the effects of joint fatigue or joint loading. The stress experienced by a surgeon holding a particular posture for a prolonged period of time is different from that of the same surgeon moving in and out of that posture from a neutral position over a longer period. Each scenario would give a similar RULA score, but the effects of fatigue and joint stress would differ in each case. Future work may involve consulting with ergonomists to identify the effects of joint fatigue and repetitive joint strain. We may also consider correlating the stress experienced by the surgeon with forces and tool tip position.

4.2.4 Overall Performance Assessment

Surgeon performance and skill assessment have become important topics in minimally invasive surgery research. Research groups involved in developing training programs or simulations always propose some measure of performance. Completion time
Research groups have looked at a number of different performance measures including: precision and speed (Derossis 1998), knot quality (Hanna 1997), economy of arm motion (Eman 1999), errors and task times (Hanna 1998), error propagation (Joice 1998), and force-torque signatures (Rosen, 2001). In our research we considered completion time, posture, event sequencing and tool tip kinematics. Any one of these measures may show performance differences, but the question is which measure, or combination of measures, captures the relevant aspects of real surgery. To address this question, we need to measure the performance of surgeons in the O R and in simulators. Using this collection of performance measures, we need to identify which performance measures are reliable at detecting differences between surgeons of different skill levels. Our research suggests for an expert surgeon in the O R our measures of completion time, event sequencing, and joint angle measures are reliable measures for evaluating performance and wi l l be important for the purpose of validating surgical simulations. At this point we don't know exactly how good our performance measures w i l l be at differentiating between different groups of surgeons. We may find that two equally trained surgeons have completely different tool tip kinematics but similar end results. In this case, we would say that the kinematic measures are unreliable. However, i f the end result is the same then there must be some commonality in the sequence of events that occurred. This would suggest that perhaps the sequencing performance measure would be appropriate for measuring differences in performance. We are now starting to make distinctions between motor and cognitive skill levels. Future work wi l l require having a broad disciplinary research group of engineers, physiologists, surgeons, and ergonomists. 
4.3 Future Research Recommendations

4.3.1 Motion Capture Measurements

In consultation with surgeons and other engineers, several recommendations were derived from our motion analysis study in the OR to enhance the reliability of our measurements. The suggestions for future work in data collection are:

• Improve surgical instrument motion tracking by refining the marker array designs and camera positioning, i.e., by developing an optimal geometric marker array design and identifying an optimal camera position.

• Investigate the feasibility of using multiple motion tracking cameras and of implementing alternative tool motion tracking instrumentation such as accelerometer/gyro systems (Integrated Micro Instruments) and Shape Tape (Measurand Inc.). The advantage of these techniques is continuous tracking of the tool tip; the limitation is the additional wires and attachments on the surgical tool.

• Investigate using NCL's 6-axis force transducer (ATI Industrial Automation) for calculating force-torque signatures of surgical tools (Rosen 2001).

• Develop an automated system for extracting performance measures from video and/or kinematic data.

• Develop a constraining jig for the upper limb neutral posture reference. This will help eliminate some of the procedure-to-procedure shifts in the marker data distributions.

• Implement an automated technique for determining the joint centres and joint axes (O'Brien 2000).

4.3.2 Performance Measures and Statistical Analysis

Recommendations for additional performance measures and alternative statistical analysis techniques:

• Examine joint angle performance measures such as joint angular velocity and acceleration (Eman 1999).

• Develop a new sequencing similarity measure in which the transition probabilities depend not only on the current state but on previous events as well.
• Investigate ways of quantifying human reliability and the frequency of errors (Joice 1998).

• Use distributions of performance measures (such as velocity) rather than summary measures (such as mean velocity) in the comparative analysis.

• Investigate the effects of fatigue loading on joint stress.

4.4 Future Studies

As mentioned previously, our long-term research goal is to develop validated simulations of surgical tasks based on motion analysis studies in the OR. Our intent is to use these controlled simulations as platforms for assessing surgeons' performance and evaluating improved MIS instrumentation. Several components of the developed methodology require further investigation to assess their overall reliability and clinical utility. The following questions remain unanswered:

• How reliable is our selection of performance measures in detecting differences between a surgeon in the OR and in a simulator, and between surgeons of varying skill?

• Is the inter-procedural variability large compared with inter-surgeon differences or skill level differences?

• How much data do we need to detect measurable differences between surgeons of various skill levels? Is this amount feasible?

• What is the minimum number of performance variables needed to obtain a well-rounded measure of a surgeon's skill level?

• Can we verify that time spent in a simulator translates into improved operating room performance and, if so, what is the "transfer effectiveness"?

The following sections outline relevant studies which should be considered in achieving our long-term research goals and addressing the above questions.

4.4.1 Simulator Validation

The next logical step in this project is to begin the process of validating a surgical simulation. For this phase of the project, we propose to develop a simple desktop simulation (perhaps based on existing ones, e.g., Derossis 1998) which has tool and arm motions similar to those surgeons use in actual surgery.
To validate the simulation, we must ensure that a surgeon treats the simulation in all relevant respects the same way they treat a live patient. The validation process requires (1) showing the reliability of our selected performance measures in the simulation, and (2) an iterative process of comparing performance measures between the OR and the simulator to refine aspects of the simulation and improve the correspondence of these measures. For the immediate future we suggest using the same data acquisition system; we would expect some improvement in marker trackability, as the simulator environment will be less complicated than the operating room and will offer better sight lines. We propose to use the quantitative measurements of surgical tool motions and surgeon posture from this OR study for our comparative evaluation of movements in surgical simulations. To measure the discrepancy between performance measures, we use the methodologies developed in this document.

The process of showing the reliability of our performance measures is complex and requires us to assess how much difference in performance measure distributions is needed to detect differences between OR and simulator tasks and between surgeons of varying skill levels. We expect to see some difference in the distributions of performance measures collected in the two settings. Since we intend to use the simulator as a tool for evaluating performance, we need to know how much difference we can detect at various performance levels; this requires measuring surgeons of different skill levels (described in the following section). The iterative process of simulator validation requires adjusting the simulator so that it resembles, as closely as possible, the OR for a particular task. Consider joint posture, for example: adjusting the table height and its position with respect to the subject will help us to match joint angle distributions.
Once we have established correspondence, we can introduce subjects of different skill levels and begin the process of showing performance measure reliability.

4.4.2 Surgical Skills Assessment

To establish the reliability of our performance measures, we need to quantify the performance of surgeons of varying skill levels. For this assessment we suggest using a simulated setting, which provides a more robust platform with less variability and influence from procedure-specific factors. It is also logistically much more feasible to get a wider range of subjects in the simulator, as it is often difficult to get surgeons with a wide range of skill levels in the OR. To differentiate between surgeons with different skill levels, we need reliable and clinically meaningful performance measures which allow us to detect differences between them. The resolution of our ability to detect differences will depend on what differences in skill level we want to resolve. To address this issue we suggest collecting data from a large number of surgeons and developing a database of the recorded performance measures. This database will allow us to provide context to our difference measurements and show reliability and feasibility. Feasibility will depend on the number of samples required for our desired resolution. Finally, we need to establish a ranking system for evaluating which measures are most reliable at detecting differences in performance. In our global assessment of surgeon performance, measures with high reliability should be weighted more heavily than performance measures with lower reliability. This will allow us to consolidate our performance measures so that we can deliver useful clinical observations for making assessments and suggesting improvements.

List of Terms

Action States:
ID - Idle
HD - Hold
RH - Reach
RT - Retract
PH - Push
PL - Pull
TR - Translate
SW - Sweep
OR - Orient
CW - Clockwise-Orient
CCW - Counter-Clockwise-Orient
GR - Grasp
RL - Release
SP - Spread

CES - Cumulative Event Sequence
CES_T - Cumulative Event Sequence for all procedures
CES_Ti - Cumulative Event Sequence for all procedures excluding procedure i
CPD - Cumulative Probability Distribution
CPD_i - Cumulative Probability Distribution for an individual procedure i
CPD_T - Cumulative Probability Distribution for all procedures
CPD_Ti - Cumulative Probability Distribution for all procedures excluding procedure i
D - KS (D) statistic difference measure
D_reference - CPD of D's from resampling
D_measured - CPD of D's calculated from each procedure
  D_measured,i = D(CPD_Ti, CPD_i), where i is the ith procedure
  D_reference = RS_r(CPD_T)
  D_BS = RS_r(D_reference)
  D_ref_meas = D(D_reference, D_measured)
D_CPD95 - 95% confidence level of the bootstrap data
D_CPD50 - 50% confidence level of the bootstrap data
δ (delta) - Event sequencing difference measure
δ_reference - CPD of δ's from resampling
δ_measured - CPD of δ's calculated from each procedure
  δ_measured,i = δ(TPM_Ti, TPM_i), where i is the ith procedure
  δ_reference = RS_r(δ(TPM_T, TPM_m_random)), where TPM_m_random is the transition probability matrix for m randomly selected events in CES_T
ES_i - Event Sequence for an individual procedure i
FFT - Fast Fourier Transform
fc - Frequency cutoff
IRED - Infra-red Emitting Diode
KS - Kolmogorov-Smirnov
m - resampled data sequence length
MDMArray - Multi-Directional Marker Array
MIS - Minimally Invasive Surgery
n - number of events or samples
N - number of data points
Ne - Effective N
NCL - Neuromotor Control Laboratory
NWPSS - Normalized Weighted Postural Stress Score
p - p-value
Phases:
CDD - Cystic Duct Dissection
GBD - Gall Bladder Dissection
GBR - Gall Bladder Removal
r - number of bootstrapping cycles
RS - Resample
RS_r - resample in length of m, r times
RULA - Rapid Upper Limb Assessment
Subtasks:
TT - Total Time
AP - Approach
TM - Tissue Manipulation
CP - Clip Application
WD - Withdrawal
Surgical Tools:
Gr - Grasper
S - Scissors
Ir - Irrigation
LGr - L-Grasper
RGr - Ratcheted Grasper
Ca - Camera
Sp - Spatula
Cp - Clipper
TPM - Transition Probability Matrix
TPM_i - Transition Probability Matrix of ES_i
TPM_T - Transition Probability Matrix of CES_T
TPM_Ti - Transition Probability Matrix of CES_Ti
NVSP - Normalized Variance Shift caused by patient variability
VSP - Variance Shift caused by patient variability

Bibliography

An, K.N., Browne, A.O., Korinek, S., Tanaka, S. & Morrey, B.F. (1991). Three-dimensional kinematics of glenohumeral elevation. Journal of Orthopaedic Research 9, 143-149.
Allen, J.A., Hays, R.T. & Buffardt, L. (1986). Maintenance training simulator fidelity and individual differences in transfer of training. Human Factors 28, 497-509.
Anglin, C. (1993). A functional task analysis and motion simulation for the development of a powered upper-limb orthosis. M.A.Sc. Thesis, University of British Columbia, Vancouver, BC.
Bhattachaya, A., Warren, J., Teuschler, J., Dimov, M., Medvedovic, M. & Lemasters, G. (1999). Development and evaluation of a microprocessor-based ergonomic dosimeter for evaluating carpentry tasks. Applied Ergonomics 30, 543-553.
Berguer, R. (1997). The application of ergonomics in the work environment of general surgeons. Reviews on Environmental Health 12, 99-106.
Berguer, R., Rab, G.T., Abu-Ghaida, H., Alarcon, A. & Chung, J. (1997). A comparison of surgeons' posture during laparoscopic and open surgical procedures. Surgical Endoscopy 11, 139-142.
Berguer, R., Remler, M. & Beckley, D. (1997). Laparoscopic instruments cause increased forearm fatigue: a subjective and objective comparison of open and laparoscopic surgery. Minimally Invasive Therapy and Allied Technologies 6, 36-40.
Berguer, R., Gerber, S., Kilpatrick, G. & Beckley, D. (1998). An ergonomic comparison of in-line vs. pistol-grip handle configuration in a laparoscopic grasper. Surgical Endoscopy 12, 805-808.
Berguer, R., Forkey, D.L. & Smith, W.D. (1999). Ergonomic problems associated with laparoscopic surgery. Surgical Endoscopy 13, 466-468.
Buschbacher, R. (1994). Overuse Syndromes Among Endoscopists. Endoscopy 26, 539-544.
Cao, C.G.L. (1996). A Task Analysis of Laparoscopic Surgery: Requirements for Remote Manipulation and Endoscopic Tool Design. M.A.Sc. Thesis, Simon Fraser University, Burnaby, BC.
Cao, C.G.L., MacKenzie, C.L., Ibbotson, J.A., Turner, L.J., Blair, N.P. & Nagy, A.G. (1999). Hierarchical decomposition of laparoscopic procedures. In Proc. Medicine Meets Virtual Reality: 7. IOS Press, Amsterdam.
Catalano, M.F. (1997). Endoscopic Therapy of Complications Following Laparoscopic Cholecystectomy: How Much Can We Expect? Endoscopy 29, 389-391.
Chung, J.Y. & Sackier, J.M. (1998). A method of objectively evaluating improvements in laparoscopic skills. Surgical Endoscopy 12, 1111-1116.
Dautzenberg, P., Neisius, B., Trapp, R. & Buess, G. (1995). A Powered Dexterous Instrument with Surgical Effectors for Telemanipulator Assisted Laparoscopy. In Proc. 17th IEEE Engineering in Medicine and Biology Int. Conf.
Derossis, A.M., Bothwell, J., Sigman, H.H. & Fried, G.M. (1998). The effect of practice on performance in a laparoscopic simulator. Surgical Endoscopy 12, 1117-1120.
Derossis, A.M., Fried, G.M., Abrahamowicz, M., Sigman, H.H., Barkun, J.S. & Meakins, J.L. (1998). Development of a model for training and evaluation of laparoscopic skills. American Journal of Surgery 175, 482-487.
Efron, B. & Tibshirani, R. (1986). Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science 1, 54-77.
Eubanks, T.R., Clements, R.H., Pohl, D., Williams, N., Schaad, D.C., Horgan, S. & Pellegrini, C. (1999). An objective scoring system for laparoscopic cholecystectomy.
Journal of the American College of Surgeons 189(6), 566-74.
Faraz, A. & Payendeh, S. (1997). Synthesis and Workspace Study of Endoscopic Extenders With Flexible Stem. Journal of Mechanical Design 119, 412-414.
Farin, G. (1988). Curves and Surfaces for Computer Aided Geometric Design. Academic Press.
Finlay, P.A. & Ornstein, M.H. (1995). Controlling the Movement of a Surgical Laparoscope. IEEE Engineering in Medicine and Biology, 289-291.
Fried, G.M., Derossis, A.M., Bothwell, J. & Sigman, H.H. (1999). Comparison of laparoscopic performance in vivo with performance measured in a laparoscopic simulator. Surgical Endoscopy 13, 1077-1081.
Genaidy, A.M., Al-Shedi, A.A. & Karwowski, W. (1994). Postural stress analysis in industry. Applied Ergonomics 25, 77-87.
Gerald, C.F. & Wheatley, P.O. (1989). Applied Numerical Analysis, 4th edition. Addison-Wesley Publishing.
Green, P.S., Hill, J.W., Jensen, J.F. & Shah, A. (1995). Telepresence Surgery. In Proc. 17th IEEE Engineering in Medicine and Biology Int. Conf., 324-329.
Halvorsen, K., Lesser, M. & Lundberg, A. (1999). A new method for estimating the axis of rotation and the center of rotation. Journal of Biomechanics 32, 1221-1227.
Hanna, G.B., Drew, T., Clinch, P., Hunter, B. & Cuschieri, A. (1998). Computer-controlled endoscopic performance assessment system. Surgical Endoscopy 12, 997-1000.
Hasson, H.M. (1993). Rotational handle for laparoscopic instrumentation. Journal of Reproductive Medicine 38, 494-496.
Hennie, F.C. (1968). Finite-State Models for Logical Machines. John Wiley & Sons, Inc.
Herder, J.L., Horward, M.J. & Sjoerdsma, W. (1997). A laparoscopic grasper with force perception. Minimally Invasive Therapy and Allied Technologies 6, 279-286.
Hodgson, A.J., Pantazopol, R.A., Visser, M.D., Salcudean, S.E. & Nagy, A.G. (1997). Assessing potential benefits of enhanced dexterity in laparoscopic surgery. In Proc. 19th Ann.
IEEE Engineering in Medicine and Biology Society Conference, Chicago.
Hodgson, A.J., Person, J.G., Salcudean, S.E. & Nagy, A.G. (1999). The effects of physical constraints in laparoscopic surgery. Medical Image Analysis 3(3), 275-83.
Hodgson, A.J. & McBeth, P.B. (2002). Comparing Motor Performance on Similar Tasks in Different Settings: Statistical Characteristics of a Nondimensional Difference Measure. Internal Document.
Horgan, L.F., O'Riordan, D.C. & Doctor, N. (1997). Neuropraxia following laparoscopic procedures: an occupational injury. Minimally Invasive Therapy and Allied Technologies 6, 33-35.
Hunter, J.G. (1997). The Learning Curve in Laparoscopic Cholecystectomy. Minimally Invasive Therapy and Allied Technologies 6, 24-25.
Joice, P., Hanna, G.B. & Cuschieri, A. (1998). Errors enacted during endoscopic surgery - a human reliability analysis. Applied Ergonomics 29, 409-414.
Keyserling, W.M. (1986). A computer-aided system to evaluate postural stress in the workplace. Am. Ind. Hyg. Assoc. J. 47, 641-649.
Lentz, G.M., Mandel, L.S., Lee, D., Gardella, C., Melville, J. & Goff, B.A. (2001). Testing surgical skills of obstetric and gynecologic residents in a bench laboratory setting: validity and reliability. American Journal of Obstetrics Gynaecology 184(7), 1462-70.
Lirici, M.M. (1997). Editorial - New techniques, new technologies and educational implications. Minimally Invasive Therapies & Allied Technologies 6, 102-104.
Mannila, H. & Ronkainen, P. (1998). Similarity of Event Sequences (Extended Abstract). University of Helsinki, Department of Computer Science.
Matern, U. & Waller, P. (1999). Instruments for minimally invasive surgery: principles of ergonomic handles. Surgical Endoscopy 13, 174-182.
Matern, U., Eichenlaub, M., Waller, P. & Ruckauer, K. (1999). MIS instruments. An experimental comparison of various ergonomic handles and their design. Surgical Endoscopy 13, 756-762.
McAtamney, L. & Corlett, E.
N. (1993). RULA: A survey method for the investigation of work-related upper limb disorders. Applied Ergonomics 24, 91-99.
Mueller, L.P. (1993). Laparoscopic instrument grips. An ergonomic approach [letter]. Surgical Endoscopy 7, 465-466.
Mukherjee, R., Song, G. & Satava, R.M. (1996). An Articulating Manipulator for Enhanced Dexterity in Minimally Invasive Surgery. In Proc. 18th IEEE Engineering in Medicine and Biology Int. Conf.
Nagy, A.G., Poulin, E.C., Girotti, M.J., Litwin, D.E. & Mamazza, J. (1992). History of laparoscopic surgery. Canadian Journal of Surgery 35(3), 271-4.
Neisius, B., Dautzenberg, P., Trapp, R. & Buess, G. (1995). Robotic Telemanipulator for Laparoscopy. In Proc. 17th IEEE Engineering in Medicine and Biology Int. Conf.
Nelson, W.L. (1983). Physical Principles for Economies of Skilled Movements. Biological Cybernetics 46, 135-147.
Neuhaus, S.J. & Watson, D.I. (1997). Laparoscopic surgeons' thumb - is it a training phenomenon? Minimally Invasive Therapy and Allied Technologies 6, 31-32.
O'Brien, J.F., Bodenheimer, B.E., Brostow, G.J. & Hodgins, J.K. (2000). Automatic Joint Parameter Estimation from Magnetic Motion Capture Data. In Proceedings of Graphics Interface, Montreal, Canada, 53-60.
Ohgami, M. (1998). Robotics in Endoscopic Surgery. Unpublished presentation at Society of American Gastrointestinal Endoscopic Surgeons (SAGES) Annual Meeting, Seattle, WA.
Ottensmeyer, M.P., Thompson, J.M. & Sheridan, T.B. (1996). Telerobotic Surgery: Experiments and Demonstration of Tele-Surgeon/Assistant Cooperation Under Different Time Delays and Tool Designs. In SPIE 2901 - The International Society for Optical Engineering, Telemanipulator and Telepresence Technologies 3 (Stein, M.R., ed.), 156-166.
O'Toole, R., Playter, R., Krummel, T., Blank, W., Cornelius, N., Roberts, W., Bell, W. & Raibert, M. (1998).
Assessing skill and learning in surgeons and medical students using a force feedback surgical simulator. In Lecture Notes in Computer Science 1496, 899-909.
Payandeh, S. & Shell, B. (1996). Endoscope 2(1). Simon Fraser University, Burnaby, BC, Integrated System based project (IS-9), IRIS. (Pamphlet)
Pellegrini, C.A. & Sinanan, M.N. (1997). Training, proctoring, credentialing in endoscopic surgery. Minimally Invasive Therapy and Allied Technologies 6, 26-30.
Perissat, J. (1995). Laparoscopic surgery in gastroenterology: an overview of recent publications. Endoscopy 27, 106-118.
Person, J.G. (2000). Thesis: A Foundation for the Design and Assessment of Improved Instruments for Minimally Invasive Surgery.
Person, J.G., Hodgson, A.J. & Nagy, A.G. (2000). Automated High-Frequency Posture Sampling for Ergonomic Assessment of Laparoscopic Surgery. Surgical Endoscopy 11, 226-229.
Press, W.H., Teukolsky, S.A., Vetterling, W.T. & Flannery, B.P. (1992). Numerical Recipes in C, 2nd Ed. Cambridge University Press.
Radermacher, K., von Pichler, K.C., Erbse, St., Boeckmann, W., Rau, G., Jaske, G. & Staudte, H.-W. (1996). Using human factor analysis and VR simulation techniques for the optimization of the surgical worksystem. In Medicine Meets Virtual Reality: 4: Health Care in the Information Age (Sieburg, H., Weghorst, S. & Morgan, K., eds.), IOS Press and Ohmsha, Amsterdam, NL.
Rovetta, A., Sala, R., Cosmi, F., Wen, X., Milanesi, S., Sabbadini, D., Togno, A., Angelini, L. & Bejczy, A.K. (1996). A New Telerobotic Application: Remote Laparoscopic Surgery Using Satellites and Optical Fiber Networks for Data Exchange. International Journal of Robotics Research 15, 267-279.
Reznick, R.K. (1993). Teaching and testing technical skills. American Journal of Surgery 165(3), 358-61.
Rosen, J., Hannaford, B., Richards, C.G. & Sinanan, M.N. (2001).
Markov Modeling of Minimally Invasive Surgery Based on Tool/Tissue Interaction and Force/Torque Signatures for Evaluating Surgical Skill. IEEE Transactions on Biomedical Engineering 48(5), 579-91.
Rosser, J.C., Wood, M., Payne, J.H., Fullum, T.M., Lisehora, G.B., Rosser, L.E., Barcia, P.J. & Savalgi, R.S. (1997). Telementoring. A practical option in surgical training. Surgical Endoscopy 11(8), 852-5.
Rosser, J.C., Rosser, L.E. & Savalgi, R.S. (1998). Objective Evaluation of a Laparoscopic Surgical Skill Program for Residents and Senior Surgeons. Archives of Surgery 133(2), 657-661.
Sackier, J.M. & Wang, Y. (1994). Robotically assisted laparoscopic surgery. From concept to development. Surgical Endoscopy 8, 63-66.
Sanders, A.F. (1991). Simulation as a tool in the measurement of human performance. Ergonomics 34, 995-1025.
Starkes, J.L., Payk, I. & Hodges, N.J. (1998). Developing a standardized test for the assessment of suturing skill in novice microsurgeons. Microsurgery 18(1), 19-22.
Straface, S. (1995). Single-patient-use laparoscopic instrumentation: a company perspective. Endoscopic Surgery & Allied Technologies 3(2-3), 135-9.
Taffinder, N., Darzi, A., Smith, S. & Taffinder, N. (1999). Assessing operative skill. Needs to become more objective. BMJ 318(7188), 887-8.
Taylor, R.H., Funda, J., Eldridge, B., Gomory, S., Gruben, K., LaRose, D., Talamini, M., Kavoussi, L. & Anderson, J. (1995). A Telerobotic Assistant for Laparoscopic Surgery. IEEE Engineering in Medicine and Biology Magazine, 279-288.
Tendick, F., Mori, T. & Way, L.W. (1995). Future of Laparoscopic Surgery. In Fundamentals of Laparoscopic Surgery (L.W. Way, S. Bhoyrul & T. Mori, eds.), Churchill Livingstone Inc., New York, 235-252.
Torkington, J., Smith, S.G., Rees, B.I. & Darzi, A. (2000). The role of simulation in surgical training. Annals of the Royal College of Surgeons of England 82(2), 88-94.
Traverso, L.W., Koo, K.P., Hargrave, K., Unger, S.W., Roush, T.S., Swanstrom, L.L., Woods, M.S., Donohue, J.H., Deziel, D.J., Simon, I.B., Froines, E., Hunter, J. & Soper, N.J. (1997). Standardizing laparoscopic procedure time and determining the effect of patient age/gender and presence or absence of surgical residents during operation. A prospective multicenter trial. Surgical Endoscopy 11, 226-229.
Treat, M. (1996). A surgeon's perspective on the difficulties of laparoscopic surgery. In Computer-Integrated Surgery (Taylor, R.H., Lavallee, S., Burdea, G.C. & Mosges, R., eds.), MIT Press, Cambridge, MA, 559-560.
Viikari-Juntura, E. & Silverstein, B. (1999). Role of Physical Load Factors in Carpal Tunnel Syndrome. Scandinavian Journal of Work, Environment & Health 25(3), 163-185.
Von Mises, R. (1964). Mathematical Theory of Probability and Statistics. Academic Press.
Walpole, R.E. & Myers, R.H. (1978). Probability and Statistics for Engineers and Scientists, Second Edition. Macmillan Publishing Co., Inc.
Way, L.W., Bhoyrul, S. & Mori, T. (1995). Learning Laparoscopic Surgery. In Fundamentals of Laparoscopic Surgery (L.W. Way, S. Bhoyrul & T. Mori, eds.), Churchill Livingstone Inc., New York, 225-233.
Winckel, C.P., Reznick, R.K., Cohen, R. & Taylor, B. (1994). Reliability and construct validity of a structured technical skills assessment form. American Journal of Surgery 167(4), 423-7.
Woltring, H.J. (1986). A Fortran Package for Generalized, Cross-Validatory Spline Smoothing and Differentiation. Advances in Engineering Software 8(2), 142-151.

Appendix A
OR Study Experimental Protocol and Data Acquisition Procedures

A.1 Experimental Protocol

Surgeon: Dr. Alex Nagy (VGH)
Researchers: Mr. Paul McBeth (UBC), Dr. Antony Hodgson (UBC)
Location: Vancouver Hospital
Procedure: MIS cholecystectomy

Study Protocol: This is a protocol for the motion analysis study of Dr. Alex Nagy performing a MIS cholecystectomy.
The surgeon is asked to scrub and enter the OR while the patient is being anesthetized. Before the operation begins, one researcher also scrubs in, in order to attach the sterilized marker arrays to the surgeon's hands, lower arms, torso, and surgical tools. The researcher remains scrubbed in to assist the surgeon and to be available to make any adjustments to the marker arrays or support cuffs. If the researcher is unable to scrub in, they instruct the OR nurse on the correct procedure for attaching the marker arrays from outside the sterile field. Once the marker arrays are secured in place, another researcher outside the sterile field performs a test to ensure the motion analysis equipment is operational. Once the equipment is tested and confirmed operational, the researcher informs the surgeon that they are ready to begin recording. When the surgeon confirms he is also ready, the researcher begins tracking the marker arrays with the Polaris motion capture system and begins recording the operation using both the external video camera and the laparoscopic camera. The equipment records for the duration of the initial open and subsequent laparoscopic portions of the surgery. When the laparoscopic portion of the surgery is completed, the surgeon informs the researcher and the equipment is shut off. The surgeon continues to wear the marker arrays once the surgery is completed in order to perform joint calibration. The researcher instructs the surgeon to perform a five-minute joint calibration procedure. Once the joint centre calibration procedure is complete, the surgeon is free to remove the marker arrays. If the surgeon feels uncomfortable wearing the markers at any time during the surgery, they are free to stop the experiment and remove the markers. Approval for this experiment was granted through the University of British Columbia Clinical Research Ethics Board and the Vancouver Hospital.
In addition, all equipment has been approved by the Sterile Supply and Biomedical Engineering Departments.

A.2 Equipment List

The following outlines the hardware and software components required for experiments conducted in the OR:

Hardware:
1 - Panasonic video recorder
1 - Panasonic AC adaptor
1 - 6' extension cord
1 - Super VHS video cassette
1 - Portable computer desk
1 - Polaris Tool Interface Unit
1 - Polaris Position Sensor
1 - Serial port cable
1 - Polaris Position Sensor cable
1 - Polaris power cable
1 - Power bar
1 - Logitech web cam
1 - 6' USB extension cord
2 - Tripods
1 - PC 800 MHz AMD Duron (tower, keyboard, mouse, monitor)
2 - Video tapes
1 - Digital camera
1 - Active marker array *
11 - Passive marker arrays and MDMArrays *
PSV1 - (3 reflective markers)
PSV2 - (3 reflective markers, Velcro back)
PSV3 - (3 reflective markers, 4 - 2mm hex bolts, 6 - 2mm nuts, 1" ACME clip)
PSV4 - (3 reflective markers)
PSV5 - (3 reflective markers, 4 - 2mm hex bolts, 6 - 2mm nuts, 1" ACME clip)
PSV6 - (3 reflective markers)
TOOL-1 (MDMArray) - (5 reflective markers, 3 - NDI mounting posts, 2 - NCL mounting posts, 5 - 2mm hex bolts, 3 - 2mm nuts, 3 - 2mm washers, 1" ACME clip)
TOOL-2 (MDMArray) - (5 reflective markers, 3 - NDI mounting posts, 2 - NCL mounting posts, 5 - 2mm hex bolts, 3 - 2mm nuts, 3 - 2mm washers, 1" ACME clip)
TOOL-3 (MDMArray) - (5 reflective markers, 3 - NDI mounting posts, 2 - NCL mounting posts, 5 - 2mm hex bolts, 3 - 2mm nuts, 3 - 2mm washers, 1" ACME clip)
TOOL-4 (MDMArray) - (5 reflective markers, 3 - NDI mounting posts, 2 - NCL mounting posts, 5 - 2mm hex bolts, 3 - 2mm nuts, 3 - 2mm washers, 1" ACME clip)
TOOL-5 (MDMArray) - (5 reflective markers, 3 - NDI mounting posts, 2 - NCL mounting posts, 5 - 2mm hex bolts, 3 - 2mm nuts, 3 - 2mm washers, 1" ACME clip)
TOOL-6 (MDMArray) - (5 reflective markers, 3 - NDI mounting posts, 2 - NCL mounting posts, 5 - 2mm hex
bolts, 3 - 2mm nuts, 3 - 2mm washers, 1" ACME clip)
2 - extra clips
1 - (1" ACME clip, 2 - 2mm hex bolts, 6 - 2mm nuts)
8 - ½" dia., 1" long surgical tubing *
4 - Stretchy Velcro straps *
1 - Chest marker attachment strap *
2 - Foam Velcro straps w/ mating end *
1 - 3/16" socket head w/ extension piece *
1 - 2mm Allen key *

* Equipment requiring sterilization

Software:
Windows 2000 or Windows NT
Matlab 6.0 R12
Tera Term Pro V2.3
Logitech QuickCam V5.4.1

Matlab programs:
- PMCS.m
- Calibrate_Joints.m
- OR_Tools_Calib.m

A.3 Marker Placement

Refer to Figure A.1 for the proper position and orientation of markers:

Actv1 - Torso
PSV2 - Right hand
PSV3 - Left tool
PSV4 - Distal forearm
PSV6 - Proximal forearm
Tool-1 - Spatula
Tool-2 - Scissors
Tool-3 - Clipper
Tool-4 - Grasper 1
Tool-5 - Grasper 2 (L-grasper)
Tool-6 - Grasper 3 (Ratcheted grasper)

Figure A.1 - Marker placement to capture limb motion during surgery (L). Multi-Directional Marker Array on a dominant-hand laparoscopic tool handle (R).

A.4 OR Procedure

The following are the step-by-step procedures for motion capture data collection in the OR.

A.4.1 Pre-operative Set-up
Required Time: 30 min - 1 hr
Suggested start time: 0600 for a 0745 procedure

Set up equipment as shown in Figure A.2.

Figure A.2 - Vancouver Hospital operating room equipment layout (anesthesia cart, research assistant's computer cart, Position Sensor and tripod, video camera and tripod, MIS cart, Tool Interface Unit cart, and the positions of the anesthetist, patient, researcher, surgeon, and assistant).

A.4.2 Software and Polaris Initialization
Required Time: 10 min
Suggested start time: 0650 for a 0745 procedure

Start-up Procedure - Teraterm
1) Run Teraterm
2) Turn on the Polaris Tool Interface Unit (wait 20 sec for the beep - RESETBE6F will appear in the Teraterm command window)
3) Type: COMM 50000 - Reply: OKAYA896
4) Teraterm window: Setup - Serial port...
Change baud rate to 115200
5) Type: INIT_ then enter (note: _ means space bar) - Reply: OKAYA896
6) Teraterm window: File - Exit

Start-up Procedure - Matlab (IR testing)
1) Start Matlab 6.0 R12
2) Change current directory: d:\Polaris
3) Run IR_test from the Matlab command window
4) Check for any error messages
5) Identify external infra-red sources (if required)
6) Shut down Matlab

A.4.3 OR Data Collection Procedures
Required preparation time: 5 min
Required Time: Approximately 75 min
Suggested start time: 0745

1) Start the video camcorder (focus the camera on the surgeon's upper limb and the surgical area)
2) Start the MIS VCR to record the laparoscope
3) Start Matlab R12
4) Set Current Directory to: d:\Polaris\Polaris_Interface\OR_tracker
5) In the Command Window type: PMCS
6) Figure A.3, shown below, will appear
7) Select the radio buttons for all markers
8) STOP - check equipment connections before initialization of Polaris
9) Press Initialize Polaris when ready
10) Wait until all status bars are illuminated in yellow (check for error messages in the Command Window)
11) In the Right Hand Tool Choice box select the appropriate tool used by the surgeon
S y s t e m Initialization Too l Se lec t ion R a d i o But tons S ta tus B a r s S y s t e m Shu t -down UBC • Neuromotor Control Laboratory Figure A3 - Custom designed data acquisition software (PMCS.m) A.4.4 Joint Centre Calibration Data Collection Procedure Required preparation time: 1 min Required Time: 5 min Suggested start time: immediately post-operative 1) Set Current Directory to: d:\Polaris\Calibration\Joint_Calib_Data_Collection 2) In the Command Window type: Calibrate joints 3) Fol low instructions provided on the Matlab Command Window (Note: each sequence tracks 200 samples) 4) Joint centre calibration • Right shoulder centre calibration - Shoulder abduction / adduction - Subject must hold elbow locked while swinging the arm in an approximately cross shaped arc incorporating shoulder abduction / adduction, flexion / extension, and internal / external rotation. 109 • Right elbow centre calibration - Elbow flexion / extension - Subject must hold upper arm vertically downwards while swinging the forearm about the elbow in flexion and extension. • Right forearm axis calibration - Forearm pronation / supination - Subject must hold upper arm vertically downwards and elbow at 90° while rotating the forearm in pronation / supination motion. • Right wrist centre calibration o Wrist flexion / extension - Subject moves wrist in flexion and extension, o Wrist radial / ulnar deviation - Subject moves wrist in radial and ulnar direction. o Wrist spherical - Subject rotates wrist in a spherical motion. • Neutral Pose - limbs positional with zero reference position - Shoulder down, elbow flexed at 90°, straight wrist and hand. 5) After each calibration step a plot w i l l appear. Data points should be in definable paths. If there are stray data points redo the calibration for that step. Fol low direction in Matlab Command Window. 
6) Before the surgeon removes the markers check for the following files in directory: d:\Polaris\Data_storage\Joint_calibration: neutral_r.mat re_fe_calib.mat re_ps_calib.mat rs_calib.mat rw_calib.mat rw_fe_calib.mat rw_ru_calib.mat 7) Remove markers from surgeon A.4.5 Tool Calibration Data Collection Procedure Required preparation time: 10 min Required Time: 30 min Suggested start time: post-operative 1) Start Matlab 6.0 R 12 2) Change current directory: d:\Polaris\OR_Tool_calibration 3) Run: Tool_calibration 4) Place the tip of the tool at a stationary location and rotate the tool handle in a spherical motion about this point 5) Select: PSV1 - extra tool P S V 3 - Left hand tool Tool-1 - Spatula Tool-2 - Scissors Tool-3 - Clipper Tool-4 - Grasper 1 Tool-5 - Grasper 2 Tool-6 - Grasper 3 110 6) Refer to Matlab Command Window for tracking update. Note the sequence w i l l track 150 times. 7) Check plot for stray data points 8) Remove marker from each tool 9) Submit for markers and tools for sterilization A.4.6 Clean-up Equipment / Back-up data Required Time: 15 min 1) Account for all equipment (including marker array sub components) outlined in Section A . 2 . 2) Submit any contaminated equipment for sterilization. 3) Back-up data on hard-drive. A . 5 Con tac ts See Dr. Antony Hodgson for list of hospital contacts. I l l Appendix B Measurement Equipment and Data Acquisition Protocols for use in the Operating Room B.l. Polaris System In our experiments we used a Polaris Hybrid Optical Tracking System (Northern Digital Inc. (NDI), Waterloo, Ontario, Canada) as shown in Figure B . l The system tracks the three-dimensional position of both passive retro-reflective markers and active infra-red light emitting diodes (LEDs) . The positional accuracy of each marker is 0.35 mm. Polaris dynamically tracks and calculates the location ( X , Y , Z spatial coordinates) and orientation (roll, pitch, yaw angles) for each marker array. 
A marker array is defined by a minimum of three markers on a rigid frame. The system is capable of tracking three active arrays and nine passive marker arrays. Figure B.l - Polaris Hybrid Optical Tracking System (Northern Digital Inc., Waterloo, Ontario, Canada). System components: Position Sensor (top), Tool Interface Unit (middle), Marker Arrays (both active and passive markers) (bottom). 112 We used Polaris to track the position of one active array and nine passive arrays at 20 H z with occasional missing data points. The active array and three passive arrays were used to collect ergonomic postural data. The remaining six passive arrays were used in tool tracking; one passive array on the left hand tool (non-dominant hand) and five passive arrays on the right tool (dominant hand). See Figure B.2. Figure B.2 - Markers arrays used to track position of the upper limb (left), non-dominant hand tool (top right), and dominant hand tool (bottom right). B.l . l Planar Marker Array Design Passive and active marker arrays are made from 1/8" plate aluminum machined into an 'L-shaped' based as shown in Figure B.3. For passive arrays, three retro-reflective markers are mounted to the array plate with stainless steel machine screws and for active arrays, three lA" active infra-red L E D s are mounted to the array plate with epoxy. This planar marker array design is limited in its trackable range of movement as only one plane is visible to the camera. If the plane of the array is tilted more than - 8 0 ° from the plane of the camera then trackability is lost. The total angular range of movement is - 1 6 0 ° for planer marker arrays. 113 Figure B.3 - Polaris marker arrays - Active IRED marker Array (left). Passive retro-reflective marker array (right). B.1.2 Multi-directional Marker Array Design (MDMArray) In minimally invasive surgery the surgical tool is constrained to three rotational degrees of freedom (Roll , Pitch, Yaw), and 1 translation degree of freedom (Plunge). 
The limited trackability of planar marker arrays prevents continuous tracking of rotating surgical tools, where the face of an array can be rotated out of the line of sight. There is therefore a need for a multi-directional marker array. A new marker array was developed to limit occlusions resulting from the wide range of surgical tool movements. The Multi-Directional Marker Array (MDMArray) consists of multiple faces which can be tracked simultaneously by Polaris. The MDMArray is constrained to having geometrically unique faces (as outlined by Polaris) and must be lightweight, compact, sterilizable, easily attached to and released from the surgical tools, and well accepted by surgeons. The resulting MDMArray, as shown in Figure B.4, measures 72 mm in diameter and 58 mm high, weighs 26 grams, and is built from 6061 aluminum alloy. The aluminum was polished and finished with a black anodized surface finish to minimize excess infra-red light reflections. It has five detachable retro-reflective markers, which form five geometrically unique faces. Three of the detachable markers are attached to the base using NDI mounting posts. The remaining two detachable markers are attached to custom designed mounting posts. The MDMArray is equipped with a metal binder clip allowing easy attachment to and release from the shaft of the instrument. The resulting MDMArray was well accepted by the surgeon and hospital staff. The range of marker trackability has been enhanced: the MDMArray can be rotated about its longitudinal axis with ~350° of trackability, and normal to the longitudinal axis the trackability is ~160°.
The trackability of the M D M A r r a y was calculated and compared with a planar marker array for 30 simulated manipulation tasks using a standard laparoscopic tool. A n average improvement in trackability of 159% (SD: 88%) was found. See Figure B.5. Marker Array Trackability 100 80 1 60 co .XL U co I- 40 20 « o o • • 1 o Oo o o X o X x x x x X X v X X X X X x x v X * x X x x x v x x v x x X X X x O MDMArray X Planar Array 10 15 20 Simulated Task Number 25 30 Figure B.5 - A percent trackability evaluation of MDMArray and Planar marker arrays attached to a standard laparoscopic tool for 30 simulated manipulation tasks. 115 B.2. V i d e o System A video system was used in conjunction with the Polaris system in order to correlate information about the actual surgical procedures to corresponding raw motion data tracked by the optoelectronic system. Video images of the surgery were collected using both a laparoscope and an external camera aimed at the surgeon. The images from these two sources were time stamped and recorded onto a standard V H S tape. The laparoscope system was comprised of a standard 10mm - 30° surgical laparoscope, camera and illuminator (Stryker Endoscopy). These components are arranged and connected in the O R as illustrated in Figure B.6. OR Video Figure B.6 - Data collection components - Polaris Hybrid Optical Tracking System and Video Recording Equipment. B.3 . D a t a A c q u i s i t i o n P ro toco l One expert surgeon was evaluated in seven clinical laparoscopic cholecystectomies over a period of four months at Vancouver Hospital (Vancouver, British Columbia). Posture data and tool tip trajectories were measured with opteoelectronic motion analysis equipment. Sterilized planer marker arrays are attached to the surgeon's torso and dominant arm (on the proximal and distal forearm and on the hand), as shown in Figure B.2. The torso array was made of active markers; all others were passive. 
Marker arrays were secured to the surgeon using elastic and Velcro® harnesses. Individual MDMArrays were attached to each surgical tool (scissors, clipper, spatula, tissue grasper, ratcheted grasper, locking grasper). The Polaris system was connected to a personal computer (800 MHz AMD Duron) running custom designed Matlab data acquisition software. Approval for this experiment was granted through the University of British Columbia Clinical Research Ethics Board and Vancouver Hospital. Additional details are given in Appendix A.

Appendix C
Signal Processing and Data Formatting for Optoelectronic Marker Array Data from the OR

C.1 Signal Processing
For the positional tracking of surgical tools and limbs in space, an optoelectronic system was chosen because of its sterilizability, sampling properties and availability, despite its inherent line-of-sight limitations. This section describes the properties of the signals returned by Polaris and the methods used to deal with missing data caused by occlusions.

C.1.1 Missing Data
The missing data samples are a result of occlusions, internal localization errors or additional processing time required by Polaris. Missing data resulting from occlusions is a serious problem in tracking surgical instruments with optoelectronic systems. Occlusions occur when the line of sight between the marker and the camera becomes obscured; obstructions may include the surgeon's hands or surgical instrumentation. During surgery, an attempt was made to minimize occlusions by carefully positioning the optoelectronic camera to maximize workspace trackability and by attaching markers in the most visible position and orientation. Custom designed tool marker arrays were developed to help deal with line-of-sight limitations, as discussed in Appendix B. Internal localization errors or 'flicker' occur when Polaris is unable to find a marker within 0.5 mm of its expected location.
These errors are the result of poor marker calibration constants derived from the marker array geometry, or of the inability of Polaris to distinguish the circular characteristics of each reflective sphere. See Figure C.1. As a result, a marker array which is clearly in the field of view may temporarily not register with Polaris.

Figure C.1 - NDI retro-reflective marker visibility - ~164° hemispheric range of view. The internal localization of the marker is based on the centroid. The side view (left) has a shifted centroid, which introduces localization errors; the top view (right) has an unshifted centroid.

Unequally spaced sampled data is the result of real-time calculations required by Polaris. The real-time fitting calculations of the Polaris system often require one or two extra (1/60 s) clock frames. The frequency of these additional computation periods depends on the number of passive markers tracked simultaneously (more markers, more computation).

C.1.2 Signal Properties
The signal returned by Polaris is the position (X, Y, Z spatial coordinates) and orientation (q1, q2, q3, q4 quaternion components) of each marker array. Analysis of the spectral content of each signal found that each has approximately the same cutoff frequency and noise properties. The Fast Fourier Transform (FFT) and residual power plots for a sample signal are shown in Figure C.2. The underlying frequency content (fc) is ~4 Hz.

Figure C.2 - Spectral signal properties. Signal power as a function of signal frequency (left). Residual signal power as a function of signal frequency; fc = ~4 Hz (right).

C.1.3 Filtering and Resampling
The non-equally sampled nature of the data limits our ability to filter noise and interpolate missing points. Traditional frequency-based techniques rely on continuously sampled data at a fixed sample frequency.
Resampling non-equally spaced data at a fixed rate using interpolation techniques prior to filtering is not advised because it enhances signal noise. Spline fitting techniques are often used on unequally spaced data (Farin 1988). Spline coefficients are calculated based on the sampled data points. A smoothing parameter (S) describes the smoothness of a spline and was chosen to reduce measurement noise. A Generalized Cross Validation (GCV) technique was used to find an optimal smoothing parameter for fitting the spline to the data set. The GCV method used in our experiment was originally developed by Woltring and has been used extensively in the biomechanics literature (Woltring 1986). The GCV technique was used to optimally find the coefficients of cubic and quintic splines fitted to position data. A cubic spline was used to fit position data acquired at the tool handle (the MDMArray attachment point). Using a homogeneous transformation matrix (found during tool calibration), the fitted position data of the tool handle is used to locate the tool tip. A quintic spline is then fitted to the tool tip trajectory and used to calculate higher order derivatives. Quintic splines were chosen over the more commonly used cubic splines for their higher order derivative continuity properties. The drawbacks of higher order splines are increased computation time and artifacts in positional interpolation (Curtis 1989). In certain applications quintic splines show oscillatory behavior in the interpolation of position data (Curtis 1989). Since our data at the tool tip do not require interpolation, no oscillatory behavior was observed in our filtered tool tip position data. Additional details regarding the tool tip calibration process are found in Section C.2.2.

C.1.4 Interpolation / Extrapolation Errors
Interpolation and extrapolation were used to deal with the missing points outlined in Section C.1.1.
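Our filtering used Woltring's GCV package in Matlab. As a rough Python illustration of the underlying idea only (a grid-searched smoothing parameter for scipy's UnivariateSpline, scored by simple held-out cross-validation rather than true GCV), applied to unequally spaced samples:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)

# Unequally spaced samples of a smooth trajectory plus measurement noise.
t = np.sort(rng.uniform(0.0, 2.0, 120))
y = np.sin(2.0 * np.pi * t) + rng.normal(0.0, 0.05, t.size)

def cv_score(s):
    """Fit a cubic smoothing spline to even-indexed samples and score it
    on the held-out odd-indexed samples (a crude stand-in for GCV)."""
    spl = UnivariateSpline(t[::2], y[::2], k=3, s=s)
    return np.mean((spl(t[1::2]) - y[1::2]) ** 2)

# Grid search over candidate smoothing parameters.
grid = np.logspace(-3, 2, 30)
s_best = grid[np.argmin([cv_score(s) for s in grid])]

# Final fit on all samples; s is roughly doubled since twice as many
# points contribute to the residual sum that s bounds.
spline = UnivariateSpline(t, y, k=3, s=2.0 * s_best)
velocity = spline.derivative(1)(t)  # derivatives come from the spline fit
```

As in the thesis pipeline, derivatives are then evaluated from the fitted spline rather than by finite-differencing the noisy samples.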
The error associated with interpolating across varying gap sizes was calculated. Five 20 s samples of real data with no occlusions were used to calculate the average and maximum RMS error of missing data points. The data set collected was real OR data based on a sample of mixed movement types (rapid movements and slow movements). The original test signal was smoothed using the GCV filter, and the coefficients from the spline fitting procedure were used to calculate velocity, acceleration, and jerk. A known sample point was then removed from the original data set at a specified time instant. The data set with the missing point was smoothed, and the new position, velocity, acceleration and jerk were calculated at that time. The differences between the original and interpolated values were used to calculate the RMS error. The process was repeated with larger gap sizes, and the mean and maximum RMS differences were found for each gap size up to 15 samples. In the interpolated segments, all available data points on each side of the missing gap were used in the calculation; a minimum of two data points is required on each side of a missing data gap. Figure C.3 (upper left) shows a graphical representation of missing data interpolation for a segment of data. The purpose of incrementing the missing segment along the total segment is to investigate the effect of gap size on different characteristic movements. Movements can be fast or slow (high and low frequency): for slow movements the tolerable interpolation gap size is large, and for fast movements it is small. Because of the random nature of the position signal it is difficult to predict the frequency content of the data in order to vary the gap size, so a single gap size was chosen as a compromise between larger errors in fast movements and smaller tolerable gap sizes in slow movements. An RMS error of 1 mm was chosen as an acceptable error, which suggests a maximum gap size of 0.5 s (10 samples) can be interpolated.
A gap size of 10 samples allows a resolution in velocity, acceleration, and jerk of 22 m/s, 280 m/s², and 4700 m/s³ respectively. To remain within the 1 mm RMS error boundary, only two extrapolated data points are allowed. See Figure C.3.

Figure C.3 - Missing data interpolation and extrapolation. Graphical representation of missing data interpolation (top left). RMS interpolation error associated with measuring position and calculating velocity, acceleration, and jerk as a function of gap size (top right - bottom left). RMS extrapolation error associated with measuring position (bottom right).

C.1.5 Velocity, Acceleration, and Jerk Signal Properties
Calculating higher order derivatives of position data such as velocity, acceleration, and jerk leads to noise amplification and signal distortion. The frequency content of interest in the position signal is less than 4 Hz. This is considerably below the Nyquist frequency of 10 Hz, which is required to prevent aliasing when sampling at 20 Hz, so the sampling rate is appropriate for our analysis of position data. The accumulated power of the position, velocity, acceleration, and jerk signals suggests each signal is appropriate for our analysis, as each contains at least 93% of its total power at or below 4 Hz. See Figure C.4.
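The accumulated-power check can be illustrated in Python on a synthetic 20 Hz signal (not the thesis data): compute the FFT power spectrum and report the fraction of total power at or below the 4 Hz cutoff:

```python
import numpy as np

def power_fraction_below(signal, fs, f_cut):
    """Fraction of total spectral power at frequencies <= f_cut.
    Assumes an equally sampled signal; the mean is removed first."""
    spectrum = np.fft.rfft(signal - np.mean(signal))
    power = np.abs(spectrum) ** 2
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    return power[freqs <= f_cut].sum() / power.sum()

# Synthetic 20 Hz "trajectory": slow 1.5 Hz motion plus a little
# high-frequency noise, standing in for a tool tip coordinate.
fs = 20.0
t = np.arange(0, 30, 1.0 / fs)
rng = np.random.default_rng(1)
x = np.sin(2 * np.pi * 1.5 * t) + 0.05 * rng.normal(size=t.size)

frac = power_fraction_below(x, fs, f_cut=4.0)  # most power lies below 4 Hz
```

For a signal dominated by sub-4 Hz motion, as assumed here, the returned fraction is close to one, mirroring the 93% figure reported for the OR signals.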
Figure C.4 - Accumulated normalized power for position, velocity, acceleration, and jerk signals.

C.2 Data Processing and Formatting
The output signal returned from Polaris is the position and orientation of each marker array. A series of formatting steps is required to extract useful ergonomic joint angle data and tool tip data. The following sections describe the details and procedures of data formatting.

C.2.1 Ergonomic Data
Planar marker arrays are attached to the surgeon's torso and dominant arm (on the proximal and distal forearm and on the hand). The position data from these marker arrays was used in combination with the post-operative calibration data to find joint angles. Figure C.5 illustrates the steps required to find joint angle data. Kinematic model details are outlined in Person's thesis (Person 2000).

[Flow chart: post-operative joint calibration (calibrate_joints.m) and joint centre location (find_joint_frames.m) feed a kinematic model, which is combined with the OR data (PSV2, PSV4, PSV6) in the joint angle calculation (find_joint_angles.m) to produce shoulder, forearm, elbow FE, wrist FE, and wrist RU joint angles.]

Figure C.5 - Ergonomic marker data processing and formatting flow chart.

C.2.2 Tool Tip Data
Individual MDMArrays were attached to each surgical tool (scissors, clipper, spatula, tissue grasper, ratcheted grasper, L-grasper). The MDMArray has five faces, each of which is individually tracked by Polaris. The position data from these faces was used in combination with the post-operative (sphere-fitting) calibration data and filtering algorithms to find the position, velocity, acceleration and jerk of the tool tip. Figure C.6 illustrates the steps required to find tool tip data.

[Flow chart: the five geometrically unique faces tracked on the MDMArray (FACE1, FACE2, ..., FACE5) are combined into a common tool handle reference frame (Tool_ref_transforms.m); a cubic spline is optimally fitted using the GCV algorithm (Tool_handle_spline.m); the data are interpolated and resampled (Tool_handle_resample.m); the tool tip transformation, based on the post-operative tool calibration using sphere-fitting techniques, is applied to locate the tool tip; a quintic spline is refitted using the GCV algorithm (Tool_tip_spline.m); and the spline coefficients are used to calculate position, velocity, acceleration, and jerk at the tool tip.]

Figure C.6 - MDMArray marker data processing and formatting flow chart.

C.2.3 Video Data
The recorded video images of the surgery from both the laparoscope and the external camera aimed at the surgeon require synchronization. The images from these two sources were synchronized, time-stamped and recorded together onto a standard VHS tape using a SONY SVO-9500MD S-VHS recorder.

C.3 Performance Measure Extraction
We developed custom designed software, written in Matlab, for managing time durations, tool tip kinematics and joint angle data. The software is useful for visualization purposes and for extracting useful performance measures. The graphical user interface (GUI) is easy to use and is equipped with a comment window and a help file for assistance. See Figure C.7.

[Screenshot: trajectory-viewing GUI showing position data and tool tip orientation displays, velocity, acceleration and joint angle menus, a 3D trajectory plot, a data status bar, a zoom bar / window, performance measures, ergonomic stress scores, and an instructions window.]

Figure C.7 - Custom designed trajectory-viewing program with performance measure displays.
(SVP.m)

Appendix D
Operational Definitions

D.1 Hierarchical Decomposition Operational Definitions
To characterize and assess the repeatability of our chosen performance measures, we created a hierarchical decomposition describing the procedure in terms of surgical phases and stages, tool tasks and subtasks, and fundamental tool actions. This five-level hierarchical decomposition provides the foundation for a quantitative analysis of surgeon performance. This section provides a summary of the notation and terminology used in the hierarchical decomposition of a laparoscopic cholecystectomy. The framework is general enough to be expanded to apply to other surgical procedures. Figure D.1 shows the five levels of the hierarchical framework.

Figure D.1 - Five levels of the hierarchical decomposition.

D.1.1 Phase Level
Phases are the fundamental levels of a procedure, forming the backbone and the foundation for further decomposition. A laparoscopic cholecystectomy procedure is divided into five distinct phases, as shown in Figure D.2. Each phase has particular global goals which must be achieved before advancing to the next phase. Definitions of each phase are summarized in Table D.1.

Figure D.2 - Phase levels of a laparoscopic cholecystectomy.

Table D.1 - Hierarchical phase level definitions.

Phase Name | Abbrev. | Definition | Start | Stop
Trocar Preparation | TP | Time spent preparing and inserting surgical tools in the abdominal wall of the patient. | Administration of local anaesthetic to the navel (where the first trocar is placed for the camera). | First indication the tools are used to manipulate tissue.
Cystic Duct Dissection | CDD | Time spent dissecting the omental foramen and tissue surrounding the cystic duct and artery. | First indication of the tools being used to manipulate tissue. | Separation of all connecting vessels to the gall bladder.
Gall Bladder Dissection | GBD | Time spent dissecting the gallbladder from the gallbladder fossa on the visceral surface of the liver. | Following the CDD, most often with the spatula used to start dissection. | Separation of the gall bladder from the liver.
Gall Bladder Removal | GBR | Time spent placing the gallbladder in a bag and removing it from the abdominal cavity. | Separation of the gall bladder from the liver. | Removal of the gallbladder from the abdominal cavity.
Closure | CL | Time spent removing trocars and suturing incisions. | Removal of the gallbladder from the abdominal cavity. | Final gauze or medical tape strips are placed.
Note: all phase level definitions are based on video observation.

D.1.2 Stage Level
The phase levels of a procedure are further divided into stages, as shown in Figure D.3. Each stage is characterized by its own end goal; however, each defined stage within a phase is not required for successful completion of that phase. For example, the CDD phase may be successfully completed without having to control bleeding if there was no bleeding to control. The start and end of each stage are defined in terms of the context of the tools being used within that stage. Definitions of each stage are summarized in Table D.2.

Figure D.3 - Stage level diagram for CDD, GBD, and GBR phases.

Table D.2 - Hierarchical stage level definitions.

Phase | Stage | Tools used | Goal
CDD | Exploration (Required) | graspers, camera | Assess liver state, identify anatomical variations and locate various anatomical structures.
CDD | Isolate cystic duct and artery (Required) | graspers, L-graspers, irrigation, spatula | Locate and isolate the cystic duct and cystic artery and remove any adhesions.
CDD | Separate cystic duct and artery (Required) | clippers, scissors, spatula | Clip and cut major connecting vessels such as the cystic duct and the cystic artery.
CDD | Control bleeding (Optional if required) | clippers, graspers, spatula | Control any significant bleeding.
GBD | Isolate gallbladder from liver (Required) | graspers, irrigation, spatula | Separate the gallbladder from the liver.
GBD | Control bleeding (Optional if required) | clippers, graspers, spatula | Control any significant bleeding.
GBR | Clean-up (Optional if required) | irrigation | Remove any excess debris and assess the state of any bleeding inside the abdominal cavity.
GBR | Bag gallbladder (Required) | ratcheted graspers, graspers | Insert the gallbladder into a baggie.
GBR | Remove gallbladder (Required) | ratcheted graspers | Remove the gallbladder from the abdominal cavity.
Note: all stage level definitions are based on video observation.

D.1.3 Task Level
A task is a set of movements performed with a single tool to achieve a desired effect. A number of tasks or tool sequences may be required to successfully complete a stage of a procedure. A task segment is defined from the time the tool tip is placed in the distal end of the trocar until the tool is pulled out through the same trocar. The initial velocity is approximately zero, and the movement of the tool into the trocar should be continuous, without external interruptions. The withdrawal ends with the tool tip exiting the distal end of the trocar (independent of tool tip velocity). We selected the clip application, a task which is performed at least six times per procedure, to form the basis of the remaining decomposition analysis.

D.1.4 Subtask Level
The subtask level describes how the surgical tool is moving inside the patient. Figure D.4 shows the subtask levels required for application of a clip, and Table D.3 outlines the definitions of each subtask.

Figure D.4 - Subtask level diagram for clip application.

Table D.3 - Hierarchical subtask level definitions.

Subtask Name | Abbrev. | Definition | Start | Stop
FSM* - Approach | AP | Tool is moving toward the tissue upon entry into the trocar. | Entry of the tool tip into the distal end of the trocar. | Tool tip in contact with tissue.
Tissue Manipulation | TM | Tool is in contact with the tissue. | Initial contact of the tool tip with tissue. | Final contact of the tool tip with tissue.
Clip Application | CP | Clipping tool is used to apply a clip. | Tool tip is in contact with the patient's tissue and the jaws of the clipping tool start to close. | Jaws of the clipping tool start to open.
FSM* - Withdrawal | WD | Tool is moving away from the tissue, being pulled out of the trocar. | Final contact of the tool tip with tissue. | Exit of the tool tip from the distal end of the trocar.
Note: all subtask level definitions are based on video observation. *Free space movement (FSM) - when the tool is moving in free space (no tissue manipulation).

D.1.5 Action Level
Actions stand as the building blocks of all manipulation tasks and are based on 12 distinct types of tool movements. Of the 12 actions, three are based solely on video observations, two are based solely on kinematic data, and the remaining seven are based on a combination of both kinematic and video observations. The kinematic and video data used to distinguish actions often suggest membership in more than one action state; as a result, it is possible for tool movements to be in some combination or collection of actions. In total there are 72 feasible combinations of these 12 basic actions. Using the kinematic and video data, the membership value of each of the 12 actions at each sample point was evaluated.
A membership value ranges from zero to one: zero representing no membership and one representing complete membership. Table D.4 outlines the characteristic details defining each action state according to the reference frame defined in Figure D.5 (left).

Table D.4 - Definition of tool/tissue interactions for each action state.

Action State | Abbrev. | Jaw Position (Open/Closed) | Tissue Contact | Tool tip path (as per Fig. D.5) | State Assessment (Kinematic/Video) | Constraints
Idle | ID | Either | either | none | kinematic | abs. vel. < 2 mm/s for 0.15 sec
Hold | HD | Closed | yes | any | video | -
Reach | RH | Either | no | positive Z | both | -
Retract | RT | Either | no | negative Z | both | -
Push | PH | Either | yes | positive Z | both | -
Pull | PL | Closed | yes | negative Z | both | -
Translate | TR | Either | no | XY-plane | both | -
Sweep | SW | Either | yes | XY-plane | both | -
Orient | OR | Either | either | about Z-axis | kinematic | > 20 deg/s
CW | CW | Either | either | positive rot. | kinematic | > 20 deg/s
CCW | CCW | Either | either | negative rot. | kinematic | > 20 deg/s
Grasp | GR | Closed | either | any | video | -
Release | RL | Open | yes | any | video | -
Spread | SP | Open | either | any | video | -

Figure D.5 - Tool tip reference frame and action state definitions. The action states as defined by the reference frame at the tool tip (left). Tool path projections on the tool axis (right).

D.1.5.1 Action Level Membership Functions
The kinematic definitions of the reach, retract, and translate actions are the same as those of the push, pull, and sweep actions respectively; the only difference is tissue contact (established by video observation). For this discussion, consider these actions simply as forward, backward, and sideways. The projection of the tool tip path vector on the tool axis vector at each sample forms the basis for establishing forward, backward, and sideways membership (see Figure D.5 (right)).
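This projection can be sketched in Python as follows; the share-of-magnitude normalization used for the membership values here is an assumption for illustration, not necessarily the exact normalization used in the thesis software:

```python
import numpy as np

def direction_memberships(displacement, tool_axis):
    """Split a (nonzero) tool-tip displacement into components along and
    normal to the tool axis, and return (direction label, axial membership,
    sideways membership) under an assumed share-of-magnitude normalization."""
    axis = np.asarray(tool_axis, dtype=float)
    axis /= np.linalg.norm(axis)
    d = np.asarray(displacement, dtype=float)
    axial = np.dot(d, axis)              # signed projection on the tool axis
    d_normal = d - axial * axis          # component in the XY-plane of the tip frame
    m_axial = np.abs(axial) / (np.abs(axial) + np.linalg.norm(d_normal))
    direction = 'forward' if axial >= 0 else 'backward'  # 0-90 deg vs 90-180 deg
    return direction, m_axial, 1.0 - m_axial

# A displacement mostly along the tool axis (a reach/push-like movement).
direction, m_ax, m_side = direction_memberships([0.1, 0.0, 1.0], [0.0, 0.0, 1.0])
```

A movement nearly parallel to the tool axis yields a high axial (forward/backward) membership and a low sideways membership, so partial memberships in several action states arise naturally.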
The projected vectors and the angle between the vectors are calculated at each sample interval. The angle between the vectors is used to establish the direction of tool movement, i.e. forward (0°-90°) or backward (90°-180°). A normalization of the projected vectors is used to calculate the membership of forward/backward or sideways movement. Writing the tool tip displacement over a sample interval as d, its projection onto the tool axis as d_a, and the remaining normal component as d_n = d - d_a, the membership functions (MF) for the axial (forward/backward) and normal (sideways) directions are calculated as follows:

MF_axial = |d_a| / (|d_a| + |d_n|),  MF_normal = |d_n| / (|d_a| + |d_n|)   [D.1]

Transitions between states are often rapid and membership durations are short-lived, making it difficult to tell when a transition started and ended. Figure D.6 shows sample membership functions for selected actions during a clipping task. For our performance measure analysis we considered mean membership values; however, we did make an attempt at isolating tool tip action states by establishing a series of arbitrary rules and cut-offs, and used this data in the sequencing analysis presented in Appendix F.

Figure D.6 - Sample of action membership functions for a clip application.

D.2 Performance Measure Definitions
D.2.1 Kinematic Performance Measures
Table D.5 outlines the kinematic performance measure definitions used for quantifying performance at the task and subtask levels of the hierarchical decomposition.

Table D.5 - Time and kinematic performance measure definitions.

Kinematic Measure | Performance measure | Kinematic Definition
Time | Start/stop definitions outlined in Section D.1 | T = t_stop - t_start
Distance | Path distance | C = \sum_{i=start}^{stop} \sqrt{(x_i - x_{i-1})^2 + (y_i - y_{i-1})^2 + (z_i - z_{i-1})^2}
Velocity | Minimum, Maximum, Mean | v_i = \sqrt{\dot{x}_i^2 + \dot{y}_i^2 + \dot{z}_i^2}
Acceleration | Minimum, Maximum | a_i = \sqrt{\ddot{x}_i^2 + \ddot{y}_i^2 + \ddot{z}_i^2}
Jerk | Jerk cost (Smoothness) (Nelson, 1983) | Mean squared jerk over the movement time
Trajectory deviation | RMS straight-line deviation | J = \sqrt{\frac{1}{T}\int_0^T d^2(t)\,dt}, the RMS error between the measured trajectory and a straight-line trajectory

D.2.2 Sequence Similarity Measure
Transition state diagrams were used to represent event sequences and describe the probability of transitions between different states. The transition probabilities are calculated by dividing the number of transitions between one state and a second state by the total number of transitions from the first state to all other possible states. The transition probabilities of the state diagram are reduced to a Transition Probability Matrix (TPM). Each cell of the transition probability matrix (TPM_ij) represents the transition probability from one event to another (event i to event j). An example of a state transition diagram and TPM are shown in Figure D.7. To establish the confidence interval for the transition probabilities we use the following equation (Walpole, 1978):

\hat{p} - z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n} < p < \hat{p} + z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}   [D.2]

where \hat{p} is the estimated transition probability, n is the number of transitions used to calculate the transition probability, and \hat{q} = 1 - \hat{p}.

Appendix E
Preliminary Selection of Performance Measures and Development of Comparative and Reliability Analysis Techniques

E.1 Development of Performance Measures
During the course of our data analysis we considered a number of techniques to identify and extract meaningful and reliable performance measures. In addition to our selected performance measures we considered the following analysis techniques:
• Spectral analysis using Fast Fourier Transforms (FFT) and short-time Fourier transforms
• Wavelets
• Trajectory template matching
The spectral and wavelet analysis techniques were considered of limited utility in our application: using these techniques on our kinematic data we found no obviously extractable performance parameters.
Template matching may be useful in identifying performance measures related to the tool tip trajectories; however, due to limited time we were unable to implement this technique.

E.2 Performance Measure Comparative Analysis Using the K-S Statistic
To compare distributions of performance measures in different contexts, we use the Kolmogorov-Smirnov (K-S) statistic. The K-S statistic is useful because it is nondimensional and makes no a priori assumptions about the nature of the statistical distributions of the data. In addition, it considers the entire distribution, unlike statistics which only test for differences in mean (t-test) or variance (F-test) (Von Mises, 1964). The K-S statistic is commonly used to test the null hypothesis that two sets of data were drawn from the same underlying distribution. This is helpful for testing the significance of procedure-specific variability, discussed in Section E.2.3.

E.2.1 Establishing Confidence Bounds
Because in our applications the K-S statistic is computed from two finite sets of measures, the value we find is itself an estimate of the true difference between the two sets. To assign a confidence interval to our estimate we use a bootstrapping approach. By resampling the two finite sets of measures and evaluating the new D value (for resamples >> 0) we establish a distribution of D values, as shown in Figure E.1.

Figure E.1 - Resampling distributions of two finite sets to establish confidence bounds on our estimate for D. The right illustration depicts how we establish the 5% and 95% confidence bounds on our estimate for D.

E.2.2 Correction of D for Sample Size
Due to the limited size of our data, our estimated values for D are consistently greater than the actual value.
This is also the case when the difference in means is low. For our purposes we correct the bias in the D estimate by introducing the data vector length. Through numerical experimentation conducted by Hodgson (2002), we found the following function gives an approximation of D̂ (our estimate of D):

D̂ = D + D0(N)·(1 − D²)   [E.1]

where ln D0(N) = k1 + k2·ln N, with k1 = 0.0261 and k2 = −0.471, and N is the number of elements in the data vector. For cases where the data vectors are of unequal length we use the effective length Ne given by:

Ne = N1·N2 / (N1 + N2)   [E.2]

Given D̂ and N, we solve the quadratic equation in D from Equation E.1 to obtain an estimate of D.

E.2.3 Reliability / Patient Variability

As part of our analyses to show the reliability of our performance measures, we attempt to quantify the effect of procedural variability. This section outlines some of the additional computational details omitted from the methods section of Chapter 2. The goal is to measure the difference between two distributions which represent: (1) the range of differences we would expect if we measured our surgeon on a standard (ideal) patient (CPD(Dreference)), and (2) the range of differences we would expect to get if we measured this surgeon in the OR on an arbitrary patient (CPD(Dmeasured)). The difference between these two distributions reflects the amount of procedural variability. The example provided is the distribution of clipping task completion times over 13 procedures. Figure E.2 illustrates how CPD(Dreference) and CPD(Dmeasured) are evaluated. Figure E.3 shows the Monte Carlo approach for testing the significance of procedure-specific variability. CPD(DRS_ref) represents the distribution of D values we would expect from inter-procedural variability alone. If Dref_meas is within the 90-percentile limits of CPD(DRS_ref), then there is no evidence of procedure-specific variability.
This analysis also provides evidence that our bootstrapping approach is appropriate, as we see correspondence between the analytical p-value distribution and the p-value distribution evaluated using the Monte Carlo approach. The analytical p-value is evaluated directly from the K-S statistic and is a function of D and Ne. Figure E.4 shows how CPD(Dref_meas) is evaluated using a bootstrapping approach, and compares CPD(Dref_meas) to CPD(DRS_ref) for multiple procedures.

Figure E.2a - The calculation methods used to find CPD(Dreference) and CPD(Dmeasured). CPDi and CPDT represent the distributions of completion times for a single procedure and for all procedures, respectively.

Figure E.2b - Cumulative probability distributions (CPDs) of Dreference and Dmeasured for an increasing number of procedures. From right to left the distributions represent 1, 2, and 3 groupings of procedures.

Figure E.3a - The calculation method used to find CPD(DRS_ref).

Figure E.3b - Monte Carlo simulation results from CPD(Dreference), showing pMC = 1 − CPD(DRS_ref) against panalytical (from the K-S statistic). From right to left the distributions represent 1, 2, and 3 groupings of procedures.
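The D computation and the bootstrap resampling of Figures E.1-E.3 can be illustrated as follows. This is a sketch only: the completion-time samples are synthetic, and in practice r would be chosen much larger.

```python
import numpy as np

rng = np.random.default_rng(1)

def ks_distance(x, y):
    """Maximum vertical distance D between the empirical CDFs of x and y."""
    pooled = np.sort(np.concatenate([x, y]))
    cdf_x = np.searchsorted(np.sort(x), pooled, side="right") / len(x)
    cdf_y = np.searchsorted(np.sort(y), pooled, side="right") / len(y)
    return np.max(np.abs(cdf_x - cdf_y))

def bootstrap_D(x, y, r=1000):
    """Resample both sets with replacement to build a distribution of D values."""
    return np.array([
        ks_distance(rng.choice(x, size=len(x), replace=True),
                    rng.choice(y, size=len(y), replace=True))
        for _ in range(r)
    ])

# Hypothetical completion-time samples (s) from two contexts
x = rng.normal(20.0, 4.0, size=40)
y = rng.normal(24.0, 4.0, size=35)

D = ks_distance(x, y)
Ds = bootstrap_D(x, y)
lo, hi = np.percentile(Ds, [5, 95])     # 90% confidence bounds on the D estimate

# Effective length for unequal sample sizes
Ne = len(x) * len(y) / (len(x) + len(y))
```

The 5th and 95th percentiles of the bootstrap distribution give the 90% confidence bounds on D, mirroring the right-hand panel of Figure E.1.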
Figure E.4 - The calculation method used to find CPD(Dref_meas) (left). The uncorrected D values for Dref_meas and DRS_ref (right).

E.3 Sequencing Comparative Analysis Using the δ Measure

For the purpose of evaluating the difference between event sequences, we compared a single event sequence from one procedure (ESi) to a global reference sequence composed from all other procedures (CEST,i). To quantify the discrepancy between the two event sequences, we first calculate the transition probabilities of all possible states as seen in both event sequences and then use a sequence difference measure δ defined as:

δ = Σ(i=1..n) wi·|Pref,i − Psample,i| / (2·Σ(i=1..n) wi)   [E.3]

where Pref,i and Psample,i are the i-th transition probabilities of TPMref and TPMsample respectively, wi represents the frequency of the i-th event in the reference sequence, and n is the total number of transition probabilities. The sequence difference measure δ is analogous to our K-S (D) statistic in that it is non-dimensional and varies on a scale of zero (similar) to one (dissimilar). In addition, our measure has the properties of a metric (i.e., for sequences S, T, and U: δ(S,T) ≥ 0, non-negativity; δ(S,T) = 0 ⇔ S = T, coincidence; δ(S,T) = δ(T,S), symmetry; δ(S,U) ≤ δ(S,T) + δ(T,U), triangle inequality (Mannila, 1998)).

E.3.1 Establishing Confidence Bounds

As with the K-S difference measure, the sequencing difference measure is based on two finite sequences, so the result is an estimate of the difference between the two. We used a bootstrapping approach to assign a confidence interval to our estimate, as outlined in Figure E.5.
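One way to realize the δ computation is sketched below. The tool-event labels and the short sequences are hypothetical, and a real reference sequence would be far longer; the per-state L1 difference between transition rows, weighted by event frequency and normalized by 2·Σw, keeps the result on the zero-to-one scale described above.

```python
import numpy as np

def transition_probabilities(seq, states):
    """Transition probability matrix: P[i, j] = P(next = j | current = i)."""
    idx = {s: k for k, s in enumerate(states)}
    counts = np.zeros((len(states), len(states)))
    for a, b in zip(seq[:-1], seq[1:]):
        counts[idx[a], idx[b]] += 1
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

def delta(ref_seq, sample_seq, states):
    """Sequence difference measure: frequency-weighted, normalized to [0, 1]."""
    P_ref = transition_probabilities(ref_seq, states)
    P_sam = transition_probabilities(sample_seq, states)
    w = np.array([ref_seq.count(s) for s in states], dtype=float)  # event frequencies
    diff = np.abs(P_ref - P_sam).sum(axis=1)      # per-state L1 difference, at most 2
    return float((w * diff).sum() / (2.0 * w.sum()))

states = ["Grasper", "Clipper", "Scissors"]       # hypothetical tool-event set
ref = ["Grasper", "Clipper", "Grasper", "Clipper", "Scissors", "Grasper"]
same = list(ref)
other = ["Scissors", "Grasper", "Scissors", "Grasper", "Clipper", "Scissors"]
```

An identical sequence gives δ = 0, and a sequence with reversed transition structure gives an intermediate value on the (0, 1] scale.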
Figure E.5 - Comparison of a single event sequence to a reference event sequence to give a similarity measure δ. The reference event sequence is based on a collection of event sequences; sequential end and start events of neighbouring ESs in CEST,i are excluded from the calculation of TPMT. NOTE: Random Starting Sequence - sequence based on the probability of the first event occurring. Random Transition Sequence - sequence based on weighted transition probabilities of the reference model.

E.3.2 Correction of δ for Sample Size

As yet we have not developed a sample-size correction for δ. Preliminary discussions suggest using a Monte Carlo simulation based on artificial data, as in the D correction.

E.3.3 Reliability / Patient Variability

In this section we develop a method for quantifying the effects of patient variability on surgical event sequencing. The method is analogous to the K-S method developed for evaluating time, kinematic, and joint angle data. The example provided is of tool sequencing for the entire procedure over 15 procedures. Figure E.6 illustrates how CPD(δreference) and CPD(δmeasured) are evaluated. Figure E.7 shows the Monte Carlo approach for testing the significance of procedure-specific variability. Figure E.8 shows how CPD(Dref_meas) is evaluated using a bootstrapping approach, and compares CPD(Dref_meas) to CPD(DRS_ref) for multiple procedures.
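The "random starting sequence" and "random transition sequence" generators used to build δreference (Figures E.5 and E.6) can be sketched as follows; the four-state event set and the transition probabilities are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def random_sequence(start_probs, tpm, states, end_state, max_len=500):
    """Draw a start event, then walk weighted transitions until the end state."""
    seq = [states[rng.choice(len(states), p=start_probs)]]
    while seq[-1] != end_state and len(seq) < max_len:
        row = tpm[states.index(seq[-1])]
        seq.append(states[rng.choice(len(states), p=row)])
    return seq

states = ["Start", "Grasper", "Clipper", "Finish"]   # hypothetical event set
start_probs = [1.0, 0.0, 0.0, 0.0]
tpm = np.array([
    [0.0, 0.8, 0.2, 0.0],    # Start   -> mostly Grasper
    [0.0, 0.2, 0.6, 0.2],    # Grasper
    [0.0, 0.5, 0.2, 0.3],    # Clipper
    [0.0, 0.0, 0.0, 1.0],    # Finish is an absorbing end state
])

seq = random_sequence(start_probs, tpm, states, end_state="Finish")
```

As in Figure E.6a, the generated sequence terminates when the random transition walk produces the previously defined end state (with a length cap as a guard).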
Figure E.6a - The calculation methods used to find δmeasured (left) and δreference (right). NOTE: Random Starting Sequence - sequence based on the probability of the first event occurring. Random Transition Sequence - sequence based on weighted transition probabilities of the reference model; the ES is terminated when the random transition sequence generates a previously defined end state.

Figure E.6b - Cumulative probability distributions (CPDs) of δreference and δmeasured for an increasing number of procedures. From right to left the distributions represent 1, 2, and 3 groupings of procedures.

Figure E.7a - The calculation method used to find CPD(DRS_ref).

Figure E.7b - Monte Carlo simulation results from CPD(δreference). From right to left the distributions represent 1, 2, and 3 groupings of procedures.

Figure E.8 - The calculation method used to find CPD(Dref_meas) (left). The uncorrected D values for Dref_meas and DRS_ref (right).

Appendix F Collection of Performance Measure Analysis Results

This appendix presents a collection of performance measure analyses evaluated at each level of the hierarchical decomposition. The purpose is to develop a database of performance measures useful for future analysis and reference.

F.1 Phase Level

F.1.1 Time
Figure F.1 - Phase level completion times. Categories and typical task times from Traverso et al. (1997).

Figure F.2 - Percentage contribution of phases in the total procedure.

F.1.2 Kinematics

There are no kinematic performance measures at the phase level of the hierarchical decomposition.

F.1.3 Joint Angles

Figure F.3 - Ranges of difference measures between joint angle distributions (shoulder, elbow FE, wrist FE, wrist RU, forearm PS) from each individual procedure and the concatenated data from the remaining six procedures, assessed by the Kolmogorov-Smirnov statistic. The confidence intervals marked indicate a 90% confidence bound and are based on a bootstrap estimate. Shifted data represents joint angle distributions normalized to have a common mean.

F.1.4 Event Sequencing

Figure F.4 - Tool exchange state diagram (S - Start, Gr - Grasper, S - Scissors, Ir - Irrigation, LGr - L-Grasper, RGr - Ratcheted Grasper, Ca - Camera, Sp - Spatula, Cp - Clipper, F - Finish). Confidence intervals found using Equation D.1.
Figure F.5 - Ranges of tool exchange (procedural and phase level) sequence similarity measures from each individual procedure and the concatenated data from the remaining 14 procedures, assessed by our event sequencing similarity measure. The confidence intervals marked indicate one standard deviation and are based on a bootstrap estimate.

F.2 Stage Level

F.2.1 Time

Figure F.6 - Stage level completion times during the CDD phase.

F.2.2 Kinematics

There are no kinematic performance measures at the stage level of the hierarchical decomposition.

F.2.3 Joint Angles

There are no joint angle performance measures at the stage level of the hierarchical decomposition.

F.2.4 Event Sequencing

Figure F.7 - Stage level state diagram for the CDD phase. Confidence intervals found using Equation D.1.

Figure F.8 - Ranges of stage level sequencing (CDD: i.e., exploration, dissection, vessel separation, and control bleeding) similarity measures from each individual procedure and the concatenated data from the remaining 14 procedures, assessed by our event sequencing similarity measure. The confidence intervals marked indicate a 90% confidence bound and are based on a bootstrap estimate.

F.3 Task Level

F.3.1 Time

Figure F.9 - Clipping task completion times in different contexts. The confidence intervals marked indicate one standard deviation.
Figure F.10 - Ranges of difference measures between clipping (task and subtask) completion time distributions in different contexts, assessed by the Kolmogorov-Smirnov statistic. The confidence intervals marked indicate a 90% confidence bound and are based on a bootstrap estimate. (Time_context_comp.xls)

F.3.2 Kinematics

Figure F.11 - Ranges of difference measures between kinematic performance measure distributions in different contexts, assessed by the Kolmogorov-Smirnov statistic. The confidence intervals marked indicate a 90% confidence bound and are based on a bootstrap estimate. (Perforamcne_Measures.xls)

Figure F.12 - Range of difference measures between average action membership value distributions in different contexts, assessed by the Kolmogorov-Smirnov statistic. The confidence intervals marked indicate a 90% confidence bound and are based on a bootstrap estimate. (Perforamcne_Measures_Actionsl.xls)

F.3.3 Joint Angles

Figure F.13 - Range of difference measures between mean joint angle distributions in different contexts, assessed by the Kolmogorov-Smirnov statistic. The confidence intervals marked indicate a 90% confidence bound and are based on a bootstrap estimate.
Figure F.14 - Range of difference measures between maximum joint angle distributions in different contexts, assessed by the Kolmogorov-Smirnov statistic. The confidence intervals marked indicate a 90% confidence bound and are based on a bootstrap estimate.

F.3.4 Event Sequencing

Figure F.15 - Task level state diagram for cystic duct (left) and cystic artery (right) vessel separation. Confidence intervals found using Equation D.1.

Figure F.16 - Ranges of task sequence (vessel separation: i.e., clip, clip, clip, cut) similarity measures from each individual procedure and the concatenated data from the remaining 14 procedures, assessed by our event sequencing similarity measure. The confidence intervals marked indicate one standard deviation and are based on a bootstrap estimate.

F.4 Subtask Level

F.4.1 Time

Figure F.17 - Percentage contribution of subtasks in a clipping application task (approach, tissue manipulation, clip application, withdrawal).

F.4.2 Kinematics

In the interest of conserving space, the difference measures for kinematic and average membership value performance measures at the subtask level have not been provided. The performance measure results at the subtask level are consistent with results at the task level. Subtask difference measure plots can be found in performance_measures.xls.

F.4.3 Joint Angles

In the interest of conserving space, the difference measures for joint angle performance measures at the subtask level have not been provided. The performance measure results at the subtask level are consistent with results at the task level.
Subtask difference measure plots can be found in performance_measures.xls.

F.4.4 Event Sequencing

Figure F.18 - Task level state diagram for subtasks in clipping applications. Confidence intervals found using Equation D.1.

Figure F.19 - Ranges of subtask sequence (clipping: i.e., approach, tissue manipulation, apply clip, withdrawal) similarity measures from each individual procedure and the concatenated data from the remaining 14 procedures, assessed by our event sequencing similarity measure. The confidence intervals marked indicate one standard deviation and are based on a bootstrap estimate.

Appendix G BioRome Conference Submission

This document was submitted to and presented at the Fifth International Symposium on Computer Methods in Biomechanics and Biomedical Engineering in Rome, Italy, in October 2001.

Repeatability of Ergonomic Stress Scores in Laparoscopic Surgery

Paul B. McBeth 1, Antony J. Hodgson 1, PhD, Alex G. Nagy 2, MD. Departments of 1 Mechanical Engineering and 2 Surgery, University of British Columbia, Vancouver, BC, Canada V6T 1Z4

Abstract

Minimally invasive techniques have been rapidly adopted into mainstream surgery over the last decade, yet the effects of the instruments on surgeon performance, comfort and safety have not been well evaluated. Ergonomic assessments in minimally invasive surgery are comparatively rare, despite widespread acknowledgement that strain injuries do occur to surgeons. An optoelectronic and video motion analysis system is used to acquire postural measurements at frequencies of ~20 Hz. These measurements are converted to ergonomic stress scores using a modified RULA method. One expert surgeon was evaluated in seven clinical laparoscopic cholecystectomies over a period of four months. We successfully recorded postures at least once per second during 92±6% of the time the surgeon was performing tissue manipulation tasks.
We found the ergonomic stress scores were comparatively high throughout the procedures, particularly for the wrist and elbow. The Kolmogorov-Smirnov statistic was used to compare similarities of joint angle distributions over multiple trials. The results show marked similarities between joint angle distributions over multiple trials, which suggests repeatability in our ergonomic assessment method. This system will be a critical component in validating surgical simulations for use in training and certifying surgeons and in designing and evaluating equipment.

1. Introduction

Minimally invasive techniques have been rapidly adopted into mainstream surgery over the last decade, yet the effects of the instruments on surgeon performance, comfort and safety have not been well evaluated. Ergonomic assessments in minimally invasive surgery are rare, despite widespread acknowledgement that strain injuries do occur to surgeons. Radermacher et al. (1996) were among the first to assess minimally invasive surgeries and outlined improvements for workplace design. Berguer showed that postures in laparoscopic surgery are more stressful than in conventional open surgery (Berguer 1997). In this paper we investigate the biomechanical stresses of an expert surgeon over multiple procedures. The objective is to provide an assessment of the biomechanical stresses experienced by an expert surgeon over multiple laparoscopic cholecystectomy procedures and to establish the foundation of a method to quantitatively benchmark new tool designs.

2. Methods

2.1 Equipment

An optoelectronic motion analysis system was used in conjunction with video recordings to acquire postural data and tool tip trajectories at frequencies of ~20 Hz during seven live laparoscopic cholecystectomy procedures.
For this study, we used a Northern Digital Polaris Hybrid Optical Tracking System capable of tracking the 3D position of both active infra-red light emitting diodes (IREDs) and passive retro-reflective markers with an accuracy of ~0.2-0.3 mm. The Polaris system was connected to a standard PC (800 MHz AMD Duron) running custom-designed data acquisition software written in Matlab. In addition to the optoelectronic system, we recorded video images of the surgery using both a laparoscope and an external camera focused on the surgeon. The images from these two sources were time stamped and recorded onto standard VHS tape; from these images, we could later determine the stage of the operation and correlate it with the detailed motion measurements. All equipment used in the operating room was tested and approved by Vancouver Hospital's Biomedical Engineering Department and sterilized with ethylene oxide when appropriate.

2.2 Experimental Protocol

One expert surgeon was evaluated in seven clinical laparoscopic cholecystectomies over a period of four months at Vancouver Hospital. The surgical instrumentation used for the study consisted of a standard set of reusable laparoscopic tools designed for cholecystectomy procedures. We attached sterilized marker arrays to the surgeon's torso and dominant arm (on the proximal and distal forearm and on the hand), as shown in Figure 1(M,R). This set of marker arrays enabled us to track and record the surgeon's joint angles at the shoulder, elbow, and wrist. Arrays were secured to the surgeon using elastic and Velcro® harnesses. One researcher scrubbed into the surgery and attached the sterilized marker arrays to the surgeon's hands, lower arms and torso and to the laparoscopic tools. This researcher remained to assist the surgeon with any adjustments to the marker arrays or support cuffs. Immediately postoperatively, the surgeon's joint centre locations were identified using a set of standard isolated joint motions.
Neutral joint positions were also determined.

2.3 Kinematic Model and Calibration

We used a simplified seven-degree-of-freedom rigid-body model of the upper limb, as shown in Figure 1(L). The model allows us to represent shoulder abduction/adduction, flexion/extension, and internal/external rotation; elbow flexion/extension and pronation/supination; and wrist flexion/extension and radial/ulnar deviation. Using this model, we applied a kinematic calibration method based on circle- and sphere-fitting techniques to locate the approximate centre of each of the shoulder, elbow, and wrist joints.

Figure 2 - Modified continuous RULA scoring system and the discrete RULA scoring system for shoulder elevation and wrist flexion/extension.

2.4 Stress Scoring System

To rate the stress experienced by a surgeon we used a modified version of the Rapid Upper Limb Assessment (RULA) technique (McAtamney 1993). The RULA technique is a posture evaluation method designed for assessing upper limb postures during light manipulation tasks. Stress exposure is evaluated using body posture diagrams and scoring tables which specify different posture zones. We used a modified version of the RULA scoring system which incorporates continuous scoring measures to improve the sensitivity of our assessments (see Figure 2). Since the standard RULA scales have different ranges for each joint, we normalized all scores such that 0.0 corresponds to the minimum RULA value for a joint and 1.0 corresponds to the maximum RULA value.
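The continuous variant of the RULA scoring can be sketched as a piecewise-linear interpolation between posture-zone boundaries, followed by the 0-1 normalization described above. The zone breakpoints and raw scores below are illustrative placeholders, not the published RULA tables.

```python
import numpy as np

def continuous_score(angle, breakpoints, scores):
    """Piecewise-linear stress score; 'breakpoints' are zone boundaries in degrees."""
    return np.interp(angle, breakpoints, scores)

def normalized(score, smin, smax):
    """Map a raw score onto [0, 1] so joints with different ranges are comparable."""
    return (score - smin) / (smax - smin)

# Illustrative shoulder-elevation zones (degrees -> raw score); the true RULA
# tables assign discrete scores per zone, which this linear ramp smooths out.
breaks = [0.0, 20.0, 45.0, 90.0, 180.0]
raw = [1.0, 2.0, 3.0, 4.0, 4.0]

s = continuous_score(60.0, breaks, raw)        # raw score for a 60 deg posture
n = normalized(s, min(raw), max(raw))          # 0.0 = minimum, 1.0 = maximum
```

Averaging such normalized instantaneous scores over an observation period yields a time-weighted stress score of the kind reported for each surgical phase.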
2.5 Comparative Analysis (Kolmogorov-Smirnov Statistic)

We would expect to see some difference in joint angle distributions due to patient and measurement variability. However, since the surgeon is fully trained, we expect his approach to each patient to be as consistent as possible, and we would hope there is sufficiently little variation between procedures that measurements obtained during a single procedure will be reliable indicators of the physical stress experienced by the surgeon. To demonstrate the repeatability of stress scores for a surgeon over multiple surgeries, we report the differences between the distributions of joint angles obtained in each procedure taken singly and those obtained from the remaining six surgeries. The Kolmogorov-Smirnov statistic, d, provides a measure of the difference between two distributions: it indicates the maximum vertical difference between pairs of cumulative distribution functions, and ranges from 0 (similar) to 1 (different).

2.6 Post-Operative Calculations

Using the correlated video data, we reviewed the entire procedure and divided it into three phases of interest: cystic duct dissection (CDD), gallbladder dissection (GBD), and gallbladder removal (GBR). For each major phase of the surgery, we computed a normalized weighted postural stress score (NWPSS) by averaging the instantaneous scaled RULA scores over the observation period, and we report these values averaged over the seven procedures.

3. Results

3.1 Marker Visibility

A certain amount of position data is lost during the course of the surgery due to marker occlusion or internal localization errors. Using the correlated video images we separated manipulation tasks from non-manipulation tasks and found that the majority of missing data could be attributed to marker occlusions during segments not directly related to performing a surgical task.
We applied a generalized cross-validation technique to interpolate missing data segments shorter than 0.5 s in duration (expected RMS error < 0.9 mm). During tissue manipulation tasks, we obtained at least one sample per second an average of 92% (SD: 6%) of the time.

3.2 Comparative Analysis

For each joint, we calculated and plotted the cumulative distribution function (CDF) of joint angles during each procedure (see Figure 3 for representative examples). Due to variability in defining the neutral posture for each joint, we shifted each distribution such that they share a common mean. The differences in the distributions are therefore primarily due to differences in shape rather than mean values. Figure 4 shows the range of d values for each joint resulting from comparing joint angle distributions from each procedure to the concatenated distribution of the remaining six procedures, along with the average d value and standard deviation for each joint. The most proximal joints exhibit the least variability (d ~ 0.10-0.13, on average), while the wrist joint exhibits the most (d ~ 0.2, on average). Since the surgeon is fully trained, these values represent the minimum variations we might expect to find from procedure to procedure when using the same surgical tools.
ncc4o0H • M e a n K S (d) va lue 0.0 0.2 0.4 0.6 0.8 1.0 Difference Measure (d) Figure 4 - Ranges of difference measures between joint angle distributions from each individual procedure and the concatenated data from the remaining six procedures, assessed by the Kolmogorov-Smirnov statistic. The confidence intervals marked indicate one standard deviation and are based on a bootstrap estimate. 3.3 Posture Scores Using the three operative phases C D D , G B D , and G B R , we computed the modified normalized continuous R U L A posture scores for each joint; these are shown in Figure 5. The elbow and wrist joints show significant stress throughout all phases of the procedure, as does the forearm during gall bladder dissection. 160 100 80 60 40 o o w </) w CD v> 20 Shoulder N o r m a l i z e d W e i g h t e d P o s t u r a l S t r e s s S c o r e s Elbow Wrist FE Wrist RU Forearm P S Figure 5 - Normalized Weighted Postural Stress Scores for each joint averaged over the three primary phases of all seven procedures. 4. Discussion The primary purpose of this study was to assess the stresses experienced by an expert surgeon over multiple laparoscopic cholecystectomy procedures and to demonstrate a new method to assess the potential benefit o f new tool designs. By comparing joint angle distributions, we investigated the effect of patient-specific factors on measurements of physical stress. The low values and variance of d for the shoulder, elbow, and forearm suggests that these joint angle distributions are only slightly affected by patient variability. The higher variance of d for wrist flexion/extension and radial/ulnar deviation suggest moderate inter-patient effects. The ranges of d established in this study wi l l prove useful when evaluating the effect of new tools with improved ergonomics. Such tools, i f effective, would produce joint angle distributions characterized by d values typically larger than 0.2 when compared with the distributions found in this study. 
Similarly, such improvements ought to be reflected in decreased stress scores. Unfortunately, current tools seem to produce significant stress at virtually all joints (excluding the shoulder) throughout all phases of the procedure; this finding is consistent with other reports (e.g., Berguer 1998).

5. Conclusions

We successfully used an automated high-frequency postural measurement system to assess the stresses experienced by an expert surgeon over seven laparoscopic cholecystectomy procedures. Joint stress levels are high in all phases of surgery, and the elbow and wrist are significantly more stressed than the shoulder and forearm. Using the Kolmogorov-Smirnov statistic, we created a difference scale to assess the effects of inter-patient variability. This new scale can be used to evaluate new tool designs, validate surgical simulations and assess the performance of surgeons.

6. References

Berguer R, Rab GT, Abu-Ghaida H (1997) A comparison of surgeons' posture during laparoscopic and open surgical procedures. Surg. Endosc. 11:139-142.

Berguer R, Gerber S, Kilpatrick G (1998) An ergonomic comparison of in-line vs pistol-grip handle configuration in a laparoscopic grasper. Surg. Endosc. 12:805-808.

McAtamney L, Corlett EN (1993) RULA: A survey method for the investigation of work-related upper limb disorders. Appl. Ergon. 24:91-99.

Radermacher K, et al. (1996) Using human factor analysis and VR simulation techniques for the optimization of the surgical worksystem. In: Sieburg H, Weghorst S, Morgan K (eds.) Medicine Meets Virtual Reality 4: Health Care in the Information Age, IOS Press and Ohmsha, Amsterdam, NL.

Appendix H Medicine Meets Virtual Reality Conference Submission

This document was submitted to and presented at The 10th Annual Medicine Meets Virtual Reality Conference in Newport Beach, California, USA, in January 2002.

Quantitative Methodology of Evaluating Surgeon Performance in Laparoscopic Surgery

Paul B. McBeth 1, Antony J.
Hodgson 1, PhD, Alex G. Nagy 2, MD, Karim Qayumi 2, MD PhD. Departments of 1 Mechanical Engineering and 2 Surgery, University of British Columbia, Vancouver, BC, Canada V6T 1Z4

Abstract

Quantitative performance and skill assessments are critical for evaluating the progress of surgical residents and the efficacy of different training programs. Current evaluation methods are subjective and potentially unreliable, so there is a need for objective methods to evaluate surgical performance. We identify a feasible method to measure kinematic data in the live operating room setting and to assess the repeatability of an analysis method based on a hierarchical decomposition of surgical tasks. We used an optoelectronic motion analysis system to acquire postural data and tool tip trajectories of one expert surgeon over a period of four months. To assess repeatability of performance measures, we created a hierarchical decomposition diagram describing the procedure in terms of surgical tasks, tool sequences and fundamental tool actions. From the kinematic data, we extracted characteristic measures of individual tool actions and compared the measured distributions using the Kolmogorov-Smirnov statistic. The comparisons of distributions show consistent performance over time by a trained surgeon and little effect from patient variability, and so are likely reliable measures of performance. An expanded set of reliable kinematic measures will form the basis for quantifying surgical skill and should be useful in validating surgical simulations for use in training, certifying surgeons and designing and evaluating new surgical tools.

1. Introduction

Although minimally invasive techniques have been rapidly adopted into mainstream surgery over the last decade, developments in teaching and evaluation methods have not kept pace.
Quantitative performance and skill assessments are critical for evaluating the progress of surgical residents and the efficacy of different training programs. They are also important in assessing new surgical tool sets and techniques. Current evaluation methods are subjective and potentially unreliable (Rosser, 1998), so there is a need for objective methods to evaluate surgical performance. One valuable approach, which was recently demonstrated on a porcine model, is based on an analysis of tool tip force/torque signatures (Rosen, 2001). In this paper, we discuss a complementary approach which incorporates kinematic features of both tool motion and surgeon posture. The purposes of this study were to identify a feasible method to measure kinematic data in the live operating room setting and to assess the repeatability of an analysis method based on a hierarchical decomposition of surgical tasks. An abbreviated sample of the kinematic data results is presented due to limited space.

2. Methods

2.1 Equipment

An optoelectronic motion analysis system and video recordings were used to acquire postural data and tool tip trajectories at frequencies of ~20 Hz. For this study, we used a Northern Digital Polaris Hybrid Optical Tracking System capable of tracking the 3D position of both active infra-red light emitting diodes (IREDs) and passive retro-reflective markers with an accuracy of ~0.2-0.3 mm. The Polaris system was connected to a standard PC (800 MHz AMD Duron) running custom-designed data acquisition software written in Matlab. In addition to the optoelectronic system, we recorded video images of the surgery using both a laparoscope and an external camera focused on the surgeon. The images from these two sources were time stamped and recorded onto standard VHS tape; from these images, we could later determine the stage of the operation and correlate it with the detailed motion measurements.
All equipment used in the operating room was tested and approved by Vancouver Hospital's Biomedical Engineering Department and sterilized with ethylene oxide when appropriate.

2.2 Experimental Protocol

One expert surgeon was evaluated in seven clinical laparoscopic cholecystectomies over a period of four months at Vancouver Hospital. We attached sterilized marker arrays to the surgeon's torso and dominant arm (on the proximal and distal forearm and on the hand), as shown in Figure 1(L,C). This set of marker arrays enabled us to track and record the surgeon's joint angles at the shoulder, elbow, and wrist. Arrays were secured to the surgeon using elastic and Velcro® harnesses. Custom-designed Multi-Directional Marker Arrays (MDMArrays) were attached to each laparoscopic tool handle used by the dominant hand, as shown in Figure 1(R). A planar marker array was attached to the non-dominant hand tool. The MDMArrays were equipped with a quick-release clip to allow easy attachment and removal. The Polaris system is capable of tracking six different laparoscopic tools. One researcher scrubbed into the surgery and attached the sterilized marker arrays to the surgeon's hands, lower arms and torso and to the laparoscopic tools. This researcher remained to assist the surgeon with any adjustments to the marker arrays or support cuffs. Immediately postoperatively, the surgeon's joint centre locations were calibrated using a set of standard isolated joint motions. The tip position of each surgical tool was found using a sphere-fitting calibration procedure in which the tip was placed in a small depression on a fixed surface while the tool was manipulated about that point.

2.3 Missing Data

Optoelectronic systems have inherent line-of-sight limitations, causing segments of missing data. Custom-designed tool marker arrays with multiple faces visible from several directions (the MDMArrays mentioned above) were developed to minimize these problems.
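The sphere-fitting tip calibration described in Section 2.2 can be posed as a linear least-squares problem: while the tip pivots in the fixed depression, every recorded marker position lies on a sphere centred on the tip. The sketch below (in Python with NumPy rather than the Matlab used in the study; the function and variable names are our own, not the thesis implementation) shows the standard linearization |p - c|² = r², rewritten as 2p·c + (r² - |c|²) = |p|².

```python
import numpy as np

def fit_sphere_center(points):
    """Least-squares sphere fit: recover the fixed pivot point (tool tip)
    from marker positions recorded while the tool pivots about it.

    Linearization: |p - c|^2 = r^2  =>  2 p.c + (r^2 - |c|^2) = |p|^2,
    which is linear in the unknowns [cx, cy, cz, r^2 - |c|^2].
    """
    p = np.asarray(points, dtype=float)
    A = np.hstack([2.0 * p, np.ones((len(p), 1))])   # coefficients of unknowns
    b = (p ** 2).sum(axis=1)                          # |p|^2 for each sample
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = x[:3]
    radius = np.sqrt(x[3] + center @ center)          # recover r from r^2 - |c|^2
    return center, radius
```

In practice one would feed in the tracked positions of a single marker (or the tool-frame origin) collected during the pivoting motion; the fitted centre is the tip location and the radius is the marker-to-tip distance.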
To establish the feasibility of using an optoelectronic system for tool tip and posture tracking in the operating room, we measured the frequency of marker occlusions and estimated the errors associated with interpolating missing data.

Figure 1 - Model of the surgeon's arm (L). Marker array mounting, schematic and as mounted in the operating room (C). Multi-Directional Marker Array (MDMA), schematic and as mounted on a laparoscopic surgical tool (R). The values labelled on each marker represent the percentage of time it was visible during manipulation segments.

2.4 Hierarchical Decomposition

Many performance measures (e.g., distance for a tool tip relocation maneuver) can only be defined for low-level movement segments. To assess repeatability of such performance measures, we therefore created a hierarchical decomposition describing the procedure in terms of surgical phases and stages, tool tasks and subtasks, and fundamental tool actions (see Figure 2). This five-level hierarchical decomposition provides the foundation for a quantitative analysis of surgeon performance since it allows us to identify related tool actions performed at different points during the procedure and consolidate the associated performance measures for analysis.

Figure 2 - Laparoscopic Cholecystectomy Hierarchical Decomposition.

2.5 Performance Measures

From the kinematic data associated with each subtask, we extracted characteristic measures of individual tool movements, for example: duration, mean and peak tool tip velocity, peak acceleration, jerk cost, and straight-line deviations. In addition to the kinematic performance measures, we also investigated mean membership values throughout a subtask in each of 12 prototypical actions (e.g., reach, sweep, idle, etc.); membership is expressed using fuzzy membership functions based on instantaneous tool tip position.
We selected an example task (applying a clip) which is performed comparatively frequently (at least 6 times per procedure) and compared the characteristic measures associated with its component actions across the 35 clipping segments recorded.

2.6 Comparative Analysis (KS Statistic)

To demonstrate the reliability of the proposed analysis procedure, we must show the performance measures are comparatively unaffected by variations between patients yet sensitive to real differences such as training effects. We assess similarity by comparing distributions of performance measures obtained from multiple executions of particular subtasks in two different contexts (e.g., comparing clip application times from the first three procedures to those from the last three could reveal a learning effect). To compare distributions, we use the Kolmogorov-Smirnov (KS) statistic (d), which represents the maximum vertical difference between cumulative distribution functions (see Figure 3); this measure ranges from 0 (similar) to 1 (different). The associated p value expresses the probability that the two measured distributions arise from the same underlying distribution.

Figure 3 - Application of the Kolmogorov-Smirnov statistic to the cumulative distribution functions of task completion times (cumulative percentage vs. time in seconds).

We performed several comparative analyses related to the clip application subtask. Most commonly, clips are used in the vessel separation (CD and CA) stage of the procedure; however, they are also occasionally used to control bleeding throughout the procedure. We would expect that there would be no significant difference between performance measures taken from alternate clip applications, as both patient variability and any learning effects will be equally represented in the two data sets.
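The two-sample KS statistic d described above is straightforward to compute directly from the empirical cumulative distribution functions. The sketch below (Python/NumPy rather than the study's Matlab; names are ours) evaluates both empirical CDFs at every observed sample point, which is where the step functions jump, so the maximum over those points equals the supremum difference:

```python
import numpy as np

def ks_statistic(sample1, sample2):
    """Two-sample Kolmogorov-Smirnov statistic d: the maximum vertical
    distance between the two empirical cumulative distribution functions.
    Ranges from 0 (identical samples) to 1 (completely separated samples)."""
    x1, x2 = np.sort(sample1), np.sort(sample2)
    # Both empirical CDFs are right-continuous step functions that only
    # jump at sample points, so checking all sample points suffices.
    all_x = np.concatenate([x1, x2])
    cdf1 = np.searchsorted(x1, all_x, side="right") / len(x1)
    cdf2 = np.searchsorted(x2, all_x, side="right") / len(x2)
    return np.max(np.abs(cdf1 - cdf2))
```

The associated p value (as reported in the paper) can be obtained from the asymptotic KS distribution, e.g. via `scipy.stats.ks_2samp`, which computes both d and p in one call.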
Since the surgeon is fully trained, we would hope for little variation between the first four and the last three procedures studied; any differences will be primarily due to patient variability, and the results will represent the reliability of our analysis procedure. Finally, we prospectively investigated several other divisions of the data set in hopes of demonstrating the existence of significant and interesting differences; for example, we considered whether there might be differences between clips applied for: vessel separation vs. control of bleeding; separating the cystic duct vs. the cystic artery; first placement vs. subsequent placement on the same vessel.

3. Results

3.1 Missing Data

A certain amount of position data is lost during the course of the surgery due to marker occlusion or internal localization errors. Figure 1(L) illustrates the marker array visibility for individual markers, averaged over the seven procedures. Using video analysis to separate manipulation from non-manipulation tasks, we found that the majority of missing data could be attributed to marker occlusions during segments not directly related to performing a surgical task (e.g., changing instruments). During tissue manipulation tasks, we had complete joint angle data 80% (SD: 10%) of the time and at least one sample per second 92% (SD: 4%) of the time. The trackability of the dominant hand tool depends on which tool is being used (we tracked the tool 78% (SD: 12%) of the time during the clipping task; 12 of the 45 tasks had no lost data and 20 successfully recorded more than 90% of the samples). We used a generalized cross validation (GCV) filtering technique (Woltring, 1986) to find an optimal smoothing parameter for fitting a quintic spline to position data. The data was resampled at constant intervals and missing data segments interpolated. The error associated with interpolating across varying gap sizes was calculated and is shown in Figure 4.
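The resample-and-interpolate step can be sketched as follows. This is an illustrative reconstruction, not the thesis code: SciPy does not ship Woltring's GCV criterion, so here the smoothing factor `s` is an explicit argument rather than being chosen automatically, and the function name and gap-handling logic are our own. The quintic spline (k=5) follows the text above.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def fill_gaps(t, x, max_gap=10, k=5, s=None):
    """Fit a quintic smoothing spline to the valid samples of one position
    channel (NaN marks missing samples) and interpolate across gaps of at
    most `max_gap` samples; longer gaps are left as NaN."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    valid = ~np.isnan(x)
    spline = UnivariateSpline(t[valid], x[valid], k=k, s=s)
    filled = x.copy()
    i, n = 0, len(x)
    while i < n:
        if np.isnan(x[i]):
            j = i
            while j < n and np.isnan(x[j]):   # find the end of this gap
                j += 1
            if j - i <= max_gap:              # fill only short gaps
                filled[i:j] = spline(t[i:j])
            i = j
        else:
            i += 1
    return filled
```

With ~20 Hz data, the paper's 0.5 s acceptance threshold corresponds to `max_gap=10` samples; each of the tool tip's x/y/z channels would be processed independently.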
An RMS error of 1 mm was chosen as an acceptable error, which suggests that a maximum gap size of 0.5 sec (10 samples) can be interpolated. We were able to reduce the amount of missing joint angle data by 20% (SD: 2%) using our interpolation technique with a maximum 1 mm RMS error. Using the same interpolation, we reduced the amount of missing data for the right and left hand tool data by 6% (SD: 2%) and 14% (SD: 3%) respectively.

Figure 4 - Graphical representation of missing data interpolation, showing original, missing and filtered/interpolated data (L). Mean and maximum RMS error (mm) associated with missing data gap size in samples (R).

3.2 Hierarchical Decomposition

The hierarchical decomposition of the procedure provides an organizational framework for the kinematic and event sequencing analysis of different surgical tasks. The decomposition design was based on the standard elements of laparoscopic cholecystectomies and was developed in consultation with two expert surgeons; the decomposition approach can be readily adapted to other laparoscopic procedures. The decomposition has five levels, each containing more specific details than the previous. The phase level (1) outlines the global goals of the procedure. The stage level (2) outlines local goals required to complete each phase. The task level (3) involves the use of a single tool. The subtask level (4) describes how the surgical tool is moving inside the patient. Finally, actions (5) describe kinematic features of stereotyped fundamental movements such as reaching or sweeping. Figure 2 shows the various levels of the hierarchical framework.

3.3 Performance Measures

Performance measures and mean action membership values were evaluated at the task and subtask levels for the clip application task.
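One convenient way to operationalize the five-level decomposition described in Section 3.2 is as a nested data structure that can be queried when pooling related actions for analysis. The sketch below is hypothetical (Python; the labels shown are a single illustrative path through the hierarchy, not the full decomposition of Figure 2, and the query function is our own construction):

```python
# Illustrative fragment of a five-level decomposition:
# phase -> stage -> task -> subtask -> actions.
decomposition = {
    "phase": "Dissection",                          # level 1: global goal
    "stages": [{
        "stage": "Vessel separation",               # level 2: local goal
        "tasks": [{
            "task": "Apply clip",                   # level 3: single-tool task
            "subtasks": [{
                "subtask": "Position clip applier", # level 4: tool motion
                "actions": ["reach", "orient", "grasp"],  # level 5
            }],
        }],
    }],
}

def actions_for_task(phase, task_name):
    """Collect every level-5 action recorded under a named level-3 task,
    so related tool actions from different points in the procedure can be
    consolidated for performance analysis."""
    found = []
    for stage in phase["stages"]:
        for task in stage["tasks"]:
            if task["task"] == task_name:
                for sub in task["subtasks"]:
                    found.extend(sub["actions"])
    return found
```

In a full analysis pipeline, each action entry would also carry its time window into the kinematic record, so that duration, velocity and the other measures of Section 2.5 can be computed per action and pooled by task.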
Figure 5 illustrates a limited collection of performance measures and their respective difference scores for clipping tasks.

Figure 5 - Kolmogorov-Smirnov difference statistic of performance measures (time, distance, average/minimum/maximum velocity, maximum acceleration) and mean membership action variables (hold, sweep, translate, release, grasp, reach, retract, push, pull, orient), plotted on the difference scale (d) from 0.0 to 1.0 for five comparisons: odd vs. even trials, 1st vs. 2nd half, 1st vs. subsequent clips, bleeding vs. separation, and CD vs. CA.

4. Discussion

We successfully recorded tool tip position 82% (SD: 22%) of the time the surgeon was performing tissue manipulation tasks and posture data at least once per second 92% (SD: 4%) of the time. Using a generalized cross validation interpolation technique, we were able to interpolate one-second gaps with estimated RMS errors less than 0.64 mm. Using the KS statistic, the small difference between the time measures in the odd and even trials and the 1st half / 2nd half trials suggests there are no obvious learning effects and no significant effect of patient variability. This is confirmed by the kinematic measures, although the difference measures are higher in large part because of the limited sample size. In addition, there appears to be a significant difference between clips applied to control bleeding and clips applied for vessel separation.

5. Conclusion

We used an automated high-frequency tool tracking and postural measurement system to identify kinematic features of specific surgical actions of laparoscopic cholecystectomies as defined in a hierarchical decomposition diagram.
The results demonstrate that optoelectronic and video motion analysis, particularly in conjunction with a missing data interpolation technique, is a reasonable method for recording kinematic data in the operating room, despite occasional periods of missing data. Additional data is required to decrease the confidence intervals of the reported difference measures. The comparisons of distributions of those performance characteristics of surgical actions which we evaluated show consistent performance over time by a trained surgeon, and so are likely reliable measures of performance. An expanded set of reliable kinematic measures and data will form the basis for quantifying surgical skill and should be useful in validating surgical simulations for use in training, certifying surgeons and designing and evaluating new surgical tools.

6. Acknowledgements

We thank the operating room staff at Vancouver Hospital and Health Sciences Centre for their cooperation and assistance. We also thank the Natural Sciences and Engineering Research Council of Canada (NSERC).

7. References

Rosser JC, Rosser LE, Savalgi RS (1998) Objective evaluation of a laparoscopic surgical skill program for residents and senior surgeons. Archives of Surgery 133(2):657-661.

Rosen J, Hannaford B, Richards CG, Sinanan MN (2001) Markov modeling of minimally invasive surgery based on tool/tissue interaction and force/torque signatures for evaluating surgical skill. IEEE Transactions on Biomedical Engineering 48(5):579-591.

Woltring HJ (1986) A Fortran package for generalized, cross-validatory spline smoothing and differentiation. Advances in Engineering Software 8(2):142-151.

Appendix I
S.A.G.E.S. Conference Submission

This abstract has been accepted for poster presentation at the Society of American Gastrointestinal Endoscopic Surgeons (SAGES) 8th World Congress of Endoscopic Surgery in New York, USA in March 2002.
Intra-surgeon Variability in Motor Task Performance in Laparoscopic Surgery

Paul B. McBeth (1), Antony J. Hodgson (1), PhD, Alex G. Nagy (2), MD, Karim Qayumi (2), MD PhD
Departments of Mechanical Engineering (1) and Surgery (2), University of British Columbia, Vancouver, BC, Canada V6T 1Z4

Abstract

Background: Current methods of evaluating the skills of surgical residents are subjective and potentially unreliable, so there is a need for objective methods to monitor their training. The purpose of this study is to test the intra-surgeon reliability of a proposed quantitative skill assessment method based on surgical tool kinematics.

Methods: One expert surgeon performed seven clinical laparoscopic cholecystectomies over a period of four months. Using an optoelectronic motion analysis system, we acquired tool tip trajectories at frequencies of ~20 Hz. A hierarchical decomposition matrix was used to segment the procedure into specific surgical actions and to extract characteristic measures of these individual actions (e.g., duration, mean tool tip velocity, etc.). We selected an example sequence (applying a clip) and compared the characteristic measures associated with its component actions across the seven procedures recorded.

Results: We found no statistically significant difference between the completion time distributions in any of the four contexts examined: (1) first 4 trials vs. last 3 trials: the probability of these segments arising from the same distribution is p=0.30, with a percent difference in mean completion time of d=5.4%; (2) cystic artery vs. cystic duct: p=0.1, d=22.3%; (3) 1st clip vs. subsequent clips (cystic artery): p=0.94, d=9.3%; and (4) 1st clip vs. subsequent clips (cystic duct): p=0.67, d=20.8%. Results for trajectory length and tip velocity are similar.

Conclusions: We used an automated tool tracking system to identify kinematic features of specific surgical actions during laparoscopic cholecystectomies.
The results suggest that an expert surgeon performs specific tasks quite consistently from case to case and that the performance measures are only modestly sensitive to specific patients.
