@prefix vivo: . @prefix edm: . @prefix ns0: . @prefix dcterms: . @prefix skos: . vivo:departmentOrSchool "Applied Science, Faculty of"@en, "Electrical and Computer Engineering, Department of"@en ; edm:dataProvider "DSpace"@en ; ns0:degreeCampus "UBCV"@en ; dcterms:creator "Wong, Darrell"@en ; dcterms:issued "2011-03-12T00:06:24Z"@en, "1991"@en ; vivo:relatedDegree "Doctor of Philosophy - PhD"@en ; ns0:degreeGrantor "University of British Columbia"@en ; dcterms:description """A multibody dynamics formulation has been developed for the purposes of real-time simulation of large scale robotic mechanisms such as excavators. The formulation models the rigid body dynamics of any arbitrary tree structured mechanism, although at present the formulation is restricted to single degree of freedom rotational joints. This formulation is an example of the orthogonal complement approach, which describes the dynamics by projecting an initial description of the primitive equations of motion (the derivatives of translational and angular momentum plus the kinematic equations) from angular and translational Cartesian coordinates to relative angles. In this thesis the approach was developed from Newtonian and Eulerian principles. Novel single cpu algorithms for inertia matrix and force vector formation have been implemented. Novel multiprocessor algorithms were implemented for the inertia matrix and the force vector on a 2d [formula omitted] triangular mesh architecture. A feedforward systolic matrix solution technique was also implemented. These algorithms are of O(n) complexity, and together they form a parallel formulation which is more efficient than other parallel formulations in the literature for mechanisms with fewer than 15 degrees of freedom. A Caterpillar 215B excavator was simulated in real-time using an array of transputers, and teleoperation experiments were conducted to verify the formulation. Single cpu simulations of the PUMA 600 and a human torso were also conducted."""@en ; edm:aggregatedCHO "https://circle.library.ubc.ca/rest/handle/2429/32391?expand=metadata"@en ; skos:note "P A R A L L E L I M P L E M E N T A T I O N O F M U L T I B O D Y D Y N A M I C S F O R R E A L - T I M E S I M U L A T I O N By Darrell Wong B.E. (Electrical) University of Auckland, 1981. M.A.Sc. (Electrical) University of British Columbia, 1985. A T H E S I S S U B M I T T E D IN P A R T I A L F U L F I L L M E N T O F T H E R E Q U I R E M E N T S F O R T H E D E G R E E OF D O C T O R OF P H I L O S O P H Y in T H E F A C U L T Y O F G R A D U A T E S T U D I E S E L E C T R I C A L E N G I N E E R I N G We accept this thesis as conforming to the required standard T H E U N I V E R S I T Y O F B R I T I S H C O L U M B I A May 1991 © Darrell Wong, 1991 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department The University of British Columbia Vancouver, Canada DE-6 (2788) Abstract A multibody dynamics formulation has been developed for the purposes of real-time simula-tion of large scale robotic mechanisms such as excavators. The formulation models the rigid body dynamics of any arbitrary tree structured mechanism, although at present the formula-tion is restricted to single degree of freedom rotational joints. This formulation is an example of the orthogonal complement approach, which describes the dynamics by projecting an initial description of the primitive equations of motion (the derivatives of translational and angular momentum plus the kinematic equations) from angular and translational Cartesian coordinates to relative angles. In this thesis the approach was developed from Newtonian and Eulerian principles. Novel single cpu algorithms for inertia matrix and force vector formation have been implemented. Novel multiprocessor algorithms were implemented for the inertia matrix and the force vector on a 2d n(\"2+1) 4. n triangular mesh architecture. A feedforward systolic matrix so-lution technique was also implemented. These algorithms are of 0(n) complexity, and together they form a parallel formulation which is more efficient than other parallel formulations in the literature for mechanisms with fewer than 15 degrees of freedom. A Caterpillar 215B excavator was simulated in real-time using an array of transputers, and teleoperation experiments were conducted to verify the formulation. Single cpu simulations of the PUMA 600 and a human torso were also conducted. ii Table of Contents Abstract ii List of Tables vii List of Figures ix Acknowledgements xi 1 Introduction 1 1.1 Modeling Multibody Dynamics 1 1.1.1 Issues 2 1.1.2 Motivation 3 1.1.3 Scope 3 1.2 The Principles of Dynamics 4 1.2.1 Lagrangian 5 1.2.2 Newton-Euler 9 1.3 Overview of the Thesis 10 2 Existing Formulations of the Equations of Motion 12 2.1 A Brief Literature Review 12 2.1.1 Coordinate selection 12 2.1.2 Reference point 13 2.1.3 Topology 14 iii 2.1.4 Closed chains 14 2.1.5 Other characteristics of multibody systems 15 2.1.6 Parallel formulations 15 2.2 Forward Dynamics Algorithms in Detail 16 2.2.1 The composite rigid body algorithm 17 2.2.2 Recursive 0(n) techniques 21 2.2.3 Orthogonal complement algorithms 23 2.2.4 The Newton Euler State Space orthogonal complement algorithm 27 2.2.5 Discussion 33 3 Efficient Computation using Body Coordinates 35 3.1 Referral of the Equations to Body Coordinates 36 3.1.1 Single Chain Algorithm 36 3.1.2 Branched systems algorithm 38 3.2 Inertia Matrix Calculation and Complexity 41 3.3 Force Vector Calculation and Complexity 45 3.3.1 0(n2) force vector algorithm and complexity 47 3.4 Equation Solving 49 3.5 Simulations 49 3.5.1 Puma 600 manipulator 50 3.5.2 Human torso 57 3.6 Summary 61 4 A Parallel Dynamics Algorithm 66 4.1 Integrating the Inertia Matrix and Force Vector Computations 66 4.2 Parallel Calculation of H 68 iv 4.3 Parallel Calculation of [L mp) 72 4.4 Parallel Multiplication by HT 72 4.4.1 Multiplying H and [L mp] by HT 75 4.5 Parallel Matrix Solver 80 4.5.1 The Jainandunsing feedforward algorithm 83 4.6 Summary of the Proposed Forward Dynamics Algorithm 89 4.7 Parallelism in Other Formulations 89 4.7.1 Lee and Chang 91 4.7.2 Amin-Javaheri and Orin 92 4.7.3 Fijany and Bejczy 94 4.7.4 Hwang, Bae, Haug and Kuhl 95 4.7.5 Complexity Comparison 96 4.8 Summary 99 5 The Real-Time Excavator Simulator 101 5.1 The UBC Teleoperation Project 101 5.2 A Description of the Caterpillar 215B 102 5.3 Architecture and Hardware Considerations 104 5.4 Implementation of the Simulator 106 5.4.1 Multibody dynamics and distributed Runge Kutta 106 5.4.2 The actuators 109 5.4.3 The display and joystick interface I l l 5.5 Excavator Simulations 113 5.5.1 Timing measurements 113 5.5.2 JOINT and RESOLVED mode experiments 117 v 5.6 Summary 126 6 Conclusions and Further Work 127 6.1 Summary of the Thesis 127 6.2 Concluding Remarks 130 6.3 Contributions to the Literature 131 6.4 Topics Requiring Further Exploration 132 6.4.1 Multiple degree of freedom joints 132 6.4.2 A parallel architecture for the branched formulation 134 6.4.3 Closed chains 135 6.4.4 Actuators and friction effects 139 A Single Chain Dynamics 141 B Branched Chain Dynamics 143 Bibliography 145 vi List of Tables 2.1 Computational Complexity for single chain systems on one cpu 19 3.1 Complexity of Forward Dynamics for One Cpu 45 3.2 Computational Complexity for Inertia Matrix Assembly for One Cpu 45 3.3 Complexity of single cpu force vector algorithm 49 3.4 Denavit Hartenberg parameters for the PUMA 600 50 3.5 Complexity of complete forward dynamics for single chains on one cpu 60 4.1 Computational complexity for parallel H algorithm 71 4.2 H computation schedule for row 1 cpus 71 4.3 Schedule for calculating [L mp] 73 4.4 Redistributed computations and communications for [L4 m^p^] 74 4.5 Revised H computation (row 1) schedule 74 4.6 Revised schedule for calculating [L mp] 75 4.7 First four computations of HTH (rows 1 to 4) and HT[L) (row 0) (rotational components only) 79 4.8 Parallel computational complexity 90 4.9 Estimated computational complexity for Hwang's algorithm 96 5.1 Tasks required for the dynamics simulation 107 5.2 Parameters used in excavator simulation Ill 5.3 kp and kv for PD actuator spool control 113 vii 5.4 Theoretical and real t iming estimates of one cycle of simulation on one cpu and the multiprocessor (Ziiz^ fcll + n C pus) 115 5.5 T i m i n g estimates for calculating H H 6 6.1 Bij for various types of joints 133 viii List of Figures 1.1 Open loop, fixed base, serial chain 6 1.2 Forces and torques on the ith body assuming all rotational joints 8 2.1 Body coordinate frames 28 2.2 Vectors describing the ith hinge constraint 31 3.1 A branched tree of five bodies 39 3.2 Block diagram of the open loop inverse dynamics controlled PUMA 600 simulation 50 3.3 Denavit Hartenberg coordinate description for the PUMA 600. Adapted from (Angeles 88) 51 3.4 0d, Odi 0d profiles for all joints of PUMA 600 model. Overlaid are the 9, 0 and 0 responses for joint 6 54 3.5 Td profiles for joints 1 to 6 generated by the inverse dynamics of the PUMA 600 model 55 3.6 Errors 9d - 9 , 9 d - 9, Bd - 6 profiles for joint 6 of PUMA 600 model 56 3.7 Human torso coordinate diagram 58 3.8 0{• • in cpui,, (row 1) 69 4.3 Final distribution of c\\ • and p) • vectors 70 4.4 Systolic array for solving the matrix equation Ax = b 86 4.5 Data shifting phase prior to solver 87 4.6 Generalised cube network 91 4.7 N=8 processor cube configuration. Adapted from Amin-Javaheri 1988 93 4.8 Computational complexity for the parallel algorithms of Table 4.8, assuming only M is calculated. L - Lee, F - Fijany, H - Hwang, W - Wong (proposed algorithm). 97 4.9 Computational complexity for the parallel algorithms of Table 4.8, including M and feedforward solver. L - Lee, F - Fijany, H - Hwang, W - Wong (proposed algorithm) 98 5.1 Initial coordinate arrangement for Caterpillar 215B and resolved mode parametersl03 5.2 Computer architecture for the excavator simulator 106 5.3 Voltage (v) to spool (x) relationship with deadband 110 5.4 The algorithm organisation between cpuo,o, the control transputer and the IRIS . 114 5.5 JOINT mode 6 response in the simulator 119 5.6 JOINT mode 6 response in the simulator 120 5.7 JOINT mode v and x response in the simulator 121 5.8 RESOLVED mode 6 response in the simulator 123 5.9 RESOLVED mode radius, cabin rotation, height, and bucket rotation responses in the simulator 124 5.10 RESOLVED mode v and x response in the simulator 125 6.1 Architecture for branched four body system 136 x Acknowledgements I would like to thank my supervisor Peter Lawrence for his encouragement and financial and technical support. My committee members, Dr. M.R. Ito, Dr. F. Sassani and Dr. C.W. da Silva have provided encouragement and guidance. To my coworkers, Nariman Sepehri, Dan Chan, Frank Wan, and Real Frenette I owe a great debt of gratitude for their technical advice and diligent proofreading. A special acknowledgement is owed to Ahmed Ibrahim for pointing me in the right direction. Lastly I would like to thank Serena Mar for her constant encouragement and faith, and my parents for their patience and support. xi Chapter 1 Introduction 1.1 Modeling Multibody Dynamics The dynamics of rigid body motion have been derived and analysed since the 18th century. Indeed, the field has been so well investigated it is now called classical dynamics. Until the early part of this century, dynamicists concentrated on the analytical mechanics of single bodies such as tops, in which the motion could be mathematically described in closed form [Whittaker 27]. Systems and motions of greater complexity were considered too difficult. The development of the computer in the 1950s and 60s, with its ability to numerically integrate, renewed interest in the dynamics of more complex multibody systems for which elegant closed form solutions for the equations of motion could not be obtained. Multibody computer simulations require the equations of motion in the form of an algorithm which generates the derivative of the system state from the known previous state. The state consists of joint positions and their velocities. The joint variables may be rotational (e.g. Euler angles), or translational (e.g. prismatic). The objective of the computer simulation is to continuously solve for the derivative variables (generally joint accelerations and joint velocities) and numerically integrate over time to determine the new state variables (joint displacements and velocities). In the fields of space dynamics, mechanism analysis, and vehicle dynamics, the equations of motion have been studied extensively. The formulations have steadily increased in capability, permitting the analysis and simulation of closed chain (kinematic loop) structures, flexible bod-ies and advanced control techniques. This greater capability has unfortunately also resulted in significantly increased computational complexity (in its crudest sense, the number of multiplies and adds as a function of the size of the problem (in this case the number of degrees of freedom 1 Chapter 1. Introduction 2 n)). In addition, the automatic generation of the equations of motion has become more diffi-cult. As a consequence, non-real-time simulation has been the norm, although supercomputers have trimmed computational delays. Online man-in-the-loop interaction, however, has not been possible except for hybrid electromechanical simulators e.g. aircraft training simulators. 1.1.1 Issues There are four major issues in multibody dynamics [Schwertassek 89] which continue to attract attention. 1. The basic issue is the choice of an approach for deriving the equations. This involves the principle of mechanics, the choice of state variables, and the description of system topology. Conceptual simplicity, efficiency and computational complexity are the main criteria for evaluation. i 2. A second issue is the incorporation of other types of dynamic behaviour, including such effects as flexibility and friction. These effects are secondary and dominated by the large scale rigid body rotations, but they can greatly increase the complexity of the equations and may often require much smaller step sizes for the integrator, resulting in greatly increased execution time. These characteristics become more important as advances are made in control theory, structural mechanics, and materials engineering, all of which will require models of light and flexible linkages. 3. The third issue is the choice of appropriate methods for dealing with closed chains. Con-straint equations representing closed chain forces can lead to the addition of Lagrange multipliers, or to transformations to minimal sets of equations. The presence of Lagrange multipliers can lead to numerical instability, and thus coordinate partitioning and other reduction techniques have been proposed to stabilise the system and minimise the number of equations [Schwertassek 84]. A further problem arises when the closed chains open and close over time, leading to system representations which vary in form over time. These problems are numerical and theoretical in nature, and can greatly increase the complexity of the formulation. Chapter 1. Introduction 3 4. The fourth issue is real-time simulation. The need for shorter design cycles and man-in-the-loop simulation capability requires formulations that execute in real-time. Item one above approaches this through theoretical considerations. Another approach is to use multiple processors to generate and solve the equations. At present this area of research is in its infancy, and the issues of algorithm expandability, architectural complexity, and the integration of various phases in the computation are still research problems. Questions arise as to the most appropriate formulation for multiple processor implementation, and the best computer architecture to use. This thesis addresses these last two questions. 1.1.2 Motivation The robotics research group at the U B C Electrical Engineering Department is interested in applying telerobotic control techniques to robot linkages such as excavators. A simulator is required when experiments in control, sensing, kinematics, and hand controller strategies need to be conducted in a controlled setting, before major field tests with real systems begin. This thesis is concerned with the design and implementation of a real-time manipulator simulator of the Caterpillar 215B, an excavator-type machine. 1.1.3 Scope In addition to excavator-type machines, which are basically four link structures, mechanisms with complicated wrists (e.g. the FMS Timberjack feller buncher) and multilegged bodies (e.g. the Kaiser Spyder) have also been designed. Robots such as the proposed Special Purpose Dextrous Manipulator, designed in Canada for the Mobile Service platform on the space station, have multiple chains of links. Modelling these machines efficiently requires formulations of more complex multibody systems than single-branch chains. One aim of this thesis is to take a step towards the development of an algorithm that accommo-dates branched (tree-like) systems. Initially only rotational degrees of freedom will be modelled. Prismatic (translational) joints are not included here, but are easily modeled ([Kim 86a] and Chapter 6). The thesis is limited to the dynamics of rigid body systems, since these large links do not flex appreciably. The formulation does not simulate closed chains, but this area of Chapter 1. Introduction 4 research will be discussed in the chapter on future work. Control issues will not be considered, but the dynamics of hydraulic actuation will be briefly considered in chapter 5, when the issue of interfacing to actuators and controllers is examined. The second aim of this thesis is to implement a real-time simulator for the dynamic equations of the Caterpillar 215B. Researchers in robotics have attempted to obtain real-time simulations of common robots by concentrating on computational efficiency and limiting the problem to simple serial systems (a single open kinematic chain), each joint having only one degree of freedom. Often only one phase of the computation is addressed (e.g. inertia matrix, force vector, or matrix solver), without consideration of the integration of these different phases of the formulation. Even these simplified algorithms have required multiprocessors, since they have are too computationally complex for real-time simulation with existing single cpu computers. Our approach is also to use a multiprocessor. The computer architecture, however, will implement a parallel formulation which is concerned with the integration and efficiency of all phases of the simulation. The proposed formulation is evaluated using the computational complexity of the dynamics algorithm and the simplicity and efficiency of the multiprocessor architecture as the criteria. Initially a single cpu formulation will be developed for branched chains. The parallel archi-tecture, however, is specific for a single chain mechanism. In the discussion on future work, modifications for the inclusion of branched systems in the parallel architecture are examined. 1.2 The Principles of Dynamics The dynamics of a multibody system are determined by the kinematics, which describe the topology, and the kinetics, which describe the forces. Kinematics is focused on the geometric aspects of motion, excluding the forces involved. Link length and rotational or translational displacement and velocities are calculated to obtain the end-effector (or some link in between) position or velocity in Cartesian coordinates (forward kinematics), or conversely, to determine the joint displacements that produce the endpoint position (inverse kinematics). Kinetics is concerned with describing the balance of forces that influence body motion, such as gravity, actuator moments, and Coriolis forces due to the effects of angular velocity. Chapter 1. Introduction 5 Together, the kinematics and kinetics completely describe dynamic behaviour through the dif-ferential equations of motion, in terms of joint variables, structural parameters, and the internal and external forces applied to the joints. Examples of the external forces which affect the mo-tion are the joint torques exerted by the actuators, and forces from the environment such as contact forces. The dynamics can be obtained from two main approaches: Lagrangian, and Newton-Euler. The next two sections outline the fundamentals of these two approaches. Chapter 2 will discuss specific examples in greater detail. 1.2.1 Lagrangian The Lagrangian approach requires the designer to select a set of independent state variables q which can completely characterize the position and orientation of the system. This set is known as the set of generalized coordinates. For a simple manipulator with rotary or prismatic joints, relative joint angles and displacements are easy to measure, and thus make a good set of coordinates. Another choice is inertially referenced angles and displacements. Consider the single chain of n bodies in Fig. 1.1, with one degree of freedom per joint. Assume that the generalised coordinates are qi,...qn (measured relative to the previous link, or with respect to an inertial frame). The Lagrangian is denned as the difference between the kinetic (Ek) and potential (Ep) energy of the system. £(q,q) = Ek(q,q)-Ep(q). (1.1) The equations of motion are defined as: d dC dC . ( . d t ^ - d q ~ r F t , = 1 ' - B ( L 2 ) where F{ is the force acting in the direction of the coordinate. The Ek of a link i is given by Ek(i) = ^rriixjii + iicfjjti;.-. (1.3) J, is the link inertia tensor, and i , and tt>, are the 3x1 translational and angular velocity vectors of the centre of gravity of the ith body, measured relative to an inertial frame of reference. The velocities i , and w, are the Cartesian velocity coordinates of the ith body. The system vector Chapter 1. Introduction 6 B. 1 / z1 •A Figure 1.1: Open loop, fixed base, serial chain manipulator [x w] forms a nonminimum set of coordinates which may be reduced to a minimum set when the hinge constraints are added. Transformations from Cartesian coordinates to q or q are necessary because w (a 3 X 1 vector) is not a minimum representation and is generally not a measurable variable (unless it consists of a rotation about a single axis) since it is a function of up to three Euler-type angular rotation rates. One generalised set of q is rotational displacements (for a system with all rotational joints and no translational joints) which may be measured with respect to a global (inertial) frame or with respect to an adjacent link (relative). The transformation between [x, «?,-] and any other q is ii = Bj)q Wi = B$q (1.4) Chapter 1. Introduction 7 where Bj) and Bp* are translational and rotational 3xn Jacobian velocity transformations to the system vector q. The Jacobians for q — 0 can be derived from the following kinematic velocity and acceleration relations: Wi = «;,•_! + Zi$i Wi = Wi_i + Wi X Zidi + ZiOi (1.5) Xi = x , _ i + - ki Xi = + (wi_i x /j_i) - (wi X fcj) ^ = x , _ ! -|- x + x x - X ki — Wi x x A:,- (1.6) where ki is the vector from the centre of gravity of a body to the lower hinge, and /,• is a similar vector to the upper hinge on a body (Fig. 1.2). zt is the unit vector describing the axis of rotation of the ith hinge, ki, /,, and z, are assumed to be constants when referred to their own coordinate frame, but in equations 1.5 and 1.6, they are time varying, as the equations have been implicitly referred to a common coordinate frame. The system kinetic energy can be written as Ek = \\qTMq where M = f l ( m i B T T B T + B R T J i B R ] ) i1-7) where M is the n x n inertia tensor for the complete manipulator system. The potential energy for the system is due to gravity g, and is n Ev = Y,m>9Txi- (L8) When the Lagrangian is differentiated as per equation 1.2, the equation of motion for the ith. link is n T I n n E MH9j + E E hHkWk + E mi9TB%} = Fi i = 1, .. .n (1.9) j=i j=i k=i j=i where Fi is the actuator torque, and B^} is the tth column of the B^p matrix appearing as a result of the differentiation of Ep with respect to dg,. hijk represents the terms that occur Chapter 1. Introduction 8 Figure 1.2: Forces and torques on the ith body assuming all rotational joints as a result of differentiating M with respect to dq. The first term of equation (1.9) represents the moment due to inertia, the second term represents the Coriolis and centrifugal moments, and the third term represents the moment due to gravity vector g. The equation describing the system is then M{q)q=F-h{q,q)-G{BT). (1.10) The equations must then be solved for q given that M and BT are functions of q, and h is a function of q and q, which are known from the previous time step. F is provided from an actuator algorithm. The benefit of working with the Lagrangian approach is that the forces of constraint between hinges do not have to be considered. The internal forces never explicitly appear in the equations because a kinetic energy argument is used to describe the formulation, and the kinematics are Chapter 1. Introduction 9 used to describe the constraints and transform to generalised coordinates. Although the deriva-tion of the formulation is elegant, developers of computer codes have not used the complete Lagrangian equation for systems of significant complexity due to the computational inefficiency of the technique. Ibrahim [Ibrahim 88] and Silver [Silver 82], however, have shown that this is not necessarily true, by demonstrating that the principle of mechanics does not greatly influence the efficiency of the equations. It is the cost of performing the transformation to different q that determines the efficiency of the algorithms. 1.2.2 Newton-Euler The Newton-Euler approach uses Newton's Second Law of Motion and Euler's equation for rotational momentum to describe the kinetics of each body. Newton's Second Law, F = P = 4(mi) = ml (1.11) dV ' v ' describes the sum of the translational forces F on a body as equal to the rate of change of linear momentum, P. m and x are described as in the previous section. Rotational motion is described by Euler's equation for the inertial torque of a body, which describes the time rate of change of angular momentum: N = L = ^(Jw) = Jw + w x Jw (1.12) where L is the angular momentum, N is its derivative or moment, J is the link inertia (relative to its own coordinate frame), and w is the body's angular velocity (measured with respect to an inertial frame) defined in the previous section. Jw is known as the angular acceleration torque and w X Jw is known as the gyroscopic torque. The gyroscopic torque appears because the orientation of J varies with time, producing a non-zero time derivative. In the Newtonian approach, each link is isolated as a free body, and the forces and torques due to gravity, inertia, velocity, and actuation appear in translational and rotational equations as a balance of forces or moments (Fig. 1.2). The equation describing the translational forces on a link i is (Fig. 1.2): m,x, = 7i - 7,-+i + m,-0, (1.13) Chapter 1. Introduction 10 where the RHS is the sum of the translational forces about link i. 7,- and 7,4.1 are the forces applied to link i at the hinge points i and i + 1 by the attached links, which seek to constrain the body's motion. 17, is the acceleration vector due to gravity. The rotational torques are: Li = J{Wi + Wi x JiWi = Ri\\i - Ri+i\\i+1 + ki x 7, - /, x 7; + 1 + r, - r , + 1 (1-14) where is the actuator torque occurring about the axis of motion i. i?,A, (or i2,+iA, +i) are moments that act to oppose rotational motion in axes orthogonal to the axis of motion of hinge i (or hinge i+ 1). A:, x 7^ and /, x 7,+i are moments due to the translational hinge forces 7,- and 7i+i. All terms in (1.13) and (1.14) have been referred to a common coordinate frame. The above equations can be written for all the bodies in a system with the differentiated state variables x and w. As in the Lagrangian approach, x and w can be projected to a different set of states, such as Euler angle accelerations 9 or Euler parameter accelerations qp, and may be relatively or inertially measured. For one degree of freedom relative joint angles, Wi — f(zi9\\, Z2&2, —Zi^i), where z, is the ith axis of rotation. The equations of motion, 1.13 and 1.14, are then a function of 6, and, due to w, functions of 9 and 9. The projection from [i w] to an alternative set of q such as 9 leads to the elimination of the internal constraint forces and moments 7 and A. Once the equations for all bodies are assembled and the constraint forces and moments are eliminated, the equations must then be rearranged to solve for 9. The formulations may be 0(n) computational complexity if the projection/elimination of the constraint forces is done recursively from link to link (e.g. the algorithms of Featherstone [Featherstone 83], Rodriguez [Rodriguez 88], Bae and Haug [Bae 87b], Roberson and Schwertassek [Roberson 88], and Brandl [Brandl 86]). If the projection is done as a global transformation, 0(n 2) computational complexity (e.g. Hwang and Haug [Hwang 88]), or 0(n 3 ) computational complexity is obtained (e.g. Angeles and M a [Angeles 88], Kim and Vanderploeg [Kim 86a], and Hemami [Hemami 82]). o 1.3 Overview of the Thesis In the next chapter, Chapter 2, some of the formulations proposed by the research community are reviewed. The approaches taken once the Lagrangian or Newtonian primitive equations have Chapter 1. Introduction 11 been established are examined, and examples of parallel implementations are described. In the latter half of Chapter 2, the Newton Euler State Space method [Hemami 82], a formulation based on orthogonal complements, is presented. This method is the starting point for the development of the parallel formulation presented in this thesis in later chapters. In Chapter 3, the equations of motion are reorganised by referring them to body coordinates. This change illuminates the recursive nature of the equations, determining the structure of the algorithm, which then influences the multiprocessor architecture. The formulation is then generalised to branched systems and the equations are shown to be computable in an automated manner. Simulations of a PUMA 600 manipulator and a branched six degree of freedom human torso are presented to demonstrate the formulation. In Chapter 4, a multiprocessor algorithm for calculating the inertia matrix for simple serial chains (an open kinematic series of bodies) is obtained. A multiprocessor algorithm for the calculation of the driving force/torque vector is also developed which integrates well with the inertia matrix algorithm. Chapter 4 also describes the implementation of a parallel feedforward systolic matrix equation solver derived by Jainandunsing [Jainandunsing 89]. A parallel systolic architecture decomposes a problem into simple parallel tasks which are identical. These tasks may be run simultaneously on identical cpus, with intermediate results being passed between neighbours in a pipelined manner, analogous to the way blood is pulsed through the body. This solver completes the set of parallel algorithms used for the simulator. Chapter 5 describes the excavator simulator, including the computer architecture and the inter-faces to a graphics computer and joystick control. Measurements of relative efficiency between single and multiple cpu simulations are given, and a teleoperation experiment is presented to demonstrate the usefulness of the simulator. Chapter 6 concludes the thesis with a discussion on results and future work. Ideas for extending the formulation and the architecture to deal with branched sysytems, multiple degree of freedom joints, and closed chains are presented. Chapter 2 Existing Formulations of the Equations of Motion In this chapter some of the formulations proposed by the research community are reviewed. A brief review of the development of multibody dynamics is presented by pointing out the most important characteristics required of a formulation. Following this, some of the most relevant approaches will be discussed, and examples of parallel implementations described. Finally the Newton Euler State Space method [Hemami 82], a formulation based on orthogonal complements, is presented. This method is the starting point for the parallel formulation developed in this thesis. 2.1 A Brief Literature Review In this section, a very brief review of the literature is presented. Following this, those formula-tions most relevant to the thesis objectives are examined in more detail. 2.1.1 Coordinate selection Selecting the principle of mechanics, as discussed in the previous chapter, is just one step in the formulation process. The selection of the final generalised coordinates (e.g. relative or inertial, minimum or nonminimum set) also greatly influences the final form of the equations of motion. Transformations of the initial equations of motion from Cartesian coordinates [w x] to other coordinate spaces can reduce the number of equations in the final system to varying degrees, and give the user more insight into the mechanics of the formulation. The final equations may be easier to derive (but require a larger set of equations and coordinates) or harder to derive (but result in a smaller set of equations using a minimum set of coordinates). In the 1960s, Roberson and Wittenberg [Roberson 66] and Hooker and Margulies [Hooker 66] 12 Chapter 2. Existing Formulations of the Equations of Motion 13 developed algorithms for problems in spacecraft dynamics that were implementable on a com-puter. Newton-Euler principles were used to form the primitive equations for individual bodies, and these primitive Cartesian equations were augmented by constraint equations and Lagrange multipliers, rather than eliminating the constraint forces by projective transformation. Haug [Haug 89] used inertially measured Cartesian generalised coordinates in the DADS software package, in which constraint forces were not eliminated, and the coordinates were not reduced to other state spaces such as relative angles. This choice generated large numbers of equations, but resulted in simpler definitions for constraints and forcing functions. Hooker [Hooker 70] and Jerkovsky [Jerkovsky 78] used transformations to relative joint variables when it was realised that they were easier to measure, more intuitive, and less prone to numerical error. This choice of coordinates resulted in a smaller but more complex set of equations. Examples of similar transforms are found in [Hemami 82], [Bae 88a], [Kim 86a], [Angeles 88], [Bae 87a], and in the work performed in this thesis. 2.1.2 Reference point When defining the equations of motion for a large system of bodies, both Roberson and Hooker used the augmented body approach, which defined a body barycenter as the point of application of body forces. The barycenter is the center of gravity of a body which has been augmented by point masses at the hinge positions around the body. These point masses are the cumulative masses of all the links in branches outboard from each hinge. The augmented body approach uses the system center of gravity as the system reference point. He and Goldenberg [He 89] have used this technique to computationally simplify the inverse dynamics (calculating the desired torques) for single chains. The barycentric approach has generally been supplanted in the literature by the direct path method introduced by Ho [Ho 74] and subsequently used by Jerkovsky and others. The direct path approach assumes a designated home body as a reference point, and then describes a body by relating it to home via the chain of bodies between them. This results in simple recursive equations for the kinematics and kinetics in which the point of application is at the hinge or the centre of gravity of a body. In this thesis the direct path method is used, with the home body being the first body in the chain. Chapter 2. Existing Formulations of the Equations of Motion 14 2.1.3 Topology Another requirement of a computer formulation is a systematic method for specifying the system topology. Graph theory, which mathematically expresses the topology as a matrix (or linked list) of connectivity, is a popular way of describing the connections and constraints among links. Roberson and Wittenberg used graph theory to define the topology of a system using an incidence matrix. The incidence matrix can be quite complex if the numbering of the bodies and hinges is arbitrary. The simplest incidence matrix uses a strict numbering system related to the direct path described by Ho [Ho 74]. In this method each body in a branch is labelled from a home body with an ascending integer, and each branch is completely labelled before another branch is begun. As a result, the ith row of the incidence matrix has a 1 in the jth column if the jtb. body lies on the path between body i and home. Most formulations, including the one developed in this thesis, now use some form of matrix or list to describe the topology (although roboticists have generally limited their simulations to robots without branches and consequently seldom use an incidence matrix). 2.1.4 Closed chains Closed chains present significant problems for researchers due to the difficulty in selecting appropriate coordinates among a large initial set (the coordinates produced when closed loops are cut). This is because of degrees of freedom which are lost or gained when the system forms geometries which cause the hinges to lock during motion . Numerous researchers have modeled closed chains, but no method has demonstrated significant advantages over others. The augmentation method, using Lagrange multipliers to represent the unknown closed chain constraint forces, plus constraint equations to represent the kine-matic constraints, has been discussed by Nikravesh [Nikravesh 84], Orlandea [Orlandea 77], and Wittenberg [Wittenberg 77]. One method is through direct integration and solution of the augmented equations (a differential-algebraic set). The problem with this technique is that nu-merical errors appear due to numerical integration inaccuracies, resulting in constraint equation violations. Numerical constraint stabilisation has been used by Baumgarte [Baumgarte 72] and Bayo [Bayo 88] to monitor the amount of violation and then numerically dampen the system to Chapter 2. Existing Formulations of the Equations of Motion 15 minimise the errors. State space reduction techniques have also been developed which minimise the number of independent coordinates by partitioning the initial set of coordinates into depen-dent and independent sets u and v respectively. The constraint equations are then redefined so that functions of u coordinates become functions of v coordinates. Coordinates and velocities u, v, ii and v can be calculated, followed by the solution to the differential-algebraic equations. Separating q into il and v enables the algorithm to control the amount of numerical error. This technique was initially proposed by Wehage, Mani and Haug [Wehage 82] [Mani 84]. Kim and Vanderploeg [Kim 86b] refined it using QR decomposition of the constraint Jacobian to identify the independent coordinates. Another refinement of Mani's technique was made by Park and Haug [Park 86], who combined Baumgarte's numerical stabilisation and Mani's partitioning technique. This thesis does not consider the problem of closed chains, although in Chapter 6 an approach developed by Murray and Lovell [Murray 89] is suggested as a logical extension to the current thesis work. 2.1.5 Other characteristics of multibody systems Researchers such as Ho, Hughes [Hughes 79], Bodley [Bodley 78], Keat [Keat 83], Kim [Kim 88] and Ibrahim [Ibrahim 88] have developed formulations which include such characteristics as flex-ible links and more sophisticated joint constraints in the context of space dynamics. Vehicle dynamics researchers such as Fuhrer [Fuhrer 89], and Rulka [Rulka 90] have developed algo-rithms for simulating railguided vehicles and automobiles, which need to deal with friction and nonlinear wheel-rail forces. Characteristics such as these have not been considered in this thesis work as the emphasis has been on the development of the basic rigid body equations as a first effort. 2.1.6 Parallel formulations All the formulations described above were developed for single processors. Bae, Kuhl and Haug [Bae 88b] and Hwang, Bae and Haug [Hwang 88] developed and implemented parallel algorithms on an Alliant FX/8 shared memory multiprocessor. Bae's algorithm is an 0(n) computational complexity algorithm which uses one cpu for each topological branch (chain) in the system. Hwang's algorithm is an n cpu, 0(n + log2 n) computational complexity, parallel Chapter 2. Existing Formulations of the Equations of Motion 16 version of an 0(ra2) single cpu algorithm [Bae 88a]. Robotics researchers such as Lee and Chang [Lee 88], Fijany and Bejczy [Fijany 89] and Amin-Javaheri and Orin [Amin-Javaheri 88] have parallelised single chain linkages using the 0(n3) single cpu algorithm proposed by Orin and Walker [Walker 82] as the basis formulation. They derive 0(n + (log2 n\"|) (Lee, Amin-Javaheri) or O(log2 n) (Fijany) complexity parallel algorithms that use a generalised cube or a hypercube computer architecture utilising n cpus. This brief survey, which does not reflect the contributions of many other researchers, shows that a variety of formulations have been derived. The most popular ones use relative joint coordinates as generalised coordinates, and connectivity matrices using the direct path method to describe the topology (and as a consequence, to describe the constraint and kinematic relationships). Modeling techniques for closed chain constraints and secondary characteristics such as flexibility and friction continue to be developed. Research on parallel solutions has only just begun, and will continue to evolve as computer hardware and algorithms change. In this thesis relative joint angles are used as generalised coordinates and a connectivity matrix is used to describe the topology and kinematics. Closed chain constraints are not dealt with, as the emphasis is on real-time performance achieved through parallel processing. 2.2 Forward Dynamics Algorithms in Detail This section discusses several of the most recent formulations, which are divided into three types, each a different approach to the transformation of the primitive equations of Chapter 1. The first approach is known as the composite rigid body method, which uses the Newton Euler inverse dynamics method derived by Luh, Walker and Paul [Luh 80], plus a technique for calculating the inertia matrix which uses partial sets of links at the open end of the chain as composite bodies. The second approach uses recursive kinematic and kinetic equations to form the dynamics in 0(n) time, by locally eliminating the hinge constraint forces. Accelerations and forces are locally projected from body to body across the joints. The third approach uses a global Jacobian velocity transformation to eliminate the hinge forces and project from [w x] to 9. This third approach is pursued in later chapters of this thesis. Chapter 2. Existing Formulations of the Equations of Motion 17 2.2.1 The composite rigid body algorithm formulations The most commonly used algorithm for single chained robots is the composite rigid body method of Walker and Orin [Walker 82] (method 3), derived for a single chain manipulator with one degree of freedom joints. The algorithm begins by establishing the Lagrangian form of equation 1.10 as the final form of the desired equations, in which the coordinates are relative joint angles and relative prismatic displacements. The computations required to obtain the elements of the matrices in equation 1.10 are then derived from the Newton-Euler inverse dynamics algorithm derived by Luh. The RHS of equation 1.10 (excluding F) are forces due to Coriolis, centrifugal and gravity effects, and are calculated by artificially setting q to zero in Luh's algorithm. Initially, the kinematic equations (1.5) and (1.6) are calculated, except that q\\ (equivalent to 0i) is set to zero and is described through a vector p,, measured from the inertial coordinate frame to the hinge point of body i, rather than the centre of gravity of body i. The equations are written in a forward spatial recursion: Pi = Xi_i + Xi = pi — ki 'ii = pi — Wi X ki — Wi x Wi x ki where pi = — x — x u>,_i x (2.1) and di = ki — U is the vector from hinge i + 1 to hinge i across body i. In this thesis, equation (2.1) has been changed with respect to Luh's original equations to reflect the fact that in the notation used in this thesis, the ith coordinate frame is placed at the bottom hinge of link i, rather than at the top (in Luh's description). The acceleration due to gravity is the first fink's acceleration p\\. The total force F, and total moment JV; about a body are then written as in equations (1.11) and (1.12). The hinge force / , and hinge moment n, are the force and moment exerted on link i by fink i — 1, and are given in a backward recursion as fi = Fi + /„ = fe ni = ni+1 + Ni - ki x Fi - di x fi+1; nn = ne or Ni = n,- - n,+i + fc,- x /,• - /, x (2.2) Chapter 2. Existing Formulations of the Equations of Motion 18 where the equation for Ni in (2.2) is rearranged to consist of hinge quantites only. /„ and nn are equal to the external force or moment at the manipulator tip. Equation (2.2) describes the moments and forces about the isolated body. /, is equivalent to 7 , in equations (1.13) and (1.14), and n, is the sum of the moments due to i2,A,- + r, (with a single z axis rotation, i?,A, is about the x and y axes, while TV is about the z axis). The elements of the force vector fv (Coriolis, centrifugal and gravity forces) are projections of the moments rii about the hinge axes 2,: fv(i) = nfzi i = l,...n (2.3) The inertia matrix element Mij from equation (1.9) is calculated using the composite mass Mj, composite centre of mass ij, and composite inertia Jj of a series of rigid bodies (a\"implies a composite variable). A composite rigid body is comprised of the subset of links j to n at the free end of the chain, where in this case n (no subscript) represents the number of bodies in the chain. The Mj, tj, and Jj can be computed recursively in a backward recursion according to the following equations: (2.4) dj) - rrijkj), (2.5) t' = n,...l (2.6) Mj = Mj+1 + THj, Cj = - i - ( M J + 1 ( c i + 1 Mj and Jj = Jj+i + bj, with Mn = mn Cn = -K Jn = Jn) and bj = Mj+1((rjr3)I where = CJ+\\ + dj - tj - —Cj — kj. rtf)) + h + m3 {irfr])I - (r*rf)) (2.7) (2.8) (2.9) tj is measured relative to the jfth coordinate frame (at hinge j). The calculation of a column j of M requires q to be artificially set to zero in the kinematic equations for all links other than j (qj = 1, a unit acceleration). In addition, the joint velocities q and gravitational forces are zero. The output of the inverse dynamics algorithm with qj = 1 Chapter 2. Existing Formulations of the Equations of Motion 19 and these other quantities set to zero results in the jth column of the inertia matrix Ai. Since joint j is the only joint in motion (links j to n are static), the total forces Fj and Nj on the composite body are Fj = MJ{ZJ x tj) (2.10) for rotational axes (prismatic joints are considered in Walker's paper but not included here). For i < j, Fi and JV, are zero, and /, and n, are computed in a backward recursion as ni — rii+i + di x and /_,• = Fj, rij — Nj + dj x Fj i = l,...j - 1; when i = j. (2.11) For i > j, fi and n, are zero. An element Mij in column j is then calculated by projecting the hinge moment n, onto the ith joint axis. Thus for the jth column, when ijj = 1, Mij = njzl i 6) robots with one cpu. parallel implementations Bae and Haug [Bae 87a] and Rodriguez [Rodriguez 88] have shown that their 0(n) algorithms can be parallelised, but only one cpu can be used for each branch in the topology of the system. Fijany [Fijany 89] and Lee [Lee 88] have stated that within a single chain, parallelism cannot be achieved at the equation level using 0(n) algorithms. Since most robots are single chains, parallel versions of 0(n) algorithms are not useful unless more complicated branched robots need to be modelled (there is of course an abundance of branched systems outside the robotics field). 2.2.3 Orthogonal complement algorithms formulations The third approach, which is the most relevant to this thesis, is the orthogonal complement method. Orthogonal complement algorithms use a global Jacobian transformation matrix which Chapter 2. Existing Formulations of the Equations of Motion 24 projects the [w x] derivative state space to a reduced (smaller) independent set of differentiated state variables e.g. relative acceleration angles 9. The transformation matrix contains vectors which are tangential to the constraint manifold of the system i.e. the vectors are orthogonal to the coefficients orienting the hinge constraint forces 7 and A. As a result, the transformation matrix eliminates the hinge forces from the primitive equations. One example of this transformation matrix approach is the Newton Euler State Space formula-tion developed by Hemami [Hemami 82]. This 0(n3) algorithm uses a two step transformation to eliminate the constraint forces (the translational hinge forces, and the rotational hinge con-straint torques which are orthogonal to the axes of rotation) from the Newtonian equations of motion. The first step removes the translational acceleration coordinate x from the principal equations (1.13) and (1.14) by using equation (1.6) (repeated here in (2.19)), Xi = x,_i + Wi_i x / t-_i + u;,-_i x u>,-_i x — ibi x ki — Wi x Wi x ki (2.19) to derive a matrix which projects the system from x to the inertially measured angular ac-celeration state w and simultaneously eliminates the translational hinge forces 7 . Following this, a second step projects the system from w to either inertially measured [Hemami 82] or relatively measured [Buchner 86] Euler angle accelerations 0. This step utilises the angular velocity Jacobian from equation (1.4), which is a symbolic version of equation (1.5) (repeated here in (2.20)): wi = Wi-i + Wi x Z{9i + ZiOi (2.20) This reduced derivative state space 0 contains only the rotational degrees of freedom (assuming only rotational joints in the system), which are orthogonal to the rotational hinge constraint forces A of equation (1.14). As a result of this orthogonality, the A are eliminated by the projection and the equations appear in a final form similar to that derived by the Lagrangian approach. Note that at this point, translational quantities in (1.13) are referred to and measured in inertial (base) coordinates, while rotational quantities in (1.14) are measured and referred to relative coordinates. Another example of an orthogonal transformation was formulated by Bae, Hwang and Haug [Bae 88a] [Hwang 88], who developed a recursive orthogonal complement formulation for closed chain systems based on variational vector calculus. Instead of denning a matrix which globally Chapter 2. Existing Formulations of the Equations of Motion 25 (i.e. system-wide) projects the inertially measured and referred Cartesian derivative state [x w] to relative coordinates, they defined a local transformation which acts upon every element in a system 'accumulated inertia' matrix. A 6x6 link 'inertia' K{ was denned as an accumulation (i to n) of link masses and rotational link inertias measured with respect to the inertial frame. K{ = Mi + Ki+1 Kn = Mn mil i = n, ...1 where M, = — 771,2; i m,x,- J , - m , ( x , ) 2 (2.21) (2.22) / is a 3x3 unit matrix, and M, is a rearranged version of Featherstone's spatial inertia matrix J,. Ki can only be accumulated if x,- and J, are referred to the inertial frame. For a simple open chain in which all joints are rotational and each has a single degree of freedom, each element of the system inertia matrix Mij is BjKiBj, where Bi = (2.23) (xi + ki)x Zi Zi The vector (x, + ki) describes the position of the ith hinge measured relative to and referred to the inertial frame. Bi locally projects the accumulated 'inertia' about the rotational axes of body i . The computational complexity of forming the inertia matrix plus the force vector is given in Table 2.1. The complexity of the matrix solver is excluded as it was not given. Note that all quantities must be referred to the inertial frame. Kim and Vanderploeg [Kim 86a] developed a formulation for closed chains using the Jacobian velocity transformation to relative joint angles proposed by Jerkovsky [Jerkovsky 78]. Although the Lagrangian kinetic energy expression is used as the principle for deriving the inertia matrix, rather than the Newtonian and Eulerian principles, the final transformation is the same as Hemami's. Kim and Vanderploeg also modeled closed chains using Lagrange multipliers and constraint Jacobians. Ibrahim [Ibrahim 88] developed a similar formulation to Kim's using the kinetic energy expression and a projection matrix. This technique was developed within a formulation which included flexible linkages and variable mass deployment. Angeles and Ma [Angeles 88] presented a formulation for a serial chain manipulator using Kane's approach to produce a natural orthogonal complement to the kinematic constraint equations. Chapter 2. Existing Formulations of the Equations of Motion 26 This natural orthogonal complement is combined with the Lagrangian kinetic energy expression to derive the formulation. The natural orthogonal complement matrix is the same as both Hemami's and Kim's final transformations, although an interesting feature of the algorithm is the referral of each body's variables to its own coordinates. This feature is exploited in the algorithm developed here in Chapter 3 to derive a similar algorithm to Angeles', but based on Hemami's method. Lilly and Orin have recently developed two algorithms which are related to the orthogonal complement concept [Lilly 91]. Lilly/Orin I is similar to those algorithms described above, using a recursive definition of the velocity Jacobian for chains of successively longer length to identify the system inertia matrix. The algorithm is 0(n3) complexity. Lilly/Orin III is similar to Bae, Hwang and Haug's algorithm [Bae 88a] [Hwang 88]. The variables, however, are referred to body coordinates. This algorithm is 0(n2) complexity. parallel implementations Hwang, Bae and Haug [Hwang 88] parallelised their algorithm for an n processor shared bus computer architecture. Generally speaking, Hwang's paper does not lend itself to a computa-tional complexity analysis, as closed chain constraint equations are included (the complexity of single chain branches becomes only a small part of the problem). Our estimates of Hwang's al-gorithm for a single chain (see Chapter 4) indicate that for the inertia matrix, 0((log2 n]) time is taken for the computation of the rotational matrices A° (using the RDA algorithm proposed by Lee), constant time is taken to refer the link inertias Ji to the inertial frame, O(|fog2n]) for calculating x, (referred to the inertial frame) and accumulating Ki, and 0(n) time is taken to locally project each accumulated 'inertia' Ki to relative coordinates (i.e. BjKiBj). The overall complexity for forming the inertia matrix is thus 0(n + [log2 n~|) using n cpus arranged in a hypercube. The architecture for the matrix solver, which is used in the following phase, is best suited to a nearest-neighbour 2d systolic mesh. As a result, both the hypercube and mesh architectures are theoretically required for implementing the simulator (it must be noted that Hwang chose to implement the algorithm on a shared bus, and consequently our estimate is for an ideal architecture). Lilly/Orin III has not been parallelised at this time. The forward dynamics algorithms proposed by Hemami, Kim and Vanderploeg, Ibrahim, and Chapter 2. Existing Formulations of the Equations of Motion 27 Angeles are all relatively similar. They have not yet been implemented on a parallel architecture, although Hemami and Zheng [Zheng 86] have developed an architecture for inverse dynamics using inertially measured coordinates. One of these algorithms would be a logical reference point for the development of an alternative parallel formulation. 2.2.4 The Newton Euler State Space orthogonal complement algorithm In this thesis the Newton Euler State Space formulation has been chosen as a possible algorithm for parallelisation. In the remainder of this chapter the equations of motion described by Hemami are developed using relative rather than inertially measured coordinates. In Chapter 3 these equations will be made more efficient by referring each body's equations to its own coordinate system. Consider the n link open chain linkage depicted in Figure 1.1. Each body is numbered B\\ through Bn, with a home body labelled 1, inertial frame labelled 0 (coincident with joint 1), and all other bodies labelled with increasing integer value as the distance along the direct path between an arbitrary body and home increases. The hinge on body i nearest to the home body and along the direct path is labelled joint i , about which the coordinate rotates. Each of the n joints has one degree of rotational freedom and is driven by an ideal rotational torque actuator. A body coordinate system (bcs) is attached to the centre of gravity of each body. In Hemami's formulation, the bcs was aligned with the principal axes of the body. As a result, the body's inertia matrix J , referred to it's own coordinates, is d\\ag[Jxx, Jyy, Jzz ]. This alignment is restrictive since the representation of axes of rotation not aligned with the principal axes is difficult. In this thesis, it has been found that a Denavit-Hartenberg-type axis orientation is more suitable, especially for single degree of freedom (dof) joints. In such a coordinate description (Fig. 2.1) the axis of rotation at joint i is assigned to be the Z{ axis of 6cs, and the x,- axis is the common perpendicular between the z,_i and z, axes, pointing in direction from z,_i to z,. Although the coordinate origin is drawn at the hinge point in Fig. 2.1, it is actually positioned at the centre of gravity of each body in this algorithm since most of the body quantities are defined relative to the centre of gravity of each body. It is drawn at the hinge since it is easy to graphically identify the direction of the axis of motion zi and the x, Chapter 2. Existing Formulations of the Equations of Motion axis when the coordinate frame is shown at the hinge. 28 Figure 2.1: Body coordinate frames For each body, the Newtonian forces and torques acting on the free body are shown as in Fig. 1.2. 7° 0 represents a vector of translational hinge constraint forces measured in and referred to inertial coordinates; AJ, represents rotational constraint torques at joint t, measured in and referred to bcsi coordinates, which act in the axes of motion (x, and yi) other than the actual Chapter 2. Existing Formulations of the Equations of Motion 29 rotational degree of freedom (z,); r/ t represents ideal actuator torques at joint t defined in and referred to 6cs,, rotating about axis Z{. Newtonian equations of motion for body i are: /«',:,« _ fi i L i ' . j i J _ it: At 0 , ni \\t _\\i ni+1 \\i+l J t w t , 0 — Jt,0 T S.t^OVi.O 'i,t+l AOli+l , 0 + • f t i , i / , i , i /1«+l-at'+l,t'+l/Vr-l,«'+l +rli-AUiritl+i (2-24) m , z ° 0 = mig + 7°o - 7°+i,o (2.25) Equation 2.24 describes the torques applied to the ith body in 6cs, coordinates, including rotational constraint torques, actuator torques, and moments due to the translational hinge constraint forces. (It should be noted that vectors appear in lower case, and matrices appear in upper case, or have a ~ over the variable to indicate that a skew-symmetric matrix has been created from a vector. The superscript on each variable indicates the bcs to which the variable has been referred. The subscripts may have a number of meanings depending on the type of variable). Parameters not already denned are: 1. w\\0 is the angular acceleration of body i (first subscript) measured with respect to the inertial frame 0 (second subscript) and referred to frame i (superscript). 2. / , ? 0 is the gyroscopic torque — (w\\<0 x JiW\\0) [Hemami 82]. 3. R\\ t describes those axes (measured with respect to, and referred to, ith coordinates) in which the ith body cannot rotate (with one dof (z axis) per joint, R\\;, is a 3x2 matrix of two unit vectors in the x and y axes). 4. is the rotation matrix referring a vector in i — 1 to ith coordinates. 5. is the skew symmetric matrix of vector A;),, measured from the centre of gravity of body i to the ith hinge (inboard (see Fig. 1.2)) and referred to the ith coordinate frame. 6. is the skew symmetric matrix of vector /J^+i, measured from the centre of gravity of body i (first subscript) to the outboard hinge i + 1 on body i (second subscript) and referred to the ith coordinate frame. 7. J- is the inertia matrix of a body measured from its centre of gravity (first subscript) and referred to its own coordinates (superscript). Because it is not aligned with the principal axes, J- is not necessarily diagonal. Chapter 2. Existing Formulations of the Equations of Motion 30 Equation (2.25) describes the translational forces affecting the centre of gravity (cog) of body i in inertially measured and referred coordinates. m,<7 is the force due to gravity, where g is [0 0— 9.81]r. Xi is the length vector from the inertial coordinate origin to the cog of body i. Given the above equations for n bodies, a matrix equation can be developed to represent the entire system: Jw = f + Hif + Nt\\ + ET Mx = g + # 2 7 J 0 0 M w / Ni A + E + 7 + 'x 9 H2 0 0 (2.26) where H\\ and H2 are coefficient matrices for the linear constraint forces; N\\ is the coefficient for the rotational constraint torques; and E relates the actuator torques to the appropriate coordinate frames. / and g are vectors describing the system forces generated by the gyroscopic torques and gravity respectively, and J and M are block diagonal matrices J = [J\\,J2,—J£] and M = [mil, m2I, ...TTINI]. An example of these coefficient matrices for a five body chain is presented in Appendix A. The internal constraint forces 7 and A may be eliminated by multiplying (2.26) by matrices which are orthogonal complements of [HT Hj]7 and Ni, respectively. First, £ is redefined as a function of w using translational kinematic hinge constraint equations. An equation for the ith hinge constraint is illustrated in Fig. 2.2. The constraint and its second derivative, referred to inertial coordinates, are: ^fi + A ^ l l - x l l f i - A l 1 l t l . = 0 (2.27) %fl - C i , o + A?(Mfl)%i - A?_1(tB|:11i0)2/::11fI- + 4 % < o - ^zhMzU = o (2.28) Equation (2.27) describes the paths through bodies i and i — 1 for getting to the ith joint and (2.28) is (2.27) twice differentiated (w is the skew symmetric matrix derived from w). The differentiated hinge constraint equation for the whole system is: U 'x = Qw + Qw x = U-^Qw+U-^Qw (2.29) (2.30) Chapter 2. Existing Formulations of the Equations of Motion 31 Figure 2.2: Vectors describing the ith hinge constraint Appendix A describes the internal structure of the matrices U,Q,Qw, U 1Q and U 1. x can be eliminated from (2.26) by substituting (2.30), resulting in: J 0 0 M I / Ni E 0 w = + 7 + A + T — U~lQ 9 H2 0 0 MU^Qw (2.31) Chapter 2. Existing Formulations of the Equations of Motion 32 Hemami describes [IT {U~lQ)TY a s the orthogonal complement to [HT Hj]T, with the rows of [IT {U~^Q)T'^r spanning the tangent space of the kinematic constraint surface. The columns of [B~i Hl]T, which are normal to this constraint surface, are orthogonal to the tangent space. Premultiplying (2.31) by [IT (U~lQ)T) thus eliminates 7 , since it can be shown that H2 = UT and H\\ = —QT, and so the result is zero. A second transformation can be used to eliminate A and project the system to relative joint angle acceleration space 0. Kinematic equation 1.5 is used to project w\\0 to relative joint angle velocities. Equation 1.5 can be written as | is the angular velocity relative to the previous body, but referred to its own coordinates. w\\ t _ x is related through kinematic differential equations to the joint angle rates 0{ using the rotational sequence Body-3 xyz [Hughes 86]: Wix cos Oiy cos 9iz sin 0iz 0 Oix Wiy = — cos 9iy sin 0iz cos 0 Oiy sin Oiy 0 1 Oiz When limited to a single dof in the z axis, 0,x and 0{y are zero, as 0ix = a, (the twist angle constant) and 0iy = 0. Thus if 0i = the above equation simplifies to as in equation 1.5. Note that z), is the z axis of the ith body referred to itself. Thus for the whole system w = CO and ii; = CO + CO (2.35) where C is described in Appendix A. 0 is the system vector of zA\\A\\z (•^•2^1^2,1-^2^3 + y4 2 d| 2 A3 + fc^J-A^iZ (A32A\\d\\x + A\\dla + fcfi3) x A\\A\\z P%,\\ x A\\A\\z (Alipli-iy + kl^xAlAlz (3.5) (3.6) In this case is the vector from the centre of gravity of body 3 to the lower hinge of body 1 (joint 1), in &CS3, while A\\A\\z can be denned as c\\x. The (i, j)th term of [U^QbcaC] is therefore [(U^Q^C]^ or as a system, [(U~1Q)bcaC] where pjj Ph and c\\j = [pxC] - kj- •• A • 2; (3.7) (3.8) (3.9) (3.10) where [p x C] is a symbolic representation. />} • is equivalent in magnitude to the vector used by Angeles [Angeles 88] but with a reversed direction, and c\\ • is equivalent to Angeles' e,j. In this thesis, the order of the cross product is reversed to compensate for the change of direction in p1-. The 3xiV matrices for cj— and p\\j x cj—, where i = n; j = l,...n; are the components of the standard end effector Jacobian. In Angeles' method, J and M are factorised to J1J and MTM where M,- = y/mll^xz- In this thesis, J and M remain unfactorised. Chapter 3. Efficient Computation using Body Coordinates 38 Now that we have efficiently defined (U QbcsC) to be (p X C), equation 2.37 is modified to: [CT {P x Cf) JC M(P x C) e = CT (pxCf f + 9bcs i- 1 J E C6 + M(p x C) 0 or MO = [HTH]0 = b (3.11) In equation (3.11), the ith component of g^ is g\\0 — miA'0g, the force due to gravity of each body when it is referred to body coordinates. The transformation matrix H = [CT (px C)T] on the LHS of (3.11) was developed independently of Angeles' work. Another interesting feature of the H matrix is its recursive nature. The matrix can be considered as a series of N 3 x N Jacobian matrices, the ith one representing a manipulator with i links. This characteristic was also noted by [Lilly 91]. 3.1.2 Branched systems algorithm Consider the branched system shown in Fig. 3.1. The two branches consist of branchl = {1,2,3} and branch2 ={1,4,5}. Each new branch starts from the home body (1) and is completely labelled out to the tip before a new branch is started. If a branch already has some bodies previously labelled, the next unlabelled body is tagged with the lowest unused integer. Consider body 1. Joint jl connects body 1 to ground, joint j2 connects body 2 to body 1, and joint j4 connects body 4 to body 1. Define l\\2 and l\\ 4 as the vectors from the cog of body 1 to the hinge connecting bodies 2 and 4 to 1, referred to bcs\\. The / vectors point to hinges that are further from the home body. k\\ a is the vector from the cog of body 1 to the joint connecting body 1 to ground (joint 1), or in the general case the body which is one closer to home. Then, define d\\x = kxl — l]2 as the vector from joint 2 to joint 1, across body 1, in bcs\\. In general the vector across body i from joint j to joint i can be defined as = fcjt- — l\\j where i = prev(j), the body that is one closer than j on the direct path to the home body. The orientation of each body's coordinate system (not shown in Fig. 3.1) remains the same as for the simple chain (e.g. the X4 axis is parallel to the common perpendicular between the Z\\ Chapter 3. Efficient Computation using Body Coordinates 39 Figure 3.1: A branched tree of five bodies and 24 axes). From Fig. 3.1, equations of motion similar to (2.26) may be defined (see Appendix B ) . The structure of the matrices differ from the single chain configuration as extra elements arise for bodies which have more than one outward body attached (e.g. body 1). The elimination of constraint forces proceeds as before, with the statement of kinematic trans-lational hinge constraint equations, referred to the inertial frame: *i ,o + 4 * * 1 , 1 = ° T° 4. A°h2 - r° - A0!1 - 0 X2,Q 1 Ji2K2,2 x l , 0 — v x3,0 + -^3*3,3 ~ ^2,0 — -^ 2^2,3 = ° r° 4- A°h4 r ° - A0!1 - f) x4,0 + A4K4,4 ~ z l , 0 A l l l , 4 — u Chapter 3. Efficient Computation using Body Coordinates 40 T ° _ j_ A°h5 - r ° - A°l4 - 0 (3.12) Referring each equation to body coordinates and then differentiating twice yields the branched versions of matrices U,U~l,Q and Qw (denned in Appendix B). As in the single chain case, the translational constraint forces are eliminated due to the orthogonality between [Hx Hjbcaf and [IT ((U~1Q)bcsC)T]T, and the rotational constraints are eliminated by the orthogonality between C and N\\ [Hemami 82]. To automate the derivation of the kinematic constraints and the equations of motion, it is necessary to obtain U and f - 1 . This is because the C, (U^Q)^ and (p x C) matrices (shown below) have structures identical to Z 7 - 1 . U-1 is a trivially generated connectivity matrix describing the bodies that appear along the path between a given body and the home body. For example, row 5 in U-1 (in Appendix B) has an I in positions 1, 4, and 5, indicating that the path from home to body 5 includes bodies 1, 4 and 5. It is also useful to identify adjacent bodies. Considering body 4, the body previous to it in the path is found by searching for a -I in the fourth row of U. This occurs in column 1. Once U'1 is generated, Hi and U are easily constructed. U and U~x can be incorporated into simple algorithms for determining other matrices e.g. the computation of Qw and (p x C). As an example of the underlying structure of the matrices, the (U^Q)^ (3N x 3N) and C (3N x N), when computed from U~l, Q, and the kinematics equations (1.5) and (1.6) (Appendix B), are: •^1^2,1 *2,2 ° -^2^1^2,1 -^2^3,2 ^3,3 A\\d\\x 0 0 A\\A\\d\\x 0 0 0 0 0 K4,4 / 14\"5,4 0 0 0 0 K5,5 ,and C = z 0 0 0 0 A\\z z 0 0 0 A\\A\\z A\\z z 0 0 A\\z 0 0 z 0 A\\A\\z 0 0 A\\z z (3.13) Post multiplying (U~1Q)bcs by C results in a 3N x N matrix similar to the (p x C) matrix Chapter 3. Efficient Computation using Body Coordinates 41 generated in equation 3.8 but with non-zero elements in the same positions as those in U : [U^QbcC] = [PxC] = Pn x c n 0 0 0 P\\\\ X c21 ^22 X c22 ° 0 P%\\ X c31 P32 X c32 ^33 x c33 0 0 0 0 p\\\\ x <4i o P%\\ X C51 0 0 0 P444 xc444 0 PS4 X Cg 4 p|5 X c|5 (3.14) Calculating pj • requires knowledge of the path between bodies i and j, which is found in U and U-1. A recursive algorithm to generate cj—, /?} • and [(U~1)QbcsC]\\ j that uses and f/ _ 1 to take into account the system topology is: for j = 1 to N { for t = 1 to N { for p = 1 to i if ([/,> = -/) K 0 = P; ph [U-'QbcsC]^ } } c .• = z K l , l U'<3 Ap(i)Cp(i),J I < J i < j (3.15) Since U~l and U are trivially generated, arbitrarily branched systems can be modeled in an automated manner by utilising this algorithm. 3.2 Inertia Matrix Calculation and Complexity Simulation of the dynamics requires the formation of the terms in (3.11), followed by the linear equation solution of MO = 6. In this section the computational complexity of assembling Chapter 3. Efficient Computation using Body Coordinates 42 the inertia matrix M for the serial chain algorithm (on one cpu) is estimated. Because the complexity of the branched algorithm depends greatly on the system topology, it is not useful to estimate the cost of an arbitrarily branched system. The inertia matrix in (3.11) is factored into two rectangular matrices, HT = [CT (p x C)T] (Nx6N) and r JC M(p x C) H = (3.16) which is (6N xN), where (p x C) = U^QbcaC. The algorithm for calculating (3.16) is: for j = 1 to N { for i = j to N { ci,i = z-c ph [U-'QbcsC]^ = • ™ t - l c « - l , j M-x(pth - kid + tij Pli x <; i > 3 i > j Vrx-mi(p\\,j X Clj) The computations in (3.17) can be decomposed into the following phases: (3.17) 1. Forming the rotation matrices A]^ which refer a vector in 6c5,_i to 6cs,: cos Bi cos a, sin di sin a,- sin 6i — sin di cos ai cos 0, sin a, cos 0, 0 — sin ai cos a, (3.18) Assuming the n (sin#,, cos#i) values have been calculated, and sin a,- and cos a,- are constants, this phase requires An multiplies and 0 adds for an n-body chain. Chapter 3. Efficient Computation using Body Coordinates 43 2. Calculating cj- -, which is the zj • axis referred to fees,- coordinates. Consider the system C matrix for a five body chain: C = z 1 0 0 0 0 A 2 z 1 z 2 0 0 0 A>2Z2,2 z3,3 0 0 Alzl,l A A 7 2 A2Z2,2 \\4 3 A3Z3,3 z4,4 0 A 5 z 1 Ah72 /i2z2,2 A5?3 A3Z3,3 z 5 z5,5 where cj- = sjj = zand cj-,- = AJ.^J.Jj (3.19) Each cj- • in equation (3.19) is a matrix-vector product requiring 8x and 54-. The C matrix thus requires (\"~2Kn~1)(gx,5+), This is because the first diagonal consists only of constant zjj vectors, while the second diagonal refers constant z\\ vectors to 6cs,-+i, which is equivalent to just selecting the last column of the A j + 1 matrix. The other diagonals require the computation of (3.19). 3. Calculating pj- -, the displacement vector from the centre of gravity of body i to the jth hinge: [U-lQ}bcs = P = P\\,I o P2,l P2,2 0 0 ^3,1 p\\,2 ^3,3 0 0 0 PA,\\ P4,2 Pi,3 P4,4 0 0 0 0 P5,l Ph,2 P\\,3 P%,4 P5,5 (3.20) The first diagonal values of the p matrix (i = j) are p{- • = k]^, so no computations are necessary. The rest of the elements require (n)(^~1)(8x,ll+). ^ 4. Multiplying every element in the ith row of the C matrix by J\\. jir\\ . l,...i (3.21) The first diagonal, i = j, can be calculated off line as the multiplicands and the results are constant, and hence the cost will not be counted here. The rest of the matrix requires i2H»z!l(9x,6+). Chapter 3. Efficient Computation using Body Coordinates 44 5. Calculating the cross product of c\\ • and p\\j, and then scaling by the mass mj. mi(pij x citj) (3.22) The first diagonal can be calculated off line. The cost of the rest of this phase is (n)(21~1)(9x,3+). Note that we have multiplied by J? and m,, instead of J\\ and y/m^ This is because the force vector algorithm (to be described later) use C and (p x C) rather than JC or M(p x C). 6. The final phase in the formation of the inertia matrix is to multiply CT by JC and (p X C)T by M(p X C) according to: Mik = £ ( c j „ ) r • (J/4fc) + (4.. x cj i t .f • ( m ^ x cj)fc)) (3.23) Since the inertia matrix M. is symmetric, only half of the matrix needs to be com-puted. The complexity for this phase is then (^ + ^ + § - n)(6x,5+) - \" ( n ~ ^ ( l x , 1+). The —n(6x,5+) appears because terms multiplying two first diagonal elements, being constants, can be calculated offline. The —n^\"2~1^(lx, 1+) term is present because rrij{Kp>- • x Cj j) vectors have zeros as their last elements. The total computational complexity for forming the inertia matrix is the sum of the previous phases' complexities, and is presented in Table 3.1. The algorithm is slightly more efficient than Angeles' because some of the terms are calculated off line. The complexity polynomial given by Angeles also has minor errors, and these have been corrected in Table 3.1, which presents the complexity for a number of single cpu algorithms. Table 3.2 gives data on the costs for calculating the inertia matrix for linkages of various size for various inertia matrix algorithms. The algorithm Lilly III, developed by Lilly and Orin [Lilly 91] and similar to [Bae 88a], is the most efficient in Table 3.2. In this study however, the proposed algorithm will be parallelised because it posesses an architecturally attractive computational structure, compared to Hwang's parallel formulation discussed earlier. Chapter 3. Efficient Computation using Body Coordinates 45 Algorithm force vector inertia matrix solver Angeles X + 105n - 99 90n - 95 n 3 + 39n± _ 33n + g 5n 3 , 29n 2 46n , r \"f i\" + ~ ~ + 0 n J . 3nJ , 4n 6, 2 ' 3 n 3 i 3n 2 i n (j t 2 1\" 3 Trrl + 2 1 n 2 + 539n _ g l n 3 + 16n2 + 75n - 90 Proposed X + 3n2 + 77n - 44 3n2 + 62n - 43 n 3 + 39n± _ 49„ + g 5n 3 i 29n 2 61n , c 6 \"•\" 2 3 ~r ° n J , 3n J 2n 6 f 2 3 HI 4. „ 2 _ 7n 6 6 Zgi + 24n 2 + 2 ^ - 3 6 n 3 + 3 ^ i + sin _ 3 8 Walker X + 137n - 22 l O l n - 11 12n2 + 56n - 27 7n2 + 67n - 53 n J , S n 2 2n 6; \"•\" 2 3 £ + n 2 - I * n J i 27V • 577n 4 Q 6 ' 2 ' 3 ^ «3 + 8n2 + 1 0 ^ l n 64 Brandl X + 250n - 222 220n - 198 Fijany X + 9 ^ + 2 3 2 l n 181 4n2 + 88n - 137 Lilly I X + n 3 + 22n2 - 35n + 12 n 3 + 15n2 - 26n + 10 Lilly III X + 25n y | 49n 37 8n2 + 40n - 48 He X + 91n-97 86n - 96 Table 3.1: Complexity of Forward Dynamics for One Cpu Algorithm 4 5 6 7 Proposed Walker Angeles Fijany Lilly I Lilly III 286x, 209+ 389x, 327+ 318x, 229+ 353 x, 279+ 288 x, 210+ 261x, 240+ 498x, 370+ 555 x, 457+ 538 x, 395+ 509 x, 403+ 512 x, 380+ 398x, 352+ 779 x, 585+ 741 x, 601+ 827X, 615+ 664 x, 535+ 810x, 610+ 560 x, 480+ 1135x, 851+ 953 x, 759+ 1191X, 894+ 848 x, 675+ 1188x, 906+ 747 x, 624+ Table 3.2: Computational Complexity for Inertia Matrix Assembly for One Cpu 3.3 Force Vector Calculation and Complexity The RHS of equation (3.11), which includes the actuator forces or torques plus the effects of Coriolis, centrifugal, and gravitational forces (/„), appears to be computationally expensive when calculated according to equation (3.11). It is commonly calculated more efficiently using the 0(n) inverse dynamics algorithm proposed by Luh, Walker and Paul [Luh 80] (Chapter 2). An 0(n) algorithm which is more efficient than Luh's, proposed by Angeles and Ma [Angeles 89] (Table 3.1), uses Kane's generalised force approach. In this section a new algorithm based on Angeles' algorithm is proposed. Although the proposed Chapter 3. Efficient Computation using Body Coordinates 46 algorithm is of 0(n2) complexity rather than 0(n), it is slightly more efficient than Angeles' for systems with six or less degrees of freedom, as it makes use of the HT = [C (p x C)]T matrix, which has previously been calculated during the computation of the inertia matrix. In contrast, Angeles' algorithm does not make use of the previous inertia matrix calculations. Consider a chain of n bodies. Assuming there are no dissipative forces such as friction, the equation of torque is [Angeles 89]: /« = t [ ( ^ ) T ^ o + (%T™3no ~ 9)] (3-24) and are known as the partial velocities, and are the jth 3xn Jacobian row matrices found within the C and (p x C) matrices, respectively. The angular momentum derivative Z j 0 from (1.12), measured with respect to the inertial frame and referred to the jth frame, is: The translational acceleration Xj is the acceleration of the Xj vector measured from the inertial frame to the centre of gravity of body j. It may be replaced by pj, which as defined here includes the effect of the acceleration due to gravity (i.e. pj = Xj — g). Angeles' recursive equation for pj is: p£o = Al_1(fiZ\\fi+qz\\fixlll\\j+^^ (3-26) An efficient algorithm for calculating p\\0 and L\\0 is [Angeles 89]: ;,0)2. (3.29) Then #:0 = Ai^Zlo + Wi-tfcl^-Wikli (3.30) Equations (3.27) and (3.28) are simply kinematic equation 1.5 referred to body coordinates. Equation (3.29) combined with (3.30) is an efficient method of representing (3.26), while (3.31) is (1.12) referred to body coordinates. Chapter 3. Efficient Computation using Body Coordinates 47 Angeles substituted the 3xn row matrices in C and (p x C) into (3.24) to get: and t = l,...n t = 1, ...n (3.32) (3.33) where (cj,)T = (cjit)TA« and (pj,- x c j „ ) T m J ^ 0 = (cj„-)r(pj,- x mjpji0)- Angeles showed that equation (3.33) is computable in a backward recursion in O(n) time (Table 3.1). 3.3.1 0(n2) force vector algorithm and complexity In this section we define a new algorithm for the force vector which makes use of the [CT (p x C)T] matrix. Equation (3.24) may also be written as fv = cT (pxcy L mp (3.34) where [L mp] is the vector of L{ and m^p,, i — 1, ...n. Thus it can be seen that the momentum derivatives are related to (3.11) by L f 0 J = H \\ + — mp \\ 9bcs M(p x C) cej (3.35) The complexity of computing equation (3.34) is 0(n2) due to the matrix-vector multiplication, and requires the initial computation of L and mp from equations (3.27) to (3.31), followed by the matrix computation of (3.34). The cost may be estimated as follows: 1. Calculating w\\0 requires (n- l)(8x,6+) using (3.27). w\\0 = z\\ x0\\ is a scalar assignment operation. 2. Calculating w\\Q requires the last term in equation (3.28) to be ignored (the 0,- are set to zero to exclude the inertial terms). The second term is a cross product which only requires 2x since has only one nonzero element. Thus the complexity is (n - l)(10x,7+), with w\\0 = 0. Chapter 3. Efficient Computation using Body Coordinates 48 3. Calculating Wi requires (ra - l)(6x,9+), plus lx for W\\. 4. Calculating pi requires (ra — l)(26x,23+), plus 12x,9+ to calculate p|0 = -Alg - Wxk\\A (3.36) where g = [0 0 — 9.81]T. The first term in (3.36) compensates for gravity by adding an additional acceleration to that induced by the angular velocity w°0. 5. Calculating m,p, requires ra(3x). 6. Calculating X, in (3.31) requires (ra - l)(24x, 18+) plus 15x,10+ for L\\fi. 7. Calculating the matrix-vector product in equation (3.34). The first diagonal elements in C are z = [00 1]T, and so no cost is associated with dot products involving these terms. The complexity of (3.34) is \" Y \" ~ 1)-(6x,5+) + ra(3x,3+)+ ra(n~1)(l+) = 3ra2(x,+). (3.37) (3.38) 2 The first term calculates the dot products Li Ttlipi for those computations not involving the first diagonal. The second term adds the com-putations for [(p\\i x cj-,)][m,p,], and the last term describes the operation in which the rotational and translational dot products are added together to form the force vector. The complexity for the proposed force vector algorithm is given in Table 3.1. Complexity values for multibody systems of various size are given in Table 3.3, which shows that the algorithm presented here is more efficient than Angeles and Walker's algorithms for systems of six dof or less, but less efficient than He's. For parallel formulations, however, the thesis will demonstrate in the next chapter that the algorithm developed here integrates very well with the parallel inertia matrix algorithm, making it a good choice for parallel implementation. The force vector for a branched system can be calculated if the [CT (pxC)T] matrix is generated using equations (3.13) and (3.15), and if the kinematic equations (3.27) and (3.28) take into account the branching. Chapter 3. Efficient Computation using Body Coordinates 49 dof Proposed Angeles Walker He 4 312x, 253+ 321 x, 265+ 526 x, 393+ 267 x, 248+ 5 416 x, 342+ 426 x, 355+ 663 x, 494+ 358 x, 334+ 6 526x, 437+ 536 x, 445+ 800 x, 595+ 449 x, 420+ 7 642 x, 538+ 636 x, 535+ 937 x, 696+ 540 x, 506+ 8 764 x, 645+ 741 x, 625+ 1074x, 797+ 631 x, 592+ 9 892 x, 758+ 846x, 715+ 1211 x, 898+ 722 x, 678+ 10 1026x, 877+ 951 x, 805+ 1348 x, 999+ 813 x, 764+ Table 3.3: Complexity of single cpu force vector algorithm 3.4 Equation Solving The Cholesky decomposition linear equation solver [Walker 82] was used as the matrix solver for the single cpu case, as the inertia matrix is positive definite and symmetric. The complexity of the Cholesky algorithm, which consists of a decomposition and subsequent backsubstitution, is given by Walker as: £ x = n3/6 + 3n2/2 - 2n/3 = n3/6 + n 2 -7n/6 (3.39) 3.5 Simulations In this section the simulations of two linkages are discussed. Models of a PUMA 600 six degree of freedom rotary jointed manipulator, and a six degree of freedom model of the human torso (trunk, head, and two upper and lower arms) have been implemented on single cpus. A single cpu simulation of the excavator will be presented in Chapter 5. The PUMA 600 was chosen as an example of a rotary manipulator which is commonly discussed in the literature. In this thesis, profiles of 9, 0, and 9 given by Angeles and Ma [Angeles 88] have been used to drive the inverse dynamics algorithm previously developed in this chapter in sec. 3.4. Although the desired profiles are relatively slow, and hence may not excite all the dynamics of the system, it was necessary to produce results which could be directly compared to those existing in the literature. The inverse dynamics then controls the manipulator model Chapter 3. Efficient Computation using Body Coordinates 50 Joint a.-(deg) a,(m) Mm) Initial 0, 1 0 0 0 0 2 -90 0 -0.149 0 3 0 0.432 0 0 4 90 0.02 -0.432 0 5 -90 0 0 0 6 90 0 -0.056 0 Table 3.4: Denavit Hartenberg parameters for the PUMA 600 in an open loop control mode. All parameters are in metric units, with mass in kg, inertia in kg • m 2, length in m, and torque in Nm. The human torso was chosen as an example of a branched system. Open loop inverse dynamics control was used to simulate joint profiles similar to those used for the PUMA simulation. 3.5.1 Puma 600 manipulator A block diagram of the system appears in Fig. 3.2. The length parameters for the PUMA 600 are given in Table 3.4 (Denavit Hartenberg parameters [Angeles 1988]) and the mass and inertia parameters are given in equations (3.40) to (3.42). Fig. 3.3 is a diagram showing inverse dynamics T d PUMA 600 M0=G(9.e.Td) e e e Figure 3.2: Block diagram of the open loop inverse dynamics controlled PUMA 600 dynamics simulation the Denavit Hartenberg coordinate frames attached to the PUMA. In the coordinate system adopted in this thesis (and used in the following equations), joint i is labelled as the lower joint on link i. Because the coordinate frame is attached to the centre of gravity, the following conversions are required to generate k and / from the Denavit Hartenberg parameters and from Chapter 3. Efficient Computation using Body Coordinates 51 Angeles' paper: Chapter 3. Efficient Computation using Body Coordinates 52 d) V • bi+i sin a , + i -bi+i cos a , + i Ki,i \"•+!,« resulting in and 7 3 J3 J,5 mi m2 m 3 7714 m5 m 6 = 10.521, jfcJJ = [0 0.054 0], l\\T2 = 15.781, k%T2 = [-0.14 0 0], l22T3 = 8.767, ifc3^ = [0 0.197 0], /|J = 1.052, k\\TA = [0 0 0.057], = 1.052, klf5 = [0 0.007 0], / f r = 0.351, 41 = [0 0 0.019]. .6 — [0 0.203 0] [0.292 0 0] [0.02 -0.235 0] [0 0 0.057] [0 -0.049 0] 5 — 1.6120 0 0 0. 0.5091 0 0 0 1.6120 3.3768 0 0 0 0.3009 0 0 0 3.3768 0.0735 0 0 0 0.0735 0 0 0 0.1273 7 2 T4 7 6 -0.4898 0 0 0 8.0783 0 0 0 8.2672 0.1810 0 0 0 0.1810 0 0 0 0.1273 0.0071 0 0 0 0.0071 0 0 0 0.0141 (3.40) (3.41) (3.42) integration algorithm The simulation used a fourth-order fixed step Runge-Kutta method described by [Press 88]. The step size was set at h = 0.01s, and the model was run for 10 seconds of simulation time. Desired values of 0j, 0\\, and 0& were fed into the inverse dynamics algorithm so that the desired Chapter 3. Efficient Computation using Body Coordinates 53 actuation torques rd were available to the PUMA model at every function evaluation (four evaluations per step h). inverse dynamics The 0(n2) inverse dynamics algorithm developed earlier was implemented on one cpu. The desired profiles were generated according to the following equations: MO = (tw„i - sin(ti»Bt))/2 MO = (Wn - Wn COS(u>„r))/2 MO = wl sm(wnt)/2 i'=l,...6 (3.43) where wn = 2it/T, (T=10s) and t is the time variable. The torques r,d(i) are calculated in correspondence with the four time points required by the integration algorithm- once at f (operation [1]), twice at t + h/2 ([2]and [3]), and once at t + h ([4]): MO = f(ed(t), ed(t), 9d(t)) [i] rid(t + h/2) = f(9d(t + h/2), 9d(t + h/2), 9d(t + h/2)) [2,3] M * + h) = f{9d(t + h), 9d(t + h), 9d(t + h)) [4] (3.44) where the rd are calculated using the algorithm described in sec. 3.3.1, with 9{ = 9{d included in equation (3.28). forward dynamics The equations for the PUMA model are calculated on a separate cpu. The r,d are read in at every function evaluation, and are used as the actuator torques r a and added to the force vector fv to get b (fv is calculated using sec. 3.3.1, but 0, is set to zero). The simulation algorithm calculates /„ , b and M and then solves for 9. The Runge-Kutta algorithm then integrates 9 to get 6 and again to get 9. After four evaluations, the algorithm calculates a weighted response for states 9, 9 and 9 at time point t + h (see Chapter 5, section 5.4.1 for an exact description). Chapter 3. Efficient Computation using Body Coordinates 54 time (s) Figure 3.4: 0d, 0d, 0d profiles for all joints of PUMA 600 model. Overlaid are the 0, 0 and 0 responses for joint 6. simulation results Fig. 3.4 shows the desired 0d, 0d, and 0d, and Fig. 3.5 shows the rd (labelled as tql to tq6) generated by the inverse dynamics algorithm in response to the desired profiles. The simulator response for joint 6 is also shown in Fig. 3.4, and is indistinguishable from the desired values. All of the links produce a similar response when overlaid. Fig. 3.6 shows the error curves for link 6, which show maximums of -1 x 10_4deg for 0, —5 x 10_ 5deg/s Chapter 3. Efficient Computation using Body Coordinates 55 I Time (second) Time (second) 1 Time ( second) Time ( second) Figure 3.5: rd profiles for joints 1 to 6 generated by the inverse dynamics of the P U M A 600 model for 0, and 6 x 1 0 - 5 d e g / s 2 for 9. These results are similar to those obtained by Angeles and M a [Angeles 88] (while their errors were similar in magnitude, their step size was larger Chapter 3. Efficient Computation using Body Coordinates 56 time (s) Figure 3.6: Errors 0d -0,0d- 6, 0d - 0 profiles for joint 6 of PUMA 600 model (thus requiring less computations)). Angeles' performance is slightly better (computational cost-wise) due to their using the fifth/sixth order DEVERK integration algorithm from IMSL, which calculates eight evaluations (rather than four) of the derivative before advancing one step h (this gives a better estimate). The time taken for 10 seconds of simulation was 71.12 seconds of computer time. This time was measured assuming no results were stored and that the stack was initialised within the internal R A M of the Inmos T800 transputer (a microprocessor designed for parallel processing). The time taken for one evaluation (inertia matrix formation, force vector formation, and matrix solution) is 17.2 ms. Theoretically, the complexity estimate from Tables 3.2, 3.3 and equation (3.39) for n=6 is Chapter 3. Efficient Computation using Body Coordinates 57 1391 x, 1087+ (also shown in Table 3.5 at the end of this chapter). If a multiply and an add costs 1 ^s and 0.5 ps respectively in a transputer, the theoretical computation time should have been 1.93 ms. Adding the cost of n (sin, cos) calculations at 7/xs each, the total increases to 2.02 ms. This estimate would meet the maximum evaluation time of 2.5 ms when a fourth order Runge Kutta and /i=0.01 are assumed. The actual computations are thus 17.2/2.02= 8.5 times slower due to superfluous computations required for loop counters and array indexing, external memory fetches for global variables, and the cost of entering and leaving subroutines (the cost of saving data on the stack). These costs are unavoidable unless the modularity of the formulation is removed by not using indexed arrays and by replacing all subroutines with inline code. 3.5.2 Human torso In this model of the torso, the head, trunk and left and right upper and lower arms are assumed to have one degree of freedom per joint. Fig. 3.7 describes the structure of the system, including the body coordinate systems. The mass and length parameters used by the formulation are: [0.15 0 0.15] [0 0 0.242] lis = [-0.15 0 0.15] m2 = 5.0, k%t2 = [0 0.12 0], m 3 = 2.8, Jfc^ 3 = [0 0.2 0], /J 4 = [0 - 0.25 0] m4 = 1.4, kj4 = [0 0.2 0], m s = 2.8, kj5 = [0 - 0.2 0], = [0 0.25 0] m6 = 1.4, fcje = [0 - 0.2 0], (3.45) J\\, the trunk inertia, is obtained from [Khosravi 87] while the rest of the inertia parameters are found by approximating each limb as solid cylindrical links with radius r,- and length /, (and Chapter 3. Efficient Computation using Body Coordinates 58 for the head, a solid sphere of radius r 2 ) . T h e values are: r 2 = 0.12, r 3 = 0.05, r 4 = 0.05, r 5 = 0.05, r 6 = 0.05, and / 3 = 0.45, l4 = 0.4, l5 = 0.45, / 6 = 0.4 Chapter 3. Efficient Computation using Body Coordinates 59 These values only approximate those given by Khosravi, since in his case the arms are modeled as one single link, as he was modeling a four link chain performing a somersault. Thus: 7 5 -J5 — 2.7 0 0 2 T 7 1 2 7 ^ 5 0 0 0 0.40 0 , «/2 — 0 2m2r| 5 0 0 0 2.7 0 0 2m 2r| 5 m3(r4 + ft) 0 7 7 1 3 1 - 3 2 0 0 0 m 5 r 5 0 0 7 4 -4^(4 + 5) 0 0 tnU01 cpu0 2 CVU03 row 0 cpu u cpU24 row 1 row 2 c p u ^ cpu '44 row 3 row 4 Figure 4.1: The triangular 2d mesh array used to implement the dynamics algorithm 1. Calculating the 6n x n matrices H and H (H is a component of H), and the 6n x 1 momentum derivative vector [L mp]T. Row 0 of the array in Fig. 4.1 can be used to calculate the derivative vector, while simultaneously rows 1 to n are used to calculate HT and H. 2. Multiplying H and [L mp]T by H1 In this chapter the M and fv calculations are parallelised so that the computations occur in a balanced manner, where both the computations of H and [L mp] finish simultaneously. Chapter 4. A Parallel Dynamics Algorithm 68 This is necessary so that phase 2 can proceed in a synchronised manner. Following this, the array is reused to solve the matrix equation. This is easy to implement using the homogeneous architecture described here, in comparison to the architectures of [Lee 88] and [Fijany 89], which require different architectures for the equation formation and for the solver. 4.2 Parallel Calculation of H The calculation of H requires the recursive evaluation of equation (3.17): H = JC M(p x C) (4.3) The architecture in Fig. 4.1 can be used to compute equation (4.3) in parallel. The following notes are descriptions of the calculations carried out using this architecture: calculate A\\_^ In each cpu in row 1 (Fig. 4.2), cpuit,, the rotation matrix AJ_1 is calculated from equation (3.18) (operation 1 in Table 4.1). The 0, are assumed to be resident. No communication is necessary, and the cost of computing sine and cosine functions (1 of each) are excluded from Table 4.1. Note that the Inmos T800 transputer processor can compute the sine and cosine functions, and thus it is not necessary for the host computer to send these values (assumed by [Lee 88]). calculate cj- • and p\\j Calculation of cj • and pj- • (equations 3.19 and 3.20) proceeds using row 1 as a systolic pipeline, in which cj • and then pj • j = 1,...» (operations 2 and 3 in Table 4.1) are alternately calculated in cpui)t- (Fig 4.2). Once a cj- • vector is calculated, it is passed eastward (e) for the calculation of cj+j • by cpu j^+i according to equation (3.19). The vector is also sent southward (s) to the lower rows in the mesh in preparation for operations 4 and 5. The communication of c\\j (send Chapter 4. A Parallel Dynamics Algorithm 69 i-1 C. i-1 1 e w 2 e w 3 e w 4 A 3 A 2 s s s s 11 1 11 21 2 21 2 ) 22 2 22 31 3 31 3 32 3 32 3 I 33 3 33 41 4 41 4 > 42 4 42 4 > 43 4 43 4 > 44 4 44 row 1 time Figure 4.2: Pipeline computation of A\\_:, c|j, and p\\ • in cpui>t (row 1) east and south; receive west) and the computation of p\\; • is assumed to be simultaneous using the Inmos T800 transputer or the TMS320C40 as the cpu, as they both have DMA controllers for each communication link, enabling simultaneous communication and computation. Note that complete overlap is difficult to achieve with current transputer technology (the cpu chosen Chapter 4. A Parallel Dynamics Algorithm 70 for implementing these algorithms) as the amount of data to be transmitted is small, resulting in a communication setup time that becomes relatively significant. Theoretically, however, it has been assumed here that a complete overlap between the multiple communications and the computation is achievable. The time taken to send a vector of three 32-bit floating point numbers is about 12 microseconds. A matrix-vector product or cross product is considered here to be the smallest atom of computation, since the cost is about 12 microseconds (and therefore the computation can overlap the communication of a 3 X 1 vector). Fig. 4.3 shows the final distribution of elements in the part of the array used to calculate M (rows 1 to 4), and Table 4.2 is the schedule for operations 1 to 5 in Table 4.1, for a four link chain. Operations 1 to 11 in Table 4.2 represent atomic operation cycles. col 1 col 2 col 3 col 4 c ! P 1 11 11 w 2 2 e w 3 e w 4 C 41 A C 21 P 21 C31 n n n r S r S s 2 <» < 4 C22 J C32 C42 n n i s rS •x 3 4 4 C33 P 33 C43 n - s 4 C44 P4 44 row 1 row 2 row 3 row 4 Figure 4.3: Final distribution of c\\j and p\\j vectors Chapter 4. A Parallel Dynamics Algorithm 71 # operation cost comms cpus no. floats 1 4-1 4x 0 n 2 i i (n - l)(8x, 5+) 3n n(n+l) , 2 3 (n - l)(8x, 11+) 3n n(n+l) 2 4 9x, 6+ 0 n(n+l) 2 5 9x, 3+ 0 n(n+l) 2 6 iiVKk) n(3x, 2+) 0 n(n+l) 2 7 P\\M x c{k n(6x, 3+) 0 n(n+l) 2 8 (p x C)TM(p x C) n(3x, 2+) 3n n(n+l) , 2 v 9 n(2+) n n(n+l) 2 Table 4.1: Computational complexity for parallel H algorithm cycle cpu n C P U 1 2 CPU 13 Cpu 14 COSt to CpU 14 1 A1 4x 2 C2,2 c3,3 0x,0+ 3 PL P2,2 P3,3 4 P4.4 0x,0+ 4 Jlcl,l -2 c 2 , l c3,2 4.3 8x,5+ 5 m i (Pi.i x cj f l) P2.1 P3.2 4 04,3 8 x , l l + 6 • 2^C2,1 c 3, l C4,2 8x,5+ 7 ™2(P'2,1 X C ^ ) P4.2 8 x , l l + 8 J3C3,1 c i i 8x,5+ 9 msipii x 40 8 x , l l + 10 74.4 •M C 4 , l 9x,6+ 11 m4(p\\A x c^) 9x,6+ Table 4.2: H computation schedule for row 1 cpus calculate J\\c\\- and m,(p| • X c\\j) All cpus cpu^, in rows 1 through n calculate equations (3.21) and (3.22) with no communications between the cpus (operations 4 and 5 in Table 4.1, and also the last two operations in each column of Table 4.2). The computational complexity of operations 1 to 5 in Table 4.1 is 5 = ( « - l ) ( 1 6 x , 1 6 + ) +(22x,9+) 5 ]T = (16n + 6)x, (16ra-7)+ (4.4) Chapter 4. A Parallel Dynamics Algorithm 72 where the computations and communications of cj j and pj- j synchronise the cycle of computa-tion, and all communications have been overlapped by computations. 4.3 Parallel Calculation of [L mp] In Chapter 3 an 0(n2) algorithm was presented for calculating the vector /„ of gravitational, centrifugal and Coriolis forces. In this section an 0(n) parallel algorithm for /„ is developed which integrates well with the parallel H algorithm just presented. The calculation of r0, the actuator torques, will not be be discussed until Chapter 5, since they are dependent on the type of drive mechanism employed, and can be considered as a separate computational problem. From equation (3.34), the generation of /„ reduces to a problem of premultiplying the vector [L mp] by H. In this section we calculate [L mp] in a parallel manner. Equations (3.27) to (3.31) can be calculated by row 0 in a systolic pipeline. The calculation of wiQi ^ioi WiU and p\\, computed in cpuo,;, requires w\\z\\to, wj~j0, and p,_i to be sent from the left neighbour cpuo,,_i (h, ki, and p, are simplified forms of / | i t + i , k}fi and pj 0 used to simplify the notation in Table 4.3). Note that £, = 4- w\\0 x a,-, where a,- = J\\w\\Q and bi = Jlw\\Q (here 6, is a dummy variable, and is not related to equation 4.2). From Table 4.3, the computational complexity is: = ( n - l)(18x,13+) + (63x,50+) = (18n + 45)x,(13n + 37)+ (4.5) L , m p The computational cost of w*i0 and w\\Q, (n— l)(18x,13+), synchronises the computation cycle and dictates the complexity of the calculation. Although w]*\\ Q is calculated two computational atoms after w\\0 (due to the one atom delay in sending the w\\0 vector to cpuo,t+i), the cost of sending w\\0 to cpu0,,+i is hidden by the computation of u>)i0 in cpu0,i. 4.4 Parallel Multiplication by HT Once the schedules in Tables 4.2 and 4.3 have been completed, both the vector [L mp) and the rectangular matrix H must be multiplied by HT. Chapter 4. A Parallel Dynamics Algorithm 73 cycle cost to cpuoi cpuoi cost to cpu 02 cpu 0 2 CPU 03 CpU04 1 4x A1 4x A Ai -^ 3 2 0x,0+ 3 0x,0+ 4 lx Wi 8x,6+ 5 9x,6+ 10x,7+ wlo 6 9x,6+ 6x,9+ W2 7 8x,ll+ Pi 9x,6+ -W2k2 8 9x,6+ ax = Jlu>l0 9x,6+ w2i2 w3 < 0 9 9x,6+ h = Jlw{0 8x,ll+ fa -W3k3 ^4.0 10 9x,6+ Li,m\\p\\ 9x,6+ a 2 = ^2^2,0 w3i3 W4 11 9x,6+ b2 = J'iw'io fa -W4k4 12 9x,6+ a3 = J3u>3.0 W4l4 13 b3 = Jgwlo P4 14 1>Z, a4 = Jiw\\Q 15 64 = J4w\\Q 16 L4,m4p4 Table 4.3: Schedule for calculating [L mp] The atoms of computation which determine the computation cycle in the inertia matrix algo-rithm, the c|j, p\\: • computations (16x,16+) are comparable to their counterparts in the force algorithm, the (18x,13-(-) w, and W{ computations. The constant overhead for calculating [i mp] (63x,50+), however, is significantly larger than the overhead for H (22x,9+). This results in idle time in the inertia matrix cpu array, where the a, j cpus wait for the 6,- cpus until both algorithms can proceed with the synchronised multiplication by HT. Minimisation of this idle time can be accomplished by passing some of the [L mp] calculations from row 0 to row 1. w\\0 and w'i0 can be sent from row 0 to row 1 cpus, where a,- = J-WIQ and bi = JiW'iQ (18x,12+) can be calculated, relieving row 0 cpus of this burden. The results are then sent back to row 0. Although this requires the communication of vectors a,- and 6,- from row 1 to row 0, it is possible to hide this communication cost with computations. Table 4.4 presents a schedule of some of the communications and computations between cpuo3, cpuo4 and cpui4 (// indicates a parallel process, tx() and rx() indicates transmission and reception). The table shows that all communications are hidden by computations except in the case of cpuo,4-Tables 4.5 and 4.6 are the revised versions of Tables 4.2 and 4.3 (excluding communications) Chapter 4. A Parallel Dynamics Algorithm 74 C P U 0 3 CpU 0 4 CPU14 ^ r x ( j > 2 ) / / W 3 ^ X ^ ) / / ^ ^ 10 j*/7tx(W3/3) - -»rx(W 3/3)//-W4*:4//tx(u;J io) - -*™(«'4j0)//n»4(p4.1XC4.l) 11 c3 = a 3xw;^ 0 ,m3p3//tx(p3) -»• ->rx(fe)//Wy 4//tx(< 0) - - > r x ( ^ 0 ) / / a 4 = y 4 4 ^ 0 12 X 3 = c3 + 63 p4//rx(a4)<- <-tx(a4)//b4 = 13 c4 = a4xw4i0,m4p4//rx(b4) <-tx(64) 14 L4 — c4 \\ 64 15 Table 4.4: Redistributed computations and communications for [L4 m4p\\\\ c p u u CPU12 CPU13 CPU14 COSt tO CpU 14 1 4 AJ 4 4x 2 3,3 P4.4 0x,0+ 4 c2,l c3,2 c4 , 3 8x,5+ 5 X Cl.l) P2.1 P3.2 P4.3 8x,ll+ 6 •7JX.0 «'2 c 2, l c 3 , l A c4 , 2 8x,5+ 7 m2(P2,l X P3.1 4 P4,2 8 x , l l + 8 «*2 w2,0 J3C3,1 4, 8x,5+ 9 T2w2 J2 W2,Q m3(p3,l X 4l) Ph 8x,ll+ 10 74-4 ''4 c 4 , l 9x,6+ 11 J 3 w 3,0 m*(p4.1 X C^j) 9x,6+ 12 9x,6+ 13 J$W4,0 9x,6+ Table 4.5: Revised H computation (row 1) schedule when di and 6, are calculated in row 1. The complexity for the revised H and [L rnp] algorithms prior to multiplying by H is 53 = (n-l)(16x,16+) + (40x,21+) (4.6) H for the H matrix, and J2 = ( « - l ) ( 1 8 x , 1 3 + ) + (45x,38+) (4.7) L,mp for the [L mp] calculation. The H algorithm in Table 4.5 requires each row 1 cpu to transmit 6, = J\\w\\Q to their counterpart cpu in row 0 after all computations have ended (operation 14 in Table 4.4). Chapter 4. A Parallel Dynamics Algorithm 75 cost to cpuoi cpuoi cost to CpU02 CpU 02 CpU 03 cpuo4 1 4x A1 4x A Al Ai 2 0x,0+