UBC Theses and Dissertations


Classification of multi-channel EMGs for jaw motion recognition by signal processing and artificial neural… Fu, Bin 2004



Classification of Multi-channel EMGs for Jaw Motion Recognition by Signal Processing and Artificial Neural Networks

by

Bin Fu
B.Eng., Tianjin University of China, 1991
M.Eng., Tianjin University of China, 1994

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES, Department of Electrical and Computer Engineering

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
September 2004
© Bin Fu, 2004

Library Authorization

In presenting this thesis in partial fulfillment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Abstract

This thesis presents an EMG pattern recognition method to identify jaw movements. Pattern recognition is carried out using backpropagation artificial neural networks (BPN) trained by supervised learning. Different feature extraction methods have been implemented. Results are presented to support the feasibility of the suggested approach.
Electromyography (EMG) is the electrical activity of muscle. Bruxism is the involuntary and excessive clenching and grinding of teeth. It is one of the causes of serious tooth damage and jaw muscle disorder, and currently there is no definitive cure. Knowing the actual jaw actions during bruxism will help in designing a more targeted treatment mode. EMG provides information about the neuromuscular activity from which it originates. When different muscle contractions are performed, different EMGs are detected, and different muscle contractions are related to different movement tasks. Therefore it is possible to process the EMG to obtain a movement classification. The purpose of this study is to design an algorithm to detect jaw motion (simulated bruxism) from the EMG signals. The process consists of feeding the EMG signals to a feature extraction block; the extracted features are then supplied to an artificial neural network (ANN). The ANN classifies the motion into six categories: left, right and forward, each at two speeds, fast and slow. The application of this algorithm can provide the basis for a clinical evaluation of bruxism. In this study, three feature extraction techniques have been implemented for comparative analysis: the EMG linear envelope (LE), autoregressive modelling (AR parameter estimation) and a hybrid approach combining the AR model with the discrete wavelet transform (DWT). The performance of the three methods has been evaluated. The ANN structure constitutes another part of the study. While both the linear envelope method and the AR model method were able to classify the direction of jaw movement, each confused the fast and slow speeds of the same direction to some extent.
To solve this problem, a new ANN structure was designed for the linear envelope method by adding more features, such as the duration and the integral of the waveform. These features act as biases to the ANN structure; they not only yield high correct classification rates but also strengthen the robustness of the ANN. Finally, the classification performance of the algorithm was checked on 10 healthy subjects and reasonable results were obtained. The best results were obtained with the downsampled LE features and ANN structure III.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
List of Abbreviations and Glossary
Acknowledgement
Chapter 1 Introduction and Problem Statement
1.1 Bruxism
1.2 Electromyography (EMG)
1.3 Current Methods Used in the Study of Muscle Activity from EMG
1.4 Backpropagation Artificial Neural Network
1.5 Feature Extraction
1.6 Goal of Current Research
Chapter 2 Data Collection
2.1 Electrode Placement and Equipment Setting
2.2 Experimental Procedure
Chapter 3 Methods
3.1 ANN as a Pattern Classifier
3.1.1 Structure I
3.1.2 Structure II
3.1.3 Structure III
3.1.4 Factors Influencing the ANN's Performance
3.2 Feature Extraction Before Classification
3.2.1 EMG Linear Envelope and ANN
3.2.2 Data Reduction Using the Autoregressive (AR) Model
3.2.3 Data Reduction Using a Hybrid Approach of Wavelet Transform and the AR Model
Chapter 4 Tests and Results
4.1 Same Subject, Train and Test on the Same Day (Intrasession Test)
4.2 Same Subject, Train and Test on Different Days (Intersession Test)
4.3 Same Subject, Pooling Test
4.4 Test Between Individuals (Intersubject Test)
Chapter 5 Discussion and Conclusion
5.1 Effect of Network Structure
5.2 Comparing Different Feature Extractors Used
5.3 Reproducibility
References

List of Tables

Table 1 The outputs of the ANN are coded to represent 6 motion categories
Table 2 Test results for different schemes
Table 3 Test results for 9 subjects
Table 4 Test results for each day of the 5 consecutive days
Table 5 Train on repro1 and test on the other days (between-day test)
Table 6 Train on repro2 and test on the other days (between-day test)
Table 7 Pooling test results

List of Figures

Figure 1 Masseter and temporalis muscles and an electrode placed on the masseter muscle
Figure 2 Four-channel EMG signal for a left fast movement
Figure 3 Four-channel EMG signal for a left slow movement
Figure 4 Basic structure of a two-hidden-layer artificial neural network
Figure 5 Sigmoid activation function
Figure 6 In the feature extraction stage, the overall multichannel signal is described by a smaller set of data. This information is used for either training or classification purposes or both.
Figure 7 Target (instruction) signal for making a movement
Figure 8 Recursive ANN or dynamic ANN
Figure 9 A simple ANN structure
Figure 10 Modified ANN structure. To determine the direction and speed of movement at a given instant, the four-channel EMG at that instant is used, or, recursively, the outputs of the previous prediction are used
Figure 11 Training error curves as a function of the number of iterations
Figure 12 Extracting the information-bearing part of the EMG
Figure 13 Steps in obtaining the linear envelope
Figure 14 EMG envelope (red curve) obtained from a lowpass filter at a cut-off frequency of 4 Hz
Figure 15 Finding rise time and slope
Figure 16 Data reduction using the AR model
Figure 17 Examples of AR coefficients. Fig. A comes from llf1, rlf1, prf1, lls1, rls1, prs1. Fig. B comes from llf3, rlf3, prf3, lls3, rls3, prs3
Figure 18 Data reduction using WT and AR
Figure 19 The time-frequency plane of the discrete wavelet transform
Figure 20 Frequency bands associated with wavelet transform coefficients after three-level decomposition. a3: approximation coefficients; d1, d2, d3: detail coefficients at levels 1, 2, 3 respectively
Figure 21 Wavelet decomposition
Figure 22 AR coefficients obtained from the wavelet transform of the signal
Figure 23 7 left fast movements from subject P8
Figure 24 8 left fast movements from subject P1
Figure 25 4 left fast (upper part) and 4 forward fast movements (lower part) for subject P7
Figure 26 Similarity of fast (upper 4 plots) and slow (lower 4 plots) right movements in repro5
Figure 27 Similarity of fast (upper 4 plots) and slow (lower 4 plots) left movements in repro4
Figure 28 Comparison of task duration across reproducibility sessions
Figure 29 Formation of the training set for the pooling test
Figure 30 EMG waveform differences between two subjects. Above: subject P2; below: subject P1

List of Abbreviations and Glossary

EMG  Electromyography
BP  Backpropagation
BPN  Backpropagation Neural Network
ANN  Artificial Neural Network
PCA  Principal Component Analysis
LDA  Linear Discriminant Analysis
PSD  Power Spectral Density
AR  Autoregressive
STFT  Short-time Fourier Transform
WT  Wavelet Transform
DWT  Discrete Wavelet Transform
Bruxism  The involuntary and excessive clenching and grinding of teeth
SB  Sleep bruxism
MVC  Maximum voluntary contraction (or clenching)
TMD  Temporomandibular joint disorder
Lmass  Left Masseter Muscle
Ltemp  Left Temporalis Muscle
Rmass  Right Masseter Muscle
Rtemp  Right Temporalis Muscle
ll  Left lateral
rl  Right lateral
pr  Protrusive

Acknowledgement

The author would like to thank Dr. Chris Peck and Dr. Chris Long from the dental department of UBC for their contribution to the data acquisition, and also Prof. Alan Hannam for his helpful discussions and suggestions.

Many sincere thanks and special appreciation go to my supervisor Prof. Mike Beddoes for his instructions, help, time spent and advice throughout my 2 years at UBC.
And many thanks go to my co-supervisor Prof. Rabab Ward for her valuable suggestions and instructions, especially on thesis writing.

Finally, I would like to dedicate this thesis to my family for their endless support and encouragement.

Chapter 1 Introduction and Problem Statement

1.1 Bruxism

In 1907, Marie Pietkiewicz introduced "la bruxomanie" to describe gnashing and grinding of the teeth with no functional purpose [1]. The term "bruxism" was adopted later. Bruxism is an involuntary activity of the jaw musculature. It includes awake bruxism and sleep bruxism. Sleep bruxism (SB) is the grinding or clenching of the teeth during sleep. It is reported by 8% of the adult population and 14-20% of children. Bruxism can be a habit carried throughout life, that is, prior to the eruption of teeth, during the natural dentition period and through the artificial denture period.

Sleep bruxism is often related to malocclusion (morphological factors), mental anxiety (psychological factors), and neurological disorders (neurological factors). It may subside and reappear spontaneously [2].

Nocturnal grinding can exert thousands of pounds of pressure per square inch on the surfaces of teeth. It stresses not only the teeth but also the supporting bone, the gums and the jaw joint. Sleep bruxism can cause tooth destruction, temporomandibular joint disorder (TMD) (movement limitation, chewing difficulties, jaw pain), headaches, and disruption of the bed partner's sleep due to the grinding sound.

Few individuals are aware of, or seek treatment for, SB until there are noticeable dental problems or muscle pain. It is at this point that they start looking for a cure. Unfortunately, by then the habit is ingrained and has caused irreversible losses: patients suffer tooth damage and pain that may not be successfully treatable.

A regular dental examination and radiological investigation can, of course, prevent the above situation from happening.
In clinical practice, the diagnosis of SB is usually based on dental examination and patient history. According to the American Sleep Disorders Association, the diagnosis of sleep bruxism is based on the report of tooth grinding or clenching in combination with at least one of the following signs: abnormal tooth wear, sounds associated with bruxism, and jaw muscle discomfort [3].

However, in its initial stage, bruxism involves only minor symptoms and inconveniences. None of the signs and symptoms may be considered conclusive. Diagnosis of bruxism can be difficult, and numerous techniques and criteria are currently in use [3,4,5]. A definitive diagnosis can be obtained by a formal sleep study (polysomnography) that includes continuous monitoring of the electrical activity (EMG) of the masticatory muscles, which may show repeated bursts of activity that are not present in non-bruxers. However, such a study is costly and requires a technician's continuous observation. While it indicates whether bruxism exists, it does not further classify the bruxism by the actual jaw movements. Knowing what kinds of jaw actions a bruxer tends to make, and the speed and force of those actions, will help dentists understand bruxism better and design a targeted treatment specific to an individual [32].

One possible way to find out the actual jaw movements during bruxism is through electromyography (EMG) measurements. Since the technique is very inexpensive to conduct, it seems to have the potential of an excellent cost/benefit ratio.

1.2 Electromyography (EMG)

EMG is sometimes referred to as myoelectric activity and has been used for many diverse applications, including clinical muscle function evaluation, control of prosthetic devices, and schemes of functional electrical stimulation.

When a muscle is contracted, a small electric potential is produced.
If we place an electrode on the skin over the contracted muscle, the electrode can sense this muscle activity potential. The signal detected by the electrodes is a superposition of all the muscle fiber action potentials within the pickup region of the electrode. These muscle fiber action potentials occur at somewhat random intervals, so at any moment the EMG signal may be either a positive or a negative voltage. The instantaneous value of the EMG is a random variable with zero mean and contains no information [40]. The EMG signal is stochastic with a typical amplitude of 0-6 mV (peak to peak). The usable frequency range is 0-500 Hz. The strength of the EMG is proportional to the amount of contraction of the underlying muscle.

A single motor unit is a collection of muscle fibers activated by a single neuron. The electrical potential of individual motor units can be measured with wire or needle electrodes placed directly in the muscle; the measured waveforms are called motor unit action potentials (MUAPs). Surface EMG signals can be treated as a superposition of many filtered, spatially distributed motor unit action potentials [6,7,8].

Muscle contraction is a result of stimulation of the muscle. This stimulation is a result of pulses coming from the nerves to the muscles. The nerves innervate the muscle at the end plates located near the belly of the muscle. The stimulus travels away from the center in both directions. The muscle fiber contracts and thus produces motion. Different muscle contractions produce different EMGs, and different contractions are associated with different movement tasks. Therefore it is possible to process EMG signals to obtain a movement classification.

In this research, while jaw motions were made, four channels of surface electromyography (EMG) were gathered simultaneously from the right and left anterior temporalis muscles and the right and left masseter muscles.
Shown below are a picture of the electrode placement (with an electrode placed only on the masseter muscle) and pictures of typical EMG records.

Figure 1 Masseter and temporalis muscles and an electrode placed on the masseter muscle (From http://www.bfe.org/protocol/pro07eng.htm)

Figure 2 Four-channel EMG signal (left masseter, left temporalis, right masseter, right temporalis) for a left fast movement

Figure 3 Four-channel EMG signal (left masseter, left temporalis, right masseter, right temporalis) for a left slow movement

Real bruxism episodes are more complicated than those described above. According to Velly-Miguel [10], there are three different types of bruxism episodes: phasic (rhythmic), tonic (sustained), or mixed (both phasic and tonic). The above simulated signals are more like tonic bruxism episodes.

Similar to other electrophysiological signals, EMG signals are small and need to be amplified by an amplifier designed to measure physiological signals. These amplifiers include a differential amplifier circuit, and frequently include some filtering and other signal processing features.

EMG signals are susceptible to numerous technical problems. These include electromagnetic interference such as power-line hum (60 Hz noise), signal acquisition problems such as baseline drift and skin artifacts, signal processing errors, and many other kinds of interpretation problems.

Thus great variability may be found when recording EMG with surface electrodes. The recording conditions may change due to alterations in four quantities: head and body posture; skin resistance; temperature and humidity; and electromagnetic and electrostatic interference. These cause changes in the EMG. Muscle fatigue, emotional factors and the continuous activity of the eye muscles can also affect the EMG.
Lastly, the location and relocation of the electrodes (removed and replaced every day) may change the EMG. The values of the EMG may vary for the same person tested at different times.

The complexity and diversity of EMG signals lead to diverging diagnoses, even from highly trained myographers. Manual quantitative measurement is time-consuming and subject to various sources of error. Since there are many problems in interpreting the EMG signal, an automated technique and a signal processing method for classifying EMG signals are highly desirable.

1.3 Current Methods Used in the Study of Muscle Activity from EMG

Many investigations have been carried out to allow the automatic recognition of different types of muscle activities from the electromyogram. Possible techniques for classification of EMG include multi-layer neural networks [15,16,17], hidden Markov models [12], linear discriminant analysis (LDA) [13], artificial intelligence techniques [14] and fuzzy logic. However, most of these studies focus on gait analysis and on the mapping between EMG and trajectory patterns in the free arm [18,19], or on controlling a prosthetic limb [20,21,22]. For example, after a hand amputation many of the muscles in the forearm remain and can still be used by the amputee; EMG signals can be read from these muscles and used as the control source for the prosthetic device. While many studies have been devoted to such activities, only a few references [8, 9] have addressed the recognition of masticatory (jaw) muscle activities.

So far, classical statistical identification approaches have mostly been used for the classification of jaw muscle EMG [8, 9]. Researchers first use time domain parameters to discriminate between function and simulated parafunction, and then use statistical tools as classifiers. Gallo and Palla
[8, 9] recorded the electromyograms of the masseter and temporalis muscles and defined parameters such as the dynamics (dw) within a sliding window to work as a criterion for separation; or the signals were analyzed for the number, amplitude and duration of contraction episodes. Other features include the rectified mean, the standard deviation, the coefficient of variation, the skewness and the number of threshold crossings within equidistant signal portions. Signal classification was based on a multivariate discriminant analysis algorithm. L signal features characterizing each signal portion were used to determine M equations. These equations yielded the classification score used to predict the type of activity. Signals were classified by assigning each one to the activity corresponding to the greatest classification score. Chewing soft food was recognized correctly 97% of the time and speaking 86% of the time. However, chewing hard food was classified as chewing soft food or swallowing 32% of the time. The problem with this approach is that the analytic equations could not describe the complexity of EMG signals.

We want to explore the feasibility of applying the EMG classification techniques mentioned above to the recognition of jaw muscle activity.

Although some classifiers demonstrate obvious advantages over others, it is the signal representation that most dramatically affects the classification performance.

Wavelet transforms are frequently used in recent studies of EMG classification [23]. A linear discriminant analysis (LDA) classifier was applied to a set of wavelet transform features, reduced by principal component analysis (PCA). LDA classifies a sample object into one of several known categories based on training samples with known classes. LDA maximizes the ratio of between-class variance to within-class variance in any particular data set, thereby guaranteeing maximal separability.
While the results are encouraging, the whole process (WT + PCA + LDA) is complex and time consuming.

Artificial neural networks are able to extract essential features from complex patterns and to generalize the extracted characteristics to situations which they have not encountered [21]. This ability to generalize makes the ANN potentially a very powerful classifier and allows it to cope with time-scale variance and shape variance in the observed signal. In this study, artificial neural networks (ANNs) with a backpropagation algorithm were used to predict jaw activities from EMG signals.

1.4 Backpropagation Artificial Neural Network

Since it is difficult to write general recognition rules for different individuals, the ANN classifier was used to learn and adapt to individual characteristics and to small variations in feature values over time.

An artificial neural network is modeled after the human brain. It is a group of processing units called "neurons" or "nodes" which are interconnected and distributed in layers. The "neurons" are represented by dots (Figure 4), and the solid lines connecting the "neurons" represent a signal passing from one node to another. The weighting factors represent the strength of the connection from one node to another.

Figure 4 Basic structure of a two-hidden-layer artificial neural network (input layer, hidden layer 1, hidden layer 2, output layer)

The network function is determined largely by the connections between neurons. We can train a neural network to perform a particular function by adjusting the weights between neurons. Neural networks have been trained to perform complex functions in various fields of application including pattern recognition, classification, speech, vision and control systems. Given a representative set of input/output pairs, the neural network can be trained for classification by forming a specific weight matrix.
It is not required to provide any description of the relationship between the pairs.

As mentioned before, an ANN resembles the working of the human brain. A biological neuron will fire (send an output impulse down its axon) if sufficient signals from other neurons fall upon its dendrites in a short period of time. To model this process, a node is said to fire when its net input exceeds a certain threshold. An activation (or transfer) function is used to capture this idea. For an artificial neural network the activation function should be continuous, differentiable, and monotonically non-decreasing. The most commonly used activation function is the sigmoid, f(x) = 1/(1 + e^{-x}), with range from 0 to 1. It can be shown that its derivative satisfies:

f'(x) = f(x)(1 - f(x))

Figure 5 Sigmoid activation function

There are two stages for each ANN, the training stage and the testing stage. The training of the network in this study is done using supervised learning with backpropagation. It consists of an information feedforward process and an error backpropagation process.

Information that is forwarded from one unit to another unit in a subsequent layer is multiplied by a weight factor. In each receiving unit, the weighted information of all incoming connections is summed, and a bias term may be added. Subsequently, the summed input plus bias term is processed by an activation function, which is normally a sigmoidal transfer function. The output value is then forwarded to each of the units of the next layer.

The feedforward process is as follows:
1. Each input unit receives an input signal x_i and broadcasts this signal to all units in the hidden layers.
2. Each hidden unit sums its weighted input signals:

S_i = \sum_{j=1}^{n} w_{ij} u_j + w_{0i}

w_{ij}: weight on the connection from the jth unit in the previous layer to the ith unit in the present layer.
u_j: output of the jth unit in the previous layer.
S_i: net input to the ith unit in the present layer.
w_{0i}: bias on the ith unit. The bias term acts like a weight on a connection from a unit whose output is always 1.

The weighted sum is passed through the activation (transfer) function to compute the output of the ith unit:

u_i = f(S_i)

3. Each output unit sums its weighted input signals and applies the activation function to compute its output y_k.

For every input there is an actual output and a desired output. If the actual output is not the same as the desired output, the error is calculated and back-propagated to the hidden layer units, and the weights are changed so as to reduce the error. This process continues until the actual outputs all match the desired outputs.

The error backpropagation process is as follows [51]:
1. Each output unit computes its error by:

\delta_k = (t_k - y_k) f'(S_k)

f'(S_k) = f(S_k)(1 - f(S_k)) = y_k (1 - y_k)

\delta_k: error of the kth unit in the output layer
t_k: target (desired) output
y_k: actual output

Then it calculates its weight correction term and bias correction term:

\Delta w_{kj} = \alpha \delta_k u_j
\Delta w_{k0} = \alpha \delta_k

\alpha: the learning rate

The connection weights from the last hidden layer to the output layer are updated:

w_{kj} = w_{kj} + \Delta w_{kj}

2. Send \delta_k back to the units in the hidden layers:

\delta_j = f'(S_j) \sum_{k=1}^{M} \delta_k w_{kj} = u_j (1 - u_j) \sum_{k=1}^{M} \delta_k w_{kj}

\delta_j: error information of the jth unit in the hidden layer.

The connection weights from the previous hidden layer to the present hidden layer are updated:

w_{ji} = w_{ji} + \Delta w_{ji} = w_{ji} + \alpha \delta_j u_i

3. The above process continues backwards, layer by layer, until the input layer is reached.

There is always a problem in deciding the magnitude of the learning rate \alpha. If it is too large, the learning will oscillate; if too small, convergence becomes very slow. To solve this problem, momentum is implemented. The idea is that the weight change is pushed on by the previous weight change: the weight change is in a direction that is a combination of the current gradient and the previous gradient.
The weight update then becomes:

w_{ji} = w_{ji} + \alpha \delta_j u_i + \mu \Delta w_{ji}^{prev}

\mu: the momentum term
\Delta w_{ji}^{prev}: the previous weight change

Momentum allows the net to make reasonably large weight adjustments as long as the corrections are in the same general direction for several patterns, while using a smaller weight adjustment to prevent a large response to the error from any one training pattern. The use of a momentum term allows a higher learning rate to be used (e.g. 0.9). Normally a subset of the dataset is chosen randomly to train the ANN in the learning stage, while the remaining data are used as a test of recognition success. Patterns used for training were not used for testing, and vice versa.

In the testing stage, the weights remain unchanged and the network is given an EMG input so as to predict which class it belongs to.

Many different layouts for ANNs have been developed, and the prediction quality of a network is determined by a number of factors: the number of units in the network, the level of the preset minimum error in the output, and the relative amount of information in the training set. A network cannot be trained properly to map patterns when it does not contain enough units. When the level of the minimum error is set too high, the network will not be able to learn the mapping. When the level is set too low, overtraining of the network will occur and it will have problems generalizing to and predicting signals that it has not been trained with. When a network is trained with a data set that covers a wide range of information, it will learn the relationships that occur in different situations.

Since the activation function is non-linear and hidden units are incorporated, a neural network is capable of identifying patterns in complex, non-linear systems without an explicit understanding of the relationships between classes. The network can process degraded or incomplete data and may be fault tolerant.
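The feedforward and backpropagation steps described above, including the momentum term, can be sketched as follows. This is a minimal illustration, not the code used in the thesis; the layer sizes, learning rate and momentum value are arbitrary assumptions.

```python
import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + e^{-x}), range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Toy network: 4 inputs (one feature per EMG channel), 8 hidden units,
# 6 outputs (one per motion category). Sizes are illustrative only.
W1 = rng.normal(0, 0.5, (8, 4)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (6, 8)); b2 = np.zeros(6)
dW1 = np.zeros_like(W1); dW2 = np.zeros_like(W2)  # previous weight changes

alpha, mu = 0.5, 0.9  # learning rate and momentum term (assumed values)

def train_step(x, t):
    """One feedforward pass plus one backpropagation update for pattern x
    with target vector t. Returns the squared error for this pattern."""
    global W1, b1, W2, b2, dW1, dW2
    # Feedforward: weighted sums plus bias, passed through the sigmoid.
    u = sigmoid(W1 @ x + b1)              # hidden layer outputs u_j
    y = sigmoid(W2 @ u + b2)              # output layer outputs y_k
    # Output error: delta_k = (t_k - y_k) * y_k * (1 - y_k)
    delta_out = (t - y) * y * (1 - y)
    # Hidden error: delta_j = u_j (1 - u_j) * sum_k delta_k w_kj
    delta_hid = u * (1 - u) * (W2.T @ delta_out)
    # Weight changes with momentum: dw = alpha * delta * u + mu * dw_prev
    dW2 = alpha * np.outer(delta_out, u) + mu * dW2
    dW1 = alpha * np.outer(delta_hid, x) + mu * dW1
    W2 += dW2; b2 += alpha * delta_out
    W1 += dW1; b1 += alpha * delta_hid
    return np.sum((t - y) ** 2)
```

Repeatedly calling `train_step` over a training set drives the squared error down; in testing, only the feedforward half would be evaluated with the weights frozen.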
A neural network is also inherently a parallel structure in which comparison to all classes is done simultaneously. The value of each output is essentially a measure of the similarity of the unknown pattern to that class.

1.5 Feature Extraction

Feature extraction is a signal processing stage prior to classification. The purpose of feature extraction is to obtain useful parameters that can be used for the identification of movements.

This study has focused on identifying 6 classes of jaw motion based on EMG measurements and artificial neural networks. The large quantity of data gathered means that directly feeding the raw EMG data into an ANN is impractical. Feature extraction techniques should be used first to reduce the number of points necessary to describe the signal. At the same time, the extracted parameters should have the ability to maximally separate the different classes. All subsequent classification or clustering is then based on these features.

In this project, signal features of the EMGs captured during different muscle activities were analyzed. Three parameter extraction methods have been implemented: (1) the linear envelope of the EMG, (2) autoregressive modeling (AR parameter estimation) and (3) a hybrid approach of the AR model and the discrete wavelet transform. The performance of these methods has been evaluated.

Figure 6 In the feature extraction stage (the raw multichannel EMG input passes through the linear envelope, AR, or DWT-and-AR extractor before pattern recognition), the overall multichannel signal is described by a smaller set of data. This information is used for either training or classification purposes or both.

Figure 6 shows the flow of the data from the raw signal to the identification of jaw motion. In the feature extraction stage, the overall multi-channel signals are described by a smaller set of data which faithfully represents the original signal. This information is then used for either training or classification purposes or both.
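As an illustration of the second extractor, AR coefficients can be estimated per channel from the Yule-Walker equations and concatenated into a feature vector. This is a sketch under stated assumptions: the model order of 4 and the per-channel concatenation are illustrative choices, not necessarily the scheme used later in the thesis.

```python
import numpy as np

def ar_features(x, order=4):
    """Estimate AR model coefficients for one EMG channel by solving the
    Yule-Walker equations with biased autocorrelation estimates.
    The model order of 4 is an arbitrary illustrative choice."""
    x = np.asarray(x, dtype=float) - np.mean(x)   # remove DC offset
    n = len(x)
    # Biased autocorrelation estimates r[0..order]
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    # Toeplitz system R a = r[1:], solved for the AR coefficients a
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

def pattern_features(channels, order=4):
    """Feature vector for one movement pattern: AR coefficients of the
    four channels (Lmass, Ltemp, Rmass, Rtemp) concatenated."""
    return np.concatenate([ar_features(c, order) for c in channels])
```

With 4 channels and order 4, each 4 x 5000-sample pattern is reduced to a 16-element feature vector, which is the kind of data reduction the ANN input stage requires.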
1.6 Goal of Current Research

This research is a joint project between the Department of Electrical and Computer Engineering and the Department of Oral Health Sciences at UBC. The long-term purpose of the research is to analyze the actual jaw movements performed by individual subjects with bruxism or other parafunctional habits. In previous studies of bruxism, no attempts have been made to link the EMG patterns with the jaw movements or with the resultant force [32]. This research attempts to derive that relationship and to classify jaw movements using EMG. It will give dentists a better understanding of the masticatory system and of bruxism.

Before working on clinical tests with real patients, experiments should be carried out under controlled, simulated conditions. For the current laboratory stage of the work, simulated bruxism signals were recorded. Three major jaw actions likely to happen during bruxism were selected: left lateral, right lateral and forward, each at two speeds, fast and slow. Maximum voluntary clenching (MVC) was also recorded, for the purpose of normalizing signals obtained from the same individual on different days.

The purpose of the current work is, first, to analyze the signal features of the EMG signals recorded during different muscle activities; second, to design an algorithm and train an ANN to identify jaw motion direction and quantify the motion speed; and finally, to test under laboratory conditions the validity and reliability of the algorithm for classifying different jaw activities using EMG signals.

The next stage would be to apply the model, which is built using controlled experimental situations, to muscle activities in the natural environment (i.e. during night sleep). The classification algorithm can be adjusted to accommodate real bruxism signals and used as a tool for data analysis and storage size reduction.
Finally, it is hoped that an EMG-based automated recognition system can be implemented and used in a clinic (on potential or diagnosed bruxers) to assist in evaluating bruxism and to help design a treatment mode (whether or not it uses a splint) to reduce the severity of bruxism.

Chapter 2 Data Collection

EMG data were supplied by Dr. Chris Peck from the Department of Oral Health Sciences of UBC. A roster of 10 healthy subjects (8 females and 2 males) participated in this study.

2.1 Electrode Placement and Equipment Setting

Four-channel bipolar EMG signals were recorded simultaneously from four pairs of electrodes fixed on the masseter and anterior temporalis muscles on each side of the face. A specially designed occlusal splint (appliance) was used to guide the direction of jaw motion during tests [32].

Six categories of EMG signal patterns were collected, corresponding to six different classes of movements: left fast, left slow, right fast, right slow, forward fast and forward slow. The sampling rate was set at 1 kHz. Each pattern consists of four-channel EMGs with 5000 sample points per channel.

One subject participated in the reproducibility test, that is, tests performed on five consecutive days. On each day, 15 trials were repeated for each class of movement, so a total of 90 patterns was generated per day, resulting in a total dataset of 450 patterns over five days. The other 9 subjects took only a one-day test. They were asked to repeat each motion 15 times, yielding 90 patterns covering the 6 categories.

During the reproducibility tests, a template was used to mark the positions of the electrodes and to ensure accurate relocation of the electrodes each day [30,31,32].

It is difficult to obtain high-quality electrical signals from biological sources because the signals typically have low amplitude (in the range of microvolts) and are easily corrupted by electrical noise, the test conditions and the environment.
So a differential preamplifier is used to pick up the EMG signal and to transmit it to the receiving circuit (bioamplifier) using optical coupling. The bioamplifier then further amplifies the signal and filters it to reduce noise and to remove out-of-band (>500 Hz) components. The data processing system finally converts the analog signal to digital at a 1000 Hz sampling rate, i.e. a sampling period of 1 millisecond. Each raw EMG record comprises 4 channels with 5000 samples per channel. The data processing system can later rectify and low-pass filter the raw EMG signals to obtain the linear envelopes.

According to Dr. Chris Long's thesis [32], the equipment used for data collection was:

Electrodes: Ag/AgCl surface electrodes (Myo-tronics, Inc., Tukwila, WA 98188). These electrodes have a center-to-center distance of two centimeters and an active area with a diameter of one centimeter. The electrodes were coupled to the skin through conductive gel (Grass Medical Instruments EC2 electrode cream, Quincy, Mass 02169).

Bioamplifier: A 16-channel isolated bioelectric amplifier system, Model CP-12/24BA #41310 (SA Instrumentation Co., San Diego, CA, USA). The bioamplifier is isolated with optical and magnetic isolation barriers. It employs high pass filters (0.5-2100-500 Hz; -12 dB/octave; -3 dB Butterworth type) and low pass filters (0.5-1-5-10 Hz; -24 dB/octave; -3 dB Butterworth type), and has selectable gains of 2x, 5x, 10x and 20x. In this study, the amplifier was set to a gain of 2x; the high pass filter was set above 20 Hz and the low pass filter below 500 Hz.

Data Processing System: Agilent Vee Pro (Agilent Technologies Inc.). This PC-based "virtual interface" software program converts analogue signals to digital signals. It can further process each signal by rectifying, filtering and storing it. In this study, the processed data were stored as text files for later off-line analysis.
When saving the data, 'llf' is used to represent left lateral fast, 'rlf' to represent right lateral fast and 'prf' to represent protrusive fast. The notation is similar for slow movements.

2.2 Experimental Procedure

The test procedure is as follows:

First, the skin is cleaned with alcohol (to reduce skin resistance). Surface electrodes are then mounted (using conductive gel) over both masseter and anterior temporalis muscles bilaterally, and a ground electrode is placed on the back of the subject's neck. Differential preamplifiers are used to reduce common-mode noise, so a pair of electrodes is used for each muscle.

The subject is then trained to move the jaw by following an indicator (or target), a moving green dot on an oscilloscope. The track of the green dot is a ramp followed by a plateau; the slope of the ramp determines the speed of the movement, i.e. fast or slow.

The timing of this target signal is shown in Figure 7.

The slow target rising phase begins 2 sec from time zero and lasts one second (ending at 3 sec). The flat plateau following this lasts one second (ending at 4 sec) and the target then stays on baseline for one sec (i.e. to 5 sec).

The fast sequence rising phase starts at 2 sec and lasts 0.5 sec (ending at 2.5 sec). Its flat plateau lasts one sec (ending at 3.5 sec) and the target then stays on baseline for 1.5 sec (i.e. to 5 sec).

Figure 7 Target (instruction) signal for making a movement (target timing for fast and slow; time axis 0-5 seconds)

As the oscilloscope trace starts to rise on the ramp portion of the tracing, the subject is asked to grind his teeth on the splint and to slide to the designated side. The task is completed (the jaw has moved to its furthermost position) upon arriving at the plateau portion of the tracing. The subject is asked to hold that position during the plateau. He is then asked to simply relax from the clench, rather than to actively open the mouth [32].
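The target timing described above (slow: ramp from 2 to 3 sec, plateau to 4 sec; fast: ramp from 2 to 2.5 sec, plateau to 3.5 sec) can be generated directly. The following is a minimal sketch; the function name and defaults are illustrative and not taken from the original recording software:

```python
import numpy as np

def target_trace(speed="slow", fs=1000, duration=5.0):
    """Piecewise-linear target (instruction) signal, normalized to peak 1.
    Timing follows the protocol described in the text (hypothetical helper)."""
    t = np.arange(0.0, duration, 1.0 / fs)
    if speed == "slow":
        ramp_start, ramp_end, plateau_end = 2.0, 3.0, 4.0
    else:  # "fast"
        ramp_start, ramp_end, plateau_end = 2.0, 2.5, 3.5
    y = np.zeros_like(t)
    rising = (t >= ramp_start) & (t < ramp_end)
    y[rising] = (t[rising] - ramp_start) / (ramp_end - ramp_start)
    y[(t >= ramp_end) & (t < plateau_end)] = 1.0  # hold at the plateau
    return t, y  # target returns to baseline (0) after the plateau
```

Such a trace can also serve as a reference when checking whether a recorded envelope actually followed the instructed timing.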
Notice that when the subject is asked to make a movement, no constraints are placed on the force, except that the subject is asked to be consistent in reproducing the desired motion. Once the subject is trained, tests can begin.

During the recording procedures, the subject is comfortably seated upright in a dental chair. First he is asked to be in a resting state, i.e. to let the teeth lightly touch the splint with the mandible at rest (this state lasts about 2 sec).

Then the subject is asked to complete a specified movement while the indicator goes along the ramp. Each record (including the resting period) lasts 5 seconds, though the actual moving period (fast or slow) takes only about 0.5 or 1 second. The subject holds his jaw steady during the plateau stage but begins to relax when the target signal goes down sharply.

One thing to note here is that the original differentiation into fast and slow is actually based only on a verbal instruction and a visual target (not on the actual performance in response to it).

Chapter 3 Methods

3.1 ANN as a Pattern Classifier

The basic structure of the neural network used in the present study is a standard three-layer (one hidden layer) network trained with a backpropagation supervised learning algorithm. The neural network can be recursive (dynamic) or non-recursive.

The input nodes of the ANN correspond to the EMG features and the output nodes correspond to pattern classes. Three ANN structures have been implemented in this study, corresponding to the different types of features obtained.

3.1.1 Structure I

Dynamic neural network structures (Figure 8) incorporate memory: the output of the present state may be decided by both the present input and the outputs of previous states. This structure is suitable for dynamic processes. It can be improved further by using 2 ANNs where each ANN's output is the other's input.
This looped ANN structure has been very successful in building a model to map jaw positions and jaw muscle tensions [29].

Figure 8 Recursive (dynamic) ANN: input layer (Lm, Lt, Rm, Rt, plus unit-delayed outputs fed back), hidden layer and output layer (out1, out2, out3)

The inputs to the ANN are several time sequences. The samples in a time sequence are fed to the ANN one at a time. The input vector of the ANN is formed in the following way:

Input(n) = [Lm(n), Lt(n), Rm(n), Rt(n)]

The outputs of the ANN are also time sequences, which could correspond to the positions of the jaw from where it initiates a motion to the furthermost place it can reach.

3.1.2 Structure II

If the extracted features are a set of numbers (not a time sequence), a simple structure is adopted (Figure 9). The input to the ANN is a set of time domain features such as slope, rise time, detail factor, average amplitude and duration, or a set of coefficients calculated from AR modeling. These features form the input to the ANN, all fed in at the same time, and correspondingly one result is obtained.

Figure 9 A simple ANN structure: input layer (SlopeSum, RiseTime1, RiseTime2, AveAmLm, AveAmLt, AveAmRm, AveAmRt, Duration), hidden layer and output layer (out1, out2, out3)

3.1.3 Structure III

Structure III is a modification of structure I. The new ANN uses more features, such as the duration and integral information of the EMG waveform (Figure 10). These features act as a bias to the ANN structure.

Figure 10 Modified ANN structure. To determine the direction and speed of movement at a given instant, the four-channel EMG values at that instant are used, or, recursively, the outputs of previous predictions.

The importance of the bias term can be demonstrated by considering a simple linear equation: y = ax + b. Without the bias b, y can only represent lines that go through the origin.
If this line is used as the decision boundary for classification, some classes may not be correctly categorized because they require a decision boundary that does not pass through the origin to separate them.

3.1.4 Factors Influencing the ANN's Performance

Before the start of training, the weights of an ANN are set to random values. A common procedure is to initialize the weights to random values between -0.5 and 0.5. The learning rate is set to a small positive value.

Then we select the specific network parameters such as the number of hidden nodes, the number of training iterations and the momentum, as described below.

Selecting the number of hidden nodes

The optimum number of hidden units is problem dependent and is sometimes selected by trial and error. The number of hidden units should be kept as small as possible to reduce the complexity of the neural network algorithm and to improve generalization when using a small set of training data. However, the size must be sufficient to learn the necessary input-output mapping; too few hidden units will result in poor network performance because the network will be unable to learn the input/output mapping represented in the training data. The optimum number of hidden units must therefore be determined experimentally for this application.

The effect of the number of hidden units was not pronounced in this study. Although performance varied with the number of hidden units from subject to subject, some clear trends did emerge: a hidden layer with 20-30 units gave good classification accuracy for all subjects.

The number of hidden units is fixed at 20 in this study in order to compare the performance of the different feature extraction methods. An ANN with two hidden layers or a recurrent ANN could also be used. However, experiments using an ANN with 16 and 22 neurons in 2 hidden layers showed little improvement over the one-hidden-layer approach.
Theoretically, one hidden layer is sufficient for a backpropagation net to approximate any continuous mapping from the input patterns to the output patterns to an arbitrary degree of accuracy [51].

Selecting the number of iterations for training

The number of iterations used in training the ANN is based on the convergence of the learning process. This is determined using the root mean square error criterion. The mean square error is defined as:

MSE = (1/M) Σ_{k=1}^{M} (t_k - y_k)²

t_k : targeted (desired) output
y_k : actual output
M : number of input-output pairs, or number of learning patterns

The mean square error is usually plotted against the number of iterations to show the learning performance of an ANN.

Figure 11 Training error curves (mse1, mse2, mse3) as a function of the number of iterations

Since the usual motivation for applying a backpropagation net is to achieve a balance between correct responses to training patterns and good responses to new input patterns (i.e. a balance between memorization and generalization), it is not necessarily advantageous to continue training until the total squared error actually reaches a minimum. We should also consider computational efficiency. In this study, the ANN is assumed to be trained when the overall error decreases below 0.01. The training error curves as a function of the number of iterations are shown in Figure 11. Convergence was within 1% total error. Generally, 1000 iterations were needed to reach convergence, which took about 20 min on a Pentium III computer.

The learning rate and momentum were selected empirically to generate the best classification performance on average across all subjects. They are 0.3 and 0.1 respectively.
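As a hedged illustration of this training procedure, the following numpy sketch trains a one-hidden-layer backpropagation network with the parameters quoted above (weights initialized in [-0.5, 0.5], learning rate 0.3, momentum 0.1, training stopped when the MSE drops below 0.01 or a maximum number of iterations is reached). All function and variable names are illustrative; the thesis work used its own implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_bpn(X, T, n_hidden=20, lr=0.3, momentum=0.1, max_iter=1000, tol=0.01):
    """Minimal batch backpropagation with momentum; stops when MSE < tol."""
    n_in, n_out = X.shape[1], T.shape[1]
    # weights initialized to random values in [-0.5, 0.5], as in the text;
    # the extra row in each matrix is the bias weight
    W1 = rng.uniform(-0.5, 0.5, (n_in + 1, n_hidden))
    W2 = rng.uniform(-0.5, 0.5, (n_hidden + 1, n_out))
    dW1, dW2 = np.zeros_like(W1), np.zeros_like(W2)
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    Xb = np.hstack([X, np.ones((len(X), 1))])        # append bias input
    mse = np.inf
    for _ in range(max_iter):
        H = sig(Xb @ W1)                             # hidden activations
        Hb = np.hstack([H, np.ones((len(H), 1))])
        Y = sig(Hb @ W2)                             # network outputs
        err = T - Y
        mse = np.mean(err ** 2)
        if mse < tol:                                # convergence criterion
            break
        delta2 = err * Y * (1 - Y)                   # output-layer deltas
        delta1 = (delta2 @ W2[:-1].T) * H * (1 - H)  # hidden-layer deltas
        dW2 = lr * Hb.T @ delta2 / len(X) + momentum * dW2
        dW1 = lr * Xb.T @ delta1 / len(X) + momentum * dW1
        W2 += dW2
        W1 += dW1
    return W1, W2, mse
```

Run on a tiny separable problem, the training error decreases steadily, mirroring the error curves of Figure 11.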
The three output nodes in this project produce binary values (0 or 1), which are coded to represent the 6 movement patterns as described in the table below.

Classes of Movements    Desired Network Response
left slow               output1=1, output2=0, output3=0
left fast               output1=1, output2=0, output3=1
right slow              output1=0, output2=1, output3=0
right fast              output1=0, output2=1, output3=1
forward slow            output1=1, output2=1, output3=0
forward fast            output1=1, output2=1, output3=1

Table 1 The outputs of the ANN are coded to represent 6 motion categories

3.2 Feature Extraction Before Classification

The success of any pattern classification system depends almost entirely on the choice of features used to represent the raw signals. It is desirable to use multiple feature parameters for EMG pattern classification, since it is very difficult to extract a single feature parameter that entirely reflects the characteristics of the original signals. However, in pattern classification or clustering, a uniform distribution of patterns in the pattern space makes it difficult to classify objects. Therefore, the higher the dimension of a feature vector, the more noise may be introduced and the more likely the objects are to be scattered uniformly. Also, adding a feature parameter with small separability may degrade the overall pattern recognition performance. So at the first stage of pattern recognition many features may be selected, but careful investigation should be made subsequently to extract the most significant features from them.

For EMG signals, various time and frequency domain characteristics can be used in feature extraction.

In the time domain, the EMG signal is typically described via a variable related to the size or amplitude of the signal.
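The coding in Table 1 maps directly onto a small lookup table. A sketch follows; the helper name and the 0.5 decision threshold are illustrative:

```python
# Output coding from Table 1: three binary output nodes per motion class
MOTION_CODES = {
    "left slow":    (1, 0, 0),
    "left fast":    (1, 0, 1),
    "right slow":   (0, 1, 0),
    "right fast":   (0, 1, 1),
    "forward slow": (1, 1, 0),
    "forward fast": (1, 1, 1),
}

def decode(outputs, threshold=0.5):
    """Map thresholded network outputs back to a motion label
    (hypothetical helper; threshold choice is an assumption)."""
    bits = tuple(int(o > threshold) for o in outputs)
    for label, code in MOTION_CODES.items():
        if code == bits:
            return label
    return None  # e.g. (0, 0, 0) matches no motion class
```

Note that the code (0, 0, 0) is unused, so a fully inactive output layer naturally maps to "no recognized motion".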
The EMG root mean square (RMS), the peak value, the rectified mean value and standard deviation, the sum, the moving-averaged EMG, the integral EMG and linear envelope displays are all quantitative parameters that describe the amplitude of the EMG signal. Time domain parameters are the most widely used parameters in clinical neurophysiology for interpreting EMG findings.

Frequency analysis comprises the second category for analyzing the EMG signal. There are many ways to conduct frequency analysis, including analysis of the zero-crossing rate, spectral analysis and power spectrum analysis. The spectral analysis of EMG is used for the purpose of muscle fatigue analysis [7] as well as for assessing the strength of contraction of the muscle.

Time-frequency algorithms, such as the short-time Fourier transform, the wavelet transform and the wavelet packet transform, form another popular class of signal analysis.

Time-series analysis of EMG, such as the AR modeling technique, is also used.

K. Englehart and B. Hudgins [13,16] investigated and compared the performance of time domain features and those derived from the short-time Fourier transform, the wavelet transform and the wavelet packet transform. With an ANN classifier, they demonstrated that the time domain features outperform the transform-based feature sets, whereas with an LDA classifier, PCA-reduced transform-based feature sets exhibit better performance.

In this research, we chose linear envelope time domain features, the AR model and a combined approach of the discrete wavelet transform and the AR model as primary discriminating features for EMG classification.

Before processing, the EMG recordings should undergo several pre-processing steps. These are described in the rest of this section.

Extracting the Information-Bearing Part of the Signal

We can see from Figure 2 and Figure 3 that not all of the sample data are information bearing.
Most of the samples are collected while the jaw is resting; only the middle part of the signal series is recorded during dynamic muscle contraction. In order to process the correct section of the acquired signal, it is necessary to find the onset and offset points of the useful EMG. Generally speaking, the EMG onset can be determined qualitatively by simply examining the EMG plots visually. Here we shall use a quantitative definition.

Due to interference noise and individual differences, the threshold value (the activity onset level), which decides when a jaw activity begins, should be determined from each individual signal. It is not a fixed value for all signals.

A baseline reference is calculated by computing the mean value of a segment of data recorded at rest. In this project, the first 600 data points of the EMG are taken as the baseline reference. The threshold is then set 2.5 to 3 standard deviations above this baseline reference.

The identification of an onset point is based on two conditions: the point must be above the threshold level and must stay above the threshold level for a certain amount of time. Here we choose 6-10 ms (6-10 samples). However, we have 4-channel EMGs and the channels do not start and finish simultaneously. We therefore define the onset as the point where any of the 4 channels is above the threshold and stays above it for the required amount of time.

The end point (offset) of the EMG waveform can also be decided by means of a threshold. In the case that we only want to study the rising stage of the waveform, we can intentionally choose the end point as 1 second (1000 samples) or 1.5 seconds (1500 samples) away from the onset point. For some subjects who take a longer time to relax after making a motion, the offset should be decided manually.
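The two onset conditions above can be sketched as follows, assuming the channels have already been rectified or enveloped so that activity rises above the resting baseline. All names and defaults are illustrative:

```python
import numpy as np

def channel_onset(x, thr, hold):
    """First sample index where x exceeds thr and stays above it for
    `hold` consecutive samples; None if no such run exists."""
    run = 0
    for i, v in enumerate(x):
        run = run + 1 if v > thr else 0
        if run >= hold:
            return i - hold + 1  # first sample of the sustained run
    return None

def find_onset(channels, baseline_len=600, k=2.5, hold=8):
    """Onset = earliest sustained threshold crossing over ANY channel.
    Per channel, threshold = mean + k * std of the first `baseline_len`
    samples (the resting baseline), with k in the 2.5-3 range."""
    onsets = []
    for ch in np.atleast_2d(np.asarray(channels, dtype=float)):
        base = ch[:baseline_len]
        onset = channel_onset(ch, base.mean() + k * base.std(), hold)
        if onset is not None:
            onsets.append(onset)
    return min(onsets) if onsets else None
```

Taking the minimum over channels implements the "any channel" rule, since the four channels do not start simultaneously.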
Figure 12 Extracting the information-bearing part of an EMG record

Normalize the EMG

There are large individual variations in the peak EMG activity during testing. Some subjects simply bite harder than others in response to the same instruction. Even for the same person, since there is no force control, the peak EMG differs from trial to trial. Generally speaking, the EMG activity increases with the biting force; however, the relationship is not strictly linear.

In order to compare the results from different tests and different subjects, the signal amplitude is expressed as a percentage of the amplitude at the maximum voluntary contraction (MVC).

Normalization can also be done for each EMG waveform individually. In other words, each sample in a waveform is normalized to the peak point of the waveform, making the peak value 1. The purpose of this individual normalization is to allow mutual comparison of waveform shapes.

3.2.1 EMG Linear Envelope and ANN

Linear Envelope (LE) EMG

The raw EMG is a random signal with zero mean. It is of limited direct use because its amplitude fluctuates too quickly and too often. A linear envelope is used to obtain the average trend of the EMG activity.

Raw EMG -> Full-wave rectify -> Low pass (4 Hz here) -> LE -> Downsample (every 15-25 samples) -> Downsampled LE

Figure 13 Steps in obtaining the linear envelope

The above block diagram shows the procedure used to obtain the linear envelope of the EMG. The characteristics of the envelope change depending on the cut-off frequency of the low pass filter. A fourth-order low pass Butterworth filter with cut-off frequency at 4 Hz was chosen for this study after different cut-off frequencies were compared.

Full-wave rectified EMG takes the absolute value of the raw signal.
This is desirable since we are interested in including both the positive and the negative portions of the signal.

The low pass filter smoothes the waveform (reducing its frequency content) and keeps only the overall shape of the EMG signal (as shown in Figure 14). This makes the signal easy to interpret and makes it easy to detect the onset of activity.

Since the bandwidth of the signal is reduced after filtering, it is necessary to decrease the sampling rate accordingly for computational efficiency. Normally, one in every K original samples can be retained (downsampling) without losing useful information if the bandwidth has been reduced by a factor of K. K is chosen to be 15 to 25 in this research, though it could be higher. The envelope is thus down-sampled by a factor of 15-25 to reduce the total number of sample points and the computational load.

Figure 14 EMG envelope (red curve) obtained from a lowpass filter with cut-off frequency of 4 Hz

The EMG envelope provides a fairly consistent model for analyzing the range of activities or tasks. It indicates the tension generated as well as the muscle usage. When multichannel EMGs are obtained during an activity, the linear envelope can provide information about the timing sequence of muscle activation: which muscle starts first, which finishes earlier, which two work together, etc. From the linear envelope, more variables can be extracted.

Integral Absolute Value and Average Amplitude

The integral is a mathematical tool used to obtain the area under a curve, and its applications in science and technology are numerous. Here the integral of the absolute value is calculated by adding all the amplitudes of the whole activity, from the moment of onset to the moment of offset:

IV = Σ_{i=1}^{N} |x_i|

where x_i is the ith sample of the N information-bearing EMG samples.

The EMG integral is a good indication of muscle tension [48]. It is important for quantifying the EMG vs. force relationship.
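The Figure 13 pipeline (rectify, low pass, downsample) can be sketched as below. To keep the sketch dependency-free, a moving-average window stands in for the fourth-order 4 Hz Butterworth filter used in the thesis; in practice scipy.signal.butter and filtfilt would be the usual tools. Names and defaults are illustrative:

```python
import numpy as np

def linear_envelope(raw, fs=1000, smooth_ms=250, down=20):
    """Rectify -> low pass (moving average standing in for the 4 Hz
    Butterworth) -> downsample by `down` (15-25 in this study)."""
    rect = np.abs(np.asarray(raw, dtype=float))      # full-wave rectification
    win = int(fs * smooth_ms / 1000)
    kernel = np.ones(win) / win
    smooth = np.convolve(rect, kernel, mode="same")  # crude low pass
    return smooth[::down]                            # downsampled LE
```

For a rectified sinusoid, the envelope settles near the mean absolute amplitude, which is what the LE is meant to track.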
It can be correlated to the work done by the muscle and is a very good measure of the total muscular effort. For example, the waveform that has the larger area under the curve indicates more energy and represents more muscular effort.

Dividing IV by N, the average amplitude is:

AveA = (1/N) Σ_{i=1}^{N} |x_i|

Duration of the EMG Linear Envelope

The duration of the EMG linear envelope is calculated from the onset point to the offset (end) point. It refers to the time used to finish a specific action (or task) by a group of muscles.

Difference Absolute Mean Value (Detail Factor)

DF = (1/N) Σ_{i=2}^{N} |x_i - x_{i-1}|

where x_i is the ith sample and N is the number of information-bearing EMG samples. DF is the mean absolute value of the difference between adjacent samples.

The Rise Time and the Slope of the EMG Burst

Studies show that the EMG amplitude increases with the speed of locomotion, though the relationship is not linear. So we can assume that the higher the speed, the more quickly the EMG amplitude will change. A given EMG value on the rising part of the EMG curve (also referred to as the transient burst of EMG, which corresponds to the onset of sudden muscle effort or the initial phase of a dynamic contraction) contains more information than an EMG value on the descending part of the curve [15]. Calculating the slope or gradient of the ascending part of the EMG curve might therefore be a good measure for discriminating fast from slow, and adding the slope information may improve classification accuracy. However, the 4 channel maxima do not occur simultaneously and the EMG waveform changes considerably from trial to trial, so the slope and rise time are calculated separately for each channel and an average value can be obtained later.
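The envelope descriptors defined above (IV, AveA, duration and DF) can be computed directly from the information-bearing segment. A minimal sketch with illustrative names:

```python
import numpy as np

def envelope_features(x, fs=1000):
    """Integral absolute value, average amplitude, duration and detail
    factor of the information-bearing samples x (onset to offset)."""
    x = np.asarray(x, dtype=float)
    iv = np.sum(np.abs(x))            # IV = sum of |x_i|
    ave_a = iv / len(x)               # AveA = IV / N
    duration = len(x) / fs            # seconds from onset to offset
    df = np.mean(np.abs(np.diff(x)))  # mean |x_i - x_{i-1}|
    return iv, ave_a, duration, df
```

In the study these values are computed per channel; duration and IV are the two that later prove most valuable to the classifier.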
Generally, the slope and the rise time (Figure 15) are defined as follows:

Rise time = the number of samples between the starting point and the end point of the EMG rising stage
Starting point: the local minimum before the first peak
End point: the point at 90% of the first peak amplitude
Slope = (amplitude at the end point - amplitude at the starting point) / rise time

Figure 15 Finding the rise time and slope (record Nouritest\lls3filt4.txt; ** marks ending points, # marks starting points)

After a simple statistical study, it was found that the probability density functions (distributions) of the slope (and of the rise time) for fast and slow movements overlap, so the decision cannot be made using the slope distribution alone. Since a high-quality feature should have maximum class separability and minimum overlap, the slope and the rise time do not qualify as good features on their own. Later test results confirm this.

Combining Feature Variables with Different ANN Structures

The downsampled LE waveforms can work directly with ANN structure I for classification. Since the linear envelope provides information about the timing sequence of muscle activation, the strength of muscle activation and the muscle usage, it is a good measure for classification. The ANN produces decisions on a continuous stream of data. The average of all the decisions can be used as the final decision. However, this works well only in predicting the direction; it could not distinguish fast from slow.

Instead, ANN structure III is used. It adds more features such as duration, detail factors, rise time, slope and integral value. These features act as a bias to the ANN structure. Test results show that a type III ANN is most powerful when incorporating the duration and integral information. These two variables not only yield the highest correct classification rates but also strengthen the robustness of the ANN.
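A sketch of the rise-time and slope computation for a single envelope channel, following the definitions above. Two caveats: the "first peak" is simplified here to the envelope maximum, and the slope formula (amplitude change over the rise divided by the rise time) is a reconstruction of the garbled original:

```python
import numpy as np

def rise_time_and_slope(env):
    """Rise time and slope of one envelope channel, per the definitions
    in the text (first peak simplified to the envelope maximum)."""
    env = np.asarray(env, dtype=float)
    peak = int(np.argmax(env))
    start = int(np.argmin(env[:peak])) if peak > 0 else 0  # minimum before peak
    # end point: first sample reaching 90% of the peak amplitude
    target = 0.9 * env[peak]
    end = start + int(np.argmax(env[start:peak + 1] >= target))
    rise = max(end - start, 1)              # rise time in samples
    slope = (env[end] - env[start]) / rise  # amplitude change per sample
    return rise, slope
```

As the text notes, these two values would be computed separately for each of the four channels and then averaged.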
They contribute significantly to the learning process.

On the other hand, variables like slope, rise time, detail factor, average amplitude and duration can be combined into a feature vector for classification. Correspondingly, ANN structure II is used; the number of its inputs is equal to the number of variables selected.

One possible feature selection is:
slope sum: sum of the 4 slopes
rise time 1: average of the 4 rise times
rise time 2: weighted average rise time calculated using fuzzy logic (the strongest signal has the highest weight and the weakest signal has the lowest weight; the weights are 3/4, 1/8, 1/16 and 1/16 respectively)
average amplitude of Lm
average amplitude of Lt
average amplitude of Rm
average amplitude of Rt
duration of the task

These construct an 8-input feature vector to be supplied to the ANN. Test results show that this combination produces the second highest classification rate (average 86%), only slightly lower than ANN structure III with duration and integral information (average 87%).

3.2.2 Data Reduction Using an Autoregressive (AR) Model

Figure 16 Data reduction using the AR model (raw EMG in, pattern class out)

Feature extraction can be achieved by building a mathematical model for the EMG signal. If the model forms a good approximation to the signal's observed behavior, it can be used in a wide range of applications, such as pattern classification, data compression and linear prediction. Using the AR model, a time domain signal is reduced to a set of coefficients. The coefficients represent the original signal and can be used as discriminant variables for classification.

The raw EMG signal resembles a random process. It has been shown [40] that the raw EMG can be treated as locally stationary (its statistical characteristics, such as average amplitude and frequency content, do not vary with time) over short time intervals. Hefftner et al. [40] showed that an EMG over intervals shorter than 1.0 s is wide-sense stationary.
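Assembling the 8-input vector listed above can be sketched as follows. The fuzzy-weighted rise time uses the quoted weights 3/4, 1/8, 1/16 and 1/16, assigned here by ranking the channels by average amplitude (an assumption about what "strongest signal" means); all names are illustrative:

```python
import numpy as np

def feature_vector(slopes, rise_times, ave_amps, duration):
    """8-input feature vector for ANN structure II. The per-channel lists
    are ordered [Lm, Lt, Rm, Rt]."""
    slopes = np.asarray(slopes, dtype=float)
    rise_times = np.asarray(rise_times, dtype=float)
    ave_amps = np.asarray(ave_amps, dtype=float)
    # weighted rise time: strongest channel gets 3/4, then 1/8, 1/16, 1/16
    weights = np.array([3 / 4, 1 / 8, 1 / 16, 1 / 16])
    order = np.argsort(ave_amps)[::-1]      # channels sorted by strength
    rise2 = np.sum(weights * rise_times[order])
    return np.array([slopes.sum(),          # slope sum
                     rise_times.mean(),     # rise time 1
                     rise2,                 # rise time 2 (fuzzy weighted)
                     *ave_amps,             # AveAm of Lm, Lt, Rm, Rt
                     duration])             # duration of the task
```

Since the weights sum to 1, rise time 2 stays on the same scale as the individual rise times.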
Therefore an AR model can be applied to a short-duration EMG signal.

The basic idea of an AR model is that the sequential EMG data can be approximated as a linear combination of past EMG data and white noise:

y(n) = -Σ_{i=1}^{P} a_i y(n-i) + e(n)

y(n): the samples of the modeled signal
a_i: AR coefficients, also referred to as prediction coefficients due to the AR model's equivalence to linear prediction
P: the order of the model
e(n): the prediction error (a white-noise residual)

At the beginning, the input to this autoregressive model is white noise. The predicted signal is:

ŷ(n) = -Σ_{i=1}^{P} a_i y(n-i)

In other words, each sample in the signal sequence can be predicted from the P previous samples. The present value of the time series y(n) is in some way (linearly) dependent on past values of the time series y(n-1), y(n-2), etc. The model attempts to define this dependence, the order of the model indicating how far back the dependence goes.

The model can be interpreted as a linear system with e(n) as its input and y(n) as its output. The transfer function H(z) for the AR process is:

H(z) = Y(z)/E(z) = 1 / (1 + Σ_{i=1}^{P} a_i z^{-i})

Since all the coefficients a_i lie in the transfer function's denominator, the model is termed an all-pole or autoregressive (AR) model.

AR modeling of a signal thus results in a set of coefficients. The coefficients contain information about the EMG pattern, and thus information about the muscle contraction and the generated movement. It is expected that different motions have different EMG AR coefficients; thus, different muscle activities can be distinguished by examining the AR coefficients.

Building an AR model is actually a matter of finding the coefficients a_i. The Levinson algorithm from the MATLAB toolbox is used to recursively calculate the AR model coefficients.

In our case, since we have four channels of EMG and each channel is mapped to P coefficients, we obtain a total of 4P coefficients for each movement.
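The thesis uses MATLAB's Levinson routine; a dependency-free numpy sketch of the Levinson-Durbin recursion is shown below, returning coefficients in the convention above (A(z) = 1 + Σ a_i z^-i), together with the concatenation of the four channels into one 4P-element network input. Names are illustrative:

```python
import numpy as np

def ar_coeffs(x, order):
    """Levinson-Durbin estimate of a_1..a_P in the convention
    y(n) = -sum_i a_i y(n-i) + e(n), i.e. A(z) = 1 + sum_i a_i z^-i."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # biased autocorrelation estimates r[0..order]
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    a = np.zeros(order)
    err = r[0]                       # prediction error power
    for m in range(1, order + 1):
        # reflection coefficient for order m
        lam = (r[m] + np.dot(a[:m - 1], r[m - 1:0:-1])) / err
        a_next = a.copy()
        a_next[m - 1] = -lam
        for i in range(m - 1):       # update lower-order coefficients
            a_next[i] = a[i] - lam * a[m - 2 - i]
        a, err = a_next, err * (1.0 - lam ** 2)
    return a

def ar_feature_vector(channels, order=6):
    """Concatenate the per-channel coefficients into one 4P-element vector."""
    return np.concatenate([ar_coeffs(ch, order) for ch in channels])
```

For a synthetic first-order process y(n) = 0.5 y(n-1) + e(n), the estimate recovers a_1 close to -0.5 (note the sign convention) with higher-order terms near zero.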
These coefficients are appended into a vector and treated as inputs to the ANN for classification:

CH1 = [a_11, a_12, ..., a_1P]
CH2 = [a_21, a_22, ..., a_2P]
CH3 = [a_31, a_32, ..., a_3P]
CH4 = [a_41, a_42, ..., a_4P]

InputVector = [CH1, CH2, CH3, CH4]

That is to say, the feature sets extracted from each channel are concatenated to yield a single feature set that is applied all at once to the network input.

An ANN of structure II with 4P input nodes and 3 output nodes is constructed. Once trained, the network is used to classify patterns from a set of coefficients.

The AR models were created from the raw EMGs and the AR coefficients were used as features in clustering. Since we are not interested in high-resolution spectral estimation or signal enhancement, we can choose the model order somewhat freely. Here, P = 4, 6, 8 and 12 were tested. According to Paiss [44], EMG signals are typically modeled with a 6th-order AR model.

Figure 17 Examples of AR coefficients (a1-a7) of the left temporalis muscle during the 6 kinds of motion. Fig. A comes from records llf1, rlf1, prf1, lls1, rls1, prs1; Fig. B from llf3, rlf3, prf3, lls3, rls3, prs3

Test results for this method were not promising, with only a 66% correct classification rate. Some different jaw activity patterns were partitioned into the same group. This test result agrees with other researchers' work [42].
It is very difficult to achieve high classification performance, especially for speed, because this linear prediction model cannot describe the non-linear characteristics and large variability of the EMG signals.

The main reason is that the phase characteristic of a signal is determined by the characteristics of both the system and the driving noise e(n). AR parameters alone do not describe phase characteristics well.

A typical example of the AR model coefficients of the left temporalis muscle during 6 types of motions is given in Figure 17. Careful observation of Figure 17 shows that only coefficients a2 and a3 actually contain useful information; the other coefficients overlap and cannot be differentiated. However, this can be improved through further signal processing, such as applying a wavelet transform to the EMG before building the AR model.

3.2.3 Data Reduction Using a Hybrid Approach of the Wavelet Transform and the AR Model

Figure 18 Data reduction using WT and AR (raw EMG in, pattern class out)

Wavelet transform (WT) theory has found wide application in many areas. Its properties suit the non-stationary characteristics of EMG signals. Here we explore the feasibility of wavelet coefficients as features for EMG signal classification.

A wavelet transform is a representation of a signal in terms of a set of basis functions obtained by dilation and translation of a basic wavelet. Decomposing the signal into the basis functions implies computing the inner products between the signal and the basis functions, leading to a set of coefficients called wavelet coefficients:

f(x) = Σ_k c_k ψ_k(x),  c_k = ⟨f(x), ψ_k(x)⟩

⟨·,·⟩ : inner product
c_k : wavelet coefficient
ψ_k : basis function

Since wavelets are short-time oscillatory functions with finite support (limited duration in both time and frequency), they are localized in both the time (spatial) and frequency domains.
Unlike the Fourier transform, the wavelet transform can capture time and frequency features of a signal at the same time.

In multiresolution signal analysis, the wavelet transform may be used to decompose a signal at various resolutions. Using the wavelet transform, signals are split into a large number of frequency channels. In the two-band multiresolution wavelet transform, a signal can be expressed by wavelet and scaling basis functions at different scales, in a hierarchical manner:

f(x) = Σ_k a_{0k} φ_{0k}(x) + Σ_{j=0}^{J} Σ_k d_{jk} ψ_{jk}(x)

φ_{jk} : scaling functions at scale j
ψ_{jk} : wavelet functions at scale j
a_{jk}, d_{jk} : scaling coefficients and wavelet coefficients at scale j

This is achieved by dilation and translation of the basic scaling function at different scales:

φ_{jk}(x) = 2^{j/2} φ(2^j x − k)

j : scale factor, k : translation factor

Briefly, at each stage (scale) the wavelet transform produces two sets of coefficients: the scaling function coefficients (approximation coefficients) and the wavelet function coefficients (detail coefficients). Each successive stage replaces the scaling function coefficients by another two sets of scaling and wavelet function coefficients, and so on.

Moving to a higher scale means doubling the frequency band and moving to higher frequencies. Figure 19 gives a good illustration.

Figure 19 The time-frequency plane of the discrete wavelet transform (DWT)

The wavelet transform uses a long-duration window to capture the low-frequency components and a short-duration window to capture the high-frequency components.

In this study, the motivation behind the use of wavelet analysis is that it provides a linear time-scale (frequency) representation of the non-stationary signal. Also, if we want to analyze a signal in a specific frequency band, we can work directly on the wavelet transform coefficients corresponding to that band.
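The two-band decomposition described above can be sketched with the Haar wavelet, whose filters are short enough to write out directly. The thesis ultimately selects the coif4 wavelet via the MATLAB wavelet toolbox; Haar is used here only because it makes the structure obvious, and all names and the synthetic signal are illustrative.

```python
import numpy as np

def haar_analysis(x):
    """One two-band analysis stage: returns (approximation, detail),
    each at half the input length (input length must be even)."""
    x = np.asarray(x, dtype=float)
    s = 1.0 / np.sqrt(2.0)
    return s * (x[0::2] + x[1::2]), s * (x[0::2] - x[1::2])

def haar_synthesis(a, d):
    """Inverse of haar_analysis (perfect reconstruction)."""
    s = 1.0 / np.sqrt(2.0)
    x = np.empty(2 * len(a))
    x[0::2] = s * (a + d)
    x[1::2] = s * (a - d)
    return x

def haar_dwt(x, levels):
    """Multilevel DWT: repeatedly split the approximation band.
    Returns (a_levels, [d1, d2, ..., d_levels])."""
    details = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        approx, d = haar_analysis(approx)
        details.append(d)
    return approx, details

# A 1024-sample record: after 3 levels, a3 has 128 samples and
# d1, d2, d3 have 512, 256 and 128 samples respectively.
t = np.arange(1024) / 1000.0            # pretend 1000 Hz sampling
x = np.sin(2 * np.pi * 60.0 * t)        # a 60 Hz component, inside the EMG band
a3, (d1, d2, d3) = haar_dwt(x, 3)

# Perfect reconstruction check: raw signal = a3 + d3 + d2 + d1 bands.
rec = a3
for d in (d3, d2, d1):
    rec = haar_synthesis(rec, d)
```

Working only on `a3` then mirrors the strategy described next: keep the low-frequency approximation band and discard the detail bands.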
In a separate analysis, we calculated the power spectral density of the EMG signal. It shows that the dominant spectrum of the EMG lies in the range 30 Hz to 100 Hz. Therefore it is reasonable to decompose the raw EMG signal into 3 levels and work only on the approximation coefficients, which correspond to a frequency band between 0 Hz and 75 Hz. Figures 20 and 21 show the details.

Figure 20 Frequency bands associated with the wavelet transform coefficients after three-level decomposition. a3: approximation coefficients; d1, d2, d3: detail coefficients at levels 1, 2, 3 respectively (band edges as labelled: 75 Hz, 125 Hz, 250 Hz, 500 Hz).

Figure 21 Wavelet decomposition of the information-bearing part of an EMG record (Ltemp): raw EMG = a3 + d3 + d2 + d1

An important parameter of the wavelet transform is the choice of the mother wavelet. Various orders of the following wavelet families were considered: Symmlet, Coiflet, Daubechies, Haar, Vaidyanathan, Beylkin, and Biorthogonal. The Coiflet family of order four gave the lowest classification error. This was also reported in Daubechies's paper [25].

After the DWT, the resulting data set is still quite large, so the wavelet coefficients must undergo a dimension-reduction process before they can be used as features for identifying types of movements. One way to do this is principal component analysis (PCA). PCA involves projecting the features onto their eigenvectors and retaining those which correspond to the largest eigenvalues. The inherent assumption in its use for dimensionality reduction in the context of classification is that the signal variance accounts for a significant portion of the discriminant information amongst classes.
It is well known that, for many physical signals, this is true, and it certainly seems to be the case for the EMG. However, PCA is computationally complex and was not adopted in this research.

Another way to reduce the dimension of the wavelet transform coefficients is to select the wavelet coefficient with the maximum absolute value at each scale [24], or to select a subset of the coefficients by thresholding.

In this study, a 6th-order AR model was built for the approximation part of the wavelet transform to reduce the dimension and, at the same time, to extract features. A sample output of the AR coefficients is shown in Figure 22. Compared with Figure 17, it is clear that at each coefficient the distributions of the different classes are dispersed and rarely overlap. Information from all of the coefficients can therefore be incorporated, which greatly improves the classification ability.

ANN structure II is also used to classify the resulting coefficients.

Figure 22 AR coefficients (alpha1 to alpha7) obtained from the wavelet approximation of the signal (Ltemp during 6 kinds of motions; wavelet: coif4)

Chapter 4 Tests and Results

The programs are written in C++ and MATLAB separately, according to the different ANN structures. In the C++ program, a configuration file is read to determine parameters such as the signal onset and offset thresholds, sampling rate, learning rate, etc. In the future, this configuration file could be replaced by a window menu: operators would type in the values for each selection and display a plot of the incoming multi-channel EMG signal. By trial and error, the following parameter values are most often adopted. Small changes may be necessary for individual subjects.
TRAINING.REPETITIONS   500
LEARNING.RATE          0.3
MOMENTUM               0.1
THRESHOLD              0.038
BUFFER.LENGTH          10
SAMPLE.PERIOD          15

Different combinations of feature variables and ANN structures have been tested. They are:
1. AR coefficients with ANN structure II
2. DWT + AR coefficients with ANN structure II
3. LE variables with an ANN:
   • Detail factor with ANN structure II
   • Rise time and slope with ANN structure II
   • Slope sum, average amplitude, rise time and duration with ANN structure II
   • Downsampled LE, duration and integral with ANN structure III

Test results for the average correct classification rate are given in Table 2.

Feature Set                                              Correct Classification Rate
LE (ANN structure II)                                    86%±3.5% (one-day test)
Downsampled LE, Duration and Integral
(ANN structure III): one day                             87%±5.3% (including 3 subjects' poor results)
                     between day                         71.3%±3.2%
                     pooling                             79%±3.7%
Slope sum, Average amplitude, Rise time, Duration        63%±3% (one-day test)
Rise Time and Slope                                      — (one-day test)
Detail Factor                                            64% (one-day test)
AR                                                       66%±4.7% (one-day test)
AR & DWT                                                 75%±4.1% (one-day test)

Table 2 Test results for different schemes

Since the downsampled LE with duration and integral together with ANN structure III gives the best results, the corresponding results are discussed in detail in Sections 4.1, 4.2 and 4.3.

The classification ability of the ANN is evaluated at four levels.

4.1 Same Subject, Train and Test on the Same Day (Intrasession Test)

The intrasession test shows whether the ANN can derive an appropriate classification ability, and how robust that ability is. Data from 10 subjects are used to test the classification accuracy of the algorithm. For the subject with data from 5 consecutive days, tests are done on each day individually.

The test results of 9 subjects are given in Table 3. The intrasession test produced the best separation of the different motion categories among all test schemes. Classification errors ranged from 3.7% to 33.3%; the correct classification rate ranges from 66.7% to 96.3%.
Wrong classification is mainly due to confusion over speed; direction mistakes are rare.

Subject ID   Training Files   Testing Files   Errors   Correct Classification Rate (%CL)
P1           36               54              2        96.3%
P2           35               53              4        92.5%
P3           36               54              4        92.6%
P4           36               54              5        90.7%
P5           36               52              5        90.4%
P6           34               53              7        80%
P7*          36               52              15       71.2%
P8*          36               53              16       65%
P9*          36               54              18       66.7%

Table 3 Test results for 9 subjects

Test results are very encouraging for subjects P1 to P6, with an average classification rate of 90.4%. For P7, P8 and P9 the results turned out differently. After plotting the waveforms of all files in a subject's data set and visually inspecting them, several reasons for the large variability in classification rate were found. First, these subjects could not perform the motions accurately and consistently; for example, Figure 23 shows 7 left fast movements for subject P8. Compared with P1 (Figure 24), there are apparent variations in the shapes of the waveforms. The stochastic nature of the EMG is not the primary source of variability here; it is due to lack of control or loss of concentration while making the motions during the experiments.
Figure 23 Seven left fast movements from subject P8

Figure 24 Eight left fast movements from subject P1

The second reason may be that the muscle usage for different movements is similar. This can also be concluded from examining the waveform plots. Normally, different muscles are primarily used when performing different jaw activities: the left temporalis (Lt, green) is dominant when moving to the left, the right temporalis (Rt, cyan) is dominant when moving to the right, and both the left and right masseters (blue and red) dominate when moving forward.
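The dominance pattern just described suggests a crude baseline check, independent of the ANN: compare per-channel RMS activity and read off the direction. This is only an illustration of the observation above, not the classifier used in this thesis, and the channel names and synthetic bursts are assumptions.

```python
import numpy as np

def rms(x):
    """Root-mean-square amplitude of one channel's burst."""
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(np.mean(x * x)))

def dominant_direction(lmass, rmass, ltemp, rtemp):
    """Guess the movement direction from per-channel activity:
    left temporalis -> left, right temporalis -> right,
    both masseters together -> forward."""
    score = {
        "left": rms(ltemp),
        "right": rms(rtemp),
        "forward": 0.5 * (rms(lmass) + rms(rmass)),
    }
    return max(score, key=score.get)

# Synthetic burst: strong left-temporalis activity, weak elsewhere,
# which the rule above should label as a left movement.
rng = np.random.default_rng(1)
ltemp = 1.0 * rng.standard_normal(500)
lmass = rmass = rtemp = 0.1 * rng.standard_normal(500)
direction = dominant_direction(lmass, rmass, ltemp, rtemp)
```

A rule like this captures direction only; as the text notes next, speed leaves the dominance pattern essentially unchanged, which is why speed classification needs richer features.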
Speed does not have much effect on the above conclusion; in other words, when moving in a specific direction, the same muscle dominates whether the movement is fast or slow. This result is similar to Chris Long's finding in his thesis [32], where he performed a statistical analysis on the subject who did the reproducibility test.

However, this conclusion only applies to subjects P1 to P6. For subjects P7, P8 and P9, their muscle activities are sometimes similar when making different movements, especially left and forward. Figure 25 shows 4 left fast movements (upper part) and 4 forward fast movements (lower part) for subject P7; the blue and red curves dominate for both kinds of movement.

Figure 25 Four left fast (upper part) and four forward fast (lower part) movements for subject P7
Figure 26 Similarity of fast (above 4 plots) and slow (below 4 plots) for right movements in repro5

Figure 27 Similarity of fast (above 4 plots) and slow (below 4 plots) for left movements in repro4

As for the data from the 5 consecutive days, Table 4 shows the results. The test results for reproducibility sessions 1 and 2 are exceptional, with correct classification rates of 98.1% and 94.4% respectively. For reproducibility 3, 4 and 5, however, the success rate decreases, to 87%, 85.2% and 87% respectively.

Day ID   Training Files   Testing Files   Errors   Correct Classification Rate (%CL)
Repro1   33               54              1        98.1%
Repro2   34               54              3        94.4%
Repro3   34               54              7        87%
Repro4   31               54              8        85.2%
Repro5   31               54              7        87%

Table 4 Test results for each day of the 5 consecutive days

Basically, the ANN could separate the directions from each other; it seldom confused left with right, or left with forward, etc. Sometimes it had problems distinguishing speed: in reproducibility 2, for example, it confused forward slow with forward fast.
This shows that the ANN is not accurate enough in classifying speed. However, after plotting all the EMG signals in the dataset, it is reasonable to conclude that the accuracy is partially affected by inconsistency in the data acquisition procedure.

In each dataset there are 15 trials for each class. Trials done earlier show obviously different durations for fast and slow, but later trials do not. Fatigue may be the cause of the change.

As for reproducibility 4 and 5, the plots of the files show only a slight difference between the durations of fast and slow for some movements (see Figure 26 and Figure 27). From the timing target, the fast movement took about 1 second and the slow one about 1.5 seconds, so the difference should be larger than what occurred in reproducibility 4 and 5. This inconsistency between reproducibility sessions causes the test results to degrade.

4.2 Same Subject, Train and Test on Different Days (Intersession Test)

This procedure was designed to test the reproducibility of the experiment and the robustness of the ANN. One subject's data from 5 consecutive days are used in this test. Table 5 gives the results for training on reproducibility 1 and testing on the other 4 days; Table 6 gives the results for training on reproducibility 2 and testing on the others. Similar results apply to the other corresponding combinations.
Test on   Correct Classification Rate
repro2    71.1%
repro3    56.7%
repro4    74.4%
repro5    64.4%

Table 5 Train on repro1 and test on the other days (between-day test)

Test on   Correct Classification Rate
repro1    70.4%
repro3    74.1%
repro4    77.8%
repro5    81.5%

Table 6 Train on repro2 and test on the other days (between-day test)

Figure 28 Comparison of task durations from different reproducibility sessions

The results were comparatively less impressive. The prediction accuracy is affected by the variation of the data sets across days: if two days' data are similar to each other, the classification accuracy will be high, and vice versa. Tables 5 and 6 were chosen somewhat arbitrarily for demonstration; other combinations may behave a little differently, but the basic conclusion still holds.

A template was used to ensure that errors due to placement, removal and replacement of the electrodes were minimal [30,31].

The day-to-day variability of the data sets shows that the EMG recording is not well controlled, especially when we look at the durations of the EMG waveforms (see Figure 28). The performance on different days was not consistent. As far as task duration is concerned, the reproducibility is poor.
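Classification results like those in Tables 3 to 6 are conveniently summarized in a confusion matrix, which also makes it easy to separate direction errors from speed-only errors (the observation that the network rarely confuses directions). A minimal sketch, with hypothetical class labels encoding direction then speed:

```python
import numpy as np

# Illustrative labels: direction (l=left, r=right, p=forward/protrusive)
# followed by speed (f=fast, s=slow). Not the thesis's internal encoding.
CLASSES = ["lf", "rf", "pf", "ls", "rs", "ps"]

def confusion_matrix(true_labels, pred_labels, classes=CLASSES):
    """m[i, j] counts trials of true class i predicted as class j."""
    idx = {c: i for i, c in enumerate(classes)}
    m = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(true_labels, pred_labels):
        m[idx[t], idx[p]] += 1
    return m

def accuracy(m):
    """Fraction of trials on the diagonal (correctly classified)."""
    return float(np.trace(m)) / float(m.sum())

def speed_only_errors(m, classes=CLASSES):
    """Errors where the direction was right but the speed was wrong."""
    return sum(
        int(m[i, j])
        for i, t in enumerate(classes)
        for j, p in enumerate(classes)
        if i != j and t[0] == p[0]
    )

# Toy example mirroring the repro2 behaviour: one forward-slow trial
# classified as forward-fast, the rest correct.
m = confusion_matrix(["ps", "ps", "pf"], ["pf", "ps", "pf"])
```

Here `accuracy(m)` is 2/3 and the single error is a speed-only error, the kind of mistake that dominates in the intrasession tests.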
4.3 Same Subject, Pooling Test

All data from the 5 consecutive days are combined into one big data set. The training data set includes samples from each day and each category of motion; prediction is done on the remainder. To account for the inherent variability, the EMG is normalized to the peak value of each trial. Training is based on the files numbered 1, 5 and 15 in each reproducibility session; that is, from each of repro1 to repro5, the files llf1, llf5, llf15; lls1, lls5, lls15; prf1, prf5, prf15; prs1, prs5, prs15; rlf1, rlf5, rlf15; rls1, rls5, rls15 are picked into the training set.

Figure 29 Formation of the training set for the pooling test

Test on   Correct Classification Rate
repro1    81.1%
repro2    86.7%
repro3    77.8%
repro4    68.9%
repro5    78.9%

Table 7 Pooling test results

The remaining files were used to evaluate the performance of the ANN after training. The pooling results (see Table 7) are better than the intersession results. This is because a network trained on a data set that covers a wide range of information learns relationships that occur in different situations. For the intersession tests, if a slow movement in one set is the same as a fast movement in another, it will clearly be very difficult for the ANN to obtain a high classification rate. When all the single data sets are pooled into one big data set and every possible sample is picked from each type of activity and each single data set, the training set covers almost all the situations, and the test results improve accordingly.

4.4 Test Between Individuals (Intersubject Test)

This test attempts to compare data across subjects and to find out whether the training result for one subject extends to others.

The original motivation for selecting an artificial neural network as the classification tool was the premise that human EMGs are subject to a wide range of individual variation, and that an ANN can learn and adapt to individual characteristics.
When data obtained from different subjects are considered, the variability in the data may be caused by different factors: those associated with the recording of the EMG (e.g. the position of the EMG electrodes) and those involved in the normal variability across individuals. In our data set, the EMG varies primarily with the individual. The exact duration of the EMG waveform varies between individuals, even though a timing target guides their reactions. The shape of the EMG waveform is also individual-dependent. Figure 30 shows EMG waveforms for the left fast movement of two subjects, P1 (lower part) and P2 (upper part). For the same motion, P2 primarily uses the right masseter muscle (red curve) and left temporalis muscle (green curve), while P1 makes heavy use of the left masseter (blue) and left temporalis (green). P2's fast movements (2 plots in the lower-left corner) are as long as P1's slow ones (2 plots in the upper-right corner), although for each individual subject there is a distinct difference between fast and slow.

The results of the intersubject test are poor. This shows that an ANN which has been trained to adapt to a specific person performs badly on others; the ANN has to be trained and tested on the subject's own data.
Figure 30 EMG waveform differences between two subjects; above: subject P2, below: subject P1

Chapter 5 Discussion and Conclusion

In this study, an algorithm was designed to classify six kinds of simulated bruxism activities from the waveform of the EMG signal resulting from dynamic muscle contraction. An artificial neural network is trained to identify the jaw motion direction and quantify the motion speed. This experimental investigation provides a potential technique for the clinical evaluation and diagnosis of bruxism.

Several feature extraction schemes and different artificial neural network structures were tried, and the algorithm was tested for its correct classification rate. It was found that successful recognition of jaw activity varies with the individual being tested, the data set used for training, the feature extraction technique and the ANN structure.

The best separation of these events was reached using time-domain linear envelope features. Mean values of the success rate are around 87%; if we exclude the data from the 3 subjects who failed to follow the timing target, the success rate is as high as 90%.

Generally, the algorithm can distinguish the directions of jaw movements with almost 100% accuracy. This feature can be useful in the clinic.
It helps dentists to find out whether a patient has a habit of moving the jaw toward a specific direction; some kind of oral appliance (or splint) may then be designed to correct this habit.

5.1 Effect of Network Structure

The results of the algorithm are very encouraging. It was demonstrated that the neural network classifier could accommodate a diverse set of EMG patterns. The subjects were not required to reproduce movements with a specific force. After training, the neural network was able to adapt to each subject's distinct EMG patterns and the inherent variations of the EMG. The tests in Chapter 4 confirmed that the performance of the neural-network-based classifier is not affected by small variations in feature values.

Downsampled linear envelopes of the EMG signals together with ANN structure III generate the best classification. This may be because a linear envelope gives information about the muscle activation timing sequence, muscle activation strength and muscle usage, while the integral and duration give information about the total muscle effort and task time. The combination provides a better description of the original signal.

As stated in the previous chapter, a multi-layer feedforward neural network with 2 hidden layers of 16 and 20 neurons respectively, or a one-hidden-layer ANN with 20 neurons, was found to be the best configuration.

The quality and consistency of the predictions increased with the number of training iterations, and leveled off at about 500.

Although the ANN showed high performance and robustness over widely varying signals, the limitations of this approach are that:
(1) The training may not converge to the optimal solution if invalid training samples exist; the classification performance may be degraded in such a situation.
(2) The classification is sensitive to the selection of training samples.
The classification performance improves when the relative amount of information in the training set is sufficient.

5.2 Comparing the Different Feature Extractors Used

The classification performance of the feature sets was investigated. Table 2 tabulates the mean and standard deviation of the percentage of correct classifications. The highest diagnostic yield was obtained for the time-domain features, being 87% for the downsampled LE with ANN structure III and 86% for the LE features with ANN structure II, followed by the hybrid AR model/DWT feature extractor and the AR model, with 75% and 66% respectively. A similar result in the region of 60%-65% for the AR model was also obtained with a neural network in [41]. The results also agree with other researchers' work.

It was found that for the individual data records, differences in the AR coefficients are apparent for different activities. However, the differences were not distinct enough to facilitate accurate detection. The main reason is that the phase characteristic of a signal is determined by the characteristics of both the system and the driving noise e(n); AR parameters alone do not describe phase characteristics well. Also, a linear prediction model cannot describe the non-linear characteristics of the EMG signal. Therefore the results are less impressive.

The 6th-order AR model of the approximation coefficients of the discrete wavelet transform yields better results in pattern prediction than direct AR modelling. This can be related to the fact that the approximation coefficients are much smoother and tend to be more linear than raw EMGs, and that the time-frequency characteristics of the WT are suitable for extracting features from the non-stationary EMG signals.

It is important to note that the time-domain features, which are extensively used in clinical laboratories, give the highest diagnostic yield.
However, some features alone, like the rise time or the detail factor of the waveform, do not give the desired classification. The classification performance varied with the chosen feature. Among the time-domain parameters, the duration and integral were the best discriminative features.

When using rise time or detail factor information alone, the correct rate is only about 60%. This is not a deficiency of the ANN in distinguishing fast from slow; it is because the input data (the detail factor, for example) for fast and slow are very similar to each other. Section 2.2 shows that the original differentiation into fast and slow is based only on a verbal instruction and a visual target. In the actual experiments, there is a possibility that the subjects did not respond exactly to the target; therefore some feature parameters are similar for fast and slow.

Generally, the results showed reasonable accuracy and precision for the different features.

5.3 Reproducibility

Very good results are obtained when training and testing the ANNs on signals obtained on one day only or from one person only (success rate 87%), whereas training and testing the ANN on data obtained on different days gives around 10% less success and depends largely on the consistency of the experiments. Although various methods of standardization were used to control the recording variables, some variability may still be found when EMGs recorded on different days are compared. The obvious change in the duration of the waveforms shows this well.

A device called a template was used to locate the surface electrodes as near as possible to the previous sites on the face at the return session. Repeatedly locating the electrodes at the correct spot is essential, as even small differences in location can lead to large changes in the resulting signal [20,21]. In this study, the likely cause of the variation in the reproducibility test is that the subject failed to follow the oscilloscope target precisely.
The oscilloscope tracing is the only way to control the timing of the movements and to decide on the fast and slow states. Failing to follow it, due to fatigue in the final stage of the test or loss of concentration, will certainly make the movements inconsistent. The splint which guides the jaw movements should be redesigned in later experiments to make sure the activity is precisely performed.

Conclusion

In general, this study has shown the feasibility of EMG signal classification for the purpose of jaw motion recognition. The ANNs are able to learn this relation to a certain degree.

Future work

1. Extract features using the wavelet transform and principal component analysis (PCA). After the wavelet transform, the dimension of the feature set is still large. PCA, which projects the data onto its eigenvectors, can be used to reduce the feature dimension. (PCA is a method of dimensionality reduction based upon finding the vectors (principal components) along which the greatest variance of a distribution is observed.) A fixed number of PCA coefficients corresponding to the largest values may be retained from each EMG channel and appended to form a feature set. The PCA-reduced feature set could then be fed to an LDA classifier. In Englehart's paper [13], he demonstrated that the LDA classifier performs as well as or better than a multi-layer perceptron (MLP) for time-frequency-based feature sets. (Processing chain: measured signal → feature extraction (TD, STFT, WT or WPT) → dimensionality reduction (PCA) → classification → class labels.)

2. Although the ANN can accommodate a small variance in the EMG, it has to be trained individually on each subject, and the network parameters have to be adjusted for a given subject. This is not time-efficient, and an automatic adjustment procedure should be designed.

3. Our tests are based on available experimental data.
Although the design of the experiment generally reflects the intention of the designer, problems remain in controlling the jaw movement precisely and in keeping the experimental data consistent. To improve the experimental data, a specific jaw-tracking apparatus could be used and the position or motion of the jaw recorded on video simultaneously with the EMG signals. The position or speed would then be synchronized with the EMG for convenience and accuracy of analysis. A dynamic ANN structure would work better in mapping the EMG to the jaw position or movement speed.

4. Further tests can be done to check the capability of the algorithm on a continuous stream of data, such as EMGs obtained during the night.

References

[1] Bader, G. & Lavigne, G. (2000) Sleep bruxism: an overview of an oromandibular sleep movement disorder. Sleep Medicine Reviews 4(1): 27-43
[2] G.J. Lavigne & T. Kato (2003) Neurobiological mechanisms involved in sleep bruxism. Crit. Rev. Oral Biol. Med. 14(1): 30-46
[3] G.J. Lavigne & P.H. Rompre & J. Montplaisir (1996) Sleep bruxism: validity of clinical research diagnostic criteria in a controlled polysomnographic study. Journal of Dental Research 75: 546-552
[4] T. Ikeda & K. Nishigawa et al. (1996) Criteria for the detection of sleep-associated bruxism in humans. Journal of Orofacial Pain 10(3): 270-282
[5] W.L. Kydd & C. Daly (1985) Duration of nocturnal tooth contacts during bruxing. The Journal of Prosthetic Dentistry 53: 717-721
[6] Stashuk, D. (2001) EMG signal decomposition: how can it be accomplished and used? Journal of Electromyography and Kinesiology 11(3): 151-173
[7] C. Disselhorst & J. Silny (1997) Improvement of spatial resolution in surface EMG: a theoretical and experimental comparison of different spatial filters. IEEE Transactions on Biomedical Engineering 44(7): 567-574
[8] L.M. Gallo & S. Palla (1995) Activity recognition in long-term electromyograms.
Journal of Oral Rehabilitation 22: 455-462
[9] L.M. Gallo & P.W. Guerra & S. Palla (1998) Automatic on-line one-channel recognition of masseter activity. Journal of Dental Research 77(7): 1539-1546
[10] A.M. Velly-Miguel & J. Montplaisir (1992) Bruxism and other orofacial movements during sleep. J. Craniomandib. Dis. Fac. Oral Pain 6: 71-81
[11] W.L. Kydd & C. Daly (1985) Duration of nocturnal tooth contacts during bruxing. The Journal of Prosthetic Dentistry 53(5): 717-721
[12] A.D.C. Chan & K. Englehart (2002) Hidden Markov model classification of myoelectric signals in speech. IEEE Engineering in Medicine and Biology, Sep/Oct 2002: 143-146
[13] K. Englehart & B. Hudgins (1999) Classification of the myoelectric signal using time-frequency based representations. Medical Engineering & Physics 21: 431-438
[14] S. Park & S. Lee (1998) EMG pattern recognition based on artificial intelligence techniques. IEEE Transactions on Rehabilitation Engineering 6(4), December 1998
[15] B. Hudgins & P. Parker (1993) A new strategy for multifunction myoelectric control. IEEE Transactions on Biomedical Engineering 40(1), January 1993
[16] K. Englehart & B. Hudgins (2000) Time-frequency based classification of the myoelectric signal: static vs. dynamic contractions. Proceedings of the 22nd Annual EMBS International Conference, July 23-28, 2000, Chicago, IL
[17] D.K. Kumar & N. Ma (2001) Classification of dynamic multi-channel electromyography by neural network. Electromyography and Clinical Neurophysiology 41: 401-408
[18] G. Cheron & F. Leurs et al. (2003) A dynamic recurrent neural network for multiple muscles electromyographic mapping to elevation angles of the lower limb in human locomotion. Journal of Neuroscience Methods 129: 95-104
[19] Hans H.C.M. Savelberg & Walter Herzog (1997) Prediction of dynamic tendon forces from electromyographic signals: an artificial neural network approach.
Journal of Neuroscience Methods 78: 65-74
[20] K. Englehart & B. Hudgins (2003) A robust, real-time control scheme for multifunction myoelectric control. IEEE Transactions on Biomedical Engineering 50(7), July 2003
[21] Simon Ferguson & G. Reg Dunlop (2002) Grasp recognition from myoelectric signals. Proceedings of the 2002 Australasian Conference on Robotics and Automation
[22] Toshio Tsuji & Osamu Fukuda (2000) Pattern classification of time-series EMG signals using neural networks. Int. J. Adapt. Control Signal Process. 14: 829-848
[23] K. Englehart & B. Hudgins (2001) A wavelet-based continuous classification scheme for multifunction myoelectric control. IEEE Transactions on Biomedical Engineering 48(3), March 2001
[24] L. Cai & Z. Wang (1999) An EMG classification method based on wavelet transform. Proceedings of the First Joint BMES/EMBS Conference: Serving Humanity, Advancing Technology, Oct. 13-16, 1999, Atlanta, GA, USA
[25] I. Daubechies (1992) Ten Lectures on Wavelets. CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 61. Philadelphia, PA: SIAM
[26] Constantinos S. Pattichis & Christos N. Schizas & Lefkos T. Middleton (1995) Neural network models in EMG diagnosis. IEEE Transactions on Biomedical Engineering 42(5), May 1995
[27] Sietsma, J. & Dow, R.J.F. (1991) Creating artificial neural networks that generalize. Neural Networks 4: 67-79
[28] K. Englehart & B. Hudgins (1995) A dynamic feedforward neural network for subset classification of myoelectric signal patterns. 1995 IEEE-EMBC and CMBEC, Theme 4: Signal Processing
[29] M.P. Beddoes & C.C. Peck & A.G. Hannam (2001) ANN-models for jaw research. Proceedings of the 23rd Annual Conference IEEE/EMBS, Oct. 25-28, 2001, Istanbul, Turkey
[30] B.H. Burdette & E.N. Gale (1990) Reliability of surface electromyography of the masseteric and anterior temporal areas. Archs. Oral Biol. 35(9): 747-751
[31] J.W. Frame & P.S. Rothwell (1973) The standardization of electromyography of the masseter muscle in man. Archs. Oral Biol. 18: 1419-1423
[32] Christopher L. Long (2004) Pattern recognition using surface electromyography of the anterior temporalis and masseter muscles. Master's thesis, Department of Dentistry, UBC
[33] E. Kwatny & D.H. Thomas (1970) An application of signal processing techniques to the study of myoelectric signals. IEEE Transactions on Biomedical Engineering 17: 303-312, April 1970
[34] Ming Ming Liu & Walter Herzog (1999) Dynamic muscle force predictions from EMG: an artificial neural network approach. Journal of Electromyography and Kinesiology 9: 391-400
[35] Reza Boostani & Mohammad Hassan Moradi (2003) Evaluation of the forearm EMG signal features for the control of a prosthetic hand. Physiological Measurement 24: 309-319
[36] Alcimar Soares & Adriano Andrade (2003) The development of a virtual myoelectric prosthesis controlled by an EMG pattern recognition system based on neural networks. Journal of Intelligent Information Systems 21(2): 127-141
[37] Lin Wang & Thomas S. Buchanan (2002) Prediction of joint moments using a neural network model of muscle activations from EMG signals. IEEE Transactions on Neural Systems and Rehabilitation Engineering 10(1), March 2002
[38] G. Cheron & F. Leurs (2003) A dynamic recurrent neural network for multiple muscles electromyographic mapping to elevation angles of the lower limb in human locomotion. Journal of Neuroscience Methods 129: 95-104
[39] Jer-Junn Luh & Gwo-Ching Chang (1999) Isokinetic elbow joint torques estimation from surface EMG and joint kinematic data: using an artificial neural network model.
Journal of Electromyography and Kinesiology 9: 173-183
[40] Hefftner, G., Zucchini, W. & Jaros, G.G. (1988) The electromyogram (EMG) as a control signal for functional neuromuscular stimulation. Part I: autoregressive modeling as a means of EMG signature discrimination. IEEE Transactions on Biomedical Engineering 35: 597-604
[41] C.S. Pattichis & A.G. Elia (1999) Autoregressive and cepstral analyses of motor unit action potentials. Medical Engineering & Physics 21: 405-419
[42] M. Bodruzzaman & M. Wilkes (1990) Classification of electromyographic signals by autoregressive modeling. IEEE Proceedings, 1990 Southeast Conference
[43] A. Asres & H. Dou (1996) A combination of AR and neural network technique for EMG pattern identification. 18th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Amsterdam, 1996
[44] O. Paiss & G.F. Inbar (1987) Autoregressive modelling of surface EMG and its spectrum with application to fatigue. IEEE Transactions on Biomedical Engineering 34(10): 761-770
[45] J. Pardey & S. Roberts (1996) A review of parametric modelling techniques for EEG analysis. Med. Eng. Phys. 18(1): 2-11
[46] Tohru Kiryu & Yoshiaki Saitoh (1992) Investigation on parametric analysis of dynamic EMG signals by a muscle-structured simulation model. IEEE Transactions on Biomedical Engineering 39(3), March 1992
[47] Hiromasa Suzuki & Robin A. Conwit (2002) Relationships between surface-detected EMG signals and motor unit activation. Medicine & Science in Sports & Exercise, April 2002: 1509-1517
[48] D. Gordon E. Robertson (2002) Electromyography: processing. Course notes, Biomechanics Laboratory, School of Human Kinetics, University of Ottawa, Canada
[49] O. Nerrand & P. Roussel-Ragot (1994) Training recurrent neural networks: why and how? An illustration in dynamical process modelling. IEEE Transactions on Neural Networks 5(2), March 1994
[50] Coryn A.L. Bailer-Jones & David J.C.
Mackay (1998) A recurrent neural network for modelling dynamical systems. Network: Computation in Neural Systems 9: 531-547
[51] Fausett, L. (1994) Fundamentals of Neural Networks: Architectures, Algorithms, and Applications. Prentice Hall
[52] Vinay K. Ingle & John G. Proakis (2000) Digital Signal Processing Using MATLAB. Brooks/Cole
[53] Emmanuel C. Ifeachor & Barrie W. Jervis (2002) Digital Signal Processing: A Practical Approach, Second Edition. Prentice Hall
[54] C. Sidney Burrus & Ramesh A. Gopinath & Haitao Guo (1998) Introduction to Wavelets and Wavelet Transforms: A Primer. Prentice Hall

