Condition Monitoring of Industrial Machines Using Wavelet Packets and Intelligent Multisensor Fusion

by

Srinivas Raman

B.A.Sc., University of British Columbia 2007

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF APPLIED SCIENCE

in

THE FACULTY OF GRADUATE STUDIES

(Mechanical Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA

(Vancouver)

May 2009

© Srinivas Raman 2009

Abstract

Machine condition monitoring is an increasingly important area of research and plays an integral role in the economic competitiveness in many industries. Machine breakdown can lead to many adverse effects including increased operation and maintenance costs, reduced production output, decreased product quality and even human injury or death in the event of a catastrophic failure. As a way to overcome these problems, an automated machine diagnostics scheme may be implemented, which will continuously monitor machine health for the purpose of prediction, detection, and diagnosis of faults and malfunctions. In this work, a signal-based condition monitoring scheme is developed and tested on an industrial fish processing machine. A variety of faults are investigated including catastrophic on-off type failures, partial faults in gearbox components and sensor failures. The development of the condition monitoring scheme is divided into three distinct subtasks: signal acquisition and representation, feature reduction, and classifier design.

For signal acquisition, the machine is instrumented with multiple sensors to accommodate sensor failure and increase the reliability of diagnosis. Vibration and sound signals are continuously acquired from four accelerometers and four microphones placed at strategic locations on the machine. The signals are efficiently represented using the wavelet packet transform and node energies are used to generate a feature vector.
A measure for feature discriminant ability is chosen and the effect of choosing different analyzing wavelets is investigated.

Since the dimensionality of the feature vector can become very large in multisensor applications, various means of feature reduction are investigated to reduce the computational cost and improve the classification accuracy. Local Discriminant Bases, a popular and complementary approach to wavelet-based feature selection, is introduced and its drawbacks in the context of multisensor applications are highlighted. To address these issues, a genetic algorithm is proposed for feature selection in robust condition monitoring applications. The fitness function of the genetic algorithm consists of three criteria that are considered to be important in fault classification: feature set size, discriminant ability, and sensor diversity. A procedure to adjust the weights is presented. The feature selection scheme is validated using a data set consisting of one healthy machine condition and five faulty conditions.

For classifier design, the theoretical foundations of two popular non-linear classifiers are presented. The performance of Support Vector Machines (SVM) and Radial Basis Function (RBF) networks is compared using features obtained from a filter selection scheme and a wrapper selection scheme. The classifier accuracy is determined under conditions of complete sensor data and corrupted sensor data. Different kernel functions are applied in the SVM to determine the effect of kernel variability on the classifier performance.

Finally, key areas of improvement in instrumentation, signal processing, feature selection, and classifier design are highlighted and suggestions are made for future research directions.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
Chapter 1 - Introduction to Condition Monitoring
  1.1. Rationale for Machine Condition Monitoring
  1.2. Objectives
  1.3. Experimental System: Iron Butcher
  1.4. Investigated Faults
  1.5. General Techniques of Fault Diagnosis
    1.5.1 Model Based Systems
    1.5.2 Signal Based Systems
  1.5. Review of Previous Work
    1.5.1 Multisensor Condition Monitoring
    1.5.2 Signal Processing and Wavelet Analysis
    1.5.3 Feature Reduction
    1.5.4 Classification
  1.6. Organization of Thesis
Chapter 2 - Signal Processing and Feature Representation
  2.1. Instrumentation
  2.2. Signal Representation
  2.3. Experimental Results
  2.4. Feature Representation
  2.5. Discriminant Measure
  2.6. Analysis of Feature Variation
Chapter 3 - Feature Selection
  3.1. Feature Reduction
  3.2. Local Discriminant Bases
  3.3. Genetic Algorithm for Feature Selection
  3.4. Experimental Results
  3.5. Discussion
Chapter 4 - Classification
  4.1. Classification
  4.2. Support Vector Machines
  4.3. Radial Basis Function Network
  4.4. Experimental Results
    4.4.1 Filter Selection
    4.4.2 Wrapper Selection
Chapter 5 - Conclusions
  5.1. Synopsis and Contributions
  5.2. Future Directions
References
Appendix A: Gearbox Data
Appendix B: Instrumentation
  B.1. Signal Acquisition Hardware
  B.2. Interface Software

List of Tables

Table 1.1: Potential Faults in the Iron Butcher
Table 3.1: GA-Feature Selection Tuning Procedure
Table 3.2: Feature Selection Based on Relative Entropy
Table 4.1: Filter Feature Selection
Table 4.2: Wrapper Feature Selection
Table A.1: Gear Specifications
Table A.2: Bearing Specifications
Table B.1: DAQ Board Specifications
Table B.2: Amplifier Specifications
Table B.3: Accelerometer Specifications

List of Figures

Figure 1.1: Potential Wind Turbine Faults
Figure 1.2: Intelligent Iron Butcher
Figure 1.3: Electromechanical Conveying Unit
Figure 1.4: Hydraulic System
Figure 1.5: Pneumatic System
Figure 1.6: Machine Operation
Figure 1.7: Gearmotor from SEW Eurodrive
Figure 1.8: Location of Bearing Damage
Figure 1.9: Location of Shaft Misalignment
Figure 1.10: Location of Gear Damage
Figure 1.11: Dull Blade Fault Simulation
Figure 1.12: Condition Monitoring Subtasks
Figure 2.1: Signal Acquisition Schematic
Figure 2.2: Accelerometer Positions on the Iron Butcher
Figure 2.3: Microphone Positions
Figure 2.4: LabVIEW Graphical User Interface
Figure 2.5: Short Time Fourier Transform
Figure 2.6: Wavelet Transform
Figure 2.7: Different Analyzing Wavelets
Figure 2.8: Scalogram of Signal from Accelerometer #3
Figure 2.9: Discrete Wavelet Transform
Figure 2.10: Wavelet Packet Transform
Figure 2.11: Accelerometer #2 Signal for Baseline Condition
Figure 2.12: Microphone #1 Signal for Baseline Condition
Figure 2.13: Accelerometer #2 Signal for Faulty Gear Condition
Figure 2.14: Microphone #1 Signal for Faulty Gear Condition
Figure 2.15: Accelerometer #2 Signal for Faulty Bearing Condition
Figure 2.16: Microphone #1 Signal for Faulty Bearing Condition
Figure 2.17: Accelerometer #2 Signal for Shaft Misalignment Condition
Figure 2.18: Microphone #1 Signal for Shaft Misalignment Condition
Figure 2.19: Microphone #1 Signal for Hydraulic System Fault Condition
Figure 2.20: Microphone #1 Signal for Motor Fault Condition
Figure 2.21: Accelerometer #3 Signal for Sharp Cutter Blade
Figure 2.22: Accelerometer #3 Signal for Dull Cutter Blade
Figure 3.1: Orthonormal Bases from WPT
Figure 3.2: Feature Selection at a1 = 80, a2 = 16 and a3 = 4
Figure 3.3: Feature Selection at a1 = 60, a2 = 32 and a3 = 8
Figure 3.4: Feature Selection at a1 = 52, a2 = 35 and a3 = 10
Figure 3.5: Feature Selection at a1 = 52, a2 = 38 and a3 = 10
Figure 3.6: Feature Selection at a1 = 52, a2 = 30 and a3 = 18
Figure 4.1: Separating Hyperplane
Figure 4.2: Support Vector Machine Principle
Figure 4.3: Nonlinear Mapping φ
Figure 4.4: Leave-one-out Cross Validation to Find Hyperparameters v and u
Figure 4.5: Structure of the Radial Basis Function Network
Figure 4.6: Wrapper Feature Selection for RBFN
Figure 4.7: Wrapper Feature Selection for RBFN w/ Corrupted Sensor Data
Figure 4.8: Wrapper Feature Selection for RBF-SVM
Figure 4.9: Wrapper Feature Selection for RBF-SVM w/ Corrupted Sensor Data
Figure A.1: Gearbox Exploded View
Figure A.2: Gearbox Manufacturer Catalogue
Figure B.1: Block Diagram for LabVIEW Interface
Figure B.2: Front Panel for LabVIEW Interface

List of Abbreviations

IAL    Industrial Automation Laboratory
LAN    Local Area Network
VFD    Variable Frequency Drive
CCD    Charge Coupled Device
OD     Outer Diameter
SIFT   Scale Invariant Feature Tracking
FFT    Fast Fourier Transform
DWT    Discrete Wavelet Transform
PCA    Principal Components Analysis
LDA    Linear Discriminant Analysis
WPT    Wavelet Packet Transform
LDB    Local Discriminant Bases
DPP    Dictionary Projection Pursuit
MI     Mutual Information
GA     Genetic Algorithm
ANN    Artificial Neural Networks
RBFN   Radial Basis Function Network
SVM    Support Vector Machine
FPGA   Field Programmable Gate Array
DAQ    Data Acquisition
VI     Virtual Instrument
STFT   Short Time Fourier Transform
CWT    Continuous Wavelet Transform
K-L    Karhunen-Loeve
KKT    Karush-Kuhn-Tucker

Acknowledgements

First, I would like to thank Professor Clarence W. de Silva for supervising me these last two years. I am tremendously grateful to him for his skillful guidance and unwavering support of my academic and career goals. His academic accomplishments and humanitarian efforts are a true source of inspiration to me and I consider myself very lucky to have stumbled into his laboratory (Industrial Automation Laboratory, IAL) and benefited from his engineering contributions.

Other faculty I would like to thank are Mr. Jon Mikkelsen, for all his support during my undergraduate and graduate studies; the members of my research committee, Dr. Farrokh Sassani and Dr. Farbod Khoshnoud (SOFTEK); and Dr. Lalith Gamage, a former Ph.D. student of IAL and currently a Visiting Professor with us, for their helpful advice and suggestions.

I wish to thank my colleagues at the IAL for their friendship over the last two years; notably, Ramon, Guan-lu, Behnam, Gamini, Tahir, Arun, Ying, and Roland. In particular, Ramon has been extremely helpful during the final stages of the project and I am very thankful for his support.

Financial support for my research project has come from grants held by Prof. de Silva, particularly through the Tier 1 Canada Research Chair (CRC), the Canada Foundation for Innovation (CFI), the British Columbia Knowledge Development Fund (BCKDF), and the Discovery Grant of the Natural Sciences and Engineering Research Council (NSERC) of Canada.

Words cannot describe how indebted I am to my parents. Short of writing my thesis, they have supported me in every way possible. From helping me change greasy gearboxes to encouraging me towards the finish line, they have done it all. I can only hope to be as dedicated, loving and caring a parent as they are. I dedicate this thesis to them.

Chapter 1

Introduction to Condition Monitoring

1.1. Rationale for Machine Condition Monitoring

In many industries, particularly those related to manufacturing and production, machine malfunction and failure are a cause for serious concern for many reasons.
Financially, it places a large burden on operation and maintenance costs. Production output and product quality can degrade significantly if the machine directly affects the production process. In extreme situations, catastrophic machine failures can result in human injury or death. To prevent these serious problems, companies adopt various maintenance programs to monitor and service machinery and processes.

Maintenance strategies can be classified into several broad categories [1]:

1. Run to Failure: Machine maintenance is only performed when the machine has failed.
2. Scheduled Maintenance: Maintenance is performed at set time intervals.
3. Condition-Based Maintenance: Maintenance is performed according to the condition of the machine, as determined through a monitoring scheme.

It is estimated that maintenance costs contribute to approximately half of all operating costs in manufacturing and processing industries [1]. In view of this, the choice of an appropriate maintenance strategy is important. The choice depends on many factors including the dependence of production level and quality on the machine condition, redundancy of machine and operations, lead-time for replacement of the machine, safety of personnel and environment, and replacement costs of the machine. In particular, if the machine under consideration is crucial to the production process, has a high replacement cost, has a dangerous failure mode or is difficult to access (due to mobility or remote location), the use of a condition-based program is well justified.

Condition-based maintenance offers many potential advantages over other maintenance programs. These include increased availability and reliability of machines, higher operating efficiencies, improved quality of products and services of the machine, lower downtimes, reduced maintenance costs, and improved safety.
Traditionally, the disadvantages of condition-based maintenance programs included the high cost of monitoring equipment, operational costs, and the requirement of skilled personnel for operating and servicing the monitoring system. As electronics and software become more accessible and easy to use, engineers and technicians can afford to implement real-time monitoring systems at a fraction of what the cost used to be. As a result of the potential advantages, there has been a large effort to automate the maintenance process and move towards condition-based maintenance programs [2].

In implementing condition-based maintenance, certain machine parameters are monitored to determine if there is a change in these parameters that is indicative of machine failure. These parameters are measured by appropriate sensors and the condition diagnosis is performed by humans or by a computer-based system. If the machine is critical to the production process, continuous monitoring by a computer system may be necessary to provide an accurate and timely diagnosis of machine health. Once the diagnosis is determined, corrective action may be taken, ranging from immediate machine shut-down to scheduling a maintenance procedure in the near future.

Certain complex machines may require the use of multiple sensors to capture machine health information. Wind turbines are examples of such machines because they contain numerous subsystems, each with its own modes of mechanical, electrical and structural failure (see Fig. 1.1). The use of more than one sensor is often necessary to make a robust and accurate diagnosis of the turbine condition. Also, wind turbines are challenging machines to maintain: they are frequently located in remote offshore locations and are subject to harsh environmental conditions. A faulty turbine may require a long period of time to access and repair due to the remote location of these machines.
Therefore, sensor redundancy is also a useful feature, so that turbine performance can still be monitored if one or more sensors fail.

The European Commission and the Institut für Solare Energieversorgungstechnik at the University of Kassel, Germany [3] have collaborated to develop a condition monitoring system to monitor the health of wind turbine systems. The system uses multiple accelerometers, strain gauges, speed sensors, and current sensors to continuously assess the performance and condition of the wind turbine. The signals are acquired by an onboard data acquisition system and communicated to a central server using Ethernet LAN technology, where the data is analyzed using time series and frequency domain techniques. The condition monitoring systems have been implemented and tested in three wind farms in Germany. As a result of the project's success, more wind farms in Europe are upgrading or planning to upgrade their systems to include advanced fault diagnostics capabilities.

Figure 1.1: Potential Wind Turbine Faults. © P. Caselitz and J. Giebhardt, 2002, adapted by permission.
[Figure labels. Rotor: surface roughness, icing; imbalance; fatigue, impending cracks; faults in pitch adjustment. Gear box: tooth wear or breaking; eccentricity of gear wheels. Generator: stator insulation failure; cracks in rotor bars; overheating. Bearings, shafts: wear, pitting, deformation of outer face and rolling elements of bearings; fatigue, impending cracks of shafts. Yaw system: yaw angle offset. Tower, WEC structure: resonances; fatigue, clearance, cracks.]

1.2. Objectives

As the previous example illustrated, multisensor-based condition monitoring may be required in applications where a single sensor may not be able to give a complete diagnosis of the machine condition and where sensors are prone to failure. The goal of the present thesis is to develop a multisensor condition monitoring scheme that can:

• Acquire machine signals from multiple, heterogeneous sensors
• Represent the signals in a form that requires minimal computational resources and preserves fault-related features
• Require minimal knowledge about physical characteristics of the machine
• Require minimal knowledge about physical characteristics of the fault
• Be implemented in real-time to minimize delay between failure and corrective action
• Classify the machine faults with high accuracy
• Be robust to disturbances, signal noise, and sensor failures

To accomplish these objectives, vibration and audio signals are acquired from multiple accelerometers and microphones located at different positions in the prototype industrial machine (Iron Butcher). The signals are efficiently represented with the Wavelet Packet Transform to capture frequency-based fault signatures, and a signal-based scheme is developed to avoid the need for a complex model of the machine and its associated faults. An on-line monitoring system is implemented to provide real-time machine updates every three seconds.
A feature selection scheme is developed to minimize the effect of sensor uncertainty, and various classification algorithms are implemented and tested to ensure a high accuracy of fault classification.

The overall goal of the research presented in this thesis is to develop a robust fault diagnosis scheme capable of diagnosing a wide range of faults in a complex industrial machine with minimal knowledge of machine and fault characteristics. A representative industrial machine is available in the Industrial Automation Laboratory, University of British Columbia, and will be used to develop and test the schemes of the present research.

1.3. Experimental System: Iron Butcher

Fish processing is a multibillion dollar industry in North America and a major industry in the province of British Columbia, Canada. In Canada alone, the annual value of the fish processing industry is estimated to be three billion dollars. The original Iron Butcher, a machine designed at the turn of the 20th century, was widely used in the industry for the head cutting operation of salmon. This machine uses a primitive design, without sensing and feedback adjustments, which leads to significant (about 10%) wastage of useful meat and also degradation of product quality. To decrease the wastage of valuable fish, improve the product quality and process efficiency, and reduce the use of labor in hazardous routine operations, the Industrial Automation Lab (IAL) at the University of British Columbia designed a machine that would replace the original Iron Butcher. The new machine, termed the "Intelligent Iron Butcher" (see Fig. 1.2), has the following important features: high cutting accuracy, improved product quality, increased productivity and efficiency, and flexible automation.

Figure 1.2: Intelligent Iron Butcher.

The machine consists of the following subsystems, each with its own specific function:

1. Electromechanical conveying system: The conveying system is responsible for transporting the fish from the loading zone to the cutting area. It is powered by an AC (alternating-current) induction motor coupled to a speed-reducing gearbox. The AC induction motor is controlled by a Variable Frequency Drive (VFD), and the speed transmission unit has a gear ratio of 106.58:1 (see Fig. 1.3(a)). The rotational motion from the output of the gearbox is translated into an intermittent linear motion via a mechanical linkage attached to a sliding mechanism. The sliding mechanism contains a row of pins that fold in one direction only (Fig. 1.3(b)), so the fish on the machine conveyor will only move in the forward direction.

Figure 1.3: Electromechanical Conveying Unit. (a) Gearmotor and Linkage; (b) Sliding Mechanism.

2. Hydraulic positioning system: The hydraulic system is responsible for positioning the cutter blade assembly with respect to the fish head (Fig. 1.4). There are two double acting cylinders (one for positioning each axis of the cutter assembly). Each hydraulic cylinder is actuated via a three-way solenoid valve. The overall system is powered by a hydraulic pump.

Figure 1.4: Hydraulic System. (a) Hydraulic System Schematic; (b) Cutter Table (labelled: hydraulic piston and cylinder).

3. Pneumatic system: The pneumatic system is responsible for two functions. One is to power the cutter blade, which cuts the fish head (see Fig. 1.5(a)). The other is to hold the fish stationary during the cutting operation (Fig. 1.5(b)). There are four pneumatic cylinders in total (three single acting cylinders for stabilizing the fish and one double acting cylinder for the cutting operation). The cylinders are actuated via a four-way, five-port double solenoid valve. The overall system is powered by a compressor.

Figure 1.5: Pneumatic System. (a) Cutter Blade; (b) Holding Mechanism.

Under normal conditions, the overall plant executes the following sequence of operations (see Fig. 1.6). Fish are manually fed by the operator into the loading zone of the Iron Butcher.
The pins on the conveying table push the fish forward during the first half of the motion cycle. During the second half of the cycle, the pneumatic holder is activated and the fish is held down in place. At this time, there is one fish in the cutting zone and one fish in the standby zone. While the conveying pins move back, they do not move the fish because they are retractable in the reverse direction. The cutting operation occurs during this time. A primary CCD camera captures the image of the fish and a vision algorithm calculates the optimal cutting position to minimize fish meat wastage. The controller sets the reference x-y position and an electro-hydraulic manipulator accurately positions the cutter assembly accordingly. Once the assembly is in position, the pneumatic cutter cuts the fish head. The holder is then released and the cutter blade moves up after the fish head is cut. Then the conveyor mechanism begins the next cycle.

Figure 1.6: Machine Operation.

1.4. Investigated Faults

The Iron Butcher has multiple modes of failure associated with each of the subsystems. Table 1.1 summarizes some of the possibilities. Since it is impractical to investigate all these faults in detail, a subset of these faults will be investigated to demonstrate the effectiveness of the proposed condition monitoring scheme. Other faults may be handled in a similar manner.

Table 1.1: Potential Faults in the Iron Butcher.

Major Sub-Systems                               Potential Faults
Electromechanical Conveying System              - Induction motor failure
                                                - Gearbox failure
                                                - Linkage failure
                                                - Jammed fish
Hydraulic Cutter Assembly Positioning System    - Pump rotor/shaft failure
                                                - Motor failure
                                                - Proportional valve failure
                                                - Hydraulic actuator leakage
Pneumatic Powered Cutter and Fish Stabilizer    - Valve failure
                                                - Compressor rotor/shaft failure
                                                - Motor failure
                                                - Pneumatic actuator leakage

For the purposes of the present work, the following representative faults from three categories are investigated:

1. General on-off type faults: For testing on-off faults, the electromechanical conveying system and the hydraulic subsystem are turned off. On-off type faults can represent a catastrophic failure in the system. In the tested cases, it may be more practical to obtain other signals from the machine; e.g., current input, pressure transducer, etc. However, it should be noted that the installed sensors are not meant for this function and catastrophic failure detection may be considered an additional function of these sensors.

2. Common partial faults: For rotating machinery, faults are commonly found in three main components: shafts, bearings, and gears. For investigating these common faults, components in the gearbox (see Fig. 1.7) have been modified to simulate these conditions. Detailed drawings and part numbers for the gearmotor can be found in Appendix A.

Figure 1.7: Gearmotor from SEW Eurodrive.

Bearing Damage: Bearings are one of the most important components in rotating machinery and are also the most susceptible to failure. Defects can appear in various bearing components including the outer race, inner race and the rolling elements. Bearing damage can have many causes, including excessive wear, corrosion, incorrect installation, mechanical shock and fatigue, misalignment, large electrical currents, and insufficient lubrication. To emulate bearing damage (see Fig. 1.8), the inner race in bearing #34 (in red) was ground and the rolling elements were deliberately damaged with a hammer. Also, the rolling elements and races of bearings #25, #37 and #45 (in blue) were sandblasted to simulate natural wear. See Appendix A for detailed information about the bearings.

Figure 1.8: Location of Bearing Damage. © SEW Eurodrive, 2008, adapted by permission.

Shaft Misalignment: Misalignment between shafts can occur in various places in a machine. There are two kinds of misalignment: parallel misalignment and angular misalignment. Parallel misalignment refers to the offset of a shaft axis from its correct position, and angular misalignment refers to the meeting of shaft centerlines at an angle. Misalignment can be caused by incorrect installation or by mechanical shock that shifts the alignment. Misalignment can lead to excessive machine vibrations and high radial loads on bearings, causing premature failures in these components. To emulate parallel and angular shaft misalignment (see Fig. 1.9), shaft #17 and the outer diameter (OD) of bearing #11 were ground down by 0.002" ± 0.0005" to allow a sideways shift of bearing #25. The OD of bearing #25 was ground down by 0.0065" ± 0.0005" to allow shaft misalignment. Bearing #25 was shimmed on side A, resulting in the shift of shaft #7 toward the input pinion #1. Loctite was applied on the outside of bearings #11 and #25 to prevent the outer race from spinning. The end result of this process was the misalignment of shaft #7.

Figure 1.9: Location of Shaft Misalignment. © SEW Eurodrive, 2008, adapted by permission.

Gear Damage: Since gears transmit power from one component of the machine to another, there are significant forces on the gear teeth, making them especially susceptible to failure if there is a defective component. Gear defects can appear in various forms including nonuniform tooth wear, cracked, chipped or missing teeth, misalignment between teeth, backlash, and runout. The causes include natural wear, operation outside the normal range, and improper maintenance. To emulate defects in the gear teeth (see Fig. 1.10), pinion #34 (in blue) and gear #4 (in red) were hit with a hammer and the teeth were gouged with a grinder. Pinion #34 and gear #4 are helical gears with 13 teeth and 76 teeth, respectively (Appendix A).

Figure 1.10: Location of Gear Damage. © SEW Eurodrive, 2008, adapted by permission.

3. Faults specific to fish cutting machine: Two possible faults specific to the fish processing machine are investigated. A dulling cutter blade and a jammed fish in the conveyor table are two likely modes of failure during machine operation. To investigate cutter blade dullness, two materials with different flexural strengths, acoustic soundboard and polystyrene, were used as the cutting material (see Fig. 1.11). The vibration profile and the cutting sound were compared to determine if the difference in cutting impact could be detected. To investigate a fish jam, a mock fish was held securely on the table so the pins would continuously hit the stationary fish, likely causing noticeable vibration in the machine.

Figure 1.11: Dull Blade Fault Simulation. (a) Polystyrene Insulation; (b) Acoustic Soundboard.

4. In addition to machine component faults, sensor faults were also investigated. Remote locations and industrial environments may pose problematic conditions for sensors to operate reliably. To determine the feasibility of the condition monitoring scheme developed in this thesis, the robustness of the scheme in the presence of sensor failures is investigated. Unlike the previous machine component faults, the sensor faults are simulated in the process of feature selection and classification by randomly setting the output of selected sensors to a zero value.

1.5. General Techniques of Fault Diagnosis

Schemes of fault diagnosis can be broadly classified into two categories: model-based schemes and model-free schemes, also known as signal-based schemes. Model-based schemes require a model of the plant being monitored to make a diagnosis, whereas model-free schemes only require a machine signal to make a diagnosis.
The following sections outline the two approaches and discuss their relative merits.

1.5.1 Model Based Systems

Model-based systems of fault diagnosis require a "model" or approximate mathematical representation of the physical plant. The model can take different forms depending on the plant characteristics. Some forms of representation include [2]:

1. Physical equations: If the system is nonlinear and static, it can be represented in the form $f(y, u) = 0$. Similar relationships exist for dynamic, nonlinear and multiple-input/multiple-output systems.

2. State equations of linear systems: Physical equations for dynamic, linear systems can be written as state and output equations:

$$\dot{x}(t) = A x(t) + B u(t)$$
$$y(t) = C x(t) + D u(t)$$

3. State observers: State observers allow for the approximation of the dynamic state of the system by knowing the system inputs and outputs. For the discrete-time form of the state-space model, a simple observer can be written in the form:

$$\hat{x}(k+1) = A \hat{x}(k) + B u(k) + H\,[\,y(k) - \hat{y}(k)\,]$$
$$\hat{y}(k) = C \hat{x}(k)$$

4. Transfer functions: Physical equations for stationary systems can be written in the Laplace domain as a ratio of output y(s) to input u(s):

$$G(s) = \frac{y(s)}{u(s)}$$

5. Neural-network models: The system can be represented as a network of interconnected neurons known as a neural network. Details about neural networks are found in Chapter 4 [32].

6. Fuzzy models: The system can be represented using models derived from approximate reasoning and logic [32].

All these methods can be used to generate residual values, which measure the divergences between observed operating conditions and normal operating conditions. Once residuals are generated, different techniques can be used to make a diagnosis about the machine condition. Experienced personnel can qualitatively determine the source of the divergence, or quantitative methods can be used as well.
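As an illustration of the residual generation described above, the following minimal Python sketch runs a scalar discrete-time plant alongside an observer of the form given in item 3 and applies a simple limit-check to the residual between the measured output and the observer output. The plant parameters a, b and c, the observer gain h, the fault offset and the threshold are hypothetical values chosen only for illustration; they do not describe the Iron Butcher or any method used in this thesis.

```python
def simulate_residuals(steps=60, fault_step=40, fault_offset=0.5):
    """Run a scalar plant and an observer side by side; return the residuals."""
    a, b, c = 0.9, 0.1, 1.0   # hypothetical plant: x(k+1) = a*x(k) + b*u(k), y = c*x
    h = 0.5                   # hypothetical observer gain; a - h*c = 0.4 is stable
    x, x_hat = 1.0, 0.0       # true state and a deliberately wrong initial estimate
    residuals = []
    for k in range(steps):
        u = 1.0                            # constant input
        y = c * x                          # healthy sensor reading
        if k >= fault_step:
            y += fault_offset              # additive sensor fault (assumed fault model)
        y_hat = c * x_hat                  # observer output
        r = y - y_hat                      # residual: measured minus estimated output
        residuals.append(r)
        x = a * x + b * u                  # plant update
        x_hat = a * x_hat + b * u + h * r  # observer update with output feedback
    return residuals


def limit_check(residuals, threshold=0.05):
    """Flag every sample whose residual magnitude exceeds the threshold."""
    return [abs(r) > threshold for r in residuals]


residuals = simulate_residuals()
flags = limit_check(residuals)
first_alarm = next(k for k in range(4, len(flags)) if flags[k])
print("first alarm after the start-up transient at step", first_alarm)
```

The first few samples are skipped when searching for an alarm because the observer starts from a wrong initial estimate, so the residual is large until the estimate converges. With a constant sensor offset the observer partly absorbs the fault, which is why the threshold in this sketch has to sit below the steady-state residual.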
These can range from simple limit-checks to more advancedclassification techniques such as Linear Discriminant Analysis and Neural Networks.1.5.2 Signal Based SystemsSignal Based systems require only a machine signal to make a diagnosis. Machine signals can berepresented in the time-domain, frequency domain, or in the time-scaled frequency domain. Theunderlying physics of the machine and fault often dictate the most suitable representation, asdiscussed in Chapter 2. Once the signal is suitably represented, domain experts can make adiagnosis qualitatively, or more advanced pattern recognition techniques such as NeuralNetworks and Support Vector Machines can be utilized for the purpose [2]. Most industrialcondition monitoring programs use signal-based schemes due to the time-consuming andcomplex nature of the system modeling process. If there are system nonlinearities, couplingbetween multiple subsystems, or unavailability of system information and parameter values, thedevelopment of an accurate system model can become quite challenging. The Iron-Butcher is agood example of a system that is difficult to model in its entirety. It consists of a highlynonlinear electro-hydraulic manipulator, has coupling between the three subsystems, and doesnot have information about all the system parameters as required to build a complete model. As a13result, the present work will focus on developing a signal-based condition monitoring scheme forcomplex machines like the lion Butcher.For the purposes of the present thesis, the task of signal-based condition monitoring can bedivided into three subtasks (see Fig. 1.12). First, the signals are acquired from the machine andsuitably represented. Then, the size of the feature set is reduced to facilitate the classificationprocess. Finally, the reduced feature set is sent to a properly designed classifier, which generatesa diagnosis about the machine condition.Vibration__jz)SoundFigure 1.12: Condition Monitoring Subtasks.1.5. 
Review of Previous Work

1.5.1 Multisensor Condition Monitoring

Acquiring information from multiple sensors rather than a single source can have many potential advantages including redundancy, complementarity, timeliness, and reduced cost of information. Sensor fusion can effectively reduce the overall uncertainty in a system while increasing the accuracy of signal perception. In their review of multisensor fusion applications, Luo et al. [4] describe many examples where data from multiple sources are combined using traditional and “intelligent” methods. Applications of multi-sensor technology are highly varied and include robotics, biomedical engineering, remote sensing, and equipment monitoring among many others.

Condition-based monitoring schemes using multiple sensors have been implemented in various applications and capacities in industry and academia. Of particular interest, Lang and de Silva [5] developed a condition monitoring scheme for the Iron Butcher in 2008. They used an accelerometer, a microphone, and a CCD camera to obtain vibration, sound and vision signals. They used the Fourier transform approach to process sound and vibration signals and a SIFT algorithm for object tracking. A neuro-fuzzy classifier was designed to classify three types of on-off machine conditions. Although the scheme was able to detect these conditions with high accuracy, partial faults were not investigated and the robustness of the scheme was not verified under faulty sensor conditions. The present research will address both these issues and attempt to improve upon the existing condition monitoring scheme.

1.5.2 Signal Processing and Wavelet Analysis

Conventionally, the Fast Fourier transform (FFT) has been used to represent machine signals in the frequency domain.
However, the wavelet transform has recently gained popularity as a powerful and computationally efficient representation for condition monitoring applications. The wavelet transform has the advantage of simultaneously providing time and frequency information about the signal, which is very useful for analyzing the non-stationary signals common in many condition monitoring applications. In their review of current wavelet applications in condition monitoring, Peng and Chu [6] describe many possible uses of the wavelet transform. It can be used as a tool to compress signals, identify system parameters, detect singularities, denoise signals, and generate fault features. Wavelet analysis has been used to successfully diagnose electrical and mechanical faults in a variety of machine components including bearings, gears, motors, and pumps.

Bearings are some of the most important and common components in rotating machinery. In the area of health monitoring in roller bearings, numerous researchers have applied the discrete wavelet transform and wavelet packet transform to decompose vibration signals. Lin et al. [7] applied a Morlet wavelet and threshold denoising to detect impulses caused by faulty gears and bearings. Purushotham et al. [8] used the discrete wavelet transform and Hidden Markov Models for detecting single-point and multiple-point defects in roller bearings with up to 99% accuracy. Rubini and Meneghetti [9] compared the envelope spectrum and the discrete wavelet transform for detecting faults in roller bearing elements.

Similarly, the condition monitoring of gears with wavelet analysis has been researched as well. Wang and McFadden [10] applied the continuous wavelet transform to assess tooth damage in a helicopter gearbox. Sung et al. [11] used the Discrete Wavelet Transform (DWT) to locate tooth defects in gear systems at high accuracies.
The results showed that the DWT was able to perform better than the Short Time Fourier Transform, especially when the faulty gear ran at comparable speeds to other gears. Lin and Zuo [12] used an adaptive wavelet filter to decompose acceleration signals and detect fatigue cracks in gear teeth.

In addition to its application to acceleration signals for fault detection, the wavelet transform has been used to decompose sound signals. Shibata et al. [13] used the Discrete Wavelet Transform to create Symmetric Dot Patterns, which is a visualization technique for sound signals. Although not as effective as vibration-based monitoring, the transformed signals were able to capture fault signatures of fan bearings. Also, Lin [14] applied the Morlet wavelet to denoise sound signals and detect abrasion in engine bearings and pushrods. Wu and Chan [15] decomposed sound signals with the wavelet packet transform to diagnose gearbox faults at a high accuracy as well.

In conclusion, a review of existing literature suggests that applying the wavelet transform to acceleration and sound signals is an effective approach for diagnosing a wide variety of faults in rotating machinery.

1.5.3 Feature Reduction

Since the wavelet analysis generates a large number of coefficients, a feature reduction scheme is necessary to reduce the size of the feature set. Feature reduction will lower the computational burden and improve the classification accuracy [16]. The two main approaches to feature reduction are (somewhat ambiguously) termed feature extraction and feature selection. In feature extraction, statistical/numerical “transformation” methods such as Principal Components Analysis (PCA), Independent Components Analysis (ICA), and Linear Discriminant Analysis (LDA) are applied to the initial feature set to reduce its dimensionality.
However, the transformation can be computationally expensive and lead to numerical instabilities, particularly when multiple sensors are used and the resulting feature size is very large [17]. Feature selection is therefore a more appropriate choice for feature reduction in fault diagnosis applications.

Yen and Lin [17] proposed two statistical methods of feature selection in vibration monitoring of a helicopter gearbox. After decomposing the signal with the WPT and FFT, they applied two feature selection algorithms, PWM and KNK, to reduce the size of the feature set for input into a neural network. They found that by reducing the size of the feature set from 256 to 2 features for each sensor, they were able to classify 8 types of faults at a high accuracy.

Saito and Coifman [18] developed an extension to the “Best Basis” algorithm termed Local Discriminant Bases (LDB). This method selects, from a collection of orthonormal bases, the subset of bases that is best able to separate signals from different classes. The orthonormal bases can be constructed by using wavelet packets or other time-frequency decompositions. The method for selecting these bases employed relative entropy as a cost function for maximizing class separability. Along the same lines, Liu and Ling [19] developed an extension to the “matching pursuit” algorithm for selecting wavelet coefficients in fault diagnosis applications. Termed “Informative Wavelets,” the algorithm uses mutual information as a criterion to search for the best wavelet coefficients.

Tafreshi and Sassani [20] developed a fault diagnosis scheme for detecting knock conditions in a single cylinder diesel engine. After applying the wavelet packet decomposition to acceleration signals, three feature selection methods were compared: Local Discriminant Bases (LDB), Mutual Information (MI), and Dictionary Projection Pursuit (DPP). LDB and DPP were both shown to have better classification performance and lower computational cost than MI.
Also, a novel method for constructing the energy entropy map was proposed to increase the performance of the LDB and DPP algorithms [21].

In addition to LDB and DPP, several search methods have been proposed for variable feature selection. Genetic algorithms, in particular, have been proposed as an effective tool for searching a large feature set and selecting the best features [22]. Jack and Nandi [23] proposed a wrapper-based Genetic Algorithm (GA) for reducing the feature set size in condition monitoring applications. The fitness function simply uses the percentage accuracy of the classifier to select the best feature set. The algorithm suffers the same setbacks as other wrapper feature selectors: it can be computationally very expensive when selecting from large feature spaces, and there is a risk of overtraining the pattern recognition process to the particular feature set that is analyzed. In view of these issues, a filter selection algorithm is more likely to be effective for large feature sets and does not risk overfitting the features to the analyzed data set.

Of particular interest, Leong and Yen [24] developed a filter feature selection scheme using a genetic algorithm and LDB for diagnosing faults in a helicopter gearbox. After decomposing vibration signals with the WPT, a filter selection scheme with size and discriminant ability as criteria for the GA objective function was implemented. The method was able to achieve lower feature numbers while providing similar classification accuracies as PCA, ICA, PWM, KNK, and LDB. Although the algorithm shows promising results, it has two drawbacks. Firstly, the scheme does not utilize sensor redundancy. Given that the feature set size and the discriminant ability are the only two criteria for feature selection, it is entirely possible that all features may be chosen from one sensor if the features in that sensor have a high relative entropy value. Also, there are no rules for selecting the objective function weights.
As the performance of the feature selector is highly dependent on these weights, it is difficult to generalize its performance and benchmark it against other methods. In the research of the present thesis, to accommodate sensor failure, a modified genetic algorithm and a tuning procedure are developed and tested.

1.5.4 Classification

Several methods are available to classify the machine condition using acquired signals. These range from simple statistical algorithms such as the k-Nearest Neighbor classifier to more complex techniques such as Artificial Neural Networks (ANN) and Support Vector Machines (SVM). In recent years, radial-basis function networks (RBFNs) and SVMs have found many applications in condition monitoring due to their capabilities as universal approximators and their capacity to perform nonlinear classification.

Nandi and Jack [25] compared an ANN and an SVM for detecting faults in rolling-element bearings. For fault features generated from time series and spectral analysis, they found that, on average, the SVM had a slightly higher training classification accuracy but a lower test classification accuracy than the ANN. Samanta [26] compared an ANN and an SVM for detecting faults in gears. The two methods showed similar classification performances, with the SVM performing slightly better for a larger feature set, but the training time for the SVM was substantially lower. Lv et al. [27] compared an ANN and an SVM for diagnosing power transformer faults based on features extracted from Dissolved Gas Analysis. They also found that the SVM had higher classification accuracies and a lower training time.

In comparing the two approaches, these studies generally indicate that the SVM performs as well as or better than the ANN and requires less training time. However, the possibility of corrupted data arising from sensor failure has not been investigated. The present research compares the performance of the two classifiers under conditions that simulate both machine failure and sensor failure.

1.6. 
Organization of Thesis

The present chapter discussed the rationale and motivation for an on-line condition monitoring scheme and outlined the goals and challenges of the research presented in this thesis. The experimental system that is considered in the present context (an industrial Iron Butcher) and the faults that are introduced and studied were described. General techniques for condition monitoring were described and the choice of a signal-based technique was justified. Finally, a detailed literature review highlighting relevant past work in condition monitoring, signal processing, feature reduction and classification was presented.

Chapter 2 describes the signals acquired from the investigated machine and the instrumentation installed on the machine for condition monitoring. The foundation of the wavelet transform and wavelet packet decomposition is presented and its advantages over conventional Fourier transform methods are also discussed. Finally, the generation of a shift-invariant feature vector is presented and a method for evaluating the discriminant ability is chosen.

Chapter 3 describes and compares various methods of feature reduction. Local Discriminant Bases, a common algorithm for frequency-based feature selection, is described and its drawbacks are discussed. A novel feature selection method using Genetic Algorithms is proposed and a tuning procedure is developed. Finally, both algorithms are implemented and experimental results are discussed and compared.

Chapter 4 introduces the concept of pattern recognition. The theoretical foundations of Radial Basis Function Networks and Support Vector Machines are presented. The two methods are implemented and experimental results are discussed and compared.

Chapter 5 concludes this work by providing a synopsis of the presented research and outlining the major contributions made in the thesis. Finally, suggestions are made for further research in this field.

Chapter 2
Signal Processing and Feature Representation

2.1. 
Instrumentation

This section describes the instrumentation used for condition monitoring in the Iron Butcher. Figure 2.1 shows a schematic representation of the monitoring equipment and the corresponding information flow.

Figure 2.1: Signal Acquisition Schematic. (Blocks shown: Acc #1 to Acc #4, Power Amp, FPGA DAQ Card, Mic #1 to Mic #4, Sound Cards, Pentium IV Computer.)

To obtain vibration signals from the machine, four single-axis piezoelectric accelerometers from Kistler Instruments are used. A charge amplifier from Kistler Instruments is used to convert the signal into a voltage reading in millivolts. The conversion factor from acceleration to voltage is 100 mV/g. Since the motor runs at the highest rotational frequency in the machine (45 Hz) during normal operation, the sampling rate is set at 1 kHz, so multiple harmonics can be captured in accordance with Nyquist's sampling theorem.

Figure 2.2 shows the location of the accelerometers. Two accelerometers are mounted on the gearbox to primarily capture fault characteristics of the bearings, gears and the shaft within the gearbox. There have been many studies conducted on the optimal location of the accelerometer mounting for gearbox condition monitoring. While there are no established guidelines on the exact placement, a suggested approach is to mount the accelerometer radially with respect to the axis of rotation [1]. One accelerometer is mounted on the frame of the machine to capture catastrophic failures and conveyor vibrations. One accelerometer is placed near the cutter table to obtain information about the dulling of the blade.

The acceleration signals are logged by a field programmable gate array (FPGA) data acquisition (DAQ) board from National Instruments. The FPGA DAQ board differs from a regular DAQ board in that an FPGA, rather than an application-specific integrated circuit (ASIC), is used to control the device functionality.
The FPGA board offers many advantages over traditional data acquisition boards; for example, complete control over synchronization and timing of signals, on-board decision making abilities, and true multi-rate sampling. With respect to on-line condition monitoring, a major advantage of using an FPGA board is that all data acquisition functions are hard wired in the FPGA, which frees the host processor for complex signal processing and classification computations.

Figure 2.2: Accelerometer Positions on the Iron Butcher. (a) Accelerometers #1 and #2; (b) Accelerometer #3; (c) Accelerometer #4.

Four wideband, capacitive type microphones are used to capture machine sound. The microphones capture acoustic pressure waves in the air and give a corresponding voltage reading, which is read by a computer sound card. Since most computer sound cards only accept one microphone input at a time, three additional sound cards were added to the existing computer. According to maintenance technicians at SEW Eurodrive, most gearbox conditions can be heard and diagnosed by an experienced operator. Therefore, the intent was to capture most of the human audible spectrum (nominally 20 Hz to 20 kHz, which calls for a sampling rate of about 40 kHz); however, the sampling rate was limited to 30 kHz due to real-time processing limitations of the current computer. Figure 2.3 shows the location of the microphones. Two microphones are placed near the gearmotor, one microphone is placed above the fish-cutting machine, and one microphone is placed near the cutter blade.

Figure 2.3: Microphone Positions. (a) Microphones #1 and #2; (b) Microphone #3; (c) Microphone #4.

The signal processing functions and Graphical User Interface (GUI) are programmed in LabVIEW. For the classification computations, the ANN and SVM algorithms are implemented in MATLAB code that is embedded in the LabVIEW Virtual Instrument (VI).
The GUI has three sections: a controls section for adjusting the signal sampling rates and triggering the acquisition, a plot section where raw data and spectral data are plotted, and a diagnostics section where LED displays indicate the status of the machine. Appendix B has further information about the VI and software. The signal processing functions and diagnostic computations are performed every three seconds, thereby determining the effective refresh rate of the machine status.

Figure 2.4: LabVIEW Graphical User Interface.

2.2. Signal Representation

The common method used to diagnose faults in reciprocating machinery is to represent the signal in the frequency domain using the Fourier Transform. This method decomposes a signal into constituent sinusoidal signals at different frequencies. Specifically,

$$F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, dt \tag{2.1}$$

where $F(\omega)$ is the Fourier transform of the signal $f(t)$.

To avoid redundancy and reduce computational expense, the discrete version of the Fourier transform is implemented in the form of the Fast Fourier Transform (FFT):

$$F_n = \sum_{k=0}^{N-1} f_k\, e^{-2\pi j k n / N} \tag{2.2}$$

where $F_n$ is the discrete Fourier transform of the signal $f_k$ and $N$ is the number of samples.

Although the Fourier transform is adequate in many applications of signal processing, it has a major disadvantage. In the transformation from the time domain to the frequency domain, all information about time is lost (hidden). If the signal is non-stationary and has characteristics that change over time due to drifts, trends, abrupt events, transients or other occurrences, the Fourier analysis can become less effective. As an improvement to the Fourier transform approach, Gabor, in 1946, proposed a windowing technique known as the Short Time Fourier Transform (STFT).
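The loss of time information, and the way windowing recovers it, can be illustrated with a small numerical sketch. The two test signals, the 50 Hz and 120 Hz tones, and the helper name `dominant_frequency` are assumptions made for this illustration only; they are not signals from the Iron Butcher. The two records contain the same tones in opposite order, so their whole-record magnitude spectra coincide, whereas a Fourier transform taken over each half-second window shows which tone occurs when.

```python
import numpy as np

fs = 1000                      # sampling rate in Hz (same as the accelerometer rate)
t = np.arange(fs) / fs         # one second of samples
half = fs // 2

# Signal A: 50 Hz during the first half-second, 120 Hz during the second.
sig_a = np.where(t < 0.5, np.sin(2 * np.pi * 50 * t), np.sin(2 * np.pi * 120 * t))
# Signal B: the same two halves in the opposite order (circular shift by 0.5 s).
sig_b = np.roll(sig_a, half)

# Magnitude spectra of the complete records.
spec_a = np.abs(np.fft.rfft(sig_a))
spec_b = np.abs(np.fft.rfft(sig_b))
same_spectrum = np.allclose(spec_a, spec_b)

def dominant_frequency(segment, fs):
    """Frequency (Hz) of the largest magnitude bin of a windowed segment."""
    mags = np.abs(np.fft.rfft(segment))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)
    return freqs[np.argmax(mags)]

# Transforming each half separately restores the time ordering.
first_half_a = dominant_frequency(sig_a[:half], fs)
second_half_a = dominant_frequency(sig_a[half:], fs)
first_half_b = dominant_frequency(sig_b[:half], fs)

print("identical magnitude spectra:", same_spectrum)
print("signal A:", first_half_a, "Hz then", second_half_a, "Hz")
print("signal B starts with", first_half_b, "Hz")
```

The rectangular half-second window used here is the crudest possible choice; it is meant only to show why a sequence of short transforms carries information that a single long transform hides.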
By considering small sections of the signal in sequence and performing the Fourier transform on them, the STFT maps the signal into a two-dimensional function of time and frequency.

Figure 2.5: Short Time Fourier Transform. (Axes: time and frequency.)

The formal definition for the STFT is:

$$STFT(\tau, \omega) = \int s(t)\, g(t - \tau)\, e^{-j\omega t}\, dt \tag{2.3}$$

where $STFT(\tau, \omega)$ is the Fourier transform of the signal $s(t)$, previously windowed by the function $g(t)$ with respect to the time shift variable $\tau$.

Although the STFT gives both the time evolution and the frequency spectrum of the signal, it has two major drawbacks: it has a fixed resolution with respect to the time window size at all frequencies, and there are no orthogonal bases for computing the STFT. These drawbacks limit the precision that can be achieved for a given window size and reduce the efficiency of the algorithms used for computing the STFT. The second drawback is especially important since fault diagnosis schemes are implemented in real time and computational cost/speed is a high priority in the present application.

Originally introduced by Grossmann and Morlet in 1984, wavelets are a class of often irregular and asymmetric functions, many of which (for example, the Daubechies family) have no closed-form analytical expression. Unlike sinusoidal functions, wavelet functions have a finite duration and the average value is always zero. Due to their unique properties, wavelets have found success in a number of areas including data compression, image processing, and time-frequency spectral estimation.

The formal definition for the wavelet transform is:

$$W(a, b; \psi) = a^{-1/2} \int x(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt \tag{2.4}$$

where $a$ is the scale parameter, $b$ is the time parameter, $\psi(t)$ is an analyzing wavelet and $\psi^{*}(t)$ is its complex conjugate.

As the above formulation shows, the wavelet analysis provides a time-scale view of the signal (see Figure 2.6). In providing a time-scale view, it allows the use of variable window sizes when analyzing a signal.
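The variable window size follows directly from the kernel $a^{-1/2}\psi((t-b)/a)$ in Eq. 2.4, and a short numerical sketch makes it visible. The Mexican hat wavelet is used below purely because it has a simple closed form; it is an assumption of this example and is not one of the wavelets tested in this work. The function names and the grid are likewise illustrative.

```python
import numpy as np

def mexican_hat(t):
    """Mexican hat wavelet up to a constant factor (illustrative choice only)."""
    return (1.0 - t**2) * np.exp(-0.5 * t**2)

def scaled_wavelet(t, a, b):
    """Kernel of Eq. 2.4 for a real wavelet: a^(-1/2) * psi((t - b) / a)."""
    return mexican_hat((t - b) / a) / np.sqrt(a)

def rms_duration(t, w, b):
    """Effective time width of w(t) about its centre b (energy-weighted RMS)."""
    return np.sqrt(np.sum((t - b) ** 2 * w**2) / np.sum(w**2))

t = np.linspace(-60.0, 60.0, 120001)   # uniform grid, spacing 0.001
dt = t[1] - t[0]
scales = (1.0, 2.0, 4.0)

# The effective window grows in proportion to the scale a ...
widths = {a: rms_duration(t, scaled_wavelet(t, a, 0.0), 0.0) for a in scales}
# ... while the a^(-1/2) factor keeps the energy of the kernel unchanged.
energies = {a: np.sum(scaled_wavelet(t, a, 0.0) ** 2) * dt for a in scales}
# The wavelet itself has zero average value.
mean_value = np.sum(mexican_hat(t)) * dt

for a in scales:
    print(f"a = {a}: width = {widths[a]:.4f}, energy = {energies[a]:.4f}")
```

Small scales therefore give a short window suited to fast, high-frequency events, and large scales give a long window suited to slow trends.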
Conveniently, the high-frequency information can be analyzed with a short time interval and the low-frequency content can be analyzed with a long time interval.

Figure 2.6: Wavelet Transform. (Horizontal axis: time.)

The choice of the analyzing wavelet depends on the particular signal processing application and the associated requirements. There exist several families of wavelets as developed by various researchers, each family with unique properties and associated advantages and disadvantages. For example, the Biorthogonal wavelet has a linear phase, which is useful for signal reconstruction, and Symlets are nearly symmetrical, which is a useful property for image dephasing. Among the most commonly used wavelet families in condition monitoring are the Daubechies, Biorthogonal, Symlet and Coiflet wavelets. There are no clear guidelines for selecting the analyzing wavelet in condition monitoring applications, so different wavelets are tested in the present application to determine if the particular choice of wavelet has an effect on classification performance.

Figure 2.7: Different Analyzing Wavelets. (a) Haar Wavelet; (b) DB04 Wavelet; (c) Bior4.4 Wavelet; (d) Coif4 Wavelet.

Figure 2.8 shows a scalogram of the machine signal from accelerometer #1 during regular machine operation. The vertical axis represents the scale of the analyzing wavelet and the horizontal axis gives the time of the signal. The color of the map corresponds to the magnitude of the coefficients at each scale.
Since the period of the conveyor motion is large, there is a much higher correlation between the coefficients at the highest scales.

Figure 2.8: Scalogram of Signal from Accelerometer #3.

Since the CWT (Continuous Wavelet Transform) is computationally expensive and contains redundant information, a subset of the scale and time parameters $a$ and $b$ is chosen to efficiently represent the signal with no loss of information. The parameters $a$ and $b$ are discretized as $a = a_0^{m}$ and $b = n a_0^{m} b_0$, where $m$ and $n$ are integers. The discretization of the scale and time parameters results in the Discrete Wavelet Transform (DWT), defined as:

$$W(m, n; \psi) = a_0^{-m/2} \int x(t)\, \psi^{*}\!\left(a_0^{-m} t - n b_0\right) dt \tag{2.5}$$

An efficient scheme to compute the DWT using cascaded filters was developed by Mallat and is known as the Fast Wavelet Transform (FWT). This involves the introduction of a scaling function $\phi(t)$ and the subsequent calculation of the wavelet $\psi(t)$ from $\phi(t)$ [17]. We have

$$\phi(t) = \sqrt{2} \sum_{k} h_k\, \phi(2t - k) \tag{2.6}$$

$$\psi(t) = \sqrt{2} \sum_{k} g_k\, \phi(2t - k) \tag{2.7}$$

where $g_k$ and $h_k$ are elements of the coefficient vectors of quadrature mirror high pass and low pass filters, respectively, and $k$ is a time localization parameter. As a result of this relationship, the wavelet transform can be applied to a signal by using filters only, without the need for wavelets or scaling functions. In the DWT, only the coefficients from the low pass filter (approximation) are passed through subsequent filters. The high frequency coefficients (details) are not considered to contain as much information and are left as is.

Figure 2.9: Discrete Wavelet Transform. (a) Fast Wavelet Transform; (b) Filter Bank.

In developing a condition monitoring scheme that can be generalized for various machines and fault characteristics, it is important to recognize that the frequency bandwidth of interest may not be known beforehand.
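The practical consequence of splitting only the approximation can be seen by listing the nominal frequency band of each filter-bank output. The sketch below assumes ideal half-band filters (real wavelet filters overlap at the band edges) and uses the 1 kHz accelerometer sampling rate and a 4-level decomposition; the function names are hypothetical. For comparison it also lists the uniform bands obtained when the details are split as well, which is the wavelet packet decomposition introduced next. The packet bands are listed by frequency, which is not necessarily the order in which the nodes are indexed in a wavelet packet tree.

```python
def dwt_bands(fs, levels):
    """Nominal (ideal-filter) frequency bands, in Hz, of a `levels`-level DWT."""
    nyquist = fs / 2.0
    bands = {}
    for j in range(1, levels + 1):
        # Detail at level j keeps the upper half of what level j-1 passed down.
        bands[f"d{j}"] = (nyquist / 2**j, nyquist / 2 ** (j - 1))
    bands[f"a{levels}"] = (0.0, nyquist / 2**levels)
    return bands

def wpt_bands(fs, levels):
    """Nominal bands of the 2**levels terminal nodes of a full wavelet packet
    tree, listed in order of increasing frequency."""
    width = (fs / 2.0) / 2**levels
    return [(i * width, (i + 1) * width) for i in range(2**levels)]

fs = 1000.0   # accelerometer sampling rate from Section 2.1
dwt = dwt_bands(fs, 4)
wpt = wpt_bands(fs, 4)

for name, (low, high) in dwt.items():
    print(f"{name}: {low:7.2f} to {high:7.2f} Hz (width {high - low:6.2f} Hz)")
print(f"WPT: {len(wpt)} bands, each {wpt[0][1] - wpt[0][0]:.2f} Hz wide")
```

Under these assumptions the first detail band alone spans 250 Hz, whereas every wavelet packet band at the same depth is 31.25 Hz wide.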
If the fault characteristics are present in a narrow, high-frequency bandwidth, the Discrete Wavelet Transform may not analyze the characteristics with sufficient resolution. The Wavelet Packet Transform (WPT) is a generalization of the DWT, where both the approximation and the details are split into further nodes (see Figure 2.10). This allows the signal to be represented as any combination of the approximation and details nodes.

Figure 2.10: Wavelet Packet Transform.

To represent this transformation, a wavelet packet function is defined as

$$W_{j,k}^{n}(t) = 2^{j/2}\, W^{n}(2^{j} t - k) \tag{2.8}$$

where $n$ is the oscillation parameter, $j$ is the scale parameter, and $k$ is the translation parameter. The first two wavelet packet functions are the scaling function and the basic wavelet function, respectively, as given by

$$W_{0,0}^{0}(t) = \phi(t) \tag{2.9}$$

$$W_{0,0}^{1}(t) = \psi(t) \tag{2.10}$$

All subsequent wavelet packet functions can be described by the following set of recursive relationships:

$$W_{0,0}^{2n}(t) = \sqrt{2} \sum_{k} h_k\, W^{n}(2t - k) \tag{2.11}$$

$$W_{0,0}^{2n+1}(t) = \sqrt{2} \sum_{k} g_k\, W^{n}(2t - k) \tag{2.12}$$

Using these definitions, the wavelet packet coefficients of a function $f$ can be determined by

$$w_{j,k}^{n} = \int f(t)\, W_{j,k}^{n}(t)\, dt \tag{2.13}$$

By decomposing both the high frequency and low frequency components, we obtain a rich library of orthonormal bases that contain time and frequency information about the stationary and nonstationary characteristics of a signal.

2.3. Experimental Results

Figures 2.11 to 2.22 show a selection of the acquired acceleration and sound signals and the corresponding discrete wavelet decomposition of these signals. The accelerometer and microphone positions correspond to those shown in Figures 2.2 and 2.3. Even though the WPT is used for feature generation, the DWT is more widely familiar and is therefore used here for illustrative purposes. The signals are decomposed into 4 levels using the DB04 wavelet. Here S is the raw signal in millivolts, a4 is the approximation (corresponding to low frequencies) at the fourth level, and dx is the detail (corresponding to high frequencies) at level x.
The y-axis gives the raw millivolt reading and the x-axis represents the sample count, in steps of 1 ms for acceleration signals and 0.033 ms for sound signals.

Figure 2.11: Accelerometer #2 Signal for Baseline Condition.
Figure 2.12: Microphone #1 Signal for Baseline Condition.
Figure 2.13: Accelerometer #2 Signal for Faulty Gear Condition.
Figure 2.14: Microphone #1 Signal for Faulty Gear Condition.
Figure 2.15: Accelerometer #2 Signal for Faulty Bearing Condition.
Figure 2.16: Microphone #1 Signal for Faulty Bearing Condition.
Figure 2.17: Accelerometer #2 Signal for Shaft Misalignment Condition.
Figure 2.18: Microphone #1 Signal for Shaft Misalignment Condition.
Figure 2.19: Microphone #1 Signal for Hydraulic System Fault Condition.
Figure 2.20: Microphone #1 Signal for Motor Fault Condition.
Figure 2.21: Accelerometer #3 Signal for Sharp Cutter Blade.
Figure 2.22: Accelerometer #3 Signal for Dull Cutter Blade.
(Each figure shows the panels S, a4, d4, d3, d2 and d1.)

As the gearbox contains 6 gears and 5 bearings in total, it is difficult to exactly correlate the signal readings with the emulated faults, because the rotating components run at different frequencies and produce associated harmonics. Also, the precise nature of the faults is unknown because the gear units came preassembled with damaged components.
However, the signal decompositions show that the wavelet transform is able to effectively differentiate between most of the different conditions. Comparing Figures 2.11 and 2.13, the peak amplitude in the approximation for the faulty gear is double the baseline amplitude and the frequency of the peaks corresponds approximately to the gear meshing frequency, indicating a problem with the gear teeth. Comparing Figures 2.11 and 2.15, there is a large periodic impulse in the detail bandwidths for the faulty bearing, possibly arising from rolling elements passing over inner race defects. In Figure 2.17, the approximation itself is oscillatory at a very low frequency. Since the misaligned output shaft is rotating at the lowest frequency in the gearbox, one can correlate the oscillation with the imbalanced force arising from the misaligned shaft. In Figure 2.19, the signal energies at all scales of the decomposition are lower, resulting from the missing sound of the hydraulic pump. In Figure 2.20, the periodic conveyor sound is missing and the only sound present is from the hydraulic pump. Figures 2.21 and 2.22 show the impulses from the cutting blade breaking materials with different strengths. As expected, the sharp blade generates a clean material breakage and the dull blade has to exert the force over a longer time to achieve the same effect.

2.4. Feature Representation

One major disadvantage of the wavelet packet transform is the lack of translation (shift) invariance in the wavelet bases. Two signals that are shifted slightly in time can have significant differences in coefficient representations. As a result, wavelet packet coefficients cannot be used directly as a reliable means of feature representation for on-line condition monitoring systems. Yen [24] and Tafreshi [20] also describe difficulties of using the coefficients directly for feature representation.
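The sensitivity to time shifts can be reproduced with a minimal sketch. It assumes a single level of the orthonormal Haar transform and a synthetic sinusoid, neither of which is the DB04 wavelet packet tree or the machine data used in this work; `haar_step` is a hypothetical helper written for the illustration. A delay of one sample changes the individual detail coefficients by a substantial fraction of their amplitude, while the sum of their squares over the whole window is essentially unchanged.

```python
import numpy as np

def haar_step(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

n = 1024
k = np.arange(n)
signal = np.sin(2 * np.pi * 50 * k / n)   # 50 full cycles in the window
shifted = np.roll(signal, 1)              # the same signal delayed by one sample

_, detail = haar_step(signal)
_, detail_shifted = haar_step(shifted)

# Coefficient-by-coefficient comparison: strongly affected by the shift.
max_coeff_change = np.max(np.abs(detail - detail_shifted))
coeff_amplitude = np.max(np.abs(detail))

# Windowed sum of squared coefficients: practically unaffected.
energy = np.sum(detail**2)
energy_shifted = np.sum(detail_shifted**2)
relative_energy_change = abs(energy - energy_shifted) / energy

print(f"largest coefficient change: {max_coeff_change:.4f} "
      f"(coefficient amplitude {coeff_amplitude:.4f})")
print(f"relative change in summed squares: {relative_energy_change:.2e}")
```

The near-exact agreement of the summed squares is a property of this idealized periodic signal; for measured signals the change is small rather than zero, and it shrinks as the window grows.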
One way to solve the feature representation problem is to define the node energy as the sum of the squares of all coefficients in the node. By choosing a large window, the effect of any signal shifts will be minimized. We have

$$E_{j,k} = \sum_{n} w_{j,k,n}^{2} \tag{2.14}$$

where $w_{j,k,n}$ are the coefficients in node $(j, k)$ of the wavelet packet tree.

By computing the energy of each node, we define a unique feature for each frequency band of the wavelet decomposition. Once the energy is computed, the features are preprocessed to eliminate disproportionate differences between classes and to improve the classification performance. A simple unit range scaling is used to find the normalized feature; thus,

$$\tilde{x} = \frac{x - \bar{x}}{u - l} \tag{2.15}$$

where $x$ is the raw feature, $\bar{x}$ is the mean, $l$ is the lower bound, and $u$ is the upper bound of the raw features across all classes.

2.5. Discriminant Measure

The discriminant ability of a feature can be defined as a measure of how differently two sequences p and q are distributed. In the application of pattern recognition, it can be described as the ability to differentiate between two classes. There are many statistical measures of discriminant ability [28]. Some of the most common measures include:

• Generalized f-divergence-based distance measures
• Mean distance-based distance measures
• Contrast type distance measures
• Model validation distance measures
• Entropy-based distance measures

According to Saito and Coifman [18], a natural choice for the discriminant measure in wavelet-based pattern recognition applications is relative entropy. Before discussing relative entropy, the concept of entropy is introduced.
Entropy can be viewed as an energy concentration of a coordinate vector or a measure of how much information a signal contains.

Shannon entropy is defined as:

$$H(p) = -\sum_{i} p_i \log p_i \tag{2.16}$$

where $p$ is a nonnegative sequence with $\sum_i p_i = 1$.

Because applications of pattern recognition are concerned with the ability to differentiate between signals, a discriminant version of entropy, the relative entropy, is often used:

$$I(p,q) = \sum_{i=1}^{n} p_i \log\frac{p_i}{q_i} \tag{2.17}$$

where $p$ and $q$ are two nonnegative sequences with $\sum_i p_i = \sum_i q_i = 1$.

Because the discriminant measure in Eq. 2.17 is not symmetric and does not satisfy the triangle inequality, the discriminant measure will depend on which class is defined as $p$ and which class is defined as $q$. The symmetric version of relative entropy, also known as J-divergence, is used:

$$J(p,q) = \sum_{i=1}^{n} p_i \log\frac{p_i}{q_i} + \sum_{i=1}^{n} q_i \log\frac{q_i}{p_i} \tag{2.18}$$

For measuring the discriminant abilities between multiple ($C$) classes (as in the present work), the sum of pairwise combinations of relative entropy will be used:

$$D\left(\{p^{(c)}\}_{c=1}^{C}\right) = \sum_{i=1}^{C-1}\sum_{j=i+1}^{C} D\left(p^{(i)}, p^{(j)}\right) \tag{2.19}$$

where $D$ is the discriminant measure of the signals $p$.

2.6. Analysis of Feature Variation

Before the feature selection process was implemented, a simple statistical test was implemented to confirm that the differences in features were statistically significant.
To test whether the feature distributions overlapped, the 95% confidence interval of the mean was calculated using

$$\mathrm{C.I.} = \bar{x} \pm 1.96\,\frac{s}{\sqrt{n}} \tag{2.20}$$

where $\bar{x}$ is the mean, $s$ is the standard deviation for the features in a class, and $n$ is the sample size.

The statistical analysis revealed that differences between cutter dullness conditions and fish jamming conditions were not statistically significant; i.e., none of the sensors could detect any faults for these conditions when the machine was in operation.

Note: As a result, these faults were not considered in the following chapters for the development of pattern recognition algorithms.

Interestingly, these faults could be detected in isolation (when the other subsystems were turned off) and when there were dedicated sensors for detecting these faults.

Since there are no clear guidelines for picking the optimal analyzing wavelet, different wavelets are tested to check if the choice of wavelet function significantly affects the discriminant measure. The top 16 bases with the highest discriminant power were calculated using each wavelet decomposition. The results are as shown below:

DB04:    26 25 31 27 28 18 23 30 32 19 29 21 24 13 05 06
Haar:    25 31 26 27 28 18 19 32 30 23 21 29 11 16 13 22
Bior4.4: 27 25 26 32 28 31 19 21 18 23 24 06 30 29 22 05
Coif4:   28 25 26 27 32 31 30 18 23 06 29 05 21 19 13 24
Sym4:    28 26 27 25 32 23 31 18 30 19 21 29 13 05 06 20

Note: The discriminant calculations are different from those presented in Chapter 3 because, owing to data processing limitations, these rankings were generated by using only one sample of acceleration and microphone signals.

As the discriminant rankings show, there is very little difference (qualitatively) in the discriminant measures of the different wavelet decompositions. As a result, the feature selection algorithm and the subsequent classification procedure are presumably unaffected by the choice of the analyzing wavelet.
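The rankings above rest on the symmetric relative entropy of Eq. 2.18 and its multiclass sum in Eq. 2.19. The sketch below shows one way such a ranking could be computed. It assumes each class is summarised by its mean node energies normalised to unit sum, and it exploits the additivity of the measure so that every node receives its own score; the function names and the toy energies are hypothetical.

```python
import numpy as np
from itertools import combinations

def normalise(energy_map, eps=1e-12):
    """Turn nonnegative node energies into a sequence that sums to one."""
    e = np.clip(np.asarray(energy_map, dtype=float), eps, None)
    return e / e.sum()

def j_terms(p, q):
    """Element-wise terms of the J-divergence (Eq. 2.18)."""
    return p * np.log(p / q) + q * np.log(q / p)

def node_discriminant_scores(class_energies):
    """Per-node contribution to Eq. 2.19, summed over all class pairs."""
    dists = [normalise(e) for e in class_energies]
    scores = np.zeros(len(dists[0]))
    for i, j in combinations(range(len(dists)), 2):
        scores += j_terms(dists[i], dists[j])
    return scores

def rank_nodes(class_energies, top=16):
    """Indices of the most discriminant nodes, best first, plus all scores."""
    scores = node_discriminant_scores(class_energies)
    order = np.argsort(scores)[::-1][:top]
    return order, scores

# Hypothetical mean node energies: 3 classes x 4 nodes.
class_energies = np.array([
    [4.0, 1.0, 1.0, 2.0],
    [1.0, 4.0, 1.0, 2.0],
    [1.0, 1.0, 4.0, 2.0],
])
order, scores = rank_nodes(class_energies, top=3)
total_discriminant = scores.sum()  # multiclass measure of Eq. 2.19
```

In this toy case the last node carries the same share of energy in every class, so its score is zero and it never enters the ranking.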
DB04 is chosen as the analyzing wavelet due to its popular use in other applications of condition monitoring and the possibility of more standardized comparisons with existing research.

Chapter 3
Feature Selection

3.1. Feature Reduction

Feature reduction is one of the most important components of the pattern recognition process. There are two main reasons to reduce the dimensionality of the feature space: to decrease the computational expense and to increase the classification accuracy [16]. Feature reduction techniques can be divided into two broad categories: feature extraction and feature selection. Feature extraction reduces a feature space of dimensionality $m$ to a subspace of dimensionality $d < m$ by applying a linear or nonlinear transformation. Principal Components Analysis (PCA), also known as the Karhunen-Loève (K-L) expansion, is a popular technique for feature extraction. Other variations of feature extractors have been proposed for dealing with non-Gaussian distributed data and applying nonlinear transformations. These techniques include Independent Components Analysis (ICA), Nonlinear PCA, and Neural Networks.

Although feature extraction methods can provide better classification accuracies than feature selection methods, there are some drawbacks with applying feature extraction in condition monitoring applications. If the dimensionality of the feature space is exceedingly large, the computational cost of the transformation will be correspondingly large. This may not be suited for on-line condition monitoring applications where the frequency of machine diagnosis needs to be sufficiently high. Also, once the transformation is applied, the physical meaning of the transformed features is lost. When sensor failure is considered, it may be beneficial to know which sensors are providing useful information and how the system will perform if those sensor signals are not accurate.

In light of these issues, feature selection is better suited for condition monitoring applications with multiple signal sources.
Feature selection reduces a feature space of dimensionality $m$ to a subspace of dimensionality $d < m$ by selecting a subset of the features that minimizes the classification error. Feature selection methods can be categorized as filters or wrappers. Wrapper selection uses the classifier to score the performance of the selected features. Different search strategies can be used to guide the selection process, including exhaustive searches, forward and backward selection, and stochastic methods such as genetic algorithms. Although wrapper methods are simple to implement and often produce good results, they run the risk of overfitting the model and features to a particular data set and classifier [29]. In the present work, rather than using wrappers to select features, they are used as a tool for evaluating the performance of classification techniques, as described in Chapter 4.

Filter selection methods select a subset of features independently of the classifier according to some predetermined criteria. A filter uses a metric to predict the performance of a feature subset. For classification problems, discriminant measures of class separability are often used as filter selection metrics. In the next two sections, Local Discriminant Bases, a feature selection approach complementary to the WPT, is discussed and a Genetic Algorithm-based feature selection approach is proposed.

3.2. Local Discriminant Bases

Coifman and Wickerhauser [30] originally developed the popular and widely used "Best Basis" algorithm for optimal wavelet packet tree decomposition. In the method, using entropy as the cost function, the optimal structure of the wavelet packet tree is determined by exploiting the additive property of entropy and utilizing a "divide and conquer" approach. Since the original algorithm was developed for signal compression applications and focused on optimal signal representation, Coifman and Saito [18] proposed "Local Discriminant Bases," an algorithm specifically designed for signal classification applications.
In their method, rather than using entropy, an additive discriminant measure is used as the cost function to maximize class separability while minimizing the representation size.

To apply the local discriminant bases algorithm, the signal needs to be represented by a collection of orthonormal bases, which can be obtained from wavelet packet decomposition or local trigonometric transforms.

Figure 3.1: Orthonormal Bases from WPT.

The algorithm for the LDB is described as follows [18]:

Given a training dataset $T$ consisting of $C$ classes of signals, where $B_{j,k}$ are the basis vectors of subspace $\Omega_{j,k}$, $A_{j,k}$ is the best basis found for $\Omega_{j,k}$, $\Delta_{j,k}$ is an array containing the discriminant measures of subspace $\Omega_{j,k}$, and $\Gamma_c$ is the time-frequency energy map of class $c$:

Step 1: Once the signal is decomposed into a dictionary of orthonormal bases, specify the maximum depth of decomposition $J$ and the discriminant measure $D$.

Step 2: Set $A_{J,k} = B_{J,k}$ and $\Delta_{J,k} = D\left(\{\Gamma_c(J,k,\cdot)\}_{c=1}^{C}\right)$ for $k = 0, \ldots, 2^{J}-1$.

Step 3: Determine the best subspace $A_{j,k}$ for $j = J-1, \ldots, 0$ and $k = 0, \ldots, 2^{j}-1$ as follows.
Set $\Delta_{j,k} = D\left(\{\Gamma_c(j,k,\cdot)\}_{c=1}^{C}\right)$.
If $\Delta_{j,k} \geq \Delta_{j+1,2k} + \Delta_{j+1,2k+1}$, then $A_{j,k} = B_{j,k}$;
else $A_{j,k} = A_{j+1,2k} \oplus A_{j+1,2k+1}$ and $\Delta_{j,k} = \Delta_{j+1,2k} + \Delta_{j+1,2k+1}$ (where $\oplus$ is a direct sum).

Step 4: Rank the basis functions in order of discriminant ability.

Step 5: Use the bases with the highest discriminant measure for constructing the classifier.

To summarize the process, the algorithm starts by evaluating the discriminant measures of the terminal nodes as specified by the maximum level of decomposition. The sum of the two "children" node discriminant measures is compared with the discriminant measure of the parent node, which is one level higher on the wavelet packet tree. If the summed discriminant values of the children are higher than that of the parent, the children nodes are kept as the "best bases." If the discriminant value of the parent node is higher, the children nodes are discarded and the comparison process repeats itself for the parent node, which now takes the role of a child node.
This sequence of actions repeats until the highest level of the tree is reached, resulting in a decomposition that has the maximum value of relative entropy.

Although LDB has been used extensively in different pattern recognition problems, the algorithm is not designed to be used for feature selection from multiple signal sources (i.e., multiple time-frequency energy maps). Once the "best" features for each signal source are determined, there are no guidelines on how to select the features across different signal sources. Relative entropy cannot be used on its own for feature selection because if one signal source has a high relative entropy measure for all of its bases, all the features will be selected from that signal source. This can be problematic for two reasons:
1. Multiclass relative entropy (Eq. 2.19) is not a perfect measure of the overall discriminant ability because it can easily be biased by large individual relative entropies.
2. If a sensor with large relative entropies fails, the classification scheme will fail, defeating the original purpose of multisensor fusion.

In the next section, a genetic algorithm is proposed to search the feature space for the best feature set according to three important criteria.

3.3. Genetic Algorithm for Feature Selection

Introduced by Holland in 1975, genetic algorithms are a class of derivative-free optimization algorithms that imitate the process of natural selection in genetics. Compared to traditional calculus-based schemes, genetic algorithms have two major advantages: they are applicable to discrete problems and they are less likely to get trapped in local optima [32]. Also, compared to other enumerative techniques such as dynamic programming, genetic algorithms are better capable of handling large complex problems without breaking down or suffering from the "curse of dimensionality."

A simple genetic algorithm optimization procedure consists of the following steps:
1. Initialization: An initial population of chromosomes is randomly generated.
2. Selection: The fitness values of all chromosomes in the current population are evaluated. The chromosomes with the highest fitness values are selected for reproduction, as they will have a higher probability of mating and creating the next generation of "better" chromosomes.
3. Reproduction: The chromosomes mate using the crossover operation to produce the next generation. Genetic operators such as mutation can be applied to increase the diversity of the population.
4. Termination: Steps 2 and 3 are repeated until a certain condition is reached, upon which the algorithm is terminated.

Further details about Genetic Algorithms and Evolutionary Computing are found in [32].

A genetic algorithm for robust condition monitoring systems is proposed and detailed as follows:

1. Initialization: The features are represented as a binary sequence of genes. If the gene is a 0, the feature is not selected. If the gene is a 1, the feature is selected.

01101010011       01101010011       01101010011       01101010011
Accelerometer #1  Accelerometer #2  Accelerometer #3  Microphone #3

The genes are randomly initialized and a population of 100 chromosomes is created.

2. Selection: For the fitness function, three intuitive criteria specific to feature selection in condition monitoring applications are proposed:

a. Size: Choosing a small feature set will reduce the complexity of the classifier, improve the speed of computation during the signal processing (because only a subset of the signal is processed), and improve the speed of classification. For the fitness function, a normalized metric for the size of the feature set is defined as:

$$S = \frac{N_{genes=0}}{N_{Total}} \tag{3.1}$$

where $N_{genes=0}$ is the number of unselected features and $N_{Total}$ is the total number of features.

b. Discriminant Ability: The chosen feature set must have high discriminant ability.
The discriminant ability of a feature can be defined as a measure of how differently two sequences $p$ and $q$ are distributed. In the application of pattern recognition, it can be described as the ability to differentiate between two classes.

For the fitness function, a normalized metric for the discriminant ability of the feature set is defined as:

$$Dis = \frac{RE_{genes=1}}{RE_{all}} \tag{3.2}$$

where $RE$ is a measure of relative entropy as given by Eq. 2.19.

Practically, the relative entropy of the feature set can be found by multiplying a diagonal matrix of individual relative entropies by the transpose of the chromosome vector; for a chromosome with four genes $g_i \in \{0, 1\}$,

$$\begin{bmatrix} RE_1 & 0 & 0 & 0 \\ 0 & RE_2 & 0 & 0 \\ 0 & 0 & RE_3 & 0 \\ 0 & 0 & 0 & RE_4 \end{bmatrix} \begin{bmatrix} g_1 \\ g_2 \\ g_3 \\ g_4 \end{bmatrix} = \begin{bmatrix} g_1 RE_1 \\ g_2 RE_2 \\ g_3 RE_3 \\ g_4 RE_4 \end{bmatrix} \tag{3.3}$$

whose entries sum to $RE_{genes=1}$.

c. Diversity: The chosen features should be spread out between different sensors, so that if one sensor fails or is corrupted, the detection scheme will still be able to function with a minimal loss in classification accuracy. The standard deviation of sensor feature size is proposed as a measure of feature set diversity:

$$\sigma = \sqrt{\frac{1}{N_s}\sum_{i=1}^{N_s}\left(F_i - \mu\right)^{2}} \tag{3.4}$$

where $N_s$ is the total number of sensors, $F_i$ is the feature size of sensor $i$, and $\mu$ is the mean feature size of the sensors.

One problem with this measure of diversity is that a feature size of zero is a solution with the highest diversity. To avoid this triviality, a modified normalized metric for the feature diversity is defined as:

$$Div = \left[\,\ldots\; N_{genes=1}\; \ldots\,\right] \tag{3.5}$$

Using these criteria, a multi-objective fitness function is defined as:

$$F = a_1 (S) + a_2 (Dis) + a_3 (Div) \tag{3.6}$$

where $a_1$, $a_2$ and $a_3$ are the weights for the three objective functions and $\sum_k a_k = 100$.

Since the performance of the classifier depends on the weights of the fitness function criteria, an intuitive tuning procedure is proposed to optimize the weight selection:
i. Start the procedure with a high $a_1$ (weight for the size criterion) and a high $a_2 : a_3$ (discriminant weight to diversity weight) ratio.
ii. Classify the uncorrupted data. If the classification accuracy is high, proceed to step iii. If the classification accuracy is low, repeat step i with a lower $a_1$.
iii. Classify the corrupted data. If the classification accuracy is high, the chosen feature set is good and the algorithm can be terminated. If the corrupted classification accuracy is low, proceed to step iv.
iv. Decrease the $a_2 : a_3$ ratio and reclassify the corrupted data. If the classification accuracy is high, the chosen feature set is good and the algorithm can be terminated. If the classification accuracy is low, repeat step i with a lower $a_1$.

This tuning procedure attempts to choose a feature set with the smallest number of features and the highest classification accuracy under both machine failure and sensor failure conditions.

3. Reproduction: A rank-based selection procedure is implemented to determine the best parents, and an elitism strategy is used to preserve the best individuals in a population. Scattered crossover (p = 0.8) and Gaussian mutations (p = 0.2) are utilized to increase the population diversity [32].

4. Termination: The algorithm terminates when the chromosomes have evolved for 100 generations or the change in the fitness function value between generations is less than $10^{-3}$.

3.4. Experimental Results

As described in Chapter 2, the signals from four accelerometers and four microphones are decomposed to four levels using the DB04 wavelet. This results in a candidate space of 128 features (8 sensors × 16 terminal nodes). For the raw sensor data, 300 samples (50 samples from each class) are used as training inputs and 300 samples (50 samples from each class) are used as testing inputs. For the corrupted sensor data, 300 samples are likewise used as training inputs and 300 samples as testing inputs. However, the corrupted sensor data are modified to simulate faulty sensor conditions. In the 300 training samples and 300 testing samples, each sample has one sensor turned off (zero value), simulating a catastrophic sensor failure.

To test the feature selection algorithm, a radial basis function network (Chapter 4) is used as the classifier.
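The fitness criteria of Section 3.3 that drive this selection can be sketched as follows. The size and discriminant terms follow Eqs. 3.1 to 3.3 and the sensor spread follows Eq. 3.4. Two points are assumptions rather than thesis content: the chromosome is taken to be laid out as 8 consecutive blocks of 16 genes (one block per sensor), and, because the normalised diversity metric of Eq. 3.5 cannot be read from the source, the diversity score is passed in as a function, with `placeholder_diversity` as a purely hypothetical stand-in. The uniform relative entropies are also made up.

```python
import numpy as np

N_SENSORS = 8
NODES_PER_SENSOR = 16  # 4-level wavelet packet tree -> 2**4 terminal nodes

def size_term(chrom):
    """Eq. 3.1: fraction of unselected features."""
    chrom = np.asarray(chrom)
    return np.sum(chrom == 0) / chrom.size

def discriminant_term(chrom, rel_entropy):
    """Eqs. 3.2-3.3: relative entropy of the selected genes over the total."""
    rel_entropy = np.asarray(rel_entropy, dtype=float)
    selected = np.diag(rel_entropy) @ np.asarray(chrom, dtype=float)  # Eq. 3.3
    return selected.sum() / rel_entropy.sum()

def sensor_feature_std(chrom):
    """Eq. 3.4: standard deviation of the number of features per sensor."""
    counts = np.asarray(chrom).reshape(N_SENSORS, NODES_PER_SENSOR).sum(axis=1)
    return float(np.sqrt(np.mean((counts - counts.mean()) ** 2)))

def placeholder_diversity(chrom):
    """Hypothetical stand-in for Eq. 3.5 (NOT the thesis metric)."""
    if np.sum(chrom) == 0:
        return 0.0  # an empty feature set must not count as diverse
    return 1.0 / (1.0 + sensor_feature_std(chrom))

def fitness(chrom, rel_entropy, weights, diversity_fn):
    """Eq. 3.6: weighted sum of the three criteria (weights sum to 100)."""
    a1, a2, a3 = weights
    return (a1 * size_term(chrom)
            + a2 * discriminant_term(chrom, rel_entropy)
            + a3 * diversity_fn(chrom))

# Final feature set reported in Table 3.1 (1-based feature numbers).
chrom = np.zeros(N_SENSORS * NODES_PER_SENSOR, dtype=int)
chrom[np.array([16, 27, 36, 50, 79, 92, 104, 113]) - 1] = 1
rel_entropy = np.ones(chrom.size)  # made-up, uniform relative entropies
score = fitness(chrom, rel_entropy, (52, 30, 18), placeholder_diversity)
```

Under the assumed block layout, the eight reported features fall one per sensor, so the standard deviation of Eq. 3.4 is zero for this chromosome.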
The code for the Genetic Algorithm and Neural Network classifier is written using MATLAB toolboxes. Table 3.1 summarizes the steps of the tuning procedure. Here Z refers to the percentage of data that is classified correctly, reported separately for the raw sensor data and for the corrupted sensor data.

Table 3.1: GA-Feature Selection Tuning Procedure.

| a1, a2, a3 | Feature No. | Z, raw (%) | Z, corrupted (%) | Action |
|---|---|---|---|---|
| 90, 8, 2 | 117 | 68 | 61.5 | Decrease a1 |
| 80, 16, 4 | 110, 120 | 61.5 | 56.5 | Decrease a1 |
| 70, 24, 6 | 79, 104, 120 | 84 | 79.5 | |
| 60, 32, 8 | 40, 79, 96, 104, 120 | 95 | 91.5 | Decrease a1 |
| 55, 35, 10 | 10, 32, 48, 51, 79, 88, 104, 113 | 96 | 93.5 | Decrease a1 |
| 52, 38, 10 | 40, 69, 71, 79, 84, 88, 96:98, 102:104, 106, 110, 113:115, 117:120, 122 | 99.5 | 95.5 | Decrease a2:a3 |
| 52, 30, 18 | 16, 27, 36, 50, 79, 92, 104, 113 | 100 | 99 | |

Figures 3.2 to 3.6 show the optimization procedure for selected steps of the tuning procedure. The upper graph is a plot of the best and mean fitness values of the population as a function of the number of generations the population has evolved. The lower graph shows the chromosome with the highest fitness function when the algorithm terminates. As discussed earlier, if a gene is 1, the feature is selected and if the gene is 0, the feature is not selected.

Figure 3.2: Feature Selection at a1 = 80, a2 = 16 and a3 = 4.

Figure 3.3: Feature Selection at a1 = 60, a2 = 32 and a3 = 8.

Figure 3.4: Feature Selection at a1 = 55, a2 = 35 and a3 = 10.

Figure 3.5: Feature Selection at a1 = 52, a2 = 38 and a3 = 10.

Figure 3.6: Feature Selection at a1 = 52, a2 = 30 and a3 = 18.

3.5. Discussion

As Table 3.1 shows, the raw dataset and corrupted dataset classification accuracies generally increase as a1 decreases. When a sufficiently high raw dataset classification accuracy is achieved, the a2 : a3 ratio is decreased, so the features are better spread out over the sensors and the corrupted dataset classification accuracy increases. The final feature set has 8 features, one from each of the 8 sensors. If an algorithm (e.g., LDB) is used to rank discriminant measures and select a feature set based on the ranking alone, the results given in Table 3.2 are obtained.

Table 3.2: Feature Selection Based on Relative Entropy.

| Features | Sensors used | Z, raw (%) | Z, corrupted (%) |
|---|---|---|---|
| 120 | Mic #4 | 25.5 | 25 |
| 120, 113 | Mic #4 | 46.5 | 44.5 |
| 120, 113, 104 | Mic #4, Mic #3 | 62.5 | 59.5 |
| 120, 113, 104, 110 | Mic #4, Mic #3 | 74.5 | 75 |
| 120, 113, 104, 110, 117 | Mic #4, Mic #3 | 77.5 | 77.5 |
| 120, 113, 104, 110, 117, 102 | Mic #4, Mic #3 | 80 | 79 |
| 120, 113, 104, 110, 117, 102, 119 | Mic #4, Mic #3 | 81 | 78.5 |
| 120, 113, 104, 110, 117, 102, 119, 115 | Mic #4, Mic #3 | 79.5 | 78.5 |
| 120, 113, 104, 110, 117, 102, 119, 115, 103 | Mic #4, Mic #3 | 79.5 | 77.5 |
| 120, 113, 104, 110, 117, 102, 119, 115, 103, 97 | Mic #4, Mic #3 | 81.5 | 79 |
| 120, 113, 104, 110, 117, 102, 119, 115, 103, 97, 98 | Mic #4, Mic #3 | 84 | 81.5 |
| 120, 113, 104, 110, 117, 102, 119, 115, 103, 97, 98, 118 | Mic #4, Mic #3 | 83 | 79 |
| 120, 113, 104, 110, 117, 102, 119, 115, 103, 97, 98, 118, 106 | Mic #4, Mic #3 | 83 | 79.5 |
| 120, 113, 104, 110, 117, 102, 119, 115, 103, 97, 98, 118, 106, 79 | Mic #4, Mic #3, Mic #2 | 93.5 | 90.5 |
| 120, 113, 104, 110, 117, 102, 119, 115, 103, 97, 98, 118, 106, 79, 122 | Mic #4, Mic #3, Mic #2 | 93.5 | 86.5 |
| 120, 113, 104, 110, 117, 102, 119, 115, 103, 97, 98, 118, 106, 79, 122, 121 | Mic #4, Mic #3, Mic #2 | 93 | 89 |

As Table 3.2 indicates, Microphones #4 and #3 have the largest discriminant measures. If the features are chosen based on the relative entropy ranking alone, the raw dataset and corrupted dataset classification accuracies are worse than those from the filter selection algorithm proposed earlier. This can be attributed to the two drawbacks mentioned in Section 3.2: the multiclass relative entropy rankings can be easily biased by a large difference between individual classes, and sensor redundancy is not utilized. The features for Microphones #3 and #4 indicate a large difference between the baseline conditions, the hydraulic system fault conditions, and the electrical motor fault conditions. As a result, the discriminant rankings for these features are high and the aforementioned faults are classified at a high accuracy. However, the microphones are not able to differentiate the other faults in a reliable manner and cannot be used as the sole sources of features for the classification process. Some other criterion (such as sensor diversity) is required to diagnose different kinds of faults under both ideal and unreliable sensor conditions.

Chapter 4
Classification

4.1. Classification

In the context of pattern recognition, classification refers to the process of categorizing data on the basis of one or more traits. Mathematically, classification can be considered a mapping from a feature space $x$ to a label $y$. Traditionally, there have been three theoretical approaches to classifier design [16]. The most basic approach is to classify data based on similarity. Once there is a good measure of similarity, patterns can be classified on their degree of similarity to existing patterns.
This is the underlying concept of template matching and distance-based classifiers. Another approach is to use posterior probabilities to determine the likelihood of a pattern belonging to a certain category. This is the approach used in Bayes' rule and logistic classifiers. The third approach is to construct decision boundaries that directly minimize classification error criteria. This approach is considered the most powerful and is well suited for dealing with noisy data and high dimensionality feature spaces [16]. In particular, Radial Basis Function Networks and Support Vector Machines have emerged as popular techniques of nonlinear classification due to their excellent classification accuracies and generalization abilities. The following sections introduce the theoretical foundations of these two classifiers.

4.2. Support Vector Machines

Originally introduced by Boser, Guyon and Vapnik in 1992, Support Vector Machines (SVM) are a powerful set of supervised learning methods for solving problems of classification and regression. A geometrical explanation of the SVM algorithm can be given. Specifically, it constructs a hyperplane that maximizes the margin between two classes of data inputs. The following explanations provide the derivation of the nonlinear, least squares classifier used in the present work [33-35].

Linear Binary Classification

To demonstrate the classification problem, consider the case of classifying the following data into two separate classes:

$$\{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$$

where $x_i \in \mathbb{R}^{D}$ is the input data vector and $y_i \in \{-1, +1\}$ is the target or known class for $x_i$.

Assuming the data is linearly separable, a line (for the case $D = 2$) or a hyperplane (for the case $D > 2$) can be drawn such that it separates the data into two different classes (see Fig. 4.1).
The constructed hyperplane has the form

$$w \cdot x + b = 0 \tag{4.1}$$

where $w$ is a normal vector to the hyperplane and $|b| / \|w\|$ is the perpendicular distance from the origin to the hyperplane.

Figure 4.1: Separating Hyperplane.

Given a set of training inputs, the separation into two classes can be described by the following set of conditions:

$$x_i \cdot w + b \geq +1 \quad \text{for } y_i = +1 \tag{4.2}$$

$$x_i \cdot w + b \leq -1 \quad \text{for } y_i = -1 \tag{4.3}$$

Combining these two inequalities into a single condition, we obtain:

$$y_i (x_i \cdot w + b) - 1 \geq 0 \quad \forall i \tag{4.4}$$

We define support vectors as the points that lie closest to the separating hyperplane. Then two planes $H_1$ and $H_2$ are defined (see Fig. 4.2) such that they lie on these support vectors and satisfy the conditions:

$$x_i \cdot w + b = +1 \quad \text{for } H_1$$

$$x_i \cdot w + b = -1 \quad \text{for } H_2$$

Figure 4.2: Support Vector Machine Principle.

The distances from the separating hyperplane to $H_1$ and $H_2$ are represented as two equivalent distances $d_1$ and $d_2$, known as the SVM margin. In order to maximize the distance from the hyperplane to the closest points, it is clear that this margin will have to be maximized. Relating it to Eq. 4.4, the margin is equal to $1 / \|w\|$ and the problem is reformulated as the following optimization problem with constraints:

$$\text{Minimize } \|w\| \quad \text{subject to } y_i (x_i \cdot w + b) - 1 \geq 0 \tag{4.5}$$

Rewriting the problem such that it can be solved by quadratic programming, we obtain the following problem:

$$\text{Minimize } \tfrac{1}{2}\|w\|^{2} \quad \text{subject to } y_i (x_i \cdot w + b) - 1 \geq 0 \tag{4.6}$$

To accommodate the constraints and to ensure that the training inputs are represented as dot products between vectors (see nonlinear classification), a Lagrangian switch is made and the problem is reformulated using Lagrange multipliers $\alpha$, where $\alpha_i \geq 0 \; \forall i$:

$$L_P = \tfrac{1}{2}\|w\|^{2} - \sum_{i} \alpha_i \left[ y_i (x_i \cdot w + b) - 1 \right] = \tfrac{1}{2}\|w\|^{2} - \sum_{i} \alpha_i y_i (x_i \cdot w + b) + \sum_{i} \alpha_i \tag{4.7}$$

To find the solution to the Lagrangian problem, $L_P$ is differentiated with respect to $w$ and $b$, and then the derivatives are set to zero:

$$\frac{\partial L_P}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i} \alpha_i y_i x_i \tag{4.8}$$

$$\frac{\partial L_P}{\partial b} = 0 \;\Rightarrow\; \sum_{i} \alpha_i y_i = 0 \tag{4.9}$$

By replacing the equation for $w$ in the primal form of the Lagrangian $L_P$, the dual Lagrangian form $L_D$ is obtained and is maximized; thus,

$$L_D = \sum_{i} \alpha_i - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j \quad \text{subject to } \sum_{i} \alpha_i y_i = 0 \text{ and } \alpha_i \geq 0 \tag{4.10}$$

The above formulation can be solved using quadratic programming to find $\alpha$, which can then be substituted in Eq. 4.8 to find $w$; $b$ then follows from the support vector conditions on $H_1$ and $H_2$. The resulting classifier has the form:

$$y(x) = \mathrm{sgn}\left[ \sum_{i=1}^{\#SV} \alpha_i y_i \, x_i \cdot x + b \right] \tag{4.11}$$

where index $i$ counts from one to the number of support vectors.

Linear Non-separable Classification

Often, there exist data that are not fully separable for various reasons including incompleteness, unreliability, and noise. To accommodate misclassifications, the margin constraints in Eq. 4.4 can be relaxed by introducing a slack variable $\xi$:

$$y_i (x_i \cdot w + b) - 1 + \xi_i \geq 0 \quad \text{where } \xi_i \geq 0 \; \forall i \tag{4.12}$$

The objective function can be redefined to include this relaxation of constraints:

$$\text{Min } \tfrac{1}{2}\|w\|^{2} + C\sum_{i} \xi_i \quad \text{subject to } y_i (x_i \cdot w + b) - 1 + \xi_i \geq 0 \; \forall i \tag{4.13}$$

where $C$ is a positive real constant that represents the trade-off between slack variable penalty and margin size.

The Lagrangian is reformulated as

$$L_P = \tfrac{1}{2}\|w\|^{2} + C\sum_{i} \xi_i - \sum_{i} \alpha_i \left[ y_i (x_i \cdot w + b) - 1 + \xi_i \right] - \sum_{i} \mu_i \xi_i \tag{4.14}$$

where $\mu_i \geq 0$ are the multipliers that enforce $\xi_i \geq 0$. Setting $\partial L_P / \partial w = 0$, $\partial L_P / \partial b = 0$ and $\partial L_P / \partial \xi_i = 0$, the dual form of the Lagrangian, which can be solved by quadratic programming, is obtained:

$$L_D = \sum_{i} \alpha_i - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j \quad \text{subject to } \sum_{i} \alpha_i y_i = 0 \text{ and } 0 \leq \alpha_i \leq C \tag{4.15}$$

Nonlinear Classification

In 1995, Vapnik was able to extend the linear classification technique to perform nonlinear classification by mapping the input data to a high dimensional space in which it might be separable. To do so, a kernel function is defined such that $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$, where $\varphi(x)$ is a nonlinear mapping from the feature space to the Hilbert space $H$ (see Fig. 4.3). The replacement of $x$ with $\varphi(x)$ and the expression of a kernel as the inner product of $\varphi(x)$ is commonly referred to as the "Kernel Trick." This enables us to work in high dimensional spaces without explicitly performing computations in that space.

Figure 4.3: Nonlinear Mapping $\varphi$.

Many types of kernels can be used to map the input data into a higher space. The choices include:

Linear Kernel: $K(x_i, x_j) = x_i \cdot x_j$

Radial Basis Function: $K(x_i, x_j) = \exp\left( -\dfrac{\|x_i - x_j\|^{2}}{2\sigma^{2}} \right)$

Polynomial Kernel: $K(x_i, x_j) = (x_i \cdot x_j + 1)^{d}$

Multilayer Perceptron Kernel: $K(x_i, x_j) = \tanh(k_1 \, x_i \cdot x_j + k_2)$

Since the RBF kernel is considered a good first choice for many classification applications, it will be used in the present thesis for comparison against the RBFN [25-27]. However, in the present work, the linear and polynomial kernels will also be tested for wrapper feature selection.

A similar procedure to linear classification is followed to obtain the nonlinear classifier. The primal form of the Lagrangian is now written as:

$$L_P = \tfrac{1}{2}\|w\|^{2} + C\sum_{i} \xi_i - \sum_{i} \alpha_i \left[ y_i (\varphi(x_i) \cdot w + b) - 1 + \xi_i \right] - \sum_{i} \mu_i \xi_i \tag{4.16}$$

By setting the primal Lagrangian derivatives to zero, the subsequent dual form is obtained:

$$L_D = \sum_{i} \alpha_i - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{s.t. } \sum_{i} \alpha_i y_i = 0 \text{ and } 0 \leq \alpha_i \leq C \tag{4.17}$$

The resulting classifier has the form:

$$y(x) = \mathrm{sgn}\left[ \sum_{i=1}^{\#SV} \alpha_i y_i K(x_i, x) + b \right] \tag{4.18}$$

Least Squares Classification

Suykens [33] proposed a modification to the original SVM algorithm so that the solution is obtained by solving a linear set of equations rather than using quadratic programming. The optimization problem was redefined as:

$$\text{Min } \tfrac{1}{2}\|w\|^{2} + \frac{\gamma}{2}\sum_{i} e_i^{2} \quad \text{subject to } y_i (\varphi(x_i) \cdot w + b) = 1 - e_i \tag{4.19}$$

The original formulation by Vapnik was modified by changing the inequality constraint in Eq. 4.4 to an equality constraint and using a square loss function for the error variable $e_i$. The primal Lagrangian is written as

$$L_P = \tfrac{1}{2}\|w\|^{2} + \frac{\gamma}{2}\sum_{i} e_i^{2} - \sum_{i} \alpha_i \left[ y_i (\varphi(x_i) \cdot w + b) - 1 + e_i \right] \tag{4.20}$$

Setting the derivatives to zero, $\partial L_P / \partial w = 0$, $\partial L_P / \partial b = 0$, $\partial L_P / \partial e_i = 0$ and $\partial L_P / \partial \alpha_i = 0$, we obtain a linear Karush-Kuhn-Tucker (KKT) system [33]:

$$\begin{bmatrix} 0 & Y^{T} \\ Y & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ 1_v \end{bmatrix} \tag{4.21}$$

where $Y = [y_1, \ldots, y_N]^{T}$, $1_v = [1, \ldots, 1]^{T}$, $\Omega = Z^{T} Z$ with $Z = [y_1 \varphi(x_1), \ldots, y_N \varphi(x_N)]$, and the kernel trick is applied in the $\Omega$ matrix, $\Omega_{ij} = y_i y_j K(x_i, x_j)$.

As seen above, the linear KKT system is square and has a unique solution (for a full rank matrix), which is much easier to solve than the convex optimization problem required by Eq. 4.17. Finally, the classifier takes the form:

$$y(x) = \mathrm{sgn}\left[ \sum_{i=1}^{\#SV} \alpha_i y_i K(x_i, x) + b \right] \tag{4.22}$$

As this formulation shows, there are two parameters that need to be tuned for designing a soft margin, least squares SVM with a radial basis kernel function: the regularization parameter $\gamma$ and the kernel parameter $\sigma$. These parameters are obtained by performing a "leave-one-out" cross validation procedure on the sample data set [34]. In leave-one-out cross validation (see Fig. 4.4), the classifier is trained multiple times with all but one of the training samples. The missing sample is used to test the obtained classifier. The error value will guide the choice of subsequent hyperparameters.

Note: In the wrapper feature selection test, it is very time consuming to cross-validate for each feature subset.
Therefore, the parameters are tuned only once for the neural network feature inputs at the same conditions.

Figure 4.4: Leave-one-out Cross Validation to Find Hyperparameters $\gamma$ and $\sigma$.

4.3. Radial Basis Function Network

Radial basis function networks are a special type of feedforward neural network which uses radial basis functions as the activation function. The basic structure of an RBFN (see Fig. 4.5) consists of an input layer, one hidden layer with a radial basis activation function, and an output layer [32]. The connection weights between the units of the input layer and the units of the hidden layer are all equal to 1. Researchers have shown that an RBFN with a sufficient number of hidden nodes can be used as a universal approximator, and this is a useful feature for fault diagnosis due to the stochastic nature of the feature selection process. The downside of using an RBFN is that some classification problems may require a large number of hidden layer neurons to achieve satisfactory results.

Figure 4.5: Structure of the Radial Basis Function Network (input layer, hidden layer, output layer).

In this structure, there is a nonlinear transformation between the input layer and the hidden layer, and a linear transformation between the hidden layer and the output layer. This allows the input space to be cast nonlinearly into a higher dimension space in which it might be linearly separable. The nonlinear transformation has the following characteristics: it is symmetrical, has a maximum at the center of the activation function, and has positive values that decrease from the center. The resulting output of the function will be bounded and localized as a result. The general form of the RBF function can be written as:

$$g_i(x) = r_i\left( \frac{\|x - v_i\|}{\sigma_i} \right) \tag{4.23}$$

where $x$ is the input vector, $v_i$ is the vector describing the center of function $g_i$, and $\sigma_i$ is the unit width parameter of $g_i$.

A commonly used function for $g_i$ is the Gaussian kernel function described by

$$g_i(x) = \exp\left( -\frac{\|x - v_i\|^{2}}{2\sigma_i^{2}} \right) \tag{4.24}$$

The output of the RBFN with $n$ neurons in the hidden layer and $r$ output units can be described by:

$$o_j(x) = \sum_{i=1}^{n} w_{ij} \, g_i(x), \quad j = 1, \ldots, r \tag{4.25}$$

where $w_{ij}$ is the connection weight between the $i$-th hidden unit and the $j$-th output and $g_i$ is the activation function.

There are three parameters that one can use to train the network once the structure of the network is selected: the center and width (normalization parameter) of the radial function, and the connection weights between the hidden layer and the output layer. The traditional method of finding these parameters is to employ a two stage approach. First, the center and width of the radial function are determined using an unsupervised clustering algorithm (k-means clustering in this case). Then the connection weights are found using a supervised learning algorithm (backpropagation in this case).

4.4. Experimental Results

Support Vector Machines and Radial Basis Function Networks are tested with two feature selection schemes. The feature space is identical to what is used in Section 3.4. There is a candidate feature space of 128 features resulting from 8 sensor signals decomposed into 4 levels each. For the raw sensor data, 300 samples (50 samples from each class) are used as training inputs and 300 samples (50 samples from each class) are used as testing inputs.
For the corruptedsensor data, 300 samples are used for training inputs and 300 samples are used as testing inputs,with every sample of training and testing having one sensor turned off.624.4.1 Filter SelectionThe features selected in section 3.3 for the RBFN are tested for the SVM to compare theclassification accuracies for uncorrupted and corrupted data sets (see Table 4.1).Table 4.1: Filter Feature Selection.NNNNSVMSVMa1 ,a2a3Feature NoZ (%)Z (%)Z (%)Z (°7)90,8,2 117 68 61.5 80 7980, 16,4 110,120 61.5 56.5 69 5770, 24, 6 79, 104, 120 84 79.5 77.5 79.560, 32, 8 40, 79, 96, 104, 120 95 91.5 96 94.555,35,10 10, 32, 48, 51, 79, 88, 104, 113 96 93.5 96.5 9740, 69, 71, 79, 84, 88, 96:98, 102:104,52, 38, 10 106, 110, 113:115, 117:120, 122 99.5 95.5 98 94.552, 32, 18 16, 27, 36, 50, 79, 92, 104, 113 100 99 99 98.5As Table 4.1 shows, the SVM performs better than the neural network approach for the lowerfeature set sizes, but the neural network approach is able to achieve slightly higher classificationaccuracies at larger feature set sizes for both corrupted and uncorrupted data sets.4.4.2 Wrapper SelectionIn addition to testing the classification accuracies of the features selected in section 3.3, awrapper selection process is used to determine the optimal feature set. Wrapper feature selectionaims to maximize the prediction accuracy of the classifier. To do so, a search strategy can beimplemented to search the feature space for candidate features, and the selection of features isdetermined by the classification accuracy of the classifier. For the search strategy, a geneticalgorithm is executed in the same manner as described in section 3.2. However the fitnessfunction is evaluated in the following way: First, the classifier is trained with the training data forthe selected features. Then, the classifier is tested with validation data using the selected features,and the % of data that is accurately classified is calculated. 
The fitness function is defined simplyasF = % accuracy + small size penalty (4.26)There is a small size penalty in the fitness function to ensure that if multiple feature sets give thesame highest classification accuracy, the smallest feature set will be chosen.63Different kernel functions are tested to determine if the choice has an impact on the classificationaccuracy and the size of the obtained feature set. A linear and2ndorder polynomial kernels aretested against the radial basis function kernel under normal and faulty sensor conditions. Theresults are summarized in Table 4.2. The kernel hyperparameters are tuned once with the featuresselected for the artificial neural networks (ANN) and are left constant during the feature selectionalgorithm. In the first column of Table 4.2, the classifier type and the data set are indicated, andthe second and third columns indicate the highest classification accuracy achieved and lowestnumber of features required. Figures 4.6 to 4.9 show the optimization procedure for the RBFNand RBF-kernel SVM.Table 4.2: Wrapper Feature Selection.Classification Features AccuracyRBFN 4 100%RBFN Corrupted 6 100 %RBF-SVM 3 100 %RBF-SVM Corrupted 11 100 %Lin-SVM 20 89.5 %Lin-SVM Corrupted 39 91.5%2 Poly-SVM 22 83.5 %2 Poly-SVM Corrupted 53 79.5 %As Table 4.2 shows, the RBFN and the RBF-SVM provide very similar performance. They areboth able to achieve high classification accuracies while reducing the feature set sizesubstantially for normal sensor data and faulty sensor data. However, the linear kernel andpolynomial kernel yield much worse performances than that from the RBF kernel. They are onlyable to achieve accuracies of 89.5 % and 91.5% for normal and corrupted sensor data,respectively. This could be due to two reasons: one possibility is that these kernels are moresensitive to hyperparameters than the RBF kernel. 
Another possibility is that they do not perform as well as the RBF-kernel SVMs under conditions of noisy data and missing inputs. Also, the training times in order of increasing length were as follows: ANN, RBF-SVM, Linear Kernel SVM, and Polynomial Kernel SVM. The SVM training times were significantly longer than the ANN training time. This observation is not consistent with other research [27]. It could be attributed to differences in how the algorithms are implemented in different software environments.

Figure 4.6: Wrapper Feature Selection for RBFN.
Figure 4.7: Wrapper Feature Selection for RBFN w/ Corrupted Sensor Data.
Figure 4.8: Wrapper Feature Selection for RBF-SVM.
Figure 4.9: Wrapper Feature Selection for RBF-SVM w/ Corrupted Sensor Data.
(Each figure plots best fitness and mean fitness against generation, with a second panel indexed by feature.)

Chapter 5
Conclusions

5.1. Synopsis and Contributions

In this thesis, a multisensor-based condition monitoring scheme was developed and tested on an industrial fish processing machine. Two on-off catastrophic faults, three gearbox faults, and two other partial faults were physically implemented, and sensor faults were simulated, to evaluate the developed methodology. The machine was instrumented with four accelerometers and four microphones to continuously acquire vibration and sound signals.
The signals were represented with the wavelet packet decomposition, and node energies were used to generate a feature vector. Different analyzing wavelets were tested and it was determined that the choice did not significantly impact the pattern recognition process. A simple statistical analysis indicated that the differences between the feature vectors for blade dullness and fish jam were not significant; therefore, these conditions were not included in the data set for pattern recognition.

To improve the classification accuracy and reduce the computational cost, a multi-objective genetic algorithm and tuning procedure were developed, which reduced the dimensionality of the feature space. Experimental tests demonstrated the effectiveness of the scheme and also showed the drawbacks of using the discriminant measure alone as a means of reducing the feature set size. Two classifiers, Radial Basis Function Networks and Support Vector Machines, were introduced and tested using input features from the proposed filter selection scheme and a wrapper feature selection scheme.

The classifiers were tested under conditions of ideal sensor data and corrupted sensor data. The RBF-kernel SVM and the RBFN performed well under both conditions, but the linear-kernel SVM and the polynomial-kernel SVM achieved neither high classification accuracies nor small feature subsets. This could be due to the nature of the data (noisy and incomplete) or to the sensitivity of the classifier to the tuning parameters.
Also, the SVM training procedure took significantly longer than the neural network training procedure.

The main contributions of this work can be summarized as follows:
• Development of condition monitoring instrumentation and software to implement an on-line condition monitoring scheme capable of updating the machine status every three seconds with respect to 6 potential machine defects
• Successful application of wavelet packet decomposition to capture fault signatures of 6 machine conditions
• Development of a feature selection method using genetic algorithms and an associated tuning procedure, with demonstrated advantages over conventional methods
• Comparison of two classification schemes under conditions of healthy sensors and corrupted sensors

5.2. Future Directions

The implemented fault testing could be expanded to include further types and severities of faults, including electrical faults and hydraulic system faults, thus creating a comprehensive fault diagnosis system for the Iron Butcher. In particular, the gear and bearing faults should be investigated with different severities of defects to fully verify the capabilities of the diagnosis system. In the present thesis, the sensor faults were treated as on-off type, which is not always the mode of sensor failure in practice. Different sensor conditions such as saturation and measurement noise may be simulated in the future. Also, the current inability to differentiate between blade sharpness and fish jam conditions needs to be investigated further, possibly leading to research in better sensing methods.

More analysis can be performed on the correlation between the physical faults and the signal processing to optimize the feature generation process. In the present work, it is assumed that the wavelet analysis technique is sufficient for capturing the fault signatures. Comparisons can be made with other techniques such as the FFT, STFT and Hilbert Transform.
Also, further analysis may be performed on the physical interpretation of the fault signatures, rather than relying on pattern recognition methods alone.

The genetic algorithm-based feature selection method may be tested against data sets other than the one used, and the validity of the tuning procedure could be further investigated. The performance may be benchmarked against other feature extraction methods such as Principal Components Analysis and Linear Discriminant Analysis, to test the assumptions made in Chapter 3 about classification accuracy and computational expense.

In the area of classifier design, an effort may be made to further understand the performance benefits of one classifier over another. Currently, all conclusions about classifiers are made empirically, but a better understanding of the classification mechanism will enable one to design classifiers more confidently for condition monitoring applications.

References

[1] C. W. de Silva, Vibration and Shock Handbook. Boca Raton, FL: CRC Press, 2005.
[2] J. Korbicz, J. M. Koscielny, W. C. Cholewa, and Z. Kowalczuk, Fault Diagnosis: Models, Artificial Intelligence, Applications. New York: Springer, 2004.
[3] Consortium of the project Offshore M&R (NNE5/2001/710), Advanced Maintenance and Repair for Offshore Wind Farms using Fault Prediction and Condition Monitoring Techniques. Kassel, Germany: ISET, 2005.
[4] R. C. Luo, C.-C. Yih, and K. L. Su, “Multisensor fusion and integration: Approaches, applications, and future research directions,” IEEE Sensors J., vol. 2, pp. 107-119, Apr. 2002.
[5] H. Lang, Y. Wang, and C. W. de Silva, “An automated industrial fish cutting machine: Control, fault diagnosis and remote monitoring,” in IEEE International Conference on Automation and Logistics (ICAL 2008), pp. 775-780, 2008.
[6] Z. K. Peng and F. L. Chu, “Application of the wavelet transform in machine condition monitoring and fault diagnostics: a review with bibliography,” Mechanical Systems and Signal Processing, vol. 18, pp. 199-221, Mar. 2004.
[7] J. Lin, M. J. Zuo, and K. R. Fyfe, “Mechanical fault detection based on the wavelet denoising technique,” Journal of Vibration and Acoustics, vol. 126, pp. 9-16, 2004.
[8] V. Purushotham, S. Narayanan, and S. A. N. Prasad, “Multi-fault diagnosis of rolling bearing elements using wavelet analysis and hidden Markov model based fault recognition,” NDT E Int., vol. 38, pp. 654-664, Dec. 2005.
[9] R. Rubini and U. Meneghetti, “Application of the envelope and wavelet transform analyses for the diagnosis of incipient faults in ball bearings,” Mechanical Systems and Signal Processing, vol. 15, pp. 287-302, Mar. 2001.
[10] W. J. Wang and P. D. McFadden, “Application of wavelets to gearbox vibration signals for fault detection,” J. Sound Vibrat., vol. 192, pp. 927-939, May 1996.
[11] C. K. Sung, H. M. Tai, and C. W. Chen, “Locating defects of a gear system by the technique of wavelet transform,” Mechanism and Machine Theory, vol. 35, pp. 1169-1182, Aug. 2000.
[12] J. Lin and M. J. Zuo, “Gearbox fault diagnosis using adaptive wavelet filter,” Mechanical Systems and Signal Processing, vol. 17, pp. 1259-1269, Nov. 2003.
[13] K. Shibata, A. Takahashi, and T. Shirai, “Fault diagnosis of rotating machinery through visualization of sound signals,” Mechanical Systems and Signal Processing, vol. 14, pp. 229-241, Mar. 2000.
[14] J. Lin, “Feature extraction of machine sound using wavelet and its application in fault diagnosis,” NDT E Int., vol. 34, pp. 25-30, Jan. 2001.
[15] J. Wu and J. Chan, “Faulted gear identification of a rotating machinery based on wavelet transform and artificial neural network,” Expert Syst. Appl., vol. 36, pp. 8862-8875, Jul. 2009.
[16] A. K. Jain, R. P. W. Duin, and J. Mao, “Statistical pattern recognition: a review,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, pp. 4-37, 2000.
[17] G. G. Yen and K.-C. Lin, “Wavelet packet feature extraction for vibration monitoring,” IEEE Transactions on Industrial Electronics, vol. 47, pp. 650-667, 2000.
[18] N. Saito and R. R. Coifman, “Local discriminant bases and their applications,” Journal of Mathematical Imaging and Vision, vol. 5, no. 4, pp. 337-358, Dec. 1995.
[19] B. Liu and S.-F. Ling, “On the selection of informative wavelets for machinery diagnosis,” Mechanical Systems and Signal Processing, vol. 13, pp. 145-162, Jan. 1999.
[20] R. Tafreshi, “Feature extraction using wavelet analysis with application to machine fault diagnosis,” Ph.D. dissertation, The University of British Columbia, Vancouver, BC, Canada, 2005.
[21] R. Tafreshi, F. Sassani, H. Ahmadi, and G. Dumont, “An Approach for the Construction of Entropy Measure and Energy Map in Machine Fault Diagnosis,” Journal of Vibration and Acoustics, vol. 131, 024501, 2009.
[22] J. Yang and V. Honavar, “Feature Subset Selection Using A Genetic Algorithm,” in Feature Extraction, Construction and Selection: A Data Mining Perspective, pp. 117-136, 1998, second printing, 2001.
[23] L. B. Jack and A. K. Nandi, “Genetic algorithms for feature selection in machine condition monitoring with vibration signals,” IEE Proceedings - Vision, Image and Signal Processing, vol. 147, pp. 205-212, 2000.
[24] G. G. Yen and W. F. Leong, “Fault classification on vibration data with wavelet based feature selection scheme,” ISA Trans., vol. 45, pp. 141-151, Apr. 2006.
[25] L. Jack and A. Nandi, “Support vector machines for detection and characterization of rolling element bearing faults,” Proc. Inst. Mech. Eng. Part C, vol. 215, pp. 1065-1074, 2001.
[26] B. Samanta, “Gear fault detection using artificial neural networks and support vector machines with genetic algorithms,” Mechanical Systems and Signal Processing, vol. 18, pp. 625-644, May 2004.
[27] G. Lv, H. Cheng, H. Zhai, and L. Dong, “Fault diagnosis of power transformer based on multi-layer SVM classifier,” Electr. Power Syst. Res., vol. 75, pp. 9-15, Jul. 2005.
[28] M. Basseville, “Distance Measures for Signal Processing and Pattern Recognition,” Signal Processing, vol. 18, pp. 349-369, 1989.
[29] I. Guyon and A. Elisseeff, “An introduction to variable and feature selection,” J. Machine Learning Res. (Special Issue on Variable and Feature Selection), vol. 3, pp. 1157-1182, 2003.
[30] R. R. Coifman and M. V. Wickerhauser, “Entropy-based algorithms for best basis selection,” IEEE Transactions on Information Theory, vol. 38, pp. 713-718, 1992.
[31] A. Jain and D. Zongker, “Feature selection: evaluation, application, and small sample performance,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, pp. 153-158, 1997.
[32] F. O. Karray and C. W. de Silva, Soft Computing and Intelligent Systems Design. New York, NY: Addison-Wesley, 2004.
[33] J. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, and J. Vandewalle, Least Squares Support Vector Machines. Singapore: World Scientific, 2002.
[34] B. Schölkopf and A. J. Smola, Learning with Kernels. Cambridge, MA: MIT Press, 2001.
[35] C. J. C. Burges, “A Tutorial on Support Vector Machines for Pattern Recognition,” Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121-167, 1998.
[36] P. M. Bentley and J. T. E. McDonnell, “Wavelet transforms: an introduction,” Electronics & Communication Engineering Journal, vol. 6, pp. 175-186, 1994.
[37] R. M. Rao and A. S. Bopardikar, Wavelet Transforms: Introduction to Theory and Applications. Reading, MA: Addison-Wesley, 1998.
[38] A. Jensen and A. la Cour-Harbo, Ripples in Mathematics: The Discrete Wavelet Transform. Berlin: Springer-Verlag, 2001.
Appendix B: Instrumentation

B.1. Signal Acquisition Hardware

This section gives the specifications of the accelerometers, amplifier and DAQ Board used for acquiring vibration signals from the machine.

Table B.1: DAQ Board Specifications.
Analog Input Specification | NI PCI-7833R
Number of Channels | 8 SE / 8 DI
Sampling Rate | 200 kS/s/ch
Resolution | 16 bits
Maximum Voltage Range | -10...10 V
Range Accuracy | 7.78 mV
Range Sensitivity | 0.305 mV
Minimum Voltage Range | -10...10 V
Range Accuracy | 7.78 mV
Range Sensitivity | 0.305 mV
On-Board Memory | 196 kB
I/O Connector | 68-pin VHDCI female

Table B.2: Amplifier Specifications.
Specification | Kistler 5134B
Sensor Excitation Current (mA) | 15 max
Sensor Signal Voltage (V) | 24
Frequency Range (Hz) | 0.1...68000
Output Signal (V) | 10 max
Operating Temperature Range (°F) | 32...140
Width (in) | 2.791
Height (in) | 5.071
Depth (in) | 7.331
Mass (kg) | 1.75

Table B.3: Accelerometer Specifications.
Specification | 8702B25 | 8704B100 | 8728A500 | 8730A500
Range (g) | ±25 | ±100 | ±500 | ±500
Sensitivity (mV/g) | ±200 | ±50 | 10 | 10
Frequency Range (Hz) | 1...8000 | 0.5...10000 | 2...10000 | 2...10000
Resolution (mg rms) | 2 | 6 | 20 | 10
Shock (g) | 2000 | 2000 | 5000 | 5000
Transverse Sensitivity (%) | 1.5 | 1.5 | 1.5 | 1.5
Operating Temperature Range (°F) | -67...212 | -67...212 | -67...248 | -67...2248
Non-linearity (% FSO) | ±1 | ±1 | ±1 | ±1

[Figure: monitoring software screen with a waveform graph of the accelerometer and microphone channels and fault indicators for motor failure, pump failure, gear fault, bearing fault and shaft misalignment.]
Title | Condition monitoring of industrial machines using wavelet packets and intelligent multisensor fusion |
Creator | Raman, Srinivas |
Publisher | University of British Columbia |
Date Issued | 2009 |
Extent | 2775530 bytes |
Genre | Thesis/Dissertation |
Type | Text |
File Format | application/pdf |
Language | eng |
Date Available | 2009-11-18 |
Provider | Vancouver : University of British Columbia Library |
DOI | 10.14288/1.0068285 |
URI | http://hdl.handle.net/2429/15224 |
Degree | Master of Applied Science - MASc |
Program | Mechanical Engineering |
Affiliation | Applied Science, Faculty of; Mechanical Engineering, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2009-11 |
Campus | UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |