UBC Theses and Dissertations


Application of machine health monitoring in design optimization of mechatronic systems Xia, Min 2017


APPLICATION OF MACHINE HEALTH MONITORING IN DESIGN OPTIMIZATION OF MECHATRONIC SYSTEMS

by

Min Xia

B.Eng., Southeast University, 2009
M.Eng., University of Science and Technology of China, 2012

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Mechanical Engineering)

The University of British Columbia
(Vancouver)

September 2017

© Min Xia, 2017

Abstract

Mechatronic systems are widely used in modern manufacturing. The key machinery of a manufacturing system should be reliable, flexible, intelligent, less complex, and cost effective, which indeed are distinguishing features of a mechatronic system. To achieve these goals, continuous or on-demand design improvements should be incorporated rapidly and effectively, which will address new design requirements or resolve existing weaknesses of the original design.

With the advances in sensor technologies, wireless communication, data storage, and data mining, machine health monitoring (MHM) has achieved significant capabilities to monitor the performance of an operating machine. The extensive data from the MHM system can be employed in design improvement of the monitored system. In that context, the present dissertation addresses several challenges in applying MHM in design optimization of a mechatronic system.

First, this dissertation develops a systematic framework for continuous design evolution of a mechatronic system with MHM. Possible design weaknesses of the monitored system are detected using the information from MHM. The proposed method incorporates an index to identify a possible design weakness by evaluating the performance, detecting failures, and estimating the health status of the system.

Second, improved approaches of intelligent machine fault diagnosis (IMFD) that can be applied to more general machinery and faults are presented.
This dissertation develops an IMFD approach based on deep neural networks (DNN). It uses the massive unlabeled MHM data to learn representative features. Using very few items of labeled data, this approach can achieve superior diagnosis performance. The dissertation presents another IMFD approach, which uses convolutional neural networks (CNN) and sensor fusion and has increased diagnosis accuracy and reliability. The end-to-end learning capability of the two approaches enables diagnosis of fault types or machines for which limited prior knowledge is available.

Third, a hierarchical DNN-based method of remaining useful life (RUL) prediction is developed. It achieves high accuracy of RUL prediction by modeling the system degradation on different health stages. This method generates a better estimate of the system RUL, which provides accurate information for the evaluation of system design.

Lay Summary

Reliability, efficiency, and flexibility of a manufacturing system are important in modern industries. Mechatronic systems are widely used as key components. Continuous or on-demand design improvements should be incorporated effectively. Machine health monitoring (MHM) has achieved significant capabilities to monitor the performance of an operating machine. The extensive data that may be acquired can be employed in design improvement of the monitored system. This dissertation addresses several challenges in applying MHM in design optimization of a mechatronic system. First, a systematic and closed-loop framework is developed for continuous design evolution of a mechatronic system with MHM. Second, improved approaches of machine fault diagnosis based on deep neural networks (DNN) and sensor fusion that can be applied to more general machinery and faults are presented.
Third, an improved remaining useful life (RUL) prediction method is developed, which will provide accurate information for the evaluation of system design.

Preface

All the work presented in this dissertation was conducted by Min Xia in the Industrial Automation Laboratory (IAL) at the University of British Columbia, Vancouver campus, under the direct supervision and guidance of Dr. Clarence W. de Silva, Professor of Mechanical Engineering, The University of British Columbia. Dr. de Silva proposed and supervised the overall research project, acquired funding and resources for the project, suggested the topic of the thesis, suggested concepts and methodologies in addressing problems in the topic, provided research facilities in his IAL, and revised the thesis presentation.

Chapter 2 is based on the publications [Min Xia and Clarence W. de Silva, "A Framework of Design Weakness Detection through Machine Health Monitoring for the Evolutionary Design Optimization of Multi-domain Systems," IEEE International Conference in Computer Science & Education (ICCSE), Vancouver, August 22-24, pp. 205-210, 2014.] and [Min Xia, Teng Li, Yunfei Zhang, and Clarence W. de Silva, Closed-loop Design Evolution of Engineering System Using Condition Monitoring through Internet of Things and Cloud Computing, Computer Networks, vol. 101, pp. 5-18, 2016]. Min Xia was responsible for all major areas of concept formation, algorithm development, experiment validation, as well as manuscript composition. Teng Li and Yunfei Zhang were involved in the early stages of concept formation and contributed to manuscript editing.
Clarence W. de Silva was the supervisory author on this work and was involved throughout the project in concept formation and manuscript composition.

A version of Chapter 3 has been published [Min Xia, Teng Li, Lizhi Liu, Lin Xu, Clarence de Silva, An Intelligent Fault Diagnosis Approach with Unsupervised Feature Learning by Stacked Denoising Autoencoder, IET Science, Measurement & Technology, 2017, DOI: 10.1049/iet-smt.2016.0423]. Min Xia was responsible for all major areas of concept formation, algorithm development, experiment validation, as well as manuscript composition. Teng Li was involved in the early stages of concept formation and contributed to manuscript editing. Lizhi Liu and Lin Xu were involved in the early stage of concept formation. Clarence W. de Silva was the supervisory author and was involved throughout the work in concept formation, guidance, and manuscript composition.

A version of Chapter 4 has been published [Min Xia, Teng Li, Lin Xu, Lizhi Liu, Clarence W. de Silva, "Fault Diagnosis for Rotating Machinery Using Multiple Sensors and Convolutional Neural Networks", IEEE/ASME Transactions on Mechatronics, vol. PP, no. 99, pp. 1-1. doi: 10.1109/TMECH.2017.2728371]. Min Xia was responsible for all major areas of concept formation, algorithm development, experiment validation, as well as manuscript composition. Teng Li was involved in the early stages of concept formation and contributed to manuscript editing. Lizhi Liu and Lin Xu were involved in the early stage of concept formation. Clarence W.
de Silva was the supervisory author on this project and was involved throughout the project in concept formation, guidance, and manuscript composition.

A version of Chapter 5, [Min Xia, Teng Li, Lizhi Liu, Lin Xu, Clarence W. de Silva, A Hierarchical DNN-based Approach for RUL Prediction], has been accepted by the 2017 IEEE International Conference on Systems, Man, and Cybernetics. Min Xia was responsible for all major areas of concept formation, algorithm development, experiment validation, as well as manuscript composition. Teng Li was involved in the early stages of concept formation and contributed to manuscript editing. Lizhi Liu and Lin Xu were involved in the early stage of concept formation and algorithm development. Clarence W. de Silva was the supervisory author and was involved throughout the work in concept formation, guidance, and manuscript composition.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Nomenclature
List of Acronyms
Acknowledgments
1 Introduction
  1.1 Motivation
  1.2 Research Objectives
  1.3 Related Work
    1.3.1 Evaluation of Mechatronic Systems
    1.3.2 Evolutionary Design Optimization
    1.3.3 Machine Health Monitoring
    1.3.4 Deep Neural Networks
  1.4 Contributions and Organization of the Dissertation
2 Closed-loop Design Evolution of Mechatronic Systems
  2.1 Introduction
  2.2 System Framework
  2.3 Condition Monitoring through IoT and CC
    2.3.1 Data Transmission
    2.3.2 Data Mining
    2.3.3 Design Knowledge Base through CC
  2.4 Design Weakness Detection
  2.5 Mechatronic Design Quotient
  2.6 Case Study
    2.6.1 Machine Condition Monitoring
    2.6.2 Conceptual Design Stage
    2.6.3 Detailed Design Stage
3 Intelligent Machine Fault Diagnosis using Unsupervised Feature Learning
  3.1 Introduction
  3.2 Unsupervised Feature Learning
    3.2.1 Autoencoder
    3.2.2 Denoising Autoencoder
    3.2.3 Softmax Regression Classifier
  3.3 Feature Learning and Fault Diagnosis using SDA
    3.3.1 Unsupervised Feature Learning with SDA
    3.3.2 Fine-tuning a DNN for Classification
  3.4 Experiment Studies
    3.4.1 Data Description
    3.4.2 Fault Diagnosis using the Proposed Approach
    3.4.3 Effect of the Size of Labeled Data
    3.4.4 Visualization of Learned Features
4 IMFD using Convolutional Neural Networks and Sensor Fusion
  4.1 Introduction
  4.2 Convolutional Neural Networks
    4.2.1 Convolution Layer
    4.2.2 Feature Pooling Layer
    4.2.3 Softmax Layer
    4.2.4 Mini-batch Stochastic Gradient Descent
  4.3 IMFD using CNN and Sensor Fusion
  4.4 Experimental Studies
    4.4.1 Bearing Fault Diagnosis
    4.4.2 Gearbox Fault Diagnosis
5 RUL Prediction using Hierarchical Deep Neural Network
  5.1 Introduction
  5.2 Health Condition Classification using DNN
  5.3 ANN-based RUL Predictor and Smoothing Operator
  5.4 Experimental Study
    5.4.1 Data Description
    5.4.2 RUL Prediction Using DNNRULP
6 Conclusions and Future Work
  6.1 Conclusions
  6.2 Possible Future Work
Bibliography

List of Tables

Table 2.1 Subsystems of the automated fish cutting system
Table 2.2 Specifications of the current subsystems and the estimated design requirements
Table 2.3 Available choices for each subsystem
Table 2.4 Typical interactions between criteria
Table 2.5 MDQ evaluation of design alternatives
Table 3.1 Conditions of the bearing in the dataset
Table 3.2 Classification results with different levels of noise
Table 3.3 Comparison of the proposed approach with SVM and kNN
Table 4.1 Conditions of the bearing in the dataset
Table 4.2 Hyper parameters of the convolution layers
Table 4.3 Hyper parameters of the max-pooling layers
Table 4.4 Diagnosis accuracy of bearing data using multiple sensors and one sensor
Table 4.5 Features selected in the time and frequency domains
Table 4.6 Comparison of bearing fault diagnosis results using different approaches
Table 4.7 Gearbox dataset details
Table 4.8 Diagnosis accuracy of gearbox data using multiple sensors and one sensor
Table 4.9 Comparison of gearbox fault diagnosis results using different approaches
Table 5.1 The operation conditions of the experiment
Table 5.2 Training and testing result of the DNNHSC
Table 5.3 RUL prediction error using the HDNNRULP

List of Figures

Figure 1.1 System framework of design evolution with machine health monitoring.
Figure 2.1 Framework of evolutionary design improvement incorporating MHM.
Figure 2.2 Framework for closed-loop design evolution of an engineering system.
Figure 2.3 Proposed scheme of condition monitoring through the IoT and CC.
Figure 2.4 Procedure of design weakness detection.
Figure 2.5 Schematic diagram of the automated fish cutting system.
Figure 2.6 Reduced conceptual design space.
Figure 2.7 Procedure of evolutionary design with GP.
Figure 3.1 The structure of the autoencoder.
Figure 3.2 Schematic diagram of the procedure of denoising autoencoder.
Figure 3.3 The scheme of the SDA based feature learning and fault diagnosis.
Figure 3.4 The procedure of stacking denoising autoencoder.
Figure 3.5 Fine-tuning of the DNN by minimizing the error in predicting the supervised target.
Figure 3.6 Experiment setup for bearing condition data acquisition.
Figure 3.7 The structure of datasets.
Figure 3.8 Classification results with the proposed approach
Figure 3.9 Effect of the size of labeled data
Figure 3.10 Visualization of learned features
Figure 4.1 Flowchart of the proposed fault diagnosis approach.
Figure 4.2 Architecture of the CNN-based fault diagnosis model
Figure 4.3 Vibration signals of bearings with different conditions from one sensor. (a) BD 0.18 mm. (b) BD 0.36 mm. (c) BD 0.53 mm. (d) OR 0.18 mm. (e) OR 0.36 mm. (f) OR 0.53 mm. (g) IR 0.18 mm. (h) IR 0.36 mm. (i) IR 0.53 mm.
Figure 4.4 Experimental result of bearing dataset. (a) Convergence curve of the training process. (b) Condition classification confusion matrix.
Figure 4.5 Experimental setup for gearbox fault diagnosis. (a) Conveyor system of a fish processing machine; (b) Three faulty gearboxes; (c) Accelerometers mounted on the gearbox; (d) National Instruments PXIe DAQ system.
Figure 4.6 Vibration signals of gearboxes with different conditions from one sensor. (a) Normal condition. (b) Damaged gear. (c) Damaged bearing. (d) Misaligned output shaft.
Figure 4.7 Experimental result of gearbox dataset. (a) Convergence curve of the training process. (b) Condition classification confusion matrix.
Figure 5.1 Degradation process of a bearing. (a) Vibration signal of the run-to-failure process. (b) RMS of the vibration signal.
Figure 5.2 Flowchart of the DNNRULP method.
Figure 5.3 Five stages of the degradation process.
Figure 5.4 Structure of the DNN used in the DNNHSC
Figure 5.5 Flowchart of the health stage classification.
Figure 5.6 Formation of the training samples to each ANNRULP.
Figure 5.7 ANN model for RUL prediction.
Figure 5.8 PRONOSTIA bearing test setup.
Figure 5.9 Vibration signal acquisition scheme
Figure 5.10 RUL prediction result using the HDNNRULP.
Figure 5.11 Comparison of the RUL prediction results.
Nomenclature

$a$  Möbius transform
$b$  bias vector
$B$  bias matrix
$c$  constraint satisfaction function
$C_v$  Choquet integral
$d$  a design alternative
$\eta$  learning rate
$\bar{e}$  average prediction error
$E$  RUL evaluation function
$f$  feed-forward process
$f_\theta$  deterministic mapping
$g_i$  fault indicating function
$g_\theta$  deterministic mapping
$H$  historical run-to-failure data
$J$  cost function of softmax regression
$L$  loss function
$LP$  actual life percentage
$\hat{L}P$  estimated life percentage
$M$  aggregation operator
$\vec{P} = [p_1, p_2, \ldots, p_r]^T$  performance aspects
$q_D$  stochastic mapping
$R$  RUL prediction function
$RMS$  root mean square value
$RUL$  remaining useful life value
$s$  activation function
$S$  performance satisfaction function
$t$  labels of samples
$t$  age value
$\vec{T} = [t_1, t_2, \ldots, t_r]^T$  estimated RUL of components
$v$  fuzzy measure
$\vec{W}$  weighting vector
$W$  weighting matrix
$x_{di}$  partial score of degree of satisfaction of the $i$th criterion

List of Acronyms

AI  artificial intelligence
ANN  artificial neural networks
ANNRULP  ANN-based RUL predictor
BD  ball defect
BGs  Bond Graphs
CC  cloud computing
CNC  Computer Numerical Control
CNN  convolutional neural networks
CWRU  Case Western Reserve University
DB  damaged bearing
DBN  Deep Belief Nets
DFP  Davidon-Fletcher-Powell
DG  damaged gear
DNN  deep neural networks
DNNHSC  DNN-based health stage classifier
DNNRULP  DNN-based RUL prediction
DPCA  dynamic principal component analysis
DWCI  Design Weakness Candidate Index
FFT  fast Fourier transform
GA  genetic algorithm
GP  genetic programming
IAL  Industrial Automation Laboratory
ICA  independent component analysis
IMFD  intelligent machine fault diagnosis
IoT  Internet of things
IPv4  Internet Protocol version 4
IPv6  Internet Protocol version 6
IR  inner race defect
kNN  k-nearest neighbors
LGs  Linear Graphs
MDQ  mechatronic design quotient
MDI  mechatronic design indicator
MHM  machine health monitoring
MOS  misaligned output shaft
NLDOP  nonlinear dynamic optimization problem
OR  outer race defect
PCA  principal component analysis
PLS  partial least squares
ReLU  rectified linear unit
RGB  red-green-blue
RMS  root mean square
RUL  remaining useful life
SAN  Sensor Area Network
SDA  stacked denoising autoencoder
SNR  signal-to-noise ratio
SVM  support vector machine
t-SNE  t-distributed Stochastic Neighbor Embedding

Acknowledgments

I wish to express my sincere gratitude to my supervisor, Dr. Clarence W. de Silva, for his invaluable advice, guidance, strong support, and great patience throughout my academic program and research at the University of British Columbia. I deeply appreciate him offering me the opportunity to pursue my PhD studies at UBC under his supervision. He has provided me countless opportunities and support for my academic advances as well as future career development. Without his advice, guidance, support, encouragement, and supervision, it would not have been possible for me to achieve the academic goals throughout my PhD program.

This research has been supported by several research grants, from the Natural Sciences and Engineering Research Council of Canada, the Canada Foundation for Innovation, the British Columbia Knowledge Development Fund, the Mitacs Accelerate Program, the China Scholarship Council, and the Tier 1 Canada Research Chair in Mechatronics and Industrial Automation held by Dr. Clarence W. de Silva.

I would also like to thank all my colleagues in the Industrial Automation Laboratory at UBC, particularly Dr. Roland Lang, Dr. Yanjun Wang, Dr. Yunfei Zhang, Dr. Muhammad Tufail Khan, Ms. Yu Du, Ms. Lili Meng, Mr. Shan Xiao, Mr. Hani Balkhair, Mr. Shujun Gao, Mr. Teng Li, Mr. Zhuo Chen, Mr. Tongxin Shu, and Mr. Jiahong Chen. I thank all of them for their support, help, and friendship, which made my PhD life less difficult and more joyful.

I would like to express my deepest gratitude to my family.
I thank my parents and parents-in-law for their love and encouragement throughout my PhD journey. I thank my daughter Julie, whose birth was such a great joy during my PhD study. Lastly but most importantly, I wish to thank my wife for her strong support and great love for me throughout my life.

Chapter 1

Introduction

1.1 Motivation

The term "mechatronics" is considered to have been first used in 1969 by engineer Tetsuro Mori, who combined the words "mechanical" and "electronic" to describe the electronic control systems that were developed for mechanical factory equipment at Yaskawa Electric Corp. Since then, the meaning of the term has broadened to include more technical areas and design methods. Some literature defines mechatronics as the synergistic application of mechanics, electronics, control engineering, and computer science in the development of electromechanical products and systems through integrated design [1]. Other literature regards the identity of mechatronics as the "interdisciplinary design methodology which solves primarily mechanically oriented product functions through the synergistic spatial and functional integration of mechanical, electronic, and information processing subsystems" [2]. With wide application in both industry and the consumer market, mechatronic systems have played a significant role in these areas over the past few decades. Most mechatronic systems are multi-domain (or multi-physics) systems [3], which consist of different domains such as mechanical, electrical, hydraulic, pneumatic, thermal, and control. Examples include automated packaging lines [4], industrial robots [5], Computer Numerical Control (CNC) machines, smart home appliances, and aircraft.

A typical mechatronic system consists of a mechanical skeleton, actuators, sensors, controllers, signal conditioning/modification devices, computer/digital hardware and software, interface devices, and power sources [6].
With the rapid development of modern technological applications, mechatronic systems have become increasingly complex and demanding. Due to their complex structure and possible dynamic coupling between different domains, the design of mechatronic systems is a rather complicated and challenging task. Furthermore, the flexibility of a manufacturing system is quite important and advantageous in modern industries, which function in a competitive environment where market deviation and the need for customized products are growing. The machinery of a manufacturing system should be reliable, flexible, intelligent, less complex, and cost effective and should be developed using mechatronic methodologies. Specifically, to achieve these goals, the design methodologies for these systems should be revisited and improved in the context of mechatronics. In particular, continuous or on-demand design improvements may have to be incorporated rapidly and effectively in order to address new design requirements or resolve existing weaknesses of the original design.

First, an integrated and concurrent approach should be considered in the design process, particularly in the conceptual and detailed design phases. In particular, this means that dynamic interactions between domains should be considered concurrently throughout the design process. Recently, in the context of multi-domain design, attention has been given to such subjects as multi-domain modeling, multi-criteria decision making, and evolutionary computing. Because an effective method of system representation (i.e., modeling) is important for a systematic method of design, dynamic modeling approaches such as Bond Graphs (BGs) [7] and Linear Graphs (LGs) [8, 9] have been investigated for the design of multi-domain systems. These modeling methods can describe relationships and interactions between different components and domains of a dynamic system.
Along with modeling methods for multi-domain dynamic systems, several strategies for design optimization have been proposed. In the process of design optimization, a rather challenging task is to satisfy multiple design objectives. Fuzzy integrals have been investigated as a systematic means of aggregating different and sometimes contradictory design objectives. Evolutionary optimization strategies can then be adopted to evolve the design. In particular, genetic programming (GP) [10] has gained much interest in recent years, as it can be extended to evolutionary optimization that satisfies a variety of design specifications. Through GP, a design model of a multi-domain system can be evolved to search for an optimal solution in the design space, in an automated manner. However, for complex mechatronic system design, the design space can be extensive. Thus, it is crucial to detect components with weak designs or parameters among the numerous subsystems and components so that an evolutionary design algorithm can be applied efficiently.

Second, continuous and rapid design improvement should be considered based on the running condition of the engineered system to achieve enhanced requirements in reliability, flexibility, cost, intelligence, and so on. By evaluating the performance of the running system and detecting and diagnosing system malfunctions and failures, design weaknesses of the system can be identified for further improvement. With the development of monitoring techniques, machine health monitoring (MHM) has been conventionally utilized in mechatronic systems as an efficient tool for the diagnosis of system malfunctions and the prognosis of impending failures. Methodologies can be developed to decrease the occurrence of faults and to enhance the reliability of the system.
Researchers have also recognized the potential of MHM systems for facilitating design evolution.

Recently, with the emerging developments of the Internet of things (IoT), especially with the advances in industrial IoT, the data collection from operating machinery is growing exponentially as more and more subsystems and components are communicating with each other or are being monitored. The big data collection offers new opportunities not only for smart and flexible manufacturing but also for the potential of design improvement of existing machinery. It is conceivable that, through the information acquired by an MHM system, some design weaknesses of the monitored mechatronic system can be detected and identified. Then, with the assistance of multi-domain dynamic system modeling and evolutionary optimization, improvements in the design of the system can be prescribed. However, to evaluate a current design and thereby detect possible design weaknesses through the massive amounts of acquired machine condition data, a systematic approach has to be established. Moreover, the traditional machine health diagnosis and prognosis algorithms can only deal with limited categories of machinery. They are not capable of capturing some valuable information in massive amounts of machine condition data. Thus, more general and intelligent machine health diagnosis and prognosis algorithms should be developed with enhanced capability of acquiring useful information from big data so that more subsystems and components can be evaluated.

1.2 Research Objectives

To achieve continuous and rapid design improvement in mechatronic systems, the performance of the existing systems should be evaluated and thereby the existing design weaknesses should be identified. Then, integrated design optimization methods can be carried out more efficiently based on the detected design weaknesses. MHM data from a running system can form valuable feedback for the design improvement process.
The main challenge is how to establish systematic methods to determine the design weaknesses by using massive MHM data. The first main goal of the present research is to develop a systematic framework for the continuous design evolution of a mechatronic system through MHM. In this dissertation, an MHM scheme based on IoT and cloud computing (CC) is developed. The second objective of this dissertation is to develop a formal and systematic approach for the detection of existing weaknesses of a monitored mechatronic system. The method proposed in the present work develops an index to identify a possible design weakness by evaluating the performance of a system and thereby detecting system failures and quantifying the health status of the system. Also, with the collected big data, improved algorithms for fault diagnosis and remaining useful life (RUL) prediction are developed by using deep neural networks (DNN), a cutting-edge learning method that can extract broad knowledge from large amounts of data.

1.3 Related Work

1.3.1 Evaluation of Mechatronic Systems

Using a quantitative evaluation of the performance of a mechatronic system, the competing design solutions can be objectively compared. Different performance specifications and evaluation criteria have been proposed and applied to the design of mechatronic systems. The design methodologies of mechatronic systems have evolved from the sequential approach to the concurrent and integrated approach where the interaction and coupling between different domains are considered. Accordingly, various new methods have been proposed to evaluate a mechatronic system in the process of design evolution.

Moulianitis et al. [11] proposed an approach for the evaluation process of a mechatronic system in the stage of conceptual design. The candidate solutions were evaluated through an evaluation index, which was based on fuzzy set theory.
The evaluation index included three elements: complexity, intelligence, and flexibility, where each element contained several criteria. Weight factors were used to indicate the significance of each criterion. Different t-norms and averaging operators were compared and discussed to calculate the final evaluation score. The method was applied to the conceptual design of robot grippers for handling fabrics, which showed its effectiveness. However, the interactions between criteria were not discussed. Furthermore, only a discrete design space was considered.

De Silva [12] argued that optimal design of the subsystems in a sequential design approach does not guarantee the optimum performance of the overall mechatronic system. He proposed to specify separate performance indices for the subsystems of the mechatronic system. Then a performance measure called the mechatronic design quotient (MDQ) was developed, which is employed to optimize the design by making the indices, with subsystem interactions taken into account, reach the optimal values of the indices in a sequential design. The goal of the mechatronic design is then to find a design solution that maximizes the MDQ.

Based on the concept of MDQ, Behbahani and de Silva [13] developed an integrated and multi-criteria methodology for the conceptual design of mechatronic systems. The controller design issues and parameters were considered concurrently with other issues and parameters. The interactions between criteria were modeled by fuzzy logic criteria. A nonlinear fuzzy integral called the Choquet integral was used for the aggregation of criteria. The methodology was implemented in the design of the manipulator of an industrial machine, which demonstrated the effectiveness of the MDQ approach in the conceptual design stage. In the detailed design stage, MDQ may be employed in a similar manner.

Janschek [2] developed an approach for mechatronic system evaluation using performance metrics based on the deviation of the measured value from a reference value.
System performance based on both deterministic and stochastic models was discussed. The author used p-norms of the performance over an observation time to evaluate the performance under a deterministic model and the three-standard-deviation method to evaluate the performance under a stochastic model. Then, the performance metrics were combined by nonlinear summation, for example, quadratic summation or max-summation. The correlation between metrics was represented by the covariance values.

Hammadi et al. [14] proposed a multicriteria performance indicator, called the mechatronic design indicator (MDI), for evaluating the performance of a mechatronic system in the preliminary design stage. The individual performance metrics were defined and evaluated using model-based simulation. Then a neural network of Gaussian radial basis functions was adopted to calculate the MDI by aggregating the individual performance metrics. The MDI approach was applied to the design of a mechatronic system with subsystems of an electric motor, a PID controller, and a crank-slider mechanism. The result showed that MDI could provide accurate information for decision making. This approach is essentially similar to the previous MDQ approach even though the authors failed to mention that.

Villarreal-Cervantes et al. [15] proposed a concurrent design method for a planar five-revolute, two-degree-of-freedom parallel robot. The design problem was modeled as a nonlinear dynamic optimization problem (NLDOP) where the kinematic and dynamic behaviors were considered simultaneously. They used two optimization techniques, differential evolution and sequential quadratic programming, to solve the NLDOP.
The optimum structure parameters and the optimum PID control gains were obtained by optimizing the cost function considering the positioning accuracy and a manipulability measure.

1.3.2 Evolutionary Design Optimization

In the past decade, evolutionary algorithms have been increasingly used in design optimization due to their explorative capabilities and flexible representations. Considerable literature incorporates evolutionary algorithms, in particular, the genetic algorithm (GA) and genetic programming (GP), to explore the parameters of the design. GA is inspired by the evolutionary process of species and operates on the Darwinian principle of survival of the fittest. The solution is represented by a chromosome structure that consists of a number of genes. A population of trial solutions is randomly generated and evaluated to see how satisfactory each solution is. Then reproduction operators are applied to generate new design solutions from a pair of parent solutions selected from the existing population. The reproduction process continues until the termination criterion has been satisfied.

Grimbleby [16] proposed a new approach to circuit synthesis based on a hybrid GA. The circuit topology was represented by a chromosome that consists of a number of genes. Each gene contained a component type and its terminal nodes. The length of the chromosome was made flexible by introducing an empty component type. When the circuit topology was generated, its fitness was evaluated after numerically optimizing its component values using a quasi-Newton algorithm based on the Davidon-Fletcher-Powell (DFP) method. Circuits generated by the proposed method were efficient and fully met the design specifications.

Affi et al. [17] applied GA for multi-objective optimization of a four-bar mechatronic system where the geometry and the dynamic behavior were considered simultaneously.
The authors compared the concurrent approach with the sequential approach, where they first optimized the geometry of the mechanism for a given path and then optimized the mass distribution to minimize several objective functions including the maximum current used by the motor, its maximum variation, and its fluctuation. The results showed that the GA-based multi-objective optimization approach could realize better design solutions.

GP is an extension of genetic algorithms where the encoding of the solution is a tree of computer programs of varying sizes and shapes instead of a chromosome structure as in a conventional GA. The genetic algorithm then operates on a population of computer programs to find the optimal solution. GP has been successfully applied in design optimization due to its representation flexibility and exploration capability.

Koza et al. [18] applied GP to achieve an automatic synthesis of analog electrical circuits. They established a mapping between the tree structure solution in GP and the electrical circuits. A set of circuit-constructing functions was defined including component-creating functions, topology-modifying functions, and development-controlling functions. The genetic operators were then applied on a population of embryo solutions with the circuit-constructing functions. Both the topology and sizing of the electrical circuits were optimized to automatically create an analog electrical circuit from a high-level statement of the circuit's desired behavior. Their approach was applied in the automatic synthesis of eight prototypical analog circuits.

Inspired by Koza's work, Seo et al. [19] proposed a unified and automated procedure capable of designing mechatronic systems to meet given performance specifications, subject to various constraints. They used BG to model a multi-domain mechatronic system and used GP to explore the design space in achieving an optimal design.
BG is an efficient tool for modeling multi-domain systems in view of its domain independence, free and open-ended composition, and efficiency in classification and analysis of models. Thus, various types of acceptable and feasible candidate designs can be determined. The authors established a mapping between a BG model and a GP tree structure representation. They defined various GP construction functions and terminals for constructing BG models. Their method achieved automatic synthesis of designs for a multi-domain dynamic system. However, their approach did not consider controller parameters.

Wang et al. [20] extended the basic BG/GP approach to achieve conceptual mechatronic system design where controller schemes were also incorporated. Their work focused on the improvement of the BG/GP approach through knowledge interaction with GP by extracting and incorporating the modular design knowledge. Different controller schemes, for example, proportional, proportional plus derivative, proportional plus integral, proportional-integral-derivative, and lead and lag compensators, were represented by combinations of bond graph elements. Complex module functions were defined and used as building blocks to reduce the GP search space. The incorporated design knowledge enhanced the search efficiency and realizability of the generated designs.

Behbahani and de Silva [21] proposed a two-loop evolutionary approach with a hybrid of GA and GP for the design optimization of a mechatronic system. A multi-domain mechatronic system was represented by a BG. GP was used in the outer loop for topology optimization and GA was adopted to find an elite solution for each topology generated by the GP. To avoid the creation of repeated topologies, a memory feature was added to the GP process. Their approach provided a concurrent, integrated, and autonomous tool for topology synthesis of a mechatronic system.
It was applied to the design of analog filters and the controller design of an industrial fish processing machine. Later, the authors improved this approach by introducing a niching scheme [22]. Several elite solutions were generated for different preferences. The designer would be able to make the final decision by evaluating these elites using further domain knowledge or other criteria that could not be easily incorporated in the fitness measure. This approach was more consistent with real design practice. It offered increased flexibility to designers and more chances to achieve a feasible design.

1.3.3 Machine Health Monitoring

The growing demand for high-quality, low-cost and highly customized products requires increasing reliability of machinery in production systems. Machine health monitoring (MHM), also called machine condition monitoring (MCM), has been traditionally used to detect, diagnose and correct system faults [23]. In the past several decades, condition monitoring and fault diagnosis have been widely investigated and applied in a variety of areas such as industrial automation, process control, and anomaly detection and maintenance in the aerospace, automobile, wind turbine, railway and manufacturing sectors [24–27]. Both model-based [28] and data-driven [29] approaches have been developed over this period. The characteristics and application scenarios of different fault detection and diagnosis approaches in complex systems have been reviewed and analyzed in [30] from the perspective of data processing. The high complexity and cost of model-based approaches limit their implementation in real applications [31]. However, with advances in sensor techniques and signal processing, data-driven approaches based on various signals such as vibration, acoustic emission, temperature, pressure, and current have received increased attention in recent years [29].
Traditional data-driven fault diagnosis algorithms primarily involve the following three steps: data acquisition, feature extraction and condition classification [32].

Features of the signals in the time domain, frequency domain, and time-frequency domain have been widely investigated and applied for fault diagnosis. Feature selection techniques such as principal component analysis (PCA), dynamic principal component analysis (DPCA), independent component analysis (ICA), and partial least squares (PLS) are usually applied to decrease the dimension of the feature space while retaining most of the information from the data [33]. Classification algorithms, typically artificial intelligence (AI) approaches, e.g., support vector machine (SVM), artificial neural networks (ANN) and k-nearest neighbors (kNN) [34], are used to classify the conditions. Classifier ensembles have been investigated as well to increase the classification accuracy and robustness [35].

Another significant application area of machine condition monitoring is prognostics and condition-based maintenance. Through the information from condition monitoring of a dynamic system, the remaining useful life (RUL) of the system can be predicted and an appropriate maintenance plan can be established to achieve good performance of the system at a minimum maintenance cost [36]. Data-driven approaches with the integration of AI techniques, e.g., ANN and SVM, have been widely studied in recent years for RUL prediction.

Wu et al. [37] developed an ANN-based decision support system for predictive maintenance of rotating equipment. A three-layer ANN was used for the prediction of the RUL. The root mean square (RMS) value was calculated from the vibration signal of the degradation process. The current and previous RMS values together with the current time value were used as inputs to the ANN. A ten-point moving average was used to smooth the RMS value.

Tian [38] proposed an improved ANN-based method for RUL prediction.
Another hidden layer was added to a typical three-layer ANN to better capture the nonlinearity. The author introduced more measurements as inputs to the ANN. The condition monitoring measurement series for a failure history was fitted by a function generalized from the Weibull failure rate function. The fitted measurement values were used as inputs to the ANN to reduce the effects of the noise factors. This approach achieved better RUL prediction than Wu's method.

With the advances in wireless sensor networks, cloud computing, the Internet of Things and smart factories, the amount of data acquired in condition monitoring has grown exponentially. Such big data present new challenges in data storage, transmission, and processing as well as new opportunities in discovering more useful underlying information in a reliable and effective manner [39]. Machine condition monitoring has shown much potential in identifying weaknesses in engineering systems that can be related to poor design. De Silva [1] and then Gamage and de Silva [40] proposed a framework that can specifically integrate an MHM system and an expert system to carry out design evolution of a multi-domain dynamic system, as shown in Fig. 1.1. The utilization of information from condition monitoring of an engineering system was shown to be useful in design improvement of the system by detecting malfunctions of the system.

Figure 1.1: System framework of design evolution with machine health monitoring.

Besides fault detection, valuable information such as system performance and remaining useful life can be evaluated from the condition monitoring data to detect possible system weaknesses. Xia and de Silva [41] presented a methodology for design weakness detection of an engineering system through MHM.
Using the sensed condition data, system performance evaluation, fault diagnosis, and remaining useful life estimation were performed to identify the weaknesses of the current design.

Still, there are many important and unresolved issues in integrating machine condition monitoring and engineering system design. For instance: (1) the traditional monitoring systems face many challenges related to communication and storage of huge amounts of sensed data; (2) the computing capability of a local computer can limit the analysis and interpretation of the collected big data; (3) how to analyze the data and evaluate the current and future status of the monitored system; (4) how to translate the results into knowledge; (5) how to manage and share knowledge for assisting the design process, especially collaborative design; and (6) how to keep the approach cost effective given the large number of sensors and data acquisition devices that are needed.

1.3.4 Deep Neural Networks

Most of the traditional fault diagnosis approaches rely on manual feature extraction, which requires significant prior knowledge of signal processing and diagnostic expertise. Also, the existing algorithms are meant for specific issues and therefore are case-specific and not meant for general application. In recent years, deep neural networks (DNN) have been investigated and have shown a promising capability of capturing representative features from raw data through multiple nonlinear transformations across their deep structures. DNN have been implemented in many applications such as computer vision, natural language processing, speech recognition, and bioinformatics with outstanding performance compared to the approaches that use traditional manually designed features.

Krizhevsky et al. [42] trained a deep neural network to classify 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into 1000 classes. Their DNN consisted of convolutional layers, max pooling layers, fully connected layers and a softmax layer.
The dropout method was used to reduce overfitting in the fully connected layers. They achieved considerably better classification results compared with the state of the art.

Ruhi et al. applied Deep Belief Nets (DBN) to a natural language call-routing task. The performance of the DBN-based approach was compared with SVM, boosting and maximum entropy. With pre-training of the DBN by unsupervised learning using the additional unlabeled data, the proposed approach achieved better performance than other methods.

Graves et al. [43] proposed a deep recurrent neural network approach with a combination of multiple levels of representation and the flexible use of long-range context, which achieved a lower error rate on the TIMIT phoneme recognition benchmark. Di Lena et al. [44] utilized a DNN architecture for protein contact map prediction, which increased the classification accuracy by more than 10% over the existing approaches.

Due to the superior capability of DNN in feature learning and classification, they have recently attracted the attention of researchers in machine fault diagnosis. Jia et al. [45] presented a five-layer DNN model for diagnosing the faults of rotating machinery. The model was first pre-trained by an unsupervised autoencoder and fine-tuned with the backpropagation algorithm for classification. It achieved excellent diagnosis performance. Sun et al. [46] proposed a sparse autoencoder-based DNN approach for induction motor fault classification. Representative features were learned automatically through a sparse autoencoder and used for fault classification.

1.4 Contributions and Organization of the Dissertation

The main contributions of the dissertation are listed below.

1. A novel closed-loop design evolution framework for mechatronic systems is presented.
Compared with other design evolution methodologies, it can achieve continuous design improvement for mechatronic systems through conceptual design, detailed design, implementation, condition monitoring and design weakness detection. New design requirements or existing design weaknesses can be addressed by the proposed approach. IoT and CC are introduced to address the limitations of the traditional MHM approach in sensing, data transmission, data storage and data processing. A systematic evaluation approach is developed to detect possible design weaknesses, which will guide the redesign by reducing the search space.

2. An intelligent fault diagnosis approach based on a stacked denoising autoencoder (SDA) and DNN is developed to enhance the capability of an MHM system for use with more general and wider categories of mechatronic systems. The typical drawbacks of the traditional fault diagnosis methods are: (1) features are extracted with prior knowledge, (2) a large amount of labeled data is needed, and (3) the diagnosis model has to be rebuilt for diagnosing new conditions. The developed approach overcomes these drawbacks. It can learn representative features automatically from a massive quantity of unlabeled condition data. Only a few items of labeled data are needed to fine-tune the DNN to perform fault diagnosis. In addition, the proposed approach can utilize the trained model to diagnose new conditions by further fine-tuning the DNN with a few items of labeled data of the new conditions. The effectiveness of the developed approach is verified by using a standard dataset of bearing faults. The robustness of the method to noise is evaluated by corrupting the original signal with different levels of noise.

3. A convolutional neural network (CNN)-based approach with multiple sensor fusion is developed for fault diagnosis of rotating machinery. Signals from multiple sensors are fused at the data level to form the input to the model.
Representative features can be learned directly from raw signals by the CNN model, so no hand-crafted features are needed. Dropout is integrated to overcome the problem of overfitting when the size of the training data is small. The effectiveness of the approach is verified by experimental studies in gearbox and bearing fault diagnosis. The comparison between the proposed method and the traditional approaches shows the superior performance of the proposed method. More importantly, the end-to-end feature learning capability of the proposed approach would enable its wide application in fault diagnosis of machinery with limited prior knowledge and limited representative hand-crafted features.

4. A hierarchical DNN-based RUL prediction method is proposed with enhanced prediction accuracy. This method models the degradation process under several health stages. Several ANNs are used to model the degradation in each stage. A DNN-based classifier is trained to achieve the health stage classification using raw monitoring data. The degradation in each stage is then modeled by an ANN using the calculated features. A smoothing operator is applied to the health stage classification result and the RUL predictions of each stage. The experimental results show that the RUL prediction accuracy using the proposed method is superior to that with only one single model.

The organization of the dissertation is as follows:

Chapter 2 presents the proposed closed-loop design evolution approach for continuous and on-demand design improvement of a mechatronic system. An MHM scheme based on IoT and CC is proposed to employ condition monitoring in the design improvement process by evaluating system performance, detecting system failures and estimating the future status of the system.
A design weakness index is proposed to detect possible design weaknesses and reduce the search space for effective use of the evolutionary design optimization.

Chapter 3 presents an improved DNN-based fault diagnosis approach that has the potential for fault classification of more general mechatronic components and systems. An SDA-based DNN is incorporated. Representative features are learned by applying the denoising autoencoder to the unlabeled data in an unsupervised manner. A DNN is then constructed and fine-tuned with a small amount of labeled data. New conditions can be classified by simply fine-tuning the trained DNN model using a small amount of labeled data under the new conditions.

Chapter 4 presents the CNN-based approach for fault diagnosis with data fused from multiple sensors. Data-level sensor fusion is achieved with the CNN structure, where temporal and spatial features can be learned. No hand-crafted features are used because the CNN model can extract representative features directly from raw signals. Fault diagnosis of two widely used machine components, gearboxes and ball bearings, is tested using the CNN-based approach. The results show the superior diagnosis performance of the proposed method compared to the traditional approaches. The developed approach can be extended to fault classification of other machinery with various types of sensors, due to its end-to-end feature learning structure.

Chapter 5 presents the hierarchical DNN-based RUL prediction method. The overall degradation process is segmented into different health stages. A DNN-based classification model is built to achieve the health stage classification. For each stage, an ANN-based RUL predictor is trained using the data samples of that stage.
A smoothing operator is applied to the output of the health stage classifier and the output of each RUL predictor to obtain the final RUL prediction.

Chapter 6 concludes the dissertation by summarizing its main contributions and discussing the direction of possible future research.

Chapter 2

Closed-loop Design Evolution of Mechatronic Systems

2.1 Introduction

Flexibility of a manufacturing system is quite important and advantageous in modern industries, which function in a competitive environment where market deviations, production of small batches, and the need for customized products are common. The key machinery of a manufacturing system should be reliable, flexible, intelligent, less complex, and cost effective. To achieve these goals, the design methodologies for engineering systems should be revisited, improved, and innovated. In particular, continuous or on-demand design improvements have to be incorporated rapidly and effectively in order to address new design requirements or resolve existing weaknesses of the original design. The design of an engineering system, which is typically a multi-domain system, can become complicated due to its complex structure and possible dynamic coupling between physical domains. Hence, an integrated and concurrent approach should be considered in the design process, in particular in both the conceptual and detailed design phases. In the context of multi-domain design, attention has been given recently to such subjects as multi-criteria decision making, multi-domain modeling, evolutionary computing, and genetic programming. More recently, machine health monitoring (MHM) has been considered for integration into a scheme of design evolution even though many challenges exist for this to become a reality.
The challenges include the lack of systematic approaches and the existence of technical barriers in the acquisition, transmission, storage, and mining of very large amounts of condition data.

De Silva [6] proposed a framework of evolutionary design improvement incorporating MHM, as shown in Fig. 2.1. Evolutionary design improvement can be facilitated through online monitoring, in conjunction with a model of the existing system whose design needs to be improved and evolutionary computing techniques, e.g., GP. The MHM system is expected to identify the regions (sites) of possible design weaknesses in the system. This provides "modifiable sites" for the existing system or model for further optimization.

Today, IoT and CC are developing quickly, which offers new opportunities for evolutionary design in such tasks as data acquisition, storage and processing. A framework for the closed-loop design evolution of engineering systems is developed to achieve continuous design improvement using an MHM system assisted by IoT and CC. New design requirements or the detection of design weaknesses of an existing engineering system can be addressed through the proposed framework. A design knowledge base, constructed by integrating design expertise from domain experts, on-line process information from condition monitoring and other design information from various sources, is proposed to realize and supervise the design process to achieve increased efficiency, design speed, and effectiveness.

2.2 System Framework

In this dissertation, a framework is proposed for design evolution of an engineering system through condition monitoring using IoT and CC, as shown in Fig. 2.2. With the help of IoT, condition data are collected at different sites of the engineering system.
The status of the operational system and of the subsystem modules is analyzed through data mining, which involves evaluation of the performance, diagnosis of faults, and estimation of the RUL of the system. Then design weaknesses are detected for the monitored system and provided for consideration in a future stage of design improvement. New design methodologies, available technologies, developed functional modules and their parameters can be collected from domain experts, handbooks, the Internet and other information sources. This information, together with the information from condition monitoring, can be utilized in forming a design knowledge base, which is continuously updated. It can assist the designers in generating an innovative and efficient design solution both in the conceptual design stage and the detailed design stage. In this manner, a closed-loop design improvement process is achieved to accommodate new design requirements or to correct the system design weaknesses.

Figure 2.1: Framework of evolutionary design improvement incorporating MHM.

Figure 2.2: Framework for closed-loop design evolution of an engineering system.

2.3 Condition Monitoring through IoT and CC

A key task of condition monitoring is the acquisition of condition data of an engineering system such as throughput, capacity, speed, torque, power consumption, size, weight, vibration, current, voltage, and other response variables and parameters that can be valuable for evaluating the status of the system. The proposed scheme of condition monitoring through IoT and CC is shown in Fig. 2.3. With the help of IoT, the condition data from various modules of an engineering system, not just at one site but at different and geographically separated sites, can be collected. When a system consists of multiple sites (or sensor nodes), fusion of the condition data from different sites can provide a more reliable estimation of the system performance.
This process includes three main steps: data acquisition and preprocessing, data transmission, and data mining.

Figure 2.3: Proposed scheme of condition monitoring through the IoT and CC.

2.3.1 Data Transmission

A network layer is utilized to transmit the sensed data. Short-range wireless networks such as WiFi, Bluetooth, Zigbee and Sensor Area Network (SAN) are common technologies that support the connection of sensors, devices and users for data transmission. Internet Protocol version 4 (IPv4) and Internet Protocol version 6 (IPv6) are common standards for the transport networks. Data on the system condition (condition data), after preprocessing, are transmitted through the network layer to the cloud database.

2.3.2 Data Mining

From the condition data, three main types of analysis are conducted: (1) system performance evaluation, (2) machine fault detection, and (3) prediction of the RUL.

Acceptable system performance and low cost are two key objectives in the mechatronic design of a system. Factors related to the performance of a mechatronic system include capacity, efficiency, reliability, stability, accuracy, and so on. Requirements of system performance can be extended to integrate other requirements such as controllability and low cost. Here, performance variables are defined by the designers according to the design specifications.

Machine fault detection may be treated as a procedure of mapping the information obtained in the measurement space and/or features in the feature space to machine faults in the fault space [23]. This mapping process may be considered as a procedure of pattern recognition. It is achieved by automatic classification of the signals based on the features extracted from them. AI techniques have been increasingly applied to machine fault diagnosis and have been shown to improve the performance over conventional approaches. Some common AI techniques include expert systems, fuzzy logic, ANN, GA and SVM.
The efficiency of AI techniques has been found to be satisfactory in many case studies.

Prediction of the RUL means the prediction of the time left before a failure would occur given the current and past profile of the machine condition. To perform prognosis, one must have knowledge (or information) of the failure mechanism as well as knowledge (or information) of the fault propagation process. Similar to fault diagnosis, there are three main approaches to prognosis: (1) statistical approaches, (2) AI approaches, and (3) model-based approaches. In particular, statistical approaches such as hidden Markov models and particle filters are widely used in RUL prediction.

2.3.3 Design Knowledge Base through CC

In the proposed framework, the design knowledge base is constructed on a cloud computing platform using the design expertise of domain experts (design methodologies, regularized design, and so on), technical solutions, available function modules with their specifications, design handbooks, data tables, catalogues, the Internet, information from the condition monitoring system (system performance, detected faults, maintenance history, remaining useful life), and so on. The knowledge base is updated in an evolutionary manner through mining of the continuous condition monitoring data, expert input, new technology approaches, newly developed modules and so on, supported by IoT. Collaborative design can be promoted by sharing the knowledge base between the designers from different sites through ubiquitous access to the knowledge base. Through an inference engine, the design knowledge base is utilized to supervise the searching of the design space in both the conceptual and detailed phases by reducing the search space and offering design guidance.

2.4 Design Weakness Detection

As shown in Fig.
2.2, closed-loop design improvement of an engineering system is achieved through design weakness detection using condition monitoring and subsequent conceptual design, detailed design, and implementation of the design improvements. The process of design improvement is carried out under the guidance of a design knowledge base, which is continuously updated.

In the stage of conceptual design, innovative ideas and multi-criteria evaluation are crucial. With a cloud-based design knowledge base and a collaborative design scheme, creative design ideas can be investigated by the design team more easily and efficiently. With the assistance of the design knowledge base, the design space for conceptual design can be structured to form possible conceptual alternatives.

In the detailed design stage, after establishing a dynamic model of the multi-domain system, algorithms that can explore the design space should be applied to achieve the detailed design leading to a desired optimal behavior in the system. Specifically, the model will be modified in some manner so that its behavior approaches the desired behavior, in an optimal manner, as represented by a cost function. For detailed design optimization, GP has been employed to realize an optimal design in an evolutionary manner, so as to satisfy a set of specified design objectives.

Production of the engineering system may be carried out using the outcome of detailed design. The resulting new generation of engineering systems will be implemented in manufacturing systems to achieve the required production performance. Condition monitoring is conducted once the designed system is running. Fig. 2.4 shows the procedure of detecting a design weakness in the current design.

System performance evaluation, fault diagnosis, and prognosis are carried out using the monitored condition data.
Design weakness candidates can be identified using the Design Weakness Candidate Index (DWCI), which is defined as:

$$\mathrm{DWCI} = \vec{W}^{T} \begin{bmatrix} S(\vec{P}) \\ E(\vec{T}) \end{bmatrix} \prod_{i=1}^{f} g_i(\vec{F}) \tag{2.1}$$

where $\vec{W}$ is an $(r+k)$-element column vector with $\sum_{i=1}^{r+k} w_i = 1$, and $S(\vec{P})$ is an $r$-element column vector. Each element $s_i(p_i)$ is a function that shows the degree of satisfaction of the $i$th performance aspect. $\vec{P} = [p_1, p_2, \ldots, p_r]^T$ is a vector consisting of all performance aspects, and $E(\vec{T})$ is a $k$-element column vector. Each element $e_i(t_i)$ is a function that indicates whether the estimated RUL of the $i$th component is close to its designed lifetime, and $\vec{T} = [t_1, t_2, \ldots, t_k]^T$ is a vector consisting of all estimated RUL values of the components. Furthermore, $g_i(\vec{F})$ is a function indicating whether a fault has occurred. It is equal to 0 if a fault has occurred and 1 otherwise. Both $S(\vec{P})$ and $E(\vec{T})$ should take into consideration the situations of over performance and under performance. This means that if the performance or RUL of the system considerably exceeds the designed specification, the situation is wasteful, and those functions should be able to apply a penalty to the DWCI.

Figure 2.4: Procedure of design weakness detection.

According to the definition of DWCI, once a failure is detected, its value will drop to zero. If a fault does not occur and the system is running smoothly, DWCI should also vary smoothly, since $S(\vec{P})$ and $E(\vec{T})$ would change only slightly. If a significant decrease in DWCI is observed in a comparatively short time, this indicates a problem during operation. Therefore, once DWCI drops to zero or decreases significantly, the design weakness candidates will be isolated by checking the components of the index. Then the design weakness candidates will be evaluated first to see if the behavior is related to non-design issues such as inappropriate installation, non-standard operation or poor maintenance. If so, these issues will be corrected and condition monitoring of the system will be continued.
Otherwise, the design weakness will be imported to the design process for design improvement. After the design improvement is made, the redesigned engineering system will be fabricated and put into operation. Condition monitoring of the system is continued to detect further design weaknesses. The process of design evolution thus forms a loop and can offer design improvements continuously.

2.5 Mechatronic Design Quotient

The design of an engineering system is carried out broadly at two levels: the conceptual design, where the type and function of the subsystems are identified and some high-level decisions about the operation of the system are made [47], and the detailed design, where the topology and parameters of the subsystems are specified or tuned [48].

In the conceptual phase, high-level decisions on the system structure and feasible conceptual choices are made according to the design expectation. Conceptual design is rather important in a design process. The design space can be extensive as there can be a variety of possible configurations, and it is not feasible to achieve the best design in one step. In conceptual design, the designer divides the complex design space into several subspaces, evaluates all these subspaces properly, and narrows the design down to one or two subspaces. This offers a less complex search space for the subsequent phase of detailed design. A rather challenging task in design optimization is to concurrently satisfy multiple design objectives. Behbahani and de Silva [13] presented a systematic approach for concurrent and integrated design of a mechatronic system by using the concepts of MDQ. Their approach used MDQ in the evaluation model to facilitate decision making to achieve an optimal conceptual design of a 2-D manipulator. MDQ is an effective tool in multi-criteria design evaluation of mechatronic systems.
It can be utilized to evaluate possible conceptual alternatives in the conceptual design phase. In a design problem of $n$ design criteria and $r$ constraints, MDQ can be written in the following form:

$$\mathrm{MDQ}(d) = M\left[x_1^d, x_2^d, \ldots, x_n^d\right] \prod_{i=1}^{r} c_i(d) \tag{2.2}$$

where $d$ represents a design alternative, $M$ is an aggregation operator, $x_i^d$ is the partial score that shows the degree of satisfaction of the $i$th criterion, and $c_i(d)$ is a function indicating whether a constraint has been met. In particular, $c_i(d)$ is equal to one if the $i$th constraint is met. Otherwise, it is equal to zero.

The evaluation criteria in the MDQ formulation may include "Meeting task requirement", "Complexity", "Reliability", "Matching", "Flexibility", "Control friendliness", "Efficiency", and "Cost" [49]. In practice, the designer may decide to utilize other criteria if they are important for the design problem, or drop some of the above criteria if they are not important in the particular design problem. Some of the criteria may take an analytical form while some others may be qualitative and fuzzy and may involve human perception [50]. A key step is the aggregation of criteria. Interactions can exist between criteria. The traditional aggregation method of weighted average cannot deal with the interaction between criteria and is only suitable when the criteria are independent. Fuzzy measures are used to model the interactions between criteria in many situations [51]. In the discrete case, a fuzzy measure on $N$ is a set function $v : 2^N \to [0,1]$ satisfying

$$v(\emptyset) = 0 \tag{2.3}$$

$$v(N) = 1 \tag{2.4}$$

$$S \subseteq T \Rightarrow v(S) \le v(T) \tag{2.5}$$

For any $S \subseteq N$, $v(S)$ can be interpreted as the weight, or degree of importance, of the combination $S$ of criteria [52]. Several fuzzy integrals have been developed for aggregating the criteria in multi-criteria decision making [53].
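The three conditions in Equations 2.3 to 2.5 can be checked mechanically for a candidate set function. The following Python sketch is an illustration only: the dictionary representation, the function names, and the two-criterion example values are assumptions made here and do not come from the thesis.

```python
from itertools import combinations


def all_subsets(criteria):
    """Yield every subset of `criteria` as a frozenset, including the empty set."""
    items = list(criteria)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def is_fuzzy_measure(v, criteria, tol=1e-12):
    """Check Equations 2.3-2.5 for a set function v given as {frozenset: value}."""
    full = frozenset(criteria)
    subsets = list(all_subsets(criteria))
    if any(s not in v for s in subsets):
        return False  # v must be defined on all 2^n subsets
    if abs(v[frozenset()]) > tol or abs(v[full] - 1.0) > tol:
        return False  # boundary conditions (2.3) and (2.4)
    # Monotonicity (2.5): adding a single criterion must never lower the
    # measure; chaining single additions covers every pair S, T with S in T.
    for s in subsets:
        for c in full - s:
            if v[s] > v[s | {c}] + tol:
                return False
    return True


# Hypothetical measure on two criteria (values are illustrative only).
criteria = ["reliability", "cost"]
v = {
    frozenset(): 0.0,
    frozenset({"reliability"}): 0.6,
    frozenset({"cost"}): 0.3,
    frozenset(criteria): 1.0,
}
print(is_fuzzy_measure(v, criteria))
```

Storing the measure on explicit subsets makes the exponential growth in the number of required values visible, which is the practical difficulty with a general fuzzy measure.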
The Choquet integral has been developed and utilized in many applications of multi-criteria evaluation [54, 55], and can be used for the aggregation of criteria in MDQ.

Specifically, the Choquet integral can be utilized to aggregate the criteria to compute the global score of each alternative using the following equation:

$$C_v(x) := \sum_{i=1}^{n} x_{(i)} \left[ v(A_{(i)}) - v(A_{(i+1)}) \right] \tag{2.6}$$

where $(\cdot)$ indicates a permutation of $N$ such that $x_{(1)} \le \cdots \le x_{(n)}$, $A_{(i)} = \{(i), \ldots, (n)\}$ and $A_{(n+1)} = \emptyset$ [51]. The Choquet integral may also be written in the form:

$$C_v(x) = \sum_{T \subseteq N} a(T) \bigwedge_{i \in T} x_i \tag{2.7}$$

where $\wedge$ denotes the minimum operator and the set function $a : 2^N \to \mathbb{R}$ is the Mobius transform of the fuzzy measure $v$ as given by

$$a(S) = \sum_{T \subseteq S} (-1)^{s-t} v(T) \tag{2.8}$$

where $s = |S|$ and $t = |T|$.

A key problem in using the Choquet integral is that $2^n$ coefficients in $[0,1]$ need to be specified to define the fuzzy measure on every subset of $n$ criteria. This is challenging for the designer and is not practical in real applications. Grabisch [56] suggested considering the 2nd order Choquet integral, which seems to be more practical in real applications. It allows modeling of interaction among criteria while remaining simple and operational. The 2nd order Choquet integral is given by

$$C_v(x) = \sum_{i \in N} a(i)\, x_i + \sum_{\{i,j\} \subseteq N} a(ij)\, (x_i \wedge x_j) \tag{2.9}$$

where $a(i) = v(i)$ and $a(ij) = v(ij) - v(i) - v(j)$, as defined by Equation 2.8.

After the specification of $v(i)$ and $v(ij)$, all $a(i)$ and $a(ij)$ can be calculated. Therefore, instead of $2^n$ coefficients, only $n + \binom{n}{2} = n(n+1)/2$ coefficients are needed.

In the phase of detailed design, first the best topology is determined, for example, system components and their interconnections. Then component details are specified to achieve the best satisfaction of the design requirement. Methods of evolutionary computing, GP in particular, have received much attention in recent years for autonomous topology generation.
Evolutionary algorithms have proved to be effective in assisting designers to search the detailed design space and achieve an optimal design. The loop of GP operation is iterated until a termination condition is reached or a predefined number of iterations is carried out. The design outcome is then further evaluated for practical implementation.

However, many issues are still to be addressed before this approach can be applied in practice for complex mechatronic systems. For example, arbitrary evolution of a design model of a complex system can result in vast computation [19] as well as infeasible outcomes that cannot be implemented in reality [40]. Therefore, it is important to narrow down the search space by detecting possible weaknesses where redesign or design improvement should be conducted.

2.6 Case Study

Now the automated evolution of the reconfiguration and the design of an automated industrial fish cutting system is investigated as a case study. This automated fish cutting system was designed by the Industrial Automation Laboratory of the University of British Columbia and is used in industry to cut the fish head automatically with minimized wastage of fish meat [1, 9]. The conventional machines used in the industry caused about 10%-15% wastage of useful meat, each unit percentage of wastage costing about $5 million annually in the province of British Columbia, Canada [40]. The automated fish cutting system is a multi-domain manufacturing system that consists of mechanical, electrical, hydraulic and pneumatic subsystems [57]. A schematic diagram of the system is shown in Fig. 2.5.
The present case study employs the proposed closed-loop design improvement architecture for the reconfiguration and design evolution of this engineering system, based on a new set of production requirements.

Figure 2.5: Schematic diagram of the automated fish cutting system.

2.6.1 Machine Condition Monitoring

Within the framework proposed in Section 2.2, both real-time condition data of the system (production speed, capacity, power consumption, size, fish weight, waste percentage, malfunction record, etc.) and condition data from similar machines at other fish-processing plants can be acquired by a variety of sensors and then transmitted to the cloud database through the network layer. The sensed condition data provide a rather precise and up-to-date status of the current manufacturing system. Also, the system model, information on available subsystem alternatives (technologies, devices, parameters, cost, etc.) and design expertise are acquired and transmitted to the cloud platform. Then the design knowledge base is formed with this information to assist the design process.

Two key production requirements of the automated fish cutting system are production speed and percent wastage. The performance limit of the original system is 1.5 fish per second with 3.5% wastage of fish meat according to the sensed condition data. In the present case study, new production requirements are given for the design task as 2.0 fish per second with 2% wastage. Since the current system is not capable of achieving the new production requirements, the performance of the system is unsatisfactory. A conceptual redesign of the machine is needed, as new technical solutions for the subsystems are available (e.g., a robotic arm that is more capable and more cost-effective may replace the human operator).

2.6.2 Conceptual Design Stage

First a conceptual design improvement is carried out.
Then the outcome of the improved conceptual design is presented to the detailed design process for exploration of the topologies and parameters of each component. This case study illustrates the closed-loop design improvement framework, with a focus on the detailed steps of the multi-criteria decision making of the conceptual design phase. MDQ is utilized here for the multi-criteria evaluation of the design alternatives. The current automated fish cutting system contains five main subsystems as listed in Table 2.1.

Given the new production requirements, the following steps are followed to realize a conceptual design of the automated fish cutting system to satisfy the production requirements and various constraints.

Step 1: Review the design specifications.
The new production requirements are 2.0 fish per second with a 2% wastage limit for fish meat.

Step 2: Determine the system configuration.
According to the expertise in the design knowledge base, the current configuration (feeding, conveying, vision, positioning table and cutter blade) can achieve the task of removing the fish head.

Step 3: Design specification estimation.
Condition monitoring data provides the current system behavior. Based on the current values, the designer can estimate the performance for each subsystem according to the new production requirements.
Table 2.2 lists the specifications of the current subsystems and the estimated required specifications corresponding to the new design requirements.

Table 2.1: Subsystems of the automated fish cutting system

| Subsystem | Description |
|---|---|
| Feeding | A human operator will place raw fish in the feeding area of the conveyor, ideally at the same pace as the movement of the conveyor to achieve maximum productivity. |
| Conveying | An electromechanical conveying subsystem delivers the raw fish from the feeding area to the cutting area and then off the fish cutting system after removing the head. The conveying subsystem is driven by an AC induction motor. |
| Vision | There are two primary tasks for the vision subsystem: identifying the optimal cutting location and evaluating the cutting quality, particularly related to the wastage of fish meat and the smoothness of the cut. One image is taken for each fish before it enters the cutting area. Image processing is then performed at a local computer to identify the best reference location for the cut. The corresponding coordinates are sent to the controller of the positioning table control system. After the cut, another image is taken to check the quality of the cut and the percent wastage. |
| Positioning Table | This subsystem moves the positioning table that carries the cutting blade accurately and rapidly to the desired cutting location as identified by the vision subsystem. Motion in the horizontal plane, perpendicular to the cutter blade, is controlled to achieve this task. The positioning table is powered by two hydraulic actuators and has two degrees of freedom. |
| Cutter Blade | This subsystem is assembled on the positioning table. The cutter blade is pushed down by a pneumatic cylinder to cut the fish head when a raw fish is brought to the cutting area by the conveyor and the positioning table with the cutter is moved to the correct location. |

Table 2.2: Specifications of the current subsystems and the estimated design requirements

| Subsystem | Parameter | Current | Performance limit | Required |
|---|---|---|---|---|
| Feeding | Feed speed (fish/s) | 1.50 | 1.50 | 2.00 |
| Conveying: AC Motor #0001 | Output speed with desired output torque (rpm) | 2,400 | 6,000 | 4,000 |
| Positioning Table: Hydraulic solution | Motion time (s) | 0.48 | 0.45 | This plus cutting time should be less than 0.50. |
| Positioning Table: Hydraulic solution | Motion accuracy (mm) | 5.00 | 5.00 | 3.00 |
| Cutter Blade: Pneumatic solution | Cutting time (s) | 0.10 | 0.10 | This plus motion time should be less than 0.50. |
| Vision: Camera kit type A | Processing time (s) | 0.55 | 0.55 | 0.50 |

Step 4: Construct the conceptual design space.
For each subsystem, search the knowledge base for available and feasible design alternatives. Table 2.3 lists the available technology choices for each subsystem to achieve its function.

Step 5: Determine the criteria for evaluation.
The criteria of "meeting task requirements", "reliability", "matching", "efficiency", "intelligence", and "cost" are chosen as the MDQ attributes for this problem.

Step 6: Reduce the design search space by veto effect criteria.
Among the six criteria, "meeting task requirements" has veto effect. Use this criterion to eliminate any design alternative that cannot meet the task requirements. From Table 2.3, both alternatives for feeding and both alternatives for the cutter blade can meet the requirements. For the conveying module, the output speed of AC motor #0002 is below the required value. For the positioning table, only the electrical solution can achieve the requirement of motion accuracy.
Camera kit type C can fulfill the required image processing within the time limit. The cropped tree structure of the conceptual design space is shown in Fig. 2.6. Eight design alternatives need to be evaluated through multi-criteria evaluation.

Table 2.3: Available choices for each subsystem

| Subsystem | Available choices | Estimated cost ($) | Parameter | Performance limit |
|---|---|---|---|---|
| Feeding | Two operators | 85,000 | Feed speed | 3.0 (Fish/s) |
| Feeding | Robotic arm | 120,000 | Feed speed | 2.5 (Fish/s) |
| Conveying | AC Motor #0001 | 8,000 | Output speed with desired output torque | 6000 (rpm) |
| Conveying | AC Motor #0002 | 5,000 | Output speed with desired output torque | 3800 (rpm) |
| Conveying | AC Motor #0003 | 6,000 | Output speed with desired output torque | 5000 (rpm) |
| Positioning table | Hydraulic solution | 15,000 | Motion time | 0.45 (s) |
| Positioning table | Hydraulic solution | 15,000 | Motion accuracy | 5.0 (mm) |
| Positioning table | Electrical solution | 18,000 | Motion time | 0.30 (s) |
| Positioning table | Electrical solution | 18,000 | Motion accuracy | 2.0 (mm) |
| Cutter blade | Hydraulic solution | 9,000 | Cutting time | 0.15 (s) |
| Cutter blade | Pneumatic solution | 7,000 | Cutting time | 0.11 (s) |
| Vision | Camera kit type A | 5,000 | Processing time | 0.65 (s) |
| Vision | Camera kit type B | 3,000 | Processing time | 0.80 (s) |
| Vision | Camera kit type C | 7,500 | Processing time | 0.35 (s) |

Step 7: Assign a fuzzy measure to each subset of the criteria.
The remaining five criteria, "reliability", "matching", "efficiency", "intelligence", and "cost", form 2^5 = 32 subsets of criteria. The ordinary Choquet integral therefore needs 32 fuzzy measures (two of them are self-evident: v(∅) = 0 and v(N) = 1). Here, the 2-additive Choquet integral is adopted [52]. Thus, only n(n+1)/2 = 15 fuzzy measures need to be specified. The common types of interactions and their fuzzy measure representation are summarized in Table 2.4.

The fuzzy measures are typically assigned by expert designers. The fuzzy measures used in this case study are obtained from [13] since the engineering system is the same and the criteria are similar. The fuzzy measures are as follows: v1 = 0.25, v2 = 0.35, v3 = 0.22, v4 = 0.18, v5 = 0.15, v12 = 0.52, v13 = 0.45, v14 = 0.50, v15 = 0.52, v23 = 0.50, v24 = 0.48, v25 = 0.60, v34 = 0.45, v35 = 0.50, and v45 = 0.42.
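As a minimal sketch of how Equation 2.9 turns such measures into a global score, the Python function below aggregates a vector of partial scores with the 2-additive Choquet integral. Several things are assumptions made for illustration: the indices 1 to 5 are taken to follow the order in which the criteria are listed in Step 7, the partial scores in `x` are hypothetical, and the function applies Equation 2.9 as written without normalizing the result. It is not intended to reproduce the global scores of Table 2.5, whose evaluation guidelines come from [13].

```python
from itertools import combinations


def choquet_2additive(x, v1, v2):
    """2-additive Choquet integral of Equation 2.9.

    x  : list of partial scores x_i
    v1 : list of singleton measures v(i)
    v2 : dict mapping index pairs (i, j) with i < j to pair measures v(ij)
    """
    n = len(x)
    total = sum(v1[i] * x[i] for i in range(n))       # a(i) = v(i)
    for i, j in combinations(range(n), 2):
        a_ij = v2[(i, j)] - v1[i] - v1[j]             # a(ij) = v(ij) - v(i) - v(j)
        total += a_ij * min(x[i], x[j])               # interaction acts on the minimum
    return total


# Fuzzy measures quoted in Step 7, re-indexed from 0.
# Assumed order: reliability, matching, efficiency, intelligence, cost.
v1 = [0.25, 0.35, 0.22, 0.18, 0.15]
v2 = {
    (0, 1): 0.52, (0, 2): 0.45, (0, 3): 0.50, (0, 4): 0.52,
    (1, 2): 0.50, (1, 3): 0.48, (1, 4): 0.60,
    (2, 3): 0.45, (2, 4): 0.50, (3, 4): 0.42,
}

# Hypothetical partial scores for one design alternative.
x = [0.6, 0.7, 0.8, 0.8, 0.8]
print(choquet_2additive(x, v1, v2))
```

When every pair measure equals the sum of its two singleton measures, all interaction terms vanish and the function reduces to an ordinary weighted sum, which is the independent-criteria case described in Section 2.5.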
These values reflect a negative correlation between cost and other criteria and a small positive correlation between any two criteria except cost.

Figure 2.6: Reduced conceptual design space.

Step 8: Multi-criteria evaluation of each design alternative.
Evaluate each design alternative according to the chosen criteria, except the ones with veto effect, and assign a score to each design alternative. Evaluation guidelines are found in [13]. Aggregate the partial scores by using the 2-additive Choquet integral to determine the global score of each design alternative. Choose the design alternative with the highest global score. The results are shown in Table 2.5.

Table 2.4: Typical interactions between criteria

| Interaction type | Explanation | Relation |
|---|---|---|
| Positive correlation | High score in criterion i implies a high score in criterion j, and vice versa. | $v(ij) < v(i) + v(j)$ |
| Negative correlation | High score in criterion i implies a low score in criterion j, and vice versa. | $v(ij) > v(i) + v(j)$ |
| Substitutiveness | Satisfaction of only one criterion produces almost the same effect as the satisfaction of both. | $v(T) < \{v(T \cup i),\ v(T \cup j)\} \approx v(T \cup ij)$, $T \subseteq N \setminus ij$ |
| Complementarity | Satisfaction of only one criterion produces a very weak effect compared with the satisfaction of both. | $v(T) \approx \{v(T \cup i),\ v(T \cup j)\} < v(T \cup ij)$, $T \subseteq N \setminus ij$ |

Table 2.5: MDQ evaluation of design alternatives

| | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 |
|---|---|---|---|---|---|---|---|---|
| Feeding module | Human operator | Human operator | Human operator | Human operator | Robotic arm | Robotic arm | Robotic arm | Robotic arm |
| Conveyor | Motor #0001 | Motor #0001 | Motor #0003 | Motor #0003 | Motor #0001 | Motor #0001 | Motor #0003 | Motor #0003 |
| Positioning table | Electrical solution | Electrical solution | Electrical solution | Electrical solution | Electrical solution | Electrical solution | Electrical solution | Electrical solution |
| Cutter blade | Hydraulic solution | Pneumatic solution | Hydraulic solution | Pneumatic solution | Hydraulic solution | Pneumatic solution | Hydraulic solution | Pneumatic solution |
| Vision | Type C | Type C | Type C | Type C | Type C | Type C | Type C | Type C |
| Matching | 0.70 | 0.60 | 0.70 | 0.60 | 0.80 | 0.70 | 0.80 | 0.70 |
| Reliability | 0.50 | 0.40 | 0.50 | 0.40 | 0.50 | 0.60 | 0.50 | 0.60 |
| Intelligence | 0.70 | 0.60 | 0.70 | 0.60 | 0.90 | 0.80 | 0.90 | 0.80 |
| Efficiency | 0.50 | 0.60 | 0.60 | 0.70 | 0.60 | 0.70 | 0.70 | 0.80 |
| Cost | 0.85 | 0.90 | 0.90 | 0.95 | 0.70 | 0.75 | 0.75 | 0.80 |
| Global score | 0.621 | 0.589 | 0.651 | 0.624 | 0.657 | 0.694 | 0.693 | 0.729 |

The best conceptual design of the automated fish cutting system is design alternative No. 8, which, as listed in Table 2.5, corresponds to a conceptual design solution of Robotic arm (feeding module), Motor #0003 (conveying), Electrical solution (positioning table), Pneumatic solution (cutter blade), and Type C camera kit (vision module). In the case where two or more alternatives have very close global scores, designers can include more features and evaluate the alternatives further.

2.6.3 Detailed Design Stage

After the conceptual design process, detailed design is conducted to specify the topology and tune the parameters to achieve the desired design requirements. GA and GP can be utilized to explore the detailed design space and find the optimal design in the design space. The procedure of evolutionary design with GP is shown in Fig. 2.7.

GP evolution for a complex engineering system with numerous subsystems and components can be computationally expensive. A CC platform is utilized here as well to execute the GP computation and find the optimal detailed design with the specific topologies and parameters for each subsystem of the fish cutting machine. The design solution is further evaluated and then physically realized. The redesigned fish cutting machine is then put into production.

Figure 2.7: Procedure of evolutionary design with GP.

Chapter 3
Intelligent Machine Fault Diagnosis using Unsupervised Feature Learning

3.1 Introduction

As introduced in Chapter 2, machine fault diagnosis is a key aspect in the evaluation of the design of an existing system. Due to the advantages of data-driven approaches and the availability of large amounts of condition monitoring data, data-driven approaches of fault diagnosis are considered in this study.
Most of the existing data-driven methods have two main deficiencies: (1) Feature extraction needs the specification of representative and robust features by domain experts. This process requires prior knowledge of signal processing and a comprehensive knowledge of the specific problem. For instance, in the fault diagnosis of commonly found rotating machinery, numerous methods have been developed in different domains (e.g., time domain, frequency domain, and time-frequency domain) [58–60]. However, they are only effective for specific types of machinery and hence are case-specific. (2) Most data-driven algorithms use supervised machine learning techniques, which require large amounts of labeled training data. However, it is difficult to collect labeled data that ensure training of a good model. In real applications, a large portion of the data collected from a condition monitoring system is unlabeled [61].

Clearly, it is desirable to develop rather intelligent fault diagnosis methodologies that can utilize massive amounts of unlabeled condition monitoring data to learn good features without requiring extensive prior or domain knowledge. Recently, DNNs have gained much attention and have achieved great progress in many areas, e.g., computer vision, natural language processing, speech recognition and bioinformatics [62]. With its deep and comprehensive structure, a DNN with unsupervised learning is able to learn useful features from unlabeled raw data [63]. With further fine-tuning using labeled data, a DNN has been shown to achieve remarkable performance in many classification tasks. In the field of fault diagnosis, DNN has been investigated only recently. Tao et al. [64] applied a stacked autoencoder and softmax regression in bearing fault diagnosis, leading to high diagnosis accuracy. Feng et al. developed a deeper DNN together with autoencoder and softmax regression to achieve improved classification results [45].
In both approaches, the fine-tuning process used all the labeled training data to achieve satisfactory classification accuracy. In reality, however, it is difficult to acquire the needed amount of labeled data for fine-tuning the model. In addition, the basic autoencoder has the disadvantage that it cannot guarantee the extraction of useful features [65]. DNNs have shown a promising capability for generalized learning: they can classify new classes by slightly adjusting the trained model, without training from scratch, since the hidden layers contain representative features already extracted from the training data. This is beneficial in many real applications since training of a DNN model from scratch is computationally expensive and time-consuming.

Inspired by prior research, this chapter presents a novel approach of intelligent machine fault diagnosis (IMFD) that addresses the deficiencies of the existing methods. Representative features are extracted automatically from unlabeled condition monitoring data by SDA in an unsupervised manner. By stacking the trained denoising autoencoders, a DNN is constructed to perform intelligent fault diagnosis after fine-tuning the model with a few items of available labeled data. In this approach, a massive amount of easily accessible unlabeled condition monitoring data is utilized to learn useful and robust features. Only a few items of labeled data are needed, which is an advantage in a practical application. In addition, after further fine-tuning of the trained DNN, newly occurring conditions can be correctly classified by the proposed method. Rotating machinery is used as an example to demonstrate the proposed method. Due to the end-to-end learning capability, this method can be applied to fault diagnosis of other systems using corresponding sensed data.

3.2 Unsupervised Feature Learning

A DNN has multiple layers, each having a number of nodes with nonlinear transformations.
With its deep architecture, a DNN is capable of extracting the discriminative features from the input data through a large number of linear and nonlinear transformations [63]. Unsupervised learning as in the autoencoder can learn representative features from unlabeled raw data [66]. Unsupervised learning is effective in intelligent fault diagnosis since it does not necessarily use labeled condition monitoring data, which are hard to acquire. Stacked denoising autoencoder (SDA), an improved version of the autoencoder, is utilized in the approach proposed in the present work to learn features from the condition signals of the monitored system.

3.2.1 Autoencoder

The autoencoder is a three-layer neural network that reconstructs the input data in the output layer [67]. It seeks to reconstruct the original input by minimizing the reconstruction error in an unsupervised manner. Fig. 3.1 shows the structure of the autoencoder.

Figure 3.1: The structure of the autoencoder.

The autoencoder uses a deterministic mapping $f_\theta$ with parameters $\theta = \{W, b\}$ to map the input vector $x \in \mathbb{R}^{d}$ to a hidden representation $y \in \mathbb{R}^{d'}$ according to:

$$y = f_\theta(x) = s(Wx + b) \tag{3.1}$$

where $s$ is the activation function, $W$ is a $d' \times d$ weighting matrix, and $b$ is a bias vector. Then the representation $y$ is mapped back to a reconstructed vector $z \in \mathbb{R}^{d}$ by:

$$z = g_{\theta'}(y) = s(W'y + b') \tag{3.2}$$

where $\theta' = \{W', b'\}$. The autoencoder is said to have tied weights if the weight matrix $W'$ of the reverse mapping is chosen to be $W' = W^T$. The parameters of the model are optimized by minimizing the average reconstruction error as given by:

$$\theta^*, \theta'^* = \arg\min_{\theta, \theta'} \frac{1}{n} \sum_{i=1}^{n} L\left(x^{(i)}, z^{(i)}\right) = \arg\min_{\theta, \theta'} \frac{1}{n} \sum_{i=1}^{n} L\left(x^{(i)}, g_{\theta'}\left(f_\theta\left(x^{(i)}\right)\right)\right) \tag{3.3}$$

where $L$ is a loss function such as the squared error $L(x, z) = \|x - z\|^2$.

The autoencoder is introduced to initialize a DNN using the representation of the $k$th layer as the input for the $(k+1)$th layer [68]. After each layer has been initialized, these layers are stacked to form the DNN.
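The mappings of Equations 3.1 and 3.2 and the objective of Equation 3.3 can be written down in a few lines. The NumPy sketch below is illustrative only: the sigmoid activation, the layer sizes, the random stand-in data, and the use of tied weights are assumptions chosen for the example, and no training step is included.

```python
import numpy as np


def sigmoid(a):
    """Logistic activation, used here as the function s."""
    return 1.0 / (1.0 + np.exp(-a))


def encode(x, W, b):
    """Equation 3.1: y = s(W x + b), with W of shape (d', d)."""
    return sigmoid(W @ x + b)


def decode(y, W_prime, b_prime):
    """Equation 3.2: z = s(W' y + b'), with W' of shape (d, d')."""
    return sigmoid(W_prime @ y + b_prime)


def mean_reconstruction_error(X, W, b, W_prime, b_prime):
    """Objective of Equation 3.3 with squared-error loss, averaged over the rows of X."""
    errors = [
        np.sum((x - decode(encode(x, W, b), W_prime, b_prime)) ** 2) for x in X
    ]
    return float(np.mean(errors))


rng = np.random.default_rng(0)
d, d_hidden, n = 8, 3, 5                      # illustrative sizes only
X = rng.random((n, d))                        # stand-in for n input vectors in R^d
W = rng.normal(scale=0.1, size=(d_hidden, d))
b = np.zeros(d_hidden)
W_prime = W.T                                 # tied weights: W' = W^T
b_prime = np.zeros(d)

print(mean_reconstruction_error(X, W, b, W_prime, b_prime))
```

Training would adjust `W`, `b`, and `b_prime` to reduce this quantity, for example by gradient descent.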
Supervised training is then conducted to fine-tune the network [69]. The layer-wise initialization has shown significant improvement in achieving better local minima than random initialization of deep networks. However, the autoencoder may not guarantee the extraction of useful features, as it can lead to the obvious solution that simply copies the input, or to uninteresting solutions that trivially maximize mutual information.

3.2.2 Denoising Autoencoder

The denoising autoencoder is an extension of the basic autoencoder, and aims at learning more suitable and robust representations to initialize a deep network [70]. A denoising autoencoder is trained to reconstruct a clean, repaired input from a corrupted version of it. First, the initial input $x$ is corrupted into $\tilde{x}$ through the stochastic mapping $\tilde{x} \sim q_D(\tilde{x} \mid x)$. The basic autoencoder is then used to map the corrupted input $\tilde{x}$ to a hidden representation by

$$y = f_\theta(\tilde{x}) = s(W\tilde{x} + b) \tag{3.4}$$

Subsequently, the reconstruction is done by the reverse mapping $z = g_{\theta'}(y)$. The squared-error loss function $L_H(x, z) = \|x - z\|^2$ is minimized by updating the parameters. A schematic representation of the procedure of the denoising autoencoder is shown in Fig. 3.2.

Figure 3.2: Schematic diagram of the procedure of denoising autoencoder.

The parameters $\theta$ and $\theta'$ are updated through a training set by minimizing the average reconstruction error between $z$ and the uncorrupted input $x$, which is the same as for the basic autoencoder. The difference between the denoising autoencoder and the basic autoencoder is that $z$ here is a deterministic function of $\tilde{x}$ rather than $x$. It thus encourages the learning of a more clever mapping than the identity: one that extracts features useful for denoising [71]. The layer-wise procedure is the same as that of the basic autoencoder. The input corruption is used only for the training of each layer to learn useful representations.
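A minimal sketch of the corruption step and the denoising loss follows. It assumes masking noise as the corruption process $q_D$ (a random fraction of the input elements set to zero), a sigmoid activation, and tied weights; the masking fraction, sizes, and data are hypothetical and chosen only to make the example runnable.

```python
import numpy as np


def mask_corrupt(x, fraction, rng):
    """Return a copy of x with a random `fraction` of its elements set to zero."""
    x_tilde = x.copy()
    n_masked = int(round(fraction * x.size))
    idx = rng.choice(x.size, size=n_masked, replace=False)
    x_tilde[idx] = 0.0
    return x_tilde


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def denoising_loss(x, W, b, W_prime, b_prime, fraction, rng):
    """Squared error between the clean x and the reconstruction of its corrupted copy."""
    x_tilde = mask_corrupt(x, fraction, rng)
    y = sigmoid(W @ x_tilde + b)            # Equation 3.4: encode the corrupted input
    z = sigmoid(W_prime @ y + b_prime)      # reverse mapping
    return float(np.sum((x - z) ** 2))      # compared with the uncorrupted input


rng = np.random.default_rng(0)
x = rng.random(10) + 0.1                    # strictly positive stand-in input
x_tilde = mask_corrupt(x, 0.3, rng)         # 30% masking is an illustrative choice
W = rng.normal(scale=0.1, size=(4, 10))
loss = denoising_loss(x, W, np.zeros(4), W.T, np.zeros(10), 0.3, rng)
print(np.count_nonzero(x_tilde == 0.0), loss)
```

The loss is always measured against the clean `x`, never against `x_tilde`, which is what distinguishes this objective from that of the basic autoencoder.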
After the mapping $f_\theta$ is learned, uncorrupted inputs are used to produce a representation that will serve as the clean input to the following layer. Different types of corruption processes may be considered, such as additive isotropic Gaussian noise, salt-and-pepper noise and masking noise [70]. Masking noise is used in this work, where a fraction of the elements of $x$ (chosen at random for each sample) is forced to zero.

3.2.3 Softmax Regression Classifier

The softmax regression is a generalized form of logistic regression, which is often used for multiclass classification [72]. Given a $k$-class input training set with $n$ samples $\{x^{(i)}\}_{i=1}^{n}$ where $x^{(i)} \in \mathbb{R}^{m}$ and its label set $\{t^{(i)}\}_{i=1}^{n}$ where $t^{(i)} \in \{1, 2, \ldots, k\}$, softmax regression estimates the probability of each input sample belonging to each class. The probability is given by

$$P\left(t^{(i)} = j \mid x^{(i)}; \theta\right) = \frac{1}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}} \begin{bmatrix} e^{\theta_1^T x^{(i)}} \\ e^{\theta_2^T x^{(i)}} \\ \vdots \\ e^{\theta_k^T x^{(i)}} \end{bmatrix} \tag{3.5}$$

where $j = 1, 2, \ldots, k$, $\theta = [\theta_1, \theta_2, \cdots, \theta_k]$ are the parameters of the softmax regression model, and $1 / \sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}$ normalizes the distribution so that the probabilities sum to one.

The cost function of softmax regression is defined as:

$$J(\theta) = -\frac{1}{n} \left[ \sum_{i=1}^{n} \sum_{j=1}^{k} 1\{t^{(i)} = j\} \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}} \right] \tag{3.6}$$

where $1\{\cdot\}$ is the indicator function, which returns 1 if the condition is true and 0 otherwise. The parameters are updated by minimizing the cost function $J(\theta)$ over the training dataset. In this work, softmax regression is used to generate the classes of input data.

3.3 Feature Learning and Fault Diagnosis using SDA

This dissertation proposes an intelligent fault diagnosis approach based on SDA and DNN. It learns representations from a massive set of unlabeled condition data, e.g., vibration signals of a bearing, gearbox or induction motor with different conditions, by unsupervised learning using SDA. The spectra of the vibration signals are used as the input data in this work.
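One common way to obtain such a spectrum is the fast Fourier transform of the sampled time signal. The NumPy sketch below is an illustration under stated assumptions: the sampling rate, the synthetic two-tone signal, and the single-sided amplitude scaling are choices made here for the example and are not taken from the thesis.

```python
import numpy as np


def amplitude_spectrum(signal, fs):
    """Return (frequencies, single-sided amplitude spectrum) of a real-valued signal."""
    n = len(signal)
    spectrum = np.abs(np.fft.rfft(signal)) / n
    # Double the bins that have a mirrored negative-frequency partner.
    # The DC bin, and the Nyquist bin when n is even, have no partner.
    if n % 2 == 0:
        spectrum[1:-1] *= 2.0
    else:
        spectrum[1:] *= 2.0
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, spectrum


fs = 1000                                   # hypothetical sampling rate in Hz
t = np.arange(fs) / fs                      # one second of samples
# Synthetic stand-in for a vibration signal: two sinusoidal components.
signal = 1.0 * np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

freqs, spec = amplitude_spectrum(signal, fs)
print(freqs[np.argmax(spec)])               # frequency of the largest component
```

The resulting vector `spec` plays the role of the input $x$ in the feature learning described in this chapter.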
This method can automatically achieve machine condition classification after fine-tuning the model with very few labeled condition data. In addition, new conditions can also be classified after further fine-tuning of the trained model with a few labeled observations of the new conditions. The scheme of the proposed approach is shown in Fig. 3.3, with the following main steps.

Step 1: Data acquisition and pre-processing
Condition monitoring data can be acquired through various sensors, such as accelerometers, pressure sensors, thermometers, and cameras, depending on the characteristics of the monitored system. In this work, vibration signals are used and the spectrum of the time signal is obtained by applying the fast Fourier transform (FFT).

Step 2: Initialization of the DNN
A DNN with N hidden layers is initialized with random parameters. The number of layers can be increased until the model starts to overfit the training data, as suggested in [73]. The number of nodes in the first layer can be set equal to or slightly larger than the dimension of the input data. The number of nodes in the subsequent layers is usually decreased gradually.

Step 3: Unsupervised feature learning
The denoising autoencoder is applied here to learn representative features from the unlabeled data, as described in Section 3.2.2. Layer-wise training is carried out before the layers are stacked.

Step 4: Supervised fine-tuning of the model
After Step 3, a softmax layer is added on top of the DNN. Labeled condition data is used to fine-tune the parameters of the DNN by stochastic gradient descent.

Step 5: Fault diagnosis with the trained DNN
With the trained DNN, fault diagnosis can be carried out on the acquired condition data.

Step 6: Further fine-tuning and fault diagnosis of new conditions
As new conditions occur, the trained model can be further fine-tuned with a few labeled condition data of the new conditions.
The further fine-tuned model can diagnose the machine conditions, including the new ones.

Figure 3.3: The scheme of the SDA based feature learning and fault diagnosis.

3.3.1 Unsupervised Feature Learning with SDA

First, a denoising autoencoder is used to learn representations from massive unlabeled condition data. A DNN with N hidden layers can be pre-trained by stacking N denoising autoencoders. Consider an unlabeled dataset $\{x^{(i)}\}_{i=1}^{n}$, where $x^{(i)} \in R^m$. Each $x^{(i)}$ can be the Fourier transform of a vibration signal of a running machine (e.g., a bearing, gearbox, or motor) under different conditions, including normal and various faulty conditions. The parameters of the first autoencoder, $\theta_1 = \{W_1, b_1\}$, are randomly initialized and then trained following the procedure shown in Fig. 3.2, by minimizing the reconstruction error between the input data $x^{(i)}$ and the reconstructed data $z^{(i)}$ in Equation 3.3. The sigmoid function is used here as the activation function. The trained parameters of the first encoder, $\theta_1$, are stored for initialization of the first hidden layer of the deep network. The learned encoding function $f_\theta^{(1)}$ is then applied to the uncorrupted input. The resulting representation is used as the input to train the next-level denoising autoencoder, which yields the next-level encoding function $f_\theta^{(2)}$. The trained parameters of the next encoder are used for initialization of the next hidden layer. This procedure is repeated, as shown in Fig. 3.4, until the parameters of all hidden layers of the deep network are updated.

Figure 3.4: The procedure of stacking denoising autoencoder.

3.3.2 Fine-tuning a DNN for Classification

In this work, after the hidden layers are trained and stacked by the denoising autoencoder, an output layer (the softmax regression layer) is added on top of the stack. A small amount of labeled data is then used to fine-tune the parameters of the entire network by minimizing the error in the predicted conditions. Fig. 3.5 shows the fine-tuning procedure.
A backpropagation algorithm based on stochastic gradient descent is used to update the parameters of all hidden layers by minimizing the error between the predicted label and the actual label of the labeled dataset. The parameters W and B are updated as:

$$W_l = W_l - \eta \frac{\partial}{\partial W_l} J(W, B; X, t) \tag{3.7}$$

$$B_l = B_l - \eta \frac{\partial}{\partial B_l} J(W, B; X, t) \tag{3.8}$$

where $W_l$ and $B_l$ are the weights and biases, respectively, of the l-th layer; $\eta$ is the learning rate; $(X, t)$ is a batch containing m training samples; X are the spectra of the vibration signals; and t are the corresponding labels.

After fine-tuning, the deep network is capable of classifying conditions in the current condition space. In order to diagnose new conditions, another small amount of labeled data of the corresponding new conditions is combined with the fine-tuning dataset. With the new fine-tuning dataset, the parameters of the DNN can be further fine-tuned with the same procedure as shown in Fig. 3.5 to achieve fault diagnosis in the new condition space. The proposed approach enables effective feature learning and diagnosis with new conditions by simply fine-tuning the DNN with a few labeled data, rather than training from scratch, which needs a large amount of training data and computational resources.

Figure 3.5: Fine-tuning of the DNN by minimizing the error in predicting the supervised target.

3.4 Experiment Studies

This section presents a case study on fault diagnosis of a motor bearing unit. It serves to illustrate the application of the methodology developed in this work.

3.4.1 Data Description

In the present case study, the dataset of motor bearing vibration signals from Case Western Reserve University (CWRU) is analyzed [74]. The experimental setup includes a 2 hp motor, a torque transducer, and a dynamometer, shown in Fig.
3.6. Vibration data has been collected from an accelerometer at the drive end bearing under different conditions: normal condition, ball defect (BD), outer race defect (OR), and inner race defect (IR). For the conditions with defects, three levels of defect severity (0.18, 0.36, and 0.53 mm) have been introduced. Thus, 10 bearing conditions in total are included in the dataset of the present case study, as shown in Table 3.1. The vibration signals have been collected under four load conditions (0, 1, 2, and 3 hp) with a sampling frequency of 12 kHz. For each bearing condition, 100 samples with 1200 data points have been collected from each load condition. The fast Fourier transform is then applied to each signal to obtain the spectrum. Since the Fourier coefficients are symmetric, only the first half of the Fourier coefficients are used to compose each data sample. The dimension of each data sample is thus 600. There are in total 4000 samples in the dataset, with 400 samples for each bearing condition.

Figure 3.6: Experiment setup for bearing condition data acquisition.

A five-layer DNN is constructed here, including an input layer, three hidden layers, and a softmax output layer. The number of neurons of the input layer is equal to the dimension of the input data, which is 600. The numbers of neurons of the three hidden layers are selected as 400, 200, and 50. The number of neurons of the softmax layer is 10, which is the number of conditions. The sigmoid function is used as the activation function in the hidden layers.
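To give a sense of the size of this network, the number of trainable parameters can be counted directly from the layer widths. The short sketch below assumes every layer is fully connected with one bias per neuron (including the softmax layer, which is an assumption); the variable names are illustrative only.

```python
layer_sizes = [600, 400, 200, 50, 10]   # input, three hidden layers, softmax output

# One weight per connection between consecutive layers, one bias per non-input neuron.
weights = [n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
biases = layer_sizes[1:]
total_parameters = sum(weights) + sum(biases)

print(weights)            # [240000, 80000, 10000, 500]
print(total_parameters)   # 331160
```

Most of the parameters sit in the first hidden layer, which is the layer whose initialization benefits first from the unsupervised pre-training.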
The learning rate is set to 0.05 and the fraction of input elements masked to zero is 0.1.

Table 3.1: Conditions of the bearing in the dataset

Condition type   Defect severity (mm)   Class label   Number of samples
Normal           0                      1             400
BD               0.18                   2             400
BD               0.36                   3             400
BD               0.53                   4             400
OR               0.18                   5             400
OR               0.36                   6             400
OR               0.53                   7             400
IR               0.18                   8             400
IR               0.36                   9             400
IR               0.53                   10            400

[BD denotes ball defect; OR denotes outer race defect; IR denotes inner race defect.]

In order to investigate the performance of the proposed approach in the diagnosis of new conditions, the dataset is divided into two parts: dataset B, with one condition (IR with 0.53 mm defect severity) as the dataset of a new condition, and dataset A, with the remaining nine conditions as the original conditions. 75% of the samples in dataset A are randomly selected to form the training dataset C, and the remaining 25% form the testing dataset D. Then 95% of dataset C is randomly chosen as the unsupervised training dataset E and the remaining 5% as the fine-tuning dataset F. Fig. 3.7 shows the dataset structure.

Figure 3.7: The structure of datasets.

3.4.2 Fault Diagnosis using the Proposed Approach

In the first part, unsupervised feature learning and fine-tuning are performed on dataset A, which contains the original nine conditions. When performing SDA on dataset E, we assume that no label information is available for these samples, so unsupervised feature learning is needed. Fine-tuning of the deep network is then conducted with the labeled data in dataset F. Testing dataset D is used to check the diagnosis performance of the trained DNN on the original nine classes. The experiment is repeated 20 times, each with a maximum of 200 epochs, to reduce the effect of randomness. Fig. 3.8(a) shows the training and testing accuracy over nine bearing conditions. All the trials achieve more than 97% diagnosis accuracy and the average testing accuracy is 97.88%.
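As a check on the sizes involved, the partition into datasets A to F described above can be reproduced on sample indices alone. The sketch below uses Python's standard `random` module; the helper `split` and the fixed seed are assumptions made for illustration, and the source does not state how the random selection was implemented.

```python
import random

def split(indices, fraction, rng):
    """Shuffle a copy of `indices` and cut it so the first part holds `fraction` of them."""
    shuffled = indices[:]
    rng.shuffle(shuffled)
    cut = round(fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

rng = random.Random(0)
labels = [c for c in range(1, 11) for _ in range(400)]   # class labels 1..10, 400 samples each

A = [i for i, c in enumerate(labels) if c != 10]   # original nine conditions
B = [i for i, c in enumerate(labels) if c == 10]   # new condition (class 10 in Table 3.1)
C, D = split(A, 0.75, rng)                         # training / testing
E, F = split(C, 0.95, rng)                         # unlabeled pre-training / labeled fine-tuning

sizes = {name: len(part) for name, part in zip("ABCDEF", (A, B, C, D, E, F))}
print(sizes)   # {'A': 3600, 'B': 400, 'C': 2700, 'D': 900, 'E': 2565, 'F': 135}
```

Under this reading, only 135 labeled spectra (on average 15 per class) are available for fine-tuning the nine-class model.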
This demonstrates the effectiveness of the proposed approach in intelligent fault diagnosis using a large amount of unlabeled data and a few items of labeled data.

In the next part, fault diagnosis of the new condition space is performed. With the proposed approach, the DNN trained in the first part only needs further fine-tuning with a few items of labeled data, rather than training from scratch. 15% of the data is randomly selected from dataset G to form dataset I. Dataset F and dataset I form the new fine-tuning dataset for further fine-tuning. The new testing dataset is composed by combining dataset D and dataset I. Again, 20 trials are run and the diagnosis result from that exercise is shown in Fig. 3.8(b). The testing accuracy of ten classes is compared with the testing accuracy of the original nine classes. All trials obtain a testing accuracy of more than 96.50% and the average testing accuracy is 97.59%. This shows that after further fine-tuning with very few items of labeled data that contain the new condition, the proposed method performs well in fault diagnosis in the new condition space.

Figure 3.8: Classification results with the proposed approach. (a) Training and testing accuracy over nine bearing conditions. (b) Comparison of testing accuracy of ten classes and nine classes.

An experiment is conducted to assess the robustness of the proposed approach to noise. The original vibration signal is corrupted using different levels of white Gaussian noise, with the signal-to-noise ratio (SNR) ranging from 30 dB to 0 dB in steps of 6 dB. For each level of noise, 20 trials are carried out and the average result is listed in Table 3.2. The training accuracy of nine classes, the testing accuracy of nine classes, and the testing accuracy of ten classes are represented by R1, R2, and R3, respectively. The result shows that the classification accuracies with an SNR of 30 dB and 24 dB are almost the same as those without noise.
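One common way of corrupting a signal at a prescribed SNR is sketched below. The source does not state how the noise was generated, so this is only an assumed procedure: the noise variance is derived from the average power of the clean signal, and a synthetic 500 Hz tone stands in for a vibration record. The function name `add_white_noise` is hypothetical.

```python
import numpy as np

def add_white_noise(signal, snr_db, rng):
    """Add zero-mean white Gaussian noise so that signal power / noise power matches snr_db."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

rng = np.random.default_rng(0)
t = np.arange(1200) / 12_000                 # 1200 points sampled at 12 kHz
clean = np.sin(2 * np.pi * 500 * t)          # synthetic stand-in for a vibration signal

for snr_db in range(30, -1, -6):             # 30, 24, 18, 12, 6, 0 dB
    noisy = add_white_noise(clean, snr_db, rng)
    measured = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
    print(f"target {snr_db:2d} dB, measured {measured:5.2f} dB")
```

At 0 dB the noise power equals the signal power, which is the most severe case considered in Table 3.2.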
When the noise level is increased and the SNR is 18 dB and 12 dB, the accuracies decrease slightly. Even when the SNR is 6 dB, the proposed approach can still achieve a 94.05% testing accuracy of nine classes and a 95.34% testing accuracy of ten classes. The performance of the approach degrades when the SNR is 0 dB, where the power of the noise is equal to that of the original signal. In such a case, noise isolation and other signal processing methods should be considered when the signal is acquired or pre-processed. Overall, the DNN-based approach remains robust to noise down to an SNR of 6 dB.

Table 3.2: Classification results with different levels of noise

SNR (dB)   30      24      18      12      6       0
R1 (%)     97.29   97.21   96.19   96.58   95.05   82.38
R2 (%)     97.42   96.67   96.20   95.86   94.05   77.89
R3 (%)     96.26   96.28   98.17   96.53   95.34   81.50

[R1 is the training accuracy of nine classes; R2 is the testing accuracy of nine classes; R3 is the testing accuracy of ten classes.]

The proposed approach is next compared with typical data-driven approaches. Statistical features in the time domain and the frequency domain are extracted and used for fault classification with linear SVM, quadratic SVM, kNN, and weighted kNN. All the data are treated as labeled data when using these four supervised learning methods. Ten features (absolute mean, variance, crest, clearance factor, kurtosis, crest factor, root mean square, pulse factor, skewness, and shape factor) are selected in the time domain, and five features (average frequency, crest, kurtosis, mean energy, and variance) are selected in the frequency domain. The testing accuracies for both nine classes and ten classes are listed in Table 3.3.

The results show that the proposed approach has achieved the highest testing accuracy in the nine-class dataset even with only 5% of the data labeled.
Also, with only 15% of the data of the tenth condition labeled, it achieves a better result than three of the other approaches (linear SVM, kNN, and weighted kNN) and is only slightly lower than the quadratic SVM. However, those four approaches require 100% of the data to be labeled.

Table 3.3: Comparison of the proposed approach with SVM and kNN

          Linear SVM   Quadratic SVM   kNN     Weighted kNN   Proposed approach
R2 (%)    96.70        97.70           95.30   95.80          97.88
R3 (%)    97.50        98.20           96.40   96.70          97.59

[R2 is the testing accuracy of nine classes; R3 is the testing accuracy of ten classes.]

3.4.3 Effect of the Size of Labeled Data

The proposed approach is valuable in situations where the availability of labeled data is quite limited while extensive unlabeled data exists. It is important to investigate the robustness of the method to different ratios of labeled data to unlabeled data. Therefore, a further study is done by changing the fraction of labeled data that is used for fine-tuning from 1% to 15% with a step size of 1%. Fig. 3.9(a) shows the average testing accuracy after 20 trials for each fraction. The diagnosis accuracy rises rapidly when the fraction of labeled data is increased from 1% to 4%. It is seen that even with 3% of labeled data, the diagnosis accuracy is around 95%, which shows that the features learned from unlabeled data are representative. With further increase of labeled data, the diagnosis accuracy increases only slightly and then becomes stable. The proposed approach thus achieves satisfactory diagnosis accuracy even with very few items of labeled data.

In the diagnosis of new conditions, the effect of the amount of labeled new condition data that is used in further fine-tuning of the model is studied. Experiments are run using different amounts of new condition data for further fine-tuning.
Fig. 3.9(b) shows the diagnosis result based on different amounts of new condition data. The horizontal axis indicates the percentage of data selected from dataset G, which forms dataset I. It is seen that the testing accuracy on the new condition improves rapidly when the percentage of new condition data is increased from 5% to 15%. This is because, with the increase of the labeled new condition data, the DNN is tuned to be more accurate in detecting the new condition. The testing accuracy of the other nine classes decreases slightly with the increase of the new condition data. With further increase of the new condition data, the testing accuracy of both the original nine classes and the new class becomes stable. The result shows that the proposed method performs satisfactorily in detecting the new condition after further fine-tuning of the DNN.

Figure 3.9: Effect of the size of labeled data. (a) Diagnosis result with different fractions of labeled data. (b) Testing accuracy under different amounts of new condition data.

3.4.4 Visualization of Learned Features

The features learned from the SDA, as well as the features fine-tuned by the DNN, are visualized using t-distributed Stochastic Neighbor Embedding (t-SNE), which is an effective tool to visualize the characteristics of high-dimensional data [75]. Before running t-SNE, principal component analysis is applied to reduce the dimension of the features to 30, which speeds up the subsequent computation. Then, t-SNE is used to map the 30-dimensional data to a two-dimensional map that can visualize the features learned by the proposed approach. Fig. 3.10(a) shows the scatter plots of the features learned from the SDA. It is seen that after unsupervised learning over unlabeled data, the features can accurately distinguish many of the conditions. After fine-tuning the DNN using only a small amount of labeled data, the data of different conditions are separated clearly, as shown in Fig. 3.10(b).
This indicates that the features learned by the proposed method are representative. Fig. 3.10(c) presents the features of the new condition space after further fine-tuning with the labeled data of the new condition. The result shows that the features work well in clustering the conditions, including the new one. The proposed approach is capable of learning representative features by SDA-based unsupervised learning and fine-tuning with very few items of labeled data.

Figure 3.10: Visualization of learned features. (a) Features learned from the unsupervised SDA. (b) Features of nine classes after fine-tuning. (c) Features of ten classes after further fine-tuning.

In summary, the intelligent fault diagnosis approach based on SDA and DNN is able to learn representative features automatically from a massive quantity of unlabeled condition data. Only a few items of labeled data are needed to fine-tune the DNN to perform fault diagnosis. In addition, the proposed approach is able to utilize the trained model to diagnose new conditions by further fine-tuning the DNN with a few items of labeled data of the new conditions. The effectiveness of the developed approach was verified using a standard dataset of bearing faults. The robustness of the method to noise was evaluated by corrupting the original signal with different levels of noise. In comparison with traditional fault diagnosis methods, the proposed method was shown to overcome three drawbacks: (1) features must be extracted using prior knowledge, (2) a large amount of labeled data is needed, and (3) the diagnosis model has to be rebuilt to diagnose new conditions.

Chapter 4
IMFD using Convolutional Neural Networks and Sensor Fusion

4.1 Introduction

Recent research has shown that an estimator employing multiple sensors with sensor fusion techniques can provide enhanced and robust estimates [76], [77]. Sensor fusion can be classified into three categories: data level, feature level, and decision level [78].
Data-level sensor fusion can achieve the highest performance because it loses less information than the other two categories. The convolutional neural network (CNN) model is designed for processing two-dimensional (2D) or three-dimensional (3D) input data. This property makes it possible to incorporate sensor fusion to improve the diagnosis accuracy and reliability. For instance, temporal signals from different locations can be aligned into a 2D matrix as the input to a CNN model, in which the temporal and spatial information is integrated.

Moreover, CNN overcomes the limitation of the regular fully connected DNN in solving more complex problems. The number of parameters of a DNN can grow very rapidly when more layers are added to the model, which can lead to high computational effort or to overfitting. Compared with the standard DNN, in which all layers are fully connected, a CNN is constructed using fewer connections through shared filters. Training of a CNN is easier and uses less computational resource and less time. Another advantage of the CNN model is that it is less likely to overfit with the same available training data. With the linear and nonlinear layers in the model, CNN has shown a strong capability in learning sensitive and robust features [79].

Chen et al. [80] proposed an approach using CNN for gearbox fault diagnosis with vibration signals and achieved high classification accuracy. However, manual feature extraction was still needed to form the input for their CNN model. Janssens et al. [81] developed a three-layer CNN model for bearing fault detection with vibration signals. However, it could not work on raw data and a discrete Fourier transform was needed. More recently, Guo et al. [82] proposed a hierarchical adaptive deep CNN for bearing fault diagnosis from raw vibration data.
However, the convergence was quite slow and the one-dimensional raw vibration data was arbitrarily converted into a square matrix as the input to the CNN model.

With the above advantages and capabilities of CNN, the present research proposes a CNN-based fault diagnosis approach using signals from multiple sensors. Raw signals are directly used as the input to the model to detect different failures. Signals from multiple sensors are fused at the data level in this model to increase the accuracy and reliability of the diagnosis. With mini-batch stochastic gradient descent, the parameters of the network can be tuned efficiently to obtain a CNN-based fault prediction model. A dropout technique is adopted in this approach to decrease the likelihood of overfitting. Representative features are extracted automatically through feature learning of the CNN-based model. Fault diagnosis of rotating machinery is used in this study to illustrate the process of the proposed method. The performance of the method is evaluated using both a roller bearing dataset and a gear transmission dataset.

4.2 Convolutional Neural Networks

CNNs are an important class of DNN and have been successfully applied in various classification problems owing to their capability of feature extraction [83]. A CNN is composed of trainable multi-stage architectures involving linear and nonlinear operations. The input and output of each stage are sets of arrays called feature maps [84]. Typically, each stage includes two layers: a convolution layer and a feature pooling layer. A typical CNN is constructed by stacking one or more such two-layer stages together with a classification layer, e.g., a softmax layer. The feed-forward process can be represented as:

$$f(X) = f_L(\ldots f_2(f_1(X, \theta^{(1)}), \theta^{(2)}) \ldots, \theta^{(L)}) \tag{4.1}$$

Here X is the input raw data, e.g., an image, an audio sequence, or vibration signals from a machine condition monitoring system; $\theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(L)}$ are learnable parameters such as weights and biases at each of the L stages; and $f_1, f_2, \cdots, f_L$ are the operations at each stage. Outputs of these functions are intermediate feature maps. For computer vision applications, the input of the network is usually a 2D array of pixels if the image is in grey scale, or a 3D array for typical images with red-green-blue (RGB) channels.

4.2.1 Convolution Layer

In a convolution layer, the input is convolved with a bank of learnable filters (also known as kernels) to generate new feature maps as the input to the next layer [26]. The operation can be expressed by:

$$X_{k'}^{(l)} = f\left(\sum_{k=1}^{K} W_{kk'}^{(l)} * X_k^{(l-1)} + B_{k'}^{(l)}\right) \tag{4.2}$$

Here $l$ denotes the layer number of the network; $k' = 1, 2, \ldots, K'$ is the index of the output feature maps; and $k = 1, 2, \ldots, K$ is the index of the input feature maps, where $K = 1$ at the first layer if the input data is a 2D array. The symbol $*$ denotes the 2D discrete convolution operator, applied to the filter $W_{kk'}^{(l)}$ at the lth layer and the kth feature map $X_k^{(l-1)}$ from the (l − 1)th layer. A bias matrix $B_{k'}^{(l)}$ is then added to the convolution outcome. Finally, a nonlinear activation function $f$ is applied point-wise to each element of the feature maps. Typical nonlinear activation functions include the hyperbolic tangent function, the sigmoid function, and the rectified linear unit (ReLU). In the present work, ReLU is used due to its superior performance, as reported in a recent work [85]. ReLU is given by:

$$y_{ijk} = \max(0, x_{ijk}) \tag{4.3}$$

where $x_{ijk}$ is the $(i, j)$ component of the kth feature map.

4.2.2 Feature Pooling Layer

Another important operation in a CNN is pooling, which achieves spatial invariance by reducing the resolution of the feature maps [86].
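To make the convolution layer of Equations (4.2) and (4.3) concrete, the following NumPy sketch evaluates one layer with explicit loops. It is an illustration under stated assumptions and not the implementation used in this work: no padding and a stride of 1 are assumed, the bias is simplified to one scalar per output feature map, and the names `conv2d_valid` and `conv_layer` are hypothetical. The kernel is flipped so that the operation is a true discrete convolution (many deep learning libraries compute cross-correlation instead, which differs only in the orientation of the learned filter).

```python
import numpy as np

def conv2d_valid(x, w):
    """2D discrete convolution of one feature map x with one filter w, without padding."""
    p, q = w.shape
    w_flipped = w[::-1, ::-1]                 # flipping distinguishes convolution from correlation
    out = np.empty((x.shape[0] - p + 1, x.shape[1] - q + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + p, j:j + q] * w_flipped)
    return out

def conv_layer(inputs, filters, biases):
    """Eq. (4.2) followed by the ReLU of Eq. (4.3).

    inputs:  K input feature maps, shape (K, m, n)
    filters: shape (K, K_out, p, q); filters[k, k2] links input map k to output map k2
    biases:  one scalar per output map, shape (K_out,)
    """
    K, K_out = filters.shape[:2]
    outputs = []
    for k2 in range(K_out):
        total = sum(conv2d_valid(inputs[k], filters[k, k2]) for k in range(K))
        outputs.append(np.maximum(0.0, total + biases[k2]))   # ReLU, element-wise
    return np.stack(outputs)

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 3, 40))        # K = 1 input map of size 3 x 40 (toy size)
w = rng.standard_normal((1, 4, 3, 5))      # 4 filters of size 3 x 5
maps = conv_layer(x, w, np.zeros(4))
print(maps.shape)                          # (4, 1, 36)
```

Each output map has size (3 − 3 + 1) × (40 − 5 + 1) = 1 × 36, and every entry is non-negative after the ReLU.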
A pooling operator is applied to each feature map separately, fusing nearby feature values into one value through a suitable operator such as max-pooling (using the max operator) or average-pooling (using the average operator). The neighborhoods can be stepped by a stride larger than 1. The pooling window can be of different sizes. Max-pooling is increasingly used in recent models, and is given by:

$$y_{ijk} = \max(y_{i'j'k} : i \le i' < i + p,\; j \le j' < j + q) \tag{4.4}$$

where p is the length of the pooling window and q is the width. The maximum in the neighborhood is selected to be the value of that area. Thus, the pooling layer produces a feature map of lower resolution.

4.2.3 Softmax Layer

Softmax regression is often used for multiclass classification as a generalized form of logistic regression [72]. Given a training dataset of k classes with m samples $\{x^{(i)}\}_{i=1}^{m}$, where $x^{(i)} \in R^n$, and its label set $\{t^{(i)}\}_{i=1}^{m}$, where $t^{(i)} \in \{1, 2, \ldots, k\}$, softmax regression estimates the probability of each input sample belonging to each class. The probability is calculated by:

$$P(t^{(i)} = j \mid x^{(i)}; W^{(L)}) = \left(\sum_{l=1}^{k} e^{(W_l^{(L)})^T x^{(i)}}\right)^{-1} \times \left[e^{(W_1^{(L)})^T x^{(i)}},\; e^{(W_2^{(L)})^T x^{(i)}},\; \cdots,\; e^{(W_k^{(L)})^T x^{(i)}}\right]^T \tag{4.5}$$

where $j = 1, 2, \ldots, k$; $W^{(L)} = [W_1^{(L)}, W_2^{(L)}, \ldots, W_k^{(L)}]$ are the parameters of the softmax regression model; and the factor $1 / \sum_{l=1}^{k} e^{(W_l^{(L)})^T x^{(i)}}$ normalizes the distribution so that the probabilities sum to unity.

The cost function of softmax regression is defined as:

$$J(W^{(L)}) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} 1\{t^{(i)} = j\} \log \frac{e^{(W_j^{(L)})^T x^{(i)}}}{\sum_{l=1}^{k} e^{(W_l^{(L)})^T x^{(i)}}} \tag{4.6}$$

where $1\{\cdot\}$ is the indicator function, which returns 1 if the condition is true and 0 otherwise. The parameters are updated by minimizing the cost function $J(W^{(L)})$ over the training samples. In the present work, softmax regression is used to generate the classes of the input data.

4.2.4 Mini-batch Stochastic Gradient Descent

Supervised training is performed using gradient descent, which can be implemented through the backpropagation algorithm [87].
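As a numerical illustration of the softmax layer in Equations (4.5) and (4.6), whose cost is the quantity that gradient descent minimizes, the sketch below evaluates the class probabilities and the cost for a small random batch. The subtraction of the row maximum is a standard numerical-stability step that leaves the probabilities unchanged; it is an addition of this sketch, as are the function names and the toy data.

```python
import numpy as np

def softmax_probabilities(W, X):
    """Eq. (4.5): one row of class probabilities per sample."""
    scores = X @ W.T                                      # entry (i, j) is W_j^T x^(i)
    scores = scores - scores.max(axis=1, keepdims=True)   # stability shift, result unchanged
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

def softmax_cost(W, X, t):
    """Eq. (4.6): average negative log-probability of the true class (labels run 1..k)."""
    P = softmax_probabilities(W, X)
    m = X.shape[0]
    return -np.mean(np.log(P[np.arange(m), t - 1]))

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))       # m = 6 samples with n = 4 features
t = np.array([1, 2, 3, 1, 2, 3])      # k = 3 classes
W = np.zeros((3, 4))                  # all-zero parameters give probability 1/k to every class

P = softmax_probabilities(W, X)
cost = softmax_cost(W, X, t)          # equals log(3) for the all-zero W
```

The indicator function in Equation (4.6) appears here as the indexing `P[np.arange(m), t - 1]`, which selects for each sample only the probability of its true class.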
The weights of all the filters of the CNN are updated through the learning procedure to minimize the loss function, which captures the difference between the target output and the predicted output of the CNN. Standard gradient descent updates the weights over the entire training dataset, which can be computationally expensive. Mini-batch stochastic gradient descent instead updates the weights using the average gradient over a small batch of training samples. The parameters W and B are updated according to:

$$W^{(l)} = W^{(l)} - \eta \frac{\partial}{\partial W^{(l)}} J(W, B; X, Y) \tag{4.7}$$

$$B^{(l)} = B^{(l)} - \eta \frac{\partial}{\partial B^{(l)}} J(W, B; X, Y) \tag{4.8}$$

where $\eta$ is the learning rate; $(X, Y)$ is a batch containing m training samples; X are the raw signals; and Y are the corresponding labels.

With supervised training using labeled data, the banks of filters at different convolutional layers are tuned in an automatic manner. For example, in an image classification task, the learned filters are found to represent edges in the first layer, object parts in the intermediate layers, and complicated object models in later layers. The classification layer can use the learned bank of filters (or the extracted features) to achieve the classification.

4.3 IMFD using CNN and Sensor Fusion

The present work proposes an IMFD approach based on CNN using raw data from multiple sensors. Owing to the automatic feature extraction capability of the approach, no hand-crafted features are needed to classify different conditions. Multiple sensor fusion at the data level is achieved by combining the raw data from multiple sensors into a 2D matrix at the input layer. Fig. 4.1 shows the flowchart of the proposed fault diagnosis method.
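The data-level fusion step amounts to placing the equally long signals of the individual sensors in the rows of one matrix. A minimal NumPy sketch is given below; the function name `fuse_sensors`, the number of sensors, and the random placeholder signals are assumptions made for illustration.

```python
import numpy as np

def fuse_sensors(signals):
    """Stack equally long 1D sensor signals row by row into an m x n input matrix."""
    lengths = {len(s) for s in signals}
    if len(lengths) != 1:
        raise ValueError("all sensor signals must have the same number of samples")
    return np.stack([np.asarray(s, dtype=float) for s in signals], axis=0)

n = 1200                                                   # samples per signal (illustrative)
rng = np.random.default_rng(0)
sensor_signals = [rng.standard_normal(n) for _ in range(3)]   # placeholders for m = 3 sensors
X = fuse_sensors(sensor_signals)
print(X.shape)                                             # (3, 1200)
```

Rows index the sensor (spatial information) and columns index time, so a filter that spans several rows combines simultaneous readings from different locations.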
Condition monitoring data of the running machinery, such as vibration signals from accelerometers, is collected from multiple sensors. After denoising and preprocessing, these sets of one-dimensional time series are stacked row by row to form a 2D input matrix. The temporal information and the spatial information from the sensors are combined in the input matrix in this manner. All the collected samples are then divided into training, validation, and testing datasets. The training dataset is used to train the initialized CNN model by minimizing the error between the predicted condition and the actual one. The validation dataset is used to select a model before possible overfitting. The generalization capability of the trained model is then evaluated using the testing dataset. No manual feature extraction or selection is needed in this approach, as the representative features are automatically extracted during the training process.

Figure 4.1: Flowchart of the proposed fault diagnosis approach.
[Flowchart stages: machinery; data acquisition and data fusion (Sensor 1, Sensor 2, ..., Sensor m); training, validation, and testing datasets; CNN model design and initialization; CNN model training; CNN model selection; trained CNN model; diagnosis result.]

Figure 4.2: Architecture of the CNN-based fault diagnosis model.
[Layers, in order: input layer $X^{(0)} \in R^{m \times n}$; convolution ($K_1$ filters of size $p_1 \times q_1 \times 1$); max-pooling; convolution ($K_2$ filters of size $p_2 \times q_2 \times K_1$); max-pooling; fully connected layer; softmax output layer; dropout is applied at two stages.]

Fig. 4.2 shows the detailed structure of the CNN-based fault diagnosis model. Machine condition monitoring data $X_i^n$ $(i = 1, 2, \ldots, m)$ from m vibration sensors is collected and fused at the data level as the input $X \in R^{m \times n}$ of the CNN model. The input is convolved with $K_1$ filters of size $p_1 \times q_1 \times 1$. The ReLU operation is applied to the convolved outcome to form the $K_1$ feature maps with dimension $(m - p_1 + 1) \times (n - q_1 + 1)$.
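The feature map dimensions can be traced through the whole network with a few lines of arithmetic. The sketch below uses, as an illustration, the values reported for the bearing case study (m = 3, n = 1200, and the filter and pooling sizes of Table 4.2 and Table 4.3). It assumes no padding, a convolution stride of 1, pooling windows that act along the time axis only (1 × 8 and 1 × 4), and that incomplete pooling windows are dropped; none of these details is stated explicitly in the text, and the helper names are hypothetical.

```python
def conv_output(shape, filter_size):
    """Height and width after a stride-1 convolution without padding."""
    (m, n), (p, q) = shape, filter_size
    return (m - p + 1, n - q + 1)

def pool_output(shape, window, stride):
    """Height and width after pooling without padding (incomplete windows are dropped)."""
    (m, n), (p, q), (s_row, s_col) = shape, window, stride
    return ((m - p) // s_row + 1, (n - q) // s_col + 1)

shape = (3, 1200)                                           # m = 3 sensors, n = 1200 samples
stage_1_conv = conv_output(shape, (3, 17))                  # (1, 1184)
stage_1_pool = pool_output(stage_1_conv, (1, 8), (1, 8))    # (1, 148)
stage_2_conv = conv_output(stage_1_pool, (1, 8))            # (1, 141)
stage_2_pool = pool_output(stage_2_conv, (1, 4), (1, 4))    # (1, 35)
print(stage_1_conv, stage_1_pool, stage_2_conv, stage_2_pool)
```

Because the first filter spans all three rows, the sensor dimension collapses to 1 after the first convolution, and the remaining layers operate along time only.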
A max-pooling layer follows to subsample the feature maps using Equation (4.4). Together with another such stage, the convolution process aims to capture the representative features from the input data. A fully connected layer and a softmax layer are added next to generate the machine condition. Mini-batch stochastic gradient descent is used in this work to update the parameters of the model in the training process, using Equation (4.6) through Equation (4.8). After training, the CNN model extracts representative features directly from the raw vibration signals from multiple sensors. Fault diagnosis can then be performed on new monitoring data.

Overfitting is a common problem in training, which leads to poor performance on the test data, especially with limited training data. The present work uses dropout to prevent overfitting. Dropout is a technique that avoids extracting the same features repeatedly, to reduce the possibility of overfitting [88]. During each iteration of training, neurons are randomly dropped out with probability p, which means that they are temporarily removed from the network along with all their incoming and outgoing connections, so that a reduced network is left for training [89]. This can be implemented by setting the selected elements of the feature maps to zero. In the testing phase, dropout is turned off and each feature map element is scaled by the retention probability 1 − p. Dropout can be viewed as an efficient way of combining exponentially many different neural network architectures to find the fittest model.

4.4 Experimental Studies

To evaluate the effectiveness of the proposed approach for the fault diagnosis of machinery, two practical rotating devices, bearings and gearboxes, are investigated in the present work. Vibration signals of different machine conditions are collected from multiple accelerometers. The parameters of the model are trained and updated using the training samples.
Then the test samples are provided to the trained model to evaluate the fault diagnosis accuracy of the proposed approach.

4.4.1 Bearing Fault Diagnosis

Experimental setup and data description

In this case study, the publicly available roller bearing condition dataset collected from a motor drive system by CWRU is analyzed [74]. The objective is to diagnose different bearing faults with different levels of severity. The main components of the experimental setup are a 2 hp motor, a torque transducer, and a dynamometer, as shown in Fig. 3.6. Vibration signals have been collected using accelerometers mounted at three different locations: the drive end, the fan end, and the base. Bearings under different conditions are tested, including normal condition, ball defect (BD), outer race defect (OR), and inner race defect (IR). Each defect type (BD, OR, and IR) has three levels of severity. A single point fault was introduced to each bearing by electro-discharge machining, with fault diameters of 0.18 mm, 0.36 mm, and 0.53 mm. In this case study, the nine faulty bearing conditions are included in the dataset, as shown in Table 4.1. The vibration signals have been collected under four load scenarios (0, 1, 2, and 3 hp) at a sampling frequency of 12 kHz. For each bearing condition, 400 samples with 1200 data points have been collected from the four load conditions. Fig. 4.3 plots the vibration signals of the nine conditions from one sensor.

Figure 4.3: Vibration signals of bearings with different conditions from one sensor (horizontal axis: time in ms; vertical axis: amplitude in V). (a) BD 0.18 mm. (b) BD 0.36 mm. (c) BD 0.53 mm. (d) OR 0.18 mm. (e) OR 0.36 mm. (f) OR 0.53 mm. (g) IR 0.18 mm. (h) IR 0.36 mm. (i) IR 0.53 mm.

The dimension of each data sample is 3 × 1200. There are in total 3600 samples in the dataset, with 400 samples for each bearing condition. The hyperparameters of the convolution layers and the max-pooling layers are selected as listed in Table 4.2 and Table 4.3.
In this work, random subsampling with validation is used. The dataset is split randomly into training, validation, and testing subsets with the commonly used ratios of 70%, 15%, and 15%. The training dataset is used to train the model. The validation data is used to stop the training when the validation error decreases only slightly or even increases, to prevent overfitting. In the experiment, the training is first run for a comparatively long time. Then, based on the error curve of the validation dataset, an appropriate epoch is chosen and the corresponding model is selected. The testing dataset is then used to test the fault diagnosis performance of the trained model. Ten trials are carried out, and the average accuracy and its variation are calculated to evaluate the performance of the proposed method.

Table 4.1: Conditions of the bearing in the dataset

| Condition type | Defect severity (mm) | Class label | Number of samples |
|---|---|---|---|
| BD | 0.18 | 1 | 400 |
| BD | 0.36 | 2 | 400 |
| BD | 0.54 | 3 | 400 |
| OR | 0.18 | 4 | 400 |
| OR | 0.36 | 5 | 400 |
| OR | 0.54 | 6 | 400 |
| IR | 0.18 | 7 | 400 |
| IR | 0.36 | 8 | 400 |
| IR | 0.54 | 9 | 400 |

[BD denotes ball defect; OR denotes outer race defect; IR denotes inner race defect.]

Table 4.2: Hyperparameters of the convolution layers

| Parameter | First convolution layer | Second convolution layer |
|---|---|---|
| Number of filters | 64 | 128 |
| Size of filter | [3, 17, 1] | [1, 8, 64] |
| Stride | 1 | 1 |
| Dropout ratio | 0.5 | 0.5 |

Table 4.3: Hyperparameters of the max-pooling layers

| Parameter | First max-pooling layer | Second max-pooling layer |
|---|---|---|
| Pooling size | 8 | 4 |
| Stride | 8 | 4 |

Results and discussion

The parameters of the model are updated using mini-batch stochastic gradient descent with a batch size of 50. Learning rates from 0.005 to 0.05 are tested at intervals of 0.005. The results show that a learning rate of 0.015 achieves the best testing accuracy and the best convergence. The training and testing results of one trial are shown in Fig. 4.4. The training process converges comparatively fast, within 25 epochs. The testing accuracy is satisfactory, with only one sample misclassified. The average testing accuracy of the ten trials is 99.41% with a standard deviation of 0.37%. Further comparison and analysis are made under the following scenarios.

Figure 4.4: Experimental result of bearing dataset. (a) Convergence curve of the training process. (b) Condition classification confusion matrix (actual condition vs. predicted condition).

(1) Fused data from multiple sensors vs. data from one sensor.

The training and testing accuracies of the proposed model with multiple sensors are compared with those obtained using the signal from only one sensor, as given in Table 4.4. Ten trials are carried out to diagnose the bearing conditions. The average testing accuracy is 99.41% using signals from multiple sensors, which is higher than the 98.35% obtained using the signal from only one accelerometer. Also, the standard deviations of both the training and testing accuracies of the proposed approach are much lower than those obtained using only one sensor, which indicates more reliable performance. The results show that the proposed approach achieves higher and more robust diagnosis accuracy with the fusion of signals at the data level.

Table 4.4: Diagnosis accuracy of bearing data using multiple sensors and one sensor

| Trial # | Training accuracy (%), multiple sensors | Training accuracy (%), one sensor | Testing accuracy (%), multiple sensors | Testing accuracy (%), one sensor |
|---|---|---|---|---|
| 1 | 100 | 99.37 | 99.44 | 99.26 |
| 2 | 100 | 99.25 | 99.26 | 98.52 |
| 3 | 99.96 | 99.21 | 99.44 | 98.52 |
| 4 | 100 | 99.33 | 98.52 | 98.89 |
| 5 | 100 | 98.73 | 99.81 | 98.15 |
| 6 | 99.96 | 99.56 | 99.44 | 99.26 |
| 7 | 100 | 99.48 | 99.63 | 98.52 |
| 8 | 100 | 98.97 | 99.26 | 98.52 |
| 9 | 100 | 98.77 | 99.44 | 95.19 |
| 10 | 100 | 98.97 | 99.81 | 98.70 |
| Average | 99.99 | 99.16 | 99.41 | 98.35 |
| Standard deviation | 0.02 | 0.29 | 0.37 | 1.16 |

(2) Proposed approach vs. SVM and kNN based on hand-crafted statistical features.

As in the traditional data-driven fault diagnosis approaches, manual feature extraction is conducted first. The statistical features in the time and frequency domains used in [82][90] are calculated and used in this case study for fault diagnosis using SVM and kNN. Table 4.5 shows the ten time-domain features and the five frequency-domain features that are selected. Features are calculated using the vibration signals from all three sensors. Two SVM classifiers, with a linear kernel and a quadratic kernel, are used, as well as kNN and weighted kNN. Table 4.6 displays the diagnosis results using SVM and kNN compared with the proposed approach. Only for the bearings with ball defects of 0.36 mm and 0.54 mm are the classification accuracies of the proposed method slightly lower than those of the quadratic SVM. The proposed method performs better in the other conditions and obtains the highest overall testing accuracy of 99.44%. It achieves end-to-end learning from raw data with no hand-crafted features.

Table 4.5: Features selected in the time and frequency domains

Time domain:
Absolute mean: $\frac{1}{n}\sum_{i=1}^{n} |x_i|$
Variance: $\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$
Crest: $\max(|x_i|)$
Clearance factor: $\max(|x_i|) \big/ \left(\frac{1}{n}\sum_{i=1}^{n} \sqrt{x_i^2}\right)^2$
Kurtosis: $\frac{1}{n}\sum_{i=1}^{n} x_i^4$
Crest factor: $\max(|x_i|) \big/ \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}$
Root mean square: $\sqrt{\left(\sum_{i=1}^{n} x_i^2\right)/n}$
Pulse factor: $\max(|x_i|) \big/ \left(\frac{1}{n}\sum_{i=1}^{n} |x_i|\right)$
Skewness: $\frac{1}{n}\sum_{i=1}^{n} x_i^3$
Shape factor: $\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2} \big/ \left(\frac{1}{n}\sum_{i=1}^{n} |x_i|\right)$

Frequency domain:
Average frequency: $\left(\sum_{i=1}^{n} \omega_i X_i\right) \big/ \sum_{i=1}^{n} X_i$
Crest: $\max(|X_i|)$
Kurtosis: $\frac{1}{n}\sum_{i=1}^{n} X_i^4$
Mean energy: $\frac{1}{n}\sum_{i=1}^{n} X_i$
Variance: $\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2$

4.4.2 Gearbox Fault Diagnosis

Experimental setup and data description

In this case study, the gearbox condition dataset is collected from the conveyor subsystem of an industrial fish processing machine, as shown in Fig. 4.5(a). The objective is to diagnose the different types of faults of the gearboxes.
A SEW-Eurodrive R57DT80N4ES1S motor is used as the driving source of the conveyor system and is connected to a SEW-Eurodrive R57 gearbox. Four gearbox conditions are tested, including the normal condition and three faulty conditions: damaged gear (DG), damaged bearing (DB), and misaligned output shaft (MOS), as shown in Fig. 4.5(b). Two accelerometers (KISTLER 8702B25 and KISTLER 8704B100) are mounted on the gearboxes in the vertical and horizontal directions, as shown in Fig. 4.5(c). The vibration signals are acquired by a National Instruments PXIe DAQ system (shown in Fig. 4.5(d)) at a sampling frequency of 5 kHz. In total, 6000 samples are collected; each gearbox provides 1500 samples with a length of 1000 data points.

Figure 4.5: Experimental setup for gearbox fault diagnosis. (a) Conveyor system of a fish processing machine (motor and gearbox); (b) Three faulty gearboxes (DG, DB, MOS); (c) Accelerometers mounted on the gearbox; (d) National Instruments PXIe DAQ system.

Table 4.6: Comparison of bearing fault diagnosis results using different approaches (accuracy in %)

| Fault condition | Proposed method | Linear SVM | Quadratic SVM | kNN | Weighted kNN |
|---|---|---|---|---|---|
| BD 0.18 | 100 | 99.25 | 99.5 | 99 | 99.25 |
| BD 0.36 | 96.67 | 94.25 | 97.25 | 93.75 | 92 |
| BD 0.54 | 98.33 | 97.75 | 98.75 | 96.25 | 96 |
| OR 0.18 | 100 | 100 | 100 | 100 | 100 |
| OR 0.36 | 100 | 99 | 99.75 | 99.75 | 99.75 |
| OR 0.54 | 100 | 100 | 100 | 100 | 100 |
| IR 0.18 | 100 | 100 | 100 | 100 | 100 |
| IR 0.36 | 100 | 99.75 | 99.5 | 98.75 | 98.25 |
| IR 0.54 | 100 | 99.5 | 99.5 | 100 | 100 |
| Overall | 99.44 | 98.83 | 99.36 | 98.61 | 98.36 |

[BD denotes ball defect; OR denotes outer race defect; IR denotes inner race defect.]

Fig. 4.6 shows the plots of the vibration signals of each condition from one accelerometer. Since two sensors are used, the dimension of each sample is 2×1000. The same random subsampling with validation method is used here. Seventy percent of the samples are used for training, fifteen percent for validation, and fifteen percent for testing.
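The random subsampling described above can be sketched as follows for the 1500 samples of one gearbox condition. This is a minimal illustration, not the code used in this work: the function name `split_indices`, the fixed seed, and the choice to split each condition separately are assumptions made for the example.

```python
import random


def split_indices(n_samples, train_pct=70, val_pct=15, seed=0):
    """Randomly split sample indices into training, validation and testing sets.

    The 70/15/15 percentages follow the text; the seed and the per-condition
    splitting are assumptions for this sketch.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = n_samples * train_pct // 100
    n_val = n_samples * val_pct // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remaining samples form the test set
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_indices(1500)
    print(len(train), len(val), len(test))
```

Repeating the split with a different seed for each of the ten trials would give one way of realizing the repeated random subsampling.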
The details of the dataset are shown in Table 4.7.

Figure 4.6: Vibration signals of gearboxes with different conditions from one sensor (axes: time in ms, amplitude in V). (a) Normal condition. (b) Damaged gear. (c) Damaged bearing. (d) Misaligned output shaft.

Table 4.7: Gearbox dataset details

| Condition type | Class label | Training samples | Validation samples | Testing samples |
|---|---|---|---|---|
| Normal | 1 | 1050 | 225 | 225 |
| DG | 2 | 1050 | 225 | 225 |
| DB | 3 | 1050 | 225 | 225 |
| MOS | 4 | 1050 | 225 | 225 |

[DG denotes damaged gear; DB denotes damaged bearing; MOS denotes misaligned output shaft.]

Results and discussion

A CNN structure similar to that of the first case study is used. The filter size of the first convolution layer is adjusted to [2, 17, 1] because of the change in the data dimension. In the training stage, 1050 samples of each condition are used to train the model. Parameters are updated using mini-batch stochastic gradient descent with a batch size of 100.

Table 4.8: Diagnosis accuracy of gearbox data using multiple sensors and one sensor

| Trial # | Training accuracy (%), multiple sensors | Training accuracy (%), one sensor | Testing accuracy (%), multiple sensors | Testing accuracy (%), one sensor |
|---|---|---|---|---|
| 1 | 99.95 | 98.29 | 100 | 98.33 |
| 2 | 99.98 | 98.07 | 100 | 98.00 |
| 3 | 99.81 | 98.07 | 100 | 98.00 |
| 4 | 99.95 | 98.52 | 99.89 | 98.78 |
| 5 | 99.93 | 98.71 | 99.78 | 98.78 |
| 6 | 99.98 | 97.31 | 99.56 | 96.78 |
| 7 | 99.98 | 98.52 | 99.78 | 97.89 |
| 8 | 100 | 98.62 | 99.89 | 94.67 |
| 9 | 99.88 | 98.43 | 99.78 | 98.78 |
| 10 | 99.93 | 96.60 | 99.89 | 95.78 |
| Average | 99.94 | 98.18 | 99.83 | 97.58 |
| Standard deviation | 0.06 | 0.69 | 0.13 | 1.40 |

Fig. 4.7 shows the training and testing results of one trial. Fig. 4.7(a) plots the convergence curve of the training process. The proposed approach converges fast, within 20 epochs. Fig. 4.7(b) shows the confusion matrix of the test dataset. The classification performance is outstanding, with only one sample of the damaged bearing condition misclassified as the damaged gear condition.
The average testing accuracy of the ten trials is 99.83% with a standard deviation of 0.13%.

To further evaluate the effectiveness of the proposed method, comparison and analysis are made under the following two scenarios.

(1) Fused data from multiple sensors vs. data from one sensor.

Table 4.8 lists the training and testing accuracies of the proposed model with multiple sensors, compared with those obtained using the signal from only one sensor. Ten trials are carried out to diagnose the gearboxes with four conditions. The average testing accuracy using signals from multiple sensors is 99.83%, which is higher than the 97.58% obtained using the signal from one accelerometer. The results show that the proposed approach achieves higher diagnosis accuracy with the fusion of signals at the data level. Also, as in the first case study, the standard deviation of the accuracy using multiple sensors is lower, which indicates that the performance of the proposed method is more reliable than when using data from only one sensor.

Figure 4.7: Experimental result of gearbox dataset. (a) Convergence curve of the training process. (b) Condition classification confusion matrix (actual condition vs. predicted condition; classes Normal, DG, DB, MOS).

Table 4.9: Comparison of gearbox fault diagnosis results using different approaches (accuracy in %)

| Fault condition | Proposed method | Linear SVM | Quadratic SVM | kNN | Weighted kNN |
|---|---|---|---|---|---|
| Normal | 100 | 100 | 99.56 | 100 | 99.89 |
| DB | 100 | 96.47 | 97.00 | 94.20 | 96.40 |
| DG | 99.56 | 94.53 | 95.27 | 92.93 | 94.47 |
| MOS | 100 | 97.20 | 97.80 | 95.40 | 96.47 |
| Overall | 99.89 | 97.05 | 97.80 | 95.40 | 96.47 |

[DG denotes damaged gear; DB denotes damaged bearing; MOS denotes misaligned output shaft.]

(2) Proposed approach vs. SVM and kNN based on hand-crafted statistical features.

The same statistical features in the time and frequency domains are used in this case, as given in Table 4.5. Features are calculated from the vibration signals from both sensors. The linear SVM, quadratic SVM, kNN, and weighted kNN are compared here.
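For reference, several of the time-domain features of Table 4.5 can be computed as in the sketch below. It is an illustrative implementation under stated assumptions rather than the code used in this work: the function name `time_domain_features` is hypothetical, the signal is assumed to be a non-empty sequence that is not identically zero, the clearance factor is left out, and kurtosis and skewness are computed as the raw moments written in the table (not the normalized forms).

```python
import math


def time_domain_features(x):
    """Compute nine time-domain features of Table 4.5 for one signal x.

    Assumes x is a non-empty sequence of floats that is not all zeros.
    """
    n = len(x)
    abs_x = [abs(v) for v in x]
    mean = sum(x) / n
    abs_mean = sum(abs_x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs_x)
    return {
        "absolute_mean": abs_mean,
        "variance": sum((v - mean) ** 2 for v in x) / n,
        "crest": peak,
        "kurtosis": sum(v ** 4 for v in x) / n,   # raw fourth moment, as in the table
        "crest_factor": peak / rms,
        "rms": rms,
        "pulse_factor": peak / abs_mean,
        "skewness": sum(v ** 3 for v in x) / n,   # raw third moment, as in the table
        "shape_factor": rms / abs_mean,
    }


if __name__ == "__main__":
    signal = [0.5, -1.0, 2.0, -0.5, 1.0]
    for name, value in time_domain_features(signal).items():
        print(f"{name}: {value:.4f}")
```

Applying the function to the signal of each sensor and concatenating the results would give one feature vector per sample for the SVM and kNN classifiers.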
Table 4.9 shows the diagnosis results using SVM and kNN compared with the proposed approach. The classification accuracies for the gearbox in the normal condition are close to 100% for all the methods. For the other conditions, the proposed method achieves much better accuracy than all the other approaches. The overall performance of the proposed approach is the best.

In summary, a CNN-based approach with multiple sensor fusion was proposed for fault diagnosis of rotating machinery. Sensor fusion was achieved at the data level to increase the diagnosis accuracy and reliability by integrating the raw signals to form the input of the CNN-based model. Representative features were learned directly from raw signals by training the CNN model, so no hand-crafted features were needed. Mini-batch stochastic gradient descent and dropout were utilized in the training process to increase efficiency and to prevent overfitting when the size of the available data was small. Experimental studies on both roller bearings and gearboxes verified the diagnosis performance of the proposed approach. The comparison between the proposed method and the traditional approaches showed that the proposed CNN-based method could achieve higher and more reliable diagnosis performance. Moreover, the end-to-end feature learning capability of the proposed approach would enable its wide application in fault diagnosis of different types of machinery and various faults, even when prior knowledge and representative hand-crafted features are limited.

Chapter 5
RUL Prediction using Hierarchical Deep Neural Network

5.1 Introduction

Machine health prognostics supports the critical decision-making processes of MHM, such as maintenance scheduling, and is of great importance to engineered systems composed of multiple components [91]. RUL prediction is conventionally the key task in machine health prognostics.
With an accurate prediction of RUL, appropriate replacement or maintenance actions can be taken to ensure the reliability and safety of the running system. The approaches of RUL prediction can be classified into three categories: model-based methods, data-driven methods, and hybrid methods. Model-based methods predict the RUL by using a physical model that captures the possible damage progression processes. Due to the increased complexity of engineered systems, it is almost impossible to understand the physics of failure of all the components or subsystems. On the other hand, with significant advances in sensors, communication, data storage, and processing technologies, data-driven approaches have become more popular and more widely used. Data-driven approaches generally include sensory data acquisition, feature extraction, pattern recognition, and regression. Based on the characteristics of the machinery, various sensors, e.g., accelerometers, pressure sensors, thermometers, and cameras, can be used to acquire the condition monitoring data.

From the point of view of system design, the actual lifetime of a system or subsystem is most cost effective when it equals the designed or required lifetime. A shorter lifetime can lead to unsatisfactory system performance, and a longer lifetime can increase the cost if a more appropriate alternative design is available. Also, inappropriate design or assembly can reduce the lifetime of related components. Therefore, incorporating lifetime assessment into the evaluation of the system design can facilitate design optimization by identifying components or subsystems that may need improvement.

The main steps of most existing approaches to RUL prediction are feature extraction and regression modeling. For instance, in vibration-based prognostics of machines, features such as RMS and kurtosis are first calculated from the vibration measurements.
Then a regression model is fitted using the features. The existing approaches usually build RUL prediction models based on the entire degradation process. Fig. 5.1 shows a typical degradation process of a bearing. Fig. 5.1(a) plots the vibration signal of the run-to-failure process. Fig. 5.1(b) shows the RMS of the vibration signal. The degradation level often stays low in the beginning, starts to increase at a small rate after approximately half of the lifetime, and increases dramatically before the system fails.

Figure 5.1: Degradation process of a bearing. (a) Vibration signal of the run-to-failure process. (b) RMS of the vibration signal.

However, fitting the whole degradation process with a single regression model can be difficult because of the significant changes that occur at different stages. Moreover, training data that covers the whole degradation process is usually very limited, and the run-to-failure records have different lengths, which makes it even more challenging to determine a precise model for the entire degradation process. Therefore, in this dissertation, a hierarchical DNN-based RUL prediction (DNNRULP) method is proposed to assess the RUL. The DNNRULP method contains three main parts: a DNN-based health stage classifier (DNNHSC), an ANN-based RUL predictor (ANNRULP) for each health stage, and a smoothing operator. The degradation process is first segmented into n health stages based on the age within the entire degradation process. A DNNHSC is built and trained using training samples labeled with the n health stages. After training, when an online monitoring signal arrives, the DNNHSC outputs the probability that the current signal belongs to each health stage. In parallel, n ANNRULPs are trained, one for each stage, using features calculated from the raw signals in the training data. When the online signal arrives, the features are calculated first and input to the n ANNRULPs to obtain n RUL values.
Finally, a smoothing operator is applied to the probabilities output by the DNNHSC and the RUL values output by the ANNRULPs to obtain the predicted RUL. The flowchart of the proposed method is shown in Fig. 5.2. Rotating machinery is studied in this research as an illustration because of its significance and wide usage. With appropriate segmentation of the health stages in the DNNHSC part and appropriate use of features in the ANNRULP part, the proposed RUL prediction scheme can be applied to broad categories of components and systems.

Figure 5.2: Flowchart of the DNNRULP method.

5.2 Health Condition Classification using DNN

In this study, the degradation process is divided into five stages $\{S^{(i)}\}_{i=1}^{5}$, as shown in Fig. 5.3. The intervals are selected based on the degradation trend of the training data, which consists of the l available historical run-to-failure data items $\{H^{(i)}\}_{i=1}^{l}$. Each historical run-to-failure data item $H^{(i)}$ contains k time series items $X^{(i)}_j \in \{X^{(i)}_1, X^{(i)}_2, \ldots, X^{(i)}_k\}$, one at each time cycle from the beginning to the end of the test; i indicates the i-th run-to-failure data item and $j \in \{1, 2, \ldots, k\}$. Each time series contains m data points; m is usually the same for all the time series items, as determined by the sampling rate and the sampling duration. Then, a DNN model is initialized and trained using the training data. Fig. 5.4 shows the structure of the DNN used in the DNNHSC. The SDA-based feature learning procedure shown in Section 3.3 is adopted here.

Figure 5.3: Five stages of the degradation process.

The flowchart of the health stage classification is shown in Fig. 5.5. The main steps are the following:

Step 1: Data pre-processing.
The condition monitoring data is collected through various sensors chosen according to the characteristics of the monitored system. In this work, vibration signals are used for rotating machinery. The spectrum of the time signal is obtained by applying the fast Fourier transform.
The coefficients are then used as the input to the DNN.

Step 2: Initialization of the DNN.
A DNN with N hidden layers is initialized with random parameters. New layers can be added until the model starts to overfit the training data, as suggested by [73]. The number of nodes in the first layer can be set equal to or slightly higher than the dimension of the input data. The number of nodes in the following layers usually decreases gradually.

Step 3: Training of the DNN.
First, a denoising autoencoder is applied using the method described in Section 3.3.1 for improved initialization of the parameters of the network. Layer-wise training is carried out before the layers are stacked. Then, supervised training is conducted to further update the parameters using all the training data labeled with health stages.

Step 4: Health stage classification with the trained DNN.
With the trained DNN, health stage classification can be carried out on test data. The DNN outputs five probabilities, one for each of the five health stages, that the input signal belongs to that stage.

Figure 5.4: Structure of the DNN used in the DNNHSC.

Figure 5.5: Flowchart of the health stage classification.

5.3 ANN-based RUL Predictor and Smoothing Operator

An ANN with one or two hidden layers has been used by researchers for RUL prediction, capturing the nonlinear relation between the condition monitoring signal and the progress of degradation. Instead of modeling the entire degradation process with one ANN model, in this dissertation several ANN models are built based on the different health stages of the system. The overall degradation process is segmented into five health stages, and the samples in each stage form the training samples for the corresponding ANN-based RUL predictor, as shown in Fig. 5.6.

Figure 5.6: Formation of the training samples for each ANNRULP.

An ANN structure similar to the one in [37] and [38] is adopted in this dissertation to build each RUL predictor.
The ANN model contains an input layer, two hidden layers, and an output layer, as shown in Fig. 5.7.

Figure 5.7: ANN model for RUL prediction.

The input of the ANN model includes $t_i$, $t_{i-1}$, $RMS_i$, and $RMS_{i-1}$. Here, $t_i$ and $t_{i-1}$ are the age values of a component at the inspection point i and the previous inspection point i−1. $RMS_i$ is the RMS value of the signal at the inspection point i, and $RMS_{i-1}$ is the RMS value of the signal at the previous inspection point i−1. The output of the ANN model is $LP_i$, the life percentage of the component at the inspection point i. It is defined as the lifetime up to the i-th inspection point divided by the duration of the entire life, expressed as a percentage.

The ANN model considers the age values and the measurement values of the current and the previous inspection points to build an estimation model for RUL. Only the data at these two inspection points are used, instead of incorporating data from more past inspection points, since an ANN can have better generalization capability with fewer input nodes [38].

After obtaining the predicted RUL values from the predictors, a smoothing operator is applied to obtain the final RUL prediction through the following equation:

$$\mathrm{RUL}(X) = \sum_{i=1}^{n} P(S_i \mid X) \cdot R_i(X) \tag{5.1}$$

Here X is the current measurement of the monitored system; i = 1, 2, ..., n represents the n different health stages; $P(S_i \mid X)$ is the probability that X is assigned to class i by the DNN health stage classifier; and $R_i(X)$ is the RUL predicted for the current measurement by the i-th ANN predictor.

5.4 Experimental Study

This section presents a case study of RUL prediction using the proposed DNNRULP method. RUL prediction of bearings is considered here. The typical degradation process of a bearing is shown in Fig. 5.1.

5.4.1 Data Description

In the present experimental study, the dataset of the IEEE 2012 PHM Data Challenge is used [92]. The experiment platform, PRONOSTIA, is designed to test and validate methods of bearing fault diagnosis and prognosis.
It provides real experimental data on the degradation of bearings within only a few hours. Compared with other bearing testbeds, PRONOSTIA provides degradation data without any defects initially seeded on the bearings. PRONOSTIA includes three main parts: a rotating part, a degradation generation part, and a measurement part. The detailed components of the platform are shown in Fig. 5.8.

Figure 5.8: PRONOSTIA bearing test setup.

Run-to-failure experiments are performed on the PRONOSTIA platform. Vibration and temperature signals are collected from 17 run-to-failure bearing tests under three different operating conditions, as listed in Table 5.1. In this study, only the vibration signal in the horizontal direction is used for the RUL prediction. The vibration signal is sampled at 25.6 kHz for a duration of 0.1 s every 10 s, as shown in Fig. 5.9. Therefore, each sample contains 2560 points. The training dataset is formed by 16 run-to-failure vibration signals, and the testing dataset is formed by one run-to-failure signal.

Table 5.1: The operation conditions of the experiment

| Operation condition | Motor speed (rpm) | Radial force (N) |
|---|---|---|
| 1 | 1800 | 4000 |
| 2 | 1650 | 4200 |
| 3 | 1500 | 5000 |

5.4.2 RUL Prediction Using DNNRULP

First, the DNNHSC is trained using the training samples. FFT is performed on each vibration signal from the bearing test. Each vibration signal contains 2560 data points. Thus, there are 1280 data points after FFT. A five-layer DNN with one input layer, three hidden layers, and one output layer is initialized with random parameters. The numbers of nodes in the hidden layers are selected as 800, 400, and 100. The samples from the 16 run-to-failure tests are used to train the DNN model. SDA is first performed to obtain a better initialization of the parameters. Then supervised fine-tuning is conducted using the labeled training data. Second, the ANNRULPs are trained separately using the samples of the corresponding stage. The RMS value is first calculated for each vibration sample.
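The ANN inputs and target defined in Section 5.3 can be assembled from one run-to-failure record as sketched below. This is an illustration under stated assumptions, not the code used in this work: the function names `rms` and `build_ann_samples` are hypothetical, the age of the i-th snapshot is assumed to be i times the 10 s acquisition interval, and the end of the record is taken as the end of life.

```python
import math


def rms(sample):
    """Root mean square of one vibration snapshot."""
    return math.sqrt(sum(v * v for v in sample) / len(sample))


def build_ann_samples(snapshots, interval_s=10.0):
    """Form (input, target) pairs for the RUL predictor from one test record.

    snapshots: vibration snapshots of one run-to-failure test, in time order.
    Input:  (t_i, t_{i-1}, RMS_i, RMS_{i-1}); target: life percentage LP_i.
    Assumption: snapshot i (1-based) has age i * interval_s, and the last
    snapshot marks the end of life.
    """
    ages = [(i + 1) * interval_s for i in range(len(snapshots))]
    rms_values = [rms(s) for s in snapshots]
    total_life = ages[-1]
    samples = []
    for i in range(1, len(snapshots)):
        inputs = (ages[i], ages[i - 1], rms_values[i], rms_values[i - 1])
        life_percentage = 100.0 * ages[i] / total_life
        samples.append((inputs, life_percentage))
    return samples


if __name__ == "__main__":
    record = [[1.0, -1.0], [2.0, -2.0], [3.0, -3.0], [4.0, -4.0]]
    for inputs, target in build_ann_samples(record):
        print(inputs, target)
```

Pairs built this way would then be assigned to the health stage containing inspection point i, so that each of the stage-specific predictors is trained only on samples from its own stage.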
Then, five ANNs with the structure shown in Fig. 5.7 are initialized with random parameters. The age values and the RMS values of the current inspection point and the previous point form the input of each ANN. The expected output is the life percentage at the current inspection point. The five ANNs are trained using the training dataset.

Figure 5.9: Vibration signal acquisition scheme.

With the trained DNNHSC and five ANNRULPs, the RUL prediction is performed on the testing data, which is the vibration signal from one run-to-failure experiment. The testing sample has 434 inspection points. Ten trials are conducted in the experiment. Table 5.2 shows the training and testing results of the DNNHSC. The life percentage of the testing data predicted using Equation (5.1) is shown in Fig. 5.10.

The average prediction error is used as the measure of the prediction performance, using the following equation:

$$\bar{e} = \frac{1}{n} \sum_{i=1}^{n} \left| LP_i - \widehat{LP}_i \right| \tag{5.2}$$

where $\bar{e}$ denotes the average prediction error; n is the number of inspection points of the testing sample; $LP_i$ is the actual life percentage at inspection point i; and $\widehat{LP}_i$ is the predicted life percentage at inspection point i. The RUL prediction accuracy in the later stage of the running system is usually more important, as the system is then close to possible failure and maintenance or replacement needs to be arranged.
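Equations (5.1) and (5.2) amount to a probability-weighted sum and a mean absolute error, as the following sketch shows. It is a minimal illustration: the function names `smooth_rul` and `average_error` and the optional `last` argument, which restricts the error to the final inspection points, are hypothetical conveniences and not part of the original formulation.

```python
def smooth_rul(stage_probs, stage_predictions):
    """Equation (5.1): weight each stage predictor by its stage probability."""
    if len(stage_probs) != len(stage_predictions):
        raise ValueError("one probability is needed per stage predictor")
    return sum(p * r for p, r in zip(stage_probs, stage_predictions))


def average_error(actual, predicted, last=None):
    """Equation (5.2): mean absolute error between actual and predicted values.

    If `last` is given, only the final `last` inspection points are used
    (a hypothetical convenience for evaluating the late degradation stage).
    """
    if last is not None:
        actual, predicted = actual[-last:], predicted[-last:]
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Hypothetical classifier output and stage predictions for one measurement
    probs = [0.1, 0.2, 0.7, 0.0, 0.0]
    predictions = [90.0, 70.0, 50.0, 30.0, 10.0]
    print(smooth_rul(probs, predictions))
    print(average_error([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 40.0]))
```

Because the weights are class probabilities that sum to one, the smoothed value always lies between the smallest and largest stage predictions, which damps abrupt jumps when the classifier is uncertain between neighboring stages.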
Thus, in this dissertation, the prediction performance over the last 10% and the last 5% of the total inspection points is calculated to evaluate the effectiveness of the proposed method. The average prediction error over the last 10% and the last 5% of the total inspection points is calculated using Equation (5.2) by setting n to 43 and 22, respectively. Table 5.3 lists the average prediction error over all the inspection points and over the last 10% and 5%.

Table 5.2: Training and testing result of the DNNHSC

| Trial number | Training accuracy (%) | Validation accuracy (%) | Testing accuracy (%) |
|---|---|---|---|
| 1 | 80.65 | 78.40 | 78.11 |
| 2 | 80.82 | 79.21 | 70.51 |
| 3 | 81.15 | 78.03 | 73.27 |
| 4 | 80.69 | 77.54 | 76.50 |
| 5 | 80.97 | 78.16 | 77.42 |
| 6 | 80.72 | 78.53 | 75.35 |
| 7 | 81.02 | 77.07 | 70.74 |
| 8 | 80.98 | 79.64 | 75.12 |
| 9 | 81.20 | 77.26 | 73.27 |
| 10 | 80.70 | 77.32 | 74.65 |
| Average | 80.89 | 78.12 | 74.49 |
| Standard deviation | 0.19 | 0.81 | 2.45 |

Figure 5.10: RUL prediction result using the HDNNRULP.

Table 5.3: RUL prediction error using the HDNNRULP

| Trial number | E_ave (%) | E_ave10 (%) | E_ave5 (%) |
|---|---|---|---|
| 1 | 7.12 | 9.23 | 9.69 |
| 2 | 7.45 | 7.66 | 8.37 |
| 3 | 6.96 | 7.77 | 9.54 |
| 4 | 6.63 | 7.49 | 8.00 |
| 5 | 8.89 | 9.13 | 9.59 |
| 6 | 5.79 | 6.78 | 7.51 |
| 7 | 7.30 | 5.14 | 6.83 |
| 8 | 6.67 | 8.64 | 9.15 |
| 9 | 8.04 | 12.12 | 10.70 |
| 10 | 7.74 | 7.30 | 9.37 |
| Average | 7.26 | 8.13 | 8.88 |
| Standard deviation | 0.85 | 1.84 | 1.17 |

The average $\bar{e}$ of the ten trials is 7.26%, with a standard deviation of 0.85%. The overall prediction performance is satisfactory. The prediction errors over the last 10% and the last 5% of the entire degradation are 8.13% and 8.88%, respectively.

The RUL prediction result of the proposed method, which uses a hierarchical DNN, is compared with that of a single ANN predictor trained on the whole degradation process without any stage classification. Fig. 5.11 plots the prediction results of the two approaches.

The proposed method has better overall performance than the single predictor. Only in the very early stage, before 80 time cycles, does the single ANN predictor achieve a smaller error.
However, afterwards, the single ANN cannot model the degradation satisfactorily and has a significantly larger error. The proposed method achieves an average prediction error of 7.26% over the ten trials.

Figure 5.11: Comparison of the RUL prediction results.

In summary, a hierarchical DNN-based method has been proposed to predict the RUL of the components and subsystems of a monitored mechatronic system. Instead of modeling the entire degradation process with one model, the proposed method first segments the overall degradation process into several health stages based on the degradation characteristics. A DNN-based classifier is trained to classify the different health stages. For each health stage, one ANN model is established to model the relation between the extracted features and the RUL in that health stage. Finally, a smoothing operator is applied to the output of the DNNHSC and the outputs of the n ANNRULPs to obtain the RUL prediction. An experimental study of RUL prediction on bearings was conducted. The results demonstrate the effectiveness of the proposed method in RUL prediction. It achieves much higher prediction accuracy than a single ANN model applied to the entire degradation process.

Chapter 6
Conclusions and Future Work

6.1 Conclusions

With enhanced requirements for the reliability, flexibility, cost, and intelligence of mechatronic systems, continuous and rapid design improvement of these systems is crucial. In this dissertation, new methodologies for applying MHM to the automated design optimization of mechatronic systems have been developed.

First, a closed-loop design evolution framework incorporating MHM is presented for mechatronic systems. In this approach, MHM is used for design weakness detection. It continuously determines design improvements for a mechatronic system through the stages of conceptual design, detailed design, and implementation.
Specifically, a systematic approach for design weakness detection is developed to guide the redesign process of an existing system by narrowing down the search space.

With the assistance of IoT and CC, the limitations of a traditional MHM approach in sensing, data transmission, data storage, and data processing can be overcome. However, this generates massive monitoring data, for which more intelligent diagnostics and prognostics approaches are needed. This dissertation presented an intelligent fault diagnosis approach based on SDA and DNN. It overcomes the typical drawbacks of the traditional fault diagnosis methods: feature extraction that relies on heavy prior knowledge, the requirement of a large amount of labeled data, and model rebuilding for diagnosing new faults. Representative features can be learned automatically from massive, unlabeled condition data. After fine-tuning with a small amount of labeled data, the proposed approach can achieve precise fault diagnosis.

Moreover, the proposed approach can utilize the trained model to diagnose new conditions by further fine-tuning the DNN model with a small amount of labeled data from the new conditions. A standard dataset of bearing faults is used to evaluate the effectiveness of the proposed approach as well as the robustness of the method to noise. The approach enhances the capability of an MHM system for application to general and wider categories of mechatronic systems.

Sensor fusion can supply richer information for a more accurate and reliable estimation. In this dissertation, a CNN-based approach with multiple sensor fusion has been developed for fault diagnosis. Signals from multiple sensors are fused at the data level to form the input to the model. Representative features can be learned directly from raw signals by the CNN model without any hand-crafted features. Experimental studies on the fault diagnosis of a bearing and a gearbox illustrate the effectiveness of the CNN-based approach.
The end-to-end feature learning capability enables its wide application in fault diagnosis of machinery where prior knowledge and hand-crafted features are limited.

A hierarchical DNN-based RUL prediction approach is developed to improve the RUL prediction accuracy for the components and subsystems of the mechatronic system. Instead of modeling the entire degradation process with a single model, the degradation process is divided into several health stages, and several ANNs are used to model the degradation in each stage. For the health stage classification, a DNN-based classifier is trained on the raw monitoring data. The degradation in each stage is then modeled by an ANN using the calculated features. A smoothing operator is applied to the health stage classification result and the RUL predictions of each stage to generate the RUL prediction. The performance of the HDNNRULP was evaluated in an experiment on bearing RUL prediction.

6.2 Possible Future Work

This dissertation presented a systematic approach for applying MHM to the automated design optimization of a mechatronic system. The closed-loop framework of design evolution of a mechatronic system with MHM can achieve continuous design improvement. This dissertation developed a design weakness index integrating system performance evaluation, fault diagnosis, and prognosis. The detailed and practical formulation of the system performance evaluation can be further investigated for real-world applications. There is also room for research on how to interpret the deviation of the RUL of the components and subsystems with respect to the design quality. A case study of conceptual design improvement was conducted in Chapter 2.
Further research can be done on the detailed design optimization using the identified system design weaknesses.

In the DNN-based fault diagnosis approach, no hand-crafted features are needed. It can learn representative features automatically from a massive quantity of unlabeled condition data. However, the features learned by the DNN from the training data are not interpreted in detail. Future research can focus on analyzing the underlying meaning of these features.

The proposed CNN-based fault diagnosis approach, which incorporates sensor fusion at the data level, achieves more accurate and reliable fault diagnosis results. The CNN model handles multi-sensor data of the same length sampled at the same frequency. How to integrate data of different lengths could be further investigated, to extend the CNN-based fault diagnosis method to fusing data with different sampling frequencies.

Finally, in the hierarchical DNN-based RUL prediction, the segmentation of the health stages is based on the time of the degradation. One possible direction of research is to achieve optimized segmentation by applying an AI learning method to the training data. The number of stages and the time intervals of each stage can be determined by minimizing a cost function such as the average RUL prediction error.
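As an illustration of the segmentation idea in the last paragraph of Section 6.2, the sketch below searches over the number of health stages and their time boundaries by minimizing the average RUL prediction error on a held-out run. It is only a sketch under stated assumptions, not the method of this dissertation: the run-to-failure data are synthetic, a least-squares line stands in for each per-stage ANN, an exhaustive grid search stands in for the AI learning method, and all names (`make_run`, `stage_of`, `fit_stage_models`, `search_segmentation`) are hypothetical.

```python
import itertools
import numpy as np


def make_run(seed, life=200):
    """Synthetic run-to-failure record (assumption): slow drift, then fast wear."""
    rng = np.random.default_rng(seed)
    t = np.arange(life, dtype=float)
    feature = 0.002 * t + np.exp((t - 0.75 * life) / 12.0)
    feature += rng.normal(0.0, 0.02, size=life)
    rul = life - t  # remaining useful life at each time step
    return t, feature, rul


def stage_of(t, boundaries):
    """Time-based health stage index: 0 before the first boundary, and so on."""
    return np.searchsorted(np.asarray(boundaries, dtype=float), t, side="right")


def fit_stage_models(t, feature, rul, boundaries):
    """Fit one line per stage (a simple stand-in for the per-stage ANN)."""
    stage = stage_of(t, boundaries)
    models = []
    for s in range(len(boundaries) + 1):
        mask = stage == s
        design = np.column_stack([feature[mask], np.ones(mask.sum())])
        coef, *_ = np.linalg.lstsq(design, rul[mask], rcond=None)
        models.append(coef)
    return np.array(models)


def mean_abs_error(boundaries, train, valid):
    """Cost function: average absolute RUL error on the validation run."""
    models = fit_stage_models(*train, boundaries)
    t, feature, rul = valid
    coef = models[stage_of(t, boundaries)]
    pred = coef[:, 0] * feature + coef[:, 1]
    return float(np.mean(np.abs(pred - rul)))


def search_segmentation(train, valid, candidates, max_stages=3):
    """Try every choice of up to max_stages - 1 boundaries; keep the cheapest."""
    best = None
    for n_bounds in range(max_stages):
        for boundaries in itertools.combinations(candidates, n_bounds):
            cost = mean_abs_error(boundaries, train, valid)
            if best is None or cost < best[0]:
                best = (cost, boundaries)
    return best


train, valid = make_run(seed=0), make_run(seed=1)
candidates = np.arange(20, 200, 20)  # coarse grid of candidate boundary times
single_cost = mean_abs_error((), train, valid)
best_cost, best_boundaries = search_segmentation(train, valid, candidates)
print(f"single model MAE: {single_cost:.2f}")
print(f"best segmentation {tuple(int(b) for b in best_boundaries)} MAE: {best_cost:.2f}")
```

Because the search includes the case with no boundaries, the selected segmentation can never score worse on the validation run than a single model. In practice the grid search would likely be replaced by a learned or gradient-free optimizer, and the cost would be averaged over several runs.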

