UBC Theses and Dissertations

Online chatter detection using self-evolving automated machine learning fusion (Rahimi, MohammadHossein, 2021)


Online Chatter Detection using Self-Evolving Automated Machine Learning Fusion

by

MohammadHossein Rahimi

B.Sc., Sharif University of Technology, 2017

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Mechanical Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

March 2021

© MohammadHossein Rahimi, 2021

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled "Online Chatter Detection using Self-Evolving Automated Machine Learning Fusion", submitted by M. Hossein Rahimi in partial fulfillment of the requirements for the degree of Master of Applied Science in Mechanical Engineering.

Examining Committee:
Dr. Yusuf Altintas, Mechanical Engineering (Supervisor)
Dr. Bhushan Gopaluni, Chemical and Biological Engineering (Supervisory Committee Member)
Dr. Xiaoliang Jin, Mechanical Engineering (Supervisory Committee Member)

Abstract

Computer Numerical Controlled (CNC) milling removes excess metal from a blank to produce the final shape of a workpiece. One limitation on achieving high material removal rates at reduced cost in machining operations is the self-excited vibration known as chatter. Chatter results in poor surface finish, damages the workpiece and the machine tool's spindle, and causes accelerated tool wear. Chatter detection and prevention have therefore been major fields of study in manufacturing, aimed at improving the quality and productivity of machining operations. This research proposes a self-evolving, online method to detect and avoid chatter in milling operations by fusing deep learning with the knowledge of chatter theory. The process is monitored by collecting vibration data during machining. Windows of data are converted into Short-time Fourier Transforms and processed through a Convolutional Neural Network to identify five machining states: air cutting, entrance to and exit from the cut, stable cutting, and unstable cutting with chatter. A base state detection model is built by modifying and training the AlexNet architecture on experimental data with known states. A specialized deep learning architecture is then designed from this base model through an automated machine learning process, with high state detection accuracy and low complexity as the objectives.

In parallel to the machine learning model, chatter detection with a physics-based model is executed to increase the robustness and accuracy of chatter detection. The forced vibrations that always occur in milling are removed by a Kalman filter, and the occurrence of chatter is detected using an energy-based method. The hybrid system detects chatter with a 98.8% success rate and has a built-in online self-improving capability. When chatter is detected, the system stops the machining process and commands a new spindle speed to force the machine to operate in a chatter-free zone.

Lay Summary

Computer Numerical Controlled (CNC) machines are widely used in the machining industry. The motion of the machine along the commanded tool path, as well as the rotation of the spindle that carries the tool, is governed by the CNC, and the final shape of the product is produced through metal removal.

The cost and machining time in the aerospace and die-and-mold industries are quite significant. When a machine experiences unstable chatter, the resulting vibrations lead to high forces and poor surface finish, which may destroy the part, the tool, and the costly spindle.

This thesis presents a method to detect and suppress chatter vibration by fusing the physics of the cutting process with Artificial Intelligence. The proposed method detects chatter and identifies the best spindle speed to avoid it during machining.
Preface

I was the lead investigator for the work presented in Chapters 3, 4, and 5. I planned the experimental design, conducted the experiments, developed the algorithms, and completed the analysis of the results. Dr. Yusuf Altintas was my supervisor.

A version of Chapters 3, 4, and 5 has been presented at a conference: I presented the material at the British Columbia Artificial Intelligence Showcase. Additionally, the manuscript "On-line chatter detection in milling using machine learning" was written by M. Hossein Rahimi, Hoai Nam Huynh, and Dr. Yusuf Altintas.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
List of Symbols
List of Abbreviations
Acknowledgments
Chapter 1: Introduction
Chapter 2: Literature Review
  2.1 Dynamic of Metal Cutting
  2.2 Chatter Prediction and Detection using Physics-driven Methods
    2.2.1 Online Energy-based Chatter Detection
      2.2.1.1 Identification of the Periodic Component of Machining Vibrations
      2.2.1.2 Chatter Component Assessment
  2.3 Machine Learning and its Utilization in Industry and Machining Sciences
  2.4 Summary
Chapter 3: Online Cutting State Detection Using Machine Learning
  3.1 Machine Learning Training Pipeline
  3.2 Experiment Design, Data Collection, and Plausibility Assessment
    3.2.1 Cleaning, Denoising, and Handling Data using Comprehensive Industrial Process Monitoring Data Structure
    3.2.2 Plausibility Assessment and Setting a Baseline
  3.3 Frequency Domain Conversion and Feature Extraction
  3.4 Machining State Detection using Transfer Learning
    3.4.1 Deep Learning Pipeline Overview
    3.4.2 Convolutional Blocks
      3.4.2.1 Two-dimensional Discrete Convolution Layer
      3.4.2.2 Rectified Linear Unit (ReLU)
      3.4.2.3 Local Response Normalization
      3.4.2.4 Max Pooling
    3.4.3 Artificial Neural Network Components
    3.4.4 SoftMax
    3.4.5 Focal Loss Classifier
    3.4.6 Transfer Learning (Base) Method's Architecture Overview
  3.5 Automated Convolutional Neural Network Architecture Design for Machining State Detection
    3.5.1 Automated CNN Architecture Design Methodology and Structure
      3.5.1.1 Architecture Structure and Outline
      3.5.1.2 Design Methodology
    3.5.2 Bayesian Optimization for Machining State Detection Architecture Design
    3.5.3 Network Assessment and Objective Function Design
  3.6 Summary
Chapter 4: Machine Learning and Physics-driven Chatter Detection Fusion
  4.1 Improved Energy-Based Chatter Detection Method
    4.1.1 Chatter Energy Estimation
  4.2 Hybrid Model
    4.2.1 Decision-making Algorithm
    4.2.2 Chatter Suppression
  4.3 Self-evolving Artificial Intelligence using Semi-supervised Learning
    4.3.1 Self-evolution System
  4.4 Summary
Chapter 5: Experimental Validation and Results
  5.1 Data and Training Parameters
  5.2 Baseline
  5.3 Standalone Transfer Learning Machining State Detection Method Performance
  5.4 Standalone Custom Designed Convolutional Neural Network Performance
  5.5 Improved Energy-based Chatter Detection and Hybrid System (Machine Learning Fusion) Performance
  5.6 Summary
Chapter 6: Conclusion
  6.1 Future Work
Bibliography

List of Tables

Table 1 Number of Instances and Experiments Distribution for each Dataset and each Machining State
Table 2 Custom Designed Machining State Detection Model Training Hyperparameters
Table 3 Custom Designed Machining State Detection Model Training Hyperparameters
Table 4 Custom Designed Machining State Detection Model Performance tested on the Test Dataset

List of Figures

Figure 1 Stable Cut (top) versus Chatter (bottom) Surface Quality
Figure 2 Machining Forces for a Sample Process with ωc Chatter Frequency Illustration
Figure 3 System Overview
Figure 4 Orthogonal Cutting a) Dynamic Scheme, b) Block Diagram
Figure 5 Online Energy-based Chatter Detection Overview with a Sample Input to Illustrate the Process
Figure 6 Machine Learning Architecture Design and Algorithm Training Process Overview
Figure 7 Experiment Setup
Figure 8 Cutting States Illustration - from left to right respectively, Air Cut, Entrance, In Cut, and Exit
Figure 9 Experiments' Data Partitioning and Utilization Summary
Figure 10 Overview of Comprehensive Industrial Process Monitoring
Figure 11 Breaking down the Data and Embedded Data Augmentation
Figure 12 Setting the Baseline Procedure
Figure 13 Windowing of Vibration Signal for Calculating STFT
Figure 14 STFT for each given Instance of Vibration Signal
Figure 15 Short-Time Fourier Transform's Matrix Visual Representation Example
Figure 16 STFT Example
Figure 17 Deep Learning Training Pipeline Overview
Figure 18 A Generic Convolutional Block's Notation
Figure 19 Illustration of Group Convolution Function for a Two-group Group Convolution
Figure 20 Discrete Convolution Illustration with an Arbitrary Kernel Example
Figure 21 Applying a Line Detector Convolution Kernel on STFT Data Example
Figure 22 Example of Applying ReLU to Convolved STFT Matrix
Figure 23 Local Response Normalization Implementation Illustration
Figure 24 Max Pooling Operation Example
Figure 25 Example of Max-pooling and Applying it to a Convolved STFT after ReLU Function
Figure 26 Transfer Learning Architecture Overview
Figure 27 Overview of Automated Cutting State Detection Network Architecture Design Methodology Scheme
Figure 28 Architecture Structure and Design Methodology Scheme
Figure 29 Suggested Improved Energy-based Method's Scheme
Figure 30 Chatter Peak Detection Example
Figure 31 Proposed Physics-driven Method Performance on a Sample Signal
Figure 32 Probability versus Certainty in Machining State Detection Using Machine Learning Output
Figure 33 Decision Making Algorithm Outline
Figure 34 Chatter Suppression based on the Analytical Stability Lobes for Sandvik R390-050022 Tool
Figure 35 The Tools used during Data Collection, from left to right Sandvik R245-12T3, Seco R21769-1632, Sandvik R390-050022, Seco JS452120E33R050, and Sandvik 39241014-6340120B
Figure 36 Baseline Model Confusion Matrix (with Normalized Rows)
Figure 37 Decision Tree, Decision Criteria for Machining State Detection
Figure 38 Decision Tree Decision Boundaries
Figure 39 Normal Distribution Fitted to the Histogram of Each Machining State Experiments Time-domain Test Data
Figure 40 Baseline Transfer Learning Method's Training Process
Figure 41 The Output Value of Activation of the CNN Convolutional Layers for a given Piece of Stable Frequency-domain Vibration Signal
Figure 42 Normalized Confusion Matrix of the Transfer Learning Model
Figure 43 Automated Machine Learning Architecture Design Iterations
Figure 44 Architecture Validation Accuracy, Objective Function Value, and Complexity of Each Bayesian Optimization Iteration with Top Three Minimum Objective Function Value Iterations Marked
Figure 45 Objective Function Value versus Bayesian Optimization Estimation of Objective Function Value During Architecture Design Process
Figure 46 Custom Designed Machining State Detection CNN Architecture Schematic
Figure 47 The Output Value of Activations of the CNN Convolutional Layers for a given Piece of Frequency-domain Vibration Signal
Figure 48 Custom Designed Architecture Training Process
Figure 49 Trained Model of Custom Designed Architecture's Confusion Matrix
Figure 50 Improved Energy-based Chatter Detection Confusion Matrix with Normalized Rows, based upon the Test Data
Figure 51 Hybrid (Machine Learning Fusion) Chatter Detection System Confusion Matrix with Normalized Rows

List of Symbols

ℎ𝑗   Regenerative chip thickness for 𝑗-th tooth
𝜃𝑗   Angular immersion angle for 𝑗-th tooth
𝐹𝑟,𝑗, 𝐹𝑡,𝑗   Radial and tangential forces for 𝑗-th tooth
𝑎𝑝   Depth of cut
𝑞   Modal displacement vector
𝐾𝑡, 𝐾𝑟   Tangential and radial cutting force coefficients
𝑎𝑥𝑥, 𝑎𝑥𝑦, 𝑎𝑦𝑥, 𝑎𝑦𝑦   Time-varying periodic directional force coefficients
𝑄𝑐,𝑛, 𝑄̃𝑐,𝑛   Complex coefficient vectors and their conjugates at the 𝑛-th harmonic
𝑄   Process noise covariance matrix
𝑋𝑝,𝑛, 𝑋𝑐,𝑛   Amplitude of 𝑛-th periodic and chatter vectors
𝜆   Process noise to measurement noise ratio
𝐹   Force vector
𝐾   Kalman gain vector
𝐻   Measurement matrix
𝑀   Number of band-pass filters
𝑅   Measurement noise covariance
𝐷   Lag parameter
𝑥̂𝑘   Estimated state vector
𝜓   Nonlinear energy operator (NEO)
𝜓𝑑   Nonlinear energy operator, discrete form
𝑦𝑐   Chatter component
𝐸̂𝑝   Energy of the periodic part of the signal
𝐴̂𝑝,𝑛   Amplitude of the 𝑛-th harmonic
𝜂   Energy integration factor
𝜏   Time delay
Φ   Phase
Ψ   Non-linear energy operator
𝐸𝑁𝑐   Chatter energy
𝐸𝑁𝑃   Periodic energy
𝐸𝑅   Energy ratio
𝜔𝑡   Tooth passing frequency
𝜔𝑠   Spindle frequency
𝜔𝑐   Chatter frequency
𝜎   Standard deviation
𝜇3   Skewness
𝑓𝑠   Sampling frequency
𝑀   Windowing function length
𝐿𝑤   STFT window size
𝐿𝑜   STFT overlap size
ReLU(𝑥)   Rectified linear unit
𝜎𝛽   Softmax function
𝑓   Surrogate function
𝜇𝑄   Acquisition function
𝑝(state)   Probability of each machining state determined by the machine learning algorithm
𝑝state   Probability of each state
𝑃𝐶   Accumulative probability of chatter
𝑃𝑆   Accumulative probability of stable process
𝐵(𝑙)   Output of 𝑙-th convolutional block
𝐾(𝑙)   Kernel weights matrix of 𝑙-th convolutional block
𝐶(𝑙)   Output of 𝑙-th convolutional layer
𝑤𝑙   Weights matrix of neural network layer 𝑙
𝑤𝑖,𝑗,𝑙   Weight of the connection of neuron 𝑖 from layer 𝑙 − 1 and neuron 𝑗 from layer 𝑙
𝑏𝑙   Biases matrix of layer 𝑙
𝑏𝑗,𝑙   Bias of layer 𝑙 and neuron 𝑗
𝐹𝐿   Focal loss
𝑠𝑖   Network design hyperparameters set for 𝑖-th iteration
𝑊𝑃   Weights matrix of loss penalty
𝑎𝑖(𝑙)   Activation of neuron 𝑖 of layer 𝑙
𝐶𝑖   Convolutional block 𝑖
𝑛𝑐   Number of convolutional blocks
𝑁𝑖   Neural network hidden layer 𝑖
𝑛𝑁   Number of neural network hidden layers
𝐿(𝜃)   Likelihood of 𝜃
𝑇   Number of training instances
𝑘complexity   Complexity of network
𝑌𝑐   Frequency-domain signal of the non-periodic part of the vibration signal
𝑎𝑡ℎ,𝑖   Peak threshold of 𝑖-th point of discrete time
𝑓𝑠𝑢𝑝   Suppression factor
𝑡ℎCertainty   Certainty threshold
𝑡ℎ𝐸𝑅   Self-evolution energy ratio certainty threshold
𝐾𝑝   Energy-based influence factor
𝛼   Learning rate
𝜖   Offset denominator
𝑚𝑖, 𝑣𝑖   Parameter gradients and their squares

List of Abbreviations

AI   Artificial Intelligence
ANN   Artificial Neural Network
AutoML   Automated Machine Learning
CNN   Convolutional Neural Network
DDE   Delay Differential Equations
DFT   Discrete Fourier Transform
ESA   Energy Separation Algorithm
FFT   Fast Fourier Transform
FL   Focal Loss
FRF   Frequency Response Function
HQC   Hannan-Quinn Information Criterion
IIR   Infinite Impulse Response
LSTM   Long Short-term Memory
ML   Machine Learning
NAS   Neural Architecture Search
NEO   Teager-Kaiser Nonlinear Energy Operator
ODE   Ordinary Differential Equation
SDOF   Single Degree of Freedom
STD   Standard Deviation
STFT   Short-time Fourier Transform

Acknowledgments

I would like to express my gratitude to my supervisor, Professor Yusuf Altintas, for the confidence he placed in me by accepting me into the Manufacturing Automation Laboratory, and for his continued support and guidance throughout my Master of Applied Science research.

To my friends and peers in the Manufacturing Automation Laboratory, thank you for your technical and moral support, and for all the shared laughs and conversations. I would like to thank Dr. Hoai Nam Huynh for his valuable input during the process of writing my thesis.

Finally, I would like to thank my magnificent family. I am incredibly grateful to my parents. Without the inspiration, drive, and support that they have given me, I might not be the person I am today.

Chapter 1: Introduction

Machining is a subtractive manufacturing process in which a cutting tool moves relative to a workpiece and removes material to form the desired shape. Machining processes are critical operations in manufacturing parts with final shapes, so the productivity rates and dimensional accuracy of the machined parts are of great importance. Violating dimensional tolerances during production scraps the parts, which is costly and disruptive to the manufacturing chain in the factories.

Figure 1 Stable Cut (top) versus Chatter (bottom) Surface Quality

Self-excited vibrations in machining, known as chatter, are the most important limitation on high material removal rates and the dimensional accuracy of the machined parts [1].
As shown in Figure 1, an unstable machining process with chatter leaves vibration marks on the workpiece and results in poor surface quality. During chatter, the amplitude of the unstable vibration may grow exponentially, leading to excessive chip thickness and hence large cutting forces. Chatter not only results in poor surface quality and excessive noise, but can also overload the spindle bearings and cause accelerated tool wear.

Chatter can be avoided by selecting depths of cut and cutting speeds within the stable zone during process planning, before the part is machined. The stable cutting conditions can be predicted as a function of the Frequency Response Function (FRF) of the tool and workpiece structures at their contact zone, the material properties, and the tool geometry. However, there are always uncertainties in the material properties and in the FRF of the machine structure, which may change under operating conditions. Therefore, an online chatter detection and avoidance system is necessary to run the machine in stable conditions during the machining process.

As shown in Figure 2, the milling process commonly exhibits three primary sets of forces. The stable, forced vibrations occur at the tooth passing frequency and its harmonics. Chatter, the unstable vibration, occurs close to one of the natural frequencies. There are also sidebands spread at multiples of the tooth passing frequency away from the chatter frequency. Having multiple vibration sources with frequencies scattered over various ranges makes it difficult to differentiate between stable and unstable cutting. Moreover, the machining process has numerous states, such as entry to and exit from the workpiece, which excite transient vibrations at the natural frequencies of the machine that resemble chatter.
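The frequency structure described above can be made concrete with a short numeric sketch. The spindle speed, tooth count, and chatter frequency below are hypothetical illustration values, not measurements from this thesis, and `milling_frequencies` is a helper name introduced here for the example only.

```python
# Sketch of the milling frequency structure described above (hypothetical values).
# Forced vibrations appear at the tooth passing frequency and its harmonics;
# chatter appears near a natural frequency, with sidebands spaced at multiples
# of the tooth passing frequency around the chatter frequency.

def milling_frequencies(spindle_rpm, n_teeth, chatter_hz, n_harmonics=3, n_sidebands=2):
    spindle_hz = spindle_rpm / 60.0              # spindle frequency (omega_s)
    tooth_hz = spindle_hz * n_teeth              # tooth passing frequency (omega_t)
    harmonics = [k * tooth_hz for k in range(1, n_harmonics + 1)]
    sidebands = [chatter_hz + k * tooth_hz
                 for k in range(-n_sidebands, n_sidebands + 1) if k != 0]
    return tooth_hz, harmonics, sidebands

tooth_hz, harmonics, sidebands = milling_frequencies(
    spindle_rpm=6000, n_teeth=4, chatter_hz=850.0)
print(tooth_hz)    # 400.0
print(harmonics)   # [400.0, 800.0, 1200.0]
print(sidebands)   # [50.0, 450.0, 1250.0, 1650.0]
```

With these example numbers, the sideband at 450 Hz sits only 50 Hz from the second forced-vibration harmonic, which illustrates why a simple frequency threshold cannot separate stable and unstable cutting.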
Figure 2 Machining Forces for a Sample Process with 𝝎𝒄 Chatter Frequency Illustration

Data-driven methods could be used for chatter detection to account for time-varying system parameters and to compensate for measurement inaccuracies by processing the vibration data collected during machining. However, it is essential to ensure the robustness and versatility of any data-driven method used for chatter detection.

This thesis presents an online chatter detection method that fuses a data-driven machine learning method with a physics-driven method to achieve robustness and accuracy. As shown in Figure 3, the proposed method consists of machining state detection using convolutional neural networks and a chatter detection algorithm based on the physics of milling, running in parallel. Both methods use the vibration signal as their input.

The first algorithm is a self-evolving machine learning method, capable of differentiating between stable and unstable cutting and of detecting the machining states, i.e. the tool-workpiece engagement. The processes of data acquisition, data handling, plausibility assessment, signal processing, signal feature extraction, machine learning model selection, and architecture design are described. A substantial number of cutting experiments were performed, resulting in a database of cutting experiments. The experiment database is used directly for the assessment of machining state detection using machine learning; furthermore, the vibration signal is used for architecture design and model training. A deep learning model called a Convolutional Neural Network (CNN) is used in the thesis. The network architecture, corresponding to the model's hyper-parameters, is designed using a process known as Automated Machine Learning (AutoML). The architecture design procedure starts with a transfer learning method as the initial input to the design loop.
During the execution of the designed AutoML algorithm, a Bayesian optimization algorithm seeks the best architecture in the defined search space. The resulting architecture is used for online machining state detection. The method takes the time-domain vibration signal as input, converts it to the frequency domain, and performs cutting state detection, outputting the probability of each machining state.

An automatic procedure is designed to evolve the machine learning model's performance while it is running online. It uses statistical methods to enhance the system's performance as it operates and observes more machining cases.

The second, simultaneously running method is an enhanced version of a previously developed physics-driven method called online energy-based chatter detection. This method also takes the vibration signal as input and outputs the ratio of chatter energy to periodic energy, which is used as an indicator of chatter. It first removes all the periodic components of the signal using a Kalman filter. It then transforms the residual signal to the frequency domain, sets a statistical threshold to group the chatter peaks, finds the eligible peaks based on certain criteria, and calculates the chatter energy. The ratio of the chatter energy to the accumulated chatter and stable energy is then computed; a higher ratio indicates a higher chance of instability.

The decision-making algorithm combines the probability of chatter from the machine learning method with the physics-based method's energy ratio. The cutting states detected by the machine learning algorithm help the algorithm avoid false chatter detections. When chatter is detected, the algorithm stops the feed, changes the spindle speed, and resumes cutting until chatter is avoided. If chatter is impossible to avoid due to the violation of physical limits, the operator is alerted to stop the operation.

Figure 3 System Overview

The thesis is structured as follows.
Chapter 2 discusses the state-of-the-art literature on chatter detection, based on both physics-driven and data-driven methods. It includes a brief review of recent developments in artificial intelligence applied to the field of machining stability.

Chapter 3 elaborates on the process of designing and training the machine learning model used for online machining state detection. The chapter starts with the experiment design process and data handling, and presents the experimental data used to train the machine learning model. The trained model is used as the initial state of an automated machine learning architecture design. The automated design process is developed and implemented to increase the chatter detection accuracy and to decrease the model complexity and required computing power.

Chapter 4 presents the fusion of the machine learning model with the physics-based model. The first section of the chapter covers the proposed improvements to the physics-based method, which increase its accuracy and robustness. Furthermore, the chapter explains the algorithm that fuses the two methods and describes the approach taken towards making the algorithm self-evolving.

Chapter 5 presents the experimental validation of the models. The chapter evaluates the performance of the methods both individually and in combination.

Chapter 6 concludes the thesis with final remarks and future work.

Chapter 2: Literature Review

There have been numerous attempts to predict and detect chatter before and during the machining process. Prior art on online chatter detection can be roughly categorized into the following two approaches: physics-driven methods, which use metal cutting dynamics to detect chatter vibration, and data-driven methods, which leverage experimental data to find patterns and detect chatter.
This work proposes a machine learning method for machining state detection, a physics-driven method for chatter detection, and a method to link the two and leverage the knowledge of metal cutting to improve the data-driven method's performance. In this chapter, prior research related to chatter detection, both data-driven and physics-based, and to automated machine learning architecture design is surveyed. The first section reviews the dynamics of metal cutting. Section 2.2 elaborates on previous attempts at chatter detection and briefly explains the energy-based chatter detection method which is adopted here as the physics-based method. Section 2.3 presents machine learning methods and their utilization in machining dynamics. Finally, Section 2.4 presents the conclusions drawn from the literature and positions this work among them.

2.1 Dynamics of Metal Cutting

The dynamic model of milling with relative tool-workpiece vibrations is shown in Figure 4, alongside a block diagram description of the system. In the figure, x_w and y_w represent the position of the workpiece, and x_t and y_t represent the position of the tool.

Figure 4 Orthogonal Cutting a) Dynamic Scheme, b) Block Diagram

The cutting forces act at the tool-workpiece contact in opposite directions. The forces are proportional to the static and regenerative chip thickness. The regenerative chip thickness is given by [2],

h_j(t) = c sin θ_j(t) + [x(t) − x(t − τ)] sin θ_j(t) + [y(t) − y(t − τ)] cos θ_j(t)   (1)

where the subscript j ∈ ℕ is the tooth index, and c [mm/rev/tooth] is the feed per tooth. The term τ [s] represents the time delay and is equal to the tooth passing period (60/(ω[rev/min] N) for a regular-pitch tool with N teeth).
Additionally, θ_j [rad] is the angular immersion of tooth j,

θ_j(t) = 2π ( (ω/60) t + (j − 1)/N )   (2)

The radial force F_{r,j} and tangential force F_{t,j} of tooth j, for a work material with radial and tangential cutting coefficients K_r and K_t respectively, are expressed as [3],

F_{r,j}(t) = g_j(t) ρ_j a_p K_r h_j(t)
F_{t,j}(t) = g_j(t) ρ_j a_p K_t h_j(t)   (3)

where a_p is the depth of cut, ρ_j is the run-out factor [4], and g_j(t) indicates whether tooth j is in the cut or not,

g_j(t) = 0 : if the j-th tooth is not in cut at time t
g_j(t) = 1 : if the j-th tooth is in cut at time t   (4)

Similarly, in the x-y frame, the forces are given by,

[F_x; F_y] = (1/2) a_p K_t [a_xx(t) a_xy(t); a_yx(t) a_yy(t)] [x(t) − x(t − τ); y(t) − y(t − τ)] + a_p K_t c [a_xx(t); a_yx(t)]   (5)

where the bracketed 2×2 matrix is denoted 2A(t) and the last term is the static force F_s(t). The entries a_xx(t), a_xy(t), a_yx(t), a_yy(t) are time-varying, periodic directional force coefficients expressed as [3],

a_xx(t) = −Σ_{j=1}^{N_t} g_j(t) ρ_j [ sin 2θ_j(t) + (K_r/K_t)(1 − cos 2θ_j(t)) ]
a_xy(t) = −Σ_{j=1}^{N_t} g_j(t) ρ_j [ 1 + cos 2θ_j(t) + (K_r/K_t) sin 2θ_j(t) ]
a_yx(t) = Σ_{j=1}^{N_t} g_j(t) ρ_j [ 1 − cos 2θ_j(t) − (K_r/K_t) sin 2θ_j(t) ]
a_yy(t) = Σ_{j=1}^{N_t} g_j(t) ρ_j [ sin 2θ_j(t) − (K_r/K_t)(1 + cos 2θ_j(t)) ]   (6)

The modal displacement vector of the system, with a size equal to the total number of vibration modes of the tool and workpiece, can be represented as the summation of its periodic component q_p(t) and perturbation q_c(t),

q(t) = q_p(t) + q_c(t)   (7)

For a machine with proportionally damped dynamics cutting under a stable condition, the equation of motion takes the following ordinary differential equation (ODE) form,

q̈(t) + [2ξω_n] q̇(t) + [ω_n²] q(t) = U^T a_p K_t A(t) U [q(t) − q(t − τ)] + U^T F_s(t)   (8)

where U is the mode shape matrix, ω_n is the natural frequency, and F_s is the static force from equation (5).
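As a numerical illustration of equations (1), (2), and (6), the sketch below evaluates the regenerative chip thickness and the directional coefficient matrix for sampled displacements. It is a simplified stand-in: run-out factors default to one and every tooth is assumed engaged unless an indicator function is supplied, and all numeric values used are illustrative only.

```python
import math

def immersion_angle(t, j, omega_rpm, n_teeth):
    """Angular immersion of tooth j at time t (eq. 2)."""
    return 2.0 * math.pi * (omega_rpm / 60.0 * t + (j - 1) / n_teeth)

def chip_thickness(t, j, c, dx, dy, omega_rpm, n_teeth):
    """Regenerative chip thickness (eq. 1); dx, dy are x(t)-x(t-tau), y(t)-y(t-tau)."""
    th = immersion_angle(t, j, omega_rpm, n_teeth)
    return c * math.sin(th) + dx * math.sin(th) + dy * math.cos(th)

def directional_matrix(t, omega_rpm, n_teeth, kr_over_kt, runout=None, in_cut=None):
    """Directional force coefficients of eq. (6); runout and in_cut default to
    'unit run-out, always engaged', which is a simplifying assumption."""
    axx = axy = ayx = ayy = 0.0
    for j in range(1, n_teeth + 1):
        g = 1.0 if in_cut is None else float(in_cut(t, j))
        rho = 1.0 if runout is None else runout[j - 1]
        th = immersion_angle(t, j, omega_rpm, n_teeth)
        s2, c2 = math.sin(2 * th), math.cos(2 * th)
        axx += -g * rho * (s2 + kr_over_kt * (1 - c2))
        axy += -g * rho * (1 + c2 + kr_over_kt * s2)
        ayx += g * rho * (1 - c2 - kr_over_kt * s2)
        ayy += g * rho * (s2 - kr_over_kt * (1 + c2))
    return ((axx, axy), (ayx, ayy))
```

With no regenerative displacement (dx = dy = 0) the chip thickness reduces to the static term c sin θ_j(t), which is an easy sanity check on the implementation.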
For a stable system with no perturbation, the steady-state forced vibrations follow,

q̈_p(t) + [2ξω_n] q̇_p(t) + [ω_n²] q_p(t) = U^T F_s(t)  →(ODE solution)→  q_p(t) = Σ_{n=1}^{∞} Q_{p,n} e^{i n ω_s t}   (9)

Given equations (8) and (9), the self-excited vibrations are found from the solution of the delayed differential equation (DDE) [5],

q_c(t) = q(t) − q_p(t) = Σ_{n=−∞}^{∞} ( Q_{c,n} e^{i(ω_c + n ω_t) t} + Q̃_{c,n} e^{−i(ω_c + n ω_t) t} )   (10)

where ω_c [rad/s] is the chatter frequency, and Q_{c,n} and Q̃_{c,n} are respectively the complex coefficient vectors and their conjugates at the n-th harmonic. The relative tool-workpiece displacement vector x(t) is determined by the steady-state solution of equation (8) as,

x(t) = Σ_{n=1}^{∞} X_{p,n} cos(n ω_s t + Φ_{p,n}) + Σ_{n=−∞}^{∞} X_{c,n} cos((±ω_c + n ω_t) t + Φ_{c,n})   (11)

where X_{p,n} and X_{c,n} are respectively the amplitudes of the n-th periodic and chatter vectors, and Φ_{p,n} and Φ_{c,n} are their phases.

2.2 Chatter Prediction and Detection using Physics-driven Methods

Chatter stability of milling has been studied extensively in the past few decades [6]. Tlusty and Polacek presented stability laws requiring the frequency response functions of both the machine tool and the workpiece, the cutting coefficients of the work material, and the tool geometry [7]. Their work resulted in stability lobes, which set a theoretical limit on the maximum stable depth of cut for each given spindle speed. However, uncertainties in the frequency response function measurements and their corresponding variables, together with the system's position- and time-varying dynamics, may still lead to chatter even in a theoretically stable zone predicted by the stability lobes.

The difficulties in chatter prediction and the machine tool's time-varying parameters motivated research into online chatter detection and suppression methods. There have been several attempts to detect chatter during the machining process. T. Delio et al.
[8] used a microphone signal during the milling process and converted it to the frequency domain to detect chatter, setting a threshold on the power spectrum to detect chatter occurrence. In other work, T. Choi and Y. C. Shin [9] detected chatter online using the fractal patterns they observed in acceleration signals, utilizing a wavelet-based maximum likelihood estimation algorithm. Attempting to detect chatter, M. Lamraoui et al. [10] investigated the cyclo-stationary components generated by the machining process.

2.2.1 Online Energy-based Chatter Detection

Energy-based chatter detection [11] is an online chatter detection method that uses milling physics to detect instability in the cut. Despite its high potential and robustness, false chatter detection happens quite often in energy-based chatter detection, especially during the transient states of the cut, such as the tool's entrance into the workpiece or its exit from it: the machine is excited by a condition similar to an impact and displays different vibration content and sudden changes in the vibration signal.

Energy-based chatter detection identifies chatter by comparing the energy levels of the forced vibration components and the unstable components of the signal. The periodic part of the signal and its energy are identified using a Kalman filter. The periodic part is subtracted from the whole signal, and the chatter energy is calculated using a bandpass filter bank and a Teager-Kaiser Nonlinear Energy Operator (NEO). Finally, the method compares the two calculated energies to decide on the stability of the machining process. The outline of energy-based chatter detection is presented in Figure 5 with a sample input to illustrate the process.
Figure 5 Online energy-based chatter detection overview with a sample input to illustrate the process

2.2.1.1 Identification of the Periodic Component of Machining Vibrations

Forces during chatter can be decoupled into periodic components due to cutting and forces close to the structural mode frequencies excited by regenerative chatter, given by equations (9) and (10), respectively.

The measured vibration signal is decoupled into the periodic part that causes forced vibrations and the unstable chatter components (y = y_p + y_c). The assumption is that the process is unstable when the energy of the self-excited vibration term (y_c) dominates the cutting process. The following equation estimates the periodic part of the cutting signal,

y_p(kT) = Σ_{n=1}^{N} Y_{p,n}(kT) cos(n ω_s kT + φ_{p,n})   (12)

where Y_{p,n}(kT) is the amplitude of the spindle harmonic with frequency n ω_s (n = 1, 2, …, N). The state variable representation of equation (12) with N harmonics is,

y(t_k) = [1 0 … 1 0] q_k + v_k,  with H = [1 0 … 1 0] (1 × 2N) and q_k (2N × 1)   (13)

where the state vector at the discrete time instance k + 1 is,

q_{k+1} = Φ q_k + w_k,  with Φ = blockdiag(φ_1, φ_2, …, φ_N) (2N × 2N)   (14)

and φ_i is the state transition matrix used to model the i-th harmonic (i = 1, 2, …, N).
Let the first and second states of a harmonic be q1_k = A_k cos(ω t_k + φ) and q2_k = A_k sin(ω t_k + φ), respectively. At the time t_{k+1} = t_k + T they become,

q1_{k+1} = A_{k+1} cos(ω(t_k + T) + φ) = A_{k+1}[cos(ω t_k + φ) cos(ωT) − sin(ω t_k + φ) sin(ωT)]
q2_{k+1} = A_{k+1} sin(ω(t_k + T) + φ) = A_{k+1}[cos(ω t_k + φ) sin(ωT) + sin(ω t_k + φ) cos(ωT)]   (15)

Hence the state update is,

[q1_{k+1}; q2_{k+1}] = [cos(ωT) −sin(ωT); sin(ωT) cos(ωT)] [q1_k; q2_k] + [w1_k; w2_k]  ⟹  φ_n = [cos(n ω_s T) −sin(n ω_s T); sin(n ω_s T) cos(n ω_s T)]   (16)

The state vector of the periodic signal (14) is estimated using the Kalman filter below [12],

q̂⁻_k = Φ q̂_{k−1}
P⁻_k = Φ P_{k−1} Φ^T + λR  (= Q)
K_k = P⁻_k H^T (H P⁻_k H^T + R)^{−1}
q̂_k = q̂⁻_k + K_k (s_k − H q̂⁻_k)
P_k = (I − K_k H) P⁻_k   (17)

where the process-noise to measurement-noise ratio is initially λ = 10⁻⁶. The chatter part of the signal is estimated from the estimated state vector q̂_k as,

y_c(t_k) = y(t_k) − ŷ_p(t_k) = y(t_k) − H q̂_k   (18)

Given cos²(ω t_k) + sin²(ω t_k) = 1, the amplitude of the n-th harmonic is,

Ŷ_{p,n}(t_k) = √( q̂²_{2n−1,k} + q̂²_{2n,k} )   (19)

2.2.1.2 Chatter Component Assessment

Chatter component reconstruction consists of two major steps. First, the signal is filtered to isolate the frequencies between each two consecutive tooth passing harmonics using a set of bandpass filters. The second step is to use the Nonlinear Energy Operator (NEO) [13] and the Energy Separation Algorithm (ESA) [14] to estimate a measure of the energy of each single-component signal in the discrete-time domain.

The signal y_c(t_k) goes through a series of bandpass filters. To isolate the parts of the signal corresponding to chatter frequencies lying between tooth passing harmonics, a bank of bandpass filters is used. The number of filters is equal to the number of tooth passing harmonics (N). This results in N output signals, each isolating the frequencies between one pair of tooth passing harmonics.
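A minimal, dependency-free sketch of the Kalman filter recursion of equation (17) for a single harmonic is given below. It is not the thesis implementation: a full version tracks N harmonics with the 2N×2N block-diagonal transition matrix of equation (14), whereas here one 2×2 rotation block (equation (16)) tracks a single spindle harmonic, with H = [1, 0] observing the cosine state.

```python
import math

def make_phi(omega, T):
    """2x2 rotation block of eq. (16) for one harmonic at frequency omega."""
    c, s = math.cos(omega * T), math.sin(omega * T)
    return [[c, -s], [s, c]]

def kalman_step(q, P, phi, y, R=1.0, lam=1e-6):
    """One predict/update cycle of eq. (17) for states [A cos, A sin]."""
    # predict: q_pred = phi q, P_pred = phi P phi^T + lam*R*I
    q_pred = [phi[0][0] * q[0] + phi[0][1] * q[1],
              phi[1][0] * q[0] + phi[1][1] * q[1]]
    M = [[sum(phi[i][k] * P[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    P_pred = [[sum(M[i][k] * phi[j][k] for k in range(2))
               + (lam * R if i == j else 0.0) for j in range(2)]
              for i in range(2)]
    # update with measurement y; H = [1, 0] picks the cosine state
    S = P_pred[0][0] + R
    K = [P_pred[0][0] / S, P_pred[1][0] / S]
    innov = y - q_pred[0]
    q_new = [q_pred[0] + K[0] * innov, q_pred[1] + K[1] * innov]
    P_new = [[(1 - K[0]) * P_pred[0][j] for j in range(2)],
             [P_pred[1][j] - K[1] * P_pred[0][j] for j in range(2)]]
    return q_new, P_new
```

Fed a pure cosine at the modelled frequency, the estimated amplitude √(q1² + q2²) converges towards the true amplitude, which is exactly the quantity extracted in equation (19).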
These filters are designed as fourth-order infinite impulse response (IIR) Butterworth bandpass filters; the i-th filter has half-power frequencies (i − 1)ω_t and i ω_t.

The Nonlinear Energy Operator (NEO) was introduced by Kaiser [13]; it is denoted by ψ in this thesis and defined as,

ψ[x(t)] = ẋ(t)² − x(t) ẍ(t)   (20)

For the undamped single degree of freedom system below,

m ẍ + k x = 0  ⇒  x(t) = A cos(√(k/m) t + φ)   (21)

where A and φ are the amplitude and phase angle of the harmonic motion and ω = √(k/m) is the natural frequency of the oscillation. Given k x = −m ẍ, the total kinetic and potential energy of the system is,

E(t) = (1/2) m [ẋ(t)]² + (1/2) k [x(t)]² = (m/2) A² ω²   (22)

P. Maragos and T. F. Quatieri [14] suggested the following continuous Teager-Kaiser Nonlinear Energy Operators (NEOs) as a measure to track the energy per half-unit mass of the source of oscillation,

ψ[x(t)] = ẋ(t)² − x(t) ẍ(t) = A² ω²
ψ[ẋ(t)] = ẍ(t)² − ẋ(t) x⃛(t) = A² ω⁴
⟹ ω = √( ψ[ẋ(t)] / ψ[x(t)] ),  |A| = ψ[x(t)] / √( ψ[ẋ(t)] )   (23)

Hence, at the time instance t_k, the corresponding frequency ω(t_k) and amplitude |A_{t_k}| can be calculated as,

θ(t_k) = ω(t_k) T = arccos( 1 − ( ψ_d[Δx_k] + ψ_d[Δx_{k+1}] ) / ( 4 ψ_d[x_k] ) )
|A_{t_k}| = √( ψ_d[x_k] / ( 1 − ( 1 − ( ψ_d[Δx_k] + ψ_d[Δx_{k+1}] ) / ( 4 ψ_d[x_k] ) )² ) )   (24)

Given equation (24), the chatter frequency and amplitude of the m-th band are given by,

ω_{c,m} = arccos( 1 − ( ψ_d[Δy(t_{k−2D})] + ψ_d[Δy(t_{k−D})] ) / ( 4 ψ_d[y_c(t_{k−2D})] ) )
|A_{c,m}| = √( ψ_d[y_c(t_{k−2D})] / ( 1 − ( 1 − ( ψ_d[Δy(t_{k−2D})] + ψ_d[Δy(t_{k−D})] ) / ( 4 ψ_d[y_c(t_{k−2D})] ) )² ) )   (25)

where ψ_d is the generalized discrete form of the NEO,

ψ_d[Δy(t_{k−D})] = ( Δy(t_{k−D})² − Δy(t_{k−2D}) Δy(t_k) ) / T²
ψ_d[Δy(t_{k−2D})] = ( Δy(t_{k−2D})² − Δy(t_{k−3D}) Δy(t_{k−D}) ) / T²   (26)

The difference operators for each band are,

Δy(t_{k−D}) = y_{c,m}(t_{k−D}) − y_{c,m}(t_{k−2D})
Δy(t_{k−2D}) = y_{c,m}(t_{k−2D}) − y_{c,m}(t_{k−3D})
Δy(t_{k−3D}) = y_{c,m}(t_{k−3D}) − y_{c,m}(t_{k−4D})   (27)

where D is the lag parameter introduced by W.
Lin [15], equal to one-quarter of the period of the corresponding m-th chatter band,

D = round( (1/(4T)) · 2π / ( (m − 0.5) ω_t ) )   (28)

The resultant frequencies and amplitudes of the harmonics go through a mean filter with a window equal to one spindle period.

The energy ratio of the signal is given by,

ER = EN_c / ( EN_c + EN_p )   (29)

where EN_p and EN_c are respectively the energies of the periodic and chatter parts of the signal, determined by,

EN_p = Σ_{n=1}^{N} (Ŷ_{p,n})² (n ω_s)² η
EN_c = Σ_{m=1}^{M} (Â_{c,m})² (ω_{c,m})² η   (30)

where η is an integration factor determined based on the signal type, and Ŷ_{p,n} is given by equation (19).

Based on the energy ratio calculated by equation (29), the stability of the cut is determined. The energy ratio is commensurate with the chatter energy from equation (30); hence, in this thesis, it is used as an indicator of the likelihood of an ongoing unstable cut. The details of the energy-based chatter detection method used here are presented by Caliskan, Kilic and Altintas in [11].

2.3 Machine Learning and its Utilization in Industry and Machining Sciences

Artificial Intelligence (AI) is a field that studies the synthesis and analysis of computational agents that act intelligently [16]. Machine Learning (ML) is a subfield of AI concerned with developing automated methods to learn the patterns in a given set of data with the purpose of applying them to future data or other outcomes of interest [17].

In the literature, researchers have applied a variety of machine learning methods to predict or detect chatter. M. Postel et al. [18] used deep learning to refine chatter stability lobes. M. Lamraoui et al. [19] investigated different statistical features of the envelope signal, such as variance, skewness, kurtosis, peak value, root mean square (RMS), clearance factor, crest factor, shape factor, and impulse factor.
They found variance to be a practical feature for chatter detection. In that work, a fully connected neural network was trained to detect chatter, with data collected from slot milling tests on an aluminum workpiece. Zh. Yao et al. [9] proposed an online chatter detection method using support vector machines (SVM) to determine the stability of the cut based on wavelet transform features of the vibration signal.

Automated Machine Learning (AutoML) methods reduce the amount of resources and time required to create a well-performing model [20]. C. White et al. [21] showed that Bayesian optimization can outperform other common AutoML methods, such as random search, regularized evolution, and reinforcement learning, for Neural Architecture Search (NAS). X. He et al. [20] reviewed recent advancements in AutoML and provided a literature survey. Machine learning models can be improved by retraining on new data. Ch. Rosenberg et al. [22] suggested that self-training makes it possible for machine learning systems to improve over time while they are operating; they explored and evaluated semi-supervised learning and self-training approaches for training a model on weakly labelled data through experiments. P. Mitra et al. [23] investigated the effectiveness of Bayesian AutoML methods for physics emulators.

2.4 Summary

In this chapter, past literature on chatter detection and machine learning architecture design was reviewed. There have been advancements in chatter prediction, before the operation and during process planning, over the past few decades. Online chatter detection remains essential due to inaccuracies in the measurements and the time-varying nature of countless machine components.

The past literature in the field focuses either on physics-based or on data-driven chatter detection methods.
To compensate for each approach's shortcomings, machine learning combined with the physics of metal cutting is preferred. However, in most of the machine learning chatter detection literature, the physics of the process is ignored. Among the many online chatter detection methods in the literature, the energy-based system is adopted here because it accounts for the process physics most closely [11]. Different machine learning models have been proposed for chatter detection in the literature; this work instead takes an automated approach towards machine learning network design by proposing an automated network design scheme.

Chapter 3: Online Cutting State Detection Using Machine Learning

This chapter proposes a machine learning pipeline for machining state detection, as well as two machine learning models, one based on transfer learning and the other a custom-designed architecture, for detecting both instability in the cutting process and the machining state from the vibration input. A machine learning pipeline is a set of consecutive mathematical functions and operations comprising a set of inputs, a machine learning network, and outputs of interest. Besides the inputs, the machine learning model, and the outputs, a machine learning pipeline commonly includes transformation functions, which in this case are signal cleaning and short-time Fourier transform functions. The designed pipeline uses a deep learning model, known as a Convolutional Neural Network (CNN), for machining state and chatter detection. Deep learning is a subfield of machine learning in artificial intelligence.

In this chapter, the process of designing the deep learning method used for cutting state detection is explained. The outcome of this chapter is a trained deep learning architecture designed specifically for machining state detection.

The chapter is organized as follows. The first section provides an overview of the machine learning pipeline, the architecture design, and its training process.
Section 3.2 is dedicated to the experiment design process and describes the data collection. The collected data are used for deep learning architecture design, training, and evaluation, as well as for parameter tuning. The technical details of handling the high-volume data collected from sensors during the machining process, and its utilization in the designed machine learning pipeline, are discussed. Section 3.3 discusses the details of converting the signal to the frequency domain. Section 3.4 starts by outlining the deep learning architecture used for machining state detection via transfer learning; afterwards, each of the architecture's elements is described, along with the intuition behind its use for chatter detection. Section 3.5 proposes an approach towards automated CNN architecture design and elaborates on the implementation and utilization of the proposed method for online machining state detection. In the proposed method, a Bayesian optimization algorithm is used as an automated design agent, which is briefly explained in that section. The outcome of Section 3.4 is used as the initial input to the optimization loop in Section 3.5.

3.1 Machine Learning Training Pipeline

An overview of the steps taken to design the machine learning architecture and train the model is given in Figure 6; it is a detailed depiction of the "Automated CNN Architecture Design" part of Figure 3. Training is a process during which the network parameters are adjusted to minimize the loss, so that the resultant network can serve its goal, machining state detection, as well as possible. The output of this process is a trained machining state detection network capable of determining each cutting state's probability based on the vibration signal input.
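The pipeline structure described above, consecutive transformations feeding a model that emits one probability per machining state, can be sketched as plain function composition. The cleaning, STFT, and CNN stages below are stand-in placeholders (identity functions and fixed logits), invented purely to show the composition pattern and the softmax output; none of them are the thesis implementation.

```python
import math

def softmax(logits):
    """Convert one logit per machining state into probabilities summing to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def pipeline(signal, clean, transform, model):
    """Compose the stages: clean -> frequency transform -> model -> probabilities."""
    return softmax(model(transform(clean(signal))))

# Toy usage with identity stages and a fixed-logit "model" favouring state 2.
probs = pipeline([0.0], lambda s: s, lambda s: s,
                 lambda s: [0.1, 0.1, 2.0, 0.1, 0.1])
```

The softmax at the end is what turns the network's raw scores into the per-state probabilities referred to throughout this chapter.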
Figure 6 Machine Learning Architecture Design and Algorithm Training Process Overview

In general, four steps are taken to construct a functioning machine learning system:

1) Data Collection, Cleaning, Labelling, and Preparation.
1.1) Design of a wide variety of cutting tests.
1.2) Cleaning and denoising the recorded data.
1.3) Storing data in a suitable format and manually labelling them based on surface quality and knowledge of machining.
1.4) Assessing the plausibility of solving the problem using machine learning.
2) Feature engineering by transforming to the frequency domain.
3) Outlining the Base Architecture.
3.1) Determining the machine learning method of interest based on the characteristics of the problem at hand.
3.2) Outlining the initial input to the Automated Architecture Design Loop.
4) Automated Architecture Design Loop.
4.1) Parameterization of the machine learning method's hyper-parameters and definition of the optimization search space.
4.2) Training the model corresponding to each set of hyper-parameters suggested by the optimization method.
4.3) Definition of an objective function for the evaluation of model performance.

The section corresponding to each of the mentioned steps is denoted in Figure 6 by a circled number. As illustrated in Figure 6, the architecture design and training processes start with the cutting experiments.
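Steps 4.1 through 4.3 of the design loop can be sketched as a search over a hyper-parameter space with an objective that trades accuracy against model complexity. This is a hedged illustration: the search space, objective weights, and a random sampler standing in for the Bayesian optimizer are all invented; the thesis uses Bayesian optimization and a far richer architecture space.

```python
import random

# Invented search space: number of layers and filters per layer (step 4.1).
SEARCH_SPACE = {"n_layers": [2, 3, 4, 5], "n_filters": [8, 16, 32, 64]}

def objective(accuracy, n_params, complexity_weight=1e-6):
    """Step 4.3: reward accuracy, penalize parameter count (weights assumed)."""
    return accuracy - complexity_weight * n_params

def search(train_and_eval, n_trials=20, seed=0):
    """Design loop with a random sampler in place of the Bayesian optimizer."""
    rng = random.Random(seed)
    best = (float("-inf"), None)
    for _ in range(n_trials):
        cfg = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        acc, n_params = train_and_eval(cfg)            # step 4.2
        score = objective(acc, n_params)               # step 4.3
        best = max(best, (score, tuple(sorted(cfg.items()))))
    return best
```

A Bayesian optimizer would replace the random `cfg` draw with a surrogate-model-guided proposal, but the loop structure (propose, train, score, keep the best) is the same.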
After each experiment, the cutting states and stability are assessed manually and stored alongside the sensor signals and cutting conditions. After the data are denoised, the plausibility of machining state detection using machine learning is appraised. Once the cutting state detection problem is found plausible to solve with statistical methods, the vibration signals are transformed into the frequency domain. Afterwards, a base deep learning network is trained on the denoised and transformed data; in this step, feature engineering, i.e. setting up the frequency domain conversion parameters, is done. The resultant network is assessed in terms of performance and, based on the results, the network architecture is evolved to become optimal for cutting state detection. The optimization aims to increase accuracy while minimizing architecture complexity. In this thesis, a supervised learning approach is adopted for cutting state detection, meaning that a pre-labelled set of data, labelled using machining metrics, is used to train the model to detect the different machining states and determine cutting stability.

3.2 Experiment Design, Data Collection, and Plausibility Assessment

The first step towards building a data-driven method for chatter detection is collecting cutting data from machining tests. To obtain a robust, generalized, and reliable model, a diverse set of data collected under different cutting conditions is necessary. Hence, a variety of experiments with five tools, fluted and indexed, on AISI 4340 steel and 7075 aluminum, with various tool-paths under different cutting conditions, have been performed, resulting in more than 1600 cutting cases. In the designed experiments, vibration signals are recorded to identify the relative tool-workpiece engagement and to detect instability in the cutting process.
Multiple sensors (including a microphone, accelerometers, and force sensors) are used to record signals during the machining process. The experiments are conducted on a Quaser UX600 5-axis machining center controlled by a Heidenhain TNC 640 CNC, as shown in Figure 7.

Figure 7 Experiment Setup

The collected data are labelled as Air Cut, the transient state of tool Entrance into the workpiece, Stable Cut, Chatter, and the transient state of tool Exit from the workpiece, based on process knowledge and the resulting surface quality. An example of each of the mentioned machining states is shown in Figure 8. Air Cut describes the state during which the cutter is not in contact with the workpiece. Entrance and Exit describe the transient states during which the cutter is not in full engagement with the workpiece, as it enters or leaves the workpiece, respectively. Transient vibrations occur during the entrance and exit, with a wide variety of frequency content, commonly at the same frequencies as chatter, due to their similarity to an impact; hence, they need to be separated from chatter. Finally, Stable Cut and Chatter describe the stability of the cut while the cutter is fully engaged.

Figure 8 Cutting States Illustration - from left to right respectively, Air Cut, Entrance, In Cut, and Exit

For accuracy in determining each of the cutting states, multiple factors are taken into account while manually labelling the experiment data. The machine's feed rate and the G-code are used to evaluate the precise relative tool-workpiece position. Furthermore, since chatter leaves vibration marks on the surface, the surface quality after the cut is used as the criterion for cutting stability assessment, i.e. to determine whether the cut is chattering or not.
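The five labels above, and the experiment-level partition used in this chapter (76/12/12 percent for training, validation, and test), can be sketched as follows. The label strings and the splitting function are illustrative assumptions; only the five-state scheme and the split proportions come from the text.

```python
import random

# The five machining state labels described in this section.
LABELS = ("air_cut", "entrance", "stable_cut", "chatter", "exit")

def split_experiments(experiment_ids, seed=0):
    """Partition experiment ids into train/validation/test (76/12/12 percent).

    Splitting by experiment (not by window) keeps all windows of one cut in
    the same subset, which is what makes the test set genuinely unseen.
    """
    ids = list(experiment_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train, n_val = round(0.76 * n), round(0.12 * n)
    return (ids[:n_train],
            ids[n_train:n_train + n_val],
            ids[n_train + n_val:])
```

Splitting at the experiment level rather than the window level is the design choice that prevents near-duplicate windows from one cut leaking between training and test.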
The mentioned states and stability conditions are used during the training and automatic design process, as well as during testing and performance evaluation.

The cutting tests are partitioned into three subsets, Training, Validation, and Test, containing 76, 12, and 12 percent of the whole data, respectively. The training and validation subsets are similar in terms of cutting conditions. The training set is used to adjust the model parameters during the training process. The test data consist of more than 30% unseen cutting conditions to ensure that the reported performance is generalized and realistic and not a result of overfitting. Overfitting a machine learning model means having the model extrapolate from parts of the input that do not contribute to the results; it commonly happens when the model unintentionally fits residual variation and noise and makes predictions based on them. There are methods and techniques to avoid overfitting, which are extensively practiced in this thesis and explained further in this chapter and throughout the thesis. Moreover, the Test set remains untouched and is used merely to report the performance of the model and its subcomponents. In other words, the test subset has not contributed to the design, training, or tuning of the cutting state detection system or the models; it has been used only for the evaluation of system performance. A summary of how the experiments are used during each process is presented in Figure 9.

Figure 9 Experiments' Utilization Summary

3.2.1 Cleaning, Denoising, and Handling Data using a Comprehensive Industrial Process Monitoring Data Structure

High-frequency readings from multiple sensors result in a high volume of data; hence, managing the computer resources efficiently is crucial throughout the whole process, especially during training.
Moreover, it is necessary to have a data type designed to accommodate the requirements of multi-sensor, time-series measurements. The main objectives of this data structure are to speed up the training process and to increase the robustness, scalability, and versatility of the training pipeline.

As shown in Figure 10, each entry in the designed data structure has three primary attributes: index, measurement source (e.g. microphone or accelerometer), and the label (e.g. air cut, entrance, stable, chatter, or exit) associated with each part of the signal. Together, the three attributes uniquely identify the entry.

Figure 10 Overview of Comprehensive Industrial Process Monitoring

Each set of synchronized signals is partitioned into smaller pieces of signal to be used in the training process, as illustrated in Figure 11; this provides appropriately sized training instances. Moreover, the designed data structure offers embedded data augmentation by breaking down each index into multiple sets of smaller pieces of signal with a time delay between them, similar to what Rebai et al. [24] suggested. A scheme of the embedded data augmentation is depicted in Figure 11, where the data between each consecutive pair of red or yellow dashed lines counts as a cutting instance.

Figure 11 Breaking down the data and Embedded Data Augmentation

The designed data structure has filtering and feature extraction embedded, making real-time filtering and transformation easier. It stores data in RAM in the form of pointers to permanent memory, so each data point takes no more than a few bytes of RAM. Chunks of experiment measurement data can be summoned from permanent memory in a queue, with a shuffleable order for randomization and avoiding overfitting, or fetched directly by their set of unique identifiers.
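The identifier-based access just described can be sketched with a sorted key list. This is an illustrative reduction, not the thesis data structure: memory-mapping, prefetching, and parallel reads are omitted, and integer "offsets" stand in for the pointers into permanent memory.

```python
import bisect

class MonitoringIndex:
    """Toy (experiment, source, label)-keyed index with O(log n) key lookup."""

    def __init__(self):
        self._keys = []    # sorted (experiment, source, label) tuples
        self._refs = {}    # key -> list of storage offsets (stand-in pointers)

    def add(self, experiment, source, label, offset):
        key = (experiment, source, label)
        if key not in self._refs:
            bisect.insort(self._keys, key)   # keep keys sorted for range scans
            self._refs[key] = []
        self._refs[key].append(offset)

    def fetch(self, experiment, source, label):
        """Return the storage offsets for one identifier triple, or []."""
        key = (experiment, source, label)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._refs[key]
        return []
```

Keeping the keys sorted is what enables both the logarithmic point lookup and, in a fuller implementation, range queries over all sources or labels of one experiment.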
Given that reading from permanent memory into RAM is the bottleneck, the designed data structure minimizes permanent memory queries by predicting the next sets of data to be summoned and reading them into RAM at once. Moreover, the data structure can read and process the data in parallel to reduce computation time, utilizing the full computational capacity through multithreading. A three-dimensional sorted lookup table makes it possible to access the sets of data for any combination of identifiers (i.e. experiment number, source, and label) in O(log n) plus the reading time. Each index is accompanied by a properties table that contains the other attributes of each experiment, such as the cutting conditions and the tool properties in the use case of this thesis.

In this thesis, data acquisition is performed through a custom-written LabVIEW application designed for this purpose, and the data are stored and organized in the described data structure. The label assigned to each entry is its machining state, determined with the method described in the previous section. Both denoising and removal of the signal's drift happen through the data structure as well. For denoising, a high-pass filter at 4 Hz is used to eradicate low-frequency noise. To eliminate signal drift and DC offset from the input signal, the DC blocker proposed by M. Nezami [25] is used. For training, the cleaning and denoising process happens after data collection; during online detection, however, it happens during data collection, through the custom data collection application designed for this purpose.

3.2.2 Plausibility Assessment and Setting a Baseline

A baseline is a plausibility assessment method, commonly using heuristics, summary statistics, and randomness measurements. Setting a baseline is usually done using a classic machine learning method, in contrast to deep learning methods.
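The two summary statistics the baseline in this section relies on, standard deviation and skewness (defined by equations (31) and (32) later in this section), can be computed per window as below. The sample formulas follow those equations; feeding them to a decision tree, as the thesis does, is omitted here.

```python
import math
import statistics

def baseline_features(x):
    """Standard deviation (eq. 31) and skewness (eq. 32) of one signal window."""
    mean = statistics.fmean(x)
    n = len(x)
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    skew = sum((v - mean) ** 3 for v in x) / ((n - 1) * sd ** 3)
    return sd, skew
```

For a perfectly symmetric window the skewness is zero, which matches the statement that a normally distributed signal has zero skewness.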
In machine learning problems, it is essential to set a baseline to assess the plausibility of solving the problem with machine learning before starting with deep learning. The outcome of this section is a machine learning model fully capable of chatter detection; nevertheless, the baseline is not used in the final system and serves merely for feasibility assessment. In this section, the method of baseline calculation is explained.

As shown in Figure 12, the first step is to break down the measurements into smaller pieces using the comprehensive industrial process monitoring data structure. For training, 27000 cutting instances labelled with the five machining states, each consisting of 0.2 seconds of the microphone vibration signal, are generated.

Figure 12 Setting the Baseline Procedure

The statistical measures used for this cutting state classification problem are standard deviation and skewness. Standard deviation is a measure of dispersion, equal to the square root of the variance. A higher standard deviation indicates that the vibration signal's
For $x_1, x_2, \dots, x_n$, where each $x_i$ is the magnitude of the signal at an instance of discrete time, the skewness is determined by,

$$\tilde{\mu}_3 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^3}{(n-1)\,\sigma^3} \quad (32)$$

A decision tree is used as the machine learning model to classify the cutting instances into the corresponding machining states. A decision tree is a straightforward machine learning method that determines a set of linear decision boundaries to perform classification, in the case of this work differentiating among the machining states. Each decision boundary is a threshold that assigns a cutting state to each measurement based on the values associated with it.

For training the model, a 10-fold cross-validation scheme has been used to protect the model against overfitting, which means that the data points have been partitioned into ten disjoint subsets. For each subset $l_i$ ($i = 1, 2, \dots, 10$), the model has been trained on the data points in the other nine subsets all together, and the model performance has been assessed on the left-out subset $l_i$. The rest of the subsets go through the same procedure, and the final error is calculated as the average error over all ten subsets [26].

Given the lower count of the data labelled as entrance and exit, in the loss function the misclassification cost of these labels is ten times higher than that of any other misclassification. The model's accuracy is assessed based on its performance in detecting the cutting states of the test set. The results of the method are presented in section 5.2. If the baseline shows promising performance, it is rational to consider using a more advanced machine learning method to solve the problem. In the case of this thesis, it is decided to use a deep convolutional neural network.

3.3 Frequency Domain Conversion and Feature Extraction
The signal's frequency domain characteristics are an indicator of the cutting state.
Whether or not the tooth passing frequency is observed in the frequency domain could indicate whether the tool is in or out of the cut. Moreover, in the case of chatter vibration, resonance peaks and associated harmonics are observed in the frequency domain. The frequency-domain signal is therefore a good indicator of the cutting state. In this thesis, frequency domain features have been utilized to differentiate between different states of cut and to detect the stability of the cut.

To observe the changes in the amplitude trends of the frequency-domain features of the signal over time, a short-time Fourier transform (STFT) has been applied to the signal [27]. The STFT calculates the Fourier transform over time and is commonly used to assess the frequency content of a nonstationary signal. The purpose is to use frequency-domain features to differentiate between different states of cut and detect chatter. The machine learning network takes 0.14 seconds of vibration data as its input and extracts its frequency domain features using the STFT.

To calculate the STFT, the signal is windowed with overlap. For 0.14 seconds of vibration data, equivalent to 7000 sampling points at a sampling frequency of $f_s = 50\,\mathrm{kHz}$, the windowing process is shown in Figure 13. Windows overlap over a length $L_o$ to compensate for the signal attenuation at the window edges. The overlap provides more continuous frequency domain data over time. The window size is $L_w = 800$ and the overlap is $L_o = 680$ in this work. The number of windows for each milling case of $n$ samples is evaluated as $L = \left\lfloor \frac{n - L_o}{L_w - L_o} \right\rfloor$.

Figure 13 Windowing of Vibration Signal for Calculating STFT

As shown in Figure 14, the STFT of a signal is determined by applying a Hanning window [28] to each windowed data segment of length $L_w$ and calculating the discrete Fourier transform (DFT) of the resultant signal.
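The window-count formula above is a one-liner; a quick illustrative sketch (the function name `stft_window_count` is hypothetical):

```python
def stft_window_count(n, lw=800, lo=680):
    """Number of STFT windows L = floor((n - Lo) / (Lw - Lo))
    for n samples, window length Lw and overlap Lo."""
    return (n - lo) // (lw - lo)

# 0.14 s at fs = 50 kHz gives 7000 samples and a hop of 120 samples.
print(stft_window_count(7000))  # → 52
```

The hop between consecutive windows is $L_w - L_o = 120$ samples, which is what produces the 52 time columns used later in the STFT matrix.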
The STFT windowing method results in a frequency-domain signal through discrete time, covering the frequency components up to the Nyquist frequency.

Figure 14 STFT for each given Instance of Vibration Signal

For a given vibration signal $c_m(kT)$ in the discrete time domain, where $k = 1, 2, \dots, n$, each element of the STFT matrix is computed by,

$$y_{k,m} = \sum_{i=1}^{L_w} c_m(iT)\, g(i)\, e^{-j 2\pi \left(\frac{L_w}{2} - k\right)\frac{(i-1)}{L_w}} \quad \text{where } 1 \le k \le L_w \text{ and } 1 \le m \le L \quad (33)$$

where $g$ is the window function of length $L_w$. The window function in the case of this work is the Hann window [28], determined by,

$$g(i) = \frac{1}{2}\left[1 - \cos\left(\frac{2\pi(i-1)}{L_w}\right)\right] \quad \text{where } 1 \le i \le L_w \quad (34)$$

The magnitude squared of the STFT matrix elements, also referred to as the spectrogram, is used as the basis for chatter detection. In Figure 15, the process of calculating the spectrogram is shown. The resultant STFT matrix for each 0.14 seconds of machining instance at $f_s = 50\,\mathrm{kHz}$ is 400 by 52. Additionally, the mapping of the $|\mathrm{STFT}|^2$ values to the visual representation of the STFT of a signal is shown. This visual representation is used repeatedly in this work. Each column of the STFT matrix corresponds to a time instance and each row corresponds to a frequency; the matrix is illustrated as a colour-coded two-dimensional image.

Figure 15 Short-Time Fourier Transform's Matrix Visual Representation Example

The windowing used in this work satisfies the constant overlap-add (COLA) constraints [29]. Finally, the resultant STFT matrix is converted to a decibel scale and linearly scaled to the range of zero to one to avoid the bias effect of the magnitude and to avoid overfitting. Since the amplitudes are symmetric about the Nyquist frequency, only half of the spectrogram is kept, resulting in the spectrogram shown on the right-hand side. Figure 16 shows a scaled spectrogram, with only the positive frequency range, from an instance of stable cut.
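To make the window-then-DFT step concrete, the sketch below builds a Hann window per Eq. (34) and computes the magnitude of one STFT column with a naive DFT. A toy 32-sample frame is used so the example stays readable (the real frames are 800 samples); the helper names are illustrative.

```python
import cmath
import math

def hann(lw):
    """Hann window g(i) = 0.5*(1 - cos(2*pi*(i-1)/Lw)), i = 1..Lw (Eq. 34),
    written here with 0-based indices."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * i / lw)) for i in range(lw)]

def dft_mag(frame):
    """Magnitude of the DFT of one windowed frame, i.e. one STFT column."""
    n = len(frame)
    return [abs(sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n)
                    for i in range(n))) for k in range(n)]

# A pure tone sitting exactly on bin 4 of a toy 32-sample frame keeps its
# energy concentrated at bin 4 after Hann windowing.
lw = 32
tone = [math.sin(2.0 * math.pi * 4 * i / lw) for i in range(lw)]
frame = [t * g for t, g in zip(tone, hann(lw))]
mags = dft_mag(frame)
print(max(range(lw // 2), key=lambda k: mags[k]))  # → 4
```

The symmetry of `mags` about the Nyquist bin for this real-valued input is exactly why only half of the spectrogram needs to be kept.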
The matrix shows the intensity of the vibrations occurring at each frequency, leading to the detection of forced vibrations at the harmonics of the tooth passing frequency and of chatter close to one of the structural modes.

Figure 16 STFT Example

3.4 Machining State Detection using Transfer Learning
In this section, the transfer learning method used for machining state detection and its components are explained. The components are used further in the next section.

The base architecture for machining state detection is a modified version of a Convolutional Neural Network (CNN) called ImageNet [30]. The proposed method takes the frequency domain signal generated by the method of section 3.3 as input and outputs the probability of each machining state.

The work explained in this section is important for four reasons. First, transfer learning provides fast and robust training due to being pre-trained. Hence, it provides a reliable cutting state detection system fully capable of detecting machining states independently. Second, the transfer learning architecture is used during the process of feature engineering, i.e. finding the right frequency domain parameters. Third, the resultant trained model's weights and biases are scaled and used as the initial weights and biases for training the custom-designed architecture. Most importantly, a modified version of this work is used as the initial input to the architecture design loop.

A convolutional neural network is a deep learning architecture consisting of two major parts: a set of convolutional blocks and a neural network. Each convolutional block consists of a two-dimensional discrete convolution layer followed by a number of other arithmetic layers. The first block has the network's STFT input as its input. Each subsequent block takes the previous block's output as its input and performs arithmetic processes to generate its output in a chain.
The convolutional layers extract the form attributes of the input STFT and decrease its size. The output of the convolutional blocks is flattened and goes into an artificial neural network (ANN), which fits an equation to the output of the convolutional blocks. The output of this equation for any given vibration signal input is the probability of the signal being associated with each of the machining states. To sum up, a CNN is a set of weights and biases forming an equation, and that equation is fitted to the machining state detection problem.

Further in this section, the deep learning training pipeline and its building blocks for the base architecture are explained. These blocks are used in the next sections as well.

3.4.1 Deep Learning Pipeline Overview
A pipeline is a sequential set of data flow rules among functions and models. There are training and execution pipelines designed in this thesis. The final model resulting from the closed-loop training pipeline is used in the chatter detection pipeline. The training pipeline is designed to train a model to detect machining states based on vibration signals. The pipeline for training the base network is shown in Figure 17.

Figure 17 Deep Learning Training Pipeline Overview

The storage, segmentation, filtering, and transformation-to-frequency-domain steps have been elaborated so far. This section's focus is on the deep learning part of the training pipeline, i.e. the convolutional blocks and the ANN. To have a functional machine learning model capable of machining state detection, it is crucial to construct a suitable learning architecture. Hence, a well-known deep learning architecture from ImageNet Classification with Deep Convolutional Neural Networks [30] has been adapted to machining state detection and used as the base architecture. In a process known as transfer learning, the pre-trained network is retrained in order to detect the machining state and chatter stability.
ImageNet is a relatively small, non-complex deep learning architecture. During the training process, the weights and biases are adjusted to make the network equation serve the goal of machining state detection. As depicted in Figure 17, the pipeline reads the signal from the microphone, applies the DC blocker and high-pass filter, applies the STFT, and splits each experiment into training, validation, and test sets, with the machining states assigned from the manual labelling criteria.

After training the network, the trained model is used for machining state detection. The input is generated online by denoising the vibration signal and calculating the spectrogram. Afterwards, the model is applied to the data in order to calculate the probability of the current vibration being associated with each of the five machining states (air cut, entrance, stable cut, chatter, and exit). The state with the highest probability is detected as the current state.

The components of the mentioned deep learning architecture are convolution layers, arithmetic and activation functions (e.g. ReLU, normalization, and max pooling), and neural network layers. In the following subsections, the building blocks of the deep learning method are explained, and the notation presented above is elaborated.

3.4.2 Convolutional Blocks
The input to the deep learning architecture goes through a set of convolutional blocks. Each block starts with a convolutional layer followed by a set of arithmetic functions. Each convolutional block contributes to the discovery and isolation of the relative changes in the frequency domain signal amplitude over time, which correspond to chatter or other machining states.
A sample convolutional block, with the notation convention followed in this thesis, is shown in Figure 18.

Figure 18 A Generic Convolutional Block

In the figure, $C^{(l)}_{h_C^{(l)} \times w_C^{(l)} \times d_C^{(l)}}$ and $B^{(l)}_{h_B^{(l)} \times w_B^{(l)} \times d_B^{(l)}}$ are the output matrix of the convolutional layer of the $l$-th convolutional block and the output of the $l$-th convolutional block, respectively. $B^{(l)}_{h_B^{(l)} \times w_B^{(l)} \times d_B^{(l)}}$ is the output of a series of arithmetic layers, such as activation, normalization, and max pooling, for the input $C^{(l)}_{h_C^{(l)} \times w_C^{(l)} \times d_C^{(l)}}$. Moreover, $K^{(l+1)}_{h_K^{(l+1)} \times w_K^{(l+1)} \times d_K^{(l+1)}}$ is the convolution kernel of the $(l+1)$-th layer, which is applied to the output of its previous block. Each layer $l$ has a bias term $b^{(l)}_{h_b^{(l)}}$. The equation of the $l$-th block is,

$$C^{(l)}_{h_C^{(l)} \times w_C^{(l)} \times d_C^{(l)}} = \tilde{B}^{(l-1)}_{h_B^{(l-1)} \times w_B^{(l-1)} \times d_B^{(l-1)}} \ast K^{(l)}_{h_K^{(l)} \times w_K^{(l)} \times d_K^{(l)}} + b^{(l)}_{h_b^{(l)}} \quad (35)$$

where the asterisk is the convolution operator and $\tilde{B}^{(l-1)}$ is $B^{(l-1)}$ with the padding applied to it. The arithmetic and activation functions are applied to the value of $C^{(l)}$ to generate the output of the block, $B^{(l)}$. The kernel $K^{(l)}$ and the bias term $b^{(l)}_{h_b^{(l)}}$ of the convolutional layer are the learnable parameters that are determined during the training process.

In the next subsections, the convolution layer and the arithmetic operations occurring in the convolutional blocks, namely the rectified linear unit, local response normalization, and max pooling, are detailed.

3.4.2.1 Two-dimensional Discrete Convolution Layer
The convolution layer is the most important part of the convolutional block, as it is the only part of each convolutional block that is tweaked during the training process. The two-dimensional discrete convolution layer consists of a set of two-dimensional matrices (denoted as the three-dimensional matrix $K^{(l)}_{h_K^{(l)} \times w_K^{(l)} \times d_K^{(l)}}$) which are determined during the training process. It is convolved with its input $B^{(l-1)}_{h_B^{(l-1)} \times w_B^{(l-1)} \times d_B^{(l-1)}}$ to generate an output $C^{(l)}_{h_C^{(l)} \times w_C^{(l)} \times d_C^{(l)}}$.
A two-dimensional convolution reduces the input size while preserving and framing the frequencies, and the changes in frequencies over time, i.e. the parts of the STFT that contribute the most to the cutting states. In other words, during the training process the discrete convolution kernels form so as to highlight and pick out the parts of the frequency domain signal which play a role in chatter or the other machining states. As mentioned, each convolution block starts with a convolution layer. The two-dimensional convolution layer calculates the discrete convolution between the input and the convolution receptive field by iterating the convolution kernel matrix over the layer's input and evaluating the dot product of the input and the kernel at each point [17]. A two-dimensional convolution layer has a bias term that is added to each element after the convolution is applied.

In this thesis, the first and third convolution operations are applied to their entire inputs, while the rest of the convolutions are two-dimensional grouped convolutions.

Figure 19 Illustration of group convolution function for a two-group, group convolution

Two-dimensional grouped convolution divides its input into groups, in the case of this thesis two groups, and applies a sliding convolution on each of them. This helps decrease the number of learnable parameters. With two groups, each kernel is convolved with only a subset of the feature maps obtained by the previous layers, which results in less redundancy among kernels, in terms of similarity of the weights and biases or of the block's outcome.

Two important terms to define when it comes to convolutions are stride and padding. Stride is the step size of the convolution window and is denoted by $k_s^{(l)}$.
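The sliding dot-product just described can be sketched in plain Python. This is a naive, illustrative implementation for a single channel, without padding; the function name `conv2d` is hypothetical.

```python
def conv2d(inp, kernel, stride=1, bias=0.0):
    """Valid 2-D sliding dot product with stride: place the kernel at each
    stride-spaced position of the input and sum the element-wise products,
    then add the bias term."""
    hk, wk = len(kernel), len(kernel[0])
    hi, wi = len(inp), len(inp[0])
    out = []
    for i in range(0, hi - hk + 1, stride):
        row = []
        for j in range(0, wi - wk + 1, stride):
            acc = bias
            for a in range(hk):
                for b in range(wk):
                    acc += inp[i + a][j + b] * kernel[a][b]
            row.append(acc)
        out.append(row)
    return out

# A horizontal-line detector responds strongly on the row where a
# constant-frequency "line" sits, much like the STFT example later.
image = [[0, 0, 0, 0],
         [1, 1, 1, 1],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
line_detector = [[-1, -1, -1],
                 [ 2,  2,  2],
                 [-1, -1, -1]]
print(conv2d(image, line_detector))  # → [[6.0, 6.0], [-3.0, -3.0]]
```

A larger stride shrinks the output proportionally, which is how the convolutional blocks reduce the spatial size of the STFT.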
Padding is the rows and columns added to the input's margins to give the kernel more space to cover the whole input. An example of an arbitrary $2 \times 2$ convolution kernel being applied to the input of the CNN network with zero padding and a stride of $k_s = 4$ is illustrated in Figure 20.

Figure 20 Discrete Convolution Illustration with an Arbitrary Kernel Example

The output of the convolution layer $l$ is given by $C^{(l)} = \tilde{B}^{(l-1)} \ast K^{(l)} + b_K^{(l)}$, where $C^{(l)}$ is the output of convolutional layer $l$, also referred to as the activations, $B^{(l-1)}$ is the output matrix of layer $l-1$, $\tilde{B}^{(l-1)}$ is the padded $B^{(l-1)}$, $K^{(l)}$ is the convolution kernel of layer $l$, and $b_K^{(l)}$ is the bias term of the same kernel. In this convention, $l = 0$ denotes the input. For a two-dimensional convolution, each element of the matrix $C^{(l)}_{i,j,k}$ ($i = 1, 2, \dots, h_C^{(l)}$, $j = 1, 2, \dots, w_C^{(l)}$, $k = 1, 2, \dots, d_C^{(l)}$) is given by,

$$C^{(l)}_{i,j,k} = \sum_{a=1}^{h_K^{(l)}} \sum_{b=1}^{w_K^{(l)}} \tilde{B}^{(l-1)}_{(i-1)k_s^{(l)}+a,\ (j-1)k_s^{(l)}+b,\ k} \times K^{(l)}_{a,b,k} + b_k^{(l)} \quad (36)$$

During the training process, the value of each convolution kernel and its biases change so as to make the whole CNN network capable of highlighting the critical features of the spectrogram, i.e. the relative magnitude of the signal at each frequency, which contributes to chatter as well as to the other machining states. From a conceptual point of view, the convolution kernels detect patterns and changes in the amplitude of the input signal's different frequencies over time.

Below, a sample STFT of a stable milling process with a spindle speed of 2400 rev/min and a four-inserted tool with runout is shown to illustrate the effect of a sample horizontal line detector kernel and showcase the potential of convolution kernels.

Figure 21 Applying a Line Detector Convolution Kernel on STFT Data Example

3.4.2.2 Rectified Linear Unit (ReLU)
A Rectified Linear Unit is an activation function. It introduces non-linearity to the equations while it does not saturate.
It preserves nonnegative input values and replaces the negative values with zero. The Rectified Linear Unit applies the following function to each element of its input matrix,

$$\mathrm{ReLU}(x) = \max(0, x) \quad (37)$$

ReLU highlights the most critical parts of the frequency domain signal for machining state detection. The effect of applying the ReLU function to the data of Figure 21 is shown below. As illustrated, the horizontal lines, which indicate the frequencies that persist over time, become more prominent after applying ReLU.

Figure 22 Example of Applying ReLU to Convolved STFT Matrix

3.4.2.3 Local Response Normalization
Each convolution layer can consist of multiple convolution kernels. The Local Response Normalization layer performs lateral inhibition by having each high-amplitude output of each convolution kernel affect the outputs of the adjacent kernels. In other words, it carries the changes in amplitude picked up by one convolution kernel over to the neighbouring channels. Local Response Normalization performs a channel-wise normalization by replacing each element with a normalized value obtained from its neighbours [30]. Normalizing over local regions is a kind of surround inhibition [31].

For an arbitrary element $C^{(l)}_{i,j,k}$ in kernel $k$ at position $(i, j)$, the normalized response is given by,

$$C_{\mathrm{Norm}}\!\left(C^{(l)}_{i,j,k}\right) = \frac{C^{(l)}_{i,j,k}}{\left(k + \alpha \sum_{x=\max\left(1,\, k-\frac{n-1}{2}\right)}^{\min\left(d_C^{(l)},\, k+\frac{n-1}{2}\right)} \left(C^{(l)}_{i,j,x}\right)^2\right)^{\beta}} \quad (38)$$

where $k$, $n$, $\alpha$, and $\beta$ are hyper-parameters whose values are determined using a validation set; in the case of this thesis, $k = 2$, $n = 5$, $\alpha = 10^{-4}$, and $\beta = 0.75$, as proposed by A. Krizhevsky et al. [30]. To illustrate the concept, the mechanism of its implementation for a window of size five ($n = 5$) is shown in Figure 23. It is shown how each element of each matrix, highlighted in red, is replaced with the value obtained by passing all of the highlighted elements, red and blue, through equation (38).
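The two element-wise operations above are short enough to sketch directly. The snippet below is an illustrative Python version (function names hypothetical) of ReLU and of Eq. (38) applied along the channel axis for one spatial position, using the AlexNet constants quoted above.

```python
def relu(x):
    """Eq. (37): max(0, x) applied element-wise."""
    return [max(0.0, v) for v in x]

def lrn(channels, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Eq. (38): cross-channel local response normalization of a single
    spatial position (i, j) across d channels, with the constants
    proposed by Krizhevsky et al."""
    d = len(channels)
    out = []
    for c in range(d):
        lo = max(0, c - (n - 1) // 2)
        hi = min(d - 1, c + (n - 1) // 2)
        denom = (k + alpha * sum(channels[x] ** 2
                                 for x in range(lo, hi + 1))) ** beta
        out.append(channels[c] / denom)
    return out

print(relu([-1.5, 0.0, 2.0]))      # negatives clamped to zero
print(lrn([1.0, 10.0, 1.0, 1.0]))  # every channel is damped, big ones most
```

In the full network these run on every spatial position of every feature map; the one-dimensional version here only shows the channel-axis arithmetic.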
Figure 23 Local Response Normalization implementation illustration

3.4.2.4 Max Pooling
Max pooling is a method of down-sampling. The max pooling process slides a window over sub-regions of its input matrix and takes the maximum value in each window as that subregion's representative.

Like the convolution layer, max pooling has a stride and padding, and it puts emphasis on the best-matched inputs. Moreover, it eliminates the elements that do not match the amplitude changes encoded in the convolution kernel. Take the first convolution block as an example: each convolution kernel in the convolution layer marks the frequency domain signal elements that match the kernel's pattern of amplitude changes across frequencies and time by producing a high value in the corresponding element of the output matrix. If a max pooling is then added, it down-samples each convolution kernel's output by preserving the amplitudes that made a better match and eliminating the rest. Max pooling on $C^{(1)}$ in the first block of the base architecture, with a stride of two, is shown in Figure 24 as an example.

Figure 24 Max pooling Operation Example

The convolved STFT after applying ReLU, as shown in Figure 22, is depicted as an example of the max pooling function with a size of 3 by 3 and a stride of 2. In the mentioned figure, the max pooling output (on the right) highlights the horizontal lines which were discovered by the convolution kernel and reinforced by ReLU.

Figure 25 Example of Max-pooling and applying it to a Convolved STFT after ReLU function

3.4.3 Artificial Neural Network Components
The output of the last convolution block is a set of matrices. These matrices are convolved and manipulated versions of the frequency domain input signal.
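The max pooling operation from the previous subsection can be sketched as a window-wise maximum; an illustrative single-channel Python version (the name `max_pool` is hypothetical):

```python
def max_pool(mat, size=3, stride=2):
    """Slide a size-by-size window over the matrix with the given stride
    and keep only the largest value in each window."""
    h, w = len(mat), len(mat[0])
    return [[max(mat[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in range(0, w - size + 1, stride)]
            for i in range(0, h - size + 1, stride)]

# 3x3 pooling with stride 2 reduces a 5x5 input to a 2x2 output.
mat = [[r * 5 + c for c in range(5)] for r in range(5)]
print(max_pool(mat))  # → [[12, 14], [22, 24]]
```

Note the down-sampling ratio: the same formula that governs convolution output sizes applies, so a 3-by-3 pool with stride 2 roughly halves each spatial dimension.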
The convolution layers are trained to pick up the parts of the frequency domain measurement that contribute the most to the cutting states and to discard the less important parts. The flattened output of the last layer of convolutional blocks goes into an artificial neural network (ANN) to be classified into the sought states, i.e. the machining states. The ANN takes each element of the matrices from the convolutional layers as one input variable and fits an equation to all the input variables. The output of this fitted equation is a set of five numbers, each corresponding to the probability of one of the five machining states.

An ANN is a set of interconnected artificial neurons. Neurons are variables, each capable of storing a number. The interconnections have numbers associated with them as well; these interconnection numbers are called weights.

There are various types of ANNs, yet the most commonly used type, due to its wide range of capabilities, is the feedforward ANN. A feedforward ANN is an ANN wherein each neuron belongs to one layer. The layers in a feedforward ANN are ordinal, and each neuron is connected to one or more neurons in the next layer. Each neuron of the last layer is connected to a number of output neurons [17]. Each weight holds a value that is tweaked during the training process; during cutting state detection, each weight performs as a coefficient of the neural network equation.

The value of each neuron is determined by,

$$\forall\, j, l: \quad a_j^{(l)} = \sum_{i=1}^{n} w_{i,j}^{(l)}\, a_i^{(l-1)} + b^{(l)}\, b_j^{(l)} \quad (39)$$

where $a_i^{(l)}$ is the value of neuron $i$ in layer $l$, $w_{i,j}^{(l)}$ is the weight which connects neuron $i$ of layer $l-1$ to neuron $j$ of layer $l$, $b^{(l)}$ is the value of the bias of layer $l-1$, and $b_j^{(l)}$ is the bias weight of layer $l$ and neuron $j$. Note that in this notation, layer 0 is the equivalent of the input.
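Eq. (39) is a weighted sum per neuron; an illustrative Python sketch follows (the name `dense_layer` is hypothetical, and the bias-value-times-bias-weight product of Eq. (39) is collapsed into a single bias term per neuron here).

```python
def dense_layer(inputs, weights, biases):
    """One fully connected layer: a_j = sum_i w[i][j] * a_i + b_j,
    where weights[i][j] is the weight leaving input i toward neuron j."""
    return [sum(w_row[j] * a for w_row, a in zip(weights, inputs)) + biases[j]
            for j in range(len(biases))]

# Two inputs feeding three neurons.
a = dense_layer([1.0, 2.0],
                [[0.5, -1.0, 0.0],   # weights leaving input 1
                 [0.25, 1.0, 2.0]],  # weights leaving input 2
                [0.1, 0.0, -1.0])
print(a)
```

Stacking such layers, with an activation between them, yields exactly the nested expression written out in the next paragraph.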
Using a feedforward ANN with 3 layers, 9216 inputs, and 4096 neurons in each of its two hidden layers, the output is given by,

$$a^{(3)} = w_3\left(w_2\left(w_1 x + b_1\right) + b_2\right) + b_3 \quad (40)$$

where $w_l$, $b_l$, and $x$ (with $x_i$ denoting input number $i$) are given by,

$$w_1 = \begin{bmatrix} w_{1,1}^{1} & \cdots & w_{1,9216}^{1} \\ w_{2,1}^{1} & \cdots & w_{2,9216}^{1} \\ \vdots & \ddots & \vdots \\ w_{4096,1}^{1} & \cdots & w_{4096,9216}^{1} \end{bmatrix},\quad w_2 = \begin{bmatrix} w_{1,1}^{2} & \cdots & w_{1,4096}^{2} \\ w_{2,1}^{2} & \cdots & w_{2,4096}^{2} \\ \vdots & \ddots & \vdots \\ w_{4096,1}^{2} & \cdots & w_{4096,4096}^{2} \end{bmatrix},\quad w_3 = \begin{bmatrix} w_{1,1}^{3} & w_{1,2}^{3} & \cdots & w_{1,5}^{3} \\ w_{2,1}^{3} & w_{2,2}^{3} & \cdots & w_{2,5}^{3} \\ \vdots & \vdots & \ddots & \vdots \\ w_{4096,1}^{3} & w_{4096,2}^{3} & \cdots & w_{4096,5}^{3} \end{bmatrix},\quad b_l = \begin{bmatrix} b_1^{l} \\ b_2^{l} \\ \vdots \\ b_{4096}^{l} \end{bmatrix},\quad x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_{9216} \end{bmatrix} \quad (41)$$

During the training process, the feedforward ANN's weights and biases are tweaked using backpropagation to minimize the network error. After training, the network is able to identify the state with the highest probability from the output layer. For instance, if the input signal is a chatter signal, the output values $a_1^{(3)}$, $a_2^{(3)}$, $a_3^{(3)}$, and $a_4^{(3)}$ (the probabilities of air cut, tool entrance, tool exit, and stable cut) will be close to zero, whereas $a_5^{(3)}$ will tend to unity, i.e. the probability of chatter being in process is high.

A regularization method called dropout, introduced in ImageNet Classification with Deep Convolutional Neural Networks [30], is employed in the neural network training implementation. Dropout sets the output of each artificial neuron to zero with a given probability. This component increases the robustness of the learning by ensuring that different combinations of the neurons remain functional. A dropout of 50% is implemented in this work. This means that in each iteration of training, dropout sets roughly half of the $a_j^{(l)}$ from equation (40) to zero. This results in a fitted equation in which each individual neuron $a_j^{(l)}$ has a meaningful contribution to machining state detection, since it is fit to remain relatively functional in the absence of half of the other neurons.

Moreover, the output of each neural network layer goes through ReLU.
3.4.4 SoftMax
SoftMax turns the numbers at the output of the ANN into probabilities, scaled from zero to one, where one corresponds to 100% probability. For each machining state, the probability of that state for a given input is calculated by,

$$p(\mathrm{state}) = \mathrm{SoftMax}\!\left(a_i^{(L_{\mathrm{last}})}\right) = \frac{e^{a_i^{(L_{\mathrm{last}})}}}{\sum_{j=1}^{K} e^{a_j^{(L_{\mathrm{last}})}}} \quad (42)$$

where $a_i^{(L_{\mathrm{last}})}$ is the $i$-th neuron of the last layer, $K = 5$ is the number of machining states, and $L_{\mathrm{last}} = 3$ is the number of layers.

3.4.5 Focal Loss Classifier
The ANN output goes through a classifier to associate the output numbers with the probability of their corresponding classes, i.e. the machining states. In machining processes, the time the tool spends in cut is usually longer than the time it spends entering or exiting the cut, and the training dataset used here follows the same pattern: it takes the tool only a short time to pass through the transient states of entering and exiting the workpiece. Hence, the number of entrance and exit instances is smaller than the number of other instances. It is crucial to compensate for this imbalance in the count of data from each machining state, to avoid training a cutting state predictor that is biased against detecting the entrance and exit states, or that is not fitted well enough to detect the characteristics which represent these transient states.

The focal loss function [32] multiplies the cross-entropy function by a modulating factor. It increases the sensitivity of the CNN to misclassified machining state instances. In this thesis, the probability of each machining state is denoted by $p_{\mathrm{state}}$, where "state" can be replaced with any of the five cutting states. The focal loss (FL) for the probability of each training instance is given by,

$$\mathrm{FL} = -\alpha\,(1 - p_{\mathrm{state}})^{\gamma}\,\log(p_{\mathrm{state}}) \quad (43)$$

where $\alpha = 2$ is a constant called the balancing parameter, $\gamma = 0.25$ is a constant called the focusing parameter that specifies the system's sensitivity to wrong machining state detection, and $p_{\mathrm{state}}$ is the probability of the true state.
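Eqs. (42) and (43) together are a handful of lines. The sketch below is an illustrative Python version (function names hypothetical); the `alpha` and `gamma` defaults are placeholders to be replaced with the thesis's balancing and focusing parameters, and the max-subtraction in the softmax is a standard numerical-stability trick not stated in the text.

```python
import math

def softmax(a):
    """Eq. (42): exponentiate and normalize into a probability vector."""
    m = max(a)  # subtracting the max avoids overflow; it cancels out
    e = [math.exp(v - m) for v in a]
    s = sum(e)
    return [v / s for v in e]

def focal_loss(p_true, alpha=2.0, gamma=0.25):
    """Eq. (43): FL = -alpha * (1 - p)^gamma * log(p), evaluated on the
    probability the network assigned to the true machining state."""
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)

probs = softmax([1.0, 2.0, 3.0, 0.5, 0.1])
print(abs(sum(probs) - 1.0) < 1e-12)
# A confident correct prediction is penalized far less than an uncertain one.
print(focal_loss(0.95) < focal_loss(0.30))
```

The modulating factor $(1-p)^\gamma$ is what shrinks the loss contribution of easy, well-classified instances, which is the mechanism that keeps the abundant stable-cut samples from drowning out the rare entrance and exit samples.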
The total focal loss for a set of instances is given by averaging the values of the focal loss over all of the instances.

3.4.6 Transfer Learning (Base) Network Architecture Overview
As illustrated in Figure 26, the base network consists of five convolutional blocks, each incorporating a number of layers [30]. The output of each block goes through a rectified linear unit.

Figure 26 Transfer Learning Architecture Overview

Here $f_i$ ($i = 1, 2, \dots, 5$) is the arithmetic and activation function of block $i$ and $Y$ is the input STFT instance. This section explains the arithmetical and logical function of each of the architecture layers and clarifies their benefits to this specific application. As shown in Figure 26, the input to the first block is the frequency domain signal over time, generated by the method explained in section 3.3. The resultant matrix is resized to the network input size (227 by 227) using bicubic interpolation and triplicated to accommodate the transfer learning architecture's input size. Bicubic interpolation determines each matrix element value by calculating a weighted average of the nearest neighbouring elements. In the case of a custom-designed network with an input size equal to the feature extraction step's output size, these last two steps are not required. Each block's input goes through a series of functions in that block, as denoted under each block and layer in Figure 26. The output of one block is the input to the next block. Each layer $l$ has a bias term $b^{(l)}_{h_b^{(l)}}$, which is not shown in Figure 26 to avoid clutter. The output of the last convolutional layer is a two-dimensional matrix; it is flattened and goes to the ANN. The output of the last convolutional block is classified into the machining states using three layers of ANN. The cross-entropy loss function from the original network architecture is replaced with a focal loss function due to an imbalance in the data.
The focal loss function makes up for the fact that the number of entrance and exit transient state instances is smaller than that of the other instances.

This section has gone over the arithmetic building blocks of the convolutional blocks of the deep learning architecture used in this thesis, and the part that each component of the network plays in machining state detection and in classifying the form attributes of the frequency-domain signal. The form attributes of the input are the trends of changes in the frequency-domain signal amplitude at different frequencies over a period of time.

The convolution blocks shown in Figure 26 consist of the following layers. The first convolution block consists of an 11 by 11 convolutional layer of 96 convolution kernels with a stride of 4 and zero padding, a rectified linear function, a cross-channel normalization, and a 3 by 3 max pooling with a stride of 2 and zero padding. The input of the second convolutional block is the output of the first block. It incorporates a grouped convolution layer with two groups of 128 kernels, a filter size of 5 by 5, a stride of one, and a padding of 2, followed by a rectified linear function, cross-channel normalization, and a max pooling. The third convolutional block has a convolution layer of 384 kernels with a filter size of 3, a stride of one, and a padding of one, followed by a rectified linear function. The fourth convolutional block has two groups of convolution layers of 192 kernels with a filter size of 3, a stride of one, and a padding of one, followed by a rectified linear function. The last convolutional block has a convolution layer of 256 convolution kernels, a rectified linear function, and a max pooling. During the training stage of transfer learning, the last convolution layer has been retrained.

The flattened output matrix of the convolutional layers goes to the ANN layers. The base model has a fully connected feedforward ANN.
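As an aside, the spatial sizes implied by the convolutional stack above can be checked with the standard output-size formula, floor((W - K + 2P)/S) + 1; a quick illustrative sketch (the helper name is hypothetical):

```python
def conv_out_size(w, k, stride, pad):
    """Spatial output size of a convolution or pooling layer:
    floor((W - K + 2P) / S) + 1 for input W, kernel K, stride S, pad P."""
    return (w - k + 2 * pad) // stride + 1

# Tracing the 227x227 input through the first block described above:
c1 = conv_out_size(227, 11, 4, 0)  # 11x11 convolution, stride 4 → 55
p1 = conv_out_size(c1, 3, 2, 0)    # 3x3 max pooling, stride 2 → 27
print(c1, p1)
```

Chaining the same formula through the remaining blocks is how one arrives at the flattened feature count consumed by the ANN.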
Its input size is 9216, equal to the flattened output size of the last convolutional block, and each of the two hidden layers has 4096 neurons. The input and the first ANN layer are each followed by a rectified linear function as well as a dropout with a 50% probability. The output layer has five neurons, each corresponding to one of the states of cut; its outputs go through a SoftMax function to give the probability of each cutting state. All the ANN layers have been retrained during the training process.

Hence, the overall architecture and the first three convolutional blocks remain as suggested in the original ImageNet paper. The last two convolutional layers and the neural network are retrained, with the weights and biases of ImageNet as their initial values. The loss function is changed to the focal loss to make up for the imbalance in the number of instances of each machining state present in the training set.

The network is trained on the training data set using the Adam optimizer [44]. Adam is a stochastic optimization method that uses the following moving averages to update the network parameters $\Theta_i$ at the $i$-th iteration and calculate $\Theta_{i+1}$,

$$\Theta_{i+1} = \Theta_i - \frac{\alpha\, m_i}{\sqrt{v_i} + \epsilon} \quad (44)$$

where $\alpha$ is the learning rate, $\epsilon$ is the denominator offset, and $m_i$ and $v_i$ are the moving averages of the parameter gradient and of its element-wise square, respectively, given by,

$$\begin{cases} m_i = \beta_1 m_{i-1} + (1 - \beta_1)\,\nabla E(\theta_i) \\ v_i = \beta_2 v_{i-1} + (1 - \beta_2)\,\left[\nabla E(\theta_i)\right]^2 \end{cases} \quad (45)$$

The performance of the trained model is presented in section 5.3. Further, through an automated machine learning process [33], the base architecture evolves into an architecture specialized in machining state detection.
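Eqs. (44)-(45) amount to a few lines per iteration. The sketch below is an illustrative Python version that mirrors the equations as printed (no bias correction); the hyper-parameter defaults are the commonly used Adam values, assumed here rather than taken from the thesis.

```python
import math

def adam_step(theta, grad, m, v, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update per Eqs. (44)-(45): exponential moving averages of
    the gradient (m) and its element-wise square (v), then a scaled step."""
    m = [b1 * mi + (1.0 - b1) * g for mi, g in zip(m, grad)]
    v = [b2 * vi + (1.0 - b2) * g * g for vi, g in zip(v, grad)]
    theta = [t - lr * mi / (math.sqrt(vi) + eps)
             for t, mi, vi in zip(theta, m, v)]
    return theta, m, v

# Minimizing f(x) = x^2 (gradient 2x) from x = 1: the parameter is driven
# toward zero over the iterations.
theta, m, v = [1.0], [0.0], [0.0]
for _ in range(2000):
    theta, m, v = adam_step(theta, [2.0 * theta[0]], m, v, lr=0.05)
print(abs(theta[0]) < 0.2)
```

In the actual training loop the gradient would come from backpropagating the focal loss, with one such update per mini-batch.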
3.5 Automated Convolutional Neural Network Architecture Design for Machining State Detection

The architecture of a convolutional neural network plays a crucial role in the deep learning algorithm's performance, speed, and robustness. While transfer learning (introduced in section 3.4) offers a quick and robust network for cutting state detection, it does not necessarily provide the best architecture in terms of being optimized for a specific purpose, in this case machining state detection. This section suggests a novel automated CNN architecture design methodology for designing a specialized CNN for machining state detection based on the frequency-domain vibration signal. The work in this section results in a machine learning architecture for processing vibration signals for the purpose of machining state detection with high accuracy and low model complexity.

Transfer learning comes with limitations and shortcomings. The idea behind the proposed approach is to take the ImageNet [30] (AlexNet) architecture used for the transfer learning method as the initial state and shape it according to the machining state detection problem. The base network (ImageNet) is modified based on the specifications of cutting state detection. Afterwards, the customized network is used as the initial condition for the architecture design algorithm. Finally, using the proposed Automated Machine Learning (AutoML) technique, a deep learning architecture is proposed. The proposed architecture is designed with three main goals in mind: high accuracy in machining state detection, low architecture complexity, and high explainability. The output of this section, including the designed network and the trained model's accuracy, is presented in section 5.4.

In terms of machining state detection accuracy and network performance, the automated architecture design uses Bayesian optimization to reach the mentioned set of goals.
Moreover, a custom classification layer is designed to accommodate the requirements of real-world cutting.

Regarding complexity, the automated machine learning procedure is designed with a strong tendency to keep the complexity of the architecture low. The complexity of a deep learning architecture is defined as the number of its learnable parameters. The learnable parameters of a deep learning algorithm are the parameters that get adjusted during the training process. More learnable parameters indicate a more complicated equation being fit to the cutting state detection problem at hand. The low complexity of the architecture is important for several reasons. First and foremost, increasing the complexity of a network increases its inclination to overfit and to misclassify the machining states due to redundancy of the learnable parameters. Additionally, fewer learnable parameters result in a faster network, both during training and during chatter detection, and a lighter model with regard to the space it occupies in memory.

With respect to explainability, the input has not been resized, since the designed CNN is specialized for the machining state detection need. By contrast, in transfer learning, the input size of the network is dictated by ImageNet. Also, the tooth passing frequency is used as one of the inputs to the network. Moreover, a custom loss function is designed to prevent overfitting and minimize false positive chatter detection.

There are two significant concepts in the automated machine learning architecture design proposed by this thesis: the architecture design methodology, and the objective function and network assessment. Each concept is explained further in this section.

3.5.1 Automated CNN Architecture Design Methodology and Structure

A scheme of the automated architecture design is presented in Figure 27.
The utilized architecture outline is similar to the transfer learning architecture from section 3.4, with minor changes to tailor it to the real-world physical problem at hand.

During each iteration of the optimization algorithm, a model is generated based upon a set of proposed architecture parameters. The model gets trained, and the performance of the trained model gets assessed. Based on the assessment, a set of parameters is generated for the next iteration of the optimization algorithm.

A methodology is proposed for automatically generating new architectures and modifying them to find a low-complexity, high-accuracy architecture. For each designed architecture, regardless of its specification, the input to the system will always be the frequency-domain vibration signal as well as the tooth passing frequency, and the output will be the probability of each machining state. Based on the optimization algorithm's output, the best performing model gets trained on the training data and is further used as the final machining state detector. Below, a detailed overview of the automated architecture design is shown; each part of the proposed methodology is further elaborated in this section.

Figure 27 Overview of Automated Cutting State Detection Network Architecture Design Methodology Scheme

3.5.1.1 Architecture Structure and Outline

The initial architecture, or the initial input (s_0) to the optimization algorithm, is based on ImageNet from section 3.4. As shown in Figure 27, three slight modifications have been applied to ImageNet to make it suitable for cutting state detection.

First, the input size is customized to the frequency-domain features input given in section 3.3 (400 by 52).

Second, the tooth passing frequency (ω_t) is added as one of the inputs to the neural network. Hence, the neural network inputs are the convolutional blocks' flattened outputs alongside the tooth passing frequency.
Having the tooth passing frequency as one of the network inputs leverages our knowledge of machining. The tooth passing frequency contributes to the formation of the frequency-domain signal through the resultant peaks at the tooth passing frequency and its harmonics, which makes it easier for the neural network to detect the cutting state. Since the tooth passing frequency is a cutting feature, it increases the robustness of the neural network's state detection.

Finally, a customized weighted cross-entropy loss function with a penalty for false chatter detection is designed to accommodate machining's particular needs. The loss function is given as below,

Loss = −∑ W_P^T × (Y · log(Ŷ)) (46)

where Y and Ŷ are matrices of the actual states of the cutting instances, with the known state of machining, and the probabilities of each cutting state detected by the algorithm, respectively.

Furthermore, W_P is a matrix of misclassification penalty weights. It is designed based on the number of experiment instances for each machining state and the importance and effect of each type of misclassification. Each matrix element's value is the penalty for misclassifying the corresponding row's machining state as the corresponding column's state. Each row of the matrix is normalized at the time of use.

W_P =
[  0  16  10  10  10
  10   0  10  10  14
   5   5   0   1   5
   5   5   1   0   5
  10  16  10  10   0 ]  (47)

where the rows and columns correspond to Air cut, Chatter, Entrance, Exit, and Stable, respectively. As shown, the penalty for misdetection of the two transient states of entrance and exit, which have a smaller number of training cases, is set higher than their in-row penalties. Additionally, the penalties associated with false detection of chatter are high, with the air cut and stable rows penalized by 16.
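One plausible numpy reading of this penalized loss is sketched below; how W_P enters the loss and the use of 1 − Ŷ for the penalized terms are assumptions here, not the thesis's exact implementation. The matrix values follow Equation (47).

```python
import numpy as np

# Hypothetical sketch of a weighted cross-entropy with a misclassification
# penalty matrix in the spirit of Eqs. (46)-(47). Assumption: probability
# mass placed on wrong states is penalized by the (row-normalized) penalty
# row of the true state.
def penalized_loss(Y, Y_hat, W_P, eps=1e-12):
    W = W_P / np.maximum(W_P.sum(axis=1, keepdims=True), eps)  # row-normalize
    weights = Y @ W              # each sample picks its true state's penalty row
    return -np.sum(weights * np.log(1.0 - Y_hat + eps))

# States in order: air cut, chatter, entrance, exit, stable
W_P = np.array([[ 0, 16, 10, 10, 10],
                [10,  0, 10, 10, 14],
                [ 5,  5,  0,  1,  5],
                [ 5,  5,  1,  0,  5],
                [10, 16, 10, 10,  0]], dtype=float)
Y = np.array([[0, 1, 0, 0, 0],          # true: chatter
              [0, 0, 0, 0, 1]], float)  # true: stable
good = np.array([[0.01, 0.96, 0.01, 0.01, 0.01],
                 [0.01, 0.01, 0.01, 0.01, 0.96]])
bad = np.array([[0.01, 0.01, 0.01, 0.01, 0.96],   # chatter called stable
                [0.01, 0.96, 0.01, 0.01, 0.01]])  # stable called chatter
```

Under this reading, correct high-confidence predictions incur a near-zero loss, while heavily penalized confusions (e.g. stable mistaken for chatter) dominate the sum.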
The same applies to mistaking stable cut for chatter.

Except for the mentioned modifications, the initial network with which Bayesian optimization starts has the same topology and components as ImageNet.

3.5.1.2 Design Methodology

In the proposed architecture design methodology, each block is considered an individual entity with a set of hyperparameters. The number of convolutional blocks and neural network layers is to be determined by the optimization algorithm. As shown in Figure 27, each convolution block is denoted by C_i (i = 1, 2, …, n_c), where n_c is the number of convolution blocks, and the number of neurons in the i-th hidden layer of the neural network is denoted by N_i (i = 1, 2, …, n_N), where n_N is the number of neural network hidden layers. The two parameters n_c and n_N determine the network topology.

In iteration i, the Bayesian optimization comes up with a set of design hyperparameters, s_i, which includes the network topology hyperparameters as well as the convolutional block and neural network hyperparameters. Each s_i is a unique identifier of a CNN architecture. The s_i set consists of n_c, the number of convolutional blocks; the number of kernels d_i and kernel size (w_i × h_i) of each convolutional block; n_N, the number of hidden layers; and the number of neurons N_i in each of the n_N layers. After s_i is defined by the optimization algorithm, the corresponding CNN structure gets formed, and the feasibility of the CNN structure, in terms of matrix dimension agreement, gets assessed. In case the proposed architecture is found infeasible, the Bayesian optimization acquisition function modifies its estimate using the method suggested by M. A. Gelbart et al. [34].
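A simplified sketch of the dimension-agreement check is shown below; the layer conventions (valid convolution with stride 1 followed by a 3 by 3, stride-2 max-pool) and the field names of the s_i set are illustrative assumptions, not the thesis's exact rules.

```python
# Sketch of a feasibility check for a hyperparameter set s_i: trace the
# feature-map size through each convolution block and reject any
# architecture that collapses a spatial dimension. Conventions here
# (valid convolution, stride 1; 3x3 max-pool, stride 2) are assumptions.
def feasible(s_i, input_hw=(400, 52)):
    h, w = input_hw
    for blk in s_i["conv_blocks"]:        # each block: {"kernels": d, "size": (kh, kw)}
        kh, kw = blk["size"]
        h, w = h - kh + 1, w - kw + 1     # valid convolution output size
        if h < 3 or w < 3:
            return False                  # dimensions no longer agree
        h, w = (h - 3) // 2 + 1, (w - 3) // 2 + 1   # 3x3 max-pool, stride 2
    return True

# Hypothetical two-block s_i, shaped like the ImageNet-derived initial state
s0 = {"conv_blocks": [{"kernels": 96, "size": (11, 11)},
                      {"kernels": 128, "size": (5, 5)}],
      "hidden_layers": [4096, 4096]}
```

An architecture proposing too many large kernels for the 400 by 52 input would fail this check and be handled by the acquisition-function modification mentioned above.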
For each given set of n_c and n_N, the number of optimizable hyperparameters is a function of the number of layers, 2n_c + n_N, since each convolutional block has two parameters for width and height and each ANN layer has a number-of-neurons parameter. Additionally, n_c and n_N are themselves subject to optimization. Considering that the Bayesian optimization optimizes a set number of variables, a set of optimization variables for a maximum of five extra convolutional and neural layers is defined. In the optimization iterations with n_c < 5 or n_N < 5, the hyperparameters corresponding to layers after the n_c-th layer or n_N-th layer become irrelevant. The irrelevant variables are not taken into account during the optimization process, as suggested by K. Swersky [35].

The weights and biases associated with the first five convolutional blocks and the first two neural network hidden layers are initialized by resizing the weight and bias matrices of the trained transfer learning model from section 3.4 using bicubic interpolation. This helps with faster convergence and higher accuracy. For the rest of the weights and biases, the initialization strategy is the same as suggested for the last convolutional block and hidden layer of ImageNet [30]. The weights of the extra convolution layers are initialized from a zero-mean Gaussian distribution with a standard deviation of 0.01. The neuron biases of the additional convolutional layers and the hidden layers are set to constant one, and the remaining neuron biases are set to constant zero.

The number of epochs is limited to three during the optimization to minimize the architecture optimization time and save computing power. Each epoch is a pass during which the whole training dataset is used to train the model. Figure 28 shows the general rules according to which the architectures are designed during the optimization process.
Other than the architecture's overall topology and the set of rules and constraints mentioned in this section, the rest of the architecture can be modified during the design methodology. The parts denoted with red font in Figure 28 are the tweakable parts of the network, i.e. the s_i contents in each iteration.

Figure 28 Architecture Structure and Design Methodology Scheme

3.5.2 Bayesian Optimization for Machining State Detection Architecture Design

Bayesian optimization is a global function optimization method that uses a response-surface-methodology-based approach to find a local minimum or maximum of an objective function. In this chapter, Bayesian optimization is incorporated as an approach to design a convolutional neural network specialized for machining state detection.

Bayesian optimization is a proper method to apply to architecture design problems, since they require finding the extrema of an objective function that is costly to evaluate, non-convex, and whose derivatives are not accessible [38].

Bayesian optimization uses the Bayes theorem to form a surrogate probabilistic model of the objective function. The objective function is the function that is desired to be minimized or maximized; it encompasses the variables that contribute to the problem at hand. Since optimization is used in this thesis only to minimize the objective function, minimization is the only case explained here. The objective function definition is explained in the next subsection.

Surrogate optimization attempts to balance the trade-off between exploration and exploitation. Exploitation is sampling according to the surrogate model's prediction. Exploration is the search for a global minimum where the surrogate model has extremely low certainty, resulting in exposure to new local minima. The algorithm has been proven to converge [39].
Bayesian optimization defines a surrogate function that approximates the original function with computational efficiency. It makes use of a prior estimate of the surrogate function and updates it based on the samples drawn from the objective function in order to achieve a more accurate approximation. For a function f(x), the surrogate function is denoted by f̂. Given that the defined objective function is complex, a non-parametric method is appropriate to approximate it.

In this work, Bayesian optimization constructs a prior belief about how the proposed architecture, associated with the s_i set of network architecture hyperparameters, would behave. Then, in each iteration, it searches the parameter space by enforcing and updating that prior belief based on the current measurements. In other words, a surrogate model of the problem search space, which has a size of about 10^38, is formed using Bayesian optimization. The model evolves in each iteration of the optimization to form a more accurate picture of the search space. The search for the best architecture is carried out based upon the surrogate function updated in the previous iteration. For the purpose of sampling points in the search space, Bayesian optimization uses a function called the acquisition function, denoted by u(x). An acquisition function maintains the balance between exploitation and exploration. Exploitation is searching the region in which the surrogate model predicts that the minimum values lie. On the other hand, exploration is sampling at locations with high uncertainty.

The Bayesian optimization procedure starts with an initial definition of the prior for the surrogate model.
Afterwards, in each iteration i, Bayesian optimization takes three steps towards the optimization of a given objective function f:

1) Finding the sampling point x_i, based on the attributes of the posterior distribution, using the acquisition function,

x_i = argmax_x u(x | D_{1:i−1}) (48)

where u is the acquisition function and D_{1:i−1} are the samples drawn from f so far.

2) Obtaining a possibly noisy sample y_i from f,

y_i = f(x_i) + ε_i (49)

3) Augmenting the data and updating the Gaussian process,

D_{1:i} = {D_{1:i−1}, (x_i, y_i)} (50)

The mentioned acquisition function evaluates the goodness of fit to a specific Gaussian kernel. The acquisition function used in this thesis is the probability of improvement. The kernel used in this work is the ARD Matérn 5/2 kernel [40]. The estimate of the goodness of fit is modified based on the approach proposed by M. A. Gelbart et al. [34].

3.5.3 Network Assessment and Objective Function Design

The objective function evaluates two main criteria: the network's complexity and its ability to detect machining states. The objective function's goal is to decrease the network complexity without compromising cutting state detection performance. The complexity of the network is defined as the number of learnable parameters of the network. Each block of the deep learning system has a set of weights and biases that gets tweaked during the training process. The training phase's final weights and biases form an equation, used in the test/implementation phase, which outputs each cutting state's probability. It is desirable to have a smaller number of learnable parameters to keep this equation as simple as possible. A simpler equation is faster to run, i.e. it results in faster chatter detection, and is less prone to overfitting. The trained network resulting from the previous section performs cutting state detection on the validation data to evaluate the network performance.
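The three-step loop of Equations (48)-(50) can be sketched in a simplified one-dimensional form. An RBF kernel and a fixed candidate grid substitute here for the ARD Matérn 5/2 kernel and the continuous search; all values are illustrative, not the thesis implementation.

```python
import numpy as np
from math import erf, sqrt

def rbf(a, b, ls=0.3):
    # Squared-exponential kernel (stand-in for the ARD Matern 5/2 kernel)
    d = a.reshape(-1, 1) - b.reshape(1, -1)
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    # Gaussian-process posterior mean and std at candidate points Xs
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks, Kinv = rbf(X, Xs), np.linalg.inv(rbf(X, X) + noise * np.eye(len(X)))
    mu = Ks.T @ Kinv @ y
    var = np.clip(np.diag(rbf(Xs, Xs) - Ks.T @ Kinv @ Ks), 1e-12, None)
    return mu, np.sqrt(var)

def prob_improvement(mu, sigma, f_best, xi=0.01):
    # Probability-of-improvement acquisition (minimization form), Eq. (48)
    z = (f_best - mu - xi) / sigma
    return 0.5 * (1.0 + np.array([erf(v / sqrt(2)) for v in z]))

def bayes_opt(f, bounds=(0.0, 1.0), n_init=3, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(*bounds, n_init)
    y = np.array([f(x) for x in X])                 # initial (possibly noisy) samples
    grid = np.linspace(*bounds, 200)
    for _ in range(n_iter):
        mu, sigma = gp_posterior(X, y, grid)
        x_next = grid[np.argmax(prob_improvement(mu, sigma, y.min()))]
        X, y = np.append(X, x_next), np.append(y, f(x_next))  # augment D, Eq. (50)
    return X[np.argmin(y)], y.min()
```

On a toy objective such as f(x) = (x − 0.7)^2, the loop concentrates samples near the minimizer within a handful of iterations.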
The mean of the network's loss over all the predictions is calculated and used as an indicator of the designed network's performance. A lower loss indicates a better performance, i.e. a smaller error.

Given the high volume of training samples and the fact that the objective function must account for both network performance and complexity, the Hannan–Quinn information criterion (HQC) [36] is used. The HQC is given by,

HQC = T log(∑Loss / T) + 2 k_complexity log(log(T)) (51)

where ∑Loss is the summation of the loss function values over all of the training instances, an indicator of network performance, k_complexity is the number of network parameters, i.e. the network complexity, and T is the number of training instances. It is common to use HQC in model selection and AutoML, and given the high number of training instances in deep learning, HQC is a proper candidate [37]. The Hannan–Quinn information criterion assesses the accuracy and complexity of the network through a single numeric measure. In other words, it measures the accuracy of the network, by evaluating the trained network on the validation data set, and adds a penalty for more complex networks.

3.6 Summary

In this chapter, the experiment design and machining state detection methodology are explained. The process of network design starts with using a statistical method for machining state detection to assess the plausibility of solving the machining state detection problem using machine learning. A transfer learning method is used as the initial input of the customized architecture design loop. An automated machine learning architecture design pipeline is proposed to design a specialized architecture for machining state detection with high accuracy and low complexity. It consists of processing vibration data and converting it to the frequency domain, as well as machine learning model constructor and optimization components to detect machining states.
During the architecture design process, the optimization algorithm suggests a set of architecture parameters s_i, and a network gets constructed based on them. The network gets trained on the processed signals of the training dataset. The trained model is evaluated using validation data. The objective function value is assessed at this point, based on the accuracy of the trained model's state detection on the validation dataset and the network's complexity. The best performing architecture is trained and used for machining state detection.

Chapter 4: Machine Learning and Physics-driven Chatter Detection Fusion

This chapter is dedicated to amalgamating the deep learning method for machining state detection from Chapter 3 and a chatter detection method based on milling physics, and to proposing the full system's pipeline. The goal is to fuse artificial intelligence and physics-driven methods to reach a reliable approach with high accuracy.

The physics-driven method is based on online energy-based chatter detection. Improvements suggested to increase its accuracy are explained in this chapter.

This chapter is organized as follows. The first section elaborates on the enhancements proposed to the energy-based chatter detection method proposed by Caliskan et al. [41]. The proposed enhancement increases the method's accuracy while saving the computing power required for detecting chatter. Section 2 focuses on the chatter detection pipeline and its implementation in the real world. A decision-making algorithm is proposed, which plays a primary role in the fusion process. Both deep learning and physics-based methods run simultaneously in parallel, detecting the machining state and stability independently. However, the decision-making module fuses the outcomes of the mentioned methods. It results in a method with higher robustness and accuracy than the standalone machine learning method.
Section 3 suggests a self-evaluation loop for the machine learning algorithm, which uses the whole system's entropy and accuracy to retrain the model online and improve it. Lastly, section 4 provides a summary of the work done in the chapter.

4.1 Improved Energy-Based Chatter Detection Method

In this section, modifications are suggested to the online energy-based chatter detection method by Caliskan et al. [11]. The modifications enhance the method by increasing the accuracy of the energy-based chatter detection and speeding up the detection process. The original paper [11] provides an estimation of chatter and periodic frequencies based on the assumption of ideal Kalman filter performance. Then, it uses the estimated energies of the chatter and the periodic, forced vibrations to detect instability in the cutting process.

This thesis proposes an approach to improve the robustness of the energy-based method. The proposed method estimates the chatter energy based on the chatter peaks detected in between tooth passing harmonics using a statistical approach. The mentioned method calculates/updates the FFT of the Kalman filter output. It avoids multiple filters and estimates the chatter energy more accurately. As shown in Figure 29, subtracting the Kalman output from the input signal removes the periodic part of the input vibration signal. In the proposed method, the Kalman filter's output is converted to the frequency domain continuously. The peaks of the frequency-domain representation of the non-periodic part of the vibration signal are detected using a statistical method. The proposed peak detection method is designed to ignore the possible remaining peaks at tooth passing harmonics and detect the chatter peaks between the pairs of tooth passing harmonics. Finally, the energy ratio is calculated and used to detect chatter vibration. The cut is unstable if the energy ratio is higher than a predetermined threshold. Additionally, the parameters of the Kalman filter are tuned using Bayesian optimization.
Figure 29 Suggested Improved Energy-based Method's Scheme

4.1.1 Chatter Energy Estimation

The input is the vibration signal, i.e. the microphone signal in this thesis, which is passed through a Kalman filter to predict the periodic signal component ŝ_k (Figure 29). The state estimation of the periodic part of the signal [12] is,

x̂_k⁻ = φ x̂_{k−1}
P_k⁻ = φ P_{k−1} φ^T + λR
K_k = P_k⁻ H^T (H P_k⁻ H^T + R)^{−1}
x̂_k = x̂_k⁻ + K_k (s_k − H x̂_k⁻)
P_k = (I − K_k H) P_k⁻  (52)

where P_k is the error covariance matrix, K_k is the recursively updated Kalman gain matrix, R is the measurement noise covariance, Q is the diagonal matrix of the process noise, and φ is the phase (state transition) matrix, as suggested in the energy-based method [10]. The energy of the periodic part of the signal (Ê_p) is calculated using Equation (19).

The Kalman filter's output is subtracted from the original signal to remove the effects of the periodic part of the signal at the tooth passing frequency and its harmonics, and the residual is transformed into the frequency domain. Ideally, the resultant frequency-domain signal (Y_c) should not contain any periodic part of the signal. However, it is observed in the experiments that in a considerable number of cases, the harmonics of the tooth passing frequency are not totally removed. Since the Kalman filter predictions are based on the current state, changes in the cutting state or sudden changes in the cutting conditions affect the Kalman filter's prediction accuracy. The transient states of the tool entering and exiting the workpiece negatively influence the Kalman filter prediction. Hence, the following approach is proposed to increase the robustness of the energy calculation.

As shown in Figure 30, for the purpose of chatter detection, a robust chatter peak detection in the FFT of the vibration signal is suggested.
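A scalar sketch of the recursion in Equation (52) is given below, tracking a single tooth-passing harmonic with an in-phase/quadrature state. The state model and all parameter values are illustrative assumptions, not the tuned thesis values.

```python
import numpy as np

# Sketch of the harmonic Kalman filter of Eq. (52): the state holds the
# in-phase/quadrature amplitudes of one tooth-passing harmonic, phi rotates
# the phase each sample, and s_k - H x_k leaves the non-periodic (chatter)
# content as the residual. Parameter values are illustrative assumptions.
def harmonic_kalman(s, omega, dt, lam=1e-3, R=0.1):
    th = omega * dt
    phi = np.array([[np.cos(th), -np.sin(th)],
                    [np.sin(th),  np.cos(th)]])     # state transition (phase rotation)
    H = np.array([[1.0, 0.0]])
    x, P = np.zeros((2, 1)), np.eye(2)
    periodic = np.zeros(len(s))
    for k, sk in enumerate(s):
        x = phi @ x                                  # a-priori state estimate
        P = phi @ P @ phi.T + lam * R * np.eye(2)    # a-priori covariance
        K = P @ H.T / ((H @ P @ H.T).item() + R)     # Kalman gain
        x = x + K * (sk - (H @ x).item())            # a-posteriori update
        P = (np.eye(2) - K @ H) @ P
        periodic[k] = (H @ x).item()
    return periodic

# Usage: a 100 Hz "tooth-passing" sinusoid sampled at 5 kHz
t = np.arange(5000) / 5000.0
sig = np.sin(2 * np.pi * 100 * t)
est = harmonic_kalman(sig, 2 * np.pi * 100, 1 / 5000.0)
```

After the initial transient, the estimate tracks the periodic component, so the residual sig − est carries only the non-periodic content.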
The proposed method sets a threshold curve on the FFT magnitude based on the local mean and standard deviation of the frequency-domain signal, as well as the average amplitude of each instance of the frequency-domain signal. The n × 1 amplitude threshold matrix for each of the n = 5000 points of the frequency-domain signal is given by,

a_th = σ(Y_c) + mean(Y_c) + Ȳ_c × J_{n,1} (53)

where σ() and mean() represent moving standard deviation and mean filters, respectively, both with a window size of L = 0.004 × n. Moreover, Ȳ_c is the average of the whole frequency-domain signal, and J_{n,1} is a unit matrix with the size of the number of FFT amplitude points. Given the sudden changes in the mean and standard deviation, a suppression method is suggested. The suppression factor for each amplitude point Y_ci with the threshold a_thi is defined as below,

f_sup = e^(max(0, Y_ci − a_thi) / max(Y_c)) (54)

The suggested suppression factor decreases the effect of the peaks exponentially to increase the robustness of the detection: it is exponentially proportional to the difference between each amplitude and the corresponding threshold. After applying the suppression factor through the element-wise division Y_c ./ f_sup, the threshold is recalculated using Equation (53). For each group of FFT points with an amplitude higher than the threshold curve, i.e. each set of continuous points above the threshold, the point with the highest amplitude is determined as a candidate peak. The mentioned threshold is shown in Figure 30.

Figure 30 Chatter Peak Detection Example

An eligibility criterion is defined for each candidate to avoid the unremoved tooth passing peaks being counted towards chatter peaks. For each set of points higher than the threshold, the corresponding candidate peak is eligible if the set of points does not cross any tooth passing frequency. As illustrated in Figure 30, the eligible points might appear in multiple ranges of frequencies.
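Equations (53)-(54) can be sketched with moving-average filters as follows; the convolution-based moving statistics and the edge handling are assumptions, not the thesis implementation.

```python
import numpy as np

# Sketch of the threshold curve of Eq. (53) and the suppression factor of
# Eq. (54). Moving mean/std are computed by convolution; edge handling and
# exact window details are assumptions.
def threshold_curve(Yc, window_frac=0.004):
    n = len(Yc)
    L = max(1, int(window_frac * n))           # L = 0.004 * n window
    kernel = np.ones(L) / L
    local_mean = np.convolve(Yc, kernel, mode="same")
    local_sq = np.convolve(Yc ** 2, kernel, mode="same")
    local_std = np.sqrt(np.maximum(local_sq - local_mean ** 2, 0.0))
    return local_std + local_mean + Yc.mean()  # Eq. (53)

def suppress(Yc, a_th):
    f_sup = np.exp(np.maximum(0.0, Yc - a_th) / Yc.max())  # Eq. (54)
    return Yc / f_sup                          # element-wise division Yc ./ f_sup

# Usage: a flat spectrum with one sharp peak
Yc = np.ones(5000)
Yc[500] = 10.0
a_th = threshold_curve(Yc)
Yc_sup = suppress(Yc, a_th)
```

Points below the threshold get a suppression factor of one and are left untouched, while the peak that exceeds the threshold is damped exponentially before the threshold is recomputed.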
The ranges that go through tooth passing frequencies are ignored. In each remaining range, the point with the highest value is determined as a chatter peak.

The proposed model's performance on three states of machining, chatter, entrance, and stable cut, for a sample signal is shown below. The magnitude of the detected peaks for chatter is considerably higher than for the two other states. The detected peaks for the transient and stable states are mostly rejected.

Figure 31 Proposed Physics-driven Method Performance on a Sample Signal

4.2 Hybrid Model

A decision-making process is designed for making the final call on chatter detection. The decision-making algorithm is the infrastructure for fusing the deep learning method's output with the optimized physics-based method in order to detect chatter. If chatter is detected, the chatter suppression module gets engaged and eradicates the ongoing instability of the cut. The model is designed in a manner that outputs the cutting state as well as the cutting stability. As shown in Figure 3, the outputs of both the physics-driven and machine learning methods are the inputs of the decision-making algorithm. The cutting state and a command to the machine tool controller, intended to perform chatter suppression, are the algorithm's outputs. The decision-making algorithm shares its inputs with the self-evolution algorithm.

4.2.1 Decision-making Algorithm

The machine learning model's output is in the form of the probability of each cutting state. The standalone deep learning algorithm determines the cutting state with the highest probability as the current cutting state. As illustrated in Figure 32, the detected state's certainty is defined as the difference between the probability of the state with the highest probability and that of the one with the second-highest probability.
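The eligibility rule for candidate peaks can be sketched as follows; the number of harmonics checked and the range bookkeeping are illustrative assumptions.

```python
import numpy as np

# Sketch of the chatter-peak eligibility rule: each contiguous run of FFT
# points above the threshold is discarded if it crosses a tooth-passing
# harmonic; otherwise its highest point becomes a chatter peak. The number
# of harmonics checked (n_harm) is an illustrative assumption.
def chatter_peaks(freqs, Yc, a_th, f_tp, n_harm=50):
    harmonics = [k * f_tp for k in range(1, n_harm + 1)]
    above = Yc > a_th
    peaks, i, n = [], 0, len(Yc)
    while i < n:
        if above[i]:
            j = i
            while j < n and above[j]:
                j += 1                       # [i, j) is one run above the threshold
            lo, hi = freqs[i], freqs[j - 1]
            if not any(lo <= h <= hi for h in harmonics):
                peaks.append(i + int(np.argmax(Yc[i:j])))  # eligible: keep its maximum
            i = j
        else:
            i += 1
    return peaks

# Usage: one run crossing the 100 Hz harmonic (rejected), one clear of it
freqs = np.arange(1000.0)
Yc = np.zeros(1000)
Yc[95:106] = 3.0          # straddles the 100 Hz tooth-passing harmonic
Yc[140:151] = 2.0
Yc[145] = 5.0             # chatter peak between harmonics
```

With a flat unit threshold, only the run between harmonics survives, and its maximum at 145 Hz is reported as the chatter peak.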
The detection is with low certainty if the difference between the probabilities of the two states with the highest probability is smaller than the certainty threshold (th_certainty = 8%). Intuitively, when the algorithm determines two relatively close probabilities for the top two most probable machining states, the deep learning algorithm is agnostic about the correct state. In other words, low certainty shows the algorithm's inability to differentiate between two cutting states.

Figure 32 Probability versus Certainty in Machining State Detection Using Machine Learning Output

Energy-based chatter detection outputs the energy ratio of the signal. Based on the experimental results, the energy-based chatter detection algorithm has a high tendency to confuse the transient states of machining with chatter.

The outline of the decision-making algorithm is shown in Figure 33. If the deep learning algorithm detects Air Cut, Entrance, or Exit with a high certainty relative to a non-transient in-cut state, that is the final output of the decision-making algorithm. If the algorithm detects chatter or stable cut, the probabilities of chatter and stable cut are given by,

P_c = (p(Chatter) + K_p × EN) / (K_p + 1)
P_s = (max(p(state ≠ chatter)) + K_p × (1 − EN)) / (K_p + 1)  (55)

where P_s and P_c are the accumulative probability outputs of the deep learning algorithm, K_p = 0.2 is the energy-based influence factor, and EN is the energy ratio from the energy-based chatter detection. If P_c > P_s, chatter is detected and the chatter suppression procedure is initiated. Otherwise, the machining state is determined as the state with the highest probability, given that it has a certainty of at least 4 × th_certainty. If it does not, the state is determined to be stable.
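The chatter/stable branch of this decision rule can be sketched as below. The transient-state pre-check is omitted, and K_p = 0.2 is an assumption (the digit is unclear in the source PDF).

```python
# Sketch of the chatter/stable branch of the decision-making algorithm,
# Eq. (55). The transient-state pre-check is omitted; kp = 0.2 and the
# certainty threshold are assumptions.
def fuse(p_states, en_ratio, kp=0.2, th_certainty=0.08):
    p_other = max(v for k, v in p_states.items() if k != "chatter")
    Pc = (p_states["chatter"] + kp * en_ratio) / (kp + 1)    # Eq. (55)
    Ps = (p_other + kp * (1 - en_ratio)) / (kp + 1)
    if Pc > Ps:
        return "chatter"                 # triggers the suppression procedure
    ranked = sorted(p_states.values(), reverse=True)
    certainty = ranked[0] - ranked[1]    # top-1 minus top-2 probability
    best = max(p_states, key=p_states.get)
    return best if certainty >= 4 * th_certainty else "stable"
```

A high deep-learning chatter probability reinforced by a high energy ratio yields chatter; otherwise the most probable state is returned only when its certainty clears 4 × th_certainty.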
Figure 33 Decision Making Algorithm Outline

4.2.2 Chatter Suppression

After chatter is detected, the implemented chatter suppression method is engaged. The chatter suppression module's output is a new spindle speed, which is sent to the machine tool controller in case chatter is detected.

In case of chatter being detected, a command is sent to the machine to zero the feed rate and change the spindle speed to the closest stable spindle speed, according to the stability lobes. As shown in Figure 34, the Sandvik tool chatters at 2100 [rev/min] and 2.8 mm depth of cut, despite what the theoretical analytical stability lobes suggest. Hence, the algorithm sends the command to the machine to zero the feed rate, set the spindle speed to 1665 [rev/min], and reactivate the feed rate. This method requires the frequency response functions of both the machine tool and the workpiece, the cutting coefficients of the work material, and the tool geometry to determine the stability lobes [7] beforehand.

Figure 34 Chatter Suppression based on the Analytical Stability Lobes for Sandvik R390-050022 Tool

4.3 Self-evolving Artificial Intelligence using Semi-supervised Learning

This section aims to develop a self-evolution mechanism that makes the machine learning model improve over time by self-training. There are three criteria proposed for the automated retraining of the machine learning model. The first criterion is designed based on the machine learning model's attributes. The second one is designed based on machining process knowledge.
It is essential to mention that the proposed method's goal is to obtain an objectively accurate detection of the machining state for the instances detected with low certainty. The priority is to find and utilize more accurate detections, at the cost of compromising the number of new training cases.

4.3.1 Self-evolution System
A self-evolving artificial intelligence system is capable of improving itself during operation. Semi-supervised learning is used to build a self-evolving deep learning algorithm. The algorithm uses the fusion of the physics-based method and machine learning to retrain and enhance the convolutional neural network in an online manner, taking advantage of the designed hybrid system to improve the machine learning model.

During the machining process, the network evolves while it is being used to detect the machining state. The system is capable of improving itself both based on pre-set criteria and with an operator's feedback. N. Fazakis et al. [42] showed that semi-supervised learning improves a machine learning algorithm's performance merely by retraining the model on its own predictions. Since the combination of the physics-based and deep learning algorithms has higher accumulative robustness and accuracy, the cutting-state instances detected with a high accumulative probability and accuracy using both methods are reliable sources of new training cases for retraining the machine learning model.

The goal is to retrain the machine learning model on the vibration signal instances that the machine learning model either misclassifies or correctly classifies with low certainty. According to the defined criteria, each machining vibration measurement instance should meet at least one of the following three conditions to qualify for retraining and evolving the machine learning model.

First, if both the physics-driven and machine learning methods have matching stability detections, i.e.
both detect chatter, or the physics-driven model detects a stable cut while the deep learning model detects anything but chatter, the certainty is used to measure the reliability of the detection. Regardless of the certainty of the machine learning model's detection, if the energy ratio (EN) is higher than the defined threshold th_ERC and the detection is chatter, the data alongside the detected state is used to retrain the model. If the detected state is stable and EN is smaller than th_ERS, the data alongside the stable detection is used to retrain the model. Here, th_ERC and th_ERS are the energy ratio thresholds for chatter and stable detection, respectively. These thresholds are determined by averaging the top 10% of chatter cases and the bottom 10% of stable cases, as ranked by the improved energy-based method. This criterion becomes highly important and influential when the certainty of the machine learning method's detection is low, i.e. when the ML system is highly agnostic about its detection of the machining state.

Secondly, each part of a tool path plan starts with an air cut, followed by the entrance of the tool into the workpiece. It continues to cut through the workpiece, either stably or in the presence of chatter, and ends with the tool exiting from the workpiece. Knowing this typical characteristic of machining operations, if both systems detect chatter in one instance while it is surrounded by high-certainty detections of chatter by the machine learning algorithm, higher than th_Certainty, as well as non-contradictory detections, the data alongside the chatter label is used for retraining the model. In this context, surrounded is defined as having a hundred samples before and after the instance meeting a particular criterion. Moreover, a non-contradictory detection is one where there is no disagreement between the physics-based method's and the machine learning method's detections.
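The two energy-ratio thresholds can be estimated as sketched below. This is a hedged illustration of the averaging rule described above, with hypothetical function and variable names; it assumes arrays of energy ratios collected from labelled chatter and stable cases.

```python
import numpy as np

def energy_thresholds(chatter_en, stable_en, frac=0.10):
    """Estimate th_ERC / th_ERS by averaging the top 10% of chatter
    energy ratios and the bottom 10% of stable energy ratios."""
    c = np.sort(np.asarray(chatter_en, dtype=float))
    s = np.sort(np.asarray(stable_en, dtype=float))
    k_c = max(1, int(round(frac * c.size)))  # number of top chatter cases
    k_s = max(1, int(round(frac * s.size)))  # number of bottom stable cases
    th_erc = c[-k_c:].mean()  # EN above this supports a chatter label
    th_ers = s[:k_s].mean()   # EN below this supports a stable label
    return th_erc, th_ers
```

Using the extremes of each class keeps the retraining set conservative: only instances whose energy ratio is unambiguously chatter-like or stable-like qualify.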
Moreover, suppose the physics-driven method detects a stable cut while the deep learning algorithm detects one of the machining states other than chatter, either surrounded by the same state or by states that match the order mentioned in this paragraph. In that case, the instance is used to retrain the model.

Finally, the machine operator's feedback is used as a third way to retrain the algorithm.

4.4 Summary
This chapter proposed improvements to the energy-based chatter detection and suggested an approach to chatter detection that amalgamates the physics-based and machine learning methods. The proposed method is a fusion of the machine learning machining state detection model, designed and trained in the previous chapter, and a physics-driven chatter detection model based on the work of H. Caliskan et al. [11], with the fundamental enhancements proposed at the beginning of this chapter. The chapter started by elaborating on the physics-driven chatter detection method suggested in this work and proceeded with the approach taken towards fusing the mentioned method with machine learning. The fusion method is proposed with robustness and accuracy in mind. Finally, a set of criteria is suggested for fine-tuning the machine learning model as it is being used for state detection.

Chapter 5: Experimental Validation and Results
In this chapter, the experimental outcomes of each method elaborated in chapters 3 and 4 are presented, and the functionality of the suggested model is verified based on experimental data. The machining state detection machine learning models, the improved energy-based chatter detection model, and the hybrid system are validated with experimental data. The experiments used for reporting the models' performance come from a new set of data, associated with measurements and experiments that have not been used during model design or training; earlier in this thesis, this set was referred to as the test data set.
It consists of cutting tests with a wide variety of cutting conditions, including conditions different from those used for model training and design. The data consists of cutting experiments that follow the theoretical linear stability laws presented by Y. Altintas [6] and those that contradict them, i.e. chatter in the theoretical stable zone and vice versa.

Each section of this chapter represents a fully working original model for chatter detection. The chapter is organized as follows. It starts with an overview of the performed experiments. In the second section, the set baseline and its accuracy are shown. Section 5.3 explores the capability of the model resulting from the transfer learning method introduced in section 3.4. Section 5.4 evaluates the performance of the machine learning model designed in section 3.5. Section 5.5 presents the enhancements caused by the improvements applied to the energy-based method, as suggested in 4.1, and validates the final hybrid model's performance, introduced in section 4.2. Finally, section 5.6 summarizes the outcomes of this chapter.

5.1 Data and Training Parameters
The first step towards having a robust model and avoiding overfitting is to have a diverse dataset. For this purpose, more than 1600 experiments from about 500 continuous cutting tests, consisting of various cutting conditions, tool paths, tools, and materials, were performed. The data was acquired at a sampling frequency of f_s = 50 kHz, resulting in 53,884 instances. The cutting instances were extracted and manually labelled into the five machining states based on the process output and surface quality.
Table 1 Number of Instances and Experiments Distribution for each Dataset and each Machining State

State            Training   Validation  Test      Total (%)          Experiments (%)
Air cut          14,110     1,200       4,048     19,358 (35.90%)    464 (28.10%)
Entrance         540        112         130       782 (1.50%)        286 (17.32%)
Stable           16,364     1,280       1,272     18,916 (35.10%)    358 (21.68%)
Chatter          9,806      1,966       2,292     14,064 (26.10%)    218 (13.20%)
Exit             480        182         102       764 (1.40%)        325 (19.69%)
Total (%)        41,300     4,740       7,844     53,884
                 (76.60%)   (8.80%)     (14.60%)
Experiments (%)  1256       190         205
                 (76.08%)   (11.51%)    (12.42%)

The cutting experiments differed in parameters such as workpiece material, tool, and cutting conditions. The workpiece materials used are aluminum 7075 and steel. In terms of tools, as shown in Figure 35, six different tools were used during the data collection: Sandvik R245-12T3, Seco R21769-1632, Sandvik R390-050022, Seco JS452120E33R050, and Sandvik 39241014-6340120B. The Seco R21769-1632 tool was used for cutting steel, and the remaining tools were used to cut aluminum workpieces.

The data collected from chattering and stable cuts was acquired at various depths of cut, feed rates, and spindle speeds. All of the tools were tap tested to identify their Frequency Response Functions (FRF) while mounted on a warmed-up machine. The tap test data was further used to determine the analytical stability lobes [6] and to plan the experiments accordingly. For each tool, several pairs of depth of cut and spindle speed were measured from each of the four following categories: unstable cut in the theoretically stable zone, unstable cut in the theoretically unstable zone, stable cut in the theoretically unstable zone, and stable cut in the theoretically stable zone. The tests cover the predicted stability, unpredicted stability, predicted chatter, and unpredicted chatter conditions. This applies to all of the tools except Sandvik R245-12T3 and Seco R21769-1632.
Figure 35 The tools used during data collection, from left to right: Sandvik R245-12T3, Seco R21769-1632, Sandvik R390-050022, Seco JS452120E33R050, and Sandvik 39241014-6340120B

5.2 Baseline
The baseline is set in section 3.2.2 to assess the plausibility of solving the machining state detection problem using statistical methods. The output is a machine learning model capable of machining state detection independently.

For the goal of machining state detection, a binary decision tree is utilized. The tree uses Gini's diversity index as its split criterion [43]. As mentioned before, cross-validation with ten folds is used on 0.2-second pieces of training data. The method resulted in 85.46% test accuracy in cutting state detection. The confusion matrix of the trained model is shown in Figure 36, demonstrating the misclassifications of each cutting state. The green and red colour scale in the confusion matrix indicates the corresponding class prediction accuracy, where green indicates high accuracy and red indicates low accuracy. To make the matrix more readable, each row presents percentages relative to the true class of the corresponding row.

Figure 36 Baseline Model Confusion Matrix (with normalized rows)

The model shows promising results in terms of chatter detection. The model is not well trained for detecting the entrance and exit of the tool into and out of the workpiece; yet, it rarely mistakes the entrance and exit states for chatter.
In other words, the model shows substantial results in chatter detection as well as machining state detection. The algorithm's performance in detecting entrance and exit is relatively low; however, it has about 10% higher accuracy than a random detector in detecting the entrance.

In Figure 37, the trained decision tree is depicted, and the range of statistical variables indicating each machining state is shown, where σ indicates the standard deviation and μ3 stands for the skewness. For each split, the left arm represents the case in which the variable in the corresponding box at the top of the branch has a smaller value than the number indicated in the box. Similarly, the right arm leads to where the variable, i.e. standard deviation or skewness as specified in the box, is greater than or equal to the stated number in the same box. Starting from the top of the tree, the value of the standard deviation and skewness of the vibration signal is compared to the value in the corresponding box and navigates its way to one of the end nodes, which are the machining states.

Figure 37 Decision Tree, Decision Criteria for machining state detection

The resultant tree indicates that the chatter signal magnitude tends to have more dispersion from the mean. It shows that the skewness is a good differentiator between the machining states, while the standard deviation is a better discriminator for chatter versus stable and out-of-cut states. Figure 38 depicts the decision boundaries of the trained decision tree. The algorithm detects chatter in most cases when the standard deviation is higher than 0.103, and if it is equal to or smaller than 0.04, it detects a transient state or air cut.
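The two statistics the baseline tree splits on can be computed per window as sketched below; a minimal illustration (function name hypothetical) of the dispersion and skewness features.

```python
import numpy as np

def window_features(x):
    """Standard deviation (sigma) and skewness (mu3) of one
    time-domain vibration window (e.g. 0.2 s of samples)."""
    x = np.asarray(x, dtype=float)
    sigma = x.std()                                  # dispersion
    mu3 = ((x - x.mean()) ** 3).mean() / sigma ** 3  # skewness
    return sigma, mu3
```

A window dominated by large one-sided spikes (e.g. tool impacts at entrance or exit) yields a strongly skewed mu3, while chatter mainly inflates sigma; this is exactly the split behaviour seen in the fitted tree.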
Moreover, it appears that the decision tree mostly uses skewness as a discriminator between the different non-chatter machining states.

Figure 38 Decision Tree Decision Boundaries

The histogram of each machining state's magnitude for all the training data is shown in Figure 39. The histograms are drawn with 100 bins, and a normal distribution is fitted to each histogram. As demonstrated, the stable cut vibration data magnitude's distribution has a small deviation with a small skewness towards the left, while the chatter cut distribution is farthest from the normal distribution and has a higher skewness. Also, the similar values of the standard deviation in air cut, entrance, and exit, and the considerably different skewness values in these machining states, support the findings about the fitted decision tree's split strategy. As evident from the histogram of the data in Figure 39 and the decision boundaries of the machine learning model given in Figure 38, the main difference between the entrance and exit signals is that the exit signal is mostly right-skewed, while the entrance signal is significantly left-skewed in most cases. Moreover, the air cut vibration signal has a small deviation.

Figure 39 Normal Distribution Fitted to the Histogram of Each Machining State Experiments Time-domain Test Data

Even though the confusion matrix suggests that the model performs poorly on the detection of exit and entrance, the model's overall accuracy is promising and an indicator of the plausibility of solving this problem with machine learning, since it provides reasonable accuracy for real-world implementation. Needless to say, as is apparent in Figure 36, entrance and exit states are mostly mistaken for air cut and stable, which is harmless for our goal of chatter detection and avoiding the confusion between chatter and transient states of cut.
5.3 Standalone Transfer Learning Machining State Detection Method Performance
This section presents the performance of the proposed standalone machining state detection method introduced in section 3.4. The method is capable of detecting the machining state based on the machine's vibration signal.

Table 2 Transfer Learning Machining State Detection Model Training Hyperparameters

Hyperparameter                        Value
Solver                                Adam (Adaptive Moment Estimation) [44]
Initial Learning Rate                  .0e−4
Learning Rate Drop Factor             0.08
Denominator Offset (ε)                1.0e−8
Gradient Decay Factor (β1)            0.9
Squared Gradient Decay Factor (β2)    0.999
L2 Regularization (λ)                 1.0e−4
Mini Batch Size                       256
Number of Epochs                      30

The learning rate is the step size for updating the learnable parameters of the machine learning model in each iteration. The learning rate is multiplied by the Learning Rate Drop Factor after each epoch to fine-tune the machine learning model. This results in searching a wider space at the beginning to find an estimate of the local minimum area; as the iterations proceed, the learning rate drops, resulting in a more refined search.

The data is shuffled before training to ensure that the order of the data does not play a role in the machining state detection process and the model performance, i.e. that the model does not overfit. Training the model takes 60 minutes on an NVIDIA® Quadro® P4000 GPU with 8 gigabytes of GPU memory. The architecture's complexity, i.e. the number of trainable parameters, is equal to 5.7 × 10^7. The training process is shown in Figure 40.

Figure 40 Baseline Transfer Learning Method's Training Process

After the 4th epoch, the validation accuracy remained constant, which indicates that the model found a local minimum. It is worthwhile to mention that the validation accuracy and loss are shown in Figure 40 to showcase the model training process.
For a given sample piece of stable cut vibration frequency-domain signal input, the output of each convolutional layer's filters is shown in Figure 41 to illustrate the progress of the input through the trained model.

Figure 41 The Output Value of Activation of the CNN Convolutional Layers for a given piece of Stable Frequency-domain Vibration Signal

The trained model has 88.97% test accuracy and 95. 4% training accuracy in machining state detection. Each prediction using the trained model requires 0.14 seconds of vibration data, and the computation takes approximately nine milliseconds.

The confusion matrix shown in Figure 42 presents the performance of the transfer learning model. Each row of the matrix is normalized to the true state corresponding to that row; accordingly, the matrix's diagonal cells represent the recall value, and the percentages presented in each row sum to 100%.

Figure 42 Normalized Confusion Matrix of the Transfer Learning Model

The confusion matrix indicates that the model performs well in detecting chatter cases, though it has a high tendency to misclassify non-chatter cases as chatter. The model performs better than the one suggested in the previous section, both in terms of chatter detection and machining state detection.

5.4 Standalone Custom Designed Convolutional Neural Network Performance
In this section, the process of the machining state detection architecture design suggested in section 3.5 is elaborated, and the numeric results of the proposed method are shown. Moreover, the performance of the mentioned method is evaluated. The results of both the architecture design and the training of the designed architecture are interpreted in this section.

The architecture design is done through 600 iterations of Bayesian optimization, searching a design space of size 10^38, by designing, training, and evaluating one CNN architecture per iteration.
The training process is done on an NVIDIA® Quadro® P4000 with 8 gigabytes of GPU memory in 4.2 GPU-days. The architecture design process is shown below.

Figure 43 Automated Machine Learning Architecture Design Iterations

The objective function is a combination of complexity and accuracy, as presented by Equation (51). Figure 43 provides a representation of the change of accuracy and complexity during the optimization iterations: it depicts the validation accuracy and the complexity compared with the objective function value (middle graph). As shown in Figure 43, the algorithm converges towards points with lower architectural complexity and higher validation accuracy. Figure 44 depicts the value of the objective function versus the validation accuracy and network complexity in each iteration of the network design process.

Figure 44 Architecture Validation Accuracy, Objective Function Value, and Complexity of Each Bayesian Optimization Iteration with top three minimum objective function value iterations marked

The objective value estimated by the Bayesian optimization is compared with the real value of the objective function during the optimization in Figure 45.

Figure 45 Objective Function Value versus Bayesian Optimization Estimation of Objective Function Value During Architecture Design Process

The periodic exploration and exploitation process is apparent in Figure 45, where the best objective function values are also shown. Although the lowest objective function value is obtained in the 77th iteration, the second-best architecture, resulting from the 138th iteration, is used as the final model for machining state detection. The reason is the slightly better performance of the second-best architecture after manual hyperparameter tuning, a 0.5% improvement in accuracy, and its extremely low complexity.
In other words, after tuning the initial learning rate and the learning rate drop factor and training it for more epochs, the second-best architecture showed a better performance than the first one showed after fine-tuning. As shown in Figure 46, the best architecture consists of three convolutional blocks and two neural network hidden layers. The first block is a set of 100 1 × 4 convolution kernels with a stride of 4, followed by a ReLU and a cross-channel normalization with a channel size of 5, α = 1e−4, and β = 0.75, succeeded by a pooling layer with a stride of two and zero padding. The second convolution block consists of 72 grouped convolution kernels with two groups and a padding of two, followed by a ReLU and a cross-channel normalization with a channel size of 5, α = 1e−4, and β = 0.75, succeeded by a pooling layer with a stride of two and zero padding. The convolutional blocks' output goes into a two-layer fully connected neural network followed by a ReLU and a dropout layer. The neural network has 1744 neurons in its hidden layer and five neurons in its output layer, each corresponding to one machining state's probability. The architecture's complexity, i.e. the number of trainable parameters, is equal to 4.19 × 10^7, which is less complex than the transfer learning method and the architecture proposed by A. Krizhevsky [30] (4.19 × 10^7 versus 5.7 × 10^7 trainable parameters, roughly a quarter fewer), while it detects machining states with higher accuracy.

Figure 46 Custom Designed Machining State Detection CNN Architecture Schematic

For a given sample piece of frequency-domain signal input, the output of each convolutional layer's filters is shown below to illustrate the progress of the input through the trained model.

Figure 47 The Output Value of Activations of the CNN Convolutional Layers for a given Piece of Frequency-domain Vibration Signal

After the architecture design process, the training hyperparameters are manually tuned for the designed architecture using the validation data.
The resultant model is trained using the training hyperparameters shown in Table 3.

Table 3 Custom Designed Machining State Detection Model Training Hyperparameters

Hyperparameter                        Value
Solver                                Adam (Adaptive Moment Estimation) [44]
Initial Learning Rate                 During Optimization: 5.0e−5; During Training:  .0e−4
Learning Rate Drop Factor             During Optimization: 0.05; During Training: 0.67
Denominator Offset (ε)                1.0e−8
Gradient Decay Factor (β1)            0.9
Squared Gradient Decay Factor (β2)    0.999
L2 Regularization (λ)                 1.0e−4
Mini Batch Size                       512
Number of Epochs                      During Optimization: 3; During Training: 30

The learning rate is multiplied by the Learning Rate Drop Factor after each epoch. The data is shuffled before training to ensure that the order of the data does not play a role in the machining state detection process and the model performance, i.e. that the model does not overfit. Training the model takes 55 minutes on the GPU mentioned above. This network's training time is roughly equal to the training time of the transfer learning method because of its smaller size. The training process is shown in Figure 48.

Figure 48 Custom Designed Architecture Training Process

The validation dataset is merely used during the architecture design process to evaluate each designed and trained network's performance. The validation data is not used during the training process; yet, it is used in Figure 48 for illustration.

The designed architecture showed 94.28% test accuracy, 94.81% validation accuracy, and 96.64% training accuracy in the detection of machining states. By disregarding the machining states and using the system merely for cutting stability detection, i.e. considering any state other than chatter as stable, the trained model's test accuracy is equal to 98.88%.
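The piecewise decay rule described above (the learning rate is multiplied by the drop factor after each completed epoch) amounts to the following; a minimal sketch with a hypothetical function name, not the thesis code.

```python
def learning_rate(initial_lr, drop_factor, epoch):
    """Learning rate after `epoch` completed epochs: the initial rate
    multiplied by the drop factor once per epoch (exponential decay)."""
    return initial_lr * drop_factor ** epoch
```

With the Table 3 training-phase drop factor of 0.67, the rate after two epochs would be initial_lr × 0.67², i.e. about 45% of the starting value, giving a coarse search early on and a refined search later.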
Each prediction using the trained model requires 0.07 seconds of vibration data, and the computation takes approximately three milliseconds. The trained model occupies 154 MB of memory to store its weights and biases. The closeness of the training, validation, and test accuracies suggests that the model is well trained and not overfitted. It is worthwhile to mention that the validation accuracy and loss are shown in Figure 48 merely to showcase the training process.

The model has 5.31% more accuracy in machining state detection in comparison with the transfer learning model, while it is less complex and provides faster training and three times faster prediction.

The confusion matrix of the trained model is presented in Figure 49 based on the test data, with each row normalized.

Figure 49 Trained Custom-Designed Model's Confusion Matrix (with normalized rows)

The trained model has great dexterity in differentiating chatter and stable cuts, as the percentage of chatter cases detected as stable is 0.09% and the accumulative stable-chatter confusion percentage is less than 0.1%. Moreover, it shows that the model is capable of detecting the transient states with a reasonable degree of certainty. Most confusions are in the differentiation between the transient states and the stable cut, which is justifiable by the fact that similar signatures in the stable cut frequency-domain signal, such as the tooth passing frequency magnitude, are present in both the transient input and the stable cut. Additionally, in this thesis, the machining states are used to compensate for the energy-based method's incapability in distinguishing chatter from the transient tool entrance and exit states, which means mistaking a transient state for a stable cut does not cause disruption in the hybrid system. The precision, recall, and F1-score of the model for each machining state are given in the table below.
These values indicate the high precision and robustness of the suggested model in the detection of cutting states and air cut, according to the F1-score.

Table 4 Custom Designed Machining State Detection Model Performance tested on the Test Dataset

Performance Metric   Air Cut   Chatter   Entrance   Exit     Stable
Recall               90.04%    96.64%    54.62%     74.51%   88.87%
Precision            96.61%    99.51%    42.77%     53.52%   88.83%
F1-Score             0.968     0.981     0.705      0.623    0.878

5.5 Improved Energy-based Chatter Detection and Hybrid System (Machine Learning Fusion) Performance
This section presents the results and experimental validation of the physics-driven improved energy-based method introduced in section 4.1 and its combination with the machine learning method introduced in 3.5, i.e. the hybrid method introduced in section 4.2.

The improved energy-based chatter detection system suggested in section 4.1 results in a chatter detection system with 93.51% accuracy. The confusion matrix of the mentioned model is presented below.

Figure 50 Improved Energy-based Chatter Detection Confusion Matrix with Normalized Rows, based upon the Test Data

The presented confusion matrix is normalized to its rows. The energy-based system is merely capable of detecting chatter, and it is desired to have all other machining states detected as stable; hence, the predictions are limited to chatter and stable cuts, and the confusion matrix is colour-coded accordingly. The improved energy-based method has an accuracy of 93.51% in chatter detection, which is 33.93% more accurate than the original energy-based model proposed by Caliskan [45]. It shows considerable improvement in the misclassification of entrance and exit as chatter, 77.36% and 17.03% improvement, respectively. Besides, the proposed model has a 56.51% improvement in detecting chatter.
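The F1-scores reported in Table 4 above (and for the hybrid system later) are the harmonic mean of precision and recall; a minimal sketch:

```python
def f1_score(precision, recall):
    """F1-score: harmonic mean of precision and recall,
    with both given as fractions in [0, 1]."""
    return 2 * precision * recall / (precision + recall)
```

For the chatter column, for instance, f1_score(0.9951, 0.9664) gives approximately 0.981, matching the tabulated value; the harmonic mean penalizes a class whose precision and recall are imbalanced, which is why the transient states score lowest.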
The machine learning system and the energy-based chatter detection are fused using the method suggested in section 4.2.1. The mentioned system is implemented with th_Certainty = 0.08 and K_p = 0. . It resulted in 94.26% test accuracy in machining state detection and 98.90% accuracy in chatter detection. The hybrid model shows 5.39% more accuracy than the standalone improved energy-based chatter detection. Moreover, it performs slightly more accurately (0.02%) than the stand-alone custom CNN architecture. However, its primary importance is its higher robustness, given the fact that it infuses the data-driven method with the physics of milling. The row-normalized confusion matrix of the hybrid method is presented below.

Figure 51 Hybrid (Machine Learning Fusion) Chatter Detection System Confusion Matrix with Normalized Rows

The confusion matrix shows that the model rarely mistakes the transient states for chatter. The most confusion is between the stable cut and the transient (entrance and exit) states, which does not affect the chatter detection and suppression process. Moreover, the high confusion percentage of entrance and exit instances with air cut and stable is justifiable because the data corresponding to entrance and exit includes both instances with the tool slightly engaged with the workpiece, i.e. the first instances of entrance and the last instances of exit, and instances with the tool almost fully engaged with the workpiece, i.e. the last instances of entrance and the first instances of exit. These cases make some of the entrance and exit instances hard to differentiate from air cut and stable cut. The F1-scores for air cut, chatter, entrance, exit, and stable are 0.97, 0.98, 0.62, 0.64, and 0.88, respectively.

5.6 Summary
The training and validation of the models presented in sections 3.2.2, 3.4, 3.5, and 4.2 are shown in this chapter.
The accuracy of the baseline machine learning method suggested that it is plausible to solve the problem of online machining state detection using data-driven methods. The transfer learning model showed higher accuracy than the baseline model; its other significance is its three times faster response time. The baseline model's network architecture provided the initial state of the Bayesian optimization used to design the custom CNN architecture. The designed architecture showed good performance both in chatter detection and in machining state detection.

The online physics-driven chatter detection method suggested in 4.2 is validated in this chapter using experimental data. Furthermore, the final system, which consists of the custom CNN architecture and the proposed energy-based chatter detection method, is validated. It is shown that the custom, automatically designed machine learning architecture and the hybrid model show essentially the same results both in terms of chatter detection and machining state detection. Nevertheless, the hybrid model offers more robustness due to accounting for milling physics, which adds significant value to the design. Moreover, it provides a reliable platform for the machine learning model to evolve when it comes to low-certainty detections. The small lead of the hybrid system in chatter detection and the small lead of the custom CNN model in machining state detection are due to the fact that the physics-driven method is designed merely with the goal of chatter detection in mind.

Chapter 6: Conclusion
This research aimed to detect the machining states and suppress instability in the machining process online. The approach taken towards differentiating between the five states of machining (air cut, entrance, stable cut, chatter, and exit) is to amalgamate a data-driven machine learning method with the physics of the machining process in order to detect chatter without compromising robustness.
The outcome of this work is a self-evolving machining state detection system. The data-driven models are trained on a vast, diverse set of data, and the performance of each individual system, as well as the final product, is evaluated and reported on an untouched collection of experiments with both similar and different cutting conditions.

This work consists of three machine learning models, each capable of detecting machining states as well as cutting instability. The first model provides a simple data-driven solution to machining state detection that applies decision trees to the dispersion and skewness of the time-domain vibration signal. The model shows 85.46% accuracy in machining state detection and 97.11% accuracy in chatter detection. It is used merely as a proof of concept and an indicator that the machining state detection problem can be solved with data-driven methods. Next, a transfer learning approach is taken towards machining state detection: the method modifies the CNN architecture introduced by Krizhevsky et al. [30] and partially retrains it on the frequency-domain machining signal. A pipeline is designed to prepare the data and direct the data flow. The pipeline encapsulates filtering, conversion to the frequency domain using the short-time Fourier transform (STFT), training, and utilization of the trained model for machining state detection. The trained model has 88.97% accuracy in machining state detection and 97.97% accuracy in chatter detection. As the final deep learning model, an automated machine learning architecture design scheme is created to construct a CNN architecture specialized in machining state detection. The scheme uses Bayesian optimization to tune the network hyperparameters with the goal of lowering the complexity and increasing the accuracy of the model in machining state detection.
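The preprocessing stage of the pipeline described above (filtering followed by an STFT) can be sketched in a few lines. This is a minimal illustration using SciPy, not the thesis implementation; the sampling rate, filter order, cutoff, and window length below are hypothetical placeholders rather than the values used in this work.

```python
import numpy as np
from scipy.signal import butter, sosfilt, stft

def preprocess(vibration, fs=10_000, cutoff=4_500, nperseg=256):
    """Band-limit the raw vibration signal, then convert it to the
    frequency domain with a short-time Fourier transform (STFT).
    All parameter values are illustrative placeholders."""
    # Low-pass filter to suppress content above the band of interest.
    sos = butter(4, cutoff, btype="low", fs=fs, output="sos")
    filtered = sosfilt(sos, vibration)
    # Each column of |Z| is one frequency-domain frame that a
    # CNN classifier could consume.
    f, t, Z = stft(filtered, fs=fs, nperseg=nperseg)
    return f, t, np.abs(Z)

# Example: a synthetic 1 s vibration signal with a 1 kHz component.
time = np.arange(10_000) / 10_000
signal = np.sin(2 * np.pi * 1_000 * time)
freqs, frames, mag = preprocess(signal)
# Dominant bin of a mid-signal frame should sit near 1 kHz.
peak_bin = mag[:, mag.shape[1] // 2].argmax()
print(f"dominant frequency = {freqs[peak_bin]:.0f} Hz")
```

In a real deployment the spectrogram frames would be streamed to the classifier as the cut progresses, which is what makes the detection online.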
The custom-designed architecture takes the frequency-domain machining vibration signal as well as the tooth passing frequency as its input and outputs the probability of each state of cut. It uses the same data-flow pipeline as the transfer learning method, except that it does not require resizing the input. The resulting architecture provides higher accuracy than the transfer learning architecture at a lower complexity: the trained model showed 94.28% accuracy in machining state detection and 98.88% accuracy in chatter detection when evaluated on the test dataset.

The physics-driven online chatter detection method is based upon the online energy-based chatter detection introduced by Caliskan [11]. The method is improved by a new approach to calculating the chatter energy: the energy-based chatter detection is tuned, and its chatter energy estimation module based on the Teager-Kaiser nonlinear energy operator is replaced. The new chatter energy estimation method uses statistical measures of the FFT of the non-periodic parts of the vibration signal, alongside the tooth passing frequency, to provide a robust estimate of the chatter energy. The designed system shows 93.51% accuracy in chatter detection, an improvement over Caliskan's algorithm, while minimizing the false detections caused by air cuts.

The outputs of the custom CNN architecture and the enhanced energy-based model are combined through a decision-making algorithm to form a hybrid machining state detection and chatter suppression system. The hybrid system has 94.26% accuracy in machining state detection and 98.90% accuracy in chatter detection. The system evolves by retraining the machine learning model on its low-certainty detections that the hybrid system's judgment assigns a high accumulated probability.
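For context, the Teager-Kaiser nonlinear energy operator that the proposed estimator replaces is, following Kaiser [13], psi[x](n) = x(n)^2 - x(n-1)x(n+1). The sketch below is a generic illustration of that operator, not the thesis implementation or its tuned replacement.

```python
import numpy as np

def teager_kaiser(x):
    """Discrete Teager-Kaiser energy operator (Kaiser, 1990):
    psi[x](n) = x(n)^2 - x(n-1) * x(n+1).
    For a pure tone A*sin(w*n) it returns the constant A^2 * sin(w)^2,
    i.e. an amplitude- and frequency-weighted instantaneous energy."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# A unit-amplitude tone at normalized frequency 0.2*pi rad/sample:
# the operator output is constant at sin(0.2*pi)^2.
n = np.arange(1024)
tone = np.sin(0.2 * np.pi * n)
energy = teager_kaiser(tone)
print(energy.mean(), np.sin(0.2 * np.pi) ** 2)
```

The identity sin(a-b)sin(a+b) = sin^2(a) - sin^2(b) makes the tone example exact, which is why the operator tracks both amplitude and frequency of a vibration component with only three samples of memory.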
6.1 Future Work

The research can be extended by sharing a central machine learning model among multiple machine tools. Given the small size of the trained machine learning model and its self-evolution capability, it could be shared among multiple machine tools in an Industry 4.0 setup. For each retraining instance, the corresponding signal can be uploaded to the server and queued to retrain the model. The other machines in the setup can then download the updated model from the server, gaining its higher accuracy and avoiding repetition of the same issue. The interconnected machine setup could be either local or connected through the internet.

Additionally, a machine learning model could be trained to perform online chatter prediction. It could predict chatter by being trained on the transient entrance signals that result in a chatter cut, as well as the transient signals recorded when stable cuts start to become unstable, for instance in ramping operations, where the depth of cut increases to the point of chattering.

In terms of the chatter detection algorithm, the model could be pre-trained on simulated data before training it on the experimental data. The primary constraint is to provide simulated data for the transient entrance and exit states as well as the chatter signal. Furthermore, time-series machine learning methods, such as Long Short-Term Memory (LSTM) networks, could be considered as replacements for the CNNs recommended in this thesis.

Moreover, after chatter detection, the same stability lobes from the beginning of the machining process are currently used for determining the new stable spindle speed. It is suggested to update the stability lobes when chatter is encountered, before choosing the new spindle speed based on them.

Bibliography

[1] Y. Altintas, G. Stepan, E. Budak, T. Schmitz, and Z. M. Kilic, “Chatter Stability of Machining Operations,” J. Manuf. Sci. Eng., vol. 142, no. 11, pp. 1–46, Nov. 2020.
[2] E. Budak and Y. Altintas, “Analytical Prediction of Chatter Stability in Milling—Part I: General Formulation,” J. Dyn. Syst. Meas. Control, vol. 120, no. 1, pp. 22–30, Mar. 1998. [3] Y. Altintas, “Manufacturing Automation: Metal Cutting Mechanics, Machine Tool Vibrations, and CNC Design,” Appl. Mech. Rev., vol. 54, no. 5, pp. B84–B84, Sep. 2001. [4] T. Insperger, B. P. Mann, T. Surmann, and G. Stépán, “On the chatter frequencies of milling processes with runout,” Int. J. Mach. Tools Manuf., vol. 48, no. 10, pp. 1081–1089, 2008. [5] T. Insperger, G. Stépán, P. V. Bayly, and B. P. Mann, “Multiple chatter frequencies in milling processes,” J. Sound Vib., vol. 262, no. 2, pp. 333–345, 2003. [6] Y. Altintas, G. Stepan, D. Merdol, and Z. Dombovari, “Chatter stability of milling in frequency and discrete time domain,” CIRP J. Manuf. Sci. Technol., vol. 1, no. 1, pp. 35–44, 2008. [7] J. Tlusty, “The Theory of Chatter and Stability Analysis,” in Machine Tool Structures, F. Koenigsberger and J. Tlusty, Eds. Pergamon, 1970, pp. 133–177. [8] T. Delio, J. Tlusty, and S. Smith, “Use of audio signals for chatter detection and control,” J. Manuf. Sci. Eng., vol. 114, no. 2, pp. 146–157, 1992. [9] T. Choi and Y. C. Shin, “On-line chatter detection using wavelet-based parameter estimation,” J. Manuf. Sci. Eng., vol. 125, no. 1, pp. 21–28, 2003. [10] M. Lamraoui, M. Thomas, and M. El Badaoui, “Cyclostationarity approach for monitoring chatter and tool wear in high speed milling,” Mech. Syst. Signal Process., vol. 44, no. 1, pp. 177–198, 2014. [11] H. Caliskan, Z. M. Kilic, and Y. Altintas, “On-Line Energy-Based Milling Chatter Detection,” J. Manuf. Sci. Eng., vol. 140, no. 11, pp. 1–12, 2018. [12] R. G. Brown and P. Y. C. Hwang, Introduction to Random Signals and Applied Kalman Filtering, vol. 3. New York: Wiley, 1992. [13] J. F.
Kaiser, “On a simple algorithm to calculate the ‘energy’ of a signal,” in International Conference on Acoustics, Speech, and Signal Processing, 1990, vol. 1, pp. 381–384. [14] P. Maragos, J. F. Kaiser, and T. F. Quatieri, “Energy Separation in Signal Modulations with Application to Speech Analysis,” IEEE Trans. Signal Process., vol. 41, no. 10, pp. 3024–3051, 1993. [15] W. Lin, C. Hamilton, and P. Chitrapu, “Generalization to the Teager-Kaiser energy function & application to resolving two closely-spaced tones,” in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, 1995, vol. 3, pp. 1637–1640. [16] D. L. Poole and A. K. Mackworth, Artificial Intelligence: Foundations of Computational Agents. Cambridge University Press, 2010. [17] K. P. Murphy, Machine Learning: A Probabilistic Perspective. MIT Press, 2012. [18] M. Postel, B. Bugdayci, and K. Wegener, “Ensemble transfer learning for refining stability predictions in milling using experimental stability states,” Int. J. Adv. Manuf. Technol., vol. 107, no. 9–10, pp. 4123–4139, 2020. [19] M. Lamraoui, M. Barakat, M. Thomas, and M. El Badaoui, “Chatter detection in milling machines by neural network classification and feature selection,” J. Vib. Control, vol. 21, no. 7, pp. 1251–1266, 2015. [20] X. He, K. Zhao, and X. Chu, “AutoML: A survey of the state-of-the-art,” arXiv preprint, 2019. [21] C. White, R. Ai, and W. Neiswanger, “Neural Architecture Search via Bayesian Optimization with a Neural Network Model,” NeurIPS Workshop, pp. 1–11, 2019. [22] C. Rosenberg, M. Hebert, and H. Schneiderman, “Semi-supervised self-training of object detection models,” 2005. [23] P. P. Mitra and D. P. Schmidt, “On the Effectiveness of Bayesian AutoML methods for Physics Emulators,” pp. 1–11, 2020. [24] I. Rebai, Y. BenAyed, W. Mahdi, and J.-P. Lorré, “Improving speech recognition using data augmentation and acoustic model fusion,” Procedia Comput. Sci., vol. 112, pp. 316–322, 2017. [25] M. K.
Nezami, “Performance assessment of baseband algorithms for direct conversion tactical software defined receivers: I/Q imbalance correction, image rejection, DC removal, and channelization,” in MILCOM 2002 Proceedings, 2002, vol. 1, pp. 369–376. [26] A. Celisse, “Optimal cross-validation in density estimation with the L2-loss,” Ann. Stat., vol. 42, no. 5, pp. 1879–1910, 2014. [27] M. Stojćev, “Digital Signal Processing: A Computer Based Approach, Sanjit K. Mitra, McGraw-Hill, New York, 1998,” Microelectron. J., vol. 30, no. 8, p. 807, 1999. [28] A. V. Oppenheim, Discrete-Time Signal Processing. Pearson Education India, 1999. [29] J. B. Allen, “Short Term Spectral Analysis, Synthesis, and Modification by Discrete Fourier Transform,” IEEE Trans. Acoust., vol. 25, no. 3, pp. 235–238, 1977. [30] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Advances in Neural Information Processing Systems 25, Curran Associates, Inc., 2012, pp. 1097–1105. [31] N. Fujimoto, T. Yumibayashi, M. Onoue, M. Sugamoto, and A. Sugamoto, “Surround Inhibition Mechanism by Deep Learning,” Proc. Annu. Conf. JSAI, vol. 2019, pp. 2H3J201, 2019. [32] T. Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal Loss for Dense Object Detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 2, pp. 318–327, 2020. [33] M. S. Abdelfattah, Ł. Dudziak, T. Chau, R. Lee, H. Kim, and N. D. Lane, “Best of Both Worlds: AutoML Codesign of a CNN and its Hardware Accelerator,” 2020. [34] M. A. Gelbart, J. Snoek, and R. P. Adams, “Bayesian optimization with unknown constraints,” in Proc. 30th Conf. Uncertainty in Artificial Intelligence (UAI), 2014, pp. 250–259. [35] K. J. Swersky, “Improving Bayesian Optimization for Machine Learning using Expert Priors.” [36] E. J. Hannan and B. G. Quinn, “The Determination of the Order of an Autoregression,” J. R. Stat. Soc. Ser. B, vol. 41, no. 2, pp. 190–195, 1979. [37] G. Claeskens and N. L. Hjort, “A comparison of some selection methods,” in Model Selection and Model Averaging, Cambridge University Press, 2008, pp. 99–116. [38] E. Brochu, V. M. Cora, and N. de Freitas, “A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning,” 2010. [39] H. M. Gutmann, “A Radial Basis Function Method for Global Optimization,” J. Glob. Optim., vol. 19, no. 3, pp. 201–227, 2001. [40] J. Snoek, H. Larochelle, and R. P. Adams, “Practical Bayesian Optimization of Machine Learning Algorithms,” in Advances in Neural Information Processing Systems 25, Curran Associates, Inc., 2012, pp. 2951–2959. [41] Z. Yao, D. Mei, and Z. Chen, “On-line chatter detection and identification based on wavelet and support vector machine,” J. Mater. Process. Technol., vol. 210, no. 5, pp. 713–719, 2010. [42] N. Fazakis, S. Karlos, S. Kotsiantis, and K. Sgarbas, “Self-Trained LMT for Semisupervised Learning,” Comput. Intell. Neurosci., vol. 2016, p. 3057481, 2016. [43] L. Jost, “Entropy and diversity,” Oikos, vol. 113, no. 2, pp. 363–375, 2006. [44] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014. [45] H. Caliskan, Z. M. Kilic, and Y. Altintas, “On-line energy-based milling chatter detection,” J. Manuf. Sci. Eng., vol. 140, no. 11, pp. 1–12, 2018.
