Using Wi-Fi Channel State Information (CSI) for Human Activity Recognition and Fall Detection

by

Tahmid Z. Chowdhury

BEng (Hons), Electrical & Electronic Engineering, University of East London, 2014

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF APPLIED SCIENCE

in

The Faculty of Graduate and Postdoctoral Studies
(Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

April 2018

© Tahmid Z. Chowdhury, 2018

Abstract

Human Activity Recognition (HAR) serves a diverse range of human-centric applications in health care, smart homes, and security. Recently, Wi-Fi-based solutions have attracted a lot of attention. The underlying principle of these solutions is the effect that human bodies have on nearby wireless signals. Static objects such as ceilings and furniture cause reflections, while dynamic objects such as humans result in additional propagation paths. These effects can be empirically observed by monitoring the Channel State Information (CSI) between two Wi-Fi devices. As different human postures induce different signal propagation paths, they result in unique CSI signatures, which can be mapped to corresponding human activities.

However, there are some limitations in current state-of-the-art solutions. First, the performance of CSI-based HAR systems degrades in complex environments. To overcome this limitation, we propose Wi-HACS: Leveraging Wi-Fi for Human Activity Classification using Orthogonal Frequency Division Multiplexing (OFDM) Subcarriers. In our work, we propose a novel signal segmentation method to accurately determine the start and end of a human activity. We use several signal pre-processing and noise attenuation techniques, not commonly used in CSI-based HAR, to improve the features obtained from the amplitude and phase signals. We also propose novel features based on subcarrier correlations and autospectra of principal components.
Our results indicate that Wi-HACS can outperform the state-of-the-art method in both precision and recall by 8% in simple environments, and by 14.8% in complex environments.

The second limitation of existing CSI-HAR solutions is their poor performance in new/untrained environments. Since accurate Wi-Fi-based fall detectors can greatly benefit the well-being of the elderly, we propose DeepFalls: Using Wi-Fi Spectrograms and Deep Convolutional Neural Nets for Fall Detection. We utilize Hilbert-Huang Transform spectrograms and train a Convolutional Neural Network to learn the features automatically. Our results show that DeepFalls can outperform the state-of-the-art RT-Fall in untrained environments, with improvements in sensitivity and specificity of 11% and 15%, respectively.

Lay Summary

Human Activity Recognition (HAR) serves a diverse range of human-centric applications in health care, smart homes, and security. Recently, Wi-Fi-based solutions have attracted a lot of attention. When human beings are within Wi-Fi range, the signals propagate differently. These effects can be empirically observed through measurements on the Wi-Fi channel. The channel variations can be used to classify different human activities.

However, there are some limitations in existing Wi-Fi systems. First, their performance degrades in complex environments. To overcome this limitation, we propose Wi-HACS, which improves the state-of-the-art work's precision and recall by 8% in simple environments, and by 15% in complex environments.

The second limitation of existing systems is that they do not perform well in new environments. To improve performance, we propose DeepFalls, which makes use of a Convolutional Neural Network.
Our results show that DeepFalls can outperform the state-of-the-art method, RT-Fall, in untrained environments, with sensitivity and specificity improvements of 11% and 15%, respectively.

Preface

This thesis is based on the research work performed under the supervision of Professor Cyril Leung. Chapter 2 is based on the conference paper titled “Wi-HACS: Leveraging WiFi for Human Activity Classification using OFDM Subcarriers’ correlation”, which was published in the 2017 5th IEEE Global Conference on Signal and Information Processing. This paper is co-authored by myself as the first author, Prof. Cyril Leung, and Prof. Chunyan Miao, who is a Professor in the School of Computer Science and Engineering at Nanyang Technological University (NTU) in Singapore. I hereby confirm that I was the primary researcher of this work. I came up with the idea of the research independently. My contributions included conducting the literature review, collecting data, identifying the research problem and carrying out various signal processing and machine learning techniques under the supervision of Profs. Leung and Miao. Since there is no open dataset in this area, I used an open source firmware for an Intel 5300 Network Interface Card (NIC) to collect data. Prior to collecting data, ethics approval was obtained from the UBC Behavioural Research Ethics Board for the project titled “DeepWi-Fi: Leveraging Channel State Information (CSI) of Wi-Fi and Deep Convolution Nets to Classify Human Gestures”; the certificate number is H17-01839.

Table of Contents

Abstract
Lay Summary
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Notation
Acknowledgments
Dedication
1 Introduction
  1.1 Sensors and Computer Vision based HAR
  1.2 Motivation behind the use of Wi-Fi
    1.2.1 Wi-Fi Channel and Human Activity Paradigm
  1.3 Related work on CSI based HAR
  1.4 Technical Challenges in CSI-based HAR
  1.5 Research Motivation and Contributions
  1.6 Thesis organization
2 Wi-HACS: A Wi-Fi based Human Activity Classification using OFDM Subcarriers
  2.1 Channel State Information (CSI)
    2.1.1 Correlation between human activities and amplitudes of OFDM Subcarriers
    2.1.2 Correlation between human activities and phases of OFDM Subcarriers
    2.1.3 OFDM Phase Calibration
  2.2 CSI Time-Series Pre-Conditioning
    2.2.1 1-D Linear Interpolation
    2.2.2 Hampel Identifier Outlier Removal
    2.2.3 De-trending Subcarriers to avoid spectral distortion
    2.2.4 Effect of zero-padding in FFT computations
    2.2.5 Tapering CSI waveforms to prevent edge artifacts
  2.3 Discrete Wavelet Transform (DWT) based Noise Attenuation for CSI signals
    2.3.1 Limitations in time and frequency based noise attenuation techniques in CSI systems
    2.3.2 The Discrete Wavelet Transform (DWT) as a dyadic filter bank
    2.3.3 Thresholding methods and value selection
  2.4 Principal Component Analysis (PCA) Dimension Reduction
  2.5 Adaptive Windowing for CSI-signal Segmentation
  2.6 Summary
3 Wi-HACS: Classification and Performance Analysis
  3.1 Feature Extraction
    3.1.1 Features adopted from baseline
    3.1.2 Proposed features based on subcarrier correlations
    3.1.3 Proposed features based on autospectrum
  3.2 Multi-Class Support Vector Machine (SVM) Classification
    3.2.1 Cross-validation and Grid-search
  3.3 Dataset
    3.3.1 Hardware and Base Signals
    3.3.2 Data Collection Procedure
  3.4 Results and Discussion
    3.4.1 Baseline Method and Performance Metrics
    3.4.2 Selection of optimum number of Principal Components
    3.4.3 Effect of DWT-based noise attenuation and Hampel Filtering on classification results
    3.4.4 Effect of proposed features on classification results
    3.4.5 Performance Evaluation of Wi-HACS with Baseline
  3.5 Summary
4 DeepFalls: Using Wi-Fi Spectrograms and Deep Convolution Nets for Fall Detection
  4.1 DeepFalls Framework
  4.2 Singular Spectral Analysis based noise attenuation
    4.2.1 Embedding
    4.2.2 Singular Value Decomposition (SVD)
    4.2.3 Rank Reduction
    4.2.4 Diagonal Averaging
  4.3 Time Frequency Localization
    4.3.1 Short Time Fourier Transform (STFT)
    4.3.2 Continuous Wavelet Transform (CWT)
  4.4 Hilbert-Huang Transform
    4.4.1 Empirical Mode Decomposition (EMD)
    4.4.2 Ensemble Empirical Mode Decomposition (EEMD)
    4.4.3 Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN)
  4.5 Modified Signal Segmentation
  4.6 Summary
5 DeepFalls Performance Analysis
  5.1 Deep Convolutional Neural Network
  5.2 Dataset
    5.2.1 Hardware and Base Signals
    5.2.2 Data Collection Procedure
  5.3 Results and Discussion
    5.3.1 Baseline Method and Performance Metrics
    5.3.2 Performance Evaluation of DeepFalls with Baseline
  5.4 Summary
6 Conclusion and Future Work
  6.1 Conclusion
  6.2 Future Work
Bibliography
Appendices
A Confusion matrices and results for Wi-HACS

List of Tables

2.1 Number of subcarriers and carrier grouping (IEEE 802.11n Standards) [21]. The subcarrier indices are the carriers for which channel matrices are sent.
3.1 Extracted Features for Wi-HACS
3.2 List of human activities and total number of samples across all environments.
5.1 Number of human falls and fall-like activities in three environments in our dataset.

List of Figures

1.1 Wi-Fi Multipath Propagation: reflections (black) and scatterings (red)
2.1 Amplitude variation across (a) 30 subcarriers for same T-R link, (b) 5th subcarrier across four T-R links
2.2 Correlation Matrix for (a) 30 subcarriers for same T-R link, (b) Group of 5 subcarriers across four T-R links. The colorbar represents the correlation values.
2.3 CSI signatures for two different human activities (a) Amplitude variations, (b) Raw CSI phase variations, (c) Calibrated CSI phase response. The variations for 30 subcarriers are shown on the left and those of 4 subcarriers are shown on the right.
2.4 (a) Unwrapped CSI phase response, (b) Sanitized CSI phase after calibration. Some parts of (a) are zoomed to show the break in linearity in the unwrapped phase of the 15th and 30th subcarrier.
2.5 Effect of Hampel Outlier Removal on three subcarriers: (a) Raw CSI amplitude waveforms with outliers denoted by ‘black circles’, (b) Hampel filtered CSI amplitude waveforms.
2.6 Effects before and after de-trending CSI waveforms: (a) Hampel Filtered subcarrier for walking activity, (b) De-trended waveform of (a); One-sided amplitude spectrum of FFT for (c) before de-trending, (d) after de-trending
2.7 FFT profile of the walking activity (a) before, (b) after zero padding
2.8 (a) Time-domain and (b) FFT profile of the cosine taper with various tapering ratios
2.9 The FFT profiles of the amplitude signal during the walking activity multiplied by cosine tapers of various tapering ratios (a) Time domain and (b) Frequency domain for a=5%, (c) Time domain and (d) Frequency domain for a=25%, (e) Time domain and (f) Frequency domain for a=50%.
2.10 (a) The subband coding algorithm in DWT [31]
2.11 The DWT decomposition structures for CSI amplitude during ‘walking’ event (i) Pre-conditioned amplitude, (ii)-(iv) representing levels 1-3 respectively and (a) and (b) denoting approximations and details for each level.
2.12 Various de-noising algorithms applied to pre-conditioned CSI-amplitude time-series for a series of activities: (a) Original pre-conditioned signal, (b) 5-point median filtering, (c) Butterworth Low-pass filter with cutoff frequency at 30 Hz, and (d) 3-level ‘db-8’ DWT based de-noising.
2.13 The variance of eigenvalues (modes) after the eigendecomposition of the covariance matrices for each transmit-receive link.
2.14 Adaptive Windowing based on the amplitude of FFT coefficients
3.1 PCC correlation matrices of amplitude of subcarriers of the same T-R link after signal pre-conditioning. Area of interest corresponds to the correlation values taken as features: Out-of-place activities: (a) Walking, (b) Jogging, and in-place activities: (c) Sitting and (d) Standing. The colour bar represents the correlation values.
3.2 Experimental Setting for Data Collection: (a) One room (with LOS), (b) Two room (NLOS), and (c) Three rooms (NLOS).
3.3 Performance of Wi-HACS using different number of Principal Components per TR link; the total % variance represented by different number of PCs are: 1 (48%), 2 (76%), 3 (86%), 4 (91%), and 5 (94%).
3.4 Performance of Wi-HACS before and after DWT de-noising and Hampel Identifier for the simplest (setting 1) and most complex (setting 3) environments given in Fig. 3.2. The baseline performances are also included for reference.
3.5 Performance of Wi-HACS with and without the novel features based on subcarrier correlations and autospectrum, for the simplest (setting 1) and most complex (setting 3) environments in Fig. 3.2. The baseline performance is also included for reference.
3.6 Confusion Matrices for cross-validation results for setting 1: (a) Wi-HACS, (b) Baseline.
3.7 Cross-validation performance metrics Comparison between Wi-HACS and baseline: setting 1.
3.8 Average Performance Metrics of Wi-HACS and baseline under three environmental settings: the results are shown for both cross-validations and tests.
3.9 Quantile-Quantile Plot: The x-axis represents quantiles from a normal distribution and y-axis represents the quantiles drawn from the differences in performance ((i) average accuracy, (ii) average precision and (iii) average recall) by Wi-HACS and baseline, in (a) setting 1, (b) setting 2, and (c) setting 3.
4.1 The Singular Spectrum of the CSI amplitude base signals for different human activities in a meeting room.
4.2 The application of SSA based de-noising on the CSI amplitude for a fall signal (left) and fall-like signal (right). The fall down and sit (fall-like) activity refers to a fall (or sit) followed by a lying down or remain sitting for the remaining duration of the signal.
4.3 The Short Time Fourier Transform for (a) an actual fall, and (b) a fall-like activity. Both the events take place at approximately 7 s windowed by 1 s before and 2 s after the event takes place.
4.4 The Continuous Wavelet Transform for (a) an actual fall, and (b) a fall-like activity. Both the events take place at approximately 7 s windowed by 1 s before and 2 s after the event takes place.
4.5 The IMFs (black) generated by the EMD process for a CSI amplitude signal (red) for a series of activities: walking for 8 seconds, then a fall activity, then lying down until 15 s.
4.6 The IMFs (black) generated by the CEEMDAN process for a CSI amplitude signal (red) for a series of activities: walking for 8 seconds, then a fall activity, then lying down until 15 s.
4.7 The HHT based on the CEEMDAN of the CSI amplitudes for the (a) fall, and (b) fall-like activities. Both the events take place at approximately 7 s windowed by 1 s before and 2 s after the event takes place. The colorbar represents the magnitude of the frequencies. The bottom figures represent the actual part of the image used for training and classification.
4.8 Adaptive Windowing based on the amplitude of FFT coefficients
5.1 The proposed Deep Convolutional Neural Network architecture.
5.2 Experimental Setting for Data Collection: (a) Large meeting room, (b) Studio Apartment, and (c) Same Apartment (furniture position changed).
5.3 Performances of DeepFalls and RT-Fall after training and testing on each environment separately shown in Fig. 5.2. The performance metrics in (Env2+3) represents the classification results using data combined from both apartment environments.
5.4 Performances of DeepFalls and RT-Fall, (a) after training in meeting room and testing in the apartment environments, (b) after training in apartment (Env 2) and testing in the changed apartment (Env 3).
A.1 Confusion Matrices for cross-validation results for setting 2: (a) Wi-HACS, (b) Baseline.
A.2 Performance metrics for each activity using the confusion matrices above for setting 2: (a) Wi-HACS, (b) Baseline.
A.3 Confusion Matrices for cross-validation results for setting 3: (a) Wi-HACS, (b) Baseline.
A.4 Performance metrics for each activity using the confusion matrices above for setting 3: (a) Wi-HACS, (b) Baseline.
A.5 Confusion Matrices for test results in setting 1: (a) Wi-HACS, (b) Baseline.
A.6 Performance metrics for each activity using the confusion matrices above in setting 1: (a) Wi-HACS, (b) Baseline.
A.7 Confusion Matrices for test results in setting 2: (a) Wi-HACS, (b) Baseline.
A.8 Performance metrics for each activity using the confusion matrices above in setting 2: (a) Wi-HACS, (b) Baseline.
A.9 Confusion Matrices for test results in setting 3: (a) Wi-HACS, (b) Baseline.
A.10 Performance metrics for each activity using the confusion matrices above in setting 3: (a) Wi-HACS, (b) Baseline.

List of Abbreviations

AWS  Amazon Web Services
BW  Bandwidth
CFO  Carrier Frequency Offset
COTS  Commercially-Off-The-Shelf
CSI  Channel State Information
CWT  Continuous Wavelet Transform
DWT  Discrete Wavelet Transform
EMD  Empirical Mode Decomposition
FFT  Fast Fourier Transform
HAR  Human Activity Recognition
HHT  Hilbert-Huang Transform
IMF  Intrinsic Mode Function
LOS  Line-of-Sight
NASA  National Aeronautics and Space Administration
NIC  Network Interface Card
NLoS  Non-Line-of-Sight
OFDM  Orthogonal Frequency Division Multiplexing
PCA  Principal Component Analysis
PCC  Pearson Correlation Coefficient
PSD  Power Spectral Density
RBF  Radial Basis Function
SFO  Sampling Frequency Offset
SSA  Singular Spectral Analysis
STFT  Short Time Fourier Transform
SVD  Singular Value Decomposition
SVM  Support Vector Machine
TR  Transmit-Receive

Notation

A  Matrix
a  Vector
1  All-one column vector
I  Identity matrix
|·|  Absolute value of a real number or the cardinality of a set
(·)^T  Transpose
(·)^H  Hermitian transpose
ℂ  The set of complex numbers
var(·)  Variance operator
diag(x)  A diagonal matrix with the elements of vector x on the main diagonal

Acknowledgments

I would like to express my deepest and sincerest gratitude to my supervisor, Professor Cyril Leung, for his patience, encouragement, and advice throughout my Master's program at The University of British Columbia. Without his immense knowledge and constant feedback, this thesis would never have been possible. He was always supportive of new research ideas, raised questions that helped me gain new insights, and mentored me in developing invaluable research skills.

I am also thankful to my co-supervisor, Professor Chun Yan Miao, who is with the School of Computer Science and Engineering at Nanyang Technological University (NTU), Singapore. She advised me on applying machine learning techniques in my work. I am also thankful to Professors Jane Wang and Han Yu for their research insights.

My heartfelt thanks go to my parents and my sister. They stood by me during difficult times and never gave up on me. They believed in me more than I believed in myself. Their support, love, and encouragement are the reasons behind my achievements so far.

Last but not least, I would like to express my gratitude to my colleagues in our research lab for their invaluable support, friendship, and feedback. I am also very thankful to my friend, Ertion Axha, in Germany. We are not only best friends but have continued to support each other and push each other to the limit to succeed.

My work was partially supported by the UBC Faculty of Applied Science, the UBC PMC-Sierra Professorship in Communications and Networking, and the National Research Foundation, Prime Minister's Office, Singapore under its IDM Futures Funding Initiative.

Dedication

To my parents and my sister, for their confidence and belief in me.

Chapter 1

Introduction

1.1 Sensors and Computer Vision based HAR

Human Activity Recognition (HAR) is an application that senses the local environment of a human being with the objective of serving a diverse range of human-centric applications in health care, smart homes and the military [1]. HAR devices have become more popular with the increasing demand for smart applications. This application requires devices to be accurate, comfortable and easily accessible.
With an increase in sensor computation power and relatively cheap hardware, a variety of sensor-based human activity recognition systems have been proposed. A comprehensive survey of existing wearable-based devices used for HAR can be found in [2].

Another promising field used to classify human activities is computer vision. Improvements in image and video processing have enabled real-time human action segmentation and tagging from continuously streamed videos, dramatically improving its use in surveillance applications. The framework of video-based HAR systems typically includes image processing techniques such as de-noising and various background subtractions, followed by feature extraction and a machine learning classification module. A comprehensive survey of the existing state-of-the-art computer-vision HAR systems is given in [3].

1.2 Motivation behind the use of Wi-Fi

Although a variety of HAR applications are based on wearable sensors and computer vision, they suffer from several drawbacks. Despite their small size and light weight, sensor-based systems require the user to wear the device or keep it in close proximity for detection. This may cause discomfort, and the user needs to remember to keep these devices close. For applications such as fall detection, forgetting to wear these devices can be fatal.

In the case of video-based systems, the coverage area for detection must be within line-of-sight (LOS). This may require multiple cameras to increase coverage. Despite improvements in image and video processing algorithms, performance can degrade under bad lighting conditions. From a user perspective, the presence of cameras can affect privacy.

To overcome these drawbacks, researchers in HAR have proposed using a technology already present in most homes: Wi-Fi. Wi-Fi based solutions are passive detection systems in which the users do not need to wear devices.
Wi-Fi signals can propagate through walls, furniture and doors, and do not require line-of-sight (LOS), thereby enabling larger detection areas.

1.2.1 Wi-Fi Channel and Human Activity Paradigm

The underlying principle of Wi-Fi based systems is the effect that human bodies create on nearby wireless signals. Wi-Fi signals can convey information that characterizes the environment they pass through [4]. This is further illustrated by the signal propagation paths in an indoor environment, shown in Fig. 1.1. Static objects such as ceilings and furniture cause reflections, while dynamic objects such as humans result in additional propagation paths caused by scattering (reflections and refractions) of signals. In Fig. 1.1 the ‘dashed’ red lines represent the change in scattering paths due to a change in human posture. These multipath propagation effects can be empirically observed by analyzing the Channel State Information (CSI) between two Wi-Fi devices. As different human postures induce different signal scattering paths, this results in unique CSI signatures which can be mapped to corresponding human activities.

Figure 1.1: Wi-Fi Multipath Propagation: reflections (black) and scatterings (red)

1.3 Related work on CSI based HAR

The aforementioned drawbacks have prompted researchers to turn to the use of CSI between two Wi-Fi devices. This physical layer quantity can be estimated for different OFDM subcarriers using commercially-off-the-shelf (COTS) Wi-Fi devices, by modifying Linux drivers [5] for an Intel 5300 Network Interface Card (NIC). The Linux driver, its installation guidelines and debugging issues can be found on the GitHub page: https://github.com/dhalperi/linux-80211n-csitool.
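The multipath picture of Section 1.2.1 can be made concrete with a small numerical sketch. The Python snippet below is illustrative only: it models the channel as a sum of propagation paths, H(f) = sum_k a_k exp(-j 2 pi f d_k / c), and shows that lengthening one human-scattered path changes the CSI amplitude on the subcarriers. The function names, the carrier frequency, the bandwidth, and all path lengths and gains are assumptions chosen for illustration; only the count of 30 reported subcarriers follows the Intel 5300 setup described in the text.

```python
import cmath
import math

SPEED_OF_LIGHT = 3.0e8  # m/s


def channel_response(path_lengths_m, path_gains, freqs_hz):
    """Return H(f) = sum_k a_k * exp(-j*2*pi*f*d_k/c), one complex value per subcarrier."""
    response = []
    for f in freqs_hz:
        h = sum(
            gain * cmath.exp(-2j * math.pi * f * length / SPEED_OF_LIGHT)
            for gain, length in zip(path_gains, path_lengths_m)
        )
        response.append(h)
    return response


# Assumed values (not from the thesis): 30 subcarriers spread over a 20 MHz
# channel centred at 2.437 GHz, and hand-picked path lengths and gains.
CENTER_HZ = 2.437e9
BANDWIDTH_HZ = 20e6
FREQS = [CENTER_HZ - BANDWIDTH_HZ / 2 + i * BANDWIDTH_HZ / 29 for i in range(30)]

STATIC_LENGTHS = [4.0, 5.2]  # direct path and a ceiling reflection, in metres
STATIC_GAINS = [1.0, 0.5]
HUMAN_GAIN = 0.3             # weaker path scattered by the human body


def csi_amplitudes(human_path_length_m):
    """CSI amplitude per subcarrier when the human-scattered path has the given length."""
    lengths = STATIC_LENGTHS + [human_path_length_m]
    gains = STATIC_GAINS + [HUMAN_GAIN]
    return [abs(h) for h in channel_response(lengths, gains, FREQS)]


posture_a = csi_amplitudes(6.0)  # hypothetical posture A
posture_b = csi_amplitudes(6.3)  # hypothetical posture B: scattered path 0.3 m longer
largest_change = max(abs(a - b) for a, b in zip(posture_a, posture_b))
print(f"largest per-subcarrier amplitude change: {largest_change:.3f}")
```

In this toy model the static paths are held fixed, so any difference between the two amplitude vectors comes solely from the human-scattered path. That is the intuition behind mapping CSI signatures to postures; real measurements additionally contain noise and hardware offsets.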
A few examples of HAR applications devised from systems leveraging the fine-grained CSI are given below:

Activities and Gestures: Applications can include localizing human beings for security reasons, smart home applications such as measuring the repetitions of various in-home exercises, and monitoring vital signs such as human respiration and heart rate while sleeping [6], [7]. WiKey [8] is one of the recent gesture-based systems, which can classify keystrokes from a continuously typed sentence with an accuracy of 93%. The motivation of such work is to enable “typing in the air” without the need for a keyboard. Potential future gesture applications include Wi-Fi based switching on and off of electronic devices and appliances, and automatically putting devices to sleep when no movement is detected to preserve energy.

Fall Detection of Elderly: Since on-time detection and reporting of falls is crucial, especially for the elderly, CSI-based fall detectors represent an important application. The fall detection signal can be fed to Human-Computer-Interaction devices to alert the nearest medical facility. CSI-based fall detectors do not need the user to wear any device and do not invade privacy. The earliest work utilizing CSI to detect human falls is Wi-Fall [9], which utilized the amplitude to distinguish falls from three other activities. The authors provided an improved classification accuracy by proposing the random forest classifier in their recent work [10]. However, both works suffered from a drawback: the algorithm did not consider the various fall-like activities that occur in a daily living situation. To overcome this, researchers in Anti-Fall [11] included various fall-like activities in their dataset and utilized the phase of CSI as a salient feature to improve classification.
These authors proposed an improved model [12] by exploring the use of the phase difference between two receive antennas to not only classify but also segment a fall event (including both falls and fall-like activities) from daily activities. They compared their work with Wi-Fall and reported higher sensitivity and specificity.

Other interesting applications: Another interesting application of CSI-based HAR is crowd counting. This can be used in various applications such as guided tours, crowd control, and marketing research and analysis. The authors in [13] proposed a device-free crowd counting (FCC) system that processed the CSI variance in the presence of different numbers of people to facilitate such an application, using only a router and a laptop. In [14] the authors proposed several signal processing techniques on the amplitude of CSI to detect moving people in a closed environment. The authors in [15] were the first to suggest that passive movement detection can be improved by using the phase component of CSI. They also proposed a novel feature utilizing the maximum eigenvalue of the amplitude/phase covariance matrix to improve the robustness of classification in different environments. Potential applications of such work include intrusion detection for safety reasons and monitoring patient movements in hospitals.

1.4 Technical Challenges in CSI-based HAR

There are several technical challenges when utilizing the CSI signals for HAR applications. The first challenge is the presence of noise in CSI values, which prevents their direct use in building a HAR system. To address this challenge, we apply several pre-conditioning and noise attenuation techniques to eliminate abrupt changes in values not instigated by human actions. The second technical challenge is activity segmentation in continuously streamed signals. This is because there is usually no clear transition between CSI amplitude or phase signatures for different activities.
Inaccurate segmentation may result not only in false classifications but also in missed changes in human activity. To address this challenge, we provide a novel signal segmentation technique based on the amplitude of Fast Fourier Transform (FFT) coefficients that adapts based on some pre-defined conditions. The third challenge is the selection of features to classify the different activities. The difficulty is due to the close resemblance between CSI signatures for different human activities. To overcome this problem, we apply several signal processing techniques so that the calculated features take distinct values for different human activities.

1.5 Research Motivation and Contributions

Motivation

Since sensor-based HAR solutions require the user to carry devices and computer vision involves a loss of privacy, our motivation is to research Wi-Fi-based solutions. The advantage of CSI is that it can be estimated on COTS devices present in most homes today.

Contributions

Although leveraging the CSI has the potential to lead HAR applications, there exist limitations in all state-of-the-art solutions. We identified three major limitations in the current literature, and mitigating them is the focus of our research work:

(i) CSI performance degradation across complex environments

Most CSI-based solutions report high classification accuracy in relatively simple environments, such as a meeting or study room that contains at most a table and a few chairs or sofas. In environments such as homes, which include multiple walls, existing works usually employ multiple Wi-Fi devices, and the classification output is based on a majority voting rule among the devices [12]. Although this technique improves classification accuracy, the use of multiple devices may be impractical or costly from a user perspective.

To overcome this problem, we propose Wi-HACS: Leveraging Wi-Fi for Human Activity Classification using OFDM Subcarriers.
We leverage correlation patterns across a range of OFDM (Orthogonal Frequency Division Multiplexing) subcarriers as novel features to improve classification across simple and complex environments. We propose a Discrete Wavelet Transform (DWT) based noise-attenuation technique for the amplitude and phase signals and demonstrate its superior performance over the noise attenuation methods commonly applied in CSI-based solutions. Furthermore, we propose a modification to the autospectrum, a feature commonly utilized in Environmental Science pattern recognition applications [16]. To facilitate unique feature estimations, the measured signals are pre-conditioned to make the FFT profiles of different human activities as different as possible. As there is a lack of comprehensive solutions for CSI-signal segmentation, we propose a novel method based on the FFT profiles and set criteria for adaptive windowing. To validate Wi-HACS, we reproduce the algorithm of an existing CSI-based HAR system [17] on our dataset and compare the improvements in classification accuracies, especially in complex environments. A total of 7 activity classes are considered while evaluating our work against the benchmark. Since on-time fall detection is considered vital for the elderly population, we have taken human falls as one of the classes in our dataset. Falls are a leading cause of accidental death for people aged 65 and above.
However, in the dataset used to evaluate Wi-HACS, we only include one type of fall activity and one type of fall-like activity.

Since there are numerous ways to fall down and many daily human activities result in signal patterns which resemble fall signals (fall-like activities), we collect data about these and focus on the problem of distinguishing between falls and fall-like activities when addressing the second limitation identified in our research.

(ii) CSI performance dependency on trained environments

Another limitation commonly associated with CSI-based solutions is performance degradation in untrained/new environments. This is because the CSI depends heavily on the environment of the signal propagation. Since accurate fall detection is vital to the well-being of elderly people, we propose DeepFalls. Our objective is to improve fall detection classifications in untrained environments. Here, we adopt Deep Convolutional Neural Networks (DCNNs) because of their ability to estimate features automatically. We consider various types of falls and fall-like activities as reported in the literature and create a dataset consisting of falls and fall-like activities only. However, since the DCNN is originally designed to classify images, we transform our CSI amplitude signals into a spectrogram representation. In the pre-conditioning stages, we adopt some techniques from Wi-HACS but utilize the Singular Spectral Analysis (SSA) instead of DWT to denoise signals.

In our initial experiments, we found some limitations of classic spectrogram techniques, such as the Short-Time Fourier Transform (STFT) and the Continuous Wavelet Transform (CWT), and therefore suggest the Hilbert-Huang Transform (HHT) [18]. Since this process is based on the Hilbert transform of signals produced by the Empirical Mode Decomposition (EMD) [19], we evaluate several variants of EMD.
In our analysis, the best EMD variant is chosen based on the least “mode-mixing” effects on the Intrinsic Mode Functions (IMFs). Finally, we reproduce the algorithm of a recent state-of-the-art CSI-based fall detector, RT-Fall [12], on our dataset. Our results demonstrate that DeepFalls can detect falls more accurately than RT-Fall [12] in untrained environments.

(iii) Unavailability of CSI time series dataset

Since CSI-based HAR is a relatively new research area, there is a lack of open-source datasets. As a result, it is difficult to compare relative performances among existing solutions. Moreover, modifying the Linux drivers for the Intel 5300 NIC and collecting data are time-consuming. Hence, a further contribution of this thesis is to make our dataset available to the public. Since the CSI signals are non-stationary, this dataset can be useful to researchers working with non-stationary signal processing and feature engineering. In addition, we intend to include the spectrogram images as part of our dataset to add to existing image databases for computer vision and DCNN practitioners. However, the dataset will only be made available after our publications, and our expected time-frame to upload the data is by the end of this year. The dataset will be hosted at:
https://github.com/tahmidzbr/Human-Activities-Gestures-Recognition-using-Channel-State-Information-CSI-of-IEEE-802.11n

1.6 Thesis organization

The remainder of this thesis is structured as follows:

• In Chapter 2, the foundations for Wi-HACS will be covered. The physical interpretation of CSI at the granularity of the OFDM subcarriers will be discussed. The amplitudes and phases of all subcarriers will be studied per packet as well as their variations under continuous streaming of packets. The correlations between human activities and the amplitude and phase variations are explained.
This is followed by CSI time-series pre-conditioning and the DWT de-noising techniques to account for abrupt variations not caused by human activities. Since several features will be derived from the FFT profiles, pre-processing techniques to calculate unique features will be covered. The Principal Component Analysis (PCA) technique is used to reduce the number of correlated subcarriers per transmit-receive (TR) link. Finally, a novel adaptive signal segmentation method based on the FFT profiles of signals and some pre-defined criteria is described in detail.

• In Chapter 3, we propose novel features calculated using the subcarrier correlations and autospectrum of the amplitude and phase principal components. We describe the data collection procedure and review the baseline and performance metrics. We train and test the Support Vector Machine (SVM) classifier twice, 1) using features calculated before de-noising and 2) using features calculated from de-noised signals, to illustrate the effects of our signal processing techniques in improving the classification results. We also train and test our classifier using the adopted features alone and compare the results with those obtained using both adopted and proposed features, to demonstrate the improvement in classification due to the newly proposed features. Finally, we compare our results with the baseline and explain the differences in cross-validation and test results in three different environments.

• In Chapter 4, we lay the foundations for DeepFalls. The main objective is to be able to distinguish human falls from fall-like activities in untrained environments. We review the Singular Spectral Analysis (SSA) as an alternative tool to the DWT de-noising. We then review traditional spectrogram transformations based on Fourier and Wavelet transforms and propose the Hilbert-Huang Transform (HHT) based spectrograms.
Since the HHT is based on the Empirical Mode Decomposition (EMD) method, we review several variants and choose the one with the least mode-mixing effect in the IMFs. The HHT method based on this EMD variant improves the differentiation between falls and fall-like activities, compared to traditional time-frequency methods.

• In Chapter 5, we discuss the Convolutional Neural Network (CNN) architecture used in DeepFalls. We discuss the procedures taken to collect data from three different environments. We then compare our results with a recent state-of-the-art CSI-based fall detector by training and testing in different environments.

• In Chapter 6, we summarize our research results and propose some possible future directions.

Chapter 2
Wi-HACS: A Wi-Fi based Human Activity Classification using OFDM Subcarriers

This chapter covers the architecture of Wi-HACS, designed to solve the first limitation stated in Chapter 1: performance degradation in complex environments. The amplitude and phase variations in the received signals due to different human activities will be explained. A series of signal pre-conditioning methods to remove outliers and improve features will be clarified through experiments. The disadvantages of common de-noising algorithms in CSI-based HAR are identified and the DWT is proposed to overcome these problems. Since adjacent frequency subcarriers are highly correlated, Principal Component Analysis (PCA) is used to reduce the number of subcarriers per TR link. Finally, a novel adaptive signal segmentation method based on FFT is described to overcome the limitations in existing CSI-HAR segmentation methods.

2.1 Channel State Information (CSI)

When humans move within range of Wi-Fi networks, the multipath propagation is affected. This is because of scattering of Wi-Fi signals resulting from changes in human postures [12].
Therefore, the Wi-Fi channel consists of signals reflected and scattered by static objects in the environment, such as furniture, and by moving objects such as human beings. When a person moves, the signals reflected from the person change. Since the Channel State Information (CSI) represents the signal propagation effects in a channel, these additional reflections caused by different human activities can also be observed. To collect the CSI data between a Wi-Fi router and a laptop, we installed and modified a firmware for an Intel 5300 Network Interface Card (NIC) as recommended in [5]. The goal of our research is to use this CSI data to recognize human activities.

Denoting the transmitted and received signal vectors as x and y respectively, the Wi-Fi channel in the frequency domain can be modeled as:

$$y = Hx + n \tag{2.1}$$

where H is a complex channel matrix consisting of CSI values and n is the channel noise vector. The CSI is estimated for each Orthogonal Frequency Division Multiplexing (OFDM) subcarrier in IEEE 802.11n links [20]. OFDM splits the total frequency spectrum into 56 or 114 frequency subcarriers for a channel bandwidth BW of 20 or 40 MHz respectively. The CSI for each OFDM subcarrier is

$$h = |h|\, e^{j\theta} \tag{2.2}$$

where |h| and θ represent the amplitude and phase respectively. We measure the amplitude and phase of 30 subcarriers per TR link as base signals for further processing to detect different human activities. The bandwidth of the Wi-Fi channel is set to 20 MHz during data collection. Since the modified firmware [5] reports CSI values for 30 subcarriers per TR link, the indices of these subcarriers correspond to the grouping (BW = 20 MHz, subcarrier grouping Ng = 2 and number of subcarriers Ns = 30) in Table 2.1.

Table 2.1: Number of subcarriers and carrier grouping (IEEE 802.11n Standards) [21]. The subcarrier indices are the carriers for which channel matrices are sent.
The rightmost column in Table 2.1 indicates the subcarrier indices k, which depend on the channel BW and Ng.

In our research, each CSI measurement contains 30 complex matrices with dimensions N_Tx × N_Rx, where N_Tx and N_Rx represent the number of transmit and receive antennas respectively. In this thesis, the CSI values for each subcarrier for a given transmit-receive link are termed the CSI time-series, and the total dimensions are N_Tx × N_Rx × 30.

Figure 2.1: Amplitude variation across (a) 30 subcarriers for same T-R link, (b) 5th subcarrier across four T-R links (T1R1, T1R2, T2R1, T2R2). Axes: packet index versus amplitude (dB).

2.1.1 Correlation between human activities and amplitudes of OFDM Subcarriers

In this subsection, we observe how human activities affect the amplitudes of subcarriers. The amplitude variations of different subcarriers in the same TR link, as well as those of different links, are observed. The NIC firmware reports to the user the channel measured during the received packet preamble. Hence, for each packet, the amplitude and phase variations of the 30 subcarriers can be measured. In Fig. 2.1, the amplitude variations per received packet are plotted. It can be observed that the variations across all the subcarriers in the same TR link are similar (Fig. 2.1a), whereas variations of the same subcarrier in different links are less similar (Fig. 2.1b).

Although the benchmark [17] and other related works [9, 12] averaged all 30 subcarriers into one time-series and into groups of five time-series per link respectively, we investigate whether this is a good choice.
We computed the Pearson Correlation Coefficient (PCC) [22] to measure linear correlations between subcarrier signals s(t) (see footnote 1) in the same TR link as well as between different links using the following equation:

$$\mathrm{PCC} = \frac{\sum_{t=1}^{L} s_i s_j - \dfrac{\left(\sum_{t=1}^{L} s_i\right)\left(\sum_{t=1}^{L} s_j\right)}{L}}{\sqrt{\left(\sum_{t=1}^{L} s_i^2 - \dfrac{\left(\sum_{t=1}^{L} s_i\right)^2}{L}\right)\left(\sum_{t=1}^{L} s_j^2 - \dfrac{\left(\sum_{t=1}^{L} s_j\right)^2}{L}\right)}} \tag{2.3}$$

where $s_i$ and $s_j$ represent the two subcarrier signals, L is the length of the signal, and every sum runs over the L samples; for ease of notation, the subcarrier signal $s_i(t)$ is written as $s_i$. The results are shown in Fig. 2.2. We observe that adjacent subcarriers in the same TR link are more correlated than those further apart in frequency, as illustrated in Fig. 2.2a. We also observe that the correlations between subcarriers that are not adjacent in frequency vary: for instance, subcarriers 5 and 20 have a correlation value of 0.67, whereas the correlation between subcarriers 3 and 10 is 0.27. Finally, the correlations between the same subcarrier in different TR links are mostly low. As a result, we do not average out the subcarriers, and we utilize some of these patterns as features that are unique to different human activities. Details on these are given in Section 3.1. The frequency spacing between successive subcarriers is Δf = 312.5 kHz [20], and the frequency of each subcarrier $f_i$ is calculated as

$$f_i = f_o + k\,\Delta f \tag{2.4}$$

where $f_o$ is the center operating frequency, Δf is the subcarrier spacing (the channel bandwidth BW = 20 MHz divided by the FFT size of 64) and k is the subcarrier index from Table 2.1 (right-most column). When comparing PCC maps for different human activities for the same TR link, we observed distinctive patterns of correlation values between a subcarrier with index in the range #5 to #15 and one with index in the range #25 to #30. This is a key observation in our research, and the features derived from these will be discussed in Section 3.1.2.

Footnote 1: In this thesis the signal of interest is denoted by s(t); the same symbol is used in equations for clarity.
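As a minimal sketch of how such a PCC map might be computed, the snippet below writes equation (2.3) out term by term and compares it with NumPy's built-in `corrcoef`. The function name `pcc` and the synthetic `amplitudes` array (standing in for 300 packets × 30 subcarriers of one TR link) are assumptions made purely for illustration; they are not the thesis data or code.

```python
import numpy as np

def pcc(si, sj):
    """Pearson correlation of two equal-length signals, following equation (2.3)."""
    si = np.asarray(si, dtype=float)
    sj = np.asarray(sj, dtype=float)
    L = si.size
    num = np.sum(si * sj) - np.sum(si) * np.sum(sj) / L
    den = np.sqrt((np.sum(si ** 2) - np.sum(si) ** 2 / L)
                  * (np.sum(sj ** 2) - np.sum(sj) ** 2 / L))
    return num / den

# Hypothetical stand-in for one TR link: 300 packets x 30 subcarriers.
# Each column shares a common component plus subcarrier-specific noise.
rng = np.random.default_rng(0)
common = rng.standard_normal(300)
amplitudes = np.stack(
    [common + 0.1 * (k + 1) * rng.standard_normal(300) for k in range(30)],
    axis=1,
)

# Full 30 x 30 correlation map (columns are treated as variables).
pcc_map = np.corrcoef(amplitudes, rowvar=False)
```

Here `pcc_map[4, 19]` holds the correlation between the 5th and 20th subcarriers, and it should agree with `pcc(amplitudes[:, 4], amplitudes[:, 19])` up to floating-point error.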
The remaining features are calculated after reducing the correlated subcarriers per TR link using Principal Component Analysis (PCA) [22], which is discussed in Section 2.4.

Figure 2.2: Correlation matrix for (a) 30 subcarriers for same T-R link, (b) group of 5 subcarriers across four T-R links. The colorbar represents the correlation values.

2.1.2 Correlation between human activities and phases of OFDM Subcarriers

In this section, the relationship between human activities and the phases of the OFDM subcarriers is explored. The phases of the subcarriers for one TR link during two different human activities are plotted in Fig. 2.3b. It is observed that these values are extremely random and it is impossible to distinguish the two activities. The randomness in the phase values is due to the Carrier Frequency Offset (CFO) and the Sampling Frequency Offset (SFO), which result from the mismatch between the oscillator frequencies at the transmitter and receiver [23]. Although the CFO results in the same phase change across all subcarriers, the SFO causes the phase to grow linearly with the subcarrier index. These effects can be visualized in Fig. 2.4a.

In Fig. 2.4a, a break in linearity is observed for the 15th and 30th subcarriers. This is due to the grouping of subcarriers in the IEEE 802.11n standard (Table 2.1).
For the subcarrier indices corresponding to channel BW = 20 MHz, Ng = 2, and Ns = 30, the CSI of every alternate subcarrier starting from index −28 is reported. However, after the 14th subcarrier (index −2), the next index reported is −1 and not 0. The subsequent subcarriers start at index 1 and follow alternate indices up to 27. This is the reason for the linear behaviour in phase from the 15th to the 29th subcarrier. The break in linearity at the 30th subcarrier is then due to index 28 being reported instead of 29. Since the phases are random (Fig. 2.3b), the PCC maps do not reveal any distinctive patterns in correlations between phases of subcarriers.

Figure 2.3: CSI signatures for two different human activities (sitting and walking): (a) amplitude variations, (b) raw CSI phase variations, (c) calibrated CSI phase response. The variations for 30 subcarriers are shown on the left and those of 4 subcarriers (sub#1, sub#8, sub#18, sub#30) are shown on the right.

2.1.3 OFDM Phase Calibration

Since the raw phase information is not useful to distinguish human activities, a calibration technique [24] can enable phases to become base signals (see footnote 2) in addition to the amplitudes for activity recognition. The measured phase $\hat{\phi}_i$ for the ith subcarrier can be represented as:

$$\hat{\phi}_i = \phi_i - 2\pi \frac{k_i}{N}\,\delta + \beta + Z \tag{2.5}$$

where $\phi_i$ represents the true phase, $k_i$ is the subcarrier index in Table 2.1, N is the FFT size (which is 64 in IEEE 802.11n [21]), δ is the timing offset at the receiver, and β and Z denote an unknown phase offset and a measurement noise respectively. Due to the several unknowns in equation (2.5), the phases obtained at the NIC are a noisy representation of the true phases. The main idea of the phase calibration technique is to remove δ and β by considering the phase across the entire channel bandwidth, which originally consists of 56 subcarriers for a 20 MHz channel. However, since only 30 subcarrier CSIs are reported by the firmware, this is factored into the calculation. Using equation (2.5), the two terms a and b are defined as

$$a = \frac{\hat{\phi}_p - \hat{\phi}_1}{k_p - k_1} = \frac{\phi_p - \phi_1}{k_p - k_1} - \frac{2\pi}{N}\,\delta \tag{2.6}$$

$$b = \frac{1}{p}\sum_{j=1}^{p} \hat{\phi}_j = \frac{1}{p}\sum_{j=1}^{p} \phi_j - \frac{2\pi\delta}{pN}\sum_{j=1}^{p} k_j + \beta \tag{2.7}$$

where p is the number of subcarriers, so that the pth subcarrier is the last one reported. Referring to Table 2.1, since the reported subcarrier indices are not symmetric about zero, the term $\sum_{j=1}^{p} k_j \neq 0$ in equation (2.7). However, the authors in [24] have reported that by setting this term to zero, the randomness in the raw phases of 802.11n devices can be mitigated to some extent. The calibrated phases $\tilde{\phi}_i$ are obtained by subtracting the linear term $a k_i + b$ from equation (2.5) as follows

$$\tilde{\phi}_i = \hat{\phi}_i - (a k_i + b) = \hat{\phi}_i - \frac{\hat{\phi}_p - \hat{\phi}_1}{k_p - k_1}\,k_i - \frac{1}{p}\sum_{j=1}^{p} \hat{\phi}_j \tag{2.8}$$

This process is also referred to as phase sanitization. The calibrated phases of the subcarriers corresponding to those measured in Fig. 2.4a are shown in Fig. 2.4b. In Fig. 2.3c, it can be seen that the variations in calibrated phases differ for the different human activities. As a result, the calibrated phases are used in addition to the amplitudes as base signals in Wi-HACS. However, the baseline [17] used to evaluate our work only utilized the amplitudes as base signals. Computing the PCC for the calibrated phases of subcarriers in all the TR links reveals observations similar to those for amplitudes: the calibrated phases of adjacent frequency subcarriers are highly correlated, while those of different TR links are mostly weakly correlated.

Footnote 2: Base signals refer to variations in the amplitude and phase of subcarriers in time from which features are calculated.

Figure 2.4: (a) Unwrapped CSI phase response, (b) sanitized CSI phase after calibration. Some parts of (a) are zoomed to show the break in linearity in the unwrapped phase of the 15th and 30th subcarrier.

2.2 CSI Time-Series Pre-Conditioning

The goal of pre-conditioning is to address the uneven arrival of packets due to the bursty nature of Wi-Fi transmission, remove underlying temporal variations not instigated by human actions and improve the frequency characteristics of base signals.

2.2.1 1-D Linear Interpolation

The reason for interpolating the amplitude and phase signals is to enable FFT computations, which require evenly spaced data. This is also important for DeepFalls (Chapter 4), as unevenly spaced time-domain data prevents the FFT computations needed to produce spectrograms. Since the dataset in our research is collected at a sampling rate of 100 Hz, we use a 1-D linear interpolation algorithm [8] to arrange the data evenly with a spacing of 10 ms. The time-stamps of packets are reported as 32-bit values by the firmware. By evaluating the difference in the reported time-stamps between two successive packets, the actual elapsed time for each packet arrival can be recorded.
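The resampling onto a 10 ms grid described here might be sketched as follows. This is only an illustration under stated assumptions: the function name `resample_uniform` is hypothetical, the 32-bit time-stamps are assumed to count microseconds, and `numpy.interp` is used as a generic 1-D linear interpolator rather than the specific routine of [8].

```python
import numpy as np

def resample_uniform(timestamps_us, values, fs=100.0):
    """Linearly interpolate unevenly spaced samples onto a uniform grid.

    timestamps_us : packet time-stamps from a 32-bit counter
                    (assumed here to be in microseconds)
    values        : one base-signal sample (amplitude or phase) per packet
    fs            : target sampling rate in Hz (100 Hz gives 10 ms spacing)
    """
    ts = np.asarray(timestamps_us, dtype=np.int64)
    # Differences taken modulo 2**32 undo a wrap-around of the counter.
    dt = np.diff(ts) % (1 << 32)
    # Elapsed time in seconds since the first packet.
    t = np.concatenate(([0.0], np.cumsum(dt) / 1e6))
    # Equally spaced time points covering the recording.
    t_new = np.arange(0.0, t[-1] + 1e-9, 1.0 / fs)
    return t_new, np.interp(t_new, t, np.asarray(values, dtype=float))

# Usage: five packets arriving at irregular intervals.
t_new, y = resample_uniform([0, 9000, 21000, 30000, 40000],
                            [0.0, 0.018, 0.042, 0.06, 0.08])
```

The three arguments to `np.interp` play the roles of the new time points, the measured arrival times, and the base signal values respectively.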
The inputs to the algorithm are the packet arrival times, the base signal values and an equally spaced vector consisting of the new time points to which the CSI values are interpolated.

2.2.2 Hampel Identifier Outlier Removal

The CSI amplitudes and phases contain noise generated by internal state transitions, such as transmission power and rate adaptations, and by thermal noise in the devices [25]. These introduce variations and outliers into the base signals which are not caused by human presence. The outliers are indicated by circles in the CSI amplitude waveforms in Fig. 2.5a. The most obvious outlier can be seen at t = 5 s. The other outliers in Fig. 2.5a are declared by the Hampel Identifier algorithm [26].

Figure 2.5: Effect of Hampel Outlier Removal on three subcarriers (sub#1, sub#15, sub#30): (a) raw CSI amplitude waveforms with outliers denoted by black circles, (b) Hampel filtered CSI amplitude waveforms.

The algorithm works as follows. For each value x of the base signals, the median of a window consisting of x and m/2 neighboring points on each side is computed. Then the standard deviation of x about its window median is estimated using the Median Absolute Deviation (MAD). If x differs from the median by more than a predefined number of MADs, its value is replaced by the median. In other words, the Hampel Identifier declares as outliers the values outside the interval [µ − γσ, µ + γσ], where µ and σ represent the median and MAD respectively, and γ depends on the application and has a default value of three. In our research, we varied the value of m and kept the default value of γ = 3.
By varying m in increments of 5, we checked whether the most obvious outliers were detected. In the end, m = 20 proved a good choice for the number of points in the window along with x.

Figure 2.6: Effects before and after de-trending CSI waveforms: (a) Hampel filtered subcarrier for walking activity with its linear regression line, (b) de-trended waveform of (a); one-sided amplitude spectrum of FFT (c) before de-trending, (d) after de-trending.

2.2.3 De-trending Subcarriers to avoid spectral distortion

The CSI base signals sometimes display a trend, which can be visualized as a positive or negative slope over the length of the signal. This effect is mostly observed during human activities which include vertical hops, such as squatting and jogging. In a few cases, this trend is also observed in non-hopping activities such as walking or sitting down. These effects on the amplitude signal during a walking activity can be seen in Fig. 2.6. In Fig. 2.6a, a linear regression line has been plotted to visualize this trend. The frequency associated with the trend is lower than the lowest (fundamental, see footnote 3) frequency in the spectrum. The energy from this trend leaks into the lower frequencies, thereby distorting the lower part of the spectrum [16]. This distortion can be minimized by subtracting the linear regression line from the data, which is known as de-trending. This is important in our research because some amplitudes of dominant frequencies will be used as features in Chapter 3.

Footnote 3: The fundamental frequency is 1/T, where T is the duration of the window [16].
Furthermore, de-trending can also remove the zeroth or DC frequency component shown in Fig. 2.6c. This is because the zeroth frequency component is proportional to the mean of the signal, as follows from the DFT equation [27]

$$X[f] = \sum_{l=0}^{L-1} x[l]\, e^{-j 2\pi l f / L} \tag{2.9}$$

where L is the length of the signal and f is the frequency index. At the DC frequency, the equation becomes

$$X[0] = \sum_{l=0}^{L-1} x[l] \tag{2.10}$$

As a result, the DC component is simply the sum of the samples, that is, L times the mean of the signal. Hence by de-trending, which also removes the mean, the lower spectral distortion is minimized and the DC component is removed.

2.2.4 Effect of zero-padding in FFT computations

Since the frequency resolution (number of frequency points) of the FFT is determined by the length of the time-series, it is possible to increase the frequency resolution by adding more time points [28]. This can be done by a process known as zero-padding, which appends extra zeros to the end of the time-series. There are two main reasons to zero-pad the CSI base signals in Wi-HACS:

(i) Since the base signals of some activities, for example sitting and standing, are similar and share similar FFT profiles, increasing the frequency resolution can improve the features derived from the FFT, in particular the autospectrum. When the frequency resolution is improved, frequencies that are smeared together in the FFT profile can be distinguished better.

(ii) It can improve FFT processing times, as this algorithm is most efficient if the input time-series has a length corresponding to a power of two.

The improvements in resolution due to zero-padding can be observed in Fig. 2.7.
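The pre-conditioning steps of Sections 2.2.2 to 2.2.4 can be illustrated with a short sketch. The code below is an assumption-laden illustration, not the thesis implementation: the function names are hypothetical, the Hampel rule uses the raw MAD as σ exactly as described in the text (other implementations rescale it), and the input is a synthetic 3 s signal with a slope, a 3 Hz component and one injected spike.

```python
import numpy as np

def hampel(x, m=20, gamma=3.0):
    """Replace outliers with the local median (Section 2.2.2).

    Each window holds x[i] and m/2 neighbours on each side (fewer at the
    edges). sigma is the raw MAD of the window, as in the text.
    """
    x = np.asarray(x, dtype=float)
    y = x.copy()
    half = m // 2
    for i in range(x.size):
        window = x[max(0, i - half): i + half + 1]
        med = np.median(window)
        mad = np.median(np.abs(window - med))
        if abs(x[i] - med) > gamma * mad:
            y[i] = med
    return y

def detrend_linear(x, fs=100.0):
    """Subtract the least-squares regression line (Section 2.2.3)."""
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size) / fs
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept)

def amplitude_spectrum(x, fs=100.0):
    """One-sided amplitude spectrum, zero-padded to a power of two (Section 2.2.4)."""
    x = np.asarray(x, dtype=float)
    n_fft = 1 << (x.size - 1).bit_length()   # next power of two >= len(x)
    spec = np.abs(np.fft.rfft(x, n=n_fft)) / x.size
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return freqs, spec

# Synthetic 3 s window at 100 Hz: offset + slope + 3 Hz component + one spike.
t = np.arange(300) / 100.0
raw = 25.0 + 2.0 * t + np.sin(2 * np.pi * 3.0 * t)
raw[150] += 15.0

clean = hampel(raw)                       # spike replaced by the window median
flat = detrend_linear(clean)              # slope and mean removed
freqs, spec = amplitude_spectrum(flat)    # 300 samples padded to 512
```

With 300 samples the spectrum is evaluated on a 512-point grid, so the frequency spacing drops from about 0.33 Hz to about 0.2 Hz, and the dominant peak of this synthetic signal should appear near 3 Hz.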
Figure 2.7: FFT profile of the walking activity (a) before, (b) after zero padding. Annotations in (a): 1) two frequencies smeared together, 2) four frequency points very close to each other. Annotations in (b): 1) two distinct frequencies with higher amplitude, 2) four frequency points separated with one very distinct frequency.

2.2.5 Tapering CSI waveforms to prevent edge artifacts

When the FFT is applied to a data vector of finite duration T, the periodicity assumption in Fourier analysis [16] creates a step discontinuity at y(T) unless y(0) = y(T). This results in leakage of spurious energy into many frequency bands and hence distorts the Fourier spectrum. To understand this from a mathematical point of view, consider a time series y(t) whose duration is −T/2 ≤ t ≤ T/2, and a window function w(t) defined as

$$w(t) = \begin{cases} 1, & -T/2 \le t \le T/2 \\ 0, & \text{otherwise} \end{cases} \tag{2.11}$$

If $\hat{w}$ and $\hat{Y}$ are the Fourier transforms of w(t) and y(t), then the Fourier transform of w(t)y(t) is the convolution of $\hat{w}$ and $\hat{Y}$. If the window function is rectangular (equation (2.11)), its Fourier transform $\hat{w}$ is a sinc function, and the convolution of $\hat{w}$ and $\hat{Y}$ produces spurious energy leakage into lower frequency bands [16]. To prevent these edge discontinuities, it is necessary to multiply each data window by a taper before taking its Fourier transform. The taper is a function that decays smoothly to zero near the ends of each window. Although spectral leakage cannot be completely prevented, it can be significantly reduced by altering the shape of the taper function. The cosine taper with different taper ratios a (0 ≤ a ≤ 1), in the time and frequency domains, is illustrated in Fig. 2.8. The mathematical form of this taper is:

$$c(t) = \begin{cases} \frac{1}{2}\left(1 - \cos\left(\frac{\pi}{a}\, t\right)\right), & 0 \le t \le a \\ 1, & a \le t \le (1-a) \\ \frac{1}{2}\left(1 - \cos\left(\frac{\pi}{a}\,(1 - t)\right)\right), & (1-a) \le t \le 1 \end{cases} \tag{2.12}$$

By increasing a, the power leakage from a spectral peak to adjacent frequencies is decreased. Referring to Fig. 2.8, the ideal taper to consider would be the Hann taper, which results in the quickest power decay in the frequency domain. However, using this window would attenuate a lot of data at the start and end of the window, therefore losing signal information. At the other extreme, when a = 0% (the rectangular window), the original time-series data is completely preserved, but this results in the slowest decay across the bandwidth. There is therefore a trade-off between time and frequency, and it is important to balance the loss of signal in the time domain against the power decay in the frequency domain. The amplitude base signal for the walking activity is multiplied by cosine tapers for various values of a, and the FFT profiles computed for each case are illustrated in Fig. 2.9.

It is observed that when a is increased, the window shape more closely resembles a ‘cosine bell’, which increases the bandwidth, reduces the amount of spectral leakage and thus improves the resolution of adjacent low-frequency spectral components. This is essential to the development of Wi-HACS because all human activities give rise to low-frequency components, but the energy across these components varies for different activities. However, the amplitude of the FFT profiles of the tapered signals decreases with increasing tapering ratio. This is because increasing a has the drawback of attenuating valid data at the beginning
and end of the time-series. In our research work, we set a to 5% to strike a good balance in this time-frequency trade-off.

Figure 2.8: (a) Time-domain and (b) FFT profile of the cosine taper with various tapering ratios. Panel (a) plots Amplitude against Samples; panel (b) plots Magnitude (dB) against Normalized Frequency (×π rad/sample). Curves: rectangular (a=0%), cosine taper (a=25%, 50%, 75%), and Hann (a=100%).

2.3 Discrete Wavelet Transform (DWT) based Noise Attenuation for CSI signals

In this section, the noise attenuation techniques commonly used in CSI-based HAR are covered. After identifying their limitations, the DWT is introduced as a de-noising technique. The concept of the DWT as a dyadic filter bank is explained, followed by the selection of the mother wavelet and the decomposition level. Finally, the thresholding technique and the threshold value needed to reconstruct the signals from the filter decompositions are given.
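The cosine taper of Equation (2.12), whose effect on the walking signal is shown in Fig. 2.9, can be sketched in a few lines. This is a minimal illustration rather than the thesis implementation: the function name `cosine_taper` is hypothetical, time is normalised to [0, 1] as in Equation (2.12), and the parameter `a` is taken literally from that equation as the fraction of the window tapered at each end.

```python
import math


def cosine_taper(n, a):
    """Sample the cosine taper of Eq. (2.12) at n evenly spaced points on [0, 1].

    Assumption: `a` is the fraction tapered at EACH end, as Eq. (2.12) is
    written, so it must lie in [0, 0.5]; a = 0 gives the rectangular window.
    """
    if n < 2:
        raise ValueError("need at least two samples")
    if not 0.0 <= a <= 0.5:
        raise ValueError("a must lie in [0, 0.5] for this parameterisation")
    taper = []
    for k in range(n):
        t = k / (n - 1)  # normalised time in [0, 1]
        if a > 0 and t < a:
            c = 0.5 * (1.0 - math.cos(math.pi * t / a))          # rising edge
        elif a > 0 and t > 1.0 - a:
            c = 0.5 * (1.0 - math.cos(math.pi * (1.0 - t) / a))  # falling edge
        else:
            c = 1.0                                              # flat middle
        taper.append(c)
    return taper


def apply_taper(signal, a):
    """Multiply a data window by the taper before taking its FFT."""
    return [s * c for s, c in zip(signal, cosine_taper(len(signal), a))]
```

With a small `a`, only the first and last few samples are attenuated, which is the time-domain cost that the choice of a small tapering ratio is meant to limit.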
Figure 2.9: The FFT profiles of the amplitude signal during the walking activity multiplied by cosine tapers of various tapering ratios: (a) time domain and (b) frequency domain for a=5%, (c) time domain and (d) frequency domain for a=25%, (e) time domain and (f) frequency domain for a=50%. The time-domain panels plot Amplitude against Time (s) for the "Hampel Filtered + De-trended" and "Hampel Filtered + De-trended + Tapered" signals; the frequency-domain panels plot Amplitude against Frequency (Hz).

2.3.1 Limitations in time and frequency based noise attenuation techniques in CSI systems

In the current CSI-based HAR literature, noise attenuation techniques can be categorized into time-based and frequency-based approaches. The time-domain approaches are median filtering [10] and Principal Component Analysis (PCA) de-noising [25]. The frequency-domain approaches are low-pass [29] and band-pass (Butterworth) filtering [6]. In [28] it is stated that the above time-domain approaches can distort the signal and result in the loss of some vital high-frequency components. In PCA-based de-noising, the first Principal Component (PC) is removed because it is considered to represent the highest noise variance. The disadvantage of this technique is that it also removes most of the information (variance) representing the human activity.
In complex environments, where CSI signals convey relatively less information about human activities, this de-noising technique can remove almost all useful information. Frequency-domain approaches in CSI-based HAR are mostly "out-of-band" filtering techniques, in which noise in the passband is not eliminated. Since CSI signals have noise in all frequency bands [30], we propose an "in-band" noise-filtering technique based on the DWT to eliminate noise in all frequency bands while preserving high-frequency components. This is advantageous because the CSI base signals contain rapid variations over very short durations, such as during a change in activity, and these are preserved after DWT de-noising.

2.3.2 The Discrete Wavelet Transform (DWT) as a dyadic filter bank

The DWT can be realized as a dyadic filter bank, where filters of different cutoff frequencies are used to analyze the signal at different scales. The signal is passed through this bank to obtain high-frequency (detail) and low-frequency (approximation) components.

The procedure starts by passing the signal through a bank of half-band digital low-pass and high-pass filters whose impulse responses are h[n] and g[n] respectively, as shown in Fig. 2.10. The next step is to downsample the signal by 2: since the bandwidth of the band-limited signal is now halved, the Nyquist criterion allows the sampling rate to be half of the initial rate. Therefore, the length of the signal at this stage is L/2. It is important to note that the low-pass filtering removes the higher half-band information without altering the scale; the scale is changed by the downsampling process. Since resolution is related to the number of points (information) in the signal, it is affected by the filtering operation. Thus, the resolution is halved after the filtering operation and the scale is doubled after the downsampling operation. The DWT therefore analyzes the signal at different frequency bands with different resolutions. The lowpass and highpass filters
The lowpass and highpass filters28Chapter 2. Wi-HACS: A Wi-Fi based Human Activity Classification using OFDM SubcarriersFigure 2.10: (a) The subband coding algorithm in DWT [31]correspond to the two functions of DWT: scaling and wavelet respectively. At each levelof the DWT, the decomposition reduces the time resolution by two (as half the number ofsamples characterize the signal at the previous stage). In contrast, the frequency resolutionat each stage doubles, since the frequency band at each stage spans only half of the bands atthe previous stage. This procedure is referred to as subband coding [31] and is illustrated inFig. 2.10. By using this technique, at low frequencies, the frequency resolution is pre-served, while at higher frequencies, the time resolution is preserved. The 3-level DWTdecomposition of CSI amplitude signal for the walking activity, utilizing the ‘Daubechies’(db-8) [32] wavelet is illustrated in Fig. 2.11. The choice of db-8 as mother wavelet isadopted from [33], in which the authors have shown it is the optimal choice for de-noisingnon-stationary Electrocardiogram (ECG) signals. Although our base signals are different29Chapter 2. 
from ECG signals, they are nonetheless non-stationary. Since the main objective of our research is to improve the classification performance of HAR, we leave the exploration of other wavelet functions for future work. The decomposition level was chosen to be 3 based on previous spectrogram results given in [12] and [34]. In our research, we define three sets of activities in our data: in-place (sit and stand), out-of-place (walk, squat, jog) and fall-events (fall from walking and sit from walking). According to [12], the "in-place" activities occupy 0-5 Hz, while the "out-of-place" activities and "fall-events" occupy the entire spectrum (0-50 Hz).

Figure 2.11: The DWT decomposition structures for CSI amplitude during the 'walking' event: (i) pre-conditioned amplitude (0-50 Hz), (ii)-(iv) levels 1-3 respectively, with (a) and (b) denoting approximations and details for each level. Level 1: approximation 0-25 Hz, detail 25-50 Hz; level 2: approximation 0-12.5 Hz, detail 12.5-25 Hz; level 3: approximation 0-6.25 Hz, detail 6.25-12.5 Hz. Panel (i) plots Amplitude against Time (s); the remaining panels plot Amplitude against Number of points.

In Fig. 2.11, the approximation and detail coefficients at each level of decomposition can be seen. The x-axis represents the number of data points instead of time, because the number of points halves at each subsequent decomposition as a result of downsampling.
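The band edges listed for Fig. 2.11 follow from repeatedly halving the approximation band. A minimal sketch of that arithmetic is given below; the function name `dwt_bands` is hypothetical, and the 100 Hz sampling rate in the usage note is an assumption inferred from the 0-50 Hz pre-conditioned band.

```python
def dwt_bands(fs_hz, levels):
    """Return the nominal sub-bands of a dyadic DWT decomposition.

    Each entry is (level, approximation_band, detail_band), with bands given
    as (low_hz, high_hz). These are ideal half-band edges; real wavelet
    filters such as db-8 overlap slightly around each edge.
    """
    upper = fs_hz / 2.0          # Nyquist frequency of the input signal
    bands = []
    for level in range(1, levels + 1):
        mid = upper / 2.0        # half-band split at this level
        bands.append((level, (0.0, mid), (mid, upper)))
        upper = mid              # the approximation is split again next level
    return bands


if __name__ == "__main__":
    for level, approx, detail in dwt_bands(100.0, 4):
        print(f"level {level}: approx {approx} Hz, detail {detail} Hz")
```

Running the sketch for four levels shows that the approximation band narrows from 0-6.25 Hz at level 3 to 0-3.125 Hz at level 4.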
A further decomposition would produce level-4 approximation coefficients spanning 0-3.125 Hz, which would ideally represent very subtle activities such as making a phone call while sitting. However, since our measurements do not include such activities, the DWT decomposition is set to level 3.

The signal can be reconstructed by the inverse discrete wavelet transform (IDWT). In DWT de-noising, however, the detail coefficients are thresholded before the IDWT is applied. This is the subject of the following subsection.

2.3.3 Thresholding methods and value selection

In the first step of DWT noise attenuation, the signal is decomposed using a mother wavelet into a pre-determined number of decomposition levels W, resulting in approximation and detail coefficients at each level (Fig. 2.11). Then, thresholding is applied to the detail coefficients from level 1 to W. Finally, the noise-attenuated signal is constructed using the original approximation coefficients at level W and the thresholded detail coefficients at levels 1 to W [35].

The two most common methods of thresholding the wavelet coefficients are hard thresholding and soft thresholding [35]. In both methods, the wavelet coefficients with magnitude less than the threshold value are set to zero. The difference between the two operations lies in how the wavelet coefficients with magnitude greater than the threshold are treated. In soft thresholding, the magnitudes of coefficients greater than the threshold are shrunk towards zero by subtracting the threshold from the magnitude. In hard thresholding, the coefficients greater than the threshold remain unchanged [35]. The two equations defining hard and soft thresholding [36], respectively, are given as
$$ \tilde{w}_{j,i} = \begin{cases} w_{j,i}, & |w_{j,i}| \ge \lambda_j \\ 0, & |w_{j,i}| < \lambda_j \end{cases} \tag{2.13} $$

$$ \tilde{w}_{j,i} = \begin{cases} \mathrm{sgn}(w_{j,i})\,(|w_{j,i}| - \lambda_j), & |w_{j,i}| \ge \lambda_j \\ 0, & |w_{j,i}| < \lambda_j \end{cases} \tag{2.14} $$

where $\lambda_j$ is the threshold value, $w_{j,i}$ and $\tilde{w}_{j,i}$ are the noisy and de-noised wavelet coefficients, respectively, at the jth decomposition level and the ith location of the detail component, and j ≤ W. Hard thresholding is more suitable when the detail coefficients represent either signal or noise. In contrast, soft thresholding performs better when the detail coefficients contain both signal and noise [36]. Since the detail coefficients of our CSI signals contain both signal and noise, we use the soft thresholding technique. In DWT de-noising, there exist several methods [37] to estimate the threshold value. Since the authors in [38] have experimentally shown that the universal threshold [39] is the simplest and most effective method, it has been adopted in our work. The universal threshold is given by

$$ \lambda_U = \sigma\sqrt{2\log(L)} \tag{2.15} $$

where L denotes the length of the signal and σ is the noise standard deviation. Since the noise in our data is not known, the first-level detail coefficients $D_1$ can be used to estimate σ [35] using the following equation

$$ \sigma = \frac{\mathrm{median}(|D_1|)}{0.6745} \tag{2.16} $$

After thresholding the detail coefficients (levels 1 to 3) and reconstructing the signal from these and the third-level approximation coefficients, the noise-attenuated version of the CSI base signals is obtained. The pre-conditioned amplitude base signal after applying various de-noising techniques for a series of human activities is shown in Fig. 2.12.

Figure 2.12: Various de-noising algorithms applied to the pre-conditioned CSI-amplitude time-series for a series of activities (walk, jump, stand, walk, standing up from sitting): (a) original pre-conditioned signal, (b) 5-point median filtering, (c) Butterworth low-pass filter with cutoff frequency at 30 Hz, and (d) 3-level 'db-8' DWT-based de-noising. Each panel plots Amplitude against Time (s).

The 5-point median filtering algorithm [40] does not preserve sharp transitions in the signal. This is because any short transition usually appears as a large value in the median window and is replaced by the median value. As a result, abrupt changes in the signal are smoothed and high-frequency components are lost. This can be observed in Fig. 2.12b in the region of 600 ms, where spikes in the original signal (Fig. 2.12a) are flattened. Although Butterworth low-pass filtering can maintain these sharp transitions, as observed in Fig. 2.12c, it is an out-of-band filtering technique: all high-frequency components outside the cutoff are lost, and noise within the passband is not eliminated. In contrast, the DWT-based approach removes noise from all frequency bands and retains high-frequency components.
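The thresholding rules of Equations (2.13) to (2.16) can be sketched as below. This covers only the thresholding step, not the wavelet decomposition or reconstruction, which a wavelet library would supply. The function names are hypothetical, and the natural logarithm is assumed in Equation (2.15), as is conventional for the universal threshold.

```python
import math
import statistics


def estimate_sigma(d1):
    """Eq. (2.16): noise level from the first-level detail coefficients."""
    return statistics.median(abs(c) for c in d1) / 0.6745


def universal_threshold(sigma, length):
    """Eq. (2.15): universal threshold for a signal of the given length.

    Assumption: log denotes the natural logarithm.
    """
    return sigma * math.sqrt(2.0 * math.log(length))


def hard_threshold(coeffs, lam):
    """Eq. (2.13): keep coefficients at or above the threshold unchanged."""
    return [c if abs(c) >= lam else 0.0 for c in coeffs]


def soft_threshold(coeffs, lam):
    """Eq. (2.14): shrink surviving coefficients towards zero by lam."""
    return [math.copysign(abs(c) - lam, c) if abs(c) >= lam else 0.0
            for c in coeffs]


if __name__ == "__main__":
    details = [3.0, -3.0, 0.5, -0.5]   # illustrative values, not CSI data
    print(hard_threshold(details, 1.0))
    print(soft_threshold(details, 1.0))
```

In a full pipeline, the same threshold would be applied to the detail coefficients of every level before the inverse transform, leaving the level-W approximation coefficients untouched.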
Figure 2.13: The variance of eigenvalues (modes) after the eigendecomposition of the covariance matrices for each transmit-receive link. The plot shows the ratio of total variance explained by each mode against the mode number, for the modes corresponding to links T1-R1, T1-R2, T2-R1 and T2-R2.

2.4 Principal Component Analysis (PCA) Dimension Reduction

Since the CSI amplitude and phase signals correspond to 120 dimensions for two TR links, we use PCA [22] to reduce the number of correlated signals. In the baseline work, the authors averaged the signals into one signal per link. Since only amplitude is used in their work, the total number of dimensions was 60, which was reduced to only two base signals. Our analysis in section 2.1.1 demonstrates that subcarriers further apart in frequency show little to no correlation, and hence averaging is not the optimal choice. The following describes the PCA process used to choose the optimal number of PCs to represent our data, where $X_i$ represents the ith TR link:

(i) Mean removal: Since the signals are de-trended in the pre-conditioning step, the mean is already subtracted from each subcarrier in $X_i$.

(ii) Covariance matrix estimation: Since $X_i$ has 30 subcarriers (30-dimensional), the covariance matrix for each link, $Y_i = X_i^T X_i$, is of dimensions 30 × 30.

(iii) Eigen-decomposition: Since $Y_i$ is a square matrix, the eigenvectors $W_i$ corresponding to the eigenvalues $\lambda_i$ can be computed. The columns of $W_i$ are ordered such that the 1st column (1st "PC" or 1st "loading") corresponds to the highest eigenvalue, followed by the 2nd column (2nd "PC" or 2nd "loading") corresponding to the second highest eigenvalue, and so on.

(iv) Optimal number of PCs: This is based on the number of eigenvalues needed to represent a sufficient share of the variance of the data. From Fig.
2.13, it can be observed that the first 3 PCs represent ≈ 90% of the variance of the data. The data in this case is a matrix whose columns represent the subcarriers and whose rows represent the values of these subcarriers at each time point. Since our research objective is to improve classification accuracy, we trained and tested the Support Vector Machine (SVM) classifier using features calculated from different numbers of amplitude and phase PCs. Our results indicated that the cross-validation results did not improve when the classifier was trained using features from more than three PCs. Therefore, the CSI time-series in our research is reduced to N_Tx × N_Rx × 3 = 1 × 2 × 3 = 6 amplitude and 6 phase base signals.

2.5 Adaptive Windowing for CSI-signal Segmentation

One of the key challenges in CSI-based HAR systems is to accurately segment the start and end of a human activity from CSI signals. This is needed to calculate features from each window for classification. The methods commonly used in existing CSI-based HAR are 'event detection' and 'fall-event segmentation'. In the event detection method [41], the sliding window first determines whether there is a human activity and, if so, continues with the feature extraction and classification algorithms. If no human activity is detected, the window is slid forward by a pre-defined number of data points. The limitation of this technique is that, since different human activities may be performed for different durations, a non-adaptive window can miss a change in human activity. In the fall-event segmentation method, the signal is segmented only to determine the start and end of a 'fall-event', as in [10], [11] and [12]. Although the latest CSI-based fall detector [12] reports a segmentation accuracy of 100%, this technique cannot distinguish the start and end of non-fall activities.
Since Wi-HACS is designed to classify both fall-events and other activities, the above two methods are not suitable. Therefore we propose an adaptive windowing segmentation based on the amplitude of FFT coefficients.

As described in section 2.3.2, different categories of human activities occupy different spectral bands. An activity segmentation can therefore be based on analyzing the spectral components in each window. However, there are two challenges with such an approach: 1) Out-of-place activities such as walking occupy the entire spectral band throughout the window, hence part of their spectrum overlaps with in-place activities such as sitting or standing. 2) Fall events, which consist of true falls and fall-like activities (sitting down from walking), occupy different spectral bands at different time periods in the activity window. Hence, we propose an adaptive sliding window to overcome these problems. From our empirical observations, the amplitudes of frequencies in the range 3-25 Hz are sufficient to determine the segmentation. In every determination of activity segments, we propose two windows of length 200 (2 s) with 1 s overlap. We define low frequency (fL) to be 3-10 Hz and high frequency (fH) to be 10-25 Hz. For demonstration purposes, let us assume the lengths of the first and second windows are (α2−α1) and (β2−β1) respectively. In each window, any frequency content with an amplitude less than 0.2 is discarded. This is because for in-place activities, as defined in section 2.3.2, most of the energy (amplitude) of the FFT coefficients is concentrated within 0-5 Hz, and only a few coefficients with very low amplitudes (< 0.2) are present outside this band.
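The band test that drives the segmentation (does a window contain fL, fH, or both?) can be sketched as follows. This is an illustration under stated assumptions, not the thesis code: the function names are hypothetical, the spectrum is a single-sided amplitude spectrum scaled so that a unit-amplitude sinusoid yields a peak of 1, a 100 Hz sampling rate is inferred from the 200-sample, 2 s windows, and the 10 Hz boundary is assigned to fH.

```python
import cmath
import math


def amplitude_spectrum(x, fs_hz):
    """Single-sided amplitude spectrum via a direct DFT (fine for 200 points)."""
    n = len(x)
    freqs, amps = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        # DC and (for even n) the Nyquist bin are not doubled
        scale = 1.0 if k == 0 or (n % 2 == 0 and k == n // 2) else 2.0
        freqs.append(k * fs_hz / n)
        amps.append(scale * abs(coeff) / n)
    return freqs, amps


def band_present(freqs, amps, lo_hz, hi_hz, min_amp=0.2):
    """True if any bin in [lo_hz, hi_hz) reaches the amplitude threshold."""
    return any(lo_hz <= f < hi_hz and a >= min_amp for f, a in zip(freqs, amps))


def window_bands(x, fs_hz=100.0):
    """Return (has_fL, has_fH) for one window, with fL = 3-10 Hz, fH = 10-25 Hz."""
    freqs, amps = amplitude_spectrum(x, fs_hz)
    return (band_present(freqs, amps, 3.0, 10.0),
            band_present(freqs, amps, 10.0, 25.0))
```

Applying `window_bands` to two consecutive windows yields the pair of flags on which the case analysis operates.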
Since in-place activities contain fL and out-of-place activities contain both fL and fH, the signal window is adapted based on the following four cases:

Case 1: Both windows w1 and w2 contain fL:
This means both windows contain in-place activities (sit or stand). In this case the two windows are left unchanged and are processed individually for feature extraction and classification. Any change in activity (sit to stand or vice versa) is determined from the classification output.

Case 2: w1 contains fL and w2 contains both fL and fH:
This implies a change in activity from in-place to out-of-place. Hence features are calculated from each window and classifications are made individually.

Case 3: Both windows w1 and w2 contain both fL and fH:
In this case, a third window w3 of length 200 is computed for the next part of the signal without any overlap. This is done to estimate the frequency components of w3 and thereby identify whether a change in activity has taken place in w2. This is important because, as explained above, fall-events first occupy higher spectral bands corresponding to a rapid movement, followed by lower spectral bands corresponding to the person lying down. There are two specific cases in this regard:

(i) If w3 contains fL: A change in activity has occurred within w2. Hence w2 and w3 are merged to form w4, and features are calculated from this combined window of length 300 and from w1 of length 200 separately. In this case w1 consists of an out-of-place event and w4 consists of a fall-event.

(ii) If w3 contains both fL and fH: This implies the out-of-place activity is being performed continuously in w1, w2 and w3. Hence features are calculated individually from w1 and w2. The window w3 is then relabelled as w1 and followed by the next window w2, overlapping w1 by 1 s as usual, and the two windows are again evaluated on a case-by-case basis.

Case 4: w1 contains fL and fH and w2 contains fL:
This indicates that a 'fall-event' has taken place, which can be either a true fall or a fall-like activity. In this case the two windows are merged into one window of length (β2−α1) to determine whether it is a true fall or a fall-like activity.

The adaptive windowing for these cases can also be visualized in Fig. 2.14. The discussion above holds for both amplitude and phase base signals.

Figure 2.14: Adaptive Windowing based on the amplitude of FFT coefficients. The plot shows Amplitude (dB) against Time (s) for a sequence of activities: no human presence, open door and enter room, sit down, walk around, perform 4 squats, stand still, jog in room, fall down, stand up. Annotations mark Window 1, Window 2 and Window 3; two windows processed individually; three windows with w2 compared with w3 (since w2 has fL and fH and w3 has fL, these two windows are combined); and a fall-event in which both windows are combined for further processing.

2.6 Summary

In this chapter, the foundations behind Wi-HACS were reported. The concept of CSI and the base signals, amplitudes and phases, were introduced. We analyzed the correlations between subcarriers that are adjacent to each other and between those that are far apart in frequency. We observed that adjacent subcarriers are mostly correlated.
We also observed some interesting patterns in the correlations between subcarriers that are further apart in frequency. These observations result in novel features which improve the classifications, especially in complex environments. We will discuss these features further in Chapter 3. Since the phases are extremely random, a phase calibration algorithm was used to show that phase signatures can also be used as base signals, hence providing additional channel signatures corresponding to human activities. Essential pre-conditioning steps were taken to reduce the base signal artifacts induced by hardware imperfections. Since we derive some features from the FFT (details in Chapter 3), de-trending, zero-padding and tapering techniques were explored to improve the FFT profiles of different human activities. DWT noise attenuation was proposed to overcome the limitations of the de-noising techniques used in current CSI-HAR systems. The PCA algorithm was used to reduce the number of correlated subcarriers in each TR link. Since there is no segmentation method that works for both falls and non-fall activities, we proposed an adaptive windowing technique capable of segmenting both types of activities.

Chapter 3
Wi-HACS: Classification and Performance Analysis

The main objectives of this chapter are to evaluate the proposed signal processing and novel features and to compare our results against a baseline. We begin by introducing the features adopted from the baseline and explaining the novel features based on subcarrier correlations and dominant frequencies in the autospectrum of base signals. We then discuss the approach taken to tune the parameters of a multi-class SVM classifier. This is followed by a description of the procedures used to collect the data. The performance metrics used to evaluate Wi-HACS are also defined. We present the method used to choose the optimal number of amplitude and phase Principal Components (PCs).
We assess whether the features extracted from DWT and Hampel filtered signals improve classification by comparing the results obtained using features from de-noised signals with those obtained from raw signals. We also evaluate whether our proposed features improve classification results. We then compare our work with a baseline and discuss the results. Finally, we present the results of a one-tailed paired t-test to determine whether our improvements are statistically significant.

3.1 Feature Extraction

In this section, we outline the features adopted from the baseline [17] and describe the novel features, which are derived from the subcarrier correlations and from the amplitudes of frequencies in the autospectra of base signals.

3.1.1 Features adopted from baseline

The features utilized in the baseline are commonly used in wearable-based HAR systems [2]. The adopted features can be categorized as time-based and frequency-based. The time-domain features are based on statistics: (i) Normalized Standard Deviation, (ii) Skewness, (iii) Kurtosis, (iv) Interquartile Range, and (v) Median Absolute Deviation. The frequency-domain feature is the Normalized Entropy or Power Spectral Entropy [42], which estimates the disorder of time-series data in the frequency domain. It is calculated by the following steps:

(i) Calculate the FFT of the signal, $S(f_i)$, where $f_i$ is the frequency corresponding to the ith data point.

(ii) Calculate the Power Spectral Density (PSD) of the signal by squaring the amplitude spectrum and scaling it by the length L of the signal:

$$ \hat{P}(f_i) = \frac{1}{L}\,|S(f_i)|^2 \tag{3.1} $$

(iii) Normalize the PSD $\hat{P}(f_i)$:

$$ p_i = \frac{\hat{P}(f_i)}{\sum_{i=1}^{L}\hat{P}(f_i)} \tag{3.2} $$

(iv) Calculate the normalized entropy as

$$ E = -\sum_{i=1}^{L} p_i \ln p_i \tag{3.3} $$
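Steps (i) to (iv) can be sketched as below. This is a minimal illustration that uses a direct DFT in place of an FFT routine, which is adequate for windows of a few hundred samples; the function name `power_spectral_entropy` is hypothetical, and bins with zero power are skipped on the usual convention that 0 ln 0 = 0.

```python
import cmath
import math


def power_spectral_entropy(x):
    """Entropy of the normalised power spectrum, following Eqs. (3.1)-(3.3)."""
    length = len(x)
    # (i) discrete Fourier transform S(f_i) over all L bins
    spectrum = [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / length) for n in range(length))
        for k in range(length)
    ]
    # (ii) power spectral density, Eq. (3.1)
    psd = [abs(s) ** 2 / length for s in spectrum]
    total = sum(psd)
    if total == 0.0:
        return 0.0  # an all-zero window carries no spectral content
    # (iii) normalise so the p_i sum to one, Eq. (3.2)
    p = [v / total for v in psd]
    # (iv) entropy, Eq. (3.3), skipping empty bins
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)
```

A unit impulse spreads its power evenly over all L bins and gives the largest possible value, ln L, whereas a constant signal concentrates its power in one bin and gives a value close to zero.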
The value of L depends on the type of activity segmented: for fall-events, L = 300 (3 s), and for in-place and out-of-place activities, L = 200 (2 s).

In addition to the adopted features, we identified two measures, correlations in the time domain and the autospectrum in the frequency domain, from which a total of 120 novel features are proposed. The former is calculated by measuring the PCC across a specific group of subcarriers, and the latter is calculated from the optimum number of amplitude and phase PCs. Once the autospectra of these PCs are calculated, the frequencies corresponding to the most dominant amplitudes are taken as features. The improvements in classification results with our proposed features are experimentally measured and reported in section 3.4.4.

Figure 3.1: PCC correlation matrices of the amplitudes of subcarriers of the same T-R link after signal pre-conditioning. The area of interest corresponds to the correlation values taken as features. Out-of-place activities: (a) Walking, (b) Jogging; in-place activities: (c) Sitting and (d) Standing. The colour bar represents the correlation values.

3.1.2 Proposed features based on subcarrier correlations

The correlation matrices are obtained by computing the PCC between the amplitudes of the 30 subcarriers of one TR link during different human activities. We observe unique correlation patterns across a group of subcarriers during different human activities for a given environment. Referring to Fig. 3.1, the area of interest in each symmetric correlation matrix reveals distinctive patterns between subcarriers with indices #5-15 and those with indices #25-30. Specifically, these correlation patterns show relatively higher correlation values for the out-of-place activities, walking and jogging, and lower values for the in-place activities, sitting and standing.
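A minimal sketch of how such correlation features could be computed is given below. The function names are hypothetical, and the two index groups are passed in as arguments rather than fixed, since the exact index convention is an assumption here; the toy data are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # a constant series has no defined correlation; use 0
    return cov / (norm_x * norm_y)


def correlation_features(subcarriers, group_a, group_b):
    """PCC of every subcarrier in group_a with every subcarrier in group_b.

    `subcarriers` maps a subcarrier index to its amplitude time series.
    """
    return [pearson(subcarriers[i], subcarriers[j])
            for i in group_a for j in group_b]


if __name__ == "__main__":
    toy = {5: [1.0, 2.0, 3.0, 4.0], 6: [2.0, 4.0, 6.0, 8.0], 25: [4.0, 3.0, 2.0, 1.0]}
    print(correlation_features(toy, [5, 6], [25]))
```

Each returned value corresponds to one cell of the area of interest in the correlation matrix, so the feature vector is simply that rectangular block flattened row by row.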
Although these observations are also valid for the phase signals, we derive these features from only the amplitude signals, to reduce overfitting.

3.1.3 Proposed features based on autospectrum

The autospectrum [16] of a signal $s(t_n)$ represents the variance in terms of its Fourier coefficients. The following derivation establishes this relationship. The discrete Fourier series representation of $s(t_n)$, where n = 1, 2, ..., L, the sampling interval is Δt = T/L, and the observations are made at times $t_n = n\Delta t$, is given by

$$ s(t_n) = \frac{1}{L}\sum_{n=1}^{L} s(t_n) + \sum_{m=1}^{M}\left[a_m\cos(\omega_m t_n) + b_m\sin(\omega_m t_n)\right] \tag{3.4} $$

where M is the largest integer ≤ L/2, $\omega_m$ is the angular frequency given by $\omega_m = \frac{2\pi m}{L\Delta t}$, and $a_m$ and $b_m$ are the Fourier coefficients given by

$$ a_m = \frac{2}{L}\sum_{n=1}^{L} s(t_n)\cos(\omega_m t_n), \quad m = 0, 1, 2, \ldots, M \tag{3.5} $$

$$ b_m = \frac{2}{L}\sum_{n=1}^{L} s(t_n)\sin(\omega_m t_n), \quad m = 0, 1, 2, \ldots, M \tag{3.6} $$

Denoting $s(t_n)$ as $s_n$, the variance of the time-series s is given as

$$ \mathrm{var}(s) = \frac{1}{L}\sum_{n=1}^{L}(s_n - \bar{s})^2 \tag{3.7} $$

where $\bar{s}$ represents the mean of s. Using Equation (3.4), the variance can be written as

$$ \mathrm{var}(s) = \frac{1}{L}\sum_{n=1}^{L}\left[\sum_{m=1}^{M}\left(a_m\cos(\omega_m t_n) + b_m\sin(\omega_m t_n)\right)\right]^2 \tag{3.8} $$

The sine and cosine functions satisfy the orthogonality properties

$$ \begin{aligned} \sum_{n=1}^{L}\cos(\omega_p t_n)\cos(\omega_q t_n) &= \frac{L}{2}\delta_{pq} \\ \sum_{n=1}^{L}\sin(\omega_p t_n)\sin(\omega_q t_n) &= \frac{L}{2}\delta_{pq} \\ \sum_{n=1}^{L}\cos(\omega_p t_n)\sin(\omega_q t_n) &= 0 \end{aligned} \tag{3.9} $$

where $\delta_{pq}$ is the Kronecker delta function. Using the equations in (3.9), the variance in Equation (3.8) can be written as

$$ \mathrm{var}(s) = \frac{1}{2}\sum_{m=1}^{M}(a_m^2 + b_m^2) \tag{3.10} $$

The autospectrum, $A_m$, can be visualized as the energy contained in the frequency bands:

$$ A_m = \frac{L\Delta t}{4\pi}(a_m^2 + b_m^2) \tag{3.11} $$

where Δt represents the sampling period. Therefore, the total variance can be written in terms of the autospectrum as follows

$$ \mathrm{var}(s) = \sum_{m=1}^{M} A_m\,\Delta\omega \tag{3.12} $$

with

$$ \Delta\omega = \frac{2\pi}{L\Delta t} \tag{3.13} $$

This autospectrum $A_m$ is computed from the optimum number of amplitude and phase PCs of each TR link. In each spectrum, the frequencies corresponding to the first five dominant amplitudes are recorded as features. The number of dominant amplitudes was
chosen by trial and error. The features used, along with the number of base signals and TR links, are given in Table 3.1. In Table 3.1, the first two rows represent the novel features, while the remaining rows represent the adopted features.

Table 3.1: Extracted Features for Wi-HACS

Feature                         Domain      Base signal                                                        # of T-R links   # of features
Subcarrier Correlations         Time        Amplitude of subcarriers indexed 5-15 with 25-30                   1                60
Autospectrum                    Frequency   First five peaks of 3 Amplitude and 3 Phase Principal Components   2                60
Normalized Standard Deviation   Time        3 Amplitude and 3 Phase Principal Components                       2                12
Skewness                        Time        3 Amplitude and 3 Phase Principal Components                       2                12
Kurtosis                        Time        3 Amplitude and 3 Phase Principal Components                       2                12
Inter-quartile Range            Time        3 Amplitude and 3 Phase Principal Components                       2                12
Median Absolute Deviation       Time        3 Amplitude and 3 Phase Principal Components                       2                12
Normalized Entropy              Frequency   3 Amplitude and 3 Phase Principal Components                       2                12
Total # of features                                                                                                             192

3.2 Multi-Class Support Vector Machine (SVM) Classification

The Support Vector Machine (SVM) [43] is an algorithm that can achieve high generalization, meaning that test data can be classified correctly, by maximizing the margin between the hyperplane and the nearest feature vector. It can also classify labels/classes which are not linearly separable, by the use of a kernel function. Given a training dataset $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where $x_i$ is an n-dimensional feature vector and $y_i$ is the class/label to which the feature vector belongs, with a value of 1 or −1, the soft-margin SVM algorithm solves the following optimization problem

$$ \min_{w,\,b,\,\xi_i} \ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m}\xi_i \quad \text{s.t.} \quad y_i(w\cdot x_i - b) \ge 1 - \xi_i, \ \forall (x_i, y_i) \in D, \quad \xi_i \ge 0 \tag{3.14} $$

where $\xi_i$ measures the degree of misclassification, and C is a tunable parameter that determines the trade-off between the margin size and the amount of error in training. The optimization problem can be solved using Lagrange multipliers, and the dual problem can be used to solve for the optimal hyperplane separating the classes.
When the classes are not linearly separable, the non-linear SVM can be used, which consists of two steps: 1) the input feature vectors are transformed into a high-dimensional space where the training data can be linearly separated, and 2) the soft-margin SVM is then used to find the hyperplane of maximal margin in the new feature space. A kernel function is used to compute the dot product between feature vectors as if they had been transformed to a higher-dimensional space, without actually transforming the vectors. The kernel used in our SVM classifier is the Radial Basis Function (RBF) kernel, as it has proven to be a good choice [44]. The equation for the RBF kernel is $K(x_i, x_j) = \exp(-\gamma\|x_i - x_j\|^2)$, where $K(x_i, x_j)$ is the kernel function that computes the dot product of the feature vectors in the transformed space and γ > 0 is the kernel parameter. A large γ leads to low-bias, high-variance models, and vice versa.

Therefore, two parameters of the SVM classifier need to be tuned: C and γ. The strategy used to find good values for these parameters is adopted from [44], which suggests scaling the features and then using cross-validation to find the parameters. The feature scaling step in SVM is important because features with very large numerical values dominate those with smaller numerical values [44]. All training and testing data in our research were normalized to lie in the range [0, 1].

3.2.1 Cross-validation and Grid-search

Since the optimal values of C and γ are not known beforehand, a grid-search technique was used. The goal of finding these optimal values is to minimize the training and validation losses. In our research, the data are split into training, validation, and testing sets. The training and testing data were taken from two different regions of the experimental rooms, as illustrated in Fig. 3.2. A 10-fold cross-validation is performed on the training data.
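The partitioning behind k-fold cross-validation can be sketched as follows. This is an illustration only: the function name `k_fold_indices` is hypothetical, the folds are taken as contiguous blocks without shuffling (an assumption made for clarity), and a library routine would normally be used in practice.

```python
def k_fold_indices(n_samples, k=10):
    """Split sample indices into k folds and return (train, validation) pairs.

    Fold sizes differ by at most one sample when n_samples is not a
    multiple of k. No shuffling is applied in this sketch.
    """
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, validation))
        start += size
    return folds


if __name__ == "__main__":
    for train, validation in k_fold_indices(23, k=10):
        print(len(train), validation)
```

In a grid search, this loop would be repeated for every candidate (C, γ) pair, with the classifier trained on each `train` split and scored on the matching `validation` split.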
Thus, each instance of the entire training set is predicted once, resulting in a cross-validation accuracy. This was done to prevent overfitting. The grid-search over C and γ was done using cross-validation: different combinations of (C, γ) were tried, and the one that resulted in the highest cross-validation accuracy was selected. After applying grid-search and cross-validation, the optimal values were C = 0.025 and γ = 0.03125, C = 0.03 and γ = 0.04, and C = 0.01 and γ = 0.04 for the three environments in Fig. 3.2, respectively. We utilized the Scikit-learn Python package [45] to train and test our SVM classifier.

3.3 Dataset

In this section, the hardware, environmental settings, and procedures used to collect data are described.

3.3.1 Hardware and Base Signals

We installed a Linux firmware [5] for an Intel 5300 NIC in a Dell Latitude E600 laptop. The transmitter was an Asus RT-N600 router, set to an operating frequency of 5 GHz. The CSI data were measured between one transmitter and three receiver antennas (3 TR links); therefore, the time series had $N_{Tx} \times N_{Rx} \times 30 = 1 \times 3 \times 30 = 90$ dimensions if only the amplitude or only the phase of the subcarriers is used as the base signal. If both base signals are used, the CSI time series are 180-dimensional.

The baseline method utilized all three TR links and used amplitude as the base signal. The amplitudes of the 30 subcarriers in each link were averaged into one time series before feature extraction. Therefore, the CSI time series in the baseline were 3-dimensional. In Wi-HACS, however, we use both the amplitude and phase of the subcarriers as base signals. We utilize two TR links and calculate the subcarrier correlation features from only one TR link to reduce the effects of overfitting. The remaining features were calculated from a reduced number of subcarriers.
Instead of averaging the subcarriers in each link, we utilize PCA to reduce the correlated subcarriers to 3 amplitude and 3 phase PCs per link. Since a total of 2 links were used, the remaining features were calculated from (3 amplitude + 3 phase) x (2 links) = 12 PCs.

3.3.2 Data Collection Procedure

Currently there is no CSI-based HAR dataset publicly available. We obtained approval for data collection from the UBC Behavioural Research Ethics Board (BREB). The environments used to collect data are illustrated in Fig. 3.2. The rooms are located in the MacLeod building for Electrical and Computer Engineering at The University of British Columbia, Vancouver. The rooms in which the experiments were conducted are identical in size and furniture contents.

During data collection, the volunteer was asked to perform a set of human activities in the regions marked as training and testing in Fig. 3.2. The markers indicated by triangles and stars in the same figure represent the approximate positions where the activities were performed. For each human activity listed in Table 3.2, 30 samples of CSI data were collected from each training and testing region. In this thesis, a sample refers to a fixed time window of data points for a given human activity. For example, the samples corresponding to in-place and out-of-place activities consist of data with a duration of 2 seconds, whereas fall and fall-like samples are 3 seconds in duration. This is because, in our segmentation results in Section 2.5, the time windows for fall events were 3 s long. The window was chosen such that the signal from 1 s before to 2 s after the fall event is captured. Therefore, the number of samples for each activity is [30 x (3 training + 3 testing regions)] = 180 samples.
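The two windowing rules above (back-to-back 2 s samples, and a 3 s sample spanning 1 s before to 2 s after a fall event) can be sketched as follows. This is a minimal illustration under stated assumptions: the sampling rate `fs`, the event index, and the function names are hypothetical, and the signal is assumed to be uniformly sampled.

```python
def fixed_windows(signal, fs, window_s=2.0):
    """Cut a continuous recording into back-to-back windows of window_s seconds."""
    n = int(round(window_s * fs))
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def event_window(signal, fs, event_index, before_s=1.0, after_s=2.0):
    """Return the samples from before_s before to after_s after an event index."""
    start = event_index - int(round(before_s * fs))
    stop = event_index + int(round(after_s * fs))
    if start < 0 or stop > len(signal):
        raise ValueError("event is too close to the edge of the recording")
    return signal[start:stop]


fs = 100                               # hypothetical sampling rate (samples per second)
recording = list(range(22 * fs))       # stand-in for a 22 s CSI amplitude series
windows = fixed_windows(recording, fs)                     # 2 s samples
fall = event_window(recording, fs, event_index=10 * fs)    # 3 s sample around an event
```

With these assumed values, a 22 s recording yields eleven 2 s samples, and the event window covers seconds 9 to 12 of the recording.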
The number of activity samples collected for each environment is (30 training + 30 testing) x 7 activities = 420, and therefore our dataset consists of a total of 1260 samples of human activities.

For the in-place activities, sit and stand, we asked the volunteer to perform the activity for a continuous period of time (20-25 s) and then divided the data into 10-12 samples of 2 s each. For the out-of-place activities, walking and jogging, a similar procedure was used. Since squatting is a tiring activity, the volunteer was asked to perform at most 7-10 squats continuously, and the corresponding signals were divided into 3-4 samples, as we assumed it takes approximately one second to perform one squat. Since squatting is treated as an out-of-place activity, each sample consists of data with a duration of 2 seconds. For the fall-down activity, a mattress was provided along the length of the dashed lines and the volunteer was asked to fall from a standing position. Although several types of falls are reported in the literature [46], only sideways falls were conducted due to space constraints in the rooms.

[Figure 3.2: floor plans of the three settings, marking the TX and RX positions and the training and testing regions.]

Figure 3.2: Experimental Setting for Data Collection: (a) One room (with LOS), (b) Two rooms (NLOS), and (c) Three rooms (NLOS).
Other types of falls and fall-like activities are studied in Chapter 4, where the data are collected from different environments. For the sit-from-stand activity, three chairs were provided during the training and testing phases of data collection. The chairs were placed in positions similar to those shown by the markers in Fig. 3.2. The volunteers were asked to stand beside a chair for 4-5 seconds and then sit down and remain seated for another 4-5 seconds. After the data corresponding to this activity were collected, only the portion consisting of 1 second before and 2 seconds after the activity was taken as the sample. This windowing technique was also applied to the fall samples and is also used in [12]. A buffer period of 10 s was added at the beginning and end of each data collection period. The buffer at the beginning gave the volunteer enough time to return to the correct position in the room to conduct the activity. The buffer at the end covered the time taken to reach the laptop and stop the firmware. Once the data were collected, the data corresponding to the buffer periods were discarded, and only the data corresponding to the time periods in which the activities were conducted were sampled as explained above and labeled. The 1260 samples of data were collected over a period of four months. The list of human activities and the number of samples are given in Table 3.2.

| Human Activity | Number of samples |
|---|---|
| Sit (1) | 180 |
| Stand (2) | 180 |
| Sit from stand (3) | 180 |
| Walk (4) | 180 |
| Squat (5) | 180 |
| Fall down (6) | 180 |
| Jog (7) | 180 |
| Total | 1260 |

Table 3.2: List of human activities and total number of samples across all environments.

3.4 Results and Discussion

In this section, the classification results of Wi-HACS are presented. The proposed signal processing algorithms and novel features are evaluated using the classification metrics.
We also compare our results with a baseline work [17].

3.4.1 Baseline Method and Performance Metrics

To evaluate Wi-HACS, we reproduced a recent state-of-the-art paper [17] on our dataset and compared the performances. Since the baseline metrics are the accuracy for each class and the average accuracy, we have utilized these metrics to measure the performance improvements. In addition, we also report the precision and recall for each class. This is done to avoid the accuracy paradox [47], which means that a classifier with very little to no predictive power has the potential to report a high accuracy. To understand this, we consider two cases: (i) when the number of true positives is less than the number of false positives, changing the classification rule to always output the 'negative' category will always increase accuracy; (ii) conversely, if the number of true negatives is less than the number of false negatives, the same happens if the classification rule is changed to always output 'positive'. Therefore, precision and recall are used in addition to accuracy as metrics to compare the overall performance of our work with the baseline method.

The precision of a class is the fraction of the examples the classifier predicts as positive that are actually positive. Recall is the fraction of the actual positive examples in the data that the classifier predicts as positive [48]. Since our precision and recall results are higher than the baseline, we did not include the F1 score, which is a function of both precision and recall. These performance metrics are adopted from the binary confusion matrix and extended to the multi-class scenario. They are calculated for each activity class as follows:

\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3.15}
\]

\[
\text{Precision} = \frac{TP}{TP + FP} \tag{3.16}
\]

\[
\text{Recall (Sensitivity)} = \frac{TP}{TP + FN} \tag{3.17}
\]

where TP, TN, FP and FN denote True Positive, True Negative, False Positive and False Negative, respectively.
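As a minimal sketch of Eqs. (3.15)-(3.17) in the multi-class case, the per-class metrics can be computed directly from a confusion matrix. The function name is hypothetical, and the code assumes the convention of Fig. 3.6 (rows are classifier results, columns are ground truth). The matrix below is the Wi-HACS cross-validation confusion matrix for setting 1, copied from Fig. 3.6(a).

```python
def per_class_metrics(cm):
    """Compute per-class accuracy, precision, and recall from a confusion matrix.

    cm[i][j] counts samples predicted as class i whose true class is j
    (rows: classifier results, columns: ground truth).
    Assumes every class is predicted at least once, so TP + FP > 0.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[k]) - tp                        # rest of row k
        fn = sum(cm[i][k] for i in range(n)) - tp   # rest of column k
        tn = total - tp - fp - fn
        metrics.append({
            "accuracy": (tp + tn) / total,
            "precision": tp / (tp + fp),
            "recall": tp / (tp + fn),
        })
    return metrics


# Class order: Sit, Stand, Sit from stand, Walk, Squat, Fall down, Jog.
cm_wihacs = [
    [28, 2, 0, 0, 0, 0, 1],
    [2, 27, 0, 0, 0, 0, 0],
    [0, 0, 26, 0, 0, 3, 0],
    [0, 0, 0, 28, 2, 1, 0],
    [0, 1, 0, 1, 27, 0, 1],
    [0, 0, 3, 0, 0, 26, 0],
    [0, 0, 1, 1, 1, 0, 28],
]
metrics = per_class_metrics(cm_wihacs)
```

For the class sit, this gives a precision of 28/31 and a recall of 28/30, which match the 90.32% and 93.33% reported in Fig. 3.7.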
To explain these metrics for multi-class classification, we consider a confusion matrix where rows represent the classifier results and columns represent the ground truth, as shown in Fig. 3.6. The TP of a class is the diagonal value of that class in the matrix. The TN of a class is the sum of all values outside the row and column of that class. The FP of a class is the sum of the values in the corresponding row (excluding the TP). The FN of a class is the sum of the values in the corresponding column (excluding the TP).

[Figure 3.3: line plot of average precision, average recall, and average accuracy (%) against the number of principal components (1 to 5).]

Figure 3.3: Performance of Wi-HACS using different numbers of Principal Components per TR link; the total % variance represented by different numbers of PCs is: 1 (48%), 2 (76%), 3 (86%), 4 (91%), and 5 (94%).

3.4.2 Selection of optimum number of Principal Components

The method used to select the optimum number of PCs is heuristic and is based on the classification performance. Using from one to five PCs per TR link captures, on average, 48% to 94% of the total variance of the corresponding amplitude or phase base signals. The performance metrics for the cross-validation results of the classifier in setting 1, obtained with different numbers of PCs per TR link, are plotted in Fig. 3.3. In Wi-HACS only two TR links are used, and the numbers of amplitude and phase PCs are the same for both links. It is observed that the classification performance initially improves as the number of PCs increases, as this increases the captured variance of the base signals, which then carry more information about the Wi-Fi channel where the human activity takes place. However, if the PCs used for feature extraction represent more than 90% of the variance, the classification performance degrades, as seen in Fig. 3.3. This is because
although increasing the number of PCs increases the amount of information about the channel, it can also result in overfitting, in which the validation loss remains high despite a low training loss. The optimum number of PCs used per link in Wi-HACS is 3 for both the amplitude and phase signals, which represents approximately 85% of the variance. Furthermore, it is observed that the difference between the average precision and recall remains the same for any number of PCs, but the gap between them and the average accuracy decreases up to 3 PCs and then increases with an increasing number of PCs. This is because accuracy is a function of TPs and TNs over all samples of data, whereas precision and recall are functions of TPs with respect to TPs and FPs, and TPs and FNs, respectively. Hence, small changes in the number of TPs affect accuracy only slightly but greatly affect precision and recall.

[Figure 3.4: bar charts with the following values (%).]

| Setting 1 | Wi-HACS (before DWT and Hampel Filter) | Wi-HACS (after DWT and Hampel Filter) | Baseline |
|---|---|---|---|
| Average Accuracy | 90.88 | 96.28 | 93.8 |
| Average Precision | 68.17 | 90.48 | 82.5 |
| Average Recall | 67.1 | 90.48 | 82.4 |

| Setting 3 | Wi-HACS (before DWT and Hampel Filter) | Wi-HACS (after DWT and Hampel Filter) | Baseline |
|---|---|---|---|
| Average Accuracy | 68.82 | 80.33 | 66.94 |
| Average Precision | 26.07 | 45 | 24.5 |
| Average Recall | 25.71 | 44.8 | 24.3 |

Figure 3.4: Performance of Wi-HACS before and after DWT de-noising and Hampel Identifier for the simplest (setting 1) and most complex (setting 3) environments given in Fig. 3.2. The baseline performances are also included for reference.

3.4.3 Effect of DWT-based noise attenuation and Hampel Filtering on classification results

In our research we used the Hampel Identifier algorithm to remove outliers and the DWT to filter out noise in the base signals.
To assess whether these algorithms improve classification, we trained and tested our SVM classifier twice: (1) using features computed from the raw signals, and (2) using features computed from the de-noised signals. In this subsection, raw signals refer to the amplitude and phase PCs, and de-noised signals refer to the PCs after DWT and Hampel Identifier processing. We compare the classification performance for the simplest (setting 1) and most complex (setting 3) environmental settings. The results are plotted in Fig. 3.4, which also includes the baseline results (see footnote 4). It can be observed that the classification performance of Wi-HACS improves when features are calculated from the de-noised signals: for setting 1, the average accuracy, precision and recall improve by 5.4, 22.3 and 23.4 percentage points, respectively, while the improvements in setting 3 are 11.5, 18.9 and 19.1 percentage points, respectively. Furthermore, it can be observed that the baseline performance for the simplest environment is better than that of Wi-HACS without signal de-noising. These observations indicate that signal de-noising is important prior to feature extraction. An interesting observation can be made from the same figure for the most complex setting: the results for Wi-HACS before de-noising are on average about 2% better than the baseline for all performance metrics. This is because, as environments become more complex, the amplitude base signals for different human activities become increasingly indistinguishable, whereas the phase signals do not deteriorate as much. These observations were also made in a recent CSI-based fall detection paper [12].

Footnote 4: All the baseline results reported in this thesis are complete; that is, the results are based on all the signal processing and feature extraction steps stated in their paper.
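For reference, the outlier-removal step discussed in this subsection can be sketched as a generic Hampel identifier. This is a hedged illustration, not the exact implementation or parameter choice used in Wi-HACS: the window half-width and the threshold of three scaled median absolute deviations are assumed values, and the function name is hypothetical.

```python
import statistics


def hampel_filter(x, half_window=3, n_sigmas=3.0):
    """Replace outliers with the local median (Hampel identifier).

    A sample is flagged when it deviates from the median of its window by more
    than n_sigmas * 1.4826 * MAD, where MAD is the median absolute deviation
    and 1.4826 scales the MAD to the standard deviation of Gaussian data.
    """
    y = list(x)
    outliers = []
    for i in range(len(x)):
        lo = max(0, i - half_window)
        hi = min(len(x), i + half_window + 1)
        window = x[lo:hi]
        med = statistics.median(window)
        mad = statistics.median(abs(v - med) for v in window)
        if abs(x[i] - med) > n_sigmas * 1.4826 * mad:
            y[i] = med
            outliers.append(i)
    return y, outliers


# Hypothetical amplitude series with a single spike at index 4.
series = [1.0, 1.1, 0.9, 1.0, 9.0, 1.1, 1.0, 0.9, 1.0]
cleaned, flagged = hampel_filter(series)
```

Only the spike is replaced by its local median; the remaining samples pass through unchanged, which is why this step is applied before the DWT-based de-noising rather than instead of it.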
[Figure 3.5: bar charts with the following values (%).]

| Setting 1 | Wi-HACS (without proposed features) | Wi-HACS (with proposed features) | Baseline |
|---|---|---|---|
| Average Accuracy | 82.54 | 96.28 | 93.8 |
| Average Precision | 43.41 | 90.48 | 82.5 |
| Average Recall | 42.38 | 90.48 | 82.4 |

| Setting 3 | Wi-HACS (without proposed features) | Wi-HACS (with proposed features) | Baseline |
|---|---|---|---|
| Average Accuracy | 56.01 | 80.33 | 66.94 |
| Average Precision | 10.98 | 45 | 24.5 |
| Average Recall | 10.95 | 44.8 | 24.3 |

Figure 3.5: Performance of Wi-HACS with and without the novel features based on subcarrier correlations and autospectrum, for the simplest (setting 1) and most complex (setting 3) environments in Fig. 3.2. The baseline performance is also included for reference.

3.4.4 Effect of proposed features on classification results

To illustrate whether the proposed features improve the classification results, we trained and tested the SVM classifier twice: (1) with the adopted features only, and (2) with both the adopted and proposed features. The classification results for the simplest (setting 1, Fig. 3.2a) and most complex (setting 3, Fig. 3.2c) environmental settings are shown in Fig. 3.5. It can be observed that all three metrics improve with the addition of the proposed features. The motivation for exploring these new features was to improve classification across multiple walls. Without the proposed features, our classifier suffered from under-fitting or "high bias", which means the training loss remained high. This resulted in low training and low cross-validation accuracies. We confirmed that this was an underfitting problem, as the training and validation losses did not decrease when we increased the amount of training data. By adding these new features, the underfitting problem was alleviated. The differences between accuracy and precision or recall are larger for complex environments, for the reasons stated in the previous subsection.
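The two training runs compared in this subsection differ only in which columns of the feature vector are used. A minimal sketch of that selection is given below, using the group sizes from Table 3.1. The column ordering, the group names, and the helper functions are assumptions made for illustration; they are not taken from the Wi-HACS implementation.

```python
# Feature groups and sizes from Table 3.1; the column order is an assumption.
FEATURE_GROUPS = [
    ("subcarrier_correlations", 60, "proposed"),
    ("autospectrum", 60, "proposed"),
    ("normalized_std", 12, "adopted"),
    ("skewness", 12, "adopted"),
    ("kurtosis", 12, "adopted"),
    ("interquartile_range", 12, "adopted"),
    ("median_abs_deviation", 12, "adopted"),
    ("normalized_entropy", 12, "adopted"),
]


def column_indices(kinds):
    """Return the feature-vector columns whose group kind is in `kinds`."""
    columns, start = [], 0
    for _name, size, kind in FEATURE_GROUPS:
        if kind in kinds:
            columns.extend(range(start, start + size))
        start += size
    return columns


def select_columns(feature_vector, columns):
    """Keep only the requested columns of one feature vector."""
    return [feature_vector[c] for c in columns]


adopted_only = column_indices({"adopted"})               # run (1)
all_features = column_indices({"adopted", "proposed"})   # run (2)
```

Under this layout, run (1) trains on 72 features and run (2) on all 192.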
(a) Confusion Matrix for Wi-HACS (rows: classifier results, columns: ground truth)

| | Sit | Stand | Sit from Stand | Walk | Squat | Fall Down | Jog |
|---|---|---|---|---|---|---|---|
| Sit | 28 | 2 | 0 | 0 | 0 | 0 | 1 |
| Stand | 2 | 27 | 0 | 0 | 0 | 0 | 0 |
| Sit from Stand | 0 | 0 | 26 | 0 | 0 | 3 | 0 |
| Walk | 0 | 0 | 0 | 28 | 2 | 1 | 0 |
| Squat | 0 | 1 | 0 | 1 | 27 | 0 | 1 |
| Fall Down | 0 | 0 | 3 | 0 | 0 | 26 | 0 |
| Jog | 0 | 0 | 1 | 1 | 1 | 0 | 28 |
| Total | 30 | 30 | 30 | 30 | 30 | 30 | 30 |

(b) Confusion Matrix for Baseline (rows: classifier results, columns: ground truth)

| | Sit | Stand | Sit from Stand | Walk | Squat | Fall Down | Jog |
|---|---|---|---|---|---|---|---|
| Sit | 25 | 4 | 0 | 0 | 0 | 1 | 1 |
| Stand | 4 | 25 | 0 | 1 | 0 | 0 | 1 |
| Sit from Stand | 0 | 0 | 25 | 1 | 0 | 3 | 2 |
| Walk | 0 | 0 | 0 | 24 | 4 | 1 | 0 |
| Squat | 0 | 0 | 2 | 3 | 25 | 0 | 1 |
| Fall Down | 1 | 0 | 2 | 0 | 0 | 25 | 1 |
| Jog | 0 | 1 | 1 | 1 | 1 | 0 | 24 |
| Total | 30 | 30 | 30 | 30 | 30 | 30 | 30 |

Figure 3.6: Confusion Matrices for cross-validation results for setting 1: (a) Wi-HACS, (b) Baseline.

3.4.5 Performance Evaluation of Wi-HACS with Baseline

Having justified the importance of the signal processing and proposed features for classification performance, we now compare the performance of Wi-HACS and the baseline in three different environmental settings. We report the 10-fold cross-validation results as well as the test results. The cross-validation was done to ensure that we did not have an overfitting problem. The test data are used to determine whether the classifier, after fine-tuning based on the training and cross-validation results, is able to predict accurately on new data. This is done for both Wi-HACS and the baseline method.

• We start with the simplest environment (Fig. 3.2a). The confusion matrices resulting from the cross-validation results for both Wi-HACS and the baseline are shown in Fig. 3.6. From these matrices, it is easy to see the cases (activities) where the classifier succeeds and where it fails to predict. It can be observed that Wi-HACS has a total of
190 TPs compared to 173 for the baseline, out of a total of 210 samples for the simplest setting. In settings 2 and 3, Wi-HACS has a total of 149 and 95 TPs, respectively, while the baseline has 117 and 51 TPs, respectively. The performance metrics of Wi-HACS and the baseline for each activity in setting 1 are shown in Fig. 3.7.

| Activity | Precision (Wi-HACS) | Precision (Baseline) | Recall (Wi-HACS) | Recall (Baseline) | Accuracy (Wi-HACS) | Accuracy (Baseline) |
|---|---|---|---|---|---|---|
| Sit | 90.32 | 80.65 | 93.33 | 83.33 | 97.10 | 94.76 |
| Stand | 93.10 | 80.65 | 90.00 | 83.33 | 97.62 | 94.76 |
| Sit from Stand | 89.66 | 80.65 | 86.67 | 83.33 | 96.67 | 92.25 |
| Walk | 90.32 | 82.76 | 93.33 | 80.00 | 97.62 | 93.14 |
| Squat | 90.00 | 80.65 | 90.00 | 83.33 | 96.67 | 92.22 |
| Fall Down | 89.66 | 86.21 | 86.67 | 83.33 | 95.67 | 94.32 |
| Jog | 90.32 | 85.71 | 93.33 | 80.00 | 96.62 | 95.24 |

Average accuracy: Wi-HACS 96.28%, Baseline 93.81%.

Figure 3.7: Cross-validation performance metrics comparison between Wi-HACS and baseline: setting 1.

• The misclassifications in Wi-HACS can be attributed to cases where the base signals of similar activities resemble each other. Hence the features computed for these signals are also very similar. For instance, of the 30 samples of the class sit, only 2 are misclassified as stand. The time-domain representation of these activities can be observed in Fig. 2.14 for reference. The same can be observed for the out-of-place and fall activities. Although there are some rare cases of confusion between non-similar activities (for example, between an in-place and an out-of-place activity), this happens for at most 1 sample of each class. In contrast, the confusion matrix for the baseline approach shows relatively more misclassifications between non-similar activities as well as between similar activities.

• As the propagation environment becomes more complex, the number of misclassifications between similar as well as non-similar activities increases.
This is because, as environments become more complex, the Wi-Fi signal propagation paths become more random. This increases the random perturbations of the CSI base signals, which makes it challenging to distinguish between different human activities. Therefore, efficient feature extraction plays a big role in the overall classification performance. The subcarrier correlation features provide some consistency for different human activities regardless of the signal perturbations. This is one of the main reasons why Wi-HACS classifies activities with an average accuracy of 80%, compared to 66% for the baseline, in the most complex environment. Another reason for this improvement is the addition of phases as base signals, which are not considered in the baseline.

• The average cross-validation and average test results for all metrics and all environments are given as bar charts in Fig. 3.8. The cross-validation results are analyzed first. The first observation is that Wi-HACS outperforms the baseline approach in all three environments for all performance metrics. The second observation is that, although the performance degrades when the environment is more complex, the degradation is larger in precision and recall than in accuracy. This is because the accuracy of a class measures how many true positives and true negatives exist with respect to the total number of samples. Hence, small differences in true positives between two algorithms do not result in large differences in accuracy. On the other hand, precision and recall relate the number of true positives to the numbers of false positives and false negatives, respectively. Therefore, changes in the number of true positives greatly affect the precision and recall metrics.

• The third observation is that the difference in performance between Wi-HACS and the baseline increases as the environment becomes more complex.
For instance, the performance improvement of Wi-HACS with respect to the baseline is 5.8% in average accuracy and 8% in both precision and recall for the simplest environment, while the improvements are 18% and 20.5% for the most complex environment. This is due to the features proposed in our research. As the environment gets more complex, the variations of the base signals become more and more similar for all activities. Therefore, computing discriminative features becomes increasingly difficult. However, the correlation patterns across a range of specified subcarriers do not degrade as much as the other features. Although the patterns corresponding to the same human activity do not remain consistent across different environments, they remain similar within the same environment.

• However, to ensure that these correlation features are consistent within the same environment, the base signals were first filtered by the Hampel Identifier and DWT-based de-noising to minimize variations that are not caused by human presence. In addition, the autospectrum features are also improved by pre-conditioning the signals as outlined in Section 2.2. As a result, the classification performance is enhanced when these features are used, as demonstrated in Section 3.4.4. Furthermore, we also incorporated the phase in our base signals, which provided more information about the Wi-Fi environment where the human activities take place. As a result of these combined approaches, Wi-HACS achieves higher accuracy, precision and recall than the baseline.

• After tuning the classifier using the training and cross-validation results, the tuned classifier is asked to predict on the test data for each environment. The training and testing for both Wi-HACS and the baseline are done for each environment separately. In other words, the classifier tuned for one environment is tested on new data collected in that particular environment and not in a different environment.
This is because a goal of this research is to improve classification across multiple walls. An approach which can classify activities even in untrained environments is discussed in Chapter 4. From Fig. 3.8, the observations for the test results are identical to the ones made for the cross-validation results. That is, the test performance of Wi-HACS is higher than that of the baseline in all metrics, with the performance gap increasing in tougher environments. In addition, the test results indicate that both approaches have the potential to generalize to new data in the same environment. The confusion matrices and performance results for all cross-validations and tests in all three environments, for Wi-HACS and the baseline, are given in Appendix A.

[Figure 3.8: bar charts with the following values (%).]

| Environmental Setting 1 | Wi-HACS (cross-validation) | Baseline (cross-validation) | Wi-HACS (test) | Baseline (test) |
|---|---|---|---|---|
| Average Accuracy | 96.3 | 93.8 | 95.28 | 92.61 |
| Average Precision | 90.5 | 82.5 | 88.65 | 80.97 |
| Average Recall | 90.5 | 82.4 | 88.57 | 80.95 |

| Environmental Setting 2 | Wi-HACS (cross-validation) | Baseline (cross-validation) | Wi-HACS (test) | Baseline (test) |
|---|---|---|---|---|
| Average Accuracy | 90.4 | 85.78 | 89.86 | 84.44 |
| Average Precision | 71.5 | 56.7 | 66.73 | 50.2 |
| Average Recall | 71.3 | 55.7 | 66.19 | 49.78 |

| Environmental Setting 3 | Wi-HACS (cross-validation) | Baseline (cross-validation) | Wi-HACS (test) | Baseline (test) |
|---|---|---|---|---|
| Average Accuracy | 80.33 | 66.94 | 79.75 | 62.3 |
| Average Precision | 45 | 24.5 | 42.6 | 22.14 |
| Average Recall | 44.8 | 24.3 | 42.85 | 21.9 |

Figure 3.8: Average Performance Metrics of Wi-HACS and baseline under three environmental settings: the results are shown for both cross-validations and tests.
[Figure 3.9: nine Q-Q plots (x-axis: Standard Normal Quantiles, y-axis: Quantiles of Input Sample), with panels a(i) Accuracy, a(ii) Precision, a(iii) Recall in Setting 1; b(i) to b(iii) the same metrics in Setting 2; and c(i) to c(iii) the same metrics in Setting 3.]

Figure 3.9: Quantile-Quantile Plot: The x-axis represents quantiles from a normal distribution and the y-axis represents the quantiles drawn from the differences in performance ((i) average accuracy, (ii) average precision and (iii) average recall) between Wi-HACS and the baseline, in (a) setting 1, (b) setting 2, and (c) setting 3.

• A one-tailed paired t-test [49] was conducted on each performance metric over all activity classes, to determine whether the improvements of Wi-HACS over the baseline are statistically significant. This test determines whether the mean difference between two sets of measurements or methods is zero (null hypothesis). An important assumption behind this test is that the differences in measurements (results) between the two methods are approximately normally distributed, although the test is known to be quite robust to deviations from normality [49].
Therefore, to test whether the differences are normally distributed, we produced Quantile-Quantile (Q-Q) plots [50]; the results are shown in Fig. 3.9. A Q-Q plot helps determine whether a set of data plausibly came from some theoretical distribution, such as a normal or exponential distribution. These plots show the quantiles of the differences in performance between Wi-HACS and the baseline against the theoretical quantiles of a normal distribution. If a regression line can represent most of the points, then the performance differences between Wi-HACS and the baseline for each activity class can be considered to be normally distributed. Referring to Fig. 3.9, we conclude that the differences in every performance metric in each environmental setting are approximately normally distributed. After computing the paired-sample t-test, the p-values indicate that our improvements are statistically significant at the 99% confidence level.

3.5 Summary

In this chapter, the classification results of our scheme were presented. We explained the features adopted from the baseline and proposed novel features derived from subcarrier correlations and autospectra. We reviewed the multi-class SVM classifier, the RBF kernel, and the use of cross-validation and grid-search to optimize the cost and gamma parameters. The equipment and environmental settings used to collect data were described. Along with the baseline metrics, we added two other metrics to demonstrate the overall performance of the classifiers. The effect of the DWT and Hampel Identifier de-noising was explained through experimental results. This was followed by the heuristic approach used to select the optimum number of amplitude and phase PCs per TR link. We also illustrated the effect of our proposed features by training and testing the SVM classifier with and without these features.
We compared the performance of Wi-HACS with that of the baseline for all environments and all metrics. Finally, a paired t-test was used to demonstrate that our approach outperforms the baseline with a statistical significance of 99% across the three environmental settings.

Chapter 4

DeepFalls: Using Wi-Fi Spectrograms and Deep Convolution Nets for Fall Detection

This chapter discusses the architecture of DeepFalls, which is designed to address the second limitation stated in Chapter 1: the degradation of CSI-based HAR performance in untrained environments. Given that accurate fall detection for an elderly person is crucial to his/her well-being, our research objective is to devise a model that can improve the detection of human falls in untrained environments. In Chapter 3, we found that SVM classifiers trained in a given environment performed poorly when tested in different environments. This is because Wi-Fi signal propagation depends on the environment, and hence the feature values extracted in one environment may be quite different from those calculated in a different environment. Therefore, it is a challenge to extract features that are independent of the environment. Thus, we utilize the Convolutional Neural Network (CNN) [51] to extract features automatically. Since the CNN is used for image classification, we discuss the techniques that were used to transform the Wi-Fi signals into spectrograms. First, we discuss the signal pre-processing techniques adopted from our previous work, Wi-HACS, and provide a framework for DeepFalls. We then discuss the Singular Spectral Analysis (SSA) as an alternative de-noising technique to the DWT. This is followed by an analysis of the Short Time Fourier Transform (STFT) and the Continuous Wavelet Transform (CWT) spectrograms.
As the Wi-Fi signals for fall and fall-like activities are similar, so are the spectrograms obtained by these transformations for these two types of activities. Thus, we utilize the Hilbert Huang Transform (HHT) to provide higher resolution spectrograms that can distinguish falls from fall-like activities. However, since the HHT is based on the Empirical Mode Decomposition (EMD), which has its limitations, we analyze some variants of EMD. We then describe some modifications to the signal segmentation method used in Wi-HACS so that it segments only fall events (fall and fall-like activities) from non-fall events.

4.1 DeepFalls Framework

In DeepFalls, we adopt some of the signal pre-processing techniques used in Wi-HACS, as described in Chapter 2. The base signals used in DeepFalls are only the CSI amplitudes from four TR links, and we leave out the phases for future exploration. The signals are first linearly interpolated to ensure equal spacing between data points. Then the Hampel Identifier is used to remove outliers in the signal. This is followed by de-trending, zero-padding, and tapering, as the signal segmentation technique is based on the frequencies of the signals. We adopt the signal segmentation scheme proposed in Section 2.5 and modify it to only segment fall and fall-like activities. Once the signals are segmented, they are de-noised by Singular Spectral Analysis (SSA) and then decomposed using a recent variant of the Empirical Mode Decomposition (EMD). Then, the Hilbert transform of the decomposed signals (Intrinsic Mode Functions) is used to create spectrogram images, which are fed into a Convolutional Neural Network (CNN) for classification.
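
As an illustration only, the first pre-processing steps listed above (linear interpolation, Hampel outlier removal, de-trending and tapering) might be sketched in Python with NumPy as follows. The function names, the Hampel window length, the 3-sigma threshold, the linear trend model and the Hann taper are all assumptions made for this sketch and are not taken from the thesis; zero-padding is left out for brevity.

```python
import numpy as np

def hampel(x, half_window=5, n_sigmas=3.0):
    """Replace outliers by the local median (Hampel identifier).

    half_window and n_sigmas are illustrative defaults (assumptions).
    """
    x = np.asarray(x, dtype=float)
    y = x.copy()
    k = 1.4826  # scales the MAD to a standard deviation for Gaussian data
    for i in range(len(x)):
        lo, hi = max(0, i - half_window), min(len(x), i + half_window + 1)
        window = x[lo:hi]
        med = np.median(window)
        mad = k * np.median(np.abs(window - med))
        if np.abs(x[i] - med) > n_sigmas * mad:
            y[i] = med
    return y

def preprocess(t, amp, fs=100.0):
    """Resample to a uniform grid, remove outliers, de-trend and taper."""
    t = np.asarray(t, dtype=float)
    n = int(np.floor((t[-1] - t[0]) * fs)) + 1
    t_uniform = t[0] + np.arange(n) / fs
    x = np.interp(t_uniform, t, amp)                # linear interpolation
    x = hampel(x)                                   # outlier removal
    slope, intercept = np.polyfit(t_uniform, x, 1)  # linear trend (assumed model)
    x = x - (slope * t_uniform + intercept)         # de-trending
    x = x * np.hanning(len(x))                      # tapering (assumed Hann taper)
    return t_uniform, x
```

Calling preprocess(t, amp) on the packet timestamps and amplitudes of one subcarrier would return a uniformly sampled, outlier-free, de-trended and tapered signal.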

4.2 Singular Spectral Analysis based noise attenuation

Although DWT-based noise attenuation improved the classification performances in Wi-HACS (Section 3.4.3), the choice of the mother wavelet (‘db-8’) was based on the results of ECG de-noising [33]. Although our signals are different from those of ECG, we utilized the ‘db-8’ as a starting point in our signal de-noising. Since our classification results were better than the baseline [17] in all three environments, we decided not to explore other mother wavelets. Given that there are many wavelets to choose from, and since our objective is to improve classification, a heuristic search method would be to compare the classification performances using different wavelet functions. Since we utilize Deep CNNs, which require a significant amount of training time, searching for the optimal wavelet heuristically is infeasible. Therefore, we investigate the Singular Spectral Analysis (SSA) [52] to avoid this heuristic search. The only parameter to optimize is the number of singular values used to reconstruct the signal, which can be chosen without measuring the classification performance. Therefore, the SSA is chosen over the DWT to de-noise the CSI amplitudes. Applications of SSA include finding trends at different resolutions, extracting periodicities with varying amplitudes, and noise attenuation. It has been utilized in the fields of climate analysis, meteorological studies, astronomy, medicine and economics. A further motivation to use SSA for noise attenuation is that it is a non-parametric technique and does not require any statistical assumptions (stationary/non-stationary, Gaussian/non-Gaussian, linear/non-linear) regarding the time-series data [52].

There are two stages in the SSA algorithm: Decomposition and Reconstruction.
In the first stage, the data is arranged in a trajectory matrix by a process known as Embedding. Then this matrix is decomposed by the Singular Value Decomposition (SVD) to obtain the singular spectrum. In the second stage, the rank of the trajectory matrix is reduced (referred to as Rank Reduction in the literature) and then the noise-attenuated signal is reconstructed from the rank-reduced trajectory matrix (also known as Diagonal Averaging). The detailed procedure is given in the following subsections.

4.2.1 Embedding

Let $s[n] = (s_1, s_2, \dots, s_N)$ be a 1-D time-series signal of length $N$. The embedding technique refers to mapping this 1-D signal to a multidimensional space by creating lagged vectors of $s[n]$. The length of these lagged vectors, $L$, where $1 < L < N$, is called the embedding dimension. The number of lagged vectors is $K = N - L + 1$. Each lagged vector is of the following form:

$$l_i = (s_i, s_{i+1}, \dots, s_{i+L-1})^T, \quad 1 \le i \le K, \tag{4.1}$$

where $[\,\cdot\,]^T$ represents the matrix transpose. These lagged vectors can be arranged to form a trajectory matrix, $M = (l_1, l_2, \dots, l_K)$, which is represented as

$$M = \begin{pmatrix} s_1 & s_2 & \cdots & s_K \\ s_2 & s_3 & \cdots & s_{K+1} \\ \vdots & \vdots & \ddots & \vdots \\ s_L & s_{L+1} & \cdots & s_N \end{pmatrix} \tag{4.2}$$

From Equation (4.2) it is observed that the ascending skew diagonals have the same values. The only parameter to select in this step is the embedding dimension $L$. According to [53], the results of SSA de-noising are not very sensitive to the choice of $L$ as long as $N$ is sufficiently larger than $L$, and [53] recommends using $L = N/4$.

4.2.2 Singular Value Decomposition (SVD)

In this step, SVD is applied to the trajectory matrix $M$. Denoting the eigenvalues of $MM^T$ in descending order as $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_L \ge 0$ and the corresponding eigenvectors as $(U_1, U_2, \dots, U_L)$, the matrix after SVD can be represented as [54]

$$M = M_1 + M_2 + \dots + M_d \tag{4.3}$$

where $d = \max\{i : \lambda_i > 0\}$ is the rank of $M$, $M_i = \sqrt{\lambda_i}\, U_i V_i^T$ $(i = 1, 2, \dots, d)$ are called the elementary matrices, and $V_i = M^T U_i / \sqrt{\lambda_i}$. The matrix notation of Equation (4.3) is

$$M = U \Sigma V^T \tag{4.4}$$

where $\Sigma$ is the diagonal matrix containing the singular values in descending order. The contribution of these elementary matrices to the trajectory matrix $M$ is determined by the ratio $\eta_i$ of each singular value to the sum of all singular values:

$$\eta_i = \frac{\sqrt{\lambda_i}}{\sum_{j=1}^{L} \sqrt{\lambda_j}} \tag{4.5}$$

After decomposing by SVD the trajectory matrices created from the CSI amplitude signals for different human activities, the contributions of the singular values to the signals are given in Fig. 4.1. The signals are 7 s long (700 samples), and hence the embedding dimension is L = 700/4 = 175. All the activities listed in Fig. 4.1 are performed continuously for 7 s except for the fall and fall-like activities. In these cases, the first 3 seconds correspond to either standing or walking, the fall or fall-like activity is assumed to occur at approximately 4 s, and the remaining portion of the signal corresponds to lying or sitting down. The size of the trajectory matrix M is 175 x 526, and the number of eigenvalues after SVD is 175.

[Figure 4.1 plot. x-axis: Singular value (0 to 160); y-axis: Contribution of the Singular Value (%) (0 to 4.5). Legend: No Human Presence; Fall from standing position; Sit from standing position; Squatting; Standing; Walking; Jogging; Fall from walking position; Sit from walking position; Lie down from walking; Lie down from standing position; Standing and waving.]
The number of singular values is chosen as 0.6 x the total number of singular values.
In this case, this represents 105 singular values.

Figure 4.1: The Singular Spectrum of the CSI amplitude base signals for different human activities in a meeting room.

4.2.3 Rank Reduction

This is the first step in the reconstruction stage. In this step, a smaller number of singular values is retained. The number of singular values taken to reconstruct the signal depends on the application and is usually chosen on a trial-and-error basis [54]. In our research, we are mostly interested in whether the spectrograms obtained from Wi-Fi signals can provide some distinguishable patterns between fall and fall-like activities. Therefore, our first task was to ensure that the STFT spectrograms for fall events⁵ resembled what we reported in Chapter 2; that is, higher energies across the entire frequency band (0-50 Hz) during the fall event, followed by very low energies in the low-frequency bands (0-5 Hz) after the fall event.

⁵ In this thesis, fall events refer to both actual falls and fall-like activities.

[Figure 4.2 plot: two panels (left and right), each with x-axis Time (s) from 0 to 15.]
Figure 4.2: The application of SSA based de-noising on the CSI amplitude for a fall signal (left) and fall-like signal (right). The fall down and sit (fall-like) activity refers to a fall (or sit) followed by lying down or remaining seated for the remaining duration of the signal.

We examined the spectrograms empirically while using different numbers of singular values. It was observed that reconstructing the CSI amplitudes using the 60% most contributing singular values was sufficient, and thus all CSI amplitudes de-noised by SSA were reconstructed using 0.6 x L singular values.
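
A minimal NumPy sketch of the whole SSA procedure is given here for illustration, assuming the embedding dimension L = N/4 and the 60% rule stated above. It also carries out the rank reduction of Equation (4.6) and the diagonal averaging of Section 4.2.4. The function name ssa_denoise and its arguments are hypothetical, and this is not the thesis implementation.

```python
import numpy as np

def ssa_denoise(s, L=None, keep=0.6):
    """De-noise a 1-D signal by SSA (hypothetical helper, sketch only).

    Returns the reconstructed signal and the contribution of each
    singular value as in Equation (4.5).
    """
    s = np.asarray(s, dtype=float)
    N = len(s)
    if L is None:
        L = N // 4                       # embedding dimension, L = N/4
    K = N - L + 1                        # number of lagged vectors
    # Embedding: column i is the lagged vector (s_i, ..., s_{i+L-1})^T
    M = np.column_stack([s[i:i + L] for i in range(K)])
    # SVD of the trajectory matrix; sigma holds sqrt(lambda_i), descending
    U, sigma, Vt = np.linalg.svd(M, full_matrices=False)
    contribution = sigma / sigma.sum()
    # Rank reduction: keep the k largest singular values
    k = max(1, int(round(keep * L)))
    Mk = (U[:, :k] * sigma[:k]) @ Vt[:k, :]
    # Diagonal averaging: entry (i, j) of Mk estimates sample i + j
    recon = np.zeros(N)
    counts = np.zeros(N)
    for i in range(L):
        recon[i:i + K] += Mk[i, :]
        counts[i:i + K] += 1
    return recon / counts, contribution
```

For a 700-sample signal this builds a 175 x 526 trajectory matrix and, with keep=0.6, retains 105 singular values, matching the numbers quoted in the text.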

After choosing the number of singular values to reconstruct the signal, the rank of the trajectory matrix is reduced and is written as

$$M_k = U_k \Sigma_k V_k^T \tag{4.6}$$

where $k$ is the reduced rank of the trajectory matrix.

4.2.4 Diagonal Averaging

The signal can be reconstructed by averaging the skew-diagonal values in the rank-reduced matrix $M_k$ [55]. This operation completes the noise-attenuation procedure using SSA, and the results are plotted in Fig. 4.2. It can be observed that the signals are relatively smoother, while the high-frequency components are preserved. This is important because fall and fall-like signals have both high and low frequency components.

4.3 Time Frequency Localization

In this section, the techniques to obtain spectrogram images that localize both time and frequency are discussed. We utilize the Fourier and Wavelet-based transformations and discuss their limitations in differentiating a fall from a fall-like activity. We then proceed to the Hilbert Huang Transform (HHT) method and illustrate that this method can produce spectrograms that differentiate falls from fall-like activities better than the other two methods.

4.3.1 Short Time Fourier Transform (STFT)

The Short Time Fourier Transform (STFT) was developed to compute the Fourier transform of a non-stationary signal. The basic idea is to divide a signal into segments of a pre-determined length, multiply each segment by a window function, and then calculate the FFT of the product. An assumption made by this technique is that the signal contained in each segment is stationary. Since the frequencies are assumed to be constant within each window [56], the choice of the window length is important: shorter windows better capture short-lived high-frequency components, while longer windows better resolve low-frequency components.
In addition, since the window function is typically very small or zero near its boundaries, a portion of the segmented signal may be effectively ignored in the analysis. Therefore, it is necessary to overlap the segments. The percentage of overlap depends on the window function. For windows that are relatively wide in the time domain (such as Hanning), 50% is a commonly used value for the overlap [57]. The STFT spectrograms for a fall and a fall-like activity are given in Fig. 4.3. The sampling frequency is 100 Hz, the length of each segment is 50 samples (0.5 s) with a 50% overlap, and the Hanning window function is used. The area between the two white vertical lines in Fig. 4.3 corresponds to the section of the spectrogram that will be analyzed. The fall and the fall-like events occur at approximately 7 s. Based on our segmentation results for fall events, the segment of the signal for processing is taken to be 1 s before the fall event and 2 s after. This type of windowing of a fall event is adopted from the baseline paper [12]. In Fig. 4.3, it can be observed that the change in energies from high frequencies to low frequencies is very similar for both the fall and the fall-like activity. A similar observation is also reported in the baseline. This effect is especially pronounced when the fall-like activity is performed quickly, for example sitting on a chair quickly. Hence, the STFT is not a very suitable spectrogram to use for distinguishing falls from fall-like activities.

Figure 4.3: The Short Time Fourier Transform for (a) an actual fall, and (b) a fall-like activity. Both events take place at approximately 7 s and are windowed from 1 s before to 2 s after the event.

4.3.2 Continuous Wavelet Transform (CWT)

The wavelet transform technique was introduced to overcome the fixed windowing limitation of the STFT.
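
To make the fixed windowing of the STFT concrete, here is a small NumPy sketch that uses the settings reported for Fig. 4.3 (100 Hz sampling, 50-sample segments, 50% overlap, Hanning window). The function name and the squared-magnitude scaling are assumptions of this sketch, not details taken from the thesis.

```python
import numpy as np

def stft_spectrogram(x, fs=100.0, nperseg=50, overlap=0.5):
    """Return (freqs, times, power) of a Hanning-windowed STFT (sketch)."""
    x = np.asarray(x, dtype=float)
    hop = int(nperseg * (1 - overlap))           # 25 samples for 50% overlap
    window = np.hanning(nperseg)
    starts = np.arange(0, len(x) - nperseg + 1, hop)
    # One row per segment: window the segment, then take its FFT
    frames = np.array([x[s:s + nperseg] * window for s in starts])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)  # 0 to fs/2
    times = (starts + nperseg / 2) / fs           # segment centres in seconds
    return freqs, times, power.T                  # power[f, t]
```

With 50-sample segments at 100 Hz, every frequency bin is 2 Hz wide and every time step is 0.25 s, regardless of frequency. This single fixed resolution is the limitation that the wavelet transform relaxes.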
Another advantage of the wavelet transform is the choice of different basis functions (mother wavelets), whereas the STFT only uses sinusoidal functions. In general, the CWT is used for time-frequency analysis while the DWT is used for noise attenuation. In the CWT [35], the time-series signal is multiplied by a mother wavelet function at various scales and translations, and a wavelet coefficient is computed for each scale and translation. The mother wavelet is a zero-mean function of finite duration, hence supporting time localization in the transformation. In contrast, the sinusoidal basis functions in Fourier transforms are of infinite duration, hence enabling only frequency localization. Typical CWT spectrograms for fall and fall-like activities are shown in Fig. 4.4. In the figure, it can be observed that at low frequencies, the frequency resolution is preserved, while at higher frequencies, the time resolution is preserved. This can also be seen in the fall event, which has both low and high-frequency components. Although the CWT provides a better spectral resolution than the STFT, the regions representing the fall and fall-like activity in Fig. 4.4 still look very similar.

Figure 4.4: The Continuous Wavelet Transform for (a) an actual fall, and (b) a fall-like activity. Both events take place at approximately 7 s and are windowed from 1 s before to 2 s after the event.

4.4 Hilbert-Huang Transform

Since the STFT and the CWT spectrograms do not provide enough differentiation between a fall and a fall-like activity, we explore the use of the Hilbert-Huang Transform (HHT) [18].
The HHT is the National Aeronautics and Space Administration (NASA) designated name for the combination of the Empirical Mode Decomposition (EMD) and the Hilbert spectral analysis.

To obtain the instantaneous frequencies of the activities in the CSI base signals⁶, the Hilbert Transform (HT) [28] is applied to the IMFs generated by the EMD process. Since the IMFs are functions of time, the result of the transform $y(t)$ is given as

$$y(t) = H(\mathrm{IMF}(t)) = \frac{1}{\pi} \int_{-\infty}^{+\infty} \frac{\mathrm{IMF}(\tau)}{t - \tau}\, d\tau \tag{4.7}$$

where $H(\cdot)$ denotes the Hilbert transform. The analytic function $z(t)$ [18] can be written as

$$z(t) = \mathrm{IMF}(t) + i\, y(t) \tag{4.8}$$

The instantaneous amplitude $a(t)$ and instantaneous phase $\theta(t)$ can be derived from $z(t)$ as

$$a(t) = \sqrt{\mathrm{IMF}^2(t) + y^2(t)} \tag{4.9}$$

$$\theta(t) = \arctan\!\left(\frac{y(t)}{\mathrm{IMF}(t)}\right) \tag{4.10}$$

Using $\theta(t)$, the instantaneous frequency $F(t)$ can be derived as

$$F(t) = \frac{d\theta(t)}{dt} \tag{4.11}$$

⁶ In DeepFalls, only the amplitudes of the CSI base signals are used.

4.4.1 Empirical Mode Decomposition (EMD)

[Figure 4.5 plot: eight panels (Original CSI Amplitude, IMF #1 to IMF #7), each showing Amplitude versus Time (s) from 0 to 15 s.]
Figure 4.5: The IMFs (black) generated by the EMD process for a CSI amplitude signal (red) for a series of activities: walking for 8 seconds, then a fall activity, then lying down until 15 s.

The Empirical Mode Decomposition (EMD) is a data-driven technique that decomposes signals into a finite set of fast and slow oscillation functions called Intrinsic Mode Functions (IMFs) [19].
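
Equations (4.7) to (4.11) can be illustrated numerically once an IMF is available. The sketch below builds the discrete analytic signal with the FFT, which is one common way to approximate the Hilbert transform of a sampled signal; the function names and this discrete construction are assumptions of the sketch and not necessarily what was used in the thesis. Two details differ from the continuous formulas: np.angle is the four-quadrant form of the arctangent in Equation (4.10), and the phase derivative is divided by 2π so that the frequency is expressed in Hz.

```python
import numpy as np

def analytic_signal(x):
    """Discrete analytic signal z = x + i*H(x), built in the frequency domain."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0                         # keep the DC term
    if n % 2 == 0:
        h[n // 2] = 1.0                # keep the Nyquist term
        h[1:n // 2] = 2.0              # double the positive frequencies
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * h)          # negative frequencies are zeroed

def instantaneous(imf, fs=100.0):
    """Instantaneous amplitude, phase (rad) and frequency (Hz) of one IMF."""
    z = analytic_signal(imf)                   # Eq. (4.8)
    a = np.abs(z)                              # Eq. (4.9)
    theta = np.unwrap(np.angle(z))             # Eq. (4.10), unwrapped
    f = np.diff(theta) * fs / (2.0 * np.pi)    # Eq. (4.11), in Hz
    return a, theta, f
```

Plotting the instantaneous frequency against time for every IMF, with the instantaneous amplitude as intensity, would give a Hilbert spectrogram of the kind discussed in this chapter.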
For a signal to be considered as an IMF, two conditions must be fulfilled:
(i) The number of extrema (maxima and minima) and the number of zero-crossings of the signal must be equal or differ at most by one, and
(ii) The mean of the upper and lower envelopes of the signal must be zero.
For a signal $s[n]$, the algorithm outlining the EMD process is given below [19].

Algorithm 4.1 Pseudocode for the EMD Algorithm
1: Set $k = 0$ and locate the extrema of $r_0 = s[n]$.
2: Interpolate between the extrema to calculate the upper and lower envelopes, $e_{max}$ and $e_{min}$ respectively.
3: Calculate the envelope mean, $m = \frac{e_{max} + e_{min}}{2}$.
4: Calculate the IMF candidate, $\mathrm{IMF}_{k+1}[n] = r_k[n] - m$.
5: if $\mathrm{IMF}_{k+1}[n]$ satisfies the IMF conditions then
6:     Save $\mathrm{IMF}_{k+1}[n]$ and calculate the residue, $r_{k+1}[n] = s[n] - \sum_{i=1}^{k+1} \mathrm{IMF}_i[n]$.
7:     Increment $k = k + 1$, and assign $r_k[n]$ as input to step 2.
8: else
9:     Assign $\mathrm{IMF}_{k+1}[n]$ as input to step 2.
10: end if
11: Continue until the final residue $r_k[n]$ is a monotonic function.

When the EMD algorithm is applied to a CSI amplitude signal captured for 15 seconds with a sampling rate of 100 Hz, it results in 14 IMFs (also known as modes). The first seven IMFs (shown in black) are plotted in Fig. 4.5. The original CSI amplitude signal corresponds to walking for the first 8 seconds, followed by a fall, and then lying down for the remaining duration of the signal. In the IMFs plotted in Fig. 4.5, it can be observed that the lower-indexed IMFs consist of the higher frequency components of the CSI signal, whereas the higher-indexed IMFs contain the lower frequency components of the signal. This is because in EMD, whenever an IMF is generated for any given stage k, the next step of the algorithm calculates the residue by subtracting the IMFs generated until stage k from the original signal. In Section 2.3.5, we reported that out-of-place activities such as walking and fall events occupy the entire frequency band (0-50 Hz).
The difference is that for out-of-place activities, the entire frequency band has high energies for the duration of the activity, whereas for fall events, only a fraction of a second contains high energies across the entire frequency band, followed by high energies in very low-frequency bands, indicating a lying down activity. In our research, we assume that after a fall event, the victim lies down on the floor for at least 2 seconds. In Fig. 4.5, it can be seen that the first two IMFs have very fast oscillations with high amplitudes until 8-9 s, followed by very low amplitudes. The higher-indexed IMFs contain slower oscillations with higher amplitudes compared to the lower-indexed IMFs. However, we can also observe that, for the first three IMFs, the oscillations within the same IMF have very different amplitudes during 0-8 s. Moreover, some oscillations are repeated in different IMFs; for instance, the oscillations in IMF 6 between 12-15 s are also present in IMF 7 during the same time. This is referred to as the "mode-mixing" problem in EMD [58]. Mode-mixing refers to oscillations of very different amplitudes in an IMF or the presence of similar oscillations in different IMFs. The mode-mixing patterns described for Fig. 4.5 are not consistent, as for different human activities and different environments we observe mode-mixing across different IMF components in different sections. To overcome the mode-mixing problem, we examine whether the Ensemble Empirical Mode Decomposition (EEMD) can alleviate it.

4.4.2 Ensemble Empirical Mode Decomposition (EEMD)

To overcome the limitations exhibited by the EMD, a new method called the Ensemble Empirical Mode Decomposition (EEMD) [59] was proposed.
This technique performs the EMD over an ensemble of copies of the signal, each with added white Gaussian noise. The algorithm for the EEMD is given below, where $w^i[n]$ are white Gaussian noise samples $\mathcal{N}(0,1)$ for $i = 1, 2, \dots, I$ realizations.

Algorithm 4.2 Pseudocode for the EEMD Algorithm
1: Generate $s^i[n] = s[n] + \varepsilon\, w^i[n]$.
2: Decompose each $s^i[n]$ by EMD (Algorithm 4.1) to obtain $\mathrm{IMF}^i_k[n]$, where $k = 1, 2, \dots, K$ denotes the IMF number.
3: Assign $\mathrm{IMF}_k$ as the kth mode of $s[n]$ by averaging the corresponding $\mathrm{IMF}^i_k[n]$ over the $I$ ensembles, $\mathrm{IMF}_k[n] = \frac{1}{I} \sum_{i=1}^{I} \mathrm{IMF}^i_k[n]$.

In Algorithm 4.2, the recommended value of $\varepsilon$ is 20% of the standard deviation of the signal [60]. When the EEMD was applied several times to the same CSI amplitude signal used in Section 4.4.1, the mode-mixing problems for the initial IMFs were observed to be alleviated. However, the number of IMFs produced each time was not consistent. This is because, for each EEMD trial, different realizations of the signal plus the added noise produced different numbers of IMFs. In the algorithm, each $s^i[n]$ is decomposed independently of the other realizations, and for every one of them, a residue $r^i_k[n] = r^i_{k-1}[n] - \mathrm{IMF}^i_k[n]$ is obtained at each stage, which has no connection to the other realizations. As a result, the final averaging step in Algorithm 4.2 involved different numbers of IMFs. Furthermore, the signal reconstructed using the IMFs and the final residue generated by the EEMD contains errors: although EEMD alleviates the mode-mixing problem to some extent, it produces several low-frequency IMFs whose frequencies do not match the frequencies of the original signal.

4.4.3 Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN)

The Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) [58] was formulated to solve the limitations in the EEMD.
By denoting the EMD operator which produces the jth mode by Algorithm 4.1 as $E_j(\cdot)$, the algorithm for the CEEMDAN is given below.

Algorithm 4.3 Pseudocode for the CEEMDAN Algorithm
1: Generate $s^i[n] = s[n] + \varepsilon_0 w^i[n]$, obtain only the first IMF for each realization by the EMD (Algorithm 4.1), and calculate $\mathrm{IMF}_1[n] = \frac{1}{I} \sum_{i=1}^{I} \mathrm{IMF}^i_1[n]$.
2: In the first stage ($k = 1$), calculate the first residue, $r_1[n] = s[n] - \mathrm{IMF}_1[n]$.
3: Decompose the realizations $r_1[n] + \varepsilon_1 E_1(w^i[n])$, $i = 1, 2, \dots, I$, until the first mode in this step and assign the average as the second IMF, $\mathrm{IMF}_2[n] = \frac{1}{I} \sum_{i=1}^{I} E_1\big(r_1[n] + \varepsilon_1 E_1(w^i[n])\big)$.
4: Calculate the kth residue (where $k = 2, \dots, K$), $r_k[n] = r_{k-1}[n] - \mathrm{IMF}_k[n]$.
5: Decompose $r_k[n] + \varepsilon_k E_k(w^i[n])$ until the first IMF in this step and define the next IMF as $\mathrm{IMF}_{k+1}[n] = \frac{1}{I} \sum_{i=1}^{I} E_1\big(r_k[n] + \varepsilon_k E_k(w^i[n])\big)$.
6: Return to step 4 for the remaining k.
7: Continue steps 4-6 until the residue no longer satisfies the IMF criteria.
8: Calculate the final residue $R[n] = s[n] - \sum_{k=1}^{K} \mathrm{IMF}_k[n]$.

The resulting IMFs and the final residue obtained by any EMD process can be used to reconstruct the signal [58]

$$s[n] = \sum_{k=1}^{K} \mathrm{IMF}_k[n] + R[n] \tag{4.12}$$

where $R[n]$ is the final residue that is either a monotonic function or a constant. The IMFs decomposed by the CEEMDAN method for the CSI signal used in Fig. 4.5 are shown in Fig. 4.6.

[Figure 4.6 plot: eight panels (Original CSI Amplitude, IMF #1 to IMF #7), each showing Amplitude versus Time (s) from 0 to 15 s.]
Figure 4.6: The IMFs (black) generated by the CEEMDAN process for a CSI amplitude signal (red) for a series of activities: walking for 8 seconds, then a fall activity, then lying down until 15 s.

It can be observed that most of the mode-mixing problems are now alleviated to a large extent.
In addition, the IMFs do not have the end effects observed in Fig. 4.5, where the beginning and end of the IMFs consisted of oscillations with very high amplitudes. The Hilbert spectrograms of the IMFs produced by the CEEMDAN are plotted in Fig. 4.7. It can be observed that for the fall activity, there are some very high frequencies with higher energies compared to the window of the fall-like activity. In other words, the quick motion of an actual fall is captured in the HHT spectrogram. In training the CNN, the images were windowed to only contain frequencies between 10-40 Hz, as this was sufficient to obtain good accuracy.

[Figure 4.7 panels: (a) Fall Activity, (b) Fall like Activity.]
Figure 4.7: The HHT based on the CEEMDAN of the CSI amplitudes for the (a) fall, and (b) fall-like activities. Both events take place at approximately 7 s and are windowed from 1 s before to 2 s after the event. The colorbar represents the magnitude of the frequencies. The bottom figures represent the actual part of the image used for training and classification.

[Figure 4.8 plot. x-axis: Time (s), 0 to 50; y-axis: Amplitude (dB), -10 to 15. Activity labels: No human presence; Open door and enter room; Sit down; Walk around; Perform 4 squats; Stand still; Jog in room; Fall-down; Stand up. Annotations: Window 1, Window 2, Window 3; "Both windows contain f_L; proceed to next window"; "Fall-event: both windows combined for further processing"; "The blue window consisting of two windows (red and green) is the fall-event detection."]
Figure 4.8: Adaptive Windowing based on the amplitude of FFT coefficients

4.5 Modified Signal Segmentation

The novel signal segmentation proposed in Section 2.5 segmented several different human activities, which also included fall events.
However, in DeepFalls, since we are only classifying falls from fall-like activities, we modified the signal segmentation to only segment fall events. The figure used in Section 2.5 is re-plotted in Fig. 4.8. The cases considered are the same as before, except that the decisions are changed. The cases considered were:

Case 1: Both windows $w_1$ and $w_2$ contain $f_L$:
In this case, proceed to the next window without overlapping.

Case 2: $w_1$ contains $f_L$ and $w_2$ contains $f_U$:
Proceed to the next window.

Case 3: Both windows $w_1$ and $w_2$ contain both $f_L$ and $f_H$:
In this case, a third window $w_3$ of length 200 samples (2 s) is computed for the next part of the signal without any overlap. This is done to estimate the frequency components of $w_3$ to identify whether a fall event has occurred in $w_3$. This is important because, as explained above, fall events first occupy higher spectral bands corresponding to a rapid movement, followed by occupation of lower spectral bands corresponding to the person lying down. Thus there will be two specific cases in this regard:
(i) If $w_3$ contains $f_L$: Identify that there is a fall event which occurred in $w_2$, merge $w_2$ and $w_3$ to form $w_4$, and calculate features from this combined window of length 300 samples (3 s).
(ii) If $w_3$ contains both $f_L$ and $f_H$: This implies no fall event occurred in $w_3$. Then $w_3$ is relabeled as $w_1$, the next window $w_2$ overlaps 1 s of $w_1$ as in the usual case, and the two windows are evaluated on a case-by-case basis.

Case 4: $w_1$ contains both $f_H$ and $f_L$ and $w_2$ contains $f_L$:
This clearly indicates that a ‘fall-event’ has taken place. In this case the two windows will be merged with a total length of 3 s.

4.6 Summary

The main objective of DeepFalls is to accurately distinguish human falls from fall-like activities in untrained environments.
The motivation behind our research is that a major limitation of CSI-based HAR systems is that their performance is environment dependent. This means the classifier needs to be re-trained in every new environment to predict accurately. The propagation of Wi-Fi signals varies in different environments, and therefore the features extracted in one environment may not be suitable for accurately classifying activities in a different environment. Our motivation is to transform the CSI signals into spectrograms so that Convolutional Neural Networks can extract features automatically. Our work is further motivated by the importance of accurate fall detection, especially for elderly people. In this chapter, we first illustrated the framework of our scheme, which included a few signal pre-processing techniques adopted from Section 2.2. We also explored the SSA as an alternative de-noising technique and explained the choice of the parameters. We then illustrated the limitations of the STFT and CWT spectrograms and proposed the HHT to obtain improved spectrograms to distinguish falls from fall-like activities. Since the HHT is based on the Hilbert spectra of the modes produced by the EMD, we tried different EMD algorithms and discussed the reasons for selecting the CEEMDAN algorithm. Finally, the signal segmentation scheme from Wi-HACS was modified to segment the fall events only.

Chapter 5
DeepFalls Performance Analysis

In this chapter, we show that a Deep Convolutional Neural Network (DCNN) can be trained to operate directly on the spectrograms of the CSI amplitudes to learn features automatically. Our hypothesis is that a DCNN trained this way can decide which features are most effective in classifying falls from fall-like activities so that it can improve the detection of human falls in untrained environments. We first describe the DCNN components and the architecture used in our research. We then outline the data collection procedure, including the hardware and environments.
This is followed by a discussion of the baseline method and the performance metrics used in our analysis. We then compare the performances of DeepFalls and the baseline in trained and untrained environments. We find that the DCNN has the ability to distinguish falls from fall-like activities when the furniture positions are changed, without the need to re-train the classifier.

5.1 Deep Convolutional Neural Network

Although the ideas behind deep learning go back a few decades, it has recently gained a lot of attention due to its excellent performance in various application domains, such as speech recognition [61], image classification [62], and natural language processing [63]. Due to advancements in Graphics Processing Unit (GPU) technology and improved algorithmic efficiencies, deep networks, which consist of multiple layers of neurons, can be trained more quickly, resulting in newer deep learning applications. An overview of deep learning algorithms can be found in [64]. One of the most successful deep learning algorithms is the Deep Convolutional Neural Network (DCNN), which is a deeper representation of the classical Convolutional Neural Network (CNN) developed in [51].

The basic architecture of a CNN consists of a convolutional layer, a non-linear activation function, a pooling layer and a fully connected layer. The purpose of convolution is to extract features from the input image. Then a non-linear activation function such as the Rectified Linear Unit (ReLU) [65] is applied to the pixels in the feature maps. The max pooling operation [62] defines a spatial neighborhood in the rectified feature maps by specifying a window size, and takes the largest value from the window. Finally, the pixel values from the last max-pooled feature maps are individually passed as inputs to a fully connected Artificial Neural Network (ANN), which may consist of one or more hidden layers, with output neurons to predict the classes.
This fully connected ANN is also referred to as the fully connected (FC) layers in a CNN [51].

The DCNN architecture used in DeepFalls is given in Fig. 5.1. The proposed DCNN model consists of three convolutional layers, three max-pooling layers, and four fully connected layers. The spectrograms used as inputs are of dimensions 128 x 128, with pixel values between 0 and 255. The first convolutional layer consists of 64 feature maps. The second and third convolutional layers consist of 128 and 256 feature maps respectively.

[Figure 5.1 diagram: 1st Convolution Layer {64 feature maps}; 1st Pooling Layer (2x2); 2nd Convolution Layer {128 feature maps}; 2nd Pooling Layer (2x2); 3rd Convolution Layer {256 feature maps}; 3rd Pooling Layer (2x2); Fully Connected Layer.]
Figure 5.1: The proposed Deep Convolutional Neural Network architecture.

The feature detectors used to convolve with the input image to form the 1st convolutional layer, and to convolve with the output of the pooling layers to form the subsequent convolution layers, are of dimensions 3x3 with a stride setting of 2. The ReLU activation function is applied to the feature maps from the convolution layers. The windows used to perform the max-pooling operation on the rectified feature maps output by the convolution layers are of dimensions 2x2 with a stride setting of 2. The dimensions and the strides used in our research for the feature detectors are chosen as default values, as given in the literature [51]. The pixel values from the feature maps in the last pooling layer are then used as inputs to the FC. The hidden layers in the FC consist of 128 neurons each.

5.2 Dataset

5.2.1 Hardware and Base Signals

Similar to our previous data collection procedure, we used the Intel 5300 Linux firmware [5] to extract the CSI data.
The transmitter and receiver devices were the same as described in Section 3.3, that is, an Asus RT-N600 router and a Dell Latitude E600 laptop, respectively. However, this time we collected data across two transmit and two receive antennas (4 TR links), as we could not get measurements using the third antenna of the NIC. The base signals used in DeepFalls are only the amplitudes, whereas those used in the baseline [12] are the amplitudes and phase differences. We did not use the phase differences because doing so would increase the training time. Also, there are several possible combinations of amplitudes, phases and phase differences, and finding the best combination depends on the classification results. We started with only the amplitudes, and in the following sections we demonstrate that they are sufficient to improve the classifications in untrained environments. We leave the other combinations as future work.

5.2.2 Data Collection Procedure

Currently, there is no publicly available CSI-based fall detection dataset. The data were obtained under the same approval received from the UBC Behavioural Research Ethics Board (BREB) for our previous dataset. The environments used to collect data are a large meeting room and an apartment, as shown in Fig. 5.2. In addition, we also collected data after changing some furniture positions in the apartment, which we refer to as the third environment⁷. The objective of our research is to improve the classification of human falls from fall-like activities in trained as well as untrained environments.

⁷ In this chapter, the meeting room is referred to as Env 1, the apartment in the original state as Env 2 and the apartment with furniture positions changed as Env 3.
Figure 5.2: Experimental setting for data collection: (a) large meeting room, (b) studio apartment, and (c) same apartment (furniture positions changed).

During the data collection phase, the positions of the devices and the furniture in the environments were unchanged. Although there are various types of human falls, we only considered the ones reported in the baseline [12]. We provided a mattress to protect the volunteers from being injured. In our dataset, the two types of human falls considered were:

(i) Standing-fall, which means the victim falls from a stationary position, such as from a chair, from a bed or while standing (implying a loss of consciousness or balance). We considered two types of standing falls: (a) fall from a chair, and (b) fall from standing.

(ii) Walking-fall, which means that the victim falls from a moving position, such as while walking or tripping. We considered two cases to collect data representing a walking-fall: (a) falling from walking, and (b) falling from jogging.

In our data, the fall activities were performed in forward, backward and sideward motions.
After collecting these data, we grouped them into the class: Human Falls.

Since there are numerous daily activities that resemble a fall, and given the time constraints, we considered the following types of fall-like activities: (i) sit from standing position, (ii) sit from walking position, (iii) lie down from walking position, and (iv) lie down from standing position. The data corresponding to these activities were labeled: Fall-like Activities. The total number of samples⁸ collected for each environment is given in Table 5.1. The samples were divided into 60% for training and 40% for testing.

Table 5.1: Number of human falls and fall-like activities in three environments in our dataset.

Number of samples     Meeting Room (Env 1)    Apartment (Env 2)       Apartment-changed (Env 3)
Activity              Training    Testing     Training    Testing     Training    Testing
Falls                 27          18          18          12          18          12
Fall-like             27          18          18          12          18          12
Total for train/test  54          36          36          24          36          24
Total collected       90                      60                      60

⁸ Samples refer to the CSI data captured continuously for 3 s.

5.3 Results and Discussion

In this section, we discuss the performance of DeepFalls in trained and untrained environments, and then compare our results with the current state-of-the-art CSI-based fall detector [12].

5.3.1 Baseline Method and Performance Metrics

In the baseline paper [12], the authors used one transmit and two receive antennas. However, since we collected data across two transmit and two receive antennas, we calculated their features using all four TR links. Specifically, the amplitude features were calculated across 4 TR links and the features based on phase differences were calculated from 2 TR links. In both cases, five consecutive subcarriers were averaged per TR link, resulting in 6 subcarriers per TR link.
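The subcarrier averaging described above can be illustrated with a short sketch. It assumes 30 reported subcarriers per TR link (inferred from groups of five yielding 6 values) and uses made-up complex CSI values for a single packet; the function names are hypothetical and this is not the baseline authors' code.

```python
def amplitudes(csi_link):
    """Return the amplitude |H_k| of each complex subcarrier value on one TR link."""
    return [abs(h) for h in csi_link]


def average_groups(values, group_size=5):
    """Average consecutive, non-overlapping groups of subcarrier values."""
    if len(values) % group_size != 0:
        raise ValueError("number of subcarriers must be a multiple of group_size")
    return [sum(values[i:i + group_size]) / group_size
            for i in range(0, len(values), group_size)]


# Made-up CSI for one packet on one TR link (assumption): |H_k| = 5 * (k + 1).
csi_link = [complex(3 * (k + 1), 4 * (k + 1)) for k in range(30)]

amp = amplitudes(csi_link)      # 30 amplitudes
reduced = average_groups(amp)   # 6 averaged values for this TR link
print(len(amp), len(reduced), reduced)
```

Repeating the same reduction for each of the four TR links would give 24 amplitude streams per packet, compared with 120 when all subcarriers are kept.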
In DeepFalls, we utilized the amplitudes of all subcarriers as base signals and calculated the Hilbert spectrograms from these to serve as input images to the DCNN.

We adopted the same performance metrics as used in RT-Fall: sensitivity and specificity. Sensitivity is defined as the percentage of correctly detected falls,

$$\text{sensitivity} = \frac{TP}{TP + FN} \tag{5.1}$$

and specificity is defined as the percentage of correctly detected non-falls,

$$\text{specificity} = \frac{TN}{TN + FP} \tag{5.2}$$

where TP, TN, FP, and FN represent the number of true positives, true negatives, false positives and false negatives, respectively.

5.3.2 Performance Evaluation of DeepFalls with Baseline

In this subsection, we discuss the results of DeepFalls and compare them with our baseline, RT-Fall, in three different environments, both by training and testing within each environment and by training in one environment and testing in different ones. RT-Fall uses an SVM classifier, and we trained the SVM model using the parameters described in their paper.

We trained DeepFalls in the following way. We first split the training data into 75% for training and 25% for testing, in order to adjust the layers of the CNN and mitigate overfitting. We computed the spectrogram images using all the subcarriers from the 4 TR links and used these images for training and testing. The training was done on a single NVIDIA K80 GPU with 12 GB RAM on an Amazon Web Services (AWS) server. We will now assess DeepFalls in different environments.

• We start by observing the performances of DeepFalls and RT-Fall when training and testing in the same environment. It can be observed from Fig. 5.3 that DeepFalls has a higher sensitivity (by 7.7%) and specificity (by 11.9%) on average than RT-Fall across all environments. In addition, the performances of both schemes in environment 1 are higher than in the other two environments.
This is because environment 1 is relatively simpler, with less furniture and fewer walls.

• For the simplest environment (Env 1), DeepFalls has a 10% higher sensitivity and 2% higher specificity than RT-Fall. In the first apartment environment (Env 2), DeepFalls has an 8.33% higher sensitivity and 20% higher specificity than RT-Fall. When the furniture positions are changed in the same apartment (Env 3), DeepFalls has a 4.07% higher sensitivity and 14.66% higher specificity than RT-Fall.

• Referring to Fig. 5.3, the differences between the sensitivity and specificity for DeepFalls are much smaller than those for RT-Fall. This illustrates that although changing the furniture positions affects the classifications for both schemes, the effect is more pronounced for RT-Fall.

Figure 5.3: Performances of DeepFalls and RT-Fall after training and testing on each environment separately shown in Fig. 5.2. The performance metrics in (Env 2+3) represent the classification results using data combined from both apartment environments. [Bar chart "Training and testing in each environment"; x-axis: classification metrics, y-axis: performance (%). Sensitivity/specificity for DeepFalls: 93.3/90.9 (Env 1), 83.33/86.67 (Env 2), 87.4/81.33 (Env 3), 87.6/84.2 (Env 2+3); for RT-Fall: 83.3/88.8 (Env 1), 75/66.67 (Env 2), 83.33/66.67 (Env 3), 79.2/73.33 (Env 2+3).]

• We now discuss the results of both schemes after training in one environment and testing in different environments. The results are shown in Fig. 5.4. In Fig. 5.4a, the results are obtained after training in the meeting room environment and testing on data collected from both apartment environments. Although the sensitivity and specificity for DeepFalls are much lower than before, they are higher than those for RT-Fall. The reason for this low performance is that the environment of a meeting room is very different from that of an apartment.
However, when DeepFalls was trained using data from the apartment in the original state (Env 2), and then tested on the same apartment with furniture positions altered, the results indicate that DeepFalls outperforms RT-Fall with an 11.33% improvement in sensitivity and a 14.67% improvement in specificity. Therefore, it can be inferred that the DCNN model extracts features which are not very dependent on the furniture positions, especially for the apartment we used to collect data.

Figure 5.4: Performances of DeepFalls and RT-Fall, (a) after training in meeting room and testing in the apartment environments, (b) after training in apartment (Env 2) and testing in the changed apartment (Env 3). [Bar charts; x-axis: classification metrics, y-axis: performance (%). (a) DeepFalls: sensitivity 69.67, specificity 62.3; RT-Fall: sensitivity 60, specificity 50. (b) DeepFalls: sensitivity 83.33, specificity 79.33; RT-Fall: sensitivity 72, specificity 64.66.]

5.4 Summary

One of the biggest limitations of CSI-based HAR systems is that the models trained in one environment do not perform well in a different environment. This is because the features calculated from one environment may not work well in a different environment, as the signal propagations are different. Therefore, we investigated whether we could utilize a DCNN to extract the features automatically to improve the fall detection in untrained environments. In this chapter, we demonstrated that our scheme, DeepFalls, has an improved performance over the state-of-the-art baseline, RT-Fall, especially in untrained environments. We also demonstrated that DeepFalls outperforms the baseline when trained and tested in the same environment.
Although DeepFalls trained in a meeting room does not perform well in the apartment environments, its results were better than those of RT-Fall. Furthermore, when DeepFalls trained in the apartment environment was tested after changing the furniture positions, the results were similar to those obtained when training and testing in the same apartment setting. In contrast, RT-Fall showed a degraded performance. This illustrates that the DCNN calculates features which work well even when the positions of furniture are changed. Therefore, automatic feature extraction is more suitable for accurately classifying falls from fall-like activities in untrained but similar environments.

Chapter 6

Conclusion and Future Work

Wi-Fi-based HAR can serve a wide range of human-centric applications and usher in a new era of contactless sensing. As most homes today have Wi-Fi enabled hardware, these HAR systems do not need any additional hardware. They do not require the user to carry or wear a device, unlike wearable HAR technologies. In addition, they do not intrude on privacy, do not require a line of sight, and can operate through walls, which overcomes the limitations of vision-based HAR systems. Since Wi-Fi based HAR is a relatively new area of research, we have identified some limitations, and the goal of our research was to address them through various signal processing and machine learning methods. In this chapter, we summarize our research work and discuss some potential areas that can be improved.

6.1 Conclusion

• In Chapter 2, we discussed the foundations of Wi-HACS, which was designed to address the performance degradation of Wi-Fi-based HAR in complex environments. We described how the amplitude and the calibrated phase of the received Wi-Fi signal vary with different human activities. We studied the correlations among different subcarriers, which motivated the novel features introduced in Chapter 3. We discussed the use of several signal processing techniques on the CSI amplitudes and phases.
We also applied the Discrete Wavelet Transform based de-noising method that reduces some of the limitations of existing CSI-based HAR systems. A novel signal segmentation scheme that can accurately segment both human activities and fall events was also proposed.

• We analyzed the classification results of Wi-HACS in Chapter 3. We proposed novel features based on subcarrier correlations and the amplitudes of dominant frequencies in the autospectra of the principal components. We explained the approach taken to tune parameters in the multi-class SVM classifier and reported these values for each environment. Since there is no publicly available dataset, we explained the methodology, hardware and environments used for data collection. We collected and labeled a total of 1260 samples consisting of 7 human activities measured in 3 different environments. We assessed our signal processing techniques and novel features by using the classification performances. We compared our work with a baseline CSI-HAR system in [17]. Our results show that the accuracies, precisions and recalls for all activities and all environments were higher than those of the baseline system. In the simplest environment, the improvements in accuracy, precision and recall were 3%, 8%, and 8.5%, respectively. For the most complex environment, the improvements were 14%, 20%, and 19%, respectively. In addition, the results of a one-tailed paired t-test show that our improvements are statistically significant at the 99% confidence level.

• In Chapter 4, our research objective was to devise a CSI-based fall detector to distinguish human falls from fall-like activities in untrained environments. Since falls are the leading cause of accidental death in the elderly population, the need for accurate fall detectors further motivated our research work. One of the limitations of existing CSI-based fall detectors is that classification performances degrade in untrained environments.
This is because the feature values calculated in one signal environment may be inappropriate in a different environment. Our hypothesis is that automatic feature extraction may be more appropriate. We suggested the use of Singular Spectral Analysis (SSA)-based noise attenuation and Hilbert Huang Transform (HHT)-based spectrograms to distinguish falls from fall-like activities.

• In Chapter 5, we assessed whether the HHT spectrograms and Deep CNN improve classifications in untrained environments. Since there is no CSI-based fall dataset, we collected a total of 210 samples of fall and fall-like activities in three different environments. We showed that DeepFalls outperforms RT-Fall when tested in untrained environments. Our results indicated that changing the furniture positions degraded the sensitivity and specificity of DeepFalls by 4% and 2% respectively, whereas those of RT-Fall degraded by 7% and 9% respectively.

In the next section, we identify potential areas for future work.

6.2 Future Work

• In Chapter 3, we utilized the multi-class SVM classifier to distinguish 7 different human activities. Our motivation to use SVM was mainly that it was used in our baseline CSI-HAR system [17]. It would be interesting to investigate whether the classification performances could be improved by other classification algorithms, such as Artificial Neural Networks (ANNs) [66] and Extreme Learning Machine [67].

• In both of our datasets, each sample corresponds to only one volunteer performing the activity. But in a real-world scenario, there may be more than one person in the same environment. Current CSI-based HAR systems do not consider the possibility of having more than one person in the environment, except for RT-Fall [12]. In their paper, the authors reported that their system can detect falls "reliably" for one person, provided the other person is either sitting or lying down.
However, they did not specify exactly what the performance results are in such cases. It would be useful to first assess the performance of the scheme in two cases: 1) when one person is performing an in-place or out-of-place activity while the other suffers a fall, and 2) when both suffer falls. The distinction matters because if only one person falls, the other is able to ask for help, whereas the situation is more serious if both fall at the same time.

• In Chapter 2, we mentioned that the base signals for Wi-HACS were amplitudes and phases. In Chapter 5, for DeepFalls, we utilized only the amplitudes. In both cases, we started with amplitudes, and we observed that the results of Wi-HACS improve when phases are added as base signals. In the case of DeepFalls, we have not yet studied how the phases would affect the performance. In [12], the authors reported that the features calculated from amplitudes and phase differences improved their results. Since there are several base signals to choose from, it would be interesting to assess the classification performances using features from 1) amplitudes, 2) phases, 3) phase differences, 4) amplitudes and phases, 5) phases and phase differences, and 6) amplitudes and phase differences.

• In our dataset consisting of falls and fall-like activities, we assumed that after a fall event, the victim stays in the same position for at least 2 seconds. Since this may vary depending on the type of fall and the injury caused, we suggest collecting data that does not restrict the volunteer to remaining in the same position for at least 2 seconds, and studying how the classification performance is affected.

Bibliography

[1] E. Kim, S. Helal, and D. Cook, “Human activity recognition and pattern discovery,” IEEE Pervasive Computing, vol. 9, no. 1, 2010.
[2] M. Cornacchia, K. Ozcan, Y. Zheng, and S.
Velipasalar, “A survey on activity detection and classification using wearable sensors,” IEEE Sensors Journal, vol. 17, no. 2, pp. 386–403, 2017.
[3] T. Subetha and S. Chitrakala, “A survey on human activity recognition from videos,” in Information Communication and Embedded Systems (ICICES), 2016 International Conference on. IEEE, 2016, pp. 1–7.
[4] Z. Yang, Z. Zhou, and Y. Liu, “From rssi to csi: Indoor localization via channel response,” ACM Computing Surveys (CSUR), vol. 46, no. 2, p. 25, 2013.
[5] D. Halperin, W. Hu, A. Sheth, and D. Wetherall, “Tool release: Gathering 802.11n traces with channel state information,” ACM SIGCOMM Computer Communication Review, vol. 41, no. 1, pp. 53–53, 2011.
[6] J. Liu, Y. Wang, Y. Chen, J. Yang, X. Chen, and J. Cheng, “Tracking vital signs during sleep leveraging off-the-shelf wifi,” in Proceedings of the 16th ACM International Symposium on Mobile Ad Hoc Networking and Computing. ACM, 2015, pp. 267–276.
[7] X. Liu, J. Cao, S. Tang, and J. Wen, “Wi-sleep: Contactless sleep monitoring via wifi signals,” in Real-Time Systems Symposium (RTSS), 2014 IEEE. IEEE, 2014, pp. 346–355.
[8] R. Nandakumar, B. Kellogg, and S. Gollakota, “Wi-fi gesture recognition on existing devices,” arXiv preprint arXiv:1411.5394, 2014.
[9] C. Han, K. Wu, Y. Wang, and L. M. Ni, “Wifall: Device-free fall detection by wireless networks,” in IEEE INFOCOM 2014-IEEE Conference on Computer Communications.
[10] Y. Wang, K. Wu, and L. M. Ni, “Wifall: Device-free fall detection by wireless networks,” IEEE Transactions on Mobile Computing, vol. 16, no. 2, pp. 581–594, 2017.
[11] D. Zhang, H. Wang, Y. Wang, and J. Ma, “Anti-fall: A non-intrusive and real-time fall detector leveraging csi from commodity wifi devices,” in International Conference on Smart Homes and Health Telematics. Springer, 2015, pp. 181–193.
[12] H. Wang, D. Zhang, Y. Wang, J. Ma, Y. Wang, and S.
Li, “Rt-fall: a real-time and contactless fall detection system with commodity wifi devices,” IEEE Transactions on Mobile Computing, vol. 16, no. 2, pp. 511–526, 2017.
[13] W. Xi, J. Zhao, X.-Y. Li, K. Zhao, S. Tang, X. Liu, and Z. Jiang, “Electronic frog eye: Counting crowd using wifi,” in Infocom, 2014 proceedings ieee. IEEE, 2014, pp. 361–369.
[14] W. Liu, X. Gao, L. Wang, and D. Wang, “Bfp: Behavior-free passive motion detection using phy information,” Wireless Personal Communications, vol. 83, no. 2, pp. 1035–1055, 2015.
[15] K. Qian, C. Wu, Z. Yang, Y. Liu, and Z. Zhou, “Pads: Passive detection of moving targets with dynamic speed using phy layer information,” in Parallel and Distributed Systems (ICPADS), 2014 20th IEEE International Conference on. IEEE, 2014, pp. 1–8.
[16] W. W. Hsieh, Machine learning methods in the environmental sciences: Neural networks and kernels. Cambridge university press, 2009.
[17] Y. Wang, X. Jiang, R. Cao, and X. Wang, “Robust indoor human activity recognition using wireless signals,” Sensors, vol. 15, no. 7, pp. 17 195–17 208, 2015.
[18] N. E. Huang, Hilbert-Huang transform and its applications. World Scientific, 2014, vol. 16.
[19] N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N.-C. Yen, C. C. Tung, and H. H. Liu, “The empirical mode decomposition and the hilbert spectrum for nonlinear and non-stationary time series analysis,” in Proceedings of the Royal Society of London A: mathematical, physical and engineering sciences, vol. 454, no. 1971. The Royal Society, 1998, pp. 903–995.
[20] D. Tse and P. Viswanath, Fundamentals of wireless communication.
Cambridge university press, 2005.
[21] “IEEE Standard for Information technology– Local and metropolitan area networks– Specific requirements– Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications Amendment 5: Enhancements for Higher Throughput,” IEEE Std 802.11n-2009 (Amendment to IEEE Std 802.11-2007 as amended by IEEE Std 802.11k-2008, IEEE Std 802.11r-2008, IEEE Std 802.11y-2008, and IEEE Std 802.11w-2009), pp. 1–565, 2009.
[22] J. Han, J. Pei, and M. Kamber, Data mining: concepts and techniques. Elsevier, 2011.
[23] Y. Murin and R. Dabora, “Efficient estimation of carrier and sampling frequency offsets in ofdm systems,” in Wireless Communications and Networking Conference (WCNC), 2014 IEEE. IEEE, 2014, pp. 440–445.
[24] C. Wu, Z. Yang, Z. Zhou, K. Qian, Y. Liu, and M. Liu, “Phaseu: Real-time los identification with wifi,” in Computer Communications (INFOCOM), 2015 IEEE Conference on. IEEE, 2015, pp. 2038–2046.
[25] W. Wang, A. X. Liu, M. Shahzad, K. Ling, and S. Lu, “Device-free human activity recognition using commercial wifi devices,” IEEE Journal on Selected Areas in Communications, vol. 35, no. 5, pp. 1118–1131, 2017.
[26] L. Davies and U. Gather, “The identification of multiple outliers,” Journal of the American Statistical Association, vol. 88, no. 423, pp. 782–792, 1993.
[27] C. L. Phillips, J. M. Parr, and E. A. Riskin, Signals, systems, and transforms. Prentice Hall, 2013.
[28] S. W. Smith et al., “The scientist and engineer’s guide to digital signal processing,” 1997.
[29] K. Ali, A. X. Liu, W. Wang, and M. Shahzad, “Keystroke recognition using wifi signals,” in Proceedings of the 21st Annual International Conference on Mobile Computing and Networking. ACM, 2015, pp. 90–102.
[30] S. Palipana, D. Rojas, P. Agrawal, and D. Pesch, “Falldefi: Ubiquitous fall detection using commodity wi-fi devices,” Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 1, no. 4, p. 155, 2018.
[31] R.
Polikar, “The wavelet tutorial part iii,” IOWA State University, USA, 1996.
[32] I. Daubechies, Ten lectures on wavelets. Siam, 1992, vol. 61.
[33] B. N. Singh and A. K. Tiwari, “Optimal selection of wavelet basis function applied to ecg signal denoising,” Digital signal processing, vol. 16, no. 3, pp. 275–287, 2006.
[34] W. Xi, D. Huang, K. Zhao, Y. Yan, Y. Cai, R. Ma, and D. Chen, “Device-free human activity recognition using csi,” in Proceedings of the 1st Workshop on Context Sensing and Activity Recognition. ACM, 2015, pp. 31–36.
[35] A. Mertins, “Signal analysis: Wavelets, filter banks, time-frequency transforms and applications, english (revised edition),” 1999.
[36] M. Srivastava, C. L. Anderson, and J. H. Freed, “A new wavelet denoising method for selecting decomposition levels and noise thresholds,” IEEE Access, vol. 4, pp. 3862–3877, 2016.
[37] C. Cai and P. d. B. Harrington, “Different discrete wavelet transforms applied to denoising analytical data,” Journal of chemical information and computer sciences, vol. 38, no. 6, pp. 1161–1170, 1998.
[38] Q. Pan, L. Zhang, G. Dai, and H. Zhang, “Two denoising methods by wavelet transform,” IEEE transactions on signal processing, vol. 47, no. 12, pp. 3401–3406, 1999.
[39] S. Poornachandra, “Wavelet-based denoising using subband dependent threshold for ecg signals,” Digital signal processing, vol. 18, no. 1, pp. 49–55, 2008.
[40] L. Yin, R. Yang, M. Gabbouj, and Y. Neuvo, “Weighted median filters: a tutorial,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 43, no. 3, pp. 157–192, 1996.
[41] W. Wang, A. X. Liu, M. Shahzad, K. Ling, and S. Lu, “Understanding and modeling of wifi signal based human activity recognition,” in Proceedings of the 21st Annual International Conference on Mobile Computing and Networking. ACM, 2015, pp. 65–76.
[42] A. Zhang, B. Yang, and L. Huang, “Feature extraction of eeg signals using power spectral entropy,” in BioMedical Engineering and Informatics, 2008.
BMEI 2008. International Conference on, vol. 2. IEEE, 2008, pp. 435–439.
[43] J. Friedman, T. Hastie, and R. Tibshirani, The elements of statistical learning. Springer series in statistics New York, 2001, vol. 1.
[44] C.-W. Hsu, C.-C. Chang, C.-J. Lin et al., “A practical guide to support vector classification,” 2003.
[45] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg et al., “Scikit-learn: Machine learning in python,” Journal of machine learning research, vol. 12, no. Oct, pp. 2825–2830, 2011.
[46] L. M. Shaw, “An investigation of falls in the elderly,” Journal of Rehabilitation Research and Development, vol. 33, p. 106, 1996.
[47] A. Descoins, “Why accuracy alone is a bad measure for classification tasks, and what we can do about it,” Available: https://tryolabs.com/blog/2013/03/25/why-accuracy-alone-bad-measure-classification-tasksand-what-we-can-do-about-it/ (accessed 8.12.2016), 2013.
[48] S. Suthaharan, “Support vector machine,” in Machine learning models and algorithms for big data classification. Springer, 2016, pp. 207–235.
[49] J. H. McDonald, Handbook of biological statistics. Sparky House Publishing Baltimore, MD, 2009, vol. 2.
[50] Q.-Q. P. R. RUN, “Quantile-quantile plot.”
[51] Y. LeCun, B. E. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. E. Hubbard, and L. D. Jackel, “Handwritten digit recognition with a back-propagation network,” in Advances in neural information processing systems, 1990, pp. 396–404.
[52] V. Oropeza and M. Sacchi, “Simultaneous seismic data denoising and reconstruction via multichannel singular spectrum analysis,” Geophysics, vol. 76, no. 3, pp. V25–V32, 2011.
[53] J. B. Elsner and A. A. Tsonis, Singular spectrum analysis: a new tool in time series analysis. Springer Science & Business Media, 2013.
[54] V. Oropeza, “The singular spectrum analysis method and its application to seismic data denoising and reconstruction,” 2010.
[55] N. Golyandina, V.
Nekrutkin, and A. A. Zhigljavsky, “Analysis of time series structure: Ssa and related techniques (chapman & hall crc monographs on statistics & applied probability),” 2001.
[56] B. L. Barnhart, The Hilbert-Huang transform: theory, applications, development. The University of Iowa, 2011.
[57] G. Heinzel, A. Rüdiger, and R. Schilling, “Spectrum and spectral density estimation by the discrete fourier transform (dft), including a comprehensive list of window functions and some new flat-top windows,” 2002.
[58] M. E. Torres, M. A. Colominas, G. Schlotthauer, and P. Flandrin, “A complete ensemble empirical mode decomposition with adaptive noise,” in Acoustics, speech and signal processing (ICASSP), 2011 IEEE international conference on. IEEE, 2011, pp. 4144–4147.
[59] Z. Wu and N. E. Huang, “Ensemble empirical mode decomposition: a noise-assisted data analysis method,” Advances in adaptive data analysis, vol. 1, no. 01, pp. 1–41, 2009.
[60] M. A. Colominas, G. Schlotthauer, M. E. Torres, and P. Flandrin, “Noise-assisted emd methods in action,” Advances in Adaptive Data Analysis, vol. 4, no. 04, p. 1250025, 2012.
[61] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath et al., “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012.
[62] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.
[63] R. Collobert and J. Weston, “A unified architecture for natural language processing: Deep neural networks with multitask learning,” in Proceedings of the 25th international conference on Machine learning. ACM, 2008, pp. 160–167.
[64] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” nature, vol. 521, no. 7553, p. 436, 2015.
[65] R. Livni, S. Shalev-Shwartz, and O.
Shamir, “On the computational efficiency of training neural networks,” in Advances in Neural Information Processing Systems, 2014, pp. 855–863.
[66] R. J. Schalkoff, Artificial neural networks. McGraw-Hill New York, 1997, vol. 1.
[67] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: theory and applications,” Neurocomputing, vol. 70, no. 1-3, pp. 489–501, 2006.

Appendix A

Confusion matrices and results for Wi-HACS

Confusion matrix for environment 2 (columns: actual data; rows: classifier results)

(a) Confusion matrix for Wi-HACS
                Sit    Stand  Sit from Stand  Walk   Squat  Fall Down  Jog
Sit             22     6      1               1      0      1          1
Stand           5      21     1               1      1      1          1
Sit from Stand  0      0      21              1      0      5          2
Walk            0      0      0               21     5      1          0
Squat           1      0      3               2      22     1          4
Fall Down       2      2      2               0      0      21         1
Jog             0      1      2               3      2      0          21
Total           30     30     30              30     30     30         30

(b) Confusion matrix for Benchmark
                Sit    Stand  Sit from Stand  Walk   Squat  Fall Down  Jog
Sit             17     9      3               2      1      1          1
Stand           6      17     3               2      5      2          3
Sit from Stand  1      0      15              1      1      5          4
Walk            1      0      1               18     5      1          0
Squat           2      1      3               2      16     3          4
Fall Down       3      2      3               1      0      17         1
Jog             0      1      2               4      2      1          17
Total           30     30     30              30     30     30         30

Figure A.1: Confusion matrices for cross-validation results for setting 2: (a) Wi-HACS, (b) Baseline.

Performance metrics for environment 2
                Precision (%)       Recall (%)          Accuracy (%)
Activity        Wi-HACS   Baseline  Wi-HACS   Baseline  Wi-HACS   Baseline
Sit             68.75     50.00     73.33     56.77     90.48     85.71
Stand           67.74     44.74     70.00     56.67     90.48     83.81
Sit from Stand  72.41     55.56     70.00     50.00     89.43     87.14
Walk            77.78     69.23     72.41     60.00     92.38     90.48
Squat           66.67     51.61     73.33     53.33     90.00     86.19
Fall Down       75.00     62.96     70.00     56.67     89.43     82.05
Jog             72.41     62.96     70.00     56.67     90.95     85.05
Average accuracy: Wi-HACS 90.45%, Benchmark 85.78%

Figure A.2: Performance metrics for each activity using the confusion matrices above for setting 2: (a) Wi-HACS, (b) Baseline.
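As a hedged illustration of how per-activity precision and recall follow from a confusion matrix, the sketch below uses the Wi-HACS matrix for environment 2. It assumes that rows hold the classifier results and columns hold the actual data, which is our reading of the figure labels; the function name is hypothetical. Accuracy is left out because its exact definition is not restated in this appendix.

```python
LABELS = ["Sit", "Stand", "Sit from Stand", "Walk", "Squat", "Fall Down", "Jog"]

# Wi-HACS confusion matrix for environment 2.
# Assumed orientation: rows = classifier results, columns = actual data.
WI_HACS_ENV2 = [
    [22, 6, 1, 1, 0, 1, 1],
    [5, 21, 1, 1, 1, 1, 1],
    [0, 0, 21, 1, 0, 5, 2],
    [0, 0, 0, 21, 5, 1, 0],
    [1, 0, 3, 2, 22, 1, 4],
    [2, 2, 2, 0, 0, 21, 1],
    [0, 1, 2, 3, 2, 0, 21],
]


def precision_recall(matrix):
    """Return a list of (precision %, recall %) tuples, one per class."""
    results = []
    for k in range(len(matrix)):
        true_positives = matrix[k][k]
        predicted_as_k = sum(matrix[k])                # row sum
        actually_k = sum(row[k] for row in matrix)     # column sum
        precision = 100.0 * true_positives / predicted_as_k if predicted_as_k else 0.0
        recall = 100.0 * true_positives / actually_k if actually_k else 0.0
        results.append((precision, recall))
    return results


metrics = precision_recall(WI_HACS_ENV2)
for label, (precision, recall) in zip(LABELS, metrics):
    print(f"{label:<15} precision {precision:6.2f}  recall {recall:6.2f}")
```

Under this orientation the computed values agree with the Wi-HACS precision and recall entries listed for environment 2, for example 68.75 and 73.33 for Sit.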
Confusion matrices for environment 3 (columns: actual data; rows: classifier results)

(a) Confusion Matrix for Wi-HACS

| Classifier result | Sit | Stand | Sit from Stand | Walk | Squat | Fall Down | Jog |
|---|---|---|---|---|---|---|---|
| Sit | 14 | 8 | 1 | 2 | 3 | 2 | 1 |
| Stand | 6 | 14 | 2 | 2 | 5 | 2 | 2 |
| Sit from Stand | 2 | 1 | 13 | 2 | 1 | 7 | 4 |
| Walk | 2 | 2 | 2 | 15 | 6 | 2 | 2 |
| Squat | 2 | 1 | 5 | 3 | 13 | 3 | 5 |
| Fall Down | 3 | 2 | 4 | 2 | 1 | 12 | 2 |
| Jog | 1 | 2 | 4 | 4 | 2 | 2 | 14 |
| Total | 30 | 30 | 30 | 30 | 30 | 30 | 30 |

(b) Confusion Matrix for Benchmark

| Classifier result | Sit | Stand | Sit from Stand | Walk | Squat | Fall Down | Jog |
|---|---|---|---|---|---|---|---|
| Sit | 9 | 5 | 3 | 3 | 3 | 2 | 4 |
| Stand | 5 | 6 | 3 | 7 | 5 | 5 | 3 |
| Sit from Stand | 4 | 4 | 7 | 3 | 3 | 5 | 5 |
| Walk | 3 | 4 | 2 | 9 | 6 | 4 | 4 |
| Squat | 4 | 3 | 7 | 3 | 7 | 3 | 5 |
| Fall Down | 3 | 6 | 4 | 1 | 3 | 7 | 3 |
| Jog | 2 | 2 | 4 | 4 | 3 | 4 | 6 |
| Total | 30 | 30 | 30 | 30 | 30 | 30 | 30 |

Figure A.3: Confusion Matrices for cross-validation results for setting 3: (a) Wi-HACS, (b) Baseline.

Performance metrics for environment 3 (all values in %)

| Activity | Precision (Wi-HACS) | Precision (Baseline) | Recall (Wi-HACS) | Recall (Baseline) | Accuracy (Wi-HACS) | Accuracy (Baseline) |
|---|---|---|---|---|---|---|
| Sit | 45.16 | 31.03 | 46.67 | 30.00 | 78.15 | 70.48 |
| Stand | 42.42 | 17.65 | 46.67 | 20.00 | 80.81 | 65.24 |
| Sit from Stand | 43.33 | 22.58 | 41.94 | 23.33 | 78.86 | 67.62 |
| Walk | 48.39 | 28.13 | 50.00 | 30.00 | 83.71 | 69.05 |
| Squat | 40.63 | 21.88 | 41.94 | 23.33 | 80.86 | 67.14 |
| Fall Down | 46.15 | 25.98 | 40.00 | 23.33 | 83.24 | 59.52 |
| Jog | 48.28 | 24.00 | 46.68 | 20.00 | 83.71 | 69.53 |

Average Accuracy: Wi-HACS 80.33%, Benchmark 66.94%

Figure A.4: Performance metrics for each activity using the confusion matrices above for setting 3: (a) Wi-HACS, (b) Baseline.
Confusion matrices for environment 1 (columns: actual data; rows: classifier results)

(a) Confusion Matrix for Wi-HACS

| Classifier result | Sit | Stand | Sit from Stand | Walk | Squat | Fall Down | Jog |
|---|---|---|---|---|---|---|---|
| Sit | 27 | 2 | 0 | 0 | 0 | 1 | 1 |
| Stand | 3 | 27 | 0 | 0 | 0 | 0 | 1 |
| Sit from Stand | 0 | 0 | 26 | 1 | 0 | 3 | 0 |
| Walk | 0 | 0 | 0 | 27 | 2 | 1 | 0 |
| Squat | 0 | 1 | 1 | 1 | 27 | 0 | 1 |
| Fall Down | 0 | 0 | 2 | 0 | 0 | 25 | 0 |
| Jog | 0 | 0 | 1 | 1 | 1 | 0 | 27 |
| Total | 30 | 30 | 30 | 30 | 30 | 30 | 30 |

(b) Confusion Matrix for Benchmark

| Classifier result | Sit | Stand | Sit from Stand | Walk | Squat | Fall Down | Jog |
|---|---|---|---|---|---|---|---|
| Sit | 24 | 4 | 0 | 0 | 0 | 1 | 1 |
| Stand | 4 | 24 | 0 | 1 | 0 | 0 | 1 |
| Sit from Stand | 0 | 0 | 25 | 1 | 0 | 3 | 2 |
| Walk | 0 | 0 | 0 | 24 | 4 | 1 | 0 |
| Squat | 0 | 0 | 2 | 3 | 24 | 0 | 1 |
| Fall Down | 2 | 1 | 2 | 0 | 0 | 25 | 1 |
| Jog | 0 | 1 | 1 | 1 | 2 | 0 | 24 |
| Total | 30 | 30 | 30 | 30 | 30 | 30 | 30 |

Figure A.5: Confusion Matrices for test results in setting 1: (a) Wi-HACS, (b) Baseline.

Performance metrics for environment 1 (all values in %)

| Activity | Precision (Wi-HACS) | Precision (Baseline) | Recall (Wi-HACS) | Recall (Baseline) | Accuracy (Wi-HACS) | Accuracy (Baseline) |
|---|---|---|---|---|---|---|
| Sit | 87.09 | 80.00 | 90.00 | 80.00 | 96.67 | 91.29 |
| Stand | 87.09 | 80.00 | 90.00 | 80.00 | 95.33 | 91.23 |
| Sit from Stand | 86.67 | 80.65 | 86.67 | 83.33 | 94.19 | 93.76 |
| Walk | 90.00 | 82.76 | 90.00 | 80.00 | 96.14 | 94.22 |
| Squat | 87.09 | 80.00 | 90.00 | 80.00 | 95.16 | 94.22 |
| Fall Down | 92.59 | 80.65 | 83.33 | 83.33 | 95.67 | 91.23 |
| Jog | 90.00 | 82.76 | 90.00 | 80.00 | 97.62 | 92.55 |

Average Accuracy: Wi-HACS 95.82%, Benchmark 92.61%

Figure A.6: Performance metrics for each activity using the confusion matrices above in setting 1: (a) Wi-HACS, (b) Baseline.
Confusion matrices for environment 2 (columns: actual data; rows: classifier results)

(a) Confusion Matrix for Wi-HACS

| Classifier result | Sit | Stand | Sit from Stand | Walk | Squat | Fall Down | Jog |
|---|---|---|---|---|---|---|---|
| Sit | 20 | 7 | 2 | 1 | 0 | 1 | 1 |
| Stand | 5 | 20 | 2 | 1 | 3 | 2 | 2 |
| Sit from Stand | 0 | 0 | 19 | 1 | 0 | 5 | 3 |
| Walk | 0 | 0 | 0 | 21 | 5 | 1 | 0 |
| Squat | 2 | 0 | 3 | 2 | 20 | 1 | 4 |
| Fall Down | 3 | 2 | 2 | 1 | 0 | 20 | 1 |
| Jog | 0 | 1 | 2 | 3 | 2 | 0 | 19 |
| Total | 30 | 30 | 30 | 30 | 30 | 30 | 30 |

(b) Confusion Matrix for Benchmark

| Classifier result | Sit | Stand | Sit from Stand | Walk | Squat | Fall Down | Jog |
|---|---|---|---|---|---|---|---|
| Sit | 16 | 9 | 3 | 2 | 1 | 2 | 1 |
| Stand | 6 | 15 | 3 | 2 | 5 | 2 | 2 |
| Sit from Stand | 1 | 1 | 12 | 2 | 1 | 7 | 4 |
| Walk | 2 | 1 | 1 | 16 | 6 | 1 | 1 |
| Squat | 2 | 1 | 4 | 2 | 15 | 3 | 4 |
| Fall Down | 3 | 2 | 4 | 1 | 0 | 14 | 2 |
| Jog | 0 | 1 | 3 | 4 | 2 | 1 | 16 |
| Total | 30 | 30 | 30 | 30 | 30 | 30 | 30 |

Figure A.7: Confusion Matrices for test results in setting 2: (a) Wi-HACS, (b) Baseline.

Performance metrics for environment 2 (all values in %)

| Activity | Precision (Wi-HACS) | Precision (Baseline) | Recall (Wi-HACS) | Recall (Baseline) | Accuracy (Wi-HACS) | Accuracy (Baseline) |
|---|---|---|---|---|---|---|
| Sit | 62.5 | 47.06 | 66.67 | 53.33 | 89.05 | 81.76 |
| Stand | 57.14 | 42.86 | 66.67 | 50.00 | 87.62 | 82.86 |
| Sit from Stand | 67.86 | 42.86 | 63.33 | 40.00 | 89.00 | 83.81 |
| Walk | 77.77 | 57.14 | 70.00 | 55.17 | 92.86 | 88.10 |
| Squat | 62.5 | 48.39 | 66.67 | 50.00 | 89.00 | 80.24 |
| Fall Down | 68.97 | 53.85 | 66.67 | 46.67 | 90.96 | 86.67 |
| Jog | 70.37 | 59.26 | 63.33 | 53.33 | 90.47 | 87.62 |

Average Accuracy: Wi-HACS 89.86%, Benchmark 84.44%

Figure A.8: Performance metrics for each activity using the confusion matrices above in setting 2: (a) Wi-HACS, (b) Baseline.
Confusion matrices for environment 3 (columns: actual data; rows: classifier results)

(a) Confusion Matrix for Wi-HACS

| Classifier result | Sit | Stand | Sit from Stand | Walk | Squat | Fall Down | Jog |
|---|---|---|---|---|---|---|---|
| Sit | 14 | 8 | 1 | 4 | 3 | 2 | 3 |
| Stand | 5 | 13 | 1 | 2 | 5 | 2 | 2 |
| Sit from Stand | 2 | 2 | 13 | 2 | 1 | 7 | 4 |
| Walk | 2 | 2 | 2 | 13 | 5 | 2 | 2 |
| Squat | 3 | 1 | 5 | 3 | 12 | 3 | 5 |
| Fall Down | 3 | 3 | 4 | 2 | 2 | 12 | 2 |
| Jog | 1 | 1 | 4 | 4 | 2 | 2 | 12 |
| Total | 30 | 30 | 30 | 30 | 30 | 30 | 30 |

(b) Confusion Matrix for Benchmark

| Classifier result | Sit | Stand | Sit from Stand | Walk | Squat | Fall Down | Jog |
|---|---|---|---|---|---|---|---|
| Sit | 8 | 5 | 3 | 3 | 4 | 3 | 4 |
| Stand | 5 | 5 | 3 | 8 | 5 | 5 | 3 |
| Sit from Stand | 4 | 4 | 4 | 3 | 4 | 5 | 5 |
| Walk | 3 | 4 | 2 | 10 | 6 | 4 | 4 |
| Squat | 4 | 4 | 10 | 4 | 5 | 4 | 1 |
| Fall Down | 3 | 6 | 4 | 1 | 3 | 5 | 4 |
| Jog | 3 | 2 | 4 | 1 | 3 | 4 | 9 |
| Total | 30 | 30 | 30 | 30 | 30 | 30 | 30 |

Figure A.9: Confusion Matrices for test results in setting 3: (a) Wi-HACS, (b) Baseline.

Performance metrics for environment 3 (all values in %)

| Activity | Precision (Wi-HACS) | Precision (Baseline) | Recall (Wi-HACS) | Recall (Baseline) | Accuracy (Wi-HACS) | Accuracy (Baseline) |
|---|---|---|---|---|---|---|
| Sit | 40.00 | 26.67 | 46.66 | 26.67 | 77.57 | 68.62 |
| Stand | 43.33 | 14.71 | 43.33 | 16.67 | 79.49 | 61.34 |
| Sit from Stand | 41.94 | 13.79 | 43.33 | 13.33 | 78.31 | 63.43 |
| Walk | 46.43 | 30.30 | 43.33 | 33.33 | 81.51 | 65.84 |
| Squat | 37.5 | 15.63 | 40.00 | 16.67 | 79.55 | 63.13 |
| Fall Down | 42.86 | 19.23 | 43.33 | 16.67 | 80.13 | 55.42 |
| Jog | 46.15 | 34.62 | 40.00 | 30.00 | 81.69 | 65.30 |

Average Accuracy: Wi-HACS 79.75%, Benchmark 62.30%

Figure A.10: Performance metrics for each activity using the confusion matrices above in setting 3: (a) Wi-HACS, (b) Baseline.
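The per-activity precision and recall figures in this appendix can be recomputed from the confusion matrices. The following is a minimal sketch, not the thesis's own code: it assumes rows hold classifier results and columns hold actual data (consistent with the Total row summing over columns), and the names `ACTIVITIES`, `CONFUSION`, and `precision_recall` are hypothetical. It uses the Wi-HACS matrix of Figure A.1(a) as input; per-class accuracy is left out because its exact definition is not stated here.

```python
# Activity labels in the order used by the appendix tables.
ACTIVITIES = ["Sit", "Stand", "Sit from Stand", "Walk", "Squat", "Fall Down", "Jog"]

# Wi-HACS confusion matrix from Figure A.1(a).
# Assumption: rows = classifier results, columns = actual data.
CONFUSION = [
    [22, 6, 1, 1, 0, 1, 1],
    [5, 21, 1, 1, 1, 1, 1],
    [0, 0, 21, 1, 0, 5, 2],
    [0, 0, 0, 21, 5, 1, 0],
    [1, 0, 3, 2, 22, 1, 4],
    [2, 2, 2, 0, 0, 21, 1],
    [0, 1, 2, 3, 2, 0, 21],
]


def precision_recall(matrix):
    """Return a list of (precision %, recall %) tuples, one per class."""
    results = []
    for k in range(len(matrix)):
        true_positive = matrix[k][k]
        predicted_as_k = sum(matrix[k])                # row sum
        actually_k = sum(row[k] for row in matrix)     # column sum
        precision = 100.0 * true_positive / predicted_as_k if predicted_as_k else 0.0
        recall = 100.0 * true_positive / actually_k if actually_k else 0.0
        results.append((precision, recall))
    return results


if __name__ == "__main__":
    for name, (p, r) in zip(ACTIVITIES, precision_recall(CONFUSION)):
        print(f"{name:15s} precision={p:6.2f}%  recall={r:6.2f}%")
```

Under these assumptions, the values for Sit (68.75% precision, 73.33% recall) agree with the Wi-HACS column of Figure A.2, which supports the row/column reading used above.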
Item Metadata
Title | Using Wi-Fi channel state information (CSI) for human activity recognition and fall detection |
Creator | Chowdhury, Tahmid Z. |
Publisher | University of British Columbia |
Date Issued | 2018 |
Genre | Thesis/Dissertation |
Type | Text |
Language | eng |
Date Available | 2018-04-23 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0365967 |
URI | http://hdl.handle.net/2429/65593 |
Degree | Master of Applied Science - MASc |
Program | Electrical and Computer Engineering |
Affiliation | Applied Science, Faculty of; Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2018-05 |
Campus | UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
AggregatedSourceRepository | DSpace |