Open Collections

UBC Theses and Dissertations

Detecting anomalies in activity patterns of lone occupants from electricity consumption data Leong, Kuan Long 2016

Full Text

Detecting Anomalies in Activity Patterns of Lone Occupants from Electricity Consumption Data

by

Kuan Long Leong

B.A.Sc., The University of British Columbia, 2013

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF APPLIED SCIENCE

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

January 2016

© Kuan Long Leong, 2016

Abstract

As the global population is ageing, the demand for elderly care facilities and services is expected to increase. Assisted living technologies for detecting medical emergencies and assessing the wellness of the elderly are becoming more popular. A person normally performs activities of daily living (ADLs) on a regular basis. The ability to perform recurring ADLs indicates a certain level of wellness. Anomalies in the activity patterns of a person might indicate changes in the person's wellness. A method is proposed in this thesis for detecting anomalies in activity patterns of a lone occupant using electricity consumption measurements of his/her residence. The proposed method infers anomalies in activity patterns of an occupant from electricity consumption patterns without the need to explicitly monitor the underlying individual activities. The proposed method provides a score which is a quantitative assessment of anomalies in the electricity consumption pattern of an occupant for a given day. A survey was conducted to obtain the hourly activities of three lone occupants for a month. The level of suspicion values, which are quantitative assessments of anomalies in the daily activity patterns of the occupants, were deduced from the survey. Using Fuzzy C-Means (FCM) clustering with the Euclidean distance measure, the scores and the level of suspicion values were each clustered. A day was then classified as regular or irregular based on the clustering results of the scores and of the level of suspicion values, respectively.
The results showed that anomalies in electricity consumption patterns can effectively reflect anomalies in the underlying activity patterns. The results also showed that the proposed feature and model based method outperforms a chosen raw data based approach. The performance of the proposed method was improved when subsets of features were considered based on the minimum Redundancy Maximum Relevance (mRMR) feature selection. A supervised learning method based on the Curious Extreme Learning Machine (C-ELM) was then proposed. The proposed method based on C-ELM (PM-CELM) outperforms the proposed method based on FCM (PM-FCM), but PM-FCM can operate without labelled training data.

Preface

This thesis is original, unpublished, independent work by the author, Kuan Long Leong, under the supervision of Professor Cyril Leung.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Acronyms
Acknowledgements
1 Introduction
    1.1 Motivation
    1.2 Related Works
    1.3 Contributions
    1.4 Structure of the Thesis
2 Detecting Anomalies in Activity Patterns from Electricity Consumption Data
    2.1 Datasets
    2.2 Features for Representing Regular Electrical Energy Consumption Patterns
    2.3 Model for Quantitatively Assessing Detected Anomalies
        2.3.1 Regular Electrical Energy Patterns
        2.3.2 Final Score and Design Variables
        2.3.3 Selection of the Electrical Energy Consumption Data
        2.3.4 Probability Thresholds and Score Assignment
        2.3.5 Z-Score Threshold and Score Assignment
        2.3.6 Flexibility
3 Validating the Proposed Method with a Survey of Activities
    3.1 Survey of Activities
    3.2 Regular Activity Patterns for the Survey
        3.2.1 Daily Highly Probable Activities
        3.2.2 Daily Less Probable Activities
        3.2.3 Hourly Highly Probable Activities
        3.2.4 Hourly Less Probable Activities
        3.2.5 Daily Less Probable Durations of Activities
        3.2.6 Less Probable Daily Energy Consumption
        3.2.7 Level of Suspicion
    3.3 Configurations of the Design Variables and Thresholds of Activity Patterns
        3.3.1 Design Variables for the Proposed Method
        3.3.2 Thresholds of Regular Activity Patterns for the Survey
    3.4 Correlation between Energy Consumption Patterns and Activity Patterns
    3.5 Comparison of the Proposed Method and a Raw Data Based Approach
        3.5.1 Pseudo Ground Truth from the Survey
        3.5.2 Training Sets and Test Sets
        3.5.3 Clustering of the Scores Provided by the Proposed Method
        3.5.4 Clustering of the Raw Energy Consumption Sequences
        3.5.5 Performance Evaluation
    3.6 Discussion
4 Reducing the Energy Features Based on mRMR Feature Selection
    4.1 Overview of Feature Selection Methods
    4.2 mRMR Feature Selection
    4.3 Performance Evaluation
    4.4 Discussion
5 Classifying Electrical Energy Consumption Patterns Based on C-ELM
    5.1 Classification Based on C-ELM
    5.2 Performance Evaluation
    5.3 Discussion
6 Conclusion and Future Work
    6.1 Conclusion
    6.2 Future Work
Bibliography
Appendices
    A Pseudocode for the Proposed Method
    B Survey Samples from Home C
    C Choosing the Design Variable Values for the Proposed Method
    D Choosing the Threshold Values of Regular Activity Patterns for the Survey
    E Performance Evaluation without Considering Daily Energy Consumption
    F Rankings of the Energy Features for Homes A, B and C

List of Tables

2.1 Summary of the energy datasets
3.1 The configurations of the design variables for the proposed method
3.2 The threshold values of regular activity patterns for the survey
3.3 The lengths of the survey and daily energy consumption data in days
3.4 The lengths of the training set and test set data
3.5 The performances of the proposed method and the chosen raw data based approach
4.1 The performances of the proposed method when subsets of the top ranked one, four, 13 and 72 feature(s) were considered
5.1 The threshold values for C-ELM for Homes A, B and C
5.2 The performances of PM-CELM, PM-FCM and PM-mRMRnFCM
5.3 The performances of PM-mRMRnCELM when subsets of the top ranked 11 and 72 features were considered
D.1 A list of potential threshold values for Home B
E.1 The performances of the proposed method and the chosen raw data based approach without considering daily energy consumption
F.1 The rankings of the energy features for Homes A, B and C based on mRMR with the MIQ criterion
F.2 Dictionary of the feature indices

List of Figures

2.1 The typical hourly energy consumption pattern of Home C during a day
2.2 The 1-hour to 24-hour moving totals at each time point (hour)
2.3 Part of the sample maxima probability matrix. Each row corresponds to a moving total whereas each column corresponds to a time point (hour) of a day
3.1 A part of the survey timesheet
3.2 Home C - differences in the energy patterns
3.3 Home A - scores sorted in descending order of level of suspicion values
3.4 Home B - scores sorted in descending order of level of suspicion values
3.5 Home C - scores sorted in descending order of level of suspicion values
3.6 Flowcharts of the method for classifying a day as regular or irregular from an activity perspective (left), the proposed method based on FCM (center), and the raw data based approach (right)
4.1 Flowcharts of the proposed method based on mRMR and FCM (left) and the proposed method based on FCM (right)
4.2 Accuracies of the proposed method when subsets of the respective top ranked one to 72 feature(s) of Homes A, B and C were considered
5.1 Flowcharts of the proposed method based on C-ELM (without the shaded steps) and the proposed method based on mRMR and C-ELM (with the shaded steps)
5.2 Accuracies of PM-mRMRnCELM when subsets of the top ranked one to 72 feature(s) of Homes A, B and C were considered
B.1 Home C survey timesheet - Mar 2, 2015
B.2 Home C survey timesheet - Mar 15, 2015
C.1 Occurrence probability of the maximum hourly energy consumption at Home B considering 30, 60, 90 and 120 days of data respectively

List of Acronyms

ADL             Activity of Daily Living
C-ELM           Curious Extreme Learning Machine
CGH             Comparative Genomic Hybridization
FCM             Fuzzy C-Means Clustering
MID             Mutual Information Difference
MIQ             Mutual Information Quotient
mRMR            minimum Redundancy Maximum Relevance Feature Selection
PM-CELM         Proposed Method based on C-ELM
PM-FCM          Proposed Method based on FCM
PM-mRMRnCELM    Proposed Method based on mRMR and C-ELM
PM-mRMRnFCM     Proposed Method based on mRMR and FCM
SVM-RFE         Support Vector Machine Recursive Feature Elimination

Acknowledgements

I would like to express my sincere gratitude to my supervisor, Professor Cyril Leung, for his immeasurable support and guidance throughout my graduate studies. His patience and guidance helped me overcome many challenges and finish this thesis. Without Professor Leung's guidance, this thesis would not have been possible.

This work was supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada under Grant RGPIN 1731-2013, by the UBC Faculty of Applied Science, and by the National Research Foundation, Prime Minister's Office, Singapore under its IDM Futures Funding Initiative and administered by the Interactive and Digital Media Programme Office.

I would like to thank my co-supervisor, Professor Chunyan Miao (Nanyang Technological University), for her guidance and assistance in this research work. I would also like to thank Dr. Qiong Wu (Nanyang Technological University) for helping me understand the Curious Extreme Learning Machine (C-ELM).

I am thankful to Dr. Christine Chen for her suggestions on the topics for future research.

I am grateful to my family members and friends for providing me with the data required for this research work and participating in the survey. Without their help, it would not have been possible to validate my research ideas.

My friends have helped me stay sane through the difficult years of graduate school.
I greatly cherish their friendships and deeply appreciate their support.

Most importantly, none of this would have been possible without the love, encouragement and patience of my parents. I am deeply grateful to my parents, who encouraged me to study abroad and to pursue further education. Without my parents, I would not be who I am today.

I dedicate this thesis to my beloved parents.

Chapter 1

Introduction

This chapter begins with the motivation of the research work in this thesis. Works related to assisted living technologies and time series clustering are then discussed, followed by an outline of the contributions of this work. The organization of the thesis is described at the end of this chapter.

1.1 Motivation

The global population share of people aged 60 and over is expected to increase from 11.7% in 2013 to 21.1% in 2050 [1]. In other words, the number of elderly people is expected to grow from 841 million in 2013 to 2 billion in 2050. About 92% of people aged 65 and over in the United States have at least one chronic disease [2]. Similarly, almost 90% of people aged 65 and over in Canada have at least one chronic condition [3, 4] and 74% of people aged 65 and over in Canada are taking at least one medication [5]. The demand for elderly care facilities and services such as nursing homes is expected to increase with the ageing population. A U.S. survey found that 87% of people aged 65 and over prefer to age in their own homes [6]. Thus, there is a need to ensure a safe and independent ageing environment for those elderly people who prefer to stay in their own homes, and to help them live in their preferred environments for as long as possible.

Assisted living technologies for detecting medical emergencies and assessing the wellness of the elderly are becoming more popular [7].
While the most direct way of detecting medical emergencies is arguably by monitoring physiological data such as heart rate or blood pressure, estimating the wellness of a person usually involves monitoring the activities of daily living (ADLs). An important component of the assisted living technologies is activity recognition and monitoring [7].

A person normally performs ADLs on a regular basis. The capability of performing ADLs regularly implies that the person is at least physically able to maintain a regular lifestyle. It also indicates that the wellness of the person is at a certain level. Large deviations from a regular daily routine may indicate changes in the capability of performing ADLs. Such deviations may be used to alert relatives or caregivers to look into the cause of the deviations in a timely manner.

To establish the regular activity patterns of a person, a number of approaches have been proposed to monitor the individual ADLs [7]. Individual activities are usually monitored using ambient sensors (e.g. motion sensors and force sensors) placed strategically in the monitored environment, or using cameras. Individual activities are then recognized as sequences of sensor events or sequences of images. The regular activity patterns of a person can then be established based on the sequences of recognized activities.

ADLs usually involve using electric home appliances. Activities that consume energy might be inferred from the household energy consumption patterns. Energy denotes electrical energy or electricity in this section and the rest of the thesis unless otherwise specified. If a person's activity patterns are exactly the same every day, the energy consumption patterns will also probably be the same every day. The energy consumption patterns of a monitored environment would reflect the activity patterns of the person.
If the activity patterns of a person can be sufficiently represented by the energy patterns, the energy patterns could be used to detect anomalies in the underlying activities and there might be no need to explicitly monitor the underlying activities individually. This thesis aims to show that deviations from a person's regular activity patterns can be effectively detected using energy consumption measurements of his/her residence.

1.2 Related Works

Various assisted living technologies have been proposed to alleviate health problems, detect medical emergencies and improve the wellness of the elderly [8–30]. One important assisted living technology is health status monitoring. Physiological parameters are possibly the best indicators of the health status of a person. The challenge of developing technologies for monitoring the health status of a person in the home environment is to develop a portable, power-efficient and cost-effective device which is able to communicate with the relevant health care providers in a timely manner. Practical approaches proposed in [8–11] can monitor physiological parameters such as electrocardiogram signals, blood glucose concentrations, blood pressure, body temperature, heart rate and respiratory rate using ZigBee or Bluetooth for communication.

Another important assisted living technology is fall detection. The challenge in detecting falls in the home environment is to differentiate unintentional falls from normal activities and to minimize false alarms and missed detections. Fall detection approaches can be divided into two categories: wearable and non-wearable. Approaches using wearable accelerometers were proposed in [12, 13]. How the wearable device is worn and the person's willingness to wear the device are critical to the effectiveness of this approach. Audio based [14, 15] and visual based [16, 17] non-wearable approaches have also been proposed.
Audio based approaches can be adversely affected by background noise, while visual based approaches can be hampered by occlusions.

Another important assisted living technology is activity recognition and monitoring. Activity recognition helps determine what daily activities are essential to a person. Long term monitoring helps determine the regular activity patterns of a person. The ability to perform recurring ADLs indicates a certain level of wellness. Activity recognition approaches can generally be categorized as video-based [18] or sensor-based [19–30]. In [21–25], the authors designed a wireless sensor network which can interpret the wellness of a person by monitoring various home accessories such as a bed, microwave oven, toilet and dining chair. The wellness of a person was interpreted according to how well the monitored person performed the essential daily activities in terms of the home appliances' active and inactive durations. The difficulties of this approach are as follows: 1) determining a sufficient number of sensors for monitoring ADLs, 2) storing the sensor data efficiently, 3) annotating the activities deduced from the sensor data, and 4) classifying regular and irregular activities accurately. Although an irregular activity pattern of a person might be detected correctly, the irregularity may or may not indicate a change in the person's wellness.

In [26], the authors tried to relate a person's movement pattern to physical and mental health. They explored the correlations between the movement patterns of 10 lone occupants in 10 different homes and baseline measures of their depression levels and mobility levels. The movement patterns were captured by motion sensors placed in the monitored homes. Although the movement patterns showed a strong correlation with the baseline mobility levels as expected, they showed only a weak correlation with the baseline depression levels.
The authors concluded that there was not sufficient evidence to show a significant correlation between one's movement patterns captured by the sensors and one's depression level. The author of this thesis has not found any follow-up study.

In [27], the authors proposed to detect anomalies in activity patterns of a person based on temporal relations between events captured by sensors such as motion sensors and light sensors. For instance, if event A always occurs before event B but the recurring temporal relation between A and B is violated on a particular day, it will be noted as an anomaly. It would be impractical to investigate all events that occur during a day and hence the authors only focused on the temporal relations between the most frequent events. However, annotating the events captured by the sensors and identifying the temporal relations between events could still be time-consuming.

This work proposes to infer anomalies in activity patterns of a person from the household energy consumption patterns. Energy consumption data is essentially a time series or a sequence. A straightforward way would be to compare the energy sequences with one another using some similarity measure. Similar sequences are assigned to the same cluster and there could be multiple clusters. Each cluster can then be assigned a class label (e.g. regular or irregular) if there is sufficient information. Approaches to cluster time series data can be divided into three categories: 1) raw data based, 2) feature based, and 3) model based [31]. In [32], the authors compared four commonly used similarity measures for clustering (Euclidean distance, Mahalanobis distance [33], Dynamic Time Warping distance [34] and Pearson's correlation [35]) by using raw hourly energy consumption data for five university buildings.
It was found that the Euclidean distance measure was the overall best similarity measure for clustering the raw hourly energy consumption data according to four different validity techniques (Dunn index [36], Davies-Bouldin index [37], clustering balance [38] and cluster-vector balance [32]).

However, most similarity measures are too sensitive to slight changes in the raw hourly energy consumption data. For instance, two 24-hour hourly energy consumption sequences can be clustered into two different groups due to trivial differences even though the underlying activity patterns are almost the same. It has also been observed that some key features (e.g. maximum, minimum) of energy patterns are more relevant to the underlying activity patterns. In other words, the key features could be better indicators of the underlying activity patterns than the raw energy data. Therefore, it might be better to work with features selected or extracted from the raw energy consumption data for detecting anomalies in activity patterns.

1.3 Contributions

This work explores the correlation between the household energy consumption data and the activity patterns of lone occupants living at home. A method is proposed to detect anomalies in activity patterns of the occupant by monitoring the household energy consumption. This work differs from related works in one fundamental respect: it intends to detect anomalies in activity patterns of a person without explicitly monitoring the individual activities or actions of the person. The household energy consumption patterns are used as representations of the activity patterns of the occupant without explicitly considering the individual activities. Instead of attempting to use the raw energy consumption data for detection, we use features extracted from the raw energy consumption data.
The extracted features are designed to effectively reflect the underlying activities in terms of the time of day and related energy consumption. We show that the proposed method is more accurate for detecting anomalies in activity patterns of the occupant than a chosen raw data based approach.

If the objective of a certain system is to detect anomalies in activity patterns of a person, the proposed method represents a simple solution since annotating the activity data is generally time-consuming and usually requires a large training set [7]. If there is a strong correlation between the household energy consumption patterns and the activity patterns of the occupant, the anomalies in energy consumption patterns will reflect the anomalies in activity patterns of the occupant. The method proposed in this work does not identify whether the anomalies detected indicate a positive or negative change in the person's wellness. However, it can trigger an alert to caregivers or relatives who can then investigate in a timely manner.

1.4 Structure of the Thesis

The rest of the thesis is organized as follows. In Chapter 2, a method for detecting anomalies in activity patterns of lone occupants from household energy consumption data is proposed. The datasets are first described, followed by the features used for representing regular energy patterns of an occupant and the model for quantitatively assessing the detected anomalies in energy patterns. In Chapter 3, the survey of activities and regular activity patterns are first described, followed by configurations of the relevant variables and thresholds. The correlation between energy patterns and activity patterns and a comparison of the proposed method with a chosen raw data based approach are then discussed. In Chapter 4, an overview of feature selection methods is first presented, followed by an introduction to the mRMR feature selection [39, 40] and a performance evaluation of the proposed method with reduced feature sets.
In Chapter 5, a supervised learning method based on C-ELM [41] for classifying the energy patterns is proposed. The main findings and some topics for future research are summarized in Chapter 6.

Chapter 2

Detecting Anomalies in Activity Patterns from Electricity Consumption Data

The objective of the proposed method is to detect anomalies in activities of daily living (ADLs) of an occupant from the household electricity (electrical energy) consumption data. A quantitative assessment of anomalies detected during a day is provided by the method. The result can be used to classify the daily activity pattern of the occupant as regular or irregular from an electrical energy consumption perspective. The method is designed for use in homes with lone occupants. The datasets, features for representing regular electrical energy consumption patterns, and model for quantitatively assessing detected anomalies are discussed in this chapter.

2.1 Datasets

The proposed method is based on hourly energy consumption data of a household over multiple days (e.g. 30 days). Data used in this work were collected from three participants living alone at home. As BC Hydro smart meters [42] were installed in the participants' homes, the hourly electrical energy consumption data were simply downloaded by each participant from the BC Hydro website. A summary of the three datasets is shown in Table 2.1.

Home    Number of Occupants    Type of Home    Data Length (days)
A       1                      Apartment       30
B       1                      Apartment       365
C       1                      Detached        365

Table 2.1: Summary of the energy datasets

Most days in the datasets had 24 hours; the exceptions were the first day and the last day of the Daylight Saving Time period. The 25-hour day was converted to a 24-hour day by assigning the average of the hourly consumptions of the two 1AM-2AM periods to one single 1AM-2AM period.
The 23-hour day was converted to a 24-hour day by assigning the average of the hourly consumptions of 1AM-2AM and 3AM-4AM to the missing 2AM-3AM period.

2.2 Features for Representing Regular Electrical Energy Consumption Patterns

Features are needed to represent the regular (recurring) energy consumption patterns of an occupant. The basic assumption made in this thesis is that people perform their ADLs on a regular basis. Many of the ADLs involve using electric home appliances. It is therefore plausible that these activities may be inferred from the electricity consumption data.

[Plot: hourly energy consumption (kWh) against time of day (hours 1 to 24)]
Figure 2.1: The typical hourly energy consumption pattern of Home C during a day

For instance, the occupant of Home C normally wakes up at 8AM, uses electric cooking appliances right away for two to three hours, is away from home from 11AM to 9PM, and goes to bed at 11PM. The hourly energy consumption pattern of a typical day at Home C is shown in Figure 2.1. As can be seen from Figure 2.1, the energy consumption pattern matches the activity pattern quite well. This provides strong evidence that the activity patterns of the occupant are reflected in the energy consumption patterns. Instead of attempting to recognize the underlying individual activities from the energy consumption data, the proposed method uses energy patterns to represent activity patterns of the occupant without explicitly considering the underlying individual activities.

The available dataset in this work can only provide up to one data point per hour. However, an activity is not constrained to begin and end within a single predefined hourly interval in the dataset. An activity can begin at any time in one hourly interval and end at any time in another hourly interval.
For instance, if a certain activity normally takes one hour and normally spans two hourly frames, the energy consumption corresponding to this activity can be distributed differently between the two consecutive hourly frames depending on the time at which the activity actually begins.

For example, suppose (unrealistically) that only one given activity can occur within two particular consecutive hourly frames and that it consumes exactly the same amount of energy whenever it occurs. The sum of the energy consumptions of the two hourly frames will then always be the same whenever the activity occurs. However, if the two consecutive hourly frames are observed separately, the electricity consumption in the first hour, and likewise in the second hour, may differ from day to day depending on when the activity occurs. Although the activity pattern is consistent, this consistency may not be found in the energy consumption pattern if the energy consumptions of the two consecutive hourly frames are considered separately. It can, however, be found if the sum of the energy consumptions of the two consecutive hourly frames is considered.

[Plot: 1-hour to 24-hour moving totals (kWh) against time of day (hours 1 to 24)]
Figure 2.2: The 1-hour to 24-hour moving totals at each time point (hour)

A similar argument can be made for any number of consecutive hourly frames. This suggests that considering the multiple-hour total consumption can help reflect the underlying activity patterns of the occupant. Therefore, at each time point, the proposed method goes back in time to calculate the 1-hour to 24-hour total energy consumptions (moving totals).
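
The moving totals described above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the thesis: the function name `moving_totals`, the argument layout, and the choice to let windows that reach past midnight draw on the previous day's readings are all assumptions made for the example.

```python
from typing import List


def moving_totals(prev_day: List[float], day: List[float],
                  max_window: int = 24) -> List[List[float]]:
    """Return totals[w - 1][h]: energy used in the w hours ending with reading h of `day`.

    Assumption (not stated in the source): windows that start before midnight
    are completed with readings from `prev_day`.
    """
    if len(prev_day) != 24 or len(day) != 24:
        raise ValueError("each day must contain exactly 24 hourly readings")

    series = prev_day + day  # 48 readings; index 24 + h is reading h of `day`

    # prefix[i] is the sum of series[:i], so a window sum is one subtraction.
    prefix = [0.0]
    for value in series:
        prefix.append(prefix[-1] + value)

    totals = []
    for w in range(1, max_window + 1):
        row = []
        for h in range(24):
            end = 24 + h + 1  # exclusive end index of the window in `series`
            row.append(prefix[end] - prefix[end - w])
        totals.append(row)
    return totals
```

As a usage illustration, if a one-hour activity consuming 1.0 kWh is split 0.5/0.5 across two hourly frames on one day and 0.2/0.8 on another, the 1-hour totals for those frames differ between the two days, while the 2-hour total ending at the second frame is 1.0 kWh on both days. This is the consistency argued for in the text.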
Consequently, at each time point, we not only have the original energy consumption of the previous hour but also the total energy consumptions of the previous two to 24 hours respectively, as illustrated in Figure 2.2. In Figure 2.2, at any time point, each dot on the bottom curve represents the hourly consumption of the previous one hour, each dot on the second curve from the bottom represents the total consumption of the previous two hours, and so on. This step of the proposed method does not add any new information to the dataset; it simply computes the two to 24 hours moving totals using the hourly energy consumption data in the dataset.

The objective of the proposed method is to detect anomalies in activity patterns from the household energy consumption data. However, the raw energy consumption patterns might not be the best representations of the underlying activity patterns. This is because unimportant parts of the raw energy patterns might adversely affect the detection of anomalies in the underlying activity patterns. Although the activity patterns are the same, their corresponding energy patterns may not be exactly the same due to some small differences. Detecting anomalies may be ineffective if we compare one raw energy pattern to another using some conventional distance measure such as Euclidean distance measure. Rather, we should choose features which are more representative of the underlying activities from the raw energy patterns.

The local maxima and local minima of energy patterns are likely to be good indicators of the active and inactive hours of an occupant. Therefore, the times at which the maxima and minima occur may be good indicators of the underlying activity patterns. The difference between the maximum and minimum energy consumptions might indicate the energy consumption of the relevant activities.
Therefore, the features chosen to represent the energy patterns of an occupant are: 1) the time of day when the maximum occurs for each moving total, 2) the time of day when the minimum occurs for each moving total, and 3) the range (difference between the maximum and minimum) of each moving total.

In the case that the extremum (maximum or minimum) is not unique, one time point among the multiple time points sharing the same extremum is chosen uniformly at random. Eventually, the features chosen to represent the energy consumption pattern of an occupant on a particular day are the 24 time points of maxima, 24 time points of minima, and the 24 ranges. One set of the 72 features obtained from one single day only represents the energy pattern of the occupant on that particular day. To deduce the regular energy patterns of the occupant, these features need to be collected over multiple days.

2.3 Model for Quantitatively Assessing Detected Anomalies

The model is used to detect anomalies in energy consumption patterns of an occupant and to provide quantitative assessments of the detected anomalies. The result can then be used to classify the daily energy consumption pattern of the occupant as either regular or irregular. Regular (recurring) energy patterns of the occupant are deduced from a collection of the above-mentioned features over multiple days. The regular electrical energy patterns, final score and design variables, selection of the electrical energy consumption data, probability thresholds and score assignment, z-score threshold and score assignment, and flexibility are discussed in this section.

2.3.1 Regular Electrical Energy Patterns

As mentioned in Section 2.2, the energy pattern of an occupant on a particular day is represented by the 24 time points of maxima, 24 time points of minima, and the 24 ranges.
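The extraction of these features can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the implementation used in the thesis: the input is a table with one row per moving total, hours are reported as 0-based indices, and the name `extract_features` and the optional `rng` argument are hypothetical. Ties are broken uniformly at random, as described in Section 2.2.

```python
import random


def extract_features(totals, rng=None):
    """Return (max_hours, min_hours, ranges), one entry per moving total.

    `totals` is a list of rows, each row holding the values of one moving
    total at every hour of the day. Hours are 0-based indices (an assumption).
    """
    rng = rng or random.Random()
    max_hours, min_hours, ranges = [], [], []
    for row in totals:
        highest, lowest = max(row), min(row)
        # Break ties uniformly at random among hours sharing the extremum.
        max_hours.append(rng.choice([t for t, v in enumerate(row) if v == highest]))
        min_hours.append(rng.choice([t for t, v in enumerate(row) if v == lowest]))
        ranges.append(highest - lowest)
    return max_hours, min_hours, ranges
```

With 24 moving totals per day, the three returned lists together hold the 72 features of that day.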
A quantitative score is provided by the proposed method to indicate how well an energy pattern of an occupant on a particular day matches the regular energy patterns.

Regular energy patterns of an occupant are deduced from a collection of the features mentioned in Section 2.2 over multiple days (e.g. 60 days). The regular energy patterns of an occupant are described by 1) how likely a maximum or minimum of a moving total is to occur at a particular time, and 2) how likely the range (maximum minus minimum) of a moving total is to have a certain value. The time of day when a maximum or minimum energy consumption normally occurs reflects the active and inactive hours of an occupant. The normal range of a moving total indicates the typical total energy consumption of the relevant activities.

Using the time points of maxima and the time points of minima of each day over multiple days (e.g. 60 days), the probability of each time point (hour) being the maximum or minimum of each moving total can be approximated. The pseudocode of the algorithm for estimating the occurrence probabilities of the maximum (minimum) at each hour for the hourly energy consumption data is given in Algorithm 1.

Algorithm 1: The pseudocode for estimating the occurrence probabilities of the maxima (minima) for the hourly energy consumption data
Step 1: Obtain the hourly energy consumption of each hour for one day (i.e. the bottom curve in Figure 2.2)
Step 2: Record the time point at which the maximum (minimum) occurs
Step 3: Repeat Step 1 and Step 2 for each day over multiple days (e.g. 60 days)
Step 4: Estimate the probability that the maximum (minimum) would occur at 1AM according to the records
Step 5: Repeat Step 4 for the remaining 23 time points

To calculate the occurrence probability of the maximum (minimum) at each time point (hour) for the 2-hour moving total, Algorithm 1 obtains the 2-hour total energy consumption of each hour for one day (i.e. the second curve from the bottom in Figure 2.2) in Step 1. Step 2 to Step 5 are then performed. Corresponding modifications to Step 1 of Algorithm 1 can be made in order to estimate the occurrence probabilities of the maxima (minima) for the 3-hour to 24-hour moving totals.

Following the previously described procedure (Algorithm 1 and its modified versions), a total of 24 × 24 = 576 occurrence probabilities are obtained for the maxima and an equal number for the minima. These probabilities can be arranged in two 24 × 24 matrices, one for the maxima and one for the minima. Part of a sample maxima probability matrix is illustrated in Figure 2.3, where each row represents a moving total and each column represents a time point. For instance, it shows that there is a 63% chance that the maximum of the 2-hour moving total occurs at 9AM. The minima probability matrix is not shown here, but it can be interpreted in a similar manner. These two matrices help represent the regular energy consumption patterns of an occupant and they can indicate the normal active and inactive hours of the occupant.

Figure 2.3: Part of the sample maxima probability matrix. Each row corresponds to a moving total whereas each column corresponds to a time point (hour) of a day

Using the ranges of the moving totals over multiple days (e.g. 60 days), the mean and standard deviation of the range of each moving total can be computed.
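Both parts of the regular-pattern model, the occurrence probabilities of Algorithm 1 and the range statistics introduced in the preceding sentence, reduce to counting and averaging over the recorded days. The sketch below is one possible reading, not the thesis implementation. It assumes that each probability is estimated as a relative frequency and that the sample standard deviation is meant; the text specifies neither. The function names are hypothetical.

```python
from statistics import mean, stdev


def occurrence_probabilities(hours_by_day, n_hours=24):
    """Estimate P(extremum of moving total k occurs at hour t).

    `hours_by_day[d][k]` is the 0-based hour at which the extremum of
    moving total k was recorded on day d. Probabilities are relative
    frequencies over the recorded days (an assumption).
    """
    n_days = len(hours_by_day)
    n_totals = len(hours_by_day[0])
    counts = [[0] * n_hours for _ in range(n_totals)]
    for day in hours_by_day:
        for k, hour in enumerate(day):
            counts[k][hour] += 1
    return [[c / n_days for c in row] for row in counts]


def range_statistics(ranges_by_day):
    """Return one (mean, sample standard deviation) pair per moving total.

    `ranges_by_day[d][k]` is the range of moving total k on day d.
    At least two days are required by statistics.stdev.
    """
    n_totals = len(ranges_by_day[0])
    stats = []
    for k in range(n_totals):
        column = [day[k] for day in ranges_by_day]
        stats.append((mean(column), stdev(column)))
    return stats
```

Calling `occurrence_probabilities` once with the recorded maxima and once with the recorded minima would give the two probability matrices; each row of either matrix sums to one.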
The pseudocode of the algorithm for estimating the mean and standard deviation of the range of the hourly energy consumption data is given in Algorithm 2.

Algorithm 2: The pseudocode for estimating the mean and standard deviation of the range of the hourly energy consumption data
Step 1: Obtain the hourly energy consumption of each hour for one day (i.e. the bottom curve in Figure 2.2)
Step 2: Obtain the maximum and minimum
Step 3: Compute the range (maximum minus minimum) and record it
Step 4: Repeat Step 1 to Step 3 for each day over multiple days (e.g. 60 days)
Step 5: Compute the mean and standard deviation of the recorded ranges

To compute the mean and standard deviation of the range of the 2-hour moving total, we simply use the 2-hour total energy consumption at each time point (hour) for one day (i.e. the second curve from the bottom in Figure 2.2) in Step 1 of Algorithm 2. Corresponding modifications to Step 1 of Algorithm 2 can be made in order to compute the means and standard deviations of the ranges of the 3-hour to 24-hour moving totals.

After applying the above procedure (Algorithm 2 and its modified versions) for all moving totals, the model will then have the means and standard deviations of the ranges of all 24 moving totals. The means and standard deviations of the ranges help represent the regular energy consumption patterns of an occupant and they provide an indication of the normal energy consumptions of the underlying activities.

To sum up, the regular (recurring) energy consumption patterns of an occupant are numerically represented by the two probability matrices (one for the minima and one for the maxima) and the 24 pairs of means and standard deviations.

2.3.2 Final Score and Design Variables

To quantitatively assess the anomalies in the energy pattern of a given day, the energy features of that day need to be provided. As discussed in Section 2.2, the features include the 24 time points of maxima, 24 time points of minima and the 24 ranges.

First, the 24 time points of the maxima (minima) of the given day are converted to scores as described in Algorithm 3. After applying Algorithm 3, the 24 time points of the maxima and 24 time points of the minima will be converted to 48 sub-scores.

Algorithm 3: The pseudocode for converting the 24 time points of the maxima (minima) of a given day to scores
Step 1: Obtain the time of day when the maximum (minimum) occurs for the hourly energy consumption data
Step 2: Retrieve the occurrence probability of that time point being the maximum (minimum) from the maxima (minima) probability matrix
Step 3: Return a positive score (e.g. +1) if the probability is greater than or equal to a pre-defined threshold; otherwise, return a negative score (e.g. -1)
Step 4: Obtain the time of day when the maximum (minimum) occurs for each of the remaining 23 moving totals and repeat Step 2 and Step 3

Second, the 24 ranges of that particular day are converted to scores as described in Algorithm 4.

Algorithm 4: The pseudocode for converting the 24 ranges of a given day to scores
Step 1: Obtain the range of the hourly energy consumption data of the given day
Step 2: Retrieve the mean and standard deviation of the range previously computed as described in Algorithm 2
Step 3: Calculate the Z-Score (standardized value) of the range of the given day
Step 4: Return a positive score (e.g. +1) if the absolute Z-Score is less than or equal to a pre-defined threshold; otherwise, return a negative score (e.g. -1)
Step 5: Obtain the range of each of the remaining 23 moving totals and repeat Step 2 to Step 4

A Z-Score (also known as standard score) measures the distance between an observation and its mean in terms of number of standard deviations [43]. After applying Algorithm 4, the 24 ranges will be converted to 24 sub-scores.

The final score for a given day, which is a quantitative assessment of the anomalies detected in the energy pattern of that day, is the sum of the 72 sub-scores. A more positive final score for a day indicates that the energy pattern on that day deviates less from the regular energy patterns. Conversely, a more negative final score for a day indicates a larger deviation from the regular energy patterns. The pseudocode for the proposed method is provided in Appendix A.

Regardless of the occurrence of anomalies, there are five key design variables in the proposed method that will affect the final score:
1. The selection of the energy consumption data to be included in the model
2. The thresholds for classifying the probabilities
3. The score assignment for the classified probabilities
4. The threshold for classifying the Z-Scores
5. The score assignment for the classified Z-Scores

These five design variables give the model flexibility and they play a significant role in determining the final scores.

2.3.3 Selection of the Electrical Energy Consumption Data

To compute the final score for a given day, some energy consumption data need to be included in the model in order to obtain the regular energy patterns of an occupant. The selection of the energy consumption data is likely to have a significant impact on the probabilities, means and standard deviations that are used to represent the regular (recurring) energy patterns of an occupant.
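To make the role of these model quantities concrete, the scoring of Section 2.3.2 (Algorithms 3 and 4) can be sketched as follows. This is a hedged illustration, not the pseudocode of Appendix A: the function name, the argument layout, and the default thresholds are placeholders, a single `weight` is used for all sub-scores, and the handling of a zero standard deviation is an assumption that the text does not address.

```python
def final_score(max_hours, min_hours, ranges, p_max, p_min, range_stats,
                max_threshold=0.1, min_threshold=0.1, z_threshold=1.0,
                weight=1):
    """Sum the sub-scores of one day (48 from Algorithm 3, 24 from Algorithm 4).

    p_max[k][t] and p_min[k][t] are occurrence probabilities, and
    range_stats[k] is a (mean, standard deviation) pair for moving total k.
    """
    total = 0
    # Algorithm 3: a time point scores +weight if its probability is
    # greater than or equal to the threshold, otherwise -weight.
    for k, hour in enumerate(max_hours):
        total += weight if p_max[k][hour] >= max_threshold else -weight
    for k, hour in enumerate(min_hours):
        total += weight if p_min[k][hour] >= min_threshold else -weight
    # Algorithm 4: a range scores +weight if its absolute Z-Score is
    # less than or equal to the threshold, otherwise -weight.
    for k, value in enumerate(ranges):
        mu, sigma = range_stats[k]
        if sigma > 0:
            z = abs(value - mu) / sigma
        else:
            # Assumption: with no spread, only an exact match counts as normal.
            z = 0.0 if value == mu else float("inf")
        total += weight if z <= z_threshold else -weight
    return total
```

With 24 moving totals and a weight of 1, the result lies between -72 and +72, which is the sum of 72 sub-scores of plus or minus one.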
If the regular energy patterns of an occupant did not vary much, the simplest way would be to include as much data in the model as possible. In practice, some changes in the regular energy patterns of an occupant are expected. Including too much data may make it difficult to distinguish between recent regular energy patterns of an occupant and past patterns which may no longer exist. The length of the energy consumption data to be included is one of the design variables that need to be chosen.

2.3.4 Probability Thresholds and Score Assignment

The maxima and minima probability matrices are constructed given the selection of the energy consumption data. The probabilities in the matrices are then classified as probable or not probable using some pre-determined thresholds. A probability is deemed probable if it is greater than or equal to its corresponding threshold (as in Algorithm 3); otherwise, it is deemed not probable. The probability matrices are then converted into score matrices. If the actual probabilities were used as the sub-scores for computing the final scores, the sub-scores corresponding to the maxima and minima would have too many different levels; the probable probabilities would not be well differentiated from the not probable probabilities.

Therefore, a probable probability is assigned a positive score (e.g. +1) whereas a not probable probability is assigned a negative score (e.g. -1). Instead of using the actual probabilities, the positive and negative scores assigned are used to compute the final scores as explained in Section 2.3.2. The set of the maxima and minima probability thresholds and the magnitude of the score assignment are two of the design variables that need to be chosen.

2.3.5 Z-Score Threshold and Score Assignment

Using the mean and standard deviation of the range (maximum minus minimum) of each moving total obtained over multiple days (e.g. 60 days), the Z-Score of the range of each moving total on a particular day can be calculated. The Z-Score indicates how far away the range of a moving total on a given day is from its mean in terms of number of standard deviations. The Z-Scores are then classified as normal or not normal using a pre-determined threshold. The Z-Score is deemed normal if its absolute value does not exceed the threshold; otherwise, it is deemed not normal. A normal Z-Score is assigned a positive score (e.g. +1) whereas a not normal Z-Score is assigned a negative score (e.g. -1). The scores assigned are used to compute the final scores as explained in Section 2.3.2. The Z-Score threshold and the magnitude of the score assignment are two of the design variables that need to be chosen.

2.3.6 Flexibility

The flexibility of the model comes from the five design variables mentioned above. The selection of the energy consumption data may affect the deduced regular (recurring) energy consumption patterns of an occupant and hence the final scores. The set of the maxima and minima probability thresholds and the Z-Score threshold determine how frequently an energy pattern has to occur in order to be considered regular. The times of day when maxima and minima of the moving totals occur are more relevant to when activities normally occur, while the Z-Scores of the moving totals are more relevant to the normal energy consumptions of activities. Therefore, the proposed method can place greater emphasis on either the times of day when activities normally occur or the normal energy consumptions of activities by putting more weight on either the probability score assignment or the Z-Score score assignment. An example of the configuration of the design variables is given in Section 3.3.1.

Chapter 3. Validating the Proposed Method with a Survey of Activities

A validation of the effectiveness of the proposed method is presented in this chapter.
A survey of activities was conducted to obtain relevant information from the participating lone occupants. The purpose of the survey was to obtain evidence to assess the effectiveness of the method proposed in Chapter 2. The survey of activities, regular activity patterns for the survey, configurations of the design variables and thresholds of activity patterns, correlation between energy consumption patterns and activity patterns, and comparison of the proposed method and a raw data based approach are discussed below.

3.1 Survey of Activities

To validate the correlation between the household energy consumption data and activity patterns of a person, a survey was conducted to obtain the hourly activities of three lone occupants for a month. The survey was designed to record the activities of daily living (ADLs) of the occupants. The purpose of the survey was to obtain evidence to show that anomalies in energy consumption patterns would reflect anomalies in activity patterns of the occupants. A part of the survey timesheet is shown in Figure 3.1. Please refer to Appendix B for two complete sample survey timesheets. The listed activities of the survey included the essential ADLs such as sleeping and dining. Each participant was asked to mark the listed activities that took place during any part of each hour on the timesheet.

Figure 3.1: A part of the survey timesheet

3.2 Regular Activity Patterns for the Survey

The regular (recurring) activity patterns of each participating occupant were deduced from the activity data of the survey. Using the activity data of the survey, the following can be estimated: 1) the occurrence probability of a certain activity during a day, 2) the occurrence probability of a certain activity at a given hour, and 3) the occurrence probability of a particular duration of a certain activity during a day.
An activity could be considered regular if it is likely to occur during a day. An activity could also be considered regular if it is likely to occur at certain hours or has certain durations during a day. Considering the above conditions, the following features of activity patterns were deduced using appropriate thresholds: 1) highly probable activities during a day, 2) less probable activities during a day, 3) highly probable activities during a given hour, 4) less probable activities during a given hour, and 5) less probable durations of a certain activity during a day. These five features help represent the regular activity patterns of the occupants.

Figure 3.2: Home C - differences in the energy patterns (horizontal axis: time of day, hours 1 to 24; vertical axis: energy consumption in kWh, 0 to 1.5)

It was observed that there were occasionally noticeable differences in the energy patterns of two different days although the activities reported in the timesheets of those two days were the same or similar. For instance, the hourly energy patterns of Home C on March 2 (red) and March 15 (green) are shown in Figure 3.2 (please refer to Appendix B for the complete timesheets of the activities). Although the activities from 7AM to 9AM on those two days were the same, there were noticeable differences in the two energy patterns during the same time period. This indicates that the activity patterns might not only involve the times of day when activities occur but also the energy consumption related to the activities. However, the energy consumption data of the individual home electric appliances were not explicitly available. Therefore, the daily energy consumptions (i.e. the total energy consumption of all appliances during a day) were chosen to help represent the regular activity patterns of the occupants.

Consequently, the six features that were used to quantitatively assess anomalies in activity patterns of the occupants are:
1. The daily highly probable activities
2. The daily less probable activities
3. The hourly highly probable activities
4. The hourly less probable activities
5. The daily less probable durations of activities
6. The less probable daily energy consumption

The quantitative assessment of anomalies in a daily activity pattern is called level of suspicion in this thesis. The above six features and the level of suspicion are described below.

3.2.1 Daily Highly Probable Activities

An activity is classified as daily highly probable if the occurrence probability of the activity during a day is greater than or equal to a given threshold. A non-occurrence of a daily highly probable activity during a given day will increase the level of suspicion of that day. The magnitude of the increase is equal to the expected duration of the activity during a day rounded to the closest integer. For instance, if the expected duration of an activity is eight hours and it does not occur on a given day, the level of suspicion of that day will be increased by eight.

3.2.2 Daily Less Probable Activities

An activity is classified as daily less probable if the occurrence probability of the activity during a day is less than or equal to a given threshold. An occurrence of a daily less probable activity during a given day will increase the level of suspicion of that day. The magnitude of the increase is equal to the expected duration of the activity during a day rounded to the closest integer.

3.2.3 Hourly Highly Probable Activities

For each activity, an hour is classified as highly probable if the occurrence probability of the activity during that hour is greater than or equal to a given threshold. If an activity occurs during a given day but not at all of its highly probable hours, the level of suspicion of that day will be increased. The magnitude of the increase is equal to the number of missed highly probable hours. For instance, if the highly probable hours of a given activity are 8AM and 9AM but the activity occurs only at 8AM on a particular day, the level of suspicion of that day will be increased by one. The level of suspicion of a given day will not be increased due to missed highly probable hours if the activity does not occur at all on that day; the hourly highly probable feature is relevant only if the activity occurs during a day. If the activity is deemed to be daily highly probable, the level of suspicion will already have been increased due to its absence during the given day (Section 3.2.1).

3.2.4 Hourly Less Probable Activities

For each activity, an hour is classified as less probable if the occurrence probability of the activity during that hour is less than or equal to a given threshold. If an activity occurs at any of its less probable hours, the level of suspicion of the day will be increased. The magnitude of the increase is equal to the number of less probable hours at which the activity occurred. For example, if the less probable hours of a given activity are 3AM and 4AM and the activity occurs at 3AM on a given day, the level of suspicion of that day will be increased by one.

3.2.5 Daily Less Probable Durations of Activities

For each activity, a duration of the activity during a day is classified as less probable if the occurrence probability of that duration is less than or equal to a given threshold. An occurrence of a less probable duration of an activity will increase the level of suspicion of the day. The magnitude of the increase is equal to the absolute difference (in hours) between the actual duration and the most probable duration of the activity.
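The hourly rules of Sections 3.2.3 and 3.2.4 and the duration rule just stated can be sketched for a single activity as follows. The sketch is illustrative only: the function names, the default thresholds, and the data layout (a set of 0-based hours at which the activity was reported, a list of 24 hourly probabilities, and a mapping from duration in hours to probability) are assumptions. A duration never observed in the survey is treated as having probability zero, which is also an assumption.

```python
def hourly_suspicion(occurred_hours, hourly_prob,
                     high_threshold=0.8, low_threshold=0.05):
    """Contribution of one activity from the two hourly rules.

    Returns the number of missed highly probable hours plus the number of
    less probable hours at which the activity occurred.
    """
    if not occurred_hours:
        return 0  # the hourly rules apply only if the activity occurs that day
    high = {h for h, p in enumerate(hourly_prob) if p >= high_threshold}
    low = {h for h, p in enumerate(hourly_prob) if p <= low_threshold}
    missed = len(high - occurred_hours)
    unusual = len(low & occurred_hours)
    return missed + unusual


def duration_suspicion(actual_hours, duration_prob, low_threshold=0.05):
    """Contribution of one activity from the less probable duration rule."""
    if actual_hours == 0:
        return 0  # the rule applies only if the activity occurs that day
    if duration_prob.get(actual_hours, 0.0) > low_threshold:
        return 0  # the observed duration is not a less probable one
    most_probable = max(duration_prob, key=duration_prob.get)
    return abs(actual_hours - most_probable)
```

For the example of Section 3.2.3, an activity whose highly probable hours are 8AM and 9AM and which is reported only at 8AM contributes one to the level of suspicion.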
The level of suspicion of a day will not be increased due to a daily less probable duration if the activity does not occur at all on that day; this feature is relevant only if the activity occurs during a day. If the activity is deemed to be daily highly probable, the level of suspicion will already have been increased due to its absence during the given day.

3.2.6 Less Probable Daily Energy Consumption

Given the daily energy consumption data for multiple days (e.g. 30 days), the mean daily energy consumption and the standard deviation can be calculated. The Z-Score of the daily energy consumption can then be computed. A Z-Score measures the distance between an observation and its mean in terms of number of standard deviations. The Z-Score of the daily energy consumption is classified as less probable if its absolute value is greater than or equal to a given threshold. An occurrence of a less probable daily Z-Score will increase the level of suspicion of the day. The less probable daily energy consumption is deemed to be a day-long anomaly, and the magnitude of the increase is therefore 24.

3.2.7 Level of Suspicion

The accumulated level of suspicion value of a day is equal to the sum of the level of suspicion values from the above six different features. The level of suspicion is meant to be a quantitative assessment of anomalies in a daily activity pattern of an occupant. A high level of suspicion value of a day indicates a large deviation from the regular activity patterns.

3.3 Configurations of the Design Variables and Thresholds of Activity Patterns

The design variables for the proposed method (i.e. selection of the electrical energy consumption data, probability thresholds and score assignment, and Z-Score threshold and score assignment) need to be determined before a score can be computed.
The thresholds of the above-mentioned six features of activity patterns for the survey also need to be set before a level of suspicion can be calculated. The configurations of the design variables for the proposed method and thresholds of regular activity patterns for the survey are discussed in this section.

3.3.1 Design Variables for the Proposed Method

The design variables for the proposed method affect how often an energy pattern needs to occur in order to be considered regular (recurring), and they play significant roles in assessing anomalies in energy patterns. The scheme for choosing the design variable values is discussed in Appendix C. The configurations of the design variables are shown in Table 3.1. Due to the limited amount of energy data, all scores of Home A were computed based on the same 30 days of energy consumption data. In other words, the 30-day data window was the same (static) for all scores.

For Home B and Home C, the score of a given day was computed based on 60 days of energy consumption data immediately before. In other words, the 60-day data window was different (dynamic) for each score. Using the given configurations, the maximum possible score is 72 while the minimum possible score is -72.

                                   Home A        Home B         Home C
Length of Data (days)              30 (static)   60 (dynamic)   60 (dynamic)
Probability
  Threshold - Occurrence of Max    0.1           0.1            0.15
  Threshold - Occurrence of Min    0.15          0.1            0.15
  Score Assignment                 +/-1          +/-1           +/-1
Range Z-Score
  Threshold                        1.5           1              1
  Score Assignment                 +/-1          +/-1           +/-1

Table 3.1: The configurations of the design variables for the proposed method

3.3.2 Thresholds of Regular Activity Patterns for the Survey

The thresholds of regular activity patterns affect the regular activity patterns and consequently the assessments of anomalies in activity patterns. The threshold values used in this study are shown in Table 3.2.
The length of survey data which the various probabilities were based on and the length of daily energy consumption data which the daily Z-Scores were based on are shown in Table 3.3. Due to the limited amount of data, the daily Z-Scores for Home A were computed based on the available 30 days of daily energy consumption data. The daily Z-Scores for Home B and Home C were computed based on one year of daily energy consumption data. The scheme for choosing the threshold values is discussed in Appendix D.

Features of Activity Patterns                   Home A   Home B   Home C
Probability
  Daily highly probable activities              ≥0.8     ≥0.8     ≥0.9
  Daily less probable activities                ≤0.2     ≤0.2     ≤0.2
  Hourly highly probable activities             ≥0.9     ≥0.8     ≥0.8
  Hourly less probable activities               ≤0.05    ≤0.05    ≤0.05
  Daily less probable durations of activities   ≤0.05    ≤0.05    ≤0.05
|Z-Score|
  Less probable daily energy consumptions       ≥1       ≥0.95    ≥1.1

Table 3.2: The threshold values of regular activity patterns for the survey

                                  Home A   Home B   Home C
Length of Days
  Survey data                     30       30       31
  Daily energy consumption data   30       365      365

Table 3.3: The lengths of the survey and daily energy consumption data in days

3.4 Correlation between Energy Consumption Patterns and Activity Patterns

In Chapter 2, the Score, a quantitative assessment of anomalies in an energy consumption pattern, was introduced. In this chapter, the Level of Suspicion, a quantitative assessment of anomalies in an activity pattern, was introduced. In this section, we show the correlation between the Score (energy consumption patterns) and the Level of Suspicion (activity patterns) for each participating occupant. The scores sorted in descending order of the level of suspicion values of Home A, Home B and Home C are shown in Figure 3.3, Figure 3.4 and Figure 3.5 respectively.
In Figure 3.3, the level of suspicion values of the first nine scores range from 57 down to 24 while the level of suspicion values of the remaining scores range from 12 down to 0. In Figure 3.4, the level of suspicion values of the first two scores are 48 and 26 while the level of suspicion values of the remaining scores range from 11 down to 0. In Figure 3.5, the level of suspicion values of the first eight scores range from 30 down to 24 while the level of suspicion values of the remaining scores range from 7 down to 0. Although the score does not monotonically increase as the level of suspicion value decreases, the scores tend to be higher when the level of suspicion values are lower. In other words, the score appears to be inversely related to the level of suspicion.

Figure 3.3: Home A - scores sorted in descending order of level of suspicion values (horizontal axis: day index, where 1 = the highest level of suspicion; vertical axis: score, -10 to 70)

Figure 3.4: Home B - scores sorted in descending order of level of suspicion values (horizontal axis: day index, where 1 = the highest level of suspicion; vertical axis: score, -20 to 60)

Figure 3.5: Home C - scores sorted in descending order of level of suspicion values (horizontal axis: day index, where 1 = the highest level of suspicion; vertical axis: score, -30 to 50)

3.5 Comparison of the Proposed Method and a Raw Data Based Approach

The objective of this research work is to classify the daily activity patterns of an occupant as regular or irregular using the corresponding energy consumption patterns. A simple way would be to measure the similarities of the raw energy consumption sequences to one another and group similar energy sequences into distinct clusters (i.e. to do a clustering of the raw energy consumption sequences). Each cluster can then be assigned a class label (i.e. regular or irregular).
This can be considered a raw data based clustering approach. However, the raw energy consumption data without any further processing may not be an effective indicator of the underlying activity patterns.

The proposed method infers anomalies in activity patterns using the features extracted from the raw energy consumption data and the model built using the features. The proposed method thus uses a mixture of feature based and model based approaches. This section shows that the proposed method is more accurate than a chosen raw data based approach for classifying activity patterns of the occupants. The level of suspicion, which was deduced from the activity survey, is used as the pseudo ground truth for the comparison.

Instead of attempting to deduce whether an activity pattern is normal or not normal from the activity data, a simpler way would be to directly ask the occupant. The answer from the occupant could then be used as the ground truth. However, the definition of "normal" can vary from person to person. Therefore, we would want to ask the occupant a series of questions so that the answers might tell us whether the activity pattern of a given day is normal or not normal according to our chosen definition.

The survey of this research work asked the occupant to rate his/her mood and health respectively on a 5-point scale. The change in the ratings from one day to another might indicate a change in his/her wellness (and probably in the activity patterns as well). Unfortunately, the participating occupants picked the same rating every day, which did not provide much useful information to this research work.
Therefore, we decided to deduce the pseudo ground truth (based on the level of suspicion) from the activity data of the survey instead.

The pseudo ground truth from the survey, training sets and test sets, clustering of the scores provided by the proposed method, clustering of the raw energy consumption sequences, and performance evaluation are provided in this section.

3.5.1 Pseudo Ground Truth from the Survey

The pseudo ground truth of whether the daily activity patterns of the participating lone occupants were regular or irregular was deduced from the level of suspicion values. Although a higher level of suspicion value indicates that the daily activity pattern of an occupant is more irregular, the boundary between the higher level of suspicion values and the lower level of suspicion values of each participating occupant has not been defined. Fuzzy C-Means (FCM) clustering [44] was used to cluster the level of suspicion values into two groups: regular (low level of suspicion) and irregular (high level of suspicion). FCM was chosen because it is well-known and works well with low dimensional data. A day was then labelled as regular or irregular according to which resulting cluster it belonged to. The cluster with a lower center value (defined as the mean of all data points that belong to the cluster) was considered the regular cluster while the cluster with a higher center value was considered the irregular cluster.

3.5.2 Training Sets and Test Sets

For both the scores and raw energy sequences, the available data of each home were split into a training set and a test set.
There were only 30 days (31 days for Home C) with pseudo ground truth data because the survey was conducted for only a month. In order to compare the clustering results of the proposed method and the chosen raw data based approach against the pseudo ground truth, we were limited to comparing only the 30/31 days of results deduced from the energy consumption patterns by the proposed method and the raw data based approach. Therefore, for each home, 20 consecutive days (21 days for Home C) of data were chosen as the training set (i.e. either the first 20/21 days or the last 20/21 days of data) while the remaining 10 consecutive days were chosen as the test set. The lengths (number of samples) of the training set and test set are shown in Table 3.4.

          Training Set (days)   Test Set (days)
Home A    20                    10
Home B    20                    10
Home C    21                    10

Table 3.4: The lengths of the training set and test set data

3.5.3 Clustering of the Scores Provided by the Proposed Method

Although a lower score indicates that the energy consumption pattern of an occupant on a given day is more irregular, the boundary between the lower scores and higher scores for each participating occupant needs to be defined. First, FCM was used to cluster the data of each training set into two groups. The cluster with a lower center value was considered the irregular cluster while the cluster with a higher center value was considered the regular cluster. Second, each data point of each test set was classified as regular or irregular depending on which cluster’s center it was closer to.

3.5.4 Clustering of the Raw Energy Consumption Sequences

The proposed method requires an extra step to extract features from the raw energy consumption data and build a model using the features before the clustering. The additional step would be questionable if the proposed method did not perform better than a raw data based approach.
In [32], it was found that the Euclidean distance measure was the overall best similarity measure for clustering hourly energy consumption data when compared with three other well-known measures (Mahalanobis distance [33], Dynamic Time Warping distance [34] and Pearson’s correlation [35]). Since FCM was the clustering technique used in [32], FCM with Euclidean distance measure was chosen for clustering the raw energy consumption sequences in this thesis.

First, FCM was used to cluster the energy consumption sequences of each training set six times, each time with a different number of pre-determined clusters (2 to 7). Second, the Dunn index [36] was used to determine the best number of clusters for each training set according to the six different clustering results. The Dunn index is basically the ratio of the minimum distance between clusters to the maximum distance between data points within the same cluster and should be maximized [45]. According to the Dunn index, the best number of clusters was 3 for each of Home A, Home B and Home C. Therefore, FCM was used again to cluster the energy consumption sequences of each training set for each home into three clusters. Third, the three clusters for each home were each labelled as regular or irregular, so two of the three clusters were assigned the same class label. The particular combination of the three labelled clusters which yielded the best result when compared against the pseudo ground truth was chosen. Last, each energy consumption sequence of each test set for each home was classified as regular or irregular depending on which cluster’s center it was closer to. The distance measure used for the classification of the test set data was again the Euclidean distance measure.
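The model selection steps of Section 3.5.4 (cluster with FCM for several cluster counts, keep the count with the highest Dunn index, then classify by the closest center) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the thesis: the fuzzifier m = 2, the fixed iteration count, the farthest-point initialisation and the names `fcm_centers`, `dunn_index`, `nearest_center` and `best_cluster_count` are all hypothetical choices made here, and the toy data are invented.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_center(point, centers):
    """Index of the cluster center closest to `point`."""
    return min(range(len(centers)), key=lambda k: euclidean(point, centers[k]))


def fcm_centers(points, n_clusters, m=2.0, n_iter=100):
    """Minimal Fuzzy C-Means; returns the list of cluster centers.

    Assumptions (not from the thesis): fuzzifier m = 2, a fixed number of
    iterations, and a deterministic farthest-point initialisation.
    """
    centers = [list(points[0])]
    while len(centers) < n_clusters:
        centers.append(list(max(
            points, key=lambda p: min(euclidean(p, c) for c in centers))))
    exponent = 2.0 / (m - 1.0)
    for _ in range(n_iter):
        # Membership of every point in every cluster (each row sums to 1).
        memberships = []
        for p in points:
            d = [max(euclidean(p, c), 1e-12) for c in centers]
            memberships.append(
                [1.0 / sum((dk / dj) ** exponent for dj in d) for dk in d])
        # Each center becomes the membership-weighted mean of all points.
        for k in range(n_clusters):
            w = [row[k] ** m for row in memberships]
            centers[k] = [sum(wi * p[dim] for wi, p in zip(w, points)) / sum(w)
                          for dim in range(len(points[0]))]
    return centers


def dunn_index(points, labels):
    """Minimum between-cluster distance over maximum within-cluster distance."""
    groups = {}
    for p, label in zip(points, labels):
        groups.setdefault(label, []).append(p)
    groups = list(groups.values())
    if len(groups) < 2:
        return 0.0  # undefined for a single cluster; treated as worst case
    diameter = max(euclidean(a, b) for g in groups for a in g for b in g)
    separation = min(euclidean(a, b)
                     for i, g in enumerate(groups)
                     for h in groups[i + 1:]
                     for a in g for b in h)
    return separation / diameter if diameter > 0 else math.inf


def best_cluster_count(points, candidates):
    """Cluster count with the highest Dunn index among `candidates`."""
    def score(k):
        centers = fcm_centers(points, k)
        labels = [nearest_center(p, centers) for p in points]
        return dunn_index(points, labels)
    return max(candidates, key=score)


# Toy one-dimensional "sequences" (hypothetical values, not thesis data).
toy = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]
```

With real data, each point would be one day's energy consumption sequence, and `best_cluster_count(training_days, range(2, 8))` would mirror the search over 2 to 7 clusters described above. The same `fcm_centers` and `nearest_center` pair, applied with two clusters to one-dimensional scores, corresponds to the procedure of Section 3.5.3.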
3.5.5 Performance Evaluation

The clustering results of the scores and the raw energy sequences are compared here. The pseudo ground truth was derived from the clustering result of the level of suspicion values as described in Section 3.5.1. Flowcharts of the method for deriving the pseudo ground truth from an activity perspective, the proposed method based on FCM and the raw data based approach are shown in Figure 3.6.

Figure 3.6: Flowcharts of the method for classifying a day as regular or irregular from an activity perspective (left), the proposed method based on FCM (center), and the raw data based approach (right)

The performances of the proposed method and the chosen raw data based approach are summarized in Table 3.5. The performance evaluation was based on all three datasets of the three lone occupants. The performance shown here is the average performance for all Homes A, B and C. The number of missed detections indicates the number of irregular days that were incorrectly labelled as regular by the proposed method or the raw data based approach. The number of false alarms indicates the number of regular days that were incorrectly labelled as irregular by the proposed method or the raw data based approach. Sensitivity is the proportion of irregular days that were correctly labelled by the proposed method or the raw data based approach. Specificity is the proportion of regular days that were correctly labelled by the proposed method or the raw data based approach.
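The four measures defined above can be computed directly from the predicted labels and the pseudo ground truth. The sketch below is only an illustration: the function name `evaluate` and the ordering of the days are hypothetical, and the counts were chosen to mirror those reported for the test set of the proposed method in Table 3.5 (5 irregular days, 25 regular days, 1 missed detection and 3 false alarms).

```python
def evaluate(truth, predicted):
    """Compare predicted labels with the (pseudo) ground truth.

    Both arguments are sequences of booleans, True meaning "irregular".
    Assumes at least one regular and one irregular day in `truth`.
    """
    pairs = list(zip(truth, predicted))
    true_pos = sum(1 for t, p in pairs if t and p)
    missed = sum(1 for t, p in pairs if t and not p)        # missed detections
    false_alarms = sum(1 for t, p in pairs if not t and p)  # false alarms
    true_neg = sum(1 for t, p in pairs if not t and not p)
    return {
        "accuracy": (true_pos + true_neg) / len(pairs),
        "missed_detections": missed,
        "false_alarms": false_alarms,
        "sensitivity": true_pos / (true_pos + missed),
        "specificity": true_neg / (true_neg + false_alarms),
    }


# Hypothetical day ordering: 5 irregular days followed by 25 regular days,
# with 1 missed detection and 3 false alarms.
truth = [True] * 5 + [False] * 25
predicted = [True] * 4 + [False] + [True] * 3 + [False] * 22
result = evaluate(truth, predicted)
```

For these counts, the accuracy is 26/30, the sensitivity is 4/5 and the specificity is 22/25.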
                      Proposed Method    Raw Data Based Approach [32]
Training Set
  Accuracy            57/61 (93.44%)     54/61 (88.52%)
  Missed Detection    1                  1
  False Alarm         3                  6
  Sensitivity         13/14 (92.86%)     13/14 (92.86%)
  Specificity         44/47 (93.62%)     41/47 (87.23%)
Test Set
  Accuracy            26/30 (86.67%)     22/30 (73.33%)
  Missed Detection    1                  3
  False Alarm         3                  5
  Sensitivity         4/5 (80%)          2/5 (40%)
  Specificity         22/25 (88%)        20/25 (80%)

Table 3.5: The performances of the proposed method and the chosen raw data based approach

As described in Section 3.2.1 to Section 3.2.7, the level of suspicion was deduced from the five features derived from the survey of activities and the daily energy consumption data (i.e. the total energy consumption of all appliances during a day). Unlike the other five features, the daily energy consumption data were not derived from the survey of activities. The reason why daily energy consumptions were included is discussed in Section 3.2. To examine the significance of the daily energy consumptions in the performance evaluation, a test was conducted to evaluate the performances of the proposed method and the raw data based approach when the pseudo ground truth did not consider the daily energy consumptions (please refer to Appendix E for details). Removing the less probable daily energy consumption feature from the pseudo ground truth appeared to reduce the accuracies of the proposed method and the raw data based approach to a similar degree. The training set accuracies of the proposed method and the raw data based approach dropped by 19.30% and 20.37% respectively, while the test set accuracies dropped by 15.38% and 13.64% respectively.

3.6 Discussion

The numerical results show that anomalies in energy consumption patterns detected by the proposed method tend to correlate with anomalies in activity patterns.
The results also show that the proposed feature and model based method outperforms the raw data based approach. This indicates that the features extracted from the energy consumption data better represent anomalies in the underlying activities than the raw energy consumption data. Although the training set and test set available are relatively small, the performance of the proposed method is quite encouraging. Given the sensitivity and specificity, the proposed method can detect the anomalies in activity patterns of the occupants quite well. The proposed method does not identify whether the anomalies detected indicate a positive or negative change in the occupant’s wellness. However, it can trigger an alert to caregivers or relatives who can then investigate in a timely manner.

Chapter 4

Reducing the Energy Features Based on mRMR Feature Selection

As mentioned in Section 2.2, the energy consumption pattern of a given day is represented by 24 time points of maxima, 24 time points of minima, and 24 ranges. Thus, there is a total of 72 different features for representing the energy consumption pattern of a day. It has been shown that using subsets of features might improve the performances of classifiers for various classification tasks [46–48]. In this chapter, we show the performance of the proposed method when subsets of the 72 energy features were considered. An overview of feature selection methods, the minimum Redundancy Maximum Relevance (mRMR) feature selection, and the performance evaluation are provided in the following sections.

4.1 Overview of Feature Selection Methods

Feature selection methods are typically used for the following reasons: 1) simplifying models for providing better interpretations of the underlying processes that generated the
data, 2) reducing training times of classifiers, and 3) improving classification accuracy [49]. Feature selection methods generally help improve classification accuracy by: 1) leaving out irrelevant features (noise) and 2) reducing overfitting. Feature selection methods can generally be divided into three main categories: 1) filters, 2) wrappers and 3) embedded methods.

Filter type methods rank features or select subsets of features based on some chosen conditions (e.g. mutual information, Pearson’s correlation [35]) and are independent of classifiers. Thus, filter type methods filter out irrelevant features before the learning process occurs. Some examples of filter methods are RELIEF [50] and mRMR [39, 40]. Filters are generally faster than wrappers and embedded methods because filters rank features according to their characteristics and are independent of the classification algorithms.

Wrapper type methods select a subset of features and use a classifier (e.g. Naive-Bayes classifier, Support Vector Machine [51]) to assess the quality of the subset. It is generally impractical to exhaustively search through every combination of all considered features. Two typical strategies are forward selection and backward elimination [52]. Forward selection begins a search with an empty set and adds one feature at a time while backward elimination begins a search with a full set of all considered features and eliminates one feature at a time. Wrappers generally yield better classification performance than filters at the expense of high computational cost.

Embedded methods are similar to wrapper type methods except that feature selections are part of the classifiers and they cannot be separated. In contrast, wrapper type methods can be combined with any classifier.
Some examples of embedded methods are Support Vector Machine Recursive Feature Elimination (SVM-RFE) [53] and Locally Weighted Naive-Bayes classifier [54]. Embedded methods are generally less computationally intensive than wrappers.

A filter type method was adopted in this thesis because there were 72 energy features and hence the computational cost was a concern. Most filter type methods select features based on their relevance to target classes but do not consider the redundancy among features [39, 40]. The minimum Redundancy Maximum Relevance (mRMR) feature selection [39, 40], which considers both relevance of features to target classes and redundancy among selected features, was therefore adopted.

4.2 mRMR Feature Selection

The mRMR feature selection [39, 40] is a filter type method which ranks features according to the mutual information between target classes (i.e. regular and irregular for our classification problem) and individual features (relevance) and the mutual information between selected features (redundancy). The idea of mRMR is to select a subset of features which are maximally dissimilar (minimum redundancy) while maximizing the mutual information between selected features and target classes (maximum relevance).

Let $S$ denote the set of selected features. The minimum redundancy condition [40] is

\[ \min W_I, \quad W_I = \frac{1}{|S|^2} \sum_{i,j \in S} I(i, j), \tag{4.1} \]

where $I(i, j)$ represents the mutual information between feature $i$ and feature $j$, and $|S|$ is the number of selected features in $S$. Let $h$ denote the class variable (i.e. regular or irregular for our classification problem). The maximum relevance condition [40] is

\[ \max V_I, \quad V_I = \frac{1}{|S|} \sum_{i \in S} I(h, i), \tag{4.2} \]

where $I(h, i)$ represents the mutual information between feature $i$ and class $h$, and $h = \{h_1, h_2, \ldots, h_K\}$ if there are $K$ different classes.

The objective of mRMR is to optimize the above two conditions simultaneously. It requires a criterion for combining the two conditions.
Therefore, two criteria have been considered [40]: the Mutual Information Difference (MID) and Mutual Information Quotient (MIQ) criteria.

\[ \text{MID}: \quad \max (V_I - W_I), \tag{4.3} \]

\[ \text{MIQ}: \quad \max (V_I / W_I). \tag{4.4} \]

In mRMR, the feature with the highest $I(h, i)$ is chosen as the first feature. The remaining features are chosen incrementally. The first selected feature is ranked first, the second selected feature is ranked second, and so forth. Suppose we have included $m - 1$ features in $S_{m-1}$ (where $m > 1$) and want to add the $m$-th feature. Let $S'_{m-1}$ be the set of features that have not been selected. The feature $j$ in $S'_{m-1}$ which maximizes the MID or MIQ condition is selected as the $m$-th feature [39]:

\[ \text{MID}: \quad \max_{j \in S'_{m-1}} \left[ I(j, h) - \frac{1}{m-1} \sum_{i \in S_{m-1}} I(j, i) \right], \tag{4.5} \]

\[ \text{MIQ}: \quad \max_{j \in S'_{m-1}} \left[ I(j, h) \Big/ \left( \frac{1}{m-1} \sum_{i \in S_{m-1}} I(j, i) \right) \right]. \tag{4.6} \]

One needs to determine which one of the two criteria to use when using mRMR. After several test runs, it was found that using either the MID or the MIQ criterion eventually yielded similar classification accuracy for the proposed method. As neither MID nor MIQ had an apparent advantage over the other, the MIQ criterion was adopted for the performance evaluation.

4.3 Performance Evaluation

To use mRMR, we needed to provide the class label of each sample (day) and the 72 energy features of each sample. First, each sample was labelled as 0 (regular) or 1 (irregular) according to the pseudo ground truth introduced in Section 3.5.1. Second, the 72 sub-scores which correspond to the 72 energy features (as mentioned in Section 2.3.2) of each sample were retrieved. Third, mRMR was used to rank the energy features for Homes A, B and C respectively. Please refer to Appendix F for the detailed rankings. The top ranked feature is the feature $i$ that maximizes the relevance condition (i.e. Equation 4.2). The next highest ranked feature is the feature $j$ that maximizes the MIQ condition (i.e. Equation 4.6). A flowchart of the proposed method based on mRMR and FCM
(PM-mRMRnFCM) is shown in Figure 4.1. A flowchart of the proposed method based on FCM (PM-FCM) is also shown in Figure 4.1 for comparison. PM-mRMRnFCM is essentially the same as PM-FCM, except that PM-mRMRnFCM reduces the number of energy features before the FCM clustering.

Figure 4.1: Flowcharts of the proposed method based on mRMR and FCM (left) and the proposed method based on FCM (right)

The performance of this proposed method was then evaluated when a subset of the top ranked m (where 1 ≤ m ≤ 72) feature(s) was considered. The performance evaluation was based on all three datasets of Homes A, B and C. Please note that each home has its own ranking of the 72 energy features. The top ranked m feature(s) means the top ranked m feature(s) of each home respectively. The performance shown here is the average performance for all Homes A, B and C. As can be seen in Figure 4.2, neither the best training set accuracy nor the best test set accuracy was achieved when all 72 features were considered. The training set accuracy was the highest when the top ranked four to 12 features were considered. The test set accuracy was the highest when the top ranked two or 13 to 60 features were considered. The accuracy percentages of some key points in Figure 4.2 are listed in Table 4.1.

[Figure 4.2: Accuracies of the proposed method when subsets of the respective top ranked one to 72 feature(s) of Homes A, B and C were considered. Axes: # of features included against Accuracy %; curves: Training, Test.]
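The incremental ranking of Section 4.2 with the MIQ criterion (Equation 4.6) can be sketched as follows. This is a minimal illustration, not the mRMR implementation of [39, 40]: it assumes that the feature values have already been discretised, it adds a small `eps` to avoid dividing by a zero redundancy, and the function names, feature names and toy values are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
               for (a, b), c in count_xy.items())


def mrmr_miq(features, labels, n_select, eps=1e-12):
    """Rank features greedily with the MIQ criterion (Equation 4.6).

    `features` maps a feature name to its list of discretised values.
    `eps` guards against a zero redundancy (an assumption made here).
    """
    relevance = {name: mutual_information(values, labels)
                 for name, values in features.items()}
    # The first feature is the one with the highest relevance I(h, i).
    ranking = [max(relevance, key=relevance.get)]
    while len(ranking) < min(n_select, len(features)):
        best_name, best_score = None, -math.inf
        for name, values in features.items():
            if name in ranking:
                continue
            # Mean mutual information with the already selected features.
            redundancy = sum(mutual_information(values, features[s])
                             for s in ranking) / len(ranking)
            score = relevance[name] / max(redundancy, eps)
            if score > best_score:
                best_name, best_score = name, score
        ranking.append(best_name)
    return ranking


# Toy example: 8 days, 0 = regular and 1 = irregular (invented values).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "range_h07": [0, 0, 0, 1, 1, 1, 1, 1],     # relevant
    "range_h08": [0, 0, 0, 1, 1, 1, 1, 1],     # exact duplicate of range_h07
    "max_time_h19": [0, 1, 0, 0, 1, 1, 0, 1],  # less relevant, less redundant
}
ranking = mrmr_miq(features, labels, n_select=3)
```

In this toy example the duplicated feature is ranked last even though its relevance equals that of the top ranked feature, because its redundancy with the already selected feature is high. This is the behaviour that distinguishes mRMR from filters that rank by relevance alone.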
# of feature(s)   Training accuracy   Test accuracy
1                 77.05%              83.33%
4                 100%                86.67%
13                98.36%              90%
72                93.44%              86.67%

Table 4.1: The performances of the proposed method when subsets of the top ranked one, four, 13 and 72 feature(s) were considered

4.4 Discussion

It was found that the performance of the proposed method could be improved when subsets of the 72 energy features were considered based on mRMR feature selection. When the top ranked four features were considered, the training set accuracy was improved as compared to when all features were considered. When the top ranked 13 features were considered, both the training set and test set accuracies were improved as compared to when all features were considered.

As can be seen in Table 4.1, the training accuracy was lower than the test accuracy when only the top ranked feature was considered. Two factors may explain this. First, the learning algorithm might not be able to capture the underlying trend of the data if too few features are included (under-fitting). Second, there are more training samples than test samples. Therefore, an under-fitted model might occasionally fit the test samples better than the training samples.

As can be seen in Figure 4.2, the training and test accuracies at both ends of the figure do not seem very stable. This is because mRMR, a filter type feature selection method, ignores the effect of the selected subset of features on the classification accuracy when ranking features. Therefore, there is no guarantee that the classification accuracy will be improved when the next highest ranked feature is included.

Similar effects on the classification accuracy of some filter type feature selection methods (e.g. information gain, mRMR) can also be observed in [55, 56]. In [55], the researchers classified some Comparative Genomic Hybridization (CGH) data based on mRMR and SVM.
The instability in the classification accuracy can be observed when the number of selected features was changed. The instability might be more noticeable for our classification problem because we have a relatively small sample size; each sample in the training set accounts for 1.64% (1/61 × 100%) of the classification accuracy and each sample in the test set accounts for 3.33% (1/30 × 100%) of the classification accuracy.

In [56], the researchers investigated the relationship between feature selection (and extraction) methods and the resulting classification accuracy based on various classifiers such as SVM. From the experimental results in [56], the following can be observed:

1. The classification accuracy was not stable when the number of selected features (ranked by a given method) was changed.
2. The effectiveness of a subset of selected features (ranked by a given method) on the classification accuracy might vary from classifier to classifier.
3. There are generally no rules for choosing the optimal number of features.

Chapter 5

Classifying Electrical Energy Consumption Patterns Based on C-ELM

In Chapter 2 and Chapter 3, the Proposed Method based on Fuzzy C-Means clustering (PM-FCM) was introduced. PM-FCM classifies energy patterns without labelled training data and is hence an unsupervised learning technique. In Chapter 4, PM-FCM was enhanced by adopting the mRMR feature selection (PM-mRMRnFCM). The mRMR feature selection ranks the 72 energy features according to some labelled data based on the pseudo ground truth deduced from the survey (introduced in Section 3.5.1). PM-mRMRnFCM classifies energy patterns using subsets of energy features according to the ranking provided by mRMR. The training process of PM-mRMRnFCM is still unsupervised because the training process does not require labelled training data.
In this chapter, we propose a supervised learning technique based on the Curious Extreme Learning Machine (C-ELM) [41] which uses both the energy features and the pseudo ground truth (i.e. labelled training data) for training. The classification based on C-ELM and its performance evaluation are discussed below.

5.1 Classification Based on C-ELM

In this section, a supervised learning technique is adopted for classifying energy consumption patterns. The Curious Extreme Learning Machine (C-ELM) [41] is chosen because it has been shown that C-ELM outperforms other popular classifiers such as SVM based on some benchmark classification problems [41]. C-ELM requires the input vectors to be labelled by the desired outputs for training. For our classification problem, the input vectors are the 72-dimensional vectors of the energy features, each labelled by the desired output (i.e. regular or irregular) which is the pseudo ground truth deduced from the survey. The pseudocode for the C-ELM training is given in Algorithm 5. The details of C-ELM are given in [41].

Algorithm 5 The pseudocode for the C-ELM training
Step 1: Present a sample (x_t, c_t), where x_t denotes the t-th input vector (i.e. the 72 energy features) and c_t denotes its class label (i.e. regular or irregular)
Step 2: Compute the values of four variables (Novelty, Uncertainty, Conflict, and Surprise) for the given input data
Step 3: Based on the four variables in Step 2, select one of the following learning strategies: 1) add a hidden neuron, 2) delete a hidden neuron, and 3) update network parameter
Step 4: Increment t by 1
Step 5: Repeat Step 1 to Step 4 until the last sample has been reached

C-ELM also requires choosing the following five thresholds for the four variables:

1. The low threshold for Novelty
2. The high threshold for Novelty
3. The threshold for Uncertainty
4. The threshold for Conflict
5. The threshold for Surprise

Each threshold value ranges from 0 to 1 and hence it is not practical to exhaustively explore all possible combinations. However, about 10000 different combinations of the five threshold values were tested. The chosen threshold values are those that provided the best testing accuracy by experiment. The threshold values for Homes A, B and C are shown in Table 5.1.

Threshold       Home A   Home B   Home C
Novelty Low     0.1      0.1      0.1
Novelty High    0.3      0.5      0.4
Uncertainty     0.2      0.2      0.3
Conflict        0.1      0.2      0.5
Surprise        0.6      0.3      0.6

Table 5.1: The threshold values for C-ELM for Homes A, B and C

The lengths (number of samples) of the training set and test set for each home are the same as shown in Table 3.4 in Section 3.5.2.

5.2 Performance Evaluation

The performance of the Proposed Method based on C-ELM (PM-CELM) was evaluated. PM-CELM was then enhanced by adopting mRMR (PM-mRMRnCELM). Flowcharts of PM-CELM and PM-mRMRnCELM are shown in Figure 5.1.

Figure 5.1: Flowcharts of the proposed method based on C-ELM (without the shaded steps) and the proposed method based on mRMR and C-ELM (with the shaded steps)

The accuracy performance of PM-CELM is shown in Table 5.2. The performances of PM-FCM (as in Table 3.5) and PM-mRMRnFCM (as in Table 4.1) are also shown here for comparison. The performance shown here is the average performance for all Homes A, B and C.

                        PM-CELM   PM-FCM   PM-mRMRnFCM    PM-mRMRnFCM
                                           (4 features)   (13 features)
Training Set Accuracy   100%      93.44%   100%           98.36%
Test Set Accuracy       96.67%    86.67%   86.67%         90%

Table 5.2: The performances of PM-CELM, PM-FCM and PM-mRMRnFCM

The procedure for adopting mRMR is the same as discussed in Section 4.3.
Since mRMR is independent of the learning algorithm used, the rankings of the energy features (please refer to Appendix F) are exactly the same for both PM-mRMRnCELM and PM-mRMRnFCM. The performance of PM-mRMRnCELM was evaluated and is shown in Figure 5.2. As can be seen in Figure 5.2, the best training accuracy and test accuracy are 100% and 96.67% respectively. The accuracy percentages of some key points in Figure 5.2 are listed in Table 5.3. The best performance was achieved when all 72 features were considered. However, the same performance could also be achieved when only the top ranked 11 features were considered. As can be seen in Figure 5.2, the classification accuracy does not seem very stable. The instability in the classification accuracy was discussed in Section 4.4.

[Figure 5.2: Accuracies of PM-mRMRnCELM when subsets of the top ranked one to 72 feature(s) of Homes A, B and C were considered. Axes: # of features included against Accuracy %; curves: Training, Test.]

# of features   Training accuracy   Test accuracy
11              100%                96.67%
72              100%                96.67%

Table 5.3: The performances of PM-mRMRnCELM when subsets of the top ranked 11 and 72 features were considered

5.3 Discussion

In this chapter, we proposed a supervised learning method based on C-ELM (PM-CELM). It was then enhanced by adopting the mRMR feature selection (PM-mRMRnCELM). They provide two simple solutions for classifying energy consumption patterns as either regular or irregular if the ground truth (of whether the underlying activity pattern is regular or irregular) is available for training. On the other hand, PM-FCM, the proposed unsupervised learning method based on FCM, as described in Chapter 2 and Chapter 3, can operate without the ground truth or labelled training data.
The performance evaluation in this chapter shows that PM-CELM outperforms PM-FCM, as might be expected. However, PM-CELM can only be used when labelled training data are available.

Chapter 6

Conclusion and Future Work

In this chapter, the main findings are summarized and some topics of future research are discussed.

6.1 Conclusion

This thesis explored the correlation of household electricity consumption patterns and activity patterns of an occupant. The objective of the research work in this thesis is to classify the daily activity patterns of an occupant as regular or irregular. The fundamental assumption was that anomalies in energy consumption patterns would reflect anomalies in the underlying activity patterns.

In Chapter 2, a feature and model based method for detecting anomalies in activity patterns of an occupant using electrical energy consumption patterns was proposed. The raw energy consumption patterns were believed to be ineffective for reflecting the anomalies in the underlying activity patterns. Therefore, features which were believed to be more effective for reflecting anomalies in activity patterns were extracted from the raw energy consumption data and a model was built based on the features extracted. The output of the proposed method was a Score, a quantitative assessment of anomalies in the energy consumption pattern of a given day.

In Chapter 3, the correlation of the scores provided by the proposed method and the activity patterns of three lone occupants was shown numerically. The proposed feature and model based method was also compared with a chosen raw data based approach. A survey was conducted to obtain evidence to show that anomalies in energy consumption patterns could reflect anomalies in activity patterns. Three lone occupants participated in the survey. The participants were asked to report their hourly activities every day on the survey timesheets for a month.
The chosen features for representing the regular (recurring) activity patterns were 1) the daily highly probable activities, 2) daily less probable activities, 3) hourly highly probable activities, 4) hourly less probable activities, 5) daily less probable durations of activities and 6) less probable daily energy consumption.

Using the features of activity patterns, a Level of Suspicion, a quantitative assessment of anomalies in a daily activity pattern, could be computed. Fuzzy C-Means clustering (FCM) [44] was used to cluster the level of suspicion values into two clusters (i.e. regular and irregular). The scores provided by the proposed method were also clustered by FCM into two clusters (i.e. regular and irregular). A raw data based approach [32] was chosen to compare with the proposed method. The raw energy consumption sequences were again clustered by FCM and the clusters were classified as regular or irregular.

Using the clustering result of the level of suspicion values as the pseudo ground truth, the clustering and classification results of the scores and the raw energy consumption sequences were compared. First, the results showed that both the scores and the raw energy sequences correlated with the level of suspicion values. In other words, the results indicated that anomalies in activity patterns can be effectively inferred from energy consumption patterns.

Second, the results showed that the proposed method performs better than the chosen raw data based approach, i.e. extracted features are more effective than the raw energy consumption patterns for reflecting anomalies in the underlying activity patterns.

In Chapter 4, the mRMR feature selection [39] was adopted for reducing the number of features used for representing an energy consumption pattern of a day. The 72 energy features were ranked based on mRMR.
It was found that the training set accuracy was improved when the top ranked four features were considered while the test set accuracy remained the same as compared to when all features were considered. It was also found that both training set and test set accuracies were improved when the top ranked 13 features were considered as compared to when all features were considered.

In Chapter 5, we proposed a supervised learning method based on C-ELM (PM-CELM) for classifying the energy consumption patterns as either regular or irregular. PM-CELM requires labelled training data (i.e. it is a supervised learning method). The training data were labelled based on the pseudo ground truth deduced from the survey (introduced in Section 3.5.1). In contrast, the proposed method based on FCM (PM-FCM) introduced in Chapter 2 and Chapter 3 does not require labelled training data (i.e. it is an unsupervised learning method). It was shown that PM-CELM outperforms PM-FCM. PM-CELM was then enhanced by adopting mRMR feature selection (PM-mRMRnCELM). It was shown that PM-mRMRnCELM can achieve the same accuracy performance as PM-CELM using a subset of the energy features.

The advantage of the proposed method (PM-FCM) is that it detects anomalies in activity patterns of an occupant using household electricity consumption patterns without a need of explicitly monitoring the individual activities of the occupant. Most related works [21–25, 27–30] for detecting anomalies in activities of a person require monitoring the individual activities or actions of the person. Monitoring the individual activities may require a lot more hardware and annotating the individual activities captured by some hardware (e.g.
sensors and cameras) could be time-consuming.

The limitations of the proposed method are as follows: 1) it can only detect anomalies in activities that affect energy consumption, 2) it does not identify which particular activity led to the detected anomalies, and 3) it does not identify whether the detected anomalies indicate a positive or negative change in a person’s wellness. However, the proposed method can trigger an alert to caregivers or relatives who can then investigate the cause of the anomalies in a timely manner. If the objective of a certain assisted living system is to automatically raise an alarm when an activity pattern of a monitored person deviates significantly from the regular activity patterns, the proposed method provides a simple solution.

6.2 Future Work

The proposed method (PM-FCM) in this thesis can detect anomalies in activity patterns of an occupant from household energy consumption data, but it does not identify which particular activity led to the anomalies. A possible topic for future research is to recognize individual activities from energy consumption patterns. One will probably need to obtain the activity patterns of an occupant in some way (e.g. survey, sensors) for a relatively long time before one might match a certain activity to a particular energy consumption pattern accurately.

Another limitation of the proposed method in this thesis is that it does not identify the implication of the detected anomalies on a person’s wellness. Therefore, another possible topic for future research is to explore the correlation of energy consumption patterns and a person’s wellness. A possible intermediate step would be to first investigate the correlation of activity patterns and a person’s wellness. One might need to first discover the particular activity pattern (or patterns) which corresponds to a certain change in the person’s wellness.
One can then observe the corresponding energy consumption patterns every time that particular activity pattern occurs. After a long term observation, one might be able to relate a certain change in a person’s wellness to a particular energy consumption pattern (or patterns).

On the other hand, one might skip the intermediate step mentioned above and directly match a certain change in a person’s wellness to a particular energy consumption pattern (or patterns). A major challenge of this future research is to obtain the baseline measure of a person’s wellness. Another difficulty is that it will probably require a large amount of information before one can be sure that a particular energy consumption pattern indicates a certain change in a particular person’s wellness.

Bibliography

[1] United Nations, Department of Economic and Social Affairs, Population Division, “World population ageing 2013,” 2013.
[2] W. W. Hung, J. S. Ross, K. S. Boockvar, and A. L. Siu, “Recent trends in chronic disease, impairment and disability among older adults in the United States,” BioMed Central (BMC) Geriatrics, vol. 11, no. 47, 2011.
[3] Canada, Parliament, House of Commons, Standing Committee on Health, “Evidence,” in Meeting 7, 41st Parliament, 1st Session on, October 5, 2011.
[4] Canada, Parliament, House of Commons, Standing Committee on Health, “Evidence,” in Meeting 8, 41st Parliament, 1st Session on, October 17, 2011.
[5] Canada, Parliament, House of Commons, Standing Committee on Health, “Evidence,” in Meeting 12, 41st Parliament, 1st Session on, October 31, 2011.
[6] R. Harrell, J. Lynott, S. Guzman, and C. Lampkin, “What is livable? community preferences of older adults,” American Association of Retired Persons (AARP) Public Policy Institute, April 2014.
[7] P. Rashidi and A. Mihailidis, “A survey on ambient-assisted living tools for older adults,” IEEE Journal of Biomedical and Health Informatics, vol. 17, no.
3, pp. 579–590, 2013.
[8] B. Liu, Y. Zhang, and Z. Liu, “Wearable monitoring system with multiple physiological parameters,” in Medical Devices and Biosensors (MDBS), 5th International Summer School and Symposium on, 2008, pp. 268–271.
[9] E. Sardini, M. Serpelloni, and M. Ometto, “Multi-parameters wireless shirt for physiological monitoring,” in Medical Measurements and Applications Proceedings (MeMeA), IEEE International Workshop on, 2011, pp. 316–321.
[10] K. Malhi, S. C. Mukhopadhyay, J. Schnepper, M. Haefke, and H. Ewald, “A zigbee-based wearable physiological parameters monitoring system,” IEEE Sensors Journal, vol. 12, no. 3, pp. 423–430, 2012.
[11] R. Logier et al., “A multi sensing method for robust measurement of physiological parameters in wearable devices,” in Engineering in Medicine and Biology Society (EMBS), 36th Annual International Conference of the IEEE on, 2014, pp. 994–997.
[12] J. Chen, K. Kwong, D. Chang, J. Luk, and R. Bajcsy, “Wearable sensors for reliable fall detection,” in Engineering in Medicine and Biology Society (EMBS), 27th Annual International Conference of the IEEE on, 2006, pp. 3551–3554.
[13] T. Tamrat, M. Griffin, S. Rupcic, S. Kachnowski, T. Taylor, and J. Barfield, “Operationalizing a wireless wearable fall detection sensor for older adults,” in Pervasive Computing Technologies for Healthcare (PervasiveHealth), 6th International Conference of the IEEE on, 2012, pp. 297–302.
[14] M. Popescu, Y. Li, M. Skubic, and M. Rantz, “An acoustic fall detector system that uses sound height information to reduce the false alarm rate,” in Engineering in Medicine and Biology Society (EMBS), 30th Annual International Conference of the IEEE on, 2008, pp. 4628–4631.
[15] X. Zhuang, J. Huang, G. Potamianos, and M. Hasegawa-Johnson, “Acoustic fall detection using gaussian mixture models and GMM supervectors,” in International Conference on Acoustics, Speech and Signal Processing (ICASSP) of the IEEE on, 2009, pp. 69–72.
[16] C. Rougier, J. Meunier, A.
St-Arnaud, and J. Rousseau, “Robust video surveillance for fall detection based on human shape deformation,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 21, no. 5, pp. 611–622, 2011.
[17] V. Vaidehi, K. Ganapathy, K. Mohan, A. Aldrin, and K. Nirmal, “Video based automatic fall detection in indoor environment,” in International Conference on Recent Trends in Information Technology (ICRTIT) of the IEEE on, 2011, pp. 1016–1020.
[18] Z. Zhou, X. Chen, Y.-C. Chung, Z. He, T. X. Han, and J. M. Keller, “Activity analysis, summarization, and visualization for indoor human activity monitoring,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 18, no. 11, pp. 1489–1498, 2008.
[19] L. Chen, J. Hoey, C. D. Nugent, D. J. Cook, and Z. Yu, “Sensor-based activity recognition,” Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, vol. 42, no. 6, pp. 790–808, 2012.
[20] O. D. Lara and M. A. Labrador, “A survey on human activity recognition using wearable sensors,” IEEE Communications Surveys & Tutorials, vol. 15, no. 3, pp. 1192–1209, 2013.
[21] N. Suryadevara, A. Gaddam, S. Mukhopadhyay, and R. Rayudu, “Wellness determination of inhabitant based on daily activity behaviour in real-time monitoring using sensor networks,” in Sensing Technology (ICST), 5th International Conference of the IEEE on, 2011, pp. 474–481.
[22] N. K. Suryadevara and S. C. Mukhopadhyay, “Wireless sensor network based home monitoring system for wellness determination of elderly,” IEEE Sensors Journal, vol. 12, no. 6, pp. 1965–1972, 2012.
[23] N. Suryadevara, S. Mukhopadhyay, R. Wang, R. K. Rayudu, and Y. Huang, “Reliable measurement of wireless sensor network data for forecasting wellness of elderly at smart home,” in Instrumentation and Measurement Technology Conference (I2MTC) of the IEEE on, 2013, pp. 16–21.
[24] N. Suryadevara and S.
Mukhopadhyay, “An intelligent system for continuous monitoring of wellness of an inhabitant for sustainable future,” in Humanitarian Technology Conference (R10-HTC) of the IEEE on, 2014, pp. 70–75.
[25] N. K. Suryadevara and S. C. Mukhopadhyay, “Determining wellness through an ambient assisted living environment,” IEEE Intelligent Systems, vol. 29, no. 3, pp. 30–37, 2014.
[26] B. O’Mullane, B. Bortz, A. O’Hannlon, J. Loane, and R. B. Knapp, “Comparison of health measures to movement data in aware homes,” Ambient Intelligence, Springer, pp. 290–294, 2011.
[27] V. Jakkula, D. J. Cook, et al., “Anomaly detection using temporal data mining in a smart home environment,” Methods of Information in Medicine, vol. 47, no. 1, pp. 70–75, 2008.
[28] K.-J. Kim, M. M. Hassan, S. Na, and E.-N. Huh, “Dementia wandering detection and activity recognition algorithm using tri-axial accelerometer sensors,” in Ubiquitous Information Technologies & Applications (ICUT), 4th International Conference of the IEEE on, 2009, pp. 1–5.
[29] C. Franco, J. Demongeot, C. Villemazet, and N. Vuillerme, “Behavioral telemonitoring of the elderly at home: Detection of nycthemeral rhythms drifts from location data,” in Advanced Information Networking and Applications Workshops (WAINA), 24th International Conference of the IEEE on, 2010, pp. 759–766.
[30] E. Campo, M. Chan, W. Bourennane, and D. Estève, “Behaviour monitoring of the elderly by trajectories analysis,” in Engineering in Medicine and Biology Society (EMBS), Annual International Conference of the IEEE on, 2010, pp. 2230–2233.
[31] T. W. Liao, “Clustering of time series data – a survey,” Pattern Recognition, vol. 38, no. 11, pp. 1857–1874, 2005.
[32] F. Iglesias and W. Kastner, “Analysis of similarity measures in times series clustering for the discovery of building energy patterns,” Energies, vol. 6, no. 2, pp. 579–597, 2013.
[33] P. C.
Mahalanobis, “On the generalized distance in statistics,” in Proceedings of the National Institute of Sciences, vol. 2, 1936, pp. 49–55.
[34] D. J. Berndt and J. Clifford, “Using dynamic time warping to find patterns in time series,” in Association for the Advancement of Artificial Intelligence (AAAI) Workshop on Knowledge Discovery in Databases on, 1994, pp. 359–370.
[35] K. Pearson, “Mathematical contributions to the theory of evolution – on a form of spurious correlation which may arise when indices are used in the measurement of organs,” in Proceedings of the Royal Society of London, vol. 60, 1896, pp. 489–498.
[36] J. C. Dunn, “A fuzzy relative of the isodata process and its use in detecting compact well-separated clusters,” Journal of Cybernetics, pp. 32–57, 1974.
[37] D. L. Davies and D. W. Bouldin, “A cluster separation measure,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, no. 2, pp. 224–227, 1979.
[38] Y. Jung, H. Park, D.-Z. Du, and B. L. Drake, “A decision criterion for the optimal number of clusters in hierarchical clustering,” Journal of Global Optimization, vol. 25, no. 1, pp. 91–111, 2003.
[39] H. Peng, F. Long, and C. Ding, “Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 27, no. 8, pp. 1226–1238, 2005.
[40] C. Ding and H. Peng, “Minimum redundancy feature selection from microarray gene expression data,” Journal of bioinformatics and computational biology, vol. 3, no. 2, pp. 185–205, 2005.
[41] Q. Wu and C. Miao, “C-ELM: A curious extreme learning machine for classification problems,” in Proceedings of ELM-2014, vol. 1, pp. 355–366.
[42] BC Hydro. Smart metering program. Accessed: 2015-06-05. [Online]. Available: https://www.bchydro.com/energy-in-bc/projects/smart meteringinfrastructure program.html
[43] E. Kreyszig, Advanced Engineering Mathematics Tenth Edition.
Wiley, 2011, p. 1014.
[44] J. C. Bezdek, R. Ehrlich, and W. Full, “FCM: The fuzzy c-means clustering algorithm,” Computers & Geosciences, vol. 10, no. 2, pp. 191–203, 1984.
[45] U. Maulik and S. Bandyopadhyay, “Performance evaluation of some clustering algorithms and validity indices,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 24, no. 12, pp. 1650–1654, 2002.
[46] H. Liu, J. Li, and L. Wong, “A comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns,” Genome Informatics, vol. 13, pp. 51–60, 2002.
[47] S. Doraisamy, S. Golzari, N. Mohd, M. N. Sulaiman, and N. I. Udzir, “A study on feature selection and classification techniques for automatic genre classification of traditional Malay music,” in the 9th International Conference on Music Information Retrieval on, 2008, pp. 331–336.
[48] G. Forman, “An extensive empirical study of feature selection metrics for text classification,” The Journal of Machine Learning Research, vol. 3, pp. 1289–1305, 2003.
[49] I. Guyon and A. Elisseeff, “An introduction to variable and feature selection,” The Journal of Machine Learning Research, vol. 3, pp. 1157–1182, 2003.
[50] K. Kira and L. A. Rendell, “The feature selection problem: traditional methods and a new algorithm,” in the 10th national conference on Artificial intelligence on, 1992, pp. 129–134.
[51] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
[52] R. Kohavi and G. H. John, “Wrappers for feature subset selection,” Artificial Intelligence, vol. 97, no. 1, pp. 273–324, 1997.
[53] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, “Gene selection for cancer classification using support vector machines,” Machine Learning, vol. 46, no. 1-3, pp. 389–422, 2002.
[54] E. Frank, M. Hall, and B. Pfahringer, “Locally weighted naive Bayes,” in the 19th conference on Uncertainty in Artificial Intelligence on, 2002, pp.
249–256.
[55] J. Liu, S. Ranka, and T. Kahveci, “Classification and feature selection algorithms for multi-class CGH data,” Bioinformatics, vol. 24, no. 13, pp. i86–i95, 2008.
[56] A. Janecek, W. N. Gansterer, M. Demel, and G. Ecker, “On the relationship between feature selection and classification accuracy,” in Journal of Machine Learning Research (JMLR), Workshop and Conference on, vol. 4, 2008, pp. 90–105.

Appendices

Appendix A
Pseudocode for the Proposed Method

Algorithm 6 The pseudocode for the proposed method
Let D be the number of days in history
int[ ][ ] maxTimeCount = new int [24][24]
int[ ][ ] minTimeCount = new int [24][24]
double[ ][ ] range = new double [24][D]
for d = 1 to D do
    for i = 1 to 24 do
        Get the i-hour moving total sequence of day d
        maxTime = time of day when the Maximum occurred
        minTime = time of day when the Minimum occurred
        range[i][d] = Maximum - Minimum
        maxTimeCount[i][maxTime] increased by 1
        minTimeCount[i][minTime] increased by 1
    end for
end for
*** Deduce regular energy patterns from history ***
double[ ][ ] probMax = new double [24][24]
double[ ][ ] probMin = new double [24][24]
probMax = maxTimeCount/D
probMin = minTimeCount/D
double[ ] rangeMean = new double [24]
double[ ] rangeStdev = new double [24]
for i = 1 to 24 do
    rangeMean[i] = Mean(range[i][1:D])
    rangeStdev[i] = Standard Deviation(range[i][1:D])
end for
Algorithm 6 The pseudocode for the proposed method (continued)
*** Calculate the score for a given day ***
*** Assuming the score assignments are +/- 1 ***
int[ ] scoreMax = new int [24]
int[ ] scoreMin = new int [24]
int[ ] scoreRange = new int [24]
for i = 1 to 24 do
    Get the i-hour moving total sequence of the day of interest
    thisMaxTime = time of day when the Maximum occurred
    thisMinTime = time of day when the Minimum occurred
    thisRange = Maximum - Minimum
    thisRangeZscore = abs(thisRange - rangeMean[i])/rangeStdev[i]
    if probMax[i][thisMaxTime] ≥ thresholdProbMax then
        scoreMax[i] = 1
    else
        scoreMax[i] = -1
    end if
    if probMin[i][thisMinTime] ≥ thresholdProbMin then
        scoreMin[i] = 1
    else
        scoreMin[i] = -1
    end if
    if thisRangeZscore ≤ thresholdRangeZscore then
        scoreRange[i] = 1
    else
        scoreRange[i] = -1
    end if
end for
FinalScore = sum(scoreMax[1:24])+sum(scoreMin[1:24])+sum(scoreRange[1:24])

Appendix B
Survey Samples from Home C

The survey timesheets of March 2 and March 15 from Home C are provided in this appendix.

Figure B.1: Home C survey timesheet - Mar 2, 2015

Figure B.2: Home C survey timesheet - Mar 15, 2015

Appendix C
Choosing the Design Variable Values for the Proposed Method

This appendix explains how the design variables were chosen, using Home B as an example. Please refer to Table 3.1 in Section 3.3.1 for the list of the chosen thresholds of Home B. The thresholds were chosen according to the hourly energy consumption data.
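To make the role of these design variables concrete, the pseudocode in Appendix A can be sketched in Python as follows. This is only an illustrative sketch, not the implementation used in the thesis. The function names, the trailing-window reading of the "i-hour moving total", the use of the sample standard deviation and the synthetic data are all assumptions. The 60-day window and the default thresholds of 0.1 (maximum) and 1 (range Z-Score) are the Home B values discussed in this appendix; the 0.1 used for the minimum is a placeholder.

```python
import random
from statistics import mean, stdev

HOURS = 24

def moving_totals(hourly, day, window):
    """24 totals for `day`, one per hour, each summing the last `window` hours.

    ASSUMPTION: the window trails back into the previous day; Appendix A does
    not spell out how the i-hour moving total sequence is formed.
    """
    start = day * HOURS
    if start < window - 1:
        raise ValueError("not enough preceding hours for this window")
    return [sum(hourly[start + h - window + 1:start + h + 1]) for h in range(HOURS)]

def day_features(hourly, day):
    """(max time, min time, range) of the i-hour moving totals, i = 1..24."""
    features = []
    for window in range(1, HOURS + 1):
        totals = moving_totals(hourly, day, window)
        high, low = max(totals), min(totals)
        features.append((totals.index(high), totals.index(low), high - low))
    return features

def fit_history(hourly, days):
    """Deduce the regular energy pattern from the days in `days` (the data window)."""
    days = list(days)
    max_count = [[0] * HOURS for _ in range(HOURS)]
    min_count = [[0] * HOURS for _ in range(HOURS)]
    ranges = [[] for _ in range(HOURS)]
    for day in days:
        for i, (t_max, t_min, rng) in enumerate(day_features(hourly, day)):
            max_count[i][t_max] += 1
            min_count[i][t_min] += 1
            ranges[i].append(rng)
    n = len(days)
    return {
        "prob_max": [[c / n for c in row] for row in max_count],
        "prob_min": [[c / n for c in row] for row in min_count],
        "range_mean": [mean(r) for r in ranges],
        "range_stdev": [stdev(r) for r in ranges],  # sample stdev: an assumption
    }

def score_day(hourly, day, model, thr_prob_max=0.1, thr_prob_min=0.1, thr_range_z=1.0):
    """Final score in [-72, 72]; each of the 72 features contributes +1 or -1."""
    score = 0
    for i, (t_max, t_min, rng) in enumerate(day_features(hourly, day)):
        # Note: a zero standard deviation would need separate handling.
        z = abs(rng - model["range_mean"][i]) / model["range_stdev"][i]
        score += 1 if model["prob_max"][i][t_max] >= thr_prob_max else -1
        score += 1 if model["prob_min"][i][t_min] >= thr_prob_min else -1
        score += 1 if z <= thr_range_z else -1
    return score

# Synthetic hourly readings, purely illustrative: 62 days of random values.
random.seed(0)
hourly = [random.uniform(0.1, 2.0) for _ in range(62 * HOURS)]
model = fit_history(hourly, range(1, 61))  # 60-day data window
print(score_day(hourly, 61, model))
```

In this sketch the data window is the `days` argument of `fit_history`, and the three thresholds are keyword arguments of `score_day`, so the effect of each design variable on the final score can be explored separately.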
Since the 2-hour to 24-hour moving totals were essentially derived from the hourly energy consumption data, the same set of thresholds was applied to all of them.

[Figure C.1: four panels showing 30, 60, 90 and 120 days of data; x-axis: Time of day; y-axis: Probability]
Figure C.1: Occurrence probability of the maximum hourly energy consumption at Home B considering 30, 60, 90 and 120 days of data respectively

First, the number of days of data to be included was determined according to the hourly energy consumption data. As can be seen in Figure C.1, the maximum hourly energy consumption occurred more frequently at 2PM and 7PM and this was the most evident when only 30 days of data were included. As more data were included, the occurrence probabilities of the maximum hourly energy consumption at 2PM and 7PM tended to spread to adjacent hours. This is probably because the activity patterns of the occupant change slightly over time. On the one hand, we would like to capture only the most recent regular activity patterns of the occupant. On the other hand, we would also like to include as much data as possible so that we have more confidence in the probability estimation. Considering the trade-off between the above two objectives, the size of the data window was chosen to be 60 days.

Second, the threshold between the probable and less probable hours needed to be determined. The premise was that we would rather tolerate more false alarms than more missed detections of anomalies in activity patterns. Therefore, the threshold value was chosen so that there were far more less probable hours than probable hours.
In other words, only the hours with significantly higher probabilities than the others were deemed probable hours. As illustrated in the top right sub-plot of Figure C.1, there were only two noticeable peaks (at 2PM and 7PM). The threshold was then chosen so that only 2PM and 7PM were deemed probable. A probability of 0.1 appeared to be a reasonable threshold to achieve this. The threshold value for the occurrence of the minimum hourly energy consumption was chosen in a similar manner and hence is not discussed in detail in this appendix.

Third, the threshold value for the range Z-Score was determined. The threshold value was again determined according to the hourly energy consumption data. After observing several different 60-day data windows, the average of the absolute range Z-Scores of the hourly energy consumption data was found to be about 0.8. However, if the average was used as the threshold, there would be too many days deemed not normal from a range Z-Score’s perspective. Therefore, a threshold higher than the average was more desirable. The threshold value was then chosen to be 1, which is about 25% higher than the average.

Last, the score assignment was determined. As there was no particular reason to weight one feature more than the others, all the sub-scores were equally weighted. Therefore, the probabilities on or above the corresponding thresholds were assigned +1 while those below the thresholds were assigned -1. The range Z-Scores on or below the threshold were assigned +1 while those above the threshold were assigned -1.

Appendix D
Choosing the Threshold Values of Regular Activity Patterns for the Survey

This appendix explains how the threshold values were chosen, using Home B as an example.
Please refer to Table 3.2 in Section 3.3.2 for the list of the chosen threshold values and Table 3.3 in Section 3.3.2 for the length of data included for Home B.

First, the number of days of daily energy consumption data (i.e. the total energy consumption of all appliances during a day) to be included was determined. As the daily energy consumption generally did not vary much from day to day, 365 days of the daily energy consumption data were included in case there were any seasonal changes. Although the daily energy consumption data did not vary much from day to day, the hourly energy consumption data within a day fluctuated noticeably from day to day. Therefore, the size of the data window of the hourly energy consumption data mentioned in Appendix C was much smaller.

Second, the threshold values of regular activity patterns were determined. The best scenario was to have a few days with noticeably higher level of suspicion values and to have most of the other days with lower level of suspicion values. In order to achieve this, the combinations of the six threshold values needed to be considered.

Features of Activity Patterns                Measure       Potential Thresholds
Daily highly probable activities             Probability   ≥ 0.8/0.9
Daily less probable activities               Probability   ≤ 0.2/0.1
Hourly highly probable activities            Probability   ≥ 0.8/0.85/0.9/0.95
Hourly less probable activities              Probability   ≤ 0.15/0.1/0.05
Daily less probable durations of activities  Probability   ≤ 0.15/0.1/0.05
Less probable daily energy consumptions      |Z-Score|     ≥ 0.9/0.95/1/1.2/1.3

Table D.1: A list of potential threshold values for Home B

A list of the potential thresholds for Home B is shown in Table D.1. The potential thresholds were chosen by observation.
For instance, 0.7 was not included in the list for the daily highly probable activities because there was no activity that occurred on between 43% and 80% of the total number of days.

As mentioned in Section 3.2.7, the accumulated level of suspicion value for a day is equal to the sum of the level of suspicion values from the above six features. If the activity pattern of a certain day is significantly different from the regular activity patterns, the accumulated level of suspicion value of that particular day will probably be due to most of the six features. If the level of suspicion value of a day is relatively low and is only due to one feature, it probably indicates that the threshold value for that particular feature can be tightened. Therefore, the selection of the threshold values began with the loosest values for the six features. The next threshold values were chosen when that helped differentiate the days with noticeably higher level of suspicion values from the other days. The selection of threshold values ended when one of the following events was encountered: 1) the selection of the thresholds was already the tightest possible combination or 2) choosing the next threshold value of any of the six features would significantly harm the differentiation between the highly suspicious (possibly anomalous) days and the possibly normal days.

Appendix E
Performance Evaluation without Considering Daily Energy Consumption

                      Proposed Method    Raw Data Based Approach [32]
Training Set
  Accuracy            46/61 (75.41%)     43/61 (70.49%)
  Missed Detection    5                  4
  False Alarm         10                 14
Test Set
  Accuracy            22/30 (73.33%)     19/30 (63.33%)
  Missed Detection    4                  4
  False Alarm         4                  7

Table E.1: The performances of the proposed method and the chosen raw data based approach without considering daily energy consumption

The pseudo ground truth in this evaluation was derived from a modified version of the level of suspicion value.
The modified level of suspicion value only considers the five features described in Section 3.2.1 to Section 3.2.5, but does not include the daily energy consumption feature described in Section 3.2.6. The purpose of this evaluation was to examine the impact of the daily energy consumption feature on the accuracy performances of the proposed method and the raw data based approach. The performances of the proposed method and the raw data based approach are summarized in Table E.1. As compared to the accuracy performances in Section 3.5.5, the training set accuracies of the proposed method and the raw data based approach dropped by 19.30% (i.e. (93.44 − 75.41)/93.44 × 100%) and 20.37% respectively, while the test set accuracies dropped by 15.38% and 13.64% respectively. Removing the less probable daily energy consumption feature from the pseudo ground truth appeared to adversely affect both the proposed method and the raw data based approach to a similar degree.

Appendix F
Rankings of the Energy Features for Homes A, B and C

The rankings of the 72 energy features for Homes A, B and C based on mRMR feature selection (with the MIQ criterion) are shown in this appendix. A dictionary of the feature indices is also provided.
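For readers unfamiliar with the MIQ criterion, the greedy ranking can be sketched as below. This is a minimal illustration under assumptions, not the code used to produce the rankings in this appendix. After the most relevant feature is picked, each remaining feature is scored by the quotient of its mutual information with the class label (relevance) and its mean mutual information with the already selected features (redundancy). The function names, the toy data and the small `eps` guard are hypothetical, and the sketch assumes discrete feature values, so a continuous feature such as a range would first have to be discretized in some way not specified here.

```python
from collections import Counter
from math import log

def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())

def mrmr_miq(features, labels, eps=1e-12):
    """Rank features greedily by relevance / mean redundancy (the MIQ quotient).

    `features` maps a name to a list of discrete values. `eps` only guards
    against division by zero and is an assumption of this sketch.
    """
    relevance = {name: mutual_information(values, labels)
                 for name, values in features.items()}
    remaining = list(features)
    ranking = [max(remaining, key=relevance.get)]  # first pick: most relevant
    remaining.remove(ranking[0])
    while remaining:
        def quotient(name):
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in ranking) / len(ranking)
            return relevance[name] / (redundancy + eps)
        best = max(remaining, key=quotient)
        ranking.append(best)
        remaining.remove(best)
    return ranking

# Toy data (hypothetical): "b" is a noisy copy of "a"; "c" is independent of "a".
labels = [0, 0, 1, 1, 2, 2, 3, 3]
features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 0, 0, 1, 1, 1, 1, 1],
    "c": [0, 0, 1, 1, 0, 0, 1, 1],
}
ranking = mrmr_miq(features, labels)
print(ranking)  # "b" is ranked last because it is redundant with "a"
```

The toy example shows why a feature with fairly high relevance can still rank low: "b" carries information about the label, but most of it is already provided by "a".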
Entries are feature indices.

Rank  Home A  Home B  Home C    Rank  Home A  Home B  Home C
1     65      52      63        37    4       25      7
2     33      50      66        38    60      9       38
3     9       44      61        39    62      59      42
4     51      49      51        40    14      48      28
5     69      66      57        41    27      60      17
6     16      26      64        42    29      11      41
7     23      31      70        43    12      61      47
8     64      53      65        44    22      62      14
9     6       51      58        45    20      17      46
10    52      67      60        46    30      36      8
11    67      39      52        47    7       63      5
12    54      68      53        48    8       7       48
13    32      54      69        49    5       8       29
14    66      69      56        50    45      32      21
15    53      46      72        51    43      37      25
16    31      27      54        52    3       14      37
17    55      43      62        53    47      6       31
18    70      70      55        54    25      16      27
19    68      55      67        55    46      29      26
20    17      71      49        56    24      12      43
21    56      56      59        57    15      20      19
22    10      72      68        58    36      15      44
23    57      41      50        59    2       33      15
24    71      34      71        60    38      47      1
25    49      57      32        61    42      30      36
26    58      40      40        62    1       13      23
27    72      10      22        63    18      23      2
28    50      18      34        64    40      22      4
29    28      35      35        65    37      38      18
30    59      3       6         66    19      1       24
31    11      64      3         67    26      21      39
32    61      65      16        68    41      4       13
33    21      19      11        69    48      42      9
34    63      2       30        70    35      5       20
35    34      28      45        71    44      24      10
36    13      58      33        72    39      45      12

Table F.1: The rankings of the energy features for Homes A, B and C based on mRMR with the MIQ criterion
Index  Feature             Index  Feature
1      1-hour Max time     37     13-hour Min time
2      2-hour Max time     38     14-hour Min time
3      3-hour Max time     39     15-hour Min time
4      4-hour Max time     40     16-hour Min time
5      5-hour Max time     41     17-hour Min time
6      6-hour Max time     42     18-hour Min time
7      7-hour Max time     43     19-hour Min time
8      8-hour Max time     44     20-hour Min time
9      9-hour Max time     45     21-hour Min time
10     10-hour Max time    46     22-hour Min time
11     11-hour Max time    47     23-hour Min time
12     12-hour Max time    48     24-hour Min time
13     13-hour Max time    49     1-hour Range
14     14-hour Max time    50     2-hour Range
15     15-hour Max time    51     3-hour Range
16     16-hour Max time    52     4-hour Range
17     17-hour Max time    53     5-hour Range
18     18-hour Max time    54     6-hour Range
19     19-hour Max time    55     7-hour Range
20     20-hour Max time    56     8-hour Range
21     21-hour Max time    57     9-hour Range
22     22-hour Max time    58     10-hour Range
23     23-hour Max time    59     11-hour Range
24     24-hour Max time    60     12-hour Range
25     1-hour Min time     61     13-hour Range
26     2-hour Min time     62     14-hour Range
27     3-hour Min time     63     15-hour Range
28     4-hour Min time     64     16-hour Range
29     5-hour Min time     65     17-hour Range
30     6-hour Min time     66     18-hour Range
31     7-hour Min time     67     19-hour Range
32     8-hour Min time     68     20-hour Range
33     9-hour Min time     69     21-hour Range
34     10-hour Min time    70     22-hour Range
35     11-hour Min time    71     23-hour Range
36     12-hour Min time    72     24-hour Range

Table F.2: Dictionary of the feature indices
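The layout of Table F.2 is regular: indices 1 to 24 are the Max time features, 25 to 48 the Min time features and 49 to 72 the Range features, each ordered by the length of the moving total. A feature index can therefore be decoded programmatically, as in the short sketch below. The function name and constant are hypothetical; the example decodes the four top ranked features of Home A from Table F.1.

```python
FEATURE_KINDS = ("Max time", "Min time", "Range")  # block order taken from Table F.2

def feature_name(index: int) -> str:
    """Map a feature index (1..72) to its label in Table F.2."""
    if not 1 <= index <= 72:
        raise ValueError("feature index must be in 1..72")
    kind, window = divmod(index - 1, 24)
    return f"{window + 1}-hour {FEATURE_KINDS[kind]}"

# The four top ranked features of Home A (ranks 1 to 4 in Table F.1).
top4_home_a = [feature_name(i) for i in (65, 33, 9, 51)]
print(top4_home_a)
# ['17-hour Range', '9-hour Min time', '9-hour Max time', '3-hour Range']
```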
