UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Improving the feasibility of energy disaggregation in very high- and low-rate sampling scenarios Clark, Michelle Susan 2015

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata


24-ubc_2015_november_clark_michelle.pdf [ 1.6MB ]
JSON: 24-1.0220734.json
JSON-LD: 24-1.0220734-ld.json
RDF/XML (Pretty): 24-1.0220734-rdf.xml
RDF/JSON: 24-1.0220734-rdf.json
Turtle: 24-1.0220734-turtle.txt
N-Triples: 24-1.0220734-rdf-ntriples.txt
Original Record: 24-1.0220734-source.json
Full Text

Full Text

Improving the Feasibility of Energy Disaggregation inVery High- and Low-Rate Sampling ScenariosbyMichelle Susan ClarkBSc in Engineering Physics, University of Alberta, 2013A THESIS SUBMITTED IN PARTIAL FULFILLMENTOF THE REQUIREMENTS FOR THE DEGREE OFMaster of Applied ScienceinTHE FACULTY OF GRADUATE AND POSTDOCTORALSTUDIES(Electrical and Computer Engineering)The University of British Columbia(Vancouver)September 2015c©Michelle Susan Clark, 2015AbstractGiven the world’s urgent need to reduce greenhouse gas emissions in order to avoidthe most disastrous effects of climate change, efforts must be made to reduce theseemissions in every way possible. A large share of these emissions come fromthe energy consumption in buildings and there are significant opportunities to re-duce this consumption through energy saving measures. Energy disaggregation ornon-intrusive load monitoring (NILM) is a useful tool that infers the energy con-sumption of individual appliances or equipment within a building from detailedmeasurements of the building’s total energy consumption. This method is veryattractive for providing a detailed breakdown of building energy consumption be-cause it is less expensive and more convenient than measuring the energy use ofeach appliance individually. A wide variety of NILM methods have been proposedand in this thesis we focus on improving the feasibility of two different classes ofNILM methods. We first explore the use of random filtering and random demodu-lation, two methods closely related to the new and developing field of compressedsensing (CS), to acquire and manage very-high-rate electrical measurements usedfor NILM. We show that these methods allow us to reduce the required samplingrate and volume of data collected while retaining valuable signal information re-quired for NILM. Second, we switch to the analysis of very-low-rate data for NILMand develop a method to detect interesting patterns in the very-low-rate aggregateconsumption signal. These patterns are shown to be responsible for a significantshare of the total energy consumption in some buildings and are also related to theoutdoor air temperature in some cases. Taken together, the two parts of this thesisallow us to contribute to the field of NILM by improving its feasibility and helpingto facilitate its widespread use.iiPrefaceThe work presented in this thesis was completed in the Department of Electricaland Computer Engineering at the University of British Columbia under the super-vision of Dr. Lutz Lampe.Part of the work in Chapter 4 will be published in the following conferencepaper:• M. Clark and L. Lampe, “Using random filters to compressively sample elec-trical data for non-intrusive load monitoring,” in IEEE Global Conf. SignalInform. Process., Orlando, FL, Dec. 2015. c©2015 IEEE.The copyright for the publication listed above has been transferred to the Instituteof Electrical and Electronics Engineers (IEEE), but I have permission to use thework associated with it in my thesis. The content in Chapter 4 has been modifiedsubstantially from the version in the conference publication listed above to add de-tail and make the entire thesis coherent. I am the primary author of the conferencepublication and have completed the majority of the work associated with it. Thisincludes, but is not limited to, literature review, simulation design and implemen-tation, data analysis, and manuscript writing. Dr. Lampe contributed to the workas a secondary author by providing guidance and supervision throughout.The work in Chapter 5 was completed as part of a Mitacs Accelerate internshipand a final report summarizing the work was submitted to Mitacs at the conclusionof the internship. The following provisional patent application as also been filed inassociation with the work:• M. S. Clark, B. P. Gready, C. M. Puetter and J.-S. Bernier, “Apparatus andmethod for detection of large equipment from low resolution non-residentialiiienergy consumption data,” U.S. Provisional Patent Application 62190124,Jul. 8, 2015.The intellectual property of the work in Chapter 5 belongs to the partner orga-nization with which the internship was completed, but UBC’s project terms forMitacs Accelerate internships permit me to include this content in my thesis. Iperformed the majority of the work on this project, including literature review, de-sign, analysis and writing. I was directly supervised by Dr. Benjamin Gready, whocontributed to the work in a supervisory role and provided guidance throughout theproject. Dr. Christopher Puetter and Dr. Jean-Sebastien Bernier contributed to theconception of the method and shared ideas on it through various discussions.ivTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiiList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ixList of Symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiGlossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xivAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Use Cases for Load-Specific Energy Information . . . . . . . . . 21.2 Data for Disaggregation: Smart Meters and Beyond . . . . . . . . 41.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51.4 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 Energy Disaggregation: State of the Art . . . . . . . . . . . . . . . . 72.1 Categorizing Disaggregation Methods . . . . . . . . . . . . . . . 72.1.1 Disaggregation Features . . . . . . . . . . . . . . . . . . 92.1.2 Disaggregation Algorithms . . . . . . . . . . . . . . . . . 10v2.2 A Review of Existing Disaggregation Work . . . . . . . . . . . . 112.2.1 Very-Low-Rate: Multiple Events Per Interval . . . . . . . 112.2.2 Low-Rate: Event Isolation . . . . . . . . . . . . . . . . . 132.2.3 Medium-Rate: Transient Analysis . . . . . . . . . . . . . 152.2.4 High-Rate: Low Order Harmonics . . . . . . . . . . . . . 162.2.5 Very-High-Rate: Waveform Shapes . . . . . . . . . . . . 172.2.6 Extremely-High-Rate: Electrical Noise . . . . . . . . . . 183 Research Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . 203.1 Question 1: Managing Very-High-Rate NILM Data . . . . . . . . . 203.1.1 Motivations . . . . . . . . . . . . . . . . . . . . . . . . . 203.1.2 Existing Approaches . . . . . . . . . . . . . . . . . . . . 253.1.3 Our Approach . . . . . . . . . . . . . . . . . . . . . . . . 303.2 Question 2: Improving Very-Low-Rate NILM . . . . . . . . . . . 313.2.1 Motivations . . . . . . . . . . . . . . . . . . . . . . . . . 313.2.2 Existing Approaches . . . . . . . . . . . . . . . . . . . . 323.2.3 Our Approach . . . . . . . . . . . . . . . . . . . . . . . . 344 Single-Channel Randomized Sampling to Manage Very-High-RateData for NILM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364.1 Background on Compressed Sensing and Related Concepts . . . . 364.1.1 Compressed Sensing . . . . . . . . . . . . . . . . . . . . 364.1.2 Random Filtering . . . . . . . . . . . . . . . . . . . . . . 374.1.3 Random Demodulation . . . . . . . . . . . . . . . . . . . 384.2 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404.2.1 Design Considerations . . . . . . . . . . . . . . . . . . . 414.2.2 Sampling Framework . . . . . . . . . . . . . . . . . . . . 474.2.3 The Sensing Matrix . . . . . . . . . . . . . . . . . . . . . 534.3 Numerical Results and Discussion . . . . . . . . . . . . . . . . . 544.3.1 Obtaining and Processing Test Data . . . . . . . . . . . . 554.3.2 Experiment 1: Load Basis . . . . . . . . . . . . . . . . . 564.3.3 Experiment 2: Generic Basis . . . . . . . . . . . . . . . . 594.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65vi5 Analyzing Energy Transitions to Improve Very-Low-Rate NILM . . 665.1 Background and Definitions . . . . . . . . . . . . . . . . . . . . 675.2 Methodologies . . . . . . . . . . . . . . . . . . . . . . . . . . . 675.2.1 Generating Histograms of Energy Transitions . . . . . . . 675.2.2 Building Candidate Selection . . . . . . . . . . . . . . . 715.2.3 Spike Extraction and Characterization . . . . . . . . . . . 745.2.4 Temperature Correlation Analysis . . . . . . . . . . . . . 775.2.5 Alternative Approaches to Analyzing Consumption Spikes 805.3 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . 855.3.1 Number of Candidates . . . . . . . . . . . . . . . . . . . 855.3.2 Density Function Selection Performance . . . . . . . . . . 865.3.3 Correlation Analysis . . . . . . . . . . . . . . . . . . . . 885.3.4 Correlation Samples . . . . . . . . . . . . . . . . . . . . 915.3.5 Comparison with an Alternative Method . . . . . . . . . . 995.3.6 Correlations with Heating and Cooling Degree-Days . . . 1035.3.7 Share of Energy Consumption Due to Spikes . . . . . . . 1055.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1096 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1106.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1106.2 Opportunities for Future Work . . . . . . . . . . . . . . . . . . . 111Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113viiList of TablesTable 2.1 Summary of NILM methods organized by sampling rate . . . . 12Table 3.1 Approaches for high-rate NILM data acquisition and management 26Table 4.1 Sparsity basis options for representing electrical signals . . . . 44Table 5.1 Automatic density function selection performance . . . . . . . 87Table 5.2 Comparing predictions by our method and the alternative method 102Table 5.3 Comparing our method’s predictions using temperature vs.heating and cooling degree-days . . . . . . . . . . . . . . . . 104Table 5.4 Comparing predictions by our method and the alternativemethod using temperature or heating and cooling degree-days . 104viiiList of FiguresFigure 2.1 A depiction of how changes in real power can be used to iden-tify changes in the states of appliances for NILM (Reprintedfrom [1] with permission, c©1992 IEEE) . . . . . . . . . . . . 14Figure 3.1 Basic NILM framework . . . . . . . . . . . . . . . . . . . . . 23Figure 4.1 Random filtering matrix equation example (Reprinted withpermission, c©2015 IEEE) . . . . . . . . . . . . . . . . . . . 38Figure 4.2 Random demodulation matrix equation example . . . . . . . 39Figure 4.3 General random filter sampling framework . . . . . . . . . . 40Figure 4.4 Block diagrams of single-channel randomized sampling methods 41Figure 4.5 Processing devices for our two proposed sampling frameworks 49Figure 4.6 Wrapping around the sensing matrix coefficients when mea-suring periodic signals (Reprinted with permission, c©2015IEEE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54Figure 4.7 Sample load current waveforms derived from the BLUEDdataset (Reprinted with permission, c©2015 IEEE) . . . . . . 55Figure 4.8 Test error rates for varying random filter length and number ofrecovered cycles (Reprinted with permission, c©2015 IEEE) . 58Figure 4.9 Test error rates for varying number of active loads and tworecovery techniques (Reprinted with permission, c©2015 IEEE) 60Figure 4.10 Signal recovery performance for varying number of activeloads and several sparsity bases . . . . . . . . . . . . . . . . 63ixFigure 4.11 Signal recovery performance for varying signal-to-noise ratioand several sparsity bases . . . . . . . . . . . . . . . . . . . . 64Figure 5.1 Examples of energy transition histograms for two buildings . . 68Figure 5.2 Power and energy consumption of an example equipment pro-cess, where energy measurements are the integration of powerconsumption over 0.25 hour periods . . . . . . . . . . . . . . 70Figure 5.3 An example of electricity consumption sample weeks and anenergy transition histogram used for manual inspection . . . . 73Figure 5.4 Manual classifications of buildings for whether or not they arecandidates for further analysis based on their histograms . . . 74Figure 5.5 Flow charts of spike separation correlation testing . . . . . . . 79Figure 5.6 Time and Fourier properties of a building with frequent spikes 82Figure 5.7 Examples of energy use and their autocorrelation functions . . 83Figure 5.8 Examples of the estimated durations of equipment processesin four different buildings . . . . . . . . . . . . . . . . . . . 85Figure 5.9 Size of sample set after each stage of analysis . . . . . . . . . 86Figure 5.10 Number of correlations detected in analyzed buildings . . . . 89Figure 5.11 Breakdown of detected correlations by vertical . . . . . . . . 89Figure 5.12 Fraction of each vertical with detected correlations . . . . . . 90Figure 5.13 Sample weeks of a building with a positive correlation detected 92Figure 5.14 Summary plots of a building with a positive correlation detected 93Figure 5.15 Sample weeks of a building with a negative correlation detected 95Figure 5.16 Summary plots of a building with a negative correlation detected 96Figure 5.17 Sample weeks of a building with no correlation detected . . . 97Figure 5.18 Summary plots of a building with no correlation detected . . . 98Figure 5.19 Tile plots comparing our method with the alternative method . 101Figure 5.20 Tile plots comparing our method with the alternative method,including plots for the use of heating and cooling degree-days 106Figure 5.21 Share of building energy consumption attributed to spikes . . 107Figure 5.22 Share of building energy consumption attributed to spikes forthe ten verticals with the most buildings containing spikes . . 108xList of SymbolsHigh-rate analysis symbols (Chapter 4)x(t) continuous time form of signal we wish to measurex vector form of signal we wish to measureΨ sparsity basisd length of vector xs sparse vector representation of the vector x using the basisΨΦ compressed sensing matrixy vector of compressed sensing measurements representing xm number of compressed sensing measurements used to represent xxest estimated electrical signal vectorRNyq Nyquist sampling rate used to acquire high resolution electrical signalsR low sampling rate used to capture compressed measurementsq subsampling factorf f und fundamental frequency of electrical signalsNl total number of loadsNa number of active loadsxiτ time delay for calculating the difference signalxτ copy of x delayed by time τyτ copy of y delayed by time τ∆x electrical difference signal vector∆y vector of compressed sensing measurements of∆xh vector of random filter tap coefficientsB length of random filter (number of taps)p(t) continuous time form of periodic chipping sequenceTp period of the chipping sequence, p(t)P diagonal matrix of chipping sequence coefficientsL integrate-and-dump low pass filter matrixT time between sampling switch closurestbl duration that the sampling switch stays closedkbl integer number of electrical cycles per recovery blockmbl number of compressed sensing measurements collected per recovery blockkdel integer number of sets of m compressed sensing measurements in mdelmdel number of compressed sensing measurements corresponding to the timedelay τdbl length of signal to be recovered for each recovery blockσ2s mean of the variance of the test signalsσ2n variance of Gaussian noise added to test signalsNt number of test signals evaluated in numerical experimentsxiiLow-rate analysis symbols (Chapter 5)a duration of an equipment processb power consumption of an equipment processtint time between very-low-rate measurementsfspike frequency at which spikes occur in low-rate energy datau energy consumption difference associated with an energy transitionr approximate range of observed energy transitionsc center of first positive peak in the energy transition density functioncˆ center of the first negative peak in the energy transition density functionσ2mean mean of the variance of the first positive and negative energy transitiondensity function peaksw width of window used to search for energy spikestoutdoor average outdoor air temperature on a given daytset,heat set point for calculating heating degree-daystset,cool set point for calculating cooling degree-daysxiiiGlossaryACF autocorrelation functionADC analog-to-digital converterANN artificial neural networkA-OMP appliance orthogonal matching pursuitCDD cooling degree-daysCO2 carbon dioxideCS compressed sensingDR demand responseDCT discrete cosine transformFLAC free lossless audio codecFSM finite state machineGLS generalized least squaresHAN home area networkHDD heating degree-daysHMM hidden Markov modelHVAC heating, ventilation and air conditioningxivMWC modulated wideband converterNILM non-intrusive load monitoringOMP orthogonal matching pursuitRMS root mean squareRSS residual sum of squaresSMB small to medium businessSNR signal-to-noise ratioSVM support vector machineTOU time-of-usexvAcknowledgmentsI would first like to thank Dr. Lutz Lampe for his guidance throughout my Master’sprogram. It has been a pleasure working with such a supportive and accommodat-ing supervisor and I will endeavour to emulate his insightful approach to researchthroughout my career. I would also like to acknowledge the Natural Sciences andEngineering Research Council of Canada (NSERC) for research funding duringmy program, as well as Mitacs for their financial and administrative support of myinternship. I am also deeply grateful for the support of my family and friends, bothold and new, who have helped me to persevere through my graduate school expe-rience. Finally, I want to thank my husband, Garrett. Over the past two years hehas spent enormous amounts of energy making sure that I eat enough to stay alive,sleep enough to stay sane, and always have a shoulder to cry on when I need one.I cannot thank him enough.xviChapter 1IntroductionReducing carbon dioxide (CO2) emissions is an urgent and severe global issue. Theeffects of climate change due to increasing levels of CO2 in the atmosphere are al-ready being observed, including effects on physical systems (e.g., water resourcequantity and quality), biological systems (e.g., changing geographic ranges of var-ious plant and animal species) and human systems (e.g., food production) [2]. In2010, global leaders met at the United Nations Cancun Climate Conference andagreed to take actions to limit the increase in global average temperature to 2 ◦Crelative to pre-industrial time [3]. Achieving this goal requires substantial changesfrom business-as-usual conditions. The consequences of delaying these changesare severe, including “greater risks of economic disruption” and “greater risks offailing to meet the 2 ◦C target, which would lead to substantially higher adaptationschallenges and costs” [4].One important opportunity to reduce global CO2 emissions is by reducing theenergy consumption of buildings [2, 4, 5]. To do so, it is important to first under-stand how energy is currently being used in buildings. Once we understand this,we can then identify the most effective ways to reduce building energy consump-tion without compromising the functionality of the buildings or the comfort of theiroccupants. Energy disaggregation is a convenient and scalable way to obtain thisinformation and is the main focus of this thesis.The energy consumption of a building’s loads (i.e., appliances or equipment)could be measured individually using one monitoring device per load but energy1disaggregation offers an attractive alternative. The cost and inconvenience of pur-chasing, installing and maintaining monitoring devices for each individual loadare high [6, 7] and make this solution infeasible or undesirable in many buildings.With energy disaggregation, however, one measures the energy consumption of theentire building at a single location (e.g., the utility access point) and then infers theenergy consumption of individual loads from these measurements. An essentialpart of this technology is that the total building consumption measurements arecollected much more frequently than once per billing period, with sampling ratesanywhere from once per hour to millions of samples per second. The first proposalof this energy analysis technique is typically credited to Hart, who developed themethod in the late 1980s and wrote a widely-cited paper outlining it in 1992 [1].This monitoring strategy effectively moves the technological burden of the solu-tion from hardware (i.e., many measurement devices) to software (i.e., a complexdata processing problem) [1]. Energy disaggregation is also sometimes referred toas non-intrusive load monitoring (NILM) due to the fact that no intrusion into thehome is necessary in order to perform disaggregation. By avoiding this intrusion,NILM reduces the cost and inconvenience of energy monitoring systems, therebyencouraging more widespread use of these systems. This is an important step infacilitating faster improvements to building energy efficiency and reductions in CO2emissions.1.1 Use Cases for Load-Specific Energy InformationRegardless of whether load-specific energy information is obtained intrusively ornon-intrusively, it provides valuable information about how energy is being used ina building. There are many use cases for this information, which can be categorizedby the three main groups of people that can use the information: energy consumers,utility companies and researchers [7]. Here we list some of the most compellinguse cases for each category.First, detailed energy consumption information can be used to provide en-ergy consumers with feedback about how they use energy. People sometimeshave difficulty understanding how energy is used in their homes [8] and provid-ing load-specific consumption information can help them make links between their2behaviours and the energy they use. For example, Ueno et al. [9] showed thatgiving consumers appliance-specific consumption information can reduce house-holds’ total energy consumption. They reported that “power consumption of thewhole household was reduced on average by about 9% across 8 households.”Energy consumption information can also be useful for utility companies for avariety of reasons. Detailed energy consumption information helps utilities eval-uate energy savings programs they administer (e.g., BC Hydro’s “Power Smart”campaign [10]) by strengthening the connections one can make between these en-ergy efficiency programs and any observed energy savings [7]. For example, itis easier to prove the effectiveness of a program that encourages people to turnoff lights if one can observe a measured decrease in energy consumption by lightsrather than just an overall decrease in consumption, which could be due to manyother factors. Utilities can also use energy disaggregation to identify the typesof loads that exist in various buildings and how they are being used. For exam-ple, knowing which buildings use electric heating and which use gas heating couldhelp these companies have a more personalized relationship with their customers,allowing them to make more relevant recommendations for new service offerings,equipment upgrades, rebates, etc [7, 11].Finally, detailed energy consumption can be useful for researchers who are at-tempting to understand and improve the way buildings and appliances operate. Inparticular, many buildings that are designed to be sustainable do not actually per-form as well as expected [12]. Load-specific energy information would offer veryvaluable information in these cases, allowing one to understand which loads arenot performing as designed and to address the issue, either in the completed build-ing or in the design of future “green” buildings [7]. In addition, there is currentlylittle publicly-available information about how appliances behave in homes (e.g.,their energy consumption time series, times of use, durations of use) [13]. This in-formation could be very useful for various other areas of energy-related research,such as the study of demand response (DR) programs that encourage consumers tomodify their energy usage patterns using various incentive schemes.31.2 Data for Disaggregation: Smart Meters and BeyondWith the recent popularity of smart meters, disaggregation of whole-building en-ergy data is becoming more of a possibility than ever before. Rather than report-ing a building’s energy usage to consumers once per billing period (e.g., once permonth), smart meters collect measurements much more frequently (e.g., once perhour, minute or second). These higher granularity energy measurements are es-sential for energy disaggregation techniques, since for more information must beinferred from the measurements then simply the building’s total energy use. Inparticular, there has been a significant interest in recent years to develop NILM al-gorithms for data sampled every one second to one minute [14–22] in anticipationthat smart meters will be capable of collecting data at this rate in the near future.In fact, Kelly and Knottenbelt [23] collected a public dataset specifically for NILMresearch and used a 6 second sampling rate in anticipation of a roll-out of smartmeters in the United Kingdom that will sample once every 10 seconds.These methods designed for easy implementation using smart meter data areattractive but there are tradeoffs associated with easy implementation. These meth-ods are only expected to be able to disaggregate less than 10 appliance types fromthe total energy signal [7], which may not be sufficient for some energy disaggre-gation use cases. These methods also tend to use models of appliances with finitestates and fixed power consumptions, so they usually cannot identify loads withcontinuously variable energy consumption (e.g., a light with a dimmer switch).Thus, work on energy disaggregation techniques using even higher rate data is alsonecessary to address the more challenging applications of this technology.In addition, though smart meter data is becoming more and more widely avail-able, not all buildings have them installed. Further, it is common for smart metersto record and store consumption data only once per hour or fifteen minutes, whichis not frequent enough for the vast majority of existing energy disaggregation tech-niques. Thus, continuing research on energy disaggregation techniques using thisvery-low-rate data is also necessary.41.3 ContributionsIn this thesis, we address two specific areas of energy disaggregation research thathave received little attention. Our overarching goal is to improve the feasibility ofusing energy disaggregation in a wider variety of applications than it is currentlycapable of addressing. In particular, we help facilitate the use of energy disaggre-gation in very high- and low-rate sampling scenarios through the following twocontributions:1. Reducing sampling rate and processing burden for high rate sampling ofelectrical data for energy disaggregation. A major challenge with high ratesampling for energy disaggregation is managing the large volumes of datacollected. We propose and explore the use of random filtering [24] and ran-dom demodulation [25], concepts closely related to the field of compressedsensing (CS), to reduce the electrical data sampling rate while maintainingthe important information required to disaggregate the electrical signal. Ourproposed solution facilitates the use of very-high-rate NILM by making itmore feasible to manage the very-high-rate data.2. Applying techniques from the energy disaggregation field on lower-ratedata than it is typically used for. Energy disaggregation using measure-ments collected every 15 to 60 minutes is more readily available than higherrate measurements, but few attempts have been made to push the bound-aries of energy disaggregation techniques into this realm. We demonstratehow certain distinct features of these signals can be extracted and analyzed,thereby facilitating a more immediate application of energy disaggregationon 15 to 60 minute data. In particular, we show how brief, regularly-sizedspikes in 15 minute energy consumption data can be identified and that theyare sometimes correlated with the temperature outside the building.1.4 OrganizationIn the following chapters, we describe the research performed for this thesis. First,we discuss prior work on energy disaggregation in Chapter 2. Next, we state andmotivate our two research questions in Chapter 3. We then discuss our first contri-5bution in Chapter 4, where we describe, simulate and analyze our proposed methodto manage very-high-rate energy disaggregation data. In Chapter 5, we present oursecond contribution, a method of extracting interesting behaviours from low-rateaggregate consumption data. Finally, we draw conclusions from our work and offersuggestions for future research in Chapter 6.6Chapter 2Energy Disaggregation: State ofthe ArtSince the first proposal of NILM by Hart in the late 1980s and early 1990s [1], awide variety of NILM methods have been developed and explored. In this chap-ter we review the current state of NILM research, which continues to be activetoday. We first discuss NILM algorithms in a general sense, describing their defin-ing features and several ways they can be categorized. We then discuss the mostprominent NILM research efforts to date, organizing them by the sampling rate ofthe data they use to perform disaggregation. For additional reading on NILM, werefer to several other recent reviews of the field [7, 26–28]. We also note that, asidefrom the academic literature, a number of companies have also worked on NILMdevelopment.12.1 Categorizing Disaggregation MethodsThe two most important defining features of an NILM method are: a) its disaggre-gation features and b) its disaggregation algorithms.Disaggregation features are any properties of the electrical signal produced by1According to Armel et al. [7], who surveyed smart meter professionals for their review publishedin early 2013, these companies include “High Energy Audits, PlottWatt, Bidgely, Desert ResearchInstitute (DRI), Navetas, General Electric, Intel, and Belkin” and may also include “Verdigris Tech-nologies, Detectent, EcoDog, GridSpy, Check-It Monitoring Solutions, and EMME.”7an appliance’s typical operating behaviour. These are usually properties that canbe measured in the electrical signals produced by the appliances or can be calcu-lated from this data. Examples of features include the change in real and reactivepower2 when an appliance turns on or off (e.g., [1]) or the steady-state currentwaveform produced by an appliance while it is operating (e.g., [29, 30]). Ideally,the disaggregation features used by an NILM method are chosen such that their val-ues for each appliance are unique, which makes it easy to differentiate betweenappliances. When the electrical features are too similar to one another, additionalinformation like the typical duration of appliance usage may be incorporated intothe disaggregation features (e.g., [14]).Disaggregation algorithms are the methods that predict how appliances are op-erating based on disaggregation features. For example, an NILM system mightcompare each new feature it encounters with a library of exemplar features (e.g.,[31, 32]). If each exemplar corresponds to a unique appliance, then choosing theexemplar that most closely matches the newly observed feature allows one to pre-dict which appliance is responsible for that observation. Examples of more compli-cated disaggregation algorithms include those that model appliances as finite statemachines (FSMs) and use the probabilities of appliance state transitions to inferappliance operating schedules (e.g., [14–16]).NILM methods can be categorized in many ways based on the types of disaggre-gation features and algorithms they use. In the following subsections, we discussseveral ways in which to organize and classify NILM algorithms. While many NILMmethods are well-described by these categories, some methods use multiple fea-tures and/or algorithms in combination so that they fall into more than one categorysimultaneously.Due to the vast array of disaggregation features, disaggregation algorithms andapplications for which NILM may be used, it is also very difficult to compare theperformance of NILM methods. This issue has been acknowledged in other NILMreviews [7, 26] and is one of the major challenges of the field. Efforts have been2Real and reactive power are the real and imaginary parts of the product of electrical currentand voltage measurements, where the imaginary part comes from any phase difference between thecurrent and voltage. Real power is associated with the net transfer of energy, while reactive power isassociated with the storage of energy.8made to define and standardize methods of evaluating NILM [33–36] and it hasbeen suggested that clear applications for NILM should be defined [37, 38], but thisis still an ongoing challenge and does not simplify the comparison of work prior tothese standardization efforts. Thus we do not focus on the reported performance ofNILM methods in our review, but rather the disaggregation features and algorithmsthey use.2.1.1 Disaggregation FeaturesThe following are common ways to organize NILM methods by their disaggregationfeatures:• Sampling rate. NILM methods analyze data with sampling rates from onceper hour up to the range of MHz. This is perhaps the most important waythat NILM methods can be categorized because it has a very large impact onthe number of appliances types that can be disaggregated [7], and thus theapplications for which the NILM method can be used.3• Event-based or non-event-based. In event-based NILM, significant changesin the aggregate electrical signal, known as “events”, are detected. Disaggre-gation features associated with each event are then used to identify a singlestate transition of one of the building’s loads that is associated with eachevent. By contrast, non-event-based NILM methods do not depend on eventdetection. Instead, these methods attempt to determine the entire set of ap-pliances that are operating at a given instant in time based on measured orcalculated features at that instant.• Transient or steady-state. Non-event-based methods typically must usesteady-state signals as their disaggregation features, but event-based meth-ods can use either transient signals or the difference between steady-state3Measurements of voltage and current may be collected around the fundamental frequency [7]or at much higher rates to calculate average power consumption, which is typically output by themeasurement device at a low rate (e.g., once per minute or hour). Thus, when we refer to “samplingrate”, we mean the rate of the data that is output by the measurement device and will be analyzedfor disaggregation, not the current and voltage sampling rates at the device’s input. If high frequencycurrent and voltage are required for disaggregation, then the output sampling rate will be the same asthe input rate.9signals before and after a transition in a load’s state. For example, someNILM methods use the shape of the “spectral envelope” of the electrical cur-rent during a load state transition for disaggregation (e.g., [39]), while othersuse the difference between the current waveforms before and after a loadstate transition to disaggregate appliances (e.g., [30]).2.1.2 Disaggregation AlgorithmsThe following are common ways to organize NILM methods by their disaggregationalgorithms:• Degree and type of training. All NILM methods require some informationabout how appliances are expected to behave in order to associate applianceswith electrical signals. This knowledge could be as simple as the rated “on”power, rated standby power and an estimate of the power variance of eachappliance (e.g., [22]). It can also be as detailed as the shape of the steady-state current waveform of each appliance (e.g., [30]). This training infor-mation can be obtained in a variety of ways, including reading rated powerconsumption information from appliance labels or manuals, measuring theconsumption of appliances individually or learning using historical electricaldata.• Offline or online. Some NILM methods are designed to disaggregate energyusage offline, for example once per day (e.g., [22]). These methods could po-tentially be extended to work in an online way (e.g., using the most recent 24hours and updating every few minutes), but this is not always possible. Forexample, [22] identifies periods of appliance activity and attempts to match“on” and “off” transitions during these periods. Thus, it cannot perform dis-aggregation until the end of an active period, which may be several hourslong. By contrast, online NILM methods are designed to provide a real-timeinference of appliance operation. Non-event-based methods, such as [29],are particularly well-suited for this application because they depend only onmeasurements at that instant and not on past data.102.2 A Review of Existing Disaggregation WorkGiven the large number of ways that NILM methods can be categorized, we havechosen to organize our presentation of prior NILM research by a single aspect,namely their sampling rates. As mentioned earlier, this is one of the most importantways to categorize NILM methods because of it is closely tied to the applications ofa given NILM method. Organizing the methods by sampling rate is also suitable inthis thesis because our work addresses research questions in two different samplingrate categories.Although there is a continuous spectrum of sampling rates that NILM methodsmay use, they can be grouped into categories based on common sampling ratesand the different information that can be obtained from them. In this thesis, wehave defined six sampling rate categories that are similar, although not identical, tothe categories used in [7]. We will use the following subsections to discuss eachcategory, including its defining properties and the most prominent work that hasbeen done on NILM methods at those sampling rates. For reference, we have alsoprovided Table 2.1, which summarizes the sampling rate categories and prior workperformed in each. We note that some NILM methods use multiple disaggrega-tion features that are associated with more than one sampling rate. In these cases,we have assigned the study to the category corresponding to its highest requiredsampling rate.2.2.1 Very-Low-Rate: Multiple Events Per IntervalThe lowest sampling rates at which NILM is typically performed are slower thanone sample per minute, ranging up to one sample per hour. With this samplingrate, it is common that more than one appliance event will occur between twoconsecutive measurements. Thus, NILM methods at these sampling rates typicallydo not attempt to track the state changes of individual appliances, but rather detecttrends in energy consumption that may be attributed to certain types of equipment.In [7], it is suggested that NILM at this sampling rate allows the disaggregation ofthree appliance categories: those related to outdoor temperature, those with a cleartime-dependence (e.g., outdoor lighting) and those that are always operating.In the early years of NILM research, the possibility of disaggregating many11Name Sampling Rate Defining Feature StudiesVery Low slower than often multiple events Powers et al. 1991 [40]once per minute per interval Prudenzi 2002 [41]Kolter et al. 2010 [42]Huang et al. 2013 [11]Curtis et al. 2014 [43]Low 1 min to 1 sec typically only one event Hart 1992 [1]per interval Cole and Albicki 1998 [44, 45]Drenker and Kader 1999 [46]Farinaccio and Zmeureanu 1999 [47]Marceau 1999 [48]Marceau and Zmeureanu 2000 [49]Baranski and Voss 2003/2004 [50–52]Ruzzelli et al. 2010 [20]Kim et al. 2011 [14]Kelly 2011 [21]Figueiredo et al. 2012 [53]Kolter and Jaakkola 2012 [15]Parson et al. 2012 [16]Zoha et al. 2013 [17]Makonin et al. 2014 [54]Egarter et al. 2014 [18]Arberet and Hutter 2014 [55]Tang et al. 2014 [22]Liao et al. 2014 [19]Perez et al. 2014 [56]Elhamifar and Sastry 2015 [57]Medium faster than 1 Hz transient shapes are analyzed Norford and Leeb 1996 [58]but slower than Berge´s et al. 2009 [59]fund. freq. Gonc¸alves et al. 2011 [60]Barsim et al. 2014 [61]Du et al. 2015 [62]High fund. freq. low order harmonics Sultanem 1991 [31]up to 2 kHz are analyzed Roos et al. 1994 [63]Leeb et al. 1995 [39]Lee et al. 2005 [64]Srinivasan et al. 2006 [65]Akbar and Khan 2007 [66]Wichakool et al. 2009 [67]Dong et al. 2013 [68]Very High faster than 2 kHz detailed waveform shapes or Chan et al. 2000 [69]up to 40 kHz high harmonics are analyzed Laughman et al. 2003 [70]Lee et al. 2004 [71]Lam et al. 2007 [72]Shaw et al. 2008 [32]Suzuki et al. 2008 [73]Liang et al. 2010 [33, 74]Inagaki et al. 2011 [29]Wang et al. 2012 [30]Popescu et al. 2014 [75]Extremely faster than electrical noise and very high Patel et al. 2007 [76]High 40 kHz harmonics are analyzed Gupta et al. 2010 [77]Table 2.1: Summary of NILM methods organized by sampling rate12appliance types with very-low-rate data was explored by Powers et al. [40]. Theauthors presented a heuristic method of disaggregation based on real power andnon-electrical information like the time of day and duration of appliance use, butthe method was proprietary and its full details were not published. Later, morepowerful analytical tools were applied to the problem, including the use of artificialneural networks (ANNs) by Prudenzi [41] and an attempt by Kolter et al. [42] tosolve an optimization problem that assumes only a few appliances operate at anygiven time. In more recent years, attempts have been made to narrow the goalsof disaggregation of very-low-rate data, with a particular focus on predicting theshare of energy consumption that can be attributed to heating or cooling equipment.Examples of these efforts include work by Huang et al. [11], Curtis et al. [43] andPerez et al. [56].42.2.2 Low-Rate: Event IsolationWe define low-rate sampling for NILM as any method that uses data collected fromonce per minute to once per second. Within this range, one can expect that usuallyonly a single appliance event will occur between measurements, which theoreti-cally allows one to track the operating pattern of individual appliances. This wasthe context and the sampling rate for which NILM was first proposed by Hart [1]. InHart’s method, step changes in real and reactive power consumption were identi-fied as events and attempts were made to cluster similar events together into groups.From there, appliances could be associated with each group and their operationalpatterns could be inferred. An image showing Hart’s association of electrical eventswith appliances is given in Fig. 2.1. A commercialization attempt of a very similarmethod was described a few years later by Drenker and Kader [46]. Early exten-sions to Hart’s work were also proposed in work such as Marceau [48], Marceauand Zmeureanu [49], and Cole and Albicki [44, 45], but similarities between thereal and reactive power associated with appliance state changes still made it dif-ficult for some appliances to be differentiated from one another. Thus, a populartrend following Hart’s work was to obtain measurements at higher sampling rates4The work by Perez et al. was categorized in Table 2.1 as low-rate because it uses one minutedata, but we mention it here because its goals are more closely aligned with other methods used todisaggregate energy consumption associated with heating and cooling.13Figure 2.1: A depiction of how changes in real power can be used to identifychanges in the states of appliances for NILM (Reprinted from [1] withpermission, c©1992 IEEE)in order to obtain more detailed appliance features. These efforts fall into the othersampling rate categories and will be discussed in later sections.Several other approaches have also been taken to apply more complex algo-rithms to the disaggregation problem at this sampling rate. Farinaccio and Zmeure-anu [47] developed an algorithm that searches for changes in the energy demandand then applies a series of rules to evaluate whether or not the changes are in-deed associated with the target appliance. Baranski and Voss [50–52], on the otherhand, attempted to analyze only real power measurements that were made onceper second using an optical sensor which monitors the movement of a dial on atraditional mechanical energy meter. Their algorithm applied complex analyticaltools, including a genetic algorithm, clustering techniques and neural networks toattempt to identify FSM models for appliances.In recent years, there has been a renewed interest in NILM research in this sam-pling rate category because of the widespread roll-out of smart meters. These me-ters are making data sampled at one second to one minute intervals more widelyavailable and researchers are eager to find ways to use this information for NILM.One popular method that has received significant attention is the used of hiddenMarkov model (HMM) strategies, such as work by Kim et al. [14], Kolter and14Jaakkola [15], Parson et al. [16], Zoha et al. [17] Makonin et al. [54] and Egarteret al. [18]. In these methods, loads are modelled as having various states, whichthey can transition between with varying known probabilities. Efforts have alsobeen made to train statistical classifiers such as decision trees (by Liao et al. [19])and ANNs (by Ruzzelli et al. [20]). Figueiredo et al. [53] also tested the use of twoclassification methods, support vector machines (SVMs) and 5-Nearest-Neighbours,for disaggregation at this data rate. Other interesting methods include work byKelly [21], which attempts to automatically extract FSM models of appliances fromaggregate data, and work by Arberet and Hutter [55], which attempts to representthe load profile with a dictionary of boxcar atoms. In Tang et al. [22], the totalvariation of appliance state changes is minimized, thereby taking advantage of thesparsity of appliance events in time. This sparsity is also taken advantage of by theHMM method proposed by Makonin et al. [54]. Elhamifar and Sastry [57] also usedthis sparsity, as well as other information like the relationships between appliancesthat operate at similar times (e.g., those used for cooking) in their disaggregationstrategy. With their method, they learn appliance “powerlets”, or representativepower consumption vectors, from the load profiles of individual appliances, thentry to represent the aggregate signal using these “powerlets.”52.2.3 Medium-Rate: Transient AnalysisThe next sampling rate category is defined as those that use sampling rates that arefaster than once per second (1 Hz), but no higher than the fundamental frequencyof the electrical system (typically 50 Hz or 60 Hz, depending on the country). Themain defining feature of this sampling rate category is that it allows the charac-terization of transient electrical behaviour as appliances change state. While sometransients may be visible from low-rate sampling, medium-rate sampling allowsfor much more detailed information on the transient shapes to be acquired.The number of studies that have focused on this sampling rate is relativelysmall. Norford and Leeb [58] did some work in this area shortly after the first intro-5Elhamifar and Sastry [57] claim that their framework would work on much higher rate data, buttheir experiments focus only on 1 Hz measurements. It is not clear from their discussion how theirmethod would be applied to higher rate data (i.e., if they would simply sample real power at a higherrate than 1 Hz or if they would take advantage other information derived from high-rate sampling,such as harmonic content or raw current waveforms).15duction of NILM. In their method, a sliding window approach was taken, where ex-pected transient waveform shapes were compared with observed data until a matchwas found. Several other studies built on this approach or used similar ideas (e.g.,[32, 39, 64, 67]), but these methods also incorporated higher harmonics, thereforerequiring higher sampling rates. These methods will be discussed further in the fol-lowing sections. Other work using only medium-rate sampling was done by Berge´set al. [59], where various disaggregation features and several classification meth-ods, such as 1-nearest neighbour and decision trees, were tested. An interestingpart of the work in [59] is that they performed regressions on the transient wave-forms and used the regression coefficients as a disaggregation feature. As far as weknow, the use of this disaggregation feature is unique to that group of researchers.Gonc¸alves et al. [60] didn’t take advantage of transients, but used changes in realand reactive changes in power calculated every 60 Hz as their disaggregation fea-tures and explored various unsupervised clustering techniques on these features.Barsim et al. [61], on the other hand, explored clustering methods using proper-ties of the transient electrical signals. Recently, Du et al. [62] proposed a methodwhere transient spikes, as well as repeating patterns in the root mean square (RMS)current, are used to identify appliances like printers or fax machines.2.2.4 High-Rate: Low Order HarmonicsWhen the sampling rate exceeds the fundamental frequency of the electrical sys-tem, we begin to be able to calculate harmonic content of the signal. High-ratesampling methods are those that sample above the fundamental frequency, but nothigher than 2 kHz, which corresponds to approximately the 16th harmonic.In parallel to Hart’s first work on NILM, similar ideas were also being exploredand presented by Sultanem [31], who used real and reactive power readings incombination with low order harmonics for disaggregation. Several studies alsoproposed the use of low order harmonics to address the issue of appliances withsimilar real and reactive power, as we mentioned in Section 2.2.2. These effortsinclude work by Leeb et al. [39], Laughman et al. [70]6 and Akbar and Khan [66].6We classify the work in [70] as “very-high-rate” because its sampling rate exceeds 2 kHz, butwe mention it here because of its focus on extending beyond the use of real and reactive power toimprove discrimination between appliances.16Other work in approximately the last ten years by Lee et al. [64] and Wichakool etal. [67] has explored the use of low order harmonics to disaggregate variable speeddrives from industrial building consumption. This is a particularly challenging taskbecause a variable speed drive’s load can vary continuously and does not alwayshave a sharp turn-on transient. An attempt to apply ANNs to low order harmonicshas also been performed by Srinivasan et al. [65]. The use of ANNs was alsosuggested by Roos et al. [63], with a focus on using multiple cascaded ANNs toclassify families of industrial loads at different levels.7 Dong et al. [68] focusedspecifically on the problem of extracting NILM features for specific appliances.They used total harmonic distortion, a property of low order harmonics, in additionto real and reactive power measurements.2.2.5 Very-High-Rate: Waveform ShapesIn the next highest category, sampling rates are above 2 kHz but no more than 40kHz. With these sampling rates, much more detail about each appliance’s wave-form can be obtained, either from higher harmonics or from the shape raw currentwaveforms themselves.Work on high order harmonics has been performed by Laughman et al. [70]and Shaw et al. [32] to improve the discrimination between appliances. It is notclear from [70] or [32] whether or not the high harmonics are actually used fordisaggregation when testing their methods or if they use only the lower order har-monics. Both of their proposed techniques are theoretically capable of taking intoaccount high harmonics and their sampling rates use suggest that calculating theseharmonics to use for NILM would be possible, so we have categorized them in thisgroup.The other approach for very-high-rate NILM that has received attention is theanalysis of highly detailed raw current waveforms. Early investigations of currentwaveforms were made by Chan et al. [69], who showed that some sample currentwaveforms could be represented using the Daubechies Wavelet basis and suggestedthe use of the wavelet coefficients for disaggregation. Lee et al. [71] also observeddifferences in the shapes of waveforms from different appliances. Lam et al. [72]7No sampling rate was cited in [63] but the disaggregation features they discuss would require atleast high-rate sampling so we have assigned their work to this category.17went on to explore this idea further by analyzing various properties of V-I trajecto-ries, which describe the relationship between voltage and current waveforms. Sincethen, various other approaches using raw waveforms have been explored. Liang etal. [33, 74] suggested it as one of many disaggregation features that could be usedin their “Committee Decision Making” method, where multiple features and mul-tiple algorithms are used simultaneously, each one “voting” to collectively predictthe associated appliance.8 An optimization approach that predicts what combina-tion of appliance waveforms adds up to give the aggregate waveform was proposedby Suzuki et al. [73] and further developed by a group of similar authors, Inagaki etal. [29]. Recently, Wang et al. [30] have taken a similar approach, but have chosento solve the optimization problem in a way that promotes sparsity in the solution,based on the assumption that few appliances are operating or changing state at anygiven time. The approach in [30] also incorporates the use of compressed sensingtechniques, which were first proposed by Cande`s et al. [78] and Donoho [79] asa way to reduce the sampling rate of certain types of data signals. Popescu et al.[75] have also suggested the use of “Recurrence Plot Analysis” to investigate sub-tle changes in recurring current waveforms and use these plots as disaggregationfeatures.2.2.6 Extremely-High-Rate: Electrical NoiseThe last sampling rate category includes any sampling rates above 40 kHz. Thedefining characteristic of this method is its ability to capture electrical noise atvery high frequencies, although it has also been used to capture very high harmoniccontent for disaggregation purposes.This sampling rate category has received little attention from NILM researchers,possibly because of the challenges associated with sampling and managing thedata at such a high rate. The first work in this area was by Patel et al. [76], whoanalyzed electrical noise produced during transient events when appliances changestate. Later work by similar authors, Gupta et al. [77], shifted focus to noiseand very high harmonics produced during the steady-state operation of appliances.8The work in [33, 74] focuses on combining the use of many disaggregation features and algo-rithms and therefore falls into nearly every sampling rate category, as well as nearly all groups whenwe attempt to categorize NILM methods in other ways.18Their method is particularly well suited to detect compact fluorescent light bulbsand consumer electronics, which produce large amounts of electrical noise. Aninteresting challenge associated with these extremely-high-rate methods is that theelectrical noise features used for disaggregation have some properties that varydepending on the location of an appliance within the building. While this suggeststhat it would be difficult to identify a mobile appliance, whose feature will changewith location, Gupta et al. found that the electrical features they collected hadsome properties that were independent of appliance location. Thus, one couldpotentially use the properties that are independent of location to identify when anappliance turns on and the properties that vary with location to infer where themobile appliance is located within the building.19Chapter 3Research QuestionsOur work addresses two research questions related to facilitating NILM. Here westate these two questions, provide motivations for investigating them and discusshow prior literature has attempted to address them.3.1 Question 1: Managing Very-High-Rate NILM DataThe first question we ask in this thesis is, “How can random sampling strategiesusing a single sampling channel be used to improve the acquisition and man-agement of very-high-rate data for NILM?” We use “acquisition” to refer to thesampling of the signals and “management” to refer to various other operations thatmay be performed on the data before NILM algorithms are applied. In particular,this includes any transmission that may be necessary to transport the data to thelocation where NILM will be performed.3.1.1 MotivationsMotivating this research question requires that we answer two other questionsabout NILM: a) “Why is very-high-rate sampling of electrical data necessary forNILM?” and b) “What makes acquiring and managing very-high-rate data for NILMdifficult?” Here we answer these questions.20Very-high-rate data for NILMAs discussed in Chapter 2, we use “very-high-rate data” to refer to data which hasbeen sampled at a rate faster than 2 kHz but no more than 40 kHz. Much of therecent work on NILM has been focused instead on low-rate NILM due to the re-cent popularity of smart meters that provide energy consumption information onthe order of minutes or seconds. By assuming only the knowledge of these low-rate active power measurements and minimizing any intrusive set-up or trainingrequirements, researchers are developing NILM methods that are as easy as possi-ble to implement with currently available data (e.g., [14–16, 18, 19, 21, 22]). Thetradeoff of this easy implementation is that one must have lower expectations ofwhat the NILM methods can infer. These methods are expected to allow the disag-gregation of only about 8 types of loads (i.e., appliances or pieces of equipment)[7]. In addition, they may not be able to identify loads with continuously vari-able energy consumption, since they generally depend on models of loads withfinite states with fixed power consumptions. Some of these methods also requireseveral hours of measurements before NILM predictions can be determined (e.g.,[16, 21, 22]), making them offline methods that cannot perform real-time applica-tions.We focus instead on “high performance” NILM methods that push the bound-aries of what NILM can do using as high a sampling rate as necessary. Rather thanasking, “What can we infer from the currently available electrical data?” we ask,“What could we infer if we had all the electrical data we needed?” For example,one would expect to be able to disaggregate many more than 8 load types whenvery-high-rate data is available [7]. Being able to perform online disaggregation isanother important advantage that very-high-rate data could provide over low-ratedata because it opens up new opportunities for applications of NILM. Examplesinclude detecting when loads are in need of replacement or repair1 or detecting theoccupancy in a building [1]. The latter has various use cases, including security(e.g., to detect unexpected activity in a vacation home [1]) or for home automation(e.g., adjusting the thermostat in a room when a person enters it [76]).It is generally accepted that very little information can be inferred about en-1This may also be possible with offline NILM, but offline methods may not be fast enough to allowa timely response to load faults.21ergy consumption in a building using real power measurements alone. Thus, thelow-rate sampling methods that take advantage of smart meter data typically useadditional information to make predictions about load usage. For example, theymay use the durations for which loads typically operate [14] or an assumption thatchanges in load states are infrequent [22]. Some of this information could also beused for high performance NILM applications, but it is not always suitable (e.g.,the algorithm in [22] requires several hours to elapse before NILM predictions canbe made, which is not suitable for an online application of NILM). As an alter-native, one can instead collect higher-rate electrical data for NILM. Higher-ratesampling allows us to take advantage of electrical properties other than just realpower, which contain valuable information.2 Thus, while not all NILM systemsrequire higher-rate data, there is a distinct need for this data for some NILM ap-plications. In Chapter 2, we saw that the higher sampling rate we use, the moreharmonics and electrical waveform detail we can obtain. This higher level of de-tail is generally associated with greater NILM capabilities [7] and some work hasbeen done to investigate the effects of sampling rate on NILM performance [80]but the relationship between them is still not fully understood. In this thesis, wefocus on the very-high-rate sampling category due to the relatively strong attentionit has received in the existing literature, but recognize that ongoing research in allsampling rate categories is likely required in order to identify which rates are mostappropriate for various NILM applications.Data managementWhen electrical signals are being sampled at very-high-rates, managing the datacan be a significant challenge. Effectively acquiring and managing the data is alsoespecially important for online NILM, where the data needs to be processed quicklyand continuously.One can model the basic framework for NILM data acquisition and process-ing as in Fig. 3.1. That is, the measurements of the building’s aggregate energyconsumption are collected at a “measurement device”, then transmitted to a “pro-2Real power may be calculated from voltage and current measurements taken at rates as low asthe fundamental frequency [7], but additional properties typically need voltage and current samplingfrequencies of at least twenty times that rate (1 to 1.2 kHz) [26].22Measure Transmit Receive ProcessMeasurement device Processing deviceAnalogsignalNILMpredictionsFigure 3.1: Basic NILM frameworkcessing device” (e.g., a personal computer) to be analyzed. There are restrictions ofwhere the measurement device can be located, since it must be close enough to thebuilding’s electrical wiring to collect measurements. There may also be limitationson the processing power of the measurement device, particularly if it is batterypowered.3 The processing device, however, is highly flexible and its location andpower needs can vary considerably. The framework in Fig. 3.1 is sufficiently gen-eral to encompass almost any hardware implementation because the transmissionand reception of the signal could be done in various ways. For example, previousstudies collecting data for NILM have used a wired connection to a computer soundcard [23, 82]. Other studies have used a wireless home area network (HAN) con-nection to transmit energy consumption data between devices [83, 84], although inboth of these studies the transmitted data were only sampled at a rate of 1 Hz.The framework in Fig. 3.1 also allows that some processing could be performedon the measurement device, where it would be included in the “Measure” stage.This generally refers to processing steps like compression of data after it has beenmeasured, but could also include parts of the NILM algorithm processing. If all thedisaggregation occurs at the measurement device, the transmit and receive stagesof Fig. 3.1 could simply be eliminated and the two devices merged into one.While it may be possible to design a measurement device that can performNILM locally, it is often more practical to perform disaggregation on the process-ing device due to the restrictions on location and power use of the measurementdevice. This strategy is suggested in [7] as a temporary measure to use data thatcurrent smart meters have available until they can be upgraded. While [7] alsosuggests upgrading smart meters to eventually allow disaggregation on the metersthemselves, we argue that such a method has several disadvantages. First, the sug-3The Current Cost Energy Transmitter [81] is an example of a battery-powered electricity mea-surement device.23gestion to perform NILM on the processing device as a temporary measure demon-strates this method’s flexibility and the fact that changing the processing deviceis often more convenient than changing the measurement device. As the field ofNILM research quickly grows, it would be beneficial to design a system that accom-modates updates and improvements to NILM algorithms. All NILM algorithms aretemporary to some degree, since they are likely to be replaced and updated as NILMalgorithms improve. Keeping NILM separate from the measurement device may bethe best way to accommodate these changes. This strategy also allows flexibilityin several other ways. For example, this would allow the data to be transmitted tomultiple processing devices, each of which may have a different goal (e.g., one cal-culates and stores power consumption at one minute intervals for utility billing andanother uses high resolution current waveforms to perform online disaggregationfor consumer feedback). Performing disaggregation on a processing device mayalso make it easier to accommodate changes to the set of loads that are present in abuilding or the needs of the consumer (e.g., offering customers different levels ofdisaggregation depending on their needs and budget, which may change over time).While some updates and changes to the measurement device may be possible, thesechanges must either fit within the existing processing and storage capabilities of thedevice or require hardware updates, which may not be convenient or cost effective.Performing NILM on a processing device separate from the measurement device isalso suggested in [37] due to the limited memory of the measurement device. Bymoving NILM to a processing device, one can select or design it to store as muchdata as is needed for the chosen NILM algorithm. This also gives consumers theoption to store data from longer periods of time, possibly by attaching an externalhard drive to the processing device to increase its storage space. This would allowconsumers to share their energy data with third parties who may be able to use itto provide additional services [7].With high performance NILM, one should also be able to collect very-high-ratecurrent and voltage measurements at a designated measurement device and thentransmit them to a separate processing device quickly enough to allow online NILM.This ensures that a wide variety of NILM methods with different data requirementscan be used, where methods that do not need very-high-rate data can simply down-sample the data. This sampling framework does, however, make managing the24very-high-rate data for NILM a challenging problem.There are two main challenges associated with managing the very-high-ratedata when using this NILM framework. The first is the communication channelbetween the measurement and processing devices. While some implementationsof an NILM system may permit the connection of these elements using cables, amore general and flexible system may use wireless communication such as WiFior ZigBee.4 With these wireless technologies it is typically not possible to transmitall the high resolution data to the NILM processing device quickly enough for allNILM applications [7]. Thus, if we wish to maintain the versatility of wireless com-munication, the volume of data that needs to be transmitted must be reduced. Thesecond challenge in managing the data is the design of the measurement device.If the device is battery powered, its processing requirements should be minimizedto extend its life as long as possible. Hardware costs for the measurement devicemay also place restrictions on its design, including limitations on memory, as in[37], and the sampling rate of analog-to-digital converters (ADCs) used in the de-vice (lower-rate ADCs are typically less expensive).3.1.2 Existing ApproachesAssuming the basic framework for energy disaggregation in Fig. 3.1 there are manyways that one could design the measurement and processing stages to acquire andmanage very-high-rate data for NILM. In this section we discuss the various existingapproaches in the context of this basic framework. A summary of these approachesis also given in Table 3.1, along with our novel approach, which will be discussedin Section 3.1.3. We also note that it may be possible to combine some of thesemethods to offer even better data management, but for the purposes of this thesiswe only consider each strategy individually.4To demonstrate this, consider [82], where the netbook computer used as a processing devicewas attached to the door of the fuse box using Velcro because the cables to the voltage and currentsensors were very short. This solution was acceptable for their study, but could certainly be improvedif wireless communication was used instead. Wireless communication would be particularly usefulif the computer was also being used to display the NILM predictions to the consumer and the fuse boxwas not in a central location in the building.25Approach “Measure”stage“Process”stageAdvantages DisadvantagesNaive sampling high-rate ADC NILM • simple• measures all dataat high rate• large volume ofdata• high-rate ADCrequiredCompression high-rate ADC↓compressdecompress↓NILM• reduces amount oftransmitted data• measures all dataat high rate• requires compr.and decompr.• compression ratesare limitedChangedetectionhigh-rate ADC↓det. changesNILM • large reduction indata volume• only records dataat certain times• requires eventdetectionMulti-channelcompressedsensingmixer/filterbank↓low-rate ADCbankrecover↓NILM• reduces requiredsampling rate• can incorporateNILM directly• requires multiplefilters and ADCs• cannot measuretransients well• requires a knowndictionarySingle-channelcompressedsensing (ournovel approach)singlemixer/filter↓singlelow-rate ADCrecover↓NILM• reduces requiredsampling rate• can incorporateNILM directly• uses a singlesampling channel• cannot measuretransients well• requires a knowndictionaryTable 3.1: Approaches for high-rate NILM data acquisition and managementNaive samplingThe simplest method we can consider for acquiring and managing very-high-ratedata for NILM would be to naively sample the electrical signal with a single high-rate ADC and then immediately transmit the data to the processing device. Onecould then apply NILM techniques directly on the received data. As mentionedin Section 3.1.1, however, high-rate ADCs are more costly than lower-rate ADCs.Transmitting all of this very-high-rate data also places high demands on the com-26munication link between the measurement and processing devices and may noteven be possible [7]. The only way to reduce these demands while using the naivesampling framework is to reduce the sampling rate. Doing so will reduce the qual-ity of data received at the processing device and could impact the performance ofNILM algorithms.CompressionAn alternative method that has been suggested is compressing very-high-rate mea-surements once they have been collected using a compression method like bzip2[85] or free lossless audio codec (FLAC) [86], which have been used to compressNILM data in [83] and [23], respectively. This method is similar to the previous oneexcept that compression and decompression algorithms are used at the measure-ment and processing devices, respectively. This solution allows one to measurethe electrical signal at a high rate while reducing the demands on the communi-cation link between devices. The tradeoff with this method is that both deviceswill require additional hardware and processing power to perform compressionand decompression. This is not ideal when the processing power and hardware re-quirements of the measurement device should be minimized, as is the case here. Inaddition, there are limitations to how much compression is possible without losinginformation contained in the measurements. For electricity data for NILM, previ-ous studies have obtained compression ratios of about 1.5 to 3 using bzip2 [83]and about 6 using FLAC [23]. Compression rates may be improved by using lossycompression, where the decompressed signal is not a perfect reconstruction of theoriginal signal, but this could again affect the performance of NILM algorithms.Change detectionYet another sampling framework one could use for very-high-rate sampling is toonly transmit measured data when a significant change in the electrical signal isdetected, as suggested in [26, 37]. This method could greatly reduce the need totransmit measured data, but means that only parts of the signal are transmitted andthat some NILM algorithms cannot be used. In particular, this strategy restricts usto event-based NILM, limiting the applications we can perform (e.g., we would not27be able to detect slowly changing loads with continuously varying consumption).The parts of data that are selected to be transmitted are also dependent on a changedetection algorithm, whose accuracy will affect the data available for NILM. Fur-ther, this framework can be expected to reduce the amount of data transmitted onaverage, but may be required to transmit large amounts of data when many eventsoccur around the same time. This could cause delays or failures at times whenmany loads in a building are changing state. Finally, performing change detec-tion at the measurement device also increases the hardware and processing powerrequired at the measurement device and makes it difficult to update the changedetection algorithm if better methods become available.Compressed sensingFinally, CS could be used to manage very-high-rate electrical data, as proposed by[30]. With CS, signals are sampled in a special way so as to capture all the importantinformation they contain but using fewer samples than the Nyquist rate [78, 79].A nonlinear recovery stage allows the signals to be rebuilt from these speciallydesigned samples. To use CS, it must be possible to represent the measured data asa sparse signal. The authors of [30] assumed this is possible because the currentwaveform is the sum of only a few appliance waveforms associated with a smallnumber of loads that are operating or changing state. Several other NILM studieshave also assumed this sparsity in the appliances that are operating or changingstate (e.g., [22, 42, 54, 57]), but these methods are typically focused on lower-ratedata than we consider for this part of the thesis.In [30], the authors also assumed that they had the flexibility to define a CSsampling matrix arbitrarily and that this could be implemented using a bank ofanalog mixing functions and/or filters followed by a bank of low-rate ADCs. An ex-ample of an implementation that allows this flexibility is the modulated widebandconverter (MWC) [87]. Alternatively, the authors of [30] could simply sample at theoriginal high rate and then apply the filters digitally. This eliminates the advantageassociated with removing the need for a high-rate ADC, but could still be usefulfor efficient signal processing in the digital domain. In either the analog or digi-tal implementation, the method in [30] is very attractive because it represents the28aggregate electrical current signal as the sum of the current waveforms that eachload in the building would produce if operating alone. The CS recovery stage in-volved determining a sparse vector indicating which of these waveforms is active,which immediately performs NILM by indicating which loads are either operatingor changing state.5 The way the authors formulated the CS problem allowed themto automatically perform NILM as part of the sampling and recovery of the originalsignals. For the analog implementation, a disadvantage of this method is that itrequires multiple filters and ADCs to implement CS, which increases the cost andsize of the measurement device. Further, the method relies on a pre-known dic-tionary of steady-state waveforms associated with each load in the building. Thisdictionary may be difficult to obtain and means that only steady-state parts of theelectrical signal will be able to be recovered. Thus, any NILM algorithm that relieson the transient load signals will not be possible with this method.It is worth noting that the method in [30] is only one of many possible waysthat CS could be implemented to acquire NILM data. In particular, CS is designedto combine the compression and sensing stages of data acquisition, but does notnecessarily have to combine the compression and disaggregation stages, as is donein [30]. The latter combination does not affect the measurement stage, but sig-nificantly impacts the processing stage of this sampling framework. Other CSframeworks could therefore be designed for NILM, including those that decouplethe compression and disaggregation stages by changing the way the signals are re-covered at the processing device. This decoupling could be seen as an advantagebecause it allows the use of a wide variety of NILM algorithms rather than the singleone that is “built-into” the sampling framework.Using CS for NILM (using the framework from [30] or others) also offers severalother potential advantages compared to other sampling frameworks. First, passingelectrical data through mixers and/or filters means that the physical measurementsbecome obfuscated and contain no physical meaning until they are recovered atthe processing device. Since one needs to know the mixer and/or filter properties(i.e., the “key”) at the processing device to perform recovery, this offers a built-in5If we measure the raw current signal, “active loads” are those that are operating at a given instant.If we measure the difference between the current signal and a delayed copy, “active loads” are thosethat have changed state during the time delay.29element of security to the signal transmission. CS methods can also be used toremove noise from a signal [88], which is taken advantage of by [30].3.1.3 Our ApproachIn this thesis, we investigate how randomized sampling techniques that use a singlesampling channel can be used to efficiently capture very-high-rate electrical datacollected for NILM. Our investigation is similar to [30] in that it uses ideas fromCS, but differs in that we explore the effects of implementing CS techniques usingonly a single sampling channel and one low-rate ADC. In [30] it is assumed thatmatrices describing the CS sampling process can have arbitrarily defined elements,which typically requires multiple sampling channels to implement, such as MWC[89].6 In this thesis, we explore two sampling implementations which are designedto function using only a single sampling channel. The two methods we considerare known as random filtering [24] and random demodulation [25]. Both methodsreduce hardware requirements at the measurement device by allowing the use ofa single sampling channel, but also introduce restrictions into how the CS sensingmatrix can be constructed. A comparison of the hardware implementations of MWCand random demodulation is made in [90], but our focus in this investigation isinstead on how the restrictions to the CS sensing matrix associated with randomfiltering and demodulation affect signal recovery and NILM performance. The novelapplication of these sampling approaches is also included in Table 3.1.In addition, we design our system model using a more general framework thanin [30] so that the compression and disaggregation stages can be combined, butthey do not have to be. This generalization allows many different NILM algorithmsto be used to process the data rather than a single algorithm that is “built-into” theCS recovery stage.6The method in [89] can reduce the number of required sampling channels by increasing thesampling rate of the bank of low-rate ADCs, but the authors also note that practical limitations makeit difficult to collapse all the way to a single sampling channel.303.2 Question 2: Improving Very-Low-Rate NILMThe second question we ask in this thesis is, “How can ideas from existing NILMtechniques be used to extract valuable information about energy consumptionfrom very-low-rate electrical data, for which these techniques were not orig-inally designed?” Specifically, we focus on detecting and analyzing changes inelectricity consumption that occur far more frequently than others.3.2.1 MotivationsWhile low-rate electricity data sampled every few minutes or seconds is becomingeasier to obtain through smart meters, it is still challenging to obtain in many cases.This is especially true for utility companies wishing to analyze energy consump-tion data, which often have access to data sampled only every 15 to 60 minutes.As discussed in Chapter 2, there has been some research on very-low-rate NILM[11, 41–43, 56], but most NILM research has focused on higher granularity data.Utility companies, as well as those concerned about the urgency of our global needto reduce CO2 emissions, are not necessarily in a position to wait for data sampledat higher rates to become more widely available before attempting to disaggregateenergy data. Thus, addressing the shortage of very-low-rate NILM methods is crit-ical to facilitating more immediate use of energy disaggregation for this samplingrate category.In this thesis, we address this shortage by modifying an NILM technique that ispopular for low-rate NILM methods (i.e., one minute to one second data) so thatit can be applied on very-low-rate data. As discussed in Chapter 2, low-rate datausually allows appliance events to be isolated so that only one appliance changesstate per measurement interval. This means that differences in electricity consump-tion from one measurement interval to the next can often be mapped to changes inindividual appliance states, allowing one to infer the operational schedules of ap-pliances. Given the promising results that this disaggregation technique has shownfor low-rate NILM methods, we believe that it is worthwhile to attempt to extendthe method for use on very-low-rate data. We acknowledge that the expectations ofwhat one can infer from 15 to 60 minute energy data must be lower than for datasampled every few minutes or seconds, but we believe that even coarse disaggre-31gation could provide valuable information about building energy consumption.3.2.2 Existing ApproachesWhile the body of NILM research is large, there has not been extensive work per-formed on analyzing very-low-rate data. Here we expand on Section 2.2.1 by dis-cussing in greater detail the relatively small collection of NILM research for thisdata sampling range.Rule-based disaggregationThe earliest work that we know of exploring very-low-rate NILM was performed byPowers et al. [40] around the same time that NILM was originally proposed. Thismethod was designed for 5 to 15 minute energy data and uses a proprietary “rule-based” algorithm to match observed power consumption spikes with expected loadbehaviour. An interesting aspect of this method is that it suggests using disag-gregation features one at a time rather than simultaneously. That is, they first usepower consumption measurements to differentiate between appliances and onlycompare other disaggregation features (e.g., time of day, duration of usage) if twoappliances have similar power consumption. Due to the propriety nature of theiralgorithm, however, this work has not received much attention in the NILM researchcommunity since it was first proposed.Artificial neural network approachAnother relatively early example of work attempting to disaggregate energy con-sumption using very-low-rate data that by Prudenzi [41]. Prudenzi used ANNs on15 minute data to extract two types of appliance from the aggregate consumption:appliances that are used once per day or less (e.g., washing machine) and thosethat cycle multiple times per day (e.g., water heater). The results in [41] showedrelatively good results but, like [40], this research effort was fairly isolated and hasreceived little attention since it was first proposed. Because ANNs have a tendencyto over-fit the data used to train them [91], this could be the reason that this methodhas not been researched more broadly for NILM since Prudenzi’s work. It is alsopossible that the assumptions about appliance usage (e.g., that a washing machine32is used at most once per day) are not accurate or general enough to be applied tomore buildings.Sparse coding approachKolter et al. [42] proposed the used of “sparse coding” for very-low-rate NILM,where it is assumed that only a few appliances are operating at any given time. Intheir work, the authors formulated an optimization problem that takes this assump-tion into account and made an ambitious attempt to disaggregate consumption datameasured once per hour. In particular, they attempted to infer the energy shareof different loads for one week of data at a time. Kolter et al. argued that theirmethod is effective, but the amount of the aggregate energy that it could assign tothe correct load was no more than about 50% in their tests. Their method also re-quires a large amount of sub-metered electricity data for training, which is difficultto obtain.Focusing on heating and cooling loadsIn recent years, several researchers have narrowed the types of appliances that theyattempt to disaggregate from the total consumption when using very-low-rate data.One of the most popular goals that these researchers have focused on is detectingthe amount of energy consumed by heating or cooling equipment.In one method proposed by Curtis et al. [43], the relationship between outdoorair temperature and the energy consumption of a building are analyzed. For exam-ple, the amount of additional energy the building uses for an incremental increasein outdoor temperature is used to predict what share of the total energy consump-tion is used for cooling.Huang et al. [11] used outdoor air temperature for disaggregation in a slightlydifferent way, attempting to detect the presence of electric heaters and predict theirenergy consumption. Huang et al. used an HMM approach, where the probabilityof an electric heater turning on or off depends on the temperature.One disadvantage of the work in both [43] and [11] is that neither method at-tempts to find the operating schedule of appliances, focusing instead on the shareof total energy that the heating or cooling equipment uses. While this is accept-33able for some applications, understanding the exact operation times of these appli-ances is sometimes desirable. For example, a consumer that is subject to a time-of-use (TOU) electricity pricing scheme may wish to use NILM to verify that theirair conditioner does not operate during times when electricity is most expensive.Perez et al. [56] have made an attempt to predict the operational schedules of airconditioners by training their disaggregation algorithm during times of low activityin the home, when only the air conditioner is expected to be operating. They testedtheir technique on 1 minute, 5 minute and 15 minute data and found that it workedreasonably well for the first two sampling rates but performance was very poor forthe 15 minute data. In addition, their method was only shown to be effective duringcertain months of the year.3.2.3 Our ApproachIn this thesis, we investigate a new opportunity for coarse disaggregation of very-low-rate data. While we recognize that very-low-rate data does not usually resultin the isolation of appliance events in a single interval, we believe that occasion-ally only a single event will occur within these long intervals. In particular, largeequipment that turns on and off quickly, frequently and with approximately thesame magnitude each time may allow us to detect some appliance events in somebuildings using very-low-rate data.Our approach to detecting these frequent events involves analyzing “energytransitions”, which we define as the difference in energy consumption between twoconsecutive measurements in an energy consumption time series. In our method,we sort the observed energy transitions for a building by their magnitudes andattempt to observe “peaks” or “clusters” in these energy transitions which cor-respond to appliance events that are much more frequent than others. Similarstrategies have been used for NILM in household applications with low-rate energydata [1, 21, 50, 51, 60, 61, 68]. In particular, our work takes a similar approach toBaranski and Voss [50, 51] in that we search for energy transitions that occur veryfrequently in the aggregate energy signal using real power only. The primary dif-ference between our work and that in [50, 51] is that they analyze data measuredonce per second. This changes the disaggregation problem significantly, since typ-34ically no more than one appliance will change state per sampling interval. As such,Baranski and Voss attempt to identify possible FSM models for appliances and de-tect their operational sequences in the aggregate signal, whereas we make no suchattempt. Our work is also related to that by Kelly [21], but again Kelly uses fre-quent energy transitions to characterize FSM models for appliances and uses highersampling rates than we consider. In fact, the use of these techniques for very-low-rate data has not been reported, to the best of our knowledge.35Chapter 4Single-Channel RandomizedSampling to ManageVery-High-Rate Data for NILMIn this chapter, we explore how a single sampling channel collecting random mea-surements can be used to efficiently capture very-high-rate electrical data for NILM.We begin with some background information on CS, as well as two single-channelsampling methods that can be used to obtain CS measurements. We then presentour system model and report the results of our numerical experiments. Finally, wediscuss our results and draw conclusions.4.1 Background on Compressed Sensing and RelatedConcepts4.1.1 Compressed SensingThe field of CS was introduced by Donoho [79] and Cande`s et al. [78] as a methodof measuring signals at a much lower rate than the commonly accepted Nyquistrate (i.e., twice the highest frequency component of the signal we wish to capture).Signals are often sampled at a very high rate, then later compressed (e.g., JPEGcompressed images) and CS is an attempt to merge these two steps into one.36The use of CS depends on the assumption the a signal is sparse in some do-main. That is, a signal x of length d is considered s-sparse if it is well representedas x≈Ψs, whereΨ is a sparsity basis and s is a vector of length d with no morethan s non-zero coefficients. One can then collect m carefully designed CS mea-surements of the signal, where m is on the order of O(s logd). That is, the numberof measurements scales with the number of non-zero elements in s, and only withthe logarithm of d.Designing the m measurements is done using an m× d sensing matrix, Φ,which can take on various forms but must be incoherent with the sparsity bases,Ψ. The CS measurements used to describe a signal x can then be written asy =Φx≈ΦΨs. A common way to construct Φ is to define its elements as inde-pendent and identically distributed (i.i.d.) random variables (e.g., from a Gaussiandistribution) to ensure incoherence of Φ with most bases, Ψ. The original signalcan then be recovered from y using the knowledge of Φ and Ψ and the fact thats is a sparse vector. Popular recovery techniques include greedy methods, suchas orthogonal matching pursuit (OMP) [92], or optimization methods where the `1norm of s is minimized (e.g., Basis Pursuit [93]).4.1.2 Random FilteringRandom filtering is a variation of CS first proposed by Tropp et al. [24]. With ran-dom filtering, measurements are obtained by passing the original signal, x, througha random filter. The filter, h, has B taps which are i.i.d. random variables that arechosen once and remain fixed for all measurements. Passing x through the filteramounts to calculating the convolution of the continuously streaming data with thefilter, h. The output of the filter is then subsampled at a rate that has been reducedby a factor of bd/mc from the original sampling rate, RNyq. Each measurement canbe calculated usingy[n] =B−1∑i=0x[nbd/mc+ i]h[B− i] for n = 0, ...,m−1 (4.1)where y[n], x[n] and h[n] are the nth elements of the vectors y, x andh, respectively.With random filters, we can construct a sensing matrix, Φ, by first placing the B37h3 h2 h1 0 0 0 00 0 h3 h2 h1 0 00 0 0 0 h3 h2 h1x0x1x2x3x4x5x6= y0y1y2Figure 4.1: Random filtering matrix equation example (Reprinted with per-mission, c©2015 IEEE)filter taps in the leftmost B elements of the first row of Φ, with all other elementsin the row being zeros. Each of the remaining rows in Φ is then a duplicate of therow above, with a shift to the right by bd/mc places. In Fig. 4.1, we show a smallexample of (4.1) in matrix form where d = 7, m = 3 and B = 3.Constructing the sensing matrix this way loses some of the randomness onewould obtain if choosing all of the matrix’s elements as i.i.d. random variables.We expect that this will make Φ more coherent with Ψ in general. While this isexpected to reduce the effectiveness of the compressed sensing framework, ran-dom filtering maintains more randomness in Φ compared to directly subsamplingthe electrical data, where Φ is simply a periodic subset of the rows of the iden-tity matrix. In fact, recent theoretical results for the random filtering scheme whereB= d have shown that approximatelyO(s log2 s log2 d)measurements are typicallyrequired for signal recovery [94], compared with the O(s logd) measurements typ-ically required for more traditional CS sensing matrices [78, 79]. Thus, we expectthat random filters should allow recovery performance that is somewhere in be-tween using a fully random Φ and using periodic subsampling.4.1.3 Random DemodulationRandom demodulation in another method of applying CS ideas to analog signals.This method was first proposed in 2006 by Kirolos et al. [25] and an early imple-mentation was given soon after by similar authors, Laska et al. [95]. A detailedanalysis of the method was published in 2010 by Tropp et al. [96] and we adopttheir modelling and implementation approach in this thesis. Random demodula-38Φ=L1 1 1 0 0 0 0 0 00 0 0 1 1 1 0 0 00 0 0 0 0 0 1 1 1Pp1 0 ... 00 p2 ... 0.........0 0 ... p9Figure 4.2: Random demodulation matrix equation exampletion involves first multiplying the continuous signal x(t) by a periodic “chippingsequence”, p(t), which can be implemented as a square wave of randomly gener-ated ±1 elements. The product is then passed through an “integrate-and-dump”operator, which serves as a low pass filter. As in [96], one can use matrix form torepresent these two operations acting on x, the vector form of x(t). We show anexample of this in Fig. 4.2 for d = 9, and m = 3. We define the the matrix P asthe one applying the multiplication operation and the matrix L as the one applyingthe integrate-and-dump operation. Together, they make up the CS sensing matrix,Φ=LP .While the random demodulation sensing matrix is similar to the random fil-tering matrix, they are different in several important ways. Both have a bandedstructure, but the sensing matrix in Fig. 4.1 has the same sequence of nonzero ele-ments in each row, with the only difference being a horizontal shift. With randomdemodulation, however, the matrix product LP can have different nonzero ele-ments in each row, depending on the nature of P . In addition, the filter coefficientsin Fig. 4.1 can overlap so that there may be multiple nonzero values in each columnof the matrix. By contrast, the integrate-and-dump operation used by random de-modulation means that this overlap is not permitted.1 This means that for randomdemodulation, the number of nonzero elements in Φ depends on the subsamplingrate. This is unlike random filtering, where the choices of B and m can be madeindependently. With random demodulation, we have control over Tp, the period ofp(t), instead of the filter length, B.1In some cases a small amount of overlap is used in [96], but the sum of all elements in eachcolumn must always equal one.39Single-channelrandomizedsamplingSample at rate RTransmit ReceiveRecovery andreconstructionNILMMeasurement device Processing deviceAnalogsignalNILMpredictionsFigure 4.3: General random filter sampling framework4.2 System ModelThe basic sampling framework for collecting and processing CS measurements forNILM was described in Section 3.1.1 and shown in Fig. 3.1. Here, we discuss thisframework in greater detail for the case with a single sampling channel. The out-line of how single-channel randomized sampling methods would be implementedfor NILM data acquisition is given in Fig. 4.3. In Fig. 4.4, we show the two possiblerealizations of the “single-channel randomized sampling” block in Fig. 4.3. We useRNyq to denote the sampling rate that would be required to obtain sufficient data res-olution if the naive method of direct sampling from Section 3.1.1 was used. Witheither of the single channel randomized sampling methods, we pass the electricalsignal through the randomization mechanism and then sample at a rate R = qRNyq,where q< 1. The measurements are then transmitted and recovered at the process-ing device. The methods of recovery vary and depend on several considerations,including properties of the signals being measured. In the following sections wediscuss several of these design considerations, then motivate and describe specificdesign choices we made for our system model.40randomfilter(a) Random filteringp(t)× low passfilter(b) Random demodulationFigure 4.4: Block diagrams of single-channel randomized sampling methods4.2.1 Design ConsiderationsData compression ratesUsing random filtering or demodulation for very-high-rate electrical data meansthat both the sampling rate of the ADC and the total volume of measured data canbe reduced. For comparison, compressing the data using bzip2 could also be usedto reduce the volume of measured data, as discussed in Section 3.1.1. For a sam-ple text file of data from the BLUED dataset [84] containing about 98 seconds of12 kHz household electrical data, we removed the header text and wrote the time,current and voltage measurements to a CSV file. The size of this file was 32 MBbefore bzip2 compression and 4.5 MB after, giving a compression ratio of about7. This is much better than the bzip2 compression ratios of 1.5 to 3 reported by[83] and is similar to the compression ratio of 6 reported by [23] when using FLAC.Thus, when using random filtering or demodulation, letting q= 110 so that we sam-ple at a rate R = RNyq10 will have a slightly better compression ratio than bzip2 orFLAC.Analog versus digital implementationBoth the random filtering and random demodulation methods can be implementedeither using analog circuitry or digital components. The most important differencebetween these two methods is that using a digital filter requires that we insert anADC before the filter in Fig. 4.3 that samples at the high rate RNyq. Thus, if the mostimportant reason for implementing random filtering is to eliminate the need for ahigh-rate ADC, a digital filter is not suitable. If our system is implemented simplyto provide a quick and efficient way to compress the high-rate measurements andthe ADC sampling rate is not a concern, then digital components are likely the more41appropriate choice. This is because using digital components gives more flexibil-ity in the randomized sampling implementation, which we will discuss further inSection 4.2.3.Sparsity basisAs discussed in Section 4.1, the use of compressed sensing techniques, includingrandom filtering and demodulation, depends on the assumption that the signalsbeing measured have a sparse representation using some basis. Thus, designingthe sampling framework requires that we understand the sparsity properties of themeasured signals. For the electrical signals we are working with, there are severaloptions one could use for a sparsity basis:• Load basis. Each load in a building has an associated electrical currentwaveform that depends on its operating characteristics. The current wave-form of the aggregate building consumption is the sum of these waveforms.A natural choice for a sparse basis is therefore the “load basis”, a matrixwhose columns are the current waveforms of the building’s loads. This basismakes intuitive sense to represent the aggregate current signal because theloads are the true physical generators of the signal.2 The question one mustanswer, then, is whether or not one should expect the aggregate current sig-nal to be sparse when represented using this basis. That is, are only a smallnumber of loads in a home operating at any given time? Alternatively, onecould measure the “difference signal”, the difference between the originalcurrent signal and a copy which has been delayed by an integer number ofcycles. For this signal to be sparse in the load basis, only a small numberof loads should change their consumption over the delay period (e.g., 1 sec-ond). An important advantage of using the load basis is that it may simplifythe load disaggregation problem, as in [30], because the identification of ba-sis vectors used to represent the aggregate signal automatically determineswhich loads are operating or changing state. A disadvantage of the load ba-sis is that it requires assumptions about load behaviour (i.e., that only a few2This measurement method would not, however, be suitable to represent the voltage signal, whichis driven by the power source (i.e., the utility) and is not an additive combination of the loads in abuilding.42are operating or changing at a given time), which may only hold true un-der certain circumstances. In addition, that acquiring the electrical currentwaveforms of each load in a home may be challenging.• Learned basis. Some signals can be sparsely represented using vectorswhich are learned from the signals themselves. For a comprehensive ex-planation of this method, known as dictionary learning, see [97]. A popularexample of a dictionary learning technique is the K-SVD algorithm [98],where we begin with a randomly initialized set of atoms (i.e., vectors) thatwe call a dictionary (i.e., basis) and iteratively update the dictionary, en-couraging it to be made up of atoms that permit a sparse representation oftraining signals of a certain class. With dictionary learning, we create a ba-sis that gives a sparse representation of this class of signals by construction.After learning the dictionary using these the training signals, the dictionaryatoms can be used to represent previously unseen signals of the same class.An advantage of this method is that it allows one to obtain a customizedbasis that is designed specifically for the measured signals. In addition, noprior knowledge of the loads in the building is required. The main disad-vantage of this method is the requirement for training signals, which may bedifficult to obtain. There is also a possibility that the learned dictionary willbe customized too closely to the training signals and will not be effectiveat representing unseen signals from the same class (i.e., the dictionary will“over-fit” the training signals).3• Predefined basis. Many natural signals are well represented using a sparserepresentation in an appropriate predefined basis [99]. Examples of suchbases include the Fourier transform, discrete cosine transform (DCT) or var-ious wavelet transforms (e.g., Haar or Daubechies). One or more of these3Dictionary learning of electrical signals specifically for NILM was explored very recently byElhamifar and Sastry [57]. In this work, the authors proposed a method to learn a dictionary of“powerlets”, vectors that represent different power consumption patterns. As noted in Section 2.2.2,the authors mentioned that their framework could be used for data measured in the tens of kHz, butpresented it using real power measurements and did not make it clear whether or not their frameworkwould be suitable on raw current waveforms. In our work we restrict ourselves to generic dictionarylearning methods that can be used on any type of signal, but acknowledge that customizations todictionary learning for NILM similar to those in [57] may also be possible for the signals we consider.43bases may allow a sparse representation of our electrical signals (e.g., [69]showed that some current waveforms could be represented using Daubechieswavelets). Predefined bases are easy to use because they are independent ofthe loads in a building and the measured signals. Thus, they require no train-ing signals or prior knowledge about the building’s loads. A disadvantage,however, is that these bases may not permit a good sparse representation ofthe electrical signals we measure.For all the sparsity bases, we assume that they are normalized so that each columnhas an `2 norm of 1. A summary of the possible bases, along with their advantagesand disadvantages is given in Table 4.1.Basis Advantages DisadvantagesLoad basis • Vectors are from true physicalgenerators• Simplified disaggregation maybe possible• Assumptions about loadbehaviours are necessary• Load waveforms may bedifficult to obtainLearned basis • Allows sparse representationof signals by construction• No prior knowledge of thebuilding’s loads are necessary• Training samples may bedifficult to obtain• Dictionary may be over-fittedto training signalsPredefinedbasis• No load information ortraining data required• May not give a sparserepresentation of the signalsTable 4.1: Sparsity basis options for representing electrical signalsRecovering the original or difference signalOne could design the sampling system to recover the difference between the orig-inal signal and a delayed copy (i.e., the “difference signal”) instead of recoveringthe original signal itself. We suggested earlier that the former may be easier whenusing the load basis, but one could recover either the original or the differencesignal using any of the basis options.If we choose to use the load basis and focus on the current signal recovery only,then recovering the difference signal is an easier task than recovering the originalsignal. This is based on the assumption that the number of loads changing their44consumption in a short period of time will always be small, while the number ofloads that are operating at any given instant will not. In fact, we have control ofthe time delay used to calculate the difference signal, which gives us some degreeof control over the number of load changes the difference signal will contain. Ashorter delay can be expected to have a smaller number of load changes, resultingin a signal that is more sparse in the load basis. Buildings also have a “base load”,which is the energy consumption of the building that remains constant and is neverturned off. If we measure the difference signal, the current waveform associatedwith this base load will be subtracted out and we can expect to recover signals thatare indeed only described by the sum of load current waveforms. If we attemptto recover the original signal, however, we may not be able to represent the baseload’s current waveform using vectors from the load basis. In this scenario, thebase load would need to be known and its contribution to the CS measurementswould need to be removed from the measurements before attempting to recoverthe current signal any further.If either a learned or predefined basis is chosen, the decision of whether torecover the original or difference signal is less important. Both of these methodsare suitable for general signals and do not depend on how often loads are operatingor changing their consumption. Certain predefined bases or dictionary learningmethods may happen to permit more sparse representations of either the originalor difference signal, but there is no reason to expect any one scenario will be betterthan another.In addition to considering the impact of different bases on the recovery of theoriginal or difference signal, we must also acknowledge the need to align delayedcopies of signals if recovering the difference signal. We assume that we can alignthe delayed copy so that it represents the pointwise difference between two cyclesin the electrical signal that occur at two different times. Lining up these cyclesmay be difficult, especially when the frequency of the voltage supplied to the homevaries slightly, which is not uncommon [100].Finally, we note that when using random filtering, the random filter used toobtain the measurements is linear. This means that the difference signal can becalculated either before or after the sensing matrix is applied. To see this, we candefine ∆x = xτ −x and ∆y = yτ −y, where xτ and yτ are delayed copies of45x and y, respectively, delayed by time τ . Since the sensing matrix Φ does notvary in time,∆y =Φxτ −Φx=Φ∆x. This means that the delay and differenceoperations do not need to be performed prior to passing the signal through therandom filter. This is a very important point with three positive consequences:1. High-rate sampling is not necessary. Implementing a delay in the signalwould not be possible with analog circuitry. Thus, if the delay had to beimplemented before the filter, it would be necessary to use a high-rate ADCfollowed by a digital filter. This would mean that reducing the sampling rateof our solution would not be possible, making it only useful as an efficientdata compression and management strategy.2. Lower burden on the measurement device. Allowing the delay and dif-ference operations to occur after the filtering stage means that they can bemoved off of the measurement device and onto the processing device. Thisis an important benefit when minimizing the hardware and processing powerused by the measurement device.3. More processing flexibility. Because the delay and difference are imple-mented digitally by the processing device, there is much more flexibility inhow the signal will be processed than if these operations were performed bybuilt-in elements in the measurement device. In particular, a processing de-vice could recover both the original and difference signals or switch betweenrecovering these two signal types as needed.We also note that the arguments above cannot be made for random demodulationin general because Φ varies in time due to the changing matrix P . The exceptionto this is if Tp divides evenly into τ , so that the matrixΦ at time t will be the sameas the matrix t+ τ .Notch filterThe electrical signals we work with have a fundamental frequency, f f und , associ-ated with the driving frequency of the electrical grid. This frequency is generallyeither 50 Hz or 60 Hz. Optionally, one can remove this fundamental frequencyfrom the signals being measured. This can be done using a notch or high pass filter46implemented in either the analog or digital domain. Removing this component ofthe signal may be particularly useful when using the load basis, since it can helpreduce the similarity of load current waveforms, as in [30].4.2.2 Sampling FrameworkBased on the design considerations discussed in Section 4.2.1, we designed a sam-pling framework that would be suitable for implementing random filtering or de-modulation to obtain CS measurements for NILM. The framework includes a simplemeasurement device that follows the block diagram given in Fig. 4.3. It also allowsfor two separate recovery options using two different processing devices that areshown in Fig. 4.5. The first recovery method is designed for bases other than theload basis, while the second is designed specifically for the load basis. We discussthe measurement device and the two processing devices in greater detail in thissection.In the following, we assume to be measuring electrical current signals and notvoltage signals. Measuring the voltage signals is typically less important for ourpurposes because the voltage is a sinusoidal signal driven by the utility companyand contains little information about appliance operation. Indeed, other NILM re-searchers have measured the voltage at very high rates and found it to be a sinu-soidal wave, then proceeded to analyze only the very-high-rate current waveforms[30, 75]. With that said, most of our proposed sampling framework could be ap-plied to voltage signals if necessary. The only part of the framework that cannot beapplied to the voltage signals is the recovery method that uses the load basis, sincethis basis is made up of current waveforms and is not appropriate for representingvoltage signals.Measurement deviceFor the purposes of our experiments, we assume the implementation of randomfiltering and random demodulation in the digital domain. Thus, the measurementdevice first samples the electrical signals at the rate RNyq using a high-rate ADC,then passes through one of the two single-channel sampling implementations givenin Fig. 4.4, realized using digital components. Finally, the output is subsampled47using a subsampling factor of bd/mc. In [30], an analysis of the frequency contentof electrical current signals suggests that the Nyquist rate to capture these signalsis approximately 10 kHz for an electrical system in Europe, which operates witha fundamental frequency of 50 Hz. Based on this, we assume that 12 kHz is theNyquist rate for the 60 Hz North American electrical signals we analyze here.Thus, our model uses RNyq = 12 kHz for the high-rate ADC.We have chosen this model for our experiments because we work in the digitaldomain and the electrical data we use has already been sampled at a high rate. Inpractice, however, the measurement device could instead be made up of an analogcomponents, followed by an ADC sampling at the rate R = bd/mcRNyq. Thus, ourmodel simulates how random filtering or demodulation would behave in the digitaldomain and closely mimics the behaviour of an analog implementation. In eithercase, once the measurements are collected they are transmitted to the processingdevice for analysis.Fortunately, the fact that there is only one measurement device that works forboth recovery methods in Fig. 4.5 means that one does not have to restrict them-selves to one recovery method or the other. Once the measurement device has beendesigned and installed at the measurement location, which may be inconvenient toaccess, one can change between the recovery methods as needed. For example, ifthe load basis was not initially available to the user when the system was installed,they could use a predefined or learned basis instead. If the load basis later becameavailable (e.g., by measuring the load waveforms in the building or obtaining ap-proximate waveforms from a repository of shared load signatures, as suggested in[20, 77]), then the user could switch to using the recovery method designed for theload basis. Assuming that both recovery methods were already implemented andavailable to the user, the effort to switch from one mode to another could be assimple as changing user preferences in the NILM software program. Further, if onerecovery method proved to be more accurate at the time of installation, but NILMresearch eventually showed the other framework to surpass its performance (e.g.,if new dictionary learning techniques allowed better bases to be learned) then theuser could switch to using the “best” method at the current time. Thus, our overallsystem model is very flexible, allowing for easy incorporation of advances in NILMalgorithms.48ReceiveRecoverm× kblsamplesReconstructkbl cyclesNILMProcessing deviceTransmittedmeasurementsNILMpredictionst = nT(a) Generic sparisty basisReceiveDelay bym× kdelsamples+EventdetectionRecoverm× kbl samplesProcessing deviceTransmittedmeasurementsNILMpredictionsor t = nTevent occurred•(b) Load basisFigure 4.5: Processing devices for our two proposed sampling frameworksRecovery method 1: Generic sparsity basisThe first recovery method we chose is designed for bases other than the load basisand is shown in Fig. 4.5a. This is the simpler of the two methods and requires onlythat we have a fixed basis and that the signals we measure are known to be sparsein this basis. Examples of such bases are those which have been learned offlineusing appropriate training signals or a predefined basis such as the DCT.For this recovery method, we recover the original signal and not the differencesignal described in Section 4.2.1. This is because calculating the difference sig-nal requires additional processing power and there is no reason to believe that ageneric dictionary will be better able to recover the difference signal compared tothe original signal.Since no delay or difference operations are necessary, the first task performedby the processing device is to select blocks of measurements to be recovered. Inour model we implement this using a switch which closes every T seconds. Eachtime the switch closes it remains closed for tbl seconds. We can design tbl to ensure49that each recovered block corresponds to an integer number of electrical cycles,kbl . Because the electrical signals have a fundamental frequency, f f und , associatedwith them, we can model each high-rate cycle of the periodic signal as a vectorx with d = RNyqf f und elements. Thus, when we group kbl cycles into one block, thiscorresponds to a recovered signal length of dbl = dkbl . Note that with CS, however,we collect only m CS measurements per d high-rate data points. Thus, tbl shouldbe designed to group mbl = mkbl CS measurements into each recovery block. Notealso that the times T and tbl can then be varied as needed. For example, one couldrecover five cycles once per second or recover ten cycles every f f und10 seconds. Inthe second example, all cycles will be recovered, which may not be necessary inpractice.After being broken into blocks, the signals then pass through what we refer toas the “recovery” stage, where the sparse vector, s, that gives a sparse approxima-tion of x is determined. As noted in Section 4.1.1, this can be done using variousrecovery techniques, including OMP or `1 minimization methods.Immediately following the recovery stage, the signals pass through the “recon-struction” stage, in which the vector s and the basis,Ψ are used to estimate x. Theestimate of this signal is given by xest =Ψs, a length d vector that approximatesthe signal that one would have obtained if they had measured the signal directly atthe rate RNyq.Finally, after reconstructing the electrical signals, we apply NILM. This can bedone using one of the many NILM methods described in Chapter 2.4 This results inthe NILM prediction, the end goal of our system.Recovery method 2: Load basisIf we choose to use the load basis as the sparsity basis for our sampling system,a slightly different recovery method is more appropriate than the one describedabove for a generic sparsity basis. The method is shown in Fig. 4.5b and is similarto the previous one except for three main differences:4Many of the existing NILM algorithms could be performed on the recovered data, but not all ofthem. In particular, our assumption that the recovered signals must be periodic (see Section 4.2.3)means that transient electrical signals will not be well-represented using our sampling method. Thus,NILM algorithms depending on these transients could not be applied to the recovered signals.50• The difference signal is calculated. We do not believe that sufficiently fewloads will be operating in a building at all times so as to allow the electricalsignal to be represented as a sparse combination of load waveforms. Thus,when the load basis is used, we have chosen a framework that calculates thedifference signal by delaying a copy of the original signal and performing asubtraction between these two signals. The delay, τ , is chosen to correspondto mdel = mkdel CS measurements, where kdel is an integer. This ensures thatthe difference signal is a pointwise difference between electrical cycles.• Event detection can be used to control the sampling switch. Since we arealready calculating the difference between two signals after some time de-lay, τ , various event detection techniques could be used to determine whensignificant changes in the measured signal occur. These could be used tocontrol when the switch should be closed in order to recover the signal. De-pending on how the system is designed, this could offer very significant sav-ings because recovery only needs to be performed when events occur. Thiseffectively combines the event-detection method of sampling discussed inSection 3.1.2 with the single-channel CS methods we explore here.• The reconstruction and NILM stages are eliminated. Once we have founda sparse representation of the electrical signal in the load basis, we havealready identified loads which are likely to be changing state and generatingthe measured signals. Thus, it is not necessary to reconstruct x from s andthen apply NILM. Instead, we can use the s vector as the NILM predictiondirectly.Note that this recovery process is the same as that proposed by [30]. We have cho-sen to adopt this recovery stage as part of our sampling framework. The primarydifferences between our sampling method and that in [30] are that: a) we use asingle random filter or a single demodulation channel to acquire the measurementsrather than multiple filters and ADCs, and b) we allow two recovery method options,the first of which does not require the sensing and recovery stages to be coupled.51Adding Dictionary LearningOne important addition to the framework we have described here is the introductionof dictionary learning. There are several options we can consider for dictionarylearning implementation:• Offline learning. The dictionary could be learned offline before beingloaded into the NILM processing device. This method is simple, but requiresoffline training data that is very similar to the signals it is trying to represent.This training data would need to be acquired using a high-rate ADC, either inthe building it is being used for or another building with similar load wave-forms. It also cannot adapt to changing conditions in the building (e.g., newappliances) unless it is re-trained offline.• Learning online at the measurement device. Dictionary learning could beperformed using the electrical signals at the measurement device. Using thereal measured signals as training data, this solution allows the dictionary tobe updated when conditions in the building change. The disadvantages, how-ever, are that this solution adds hardware components, additional processingburden and greater storage requirements to the measurement device. Sincedictionary learning at the measurement device would occur in the digital do-main, a high-rate ADC would also be required on this device.• Learning online at the processing device. It is also possible to learn fromthe CS measurements received at the processing device [101, 102]. Thisimplementation method offers online learning and also prevents excessiveburden on the measurement device, but requires the sensing matrix, Φ, notto be the same for all measurements. This could be incorporated into oursensing matrix by changing the Φ matrix in time.5In our work, we assume the use of the simplest implementation, where the dic-tionary is learned offline. Implementing online dictionary learning at the measure-5In Section 4.2.1, we mentioned that theΦ at time t and t+ τ must be equal in order for us to beable to recover the difference signal at the processing device. This could likely be compatible withthe need forΦ to vary in time if we consider different time scales. For example,Φ could be changedonly every 1000τ , so that any measurements within each 1000τ time period are unaffected by thechanges.52ment device allows the dictionary to update over time, but introduces significantpractical disadvantages. Thus, we do not expect that this online method would beworthwhile to use for a real application. Implementing online dictionary learningat the processing device shows promise, however, and is an interesting area forfuture research.4.2.3 The Sensing MatrixNow that we have described the overall framework that can be used for signal sam-pling and recovery, we discuss details about the framework that affect the sensingmatrix associated with random filtering and demodulation.Random filter matrixWhen designing our system, we would like to be able to choose the number ofmeasurements, m, the filter length, B, and the length of segments to recover, d,independently. This can sometimes result in sensing matrices for random filteringthat have filter elements outside the expected m× d dimensions, as in Fig. 4.6a.We propose managing this issue by “wrapping” the coefficients around, so theycome in on the left side of the matrix, as shown in Fig. 4.6b. This is not possible ingeneral with a single analog filter because the filter coefficients become separatedfrom one another. If we assume that the signals are periodic with period d, how-ever, measurements extending beyond the cycle of interest will actually capturemeasurements from the next cycle, which has the identical shape as the first. Thus,it is as if these extended filter taps are measuring the beginning of the first cycle, asshown in Fig. 4.6.If digital filtering was used instead of analog, it would be possible to implementthe wrap-around matrix even for non-periodic signals. This would be accomplishedby storing some of the high-rate signal samples in memory and applying a “split”filter to the signal. The first few filter elements would multiply the most recentlymeasured high-rate samples and the last few filter elements would multiply someof the stored samples. This demonstrates the flexibility of using digital filtering.Although we use a digital filter in our system model, we would like our simulationsto mimic the behaviour of an analog filter implementation. We therefore do not53dmB(a)dm(b)Figure 4.6: Wrapping around the sensing matrix coefficients when measuringperiodic signals (Reprinted with permission, c©2015 IEEE)allow the digital filter to be split in order to capture non-periodic signals and insteaduse the wrap-around matrix method. Thus, we must assume that the signals weacquire are periodic, which implies that our method is only suitable for steady-state signals. It cannot be used to measure transient electrical behaviour that variesfrom one cycle to the next.Random demodulation matrixThe random demodulation sensing matrix, we cannot choose B and m indepen-dently because of the integrate-and-dump operation, but it is still possible to endup with sensing matrices that exceed the m× d dimensions. Thus, the wrapping-around method we described above can also be used for the Φ=LP matrix.4.3 Numerical Results and DiscussionWe performed several numerical experiments to test the random filtering and de-modulation system models presented. In all cases, we performed tests using elec-tricity consumption data from real homes that have been made available in theBLUED public dataset [84]. In this section, we first discuss how we processed thisdata. For each experiment, we then provide the simulation conditions, followed bythe simulation results and discussion.541: kitchen overhead light 2: dining room overhead light 3: computer 14: office lights 5: LCD monitor 1 6: hallway stairs light7: basement lights 8: DVR/bluray 9: TV10: closet lights11: upstairs hallway lightFigure 4.7: Sample load current waveforms derived from the BLUED dataset(Reprinted with permission, c©2015 IEEE)4.3.1 Obtaining and Processing Test DataWe extracted load current waveforms from the BLUED public dataset [84] for ourexperiments. This dataset records a home’s current and voltage at 12 kHz as wellas the times that loads in the home change state. To extract current waveforms,we first found every 60th current cycle from approximately 5:40 pm to 7:20 pm onOctober 20, 2011 and manually identified various current waveforms associatedwith different states of the home (i.e., different combinations of operating loads).We then found load waveforms as the pointwise difference between pairs of statewaveforms, where the pairs corresponded to states that differ by only one load. Theextracted load current waveforms are shown in Fig. 4.7.554.3.2 Experiment 1: Load BasisIn our first experiment, we tested the recovery method designed for use with theload basis. That is, we performed experiments similar to those in [30], except thatwe investigated the use of a single random filter and a single demodulation channelto collect CS measurements.Simulation parameters and conditionsTest signals were created by first choosing Na, the number of loads that will beactive in each test. These loads must be chosen from all the available loads, ofwhich there are Nl (in this case Nl = 11, as shown in Fig. 4.7). In general, theset of “active” loads may refer to either the set of loads that are operating at agiven instant or the set of loads that have changed state over the time delay τ .In this experiment, because we assume we are measuring the difference signal asdiscussed in Section 4.2.2, our tests correspond to appliances turning on during thetime delay τ .6 A “measured test” was given by the sum of these Na load waveformsplus additive Gaussian noise. The variance of the noise was specified asσ2n = 10− SNRdB10 σ2s , (4.2)where σ2s is the mean of the variance of the test signals. For each simulation,we generated approximately 5× 105 test cases, which included equal numbers ofevery possible combination of Na loads.The following sampling methods were tested, where all filter elements and thediagonal elements of the P matrix were randomly generated as either −1 or 1 withequal probability:1. psubs: periodic subsampling,2. rfilt: one random filter followed by periodic subsampling,3. rdemod: one random demodulation channel followed by periodic subsam-pling, and6Our method can also handle loads that turn off by using negative elements in s, but testing theperformance of such scenarios is left to future work.564. full: m random filters (e.g., sensing matrix filled with i.i.d. random vari-ables).We simulated the sampling and recovery of each test signal, using a notch filteras defined in [30] to remove the f f und component of the signals and the applianceorthogonal matching pursuit (A-OMP) from [30] for recovery. The filter elements,chipping sequence matrices, P , and phase of subsampling were generated ran-domly for each set of(NlNa)tests. We calculated the error rate for each load as thepercent of test cases in which the correct state was predicted. We summarized theoverall performance as the average error rate over all loads, again as in [30].Unless otherwise noted, the parameter settings we used for our simulationswere Na = 3, Tp = 200, a subsampling rate ofRNyq10 and an signal-to-noise ratio (SNR)of 30 dB.Varying filter and recovery segment lengthsWe first performed simulations for varying filter lengths and two choices of dbl:• dbl = RNyq/ f f und so that each test signal contained a single cycle of the peri-odic electrical waveform, and• dbl = 10RNyq/ f f und to investigate the effect of recovering longer signal seg-ments.7The results of these simulations are given in Fig. 4.8.We can see that recovering 10 cycles at a time offers better performance forall three of the sampling methods. This is not surprising because longer signalsshould reduce the effects of noise in the A-OMP recovery. We can also see that thefull method performs best, as expected from CS theory.Further, we observe that the rfilt method sometimes performs better than psubsand sometimes worse. This is an interesting result which suggests that there is atradeoff associated with the filter’s random elements. If the filter is short, the filter’s“mixing” operation simply obfuscates the original data and makes it more difficult7For dbl = 10RNyq/ f f und , we extended the dictionary of waveforms, which have length d =RNyq/ f f und to repeat 10 times and created tests of length dbl = 10d = RNyq/ f f und to recover.57101 10202468Random filter length (taps)Error rate (%)101 1020123Random filter length (taps)Error rate (%)  rfiltpsubsrdemodfull1 cycle recovered10 cycles recoveredFigure 4.8: Test error rates for varying random filter length and number ofrecovered cycles (Reprinted with permission, c©2015 IEEE)to recover. If the filter is longer, however, its random elements allow one to captureincoherent measurements and perform sparse recovery.We also observe that rdemod performs better than rfilt for all filter lengths,suggesting that rdemod is the more suitable single-channel sampling implementa-tion for this application. In choosing between these two methods, it is also worth-while to consider which would be easier to implement. The two methods differmainly in the direction of system design. With random filtering, we choose thetaps we want our filter to have, then design an analog filter to produce that be-haviour. With random demodulation, we use a low pass filter and model it usingthe appropriate matrix (i.e., the integrate-and-dump operation). Which of these iseasier in a particular setting may have a significant influence on which should beused for NILM data. In addition, recall from Section 4.2.1 that for random demod-ulation, Tp must divide evenly into τ in order for the CS sensing matrix, Φ, to be58identical at times t and t+τ . This is necessary for the calculation of the differencesignal at the processing device, as we proposed in Fig. 4.5. Thus, the flexibility inthe choices of p and τ are important design considerations that must be taken intoaccount when deciding between random filtering and demodulation.Comparing recovery techniquesWhen the number of CS measurements per recovered segment, mbl , is larger thanthe number of available loads, Nl , the linear system of equations to recover thestate vector s is not underdetermined. One could therefore use generalized leastsquares (GLS) to recover the signals instead of A-OMP. While [30] demonstrates oneexample of how a least squares approach can incorrectly predict load states whenthe A-OMP algorithm makes correct predictions, we extend these results here. Wedemonstrate that A-OMP does better than GLS for a large number of test realizationsand with varying levels of sparsity of the state vector, s. For different values ofNa, we simulated recovery using both GLS and the A-OMP algorithm. Here werecovered only one cycle (i.e., d = RNyq/ f f und) and used B = d. The results ofthese simulations are given in Fig. 4.9.Consistent with the previous results, rfilt performs better than psubs (becauseB = d) and rdemod performs better than both of these methods. We can also seethat the A-OMP recovery method performs better and that the improvement is morepronounced for more sparse signals. This is not surprising, since OMP methods aredesigned to recover sparse signals.4.3.3 Experiment 2: Generic BasisIn our second experiment, we tested our system framework that is designed to ac-commodate various sparsity bases. With a generic basis, we cannot associate thesparse vector, s, with the states or state transitions of loads. Thus, in this experi-ment we evaluated the recovery performance of different bases by determining howclose the estimated signals, xest were to the true signals, x.591 2 3 4 5 6 70510152025Number of active appliances per testError rate (%)  psubs, GLSpsubs, A−OMPrfilt, GLSrfilt, A−OMPrdemod, GLSrdemod, A−OMPfull, GLSfull, A−OMPFigure 4.9: Test error rates for varying number of active loads and two recov-ery techniques (Reprinted with permission, c©2015 IEEE)Simulation parameters and conditionsThe simulation conditions for this experiment were the same as those described inSection 4.3.2 except for the following differences:1. For this experiment we simulated only the rdemod implementation becausewe are interested in exploring the behaviour of single channel randomizedsampling and rdemod performed better than rfilt in Section Here we recovered the signals using MATLAB’s wmpalg function [103](using the “basic matching pursuit” option) because the A-OMP algorithmfrom [30] is designed specifically for use with the load basis.3. Here we tested the following bases for signal recovery:(a) The load basis, as in Section Three learned dictionaries with 11, 20 and 50 atoms, respectively. Thefirst size was chosen to correspond to the number of loads (i.e., thephysical generators of the signals) and the other two were tested to ex-plore how additional flexibility in the dictionary affects performance.The dictionaries were learned using our implementation of the K-SVDalgorithm [98]. Each dictionary was learned from 1000 training sig-nals, where each was the sum of a randomly chosen set of anywherefrom 1 to 7 of the Nl loads. Gaussian noise generated fromN (0,10−4)was also added to each signal. The K-SVD method requires the selec-tion of a pursuit method, so in our simulations we used MATLAB’swmpalg basic matching pursuit. The dictionary was initialized usingelements from N (0,0.01) and a total of 40 learning iterations wereperformed for each learned dictionary.(c) Three predefined bases, namely the Haar and Daubechies WaveletTransforms and the DCT.(d) A dictionary of 200 columns whose elements are i.i.d. random vari-ables generated fromN (0,0.01).4. Rather than evaluating performance by the percentage of correctly identifiedload states or state transitions, we instead evaluated the different methodsusing the normalized residual sum of squares (RSS) of the recovered signal,which is given by:RSS =1NtdNt∑i=1d∑k=1(Ψs−xest)2i,k, (4.3)where Nt is the number of test signals evaluated.5. We also include the error bound on the recovery, which is associated withthe noise added to the test signals we are attempting to represent. This lineis calculated based on the standard deviation of the Gaussian noise added tothe test signals. The mean of the square of a zero mean Gaussian process isgiven by the variance, σ2, which we used as the RSS error bound.61We also note that all the simulations in this section were performed on testsignals containing only one electrical cycle (i.e., d = RNyq/ f f und) and using Tp =200.Varying the number of active loadsTo compare the performance of the difference bases, we first varied the numberof active loads, Na in each test case. The results are given using both a linear andlogarithmic y scale in Fig. 4.10.As expected, the load basis does well and is close to the error bound. All threelearned dictionaries also perform well, sometimes even better than the load basis.This is promising because the load basis may not always be available, so trainingdictionaries may sometimes be necessary. Interestingly, the smallest learned dic-tionary, which has 11 atoms, performs the best of the three learned dictionaries.The two larger dictionaries with 20 and 50 atoms have very similar performance toone another, which is slightly worse than the learned dictionary with 11 atoms.We can also observe that the two wavelet bases have similar performance andare only slightly better than the random basis. The DCT basis offers some improve-ment over these other two predefined bases, but still does not perform as well asthe learned dictionaries.Finally, we observe that the RSS for all the methods increases with the numberof appliances that are on at any given time. This is consistent with the resultsin Section 4.3.2 and the expectation that recovery performance will be better forsignals that are more sparse.Varying SNRIn the second part of this experiment, we simulated the recovery of measured sig-nals using the different bases and varying SNR. The results of this simulation aregiven in Fig. 4.11.We can see that the ordering of the various bases is very similar to that inFig. 4.10. As before, the learned dictionary with 11 atoms performs better than theload basis in some cases and worse in others. As expected, the performance of allthe methods also improves with increasing SNR.621 2 3 4 5 6 700.511.5Number of active appliances per testRSS error  randHaarDaub.DCTDL50DL20DL11loaderror bound(a)1 2 3 4 5 6 710−410−310−210−1100101Number of active appliances per testRSS error(b)Figure 4.10: Signal recovery performance for varying number of active loadsand several sparsity bases6315 20 25 30 35 40 4500. (dB)RSS error(a)15 20 25 30 35 40 4510−410−310−210−1100SNR (dB)RSS error  randHaarDaub.DCTDL50DL20DL11loaderror bound(b)Figure 4.11: Signal recovery performance for varying signal-to-noise ratioand several sparsity bases644.4 SummaryIn this chapter, we demonstrated how a single random filter or a single random de-modulation channel can be used to reduce sampling rate for NILM. Random filter-ing and demodulation facilitate NILM that requires high resolution data by allowingone to capture this information using a lower-than-Nyquist sampling rate and main-taining a simple implementation. We simulated these sampling techniques on realelectrical data and showed that they both have superior NILM performance to directperiodic subsampling under certain conditions. Thus, when attempting to reducedemands on sampling rate and the amount of data sampled, random filtering anddemodulation offer improvement over the naive approach of direct subsampling.The random demodulation method performed somewhat better than random filter-ing, but we concluded that the choice between these two methods should also takeinto account practical implementation considerations.Further, we showed that strong signal recovery occurs with both the load basisand dictionaries that have been learned offline from high-rate training signals. Thisis promising because it means that dictionary learning could potentially be used asa tool to allow for quality signal recovery when the waveforms cannot be measureddirectly. We also tested several predefined bases and found that they were not verysuccessful at reconstructing our test signals.65Chapter 5Analyzing Energy Transitions toImprove Very-Low-Rate NILMIn this chapter, we apply ideas from the field of NILM research to very-low-ratedata on which it is not typically used. We also take a slightly different approachto the disaggregation problem than we did in Chapter 4. Here, we do not havedisaggregation features that are known to correspond to loads (e.g., the currentwaveforms in Chapter 4) so we cannot simply attempt to detect their presence inthe aggregate signal. Instead, we detect a feature in the aggregate signal that canbe isolated and analyzed, then speculate what loads it may be associated with.More specifically, we develop a method to detect frequent and consistentlysized transitions in energy consumption and analyze these transitions by comparingthem to the outdoor air temperature of the buildings being analyzed. Through thisprocess, we show that the energy transitions often exhibit properties which suggestthey are caused by equipment whose behaviour changes with temperature, such asa heating, ventilation and air conditioning (HVAC) system.We begin this chapter by providing a few definitions that we will use through-out. We then present the methodologies used to detect the energy transitions weare looking for and to analyze these transitions. Next, we present and discuss theresults of applying these methodologies on a set of test buildings. Finally, we finishthe chapter with a brief summary.665.1 Background and DefinitionsFor this chapter, we applied our analysis using a database of electrical consumptiondata from approximately 30000 small to medium businesses (SMBs). The measure-ments have been acquired at a rate of once per 15 minutes, which falls into thevery-low-rate sampling category.While our work in this chapter has only been applied to SMBs, we believe thatthe analytical techniques could be applied to other classes of buildings, such asresidential. We therefore use the generic term “building” to describe the entitieswhose aggregate energy consumption data we are analyzing. We note, however,that SMBs do not always occupy all of a building (e.g., a beauty salon in a mall).Thus, using “building” to describe such an entity is not always physically accurate,but serves our purpose for analysis.We also introduce the term “vertical” to describe the type of SMB (i.e., theprimary use of the building whose energy data we are analyzing). Examples ofverticals are beauty salons, religious worship spaces, schools, grocery stores, etc.Finally, we recall the definition from Section 3.2.3 of an “energy transition”,which is the difference in energy consumption between two consecutive measure-ments in an energy time series. Energy transitions are an important property usedin our method and will be discussed extensively throughout this chapter.5.2 MethodologiesIn this section, we describe the method we developed to analyze very-low-rateelectricity consumption data by identifying frequent energy transitions.5.2.1 Generating Histograms of Energy TransitionsThe first stage of our analysis was to create a histogram of electricity transitions fora given building. To do so, we first calculated the difference between the electricityconsumption for each pair of consecutive measurements. We also calculated thedifferences in the time stamp for each pair of consecutive measurements. Theresult was a set of data points, each with an associated time and energy transition.While one might expect that all consumption measurements should be separatedby a fixed time interval (e.g., 15 minutes), it is possible for measurements to be67(a) Smooth (b) Containing peaksFigure 5.1: Examples of energy transition histograms for two buildingsmissed. We therefore removed any data points from the set if they did not have theexpected time difference.We next ensured that after removing any invalid data points, the set was stilllarge enough to make the analysis meaningful. Unless otherwise noted, we createdhistograms using all valid data collected in 2014 and required that at least halfof the possible data points for that year remained after excluding invalid energytransitions (e.g., for a year of 15 minute data, we expected to measure 35040 datapoints and would analyze any building with at least 350402 = 17520 valid points).We then plotted a histogram of the energy transitions associated with valid datapoints. Examples of these plots for 15 minute interval data from all of 2014 aregiven in Fig. 5.1. As we can see, the histogram in Fig. 5.1a has a relatively smoothdistribution of energy transitions. The histogram in Fig. 5.1b, however, has clearpeaks in consumption that are distinct from the center peak.1 The peaks are approx-imately symmetric about zero, suggesting that for each increase in electricity useby that specific amount, there is often a corresponding decrease in consumptionby approximately the same value. It is this histogram behaviour that we believecould be related to quickly and frequently changing equipment. We explore thisphenomenon in more detail in this chapter.1The center peak is caused by very small energy transitions corresponding to a “steady state”,where the building uses nearly constant consumption over an extended period of time.68Comments on histogram interpretationThere are a few points worth noting about the interpretation of the histograms cre-ated using our method:1. Interpretation as discrete slope. Because the time difference for every datapoint represented in the histogram is equal, the energy transitions we plot inthe histograms are proportional to the discrete slope of the electrical signal.Thus, peaks in the histogram do not necessarily indicate the presence of con-secutive “on-off” transitions that create short spikes in the electrical signal.For example, consider a spike created from an increase in consumption by 1kWh in 1 interval and then a drop by 1 kWh in the following interval. If thisspike is repeated 1000 times in a year, it would contribute to the histogramthe same data points as if the building’s consumption increased by 1 kWh perinterval for 1000 intervals in a row, then decreased by 1 kWh per interval forthe next 1000 intervals. While the latter case may not be physically realistic,it is important to acknowledge that the histograms we create cannot tell thedifferences between these two scenarios.2. Duration of equipment process causing spikes. Equipment processes thatcause short spikes in electricity consumption likely have a duration that isshorter than the measurement interval. We can understand this by compar-ing what happens when an appliance process with duration a and powerconsumption b occurs within a single interval versus when the same pro-cess straddles two intervals, as depicted in Fig. 5.2. In this figure, energyconsumption measurements are made every 0.25 hours (15 minutes) at thedotted grey lines and consist of the integration of power consumption overthe most recent 0.25 hours. In this example, if we assume that the processesare far enough apart from one another, the probability of a process causing aspike in the consumption data that is exactly ab kWh is given by 1− a/tint ,where tint is the measurement interval. If the process does not fall withinone measurement interval, however, other energy transitions will be mea-sured. There is a relationship between the variation in spike amplitude (i.e.,the width of histogram peaks) and the ratio a/tint and we can see that whena regularly repeating appliance process causes spikes that are consistently695.02.50Power(kW)a atintbProcess 1 Process 200.1250.2500.3750.5000.625Energy(kWh)0 0.25 0.50 0.75 1.00 1.25 1.50Time (h)Figure 5.2: Power and energy consumption of an example equipment pro-cess, where energy measurements are the integration of power consump-tion over 0.25 hour periodssized, it is likely that a/tint is small. One could take this further by analyzingthe shape and size of histogram peaks to obtain an estimate of a, which is aninteresting area for future work.3. Spike isolation. We can also see from Fig. 5.2 that if the equipment pro-cesses were too close together, this would affect whether or not spikes aredetected. For example, if Process 2 began at 0.50 h, the energy consumptionmeasurement at 0.75 h would be 0.625 kWh. While it may be possible to de-tect a smooth “bump” in energy consumption, our work focuses on findingsharp spikes that are characterized by an increase in consumption immedi-ately followed by a decrease of approximately the same magnitude. Sucha spike would no longer occur in this case, since there would be two 0.625kWh measurements in a row. A similar effect would also occur if a secondpiece of equipment in the building consumed energy between the times of0.50 h and 0.75 h. Thus, we can see that if spikes are being detected in ag-gregate energy consumption data, they are most likely to be caused by equip-ment processes that are isolated in time from other activities in the buildingthat consume energy.705.2.2 Building Candidate SelectionThe next part of our work focused on automatically selecting buildings that exhib-ited the interesting histogram peaks we had observed. After calculating the energytransitions, we used the density [104] function from R programming language[105] to find a smooth probability density function which estimates how frequentlyeach change occurs.2 The probability density function is similar to a smoothed ver-sion of a histogram and allowed us to further analyze the consumption change data.We found peaks in the smoothed probability density function using a simplealgorithm that searches for instances where the discrete derivative of the functionchanges from positive to negative and keeps only those that are greater than 1%of the largest detected peak. This second step can help to reduce the number ofpeaks we detect which are due to noise, but the method is still very simple anddetects many peaks that are not relevant. We therefore implemented several otheroperations to “clean-up” the list of peaks as follows:1. Merge close peaks. If two peaks were too close together, we merged theminto one peak. This merging behaviour was controlled by a separation dis-tance threshold, so that peaks closer to one another than the threshold werejoined and others were not. The height and location of the new peak waschosen to match those of whichever of the merged peaks had the largestmagnitude.2. Remove insignificant peaks. A second round of thresholding was done,where a peak was considered significant is if its height was greater than somefixed percentage of the largest peak in the energy transition density function.3. Keep only symmetric peak pairs. We kept only peaks that had a “sis-ter” peak that occurred around the same usage change value, but with theopposite sign. This was also controlled by a threshold parameter which de-termined how similar the magnitudes of the peak locations needed to be inorder to call them a symmetric pair. We also removed any peak pairs with2We used a Gaussian smoothing kernel and approximated the density at 512 locations. Thebandwidth (i.e., the standard deviation of the smoothing kernel) was defined as max(0.01,r/75),where r was the range of all but the highest and lowest 0.05% of the energy transitions in our dataset.71energy transitions that were too close to zero, where “too close” was againcontrolled by a distance threshold.For all of these “clean-up” stages, we used thresholds that depended on therange of measured energy transitions and thus were unique for each building.3We then simply categorized buildings by the number of peaks that were de-tected after the peak cleaning procedure. Candidates for further analysis were cho-sen as those with at least one pair of symmetric peaks in their energy transitiondensity function. Thus, any buildings with fewer than 2 peaks were excluded fromfurther analysis.We also observed that it is very common for buildings to have a peak aroundzero, indicating that they often reach a “steady state” where their consumptionchanges very little from one interval to the next. We therefore further refined ourset of candidates for analysis to any buildings having 3 or 5 peaks in their energytransition density function, suggesting a single peak around zero and 1 or 2 peakpairs, respectively. Note that this excludes two important categories of potentialcandidates: a) those with a larger odd number of peaks and b) those with an evennumber of peaks. Since these buildings are somewhat more complex to analyze,we left them out of our final candidate set, but they could be investigated in futurework.4Prevalence of interesting density function behaviourWe investigated which verticals seemed to have the interesting peaks we were look-ing for by manually inspecting their histograms using one year of consumption dataand five sample weeks of the consumption time series throughout that year.5 An3We actually used r, the range of calculated energy transitions between the 0.05% and 99.95%quantiles, to define the thresholds. This was done to reduce the effect of outliers on the choice ofthresholds.4In particular, it would be interesting to investigate buildings with an even number of peaks, sincethey typically do not have a peak near zero. In these buildings, electricity consumption does notstay constant for very long, but instead appears to “jitter” or look “fuzzy”. This may be due to thebehaviour of equipment consuming electricity in the building, but could also be an artifact introducedby the measurement device. If this behaviour is indeed caused by equipment in the building, we maybe able to use additional properties of the energy transition density function (e.g., odd vs. evennumber of peaks) to infer additional information about different buildings.5Although most of the analysis done in this chapter was with data from 2014, these preliminaryobservations were made using data from 2013.7210203040500. 50 100 150Hour of Week (h)Electricity Consumption (kWh)030006000900012000−1.0 −0.5 0.0 0.5 1.0Energy Transition (kWh)Frequency of Occurrence10203040500. 50 100 150Hour of Week (h)Electricity Consumption (kWh)030006000900012000−1.0 −0.5 0.0 0.5 1.0Energy Transition (kWh)Frequency of OccurrenceFigure 5.3: An example of electricity consumption sample weeks and an en-ergy transition histogram used for manual inspectionexample of such a histogram and the five sample weeks for a candidate building isgiven in Fig. 5.3. We inspected various verticals and evaluated them as to whetheror not they were a candidate for further analysis. Buildings could be classified as:a) certainly a candidate, b) possibly a candidate or c) not a candidate. We per-formed this investigation for several verticals that we believed were likely to havethe behaviour we were looking for. We also required that any building we manu-ally inspected had valid data points for at least half the year and that at least onevalid consumption measurement occurred during the five fixed sample weeks. InFig. 5.4, we show the results of this manual inspection. Buildings with no bar indi-cate that we had access to fewer than 50 buildings from that vertical with enoughvalid data points for manual inspection. While we only manually inspected a sub-set of the possible verticals here, the full set of available verticals were analyzed inlater stages of analysis, as described in Section 5.3.Interestingly, our preliminary observations showed that some verticals weremore likely to be candidates for our analysis than others. In particular, small pro-73Figure 5.4: Manual classifications of buildings for whether or not they arecandidates for further analysis based on their histogramsfessional offices, including dentists, physicians, chiropractors, lawyers and accoun-tants, seemed to have a large number of candidates. We suspect that this is becausethese buildings have relatively small total consumption loads, especially duringhours when the business is closed. We believe that a building with a lot of equip-ment that is relatively diverse is more likely to have overlapping consumption ofmany different pieces of equipment, making it difficult to isolate any one item. Insmaller buildings with less equipment, however, a single large appliance could cre-ate spikes in consumption that stand out much more, which is what we believe isoccurring.5.2.3 Spike Extraction and CharacterizationAfter identifying building candidates with peaks in their energy transition densityfunctions, we then used these peaks to attempt to extract spikes from these build-ings’ electricity consumption time series data. While one could visually identifythe presence and approximate amplitude of spikes in building consumption data,this is not practical for thousands of buildings. Instead, using the density func-tions allows us to automatically extract this information, which points us towardsthe buildings in which spikes are likely to exist and predicts their amplitude. By74avoiding the need for visual inspection, our method makes spike extraction froma set of thousands of buildings feasible and allows us to analyze spikes that werepreviously very difficult to extract from large sets of consumption data.We defined a spike in consumption as an increase in consumption from oneinterval to the next that is immediately followed by a decrease in consumptionby approximately the same magnitude. The exact definition of “approximately thesame magnitude” depends on various conditions which are discussed in the follow-ing subsections. It is important to remember that density function peaks indicatethat certain energy transitions are frequent, but do not guarantee the presences ofspikes in the consumption time series. Nonetheless, we have observed that whenbuildings have density functions with multiple peaks, they often have spikes in theirelectricity consumption corresponding to these peaks. We therefore attempted toextract spikes from all the buildings identified as candidates using the energy tran-sition density function analysis. We then further analyzed these spikes wheneverthey were detected. Here we discuss our approaches for spike extraction and char-acterization.ExtractionFor each candidate building selected by the energy transition density function anal-ysis (i.e., those with 3 or 5 peaks), we extracted spikes in consumption as follows:1. Apply median filter. We applied a median filter to the electricity consump-tion data using a window of 5 points, so that each point in the filtered signalis the median of the original point and its two neighbours on each side. Thefiltered signal captures slowly moving patterns in the consumption, such asthe difference in consumption of the building during operating hours ver-sus non-operating hours. We subtracted the median filtered signal from theoriginal data to obtain a residual signal, which we proceeded to analyze.2. Locate “on” transitions. We found spikes by searching for any time inter-vals where the energy transition was within a specified window. The widthof the window was chosen based on the localized distribution of energy tran-sitions around the histogram peaks. We first extracted all the data pointsfrom the energy transition density function in the neighbourhood c± 0.5c,75where c was the centre of the first positive histogram peak (i.e., the positivepeak closest to the center peak) and then calculated the variance of thesedata points. We repeated this process for the data in the window −c±0.5c.After calculating the variance of these two regions of the energy densityfunction, we then found the mean of these two variances, which we calledσ2mean. The width used to define the spike search window was then definedas the w = min(1100σ2mean,0.5c). Any energy transitions that were within thewindow c±w were then labelled as “on” transitions.3. Check for “off” transitions. We then searched for a set of “off” intervals,the time intervals where the change in consumption was within −c±w. Theset of detected spikes was defined as the subset of “on” transition times thatwere immediately followed by one of these “off” transitions.6CharacterizationIf a sufficient number of spikes were detected, we then determined the separationtime between spikes. We hypothesized that this separation time may allow us tounderstand the nature of the equipment producing the spikes. For example, if theseparation times are longer when outdoor temperature is high and shorter whentemperature is low, then we could guess that the equipment associated with thepeaks is related to a heating function because it operates more in cooler temper-atures.7 One could also look at how regular the spikes are (i.e., if they tend tohave the same spacing for extended time periods or if the separations appear tobe random). Regular spikes may suggest that they are caused by a single pieceof equipment that operates on a consistent schedule. Random separations couldbe caused by equipment that operates based on external control factors that are not6There are various other ways we could check for “off” transitions. For example, for each de-tected “on” transition with magnitude u, we could check if the next change in consumption is withinthe range −u± 0.1u. We could also define “off” transitions as those with a change in consumption−cˆ± 0.1cˆ, where cˆ is the magnitude of the first negative peak instead of the first positive one. Al-ternatively, we could define both c and cˆ as the mean of the magnitudes of the positive and negative“sister” peak centers. We leave the exploration of these modifications to future work.7It is also possible that the variation in spacing could be related to some other factor that isindependent of temperature, but happens to create this pattern (e.g., change of building occupancywith season).76consistent in time or by multiple pieces of equipment operating simultaneously andcycling out of sync with one another.While we focused our analysis on the time separation between spikes, manyother properties of the spikes could be analyzed, such as their amplitudes or thetime of day that they occur. This area of exploration is left to future work.5.2.4 Temperature Correlation AnalysisWe chose to investigate the relationships between spike separation time and theoutdoor temperature. For a given building, we used the average daily temperaturemeasured at the nearest weather station.Correlations with temperatureFor several sample buildings, we analyzed the relationship between spike separa-tion time and outdoor temperature as follows:1. Find separation times. For each spike, we defined its separation time asthe time between itself and the previous spike. We excluded any separationtimes that were longer than 24 hours, since such a long separation is morelikely to be cause by missing meter data or undetected spikes than the regularconsumption pattern of the equipment causing the spikes.2. Compare with daily mean temperature. We plotted the separation timesas a function of the daily mean temperature. We found that a significantnumber of outliers affected the results.3. Find daily median separation. It seemed that there was a large concen-tration of similar spike separation times, but there were also many outliers.We believe that this is due to undetected spikes that occurred but were notdetected because of other equipment using electricity in the building. To re-duce the effect of these outliers, we found the median separation times foreach day and plotted these as a function of the daily mean temperature.4. Find weekly median separation. Even the daily median separation ap-peared to be affected by outliers, so next we found the median separation in77each week and plotted it against the average of the seven daily mean tem-peratures over that week. This seemed to provide enough balance betweenexcluding outliers and keeping enough data points to capture outdoor tem-perature variation with changing weather.Based on the analysis above for several sample buildings, we decided to inves-tigate the relationship between weekly median spike separation time and weeklymean temperature for a larger set of buildings. For each building from this largerset, we performed tests as summarized in Fig. 5.5a and described as follows:1. Calculate linear regression. Using R’s lm function [106], we found a uni-variate linear regression between the weekly median spike separation andthe weekly mean temperature from one year of consumption data.2. Apply p-value threshold. We considered any buildings where the p-valueof the regression fit was greater than 0.05 as having no correlation betweenspike separations and temperature.3. Check for sufficiently steep slope. We found the range of slopes for whichthere was a 95% probability that the true slope fell between the upper andlower bound. If this range included a slope of 0, the building was classifiedas having no positive or negative correlation between spike separations andoutdoor temperature because its regression slope was too close to zero.4. Classify as heating or cooling. In buildings with spike separation timesthat were positively correlated with outdoor temperature, we predicted thatthe spikes were caused by equipment responsible for a heating function,since it operates less often in high temperatures. Similarly, we predictedthat spikes with separations that were negatively correlated with temperaturewere caused by equipment responsible for a cooling function.Correlations with heating degree-days and cooling degree-daysIn an extension of the above testing method, we also explored how spike separa-tions are correlated with heating degree-days (HDD) or cooling degree-days (CDD)78weekly medianspike separationsweekly meantemperaturescalculateunivariateregressionfind 95%confidencebounds of slopecheck signof slopenocorrelationno positiveor negativecorrelationheatingcoolingp-value < 0.05 does not include 0p-value > 0.05 includes 0positivenegative(a) Correlations with temperatureweekly medianspike separationsweekly meanheat/cool degreescalculateunivariateregressionfind 95%confidencebounds of slopecheck signof slopeabsent absentpresentabsentp-value < 0.05 does not include 0p-value > 0.05 includes 0negativepositive(b) Correlations with heating or cooling degreesFigure 5.5: Flow charts of spike separation correlation testingrather than with temperature. HDD are defined asHDD = max(0, tset,heat − toutdoor) (5.1)where tset,heat is the temperature below which heating is typically required andtoutdoor is the average outdoor temperature on a given day. Similarly, CDD are de-fined asCDD = max(0, toutdoor− tset,cool) (5.2)where tset,cool is the temperature above which cooling is typically required.8Here, the probable presence of a heating or cooling system is indicated by anegative correlation between spike separation and HDD or CDD, respectively. Forexample, higher HDD means more heating is required, so heating equipment islikely to operate more frequently, reducing the separation time between spikes. IfHDD or CDD are positively correlated with spike separation, the results are moredifficult to interpret. While these positive correlations could indicate the presence8For a more detailed discussion of HDD and CDD, see [107]. Note that they assume tset,heat =tset,cool , but this does not necessarily have to be the case.79of the opposite equipment (e.g., cooling equipment for a positive correlation withHDD), the heating and cooling set points mean that this is not necessarily true. Forthis reason, we chose to simply predict the presence of heating or cooling equip-ment if a negative correlation was detected with the respective variable and assumethe absence of that equipment otherwise. This process is depicted in Fig. 5.5b.By running our analysis on the same set of buildings for both HDD and CDD,we can come up with predictions for both heating and cooling equipment for eachbuilding.9 As in our analysis using outdoor temperature, we could then label build-ings as having heating, cooling or neither.Using HDD and CDD also theoretically allows us to predict the presence ofboth heating and cooling equipment in the same building, although our methodis unlikely to be able to detect both simultaneously. This is because our spikeextraction method, as described in Section 5.2.3, searches for only one size ofconsumption spike, even when there is more than one pair of peaks in the energytransition density function. Thus, the only situation in which one should expect todetect both heating and cooling using our method is if the equipment for both ofthese functions produces consumption spikes that are approximately the same size.5.2.5 Alternative Approaches to Analyzing Consumption SpikesWhile the direction of analysis we chose to explore showed interesting results,there are many other ways one could analyze the energy consumption time seriesdata. Here we mention two such methods and discuss the reasons we chose not topursue these directions in this thesis.Fourier analysisSince the spikes we detect are typically very frequent and appear to have a peri-odicity, one might expect that Fourier analysis of the electricity consumption datawould allow us to extract useful information about the spikes. Unfortunately, ourobservations of the data and an exploration of the Fourier content of the data sug-9Alternatively, one could perform a multivariate regression using both HDD and CDD at the sametime. If neither heating nor cooling was detected, one could then perform two separate univariatecorrelation tests using HDD and CDD individually. Exploring such an implementation and its benefits,if any, is left to future work.80gested otherwise. As an example, Fig. 5.6a and Fig. 5.6b show a sample week andsample day of a building’s consumption data, respectively. We can see in Fig. 5.6bthat frequent spikes occur approximately every 4 hours in the time domain. Forsignals with a period of 4 hours, one would expect to see a large frequency contentat fspike = 14∗3600 ≈ 6.94×10−5 Hz and integer multiples of this frequency. As wecan see from Fig. 5.6c, spikes in frequency content do indeed occur at multiplesof fspike (shown by the red dotted lines), but there are also many other spikes inthe Fourier spectrum that are just as large. Thus, if we had not already seen thetime series data and known to look for frequency content corresponding to a 4 hourperiodic signal, we would not have been able to pick these spikes out as important.We believe that the lack of dominant frequency content corresponding to theobserved spikes may be due to several factors. First, the spikes are relatively smallcompared to the total consumption of the building, which may explain why there isnot much energy in the frequency content associated with them. Second, becausethe spikes are a sharp pattern, they contain a lot of harmonic content. That is,the spikes cannot be described by the fspike Hz frequency alone, but also requiremany integer multiples of fspike to be accurately represented. This may be causinga “spreading out” of the energy content of the spikes over many multiples of fspikeso that we are left with many small peaks in the Fourier spectrum rather than onelarge one.There may also be other factors about the electricity consumption signal thatmake it difficult for us to associate properties of the Fourier spectrum with the ob-served spikes. In any case, we chose not to pursue this area of research further dueto these difficulties. Nevertheless, we believe that Fourier analysis of the electricaldata could hold very valuable information and that further work in this area wouldbe beneficial.Autocorrelation functionOne might expect that if an appliance has a process with a characteristic duration,we may be able to detect this behaviour from the autocorrelation function (ACF) ofthe energy consumption. That is, we can take the energy consumption signal andshift it in time by varying amounts, calculating the correlation between the original810 1 2 3 4 5 6 (days)Electricity consumption (kWh)(a) Electricity consumption: one week0 5 10 15 (hours)Electricity consumption (kWh)(b) Electricity consumption: one day0e+00 1e−04 2e−04 3e−04 4e−04 5e−041e−031e−021e−011e+001e+01Frequency (Hz)Magnitude of Fourier Coefficient(c) Fourier coefficients of electricity consumptionFigure 5.6: Time and Fourier properties of a building with frequent spikesand shifted signals. If there are large correlations for certain shifts, there maybe properties of the signal with durations corresponding to that time shift. Whilethis method appears to be able to identify correlations between days and weeks,we found that it does not help us detect the consumption spikes we are searchingfor. For example, Fig. 5.7 shows the energy consumption of two buildings, alongwith the ACF of these signals (after removing the weekly periodicity and trendusing R’s stl function [108, 109]), which performs seasonal decomposition oftime series. The first building has clearly visible consumption spikes that appearto occur regularly, while the second building does not. In both buildings, we can820 100 200 300 400 500 6000. Interval IndexElectricity Consumption (kWh)(a) Building 1: Consumption0 100 200 300 400 500 6002468Time Interval IndexElectricity Consumption (kWh)(b) Building 2: Consumption0 100 200 300 400 500 600 7000.00.40.8LagACFSeries  y.resid.ts(c) Building 1: ACF0 100 200 300 400 500 600 700−  y.resid.ts(d) Building 2: ACFFigure 5.7: Examples of energy use and their autocorrelation functionssee positive correlations around shifts corresponding to full days (e.g., 96 and 192intervals). It is difficult, however, to attribute any differences between the two ACFplots to the presence or absence of consumption spikes. We can observe that thefirst building does have a small “bump” in its ACF near a shift of 16, correspondingto the approximate separation between spikes of 4 hours, but other variations in theACF make this bump difficult to pick out and make it unclear whether or not it isan important observation. A possible reason that it is difficult to find properties ofthe ACF related to the spikes may be that they are small compared to the energyconsumption each day. Therefore, other correlations like the daily patterns may“drown out” those related to the spikes.Based on our observations here, we chose not to explore the ACF of the elec-tricity consumption time series at this time. We do believe, however, that furtherexplorations in this area may allow one to extract useful information about end useconsumption in buildings and we leave this to future work.83Detection of longer duration eventsWe restrict our work to the detection of spikes that are defined by a consecutiveincrease and then decrease in energy consumption. One could instead consider de-tecting the presence of processes with longer durations so that their “on” and “off”transitions do not occur in consecutive time intervals. While we considered per-forming such an analysis, early observations showed that these types of processeswere far less common that short spikes. For example, the differences in time be-tween each detected “on” transition and the closest “off” transition that follows itare summarized in Fig. 5.8 for 4 sample buildings.10 While the distributions vary,the 15 minute time duration between “on” and “off” transitions is by far the mostcommon in all four cases.We believe that the relatively low number of process durations that are longerthan one interval may have to do with the fact that there are so many appliancesoperating in a building at once. It is possible that the farther in time we get from an“on” event, the less likely we are to find an “off” event with a very closely matchedmagnitude because there is a higher likelihood that other equipment has operatedbetween the “on” and “off” events. This makes longer equipment processes moredifficult to find than brief spikes and may explain why we cannot detect very manyin our sample buildings. It would be interesting to explore this idea further. Itwould also be interesting to investigate whether or not the methods we describehere could be extended to longer processes. We leave both of these items to futurework.10The method of spike detection does not exactly match that in Section 5.2.3 because these obser-vations were made at an early stage in the research process, but the general techniques were similar.840%20%40%0 5 10 15 20 25Estimated duration (hours)Percent of events(a)0%10%20%30%40%0 5 10 15 20 25Estimated duration (hours)Percent of events(b)0.0%2.5%5.0%7.5%10.0%0 5 10 15 20 25Estimated duration (hours)Percent of events(c)0%3%6%9%12%0 5 10 15 20 25Estimated duration (hours)Percent of events(d)Figure 5.8: Examples of the estimated durations of equipment processes infour different buildings5.3 Results and Discussion5.3.1 Number of CandidatesWe begin by summarizing the number of buildings on which our analysis wasperformed. Starting with the original set of approximately 30000 buildings fromthe same geographic region, each one could be rejected from further analysis forany of the following reasons:1. Incorrect data granularity.2. Insufficient number of measurements.3. Lack of significant peaks in the energy transition density function.85Sufficient Num. of WeeksSufficient SpikesHistogram Peaks PresentSufficient MeasurementsCorrect Data GranularityFull Cohort0 25 50 75 100Percent of BuildingsFigure 5.9: Size of sample set after each stage of analysis4. Insufficient number of spikes corresponding to the peaks detected.5. Insufficient number of weeks in which spikes were detected.After rejecting some buildings at each of these 5 stages, we ended up with around4000 buildings. In Fig. 5.9, we show the percentage of buildings from the originalset that remained after each of these five stages. The step where we lose the mostbuildings was during energy transition density function analysis, where we iden-tify which buildings exhibit peaks in their energy transition density function andwhich do not. This drop is to be expected, since we know that not all buildingsdisplay this behaviour. Based on our results, approximately 17.4 % of buildingsfor which sufficient measurements were available displayed these peaks. The nextmost significant drop in the sample was due to buildings that did not have a suf-ficient number of valid measurements in order to be analyzed. While this makesour analysis more difficult because we have fewer buildings to work with, it is notdirectly related to the accuracy of our analysis method. The other drops in the testset are small and are therefore not a concern.5.3.2 Density Function Selection PerformanceTo evaluate the performance of our automatic density function selection method,we chose 200 buildings from the original set and manually inspected their his-tograms. The histograms were created by splitting the range of the measured en-86ergy transitions into 200 equally spaced bins. Not all of the buildings had enoughmeasurements to be considered candidates for analysis (corresponding to loss ofbuildings at the first stage listed in Section 5.3.1), so the total number of histogramswe manually inspected ended up being fewer than 200.For manual inspection, we classified each building for which we could createa histogram as “Candidate”, “Not Candidate” or “Unsure”. We used these obser-vations as our “ground truth” for this analysis, while keeping in mind that visualinspection is not necessarily always correct. Because we did not know what equip-ment was operating in the buildings we were investigating, we did not know thereal ground truth. Our manual observations therefore served as a baseline to testour automatic density function selection method.The confusion matrix in Table 5.1 summarizes the performance of our auto-matic density function selection method, where the “true” answers are from manualinspection. From the set of 200 buildings, we have excluded 9 that were classifiedas “Unsure” using the manual inspection because we could not visually discernwhether or not histogram peaks were present. This, and the fact that not all build-ings in the set of 200 had enough measurements to create a histogram, mean thatthe sum of the values in Table 5.1 is actually 146. Thus, we have also shownin brackets the values in each category as a percent of the total number of casesreported in the table.AnswerDetected Not DetectedPredictionDetected 18 (12.3) 5 (3.4)Not Detected 21 (14.4) 102 (69.9)Table 5.1: Automatic density function selection performanceFrom Table 5.1 we can see that our density function selection method has avery low rate of false positives. Its greatest weakness is that it has a significantnumber of false negatives, where peaks exist but are not detected by the automaticmethod. While we attempted to vary the threshold parameters of the method toimprove its performance, the results in Table 5.1 were the best we were able to87obtain. Thus, we expect that a major change in the peak finding strategy would benecessary to improve its performance significantly.115.3.3 Correlation AnalysisNext, we explored the results of our correlation analysis. In Fig. 5.10 we showthe number of buildings on which we performed the analysis, as well as a break-down of the number of buildings in which we detected a positive, negative or nocorrelation between the median weekly spike separation time and the mean weeklyoutdoor temperature. Note that all of the results in this section refer to correlationswith temperature and not with HDD or CDD. Correlations with HDD and CDD arediscussed in Section 5.3.6To ensure that our correlation analysis technique from Section 5.2.4 was notdetecting correlations where they did not exist, we also ran a control test wherethe temperature was synthetically generated using a normal distribution. Thesetemperature values are not physically realistic, but allow us to create a scenariowhere we know that there is no relationship between the temperature and the spikeseparations. In this control test, we detected positive correlations in approximately2.6% of the buildings and negative correlations in approximately 2.3%. This test isreassuring, since the percent of buildings in which we found correlations with thetrue temperature is significantly higher in both cases.In Fig. 5.11 we show the breakdown of positive, negative and no correlationfor different verticals.12 The absolute number of buildings with each detected cor-relation type is given in Fig. 5.11a. The relative number of each correlation typeas a fraction of the number of buildings of that vertical in given in Fig. 5.11b.Buildings with a positive correlation likely indicate that the spikes are related to aheating function, while negative correlation likely points to a cooling function, andthe plot is labelled as such.Since the original test set did not contain equal numbers of each vertical, we11It is also worth noting that the detection and location of energy transition density function peaksis used not only to identify which buildings are likely to display spikes, but also to predict a spikesize we should search for. This means that improved peak location methods could not only improveour selection of buildings with peaks, but could also allow us to more accurately find spikes in theconsumption time series. We leave such an improvement of the peak finding method to future work.12We show in this figure only verticals for which at least 20 buildings were available.88No CorrelationNegative CorrelationPositive CorrelationAnalyzed Spaces0 25 50 75 100Percent of BuildingsFigure 5.10: Number of correlations detected in analyzed buildingsCoffeeShopBarConvenienceStoreElementaryMiddleSchoolGroceryStoreHotelFastFoodDryCleaningLaundryClothingStoreLiquorStoreRestaurantAccountingOfficeChiropractorOfficeLawOfficePhysicianOfficeBarberShopReligiousWorshipNailSalonVehicleServiceBeautySalonDentistOffice0 250 500 750Number of Buildings0.00 0.25 0.50 0.75 1.00Fraction of BuildingsDetectedCorrelationHeatingCoolingNoneFigure 5.11: Breakdown of detected correlations by verticalshow in Fig. 5.12 the number of buildings for a given vertical with detected cor-relations as a fraction of the number of buildings of that vertical in the originaltest set. Here the fractions do not add up to one (as in the right panel of Fig. 5.11)because not all of the buildings in the original test set had spikes that we could an-alyze. Also note that in Fig. 5.12 the verticals are ordered by fraction of buildingswith heating detected and are thus ordered differently than in Fig. 5.11.It is worthwhile to observe that the set of verticals in the top ten bars of Fig. 5.1189CoffeeShopFastFoodRestaurantGroceryStoreHotelBarDryCleaningLaundryReligiousWorshipElementaryMiddleSchoolConvenienceStoreClothingStoreAccountingOfficeVehicleServiceLawOfficeLiquorStorePhysicianOfficeChiropractorOfficeBeautySalonDentistOfficeBarberShopNailSalon0.0 0.1 0.2 0.3 0.4Fraction of Buildings from Original Test SetDetectedCorrelationHeatingCoolingNoneFigure 5.12: Fraction of each vertical with detected correlationsare almost identical to the set in the top ten bars from Fig. 5.12. The ordering ofthe sets differs in each plot, but the only change of the verticals in the top ten is thatreligious worship buildings in Fig. 5.11 is replaced by liquor stores in Fig. 5.12.We can also see that the top ten verticals in which spikes are detected seem to befrom three categories: small professional offices (dentist, chiropractor, physician,law and accounting), personal service industries (nail salons, beauty salons, barbershops) and vehicle service businesses. All three of these categories fall within theservice industry. They are businesses that tend to offer services as their main sourceof income and are likely to operate during clear opening hours (which may or maynot be regular business hours). Other buildings which do not fall into this categoryand are likely to have very different use patterns include retail stores (includinggrocery stores), schools, religious worship buildings, restaurants and hotels.Further, we observe that there is not enough similarity among buildings wherespikes were detected to use this an indicator of vertical. That is, if we had onlydetected spikes in one or a small number of verticals (e.g., beauty salon, nail salonor barber shop), it might have been reasonable to conclude, “because we detectedspikes in this building’s consumption, it is very likely a beauty salon, nail salon orbarber shop.” Since these are not the observations we made, such a conclusion isnot possible. Nevertheless, consumption spikes may still be helpful in identifyingverticals when combined with other methods of analysis or indicators of vertical.905.3.4 Correlation SamplesTo better understand the behaviour of the buildings with and without correlationsdetected, we took a closer look at a few sample buildings. Here we show an exam-ple of a building from each correlation category (positive, negative and no correla-tion) and discuss general observations that were made for buildings in that category.All of the buildings we tested were located in the northern hemisphere, so theirwinter months are approximately October to March. Again, we note that all of theresults shown here are for correlations between spike separation and temperature,not HDD or CDD.Positive correlation (heating)In Fig. 5.13 we show six sample weeks of electricity consumption from a year fora building where positive correlation was detected.13 We highlight spikes usingblue for the “on” transition and red for the “off” transition. In Fig. 5.14, we showsome plots which summarize the building, including plots of the mean weeklytemperature and the median weekly spike separation versus time, a plot of weeklyseparation versus temperature and part of the histogram of the energy transitionsfor the year. In the histogram, we have left out the center portion, defined as anyenergy transitions with a magnitude that is less than half of c, the location of thefirst histogram peak. This was done because buildings often have a very tall centerpeak, which makes it difficult to see the shape and size of smaller peaks.We can make the following observations about this specific building:• The consumption spikes are easily visible. They are detected often, but notalways, by our automatic search method.• This building does not have a strong underlying consumption pattern (e.g.,on for 9 hours Monday to Friday). This may be why extracting spikes wasparticularly easy in this case.• We can see in Fig. 5.13 that the spikes are more closely spaced in the earlyand late weeks of the year (when outdoor temperatures are lower).13Week 1 is the first week of January, according to ISO 8601 [110].91 816243240480. 50 100 150Hour of Week (h)Electricity Consumption (kWh)TransitionNoneOnOffFigure 5.13: Sample weeks of a building with a positive correlation detected• There is a clear correlation between the spike separation and outdoor tem-perature, as seen in the bottom left pane of Fig. 5.14. A linear fit, however,may not be the best way to model this relationship.• The histogram has very sharp peaks around 0.2 kWh. These sharp peaks sug-gest that spikes are likely to be consistently sized and easy to automaticallyidentify, which is indeed the case.After observing sample weeks and regression plots for several other buildings92Jan Mar May Jul Sep Nov Jan81 01 21 41 61 82 0TimeWeek ly  Me an  o f Da il y  T emp er at ur es  ( C)Jan Mar May Jul Sep Nov Jan3 005 007 009 00TimeMe di an  S pi k e Se pa rat i on  T ime  ( mi n )8 10 12 14 16 18 203 005 007 009 00Weekly Mean of Daily Temperatures (C)Me di an  S pi k e Se pa rat i on  T ime  ( mi n )Energy Transition (kWh)F re qu en cy  o f Oc cu rr en ce  i n On e Year−1.0 −0.5 0.0 0.5 1.005 01 502 50Figure 5.14: Summary plots of a building with a positive correlation detectedwith positive correlations, we can also make the following general observationsabout buildings with this correlation type:• The spikes in the buildings are typically very easy to pick out visually andthe spike detection method appears to do a good job of identifying them.• These buildings often have separations between peaks that varies continu-ously with temperature (as opposed to taking on a few discrete spacings).• The variation in separation is not always noticeable enough to see in the sam-ple weeks, but the correlation can generally be seen clearly in the summaryplots.• Histograms for these buildings usually have very clear and sharp peaks.93Negative correlation (cooling)For a sample building with a negative correlation detected, we show sample weeksof electricity consumption in Fig. 5.15 and summary plots in Fig. 5.16. For thisspecific building, we can make the following observations:• The spikes are usually quite large (about 0.8 kWh), and are easy to visuallyidentify during the winter months (weeks 8, 16 and 48).• There is also a large variation in the size of spikes being detected.• During the summer months (weeks 24, 32 and 40), it seems as if there are twodifferent sizes of spikes being detected by our automatic method. The largerones appear to match those that occur during winter months. The smallerones look like they typically only occur during the summer and are muchmore closely spaced. This suggests that our detection method is detectingmultiple sets of equipment, one of which operates all year and one of whichonly operates in the summer months.• The summary plots suggest that the building tends to be in one of two“states” with two distinct spike separations, rather than having spike sep-arations that vary continuously with temperature.• The histogram shows that the peaks are much less sharp than for the positivecorrelation example. In fact, they appear to be double peaks, with localmaxima around 0.6 kWh and 0.8 kWh. This is consistent with our earlierobservations.After observing several other buildings with negative correlations, we can alsomake the following observations about buildings with this correlation type:• Having what seems to be an additional piece of equipment operating in thesummer months is a common pattern for buildings in which negative correla-tion is detected, but this pattern was not always observed for these buildings.• No samples we looked at showed a clear visual change in separation of thespikes from one week to the next, as was observed for the positive correlationcases.94 816243240480. 50 100 150Hour of Week (h)Electricity Consumption (kWh)TransitionNoneOnOffFigure 5.15: Sample weeks of a building with a negative correlation detected• Several of the samples we looked at appeared to have behaviour that may bemore closely correlated with CDD than with temperature.No correlationFor a sample building with no correlation detected, we show sample weeks of elec-tricity consumption in Fig. 5.17 and summary plots in Fig. 5.18. For this building,we can make the following observations:• It is difficult to tell if the spikes detected are related to one another (i.e.,95Jan Mar May Jul Sep Nov Jan51 01 52 02 53 0TimeWeek ly  Me an  o f Da il y  T emp er at ur es  ( C)Jan Mar May Jul Sep Nov Jan2 004 006 008 00TimeMe di an  S pi k e Se pa rat i on  T ime  ( mi n )5 10 15 20 25 302 004 006 008 00Weekly Mean of Daily Temperatures (C)Me di an  S pi k e Se pa rat i on  T ime  ( mi n )Energy Transition (kWh)F re qu en cy  o f Oc cu rr en ce  i n On e Year−1.5 −1.0 −0.5 0.0 0.5 1.0 1.505 01 001 50Figure 5.16: Summary plots of a building with a negative correlation detectedcaused by the same piece of equipment). They are sometimes easy to see(e.g., hours 125 to 140 of week 40) and sometimes difficult to see (e.g.,hours 75 to 90 of week 40).• As expected, Fig. 5.18 shows essentially no relationship between the spikeseparations and the temperatures.• The histogram peaks in this case are sharper than for the negatively corre-lated example but not as sharp as for the positively correlated example.After observing several other buildings with no correlations, we make the fol-lowing additional observations about these buildings:• The set of buildings had a wide variety of consumption patterns, but in gen-eral tend to be more variable than the buildings with positive or negative96 816243240480. 50 100 150Hour of Week (h)Electricity Consumption (kWh)TransitionNoneOnOffFigure 5.17: Sample weeks of a building with no correlation detectedcorrelations.• Whereas buildings with positive or negative correlations often have spikesthat are easy to see, the uncorrelated buildings do not seem to have spikesthat are easy to separate from the remainder of the signal, whether visuallyor automatically.• The histograms for these buildings often had peaks that were much widerthan when other correlation types were detected.97Jan Mar May Jul Sep Nov Jan51 01 52 02 53 0TimeWeek ly  Me an  o f Da il y  T emp er at ur es  ( C)Jan Mar May Jul Sep Nov Jan2 004 006 008 00TimeMe di an  S pi k e Se pa rat i on  T ime  ( mi n )5 10 15 20 25 302 004 006 008 00Weekly Mean of Daily Temperatures (C)Me di an  S pi k e Se pa rat i on  T ime  ( mi n )Energy Transition (kWh)F re qu en cy  o f Oc cu rr en ce  i n On e Year−1.0 −0.5 0.0 0.5 1.0 1.501 002 003 00Figure 5.18: Summary plots of a building with no correlation detectedDiscussionOf the observations made here, we highlight and discuss two of the most interest-ing. First, we highlight the common observation of a “two-state” behaviour forbuildings where negative correlation was detected. It is not surprising that a pieceof equipment used for cooling would only operate in the warmer months and beshut off completely in the cooler months. It is interesting, however, to note thatthis behaviour was not observed in buildings with positive correlations detected.Instead, positive correlations between spike separation and temperature appearedto be more linear, suggesting that whatever equipment caused the spikes varies itsbehaviour continuously throughout the year and is never shut off completely.Another observation we wish to highlight is that buildings in which no cor-relation was detected tend to have more variation in their consumption (i.e., the98signal appears more “fuzzy”). This is not surprising, since our method is lookingfor spikes in energy consumption of a very specific size and is likely to have diffi-culty detecting this behaviour in very noisy electricity consumption time series.14It is possible that these buildings simply don’t have equipment that produces spikesvery often, but we suspect that this is not always the case. We believe that in somebuildings our method simply has difficulty detecting the spikes amongst the noise.If this is the case, then missed spikes that should have been detected will affect thespike separation times and could make it more difficult to detect correlations be-tween spike separations and temperature. This observation also suggests that moreaccurate and precise ways of detecting spikes in the noisy electricity consumptionsignal could improve our analysis. We again leave such an extension to futurework.5.3.5 Comparison with an Alternative MethodWhile we did not have a ground truth list of buildings that are known to use elec-tric or gas heating or cooling, we did have a second method with which we couldcompare our results. Though neither method is guaranteed to be accurately pre-dicting the presence of heating and cooling systems in buildings, they can supportone another’s predictions if they both give similar results.The second method, which we refer to as the “alternative method”, is a confi-dential proprietary disaggregation technique owned by the partner organization weworked with during the Mitacs internship and is designed to predict the presenceor absence of heating and cooling systems in a building. The results we have avail-able using this method are for a subset of the original cohort of buildings using datafrom September 1, 2013 to September 1, 2014.15One of the main challenges in comparing our results with those from the al-ternative method is that each gives slightly different outputs. For the alternativemethod, we can classify a building as having heating, cooling, neither or both forelectricity and gas. This means there are 16 possible categories a building might14We consider “noise” to be any electricity consumption other than that caused by the equipmentwe are trying to detect.15While these dates to not match the data we used for the correlation analysis exactly, the timeperiods are mostly overlapping.99fall into. Our correlation analysis looks only at electricity data, however, and cannot always predict the presence of both heating and cooling simultaneously.16 Onemight expect that we therefore only have the capability to detect electric equipment,but it is possible that the spikes in electricity data are actually related to some partof equipment that consumes mostly gas (e.g., a fan for a gas heater). Thus, wecannot directly compare the two-dimensional predictions of the alternative methodwith our one-dimensional correlation results.Further, the alternative method results are not available for all the buildings inour original test set, so the number of buildings we can use for this comparisonis only 1267, which is notably smaller than the number of buildings on whichwe ran our correlation analysis (see Section 5.3.1). Nevertheless, we perform thecomparison in an attempt to gain insight about our correlation method.We note as well that the evaluation of our method could be greatly improvedif we knew what the buildings actually used for heating and cooling. Until we canobtain such a set, either by asking building owners, physically inspecting propertiesor simulating buildings with various types of heating and cooling, we cannot knowfor sure how well either method predicts the presence of heating or cooling. Inthe following subsections, we perform what analysis we can using the informationavailable and leave validation using a real ground truth data set to future work. Wenote once more that the results in this section are for correlations between spikeseparation and temperature only. Similar comparisons using heating and CDD aregiven in Section 5.3.6.Relationship between correlations and equipment typeFig. 5.19a shows a tile plot of the alternative method results from all the buildingsfor which our correlation analysis could also be run. The tile plot is like a twodimensional bar plot, where the number on each tile indicates the percent of build-ings that fall into that “bin”. We can see here that the most common equipmentcombination for these buildings is electric cooling and gas heating.We also took the subset of buildings from Fig. 5.19a in which a positive cor-16We can detect both heating and cooling when using HDD and CDD, as discussed in Section 5.2.4,but not using outdoor temperature. The alternative method always uses HDD and CDD and can there-fore detect both heating and cooling simultaneously.1005.371.036.550.325.760.473.630.0815.392.9232.520.391.0311.050.4712.790.24No DataNeitherHeatingCoolingBothNo DataNeitherHeatingCoolingBothElectricity DetectedGas DetectedTotal Buildings = 1267(a) All analyzed buildings4.780.967.650.766.310.762.6813.961.9133.080.380.3810.710.1915.49No DataNeitherHeatingCoolingBothNo DataNeitherHeatingCoolingBothElectricity DetectedGas DetectedTotal Buildings = 523(b) Positive correlation with temperature2.580.744.432.951.8514.763.6938.010.372.9512.550.7413.650.74No DataNeitherHeatingCoolingBothNo DataNeitherHeatingCoolingBothElectricity DetectedGas DetectedTotal Buildings = 271(c) Negative correlation with temperatureFigure 5.19: Tile plots comparing our method with the alternative methodrelation between spike separation and temperature was detected and created a newtile plot from these buildings in Fig. 5.19b. We expect that if positive correlationis related to any one combination of heating and cooling equipment, the propor-tion of buildings with this combination should be notably larger than in Fig. 5.19a.Similarly, we took the subset of buildings from Fig. 5.19a in which a negative cor-relation between spike separation and temperature was detected and created a newtile plot from these buildings in Fig. 5.19c.From Fig. 5.19, we can see that the plots in a, b and c are all very similar.These results suggest that detecting buildings with positive or negative correlationbetween spike separation and temperature does not appear to be related to any101specific equipment combinations (assuming that the alternative method predictionsare indeed accurate).Confusion matricesAs another comparison, we created confusion matrices comparing the alternativemethod’s results for both gas and electric equipment with our results. In Table 5.2,we show how often each equipment type was detected by the alternative methodand our own method. Again, the number in brackets is the percent of the totalnumber of buildings with data for that resource.Temperature PercentCorrelation Prediction AgreementDetected Not DetectedElec. HeatingDetected 189 (14.9) 248 (20.0) 54.1Not Detected 334 (26.4) 496 (39.1)Elec. CoolingDetected 237 (18.7) 736 (58.1) 39.2Not Detected 34 (2.7) 260 (20.5)Gas HeatingDetected 314 (39.7) 410 (51.8) 45.4Not Detected 22 (2.8) 45 (5.7)Gas CoolingDetected 11 (1.4) 15 (1.9) 76.5Not Detected 171 (21.6) 594 (75.1)Table 5.2: Comparing predictions by our method and the alternative methodWe can see that there is not much agreement between the methods. Agreementis highest for gas cooling and electric heating, but this is primarily due to largenumbers of true negatives. Indeed, the highest number of true positives for anyequipment type is for gas heating, where 39.7% of the time the methods both detectheating. The overall agreement, defined aspercent agreement =true positives + true negativestotal buildings, (5.3)is largest for gas cooling, but this metric can be deceiving. The high reportedagreement for gas cooling is due to the large number of true negatives we men-tioned above, which says little about how well our method can detect the presence102of equipment.175.3.6 Correlations with Heating and Cooling Degree-DaysAs discussed in Section 5.2.4, we can also investigate the correlation between spikeproperties and HDD and CDD instead of outdoor temperature. We performed thisanalysis using tset,heat = 15 ◦C and tset,cool = 18 ◦C. The analysis was run on all ofthe approximately 4000 buildings for which spikes were detected from the originaltest set. Of these, 5 buildings did not have enough variation in either that year’sHDD or CDD to calculate correlations, so the results we report here are for theremaining buildings only.In Table 5.3 we show a confusion matrix depicting how often using outdoortemperature for our analysis agrees with using HDD and CDD as a percent of thenumber of buildings on which the analysis was run for both variables.18 We cansee that there was most often agreement between the two methods. We also see thatusing HDD or CDD results in fewer detections of correlation as compared to withusing temperature. This suggests that the spikes we are detecting may sometimesbe caused by an appliance that does not have a set point and truly does correlatewith outdoor temperature and not HDD or CDD. It is also possible that correlationsare not detected with HDD and CDD because the set point we used does not corre-spond to the true set point of the equipment creating the spikes. For example, thespikes could be caused by an electric water heater. This type of equipment wouldhave a much higher set point than the 15 ◦C we assumed for HDD and may havea more complex relationship with temperature than a space heater. Investigatingcharacteristics of buildings where spikes are correlated with temperature but notHDD or CDD is an interesting direction for future work.While Table 5.3 highlights the differences between using outdoor temperatureand HDD or CDD, it is also interesting to compare the results from HDD and CDDwith the alternative method’s results. We have therefore provided Table 5.4, anextended version of Table 5.2 that includes confusion matrices for our results using17In fact, because gas cooling is known to be very uncommon in this geographical region, it isbelieved that results from the alternative method that predict gas cooling are incorrect.18We also broke down the detected correlations by vertical using HDD and CDD (as in Sec-tion 5.3.3), but have omitted these figures because they are very similar to Fig. 5.10, Fig. 5.11 andFig. 5.12.103TemperatureDetected Not DetectedHDDDetected 31.0 1.3Not Detected 6.0 61.8CDDDetected 9.5 0.8Not Detected 6.1 83.6Table 5.3: Comparing our method’s predictions using temperature vs. heat-ing and cooling degree-daysHDD and CDD. Note that the total number of buildings when using HDD and CDDwas 1266, which is one less than when using temperature. This discrepancy is dueto the fact that 1 of the 5 buildings with too little variation in HDD and CDD was inthe original 1267 buildings and was removed.Temperature Percent HDD and CDD PercentCorrelation Prediction Agreement Correlation Prediction AgreementDetected Not Detected Detected Not DetectedElec. HeatingDetected 189 (14.9) 248 (20.0) 54.1 170 (13.4) 266 (21.0) 55.9Not Detected 334 (26.4) 496 (39.1) 292 (23.1) 538 (42.5)Elec. CoolingDetected 237 (18.7) 736 (58.1) 39.2 171 (13.5) 801 (63.3) 35.5Not Detected 34 (2.7) 260 (20.5) 16 (1.3) 278 (22.0)Gas HeatingDetected 314 (39.7) 410 (51.8) 45.4 280 (35.4) 443 (56.1) 41.2Not Detected 22 (2.8) 45 (5.7) 21 (2.7) 46 (5.8)Gas CoolingDetected 11 (1.4) 15 (1.9) 76.5 6 (0.8) 20 (2.5) 82.1Not Detected 171 (21.6) 594 (75.1) 122 (15.4) 642 (81.3)Table 5.4: Comparing predictions by our method and the alternative methodusing temperature or heating and cooling degree-daysWe can see in Table 5.4 that using HDD and CDD tends to result in fewer pos-itives (both true and false) and more negatives (both true and false). The overallaccuracy of the method increases for electric heating and gas cooling, but decreasesfor electric cooling and gas heating. Interestingly, most buildings in this geographi-cal area have electric cooling and gas heating. This suggests that our overall agree-ment metric may be influenced by the number of buildings in the set with each104equipment type.We can also compare the HDD and CDD results with the alternative method bycreating new tile plots for this analysis similar to those in Fig. 5.19. These new plotsare shown in Fig. 5.20d and Fig. 5.20e, where Fig. 5.20a through Fig. 5.20c areidentical to those in Fig. 5.19. Fig. 5.20d and Fig. 5.20e again show that there arefewer correlations detected with HDD and CDD than with temperature. We can alsosee that using HDD and CDD results in similar distributions of equipment predictionsthan when temperature is used.5.3.7 Share of Energy Consumption Due to SpikesWe also investigated how much of a building’s total energy consumption couldbe attributed to the detected consumption spikes. This is important to help usunderstand the significance of the spikes and whether or not they should be targetedfor energy savings strategies. We estimated the energy consumption due to thespikes in two ways:1. Detected spikes: We took the sum of the amplitude of all detected spikes.We expected that this is would underestimate the energy consumption asso-ciated with the spikes because our method does not necessarily detect all thespikes produced by the equipment creating them.2. Estimated spikes: For each week, we took the mean amplitude of the de-tected spikes and multiplied it by the number of spikes there would be ifthey repeated at regular intervals according to the median weekly separationdistance. We then summed up the amplitudes of all these estimated spikes.We expected this method would overestimate the energy consumption asso-ciated with the spikes because it will include spikes where they sometimesdo not exist. With that said, we also believe that this estimation is likelycloser to the true energy associated with the spikes, since we believe it ismore probable that some spikes were not detected rather than not presentaltogether.Using these two methods, we can get approximate upper and lower bounds for theenergy consumed in a building that is related to the consumption spikes. We ran1055.371.036.550.325.760.473.630.0815.392.9232.520.391.0311.050.4712.790.24No DataNeitherHeatingCoolingBothNo DataNeitherHeatingCoolingBothElectricity DetectedGas DetectedTotal Buildings = 1267(a) All analyzed buildings4.780.967.650.766.310.762.6813.961.9133.080.380.3810.710.1915.49No DataNeitherHeatingCoolingBothNo DataNeitherHeatingCoolingBothElectricity DetectedGas DetectedTotal Buildings = 523(b) Positive correlation with temperature2.580.744.432.951.8514.763.6938.010.372.9512.550.7413.650.74No DataNeitherHeatingCoolingBothNo DataNeitherHeatingCoolingBothElectricity DetectedGas DetectedTotal Buildings = 271(c) Negative correlation with temperature4.551.37.140.876.060.872.8113.421.7333.550.430.2210.820.2216.02No DataNeitherHeatingCoolingBothNo DataNeitherHeatingCoolingBothElectricity DetectedGas DetectedTotal Buildings = 462(d) Negative correlation with HDD2.671. DataNeitherHeatingCoolingBothNo DataNeitherHeatingCoolingBothElectricity DetectedGas DetectedTotal Buildings = 187(e) Negative correlation with CDDFigure 5.20: Tile plots comparing our method with the alternative method,including plots for the use of heating and cooling degree-days106Figure 5.21: Share of building energy consumption attributed to spikesthis calculation on the set of approximately 4000 buildings on which the correlationanalysis was run (see Section 5.3.1). Fig. 5.21 shows the distribution of energyshares using both estimation methods. As expected, the estimated spikes methodpredicts higher shares than the detected spikes method in general.Here, we assume that it is worthwhile to investigate and suggest energy savingstrategies to a consumer for any equipment that is responsible for at least 5% of abuilding’s total energy consumption. Our results suggest that spikes are responsiblefor at least 5% of total energy consumption in 15.2% to 43.1% of buildings thathave these spikes. These bounds are based on the detected and estimated spikemethods, respectively. So, while energy saving recommendations related to thespikes will not have a big impact in all the buildings, there is a significant fractionof buildings in which such measures could be useful.We also investigated the distribution of the energy shares of spikes for the tenverticals with the most buildings in our test set. The energy share of the detectedand estimated spikes for these verticals are given in Fig. 5.22.19 We can see thatthe distributions are similar for most verticals. Interestingly, the prediction usingdetected spikes is notably smaller than using estimated spikes for vehicle service19We show only the energy share values up to 20% for clarity, although there are a few that exceed20%.107Figure 5.22: Share of building energy consumption attributed to spikes forthe ten verticals with the most buildings containing spikesbusinesses. We can also see that dentist offices tend to have smaller shares ofenergy associated with their spikes than most buildings and beauty salons tend tohave larger shares. This suggests that understanding what equipment causes theconsumption spikes and suggesting energy saving strategies for them may be moreuseful for beauty salons and less useful for dentist offices.1085.4 SummaryIn this chapter, we developed a method to detect frequently occurring changesin electricity consumption (i.e., “spikes”) and attempted to understand patterns inthese spikes. We investigated how the energy spikes relate to outdoor temperatureand found that their separation times are sometimes correlated with temperature,HDD and/or CDD. This very interesting finding suggests that the equipment respon-sible for the spikes serves some function that depends on outdoor air temperature,such as space heating or cooling. We also explored what share of total buildingenergy consumption that the spikes are responsible for and showed that in manybuildings this share is large enough to be considered for energy savings strategies.109Chapter 6ConclusionsTo conclude this thesis, we summarize our main findings and suggest directions forfuture work in the following sections.6.1 ConclusionsIn this thesis we explored two ways to improve the feasibility of NILM. First, wefocused on very-high-rate data and investigated how compressed sensing using asingle sampling channel can be used to intelligently acquire and manage electricaldata for NILM. Second, we turned to very-low-rate energy data and developed amethod to detect and analyze frequently occurring energy consumption behaviours.The main conclusions we draw from our work are:• Single-channel randomized sampling techniques can be used to effectivelyacquire very-high-rate measurements of electrical signals for NILM. Withthis solution, the signals can be sampled at rates lower than the Nyquist rateand less hardware is required to capture them compared to existing methods.In addition, this solution efficiently compresses the electrical data, making iteasier to transmit, receive, store and manage.• While individual load waveforms can be used as a sparsity basis with thesingle-channel randomized sampling techniques, learned dictionaries arealso effective at permitting sparse representations of electrical signals for110NILM. Learned dictionaries may be particularly useful when the individualload waveforms are not available.• Energy transitions in very-low-rate energy consumption data can permit theextraction and characterization of frequent energy consumption patterns.Links were made between these patterns, which we called “spikes”, and out-door temperature. Thus, our method demonstrates a new way to identifypotential heating and cooling equipment from very-low-rate energy data.Further, we showed how the total energy consumption of these spikes can beestimated and that these patterns are responsible for a large enough shareof buildings’ typical energy consumption that they are worth pursuing forenergy saving measures.In demonstrating these outcomes, our work has contributed in two separate waysto the facilitation of widespread NILM use. Thanks to the many ways that NILMhelps people understand energy consumption, it is our hope that these efforts willcontribute to more informed energy management decisions in all sectors.6.2 Opportunities for Future WorkBased on the work performed for this thesis, we believe there are many valuableways in which the research presented here could be extended. In particular, thefollowing extensions would be particularly interesting:1. Exploring the performance and feasibility of learning dictionaries fromthe CS measurements. Given the success of the offline dictionary learningwe applied for very-high-rate NILM data, exploring other dictionary learningimplementations would be an interesting extension of this work. In par-ticular, learning dictionaries directly from CS measurements [101, 102], asdiscussed in Section 4.2.2, would be very practical for this application. Suchan implementation would eliminate the need for load waveforms or high-ratetraining signals, both of which can be difficult to obtain.2. Investigating the sensitivity of the system to voltage frequency variationand errors in delay times. In our work, we assumed a scenario where111the pointwise difference between current waveforms can be calculated. In-vestigating to what extent small variations in the frequency of the voltagesupply or errors in time delays affect these pointwise differences in currentwaveforms and the overall system performance is another interesting area ofresearch.3. Investigating the influence of external factors other than outdoor tem-perature on consumption spikes in very-low-rate electrical data. Themethod we developed shows how spikes in very-low-rate energy data canbe correlated with temperature, but it is possible that these spikes or otherpatterns in the data are related to factors other than temperature (e.g., occu-pancy, daylight). Additional analysis could be done using the same methodsas we have described here, but exploring relationships between spikes andvarious other external factors.4. Exploring other methods to search for spikes in very-low-rate electri-cal consumption data. We used a simple and effective method of findingspikes, but it would be interesting to compare this method’s performancewith other methods. For example, we could look for a correlation betweensome “search shape” and the signal, similarly to the method used in [58]for more granular energy data. We could slide the “search shape” along theconsumption time series and it would evaluate to a high value when the twomatch closely.112Bibliography[1] G. Hart, “Nonintrusive appliance load monitoring,” Proc. IEEE, vol. 80,pp. 1870–1891, Dec. 1992. → pages ix, 2, 7, 8, 12, 13, 14, 21, 34[2] IPCC, “2014: Summary for policymakers,” in Climate Change 2014,Mitigation of Climate Change: Contribution of Working Group III to theFifth Assessment Report of the Intergovernmental Panel on ClimateChange, O. Edenhofer, R. Pichs-Madruga, Y. Sokona, E. Farahani,S. Kadner, K. Seyboth, A. Adler, I. Baum, S. Brunner, P. Eickemeier,B. Kriemann, J. Savolainen, S. Schlo¨mer, C. von Stechow, T. Zwickel, andJ. Minx, Eds. Cambridge, United Kingdom and New York, NY:Cambridge University Press, 2014, pp. 1–32. → pages 1[3] United Nations Framework Convention on Climate Change (UNFCCC),“Part two: Action taken by the conference of the parties at its sixteenthsession,” in Report of the Conference of the Parties on its sixteenth session,held in Cancun from 29 November to 10 December 2010, Cancun, Mexico,Mar. 2011, Report FCCC/CP/2010/7/Add.1, Decision 1/CP.16. → pages 1[4] UNEP 2014, “Emissions gap report executive summary,” United NationsEnvironment Programme (UNEP), Nairobi, Kenya, Nov. 2014. → pages 1[5] J. Gummer, D. Kennedy, S. Fankhauser, B. Hoskins, P. Johnson, J. King,J. Krebs, R. May, and J. Skea, “Fourth carbon budget review – technicalreport: Sectoral analysis of the cost-effective path to the 2050 target,”Committee on Climate Change (CCC), London, United Kingdom, Dec.2013, ch. 3. → pages 1[6] M. E. Berges, E. Goldman, H. S. Matthews, and L. Soibelman, “Enhancingelectricity audits in residential buildings with nonintrusive loadmonitoring,” J. Ind. Ecology, vol. 14, no. 5, pp. 844–858, Oct. 2010. →pages 2113[7] K. C. Armel, A. Gupta, G. Shrimali, and A. Albert, “Is disaggregation theholy grail of energy efficiency? The case of electricity,” Energy Policy,vol. 52, pp. 213–234, Jan. 2013. → pages 2, 3, 4, 7, 8, 9, 11, 21, 22, 23, 24,25, 27[8] M. Chetty, D. Tran, and R. E. Grinter, “Getting to green: Understandingresource consumption in the home,” in Proc. 10th ACM Int. Conf.Ubiquitous Computing, Seoul, Korea, 2008, pp. 242–251. → pages 2[9] T. Ueno, F. Sano, O. Saeki, and K. Tsuji, “Effectiveness of anenergy-consumption information system on energy savings in residentialhouses based on monitored data,” Appl. Energy, vol. 83, no. 2, pp. 166 –183, 2006. → pages 3[10] BC Hydro. (2015, Aug.) Let’s be smart with our power. [Online].Available: https://www.bchydro.com/powersmart.html → pages 3[11] D. Huang, M. Thottan, and F. Feather, “Designing customized energyservices based on disaggregation of heating usage,” in IEEE Power EnergySoc. Innovative Smart Grid Technologies, Washington, DC, Feb. 2013. →pages 3, 12, 13, 31, 33[12] J. H. Scofield, “Do LEED-certified buildings save energy? Not really,”Energy and Buildings, vol. 41, no. 12, pp. 1386 – 1390, 2009. → pages 3[13] M. Pipattanasomporn, M. Kuzlu, S. Rahman, and Y. Teklu, “Load profilesof selected major household appliances and their demand responseopportunities,” IEEE Trans. Smart Grid, vol. 5, pp. 742–750, Mar. 2014.→ pages 3[14] H. Kim, M. Marwah, M. Arlitt, G. Lyon, and J. Han, “Unsuperviseddisaggregation of low frequency power measurements,” in Proc. SIAM Int.Conf. Data Mining, Mesa, AZ, 2011, pp. 747–758. → pages 4, 8, 12, 14,21, 22[15] Z. Kolter and T. Jaakkola, “Approximate inference in additive factorialHMMs with application to energy disaggregation,” in Int. Conf. ArtificialIntell. Stat., La Palma, Canary Islands, Apr. 2012, pp. 1472–1482. → pages4, 8, 12, 15, 21[16] O. Parson, S. Ghosh, M. Weal, and A. Rogers, “Non-intrusive loadmonitoring using prior models of general appliance types,” in Proc. Conf.Artificial Intell., Toronto, Canada, Jul. 2012, pp. 356–362. → pages 4, 8,12, 15, 21114[17] A. Zoha, A. Gluhak, M. Nati, and M. Imran, “Low-power appliancemonitoring using factorial hidden Markov models,” in IEEE Int. Conf.Intelligent Sensors, Sensor Networks, Inform. Process., Melbourne,Australia, Apr. 2013, pp. 527–532. → pages 4, 12, 15[18] D. Egarter, V. Bhuvana, and W. Elmenreich, “PALDi: Online loaddisaggregation via particle filtering,” IEEE Trans. Instrum. Meas., vol. 64,pp. 467–477, Feb. 2015. → pages 4, 12, 15, 21[19] J. Liao, G. Elafoudi, L. Stankovic, and V. Stankovic, “Non-intrusiveappliance load monitoring using low-resolution smart meter data,” in IEEEInt. Conf. Smart Grid Commun., Venice, Italy, Nov. 2014, pp. 541–546. →pages 4, 12, 15, 21[20] A. G. Ruzzelli, C. Nicolas, A. Schoofs, and G. M. P. O’Hare, “Real-timerecognition and profiling of appliances through a single electricity sensor,”in IEEE Commun. Soc. Conf. Sensor Mesh Ad Hoc Commun. Networks,Boston, MA, Jun. 2010. → pages 4, 12, 15, 48[21] J. Kelly, “Disaggregating smart meter readings using device signatures,”Master’s thesis, Imperial College London, London, United Kingdom, Sep.2011. → pages 4, 12, 15, 21, 34, 35[22] G. Tang, K. Wu, J. Lei, and J. Tang, “A simple model-driven approach toenergy disaggregation,” in IEEE Int. Conf. Smart Grid Commun., Venice,Italy, Nov. 2014, pp. 572–577. → pages 4, 10, 12, 15, 21, 22, 28[23] J. Kelly and W. J. Knottenbelt, “The UK-DALE dataset, domesticappliance-level electricity demand and whole-house demand from five UKhomes,” Scientific Data, vol. 2, no. 150007, 2015. → pages 4, 23, 27, 41[24] J. Tropp, M. Wakin, M. Duarte, D. Baron, and R. Baraniuk, “Randomfilters for compressive sampling and reconstruction,” in IEEE Int. Conf.Acoustics, Speech and Signal Process., vol. 3, Toulouse, France, May 2006,pp. 872–875. → pages 5, 30, 37[25] S. Kirolos, J. Laska, M. Wakin, M. Duarte, D. Baron, T. Ragheb,Y. Massoud, and R. Baraniuk, “Analog-to-information conversion viarandom demodulation,” in IEEE Dallas/CAS Workshop Design, Applicat.,Integration Software, Richardson, TX, Oct. 2006, pp. 71–74. → pages 5,30, 38115[26] M. Zeifman and K. Roth, “Nonintrusive appliance load monitoring:Review and outlook,” IEEE Trans. Consum. Electron., vol. 57, pp. 76–84,Feb. 2011. → pages 7, 8, 22, 27[27] A. Zoha, A. Gluhak, M. A. Imran, and S. Rajasegarar, “Non-intrusive loadmonitoring approaches for disaggregated energy sensing: A survey,”Sensors, vol. 12, no. 12, pp. 16 838–16 866, 2012. → pages 7[28] Y. F. Wong, Y. A. Sekercioglu, T. Drummond, and V. S. Wong, “Recentapproaches to non-intrusive load monitoring techniques in residentialsettings,” in IEEE Symp. Computational Intell. Applicat. Smart Grid,Singapore, 2013, pp. 73–79. → pages 7[29] S. Inagaki, T. Egami, T. Suzuki, H. Nakamura, and K. Ito, “Nonintrusiveappliance load monitoring based on integer programming,” Elect. Eng. inJapan, vol. 174, no. 2, pp. 18–25, 2011. → pages 8, 10, 12, 18[30] Y. Wang, A. Filippi, R. Rietman, and G. Leus, “Compressive sampling fornon-intrusive appliance load monitoring (NALM) using currentwaveforms,” in Proc. IASTED Int. Conf. Signal Process., PatternRecognition Applicat., Crete, Greece, Jun. 2012. → pages 8, 10, 12, 18, 28,29, 30, 42, 47, 48, 51, 56, 57, 59, 60[31] F. Sultanem, “Using appliance signatures for monitoring residential loadsat meter panel level,” IEEE Trans. Power Del., vol. 6, pp. 1380–1385, Oct.1991. → pages 8, 12, 16[32] S. Shaw, S. Leeb, L. Norford, and R. Cox, “Nonintrusive load monitoringand diagnostics in power systems,” IEEE Trans. Instrum. Meas., vol. 57,pp. 1445–1454, Jul. 2008. → pages 8, 12, 16, 17[33] J. Liang, S. K. K. Ng, G. Kendall, and J. W. M. Cheng, “Load signaturestudy–part I: Basic concept, structure, and methodology,” IEEE Trans.Power Del., vol. 25, pp. 551–560, Apr. 2010. → pages 9, 12, 18[34] N. Batra, J. Kelly, O. Parson, H. Dutta, W. Knottenbelt, A. Rogers,A. Singh, and M. Srivastava, “NILMTK: An open source toolkit fornon-intrusive load monitoring,” in Proc. 5th ACM Int. Conf. Future EnergySyst., Cambridge, United Kingdom, 2014, pp. 265–276. → pages 9[35] S. Makonin and F. Popowich, “Nonintrusive load monitoring (NILM)performance evaluation,” Energy Efficiency, vol. 8, no. 4, pp. 809–814,2015. → pages 9116[36] L. Pereira and N. Nunes, “Towards systematic performance evaluation ofnon-intrusive load monitoring algorithms and systems,” in SustainableInternet and ICT for Sustainability, Madrid, Spain, Apr. 2015, pp. 1–3. →pages 9[37] D. He, W. Lin, N. Liu, R. Harley, and T. Habetler, “Incorporatingnon-intrusive load monitoring into building level demand response,” IEEETrans. Smart Grid, vol. 4, pp. 1870–1877, Dec. 2013. → pages 9, 24, 25,27[38] S. Barker, S. Kalra, D. Irwin, and P. Shenoy, “NILM Redux: The case foremphasizing applications over accuracy,” in 2nd Int. WorkshopNon-Intrusive Load Monitoring, Austin, TX, Jun. 2014. → pages 9[39] S. Leeb, S. Shaw, and J. Kirtley, J.L., “Transient event detection in spectralenvelope estimates for nonintrusive load monitoring,” IEEE Trans. PowerDel., vol. 10, no. 3, pp. 1200–1210, Jul. 1995. → pages 10, 12, 16[40] J. Powers, B. Margossian, and B. Smith, “Using a rule-based algorithm todisaggregate end-use load profiles from premise-level data,” IEEE Trans.Comput. Applicat. Power, vol. 4, no. 2, pp. 42–47, Apr. 1991. → pages 12,13, 32[41] A. Prudenzi, “A neuron nets based procedure for identifying domesticappliances pattern-of-use from energy recordings at meter panel,” in IEEEPower Eng. Soc. Winter Meeting, vol. 2, New York, NY, 2002, pp.941–946. → pages 12, 13, 31, 32[42] J. Z. Kolter, S. Batra, and A. Y. Ng, “Energy disaggregation viadiscriminative sparse coding,” in Advances Neural Inform. Process. Syst.,J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, andA. Culotta, Eds. Red Hook, NY: Curran Associates, Inc., 2010, pp.1153–1161. → pages 12, 13, 28, 31, 33[43] R. Curtis, D. Copeland, and D. Yates, “Method and system fordisaggregating heating and cooling energy use from other building energyuse,” U.S. Patent 8 660 813, Feb. 25, 2014. → pages 12, 13, 31, 33[44] A. Cole and A. Albicki, “Data extraction for effective non-intrusiveidentification of residential power loads,” in IEEE InstrumentationMeasurement Technology Conf., vol. 2, St. Paul, MN, May 1998, pp.812–815. → pages 12, 13117[45] A. Cole and A. Albicki, “Algorithm for nonintrusive identification ofresidential appliances,” in Proc. IEEE Int. Symp. Circuits Syst., vol. 3,Monterey, CA, May 1998, pp. 338–341. → pages 12, 13[46] S. Drenker and A. Kader, “Nonintrusive monitoring of electric loads,”IEEE Comput. Appl. Power, vol. 12, pp. 47–51, Oct. 1999. → pages 12, 13[47] L. Farinaccio and R. Zmeureanu, “Using a pattern recognition approach todisaggregate the total electricity consumption in a house into the majorend-uses,” Energy and Buildings, vol. 30, no. 3, pp. 245 – 259, 1999. →pages 12, 14[48] M. L. Marceau, “Nonintrusive load disaggregation computer program toestimate the energy consumption of major end-uses in residentialbuildings,” Master’s thesis, Concordia University, Montre´al, Quebec,Canada, May 1999. → pages 12, 13[49] M. Marceau and R. Zmeureanu, “Nonintrusive load disaggregationcomputer program to estimate the energy consumption of major end uses inresidential buildings,” Energy Conversion Manage., vol. 41, no. 13, pp.1389 – 1403, 2000. → pages 12, 13[50] M. Baranski and J. Voss, “Nonintrusive appliance load monitoring based onan optical sensor,” in IEEE Bologna PowerTech Conf., vol. 4, Bologna,Italy, Jun. 2003. → pages 12, 14, 34[51] M. Baranski and J. Voss, “Genetic algorithm for pattern detection inNIALM systems,” in IEEE Int. Conf. Syst., Man Cybernetics, vol. 4,Hague, Netherlands, Oct. 2004, pp. 3462–3468. → pages 12, 14, 34[52] M. Baranski and J. Voss, “Detecting patterns of appliances from total loaddata using a dynamic programming approach,” in IEEE Int. Conf. DataMining, Brighton, United Kingdom, Nov. 2004, pp. 327–330. → pages 12,14[53] M. Figueiredo, A. de Almeida, and B. Ribeiro, “Home electrical signaldisaggregation for non-intrusive load monitoring (NILM) systems,”Neurocomputing, vol. 96, pp. 66 – 73, 2012, adaptive and NaturalComputing Algorithms. → pages 12, 15[54] S. Makonin, I. V. Bajic, and F. Popowich, “Efficient sparse matrixprocessing for nonintrusive load monitoring (NILM),” in 2nd Int. WorkshopNon-Intrusive Load Monitoring, Austin, TX, Jun. 2014. → pages 12, 15, 28118[55] S. Arberet and A. Hutter, “Non-intrusive load curve disaggregation usingsparse decomposition with a translation-invariant boxcar dictionary,” inIEEE Power Energy Soc. Innovative Smart Grid Technologies Conf.Europe, Istanbul, Turkey, Oct. 2014. → pages 12, 15[56] K. X. Perez, W. J. Cole, J. D. Rhodes, A. Ondeck, M. Webber, M. Baldea,and T. F. Edgar, “Nonintrusive disaggregation of residentialair-conditioning loads from sub-hourly smart meter data,” Energy andBuildings, vol. 81, pp. 316 – 325, Oct. 2014. → pages 12, 13, 31, 34[57] E. Elhamifar and S. Sastry, “Energy disaggregation vie learning powerletsand sparse coding,” in AAAI Conf. Artificial Intell., Austin, TX, Jan. 2015.→ pages 12, 15, 28, 43[58] L. K. Norford and S. B. Leeb, “Non-intrusive electrical load monitoring incommercial buildings based on steady-state and transient load-detectionalgorithms,” Energy and Buildings, vol. 24, no. 1, pp. 51 – 64, 1996. →pages 12, 15, 112[59] M. Berge´s, E. Goldman, H. Matthews, and L. Soibelman, “Learningsystems for electric consumption of buildings,” in Computing in Civil Eng.,C. H. Caldas and O. William J, Eds., 2009, pp. 1–10. → pages 12, 16[60] H. Gonc¸alves, A. Ocneanu, and M. Berge´s, “Unsupervised disaggregationof appliances using aggregated consumption data,” in ACM SIGKDD Conf.Knowledge Discovery and Data Mining, San Diego, CA, Aug. 2011. →pages 12, 16, 34[61] K. S. Barsim, R. Streubel, and B. Yang, “An approach for unsupervisednon-intrusive load monitoring of residential appliances,” in 2nd Int.Workshop Non-Intrusive Load Monitoring, Austin, TX, Jun. 2014. →pages 12, 16, 34[62] L. Du, Y. Yang, D. He, R. G. Harley, and T. G. Habetler, “Featureextraction for load identification using long-term operating waveforms,”IEEE Trans. Smart Grid, vol. 6, pp. 819–826, Mar. 2015. → pages 12, 16[63] J. Roos, I. Lane, E. Botha, and G. Hancke, “Using neural networks fornon-intrusive monitoring of industrial electrical loads,” in IEEEInstrumentation Measurement Technology Conference, vol. 3, Hamamatsu,Japan, May 1994, pp. 1115–1118. → pages 12, 17119[64] K. Lee, S. Leeb, L. Norford, P. Armstrong, J. Holloway, and S. Shaw,“Estimation of variable-speed-drive power consumption from harmoniccontent,” IEEE Trans. Energy Convers., vol. 20, no. 3, pp. 566–574, Sep.2005. → pages 12, 16, 17[65] D. Srinivasan, W. S. Ng, and A. C. Liew, “Neural-network-based signaturerecognition for harmonic source identification,” IEEE Trans. Power Del.,vol. 21, pp. 398–405, Jan. 2006. → pages 12, 17[66] M. Akbar and D. Z. A. Khan, “Modified nonintrusive appliance loadmonitoring for nonlinear devices,” in IEEE Int. Multitopic Conf., Lahore,Pakistan, Dec. 2007. → pages 12, 16[67] W. Wichakool, A.-T. Avestruz, R. Cox, and S. Leeb, “Modeling andestimating current harmonics of variable electronic loads,” IEEE Trans.Power Electron., vol. 24, no. 12, pp. 2803–2811, Dec. 2009. → pages 12,16, 17[68] M. Dong, P. C. M. Meira, W. Xu, and C. Y. Chung, “Non-intrusivesignature extraction for major residential loads,” IEEE Trans. Smart Grid,vol. 4, pp. 1421–1430, Sep. 2013. → pages 12, 17, 34[69] W. Chan, A. So, and L. Lai, “Harmonics load signature recognition bywavelets transforms,” in Proc. Int. Conf. Electric Utility DeregulationRestructuring Power Technologies, London, United Kingdom, Apr. 2000,pp. 666–671. → pages 12, 17, 44[70] C. Laughman, K. Lee, R. Cox, S. Shaw, S. Leeb, L. Norford, andP. Armstrong, “Power signature analysis,” IEEE Power Energy Mag.,vol. 1, pp. 56–63, Mar. 2003. → pages 12, 16, 17[71] W. K. Lee, G. S. K. Fung, H. Y. Lam, F. H. Y. Chan, and M. Lucente,“Exploration on load signatures,” in Int. Conf. Elect. Eng., Sapporo, Japan,Jul. 2004. → pages 12, 17[72] H. Lam, G. Fung, and W. Lee, “A novel method to construct taxonomy ofelectrical appliances based on load signatures,” IEEE Trans. Consum.Electron., vol. 53, no. 2, pp. 653–660, May 2007. → pages 12, 17[73] K. Suzuki, S. Inagaki, T. Suzuki, H. Nakamura, and K. Ito, “Nonintrusiveappliance load monitoring based on integer programming,” in Soc.Instrument Control Engineers Annu. Conf., Tokyo, Japan, Aug. 2008, pp.2742–2747. → pages 12, 18120[74] J. Liang, S. K. K. Ng, G. Kendall, and J. W. M. Cheng, “Load signaturestudy–part II: Disaggregation framework, simulation, and applications,”IEEE Trans. Power Del., vol. 25, pp. 561–569, Apr. 2010. → pages 12, 18[75] F. Popescu, F. Enache, I.-C. Vizitiu, and P. Ciotirnae, “Recurrence plotanalysis for characterization of appliance load signature,” in Int. Conf.Communications, Bucharest, Romania, May 2014. → pages 12, 18, 47[76] S. N. Patel, T. Robertson, J. A. Kientz, M. S. Reynolds, and G. D. Abowd,“At the flick of a switch: Detecting and classifying unique electrical eventson the residential power line,” in Ubiquitous Computing, ser. Lecture Notesin Comput. Sci., J. Krumm, G. Abowd, A. Seneviratne, and T. Strang, Eds.Springer Berlin Heidelberg, 2007, vol. 4717, pp. 271–288. → pages 12, 18,21[77] S. Gupta, M. S. Reynolds, and S. N. Patel, “ElectriSense: Single-pointsensing using EMI for electrical event detection and classification in thehome,” in Proc. 12th ACM Int. Conf. Ubiquitous Computing, Copenhagen,Denmark, 2010, pp. 139–148. → pages 12, 18, 48[78] E. Cande`s, J. Romberg, and T. Tao, “Robust uncertainty principles: Exactsignal reconstruction from highly incomplete frequency information,” IEEETrans. Inf. Theory, vol. 52, pp. 489–509, Feb. 2006. → pages 18, 28, 36, 38[79] D. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, pp.1289–1306, Apr. 2006. → pages 18, 28, 36, 38[80] A. Dale´n and C. Weinhardt, “Evaluating the impact of data sample-rate onappliance disaggregation,” in IEEE Int. Energy Conf., May 2014, pp.743–750. → pages 22[81] Current Cost. (2015, Aug.) Current Cost Products: Transmitter. [Online].Available: www.currentcost.com/product-transmitter.html → pages 23[82] L. Pereira, F. Quintal, N. J. Nunes, and M. Berge´s, “The design of ahardware-software platform for long-term energy eco-feedback research,”in ACM SIGCHI Symp. Eng. Interactive Computing Syst., Copenhagen,Denmark, 2012, pp. 221–230. → pages 23, 25[83] J. Z. Kolter and M. J. Johnson, “REDD: A public data set for energydisaggregation research,” in SustKDD: Workshop Data Mining Applicat.Sustainability, San Diego, CA, Aug. 2011. → pages 23, 27, 41121[84] K. Anderson, A. F. Ocneanu, D. Benitez, D. Carlson, A. Rowe, andM. Berge´s, “BLUED: A fully labeled public dataset for event-basednon-intrusive load monitoring research,” in SustKDD: Workshop DataMining Applicat. Sustainability, Beijing, China, Aug. 2012. → pages 23,41, 54, 55[85] J. Seward. (2014, Jun.) bzip2 and libbzip2. [Online]. Available:http://www.bzip.org/index.html → pages 27[86] Xiph. Org Foundation. (2015, Aug.) FLAC: free lossless audio codec.[Online]. Available: http://xiph.org/flac → pages 27[87] M. Mishali, Y. C. Eldar, and A. J. Elron, “Xampling: Signal acquisition andprocessing in union of subspaces,” IEEE Trans. Signal Process., vol. 59,pp. 4719–4734, Oct. 2011. → pages 28[88] S. Foucart and H. Rauhut, A Mathematical Introduction to CompressiveSensing. Birkha¨user Basel, 2013. → pages 30[89] M. Mishali and Y. C. Eldar, “From theory to practice: Sub-Nyquistsampling of sparse wideband analog signals,” IEEE J. Sel. Topics SignalProcess., vol. 4, pp. 375–391, Apr. 2010. → pages 30[90] M. Mishali and Y. C. Eldar, “Xampling: compressed sensing of analogsignals,” in Compressed Sensing: Theory and Applications, Y. C. Eldar andG. Kutyniok, Eds. Cambridge University Press, 2012, ch. 3. → pages 30[91] G. Zhang, B. E. Patuwo, and M. Y. Hu, “Forecasting with artificial neuralnetworks: The state of the art,” Int. J. Forecasting, vol. 14, no. 1, pp. 35 –62, Mar. 1998. → pages 32[92] J. Tropp and A. Gilbert, “Signal recovery from random measurements viaorthogonal matching pursuit,” IEEE Trans. Inf. Theory, vol. 53, pp.4655–4666, Dec 2007. → pages 37[93] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition bybasis pursuit,” SIAM J. Scientific Computing, vol. 20, no. 1, pp. 33–61,1998. → pages 37[94] F. Krahmer, S. Mendelson, and H. Rauhut, “Suprema of chaos processesand the restricted isometry property,” Commun. Pure Appl. Math., vol. 67,no. 11, pp. 1877–1904, 2014. → pages 38122[95] J. Laska, S. Kirolos, M. Duarte, T. Ragheb, R. Baraniuk, and Y. Massoud,“Theory and implementation of an analog-to-information converter usingrandom demodulation,” in IEEE Int. Symp. Circuits Syst., New Orleans,LA, May 2007, pp. 1959–1962. → pages 38[96] J. Tropp, J. Laska, M. Duarte, J. Romberg, and R. Baraniuk, “BeyondNyquist: Efficient sampling of sparse bandlimited signals,” IEEE Trans.Inf. Theory, vol. 56, pp. 520–544, Jan. 2010. → pages 38, 39[97] I. Tosˇic´ and P. Frossard, “Dictionary learning,” IEEE Signal ProcessingMag., vol. 28, no. 2, pp. 27–38, March 2011. → pages 43[98] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm fordesigning overcomplete dictionaries for sparse representation,” IEEETrans. Signal Process., vol. 54, pp. 4311–4322, Nov 2006. → pages 43, 61[99] E. Cande`s and M. Wakin, “An introduction to compressive sampling,”IEEE Signal Processing Mag., vol. 25, pp. 21–30, Mar. 2008. → pages 43[100] M. H. J. Bollen and I. Y.-H. Gu, “Voltage frequency variations,” in SignalProcessing of Power Quality Disturbances. Wiley-IEEE Press, 2006,ch. 2, sec. 1. → pages 45[101] S. Gleichman and Y. Eldar, “Blind compressed sensing,” IEEE Trans. Inf.Theory, vol. 57, pp. 6958–6975, Oct. 2011. → pages 52, 111[102] C. Studer and R. Baraniuk, “Dictionary learning from sparsely corrupted orcompressed signals,” in IEEE Int. Conf. Acoustics, Speech Signal Process.,Kyoto, Japan, Mar. 2012, pp. 3341–3344. → pages 52, 111[103] The Mathworks, Inc. (2015, Aug.) wmpalg: Matching pursuit. [Online].Available: http://www.mathworks.com/help/wavelet/ref/wmpalg.html →pages 60[104] R Core Team. (2015, Jun.) Kernel density estimation, stats packageversion 3.3.0. [Online]. Available:https://stat.ethz.ch/R-manual/R-devel/library/stats/html/density.html →pages 71[105] The R Foundation. (2015, Jun.) The R project for statistical computing.[Online]. Available: http://www.r-project.org → pages 71[106] R Core Team. (2015, Jun.) Fitting linear models, stats package version3.2.0. [Online]. Available:123https://stat.ethz.ch/R-manual/R-patched/library/stats/html/lm.html →pages 78[107] United States Environmental Protection Agency. (2014, May) Climatechange indicators in the united states: Heating and cooling degree days.[Online]. Available:http://www.epa.gov/climatechange/pdfs/print heating-cooling-2014.pdf→ pages 79[108] B. Ripley. (2015, Jun.) Seasonal decomposition of time series by loess,stats package version 3.3.0. [Online]. Available:https://stat.ethz.ch/R-manual/R-devel/library/stats/html/stl.html → pages82[109] R. B. Cleveland, W. S. Cleveland, J. E. McRae, and I. Terpenning, “STL: Aseasonal-trend decomposition procedure based on Loess,” J. Official Stat.,vol. 6, no. 1, pp. 3–73, Jun. 1990. → pages 82[110] International Organization for Standardization. (2015, Jun.) Date and timeformat - ISO 8601. [Online]. Available:http://www.iso.org/iso/home/standards/iso8601.htm → pages 91124


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items