UBC Theses and Dissertations

Detection of malicious activities against advanced metering infrastructure in smart grid Jokar, Paria 2015

Detection of Malicious Activities Against Advanced Metering Infrastructure in Smart Grid

by

Paria Jokar

B.Sc., University of Science and Technology, Tehran, Iran, 2003
M.Sc., University of Science and Technology, Tehran, Iran, 2006

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
in
The Faculty of Graduate and Postdoctoral Studies
(Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

December 2015

© Paria Jokar, 2015

Abstract

In this thesis we investigate security challenges in the smart grid and propose several algorithms for detecting malicious activities against AMI. Our work includes two parts. In the first part, we focus on the problem of intrusion detection in ZigBee HANs. We study the requirements and challenges of designing intrusion detection systems for HANs, and suggest applying model-based intrusion detection and automatic intrusion prevention techniques. Accordingly, we design algorithms for detecting and preventing spoofing attacks, an important attack type against wireless networks. We extend this work to design an intrusion detection and prevention system for ZigBee HANs, HANIDPS, which is able to detect and automatically stop various attack types. Through extensive experiments and analysis we show that the proposed method is able to detect and stop the attacks with high precision, low cost and short delay, which makes it suitable for HANs. Because the prevention operation in HANIDPS is performed automatically, the cost of false positives is low and limited to some network overhead. The delay in stopping the attacks is also significantly shorter than when human intervention is required. This reduces the damage caused by possible attacks.

In the next part, we focus on detection of cyber intrusions that affect the load curve. We suggest that these attacks are detectable by monitoring abnormalities in customers' consumption patterns.
We introduce a consumption pattern based electricity theft detector, CPBETD, which unlike previous techniques is robust against nonmalicious changes in consumption pattern and provides high and adjustable performance without jeopardizing customers' privacy. Extensive experiments on a real dataset of 5000 customers show the effectiveness of our approach. We also introduce an instantaneous anomaly detector, IAD, which by monitoring usage patterns effectively detects attacks against direct and indirect load control, which are among the major concerns in AMI.

Preface

Hereby, I declare that I am the author of this thesis. Chapters 2-5 encompass work that has been published or is under review. The corresponding papers were co-authored by Prof. Victor C. M. Leung, who supervised me through this research. The papers corresponding to Chapters 2, 3, and 5 were also co-authored by Nasim Arianpoo, and one of the papers related to Chapter 4 was co-authored by Hasen Nicanfar. They provided valuable comments on these works. The following publications describe the work completed in this thesis.

Journal Papers, Published

• Paria Jokar, Nasim Arianpoo and Victor C. M. Leung, “Electricity Theft Detection Using Customers’ Consumption Patterns,” IEEE Transactions on Smart Grid, DOI 10.1109/TSG.2015.2425222, May 2015. (Chapter 5)
• Paria Jokar, Nasim Arianpoo and Victor C. M. Leung, “Spoofing Detection in IEEE 802.15.4 Networks Based on Received Signal Strength,” Elsevier Ad Hoc Networks, vol. 11, no. 8, pp. 2648-2660, November 2013. (Chapter 3)
• Paria Jokar, Nasim Arianpoo and Victor C. M. Leung, “A Survey on Security Issues in Smart Grids,” Security and Communication Networks, DOI: 10.1002/sec.559, June 2012. (Chapter 2)

Journal Paper, Submitted

• Paria Jokar and Victor C. M. Leung, “Intrusion Detection and Prevention for ZigBee-Based Home Area Networks in Smart Grids,” August 2015. (Chapter 4)

Conference Papers, Published

• Paria Jokar, Nasim Arianpoo and Victor C. M. Leung, “Spoofing Prevention Using Received Signal Strength for ZigBee-Based Home Area Networks,” in Proc. of IEEE International Conference on Smart Grid Communications (SmartGridComm), Vancouver, Canada, October 2013. (Chapter 3)
• Paria Jokar, Nasim Arianpoo and Victor C. M. Leung, “Intrusion Detection in Advanced Metering Infrastructure Based on Consumption Pattern,” in Proc. of IEEE International Conference on Communications (ICC), Budapest, Hungary, June 2013. (Chapter 5)
• Paria Jokar, Hasen Nicanfar and Victor C. M. Leung, “Specification-Based Intrusion Detection for Home Area Networks in Smart Grids,” in Proc. of IEEE International Conference on Smart Grid Communications (SmartGridComm), Brussels, Belgium, October 2011. (Chapter 4)

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgments
1 Introduction
  1.1 Detection of Malicious Activities Against Advanced Metering Infrastructure (Trends, Challenges and Needs)
    1.1.1 Intrusion Detection Techniques
    1.1.2 Advanced Metering Infrastructure Threats
    1.1.3 Requirements and Challenges of Intrusion Detection in Advanced Metering Infrastructure
  1.2 Summary of Results and Main Contributions
  1.3 Thesis Organization
2 Security Issues in Smart Grids
  2.1 Introduction
  2.2 Smart Grid Architecture
    2.2.1 Advanced Metering Infrastructure
    2.2.2 Distributed Generation
    2.2.3 Wide Area Measurement and Control
  2.3 Security Threats
  2.4 Privacy
  2.5 Cryptography
  2.6 Key Management
  2.7 Security Risk Assessment and Management
  2.8 Secure System Architecture
  2.9 Summary
3 Spoofing Detection and Prevention System for ZigBee-Based Home Area Networks
  3.1 Introduction
  3.2 A Survey on RSS Based Spoofing Detection Methods
  3.3 Home Area Network Architecture
  3.4 Threat Model
  3.5 Spoofing Detection Algorithm
    3.5.1 Operation Phase
    3.5.2 Training Phase
    3.5.3 Multiple Air Monitors
  3.6 Spoofing Prevention Algorithm
    3.6.1 Static Threshold
    3.6.2 Dynamic Threshold
  3.7 Performance Analysis
    3.7.1 Performance of the Spoofing Detection Algorithm
    3.7.2 Performance of the Spoofing Prevention Algorithms
  3.8 Experiments
    3.8.1 Testbed
    3.8.2 Performance of the Spoofing Detection Algorithm
    3.8.3 Performance of the Spoofing Prevention Algorithms
  3.9 Discussions and Comparison
  3.10 Summary
4 HANIDPS, An Intrusion Detection and Prevention System for ZigBee HANs
  4.1 Introduction
  4.2 Related Work and Comparison
  4.3 Home Area Network Security Threats
    4.3.1 Case Studies
  4.4 Home Area Network Intrusion Detection and Prevention System
    4.4.1 Architecture
    4.4.2 Features and State Space
    4.4.3 Actions
    4.4.4 Learning Algorithm
  4.5 Performance Analysis
    4.5.1 Performance of the Detection Module
    4.5.2 Performance of the Prevention Module
  4.6 Evaluations
    4.6.1 Home Area Network Intrusion Detection and Prevention System against IEEE 802.15.4 attacks
    4.6.2 Experiment 1
    4.6.3 Experiment 2
  4.7 Summary
5 Detection of Malicious Activities in AMI Using Customers' Consumption Patterns
  5.1 Introduction
  5.2 Related Work
  5.3 Threat Model
  5.4 Consumption Pattern Based Electricity Theft Detection Algorithm
    5.4.1 Training Phase
    5.4.2 Application Phase
  5.5 Performance Analysis
    5.5.1 Required Performance
    5.5.2 Classification Method
    5.5.3 Clustering Method
  5.6 Performance Evaluations
    5.6.1 Experiment 1: One-class Support Vector Machine
    5.6.2 Experiment 2: Multi-class Support Vector Machine
    5.6.3 Experiment 3: Multi-class Support Vector Machine with New Attacks
    5.6.4 Overall Performance of Consumption Pattern Based Electricity Theft Detection algorithm
    5.6.5 Effect of Sampling Rate on Performance
    5.6.6 Discussion and Comparison
  5.7 Instantaneous Anomaly Detection Algorithm
  5.8 Performance Evaluation of Instantaneous Anomaly Detection
    5.8.1 Dataset
    5.8.2 Test Results
  5.9 Conclusion
6 Conclusions and Future Work
  6.1 Summary of Accomplished Work
  6.2 Suggestions for Future Work
Bibliography

List of Tables

3.1 Comparison between λ of RSS and SDC
3.2 Prevention performance for dynamic threshold spoofing prevention
3.3 Comparison of different RSS-based spoofing detection techniques (NA: not applicable, -: was not provided in the paper, *: average of the results of 4 AMs)
4.1 Example of attacks against ZigBee HANs
4.2 Requirements of HAN according to the U.S. Department of Energy guideline
4.3 Performance for dynamic threshold spoofing prevention
5.1 Experiment results
5.2 BDR for different values of m
5.3 Effect of sampling rate on detection performance
5.4 Comparison among energy theft detection methods
5.5 Appliance load profile
5.6 Cross classification and detection performance of IAD

List of Figures

1.1 Architecture of AMI.
2.1 Integration of a data layer to different parts of power system in smart grid allows automated transmission and distribution as well as energy conservation and generation based on demand. (Image by Southern California Edison)
3.1 Architecture of a CP-HAN.
3.2 a) Attack scenario. b) CDF of the RSS values of the attacker and the genuine nodes.
3.3 Effect of ratio of malicious traffic on performance of the first step of the spoofing detection algorithm
3.4 Effect of the difference between RSS mean values on performance of the first step of the spoofing detection algorithm
3.5 Discrete wavelet transform decomposition algorithm
3.6 Effect of increase in the number of AMs and non-centrality parameter (n represents the number of AMs and k is the scale of non-centrality)
3.7 Theoretic ROC for static threshold spoofing prevention (m and m′ are the mean RSS values of the genuine and attacker nodes)
3.8 Expected number of tries for successful transmission vs. threshold
3.9 Expected number of tries for successful transmission vs. FPR
3.10 Testbed setting (the distance between consecutive grid dots is 50 cm)
3.11 ROC curve of the spoofing detection algorithms for a) AM1, b) AM2, c) AM3, d) AM4
3.12 Average ROC of the spoofing detection algorithm.
3.13 ROC of spoofing detection in AM4 based on a) R, b) SDC, c) proposed algorithm.
3.14 Experimental ROC for static threshold spoofing prevention.
3.15 Average experimental ROC of V1, V2, V3 and V4 for static threshold spoofing prevention.
4.1 ROC of the detection module.
5.1 Pseudo-codes of application phase
5.2 An example of the daily consumption.
5.3 An example of the daily consumption.
5.4 ROC curves for three customers with the worst, average and the best detection performance using one-class SVM.
5.5 Silhouette plots for customers with k=1 and k=2.
5.6 ROC curves for three customers with the worst, average, and the best detection performance using multi-class SVM.
5.7 ROC curves for three customers with the worst, average and the best detection performance using multi-class SVM with undefined attacks.
5.8 Relationship between BDR and FPR.
5.9 Detection accuracy of IAD for false price attack as a function of difference between the usage probabilities during high and low price.

List of Abbreviations

AES  Advanced encryption standard
AGC  Automatic generation control
ALG  Application layer gateway
AM  Air monitor
AMI  Advanced metering infrastructure
C-IDPS  Central IDPS
CAK  Cumulative attestation kernel
CBR  Constant bit rate
CMAC  Cipher-based message authentication code
CP-HAN  Consumer private HAN
CPBETD  Consumption pattern-based electricity theft detector
CS  Central server
CSMA-CA  Carrier sense multiple access - collision avoidance
D  Datagram
DER  Distributed energy resources
DHWT  Discrete Haar wavelet transform
DR  Detection rate
DWT  Discrete wavelet transform
DoS  Denial of service
ECDSA  Elliptic curve digital signature algorithm
ED  Energy detection
EMS  Energy management system
EPRI  Electric Power Research Institute
ESI  Energy service interface
ETDS  Energy theft detection systems
FDI  False data injection
FFT  Fast Fourier transform
FPR  False positive rate
GTS  Guaranteed time slot
HAN  Home area network
HFID  High-frequency identifier
HTTPS  Hypertext transfer protocol with secure sockets
IAD  Instantaneous anomaly detector
IBE  Identity-based encryption
IBS  Identity-based signcryption
ICS  Industrial control systems
ID  Identifier
IDPS  Intrusion detection and prevention system
IDS  Intrusion detection system
IED  Intelligent electronic devices
IPS  Intrusion prevention system
IT  Information technology
KGS  Key-generating server
LFID  Low-frequency identifier
LQI  Link quality indicator
LUT  Look up table
MAC  Medium access control
NA  Node availability
NALM  Non-intrusive appliance load monitoring
NAN  Neighborhood area networks
NILM  Nonintrusive load monitoring
PDC  Phasor data concentrators
PER  Packet error rate
PHY  Physical
PKI  Public key infrastructure
PMU  Phasor measurement units
PR  Prevention rate
RSS  Received signal strength
SCADA  Supervisory control and data acquisition
SDC  Summation of detailed coefficients
SDPS  Spoofing detection and prevention system
SEP  Smart energy profile
SN  Sequence number
SVM  Support vector machine
TCP  Transmission control protocol
TR  Traffic rate
UE-HAN  Utility enabled HAN
WAMAC  Wide area measurement and control
WAN  Wide area networks

Acknowledgments

First of all, I would like to express my deepest sense of gratitude to my supervisor, Prof. Victor C. M. Leung, for his continuous support, patience, and technical and non-technical guidance that helped me accomplish this research.

I am also thankful to my friends Maziar Tabarestani, Hasen Nicanfar, Nasim Arianpoo and Pedram Samadi for always being supportive and accountable.

Most importantly, I feel indebted to my mother and father for their everlasting love, support, and protectiveness, not only during my PhD years but also throughout my entire life.

This work was supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada.

Chapter 1
Introduction

The smart grid is a vision to modernize the electricity transmission and distribution systems by incorporating computer intelligence into the power system and providing two-way energy flow and data communication. Advanced metering infrastructure (AMI) is a subsystem within the smart grid which provides two-way data communication between the smart meters and the utility company. AMI enables real-time transmission of power consumption and pricing information, as well as control commands.

Unlocking the tremendous potential of the smart grid, such as resilience, high power quality, and consumer participation, depends strongly on the security of this system. Integrating a data layer into the power system can expose it to many cyber security threats. Without strong security measures in place, not only will the smart grid inherit the vulnerabilities of the legacy power system, but new vulnerabilities will also be added due to the proliferation of new technologies. Since the advent of the smart grid concept, security has always been a primary concern.
In the 2009 White House cyberspace policy review [1], the federal government was asked to ensure that security standards are developed and adopted to avoid creating unexpected opportunities for adversaries to penetrate these systems or conduct large-scale attacks.

Security mechanisms such as cryptographic algorithms and secure protocols must be designed into the smart grid to reduce vulnerabilities and mitigate their consequences. Alongside them, appropriate intrusion detection systems (IDSs) and intrusion prevention systems (IPSs), able to detect and prevent malicious activities that exploit vulnerabilities in the system, should also be in place. The need for research on intrusion detection for embedded processors within the smart grid was emphasized in the United States (US) National Institute of Standards and Technology (NIST) guidelines for smart grid cyber security [2], because the smart grid contains a large number of processors with limited resources and strict timeliness requirements. Although AMI deployment is still in the pilot phase in many jurisdictions, several security flaws have already been reported for multiple metering devices [3],[4]; this underscores the need for developing appropriate IDSs for AMI communication networks.

In this thesis, we study security issues in AMI and the requirements of IDSs/IPSs for AMI networks. Accordingly, we propose various algorithms for detecting and preventing malicious activities against AMI. The proposed algorithms are tailored to the unique characteristics of AMI networks. Mathematical tools and machine learning techniques are applied in designing the algorithms, and their performance is evaluated through extensive analysis, simulations and experiments. The rest of this chapter is organized as follows. In Section 1.1 we explain general techniques and trends for intrusion detection. Then, we describe the new threats introduced against the power system by the addition of a data layer.
Further, we give an overview of the requirements and challenges of detecting malicious activities against AMI. Section 1.2 provides a summary of the main contributions of this thesis, and discusses the significance and novelty of the proposed mechanisms. Finally, the organization of the thesis is described in Section 1.3.

1.1 Detection of Malicious Activities Against Advanced Metering Infrastructure (Trends, Challenges and Needs)

AMI includes several communication networks, as shown in Figure 1.1. Home area networks (HANs) are responsible for demand-side management. Home electric devices, such as appliances and thermostats, communicate with the smart meters through HANs to exchange price and usage data as well as control commands. Smart meter data in each neighborhood is collected by data aggregators before being sent to the utility. Communication between smart meters and data aggregators is provided by neighborhood area networks (NANs). Wide area networks (WANs) enable the communication between data aggregators and the utility networks.

Figure 1.1: Architecture of AMI.

AMI assets are divided into private and public domains. The private domain includes systems that are similar to standard information technology (IT) assets. These systems contain a large amount of critical data, yet they are located in data centers, which are secure environments. The public domain assets, on the other hand, are located in physically insecure environments, which makes them more vulnerable to cyber threats. This necessitates the development of appropriate IDSs/IPSs.

1.1.1 Intrusion Detection Techniques

As defined in [5], "intrusion detection is the process of monitoring the events that occur in a computer system or network and analyzing them for signs of possible incidents." Three key metrics are commonly used for measuring the performance of an IDS: detection rate (DR), false positive rate (FPR) and Bayesian detection rate (BDR).
DR is defined as the number of intrusion instances detected by the system divided by the total number of intrusion instances present in the test set. FPR is defined as the number of normal patterns classified as attacks divided by the total number of normal patterns. BDR is the probability that an intrusion has actually occurred, given that the IDS has raised an alarm. Letting A denote an intrusion alarm raised by the IDS and I the occurrence of an intrusion, these metrics are formulated as follows:

\[ \mathrm{DR} = P(A \mid I) \tag{1.1} \]
\[ \mathrm{FPR} = P(A \mid \bar{I}) \tag{1.2} \]
\[ \mathrm{BDR} = P(I \mid A) \tag{1.3} \]

Usually there is a trade-off between DR and FPR, which is represented through receiver operating characteristic (ROC) curves. An ROC curve shows the values of DR for different values of FPR.

In general, there are three types of IDSs, based on the method they use for recognizing malicious activities.

• Signature-based IDSs have a database of predefined attack patterns, known as signatures, and detect intrusions by comparing the system behavior with the signatures.
• Anomaly-based IDSs detect malicious activities as deviations from the statistically normal behavior of the system.
• Specification-based IDSs also recognize intrusions as deviations from normal behavior; however, instead of statistical methods, normal behavior is defined based on manually extracted specifications of the system.

Signature-based IDSs have low FPR, yet they are incapable of detecting unknown attacks and their databases need to be updated frequently. Existing anomaly-based IDSs, on the other hand, suffer from high FPR and require long training and tuning time, yet they are able to detect new attacks. Specification-based IDSs potentially have low FPR and the ability to detect new attacks. However, the strength of this type of IDS depends on the accuracy and efficiency of the selected specifications. Because much AMI equipment applies new technologies, an exhaustive database of known attacks is not available.
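
As a minimal sketch of the three metrics in (1.1)-(1.3), the Python fragment below estimates DR, FPR and BDR from alarm counts on a labelled test set. The function names and the example counts are hypothetical and are not taken from this thesis; the second function is simply Bayes' theorem applied to (1.3), which shows how BDR depends on how rare intrusions are.

```python
def ids_metrics(tp, fn, fp, tn):
    """Estimate (DR, FPR, BDR) from alarm counts on a labelled test set.

    tp: intrusions that raised an alarm    fn: intrusions that were missed
    fp: normal patterns that raised an alarm    tn: normal patterns with no alarm
    Assumes the test set has at least one intrusion, one normal pattern
    and one alarm, so that no denominator is zero.
    """
    dr = tp / (tp + fn)    # P(A | I), equation (1.1)
    fpr = fp / (fp + tn)   # P(A | not I), equation (1.2)
    bdr = tp / (tp + fp)   # P(I | A), equation (1.3)
    return dr, fpr, bdr


def bdr_from_rates(dr, fpr, base_rate):
    """Bayes' theorem: P(I | A) from DR, FPR and the prior P(I) = base_rate."""
    hit = dr * base_rate
    return hit / (hit + fpr * (1.0 - base_rate))


# Hypothetical counts: 100 intrusions and 1000 normal patterns.
dr, fpr, bdr = ids_metrics(tp=90, fn=10, fp=50, tn=950)
```

With these illustrative counts, DR is 0.9 and FPR is 0.05, yet BDR is only about 0.64, because false alarms on the far more numerous normal patterns dilute the alarms. Lowering `base_rate` in `bdr_from_rates` makes the effect stronger, which is one way to see why a low FPR matters when intrusions are rare.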
Signature-based IDSs are therefore not appropriate in the context of AMI.

In this work, we design new anomaly-based and specification-based algorithms for detecting malicious activities against AMI. Specifically, we focus on intrusion detection and prevention in HANs, as well as detection of newly introduced attacks against the power system, such as attacks against direct and indirect load control and energy theft attempts.

The concept of intrusion detection was first introduced in 1980 in [6], where audit trails were suggested as valuable information that can be used to detect anomalous behavior as a sign of malicious activity. In 1984 the first model of intrusion detection, the Intrusion Detection Expert System, was introduced [7]. Until 1990, the majority of IDSs were host-based [8], analyzing audit records at the level of individual hosts. In 1990 the concept of network IDS [9] was introduced, in which network traffic, rather than host behavior, is monitored for signs of intrusion. The first system in this category was Network Security Monitor [10]. IDSs gradually entered the commercial market with products such as NetRanger and RealSecure [11]. One can view intrusion detection as a classification problem in which the signatures or features of the system are classified as benign or malicious. Since 1995 various classification techniques have been used to achieve higher detection performance, including neural networks [12], genetic algorithms [13], state transition [14], fuzzy logic [15], and decision trees [16].

Recent research on IDSs targets specific systems or attack types rather than general solutions. This approach allows IDSs to be tailored to the requirements, limitations and characteristics of a given system or attack type, which yields better detection performance. In the context of IDSs for AMI, only a few works have been published over the last few years.
We review the work related to each algorithm introduced in this thesis separately, in the corresponding chapter.

1.1.2 Advanced Metering Infrastructure Threats

The increasing use of information technology for demand-side management introduces new types of cyber intrusions. AMI threats are categorized into three primary groups [17]: customer attacks, insider attacks, and terrorist or nation-state attacks. These threats can cause cyber impacts such as loss of availability and integrity in the AMI system or the bulk electric controls. The consequences of these cyber effects on the power system range from an increase in peak usage to widespread outages. The lowest-level, highest-probability threat to AMI is the unethical customer. Next highest in probability is the insider who has a financial motivation. Beyond these are the high-level, low-probability threats of nation-state or terrorist groups.

Unethical customer threat

One of the major concerns in AMI is energy theft. Energy theft causes billions of dollars of financial loss to the utilities every year. Traditional energy theft techniques require mechanical manipulation of analog electric meters. The deployment of smart meters, however, introduces numerous new vectors for energy theft. In AMI, the usage data can be tampered with before recording, in the smart meter, or during transmission to the utility company. Software-based attacks usually need less knowledge and expertise, and are therefore more likely to become widespread. Utilities rely on the blink count for detecting theft attempts: the number of times the meter has been de-energized is counted as a sign of a customer's attempt to tamper with the meter. This, however, does not cover many attacks that can happen over the communication links or through manipulation of wires ahead of the meter.
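
The blink-count heuristic described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name, the boolean encoding of the meter's power status and the threshold of 3 are assumptions made for the example, not details given in this thesis or by any utility.

```python
def blink_count_alarm(power_status, max_blinks=3):
    """Count de-energization events in a meter's power-status log.

    power_status: list of booleans in time order, True = meter energized.
    max_blinks:   hypothetical tolerance before a tampering alarm is raised.
    Returns (blinks, alarm).
    """
    # A "blink" is a transition from energized (True) to de-energized (False).
    blinks = sum(
        1 for prev, cur in zip(power_status, power_status[1:]) if prev and not cur
    )
    return blinks, blinks > max_blinks


# Hypothetical log in which the meter is de-energized four times.
tampered = [True, False, True, False, True, True, False, True, False, True]
# A meter whose readings are altered in transit is never de-energized.
link_attack = [True] * 10
```

Under these assumptions `blink_count_alarm(tampered)` raises an alarm, while `blink_count_alarm(link_attack)` counts zero blinks and stays silent, which illustrates the blind spot: an attack that never de-energizes the meter leaves the blink count unchanged.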
New techniques need to be developed to efficiently detect energy theft attempts.

The unethical customer requires low funding, has extended physical access to the AMI devices on his premises, has a long time and moderate commitment to achieve his goal, and needs low to high cyber skills. The threat might arise from small groups who have the knowledge to develop tools to penetrate advanced metering devices, and become widespread when these tools go commercial. The primary cyber effect of a customer attack is under-reporting of customer usage. A secondary and probably unintended cyber effect can be a failure to report correct outage or status information.

The consequences of customer attacks range from low, if only a few customers modify their metering data, to severe, when the attacks become widespread and available to customers with moderate technical knowledge. The immediate consequence of customer attacks is decreased profitability. In the long term, utilities will not be able to plan the infrastructure based on demand, since the load information is not valid. Besides, invalid outage information increases the operational costs and the time required for disaster recovery.

Insider threat

One of the goals of AMI is to shape the load curve to reduce peak usage. This helps to reduce the cost of energy for both the utility and the end customer. A generation provider who sells energy to the utility may collaborate with an insider to attack the AMI in order to make money. For instance, by modifying the pricing calculation software in the head-end server, the insider can use AMI to increase peak usage. This increases the demand for high-price electricity generation, so the generation provider makes more money. The insider requires low funding and cyber skills, has low commitment to the goal, high stealth, high physical access, and a long time for implementing the attack.
The insider takes advantage of access to systems at the opposite end of the AMI from the customer end-point; some examples include the AMI head-end, the system that provides pricing information (such as an energy management system or inter control center protocol server), and the network infrastructure for these systems. The impacts of an insider attack on the electric system include higher peak electricity usage and artificially high usage reporting for planning.

Nation threat

This is a high-level threat which targets AMI to cause damage outside the AMI system, for instance to affect the bulk electric grid. Unauthorized access to sensitive AMI devices from the customer endpoint as well as mass load manipulation constitute the major concerns that enable the nation threat.

AMI potentially allows access to the electric grid from the customer endpoints, since there is an inherent connection from the endpoint to sources of pricing information in the utility. By penetrating the customer endpoint or cracking the wireless communication between the AMI meter and other endpoint equipment or from AMI to the local aggregators, the adversary can gain access to the head-end equipment. Details of this attack strongly depend on the AMI implementation.

The primary goal of AMI as a demand-response system is reduction of peak load through managing the customers’ usage. This happens through direct and indirect load control programs. By tampering with the pricing information or direct load control commands an adversary can control the load curve to cause damage to the power system equipment. For instance, in an attack against direct load control, the attacker sends turn-off messages to all controllable customer equipment. After a period long enough to guarantee that most of the equipment would turn on when allowed, the attacker sends turn-on permission messages.
This results in a sudden increase in the load, which can affect the bulk electric grid.

1.1.3 Requirements and Challenges of Intrusion Detection in Advanced Metering Infrastructure

In designing intrusion detection mechanisms for AMI, specific characteristics and constraints of this system must be considered. It has been repeatedly argued that existing IDS solutions designed for traditional IT systems are not applicable in AMI [18]. AMI is a large scale system including several communication networks with different technologies. A traditional IDS, in which a number of lightweight agents report to a central management server, is not applicable in such a system. AMI networks contain millions of nodes. With a central approach for monitoring and intrusion detection, the traffic load, required storage and computational capabilities at the central server will be overwhelming.

One of the most vulnerable systems in AMI is the HAN. HANs are located in the public domain, and therefore are easier targets for attackers. In North America and many other countries, wireless is the dominant HAN technology. Ease of accessibility of HAN devices and application of wireless technology, which uses a shared medium, make them vulnerable to cyber attacks. At the same time, due to the resource and computational constraints of HAN devices, implementation of strong security mechanisms is a challenging issue.

There are many unanswered questions in the area of designing IDSs for HANs. For instance, who should be responsible for managing the IDS alarms? What is the cost of false positives and how many false positives are bearable for a HAN IDS? What are the differences between HANs and other existing sensor networks, and how do these differences affect the process of monitoring and intrusion detection? As described in [2], considering the limited resources of sensor nodes, the IDSs/IPSs must not impose high computational and storage load on network nodes.
At the same time, due to the large scale, the FPR of an IDS must be very low without sacrificing the DR. Otherwise, it will introduce a large operational cost. Designing an IDS with high performance and low resource usage is very challenging. In the area of IDS for sensor networks, focusing on specific attack types rather than general solutions was suggested [19],[20],[21],[22],[23]. Still, the performance of the proposed methods needs to be significantly improved before they can be used in practice. Although a lot of work has been done in the area of intrusion detection, the area of intrusion prevention has been neglected, while automatic prevention can overcome many challenges of managing the IDS alarms. As many of the AMI deployments are located in areas far from the utility, receiving IDS alarms by the utility and acting upon them introduces a large operational cost and delay in stopping the attacks. In traditional IDSs, once an attack is detected, an alarm is sent to a network operator who is responsible for finding the roots of the attack and triggering response operations varying from remote diagnosis to on-site inspection. Considering the large scale of the smart grid, when human response is expected, a small percentage of false alarms results in a high operational cost. Therefore, in the context of AMI, intrusion prevention mechanisms which not only detect but also automatically stop the attacks are highly preferable.

1.2 Summary of Results and Main Contributions

This thesis investigates security challenges in smart grid and proposes several algorithms for detecting and preventing malicious activities against AMI. The main contributions in this thesis are as follows:
• We investigate cyber security and privacy issues in smart grid and challenges of securing this system.
We survey existing solutions to enhance the security and privacy of smart grid and provide directions for further research.
• Identity spoofing is an important class of attacks which can be used as a basis for several other attack types to penetrate or disturb the operation of a ZigBee HAN. Considering the openness property of the transmission medium and inadequate resources for implementing strong security measures, HANs are highly vulnerable to this attack. To determine whether an identity belongs to a legitimate entity or has been counterfeited by a malicious node, forge-resistant parameters are employed. One property that has recently attracted the attention of researchers for detecting spoofing attacks in ZigBee and WiFi networks is the received signal strength (RSS). Several RSS-based techniques have been proposed over the last few years [19],[20],[21],[22],[23]. Yet, existing methods have limited performance, require multiple air monitors (AMs) to provide an acceptable detection accuracy, and are vulnerable to environmental changes. Furthermore, there has been no previous attempt at automatic spoofing prevention using RSS values. As we discussed in Section 1.1.3, automatic prevention in networks like HAN is of great importance. In this thesis we propose a novel high performance RSS-based spoofing detection method for ZigBee-based HANs. By extracting magnitude and frequency related features of the RSS stream and adaptively learning the distribution of RSS values, the proposed algorithm provides superior performance compared to previous works and is robust against environmental changes. We also introduce two techniques for preventing spoofing attacks using RSS values. Extensive analysis and experiments show a high performance for the proposed methods.
The results of this work were published in [24] and [25].
• Several IDSs tailored for AMI networks have been proposed over the last few years. Some of the suggested algorithms [26],[27] apply the same method for detecting attacks in all parts of the AMI. Using the same solution for different AMI networks, including HAN, NAN and head end, which use different protocols and have different traffic features, makes these methods inefficient. Some works focus on specific attack types like false data injection [28],[29], and some cover specific parts of AMI, like NAN [30],[31] and supervisory control and data acquisition (SCADA) [32],[33]. In this thesis we focus on designing a novel high performance intrusion detection and prevention system for ZigBee-based HANs, HANIDPS, as one of the most vulnerable AMI networks. HANIDPS utilizes a model-based IDS along with a dynamic machine learning-based prevention technique to detect and prevent malicious activities without prior knowledge of attacks. Considering that in HANIDPS the prevention operation is performed automatically, the costs of false positives are low and limited to some network overhead. Also, the delay in stopping the attacks is significantly shortened compared to when human intervention is required. This reduces the damage caused by possible attacks. Through analysis and experiments we show that HANIDPS is able to detect various attack types with a high performance. The results of this work were submitted to [34] and published in [35].
• As described in Section 1.1.2, energy theft is one of the major threats against AMI. Current AMI energy theft detection systems (ETDSs) are mainly categorized into three groups: state-based, game theory-based and classification-based. State-based detection schemes [28],[36],[37] employ specific devices, like wireless sensors and radio-frequency identifiers (RFID), to provide a high detection accuracy.
This, however, comes at the price of extra investment required for the monitoring system, including device cost, system implementation cost, software cost and operating/training cost. In game theory-based methods [38],[39], the problem of electricity theft detection is formulated as a game between the electricity thief and the electric utility. These methods may present a low cost and reasonable, though not optimal, solution for reducing energy theft. Yet, how to formulate the utility function of all players, including thieves, regulators and distributors, as well as potential strategies, is still a challenging issue. Classification-based approaches [40],[41],[42],[43],[44],[45],[46],[47] take advantage of the detailed energy consumption measurements collected from the AMI. Under normal conditions customers’ consumption follows certain statistical patterns; irregularities in the usage pattern can be a sign of malicious activity. Since these techniques take advantage of the readily available smart meter data, their costs are moderate. However, there are several shortcomings in existing classification-based schemes which limit their DR and cause a high FPR, including use of imbalanced data, dependency on high sampling rates which threatens customers’ privacy, and vulnerability to non-malicious changes in consumption patterns and to contamination attacks. In this thesis we present a novel consumption pattern-based energy theft detector (CPBETD) which is robust against non-malicious changes in usage pattern and provides a high and adjustable performance with a low sampling rate. Therefore, the proposed method does not invade customers’ privacy. Extensive experiments on a real dataset of 5000 customers show a high performance for the proposed method. We also introduce the instantaneous anomaly detector (IAD), an algorithm for detecting attacks against direct and indirect load control.
IAD monitors the consumption pattern of a group of customers within a NAN, and detects abnormal behaviors with short delay. We have provided a synthetic dataset to evaluate the performance of IAD and observed satisfactory results. The results of these works were published in [48] and [49].

In summary, this thesis proposes several novel algorithms for detecting malicious activities against AMI. The thesis includes two parts. In the first part we target the area of intrusion detection in ZigBee-based HANs, which are among the most vulnerable AMI subsystems. We investigate challenges and requirements of intrusion detection and prevention for HANs and accordingly present a solution that meets these requirements. In the second part, we suggest that the fine-grained usage measurements can be used to detect suspicious behaviors in AMI that can originate from any of the AMI networks including HANs, NANs and WANs. We develop algorithms which, by monitoring irregularities in customers’ consumption patterns, detect major types of malicious activities against AMI, including energy theft and attacks against direct and indirect load control mechanisms.

1.3 Thesis Organization

The rest of the thesis is organized as follows.

In Chapter 2, we introduce several security issues in the smart grid. We survey existing literature on different security aspects of the smart grid, and provide directions for further research. In Chapter 3, we propose an algorithm for detecting spoofing attacks in ZigBee HANs in the future smart grid. The proposed method aims to detect spoofing attacks with a high performance and low resource usage. Further, we introduce algorithms for automatically stopping spoofing attacks once they are detected. We evaluate the performance of the proposed methods through extensive experiments and analysis.
In Chapter 4, we propose an intrusion detection and prevention system for ZigBee-based HANs which is able to detect several attack types without prior knowledge of the attacks. We extract features of HAN network traffic using the corresponding standards to design a specification-based IDS. We also introduce some defense mechanisms for mitigating the attacks and suggest a machine learning approach to find the best strategy against a given attack. In Chapter 5, we propose algorithms for detecting malicious activities against AMI that work based on monitoring abnormalities in customers’ consumption patterns. We introduce CPBETD for detecting energy theft attempts with high precision, and IAD to detect attacks against direct and indirect load control. Through extensive experiments we show the effectiveness of the proposed methods. Finally, the thesis is concluded and some potential future work is introduced in Chapter 6. Each of the main chapters in this thesis is self-contained and included in separate journal articles or conference papers.

Chapter 2
A Survey on Security Issues in Smart Grids

2.1 Introduction

The existing power system, which provides one-way electricity distribution from central power plants to the consumers, is inefficient, unreliable and outdated. This legacy system is not capable of responding to the increasing demand for energy in the near future, nor of satisfying the demands of today’s modern life. According to [50], power outages resulting from power system vulnerabilities are estimated to cost about 100 billion dollars per year in the United States (US). In August 2003 a cascading outage of generation and transmission facilities in North America caused a massive blackout in the Northeastern and Midwestern US and Ontario, Canada, which resulted in a 6 billion dollar loss.
In the same year, major blackouts also happened in several European countries including Denmark, Sweden, and Italy [51].

The vision of smart grid has the potential to bring reliability, efficiency, flexibility, resilience, robustness, and consumer participation to the electricity system by adding a cyber layer to the power grid and providing two-way energy flow and data communications. The two-way energy flow allows easier integration of renewable energy, such as solar and wind, into the electricity system and enables distributed energy generation. The widespread usage of plug-in hybrid electric vehicles (PHEVs) requires a smart grid where customers are aware of the usage prices; accordingly they charge their vehicles in the low tariff period when the consumption is low [52]. Intelligent monitoring and control of the power system as well as demand side management rely on the two-way communication among the smart grid deployments.

Addition of the two-way data communication layer, however, can expose the power system to many cyber security threats. Besides security vulnerabilities introduced by the expansion of the information system, complexity, highly time sensitive operational requirements and the large number of stakeholders will introduce additional risks to the power system.

In this chapter we introduce several cyber security and privacy issues in smart grid. We explain the major security challenges that must be considered in the context of the smart grid. Moreover, we investigate the solutions that have been proposed by researchers, and provide directions for further research to address the existing security problems. The rest of this chapter is organized as follows. In Section 2.2, we briefly introduce the architecture of the smart grid. Section 2.3 explains the significant causes of smart grid security concerns. Section 2.4 addresses the privacy issues.
Cryptography solutions and key management challenges are covered in Sections 2.5 and 2.6 respectively. Section 2.7 investigates security risk assessment and management issues. Section 2.8 addresses the secure architectures for the smart grid, and Section 2.9 summarizes the chapter.

2.2 Smart Grid Architecture

According to a definition by the Electric Power Research Institute (EPRI), smart grid is "a power system made up of numerous automated transmission and distribution systems, all operating in a coordinated, efficient and reliable manner. This power system serves millions of customers and has an intelligent communication infrastructure enabling the timely, secure, and updatable information flow needed to provide power to evolving digital economy". Figure 2.1 shows integration of the data layer to different parts of power system in smart grid. A conceptual reference model has been presented in the US NIST smart grid interoperability record [53]. In this model the smart grid is divided into seven domains including generation, transmission, distribution, markets, operations, service provider, and customer. This reference model explains the interfaces among domains, networks and actors within each domain, and communication between domains and their gateway. From the operational point of view, smart grid is divided into three domains: AMI, distributed generation, and wide area measurement and control (WAMAC).

Figure 2.1: Integration of a data layer to different parts of power system in smart grid allows automated transmission and distribution as well as energy conservation and generation based on demand. (Image by Southern California Edison)

2.2.1 Advanced Metering Infrastructure

AMI provides two-way communications between smart meters and utility companies which enable real-time transmission of power consumption and pricing data, as well as control commands.
This helps customers to adjust their consumption according to the pricing information and helps the utilities to deal with the overload caused by peak electricity demand.

In order to provide a security profile for AMI, the UtiliSec working group has presented a logical architecture for AMI in [54]. The authors believe that although technology choices and features of an AMI system might vary among utilities, at the logical level they are very similar. They have extracted a logical architecture including the logical components of an AMI system and their communications, based on the detailed scenarios and use cases provided by the AMI Enterprise community. This work is valuable in that it provides an inclusive source of information about several components of an AMI system, their information and control signal flow, and their connections, which can be used as a basis for security analysis and developing security solutions for AMI systems.

2.2.2 Distributed Generation

One of the major goals of smart grid is to increase the reliability of the power system. This can be achieved by employment of distributed energy resources (DER). DERs are small-scale power generators. A DER can be a fossil fuel generator like a natural gas turbine, a renewable energy generator such as a wind turbine or a rooftop solar panel, or a battery that can store energy over the low-price periods and sell it back during peak usage hours. Unlike the traditional power grid, which relies on centralized generation and provides one-way power flow from the central generators to the customers, smart grid utilizes two-way power flow. This enables the customers to sell back to the utilities the excess power that has been generated by their local microgrids. DER can relieve grid congestion by diverting power from the power surges, and therefore increases the grid robustness against shutdowns.
Intelligent integration and management is required in order to efficiently combine these devices into the distribution grid. Intelligent integration and management relies on the real-time two-way communication network between DER elements and utilities.

2.2.3 Wide Area Measurement and Control

WAMAC is applied to improve the reliability and visibility of the power system by synchronously measuring the instantaneous state of the system. Unlike the traditional power system, where decision making and execution are performed in the range of multi-seconds to multi-minutes, WAMAC enables the future smart grid to take action within a 100 millisecond time frame. WAMAC consists of phasor measurement units (PMUs) or synchrophasors, which are digital instruments responsible for measuring the parameters of electrical networks, phasor data concentrators (PDCs) responsible for collecting measured information, and supervisory control and data acquisition (SCADA) responsible for central control. The real-time operation and high level of interconnectivity, however, might expose the system to outside attacks.

2.3 Security Threats

The future smart grid is expected to enhance the security and reliability of the existing power system. Without strong security measures in place, however, not only will the smart grid inherit the vulnerabilities of the legacy power grid, but new vulnerabilities will also be added due to application of the new technologies. Many security threats have been reported for the legacy power system until now. In March 2007, the Department of Energy’s (DOE) Idaho National Laboratory (INL) conducted an experiment named "Aurora Generator Test". In this experiment, the exploitation of a security vulnerability in a SCADA system caused physical damage to a diesel generator [55]. Later, in 2008, the INL published a report in which several vulnerabilities of the SCADA system were categorized and described [56].
Still, there are many flaws in the legacy power system which are not publicly announced. Considering that the smart grid is built on top of the existing power grid, it is crucial to improve the security of the legacy system.

The supporting technology for the smart grid includes several devices located in physically insecure environments, such as smart meters, intelligent appliances, and distributed generation and storage equipment. These devices have two-way communication to the electric system, and therefore add numerous entry points to the grid. Because of their unprotected locations, it is easier for attackers to exploit the vulnerabilities of these devices and either cause local damage or, benefiting from the two-way communication, gain access to the more critical parts of the network. In [3], the authors explain how important data like authentication keys can be extracted from the memory of the smart meters and malicious code can be inserted into these devices to launch attacks against other parts of the grid. Considering the large scale of the smart grid, one single software vulnerability in a device, like a smart meter, can be used to compromise millions of devices.

Wireless technologies are widely used in the smart grid deployments because of their low cost, low power consumption, ease of installation, etc. On the other hand, wireless networks are inherently more vulnerable than wired networks to several types of passive and active attacks, such as eavesdropping and denial of service, in that they communicate through the shared medium of air. The ZigBee standard, which is the dominant technology for HANs in North America, is still in its early stages and its security has not been evaluated broadly. Still, serious vulnerabilities have been reported in the ZigBee protocol [57],[58],[59],[60].

Smart grid is an attractive target for different attackers with various motivations.
Unethical customers, publicity seekers, curious or motivated eavesdroppers, etc. might target the grid for a variety of malicious reasons. Smart grid is a critical infrastructure that many other utilities depend on; therefore, it not only attracts ordinary hackers with less harmful intentions, but terrorists might also want to disrupt the grid. When many individuals with high motivations and rich resources target the system, the risk of finding and exploiting the vulnerabilities and penetrating the system increases.

2.4 Privacy

Customers’ concern about their privacy is one of the major obstacles to public adoption of smart grid. Unlike the traditional power grid where metering data is read monthly, in the smart grid detailed energy usage data are collected through smart meters at much shorter time intervals (about every 15 minutes or less). While these precise data are critical to efficient electricity distribution, demand-side management, load management, etc., they might reveal a great amount of valuable and intimate information about customers, ranging from the energy usage patterns and the types of household devices and appliances, to information about the number of individuals in a house and their specific activities [61]. A lot of research has been done on nonintrusive appliance load monitoring (NALM) techniques. NALM techniques can identify the individual household appliances by comparing the energy usage pattern to libraries of known usage patterns (signatures) of different appliances [62]. NALM algorithms can use the detailed data obtained from smart meters to reveal intimate information about customers’ habits. Another challenging issue is the ownership of the valuable collected data. Several entities can benefit from the data, including utility companies and appliance manufacturers.
The ownership and accessibility of the data should be identified clearly.

One solution to mitigate the privacy problems in the smart grid is to anonymize data whenever possible, so that the meter readings cannot easily be associated with a specific customer. Although some of the collected data should be attributable, for example for billing purposes, most of the high-frequency readings, used for power generation and distribution network control, are not required to be attributable to an identified customer; instead, knowing the substation or the neighborhood of the gathered data will suffice. Anonymization is a rich research topic and several algorithms for different applications have been proposed. Along with adopting the existing algorithms, innovative methods which better fit the characteristics and requirements of the smart grid are required. In [22], a method for anonymization of the smart meter data was proposed. In this approach the meter readings are divided into two categories, low-frequency data and high-frequency data. Each data group is transferred with a specific identifier (ID): a low-frequency ID (LFID) or a high-frequency ID (HFID). The former is attributable, while the latter is anonymous. The HFID is hard-coded inside the smart meter and known only by the manufacturer and a trusted third party responsible for ID verification of data at the aggregators. The disadvantage of this method is that the HFID is a fixed value; considering that the low-frequency and high-frequency data are correlated, it is possible to correlate the HFID and LFID pairs.

Appropriate system design is another way to reduce the privacy invasion risks. For instance, if optimization of energy consumption is performed locally through intelligent appliances and the customer Energy Management System (EMS), the necessary granularity of data transfer will be reduced [2]. In [63], a local power management system model was proposed.
This model uses a rechargeable battery to moderate the exposure of the load signature. The idea is to resist the changes in power load so that the metered load stays constant. The battery discharges and recharges when the energy usage increases and decreases respectively; therefore, the cumulative energy usage stays almost the same over time. Although this method provides a high level of protection, it involves some practical challenges. One significant problem with the proposed method is that the algorithm depends on the usage of a rechargeable battery, which might not be available in all HANs in the future smart grid. In addition, applying this method to provide privacy might conflict with power usage efficiency, which is one of the main goals of the smart grid. For instance, maintaining the constant load might force the battery to charge during the peak usage period.

2.5 Cryptography

Smart grid collects data from a large number of devices, such as smart meters and smart appliances. The collected data are used to manage the demand-response and integrate the distributed electrical energy resources. The data are normally transferred through wireless links which are not secure in nature. Therefore, strong security measures must be in place to protect the critical communication assets. Encryption and authentication, although not enough, can play a significant role in improving the integrity and confidentiality of the data. Constraining issues that should be considered in designing cryptographic algorithms for the smart grid include:
• Some smart grid devices, such as home appliances and residential meters, are deployed at very large scale. In order to be cost effective, they have limited computational power, memory and storage.
Although it is expected that most future smart grid devices will support basic cryptographic capabilities, these constraints must be considered in designing the cryptographic algorithms.
• Some communication channels in the smart grid are designed to transmit short messages and therefore have low bandwidths. Integrity protection mechanisms such as Cipher-Based Message Authentication Code (CMAC) [64] add 64 to 96 bits to every message [2]. This leads to a high overhead and might cause latency which is not affordable in many applications in the smart grid. Bandwidth limitations should also be considered in designing authentication algorithms for applications in which data are transferred with high frequency, such as wide area protection.
• Unlike many other IT systems where confidentiality and integrity are the most important cyber security properties, in many parts of the smart grid availability gets the highest priority. It is more important to make sure power is available than to make sure the data regarding power flow are confidential. Therefore, in designing security measures such as authentication schemes, it should be considered that degrading the availability is not an acceptable cost.
• Smart grid equipment has a much longer lifetime (about 20 years) than typical IT systems, and might outlast the lifetime of its cryptographic algorithms. Testing and replacing these devices often takes longer and costs more, due to the large scale and availability requirements. In designing the cryptographic modules for such devices, evolvability and upgradability must be considered to enable future changes. Also, using cryptographic schemes that exceed current security requirements is encouraged to postpone the possible need for future upgrades.
• Real-time operation is necessary for many smart grid deployments.
Cryptography and authentication algorithms for such systems require minimum computational cost and packet buffering.

Besides several existing standard encryption algorithms and authentication schemes that can be adopted in different parts of the smart grid, research on new methods that best meet the unique characteristics of the smart grid systems is required.

Identity-based encryption (IBE) is a new public key cryptographic system in which the public key is generated based on pre-agreed information bound to the user identity. The sender can encrypt the message with the receiver’s public key without communicating with any trusted third party. Instead, the receiver obtains its private key from a key-generating server (KGS) only if it wants to decrypt the message. Once the receiver has obtained its private key, it might not need to communicate with the KGS as long as the private key validity period has not expired. Some advantages of IBE over standard public key infrastructure (PKI) include simple system setup, reduced key exchange data traffic, and asymmetric key management requirements. These qualities are beneficial in many parts of the smart grid. For instance, in a HAN the smart appliances’ sensors might be subject to energy constraints, which is not the case for smart meters. Using IBE can off-load the key message traffic from the sensors to the smart meter, which has fewer resource constraints. However, in an IBE scheme the KGS should be absolutely trustworthy, which might be impractical in large deployments. In such scenarios hybrid approaches which combine PKI and IBE are preferable.

So et al. [65] have proposed an identity-based signcryption (IBS) scheme for end-to-end data encryption, authentication, and message integrity in the AMI communication network. Their scheme includes two phases: a registration phase and a data transmission phase.
In the registration phase, when a device wants to decrypt a received message or sign a message to be transferred, it registers with a KGS to obtain its private key. The KGS holds the master key of the system and generates the private key based on this master key. Once the device has obtained its private key, it can communicate with other devices in the network without contacting the KGS again. In the data transmission phase, when device A wants to transfer a message to device B, it calculates the public key of B using B's unique ID. Then A encrypts the packet using an advanced encryption standard (AES) block cipher with a unique key generated based on B's public key. Upon receiving the encrypted packet, B decrypts the packet with its own private key and verifies the content of the packet using A's public key. So et al. [65] have used Tate pairing [66] on an elliptic curve cryptosystem to generate a shared secret key pair between the message sender and receiver. This shared secret is used for authentication and encryption purposes. They have further improved the throughput of their signcryption scheme by proposing a key caching mechanism to reduce the number of Tate pairing calculations. Simulations in [66] confirm that the processing time and computational overhead of the proposed method meet the smart grid requirements. One advantage of elliptic curves is the adjustable level of security that this method can provide. A higher level of security needs more calculation time and therefore causes more delay. One can make a trade-off between security level and time delay by adjusting the algorithm parameters according to the specific requirements of different applications in the smart grid. Some other advantages of the proposed scheme include a simple key management mechanism and independence from pre-device software configuration by users.
More research, however, is required to evaluate the robustness of such an algorithm against different cyber attacks, as well as research on key revocation and key update mechanisms.

In-network aggregation methods can be applied to improve the efficiency of many-to-one communications in the smart grid. In these methods the computational load of data aggregation is distributed among the intermediate nodes of the network. In doing so, the intermediate nodes need to perform computational operations on the data packets in transit, which is impossible if the data is encrypted by traditional cryptographic algorithms. Homomorphic encryption schemes, on the other hand, allow some mathematical computations on the cipher text.

In [67], Li et al. presented a distributed data aggregation method. In their approach, data aggregation is performed through the smart meters participating in routing the data from the source smart meter to the data collector. Each intermediate smart meter adds its own data to the received data packet coming from the previous smart meter in a tree structure. The aggregator then receives the accumulated data of each tree branch. The authors of [67] applied the Paillier homomorphic encryption scheme [68], so that the middle nodes can add their data to the encrypted data of previous smart meters without accessing the plain text information. Compared to traditional aggregation methods, the proposed scheme reduces the computational load of data aggregation, while it slightly increases the computational load of the smart meters. In traditional approaches, each smart meter encrypts its message with its public key, and then the collector has to decrypt all the received messages from all smart meters separately. In [67], on the other hand, in addition to encrypting messages with homomorphic encryption, each smart meter has to do some mathematical operation to aggregate the messages.
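The additive property that makes this kind of aggregation possible can be illustrated with a toy Paillier implementation. The following is a minimal sketch, not the implementation of [67]: the function names, the tiny primes 1009 and 1013, and the sample readings are assumptions chosen for illustration, and keys of this size provide no security.

```python
import math
import random


def keygen(p, q):
    """Build a toy Paillier key pair from two small primes (illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1                       # a common, simple choice of generator
    mu = pow(lam, -1, n)            # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return (n, g), (lam, mu)


def encrypt(pub, m):
    """Encrypt integer m (0 <= m < n) with fresh randomness r."""
    n, g = pub
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    n, _ = pub
    return (c1 * c2) % (n * n)


def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


# Hypothetical branch of three meters: each one folds its own encrypted
# reading into the ciphertext received from the previous meter.
pub, priv = keygen(1009, 1013)
readings = [12, 7, 30]
aggregate = encrypt(pub, readings[0])
for reading in readings[1:]:
    aggregate = add_encrypted(pub, aggregate, encrypt(pub, reading))

total = decrypt(pub, priv, aggregate)  # a single decryption at the collector
```

Here `total` equals the sum of the readings, although no intermediate meter ever sees a plaintext value other than its own.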
The collector, in turn, needs only one asymmetric decryption for all of the smart meters in its tree. Considering the significant decrease of the required computation in the aggregator, the computational overhead of the smart meters is acceptable. This scheme, however, is not resistant against false data injection. Having the cipher text and the public key, an illegitimate smart meter can falsify the aggregation result by injecting a false cipher text that decrypts to meaningful data. Research on mechanisms to detect malicious manipulation of aggregated data, as well as on homomorphic encryption algorithms that match the requirements of in-network aggregation in the smart grid, is required.

Applications and requirements of multicast authentication in the smart grid were investigated in [69]. The authors of [69] proposed a new one-time signature scheme with lower storage cost and smaller signature size compared to the existing methods, at the cost of increasing the computational requirements for signature generation and verification. However, the effect of the increased computation has been mitigated by flexible allocation of computations between sender and receiver according to their computing capabilities. Multicast authentication has different requirements at different points of the smart grid. While bandwidth is important in wide area protection, due to the high frequency of data transmission, it might not be a crucial factor in HANs. Instead, storage limitations might be a concern for home appliances. The scheme in [69] is particularly appropriate for applications with high multicast frequency and small message size, such as demand-response and wide area protection.

Fouda et al. [70] proposed a light-weight message authentication algorithm for smart grid authentication. They adopted the Diffie-Hellman key establishment protocol [71] in their proposed authentication method. The integrity of messages was also assured by applying a hash-based message authentication code.
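As an illustration of the integrity check only, a hash-based message authentication code can be computed and verified as sketched below. This is not the protocol of [70]: the choice of SHA-256, the key value, and the message format are assumptions, and in practice the shared key would come from the key establishment step rather than being hard-coded.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC tag over the message with the shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


shared_key = b"example-session-key"   # hypothetical; normally derived per session
message = b"meter=42;kWh=1.25"        # hypothetical message format
tag = make_tag(shared_key, message)

accepted = verify_tag(shared_key, message, tag)
rejected = verify_tag(shared_key, b"meter=42;kWh=9.99", tag)
```

A receiver holding the same key accepts the original message and rejects the altered one, at the cost of appending a fixed-size tag (32 bytes for SHA-256) to each message.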
Fouda et al. [70] compared the performance of their scheme to the elliptic curve digital signature algorithm (ECDSA) through computer simulations. ECDSA is considered to be a secure protocol for smart grid demand response communication. They concluded that their approach has higher scalability, less decryption/verification delay, and less memory usage. Low latency and few message exchanges for authentication make this algorithm suitable for mutual authentication among smart meters. More research is required to evaluate the resistance of the proposed mechanism against passive and active cyber attacks.

2.6 Key Management

Another challenging issue in utilizing cryptographic algorithms for the smart grid is key management. Smart grid deployments, such as distributed automation, AMI, distributed generation, and substations, include a large number of devices with many resource constraints. These devices are responsible for providing remote sensing and remote control. Research is needed on economical, large-scale and flexible key management schemes that require less centralization and persistent connectivity, on a scale of possibly millions of keys and credentials [2].

A key management scheme tailored for smart grid deployments was proposed by Wu et al. [72]. They presented an authentication scheme based on elliptic curve public key cryptography [73] and the Needham-Schroeder symmetric key authentication protocol [74]. The public key was used to securely establish the symmetric keys of the agents. Then the symmetric key generated on the fly is employed in the authentication protocol. The authors investigated the robustness of their proposed mechanism against a variety of security attacks and concluded that their scheme is resistant against replay and man-in-the-middle attacks.
Simplicity and scalability are some benefits of their proposed approach. However, no experimental proof regarding the performance of the algorithm, especially at large scale, was provided.

Group key management issues for advanced distribution automation systems were investigated by Sun et al. [75]. They further presented a novel key management scheme based on a decentralized three-tier hierarchical network model. Their scheme uses a one-way key derivation algorithm in order to minimize the number of rekeying messages. Efficient key update and key storage, along with scalability, are the advantages of their proposed method.

The authors of [76] discussed the application of public-key and symmetric-key schemes for key management in electricity transmission and distribution substations. They investigated possible attacks and failure modes of these two schemes and compared their advantages and disadvantages [76].

2.7 Security Risk Assessment and Management

Risk management is the practice of identifying the critical assets, finding their vulnerabilities, assessing and prioritizing the risks, and establishing appropriate risk mitigation tactics. Considering the evolving nature of smart grid vulnerabilities, appropriate risk management frameworks must be designed to improve the security and reliability of the power system against deliberate attacks, inadvertent errors, equipment failures, or natural disasters. Risk assessment is a critical step in the risk management process. In order to decide how much security is required at a specific point of the smart grid, a risk assessment methodology is needed. Risk assessment recognizes the system vulnerabilities and the threats that might exploit these vulnerabilities, and estimates the extent of harm that these threats might cause. This enables the security personnel to prioritize the risks and accordingly implement more efficient and cost-effective security measures.
Considering that the smart grid is a large-scale system which integrates physical and cyber systems, implementing a systematic risk assessment methodology is a challenging task.

One of the initial works in the context of cyber security assessment of the power grid was done by DOE's infrastructure assurance outreach program [77]. A few works have focused on risk assessment and management methodologies in the context of the smart grid.

Ray et al. [78] investigated different approaches for security risk management of the smart grid. They believe that although the future smart grid entails both industrial control systems (ICS) and IT systems, which have different performance requirements and might be the target of different types of attacks, unified risk management approaches are required to enhance the security and reliability of the smart grid information exchange system. The authors propose threat and vulnerability modeling schemes to categorize the threats, analyze their impacts, and accordingly prioritize them. They use these schemes to provide quantitative methodologies for risk assessment. In quantitative risk assessment approaches, security metrics for assets, including systems and applications, are defined; then the threat risk, which is the probability of exploiting a vulnerability to cause adverse effects, is calculated by performing mathematical computations on the security metrics. Although quantitative approaches are accurate and consistent, quantification of attack probabilities is very difficult, especially for a complicated system like the smart grid, which lacks a large database of prior attacks.

Kundur et al. [79] presented a framework for cyber security attack impact analysis of the electric smart grid. They used directed graphs to provide a unified model for the relationships among both cyber and physical grid elements.
The graph nodes represent the grid elements, while the graph edges show the relationships among different grid components. The state information of each graph node is modeled by a dynamic system equation, which captures the physics of the interaction of electrical grid elements and the functionality of the cyber grid components.

In [80] two methods for cyber security vulnerability assessment of the power industry were proposed. One method is a probabilistic assessment in which a vulnerability index is assigned to each of the cyber systems of the power grid by the use of probabilistic methods. To this end, the probability of occurrence of cyber security events, the probability of the accidents resulting from these security events, and the loss caused by these accidents are calculated. This information is used to calculate the vulnerability index of each of the cyber systems. The authors believe that this method is more appropriate for control systems of the power industry. However, identifying a probability distribution for the cyber security events can be challenging. Some works in this regard were done in [81] and [82]. The other method is an integrated risk assessment in which the level of cyber security risk is first categorized into five groups, and then a risk matrix is set up. The risk matrix includes the percentage of cyber security risk relating to each category, the probability of accidents caused by the security events, and the influence of the accidents on the power industry. The vulnerability index is calculated based on this information. The authors believe that this method is more appropriate for management information systems. This work is valuable in that it provides two procedural methods for vulnerability assessment of the cyber systems in the power industry, which can be extended for use in the smart grid.
However, no proof regarding the efficiency of these two methods was presented in [80].

2.8 Secure System Architecture

Intelligent electronic devices (IEDs) will be widely used in the future smart grid to participate in critical tasks such as tripping circuit breakers in case of anomalies and raising or lowering voltage levels. IEDs are expected to be remotely controllable and updatable, which might expose them to a variety of cyber attacks, including illegal command execution by a spoofed remote device, or attacks through software and firmware updates. Therefore, improving the security of IEDs is of great importance. Considering the scale and resource limitations of such devices, the need for research into cost-effective and mass-producible architectures that are highly tamper resistant and remotely recoverable, as well as into secure firmware/software upgrade methods, was emphasized in [2].

The need for code-auditing capabilities in AMI deployments, which can be provided through remote attestation, was emphasized by the UtiliSec AMI security task force [83]. According to [84], "remote attestation is the process whereby a remote party can obtain certified measurements of parts of the state of a system". LeMay et al. [84] have presented an architecture named cumulative attestation kernel (CAK). This architecture can provide secure remote firmware auditing capability for networked embedded systems. In their approach an unbroken sequence of application firmware revisions is recorded in the kernel memory as audit logs. During the attestation process, a signed version of this audit log is provided to the verifier. In designing this architecture they considered the limitations of the embedded systems that are typically applied in AMI.
Furthermore, they developed a prototype CAK that satisfies the requirements of advanced electric meters, and they investigated the security and fault-tolerance properties of their proposed architecture. However, in the threat model for this architecture the authors excluded extraordinary environmental phenomena, such as cosmic radiation. Moreover, they assumed that physical attacks on microcontrollers, for instance probing, silicon modification, and fault analysis, are blocked by independent tamper-resistance techniques built into the device's package. Although this prototype has been designed for smart meters, it can also be extended for use in IEDs.

2.9 Summary

In this chapter, we have focused on security aspects of the smart grid. We have described some security challenges that are introduced by the special characteristics, architecture, constraints, and goals of the smart grid. We have investigated existing solutions to address these security issues and reviewed some works that have been done in this area over the last few years. Moreover, we have provided directions for further research. Although there are still some uncertainties regarding the architecture and function of smart grid systems, it is crucial to design security solutions before widespread deployment. In addition to adoption of traditional security measures, research on new methods that meet the special requirements of the smart grid is needed.

Chapter 3

Spoofing Detection and Prevention System for ZigBee-Based Home Area Networks

3.1 Introduction

In this chapter we focus on designing a novel spoofing detection and prevention system for ZigBee-based HANs. Identity spoofing is an important class of attacks which, by exploiting the openness of the transmission medium, is launched more easily in wireless networks than in wired ones. In a spoofing attack, an adversary masquerades as one or more legitimate nodes to inject malicious traffic on their behalf.
Spoofing is a basis for several other attacks, including various types of denial of service (DoS), session hijacking, etc. Therefore, designing appropriate spoofing detection and prevention mechanisms is of crucial importance and an open research topic.

Most existing systems rely on cryptographic methods to prevent spoofing attacks. However, the long history of breaking the authentication and encryption mechanisms employed in wireless networks shows the inadequacy of such approaches in guaranteeing a spoof-free network [85]. Furthermore, a variety of DoS attacks work in layers 1 and 2 of the network protocol stack, while encryption usually covers the upper layers. In addition, resource limitations of wireless sensors hinder the implementation of strong cryptographic schemes.

To determine whether an identity belongs to a legitimate entity or has been counterfeited by a malicious node, forge-resistant qualities are utilized. One commonly used characteristic is the sequence number of data-link layer frames [86]. Sequence numbers are linear chains of numbers assigned to frames by network cards. It is assumed that since sequence numbers are allotted by network cards, attackers cannot create a stream of packets that matches the sequence numbers of the legitimate traffic. Therefore, gaps between sequence numbers could be employed to detect the presence of sybil nodes. However, a myriad of free packet generator tools now exist which enable attackers to manipulate the desired fields of every frame.

Another property that has recently attracted the attention of researchers is RSS. According to the laws of physics, signal strength at a receiver antenna decreases with the spatial distance between the receiver and the sender.
Assuming that the sybil and legitimate nodes are located in different places, the RSS spatial correlation can be used to discriminate between entities using the same identity. Besides distance, RSS depends on wireless environment features, such as absorption and multipath effects, which makes it hard to predict the power level of frames collected by a given receiver. Thus, sybil nodes cannot simply adjust their power levels to match the RSS of the legitimate nodes. Considering that RSS is a physical property that is hard to forge and is highly correlated with the transmitter's location, using RSS for spoofing detection has the potential to provide much better performance than a nonphysical parameter like the sequence number, which is much easier to forge. This is why using RSS for spoofing detection has recently attracted a lot of attention, and why we have used it in our work. On the other hand, RSS is a random variable with Gaussian distribution [19]. The variance of RSS values limits the resolution of RSS-based detection methods. To address this problem, multiple air monitors (AMs) are employed [19],[20],[21],[22]. AMs are responsible for listening to the network traffic and analyzing the RSS values of received frames. Increasing the number of AMs facilitates finer differentiation between entities that are located close together and/or have close RSS values. The downsides of using multiple AMs are the extra cost of the additional devices, as well as the need for secure and reliable connections between several AMs and a central server. Moreover, relying on multiple AMs complicates the development of preventive measures. In this chapter, we introduce a novel RSS-based algorithm which provides high detection accuracy even with a single AM.
Unlike most existing solutions, which directly process the RSS values of the packet stream, we employ feature extraction techniques to reduce data redundancy and obtain a better representation of the data. We extract two features of RSS streams: the ratio of out-of-bound frames, and the summation of detailed coefficients (SDC) in the discrete Haar wavelet transform (DHWT) of the RSS streams. The former is effective in detecting spoofing attacks where the attacker and the genuine nodes are sufficiently far apart, or when the traffic rate of the malicious node significantly outweighs that of the genuine node. SDC is useful when the legitimate and malicious nodes are located in close proximity or have close RSS mean values. Moreover, we suggest adaptive learning of RSS mean values, which reduces the false positives caused by environmental changes. In addition to the spoofing detection algorithm, we introduce two algorithms for automatically distinguishing and filtering malicious packets once a spoofing attack is detected. The contributions of this chapter are summarized as follows:

• We survey existing RSS-based spoofing detection mechanisms, and discuss their weaknesses.

• We design a novel robust RSS-based spoofing detection mechanism with low computational and resource overhead. While existing methods rely on multiple AMs for accurate attack detection, the proposed approach provides high detection performance with a single AM, and superior performance over other existing methods when using multiple AMs.

• For the first time, we introduce two methods for spoofing prevention using RSS values: static threshold and dynamic threshold. The former has very low computational requirements, yet due to its high FPR it introduces some network overhead. The latter needs more computations.
However, it has a higher accuracy and a very low network overhead.

• Extensive performance analysis and experiments show that the proposed algorithms can detect and stop spoofing attacks with high accuracy and low overhead.

The remainder of this chapter is organized as follows. In Section 3.2 we survey the existing RSS-based spoofing detection methods. The HAN architecture is described in Section 3.3, and the threat model is provided in Section 3.4. Sections 3.5 and 3.6 explain the proposed spoofing detection and prevention algorithms, respectively. In Section 3.7 the performance of our approach is analyzed. Experiment results are described in Section 3.8. Section 3.9 includes a discussion of the proposed method and a comparison with previous approaches. Finally, Section 3.10 summarizes the chapter.

3.2 A Survey on RSS Based Spoofing Detection Methods

In [19] a method for detecting spoofing attacks in wireless networks based on signalprints was proposed. A signalprint was defined as a vector containing RSS readings at multiple AMs. Signalprints of the traffic generated by a single node are expected to be similar. Dissimilar samples suggest the presence of an attacker. As a dissimilarity measure, the number of vector elements differing from a mean value by more than a predefined threshold was counted. The threshold value was defined based on the variance of RSS values. When the number of out-of-bound elements of a vector exceeded a specific number, an attack alert was raised. For an IEEE 802.11 testbed and 6 AMs, the authors reported 95% DR without mentioning the rate of false positives. This approach requires a high number of AMs to achieve desirable performance. The authors did not provide any updating mechanism for the mean values of the RSS stream at the AMs, which may cause a high FPR in the long term due to environmental changes.

Spoofing detection in IEEE 802.11 transmitters with antenna diversity was targeted in [21].
The authors showed that as a result of antenna diversity, the RSS distribution function tends to a multi-Gaussian model, instead of the single Gaussian assumed in other literature. They further showed that the difference between the mean RSS values of the traffic generated with different antennas of the same node is more than 5dB (5dB is the variance of the RSS Gaussian model used in other literature, which is an important factor in defining threshold values for classifiers). For each AM, an RSS profile was built; then, for a sequence of RSS samples, a likelihood-ratio test was performed to detect deviations from the AM profiles. Updating the RSS profiles once or twice per day was suggested to deal with the effect of environmental changes on the distribution function. In an IEEE 802.11 testbed in an office building, using a single AM they achieved 73.4% DR with 3% FPR. For the same FPR, by increasing the number of AMs to 20, the detection rate improved to 97.8%. This work is valuable in that it is the only method effective for multi-antenna transmitters. However, this approach is not efficient for single-antenna transmitters, since it requires a high number of AMs, heavy computation, and substantial resources. Besides, one or two profile updates per day might not be adequate to avoid false positives.

In [22] a technique for detecting spoofing attacks and localizing the position of adversaries was introduced. The authors used k-means clustering [22] for attack detection. For each frame, an n-dimensional vector of RSS readings at n different AMs was defined. Then, utilizing the k-means algorithm, m vectors corresponding to a stream of m frames were divided into two clusters. Assuming a Gaussian distribution with 5dB standard deviation, a threshold was defined for the distance between the centers of the clusters under normal conditions.
When the distance exceeded the threshold value, a spoofing alert was raised. The performance of the method was tested in both IEEE 802.11 and IEEE 802.15.4 network testbeds, each with four AMs. For 10% FPR, [22] achieved 95% DR. Moreover, the authors of [22] studied the effect of the distance between the spoofing and original nodes on detection performance, and concluded that the farther the spoofer is from the original node, the higher the detection rate. For IEEE 802.11, the detection rate was reported to be more than 90% when the distance is about 13 feet, while for IEEE 802.15.4 the same detection rate was obtained for distances of about 20 feet.

The most recent work in the area of RSS-based spoofing detection is [20], which presents methods for spoofing detection, finding the number of attackers, and locating multiple adversaries. For the detection phase, they used the partitioning around medoids (PAM) algorithm. PAM clustering is similar to k-means, yet it is more robust against noise and outliers. For discovering the number of attackers, two methods were suggested, Silhouette plot and SILENCE. Both methods were based on finding the number of clusters in a clustering problem, where each cluster contains samples of the same distribution. This approach is effective as long as the adversary node does not change its transmission power. A single attacker utilizing different power levels can produce multiple clusters. The performance of this method was assessed in IEEE 802.11 and IEEE 802.15.4 testbeds, with 5 and 4 AMs, respectively.
For 5% FPR, DR was above 90% when the distances between the malicious and genuine nodes are less than 15 feet and 20 feet for IEEE 802.11 and IEEE 802.15.4 networks, respectively.

The major drawback of clustering-based approaches such as k-means and PAM is that when the malicious traffic significantly outweighs the benign traffic, benign frames are treated by the clustering algorithm as outliers. In this case the malicious traffic is divided into two clusters; since both clusters belong to the same origin, the attack will not be detected. Therefore, clustering-based methods cannot detect high traffic rate spoofing attacks, which include most denial of service (DoS) attacks, such as the back-off manipulation attack. In addition, the attacker and the genuine nodes do not necessarily communicate with the victim at the same time. An attack can happen when the genuine node is silent or has a very low traffic rate.

The only work that tries to provide a more suitable representation of the data, by converting the time series of RSS values into the frequency domain, is [23]. In [23], signal strength Fourier analysis (SSFA) was utilized. The intuition behind this method was that under normal conditions only low-frequency oscillations exist. During spoofing attacks, on the other hand, the genuine frames are interleaved with malicious frames, which generates high-frequency components. Using the fast Fourier transform (FFT), the energy of the high-frequency components was compared to a threshold. Exceeding the threshold value was interpreted as a spoofing attack. The advantage of this method is that using a single AM it can achieve better detection performance than other methods.
However, when the traffic rate of the original and spoofing nodes surpasses a specific range, this method will not be effective. Besides, relying on the high-frequency component of the Fourier transform introduced a 0.2 second delay in attack detection.

The goal of this chapter is to provide a resource- and time-efficient algorithm to detect a vast range of spoofing attacks. While most of the previous works, [19],[20],[21] and [22], tried to improve the detection performance by applying different classification techniques to the raw RSS values, and achieved almost similar results, we focus on providing a thorough and more distinctive representation of the data. We show that projecting the data into a feature space that includes both magnitude- and frequency-related components can overcome the limitations of [19],[20],[21] and [22], and allows high detection performance even with one AM.

Our work is motivated by the method proposed in [23] in leveraging the fluctuations in the RSS stream for attack detection. Yet we take a different approach; instead of the FFT and the energy of the high-frequency component, we employ the DHWT, which is more time and resource efficient. Accordingly, we introduce the SDC parameter, which is highly separable for benign and malicious traffic. In [23] only high-frequency components are used for attack detection, which not only imposes a high detection delay, but also is ineffective in detecting high rate attacks. In addition to the fact that the DHWT is faster than the FFT, by avoiding the reliance on the frequency feature for highly separable attacks (large differences between RSS values or a high attack ratio), we further improve the detection delay.

Another advantage of the proposed method compared to [19],[20],[21] and [22] is robustness against environmental changes, achieved through adaptive learning of legitimate RSS values.
Overall, as we show in the rest of the chapter, the proposed algorithm provides significantly better performance in terms of resources and detection accuracy compared to previous works.

3.3 Home Area Network Architecture

A ZigBee HAN primarily contains two types of devices: a smart meter, which connects the HAN to a NAN, and home devices inside the HAN. Three different HAN topologies were defined in the SEP 1.X [87] specification. In a utility enabled HAN (UE-HAN), all HAN devices are under the control of the utility. In this architecture the smart meter acts as the ZigBee MAC layer coordinator, trust center, and energy service interface (ESI). Another topology is the consumer private HAN (CP-HAN), where an application layer gateway (ALG) connects the smart meter to home devices which are completely governed by the customer. The smart meter passes the usage data and pricing information to the ALG, which plays the role of the network coordinator and can also be the security center. The third topology is a hybrid of UE-HAN and CP-HAN, where some devices are directly controlled by the utility, while others are managed by the customer. Typical devices in a CP-HAN are shown in Figure 3.1.

Figure 3.1: Architecture of a CP-HAN.

3.4 Threat Model

In a spoofing attack, at least three entities are involved: a legitimate node, an attacker, and a victim. The legitimate node is allowed to exchange information and command messages with the victim. The victim uses the identity of the legitimate node to decide whether packets are coming from a genuine node or not. For instance, in IEEE 802.15.4 networks node identities are defined by node ID. The attacker eavesdrops on network traffic to extract the identity of the legitimate node; then, by forging its identity, sends malicious traffic to the victim on its behalf.
Using a spoofing attack as the basis for other attack types, an adversary can disturb the normal operation of the network, gain control of home devices, remove some devices from the HAN, and affect the load curve. The adversary's goal can range from simply unsettling a neighbor to more malicious intentions such as inducing an inefficient usage pattern, which can cause financial loss and damage to the customer and the utility.

Through spoofing, an adversary can directly send control commands such as turn on/off to the smart appliances by forging the identity of authoritative nodes like the energy management system (EMS). An adversary might also send false data to the governing nodes to indirectly control network nodes. For instance, by masquerading as the smart meter, an adversary can send false price and usage data to the EMS and the customer display; in turn, these devices control the HAN as the attacker wants.

In order to mount a spoofing attack, the attacker node needs to be within the network communication range to be able to exchange messages with the network nodes. The attacker might also need to know the authentication credentials and encryption keys where encryption is supported. This is especially true for upper layer attacks.

3.5 Spoofing Detection Algorithm

Our proposed spoofing detection and prevention system consists of two modules: a spoofing detection module, which is installed on the security center and is responsible for detecting spoofing attempts, and prevention agents, which are installed on each sensor node to filter malicious traffic. Both modules work based on the RSS values of received frames. The detection system keeps track of the RSS values of all network nodes and analyzes them for signs of spoofing attempts. Once an attack is detected, the detection module sends a secured message to the victim, informing it about the presence and identity of an attacker.
The intrusion prevention agent then distinguishes and filters malicious frames. The detection algorithm is as follows.

Spoofing detection can be formulated as a statistical significance testing problem. The null hypothesis is defined as:

$H_0$: benign traffic (no attack)

Test statistics are then used to decide whether the observed data belongs to the null hypothesis. In order to achieve high detection performance in terms of number of AMs, false positive/negative and detection rate, we utilize two parameters to represent the features of the stream of RSS values. We use the ratio of out-of-bound frames, which deals with the magnitude of RSS values. Further, we apply the DHWT to the time series of RSS values and use the SDC to measure the oscillations in the data stream. In the presence of a spoofing attack, the RSS time series has more fluctuations, since the legitimate packets are interleaved with forged packets with possibly different RSS values. In Section 3.7 we show that under a variety of attack scenarios, SDC provides a more separable distribution function (compared to RSS), which allows accurate attack detection even when the genuine and attacker nodes are in close proximity.

3.5.1 Operation Phase

The data stream is divided into windows containing $2^n$ frames. The window size is selected according to the number of samples required as input to the DHWT. Following [19], [20], [21] and [22] we assume a Gaussian distribution for RSS. In these papers the RSS distribution was examined in indoor environments with areas of about 2000 ft², which is similar to the physical environment that we target in this work.

Step 1: For each captured frame, the RSS is compared with the mean value, $\mu_{global}$, of the Gaussian distribution. A counter keeps track of the number of RSS values differing from $\mu_{global}$ by more than a threshold, $th_{RSS}$. $th_{RSS}$ is related to the variance ($\sigma$) of the Gaussian distribution.
At the end of each window, the ratio of out-of-bound frames is calculated, $R = n_{out}/n$, where $n_{out}$ is the number of out-of-bound frames. If $R$ lies in the range $R_{min} < R < R_{max}$, the algorithm stops at this step and raises an alarm declaring the presence of a spoofing attack. $R$ greater than $R_{min}$ shows that the number of frames having an out-of-bound RSS value is more than normal; with high probability this is due to the presence of another entity claiming the same identity. However, $R_{max} < R$ might be the result of a change in the mean value of the RSS distribution due to environmental changes rather than malicious activity. If an attack is not detected at this step, the algorithm continues with Step 2.

Step 2: The next step is evaluation of the SDC. SDC is calculated using the DHWT; in Section 3.7 we briefly introduce the DHWT and explain the reason for using this transformation for feature extraction. Like RSS, SDC has a Gaussian distribution. Knowing the mean value and variance of SDC for a given node under normal conditions, for each window the SDC is compared with a threshold, $th_{SDC}$; if the threshold is exceeded, an attack alert is triggered. In addition to SDC, the DHWT calculates the mean value of the frames inside the window, $\mu_w$.

Step 3: As the final step, if $R_{max} < R$ and Step 2 did not detect an attack, an extra check is performed. $R_{max} < R$ can be the result of two conditions: a shift of the mean value due to environmental changes, or the presence of an attacker with a much higher traffic rate than the genuine node.
Therefore, in order to decide whether an attack alarm should be raised or the mean value requires an update, the detection system sends a sequence of packets to the receiver, for instance a stream of data packets that require acknowledgment, or internet control message protocol (ICMP) messages such as PING if ICMP is supported. Using this method, the detection system forces the genuine node to transmit packets at a traffic rate comparable to that of the attacker. The detection system can then be sure that a sufficiently large portion of the received frames comes from the genuine node. After waiting the expected time for receiving the replies, the algorithm is repeated from the first step. This time, if still $R_{max} < R$ and no attack has been detected up to this step, the algorithm decides that the traffic is benign, and the mean value is updated by replacing $\mu_{global}$ with $\mu_w$.

3.5.2 Training Phase

To effectively detect spoofing attacks, the algorithm uses four threshold values: $th_{RSS}$, $th_{SDC}$, $R_{min}$ and $R_{max}$. Except for $R_{max}$, which is a fixed parameter close to one, the thresholds are learnt through a training phase in which the network is assumed to be spoof-free. First the mean, $\mu_{RSS}$, and variance, $\sigma_{RSS}$, of the RSS stream over the whole training duration are calculated. $th_{RSS}$ is defined as $th_{RSS} = \sigma_{RSS}$.

In the next step, the RSS stream is divided into windows of $2^n$ frames. For each window, $R$ (using $\mu_{RSS}$ and $\sigma_{RSS}$) and SDC are calculated. According to the distribution of $R$ and SDC over several windows, the following parameters are extracted:

• Mean value and variance of $R$ ($\mu_R$ and $\sigma_R$).
• Mean value and variance of SDC ($\mu_{SDC}$ and $\sigma_{SDC}$).
• The 10 largest values of $R$ ($R_{MAX} = \{R_{MAX_1}, \ldots, R_{MAX_{10}}\}$ in descending order).
• The 10 largest values of SDC ($SDC_{MAX} = \{SDC_{MAX_1}, \ldots, SDC_{MAX_{10}}\}$ in descending order).

Numerical values are assigned to $R_{min}$ and $th_{SDC}$ using the above parameters.
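The three steps of the operation phase in Section 3.5.1 can be sketched as a per-window decision function. This is a minimal illustration rather than the thesis implementation: the function names and the label `probe` (standing for the Step 3 action of soliciting traffic from the genuine node and re-running the test) are hypothetical, the SDC value is assumed to be computed elsewhere by the DHWT, and treating R equal to the upper bound as a Step 3 case is an assumption, since the text only states strict inequalities.

```python
def out_of_bound_ratio(window, mu_global, th_rss):
    """Step 1 feature: R = n_out / n for one window of RSS readings."""
    n_out = sum(1 for rss in window if abs(rss - mu_global) > th_rss)
    return n_out / len(window)


def classify_window(r, sdc, r_min, r_max, th_sdc):
    """Return 'attack', 'probe' or 'benign' for one window.

    'probe' is a hypothetical label for Step 3: the detector should solicit
    replies from the genuine node and repeat the test before deciding
    whether to raise an alarm or update the mean.
    """
    if r_min < r < r_max:   # Step 1: abnormal share of out-of-bound frames
        return "attack"
    if sdc > th_sdc:        # Step 2: the RSS stream oscillates too much
        return "attack"
    if r >= r_max:          # Step 3: mean shift or high-rate attacker (assumed boundary)
        return "probe"
    return "benign"
```

If a window is still classified as `probe` after the solicited replies have arrived, the text above treats the traffic as benign and replaces the stored mean with the window mean.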
$R_{min}$ and $th_{SDC}$ dictate the trade-off between DR and FPR. When a very low FPR is required, $R_{MAX}$ and $SDC_{MAX}$ are used (applying $R_{MAX_1}$ and $SDC_{MAX_1}$ results in close to 0% FPR). Otherwise, the threshold values can be defined according to the parameters of the Gaussian distributions. One option is the combination in (3.1), which minimizes the FPR at the first step of the algorithm and provides a good detection rate for the second step; however, depending on the application and security policy, an appropriate balance between DR and FPR is achievable by adjusting the thresholds:

$$R_{min} \in R_{MAX}, \qquad th_{SDC} = \mu_{SDC} + 3\sigma_{SDC} \tag{3.1}$$

3.5.3 Multiple Air Monitors

When the detection system contains more than one AM, the RSS readings of the AMs are transferred to a central server (CS). All computations are performed in the CS; the AMs are responsible for reading the RSS values of the frames received from the target nodes, as well as for sending packets to a given node for mean update. Based on the AM reports, the CS makes a global decision about whether the traffic is benign or malicious. Using the RSS readings of the different AMs, the CS builds an RSS vector for each frame. Reports of different AMs might be received by the CS with different time delays. The CS assumes that reports within a predefined time interval (according to the delay estimation) belong to the same frame. If a report from a specific AM is not received by the CS in the expected time, the mean RSS of that AM is used in the RSS vector. The detection algorithm for the multi-AM scheme is summarized as follows:

• For each AM, the thresholds and algorithm parameters are learned through a training phase similar to what was described for a single AM.
• During the operation phase, for each window, the CS calculates the ratio of out-of-bound frames.
This time a frame is considered to be out-of-bound if (3.2) is true,

$$\left\| \overrightarrow{RSS} - \overrightarrow{\mu_{RSS}} \right\| > \left\| \overrightarrow{\sigma_{RSS}} \right\| \tag{3.2}$$

where $\overrightarrow{RSS}$, $\overrightarrow{\mu_{RSS}}$ and $\overrightarrow{\sigma_{RSS}}$ are vectors with n components, containing the RSS readings, mean and variance of RSS (learnt in the training phase) of n AMs. If $R_{min} < R < R_{max}$, an attack is detected and the algorithm ends for the current window. $R_{min}$ is learnt in the training phase as in the single AM method.
• For each AM, the SDC of the current window is calculated. An attack is detected if (3.3) is true.

$$\left\| \overrightarrow{SDC} - \overrightarrow{\mu_{SDC}} \right\| > 3 \left\| \overrightarrow{\sigma_{SDC}} \right\| \tag{3.3}$$

where $\overrightarrow{SDC}$, $\overrightarrow{\mu_{SDC}}$ and $\overrightarrow{\sigma_{SDC}}$ are vectors with n components, containing the SDC, mean and variance of SDC for n AMs.
• If $R > R_{max}$, a notification is sent to the AMs. In response, each AM transmits a packet stream to the target node and the mean update mechanism is initiated.

Communication between the AMs and the CS can be wired or wireless; either way it must be highly secured, for instance through implementation of secure authentication and encryption algorithms.

3.6 Spoofing Prevention Algorithm

We propose two methods for differentiating and filtering malicious packets.

3.6.1 Static Threshold

Considering that RSS values follow a Gaussian distribution, the agent calculates the mean and variance of the RSS values of received frames for each communicating node. These values are calculated in an online manner to reduce the size of required storage. When the node receives a mean update message from the security center, it discards the saved mean value and starts calculating it again. When an intrusion alarm is received, for each incoming frame the agent compares the RSS value with the mean RSS, and if the difference exceeds a pre-defined threshold, the packet is dropped. The threshold is a fixed value, selected according to the variance of the RSS distribution.
The main advantage of this method is its simplicity and low computation and storage requirements. The downside of this approach, however, is the network overhead due to dropping legitimate packets. The threshold determines the trade-off between DR and FPR. The smaller the threshold, the higher the chance of dropping illegitimate packets; at the same time, more genuine packets are dropped. In Section 3.7 we investigate this effect in more detail.

3.6.2 Dynamic Threshold

In this method the threshold value is assigned dynamically based on the distance between the RSS distributions of the genuine and attacker nodes. In order to measure the distance, k-means clustering is employed. The number of operations in a one-dimensional two-cluster k-means problem is in the order of $O(n^3 \log n)$. When the agent receives an attack alarm, a queue containing the RSS values of the last n frames is formed. The agent applies the 2-cluster k-means algorithm, which divides the RSS values into two clusters. Then the centers of the two clusters and the distance between them are calculated. If one of the calculated centers is close to the mean value of the genuine node, the measured distance is used to adjust the threshold, since in this case, with high probability, one cluster belongs to the genuine frames and the other to the attacker. However, the genuine and attacker nodes do not necessarily communicate with the victim simultaneously. To address this situation, if neither of the calculated centers is close to the mean value of the genuine node, the average of the two cluster centers is calculated, and its difference from the mean value of the genuine node determines the distance.

The dynamic method allows a larger threshold for far RSS values, which results in a lower FPR.
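The clustering step of the dynamic method described above can be sketched as follows. This is an illustration under stated assumptions, not the thesis code: the two clusters are found by an exhaustive split search over the sorted values (a swapped-in exact technique for the one-dimensional case, not necessarily the k-means variant used by the author), and the parameters `fraction` (which share of the measured distance becomes the threshold) and `close_tol` (what counts as "close" to the genuine mean) are hypothetical.

```python
def two_means_1d(values):
    """Exact 2-cluster partition of 1-D data: try every split of the sorted
    values and keep the one with the smallest within-cluster squared error."""
    xs = sorted(values)
    if len(xs) < 2:
        raise ValueError("need at least two RSS values")
    best = None
    for split in range(1, len(xs)):
        left, right = xs[:split], xs[split:]
        c1 = sum(left) / len(left)
        c2 = sum(right) / len(right)
        sse = sum((x - c1) ** 2 for x in left) + sum((x - c2) ** 2 for x in right)
        if best is None or sse < best[0]:
            best = (sse, c1, c2)
    return best[1], best[2]


def dynamic_threshold(values, mu_genuine, fraction=0.5, close_tol=2.0):
    """Derive the filtering threshold from the queued RSS values.

    fraction and close_tol are illustrative assumptions.
    """
    c1, c2 = two_means_1d(values)
    if min(abs(c1 - mu_genuine), abs(c2 - mu_genuine)) <= close_tol:
        # One cluster looks genuine, the other is taken to be the attacker.
        distance = abs(c2 - c1)
    else:
        # Genuine node silent: use the offset of the averaged centers instead.
        distance = abs((c1 + c2) / 2.0 - mu_genuine)
    return fraction * distance
```

With a genuine mean of -67 dB and queued values clustered around -67 dB and -55 dB, the sketch measures a distance of 12 dB and, for `fraction=0.5`, returns a threshold of 6 dB.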
In addition, when RSS values are close, a small threshold is used; therefore, attacks are still preventable, whereas in the static approach, in order to maintain the balance between prevention rate (PR) (the number of intrusion instances prevented by the system divided by the total number of intrusion instances) and FPR, attacks with close RSS values are not stoppable. Although the dynamic threshold reduces network overhead as a result of the low FPR, this method requires more computation and introduces a short delay for queuing and clustering the frames; however, considering that these effects only occur during an attack, their costs are acceptable.

3.7 Performance Analysis

3.7.1 Performance of the Spoofing Detection Algorithm

In the proposed approach, we exploit the spatial correlation of RSS values to determine whether the incoming frames carrying the same identity belong to a single genuine node or originate from different sources. The RSS of a frame measured at a given location is affected by parameters such as environmental conditions, random noise and multipath effects; still, it strongly depends on the distance between the sender and the receiver. As a result, the RSS values of devices located at different physical places are expected to be distinctive.

Figure 3.2: a) Attack scenario. b) CDF of the RSS values of the attacker and the genuine nodes.

The spatial dependency of RSS is formulated as (3.4):

$$RSS = P_0 - 10\gamma \log\left(\frac{d_1}{d_0}\right) + X \tag{3.4}$$

where $P_0$ is the transmission power at a reference point, $d_0$ and $d_1$ are the distances from the sender to the reference point and to the receiver, respectively, $\gamma$ is the path loss exponent, and $X$ is the shadow fading with a Gaussian distribution $N(0, \sigma)$.
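To make (3.4) concrete, the following sketch evaluates the model numerically. It assumes a base-10 logarithm, which is the usual convention for this path loss model but is not stated explicitly in the text; the function names and the example values of the reference power and path loss exponent are hypothetical.

```python
import math
import random


def mean_rss(p0, gamma, d1, d0=1.0):
    """Deterministic part of (3.4): P0 - 10 * gamma * log10(d1 / d0)."""
    if d1 <= 0 or d0 <= 0:
        raise ValueError("distances must be positive")
    return p0 - 10.0 * gamma * math.log10(d1 / d0)


def sample_rss(p0, gamma, d1, sigma, rng, d0=1.0):
    """One RSS reading: the mean plus zero-mean Gaussian shadow fading."""
    return mean_rss(p0, gamma, d1, d0) + rng.gauss(0.0, sigma)


# Example: two senders with equal power at 2 m and 4 m from the receiver.
rng = random.Random(0)
near = mean_rss(-40.0, 3.0, 2.0)
far = mean_rss(-40.0, 3.0, 4.0)
reading = sample_rss(-40.0, 3.0, 2.0, 5.0, rng)
```

Under these illustrative values the two mean RSS levels differ by 30·log10(2), roughly 9 dB, which is the kind of location-dependent gap that the detection algorithm relies on.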
Having the configuration in Figure 3.2 and assuming equal transmission power for nodes 1 and 2, the difference between the RSS values of the two nodes, sensed by node 3, is:

$$\Delta RSS = 10\gamma \log\left(\frac{d_1}{d_2}\right) + \Delta X \tag{3.5}$$

where $\Delta X$ has a Gaussian distribution $N(0, \sqrt{2}\sigma)$.

RSS values at a landmark also follow a Gaussian distribution $N(\mu, \sigma)$; while $\sigma$ depends on environmental conditions, its average in an indoor location is reported to be about 5 dB [19], [20], [21], [22]. Knowing the above physical properties, in the rest of this section we demonstrate the efficiency of the proposed algorithm through mathematical analysis.

Step 1: As the first detection step, assuming the distribution $N(\mu_g, \sigma_g)$ for the genuine node, for each window the number of frames differing from the mean value, $\mu_g$, by more than $\sigma_g$ is counted as $n_{out}$. Then the parameter $R = n_{out}/n$ is compared with a threshold $\tau$ ($0 < \tau < 1$). The value of $R$ under normal conditions is proportional to the area marked by dots in Figure 3.2.b. The presence of an attacker increases the value of $R$, since in this case $n_{out}$ is related to the sum of the dotted and crossed areas. A smaller value of $\tau$ results in a higher DR, yet it increases the FPR. We formulate the FPR as (3.6), which is the probability of $R$ exceeding the threshold when there is no attacker.

$$FPR = P(R > \tau \mid normal) = P(n_{out} > \tau n \mid normal) \tag{3.6}$$

$$P_{normal}(RSS) = N(\mu_g, \sigma_g) \tag{3.7}$$

The probability that $n_{out}$ out of $n$ frames are out of bound follows a binomial distribution.

$$P(n_{out} \mid normal) = \binom{n}{n_{out}} P_{out}^{\,n_{out}} (1 - P_{out})^{n - n_{out}} \tag{3.8}$$

$$P_{out} = P_{normal}(|RSS - \mu_g| > \sigma_g) = 2\Phi(\mu_g - \sigma_g; \mu_g, \sigma_g) \tag{3.9}$$

where $P_{out}$ is the probability of a frame having an out-of-bound RSS under normal conditions. In (3.9), $\Phi(\cdot)$ is the cumulative distribution function (CDF) of the Gaussian distribution. Considering (3.6) and (3.8), the FPR is:

$$FPR = \sum_{n_{out} = \tau n}^{n} P(n_{out} \mid normal) = 1 - F(\tau n; n, P_{out}) \tag{3.10}$$

where $F(\cdot)$ is the CDF of the binomial distribution.
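Equations (3.9) and (3.10) can be evaluated directly with the standard library. The sketch below is illustrative: the function names are hypothetical, the binomial CDF is computed by plain summation, and the threshold count is taken as the integer part of τn, which is an assumption about how a fractional τn is handled.

```python
import math


def normal_cdf(x, mu, sigma):
    """Gaussian CDF Phi(x; mu, sigma) via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def binom_cdf(k, n, p):
    """Binomial CDF F(k; n, p) = P(at most k successes in n trials)."""
    k = math.floor(k)
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(0, k + 1))


def p_out_normal(mu_g=0.0, sigma_g=1.0):
    """(3.9): probability that a genuine frame deviates by more than sigma_g."""
    return 2.0 * normal_cdf(mu_g - sigma_g, mu_g, sigma_g)


def step1_fpr(tau, n, mu_g=0.0, sigma_g=1.0):
    """(3.10): FPR = 1 - F(tau * n; n, P_out)."""
    return 1.0 - binom_cdf(tau * n, n, p_out_normal(mu_g, sigma_g))
```

Because the bound in (3.9) is one standard deviation, `p_out_normal()` is about 0.317 regardless of the mean and variance, so for a fixed window size the first-step FPR depends only on τ.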
As the above formula suggests, FPR is inversely related to $\tau$. On the other hand, DR is formulated as below:

$$DR = P(R > \tau \mid attack) = P(n_{out} > \tau n \mid attack) \tag{3.11}$$

Considering a Gaussian distribution for the attacker, $N(\mu_a, \sigma_a)$, the PDF of the RSS values under attack is:

$$P_{attack}(RSS) = (1 - \eta) N(\mu_g, \sigma_g) + \eta N(\mu_a, \sigma_a) \tag{3.12}$$

where $\eta$ is the ratio of the spoofed frames. Similar to the explanation for FPR, $n_{out}$ has a binomial PDF.

$$P(n_{out} \mid attack) = \binom{n}{n_{out}} (P'_{out})^{n_{out}} (1 - P'_{out})^{n - n_{out}} \tag{3.13}$$

where $P'_{out}$ is the probability of a frame having an out-of-bound RSS under attack conditions. Considering (3.12):

$$P'_{out} = 2(1 - \eta)\Phi(\mu_g - \sigma_g; \mu_g, \sigma_g) + \eta\, Q(\mu_g + \sigma_g; \mu_a, \sigma_a) \tag{3.14}$$

where $Q$ is the Q-function (the complement of the CDF) of the Gaussian distribution. Finally, DR is summarized as:

$$DR = 1 - F(\tau n; n, P'_{out}) \tag{3.15}$$

From the above equations and considering Figure 3.2.b, one can conclude that DR is a function of $\tau$, $\eta$, $|\mu_g - \mu_a|$, $\sigma_g$ and $\sigma_a$. The influence of $\eta$ and $|\mu_g - \mu_a|$ on detection performance, in terms of receiver operating characteristic (ROC), is depicted in Figures 3.3 and 3.4. $\tau$ defines the trade-off between FPR and DR. Figure 3.3 shows that as the ratio of malicious frames increases, the detection performance improves.

Figure 3.3: Effect of ratio of malicious traffic on performance of the first step of the spoofing detection algorithm

As expected, Figure 3.4 shows that the detection performance improves when the distance between the mean values increases. On the other hand, a larger variance degrades the detection performance.
In summary, the more separable the distribution functions of the attacker and the genuine node are, the better the detection performance.

In step two of the algorithm, by applying the DHWT to the RSS stream, we obtain a parameter with a more separable distribution for a given RSS stream.

Step 2: In this step, for each window, the DHWT is utilized to provide a measure of oscillations in RSS values. While the fast Fourier transform (FFT) has been widely used to extract frequency components of time series, the discrete wavelet transform (DWT) has proved to be a superior alternative in many applications [88]. The DHWT has the desirable features of the wavelet transform. It not only contains the frequency content of the input, but also shows the temporal order. Another advantage of the DHWT is the low number of required operations, which makes it time and resource efficient. Computing the DHWT of $N$ points takes $O(N)$ arithmetic operations, which is much less than the $O(N \log N)$ required for the FFT. Resource and time efficiency are the major reasons why we employed the DHWT in our detection algorithm.

Figure 3.4: Effect of the difference between RSS mean values on performance of the first step of the spoofing detection algorithm

Figure 3.5: Discrete wavelet transform decomposition algorithm

Figure 3.5 shows the decomposition process in a wavelet transform. In the figure, g[n] and h[n] are low-pass and high-pass filters, which must be quadrature mirror filters. At each level, the input stream is decomposed into low and high frequencies. The outputs of the low-pass and high-pass filters are called approximation coefficients and detail coefficients, respectively. In summary, the DHWT pairs up the input values, stores the differences, and passes the sums to the next level.
The process is repeated until finally $2^n - 1$ differences and a mean value remain [89].

Let us assume that m AMs are monitoring the RSS values of the frames carrying the identity of a legitimate node. $\Delta RSS$, which is the total RSS deviation from the mean values in m landmarks, is calculated as:

$$\Delta RSS^2 = \sum_{i=1}^{m} (RSS_i - \mu_{g_i})^2 \tag{3.16}$$

where $RSS_i$ is the value of RSS in landmark i, and $\mu_{g_i}$ is the mean RSS of the genuine node in landmark i. As shown in [19], when the two nodes are co-located (there is no attack), the random variable $X = \Delta RSS^2$ has a central chi-square distribution $\chi^2(m)$, where $m$ is the degree of freedom, which is equal to the number of AMs. On the other hand, when the wireless nodes are at different locations, $X$ follows a non-central chi-square distribution $\chi^2(m, \lambda)$, where $m$ is the degree of freedom and $\lambda$ is the non-centrality parameter, which in this case is:

$$\lambda = \sum_{i=1}^{m} \left( \frac{\mu_{g_i} - \mu_{a_i}}{\sigma} \right)^2 \tag{3.17}$$

In (3.17), $\mu_{g_i}$ and $\mu_{a_i}$ are the mean values of the RSS stream of the genuine and attacker nodes in the ith AM. The variance, $\sigma$, is assumed to be the same for both nodes. Therefore, the DR and FPR are calculated using (3.18) and (3.19).

$$DR = P(X > \tau \mid attack) = 1 - F_{\chi^2(m, \lambda)}\left( \frac{\tau^2}{\sigma^2} \right) \tag{3.18}$$

$$FPR = P(X > \tau \mid normal) = 1 - F_{\chi^2(m)}\left( \frac{\tau^2}{\sigma^2} \right) \tag{3.19}$$

where $F(\cdot)$ is the CDF of the $\chi^2$ distribution, and $\tau$ is a threshold. When $\tau$ is exceeded, a spoofing attack is detected. While $\tau$ defines the trade-off between DR and FPR, DR is affected by $m$, $\sigma$ and $\lambda$. To achieve a higher DR, previous works increased the number of AMs ($m$). To further improve the detection performance, we suggest using frequency components; instead of $X = \Delta RSS^2$, we define the random variable $X$ as $X = \Delta SDC^2$.

$$\Delta SDC^2 = \sum_{i=1}^{m} (SDC_i - \mu_{SDC_{g_i}})^2 \tag{3.20}$$

where $\mu_{SDC_{g_i}}$ is the mean of SDC of the genuine node in the ith AM and

$$SDC_i = \sum_{j=1}^{n-1} dc_i[j] \tag{3.21}$$

In (3.21), $n$ is the window size and $dc_i[j]$ is the jth detail coefficient, starting from the high frequencies, for the ith AM.
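The decomposition and the SDC of (3.21) can be sketched as below. This is a minimal illustration with hypothetical function names; it assumes the halved pairwise differences and averages that match the level 1 coefficients used in the analysis of this section, so the final approximation coefficient equals the window mean. The `use_abs` switch reflects the experiments later in the chapter, which sum absolute values of the detail coefficients; the plain sum corresponds to (3.21) as written.

```python
def haar_dwt(samples):
    """Discrete Haar wavelet transform of a window whose length is a power of two.

    Returns (window_mean, details), with detail coefficients ordered from the
    highest-frequency level to the lowest.
    """
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("window length must be a power of two")
    approx = [float(s) for s in samples]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.extend((a - b) / 2.0 for a, b in pairs)  # detail coefficients
        approx = [(a + b) / 2.0 for a, b in pairs]       # passed to the next level
    return approx[0], details


def sdc(samples, use_abs=True):
    """Sum of detail coefficients for one window of RSS readings."""
    _, details = haar_dwt(samples)
    return sum(abs(d) for d in details) if use_abs else sum(details)
```

A constant window gives an SDC of zero, while a window alternating between two RSS levels (for example -67 dB and -55 dB) yields a large SDC, which is the oscillation that the second detection step looks for.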
Assume that the legitimate node sends a frame stream with RSS values $S_g = \{s_{g_1}, \ldots, s_{g_{n/2}}\}$, where $S_g \sim N(\mu_g, \sigma_g)$. At the same time the attacker sends a stream with RSS values $S_a = \{s_{a_1}, \ldots, s_{a_{n/2}}\}$, $S_a \sim N(\mu_a, \sigma_a)$. For simplicity of analysis we consider an ideal case in which the ratio of malicious traffic is 0.5 and each pair of legitimate frames is interleaved by one malicious frame. Then the RSS stream at an AM is $S = \{s_{g_1}, s_{a_1}, \ldots, s_{g_{n/2}}, s_{a_{n/2}}\}$. By applying the DHWT to $S$, the level 1 detail coefficients are:

$$\left\{ \frac{s_{g_1} - s_{a_1}}{2}, \frac{s_{g_2} - s_{a_2}}{2}, \ldots, \frac{s_{g_{n/2}} - s_{a_{n/2}}}{2} \right\}$$

For simplicity we ignore higher-level detail coefficients (including them would have a positive effect on the separability of SDC). The SDC of the first-level detail coefficients is $SDC_{S1} = \sum_{i=1}^{n/2} \frac{s_{g_i} - s_{a_i}}{2}$. Considering the summation property of Gaussian variables ($\sum_i a_i N(\mu_i, \sigma_i) = N(\sum_i a_i \mu_i, \sqrt{\sum_i a_i^2 \sigma_i^2})$), and assuming the same variance for both the attacker and genuine nodes, $SDC_{S1} \sim N(\frac{n}{4}(\mu_g - \mu_a), \frac{\sqrt{n}}{2}\sigma)$, while $SDC_{S_g 1} \sim N(0, \frac{\sqrt{n}}{2}\sigma)$. Therefore, $\Delta SDC \sim N(\frac{n}{4}(\mu_g - \mu_a), \frac{\sqrt{n}}{2}\sigma)$, while $\Delta RSS \sim N(\mu_g - \mu_a, \sigma)$.

Thus, from (3.17), the non-centrality parameter of $\Delta SDC$ is $\left( \frac{n}{4}(\mu_g - \mu_a) \big/ \frac{\sqrt{n}}{2}\sigma \right)^2 = \frac{n}{4} \left( \frac{\mu_g - \mu_a}{\sigma} \right)^2$, so for $n = 64$ the $\lambda$ of $\Delta SDC$ is 16 times the $\lambda$ of $\Delta RSS$. However, recall that this holds for an ideal case in which benign frames are alternately interleaved with malicious frames. Also, the higher-level coefficients are ignored. Therefore, the above computation provides an estimate of the improvement of $\lambda$ rather than a deterministic value.

Figure 3.6: Effect of increase in the number of AMs and non-centrality parameter (n represents the number of AMs and k is the scale of non-centrality)

According to (3.18) and (3.19), Figure 3.6 compares the effect of an increase in the number of AMs and in the non-centrality on detection performance.
It can be seen in the figure that when the non-centrality parameter is scaled by 4, a single AM outperforms a system with 12 AMs that has a fixed non-centrality. Other parameters of Figure 3.6 are $\mu_g - \mu_a = 10$ and $\sigma = 5$.

3.7.2 Performance of the Spoofing Prevention Algorithms

Static Threshold

For the static method, FPR and network overhead have almost fixed values depending on the value of the threshold. Therefore, the threshold is defined so that there is a balance between network overhead and PR. As the threshold becomes larger, the system loses its ability to prevent attacks in which the attacker is in close proximity to the genuine node or has close RSS values. False positives introduce network overhead, since when a genuine frame is dropped for exceeding the RSS threshold, several retransmissions might be required until the RSS lies in the legitimate range. The expected number of tries until a successful transmission is formulated as:

$$E = \frac{1}{P(|RSS - \mu_g| < \tau \mid genuine)} = \frac{1}{1 - 2\Phi(\mu_g - \tau; \mu_g, \sigma_g)} \tag{3.22}$$

Figure 3.8 shows the relationship between the expected number of tries, $E$, and the threshold, while the relationship between $E$ and FPR is presented in Figure 3.9.

Dynamic Threshold

In the dynamic approach, since the threshold is assigned dynamically based on the difference between the mean values, not only are attacks at close distance to the genuine node preventable, but unnecessary network overhead is also avoided. These benefits, however, come with the penalty of additional computation and processing delay.

Figure 3.7: Theoretic ROC for static threshold spoofing prevention (m and m′ are the mean RSS values of the genuine and attacker nodes)

Figure 3.8: Expected number of tries for successful transmission vs. threshold

3.8 Experiments

3.8.1 Testbed

In order to evaluate the performance of our spoofing detection approach, we conducted two experiments in an IEEE 802.15.4 network testbed. The network was established in a real office environment located in the communication lab of the Department of Electrical and Computer Engineering at the University of British Columbia. The office size is 9903 ft². The network contained four landmarks (AMs), an attacker and a genuine node. We used six TelosB motes as network nodes; four were programmed to act as AMs to monitor the RSS of received frames. The AM motes were connected to four personal computer (PC) systems, and the RSS readings were directly transferred to the PCs. The two other motes, which had the roles of the attacker and the genuine node, were programmed to send constant bit rate (CBR) traffic at 5 frames per second. The positions of the AMs and the genuine node are depicted in Figure 3.10 by arrows and a star sign, respectively. During the course of the experiments the attacker was placed in the different locations depicted in Figure 3.10 by bold dots.

Figure 3.9: Expected number of tries for successful transmission vs. FPR

3.8.2 Performance of the Spoofing Detection Algorithm

Experiment 1

The goal of the first experiment was to study the changes in SDC under normal and attack conditions. We wanted to confirm that in practice SDC provides a more separable representation of the data compared to RSS.

Figure 3.10: Testbed setting (the distance between consecutive grid dots is 50 cm)

In this experiment, first the RSS logs of genuine node frames were collected by AM3 for a duration of four hours. Then, the SDCs of the RSS streams were extracted, and their mean values and variances were calculated. In computing the SDC, we used the absolute value of the DCs to avoid cancellation of DCs with different polarities.
The window size was set to 64. In the next step, the attacker node was placed at distances of 50 cm, 1 m, 3 m, 4 m and 5 m from the genuine node. For each position the RSS logs of the genuine and attacker nodes were captured for 1 hour, and the mean value and variance of SDC were calculated. The ratio of malicious and benign traffic was 0.5. The experiment results are presented in Table 3.1. As can be seen from the table, $\Delta SDC$ provides a consistently larger $\lambda$ than $\Delta RSS$ (for example, 8.23 versus 1.46 at 3 m). In addition, when the distance between the two nodes expands, $\lambda$ increases. Table 3.1 also includes the results of single AM spoofing detection using the proposed algorithm for each dataset. Even when the distance is as low as 50 cm, attack detection using SDC is to some extent possible, while the RSS-based approach fails completely because the mean values are similar. As the distance increases to 3 m, the detection performance improves to a satisfactory level. In previous works, on the other hand, even with multiple AMs, the minimum detectable distance was reported to be about 6 m.

Table 3.1: Comparison between λ of RSS and SDC

Distance | µ_RSS (dB) | σ_RSS (dB) | µ_SDC (dB) | σ_SDC (dB) | λ_RSS | λ_SDC | DR_SDC (%) | FPR_SDC (%)
0 (leg. node) | -67 | 4.24 | 8 | 13 | - | - | - | -
50 cm | -67 | 2.96 | 19 | 16 | 0 | 0.14 | 70.37 | 23.07
1 m | -62 | 2.1 | 70 | 22 | 0.62 | 3.13 | 84.21 | 11.11
3 m | -59 | 2.33 | 122 | 26.73 | 1.46 | 8.23 | 98.08 | 2.56
4 m | -55 | 3.93 | 188 | 35.4 | 2.16 | 15.05 | 99.33 | 0

Experiment 2

In the next experiment we evaluated the performance of the proposed spoofing detection mechanism. In the testbed, the attacker node was placed for 5 minutes in each position marked by a bold dot in Figure 3.10, and transmitted CBR traffic. During the whole experiment the benign node was located at the position depicted by a star in the figure, and sent CBR traffic at the same rate as the attacker. Overall, 90 different placements of the attacker and benign nodes were tested.
The 4 AMs monitored the stream of RSS values from both the attacker and the genuine nodes, and stored the values in a log file. At the end of the experiment the log files were collected and the detection algorithm was applied for each AM separately. We used 4 AMs, each acting as a single AM detector, and analyzed the results separately to study the effect of the position of the AM on detection performance. The mean and variance of the RSS of the genuine node were calculated using all samples received during the experiment, which amounts to 135000 samples. We observed a variance of 4.68 dB for these samples. Therefore, the standard error of the mean of the genuine node calculated using the experiment samples is 0.0058 dB (the standard deviation, $\sqrt{4.68} \approx 2.16$ dB, divided by $\sqrt{135000}$). Considering that the granularity of the measured RSS values is 1 dB, which is much larger than this value, this error does not affect the experiment results. The detection performance of each AM is depicted in Figure 3.11. As the figure suggests, AM1 has the best detection performance. The reason is the closer distance of AM1 to the genuine node. Recall that, according to (3.5), attack detection depends on the ratio of the distances from the attacker and the genuine node to the AM, rather than on the distance between the two nodes. Therefore, for a fixed distance between the nodes, a closer AM has a better chance of attack detection.

While the experiment was conducted in an office building with a usual amount of people movement, to study the effect of moving objects we deliberately introduced more movement in close proximity to AM3. As Figure 3.11 shows, we observed the worst, but still acceptable, performance for this AM compared to the other 3 AMs.

To further study the effectiveness of the magnitude and frequency features in the detection process, Figure 3.11 also includes the ROC curves of detection based purely on the magnitude ($R$) and frequency (SDC) features.
As can be seen in Figure 3.11, SDC provides better detection performance than $R$. We can also see that combining both features significantly improves the performance. The average ROC of the 4 AMs is shown in Figure 3.12. In addition, we studied the effect of the ratio of malicious and benign traffic. The results are shown in Figure 3.13. When the rates of malicious and benign traffic are close, the fluctuations in the RSS stream are high; therefore, SDC can effectively distinguish the malicious traffic. However, when the ratio of malicious traffic is very high, the RSS stream has fewer fluctuations, since the traffic mostly belongs to one node (the attacker in this case). Yet Step 1 of the algorithm is very effective in this scenario. Therefore, by applying both features the spoofing detection mechanism can successfully detect a wide range of malicious traffic ratios. However, when the traffic rate of the benign node is much higher than that of the malicious node, the algorithm is not effective. Yet such a scenario can be less hazardous; at least, most DoS attacks require a high traffic rate. Figure 3.13 represents the ROC curve of the spoofing detection in AM4 using $R$, SDC and both features under various attack ratios.

Figure 3.11: ROC curve of the spoofing detection algorithms for a) AM1, b) AM2, c) AM3, d) AM4

3.8.3 Performance of the Spoofing Prevention Algorithms

Figure 3.14 shows the experiment results for the static threshold prevention method. In this experiment AM1-AM4 play the role of victims (V1-V4). We used 4 victims in our experiment in order to study the effect of the position of the victim on prevention performance. As can be observed from Figure 3.14, the agent on V1 has the best performance. The reason is the closer distance between V1 and the genuine node.
Figure 3.12: Average ROC of the spoofing detection algorithm.

We also studied the effect of moving objects on the performance. While the testbed was located in an office with a usual amount of people movement, we deliberately introduced more movement around V3, which shows the worst, yet still acceptable, performance compared to the other victim nodes. Figure 3.15 shows the average ROC of the static spoofing prevention algorithm for the four victims. To test the dynamic method, its prevention performance was measured when the threshold was 1/2, 1/3, 1/4, and 1/5 of the distance between the mean values. The window size, meaning the number of frames that were clustered each time, was 64. The results for the four victims are shown in Table 3.2. Comparing Figure 3.14 and Table 3.2 shows a significantly higher performance for the dynamic threshold method. We emphasize that these results show the performance of an automatic prevention system. Unlike in an IDS, here false positives do not cause operational cost or administrator negligence; the cost is limited to some network overhead and delay.
Even in the worst case in Table 3.2, which is 94.42% PR and 64.81% FPR, the most significant cost is a 184% network overhead, which only occurs when an attack is detected.

Table 3.2: Prevention performance for dynamic threshold spoofing prevention

Threshold           | 1/2          | 1/3          | 1/4          | 1/5
V1 PR(%), FPR(%)    | 95.33, 1.52  | 97.53, 6.00  | 98.93, 9.03  | 99.17, 12.36
V2 PR(%), FPR(%)    | 92.11, 12.05 | 95.11, 18.75 | 96.18, 27.31 | 96.76, 35.91
V3 PR(%), FPR(%)    | 82.73, 23.57 | 89.45, 39.21 | 92.55, 53.32 | 94.42, 64.81
V4 PR(%), FPR(%)    | 92.94, 5.02  | 95.57, 8.76  | 96.33, 12.29 | 96.94, 16.78

Table 3.3: Comparison of different RSS-based spoofing detection techniques (NA: not applicable, - : was not provided in the paper, *: average of the results of 4 AMs)

Approach | Test area (ft2) | Network type | DR 1 AM (%) | FPR 1 AM (%) | No. of AMs | DR (%), FPR (%), Min. distance (ft2), DR (%), FPR (%) | Resistant to env. changes | Detects high rate attacks
R & SDC | 9903 | 802.15.4 | *92.63, 94.67 | *0.00, 0.56 | 4 | 99, 0.0, 9.84, 98.08, 2.56 | Yes | Yes
K-means [22] | 1600 | 802.15.4 | - | - | 4 | 95.7980.09.520.00 90.00 - | Yes | No
PAM [20] | 1600 | 802.15.4 | - | - | 4 | 96.598. 90.00 5 | Yes | No
Fourier [23] | - | 802.11 | 80.42 | 0.05 | NA | NA, NA, -, -, - | Yes | No
Signalprint [19] | 11625 | 802.11 | NA | NA | 6 | 95.6, -, 16.40, 72.2, - | No | Yes
Multi-Gaussian [21] | 16000 | 802.11 | 64.4 | 1.00 | 20 | 94.497. 84.3 1 | No | No

3.9 Discussions and Comparison

A summary of various RSS-based spoofing detection methods is provided in Table 3.3. The datasets used for the performance evaluation in the experimental sections of the papers are not the same. Yet similar approaches have been taken in setting up the testbeds. In some papers both IEEE 802.11 and IEEE 802.15.4 networks were evaluated. In this case we only included the IEEE 802.15.4 results, since it is the focus of this work. If the experiments in the previous works were based solely on WiFi, we included the IEEE 802.11 results in Table 3.3. According to [19] and [21], the detection performance for IEEE 802.11 and IEEE 802.15.4 is close.
However, the minimum distance for detectable attacks in IEEE 802.15.4 is a little higher (about 5 feet) than in IEEE 802.11.

3.10 Summary

In this chapter we have studied the existing RSS-based spoofing detection methods for static IEEE 802.15.4 networks and explained their limitations. In addition to long detection delay, ineffectiveness in mitigating high rate attacks and lack of robustness against environmental changes, most existing approaches rely on multiple AMs, which is not cost effective. Further, we have presented a novel spoofing detection technique which employs both magnitude and frequency features of RSS streams to provide a high detection performance even with a single AM. Evaluations of the proposed method through experiment and analysis have shown its high performance for both single and multiple AMs. In addition to introducing an efficient approach for spoofing detection, we have introduced two algorithms for automatic spoofing prevention using RSS values and investigated their performance through analysis and experiments, which confirmed the effectiveness of our approach.

Figure 3.13: ROC of spoofing detection in AM4 based on a) R, b) SDC, c) proposed algorithm.

Figure 3.14: Experimental ROC for static threshold spoofing prevention.

Figure 3.15: Average experimental ROC of V1, V2, V3 and V4 for static threshold spoofing prevention.

Chapter 4
HANIDPS, An Intrusion Detection and Prevention System for ZigBee-Based Home Area Networks

4.1 Introduction

The dominant HAN technology in North America and many other countries is ZigBee. Being located in insecure environments and relying on wireless technology make HANs vulnerable to cyber attacks [3],[4]; this necessitates the application of appropriate IDSs.
At the same time, since HANs are located in areas far from the utility, receiving IDS alarms at the utility and acting upon them introduces a large operational cost and a delay in stopping the attacks. Considering the large scale of smart grids, when human response is expected, even a small percentage of false alarms results in a high operational cost. Therefore, in the context of HANs, intrusion prevention mechanisms which not only detect but also stop the attacks are highly preferable.

In this chapter we present a novel model-based intrusion detection and prevention system (IDPS) for ZigBee-based HANs, HANIDPS. We use the SEP 2.0 application protocol specification as well as the IEEE 802.15.4 standard to define a thorough feature space for HANIDPS. HANIDPS employs a model-based detection module and a machine learning-based prevention module which dynamically learns the best strategy against an attacker. Through analysis and experiments we show that HANIDPS is able to detect various attack types with a high performance. The main contributions of this chapter are:

• To the best of our knowledge we are the first to address the problem of automatic intrusion prevention in ZigBee-based HANs. Considering that in HANIDPS the prevention operation is performed automatically, the costs of false positives are low and limited to some network overhead. Also, the delay in stopping the attacks is significantly shortened compared to when human intervention is required. This reduces the damage caused by possible attacks.

• HANIDPS is a novel algorithm which utilizes a model-based IDS along with a dynamic machine learning-based prevention technique to detect and prevent intrusions with low FPR and without prior knowledge of attacks.

The rest of this chapter is organized as follows. In Section 4.2 we compare our method with existing approaches. An overview of HAN security threats is provided in Section 4.3.
The architecture and algorithm of HANIDPS are explained in Section 4.4. Sections 4.5 and 4.6 present the performance analysis and experimental evaluations of HANIDPS. Section 4.7 summarizes the chapter.

4.2 Related Work and Comparison

Designing IDSs tailored for smart grid subsystems has attracted the attention of researchers over the last few years. Mitchell et al. [26] introduced a behavior-rule based IDS (BIDS) for securing head ends, distribution access points and smart meters. For each section a set of high-level behavior rules was defined. An intrusion was detected when the behavior rules were violated. This method provides a high accuracy; however, since the behavior rules are high level, BIDS is subject to detection delay. Besides, it does not provide an insight into the cause of misbehaviors and therefore presents little information for stopping the attack. A hierarchical distributed IDS for AMI was proposed in [27]. The distributed IDS components were connected through a wireless mesh network. Each component employed a support vector machine (SVM) and an immune system for detecting intrusions. Applying the same solution to different AMI networks, including HANs, NANs and head ends, which use different protocols and have different traffic features, makes this method inefficient. Unlike [26] and [27], which use the same intrusion detection mechanism for various AMI networks, we focus on HANs. This enables us to provide a high performance mechanism with the ability of both detecting and preventing the attacks. The authors of [28] and [29] targeted the detection of false data injection (FDI) attacks in smart grids. Lo et al. [28] proposed a hybrid IDS framework for AMI which uses power information and sensor placement to detect FDIs. They introduced algorithms for placing the sensors on lines or feeders to enhance the detection performance. Chen et
al. [29] exploited spatial-temporal correlations between grid components for real-time detection of FDIs. A distributed IDS tailored for wireless mesh networks employed in NANs was proposed in [30]. This work specifically targeted network layer attacks. In [31] a specification-based IDS for communications between smart meters and data aggregators was presented. Using the C12.12 standard protocol, a set of constraints on data transmissions was defined and attacks were detected by monitoring violations of the security policy. The authors of [90] introduced a two-tier IDS for automatic generation control (AGC) in smart grids. The first tier was a short-term adaptive predictor for system variables, and the second tier performed state inspection to investigate the presence of anomalies. The combination of the two tiers provided a balance between the accuracy and real-time requirements of an IDS for AGC. We, on the other hand, focus on HAN protocols and provide a solution for not only detecting but also preventing the attacks.

Model-based IDSs for intrusion detection in wireless sensor networks have attracted the attention of researchers in the past [32],[33],[91]. When the number of applications and protocols is limited, model-based IDSs are very effective, since they provide a low FPR and are capable of detecting new attacks. In [32] and [33] model-based IDSs for Modbus networks in SCADA were presented. Ioannis et al. [91] introduced a specification-based IDS for detecting network layer attacks in wireless sensor networks, including blackholes and grayholes. HANIDPS also uses a model-based approach. However, it distinguishes itself from [32],[33],[91] by covering the unique requirements of the area. First, a HAN is a stationary network, which eliminates the need for monitoring moving objects. Second, the HAN coverage area is comparably small and most communications are single hop.
Therefore, unlike most IDS solutions for wireless sensor networks, which are tailored for network layer protocols, our focus is on the PHY and MAC layers. Third, none of [32],[33],[91] addressed the problem of intrusion prevention for a wide range of attack types. A game theory-based approach for detecting and preventing distributed DoS using transport layer flooding attacks in wireless sensor networks was presented in [92]. The defense mechanism in this work was dropping the packets once an attack is detected. HANIDPS employs a wider and more effective set of features and actions and is therefore capable of detecting and preventing a larger variety of attacks.

4.3 Home Area Network Security Threats

Threats against AMI can be viewed in three different ways: by type of attacker, by motivation and by attack technique. In [18] a threat model for AMI was provided which lists the types of attackers and their motivations as follows:

• Curious eavesdroppers, who are motivated to learn about the activity of their neighbors by listening to the traffic of the surrounding meters or HAN.

• Motivated eavesdroppers, who desire to gather information about potential victims as part of an organized theft.

• Overly intrusive meter data management agencies, which are motivated to gain high-resolution energy and behavior profiles about their users, which can damage customer privacy. This type of attacker also includes employees who could attempt to spy illegitimately on customers.

• Unethical customers, who are motivated to steal electricity by tampering with the metering system installed inside their homes or to gain control of the devices which should be under control of the utility.

• Active attackers, who are motivated by financial gain or terrorist goals.
The objective of a terrorist would be to create large-scale disruption of the grid, either by remotely cutting off many customers or by creating instability in the distribution or transmission networks. Active attackers attracted by financial gain could also use disruptive actions, such as DoS attacks.

• Publicity seekers, who use techniques similar to those of other types of attackers, but in a potentially less harmful way, because they are more interested in fame and usually have limited financial resources.

From the above list, unethical customers, active attackers and publicity seekers need to perform active attacks to achieve their goals. Curious and motivated eavesdroppers might perform active or passive eavesdropping. IDSs are effective in protecting the network against active attacks. Overly intrusive meter data management agencies use the data available in head ends rather than targeting the HAN; therefore, they are outside the scope of this work.

Smart grid is a new concept. A thorough threat model which provides details on possible attack scenarios and techniques for different AMI networks and devices is not available yet; providing such models is an open research topic. Only a few works have addressed this area over the past few years. In [93] a model for security analysis of smart meters was provided. The authors proposed a systematic method for modeling the functionalities of smart meters and deriving attacks that can be mounted on them. McLaughlin et al. [94] conducted multi-vendor penetration testing on AMI devices within NANs. They developed archetypal attack trees for three classes of attacks: energy fraud, denial of service and targeted disconnect. Grochocki et al. [95] surveyed various threats facing AMI and common attack techniques used to realize them; however, the authors of [95] mentioned that methods of compromising HANs are beyond the scope of their work.
While a thorough threat model for HANs does not exist in the literature yet, and defining one is a standalone research topic which is beyond the scope of this thesis, in the following we provide some examples of possible attacks against ZigBee HANs. We also explore a number of representative case studies to connect attacker objectives with individual attack steps. Table 4.1 shows a summary of attacks against AMI, based on the threat model provided in [95]. These attacks are applicable to HANs. For a detailed explanation of each attack we refer the reader to [95].

Table 4.1: Example of attacks against ZigBee HANs

Category      | Attack technique                                   | Target
DoS           | Collision in packet transmission                   | HAN link layer
DoS           | Jamming                                            | HAN physical layer
DoS           | Resource exhaustion (battery, bandwidth, CPU)      | Node in HAN
DoS           | Destroy node                                       | Node in HAN
Spoofing      | Impersonate regular node                           | Node in HAN
Spoofing      | Impersonate master node                            | Node in HAN
Spoofing      | Man-in-the-middle                                  | HAN traffic
Spoofing      | Brute-force                                        | Node in HAN
Eavesdropping | Passively listen to traffic                        | HAN traffic
Eavesdropping | Active cryptanalysis                               | HAN traffic
Physical      | Compromise meter                                   | Node in HAN

4.3.1 Case Studies

Illegitimate remote turn-on/off commands

This attack can be used by unethical customers, publicity seekers and active attackers with different motivations. An unethical customer might aim to gain control of a specific device which, according to the customer-utility agreement, is under control of the utility. An antisocial publicity seeker might use this attack to achieve fame or unsettle a customer. An active attacker with malicious intentions such as committing theft, kidnapping, etc. can use this attack to distract the customers. Terrorists might use this attack on a large scale to cause horror and chaos, or to affect the load curve in order to damage the power system equipment. As a case in point, the attacker sends turn-off messages to all controllable customer equipment.
After a period long enough to guarantee that the equipment would turn on when allowed, the attacker sends turn-on permission messages. When applied on a large scale, the attack results in a sudden increase in the load, which can affect the bulk electric grid. Considering that for performing this attack against HANs the attacker must be within the ZigBee communication range, conducting it on a large scale is expensive. But terrorists can be very motivated and well funded. This attack can be carried out using the following steps.

1. The attacker passively eavesdrops on the network traffic or performs active network scanning to learn the link layer addresses of the authoritative node, such as the EMS, and the victim node.

2. The attacker needs to learn the authentication credentials of the authoritative node (if authentication is supported); this information can be obtained using a brute-force attack.

3. The attacker needs to know the encryption key to encrypt the control commands (if encryption is supported). Cryptanalysis techniques can be used for this purpose. Man-in-the-middle attacks might also be helpful in bypassing the encryption system.

4. The attacker conducts DoS against the authoritative node to stop it from sending legitimate control commands.

5. The attacker impersonates the ID of the authoritative node and sends turn-on/off commands on its behalf to the victim node.

Stealing customer information

The motivation of this attack is to collect customer information and learn about customer behavior. For instance, in an organized theft an adversary can benefit from knowing the total electricity usage of the household to infer whether the customers are at home or not. The EMS is allowed to request this information from the smart meter, and the smart meter sends the information to the EMS through encrypted messages.
Considering that HAN traffic may be encrypted and authentication might be required for a node to access the network, this attack may involve the following steps.

1. The attacker passively eavesdrops on the network traffic or performs active network scanning to learn the link layer address of the EMS.

2. The attacker needs to learn the authentication credentials of the EMS (if authentication is supported); this information can be obtained using a brute-force attack.

3. The attacker needs to know the encryption key to encrypt/decrypt the messages (if encryption is supported). Cryptanalysis techniques can be used for this purpose.

4. The attacker impersonates the ID of the EMS and requests the usage information.

5. The attacker decrypts the messages and collects the message contents.

Denial of service against network nodes

An unethical customer may conduct DoS against HAN nodes with the purpose of gaining control of a specific device. By conducting DoS against authoritative nodes or against a sensor node on a specific device such as a thermostat, the customer interferes with the control commands sent by the utility and does not allow the utility to control the device. In Section 4.6.1 we introduce several inexpensive DoS techniques against IEEE 802.15.4 networks.

4.4 Home Area Network Intrusion Detection and Prevention System

4.4.1 Architecture

HANIDPS is designed for the physical (PHY) and medium access control (MAC) layers of ZigBee HANs and has two modules, detection and prevention.

1) Detection module: The detection module monitors the network traffic between the smart meter and sensor nodes, as well as the nodes' behavior, and extracts the network features. These features are analyzed and compared with the expected normal behavior based on the system specification.
If one or more of the features are not normal, an intrusion is detected and the prevention module is triggered.

2) Prevention module: Upon receiving an intrusion alarm, an action or a set of actions is automatically performed to stop or mitigate a possible attack. HANIDPS preventive actions include spoofing prevention, interference avoidance and dropping malicious packets. An adversary might use various attack types to disturb the network operation. Therefore, appropriate actions are required to counter different attack scenarios. HANIDPS uses reinforcement learning to find the best strategy against an attacker. In reinforcement learning, the process of learning happens via trial and error. Through feedback received from the environment, HANIDPS learns which actions are the most effective.

HANIDPS has two components, monitoring agents and a central IDPS (C-IDPS). Agents are installed on sensor nodes and are responsible for monitoring the behavior of the corresponding nodes. They measure the packet error rate (PER) and keep a record of the RSS values of frames received from each communicating node. Agents send the measured PERs to the C-IDPS periodically through health messages. Once an attack is detected by the C-IDPS, agents might also take part in prevention operations based on the recorded RSS values. The C-IDPS is installed on a full function device super-node, which is a tamper resistant device with higher capacity and computational power compared to normal nodes. Network traffic between sensor nodes flows through the C-IDPS, where traffic features are extracted. The C-IDPS is responsible for analyzing the network features and the health messages received from monitoring agents to infer the state of the network nodes. Once an abnormal state is detected, the C-IDPS performs a dynamic defense mechanism in which a set of preventive actions is selected through Q-learning and performed.
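The agent side of this architecture can be pictured with a small sketch. The Python below is illustrative only: the class name `MonitoringAgent`, its methods and the dictionary layout of the health message are hypothetical and not taken from the thesis, and logging RSS only for frames that pass the integrity check is an assumption made for the example.

```python
from collections import defaultdict


class MonitoringAgent:
    """Hypothetical per-node agent: logs RSS per peer and reports PER."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.rss_log = defaultdict(list)   # peer -> RSS values (dBm)
        self.received = defaultdict(int)   # peer -> frames seen this period
        self.errors = defaultdict(int)     # peer -> corrupted frames this period

    def on_frame(self, peer, rss_dbm, frame_ok):
        """Record one received frame from `peer`."""
        self.received[peer] += 1
        if frame_ok:
            # Assumption: only intact frames contribute to the RSS record.
            self.rss_log[peer].append(rss_dbm)
        else:
            self.errors[peer] += 1

    def health_message(self):
        """Build the periodic report for the C-IDPS and reset the counters."""
        per = {
            peer: self.errors[peer] / count
            for peer, count in self.received.items()
        }
        self.received.clear()
        self.errors.clear()
        return {"node": self.node_id, "per": per}


agent = MonitoringAgent("thermostat")
for rss, ok in [(-60, True), (-61, True), (-70, False), (-59, True)]:
    agent.on_frame("meter", rss, ok)
report = agent.health_message()
```

The RSS record is kept across reporting periods so that it remains available if the agent is later asked to take part in a prevention operation.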
In the following subsections, we explain the feature space, the preventive actions and the decision making process used to choose the best sequence of actions.

4.4.2 Features and State Space

The ZigBee alliance, HomeGrid Forum, HomePlug alliance and WiFi alliance created a consortium for interoperability of energy management devices in HANs. The goal was to define the interfaces and messages among smart appliances, smart meters and utilities. The alliances published a new draft for smart grid application communication, SEP 2.0, in July 2012. NIST selected SEP 2.0 [96] as a standard profile for smart energy management in home devices. We used this document as well as the IEEE 802.15.4 protocol specifications and common features of wireless networks to extract the network specifications used in defining the feature space for HANIDPS. The components of the feature space are as follows:

f1) Datagram (D): The PHY and MAC layer frame structures according to the IEEE 802.15.4 specification are defined for the C-IDPS. The C-IDPS compares some features of the transmitted frames, such as frame size and reserved bits, with the standard structure and decides whether the frames are normal. Also, according to SEP 2.0, the application layer protocol for home devices is Hypertext Transfer Protocol Secure (HTTPS) over Transmission Control Protocol (TCP). The minimum and maximum length of each message, as well as the mandatory and optional fields, are defined in the protocol specification. Therefore, the theoretical minimum and maximum length of packets can be calculated. The total length of a legitimate SEP 2.0 message is between 508 and 1524 bytes.

f2) Traffic Rate (TR): SEP 2.0 states that “to prevent overwhelming network resources, notifications should be sent to a given client for a given resource no more than once every 30 seconds.
Notifications for conditional subscriptions should only be sent once within this time period for a given client for a given resource and any additional notifications should not be queued. All devices need to be considerate of network resources”. While in SEP 2.0 end devices are responsible for pulling network time from the network controller, it is stated that the time granularity of the devices must be one second. Therefore, a genuine node that complies with the protocol specifications does not exceed a known limit on traffic rate.

f3) RSS: According to the laws of physics, the signal strength at a receiver antenna depends on the spatial distance between the receiver and the sender and decays as this distance grows. Besides distance, RSS depends on features of the wireless environment, such as absorption and the multipath effect, which makes it hard to predict the power level of frames collected by a receiver. Thus, an attacker cannot simply adjust its power level to match the RSS of a legitimate node. This feature is useful in detecting spoofing attacks, in which an adversary masquerades as a legitimate node and sends traffic on its behalf. In Chapter 3, we introduced a high performance RSS-based method for detecting spoofing attacks in static IEEE 802.15.4 networks. We use the same algorithm in HANIDPS to decide whether RSS values are normal or suspicious.

Table 4.2: Requirements of HAN according to the U.S. Department of Energy guideline.

Smart Grid Functionality         | Bandwidth        | Latency       | Availability
Advanced Metering Infrastructure | 100 kbps/node    | 2-15 sec      | 99-99.99%
Demand Response                  | 100 kbps/node    | 0.5-2 sec     | 99-99.99%
Distributed Energy Resources     | 100 kbps/node    | 0.02-15 sec   | 99-99.99%
Electric Vehicles                | 100 kbps/vehicle | 2 sec-5 min   | 99-99.99%

f4) Sequence Number (SN): The regular ordering of sequence numbers according to the standard is defined for the C-IDPS.
Unusual sequence numbers can be suspicious.

f5) Packet Error Rate (PER): A major cause of high PER is the use of a busy channel. IEEE 802.15.4 employs carrier sense multiple access with collision avoidance (CSMA-CA) to evaluate the availability of the channel. Under normal conditions the PER must be low. However, illegitimate (such as jamming sources) or legal (such as WiFi interference) coexistence of other signals can increase the PER. This affects the network throughput and causes latency in packet delivery. We use a mathematical model of ZigBee performance to find the acceptable PER which allows the required bandwidth and latency for HAN operations. A summary of the network requirements for HANs, as indicated by the U.S. Department of Energy [97], is provided in Table 4.2. In [98], a model for computing the maximum delay and the upper bound on data rate in beacon-enabled guaranteed time slot (GTS) ZigBee networks was provided, as in (4.1) and (4.2):

$$D_{max} = \frac{b}{C} + (k - 1) \times BI - (T_{GTS} + T_e) - k \times T_{GTS} \qquad (4.1)$$

$$B_{C,T}(t) = C \times T_{GTS} + C \times \left(t - (2 \times BI - T_e)\right) \qquad (4.2)$$

where $D_{max}$ is the maximum delay, $b$ is the minimum burst size, $C$ is the service rate, which for ZigBee is 250 kbps, $k$ is the number of GTS slots, $BI$ is the beacon interval, $T_{GTS}$ is the duration of data transmission within the GTS slot, $T_e$ is the portion of the GTS during which there is no data transmission, and $B_{C,T}(t)$ represents the upper bound on data rate. To account for network interference, and considering that the number of re-transmissions before declaring channel access failure in the ZigBee specification is 5, [98] modified (4.1) and (4.2) by replacing $C$ with $C_{adj}$ as in (4.3):

$$C_{adj} = P \times C \times \sum_{i=0}^{4} (1 - P)^{i} \qquad (4.3)$$

where $P$ is the probability of a clear channel. As (4.1), (4.2) and (4.3) suggest, the maximum delay and the guaranteed data rate of a client are functions of the number of assigned GTSs and the probability of a clear channel for the client.
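As a worked illustration of (4.3), the following Python sketch evaluates the interference-adjusted service rate for a given probability of a clear channel. The function and parameter names are assumptions made for this example; only the 250 kbps rate and the limit of five channel access attempts come from the text.

```python
def adjusted_service_rate(p_clear, rate_bps=250_000, attempts=5):
    """Service rate scaled by the chance that one of `attempts`
    channel access tries finds the channel clear, as in (4.3)."""
    if not 0.0 <= p_clear <= 1.0:
        raise ValueError("p_clear must be a probability")
    # Sum over i = 0 .. attempts-1 of (1 - P)^i: i failed tries, then success.
    geometric_sum = sum((1.0 - p_clear) ** i for i in range(attempts))
    return p_clear * rate_bps * geometric_sum


rate_clear = adjusted_service_rate(1.0)   # channel always clear
rate_half = adjusted_service_rate(0.5)    # channel clear half of the time
```

The geometric sum collapses to a closed form, so the adjusted rate equals the nominal rate multiplied by 1 - (1 - P)^5, which is the probability that at least one of the five attempts succeeds.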
We calculate the minimum probability of a clear channel for different numbers of assigned GTSs which allows the required performance according to Table 4.2. The PER for an IEEE 802.15.4 network in the presence of interference is formulated as follows [99]:

$$PER = 1 - (1 - P_b)^{N_z - \lceil T_c / b \rceil} \times (1 - P_b^{I})^{\lceil T_c / b \rceil} \qquad (4.4)$$

where $P_b$ is the bit error rate (BER) without interference, $P_b^{I}$ is the BER with interference, $N_z$ is the number of bits in a packet, $b$ is the duration of bit transmission, and $T_c$ is the collision time. Without interference $P = 1 - P_b$ and in the presence of interference $P = 1 - P_b^{I}$. Therefore, we can calculate the threshold for acceptable PER. For instance, using (4.1), (4.2) and (4.3) and according to Table 4.2, the theoretical ranges that satisfy the demand response requirements in terms of number of GTS slots and probability of clear channel are bounded by (4,0.99), (5,0.7), (6,0.5), (7,0.3). In calculating these points the following parameters were used based on the ZigBee and SEP 2.0 specifications: $b$=508 bytes, $C$=250 kbps, $BI$=960 symbols.

f6) Node Availability (NA): Represents whether or not health messages are received from the agents.

The detection module evaluates each of the above features, and if one of them is abnormal, an intrusion is detected and the prevention module is triggered. The results of the evaluation of these features are also used to define the state space for the prevention module. The components of the state space are defined as $\{f_1, \ldots, f_6\}$, where $f_i$, $i = 1, \ldots, 6$, are binary values assigned by evaluating features 1-6. For each of D, TR, RSS, SN, PER and NA, the C-IDPS checks whether or not the feature is normal, as described above, and accordingly assigns a binary value to $f_i$: 0 if the feature is normal and 1 if it is abnormal.

4.4.3 Actions

a1) Spoofing Prevention: In Chapter 3, we proposed an RSS-based method for distinguishing and filtering illegitimate packets in static IEEE 802.15.4 networks.
We use the dynamic threshold method introduced in Chapter 3, Section 3.6.2, as a defensive mechanism in HANIDPS.

a2) Interference Avoidance: In [99] an algorithm for detecting and avoiding WiFi interference in ZigBee networks was proposed. The method is also effective in combating other sources of interference such as jamming signals. We adopt the interference avoidance scheme in [99] as one of the actions that HANIDPS may perform when encountering an abnormal state. The algorithm is summarized as follows. The C-IDPS checks its link quality indicator (LQI). LQI is a MAC layer parameter which indicates the current quality of received signals and provides an estimate of how easily received signals can be demodulated. The LQI value ranges from 0 to 255 and is inversely related to PER. When the LQI is small, it can be inferred that a high PER is due to poor link quality rather than problems with the end device. If the LQI is low, the coordinator makes all the routers within the personal area network (PAN) perform interference assessment through the energy detection (ED) scans defined in the ZigBee protocol. During an ED test, the transceiver scans all the IEEE 802.15.4 compliant channels in the frequency band supported by the transceiver. If the ED value is beyond the threshold of 35 (which corresponds to a noise level between -65 dBm and -51 dBm), interference is detected. Based on the results of the ED scans, the coordinator selects a channel with acceptable quality and all PAN devices migrate to this new channel.

a3) Packet Drop: The C-IDPS discards the received packets whose datagram or sequence numbers deviate from those predicted by the protocol, without forwarding them to the intended destinations.

a4) Packet Forward: The C-IDPS forwards the received packets without further processing. This is helpful when an attacker targets the IDPS by sending high rate traffic and causes DoS by exhausting the IDPS.
Another example is when none of the actions is effective in mitigating the attack and they only impose overhead on the system.

4.4.4 Learning Algorithm

After evaluating the network features, C-IDPS infers the network state s, and if an intrusion is detected by the detection module, it performs an action a which results in a transition to a new state s′. The transition probability depends only on the current state and action, and is independent of all previous states and actions; therefore, it satisfies the Markov property. Considering that attackers use different strategies to target the network, the interaction between HANIDPS and the attacker creates a dynamic environment, meaning that by taking an action in a given state, the next state is unpredictable. Therefore, among various existing reinforcement learning algorithms we use Q-learning, which is suitable for dynamic environments. In Q-learning a utility function is defined as a map between the state-action pairs and their Q-values. Q-values predict the cumulative reward that will be received following the state-action pair. When the state space is not too large, a look-up table (LUT) can effectively be used as the utility function. The LUT can be initialized with zeros or based on previous knowledge of the environment. During the learning process, time is divided into decision epochs. In each epoch the agent chooses action a in state s, which results in a transition to state s′ and a reward or penalty based on how appropriate the transition is. After each epoch the LUT is updated using (4.5):

$$Q(s,a) = Q(s,a) + \alpha \left( R(s,s',a) + \max_{a'} \gamma\, Q(s',a') - Q(s,a) \right) \qquad (4.5)$$

where α is the learning rate and γ is the discount factor. We use a polynomial learning rate, $\alpha = 1/(1+t)^{\omega}$, since it has a faster convergence rate compared to a linear learning rate [100].
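The LUT update in (4.5) and the polynomial learning rate can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in this work: the names `learning_rate` and `q_update`, the use of a dictionary as the LUT, the representation of a state as a tuple of six binary features, and the numeric values of ω and γ in the usage lines are all hypothetical choices made for the example.

```python
from collections import defaultdict


def learning_rate(t, omega):
    """Polynomial learning rate alpha = 1 / (1 + t)**omega."""
    return 1.0 / (1.0 + t) ** omega


def q_update(lut, s, a, reward, s_next, actions, t, gamma, omega):
    """Apply one LUT update of the form (4.5) and return the new Q(s, a).

    lut     : dict mapping (state, action) -> Q-value (the look-up table)
    reward  : R(s, s', a) observed for this transition
    actions : iterable of all actions available in s_next
    t       : index of the current decision epoch (0, 1, 2, ...)
    """
    alpha = learning_rate(t, omega)
    # Best Q-value reachable from the new state s'
    best_next = max(lut[(s_next, b)] for b in actions)
    lut[(s, a)] += alpha * (reward + gamma * best_next - lut[(s, a)])
    return lut[(s, a)]


# Illustrative usage (all numbers are placeholders, not thesis results).
ACTIONS = (1, 2, 3, 4)                 # a1..a4
lut = defaultdict(float)               # LUT initialized with zeros
s_attack = (0, 0, 1, 0, 0, 0)          # hypothetical abnormal state
s_normal = (0, 0, 0, 0, 0, 0)
q_update(lut, s_attack, 1, reward=1.0, s_next=s_normal,
         actions=ACTIONS, t=0, gamma=0.1, omega=0.77)
```

A dictionary keyed by (state, action) keeps the sketch short; with 64 states and 4 actions a fixed 64 × 4 array would serve equally well.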
As explained in Section 4.4.2, in our problem the state elements are binary values assigned based on whether or not the network features are normal. At state s, the state elements are $\{f_1^{s}, \ldots, f_6^{s}\}$. At each state the four actions described in Section 4.4.3 are possible: spoofing prevention, interference avoidance, packet drop and packet forward. By performing action a in state s the process moves to state s′ with state elements $\{f_1^{s'}, \ldots, f_6^{s'}\}$. To assess the new state, C-IDPS extracts and reevaluates the features of network traffic after performing the action. Following each action a reward is received based on the state transition as formulated in (4.6). We define the reward as a function of the changes in state elements and the cost of performing an action as follows:

$$R(s,s',a) = \sum_{i=1}^{6} \beta_i \left( f_i^{s} - f_i^{\hat{s}} \right) - \mathrm{cost}(a) \qquad (4.6)$$

where $\beta_i$ is the weight of feature i, which is assigned according to the importance of each feature. For instance, given that node availability might have higher priority than having a normal traffic rate, by choosing $\beta_6 > \beta_2$ a higher reward will be assigned to the action which keeps the node available. The cost function, cost(·), reflects the costs associated with performing an action. It accounts for the network overhead, imposed delay and resource usage, such as battery consumption, incurred by performing an action. As a case in point, the cost of dropping packets is lower than that of changing the channel, which results in some network overhead, delay and resource usage. While the LUT can be initialized randomly, there exist some relationships between the features and actions. For instance, when the datagram does not comply with the specifications, dropping the packets might be a better choice compared to other actions. This knowledge can be employed to initialize the LUT to reduce the learning time. If action a2 or a4 is selected, $\{f_1^{\hat{s}}, \ldots, f_6^{\hat{s}}\}$ are equal to the elements of the new state s′.
Otherwise, if action a1 or a3 is performed, C-IDPS reevaluates features f1 to f4 of the packets forwarded by the C-IDPS and accordingly defines $\{f_1^{\hat{s}}, \ldots, f_4^{\hat{s}}\}$; $f_5^{\hat{s}}$ and $f_6^{\hat{s}}$ are equal to $f_5^{s'}$ and $f_6^{s'}$, respectively. We used this method because when actions a1 or a3 are performed C-IDPS drops some of the network packets, and in order to decide how suitable these actions have been we need to evaluate some features of the forwarded packets rather than all of the network traffic.

4.5 Performance Analysis

4.5.1 Performance of the Detection Module

Among the 6 features used in HANIDPS to detect network abnormalities, 4 (D, TR, SN, PER) are directly extracted from system specifications and are not considered major sources of false positives. For instance, a healthy packet never has a D different from what the protocol defines. The SN of legitimate packets also does not violate the specification. The probability of having a TR and PER beyond the threshold under normal conditions is very low, since SEP 2.0 specifies strict rules for traffic rates and network requirements. While ambient noise, temporary system faults or wireless communication faults can produce false positives in evaluating NA and some other features, the major cause of false positives in HANIDPS is the RSS evaluation, since RSS is a statistical parameter and a trade-off between FPR and DR is required. The FPR of the detection module is formulated as follows:

$$\mathrm{FPR} = P\left( \bigcup_{i=1}^{6} (f_i = 1) \,\middle|\, \mathrm{normal} \right) \le \sum_{i=1}^{6} P^{i}_{\mathrm{normal}}(f_i = 1) \approx P^{3}_{\mathrm{normal}}(f_3 = 1) \qquad (4.7)$$

where $P^{i}_{\mathrm{normal}}$ is the probability distribution of the ith feature under normal conditions. As we have shown in Chapter 3:

$$P^{3}_{\mathrm{normal}}(f_3 = 1) = 1 - F_{\chi^2(m)}\left( \frac{\tau^2}{\sigma^2} \right) \qquad (4.8)$$

where $\chi^2(m)$ is the Chi-square distribution with m degrees of freedom, m being equal to the number of AMs used in spoofing detection. HANIDPS uses one AM, which is in the C-IDPS. F(·) represents the CDF of $\chi^2$, $\sigma^2$ is the variance of the RSS values and τ is a threshold.
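To make the FPR expression in (4.8) concrete for the single-AM case (m = 1), the Chi-square CDF can be evaluated with the standard library alone, since for one degree of freedom F(x) = erf(sqrt(x/2)). The sketch below is illustrative only: the function names and the numeric threshold values are assumptions introduced for the example, and `rss_variance` plays the role of the denominator of $\tau^2/\sigma^2$ in (4.8).

```python
import math


def chi2_cdf_1dof(x):
    """CDF of the Chi-square distribution with one degree of freedom."""
    if x <= 0.0:
        return 0.0
    return math.erf(math.sqrt(x / 2.0))


def spoofing_fpr(tau, rss_variance):
    """False positive rate in the form of (4.8) for m = 1:
    1 - F(tau^2 / sigma^2)."""
    return 1.0 - chi2_cdf_1dof(tau ** 2 / rss_variance)


# Illustrative sweep: a larger threshold lowers the FPR
# (the variance and thresholds are placeholder values).
variance = 4.0
for tau in (1.0, 2.0, 4.0, 6.0):
    print(f"tau = {tau:.1f}  FPR = {spoofing_fpr(tau, variance):.4f}")
```

The sweep shows the trade-off mentioned above from the FPR side: raising τ reduces false positives, while the DR depends on how far the attacker's RSS distribution lies from that of the genuine node.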
When τ is exceeded, a spoofing attack is detected. The DR of the spoofing detection module is:

$$\mathrm{DR} = P^{3}_{\mathrm{attack}}(f_3 = 1) = 1 - F_{\chi^2(m,\lambda^2)}\left( \frac{\tau^2}{\sigma^2} \right) \qquad (4.9)$$

where $P^{3}_{\mathrm{attack}}$ is the probability distribution of the RSS feature under attack and λ is the distance between the SDC distributions of the attacker and the genuine nodes. For further explanation of the performance of the spoofing detection module we refer the readers to Chapter 3.

For the other features, attacks are only detectable if they create network traffic that does not comply with the SEP 2.0 specifications. This, however, enables detection and prevention of a variety of attack types.

4.5.2 Performance of the Prevention Module

The intrusion prevention module performs two types of actions, training and operation.

1) Training actions: These actions are taken for the purpose of training the Q-learning algorithm. They are selected before the algorithm has converged and are not effective in mitigating attacks. Therefore, they are only considered sources of network and system overhead. Assuming a simple case where one action, $a_j$, is effective in mitigating the attack, and T iterations are required before the algorithm converges, the overhead of the learning algorithm is:

$$\mathrm{Overhead} = \sum_{t=1}^{T} \sum_{i=1,\, i \ne j}^{4} P_i(t)\, O(a_i) \qquad (4.10)$$

where $P_i(t)$ is the probability of choosing action i at iteration t, which depends on the initial values, the reward function and the order of state-action pairs in the LUT; $O(a_i)$ represents the overhead of performing action i; and T is the convergence time of the algorithm. In [100] the convergence rate of Q-learning was studied. The authors showed that in asynchronous Q-learning with a polynomial learning rate, aside from the parameters of Q-learning, the convergence time is related to the covering time, L.
The covering time indicates the number of state-action pairs, starting from any pair, until all state-action pairs appear in the sequence with a probability of at least 0.5. The order of this dependency is $\Omega(L^{2+1/\omega} + L^{1/(1-\omega)})$, which is optimized for ω = 0.77. Having 6 binary features, the state space size of HANIDPS is 64; for each state there exist 4 possible actions. However, not all states are experienced during a specific attack. The number of states that are crossed during an attack, and therefore the covering time, depend on the attack complexity. The number of states can range from 2, when the attacker uses a specific attack type which follows the same routine over time, to a few, when the attacker reacts and adjusts his/her strategy according to the actions taken by HANIDPS. Our experiments in Section 4.6 show that the algorithm converges with the required precision in a few iterations, which keeps the learning overhead in an acceptable range.

2) Operation actions: These actions are taken after the algorithm is trained and are chosen as the best defense mechanism against the attack.

2.1) Performance of Spoofing Prevention: In Chapter 3, Section 3.7.2 we evaluated the performance of the spoofing prevention algorithm in detail.

2.2) Performance of Interference Avoidance: Experimental results in [99] showed that with an ED scan duration of 135 ms, the proposed algorithm provides the best balance between scan duration and accuracy, where the LQI readings for 4600 packet transmissions on each channel are analyzed. During this time network nodes cannot transmit normal traffic. In return, the authors showed that by choosing a channel with less interference, the sensors' battery life can be prolonged by up to 2-3 years, whereas if the network operates under high interference, a high PER will result in a large retransmission rate which wastes the energy of sensors.
The other two actions, packet forward and packet drop, do not impose significant delay or network overhead.

4.6 Evaluations

To evaluate the performance of HANIDPS, we study existing attacks against IEEE 802.15.4 networks and analyze the detection and prevention capability of HANIDPS against them. Further, we conduct two experiments to show how HANIDPS dynamically learns the most efficient strategy against an attack.

4.6.1 Home Area Network Intrusion Detection and Prevention System against IEEE 802.15.4 attacks

1) Radio Jamming: Radio jamming is the intentional or unintentional emission of radio signals which, by decreasing the signal to noise ratio, disturb the data flow of a wireless network. When the network is under jamming, PER and NA are not in the expected range. High interference increases the PER, and since nodes are not able to communicate properly, health messages will not be received by the C-IDPS. When these two features of the feature space are abnormal, HANIDPS detects an attack and triggers the prevention module. Of the actions defined for HANIDPS, interference avoidance is effective in stopping unintentional jamming: C-IDPS changes the network channel to a new high quality one. Coexistence with WiFi networks is one of the significant concerns in ZigBee HANs [99], since high rate WiFi traffic can cause unintentional jamming. When jamming is unintentional the interfering device will not change its channel, and therefore migrating the HAN to the new channel resolves the problem. This is also true for simple cases of intentional jamming. But in more complicated attacks, for instance when the attacker also changes its channel or conducts wide band jamming which covers the whole ZigBee frequency range, the interference avoidance scheme will not be effective.
Yet these attacks are more expensive and energy consuming.

2) Replay-protection: IEEE 802.15.4 uses a replay-protection mechanism in which the sequence number of a received frame is compared with the sequence number of the previous frame. If the former is equal to or smaller than the latter, the frame is dropped. An attacker can send frames with large sequence numbers to a receiver, causing it to drop legitimate frames. The detection module checks the sequence numbers of the frames. If they do not comply with the protocol standard, the SN feature will be abnormal and the attack will be detected. By dropping the illegitimate packets, which is one of the actions defined for HANIDPS, this attack can effectively be stopped. The prevention module learns the proper action after a few iterations.

3) Steganography: An attacker uses the reserved fields of packets to create a hidden channel and transfer hidden data. A detailed investigation of steganography attacks in IEEE 802.15.4 was reported in [101]. HANIDPS checks the datagram of packets and, if it is abnormal (for instance when reserved bits are not 0, as happens in steganography), detects an attack. The effective action against this attack is dropping malicious packets, which is learned through reinforcement learning and performed by the prevention module in HANIDPS.

4) Back-off manipulation: A malicious node steals channel access from legitimate nodes by using a very short back-off period. The malicious node is either one of the network nodes that has been penetrated, or an outside node which forges the ID of one or more legitimate nodes and conducts spoofing to access the network. In both cases the attacker has a very high traffic rate and TR for that node will be 1. Also, since nodes are not able to transmit their packets properly, health messages might not be received by C-IDPS and NA will be 1.
Besides, not being able to access the channel increases the PER. In the case of spoofing, RSS values are not normal either. Hence, by causing abnormal TR, NA, PER and RSS the attack is easily detectable. The prevention module can mitigate the attack by filtering high rate traffic from the attacker and not allowing the coordinator to assign the channel to that node. Another action that can be helpful, in case of abnormal RSS values, is spoofing prevention. Through interaction with the environment and receiving rewards and penalties, HANIDPS converges to the best action.

5) DoS against data transmission during contention free period (CFP): A malicious node extracts the ID and GTS number of legitimate nodes through eavesdropping, then forges their IDs and terminates their traffic by sending GTS deallocation requests [58]. Since the malicious and legitimate nodes are not located at the same place, their RSS values will not be similar and the attack is detectable by evaluating the RSS feature. Also, through the spoofing prevention action, packets with abnormal RSS values are distinguished and filtered. Therefore, once the prevention module learns what the appropriate action is, the attack is stopped.

6) DoS against GTS requests: An adversary keeps track of the GTS list and fills up all available GTSs by sending several GTS allocation requests. As a result, legitimate nodes will not have the chance to transmit their data during the CFP [58]. Again, since the RSS of the attacker does not match that of the genuine node, the attack is detectable, and through spoofing prevention malicious packets can be distinguished and dropped.

4.6.2 Experiment 1

We study the performance of HANIDPS under a simple spoofing attack where, other than the RSS values, the traffic features of the malicious node are consistent with the protocol. DoS against GTS requests is an example of this attack.
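The attack descriptions above all reduce to which of the six features become abnormal, so the state seen by the prevention module can be sketched as a small encoding step. The snippet below is a minimal illustration, not the thesis implementation: the function names and the integer indexing of the LUT rows are assumptions made for the example, while the feature order (D, TR, RSS, SN, PER, NA) follows the order in which the features are listed in the text.

```python
FEATURES = ("D", "TR", "RSS", "SN", "PER", "NA")  # f1..f6


def state_vector(abnormal):
    """Return (f1, ..., f6): 1 for each abnormal feature, 0 otherwise."""
    unknown = set(abnormal) - set(FEATURES)
    if unknown:
        raise ValueError(f"unknown feature(s): {sorted(unknown)}")
    return tuple(1 if name in abnormal else 0 for name in FEATURES)


def intrusion_detected(state):
    """An intrusion is flagged as soon as any feature is abnormal."""
    return any(state)


def state_index(state):
    """Map the six bits to an integer in 0..63 (a hypothetical LUT row)."""
    index = 0
    for bit in state:
        index = (index << 1) | bit
    return index


# Simple spoofing: only the RSS feature is abnormal.
spoofing = state_vector({"RSS"})
# Back-off manipulation with spoofing: TR, NA, PER and RSS are abnormal.
backoff = state_vector({"TR", "NA", "PER", "RSS"})
print(spoofing, intrusion_detected(spoofing), state_index(spoofing))
print(backoff, intrusion_detected(backoff), state_index(backoff))
```

With six binary features the index ranges over the 64 states mentioned in Section 4.5.2, and pairing each index with the four actions gives the rows and columns of the LUT.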
We used the same method and testbed as explained in Chapter 3, Section 3.8 to implement and evaluate the performance of HANIDPS against a spoofing attack.

Figure 4.1: ROC of the detection module.

2) Performance of the detection module: By applying the spoofing detection mechanism to RSS logs, on average we observed 92.5% DR for 0% FPR. The average ROC curve based on the RSS logs of 4 AMs is depicted in Fig. 4.1. When the attack is detected the state of the victim node is {0,0,1,0,0,0}, which triggers the prevention module.

3) Performance of the prevention module

Training phase: We implemented Q-learning in MATLAB. The LUT was initialized to 0.5. We considered the same weight for all features, and assigned costs of 0.25, 0.5, 0.1 and 0 to actions 1-4, respectively. The highest cost was defined for interference avoidance since it is an energy consuming process. The first four actions were exploratory [100]. Then we reduced the exploration rate by a factor of $1/4^{[t/4]}$ as the algorithm proceeded (t is the iteration number). Use of exploratory actions allows the algorithm to explore all actions and converge to the most effective ones. The learning rate was polynomial with ω = 0.77 and the discount factor was γ = 0.1. Following the spoofing prevention action the state of the node changed to {0,0,0,0,0,0}, while for the other actions the state remained unchanged. We observed that for this simple case, where only two states are experienced, the algorithm learned the best action after 4 iterations.

Table 4.3: Performance for dynamic threshold spoofing prevention.

Threshold coefficient    1/2      1/3      1/4      1/5
PR (%)                   93.46    96.07    97.14    97.62
FPR (%)                  6.16     11.17    16.21    21.68

Operation phase: We evaluated the performance of the spoofing prevention mechanism as the effective action against the attack.
We applied the spoofing prevention algorithm to RSS logs. This time the AM nodes played the role of a victim. Performance was measured when the threshold was 1/2, 1/3, 1/4 and 1/5 of the distance between the mean values. The window size, i.e., the number of frames which are clustered each time, was 64. The results are shown in Table 4.3.

4.6.3 Experiment 2

In the second experiment we consider a scenario in which an attacker uses the ID of a legitimate node and sends packets that do not comply with the standard datagram. For instance, the values of some of the reserved fields in the packets are different from what has been specified in the standard; this is an example of a steganography attack. In this scenario the state of the victim node is {1,0,1,0,0,0}. Both a2 and a3 can stop the attack and change the state to {0,0,0,0,0,0}. However, action 3 has a lower overhead and in this case has a higher accuracy. By implementing this scenario with the same parameters as in Experiment 1, we observed that the algorithm converged to action 3 after 4 iterations.

4.7 Summary

In this chapter we have introduced HANIDPS, a novel IDPS for ZigBee-based HANs. Considering the insecure environment, the use of wireless technology and the limited resources of HAN devices, the HAN is vulnerable to cyber attacks, which necessitates the application of appropriate IDSs. Also, due to the large scale and the high cost of false positives in the context of HANs, IDPSs which not only detect but also automatically stop the attacks are highly required. HANIDPS combines a model-based intrusion detection method tailored for HAN specifications and a machine learning-based prevention technique which enables dynamic defense against adversaries without prior knowledge of the attacks.
Using novel techniques for spoofing prevention, and through utilization of an effective mechanism for countering intentional and unintentional interference, HANIDPS secures the network against a variety of attack types. Extensive analysis and simulations have demonstrated the effectiveness of our approach.

Chapter 5

Detection of Malicious Activities in AMI Using Customers' Consumption Patterns

5.1 Introduction

One important property of AMI is that, unlike most traditional IT systems, a good portion of the data transferred through AMI is to a high degree predictable. Smart meters send their usage reports to the utilities at predefined time intervals. Not only are the time intervals between messages constant, but it is also possible to find a statistical model for a customer's consumption pattern. Irregularities in the usage pattern can be a sign of malicious activity. In this chapter, we leverage the predictability of AMI data to detect adversarial activities such as energy theft and attacks against direct and indirect load control.

Non-technical loss is one of the major concerns in smart grids, since the application of digital smart meters and the addition of a cyber layer to the metering system introduce numerous new vectors for energy theft. Current AMI ETDSs are mainly categorized into three groups [102]: state-based, game theory-based and classification-based. State-based detection schemes [28],[36],[37] employ specific devices, like wireless sensors and radio-frequency identification tags, to provide a high detection accuracy, which, however, comes at the price of the extra investment required for the monitoring system, including device cost, system implementation cost, software cost and operating/training cost. In game theory-based methods [38],[39], the problem of electricity theft detection is formulated as a game between the electricity thief and the electric utility.
These methods may present a low cost and reasonable, though not optimal, solution for reducing energy theft. Yet, how to formulate the utility function of all players, including thieves, regulators and distributors, as well as their potential strategies, is still a challenging issue. Classification-based approaches [40],[41],[42],[43],[44],[45],[46],[47] take advantage of the detailed energy consumption measurements collected from the AMI. Under normal conditions customers' consumption follows certain statistical patterns; irregularities in the usage pattern can be a sign of malicious activity. Data mining and machine learning techniques are used to train a classifier based on a sample database, which is then utilized to find abnormal patterns. Since these techniques take advantage of the readily available smart meter data, their costs are moderate. However, several shortcomings in existing classification-based schemes limit their DR and cause a high FPR.

One problem with the utilization of machine learning classifiers in ETDSs is data imbalance; i.e., the numbers of normal and abnormal samples are not in the same range. Benign samples are easily available from historic data. Theft samples, on the other hand, rarely or never exist for a given customer. Besides, due to zero-day attacks, in many cases examples of the attack class cannot be obtained from the historic data. The lack of a thorough dataset of attack samples limits the DR. Furthermore, classification-based methods are vulnerable to contamination attacks, where, through granular changes in the data that pollute the dataset, an adversary deceives the learning machine into accepting a malicious pattern as a normal one.

Another challenging issue that affects the performance of classification-based methods is the fact that several non-malicious factors can alter the consumption pattern, e.g., change of residents, change of appliances, seasonality, etc.
If such factors are not dealt with properly, they will result in a high FPR. As argued in [103], due to the base-rate fallacy phenomenon, the FPR is the limiting factor for the performance of an intrusion detection system. This is particularly true for ETDSs, where false positives are very costly. Once an energy theft attack is detected, on-site inspection is required for final verification, which is an expensive procedure.

Finally, most existing classification-based methods require a high sampling rate to achieve an acceptable accuracy. However, an effective ETDS must not jeopardize customers' privacy. As discussed in [104], while there are several techniques to mitigate privacy concerns, such as power mixing and data aggregation, the privacy-by-design principle of data minimization is the most effective one. The required sampling rate of smart meters, which enables the achievement of smart grid goals like load management and demand response, is still a controversial issue. The higher the sampling rate, the greater the risk of revealing customers' private information. Thus, ETDSs that do not rely on a high data collection frequency are preferable.

Besides energy theft, the increasing use of information technology for demand side management introduces new types of cyber-intrusions that aim to change the load curve in order to damage the power system equipment by causing circuit overflow or other malfunctions. These attacks work by tampering with the pricing information or direct load control commands. We propose a detection mechanism for such attacks that works by monitoring the usage patterns of customers.

The main contributions of this chapter are:

• We design a novel algorithm for detecting energy theft attacks against AMI, CPBETD.
CPBETD employs transformer meters along with monitoring of abnormalities in customers' consumption patterns to provide a cost-effective and high performance solution for energy theft detection. Through the application of appropriate clustering techniques and transformer meters, unlike existing classification-based methods, CPBETD is robust against contamination attacks and non-malicious changes in consumption patterns, and therefore achieves a higher DR and a lower FPR.

• We address the problem of imbalanced data and zero-day attacks by generating a synthetic attack dataset, benefiting from the fact that theft patterns are predictable. Through extensive experiments we show that this significantly improves the DR and enables the detection of a wide range of attack types.

• We study the effect of sampling rate on detection performance, and show that compared to existing methods, CPBETD provides a higher performance with a lower sampling rate. Hence, it has a smaller effect on customers' privacy.

• We argue that the metrics commonly used for ETDS performance evaluation in the literature are not adequate. While DR and in some cases FPR are used for performance evaluation, we suggest that the Bayesian detection rate (BDR) and the associated costs of the ETDS should also be considered to justify its application.

• We test the performance of CPBETD with real data from over 5000 customers. The dataset is publicly available and can be used as a benchmark for comparison among different energy theft detection methods. The best and most recent energy theft detection solutions in the literature are also tested for comparison. The results demonstrate the effectiveness of our approach.

• We introduce instantaneous anomaly detection (IAD), an algorithm for detecting attacks against direct and indirect load control by monitoring abnormalities in consumption patterns in a neighborhood.
Through simulations we show the effectiveness of the proposed algorithm.

The rest of this chapter is organized as follows. In Section 5.2, we survey the literature related to electricity theft detection in AMI. Threat models for electricity theft attacks are provided in Section 5.3. Section 5.4 describes the CPBETD algorithm. A theoretical analysis of CPBETD is provided in Section 5.5 and experimental results are provided in Section 5.6. In Section 5.7 we introduce IAD, and we evaluate its performance in Section 5.8.

5.2 Related Work

In this section we review existing ETDSs in the literature which employ the consumption data of smart meters to find fraudulent customers. Monitoring of customers' load profiles for signs of energy theft in traditional power systems has attracted the attention of researchers in the past. In [105], using historical consumption data, a data mining method along with an SVM classifier was used to detect abnormal behaviors. The average daily consumption of customers over a two-year period was calculated and the long-term trend in energy consumption was used to detect fraudulent customers. This method is only capable of detecting abrupt changes in the load profile. Besides, the detection delay is about two years.

In [41], using six months of usage reports, five attributes, including average consumption, maximum consumption, standard deviation, sum of the inspection remarks and the average consumption of the neighborhood, were chosen to create a general pattern of power consumption for each customer. K-means based fuzzy clustering was performed to group customers with similar profiles. A fuzzy classification was then performed and Euclidean distances to the cluster centers were measured. Customers with large distances to the cluster centers were considered potential fraudsters.
Clustering the customers and relying on long-term measurements limit the accuracy of this ETDS and cause a long detection delay. Having more detailed metering information in AMI, CPBETD provides a much better performance with a much shorter delay.

A series of works [42–44] by Depuru et al. have studied the use of AMI data in classification-based ETDSs. In [42] SVM was used to approximate the consumption pattern of customers based on 96 smart meter readings per day. The classifier was trained using historical data of normal and theft samples. New samples were classified based on some rules as well as the SVM results. In [43] a neural network model was incorporated to estimate the SVM parameters in order to reduce the training time of the classifier, and a data encoding method was proposed to improve the efficiency and speed of the classifier. Their method, however, is only effective in detecting energy theft attacks that result in zero usage reports, since in one step of the encoding procedure the metering data are converted into binary values. Therefore, the proposed classification technique cannot detect a wide range of attack types. In [44], an improved encoding technique was proposed and both SVM and a rule engine based algorithm were applied to the encoded data to improve the classification accuracy. Portions of the algorithms were parallelized to reduce the detection time. Still, the algorithm suffers from the common shortcomings of classification-based methods, as we have described in detail in the introduction. The database used for performance evaluation is not publicly available, and the authors did not explain how they obtained theft samples for all customers or what percentages of the training and testing data were theft patterns.
In contrast, in this chapter we focus on addressing the problems associated with classification-based techniques, such as imbalanced data, contamination attacks and the dynamic nature of consumption patterns, to improve the detection accuracy. We also argue that detection time might not be as crucial as other factors like FPR or the implementation cost of the ETDS.

In [45], the problem of class imbalance for energy theft detection in traditional power systems was addressed. Considering that the number of people who commit fraud is much smaller than the number of honest customers, standard classifiers are overwhelmed by benign samples and tend to ignore the minority class. To achieve a higher performance, different classification techniques, including one-class SVM, optimum path forest and C4.5 decision tree, were combined in [45]. The combined classifier showed 2-10% improvements over individual classifiers. The shortcoming of this method is that while it imposes a high computational load, the performance improvement is not substantial. We address the problem of class imbalance by generating a dataset of malicious samples using the benign ones. As our experiments show, this method significantly improves the classification performance.

In [36], a multi-sensor energy theft detection framework for AMI (AMIDS) was presented. AMIDS collects evidence of malicious behavior from three types of information sources: 1) cyber-side network and host-based IDSs, 2) on-meter anti-tampering sensors, and 3) power measurement-based anomalous consumption detectors through nonintrusive load monitoring (NILM). These three types of information were combined to minimize FPR. While combining the information from different sources is effective in reducing FPR, the algorithms chosen for detecting anomalies have some drawbacks.
Use of NILM, which requires a high sampling rate, reveals information about the types and time of use of appliances in customers' premises. In this work we focus on detecting energy theft attempts only based on customers' consumption patterns that can also be used as a part of a multi-sensor framework like AMIDS. Compared to the NILM-based technique, CPBETD provides a high performance with a much lower sampling rate.

Salinas et al. [46] introduced a privacy preserving ETDS that uses peer-to-peer (P2P) computing. A central meter is deployed in each neighborhood to measure the total electricity consumption at each time instance, which is assumed to be a linear combination of the energy consumption of all customers in the area. By solving a linear system of equations suspicious users are found. While the proposed method is valuable in that it is the only privacy preserving scheme so far, there are several limitations in this approach. Although the method is vulnerable to inaccuracies in technical loss (TL) calculation, this effect has not been studied. Besides, this method is only effective for detecting energy theft attacks with constant reduction rates, where the real consumption values are multiplied by a constant less than one for several consecutive readings. However, there are many other energy theft scenarios, such as sending random numbers as data. Our approach is capable of detecting more diverse attack types. CPBETD also employs central meters but in a completely different approach. Here we use distribution transformer meters to short list areas with high probability of theft, and to overcome the limitations of classification-based techniques.

Authors of [47] suggested modeling the probability distributions of the normal and malicious consumption patterns, and application of the generalized likelihood ratio (GLR) test to detect energy theft attacks.
They used auto regressive moving average (ARMA) to model customers' normal and malicious consumption distributions. They assumed that an attacker would choose a probability distribution that decreases the mean value of the real consumption. This, however, is not necessarily true with AMI. Considering the dynamic pricing in smart grids, electricity theft is possible by only changing the order of meter readings without altering the average. Another major issue with the ARMA-GLR detector is that it is only effective if the normal behavior and attack patterns can accurately be modeled by an ARMA process. Mashima et al. [47] also studied non-parametric algorithms such as exponentially weighted moving average, which provide a better precision when the model assumptions are inaccurate. They observed a better performance for ARMA compared to other models. Our tests show that CPBETD significantly outperforms the ARMA-GLR detector.

5.3 Threat Model

The main objective of an energy theft attack is to pay less than the real value for the consumed energy. Energy theft attacks against AMI are categorized into three groups:

• Physical attacks: where illegal customers physically tamper with their meters to report a lower usage, for instance through application of a strong magnet to cause interference, or by reversing or disconnecting the meters. Customers might also directly wire high consuming appliances to an external feeder, bypassing the meters.

• Cyber attacks: can be used within the smart meters or over the communication link to the utility company. Examples include gaining privileged access to the meter firmware, tampering with the meter storage, interrupting measurements, and intercepting the communication link.

• Data attacks: target the metering values and are enabled through physical and cyber attacks.

Detailed explanations of the attacks in each category are available in [36].
CPBETD is designed to detect abnormalities in the consumption patterns. Considering that the outcome of each attack is manipulation of the meter readings, CPBETD is effective in detecting all types of attacks presented above.

5.4 Consumption Pattern Based Electricity Theft Detection Algorithm

Among the three key elements of information security, i.e., confidentiality, integrity and availability, energy theft attacks target data integrity. The major cost of energy theft is financial loss to the utility company caused by unpaid usage. Therefore, while performance is important, the monetary cost introduced by the detection mechanism must be minimized. For the task of theft detection, false positives can be very expensive, since once a suspected fraudulent customer is detected, on-site inspection is required. On the other hand, factors such as detection delay might not be as critical, since the attacks rarely cause immediate damage and once an attack is detected the associated financial loss can be compensated through appropriate fines. In Section 5.5 we argue that the acceptable FPR depends on factors such as theft rate, which varies across regions; hence ETDSs with adjustable FPRs are preferable. CPBETD is designed based on the mentioned properties, and has two phases, i.e., training and application.

5.4.1 Training Phase

1) The algorithm is trained to estimate the TL in transmission lines within the NAN, $E_{TL}$. Several mechanisms exist for this purpose. As a case in point, in [106] a method for precise calculation of TL in the branches of a distribution system was proposed. For each branch a specific circuit is assumed.
Using the data of consumed energy and currents collected by smart or traditional power meters and by employing least square regression, the likely resistances at the lines connecting the consumption points to the distribution transformers as well as the non-ohmic losses are calculated. These parameters are then used to predict TLs in future time intervals. By comparing the total power loss with the estimated TL, the non-technical loss (NTL) at the distribution transformer level is calculated. This method does not require frequent measurements. Besides, it is robust against falsified data caused by energy theft attempts during the application phase: if the metering data show a lower usage than the true amount of consumption, the algorithm will calculate a lower TL, which increases the gap between the total amount of loss and the TL, i.e., the NTL. The only limiting factor in the proposed method is that the data used during the training phase must be genuine, otherwise the model will not be accurate. There are methods [107–109] that mostly rely on the physical characteristics of the distribution lines rather than customers' data to calculate the TL in each segment; yet they are less accurate due to the dependency of such features on environmental conditions and inaccurate knowledge of the circuit elements. While in most works a high precision was reported, a thorough quantitative evaluation was missing. However, since the uncertainties in TL are usually less than the alterations in customers' consumptions, a higher precision for calculating TL compared to NTL is expected. The average calculation error in [109] was reported to be around 0.5%. Design of the TL estimator is outside the scope of this work.
We assume that $DR_{TM}$ and $FPR_{TM}$ are the DR and FPR, respectively, of the NTL detector.

2) The next step is data preprocessing, including operations like dimension reduction (the process of reducing the number of random variables) and normalization. Each data vector in the dataset includes the meter readings of a customer over a 24 hour period; for instance for n measurements per hour the data vector has 24 × n elements. While there are several methods for dimension reduction that minimize the information loss by extracting the important features of data, we sum up the in-between samples to make the algorithm compatible with different metering rates. That is, rather than applying a feature extraction technique that saves the key information of a higher dimension data vector while reducing the data size, we only add up the samples. This is because we assume that the time interval of the data received from smart meters is not known.

3) Once the data are converted into the proper format, the k-means algorithm [110] is performed on the benign dataset. Several non-malicious factors can alter the consumption pattern, such as seasonality, change of appliances, and different usage during weekdays and weekends. In order to have a better DR, k-means clustering with different values of k is performed on the data, and each time the silhouette value of the clusters is calculated. A peak in the silhouette plot for $k = l$ shows that the data originate from $l$ different distributions. Clusters that have few members are eliminated and will not be used for training the classifier. This can help to prevent pollution of the benign dataset by undetected attacks. We use $l'$ to denote the final number of clusters after eliminating the small groups.

4) The next step is preparing a dataset for training the classifier.
While a dataset of benign samples for each customer is easily obtainable using historic data, malicious samples might not be available, since energy theft might never or rarely happen for a given customer. In order to address the problem of imbalanced data, one solution is application of single-class classification techniques where the classifier is trained only using normal samples. However, as shown in Section 5.6, for the present application the performance of a one-class classifier is poor. Another solution is to utilize density function approximation methods as in [111]. However, such approaches are only effective if the data can accurately be modeled by the approximation function, which in practice might be difficult to attain. Instead, we propose to create a dataset of malicious samples using the benign samples. The goal of energy theft is to report a lower consumption than the true amount used by the consumer, or to shift the high usage to low tariff periods. Thus, it is possible to generate malicious samples using the benign ones.

Assuming that $x = \{x_1, \ldots, x_n\}$ is a vector of true consumption values for a 24 hour period with n samples, the utility will receive $y = \{y_1, \ldots, y_n\}$ as the meter readings. For honest customers $y = x$, while for fraudulent users $y = h(x)$, so that $\sum_{i=1}^{n} y_i \le \sum_{i=1}^{n} x_i$. Through studying different scenarios for electricity theft and their effects on measured values, $h(\cdot)$ is extractable. For instance $h(x) = \alpha x$, $0 \le \alpha < 1$, is one possibility. Therefore, it is practical to generate malicious samples using the benign dataset. Although defining all possible functions that meet this condition is not practical, through considering a variety of scenarios and taking advantage of the generalization property of SVM, a thorough dataset of attack samples can be generated.

5) The next step is training the classifier.
Among several existing techniques for regression and classification, we chose SVM due to its superior performance in many applications compared to traditional methods like likelihood ratio test and neural networks. Furthermore, successful utilization of SVM for problems that deal with the similar data type [105] motivates us to use this classifier. Here, the classifier has $l' + 1$ classes, $l'$ for benign and 1 for malicious samples.

6) SVM parameters can be adjusted to attain different performance in terms of DR/FPR. However, as shown in Section 5.6, the ROC of an SVM by itself does not provide a wide range for DR/FPR. In order to achieve the appropriate performance, first the required FPR, $FPR_{req}$, is calculated as described in Section 5.5. Let $DR_{SVM}$ and $FPR_{SVM}$ denote the best achievable performance by SVM. Parameter m is then calculated as follows:

\[
m = \left\lceil \frac{\log(FPR_{req})}{\log(FPR_{TM} \times FPR_{SVM})} \right\rceil \tag{5.1}
\]

where m represents the number of times an abnormal pattern must be detected before an energy theft alarm is raised, so that the resulting FPR, $(FPR_{TM} \times FPR_{SVM})^m$, does not exceed $FPR_{req}$. This mechanism, however, will also reduce the DR:

\[
DR = (DR_{TM} \times DR_{SVM})^m \le DR_{TM} \times DR_{SVM} \tag{5.2}
\]

The larger the value of m, the smaller the FPR, yet it also means a larger detection delay.

5.4.2 Application Phase

1) For each neighborhood one or more transformer meters measure the total electricity provided to the customers in the area, $E_{TM}(t)$. This value is compared with the total amount of consumption reported by the smart meters of the corresponding distribution transformer, $\sum_i E_{SM_i}(t)$. NTL is reported if for any time, t, during a day, $E_{TM}(t) > \sum_i E_{SM_i}(t) + E_{TL}(t) + \varepsilon$, where ε is the calculation error for TL. This parameter adjusts the trade off between $DR_{TM}$ and $FPR_{TM}$, and therefore the DR and FPR of the whole algorithm. In CPBETD attacks are detectable if the amount of stolen electricity in the area is greater than ε.
The above test is performed each time new samples are collected.

2) Each new sample is preprocessed and converted into the proper format consistent with the training set.

3) SVM is applied to a new sample to determine whether it belongs to the benign or attack class.

4) If Step 1 did not detect an anomaly and the new sample was classified as benign by the SVM, the new sample is added to the benign dataset and the corresponding attack patterns will be generated and added to the attack dataset.

5) If NTL was detected in Step 1 and the classifier recognized an attack, a suspicious behavior of the smart meter is reported. An energy theft is detected when the suspicious behavior of the smart meter is repeated m times over a certain period. During this time new samples are stored in a temporary database. Once an energy theft is detected, an appropriate action such as on-site inspection is exercised. Based on the amount of NTL calculated in Step 1, higher priority for inspection is assigned to smart meters located in areas with larger NTL. If a theft is verified, samples in the temporary database are added to the attack dataset. Otherwise, they will be added to the benign dataset and their corresponding attack patterns to the attack dataset.

6) Another possibility is when Step 1 does not detect an NTL, but SVM recognizes an anomaly. This condition might have three causes. It might be due to SVM misclassification, or because of error in NTL calculation in Step 1. These cases are not expected to happen frequently in consecutive days. On the other hand, the condition might be because of some alterations in consumption habits, for instance due to changes of residents or appliances, which result in major changes of usage pattern. In this case the condition will persist. Therefore, when SVM detects an anomaly while there is no sign of NTL in Step 1, the new sample is stored in a temporary database.
If in the following days this condition happens frequently, the old dataset will be discarded and a new dataset based on the samples in the temporary database will be generated. Once the dataset is large enough, the classifier is retrained. A credibility factor, $CF_i$, is assigned to each smart meter, which is a binary variable initialized to one. When a non-malicious anomaly is detected, as described above, $CF_i$ is set to zero and when the situation is resolved it will be set back to one. When the algorithm detects an energy theft, smart meters with $CF_i = 1$ have a higher priority for further action. This step of the algorithm makes CPBETD robust against non-malicious changes in consumption pattern.

7) If NTL was detected in Step 1, but SVM did not recognize any anomaly and the condition persists, it shows that an attack might be happening, but SVM cannot identify it. In this case the benign dataset of the customers is analysed for signs of a data contamination attack, in which an adversary, by gradually changing the data and polluting the dataset, deceives the learning machine into accepting a malicious pattern as a normal one. The long-term trend in daily usage of the customer is studied. A descending slope in the long-term consumption curve can be a sign of a contamination attack. If analysis of historic data does not show a contamination attack, CPBETD raises an alarm to indicate that an attack might be happening but the algorithm cannot detect it, and the algorithm continues its normal operations for new samples. This happens in rare scenarios, e.g., when a new load with high consumption is directly connected to a feeder. This step of the algorithm makes CPBETD robust against contamination attacks.

Pseudo-codes of the algorithm for application phase are provided in Fig.
5.1.

Input: NS (new sample), threshold1, threshold2, threshold3 (> threshold2)
Output: attack (boolean)
Variables: counter1 (number of times an attack is detected), counter2 (number of non-malicious anomalies), NTL (boolean), CF_i (binary), SVM_out (boolean), TDB1 (temporary database), TDB2 (temporary database)

if E_TM > sum_i E_SMi + E_TL + ε
    NTL = true;
else
    NTL = false;
end if;
classify NS by SVM; if it is classified as attack, then SVM_out = true, otherwise SVM_out = false;
if NTL = true
    if SVM_out = true
        if counter1 ≥ threshold1
            attack = true;
            add TDB1 to the attack dataset;
            go to end;
        else
            counter1++;
            add NS to TDB1;
        end if
    else if SVM_out = false
        raise an alarm informing about the risk of an undetectable attack;
    end if
else if NTL = false
    if SVM_out = true
        if counter2 < threshold2
            counter2++;
            add NS to TDB2;
        else if threshold2 ≤ counter2 ≤ threshold3
            counter2++;
            CF_i = 0;
            add NS to TDB2;
        else if counter2 > threshold3
            discard the old dataset;
            generate a new dataset using TDB2;
            retrain the SVM;
            CF_i = 1;
        end if;
    else if SVM_out = false
        attack = false;
    end if;
end

Figure 5.1: Pseudo-codes of application phase

5.5 Performance Analysis

5.5.1 Required Performance

As explained in [40], the revenue of a distribution utility is formulated as follows:

\[
R(e, D) = \sum_i T\,(q_i - q_i^u) + \sum_i \rho(e, q_i^u, D)\,F(q_i^u) \tag{5.3}
\]

where $q_i$ is the total consumption of user i, $q_i^u$ is the unbilled part of the usage, $T$ is the price of electricity, $\rho$ denotes the probability of detecting an electricity theft, $e$ is the effort invested in anti-fraud technologies, $D$ is the anomaly detection test, and $F(\cdot)$ represents the recovered fines from the detected theft. On the other hand, the associated costs include the investments for protecting against theft, $\Psi(e)$, and the costs of producing enough electricity to meet the demand of all customers, $C(\cdot)$. Therefore, the total profit of the utility is:

\[
R(e, D) - C(q_i) - \Psi(e). \tag{5.4}
\]

The operational cost of handling $D$ is proportional to the effort, $e$, of the utility to manage false alarms. At the same time, increasing $e$ improves the probability of detecting a theft. Therefore, the optimal $D$ is the one with maximum $\rho$ subject to an upper bound on FPR. The required FPR might vary over different regions due to the base-rate fallacy phenomenon. The Bayesian detection rate (BDR) is defined as:

\[
P(I \mid A) = \frac{P(I) \times DR}{P(I) \times DR + P(\bar{I}) \times FPR} \tag{5.5}
\]

where I stands for intrusion and A for alarm. For IDS alarms to be reliable, this probability must be high. As (5.5) suggests, BDR not only depends on DR and FPR, but is also affected by the probability of occurrence of an intrusion. When this probability is low, no matter how large the DR is, the denominator will be dominated by the FPR term. Therefore, in order to achieve a reasonable value for BDR, a very low FPR is required. In the case of energy theft $P(I)$ is usually a small value and might significantly vary in different areas.

5.5.2 Classification Method

In SVM, a model is generated based on the training data and is used to predict the target values of the test data. First, using a kernel function, the data vector is mapped to a higher dimensional space where data classes are more distinguishable. Then a separating hyper-plane with maximum margin to the closest data points in each class is found. This concept translates into a convex quadratic optimization problem formulated in (5.6).

\[
\min \ \frac{1}{2}\|w\|^2 + C \sum_i \xi_i \quad \text{such that} \quad y_i\left(w^T \phi(x_i) + b\right) \ge 1 - \xi_i, \qquad \phi(x_i) = e^{-\|x\|^2} \tag{5.6}
\]

where $x_i$ is the ith input data, $y_i$ is the class label for $x_i$, $w$ is the normal vector to the hyper-plane, $C$ is the penalty for the error term, $\xi_i$ measures the constraint violation, $\phi$ is the kernel function, and $b$ determines the offset of the hyper-plane from the origin along the normal vector [112].

5.5.3 Clustering Method

Silhouette plots are applied to determine the number of clusters within a dataset.
Assume that the data have been clustered into k clusters and for each sample i, $a(i)$ is the average dissimilarity of i with other samples within the same cluster. Also $b(i)$ is the least average dissimilarity of i to any other clusters. The silhouette value, $s(i)$, is defined as:

\[
s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}} \tag{5.7}
\]

The average of $s(i)$ over all samples within a cluster shows how close the samples in the cluster are, and averaging over the entire dataset shows how properly the data have been clustered. We use this method to determine the number of clusters in the dataset of each customer. Defining separate classes for distinct clusters can help to achieve a higher classification accuracy.

5.6 Performance Evaluations

We used the smart energy data from the Irish Smart Energy Trial [113] in our tests. The dataset was released by Electric Ireland and Sustainable Energy Authority of Ireland (SEAI) in Jan 2012; it includes half hourly electricity usage reports of over 5000 Irish homes and businesses during 2009 and 2010. Customers who participated in the trial had a smart meter installed in their homes and agreed to take part in the research. Therefore, it is a reasonable assumption that all samples belong to honest users. The large number and variety of customers, long period of measurements and availability to the public make this dataset an excellent source for research in the area of analysis of smart meters data. For each customer there is a file containing half hourly metering reports for a 535 day period. We reduced the sampling rate to one per hour and for each customer divided the file into a dataset of 535 vectors, each with 24 components.
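The resampling just described can be sketched as follows. This is only an illustration under stated assumptions, not the code used in the experiments: the function name `to_daily_vectors`, its parameters, and the choice to drop an incomplete trailing day are hypothetical. Adjacent readings are summed, as in the preprocessing step of the training phase, and the result is cut into one vector per day.

```python
def to_daily_vectors(readings, samples_per_day=48, target_per_day=24):
    """Sum adjacent readings to lower the sampling rate, then split into days.

    readings: flat list of meter readings in time order (assumed layout).
    samples_per_day: readings per day in the input (48 for half-hourly data).
    target_per_day: components per day in the output vectors.
    """
    if samples_per_day % target_per_day != 0:
        raise ValueError("target rate must divide the input rate")
    group = samples_per_day // target_per_day
    n_days = len(readings) // samples_per_day  # assumption: drop a partial last day
    vectors = []
    for d in range(n_days):
        day = readings[d * samples_per_day:(d + 1) * samples_per_day]
        # Add up each run of `group` consecutive readings.
        vectors.append([sum(day[i:i + group])
                        for i in range(0, samples_per_day, group)])
    return vectors
```

With half-hourly input and the default arguments, each output vector has 24 components, so a 535 day file would yield 535 such vectors.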
Then, based on the dataset of benign samples, for each sample $x = \{x_1, \ldots, x_{24}\}$, we generated 6 types of malicious samples as follows:

for t = 1, ..., 24
1) $h_1(x_t) = \alpha x_t$, $\alpha = \text{random}(0.1, 0.8)$
2) $h_2(x_t) = \gamma_t x_t$, $\gamma_t = \text{random}(0.1, 0.8)$
3) $h_3(x_t) = \beta_t x_t$, where $\beta_t = 0$ if start_time < t < end_time and $\beta_t = 1$ otherwise;
   start_time = random(0, 23 − minimum_off_time),
   duration = random(minimum_off_time, 24),
   end_time = start_time + duration;
   here minimum_off_time = 4
4) $h_4(x_t) = \gamma_t \, \text{mean}(x)$, $\gamma_t = \text{random}(0.1, 0.8)$
5) $h_5(x_t) = \text{mean}(x)$
6) $h_6(x_t) = x_{25-t}$

We have defined functions $h_1(\cdot), \ldots, h_6(\cdot)$ based on existing and possible electricity theft scenarios. Functions $h_1(\cdot)$ and $h_2(\cdot)$ represent scenarios where a fraction of the customer usage is reported. $h_1(\cdot)$ multiplies all the samples by the same randomly chosen value, while $h_2(\cdot)$ multiplies each meter reading by a different random number. We have used a uniform random number generator in our simulations. $h_3(\cdot)$ formulates a common electricity theft method in which the smart meter does not send its measurements or sends zero for some duration, bounded below by minimum_off_time in the formula. $h_4(\cdot)$ and $h_5(\cdot)$ report, respectively, a random fraction of and the exact value of the average of the readings during the day. $h_6(\cdot)$ reverses the order of readings. $h_5(\cdot)$ and $h_6(\cdot)$ represent attacks against indirect load control mechanisms in which the price of electricity varies during different hours of the day; while the total amount of electricity usage stays the same, the usage is reported to happen during the low tariff periods. $h_1(\cdot), \ldots, h_6(\cdot)$ formulate some examples of electricity theft attacks that we have chosen, based on the existing reported attacks and possible attack scenarios as explained above, to evaluate the performance of our method.

Figure 5.2: An example of the daily consumption.
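The six attack functions can be sketched in Python as below. This is a minimal sketch under assumptions rather than the simulation code behind the reported results: the function and parameter names are hypothetical, start time and duration are drawn as integers, and the h3 window uses the strict inequalities stated in the list.

```python
import random
from statistics import mean

MIN_OFF_TIME = 4  # hours, the value given in the text


def h1(x, rng):
    alpha = rng.uniform(0.1, 0.8)          # one factor for the whole day
    return [alpha * v for v in x]


def h2(x, rng):
    return [rng.uniform(0.1, 0.8) * v for v in x]  # new factor per reading


def h3(x, rng):
    start = rng.randint(0, 23 - MIN_OFF_TIME)      # assumption: integer hours
    end = start + rng.randint(MIN_OFF_TIME, 24)
    # t runs from 1 to 24; readings strictly inside the window become zero.
    return [0.0 if start < t < end else v for t, v in enumerate(x, start=1)]


def h4(x, rng):
    avg = mean(x)
    return [rng.uniform(0.1, 0.8) * avg for _ in x]


def h5(x, rng):
    avg = mean(x)
    return [avg for _ in x]


def h6(x, rng):
    return list(reversed(x))                       # same total, reversed order


def generate_attacks(x, seed=None):
    """Return one malicious variant per attack type for a benign day x."""
    rng = random.Random(seed)
    return {f.__name__: f(x, rng) for f in (h1, h2, h3, h4, h5, h6)}
```

Calling `generate_attacks(day)` on each benign daily vector yields six malicious vectors per benign one, which matches the 6:1 ratio of attack to benign samples used in the experiments.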
In practice the relationship between theft and benign samples can be found and formulated by analyzing the existing attack samples in the historic database of fraudulent customers, and then be used to generate the attack dataset for all customers. Also, benefiting from the generalization property of SVM, the algorithm is able to detect attacks for which the exact function is not defined in generating the training dataset. We study this effect in Section 5.6.3. An example of the daily consumption of a customer is shown in Fig. 5.2. Fig. 5.3 shows the corresponding attack patterns. Experiment results in the rest of this section are for the anomaly detector part of CPBETD. The total DR and FPR of the algorithm are scaled by $DR_{TM}$ and $FPR_{TM}$.

Figure 5.3: An example of the daily consumption.

5.6.1 Experiment 1: One-class Support Vector Machine

In the first test we examine the performance of one-class SVM, in which the classifier is trained only using normal samples. Among 535 samples of the benign dataset we use 458 samples for training and 77 samples for testing. In each 7 consecutive days one sample is randomly chosen for the testing set and the other 6 for the training set. Also for each attack type, 535 samples are generated. In total, the testing set includes 3287 samples, scaled in the range [-1,1]. We choose radial basis function (RBF) as the SVM kernel, and use grid search [112] to find the best values for the kernel parameter, γ, and the upper bound on the fraction of training points, ν. Other SVM parameters are C = 50 and e = 0.1. The above procedure is repeated for 5000 customers. For each customer the best performance is considered the one that provides the highest difference (HD) between DR and FPR for different combinations of γ and ν. The average value of HD for 5000 customers is found to be 47%, with the average DR and FPR being 76% and 29%, respectively. Fig.
5.4 shows the ROC curve for three customers with best, average and worst performances. As the results suggest, without abnormal samples in the training phase, classification performance is not promising.

Figure 5.4: ROC curves for three customers with the worst, average and the best detection performance using one-class SVM.

5.6.2 Experiment 2: Multi-class Support Vector Machine

In the second experiment, we train the classifier using both benign and malicious samples. After preprocessing we employ k-means clustering on the benign dataset and study the silhouette plots to find the best value for k. For most customers we observe that k=1 or 2 provides the best result. Then we train a multi-class SVM with k + 1 classes. The silhouette plots for 2 customers with k=1 and k=2 are shown in Fig. 5.5. Originally, the training set includes 458 benign and 2748 malicious samples. We use over-sampling, in which the members of the minority class are replicated, to make the number of benign and attack samples equal. This helps prevent the classifier from being biased toward the larger class. We use the RBF kernel and apply grid search to find the appropriate values for γ and C; the value of e is 0.1.

Figure 5.5: Silhouette plots for customers with k=1 and k=2.

Figure 5.6: ROC curves for three customers with the worst, average, and the best detection performance using multi-class SVM.

For this experiment we observe the average HD of 83% with the average DR and FPR being 94% and 11%, respectively. Fig. 5.6 shows the ROC curve of three customers with best, intermediate and worst performances.
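The over-sampling step used in this experiment can be illustrated with a short sketch. It is an assumption-laden example rather than the procedure actually run: the helper name `oversample_minority` is hypothetical, and when the class sizes do not divide evenly the remainder is filled by random draws from the minority class, a detail the text does not specify.

```python
import random


def oversample_minority(benign, attack, seed=0):
    """Replicate the smaller class until both classes have the same size.

    Returns (benign, attack) with the minority list enlarged by replication.
    """
    if not benign or not attack:
        raise ValueError("both classes need at least one sample")
    rng = random.Random(seed)
    benign_is_minority = len(benign) <= len(attack)
    minority, majority = (benign, attack) if benign_is_minority else (attack, benign)
    full, rest = divmod(len(majority), len(minority))
    # Whole copies first, then (assumption) random extras for any remainder.
    balanced = list(minority) * full + rng.sample(list(minority), rest)
    return (balanced, majority) if benign_is_minority else (majority, balanced)
```

For the sizes quoted above, 458 benign and 2748 malicious samples, each benign sample is simply replicated six times, since 2748 = 6 × 458.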
As Fig. 5.6 shows, multi-class SVM provides a significantly better performance compared to one-class SVM.

Figure 5.7: ROC curves for three customers with the worst, average and the best detection performance using multi-class SVM with undefined attacks.

Table 5.1: Experiment results

Performance   Exp. 1   Exp. 2   Exp. 3
HD (%)        47       83       70
DR (%)        76       94       86
FPR (%)       29       11       16

5.6.3 Experiment 3: Multi-class Support Vector Machine with New Attacks

An attacker might use a different $h(\cdot)$ than those used to train the classifier. To study this effect, in this experiment we only train the SVM using $h_3(\cdot)$, yet we include all $h_i(\cdot)$s in the test set. The SVM parameters are similar to Experiment 2. We observe the average performance of HD = 70% with DR = 86% and FPR = 16%. Three examples of ROC curves are shown in Fig. 5.7. Though in this experiment we observe a lower DR compared to Experiment 2, the performance is still significantly better than one-class SVM. A summary of the experiment results is provided in Table 5.1.

Figure 5.8: Relationship between BDR and FPR.

5.6.4 Overall Performance of Consumption Pattern Based Electricity Theft Detection Algorithm

While the exact value for the probability of an energy theft is unknown and varies across different areas, we use the result of an experiment in [114] to evaluate the performance of CPBETD. In 2001, the Arizona public service company conducted a study to provide an accurate estimate of energy theft in its coverage area. The goal of the study was to find the extent of meter tampering and the resulting financial loss with 95% accuracy. Among 868,000 customers, 550 were randomly selected including rural (65%) and urban, residential (88%) and industrial users.
The results showed the definite meter tampering rate of 0.72%. Considering the tampering rate of 0.72%, Figure 5.8 shows the relationship between BDR and FPR. As the figure suggests, in order to achieve a 90% BDR even for 100% DR, FPR must be around 0.1%. Table 5.2 shows the DR, FPR and BDR for different values of m. By increasing m, appropriate values for FPR and therefore BDR are achievable, while DR stays in an acceptable range. Considering the ROC curves of the SVM, obtaining such a low FPR for m=1 is either impossible or happens for a very low DR.

Table 5.2: BDR for different values of m

m          1      2      3      4
DR (%)     94     88     83     78
FPR (%)    11     1      0.1    0.01
BDR (%)    6.3    38     85     98

5.6.5 Effect of Sampling Rate on Performance

To study the effect of sampling rate on detection performance, we tested the average performance of 100 customers for different metering frequencies. Results are shown in Table 5.3. We observe a peak at 12 samples/day, which indicates that higher sampling rates do not necessarily result in a better performance. This can be due to the fact that for lower sampling rates, the effect of theft on each sample might be more apparent. For instance, when a fraudulent customer aims to report 5 kW less usage than his actual consumption by uniformly dividing this amount over all samples, for 24 samples the change in usage pattern is less apparent than when 5 kW is spread over 12 samples. Therefore, the attack might be more apparent for lower sampling rates. On the other hand, for very low sampling rates the consumption pattern cannot be modeled accurately. Using the lowest sampling rate that provides an acceptable performance is important, not only to minimize the required resources, but also to preserve customers' privacy. In [115] the authors showed that for 0.5 samples/second, one can find as much detailed information as the title of the movie or the TV channel displayed in the household.
In [36], using 0.5 samples/minute the authors were able to find the type and time of usage of appliances inside the household. This information cannot be obtained precisely using 1 sample/hour. For 1 sample/day, one cannot find detailed information about the consumption habits of a user, yet it is still possible to conclude whether or not the customers are at home. This information can be used, for example, for the purpose of committing burglary. On the other hand, the utility requires metering data for demand response management. In [40] a game theory approach was proposed to formulate the privacy-preserving demand-response problem as the task of finding the maximum allowable sampling rate that keeps the demand lower than a predefined maximum value. We cannot say there is a specific point at which the users' measurement is or is not private anymore. However, the lower the sampling rate, the less information can be extracted from the measured data. So the sampling rate that allows the utility to achieve its goals regarding demand response, etc. can be found, and the utilities can be required by law not to collect metering data at a frequency higher than necessary. One advantage of CPBETD is that it provides good performance even at a sampling rate as low as 4 samples/day.

Table 5.3: Effect of sampling rate on detection performance
Sampling rate (samples/day)   DR (%)   FPR (%)
1                             62.1     28.0
2                             76.5     16.0
4                             88.7     9.0
6                             90.2     9.7
12                            93.1     11.0
24                            89.8     12.3

5.6.6 Discussion and Comparison

Customers' normal consumption pattern may vary due to several non-malicious factors. These changes can be temporary, periodic or permanent, and can potentially cause false positives. CPBETD uses mechanisms to handle these situations. Short-term changes are the result of unusual behaviors, for instance having a big party, which do not last for more than one or a few days.
The use of distribution transformer meters helps to reduce false alarms due to short-term changes, since a suspicious behavior is reported only when both the distribution transformer meters and the SVM detect an anomaly. Therefore, if a customer exhibits benign unusual behavior while no NTL is detected at the distribution transformer level, a false positive will not be generated, unless another customer within the neighborhood area is committing fraud at the same time, which results in detection of NTL at the distribution transformer. In this case the customer is considered suspicious. To handle this situation we suggested calculating the required FPR and adjusting the parameter m so that theft is only reported when a suspicious behavior is repeated several times. Moreover, customers might have dissimilar usage habits over different days of the week (weekday vs. weekend) or seasons. Using k-means clustering enables the algorithm to separate distinct distributions in the data, and separate classifiers can be trained accordingly. If a time dependency is observed for clusters, the corresponding classifiers are labeled, and new samples are classified using the classifier appropriate to their time. Finally, permanent changes can happen due to a change of residents, appliances, etc. As explained in Step 6 of the application phase, CPBETD can identify these changes and avoid generating false positives.

It is also possible that the dataset initially used to train the algorithm contains theft samples. One solution is stricter supervision during the data collection period. Using the NAN NTL detector, areas suspected of energy theft can be shortlisted and customers in those areas can be considered for on-site inspection.
During this phase, the use of TL calculation methods that do not rely on customers' data is recommended.

One limitation of machine learning methods for anomaly detection in the security domain is vulnerability to contamination attacks. Concurrent use of the NAN NTL detector along with the customers' anomaly detector in CPBETD helps to mitigate this concern. When an NTL is detected in the NAN while there is no sign of anomaly in customers' usage, the trend in daily usage of customers can be monitored. A descending slope in the long-term consumption curve can be an indication of a contamination attack. CPBETD is capable of detecting a variety of attack types. The only energy theft scenario that cannot be detected is when a customer employs a new appliance and directly connects it to an outside feeder. Since there is no sign of this appliance in the historic dataset, the anomaly will not be detected.

Table 5.4: Comparison among energy theft detection methods
Parameter                                  ARMA-GLR [47]   CPBETD   AMIDS [36]   P2P [46]
DR (%)                                     67              94       NA           attack1: 96; others: NA
FPR (%)                                    28              11       NA           attack1: 9; others: NA
Required sampling rate                     low             low      high         low
Robustness against non-malicious changes   vulnerable      robust   vulnerable   robust
Privacy preservation                       medium          medium   weak         strong

Table 5.4 shows a comparison among the proposed algorithm and the most recent and best ETDSs existing in the literature. We implemented the ARMA-GLR detector proposed in [47], and tested its performance using the SEAI dataset. The results are provided in Table 5.4. The NILM-based technique in AMIDS [36] requires around 0.5 samples/minute to function properly, and thus the 2 samples/hour sampling rate in the SEAI dataset is inadequate to measure its performance. We also implemented the LUD algorithm in [46], which is based on P2P computing.
While the algorithm is effective in detecting attack1, in which all usage measurements are multiplied by the same coefficient (h1(M)), it performed poorly for other attack types.

5.7 Instantaneous Anomaly Detection Algorithm

Although monitoring the usage pattern of each customer over a long period is useful in detecting malicious activities with long-term effects on a specific customer, such as energy theft attempts, it is not suitable for attacks that cause immediate damage. Besides, some attacks target several customers simultaneously with minimum influence on each separately; such attacks are not detectable by monitoring the usage pattern of customers independently. To address these issues, we introduce instantaneous anomaly detection (IAD). IAD leverages the fact that at a given time, the consumption of all customers in a NAN follows a certain pattern. In other words, the approximation function for IAD at a given time is a function of the price and the meter readings of all or a group of HANs within the NAN.

A database of normal and malicious patterns is generated using the most recent samples. Each vector in the database consists of the meter readings of all smart meters in a NAN for a given time, along with the electricity price and time of measurement. Using the database, an SVM classifier is trained to distinguish between normal and malicious patterns. Every time new metering data are received from the smart meters, a new sample vector is created and classified. Since a NAN might include hundreds to thousands of HANs, either the HANs should be sub-grouped or, before classification, the sample vector should be preprocessed and its dimension reduced using an appropriate feature extraction method. The classifier then decides whether the sample belongs to the normal or the malicious class. Normal patterns are added to the dataset of normal samples.
Anomalous patterns trigger an alarm and, after a final decision, are added to the database of normal or malicious patterns.

The detection delay of the proposed method is close to the time interval between the smart meter report transmissions. Using IAD, attacks which are distributed over a group of customers are detectable. Furthermore, IAD is more robust against false positives resulting from benign changes in load profile. While it is possible that the consumption pattern of a specific customer changes due to some unusual condition such as having a big party, it is less likely that all or a large group of HANs within a NAN experience non-malicious changes simultaneously.

5.8 Performance Evaluation of Instantaneous Anomaly Detection

5.8.1 Dataset

In order to evaluate the effectiveness of IAD against adversarial activities such as attacks against direct and indirect load control, we constructed a synthetic dataset as follows.

Dataset of Benign Patterns

We generated the probability distribution functions (PDF) of electricity usage for typical households as a function of time and electricity price. According to [116], among 33 appliances reported in the 2001 Residential Energy Consumption Survey (RECS), conducted by the U.S. Department of Energy (DoE), only 17 appliances account for 85% of the total household consumption. Therefore, as in [116], we assumed that these 17 appliances constitute all of the electric devices in a household. We used the same energy profile for appliances as in [116], which is shown in Table 5.5. In our study, the dataset is based on the consumption pattern of ten residential customers.
Through careful study and an interview with one resident from each dwelling, we calculated the consumption distribution of each customer separately. The interviewees were PhD students majoring in electrical engineering; they had a deep knowledge of statistics and probability, and were well aware of the purpose of the study.

Table 5.5: Appliance load profile
Category                   Schedule time ON   Schedule time OFF   Power
Central Air Conditioning   15 min             35 min              1,064 W
Room Air Conditioning      15 min             35 min              221 W
Main Space Heating         15 min             35 min              1,341 W
Furnace Fan                15 min             35 min              190 W
Refrigerator               15 min             35 min              471 W
Freezer                    15 min             35 min              395 W
Dishwasher                 1 h                23 h                1,403 W
Range top                  1 h                23 h                1,468 W
Oven                       1 h                23 h                1,205 W
Microwave Oven             15 min             23.75 h             1,909 W
Water Heating              30 min             60 min              971 W
Lighting                   8 h                16 h                322 W
TV                         8 h                16 h                47 W
PC and Printer             4 h                20 h                263 W
VCR/DVD                    2 h                22 h                96 W
Clothes Dryer              1 h                23 h                2,956 W
Clothes Washer             1 h                23 h                329 W

For each appliance we created a PDF as a function of time and price. For each device we asked the interviewee to determine the probability of using the device during different hours in a working day (Monday-Friday) ($P_i(t)$, $i = 1, \ldots, 17$). We also asked the interviewee about the probability of using a specific device knowing the price of the electricity ($P_i(pr)$, $i = 1, \ldots, 17$). We defined 3 levels for electricity price: low, medium, and high. Then, for each device we calculated its PDF as in (5.8).

\[ P_i(t, pr) = P_i(t)\, P_i(pr), \quad i = 1, \ldots, 17 \tag{5.8} \]

where $P_i(t, pr)$ is the consumption probability of device i at time t with price pr. Based on the distribution functions, we produced a dataset of normal consumption patterns. For each appliance per hour a random number, rand, was generated. We employed the "Random" class in Java as the pseudo-random number generator.
Using the random number, the appliance status, $S_i(t, pr)$, was set to a binary value of ON or OFF according to (5.9).

\[ S_i(t, pr) = \begin{cases} \text{on}, & \mathit{rand} \le P_i(t, pr) \\ \text{off}, & \mathit{rand} > P_i(t, pr) \end{cases} \quad i = 1, \ldots, 17 \tag{5.9} \]

In the next step, the consumption function, $F_i(t, pr)$, of each appliance was found based on the appliance status and its profile.

\[ F_i(t, pr) = \begin{cases} \mathit{Power}_i, & S_i(t, pr) = \text{on} \\ 0, & S_i(t, pr) = \text{off} \end{cases} \quad i = 1, \ldots, 17 \tag{5.10} \]

In (5.10), $\mathit{Power}_i$ is the power consumption of device i according to its energy profile. Finally we calculated the total power function, $E(t, pr)$, as the summation of the consumption functions of all appliances.

\[ E(t, pr) = \sum_{i=1}^{17} F_i(t, pr) \tag{5.11} \]

Using the above method we produced a dataset of normal consumption patterns, including 800 samples for each customer.

Dataset of Malicious Patterns

Attacks against direct and indirect load control (DLC/ILC) constitute some of the major cyber security threats against AMI.

Attacks against indirect load control (ILC): Load control mechanisms aim to modify the consumption pattern of customers. One common method of load control is ILC, in which customers are motivated to change their load curves by dynamic pricing. Price information is transmitted to the smart meters through AMI. Either customers use the pricing information to adjust their usage manually, or automatic energy consumption scheduling (ECS) units within the HANs receive the pricing signal. Based on the customer priorities and pricing information, ECS controls the operation of electrical devices. By injecting false prices, an attacker can affect the usage profile of the customers.
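The benign-pattern generator of (5.8) to (5.11) can be sketched in a few lines, which also makes visible how a falsified price level changes the resulting load. This is a minimal sketch in Python rather than the Java used for the thesis experiments. The three appliances take their power ratings from Table 5.5, but every probability value below is a made-up placeholder, and the names `Appliance`, `joint_probability` and `total_power` are illustrative only.

```python
import random
from dataclasses import dataclass


@dataclass
class Appliance:
    name: str
    power_w: float   # rated power, as in Table 5.5
    p_time: dict     # hour of day -> P_i(t); placeholder values
    p_price: dict    # price level -> P_i(pr); placeholder values


def joint_probability(app: Appliance, hour: int, price: str) -> float:
    """Equation (5.8): P_i(t, pr) = P_i(t) * P_i(pr)."""
    return app.p_time[hour] * app.p_price[price]


def total_power(apps, hour: int, price: str, rng: random.Random) -> float:
    """Equations (5.9)-(5.11): draw ON/OFF per appliance and sum the power."""
    total = 0.0
    for app in apps:
        rand = rng.random()                              # one draw per appliance
        if rand <= joint_probability(app, hour, price):  # (5.9): status is ON
            total += app.power_w                         # (5.10): F_i = Power_i
    return total                                         # (5.11): sum over appliances


# Hypothetical usage probabilities for 6 pm; only the wattages come from Table 5.5.
APPLIANCES = [
    Appliance("Dishwasher", 1403.0, {18: 0.6}, {"low": 0.9, "medium": 0.6, "high": 0.2}),
    Appliance("Lighting", 322.0, {18: 0.9}, {"low": 1.0, "medium": 1.0, "high": 0.9}),
    Appliance("TV", 47.0, {18: 0.7}, {"low": 1.0, "medium": 0.9, "high": 0.8}),
]

if __name__ == "__main__":
    rng = random.Random(42)
    real = [total_power(APPLIANCES, 18, "high", rng) for _ in range(1000)]
    # A household that believes a falsified "low" price signal follows P_i(low).
    fooled = [total_power(APPLIANCES, 18, "low", rng) for _ in range(1000)]
    print(sum(real) / len(real), sum(fooled) / len(fooled))
```

Seeding `random.Random` keeps a generated dataset reproducible, which is convenient when the same samples are reused across training and test splits.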
We simulated this condition through Type 1 attacks as explained below.

• Type 1: We exchanged the probability of consumption over low and high price tariffs, which has the same effect as sending a low price signal when the real price is high and vice versa.

Attacks against direct load control (DLC): DLC is another load control mechanism in which some of the customers' loads are directly controlled by the utility. Automated DLC systems send control signals, such as turn on/off, through AMI. By compromising the DLC, an attacker can affect the load curve. For instance, by injecting a false "turn on" signal to a large number of appliances, the adversary can cause a large sudden increase in total load, which can affect the power quality and damage the customers' and the utility's equipment. DLC attacks are simulated through Type 1 and Type 2 attacks.

• Type 2: We added six megawatts of extra power to the normal consumption during random hours and for random durations between 15 minutes and 3 hours.

5.8.2 Test Results

Table 5.6: Cross classification and detection performance of IAD
Class   Nor. (%)   T. 1 (%)   T. 2 (%)   Acc. (%)   FP (%)
Nor.    85.00      14.00      1.00       -          -
T. 1    14.50      85.50      0.00       85.75      14.00
T. 2    3.00       0.50       96.50      98.00      1.00

We considered a NAN including 20 HANs. The dataset contained 400 samples for each of the benign, Type 1 and Type 2 attack classes. For Type 1 we assumed that all 20 HANs were under attack, since in order to cause immediate damage this attack needs to work on a large scale. The 400 samples of Type 2 attacks contained equal numbers of samples in which 25%, 50%, 75% and 100% of the HANs were under attack. The components of the sample vectors were the power consumption of the 20 HANs at 6 pm. We also assumed that at 6 pm the price of electricity is high. We used 4-fold cross validation for training and testing to avoid overfitting of the classifier.
The datasets were divided into 4 equal-size subsets. Each time, one subset was tested while the 3 remaining subsets were used to train the classifier. The parameters of the SVM were: C = 5, e = 0.05 and γ = 0.3. Classification results are shown in Table 5.6. We observed satisfactory detection performance. However, as can be seen from the table, IAD provides a lower detection performance for Type 1 attacks. The reason is that, in simulating the effect of false prices on consumption pattern, we used a probabilistic approach in which a higher consumption probability was assigned to high tariff periods and vice versa. Therefore, there might be some samples in the dataset of Type 1 attacks that are very similar to the benign class. In the future smart grid, where energy management modules are installed to automatically control the household appliances based on the energy price, and some devices are even under direct control of the utility, the pattern of normal consumption with respect to the pricing information will be more predictable and distinguishable.

To support the idea that under direct or automatic load control the performance of the detection method will improve, we studied the effect of increasing the difference between the probability of usage for low and high prices. Fig. 5.9 shows the accuracy of IAD in detection of Type 1 attacks vs. the probability difference. We observed that as the difference increases, the detection accuracy improves.

Figure 5.9: Detection accuracy of IAD for false price attack as a function of difference between the usage probabilities during high and low price.

5.9 Conclusion

In this chapter we have introduced CPBETD, a new algorithm for detecting energy theft in AMI. CPBETD relies on the predictability of customers' normal and malicious usage patterns.
Along with the application of an SVM anomaly detector, the algorithm uses silhouette plots to identify the different distributions in the dataset, and relies on distribution transformer meters to detect NTL at the transformer level. We have shown that these features provide high performance and make the algorithm robust against non-malicious changes in consumption pattern as well as data contamination attacks. In practice the required performance for an ETDS may vary across different regions. We have shown that by introducing some delay into the detection algorithm, an adjustable performance to match different objectives is achievable. Using extensive experiments on a real dataset of 5000 customers, we have shown that the proposed algorithm provides high performance even with a low sampling rate, which helps to preserve customers' privacy. We have also introduced IAD, an algorithm to detect attacks against DLC and ILC in AMI with short delay, by monitoring abnormalities in consumption patterns. Through simulations and analysis we have demonstrated the effectiveness of the proposed method.

Chapter 6

Conclusions and Future Work

In this chapter, we highlight the contributions of each chapter and summarize the results. We also present a number of possible directions for further research.

6.1 Summary of Accomplished Work

In this thesis we have studied security and privacy issues in smart grids, and proposed several novel algorithms for detecting malicious activities against AMI. In particular, in Chapters 3 and 4 we have addressed the problem of intrusion detection and prevention in ZigBee-based HANs as one of the most vulnerable AMI subsystems. We have proposed HANIDPS, which is an IDPS tailored for the unique requirements and challenges of intrusion detection in HANs. We have also designed methods for detecting major malicious activities against AMI, including energy theft and attacks against direct and indirect load control, in Chapter 5.
Such attacks can originate from any of the AMI networks, comprising HANs, NANs and WANs. A summary of the work accomplished in each chapter is as follows:

• In Chapter 2, we have investigated cyber security and privacy issues in smart grid. We have surveyed the existing solutions and provided direction for further research to develop methods tailored for smart grid requirements.

• In Chapter 3, we have proposed algorithms for detecting and preventing spoofing attacks in static IEEE 802.15.4 networks. The proposed algorithms use the spatial dependency of RSS values to detect and distinguish malicious frames. We have argued that this parameter is hard for attackers to forge, and is therefore effective in detecting and preventing the attacks. We have evaluated the performance of the proposed methods through extensive analysis and experiments, and tested their functionality under different attack scenarios. We have also studied the resource usage and network overhead of the proposed methods. Evaluation results demonstrated the effectiveness of our approach. Further, we have compared the performance of the proposed algorithms with existing RSS-based techniques and showed that our approach provides higher performance with lower resource usage. Considering that the proposed algorithms in this chapter are RSS based, and that RSS has the same properties in the IEEE 802.11 standard as in the IEEE 802.15.4 standard, these algorithms are also applicable to single-antenna WiFi local area networks.

• In Chapter 4, we have argued that due to the large scale and high cost of false positives, in the context of HAN, IDPSs which not only detect but also automatically stop the attacks are highly required. We have introduced HANIDPS, a novel IDPS for ZigBee-based HANs.
HANIDPS combines a model-based intrusion detection method tailored for HAN specifications, and a machine learning-based prevention technique which enables dynamic defense against adversaries without prior knowledge of the attacks. Using novel techniques for spoofing prevention, and through utilization of an effective mechanism for countering intentional and unintentional interference, as well as detecting and dropping frames which are not compliant with HAN standards, HANIDPS secures the network against a variety of attack types. Moreover, we have studied the existing attacks against ZigBee networks and showed the effectiveness of our approach against them. Although HANIDPS has been designed for ZigBee HANs, it is also applicable to other wireless sensor networks with similar properties, i.e., networks that employ a limited number of protocols and applications, and for which it is possible to define a thorough and accurate feature space using the corresponding standards and specifications.

• In Chapter 5, we have introduced novel techniques for detecting malicious activities against AMI by leveraging the predictability property of AMI data. We have proposed CPBETD, a consumption pattern-based energy theft detector which, through the application of appropriate classification and clustering techniques along with transformer meters, provides a high-performance and cost-effective solution for detecting energy theft attempts. We have shown that, unlike existing classification-based techniques, the presented method is robust against non-malicious changes in consumption patterns as well as contamination attacks. In addition, we have argued that existing metrics for evaluating the performance of ETDSs presented in the literature are inadequate and other parameters such as Bayesian detection rate and implementation and operational costs must also be considered.
We have demonstrated the effectiveness of CPBETD by evaluating its performance on real data of 5000 customers. We have further introduced IAD for detecting attacks against direct and indirect load control. IAD monitors the consumption pattern of a group of customers within a NAN, and detects abnormal behaviors with a short delay. We have constructed a synthetic dataset to evaluate the performance of IAD and observed satisfactory results.

6.2 Suggestions for Future Work

In the following, some interesting directions for extending the work presented in this dissertation are introduced.

1. Application of physical layer features for spoofing detection and prevention: In Chapter 3, we introduced algorithms for detecting spoofing attacks in static IEEE 802.15.4 networks based on RSS values of received frames. We argued that since physical layer parameters are hard to forge, they can effectively be used for distinguishing genuine and malicious nodes. One possible direction for future research is the application of other physical layer features, such as the round trip time (RTT) of the request to send/clear to send (RTS/CTS) handshakes. Using multiple parameters can result in a more accurate attack detection solution. Besides, it can help to develop algorithms that are adaptable to non-static and mobile networks.

2. Finding outliers to detect energy theft attempts: In Chapter 5, we proposed an ETDS which distinguishes unethical customers by monitoring abnormalities in the consumption patterns of each customer. One possible direction to extend the proposed work is to group customers with similar usage patterns in a neighborhood and apply cross-correlation and outlier detection algorithms to identify customers exhibiting a different trend in their usage compared to similar customers. A customer whose pattern differs from that of the neighbors is more suspicious for energy theft. For instance, this can be helpful in detecting artificial-looking patterns.

3. Prioritizing theft detection alerts based on monetary loss: Another direction to extend our work on energy theft detection in Chapter 5 is to prioritize theft detection alerts based on the resulting financial loss. By comparing the malicious pattern with the expected normal pattern, the monetary loss due to theft attempts can be calculated. On-site inspection is an expensive procedure; responding to all theft alerts equally is not cost effective for the utilities. Instead, theft attempts that result in higher loss to the utilities should be given higher priority for on-site inspection.

Bibliography

[1] C. M. Furlani, Testimony before the house committee on homeland security subcommittee on emerging threats, cybersecurity, and science and technology. United States House of Representatives, Jul. 2009.
[2] NISTIR 7628 Rev. 1 Guidelines for smart grid cyber security, 2014.
[3] J. Wright, Smart meters have security holes. http://www.msnbc.com/id/36055667, 2010.
[4] FBI: Smart meter hacks likely to spread. http://krebsonsecurity.com/2012/04/fbi-smart-meter-hacks-likely-to-spread, 2012.
[5] K. Scarfone and P. Mell, “Guide to intrusion detection and prevention systems,” NIST Special Publication, vol. 1, pp. 1–127, Feb. 2007.
[6] J. P. Anderson, “Computer security threat monitoring and surveillance,” NIST Publications, vol. 1, pp. 1–56, Feb. 1980.
[7] D. E. Denning, “An intrusion-detection model,” IEEE Transactions on Software Engineering, vol. 13, pp. 222–232, Feb. 1987.
[8] G. Creech and J. Hu, “A semantic approach to host-based intrusion detection systems using contiguous and discontiguous system call patterns,” IEEE Transactions on Computers, vol. 99, Jan. 2013.
[9] B. Mukherjee, L. Heberlein, and K. Levitt, “Network intrusion detection,” IEEE Network, vol. 8, pp. 26–41, Jun. 1994.
[10] L. Heberlein, G. Dias, K. Levitt, B. Mukherjee, and J. Wood, “A network security monitor,” in Proc. of IEEE Symposium on Research in Security and Privacy, Oakland, CA, May 1990.
[11] P.
Stephenson, “Where is the intrusion detection system,” Information Systems Security, vol. 8, pp. 1–6, Dec. 2000.
[12] S. Lee, M. Laurel, and D. Heinbuch, “Training a neural-network based intrusion detector to recognize novel attacks,” IEEE Transactions on Systems and Humans, vol. 31, pp. 294–299, Jul. 2001.
[13] M. Crosbie and E. Spafford, “Applying genetic programming to intrusion detection,” in Working Notes for the AAAI Symposium on Genetic Programming, Cambridge, Sep. 1995.
[14] K. Ilgun, R. A. Kemmerer, and P. A. Porras, “State transition analysis: a rule based intrusion detection approach,” IEEE Trans. on Software Engineering, vol. 21, pp. 181–199, March 1995.
[15] A. El-Semary, J. Edmonds, and J. Gonzalez, “A framework for hybrid fuzzy logic intrusion detection systems,” in Proc. of IEEE International Conference on Fuzzy Systems, Reno, NV, May 2005.
[16] T. Abbes, A. Bouhoula, and M. Rusinowitch, “Protocol analysis in intrusion detection using decision tree,” in Proc. of IEEE ITCC, Nancy, France, Apr. 2004.
[17] R. C. Parks, Advanced metering infrastructure security considerations. Sandia National Laboratories, 2007.
[18] R. Berthier, W. H. Sanders, and H. Khurana, “Intrusion detection for advanced metering infrastructures: requirements and architectural directions,” in Proc. of IEEE SmartGridComm, Oct. 2010.
[19] D. Faria and D. Cheriton, “Detecting identity-based attacks in wireless networks using signalprints,” in Proc. of the ACM Workshop on Wireless Security, New York, NY, Sep. 2006.
[20] J. Yang, Y. Chen, W. Trappe, and J. Cheng, “Detection and localization of multiple spoofing attackers in wireless networks,” IEEE Trans. on Parallel and Distributed Systems, vol. 24, pp. 44–58, Jan. 2013.
[21] Y. Sheng, K. Tan, G. Chen, and D. Kotz, “Detecting 802.11 mac layer spoofing using received signal strength,” in IEEE INFOCOM, 2008.
[22] Y. Chen, W. Trappe, and R. P. Martin, “Detecting and localizing wireless spoofing attacks,” in IEEE SECON, May 2007.
[23] D. C.
Madory, New methods of spoof detection in 802.11b wireless networks. M.Eng. Thesis, Dartmouth College, 2006.
[24] P. Jokar, N. Arianpoo, and V. C. M. Leung, “Spoofing detection in 802.15.4 networks based on received signal strength,” Elsevier Ad Hoc Networks Journal, vol. 11, no. 8, pp. 2648–2660, 2013.
[25] P. Jokar, N. Arianpoo, and V. C. M. Leung, “Spoofing prevention using received signal strength for zigbee-based home area networks,” in Proc. of IEEE SmartGridComm, Vancouver, Canada, Oct. 2013.
[26] R. Mitchell and R. Chen, “Behavior-rule based intrusion detection systems for safety critical smart grid applications,” IEEE Trans. Smart Grid, vol. 4, no. 3, pp. 1254–1263, 2013.
[27] L. Wang, W. Sun, R. C. Green, and M. Alam, “Distributed intrusion detection system in a multi-layer network architecture of smart grids,” IEEE Trans. Smart Grid, vol. 2, no. 4, pp. 796–808, 2011.
[28] C. H. Lo and N. Ansari, “Consumer: A novel hybrid intrusion detection system for distribution networks in smart grid,” IEEE Trans. on Emerging Topics in Computing, vol. 1, no. 1, pp. 33–44, 2013.
[29] P. Y. Chen, S. Yang, J. A. McCann, J. Lin, and X. Yang, “Detection of false data injection attacks in smart-grid systems,” IEEE Communications Magazine, vol. 53, no. 2, pp. 206–213, 2015.
[30] N. B. Mohammadi, J. Misic, H. Khazaei, and V. B. Misic, “An intrusion detection system for smart grid neighborhood area network,” in IEEE ICC, 2014.
[31] R. Berthier and W. H. Sanders, “Specification-based intrusion detection for advanced metering infrastructures,” in IEEE Pacific Rim International Symposium on Dependable Computing, 2011.
[32] S. Parthasarathy and K. Deepa, “Bloom filter based intrusion detection for smart grid scada,” in IEEE Canadian Conference on Electrical and Computer Engineering, 2012.
[33] N. Goldenberg and A. Wool, “Accurate modeling of modbus/tcp for intrusion detection in scada systems,” International Journal of Critical Infrastructure Protection, vol. 6, no. 2, pp. 63–75, 2013.
[34] P. Jokar and V. C. M.
Leung, “Intrusion detection and prevention for zigbee-based home area networks in smart grids,” IEEE Trans. on Smart Grid, Jun. 2015.
[35] P. Jokar, H. Nicanfar, and V. C. M. Leung, “Specification-based intrusion detection for home area networks in smart grids,” in Proc. of SmartGridComm, Brussels, Belgium, Oct. 2011.
[36] S. McLaughlin, B. Holbert, A. Fawaz, R. Berthier, and S. Zonouz, “A multi-sensor energy theft detection framework for advanced metering infrastructures,” JSAC, vol. 31, no. 7, pp. 1319–1330, 2013.
[37] Z. Xiao, Y. Xiao, and D. H. C. Du, “Non-repudiation in neighborhood area networks for smart grid,” IEEE Communications Magazine, vol. 51, no. 1, pp. 18–26, 2013.
[38] B. Khoo and Y. Cheng, “Using rfid for anti-theft in a chinese electrical supply company: A cost-benefit analysis,” in Proc. of IEEE Wireless Telecommunications Symposium, 2011.
[39] S. Amin, G. A. Schwartz, and H. Tembine, “Incentives and security in electricity distribution networks,” Springer Decision and Game Theory for Security, pp. 264–280, 2012.
[40] A. A. Cardenas, S. Amin, G. Schwartz, R. Dong, and S. Sastry, “A game theory model for electricity theft detection and privacy-aware control in ami systems,” in IEEE Allerton Conference on Communication, Control, and Computing, 2012.
[41] E. Angelos, O. R. Saavedra, O. A. Cortes, and A. N. de Souza, “Detection and identification of abnormalities in customer consumptions in power distribution systems,” IEEE Trans. on Power Delivery, vol. 26, no. 4, pp. 2436–2442, 2011.
[42] S. Depuru, L. Wang, and V. Devabhaktuni, “Support vector machine based data classification for detection of electricity theft,” in IEEE Power Systems Conference and Exposition, 2011.
[43] S. Depuru, L. Wang, V. Devabhaktuni, and P. Nelapati, “A hybrid neural network model and encoding technique for enhanced classification of energy consumption data,” in IEEE Power and Energy Society General Meeting, 2011.
[44] S. Depuru, L. Wang, V. Devabhaktuni, and R. C.
Green, “High performance com-puting for detection of electricity theft,” Intzrnvtionvl Journvl of Zlzxtrixvl eofizrvny Znzrgy hystzms, pp. 21–30, 2013.[45] M. D. Martino, F. Decia, J. Molinelli, and A. Fernndez, “Improving electric frauddetection using class imbalance strategies,” ICegVb, pp. 132–141, 2012.[46] S. Salinas, M. Li, and P. Li, “Privacy-preserving energy theft detection in smartgrids: A p2p computing approach,” JhVC, vol. 31, no. 9, pp. 257–267, 2013.[47] D. Mashima and A. A. Crdenas, “Evaluating electricity theft detectors in smart gridnetworks,” gzszvrxh in VttvxksA IntrusionsA vny Dzfznszs, pp. 210–229, 2012.[48] P. Jokar, N. Arianpoo, and V. C. M. Leung, “Electricity theft detection in ami usingcustomers’ consumption patterns,” eroxzzyings of thz IZZZ, vol. PP, no. 99, May2015.[49] ——, “Intrusion detection in advanced metering infrastructure based on consumptionpattern,” in eroxC of ICC, Budapest, Hungary, Jun. 2013.[50] ihz smvrt griyO vn introyuxtion. U.S. Department of Energy, 2008.142Bivliogruphy[51] Cvuszs of thz GEEH mvjor griy wlvxkouts in north Vmzrixv vny ZuropzA vny rzxomBmznyzy mzvns to improvz systzm yynvmix pzrformvnxz. IEEE Power EngineeringSociety.[52] K. C. Nyns, E. Haesen, and J. Driesen, “The impact of charging plug-in hybrid elec-tric vehicles on residential distribution grid,” IZZZ trvnsC on pofizr systzm, vol. 25,no. 1, pp. 371–380, Feb. 2010.[53] cIhi frvmzfiork vny rovymvp for smvrt griy intzropzrvwility stvnyvrys. Office ofthe national cordination for smart grid interoperability, Jan. 2010.[54] hzxurity profilz for vyvvnxzy mztzring infrvstruxturz kzrsion FCE. The advancedsecurity acceleration project, 2009.[55] Cywzr vttvxksO Is your xritixvl infrvstruxturz svfzT PriceWaterhouseCooper, 2010.[56] Common Cywzr hzxurity kulnzrvwilitizs dwszrvzy in Control hystzm Vsszssmznts wythz Ica chiW erogrvm. Department of Energy Office of Electricity Delivery andEnergy Reliability, 2008.[57] C. P. 
O’Flynn, “Message denial and alteration on ieee 802.15.4 low power radionetworks,” in IFIe Intzrnvtionvl Confzrznxz on czfi izxhnologizsA bowility vny hzBxurity, Feb. 2011.[58] R. Sokullu, O. Dagdeviren, and I. Korkmaz, “On the ieee 802.15.4 mac layer attacks:Gts attack,” in Intzrnvtionvl Confzrznxz on hznsor izxhnologizs vny Vpplixvtions,Aug. 2008.[59] Y. W. Law, P. Hartel, J. den Hartog, and P. Havinga, “Link-layer jamming attackson s-mac,” in eroxC of of IZZZ lhc, 2005.[60] Y. Xiao, S. Sethi, H. Chen, and B. Sun, “Security services and enhancements in theieee 802.15.4 wireless sensor networks,” in eroxC of of IZZZ GadWZCdb, 2005.[61] C. Laughman, D. Lee, R. Cox, and S. Shaw, “Power signature analysis,” IZZZ eofizrvny Znzrgy bvgvzinz, vol. 1, no. 2, pp. 56–63, 2003.[62] H. Y. Lam, G. S. K. Fung, and W. K. Lee, “A novel method to construct taxonomyelectrical appliances based on load signatures,” IZZZ irvns on Consumzr ZlzxtronBixs, vol. 53, no. 2, pp. 653–660, 2007.[63] C. Efthymiou and G. Kalogridis, “Smart grid privacy via anonymization of smartmetering data,” in IZZZ intzrnvtionvl Confzrznxz on smvrt griy xommunixvtions,Oct. 2010.143Bivliogruphy[64] ihz Vyvvnxzy Znxryption htvnyvryBCiphzrBwvszy bzssvgz Vuthzntixvtion CoyzBeszuyoBgvnyom FunxtionBFGM Vlgorithm for thz Intzrnzt Kzy Zxxhvngz erotoxol=IKZ). RFC 461.[65] H. K. So, S. H. Kwok, E. Y. Lam, and K. S. Lui, “Zero-configuration identity-basedsigncryption scheme for smart grid,” in hmvrt Griy Communixvtions, Oct. 2010.[66] G. Frey, M. Muller, and H. G. Ruck, “The tate pairing and the descrete algorithmapplied to elliptic curve cryptosystem,” IZZZ irvnsvxtions on informvtion thzory,vol. 45, no. 5, pp. 1717–1719, Jul. 1999.[67] F. Li, B. Luo, and P. Liu, “Secure information aggregation for smart grids usinghomomorphic encryption,” in eroxC of IZZZ hmvrtGriyComm, Oct. 2010.[68] P. Pillier and D. 
Pointcheval, “Public-key cryptosystem based on composit degreeresiduosity classes,” in eroxC of Zuroxrypt, 1999.[69] Q. Li and G. Cao, “Multicast authentication in the smart grid with one-time signa-ture,” IZZZ irvnsC on hmvrt Griy, vol. 2, no. 4, pp. 686–696, 2011.[70] M. M. Fouda, Z. M. Fadlullah, N. Kato, R. Lu, and X. Shen, “Towards a light-weight message authentication mechanism tailored for smart grid communications,”in IcFdCdb lKhHeh, Apr. 2011.[71] D. R. Stingson, CryptogrvphyO ihzory vny ervxtixz. Boca Raton, 2005.[72] D. Wu and C. Zhou, “Fault-tolerant and scalable key management for smart grid,”IZZZ irvnsC on hmvrt Griy, vol. 2, no. 2, pp. 375–381, Jun. 2011.[73] D. J. Malan, M. Welsh, and M. D. Smith, “A public-key infrastructure for keydistribution in tinyos on elliptice curve cryptography,” in eroxC of IZZZ Confzrznxzon hznsor VyBhox Communixvtions vny cztfiorks, 2004.[74] R. Needham and M. Schroeder, “Using encryption for authentication in large net-works of computers,” Communixvtions of thz VCb, vol. 21, pp. 393–399, 1978.[75] Z. Sun and J. Ma, “Efficient key management for advanced distribution automation,”in IZZZ Intzrnvtionvl Confzrznxz on cztfiork Infrvstruxturz vny Digitvl Contznt,Sep. 2010.[76] S. Flouria, R. Anderson, F. Alvarez, and K. McGrath, “Key management for sub-stations: symmetric keys, public keys or no keys?” in IZZZDeZh eofizr hystzmConfzrznxz vny Zxposition, Mar. 2011.[77] J. Dagle, “Vulnerability assessment activities,” eroxC of eofizr Znginzzring hoxiztylintzr bzzting, vol. 1, pp. 108–113, 2001.144Bivliogruphy[78] P. D. Ray, R. Harnoor, and M. Hentea, “Smart power grid: a unified risk managementapproach,” in IZZZ Intzrnvtionvl Confzrznxz on hzxurity izxhnology, Oct. 2010.[79] D. Kundur, X. Feng, S. Liu, T. Zourntos, and K. L. B. Perry, “Towards a frameworkfor cyber attack impact analysis of the electric smart grid,” in IZZZ hvrtGriyComm,Oct. 2010.[80] Y. Jiaxi, M. Anjia, and G. 
Zhizhong, “Vulnerability assessment of cyber security inpower industry,” in IZZZ eofizr hystzms Confzrznxz, Oct. 2006.[81] G. Hamound, D. Logan, and A. P. Meliopoulos, erowvwilistix szxurity vsszssmznt forpofizr systzm opzrvtions. IEEE PES Reliability, Risk, and Probability Applications,2004.[82] I. Dobson and B. A. Carreras, “Estimating failure propagation in models of cascad-ing blackouts,” in IZZZ Intzrnvtionvl Confzrznxz on erowvwility bzthoys Vpplizy toeofizr hystzms, Sep. 2004.[83] B. Brown, VbI systzm szxurity rzquirzmznts, 2008.[84] M. LeMay and C. A. Gunter, “Cumulative attestation kernels for embedded sys-tems,” hpringzr Computzr hzxurity, pp. 655–670, 2009.[85] A. Bittau, M. Handley, and J. Lackey, “The final nail in wep’s coffin,” in IZZZhymposium on hzxurity vny erivvxy, May 2006.[86] O. Li and W. Trappe, “Relationship-based detection of spoofing related anomaloustraffic in ad hoc networks,” in IZZZ hZCdc, 2006.[87] oigWzz smvrt znzrgy stvnyvry F, 2014.[88] S. Alwadi, M. T. Ismail, and S. A. A. Karim, “A comparison between haar wavelettransform and fast fourier transform in analyzing financial time series data.” gzs JVppl hxi, pp. 352–360, 2010.[89] P. J. V. Fleet, Disxrztz fivvzlzt trvnsformvtionsO Vn zlzmzntvry vpprovxh fiith vpBplixvtions. Wiley, 2008.[90] M. Q. Ali, R. Yousefian, E. Al-Shaer, S. Kamalasadan, and Q. Zhu, “Two-tier data-driven intrusion detection for automatic generation control in smart grid,” in IZZZCch, 2014.[91] K. Ioannis, T. Dimitriou, and F. C. Freiling, “Towards intrusion detection in wirelesssensor networks,” in Zuropzvn lirzlzss Confzrznxz, 2007.145Bivliogruphy[92] S. Shamshirband, A. Patel, N. B. Anuar, M. L. M. Kiah, and A. Abraha, “Coop-erative game theoretic approach using fuzzy q-learning for detecting and preventingintrusions in wireless sensor networks,” Zlszvizr Znginzzring Vpplixvtions of VrtifiBxivl Intzlligznxz, vol. 32, pp. 228–241, 2014.[93] F. Tabrizi and K. 
Pattabiraman, “A model for security analysis of smart meters,” inIZZZDIFIe Confzrznxz on Dzpznyvwlz hystzms vny cztfiork lorkshops, 2012.[94] S. McLaughlin, D. Podkuiko, S. Miadzvezhanka, A. Delozier, and P. McDaniel,“Multi-vendor penetration testing in the advanced metering infrastructure,” in VCbComputzr hzxurity Vpplixvtions Confzrznxz, 2010.[95] D. Grochocki, J. H. Huh, R. Berthier, R. Bobba, W. H. Sanders, A. Cardenas,and J. G. Jetcheva, “intrusion detection requirements and deployment recommenda-tions,” in IZZZ Confzrznxz on hmvrt Griy Communixvtions, 2012.[96] hmvrt znzrgy profilz G vpplixvtion protoxol stvnyvry. ZigBee Alliance, 2012.[97] Communixvtions rzquirzmznts of smvrt griy tzxhnologizs. U.S. Department of En-ergy, 2010.[98] V. Kounev and D. Tipper, “Advanced metering and demand response communicationperformance in zigbee based hans,” in IcFdCdb lKhHeh, 2013.[99] P. Yi, A. Iwayemi, and C. Zhou, “Developing zigbee deployment guideline under wifiinterference for smart grid applications,” IZZZ irvnsC on hmvrt Griy, vol. 2, no. 1,pp. 110–120, 2011.[100] E. Evendar and Y. Mansour, “Learning rates for q-learning,” Journvl of bvxhinzazvrning gzszvrxh, vol. 5, pp. 1–25, 2004.[101] D. Martins and H. Guyennet, “Attacks with steganography in phy and mac layers of802.15.4 protocol,” in Intzrnvtionvl Confzrznxz on hystzms vny cztfiorks CommuBnixvtions, 2010.[102] R. Jiang, R. Lu, Y. Wang, J. Luo, C. Shen, and X. S. Shen, “Energy theft detectionissues for advanced metering infrastructure in smart grid,” isinghuv hxiznxz vnyizxhnology, vol. 19, no. 2, pp. 105–120, 2014.[103] S. Axelsson, “The base-rate fallacy and the difficulty of intrusion detection,” VCbirvnsC Informvtion vny hystzm hzxurity, vol. 3, no. 3, pp. 186–205, 2000.[104] A. Cavoukian, “Privacy by design: The 7 foundational principles,” in Informvtionvny erivvxy Commissionzr, Ontario, Canada, 2009.146Bivliogruphy[105] J. Nagi, K. S. Yap, K. Sieh, S. Ahmed, and M. 
Mohamad, “Nontechnical loss detec-tion for metered customers in power utility using support vector machines,” IZZZirvnsC eofizr Dzlivzry, vol. 25, no. 2, pp. 1162–1171, Apr. 2010.[106] D. N. Nikovski and Z. Wang, bzthoy for yztzxting pofizr thzft in v pofizr yistriwutionsystzm. U.S. Patent Application 13/770,460.[107] C. D. Oliveira, N. Kagan, A. Meffe, S. Jonathan, S. Caparroz, and J. L. Cavaretti, “Anew method for the computation of technical losses in electrical power distributionsystems,” in IZZ Confzrznxz vny Zxhiwition on Zlzxtrixity Distriwution, 2001.[108] A. Meffe and C. C. B. de OLIVEIRA, “Technical loss calculation by distributionsystem segment with corrections from measurements,” in IZi Confzrznxz vny ZxhiBwition on Zlzxtrixity Distriwution, 2009.[109] P. N. Rao and R. Deekshit, “Energy loss estimation in distribution feeders,” IZZZirvnsC on eofizr Dzlivzry, vol. 21, no. 3, pp. 1092–1100, 2006.[110] J. A. Hartigan and A. W. Manchek, Vlgorithm Vh FHKO V kBmzvns xlustzring vlgoBrithm. Applied statistics, 1979.[111] K. Hempstalk, E. Frank, and I. H. Witten, dnzBxlvss xlvssifixvtion wy xomwiningyznsity vny xlvss prowvwility zstimvtion. Springers Machine Learning and KnowledgeDiscovery in Databases, 2008.[112] C. J. C. Burges, “A tutorial on support vector machines for pattern recognition,”Dvtv bining vny Knofilzygz Disxovzry, vol. 2, no. 2, pp. 121–167, 1998.[113] Irish soxivl sxiznxz yvtv vrxhivz. http://www.ucd.ie/issda/data/commissionforenergyregulationcer/, 2012.[114] J. J. Culwell, gzszvrxh stuyy quvntifizs znzrgy thzft losszs, 2001.[115] U. Greveler, B. Justus, and D. Loehr, “Multimedia content identification throughsmart meter power usage profiles,” in ComputzrsA erivvxy vny Dvtv erotzxtion, 2012.[116] S. Karnouskos and T. de Holanda, “Simulation of a smart grid city with softwareagents,” in jKhim Zuropzvn hymposium on Computzr boyzling vny himulvtion,Nov. 2009.147

