Open Collections

UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Multi-dimensional invariant detection for cyber-physical system security : a case study of smart meters… Raiyat Aliabadi, Maryam 2018

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata


24-ubc_2018_may_raiyat_aliabadi_maryam.pdf [ 1.46MB ]
JSON: 24-1.0366015.json
JSON-LD: 24-1.0366015-ld.json
RDF/XML (Pretty): 24-1.0366015-rdf.xml
RDF/JSON: 24-1.0366015-rdf.json
Turtle: 24-1.0366015-turtle.txt
N-Triples: 24-1.0366015-rdf-ntriples.txt
Original Record: 24-1.0366015-source.json
Full Text

Full Text

Multi-Dimensional Invariant Detectionfor Cyber-Physical System SecurityA case study of smart meters and smart medical devicesbyMaryam Raiyat AliabadiB.Sc., Shahid Beheshti University, 2000M.Sc., Tehran Islamic Azad University, 2006A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFMASTER OF APPLIED SCIENCEinThe Faculty of Graduate Studies(Electrical & Computer Engineering)THE UNIVERSITY OF BRITISH COLUMBIA(Vancouver)April 2018c Maryam Raiyat Aliabadi 2018AbstractCyber-Physical Systems (CPSes) are being widely deployed in security-critical scenarios such as smart homes and medical devices. Unfortunately,the connectedness of these systems and their relative lack of security mea-sures makes them ripe targets for attacks. Specification-based IntrusionDetection Systems (IDS) have been shown to be e↵ective for securing CPSs.Unfortunately, deriving invariants for capturing the specifications of CPSsystems is a tedious and error-prone process. Therefore, it is important todynamically monitor the CPS system to learn its common behaviors andformulate invariants for detecting security attacks. Existing techniques forinvariant mining only incorporate data and events, but not time. However,time is central to most CPS systems, and hence incorporating time in ad-dition to data and events, is essential for achieving low false positives andfalse negatives.This thesis proposes ARTINALI : A Real-Time-specific Invariant iNfer-ence ALgorIthm, which mines dynamic system properties by incorporatingtime as a first-class property of the system. We build ARTINALI-basedIntrusion Detection Systems (IDSes) for two CPSes, namely smart metersand smart medical devices, and measure their ecacy. We find that theARTINALI-based IDS significantly reduces the ratio of false positives andfalse negatives by 16 to 48% (average 30.75%) and 89 to 95% (average 93.4%)respectively over other dynamic invariant detection tools. Furthermore, itincurs about 32% performance overhead, which is comparable to other in-variant detection techniques.iiLay SummaryCyber-physical systems (CPSes) constitute the core of Internet of Things(IoT). Recently, they are increasingly deployed in many critical infrastruc-tures. The rapid growth of IoT has led to deployment of CPSes without ad-equate security. Researchers have demonstrated successful attacks againstCPSes used in smart grids, modern cars, unmanned aerial vehicles and smartmedical devices. Hence, it is imperative to develop techniques to improvesecurity of these systems. However, CPSes have constraints (such as real-time requirements) that make building Intrusion Detection System (IDS)for them challenging. In this thesis, we propose a technique (ARTINALI)to dynamically monitor the CPS system to learn its common behaviors inthree important dimensions including data, event and time, and formulatebehavioral rules (aka invariants) for detecting security attacks. Furthermore,we built ARTINALI-based IDSes for two important CPSes, namely smartmeters and smart medical devices and evaluated the ecacy of the IDSes.iiiPrefaceThis thesis is the result of work carried out by myself, in collaboration withmy advisor, Dr. Karthik Pattabiraman, Dr. Julien Gascon Samson, andAmita Kamath. All chapters are based on work published in ACM SIG-SOFT Symposium on the Foundations of Software Engineering (FSE2017).I was responsible for writing all the papers, designing solutions, conduct-ing the experiments and evaluating the results. My advisor was responsiblefor editing and writing segments of the manuscripts, providing feedbackand guidance during design phases, experiments, and evaluations. Amitacontributed primarily to the evaluation of one part of the related work forcomparison purposes during her 3-month internship, and Julien helped mewith a couple of iterations over my conference paper, and providing usefulcomments.”ARTINALI: Dynamic invariant detection for Cyber-Physical SystemSecurity.” Maryam Raiyat Aliabadi, Amita Ajith Kamath, Julien Gascon-Samson, and Karthik Pattabiraman, In the Proceedings of the ACM SIG-SOFT Symposium on the Foundations of Software Engineering (FSE2017).ivTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiLay Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . vList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiiList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ixList of Abbreviation . . . . . . . . . . . . . . . . . . . . . . . . . . xiAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . xiiDedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii1 Introduction and Overview . . . . . . . . . . . . . . . . . . . . 11.1 Cyber-Physical Systems . . . . . . . . . . . . . . . . . . . . . 11.2 Characterizing the CPS Attack Surface . . . . . . . . . . . . 21.3 CPS Constraints . . . . . . . . . . . . . . . . . . . . . . . . . 41.4 Intrusion Detection Systems . . . . . . . . . . . . . . . . . . 61.4.1 Signature-based detection . . . . . . . . . . . . . . . . 61.4.2 Anomaly-based detection . . . . . . . . . . . . . . . 61.4.3 Specification-based techniques . . . . . . . . . . . . . 71.5 Overarching Goal and Research Questions . . . . . . . . . . 91.6 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 10vTable of Contents1.7 Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 Background and Motivation . . . . . . . . . . . . . . . . . . . 122.1 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . 122.2 A Motivating Example . . . . . . . . . . . . . . . . . . . . . 142.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183.1 Multi-dimensional model . . . . . . . . . . . . . . . . . . . . 183.2 Event-data-time Interplay . . . . . . . . . . . . . . . . . . . . 193.3 ARTINALI Workflow . . . . . . . . . . . . . . . . . . . . . . 223.3.1 Block 1. ARTINALI D|E Miner . . . . . . . . . . . . 233.3.2 Block 2. ARTINALI E|T Miner . . . . . . . . . . . . 253.3.3 Block 3. ARTINALI D|T Miner . . . . . . . . . . . . 263.3.4 Block 4. IDS Prototype . . . . . . . . . . . . . . . . . 273.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . 294.1 Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . 294.1.1 Advanced Metering Infrastructure (AMI) . . . . . . 294.1.2 Smart Artificial Pancreas (SAP) . . . . . . . . . . . 304.2 Experimental Procedure . . . . . . . . . . . . . . . . . . . . 315 Evaluation Against Targeted Attacks . . . . . . . . . . . . . 345.1 AMI Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . 345.1.1 Synchronization tampering (Blocks A1A4) . . . . . 345.1.2 Meter spoofing (Blocks B1B5) . . . . . . . . . . . 355.1.3 Message dropping (Blocks C1 C5) . . . . . . . . . . 355.2 Detection of AMI attacks . . . . . . . . . . . . . . . . . . . . 365.2.1 Synchronization tampering . . . . . . . . . . . . . . . 365.2.2 Message dropping . . . . . . . . . . . . . . . . . . . . 365.2.3 Meter spoofing . . . . . . . . . . . . . . . . . . . . . . 375.3 SAP Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . 375.3.1 CGM spoofing attack (Blocks A1A4) . . . . . . . . 37viTable of Contents5.3.2 Basal tampering (Blocks B1B5) . . . . . . . . . . 385.4 Detection of example attacks in SAP . . . . . . . . . . . . . 395.4.1 CGM spoofing attack . . . . . . . . . . . . . . . . . . 395.4.2 Basal tampering attack . . . . . . . . . . . . . . . . 395.4.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 406 Evaluation Against Arbitrary Attacks . . . . . . . . . . . . . 416.1 Arbitrary Attack Model . . . . . . . . . . . . . . . . . . . . . 416.2 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . . 436.3 Research Questions (RQs) . . . . . . . . . . . . . . . . . . . 456.3.1 RQ1. F-Score . . . . . . . . . . . . . . . . . . . . . . 456.3.2 RQ2. False Negatives . . . . . . . . . . . . . . . . . . 516.3.3 RQ3. False Positives . . . . . . . . . . . . . . . . . . 536.3.4 RQ4. Memory Overhead . . . . . . . . . . . . . . . . 546.3.5 RQ5. Performance Overhead . . . . . . . . . . . . . . 556.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 567 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 587.1 Threats to Validity . . . . . . . . . . . . . . . . . . . . . . . 587.2 Generalizability of ARTINALI . . . . . . . . . . . . . . . . . 598 Conclusions and Future Work . . . . . . . . . . . . . . . . . . 608.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 608.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . 618.2.1 Extending the Generalizability of ARTINALI . . . . 628.2.2 Optimizing ARTINALI for the Scalability . . . . . . 628.2.3 Extending ARTINALI for Attack Diagnosis . . . . . 63Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64viiList of Tables1.1 The main deficiencies of current IDS techniques in addressingCPS constraints . . . . . . . . . . . . . . . . . . . . . . . . . . 83.1 E|T and D|T Invariant Types . . . . . . . . . . . . . . . . . . 254.1 Types and number of inferred invariants for SEGMeter acrosstools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334.2 Type and number of inferred invariants for OpenAPS acrosstools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335.1 ARTINALI invariants to detect the example attacks in AMI. 365.2 ARTINALI invariants to detect example attacks in OpenAPS. 386.1 Optimal training set size for maximum F-Score(0.5) for SEG-Meter across tools, and the corresponding FP and FN ratios. 476.2 Optimal training set size for maximum F-Score(2) for Ope-nAPS across tools, and the corresponding FP and FN ratios. 476.3 Memory and performance overhead of IDS, seeded by ARTI-NALI and the other tools, running on SEGMeter. . . . . . . . 55viiiList of Figures1.1 CPS attack surface . . . . . . . . . . . . . . . . . . . . . . . . 32.1 Scope of dynamic invariant detection techniques . . . . . . . 142.2 A snippet of Serial-Talker code for the SEGMeter . . . . . . . 153.1 Work flow of ARTINALI . . . . . . . . . . . . . . . . . . . . . 223.2 Sample DElog and respective D|E invariants with 100% con-fidence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243.3 Sample TElog and DurationLOG files and respective E|T in-variants. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263.4 Representation of ARTINALI D|T invariants . . . . . . . . . 274.1 High level block diagram of (a) Smart meter and (b) SmartArtificial Pancreas. . . . . . . . . . . . . . . . . . . . . . . . . 314.2 Overall experimental process of running the IDS . . . . . . . 325.1 Attack tree for AMI . . . . . . . . . . . . . . . . . . . . . . . 355.2 Attack tree for SAP . . . . . . . . . . . . . . . . . . . . . . . 386.1 Fault injection impact(%): (a) data mutation in SEGMeter,(b) branch flip in SEGMeter, (c) artificial delay in SEGMeter,(d) data mutation in OpenAPS, (e) branch flip in OpenAPS,(f) artificial delay in OpenAPS . . . . . . . . . . . . . . . . . 436.2 FN, FP and F-Score variations based on number of trainingtraces for SEGMeter’s IDS seeded by Daikon invariants. . . 476.3 FN, FP and F-Score variations based on number of trainingtraces for SEGMeter’s IDS seeded by Texada invariants. . . 48ixList of Figures6.4 FN, FP and F-Score variations based on number of trainingtraces for SEGMeter’s IDS seeded by Perfume invariants. . . 486.5 FN, FP and F-Score variations based on number of trainingtraces for SEGMeter’s IDS seeded by ARTINALI invariants. 496.6 FN, FP and F-Score variations based on number of trainingtraces for OpenAPS’s IDS seeded by Daikon invariants. . . . 496.7 FN, FP and F-Score variations based on number of trainingtraces for OpenAPS’s IDS seeded by Texada invariants. . . . 506.8 FN, FP and F-Score variations based on number of trainingtraces for OpenAPS’s IDS seeded by Perfume invariants. . . 506.9 FN, FP and F-Score variations based on number of trainingtraces for OpenAPS’s IDS seeded by ARTINALI invariants. 516.10 FN(%) of IDS for SEGMeter for di↵erent attack types acrossthe tools. Error bars are shown for the 95% confidence inter-val. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526.11 FN(%) of IDS for OpenAPS for di↵erent attack types acrossthe tools. Error bars are shown for the 95% confidence inter-val. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52xList of Abbreviation• AMI : Advanced Metering Infrastructure• BG : Blood Glucose• CPS : Cyber-Physical System• FNR : False Negative Ratio• FPR : False Positive Ratio• IDS : Intrusion Detection System• IoT : Internet of Things• PCD : Power Consumption Data• SAP : Smart Artificial Pancreas• SDC : Silent Data Corruption• UAV : Unmanned Aerial VehiclexiAcknowledgmentsFirstly, I would like to express my sincere gratitude to my adviser, Prof.Karthik Pattabiraman. If it was not for your continuous support, carefulguidance, and great patience, I would not have been able to complete thisthesis today.Besides my adviser, I would like to thank my thesis committee and ex-aminers, including Prof. Ivan Beschastnikh , and Prof. Sathish Gopalakr-ishnan. You provided me with valuable feedback, and helped me improvemy thesis.I would also like to thank my colleagues in CSRG for their help andinsightful feedback on my work. I particularity would like to thank all mycolleagues at Dependable Systems Lab for the stimulating discussions, andall the joy we had working together. The memories will stay with me forever.My lovely ARTIN and my dearest ALI, since starting my studies atUBC and living in Vancouver, you have supported me unconditionally, andbeyond imagination. I am very lucky to have you in my life, and I will beforever grateful to you.My loving parents and parents-in-law, I would like to thank you for givingme your wonderful support through ups and downs. You shaped me intowho I am today, and helped me grow both as a researcher, and as a person.Last but not the least, I would like to thank God for giving me thechance to pursue this wonderful profession, and for guiding me throughoutthe entire process.xiiDedicationTo my son, ARTIN and my husband, ALIxiiiChapter 1Introduction and Overview1.1 Cyber-Physical SystemsCyber-physical systems (CPSes) have been investigated as a key area ofresearch since they constitute the core of the Internet of Things (IoT). Re-cently, they are increasingly deployed in many critical infrastructures. Forexample, they are widely used in modern automotives to control di↵erentparts of a car [10]. Millions of people throughout the world have implantedmedical smart devices such as pacemakers to help their hearts beat at aregular rhythm [29]. Moreover, recent studies predict that global spendingon Unmanned Aerial Vehicles (UAVs), will exceed $94 billion over the nextten years [53] and the number of Advanced Metering Infrastructures (AMIs)in smart grids will exceed 800 million by 2020 [49].The rapid growth of IoT has led to deployment of CPSes without ad-equate security. These systems carry out critical tasks and hence, are po-tential targets for cyber attacks. For instance, hackers are able to remotelykill the engine of the smart car, and take control of the brakes [26], or causea drone to crash or go o↵ course[21, 24, 43, 47]. The U.S. Department ofHomeland Security investigated 24 cases of suspected cyber security attacksin smart medical devices in 2014 alone[29, 33], and this number is expectedto increase. In addition, researchers have successfully discovered and demon-strated a variety of attacks in smart meters for energy fraud purposes [45].To make these systems secure, security mechanisms such as intrusion detec-tion systems (IDS) are required.11.2. Characterizing the CPS Attack Surface1.2 Characterizing the CPS Attack SurfaceA CPS consists of a cyber unit (i.e., embedded system), and a physicalunit connected by a communication channel. It also includes a control loopwhich involves interactions between the cyber and physical domains. InFigure 1.1, we characterized the attack surface of a CPS, and extended theattacker entry points presented by previous work [9, 28, 34].According to Figure 1.1, there are four likely entry points (marked as A,B, C, D) for attackers to penetrate the system.Type A attacks can be used to hide the real state of the physical processin order to trick the controller program to make a wrong decision aboutthe next state of physical process, and to delay the detection of the attacksbefore the actual damage to the system (like what happened in Stuxnet [11]).Type C attacks can disrupt the system by directly compromising thecontrol commands that are issued by controller. These attacks can jam thecommunication channel, or compromise the routing protocol. One exampleof this type is Basal tampering attack, which transmits the malicious com-mands to the smart insulin pump after crafting a DoS attack by jammingthe communication between controller and insulin pump [33, 41].Type B include attacks that exploit the vulnerabilities in physical com-ponents. For example, [47] developed a way for attacking drones equippedwith vulnerable MEMS gyroscopes using intentional sound noise causingdrones to loose control and crash.Type D attacks aim to exploit the vulnerabilities in cyber unit to takeover the dynamics of physical process. For instance, recent work [48] inves-tigated an attack to smart facial recognition systems caused by exploiting amis-classification bug (CVE-2016-1516) in the controller algorithm.In this study, we focus on attacks that impact the physical processeswithout exploiting vulnerabilities in the physical domain (type A, C and Dattacks). The reason is that many physical systems deal with legacy technol-ogy and legacy components, that are not controlled by software, and theirbehavior is essentially governed by the physical law. Hence, it is challengingto identify the environment variables, map the environment changes on the21.2. Characterizing the CPS Attack SurfaceFigure 1.1: CPS attack surfaceenvironment variables, and incorporate the laws of physics to define accept-able behavior upon environmental changes [37]. Therefore, the behavior ofphysical processes can be defined more precisely by specifications that arespecified by system developer or experts’ knowledge.Adversarial Model : We assume the adversarial goal is to compro-mise the functionality of CPS. This means that she either prevents a vitalfunctionality of the CPS from being executed (e.g., power consumption datanot being sent to the utility server in smart meter, or regular heartbeat notbeing provided by a pacemaker), or functionalities being executed improp-erly (e.g., insulin injection is resumed when it must be stopped in insulinpump, or brake leads to acceleration in a smart car). The first group ofattacks targets the availability properties, while the second group targetsthe integrity properties of the CPS. We assume that the adversary is capa-ble to inject, drop or modify messages in the communication channels onattack entry points A, C and D defined in previous section to change i) thelegal values of data variables in the code , ii) the normal execution path inthe control flow graph of the code, or iii) the normal timing behavior of aparticular function of the code, and hence impact the CPS functionality.31.3. CPS ConstraintsWe do not consider denial of service (DoS) attacks or those threats thatcompromise the privacy/confidentiality properties of the CPS, as these at-tacks are generally addressed by other techniques including network securitymeasures and cryptographic methods.1.3 CPS ConstraintsIn the following, we discuss the key characteristics and constraints thatsecurity mechanisms must satisfy to be applicable to CPSes.• C1. Real-time constraints: CPSes typically interact with theirenvironment in a real-time fashion. From a security point of view,taking real-time requirements into account is vital for two reasons:First, CPSes are decision making agents that need to make decisions inreal time, and to address the continuous operation capabilities; there-fore, real-time availability is a necessity. This characteristic leads toa stricter operational environment [8], in which any security solutionthat modifies the controller program and changes its real-time behavioris not acceptable for these systems. In other words, the performanceoverhead imposed by security mechanism has to be small enough tosatisfy the CPS real-time constraints. Secondly, in a real-time sys-tem, the operational correctness depends on both logical correctness,and correct timing behavior [54]. This implies that the logical asser-tions are not enough for verifying the correctness of these systems,and hence incorporating time in their system model is essential to en-able the security mechanism to check the system’s operations [15]. Inother words, reflecting the real-time constraints into the CPS modelis required to have a more predictive security mechanism.• C2. Resource constraints : As a single thing in the ”Internet ofThings”, a CPS performs a single task on a single platform with limitedCPU, memory and computational power. It implies that these sys-tems need a security solution that satisfies their resource constraints.For instance, an important component of a security mechanism is the41.3. CPS Constraintsmodel that represents the correct behavior of the CPS. This modelmay occupy a large space in memory. However, CPSes have limitedmemory capacity making existing IDSes inapplicable due to memoryoverheads.• C3. Unknown vulnerabilities : CPSes are new systems that mayhave unknown vulnerabilities, and hence, they are inevitably exposedto zero-day attacks. On the other hand, the physical system is moresusceptible to the vulnerabilities of the cyber system as the interactionbetween the physical system and the cyber system increases. Thus,CPSes require security solutions that rely on a system model ratherthan a dictionary of attacks, to monitor both known and unknownattacks.• C4. Security-critical constraints : As many CPSes are deployedin security-critical applications such as smart medical devices, theyneed security mechanisms with minimum false negative rates. This isimportant because any undetected attack in such life-critical systemsmay have catastrophic consequences for the patient.• C5. Large-scale deployment : CPSes are often deployed on alarge scale (e.g., smart meters), and hence need security mechanismswith minimal false positives as false positives can aggregate within thesystem and consume significant resources. Moreover, these systems aredeployed in mission-critical applications where shutting them down ona false alarm is not a viable option.• C6. No-human-in-the-loop : Most CPSes are autonomic systems,which work without a human-in-the-loop, and need to provide non-interrupted service. Thus, unlike the general computer systems, CPSscannot be interrupted frequently for upgrading and/or patching. Thisimplies that there is a substantial need for deployment of highly sen-sitive security mechanisms that provide automated response againstsecurity attacks. Particularly, this is essential for those CPSes that51.4. Intrusion Detection Systemsare deployed in the security-critical applications such as pacemak-ers, where any service interruption may be fatal for the pacemaker-implanted patient, or those that are deployed on a large scale includingsmart meters, where denying the service due to an attack may shutdown the entire smart grid [50].1.4 Intrusion Detection SystemsIntrusion Detection Systems (IDSes) are the most popular security mecha-nisms that have been widely used to monitor computer systems and detectsecurity attacks. Typical IDSes fall into one of three categories: Signature-based, Anomaly-based, and Specification-based [20]. In the following, we firstintroduce the existing IDSes used for computerized systems, then we explainwhy they are not sucient to secure CPSes (as illustrated in Table 1.1).1.4.1 Signature-based detectionSignature-based detection techniques work by monitoring system behaviorand comparing the behavior against a database of signatures or attributesfrom known malicious threats. The benefit of these techniques is that theydo not need a model of the system they are monitoring, and their limitationis that they cannot detect zero-day attacks [38]. The latter is especiallyimportant for CPSes as they are often dicult to patch or upgrade in thefield. Therefore, signature-based IDSes are not a good match for CPSes asthey do not address constraints C3 and C6. In contrast, both anomaly-basedand specification-based techniques use a behavioral model of the system tocompare with suspicious behaviors, and can detect both known and unknownattacks.1.4.2 Anomaly-based detectionAnomaly-based techniques create a baseline profile of the legitimate systembehavior based on statistical methods by observing its operations at runtime,enabling them to distinguish the incoming system activity as normal or61.4. Intrusion Detection Systemsanomalous. Unfortunately, these techniques incur considerable overheadto profile the system at runtime [20], and also su↵er from high rates offalse-positives, both of which deter their use in CPSes, which are oftenresource constrained, and operate autonomously for long periods of time.Accordingly, they are not viable for CPSes with respect to constraints C2,C4 and C5.1.4.3 Specification-based techniquesSpecification-based techniques build a behavioral model for system by defin-ing a set of rules (known as invariants) that lead to a decision regardingwhether a given pattern of activity is suspicious. Invariants are defined asstatements that describe the relationship among states of a program [4]. Forexample, they can be used to state the relationship between two variablesthat always hold true, over the entire run of a program. Invariants can beeither discovered by static analysis or dynamic analysis of the program, orbe provided by the system developer.Static analysis-based techniques :Static analysis-based techniques are basedon source code and the specifications defined by the developer. Hence theyrely upon apriori knowledge of the systems’ specifications to detect attacks[7]. Static analysis-based techniques are inherently conservative with lowfalse positives, as they only generate invariants that are rigorously provable.These techniques analyze the program without executing the program, andhence, they can not provide adequate information about the real-time behav-ior of the system in its operational environment. Thus, static analysis-basedtechniques are not able to provide a comprehensive model of real-time be-havior and timing properties of system, which in turn, leads to high falsenegatives [20]. Moreover, as these techniques need to access the source code,they cannot statically verify many interesting properties of a program be-cause of unavailable code (e.g., calls to compiled external libraries). As aresult, the invariants set that is generated by static analysis techniques needsto be complemented by developer specifications to address certain externalconditions [4]. However, prior work [14, 39] have found that there is of-71.4. Intrusion Detection SystemsTable 1.1: The main deficiencies of current IDS techniques in addressingCPS constraintsIDS Techniques C1 C2 C3 C4 C5 C6Signature-based X XAnomaly-based X X XSpecification-based (static analysis) X XInvariant-based (dynamic analysis) X Xten inconsistency between what developers describe their system does, andwhat the system does in practice [14, 39]. These drawbacks make the staticanalysis-based techniques insucient for CPSes with respect to constraintsC1 and C4.Dynamic analysis-based techniques :Dynamic analysis-based techniquesprovide an alternative way to understand the system by observing the run-time behavior. They log the key points of the program to peek into the actualprogram behavior at run-time[40], and infer a set of likely invariants. Unlikestatic analysis-based techniques, these techniques do not need the sourcecode as they rely on data received from runs of program. Although theyinstrument the code to collect data for analysis they do not need the code foranalysis step by itself. Using a suciently complete set of test cases, dynamicanalysis-based techniques are potentially able to infer invariants that cannotbe mathematically proved (e.g, time duration between two specific eventsof the program), so there is no way for static analysis-based techniques tofind them. For these reasons, to address CPS constraints discussed in theprevious section, we selected dynamic analysis for mining likely invariantsin CPS systems.There has been a significant amount of work on using dynamic analysisto find likely invariants for program understanding, formal verification, de-bugging and intrusion detection [12, 17, 19, 23, 27, 30, 31, 35, 40, 42, 55, 58].These systems mine execution traces of the system for deriving invariantson the data values of the system, the events or both. However, we find thatmany of these systems incur significant false-positives and/or false-negatives,when used in the context of an IDS, which makes them challenging to deploy81.5. Overarching Goal and Research Questionsfor CPSes that are used in security critical infrastructures (C4) and/or ona large scale (C5).1.5 Overarching Goal and Research QuestionsCPSes are targets of many security attacks. Reconfiguration and/or recoveryof such systems from attacks requires time, money, and human e↵ort. Onthe other hand, as a result of what we discussed in previous section, theexisting IDSes for computer systems do not address the limitations andconstraints of CPSes. Therefore, the overarching goal of this thesis is tounderstand the behavior of CPS systems in the absence of attacks, and usethis understanding to develop an IDS technique to improve the security ofCPS systems in an automated manner.To achieve the overarching goal, we are interested in answering the fol-lowing research questions:RQ1. How can we build a specification-based IDS for CPSsystems with respect to their constraints?• RQ1.1. How can we infer a multi-dimensional behavioral model forCPS systems?• RQ1.2. How can we leverage the multidimensional CPS model to buildan IDS that meets CPS constraints?The most important component of a specification-based IDS for CPSsystems is the CPS model. A CPS model should include a set of specifica-tions (invariants) that precisely describe the correct behavior of the system.We addressed these research questions by proposing a technique to analyzethe normal behavior of the CPS along three dimensions: data, event, andtime, to generate a richer model for the system, and hence a more sensitiveIDS for the system.91.6. Contributions1.6 ContributionsThis thesis introduces ARTINALI (A Real-Time-specific Invariant iNferenceALgorIthm) for mining likely invariants through dynamic analysis in CPSsystems, for specification-based IDSes. The main innovation in ARTINALIis that it incorporates time as a first-class notion in the mined invariants, inaddition to the traditional data and event invariants. This is important fortwo reasons. First, most CPSes have real-time constraints, and hence theiroperational correctness depends on both logical correctness, and correct tim-ing behavior [25, 54]. Hence, incorporating time is essential for detectingmany common security attacks in these systems. Secondly, CPSes have pre-dictable timing behaviors to a first order of approximation, and hence lever-aging this predictability leads to higher accuracy (i.e., lower false-positivesand negatives). However, incorporating time in dynamic invariant detectiontechniques increases the complexity of the learning due to the much largerstate space that needs to be covered. To alleviate this issue, we break upthe problem of learning invariants along the three dimensions into problemsof learning invariants along two dimensions, namely data-events and events-time, and then combine them into data-events-time invariants. To the bestof our knowledge, ARTINALI is the first dynamic invariant detection sys-tem that mines invariants along the three dimensions of data, event, andtime, and uses the mined invariants for intrusion detection.Our contributions are:• Designed ARTINALI, an algorithm that generates a multi-dimensionalmodel for CPSes by mining invariants along the data, event and timedimensions (Chapter 3).• Built an ARTINALI-based IDS prototype, and used it in the contextof two CPS systems, namely i) advanced metering infrastructures, ii)and smart artificial pancreas (Chapter 4).• Evaluated our ARTINALI-based IDS prototype on the two CPS sys-tems, in comparison with several existing state of the art dynamicinvariant detection techniques. We crafted 6 targeted attacks against101.7. Publicationstwo systems, and observed that ARTINALI-based IDS is able to detectall of them while the other techniques could not detect even a singleattack (Chapter 5).• Found that the ARTINALI-based IDS exhibits significantly lower false-negatives and false-positives for arbitrary attacks emulated by faultinjection, compared to the other techniques. Furthermore, it incursabout 32% performance overhead, which is comparable to other in-variant detection techniques (Chapter 6).1.7 PublicationsThe work in this thesis has been published in the following paper:• Maryam Raiyat Aliabadi, and Karthik Pattabiraman. ”ARTINALI:Dynamic invariant detection for Cyber-Physical System Security.” ACMSIGSOFT Symposium on the Foundations of Software Engineering(FSE2017).11Chapter 2Background and MotivationSpecification based techniques based on static analysis and formal specifi-cations (generated by developer) have been proposed as a good fit for CPSsecurity [7, 37]. However, as we discussed in chapter 1-Section 1.4.3, staticanalysis-based techniques have deficiencies that make them insucient foraddressing the CPS constraints (e.g., real-time constraints). On the otherhand, developers rarely write down specifications of their systems. As a re-sult, many specification mining techniques based on dynamic analysis havebeen appeared in previous work [17, 19, 23, 27, 30, 31, 35, 40, 42, 55, 58].Mined specifications cannot replace formal specifications created by an ex-pert since an invariant mined from program traces could be a false positive[3]. However, as many CPS programs lack formal specifications, minedspecifications are necessary. As a result, dynamic analysis is substantiallyimportant to mine invariants that can not be inferred through static analysisor are not specified by system developers. For these reasons, we selected toemploy dynamic analysis to formulate CPS behavior with respect to theirconstraints for specification-based IDSes.In this Chapter, we first survey related work in the area of dynamicanalysis-based techniques and explain how our proposed technique di↵ersfrom them. We then present a motivating example from smart meters toillustrate why we need new types of invariants like the ones generated byour technique to bridge the gap.2.1 Related workThere has been a significant amount of work on using dynamic analysis tomodel the behavior of software systems for program understanding, formal122.1. Related workverification, debugging and intrusion detection [17, 19, 23, 27, 30, 31, 35, 40,42, 55, 58]. These techniques can be categorized into four classes, based onthe models that they generate: i) data invariants, ii) event relationships, iii)data and event relationships, and iv) time dependencies of events. Figure 2.1shows the main dynamic analysis-based techniques, and where they fall alongthe data, event and time axes.Daikon was the first dynamic analysis-based technique to derive (likely)invariants about data value relations [17], and falls into the first class oftechniques. Daikon can be placed on the data axis as it produces a modelfor data constraints without taking into account the events or timing of thesystem. DySy [13], which uses symbolic execution to derive invariants, isanother example of this class.The second class captures the sequence of events within a progam’s ex-ecution paths by inferring finite state machines from a set of traces. Rele-vant examples include Perracotta [58] and Texada [31], both of which derivetemporal logic propositions, and capture sequences of events by tracking dy-namic traces. These tools fall along the event axis since they only capturethe constraints on event relations independent of data or timing information.The third class of techniques generate integration models that capturethe relationship between data and events. For example, the GK-Tail al-gorithm merges temporal specifications and data invariants into ExtendedFinite State Machine models [35]. It represents sequences of method invo-cations that are annotated with data, and is hence limited to classifyingdata invariants that arise among method calls. Quarry finds data invariantsat each program point, and then finds temporal relationships between theinvariants [30]. Neither technique considers timing information, however.The fourth class consists of a single technique, Perfume, which is a spec-ification mining tool designed for modelling system properties based on re-source (time and storage) consumption [40]. It generates an integrationmodel of event relations and their time constraints. Although Perfume con-siders time as a part of model, it does not consider the relationship betweendata and time.Overall, none of the current techniques consider the interplay among132.2. A Motivating Exampletime, events and data in formulating invariants, which we believe is an es-sential characteristic of CPS systems.Figure 2.1: Scope of dynamic invariant detection techniques2.2 A Motivating ExampleWe consider an example of a smart electric meter to illustrate why the ex-isting dynamic analysis-based techniques are often insucient for capturingthe key properties of a CPS system. We also use this as a motivating exam-ple to illustrate ARTINALI later.We use an open-source smart meter called SEGMeter as one of thetestbeds for our implementation and our evaluations (see Chapter 4-Section 4.1.1).SEGMeter is composed of two main components: the meter component anda controller. The meter component is in charge of measuring and collect-ing power consumption data coming through its serial ports, and storingthem in memory. The controller acts as the communication bridge betweenthe meter board and the server, and is in charge of passing server com-mands to the meter board, as well as transmitting power consumption data142.2. A Motivating Exampleto the server at specific time intervals. The Serial-Talker() function inthe controller program of the smart meter is in charge of receiving powerconsumption data (at specific time intervals) and bu↵ering them for billingcalculation purposes. The Serial-Taker code is shown in Figure 2.2 (inthe Lua language).Figure 2.2: A snippet of Serial-Talker code for the SEGMeterSerial-Talker() has a Boolean argument seg-data, that takes valuestrue or false in pre-determined time intervals. The sequence of events thatare invoked in this function varies based on the value passed in argument(seg-data). If true is passed (line 6 in Figure 2.2), then the program emitsthe event send, followed by read. Alternatively, if false is passed to thefunction (line 21 in Figure 2.2), then the program emits events receive andwrite respectively.We examined the invariants inferred by the di↵erent dynamic analysis-based tools for this example. Daikon infers the values of variable seg-datawithin Serial-Talker() during normal execution as the set {true, false},namely seg-data:[true,false]. A typical temporal specification minersuch as Texada identifies the legal sequences of events, e.g., G(send !XF read), which means that upon event send happening, it is always fol-lowed by event read. The invariant inferred by Perfume (send ! receive,152.3. Summary0.1, 1.2) complement the temporal invariant by adding time boundaries be-tween events, i.e., send is followed by read within a time interval of 0.1 to1.2 ms.Assume that an adversary’s goal would be to perform energy fraud andlower their energy bills. One possible attack would be for the attacker totamper with the synchronization between the send and receive modes inthe smart meter. As a result, a part of the energy usage would not be writtento the memory bu↵er which is used for future energy usage calculations andbilling. For instance, should the value false be passed to the functioninstead of true, then it would lead to the execution of receive and writeinstead of send and read; hence the billing information would be incorrect.None of the above techniques can detect the attack as the incorrectoccurrence of sequences are triggered by legal values of seg-data occurringat the wrong time ( e.g., seg-data (T1) = false). More specifically,Daikon would notice a valid value for seg-data, Texada would notice anormal sequence of receive and write events, and Perfume would alsoobserve valid time intervals between events receive and write within theexecuted path. Thus, none of them would detect the attack. Even if allthree models are used jointly, they would still not detect the intrusion, asthe di↵erent models either capture the legal data values, or the legal sequenceof events with their time di↵erence, but not the interplay among them. Thisinterplay is essential for detecting the attack.2.3 SummaryIn this chapter, we surveyed the dynamic analysis-based techniques thatmodel the behavior of software systems. We categorized these techniquesinto four classes, based on the models that they generate, and illustratedwhere they fall along the data, event and time dimensions. Overall, noneof these techniques consider the relationship between data and time, as wellas interplay among time, events and data in formulating invariants, whichwe believe are essential characteristics of CPS systems. We also broughtup a concrete attack example against a smart electric meter, and showed162.3. Summarythat none of the existing techniques would detect the attack. Even if all themodels generated by di↵erent techniques are used jointly, they would stillnot detect the attack, as the di↵erent models either capture the legal datavalues, or the legal sequence of events with their time di↵erence, but not theinterplay among them. This interplay is essential for detecting the attack,and is the main gap in existing dynamic analysis-based techniques.17Chapter 3ApproachIn this chapter, we introduce the security model that ARTINALI uses, andwe explain its design. We first define our multi-dimensional model andthe di↵erent classes of invariants. Next, we explain how to relate di↵erentdimensions to generate real-time data invariants. Finally, we present theARTINALI workflow and algorithm.3.1 Multi-dimensional modelWe model a CPS in three dimensions, as follows:Data refers to data values assigned to the variables of a program. Itincludes neither the timing of processes, nor the sequence and concurrencyof processes.Event refers to an action that a system takes to respond to an externalstimulus.Time refers to real-time constraints, and includes both the constraintson physical timing of various operations, and those where the system mustguarantee response within a specified time frame.We model the security policy of a CPS by inferring the set of invari-ants to be preserved during run time. An invariant, or interchangeably aproperty, is a logical condition that holds true at a particular set of programpoints. Like in prior work [17, 31, 58], we use the term invariants as a short-cut for likely invariants, which are the properties that we observe to be trueacross a set of dynamic execution traces. Corresponding to the dimensionsdefined above, we define six major classes of invariants that form the basisof the CPS model.183.2. Event-data-time Interplay• Data Invariant captures the expected range of values of selected datavariables during normal execution of program.• Event Invariant captures common patterns in the system’s events suchas the order of the events’ occurrence.• Time invariant captures the normal time boundaries (such as durationor frequency) of an event.• Data per Event(D|E) invariant captures the temporal relationshipbetween data and events. It allows the IDS to check the validity ofdata invariants based upon events.• Event per Time (E|T ) invariant captures the constraints over eventand time. It represents the boundaries of transition time from oneevent to another in an event sequence.• Data per Time (D|T ) invariant captures the relational constraints oftime and data invariants, or the data invariant as a function of time.3.2 Event-data-time InterplayIn a CPS, an event is defined as an instance of an action that leads to achange of condition [51] (e.g., message send/receive, sensor data reading, oractivating insulin injection). Events have three key features. First, they re-flect interactions between system components and observations rather thaninternal state. The second feature is the notion that events are separatedin space and time [15, 16, 51]. Thirdly, the locations in the code whereevents are triggered are usually system calls that are accessible by attack-ers. From a security perspective, events are important as they play therole of an input channel for malicious communication with the CPS. Forinstance, those points in which a new measurement is read from sensors, oractuation commands are sent to physical components, are more vulnerableto spoofing attacks [18].Finding a direct relationship between time and data is challenging fromboth the learning and detection perspectives. Since time is a continuous193.2. Event-data-time Interplayphenomenon, we cannot define a sharp time for transitions in data valuesor changing states of the system; instead, a distribution of time values hasto be learned. As execution time variations might be caused by di↵erencesin input sets or di↵erent execution flows, rather than malicious activities,the invariant inference technique should learn the normal time variationsof the system. On the other hand, the IDS has to distinguish legitimatetime variations from any time deviation that indicates an intrusion. Toovercome these challenges, we leverage the event-based nature of a CPS,in which every event takes place in an unique time-frame. We discretizethe time by the events, and use these for learning invariants. After dis-cretizing the time by events, we first examine the relationship between dataand event dimensions to produce invariants that integrate event informationwith constraints on data values (D|E invariants). Secondly, we discover therelational constraints over time and event dimensions to calculate the phys-ical time boundaries of events, either independently (time invariants), or inrelation to each other (E|T invariants). Finally, we combine the result ofthe previous steps to infer D|T invariants.In the following discussion, we illustrate how we infer the D|T invariantsgiven the conditional probability of having data D given event E invariant(P (D|E)), and given the conditional probability of having event E giventime T invariant (P (E|T )). It should be noted that we have representeddi↵erent classes of our likely invariants in the form of probability functionsto conceptually describe the process of how we infer real-time data invariants(i.e., D|T invariants). We exploited the conditional independence of time Tand data D upon a given event Ej to derive equation 3.9. However, ARTI-NALI does not calculate the probability of correctness for every invariant;instead, it generates the invariants that hold true with a high probabilitybased on the above analogy.Considering data D, event E and time T as random variables, equation3.1 expresses the joint probability distribution of variables D, E and T . Werewrite it to obtain equation 3.2. From these two equations, we then derive203.2. Event-data-time Interplayequation 3.3, which expresses the probability of having D and E, given T .P (D,E, T ) = P (D,E|T ) · P (T ) (3.1)P (D,E, T ) = P (D|E, T ) · P (E|T ) · P (T ) (3.2)P (D,E|T ) = P (D|E, T ) · P (E|T ) (3.3)Using the marginal probability mass function of D shown in equation3.4, we formalize P (D|T ) (the probability of having D given T ) in equation3.5 as the sum of the probabilities of data D and event Ej given time T forall events Ej , which can then be rewritten as equation 3.6 (using equation3.3).P (D) =XP (D,Ej), 8Ej (3.4)P (D|T ) =XP (D,Ej |T ), 8Ej (3.5)P (D|T ) =XP (D|Ej , T ) · P (Ej |T ), 8Ej (3.6)For example, assuming that at time T , event Ej occurs; and that uponEj occurring, then variable D gets assigned a specific value. This impliesthat T is the cause of Ej , and that D is the e↵ect of Ej . Thus, variable D isconditionally independent of time variable T given event Ej . Consequently,D and T are conditionally independent, and P (D|Ej , T ) = P (D|Ej). Hence,we can simplify the formulation of P (D|T ) as follows :P (D|T ) =XP (D|Ej) · P (Ej |T ), 8Ej (3.7)According to the event-based semantics of CPS, any given event takesplace in a unique time frame. This implies that two or more events cannottake place at the same time T ; i.e., P (Ej |T ) > 0) P (Ei|T ) = 0, 8Ei 6= Ej .Given this assumption, we first rewrite equation 3.7 to obtain equation 3.8.Then, we simplify it to obtain equation 3.9, which captures the relationshipbetween data D and time T by exploiting the relational constraints of bothdata and time over the same event Ej which takes place at time T .213.3. ARTINALI WorkflowP (D|T ) = P (D|E = Ej) · P (E = Ej |T ) +XP (D|Ei) · P (Ei|T ), 8Ei 6= Ej(3.8)P (D|T ) = P (D|E = Ej) · P (E = Ej |T ) (3.9)In other words, for a given event Ej, a D|T invariant holds true (i.e.,happens with a high probability) if and only if both the corresponding D|Einvariant and E|T invariant hold true.3.3 ARTINALI WorkflowARTINALI is a dynamic analysis-based technique that generates models ofdynamic system behavior, and proposes a multi-dimensional model basedon the design concepts introduced in the previous section. Figure 3.1 showsthe key blocks of ARTINALI’s workflow.Figure 3.1: Work flow of ARTINALIARTINALI technique works at event granularity to mine invariants. Inorder to generate logs for mining invariants, we manually instrument events223.3. ARTINALI Workflowand their associated data variables1. As we generate the CPS model for at-tack detection, we capture all system calls (which are reachable by attackers)as events. However, in ARTINALI, events are user-defined. Accordingly,user can optionally customize the level of granularity by choosing anothertype of events, or prune the space of events by specifying only the importantsystem calls based on the system’s requirements. We instrument the events’program locations by inserting calls to the ARTINALI API functions thatwe developed for collecting logs, before and after the event. During the run-time, these functions collect data and time information associated with theinstrumented events in separate log files (DElog and TElog). The loggedinformation is used as the basis for mining invariants.3.3.1 Block 1. ARTINALI D|E MinerThe ARTINALI D|E Miner learns invariants about the variable values, andhow these values relate to a particular event in the system using a three-step process. First, the D|E Miner takes the logged information, and groupsthem within each trace into distinct classes labeled with the events. It thenmerges classes across the training traces. Second, within each class, usingthe Frequent Item Set mining algorithm [22], it merges the data variableswhile calculating the level of the confidence and support for every variable.As in prior work [17], Support is the fraction of traces in which the variablex within class Ej is seen, and confidence is the fraction of supported classes,where variable x is assigned to the same value(s).Finally, the D|E Miner infers the data invariants associated with eachclass (event). D|E invariants are multi-propositional data invariants as theyall hold true within the same observed event (at the same time). The D|Einvariants are stated in the form of (Ei : d1 = [], d2 = [], ...dn = []), whereEi denotes the name of (i)th event, and d1  dn denote the range of con-crete values of n data variables mapping to the event Ei. E.g, in our firstrunning example of smart meter, (receive: seg-data=false, command=nil,status=time-out, len (partial)0) represents the assigned values of selected1This is similar to what almost all other invariant detection techniques do, with theexception of DAIKON, which has an automated instrumentation engine.233.3. ARTINALI Workflowdata variables during receive mode. Figure 3.2 shows sample DElog andD|E invariants of motivating example.We have chosen the above D|E invariant template as a common datainvariant template. However, ARTINALI D|E miner is extensible to alltemplates that Daikon uses for data invariant inference. We avoid usingall Daikon templates for three reasons: First, data invariants inferred usingvarious templates are overlapped (e.g, 2  X  5, Y  5, Y  X). Secondly,the more number of invariants leads to a higher rate of false positives inanomaly detection [6]. Thirdly, CPS has a limited memory capacity whichmakes using a big IDS model challenging. Thus, a smaller set of rich andstable invariants for CPS IDS model is more desired.Figure 3.2: Sample DElog and respective D|E invariants with 100% confi-dence243.3. ARTINALI WorkflowTable 3.1: E|T and D|T Invariant TypesE|T Invariant TypeType I Ei(t)⌦ Ei(t+ 1Freqi )Type II Ej ⌦ Ei : tjimax,tjiminType III Ei : timax,timinD|T Invariant TypeType I dm(Ti  t  Tj) = []Type II dm = []⌦ dn = [] : tjimax,tjimin3.3.2 Block 2. ARTINALI E|T MinerARTINALI’s E|T Miner infers the E|T invariants in four steps. First, itcreates all consecutive event pairs within one trace annotated with theirtime di↵erences. Second, it groups the pair of events that are labeled withthe same pair name. Third, ARTINALI’s E|T Miner looks for the pair-wiseevents that are observed in the same order within training traces, and calcu-lates their support. Finally, it merges the time variables within each class tocalculate the time boundaries of the paired events, as well as the frequencyand the average duration of every event execution. The E|T invariants areclassified into three types, as shown in Table 3.1. Type I indicates thatevent Ei is repeated every1Freqiseconds. Type II indicates that the pairof events Ei and Ej are repeated in all traces in the same order, and theirtime di↵erence is bounded within tjimax and tjimin. Type III indi-cates the maximum and minimum duration of event Ei. For the example inSection 2.2, invariant send ⌦ send :60.2, 59.9sec showing the frequency ofsend occurrences in the system, and the invariant send⌦ receive : 1.2, 1.5representing the time boundary (between 1.5 and 1.2) as well as the logicalordering of the events (i.e., send before receive), are both examples of E|Tinvariants. We illustrated sample TElog and DurationLOG files that are fedto E|T miner, and the respective E|T mined invariants in Figure ARTINALI WorkflowFigure 3.3: Sample TElog and DurationLOG files and respective E|T in-variants.3.3.3 Block 3. ARTINALI D|T MinerAccording to the formulation described for D|T invariants, ARTINALI com-bines the outputs of D|E and E|T miners to generate the real-time data in-variants (D|T invariants). We define two types of data invariants (Table 3.1),and we explain each type using the example in Section 2.2.Type I represents the distribution of valid data values of variable dmwithin time slot Ti  t  Tj . For example, seg-data(T1  t  T2) = truemeans that the only valid value of variable seg-data is true during the timeinterval T1  t  T2. Note that we di↵er here from Daikon data invariants(e.g. seg-data=true,false), as they only express the valid values of data263.3. ARTINALI Workflowinvariants without considering the time.Type II captures the relationship of data invariants between two con-secutive events. As explained in previous section, every two consecutiveevents have a bounded time di↵erence (Ti+tjimin  Tj  Ti+tjimax).As a result, the data invariants associated with those events have the sametime di↵erence. In other words, data invariant dj = [] holds true untildata invariant di = [] becomes true, while tjimax and tjimin spec-ifies the time di↵erence boundaries between those data invariants. Fig-ure 3.4 shows examples of two types of D|T invariants that ARTINALID|T Miner generates for our running example. For example, invariantseg-data = true ⌦ seg-data = false : 1.2, 0.1sec; i.e., seg-data = trueholds true until seg-data is assigned value false, in a time interval rangingbetween 0.1 and 1.2 seconds.Figure 3.4: Representation of ARTINALI D|T invariants3.3.4 Block 4. IDS PrototypeAs explained in the previous sections, the ARTINALI Miners derive threeclasses of invariants that comprise the final CPS model. The CPS modelnot only satisfies the mined invariants but also admits the correct pathswithin executions. It is used as an input to our IDS prototype for monitoring273.4. Summaryattacks. Our IDS prototype consists of two components: the Tracing moduleand the Intrusion detector. The tracing module is in charge of collectingthe required information from the program’s execution and logging it. Thismodule is the same as the ARTINALI Logger that instruments the code andcollects logs, but with the di↵erence that it is deployed on the productionsystem. The collected information is fed to the intrusion detector, whichperiodically processes the log file and checks it against the invariants derivedfrom the CPS model.3.4 SummaryIn this chapter, we introduced ARTINALI for mining likely invariants throughdynamic analysis in CPS systems, for specification-based IDSes. The maininnovation in ARTINALI is that it incorporates time as a first-class notionin the mined invariants, in addition to the traditional data and event invari-ants. This is important for two reasons. First, most CPSes have real-timeconstraints, and hence their operational correctness depends on both logicalcorrectness, and correct timing behavior. Hence, incorporating time is essen-tial for detecting many common security attacks in these systems. Secondly,CPSes have predictable timing behaviors to a first order of approximation,and hence leveraging this predictability leads to higher accuracy (i.e., lowerfalse-positives and negatives). However, incorporating time in dynamic in-variant detection techniques increases the complexity of the learning dueto the much larger state space that needs to be covered. To alleviate thisissue, we break up the problem of learning invariants along the three di-mensions into problems of learning invariants along two dimensions, namelydata-events and events-time, and then combine them into data-events-timeinvariants. As a proof of concept, we have also built an ARTINALI-basedIDS prototype, that is fed by multi-dimensional model generated by ARTI-NALI, and use it in the context of two CPS systems, namely i) advancedmetering infrastructures, ii) and smart artificial pancreas (Chapter 4).28Chapter 4Experimental SetupThis section first presents the details of two CPS platforms as case studies,and then the experimental procedure for evaluating the IDS on the twoplatforms.4.1 Case StudiesWe chose two CPS platforms as case studies to evaluate the ecacy of theinvariants generated by ARTINALI and the other tools. Note that un-like generic applications, there are few publicly available open-source CPSplatforms that are also security-critical. Furthermore, there is a significantamount of e↵ort involved in setting up a CPS platform and generating exe-cution traces from it.4.1.1 Advanced Metering Infrastructure (AMI)Advanced Metering Infrastructure (AMI) systems are deployed on smartelectric power grids. Smart meters are key components of AMI that providea two-way communication with the utility provider[44]. The large scaledeployment of smart meters and the discovery of many vulnerabilities inthese systems [49, 57], make them good candidates to evaluate our work.According to Figure 4.1(a), a generic smart meter is composed of two maincomponents, namely the meter and the gateway (controller). The metercomponent receives power consumption data (PCD) through analog frontend sensors, and stores them in the memory. The controller componentis the communication bridge between the meter and the utility provider’sserver, passing server commands to the meter, and sending consumption294.1. Case Studiesdata back to the server at specific time intervals (more details in [49]).Our testbed: We use SEGMeter [1], an open source smart meter toevaluate our IDS prototype. SEGMeter is implemented using the Lua lan-guage, and consists of 2500 lines of code (excluding libraries).4.1.2 Smart Artificial Pancreas (SAP)Diabetic patients are migrating from the traditional glucose meter andmanual insulin injection systems to continuous glucose monitoring and au-tonomous insulin delivery devices [33], which are referred to as Smart Ar-tificial Pancreas (SAP). Since attacks to a SAP can threaten the patient’slife, these systems are highly security-critical [41]. Hence, we selected SAPas our second case study to evaluate ARTINALI. The main building blocksof a generic SAP are a Continuous Glucose Monitor (CGM), an insulinpump, and a controller (As illustrated in Figure 4.1(b)). These are com-monly connected through a wireless network to form a real-time monitoringand response loop. The CGM samples the patient’s blood glucose (BG)levels on a regular basis and sends it to the controller. The insulin pumpis a wearable medical device that is used for automatic injection of insulinthrough subcutaneous infusion. It may deliver insulin in two doses:bolusand basal.Each type has specific injection time, rate, and dosage based onthe patient’s needs. The controller controls the closed loop in the SAP.It receives the measured BG from CGM, and issues the suitable actuationcommand for correcting the sugar level.Our testbed: We used Open Artificial Pancreas System (OpenAPS)[32], an open source SAP, as a second use case to evaluate our IDS prototype.OpenAPS implements the controller component of a SAP in JavaScript, andconsists of 2000 lines of code (excluding libraries). We simulated a simpleCGM and an insulin pump to close the loop, as we did not have access toa patient with a real insulin pump and glucose meter. OpenAPS providesa set of test cases that take di↵erent BG values as input and process themfor calculating basal rate of insulin, which we use as a baseline for ourexperiments.304.2. Experimental ProcedureFigure 4.1: High level block diagram of (a) Smart meter and (b) SmartArtificial Pancreas.4.2 Experimental ProcedureFigure 4.2 shows the overall procedure that we follow. In addition to gen-erating the CPS model using ARTINALI, we generate three other models(invariants sets) using Daikon, Texada, and Perfume for comparison pur-poses. We downloaded the latest versions of these tools from their respec-tive websites [2–4]. We do not run the instrumentation front-end of Daikon(i.e., Kvasir), as our goal was to generate data invariants based on the eventtraces we logged. We choose these three tools to represent the first, sec-ond and fourth classes of invariants as described in Chapter 2. We do notchoose the tools in the third category, namely GK-tail and Quarry, as weuse Daikon to find data invariants for the events that we identified in thesystem. Therefore, the invariants generated by Daikon cover the third classof invariants in our experiments (i.e., D|E invariants).There are 22 system calls in SEGMeter’s code, and 4 system calls inthe OpenAPS code. We consider all of them as events. Tables 4.1 and 4.2present the types of invariants and the number of invariants generated bythe three tools and ARTINALI for the SEGMeter and OpenAPS platformsrespectively. As can be seen, ARTINALI generates invariants in the Time,314.2. Experimental ProcedureD|E, E|T , and D|T categories, while DAIKON, Texada and Perfume onlygenerate invariants in the D|E, Event and E|T categories respectively.Because the format of the invariants generated by these other tools maybe di↵erent from that expected by our IDS, we wrote scripts to convert theinvariants to be in the format expected by the IDS interface. ARTINALIdirectly generated invariants in the proper format. In case a tool did notgenerate a certain kind of invariant (e.g., D|E), we leave that invariant fileblank. The generated invariant sets are all fed into the IDS as inputs, andtheir ecacy is evaluated on di↵erent platforms.We divide the experiment into a training phase and a testing phase foreach system. We first obtain execution traces from the two platforms undernormal operation, and randomly divide them into a set of training traces(train) and testing traces (test). We then choose di↵erent training set sizesfor each invariant detection system to optimize the false positive (FP) andfalse negative (FN) ratios for that system. Finally, we evaluate the FP ratiosof the invariants using the test traces, and the FN ratios using the attackmodels described in the next section.Figure 4.2: Overall experimental process of running the IDSThe IDS is implemented in Python, and consists of about 1000 lines ofcode. Since the IDS is run on the CPS platform, which is often resource324.2. Experimental ProcedureTable 4.1: Types and number of inferred invariants for SEGMeter acrosstoolsEvent T ime D|E E|T D|TDaikon - - 24 - -Texada 158 - - - -Perfume - - - 158 -ARTINALI - 12 24 37 24Table 4.2: Type and number of inferred invariants for OpenAPS across toolsEvent T ime D|E E|T D|TDaikon - - 22 - -Texada 57 - - - -Perfume - - - 57 -ARTINALI - 4 22 18 7constrained, it is important to minimize its overheads. We measure theIDS’s time and space overhead for the SEGMeter platform in Chapter 6-Section 6.3.5 and Section 6.3.4. Because we run the OpenAPS platform ina simulator, as we did not have access to its hardware, we do not measurethe IDS overheads on OpenAPS.33Chapter 5Evaluation Against TargetedAttacksIn this chapter, we discuss the potential targeted attacks and how we derivethem for SEGMeter and OpenAPS platforms. We then evaluate the IDSseeded by ARTINALI and other tools against the attacks. Note that weused attack trees based on prior attacks against similar systems to generatethe attacks to minimize bias and model realistic attacks. We found thatARTINALI was able to detect all the attacks, while none of the other toolsdo so. This is because all the attacks involved violations of the interplayamong data, events, and time.5.1 AMI AttacksEnergy fraud is a major class of AMI attacks, and can result in Power Con-sumption Data (PCD) loss and improper billing [36]. We came up with anattack tree for energy fraud in AMI (shown in Figure 5.1), based on attacksintroduced in previous work [36, 44, 50, 57]. There are three major branchesin this tree, namely i) Measurement tampering, ii) Storage tampering, andiii) Network tampering. Corresponding to each branch, we developed theconcrete attack actions as the leaves of the tree as follows.5.1.1 Synchronization tampering (Blocks A1 A4)Synchronization tampering attack occurs due to modification of the time ofsend and receive modes in AMI. We found that the communication betweenthe AMI and the server is synchronized by a vulnerable function (get-data-345.1. AMI AttacksFigure 5.1: Attack tree for AMItimer()) in the controller unit. The controller frequently checks the timewith the sever to decide when to request for data measured by the meter. Ifa malicious user delays the server commands, the controller will not receivedata in the expected time, which leads to data loss, and improper calculationof final PCD.5.1.2 Meter spoofing (Blocks B1 B5)In a smart grid, AMIs communicate with the server using a unique nameor ID. The controller unit is able to be connected to more than one meter,collects the PCDs, and send them along with the meter’s ID to the server. Asthe controller cannot di↵erentiate between normal and abnormal messages,it can be tricked by falsified inputs sent by an attacker instead of the meter.This attack is called meter spoofing attack. We found that spoofing themeter only requires the meter’s ID that is printed on the meter’s nameplate.5.1.3 Message dropping (Blocks C1 C5)An attacker may be able to drop the messages (i.e., a part of energy usage)after bypassing the meter and removing the logged PCD history. A simple355.2. Detection of AMI attacksTable 5.1: ARTINALI invariants to detect the example attacks in AMI.Attack Detecting InvariantSynchronization tampering (1) send (T0+K · 60) ⌦ send (T0+(K+1) · 60), 8 k0Message dropping (2) recv (T1) ⌦ recv (T1+1)Meter spoofing (3) node-name(T0+N · 60) = Node B, 8 N0way to mount this attack is to intercept the communication between themeter and the controller, and control what trac to block and what to passthrough (e.g., through a firewall). Hence, the blocked trac would not beincluded in PCD calculations.5.2 Detection of AMI attacksWe ran the ARTINALI-based IDS on the example attacks, and found thatit detected all of them. Table 5.1 indicates the important invariants that arederived by ARTINALI, which detect the attacks presented in the previoussection.5.2.1 Synchronization tamperingAs synchronization tampering attack modifies the timing of send and re-ceive operations of SEGMeter, we picked events send and receive as rele-vant events to explain this attack. We can see in row 1 of Table 5.1 that theARTINALI invariant captures the sequence of these events during normaloperation, i.e., send operation happens every 60 seconds, and receive is re-peated every 1 second. Thus, this invariant detects the attack as the timingof the events is violated by the attack.5.2.2 Message droppingIf we assume the attacker drops one or more messages from meter, thedropped messages will not be received at the expected time slots by thecontroller. As a result, the frequency of receiving messages in controller willchange. This attack breaks the invariant number (2) in Table 5.1, which365.3. SAP Attacksrepresents the time frequency of receive function which is 1003 milliseconds(⇠= 1sec) within one full execution path. Thus this attack is also detected.5.2.3 Meter spoofingTo detect meter spoofing attack, we selected two receive events (recvA andrecvB) from two di↵erent meters (node A and node B) that are connectedto the same controller, and analyzed the respective invariants. For example,nodeName(T0+N*60) = Node B, 8 N 0 specifies that the valid value ofnodeName at T0 +N ⇤ 60 is Node B. If the identity of node A is stolen bynode B, it sends its messages every 60 seconds under the name nodeA. As aresult, variable nodeName attached to event recvB, becomes nodeA. Thus,the invariant number (3) in Table 5.1 is violated.5.3 SAP AttacksDiabetic therapy tampering is one of the highest severity threats for patients,as it can result in death or severe health complications. We developedan attack tree for diabetic therapy tampering based on publicly availablereports of attacks on SAPs [33, 41], in Figure 5.2. We consider three classesof attacks based on the tree.5.3.1 CGM spoofing attack (Blocks A1 A4)The CGM spoofing attack injects false into the communication channel be-tween CGM and controller making the controller think that the glucose levelis either higher or lower than it actually is. There are two ways that CGMspoofing can be accomplished. First, if the sensor data format is unknown,then a replay attack can be used. In this case, a sensor value read in thepast can be re-sent (e.g., by a RF module [33]) to the controller. This wouldcause the controller unit to indicate an outdated glucose level rather thanthe actual one. Second, if the format of sensor data is known to hacker, shecan send the false data at random time intervals to mislead the controller.375.3. SAP AttacksFigure 5.2: Attack tree for SAPTable 5.2: ARTINALI invariants to detect example attacks in OpenAPS.Attack Detecting invariantCGM spoofing (1) read (t) ⌦ read (t+5)Stop basal injection (2) (120  BG 485) ⌦ (0.9  basal.rate  3.5) : 1.99, 0.464Resume basal injection (3) BG  75 ⌦ basal.rate=0 : 1.99, 0.4645.3.2 Basal tampering (Blocks B1 B5)The basal tampering attack may be accomplished in two di↵erent scenarios.The attacker may issue a command for i) stopping the basal injection (e.g.,basal.rate = 0) when it is required for patient, or ii) resume the basalinjection (basal.rate > 0) when it has to be stopped. These attacks maybe mounted using a software radio board that fully controls the SAP [33,41], and transmits the malicious commands to the pump. To accomplishthe attack, the attacker needs to spoof the PIN number of the controller,and the format of transmission packets - both of these can be done by aneavesdropping attack.385.4. Detection of example attacks in SAP5.4 Detection of example attacks in SAPWe mounted the attack examples on the SAP system we considered (i.e.,OpenAPS), and found that the ARTINALI-based IDS is able to detect allof them. There are four events in the SAP, namely 1) send(BG) or sendingblood glucose by CGM , 2) read(BG) or reading BG by the controller , 3)send(basal.rate) or sending basal rate to pump by the controller, and 4)recv(basal.rate) or receiving basal rate by pump. We used these events asthe basis for mining 51 invariants for OpenAPS’s IDS model. Due to spaceconstraints, we do not present all inferred invariants, but only those thatdetect the example attacks (Table 5.2).5.4.1 CGM spoofing attackWe selected read(BG) in controller as the relevant event, and analyzed theinferred invariants for this event to analyze CGM spoofing attack. Undernormal conditions, the transmission of measured Blood Glucose (BG) toCGM occurs at deterministic, periodic times (e.g., every five minutes). Thisproperty is represented in our model as time frequency of event read(BG),that is read(t)⌦ read(t+5). Using the above property, it would be possibleto detect malicious sensor reading from any external source that performsreplay attack or transmits wrong data at random time intervals to the con-troller as the frequency of reading data by controller would change.5.4.2 Basal tampering attackAs previously explained, the basal tampering attack may be accomplishedin two di↵erent scenarios: i) stop basal injection (basal.rate = 0) whenit is required, and ii) resume basal injection (basal.rate > 0) when it isnot required. These attacks break the invariants shown in Table 5.2. Theinvariant number (2) indicates that if BG is higher than the normal range,the patient needs insulin (i.e., basal.rate > 0). However, the stop insulininjection attack makes the basal.rate value to be 0, which breaks invariantnumber (2). Similarly, the invariant number (3) in Table 5.2 shows that395.4. Detection of example attacks in SAPfor low BG ranges (e.g., BG = 45), the patient does not need insulin (i.e,basal.rate must be 0), but resume basal injection attack sends a command(basal.rate > 0) to the SAP to inject insulin. As a result, invariant number(3) is violated.5.4.3 SummaryThus, we see that the ARTINALI-based IDS is able to detect all six attacksthat we considered (3 for AMI, and 3 for SAP). On the other hand, none ofthe other three systems (i.e., DAIKON, Texada, and Perfume) detect even asingle one of the attacks. This is because all the attacks involved violationsof the interplay among data, events and time40Chapter 6Evaluation Against ArbitraryAttacksIn the previous chapter, we evaluated the IDS based on ARTINALI ontargeted attacks. However, evaluation of security techniques using a smallnumber of targeted (hand-crafted) attacks might not be sucient for CPSsystems for two reasons. First, CPSes are new systems for which there arefew real attacks - hence they need protection from zero-day or unknownattacks. This is especially the case for security-critical CPSes such as smartmedical devices. Secondly, unlike general computer systems, CPSes can bedicult to upgrade and patch frequently, and hence, they need resilientIDSes against arbitrary attacks. Therefore, in this chapter, we evaluate theecacy of invariants generated by ARTINALI and the other three invariant-detection techniques for the CPS platforms using arbitrary attacks.6.1 Arbitrary Attack ModelWe use fault injection (i.e., mutation testing) to emulate arbitrary attacks.Fault injection has been used to study the e↵ects of attacks in previous work[49]. Note that these are not complete attacks, but rather form the buildingblocks of attacks. We deploy di↵erent types of mutation in the program’scode, as follows.• Data mutations, which change the runtime values of data variables inthe code;• Branch flipping, which change the normal execution flow of the pro-gram by flipping branch conditions;416.1. Arbitrary Attack Model• Artificial delay insertion, which modify the normal timing behavior ofthe program.Each of the above categories emulate di↵erent security issues. By per-forming data mutations, an attacker can change critical data in the programto their advantage. Such attacks can be accomplished by exploiting memorycorruption vulnerabilities or race conditions in the program. For instance,[48] investigated an attack to smart facial recognition systems caused byexploiting a mis-classification bug (CVE-2016-1516) in the controller algo-rithm through input data mutation. Likewise, branch flipping can lead toillegitimate control flow paths being taken in the program, to accomplish theattacker’s ends. Such attacks can occur due to code injection or semanticvulnerabilities. As an example, [10] indicated that how attacker is able tochange the sequence of actions in a smart car through exploiting the bu↵eroverflow vulnerability in the telematics of car. Finally, artificial delays canallow attackers to change the timing of the system’s actions, and delay es-sential functions, or cause other functionality to be suppressed, again totheir advantage. Synchronization tampering attack is one example of suchattacks [36]. Through these mutations, we can emulate a wide variety of at-tacks, without a predefined target, thus avoiding bias and allowing modelingof hitherto unknown attacks.We performed 156 and 125 code mutations for SEGMeter and OpenAPSrespectively. We manually seeded each of these mutations in the source codeof the respective systems, by randomly sampling the corresponding programpoints in the program’s code. While this could have been automated by afault injection tool (e.g., LLFI [5]), the languages in which the two systemswere implemented, JavaScript and Lua, were not supported by existing tools.So we had to perform mutations manually. However, we attempted to choosethe program points randomly before performing the experiment to avoidbiasing our evaluation.After mutating the code, we can observe one of four outcomes.• Crash, in which the program is aborted (exception);• Hang, in which the program goes into an infinite loop or deadlocks;426.2. Evaluation MetricsFigure 6.1: Fault injection impact(%): (a) data mutation in SEGMeter, (b)branch flip in SEGMeter, (c) artificial delay in SEGMeter, (d) data mutationin OpenAPS, (e) branch flip in OpenAPS, (f) artificial delay in OpenAPS• SDC (Silent Data Corruption), in which the outcome of the programis di↵erent from a fault-free execution;• No corruption, in which the outcome of the program does not showany observable impact with respect to fault masking or non-triggeringfaults. Internal states might however be corrupted.Note that in the context of this study, we are interested only in SDCand No corruption outcomes, as the Crash and Hang outcomes can easilybe detected without an IDS. Therefore, we need an IDS for the SDC and Nocorruption outcomes which comprise about 85% of the outcomes on average(according to Figure 6.1).6.2 Evaluation MetricsAccuracy: We use three metrics to measure the e↵ectiveness of our IDSfrom the accuracy point of view.436.2. Evaluation Metrics• False Negative ratio (FN), which is the ratio of attacks that were un-detected by the IDS to the total number of attacks;• False Positive ratio (FP), which is the ratio of execution traces thatwere (incorrectly) reported as attacks to the total number of normaltraces;• F-Score(), which is a computation of the harmonic mean of the truepositive ratio (TP), FP and FN2.The variations of the argument  in F-Score() allow us to weigh theabove metrics di↵erently [46], and obtain di↵erent trade-o↵ between FP andFN ratios based on the system requirements. A value of  > 1 weighs FNshigher, while a value of  < 1 weighs FPs higher. A value of  = 1 weighsthem both equally. We hypothesize that FPs are more important in smartmeters, as a false-alarm leads to added cost to the utility provider who needsto deploy service personnel to investigate the false alarm. An occasional FNmay be acceptable in smart meters as the consequence is only a loss ofrevenue. In the OpenAPS, on the other hand, even a single FN can be fatalto the patient, while a FP may be acceptable if there are other checks inplace to filter out FPs (e.g., patient intervention). Hence, for SEGMeter, weselect F-Score(0.5), and for OpenAPS, we choose F-score(2) as our referencemetric.Overheads: In addition to the accuracy, we also measure the memoryand performance overheads of the IDS.Memory overhead is defined as the actual memory usage of the IDS. Itdepends on the size of IDS, the number of invariants that account for theCPS model, and the complexity of invariants (e.g., the invariant Ej ⌦ Ei :tjimax,tjimin carries more information than the invariant Ej ⌦ Ei, andis hence more complex).Performance overhead is the increase in execution time as a result ofrunning the CPS on the target platform. This metric reflects the overheadof both the tracing module and the intrusion detector. Since CPSes run2F  Score() = (1+2)⇥TP(1+2)⇥TP+2⇥FN+FP446.3. Research Questions (RQs)continuously for long periods of time, we measure the performance overheadper cycle, where a cycle refers to one full execution of the main loop of theCPS (both the SEGmeter and OpenAPS consist of a single main loop thatruns continuously).6.3 Research Questions (RQs)We ask the following RQs corresponding to the evaluation metrics.RQ1. How do we choose the training set size to obtain the best F-Score()for each tool?RQ2. What is the FN ratio incurred by the IDS using the invariants derivedby ARTINALI and the other tools ?RQ3. What is the FP ratio incurred by the IDS using the invariants derivedby ARTINALI and the other tools?RQ4. What is the memory overhead of the IDS when using the invariantsderived by ARTINALI and the other tools?RQ5. What is the performance overhead of the IDS when using the invari-ants derived by ARTINALI and the other tools?6.3.1 RQ1. F-ScoreAs mentioned in Chapter 4, we obtain two sets of traces from each system,namely train and test. In this RQ, we ask what should be the optimaltraining set size for each system in order to maximize the correspondingF-Score values. To answer this question, we obtain a total of 40 trainingtraces, and 50 test traces for each system. We then vary the training set sizefrom 5 to 40, in increments of 5. We then run each of the invariant detectiontools including ARTINALI on the same training set to derive invariants. Wethen measure the FP, FN, and F-Score values (0.5, 1, 2) for each invariantdetection tool and system, as a function of the training set size.456.3. Research Questions (RQs)Figures 6.2, 6.3, 6.4 and 6.5 show the distribution of the amount offalse positives (FP), false negatives (FN) and the F-Score computed with = 0.5, 1, 2 in relation to the amount of training traces for SEGMeter, corre-sponding to each of the four invariant detection tools, including ARTINALI.Similarly, Figures 6.6, 6.7, 6.8 and 6.9 show the distribution of the amountof false positives (FP), false negatives (FN) and the F-Score computed with = 0.5, 1, 2 in relation to the amount of training traces for OpenAPS, cor-responding to each of the four invariant detection tools. As expected, as theamount of training traces increases, the FP ratio decreases, since a broaderset of invariants are extracted; thus a lower amount of legitimate actionsare flagged as potential attacks. A consequence is that more attacks areundetected (FN increases), as a more restricted set of invariants can leadto some attacks being undetected. Overall, an increase in the amount oftraining traces lead to an increase of the F-Score at first, then it stabilizes,at which point an optimal amount of training traces have been found (for agiven values of ).Tables 6.1 and 6.2 show the optimal amount of training traces (optimalF-Score) for each invariant detection tool, for SEGMeter and OpenAPS re-spectively. Recall that we choose F-Score(0.5) for SEGMeter and F-Score(2)for OpenAPS, and hence these are the F-score values we choose for the op-timal number of traces. For example, in SEGMeter, a training set size of20 results in the maximum value of the F-Score(0.5) value of ARTINALI,whereas for OpenAPS, a training set size of 15 results in the maximum valueof F-Score(0.5). Likewise, we compute the optimal training set sizes for thethree other tools on both platforms. These are the values of the trainingset sizes we use for deriving the invariants for each tool in the rest of thissection. In other words, we find the best configuration of each tool on eachplatform, and generate invariants using this configuration for comparing thecorresponding IDSes .466.3. Research Questions (RQs)Table 6.1: Optimal training set size for maximum F-Score(0.5) for SEGMe-ter across tools, and the corresponding FP and FN ratios.Daikon Texada Perfume ARTINALIF-Score(0.5) 0.721 0.78 0.813 0.898Num of traces 30 30 35 20FP (%) 23 15 15 12FN (%) 57 60 38 2.3Table 6.2: Optimal training set size for maximum F-Score(2) for OpenAPSacross tools, and the corresponding FP and FN ratios.Daikon Texada Perfume ARTINALIF-Score(2) 0.604 0.62 0.686 0.952Num of traces 30 20 15 15FP (%) 21 16 22 13.5FN (%) 61 61 39 2Figure 6.2: FN, FP and F-Score variations based on number of trainingtraces for SEGMeter’s IDS seeded by Daikon invariants.476.3. Research Questions (RQs)Figure 6.3: FN, FP and F-Score variations based on number of trainingtraces for SEGMeter’s IDS seeded by Texada invariants.Figure 6.4: FN, FP and F-Score variations based on number of trainingtraces for SEGMeter’s IDS seeded by Perfume invariants.486.3. Research Questions (RQs)Figure 6.5: FN, FP and F-Score variations based on number of trainingtraces for SEGMeter’s IDS seeded by ARTINALI invariants.Figure 6.6: FN, FP and F-Score variations based on number of trainingtraces for OpenAPS’s IDS seeded by Daikon invariants.496.3. Research Questions (RQs)Figure 6.7: FN, FP and F-Score variations based on number of trainingtraces for OpenAPS’s IDS seeded by Texada invariants.Figure 6.8: FN, FP and F-Score variations based on number of trainingtraces for OpenAPS’s IDS seeded by Perfume invariants.506.3. Research Questions (RQs)Figure 6.9: FN, FP and F-Score variations based on number of trainingtraces for OpenAPS’s IDS seeded by ARTINALI invariants.6.3.2 RQ2. False NegativesIn this section, we compare the variation in the FN ratio incurred by the IDS,using invariants extracted by ARTINALI and the other tools. Tables 6.1 and6.2 also show the FN ratios for each tool for the SEGMeter and OpenAPSsystems respectively. We observe that overall, ARTINALI was able to detectaround 97.5% of attacks, which means it has an average FN ratio of 2.5%.In contrast, in Perfume, Texada and Daikon the FN ratio was respectively38.5%, 60.5% and 59% on average, across the two platforms. Thus, theARTINALI-based IDS reduces the ratio of false negatives by 89 to 95% overother dynamic invariant detection tools.Figure 6.10 and Figure 6.11 illustrate the FN ratio of the IDS for thethree category of attacks (code mutations), as well as the aggregated FNratio, for each tool, in both SEGMeter and OpenAPS. We discuss the FNratio for each attack category below:Data mutations: ARTINALI exhibits the lowest FN rate for data mu-tations (2 to 3%). This is followed by Daikon, which provides a much lowerFN ratio in data mutation attacks (15% in SEGMEeter and 17% in Ope-516.3. Research Questions (RQs)Figure 6.10: FN(%) of IDS for SEGMeter for di↵erent attack types acrossthe tools. Error bars are shown for the 95% confidence interval.Figure 6.11: FN(%) of IDS for OpenAPS for di↵erent attack types acrossthe tools. Error bars are shown for the 95% confidence interval.526.3. Research Questions (RQs)nAPS) than Perfume (53% in SEGMEeter and 78% in OpenAPS) and Tex-ada (52% in SEGMEeter and 87% in OpenAPS). This is because DAIKONfocuses on data invariants, while Texada and Perfume do not include datainvariants in their model. However, the Daikon data model does not includeother properties like ARTINALI does, resulting in much higher FNs thanARTINALI.Branch flipping: Among the other three tools, ARTINALI has thelowest FN rate for branch flipping attacks (1%). Perfume, Texada andARTINALI exhibit a lower FN ratio compared to Daikon for branch flippingattacks. As these attacks impact the order and sequence of the events in anexecution instance, and Daikon does not have event invariants, it shows lesssensitivity.Artificial delay: Again, ARTINALI has a much lower FN ratio (2-3%)than all three tools for artificial delay attacks, followed by Perfume. Thisis because they both include time in their model. Nevertheless, Daikon andTexada are still able to detect attacks that impact data variables or alterthe execution flow of the program.Overall, the results support our hypothesis that a more comprehensiveinvariant model, such as ARTINALI, which can find invariants and theirconstraints along three dimensions, can detect a significantly larger amountof attacks (and hence has fewer FNs).6.3.3 RQ3. False PositivesIn this section, we compare the FP ratio incurred by our IDS when usingthe invariants derived by ARTINALI against the invariants generated bythe other tools (Daikon, Perfume and Texada). The results are shown inTable 6.1 and 6.2 for the SEGMeter and OpenAPS systems respectively.We can observe that in both CPSes, the use of the ARTINALI-generatedinvariants lead to significantly less false positives compared to the invariantsgenerated by the other tools. More precisely, ARTINALI provides a 20%to 48% improvement of the FP ratio for SEGMeter, and a 16% to 39%improvement of the FP ratio for OpenCPS, over the other tools.536.3. Research Questions (RQs)These results can be explained by the fact that ARTINALI leverages thecorrelations among data, event and time dimensions during correct systembehavior to generate more stable invariants. ARTINALI infers event invari-ants that precisely describe the ordering of events in a sequence within anexecution flow, and then associates data and time constraints to the eventswithin every path (D|E and E|T ). Therefore, during normal operation, thesystem is unlikely to follow the same path with di↵erent associated data andtime values in a given execution, which in turn, reduces the probability offalse positives. Although the IDS uses the same traces for all tools, none ofthese tools other than ARTINALI look at the relational constraints of bothdata and time along the events’ paths, resulting in a higher ratio of falsepositives.While the FP ratio for the ARTINALI-based IDS is lower than the othertools, it is still high for both platforms. To reduce the FP ratio, one candeploy multiple variants of the code and switch to a di↵erent variant whenan attack is detected. If the invariant is not violated in the second version,it may be a false positive. Another solution is to remove invariants thatexhibit high FP ratios [6], but this may also increase the FN ratio. We defera detailed treatment of FP ratio reduction to future work.6.3.4 RQ4. Memory OverheadWe measured the memory consumption of our IDS running on the SEG-Meter platform, using the invariants generated by di↵erent tools. We alsocalculated the number of invariants that ARTINALI and the other toolsinferred for both platforms. Our results are shown in Table 6.3 (“Memoryusage” row). Generally, invariants that involve two or more dimensions (e.g.,E|T invariants) carry more information than the invariants of one dimen-sion (e.g., event invariants), and hence are more complex. We observe thatthe memory usage grows as the number and complexity of invariants in-creases. For example, the IDS consumes the maximum memory usage (3.94MB) when it uses the Perfume-generated invariants, which straddle two di-mensions, and have the maximum number of invariants (158 - Table 4.1).546.3. Research Questions (RQs)Table 6.3: Memory and performance overhead of IDS, seeded by ARTINALIand the other tools, running on SEGMeter.Daikon Texada Perfume ARTINALIMemory usage (MB) 1.24 3.21 3.94 2.96Tracing overhead(%) 22.6 13.4 18.8 23.3Detector overhead(%) 4.7 10.3 13.3 8.3Overall overhead(%) 27.3 23.7 32.08 31.6Full cycle execution(s) 60.94 60.94 60.94 60.94IDS execution time(s) 16.63 14.45 19.57 19.25Overall, we find that the memory consumption of the IDS with ARTINALI-generated invariants is lower than those with Perfume or Texada-generatedinvariants, but higher than those with Daikon-generated invariants. How-ever, the memory usage for all tools is much lower than the available memoryin SEGMeter (16 MB).6.3.5 RQ5. Performance OverheadIn this section, we discuss the performance overhead of our IDS runningon the SEGMeter platform, which consists of an embedded microcontroller(Broadcom BCM3302 V2.9 240MHz CPU and 16 MB RAM) running Linux.Recall that the IDS consists of two components, namely tracing moduleand intrusion detector module. Table 6.3 (middle part) shows the overheadsof the two modules separately for each tool. Each of these measurementsis an average of the overhead of 10 execution traces for each tool, where anexecution trace is defined as one complete execution of the meter’s main loop.We find that ARTINALI and Perfume have the highest aggregate overhead,followed by Daikon, and then Texada. The di↵erence in the overhead is dueto the di↵erence in the tracing module, which needs to collect both event anddata/time information for ARTINALI and Perfume, compared with Texada(events only), and Daikon (data only).In addition to the performance overheads, the IDS execution time shouldbe lower than the execution time of the system’s cycle, or else it will be556.4. Summaryunable to keep up with the system. We measure the raw execution times ofa full cycle in Table 6.3 (last part). As can be seen from the table, the entirecycle takes about 60 seconds (1 minute). However, the execution of the IDSfor each tool takes less than 20 seconds even in the worst case (for Perfume),which is only a third of execution time of the full cycle. Therefore, the IDSis not a bottleneck in any of the four systems, and is easily able to keep upwith the system.Note that the invariant mining process takes place o✏ine, and hencedoes not contribute to the performance overhead of the IDS running on theCPS platform. Nonetheless, we measured the time to mine invariants usingARTINALI, on a standard desktop system (Intel core i7 processor with 32GB RAM, running Linux). We found that the time ranges from 8 to 96seconds in SEGMeter, and from 6 to 36 seconds in OpenAPS. This is avery reasonable cost in most systems. Though this overhead may be higherfor larger systems, invariant mining is a one-time process and needs to beredone only when the code is updated.6.4 SummaryIn this chapter, we evaluated the IDS prototype seeded by ARTINALI andother three invariant-detection techniques for two CPS platforms using ar-bitrary attacks and using accuracy and overhead metrics. We used a model-based fault injection technique to emulate the building blocks of real attacks.Overall, the results support our hypothesis that a more comprehensive in-variant model, such as ARTINALI, which can find invariants and their con-straints along three dimensions, can detect a significantly larger amount ofattacks (and hence has fewer FNs). Moreover, we observed that in bothCPSes, the use of the ARTINALI-generated invariants lead to significantlyless false positives compared to the invariants generated by the other tools.We have also found that the memory consumption and performance over-head of the IDS with ARTINALI-generated invariants is lower than thosewith Perfume or Texada-generated invariants, but higher than those withDaikon-generated invariants. However, the memory usage for all tools is566.4. Summarymuch lower than the available memory in SEGMeter. We also measured theraw execution times of a full cycle of the IDS for each tool, and observedthat it takes less than 20 seconds even in the worst case (for Perfume), whichis only a third of execution time of the full cycle (60 seconds) . Therefore,the IDS is not a bottleneck in any of the four systems, and is easily able tokeep up with the system.57Chapter 7DiscussionIn this chapter, we first examine the threats to the validity of our experi-ments, followed by reflections on ARTINALIs generalizability.7.1 Threats to ValidityAn external threat to the validity is the limited number of CPS platformsconsidered (two). However, as we have mentioned, finding CPS platformsthat are publicly available and security critical is a challenge. We have at-tempted to mitigate this threat by choosing two fairly diverse platforms,with various time constraints, and di↵erent IDS optimization goals. Weacknowledge that these platforms exhibit somewhat simple behaviors - how-ever, many CPSes fall into this category [15].An internal threat to validity is in our evaluation of the ecacy of theinvariants for attack detection through fault injection experiments. Whilenot necessarily representative of all security attacks, fault injection allows usto emulate the behavior of potential attackers without biasing the evaluationtowards known vulnerabilities (at the time of the evaluation). We haveattempted to mitigate this threat by using mutation operators that wereused for emulating attacks in prior work [5].Another external threat to validity is that the IDSes based on specifi-cation mining techniques are vulnerable to malicious traces during train-ing/test phases, and ARTINALI is not an exception. We have attemptedto reduce the threat in the training phase by inferring the invariants o✏ine.During the testing phase, our IDS and CPS are executed on separate pro-cesses, and hence the intrusion detector module of IDS is isolated from theattacks that may compromise the CPS itself. However, protecting the trac-587.2. Generalizability of ARTINALIing module of IDS (that is located on CPS) against attacks is a challengingproblem. We assumed that the IDS is the root of trust for now, but inthe future, we plan to deploy trusted computing base strategies for the IDS(e.g., secure enclaves), so that the attacker cannot attack the IDS directly.Finally, a construct threat to validity is the evaluation metrics used formeasuring ecacy. FP and FN ratios have however been used in a lot ofprior work on intrusion detection, as have F-scores, and hence we do notbelieve this is a significant threat. Another potential construct threat isthe choice of tools we use for comparing with ARTINALI, but we mitigatedthis to an extent by first systematically classifying the space of invariantdetection techniques, and then choosing the tools in each category.7.2 Generalizability of ARTINALIARTINALI relies upon two features, namely event-based semantics, and con-ditional independence of time and data (Section ). Events are operationsthat involve interaction with the outside world. Event-based semantics im-plies that every event takes place in a unique time frame, and hence, thereis no concurrency among event executions. Secondly, ARTINALI assumesan event occurs at a specific time interval, and subsequently, data variablesare assigned to specific values. Thus, the time and data corresponding to aparticular event, are conditionally independent regardless of the dependencyamong events. These two features are a common paradigm for CPSes, andhence ARTINALI can be generalized to other CPS platforms such as pace-makers and unmanned aerial vehicles.However, ARTINALI is not applicable to non-CPS platforms for tworeasons. First, the non-concurrency of events does not hold in non-CPSplatforms such as mobile phones. Secondly, CPS events have limited func-tionality, and hence inferring invariants for each event is straightforward.Unlike CPSes, in general realtime systems, tasks can be of unbounded com-plexity. Furthermore, general purpose computers with full preemptive (i.e.,non-realtime) operating systems have a large space of potential behaviors,which makes it challenging to learn invariants for them.59Chapter 8Conclusions and FutureWork8.1 SummaryCyber-physical systems (CPSes) are becoming increasingly subject to secu-rity attacks due to their interconnectedness and relative lack of protection.In this thesis, we attempt to use dynamic invariant detection techniques tobuild intrusion detection systems for CPSes. Our key insight is that timeis a first class constraint in CPS systems, and hence we incorporate timeinto the invariants, in addition to data and events. We devise an ecientalgorithm for learning invariants over the three dimensions of data, eventsand time, and implement it in a tool called ARTINALI. We demonstrate theuse of ARTINALI on two CPS platforms for intrusion detection. We findthat ARTINALI has significantly lower false negatives and false positivesthan other dynamic invariant detection tools, while incurring comparableperformance and memory overheads.The most important aspect of this work is providing insights regardingovercoming CPS constraints such as real-time constraints and resource con-straints, for security developers. Many security solutions rely on a modelfor the correct behavior of the system they are monitoring. We hypothe-sized that a more comprehensive model leads to a higher coverage againstsecurity attacks. Our results support this hypothesis for ARTINALI, whichincorporates real-time constraints along three dimensions into the model,and hence is able to detect a significantly larger amount of attacks.On the other hand, the complete system model may be large, and check-608.2. Future working it requires more resources. Unlike the other specification mining tech-niques (such as Daikon that infers the relationship among all data variablesat the entry and exit point of all method calls in the program), ARTI-NALI only captures system calls (which are located on the attack surface)as events to generate the CPS model for attack detection, and hence doesnot model those parts of the code that are not security-critical. Our resultshows that even a selective model, that requires significantly less resourcesthan the complete model, may provide reasonable security coverage. Inother words, approximate security solutions are e↵ective for CPSes and pro-vide high enough coverage while requiring significantly less resources. Suchtechniques may be integrated with CPS software and hence, make it easierto build security solutions that meet the CPS requirements.In addition, ARTINALI works at event granularity to mine invariants.In spite the fact that ARTINALI captures all system calls as events forsecurity purposes, in ARTINALI, events are user-defined. This flexibilityenables users to optionally customize the level of granularity by choosinganother type of events, or prune the space of events by specifying only theimportant ones based on the system’s requirements.Moreover, ARTINALI is a specification mining technique that has beenprimarily developed for CPS security space. However, as it generates amulti-dimensional model including many rich properties of correct real-timebehavior of a program, it is applicable in a broader context other than se-curity. More particularly, it can be used for software reliability purposes, ina wide assortment of tasks, such as non-malicious bug detection, softwaretesting, data structure repair, and debugging of applications. Furthermore,ARTINALI-generated invariants can be useful for supporting program evo-lution and comprehension as well.8.2 Future workThere are three potential directions in which this thesis can be extended inthe future.618.2. Future work8.2.1 Extending the Generalizability of ARTINALIIn this thesis, we focused on two families of CPS platforms including smartmeters and smart artificial pancreases as case studies. Although we expectthe same technique to apply to similar CPS platforms, it would be interestingto examine the generalizability of ARTINALI to more complex CPS plat-forms such as unmanned aerial vehicles (i.e., drones). We define complexityas the total number of source lines of code and having more diverse function-alities (i.e., multiple operational modes). Prior work [23, 47] has shown thereare significant challenges in modeling the behavior of drones. For instance,drones operate in various flight modes with di↵erent level of autonomy, andin non-autonomous modes, they show more uncertainty not only in the tim-ing behavior but also in the functional behavior. Hence, generating systemmodel for non-autonomous modes in these systems is a challenge. Moreover,drones behave di↵erently in various flight states (including landing, takingo↵, hovering, etc). Therefore, designing an CPS model for the entire systemthat reflects the properties of each state precisely is challenging.8.2.2 Optimizing ARTINALI for the ScalabilityIn this thesis, we found that ARTINALI-based IDSes impose acceptableoverheads on our CPS platforms. However, the resource constraints of aCPS platform with larger code size may degrade the scalability of our tech-nique as the resource utilization of IDS increases. According to our ex-perimental results, a large fraction of IDS overhead is due to the tracingmodule collecting online information about the system. Prior work [52] hasfound there is a tradeo↵ between attack coverage and the amount of datacollected during tracing. Hence, a potential future direction is to optimallydeploy the tracing with respect to IDS goals including coverage and scala-bility, and CPS constraints including memory usage, performance overheadand computational power overhead.628.2. Future work8.2.3 Extending ARTINALI for Attack DiagnosisIn this thesis, we proposed a multi-dimensional invariant mining techniquefor attack detection in CPSes. However, once an attack is detected, it needsto be mitigated, and attack diagnosis bridges the gap between attack detec-tion and mitigation. Attack diagnosis is a promising approach to carryingout a propagation analysis from source of attacks (cyber vulnerabilities)to the corresponding e↵ects on the physical system [56]. Such analysis isuseful to rank the scope and severity of the cyber attacks and thereby, pri-oritize among various mitigation mechanisms to be selected with respect toCPS constraints. A potential future direction is to incorporate the dynamicinvariants generated by ARTINALI for attack diagnosis and mitigation.63Bibliography[1] Smart energy groups home page.,2011.[2] Perfume User Manual., 2014.[3] Texada User Manual.,2016.[4] The Daikon Invariant Detector User Manual., 2017.[5] Maryam Raiyat Aliabadi and Karthik Pattabiraman. Fidl: A faultinjection description language for compiler-based sfi tools. In Interna-tional Conference on Computer Safety, Reliability, and Security, pages12–23. Springer, 2016.[6] Leonardo Aniello, Claudio Ciccotelli, Marcello Cinque, Flavio Frattini,Leonardo Querzoni, and Stefano Russo. Automatic invariant selectionfor online anomaly detection. In International Conference on ComputerSafety, Reliability, and Security, pages 172–183. Springer, 2016.[7] Robin Berthier, William H Sanders, and Himanshu Khurana. Intru-sion detection for advanced metering infrastructures: Requirements andarchitectural directions. In Smart Grid Communications (SmartGrid-Comm), 2010 First IEEE International Conference on, pages 350–355.IEEE, 2010.[8] Alvaro Cardenas, Saurabh Amin, Bruno Sinopoli, Annarita Giani,Adrian Perrig, and Shankar Sastry. Challenges for securing cyber physi-64Bibliographycal systems. InWorkshop on future directions in cyber-physical systemssecurity, page 5, 2009.[9] Alvaro A Cardenas, Saurabh Amin, and Shankar Sastry. Secure control:Towards survivable cyber-physical systems. In Distributed ComputingSystems Workshops, 2008. ICDCS’08. 28th International Conferenceon, pages 495–500. IEEE, 2008.[10] Stephen Checkoway, Damon McCoy, Brian Kantor, Danny Anderson,Hovav Shacham, Stefan Savage, Karl Koscher, Alexei Czeskis, FranziskaRoesner, Tadayoshi Kohno, et al. Comprehensive experimental analysesof automotive attack surfaces. In USENIX Security Symposium. SanFrancisco, 2011.[11] E Chien, N Falliere, and LO Murchu. W32. stuxnet dossier. symantecsecurity response, 2010.[12] Mihai Christodorescu, Somesh Jha, and Christopher Kruegel. Min-ing specifications of malicious behavior. In Proceedings of the the 6thjoint meeting of the European software engineering conference and theACM SIGSOFT symposium on The foundations of software engineer-ing, pages 5–14. ACM, 2007.[13] Christoph Csallner, Nikolai Tillmann, and Yannis Smaragdakis. Dysy:Dynamic symbolic execution for invariant inference. In Proceedings ofthe 30th international conference on Software engineering, pages 281–290. ACM, 2008.[14] Barthe´le´my Dagenais and Martin P Robillard. Creating and evolvingdeveloper documentation: understanding the decisions of open sourcecontributors. In Proceedings of the eighteenth ACM SIGSOFT interna-tional symposium on Foundations of software engineering, pages 127–136. ACM, 2010.[15] Patricia Derler, Edward A Lee, and Alberto Sangiovanni Vincentelli.Modeling cyber–physical systems. Proceedings of the IEEE, 100(1):13–28, 2012.65Bibliography[16] John Eidson, Edward A Lee, Slobodan Matic, Sanjit A Seshia, and JiaZou. A time-centric model for cyber-physical applications. InWorkshopon Model Based Architecting and Construction of Embedded Systems(ACES-MB), pages 21–35, 2010.[17] Michael D Ernst, Jake Cockrell, William G Griswold, and DavidNotkin. Dynamically discovering likely program invariants to sup-port program evolution. IEEE Transactions on Software Engineering,27(2):99–123, 2001.[18] Earlence Fernandes, Jaeyeon Jung, and Atul Prakash. Security analysisof emerging smart home applications. In Security and Privacy (SP),2016 IEEE Symposium on, pages 636–654. IEEE, 2016.[19] Mark Gabel and Zhendong Su. Symbolic mining of temporal specifica-tions. In Proceedings of the 30th international conference on Softwareengineering, pages 51–60. ACM, 2008.[20] Pedro Garcia-Teodoro, J Diaz-Verdejo, Gabriel Macia´-Ferna´ndez, andEnrique Va´zquez. Anomaly-based network intrusion detection: Tech-niques, systems and challenges. computers & security, 28(1):18–28,2009.[21] Dan Goodin. US spy drone hijacked with GPS spoof hack, reportsays., 2011. [Online; accessed 15-Dec-2011].[22] Go¨sta Grahne and Jianfei Zhu. Fast algorithms for frequent itemsetmining using fp-trees. IEEE transactions on knowledge and data engi-neering, 17(10):1347–1362, 2005.[23] Hengle Jiang, Sebastian Elbaum, and Carrick Detweiler. Reducing fail-ure rates of robotic systems though inferred invariants monitoring. InIntelligent Robots and Systems (IROS), 2013 IEEE/RSJ InternationalConference on, pages 1899–1906. IEEE, 2013.66Bibliography[24] Sami Kamkar. SkyJack., 2013. [Online;accessed 19-Dec-2013].[25] Hermann Kopetz and Gu¨nther Bauer. The time-triggered architecture.Proceedings of the IEEE, 91(1):112–126, 2003.[26] Karl Koscher, Alexei Czeskis, Franziska Roesner, Shwetak Patel, Ta-dayoshi Kohno, Stephen Checkoway, Damon McCoy, Brian Kantor,Danny Anderson, Hovav Shacham, et al. Experimental security analy-sis of a modern automobile. In 2010 IEEE Symposium on Security andPrivacy, pages 447–462. IEEE, 2010.[27] Ted Kremenek, Paul Twohey, Godmar Back, Andrew Ng, and DawsonEngler. From uncertainty to belief: Inferring the specification within.In Proceedings of the 7th symposium on Operating systems design andimplementation, pages 161–176. USENIX Association, 2006.[28] Cheolhyeon Kwon, Weiyi Liu, and Inseok Hwang. Security analysis forcyber-physical systems against stealthy deception attacks. In AmericanControl Conference (ACC), 2013, pages 3344–3349. IEEE, 2013.[29] Neal Leavitt. Researchers fight to keep implanted medical devices safefrom hackers. Computer, 43(8):11–14, 2010.[30] Caroline Lemieux. Mining temporal properties of data invariants. In2015 IEEE/ACM 37th IEEE International Conference on Software En-gineering, volume 2, pages 751–753. IEEE, 2015.[31] Caroline Lemieux, Dennis Park, and Ivan Beschastnikh. General ltlspecification mining (t). In Automated Software Engineering (ASE),2015 30th IEEE/ACM International Conference on, pages 81–92.IEEE, 2015.[32] Dana Lewis. Introducing the# openaps project. 2015.[33] Chunxiao Li, Anand Raghunathan, and Niraj K Jha. Hijacking aninsulin pump: Security attacks and defenses for a diabetes therapy67Bibliographysystem. In e-Health Networking Applications and Services (Healthcom),2011 13th IEEE International Conference on, pages 150–156. IEEE,2011.[34] Hui Lin, Homa Alemzadeh, Daniel Chen, Zbigniew Kalbarczyk, andRavishankar K Iyer. Safety-critical cyber-physical attacks: Analysis,detection, and mitigation. In Proceedings of the Symposium and Boot-camp on the Science of Security, pages 82–89. ACM, 2016.[35] Davide Lorenzoli, Leonardo Mariani, and Mauro Pezze`. Automaticgeneration of software behavioral models. In Proceedings of the 30thinternational conference on Software engineering, pages 501–510. ACM,2008.[36] Stephen McLaughlin, Dmitry Podkuiko, Sergei Miadzvezhanka, AdamDelozier, and Patrick McDaniel. Multi-vendor penetration testing inthe advanced metering infrastructure. In Proceedings of the 26th An-nual Computer Security Applications Conference, pages 107–116. ACM,2010.[37] Robert Mitchell and Ing-Ray Chen. A survey of intrusion detec-tion techniques for cyber-physical systems. ACM Computing Surveys(CSUR), 46(4):55, 2014.[38] Robert Mitchell and Ing-Ray Chen. A survey of intrusion detec-tion techniques for cyber-physical systems. ACM Computing Surveys(CSUR), 46(4):55, 2014.[39] Gail C Murphy, David Notkin, and Kevin Sullivan. Software reflexionmodels: Bridging the gap between source and high-level models. ACMSIGSOFT Software Engineering Notes, 20(4):18–28, 1995.[40] Tony Ohmann, Michael Herzberg, Sebastian Fiss, Armand Halbert,Marc Palyart, Ivan Beschastnikh, and Yuriy Brun. Behavioral resource-aware model inference. In Proceedings of the 29th ACM/IEEE inter-national conference on Automated software engineering, pages 19–30.ACM, 2014.68Bibliography[41] Jerome Radcli↵e. Hacking medical devices for fun and insulin: Breakingthe human scada system. In Black Hat Conference presentation slides,volume 2011, 2011.[42] Orna Raz, Philip Koopman, and Mary Shaw. Semantic anomaly detec-tion in online data sources. In Software Engineering, 2002. ICSE 2002.Proceedings of the 24rd International Conference on, pages 302–312.IEEE, 2002.[43] Fred Samland, Jana Fruth, Mario Hildebrandt, Tobias Hoppe, and JanaDittmann. Ar. drone: security threat analysis and exemplary attack totrack persons. In Proceedings of the SPIE, volume 8301, 2012.[44] Florian Skopik, Zhendong Ma, Thomas Bleier, and Helmut Gru¨neis.A survey on threats and vulnerabilities in smart metering infrastruc-tures. International Journal of Smart Grid and Clean Energy, 1(1):22–28, 2012.[45] Sean W Smith. Security and privacy challenges in the smart grid. 2009.[46] Marina Sokolova, Nathalie Japkowicz, and Stan Szpakowicz. Beyondaccuracy, f-score and roc: a family of discriminant measures for per-formance evaluation. In Australasian Joint Conference on ArtificialIntelligence, pages 1015–1021. Springer, 2006.[47] Yunmok Son, Hocheol Shin, Dongkwan Kim, Young-Seok Park, Juh-wan Noh, Kibum Choi, Jungwoo Choi, Yongdae Kim, et al. Rockingdrones with intentional sound noise on gyroscopic sensors. In USENIXSecurity, pages 881–896, 2015.[48] Rock Stevens, Octavian Suciu, Andrew Ruef, Sanghyun Hong, MichaelHicks, and Tudor Dumitras¸. Summoning demons: The pursuit of ex-ploitable bugs in machine learning. arXiv preprint arXiv:1701.04739,2017.[49] Farid Molazem Tabrizi and Karthik Pattabiraman. Flexible intru-sion detection systems for memory-constrained embedded systems. In69BibliographyDependable Computing Conference (EDCC), 2015 Eleventh European,pages 1–12. IEEE, 2015.[50] Farid Molazem Tabrizi and Karthik Pattabiraman. Formal securityanalysis of smart embedded systems. In Proceedings of the 32nd An-nual Conference on Computer Security Applications, pages 1–15. ACM,2016.[51] Carolyn Talcott. Cyber-physical systems and events. In Software-Intensive Systems and New Computing Paradigms, pages 101–115.Springer, 2008.[52] Uttam Thakore, Gabriel A Weaver, and William H Sanders. A quan-titative methodology for security monitor deployment. In DependableSystems and Networks (DSN), 2016 46th Annual IEEE/IFIP Interna-tional Conference on, pages 1–12. IEEE, 2016.[53] John Villasenor. Next Homeland Security threat: cyber-physical at-tacks by drone strikes?,2011. [Online; accessed 19-July-2011].[54] Joachim Wegener and Matthias Grochtmann. Verifying timing con-straints of real-time systems by means of evolutionary testing. Real-Time Systems, 15(3):275–298, 1998.[55] Westley Weimer and George C Necula. Mining temporal specificationsfor error detection. In International Conference on Tools and Algo-rithms for the Construction and Analysis of Systems, pages 461–476.Springer, 2005.[56] Peng Xie, Jason H Li, Xinming Ou, Peng Liu, and Renato Levy. Usingbayesian networks for cyber security analysis. In Dependable Systemsand Networks (DSN), 2010 IEEE/IFIP international conference on,pages 211–220. IEEE, 2010.70Bibliography[57] Ye Yan, Yi Qian, Hamid Sharif, and David Tipper. A survey on cy-ber security for smart grid communications. IEEE CommunicationsSurveys & Tutorials, 14(4):998–1010, 2012.[58] Jinlin Yang, David Evans, Deepali Bhardwaj, Thirumalesh Bhat, andManuvir Das. Perracotta: mining temporal api rules from imperfecttraces. In Proceedings of the 28th international conference on Softwareengineering, pages 282–291. ACM, 2006.71


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items