CORGIDS : a correlation-based generic intrusion detection system Aggarwal, Ekta 2019

CORGIDS: A Correlation-based Generic Intrusion Detection System

by

Ekta Aggarwal

B.E. - M.B.A., Panjab University, 2013

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Applied Science

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Electrical and Computer Engineering)

The University of British Columbia
(Vancouver)

April 2019

© Ekta Aggarwal, 2019

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled:

CORGIDS: A Correlation-based Generic Intrusion Detection System

submitted by Ekta Aggarwal in partial fulfillment of the requirements for the degree of Master of Applied Science in Electrical and Computer Engineering.

Examining Committee:

André Ivanov, Electrical and Computer Engineering
Supervisor

Karthik Pattabiraman, Electrical and Computer Engineering
Co-supervisor and Co-reader

Sathish Gopalakrishnan, Electrical and Computer Engineering
Head's nominee and Chair

Abstract

Cyber-Physical Systems (CPS) consist of software and physical components which collaborate and interact with each other continuously. CPS deployed in security-critical scenarios such as medical devices, autonomous cars and smart homes have been targets of security attacks due to their safety-critical nature and relative lack of protection. Anomaly-based Intrusion Detection Systems (IDS) using data, temporal, and logical correlations have been proposed in the past. However, none of these approaches, except the ones using logical correlations, take into account the main ingredient in the operation of CPS, namely the use of physical properties. On the other hand, IDS that use physical properties either require the developer to define invariants manually, or are designed for a specific CPS.
This study proposes a Correlation-based Generic Intrusion Detection System (CORGIDS), a generic IDS capable of detecting security attacks by inferring the logical correlations of the physical properties of a CPS, and checking if they adhere to the predefined framework. A CORGIDS-based prototype is built and used for detecting attacks on two example CPS: an Unmanned Aerial Vehicle (UAV) and a Smart Artificial Pancreas (SAP). It is found that CORGIDS achieves a precision of 95.70% and a recall of 87.90%, while detecting attacks with modest memory and performance overheads.

Lay Summary

Cyber-Physical Systems (CPS) are composed of software and physical components which are deeply intertwined. CPS are being deployed in security-critical scenarios such as medical devices and autonomous cars, and have therefore been targets of security attacks due to their safety-critical nature and relative lack of protection. It is essential to protect these systems, as they form an indispensable part of our lives. Various Intrusion Detection Systems (IDS) have been proposed in the past; however, none of the approaches, except the ones using logical correlations, take into account the physical properties of CPS. This thesis proposes the Correlation-based Generic Intrusion Detection System (CORGIDS), a generic IDS capable of detecting security attacks by inferring the logical correlations of the physical properties of a CPS, and checking if they adhere to the predefined framework. A CORGIDS-based prototype was built and successfully used for detecting attacks on two example CPS.

Preface

This thesis is the result of work carried out by myself, in collaboration with Dr. Karthik Pattabiraman, Dr. André Ivanov and Mehdi Karimibiuki. A part of this work was published in the ACM Workshop on Cyber-Physical Systems Security & Privacy (CPS-SPC) 2018. I was responsible for carrying out the research work, conducting and evaluating the experiments, and writing the papers.
My supervisors, Dr. Karthik Pattabiraman and Dr. André Ivanov, guided and supported me throughout my research work. They gave me feedback during phases such as motivation, design, and evaluation, and helped improve the quality of the research papers that we published. Mehdi Karimibiuki helped me brainstorm throughout this research and provided helpful feedback that helped me shape this work.

CORGIDS: A Correlation-based Generic Intrusion Detection System. Ekta Aggarwal, Mehdi Karimibiuki, Karthik Pattabiraman and André Ivanov. In the Proceedings of the ACM Workshop on Cyber-Physical Systems Security & Privacy (CPS-SPC) 2018.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Acronyms
Acknowledgments
1 Introduction
  1.1 Motivation
  1.2 Threat Model
  1.3 Approach
  1.4 Hidden Markov Models
  1.5 Contributions
  1.6 Publications
2 Related Work
  2.1 Summary
3 Approach
  3.1 Work-flow of CORGIDS
  3.2 A motivating example
  3.3 Summary
4 Experimental Setup
  4.1 Test-beds
  4.2 Experimental Procedure
5 Attacks description and detection
  5.1 Attacks on UAV
  5.2 Detection of attacks on UAV
  5.3 Attacks on SAP
  5.4 Detection of attacks on SAP
  5.5 Summary
6 Evaluation of CORGIDS
  6.1 Sensitivity Analysis
  6.2 Evaluation Criteria
  6.3 Experiment results
    6.3.1 Precision
    6.3.2 Recall
    6.3.3 Memory overhead
    6.3.4 Performance overhead
  6.4 Additional experiments
    6.4.1 Additional results for UAV platform
    6.4.2 Additional results for SAP platform
  6.5 Summary
7 Comparison with related work
  7.1 Experimental setup
  7.2 Comparison on UAV Platform
    7.2.1 Targeted attacks
    7.2.2 Arbitrary attacks
  7.3 Comparison on SAP Platform
    7.3.1 Targeted attacks
    7.3.2 Arbitrary attacks
  7.4 Summary
8 Discussion
  8.1 Threats to validity
  8.2 A Generic IDS
  8.3 Circumventing CORGIDS
9 Conclusion and Future work
  9.1 Summary
  9.2 Future work
    9.2.1 Implementation of CORGIDS on a real test-bed
    9.2.2 Testing efficacy of CORGIDS on other attacks
    9.2.3 Identifying the malicious property using CORGIDS
    9.2.4 Coupling of an automated mitigation technique with CORGIDS
Bibliography

List of Tables

Table 3.1 Slice of a non-faulty system trace obtained while flying an UAV on a random route
Table 3.2 Slice of a faulty system trace obtained while an UAV was flying on a random route and infected by distance spoofing attack
Table 6.1 False Positive and False Negative obtained for CORGIDS on the two test-beds
Table 6.2 Comparison of Precision and Recall for OpenAPS platform
Table 7.1 Results of intrusion detection by ARTINALI for Targeted attacks on UAV platform
Table 7.2 Results of intrusion detection by CORGIDS for Targeted attacks on UAV platform
Table 7.3 Breakdown of arbitrary attacks for UAV platform
Table 7.4 Results of intrusion detection by ARTINALI for arbitrary attacks on UAV platform
Table 7.5 Results of intrusion detection by CORGIDS for arbitrary attacks on UAV platform
Table 7.6 Results of intrusion detection by ARTINALI for targeted attack on SAP platform
Table 7.7 Results of intrusion detection by CORGIDS for targeted attack on SAP platform
Table 7.8 Arbitrary attacks on SAP platform and its response
Table 7.9 Results of intrusion detection by ARTINALI for Arbitrary attacks on SAP platform
Table 7.10 Results of intrusion detection by CORGIDS for Arbitrary attacks on SAP platform

List of Figures

Figure 3.1 Workflow of CORGIDS
Figure 3.2 Approach of CORGIDS
Figure 4.1 Drones
Figure 4.2 Components of an OpenAPS platform
Figure 5.1 Attack tree for UAV
Figure 5.2 Attack tree for SAP
Figure 6.1 Sensitivity Analysis: Independent variables are δ and λ. Dependent variable is w. The vertical axes in all figures are the values of precision and recall calculated after averaging 5 fold cross validation of test system traces.
Figure 6.2 Sensitivity Analysis: Independent variables are w and λ. Dependent variable is δ.
The vertical axes in all figures are the values of precision and recall calculated after averaging 5 fold cross validation of test system traces.
Figure 6.3 Sensitivity Analysis: Independent variables are w and δ. Dependent variable is λ. The vertical axes in all figures are the values of precision and recall calculated after averaging 5 fold cross validation of test system traces.
Figure 6.4 Result by varying the training threshold for the UAV platform
Figure 6.5 Result by varying the number of training traces for the UAV platform
Figure 6.6 Result by varying the training threshold for the SAP platform
Figure 6.7 Result by varying the number of training traces for the SAP platform

Acronyms

CGM      Continuous Glucose Monitor
CORGIDS  Correlation-based Generic Intrusion Detection System
CPS      Cyber-Physical Systems
DOS      Denial-of-Service
FN       False negative ratio
FP       False positive ratio
GCS      Ground Control Station
HMM      Hidden Markov Model
IDS      Intrusion Detection System
MSPC     Multivariate Statistical Process Control
OPENAPS  Open Artificial Pancreas System
PCC      Pearson Correlation Coefficient
SAP      Smart Artificial Pancreas
SDC      Silent Data Corruption
SITL     Software in the Loop
SUT      System Under Test
SVM      Support Vector Machine
TE       Tennessee Eastman
UAV      Unmanned Aerial Vehicle

Acknowledgments

I would like to express my deepest appreciation to my supervisors, Prof. André Ivanov and Prof. Karthik Pattabiraman, for their invaluable guidance that helped me to shape my research, and for their persistent encouragement and patience that enabled me to complete my master's program.

I acknowledge our research group members, especially Mehdi Karimibiuki, who helped me tremendously via his feedback and comments.
I am also grateful to all my passionate lab mates at the System On Chip and Dependable Systems laboratory.

I thank the Natural Sciences and Engineering Research Council of Canada (NSERC) and Intel for providing the infrastructure and computational resources for conducting this project.

Finally, I would like to express my gratitude towards my family, without whom I would not have been able to complete this program. I thank my husband, Anuj Issar, for his whole-hearted and unconditional support, sacrifice, guidance, and dedication. I thank my parents, the Aggarwals and Sharmas, for sticking with me through thick and thin and constantly encouraging me. Finally, I thank my brother, Akshit Aggarwal, who through his constant nudging helped me to stay calm and think with a clear head. I am delighted to have all of you by my side.

Chapter 1
Introduction

1.1 Motivation

Cyber-Physical Systems (CPS) are embedded systems consisting of software and physical components which collaborate and are tightly coupled to the environment in which they operate. CPS operate in a closed-loop fashion: the sensors sense the current environmental conditions, these readings are passed as input to the controller, and the controller, based on its control logic, sends actuation commands to the actuators in the CPS. After the action has been taken, the sensors again take readings, and this loop continues for as long as the system is functioning. The rapid growth of CPS has led to an abrupt increase in the usage of these devices in our day-to-day life. CPS such as smart meters are used in smart grids [1, 2] for recording and digitally sending the meter readings to the energy supplier for more reliability and ease of data collection. Autonomous cars [3, 4] are also gaining popularity as major car manufacturers add the option of driving the car in an autonomous mode, with little or no input from humans.
Unmanned Aerial Vehicles (UAV) [5, 6], equipped with cameras, global positioning systems and other sensors, are now being deployed in areas of operation ranging from recreation, military use and farming to package delivery and disaster relief. Other areas of application for CPS are traffic control, HVAC (heating, ventilation and air conditioning), water management systems and smart medical devices.

These CPS are different from typical computer systems because they operate in a physical environment and their properties must conform to the laws of physics. Also, CPS are built to solve a specific problem and are not multi-purpose like typical computer systems. Other features which set CPS apart from traditional computer systems are:

• Difficult to update: Manufacturers of CPS cannot easily update or replace the hardware or the software present in the CPS. An extreme example occurs in a commercial aircraft, which is a safety-critical system with extensive and expensive certification requirements. Software patches or feature updates on systems like these may take a heavy toll on their day-to-day operation and are therefore conducted with caution. Another example is a smart medical device such as a pacemaker, which is used for maintaining the heartbeat of a patient. Updating the software on such devices requires the patient to physically visit the healthcare provider, as it cannot be done via the Internet. Therefore, it might not always be possible to patch a security vulnerability through a software update.

• Real-time constraints: CPS interact with the environment continuously while they are operating. They need to perform actions in real time based on the input gathered from the sensors. For instance, autonomous cars, based on their surroundings and the path involved in reaching the destination, make decisions such as turning the wheels, applying brakes or accelerating.
Pacemakers maintain the heartbeat rate by delivering electrical pulses at millisecond timescales. Also, surgical robots have to operate with high accuracy, as they need to control the timing and area of operation on a patient. All the CPS described above have real-time constraints; therefore, any hindrance to their continuous operation, or any change in their correctness and timing behavior, can have irreversible consequences.

• Zero-day attacks: With the deployment of modern CPS around the world, the security vulnerabilities inherent in them are not fully known. Therefore, the security systems required by these CPS need to be able to detect unknown or zero-day attacks, rather than relying on a database of known attacks.

• Resource constraints: CPS are built with a particular operation in mind; therefore, they are often limited in computing resources such as memory, computational power, battery and CPU. The security solution devised for these devices needs to respect these constraints and operate within their limits. However, even with the resource constraints, the behavior modeling capability of the Intrusion Detection System (IDS) should not be sacrificed, as this would adversely affect the CPS for which the security solution is being devised.

• Large-scale deployments: CPS such as smart meters are deployed at large scale. Therefore, even a small False positive ratio (FP), where the IDS marks a benign execution of the CPS as malicious, will lead to a large amount of manual effort in examining the falsely reported attacks. Also, if an examination requires shutting down the CPS after an attack is detected, an IDS with a high FP may not be practical. Therefore, the IDS should keep FP as low as possible.

To sum up, the security mechanism to be developed should be mindful of the constraints of CPS mentioned above.
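
To make the concern about large-scale deployments concrete, the following sketch estimates how many false alarms an operator would have to examine per day. The fleet size, the number of IDS checks per device, and the FP values are hypothetical numbers chosen only for illustration; none of them are taken from this thesis.

```python
def expected_false_alarms(num_devices, checks_per_device_per_day, fp_ratio):
    """Expected number of benign executions flagged as malicious per day.

    All arguments are illustrative assumptions, not measurements.
    """
    if not 0.0 <= fp_ratio <= 1.0:
        raise ValueError("fp_ratio must lie in [0, 1]")
    return num_devices * checks_per_device_per_day * fp_ratio


if __name__ == "__main__":
    # Hypothetical fleet: 100,000 smart meters, each checked 24 times a day.
    for fp in (0.05, 0.01, 0.001):
        alarms = expected_false_alarms(100_000, 24, fp)
        print(f"FP = {fp}: roughly {alarms:,.0f} false alarms per day")
```

Under these assumed numbers, even an FP of one in a thousand still produces thousands of alarms per day, which is why a low FP matters more as the deployment grows.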
Lately, CPS have been targets of security attacks due to their safety-critical nature and relative lack of protection. The advent of Internet-connected CPS (also known as the Internet of Things) has exacerbated their vulnerability, as it obviates the need for attackers to have physical access to the CPS. Researchers have discovered attacks on CPS such as the smart grid [7, 8], in which the attackers manipulate the smart meter readings. Hackers have demonstrated attacks such as taking control of the brakes and steering wheel in smart cars [3, 9]. Even CPS such as smart medical devices [10, 11] have been targeted by hackers due to their relative lack of security mechanisms and their use of wireless communication technology. Therefore, there is a compelling need to protect CPS from security attacks.

1.2 Threat Model

In this thesis, we assume the goal of the attacker is either to alter the benign execution of the CPS or to make the operator of the CPS think that the CPS is acting maliciously. The reason behind the second kind of attack could be that the attacker wants the operator to think that the CPS has become unstable, as a result of which the operator might follow some mitigation steps to stabilize the system. During this stabilization, the operator might put the CPS on standstill until further actions are agreed upon. Therefore, the actions that the operator performs to stabilize the CPS might themselves be the goal of the attacker. The motive behind both attacks could be either monetary gain or the collection of valuable information from the CPS, which could then be used in other kinds of exploitation. For instance, in a UAV, an attacker could spoof the values of distance traveled being sent to the Ground Control Station (GCS). As a result, the operator might issue commands for the UAV to descend and land on the ground until the cause of the deviation is known.
In order to achieve these goals, the attacker can tamper with either the communication channel or the control logic present in the controller. We now discuss the access and capabilities of the attacker.

Access: The term System Under Test (SUT) is used in this study to represent the system on which the analysis is performed. It is assumed that the attacker has the capability to gain read and write access to the communication channel between the SUT and the controller. Using this access, the attacker can modify the contents of the data packets being transferred, or add data packets. This assumption is realistic, as previous work [2, 12] has shown that such access is rather easy to get.

Further, it is assumed that the attacker has access to the control system of the SUT [13], which means that the application code can be modified to suit the attacker's needs. It is also assumed that the attacker cannot modify the operating system kernel or the device firmware. This can be ensured by using code signing [14] or trusted computing hardware, if it is available.

Capabilities: It is assumed that the attacker, using access to the communication channel, can perform two types of attacks. The first is spoofing, where the contents of the data packets can be modified, and the second is flooding, where the number of data packets being sent to the controller can be increased. The attacker can also perform physical attacks on the CPS, for example by rebooting it at arbitrary points in time.

With access to the control system of the SUT, it is assumed that the attacker can change the control logic to introduce the attack in the CPS and thereby accomplish the goal of altering the benign execution of the CPS.
However, as an attacker is likely to want to remain stealthy, the attacker is more likely to make small changes to the program rather than large-scale changes such as replacing the entire program with their own.

For this research, attacks which compromise the confidentiality/privacy of the CPS, such as network attacks (Denial-of-Service (DOS) or message dropping attacks), are not considered, because these attacks can be detected by network security mechanisms. Also, only attacks that change the correlation between the logical properties are considered. Attacks which have no impact on the correlation between logical properties are therefore out of scope.

1.3 Approach

We now discuss the approach that we take towards securing CPS. IDS are used for protecting computer systems, including CPS [15–17], from security attacks. IDS work by monitoring the activity of the system on which they are deployed and raising an alarm when they detect malicious intent. Traditional forms of IDS are signature-based, where signatures of known attacks are compared against the operations of the system to identify attacks. Unfortunately, signature-based IDS are a poor fit for CPS, as the attacks are often tailored to each kind of CPS and hence cannot be described by generic signatures. Further, due to the remote and often disconnected nature of their operation, the attack database in CPS cannot be updated frequently, unlike in traditional computer systems. Finally, a motivated attacker can launch hitherto unknown attacks against a CPS, thereby evading detection by signature-based schemes.

In contrast to signature-based IDS, anomaly-based IDS extract a model of a system's behavior and flag any deviation from the extracted model as an attack. Such IDS do not need an attack database, and can hence detect hitherto unknown attacks.
Because CPS have constrained behaviors, it is often straightforward to derive anomaly-based IDS for them, making these systems a good match for CPS. Unfortunately, anomaly-based systems exhibit high rates of false positives in practice, as learning a stable model of the system is often challenging. Therefore, some researchers have proposed using physics-based models for anomaly-based intrusion detection in CPS [18–22]. The notion is that because CPS interact closely with their physical environments, they need to follow the laws of physics, which can in turn be used as the detection model. Efforts have been made to use the physical properties of the power grid [20, 23], UAV [18] and water treatment systems [24] to build a model which represents the expected behavior of the CPS. However, in prior work [18–24], the IDS is designed specifically for a particular CPS. Therefore, the above solutions cannot be easily generalized to other CPS, as the process of finding an appropriate model is both time-consuming and effort-intensive for developers.

In this work, the logical correlations among the physical properties of the CPS are considered as the model for an anomaly-based IDS. The hypothesis is that physical properties exhibit deterministic and predictable correlations among themselves, as they have to adhere to the laws of physics. For example, consider the case of a UAV, which needs to follow Newton's laws of motion during flight. Some physical properties of a UAV are: distance traveled, altitude, speed, and flight time. When a UAV is flying at a fixed altitude, it has a non-zero speed, due to which the distance traveled and flight time increase, while the battery life left in the UAV decreases. These relationships encompass the logical correlations among the physical properties of the UAV.
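
The UAV relationships described above can be written down as a simple hand-coded check. This is only an illustration of what a logical correlation among physical properties looks like: CORGIDS infers such correlations automatically rather than from hand-written rules, and the field names, units and sample values below are hypothetical.

```python
from typing import List, NamedTuple


class Sample(NamedTuple):
    """One row of a hypothetical UAV system trace."""
    speed: float        # metres per second
    distance: float     # metres travelled so far
    flight_time: float  # seconds since take-off
    battery: float      # percentage of battery life left


def violated_correlations(prev: Sample, curr: Sample) -> List[str]:
    """Return the expected relationships broken by the step prev -> curr."""
    violations = []
    if prev.speed > 0 and curr.speed > 0:  # the UAV is moving during this step
        if curr.distance <= prev.distance:
            violations.append("distance travelled did not increase")
        if curr.flight_time <= prev.flight_time:
            violations.append("flight time did not increase")
        if curr.battery >= prev.battery:
            violations.append("battery life left did not decrease")
    return violations


if __name__ == "__main__":
    before = Sample(speed=5.0, distance=100.0, flight_time=20.0, battery=80.0)
    benign = Sample(speed=5.0, distance=105.0, flight_time=21.0, battery=79.9)
    suspicious = Sample(speed=5.0, distance=105.0, flight_time=21.0, battery=80.0)
    print(violated_correlations(before, benign))
    print(violated_correlations(before, suspicious))
```

Writing such rules by hand requires knowing the physics of each CPS in advance, which is exactly the manual effort a generic IDS aims to avoid.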
If, during flight, the battery life left in the UAV is observed not to decrease while the speed of the UAV is non-zero, this would imply an anomaly in the system, which potentially indicates an attack.

1.4 Hidden Markov Models

In this study, an anomaly-based IDS is built which internally uses a Hidden Markov Model (HMM) to find logical correlations among the physical variables in a system. HMM are useful for systems which can be represented by sequences or time series. An HMM is a finite model that can be used to describe a probability distribution over an infinite number of possible sequences in a given system [25].

Unlike a simple Markov model, an HMM is composed of a number of hidden states. Each hidden state 'emits' symbols according to emission probabilities, and the states are interconnected by state-transition probabilities. Starting from an initial state, a sequence of states is generated by moving from state to state according to the state-transition probabilities until an end state is reached. Each state then emits symbols according to that state's emission probability distribution, creating an observable sequence of symbols. More formally, an HMM can be represented by (π, A, θ), where π represents the starting probabilities of the hidden states, A denotes the transition probability matrix, and θ represents the emission probabilities of the hidden states.

HMM are a good fit for problems in which i) the model parameters and observed data are given, and the sequence of hidden states needs to be estimated; ii) the observed data is given, and the model parameters are to be estimated; and iii) the model parameters and observed data are given, and the likelihood of the data needs to be found. In this study, we use HMM for the third kind of problem, i.e., determining the likelihood that the currently observed data was generated by the predefined model's parameters.
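
The third kind of problem can be sketched with the standard forward algorithm for a discrete HMM. This is a minimal sketch under stated assumptions: the text above does not say which HMM implementation or observation encoding CORGIDS uses, so the function name `log_likelihood`, the two-state toy model, and the two-symbol encoding below are all hypothetical and serve only to show how (π, A, θ) yield a likelihood for an observed sequence.

```python
import math


def log_likelihood(observations, pi, A, theta):
    """Scaled forward algorithm: log P(observations | pi, A, theta).

    pi[i]       -- probability of starting in hidden state i
    A[i][j]     -- probability of moving from hidden state i to state j
    theta[i][k] -- probability that hidden state i emits symbol k
    """
    n = len(pi)
    # Initialisation: start in state i and emit the first symbol.
    alpha = [pi[i] * theta[i][observations[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(observations)):
        if t > 0:
            # Induction: sum over predecessor states, then emit symbol t.
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * theta[j][observations[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the sequence is impossible under the model
        alpha = [a / scale for a in alpha]  # rescale to avoid numeric underflow
        log_prob += math.log(scale)
    return log_prob


if __name__ == "__main__":
    # Hypothetical toy model. Symbol 0: battery drops while the UAV moves;
    # symbol 1: battery does not drop while the UAV moves.
    pi = [0.9, 0.1]
    A = [[0.95, 0.05], [0.10, 0.90]]
    theta = [[0.99, 0.01], [0.50, 0.50]]
    typical = [0] * 10
    unusual = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
    print("typical:", log_likelihood(typical, pi, A, theta))
    print("unusual:", log_likelihood(unusual, pi, A, theta))
```

A sequence that the model explains poorly receives a lower log-likelihood than a typical one, and that gap is the signal an HMM-based detector can act on.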
To compute this likelihood, the values of the correlated physical properties of the system can be fed into an HMM, which can then infer the correlations between them. These correlations can be used to determine the likelihood of the current observed data as stemming from the model and its parameters. Any deviation could be signaled as an anomaly and a possible security attack.

HMM act as the core of the intrusion detection module mainly because they are capable of finding data patterns in high-dimensional, non-linear, time-series-based systems [26]. Also, HMM work by creating hidden states and then transitioning between them, which is very similar to the operation of CPS, which are typically modeled as state machines. HMM-based IDS are evaluated based on likelihood measures [26, 27], which represent the overall observable state of the system. In this thesis, we also use likelihood to govern the entropy of the HMM used for intrusion detection. Unlike techniques such as correlation coefficients, HMM are also highly resilient to noise and outliers. For instance, Krotofil et al. [28] use the Pearson Correlation Coefficient (PCC) to determine correlation for the cluster entropy. PCC measures linear correlation among variables and is therefore not suitable for multidimensional non-linear data. Also, the variables to which PCC is applied must be measured on an interval or ratio scale, making this approach much less generic. Chen et al. [21] employ a Support Vector Machine (SVM) to detect anomalies in a time-series-based system. Unfortunately, SVM do not work well with time series data, because they work with a snapshot of the state and classify it into a class. However, by manipulating the input feature vector to the SVM in such a way that it encapsulates the time factor, the authors use it for anomaly detection. On the other hand, Aliabadi et al. [29] use a Frequent Item Set Mining algorithm, which does not model the system but mines the data under different events.
Un-fortunately, they do not consider other physical properties, except time, of the CPS.Iturbe et al. [30] use Multivariate Statistical Process Control (MSPC) which inferscorrelation among the variables and is better suited for linear correlations, as itworks by generating orthogonal projections. However, for non-linearly correlateddata as in our case, MSPC is not able to find correlations.1.5 ContributionsThis research proposes a generic intrusion detection system capable of detectingsecurity attacks by inferring the logical correlations of the CPS and checking if theyadhere to a predefined framework. HMM are used to automatically infer the logicalcorrelations among the physical properties of the CPS with no a priori knowledgeof the physical laws adhered to by the system or any intervention by the program-mer. The HMM identifies a state as malicious by detecting either an undesired datacorrelation or lack of an expected data correlation among its physical properties.HMM are used as they are good at detecting outliers, and are typically used tomodel time-based systems ( Section 1.4). Though other papers have used logicalcorrelations to detect anomalies [21, 22, 28, 30], none of them have used HMM asthe core to build a generic IDS. To the best of our knowledge, Correlation-basedGeneric Intrusion Detection System (CORGIDS) is the first generic intrusion detec-tion system which uses HMM to infer logical correlations exhibited by the CPS todetermine if an intrusion has occurred. Our contributions are summarized by thefollowing set of actions and outcomes:1. Proposed the use of logical correlations exhibited by the physical propertiesof a CPS to detect intrusions, and the use of Hidden Markov Model (HMM)to infer the logical correlations.2. 
Designed a Correlation-based Generic Intrusion Detection System (CORGIDS) prototype using Hidden Markov Model (HMM) to detect intrusions in CPS. Also, demonstrated its use on two behaviorally different CPS test-beds, namely i) a UAV, and ii) a Smart Artificial Pancreas (SAP).

3. Evaluated the effectiveness of CORGIDS by performing five targeted attacks (Chapter 5) and three arbitrary attacks (Chapter 7) on the above mentioned CPS. Found that CORGIDS is successfully able to detect all five attacks, and has a lower False positive ratio (FP) and False negative ratio (FN) than other intrusion detection techniques (described in Chapter 6).

4. Performed a comprehensive comparison with a technique from the related work which also designs a generic IDS to detect intrusions. Results show that CORGIDS achieves significantly lower FP and FN for both targeted and arbitrary attacks (except the artificial delay insertion attack) when compared to this related work. This comparison and its results are described in Chapter 7.

1.6 Publications

The work in this thesis has been published in the following research paper:

• "CORGIDS: A CORRELATION-BASED GENERIC INTRUSION DETECTION SYSTEM", Ekta Aggarwal, Mehdi Karimibuiki, Karthik Pattabiraman and André Ivanov, Proceedings of the 2018 Workshop on Cyber-Physical Systems Security and Privacy 2018 (Acceptance Rate: 45%)

The remainder of this thesis is structured as follows. Chapter 2 explores related work, and Chapter 3 explains the approach used to build an IDS, CORGIDS, which is used to detect intrusion. Then, Chapter 4 presents the test-beds used for evaluation of CORGIDS. Next, Chapter 5 describes the attacks which are emulated on the experimental test-beds and Chapter 6 reports the results from the attacks seeded in Chapter 5. Chapter 7 describes a quantitative comparison of CORGIDS with its related work.
Chapter 8 discusses the limitations and applicability of CORGIDS, and finally Chapter 9 concludes the thesis and describes future work.

Chapter 2

Related Work

In this chapter, we discuss the prior work which generates invariants. We first classify the related work into categories based on the type of invariants generated and then discuss how the authors use the generated invariants for intrusion detection. Towards the end of this chapter, one class of invariants, physical invariants, is discussed in detail; it is of particular interest to us as we also generate physical invariants for anomaly detection.

We began by reviewing prior techniques for generating invariants and categorized them into six main classes: a) data invariants, which use the values of data variables to generate the model; b) temporal invariants, which use the sequence of events in a given system to create the model; c) hardware invariants, which use the hardware design of the system for which the IDS needs to be designed as input to detect intrusion; d) network invariants, which work at the network level and analyze network activity to determine intrusion; e) cooperation invariants, which are based on the interconnection between various components of the system to build a model; and f) physical invariants, which use the physical properties of the system to create a multi-dimensional model. These invariants are discussed in detail below.

Data Invariants: Significant work [31–35] has been done to determine how to extract data invariants from a system. Ernst et al. [31] built Daikon to dynamically mine data invariants of a system, thus creating pre- and post-conditions which hold at every entry and exit of a method/function. Csallner et al. [32] propose DySy to extract data invariants by dynamically executing test cases and simultaneously performing symbolic execution of the program under study.
In subsequent work, Csallner [35] designed DSD-Crasher to determine a program's intended behavior for automatically generating test cases and finding bugs. Baliga et al. [33, 34] proposed Gibraltar, for inferring and enforcing data invariants to detect rootkits in the operating system's kernel.

Temporal Invariants: Temporal invariants have been used to get a better understanding of a system, uncover bugs and to build IDS. Yang et al. [36] define their model, Perracotta, to take as input a program and dynamically output temporal invariants. Gabel and Su [37] built Javert, which is configured with two basic predefined patterns of temporal invariants. In subsequent work [38] they built a tool, OCD, which is capable of analyzing the trace continuously using a sliding window concept to generate invariants. Beschastnikh et al. [39] generate temporal invariants dynamically through the use of system logs (traces) and programmer-specified regular expressions. Lemieux et al. [40] dynamically generate all the instantiations of the invariants from a log file and the property types supplied in their tool called TEXADA.

Network Invariants: Khurshid et al. [41] focus on maintaining network correctness and security using their tool, VeriFlow. They manually define network invariants like access control policies, absence of routing loops and availability of a path to the destination to monitor the network for any possible intrusion. Through this work, they aim to find faulty rules issued by SDN and prevent them from entering the network.

Hardware Invariants: Hangal et al. [42] build IODINE, a framework which dynamically infers invariants from hardware designs. They consume information called simulation dumps, which are generated by the hardware when the simulations occur. Then IODINE analyses simulation dumps using a series of analyzers to infer hardware invariants from it. They also employ some request-acknowledge patterns to extract invariants.
They demonstrate their experience of using IODINE on the memory controller unit of a microprocessor.

Cooperation Invariants: Waksman et al. [43] stress on building a trusted microprocessor from untrusted parts. They manually outline cooperation invariants which govern the intercommunication between several components of a microprocessor. They base their approach on the observation that execution of an instruction in a microprocessor consists of several coupled events. Also, they assume, based on real world operation, that not all cooperating units are lying at the same time. They develop TRUSTNET and DATAWATCH to maintain security, privacy and integrity of computer systems from malicious attacks.

Physical Invariants: Systems such as CPS operate in the physical environment and conform to laws of physics. Therefore, they consist of physical properties which can be used to analyze the behavior of the system. Approaches from the prior work which use physical properties of the CPS to generate invariants can be classified into those that manually define physical invariants of the CPS, and those that generate the invariants automatically from the CPS behavior. The second approach is more useful than the first one, as it reduces the developer effort and time.

• Manually defined physical invariants: Mitchell and Chen [18] aim to secure a UAV by specifying the physical invariants for each sensor and actuator embedded inside the system. In subsequent work, they designed an adaptive specification based IDS [19] called BRUIDS, which could be adapted based on the attacker type and environment changes. Similarly, Choudhari et al. [20] manually describe scheduling invariants and physical invariants in the form of Lyapunov functions. Combining these invariants they produce cooperating invariants which specify and maintain the stability of the system. In another work, Paul et al. [23] represent a CPS with one system invariant which encapsulates all of its subsystems.
Adepu and Mathur [24] design an IDS for a water treatment plant by manually describing the invariants for a particular sensor in terms of the water level changes between two consecutive readings.

• Automatically generated physical invariants: Chen et al. [21] dynamically generate the physical invariant, which is an SVM model. This SVM model is then used to classify an activity as benign or malicious for a real-world water purification plant. However, as they use statistical model checking, they only provide probabilistic guarantees that the system is correct, leaving room for False positive ratio (FP) and False negative ratio (FN). Zohrevand et al. [22] dynamically generated the physical invariant, which was a hidden semi-Markov model for a water supply system. Though they based their approach on data collected from a real water supply system, their model was specialized for a specific CPS. In recent work, Aliabadi et al. [29] designed ARTINALI, which dynamically mined data, time and event invariants from a set of execution traces of the CPS. They use only one physical property of the CPS, time, when generating the three types of invariants - D|E, E|T and D|T - used for intrusion detection. Krotofil et al. [28] used correlations to identify anomalies in the Tennessee Eastman (TE) process challenge (a realistic simulation of a chemical process). They used the Pearson Correlation Coefficient (PCC) for deriving the cluster entropy, which is highly sensitive to outliers, and does not work well with non-linear data, unlike HMM. Also, they rely on physical placement of the sensors for the effectiveness of their approach. On the other hand, Iturbe et al. [30] use MSPC to distinguish between malicious attacks and natural disturbances for the TE process challenge using logical correlations. In contrast to our work, this research focuses more on diagnosing the reason behind the system's current state.

2.1 Summary

There has been significant prior work to use invariants for intrusion detection.
Unfortunately, the former class of systems incur high false-positives (where the IDS wrongly marks a benign execution to be malicious) and false-negatives (where the IDS fails to identify a malicious execution and does not raise an alarm), thus implying unreliable detection. Physical invariants have the capacity to detect security attacks with low false-positives and false-negatives, but current work either requires the invariants to be manually specified, which is time and effort intensive, or the systems have important gaps which inhibit their generalizability. In this research, an automated technique for capturing the logical correlations among physical variables in a generic CPS is proposed, which uses such correlations for detecting intrusions.

Chapter 3

Approach

In this chapter, the approach that we take towards designing and building a generic IDS is presented. The first part of the chapter describes in detail the work-flow undertaken by CORGIDS along with the algorithm for intrusion detection. This is followed by a motivating example which describes a use case where the physical invariants generated by CORGIDS are able to detect intrusion.

3.1 Work-flow of CORGIDS

CORGIDS is a generic intrusion detection system which exploits the correlation of the logical properties of the SUT to detect intrusions. Figure 3.1 shows the key components and the work-flow of CORGIDS.

The CORGIDS work-flow can be broken down into three main phases, namely, a) Logging Phase; b) Building an Intrusion Detector Phase; and c) Detecting Intrusion Phase. Each of the phases is explained below.

1. Logging Phase: The 1. Logging Phase in Figure 3.1 is the starting point for building an intrusion detector and for deploying it on a system for which intrusion detection is desired. The SUT is an input to this phase and is passed through the 1.1 Logging module, in which it is manually instrumented to collect the values of the correlated properties¹.
These properties are chosen by the user of the intrusion detection system based on the general knowledge of the SUT. This phase will ensure that the traces which contain the values of the properties while the system is running are collected. Also, it is assumed that the source code of the SUT is available and can be modified for instrumentation - this is reasonable as the developer of the system will deploy CORGIDS. At the end of this logging phase, the system traces containing the values of the logical properties of the SUT are collected.

¹ The approach of manually instrumenting the code to collect logs has been used by prior work [21, 29].

Figure 3.1: Workflow of CORGIDS

2. Building an Intrusion Detector Phase: In this phase, the system traces collected from the Logging Phase are used to build an HMM which behaviorally represents the SUT. The pseudo code of the algorithm for building an intrusion detector is described below. To build an intrusion detector, the system traces are fed into the HMM model for its training in Line 1 in Procedure buildAnIntrusionDetector.

Algorithm 1: Building an intrusion detector

Procedure buildAnIntrusionDetector (logs)
1   trainedModel = trainHMMModel (logs)
2   foreach l ∈ logs do
        logLikelihood(i) = log(trainedModel(l))
        S = sum of all logLikelihood(i)'s
3   M = mean of S
4   return trainedModel, M

Procedure trainHMMModel (logs)
5   foreach hiddenStates ∈ {2, 3, ..., J} do
        create an HMM model model(i)
        logLikelihood(i) = log(model(i))
6       if (logLikelihood) < threshold then
7           return model(i)

Training of an HMM is begun in procedure trainHMMModel in Line 5, by varying the number of hiddenStates. The number of hidden states is a free parameter of an HMM which needs tuning in order to create a model which can be used for intrusion detection. Iteration begins with HMM model(i) with the starting value of two hidden states, and the number of hidden states is increased by one in each iteration (Line 5).
The log likelihood of model(i) is calculated, which represents the goodness of fit of model(i) to the data that was used for constructing it. The log likelihood is stored in variable logLikelihood(i) as shown in Line 5. The threshold in Line 6 represents the minimum difference between the current and previous HMM log likelihood. Using a threshold as a stopping criterion for HMM has also been used in prior work [44]. The best HMM is returned in Line 7 with its parameters as the model to be used for intrusion detection. At this point the trained HMM model is used to calculate the log likelihood for each of the training system traces (Line 2). As a by-product, the mean of the log likelihoods is calculated (Line 2) to get the estimated log likelihood value M (Line 3) for a training log. M is then used for comparison in the later steps to detect intrusion. Creating HMM by increasing the number of hidden states uses a significant amount of memory and computational power. However, as this phase needs to be carried out just once for a SUT, it is not a major bottleneck. Once the intrusion detector HMM is created, it can be used to detect an anomaly in the next phase.

3. Detecting Intrusion Phase: This phase starts with the Logging Phase, which is used to collect the system trace from the SUT while it is running. In this phase, only the system trace corresponding to the running SUT is collected, rather than many different system traces. The trace generated is then used together with the intrusion detector built in the Building an Intrusion Detector Phase. Using the HMM model and its saved parameters, the log likelihood of the current system trace is calculated and compared with the mean log likelihood M calculated in the Building an Intrusion Detector Phase in Line 3. If the log likelihood of the system trace is more than a specified range (δ) away from M, it signifies that the system trace does not follow the behavior which was observed when the HMM was being trained.
The specified range (δ) is found by running a sensitivity analysis (Section 6.1). Further, as the system traces used for building the HMM were assumed to be correct (i.e., not attacked), this implies that the current system trace represents a system under attack. Thus, the current state of the SUT is flagged to be malicious.

3.2 A motivating example

The earlier example of a UAV from Chapter 1 is used in this chapter to illustrate, in Figure 3.2, how CORGIDS can be used to detect intrusions. As described in Chapter 1, a UAV has physical properties such as the current altitude, distance traveled, current speed and flight time. These physical properties are correlated to each other as per the laws of physics. The approach which will be used to detect an intrusion in a UAV is elaborated, using the work-flow described in Figure 3.2. The distance spoofing attack scenario of the UAV is used as an example. This attack is explained in detail in Chapter 5.

First the Logging Phase starts, where the UAV is instrumented to collect the correlated properties such as altitude, distance traveled, speed and flight time. The above properties are collected at regular intervals of time to form the system traces. A section of the sample system trace collected is shown in Table 3.1. In the trace, it can be observed that all the properties are correlated with each other, and that the correlations are fairly stable. For instance, if the Speed of the UAV increases, the Distance traveled will also increase proportionally. Further, the Distance traveled property can have values that are either increasing or stagnant. Multiple iterations of the UAV were run by varying the routes it travels, to collect non-faulty system traces from it.

Figure 3.2: Approach of CORGIDS

In the second phase, Build an Intrusion Detector, the system traces collected from the Logging Phase are used. HMM are generated, model(i), by varying the number of hidden states in line 5 in the given algorithm.
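As a rough illustration of the model-selection loop in Algorithm 1, the following sketch grows the number of hidden states until the gain in log likelihood drops below a threshold, and then computes the mean log likelihood M over the training traces. It is a minimal sketch under stated assumptions, not the implementation used in this work: the names `build_intrusion_detector`, `fit` and `toy_fit` are hypothetical, the trainer is passed in as a function so that any HMM library could be wrapped, and `toy_fit` is a stand-in that is not an HMM at all.

```python
import math
from typing import Callable, List, Sequence, Tuple

Trace = Sequence[Sequence[float]]      # rows of logged property values
Scorer = Callable[[Trace], float]      # trace -> log likelihood under a model


def build_intrusion_detector(
    traces: List[Trace],
    fit: Callable[[List[Trace], int], Scorer],
    max_states: int,
    threshold: float,
) -> Tuple[Scorer, int, float]:
    """Return (scorer, number of hidden states, mean log likelihood M).

    `fit(traces, n_states)` is a hypothetical trainer supplied by the caller;
    it must return a function that scores a trace under the trained model.
    """
    if max_states < 2 or not traces:
        raise ValueError("need at least one trace and max_states >= 2")

    previous_total = -math.inf
    for n_states in range(2, max_states + 1):
        scorer = fit(traces, n_states)
        total = sum(scorer(t) for t in traces)
        # Stop when adding a state improves the fit by less than `threshold`.
        if total - previous_total < threshold:
            break
        previous_total = total

    mean_ll = total / len(traces)       # M in Algorithm 1
    return scorer, n_states, mean_ll


def toy_fit(traces: List[Trace], n_states: int) -> Scorer:
    """Toy stand-in for HMM training (NOT an HMM): the score improves
    with diminishing returns as n_states grows."""
    return lambda trace: -len(trace) * (1.0 + 1.0 / n_states)


# Two made-up traces of 50 rows each (altitude, battery charge).
traces = [[(40.0, 89.0)] * 50, [(41.0, 88.0)] * 50]
scorer, n_states, mean_ll = build_intrusion_detector(traces, toy_fit, 20, 2.0)
print(n_states, mean_ll)
```

Passing the trainer in as a parameter keeps the stopping rule separate from the choice of HMM library. In practice `fit` would wrap a real training routine, and the upper bound J on the number of hidden states (here `max_states`) limits the cost of the search.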
Then for each of the model(i) generated, the logLikelihood(i) is calculated in line 5 to determine if the model(i) fits the data used for constructing it. To accomplish this, the difference in logLikelihood(i) is compared to the threshold in line 6 and if the threshold is met, model(i) is returned in line 7. It was found that an HMM with 15 hidden states is the one which met the threshold. As showing 15 hidden states in Figure 3.2 would clutter it, we simplified the model by showing only 3 hidden states. Further, in lines 2 and 3, the sum of logLikelihood's for all the correlated logs S is calculated, from which M (mean log likelihood) = −4535.933 is extracted.

Table 3.1: Slice of a non-faulty system trace obtained while flying a UAV on a random route

Altitude (m)   Battery charge (%)   Distance traveled (m)   Speed (m/s)   Flight time (s)
..             ..                   ..                      ..            ..
40             89                   42.1445                 1             38.32
40             89                   44.2563                 2             39.342
40             89                   47.2397                 3             40.356
40             89                   51.0202                 3             41.376
40             88                   55.2434                 4             42.345
40             88                   59.5897                 4             43.346
40             88                   64.1632                 4             44.335
41             88                   68.8979                 4             45.323
41             88                   73.7389                 4             46.351
41             87                   78.6564                 4             47.448
41             87                   83.6196                 4             48.551
41             87                   88.6138                 4             49.61
41             87                   93.627                  5             50.604
41             86                   98.6659                 5             51.507
..             ..                   ..                      ..            ..

To demonstrate how an attack will be detected by CORGIDS, we consider an attack where the attacker decides to spoof the values of distance traveled found inside the data packets being transferred from the UAV to the GCS. A UAV periodically sends the flight data to the GCS to keep it updated about its whereabouts. To interfere with the working of the UAV, the attacker gains access to the communication channel between the UAV and GCS. Now, an attacker can easily change the contents of the data packets being transferred.

In the final phase, when the UAV is deployed in production, the Detecting Intrusion Phase is active in the GCS and uses the current system trace produced from the logging module along with the trained HMM to detect intrusion.
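The detection step can be sketched as follows: compute the log likelihood of the current trace under the trained HMM, then compare it with M. This is a minimal sketch under stated assumptions rather than the detector used in the experiments. It assumes a discrete-observation HMM purely to keep the forward algorithm short, and a two-sided comparison against δ (in the attacks described in this thesis, the log likelihood of an attacked trace is lower than expected). The function names and the small two-state model are hypothetical; only M = −4535.933 and δ = 328.19 are the values reported for this example.

```python
import math
from typing import List, Sequence


def log_likelihood(obs: Sequence[int],
                   start: List[float],
                   trans: List[List[float]],
                   emit: List[List[float]]) -> float:
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM."""
    if not obs:
        return 0.0
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transitions, then weight by emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return -math.inf            # trace is impossible under the model
        alpha = [a / scale for a in alpha]   # rescale to avoid underflow
        log_prob += math.log(scale)
    return log_prob


def is_intrusion(trace_ll: float, mean_ll: float, delta: float) -> bool:
    """Flag a trace whose log likelihood is more than delta away from M."""
    return abs(trace_ll - mean_ll) > delta


# Hypothetical two-state model, used only to exercise log_likelihood().
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
print(log_likelihood([0, 1], start, trans, emit))

# Decision rule with the M and delta reported for the UAV example.
M, DELTA = -4535.933, 328.19
print(is_intrusion(-4600.0, M, DELTA), is_intrusion(-5000.0, M, DELTA))
```

Rescaling `alpha` at every step keeps the computation numerically stable for long traces, where the unscaled forward probabilities would underflow.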
A slice of the faulty system trace is shown in Table 3.2. As can be observed, the values of distance traveled are changing but do not follow the correlations observed in the earlier trace in Table 3.1. As only the distance traveled values have been tampered with, leaving other logical properties intact, we get a correlation which is different from the one that is expected by the trained HMM. This results in the difference between the mean and current log likelihood values being greater than the threshold value - say (δ). From Figure 3.2, the log likelihood of the current system trace is more than δ from the M (mean log likelihood). Therefore, CORGIDS flags the current state of the UAV to be malicious. The value of the threshold used in this example, (δ) = 328.19, is determined experimentally through the sensitivity analysis (Section 6.1).

Table 3.2: Slice of a faulty system trace obtained while a UAV was flying on a random route and infected by distance spoofing attack

Altitude (m)   Battery charge (%)   Distance traveled (m)   Speed (m/s)   Flight time (s)
..             ..                   ..                      ..            ..
40             89                   42.7868                 1             38.206
40             89                   45.2942                 2             39.279
41             89                   48.6934                 3             40.272
42             89                   42                      4             41.261
42             88                   57.0199                 4             42.267
43             88                   46                      4             43.285
43             88                   66.0254                 4             44.357
44             88                   70.7879                 4             45.347
44             87                   65                      4             46.292
45             87                   80.5709                 4             47.37
46             87                   85.5441                 4             48.386
46             87                   49                      4             49.373
47             86                   54                      4             50.367
47             86                   100.6006                4             51.402
..             ..                   ..                      ..            ..

Summary

In this chapter we introduced CORGIDS by first describing its workflow, phase by phase. As discussed in the chapter, CORGIDS deduces the behavior of the SUT in the Build an intrusion detector phase. This behavior is then used later in the Intrusion detection phase to flag any malicious activities. CORGIDS is novel and different from the prior work discussed in the previous chapter because it dynamically generates the physical invariants for the CPS and uses HMM to do so.
HMM can be exceptional at inferring the correlation/likelihood between different variables. Also, in the first section of the chapter we demonstrate with the help of an algorithm how a trained HMM is generated with variables such as number of training traces and threshold. Later, we provide an example which walks through the three phases of the work-flow of CORGIDS along with a possible attack scenario. The example is taken from our experiments and contains the real values of the mean log likelihood and the acceptable range of the trained intrusion detection model. The decision making process of CORGIDS is shown pictorially to enhance understanding.

Chapter 4

Experimental Setup

In the previous chapter, we described how the intrusion detection model can be trained and how it is able to detect attacks on CPS. This chapter contains the details of the test-beds used for testing the efficacy of CORGIDS. Each test-bed's working is explained in detail along with its source. Secondly, the experimental procedure undertaken to conduct the experiments is shared.

4.1 Test-beds

To demonstrate that CORGIDS is generic, two CPS test-beds were chosen on which the experiments were carried out. These test-beds contain correlated properties and a predefined framework according to which the properties change their values.

1. Unmanned Aerial Vehicle (UAV): A UAV, commonly known as a drone, is a type of aircraft different from others mainly because it does not have a pilot aboard. UAV periodically send the flight data to the GCS to keep it updated about their whereabouts.

A UAV mainly consists of sensors, control logic and actuators forming a closed loop. The sensors sense the current state of the UAV and its environment and pass it on to the controller, which makes the decision about the next step to be taken. The decision taken is then sent to the actuator - this loop runs infinitely while the UAV is in operation. ArduPilot's Software in the Loop (SITL) [45] was used for the experiments.
ArduPilot is an open-source autopilot software and is widely deployed on various vehicle systems. SITL was chosen as the test-bed on a local machine due to lack of a real UAV.

Figure 4.1: Drones. (a) Real drone; (b) ArduPilot's SITL

2. Smart Artificial Pancreas (SAP): SAP is a medical device used by diabetic patients to automatically determine the insulin dosage to be injected based on the blood glucose level. It helps in reducing human error and analyzes the current blood glucose levels regularly at fixed intervals of time. A SAP consists mainly of i) a blood glucose monitor, which reads the blood glucose levels of the patient at regular intervals of time, ii) a controller, which based on the blood glucose values decides the insulin that needs to be injected, iii) an insulin pump, which based on the value generated by the controller, injects a specific amount of insulin into the patient. Open Artificial Pancreas System (OPENAPS), an open source SAP, was used to evaluate CORGIDS. OPENAPS implements the controller part of the SAP, and has been used in prior studies [29]. As there was no real patient, simulated values from the blood glucose monitor and the insulin pump were used for our experiments. The values of blood glucose were taken from the test cases provided by OPENAPS, instead of from the blood glucose monitor. These values were then served as input to OPENAPS to get the amount of insulin required by the patient. The OpenAPS controller was installed on a Raspberry Pi 3 microprocessor to evaluate the memory and performance overhead of CORGIDS.

Figure 4.2: Components of an OpenAPS platform

4.2 Experimental Procedure

To evaluate the efficacy of CORGIDS, the process of attack detection was partitioned into two phases, namely the training phase and the testing phase. The system traces obtained from the SUT were randomly divided into training and testing batches. The training phase was the one in which the intrusion detector was trained from the non-faulty system traces that were randomly assigned.
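The random division of system traces into training and testing batches, combined with the five-fold cross validation used in the evaluation, could be sketched as below. This is only an illustrative sketch: the helper name `k_fold_splits`, the fixed seed and the integer trace identifiers are assumptions, and how attacked traces were mixed into the testing batches is not modeled here.

```python
import random
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(traces: Sequence[T], k: int = 5,
                  seed: int = 0) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (training, testing) batches for k-fold cross validation."""
    if not 2 <= k <= len(traces):
        raise ValueError("k must be between 2 and the number of traces")
    shuffled = list(traces)
    random.Random(seed).shuffle(shuffled)        # random assignment of traces
    folds = [shuffled[i::k] for i in range(k)]   # k nearly equal folds
    for held_out in range(k):
        testing = folds[held_out]
        training = [t for i, fold in enumerate(folds)
                    if i != held_out for t in fold]
        yield training, testing


# Ten hypothetical trace identifiers split into five folds.
for training, testing in k_fold_splits(list(range(10)), k=5, seed=1):
    print(sorted(testing), len(training))
```

Each trace lands in exactly one testing batch, so every trace is used for testing once and for training in the remaining folds.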
Sensitivity analysis was also performed to analyze the value of the parameters which have the most impact on the performance of the intrusion detector. For instance, for the UAV test-bed, routes which the UAV used as the flight plan were randomly generated. Therefore, after having the UAV simulator fly on all the randomly generated routes, the non-faulty system traces were obtained. These logs were then randomly distributed for the training and testing phases. In the testing phase, the intrusion detector which was built in the training phase was used to find out if an intrusion was correctly detected. The results were then used to gauge the performance of CORGIDS based on the evaluation criteria. In order to reduce variability, five-fold cross validation was run for each of the attacks described in Chapter 5.

Chapter 5

Attacks description and detection

In this chapter, the attacks that were emulated on each test-bed are discussed. The attacks that are discussed here are targeted attacks, which means that they specifically target physical properties of the two CPS platforms. Note that the attacks designed for both the test-beds are intentionally stealthy, i.e., it is expected that the attacker wants to remain undetected while introducing some malicious content to accomplish its goal. Also, attack trees are used for designing attacks on the test-beds. These attack trees are based on prior attacks on very similar systems, thus making them realistic and appropriate for testing CORGIDS.

5.1 Attacks on UAV

As discussed in the previous chapter, a UAV regularly transmits flight data to the GCS, so that it can be tracked throughout its flight. Based on the flight data received, the GCS interprets whether the UAV is following the instructed guidelines or has drifted from them. An attack tree for faulty UAV operations (shown in Figure 5.1) was formulated, based on attacks introduced in previous work [5, 18].
There are three branches in this tree, namely, i) Network Tampering; ii) Storage Tampering; and iii) Measurement Tampering. Two out of the three branches were used to develop attacks which are discussed below.

Figure 5.1: Attack tree for UAV

• Battery Tampering Attack (Block B1-B4): This attack occurs when an attacker is able to tamper with the control logic of the UAV by hacking it. By changing the control logic, the attacker can change the decisions that are made based on the input physical properties from the sensors. Obtaining access to the UAV is not an unreasonable condition, mainly due to the availability of tools capable of achieving the same [46, 47]. In this attack, the attacker can change the part of the code where the percentage of battery left in the UAV is being sent to the GCS. The original value of the percentage of battery left in the UAV can be substituted with a value greater than the current value, to lead the GCS into the false understanding that the UAV has plenty of battery left in it. Specifically, if the attacker knows, through eavesdropping on the communication channel, that the battery decreases at a particular rate, it can then send faulty values to make the GCS believe that the battery is depleting at a decreased rate to accomplish the motive. Eventually, the UAV reaches a point where it crashes to the ground due to battery drainage, which leads to the possession of sensitive data by the attacker. As we did not have access to a real UAV, the experiments on ArduPilot (a real time simulator for UAV) were performed on a local machine. Therefore, we had access to the UAV and modified its control logic to plant this attack in the code itself.

• Flooding Attack (Block A1-A4): The flooding attack occurs when the communication channel between the UAV and GCS gets compromised. In this scenario, an attacker could mount the attack by flooding the communication channel by sending extra packets along with the ones destined to be received by the GCS [46].
The motive of this attack could be populating the channel so that the GCS is unable to infer the correct whereabouts of the UAV, thereby leading the attacker to control and use the UAV as desired. The extra packets being sent can contain physical properties which are different from the legitimate ones. However, we assume that the attacker is stealthy and chooses values close to the real ones to avoid detection. Therefore, the attacker can even resend the packets that were already transmitted to go unnoticed by the GCS. To achieve this attack, faulty data packets were injected into the communication channel between the UAV and GCS.

• Distance Spoofing Attack (Block A (1,2,5,6)): By sending a different value of the distance traveled instead of the original value, an attacker can falsely portray the current route or the current position of the UAV to the GCS. This attack can take place when an attacker eavesdrops on the communication channel to know the format of data being transmitted. This knowledge can then be used to spoof the value of the distance covered in the data packets being sent to the GCS. The motivation behind this attack is that the attacker wants to fool the GCS by leading it to believe that the UAV is following a different schedule/route than the planned one. This might lead the GCS to take an action which was the intention of the attack in the first place. This attack is mounted by spoofing false distance traveled data into the communication channel between the GCS and UAV. Similar to the flooding attack, the communication channel was intercepted to send spoofed values for the distance traveled property to the GCS.

5.2 Detection of attacks on UAV

• Battery Tampering Attack: As detailed in the attack description, the attacker changes only the battery values in a data packet which also contains other correlated properties such as distance traveled, altitude, speed, and flight time.
When this data is received by the GCS with CORGIDS enabled on it, the trained HMM model in the intrusion detector module detects an abnormal activity. A malicious activity is detected because the correlation expected by the HMM is not the same as the one received by it, mainly due to the difference in the relationship of battery with the other properties in the data packet. As a result, the log likelihood of the current system trace comes out to be less than that expected by the intrusion detector, which marks the trace as faulty. This leads to the raising of an alarm by the GCS.

• Flooding Attack: To detect this attack, the data packets that are received by the GCS are fed into the intrusion detector module of CORGIDS. A key point to note here is that, if the UAV sends one data packet per second to the GCS, the number of data packets received at the GCS end will be greater than the number of packets sent by the UAV, because of the flooding attack. The trained HMM model will detect a malicious activity as the number of data packets which are used for decision making is greater than in the case when there is no flooding attack. This will lead to a lower log likelihood for the current data packets than that expected by the trained HMM, thus flagging the current state as anomalous.

• Distance Spoofing Attack: When spoofed messages reach the GCS, they are given to the trained HMM model to find out discrepancies, if any. An important thing to note here is that the number of data packets sent by the UAV and received by the GCS is the same. However, in some packets the distance traveled by the UAV is spoofed to falsely portray that it is following a different route, or that the sensors may be returning some faulty values. However, the correlation between the distance traveled and other flight data parameters from a mix of faulty and non-faulty packets is not what is expected by the trained HMM.
Thus, the intrusion detector flags the current state as anomalous, as the log likelihood of the data packets fed into it is lower than expected.

5.3 Attacks on SAP

The correct execution of SAP is of vital importance as the life of the patient depends on it. As discussed above, SAP consists of three components forming a closed loop: a blood glucose monitor, a controller and an insulin pump. The attacks that were derived for SAP are discussed below; they take advantage of the communication channel and of access to the code of the controller. The attack tree shown in Figure 5.2 was built using the attacks demonstrated in prior work [11, 29]. The attacks planted on the SAP test-bed are based on the two scenarios described in it.

Figure 5.2: Attack tree for SAP

• Insulin Tampering Attack (Block A1-A4): Similar to the battery tampering attack on a UAV, the insulin tampering attack occurs when the attacker hacks the controller unit of the system, i.e., OPENAPS [11]. After hacking, the attacker can modify the logic that calculates the rate of insulin from the input blood glucose values sampled from the patient. This will lead to the injection of a faulty insulin dosage into the patient's body, which can prove fatal. A Raspberry Pi 3 was used for our SAP experiments, and a change in the control logic of OPENAPS was made to reflect this attack. After the attack had been planted, the insulin dosage command sent out by the controller was faulty, as expected.

• Glucose Spoofing Attack (Block B1-B4): The glucose spoofing attack modifies the value of the blood glucose contained in the data packets being sent from the patient. The incorrect value which is substituted can be either greater than or less than the measured blood glucose value. This change in the real value will lead the controller to calculate an incorrect value of insulin (though the logic through which the insulin dose is calculated is untouched), which will have harmful effects on the patient.
This attack was mounted by injecting false data into the communication channel between the blood glucose monitor and the controller.

5.4 Detection of attacks on SAP

• Insulin Tampering Attack: As the attacker modifies only the insulin dosage while keeping the other properties the same, the intrusion detector is able to detect the attack, as the current correlation is not what it expects after its training phase. As in the above attacks, the log likelihood of the current system trace is lower than that expected by the trained HMM, thus arousing suspicion.

• Glucose Spoofing Attack: The intrusion detector module of CORGIDS receives the correlated properties, which contain both faulty and non-faulty values of the blood glucose. Based on this input data, the log likelihood generated for the current log differs from that expected by the trained HMM. This indicates that there is an intrusion in the current state of the SAP.

Although the attacks demonstrated in this thesis break the logical correlation directly, CORGIDS is also capable of detecting an anomaly generated through indirect attacks. For example, instead of changing a physical property such as battery % left in the battery tampering attack (an attack in which correlations are broken directly), we could change either the value of some variable (other than the physical variable) used in the UAV or alter other logic which does not directly affect the physical property (more details in Chapter 7). These changes will propagate in the program and ultimately reach the receiving end of the CPS (an actuator). If they do not, then the attack is likely to be harmless, as the attacker cannot change the physical behavior of the CPS without modifying its outputs.

5.5 Summary

From the above-mentioned targeted attacks and their detection, we see that CORGIDS is able to detect all attacks.
This is mainly because in the intrusion model builder phase it infers the behavior from the training traces supplied, and then puts the inferred behavior to use when detecting intrusion. As CORGIDS uses physical invariants to detect intrusion and CPS have to obey the laws of physics, a change in correlation among the physical properties was easily spotted.

Chapter 6

Evaluation of CORGIDS

In this chapter, the results from the sensitivity analysis and the attacks seeded in Chapter 5 are presented. Additional experiments were also performed to provide more insight into how the tuning parameters of the trained HMM from phase two of the CORGIDS workflow (Chapter 3) affect the precision and recall. The motive behind discussing the results achieved by CORGIDS is to measure its performance in terms of the evaluation criteria described in this chapter.

6.1 Sensitivity Analysis

Before evaluating CORGIDS, a sensitivity analysis was performed to find the values of the three experimental parameters. Sensitivity analysis is a study which determines how different values of an independent variable affect a particular dependent variable under a given set of assumptions. It can be used within given boundaries that depend on one or more input variables, such as the effect that changes in interest rates have on bond prices. The three experimental parameters for which the sensitivity analysis was carried out are as follows.

• Window size (w): The window size is defined as the time duration which is under consideration for detecting an intrusion [22] in a SUT. A large w means that more historical data is given to the HMM to decide on a malicious activity.

• Acceptable range (δ): The acceptable range defines the range within which the testing system trace's likelihood can vary from the mean log likelihood of the trained HMM. A testing trace whose value lies within this range of the specified mean will be marked as similar to the training traces.
If the value of δ is chosen to be large, control is enforced loosely, and system traces with substantial variation from the trained HMM are considered benign.

• Threshold of consecutive decisions (λ): Stateful tests [48] are performed by maintaining the historical decisions of the IDS and generating an alert only if the count of intrusion decisions goes above a chosen threshold. The intuition behind using λ is to look back at the historical decisions of the intrusion detector to see whether there is really an anomaly or just a one-time spike in the system. A greater value of λ requires a larger number of consecutive historical intrusion decisions to generate an alert.

The values of w, δ and λ were chosen based on the highest values of precision and recall (Section 6.2) achieved in this experiment. The results from the sensitivity analysis are shown in Figure 6.1, Figure 6.2 and Figure 6.3. w is measured in minutes and δ in standard deviations. A key point to note here is that the higher the precision and recall for a set of experimental parameters, the higher the rate of detection. This experiment was carried out by varying one parameter at a time and keeping the others constant. For instance, graph a in Figure 6.1 denotes the scenario where w is varied from 2 to 4 minutes while keeping δ = 1 and λ = 2. Following the same pattern as graph a, the constant parameters, that is, δ and λ, are swept from their lowest to their highest values. This forms the graphs a - d in Figure 6.1. Similarly to Figure 6.1, the other experiments were conducted by varying δ in Figure 6.2 and λ in Figure 6.3.
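The interplay of the three parameters can be sketched in a few lines of Python. This is a minimal illustration, not the CORGIDS implementation. It assumes that each w-minute window of the trace has already been scored by the trained HMM (the log likelihoods are passed in as plain numbers), that the acceptable range is δ sample standard deviations on either side of the mean training log likelihood, and that an alert is raised once λ consecutive windows fall outside that range. The function names and the sample values are hypothetical.

```python
from statistics import mean, stdev


def is_window_anomalous(window_ll, train_lls, delta):
    """Return True if a window's log likelihood lies outside the acceptable range.

    Assumption: the range is mean +/- delta * (sample standard deviation)
    of the log likelihoods observed on the training traces.
    """
    mu = mean(train_lls)
    sigma = stdev(train_lls)
    return abs(window_ll - mu) > delta * sigma


def raise_alerts(window_lls, train_lls, delta, lam):
    """Stateful test: alert only after `lam` consecutive anomalous windows."""
    alerts = []
    consecutive = 0
    for ll in window_lls:
        if is_window_anomalous(ll, train_lls, delta):
            consecutive += 1
        else:
            consecutive = 0  # a benign window resets the history
        alerts.append(consecutive >= lam)
    return alerts


# Hypothetical log likelihoods; each value stands for one w-minute window.
train_lls = [-100.0, -102.0, -98.0, -101.0, -99.0]
test_lls = [-100.5, -104.0, -99.0, -105.0, -106.0, -107.0]

print(raise_alerts(test_lls, train_lls, delta=1, lam=2))
print(raise_alerts(test_lls, train_lls, delta=1, lam=4))
```

With these sample values the isolated outlier in the second window never triggers an alert, the run of three outliers at the end triggers one when λ = 2, and no alert is raised at all when λ = 4, which mirrors the description of λ as a filter against one-time spikes.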
This sensitivity analysis represents the data collected from the distance spoofing attack on the UAV testbed.

From the graphs, it can be seen that the precision and recall increase as w increases from 2 to 4 minutes, while they decrease when δ and λ increase from 1 to 3 standard deviations and from 2 to 4 decisions respectively. From this trend it can be inferred that the precision and recall are largest when w is large and λ and δ are small. A similar trend was observed for the other attacks on the two test-beds. The reason for the observed trend is that an HMM requires substantial historical data to determine if there is some anomaly in the system. With less history (a smaller window size), it is unable to correctly infer the current state of the system. Therefore, when a larger w of 4 minutes is provided, the HMM is able to create a more realistic model of the system, as it is now more knowledgeable about the system's behavior and can make decisions with higher likelihood, thus giving the best results for the smallest values of δ and λ.

Figure 6.1: Sensitivity Analysis: Independent variables are δ and λ. Dependent variable is w. Panels: (a) δ = 1 and λ = 2, (b) δ = 3 and λ = 2, (c) δ = 1 and λ = 4, (d) δ = 3 and λ = 4. The vertical axes in all figures are the values of precision and recall calculated after averaging 5 fold cross validation of test system traces.

An important point to note here is that though CORGIDS is able to detect attacks even with less favorable values of w, λ and δ, it achieves lower precision and recall in doing so. On the other hand, if the results obtained from the sensitivity analysis are used, higher values of precision and recall can be achieved.
Therefore, either the less favorable parameters can be chosen and results obtained quickly at the cost of accuracy, or the most favorable parameters can be used to obtain fewer FN while incurring some latency.

Figure 6.2: Sensitivity Analysis: Independent variables are w and λ. Dependent variable is δ. Panels: (a) w = 2 and λ = 2, (b) w = 4 and λ = 2, (c) w = 2 and λ = 4, (d) w = 4 and λ = 4. The vertical axes in all figures are the values of precision and recall calculated after averaging 5 fold cross validation of test system traces.

Figure 6.3: Sensitivity Analysis: Independent variables are w and δ. Dependent variable is λ. Panels: (a) w = 2 and δ = 1, (b) w = 2 and δ = 3, (c) w = 4 and δ = 1, (d) w = 4 and δ = 3. The vertical axes in all figures are the values of precision and recall calculated after averaging 5 fold cross validation of test system traces.

6.2 Evaluation Criteria

Precision, recall, FP, FN, performance overhead and memory overhead were used to evaluate CORGIDS. These metrics are explained below:

• Precision: Precision is the percentage of executions reported as intrusions by the intrusion detector that were in fact malicious executions of the SUT. For an intrusion detector, the higher the precision the better.

• Recall: Recall is the percentage of all malicious SUT executions that the intrusion detector correctly identified as malicious. For an intrusion detector, the higher the recall the better.

• False positive ratio (FP): Represents the ratio of execution traces that were falsely reported as malicious to the total number of normal traces for a given CPS.

• False negative ratio (FN): Represents the ratio of malicious attacks that went undetected/unnoticed by the IDS to the total number of attacks for a given CPS.

• Performance Overhead: Performance overhead reflects the additional time taken when CORGIDS is deployed on the SUT.
It helps to determine whether the time taken by the IDS to detect an intrusion is greater than the time taken to complete the closed loop once, in which case the IDS is not very helpful.

• Memory Overhead: As the devices on which CORGIDS will be used are memory constrained, it is essential to calculate its memory overhead. The memory occupied by CORGIDS on the SUT is used to determine this overhead.

Table 6.1: False Positive and False Negative obtained for CORGIDS on the two test-beds

Testbed   Targeted Attack     FP (%)   FN (%)
UAV       Battery Tampering   0.0      12.20
UAV       Flooding            0.0      11.30
UAV       Distance Spoofing   0.0      12.80
SAP       Insulin Tampering   5.60     4.20
SAP       Glucose Spoofing    2.80     8.40

Table 6.2: Comparison of Precision and Recall for OpenAPS platform

Methodology             Testbed                    FP (%)   FN (%)   Precision (%)   Recall (%)
ARTINALI                SEGMeter                   12       2.3      89.06           97.7
ARTINALI                OpenAPS                    13.5     2        87.89           98
Zohrevand et al. [22]   Water Treatment System     -        -        78.87           81.4
Chen et al. [21]        Water Purification Plant   -        15       -               -
CORGIDS                 UAV                        0.00     12.10    100             87.90
CORGIDS                 SAP                        4.20     6.30     95.70           93.70

For evaluating CORGIDS, the values obtained for each of the three variables (w, λ and δ) from the sensitivity analysis were used. Table 6.1 contains the FP and FN results for the two test-beds, namely a UAV and SAP, on which CORGIDS was deployed. Table 6.2 compares our results only to those related papers [21, 22, 29] which dynamically generate physical invariants (see footnote 1). We acknowledge that Table 6.2 does not provide a complete comparison, as the test-beds, attacks, and training and testing scenarios were different for each IDS; however, we include it here to provide better context about the performance of CORGIDS. Later, in Chapter 7, a comprehensive comparison of CORGIDS with its related work is described.

Krotofil et al. and Iturbe et al. [28, 30] do not measure the performance of their methodology, and hence they could not be compared with CORGIDS.
In addition, precision and recall for CORGIDS and the papers mentioned in Table 6.2 were calculated. To obtain the FP and FN values for CORGIDS used to generate precision and recall, the FP and FN values from Table 6.1 were averaged. However, the precision value of CORGIDS could not be compared with that of Chen et al. [21], as the latter did not provide it in their paper.

Footnote 1: For the research papers that were used for comparison with CORGIDS for the UAV and SAP test-beds, the FP, FN, precision and recall values were used directly. Precision and recall for [29] were calculated manually from the FP and FN values provided in their paper.

6.3 Experiment results

Based on the metrics described above, we now discuss the results of the experiments performed.

6.3.1 Precision

In this subsection, the precision achieved by seeding the attacks on the SUT and using CORGIDS to detect an intrusion is discussed. The precision achieved by prior work is also shown in Table 6.2. As can be observed, CORGIDS achieves a precision of 100% and 95.70% for the UAV and SAP platforms respectively. In comparison, no other intrusion detector has a precision greater than 90%. Specifically, CORGIDS provides a 21.33% improvement in precision over Zohrevand et al. [22] and approximately 8.88% over Aliabadi et al. [29] for the SAP platform.

The reason behind the higher precision of CORGIDS is the use of the correlations exhibited by the two CPS. CORGIDS detects attacks by using an HMM to infer whether the current system trace exhibits the same trend as the one with which it was trained. This is why, especially for the UAV platform, the HMM recognizes an anomaly with almost 100% precision. The reason for the comparatively low precision for the SAP platform is the lack of traces. As the total number of traces for the SAP platform was small, there were even fewer training traces, which eventually affected the modeling of the HMM.
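The thesis does not spell out how precision and recall are derived from FP and FN. The sketch below assumes recall = 100 − FN and precision = recall / (recall + FP), with all quantities in percent. This formula is an inference, chosen because it matches the CORGIDS and ARTINALI rows of Table 6.2 to within rounding when the per-attack values of Table 6.1 are averaged per test-bed. The function names are illustrative.

```python
def average(values):
    """Arithmetic mean of the per-attack percentages."""
    return sum(values) / len(values)


def recall_pct(fn_pct):
    """Assumed relation: recall is the complement of the false negative ratio."""
    return 100.0 - fn_pct


def precision_pct(fp_pct, fn_pct):
    """Assumed relation: detected share divided by detected share plus false alarms."""
    detected = 100.0 - fn_pct
    return 100.0 * detected / (detected + fp_pct)


# Per-attack FP and FN values copied from Table 6.1.
uav_fp, uav_fn = average([0.0, 0.0, 0.0]), average([12.20, 11.30, 12.80])
sap_fp, sap_fn = average([5.60, 2.80]), average([4.20, 8.40])

for name, fp, fn in [("UAV", uav_fp, uav_fn), ("SAP", sap_fp, sap_fn)]:
    print(f"{name}: FP={fp:.2f} FN={fn:.2f} "
          f"precision={precision_pct(fp, fn):.2f} recall={recall_pct(fn):.2f}")
```

Applying the same two functions to the FP and FN values listed for ARTINALI in Table 6.2 gives values that agree with the precision and recall in that table, which is the basis for the assumed formula.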
For the experiments on both test-beds, a 70%:30% ratio of training to testing traces was maintained. However, the limited availability of patients' diabetic therapy data led to a lower number of training traces. This, in turn, negatively affected the training of the HMM used by CORGIDS for the SAP platform.

6.3.2 Recall

The recall of CORGIDS is discussed here and compared to the related work, using Table 6.1 and Table 6.2. From Table 6.2, it can be observed that CORGIDS achieves a high recall relative to the related work. Although CORGIDS does not have the highest recall, at 93.70% for the SAP platform it is quite close to ARTINALI. CORGIDS improves the recall by 15.11% when compared to Zohrevand et al. [22]. On the other hand, CORGIDS achieves 11.14% lower recall than ARTINALI when the lowest recall values of both are compared.

CORGIDS achieves lower recall than ARTINALI [29] mainly because the behavior of the SUT under attack was stealthy and did not deviate much from the normal trend. As the deviation was small, the logical correlation between the properties seemed very similar to the one expected; thus the HMM did not mark the state as anomalous, leading to some false negatives. Chen et al. [21] do not provide recall, but give the FN value for their approach, which is 15%. This is higher than the CORGIDS FN values of 12.10% and 6.30% for the two platforms. Chen et al. use SVM for intrusion detection, while CORGIDS uses HMM. HMMs are better able to capture the sequence of states and their transitions in a CPS, and hence CORGIDS achieves lower FN values.

6.3.3 Memory overhead

Measurements of the memory overhead of CORGIDS running on the SAP platform were also collected. The experiments were performed on a Raspberry Pi 3 with approximately 1 GB of RAM. We found that CORGIDS consumes 36.15 MB when detecting intrusion.
The reason behind this memory overhead is that CORGIDS uses HMM for intrusion detection. The trained HMM, when loaded into memory along with the libraries required for it to generate a decision, requires additional space. However, CORGIDS is used by the controller to detect intrusion in the SUT, and controllers are less memory constrained than the SUT. For instance, on the Raspberry Pi 3, it took only a fraction (36.15 MB) of the 1 GB of available RAM. Thus, we surmise that the memory overhead incurred by CORGIDS is acceptable.

6.3.4 Performance overhead

Like memory overhead, performance overhead measurements were also taken on the Raspberry Pi 3 platform. The average of 10 executions was considered for the overhead tests. Ideally, the time taken to reach a decision should be less than the execution cycle of the SUT, in order for the intrusion detector to keep up with the system. The execution cycle time is the time taken to go once through the closed loop of a CPS. It is important because it gives an idea of how much time the system takes to complete one loop, from reading input from the blood glucose sensors, to calculating the insulin dosage, to sending it to the insulin pump. It takes approximately 1.25 seconds for CORGIDS to generate a decision based on the input correlated logs. This is negligible compared to the time taken by a single execution cycle of SAP, which is about 5 minutes.

Scalability of CORGIDS: To understand how the overheads scale with HMM size, the HMM used for intrusion detection was varied. Multiple HMMs were created by varying one tuning parameter, the number of hiddenStates, among 2, 5, 10, 15 and 20. We observed that the memory and performance overhead of CORGIDS remain the same regardless of the number of hiddenStates in the HMM.
This could be because the libraries which are loaded along with the HMM are the dominant factor in the time taken, and this does not depend on the number of hiddenStates in the HMM.

6.4 Additional experiments

We carried out additional experiments to gather more insight into how the precision and recall vary based on the type of trained HMM. These experiments vary the two parameters which determine the type of trained HMM that will be generated.

• Training threshold - The training threshold is used as the stopping criterion for training the HMM in the Building an Intrusion Detector phase. Training stops, and the current HMM becomes the trained HMM, once the difference between the current and previous HMM log likelihoods is no larger than this value. We used a training threshold of 0.5% in Chapter 3 based on prior work [44]. Through this experiment, however, we want to determine the effect that changing this value has on the intrusion detection capability of the trained HMM. We sweep the training threshold over the values 0.35%, 1% and 2%, with the 0.5% value already being used in our experiments.

• Number of training traces - The number of training traces determines the context that the trained HMM will have. The higher the number of training traces, the more likely it is that the HMM will be equipped with the different types of behavior exhibited by the CPS. For our experiments, we kept the ratio of training to testing traces at 70% : 30%. For this study, however, we varied this ratio to see what effect it had on the precision and recall of the IDS. The ratio of training traces to testing traces was therefore varied among 50% : 50%, 60% : 40% and 80% : 20%, with the 70% : 30% ratio already covered in the above experiments.

Both parameters discussed above, training threshold and number of training traces, are varied one at a time, i.e., while one is varied the other is kept constant, and vice versa.
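A stopping rule of the kind described for the training threshold can be sketched as follows. The sketch assumes the threshold is applied to the relative change, in percent, between the log likelihoods of successive training iterations. `step` is a hypothetical callable standing in for one re-estimation pass of whatever HMM library is used, and the log-likelihood values are made up for illustration.

```python
def train_until_converged(step, threshold_pct=0.5, max_iters=1000):
    """Call `step()` until the log likelihood changes by at most `threshold_pct`.

    `step` performs one (hypothetical) training iteration and returns the
    log likelihood of the resulting HMM. Returns the final log likelihood
    and the number of iterations that were run.
    """
    prev = step()
    for i in range(1, max_iters):
        cur = step()
        # Relative change with respect to the previous iteration, in percent.
        change_pct = abs(cur - prev) / abs(prev) * 100.0
        if change_pct <= threshold_pct:
            return cur, i + 1
        prev = cur
    return prev, max_iters


# Made-up log likelihoods of successive training iterations.
log_likelihoods = [-1000.0, -900.0, -885.0, -880.0, -878.0, -877.5]

for threshold in (2.0, 0.5):
    final_ll, iterations = train_until_converged(
        iter(log_likelihoods).__next__, threshold_pct=threshold
    )
    print(f"threshold={threshold}%: stopped after {iterations} iterations, "
          f"log likelihood {final_ll}")
```

In this toy sequence the 2% threshold stops training after 3 iterations and the 0.5% threshold after 5, illustrating why a smaller threshold lengthens training.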
Also, this study was conducted for both platforms, UAV and SAP.

6.4.1 Additional results for UAV platform

We conducted two experiments, one for each of the two parameters, and summarize the results below.

• Training threshold: By varying the training threshold and keeping the ratio of training to testing traces constant at 70% : 30%, we obtained the results shown in Figure 6.4. For each individual value of the training threshold, the trend of precision and recall was similar to that observed when the training threshold was 0.5%. It was also seen that the training threshold did affect the recall of CORGIDS: the recall increases as the training threshold decreases. So, with 0.35% as the training threshold we get the maximum value of recall. However, the recall obtained with a training threshold of 0.5% is very close to the maximum recall achieved in these experiments. Therefore, by using 0.5% for the attacks reported in Chapter 5, we did not sacrifice the intrusion detection capabilities of CORGIDS by using an under-trained HMM. On the other hand, it can also be seen that the precision remains unaffected by the change in training threshold. This could be because the trained HMM obtained in each of the individual experiments had learned the benign behavior of the SUT well.

Another observation we made was that the time taken to obtain the trained HMM increased as the training threshold decreased.
This was because the maximum allowed difference between the log likelihoods of the two HMMs was shrinking, which led to the creation of more HMMs until the threshold was met. However, as this training process needs to be done only once, and on a non-constrained device, the time consumed is not a roadblock.

Figure 6.4: Result by varying the training threshold for the UAV platform

• Number of training traces: By varying the ratio of training to testing traces and keeping the training threshold at 0.5%, we obtained the results shown in Figure 6.5. As can be observed, the number of training traces does affect the performance of CORGIDS. Specifically, CORGIDS achieves the best precision and recall when the number of training traces is at or near its maximum value. Using 80% or 70% of the traces for training helped to achieve a recall of approximately 89%, as opposed to using a smaller number of traces.

We see a dip in recall when the number of training traces decreases because, with fewer training data points, the HMM does not see all the possible behaviors that it can expect of the SUT. Thus, the greater the number of training traces, the better the detection capability of CORGIDS. Additionally, training the HMM with different numbers of training traces did not change the training time much, in contrast to the experiment in which the training threshold was varied.

Figure 6.5: Result by varying the number of training traces for the UAV platform

6.4.2 Additional results for SAP platform

We performed similar experiments for the training threshold and the number of training traces on the SAP platform, with the results summarized below.

• Training threshold: For this experiment, we used the same values of the training threshold as in the above experiment involving the UAV platform. That is, we varied the threshold among 0.35%, 1.0% and 2%, with the experiments for the 0.5% value already conducted for the attacks discussed in Chapter 5.
The results are summarized in Figure 6.6.

We see that the variation in the training threshold has an effect on the recall of CORGIDS. In particular, the higher the training threshold, the lower the recall of CORGIDS, which means it is less able to detect malicious activities. However, the precision was approximately the same, about 95%, in all the experiments. This could be because the HMM was able to capture the benign behavior of the SUT more precisely. Furthermore, as the number of training traces for the SAP platform was approximately 4 times smaller than for the UAV platform, each experiment was conducted with a very small number of training traces. This is why the results for the SAP platform lag behind those for the UAV platform.

Figure 6.6: Result by varying the training threshold for the SAP platform

• Number of training traces: By varying the ratio of training to testing traces and keeping the training threshold fixed at 0.5%, we conducted experiments whose results are shown in Figure 6.7. It is clearly seen that the number of training traces impacted the precision and recall of CORGIDS. As anticipated, fewer training traces gave the HMM less context about the behavior of the system, which was ultimately reflected in the evaluation metrics. For the attacks and results shared in Chapter 5, we used 70% : 30% as the ratio and, as can be seen from Figure 6.7, the precision and recall attained for this number of training traces are the maximum.

Figure 6.7: Result by varying the number of training traces for the SAP platform

6.5 Summary

This chapter contains the results of the sensitivity analysis which was performed to find the values of the three experimental factors which affected how the IDS performed while detecting intrusion. It was found that as the window size increased and the acceptable range and threshold of consecutive decisions decreased, the performance of CORGIDS improved, that is, there were fewer FP and FN.
This observation arose because the intrusion detector model uses an HMM, and an HMM requires a slice of the current system trace to generate a result. If the slice of the trace is small, the HMM has a narrow window of observations and thus not enough data to analyze the current situation. Secondly, in this chapter we presented the results of prototyping CORGIDS and using it for intrusion detection for the attacks mentioned in the previous chapter. We use the two test-beds, a UAV and a SAP, to show that CORGIDS is a generic IDS for CPS. From the results we see that CORGIDS achieves higher precision and recall as compared to the prior work. However, for SAP the recall is lower than that of ARTINALI due to the lack of system traces from which the IDS was trained. Memory and performance overheads were also measured for the SAP test-bed, which indicated that CORGIDS required approximately 36 MB of memory and took 1.25 seconds to generate a result, benign or malicious.

Chapter 7

Comparison with related work

To provide a detailed comparison with the related work, we chose the IDS from our related work which was similar to CORGIDS. This made ARTINALI the only IDS which could be used for comparison purposes, as it is generic and designed for CPS. All the other IDS discussed in Chapter 2 were built with a particular CPS in mind, and thus could not be applied to other test-beds. Therefore, in this chapter we discuss the quantitative comparison experiment that we performed, followed by its results.

7.1 Experimental setup

In this section, we discuss how we carried out the comparison with ARTINALI, including the test-beds, the attacks carried out, the collection of system traces and the determination of invariants.

• Test-beds - We chose to use the two test-beds used in this study, UAV and SAP, as the base for the comparison. These test-beds are valid for comparison because CORGIDS already demonstrated its efficacy on these platforms.
SAP was also used by ARTINALI.

• Attacks - To provide a level playing field and completeness, we conducted this experiment taking into account all the attacks that were used for both IDS, ARTINALI and CORGIDS. In particular, we consolidated the targeted and arbitrary attacks from ARTINALI and the targeted attacks from CORGIDS to form a superset, which was then used for evaluation. For example, for the SAP platform, which was common to both IDS, we combined the targeted attacks from both IDS, and these were used to measure the detection capabilities. However, as CORGIDS did not use arbitrary attacks, as mentioned in Chapter 5, all the arbitrary attacks from ARTINALI were used as a measure for both IDS. Arbitrary attacks, or fault injections, represent the building blocks of attacks that can occur as zero-day attacks, as opposed to targeted attacks, which are designed to exploit a particular feature of the system under attack. The term arbitrary attacks was used by the authors of ARTINALI in their study; to remain consistent, we use the same term in our work. The arbitrary attacks used by ARTINALI are:

– Data mutation: these attacks alter the run-time values of data variables in the code of the CPS.

– Branch flipping: these attacks randomly flip branch conditions to cause an abnormal execution flow in the CPS.

– Artificial delay insertion: these attacks add some delay to the normal execution of the program in the CPS.

The above-mentioned attacks emulate different security loopholes. For instance, in the case of data mutation attacks, an attacker could change the values of critical data in the system by exploiting memory corruption vulnerabilities such as buffer overflows or race conditions. Also, code injection or semantic vulnerabilities can be used by an attacker to flip critical branches in the program to create an abnormal execution flow, or to delay the execution of essential functions in the program to achieve their goal.
As a result of the above-mentioned arbitrary attacks, the following outcomes were observed in the CPS: i) Crash: the introduction of the attack caused the system to crash; ii) Hang: after the introduction of the attack the system failed to do anything (was unable to perform any operation); iii) Silent Data Corruption (SDC): during the attack the operation of the system deviated from its non-malicious outcome, but the system continued to function; and iv) No Corruption: no visible changes were observed during run-time which could differentiate it from the non-malicious system behavior. Only SDC and no corruption attacks are taken into account while judging the performance of both IDS, as they are difficult to detect and need an IDS. The other two system behaviors (crash and hang) are easily detected and do not necessarily need an IDS to observe that something is wrong with the system.

For this comparison, we manually seeded each of these faults in the source code of the respective test-beds, at randomly sampled program points in the code of the CPS. The fault injection points were chosen randomly before performing the experiment.

• Collection of system traces - To avoid any bias in the traces used for building the IDS and subsequently for checking intrusion, we used the same flight plans for the UAV platform, and the same glucose readings for the SAP platform, for both IDS. These traces were then randomly divided in a 70:30 ratio for training and testing purposes for each IDS.

• Choosing invariants for modeling IDS - As CORGIDS had previously used the two test-beds for intrusion detection, we already had the physical variables to be used for modeling/deducing the CPS behavior. ARTINALI, on the other hand, had data, event and time invariants for only the SAP platform; therefore, we had to extract those invariants for the UAV platform.
ARTINALI defines an event as “an instance of an action that leads to a change of condition., e.g. message send/receive, sensor data reading or activating insulin injection”. Based on this definition and on the CPS traces they collected for the smart meter and SAP test-beds, we chose the invariants for the UAV platform. 31 system calls were found and marked as events, and for those events (i.e., function calls), the data variables present inside them became the data invariants. For instance, functions which read the sensor data in the UAV (latitude, longitude, speed, etc.) were chosen as events. Following this, the Data-Event-Time interplay was calculated by ARTINALI's code [49], after we provided the required CPS traces.

7.2 Comparison on UAV Platform

This section covers the attacks, targeted and arbitrary, which were used to determine the efficiency of both ARTINALI and CORGIDS for the UAV platform. We discuss the targeted and arbitrary attacks one by one.

7.2.1 Targeted attacks

The attacks described in this section (battery tampering, flooding and distance spoofing) are the targeted attacks from CORGIDS, as ARTINALI did not have any experiments on the UAV platform. The results of these attacks are shown in Table 7.1 and Table 7.2.

Table 7.1: Results of intrusion detection by ARTINALI for Targeted attacks on UAV platform

Attack              FP (%)   FN (%)
Battery tampering   5.50     13.00
Flooding            7.00     17.50
Distance spoofing   8.70     11.50

Table 7.2: Results of intrusion detection by CORGIDS for Targeted attacks on UAV platform

Attack              FP (%)   FN (%)
Battery tampering   1.42     12.50
Flooding            0.00     11.75
Distance spoofing   2.85     10.30

As can be observed from Table 7.1 and Table 7.2, CORGIDS consistently had fewer FP and FN than ARTINALI. This was primarily because these targeted attacks specifically exploited the physical properties of the UAV, which CORGIDS uses to detect intrusion.
On the other hand, as ARTINALI works by determining the Data-Event-Time interplay in the trace of the CPS, the change in the physical property did not lead to a sizable change in the data part of the invariants, which ultimately led to a lower detection rate for ARTINALI.

For instance, in the case of the battery tampering attack, the rate of battery depletion was halved multiple times, each time for a short random duration, during the UAV operation. As CORGIDS operates by deducing the CPS behavior, during the training phase it deduced the correlation of battery depletion with other physical parameters. Therefore, when CORGIDS observed at run-time that the battery depletion rate was different, even for a short duration, it raised an alarm. ARTINALI, on the other hand, took the Data-Event-Time interplay into consideration during its training phase; although the data variable of one of the D|E invariants included the value of battery charge in the CPS, this led to a lower intrusion detection rate. This was probably because the effect of the change in battery value was small compared to the other invariants that ARTINALI took into account for this CPS. The D|E invariant in ARTINALI works by grouping the values that a variable can take for a particular event (system call/function); the battery level values it observed were not out of the acceptable range, and only the rate of change of those values was different. The same held for the D|T invariant, which did not always pick up that a particular value of a data variable (the battery value in this case) was supposed to occur within a particular time slot. The change in the rate of battery depletion did not produce any noticeable change in the E|T invariant, possibly because the interplay between event and time was untouched in the targeted attacks performed.
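The difference between a value-range check and a rate-of-change check can be illustrated with a minimal sketch. It is a simplified illustration only, not ARTINALI's or CORGIDS's actual code: the battery readings are made up, and the function names, the expected rate and the tolerance are assumptions introduced for this example.

```python
def learn_range(values):
    """Record the minimum and maximum value seen during training."""
    return min(values), max(values)


def range_violations(values, bounds):
    """Return indices whose value falls outside the learned range."""
    low, high = bounds
    return [i for i, v in enumerate(values) if not low <= v <= high]


def rate_violations(values, expected_rate, tolerance):
    """Return indices where the per-step change deviates from the expected rate."""
    flagged = []
    for i in range(1, len(values)):
        rate = values[i] - values[i - 1]
        if abs(rate - expected_rate) > tolerance:
            flagged.append(i)
    return flagged


# Hypothetical benign trace: the battery drops 2 units per step.
benign = [100, 98, 96, 94, 92, 90, 88, 86]
# Hypothetical tampered trace: the depletion rate is halved for steps 3 to 5.
tampered = [100, 98, 96, 95, 94, 93, 91, 89]

bounds = learn_range(benign)
range_hits = range_violations(tampered, bounds)
rate_hits = rate_violations(tampered, expected_rate=-2, tolerance=0.5)
print(range_hits)  # every tampered value is still inside the learned range
print(rate_hits)   # the steps with the halved depletion rate
```

In this toy trace every tampered value stays inside the learned range, so the range check reports nothing, while the rate check flags exactly the steps where the depletion slowed down.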
A similar observation was made for the distance spoofing attack, in which the battery depletion rate of the battery tampering attack was replaced by the distance covered by the drone.

Table 7.3: Breakdown of arbitrary attacks for UAV platform

Attack                       Crash   Hang   SDC   No corruption   Total attacks
Data mutation                18      15     15    17              65
Branch flipping              9       4      4     2               19
Artificial delay insertion   6       4      2     3               15

Table 7.4: Results of intrusion detection by ARTINALI for arbitrary attacks on UAV platform

Attack                       FP (%)   FN (%)
Data mutation                12.50    15.62
Branch flipping              33.30    50.00
Artificial delay insertion   20.00    20.00

The flooding attacks involved re-sending to the GCS some packets which did not originate from the UAV. Note that the extra packets sent by the attacker in this attack contained the same data as some of the previous packets. CORGIDS, after being trained on benign traces in the training phase, generated an alarm when the additional data packets being received changed the probability of the current trace being benign, though the values of the physical properties were the same. This was because the values of physical properties which were received multiple times changed the correlation of "flightTime" with the other variables. In the duplicate packets, the value of "flightTime" was the same as in the benign ones, which led to an overall correlation imbalance and flagged this occurrence. However, in ARTINALI the D|E invariants did not catch the flooding attack because the values present in the traces/data packets were valid, though there were a greater number of packets. So, the D|E invariant did not reflect much change; the E|T invariants, however, often led to the detection of intrusions, as the time range within which an event had to occur changed due to the excessive number of packets.
Similarly, D|T invariants also sometimes flagged the trace as anomalous, because the duplicate packets altered the timeline of the CPS operation, which changed the value that a data variable should have in a given time slot (D|T invariant).

7.2.2 Arbitrary attacks

The attacks - data mutation, branch flipping and artificial delay insertion - described in this section are the arbitrary attacks used by ARTINALI in their experiments. As CORGIDS did not previously use these attacks, only arbitrary attacks from ARTINALI are considered for this comparison. Firstly, Table 7.3 shows the breakdown of the arbitrary attacks that were used to measure the performance of both IDSes, along with how the CPS responded to them.

Secondly, Table 7.4 and Table 7.5 show the results of the arbitrary attacks on both IDS. For data mutation and branch flipping attacks, CORGIDS achieved fewer FP and FN; for artificial delay insertion, however, ARTINALI had fewer FN.

Table 7.5: Results of intrusion detection by CORGIDS for arbitrary attacks on UAV platform

Attack                       FP (%)   FN (%)
Data mutation                9.30     13.65
Branch flipping              16.60    33.00
Artificial delay insertion   20.00    40.00

In data mutation attacks, data variables were randomly mutated, and when ARTINALI was used for intrusion detection, the D|E and D|T invariants detected some anomalies. This was because these invariants were capable of finding the change that occurred in the data variable at run-time when compared to the invariants generated during training. However, the E|T invariant did not show much change during this attack, because the relation of the functions/events that were called remained almost the same. Having said that, ARTINALI led to higher FP and FN, as it observes the value assigned to the variable and not the correlation or pattern exhibited by these data values.
On the other hand, as CORGIDS uses the behavior/correlation exhibited by the CPS to detect intrusion, it led to better performance. Even in the cases where the mutated data variables were function variables, some anomalies were still detected, as the change in function variables propagated and ultimately changed the physical variables.

In the branch flipping attack, the change in the branch executed in a function/event led to the execution of different functions than expected. This attack led to a case where the events which should have been called, and which were used for generating invariants for ARTINALI, were not executed. Therefore, those particular D|E invariants were missing in the system trace, which often led to an anomalous trace being mislabeled as benign. The same held for the E|T invariants, as the functions/events that were called were not monitored in the ARTINALI intrusion detection model. However, the D|T invariants reflected the change, because the change in function execution led to a different value being assigned to the monitored data variable. CORGIDS, on the other hand, detected the attacks based on changes in the values of data variables that led to changes in physical variables, or on changes in the physical variables themselves. As the data variables did not change as required by the behavioral model deduced by CORGIDS, the current trace was flagged as faulty.

Artificial delay insertion attacks affected the E|T and D|T invariants of ARTINALI, which measure how the monitored events are executed (noting the time difference between them) and how the values of data variables change with respect to time, respectively. Both showed a considerable difference from the intrusion detection model used by ARTINALI. This is why ARTINALI was able to detect these attacks with lower FN%.
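A time-based check of the kind described above can be pictured with the following minimal sketch. It is only an illustration of the idea of bounding the time difference between monitored events, not ARTINALI's implementation: the timestamps are invented, and `learn_gap_bounds`, `delayed_events` and the `slack` parameter are hypothetical names introduced here.

```python
def learn_gap_bounds(traces):
    """Learn the smallest and largest gap between consecutive event timestamps."""
    gaps = [b - a for trace in traces for a, b in zip(trace, trace[1:])]
    return min(gaps), max(gaps)


def delayed_events(trace, bounds, slack=0.0):
    """Return indices of events whose gap from the previous event is out of bounds."""
    low, high = bounds
    return [
        i
        for i, (a, b) in enumerate(zip(trace, trace[1:]), start=1)
        if not (low - slack) <= (b - a) <= (high + slack)
    ]


# Hypothetical benign timestamps (seconds) of one monitored event.
training = [[0.0, 1.0, 2.0, 3.0], [0.0, 1.5, 2.5, 4.0]]
# Hypothetical run with an artificially long 3 s gap before the third event.
attacked = [0.0, 1.0, 4.0, 5.0]

bounds = learn_gap_bounds(training)
flagged = delayed_events(attacked, bounds)
print(bounds, flagged)
```

Because the learned bounds come only from the gaps seen in training, an inserted delay that pushes a gap outside those bounds is flagged even when every data value in the trace is valid.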
CORGIDS, on the other hand, used "flightTime" as one of its physical variables, which essentially recorded the time since the UAV started its current flight. The change in the pattern of "flightTime" during the attack, compared to the training traces used for generating the intrusion detection model, led to detection of the attack only in some cases, which resulted in a higher FN% for CORGIDS.

7.3 Comparison on SAP Platform

As stated above, the SAP platform was common to both ARTINALI and CORGIDS, which means that we did not have to select the data and the events required by ARTINALI for constructing its IDS, as they were already available from the study conducted in [29]. Also, for CORGIDS we had already established the physical invariants to be used for intrusion detection in Chapter 6, so we continued using those for this comparison. ARTINALI's targeted attacks on the SAP platform were Continuous Glucose Monitor (CGM) Spoofing and Basal Tampering; these are the same targeted attacks that were used for this study in Chapter 5 (Glucose Spoofing and Insulin Tampering). Therefore, for the targeted attacks category, only two attacks had to be performed. On the other hand, as CORGIDS did not perform any arbitrary attacks and ARTINALI had used them in their research, we decided to also use those attacks for the comparison.

We first discuss the details of the targeted attacks followed by their results, which is followed by a similar analysis for the arbitrary attacks.

7.3.1 Targeted attacks

We perform 2 targeted attacks on the SAP platform. The attacks carried out are CGM Spoofing (or Glucose Spoofing) and Basal Tampering (or Insulin Tampering).
The results shown in Table 7.6 and Table 7.7 are achieved after 5-fold cross-validation for the IDSes derived by ARTINALI and CORGIDS.

Table 7.6: Results of intrusion detection by ARTINALI for targeted attacks on SAP platform

Attack            FP (%)   FN (%)
CGM Spoofing      18.50    4.20
Basal Tampering   15.50    6.50

Table 7.7: Results of intrusion detection by CORGIDS for targeted attacks on SAP platform

Attack            FP (%)   FN (%)
CGM Spoofing      3.80     8.40
Basal Tampering   6.50     5.20

During the experiments, it was observed that ARTINALI relies only on abnormalities in the invariants to detect intrusion. Therefore, if just one invariant is broken, it marks the trace as anomalous. CORGIDS, however, maintains a minimum threshold for generating an alarm, which helps in minimizing the FP and FN, and hence in increasing its performance. Also, ARTINALI works by taking into account the values of data variables encountered in the training trace and builds an IDS from those. It does not deduce the behavior/interconnections of data variables from the training data, which is the reason that higher FP and FN are observed for ARTINALI as compared to CORGIDS.

As stated previously, the attacks carried out were the same for both ARTINALI and CORGIDS, and both techniques are able to detect the attacks, though with different FP and FN rates. The difference in the FP and FN arose from the difference between the ARTINALI and CORGIDS approaches. ARTINALI works by collecting all the values of data variables provided in the training traces. CORGIDS, however, tries to learn the behavior of the system from the training traces. It does not depend on particular values of variables; rather, it tries to build a bigger picture and deduces the behavior, not the value of individual variables, which is important in CPS.

For instance, in the CGM spoofing attack, the blood glucose values are manipulated, which leads to a wrong dosage of insulin being calculated by the controller.
When ARTINALI was used for detecting intrusions, it led to a lower detection rate, mainly because the altered values did not deviate much from the original values. This led to less variation in the D|E and D|T invariants, as the values of the data part (blood glucose) remained almost the same. The E|T invariant from ARTINALI, on the other hand, remained unaffected, as there was no change in the execution flow of the CPS. CORGIDS, however, detected intrusions in most of the cases, as the changes in blood glucose value led to changes in the correlation with the other physical variables. A similar observation was made for the basal tampering attack, where blood glucose manipulations were replaced by insulin dosage.

7.3.2 Arbitrary attacks

Table 7.8: Arbitrary attacks on SAP platform and its response

Attack                       Crash   Hang   SDC   No corruption   Total attacks
Data mutation                22      20     18    25              85
Branch flipping              8       2      5     3               18
Artificial delay insertion   4       5      3     5               17

Table 7.9: Results of intrusion detection by ARTINALI for arbitrary attacks on SAP platform

Attack                       FP (%)   FN (%)
Data mutation                14.0     11.62
Branch flipping              12.50    12.50
Artificial delay insertion   0.0      12.50

In this subsection we discuss the arbitrary attacks and their effect on the SAP platform, along with the performance results of ARTINALI and CORGIDS. The arbitrary attacks were used by ARTINALI to measure its performance; hence, for completeness and fairness, we test both IDS on these attacks. Three attacks were carried out, namely data mutation, branch flipping and artificial delay insertion.
A breakdown of the attacks and how the system responded to them is shown in Table 7.8.

The results of both IDS detecting the arbitrary attacks on the SAP platform are shown in Table 7.9 and Table 7.10.

Table 7.10: Results of intrusion detection by CORGIDS for arbitrary attacks on SAP platform

Attack                       FP (%)   FN (%)
Data mutation                7.0      4.65
Branch flipping              12.50    0.0
Artificial delay insertion   12.50    12.50

The results show that CORGIDS performed better than ARTINALI in the data mutation and branch flipping attacks. This was mainly because ARTINALI does not take into account the change in behavior; rather, it works with the change in the value of particular data variables. Therefore, in the case of the data mutation attack, if the variables being tampered with did not belong to the variables used in intrusion detection, the attack went unnoticed. This scenario arose mostly for the D|T and D|E invariants, which are the invariants that use data variables for intrusion detection. E|T invariants did not change significantly, as neither the events nor the timeline were affected by this attack. For CORGIDS, however, even if the physical variables used to detect intrusions were not mutated, the change in function variables propagated and ultimately changed the physical variables, which was detected by CORGIDS.

A similar result was observed for branch flipping attacks, where randomly chosen branch conditions were flipped to cause an abnormal execution flow in the CPS. This attack had an effect on all the invariants of ARTINALI - D|E, E|T and D|T - because the change in execution flow changed the events that were being called and ultimately the data variables being captured. These events and data variables were in some cases different from the ones being monitored, thus leading to false negatives.
On the other hand, for CORGIDS, a change in the execution flow led to different values of the physical variables, as the code which was executed had changed, therefore flagging the current trace.

For the artificial delay insertion attack, however, it was observed that ARTINALI achieved better performance than CORGIDS. This was because the time change had a smaller effect on the variables monitored by CORGIDS. In this attack, the E|T and D|T invariants were affected, as they monitor the time change with events and data variables respectively. D|E invariants, however, showed almost no change during this attack. Therefore, as ARTINALI also uses time invariants to detect intrusions, it was able to detect intrusions easily for this attack. CORGIDS, however, had a greater FP%, as time was not one of the physical variables being monitored. This led to an incorrect correlation calculation by CORGIDS, which was seen to flag even benign executions as attacks.

7.4 Summary

This chapter described the experiments conducted to compare CORGIDS with ARTINALI, and to find out how the two IDS performed given a set of attacks and test-beds. First, the procedure and the choices made for conducting this experiment were explained. This was followed by the experiments conducted on both test-beds, which included the targeted and arbitrary attacks from both IDS. The arbitrary attacks were injected manually at randomly chosen program points. It was found that CORGIDS performed better than ARTINALI in all the targeted attacks and arbitrary attacks (except artificial delay insertion) for both test-beds. The reason behind this observation was that CORGIDS uses the correlation between the physical properties of the CPS under test. During the targeted attacks, the physical properties were maliciously altered, and the attack was detected with fewer FP and FN by CORGIDS because it directly violated the correlation.
However, for artificial delay insertion, the physical properties being monitored did not change much during this particular attack, which led to performance degradation. The other arbitrary attacks, such as branch flipping and data mutation, had a better detection rate with CORGIDS, as in most cases the change in variables led, directly or indirectly, to a change in the monitored physical properties; this was, however, not the case for ARTINALI.

Chapter 8

Discussion

This chapter first details the threats to validity, followed by the generic applicability of CORGIDS and how it can be circumvented.

8.1 Threats to validity

Threats to validity are arguments that could be used to support a claim contrary to the conclusions of a particular study. Identifying them helps bring out the areas within the research which might pose a threat to it.

The three threats to validity that are considered in this research are:

• An Internal threat to this work is the consideration of only five targeted attacks to gauge its performance. For instance, this study does not experiment with other types of targeted attacks such as dropping attacks or DOS attacks. We attempt to mitigate this threat by choosing attacks which are very different in nature and exploit different domains of the test-beds. Also, arbitrary attacks are used, which form the building blocks of the attacks that might occur in the future. Another internal threat is the use of simulations to gauge the effectiveness and performance of CORGIDS. Though this threat is substantial, we attempt to mitigate it by keeping the simulations as unbiased as possible. For instance, for each flight of a UAV, the number of way-points, the latitude and longitude of each way-point, and the altitude were randomized.
However, for future work, CORGIDS will also be evaluated on a real test-bed.

• An External threat is the use of only two test-beds from the CPS domain to show that this approach is effective and general. However, finding test-beds which are publicly available (open-source) and also security critical is a difficult task. We attempted to mitigate this threat by choosing two test-beds which are entirely different in behavior and utility. A UAV is used for flight operations and relies on the physical laws of motion, while the SAP is a medical device and uses biological properties of the human body to calculate the appropriate amount of insulin to be injected.

• Finally, the Construct threat to validity is the use of only FP, FN, precision and recall for the evaluation of CORGIDS. However, these metrics are also used substantially by prior work in this area and are therefore valid for comparison purposes.

8.2 A Generic IDS

Building a generic IDS for systems exhibiting correlation is one of the key contributions of this thesis. This approach utilizes the correlation exhibited by logical properties, for instance speed, distance traveled, altitude, and flight time. These values depend on each other and change only according to a predefined framework, for example, the laws of physics for a UAV. However, CORGIDS cannot be applied to systems which do not exhibit such correlations. For instance, systems other than CPS and financial systems, in which no correlation can be found between the variables/properties, are not candidates for using CORGIDS.

8.3 Circumventing CORGIDS

As discussed in Section 1.2, it is assumed that the attacker has capabilities which can be used to plant an attack on the SUT.
An attacker who knows about the internal working of the system can circumvent the intrusion detection done by CORGIDS. An example scenario in which the attack will go undetected is when the attacker hacks the SUT and changes the logging module of CORGIDS to send correctly correlated values of the physical properties, irrespective of them being faulty at that point in time. If the attacker were to continue this operation throughout the UAV flight, CORGIDS would not be able to detect the intrusion, because correctly correlated values would be received by the GCS. However, updating all the correlated values every second during the entire flight is constrained by power consumption, time and effort [28]. So, the attacker would likely not be able to forge the values throughout the entire duration, leading to some discrepancy in the values of the logical properties, which would be flagged by CORGIDS. Another case where the IDS could fail is when the attacker manipulates the CPS but keeps it very close to the benign behavior. As CORGIDS will not see any correlation deviating by a large amount from its original state, the attack will most probably go unnoticed. However, this type of attack leads to very little deviation from the original benign behavior, thus causing no harm to the CPS.

Another point to note is that this study demonstrated the effect of varying a single correlated property for intrusion detection. Varying multiple properties in the system will have a similar effect and will lead to an unbalanced correlation, which will be spotted by the HMM. Also, varying the rate of increase or decrease of the anomalous correlated property will cause variations in the log probability of the current system state, which will be marked malicious.
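The role of the log probability in this decision can be sketched with a toy example. The code below is a minimal illustration under stated assumptions, not the CORGIDS implementation: it uses a small discrete HMM with made-up parameters, a standard scaled forward algorithm, and an assumed threshold of -8.0; the names `log_likelihood` and `is_anomalous` are introduced only for this sketch.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: returns log P(obs) under a discrete HMM."""
    states = range(len(start))
    # Initial step: start probability times emission of the first symbol.
    alpha = [start[s] * emit[s][obs[0]] for s in states]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by emission.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in states) * emit[s][obs[t]]
                for s in states
            ]
        # Normalise at every step and accumulate the log of the scale factor.
        scale = sum(alpha)
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob


def is_anomalous(log_prob, threshold):
    """Flag a trace whose log probability falls below the threshold."""
    return log_prob < threshold


# Hypothetical model: 2 hidden states, 3 observation symbols.
# Symbol 2 stands for a reading that breaks the learned correlation.
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.70, 0.25, 0.05], [0.25, 0.70, 0.05]]
threshold = -8.0  # assumed value; in practice it would be derived from benign traces

benign_trace = [0, 0, 0, 1, 1, 1]
suspect_trace = [0, 2, 0, 2, 1, 2]

benign_ll = log_likelihood(benign_trace, start, trans, emit)
suspect_ll = log_likelihood(suspect_trace, start, trans, emit)
print(benign_ll, is_anomalous(benign_ll, threshold))
print(suspect_ll, is_anomalous(suspect_ll, threshold))
```

With these assumed parameters the trace containing uncorrelated readings scores well below the benign one. A malicious trace whose log probability stays above the threshold would not be separated from benign behavior by this kind of check.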
However, if the log probability of the current malicious state is very close to the benign state's log probability, it is likely that CORGIDS would not be able to distinguish between these two states, and thus the attack would not be detected.

Chapter 9

Conclusion and Future work

9.1 Summary

CPS have become ubiquitous in the past few years. They are different from traditional computer systems, as they need to interact with the physical environment, which is subject to the laws of physics. Owing to the constraints that these CPS possess, they have become targets of attackers, who exploit their insufficient protection and interconnectedness. IDS which are specifically tailored for CPS have been devised in the past to secure these systems from tailored attacks. The key point to note here is that the physical properties of the CPS depict the current state of the system. These properties can be used to study the correlation exhibited and, further, to detect the presence of abnormal activity. Using this key insight, we build Correlation-based Generic Intrusion Detection System (CORGIDS), a generic IDS which dynamically generates the physical invariants using the physical properties of the CPS. We test the prototype on two behaviorally different CPS test-beds, namely i) a UAV, and ii) a Smart Artificial Pancreas (SAP). We find that CORGIDS produces significantly lower FP and FN for the 5 targeted attacks that were tested on the test-beds.

Though the physical properties of the CPS have previously been used to build IDS, all the solutions either use manually defined physical rules/invariants, or dynamically build the invariants but only for a specific CPS. Both of the invariant generation techniques mentioned require more developer effort and time, and manually defining invariants specifically needs in-depth knowledge of the CPS, or else the invariants defined for intrusion detection could themselves be faulty.
To fill this gap, this thesis proposes a generic IDS, designed for systems which exhibit correlations, like CPS. Therefore, though a UAV and a SAP are completely different in their operational behavior and uses, CORGIDS can be applied to both systems, as it tries to infer the correlation present within the set of physical properties of each test-bed.

The related work which dynamically generates physical invariants for CPS uses algorithms/models such as i) Pearson Correlation Coefficient (PCC), which is useful for measuring linear correlation among data variables but is not designed for non-linear multi-dimensional data; ii) Support Vector Machine (SVM), which distinguishes between malicious and benign behavior; however, these models are not well suited for time-series based systems; and iii) Data-Time-Event interplay, which was used to deduce how the values of various data points in different events changed over the course of the functioning of the CPS, but did not learn to deduce how the CPS was behaving. We, on the other hand, used a Hidden Markov Model (HMM) to identify the correlations between non-linear multi-dimensional data of the CPS. HMMs are much more resilient to outliers and noise than other techniques, and do not presuppose a distribution of the properties, making them generically applicable.

Further, CORGIDS was also compared with ARTINALI, a generic IDS using Data-Event-Time interplay to detect intrusion. Attacks from both IDS were combined to generate a superset of attacks on which ARTINALI and CORGIDS were tested. In particular, both targeted and arbitrary attacks from ARTINALI and CORGIDS were used to provide a level playing field for both techniques. Results from this comparison showed that CORGIDS performed with lower FP and FN than ARTINALI for all the targeted and arbitrary attacks (except the artificial delay insertion attack).
Our results showed that, because CORGIDS deduces the behavior of the CPS under test during the training phase, whereas ARTINALI focuses on gathering the Data-Event-Time interplay, which does not include the correlation among different physical properties, CORGIDS was able to detect more attacks, as the attacks ultimately led to changes in the physical properties of the CPS.

9.2 Future work

9.2.1 Implementation of CORGIDS on a real test-bed

As part of the future work, CORGIDS could be implemented on a real test-bed. This will help in understanding the differences between real-world testing and simulations, if any. There may be cases where real-world testing leads to noise in the system traces due to environmental factors. Therefore, when using the real test-bed traces for training purposes, they could either be refined to remove noise or be used directly, as they provide a basis for the IDS to learn/observe the environment as it is. For instance, in this study Ardupilot's SITL simulator is used for experiments; in the future, the same experiments could be performed on the real UAV platform. The experiment on a real UAV platform will allow a detailed comparison of the performance of CORGIDS - FP, FN, performance and memory overheads - against its simulated version. Alternatively, a completely new test-bed, for example a smart rover, could be used to gauge the efficacy of CORGIDS. As CORGIDS only requires that the SUT exhibit correlation among its properties, it can be easily modified for use in different systems.

9.2.2 Testing efficacy of CORGIDS on other attacks

Another area of improvement for this thesis could be experimentation with additional attacks.
As described in Chapter 5, this thesis considers only 5 targeted attacks on the 2 test-beds, and could be expanded by testing CORGIDS on other network attacks, DOS and message dropping attacks.

9.2.3 Identifying the malicious property using CORGIDS

An interesting addition to CORGIDS would be the ability to detect the physical property which is destabilizing the CPS. CORGIDS, by monitoring the correlation of the physical properties of the CPS, is able to detect whether there is malicious activity in the CPS. However, it does not identify the physical property which is breaking the correlation. For example, in the distance spoofing attack on the UAV, faulty values of distance traveled break the correlations, and in the SAP, the insulin tampering attack results in faulty insulin values being injected into the patient. Therefore, as an extension to the current work, if we could pinpoint the faulty physical property using the abnormal correlations, the operator could take the necessary action to investigate the current situation. This diagnosis from CORGIDS would help in decreasing the operator's time and manual effort when an attack is detected.

9.2.4 Coupling of an automated mitigation technique with CORGIDS

Another possible extension of this work could be the integration of a mitigation technique with CORGIDS. CORGIDS is responsible only for the detection of attacks in CPS; that is, once the attack happens, CORGIDS alerts the controller about it. As of now, the rescue or mitigation is left to the operator, who will most probably try to control the CPS under attack manually. However, an integrated mitigation technique could come in handy at this point, where the mitigation software coupled with CORGIDS is intelligent enough to know what steps need to be executed in order to stabilize the CPS which was acting maliciously. This technique could offer immediate resolution of the conflict, rather than waiting for the operator to take an action, which will require time.
The automated mitigation process could also be based upon the physical properties of the CPS, continuously monitoring the current physical status and coming into play once CORGIDS raises an alarm. For instance, consider an attack where the attacker modifies the controller in such a way that it sends faulty acceleration values. In particular, the new malicious acceleration values being sent to the actuators cause a UAV to descend at an abnormal speed, which could eventually lead to a crash. Due to the change in acceleration (a physical property) of the UAV, CORGIDS will detect an intrusion and raise an alarm. At this point, the automated mitigation technique, which was monitoring all the changes in the physical environment, kicks in and tries to stabilize the system by either activating the fail-safe mode, if any, or putting the UAV in a hovering state, which breaks its steep descent and gives the operator a stable system that can then be safely navigated to the desired position.

Bibliography

[1] S. Karnouskos, “Cyber-physical systems in the smartgrid,” in 2011 9th IEEE International Conference on Industrial Informatics, pp. 20–23, IEEE, 2011.
[2] G. N. Ericsson, “Cyber security and power system communication - essential parts of a smart grid infrastructure,” IEEE Transactions on Power Delivery, vol. 25, no. 3, pp. 1501–1507, 2010.
[3] S. Checkoway, D. McCoy, B. Kantor, D. Anderson, H. Shacham, S. Savage, K. Koscher, A. Czeskis, F. Roesner, T. Kohno, et al., “Comprehensive experimental analyses of automotive attack surfaces,” in USENIX Security Symposium, San Francisco, 2011.
[4] J. Yang and J. Coughlin, “In-vehicle technology for self-driving cars: Advantages and challenges for aging drivers,” International Journal of Automotive Technology, vol. 15, no. 2, pp. 333–340, 2014.
[5] A. Y. Javaid, W. Sun, V. K. Devabhaktuni, and M.
Alam, “Cyber security threat analysis and modeling of an unmanned aerial vehicle system,” in Homeland Security (HST), 2012 IEEE Conference on Technologies for, pp. 585–590, IEEE, 2012.
[6] F. Mohammed, A. Idries, N. Mohamed, J. Al-Jaroodi, and I. Jawhar, “Uavs for smart cities: Opportunities and challenges,” in 2014 International Conference on Unmanned Aircraft Systems (ICUAS), pp. 267–273, IEEE, 2014.
[7] F. Skopik, Z. Ma, T. Bleier, and H. Grüneis, “A survey on threats and vulnerabilities in smart metering infrastructures,” International Journal of Smart Grid and Clean Energy, vol. 1, no. 1, pp. 22–28, 2012.
[8] J. Liu, Y. Xiao, S. Li, W. Liang, and C. P. Chen, “Cyber security and privacy issues in smart grids,” IEEE Communications Surveys & Tutorials, vol. 14, no. 4, pp. 981–997, 2012.
[9] S. Woo, H. J. Jo, and D. H. Lee, “A practical wireless attack on the connected car and security protocol for in-vehicle can,” IEEE Transactions on intelligent transportation systems, vol. 16, no. 2, pp. 993–1006, 2015.
[10] N. Leavitt, “Researchers fight to keep implanted medical devices safe from hackers,” Computer, vol. 43, no. 8, pp. 11–14, 2010.
[11] J. Radcliffe, “Hacking medical devices for fun and insulin: Breaking the human scada system,” in Black Hat Conference presentation slides, vol. 2011, 2011.
[12] A. Davanian, F. Massacci, and L. Allodi, “Diversity: A poor man's solution to drone takeover,” in PECCS, pp. 25–34, 2017.
[13] H. Alemzadeh, D. Chen, X. Li, T. Kesavadas, Z. T. Kalbarczyk, and R. K. Iyer, “Targeted attacks on teleoperated surgical robots: Dynamic model-based detection and mitigation,” in 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pp. 395–406, IEEE, 2016.
[14] A. Seshadri, M. Luk, E. Shi, A. Perrig, L. Van Doorn, and P.
Khosla,“Pioneer: verifying code integrity and enforcing untampered code executionon legacy systems,” in ACM SIGOPS Operating Systems Review, vol. 39,pp. 1–16, ACM, 2005. → page 4[15] T. Lu, J. Zhao, L. Zhao, Y. Li, and X. Zhang, “Towards a framework forassuring cyber physical system security,” Int. J. Security Appl., vol. 9, no. 3,pp. 25–40, 2015. → page 567[16] R. Mitchell and R. Chen, “Behavior rule specification-based intrusiondetection for safety critical medical cyber physical systems,” IEEETransactions on Dependable and Secure Computing, vol. 12, no. 1,pp. 16–30, 2015.[17] G. Bernieri, F. Del Moro, L. Faramondi, and F. Pascucci, “A testbed forintegrated fault diagnosis and cyber security investigation,” in Control,Decision and Information Technologies (CoDIT), 2016 InternationalConference on, pp. 454–459, IEEE, 2016. → page 5[18] R. Mitchell and I.-R. Chen, “Specification based intrusion detection forunmanned aircraft systems,” in Proceedings of the first ACM MobiHocworkshop on Airborne Networks and Communications, pp. 31–36, ACM,2012. → pages 5, 6, 12, 26[19] R. Mitchell and R. Chen, “Adaptive intrusion detection of maliciousunmanned air vehicles using behavior rule specifications,” IEEETransactions on Systems, Man, and Cybernetics: Systems, vol. 44, no. 5,pp. 593–604, 2014. → page 12[20] A. Choudhari, H. Ramaprasad, T. Paul, J. W. Kimball, M. Zawodniok,B. McMillin, and S. Chellappan, “Stability of a cyber-physical smart gridsystem using cooperating invariants,” in Computer Software andApplications Conference (COMPSAC), 2013 IEEE 37th Annual,pp. 760–769, IEEE, 2013. → pages 6, 12[21] Y. Chen, C. M. Poskitt, and J. Sun, “Learning from mutants: Using codemutation to learn and monitor invariants of a cyber-physical system,” arXivpreprint arXiv:1801.00903, 2018. → pages 7, 8, 12, 14, 38, 39, 40[22] Z. Zohrevand, U. Glasser, H. Y. Shahir, M. A. Tayebi, and R. 
Costanzo,“Hidden markov based anomaly detection for water supply systems,” in BigData (Big Data), 2016 IEEE International Conference on, pp. 1551–1560,IEEE, 2016. → pages 5, 8, 13, 33, 38, 39, 40[23] T. Paul, J. W. Kimball, M. Zawodniok, T. P. Roth, B. McMillin, andS. Chellappan, “Unified invariants for cyber-physical switched system68stability,” IEEE Transactions on Smart Grid, vol. 5, no. 1, pp. 112–120,2014. → pages 6, 12[24] S. Adepu and A. Mathur, “Using process invariants to detect cyber attackson a water treatment system,” in IFIP International Information Securityand Privacy Conference, pp. 91–104, Springer, 2016. → pages 6, 12[25] S. R. Eddy, “Hidden markov models,” Current opinion in structural biology,vol. 6, no. 3, pp. 361–365, 1996. → page 6[26] X. Tan and H. Xi, “Hidden semi-markov model for anomaly detection,”Applied Mathematics and Computation, vol. 205, no. 2, pp. 562–567, 2008.→ page 7[27] V. Jadhav and P. Devale, “Anomaly detection on user browsing behaviors forprevention app ddos,” International Journal of Advances in Engineering &Technology, vol. 1, no. 5, p. 492, 2011. → page 7[28] M. Krotofil, J. Larsen, and D. Gollmann, “The process matters: Ensuringdata veracity in cyber-physical systems,” in Proceedings of the 10th ACMSymposium on Information, Computer and Communications Security,pp. 133–144, ACM, 2015. → pages 7, 8, 13, 38, 61[29] M. R. Aliabadi, A. A. Kamath, J. Gascon-Samson, and K. Pattabiraman,“Artinali: Dynamic invariant detection for cyber-physical system security,”2017. → pages 8, 13, 14, 24, 30, 38, 39, 40, 54[30] M. Iturbe, J. Camacho, I. Garitano, U. Zurutuza, and R. Uribeetxeberria,“On the feasibility of distinguishing between process disturbances andintrusions in process control systems using multivariate statistical processcontrol,” arXiv preprint arXiv:1706.01679, 2017. → pages 8, 13, 38[31] M. D. Ernst, J. H. Perkins, P. J. Guo, S. McCamant, C. Pacheco, M. S.Tschantz, and C. 
Xiao, “The daikon system for dynamic detection of likelyinvariants,” Science of Computer Programming, vol. 69, no. 1, pp. 35–45,2007. → page 1069[32] C. Csallner, N. Tillmann, and Y. Smaragdakis, “Dysy: Dynamic symbolicexecution for invariant inference,” in Proceedings of the 30th internationalconference on Software engineering, pp. 281–290, ACM, 2008. → page 10[33] A. Baliga, V. Ganapathy, and L. Iftode, “Automatic inference andenforcement of kernel data structure invariants,” in Computer SecurityApplications Conference, 2008. ACSAC 2008. Annual, pp. 77–86, IEEE,2008. → page 11[34] A. Baliga, V. Ganapathy, and L. Iftode, “Detecting kernel-level rootkitsusing data structure invariants,” IEEE Transactions on Dependable andSecure Computing, vol. 8, no. 5, pp. 670–684, 2011. → page 11[35] C. Csallner, Y. Smaragdakis, and T. Xie, “Dsd-crasher: A hybrid analysistool for bug finding,” ACM Transactions on Software Engineering andMethodology (TOSEM), vol. 17, no. 2, p. 8, 2008. → pages 10, 11[36] J. Yang, D. Evans, D. Bhardwaj, T. Bhat, and M. Das, “Perracotta: miningtemporal api rules from imperfect traces,” in Proceedings of the 28thinternational conference on Software engineering, pp. 282–291, ACM,2006. → page 11[37] M. Gabel and Z. Su, “Javert: fully automatic mining of general temporalproperties from dynamic traces,” in Proceedings of the 16th ACM SIGSOFTInternational Symposium on Foundations of software engineering,pp. 339–349, ACM, 2008. → page 11[38] M. Gabel and Z. Su, “Online inference and enforcement of temporalproperties,” in Proceedings of the 32nd ACM/IEEE International Conferenceon Software Engineering-Volume 1, pp. 15–24, ACM, 2010. → page 11[39] I. Beschastnikh, Y. Brun, S. Schneider, M. Sloan, and M. D. Ernst,“Leveraging existing instrumentation to automatically inferinvariant-constrained models,” in Proceedings of the 19th ACM SIGSOFTsymposium and the 13th European conference on Foundations of softwareengineering, pp. 267–277, ACM, 2011. 
→ page 1170[40] C. Lemieux, D. Park, and I. Beschastnikh, “General ltl specification mining(t),” in Automated Software Engineering (ASE), 2015 30th IEEE/ACMInternational Conference on, pp. 81–92, IEEE, 2015. → page 11[41] A. Khurshid, W. Zhou, M. Caesar, and P. Godfrey, “Veriflow: Verifyingnetwork-wide invariants in real time,” ACM SIGCOMM ComputerCommunication Review, vol. 42, no. 4, pp. 467–472, 2012. → page 11[42] S. Hangal, N. Chandra, S. Narayanan, and S. Chakravorty, “Iodine: a tool toautomatically infer dynamic invariants for hardware designs,” inProceedings of the 42nd annual Design Automation Conference,pp. 775–778, ACM, 2005. → page 11[43] A. Waksman and S. Sethumadhavan, “Tamper evident microprocessors,” inSecurity and Privacy (SP), 2010 IEEE Symposium on, pp. 173–188, IEEE,2010. → page 11[44] M. Ferrer, I. Alonso, and C. Travieso, “Influence of initialisation and stopcriteria on hmm based recognisers,” Electronics Letters, vol. 36, no. 13,pp. 1165–1166, 2000. → pages 16, 41[45] “Ardupilot software in the loop.” May 13, 2018. → page 22[46] J.-S. Pleban, R. Band, and R. Creutzburg, “Hacking and securing the ar.drone 2.0 quadcopter: investigations for improving the security of a toy,” inMobile Devices and Multimedia: Enabling Technologies, Algorithms, andApplications 2014, vol. 9030, p. 90300L, International Society for Opticsand Photonics, 2014. → pages 27, 28[47] N. M. Rodday, R. d. O. Schmidt, and A. Pras, “Exploring securityvulnerabilities of unmanned aerial vehicles,” in Network Operations andManagement Symposium (NOMS), 2016 IEEE/IFIP, pp. 993–994, IEEE,2016. → page 2771[48] D. I. Urbina, J. A. Giraldo, A. A. Cardenas, N. O. Tippenhauer, J. Valente,M. Faisal, J. Ruths, R. Candell, and H. Sandberg, “Limiting the impact ofstealthy attacks on industrial control systems,” in Proceedings of the 2016ACM SIGSAC Conference on Computer and Communications Security,pp. 1092–1105, ACM, 2016. → page 34[49] M. R. Aliabadi, “Artinali github page.” → page 4972







Related Items