UBC Theses and Dissertations

Error propagation analysis of multithreaded programs using likely invariants. Chan, Abraham (2017).

Full Text


Error Propagation Analysis of Multithreaded Programs Using Likely Invariants

by

Abraham Chan

B.A.Sc., The University of British Columbia, 2015

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Applied Science in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Electrical and Computer Engineering)

The University of British Columbia (Vancouver)

August 2017

© Abraham Chan, 2017

Abstract

Error Propagation Analysis (EPA) is a technique for understanding how errors affect a program's execution and result in program failures. For this purpose, EPA usually compares the traces of a fault-free (golden) run with those from a faulty run of the program. This makes existing EPA approaches brittle for multithreaded programs, which do not typically have a deterministic golden run. In this thesis, we study the use of likely invariants generated by automated approaches as alternatives for golden run based EPA in multithreaded programs. We present Invariant Propagation Analysis (IPA), an approach and a framework for automatically deriving invariants for multithreaded programs, and using the invariants for EPA. We evaluate the invariants derived by IPA in terms of their coverage for different fault types across six representative programs through fault injection experiments. We find that stable invariants can be inferred in all six programs, although their coverage of faults depends on the application and the fault type.

Lay Summary

Software bugs remain prevalent in computing systems. One can either detect and eliminate bugs, or try to tolerate bugs in the software. Fault injection is a technique that assesses the ability of software programs to tolerate bugs by introducing faults into the program. Following a fault injection assessment, error propagation analysis (EPA) is conducted in order to understand how the injected faults affect the program.
Existing EPA approaches utilize a straight-line comparison between a golden (fault-free) trace and a faulty trace of the program. However, multithreaded programs produce different traces across fault-free runs, making such trace comparisons unsound for EPA. We present Invariant Propagation Analysis (IPA) as an alternative framework for EPA, which uses likely invariants instead of golden traces. Unlike golden traces, likely invariants are properties on the program state that do not change between runs. We evaluate the use of likely invariants for the EPA problem.

Preface

This thesis is based on work conducted by myself in collaboration with Dr. Karthik Pattabiraman, Dr. Stefan Winter, Habib Saissi, and Dr. Neeraj Suri. This work was published in the 2017 IEEE International Conference on Software Testing, Verification and Validation [4]. I was responsible for coming up with the solution and validating it, evaluating the solution and analysing the results, and writing the paper. Stefan contributed primarily to the introduction and background sections of the paper. Habib designed the theoretical system model for the framework. Karthik and Neeraj were responsible for guiding me with the solution reasoning, experiment design and results analysis, as well as editing and writing portions of the paper.

Abraham Chan, Stefan Winter, Habib Saissi, Karthik Pattabiraman and Neeraj Suri. IPA: Error Propagation Analysis of Multi-Threaded Programs Using Likely Invariants. IEEE International Conference on Software Testing, Verification and Validation (ICST), 2017.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication
1 Introduction
2 Background and Related Work
  2.1 Fault Injection
  2.2 Error Propagation Analysis (EPA)
  2.3 EPA in Multithreaded Programs
  2.4 Likely Invariants
  2.5 Summary
3 Methodology
  3.1 System Model
  3.2 Solution Overview
  3.3 IPA: EPA Using Likely Invariants
  3.4 Example
  3.5 Summary
4 Experimental Evaluation
  4.1 Research Questions
  4.2 Experimental Setup
  4.3 Fault Model
  4.4 RQ0: Golden Run Variance
  4.5 RQ1: Stability
  4.6 RQ2: Coverage
  4.7 RQ3: Invariant Classification
  4.8 RQ4: Performance Evaluation
  4.9 RQ5: Inferring Invariants at a Lower Confidence
  4.10 RQ6: Inferring Invariants at a Finer Granularity
  4.11 Summary
5 Discussion
  5.1 Implications
  5.2 Threats to Validity
  5.3 False Positives and False Negatives
    5.3.1 False Positives
    5.3.2 False Negatives
6 Conclusions and Future Work
  6.1 Conclusions
  6.2 Future Work
Bibliography

List of Tables

Table 4.1 Description of faults injected using LLFI
Table 4.2 Invariant counts and classification (refer to Table 4.3) of IPA's generated invariants
Table 4.3 Description of Invariant Classes
Table 4.4 Classification of violated invariants from 1000 faulty Quicksort runs and their coverage
Table 4.5 Classification of violated invariants from 1000 faulty Swaptions runs and their coverage
Table 4.6 IPA vs EPA Performance Measured in Seconds for step numbers 1, 2 and 3 (refer to Figure 3.1)
Table 4.7 Number of invariants inferred per confidence level

List of Figures

Figure 2.1 Example multithreaded function with labelled basic block traces for func(4)
Figure 3.1 IPA: Invariant-based EPA Model
Figure 3.2 Example function in the Blackscholes application
Figure 4.1 Golden run based EPA
Figure 4.2 Average variance between golden run traces
Figure 4.3 Number of invariants generated from varying numbers of profiling runs for six benchmark applications
Figure 4.4 Proportion of 1000 faulty Quicksort runs that violate at least one invariant
Figure 4.5 Proportion of 1000 faulty Swaptions runs that violate at least one invariant
Figure 4.6 Proportion of 1000 faulty Blackscholes runs that violate at least one invariant
Figure 4.7 Proportion of 1000 faulty Streamcluster runs that violate at least one invariant
Figure 4.8 Proportion of 1000 faulty Nullhttpd runs that violate at least one invariant
Figure 4.9 Proportion of 1000 faulty Nbds runs that violate at least one invariant
Figure 4.10 Control flow graph with labelled basic blocks of the function shown in Figure 3.2
Figure 4.11 Number of invariants generated from varying numbers of profiling runs at the basic block level
Figure 4.12 Number of invariants generated from varying numbers of profiling runs at the basic block level

Acknowledgements

First, I would like to thank Dr. Karthik Pattabiraman for his support over the past two years. This thesis could not have been completed without his technical and non-technical guidance. Second, I would also like to thank Dr. Stefan Winter and Dr. Neeraj Suri for their numerous iterations over my conference paper.
Lastly, I would like to thank my lab colleagues for providing their regular feedback during weekly meetings.

Dedication

To my parents.

Chapter 1

Introduction

Software bugs are ever present in software programs, due to programmers' mistakes and oversights. With the increasing demand for more complex software systems, many systems may be deployed with both known and unknown bugs, without fully knowing their consequences [25]. The simplest software bug consists of a syntactical mistake, where a programmer may unintentionally utilize the wrong operator for an expression. Developers usually write unit and regression tests in an attempt to catch software bugs. However, more complicated software bugs such as memory corruptions may not be detectable by traditional testing methods. Instead, these bugs can be simulated through artificial software faults, and their effects on the target program can be analysed.

Static analysis tools have been explored as an option to automatically discover common software bugs without executing the target program. Such tools usually search for patterns in the source code or program binary that match entries in a bug database. However, static analysis tools have several key drawbacks. First, such tools have been found to exhibit high false positive rates [43]. Second, static analysis tools are only able to detect compile-time bugs rather than runtime bugs. Alternatively, dynamic analysis has been applied for bug detection by requiring concrete program executions with set inputs and by comparing the program output with an expected value. Standard testing techniques such as unit and regression tests are examples of dynamic testing. In addition to bug detection, dynamic testing techniques also allow developers to understand the effects of a bug on the program output.
Artificial software faults that effectively emulate real software bugs can offer additional insight and foster the development of specific fault recovery mechanisms in the program [31].

Software fault injection (SFI) [1, 7, 30] is a technique for testing the robustness of software to faults in its operational environment. For this purpose, SFI introduces faults into the software under test (SUT) and its environment, executes the SUT, and observes its behaviour under the faults. While the nature of SFI closely resembles that of mutation analysis, the faults considered in SFI are not limited to syntactical mutation operators, but also include real-world faults (e.g., software bugs). Similar to other types of robustness tests, such as fuzzing approaches, SFI relies on negative oracles to determine if a test passes. Unlike traditional tests that pass if the SUT's output matches an expected output, SFI tests pass if certain undesired behaviour does not occur (e.g., program crashes).

An important type of negative oracle is error propagation, i.e., the corruption of a module's internal state by an injected fault. In general, error propagation is undesirable because it can lead to failures that are hard to diagnose and recover from. The identification of how such state corruptions evolve from a fault activation in execution is referred to as error propagation analysis (EPA). EPA typically requires capturing a detailed execution trace of the program when running a test. After the termination of a test, its execution trace is compared to an execution trace from a fault-free execution, also known as the golden run [5, 15, 23]. Any deviation from the golden run is considered to be an instance of error propagation.

While the golden run comparison technique works well for EPA of deterministic programs and execution environments, it can lead to spurious outcomes in the presence of non-determinism, which can cause trace deviations that do not indicate error propagation.
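At its core, the golden run comparison just described reduces to an element-wise diff of two traces. The following minimal sketch (our own illustration, not a specific tool's implementation) models traces as arrays of traced values and reports where a faulty run first deviates:

```c
#include <stddef.h>

/* Minimal sketch of golden run based EPA: compare a faulty trace against
   the golden trace element by element. Returns the index of the first
   deviation (the earliest observable point of error propagation), or -1
   if the traces match over their full length. */
long first_deviation(const int *golden, const int *faulty, size_t len) {
    for (size_t i = 0; i < len; i++)
        if (golden[i] != faulty[i])
            return (long)i;   /* error propagated to this trace point */
    return -1;                /* no deviation: fault masked or not activated */
}
```

Under non-determinism, the weakness is exactly the one noted above: a benign reordering of trace values also produces a nonzero first deviation.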
One of the primary sources of non-determinism is the use of multithreading in programs, which is becoming more prevalent as processors attain increasing core counts. In a multithreaded program, threads can execute in different orders (due to non-deterministic scheduling), and hence the traced values in the program may differ across runs.

In this thesis, we propose the use of dynamically generated "likely" invariants [8] to perform EPA for multithreaded programs. An invariant is a property on a program's data values that holds across all of its executions. Likely invariants are those that hold across observed executions of the program but do not necessarily hold across unobserved runs. There are three reasons why likely invariants are a good fit for the EPA problem. First, likely invariants can be automatically generated by analyzing the traces from different executions of a program, without any programmer intervention. This is critical for the technique to scale to large, real-world applications. Second, likely invariants are often compact, and can be checked with low overhead at run-time, e.g., as predicates in assertions. Third, and most importantly, likely invariants can be conditioned such that they hold across the entire set of executions on which the program is trained, automatically abstracting out non-deterministic parts of the program. However, likely invariants characterize correct executions with less precision than true invariants [8], which may reduce their efficacy for EPA.
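To make the "predicates in assertions" point concrete, here is a hypothetical likely invariant encoded as a run-time check. The invariant itself (a return value bounded in [0, 100]) is our own illustrative example of the kind an automated tool might infer from observed executions, not output from any particular tool:

```c
#include <assert.h>

/* Sketch: a likely invariant encoded as a cheap run-time predicate.
   The range bound below is hypothetical, of the kind inferred from
   observed executions. */
int likely_inv_ret_range(int ret) {
    return 0 <= ret && ret <= 100;
}

/* Checking the invariant costs one predicate evaluation per probe. */
void check_at_exit(int ret) {
    assert(likely_inv_ret_range(ret));  /* violation suggests error propagation */
}
```

Because the predicate abstracts away ordering, it holds regardless of how the scheduler interleaves threads, which is what makes this encoding attractive for multithreaded EPA.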
We experimentally measure the effectiveness of an invari-ant in terms of two attributes,(1) the stability of the generated invariant set acrossdifferent (non-deterministic) executions of the program, and (2) the fault cover-age of the generated invariants for different fault types, corresponding to commonsoftware faults.We make the following contributions in this thesis:• We propose the use of invariants for performing EPA in multithreaded pro-grams. (Section 3.2)• We build a framework called IPA (Invariant-based Propagation Analysis) toderive dynamic invariants for multithreaded programs through an automated,end-to-end process (Section 3.3).• We empirically assess the efficacy of the invariants derived using IPA for sixrepresentative multithreaded programs through fault-injection experiments(Chapter 4). We find that the traditional form of EPA is unsuitable for mul-1From this point on, when we say invariants, we mean likely invariants.3tithreaded programs due to their non-determinism. We also find that theinvariants derived by IPA are stable across multiple executions, and providecoverage ranging from 10% to 97% depending on the fault type and pro-gram. Then, we find that the proposed IPA framework is substantially fasterthan traditional EPA-based analysis, while incurring a 2-90% one time setupoverhead. Finally, we perturb the confidence and granularity parameters ofIPA to determine whether the fault coverage can be improved. We find nosignificant difference in the number of inferred invariants when confidenceis lowered and that invariants inferred at the basic block granularity do notstabilize within 20 runs, making them unsound for EPA.4Chapter 2Background and Related WorkIn this chapter, we first describe the notions of fault injection and EPA. 
Then, we show an example of why golden run EPA is unsound for multithreaded programs. Lastly, we describe likely invariants and related work on using them.

2.1 Fault Injection

Fault injection is a technique for modifying one or more components of a software system to emulate bugs. It has been widely deployed to advance test coverage and software robustness by exploring error handling paths of programs (e.g., [9-11, 20, 28]). There are two categories of fault injection: compile-time injection and run-time injection. Compile-time injections involve modifying source code (e.g., SAFE [30]) or binary code (e.g., G-SWFIT [7] or EDFI [13]), similar to mutation testing. In contrast, run-time injections mimic software events that corrupt instructions and memory at run-time. The sensitivity of programs to such events is difficult to assess through traditional testing techniques [39]. We focus on run-time injections, and refer to these as fault injections in this thesis.

Traditionally, fault injection tools have targeted hardware faults, such as single event upsets caused by particle strikes on chips. However, an increasing number of fault injection systems now target software faults. Fault injection systems such as FIAT [38], LLFI [26], or PDSFIS [18] explicitly support the emulation of a wide range of software faults at run-time. For instance, buffer overflow errors can be simulated by under-allocating malloc calls by some number of bytes. Other examples include simulating invalid pointer errors by randomly corrupting pointer addresses, and race conditions by acquiring non-existent or incorrect locks.

2.2 Error Propagation Analysis (EPA)

The effects of a software fault depend on both its type and the location in which it occurs.
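The run-time injections of Section 2.1 can be sketched concretely. The fragment below emulates the malloc under-allocation example; the names (SHORTFALL, injected_size, faulty_malloc) are our own illustrations, not part of LLFI's or any other tool's API:

```c
#include <stdlib.h>

/* Sketch of a run-time buffer-overflow fault injector: under-allocate
   malloc requests by a few bytes, so that subsequent writes to the full
   requested size overflow the buffer. Names are illustrative only. */
#define SHORTFALL 4

/* Size actually allocated: shrunk only when injection is enabled. */
size_t injected_size(int fault_enabled, size_t requested) {
    if (fault_enabled && requested > SHORTFALL)
        return requested - SHORTFALL;   /* injected fault: short buffer */
    return requested;                   /* fault-free behaviour */
}

void *faulty_malloc(int fault_enabled, size_t size) {
    return malloc(injected_size(fault_enabled, size));
}
```

A real injector would interpose on the allocator (e.g., at the compiler IR or link level) rather than requiring callers to pass a flag; the flag here only keeps the sketch self-contained.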
Therefore, EPA attempts to answer the following question: "How does an injected fault propagate within a program?"

Existing EPA approaches in tools such as PROPANE [15] or LLFI [26] make use of either instruction or variable trace comparisons between golden and faulty runs of programs. Deviations between traces can be interpreted as data violations or control flow violations. Data violations occur when identical instructions at the same program point are invoked with different values. Control flow violations occur when the instruction orders differ. Either violation is considered an indication of a software fault. However, this approach assumes that traces from golden runs are identical as long as the program is operating on the same inputs. Any non-determinism in the program can violate this assumption, such as that caused by multithreading.

Lemos et al. [24] addressed the non-determinism problem in EPA by applying approximate comparison techniques from computational biology (e.g., DNA sequencing) to compare golden traces and faulty traces. This approach, however, does not compare the non-deterministic portions of the trace with the golden run, effectively limiting its coverage. Unfortunately, there is a significant amount of non-deterministic state in multithreaded programs.

Leeke et al. [23] attempt to solve the non-determinism problem in EPA using a reference model, which is a statistical characterization of the system's outputs. At a high level, reference models are similar to likely invariants. However, unlike likely invariants, which can be automatically derived, the reference model requires significant manual effort and detailed domain knowledge.
Further, for many systems, it may not be possible to derive a reference model if the outputs do not conform to well-known statistical distributions.

    int func(int x) {
        int a = x - 3;     // T1: A, T2: D = (x=4, a=1)
        if (a > 0) {
            x = 1;         // T1: B, T2: E = (x=1, a=1)
        } else {
            x = 2;
        }
        a = a + 5;         // T1: C, T2: F = (x=1, a=6)
        return x;
    }

Figure 2.1: Example multithreaded function with labelled basic block traces for func(4).

2.3 EPA in Multithreaded Programs

In this section, we provide an example of why EPA using golden run comparisons does not suffice for multithreaded programs.

Consider the function in Figure 2.1, created for the purpose of this example. The function takes a single input x and returns 1 if it is greater than 3, and 2 otherwise. A golden run trace of this function on a single-threaded run, with input func(4), would return the values [(x=4, a=1), (x=1, a=1), (x=1, a=6)] at each basic block. In a single-threaded run, the golden run trace is consistent for every execution of func(4).

Consider the case where func(4) is executed on two threads. Within each thread, the set of trace values observed is the same as in the single-threaded case. We denote T1 as the sequence of golden trace values from the first thread, T1 = [A, B, C], and T2 as the sequence from the second thread, T2 = [D, E, F]. Multiple thread interleavings between T1 and T2 are possible due to the non-deterministic multithreading execution environment. One possible interleaving may lead to the following golden trace: [A, B, C, D, E, F]. Alternatively, another interleaving may result in [A, D, E, F, B, C]. Notice that the order of the golden trace values within individual threads is preserved. As a result, each execution of func(4) may result in a different golden trace. If [A, B, C, D, E, F] were considered the golden trace, EPA using golden run comparisons would erroneously consider [A, D, E, F, B, C] as a faulty trace.

This example demonstrates control flow non-determinism, which is caused by
However, non-determinism can also arise through data non-determinism. Data non-determinism is where the same variable (i.e., a shared re-source) exhibits a different value in each thread. In Figure 2.1, both A and D havethe value set (x = 4,a = 1) in each respective thread. Suppose a was a shared vari-able between threads and the function executed with the [A,D,B,E,C,F ] thread-interleaving, then C holds the value set (x = 1,a = 6) while F holds the valueset (x = 4,a = 11). Notice that the shared resource a is incremented by 5 againin the second thread. Data non-determinism is more difficult to monitor thancontrol flow non-determinism. Unlike control flow non-determinism, data non-determinism may not be mitigated through thread-specific traces. Furthermore,altering the target program to include control access mechanisms would produce adifferent program than the program under test.2.4 Likely InvariantsTrue invariants are predicates that are valid across the set of all executions of a pro-gram. Therefore, the violation of a true invariant necessarily indicates the presenceof a fault, provided the invariant was inferred from a correct program. Thus, trueinvariants are sound, but not necessarily complete indicators for error propagation.Unfortunately, the existence of such true invariants is undecidable in the generalcase [34], which makes their automated inference difficult, if not impossible.Likely invariants in contrast, only hold for observed executions but not neces-sarily for all executions. Thus, they may contain spurious invariants in addition totrue invariants. Further, likely invariants may not comprise all true invariants assome true invariants may not be exercised in the set of observed executions. 
Consequently, likely invariants are both incomplete and unsound in the general case, and hence incur both false positives and false negatives.

Although likely invariants, unlike true invariants, bear a risk of false positives, we assert that this risk is substantially lower than for golden run comparisons in non-deterministic programs. This is because EPA is typically done over a set of known inputs, and we only require that the likely invariants are stable over this set. Further, likely invariants can be generated through automated techniques [6, 8, 14], which make them a viable option even for highly complex programs.

In this thesis, we focus on the likely invariants generated by Daikon [8], which is the most widely used likely invariant inference engine. Daikon infers and reports likely invariants based on a set of execution traces. DySy [6] and DIDUCE [14] are other examples of dynamic invariant generation tools. DySy first applies symbolic execution and then observes dynamic execution traces to generate invariants. DIDUCE detects invariants and subsequently checks their violations to help programmers locate bugs. However, all three systems suffer from the effects of thread non-determinism [21], rendering them unsuitable for multithreaded programs. In recent work, Kusano et al. [21] addressed this problem by developing Udon, a custom interleaving explorer for multithreaded programs. However, Udon is not used for the purpose of EPA, which is our focus. Our framework builds on top of Udon for invariant inference.

Prior work has used likely invariants for mutation testing and error detection. For example, Schuler et al. [37] assess the viability of invariant checking in mutation testing. They find that an invariant approach yields a 97% detection rate in their mutation experiments.
However, they evaluate the efficacy of invariants through the proportion of equivalent mutants detected (i.e., mutations that yield syntactically different but semantically identical results), which is different from our goal of using them for EPA. Sahoo et al. [36] use likely invariants to detect hardware faults through software-level symptoms. Their experiments show that their approach is able to identify over 95% of hardware faults. However, they focus only on range-based invariants (i.e., checking if values lie in a closed interval), significantly limiting the scope of the approach. Further, they focus on hardware faults (i.e., single bit flips). Lu et al. [27] develop a custom invariant extractor and utilize invariants to expose atomicity violations between thread interleavings. In contrast to these papers, this thesis explores the use of a broad set of likely invariants to trace the propagation of software run-time faults in multithreaded programs.

2.5 Summary

This chapter defined the fault model adopted in our evaluation in Chapter 4, and defined the EPA problem that is addressed in Chapter 3. In Section 2.3, we demonstrated through an example why existing golden run based EPA techniques are unsound for multithreaded programs. In Chapter 3, we present our new EPA approach using likely invariants.

Chapter 3

Methodology

We first formalize the EPA problem in Section 3.1. Then, we provide an overview of our proposed solution in Section 3.2, followed by the development of IPA, the framework that implements our solution, in Section 3.3. We then present an example to show the applicability of IPA.

3.1 System Model

To systematically study the utility of likely invariants for EPA, we first introduce abstract models for sequential programs and parallel programs. Using these models, we demonstrate that the most widely used approach to EPA for sequential programs is not applicable to multithreaded programs.
For brevity, we limit our models to terminating programs. We do not consider this a restriction on our argument's generality, as most EPA techniques (like other experimental software assessments) evaluate correctness properties on a finite execution sequence of program statements.

EPA in Sequential Programs. We define sequential programs by their control flow graphs (CFGs). A CFG of a program P is a directed graph (V, E). The set of vertices V represents the program statements, and the set of directed edges E ⊆ V × V is defined such that (v, v′) ∈ E iff v′ is a possible direct successor statement of v. The relation E follows directly from the sequence of statements in the program text and the programming language's semantics. For every statement v ∈ V, we define a set of predecessors Pred(v) = {v′ ∈ V : v′ → v} and, symmetrically, a set of successors Succ(v) = {v′ ∈ V : v → v′}. A program has exactly one entry point e and a single exit point x, such that Pred(e) = ∅ and Succ(x) = ∅. A program with multiple exit points can easily be modified to produce an equivalent program with one exit point.

We model an execution of a sequential program as a path in the CFG of the program. Considering an execution σ = e, v1, ..., x, we define a set of reachable states sσ such that s ∈ sσ is a map assigning values to the program variables after a statement vi is executed in σ. For the sake of simplicity, we write sσ(vi) to refer to s. For a sequential program, the output is solely determined by the provided input. The functional specification of a program relates all possible inputs to corresponding outputs. A program execution whose output satisfies the functional specification is said to be correct.

In EPA, faults are injected into the considered program to analyze their possible effects. A fault injection procedure consists of either adding or modifying a statement or its data (the injection point) in the CFG.
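The CFG model above can be made concrete with a small adjacency encoding. The sketch below uses our own block numbering for func() from Figure 2.1; the representation (an adjacency matrix with helper queries for Pred) is an illustration, not the thesis's implementation:

```c
/* Sketch of the CFG model: statements as vertices 0..N-1, edges as an
   adjacency matrix. Encodes func() from Figure 2.1 with our own
   numbering: 0: a=x-3  1: x=1  2: x=2  3: a=a+5  4: return x. */
#define N 5

int edge[N][N] = {
    {0, 1, 1, 0, 0},   /* 0 -> 1 (then branch), 0 -> 2 (else branch) */
    {0, 0, 0, 1, 0},   /* 1 -> 3 */
    {0, 0, 0, 1, 0},   /* 2 -> 3 */
    {0, 0, 0, 0, 1},   /* 3 -> 4 */
    {0, 0, 0, 0, 0},   /* 4 is the exit x: Succ(x) is empty */
};

/* Is v_prime a member of Pred(v)? */
int in_pred(int v_prime, int v) { return edge[v_prime][v]; }

/* |Pred(v)|; the entry e is the vertex where this is 0. */
int num_pred(int v) {
    int count = 0;
    for (int u = 0; u < N; u++)
        count += edge[u][v];
    return count;
}
```

An execution such as σ = 0, 1, 3, 4 is then a path in this graph, and the single-entry/single-exit property corresponds to num_pred(0) == 0 and row 4 being all zeros.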
Given a concrete input, the program is executed to obtain a correct execution, referred to as the golden run. Next, a fault is injected and the program is executed again with the same input. The obtained execution may deviate from the golden run as the code has been modified. If so, the fault was activated, resulting in an error, and one can analyze the faulty execution. Any deviation of an execution from a correct execution with the same input is called an error. In a program execution, we refer to the sequence between an error and the last output-defining statement as error propagation. Error propagation can be identified based on whether (1) the faulty execution σf follows a different path in the CFG compared to the golden run σg starting from the injection point vr, OR (2) there exists a statement v in σf occurring after vr, such that sσf(v) ≠ sσg(v).

EPA in Multithreaded Programs. Multithreaded programs consist of multiple threads executing concurrently. Each thread Ti is modeled as a separate CFG Ei. For a statement v in CFG Ei, we write th(v) to refer to the thread Ti executing it. An execution of a concurrent system is a sequence of statements σ = ei, v1, v2, ..., xj such that for every two successive statements v and v′ with v occurring before v′ and th(v) = th(v′), we have v ∈ Pred(v′). In other words, an execution is a linearization of
Successive swapping of such statements generates more possible linearizations that can characterize a correct execution [29]. This is the reason why straightforward pairwise comparison of statements in the executions introduces unsoundness to golden run based EPA for multithreaded systems. Thus, golden run based EPA may erroneously flag a deviation of observed equivalent linearizations due to scheduler non-determinism.

3.2 Solution Overview

In our approach, we start with a set of golden runs Σ for a program P and generate a set of likely invariants F from them. Then, we inject a fault into P and run P again. The potentially faulty execution is then validated against the likely invariants. Suppose σ ∈ Σ denotes a golden run and σf denotes a faulty execution of the program. A likely invariant f ∈ F is defined as a predicate over the set of reachable states sσ such that f(s) is true for all s ∈ sσ. An execution σf is said to deviate from the correct runs if and only if there is an invariant f ∈ F such that f(s) is false for some s ∈ sσf.

To be effective for EPA, the generated invariants must have the following properties, on which we base our empirical assessment in Chapter 4. Both properties are discussed in detail in Section 4.5 and Section 4.6.

1. Stability: The invariant must hold across multiple fault-free executions of the programs targeted for injection with different numbers of threads for a given set of inputs. This ensures a low false-positive rate.

2. Coverage: The invariants must provide high coverage for different types of faults, thereby ensuring a low false-negative rate. We define coverage of an invariant under a certain fault type as the probability that the invariant is violated given that a fault of this type occurs in the program and is activated during program execution.

3.3 IPA: EPA Using Likely Invariants

We now introduce IPA, a new EPA framework for multithreaded programs using dynamically inferred likely invariants.
IPA consists of three main modules: (1) program profiling, (2) invariant inference, and (3) fault detection. Figure 3.1 overviews the EPA process using the IPA framework.

The profiling module (label 1) is invoked at program compilation time, and adds functions at the entry and exit points of every function in the program. Tracing program values at function entry and exit points allows us to capture preconditions and postconditions of procedures, which broadly encapsulate their functionality. A unique invocation nonce is also assigned to each pair of function entry and exit values, on a per thread basis. The invocation nonce enables inferred invariants to associate exit values with entry values. All of the traced values are accumulated in a trace file, which is then passed to the invariant inference module.

The invariant inference module (label 2) examines the values in the trace file and generates likely invariants with 100% certainty, meaning that the output invariants will hold for the given trace file. As discussed in Section 3.2, this stability across different runs keeps the false-positive rate low. Therefore, programs must be checked to ensure that the set of likely invariants is stable for a given set of inputs. Typically, for terminating programs, this problem can be remedied by using multiple profiling runs to generate the trace file. Traces from multiple program runs can produce fewer invariants than single runs due to a higher probability of falsification, but can also generate more invariants as longer traces offer higher statistical significance for previously neglected invariants. Once the invariant inference module produces a stable set of invariants, the invariants can be used to validate faulty traces (i.e., traces generated from faulty program runs).

Finally, the fault detection module (label 3) parses and groups the invariants by their invoked functions. These invariant groupings are stored in a hash map structure.
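A minimal sketch of this module's grouping and validation logic follows. The trace line format, the `Invariant` type, and all names here are hypothetical stand-ins for illustration, not IPA's actual implementation.

```python
# Invariants are grouped per (function, entry/exit) program point in a
# dict (the "hash map" above); a faulty trace is then scanned line by
# line and each line's values are checked against its group's invariants.
from collections import namedtuple

Invariant = namedtuple("Invariant", ["text", "holds"])  # holds: values -> bool

def parse(line):
    # hypothetical trace line format: "func,entry|exit,var=value;..."
    func, point, rest = line.split(",", 2)
    values = {}
    for kv in filter(None, rest.split(";")):
        name, value = kv.split("=")
        values[name] = float(value)
    return func, point, values

def detect(trace_lines, invariants_by_point):
    violations = []
    for lineno, line in enumerate(trace_lines, start=1):
        func, point, values = parse(line)
        for inv in invariants_by_point.get((func, point), []):
            if not inv.holds(values):
                violations.append((lineno, func, point, inv.text))
    return violations

# example: the entry invariant "InputX > 0" from the example in Section 3.4
groups = {("CNDF", "entry"): [Invariant("InputX > 0",
                                        lambda v: v["InputX"] > 0)]}
trace = ["CNDF,entry,InputX=0.5", "CNDF,entry,InputX=-1.0"]
```

Here `detect(trace, groups)` reports a single violation on trace line 2, together with the function name, the entry/exit flag, and the violated invariant.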
The faulty trace, which mirrors the format of the golden trace, is scanned line by line. The fault detection module retrieves the corresponding invariant(s) from the hash map and validates the invariant(s) based on the faulty trace values. The invariant violations are reported in a new file, which records the line number in the faulty trace, the function name, a flag indicating function entry or exit, and the violated invariant.

Figure 3.1: IPA: Invariant-based EPA Model

3.4 Example

We outline an example of using IPA on the Blackscholes application, a multithreaded benchmark program introduced in Section 4.2. We use this code example rather than the simpler function in Figure 2.1 to demonstrate the inference and validation of invariants of a function argument against a computed output. Figure 3.2 features a function within this program. The profiling module instruments the entry and exit points of this function while executing Blackscholes multiple times with a fixed input set. InputX, as the sole function argument, is the only variable traced at the function entry. At the function exit, both InputX and the return value are traced. Using the trace files, the invariant inference module generates two sets of invariants at the entry and exit points respectively: {InputX > 0}, {InputX = orig(InputX), return ≠ orig(InputX)}. The entry invariant (f) specifies that all observed values of InputX are greater than 0. The first exit invariant (f1′) specifies that the final value of InputX must be equal to the value of InputX that was passed into the function.
The second exit invariant (f2′) asserts that the function cannot return the passed-in value of InputX.

Suppose a patch of the program incorrectly alters the boolean expression in line 7 to InputX > 0.0 (an easy mistake even by experienced programmers [19, 42]). By inspecting the code, we observe that the bug leads to an erroneous change of InputX at line 8. The fault detection module can detect this bug by validating the data trace of InputX against the set of invariants, reporting the violation of f1′. We applied this mutation and found that IPA reports the violation of f1′ in 100% of the faulty runs involving the same inputs on varying numbers of threads. We also observe that the value of InputX influences the return value, OutputX. However, we did not observe any violations of f2′ in the faulty runs.

1  fptype CNDF ( fptype InputX ) {
2      int sign; // BB 1
3
4      fptype OutputX;
5
6      // Check for negative value of InputX
7      if (InputX < 0.0) {
8          InputX = -InputX; // BB 2
9          sign = 1;
10     } else // BB 3
11         sign = 0;
12
13     OutputX = computeNPrimeX(InputX); // BB 4
14
15     if (sign) {
16         OutputX = 1.0 - OutputX; // BB 5
17     }
18
19     return OutputX; // BB 6
20 }

Figure 3.2: Example function in the Blackscholes application

Violated invariants not only reveal the presence of faults, but also localize the source of faults. Since f is retained, and f1′ is violated, the fault must have occurred between the entry point and the exit point. Note that these statements are not necessarily from the same function as other threads might have been interleaved with the function and might have modified the value of some of the variables. Thus, an invariant based approach can avert the pernicious effects of thread variance.

3.5 Summary

This chapter described the design of IPA, a new EPA framework based on likely invariants. Section 3.1 presented a formalization of the EPA problem in multithreaded programs. Section 3.2 introduced a theoretical overview of IPA and Section 3.3 described the workflow of IPA.
In the next chapter, we present the results of our evaluation of IPA.

Chapter 4

Experimental Evaluation

The goal of our experiments is to evaluate the effectiveness of the likely invariants derived by IPA in performing EPA. As mentioned in Section 3.2, to be effective, a likely invariant should have two properties: (1) stability, and (2) coverage. To evaluate the stability, we execute the program multiple times, and measure the number of executions after which the invariant set stabilizes (Section 4.5). We then measure the coverage provided by the invariants for different fault types by injecting faults into the program and checking whether any of the invariants are violated due to a fault (Section 4.6). We also group the invariants into different classes based on their structure, and measure the coverage provided by each class of invariants (Section 4.7). Finally, we measure the performance overhead of the IPA and EPA approaches (Section 4.8).

4.1 Research Questions

We ask the following research questions (RQs) in our experimental evaluation.

• RQ0: Is golden run EPA a sound method to identify error propagation in multithreaded programs?
• RQ1: Do the invariants stabilize across multiple executions of the program?
• RQ2: What is the coverage provided by the invariants as a whole, for different kinds of errors in the program?
• RQ3: What is the coverage provided by invariants of a specific type/class, for different kinds of errors in the program?
• RQ4: What is the performance overhead of IPA compared to golden run EPA?
• RQ5: Can stable invariants be generated at a lower confidence level?
• RQ6: Can stable invariants be generated at a finer program granularity?

4.2 Experimental Setup

IPA¹ consists of three modules as shown in Figure 3.1, namely the program profiling module, the invariant inference module, and the fault detection module.
The program profiling module is implemented as an LLVM [22] compiler transformation pass, which is based on the instrumentation pass in the Udon tool [21]. The invariant inference module utilizes Daikon [8], since it is presently the most widely used tool for likely invariant generation. Therefore, the primary function of the program profiling module is to produce a trace file in a Daikon-compatible format. This involves some customized configurations in the LLVM compiler pass. For simplicity of implementation, IPA only traces the values of function arguments belonging to primitive data types – this is similar to what Udon [21] does. Lastly, the fault detection module consists of a single Python script and compares the values in the trace file with the derived invariants.

We evaluate the IPA framework using six representative multithreaded benchmarks that perform a wide variety of tasks: Quicksort, Blackscholes, Swaptions, Streamcluster, Nullhttpd, and Nbds. These benchmarks range from roughly 300 to 3000 lines of code. All benchmarks are implemented in C/C++, and use the POSIX threading library (i.e., pthreads). We run all benchmarks using default program inputs that come with the benchmark suites. Quicksort, as its name suggests, sorts a sequence of integers both sequentially and concurrently using the Quicksort algorithm, and returns a response code to denote success or failure. Blackscholes, Swaptions, and Streamcluster are part of the PARSEC benchmark [3]. Blackscholes is an application that solves the Black-Scholes partial differential equation, which prices a portfolio of European-style stock options. Swaptions uses the Monte Carlo pricing algorithm to compute the prices of swaptions, a form of financial derivative. Streamcluster is an application that solves the online clustering problem with streaming data.

¹ We have made IPA publicly available at http://github.com/DependableSystemsLab/LLFI-IPA.
Nullhttpd is a small and efficient multithreaded web server for Linux and Windows [33]. Nbds [32] is an implementation of non-blocking data structures supporting concurrent key-value store transactions. We choose these benchmarks to represent a wide variety of domains where multithreading is commonly applied.

We use LLFI [26], an LLVM based tool, to perform fault injections. While LLFI was originally developed for hardware faults, it currently supports both software and hardware faults². LLFI injects software faults into the program IR by modifying instruction or register values of the program at runtime. We assume that faults are uniformly distributed throughout the program code. Table 4.1 describes how LLFI injects each software fault. We consider only activated faults, i.e., those in which the modified data is read by the program, when reporting coverage.

4.3 Fault Model

In this thesis, we consider the following six software faults listed in Table 4.1: data corruptions, file I/O buffer overflows, buffer overflows involving malloc, function call corruptions, invalid pointers and race conditions.

Data corruption is a generic fault type that can capture a wide variety of errors due to logical errors (e.g., the example in Chapter 3), and implementation bugs (e.g., integer overflows, uninitialized variables). The buffer overflow fault categories can occur due to common bugs in C/C++ programs where array and pointer bounds are not checked. We distinguish between file I/O-related buffer overflows and other buffer overflows as the former can lead to security vulnerabilities. Function call corruptions can occur when one passes the wrong parameters to a function, and represent incorrect invocation of functions, i.e., interface errors. Invalid pointers can arise due to errors in pointer arithmetic, or due to the use of pointers after freeing them, i.e., use-after-free bugs. Finally, race conditions occur due to locks
Finally, race conditions occur due to locks2Available at: https://github.com/DependableSystemsLab/LLFI20not being acquired or acquired incorrectly, and with at least one of the threadsperforming a write to shared data. We limit ourselves to six fault modes to keepthe number of experiments tractable - all six faults are supported by LLFI [35].Note that the fault types above are broader than those covered by traditional mu-tation testing, in that they involve corruption of the program state beyond simplesyntactic changes.The six chosen software faults represent common bugs [41] that are difficult tocapture through unit or regression tests, and have been used in prior work to em-ulate software faults [12, 16, 17]. Memory-related bugs such as data corruptions,buffer overflows, function call corruptions, and invalid pointers are particularlyhard to debug as the observed program failures may not be clearly associated withthese faults [17]. This is attributed to two main reasons. First, a single memory bugcan propagate to other locations, causing multiple memory bugs. Eventually, pro-gram failures may be attributed to memory bugs that are often distant from the rootmemory bug. Secondly, different types of memory bugs may lead to one anotherthrough propagation. For example, a data corruption can lead to a buffer overflow,which may cause a function call corruption. Such scenarios entail significant effortto determine their root causes. Error testing in unit and regression tests aims tovalidate the program’s exception handling rather than exhaustively testing for thepresence of memory bugs. Like memory-related bugs, race conditions often leadto effects that cannot be detected by simple failure detectors. 
Therefore, EPA is particularly useful for analysing the effects of these bugs, which makes them an important target in our evaluation.

Table 4.1: Description of faults injected using LLFI

Fault Type               | LLFI Implementation
Data Corruption          | Randomly flips a single bit in an arbitrary data value in the program
File I/O Buffer Overflow | Randomly increases the size in fread and fwrite operations
Buffer Overflow Malloc   | Under-allocates malloc and calloc to emulate overflowing the allocated buffers
Function Call Corruption | Randomly corrupts the source register (i.e., parameter) of a function call
Invalid Pointer          | Randomly corrupts the returned pointer from malloc and calloc
Race Condition           | Replaces a lock of a mutex in the program with a fake mutex

4.4 RQ0: Golden Run Variance

We conduct golden trace analysis (the traditional EPA model) over the benchmark applications (see Section 4.2), by varying the number of threads for each program. To conduct EPA following the traditional EPA model shown in Figure 4.1, the application is compiled and instrumented to invoke a tracing function at every LLVM IR instruction. Hence, each line in a trace file represents an instruction identifier and its corresponding data value in the program. A golden trace of the original program instructions is generated by profiling. Then, a fault is injected into the program and a trace of the modified program instructions is produced. Finally, EPA is performed by comparing the golden and faulty traces line by line. Discrepancies between the two traces will reveal how faults propagate through the program execution paths.

Figure 4.1: Golden run based EPA

We collect golden runs over all benchmark programs except Nullhttpd³, running them with a single thread, 4 threads, 8 threads, 16 threads, and 32 threads respectively.
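The line-by-line trace comparison at the heart of this analysis, and the variance measure between two traces, can be sketched as follows. The list-of-pairs trace representation and the example traces are hypothetical.

```python
# Sketch of golden-run EPA's trace comparison: each trace is a list of
# (instruction_id, value) lines; deviations are the indices where the
# faulty trace disagrees with the golden trace, and the variance between
# two traces is the fraction of conflicting lines.
def deviations(golden, faulty):
    return [i for i, (g, f) in enumerate(zip(golden, faulty)) if g != f]

def trace_variance(trace_a, trace_b):
    return len(deviations(trace_a, trace_b)) / len(trace_a)

golden = [("i1", 0), ("i2", 5), ("i3", 7), ("i4", 9)]
faulty = [("i1", 0), ("i2", 6), ("i3", 7), ("i4", 8)]
```

Here `trace_variance(golden, faulty)` evaluates to 0.5: two of the four lines conflict.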
We find considerable variance between the golden traces upon running the applications with different numbers of threads using the same input, which obviously does not indicate error propagation. Variance is measured by taking the proportion of line conflicts between two trace files relative to the total number of lines in a single trace file (i.e., proportion of dynamic instructions with different values).

³ This experiment was not conducted on Nullhttpd since the thread number was not externally configurable.

Figure 4.2: Average variance between golden run traces (y-axis: golden trace variance, 0 %–25 %; x-axis: number of threads, 1–32; series: Blackscholes, Quicksort, Swaptions, Streamcluster, Nbds)

Figure 4.2 depicts the average variances between 5 golden traces of each application, executed with five distinct thread levels. The variance between the golden runs is 10 % on average due to multithreading non-determinism. However, we observed golden run variance in Nullhttpd when sending multiple server requests concurrently, thus spawning multiple threads.

Note that it is possible to use traditional EPA for the deterministic portions of the program. However, it is non-trivial to identify the deterministic portions a priori, as these depend both on the number of threads and the inputs given to the program. Therefore, traditional methods for EPA cannot be used in a multithreaded context.

Observation 1 If a multithreaded program is repeatedly executed with the same input, the golden runs extracted from these executions may differ from each other.

4.5 RQ1: Stability

To use invariants for EPA while minimizing false positives, the invariants must be reproducible among repeated program executions. In this experiment, we evaluate the stability of the set of dynamically generated invariants across execution reiterations. Let n denote the number of execution recurrences. Each application begins with n = 1 to produce a trace file, which is then delivered to the invariant inference module.
The invariant inference module returns a single set of invariants. This process is repeated with n = 2, 3, 4, 5, 10, 15, resulting in a family of sets of invariants. The number of invariants obtained at each n value is reported in Figure 4.3. In all of our sample applications, we observe a convergence of likely invariants by n = 5. We also verified manually that the invariant sets match when the invariants converge, i.e., the invariants derived are the same after 5 executions.

Table 4.2 shows the counts of inferred invariants in our sample applications. These are shown only for the stable invariants. We find that there is roughly one invariant for every 10–100 lines of source code, with the sole exception of Nullhttpd. Few invariants were inferred from Nullhttpd as many of its functions were parameterless. This ratio is captured by the invariant density, ρ, which represents the number of invariants per lines of code. The invariant counts show that stable invariants can be inferred from multithreaded programs, when repeatedly executed with the same inputs.

Observation 2 If a multithreaded program is repeatedly executed with the same input, the likely invariants generated from these executions stabilize within ten executions.

For our coverage assessment in the following section, we consider only the stable invariants, or those invariants that hold across all observed executions (in our experiments).
This allows us to minimize the number of false-positives and obtain conservative lower bounds on the coverage.

Table 4.2: Invariant counts and classification (refer to Table 4.3) of IPA's generated invariants

Benchmark     | LOC  | Functions | Invariants | ρ (%) | Invariant Classes (A B C D E F G H Other)
Quicksort     | 330  | 9         | 27         | 8.2   | 3  -  -  -  1  1  16  6  -
Blackscholes  | 526  | 5         | 29         | 5.5   | -  -  -  -  3  -  15  11  -
Streamcluster | 1580 | 11        | 23         | 1.5   | 1  -  -  -  -  -  14  6  2
Swaptions     | 1635 | 14        | 94         | 5.7   | 7  4  3  1  4  4  59  11  1
Nullhttpd     | 2500 | 20        | 8          | 0.3   | -  -  -  2  -  -  4  2  -
Nbds          | 3158 | 27        | 80         | 2.5   | -  -  -  -  4  -  36  39  1

Figure 4.3: Number of invariants generated from varying numbers of profiling runs for six benchmark applications (y-axis: number of likely invariants, 0–100; x-axis: number of profiling runs, 1–16; series: Swaptions, Nbds, Blackscholes, Quicksort, Streamcluster, Nullhttpd)

4.6 RQ2: Coverage

As we showed in the previous sections, using invariants instead of golden run based comparisons, we were able to improve the soundness of EPA for multithreaded applications, i.e., minimize false positives. An important question is whether we also miss true positives in the process of reducing false positives, i.e., if the likelihood of false negatives is increased for invariant based EPA. To answer this question, we perform 1000 fault injections of each fault type in Table 4.1, one per run, on the benchmark applications. We choose a sample size of 1000 fault injections such that the error bars of the fault coverage rates fall within a 95% confidence interval. Subsequently, we compare the faulty program traces against the set of inferred invariants. If any of the likely invariants was violated due to the injected fault, we label the run as a successful detection.

Suppose T is the set of all faulty program traces, and p is the number of violated invariants in a single trace. Let Tp≥1 be a subset of T, denoting the set of program traces that violate at least one invariant.
Then,

    Fault Coverage = |Tp≥1| / |T|

The fault coverages for each application are shown in Figure 4.4, Figure 4.5, Figure 4.6, Figure 4.7, Figure 4.9, and Figure 4.8. The error bounds denote the 95 % confidence intervals of the reported fault coverages. The figures show the fault coverage for different fault types divided into three failure modes, as seen in similar fault injection experiments [26]: Benign, Crash/Hang and Silent Data Corruption (SDC). Benign indicates faulty program runs with no observable deviations in the final program output. Faults may still propagate through internal functions without manifesting into an observable output deviation. Crash/Hang signifies faulty runs that either terminate with exceptions or time out. SDC specifies faulty runs that terminate normally but produce program outputs that deviate from the golden run (i.e., incorrect outputs). SDCs are often the most important failure modes, as they are much harder to detect than crashes.

We find that the fault coverage provided by the invariants varies widely across applications, from 90 %–97 % for Swaptions, to 10 %–15 % for Blackscholes. This variation occurs due to fluctuations in two factors: invariant densities (ρ), and invariant relevance (i.e., ability of the invariant to detect faults). The invariant density for a program is computed by dividing the number of inferred invariants by the lines of source code (LOC). The invariant densities show that there are sufficient numbers of invariants to offer reasonable fault detection in EPA. We consider invariants per LOC a better metric for sufficiency than invariants per function, as the distribution of invariants across functions is not uniform. This is because some functions disproportionately implement more operations than others. Note that the density metric underreports the true density of invariants, since the LOC includes both non-execution path code and header files. The invariant densities for each benchmark are shown in Table 4.2.
We observe that Quicksort and Swaptions have higher invariant densities at 8.2 % and 5.7 % respectively, while also having higher fault coverage than other programs. However, invariant density does not express the relevance of the invariants to fault detection. The sets of invariants for Quicksort and Swaptions both contain a number of invariants involving computation data, while Blackscholes is dominated by invariants on local environment variables. Computation data is more likely to be passed inter-procedurally, which increases the likelihood of fault detection. In contrast, local environment variables rarely carry beyond the scope of functions. Consider the case where a variable is corrupted at the function exit. If no invariants exist on that variable at the function exit, the fault would not be captured. However, the prospect of fault detection increases if the value is passed to subsequent functions, which may have invariants checking it.

Further, there is considerable variation across different fault types and their consequences on the benchmark applications. For example, in Streamcluster, the coverage for race conditions is only about 15 %, while it is 70 % for data corruption errors. In other benchmarks (e.g., Quicksort), the situation is reversed, with race conditions having the highest coverage (97 %), while data corruption errors have the lowest coverage (80 %). Data corruption errors directly affect the data as data operand bits are randomly flipped. On the contrary, the effects of race conditions can be difficult to predict as they are dependent on the implementation of locking patterns in the threading library. In this case, race conditions cause Quicksort and Swaptions to violate (some) invariants, yet minimal effects are observed in other benchmarks.

Across all applications, the benign errors constitute a majority of fault outcomes (73 % on average), followed by Crash/Hang (22 %) and SDCs (5 %).
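The coverages and failure-mode fractions reported in this section instantiate the Fault Coverage definition above. A minimal sketch, with made-up run data for illustration:

```python
# Each faulty run is summarized by its violation count p and observed
# failure mode; fault coverage is |T_{p>=1}| / |T| over all faulty traces.
def fault_coverage(runs):
    """runs: list of (p, mode), mode in {'benign', 'crash', 'sdc'}."""
    return sum(1 for p, _ in runs if p >= 1) / len(runs)

def mode_fractions(runs):
    # fraction of runs per failure mode (benign / crash / SDC breakdown)
    modes = [m for _, m in runs]
    return {m: modes.count(m) / len(modes) for m in set(modes)}

runs = [(2, "benign"), (0, "crash"), (1, "sdc"), (0, "benign")]
assert fault_coverage(runs) == 0.5  # two of four runs violate an invariant
```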
We do not measure SDCs in Nullhttpd and Nbds since these applications return either a successful response code or a failure message. We find that benign errors exhibit the highest fault coverage overall. Although benign errors are typically neglected in EPA, benign fault coverage shows that invariants can track benign faults before they are masked. This may be important to find latent bugs in the program. On the contrary, Crash/Hang failures are the most blatant. Nullhttpd has the highest rate of Crash/Hang fault coverage among the benchmarks. We find that a set of initialization invariants are violated whenever the web server fails to load. Finally, SDCs are typically the least commonly observed failure outcomes across applications, and consequently have the least coverage. Quicksort has the highest rates of SDC error detection among all the applications. This is because it contains many inequalities, and a single negated inequality can impact the final ordering of values. Correspondingly, many of the invariants in Quicksort consist of inequality conditions and ordering constraints that are sensitive to such value deviations, and hence yield high coverage.

Observation 3 If faults are injected in a multithreaded application, their effects are indicated by violations of likely invariants generated from fault-free multithreaded executions of that application. However, the coverage provided depends both on the application and the type of faults injected.

4.7 RQ3: Invariant Classification

During the automated inference of likely invariants, we observed that many invariants have a similar structure. For example, some invariants involve inequalities, while others involve set membership and ordering. This observation leads us to ask whether differences in structure of the invariants correlate with differences in the respective invariants' effectiveness for EPA.
The result can help discover what constitutes a good invariant for EPA.

To study this effect, we first classify the invariants into eight different classes based on their structure and then consider the coverage of the invariant classes. The classes are: array-equality, elementwise-initialization, elementwise, initialization, inequality conditions, multi-value, order, and return-value invariants. Table 4.3 provides a brief description of each invariant class⁴. The invariants are classified exclusively, without overlap between classes. A small number of invariants did not fall into any of these eight classes – we ignore them for this study.

We calculate the coverage of an invariant class as the fraction of fault injection runs that violate at least one of the invariants in that class. For example, if an invariant class I has two invariants I1 and I2, and S1 and S2 are the sets of fault injection runs that result in violation of the invariants I1 and I2 respectively, then the coverage of the invariant class I is given by |S1 ∪ S2| / N, where N is the total number of fault injection runs that had activated faults.

Table 4.2 shows the number of invariants that occur in different classes for the six applications. Due to space constraints, we only show the results for the Quicksort and Swaptions applications. However, similar results were observed for all benchmarks. Table 4.4 and Table 4.5 show the results of the fault injection experiment for these two programs, grouped by invariant classes. Note that the figures only show those invariant classes (see Table 4.2) that had at least one invariant in that application.

We observe that different invariant classes have different coverage depending on the application. For example, in Quicksort (Table 4.4), the order invariants have the highest fault coverage, followed by return-value invariants. The order invariants check whether arrays are sorted in either ascending or descending order.
The order invariants are violated if at least one element in an array is misplaced, which accounts for their high fault coverage in Quicksort. A sizeable proportion of faulty runs with violated order invariants result in SDC failures. A proportion of the runs also resulted in benign failures, despite violating order invariants. This may occur in Quicksort since it returns a response code rather than the sorted numbers as output. In comparison, return-value invariants have a lower overall fault coverage than order invariants. However, we observe that the majority of return-value invariant violations result in SDCs, thus showing their importance.

⁴ The rightmost column of Table 4.2 shows the number of invariants per class in each benchmark.

Table 4.3: Description of Invariant Classes

A | Array-equality             | Equality condition on every element of an array
B | Elementwise-initialization | Initial values of array elements
C | Elementwise                | Condition on the elements of an array
D | Initialization             | Invariants that associate post-conditions to pre-conditions
E | Multi-value                | Variable value must match exactly one element of a set
F | Order                      | Array is sorted in ascending or descending order
G | Relational conditions      | Invariants involving both equalities and inequalities
H | Return-value               | Invariants involving the return value of a function

Table 4.4: Classification of violated invariants from 1000 faulty Quicksort runs and their coverage

Fault Type               | Failure | Invariant Classes (%): A   C   D   E   F   G   H
Data Corruption          | SDC     | 6   1   -   -   24  -   24
                         | Crash   | -   1   -   -   27  -   -
                         | Benign  | 1   1   -   2   30  -   1
File I/O Buffer Overflow | SDC     | 8   2   -   2   37  -   37
                         | Crash   | 1   2   -   -   9   -   1
                         | Benign  | 2   3   -   2   49  -   1
Buffer Overflow Malloc   | SDC     | 9   3   1   2   33  -   33
                         | Crash   | 1   1   -   -   9   -   -
                         | Benign  | 3   3   2   2   50  3   3
Function Call Corruption | SDC     | 7   1   -   1   20  -   20
                         | Crash   | 1   2   -   -   27  -   -
                         | Benign  | 1   1   -   2   34  -   1
Invalid Pointer          | SDC     | 6   1   -   2   21  -   21
                         | Crash   | 1   1   -   -   30  -   -
                         | Benign  | 1   1   -   2   31  -   1
Race Condition           | SDC     | -   -   -   -   2   -   2
                         | Crash   | 1   1   -   -   1   -   -
                         | Benign  | 1   1   -   -   97  -   -

On the other hand, the elementwise invariants have the highest fault coverage
overall in Swaptions (Table 4.5). Elementwise invariants correspond to predicates on individual elements of an array. Swaptions stores its dataset in an array, which is passed back and forth between its functions. As a result, a number of array element constraints arise. Elementwise invariants offer a marginally higher fault detection rate for SDCs compared to the other invariant classes. This contrasts with Quicksort where order and return-value invariants collectively yield high SDC fault detection.

Table 4.5: Classification of violated invariants from 1000 faulty Swaptions runs and their coverage

Fault Type               | Failure | Invariant Classes (%): A   B   C   D   H
Data Corruption          | SDC     | 3   3   6   1   1
                         | Crash   | -   -   38  -   -
                         | Benign  | 26  26  51  -   -
File I/O Buffer Overflow | SDC     | 3   3   6   1   1
                         | Crash   | 1   1   18  -   -
                         | Benign  | 42  41  72  -   -
Buffer Overflow Malloc   | SDC     | 4   4   8   1   1
                         | Crash   | -   -   18  -   -
                         | Benign  | 42  42  71  -   -
Function Call Corruption | SDC     | 2   2   6   1   1
                         | Crash   | -   -   40  -   -
                         | Benign  | 26  26  49  -   -
Invalid Pointer          | SDC     | 2   2   5   -   -
                         | Crash   | -   -   40  -   -
                         | Benign  | 28  28  48  -   -
Race Condition           | SDC     | -   -   -   -   -
                         | Crash   | -   -   -   -   -
                         | Benign  | 58  58  70  -   -

However, there is much less variation in the coverage provided by different invariant classes for different types of faults. For example, in Quicksort, order invariants offer high coverage regardless of fault type, while multi-value invariants offer uniformly low coverage for this application. Thus, the application and the invariant class have a greater impact on the fault coverage than the fault type, across applications.

Observation 4 The coverage of invariants for an application differs across different classes of likely invariants generated from fault-free multithreaded executions of the application.

4.8 RQ4: Performance Evaluation

We evaluate the performance of IPA by comparing each step shown in Figure 3.1, described in Section 3.3, to its equivalent in golden run EPA.
Table 4.6 shows the average duration of each step, measured in seconds and averaged over 5 runs. No faults were injected in this experiment, as we wanted to obtain the worst-case performance overheads (i.e., when the application executes to completion).

We divide the overhead comparisons into two categories: the setup overhead ratio (S) and the fault detection overhead ratio (D). The values of S and D are computed using formulas (4.1) and (4.2), where the variable subscripts refer to the steps in EPA/IPA.

  S = E1 / (I1 + I2)                (4.1)
  D = (E1 + E3) / (I1 / 5 + I3)     (4.2)

In IPA, the one-time setup overhead for the fault injection experiments consists of golden run profiling (I1) over 5 runs, and invariant generation (I2). In EPA, only golden run profiling is performed (E1), and there is no invariant generation step. We find that IPA induces a 2-90% setup overhead over EPA in our benchmarks.

Unlike the setup overhead, which is a one-time cost, the fault detection overhead is incurred after every fault injection. In IPA, this process consists of generating a single trace file (I1/5) and executing the fault detection module (I3). In EPA, this process involves a line-by-line trace validation between golden and faulty runs (E3). We define D, the fault detection overhead ratio, as the time taken by EPA divided by that of IPA for fault detection. We find that IPA is 2.7× to as much as 151× faster than EPA as far as fault detection is concerned, depending on the benchmark. This is because EPA traces the program execution after each point, while IPA only checks for consistency of invariants at function entries and exits.

Thus, IPA incurs a slightly higher setup overhead than EPA, but has substantially lower fault detection overheads.
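As a sanity check, the two ratios can be computed directly from the step timings. The sketch below is ours (not part of the IPA tooling), with the Quicksort timings taken from Table 4.6:

```python
def setup_overhead_ratio(e1, i1, i2):
    """S: EPA setup time divided by IPA setup time (Equation 4.1)."""
    return e1 / (i1 + i2)

def detection_overhead_ratio(e1, e3, i1, i3, profiling_runs=5):
    """D: EPA per-injection detection time divided by IPA's (Equation 4.2).

    Per injection, IPA generates a single trace (i1 / profiling_runs) and
    runs the fault detection module (i3); EPA re-traces the program (e1)
    and compares the faulty trace line by line against the golden run (e3).
    """
    return (e1 + e3) / (i1 / profiling_runs + i3)

# Quicksort timings from Table 4.6 (seconds)
s = setup_overhead_ratio(e1=1.1, i1=5.5, i2=5.4)
d = detection_overhead_ratio(e1=1.1, e3=1.4, i1=5.5, i3=0.4)
print(round(s, 2), round(d, 2))  # 0.1 1.67
```

Plugging in the rounded Quicksort timings reproduces S = 0.1 exactly; the D value differs somewhat from the reported 2.7×, presumably because the published table truncates the underlying timings.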
Since the fault detection overhead is incurred on each run (and runs potentially number in the thousands in a typical fault injection experiment), it is much more important than the setup overhead.

Table 4.6: IPA vs EPA performance measured in seconds for step numbers 1, 2 and 3 (refer to Figure 3.1).

  Benchmark      IPA 1  IPA 2  IPA 3  EPA 1  EPA 3  S×    D×
  Quicksort      5.5    5.4    0.4    1.1    1.4    0.1   2.7
  Blackscholes   5.5    8      0.5    4.1    72     0.29  72
  Streamcluster  5.5    7.3    0.7    4.1    52.2   0.31  44
  Swaptions      9.5    8.9    1.1    18.1   >300   0.96  151
  Nullhttpd      5.6    4.9    0.3    3.6    22.1   0.32  27
  Nbds           32     18.3   7.4    49.2   28.3   0.98  7.3

Observation 5 IPA incurs a higher setup overhead compared to golden run EPA, but has significantly lower fault detection overhead.

4.9 RQ5: Inferring Invariants at a Lower Confidence

In Section 4.2, we only consider invariants inferred from the data traces at the 99% Daikon confidence level. In this section, we analyze whether it is possible to infer more stable invariants at lower confidence levels. The confidence represents the probability that an invariant is not inferred randomly (i.e., from only one sample out of thousands of traces) [8]. The availability of more stable invariants at lower confidence levels may offer higher fault coverage.

We refer to the Daikon confidence of invariants as confidence rather than their statistical confidence. Each class of invariant has a different measure of confidence with respect to the samples provided [8]. For instance, a relational condition invariant such as a < b has a confidence of 1 - 1/2^n, where n is the number of samples in the data trace. However, an invariant that is falsified within the data trace file always has a confidence equal to 0.

We conduct an experiment where we infer invariants at various confidence levels by adjusting the "conf_limit" option in Daikon. Table 4.7 shows the number of invariants inferred at the 99%, 80% and 60% levels. We take the number of invariants at the 99% confidence as the baseline.
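To make the confidence measure concrete, the following sketch mirrors the quoted 1 - 1/2^n formula for relational invariants; it is our illustration, not Daikon's actual implementation:

```python
def relational_confidence(num_samples: int, falsified: bool) -> float:
    """Confidence that a relational invariant like a < b is not coincidental.

    With n samples, the probability that a < b held every time purely by
    chance is (1/2)^n, so confidence grows as 1 - 1/2^n. An invariant
    falsified anywhere in the data trace always has confidence 0.
    """
    if falsified:
        return 0.0
    return 1.0 - 0.5 ** num_samples

# With 7 samples the invariant already clears the 99% confidence threshold.
print(relational_confidence(7, falsified=False) >= 0.99)  # True (1 - 1/128)
```

This also shows why lowering the threshold admits few extra invariants: for any invariant observed across more than a handful of samples, the confidence is already close to 1.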
Of the four benchmarks shown, there is a modest increase in invariants, ranging from 0% to 4% at the 80% confidence level and 0% to 10% at the 60% level. The number of invariants for Swaptions and Nullhttpd did not change with the confidence levels.

Table 4.7: Number of invariants inferred per confidence level.

  Conf.  Quicksort  Blackscholes  Streamcluster  Nbds
  99%    25         24            29             73
  80%    26         24            29             76
  60%    28         25            32             79

Furthermore, when multiple profiling runs are applied prior to invariant inference, the invariants stabilize quickly, at 5 runs. This shows that the use of multiple profiling runs rather than single profiling runs leads to more stable invariants.

Observation 6 The number of invariants does not significantly increase when the invariants are inferred at a lower confidence.

4.10 RQ6: Inferring Invariants at a Finer Granularity

In Section 4.2 through Section 4.9, IPA infers invariants at the function level. We now explore the feasibility of inferring invariants at a finer program granularity, namely the basic block level. The purpose of this experiment is to determine whether we can increase fault coverage by inferring invariants at finer-grained program points. For instance, consider the function in Figure 3.2, which we used in Section 3.4 to demonstrate IPA with function invariants. This function can be divided into the 6 basic blocks shown in Figure 4.10; we infer invariants at the beginning of each block. The invariants at the beginning of the first basic block (lines 1-7) and the second basic block (lines 8-9) are, respectively, {InputX > 0} and {InputX < 0}. If the condition in line 7 is erroneously changed to InputX > 0.0 (the same fault described in Section 3.4), the fault detection module can invalidate the faulty program trace through the InputX < 0 invariant. Using basic block invariants, the fault is detected in the second basic block rather than at the end of the function.
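The gain from the finer granularity can be illustrated with a toy checker. The invariants and the InputX variable follow the example above; the checker itself is our sketch, not IPA's actual fault detection module:

```python
# Toy invariant checker contrasting function-level and basic-block-level
# detection for the InputX example. Invariants are (description, predicate)
# pairs keyed by the program point at which they must hold.
invariants = {
    "func_entry": [("InputX != 0", lambda v: v != 0)],
    "bb1_entry":  [("InputX > 0", lambda v: v > 0)],  # then-branch block
    "bb2_entry":  [("InputX < 0", lambda v: v < 0)],  # else-branch block
}

def check(point: str, input_x: float) -> list:
    """Return the invariants violated by the traced value at this point."""
    return [desc for desc, pred in invariants.get(point, [])
            if not pred(input_x)]

# A fault flips the branch condition, so a positive InputX reaches the
# else-branch block: the bb2 invariant flags it immediately, while a
# function-entry invariant alone would not.
assert check("bb2_entry", 3.5) == ["InputX < 0"]
assert check("func_entry", 3.5) == []
```

The design choice here is the key for the invariant dictionary: keying invariants by basic block entry rather than by function lets a violation be localized to the block where the error first becomes observable.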
IPA may not always detect faults by relying solely on invariants at the function entry and exit points. A fault may lead to an observable error in the body of a function, where a function variable is corrupted. In this example, InputX may be overwritten in the computeNPrimeX() function on line 13, and the function exit invariants may not detect the fault.

To enable IPA to infer invariants at basic blocks, we modify the instrumentation step of IPA in Figure 3.1 by adding an LLVM pass that instruments programs at the entry point of each basic block. Like the function instrumentation pass in IPA, the modified pass traces the values of function argument variables rather than intermediate variables. We also change the fault detection script to validate the program trace values against basic block invariants instead of function invariants. All other modules in the IPA workflow remain the same in this experiment.

We repeat the experiment in Section 4.5 to determine whether the generated invariants stabilize within a fixed number of runs. In Figure 4.11, we observe that the invariants from four benchmark programs do not stabilize within 20 runs. Additionally, Nbds shows large fluctuations in inferred invariants, as seen in Figure 4.12. We display Nbds separately as the number of its invariants significantly exceeds the range of the other benchmark programs. However, Swaptions and Nullhttpd have no fluctuations in their inferred invariants. We observe that the invariant fluctuations are greater in benchmarks whose functions were divided into a large number of basic blocks. For example, functions in Swaptions were divided into 4 basic blocks on average, while many functions in Nbds were divided into 8 or more basic blocks. The finer division of functions into basic blocks also indicates an increased presence of branching, and the values of function variables likely fluctuate around the branch conditions.
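The stability criterion used in this experiment can be phrased as a convergence check over the per-run invariant sets. A minimal sketch, with made-up run data for illustration (the helper and data are ours):

```python
def runs_until_stable(invariant_sets, window=3):
    """Return the 1-based run index at which the inferred invariant set
    stops changing for `window` consecutive runs, or None if it never does.
    """
    stable = 0
    for i in range(1, len(invariant_sets)):
        stable = stable + 1 if invariant_sets[i] == invariant_sets[i - 1] else 0
        if stable >= window:
            return i + 1 - window
    return None

# Function-level sets settle quickly; block-level sets keep fluctuating
# because values observed inside branch-guarded blocks vary across runs.
function_level = [{"a > 0"}, {"a > 0", "b < 5"}] + [{"a > 0", "b < 5"}] * 18
block_level = [{"a > 0"} if i % 2 else {"a > 0", "x != 0"} for i in range(20)]
print(runs_until_stable(function_level))  # 2
print(runs_until_stable(block_level))     # None
```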
Based on these observations, invariants inferred at the basic block granularity are not suitable for fault detection. Since the number of invariants differs significantly between profiling runs, the set of invariants used as a test oracle does not stabilize within the range of 20 runs.

Observation 7 Likely invariants inferred at the basic block granularity, under repeated program executions with the same input, do not stabilize within twenty executions.

4.11 Summary

In this chapter, we presented the results of the evaluation of IPA. We find that golden run based EPA is unsound for multithreaded programs, as fault-free traces differ from one another. We also find that the invariants derived by IPA are stable across multiple executions and offer 10% to 97% fault coverage, depending on the fault type and application. We find that IPA is significantly quicker than EPA, although it incurs a higher one-time setup overhead. Finally, we discover that invariants at the basic block granularity do not provide the stability required for EPA.
In the next chapter, we discuss the implications of these results.

Figure 4.4: Proportion of 1000 faulty Quicksort runs that violate at least one invariant
Figure 4.5: Proportion of 1000 faulty Swaptions runs that violate at least one invariant
Figure 4.6: Proportion of 1000 faulty Blackscholes runs that violate at least one invariant
Figure 4.7: Proportion of 1000 faulty Streamcluster runs that violate at least one invariant
Figure 4.8: Proportion of 1000 faulty Nullhttpd runs that violate at least one invariant
Figure 4.9: Proportion of 1000 faulty Nbds runs that violate at least one invariant
Figure 4.10: Control flow graph with labelled basic blocks of the function shown in Figure 3.2
Figure 4.11: Number of invariants generated from varying numbers of profiling runs at the basic block level
Figure 4.12: Number of invariants generated from varying numbers of profiling runs at the basic block level (Nbds)

Chapter 5

Discussion

In this chapter, we first present the implications of our results, then outline the threats to the validity of our study, and lastly discuss the false positives and false negatives incurred by IPA.

5.1 Implications

In this thesis, we address the question of whether likely invariants derived by automated techniques can be used
for EPA in multithreaded programs. IPA requires stable invariants that provide high coverage for different types of faults. We find that the invariants stabilize within a few executions of the program. However, their coverage is highly dependent on the application. For some applications, the coverage provided is high (80% to 90%), while for other applications, the coverage is quite low (10% or less). The type of inferred invariants is another factor for consideration. In Table 4.2, relational invariants (Type G) are predominant in all benchmarks. Conversely, as seen in both Table 4.4 and Table 4.5, their fault coverage is low. This suggests that existing invariants derived by automated tools such as Daikon [8] may not be sufficient to ensure high fault coverage across applications.

Furthermore, the coverage provided by the invariants depends on the specific fault that is injected, e.g., race conditions. Finally, most of the invariants provide coverage for benign failures and crashes, both of which are much more numerous than SDCs. However, SDCs are an important concern in practice, as they can result in catastrophic failures, and likely invariants do not currently provide high coverage for SDCs. Improving the coverage of likely invariants for SDCs is a direction for future work.

We further study the effect of invariant structure on fault coverage by grouping the invariants into different categories. Similar to the prior experiments in Section 4.6, we observe a significant correlation between the invariant structure and fault coverage, though the coverage depends more on the application than on the fault type. However, we find that no single class of invariants provides high coverage across all applications. This implies that it may be better to derive invariants on an application-specific basis, say based on its algorithm, than to use generic approaches such as Daikon for deriving the invariants.
This is also a direction for future work.

Lastly, we examine the confidence and granularity parameters of the invariant inference module in IPA. In Section 4.9, we observe that lowering the Daikon confidence in IPA does not produce significantly more invariants in the benchmark applications. This implies that invariants inferred at the default 99% confidence offer roughly the same fault coverage as invariants inferred at a lower confidence. In Section 4.10, we find that invariants inferred at the basic block granularity do not stabilize within 20 runs and are hence unsuitable for fault detection. In contrast, the default granularity for IPA is the function level, which produces stable invariants with only 5 runs. While changing the granularity to basic blocks may allow more invariants to be inferred overall, the lack of invariant stability within a reasonable number of profiling runs rules out this option.

5.2 Threats to Validity

There are three threats to the validity of our results. First, IPA uses Daikon for generating likely invariants. Some results may not apply if an alternate approach to likely invariant generation is used, which is an external threat to validity. However, Daikon is the most common likely invariant generator used today, and it produces reasonably generic invariants. Hence, we consider our results valid for most scenarios.

Second, since IPA is limited to tracing local values of primitive data types, the set of generated invariants excludes invariants involving objects and global values. As a result, the invariants deployed for validation are not necessarily the most relevant invariants for the program. This is an internal threat to validity. However, most benchmarks in this study use only primitive data types in their function parameters, and hence this was not an issue in our programs.

Finally, we consider only a limited number of fault types (6) and a limited number of benchmark programs to evaluate IPA.
However, we chose the fault types to cover common software bugs, and hence we believe the results are representative. Further, we chose the benchmarks to represent a wide variety of scenarios where multi-threading is commonly used.

5.3 False Positives and False Negatives

5.3.1 False Positives

In this work, false positives are defined as violated likely invariants that are not caused by an injected fault. False positives could prompt users of IPA to address a fault that is not present in the system, diverting developer effort. IPA may report false positives for the following reasons:

• Small sample size: If the number of traces passed to the inference module is small, it is possible that an invariant inferred by Daikon could be falsified if additional traces were explored.

• Unstable invariants: If the set of inferred invariants is unstable, spurious invariants (i.e., invariants that only appear in a few traces) could be utilized for fault detection. IPA tries to address this problem by inferring invariants from multiple profiling runs. However, the threshold for convergence may vary across applications. It is also possible that the set of invariants may not be truly convergent.

In Section 4.6, we did not measure the false positive rate separately from the fault coverage reported by IPA. We minimize the false positive rate by using multiple profiling runs and stable invariants for fault detection.

5.3.2 False Negatives

False negatives are defined as activated faults that do not violate any likely invariant. The false negative rate of IPA is the proportion of faulty traces with no violated invariants, i.e., the complement of the fault coverage rate. IPA may report false negatives for the following reasons:

• Invariants on primitives only: IPA infers invariants on primitive data types only. As a result, IPA cannot detect faults that impact objects.

• Invariants on function arguments only: IPA only infers invariants on the function arguments.
This means that IPA is unable to detect faults that impact local variables in functions.

Compared with false positives, false negatives are more pertinent to users of IPA, as their presence could lead users to erroneously neglect the effects of a fault on their target program. However, the sources of false negatives in IPA can be resolved through further development of the framework.

Chapter 6

Conclusions and Future Work

This thesis presented an alternative EPA framework using likely invariants for multithreaded programs. The findings from the evaluation reveal that this framework can act as a substitute for golden run based EPA in programs with a sufficient number of stable invariants. This chapter summarizes the research contributions of this thesis and makes suggestions for future work.

6.1 Conclusions

With processors expanding core counts, multithreaded programs are rising in prevalence. Despite this trend, existing methods for EPA that make use of golden traces are unequipped to handle multithreaded programs.

To address this problem, we present an EPA framework using likely invariants in lieu of golden traces, and experimentally evaluate the effectiveness of the invariants. Our results indicate that invariants can be dynamically derived in all of our benchmark applications, with reasonable stability. We inject an assortment of software faults into the benchmarks to assess the fault coverage offered by the invariants. We discover that invariants can capture the effects of faults, but the specific rates of fault coverage diverge between different applications and fault types. Building on these observations, we further dissect the analysis of fault coverage by grouping similar classes of invariants together, and find that certain invariant classes offer higher fault coverage than others across different fault classes. Lastly, we explore the choice of IPA's default confidence and granularity parameters in the invariant inference module and find them to be justified among the alternatives.
In conclusion, likely invariants offer a viable replacement for golden-run based EPA in some applications, but not in others.

6.2 Future Work

Future work can explore the use of application-specific invariants for increasing fault coverage, and the feasibility of deploying IPA as an intrusion detection system (IDS) in concurrent systems.

In our assessment of IPA, we find that the fault coverage of invariants varies across applications. We hypothesize that application-specific invariants may yield greater fault coverage than presently observed. Unlike the generic classes of invariants inferred by Daikon, which we use in our inference module, application-specific invariants would be tailored to the program's algorithm. For example, Daikon currently does not output invariants over the summation of variables. Such classes of invariants may be important in certain algorithms. Another related direction is to expand the scope of the profiling and inference modules to handle invariants over different data structures, beyond primitive types. For example, this enhancement could allow IPA to infer invariants directly on graphs in graph searching algorithms, rather than on the integer values of the vertices. This may help IPA detect faults that affect the edges, rather than waiting until a vertex value is corrupted. Static analysis techniques could be incorporated to determine the appropriate type of invariants for different algorithms.

Although IPA is a fault injection framework, many lessons learned in this thesis could be extended to software-based IDS in concurrent systems. Software-based IDS reports security attacks on target programs by detecting violations of software system properties. Many recent IDS have utilized invariants for this purpose [2, 40]. IPA could potentially offer an IDS framework for concurrent systems. However, additional work would be required to understand the intrusion detection capabilities of the invariants against realistic threat models.
Like the work conducted in this thesis, it would also be worthwhile to understand the detection capabilities of different classes of invariants and determine methods to increase the intrusion detection capability of IPA.

In this thesis, we focus on the impact of non-determinism caused by thread interleavings on EPA and propose IPA as an alternative framework. However, we believe IPA could be applied to other sources of non-determinism, such as probabilistic algorithms. Furthermore, the contributions of this thesis could expand understanding of the impacts of real-world faults in non-deterministic environments and promote the development of more resilient applications in such systems.

Bibliography

[1] M. R. Aliabadi, K. Pattabiraman, and N. Bidokhti. Soft-LLFI: A Comprehensive Framework for Software Fault Injection. In Proc. ISSREW ’14, pages 1–5, 2014.
[2] M. R. Aliabadi, A. A. Kamath, J. Gascon-Samson, and K. Pattabiraman. Artinali: Dynamic invariant detection for cyber-physical system security. In Proc. FSE ’17, 2017.
[3] C. Bienia. Benchmarking Modern Multiprocessors. PhD thesis, Princeton University, 2011.
[4] A. Chan, S. Winter, H. Saissi, K. Pattabiraman, and N. Suri. IPA: Error propagation analysis of multi-threaded programs using likely invariants. In Proc. ICST ’17, pages 184–195, 2017.
[5] J. Christmansson, M. Hiller, and M. Rimen. An experimental comparison of fault and error injection. In Proc. ISSRE ’98, pages 369–378, 1998.
[6] C. Csallner, N. Tillmann, and Y. Smaragdakis. DySy: Dynamic Symbolic Execution for Invariant Inference. In Proc. ICSE ’08, pages 281–290, 2008.
[7] J. Duraes and H. Madeira. Emulation of Software Faults: A Field Data Study and a Practical Approach. 32(11):849–867, 2006.
[8] M. D. Ernst, A. Czeisler, W. G. Griswold, and D. Notkin. Quickly Detecting Relevant Program Invariants. In Proc. ICSE ’00, pages 449–458, 2000.
[9] C. Fetzer, P. Felber, and K. Hogstedt. Automatic detection and masking of nonatomic exception handling. 30(8):547–560, 2004.
[10] C. Fu and B. Ryder. Exception-Chain Analysis: Revealing Exception Handling Architecture in Java Server Applications. In Proc. ICSE ’07, pages 230–239, 2007.
[11] C. Fu, A. Milanova, B. Ryder, and D. Wonnacott. Robustness testing of Java server applications. Software Engineering, IEEE Transactions on, 31(4):292–311, 2005.
[12] A. Ghosh, T. O’Connor, and G. McGraw. An automated approach for identifying potential vulnerabilities in software. In Proc. IEEE S&P, pages 104–114, 1998.
[13] C. Giuffrida, A. Kuijsten, and A. S. Tanenbaum. EDFI: A Dependable Fault Injection Tool for Dependability Benchmarking Experiments. In Proc. PRDC ’13, pages 31–40, 2013.
[14] S. Hangal and M. S. Lam. Tracking Down Software Bugs Using Automatic Anomaly Detection. In Proc. ICSE ’02, pages 291–301, 2002.
[15] M. Hiller, A. Jhumka, and N. Suri. PROPANE: An Environment for Examining the Propagation of Errors in Software. In Proc. ISSTA ’02, pages 81–85, 2002.
[16] M.-C. Hsueh, T. K. Tsai, and R. K. Iyer. Fault injection techniques and tools. 30(4):75–82, 1997.
[17] D. Jeffrey, N. Gupta, and R. Gupta. Identifying the root causes of memory bugs using corrupted memory location suppression. In Proc. ICSM ’08, pages 356–365, 2008.
[18] A. Jin. A PIN-Based Dynamic Software Fault Injection System. In Proc. ICYCS ’08, pages 2160–2167. IEEE, 2008.
[19] A. J. Ko and B. A. Myers. Development and Evaluation of a Model of Programming Errors. In Proc. HCC ’03, pages 7–14, 2003.
[20] P. Koopman and J. DeVale. The exception handling effectiveness of POSIX operating systems. 26(9):837–848, 2000.
[21] M. Kusano, A. Chattopadhyay, and C. Wang. Dynamic Generation of Likely Invariants for Multithreaded Programs. In Proc. ICSE ’15, pages 835–846, 2015.
[22] C. Lattner and V. Adve. LLVM: a compilation framework for lifelong program analysis & transformation. In Proc. CGO ’04, pages 75–86, 2004.
[23] M. Leeke and A. Jhumka. Evaluating the Use of Reference Run Models in Fault Injection Analysis. In Proc. PRDC ’09, pages 121–124, 2009.
[24] G. Lemos and E. Martins. Specification-guided golden run for analysis of robustness testing results. In Proc. SERE ’12, pages 157–166, 2012.
[25] B. Liblit, A. Aiken, A. X. Zheng, and M. I. Jordan. Bug isolation via remote program sampling. In Proc. PLDI ’03, pages 141–154, 2003.
[26] Q. Lu, M. Farahani, J. Wei, A. Thomas, and K. Pattabiraman. LLFI: An Intermediate Code-Level Fault Injection Tool for Hardware Faults. In Proc. QRS ’15, pages 11–16, 2015.
[27] S. Lu, J. Tucek, F. Qin, and Y. Zhou. AVIO: Detecting atomicity violations via access interleaving invariants. In Proc. ASPLOS XII, pages 37–48, 2006.
[28] P. Marinescu and G. Candea. LFI: A practical and general library-level fault injector. In Proc. DSN ’09, pages 379–388, 2009.
[29] A. Mazurkiewicz. Trace theory. In Petri nets: applications and relationships to other models of concurrency, pages 278–324. Springer, 1987.
[30] R. Natella, D. Cotroneo, J. Duraes, and H. Madeira. On Fault Representativeness of Software Fault Injection. 39(1):80–96, 2013.
[31] T. Naughton, W. Bland, G. Vallee, C. Engelmann, and S. L. Scott. Fault Injection Framework for System Resilience Evaluation: Fake Faults for Finding Future Failures. In Proc. Resilience ’09, pages 23–28, 2009.
[32] nbds. Non-blocking data structures. https://code.google.com/p/nbds/. Accessed: 2016-05-17.
[33] nullhttpd. Null httpd. https://sourceforge.net/projects/nullhttpd/. Accessed: 2016-05-17.
[34] O. Padon, N. Immerman, S. Shoham, A. Karbyshev, and M. Sagiv. Decidability of Inferring Inductive Invariants. In Proc. POPL ’16, pages 217–231, 2016.
[35] M. Raiyat Aliabadi and K. Pattabiraman. FIDL: A Fault Injection Description Language for Compiler-Based SFI Tools, pages 12–23. Springer International Publishing, Cham, 2016.
[36] S. Sahoo, M.-L. Li, P. Ramachandran, S. Adve, V. Adve, and Y. Zhou. Using likely program invariants to detect hardware errors. In Proc. DSN ’08, pages 70–79, 2008.
[37] D. Schuler, V. Dallmeier, and A. Zeller. Efficient mutation testing by checking invariant violations. In Proc. ISSTA ’09, pages 69–80, 2009.
[38] Z. Segall, D. Vrsalovic, D. Siewiorek, D. Yaskin, J. Kownacki, J. Barton, R. Dancey, A. Robinson, and T. Lin. FIAT: Fault injection based automated testing environment. In Proc. FTCS-18, pages 102–107, 1988.
[39] M. Sullivan and R. Chillarege. Software defects and their impact on system availability: a study of field failures in operating systems. In Proc. FTCS-21, pages 2–9, 1991.
[40] F. M. Tabrizi and K. Pattabiraman. Flexible intrusion detection systems for memory-constrained embedded systems. In Proc. EDCC ’15, pages 1–12, 2015.
[41] V. Vipindeep and P. Jalote. List of common bugs and programming practices to avoid them, 2005.
[42] E. A. Youngs. Human errors in programming. International Journal of Man-Machine Studies, 6(3):361–376, 1974.
[43] M. Zitser, R. Lippmann, and T. Leek. Testing static analysis tools using exploitable buffer overflows from open source code. In Proc. SIGSOFT ’04, pages 97–106. ACM, 2004.

