Configurable detection of SDC-causing errors in programs (Lu, Qining, 2015)

Configurable Detection of SDC-causing Errors in Programs

by

Qining Lu

B.E., Beihang University, 2012

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in The Faculty of Graduate and Postdoctoral Studies (Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

February 2015

© Qining Lu 2015

Abstract

Silent Data Corruption (SDC) is a serious reliability issue in many domains, including embedded systems. However, current protection techniques are brittle, and do not allow programmers to trade off performance for SDC coverage. Further, many of them require tens of thousands of fault injection experiments, which are highly time-intensive. In this thesis, we propose two empirical models, namely SDCTune and SDCAuto, to predict the SDC proneness of a program's data. Both models are based on static and dynamic features of the program alone, and do not require fault injections to be performed. The difference between the two models is that SDCTune is built using a manual tuning process, while SDCAuto is built using a machine learning algorithm. We then develop an algorithm using both models to selectively protect the most SDC-prone data in the program subject to a given performance overhead bound. Our results show that both models are accurate at predicting the relative SDC rate of an application. In terms of efficiency of detection (i.e., the ratio of SDC coverage provided to performance overhead), our technique outperforms full duplication by a factor of 0.78x to 1.65x with the SDCTune model, and 0.62x to 0.96x with the SDCAuto model.

Preface

This thesis is based on work conducted by myself in collaboration with Dr. Karthik Pattabiraman, Dr. Meeta S. Gupta and Dr. Jude A. Rivers. The work was published as a conference paper at the ESWeek 2014 International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES) [21]. I was responsible for coming up with the solution and validating it, evaluating the solution and analyzing the results, and writing the paper. Karthik was responsible for guiding me through the solution reasoning, experiment design and results analysis, as well as editing and writing portions of the paper.

Qining Lu, Karthik Pattabiraman, Meeta S. Gupta and Jude A. Rivers, "SDCTune: A Model for Predicting the SDC Proneness of an Application for Configurable Protection," ESWeek International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES), 2014.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication

1 Introduction
  1.1 Motivation
  1.2 Background
  1.3 Proposed Solution
  1.4 Contributions

2 Initial Fault Injection Study
  2.1 Terminology and Protection Model
  2.2 Fault Model
  2.3 Fault Injection Experiment
  2.4 Injection Results
  2.5 Summary

3 Heuristics
  3.1 Heuristics for Fault Propagation
  3.2 Heuristics for Store Operations
  3.3 Heuristics for Comparison Operations
  3.4 Heuristics of Other Factors
  3.5 Summary

4 Approach
  4.1 Feature Extraction
  4.2 Manually Tuned Model: SDCTune
  4.3 Automatically Tuned Model: SDCAuto
  4.4 Model Usage
  4.5 Choosing the Instructions
  4.6 Detector Design
  4.7 Summary

5 Experimental Setup
  5.1 Benchmarks
  5.2 Evaluation Method
  5.3 Work Flow and Implementation
  5.4 Summary

6 Results
  6.1 Time Taken by Models
  6.2 Effect of Decision Tree Parameters
  6.3 Estimation of Overall SDC Rates
  6.4 SDC Coverage and Detection Efficiency
  6.5 Summary

7 Discussion
  7.1 Differences between Benchmarks
  7.2 Differences between Models
  7.3 Threats to Validity

8 Related Work
  8.1 Duplication Based Techniques
  8.2 Invariant Based Techniques
  8.3 Application or Algorithm Specific Techniques

9 Conclusion and Future Work

Bibliography

List of Tables

2.1 Variation of SDC proneness of highly executed instructions
3.1 Effects on SDC proneness of some operations
3.2 SDC decreasing rates of masking/crashing prone operations
3.3 Four major categories of stored values
4.1 Some features extracted for model building
5.1 Training programs
5.2 Testing programs
6.1 Time consumption of SDCTune and SDCAuto
6.2 Features adopted by the optimal decision trees for the SDCAuto model
6.3 The SDC rates and ranks from fault injections and our models

List of Figures

3.1 Average SDC proneness observed across all studied programs
3.2 Example of Addr NoCmp
3.3 Example of Addr Cmp
3.4 Example of Cmp NoAddr
3.5 Example of Cmp NoAddr
3.6 Example of NoCmp NoAddr
3.7 Example of NoCmp NoAddr
3.8 Examples of comparison results
3.9 Average SDC proneness of global variables and others among variables used in resilient comparisons
3.10 Average SDC proneness of comparisons with different depths of loop
3.11 Average SDC proneness for cumulative variables and others
3.12 Average SDC proneness for different fan-outs
4.1 Auto-tuned decision tree for stored values
4.2 Example of inserted detectors and concatenating instructions
4.3 Example of inserted detectors and lazy checking
5.1 The workflow of building regression trees and exploring the parameter space for SDCAuto
5.2 The workflow of applying our models
6.1 Effect of data point threshold (y-axis) and minimum size of leaves (x-axis) on regression results
6.2 The correlation of overall SDC rates for all programs
6.3 The overhead of full duplication and hot-path duplication
6.4 The results of the SDCTune model for different performance overhead bounds, hot-path duplication and full duplication
6.5 The results of the SDCAuto model for different performance overhead bounds, hot-path duplication and full duplication

Acknowledgements

First of all, I would like to thank my advisor Dr. Karthik Pattabiraman for his support during the past two years. Karthik constantly motivated me to think out of the box independently, and at the same time he gave me guidance and suggestions at the appropriate time. Without his support, this thesis would not have been possible, and I would not have improved so much in creative and critical thinking. Karthik is also a mentor to me. His passion, enthusiasm and consistency are always something I am trying to learn.

I would like to thank my colleagues in the Computer Systems Reading Group (CSRG) for their feedback on my practice talks. The weekly paper discussion meeting is also a place where I got the big picture in this area. Many thanks to my labmates for making the lab an enjoyable place to work, and for their advice and suggestions on my work and paper writing.

I am grateful to my dear friends in Canada and China, for helping me with real problems or talking about life.

Finally, special thanks to my family. They do not know exactly what I am working on, but they have always supported me. I would never be who I am today without their constant support.

Dedication

To my parents

Chapter 1: Introduction

1.1 Motivation

Hardware errors are increasing due to shrinking feature sizes [3, 7].
Conventional hardware-only solutions such as guard banding and hardware redundancy are no longer feasible due to power constraints. As a result, researchers have explored software duplication techniques to tolerate hardware faults [28]. However, generic software solutions such as full duplication incur high power and performance overheads, and hence there is a compelling need for configurable, application-specific solutions for tolerating hardware faults. This is especially so for embedded systems, which have to operate under strict performance and/or power constraints in order to meet system-wide timing and energy targets.

Hardware faults can affect the running software in three ways: (1) they may not have any effect on the application (benign/masked), (2) they may crash or hang the program, or (3) they may lead to incorrect outputs, also called Silent Data Corruptions (SDCs). While crashes and hangs are important from an availability perspective, SDCs are important from a reliability perspective because they cause programs to fail without any indication of the failure. Prior work [24, 34] has broadly focused on crashes and hangs; therefore we focus on configurable techniques to reduce or eliminate the number of SDCs in programs.

Studies have shown that SDCs are caused by errors in a relatively small proportion of programs' data variables [11, 14, 33], and by selectively protecting these SDC-prone variables, one can achieve high coverage against SDCs. However, most prior work has identified SDC-prone variables using fault injection experiments, which are expensive for large applications [11, 14]. Other work [33] focuses on Egregious Data Corruptions (EDCs), which are a subset of SDCs that cause unacceptable deviations in soft-computing applications, i.e., applications with relaxed correctness properties. For example, a single pixel being corrupted in a frame of a video processing application would be an SDC but not an EDC, while the entire frame being corrupted would be an EDC, as it can cause an unacceptable deviation. While their approach is useful for soft-computing applications, it does not apply to general-purpose applications. Further, most of the prior approaches do not allow the user to trade off performance for reliability by selectively protecting only a fraction of the SDC-prone variables to satisfy strict performance constraints, especially for embedded systems. The only exception that we are aware of is the work by Shafique et al. [30], but their technique does not distinguish between SDC-causing errors and other failure-causing errors.

1.2 Background

We adopt Laprie et al.'s definition of the "fault-error-failure" chain [17], as follows:

    System failure occurs when the delivered service deviates from the specified service. The failure occurred because the system was erroneous: an error is that part of the system state which is liable to lead to failure. The cause in its phenomenological sense of an error is a fault.

Faults can be classified into two groups based on their sources: hardware faults and software faults. (1) Hardware faults represent a dysfunction of one or more hardware components for a period of time. Hardware faults can be further classified according to their lifetime into transient hardware faults, intermittent hardware faults and permanent hardware faults. Transient hardware faults usually occur only once at a location, and last only for a short duration of time. Such faults are usually triggered by cosmic ray strikes, temperature variation or electronic noise.
Prior studies have shown that transient hardware faults are increasing due to shrinking feature sizes [3, 7]. Intermittent hardware faults usually recur at a location over a period of time. They are usually the result of timing violations in the chip, which are exacerbated by wear-out. Permanent hardware faults occur continuously at a faulty location, and are usually the result of manufacturing defects or circuit aging. (2) Software faults are those rooted in the software code, and are usually caused by the programmer's mistakes or oversights. Memory leaks, dangling pointers and deadlocks can be considered common software faults. In this work, we focus on the errors caused by transient hardware faults.

A fault becomes an error once it corrupts the state of the program (i.e., changing the result of an operation, accessing wrong memory spaces, etc.). The error may or may not result in a failure. Errors that do not cause failures are known as benign errors. Other errors are classified according to their consequences as follows: (1) Crash, if the error triggers system alerts like hardware exceptions, operating system panics, etc.; (2) SDC, if the program returns a wrong output without throwing any exceptions or raising any alerts; (3) Hang, if the program never ends or ends after a considerably longer time. In this thesis, we focus on detecting SDCs, which are important for the reliability of the program.

To study program behaviour in the presence of faults, we apply fault injection to simulate transient hardware faults in our initial study. Fault injection is a procedure to introduce faults in a systematic, controlled manner to study the behaviour of the system under test. Fault injection techniques can be broadly categorized into hardware-based and software-based techniques. In this work, we adopt a software-based fault injection technique. Software-based fault injection techniques emulate the effects of hardware faults at the software level by corrupting the values of program data or instructions [5, 15]. The main limitations of software-based techniques are their limited coverage of potential fault locations and their speed. However, software-based techniques offer a high level of configurability without invasive hardware modifications, and sufficient emulation speed to repeat thousands of runs. Therefore, we use software fault injection for our study.

1.3 Proposed Solution

In this thesis, we propose two models, namely SDCTune and SDCAuto, to quantify the SDC proneness of program variables, and develop a model-based technique to selectively protect highly SDC-prone variables in the program. An SDC-prone variable is one in which a fault is highly likely to result in an SDC, and hence needs to be protected. SDCTune and SDCAuto use only static and dynamic analysis to identify the SDC-prone variables in a program, without requiring any fault injections to be performed. Further, they allow users to configure the amount of protection depending on the amount of performance overhead they are willing to tolerate. We call our first model SDCTune, as it allows tunable protection, and our second model SDCAuto, as it is built automatically through a machine learning algorithm.

The main novelty of our approach is in the identification of heuristics that correlate with highly SDC-prone program variables, and in integrating them into our models to quantify the SDC proneness of a variable.
We extract these heuristics using fault injection experiments on a small set of benchmark programs that we use for training purposes. We integrate the heuristics into our models with automated program analysis and machine learning algorithms. While the initial identification of the heuristics used in SDCTune and SDCAuto requires fault injection, we do not need fault injection to apply our models to new programs.

In this thesis, we target transient errors, and hence we focus on error detection rather than recovery (as the program can be restarted from a checkpoint to recover from a transient error). We use SDCTune and SDCAuto to identify SDC-prone variables in the program, and to derive error detectors for the variables, subject to a given performance overhead. Our detectors recompute the value of the chosen variable(s) by duplicating their backward slice(s), and compare the recomputed value with the original one. Any deviation between the two values is treated as a successful error detection.

1.4 Contributions

We make the following contributions in this thesis:

- We develop heuristics to identify SDC-prone variables based on an initial fault-injection study (Chapter 2). These heuristics are based on static analysis and profile information (Chapter 3).

- We first develop a manually tuned model, SDCTune, based on the heuristics developed to identify the relatively SDC-prone variables in a program. We then propose an algorithm based on the model to derive error detectors that check the values of the SDC-prone variables at runtime, subject to a performance overhead constraint specified by the programmer (Chapter 4).

- We also develop an automatically tuned model, SDCAuto, based on a decision tree machine learning algorithm [27], which builds a regression model automatically from training data.

- We evaluate SDCTune and SDCAuto by using them to predict the overall SDC proneness of a program relative to other programs. The results show that both models are highly accurate at this prediction: the correlation coefficient between the predicted and observed overall SDC rates ranges from 0.855 to 0.877, depending on the model (Chapter 6).

- We evaluate the detectors inserted by our algorithm by performing fault-injection experiments on six programs different from those used in our model extraction, for performance overhead bounds ranging from 10% to 30%. The results show that our detectors can achieve high detection coverage for SDC-causing errors under the given performance overhead. SDCTune achieves 0.78x to 1.65x higher efficiencies (i.e., the ratio of SDC coverage provided to performance overhead) than both full duplication and hot-path duplication, while SDCAuto achieves 0.62x to 0.96x higher efficiencies (Chapter 6).

Chapter 2: Initial Fault Injection Study

Because SDC failures are caused by faults that propagate to the program's output, the SDC proneness of an instruction depends on how it propagates a fault, which in turn is determined by its data dependencies. In this chapter, we empirically study how the SDC proneness of instructions is influenced by their data dependency chains. We first define some terms we will use in this thesis and formalize the protection problem. We then present our fault model in Section 2.2 and describe our fault injection experiment in Section 2.3.
The results of the experiment are discussed in Section 2.4, and will be used in Chapter 3 to develop heuristics for estimating the SDC proneness of program variables.

2.1 Terminology and Protection Model

We first define the following terms used in this thesis:

Overall SDC rate: the overall probability that a transient hardware fault leads to an SDC in the program. We denote this by P(SDC).

SDC coverage of an instruction: we define the SDC coverage of an instruction I to be the probability that an SDC failure is caused by a transient hardware fault in instruction I's result, and thus can be detected by protecting instruction I with a detector. This is denoted as P(I|SDC).

SDC proneness per instruction: the probability that a transient hardware fault in instruction I leads to an SDC. This is denoted as P(SDC|I).

Dynamic count ratio: the ratio of the number of dynamic instances of instruction I executed to the total number of dynamic instructions in the program. This is denoted as P(I).

Our overall goal is to selectively protect instructions with detectors, to maximize the SDC detection coverage for a given performance cost budget. The SDC detection coverage of an instruction, P(I|SDC), represents the fraction of SDCs that can be detected by protecting instruction I, and thus directly represents the importance of instruction I. Therefore, our goal is to maximize ∑_{I ∈ inst_set} P(I|SDC) subject to a bound on ∑_{I ∈ inst_set} P(I) specified by the user. Here, ∑_{I ∈ inst_set} P(I|SDC) is the coverage of SDC-causing faults obtained by protecting the instructions in the set inst_set, while ∑_{I ∈ inst_set} P(I) is the number of dynamic instances of the protected instructions, and is proportional to the protection overhead.

As mentioned above, it is important to understand how P(I|SDC) varies for each instruction in the program. One way to do this is to perform random fault injection into the program and measure P(I|SDC) for each instruction. However, it is difficult to directly measure this probability for each instruction by random fault injection, as each instruction may not be injected a sufficient number of times to obtain statistically significant estimates. Instead, we perform a fixed number of fault injections into individual instructions to measure their SDC proneness, P(SDC|I). We then use Bayes' formula to obtain P(I|SDC):

    P(I|SDC) = P(SDC|I) P(I) / P(SDC)                          (2.1)

where

    P(SDC) = ∑_{I ∈ prog} P(SDC|I) P(I)                        (2.2)

2.2 Fault Model

We consider transient hardware faults that occur in processors and corrupt program data. Such faults are usually caused by electrical noise, cosmic rays or temperature variation. These faults are exacerbated by decreases in feature sizes and supply voltages. More specifically, we focus on faults that occur in the processor's functional units and registers (i.e., the ALUs, LSUs, GPRs, etc.), which generally result in a corruption of the program data. However, we do not consider faults in caches or control logic. Architectural solutions [19] such as ECC or parity can protect the chip from faults in the caches, while faults in the control logic usually trigger hardware exceptions [37]. We do not consider faults in the program's code or program counter, as such faults can be detected by control-flow checking techniques.

As in other work [10, 11, 33], we assume that at most one fault occurs during a program's execution. This is because transient faults are rare relative to the execution times of typical programs.
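Equations (2.1) and (2.2) from Section 2.1 can be applied mechanically once per-instruction SDC proneness and dynamic count ratios are available. The following is a minimal illustrative sketch of that calculation (not the thesis tooling; the instruction names and numbers are hypothetical):

    # Sketch of Equations (2.1) and (2.2): derive each instruction's SDC coverage
    # P(I|SDC) from its measured SDC proneness P(SDC|I) and its dynamic count
    # ratio P(I). Both inputs would come from fault injection and profiling.

    def overall_sdc_rate(sdc_proneness, dyn_ratio):
        # Equation (2.2): P(SDC) = sum over I of P(SDC|I) * P(I)
        return sum(sdc_proneness[i] * dyn_ratio[i] for i in sdc_proneness)

    def sdc_coverage(sdc_proneness, dyn_ratio):
        # Equation (2.1): P(I|SDC) = P(SDC|I) * P(I) / P(SDC)
        p_sdc = overall_sdc_rate(sdc_proneness, dyn_ratio)
        return {i: sdc_proneness[i] * dyn_ratio[i] / p_sdc for i in sdc_proneness}

    # Hypothetical two-instruction example:
    proneness = {"t7": 0.58, "t8": 0.71}
    ratio = {"t7": 0.6, "t8": 0.4}
    print(sdc_coverage(proneness, ratio))   # t7: about 0.55, t8: about 0.45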
2.3 Fault Injection Experiment

The goal of our fault injection experiment is to understand the reasons for SDCs when faults are injected into the program. In other words, we want to study the SDC proneness of instructions in the program, and understand how it varies by instruction.

The fault injection experiment is conducted using LLFI, a program-level fault injection tool which has been shown to be accurate for measuring SDCs in programs [35]. LLFI works at the intermediate representation (IR) level of the LLVM compiler infrastructure [18], and enables the user to inject faults into LLVM IR instructions. Using LLFI, we inject into the result of a random dynamic instruction to emulate the effect of a computational error in the program. Specifically, we corrupt the instruction's destination register by flipping a single bit in it (similar to what prior work has done [10, 11, 33]). The main advantage of using LLFI is that it allows us to map the faults back to the program's IR and trace their propagation in the program. This is necessary for our analysis.

Note that LLVM uses Static Single Assignment (SSA) form in its IR, so that each instruction is represented by its own result. This makes a program variable equivalent to the instruction that computes it when considering program data in LLVM IR code. Therefore, we treat instructions and program variables as interchangeable in this thesis.

We use four benchmarks in this experiment, namely Bzip2, IS, LU and Water-spatial. They are from the SPEC [13], NAS [1] and SPLASH-2 [36] benchmark suites respectively. Note that these benchmarks are only used for the initial fault-injection study; we later derive and validate the model with a larger set of programs. We choose a limited set of benchmarks in this study to balance representativeness with time efficiency for fault injections.

We classify the outcome of each injection into four categories: (1) Crash, meaning that the program threw an exception; (2) SDC, which means the program's output deviated from the fault-free outcome; (3) Hang, which means the program took significantly longer to execute than a fault-free run; and (4) Benign, which means the program completed successfully and its output matched the fault-free outcome. These outcomes are mutually exclusive and exhaustive.

2.4 Injection Results

The results of our fault injection experiments show that the top 10% most executed instructions, or those on the hot paths of the program, are responsible for 85% of SDC failures on average. This result is similar to that of prior work, which has also observed that a small fraction of static instructions cause most SDCs [11]. However, this does not mean that all the hot-path instructions should be protected, as they incur high performance overhead when protected. Further, there is considerable variation in SDC rates even among the top 10% most executed instructions, as the example below shows.

Table 2.1 shows an excerpt from the Bzip2 program on its hot path. The principle described here is observed across all four benchmarks we studied, but we focus on this (single) basic block for simplicity. The excerpt contains instructions from the LLVM IR, into which we inject faults. Although the original code is in LLVM IR form, we use C source-like semantics for simplicity. For each instruction in the table, we report its SDC proneness measured by fault injection.
Table 2.1: Variation of SDC proneness of highly executed instructions.

    Source code: s->bsBuff |= (v << (32 - s->bsLive - n));
    Basic block: bsW()-bb2

     #   Instruction                  SDC proneness
     1   t1 = &s + OFFSET(bsBuff)     21%
     2   t2 = load t1                 47%
     3   t3 = &s + OFFSET(bsLive)     21%
     4   t4 = load t3                 13%
     5   t5 = 32 - t4                 12%
     6   t6 = t5 - n                  12%
     7   t7 = v << t6                 58%
     8   t8 = t2 | t7                 71%
     9   t9 = &s + OFFSET(bsBuff)     26%
    10   store t8, t9                 -

It can be observed from the table that some of the instructions have low SDC proneness even in this highly executed block, e.g., instructions 4-6. This means that even if a fault occurs in the result of these instructions, it is unlikely to result in an SDC, and hence protecting such instructions is unlikely to improve coverage by much. Therefore, we need to find factors other than execution time that influence the SDC proneness of an instruction.

After investigating further, we found that SDC proneness is highly influenced by data dependencies among the instructions. For example, in Table 2.1, instructions 4-8 constitute a data dependency chain whose final result is stored by instruction 10. Instruction 8 is the end of this data dependency chain and has an SDC proneness of 71%. The result of instruction 7 is used in instruction 8, so a fault may propagate from instruction 7 to instruction 8. However, the execution of instruction 8 (or) can mask the faulty bit from instruction 7 if the corresponding bit of the result of instruction 2 is 1. This explains why the SDC proneness of instruction 7 is slightly lower than that of instruction 8. The operation of instruction 7 (shift left) can mask faults in the high bit positions of its second source operand, due to the architectural wrapping implementation of shift operations. The consequence of this masking effect is the low SDC proneness of instructions 4-6. In addition to the arithmetic operations, our results show that address calculation operations such as instructions 1, 3 and 9 ("getelementptr" instructions in LLVM) have low SDC proneness. This is because the results of such instructions are usually used for pointer dereferences, and faults in them are likely to cause segmentation faults which crash the application.

Thus, we see that to calculate the SDC proneness of an instruction and determine whether it should be protected, one needs to take into account the fault propagation and the SDC proneness of the end point of its data dependency chain. We will examine this in more detail in Chapter 3 by devising heuristics for finding highly SDC-prone instructions.

2.5 Summary

This chapter defined the fault model adopted in our technique (Section 2.2). It also defined the core problem of building a configurable SDC detection technique (Section 2.1), namely, estimating the SDC proneness of an instruction. It then presented the results of fault injection experiments (Section 2.3) and found that fault propagation and the SDC proneness of data dependency end points are the two major factors required to estimate SDC proneness (Section 2.4). We will examine the two factors in detail and formulate heuristics for them in Chapter 3, and then propose our approach for configurable SDC detection in Chapter 4.

Chapter 3: Heuristics

In this chapter, we formulate various heuristics for modelling error propagation in a program, and for estimating the SDC proneness of an instruction. We first propose heuristics as hypotheses, and validate them with our experimental data.
These heuristics will be used in the next chapter to extract the program features that are required to build both the manually tuned and the automatically tuned models.

In the previous chapter, we found that the SDC proneness of a variable depends on (1) the fault propagation in its data dependency chain, and (2) the SDC proneness of the end point of that chain. An end point can be a branch instruction, a store instruction or a function call instruction (in LLVM, function calls are represented by instructions). This is because stores and branches do not have destination registers, and function call instructions create a new stack frame, thereby terminating their dependency chains. However, function calls are not considered in our work, as LLVM aggressively inlines functions, and hence there are few instances of such instructions. Further, because branch instructions depend on the results of comparison instructions to determine the direction of the branch, we consider the results of comparison instructions as the end points of their dependency chains. Therefore, we consider only comparison and store instructions for the SDC proneness of end points of dependency chains.

3.1 Heuristics for Fault Propagation

In this section, we study how faults propagate along dependency chains, and how to estimate the SDC proneness of an instruction based on the SDC proneness of the store or comparison instructions that the instruction depends on, directly or indirectly.

HP1: The SDC proneness of an instruction will decrease if its result is used in either fault-masking or crash-prone instructions.

Fault propagation can be stopped by an instruction either masking the fault, or crashing the program. Both masking and crashing decrease the probability of an SDC resulting from the instruction that propagates its data to the crashing/masking instruction, as a result of which its SDC proneness is lowered. For example, in Table 2.1, the fault masking effect of instruction 7 results in instruction 6 having a low SDC proneness.

Table 3.1 shows instructions that have a high probability of masking the fault or crashing the program, thus lowering the SDC proneness. We derived this table from the initial fault injection study in Section 2.4, based on general trends across the applications. Note that these are conservative, as other instructions may also mask fault propagation in specific circumstances, depending on the values of their operands.

To estimate the SDC proneness of all instructions, we simulate backward fault propagation from the store and comparison instructions through the data dependency chains of the program. The SDC proneness of the result of an instruction will propagate to its source operands unless it is one of the operations listed in Table 3.1, in which case the SDC proneness of the source operands will decrease by a certain extent, as listed in Table 3.2, to model the effect of masking.

Table 3.1: Effects on SDC proneness of some operations

    Operation      Description             Effect
    getelementptr  address calculation     Crash
    trunc          truncate data size      Mask due to truncation
    lshr           logical shift right     Mask due to wrapping
    ashr           arithmetic shift right  Mask due to wrapping
    shl            shift left              Mask due to wrapping

Table 3.2: SDC decreasing rates of masking/crashing prone operations

    Operation      Involved source operands   Decrease by
    getelementptr  all operands               75%
    trunc          variable needs truncation  50%
    lshr           shift bit variable         85%
    ashr           shift bit variable         85%
    shl            shift bit variable         85%
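To illustrate HP1 together with the decrease rates in Table 3.2, the backward estimation step could be sketched as follows. This is a simplified sketch, not the thesis implementation: the def-use edge representation, the choice to combine estimates with max, and ignoring which operand position is involved are all assumptions made for brevity.

    # Simplified sketch of backward fault propagation (heuristic HP1): the SDC
    # proneness of a dependency-chain end point (store/comparison) is propagated
    # back to the instructions feeding it, attenuated by the decrease rates of
    # masking/crash-prone operations (Table 3.2).

    DECREASE = {"getelementptr": 0.75, "trunc": 0.50,
                "lshr": 0.85, "ashr": 0.85, "shl": 0.85}

    def estimate(inst, end_points, users, cache=None):
        """end_points: {inst: SDC proneness of store/comparison end points}
        users: {inst: [(user_inst, user_opcode), ...]} def-use edges."""
        if cache is None:
            cache = {}
        if inst in end_points:
            return end_points[inst]
        if inst in cache:
            return cache[inst]
        estimates = []
        for user, opcode in users.get(inst, []):
            p = estimate(user, end_points, users, cache)
            if opcode in DECREASE:
                p *= 1.0 - DECREASE[opcode]   # masking/crash-prone use lowers SDC proneness
            estimates.append(p)
        cache[inst] = max(estimates, default=0.0)   # assumption: keep the most SDC-prone path
        return cache[inst]

    # Instruction 6 of Table 2.1 feeds the shift amount of instruction 7 (shl),
    # which feeds instruction 8 (or): 0.71 * (1 - 0.85) is roughly 0.11.
    users = {"t6": [("t7", "shl")], "t7": [("t8", "or")]}
    print(estimate("t6", {"t8": 0.71}, users))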
The values in Table 3.2 are based on our fault injection experiments.

Then, the remaining question is how to estimate the SDC proneness of store and comparison instructions. This is addressed in the following two sections.

3.2 Heuristics for Store Operations

In this section, we examine the SDC proneness of store instructions, as stores are one of the two categories of instructions used to estimate the SDC proneness of every instruction in the program. Through our fault injection study in Section 2.4, we found that the SDC proneness of a store instruction depends on how the stored value is used in the program. Therefore, we categorized the stores into four types according to their usage in memory addresses and comparisons, as shown in Table 3.3. For each of the categories, we found that the SDC proneness is dependent on a specific feature of that category, which is also shown in Table 3.3. For example, in the Cmp NoAddr category, the SDC proneness of the store is determined by whether the value results in the comparison result being flipped, thus causing the wrong fork of the branch to be taken. Figure 3.1a shows the average SDC proneness of the four categories, and the associated feature for each of the categories.

Figure 3.1: Average SDC proneness observed across all studied programs. (a) Effects of the major related features for each of the four major categories of stored values. (b) Effect of data width for address-computation-related stored values. (c) Effect of nested loop depths for loop-terminating comparisons.

Table 3.3: Four major categories of stored values

    Addr NoCmp: the stored value is used in calculating memory addresses but not comparison results. Major related feature: data width. Average SDC proneness: 22.82%.
    Addr Cmp: the stored value is used in calculating both memory addresses and comparison results. Major related features: data width and control flow deviation. Average SDC proneness: 48.17%.
    Cmp NoAddr: the stored value is used in calculating comparison results but not memory addresses. Major related feature: resilient or unresilient comparison. Average SDC proneness: 67.25%.
    NoCmp NoAddr: the stored value is used neither in memory address calculation nor in comparison results. Major related feature: used in output or not. Average SDC proneness: 56.41%.

We now examine each of the four categories in detail.

HS1: Addr NoCmp stored values have low SDC proneness in general, as shown in Table 3.3.

This is because faults in such values are highly likely to propagate to the addresses of other loads and stores, which would likely result in the application crashing due to a segmentation fault, especially for those values that are wider than 32 bits (see Figure 3.1a). Figure 3.2 shows an example of this category, where a fault in the destination register of i-3 (line 3) results in a system crash upon pointer dereference.

     1  static void mainSort(...) {
     2    for (; i >= 3; i -= 4)
     3    { ... ptr[j] = i-3; }                       // corrupted
     4  }
     5  static void mainSimpleSort(...) {
     6    while (mainGtU(ptr[j-h]+d, ...))
     7    { ... }
     8  }
     9  static Bool mainGtU(UInt32* i1, ...) {
    10    c1 = block[i1]; ... i1++; c1 = block[i1];   // load operation
    11  }

Figure 3.2: Example of Addr NoCmp from Bzip2.

HS2: Addr Cmp stored values usually have higher SDC proneness than Addr NoCmp.

As shown in Figure 3.1a, by propagating the fault to the comparison instruction, Addr Cmp values may change the control flow and elide the pointer dereference which would otherwise have crashed the application. This decreases the probability of a crash, thereby increasing the SDC proneness compared to the Addr NoCmp category. An example of this category from Bzip2 is shown in Figure 3.3.
     1  static void mainSort(...) {
     2    Int32 lo = ftab[sb] & CLEARMASK;            // corrupted
     3    if (hi > lo) {                              // control flow changed
     4      mainQSort3(lo, ...);
     5    }
     6  }
     7  void mainQSort3(Int32 loSt, ...) {
     8    mpush(loSt, ...); ... mpop(lo, ...);
     9    med = (Int32) mmed3(block[ptr[lo]+d], ...); // load avoided
    10  }

Figure 3.3: Example of Addr Cmp from Bzip2. A fault that occurs at line 2 may not propagate to the load at line 9 because of the control flow deviation at line 3.

HS3: The SDC proneness of Addr NoCmp and Addr Cmp stored values increases as their data width decreases.

Data width is the number of bits in a value, and is a major feature affecting the SDC proneness of stored values used in address computation (i.e., Addr NoCmp and Addr Cmp). Figure 3.1b shows the average SDC proneness of the stored values used in address computations, for different data width values. For values used in address computation, a wider data width means more bits are crash-prone, and hence the value as a whole has lower SDC proneness.

HS4: The SDC proneness of Cmp NoAddr stored values depends on the resilience of the comparison operation to which the value propagates, i.e., how likely the comparison is to change its result given a faulty data operand.

We illustrate the above heuristic with an example from the Bzip2 application. Figure 3.4 shows an example of a resilient comparison operation in line 6. In this case, the equality is not satisfied in the majority of executions (obtained through profiling the program), and hence the branch is highly biased toward the not-equal fork. Therefore, a fault in the variable total_in_lo32 (line 5), which feeds into the comparison operation, is unlikely to result in the equality being true, and hence the control flow of the program does not change from a fault-free execution. We call such comparisons resilient.

     1  Bool copy_input_until_stop(EState* s) {
     2    ...
     3    while (True) {
     4      ...
     5      s->strm->total_in_lo32++;
     6      if (s->strm->total_in_lo32 == 0)
     7        s->strm->total_in_hi32++;
     8    }
     9  }

Figure 3.4: Example of Cmp NoAddr from Bzip2. A resilient comparison operation (line 6) masks the fault that occurs at line 5.

     1  static void sendMTFValues(EState* s) {
     2    for (i = 0; i < nSelectors; i++) {
     3      s->selectorMtf[i] = j;
     4    } ...
     5    for (i = 0; i < nSelectors; i++) {
     6      for (j = 0; j < s->selectorMtf[i]; j++)
     7        bsW(s, 1, 1);
     8    }
     9  }

Figure 3.5: Example of Cmp NoAddr from Bzip2. An unresilient comparison uses the value stored at s->selectorMtf[i] = j (line 3).

     1  void main(...) {
     2    ...
     3    (start) = (unsigned long) (FullTime.tv_usec + FullTime.tv_sec * 1000000);
     4    ...
     5    Global->starttime = start;
     6    printf(..., Global->starttime);
     7  }
     8  // The value stored in
     9  // Global->starttime is not used
    10  // as the output of the program

Figure 3.6: Example of NoCmp NoAddr from IS with zero SDC proneness.

     1  void InitA(double *rhs) {
     2    for (j = 0; j < n; j++) {
     3      for (i = 0; i < n; i++) {
     4        rhs[i] += a[ii][jj];
     5      }
     6    }
     7  }
     8  void CheckResult(..., double* rhs) {
     9    for (j = 0; j < n; j++) { y[j] = rhs[j]; } ...
    10    for (j = 0; j < n; j++) { diff = y[j] - 1.0; ... }
    11    max_diff = diff;
    12    printf(..., max_diff);
    13  }

Figure 3.7: Example of NoCmp NoAddr from LU with high SDC proneness.
On the other hand, the code in Figure 3.5 illustrates a case where a fault in the comparison operand stored by s->selectorMtf[i] = j (line 3) will affect the number of loop iterations, thus making it highly SDC prone. We call such comparisons unresilient. A key factor in deciding the SDC proneness of Cmp NoAddr stored values is whether the comparison using the stored value is resilient (Figure 3.1a).

HS5: The SDC proneness of NoCmp NoAddr stored values depends on the probability of a fault in them propagating to the program's output, and on whether that output is important to the program.

NoCmp NoAddr stored values are used neither in computing memory addresses nor in comparison instructions, and do not affect pointers or branches. Figure 3.6 and Figure 3.7 show two excerpts, from IS and LU respectively. The faulty stored value in IS only affects the time statistics, while the one in LU may affect the output of the application. This explains the difference in their SDC proneness. Also, in Figure 3.1a, we can see that the average SDC proneness of stored values that do not propagate to the program output is much lower than that of values that do.

3.3 Heuristics for Comparison Operations

Comparison instructions are the other category of instructions whose SDC proneness determines the SDC proneness of every instruction in the program. We find that the SDC proneness of comparison instructions depends on three features, as follows.

HC1: Nested loop depths affect the SDC proneness of loops' comparison operations: the SDC proneness of comparison operations in inner loops is generally lower than that of comparison operations in outer loops, as shown in Figure 3.1c.

Figure 3.8a shows an example from Bzip2. Both nHeap > 1 and weight[tmp] < weight[heap[zz >> 1]] are used in determining the loop exit conditions, for the outer and inner loops respectively.

HC2: Comparison operations that only affect silent stores have low SDC proneness.

A silent store is a store whose stored value is not subsequently used by the program. Therefore, the comparison operation has a low likelihood of affecting the program's output. An example from Bzip2 is shown in Figure 3.8b. A flip in the comparison a2update < BZ_N_OVERSHOOT (line 4) can cause the store operation quadrant[a2update+nblock] = qVal (line 5) to be elided. However, this is a silent store, and hence does not result in an SDC.

HC3: Comparisons that affect output-related store values have high SDC proneness.

A fault in these comparisons has a high probability of resulting in a corrupted program output. Figure 3.8c shows an example from the LU benchmark. A faulty comparison result at i < n (line 3) may terminate the loop too early and elide the store operation a[i] += alpha*b[i] (line 4), whose stored value is used in calculating the output. This results in a high SDC proneness of i < n (line 3).
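The notion of a "resilient" comparison in HS4 rests on how biased a branch is in profiled executions. Before turning to the examples in Figure 3.8 below, here is an illustrative sketch of that check; the 0.99 bias threshold and the function name are assumptions for illustration, not values from the thesis, and the entropy computed here reappears later as a feature in Table 4.1.

    import math

    def branch_profile_stats(taken, not_taken):
        """From profiled branch counts, decide whether a comparison looks
        'resilient' (heavily biased, so a flipped operand rarely changes the
        branch direction) and compute its decision entropy."""
        total = taken + not_taken
        p = taken / total
        entropy = -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)
        return {"resilient": max(p, 1.0 - p) >= 0.99, "decision_entropy": entropy}

    # The equality check at line 6 of Figure 3.4 is almost never taken:
    print(branch_profile_stats(taken=1, not_taken=999_999))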
     1  void BZ2_hbMakeCodeLengths(...) {
     2    while (nHeap > 1) {                         // outer loop
     3      ...
     4      while (weight[tmp] < weight[heap[zz >> 1]]) {
     5        // inner loop
     6        heap[zz] = heap[zz >> 1];
     7        zz >>= 1;
     8      }
     9    }
    10  }

(a) An excerpt from Bzip2. Outer loop comparisons have higher SDC proneness than inner loop comparisons.

     1  mainSort(...) {
     2    for (j = bbSize-1; j >= 0; j--) {
     3      ...
     4      if (a2update < BZ_N_OVERSHOOT)
     5        quadrant[a2update+nblock] = qVal;
     6        // not used in the future
     7      ...
     8    }
     9    ...
    10  }

(b) An example from Bzip2 where the comparison result only affects a silent store instruction.

     1  daxpy(double *a, double *b, ...) {
     2    long l;
     3    for (i = 0; i < n; i++) {
     4      a[i] += alpha*b[i]; }                     // skipped due to loop termination
     5  }
     6  bmodd(double *a, double *c, ...) { ...
     7    daxpy(&a[k+1+j*stride_c], &a[k+1+j*stride_a], dimi-k-1, alpha); ...
     8    // the content of a[] is corrupted
     9  }
    10  lu() { ...
    11    A = &a[K+j*nblocks];                        // fault propagates to a[] through the call of bmodd()
    12    bmodd(D, A, strK, strJ, strK, strK); ...    // content of A[] is corrupted
    13  }
    14  CheckResult(..., double* a, ...) { ...        // called by main()
    15    printf(..., max_diff, ...); ...             // corrupted because of corrupted a[]
    16  }

(c) An example from LU where a faulty comparison result at i < n (line 3) changes the control flow and finally affects CheckResult(). (In the original figure, the fault propagation trace is highlighted in red.)

Figure 3.8: Examples of comparison results.

3.4 Heuristics of Other Factors

In addition to the specific features for comparison and store operations we observed in our experiment, the following factors also affect the SDC proneness of an instruction.

HO1: Stored values and comparison operations related to memory allocation functions have low SDC proneness.

Stored values or comparison operations related to memory allocation can directly affect functions such as malloc(), valloc() and palloc(), and hence faults in these instructions are very likely to trigger memory exceptions. This results in their low SDC proneness. We observe that the average SDC proneness of memory-allocation-related store or comparison operations is 12.42%, which is considerably lower than the average of the other store and comparison operations, which is 42.58%.

In addition to the above features, we consider other program features proposed in prior work, such as whether a variable is global [10], the loop depth [33], cumulative calculation [6], and the fan-out of a variable [24].

HO2: Among variables that are used in resilient comparisons, global variables have higher SDC proneness than the other variables leading to resilient comparisons.

As mentioned in prior work [10], global variables are more likely to store the global state of a program, so a fault in these variables is likely to live longer and affect more program data. In our initial study, we found that this effect can be helpful in estimating the SDC proneness of variables that are used in resilient comparisons. As presented in HS4 of Section 3.2, resilient comparisons can mask faults in their backward slices, so that variables used in such comparisons may have lower SDC proneness. However, faults in global variables have longer lifetimes and are hence less likely to be masked. Figure 3.9 shows the average SDC proneness of global variables and others in the group of variables leading to resilient comparisons.

Figure 3.9: Average SDC proneness of global variables and others among variables used in resilient comparisons.

HO3: Comparisons that are at deeper loop depths exhibit higher SDC proneness.

Comparisons at deeper loop depths have higher SDC proneness on average. This heuristic is opposite to the conclusion of prior work [33]. The reason for this difference is that the prior work focuses on Egregious Data Corruptions (EDCs), which are a subset of SDCs that cause significant deviations in the program's output.
A deeper loop structure usually implies a core computation in a program, and faults in comparisons from these deep loops are prone to corrupt a small but critical portion of program data, and therefore to corrupt the output. However, corrupting such a small portion of program data may not lead to a large deviation in the program output, and such faults are hence not considered important in terms of EDC detection. Figure 3.10 shows the average SDC proneness of comparisons at different loop depths in our initial fault injection experiment.

Figure 3.10: Average SDC proneness of comparisons with different depths of loop.

HO4: Variables that are cumulatively calculated have higher SDC proneness.

Similar to the result of prior work [6], cumulatively calculated variables may have higher SDC proneness. This is because faults in such variables may accumulate as the program executes, and are thus less likely to be masked. Figure 3.11 shows the average SDC proneness for cumulative variables and others.

Figure 3.11: Average SDC proneness for cumulative variables and others.

HO5: Variables with higher fan-out tend to have low SDC proneness.

Pattabiraman et al. [24] have found that the fan-out, or the dynamic number of uses of a variable, can be a good measure of the crash proneness of a program variable. In our initial injection experiment, we also found that variables with high fan-out usually have lower SDC proneness when fanout < 4. However, when fanout ≥ 4, we observed a higher average SDC proneness. Figure 3.12 shows the average SDC proneness for different fan-outs in our initial injection experiment.

Figure 3.12: Average SDC proneness for different fan-outs.

3.5 Summary

In Chapter 2, we identified two major factors that contribute to the estimation of SDC proneness: fault propagation, and the SDC proneness of the end points of data dependency chains. In this chapter, we formulated heuristics to model both of these factors. We first identified operations that showed high fault masking or crash proneness across all the programs in our initial study, and then formulated heuristics for fault propagation based on these results (Section 3.1). For the SDC proneness of the end points of data dependency chains, we classified the end points into two groups, store operations and comparison operations, and formulated heuristics for each group (Section 3.2 and Section 3.3). Finally, we incorporated some other program features based on our experiments, and some heuristics proposed in prior work, to complement the heuristics for the two groups.

Chapter 4: Approach

In the previous chapter, we examined various heuristics for identifying SDC-prone variables in a program. In this chapter, we first extract program features based on the heuristics to describe each store and comparison instruction (Section 4.1). We then build the SDCTune (Section 4.2) and SDCAuto (Section 4.3) models with the extracted features, to quantify the estimation of SDC proneness based on empirical data. Finally, we present our approach for choosing the SDC-prone locations to protect subject to a maximum performance overhead using SDCTune and SDCAuto (Section 4.5), and the nature of the detectors we insert to protect the program (Section 4.6).

4.1 Feature Extraction

The first step in building our SDC proneness estimation models is extracting features. As shown in Chapter 3, features are extracted according to our heuristics and also those proposed in prior work [6, 10, 24, 33]. There is a one-to-many mapping between heuristics and features.
In other words, a heuristic may correspond to multiple features. For example, stored values of the Addr NoCmp group are identified using two features, namely address computing and comparison used. However, the features that are eventually selected for use in either SDCTune or SDCAuto are determined in the model building phase, covered in Section 4.2 and Section 4.3 respectively.

We extract features through static analysis and dynamic profiling of the programs. These features describe the stored values and comparison results from three perspectives. (1) Execution-time-related features capture the dynamic counts of a program variable, or the dynamic counts it affects. (2) Code-structure-related features capture the position of a program variable in the code. (3) Data-usage-related features capture how a program variable is used.

Table 4.1 shows an excerpt of all the features we extracted. In total, 66 features are extracted for stored values and 67 for comparisons.

Table 4.1: Some features extracted for model building

Common features
    Execution time related:
        inst func execution time ratio: dynamic count of the specific instruction divided by the dynamic count of the function it belongs to
        inst execution time ratio by max: dynamic count of an instruction divided by the maximum dynamic count over all instructions
        dominated execution time ratio by whole: dominated dynamic count of an instruction divided by the dynamic count of all instructions
        post dominated execution time ratio by max: post-dominated dynamic count of an instruction divided by the maximum over all instructions
    Code structure related:
        bb length: the number of static instructions in the basic block that contains the specific instruction
        bb length ratio by max: bb length divided by the maximum over all instructions
        post dominated loop depth ratio by max: post-dominated loop depth of an instruction divided by the maximum over all instructions
    Data usage related:
        data width: the number of bits of the result of the specific instruction
        in global: whether the specific instruction changes a globally defined value

Features for stored values
    Execution time related:
        execution time loads: the dynamic count of the stored value being loaded
        load execution time entropy: the entropy computed from the probabilities of a stored value being loaded by different load instructions
        execution time required for addr: the dynamic count required for computing the store address
    Code structure related:
        num static loads ratio by max: the number of static load instructions divided by the maximum over all stored values
    Data usage related:
        used in oef func call: whether the stored value is used in functions which have no side effect

Features for comparisons
    Execution time related:
        decision entropy execution time: the entropy computed from the probabilities of the comparison results
    Code structure related:
        is loop terminator: whether the comparison result can break a loop execution
    Data usage related:
        is icmp: whether the comparison is made between integers
        is fcmp: whether the comparison is made between floating point values
        cmp with zero: whether the comparison is made with zero

Along with these features, we also need the SDC proneness of the stored values and comparisons as training data. We conduct fault injection experiments on these variables to gather their SDC proneness.
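Concretely, each training example pairs one store or comparison instruction's feature values with the SDC proneness measured for it by fault injection. A hypothetical record might look like the following sketch; the feature values are made up for illustration, and only the feature names come from Table 4.1.

    # One training example: a few of the 66/67 features from Table 4.1 for a
    # stored value, plus the fault-injection-measured SDC proneness as the target.
    record = {
        # execution time related
        "inst_func_execution_time_ratio": 0.31,
        "inst_execution_time_ratio_by_max": 0.08,
        # code structure related
        "bb_length": 12,
        "post_dominated_loop_depth_ratio_by_max": 0.5,
        # data usage related
        "data_width": 32,
        "in_global": True,
        # training target, measured with fault injection
        "sdc_proneness": 0.47,
    }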
4.2 Manually Tuned Model: SDCTune

Both our manually tuned model (SDCTune) and our automatically tuned model (SDCAuto) for predicting the SDC proneness of a variable are built from fault injections over a set of training programs, using the program features extracted in the previous section, which incorporate the heuristics defined in Chapter 3.

We start building the SDCTune model by modelling the SDC proneness of store and comparison instructions in the program. The SDC proneness of these instructions depends on categorical features, such as resilient comparisons, and on numerical features, such as data width (Section 3.2 and Section 3.3). We manually apply classification to model the categorical features, and linear regression to model the numerical ones. Once we determine the SDC proneness of the store and branch instructions, we use the fault propagation procedure outlined in Section 3.1 to estimate the SDC proneness of the other instructions. We explain the classification and regression methods below.

Classification. The goal of classification is to use the categorical features that we observed before to classify the stored values or comparison results into different groups, so that we can apply the numerical features (or arithmetic means) to quantify the SDC proneness of each group. This classification is done manually according to our empirical data. For each division in the model, we first select features that describe our heuristics, and then adopt those features to split the current group into several subgroups. We recursively split these subgroups with our heuristics until all the heuristics are utilized. As shown in Sections 3.2 and 3.3, different categories of stored values and comparison results have different categorical features for determining their SDC proneness (e.g., resilient comparison or not for Cmp NoAddr stored values, and used in output or not for NoCmp NoAddr ones). Therefore, we apply tree-structured classification so that different features can be used in different categories. The features are arranged hierarchically in the form of a tree, starting from a root node and partitioning the nodes based on different features recursively, until all the data in a leaf node belongs to a single category.

Regression is applied to the leaf nodes of the classification tree to factor in the effects of numerical features such as data width. For example, consider a leaf node of stored values: Addr NoCmp -> Not Used in Masking Operations. We find that the SDC proneness of stored values in this node satisfies the following equation: P̂(SDC|I) = -0.012 * data_width + 0.878. This expression was derived using linear regression based on the results of fault injection over the set of training programs in Section 5.1. The reason for the negative correlation in this equation is that the higher bit positions of stored values in the leaf Addr NoCmp -> Not Used in Masking Operations are very likely to cause an application crash if they are corrupted. Since values with larger data widths have a higher probability of being corrupted in higher bit positions, faults that occur in those values are less likely to cause SDCs, as they are more likely to cause the program to crash. For the leaf nodes that do not exhibit a correlation with numerical features, we take the arithmetic means as the estimation of their SDC proneness.
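To make the combination of classification and regression concrete, the leaf discussed above could be evaluated as in the sketch below. Only the quoted regression line comes from the text; the fallback constant for the other leaf is a placeholder, not a number from the thesis.

    def addr_nocmp_leaf_estimate(data_width, used_in_masking_ops):
        """Sketch of one branch of the manually tuned SDCTune model for stored
        values in the Addr NoCmp category."""
        if not used_in_masking_ops:
            # Linear regression on the numerical feature data width.
            return -0.012 * data_width + 0.878
        # Placeholder: leaves without a numerical correlation use the arithmetic
        # mean of their training data (the value below is illustrative only).
        return 0.20

    print(addr_nocmp_leaf_estimate(data_width=64, used_in_masking_ops=False))  # 0.11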
4.3 Automatically Tuned Model: SDCAuto

Unlike SDCTune, our automatically tuned model, SDCAuto, is built automatically using a machine learning approach known as the Classification and Regression Tree (CART) algorithm [27]. We choose this algorithm for the following reasons:

1. The resulting tree model is simple to understand and to interpret. In contrast, results from other models, such as artificial neural networks, may be more difficult to interpret. The generated CART tree can help us gain a better understanding of the relation between SDC proneness and program features by picking out the features that show a strong correlation with SDC proneness.

2. The CART model, like other decision tree models, requires little data preparation [4]. Other regression models, such as support vector machines (SVM) and Gaussian processes, rely on an appropriate normalization of the input data, which needs delicate tuning based on the application scenario. In our case, however, the orders of magnitude may be very different for different features (e.g., execution time related features and code structure related features), which are hence very difficult to normalize. Therefore, we prefer models like CART that require little data preparation.

3. The CART tree is able to handle both numerical and categorical data. Many of the features we extracted are categorical, e.g., is_global, is_integer and is_fcmp. At the same time, other features are numerical, e.g., loop_depth, dominated_execution_time and data_width. Many other regression algorithms may not support a mix of categorical and numerical data.

However, one disadvantage of the CART algorithm is that the tree may grow to be biased if some classes of data dominate. In our case, the tree may be biased towards some training programs because of the large number of data points from them, while ignoring other training programs. To balance the dataset between different training programs, we define the data point threshold as a parameter to constrain the maximum number of data points allowed in the growth of the trees for stored values and comparisons. Store and comparison instructions are ranked in decreasing order of their dynamic counts, and we incorporate the top ones, as the highly executed instructions are more valuable for SDC proneness estimation (Equation 2.2). The number of instructions we incorporate from each program is limited by the data point threshold.

Our decision tree is built based on the Mean Squared Error (MSE) criterion, and is used as a regression tree to estimate the SDC proneness. The algorithm splits the training dataset recursively, dividing the data points into multiple groups, until a group can no longer be split without creating leaves that have fewer data points than a threshold value, namely the minimum size of leaves. The end groups are known as leaves, and the average value (i.e., SDC proneness) of a leaf's data points is assigned as the value of that leaf. For each split, the decision tree algorithm selects a feature and a splitting threshold, among all possible positions, that maximize the reduction in MSE, which represents the information gain.
Algorithm 1 shows the pseudo code of the CART algorithm to build a decision tree. The algorithm takes a set of n data points <Xi, yi>, i = 1, ..., n, as the input data. In our case, the size of this set of data points is controlled by the data point threshold. For each data point, there is a target value y, the SDC proneness in our case, and an input vector X with D dimensions in total, where each dimension represents one feature that we extract. The algorithm first tests whether the current set of n data points can be split into two subsets (line 1). The precondition here is that every leaf should have at least minleaf data points. If the current set of data points cannot be split, the algorithm creates a leaf node and assigns it the average y value of its data points as its estimated value (lines 2-4). If the set of data points can be split, the algorithm traverses all D dimensions of X and all possible splitting values of the data points to find the split that maximizes the information gain, which in our case is represented by the minimum Mean Squared Error (MSE) (lines 7-18). Once a split is made, we call the routine recursively on the two new subsets until no more divisions can be made (lines 19-20). Finally, the algorithm returns the root node of the tree.

In the above algorithm, the growth of the trees is controlled by the parameter minimum size of leaves, and we control the input data through the data point threshold. We study their influence and present our results in Section 6.2. Figure 4.1 shows an example of our built decision tree for stored values, with 17 points as the minimum size of leaves and 80 instructions as the data point threshold.

Algorithm 1: CART Algorithm to Build a Decision Tree

input : 1) A set of n data points: <Xi, yi>, i = 1, ..., n; 2) minimum size of leaves: minleaf
output: A regression tree

1   if n < 2 * minleaf then
2       No split can be made to create two leaves with more than minleaf data points;
3       Create Leaf Node and assign it the average y value of the n data points;
4       Return Leaf Node;
5   end
6   else
7       MSE_total = the MSE of the y values of all n data points;
8       for dimension d in all dimensions of X do
9           <x[d]_i>, i = p, ..., q = all possible values in <x[d]_i, i = 1, ..., n> that can split the n points into two groups with more than minleaf points in each;
10          for value v in all possible values <x[d]_i>, i = p, ..., q do
11              left_group, right_group = Split at dimension d with value v;
12              new_MSE_total = MSE_left_group × (Num_left_group / n) + MSE_right_group × (Num_right_group / n);
13              if new_MSE_total < MSE_total then
14                  MSE_total = new_MSE_total;
15                  cache split(d, v);
16              end
17          end
18      end
19      left_leaf, right_leaf = split the data points <Xi, yi> with the last cached split(d_last, v_last);
20      Call recursively upon the created left_leaf and right_leaf;
21  end
22  Return Root Node;

Figure 4.1: Auto tuned decision tree for stored values

4.4 Model Usage

Once the trees are built from the training dataset, we can use them to estimate the SDC proneness of the stored values and comparison results of the testing programs. The estimated SDC proneness of these end points of data dependency chains is then back-propagated along their backward slices to derive the SDC proneness of each instruction, with the fault masking and crashing rates taken into account (Section 2.4). The SDC proneness of each instruction is then used to calculate the importance of the instruction and to guide the selection of instructions to duplicate and check under a specific overhead bound, as described in Section 5.3.

4.5 Choosing the Instructions

As shown in Section 2.4, we can calculate the SDC coverage of protecting an instruction if we know the SDC proneness of that instruction, using Equation 2.1 in Section 2.1.
We apply either the SDCTune model or the SDCAuto model to estimate the SDC proneness of each instruction in the program that we want to protect. We also obtain the dynamic count of each instruction in the program by profiling it with representative inputs. We then attempt to choose instructions to maximize the SDC coverage subject to a given performance overhead (Section 2.4), using a standard dynamic programming algorithm [22].

4.6 Detector Design

Once we identify a set of instructions to protect, the next step is to insert error detectors for them. Our detectors are based on duplicating the backward slices of the instructions to protect, similar to prior work [10]. We insert a check immediately after the instructions to be protected, which compares the original value computed by the instruction with the value computed by the duplicated instructions. Any difference in these values is deemed to be an error detection and the program is stopped. Figure 4.2b shows a conceptual example of our detector for a given set of instructions to be protected in Figure 4.2a.

Note that we assume that there is a single transient fault in the program (Section 2.2), and hence it is not possible for both the detector and the chosen instruction to be erroneous. Therefore, any error in the computation performed by the chosen instruction will be detected by the corresponding error detector.

Figure 4.2: Example of inserted detectors and concatenating instructions. (a) Data dependency of detector-free code; the shaded portion shows the instructions that need protection. (b) Basic detector instrumented; the shaded nodes show the duplicated instructions and the detector inserted at the end of the two dependency chains. (c) Concatenating duplicated instructions; one instruction (node e') is added to the protection set, which concatenates the two dependency chains and saves one checker.

A naive implementation of our detectors can result in prohibitive performance overhead. Therefore, we develop two optimizations to lower the detector overhead. First, we concatenate adjacent duplicated pieces of code by adding the instructions between them to the protection set so that we can combine their detectors. Figure 4.2c shows how this optimization works. This optimization provides benefits when the cost of the saved detector is higher than the cost of the added instructions. Second, we perform lazy checking, in which detectors for cumulative computations in loops are moved out of the loop bodies, as the example in Figure 4.3 illustrates. This optimization is effective for long-running loops.

(a) Detector-free code:

for (i = 0; ; i++) {
    // loop body
    flag = i < n ? 1 : 0;
    if (flag == 1) break; // decompose exit predication to simulate instruction-level behaviour
}

(b) Basic detector instrumented. This shows how the loop index i in the original code (a) is protected; the check is the comparison between flag and dup_flag:

i = 0;
dup_i = 0; // duplication of i
for (;;) {
    // loop body (i and its duplicate dup_i are updated here)
    flag = i < n ? 1 : 0;
    dup_flag = dup_i < n ? 1 : 0;
    if (flag != dup_flag)
        Assert(); // inconsistent
    if (flag == 1) break;
}

(c) Lazy checking applied. This shows how we move the check out of the loop body:

i = 0;
dup_i = 0; // duplication of i
for (;;) {
    // loop body (i and its duplicate dup_i are updated here)
    flag = i < n ? 1 : 0;
    dup_flag = dup_i < n ? 1 : 0;
    if (flag == 1) break;
}
if (flag != dup_flag)
    Assert(); // inconsistent

Figure 4.3: Example of inserted detectors and lazy checking
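To make the selection step of Section 4.5 concrete, the sketch below frames it as a 0/1 knapsack problem: each candidate instruction has an estimated SDC-coverage benefit and a per-detector overhead cost, and we maximize the total benefit within the overhead bound. The thesis only states that a standard dynamic programming algorithm [22] is used; the function name, input format, and the discretization of overhead into integer units are assumptions made here for illustration.

# Hypothetical 0/1 knapsack sketch for choosing instructions to protect under an
# overhead bound. benefit[i] is the estimated SDC coverage gained by protecting
# instruction i; cost[i] is its detector overhead in integer units (an assumption,
# e.g. tenths of a percent); overhead_budget is the user-specified bound.
def choose_instructions(benefit, cost, overhead_budget):
    n = len(benefit)
    # best[c] = (maximum total benefit achievable with total cost at most c, chosen indices)
    best = [(0.0, [])] * (overhead_budget + 1)
    for i in range(n):
        # iterate budgets downwards so each instruction is used at most once
        for c in range(overhead_budget, cost[i] - 1, -1):
            cand = best[c - cost[i]][0] + benefit[i]
            if cand > best[c][0]:
                best[c] = (cand, best[c - cost[i]][1] + [i])
    return best[overhead_budget]

# Example: three candidate instructions, a 10-unit overhead budget.
coverage, chosen = choose_instructions([0.30, 0.25, 0.10], [6, 5, 2], 10)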
4.7 Summary

This chapter described the approach of building our configurable SDC detection technique based on the heuristics from Chapter 3. Section 4.1 presented the features extracted for building both the manually tuned model, SDCTune, and the automatically tuned model, SDCAuto. Section 4.2 presented the building of SDCTune, and Section 4.3 presented the building of SDCAuto with the CART algorithm. Section 4.4 presented the use of the two models to estimate SDC proneness, and Section 4.5 presented how to use the estimated SDC proneness to guide detector placement. The last section presented our SDC detector and two optimizations to reduce its overhead. In the next chapter, we will evaluate the models built with these algorithms and use them to guide the placement of error detectors in the application.

Chapter 5
Experimental Setup

In this chapter, we present our experimental setup to evaluate both the SDCTune and SDCAuto models for configurable SDC protection. All the experiments and evaluations are conducted on an Intel i7 4-core machine with 8GB memory running Debian Linux. Section 5.1 presents the details of the benchmarks and Section 5.2 presents our evaluation metrics. Section 5.3 presents our methodology and workflow for performing the experiments.

5.1 Benchmarks

We choose a total of twelve applications from a wide variety of domains for training and testing both of our models. The applications are drawn from the SPEC [13], SPLASH2 [36], NAS parallel [1], PARSEC [2] and Parboil [32] benchmark suites. We randomly divide the twelve applications into two groups, one group for training and the other for testing. The four benchmarks used in Section 2.3 to derive the heuristics are drawn from the training group. The details of the training and testing benchmarks are shown in Table 5.1 and Table 5.2 respectively. All the applications are compiled and linked into native executables with the -O2 optimization flag and run in single-threaded mode, as our current implementations of both the SDCTune and SDCAuto models work only with single-threaded programs.

Table 5.1: Training programs

Program   | Description                  | Benchmark suite | Input     | Stores | Comparisons
IS        | Integer sorting              | NAS             | default   | 21     | 20
LU        | Linear algebra               | SPLASH2         | test      | 41     | 110
Bzip2     | Compression                  | SPEC            | test      | 681    | 646
Swaptions | Price portfolio of swaptions | PARSEC          | Sim-large | 36     | 101
Water     | Molecular dynamics           | SPLASH2         | test      | 187    | 224
Lbm       | Fluid dynamics               | Parboil         | short     | 71     | 34

5.2 Evaluation Method

We evaluate our SDC proneness estimation models from three perspectives. First, we evaluate the regression results of the CART algorithm with different parameters; the optimal parameters are selected for SDCAuto. We then use both models to estimate the overall SDC rate of applications, as well as the SDC coverage for different performance overhead bounds. The estimated overall SDC rates are used to compare the SDC rates of different applications without fault injection, while the coverages show the configurable protection capability of our technique. We use the same experimental setup for fault injection as that described in Section 2.3.
Table 5.2: Testing programs

Program    | Description                 | Benchmark suite | Input   | Stores | Comparisons
Gzip       | Compression                 | SPEC            | test    | 251    | 399
Ocean      | Large-scale ocean movements | SPLASH2         | test    | 322    | 813
CG         | Conjugate gradient          | NAS             | default | 32     | 97
Bfs        | Breadth-first search        | Parboil         | 1M      | 36     | 57
Mcf        | Combinatorial optimization  | SPEC            | test    | 87     | 158
Libquantum | Quantum computing           | SPEC            | test    | 39     | 136

Regression results from decision tree model To evaluate the regression results, we calculate the average squared errors for both the training and testing datasets. As shown in Section 4.3, there are two parameters controlling the tree building process: (1) minimum size of leaves and (2) data point threshold. To explore this two-dimensional parameter space, we vary the minimum size of leaves from 1 to 120 points per leaf, and the data point threshold from 10 to 120 data points for each program. These values were chosen based on our empirical measurements of the numbers of store and comparison instructions in the programs. For each point in the exploration space, we calculate the mean squared errors for both the training and testing datasets. We then present the two optimal pairs of parameters, for the stored value decision tree and the comparison decision tree respectively. We also present the features adopted by the optimal trees and compare them with the manually selected features in the SDCTune model.

Estimation of overall SDC rates: We perform a random fault injection experiment to determine the overall SDC rate of each application. We then compare the SDC rate estimated by both of our models with that obtained from the fault injection experiment. We also estimate the correlation between our estimated SDC rates and the SDC rates from fault injection. A high positive correlation implies the usefulness of our models in comparing SDC rates among different applications.

SDC coverages for different performance overhead bounds: The SDC coverage is defined as the fraction of SDC-causing errors detected by our detectors. We apply both the SDCTune model and the SDCAuto model to predict the SDC coverage for different instructions so as to satisfy the performance overhead bounds provided by the user. Our selection algorithm (Section 4.5) starts with the instructions providing the highest coverage, and iteratively expands the set of instructions until the performance overhead bounds are met. We then perform fault injection experiments on the program instrumented with our detectors for these instructions, and measure the percentage of SDCs detected. We also compare our results with those of full duplication, i.e., when every instruction in the program is duplicated, and with hot-path duplication, i.e., when the top 10% most executed instructions are duplicated.

To ensure a fair comparison among these techniques, we use a metric called the SDC detection efficiency, which is similar to the efficiency defined in prior work by Shafique et al. [30]. We define the SDC detection efficiency as the ratio between the SDC coverage and the performance overhead of a detection technique.
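Written out (using our own shorthand, not notation from the thesis), with C the SDC coverage and O the performance overhead of a technique, the metric and the normalization used in Chapter 6 — where full duplication is taken to provide 100% coverage and serves as the baseline with efficiency 1 — can be read as:

    SDC detection efficiency = C / O

    normalized efficiency = (C / O) / (C_full / O_full) = (C / O) × O_full    (since C_full = 100%)

As a purely hypothetical illustration, a technique with 60% coverage at 20% overhead on a benchmark whose full-duplication overhead is 50% would have a normalized efficiency of (0.6 / 0.2) × 0.5 = 1.5, i.e., 1.5x the efficiency of full duplication.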
We calculate the SDC detection efficiency of eachbenchmark under a given performance overhead bound, and compare it withthe corresponding efficiencies of full duplication and hot-path duplication.The SDC coverage of full duplication is assumed to be a hundred percent [28].5.3 Work Flow and ImplementationMeasuring regression results of decision trees Figure 5.1 shows theworkflow for selecting parameters and measuring the regression results of thedecision trees which are parts of SDCAuto model. The workflow explores theparameter space which is consist of minimum size of leaves and data pointthreshold to test their influences on the regression results.We first compile the application using LLVM into its IR form. We thenextract the features that SDCAuto needs to estimate the SDC pronenessof stored values and comparison results. This is done using an automatedcompiler pass we wrote in LLVM, and the LAMPView tool [23] for analyzingload/store dependencies. We also need initial SDC proneness data for eachstored value and comparison instructions to build our decision tree model.This is obtained by fault injections. However, the fault injections are donefor building SDCAuto; using the built model does not require fault injection.535.3. Work Flow and ImplementationFigure 5.1: The workflow of building regression trees and exploring the pa-rameter space for SDCAuto.Once the training data and testing data are obtained, we build regressiontrees for stored values and comparison instructions with minimum size ofleaves iterating from 1 to 120 and data point threshold iterating from 10to 120. For each combination of minimum size of leaves and data pointthreshold, we calculate MSE for both training data and testing data to showthe influences of the two parameters. The values of minimum size of leavesand data point threshold with minimum MSE of testing data will be selectedas the optimal parameters of the regression trees.Measuring overall SDC estimation and coverage Figure 5.2 showsthe workflow for estimating the overall SDC rates and providing configurableprotection using either SDCTune model or SDCAuto model. The workflowrequires the following inputs from the user: (1) source code for the program,545.3. Work Flow and ImplementationFigure 5.2: The workflow of applying our models for (1) estimate the overallSDC failure rate and (2) selectively protect the SDC-prone variables subjectto a performance overhead.(2) a set of representative input(s) for executing the application, and (3)output function calls that generate the output data that we care about interms of SDC failures (as mentioned before, not all output data in an ap-plication is important from the perspective of SDCs, for example, statisticalor timing information in the output). In addition, it requires the user tospecify the maximum allowable performance overhead that may be incurredby the detectors inserted by our technique.Similar to how we did the initial study, we first compile the source codeand extract features from the compiled IR. Then, we run the extracted fea-tures through either the SDCTune or the SDCAuto model built in Chapter 4,to generate an estimated SDC proneness for each instruction. We then usethe results from our model to estimate the overall SDC rate of the appli-cation, and for inserting detectors into the program for protecting the mostSDC-prone instructions within the given overhead bound. 
The detectors areautomatically inserted into the program by another LLVM pass we wrote.We use the representative inputs provided by the user to execute the pro-555.4. Summarygram for obtaining its execution time with the detectors. The above processof choosing instructions to protect is repeated iteratively until the desig-nated performance overhead bound is fulfilled. If we exceed the performanceoverhead bound, we backtrack and remove the most recently inserted detec-tors. Finally, we use the program fortified with the detectors to measure itsperformance overhead and fault coverage.5.4 SummaryIn this chapter, we described the experimental setup for evaluating ourSDCTune and SDCAuto models. Section 5.1 presented the benchmark pro-grams for evaluating our models and described the characteristics of theprograms. Section 5.2 presented the of evaluation methods and experimentdesign. Section 5.3 presented the workflow and implementation details ofevaluating our technique. In the next chapter, we will present the results ofour evaluation.56Chapter 6ResultsThis chapter presents the results of our experiments to: (1) explore theparameter space for our decision tree model for SDCAuto, (2) estimate theoverall SDC rate of an application with both SDCTune and SDCAuto modeland (3) apply configurable protection to maximize detection coverage underdifferent performance overhead bound. We first present the results of thetimes taken by both the SDCTune and SDCAuto models.6.1 Time Taken by ModelsIn our experiments, both SDCTune and SDCAuto models require fiveto fifty minutes (average of 24 minutes) depending on the application, toestimate the overall SDC rate and to generate a fortified executable protectedwith detectors for a given performance overhead. As shown in Table 6.1, mostof the time are spent on Feature extraction and Instruction selection, whichrequire one to forty five minutes (average 10.08 minutes) and five seconds toforty nine minutes (average 14.34 minutes), respectively.The time taken by the Feature Extraction phase is spent in profiling theprogram and recording the store-load dependencies with LAMPView [23].Therefore, programs with longer execution time and more dynamic counts576.1. Time Taken by ModelsTable 6.1: Time consumption of SDCTune and SDCAutoGroup BenchmarkFeatureextraction(minutes)InstructionSelection(minutes)Total(minutes)TrainingLbm 23 9 32IS 9 1 10LU 3 0.083 3.083Bzip2 9 49 58Water 3.5 16 19.5Swaptions 9 9 18TestingGzip 8 27 35CG 45 3 48Ocean 6 35 41Bfs 1 4 5Mcf 2 9 11Libquantum 2.5 10 4.167Mean 10.083 14.340 23.729for memory operations requires much longer time for profiling than otherprograms. In this study, Lbm and CG require longest time on feature ex-traction, and they also require a obviously longer time for a normal run andhave higher dynamic counts of store/load operations.We found that the time taken by Instruction Selection phase is mainlyaffected by the number of stores and comparisons of a program. This isshown in the number of stores and comparisons in Table 5.1, Table 5.2 andtime spent on Instruction Selection in Table 6.1. The reason that programswith many stores and comparisons usually have a large number of staticinstructions. And more static instructions will polynomially increase the586.2. Effect of Decision Tree Parameterssearch space of finding an optimal set of instructions to protect in our tech-nique [22].We also found that merely applying our models to estimate SDC prone-ness causes nearly no time consumption. 
This matches our intuition as usingthe tree to classify the store/comparison instructions and then back propa-gating the SDC proneness has O(n) time complexity, where n represents thenumber of instructions. Even several tens of thousands static instructionscan be processed in just tens of milliseconds on today's desktop computers.On the contrary, fault injection alone requires anywhere from a few hoursto a few days to generate the SDC rates for each application. Further, esti-mating the SDC-prone locations in a program using fault injection requireseven more fault injections and significant effort to map the results of thefault injection back to the program's code, which is necessary for placingdetectors.6.2 Effect of Decision Tree ParametersWe explored the parameter spaces for stored value decision tree and com-parison instruction decision tree. Figure 6.1 shows the mean squared errors(MSE) for the decision trees under different minimum size of leaves and datapoint threshold for training and testing dataset. Recall that our goal is tochoose parameters for the models to minimize the MSE.From Figure 6.1, we can observe that overfitting occurs (as expected)when minimum size of leaves is too small, while incomplete learning occurswhen it is too large. At the same time, a large data point threshold may596.3. Estimation of Overall SDC Ratesintroduce imbalance in the training dataset and worsen the regression result,as shown in Figure 6.1d, while too small a value can hinder the tree splittingprocess then decrease the accuracy, as shown in Figure 6.1c and Figure 6.1a.As shown in Figure 6.1c and Figure 6.1d , we found minimum size ofleaves = 17, it data point threshold = 80 has lowest MSE for testing storedvalues, and so is selected as optimal for tree of stored values. Similarly,minimum size of leaves = 57, data point threshold = 40 has lowest MSE forcomparison instructions, and so is selected for tree of comparisons. Basedon this configuration, we rank the features according to their importance inTable 6.2. We further discuss these features in Section 7.2.6.3 Estimation of Overall SDC RatesWe estimate the overall SDC rates of the applications using SDCTunemodel and SDCAutomodel, then compare them with the SDC rates obtainedthrough 3000 random fault injections per benchmark. Table 6.3 shows theoverall SDC rates (P (SDC)) from the fault injections and the estimatedoverall SDC rates (Pˆ (SDC)) for both training programs and testing pro-grams. The SDC rates are statistically significant with an error bar rangingfrom 1.78%(Lbm) to 0.71%(Swaptions), at the 95% confidence intervals.From Table 6.3, it can be observed that the absolute values of the esti-mated SDC rates do not match with the observed ones accurately. However,the results show high positive correlation between the SDC ranks estimatedby our models and those observed in reality, where rank represents the rela-tive SDC rate. Figure 6.2 plots the estimated SDC ranks versus the observed606.3. Estimation of Overall SDC Rates(a) MSE of training stored values (b) MSE of training comparisons(c) MSE of testing stored values (d) MSE of testing comparisonsFigure 6.1: Effect of data point threshold(y-axis) and minimum size ofleaves(x-axis) on regression results616.3. 
Estimation of Overall SDC RatesTable 6.2: Features adopted by the optimal decision trees for SDCAutomodelTree Features used by decision trees ImportanceStored valuesinst func execution time ratio 0.4828bb length 0.1507data width 0.1501post dominated loop depth ratiobymax0.0706post dominated execution timeratio bymax0.0286in global 0.0269load execution time entropy 0.0257num static loads ratio bymax 0.0219bb length ratio bymax 0.0185dominated execution time ratiobywhole0.0119execution time required inst foraddr 0.0075used in oef func call 0.0045execution time loads 0.0003Comparisonsinst func execution time ratio 0.6058inst execution time ratio bymax 0.3942ranks for both the SDCTune and SDCAuto models. The x-axis shows theoverall SDC rates from 3000 random fault injections, while the y-axis showsthe estimated overall SDC rates using either SDCTune or SDCAuto. Thecorrelation coefficient is 0.8770 for our SDCTune model, and 0.8545 for SD-CAuto model, showing a strong positive correlation for both models.Thus, our models are highly accurate in predicting the SDC rates of ap-plications relative to others. However, it is not accurate at predicting theabsolute rates of SDCs. There are two reasons for this inaccuracy. First, ourestimation of SDC rates using fault propagation is conservative, and some-626.3. Estimation of Overall SDC RatesFigure 6.2: The correlation of overall SDC rates for all programs.times may overestimate the SDC proneness of variables in the presence ofapplication-specific masking. Second, our load-store dependence analysis isperformed using the LAMPView tool, which does not handle some libraryfunctions such as memcpy. This is also the reason for the large deviation ofthe SDC rate of CG when using SDCTune model. A large portion of theoutput of CG comes from memcpy and memset. This prevents LAMPViewto trace the data flow so that cause large deviations in estimating the SDCproneness for instructions. This inaccuracy in absolute SDC rate predic-tion may lead to inadequate protection, and additional overhead. However,our results show that despite the inaccuracy, our models can guide detectorplacement to obtain high coverage at low performance overheads. This is be-cause detector placement mainly relies on estimating the SDC proneness ofan instruction compared with other instructions, or the relative SDC prone-ness. Even though the absolute SDC proneness is not estimated accurately,our selection algorithm is still able to choose correct instructions if the rel-636.4. SDC Coverage and Detection EfficiencyTable 6.3: The SDC rates and ranks from fault injections and our modelsGroup BenchmarkP (SDC)frominjectionsPˆ (SDC)fromSDCTunePˆ (SDC)fromSDCAutoTrainingLbm 52.53% 48.11% 48.89%IS 43.46% 33.75% 26.57%LU 31.9% 25.43% 22.36%Bzip2 24.47% 17.88% 19.78%Water 5.9% 9.75% 18.85%Swaptions 4.1% 11.46% 11.74%TestingGzip 33.67% 32.46% 26.88%CG 23.67% 3.75% 24.58%Ocean 20.6% 14.75% 16.8%Bfs 17.37% 14.27% 17.19%Mcf 15.76% 17.84% 15.89%Libquantum 10.5% 10.9% 18.64%ative SDC proneness is correctly estimated by the models. 
However, this isnot sufficient for the estimation of overall SDC rates, as the estimated overallSDC rates are actually a sum of absolute SDC proneness of all instructions.6.4 SDC Coverage and Detection EfficiencyWe use both of the models for inserting error detectors into the appli-cations to maximize SDC coverage under a given performance overhead.Figure 6.4a shows the SDC coverage obtained by SDCTune model for eachbenchmark under three different performance overhead bounds: 10%, 20%and 30%. For the training programs, the geometric means of the SDC cov-erage for the 10%, 20% and 30% overhead bounds are 34.8%, 71.1% and646.4. SDC Coverage and Detection Efficiency78.9%, respectively. For the testing programs, the corresponding geometricmeans are 37.0%, 58.4% and 74.8% respectively, which are somewhat lowerthan the training programs' averages, as expected. We also measured theSDC coverage obtained with hot-path duplication, and found it to be 74.28%and 92.33% on average for training and testing programs respectively.Figure 6.5a shows the SDC coverage obtained by SDCAuto model. Thegeometric means of the SDC coverage are 31.14%, 66.32% and 76.03% re-spectively for the training programs. For the testing programs, the geometricmeans are 27.37%, 45.70% and 67.63% respectively.Figure 6.3 shows the performance overhead of full duplication and hot-path duplication. The overhead of full duplication is 50.16% on average forthe training programs, while it is 71.37% on average for the testing pro-grams. Hot-path duplication has an overhead of 33.19% for the trainingprograms, and 61.76% for the testing programs. Note that both of these areconsiderably higher than the 30% overhead bound we considered with ourdetectors.We also calculate the detection efficiency of the detectors we inserted,and for hot-path duplication based on their overhead and SDC coverages(Section 5.2). Figure 6.4b and Figure 6.5b show the SDC detection effi-ciency for our detectors with the three overhead bounds, and for hot-pathduplication. The efficiencies are normalized to that of full duplication, whichhas a baseline efficiency of 1. A value close to 1 means that no improvementis achieved over full duplication.With SDCTune model, we observe SDC detection efficiencies of 1.75x,1.78x and 1.32x for the training programs, and 2.65x, 2.09x and 1.78x for the656.4. SDC Coverage and Detection Efficiencytesting programs, at the 10%, 20% and 30% performance overhead boundsrespectively. We have higher detection efficiencies for our testing programsbecause the full duplication overheads of the testing programs are commonlyhigher than the training programs. This results into a lower baseline for thetesting programs in terms of detection efficiencies. More details are providedin Chapter 7. The reason that the efficiencies generally decrease as overheadincrease is that some of the instructions protected at higher overhead arenot as SDC prone. As the performance overhead of the detectors approachesthat of full duplication, the detection efficiencies will drop to 1.Detectors inserted using SDCAuto model have detection efficiencies of1.56x, 1.67x and 1.27x over full duplication for the training programs, and1.96x, 1.64x and 1.62x over full duplication for the testing programs. Thus,we find that there is considerable variation in detector efficiency amongbenchmarks and between SDCTune and SDCAuto model. 
We explain thereasons in the next chapter.We also observe no gain in efficiency with hot-path duplication comparedto full duplication in spite of its high coverage, as it incurs correspondinglyhigher overhead (as mentioned in Section 2.4). In summary, our techniquesignificantly outperforms both full-duplication and hot-path duplication inproviding better detection efficiency, for much lower performance overheadbounds.666.5. SummaryFigure 6.3: The overhead of full duplication and hot-path duplication6.5 SummaryIn this chapter, we presented the results of the evaluation of the models.We found that SDCTune and SDCAuto are both accurate in predicting theSDC rates of applications relative to other applications. However, both thetwo models did not predict the absolute SDC rates accurately for some appli-cations, as shown in Table 6.3 (Section 6.3). When applying our models forguiding detector placement, we found that our technique improved detectionefficiency of full duplication by a factor of 0.78x to 1.65x, compared with fullduplication and hot-path duplication, for the SDCTune model, and 0.62xto 0.96x for the SDCAuto model. This means that our technique providesmore efficient SDC detection for much lower performance overhead boundscompared with full duplication like approaches.676.5. Summary(a) The SDC coverages with error bars at the 95% confidence interval for SDCTunemodel. The error bars are less than 2%, and obtained from 3000 random fault injec-tions per benchmark. The SDC coverage of full duplication is considered as 100%(b) The normalized detection efficiency of SDCTune model. Full duplication is thebaseline and has detection efficiency = 1. (Detection efficiency is the ratio of SDCcoverage and performance overhead)Figure 6.4: The results of SDCTunemodel for different performance overheadbounds, hot-path duplication and full duplication.686.5. Summary(a) The SDC coverages with error bars at the 95% confidence interval for SDCAuto model.The error bars are less than 2%, and obtained from 3000 random fault injections perbenchmark. The SDC coverage of full duplication is considered as 100%(b) The normalized detection efficiency of SDCAuto model. Full duplication is thebaseline and has detection efficiency = 1. (Detection efficiency is the ratio of SDCcoverage and performance overhead)Figure 6.5: The results of SDCAutomodel for different performance overheadbounds, hot-path duplication and full duplication.69Chapter 7DiscussionIn this chapter, we discuss the reasons of the variation in detection cov-erage and efficiency among benchmarks and also the differences between thetwo models. We first present the differences between benchmarks when us-ing SDCTune model (Section 7.1). Then we discuss the reasons of differentresults between SDCTune model and SDCAuto (Section 7.2). Finally wediscuss the threats to the validity of our approach (Section 7.3).7.1 Differences between BenchmarksThere are two main reasons for the differences of the detection efficiency.First, for our technique to be efficient, it needs to protect instructionswith high SDC proneness, but with low dynamic execution count. We ob-served that applications which have such instructions experience moderateSDC rates, which are neither too high nor too low. From Table 6.3, programssuch as Libquantum, Bfs, Mcf, Bzip2, and Ocean fall into this category. 
Gen-erally, these programs benefit the most from SDCTune model (Figure 6.4b).But the detectors inserted in Mcf and Ocean have higher overhead so thatthe SDC coverage of these two benchmarks are lower in general under sameperformance overhead bound. For Mcf the reason is that it has a large707.1. Differences between Benchmarksamount of comparison operations for branches at runtime so that much morecheck instructions are needed to be inserted (Section 4.6) which cost moreperformance overhead budget. For Ocean, many of its dynamic instancesare float point operations which means duplicating these instructions maycause higher overhead because processors usually have very limited ALUresources for float point operations. Higher overhead for detectors preventsour technique from protecting larger amount of instructions so that these twobenchmarks have relatively low SDC coverage. However, The high detectionefficiencies of these two benchmarks also benefit a lot with our technique.On the other hand, if the benchmark has highly SDC prone instructionsthat are also highly executed, our technique does not do as well since theoverhead limit prevents our technique from selecting those SDC prone in-structions. Examples of these programs are Lbm, and IS.The second reason for the variation in efficiency among benchmarks rela-tive to full duplication, is that the overhead of full duplication is not uniform,as shown in Figure 6.3. This is because of benchmark-specific reasons suchas the distribution of integer and floating point operations. In general, pro-cessors have abundant integer computation units but not as many floatingpoint units, so the higher the fraction of floating point operations, the higheris the overhead due to duplication. We found that for some benchmarks suchas IS, Bfs, and Bzip2, the full duplication overhead is only about 40%. Thismeans that the detection efficiency improvement over full duplication is un-likely to be very high for these benchmarks. For example, even though IS,Bfs and Swaptions have reasonable SDC coverage, their detection efficiencyis not very high. In one of the benchmarks, Lbm, our detectors have a lower717.2. Differences between Modelsdetection efficiency compared to full duplication. This is because nearlyall SDC prone instructions in the program have high execution counts, andhence the performance overhead bounds cannot be satisfied if they are se-lected for protection. Therefore, this benchmark has low SDC coverage withour technique.7.2 Differences between ModelsComparing the coverage provided by the detectors of SDCTune modeland the SDCAuto model, the latter performs much worse on three programs,namely Bzip2, Gzip, Ocean and Mcf. The reason is the regression tree builtfor comparison operations. As shown in Table 6.2, only two features areutilized in the optimal regression tree by SDCAuto which turns out to haveonly three leaves. Such a result means an incomplete learning and failure inclassifying the extracted features correctly.This is because our CART model cannot utilize multiple features forone split, while categorizing comparison instructions usually requires to doso. For example, is _loop _terminator and nest _loop _depth should beconsidered at the same time, as nest _loop _depth show strong correlation forloop terminating comparisons, which are labeled with is _loop _terminator= True. However, CART may not discover this, as splitting on the individualfeatures, is _loop_terminator or nest _loop _depth along does not reducethe total MSE. 
So the SDCAuto is not likely to select these features anduse them correctly, and hence fails to build an optimal tree for comparisoninstructions.727.3. Threats to ValidityIn addition, as shown in Figure 4.1 and Table 6.2, the regression treefor stored values also failed in categorizing the training data according tothe four major usage groups (Section 3.2). However, in contrast to usage-based classification, the decision tree of stored values selects many code struc-ture related (e.g. bb_length, post_dominated_loop_depth_ratio_bymax)and common execution time related (e.g. inst_func_execution_time_ratio,dominated_execution_time) features in the model building phase. Thismeans that from the decision tree algorithm's perspective, these easy-to-extract features also have strong correlations with SDC proneness and areworthy of further study for SDC proneness estimation.In short, our SDCTune model illustrates an upper bound on the SDCdetection capability of a model, while our SDCAuto model presents an ex-ample of applying an existing machine learn-ing algorithm to build such amodel. Although our results show that SDCTune outperforms SDCAuto inboth estimating overall SDC rates and guiding detector placement, we notethat the gap is mainly caused by the limitation of the CART algorithm. Adifferent automatically tuned model with a more appropriate algorithm andmore training data may be able to match the manually tuned model.This isa subject of future investigation.7.3 Threats to ValidityInternal Threats First, the heuristics presented in Chapter 3 are formu-lated based on our observation of the fault injection experiments in our initialstudy (Chapter 2) and we can not guarantee that these heuristics can be al-737.3. Threats to Validityways held for all programs. Second, we cannot guarantee that the CARTalgorithm is the best for building our automatically tuned model of esti-mating SDC proneness. Other machine learning algorithms such as neuralnetwork and hidden Markov models may work better. Third, our SDCAutomodel is not built with K-fold cross validation. This is because the parame-ter data point threshold limits the total number of data points available inour training set so that we do not have enough data points for K-fold crossvalidation.External Threats A major external threat comes from the training pro-grams and testing programs of our study. In this thesis, we randomly selectedsix training programs and six testing programs from standard benchmarksuites. However, there may not be sufficient for training and testing a solidpredictor. We partially mitigate this threat by choosing programs from avariety of standard benchmark suites rather than confining ourselves to asingle suite. Another external threat is that both of our models are archi-tecture and operating system dependent, which means the models may onlywork for programs that run on a specific operating system with a specific ar-chitecture. This is because in our study, the consequences of fault injectionexperiments depend on the underlying architecture and operating system.Shifting to another architecture or operating system may vary the resultsof the faults so that our models may not be able to predict them success-fully. We partially mitigate this threat by using an x86 platform running theLinux operating system, which represent common choices running in manycommodity platforms.74Chapter 8Related WorkIn this section, we present prior work related to the detection of silentdata corruption with software technique. 
We classify related work into threecategories, namely (1) duplication based techniques, (2) invariant based tech-niques, and (3) application or algorithm specific techniques.8.1 Duplication Based TechniquesOne of the earliest papers on identifying critical variables in programs,and selectively protecting them is by Pattabiraman et al. [24]. Unlike ourwork, they focus mostly on crash-causing errors, which are relatively easyto detect compared to SDCs. Further, they do not provide configurableprotection in their work.SWIFT [28] is a compiler based technique that uses full duplication todetect faults in program data. However, full duplication can have significantperformance overhead, especially on embedded systems which do not havean abundant idle resources to mask the overhead of duplication. As shownin Figure 6.4b and Figure 6.5b, SDCTune and SDCAuto outperforms fullduplication in terms of SDC detection efficiency, and also enables config-urability to protect programs from SDC causing errors under various given758.1. Duplication Based Techniquesperformance overheads.Feng et al. [10], and Khudia et al. [16] have attempted to reduce theoverhead of full duplication by only duplicating high-value instructions(and variables), where a fault is unlikely to be detected by other techniquesand hence lead to SDCs. Unlike our work however, they do not providea mechanism to configure the protection for a given performance overheadbound. This is especially important for embedded systems where the systemhas to satisfy strict performance constraints.Another branch of work [6, 8, 19, 20, 33] has focused on protecting soft-computing applications from soft errors, by duplicating only critical instruc-tions or data in the program. Examples of soft-computing applications arethose used in media processing and machine learning, which can tolerate acertain amount of errors in their outputs. These papers exploit the resilienceof soft computing applications to come up with targeted protection mecha-nisms. However, they cannot be applied in general purpose applications.Thomas et al. [33] propose a technique to protect soft-computing ap-plications from Egregious Data Corruptions (EDCs), which are errors thatcause unacceptable deviations in the program's output. Similar to our work,they formulate program-level heuristics to identify EDC prone data in theprogram. However, there are two main differences between their work andours. First, the heuristics they propose are based on how much program datais affected by an error. While this is important for EDC-causing errors, thisis not so for SDC-causing errors as even a small deviation in the output canbe an SDC. Therefore, we need a more complex set of heuristics to predictSDC prone data in a program. Secondly, EDCs constitute only 2 to 10%768.2. Invariant Based Techniquesof a program's faulty outcomes. In comparison, SDCs constitute up to 50%of a program's faulty outcomes, and hence need much more heavyweightprotection.Finally, in recent work, Shafique et al. [30] propose a technique for ex-ploiting fault masking in applications to provide efficient detection. Similarto our work, they rank the vulnerability of instructions in the program, andallow the user to specify performance overhead bounds to selectively chooseinstructions to protect. However, our work differs from theirs in two ways.First, they consider all failures as equally bad, including crashes and hangs.However, we focus exclusively on SDC-causing faults, which are the mostinsidious of faults. 
Therefore, we can achieve higher efficiency for protectingagainst SDC-causing faults. Secondly, their work employs three metrics todetermine the instructions to protect, all of which are estimated by perform-ing a static analysis of the application's control and data flow graph, whichis conservative by nature. In contrast, our work uses empirical data to buildthe model for estimating the SDC proneness of different instructions, andis hence relatively less conservative. Since Shafique et al. do not providea breakdown of their coverage among SDC failures, crashes and hangs, wecannot quantitatively compare the coverage of SDCTune and SDCAuto withtheir technique.8.2 Invariant Based TechniquesInvariant-based techniques [9, 25, 29] detect errors by extracting likelyinvariants in programs through runtime profiling and dependency analysis.778.3. Application or Algorithm Specific TechniquesThose likely invariants are used as assertions to check abnormal behavioursor data out-of-bounds to detect errors. Invariant based techniques typicallyhave lower overhead than duplication-based techniques, as the assertionsconsist of much fewer instructions than the entire backward slice of the vari-ables. However, an important limitation of this class of techniques is thatthey incur false positives, i.e., they can detect an error even when none oc-curs. This is because they all learn invariants from testing inputs, and theseinvariants may not hold when the program is running with real inputs inproduction. While our work also learns the model for SDC proneness basedon training applications, it uses static analysis to actually derive the detec-tors from the backward slices, and has no false positives as static analysis isconservative.8.3 Application or Algorithm Specific TechniquesHari et al. [11] proposes a set of detectors for detecting SDCs usingprogram-level detectors. Similar to our work, they also come up with amethod to choose variables to protect for maximizing the SDC coverageunder a given performance overhead bound. However, there are two dif-ferences between our work and theirs. First, they require fault injectionsto find the highly SDC prone variables in the program, which is time con-suming. Although they reduce the fault injection space using their Relyzertechnique [12], they still need to perform tens of thousands of injections. Incontrast, we use our model to determine the SDC prone locations withoutneeding fault-injections. Secondly, their detector derivation is done manually788.3. Application or Algorithm Specific Techniquesbased on understanding of the program. Further, some of their detectors areapplication-specific and cannot be generalized across programs, as they relyon specific algorithmic properties. In contrast, we use generic duplication-based detectors which are automatically derived for any application.Sloan et al. [26, 31] propose an algorithm specific approach to enhancethe fault detection for sparse linear algebra applications. Algorithmic so-lutions can achieve high coverage while keeping the performance overheadlow. However, they are not general solutions and cannot be easily appliedto other application types.79Chapter 9Conclusion and Future WorkAs hardware errors increase with technology scaling, Silent Data Corrup-tions (SDCs) are becoming more serious for a wide class of systems. 
Genericsolutions such as full duplication incur high performance overhead as theydo not prioritize protecting against SDC-causing errors.This paper proposes a configurable protection technique for SDC-causingerrors that allows users to trade-off performance for reliability. We developheuristics for estimating the SDC proneness of instructions and build a man-ually tuned model, SDCTune, and a auto tuned model, SDCAuto, based onthe heuristics and decision tree algorithm. We then use our models to guidethe selection of instructions to be protected with error detectors under agiven performance overhead. Our results show that our models are highlyaccurate at predicting the relative SDC rates of applications. The detectorsinserted using SDCTune model outperform both full duplication and hot-path duplication by a factor of 0.78x to 1.65x in detection efficiency. Andwith SDCAuto model, our detectors outperform full duplication by a factorof 0.62x to 0.96x.We plan to explore two directions as future work. First, while SDCTuneand SDCAuto have high accuracy in predicting the relative SDC rate of80Chapter 9. Conclusion and Future Workan application, they are not as accurate in predicting the absolute SDCrates. We plan to work on improving their absolute accuracy. Second, wewill explore parallelizing the detectors to lower their performance overheadfurther.81Bibliography[1] D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter,L. Dagum, R. A. Fatoohi, P. O. Frederickson, T. A. Lasinski, R. S.Schreiber, H. D. Simon, V. Venkatakrishnan, and S. K. Weeratunga.The nas parallel benchmarks&mdash;summary and preliminary results.In Proceedings of the 1991 ACM/IEEE Conference on Supercomputing,Supercomputing '91, pages 158165, New York, NY, USA, 1991. ACM.[2] Christian Bienia, Sanjeev Kumar, Jaswinder Pal Singh, and Kai Li.The parsec benchmark suite: Characterization and architectural impli-cations. In Proceedings of the 17th International Conference on ParallelArchitectures and Compilation Techniques, PACT '08, pages 7281, NewYork, NY, USA, 2008. ACM.[3] S. Borkar. Designing reliable systems from unreliable components:the challenges of transistor variability and degradation. Micro, IEEE,25(6):1016, Nov 2005.[4] L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification andRegression Trees. Wadsworth and Brooks, Monterey, CA, 1984.[5] Joao Carreira, Henrique Madeira, João Gabriel Silva, et al. Xception:82BibliographySoftware fault injection and monitoring in processor functional units.Dependable Computing and Fault Tolerant Systems, 10:245266, 1998.[6] J. Cong and K. Gururaj. Assuring application-level correctness againstsoft errors. In Computer-Aided Design (ICCAD), 2011 IEEE/ACMInternational Conference on, pages 150157, Nov 2011.[7] C. Constantinescu. Intermittent faults and effects on reliability of in-tegrated circuits. In Reliability and Maintainability Symposium, 2008.RAMS 2008. Annual, pages 370374, Jan 2008.[8] Marc de Kruijf, Shuou Nomura, and Karthikeyan Sankaralingam. Relax:An architectural framework for software recovery of hardware faults.In Proceedings of the 37th Annual International Symposium on Com-puter Architecture, ISCA '10, pages 497508, New York, NY, USA, 2010.ACM.[9] Michael D. Ernst, Jake Cockrell, William G. Griswold, and DavidNotkin. Dynamically discovering likely program invariants to supportprogram evolution. In Proceedings of the 21st International Conferenceon Software Engineering, ICSE '99, pages 213224, New York, NY,USA, 1999. 
