UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Finding resilience-friendly compiler optimizations using meta-heuristic search techniques Narayanamurthy, Nithya 2015

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata

Download

Media
24-ubc_2015_november_narayanamurthy_nithya.pdf [ 906.06kB ]
Metadata
JSON: 24-1.0166741.json
JSON-LD: 24-1.0166741-ld.json
RDF/XML (Pretty): 24-1.0166741-rdf.xml
RDF/JSON: 24-1.0166741-rdf.json
Turtle: 24-1.0166741-turtle.txt
N-Triples: 24-1.0166741-rdf-ntriples.txt
Original Record: 24-1.0166741-source.json
Full Text
24-1.0166741-fulltext.txt
Citation
24-1.0166741.ris

Full Text

Finding Resilience-Friendly Compiler Optimizations usingMeta-Heuristic Search TechniquesbyNithya NarayanamurthyB. E., Anna University, 2011A THESIS SUBMITTED IN PARTIAL FULFILLMENTOF THE REQUIREMENTS FOR THE DEGREE OFMaster of Applied ScienceinTHE FACULTY OF GRADUATE AND POSTDOCTORALSTUDIES(Electrical and Computer Engineering)The University of British Columbia(Vancouver)October 2015c© Nithya Narayanamurthy, 2015AbstractWith the projected increase in hardware error rates in the future, software needsto be resilient to hardware faults. An important factor affecting a program’s errorresilience is the set of optimizations used when compiling it. Compiler optimiza-tions typically optimize for performance or space, and rarely for error resilience.However, prior work has found that applying optimizations injudiciously can lowerthe program’s error resilience as they often eliminate redundancy in the program.In this work, we propose automated techniques to find the set of compiler opti-mizations that can boost performance without degrading its overall resilience. Dueto the large size of the search space, we use search heuristic algorithms to efficientlyexplore the space and find an optimal sequence of optimizations for a given pro-gram. We find that the resulting optimization sequences have significantly highererror resilience than the standard optimization levels (i.e., O1, O2, O3), while at-taining comparable performance improvements with the optimizations levels. Wealso find that the resulting sequences reduce the overall vulnerability of the appli-cations compared to the standard optimization levels.iiPrefaceThis thesis is based on a work conducted by myself in collaboration with Dr.Karthik Pattabiraman. I was responsible for coming up with the solution and vali-dating it, evaluating the solution and analyzing the results. Karthik was responsiblefor guiding me on all the core aspects like formalization of the problem, with thesolution reasoning, methodology, and analysis and interpretation of results.iiiTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiiList of Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiDedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii1 Intoduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Research Goal and Proposed Solutions . . . . . . . . . . . . . . . 21.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 Background and Fault Model . . . . . . . . . . . . . . . . . . . . . . 52.1 Error Resilience and SDC . . . . . . . . . . . . . . . . . . . . . . 52.2 Genetic Algorithm (GA) . . . . . . . . . . . . . . . . . . . . . . 62.3 Simulated Annealing (SA) . . . . . . . . . . . . . . . . . . . . . 72.4 Fault Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8iv2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 Study on Compiler Optimizations . . . . . . . . . . . . . . . . . . . 113.1 Compiler Optimizations . . . . . . . . . . . . . . . . . . . . . . . 113.2 Fault Injection Study . . . . . . . . . . . . . . . . . . . . . . . . 133.3 Analysis on Individual Optimization . . . . . . . . . . . . . . . . 133.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174.1 Problem Statement and Complexity . . . . . . . . . . . . . . . . 174.2 GA-Based Approach . . . . . . . . . . . . . . . . . . . . . . . . 184.2.1 Representative Example . . . . . . . . . . . . . . . . . . 234.3 SA-Based Approach . . . . . . . . . . . . . . . . . . . . . . . . 244.4 Measuring Resilience . . . . . . . . . . . . . . . . . . . . . . . . 264.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . 305.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . 305.2 Benchmarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315.3 Tuning of the GA parameters . . . . . . . . . . . . . . . . . . . . 315.4 Tuning of the SA parameters . . . . . . . . . . . . . . . . . . . . 335.5 Resilience Evaluation . . . . . . . . . . . . . . . . . . . . . . . . 335.6 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . 345.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356.1 Effect of GA Parameters . . . . . . . . . . . . . . . . . . . . . . 356.1.1 Mutation Rate . . . . . . . . . . . . . . . . . . . . . . . 356.1.2 Selection Strategy . . . . . . . . . . . . . . . . . . . . . 366.1.3 Population Size . . . . . . . . . . . . . . . . . . . . . . . 376.1.4 Optimization Types . . . . . . . . . . . . . . . . . . . . . 386.2 Effect of SA parameters . . . . . . . . . . . . . . . . . . . . . . . 396.2.1 Rate of Cooling . . . . . . . . . . . . . . . . . . . . . . . 396.3 Resilience Evaluation . . . . . . . . . . . . . . . . . . . . . . . . 40v6.4 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . 426.5 Vulnerability Evaluation . . . . . . . . . . . . . . . . . . . . . . 446.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487.1 GA-based Vs SA-based . . . . . . . . . . . . . . . . . . . . . . . 487.2 Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 497.2.1 Order of the optimizations . . . . . . . . . . . . . . . . . 507.2.2 Resilience vs Vulnerability measuring fitness function . . 517.2.3 Evolution of the GA-based approach . . . . . . . . . . . . 537.2.4 Resilience-Enhancing Compiler Optimizations . . . . . . 547.3 Limitations of our Approaches . . . . . . . . . . . . . . . . . . . 557.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 568 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 578.1 Effect of Compiler Optimizations on Resilience . . . . . . . . . . 578.2 Choosing Compiler Optimizations . . . . . . . . . . . . . . . . . 588.3 Software Errors and Genetic Algorithms . . . . . . . . . . . . . . 599 Conclusion and Future Work . . . . . . . . . . . . . . . . . . . . . . 60Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62viList of TablesTable 2.1 Coverage of the fault model . . . . . . . . . . . . . . . . . . . 9Table 3.1 Different types of compiler optimizations and their characteristics 12Table 5.1 Benchmark programs that are used in our experiments . . . . . 32viiList of FiguresFigure 3.1 Resilience of blackscholes and swaptions optimized with dif-ferent individual optimizations (Black line represents the re-silience of the unoptimized version of blackscholes; Blue linerepresents the resilience of the unoptimized version of swaptions) 14Figure 3.2 Effect of running the LICM optimization on a code snippet (a)Unoptimized version, (b) Optimized version. . . . . . . . . . 15Figure 3.3 Effect of running the LOOP-REDUCE optimization on a codesnippet (a) Unoptimized version, (b) Optimized version. . . . 16Figure 4.1 Crossover Operations: Entities such as ‘a’, ‘b’, ‘c’ etc are in-dividual compiler optimizations in an optimization sequence. . 21Figure 4.2 Mutation Operations: Entities such as ‘a’, ‘b’, ‘c’ etc are indi-vidual compiler optimizations in an optimization sequence. . . 22Figure 4.3 Choosing the neighbor states from current state (a, b, c,.. areindividual optimizations) . . . . . . . . . . . . . . . . . . . . 25Figure 4.4 (a) Correlation between Total Dynamic instruction count vsSDC rate (for Blackscholes), and (b)Data-flow Regression Model’sEstimation vs Actual SDC rate . . . . . . . . . . . . . . . . 28Figure 6.1 Number of generations taken to generate the candidate solutionwith different mutation rate values. . . . . . . . . . . . . . . . 36Figure 6.2 Number of generations taken to generate the candidate solutionby the random selection and score based selection strategies. . 37viiiFigure 6.3 Number of generations taken to generate the candidate solutionwith different population sizes. . . . . . . . . . . . . . . . . . 38Figure 6.4 Number of generations taken to generate the candidate solutionwith different optimization types. . . . . . . . . . . . . . . . 39Figure 6.5 Number of iterations taken to generate the candidate solutionby the SA-based approach with different cooling rate. . . . . . 40Figure 6.6 Aggregate percentage of SDC, crash and benign results acrossbenchmarks for the unoptimized version. . . . . . . . . . . . 41Figure 6.7 Resilience of the Unoptimized, candidate solutions, O1, O2and O3 levels. (Higher values are better). . . . . . . . . . . . 42Figure 6.8 SDC rate of the unoptimized code, candidate solutions, O1, O2and O3 levels. (Lower values are better) . . . . . . . . . . . . 42Figure 6.9 Runtime of the unoptimized code, candidate solutions, O1, O2and O3 levels. (Lower values are better). . . . . . . . . . . . 44Figure 6.10 Vulnerability of the unoptimized code, candidate solutions, O1,O2 and O3 levels. (Lower values are better). . . . . . . . . . . 46Figure 7.1 Resilience of blackscholes with loop-reduce, gvn and candi-date solutions obtained from GA-based and SA-based approacheswith the unoptimized code’s resilience as baseline. Black linerepresents the resilience of the unoptimized code. . . . . . . . 49Figure 7.2 Resilience of blackscholes with the candidate solution-GA based(CS) and the different combinations of the sequence (C1,C2,....C23)with the unoptimized code’s resilience as baseline. . . . . . . 51Figure 7.3 Vulnerability of the candidate solutions with resilience andvulnerability target. Lower values are better. . . . . . . . . . . 52Figure 7.4 Mean population fitness scores during the GA evolution pro-cess for each program. . . . . . . . . . . . . . . . . . . . . . 53Figure 7.5 Resilience of the candidate solutions obtained from GA-basedand Unbounded GA-based approaches. . . . . . . . . . . . . 55Figure 7.6 Mean population fitness scores during the Unbounded GA evo-lution process for each program. . . . . . . . . . . . . . . . . 56ixList of AcronymsLLVM Low Level Virtual MachineLLFI LLVM based Fault InjectorIR Intermediate RepresentationSDC Silent Data CorruptionEDC Egregious Data CorruptionGA Genetic AlgorithmSA Simulated AnnealingCS Candidate SolutionxAcknowledgmentsI would like to thank my advisor Prof. Karthik Pattabiraman for his support,valuable guidance and encouragement. Karthik has always been motivating meto ponder beyond the norm and guided me in the right direction. He constantlyencouraged for my progressive work, at times when I faced difficulties and disap-pointments. His enthusiasm to share one’s thoughts and interrogate on new ideasis something that I find very inspiring.I would like to thank my lab colleagues with whom I have had many insight-ful discussions. Their questions have often made me think in diverse directions.Thanks to my colleagues and the professors at the CSRG meetings who helped meexpand my knowledge in other research fields.This would have never been possible without the support from my family andfriends. I want to thank my parents, sisters and brother-in-laws for their sincerelove and support through this journey. Words cannot do justice to their contribu-tion. Special thanks to my friend Vignesh who has made me strong through thisentire journey. I wish to thank all my friends in India who helped and encouragedme to plan my graduate studies. Thanks to all my friends in Canada who made mefeel this new country as my home.Last but not the least, I thank the almighty for providing me with the above!xiDedicationTo my parentsxiiChapter 1Intoduction1.1 MotivationTransient hardware faults (i.e., soft errors) are becoming more frequent as fea-ture sizes shrink and manufacturing variations increase [3]. Unlike in the past,when such faults were handled predominantly by the hardware, researchers havepredicted that hardware will expose more of these faults to the software applica-tion [10, 19]. This is because the traditional methods of handling hardware faultssuch as dual modular redundancy and guard banding consume significant amountsof energy, which makes their use challenging for most commodity systems whereenergy is a first class constraint. Therefore, it becomes critical to design softwareapplications that are resilient to hardware faults.One of the most important decisions a programmer wishing to build error-resilient applications must make is whether to run compiler optimizations on it.Compiler optimizations, while boosting performance, can often have a deleteriouseffect on resilience [8, 28, 34] as they remove some of its redundancy. On the otherhand, applying optimizations can make programs run faster, thus making them lessvulnerable to hardware errors in the first place 1.In this work, we ask the question: “Do compiler optimizations hurt or improve1We define vulnerability as the probability that an error occurs in the program and leads to afailure, and resilience as the conditional probability that an error leads to a failure given that it occursin the first place. Thus vulnerability = (1− resilience)∗ executionTime.1error resilience and vulnerability of programs ?”. This question is important forprogrammers to decide whether to apply optimizations to those programs for whichresilience matters. Prior work [8, 28] has investigated this question by studying theeffect of compiling with the standard optimization levels (i.e., O1, O2 and O3) onprograms’ error resilience or vulnerability. While this is useful, the standard opti-mization levels group together many optimizations, and hence prior work does notdisambiguate the effects of individual optimizations on error resilience (and vulner-ability). Thomas et al. [34] have considered the effect of individual optimizationson error resilience, but they limit themselves to soft-computing applications, orthose applications that are inherently error tolerant, e.g., multi-media applications.To the best of our knowledge, there is no work that has evaluated the effect ofindividual optimizations on the error resilience of general-purpose programs.1.2 Research Goal and Proposed SolutionsIn this work, we first perform an experimental study (Chapter 3) to understand theeffect of individual optimizations on program’s error resilience. As mentioned,we distinguish between error resilience and vulnerability to separate the effects ofcompiler optimizations on execution time and code structure. We find that thereis a significant difference in the error resilience achieved by individual optimiza-tions, and that this effect varies significantly across applications. Further, contraryto what prior studies have shown [8, 28], we find that some compiler optimizationscan actually improve the error resilience of the program in addition to its perfor-mance, thus doubly reducing the program’s vulnerability.Based on this insight, we devise automated techniques to find a sequence ofoptimizations for a given application that preserves its error resilience. In otherwords, we attempt to find sequences of individual optimizations for an applicationthat do not degrade the application’s error resilience, while improving its perfor-mance, thus reducing its overall vulnerability. However, the space of all possibleoptimizations to consider when optimizing for both resilience and performanceis extremely large and brute force search is intractable. Therefore, we leveragemeta-heuristic techniques proposed in prior work for performance and memoryoptimizations [6, 16, 38].2The meta-heuristic search techniques we use in this work are Genetic Algo-rithms (GA) and Simulated Annealing (SA). Based on our results we suggest GAover Simulated Annealing (SA), as we found that GA was faster and yielded bet-ter solutions in our experiments. Applying GAs and SA to the problem of findingresilience-preserving optimizations requires two things. First, we need to developappropriate operators for the GA and SA to find optimization sequences that satisfythe desired resilience levels. Secondly, we need to come up with a set of parame-ters for the algorithms to ensure that they converge within a reasonable amount oftime. To the best of our knowledge, we are the first to use a meta-heuristic searchalgorithms such as GA and SA to find compiler optimization sequences that canimprove performance without degrading error resilience.1.3 ContributionsWe make the following contributions in our work:• Study the effect of individual optimizations on different programs’ error re-silience through fault-injection experiments,• Propose GA-based and SA-based techniques to find a compiler optimizationsequence for a given application that does not degrade the error resilience,• Implement the techniques in a production, open-source compiler, LLVM [17],• Experimentally tune the parameters of the GA-based and SA-based approachesto achieve fast convergence to solution,• Evaluate our technique on 12 programs from the PARSEC [2] and Par-boil [33] benchmark suites using fault-injection experiments, in terms of itserror resilience, performance and vulnerability, and compare it to the stan-dard optimization levels.The main results of our experimental evaluation are: (1) the resilience of thecandidate optimization sequences found by our techniques (GA and SA) is muchbetter than those of the standard optimization levels, and in many cases, even bet-ter than that of the unoptimized code, (2) the performance of the optimized code3with our techniques is on par with or only slightly lower than the performance ofthe code with the standard optimization levels (GA based - better than O1, O2 andsligthly worse than O3 by 0.39%; SA based - 0.61%, 2.27% and 2.72% worsethan O1, O2 and O3 respectively), (3) On average, our techniques considerablylower the overall vulnerability of the application (GA- 8.12 (±0.21) and SA- 8.51(±0.22) on average), while the standard optimization levels O1 increase the over-all vulnerability of the application (O1-9.53 (±0.25) on average) and O2 and O3reduce it slightly (O2-9.22 (±0.24) and O3-9.11 (±0.24)). Thus, for a small per-formance loss compared to the most aggressive optimization level, our techniquessignificantly reduces the overall application vulnerability from the unoptimizedcode.4Chapter 2Background and Fault ModelIn this chapter, we first define error resilience and vulnerability. We then present abrief overview of Genetic Algorithms, Simulated Annealing and then describe ourfault model.2.1 Error Resilience and SDCA hardware fault can cause a program to fail in one of three ways: it may cause theprogram to crash, hang, or have an Silent data corruption(SDC). SDC is an outcomethat results in incorrect output without any indication, hence the name “silent”. Wefocus on SDCs as they are considered the most severe kind of failures in a program(the other failures, namely crashes and hangs can be detected through hardwareexceptions and timeout mechanisms respectively). Error Resilience is the abilityof the program to prevent an error that occurs in it from becoming an SDC. In otherwords, resilience is the conditional probability that a program does not produce anSDC given that it is affected by a hardware fault (i.e., the fault is activated). Thisis different from vulnerability, which is the unconditional probability of a faultoccurring in the program and leading to an SDC. As mentioned earlier, we definethe Resilience = (1−SDCrate), and Vulnerability = (SDCrate∗Executiontime),where SDCrate is the fraction of SDCs observed over the set of all activated faults(i.e., faults that manifest to the software).Note that our definition of vulnerability differs from the commonly used no-5tion of the Architectural Vulnerability Factor [21], which is defined in terms of thenumber of bits in a hardware structure that are needed for architecturally correctexecution (ACE). We eschew the traditional definition as it is tied to the architec-tural state of the processor, while we want to capture the effect of the error on theapplication. Further, AVF studies often employ detailed micro-architectural simu-lators which are slow, and hence do not execute the application to completion. Onthe other hand, we want to execute applications to completion on the real hardwareas we are interested in the ultimate effect of the error (i.e., whether or not it resultsin an SDC).In our work, we attempt to choose optimizations that maintain the error re-silience of the application compared to the unoptimized version. We focus onresilience to separate the effects of compiler optimizations on code structure andexecution time. Since all the optimizations we choose aim at improving perfor-mance, the vulnerability will be reduced if the resilience is maintained the sameafter the optimization is applied (due to shorter execution time).2.2 Genetic Algorithm (GA)A Genetic Algorithm (GA) [14] is a search heuristic algorithm that is inspired bynatural evolution. The algorithm starts with an initial set of candidate solutions.They are collectively called as the Population. The algorithm has a fitness functionthat is used to calculate a candidate’s fitness score. The fitness score depends onhow good the candidate is at solving a problem, and it is the parameter that eval-uates a candidate’s rank towards the optimal solution. One or two candidates arechosen from the population to perform recombination at each stage.The recombination operations are of two types: Crossover and Mutation. Twocandidates undergo Crossover whereas, for mutation, only one candidate takes part.The crossover operation performs a randomized exchange between solutions, withthe possibility to generate a better solution from a good one. This operation tendsto narrow the search and move towards a solution. On the other hand, mutationinvolves flipping a bit or an entity in a solution, which expands the search explo-ration of the algorithm. Crossover and mutation rate are the probabilities at whichthe respective operations are performed [11] [29]. The choice of these probability6values reflects the trade-off between exploration and exploitation (or convergence).A higher mutation rate for example, leads to better exploration but can delay con-vergence. On the other hand, a high crossover rate can lead to faster convergence,but may get stuck in a local maxima.Typically, recombination gives rise to new better performing members, whichare added to the population. Members in the population that have poor fitnessscores are thus eliminated gradually. This process is repeated iteratively until eithera population member has the desired fitness score, thereby finding a solution, orthe algorithm exceeds the time allocated to it and is terminated.2.3 Simulated Annealing (SA)Simulated annealing is a meta-heuristic search algorithm that derives its name fromthe metallurgical process, “Annealing” [35]. Annealing [9] is a process used toalter the properties of a metal in which the metal is heated to a certain high temper-ature and then allowed to slowly cool with a specific cooling rate. Similarly, SAexplores the search space controlled by variables T-temperature and α-Coolingrate.The algorithm starts by selecting a random state as its current state from thegiven initial set, and the variable temperature T is assigned to a high value. It ex-plores a neighboring state from the current state, and evaluates it by comparing itsscore with the current state. A neighboring state is evolved by applying some slightmodifications to the current state. The score of a state determines how good thestate is as a potential solution for a given problem. The probability of acceptinga state depends on its score, and the variables T and α . Initially when T is high,the probability of accepting a state with a bad score is also high, thus expandingthe scope of the search for finding an optimal solution. T is gradually decreaseddepending on the cooling rate α as the algorithm progresses, and hence the prob-ability of accepting a weak state also decreases. If a state is accepted, then thealgorithm evolves a new stats from the accepted state for the next iteration. Thisprocess continues until an optimal solution is obtained.This algorithm handles the problem of local maxima suffered by Genetic Algo-rithm as its evolution accepts a good number of bad states (states with a bad score)7and tends to proceed towards global maxima [20].2.4 Fault ModelTransient hardware faults occur when particle strikes or cosmic rays affect the flip-flops or the logic elements. Particle strike or cosmic rays might impact various chipcomponents, namely memory, instruction cache, data cache, ALU, pipeline stages.Memory and cache are typically protected by error correcting codes or parity. Theyhave the ability to correct/detect single bit flips caused by the particle strike. Faultsoccurring in the instructions encoding can be detected by the use of simple codesas the instructions do not have the ability to change over execution while the datacan change. However, when a particle strikes the computational components likethe ALU, registers, processor pipelines, logic gates etc, they affect the result of theinstruction that is currently being executed in that component. This faulty resultis consumed by the subsequent dependent instructions ultimately impacting theapplication’s outcome if allowed to propagate.Propagation of an error from the component to the application level due to aparticle strike is illustrated in the following example. Let us consider an applicationA that is being executed in the processor exclusively. Consider the snapshot of theprocessor during a clock cycle n. Instructions i5, i4, i3, i2, i1 are currently in thepipeline stages Fetch, Decode, Execute, Memory, Write back stages respectively. Ifthe particle strikes a component in the execute stage of the pipeline, result of theinstruction in that stage - i3 is compromised and produces erroneous outcome.The fault model that we consider replicates this physical phenomenon of a par-ticle striking the processor components. Coverage of the chip components by thefault model is given in table 2.1. We simulate a particle strike affecting the coveredcomponents in the following way. While the application is in execution, one of itsdynamic instruction is picked randomly from an uniform distribution. We performa bit-flip in any register content of this instruction. This randomness adheres tothe uniform probability of a fault impacting any computational component of theprocessor at a given time. In the given example, the fault occurring at the executestage of instruction i3 would result in storing the faulty value in the destinationregister, and we replicate this by performing a bit flit in the value stored in the8Table 2.1: Coverage of the fault modelComponents Covered?Memory NoInstruction and DatacacheNoProcessor pipeline stages YesALU YesData bus NoControl unit NoRegisters Partially (Specific targetis selected; no uniformdistribution)destination register. We use the single bit-flip model to represent transient faults.Prior work [5] has found that there may be significant differences in the raw ratesof faults exposed to the software layer when fault injections are performed in thehardware. However, we are interested in faults that are not masked by the hard-ware and make their way to the application. Therefore, we inject faults directly atthe application level. The assumption in this fault model is that the application ofinterest is currently being executed in the processor exclusively. If multiple appli-cations are running in the processor and a particle strikes a component where aninstruction of one of the applications is running, only that application gets affected.The other applications continue to run unaffected. Hence, this fault model is alsoindependent of the underlying architecture and the number of applications. Weconsider single bit flips as this is the de-facto fault model for simulating transientfaults in the literature. Finally, we assume that at most one fault occurs during anapplication’s execution, as transient faults are relatively rare events. A similar faultmodel has been used by prior work in this area [11, 20, 22].2.5 SummaryIn this chapter, we defined Silent Data Corruption (SDC) and explained its im-portance over the other outcomes of transient hardware faults such as crashes andhangs. We then explained our definition of error resilience and vulnerability of9software applications. We also explained the general working of Genetic Algo-rithms and Simulated Annealing that were adopted in our automated techniquesfor finding the resilience friendly compiler optimizations. We finally defined thefault model used in our automated techniques to evaluate the error resilience ofsoftware applications.10Chapter 3Study on Compiler OptimizationsIn this chapter we perform an initial fault-injection study that analyzes the effectof individual compiler optimizations on error resilience of software applications.The experimental setup and the benchmarks considered here are described later inChapter 5.3.1 Compiler OptimizationsThere are several optimizations available in standard optimizing compilers. Anoptimization is a code transformation that seeks to improve some property of theprogram e.g., its performance, size. Different code versions can be achieved byapplying some optimizations more than once and in different sequences. As men-tioned earlier, optimizations are often grouped into packages, or levels such asO1, O2, and O3, which are common sequences of optimizations that offer differ-ent trade-offs between performance improvement and memory. Programmers canchoose to invoke either a predefined optimization level, or individual optimizations(from a set of pre-defined optimizations), when compiling their code. Table 3.1lists different common kinds of compiler optimizations and their characteristics.Optimizations can also affect the error resilience of a program [8, 34]. For ex-ample, many optimizations attempt to reduce redundant computations in the pro-gram in order to improve its performance. Unfortunately, this also has the effect ofincreasing the proneness of the program to errors that cause SDCs thereby making11Table 3.1: Different types of compiler optimizations and their characteristicsType Optimization DescriptionData flowoptimizationsConstant propagation(constprop)Replaces instructions which have onlyconstant operands with a constant valueSparse conditionalconstant propagation(sccp)Removes certain types of dead codeand propagates a constant through theentire programCommon subexpressionelimination(cse)Eliminates common subexpressionsand replaces them with a variable thatcomputes that common subexpressionLoopOptimizationsLoop invariant codemotion(licm)Moves the invariant code within theloop to the loop pre-header if it is notaffected by the loop iterationsLoop Strength Reduction(loop-reduce)Replaces expensive operations withinthe loop with simpler operationsUnroll Loops(loop-unroll)Eliminates/rewrites an instruction withrepeated independent instructions inthe loop to increase the program speedUnswitch loops(loop-unswitch)Moves the conditional inside the loopto outside and duplicates the loop in ‘if’and‘else’ clauseGlobalOptimizationsInter-procedural ConstantPropagation(ipconstprop)Similar to constant propagationoptimization which is applied acrossproceduresInter-procedural SparseConditional ConstantPropagation(ipsccp)Inter-procedural variant of sccp(decides on basic blocks, constantsand conditional branches)Inlining(inline) Replaces the function call with thebody of the functionOthersGlobal value Numbering(gvn)Assigns a number value to variablesand expressions that are provablyequivalentMerge Constants(mergeconst)Merges duplicate global constants to asingle shared constantInstruction combine(instcombine)Combines redundant and simplealgebraic instructionsthem less error resilient. On the other hand, some optimizations can increase theresilience of a program (see below).123.2 Fault Injection StudyWe chose 10 individual optimizations at random from about 50 optimizations avail-able from the LLVM compiler. We performed an initial study to analyze the im-pact of individual optimizations on the error resilience of two applications fromthe PARSEC benchmark suite, namely Blackscholes and Swaptions. We first com-piled each program with the chosen ten different optimizations using the LLVMcompiler [17].We performed fault injection experiments on the unoptimized version and theten different optimized versions of the programs to measure their respective er-ror resilience. Figure 3.1 shows the resilience (in %) of the different optimizedversions of the two programs compared to the resilience of the unoptimized ver-sion (baseline). As can be seen in the figure, some optimizations degrade the errorresilience of the program, while some optimizations improve the resilience. Forexample, the loop-reduce optimization improves the error resilience of Blacksc-holes, while instcombine degrades the error resilience.Further, the resilience effectof an optimization differs from one application to the other. For example, whilethe loop-reduce optimization improves the resilience of Blackscholes, it degradesthat of Swaptions. Therefore, we need an application-specific technique to find op-timization sequences that do not degrade error resilience but improve performancefor a given application. This is the goal of this thesis.3.3 Analysis on Individual OptimizationTo further understand why individual optimizations enhance or degrade a pro-gram’s error resilience, we wrote a series of micro-benchmarks that each attempt toexercise a single optimization. We then performed fault-injection studies into thesemicro-benchmarks in order to study the effect of these optimizations. This givesus an idea of why a particular optimization increases or decreases error resilience.We give two examples below, one optimization that degrades error resilience andanother that enhances error resilience.Resilience degrading optimization: Consider the commonly used loop op-timization loop-invariant code motion (LICM), which attempts to reduce the op-erations performed inside loops. It moves the loop-invariant expressions inside13licminstcombine csegvnip-sccpinlineloop-reduceloop-unrollloop-unswitchsccpIndividual Optimizations405060708090100Resilience (in %)BlackscholesSwaptionsFigure 3.1: Resilience of blackscholes and swaptions optimized with differ-ent individual optimizations (Black line represents the resilience of theunoptimized version of blackscholes; Blue line represents the resilienceof the unoptimized version of swaptions)the loop to the pre-header block of the loop without affecting the semantic of theprogram.Figure 3.2a shows a code snippet(unoptimized) where the LICM optimizationcan perform some transformations on the loops and Figure 3.2b shows the codeoptimized by the LICM optimization. Our original code snippet includes multiplesuch loops with similar operations - however, we show only one loop for simplicity.It can be seen that the expression that computes alpha (line 3 in Figure 3.2a) insidethe loop does not depend on the induction variable of the loop. Thus the LICMoptimization moves those expressions to the pre-header block of the loop and min-imizes the computations performed inside the loop as shown in Figure 3.2b.We performed 3000 fault injections in both the unoptimized and LICM op-timized program versions, and observed that the LICM optimization reduces theerror resilience of the program.To understand why the resilience is degraded, assume that the LICM optimizedcode experiences a fault in the computation alpha = (x * c) + s (line 1 inFigure 3.2b). This fault will affect all values of the array rs1 in all loop iterations.The original code on the other hand, computes alpha = (x * c) + s (line 3141 for(i=0; i<10; i++) 1 alpha=(x*c)+s;2 { 2 for(i=0; i<10; i++)3 alpha=(x*c)+s; 3 {4 rs1[i]=i+(alpha*7); 4 rs1[i]=i+(alpha*7);5 } 5 }Figure 3.2: Effect of running the LICM optimization on a code snippet (a)Unoptimized version, (b) Optimized version.in Figure 3.2a) on every iteration of the loop, and hence a fault in the computationaffects only the values of the array in that loop iteration, namely rs1. Therefore,the transformed code has a greater likelihood of experiencing an SDC due to thefault, and its resilience is lowered. This is an example of how an optimization maylower the error resilience of an application.Resilience enhancing optimization: Consider another loop optimization loopstrength reduction (LOOP-REDUCE), that performs strength reduction on arrayreferences by replacing complex operations inside the loop involving the loop in-duction variable with equivalent temporary variables and simpler operations. Sim-ilar to the previous example, Figure 3.3a shows a sample code snippet and howit is transformed by the LOOP-REDUCE optimization. The loop induction vari-able that is used for array references and value computation in the expression,rs1[i] = i*alpha (line 4 in Figure 3.3a) is replaced with temporary variablestemp and temp1 for the address and value of array rs1 as shown in Figure 3.3b(line 6-8). Hence the induction variable here is only used to control the loop entryand exit after the optimization.As in the previous example, we performed 3000 fault injection experiments inboth versions of the programs, and observed that the LOOP-REDUCE optimiza-tion enhances the resilience of the program. To understand why the resilience isenhanced in the case of the LOOP-REDUCE optimization, consider a fault that oc-curs in the computation of the loop induction variable. In the unoptimized version,the fault would affect the value and references of array rs1. On the other hand,in the optimized version, the loop induction variable is restricted to the role of it-erating and exiting the loop, and a fault occurring in this induction variable wouldnot affect the array reference and its contents. Thus the optimized version is more151 alpha=(x*c)*s; 1 alpha=(x*c)*s;2 for(i=0; i<10;i++) 2 temp=&rs1;3 { 3 temp1=0;4 rs1[i]=i*alpha; 4 for(i=0; i<10;i++)5 } 5 {6 *temp=temp1*alpha;7 temp1=temp1+1;8 temp=temp+sizeof(int);9 }Figure 3.3: Effect of running the LOOP-REDUCE optimization on a codesnippet (a) Unoptimized version, (b) Optimized version.resilient that the unoptimized version. This example shows how an optimizationcan improve the error resilience of an application.3.4 SummaryIn this chapter, we study the behavior of different individual optimizations basedon a fault injection study. Our study shows that different optimizations have dif-ferent effects on a program’s error resilience, with some optimizations degradingresilience and others improving it. Further, it is often difficult to judge aprioriwhether an optimization will lower or improve the error resilience, as it is de-pendent on the application’s characteristics. This is why we build an automatedmethod to find optimization sequences that do not lower error resilience of a givenapplication.16Chapter 4MethodologyIn this chapter, we first present the problem statement and discuss its complexity.We then present our GA-based and SA-based approaches for solving the aboveproblem. We finally discuss the implementation details of our approaches.4.1 Problem Statement and ComplexityWe devise automated methods to solve the following problem: given a programP, find an optimization sequence that provides performance improvement withoutdegrading resilience. If γ = [α1,α2,α3, ...αn] where α1,α2,α3, ..αn are individualcompiler optimizations and γ is the superset of optimizations, our goal is to find anon-empty optimization sequence ϕ = {αx1αx2...αxt}, where 1 ≤ x1,x2, ..xt ≤ n,that retains the resilience of the program, i.e., Resilience(ϕ(P)) ≥ Resilience(P)and |ϕ| ≥ 1. The latter constraint is necessary to prevent the trivial solution whereϕ is an empty set, i.e., when no optimizations are performed on the program andthe resilience is the same.Note that a modern compiler has more than 50 optimizations at its disposal.So a naive search strategy to solve this problem would have to search through 250combinations, simply to find the sets of optimizations to run on the program. Eachset can in turn be permuted in different ways (with repetitions allowed), and hencethere is an exponential number of possibilities for solving this problem. This is whywe need an efficient way to search the space of optimizations for resilience, which17is provided by the meta-heuristic search algorithms. While other meta-heuristicsearch methods are also possible (e.g., Hill Climbing), we use GA and SA as theyhave been used in prior works on finding compiler optimization sequences for per-formance and memory.4.2 GA-Based ApproachWe explain our GA-based approach for finding the appropriate compiler optimiza-tion sequence for an application that does not degrade its error resilience. We beginwith a set of unique individual compiler optimizations as our initial population. InGA terms, these individual optimizations constitute the gene and the resulting com-binations of optimizations constitute a chromosome. The optimizations can consistof all the optimizations available in a standard optimizing compiler such as gcc orllvm. We obtain the initial error resilience of the unoptimized version of the appli-cation through fault injection experiments. This is the target error resilience for thealgorithm.The GA-based algorithm is presented in Algorithm 1. The steps are furtherexplained as follows.1. Initialization: Every individual member of the population is called as acandidate. The candidates in the initial population are unique individual compileroptimizations. The fitness score of every candidate in the population is calculatedusing the fitness function (discussed in Step 2). This is shown in the initializationpart of the Algorithm 1. The size of the initial population determines the con-vergence rate of the algorithm and the quality of its solution. We experimentallychoose the initial population size in Chapter 6.We first check if there is any candidate in the initial population that does notlower the program’s error resilience. If such a candidate exists, it is considered asan optimal candidate solution with the desired resilience and the algorithm termi-nates (lines 2-4). This is a trivial condition and is unlikely to occur. For example,we did not encounter this condition in any of our experiments.2. Fitness Function: In GA, the fitness score of a candidate is used to de-termine whether the candidate should be carried forward to the next generation.We devise a fitness function (Θ()) that measures the error resilience of a candi-18Algorithm 1: GA-based approach to find an optimization sequence that does notdegrade error resilienceα1,α2,α3, ...← Individual optimizationsΘ()← FitnessFunction()smin←Minimum fitness score of populationαmin← Candidate with fitness score sminsmax←Maximum fitness score of populationαmax← Candidate with fitness score smaxstarget ← Resilience of unoptimized versionδc←CrossoverRateδm←MutateRatepopulation← [(α1,Θ(α1)),(α2,Θ(α2)),(α3,Θ(α3)), ...]Input: Source code, populationOutput: Optimization sequence that retains the resilience of the given sourcecode1: procedure OPTIMIZATION SEQUENCE FOR RESILIENCE2: smax = max(Θ(α1),Θ(α2),Θ(α3), ...)3: αmax = getCandidate(population[smax])4: while smax ≤ starget do5: αa,αb = TournamentSelection(population)6: if Random()< δc then7: αˆ = crossover(αa,αb)8: else9: αˆ = αa10: end if11: if Random()< δm then12: αˆ = mutation(αˆ)13: end if14: smin = min(Θ(α1),Θ(α2),Θ(α3), ...)15: if smin <Θ(αˆ) then16: αmin = getCandidate(population[smin])17: Eliminate(population,αmin)18: Add(population,(αˆ,Θ(αˆ)))19: end if20: smax = max(Θ(α1),Θ(α2),Θ(α3), ...)21: αmax = getCandidate(population[smax])22: end while23: return αmax24: end procedure19date optimization sequence. However, as an advanced option any function thatmeasures both performance and resilience/vulnerability can be considered, wherethe algorithm searches for an optimization sequence that guarantees enormous im-provement in performance and resilience. Since we consider only error resilience,the fitness function can be based on a resilience model or on fault injection exper-iments. We use fault injection for this purpose as we were unsuccessful in findinga fitness function to predict a program’s resilience based on its code structure (seeappendix). Note however that our method is generic and is not tied to the use offault injection.The fitness function Θ is used to rank the resilience of the candidate. Basedon this rank, the GA decides whether the candidate should be considered for thenext evolution round. It is important to ensure that we can obtain tight confidenceintervals on the error resilience as we use it to compare solutions with each otherin terms of resilience. Therefore, we perform a few thousand fault injection exper-iments in each iteration of the GA to determine the fitness score, depending on thebenchmark’s characteristics.3. Tournament Selection: The goal of the tournament selection is to determinewhich pair of candidates should be recombined with each other to form the nextgeneration of the GA. In our algorithm, we choose two candidates from the popula-tion based on a heuristic (line 5). We consider two different heuristics: (i) Randomselection (ii) Score based selection. In random selection, we pick two candidatesrandomly from the population. In score based selection, we pick two best candi-dates (top two fitness scores), the intuition being that the fittest candidates can giverise to better offspring. We evaluate the effectiveness of both these heuristics inChapter 6.4. GA Recombination operations: We perform recombination operations onthe candidates chosen from tournament selection (line 6 -13). Recombination op-erations are of two types: (i) Crossover (ii) Mutation. CrossoverRate and Mu-tateRate determine the probability at which these operations are performed. TheCrossoverRate is chosen as suggested and used by classical papers in the GAarea [1, 7, 22, 24, 31]. The MutateRate was chosen based on our analysis dis-cussed in the Chapter 6, as there is no consensus in the literature on this value.We devise new crossover and mutation operations in order to explore the large20space of optimizations and drive the algorithm towards obtaining an optimizationset that retains the resilience. We briefly describe these operators.(i) Crossover: Crossover operation involves either append or swap operation.These operations increase the chances of combining the sequences of the two cho-sen candidates to evolve a new candidate with a higher resilience. The appendoperation simply appends the entities of the two selected candidates as shown inFigure 4.1. Swap operation is similar to the two-point crossover, where the entitieswithin the two selected index are swapped between the candidates.Candidate 1 a b c d Candidate 2AppendSwape f aa b c d e f aa f a dCandidate 1 a b c d Candidate 2 e f aNew CandidateNew CandidateFigure 4.1: Crossover Operations: Entities such as ‘a’, ‘b’, ‘c’ etc are indi-vidual compiler optimizations in an optimization sequence.(ii) Mutation: In some cases, we found that a single compiler optimization inthe candidate set degrades the overall resilience, and hence by replacing it withanother individual optimization or deleting it, the GA can generate a better candi-date. Thus we devise a mutation operation to add, delete or replace an individualoptimization with another one. Figure 4.2 shows adding, deleting or replacing anoptimization in the candidate optimization sequence.5. Elimination: The goal of the elimination step is to eliminate the unfit can-didates from the population. The fitness score of the weakest candidate from thepopulation is compared with the fitness score of the new candidate generated fromthe recombination operations. If the weakest candidate’s fitness score is smaller21Candidatea b c dAddkDeletea b dReplaceb f dAdd k optimizationDelete c optimizationReplace c with f optimizationa b c daFigure 4.2: Mutation Operations: Entities such as ‘a’, ‘b’, ‘c’ etc are individ-ual compiler optimizations in an optimization sequence.than that of the new candidate, it is eliminated from the population. In this case,the new candidate with a better resilience is added to the population, hence will beconsidered for the next generation’s evolution. If its fitness score is not smaller,the new candidate is not added to the population, and the population remains un-modified. This is shown in lines 14-19 in Algorithm 1. The main intuition here isthat weaker candidates have lower probability of giving rise to stronger offspring,and hence need to be eliminated from the pool of candidates to carry forward tothe next generation.6. Termination: If a new candidate is added to the population, we checkwhether its resilience is greater than or equal to the target resilience i.e., the re-silience of the unoptimized program. If this is the case, we call it the candidatesolution and stop the algorithm (line 4, and line 23). Otherwise, we repeat theabove steps of Recombination and Elimination until we obtain a candidate solu-tion. It is possible that such a candidate solution takes too long to obtain, or isnever obtained. To resolve this, we terminate the algorithm if the average fitnessscore of the entire population does not change for numerous generations. In thiscase, the algorithm returns the best candidate from the population that is closest tothe resilience of the unoptimized version as the candidate solution.224.2.1 Representative ExampleTo understand how a candidate solution (optimization sequence) is obtained by ourGA-based algorithm, we illustrate its running with the Blackscholes benchmarkprogram from PARSEC [2]. To simplify the presentation, we present only thesteps that were involved in the evolution of the final candidate solution.1. Initialization: We initialized the population with a set of 10 individual op-timizations from the LLVM compiler [17]. Let us consider the following as theinitial population:{licm, instcombine, gvn, early-cse, loop-reduce, sccp, inline, ip-sccp, loop-unroll, loop-unswitch} (these optimizations are explained in Table 3.1).2. Fitness Function: The fitness score of every candidate in the population iscalculated using a fault injection experiment as explained earlier.3. Tournament Selection: As explained in Section 4.2, the crossover and muta-tion operations are performed based on the probability values CrossoverRate andMutateRate. We consider the Random Selection strategy for tournament selection.4. Unrolling the algorithm: Assume that two random candidates namely loop-reduce and loop-unroll are chosen from the population (line 5). A random numberis generated and is found to satisfy the CrossoverRate probability. Again a randomnumber is generated to perform either a swap or append operation. In this case,assume that append operation is performed on the two candidates and results in thenew candidate [loop-reduce, loop-unroll] (lines 6-9). The contents within a squarebracket represent a single optimization sequence.Regardless of whether we perform the crossover operation, a random numberis generated to see if it satisfies the MutateRate probability (line 11). If crossoverwas performed, the new candidate undergoes mutation. Otherwise, the first ofthe two randomly picked candidate undergoes mutation and gives rise to a newcandidate. Assume that in this case, the probability does not satisfy MutateRate.So [loop-reduce, loop-unroll] is the candidate sequence. Fitness score for thiscandidate is calculated and found to be better than the weakest member, early-cse. Hence, the new candidate replaces the weakest candidate in the population.For clarity we have underlined the new candidate added to the population. Now,the population is {licm, instcombine, gvn, [loop-reduce, loop-unroll], loop-reduce,sccp, inline, ipsccp, loop-unroll, loop-unswitch} (lines 14-21). This is the end of23the first generation.In the first generation, the desired target resilience is not yet reached and hencethe algorithm proceeds to the second generation. In this generation, gvn and loop-unroll are chosen as the candidates for selection. In the crossover step, the twocandidates give rise to the new candidate, [gvn, loop-unroll]. Now assume thatthe mutation probability happens to be satisfied in the second generation and thereplace operation is performed on this candidate to give rise to the new candidate[gvn, inline]. Now, the population is {licm, instcombine, gvn, [loop-reduce, loop-unroll], loop-reduce, [gvn, inline], inline, ipsccp, loop-unroll, loop-unswitch}.In the second generation, the target resilience is not yet attained and the algo-rithm continues. Let us assume the two candidates chosen in the third generationare [licm, loop-reduce] and [gvn, inline]. Assume that the CrossoverRate is satis-fied and the two candidates undergo an append operation to yield the new candidate[loop-reduce, loop-unroll, gvn, inline], and that the MutateRate is not satisfied inthis generation. We find that the resilience of this new candidate satisfies the targetresilience and the algorithm terminates. This candidate optimization sequence ispresented as the candidate solution that achieves the desired target resilience (line23). Thus, the algorithm took three generations to obtain the candidate solution inthis example. Note that in reality it took many more generations for the program,but we present only three generations to simplify the presentation.4.3 SA-Based ApproachAs in the GA-based approach, we start with an initial set that includes a subset ofoptimizations available in the compiler. In this approach a state is defined as an op-timization sequence, while the score of a state is defined as the resilience(%) of theoptimized program. Similar to our GA-based approach we consider the resilienceof the unoptimized program as the baseline here (i.e., the target resilience).The SA-based algorithm is shown in Algorithm 2. The step wise execution ofthe SA-based algorithm is as follows:1. Initialization: We begin by choosing a random individual optimization fromthe initial set and consider it to be the current state of the algorithm (line 2 ofAlgorithm 2). The resilience of the program optimized with the optimization(s) in24the current state is measured and it is associated as the score of the current state(scurrent). This score is compared with the target resilience i.e the resilience ofthe unoptimized program and we terminate the algorithm if the resilience score ofthe current state is better than or same as that of the target resilience (line 3-4 ofAlgorithm 2). However, as in the GA-based approach, this trivial condition withan individual optimization was never satisfied in our experiments.2. Choosing Neighbor state: We then generate a neighbor state (sneighbor) fromthe current state (line 5 of algorithm 2) (ChooseNeighbor()). The process of gener-ating the neighbor state involves tweaking the current state with specific operations.These operations include add, delete and replace as shown in the Figure 4.3. Theadd operation involves selecting a random individual optimization from the initialset and appends it to the current state. The replace operation replaces a randomoptimization in the sequence of the current state with an optimization randomlychosen from the initial set. Finally, the delete operation removes a random opti-mization from the sequence of the current state.Current State (Optimization Sequence)[a b c d e]Example Neigbhor States Add – [a b c d e x ]Replace – [a b c y e]Delete – [a b d e]Figure 4.3: Choosing the neighbor states from current state (a, b, c,.. areindividual optimizations)3. Score Calculation: Once the neighbor state is generated, the resilience ofthe program optimized with the optimization sequence in the neighbor state is mea-sured (sneighbor) (line 6 of Algorithm 2).In simulated annealing, the score of a state(ResilienceScore()) is similar to the fitness score in GA. It determines whether theneighbor state should be carried forward as the current state to the next iteration.Similar to the GA-based approach this score is the error resilience measure of astate. As before we perform fault injections to calculate it.4. Accepting neighbor state: The neighbor state is accepted if its score isbetter than the score of the current state. However if the neighbor state has a lower25resilience score, the probability of accepting the neighbor state depends on thevariables T (temperature) and α (cooling rate) in the algorithm. The probability ofacceptance of a weak neighbor state is given by the Equation 4.1 where ∆S is thedifference in resilience scores of the current state and neighbor state (line 7-16 ofAlgorithm 2).P = e−∆S/T (4.1)The temperature T is decremented slowly over a period of time (line 18 ofAlgorithm 2). This cooling down process is determined by the cooling rate α . Onevery iteration, T is updated based on the Equation 4.2. We later discuss about howthe values of T and α are chosen in 6.T = T ∗α (4.2)As mentioned is Chapter 2 initially when T is high the probability of accepting aweak neighbor state is also high. However as T reduces over iterations, the valueof ∆S/T increases, and hence the probability of accepting a weak neighbor statedecreases. Thus initially the probability of accepting a weak state is high andgradually decreases over time in the algorithm. If the neighbor state is accepted,then it will be the current state for the next iteration of the algorithm (line 15-16 ofAlgorithm 2).4. Termination: The above steps 2 and 4 are repeated until a new neighbor stategenerated satisfies the terminating condition. As mentioned earlier, the terminationcondition in our case is the resilience score of the unoptimized program (line 20 ofAlgorithm 2).4.4 Measuring ResilienceBoth our approaches require us to estimate the error resilience of a candidate opti-mization sequence in each generation of the algorithm. We attempt to find such atool based upon program level metrics. If strong correlations exist between a met-ric ’X’ and error resilience, we can then estimate the resilience of a candidate basedon measuring the metric ’X’, rather than doing time consuming fault-injection ex-26Algorithm 2: SA-based approach to find an optimization sequence that does notdegrade error resilienceδ ← [Individual optimizations]T ← Initial Temperatureα ← Rate of coolingP← Acceptance Probabilitystarget ← Resilience of unoptimizedInput : Source code, Set of individual optimizationsOutput: Optimization sequence that does not degrade resilience1: procedure OPTIMIZATION SEQUENCE FOR RESILIENCE2: µcurrent = Random(δ )3: scurrent = ResilienceScore(µcurrent)4: while scurrent <= starget do5: µneighbor =ChooseNeighbor(µcurrent)6: sneighbor = ResilienceScore(µneighbor)7: if sneighbor < scurrent then8: ∆S = scurrent − sneighbor9: P = e−∆S/T10: if Random(0,1)< P then11: µcurrent = µneighbor12: scurrent = sneighbor13: end if14: else15: µcurrent = µneighbor16: scurrent = sneighbor17: end if18: T = T ∗α19: end while20: return µcurrent21: end procedureperiments. However, as we will see, determining such factors is very difficult, andwe could not find evidence of such correlations for SDCs based on the factors iden-tified in prior work [25, 34] for other kinds of failures. This is why we decided touse fault injections.We consider two kinds of program metrics to measure the correlation withSDC rates of programs. The first corresponds to instruction counts of the program27that were proposed by Thomas et al. [34]. The second corresponds to data-flowmetrics that were suggested by Pattabiraman et al. [25]. Although neither paperadvocates the use of these metrics for SDCs, we wanted to measure how goodthese metrics were for predicting SDCs in an application. We performed the faultinjection experiments using the experimental setup and benchmarks described inChapter 5.Thomas et al [34] have found a correlation between the dynamic instructioncount of a program and its resilience for soft-computing applications, i.e., thoseapplications that have relaxed correctness requirements. We wanted to see if thisapplies to general-purpose applications too.Figure 4.4A plots total dynamic instruction count versus the SDC rate forBlackscholes, one of the programs in the PARSEC benchmark suite [2] for 10different optimizations. It can be seen from the figure that total dynamic instruc-tion count has poor correlation with SDC rate. Hence, dynamic instruction countis not a reliable indicator of the SDC rates of an application.1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2Dynaminc Instruction Count(107 )0.500.510.520.530.540.550.560.570.580.59SDC rate0.1 0.2 0.3 0.4 0.5 0.6Predicted SDC rate0.10.20.30.40.50.6Actual SDC rateFigure 4.4: (a) Correlation between Total Dynamic instruction count vs SDCrate (for Blackscholes), and (b)Data-flow Regression Model’s Estima-tion vs Actual SDC ratePattabiraman et al. [25] have found that data-flow metrics such as fan-outs canbe used to predict applications’ vulnerability to failures. Based on this intuition,we investigated the following data flow analysis metrics (i) static backward slicing(ii) fan-in and fan-out of variables (iii) loop dependency. Using these factors, wetried to fit a linear regression model that estimates the SDC rate. Figure 4.4B plots28the estimated SDC rate by the regression model against the actual SDC rate for5 benchmark programs with 10 different compiler optimizations. As can be seenfrom the figure, the estimated SDC rate by the model has very poor correlation withthe actual SDC rate (a similar result was obtained by Hari et al [12]). Therefore, itis nontrivial to determine the error resilience of the program using these factors, andhence we use a fault injection experiment for fitness function and score calculationin our approaches.4.5 SummaryIn this chapter, we described our problem statement and its complexity, motivatingthe need for automated techniques to solve this problem. We then described ourGA-based and SA-based approaches, explaining the step by step process involvedin both algorithms. We followed up explaining our GA-based approach with arepresentative example to have a better understanding on its evolution, obtaining anoptimization sequence that does not degrades resilience. Since both our approachesrequires us to measure the error resilience of an optimized program, we attemptedto find a model/tool measuring the resilience of a program based on its programlevel metrics. Since we were unsuccessful in finding such a model/tool, we useda fault injection tool for this purpose. In the following chapter, we present ourexperimental setup for evaluating our approaches.29Chapter 5Experimental SetupIn this chapter, we first present the implementation details of our techniques andthen describe the benchmarks used for our evaluation. We later explain how wetune the GA and SA parameters, and evaluate the resilience, vulnerability and per-formance overhead of the program compiled with our candidate solutions.5.1 ImplementationWe implemented our approaches using the LLVM compiler [17] , the LLFI faultinjection tool [37], and the Java language. LLVM is a popular optimizing com-piler that includes a host of standard optimizations, and is used in a wide varietyof real-world platforms (including the Mac platform). LLVM allows us to specifyindividual optimizations to be run on the application, as well as the standard op-timization levels. For seeding our techniques, we pick a subset of optimizationsconsisting of data-flow, loop, global and a few other optimizations available in theLLVM compiler [17]. This subset comprises around 10 different optimizations (weexplain why we choose 10 in Chapter 6).For the fitness function and score evaluation in our approaches, we use LLFI,a fault injection tool that operates at the LLVM Intermediate Representation (IR)code level [17] , to inject hardware faults into the program’s code. We use LLFIas it has been found to be accurate for measuring the SDC rate of an applicationrelative to assembly-level fault injection [37]. LLFI first takes the IR code as the30input and determines the target instructions/operands for fault injection. It theninstruments the target instructions/operands with appropriate calls to the fault in-jection functions. These fault injection functions are the ones that injects specificfaults (in our case single bit flips) to the specific instruction operand. At each runthe compiled program is executed, and LLFI randomly chooses a single dynamicinstance of the instrumented instructions to call its fault injection function whichexecutes fault injected instruction. We then compare the outputs of the fault injec-tion experiments with the fault free outcome to measure the error resilience of theprogram. Since hardware faults are random occurrences, LLFI randomly selects adynamic instruction at runtime for fault injection.5.2 BenchmarksWe evaluate our techniques on twelve programs, five from the PARSEC [2] andseven from the Parboil [33] benchmark suites. The benchmarks represent a widevariety of tasks ranging from video processing to high-performance computing,and are all written in C/C++. They range in size from a few hundred to a fewthousand lines of code. The benchmarks chosen and their characteristics are shownin Table 5.1.5.3 Tuning of the GA parametersWe first evaluate the performance of the GA approach in order to tune its param-eters to obtain faster convergence. One way to measure performance of the algo-rithm is by using wall clock execution time. However, the execution time for theGA is dominated by the time it takes to perform the fault injections in each itera-tion of the GA to evaluate the fitness of each candidate. Therefore, the number ofgenerations taken by the algorithm is a more meaningful measure of performance,as the greater the number of generations, the more the number of candidate se-quences generated, and hence the more the number of total fault injections thatmust be performed to evaluate the candidates.We consider the effects of the following parameters in order to tune the GA.These parameters are explained in Section 4.2.(1) MutateRate: We vary this value based on what the literature on GA recom-31Table 5.1: Benchmark programs that are used in our experimentsProgram BenchmarksuiteDescriptionBlackscholes PARSEC Computes the price of options using blacksc-holes partial differential equation.Swaptions PARSEC Computes the price of portfolio of swaptionsby employing Monte Carlo(MC) Simulations.x264 PARSEC An H.264/AVC video encoder, that achieveshigher output quality with lower bit rate.Fluidanimate PARSEC Simulates an incompressible fluid for interac-tive animation purposes.Canneal PARSEC Minimizes the routing cost of a chip designusing a cache-aware simulated annealing.Bfs Parboil Implements a breadth first search algorithmthat computes the path cost from a node toevery other reachable node.Histo Parboil Computes a 2-D saturating histogram with amaximum bin count of 255.Stencil Parboil An iterative Jacobi solver of heat equation us-ing 3-D grids.Spmv Parboil Implements a Sparse-Matrix Dense-VectorProductCutcp Parboil Computes short-range electrostatic potentialsinduced by point charges in a 3D volumeSad Parboil Computes sum of absolute differences forpairs of blocks which is based on the full-pixel motion estimation algorithmSgemm Parboil Performs a register-tiled matrix-matrix multi-plicationmends [13], from low to high values.(2) Population size: We vary this value from 10 to 40 as we have a total of 50optimizations in LLVM.(3) Tournament selection strategy: We consider two strategies, random selec-tion and score-based selection.(4) Optimization Types: We vary the optimization types (data-flow, loop, globaland others) considered in the population.32Once we tune the GA parameters, we use these values to derive the candidatesolutions used in the resilience and performance evaluation experiments describednext.5.4 Tuning of the SA parametersSimilar to the GA-based approach we also tune the SA parameters to evaluatethe performance of the SA-based approach for faster convergence. We know thatinitially the value of T should be high, so that the probability of accepting weaksolutions is high. Choosing the initial value of T depends on the possible values ofP (probability of acceptance). We observed that the least possible value of ∆S is0.1. Hence the possible maximum value of T for which P starts from 0.99 is T=10.Similar to tuning GA parameters, we tune the SA parameter cooling rate byvarying its values with 0.5, 0.7, 0.9 and 0.99. We choose cooling rate values from0.5 as T would decrease rapidly when α is low(T = T ∗α).5.5 Resilience EvaluationWe first compile each of the programs using LLVM with the -O0 option (no opti-mizations) for generating the unoptimized program. We then measure its resilienceby performing fault injection experiments using the fault injection tool LLFI (as ex-plained in Chapter 5.1). We consider this as the baseline for our experiments. Weperform a total of 1000 fault injection experiments per benchmark program (onefault per run), in order to get tight confidence bounds on the SDC rate. The errorbars range from 0.85% to 2.501% depending on the benchmark for the 95% confi-dence interval. In each run, our fault injector injects a single bit flip into the resultof a single dynamic instruction chosen at random from the set of all instructionsexecuted by the program. This is in line with our fault model in Section 2.4.We compare the results of our techniques with standard optimization levelsO1, O2 and O3 as no other prior work has proposed an algorithm for selectingoptimizations for resilience. We repeat the above process for each of the optimiza-tion levels O1, O2, and O3, for each benchmark program, and obtain their errorresilience. We then run our GA-based and SA-based approaches to identify a can-didate solutions (i.e., optimization sequence) for each benchmark program, and33then repeat the same experiment for these candidate solutions. We compare eachof the resilience values to the baseline resilience of the unoptimized version, andmeasure the increase or decrease in error resilience with respect to the baseline.Also in order to be accurate, we consider the error bars into account on every itera-tion of the algorithm while comparing the resilience of the optimization sequenceswith the target resilience.There are a total of 72000 injections performed in our experiments (12 bench-marks, 1000 injections, 6 executables namely, O1, O2, O3, unoptimized and GAand SA candidate solutions). Note that these fault injection experiments do notinclude the overall fault injections performed by the GA-based and SA-based al-gorithms to evaluate a candidate solution, which are many more in number.5.6 Performance EvaluationWe then evaluate the performance improvement obtained by each of the optimiza-tion levels and the candidate solutions obtained by our approaches. We measurethe execution time of the executable compiled with the appropriate set of optimiza-tions i.e., O1, O2, O3 and the GA and SA candidate solutions on our hardwareplatform. The platform we use is a Intel E5 Xeon machine with 32G memory run-ning Ubuntu Linux 12.04. We measure the execution time by taking an averageof 10 trials for each configuration to obtain tight error bars ranging from 0.28% to3.38% for the 95% confidence interval.5.7 SummaryIn this chapter, we discuss the implementation details of our techniques and alsoexplain the working of LLFI (fault injection tool), that was used in our techniquesfor measuring the error resilience of a given program. We present the twelve bench-marks used in our evaluation and also discuss about tuning the various algorithmparameters to improve the performance of both approaches. We then discuss ourevaluation of the candidate solutions obtained from our approaches with respect toperformance and resilience, compared with the standard optimization levels (O1,O2 and O3).34Chapter 6ResultsWe first present the results of how we tune the parameters of our GA-based and SA-based algorithms for faster convergence. We then present the results for evaluatingthe error resilience of the candidate solutions (i.e., optimization sequences) foundby our approaches, and that of the standard optimization levels. Finally, we presentthe results of the performance improvement and vulnerability reduction of each ofthese optimization sequences over the unoptimized version.6.1 Effect of GA ParametersWe consider the effect of four parameters on the GA-based approach’s convergencerate. In these experiments, we only present the results for two benchmarks, bfs andblackscholes due to space constraints, but the results we obtained were similar forthe other programs.6.1.1 Mutation RateWe first considered the effect of varying the mutation rate of the GA-based algo-rithm, keeping the other parameters fixed. As explained in Chapter 2, this parame-ter represents the trade-off between search space exploration (leading to potentiallybetter solutions) and convergence (leading to faster solutions). A larger mutationrate is associated with better exploration but slower convergence.Figure 6.1 shows the effect of varying the mutation rate on the convergence35blackscholes bfsBenchmark Programs050100150200250300350Number of GenerationsMutation rate values0.010.10.30.5Figure 6.1: Number of generations taken to generate the candidate solutionwith different mutation rate values.rate. We performed these experiments with four different mutation rate values:0.01, 0.1, 0.3, 0.5. We observe that the algorithm that obtains the candidate solutionin the least number of generations is for mutation rate = 0.3. Therefore, we choosea mutation rate of 0.3 for our technique.6.1.2 Selection StrategyAs mentioned in Section 4.2, there are two possible selection strategies in eachiteration of our GA-based algorithm. One strategy is to randomly choose any twocandidates in the population to move forward to the next generation (random). An-other strategy is to choose the two best candidates, i.e., the candidates with thehighest fitness scores in each generation (score-based). We compared the numberof generations taken by each strategy to attain convergence across the benchmarkprograms (all other values are kept the same). The results of this comparison isshown in Figure 6.2. It can be observed that for the blackscholes benchmark, thescore based selection method takes many more generations than the random se-lection to obtain the candidate solution. The difference is much lesser for the bfsbenchmark. The poor performance of score based selection is because it movesfaster, but gets stuck at local maxima, trying to select the best candidates in every36generation, following which it takes a long time to attain convergence. On the otherhand, the random selection method moves towards the candidate solution slower,but does not get stuck in the local maxima, making it converge faster. We thereforeuse the random selection strategy in our approach.blackscholes bfsBenchmark Programs050100150200250300350400Number of Generations Random-selectionScore-based selectionFigure 6.2: Number of generations taken to generate the candidate solutionby the random selection and score based selection strategies.6.1.3 Population Sizerepresents the number of individual optimizations present in the initial populationconsidered by our GA-based approach. To evaluate the effect of the populationsize, we examined the number of generations that the algorithm takes to convergefor different population sizes ranging from 10 to 40.Figure 6.3 shows the results of this experiment. The figure shows that thenumber of generations taken to attain convergence increases with the increasingpopulation size. Hence, a smaller population size would arrive at an optimal so-lution faster. On the other hand, increasing the population size may lead us to abetter solution. We however find that even by restricting the population size to just10 optimizations, we are able to achieve satisfactory performance without degrad-ing the error resilience (Section 6.4). Based on our results, we choose a populationsize of 10 for our GA-based approach.370 10 20 30 40Population size050100150200250300Number of generationsblackscholesbfsFigure 6.3: Number of generations taken to generate the candidate solutionwith different population sizes.6.1.4 Optimization TypesCompiler optimizations are classified into different types based on the transforma-tion they perform on the program. As described in Chapter 3, they are classifiedinto data-flow optimizations, loop optimizations, global optimization and others.For the initial population in our approach, we pick a subset that contains a combi-nation of optimizations from the available classes. We wanted to investigate if wecould achieve faster convergence by using only a specific class of optimizations asthe population. For this experiment, we restrict the types of optimizations to eachof the above categories, and compare the number of generations taken to obtainthe candidate optimization sequence. We also compare it to the convergence rateobtained when all the categories are combined together, called “combination ofall’.The results are shown in Figure 6.4. From the figure, it is evident that no singleclass of optimization outperforms the rest for both benchmarks. This suggests thatthere is no one universal set of optimizations that can accelerate the convergence.Hence, we chose a population that consists of a combination of all the optimizationtypes in our experiments, i.e., we do not restrict ourselves to a specific optimizationtype.38blackscholes bfsBenchmark Programs02004006008001000Number of GenerationsOptimization TypesData-flow optimizationsLoop optimizationsglobal optimizationsOther optimizationsCombination of allFigure 6.4: Number of generations taken to generate the candidate solutionwith different optimization types.6.2 Effect of SA parameters6.2.1 Rate of CoolingRate of cooling is the rate at which the temperature is reduced over iterations inSA. The goal of the algorithm is to obtain an optimal solution by accepting morenumber of bad states initially, and this can be achieved by reducing the temperatureslowly over iterations. Since T = T ∗α and α lies between 0-1, α should be highfor reducing T slowly. However to choose a suitable value of α we evaluate thetime of convergence of the algorithm for different α values.Figure 6.5 shows the effect of varying al pha on the convergence rate. Weexperimented with four different values: 0.5, 0.7, 0.9 and 0.99. We choose valuesfrom 0.5 because any value below 0.5 would decrease the temperature rapidly.We observe that our SA-based approach obtains the candidate solution in the leastnumber of iterations is for al pha = 0.99. Therefore, we choose alpha as 0.99 forour SA-based approach.390.5 0.7 0.9 0.99Cooling Rate050100150200250300350400Number of itertionsblackscholesbfsFigure 6.5: Number of iterations taken to generate the candidate solution bythe SA-based approach with different cooling rate.6.3 Resilience EvaluationFigure 6.6 shows the aggregate fault injection results across benchmarks for theunoptimized, original versions. The percentages of SDCs across all benchmarksis 19.0%. Crashes constitute 36.8% and benign injections constitute 44.2%. Weobserved that hangs are negligible in our experiments. Note that we include onlythe activated faults in the above results, or those faults that are actually read by thesystem and affect the program’s data, as this is line with the definition of resilience(see Section 2.1).We compare the resilience of the program compiled with the optimization se-quence (i.e., candidate solution), obtained from our GA-based and SA-based ap-proaches, with the resilience of the unoptimized program, and that of the compileroptimization levels O1, O2 and O3. Figure 6.7 shows the resilience (in %) of theunoptimized, candidate solutions (from GA and SA) and the different optimizationlevels.As can be seen in the figure, the optimization levels O1, O2 and O3 have de-graded the resilience of the application compared to the unoptimized version, forall the benchmarks. On the other hand, the candidate solutions (i.e., optimiza-tion sequence) generated by our techniques (GA and SA approach) provides a re-40Crash36.8%SDC 19.0%Benign44.2%Figure 6.6: Aggregate percentage of SDC, crash and benign results acrossbenchmarks for the unoptimized version.silience better than or on par with the unoptimized version of the program. It canbe observed that for most benchmarks (except x264, cutcp and blackscholes) thecandidate solution obtained from the GA-based approach provides better resiliencethan the SA-based approach. The reason is as follows: an optimization sequenceevolved in an iteration in the GA-based approach cumulatively depends on the cur-rent population that contains a set of best candidates (optimization sequences withbetter resilience) evolved until the previous iteration. Whereas in the SA-basedapproach, an optimization sequence evolved in an iteration depends only on oneoptimization sequence that was evolved in the previous iteration. That said, theresilience achieved by the candidate solutions obtained from the two techniques donot differ significantly.The arithmetic mean of the error resilience across benchmarks, of the unopti-mized version and the GA-candidate solution, SA-candidate solution, O1, O2 andO3 levels are respectively 76.14 (±1.54), 79.125 (±1.57), 78.36 (±1.55), 72.77(±1.6), 73.15 (±1.6) and 73.38 (±1.54).1 Further we show the SDC rates of theunoptimized, candidate solutions (GA and SA) and the optimization levels in Fig-ure 6.8. The arithmetic mean of the SDC rate across benchmarks show that ourcandidate solutions lower the SDC rate significantly compared to the unoptimizedversion, O1, O2 and O3.1We use the arithmetic means (AMs) for computing the averages as we are comparing the absolutevalues.41fluidanimate bfscannealhistoswaptionsx264stencilspmvcutcp sadblackscholessgemmAverageBenchmark Programs405060708090100Resilience (in %)UnoptimizedGA-Candidate solutionSA-Candidate solutionO1O2O3Figure 6.7: Resilience of the Unoptimized, candidate solutions, O1, O2 andO3 levels. (Higher values are better).fluidanimate bfscannealhistoswaptionsx264stencilspmvcutcp sadblackscholessgemmAverageBenchmark Programs0102030405060SDC rate (in %)UnoptimizedGA-Candidate solutionSA-Candidate solutionO1O2O3Figure 6.8: SDC rate of the unoptimized code, candidate solutions, O1, O2and O3 levels. (Lower values are better)6.4 Performance EvaluationAs in the resilience experiment, we compare the performance of the unoptimizedversion of the program with the candidate solutions, O1, O2 and O3. Figure 6.9shows the execution time (in sec) of the unoptimized code, candidate solutions42found by our techniques (GA and SA) and the optimization levels O1, O2 andO3. The figure shows that the candidate solution from our techniques providesbetter performance than the unoptimized version (as expected). Also we observed,our GA-candidate solutions’ average of execution time across benchmarks is bet-ter than those of the optimization levels O1 and O2, and slightly worse than O3(by 0.39%). Similarly, the SA-candidate solutions’ average of execution time is0.61%, 2.27% and 2.72% worse than O1, O2 and O3 respectively. This showsthat resilience friendly optimizations obtain performance improvements that arecomparable to the standard optimization levels.In fact, in some cases, the performance of the candidate solutions found byour approaches is better than the performance provided by the optimization levels.For example, in the case of the blackscholes program, our optimization sequences(both GA and SA) provides a better performance than the optimization levels O1and O2. However, this is not the case for other benchmarks such as fluidanimateand canneal, where the candidate solution’s performance is much worse than theoptimization levels. Analyzing further, we observed that the “union” data typein the fluidanimate program has been optimized by -scalarrepl optimization, andcaused this major performance improvement. However, -scalarrepl was not in-cluded in our initial population. Similar behavior was observed in canneal. It canalso be observed that in some programs such as swaptions, cutcp, sad and sgemmthe candidate solution obtained from SA-based approach do not provide signif-icant performance improvement as compared to that provided by the candidatesolution from GA-based approach (error bars do not overlap). In the remainingprograms considered the candidate solutions obtained from SA-based approachand GA-based approach provide performance improvement with overlapping errorbars. Hence it is difficult to decide on one approach to be better than the other.Note that performance improvement is not an explicit goal of our techniques,though it is implicit in the fact that we choose to use standard compiler optimiza-tions that are predominantly focused on performance improvement.43fluidanimatecannealhistox264spmvblackscholessgemmBenchmark Programs00.010.020.030.040.050.06Execution time (in sec)UnoptimizedGA-Candidate solutionSA-Candidate solutionO1O2O3bfsswaptionsstencilcutcpsadBenchmark Programs00.20.40.60.80.111.21.41.61.8Execution time (in sec)UnoptimizedGA-Candidate solutionSA-Candidate solutionO1O2O3Figure 6.9: Runtime of the unoptimized code, candidate solutions, O1, O2and O3 levels. (Lower values are better).6.5 Vulnerability EvaluationCompiler optimizations often reduce run time of the program, thereby reducing itstime of exposure to transient faults. Hence it is important to evaluate the vulnera-bility of the program in addition to its resilience (Chapter 2). Since vulnerability iscalculated as the product of execution time and SDC rate, it is dependent on boththese factors.Similar to the above evaluations, we compare the vulnerability of different ver-sions of the program. Figure 6.10 shows the comparison of vulnerability of thecandidate solutions (GA and SA) with optimization levels O1, O2 and O3. Er-44ror bars for vulnerability (product of SDC rate and execution time) is calculated byadding the relative errors which is the standard way for uncertainty calculation for aproduct [27] The figure shows that in most of the benchmark programs (except flu-idanimate, and the O3 level in both sad and blackscholes), the candidate solutionsobtained from our techniques reduces the overall vulnerability of the program. Onthe other hand, the optimization levels O1 and O2 increase the overall vulnerabilityof the program, while O3 reduces it but not to the level of our candidate solutions.On an average the vulnerability of programs compiled with standard optimizationlevels O1,O2 and O3 are 9.53 (±0.25) , 9.22 (±0.24) and 9.11 (±0.24). In com-parison, the candidate solutions found by our GA-based and SA-based approacheslower the vulnerability to 8.12 (±0.21) and 8.51 (±0.22) than the unoptimized ver-sion (9.25 (±0.25). Since the candidate solutions obtained from our GA-based ap-proach provides resilience and performance improvement better than the SA-basedapproach, the GA-based approach provides better vulnerability reduction as well.However, in the case of x264 since SA-based approach provides better resilienceand performance improvement, it considerably lowers the vulnerability.In the case of fluidanimate, the standard optimizations reduce vulnerabilitymuch more than our candidate solutions. This is because the optimizations reducethe program’s execution time by 50% or more, which far outweighs the increasedSDC rate due to the optimizations. We have explained the reason for this massivereduction in execution time in Section 6.4.In the case of blackscholes and sad, both candidate solutions do worse than theoptimization level O3 in terms of vulnerability reduction. Again, the optimizationlevel O3 reduces the execution time by nearly 40% in these programs, and thisoutweighs the increase in the SDC rate, resulting in lower vulnerability.6.6 SummaryIn this chapter, we first evaluated the performance of our techniques by varying thevalues of the mentioned GA and SA parameters. Based on the results, we chose thesuitable values for the rest of our experiments. We then evaluated the performance,error resilience and vulnerability of the candidate solutions obtained from our tech-niques and that of the standard optimization levels (O1, O2 and O3). We observed45fluidanimatecannealhistox264spmvblackscholesBenchmark Programs00.10.20.30.40.50.6Vulnerability (SDC rate* Runtime) UnoptimizedGA-Candidate solutionSA-Candidate solutionO1O2O3bfsswaptionsstencilcutcp sadsgemmBenchmark Programs010203040506070Vulnerability (SDC rate*Runtime)UnoptimizedGA-Candidate solutionSA-Candidate solutionO1O2O3Figure 6.10: Vulnerability of the unoptimized code, candidate solutions, O1,O2 and O3 levels. (Lower values are better).that the resilience of the candidate optimization sequences found by our techniques(GA and SA) is much better than those of the standard optimization levels. Alsothe performance of the optimized code with our techniques is on par with or onlyslightly lower than the performance of the code with the standard optimization lev-els (GA based - better than O1, O2 and slightly worse than O3 (by 0.39%); SAbased - 0.6%, 2.27% and 2.72% worse than O1, O2 and O3 respectively). Finallyon an average, our techniques considerably lower the overall vulnerability of theapplication (GA- 8.12 (±0.21) and SA- 8.51 (±0.22) on average), while the stan-46dard optimization levels O1 increase the overall vulnerability of the application(O1- 9.53 (±0.25) on average) and O2 and O3 reduce it slightly (O2-9.22 (±0.24)and O3-9.11 (±0.24)). In the following chapter, we perform further analysis basedon our results.47Chapter 7DiscussionIn this chapter, we examine the implications of our results. We first compare andanalyze the candidate solutions obtained from our GA-based and SA-based ap-proaches. Based on our analysis, we recommend that GA-based approach is moresuitable for our problem statement, and hence we perform further analysis on theresults obtained from the GA-based approach to understand its sensitiveness. Wefinally reflect on the limitations of our approaches.7.1 GA-based Vs SA-basedWe compare the candidate solutions obtained from our two approaches and analyzewhether the resilience improvement is caused by the common individual optimiza-tions they share. We do this analysis to group those individual optimizations asthe resilience-friendly optimizations and these optimizations can be directly con-sidered for finding the optimization sequence.Given that we have two candidate solutions obtained from different approachesfor a program, we compare them to examine whether there exists a fixed set of in-dividual optimizations that provides better performance and resilience. For thispurpose, we analyzed the common individual optimizations in the candidate solu-tions obtained from the GA-based and SA-based approaches for each program. Weevaluate the impact of these individual optimizations on the resilience of the pro-gram, and observed that the resilience improvement of the candidate solutions is48not because of the individual optimizations, but is the effect of the optimization se-quence as whole. For example, we extracted the common individual optimizationsloop-reduce and gvn from the candidate solutions obtained by the GA-based andSA-based approaches for the blackscholes program. Figure 7.1, shows that loop-reduce and gvn do not provide such high resilience improvement as rendered bythe candidate solutions. This shows that the resilience improvement provided bythe candidate solutions is caused by the entire optimization sequence. The trans-formation performed by loop-reduce and the explanation for how it improves theerror resilience has already been explained in Chapter 3. In the case of gvn, theresilience improvement lies within the error bar, and hence we cannot concludeanything about its resilience improvement.loop-reduce gvn GA-based SA-basedOptimizations4850525456586062Resilience (in %)Figure 7.1: Resilience of blackscholes with loop-reduce, gvn and candidatesolutions obtained from GA-based and SA-based approaches with theunoptimized code’s resilience as baseline. Black line represents the re-silience of the unoptimized code.7.2 Sensitivity AnalysisWe compare the resilience improvement rendered by the candidate solutions ob-tained from our GA-based and SA-based approaches. Based on our results we49observed that it is difficult to decide one among them as the better approach forfinding resilience friendly compiler optimizations. This is because we notice thatfor some benchmarks GA-based approach performs better, while for others SA-based approach performs better. However the candidate solutions obtained fromour GA-based approach provide performance improvement that is 5.5% better thanSA-based. Also, it depends on the optimization sequences obtained from the twoapproaches. For example the candidate solution for blackscholes from GA-basedapproach included inline, while that from SA-based approach did not. Analyzingthe optimized code we observed that the inline optimization has a major contribu-tion to the performance improvement in the candidate solution obtained from theGA-based approach. Since the candidate solution from SA-based approach did notcontain inline or any other similar optimization that improves performance compa-rably, it does not provide as good performance improvement as GA-based, O1, O2or O3 optimization levels. Hence our further analysis in this chapter is based onthe results of the GA-based approach.7.2.1 Order of the optimizationsWe evaluate the sensitivity of error resilience with the order of individual opti-mizations in the optimization sequences (candidate solutions) obtained from ourtechniques. We measure the resilience of the program optimized with all possiblepermutations of optimizations in the sequence. If there are ‘n’ individual optimiza-tions in the sequence, then we have n! sequences to be validated. For example,the candidate solution obtained from GA-based approach for the blackscholes pro-gram includes 4 individual optimizations [loop-reduce, loop-unroll, gvn, inline],and hence we have 24 sequences. Figure 7.2 shows the resilience of the candidatesolution (CS) obtained from the GA-based approach and the different sequences(we denote different sequences as C1,C2,...C23). In some cases like CS and C4,the order of the optimizations in a sequence is sensitive to resilience (beyond errorbars). However, in others since the error bars of the resilience measurements over-lap, we cannot draw such a definitive conclusion that the resilience of the programis sensitive to every change in the order of the optimizations.50CS C1 C2 C3 C4 C5 C6 C7 C8 C9 C10C11C12C13C14C15C16C17C18C19C20C21C22C23Optimization Sequences505152535455565758596061Normalized Resilience (in %)Figure 7.2: Resilience of blackscholes with the candidate solution-GA based(CS) and the different combinations of the sequence (C1,C2,....C23)with the unoptimized code’s resilience as baseline.7.2.2 Resilience vs Vulnerability measuring fitness functionWe evaluate the candidate solutions obtained by the GA-based approach, varyingits fitness function for resilience and vulnerability measurement. The goal of thealgorithm with the vulnerability measuring fitness function is to find an optimiza-tion sequence that does not increase the vulnerability of the program. Figure 7.3compares the vulnerability of the candidate solutions (GA-based) with resilienceand vulnerability measuring fitness functions, O1, O2 and O3. It shows that thevulnerability improvement rendered by the candidate solutions obtained with vul-nerability measuring fitness function is comparable to those of candidate solutionsobtained with resilience measuring fitness function. However, in most cases thevulnerability reduction of the candidate solutions obtained with the vulnerabilitymeasuring fitness function was due to its performance improvement and not dueto resilience improvement. On the other hand, our GA-based algorithm with re-silience measuring fitness function improves resilience and performance reason-ably,thus improving vulnerability. This shows that the algorithm with vulnerabilitymeasuring fitness function mainly optimizes to generate optimization sequence thatimprove performance which is another way of improving vulnerability. Also, if the51algorithm with resilience measuring fitness function took x iterations to obtain thecandidate solution the vulnerability measuring fitness function took about 1.2x to10x iterations depending on the application. This is because the algorithm with vul-nerability measuring fitness function tries to optimize for both, time and resilience.Hence the resilience measuring fitness function is more suitable than vulnerabilitymeasuring fitness function.fluidanimatecannealhistoswaptionsx264spmvblackscholesBenchmark Programs00.10.20.30.40.50.6Vulerability (SDC rate*Runtime)GA-Resilience TargetGA-Vulnerability Targetbfsstencilcutcp sadsgemmBenchmark Programs010203040506070Vulerability (SDC rate*Runtime) GA-Resilience TargetGA-Vulnerability TargetFigure 7.3: Vulnerability of the candidate solutions with resilience and vul-nerability target. Lower values are better.527.2.3 Evolution of the GA-based approachWe now examine how our GA-based approach achieves its objectives. We firstcompare our algorithm with a random walk search heuristic. In the random walk,we begin with the same individual optimizations that constituted the initial popu-lation in our experiments. In every iteration, a random optimization sequence isgenerated from these individual optimizations and its resilience is evaluated. Thisprocess is repeated until an optimization sequence that does not degrade the pro-gram’s resilience is obtained. The main difference between the random walk andour approach is that there is no fitness function in the random walk. We observedthat the random walk never converged to a solution even after a long time. Forexample, in the case of the Blackscholes program, the random walk did not obtaina solution even after 4 days (230 iterations), whereas our GA-based approach ob-tained a solution in 4 hours time (24 iterations). Similar results were obtained forother programs.0 10 20 30 40 50 60 70 80 90 100%Generation count0102030405060708090100Normalized Mean fitness score of population blackscholesbfscannealhistox264fluidanimateswaptionsstencilcutcpspmvsadsgemmFigure 7.4: Mean population fitness scores during the GA evolution processfor each program.Thus, the main feature of the GA-based approach that allows it to find the targetoptimization sequence is its evolution, i.e., with every generation, new candidateswith better fitness score are evolved. Figure 7.4 shows the mean fitness score of53the population during the lifetime of GA as it progresses from one generation toanother for each of the benchmarks. To ensure uniformity across benchmarks, wenormalize the number of generations to the total number of generations taken bythe GA to evolve the candidate solution for the benchmark. We do the same for theerror resilience by normalizing it to the final resilience of the candidate solution.As can be seen from the figure, the fitness score of the population (i.e., resilience)gradually improves in each generation, until finally the target resilience is reached.7.2.4 Resilience-Enhancing Compiler OptimizationsIn this paper, we restricted ourselves to finding compiler optimizations that pre-serve the error resilience of the unoptimized version. However, as we found earlier,compiler optimizations can often enhance the resilience of a program. So we askwhat happens if we remove the restriction of simply preserving the error resilienceof the program, and attempt to improve the resilience instead.To explore this notion, we modified our GA-based algorithm to an UnboundedGA-based algorithm. The Unbounded GA-based approach is the same as our GA-based approach, except that the terminating condition is that the average fitnessscore of the candidates in the population remains constant for numerous genera-tions.The results are shown in Figure 7.5. As can be seen, the candidate solutions ob-tained from the Unbounded GA-based approach generates optimization sequenceswith resilience either comparable to or better than the GA-based approach. How-ever, the difference in the geometric means of the two approaches is only 2%,which is within the error bars of the measurement. This shows that the solutionsobtained by our GA-based approach are comparable to the solutions obtained bythe unbounded GA-based approach in terms of error resilience. However, the Un-bounded GA-based approach took about 2x to 20x the number of iterations as theGA-based approach. Thus, our GA-based approach achieves similar results as theUnbounded GA-based approach, but takes much less time.Figure 7.6 shows the mean fitness score of the population during the lifetime ofthe Unbounded GA-based approach as it progresses from one generation to anotherfor each of the benchmarks. We normalize the number of generations and the54fluidanimatebfscannealhistoswaptionsx264stencilspmvcutcpsadblackscholessgemmAverageBenchmark Programs405060708090100Resilience (in %)GA-Candidate solutionUnbounded GA-CandidatesolutionFigure 7.5: Resilience of the candidate solutions obtained from GA-basedand Unbounded GA-based approaches.fitness score to the total number of generations and the average fitness score of thelast generation’s population for the benchmark. As can be seen from the figure,the average fitness score of the population improves over generations, and remainsconstant after a particular generation. This is because either the algorithm hasgotten stuck in the local maxima or it has attained the maximum resilience that canbe obtained from the given set of individual optimizations in the initial population.7.3 Limitations of our ApproachesWhile our approaches lead to faster convergence than the random walk, they havethree limitations. First, as with all optimization techniques, the GA-based approachcan get caught in local maxima and result in sub-optimal solutions. This is partiallymitigated by the choice of the parameters such as the mutation rate, but ultimately,there is no guarantee that the approach will converge to the optimal solution. How-ever our SA-based approach would probably prevent from getting stuck at localmaxima. A second limitation is that our approaches can take a long time to con-verge to a solution. Although the algorithms itself are very fast, the fault injectionto evaluate the resilience of the candidates in each generation takes a long time. Wecan however parallelize the injections as there are no dependencies among individ-550 10 20 30 40 50 60 70 80 90 100%Generation count0102030405060708090100Normalized Mean fitness score of populationblackscholesbfscannealhistox264fluidanimateswaptionsstencilcutcpspmvsadsgemmFigure 7.6: Mean population fitness scores during the Unbounded GA evolu-tion process for each program.ual injection runs, to obtain faster running times, at the cost of higher computationalresources. A third limitation of our approaches is that our results were obtained onour x86-based hardware platform, and may not apply to other platforms. To miti-gate this issue, we chose only the optimizations that are machine-independent, andare hence portable across platforms. Finally, we have considered only SDC’s in ourdefinition of resilience as they are the most critical outcomes, however crashes andhangs may also be important in some cases. These can be considered by suitablymodifying the fitness function and score calculation.7.4 SummaryIn this chapter, we performed various analysis on the results. We first analyzedthe results of our GA-based and SA-based approaches, to chose the most suitableapproach. Based on this comparison we recommended the GA-based approachover SA-based approach and further examined why our approach performs betterin this space. We finally discussed the limitations of both the approaches.56Chapter 8Related Work8.1 Effect of Compiler Optimizations on ResilienceDemertzi et al. [8] have analyzed the effect of standard optimization levels on theapplication’s vulnerability. Unlike us, they do not consider the final outcome ofthe application due to the error, and whether the error results in an SDC. Further,our technique uses fault injections to evaluate the applications’ resilience, whilethey use ACE analysis [21], which has been shown to be less accurate than faultinjection [36].Rehman et. al. [26] devise new optimizations that attempt to decrease the over-all vulnerability of the program. The main difference between our work and theirsis that we consider the standard compiler optimizations rather than devise new op-timizations for resilience. Further, our technique is able to increase the resilienceof the application, without compromising on its performance.Sangchoolie et. al. [28] consider the effect of compiler optimization level onSDCs for different applications. The main difference between their work and oursis that we consider individual optimizations such as loop invariant code motion,rather than the aggregate optimization level such as O1. This is important fortwo reasons. First, the optimization levels such as O1 and O2 are developed forperformance and/or memory reasons, and do not consider error resilience. As aresult, when choosing a standard optimization level, one may be including opti-mizations that hurt error resilience, leading to sub-optimal choices. Secondly, and57more importantly, the resilience effect of optimization levels may vary for differ-ent hardware platforms and applications. Therefore, it is important to develop anautomated technique for choosing the appropriate set of optimizations for a givenhardware platform and application.Thomas et al [34] study the impact of optimizations on error resilience forsoft computing applications. They measure the impact of optimization on Egre-gious Data Corruptions (EDCs), which are significant deviations from fault-freeoutcomes. Based on this study, they recommend a set of optimization that are“safe” against EDCs, i.e., do not significantly increase the EDC rate. There aretwo main differences between their work and ours. First, EDCs are only a subsetof SDCs, and do not apply to general purpose applications where any deviationin the output from the fault-free outcome, no matter how small, is unacceptable.Second, they do not have an automated method to choose the set of optimizationsfor a given application and platform, relying instead on simple heuristics such asdynamic code size. Such heuristics may not work well for all platforms and ap-plications. Cong et al [5] proposed a metric for measuring the loop reliability toanalyze the impact of loop transformations on software reliability. Similar to theprevious work they mainly focus on soft computing applications and EDC out-comes. Also they restrict their analysis for loop optimizations, while we considerall compiler optimizations in general.8.2 Choosing Compiler OptimizationsThere has been a substantial amount of work to automatically choose the best setof compiler optimizations for performance improvement or memory reduction, fora given platform and application [4, 23]. Researchers have proposed the use of var-ious search heuristics algorithms for this purpose (e.g., simulated annealing [38]).The most popular techniques in this space has been genetic algorithms (GAs) andSimulated Annealing (SA), which had been used to find compiler optimization se-quences that optimize for performance [16, 32, 38], energy consumption [30], andcode size [6]. None of them consider error resilience, which is our focus.588.3 Software Errors and Genetic AlgorithmsThere has been considerable work on the use of genetic algorithms to repair soft-ware errors in programs [15, 18]. Given a test suite with one or more failing testcases, the main idea is to repair the program by (1) localizing the line of codewhich is responsible for the failure, and (2) replacing the faulty line with anotherline from the same program in an adaptive and iterative fashion using GA, till allthe test cases in the suite pass. There are two main differences between this workand ours. First, we target hardware faults, while they target software bugs. Sec-ondly, we aim to make the program resilient without knowing ahead of time whereit will fail. Thus, our approach does not require a failing test case or test suite toguide the adaptation.59Chapter 9Conclusion and Future WorkCompiler optimizations perform code transformations for improving the perfor-mance of an application. They mostly target to improve a specific aspect of theprogram like time, memory or code size. However, these code transformationswould mostly affect the error resilience of the program.In this work, we first performed a fault injection study to analyze the effect ofindividual optimizations on program’s resilience. Based on the study we observedthat some optimizations degrade program’s resilience and others improve it. Wealso observed that this effect of optimizations on program’s resilience depends onthe application’s characteristics as well. Hence we proposed automated techniquesfor finding resilience friendly compiler optimizations for a given application.We leveraged search heuristic algorithms, Genetic Algorithms (GA) and Simu-lated Annealing (SA) to find an optimization sequence that improves the program’sperformance without degrading its error resilience. We evaluated our techniqueson twelve benchmark programs, and found that it finds optimization sequences thatprovides resilience on par with or even better than the standard optimization lev-els (O1, O2 and O3), while achieving comparable performance improvement asthe optimization levels. Further, it also lowers the vulnerability of applications,compared to the standard optimization levels.For future work, we plan to look at aspects other than performance, such ascode size, energy efficiency, and trade them off for error resilience. It would beinteresting to explore different strategies to avoid getting stuck at local maxima60in our techniques. This might lead to obtaining better solutions. For example,modifying the selection strategy at the tournament selection step in the GA-basedapproach; adaptively changing the mutation and crossover rate over the generationin the algorithm; exploring other suitable operations for mutation, crossover andneighbor state; finding a hybrid algorithm based on our two approaches.Further, resilience friendly compiler optimizations with similar impact on ap-plications with similar behavior can be analyzed. The sequences obtained for theseapplications can be grouped together and fed as the population/input in our algo-rithm for obtaining better optimization sequences.Based on the resilience-friendly sequences obtained from our approach, it wouldbe interesting to find a model that can predict the effect of a give optimization onprogram’s resilience. For this, the impact of the optimizations on application’scode structure has to be studied.61Bibliography[1] J. E. Baker. Adaptive selection methods for genetic algorithms. InProceedings of an International Conference on Genetic Algorithms and theirapplications, 1985. → pages 20[2] C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC benchmark suite:Characterization and architectural implications. In PACT, 2008. → pages 3,23, 28, 31[3] S. Borkar. Designing reliable systems from unreliable components: Thechallenges of transistor variability and degradation. MICRO, Nov. 2005.ISSN 0272-1732. doi:10.1109/MM.2005.110. URLhttp://dx.doi.org/10.1109/MM.2005.110. → pages 1[4] J. Cavazos, G. Fursin, F. Agakov, E. Bonilla, M. O’Boyle, and O. Temam.Rapidly selecting good compiler optimizations using performance counters.In CGO, March 2007. doi:10.1109/CGO.2007.32. → pages 58[5] J. Cong and C. H. Yu. Impact of loop transformations on software reliability.In ICCAD, 2015. → pages 58[6] K. D. Cooper, P. J. Schielke, and D. Subramanian. Optimizing for reducedcode space using genetic algorithms. In LCTES, 1999.doi:10.1145/314403.314414. URLhttp://doi.acm.org/10.1145/314403.314414. → pages 2, 58[7] K. A. De Jong and W. M. Spears. An analysis of the interacting roles ofpopulation size and crossover in genetic algorithms. In PPSN. 1991. →pages 20[8] M. Demertzi, M. Annavaram, and M. Hall. Analyzing the effects ofcompiler optimizations on application reliability. In IISWC, 2011. → pages1, 2, 11, 5762[9] G. E. Dieter and D. Bacon. Mechanical metallurgy, volume 3. McGraw-HillNew York, 1986. → pages 7[10] S. Feng, S. Gupta, A. Ansari, and S. Mahlke. Shoestring: probabilistic softerror reliability on the cheap. In SIGARCH, 2010. → pages 1[11] J. Grefenstette. Optimization of control parameters for genetic algorithms.SMC, Jan 1986. ISSN 0018-9472. doi:10.1109/TSMC.1986.289288. →pages 6[12] S. K. S. Hari, R. Venkatagiri, S. V. Adve, and H. Naeimi. GangES: Gangerror simulation for hardware resiliency evaluation. In ISCA, 2014. → pages29[13] R. L. Haupt. Optimum population size and mutation rate for a simple realgenetic algorithm that optimizes array factors. In APSURSI, 2000. → pages32[14] J. H. Holland. Adaptation in Natural and Artificial Systems: An IntroductoryAnalysis with Applications to Biology, Control and Artificial Intelligence.1992. ISBN 0262082136. → pages 6[15] D. Kim, J. Nam, J. Song, and S. Kim. Automatic patch generation learnedfrom human-written patches. In Proceedings of the 2013 InternationalConference on Software Engineering, ICSE ’13, pages 802–811, Piscataway,NJ, USA, 2013. IEEE Press. ISBN 978-1-4673-3076-3. URLhttp://dl.acm.org/citation.cfm?id=2486788.2486893. → pages 59[16] S. R. Ladd. Analysis of compiler optimizations via an evolutionaryalgorithm. https://packages.debian.org/sid/devel/acovea. → pages 2, 58[17] C. Lattner and V. Adve. LLVM: A compilation framework for lifelongprogram analysis & transformation. In CGO, 2004. → pages 3, 13, 23, 30[18] C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer. Genprog: A genericmethod for automatic software repair. IEEE Trans. Softw. Eng., 38(1):54–72,Jan. 2012. ISSN 0098-5589. doi:10.1109/TSE.2011.104. URLhttp://dx.doi.org/10.1109/TSE.2011.104. → pages 59[19] M.-L. Li, P. Ramachandran, S. K. Sahoo, S. V. Adve, V. S. Adve, andY. Zhou. Swat: An error resilient system. SELSE, 2008. → pages 1[20] A. Lim, B. Rodrigues, and X. Zhang. A simulated annealing andhill-climbing algorithm for the traveling tournament problem. EuropeanJournal of Operational Research, 174(3):1459–1478, 2006. → pages 863[21] S. S. Mukherjee, C. Weaver, J. Emer, S. K. Reinhardt, and T. Austin. Asystematic methodology to compute the architectural vulnerability factorsfor a high-performance microprocessor. In MICRO, 2003. → pages 6, 57[22] K. Nara, A. Shiose, M. Kitagawa, and T. Ishihara. Implementation ofgenetic algorithm for distribution systems loss minimum re-configuration.IEEE Transactions on Power Systems, Aug 1992. ISSN 0885-8950.doi:10.1109/59.207317. → pages 20[23] Z. Pan and R. Eigenmann. Fast and effective orchestration of compileroptimizations for automatic performance tuning. In CGO, March 2006.doi:10.1109/CGO.2006.38. → pages 58[24] B. J. Park, H. R. Choi, and H. S. Kim. A hybrid genetic algorithm for the jobshop scheduling problems. Computers & industrial engg., 2003. → pages 20[25] K. Pattabiraman, Z. Kalbarczyk, and R. Iyer. Application-based metrics forstrategic placement of detectors. In PRDC, Dec 2005.doi:10.1109/PRDC.2005.19. → pages 27, 28[26] S. Rehman, M. Shafique, F. Kriebel, and J. Henkel. Reliable software forunreliable hardware: Embedded code generation aiming at reliability. InCODES+ISSS, Oct 2011. → pages 57[27] M. Richmond. Examples of uncertainty calculations.http://spiff.rit.edu/classes/phys273/uncert/uncert.html. → pages 45[28] B. Sangchoolie, F. Ayatolahi, R. Johansson, and J. Karlsson. A study of theimpact of bit-flip errors on programs compiled with different optimizationlevels. In EDCC, 2014. → pages 1, 2, 57[29] J. D. Schaffer, R. A. Caruana, L. J. Eshelman, and R. Das. A study ofcontrol parameters affecting online performance of genetic algorithms forfunction optimization. In GEM, 1989. ISBN 1-55860-006-3. URLhttp://dl.acm.org/citation.cfm?id=93126.93145. → pages 6[30] E. Schulte, J. Dorn, S. Harding, S. Forrest, and W. Weimer. Post-compilersoftware optimization for reducing energy. SIGARCH, Feb. 2014.doi:10.1145/2654822.2541980. URLhttp://doi.acm.org/10.1145/2654822.2541980. → pages 58[31] M. Srinivas and L. Patnaik. Genetic algorithms: a survey. Computer, 1994.ISSN 0018-9162. doi:10.1109/2.294849. → pages 2064[32] M. Stephenson, U.-M. O’Reilly, M. C. Martin, and S. Amarasinghe. Geneticprogramming applied to compiler heuristic optimization. In EuroGP, 2003.ISBN 3-540-00971-X. URLhttp://dl.acm.org/citation.cfm?id=1762668.1762691. → pages 58[33] J. A. Stratton, C. Rodrigues, I.-J. Sung, N. Obeid, L.-W. Chang, N. Anssari,G. D. Liu, and W.-m. W. Hwu. Parboil: A revised benchmark suite forscientific and commercial throughput computing. CRHC, 2012. → pages 3,31[34] A. Thomas, J. Clapauch, and K. Pattabiraman. Effect of compileroptimizations on the error resilience of soft computing applications. In AER,2013. → pages 1, 2, 11, 27, 28, 58[35] P. van Laarhoven and E. Aarts. Simulated annealing. In SimulatedAnnealing: Theory and Applications, volume 37 of Mathematics and ItsApplications, pages 7–15. Springer Netherlands, 1987. ISBN978-90-481-8438-5. doi:10.1007/978-94-015-7744-1 2. URLhttp://dx.doi.org/10.1007/978-94-015-7744-1 2. → pages 7[36] N. J. Wang, A. Mahesri, and S. J. Patel. Examining ACE analysis reliabilityestimates using fault-injection. In SIGARCH, 2007. → pages 57[37] J. Wei, A. Thomas, G. Li, and K. Pattabiraman. Quantifying the accuracy ofhigh-level fault injection techniques for hardware faults. In DSN, 2014. →pages 30[38] S. Zhong, Y. Shen, and F. Hao. Tuning compiler optimization options viasimulated annealing. In FITME, Dec 2009. doi:10.1109/FITME.2009.81. →pages 2, 5865

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0166741/manifest

Comment

Related Items