UBC Theses and Dissertations

Precisely quantifying software information flow Enescu, Mihai Adrian 2016

Precisely Quantifying Software Information Flow

by

Mihai Adrian Enescu

B.Sc., The University of British Columbia, 2012

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The Faculty of Graduate and Postdoctoral Studies (Computer Science)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

March 2016

© Mihai Adrian Enescu 2016

Abstract

A common attack point in a program is the input exposed to the user. The adversary crafts a malicious input that alters some internal state of the program, in order to acquire sensitive data, or gain control of the program's execution. One can say that the input exerts a degree of influence over specific program outputs. Although a low degree of influence does not guarantee the program's resistance to attacks, previous work has argued that a greater degree of influence tends to provide an adversary with an easier avenue of attack, indicating a potential security vulnerability.

Quantitative information flow is a framework that has been used to detect a class of security flaws in programs, by measuring an attacker's influence. Programs may be considered as communication channels between program inputs and outputs, and information-theoretic definitions of information leakage may be used in order to measure the degree of influence which a program's inputs can have over its outputs, if the inputs are allowed to vary. Unfortunately, the precise information flow measured by this definition is difficult to compute, and prior work has sacrificed precision, scalability, and/or automation.

In this thesis, I show how to compute this information flow (specifically, channel capacity) in a highly precise and automatic manner, and scale to much larger programs than previously possible. I present a tool, nsqflow, that is built on recent advances in symbolic execution and SAT solving. I use this tool to discover two previously-unknown buffer overflows. 
Experimentally, I demonstrate that this approach can scale to over 10K lines of real C code, including code that is typically difficult for program analysis tools to analyze, such as code using pointers.

Preface

The work I describe in this thesis is mostly my own, with the exception of the subSAT solver, which was designed by Sam Bayless.

I have based much of the text on a paper due to appear and to be presented at the European Symposium on Security and Privacy 2016, titled "Precisely Measuring Quantitative Information Flow: 10K Lines of Code and Beyond", which was written in collaboration with Celina Val, Sam Bayless, Alan Hu, and William Aiello.

I have based some of the writing on previous (but unpublished) submissions to the USENIX Security Symposium 2015, and the IEEE Symposium on Security and Privacy 2015, with the same co-authors.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgments
Dedication
1 Introduction
2 Background
  2.1 Quantitative Information Flow
  2.2 Kite: Conflict-Driven Symbolic Execution
3 nsqflow
  3.1 Architecture of nsqflow
    3.1.1 Memory Handling
    3.1.2 Library Calls
    3.1.3 Relaxing Precision
  3.2 Subset Model Counter
  3.3 Limitations
4 Evaluation
  4.1 Testing Correctness
  4.2 Comparison: TEMU
  4.3 Comparison: sqifc
  4.4 Larger Experiments
  4.5 CoreUtils
  4.6 Performance Discussion
5 Related Work
  5.1 Network-Flow-Based QIF
  5.2 Symbolic-Execution-Based QIF
6 Conclusion
  6.1 Contributions
  6.2 Future Work
Bibliography

List of Tables

4.1 Information flow and running time for programs given in [25]
4.2 Running time for programs given in [25]
4.3 Running time for crc8 from [26]
4.4 Grade protocol information flow results
4.5 Grade protocol running time results
4.6 Dining cryptographers protocol results
4.7 Interdecile range for comparison programs
4.8 Larger experiment results
4.9 Information flow for smallest CoreUtils
4.10 Information flow for randomly-selected CoreUtils
List of Figures

1.1 Stylized input sanitizer
2.1 Execution tree for code in Listing 2.1
3.1 The architecture of nsqflow
4.1 Universal hashing tests

Acknowledgments

First, I wish to express my gratitude to my advisors, Professors Alan J. Hu and William Aiello, for their unyielding guidance and support, and seemingly infinite patience throughout my time in grad school.

Moreover, I'm grateful to the members of the Integrated Systems Design laboratory, who always provided me with great ideas and feedback and listened to me drone on about my work on many occasions. In particular, I extend a special thanks to Celina G. Val and Sam Bayless, for their generosity in donating countless hours and ideas, and their respective projects Kite and ModSAT, without which nsqflow, and this thesis, would not have been possible.

I would also like to express my thanks to Professor Mark Greenstreet, whose encouragement and advice played an integral role in my decision to attend graduate school, and who has offered excellent feedback on this work on multiple occasions.

Dedication

I dedicate this work to Emilia, Gabriel, and Cornel Enescu, without whose unconditional love and support I would not have completed this thesis.

Chapter 1
Introduction

In this thesis, I address the problem of measuring the degree of influence of a program's inputs over its outputs. I first motivate this problem by providing examples of its economic cost.
I then introduce previous related problems and outline their existing solutions, providing examples that demonstrate their shortcomings, and thereby the importance of the problem addressed by this thesis.

In a vulnerable software system, a malicious adversary can construct a user input that triggers a bug in an executing program that will either grant the adversary control over program execution, or allow the adversary to extract sensitive information.

SQL injection attacks are a well-known example where the adversary exploits errors in the code to illicitly obtain sensitive information [11]. In 2014, the NTT Group estimated that an organization can incur costs of $196,000 for a single SQL injection attack [14]. Such attacks are commonplace, and a list of well-known attacks since 2002 is maintained by Hicken [13], who lists attacks such as the 2011 theft of 7 million private customer details from the Sony PlayStation Network, and the 2012 theft of account details for 1.6 million records belonging to U.S. government agencies. Furthermore, Hicken also lists many smaller attacks which obtained credentials used to gain control of web servers and other systems.

However, the crafting of malicious input is not limited to SQL injection. Indeed, the economic and social impact of malicious input attacks far exceeds the numbers I have just presented.

Such a malicious input attack can be described as follows. By varying the public program input, an adversary exerts a degree of influence on a specific program output. One of the oldest methods of quantifying influence is taint analysis, which has long been used[1] to determine whether untrusted input data can inadvertently influence the value of trusted or sensitive data. Values in a program that are directly computed on the basis of the untrusted

[1] The earliest published use of the term that I have found is the original Perl book [40]. Schwartz et al.
[33] provide a fairly recent survey of the large literature on taint analysis and its applications.

input are considered "tainted". Subsequently, a taint policy determines how taint is propagated dynamically through runtime operations. As a very coarse, conservative abstraction, taint analysis is efficient but suffers from excessive false positives, leading to an overestimate of the influence of untrusted inputs on sensitive data. Taint analysis also typically addresses the problem of confidentiality (detecting sensitive data leaks), rather than program integrity (measuring the degree of an adversary's control).

function InputSanitizer1(input)
    if input is in legal range then
        return input
    else
        return ERROR

function InputSanitizer2(input)
    return input

Figure 1.1: This simple example contrasts a stylized input sanitizer with a function that simply passes the input unchecked. Taint analysis does not distinguish between the two.

For a simple example, consider the two functions in Figure 1.1. The first is a stylized input sanitizer. The second skips all checks. Evidently, the former is correct and the latter is not, but taint analysis would label both functions' return values equally tainted. Furthermore, as typically implemented, taint analysis can produce false negatives (underestimates of leaked information) as well. These result from a desire to improve efficiency (e.g., a dynamic analysis that considers only a small number of program paths) or to reduce false positives (e.g., by ignoring implicit information flow when tainted information affects control flow). Newsome et al. [25] give several real-life examples (drawn from common Unix utilities, the Windows keyboard driver, and how GCC compiles switch statements) of coding patterns that cause typical taint analyses to produce both false positives and false negatives.

A quantitative measure of information flow is an elegant solution to the shortcomings of taint analysis, and considerable research has pursued this direction.
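The coarseness of taint tracking on Figure 1.1 can be sketched in a few lines (a hypothetical Tainted wrapper and an invented legal range of 0..9, not from the thesis): under a typical propagation policy, both sanitizers' return values come back equally tainted, even though the first restricts the input's influence.

```python
class Tainted:
    """Minimal dynamic-taint wrapper: any value derived from a tainted
    value is itself tainted (a typical conservative propagation policy)."""
    def __init__(self, value, tainted=True):
        self.value = value
        self.tainted = tainted

ERROR = -1  # sentinel standing in for Figure 1.1's ERROR

def input_sanitizer_1(inp):
    # Stylized sanitizer: rejects out-of-range input (legal range 0..9,
    # chosen arbitrarily for this sketch)...
    if 0 <= inp.value <= 9:
        return Tainted(inp.value, inp.tainted)
    # ...yet even the ERROR result is control-dependent on the input,
    # so the propagation policy marks it tainted too.
    return Tainted(ERROR, inp.tainted)

def input_sanitizer_2(inp):
    # No checks at all: the input flows straight through.
    return Tainted(inp.value, inp.tainted)

user_input = Tainted(42)
r1 = input_sanitizer_1(user_input)
r2 = input_sanitizer_2(user_input)
print(r1.tainted, r2.tainted)  # True True: taint cannot tell them apart
```

The policy records only whether the output depends on the input, not how much of the output the input controls, which is exactly the distinction a quantitative measure supplies.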
Returning to Fig. 1.1, two reasonable questions to ask would be whether or not InputSanitizer2 gives the input full control of the return value, and whether or not InputSanitizer1 limits the influence to log2(|legal range| + 1) bits of information.

In this thesis, I am primarily concerned with program integrity. At a high level, the goal of program integrity is to ensure that inputs from untrusted sources cannot cause a system to perform unsafe actions, according to some specification of safety. This work, in particular, was inspired by Newsome et al. [25], whose goal was to statically compute a measure for program integrity. They define the influence that a (potentially untrusted) program input can have over a specified program variable in two steps. First, they consider the number of distinct values of the specified variable that the program can be forced to compute by ranging over all possible input values. They then define the bits of influence of the input over the specified variable as the log2 of this number. For example, if the specified variable is 32 bits and is sensitive, a large number of bits of influence should flag a potential problem.

As stated, this definition of influence given by Newsome et al. [25] is equivalent to the channel capacity. The number of bits of influence is equal to the maximum number of bits of uncertainty of the output value, over all possible input distributions. Even without the connection to information theory, influence is a natural worst-case notion in a security context for capturing, in a single value, the amount of control an adversary may have over a specified program variable.
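As a concrete illustration of this two-step definition, one can brute-force the influence of a toy 8-bit sanitizer in the spirit of InputSanitizer1 (the function and its legal range of 16 values below are invented for illustration, not taken from the thesis): enumerate all inputs, collect the distinct outputs, and take the base-2 logarithm of the count.

```python
import math

LEGAL = range(16)  # invented legal range of 16 values

def toy_sanitizer(x):
    # Toy 8-bit program in the spirit of InputSanitizer1: in-range
    # inputs pass through; everything else maps to 255 ("ERROR").
    return x if x in LEGAL else 255

# Step 1: the number of distinct output values the input can force,
# ranging over all possible (8-bit) input values.
outputs = {toy_sanitizer(x) for x in range(256)}

# Step 2: bits of influence = log2 of that number (the channel capacity
# of the program viewed as a channel from its input to its output).
influence_bits = math.log2(len(outputs))
print(len(outputs), round(influence_bits, 2))  # 17 4.09
```

With 16 legal values plus one error value, there are 17 distinct outputs, i.e. log2(17) ≈ 4.09 bits of influence over an 8-bit output, matching the log2(|legal range| + 1) estimate discussed above.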
In Chapter 6, I consider a generalization of the worst-case role that auxiliary inputs (i.e., inputs not under the adversary's control) can have on the adversary's influence on a specified program variable.

In this thesis, I address the same influence computation given by Newsome et al.: given a deterministic program, an exit point where information flow is measured, and a specified set of output variables, compute the cardinality of the set of all possible values of the output variables after all executions from the program's inputs to its outputs at the exit point. I treat all inputs as non-deterministic, i.e., under the control of the attacker. My goal is to compute this cardinality precisely.

Prior work has sacrificed scalability, precision, and/or automation. Most publications that attempt precise (or close approximations of) channel capacity give results only for small examples of around a dozen lines of code or less (e.g., [2, 22, 25, 27]). On the other hand, Newsome, McCamant, and Song [25] also report impressive scalability in analyzing known vulnerabilities on real systems code (RPC DCOM, SQL Server, ATPhttpd), but these results are based on analyzing only a single execution trace (thereby ignoring many possible information flows), and on these examples, their computation yielded lower bounds of fractionally above 6 bits and upper bounds of 32 bits or slightly less for a 32-bit output. The lower bounds were sufficient to suggest the presence of an attack, but these are clearly not precise information flow bounds. My work was directly inspired by theirs — my goal is also to handle real system code, but with much greater precision on the channel capacity measurement. Similarly, McCamant and Ernst [21] presented an analysis that scaled to over 500KLoC, but based on rough, conservative tracking of the bit-widths (not information content) of data flows and some manual program annotations.
Most similar to my work is a recent paper by Phan and Malacaria [26], who compute precise quantitative information flow and report scaling on three examples to over 200 lines of code. Like them, I also use symbolic execution and modern SAT/SMT techniques. As I will show in Section 4.3, nsqflow is far more scalable.

Contributions:

- I demonstrate for the first time that highly precise quantitative information flow computation can scale to real, medium-sized programs. Specifically, I present nsqflow, an automated static analysis tool for C source code that measures the channel capacity between a program's inputs and user-selected outputs. nsqflow is both precise (subject to the limitations below), and able to scale to 20K lines of real C code. In my experiments, nsqflow flagged two previously-undisclosed buffer overflows in real, open-source programs.

- To facilitate the analysis of real C source, I make extensions to Kite, a state-of-the-art symbolic execution tool, in order to efficiently analyze code with pointers.

Chapter 2
Background

2.1 Quantitative Information Flow

Quantitative information flow (QIF) is a framework that has been used in detecting a class of security flaws in programs. In QIF, programs are considered communication channels between inputs and outputs, and information-theoretic definitions of information leakage may be used in order to measure the degree of influence which a program's inputs can have over its outputs, if the inputs are allowed to vary.

Not surprisingly, this is a difficult problem. Intuitively, there is the challenge of analyzing an explosion of possible program paths to the specified exit points, as well as the #P-complete problem of enumerating satisfying assignments that generate possible output values. Formally, Černý, Chatterjee, and Henzinger [39] show that merely bounding the cardinality is PSPACE-complete.
Furthermore, this complexity is in terms of the explicit state space of a finite-state program (the set of all possible valuations of all possible program states), not just in terms of program size. Even in the simple case of loop-free Boolean programs, Yasuoka and Terauchi [42] show that the problem is coNP-complete.

Sabelfeld and Myers [31] provide an extensive survey of earlier work on information flow, and Smith [35] provides a full treatment of the theoretical foundations of quantitative information flow in programs.

Moreover, there has been work in practically measuring QIF. In [2], the authors show how to compute a precise measure of information flow for small programs for a variety of information-theoretic measures, by counting the number of equivalence classes in an equivalence relation over program variables. However, the tool they present scales only to small programs on the order of a dozen lines.

Meng and Smith [22] present a way to compute Rényi's min-entropy by analyzing relations between pairs of bits in a program's outputs, but only present results for small programs on the order of 10-20 lines of code (including many of the smaller examples from [25]), and show a rapid exponential blowup as they scale up the number of program paths (15 hours for 232 program paths).
A more complete comparison of nsqflow with [25] is presented in Section 5.2.

Phan and Malacaria [26] compute precise QIF based on a symbolic execution engine that counts the number of solutions while performing symbolic execution, in order to arrive at a precise measurement. They scale to programs on the order of 200 lines of code. A more complete comparison of nsqflow with [26] is presented in Section 4.3.

2.2 Kite: Conflict-Driven Symbolic Execution

I implement nsqflow as an extension to Kite, a tool that implements a state-of-the-art symbolic execution algorithm known as conflict-driven symbolic execution [37]. The reader is assumed to be familiar with symbolic execution and SAT solvers based on conflict-driven clause learning (CDCL). Please refer to [3, 12, 15] for a fuller treatment of symbolic execution, and [1, 9, 24] for background on CDCL SAT solvers.

A symbolic execution engine operates on symbolic values (sets of concrete values), executing instructions sequentially in order to prove or disprove a property about the program (typically assertion statements). Every time there is a branch point in the program, the symbolic execution must consider (but not necessarily explore) both program paths. Symbolic execution will simulate a fork in the program execution. If a branch is infeasible, the symbolic execution does not need to explore this path, and will continue executing only along feasible paths. Eventually, all paths in the program will be covered, as long as the path length is finite.

Conflict-driven symbolic execution (CDSE) [37] is a symbolic execution algorithm that is able to learn from infeasible paths. CDSE initially encodes the program's control flow graph (CFG) as a CNF formula (in a separate formula from the path constraints). It then augments this formula with further constraints each time a path is proven to be infeasible by the path
constraint solver, to exclude the infeasible path, along with potentially many others, from the solution space of the CFG formula.

1  int main() {
2      symbolic int x, y;
3      if (x < 0)
4          x = -x;
5      int result = x - y;
6      if (x < y)
7          result = y - x;
8      assert(result >= 0);
9  }

Listing 2.1: An example (adapted from Listing 1.1 of [37]) to illustrate that not all paths are needed to prove a property. The property to be proven is an assertion statement embedded in the code; in this example, the assertion always holds.

For a demonstration of path learning in Kite, consider the small program given in Listing 2.1. A corresponding execution tree, encoding each program statement and branch as a node, is shown in Figure 2.1. The transitions following each node are annotated with the full path condition that Kite has learned following the program statement. Although there are 4 distinct execution paths that reach the target assertion, Kite need not explore each distinct path, unlike traditional symbolic execution engines. For the case when Kite evaluates the (x < y) branch to True, the conjuncts (x < y) and (result = y − x) in the path condition are sufficient to prove the assertion. These conjuncts correspond to lines 6 and 7 in the code. After Kite has explored the first (leftmost) path, it need not evaluate the assertion again in the third path from the left, as it has learned a clause establishing that, whenever these conjuncts are present, the assertion cannot fail. Similar reasoning can be applied to the second and fourth paths, and Kite only needs to fully explore 2 of the 4 program paths in order to prove the assertion.
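As a quick sanity check on Listing 2.1 (not how Kite proceeds; Kite proves the property symbolically rather than by enumeration), a direct port of the listing can be tested exhaustively over a small input range:

```python
def listing_2_1(x, y):
    # Direct port of Listing 2.1's computation (lines 3-7).
    if x < 0:
        x = -x
    result = x - y
    if x < y:
        result = y - x
    return result

# After line 4, x is non-negative; the second branch then makes result
# the non-negative difference |x - y| (with x already non-negative),
# so the assertion on line 8 holds on every path.
assert all(listing_2_1(x, y) >= 0
           for x in range(-50, 51) for y in range(-50, 51))
```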
Specifically, Kite will exclude paths that reach the assertion in Line 8 with !(result >= 0) in the path condition. Any future paths that have this clause in the path condition will be refuted. As a result, Kite need not explore all the paths through the program.

Each time that a path constraint is proven to be unsatisfiable, it follows that at least one different branch in the path must be taken in order to reach the output of the program, and in principle, it would be sufficient to refine the CFG formula with this clause. However, on unsatisfiable instances,

Figure 2.1: The full execution tree corresponding to the code in Listing 2.1. The path condition following each program statement is labeled on the transition following each statement. Each path condition is expressed in terms of the initial values of the variables appearing in the path condition (the initial symbolic values given in line 2).
I have shown path conditions as linear arithmetic constraints for ease of reading, but it should be noted that such constraints would need an SMT solver — not just a SAT solver.

modern SAT and SMT solvers can derive a conflict clause, which in this case will describe a sufficient subset of the branches in the infeasible path such that at least one of the branches in that subset must differ in any feasible path to the output — and in many cases, this subset may be much smaller than the full set of branches in the path.

Analogously to clause learning in a CDCL SAT solver, after each infeasible path, Kite adds these conflict clauses to the CFG formula, potentially blocking many previously unexplored (but provably infeasible) paths, without giving up completeness. Each satisfying solution to this CFG formula describes a potentially feasible path in the program; decisions on which path to explore next are made by repeatedly finding satisfying solutions to this formula.

By maintaining this CFG formula and using it to direct the sequence of path explorations, CDSE takes advantage of two features responsible for the recent success of CDCL SAT solvers: conflict analysis and non-chronological backtracking. This allows Kite to perform complete symbolic execution while mitigating the path explosion problem.

Chapter 3
nsqflow

I have developed a tool, nsqflow, to measure QIF (specifically, channel capacity) through programs. In this chapter, I describe the architecture of my tool. Figure 3.1 presents a high-level overview of my approach, showing the various components of the toolchain.

As part of this process, the user must select a variable, along with a location in the code, over which they wish to measure influence.
Currently, I achieve this with a simple annotation, but in principle this could be applied to unaltered code, for example using a GUI to select the variable in question.

This program is symbolically executed using the symbolic execution engine Kite [37] to produce a representation of the program as a Boolean satisfiability (SAT) instance in conjunctive normal form (CNF). During this step, Kite forms a path condition for each path that it explores, and employs a constraint solver to both simplify and solve each path condition. I describe this procedure more closely in Section 3.1.

In addition to solving the path conditions, nsqflow stores the CNF representations of each path condition encountered during symbolic execution. Once all paths have been explored, the CNFs for each path condition are combined and handed to a model counter, which counts the number of distinct[2] satisfying assignments to just the output variables in the resulting CNF. Finally, the reported measurement is the base-2 logarithm of the number of such satisfying assignments.

[2] By distinct, I mean that any set of solutions that share the exact same values on the output variables are counted only once.

In the following sections, I describe the various components of nsqflow.

3.1 Architecture of nsqflow

I have implemented nsqflow using the symbolic execution engine Kite, which implements the CDSE algorithm [37] and was developed primarily to verify embedded assertions in sequential programs. Consequently, nsqflow inherits many components from Kite. Kite verifies programs described in the LLVM

[Figure 3.1 shows the toolchain: C source with output variable annotation (input to nsqflow) → Kite (symbolic execution engine) → CNF → SharpSubSAT (subset model counter) → influence in bits (QIF, output from nsqflow).]

Figure 3.1: The architecture of nsqflow. Note that Kite outputs a CNF formula encoding the feasible paths of the source program. This CNF is fed as input to SharpSubSAT to compute the subset model count.
The influence is the base-2 logarithm of this count.

assembly language [18], known as LLVM-IR, which can be obtained by compiling C files using the LLVM C front-end. nsqflow has inherited LLVM as its underlying compiler suite, and internally uses LLVM-IR during its computation of QIF. nsqflow has also inherited Kite's internal constraint (SMT) solver and SAT solver (STP [3] and MiniSat 2 [9], respectively).

Kite is able to generate CNFs representing the set of all feasible paths leading to a desired target property, and nsqflow leverages this feature in generating a final CNF to be model-counted. Unfortunately, Kite does not implement pointer analysis, treating symbolic pointers conservatively. This leads to unacceptable running times for pointer-heavy C code; I discuss how I address this issue in Section 3.1.1 by adding pointer analysis to Kite.

To achieve a precise measure of information flow, nsqflow symbolically executes all the relevant (feasible) paths to the output variables of the (finite) program being analyzed, and then forms the disjunction of those paths into a single CNF for model counting (as described below). However, naively attempting to symbolically execute all program paths scales very poorly in practice, as the number of paths to execute may be exponential in the number of branches — and many of those paths might be irrelevant to the information flow currently being analyzed.

In order to support model counting, I implement as part of nsqflow two further changes to Kite. First, nsqflow adds a blocking clause to the CFG formula each time a feasible path to the output variable is found, to prevent that complete path from being explored again (this forces Kite to continue symbolically executing all feasible paths to the output, rather than just halting after the first path is found).
Second, nsqflow stores the formulas for each feasible path constraint, and forms their disjunction as one monolithic output CNF to be passed to the subset model counter (described below). Only the feasible path constraints must be included in this disjunction; infeasible path conditions need not be stored, reducing the size and difficulty of the resulting CNF.

During symbolic execution, the accumulated path constraints from root to leaf in the program's execution tree (for feasible paths) represent the branch decisions leading from specific concrete inputs to the output value. A disjunction of the constraints over all feasible paths represents a full formula for a finite program. While the number of paths is exponentially large, Kite is able to prune large parts of the search space, making an incrementally-updated disjunction possible.

The basic implementation details of forming the incremental path disjunction are as follows. Initially, nsqflow requires a user-provided flow declaration as a way to mark the output variables at a specific point in the program. This is implemented as a special function call in the program: DECLARE_OUTPUT(varname). Conceptually, this function call is translated into a pair of assertions in the code, assert(varname != 0) and assert(varname == 0), which comprise the target property for the symbolic execution engine to check. To support an incremental update to the CNF, nsqflow keeps track of program-variable-to-STP and STP-to-SAT variable mappings, performing CNF variable renaming to avoid any name collisions.
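The projected counting step that consumes this combined CNF can be sketched with a naive enumerator (pure Python, invented for illustration; nsqflow delegates this to a CDCL-based subset model counter on real CNFs): repeatedly find a satisfying assignment, record only the output variables' values, and add a blocking clause forbidding that projection, so that models agreeing on the outputs are counted once.

```python
from itertools import product

def find_model(n_vars, clauses):
    """Naive SAT search (stand-in for a CDCL solver): returns a dict
    mapping 1-based variables to booleans, or None if unsatisfiable."""
    for bits in product([False, True], repeat=n_vars):
        assign = {v + 1: bits[v] for v in range(n_vars)}
        if all(any((lit > 0) == assign[abs(lit)] for lit in c) for c in clauses):
            return assign
    return None

def projected_count(n_vars, clauses, output_vars):
    """Count distinct assignments to output_vars over all models of a
    CNF (clauses are sets of literals; +v means "v true", -v "v false").
    Each model found adds a blocking clause forbidding its projection,
    so models sharing the same output values are counted only once."""
    clauses = [set(c) for c in clauses]
    count = 0
    while (model := find_model(n_vars, clauses)) is not None:
        count += 1
        # Block this projection: at least one output variable must differ.
        clauses.append({-v if model[v] else v for v in output_vars})
    return count

# CNF (x1 or x2) with output variable x1: models project onto x1=True
# and x1=False, so the count is 2 and the influence is log2(2) = 1 bit.
print(projected_count(2, [{1, 2}], [1]))  # 2
```

The base-2 logarithm of the returned count is the reported influence, exactly as in the pipeline described above.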
nsqflow also imposes equality constraints on the CNF variables corresponding to the same program variable across different paths, in order to keep the CNF correctly constrained in the number of admitted solutions. The implementation is memory-efficient, in that at any given time, nsqflow need maintain these constraints only for the previous path and the current path.

3.1.1 Memory Handling

Due to Kite's lack of pointer analysis, it would be impossible to analyze real C code containing pointers. Because Kite assumes that all pointers are 32 bits, Kite branches on all 2^32 possible memory addresses when there is a symbolic pointer dereference. While this is a sound upper bound on the number of feasible execution paths, it is an impractical approach. Consequently, I have implemented a method to better handle memory accesses, a facility necessary for nsqflow to scale to real code. Below, I describe how symbolic pointers are handled in nsqflow.

Every time Kite makes a decision on a CFG branch, nsqflow makes a query to the pointer analysis provided by LLVM (called Data Structure Analysis (DSA) [19]) under the current path condition, in order to learn aliasing information. For pointers that DSA finds to be definitely equal or definitely not equal, nsqflow extends the path condition at the time of the previous CFG branch decision with this information.
Thus, further exploration extending that path condition will be augmented with the returned pointer comparison (aliasing) information.

DSA itself works by first constructing a directed graph for each function in the program. Each graph consists of a set of nodes that correspond to the memory objects named in the function (stack, heap, and global objects allocated or named in the function, including basic integer and floating-point types, arrays, structures, pointers, and functions); edges between the fields of nodes when the corresponding objects name each other in the program; a set of edges mapping these names to the fields they name; and a set of all function calls made by the function. This information is enough for DSA to simplify the graph, removing duplicate call sites when calls are made to the same function with the same arguments and return types. Once DSA has finished constructing directed graphs for each function in the program using local information as just described, it copies the local graphs of each callee into the calling function's local graph, detecting strongly-connected components (SCCs) in order to avoid visiting each SCC more than once (thereby avoiding infinite cycles), and eliminating duplicate calls in the same manner as during the local graph construction. The result of this interprocedural copying is a full call graph for the program. Finally, each graph is traversed again, with the calling function's graph being copied into the graph of each potential callee. In this manner, DSA is able to scale efficiently, requiring only seconds for programs spanning hundreds of thousands of lines of code, with a worst-case time complexity of Θ(nα(n) + kα(k)e) and a worst-case space complexity of Θ(fk), for a program with n instructions, k the maximum size of a graph for a single function, e edges in the program's call graph, and f functions in the program, where α is the inverse Ackermann function.
Due to this low computational overhead and excellent scalability, nsqflow is able to make a pointer query every time Kite makes a CFG branch decision, and explores a new path under the path constraint at the time of Kite's branch decision.

In addition to being an alias analysis, DSA is a points-to set memory analysis [7, 19]. Unlike alias analysis, which returns equality and disequality relations between pairs of pointer variables, a points-to analysis yields, for each pointer variable, a set containing the memory objects which that pointer may reference. I extend Kite to use this information as part of its search decisions when the result of a conditional expression depends on a symbolic pointer. When the points-to analysis is successful, nsqflow need only consider a number of branches corresponding to the memory locations from the points-to set, instead of the potential 2^32 memory addresses. The astute reader will observe that even the conservative treatment of symbolic pointers used when nothing can be learned from DSA does not affect the precision of the resulting influence measurement. As the symbolic execution engine considers each possible memory address assigned to the pointer, it will determine that the paths along memory addresses which cannot be assigned are infeasible, and therefore cannot reach the target property. Thus, although this treatment can have a dramatic impact on performance, it does not sacrifice precision.

One drawback of this alias analysis approach, however, is the increase in the size of the resulting CNFs by inserting clauses.
Although these clauses provide additional constraints, which intuitively should make the intermittent SAT solving (as well as the final model counting) easier, this is not guaranteed, and in some cases the added clauses can increase the difficulty of the problem given to the solvers.

3.1.2 Library Calls

A limitation of nsqflow is one from which static analysis tools fundamentally suffer: that of library and system calls, or calls to functions for which we possess headers but not the source implementation. Although nsqflow is a source analysis tool, real C code makes use of included header files for which the source is not available, and makes heavy use of system calls (in my experiments in Chapter 4, the code analyzed makes heavy use of network reads, for instance). The default way in which nsqflow deals with the complexity of calls to such code is to implement stub libraries marking these functions as behaving nondeterministically, and potentially returning any value. Kite provides the facility to include these "fake" library calls. Furthermore, for commonly used functions, the user of the tool can implement models for the functions to account for the behavior of these calls. A model will help the symbolic execution engine prune many paths from its search, but if the model is incorrect, precision will also suffer.

3.1.3 Relaxing Precision

Although my goal is to measure influence precisely, one advantage of nsqflow's incremental, path-by-path nature is that it facilitates the early termination of its execution to obtain a lower bound on the true channel capacity. This may be useful to implement policies such as "the inputs should have no more than m bits of influence over the output". This formulation can be useful to verify programs for which a low influence measurement is expected (for instance, string sanitizers on untrusted user inputs); a large output set may be a sign of an integrity violation.
For instance, consider a program that is intended to output a small set of values, but for which nsqflow discovers a large (but not exhaustive) number of solutions to the output set, such as a buggy piece of code that is expected to return one of only 3 or 4 possible values (e.g., return codes) for an output of width 32 bits. In this case, observing a measurement of 10 bits (for instance) is enough to alert the user to the issue, without requiring a full exploration of all 2^32 possible outputs.

There are two potential points at which nsqflow may be terminated early to obtain this bound. In the first, symbolic execution may be stopped early to obtain a CNF that represents only a subset of the paths explored at that point. Because nsqflow maintains a CNF that is incrementally updated as Kite explores new paths, terminating symbolic execution early will yield a CNF with a number of solutions that is a sound lower bound on the total number of solutions. Thus, model counting this CNF will yield a sound lower bound on the influence measurement.

In the second, the model counting step may also be terminated early. Due to the incremental nature of the model counter used in nsqflow, it can be stopped at any point to obtain a lower bound on the total number of solutions to the CNF.

When exact precision is not required, as in the case of the aforementioned minimum-threshold policy, the savings in running time can be dramatic. Indeed, early termination of either of these steps may lead to an exponentially lower number of paths to consider for the symbolic execution engine.

3.2 Subset Model Counter

In the final step of my approach, nsqflow counts the number of different satisfying assignments that the output variables can take in the CNF representation of the program, which corresponds to the size of the feasible set of values of the output variables. In order to compute the size of the feasible value set for just the output variables (as opposed to the potentially much larger count of all possible satisfying assignments to the CNF, which may include many assignments that differ only in other variables), I use a variant of model counting, which I call subset model counting (#SubSAT), as the basis for computing the size of the feasible value set. Whereas in exact model counting one must count the number of unique satisfying assignments to a CNF formula φ, in #SubSAT one must count the number of unique assignments to a subset S of the variables of φ that can be extended into satisfying assignments of φ.

A typical CDCL-based model counter [32] can be easily adapted to subset model counting by simply removing from the blocking clause all literals that are not elements of the subset S. Such an algorithm is described in Algorithm 3.1. Correctness of this algorithm follows from the correctness of the very similar 2QBF algorithm (Alg. 1) described in [29].

Algorithm 3.1 SharpSubSAT Model Counter
1: S ⊂ vars(φ) is the subset of Boolean variables in φ to count.
2: num_models = 0
3: while φ is SAT do
4:   Let assign be a satisfying assignment to φ.
5:   Let assign′ ⊂ assign be a prime implicant of φ.
6:   assign′_S ← {x | x ∈ assign′ ∧ var(x) ∈ S}.
7:   num_models ← num_models + 2^(|S| − |assign′_S|)
8:   φ ← φ ∧ ¬assign′_S

I implement this step in the nsqflow toolchain using the #SubSAT solver SharpSubSAT [38], a prime-implicant-based CDCL model counter [4, 8, 20, 23] that uses a greedy cover-set approach to deriving prime implicants. SharpSubSAT implements Algorithm 3.1 with the heuristics of first deciding on the variables of S, and of adding variables not from S to the cover set first (thus potentially reducing the number of literals in S that end up in the prime implicant and subsequent blocking clause).
In my experiments, solving the subset model counting problem (for small subsets of up to a few hundred variables) is dramatically faster than complete exact model counting, and is an important heuristic for the scalability of the computationally hard model counting step of nsqflow. When SharpSubSAT has run to completion, the resulting count is precisely the cardinality of the feasible value set. Finally, nsqflow takes the base-2 logarithm of the cardinality in order to obtain the influence in bits.

3.3 Limitations

It is important to present the limitations of nsqflow, and of this approach to QIF.

First and foremost, nsqflow attempts to solve a computationally difficult problem. As nsqflow must fundamentally reason about an exponential number of paths to the program exit points, the number of satisfying assignments that yield possible output values is similarly large. As previously mentioned, multiple steps in the nsqflow toolchain belong to high complexity classes. As a result, the runtimes obtained from nsqflow are feasible on some subset of real programs, but in the general case, nsqflow may be unable to run to completion even in the case of finite path lengths. As presented, nsqflow is sensitive to the choice of program being analyzed.

Second, nsqflow computes only channel capacity. Since channel capacity represents the worst-case behaviour over input distributions, it can overestimate the degree of influence over the program outputs. This can lead to false positives: even when nsqflow computes a large number of possible assignments to a set of output variables, this does not necessarily indicate a problem with the code. Some computations will fundamentally, and correctly, exert a large degree of influence on their outputs. Such would be the case, for instance, in cryptographic algorithms based on bit substitution and permutation. In such a case, channel capacity would not be a useful measure for detecting errors in the code.
More generally, channel capacity is not always the most appropriate measure of information flow; other measures such as Shannon entropy or Rényi's min-entropy may be preferable.

Furthermore, nsqflow inherits problems suffered by all symbolic execution tools. My analysis considers only finite-length execution paths. In particular, nsqflow has a configurable parameter to bound the loop unrolling; if the symbolic execution can exceed this bound, the tool warns the user.3 In theory, nsqflow's bounds are always precise with respect to the set of all paths explored, are always guaranteed under-approximations with respect to all paths (i.e., considering extremely long and non-terminating paths as well), and can be made arbitrarily accurate by extending the symbolic execution runs to include increasingly longer paths.

In practice, the important challenge is scalability. As I will show in Chapter 4, nsqflow scales to much larger programs than previous approaches [2, 22, 25, 26], and does so while maintaining a high level of precision. There remains a theoretical concern that the lower bounds provided by bounded symbolic simulations could underestimate the influence of a malicious input. However, given the state of the art, I believe that nsqflow's emphasis on scalability over completeness is well-justified.

Moreover, nsqflow makes conservative abstractions about the environment. For example, nsqflow does not model the file system, so a read from a file is treated as non-deterministic (i.e., under an attacker's control). Similarly, system calls that are not explicitly modeled are treated as returning non-deterministic values.

3 In my experimental results in Chapter 4, the bound is always set high enough to exhaustively explore all possible execution paths.
In a security context, these may be reasonable assumptions, but they can result in coarse overestimates of the adversary's influence.

Finally, my current implementation operates at the source-code level, so nsqflow cannot quantify information flow to machine-level artifacts, such as the program counter, as can be done, e.g., by Newsome et al. [25]. In principle, my approach could analyze binaries if provided with a front-end that could recover the CFG and functionality of the binary.

Chapter 4

Evaluation

Two key measures of performance for an information flow tool such as nsqflow are precision and scale. Previous work in the area has demonstrated a dichotomy between these two extremes, with the ideal being a tool that can measure flow precisely at scale. I evaluate where nsqflow lies along this scale-precision spectrum using a set of programs varying in size and complexity.

Because information flows through programs are difficult for a human to reason about, and there is a general lack of available precise flow measurement tools against which I can compare my own measurements, it is inherently difficult to validate the correctness of my tool. In order to achieve some validation of correctness, I present two sets of experiments in the following sections. In the first, I run my tool against a nontrivial program for which I can analyze the information flow precisely. In the second, I compare the measurements obtained by nsqflow to the measurements presented by Newsome, McCamant, and Song [25] and by Phan and Malacaria [26].
As I will demonstrate, nsqflow reports the same value whenever an exact value is provided in [25] and [26], and falls within the lower and upper bounds whenever an approximate value is presented in [25].

Finally, in Section 4.4, I demonstrate that nsqflow is able to analyze programs at scale, by running it on a selection of open-source software and reporting my results.

My reported running times take into account a full command-line invocation of the nsqflow toolchain, from the start of symbolic execution to the end of model counting. Since there is fixed overhead associated with this process (e.g., interprocess communication, internal compilation by LLVM/Kite, temporary file input/output), the examples that run to completion very quickly are unfairly penalized. Thus, the reported running times for the small programs are less meaningful and almost entirely an artifact of my implementation. However, this effect is negligible for longer-running experiments. I executed all experiments on an Intel Core i5-750 2.66GHz with 8MB of L2 cache and 4GB of RAM running 64-bit Ubuntu Linux.

4.1 Testing Correctness

In order to test the correctness of nsqflow, I analyze the QIF output of nsqflow on a nontrivial program for which the influence is known by construction. The program implements a family of universal hashing functions and is shown in Listing 4.1. It demonstrates nsqflow's precise information flow computation through both explicit data flow and implicit control flow.

int matrix[ROUNDS];
...

void fill_matrix(int seed) {
    int i;
    srand(seed);
    for (i = 0; i < ROUNDS; i++)
        matrix[i] = rand() & ((1 << OUTPUT_WIDTH) - 1);
}

// print matrix to file
...
// matrix has constants hardcoded by fill_matrix
long hash(long x) {
    int i;
    int value = 0;
    for (i = 0; i < ROUNDS; i++) {
        if (x & 0x1) value ^= matrix[i];
        x >>= 1;
    }
    return value;
}

Listing 4.1: Universal hashing example

The above generates a matrix of bits, with ROUNDS columns, each of which possesses OUTPUT_WIDTH bits by construction (the remaining bits in the bitvector are masked to 0). The matrix is filled with deterministic values set by initializing the pseudorandom number generator with a particular value for its seed, resulting in the same sequence of pseudorandom numbers when fill_matrix is passed the same seed.

The hash function consumes one bit at a time from the value to hash (x), and if the bit's value is 1, it performs a bitwise xor with the next matrix column. In this way, one knows that, by construction, at most ROUNDS bits of the return value may be affected by the input bits of x, dependent on the matrix values chosen. After a number of rounds (input bits) equal to OUTPUT_WIDTH, one expects to see a limit.

Figure 4.1: Universal hashing tests. The plot shows measured information flow (bits) against ROUNDS (bits), for 15 output bits: the bits calculated by nsqflow (circle area weighted by probability) alongside the expected maximum information flow. The area of the points is proportional to the number of times that amount of information flow was measured, of my 100 runs.

The only input in this case is x, and measurements correspond to the influence of x over the value returned by the hash function. I fixed OUTPUT_WIDTH to have the value 15 and let ROUNDS vary. In order to avoid creating specific hardcoded matrices to demonstrate the values, I instead uniformly sampled the seed value from [0, 2^31 - 1] and repeated the measurement 100 times for each value of ROUNDS. By construction, the information flow is limited by both the number of input and output bits, i.e., min(ROUNDS, OUTPUT_WIDTH). In addition, the influence should be exactly that value, unless the columns of the pseudorandom matrix do not fully span the space of possible outputs.

Figure 4.1 plots my results. Exactly as predicted, the influence measurements are limited by (and usually exactly equal to) min(ROUNDS, OUTPUT_WIDTH), with smaller information flow when the matrix columns do not span all possible outputs. For example, the probability that a 15 × 15 binary matrix is full-rank is about 29%, and one can see that when ROUNDS = OUTPUT_WIDTH = 15, a bit less than 1/3 of the 100 pseudorandom matrices produce the full 15 bits of influence.

4.2 Comparison: TEMU

In this section, I compare my results to examples from prior work by Newsome, McCamant, and Song [25] as an evaluation of their influence measurement tool, called TEMU.4 I show that nsqflow is able to measure information flow consistent with the findings of Newsome et al., with higher precision and better running time in about half of the cases presented. For the reader's convenience, I have included my translations into C of the relevant test programs, shown in Listings 4.2 to 4.9.

int copy(int i) {
    int v = i;
    return v;
}

Listing 4.2: Copy

int masked_copy(int i) {
    int v = i & 0x0f;
    return v;
}

Listing 4.3: Masked copy

4 Actually, it is an extended tool based on the existing TEMU, the dynamic taint analysis component of the BitBlaze binary analysis software suite [36].

int base = CONSTANT; // any constant

int checked_copy(int i) {
    int v;
    if (i < 16)
        v = base + i;
    else
        v = base;
    return v;
}

Listing 4.4: Checked copy

int div2(int i) {
    int v = i / 2;
    return v;
}

Listing 4.5: Divide by 2

int mul2(int i) {
    int v = i * 2;
    return v;
}

Listing 4.6: Multiply by 2

int implicit(int input) {
    int output = 0;
    if (input == 0) output = 0;
    else if (input == 1) output = 1;
    else if (input == 2) output = 2;
    /* ... */
    else if (input == 6) output = 6;
    else output = 0;
    return output;
}

Listing 4.7: Implicit flow

int popcnt(unsigned int i) {
    i = (i & 0x55555555) + ((i >> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
    i = (i & 0x0f0f0f0f) + ((i >> 4) & 0x0f0f0f0f);
    i = (i & 0x00ff00ff) + ((i >> 8) & 0x00ff00ff);
    int output = (i + (i >> 16)) & 0xffff;
    return output;
}

Listing 4.8: Population count

unsigned int mix_copy(unsigned int x) {
    unsigned int y = ((x >> 16) ^ x) & 0xffff;
    unsigned int output = y | (y << 16);
    return output;
}

Listing 4.9: Mix and duplicate

Influence (bits)
Name      Fig # in [25]   nsqflow   TEMU-sound   TEMU-sample   TEMU-approx-#SAT
Copy      Table 1         32        6.04-32      31.8-32       32
Masked    Table 1         4         4            -             -
Checked   Ex. 1           4         4            -             -
Div2      Table 1         31        6.58-31      30.8-31       31.7
Mul2      Table 1         31        6.58-32      30.4-31.6     31.5
Implicit  Figure 1        2.81      2.81         -             -
Popcnt    Figure 2        5.04      5.04         -             -
MixDup    Figure 2        16        6.04-32      0-28.6        15.8

Table 4.1: Information flow for programs given in [25]. The second column gives the figure number in [25] (which I have reproduced in Listings 4.2 to 4.9). Ranges are given as x-y when values from [25] are approximate.

I present a comparison of my results for these programs in Table 4.1 and Table 4.2. In each case, I used my tool to measure the flow from the input to the listed function's return value. The columns labeled TEMU-sound, TEMU-sample, and TEMU-approx-#SAT represent the three different complementary approaches employed by Newsome et al. TEMU-sound is the first technique their tool uses, and is able to find sound lower and upper bounds on the channel capacity. When the lower and upper bounds are equal, they have found a precise result. However, when their sound technique fails to find a precise result after several seconds, their tool falls back to approximation techniques represented by the next two columns, TEMU-sample and TEMU-approx-#SAT. The column labeled TEMU-sample represents a probabilistic bound based on sampling the space of possible inputs by making queries to a constraint solver. Finally, the column labeled TEMU-approx-#SAT is their estimate using an approximate model counter to obtain a probabilistic estimate of the size of the output set. Although they do not present exact runtimes, they state that for TEMU-sound and TEMU-sampling, runtimes for the examples varied between 0.5 and 3.8 seconds, while TEMU-approx-#SAT never required more than 30 seconds.

Running time (s)
Name      Fig # in [25]   nsqflow   TEMU-sound   TEMU-sample   TEMU-approx-#SAT
Copy      Table 1         0.16      0.5-3.8      0.5-3.8       <30
Masked    Table 1         0.16      0.5-3.8      -             -
Checked   Ex. 1           0.16      0.5-3.8      -             -
Div2      Table 1         0.17      0.5-3.8      0.5-3.8       <30
Mul2      Table 1         0.17      0.5-3.8      0.5-3.8       <30
Implicit  Figure 1        0.17      0.5-3.8      -             -
Popcnt    Figure 2        0.17      0.5-3.8      -             -
MixDup    Figure 2        0.17      0.5-3.8      0.5-3.8       <30

Table 4.2: Running time for programs given in [25]. The second column gives the figure number in [25] (which I have reproduced in Listings 4.2 to 4.9). Ranges are given as x-y when values from [25] are approximate.

In [25], Newsome et al. also present another set of experiments. Unfortunately, however, I was unable to reproduce these experiments using nsqflow due to the nature of their tool compared to mine. Specifically, the RPC DCOM, SQL Server, Samba Filesystem, ATPhttpd, and synthetic switch statement test cases measure influence over the program counter, which is beyond the scope of my tool since nsqflow operates on source code.

A limitation of my comparison with respect to running time is that, despite contacting the authors for assistance with installing and using their tool, I was unable to compile their TEMU-based tool on my own testing
machine. The figures I present are taken directly from the experiments by Newsome, McCamant, and Song [25]. As a result, the figures I present for running times should not be directly compared. One should, however, note that the testing machines had similar specifications, although mine was a slightly newer-generation machine, and I also benefit from improvements made to SAT and SMT solvers since [25] was published.

4.3 Comparison: sqifc

I also compare how nsqflow performs on examples from Phan and Malacaria [26], contrasting the bounds and running times of nsqflow with those of their tool, sqifc.5 The examples are CRC8, the Grade Protocol, and the Dining Cryptographers problem, which can be found in Figures 10, 12, and 18 of [26], respectively. For convenience, I also provide these in Listings 4.10, 4.11, and 4.12.

As with the experiments from Section 4.2, I contacted the authors for assistance in installing and using their tool, sqifc. Despite making a significant effort to install it, along with the guidance of the authors, sqifc is a research prototype that is not easy to use in a general setting. As a result, I was unable to run sqifc on my own testbed. Therefore, as in Section 4.2, I present the figures reported by the authors for comparison against the values I obtained from nsqflow. Consequently, a direct running-time comparison between nsqflow and sqifc cannot be made on the basis of the figures I present in this section. For the test cases to which I do compare, I note that the testing machine used by Phan and Malacaria was a newer-generation Intel i7 3.4GHz CPU with 8GB of RAM, a higher-performance machine than my own testbed.

5 Actually, in the same paper, Phan and Malacaria also present another tool for C programs, which yielded significantly worse computational performance than sqifc in all the examples they presented. Thus, I compare only against their best tool, sqifc.
sft       0     1     2     3     4     5     6     7     8
Time (s)  0.21  0.21  0.22  0.20  0.18  0.21  0.20  0.20  0.22

Table 4.3: Running time when running nsqflow on the crc8 program from [26], for various shift values sft. In [26], Phan and Malacaria presented results only for sft values of 3 and 5: 0.475 and 0.289 seconds, respectively. In all cases, nsqflow correctly reported 8 - sft bits of influence.

unsigned char GetCRC8(unsigned char check, unsigned char ch) {
    int i, sft;
    for (i = 0; i < 8; i++) {
        if (check & 0x80) {
            check <<= 1;
            if (ch & 0x80) { check = check | 0x01; }
            else { check = check & 0xfe; }
            check = check ^ 0x85;
        } else {
            check <<= 1;
            if (ch & 0x80) { check = check | 0x01; }
            else { check = check & 0xfe; }
        }
        ch <<= 1;
    }
    check >>= sft;
    return check;
}

Listing 4.10: CRC8

The first example, crc8, computes an 8-bit CRC over an 8-bit value and shifts the final result by sft significant bits, with the source code presented in Listing 4.10. Table 4.3 presents nsqflow runtimes for sft values ranging from 0 to 8. In [26], results were presented only for sft values of 3 and 5, for which the running times using their CBMC-based sqifc tool were 0.475 seconds and 0.289 seconds, respectively. In all cases, my tool correctly reported 8 - sft bits of influence; thus I omit these figures from the table.

The second example to which I compare is the grade protocol program given in Section 3.5 of [26] and copied in Listing 4.11.

int func() {
    size_t S = 5, G = 5, i = 0, j = 0;
    size_t n = ((G - 1) * S) + 1, sum = 0;
    size_t numbers[S], announcements[S], h[S];
    for (i = 0; i < S; i++) h[i] = nondet_int() % G;
    for (i = 0; i < S; i++)
        numbers[i] = nondet_int() % n;
    i = 0;
    while (i < S) {
        j = 0;
        while (j < G) {
            if (h[i] == j)
                announcements[i] =
                    (j + numbers[i] - numbers[(i + 1) % S]) % n;
            j = j + 1;
        }
        i = i + 1;
    }
    for (i = 0; i < S; i++)
        sum += announcements[i];
    return sum % n;
}

Listing 4.11: Grade protocol
In [26], Phan and Malacaria vary the number of students in the protocol, and the number of possible grades, between 2 and 5, presenting sqifc information flow measurements and running times for these values. I ran nsqflow for the same values of students and grades as Phan and Malacaria, and present the information flow measurements obtained from nsqflow in Table 4.4, and nsqflow's running times, along with sqifc's running times from [26], in Table 4.5. In addition, I ran nsqflow for 16 and 24 students, and for 16 and 24 possible grades. Table 4.4 and Table 4.5 present nsqflow's information flow measurements and running times for these values. The running time measurements demonstrate the advantage of nsqflow; my tool runs in 10.43 seconds for values of students=24 and grades=24, whereas the CBMC-based sqifc takes 40.4 seconds for values of students=5 and grades=5. In all the cases in which both tools ran to completion, the number of bits reported by nsqflow and sqifc were equal.

       2      3      4      5      16     24
2      1.585  2.0    2.322  2.585  4.087  4.644
3      2.322  2.807  3.17   3.459  5.044  5.615
4      2.807  3.322  3.7    4.0    5.615  6.19
5      3.17   3.7    4.087  4.392  6.022  6.61
16     4.954  5.524  5.931  6.248  7.913  8.496
24     5.555  6.129  6.539  6.858  8.527  9.111

Table 4.4: Grade protocol information flow results (in bits). The row index represents the number of distinct grade values each student can take, and the column index represents the number of students. nsqflow and sqifc reported the same figures for all cases in which both tools ran to completion.
The two bottom rows, and the two rightmost columns, represent results computed by nsqflow for which no equivalent results from sqifc were available.

Grades   Students
         2            3            4            5            16          24
         nsq   sqif   nsq   sqif   nsq   sqif   nsq   sqif   nsq   sqif  nsq   sqif
2        0.36  5.66   0.45  7.03   0.61  10.77  0.86  9.47   1.22  -     2.83  -
3        0.34  9.15   0.49  11.60  0.70  17.99  1.17  20.93  2.03  -     3.12  -
4        0.49  10.10  0.60  16.87  0.80  21.87  1.15  18.67  2.43  -     3.10  -
5        0.85  14.64  0.57  20.67  0.82  33.30  1.63  40.40  2.55  -     3.67  -
16       1.24  -      1.53  -      1.79  -      3.12  -      4.14  -     5.42  -
24       1.59  -      1.86  -      1.97  -      3.71  -      5.11  -     9.88  -

Table 4.5: Grade protocol running time results. The columns labeled nsq present nsqflow's running times, and the columns labeled sqif present sqifc's running times from [26]. All running times are measured in seconds. The row index represents the number of distinct grade values that each student can take, and the column index represents the number of students.

The third example for comparison is the well-known dining cryptographers protocol, shown in Listing 4.12 for convenience. I ran nsqflow on the same C code as given by Phan and Malacaria [26] with the same parameter values (various values up to 300) for n, the number of cryptographers, and present the results in Table 4.6. I obtained the same influence measurements presented by Phan and Malacaria [26]. I found the running time of nsqflow to be much lower, with nsqflow requiring only 3.12 seconds for n = 300, compared to the reported 3326.9 seconds for sqifc. I also ran my tool for larger values of n, specifically n = 1000 and n = 2000. For n = 2000, nsqflow still required only 167 seconds.

size_t func() {
    size_t N = 5, output = 0, i = 0;
    size_t coin[N], obscoin[2], decl[N];
    size_t h;
    h = nondet_uchar() % (N + 1);
    for (i = 0; i < N; i++) {
        coin[i] = nondet_uchar() % 2;
    }
    for (i = 0; i < N; i++) {
        decl[i] = coin[i] ^ coin[(i + 1) % N];
        if (h == i + 1) {
            decl[i] = !decl[i];
        }
    }
    for (i = 0; i < N; i++) {
        output = output + decl[i];
    }
    return output;
}

Listing 4.12: Dining cryptographers

Phan and Malacaria also presented results for three other experiments, all of which I was unfortunately unable to reproduce. Specifically, I was unable to replicate the Tax Record test case, as it is implemented in the Java-like verification language ABS [10], and I was unable to translate it into equivalent C. Also from Phan and Malacaria's work [26], I was unable to replicate the test cases CVE-2011-2208 and CVE-2011-1078, despite their being C code from the Linux kernel. These test cases were modified by the authors, who employed manual program slicing and system call modeling to reduce the size of the program. I contacted the authors, but they did not have ready access to the source code at the time, so I was unable to include these examples in my experiments.

n     3     4     5      6      50     100     200     300      1000   2000
Bits  2.0   2.32  2.58*  2.81   5.67   6.66    7.65    8.23     9.97   10.97
nsq   0.25  0.27  0.33   0.32   0.75   1.41    1.85    3.01     16.24  166.75
sqif  2.15  3.50  3.63   18.63  46.97  158.52  587.67  3326.92  -      -

Table 4.6: Dining cryptographers protocol results. The row labeled Bits represents the number of bits measured by nsqflow. The row labeled nsq represents nsqflow's running time, and the row labeled sqif represents sqifc's running time. n is the number of cryptographers, and running time is measured in seconds. The starred value 2.58 differs from Phan and Malacaria's reported 2.59; I attribute this to roundoff error in the conversion from feasible-set size to information flow.

Because the tools underlying nsqflow make use of randomization in some of their decisions and heuristics, running times are potentially sensitive to these choices. Thus, I also measured how tightly clustered the running times are. I ran all the examples from Section 4.2 and Section 4.3 from 20 to 100 times and measured the interdecile range for the running times.
I found that the running times are tightly clustered, with ranges of 0.01-0.04 seconds for the tiny examples, which are sensitive to background CPU tasks, and on the order of 1-5% for the larger experiments of this set. Full results may be found in Table 4.7.

Test                  Number    Median Running   Interdecile
                      of Runs   Time (s)         Range (s)
Copy                  100       0.16             0.01
Masked copy           100       0.16             0.01
Checked copy          100       0.16             0.02
Divide by 2           100       0.17             0.02
Multiply by 2         100       0.17             0.02
Implicit flow         100       0.17             0.02
Population count      100       0.17             0.01
Mix and duplicate     100       0.17             0.01
CRC8, sft=0           100       0.21             0.03
CRC8, sft=1           100       0.21             0.02
CRC8, sft=2           100       0.22             0.04
CRC8, sft=3           100       0.20             0.03
CRC8, sft=4           100       0.18             0.02
CRC8, sft=5           100       0.21             0.02
CRC8, sft=6           100       0.20             0.02
CRC8, sft=7           100       0.20             0.02
CRC8, sft=8           100       0.22             0.02
DinCryptos, n=3       100       0.25             0.02
DinCryptos, n=4       100       0.27             0.02
DinCryptos, n=5       100       0.33             0.03
DinCryptos, n=6       100       0.32             0.03
DinCryptos, n=50      100       0.75             0.04
DinCryptos, n=100     100       1.41             0.05
DinCryptos, n=200     100       1.85             0.08
DinCryptos, n=300     100       3.01             0.07
DinCryptos, n=1000    50        16.24            0.60
DinCryptos, n=2000    20        166.75           2.24

Table 4.7: Interdecile range (in seconds) for the comparison programs. The range is reported as the difference between the 90th and 10th percentiles.

4.4 Larger Experiments

In this section, I present further experiments on real C code. The experiments range from 1915 to 23759 lines of code, with the output size ranging from 32 to 32768 bits, and the measured influence ranging from 0 to 32752 bits, largely depending on the output chosen. Running times ranged from 2 minutes for the 2600-line darkHttpd to 3 and a half hours for the 20000-line xinetd. Lines of code are measured as reported by the UNIX wc -l tool on .c and .h files in the source directories and subdirectories. While this does not measure the number of included standard library headers or account for comments, it is the simplest way to get a general idea of the size of the program. The reader will notice the large proportion of HTTP and FTP servers and clients. This choice was somewhat accidental, and resulted from my desire to use open-source C programs (found on SourceForge or similar) that were of the right scope for nsqflow (roughly 10K LOC or under), and ideally ones that were web-facing or otherwise interesting from a typical security point-of-view. My tool, however, is not limited to programs of this nature.

For the full technical details of the experiments, along with results, please consult Table 4.8. A brief description of each program I tested is found in the list below.

muHttpd  Forking HTTP server with a basic sanitizer function. The exit point/output variable is a buffer to which data from the file system is copied before it is sent back over the web in order to service the request. File: request.c; Function: handle_request; Variable: buf.

awHttpd  Forking HTTP server. The exit point is a buffer to write back following an HTTP request. File: proc.c; Function: procsendfile; Variable: cn->databuf.

darkHttpd  HTTP server. The exit point is a server name (string) pointer set from the command-line at program start-up, and should not be able to be modified by its (network) inputs.
File: darkhttpd.c; Function: main; Variable: wwwroot[0].

xinetd  The Linux Extended Internet Services Daemon, version 2.3.15. The exit point is a newly-allocated server object based on an old (input) one. This test checks whether function side effects can block any possible values for the output variable (the variable to which the contents of the input variables' addresses are copied). File: server.c; Function: server_alloc; Variable: <return value>.

FTP Client  Command-line FTP client. The exit point is a character buffer containing a 3-character FTP return code to be interpreted, plus a null character. This experiment tests whether all potential values are copied to the output buffer and that nothing is written past the end of the array that would replace the null character (e.g. an off-by-1 error or buffer overflow). File: ClientFTP.c; Function: interpreteReponse; Variable: rep.

thttpd  Throttling HTTP server, version 2.26. The exit point is a parameter used for throttling connection speed. This experiment verifies whether the state of the connection, which may be influenced by an attacker, can affect the throttling parameter to a large degree, or only restricts it to a small set of possible values. File: thttpd.c; Function: handle_send; Variable: max_bytes.

Thy  Thy HTTP server, version 0.9.4. The output variable is an internal code representing the result of a parse, of which only a small set of values should be possible. This tests whether an attacker can force this return code to take on an arbitrary value, rather than the narrowing of the information flow intended by the code. File: http.c; Function: http_request_parse; Variable: <return value>.

ftpRelay2  FTPRelay FTP library, version 2. The output variable is a buffer into which network data is read. This experiment tests whether or not its simple input sanitizer check constrains the data received. File: childproc.c; Function: ClientToServer; Variable: RecvBuffer.

httpserver1  Small HTTP server, version 0.3.
The output variable is an internal code, depending on data read from the network, stored as part of a struct. This experiment aims to test for the presence or absence of a simple buffer overflow: whether or not adjacent data in the struct is clobbered when writing it. File: httpsmain.c; Function: set_incoming; Variable: sv.rc.

ftpslib1  FTP library, version 0.2. This experiment aims to test for the presence or absence of a simple buffer overflow: whether or not adjacent data in the struct is clobbered when writing it (a similarly-structured program to httpserver1 above). File: ftplib.c; Function: set_mode; Variable: fcode.

Bugs Found: For two of the experiments, httpserver1 and ftpslib1, the large influence reported turned out to correspond to actual security vulnerabilities in the software. In both cases, a buffer overflow was present when writing the value adjacent to the measured variable. Both bugs were previously undisclosed, but have been confirmed by the developers.

4.5 CoreUtils

I also selected a subset of the 30 smallest programs, as measured by lines of code, from the GNU CoreUtils suite, version 8.23. In all cases, I chose the flow to be from argv[1] to the return code of the main function, bounding the size of the argv string to 256 bytes. I selected these examples arbitrarily to demonstrate nsqflow's robustness for use as a completely automated tool
rather than one requiring careful and thoughtful developer annotations.

Test         LOC    Sliced  LOC      Exit Point   Running  Sym. exec.  Model count.  RAM    Influence
                    LOC     reduc.   size (bits)  time     time (%)    time (%)      (MB)   (bits)
muHttpd      2268   1996    12%      32768        7m       84%         16%           258    32752
awHttpd      2257   1602    29%      8192         22m      74%         26%           347    8192
darkHttpd    2599   837     68%      32           1m52s    85%         15%           494    0
xinetd       23759  6320    73%      256          3h39m    90%         10%           1818   256
FTP Client   1915   440     77%      32           2h54m    81%         19%           449    24
thttpd       11155  3681    67%      32           2h1m     73%         27%           1330   1
Thy          17134  4968    71%      32           3h13m    82%         18%           1653   1
ftpRelay2    1080   944     13%      8192         timeout  99%         1%            ≥1745  ≥3073
httpserver1  1433   978     32%      32           2h16m    83%         17%           319    32
ftpslib1     1611   992     38%      32           1h28m    84%         16%           336    32

Table 4.8: Larger experiment results for the programs presented in Section 4.4. LOC stands for lines of code. The ftpRelay2 entry is included as an example of my tool's limitations: although this program is not fundamentally different from others in the list, nsqflow is unable to complete its analysis despite the program's relatively small size, timing out after 20 hours. Program slicing effectiveness is measured in LOC before preprocessing. For RAM usage, the reader should note that nsqflow has a maximum RAM usage of 2000 MB that is inherited from Kite.

I ran nsqflow on these programs once each, with a timeout of 60 minutes for each program. 20 of the 30 completed in 60 seconds or less, with only 2 of the 30 timing out after 60 minutes. Influence measurements ranged from 0 to 2 bits for the instances running to completion. Full details of these results may be found in Table 4.9.

In addition, I used a uniform random number generator to select a subset of 30 programs from CoreUtils, and ran nsqflow on them in the same fashion. This resulted in 9 programs already included in the 30 smallest, leaving us with 21 new CoreUtils.⁶ 9 of the 21 tests completed in 60 seconds or less, with only 4 of the 21 timing out after 60 minutes.
Influence measurements were 1 or 1.585 bits for the instances running to completion. For CoreUtils, I omit RAM figures, as I have found that memory usage has not been a problem. I present the full details of these results in Table 4.10, with the programs appearing in the 30-smallest benchmark removed, leaving a total of 21 programs.

4.6 Performance Discussion

In this section, I discuss issues related to performance and optimizations, recognizing that a reader may wonder which optimizations and components of nsqflow's toolchain are necessary for nsqflow to scale well.

Empirically, I have found that my pointer analysis has been crucial for scaling to the size of tests I have presented. I ran the instances described in Section 4.4 with nsqflow's pointer analysis disabled, and a timeout of 8 hours. None of these instances managed to run to completion, with all but muHttpd running into the intrinsic 2000 MB RAM usage limit that nsqflow has inherited from Kite.

Additionally, the underlying symbolic execution engine's ability to learn from infeasible path constraints is one of its defining features. Thus, an interesting comparison is between nsqflow using Kite with its path learning enabled, and nsqflow using Kite with path learning disabled. As described earlier, the pointer analysis I have implemented interacts with the path learning. As I have just shown above, pointer-heavy code tends to make nsqflow unable to complete. As a consequence, I am unable to meaningfully present figures for nsqflow with both path learning and pointer analysis turned off.

⁶ I chose to stick with the 21 programs left over, so as to maintain an unbiased random selection.
Test      Lines of Code  Influence (bits)  Time (seconds)
basename  190            1                 21
dirname   137            1                 8
echo      273            0                 10
env       164            ≥1.585            timed out
groups    141            1                 15
hostid    89             1                 11
hostname  117            1                 4
link      95             1                 4
logname   87             1                 8
mkdir     307            1                 30
mkfifo    182            1                 622
mknod     274            1                 33
nice      223            2                 70
nohup     241            ≥1.585            timed out
nproc     134            1                 17
printenv  155            1.585             23
readlink  179            1                 65
realpath  278            1                 88
rmdir     253            1                 546
runcon    267            2                 21
sleep     150            1                 71
sum       275            1                 56
sync      74             1.585             13
tee       221            1                 85
true      81             0                 3
tty       125            2                 4
unlink    90             1                 4
users     152            1                 61
whoami    95             1                 16
yes       89             1                 5

Table 4.9: Information flow for the set of the 30 smallest CoreUtils. Information flow is measured to the return value of the main function. Each program is run with a timeout of 60 minutes.

Test      Lines of Code  Influence (bits)  Time (seconds)
base64    326            1                 39
cat       785            1                 214
chmod     571            1                 105
chown     331            1                 59
comm      450            1                 28
cp        1221           1                 329
expand    431            1                 102
factor    2548           ≥1                timed out
fmt       1042           1                 756
install   1047           ≥1                timed out
md5sum    879            ≥1                timed out
nl        617            1                 36
numfmt    1523           1.585             81
pathchk   424            1                 38
shuf      627            1                 238
sort      4751           ≥0                timed out
stat      1580           1                 39
touch     438            1                 26
truncate  425            1                 41
uname     376            1                 24
wc        802            1                 153

Table 4.10: Information flow for the set of randomly-selected CoreUtils. Information flow is measured to the return value of the main function. Each program is run with a timeout of 60 minutes.

Moreover, the reader may have noticed that not every source line of a program may be relevant to measuring a specific flow. More generally, nsqflow is able to effectively avoid analyzing large parts of the input program through the technique of program slicing [41]. I leverage the program slicing implemented in Kite, and also a compound slicer using the off-the-shelf Frama-C [5] and LLVMSlicer [34]. I note that the program slicing performed is entirely automatic, which is in contrast to results presented in, for instance, Phan and Malacaria's work [26], which involve manual slicing.
For the set of benchmarks I have presented, I have found slicing to be very effective. I measured the degree of code eliminated through program slicing as a way to demonstrate that, although the nsqflow toolchain must solve computationally difficult problems, existing heuristics can often reduce the size of the problems in practice, thus making them tractable. Indeed, nsqflow's ability to complete is sensitive to the effectiveness of the slicing on a given program. Because the LLVM-based slicing does not produce valid source, I approximate the number of source lines sliced by measuring the fraction of LLVM bitcode instructions sliced out. I found that these figures ranged between 12% and 73% for the programs used in the experiments presented in Section 4.4. Full details were presented in Table 4.8.

Furthermore, a reasonable question to pose is where the bottleneck is typically found in a run of nsqflow. I have previously hinted at the surprising result that model counting has not been the most expensive step in nsqflow's toolchain, and here I present data to support this claim. Although my initial expectation was that nsqflow would nearly always be limited by the model counting, I now believe that my particular variant of the more general #SAT problem is responsible for the relatively short time spent inside the model counter. I present a breakdown of the total running time spent in each component of nsqflow for the experiments from Section 4.4. I found that for these, the proportion of time spent performing model counting accounts for 10% to 27% of total running time. Table 4.8 lists this information as a percentage of total running time, to two significant figures.

Chapter 5

Related Work

There is a large body of literature about quantitative information flow, much of it more theoretical than the scope of this thesis [2, 16, 17, 31, 35, 39, 42]. In this chapter, I survey work that, like nsqflow, aims to provide a practical tool to compute QIF.
I also survey techniques from software and hardware verification that are useful in QIF.

5.1 Network-Flow-Based QIF

In [21], McCamant and Ernst present a scalable approach to measuring QIF using network flows. Unlike the problem I consider in this thesis, which is measuring the degree of influence of a program's inputs over its outputs, they address a related confidentiality problem: quantifying the information an attacker can learn about a program's inputs by observing its outputs. They model information channels in a program as a network of finite-capacity pipes and reduce the measurement problem to a network flow computation, solving it using standard network flow algorithms.

Specifically, they construct a graph in which the edges represent variables (including inputs) in the program, with weights corresponding to the bit-widths of the variables, and the nodes represent program instructions on these variables, with the in-degrees of the nodes representing the number of arguments needed by the corresponding instructions. The graph is acyclic, and edges always point from older (in program instruction order) nodes to newer ones. The set of all secret inputs and the set of all public outputs are represented by the source and sink nodes, respectively.

In order to handle implicit flows, as in the case of a multi-way branch, McCamant and Ernst use static analysis to infer enclosure regions, which declare the locations that the enclosed code may write to. These enclosures become special aggregate nodes in the graph, with the weights on edges in and out of these nodes corresponding to the number of different executions (implicit flows) through the aggregate nodes, equal to the base-2 logarithm of the number of branches within the enclosures.

To obtain the edge weights, the authors use dynamic tainting at the level of bits, maintaining which bits are certainly leaked, which are certainly not leaked, and which bits are unknown.
In order to maintain soundness, unknown bits are treated conservatively and assumed to be leaked. Consequently, the result is typically an overapproximation, which tends to become cruder as the program size increases. Finally, using standard network flow algorithms, McCamant and Ernst's tool arrives at a final QIF measurement. Their graph construction is efficient, as are the network flow algorithms they use. As a result, their tool is able to scale effectively to programs consisting of 500K or more lines of code, at the cost of precision; indeed, their approximation is crude when compared to the QIF reported by nsqflow, as their approach can only compute an upper bound on the channel capacity.

To the best of my knowledge, the work by McCamant and Ernst has been the only approach to date that attempts to measure QIF through programs based on network flows. While not as precise as nsqflow, the scalability of their approach is impressive, and I believe that further work in this area is a promising direction for future research in measuring QIF through programs.

5.2 Symbolic-Execution-Based QIF

In [25], Newsome, McCamant, and Song present a way to measure channel capacity, addressing the same fundamental problem as I do in this thesis. In a manner somewhat reminiscent of nsqflow's, they use a symbolic execution engine to obtain a formula over the program inputs and outputs that represents the entire program, expressed in first-order logic over bitvectors and arrays. Once this formula is obtained, their tool makes repeated queries to a constraint solver in order to characterize the set of possible output values the program can take. Using a heuristic approach, they make queries about whether certain values are feasible or found within certain value ranges, until the number of feasible values they have discovered crosses a desired threshold.
Theoretically, this step is precise if run to completion, but it is very computationally expensive, and the tool is typically only able to check a small number of values (typically 64 or 128) before stopping and reporting a number of bits (6 or 7) as a lower bound. In order to combat the large computational cost associated with this step, the authors also implement an approach based on approximate model counting in order to obtain an approximation of the number of solutions. However, as I have shown in Section 4.2, their precision is quite low even for small examples, whereas nsqflow reports a precise and exact influence measurement. Unlike nsqflow, which exhaustively explores every program path, their approach operates on only a single path. Therefore, the feasible value set they obtain is only a subset of the whole program's, and their resulting influence measurement is only a lower bound on the channel capacity.

Newsome, McCamant, and Song also adopt a different, but even less precise, method to scale to much larger program sizes. Specifically, the authors consider a return-oriented attack [30] by tracking the inputs' influence over the program counter. Using this program-counter technique along with conservative lower bounds and approximate model counting, their tool is able to obtain approximate information flow bounds that scale to very large programs, on the order of a million lines of code. Their technique has the added benefit of operating on binaries, thus eliminating the need for the program's source, as nsqflow requires. However, the precision of their approach is very low for such large programs.

Most similar to my approach is the work by Phan and Malacaria described in [26, 27]. As in this thesis, Phan and Malacaria use an approach based on symbolic execution and constraint solving to solve the problem of obtaining the channel capacity between a program's inputs and its outputs.
Their tool, sqifc, symbolically executes a program, maintaining a running total of the number of solutions as it executes. This running total is updated with multiple feasible values when symbolic execution determines a path to be feasible, with sqifc counting the number of new solutions discovered as a result, for each possible program path.

In this way, Phan and Malacaria integrate model counting with symbolic execution; this is the first of two fundamental differences between my work and theirs. In contrast, my approach uses symbolic execution to produce a CNF representation of the program, and then subsequently applies an efficient (subset) model counter to that CNF. To facilitate their integrated model counting, Phan and Malacaria introduce #DPLL(T) model counters. A #DPLL(T) solver counts the number of distinct satisfying solutions to the Boolean skeleton of an SMT formula.⁷ The SMT-based framework they present is very flexible, allowing them to represent existing model checking and symbolic execution engines as theory solvers. However, as the authors write, #DPLL(T) model counters must explicitly enumerate each satisfying solution to the Boolean skeleton individually, as each satisfying assignment to the Boolean skeleton must also be checked for satisfiability against the theory solvers. In contrast, the #SubSAT solver in nsqflow is able to generalize individual satisfying assignments, often allowing the model counting step to complete quickly even for large instances and large, multi-byte output values.

⁷ Note that whereas an SMT formula may in some cases have an unbounded number of distinct models (for example, if the formula includes unconstrained real numbers), there are at most a finite number of distinct satisfying assignments to the Boolean skeleton of any SMT formula.
Indeed, as I have shown in Chapter 4, in most of my experiments, symbolic execution was the dominant cost, while the model counting step completed relatively quickly.

Second, Phan and Malacaria's work differs from mine in the symbolic execution step. Like nsqflow, if their symbolic execution runs to completion, the result will be a sound and complete bound on the total number of solutions, and therefore a precise measurement of the channel capacity. However, their tool does not leverage learned CFG information during symbolic execution, and therefore cannot prune paths as effectively as nsqflow can.

As a consequence of these key differences in symbolic execution and model counting, their results, while precise, scale to programs that span only a few hundred lines of C, following manual modeling and program slicing. As I have shown in Section 4.3, nsqflow is also precise but scales to much larger programs, and has far lower running times for small programs.

To the best of my knowledge, there has been no other published work on symbolic-execution-based QIF apart from the work I have cited and described here.

Chapter 6

Conclusion

6.1 Contributions

I have described an approach to measuring QIF, specifically the channel capacity, through software. My work is built upon state-of-the-art symbolic execution techniques and model counting. While previous approaches either scaled to very large programs but provided coarse approximations to QIF [21, 25], or were precise but only scaled to hundreds of lines [2, 22, 25, 26], I demonstrate that it is possible to be highly precise and scale to tens of thousands of lines of real C code.

My implementation of this approach, nsqflow, is fully automatic and tracks both explicit and implicit information flows through all (finitely bounded) program paths, with high precision. I show how pointer analysis can be added to existing symbolic execution engines in order to make feasible the analysis of code with pointers.
I describe a new variant of the exact model counting problem, which I have called #SubSAT, that is tractable for practical examples compared to the full model counting problem, and which makes model counting feasible specifically for QIF.

6.2 Future Work

There are many directions for future work. A near-term possibility would be to expand the applications of nsqflow. For example, I could extend the tool to handle C++, which should be straightforward given that LLVM can already handle C++. Or, capitalizing on the completely automatic nature of the tool, one could imagine background application of nsqflow on a massive scale, similar to fuzz testing or regression testing, to flag suspicious information flows. One way to implement such an application would be to use an automatic selection process, perhaps based on static analysis and heuristics, to automatically select variables over which to measure the influence. The application could continuously run nsqflow to measure influence over these automatically-selected output variables, stopping when a minimum threshold policy (similar to the one described in Section 3.1.3) for the output variables is exceeded (perhaps computed automatically as a fraction of the output variable's bit width).

In the opposite direction, it is possible to extend nsqflow to compute more expensive, but even more valuable, information. For example, it might be possible to explicitly formalize side-channel information [17]; the symbolic execution engine would then know that calls to functions like time() must behave in certain ways, thereby capturing side-channel information via nsqflow's existing accounting of implicit information flows. For a very different example, I could extend the model counting to enumerate the input equivalence classes for each possible output value, rather than just the values.
This is a much more expensive computation, but it allows precisely computing alternative information flow metrics based on Shannon entropy, min-entropy, and guessing entropy, for non-uniform distributions [2], rather than only the channel capacity that nsqflow currently computes.

An interesting theoretical question is how to deal with the distinction between attacker-controlled and auxiliary inputs. Adopting the notation from Newsome, McCamant, and Song [25], consider a computation P(I, I_aux) → V, where V is the output set, I represents the attacker-controlled input, and I_aux is the auxiliary input. A reasonable extension to the channel capacity model would be that the attacker cannot control the entire set of inputs; rather, the attacker can only observe the value of I_aux. With this formulation, one can frame both channel capacity and this alternative definition in the following manner, respectively:

    Count_V( ∃ I, I_aux | P(I, I_aux) = V )

    max_{I_aux} ( Count_V( ∃ I | P(I, I_aux) = V ) )

The latter is fundamentally a more precise model than the former. Specifically, the former is problematic as it may overcount the number of output values to which the adversary can purposefully steer the computation. In contrast, in Newsome et al. [25], I_aux is set to an arbitrary fixed value, which may undercount the adversary's influence, since some other fixed value for I_aux may give the adversary more influence. This motivates the latter definition: it represents the worst-case internal program state, giving the attacker the greatest degree of control over the output variable. As such, it is a natural definition of influence in a security context when there are auxiliary inputs not under the attacker's control.

In another direction, nsqflow would benefit greatly from a more robust pointer analysis.
Indeed, in Section 4.4 and Section 4.5, nsqflow was unable to run to completion on some of the programs, largely due to the shortcomings of nsqflow's current pointer analysis. Specifically, DSA did not return sufficient information to allow Kite to prune the large parts of the search space that resulted from the heavy use of pointers in the programs. As the current analysis, based on LLVM's existing DSA, is simplistic, a promising future direction is to utilize points-to set information more efficiently, such as with the approach presented in [28].

Finally, while model counting has not generally been a problem for the experiments I have presented, it is a computationally hard problem from a theoretical perspective. There is recent work on efficient, approximate model counting with provable bounds [6]. In some preliminary experiments, I have not yet found approximate model counting to be beneficial, but these are recent results and additional progress in this area is likely.

Bibliography

[1] Gilles Audemard and Laurent Simon. Predicting learnt clauses quality in modern SAT solvers. In IJCAI, volume 9, pages 399–404, 2009.

[2] Michael Backes, Boris Köpf, and Andrey Rybalchenko. Automatic discovery and quantification of information leaks. In Security and Privacy, 2009 30th IEEE Symposium on, pages 141–153. IEEE, 2009.

[3] Cristian Cadar, Daniel Dunbar, and Dawson R Engler. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In OSDI, volume 8, pages 209–224, 2008.

[4] Thierry Castell. Computation of prime implicates and prime implicants by a variant of the Davis and Putnam procedure. In Tools with Artificial Intelligence, 1996. Proceedings Eighth IEEE International Conference on, pages 428–429. IEEE, 1996.

[5] CEA-LIST and INRIA-Futurs. Frama-C Slicer. http://frama-c.com/slicing.html, 2014. Accessed October 14, 2015.

[6] Supratik Chakraborty, Kuldeep S Meel, and Moshe Y Vardi. A scalable approximate model counter.
In Principles and Practice of Constraint Programming, pages 200–216. Springer, 2013.

[7] David R Chase, Mark Wegman, and F Kenneth Zadeck. Analysis of Pointers and Structures, volume 25. ACM, 1990.

[8] David Déharbe, Pascal Fontaine, Daniel Le Berre, and Bertrand Mazure. Computing prime implicants. In Formal Methods in Computer-Aided Design (FMCAD), 2013, pages 46–52. IEEE, 2013.

[9] Niklas Eén and Niklas Sörensson. MiniSat: A SAT solver with conflict-clause minimization. SAT, 5, 2005.

[10] Reiner Hähnle, Einar B Johnsen, Bjarte M Østvold, Jan Schäfer, Martin Steffen, and Arild B Torjusen. Deliverable D1.1A: Report on the Core ABS language and methodology, part A. Technical Report FP7-231620, HATS Project, 2010.

[11] William G Halfond, Jeremy Viegas, and Alessandro Orso. A classification of SQL-injection attacks and countermeasures. In Proceedings of the IEEE International Symposium on Secure Software Engineering, volume 1, pages 13–15. IEEE, 2006.

[12] Klaus Havelund and Thomas Pressburger. Model checking Java programs using Java PathFinder. International Journal on Software Tools for Technology Transfer, 2(4):366–381, 2000.

[13] Arthur Hicken. SQL injection hall of shame. http://codecurmudgeon.com/wp/sql-injection-hall-of-shame/, 2015. Accessed October 14, 2015.

[14] NTT Innovation Institute. NTT Group global threat intelligence report, 2014.

[15] James C King. Symbolic execution and program testing. Communications of the ACM, 19(7):385–394, 1976.

[16] Vladimir Klebanov. Precise quantitative information flow analysis using symbolic model counting. In Proceedings of the International Workshop on Quantitative Aspects in Security Assurance. Citeseer, 2012.

[17] Boris Köpf and David Basin. An information-theoretic model for adaptive side-channel attacks. In CCS'07: Proceedings of the 14th ACM Conference on Computer and Communications Security, pages 286–296, 2007.

[18] Chris Lattner and Vikram Adve.
LLVM: A compilation framework for lifelong program analysis & transformation. In Code Generation and Optimization, 2004. CGO 2004. International Symposium on, pages 75–86. IEEE, 2004.

[19] Chris Lattner, Andrew Lenharth, and Vikram Adve. Making context-sensitive points-to analysis with heap cloning practical for the real world. In ACM SIGPLAN Notices, volume 42, pages 278–289. ACM, 2007.

[20] Vasco M Manquinho, Paulo F Flores, João P Marques Silva, and Arlindo L Oliveira. Prime implicant computation using satisfiability algorithms. In Tools with Artificial Intelligence, 1997. Proceedings., Ninth IEEE International Conference on, pages 232–239. IEEE, 1997.

[21] Stephen McCamant and Michael D Ernst. Quantitative information flow as network flow capacity. ACM SIGPLAN Notices, 43(6):193–205, 2008.

[22] Ziyuan Meng and Geoffrey Smith. Calculating bounds on information leakage using two-bit patterns. In Proceedings of the ACM SIGPLAN 6th Workshop on Programming Languages and Analysis for Security, page 1. ACM, 2011.

[23] Alan Mishchenko and Robert K Brayton. SAT-based complete don't-care computation for network optimization. In Design, Automation and Test in Europe, 2005. Proceedings, pages 412–417. IEEE, 2005.

[24] Matthew W Moskewicz, Conor F Madigan, Ying Zhao, Lintao Zhang, and Sharad Malik. Chaff: Engineering an efficient SAT solver. In Proceedings of the 38th Annual Design Automation Conference, pages 530–535. ACM, 2001.

[25] James Newsome, Stephen McCamant, and Dawn Song. Measuring channel capacity to distinguish undue influence. In Proceedings of the ACM SIGPLAN Fourth Workshop on Programming Languages and Analysis for Security, pages 73–85. ACM, 2009.

[26] Quoc-Sang Phan and Pasquale Malacaria. Abstract model counting: A novel approach for quantification of information leaks. In Proceedings of the 9th ACM Symposium on Information, Computer and Communications Security, pages 283–292.
ACM, 2014.[27] Quoc-Sang Phan, Pasquale Malacaria, Oksana Tkachuk, and Corina SPa˘sa˘reanu. Symbolic quantitative information flow. ACM SIGSOFTSoftware Engineering Notes, 37(6):1–5, 2012.[28] David A Ramos and Dawson Engler. Under-constrained symbolic exe-cution: Correctness checking for real code. In 24th USENIX SecuritySymposium (USENIX Security 15). USENIX Association, 2015.[29] Darsh P Ranjan, Daijue Tang, and Sharad Malik. A comparative studyof 2QBF algorithms. In SAT, pages 292–305, 2004.49Bibliography[30] Ryan Roemer, Erik Buchanan, Hovav Shacham, and Stefan Savage.Return-oriented programming: Systems, languages, and applications.ACM Transactions on Information and System Security (TISSEC),15(1):2, 2012.[31] Andrei Sabelfeld and Andrew C Myers. Language-based information-flow security. Selected Areas in Communications, IEEE Journal on,21(1):5–19, 2003.[32] Tian Sang, Fahiem Bacchus, Paul Beame, Henry A Kautz, and ToniannPitassi. Combining component caching and clause learning for effectivemodel counting. SAT, 4:7th, 2004.[33] Edward J Schwartz, Thanassis Avgerinos, and David Brumley. All youever wanted to know about dynamic taint analysis and forward symbolicexecution (but might have been afraid to ask). In Security and Privacy(SP), 2010 IEEE Symposium on, pages 317–331. IEEE, 2010.[34] Jiri Slaby. LLVM static slicer. https://github.com/jirislaby/LLVMSlicer, 2014. Accessed October 14, 2015.[35] Geoffrey Smith. On the foundations of quantitative information flow. InFoundations of Software Science and Computational Structures (FOS-SAC), pages 288–302, 2009.[36] Dawn Song, David Brumley, Heng Yin, Juan Caballero, IvanJager, Min Gyung Kang, Zhenkai Liang, James Newsome, PongsinPoosankam, and Prateek Saxena. BitBlaze: A new approach to com-puter security via binary analysis. In Information Systems Security,pages 1–25. Springer, 2008.[37] Celina G Val. Conflict-driven symbolic execution : How to learn toget better. 
Master’s thesis, The University of British Columbia, March2014.[38] Celina G Val, Michael A Enescu, Sam Bayless, William Aiello, andAlan J Hu. Precisely measuring quantitative information flow: 10Klines of code and beyond. In European Symposium on Security andPrivacy, 2015. IEEE, 2015.[39] Pavol Cˇerny´, Krishnendu Chatterjee, and Thomas A Henzinger. Thecomplexity of quantitative information flow problems. In ComputerSecurity Foundations Symposium (CSF), 2011 IEEE 24th, pages 205–217. IEEE, 2011.50Bibliography[40] Larry Wall and Schwartz Randal L. Programming Perl. O’Reilly andAssociates, 1991.[41] Mark Weiser. Program slicing. In Proceedings of the 5th InternationalConference on Software Engineering, pages 439–449. IEEE Press, 1981.[42] H. Yasuoka and T. Terauchi. Quantitative information flow — verifi-cation hardness and possibilities. In Computer Security FoundationsSymposium (CSF), IEEE 23rd, pages 15–27, 2010.51