A Logic and Decision Procedure for Verification of Heap-Manipulating Programs by Zvonimir Rakamaric A THESIS S U B M I T T E D I N P A R T I A L F U L F I L L M E N T O F T H E R E Q U I R E M E N T S F O R T H E D E G R E E O F Master of Science in T H E F A C U L T Y O F G R A D U A T E STUDIES (Computer Science) The University of British Columbia August 2006 © Zvonimir Rakamaric, 2006 11 Abstract Heap-manipulating programs (HMPs), which manipulate unbounded linked data structures via pointers, are a major frontier for formal verification of software. Formal verification is the process of proving (or disproving) the correctness of a system with respect to some kind of formal specification or property. The primary contributions of this thesis are the definition of a simple transitive closure logic tailored for formal verification of HMPs , and an efficient decision procedure for this logic. To assess the effectiveness of the proposed approach, we develop an H M P verification framework, which uses our fast implementation of the decision procedure to verify a number of H M P examples. Experimental examples (including three small container functions from the Linux kernel) demonstrate that the logic is practically useful and expressive enough to prove many interesting heap properties. In addition, the decision procedure provides a substantial time and space advantage over pre-vious approaches. Ill Contents Abstract " Contents i " List of Tables v List of Figures vi 1 Introduction 1 1.1 Motivation 1 1.2 Contributions 4 1.3 Organization of the Thesis 4 2 Background 6 2.1 Heap-Manipulating Programs 6 2.2 Predicate Abstraction 10 2.3 Over-Approximating the Reachable Abstract States 12 2.4 Related Work . . 14 3 Proposed Logic 19 3.1 Basic Logic 19 3.2 Handling Pointer and Data Function Updates 23 iv 4 Decision Procedure 25 4.1 Inference Rules 25 4.2 Basic Decision Procedure 31 4.3 Decision Procedure Extension for Handling Updates 32 5 Experiments 38 6 Conclusions and Future Work 45 Bibliography 47 Appendix A Proofs 54 A . l Proof of Theorem 1 54 A.2 Proof of Theorem 2 57 A.3 Proof of Theorem 3 58 A.4 Proof of Theorem 5 64 A . 5 Complexity of the Satisfiability Problem 65 Appendix B Formalization of the Decision Procedure 68 B . l Proof of Theorem 4 69 Appendix C Pseudocode of the Examples 70 List of Tables Results of Verifying HMPs 43 vi List of Figures 2.1 The Syntax of a Heap-Manipulating Program 7 2.2 ND-Insert H M P Example 8 2.3 Init-Cyclic H M P Example 9 2.4 Linux-List-Del H M P Example 10 3.1 Syntax of the Proposed Logic 20 3.2 Acyclic Heap Structure Example 22 3.3 Cyclic Heap Structure Example 22 4.1 Inference Rule Example 26 4.2 Basic Inference Rules 27 4.3 Between Inference Rules 28 4.4 Pseudocode of the Core Decision Procedure Algorithm 32 4.5 Pointer Update Inference Rules 35 4.6 Data Update Inference Rules 36 C l List-Reverse 70 C.2 List-Add 71 C.3 ND-Insert 71 C.4 ND-Remove 72 C.5 Zip 73 C.6 Sorted-Zip 74 vii C.7 Sorted-Insert 75 C.8 Bubble-Sort 76 C.9 Remove-Elements 77 C I O Remove-Segment 78 C . l l Search-And-Set 79 C.12 Set-Union 80 C.13 Create-Insert 81 C.14 Create-Insert-Data 82 C.15 Create-Free 83 C.16 Init-List 84 C.17 Init-List-Var 84 C.18 Init-Cyclic 84 C.19 Sorted-Insert-DNodes 85 C.20 Remove-Doubly 86 C.21 Remove-Cyclic-Doubly 87 C.22 Linux-List-Add 88 C.23 Linux-List-Add-Tail 89 C.24 Linux-List-Del 90 Chapter 1 Introduction 1.1 Motivation We are witnessing how software systems are becoming a part of every segment of human life. Nowadays, software is often used to control many "mission-critical" tasks, where an error in software could cause very serious consequences. Such software systems have to meet a high reliability bar in order to prevent disasters from happening. Back in 1972, Dijkstra realized that testing is not the solution for achieving error-free software — "Program testing can be used to show the presence of bugs, but never to show their absence!" [Dij72]. In theory, testing could be used to show that a software system doesn't have any errors by exhaustively traversing all possible execution paths. In practice, however, it is usually impossible to test software for every possible execution because of the vast state space. As opposed to testing, formal verification is the process of proving (or disproving) the correctness of a system with respect to some kind of formal specification or property. Formal verification is a now well-accepted method for hardware verification, and it could be used to achieve highly reliable and error-free software as well. Therefore, there has been a lot of research recently in employing successful formal verification techniques from the hardware world on software. Much of the success of applying formal verification to hardware comes from using model checking [CES86]. Model checking is a method for formally verifying finite-state systems by exhaustively traversing their state space. Because the state space grows ex-ponentially with the size of a system, model checkers face a blow up of the state space, commonly known as the state explosion problem. Different techniques are used to over-come state explosion, abstraction being one of the most successful ones. Abstraction is the process of reducing the complexity of a system by removing information which is not relevant for a particular task. It is often used in the verification of large, complex systems. Software model checking has recently emerged as a vibrant area of formal verifi-cation research. Because of the state explosion problem, much of the success of applying model checking to software has come from using predicate abstraction [GS97, DDP99, B M M R 0 T , H J M S 0 2 ] . Predicate abstraction is an abstraction technique that employs a finite set of predi-cates in some logic. The predicates are assertions about states of the concrete system; the concrete system is usually infinite-state. In the abstraction, each predicate is represented with a boolean variable. Therefore, predicate abstraction is used to transform a typically infinite-state system, such as software, to a much smaller, finite-state, manageable over-approximation. The finite-state over-approximation is defined as a transition system over boolean variables which represent predicates, and can in turn be model checked using stan-dard model checking techniques. The smaller, reduced system tries to preserve some of the properties of the original system that are necessary for proving system correctness. As already mentioned, predicate abstraction usually requires a logic and associated decision procedure to define predicates over the (typically infinite) concrete program state. The logic must be expressive enough to allow useful abstractions, but performance of the de-cision procedure is also very important, since most predicate abstraction approaches make numerous queries to the decision procedure. A n important class of programs are heap-manipulating programs (HMPs): pro-grams that access and modify linked data structures consisting of an unbounded number of heap nodes. H M P s access the heap nodes through a finite number of pointers (which we 3 call node variables) and by following pointer fields between nodes. Since the number of nodes in the heap is unbounded, H M P s are infinite-state systems, so one cannot directly apply finite-state model checking to this problem without using abstraction. To apply predicate abstraction to HMPs and assert many interesting correctness properties, one must be able to express the fact that a node is reachable from some other node by following a number of links — pointers in the data structure. For instance, to express a property that a node belongs to a particular singly-linked list, one must be able to say that the node is reachable from the head of that list. This concept is called unbounded reachability (a.k.a. transitive closure) between nodes. Several researchers have previously identified the importance of transitive closure for H M P s [Nel79, Nel83, BRS99, IRR+04a, BPZ05, LQ06, LAIR+05]. Unfortunately, adding support for transitive closure to even simple logics often yields undecidability [IRR + 04a], and therefore, one must be careful when defining such a logic. 1 Verification of H M P s has recently regained the focus of the software verification community. Many of the published approaches are based on predicate abstraction [BPZ05, LQ06, DN03, M Y R S 0 5 ] , and thus require a transitive closure logic and a decision proce-dure. Furthermore, other H M P verification approaches and tools [ W K L + 0 6 , YRS04] could also take advantage of a transitive closure logic. Therefore, defining such a logic and a fast decision procedure with a good implementation would be an enormous benefit. However, there exist only a handful of implemented decision procedures for logics that could be used in the verification of HMPs . This thesis defines such a logic and accompanying decision procedure, and assesses their usability and performance. In order to accomplish that, the thesis sets up and uses an H M P verification framework. The framework employs predicate abstraction, the logic, and the decision procedure to verify H M P s . The verification problem it solves can be stated as follows: given an HMP, determine whether it is the case that all executions that satisfy all initial assumptions also satisfy all assertions in the program. 1 In the heap analysis community, the term "transitive closure logic" refers to many different log-ics that include transitive closure, and not strictly to the first-order transitive closure logic (FO+TC). 4 1.2 Contributions The first contribution is a decidable simple transitive closure logic. This logic is a fragment of the decidable logics containing transitive closure, but we show (through many nontrivial experiments) that it is still expressive enough to verify properties of interest for H M P s using predicate abstraction. Important properties of data structures like singly-linked lists, doubly-linked lists, and cyclic lists can be easily defined in this logic. The second and most important contribution is an efficient decision procedure for this logic 2 . Most other decision procedures for similar logics employ a small model theorem and enumerate a huge number of heap structures up to some bound. Instead of using a small-model theorem and enumerating a super-factorial number of possible models (e.g., [BRS99, BPZ05, LQ06]), our decision procedure is based on inference rules. We show that this procedure, though worst case exponential-time (a proof that satisfiability is NP-hard even for a small fragment of our logic can be found in [RBH06]), solves very quickly the vast majority of queries sent to it during predicate abstraction. The result is an approach that can have large time and memory savings over decision procedures that enumerate all models. Overall, we have been able to verify with a short runtime and an insignificant mem-ory consumption a large variety of interesting HMPs, such as three small container functions from the Linux kernel. 1.3 Organization of the Thesis The material presented in this thesis is based on previously published work [BR06, RBH06] done with collaborators. The initial idea of the simple transitive closure logic and its deci-sion procedure belongs to Jesse Bingham. M y main contributions are extensions that made the logic practically usable and the fast decision procedure. The thesis is organized as follows: Chapter 2 provides background information. It 2The implementation of the decision procedure, called straclos, which stands for Simple TRAn-sitive CLOSure logic, can be downloaded from http://www.cs.ubc.ca/~zrakamar. 5 introduces heap-manipulating programs in Section 2.1, and predicate abstraction and the verification framework in Sections 2.2 and 2.3, respectively. Section 2.4 summarizes other work on verification of HMPs . Chapters 3 and 4 respectively define the transitive closure logic and its decision procedure. Chapter 5 presents experimental results, while Chapter 6 concludes and suggests some possible extensions to the logic and decision procedure. The appendices A and B provide additional, more theoretical aspects of the approach — proofs of the supporting theorems and additional details regarding the decision procedure. Proofs of the theorems are largely due to my collaborator Jesse Bingham. The thesis includes the proofs for the sake of completeness of the presentation. Appendix C gives pseudocode for the example programs and the sets of predicates needed for their verification. Chapter 2 Background 2.1 Heap-Manipulating Programs Heap-manipulating programs are an important and widespread class of programs. Any program that accesses and modifies linked heap1 data structures consisting of an unbounded number of heap nodes falls into this class. A heap node is a dynamically allocated chunk of memory. It usually contains pointer fields and data fields of different types. Pointer fields are links to other heap nodes. HMPs access the heap nodes through a finite number of pointer variables and by following pointer fields between nodes. In our framework, the heap consists of an unbounded number of uniform heap nodes. Uniformity of the nodes means that they all contain the same pointer and data fields. Data fields are booleans. Our HMPs can have a finite number of global node vari-ables (pointers) and a finite number of global boolean variables. Variables or data fields of any other finite type can be modeled (or encoded as) booleans. Note that the form of HMPs that we support doesn't preclude generality, as any HMP can be translated to our form. The framework currently doesn't support procedure/function calls. Figure 2.1 formally defines the supported HMP syntax. 1 In the thesis, the term heap always refers to a memory area used for dynamic memory alloca-tion, and not to the tree-based data structure where the value of each node is less than or equal to the value of its parent, as used to implement, for example, sorting algorithms and priority queues. 7 x,y,z G NodeVariables b G BooleanVariables f,g,h G PointerFields d G DataFields HMP ::— [Statement; }* Statement ::= while (BoolExp) do / / M P end while | if (BoolExp) then / / M P else / / M P end if | Assignment | assert y/ | assume y/ | break I n 0 P Assignment ::= Term := Term | Term := nil | b := BoolExp | d(Term) .= BoolExp Term ::— x I /(Term) BoolExp ::= | t rue | false j ND | Term = Term | d(Term) | -^BoolExp | (BoolExp A BoolExp) | (BoolExp V BoolExp) Figure 2.1: The Syntax of a Heap-Manipulating Program, i// is a simple transitive closure logic formula (Chapter 3). ND is a boolean value that is nondeterministically true or false. 8 1: procedure ND- lNSERT( / i ead , i ' r em) 2: assume ~^f*(head, item) Af*(head,n\\) A ~^head= n i l A/(item) = n i l Ap = head 3: while t r u e do 4: if NDVf(p) = n\\ then 5: f(item):=f(p); 6: /(/>) := //em; 7: break 8: else 9: P-=f(p); 10: end if 11: end while 12: assert f*(head,item) Af*(head,ri\\) 13: end procedure Figure 2.2: ND-Insert H M P Example. A program that nondeterministically inserts a node item into the list pointed to by head. Here, ND is a boolean value that is nondetermin-istically t r u e or f a l s e . Figure 2.2 gives an H M P example called N D - l N S E R T . This program takes a node head and a node item, and inserts item into the linked list pointed to by head at a position se-lected nondeterministically. We denote by f(x) the node pointed to by a pointer field named / of a node x. The pointer head is assumed to be non-nil and to point to an acyclic linked list that does not contain item. These assumptions are formalized by the assume statement on line 2 of the program. In the assume statement, and also in the assert statement, the subformulas of the form f*(x,y) express that node y is reachable from node x by following a sequence of / links of any length; we wil l formally define these predicates in Chapter 3. 2 The fact that nil is reachable from head enforces the acyclicity assumption.3 The body of N D - l N S E R T is straightforward; a pointer p walks the list, and item is inserted at some point. The loop breaks once the insertion has occurred. The expression ND represents a nondeterministic boolean value. The node item is inserted when either ND = t r u e , or the end of the list is reached (detected by the disjunct f(p) = n i l on line 4). The specification is expressed by the assert statement on line 12, and indicates that 2In C-like syntax, f(x) would be written as x->f. In a modal logic, / would be a next operator, and / * would be eventuality. The notation we are using is standard in the heap analysis community. 3In our logical framework, nil is modeled simply as a node having /(nil) = nil. 9 procedure lNlT-CYCL!C(/iead) assume f*(head,t) A /*(/(head),head) A curr = f (head) A btwnf(curr,t,head) A ->head=ri\\ d(head) := t rue; while -<curr = head do d(curr) := t rue; curr := f(curr); end while assert f*(head,t) A /*(/(head),head) A d(t) A -•head— ni l end procedure Figure 2.3: Init-Cyclic H M P Example. A program that sets data fields of all nodes in a cyclic list to t rue. whenever line 12 is reached, head must point to an acyclic list that contains item. Figure 2.3 presents another example called INIT-CYCLIC, and it captures some of the more interesting features the framework supports. The program takes a non-nil node head that points to a cyclic list and sets the (boolean) data fields of all nodes in the list to t rue. Similarly to f(x), we denote by d(x) the value of a data field named d of a node x. Necessary assumptions are again formalized by the assume statement on line 2 of the program. In the assume statement, the subformulas of the form btwn f(x,y,z) express that by following a sequence of / links from node x, we'l l reach node y before we reach node z, i.e. node y comes between nodes x and z. Because of cyclicity, b t w n / predicates are the key to successful verification of this program. We wil l formally define these predicates in Chapter 3. The fact that head is reachable from /(head) enforces the cyclicity assumption. The body of the INIT-CYCLIC procedure first sets the data field of head to t rue on line 3. Then, the loop sets the data fields of all other nodes in the list to t rue. The specification is expressed by the assert statement on line 8, and indicates that whenever line 8 is reached, head must point to a cyclic list with data fields of all nodes set to t rue. Figure 2.4 shows a list container procedure from the Linux kernel. It illustrates the need for multiple pointer fields. The procedure takes a node entry and removes it from a cyclic doubly-linked list. Each node in the list has a prev and a next pointer. The body of the procedure is simple; it connects prev and next pointers of the entry's neighbourhood 10 1: procedure LlNUX-LiST-DEL(enrry) 2: p := prev(entry); 3: n := next (entry); 4: prev(n) := p; 5: next(p) := n; 6: next(entry) := nil; 7: prev(entry) :— nil; 8: end procedure Figure 2.4: Linux-List-Del HMP Example. A Linux kernel function that removes a node from a cyclic doubly-linked list. nodes and therefore removes entry from the list. The assumptions and specifications for this example are very complicated and are given in the Appendix C. Now that we have introduced heap-manipulating programs, we' l l present the basics of the algorithm that our framework uses for their verification. 2.2 Predicate Abstraction Our approach to verifying heap-manipulating programs is based on predicate abstrac-tion [GS97], which is an instance of abstract interpretation [CC77]. In the framework of abstract interpretation, a concrete system is verified by constructing a finite-state over-approximation called the abstract system. In this thesis, the concrete system we are ver-ifying is an H M P . Let 'rf (the concrete states) be the set of states of the concrete system. Predicate abstraction employs a finite set of predicates </>i,..., fa in some logic that are as-sertions about concrete states. Corresponding to the predicates are the abstract boolean variables b\,..., b^. A n abstract state a will be a vector of truth assignments to the abstract boolean variables b\,...,bk- The set of abstract states s# will then be the set of assignments to the abstract boolean variables. Note that a set of states S can also be represented by its characteristic function, i.e. a logic formula \jf such that s 1= y/ iff seS 11 In the rest of the thesis, we wil l use the set and formula definitions interchangeably, and it wil l be clear from the context to which one we are referring. The concrete and abstract systems are connected with two functions: (i) The abstraction function a : —> s/, which maps a concrete state c to the abstract state a, is defined as k i bi iffchfc a(c) = f\ I 1=1 I ->bi otherwise A set of concrete states C is then abstracted by a(C) = V «(c) cec (ii) The concretization function y:sf^c€, which maps an abstract state a to the set of concrete states it represents, is defined as . I 0; iff a 1= b\ y ( « ) = A / I - i 0 , - iff a N ->bi Because a is an abstract state, it defines every abstract boolean variable bi, and there-fore either a \= b, or a N always holds. A set of abstract states A is then concretized by 7(A) = \ / y{a) aCA The abstraction function a and the concretization function y form a Galois connection, and therefore for any concrete set of states C, the following formula is satisfied: CCy(a(C)) Note that since si is finite, a(C) is always finite as well. In contrast, ^ is often infinite; in our case, the infiniteness of the concrete state space arises from the unboundedness of the heap in HMPs . Given a set of concrete initial states Ic, \et R Cff be the set of concrete states that are reachable in the concrete system. We wish to verify that a property expressed as a state 12 assertion i/f over the concrete states holds for all members of R. Predicate abstraction is used to solve this problem by computing an over-approximation Ra C of the set of reachable abstract states such that Cc(R) C Ra. Verification succeeds i f one can prove that the state assertion y holds for all members of j{Ra). A key difference in the various approaches to predicate abstraction is how Ra is computed [GS97, DDP99, DD01, FQ02, BPZ05, DN03]. This typically involves numerous queries to a decision procedure for the underlying logic, and there are tradeoffs between how accurately Ra approximates cc(R) and the number and complexity of these queries. The algorithm that our framework uses to compute Ra is described in the next section. Since predicate abstraction is an incomplete approach, if it fails to verify the prop-erty, this can happen either because the concrete systems actually violates the property, or because of the loss of information inherent in the abstraction. Finding the "right" set of predicates for completing the verification can be a difficult task. Many works have ad-dressed this issue of predicate discovery [DD02, BPR02, HJMS02, DN03], which falls un-der the more general framework of abstraction refinement [CGJ + 00] . As in recent papers on this topic [BPZ05, LQ06], in our current framework, predicates are added by manual in-spection of counterexample behaviors; applying automatic predicate discovery techniques is an important area of future work. 2.3 Over-Approximating the Reachable Abstract States A n over-approximation Ra of the set of reachable abstract states is usually computed as a fixpoint, using some approximation of the abstract post image operator post: 2s* —> 2s*, defined as follows. Given a set of abstract states A, let post(A) = {a(c')\3c,c' e<t?.(c,c') eTAa(c) €A} where T is the transition relation of the concrete system. The abstract post image operator post (A) is thus the set of abstract states representing concrete states that are concrete suc-cessors of those states represented by A . Given post(A) (or an over-approximation) and the 13 initial set of concrete states Ic, we compute Ra using the following least fixpoint iteration (or equivalent) Ro = a(Ic) Ri = fl0Upost(/c0) R2 = R\U p o s t a l ) which can be expressed using the least fixpoint operator u as a formula Ra = nX. a ( / c ) U X U post(X) There exist a number of algorithms for computing post, with tradeoffs between how precisely post is computed and the number of queries to the decision procedure. The naive algorithm is straightforward. Since post distributes over disjunction, 4 computing post(A) is reducible to computing post(p) for each disjunct p in some disjunctive normal form decomposition of A. A disjunct p is a conjunction of possibly negated abstract boolean variables. By using a B D D [Bry86] to represent A, we can easily obtain such a decomposi-tion. The naive algorithm cycles through all 2k abstract states a, and checks i f a 6 post(p); the abstract post image operator post(p) is then the B D D representing the disjunction of all such a. Each check of a e post(p) involves a call to the decision procedure to determine if the following formula is satisfiable: y(p)Awp(y(a)) (2.1) where y is the concretization function defined in the previous section, and wp is the weakest precondition operator [Gri81, Dij76]. The weakest precondition operator wp is a syntactic transformation on logic formulas that depends on the program statement under considera-tion [Gri81, Dij76]. For example, for an assignment statement x :— e, where x is a variable and e is some expression, wp(7r) is constructed by syntactically replacing all occurrences of x with e in the formula n.5 Our approach applies wp at the granularity of individual program statements when performing predicate abstraction. 4meaning that post(Ai VA2) = post(Aj) V post(A2) 5This only works under the assumption that x cannot be aliased. 14 Because our predicate abstraction framework is mainly developed for testing the de-cision procedure, we choose to implement a simple, precise predicate abstraction algorithm. Specifically, the framework uses the described naive algorithm with several straightforward improvements by Das et al. [DDP99] that preserve the precision of the naive algorithm. First, i f (2.1) contains a syntactic contradiction, meaning the existence of a predicate and its negation, then clearly the formula is not satisfiable. In such circumstances there is no need to call the decision procedure. When computing post(p), our implementation initially computes a B D D A representing the set of all abstract states a that won't yield such a con-tradiction. Second, rather than enumerating all a G A, we do recursive case-splitting on the abstract variables, which allows for pruning of large portions of A. For example, let a = b\ be the disjunct containing only the positive occurrence of the abstract boolean variable b \. This disjunct represents all abstract states having b\ true. Then i f y(p) A wp(y(o")) is un-satisfiable, then so too is y(p) A wp(y(a)) for any abstract state a that has b\ equal to true. Hence, our algorithm would only explore those abstract states having b\ false. 2.4 Related Work There is an extensive literature on verification of H M P s describing many different ap-proaches. This section concentrates on the work similar to ours, and on techniques that could benefit from the logic and the decision procedure presented in this thesis. Similarly to Lahiri and Qadeer [LQ06], the described related work wil l be roughly divided, according to the technique(s) it is based on, in the following categories: shape analysis, predicate ab-straction, logics, first-order axiomatization of reachability. These categories often overlap. For example, this thesis spans the predicate abstraction and logic based categories. Shape Analysis. The most well-known shape analysis tool is the Three Valued Logic Analyzer or T V L A [LASOO]. T V L A extends conventional abstract interpretation with a third "uncertain" logic value, and builds so-called 3-valued logical structures that abstract the reachable states at each program point (a.k.a. canonical abstraction). The abstract semantics of program statements are defined by abstract transformers, which can 15 be generated by T V L A or user-defined in the case that the generated transformers are not strong enough. We cannot handle all heap structures that T V L A can. On the other hand, the abstract invariant we compute is always the most precise w.r.t. the given set of predicates. T V L A does not make such a guarantee, and there has been some work done to make T L V A more precise [YRS04]. However, the described improvement hasn't been tested because it requires a transitive closure logic decision procedure similar to ours, which the authors didn't have. Now, T V L A could take advantage of the decision procedure described in this thesis and make its analysis more precise. A l l H M P verification approaches described in the literature require some amount of manual user's effort during the verification (e.g. providing the loop invariants, finding the right set of predicates, etc.). Recently, Loginov et al. [LRS05] presented an interesting combination of abstraction refinement using machine learning techniques, and used T V L A to fully automatically (i.e. no manual effort required) verify some H M P examples. Their technique might be a good starting point for extending this thesis with the abstraction re-finement algorithm, which would even more automatize our approach. Predicate abstraction. T V L A is also used as the underlying engine for the ap-proach by Manevich et al. [MYRS05] . In the approach, the authors observe that the number of shared nodes in linked lists is bounded and present a novel definition of "uninterrupted list segments". This is used to define predicate and canonical abstractions of potentially cyclic singly-linked lists. The approach described in this thesis is not that limited, and can also easily handle doubly-linked lists. The defined abstraction enables them to verify a number of HMPs , though the properties they verify tend to be simpler than ours (see Chapter 5). Balaban et al. [BPZ05] present a predicate abstraction based approach for verifi-cation of H M P s that is similar to ours. The major difference between the two approaches is the way a program abstraction is computed. To compute the abstraction, they employ a small model theorem, and build BDDs representing all models up to the small model size. This is a bottleneck in both computation time and memory, since these B D D s tend to blow-16 up. We, on the other hand, use our saturation-based decision procedure which substantially improves memory consumption and computation time. Though we do not consider liveness in the thesis, it is likely that the technique of Kesten and Pnueli [KPOO] for establishing termination (employed by Balaban et al.) is also compatible with our work. Another approach based on predicate abstraction and model checking is proposed by Dams and Namjoshi [DN03]. They abstract a program by iteratively calculating weakest preconditions of shape predicates, and are able to handle second-order shape properties such as reachability, cyclicity, and sharing. The algorithm doesn't use a decision procedure, and as a consequence, new predicates can be generated in every iteration. Hence, the algorithm often has to be manually provided with "approximation hints" to converge. Logics. The pioneer of logic-based tools is the Pointer Assertion Logic Engine (PALE) [MS01]. P A L E specifies heap structures using graph types [KS93], which are tree-shaped data structures augmented with extra pointers that may point anywhere in the tree. The authors show that many common heap structures can be defined that way, some of which we cannot express, such as trees. P A L E employs a well-known decision procedure for a monadic second-order logic on graph types called M O N A [KMSOO]. M O N A has non-elementary complexity, and therefore, it pays a performance penalty compared to our approach (Chapter 5 gives the initial comparison). Furthermore, loop invariants must be provided by the user. Inherent locality of the heap is employed in recent work that is based on separation logic [MNCL06 , DOY06]. Both approaches utilize symbolic execution of separation logic formulas to infer invariants of heap-manipulating programs. Because there are infinitely many symbolic heaps, they have to use different techniques such as abstraction, widening operators, or rewrite rules to reach a fixpoint. In addition, two new decidable logics for expressing properties of linked data struc-tures have been proposed recently. Ranise and Zarba [RZ05] describe a decidable logic for reasoning about acyclic singly-linked lists with an NP-complete decision problem. They state a small model theorem for the logic, but the design and implementation of a practical 17 decision procedure is left as an area of future work. Yorsh et al. [ Y R S + 0 6 ] define the Logic of Reachable Patterns (LRP), a fragment of the first-order logic over graph structures with transitive closure. Reachability in L R P is defined using regular expressions which denote paths in the heap structure reaching a certain pattern. Patterns are quantifier-free first-order formulas over graph structures used to limit the neighborhood of a reachable node. The authors prove that with suitable restrictions on patterns, the satisfiability problem for L R P is decidable. Because L R P operates on general graphs, even with those restrictions it can express complex heap data structures, such as binary trees. Restricted L R P formulas can be decided using M O N A [KMSOO], or alternatively by directly constructing a tree automaton. There is still no implementation available, and therefore, the authors haven't provided any experiments. The worst case complexity of the satisfiability problem is at least doubly-exponential, but the authors hope to achieve a reasonable performance because formulas that come up in practice are well-structured. To handle more involved data structures, Wies et al. [ W K L + 0 6 ] introduce field con-straint analysis, a novel technique for verifying heap structure invariants. The analysis uses decidable logics (in particular monadic second-order logic of trees and its decision procedure M O N A ) to handle complex data structures originally beyond the scope of these logics, such as skip lists. It generalizes a previous similar approach called structure simu-lation [ IRR + 04b]. While doing the verification, the analysis makes a number of queries to M O N A . We have evaluated our approach on these queries6, and the initial results are en-couraging, that is, we are solving the queries faster than M O N A is (see Chapter 5). There-fore, the field constraint analysis could benefit from using our faster decision procedure instead of M O N A . First-order axiomatization of reachability. First-order axiomatization of reach-ability was first proposed by Nelson [Nel83]. Lahiri and Qadeer [LQ06] define two new predicates to express reachability of heap nodes in linked lists. To prove properties of HMPs , they use an incomplete set of first-order axioms over those predicates. Because the 6Thanks to Thomas Wies and Viktor Kuncak for sending us the queries their tool generates. 18 given set of axioms is incomplete, they provide an induction principle that is used to derive additional axioms when necessary. They use U C L I D [BLS02] as the underlying inference engine. Lev-Ami et al. [LAIR+05] also propose a set of axioms, but it works only for acyclic lists. These approaches harness the generality and expressiveness of more general first-order theorem provers, at a sacrifice in performance. McPeak and Necula [MN05] specify heap data structures using local equality ax-ioms, first-order axioms that constrain only a bounded fragment of the heap around some node. This enables them to describe a variety of shapes and reason about scalar values without abstracting them, while still preserving decidability. However, they can only ap-proximate reachability between nodes (though wnreachability is precise). When pointer disequalities are added, their decision procedure becomes incomplete. We handle both reachability and disequalities, but we can't describe such a variety of shapes and reason about infinite scalar types without abstracting them with booleans. In addition, we compute an inductive invariant of a program automatically (given an appropriate set of predicates), while they require a user to provide loop invariants, which can be a significant burden. 19 Chapter 3 Proposed Logic In the previous chapter, we introduced H M P s and predicate abstraction. Predicate abstrac-tion employs a set of predicates in a logic that has to be expressive enough to allow useful abstractions. This chapter starts by defining the syntax and semantics of our proposed sim-ple transitive closure logic for predicate abstraction of HMPs . Next, it introduces logic extensions necessary for handling pointer and data field updates. Chapter 4 then presents our decision procedure for the described logic. 3.1 Basic Logic Our logic assumes finite sets of node variables V, data variables B, data function symbols D, and pointer function symbols F. The term, atom, and literal syntactic entities are given in Figure 3.1. Literals of the form x = y, ~^x=y, f*(x,y), and ~^f*{x,y) (where x and y are terms) are called equality, disequality, reachability, and unreachability literals, respectively. Literals of the form btwn f(x,y,z) or its negation are called between literals, literals of the form d(x) or -nrf(x), where d e D, are called data literals, while those of the form b or ->b are called data variable literals. The structures over which the semantics of our logic are defined are called heap structures. A heap structure H = (N, 0 ) consists of a set of nodes N and an interpretation function 0 . The interpretation function 0 interprets each symbol c i n V U B U D U F , such 20 term ::= v I /(term) atom ::— b | d(term) | term —term | f* (term,term) | btwnf(term, term, term) literal ..= atom | -^atom Figure 3.1: Syntax of the Proposed Logic. In the syntax, the symbol v GV,d eD,b E.B, and feF. that: • Each node variable symbol a G V is interpreted as a node 0 ( c ) G A/. The variables of V model program variables that point to nodes in the data structure. • Each data variable symbol a G B is interpreted as a boolean value 0 ( c ) G {true, false}. The variables of B model program variables of boolean type. • Each data function symbol a G D is interpreted as a function that maps nodes to booleans 0 ( c ) G [N —> {true,false}]. Data function symbols D model data fields of nodes. • Each pointer function symbol a G F is interpreted as a mapping from nodes to nodes 0 ( a ) G [TV —+ A 7]. Pointer function symbols F model pointers from nodes to nodes. The size of H is defined to be \N\. Heap structures naturally model linked data structures of nodes, each node having some finite number of pointers to other nodes and some finite number of boolean-valued data fields. Clearly, program variables or node data fields of any finite enumerated type can be encoded using the booleans supported by our logic. The interpretation function 0 extends to interpret any term, atom, or literal in a straightforward, inductive way. The interpretation of a term z G V is defined above, other-wise, T has the form f(z') for some term z', and the interpretation is 0 ( T ) = 0 ( / ) ( 0 ( T ' ) ) 21 Atoms are interpreted by 0 as boolean values: • A data variable atom b £ B is interpreted as defined above. • A data atom d(x) is interpreted as @(d)(&(x)). • A n equality atom X\ — %i is interpreted as true iff ) = © ( 1 2 ) . • A reachability atom /*(Ti,T2) is interpreted as true iff there exists some n > 0 such t h a t 0 ( / ) " ( 0 ( T 1 ) ) = 0 ( T 2 ) . 1 • A between atom btwn , Vi, T3) is interpreted as true iff there exist no, mo > 0 such that 0 ( T 2 ) = 0 ( / ) " ° ( 0 ( T i ) ) , 0 (T 3 ) = 0 ( / ) m ° ( 0 ( T i ) ) , n0 < mo, and for all n,m such that 0 (T 2 ) = 0( / )" (0(TI ) ) , 0 (T 3 ) = 0 ( / ) m ( 0 ( T i ) ) , we have n0 < n and m0 < m. Note that the corner case btwnf(x,y,x) holds only if x — y because the "distance" between x and x is zero. Finally, a literal that is not an atom must be of the form - > 0 where 0 is an atom, and we simply define 0 ( - i 0 ) = - > 0 ( 0 ) . Figures 3.2 and 3.3 give examples of some heap structures and literals. Note that when dealing with acyclic lists, as in Figure 3.2, the fact that node prev is between nodes head and curr can be expressed using a conjunction of reachability literals f*(head,prev) A f* (prev, curr). However, i f nodes are on a cycle, as in Figure 3.3, each node is reachable from every other node, and therefore it is not possible to express be-tweenness using reachability literals. For instance, / * (x, z) A f* (z,y) holds in this example, although node z is not between nodes x and y. We have introduced between literals to solve that problem: the fact that node y is between nodes x and z in Figure 3.3 can be expressed with btwn f(x,y,z). Conforming to the usual notation, given a heap structure H = (N, 0 ) and a literal 0 , we write H N <j> iff 0 ( 0 ) = true. For a set of literals 4>, we write H \= <$> iff H N 0 for all 0 S 4>. Given 4>, i f there exists H such that H N <£>, we say that <& is satisfiable. In 1 Here, function exponentiation represents iterative application: for a function g and an element x in its domain, g°(x) =x, and g"(x) = g ( g " _ 1 ( ; c ) ) I o r all n > 1. 22 Figure 3.2: Acycl ic Heap Structure Example. Arrows marked with / represent a pointer function / , while the Ts and Fs inside nodes are values of a data function d. As examples of literals interpreted as true consider: next — curr (both variables point to the same node), f* (head,n\\) (the node nil is reachable from the node pointed to by head following / links), f* (head, prev) (the node pointed to by prev is reachable from the node pointed to by head following / links), d(curr) (the data field of the node pointed to by curr is true), f(f(curr)) = nil (the node to which we get to by following two / links from curr is nil), btwn/(head,prev,next) (node prev is between nodes head and next). As examples of literals interpreted as false consider: d(prev) (the data field of the node pointed to by prev is false), never = nil (the node pointed to by next is not nil), / * (next,prev) (the node pointed to by prev is not reachable from the node pointed to by next following / links). X y Z Figure 3.3: Cycl ic Heap Structure Example. As examples of literals interpreted as true consider: b\wr\f(x,y,z) (node y is between nodes x and z), btwn f(x,y,y), btwnf(x,x,y) (node y is reachable from node x following / links, and between includes the endpoints), f*(f(x),x) (node x is reachable from the node coming after x following / links, i.e. cyclic-ity). As examples of literals interpreted as false consider: f*(x, nil) (nil is not reachable from x following / links), btwn/(x,y,x) (the distance, i.e. the number of / links, between x and y is greater than the distance between x and x because each node has a zero-distance from itself). 23 Chapter 4, we wi l l describe our decision procedure for satisfiability, which has a worst-case exponential running time. The problem it solves is NP-hard (see Appendix A.5), hence a polytime algorithm is unlikely to exist. 3.2 Handling Pointer and Data Function Updates When doing predicate abstraction and computing reachable abstract states, we are always applying the weakest precondition operator over one program statement at a time (see Sec-tions 2.2 and 2.3). The weakest precondition of a formula 0 with respect to an assignment statement x :— e, where x is a variable and e is an expression, is usually defined as the formula constructed by replacing all occurrences of x in <p with e. This holds under the assumption that x cannot be aliased. However, handling program assignments that modify the heap structure or data fields is not that straightforward and requires special care. In-tuitively, assignments that modify heap structure influence not only literals related to the assigned terms, but also other, apparently unrelated, reachability and between literals. For instance, the assignment f(x) := y might alter the reachability between any pair of nodes that link through the updated node x. Similarly, because of aliasing, updates of data fields impose additional node constraints, e.g. the values of data fields of nodes that alias the up-dated node have to be changed accordingly. Therefore, handling these two requires special additions to our logic that we wil l describe here. To handle program assignments that modify the pointers in the heap, i.e. modify some / 6 F, we use a special pointer function symbol / ' for each modified / . The symbols / and / ' model the pointer function before and after the assignment, respectively. Such an assignment has the general form /(TI) := T 2 where Ti and T2 are arbitrary terms. Lines 5 and 6 of the H M P of Figure 2.2 on page 8 are examples of such assignments. The necessary semantic relationship between / and / ' can 24 be expressed using the well-known update operator: ..2 © ( / ) = u p d a t e ( 0 ( / ) , 0 ( T 1 ) , 0 ( T 2 ) ) (3.1) Rather than support update as an interpreted second order function symbol in the logic, our decision procedure, described in the next chapter, implicitly enforces the constraint (3.1) (see Section 4.3). Assignments that modify a data field d e D are handled similarly by using the primed symbol d' for each modified d. Such an assignment has the general form where T is a term, and b is a data variable. Lines 3 and 5 of the H M P of Figure 2.3 on page 9 are examples of such assignments. Analogously to (3.1), the semantic relationship between d and d! is Our decision procedure also knows to implicitly enforce the constraint (3.2). 2 I f g is a function, a is an element in g's domain, and b is an element in g's codomain, then update(g,a,fo) is defined to be the function Ax.(if x = a then b else g(x)). d(x) := b ®(d') = update(0(rf),0(T),0(Z>)) (3.2) 25 Chapter 4 Decision Procedure Predicate abstraction, the previously described technique we are using for H M P verification, makes a number of queries to a decision procedure in order to verify an HMP. Each query is the conjunction of a set of literals in the employed logic. Therefore, the decision problem we aim to solve here is this: given a finite set of literals 3> from the logic proposed in the previous chapter, does there exist a heap structure H such that i.e. H models all the literals in the set? If there is such an H, then we say that 4> is satisfiable, otherwise 4> is unsatisfiable. One approach to this problem would be through a small model theorem, akin to other transitive closure logics [BPZ05, BRS99, RZ05, LQ06]. Unfortunately, even a very small "small model" bound can generate impractical memory requirements, because the number of heap structures with n nodes is at least nn. Our saturation-based approach, on the other hand, has small memory requirements, and is based on the exhaustive application of a set of inference rules. 4.1 Inference Rules The inference rules (IRs) attempt to prove unsatisfiability by deriving a contradiction, meaning the inference of both an atom <j) and its negation Figure 4.1 gives an ex-ample of a more involved IR r. A n antecedent of IR r is a literal appearing above the line, 26 /*(*,?) f*(y,z) f*(z,x) btwn/(x,y,z) btwn / (y ,z ,x) btwn f{z,x,y) :) btwn/{x,z,y) :) btwn / (z,v,x) x = y -) btwn / (y,x,z) x = z y = z Figure 4.1: Inference Rule Example. Here x, y, and z range over variables V and / £ F ranges over pointer fields. while a consequent is a set of vertically stacked literals appearing below the line. We say that an IR r is applicable (to a set of literals 4>) if there are terms appearing in 4> such that when these terms are substituted for the term placeholders of r (i.e. x, y, z), all of r's antecedents appear in <&, and none of r's consequents appear in 4>, where a consequent n is defined as appearing in * if for each literal 0 £ II it is also the case that 0 £ <1>. We define the formal meaning of an IR with the antecedents A and consequents C as usual: The basic IRs are presented in Figures 4.2 and 4.3. We now give a brief intuition behind the IRs given in Figure 4.2: IDENT - states that each node variable is equal to itself. R E F L E X - enforces that any node variable is reachable from itself. T R A N S 1 - states that the transitive closure / * must extend the function / . T R A N S 2 - simply enforces that / * is transitive. F U N C - asserts that if f(x) = y and z is reachable from x, then z must also be reachable from y, unless x=z. C Y C L E * - formalizes that i f there is a cycle of length k > 1 in / , then it follows that any node y reachable from a node on the cycle must be on the cycle as well. Note that C Y C L E * actually defines a separate rule for each k> I. yeA nec <pen 27 -IDENT X=X f*(x,x) -REFLEX f*(x,y) •TRANS 1 f*(x,y) f*(y,z) TRANS2 f(*)=y r(xa) x=z f*(y,z) -FUNC f(xi)=x2 f(x2)=x3 f(xk)=xi y) y=x\ y=x2 y=xk C Y C L E * d(x) -rf(y) ->x=y •NOTEQNODES r(x,y) f*(x,z) f*(y,z) f*(z,y) -TOTAL f*(x,y) f*(y,x) f*(x,z) x=y f*{z,x) S e c f(x)=z f(y)=z r(x,y) f*(y,x) x—y SHARE Figure 4.2: Basic Inference Rules. Here x, y, z, etc. range over variables V, f € F ranges over pointer fields, and d £D ranges over data fields. Note that C Y C L E * actually defines a separate rule for each k > 1. 28 -BTWREFLEX f(x,y) rM / ( * )=* B T W 1 b twn / (x , x , x ) b twn / (x , y , z ) b twn / ( x , y , z ) ( ) = b t w n / ( x , y , z ) D , f*(x,y) BTW2 7 7 / V Y BTW3 b twn/ (w,y ,z ) x = y b twn / (x , y , z ) b twn / ( x , z , y ) B t w < | / * ( x , y ) / * ( x , z ) fiTw5 y = z b twn / (x ,y ,z ) b twn / ( x , z , y ) r ( * , y ) / * ( y , z ) / * ( * , * ) b twn / (x , y , z ) b twn / (x ,z ,y ) B t w 6 b twn / ( y , z , x ) b twn / (z ,y ,x ) x = y x=z y—z b twn / ( z , x , y ) b t w n / (y,x,z) r(x,y) btwn/ (x ,x ,y ) BTW7 btwn/ (x ,y ,y ) b twn / (x , y , z ) f(x)=z B T W g / ( z ) = w b twn / ( x , y ,w ) / * ( x , z ) B t w 9 y=x y = z b twn / (x ,y ,z ) y=w btwn/ (x ,y ,z ) b twn/ (w,z ,y ) f*(x,w) B t w 1 0 f*(z,w) y.=z btwn/ (w,x ,y ) b twn/ (w,y ,z ) - B t w U btwn/(w,x,z) btwn/ (v ,M , x ) b twn/ (v ,n ,y ) b t w n / ( « , x , y ) ^ 2 btwn/ (v ,x ,y ) Figure 4 .3 : Between Inference Rules. Here x, y, z, etc. range over variables V and / e F ranges over pointer f ields. 29 N O T E Q N O D E S - ensures that if the values of a data field of two nodes are not equal, the nodes are not equal as well. TOTAL - requires that i f y and z are both reachable from another node x, then there must exist some reachability relationship between y and z. S C C - states that if x and y are distinct and mutually reachable from each other, and z is reachable from x, then x is reachable from z (since x must lie on a cycle of / ) . It is similar to F U N C , though the two IRs are irredundant with respect to each other. S H A R E - captures the fact that in a cycle of / , no two distinct nodes x and y can have m = m . Similarly, we give an intuition behind the between IRs given in Figure 4.3: B T W R E F L E X - enforces that any node variable is between itself. B T W I - i f a pointer field of node z points to x (i.e. /(z) =x) and y is on the cycle that includes x and z, then y lies between x and z. B T W 2 - states that i f node y lies between nodes x and z, then obviously y is reachable from x, and also z is reachable from y. B T W 3 - asserts that i f x, w, y, and z are on the same chain and f(x) — w, then y is also between w and z, unless x=y. B T W 4 - if y is between x and z, and also z between x and y, then it has to be that y and z are equal. B T W 5 - formalizes the fact that if y and z are both reachable from x, either y lies between x and z, or z lies between x and y. B T W 6 - handles the case when x, y, and z are all on the same cycle. B T W 7 - ensures that betweenness includes the endpoints. 30 B T W 8 - if the distance between x and z is one, and y is between x and z, then y is either equal to x or to z. B T W 9 - asserts that i f x, y, z, and w are on the same chain, y is between x and w, and / ( z ) =w, then y is also between x and z, unless y = w. B T W I O - covers "lollipop" shaped structures where x is a node in the stick leading to the circle of which y and z are a part. It follows that w is on the circle as well, unless y = z. B T W I 1 - i f nodes w, x, y, and z are chained in that order, which is captured by the an-tecedents, then node x is between nodes w and z. B T W I 2 - similarly to B T W I 1, if nodes v, u, x, and y are chained in that order, then node x is between nodes v and y. Given the preceding intuition, it is easy to prove the following: Theorem 1. The inference rules of Figures 4.2 and 4.3 are sound. Proofs of the inference rules related theorems are in Appendix A . Theorem 1 tells us that the iterative application of the IRs preserves the satisfiability status of the initial set of literals, that is, if it yields a contradiction along every branch caused by the IRs with multiple consequents, then we can conclude that the original set of literals is unsatisfiable. To prove completeness, we first reduce the problem to sets of literals in a certain normal form, then prove completeness for only normal sets, defined bellow. Definition 1 (normal). Let Vars(<&) denote the subset of the node variables V appearing in 4>. A set of literals $ is said to be normal if all terms appearing in <$> are variables, except that for each f G F and v G Vars(<J>) there may exist at most one equality literal of the form / (v) = u, where u G Vars(4>). 31 Theorem 2. There exists a polynomial-time algorithm that transforms any set into a normal set <f>' such that O' is satisfiable if and only «/<E> is satisfiable. The algorithm exhaustively replaces occurrences of terms of the form / (v) with freshly introduced variables Vfresh, and consequently adds equality literals f(v) = Vfresh to the set. Let us call a set of literals <t> consistent if it does not contain a contradiction, and call <J> closed if none of the IRs of Figures 4.2 and 4.3 are applicable. Thus, we have our completeness theorem, which is the foundation of our decision procedure: Theorem 3. If<$> is consistent, closed, and normal, then 4> is satisfiable. Corollary 1. For any normal set of literals, the inference rules of Figures 4.2 and 4.3 are complete. Intuitively, i f the iterative application of the IRs on the initial, normal set of literals saturates (i.e. generates a closed set of literals), then the initial set of literals is satisfiable, which brings us to our decision procedure algorithm. 4.2 Basic Decision Procedure Viewed from a high level, the decision procedure first applies the transformation of Theo-rem 2 and transforms any initial set of literals to a normal one. Then, the procedure invokes the core algorithm that is presented in Figure 4.4, which can without loss of generality assume that <£> is normal. The core decision procedure algorithm repeatedly searches for an applicable IR, ap-plies it (i.e. adds the literals of one of its consequents to the set), and recurses. The recursion is necessary for those IRs that branch, i.e. have multiple consequents. If the procedure ever infers a contradiction, it backtracks to the last branching IR with an unexplored consequent, or returns unsatisfiable i f there is no such IR. If the procedure reaches a point where there are no applicable IRs and no contradictions, then the inferred set of literals is consistent, closed, and normal. Hence, by Theorem 3, it may correctly return satisfiable. We note that 32 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 function DECIDE(<J>) if <$> contains a contradiction then return U N S A T end if if there exists an IR r applicable to 4> then for each consequent II of r do for each literal 0 of II do if § is an equality literal of the form v, = v, then *' := *[Vi/vj] else <J>' : = 4>u{4>} end if end for if DECIDE(4>') = S A T then return S A T end if end for return U N S A T else return S A T end if end function Figure 4.4: Pseudocode of the Core Decision Procedure Algorithm. The procedure D E C I D E requires <f> to be a normal set of literals. The notation *[v,-/vy-] on line 9 represents the set obtained by replacing all occurrences of Vj in literals of <& with v,-. our decision procedure is guaranteed to terminate because none of the IRs introduce new terms. Theorem 4. The decision procedure always terminates. For the proof of this theorem and lemmas that demonstrate the correctness of the algorithm, see Appendix B . 4.3 Decision Procedure Extension for Handling Updates In Section 3.2, we introduced additions for handling pointer and data function updates to our logic. Here, we enforce the introduced update constraints (3.1) and (3.2) by adding a 33 number of additional IRs to our decision procedure. Initially, recall that in each pointer function update constraint / ' = u p d a t e ( / , T i , T 2 ) , (3.1) the symbol / ' is the pointer function after the update. Therefore, each of the IRs of Fig-ures 4.2 and 4.3 that mention a pointer function apply to / ' also. To enforce the constraint (3.1), which involves pointer function / before the update and / ' after the update, we need additional IRs that mention both pointer functions. There-fore, the decision procedure includes the eight IRs of Figure 4.5. For each pointer function update, the IR UPDATE introduces a fresh variable w that is forced to be equal to f(x\). The reason behind introducing w is to preserve all literals in the normal form (page 31) when applying IRs. Other IRs refer to these freshly introduced variables. This allows us to state that / = u p d a t e ( / , T i , w ) , and therefore we have the obvious symmetry between the IRs UPDFUNCI and UPDFUNC2, between UPDTRANSI and UPDTRANS2, and between UPDTRANS3 and UPDTRANS4. Note that some of the IRs in Figure 4.5 can introduce new terms, however, given a normal set of literals, the number of new terms is bounded. This implies that the extended deci-sion procedure also always terminates. Next, we give a brief intuition behind the IRs in Figure 4.5: UPDATE - enforces the update constraint (3.1) on / ' , and also introduces a fresh variable w that is forced to be equal to f(T\). UPDBTWN - asserts that i f y is between x and z before the update, then after the update, y is also between x and z, unless the node T i , whose pointer function is updated, is also between x and z. If T i is between x and z, and not equal to z, its update can alter the fact that y is between x and z. UPDFUNC 1 - if f(x) =y before the update, then after the update also f'(x) —y, unless x is the node X\. 34 UPDFUNC2 - analogously to UPDFUNCI , if f'(x) = y after the update, then before the update also f(x) =y, unless x is the node X\. UPDTRANSI - i f y is reachable from x before the update, then after the update y is also reachable from x, unless the node t\ is on the path from x to y. U P D T R A N S 2 - analogously to UPDTRANS 1 , if y is reachable from x after the update, then before the update y is also reachable from x, unless the node Tj is on the path from x to y. UPDTRANS3 - if Ti and y are reachable from x before and after the update, respectively, then we case-split on whether T\ comes before or after y following the path from x. If Ti comes before y, the update can influence the reachability between x and y. UPDTRANS4 - analogously to UPDTRANS3, if Ti and y are reachable from x after and before the update, respectively, then we case-split on whether X\ comes before or after y following the path from x. Similarly to pointer function updates, recall that in each data function update con-straint d! = update(d,T,fc) , (3.2) the symbol d' is the data function after the update. Therefore, the IR NOTEQNODES of Figure 4.2 applies to d' also. To enforce the data update constraint (3.2), which mentions both d and d', we add the four IRs of Figure 4.6 to our decision procedure. Here, we give the intuition behind the added rules: EQDATA - enforces the data update constraint (3.2), that is, the data function value of the node T, whose data function is updated, has to be equal to the boolean variable b that is assigned to it. PRESERVEVALUE - data values of nodes that are not equal to the node x have to be pre-served. 35 b t w n / ( x , y , z ) - a = z f ' ( T i ) = T2 UPDATE , , . b t w n / - ( j c , T i , z ) UPDBTWN / ( T i ) = W 7 i T i = Z X=TI /'(*)=y UPDFUNCI X=TI f(x)=y UPDFUNC2 y = w y = ^ 2 f*(x,y) f*(x,y) f* (x, Ti) f'*{x,y) UPDTRANSI f*{x,xx) f*(x,y) UPDTRANS2 /*(* ,Ti) r{x,y) _UpDTRANs3 /*(*,*.) C(f,y\ U P D T R A N S 4 f"(x.j) r(T,,y) /'*(*,y) r(T, ,y) Figure 4.5: Pointer Update Inference Rules. These are used to extend our logic to support a second pointer function symbol / ' for each updated / £ F, with the implicit constraint / ' = upda te ( / ,T i ,T2) , where Ti and T 2 are variables, and w is a fresh variable used to capture f(X\). Note that each pointer update introduces its own, unique variable w, and therefore for each update we actually introduce a separate set of IRs. 36 ->z=x b d'(x) -id'(r) EQDATA -^b d{x) d'(x) -<d(x) PRESERVEVALUE ->d'(x) d(x) -d'(x) •EQNODESI -d(x) d'(x) •EQNODES2 X=X X — X Figure 4.6: Data Update Inference Rules. These are used to extend our logic to support a second data function symbol d' for each updated d G D , with the implicit constraint d' — update(<i, x,b), where x e V and b is a boolean variable. EQNODESI , EQNODES2 - nodes whose data function value changes have to be equal to Using the presented intuition behind pointer and data function update rules, we prove the following: Theorem 5. The inference rules of Figures 4.5 and 4.6 are sound; The proof of this theorem is provided in Appendix A . Extending our decision procedure with these additional inference rules allows us to soundly conclude unsatisfiability of a set of literals involving both pointer and data function updates, with the implicit constraints We don't have a proof that this extended set of IRs is complete. Fortunately, not having such a theorem does not compromise the soundness of verification by predicate abstraction. Each time the predicate abstraction engine wants to determine whether some abstract state is reachable, the decision procedure is queried for the satisfiability of formula (2.1). If the decision procedure falsely infers that the formula is satisfiable, when it is actually unsatisfiable, the predicate abstraction algorithm wil l add the unreachable abstract state to the set of abstract states we are assuming are reachable. Therefore, the computed set of reachable abstract states won't be the most precise one, but an over-approximation. Having an over-approximated set of reachable abstract states doesn't preclude soundness of the verification, i.e. i f the program is verified, it satisfies all specified properties. The the node x. (3.1) and (3.2). 37 over-approximation may increase the number of falsely reported property violations (see Section 2.2). However, in our practical experiments of Chapter 5, we never found any property violations caused by the extended decision procedure erroneously concluding that a set of literals was satisfiable. 38 Chapter 5 Experiments This chapter presents the results of testing our framework and the proposed simple transitive closure logic decision procedure on a number of H M P examples. We implemented the de-cision procedure (called straclos1) used in our experiments in C++, and the implementation is publicly available 2. The examples we used in our experiments perform different operations on acyclic and cyclic, singly- and doubly-linked lists. Appendix C provides pseudocode and lists the required predicates for all examples. Here, we give a short summary for each example: LIST-REVERSE is a classical H M P example that performs in-place reversal of a linked list. LIST-ADD first traverses a linked list. Then, it adds a node to the end of the list. ND-INSERT nondeterministically inserts a node into the linked list. (Pseudocode for this example was previously given in Figure 2.2 on page 8 in the background section on HMPs.) ND-REMOVE is similar to ND-lNSERT, except that instead of inserting a node, a node is nondeterministically chosen and removed from the list. 1straclos stands for Simple TRAnsit ive CLOSure logic. 2straclos can be downloaded from http:/ /www.cs.ubc.ca/~zrakamar. 39 ZIP zips two linked lists, shuffling the elements of both list into one. Then, the tail of the longer list is appended to the resulting list. This example is taken from a paper by Jensen et al. [JJKS97]. SORTED-ZlP merges the elements of two sorted lists into one, also sorted. Here, the data elements are simply booleans, so "sorted" means that all nodes with data fields whose value is false come before nodes with data fields whose value is true. This is sufficient to express sortedness of a list of any finite enumerated type (for example i n t ) . SORTED-lNSERT inserts a node into a sorted linked list so that sortedness is preserved. This is a modification of the example from a paper by Lahiri and Qadeer [LQ06]. 3 BUBBLE-SORT sorts elements of a linked list using the bubble sort algorithm. It is taken from a paper by Balaban et al. [BPZ05]. The data fields on which we sort are again booleans. REMOVE-ELEMENTS removes from a cyclic list elements whose data field is false. REMOVE-SEGMENT removes the first contiguous segment of elements whose data field is true from a cyclic singly-linked list. This example is taken from a paper by Manevich et al. [MYRS05] . SEARCH-AND-SET searches for an element with specified data fields in a cyclic singly-linked list, and sets data fields of previous elements to true. SET-UNION combines two cyclic singly-linked lists. Each list represents a set that is uniquely defined with the data field value of its nodes. Therefore, the combination of two lists represents the set union, and the data fields of both lists have to be set to the same value. This example is taken from a paper by Nelson [Nel83]. 3To simplify things, they require that the input list starts with a dummy element whose data field value has to be less than all possible values of that data field. We don't have such requirements in our example, which makes it slightly more complicated. 40 CREATE-INSERT creates a new node (malloc4) and inserts it nondeterministically into a linked list. CREATE-INSERT-DATA creates a new node, initializes its data field, and inserts it nonde-terministically into a linked list. CREATE-FREE creates a new node and inserts it nondeterministically into a linked list. Also, nondeterministically removes a node from the linked list and frees5 it. INIT-LIST initializes the data fields of an acyclic singly-linked list. INIT-LIST-VAR similarly to INIT-LIST, initializes the data fields of an acyclic singly-linked lists, but also sets the value of a global data variable before terminating. INIT-CYCLIC initializes data fields of a cyclic singly-linked list. S O R T E D - l N S E R T - D N O D E S inserts an element into a sorted linked list so that sortedness is preserved. Every node in the linked list has an additional pointer to a node that contains a data field which is used for sorting. REMOVE-DOUBLY removes an element from an acyclic doubly-linked list. REMOVE-CYCLIC-DOUBLY removes an element from a cyclic doubly-linked list. This example is taken from a paper by Lahiri and Qadeer [LQ06]. LINUX-LIST-ADD, LINUX-LIST-ADD-TAIL, LINUX-LIST-DEL are three examples from the Linux kernel list container6, which add and remove nodes from a cyclic doubly-linked list. The data fields and data variables in all of our examples are booleans. The safety properties the tool checked (when applicable) at the end of the H M P s are roughly: Amalloc is modeled as removing a node from the unreachable infinite cyclic list [RSL03]. -'free is modeled in the same fashion as malloc, as adding a node to the unreachable infinite cyclic list. 6Linux kernel version 2.6.13; list container source file is i n c l u d e / l i n u x / l i s t .h; verified functions are l is t_add, l i s t_add_ ta i l , and l i s t jdel . 41 • no leaks (NL) - all nodes reachable from the head of the list at the beginning of the program are also reachable at the end of the program. • insertion (IN) - a distinguished node that is to be inserted into a list is actually reach-able from the head of the list, that is, the insertion "worked". • acyclic (AC) - the final list is acyclic, that is, nil is reachable from the head of the list. • cyclic (CY) - list is a cyclic singly-linked list, that is, the head of the list is reachable from its successor. • doubly-linked (DL) - the final list is a doubly-linked list. • cyclic doubly-linked (CD) - the final list is a cyclic doubly-linked list. • sorted (SO) - list is a sorted linked list, that is, each node's data field is less than or equal to its successor's. • data (DT) - data fields of selected (possibly all) nodes in a list are set to a value. • remove elements ( R E ) - for examples that remove node(s), this states that the node(s) was (were) actually removed. For the program R E M O V E - E L E M E N T S , R E also asserts that the data field of all removed elements is false. Often, the properties one is interested in verifying for H M P s involve universal quantification over the heap nodes. For example, to assert the property N L , we must express that for all nodes t, i f t is reachable from head initially, then t is also reachable from head (or some other node) at the end of the program. Since our logic doesn't support quantification, we use the trick of introducing a Skolem constant t [FQ02, BPZ05] to represent a universally quantified variable. Here, t is a new node variable that is initially assumed to satisfy the antecedent of our property, and is otherwise unmodified by the program. For the example program of Figure 2.2 on page 8, we can express N L by conjoining —if = nil A _/** (head, t) to the a s s u m e statement: a s s u m e ^f*(head,item) Af*(head, nil) A —<head= nil A f(item) — n\\ A p = head 42 on line 2, and conjoining f*(head,t) to the assertion: assert /*(head,item) Af*(head, nil) on line 12. Since (after the assume) t can be any non-nil node reachable from head, i f the assertion is never violated, we have proven N L . Table 5.1 summarizes the results of the experiments, which were run on a 2.6 Ghz Pentium 4 machine. As the table shows, we were successful in verifying interesting prop-erties of many examples quickly and in small amounts of memory. It is hard to make a good comparison with other tools and approaches because the heap structures the tools are able to handle, their expressiveness, and the amount of required manual effort vary greatly.7 Furthermore, most tools are not publicly available, and most papers do not publish quantitative performance results. Here, we make a few comparisons to tools similar to ours, for which we know some performance results: The BUBBLE-SORT example is from Balaban et al. [BPZ05]. Our successful ver-ification of this example highlights the advantage of our inference-rule-based approach against their state-of-the-art small-model-theorem-based approach, which spaced out on this problem [Bal05]. The recent experimental results of Manevich et al. [MYRS05] report comparable execution times to us, in spite of the fact they were executed on a slower machine. 8 For most of their examples, however, their times are for verifying only the simple property of no-null-dereferences (and cyclicity for two examples). Our times are for verifying more complicated properties, for instance N L . In addition, for most of the examples, we verify more than one property in a single run. For the examples in common with Lahiri and Qadeer [LQ06], we are vastly faster at verifying the same properties, with speed-ups of roughly 1 to 3 orders of magnitude on all but one example. It should be noted, however, that we used a slightly faster machine, and 7For instance, some of the tools put the burden of providing loop invariants on the users, while we compute those automatically, which is a costly operation. 8Unfortunately, their tool hasn't been publicly released, and therefore we couldn't make a more thorough comparison. 43 Program Property C F G Preds DP Time calls (sec) LIST-REVERSE N L 6 8 184 0.2 LIST-ADD N L A A C A I N 7 8 66 0.1 ND-INSERT N L A A C A I N 5 13 259 0.5 N D - R E M O V E N L A A C A R E 5 12 386 0.9 ZIP N L A A C 20 22 9153 17.3 S O R T E D - Z l P N L A A C A SO A IN 28 22 14251 22.8 S O R T E D - l N S E R T N L A A C A SO A IN 10 20 5990 13.8 BUBBLE-SORT N L A A C 21 18 3444 11.1 BUBBLE-SORT N L A A C A SO 21 24 31446 114.9 REMOVE-ELEMENTS N L A C Y A R E 15 17 3062 8.8 REMOVE-SEGMENT C Y 17 15 902 2.2 SEARCH-AND-SET N L A C Y A DT 9 16 4892 5.3 SET-UNION N L A C Y A DT A IN 9 21 374 1.4 CREATE-INSERT N L A A C A I N 9 24 3020 14.8 CREATE-INSERT-DATA N L A A C A IN 11 27 8710 39.7 CREATE-FREE N L A A C A I N A R E 19 31 52079 457.4 INIT-LIST N L A A C A DT 4 9 81 0.1 INIT-LIST- VAR N L A A C A D T 5 11 244 0.2 INIT-CYCLIC N L A C Y A D T 5 11 200 0.2 S O R T E D - l N S E R T - D N O D E S N L A A C A SO A IN 10 25 7918 77.9 REMOVE-DOUBLY N L A D L A R E 10 34 3238 24.3 REMOVE-CYCLIC-DOUBLY N L A C D A R E 4 27 1695 15.6 LINUX-LIST-ADD N L A C D A IN 6 25 1240 6.4 LINUX-LIST-ADD-TAIL N L A C D A IN 6 27 1598 7.3 LINUX-LIST-DEL N L A C D A R E 6 29 2057 24.7 Table 5.1: Results of Verifying HMPs. "Program" is the verified H M P ; "Property" spec-ifies the verified property; " C F G " denotes the number of edges in the control-flow graph of the program; "Preds" is the number of predicates required for verification; "DP calls" is the number of decision procedure queries; "Time(sec)" is the total execution time in seconds. The experiments were executed on a 2.6 Ghz Pentium 4 machine. The memory usage for each of the experiments was less than 20 M B . 44 also that our data fields are booleans whereas theirs are abstract integers. For the REMOVE-CYCLIC-DOUBLY example, we are only two times faster. We suspect the the reason behind this is the usage of skolemization which sometimes requires a large number of predicates to define a cyclic, doubly-linked list. The limited support for universally quantified variables we are planning to add would solve this problem. In addition to these experiments, we ran straclos on a couple of queries for M O N A generated by field constraint analysis tool Bohne [ W K L + 0 6 ] . Initial results show that str-aclos is faster than M O N A , but the queries we have are too simple to make a more serious comparison. 45 Chapter 6 Conclusions and Future Work This thesis has presented a novel logic and accompanying decision procedure that are used in the verification of heap-manipulating programs using predicate abstraction techniques. A number of experiments demonstrate the usability and effectiveness of our work for ver-ification of H M P s that occur in practice. Of special note is our verification of three small, but real, Linux kernel list container functions, which use cyclic doubly-linked lists. These results clearly show the potential that this work has in the world of H M P verification. There are some obvious directions left for further improving the proposed logic. First is the addition of quantifiers. We have found that even minimal support for univer-sally quantified variables (e.g. allowing universal quantification over variables only in the beginning of a formula) would allow expression of many common heap structure attributes. For example, even if we introduce Skolem constants, as described in Chapter 5, the cur-rent logic cannot assert that two terms x and y point to disjoint linked lists. Intuitively, using skolemization we can express that some unconstrained node from the list x is not reachable from y, which doesn't necessarily mean that all nodes from the list x are not reachable from y. A single universally quantified variable would allow for this property (see Nelson [Nel79, page 22]). We also found that capturing disjointedness is necessary for verifying that L I S T - R E V E R S E example always produces an acyclic list; hence we were unable to verify this property. Our decision procedure can be enhanced to soundly support 46 universal quantification using heuristic quantifier instantiation techniques, i.e., eliminating quantifiers by instantiating them with terms appearing in a formula. The other logic limitation, that we see no immediate solution to, is our inability to express more involved heap structure properties, in particular trees. Our logic cannot capture "x points to a tree" because expressing that requires using transitive closure over multiple pointer functions (i.e. expressing the fact that some node is reachable from the tree root following a sequence of left or right pointers). It is likely that adding such a transitive closure operator to our logic would cause undecidability [IRR + 04a]. However, we believe that it is possible that an extension could be used to verify simple properties of programs that manipulate trees. For example, by supporting relations instead of pointer functions, we could verify that there are no memory leaks in H M P s that manipulate trees. Besides improving the logic, we also plan on investigating how existing techniques for predicate discovery and more advanced predicate abstraction algorithms mesh with our decision procedure. For instance, predicate abstraction algorithm could be enhanced to support quantifiers by employing indexed predicate abstraction [LB04]. We would also like to look into possible ways of extending our decision procedure to generate inter-polants [CraSl]} Interpolants have been successfully used for refining abstractions in soft-ware model checking [HJMM04], and can be efficiently generated from proofs of unsatis-fiability. Furthermore, by incorporating our decision procedure into a Nelson-Oppen style theorem prover [N079, MZ03], it would be possible to improve the precision of a heap abstraction used by the existing software verification tools that employ theorem provers [ B M M R 0 1 , HJMS02, FLL+02]. We already have the initial results (the sketch of the proof that our logic is stably infinite, which is an important Nelson-Oppen requirement) showing that incorporating the decision procedure can be done. Thanks to Ken McMillan for the proof-generation and interpolant suggestion. 47 Bibliography [Bal05] I. Balaban, 2005. Personal correspondence. [BLS02] R. E . Bryant, S. K . Lahiri, and S. A . Seshia. Modeling and verifying systems using a logic of counter arithmetic with lambda expressions and uninterpreted functions. In 14th International Conference on Computer Aided Verification (CAV), pages 78 - 92, 2002. [BMMR01] T. Bal l , R. Majumdar, T. D. Millstein, and S. K . Rajamani. Automatic pred-icate abstraction of C programs. In ACM Conference on Programming Lan-guage Design and Implementation (PLDI), pages 203-213, 2001. [BPR02] T. Bal l , A . Podelski, and S.K. Rajamani. Relative completeness of abstraction refinement for software model checking. In 8th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), 2002. [BPZ05] I. Balaban, A . Pnueli, and L . Zuck. Shape analysis by predicate abstraction. In 6th International Conference on Verification, Model Checking and Abstract Interpretation (VMCAI), 2005. [BR06] J. Bingham and Z. Rakamaric. A logic and decision procedure for predicate abstraction of heap-manipulating programs. In 7th International Conference on Verification, Model Checking and Abstract Interpretation (VMCAI), pages 48 207-221, 2006. Extended version: U B C Department of Computer Science Technical Report TR-2005-19. [BRS99] M . Benedikt, T. Reps, and M . Sagiv. A decidable logic for describing linked data structures. In European Symposium on Programming (ESOP), 1999. [Bry86] R. E . Bryant. Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers, C-35(8):677-691, August 1986. [CC77] R Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In 4th ACM Symposium on Principles of Programming Languages (POPL), pages 238-252, 1977. [CES86] E . M . Clarke, E . A . Emerson, and A . R Sistla. Automatic verification of finite-state concurrent systems using temporal logic specifications. ACM Trans-actions on Programming Languages and Systems (TOPLAS), 8(2):244-263, 1986. [CGJ+00] E . Clarke, O. Grumberg, S. Jha, Y. L u , and H . Veith. Counterexample-guided abstraction refinement. In 12th International Conference on Computer Aided Verification (CAV), pages 154-169, 2000. [Cra57] W. Craig. Linear reasoning. A new form of the Herbrand-Gentzen theorem. The Journal of Symbolic Logic, 22(3):250-268, 1957. [DD01] S. Das and D. L . D i l l . Successive approximation of abstract transition rela-tions,. In IEEE Symposium on Logic in Computer Science (LICS), 2001. [DD02] S. Das and D. L . D i l l . Counter-example based predicate discovery in predicate abstraction. In 4th International Conference on Formal Methods in Computer-Aided Design (FMCAD), 2002. 49 [DDP99] S. Das, D . L . D i l l , and S. Park. Experience with predicate abstraction. In 11th International Conference on Computer Aided Verification (CAV), 1999. [Dij72] E . W. Dijkstra. Notes on structured programming. In O.J. Dahl, E.W. Dijkstra, and C . A . R . Hoare, editors, Structured Programming, pages 1-82. Academic Press, 1972. [Dij76] E . W. Dijkstra. A Discipline of Programming. Prentice-Hall, Englewood Cliffs, New Jersey, 1976. [DN03] D. Dams and K . S. Namjoshi. Shape analysis through predicate abstraction and model checking. In 4th International Conference on Verification, Model Checking and Abstract Interpretation (VMCAI), pages 310-323, 2003. [DOY06] D. Distefano, P. O'Hearn, and H . Yang. A local shape analysis based on separation logic, 2006. [FLL+02] C. Flanagan, K . R. M . Leino, M . Lillibridge, G . Nelson, J. B . Saxe, and R. Stata. Extended static checking for Java. ACM SIGPLAN Notices, 37(5):234-245, 2002. [FQ02] C. Flanagan and S. Qadeer. Predicate abstraction for software verification. In 29th ACM Symposium on Principles of Programming Languages (POPL), pages 191-202, 2002. [GJ90] M . R. Garey and D. S. Johnson. Computers and Intractability; A Guide to the Theory of NP-Completeness. W. H . Freeman & Co., New York, NY, U S A , 1990. [Gri81] D . Gries. The Science of Programming. Springer, New York, 1981. [GS97] S. Graf and H . Saidi. Construction of abstract state graphs with P V S . In 9th International Conference on Computer Aided Verification (CAV), 1997. 50 [HJMM04] T. A . Henzinger, R. Jhala, R. Majumdar, and K . McMi l l an . Abstractions from proofs. In 31st ACM Symposium on Principles of Programming Languages (POPL), pages 232-244, 2004. [HJMS02] T. A . Henzinger, R. Jhala, R. Majumdar, and G. Sutre. Lazy abstraction. In 29th ACM Symposium on Principles of Programming Languages (POPL), pages 58-70, 2002. [IRR +04a] N . Immerman, A . Rabinovich, T. Reps, M . Sagiv, and G . Yorsh. The boundary between decidability and undecidability for transitive closure logics. In 18th International Workshop on Computer Science Logic (CSL), pages 160-174, 2004. [IRR + 04b] N . Immerman, A . Rabinovich, T. Reps, M . Sagiv, and G . Yorsh. Verifica-tion via structure simulation. In Conf. on Computer Aided Verification (CAV), 2004. [JJKS97] J. L . Jensen, M . E . J0rgensen, N . Klarlund, and M . I. Schwartzbach. Automatic verification of pointer programs using monadic second-order logic. In ACM Conference on Programming Language Design and Implementation (PLDI), pages 226-236, 1997. [KMS00] N . Klarlund, A . M0ller, and M . I. Schwartzbach. M O N A implementation secrets. In 5th International Conference on Implementation and Application of Automata (CIAA), 2000. [KP00] Y. Kesten and A . Pnueli. Verification by augmented finitary abstraction. In-formation and Computation, 163(l):203-243, 2000. [KS93] N . Klarlund and M . I. Schwartzbach. Graph types. In 20th ACM Symposium on Principles of Programming Languages (POPL), pages 196-205, 1993. [LAIR+05] T. Lev-Ami , N . Immerman, T. W. Reps, M . Sagiv, S. Srivastava, and G. Yorsh. Simulating reachability using first-order logic with applications to verification 51 of linked data structures. In Conference on Automated Deduction (CADE), 2005. [LAS00] T. Lev-Ami and M . Sagiv. T V L A : A system for implementing static analyses. In 7th International Static Analysis Symposium (SAS), pages 280-301, 2000. [LB04] S. K . Lahiri and R. E . Bryant. Constructing quantified invariants via predicate abstraction. In 5th International Conference on Verification, Model Checking and Abstract Interpretation (VMCAI), pages 267-281, 2004. [LQ06] S. K . Lahiri and S. Qadeer. Verifying properties of well-founded linked lists. In 33rd ACM Symposium on Principles of Programming Languages (POPL), pages 115-126, 2006. [LRS05] A . Loginov, T. W. Reps, and S. Sagiv. Abstraction refinement via inductive learning. In 17th International Conference on Computer Aided Verification (CAV), pages 519-533, 2005. [MN05] S. McPeak and G. C. Necula. Data structure specifications via local equality axioms. In 17th International Conference on Computer Aided Verification (CAV), pages 476-490, 2005. [MNCL06] S. Magil l , A . Nanevski, E . Clarke, and P. Lee. Inferring invariants in sepa-ration logic for imperative list-processing programs. In 3rd Workshop on Se-mantics, Program Analysis, and Computing Environments for Memory Man-agement (SPACE), 2006. [MS01] A . M0ller and M . I. Schwartzbach. The pointer assertion logic engine. In ACM Conference on Programming Language Design and Implementation (PLDI), pages 221-231, 2001. [MYRS05] R. Manevich, E . Yahav, G . Ramalingam, and M . Sagiv. Predicate abstraction and canonical abstraction for singly-linked lists. In 6th International Confer-52 ence on Verification, Model Checking and Abstract Interpretation (VMCAI), pages 181-198, 2005. [MZ03] Z. Manna and C. G. Zarba. Combining decision procedures. In Formal Meth-ods at the Cross Roads: From Panacea to Foundational Support, volume 2757 of Lecture Notes in Computer Science, pages 381-422. Springer, 2003. [Nel79] G. Nelson. Techniques for program verification. PhD thesis, Stanford Univer-sity, 1979. [Nel83] G. Nelson. Verifying reachability invariants of linked structures. In 10th ACM Symposium on Principles of Programming Languages (POPL), pages 38-47, 1983. [N079] G. Nelson and D. C. Oppen. Simplification by cooperating decision pro-cedures. ACM Transactions on Programming Languages and Systems (TOPLAS), l(2):245-257, 1979. [RBH06] Z. Rakamaric, J. Bingham, and A. J. Hu. A better logic and decision procedure for predicate abstraction of heap-manipulating programs. Technical Report TR-2006-02, UBC Department of Computer Science, January 2006. [RSL03] T. Reps, M . Sagiv, and A. Loginov. Finite differencing of logical formulas for static analysis. In European Symposium on Programming (ESOP), pages 380-398, 2003. [RZ05] S. Ranise and C. G. Zarba. A decidable logic for pointer programs manipulat-ing linked lists. Unpublished manuscript, 2005. [WKL+06] T. Wies, V. Kuncak, P. Lam, A. Podelski, and M. Rinard. Field constraint analysis. In 7th International Conference on Verification, Model Checking and Abstract Interpretation (VMCAI), 2006. 53 [YRS04] G. Yorsh, T. Reps, and M. Sagiv. Symbolically computing most-precise ab-stract operations for shape analysis. In 10th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 530-545, 2004. [YRS+06] G. Yorsh, A. Rabinovich, M. Sagiv, A. Meyer, and A. Bouajjani. A logic of reachable patterns in linked data-structures. In Foundations of Software Science and Computation Structures (FOSSACS), 2006. 54 Appendix A Proofs In this appendix, we prove all theorems except the decision procedure related Theorem 4, which is proven in Appendix B. The proofs presented here are largely due to Jesse Bingham, who was a coauthor of the published papers on the topic. I have included the proofs in my thesis for the sake of completeness of the presentation. A.l Proof of Theorem 1 Our proof of Theorem 1 uses the following notation and lemmas. Let us fix a heap structure (N,&), and, in a slight abuse, we identify a term x with its interpretation @(x) (which is a node in AO and we also identify the symbol / with © ( / ) . Let x,y EN, then S(x,y) denotes the minimum n such that y — fn (x) if such an n exists, otherwise 8(x,y)=°°. (Hence 8 (x, y) is simply the graph-theoretic directed distance from x to y in the graph of /). Lemma 1. For any nodes x, y, andz, ifS(x,y) is finite, 8(x,z) is finite, andS(x,y) < 8(x,z) then btwnf(x,y,z). Proof. Follows trivially from the semantics of the btwn/ operator. • Lemma 2. Iff(y) — x, then either 8(x,y) = <», or 8(x,y) is finite and 8{x,y) > 8(x,z) for all z such that that 8(x, z) is finite. 55 Proof. If 8(x,y) is finite, then x and y must lie adjacent on a cycle in / ; the result follows. • Lemma 3. If 5(x, y) <8 (x, z) <°° and x^y, then 8(y, z) < 8(y,x). Proof. If 8(y,x) = °o, then the lemma trivially holds. Otherwise we have 0<S(x,y),8(y,x) < °° and thus x, y, and z must occur on a cycle. The lemma follows. • Lemma 4. If btwn f(x,y,z), then 8(x,y) + 8(y,z) = 8(x,z). Proof. Since btwnf(x,y,z), there exist (graph theoretic) paths in the graph of / from x to y and y to z, and hence from x to z. Since the out-degree of all nodes in the graph of / is 1, it follows that all these paths are unique, and that the path from x to z is the concatenation of the path from x to y and the path from y to z. The lemma follows. • Theorem 1. The inference rules of Figures 4.2 and 4.3 are sound. Proof. We argue in turn that each inference rule is sound. Most between inference rule cases involve an implicit appeal to Lemma 1. IDENT, R E F L E X , T R A N S 1, T R A N S 2 . These rules are clearly sound. F U N C . If y = f(x) and z =. fn(x) for some n > 0, then in the case n = 0 we find x = z, and in the case n > 1 we have z = fn~x (y) and hence /*(z,y) holds. C Y C L E * , k > 1. Suppose all the antecedents hold; then y = fn(x\) for some n > 0 and thus y =x\+{n mod it) • N O T E Q N O D E S . The antecedents imply that d(x) ^ d(y), which implies x ^ y. T O T A L . Suppose all the antecedents hold. Then y = fn(x) and z = f"(x) for some n,m>0. Now if n > m, then y = / " _ m ( z ) , otherwise if n < m, then z = fm~n(y). Sec . Suppose all the antecedents hold. In the case x = y, then one of the consequents holds trivially. In the case x^y, then x is on a cycle of / , hence / * (x,z) implies / * (z,x). 56 S H A R E . Suppose all the antecedents hold, and suppose x ^ y. From the third and fourth antecedents, x and y lie on the same cycle of / , and it follows from the first antecedent that z is also on this cycle. If we restrict the domain of / to be this cycle, / must be a permutation, which contradicts f(x) = / (y) . B T W R E F L E X . Trivial. B T W I . From the antecedents it follows that 8(x,y) and 8(x,z) are both finite. In particular, z is reachable from x, and using Lemma 2 we have 8(x,y) < 8(x,z) and thus btwnf(x,y,z). BTW2. Trivial. B T W 3 . Let us suppose the antecedents hold, and x ^ y. It follows that 8(x,y) and S(x,z) are both positive and finite, and that 8(w,y) = 8(x,y) — 1 and 8(w,z) = 8(x,z) — 1. Since 8(x,y) < 8(x,z), this implies 8(w,y) < 8(w,z), and therefore btwn f(w,y,z). B T W 4 . From the antecedents it follows that 8(x,y) and 8(x,z) are both finite. From the first antecedent we have 8(x,y) < 8(x,z); from the second antecedent we have 8(x,y) > S(x,z). Thus S(x,y) = 8(x,z), implying y = z. B T W 5 . From the antecedents it follows that 8(x,y) and 8(x,z) are both finite. If 8(x,y) < 8(x,z), then we have btwnf(x,y,z). If 8(x,y) > 8(x,z), then we have btwn/(.x,z,y). B T W 6 . From the antecedents it follows that 8(a,b) is finite for all a,b e {x,y,z}. Ifx — y, x — z, or y = z, then one of the right three consequents holds. Hence we assume that x, y, and z are distinct nodes. Suppose that we have 8(x,y) < 8(x,z). Then, by Lemma 3, we have 8(y,z) < 8(y,x), and, again by Lemma 3, we have 8(z,x) < S(z,y). We conclude that btwn/(;t,y,z), btwn/(y,z,;t), and btwn/(z,jc,y) all hold. The proof that the second-to-leftmost branch of B T W 6 follows from 8(x,y) > S(x,z) is similar. B T W 7 . From the antecedent it follows that 8(x,y) is finite. Since 8(x,x) = 0 < 8(x,y), we have btwnf(x,x,y); since 8(x,y) < 8(x,y), we also have btwn/(^;,y,y). B T W 8 . From the antecedents it follows that 8(x,y) < 8(x,z) < 1. If 8(x,y) — 0 then y = x, while i f 8(x,y) — 1 then y = z. 57 B T W 9 . Suppose the antecedents hold, and y^w From btwn f(x,y, w) it follows that 8(x, w) is finite and positive, and 8(x,y) < 8(x,w). Since f(z) = w, we have 8(x,z) = 8{x, w) — 1. Thus 8(x,y) < 8(x,z), and hence btwnf(x,y,z). B T W I O . If y = z, then the right consequent holds, hence we assume y ^ z. From the antecedents and y ^ z it follows that z and y are on an /-cycle C. If w is on C, then the left consequent holds and we are done. Otherwise, let k > 0 be minimal such that fk(x) is in C. Since 8(x,w) < °°, we must have 8(x,w) < k, else w would be in C. It follows that 8(w,y) = 8(x,y) - S(x,w) and that 8(w,z) = 8(x,z) - 8(x,w), thus 8(w,y) < S(w,z), since 8(x,y) < 8(x,z), which contradicts the antecedent btwn f(w,z,y). B T W l l . From the antecedents it follows that 8(w,x) < 8(w,y) < 8(w,z) < °°. Thus, 8(w,x) < 8(w,z) < °°, implying btwn/(w,x,z). BTW12. From the antecedent btwn/(w,x,y), it follows that 8(u,x) < 8(u,y) < <». When we add 8(v,u) to this inequality, we get 8(v,u) + 8(u,x) < S(v,u) +8(u,y) < °°. From the antecedents btwn/(v,M,x) and btwn/(v,w,y), it follows that 8(v,u) + 8(u,x) = 8(v,x) and 8(v,u) + 8(u,y) — 8(v,y), respectively, by Lemma 4. Therefore, we conclude that 8(v,x) < 8(v,y) < °°, and thus btwny(v,x,y) is implied. • A.2 Proof of Theorem 2 Theorem 2. There exists a polynomial-time algorithm that transforms any set 4> into a normal set such that Q>' is satisfiable if and only if<b is satisfiable. Proof. Our transformation algorithm has two variables 4>o and <J>i of type "set of literal", such that initially we have 4>o = <t> and <E>i = 0. Now, while there exists mention of a term of the form f(y) (where v e V and / G F) in <t>0, create a fresh variable Vfresh, replace all occurrences of / (v) in 4>o with Vfresh, and add the literal f(v) — Vfresh to 4>i. Once we have no terms of the form / (v) in <50> let * 2 = *o U * i . Now, for each equality of the form v=u (where v and u are variables) in 4>2, replace the equality; Let be the set obtained by <£' satisfies Definition 1, is satisfiable i f and time. 58 all occurrences of u in 4>2 with v, and remove exhaustively applying this reduction. Clearly only i f <l> is, and is constructed in polynomial • A.3 Proof of Theorem 3 In order to prove Theorem 3, we wi l l demonstrate how, given a consistent, closed, and normal set of literals <$>, one can construct a heap structure / / * such that 1= 3>. For the remainder of this section, let us fix a consistent, closed, and normal set 4>, let V = Vars(<I>), and let F, D, and B be respectively the set of pointer fields, data fields, and data variables mentioned in 4>. The set of nodes of / / * wil l be V. For each / £ F, we define the relation / * * C V x V such that (u, v) £ / * * iff f* (u, v) £ <1>. For each v £ V and / £ F, let us define <vfC V x V as follows: u <vf w iff btwn/(v,w, w) £ 4>. Letrt/(v) = {u \ f*(v,u) £ 4>}. Lemma 5. For a// v £ V and f £ F , <^ « a tofa/ order onR/(v). Proof. <vj is clearly reflexive , since B T W R E F L E X is not enabled. Now suppose x<yy and y </ z. Then btwnf(v,x,y) and btwn/(v,y,z) are both in and thus so too is btwnf(v,x,z), since B T W I 1 is disabled. Therefore x <vf z, and <vj is transitive. Now suppose x <j y and y <vf x. Then btwnf(v,x,y) and btwn/(v,y,jc) are both in <£>, which, since B T W 4 is disabled, implies that x — y £ 4>, which implies that x = y (i.e. x and y are the same symbol in V) since 3> is normal. Thus <^ is antisymmetric. Finally, suppose x,y £Rf(v). Then f*(v,x) and / * (v,y) are in 3>, and thus, since B T W 5 is disabled, either x<yyory<yx. • Lemma 6. For all f £ F and v £ V, rne minimal element of<y is v. Proof. Suppose, on the contrary, that there exists some symbol w different from v such that w <yv and hence btwn/(v,w,v) £ <J>. Since B T W 7 is disabled, we also have that btwn/(v, w, w) £ But since B T W 4 is disabled, we also have v—w £ <I>, which contradicts <I> being normal. • 59 Now that we have Lemma 5, the following function is well-defined. Definition 1 (TJ/). For each f G F, define the function T]f : V —> V such that H/(v) =v if Rf(v) = {v}, otherwise T7/(v) is the <vj-minimal element ofRf(v) \ {v}. Definition 2 (/-basin). For f G F and v G V, if R/(v) = /?/(TJ/(v)) we say that v is an /-basin node and we call R/(v) a /-basin. Lemma 7. For all f G F and all v, u G V, if f(v) = then Tj/(v) = u. Proof. Suppose / (v) = M £ $ . Since T R A N S 1 is disabled, we have /*(v,u) G 4>, and hence u G Rf{v). Now if v and u are the same symbol, say v, then there cannot exist some other symbol w G V such that f*(v,w) G since C Y C L E I is disabled and <f> is normal. Thus Rf(v) = {v}, and from Def. 1,7]/(v) = v. On the other hand, i f v and u are distinct symbols, we claim that u is the <^-minimal element of R/(v) \ {v}. Suppose, on the contrary, there exists y G / ? / ( V ) \ V , H such thaty <y u. Than btwn/(v,y,w) G <1>, and we already have / (v) = « £ $ , However, this contradicts the facts that B T W 8 is disabled and <E> is normal. We conclude that f]f(v) = u. • Lemma 8. For all f G F, suppose x, y, and z distinct elements of V in the same SCC of /**, and y <y z. Then z<yx and x <y y. Proof. Since 4> is normal and x, y, and z are distinct symbols, we have that none of x=y, x=z, o r y = z a r e i n <f>. Now from our supposition, btwr\f(x,y,z),f*(x,y),f*(y,z),f*(z,x) G <I>. Since B T W 6 is disabled, either we also have btwn/(y,z,x), btwnf(z,x,y) G 4>, and our lemma holds, or we have btwn / (^ ,z ,y ) ,btwn / (z ,y ) A;) ) btwn / (y ,Jc ,z) G <£>. The latter case yields a contradiction, however, since having both b twn / (* ,y ,z ) ,btwn / (* ,z ,y) G <fr and B T W 4 disabled implies that z=y G <f>, which contradicts the first sentence of this proof. • Lemma 9. For all f G F, /** is reflexive and transitive. Proof. / * * is clearly reflexive and transitive since I D E N T and T R A N S 2 are disabled. • Lemma 10. For all f G F and v G V we have that R/(rj/(v)) C Rf(v). 60 Proof. Let J | /(v) = u. From Def. 1, there exists w G V such that btwn/(v,M,w) G 3>, and since B T W 2 is disabled, we have (v, u) G / * * . Therefore, by Lemma 9 we are done. • Lemma 11. For all f G F and v G V and let u = r//(v). Then either v is an f-basin node, and <" is identical to <vj- except v is made the maximal, or Rf(u) —Rf{y) \ {v} and <y- is <Vf restricted to Rf(u). Proof. If u and v are the same symbol v, then we must have R/(v) — {v} from Def. 1 and <y— {(v, v)}; thus the first bullet holds trivially. Thus we assume for the remainder of the proof that u and v are distinct symbols. We case-split on whether or not (w, v) G / * * . Case: (w,v) G / * * . Then Rf(u) 2 Rf(v) from Lemma 9, and thus, by Lemma 10, Rf(u) — R/(v). Now, let x and y be elements of Rf[u) \ {v}. We wish to show that x <" y implies x <y y. Let us assume that x <" y. If x and y are the same symbol, x, then the facts that (u,x),(v,x) G / * * and B T W 7 is disabled imply that x <"x and x <yx. Hence, we assume that x and y are distinct symbols. From Lemma 5 exactly one of x <vj- y or y <y x holds. In the former case, we are done. In the latter case, we have x <yv by Lemma 8. Also, from the definition of r\f, we have that u <vj y, and, again by Lemma 8, we have v <yj u. Since x <yv,v <y u, and B T W I 1 is disabled, we have x <^ u. Finally, by another application of Lemma 8, we conclude y <" x, which contradicts our assumption x<"y and the facts that x and y are distinct and <" is a total order. It remains to show that x <" v for all x G Rf(u). If x is v we are done; else if x is u we are done by Lemma 6. Hence, we assume that x is distinct from v and u. We have that u<yx and thus, by Lemma 8, x <" v. Hence v is the maximal element of <". Case: (u, v) 0/**. Then clearly v G- R/(u), and from Lemma 10, Rf(u) C R/(v) \ {v}. Conversely, choose w eR/(v) \ {v}. Then, from Def. 1, the fact that r//(v) = u, and Lemma 5 we have that u <vj w and hence btwn/(v, u, w) G 4>. Now, since B T W 2 is disabled, this implies that f*(u,w) G <I> and thus w eR/(u). Therefore, R/(u) =R/(v) \ {v}. Now we argue that <" is <Vf restricted to Rf(u). Let x,y 6 R/{u) be such that x <uf y. As in the previous case, we may assume that x and y are distinct symbols. Thus 61 btwn/(w,;t,y) G <f>. Since u is the <^-minimal element of Rf(y) \ {v}, we also have that btwn/(v,M,;c) G and btwn/(v,u,y) G 3>. It follows that btwn/(v,;e,y) G 4>, since B T W 12 is disabled, and thus x <y y. • Definition 3 (seqj-(y)). Given v G V and / e F , /ef seqj(v) be the infinite sequence over Rf(v) defined inductively as follows. Ifv is an f-basin node, then seq^(v) = s03, where s is the sequence of length \Rf(v) | wherein all elements ofR/(v) are listed according to the total order <y. Otherwise, seqf(v) = v • seqjr(r]f(v)) Lemma 12. For all f E F and v £ Vw e have that seqjiy) =v • seqf(t]f(v)) Proof. If v is an /-basin node, the result follows from Lemmas 6 and 11 and Def. 3. Other-wise, the Lemma follows trivially from Def. 3. • Lemma 13. For all f £ F and v e V and n>0, the nth element ofseqy(v) is T}" (v). Proof. B y induction using Lemma 12. • Lemma 14. For all f G F and v G V, the symbols appearing in seqjiy) are precisely R/(v). Proof. Let i > 0 be the position of the first /-basin node in seqj(v). Note that such a node must exist, else, by Lemma 11, we would have that Rf(v),Rf(r]f(v)),Rf(r]j(v)),... would be an infinite sequence of finite sets, each being a proper subset of the previous. We complete the proof by induction on i. If i = 0 then v is an /-basin node, the lemma follows trivially by Def. 3. Now assume / > 0 and the lemma holds for for all w G V with first /-basin node of seqj(w) begin at position i— 1. From Lemma 12, seqf(v) = v-seqf(r\f(v)), thus the first/-basin node of seqj(r\f(v)) is at position i—l. Therefore the symbols appearing in seqf(r}f(v)) are precisely /?/(n/(v)). Now from Lemma 11 we have that Rf(r}f(y)) = R/(v)\{v}; therefore, since the symbols appearing in seqj-(v) arevalong with those appearing in seqy(r//(v)), we are done. • Lemma 15. For all f G F, /** is the reflexive transitive closure ofrjf. 62 Proof. From Lemma 9 we have that / * * is reflexive and transitive. Thus, letting TJ J denote the reflexive transitive closure of 71/, we must show that T/jf = / * * . Suppose (v, u) G / * * ; then u £ /?/(v), and hence by Lemma 14, u appears in seqj(v). Let n be the position of an occurrence of u in seq^iy). B y Lemma 13 we have that u = n"(v), thus (v,w) G n^. Conversely, suppose that there exists n > 0 such that nj!(v) = u. Using a simple induction along with Lemma 10 and the fact that w G R/(w) for all w G V, one can show that (v,w) G y*(J> j-j Lemma 16. For all f € F and v G V, r/ie pre/a ofseqj(v) of length |/?/(v) | is r/ie elements ofRfiy) ordered according to <y Proof. As in the proof of Lemma 14, let i > 0 be the position of the first /-basin node in seqj-(v). We proceed by induction on /. If i = 0, then v is an /-basin node and the result follows from Def. 3. Now, assume the statement is true for sequences with first / -basin node at position /— 1 for some i > 0. Let u = rj/(v). From Lemma 12 we have that seqj(v) = v-seqf(u), and thus, from our inductive assumption, the prefix of seqj-(u) of length of length is the elements of R/(u) ordered according to <". From Lemmas 6 and 11 and the fact that v is not an /-basin node, it follows that R/(v) = R/(u) U {v}. Therefore, the first |/?/(v)| elements of R/(v) are R/(u) U{v} = R/(v). Furthermore, by Lemmas 6 and 11, we have the ordering requirement on this prefix as well. • Lemma 17. For all f G F and u, v, w G V, btwn f(u, v, w) 6 $ iff the minimal n and m where v — T}J(u) and w = Tj^(u) are such that n<m. Proof. (=>) Since B T W 2 is disabled, we have v,w G /?/(«) and thus, by Lemma 16, v and w appear in the first |/?/(M)| elements of seqj(u). Also, since btwn/(w,v,w) G 4>, we have that v <" w. Finally, using Lemmas 13 and 16, we have that the minimal n and m where v = rif(u) and w = n^(w) exist, and are such that n<m. (4=) Suppose the minimal n and m where v = rjj(u) and w = t]f{u) exist and are such that n<m. Thus, by Lemmas 13 and 14, v,w G / ? / ( « ) . Also, by Lemma 16, we must have v <ur w, therefore, by the definition of we have btwn j (w, v, w) G <E>. • 63 Definition 4. Let = (V, 0* ) be defined such that 0 * is the identity on V For each f £ F, let f be interpreted by 0 * as rjf. For each d G D and v G V, let 0*(rf)(v) true ifd(v) G <D false otherwise For each b G B, /ef true false otherwise Theorem 3. If<& is consistent, closed, and normal, then <f> is satisfiable. Proof. We argue that / / * N <J>. Let 0 be a positive literal of 4>; we show that / / * 1= 0, case-splitting on the type of 0. Here v, u, and w range over V; f £ F, d £ D, and b £ B. Also, only those literals of forms allowed in normal sets (see Definition 1) need be considered. v=v : clearly satisfied by any heap structure. / (v) = u : From Lemma 7 we have that r//(v) = u. d(v) : satisfied by / / * from Def. 4. 6 : satisfied by H® from Def. 4. / * (v, w) : From Lemma 15 we have that (v, u) is in the reflexive and transition closure of r/y. btwnf{y, u, w) : satisfied by by Lemma 17. Let - i0 be a negative literal of <J>; we show that / / * ^ 0, case-splitting on the type of 0. v=w : v and u must be distinct symbols, else <I> would contain a contradiction (since IDENT is disabled). Clearly / / * \fv — u, since v ^ M and 0 * is the identity on V. d(y) : not satisfied by from Def. 4. : not satisfied by / / * from Def. 4. 64 f*(v,u) : Since 4> is consistent, f*(v,u) S- * and thus, by Lemma 15, (v,u) is not in the reflexive and transitive closure of r/y. btwn/(v, w. w) : not satisfied by / / ° by Lemma 17. This completes the proof. • A.4 Proof of Theorem 5 Theorem 5. The inference rules of Figures 4.5 and 4.6 are sound. Proof. We use the same abuse of notation used in the proof of Theorem 1. U P D A T E . This IR is sound since w is a fresh variable. U P D B T W N . We employ the notation 5(-, •) used in the proof of Theorem 1, with the ad-ditional notation S'(x,y) to denote the distance in / ' from x to y. From the antecedents, it follows that 8(x,y) and 8(x,z) are both finite, and also that 8(x,z) > 0. Now suppose 8(x,Ti) > 8(x,z). Then it follows that 8'(x,z) = 8(x,z) and S'(x,y) — 8(x,y), and thus S'(x,y) < 5(x,z) and btwnf(x,y,z). On the other hand, i f 8(x,T\) < 8(x,z), then it follows that Ti ^zand btwnf(x,T\,z). U P D F U N C I . The inference rule clearly respect the facts that / ' = update( / ,Ti,T 2 ) and / = update( / ' ,Ti ,w). U P D F U N C 2 . Analogous to U P D F U N C I . U P D T R A N S 1. Suppose y = fn(x) for some n > 0. We case split on whether or not Ti = fm(x) for some m<n.\i so, then y = fn~m{%\) = f"-m~x (w), where n - m - 1 > 0, thus /* (w,y). If Ti / fm(x) for all m such that 0 < m < n, then f'j(x) = fj(x) for all j such that 0<j<n, hence f'*(x,y). U P D T R A N S 2 . Analogous to U P D T R A N S 1. U P D T R A N S 3. Suppose that Ti = fn(x) and y = f'm(x) for some n and m, and let our choices of n and m be minimal. Now if m < n we clearly have y = fm[x). Otherwise, i f m > n, then y = / " " - « ( T 1 ) . 65 U P D T R A N S 4 . Analogous to U P D T R A N S 3 . E Q D A T A . This IR ensures that after the update the value of a data function d'(z) is equal to b. This is clearly sound, because from the definition d' = update(d,T,o) of d' it can be seen that d'(%) has to be equal to b. P R E S E R V E V A L U E preserves values of data function of nodes which are not equal to T and therefore cannot be influenced by the update. It is clearly sound. E Q N O D E S I . Trivial, since under the constraint d' = update(d,z,b), we have that d(x) ^ d'(x) implies that x — Z. EQNODES2. Analogous to E Q N O D E S I . • A.5 Complexity of the Satisfiability Problem The proof of complexity of the satisfiability problem for our logic uses a reduction from the following decision problem, which is known to be NP-complete [GJ90]. Definition 5 ( B E T W E E N N E S S ) . The decision problem B E T W E E N N E S S asks, given a finite set A and a set C of triples (a,b,c) of distinct elements from A, if there exists a one-to-one function g : A —» { 1 , 2 , s u c h that for each (a,b,c) EC we have either g(a) < g(b) < g{c) or g(c) < g{b) < g{a). Theorem 6. Given a set of literals 4>, the problem of deciding if<& is satisfiable is NP-hard. This holds even when <J> contains no updates, no btwn predicates, no data fields, and only mentions a single pointer function f. Proof. We reduce B E T W E E N N E S S to satisfiability of a set of literals adhering to the sec-ond sentence of the theorem statement. Let (A,C) be an arbitrary instance of B E T W E E N -N E S S . 66 Given a set of terms T, let distinct(T) be the set of ('2') literals that enforces that the terms of T are pair-wise unequal, for example distinct({x,f(x),y}) = = / ( * ) , - * = y,-./(*) = y } Similarly, given two sets of terms T\ and T2, let distinct(T\, T2) be the set of \T\ \ \Tz\ literals that enforces that no term in T\ is equal to any term in Ti. Our set of literals 4> wi l l involve a variable h, a variable e for each e EA, and for each triple t EC, two variables y, and z,. Now, let <E> be the union of the following five sets of literals, where n= \A\: hfn-\h)=f-\h),r\h)=fn{h)} (A.1) distinct (A) (A.2) {f(h,e)\eEA} (A.3) U {/* (y(aAc) MJ* (b, z ( a , M ) , f («, y { a M ) , f* (h, Z { a M ) } (A.4) (a,b,c)eC \J distinct ({y{aM,Z(aAc)},A\{a, c}) (A.5) (a,b,c)eC ( A . l ) says that h is the head of a non-cyclic list of exactly n nodes h,f(h),...,f~l(h). (A.2) and (A.3) say that the elements of A are associated in a one-to-one correspondence with the nodes in this list. To see this, note that (A.3) implies that for each e E A we have e = f(h) for some 0 < i < n while (A.2) enforces that there is no e' E A \ {e} such that e' = f'(h). (A.4) says that for each triple (a,b,c) E C, the variables y(a,b,c) and z{a,b,c) are both in the list (and are hence both equal to elements of A), and further y(a,b,c) comes (not necessarily strictly) before b and Z(a,b,c) comes (not necessarily strictly) after b in the list. Now taking (A.5) into account allows us to conclude that y(a,b,c) ar>d z(a,b,c) can each only be equal to a or c (hence the previous before and after relations become strict, since a and c are distinct from b). 67 With the preceding intuition, it is easy to see that i f (A,C) is a positive instance of B E T W E E N N E S S then one can construct a heap structure that satisfies <t> using the function g of Def. 5 to define an interpretation of the variables A as nodes in the linked list. The interpretation of the variables y(a,b,c) a n d £(a,i>,c) i s respectively a and c i f g(a) < g(c), or respectively c and a otherwise. Conversely, from any satisfying heap structure, one can extract a one-to-one function g satisfying Def. 5 by simply using the total order defined by the linked list. Finally, we note that |<J>| = &(\A\2 + \A\|C|), and each literal of * has length that is at most linear in \A\. It follows that * can be constructed in time polynomial in the size of (A,C). The NP-hardness of satisfiability in our logic therefore follows from the NP-completeness of B E T W E E N N E S S . • 68 Appendix B Formalization of the Decision Procedure The core decision procedure algorithm takes a normal set of literals <£>; this restriction does not lose us any generality thanks to Lemma 18. The pseudocode of the core of the decision procedure is given in Figure 4.4 on page 32. The following three lemmas and a theorem demonstrate the correctness of our algorithm. Note that the proofs of these lemmas and the theorem assume that the literals are from the logic of Figure 3.1 on page 20; they do not apply to the extended logic of Sec-tion 3.2. The proofs of Lemmas 18, 19, and Theorem 4 can easily be generalized to deal with the extension. However, we have not yet been able to prove Lemma 20 for the extended decision procedure. Lemma 18. If invoked with a normal set <I> <P' will be normal if the recursive call of line 14 is reached. Proof. If <f> is normal, then any applicable IR r wil l have all its free terms x, y, z, x\, etc. instantiated as variables of 4>. Inspection of all IRs reveals that if the free terms are instantiated with variables, then any consequent wil l either be an equality between variables, disequality between variables, a set of reachability literals involving two/three variables, or a set of between literals involving three/four variables. In the first case, <$>' is assigned by 69 line 9, and clearly performing the substitution preserves normality. In all other cases, 4>' is assigned by line 11, and 0 is a disequality literal, a set of reachability literals involving two/three variables, or a set of between literals involving three/four variables. Addition of such literals also preserves normality. • Lemma 19. If DECTDEf^ J returns UNSAT then <& is unsatisfiable. Proof. (Sketch) If U N S A T is returned on line 3, then <5 contains a contradiction and is obviously unsatisfiable. If U N S A T is returned on line 18, addition of all consequents of an applicable IR yielded UNSAT from the recursive calls. The proof thus depends on the Theorem 1, which states that the IRs of Figure 4.2 and Figure 4.3 are sound. • Lemma 20. If DECIDE^ returns SAT then <I> is satisfiable. Proof. (Sketch) If SAT is returned, then by applying a sequence of IRs to 4> the algorithm reached a point in which SAT was returned by line 20. Let <f> be the set of literals that caused line 20 to be reached. Then, <I> is obtained from <5 by adding disequality, reachability, and between literals, and doing variable substitutions. Furthermore, <l> is consistent, closed, and, by Lemma 18, normal. Thus, by Theorem 3, <l> is satisfiable, which implies the satisfiability of <l> also. • B.l Proof of Theorem 4 Theorem 4. The decision procedure always terminates. Proof. Follows from the fact that none of the IRs create new terms, and there are only a finite number of possibly literals that one could add given a fixed set of terms. Also, the variable substitutions can only reduce the number of terms. • 70 Appendix C Pseudocode of the Examples 1: procedure L I S T - R E V E R S E ( X ) 2: assume -a; = n i lA/*(x ,n i l ) A/*(x , r ) A - i ? = n i l A y = n i l 3: while —IJC= nil do 4: temp := f(x); 5: f(x):=y; 6: y:=x; 7: x:— temp; 8: end while 9: assert f*(y,t) 10: end procedure Figure C l : LlST-REVERSE is a classical HMP example that performs in-place reversal of a linked list. Predicates used to verify the example: = ni l , f*(x, n i l ) , f*(x,t), / = ni l , y = n i l , f*(y,t), f*(temp,t), f(x)=temp. 71 1: procedure LIST-ADD(head, item) 2: assume -*/*(head, item) Af*(head,ri\\) Af*(head,t) A/(item) = nil A p = head 3: if head = nil then 4: head := p; 5: else 6: while ->f(p) — nil do 7: p:=/(p)\-8: end while 9: /(p) := item; 10: end if 11: assert /*(head,item) A/*(head, n\\) A/*(head,t) 12: end procedure Figure C.2: L I S T - A D D first traverses a linked list. Then, it adds a node to the end of the list. Predicates used to verify the example: f*(head,item), /*(head,r\\\), /*(head,t), /(item) = nil, p = head, head—n\\, /(p) — nil, /*(head,p). 1: procedure ND-lNSERT(head,item) 2: assume -i/*(head, item) A /*(head,r\\\) A ^ head — nil A / * (head, t) A ->t = nil A /(item) = nil A p = head 9 10 11 12 13 while t rue do if NDV/( />) = nil then /(item) :=/(p); /(p) := item; break; else P.•=/(/>); end if end while assert / * (head, item) A /* (head, ni I) A /*(head,t) end procedure Figure C . 3 : ND-lNSERT nondeterministically inserts a node into the linked list. Pred-icates used to verify the example: /*(head,item), /*(head,n\\), head = n\\, /*(head,t), f = nil, /(item) = nil, p — head, /(p) = n'\\, /*(head,p), /*(item, nil), /*(item,p), /*(item,t), r(f(p),t). 72 1: procedure ND-REMOVE(nead) 2: assume -<head = n i IA / * (head, n i I) A /* (head, t)A-^t = n\\Ap — head A r—f{head) 3: while true do 4: if M D V / ( r ) = nil then 5: f(p)-=f(r); 6: break; 7: else 8: p:=r; 9: r:=f(r); 10: end if 11: end while 12: assert f*(head,n\\) A(f*(head,t)@r=t) 13: end procedure Figure C.4: N D - R E M O V E nondeterministically chooses a node and removes it from the list. Predicates used to verify the example: head = ri\\, f*(head,nil), f*(head,t), f = nil. p = head, r=/(head), r=t, f(r) = n\\, f*(head,p),f*(p,r), f*(r,t), f*(f(p),t). 73 procedure Zip(x,y) assume f*(x, nil) A / * (y, nil) A (/* (x, t) V / * (y, t)) A z = nil A p= nil A femp = nil if x=n i l then femp := x; x:=y; y := temp; end if while — IJC=n i I do if z=ni l then z :—x; P '•= x; else f(p) -=x; P '•= x; end if x := f(x); f{p) := nil; if -iy=nil then femp := x; x:=y; y:— temp; end if end while assert /*(z, nil) Af*(z,t) end procedure Figure C.5: ZIP zips two linked lists, shuffling the elements of both list into one. Then, the tail of the longer list is appended to the resulting list. Predicates used to verify the example: /*(x,nil) , /*(y,ni l) , f*(x,t), f*(y,t), z = nil, p = n\\, temp = n\\, /*(Z)nil), f*(z,t), x=n\\, y = nil, f*(temp,t), p=x, f*(p,n\\), f*(p,t), f*{z,x), f*(z,p), p = t, f*(y,p), f*(temp,p), r(x,p),f(P)=x. 74 1: procedure SORTED-ZlP(x,y) 2: assume f*(x, nil) A / * ( y , nil) A -tf = nil A (rf(f) < d(f(t)) V / ( r ) = nil) A (/*(.x,f) ® /*(y ,0 ) A merge — nil A temp — nil while - a = n i l A - \ y = nil do if rf(*) <rf(y) then if -ifemp = nil then f(temp) :=x; else merge := x; end if remp := x; JC := / (x ) ; else if -'temp = nil then f(temp) := y; else merge := y; end if temp:=y; y:= f(y); end if end while if -<x= nil then if merge = nil then merge := else f{temp) :=x; end if end if if —>y = n i I then if merge = nil then merge := y; else /(/ewp) :=y; end if end if assert f*{merge,t) A(d(t) < d(f(t))V f(t) = n\\) end procedure Figure C.6: SORTED-ZlP merges the elements of two sorted lists into one, also sorted. Predicates used to verify the example: f*(x,nil), f*(y,nil), r = nil, d(t), d(f(t)), f(t) = nil, f*(x,t), f*(y,t), merge = nil, femp = nil, f*(merge,t), x = nil, y = nil, d(x), d(y), f*(merge,temp), f(temp)=x, f(temp)=y, temp—x, temp—y, merge—x, merge=y. Comparison between data values is defined as a formula over boolean data predicates. For instance, d(x) < d(y) is defined as -i(d(x) A -*d(y)). 75 1: procedure SORTED-lNSERT(/iead, item) 2: assume -if*(head,item) A f*(head, nil) A -ihead—n\\ A (/*(head,t) ® item — t) A ->t = nil A /(item) = nil A (d(t) < d(f(t)) V f(t) = nil) A curr = head A succ = /(head) 3: while -^succ = x\\\Ad(item) > d(succ) do 4: curr := succ; 5: succ := /(curr); 6: end while 7: if d(head) > d(item) then 8: /(item) := head; 9: /lead := item; 10: else 11: /(item) := succ; 12: /(curr) := item; 13: end if 14: assert /*(head, nil) Af(head,t) A (d(t) < d(/(t)) V/(t) = nil) 15: end procedure Figure C.7: SORTED-lNSERT inserts a node into a sorted linked list so that sortedness is preserved. Predicates used to verify the example: /*(head, item), /*(head,W\\), head = n\\, /*(head,t), item = t, r = nil. /(item) = nil, d(t), d(/(t)), / ( r) = nil, curr = head, succ = /(head), succ = nil, d(item), d(head), d(succ), /*(head,curr), /(item) = succ, /(curr) = succ, /(item) = curr. Comparison between data values is defined as a formula over boolean data predicates. For instance, d(x) < d(y) is defined as ->(d(x) A ^d(y)). 76 procedure B U B B L E - S O R T ( X ) assume / * (x, n i I) A / * (x, t) A -if = n i IA y=x A yn = f(y) A prev = n i IA last = n i I while ->last = f(x) do while ->yn = last do if d(y) > d(yn) then f(y) :=f(yn); f(yn) :=y; if prev = nil then x : = y n ; else fiprev) := yn; end if prev := yn; y n : = / ( y ) ; else prev := y; y :=yn; y n : = / ( y n ) ; end if end while prev := nil; /a^f := y; y :=x; y n : = / ( x ) ; end while assert f*(x, nil) Af*(x,t) A (d(t) < d(f(t)) V / ( f ) = nil) end procedure Figure C.8: BUBBLE-SORT sorts elements of a linked list using the bubble sort algorithm. Predicates used to verify the example: f*(x, nil), f*(x,t), t — r\\\, y=x, yn=f(y), prev = nil, Zosf = ni l , d(t), d(f(t)), / (f) = ni l , f(x) = last, yn = last, d(y), d(yn)y f*(yn,t), f*(last,t), f*(x,prev), f(yn) =y, fiprev) = y , f = y , / (yn) = / (y ) , f*(x,yn), d(last), prev=y. Comparison between data values is defined as a formula over boolean data predicates. For instance, d(x) < diy) is defined as -<(d(x) A ->d(y)). 77 1: procedure R E M O V E - E L E M E N T S ( X ) 2: assume - i * = n i l A f*(f(x),x) A f*(x,t) A btwn / (curr, t,x) Aprev=x Acurr — f(x) 3: while -*curr=x do 4: if d(curr) = false then 5: curr:—f (curr); 6: f(prev) := cwrr; 7: else 8: prev := curr; 9: curr :—f (curr); 10: end if 11: end while 12: if d(x)= false then 13: if ->prev=x then 14: * : = / ( * ) ; 15: f(prev) := x; 16: else 17: x : = n i l ; 18: end if 19: end if 20: assert f*(f(x),x) A( (f*(x,t) Ad(t) = true) V (^f*(x,t) A d ( f ) = false)) 21 : end procedure Figure C.9: R E M O V E - E L E M E N T S removes f rom a cyc l ic l ist elements whose data f ield is false. Predicates used to veri fy the example: f*(f(x),x), curr = x, prev = x, d(curr), d(x), f*(x,t), d(t),x=ri\\, curr=f(x), btwn/(curr, t,x), f(prev) = curr, f (f (prev))=curr, f{f(x),prev), prev = curr, f*(t,prev), f(prev) =t, f(curr) =x. 78 procedure R E M O V E - S E G M E N T ( X ) assume - a = n i l A /* ( / (* ) , * ) Atemp = ri\\ A y = x A z = n i l while -itemp=x do if d(y) = false then temp := f(y); y := temp; else break; end if end while z := y; while ->z—x do if d(z) = true then z := fewp; else break; end if end while if ->y=z then / (y) := nil; / ( y ) : = z ; end if assert f*(f(x),x) end procedure Figure CIO: REMOVE-SEGMENT removes the first contiguous segment of elements whose data field is true from a cyclic singly-linked list. Predicates used to verify the exam-ple: f*(f(x),x), * = n i l , temp = nil, y=x, z = nil, x = temp, z = x, z-y, d(y), d(z), f*(z,x), f*[z,y), btwn / (y,z,x), btwnf (y,temp,x), f*(y,x). 79 1: procedure SEARCH-AND-SET(x,data; ,datai) 2: assume —>JC = n i I A f*(f(x),x) A f(x) — curr A f* (x, t) A btwn/(curr, t,x) Atrue 3: d\ (x) := true; 4: d2(x) := true; 5: while ^curr—x do 6: if dl (curr) = data; A d 2 (curr) — data2 then 7: break; 8: else 9: d\(curr) true; 10: d2(curr) := true; 11: curr := f (curr); 12: end if 13: end while 14: assert f*(f(x),x) Af*(x,t) A-^x—r\\\Atrue 15: assert (x — curr Ad\(t) Ad 2(f)) V ( -ix = curr A di (cwrr) = data/ A d 2 (cw/r) = data2 A ( (btwn/(x,t,curr) A^t —curr Ad\(t) Adi(t)) V (btwny(x,t,curr) At —curr) V - i btwn f (x, t, curr))) 16: end procedure Figure C . l l : S E A R C H - A N D - S E T searches for an element with specified data fields in a cyclic singly-linked list, and sets data fields of previous elements to t rue. Predicates used to verify the example: f*(f(x),x), f(x) = curr, d\(curr), d2(curr), data/, datai, x = curr, f*(x,t), x=nil, btwnf(curr,t,x), btwnf(x,t,curr), d\ (t), d 2(f), true, curr = t, x = t. 80 1: procedure SET-UNiON(a,fe) 2: assume f(a) = curr A /* {/(a), a) A /* (f(b), b) 3: assume f*(a,t) f\->f*(b,t) A->d(t) 4: assume -if*(a,s) Af*(b,s) Ad(s) 5: assume ~^d(a) Ad(b) 6: assume btwny(/(a),r,a) A btwn f(f(b),s,b) 7: assume -tf = nil A —"51 = niI 8: tmpd:=d(b); 9: d(a) := rmpd; 10: while -<curr — a do 11: d(curr) := fmpd; 12: curr:—f (curr); 13: end while 14: tmp := f(a); 15: / (a) :=/(/>); 16: f(b):=tmp; 17: assert f*(f(b),b) Af*(b,t) Ad(t) Af*(b,s) Ad(s) Ad(b) A ->t = nil A -tf = nil 18: end procedure Figure C.12: S E T - U N I O N combines two cyclic singly-linked lists. Predicates used to verify the example: a = curr, f(a) = curr, f*(f(a),a), f*(f(b),b), f*(f(a),t), f*(f(b),t), d(t), f*(f(a),s), f*(f(b),s), d(s), d(a), d(b), btwn f(f(a),t,a), btwnf(f(b),s,b), f = nil, 5 = nil, tmpd, d(curr), btwn/(/'(a), s,b), btwn f(tmp,t,a), btwn j (curr,t, a). 81 1: p rocedu re CREATE-lNSERT (nead) 2 : assume p — head A -tf = n i IA / * (head, n i I) A -<head = n i IA / * (f(malloc) ,malloc) A -'/(malloc) = pmalloc A item = nil A f(pmalloc) = malloc A ~*malloc = nil A /* (malloc, pmalloc) 3 : assume (f*(head,t) A ~<f*(malloc,t)) V (head,t) A/*(malloc,t)) 4 : assume ->f (malloc) = pmalloc; 5: item := malloc; 6: malloc := /(malloc); 7: /(pmalloc) := malloc; 8: wh i l e t rue do 9: i f M ) V / ( p ) = nil then 10: /(item) :=/(p); 1 1 : /(p):=item; 12: b r e a k ; 13: else 14: p:=f(p); 15: e n d i f 16: end wh i l e 17: assert -tf = ni IA / * (head, niI) A -*head = niIA / * (/(malloc), malloc) A - n ' f e m = niIA /*(head, item) A/(pmalloc) = malloc A -^malloc• = nil A / * (malloc, pmalloc) 18: assert (/*(head,t) A-*/*(malloc,t)) V (^/*(head,t) A/*(malloc,t)) 19: end p r o c e d u r e F i g u r e C . 1 3 : C R E A T E - l N S E R T creates a new node {malloc) and inserts it nonde-terministically into a linked list. Predicates used to verify the example: /*(head,t), n i l = / ( p ) , p — head, f = nil, /*(head,nil), /icat/ = nil, /*(/(malloc),malloc), /(malloc) — pmalloc, /*(malloc,t), item = nil, /*(head,item), /(pmalloc) = malloc, malloc = nil, /*(malloc,pmalloc), /*(head,p), /*(item,n\\), /*(item,p), /*(item,t), /*(/(p),t) item = t, /(pmalloc) = item, /(/(pmalloc)) = malloc, /*(malloc,item), n\\=/(item). Lines 4 - 7 model a malloc statement by removing a node from an infinite cyclic list which represents unallocated nodes. 82 1: procedure CREATE-lNSERT-DATA(nead) 2: assume p = head A —•/ = nil A / " * (head, ni I) A ->head — n i IA / * (f(malloc), malloc) A -^/(malloc) = pmalloc A /rem = nil A /(pmalloc) = malloc A -^malloc — nil A /* (malloc, pmalloc) 3: assume (f*(head,t) A->/* (malloc,t) A^(d(head) ®d(t))) V (-if*(head,t) A f*(malloc,t)) 4: assume -i/'(malloc)— pmalloc; 5: /rem := malloc; 6: malloc := f (malloc); 7: f (pmalloc) :— malloc; 8: tmpd := d(head); 9: d(item) := tmpd; 10: while true do 11: if N£>V/(/>) = nil then 12: f(item):=f(p); 13: /(/?) := /rem; 14: break; 15: else 16: p:=f(p)\ 17: end if 18: end while 19: assert —.f = nil A / " * (/read, ni I) A ->head — n i IA / * (/(malloc), malloc) A ->item = nil A / * (/read, /rem) A /(pmalloc) — malloc A -<malloc = ni IA / * (malloc,pmalloc) 20: assert (/*(head,t) A (malloc,t) A~>(d(head)®d(t))) V (-^/*(head,t) A/* (malloc,t)) 21: end procedure Figure C.14: CREATE-lNSERT-DATA creates a new node, initializes its data field, and inserts it nondeterministically into a linked list. Predicates used to verify the example: r = nil, /*(head,t), nil = /( /?), p — head, /*(head,n\\), head — x\\\, /*(/(malloc),malloc), /(malloc) = pmalloc, /*(malloc,t), item — nil, /*(head, item), /(pmalloc) = malloc, malloc = nil, /*(malloc,pmalloc), d(head), d(t), /*(item,p), /*(item,t), /*(/(p),t) item = t, /(pmalloc) = item, /(/(pmalloc)) = malloc, /*(malloc, item), nil = /(item), /*(head,p), f*(item, nil), tmpd. Lines 4-7 model a malloc statement by removing a node from an infinite cyclic list which represents unallocated nodes. 83 1: procedure CREATE-FREE(/iearf) 2: assume p = head A -if = n i IA / * (head, ni I) A -<head = n i IA / * (/(malloc), malloc) A ^f (malloc) = pmalloc A item — nil A /(pmalloc) — malloc A -^malloc — nil A /* (malloc, pmalloc) 3: assume (/*(head,t) A-</*(malloc,t)) V (-•/*(head,t) A/*(malloc,t)) 4: assume -'/(malloc) = pmalloc; 5: item := malloc; 6: malloc := /(malloc); 7: /(pmalloc) := malloc; 8: while true do 9: if NDV/(/>) = nil then 10: /(item) := /(p); 11: /(p):=item; break; 12: else 13: P-=/(p); 14: end if 15: end while 16: p :— head; r:=f(head); 17: while true do 18: if NDV/(r) = w\ then 19: / ( / » ) = = / W ; 20: /(pmalloc) :— r; 21: / ( r ) := malloc; 22: malloc := r; break; 23: else 24: p:=r;r:=/(r); 25: end if 26: end while 27: assert -if = n i IA / * (head, ni I) A -ihead = n i IA / * (/(malloc), malloc) A ->item = n i IA /(pmalloc) = malloc A -^malloc — nil A / * (malloc, pmalloc) A -if* (head, r) 28: assert (f*(head,t) A -if*(malloc,t) A ->r—t A (f*(head, item) © r=item)) V (-•/* (head, t) A f* (malloc, t) A (f* (head, item) ®r — item)) 29: end procedure Figure C.15: CREATE-FREE creates a new node and inserts it into a linked list. Also, removes a node from the list and frees it. Predicates used to verify the example: f*(head,t), <n\\ = f(p), p — head, r = nil, f*(head,nil), head = n\\, /*(/(malloc),malloc), /(malloc) — pmalloc, /*(malloc,t), item = nil, /*(head, item), /(pmalloc) = malloc, malloc — nil, /*(malloc,pmalloc), n i l=/ ( r ) , r = f, r—item, /*(head,r), /*(head,p), /*(item,n\\), /*(item,p), /*(malloc,item), /*(item,t), /*(/(/?),f) item — t, f(pmalloc) = item, f(f (pmalloc)) = malloc, nil =/(/rem), f (pmalloc) = r, f*(f(p),r) f*(f(p)J(r)). Lines 4-7 model a malloc statement by removing a node from an infinite cyclic list which represents unallocated nodes. Lines 20-22 model a free statement by returning a node to the infinite cyclic list of unallocated nodes. 84 1: procedure INIT-LIST(X) 2: assume f*(x,t) Af*(x,nil) Acurr—xA —>r = nil Atrue 3: while -*curr = n i I do 4: d(curr) := rrwe; 5: curr:—f (curr); 6: end while 7: assert / * ( x , f ) A / * ( x , nil) Ad(t) A true A ->/= nil 8: end procedure Figure C.16: I N I T - L I S T in i t ial izes the data fields of an acyc l ic s ing ly- l inked list. Pred i -cates used to veri fy the example: curr=W\\, curr=x, f*(x,t), f*(x,nil), d(t), true, f = ni l , f * (curr, t),f*(t, curr). 1: procedure INIT-LIST-VAR (X, my?) 2: assume f*(x,t)A/*(x,nil)Acurr=x A-^t = n\\ Atrue 3: while -^curr = n i I do 4: d(curr) := true; 5: curr := f (curr); 6: end while 7: tmp:=d(x); 8: assert f*(x,t) Af*(x, nil) A<i(f) A true A tmp A ->t — nil 9: end procedure Figure C.17: i N I T - L l S T - V A R init ial izes the data fields of an acyc l ic s ing ly- l inked lists, and also sets the value of a global data variable before terminating. Predicates used to verify the example: curr= ni l , curr — x, f*(x,t), f*(x,n\\), d(t), true, r = ni l , tmp, f*(curr,t), f*(t,curr), d(x). Parameter tmp is a boolean variable. 1: procedure INIT-CYCLIC(X) 2: assume f*(x,t)Af*(f(x),x)Acurr=f(x) A btwnf(curr,t,x) A - a = n i l A true 3: d(x) := true; 4: while ^ curr=x do 5: d(curr) := true; 6: curr := f (curr); 1: end while 8: assert f*(x,t) Af*(f(x),x) Ad(t) Atrue A ->x= nil 9: end procedure Figure C.18: INIT-CYCLIC ini t ial izes data fields of a cyc l i c s ing ly- l inked list. Pred i -cates used to veri fy the example: curr = x, curr = f(x), f*(x,t), f*(f(x),x), d(t), true, btwn/(curr,t,x), x = n i l , t=x, btwnf(x,t,curr), f*(t,curr). 85 procedure SORTED-lNSERT-DNODES(/ieaJ, item) assume -if*(head, item) A/*(head, nil) A -^head = nil assume / * (head, t) © item = t assume -if = nil A /(item) = nil assume d(t) < d(/(t)) V / ( f ) = nil assume curr— head A succ — /(head) A/(g(t)) = r\\\Ag(g(t)) = r\\\A/(g(item)) — ri\\Ag(g(item)) = ri\\Ag(t)=s while -^succ — nW Ad(g(item)) > d(g(succ)) do curr:— succ; succ := /(curr); end while if d(g(head)) > d(g(item)) then /(item) := head; head := item; else /(item) :— succ; /(curr) := item; end if assert /*(head, nil) A/*(head,t) A (d(t) < d(/(t))V/(t) — n\\) Ag(t)=s end procedure Figure C.19: SORTED-lNSERT-DNODES inserts an element into a sorted linked list so that sortedness is preserved. Every node in the linked list has an additional pointer to a node that contains a data field which is used for sorting. Predicates used to verify the example: /*(head,t), succ = ri\\, d(g(item)), d(g(succ)), d(g(t)), d(g(/(t))), f = nil. / (f) = nil, d(g(head)), item = t, curr = head, /*(head, item), succ =/(head), /*(head, nil), /(item) = nil, head = nil, /(g(t)) = nil, g(g(t)) = nil, /(g(item)) = nil, g(g(item)) = nil, g(t)=s, /(item) = succ, /(curr) = succ, /(item) —curr, /* (head, curr). Comparison between data values is defined as a formula over boolean data predicates. For instance, d(x) < d(y) is defined as ->(d(x) A ->d(y)). 86 1: procedure REMOVE-DouBLY(/ieaa*, tail,node) 2: assume next * (head, tail) A prev* (tail, head) A nil = next (tail) A nil = prev(head) 3: assume next* (head, t) A prev* (tail, t) 4: assume next * (head, node) A prev* (tail, node) 5: assume next * (head, prev(node)) A prev* (tail, next(node)) 6: assume -^node — nWA->f = nil 7: assume ( head = t A (head = t®t = f(g(t))) A(tail = t®t = g(f(t)))) V ( -ihead — t A(head = t®t = f(g(t))) A (tail = t®t = g(f(t))) A (head = node®node = f(g(node))) A (tail = node® node —g(f (node)))) 8: if prev (node) = n i I then 9: head : = next (node); 10: else 11: temp:—prev (node); next (temp) := next (node); 12: end if 13: if next (node) —n\\ then 14: tail:—prev(node); 15: else 16: temp ;= next (node); prev(temp) := prev(node); 17: end if 18: assert next* (head, tail) A prev*(tail, head) A nil = next(tail) A nil = prev(head) 19: assert -^next*(head,node) A->prev*(tail,node) 20: assert -inode = ri\\ A -if = nil 21: assert (node = t A -mext* (head, t) A -iprev* (head, t) A -^head — t A -<tail = t) V ( -mode=t A next* (head, t) A prev* (tail, t) A(head = t®t = f(g(t))) A(tail = t®t = g(f(t)))) 22: end procedure Figure C.20: REMOVE-DOUBLY removes an element from an acyclic doubly-linked list. Predicates used to verify the example: nil = prev(node), nil = next(node), next* (head, tail), prev*(tail,head), n\\ = next (tail), nil = prev(head), next* (head,t), prev* (tail,t), head — t, tail = t, t = prev (next (t)), t = next(prev(t)), next*(head,node), prev*(tail,node), head = node, tail = node, node —prev(next(node)), node = next(prev(node)), next* (head, prev(node)), prev* (tail, next (node)), ri\\ = prev (next (node)), n i I=next (prev (node)), node = n\\, node = t, r = nil. head = next (node), head —temp, temp = prev(node), next* (head, temp), temp = t, temp = next (prev (node)), next (node) — next (prev (node)), prev(temp) — prev(node), prev(node) =t. 87 1: procedure REMOVE-CYCLlC-DouBLY(/zead,enfry) 2: assume next* (next(head), head) A prev* (prev(head), head) 3: assume prev* (head, next (head)) A next* (head, prev(head)) 4: assume next* (head, t) A prev* (head,t) 5: assume next * (head, prev(t)) A prev* (head, next(t)) 6: assume next * (head, entry) A prev* (head, entry) 7: assume next* (head, prev(entry)) A prev*(head, next(entry)) 8: assume t = prev(next(t)) At = next(prev(t)) 9: assume head = prev (next (head)) Ahead = next(prev(head)) 10: assume entry — prev(next(entry)) A entry = next(prev(entry)) 11: assume -<entry = head 12: p := prev(entry); 13: n := next (entry); 14: prev(n) := p; 15: next(p) := n; 16: assert next*(next(head),head) Aprev*(prev(head),head) 17: assert -^next* (head, entry) Aprev* (head, entry) 18: assert prev* (head, next (head)) A next* (head, prev(head)) 19: assert head = prev (next (head)) Ahead — next(prev(head)) 20: assert -*entry = prev(next(entry)) A ^ entry = next(prev(entry)) 21: assert -<entry = head 22: assert ( next* (head,t) A prev* (head,t) At —prev(next(t)) A sentry = t A t = next(prev(t)) Anext* (head,prev(t)) A prev*(head,next(t))) V ( ^next* (head, t) A-iprev* (head, t) A -it = prev(next(t)) A -if = next(prev(t)) A entry = t) 23: end procedure Figure C.21: R E M O V E - C Y C L I C - D O U B L Y removes an element from a cyclic doubly-linked list. Predicates used to verify the example: next*(next(head),head), prev*(prev(head),head), next* (head, t), prev* (head, t), t = prev (next(t)), t = next (prev(t)), next*(head,entry), prev*(head, entry), prev* (head, next (head)), next* (head, prev(head)), head = prev(next(head)), head = next (prev(head)), entry — prev(next (entry)), entry = next (prev (entry)), next*(head,prev(entry)), prev*(head,next(entry)), next*(head,prev(t)), prev*(head,next(t)), t — entry, entry = head, next(entry) = n, prev(entry) = p, n = head, prev(n) — p, n = t, next(head) = entry, p = t. 88 1: procedure LlNUX-LlST-ADD(«ead,new) 2: assume next* (next (head), head) A prev* (prev(head), head) 3: assume -mext* (head, new) A ->prev*(head,new) 4: assume prev* (head, next(head)) A next* (head, prev(head)) 5: assume next (new) = nil A prev(new) = nil 6: assume head = prev(next(head)) A head = next(prev(head)) 7: assume ( next*(head,t) A prev*(head,t) At = prev(next(t)) At = next(prev(t)) A -*t = new Anext* (head, prev(t)) A prev* (head, next (t))) V ( -*next*(head,t) A ^ prev* (head, t) A^t — prev(next(t)) At=new A ->t = next(prev(t)) A ->next*(head,prev(t)) A ->prev*(head,next(t))) 8: p:— head; 9: n := next (head); 10: prev(n) := new; 11: next(new) := n; 12: prev(new) := p; 13: next(p) := new; 14: assert next* (next(head), head) A prev* (prev(head), head) 15: assert prev* (head, next (head)) Anext* (head, prev (head)) 16: assert nexf * (head, t) A prev* (head, t) 17: assert next * (head, prev(t)) A prev* (head, next(t)) 18: assert next* (head, new) A prev* (head, new) 19: assert t — prev(next(t)) At = next(prev(t)) 20: assert ->next(new) = r\\\ A^prev(new) — n\\ 21: assert head = prev (next (head)) A head=next (prev(head)) 22: end procedure Figure C.22: L I N U X - L I S T - A D D adds a node to a cyc l i c doubly- l inked list. Predicates used to verify the example: next * (next (head), head), prev* (prev(head), head), next*(head,t), prev*(head,t), t — prev(next(t)), t — next(prev(t)), next* (head, new), prev* (head,new), prev*(head,next(head)), next*(head,prev(head)), next(new) = nil, prev(new) — nil, head — prev (next (head)), head = next(prev(head)), next*(head,prev(t)), prev*(head,next(t)), t=new, p=head, next(new)=n, prev(n)=new, n=t, head=t, prev(new)—head, n=head, n = next(head). 89 1: procedure LlNUX-LlST-ADD-TAlL(nead,n<?w) 2: assume next* (next (head), head) A prev*(prev(head), head) 3: assume prev*(head,next(head)) Anext*(head,prev(head)) 4: assume -^next*(head,new) A ^prev*(head,new) 5: assume next(new) = nil A prev(new) = nil 6: assume head = prev(next (head)) A head=next (prev(head)) 7: assume ( next*(head,t)Aprev*(head,t)At = prev(next(t))At — next(prev(t)) A -<t = new A next* (head, prev(t)) A prev* (head, next(t))) V ( -next*'(head, t) A~>prev*'(head, t) A-^t —prev (next (t)) At—new A -it = next (prev (t)) A -mext* (head,prev(t)) A -> prev* (head, next(t))) 8: p := prev(head); 9: n := head; 10: prev(n) :— new; 11: next (new) := n; 12: prev(new) := p; 13: next(p) :— new; 14: assert next* (next(head), head) A prev*(prev(head), head) 15: assert prev* (head, next (head)) A next* (head, prev(head)) 16: assert next * (head, t) A prev* (head, t) 17: assert next* (head, prev(t)) A prev* (head,next(t)) 18: assert next * (head, new) A prev* (head, new) 19: assert t = prev (next (t)) At = next(prev(t)) 20: assert -mext(new) = nil A -*prev(new) = nil 21 : assert head = prev(next(head))Ahead — next(prev(head)) 22: end procedure Figure C . 2 3 : L I N U X - L I S T - A D D - T A I L adds a node to the tail o f a cyc l i c doubly- l inked l ist. Predicates used to veri fy the example: next * (next (head), head), prev* (prev(head), head), next* (head, t), prev* (head, t), t — prev (next (t)), t = next (prev (t)), next*(head,new), prev*(head,new), prev*(head,next(head)), next*(head,prev(head)), next(new) — nil, prev(new) = nil, head — prev(next(head)), head —next (prev (he ad)), next*(head,prev(t)), prev*(head,next(t)), t—new, p = t, prev(new)—p, next*(head,p), next(p)—head, new = prev(head), prev* (p, next (head)), n = head, prev(new) = head, prev* (p, next (t)), n = t. 90 1: procedure LlNUX-LlST-DEL(/iead, entry) 2: assume next * (next (head), head) A prev* (prev (head), head) 3: assume prev* (head, next (head)) A next* (head, prev(head)) 4: assume next * (head, t) A prev* (head, t) 5: assume next* (head, prev(t)) A prev* (head,next (t)) 6: assume next* (head, entry) A prev* (head, entry) 7: assume next* (head, prev(entry)) A prev* (head, next(entry)) 8: assume t = prev (next (t)) At = next (prev (t)) 9: assume head = prev(next(head)) Ahead = next(prev(head)) 10: assume entry = prev(next(entry)) A entry = next(prev(entry)) 11: assume -mext(entry) = r\\\A-^prev(entry) = n\\ 12: assume ->entry = head 13: p := prev(entry); 14: n := next(entry); 15: prev(n) := p; 16: next(p) := n; 17: next(entry) := n i l ; 18: prev(entry) := n i l ; 19: assert nexr* (next(head), head) A prev*(prev(head), head) 20: assert prev* (head, next (head)) A next* (head, prev(head)) 21: assert -wex?* (head, entry) A -*prev* (head, entry) 22: assert -^next* (head, prev(entry)) A ^ prev* (head,next (entry)) 23: assert next(entry) = nil A prev(entry) = nil 24: assert /zeaa" = prev (next (head)) Ahead — next (prev (head)) 25: assert -sentry = prev (next (entry)) A ^ entry = next (prev(entry)) 26: assert -sentry— head 27: assert ( next*(head,t) A prev*(head,t) A r = prev(next(t)) At=next (prev (t)) A ->entry = t Anext* (head, prev(t)) A prev* (head, next (t))) V ( -inext*(head,t) A~<t — prev(next(t)) A^t —next (prev (t)) Aentry = t A -<prev*(head, t) A -mext*(head, prev(t)) A prev* (head, next(t))) 28: end procedure Figure C.24: LlNUX-LlST-DEL removes a node f rom a cyc l i c doubly- l inked list. Pred i -cates used to veri fy the example: next*(next(head),head), prev*(prev(head),head), next*(head,t), prev*(head,t), t = prev(next(t)), t = next(prev'(?)), next*(head,entry), prev*(head,entry), prev* (head, next (head)), next* (head, previhead)), next(entry) = ni l , prev(entry) — ni l , head — prev(next(head)), head = next(prev(head)), entry = prev(next(entry)), entry = next (prev (entry)), next*(head,prev(entry)), prev*(head,next(entry)), next*(head,prev(t)), prev*(head,next(t)), t = entry, entry = head, next(entry) = n, prev(entry) = p, n = head, prev(n) = p, n — t, next(head) = entry, p = t.
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- A logic and decision procedure for verification of...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
A logic and decision procedure for verification of heap-manipulating programs Rakamarić, Zvonimir 2006
pdf
Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
Page Metadata
Item Metadata
Title | A logic and decision procedure for verification of heap-manipulating programs |
Creator |
Rakamarić, Zvonimir |
Publisher | University of British Columbia |
Date Issued | 2006 |
Description | Heap-manipulating programs (HMPs), which manipulate unbounded linked data structures via pointers, are a major frontier for formal verification of software. Formal verification is the process of proving (or disproving) the correctness of a system with respect to some kind of formal specification or property. The primary contributions of this thesis are the definition of a simple transitive closure logic tailored for formal verification of HMPs, and an efficient decision procedure for this logic. To assess the effectiveness of the proposed approach, we develop an HMP verification framework, which uses our fast implementation of the decision procedure to verify a number of HMP examples. Experimental examples (including three small container functions from the Linux kernel) demonstrate that the logic is practically useful and expressive enough to prove many interesting heap properties. In addition, the decision procedure provides a substantial time and space advantage over previous approaches. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-01-13 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0051503 |
URI | http://hdl.handle.net/2429/18130 |
Degree |
Master of Science - MSc |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2006-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-ubc_2006-0629.pdf [ 3.32MB ]
- Metadata
- JSON: 831-1.0051503.json
- JSON-LD: 831-1.0051503-ld.json
- RDF/XML (Pretty): 831-1.0051503-rdf.xml
- RDF/JSON: 831-1.0051503-rdf.json
- Turtle: 831-1.0051503-turtle.txt
- N-Triples: 831-1.0051503-rdf-ntriples.txt
- Original Record: 831-1.0051503-source.json
- Full Text
- 831-1.0051503-fulltext.txt
- Citation
- 831-1.0051503.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
data-media="{[{embed.selectedMedia}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0051503/manifest