Leveraging distributed explicit-state model checking forpractical verification of liveness in hardware protocolsbyBrad BinghamM.Sc., The University of British Columbia, 2007B.Sc., The University of Victoria, 2005A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFDOCTOR OF PHILOSOPHYinTHE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES(Computer Science)THE UNIVERSITY OF BRITISH COLUMBIA(Vancouver)April 2015c© Brad Bingham, 2015AbstractProtocol verification is a key component to hardware and software design. The prolifera-tion of concurrency in modern designs stresses the need for accurate protocol models andscalable verification tools. Model checking is an approach for automatically verifying prop-erties of designs, the main limitation of which is state-space explosion. As such, automaticverification of these designs can quickly exhaust the memory of a single computer.This thesis presents PReach, a distributed explicit-state model checker, designedto robustly harness the aggregate computing power of large clusters. The initial versionverified safety properties, which hold if no error states can be reached. PReach has beendemonstrated to run on hundreds of machines and explore state space sizes up to 90 billion,the largest published to date.Liveness is an important class of properties for hardware system correctness which,unlike safety, expresses more elaborate temporal reasoning. However, model checking ofliveness is more computationally complex, and exacerbates scalability issues as comparedwith safety. The main thesis contribution is the extension of PReach to verify two keyliveness-like properties of practical interest: deadlock-freedom and response. Our methodsleverage the scalability and robustness of PReach and strike a balance between tractableverification for large models and catching liveness violations.Deadlock-freedom holds if from all reachable system states, there exists a sequenceof actions that will complete all pending transactions. We find that checking this propertyis only a small overhead as compared to safety checking. We also provide a technique foriiestablishing that deadlock-freedom holds of a parameterized system — a system with avariable number of entities.Response is a stronger property than deadlock-freedom and is the most commonliveness property of interest. In practical cases, fairness must be imposed on system mod-els when model checking response to exclude those execution traces deemed inconsistentwith the expected underlying hardware. We implemented a novel twist on establishedmodel checking algorithms, to target response properties with action-based fairness. Thisimplementation vastly out-performs competing tools.This thesis shows that tractable verification of interesting liveness properties in largeprotocol models is possible.iiiPrefaceThe work of this thesis was mostly conducted at the Integrated Systems Design Lab at theUniversity of British Columbia, Point Grey campus. Some development of the PReachtool and experiments were done at Intel Corporation (Ronler Acres), in Hillsboro, Oregon.Chapters 3 – 6 are based on publications that have already appeared. Accordingly, someof the text and figures are based upon material written or drawn by my co-authors. Thesepublications, along with my role in the research and writing are described in detail below.1. Chapter 3 is based on: Brad Bingham, Jesse Bingham, Flavio M. de Paula, JohnErickson, Gaurav Singh, and Mark Reitblatt. Industrial strength distributed explicitstate model checking. In Parallel and Distributed Model Checking (PDMC), pages28–36, IEEE Computer Society, 2010.2. Chapter 5 is based on: Brad Bingham, Mark Greenstreet, and Jesse Bingham. Pa-rameterized verification of deadlock freedom in symmetric cache coherence protocols.In Formal Methods in Computer-Aided Design (FMCAD), pages 186–195, FMCADInc., 2011.3. Chapter 4 is based on: Brad Bingham, Jesse Bingham, John Erickson, and MarkGreenstreet. Distributed explicit state model checking of deadlock freedom. In Com-puter Aided Verification (CAV), volume 8044 of Lecture Notes in Computer Science,pages 235–241, Springer-Verlag, 2013.iv4. Chapter 6 is based on: Brad Bingham and Mark Greenstreet. Response propertychecking via distributed state space exploration. In Formal Methods in ComputerAided Design (FMCAD), pages 15–22, FMCAD Inc., 2014.In addition, Chapter 2 has small portions that originated from the above publica-tions. My individual contributions to publication 1 include• Created the initial prototype of the distributed model checking algorithm.• Implemented and evaluated various approaches to load balancing to address bothperformance and thread crashes.• Gathered statistics on Erlang communication throughput and latency relative tomessage size, motivating state-batching optimizations.For the other publications, I performed the bulk of the research, with co-authorscontributing with brain-storming discussions, manuscript writing and feedback. Whilethey are not co-authors on publication 4, Jesse Bingham helped by running our tool onproprietary Intel models, and Jim Grundy provided some models he authored which weused as benchmarks.vTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiList of Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiiiList of Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xivAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xviDedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .xviiChapter 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Formalized Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51.2 The Explicit-State Approach . . . . . . . . . . . . . . . . . . . . . . . . . . 101.3 Verification Methods of this Thesis . . . . . . . . . . . . . . . . . . . . . . . 121.4 Thesis Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151.4.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16vi1.4.2 Roadmap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17Chapter 2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182.1.1 Temporal Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192.1.2 The Murϕ Description Language . . . . . . . . . . . . . . . . . . . . 202.2 Parallel and Distributed Model Checking . . . . . . . . . . . . . . . . . . . 242.2.1 Model Checking Deadlock Freedom . . . . . . . . . . . . . . . . . . . 292.3 Parameterized Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302.4 Model Checking LTL Formulas . . . . . . . . . . . . . . . . . . . . . . . . . 322.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39Chapter 3 The PReach Model Checker . . . . . . . . . . . . . . . . . . . . . 413.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423.2 Crediting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453.3 Light Weight Load Balancing . . . . . . . . . . . . . . . . . . . . . . . . . . 463.4 Batching of States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503.5 PReach Pseudocode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54Chapter 4 Model Checking of Deadlock Freedom . . . . . . . . . . . . . . . 574.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 584.1.1 Specification Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . 614.2 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 624.2.1 Local Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 634.2.2 Pass-the-Path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 644.2.3 Outstanding Search Table . . . . . . . . . . . . . . . . . . . . . . . . 644.2.4 Implementation Notes . . . . . . . . . . . . . . . . . . . . . . . . . . 664.3 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67vii4.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68Chapter 5 Parameterized Deadlock Freedom . . . . . . . . . . . . . . . . . 735.1 A Simple Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 755.2 Formal Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 805.2.1 Mixed Abstractions . . . . . . . . . . . . . . . . . . . . . . . . . . . 815.2.2 Insufficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 855.2.3 Parameterized Systems . . . . . . . . . . . . . . . . . . . . . . . . . 865.2.4 Undecidability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 885.3 Syntactical Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 895.3.1 Syntax and Restrictions . . . . . . . . . . . . . . . . . . . . . . . . . 895.3.2 Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 915.4 Verifying Universal Quiescence . . . . . . . . . . . . . . . . . . . . . . . . . 945.4.1 Universally Quantified Quiescence . . . . . . . . . . . . . . . . . . . 945.4.2 Abstract Rule Tags . . . . . . . . . . . . . . . . . . . . . . . . . . . . 965.4.3 Heuristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 985.5 Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1025.5.1 Automatic Deadlock Freedom Predicates . . . . . . . . . . . . . . . 1035.5.2 The German Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . 1045.5.3 The FLASH Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . 1065.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1075.6.1 Permutations on More than One Abstracted Node . . . . . . . . . . 1075.6.2 Local Rule Generalizations . . . . . . . . . . . . . . . . . . . . . . . 1085.6.3 Automatic Strengthening . . . . . . . . . . . . . . . . . . . . . . . . 109Chapter 6 Distributed Response Property Checking . . . . . . . . . . . . .1116.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1126.2 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114viii6.2.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1146.2.2 A Note about Stuttering . . . . . . . . . . . . . . . . . . . . . . . . . 1166.3 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1186.3.1 Worst-Case Time Complexity for OWCTY . . . . . . . . . . . . . . 1216.4 Distributed Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . 1236.5 Optimizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1286.5.1 Saved Expansions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1286.5.2 Dynamic Kernel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1296.5.3 Deletion by Predecessor Counting . . . . . . . . . . . . . . . . . . . 1306.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1306.7 Comparison with DiVinE . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1356.8 Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . 136Chapter 7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1387.1 Contributions Recap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1407.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .146Appendix A German’s Protocol . . . . . . . . . . . . . . . . . . . . . . . . . .159Appendix B HIR Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .164Appendix C Heuristic Examples . . . . . . . . . . . . . . . . . . . . . . . . .168C.1 Discharging AEG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168C.2 Discharging AUG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170ixList of Tables1.1 Verification methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93.1 Large model experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554.1 Performance of DF checking algorithms . . . . . . . . . . . . . . . . . . . . 705.1 Mapping of types from concrete system to mixed abstraction . . . . . . . . 925.2 Heuristics for ruleset tag/property pairs . . . . . . . . . . . . . . . . . . . . 995.3 List of chapter symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1106.1 PReach-Resp benchmark results . . . . . . . . . . . . . . . . . . . . . . . 132xList of Figures2.1 Stern-Dill DEMC flow-chart . . . . . . . . . . . . . . . . . . . . . . . . . . . 263.1 Kumar/Mercer load balancing . . . . . . . . . . . . . . . . . . . . . . . . . . 473.2 Light weight load balancing . . . . . . . . . . . . . . . . . . . . . . . . . . . 473.3 Load balancing effect on work queues . . . . . . . . . . . . . . . . . . . . . . 493.4 LDash model load balancing . . . . . . . . . . . . . . . . . . . . . . . . . . . 503.5 SCI model load balancing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503.6 State batching effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523.7 Speed up experimental results . . . . . . . . . . . . . . . . . . . . . . . . . . 564.1 Illustration of deadlock freedom . . . . . . . . . . . . . . . . . . . . . . . . . 584.2 Outcomes of a witness search . . . . . . . . . . . . . . . . . . . . . . . . . . 604.3 A subgraph that does not satisfy DF . . . . . . . . . . . . . . . . . . . . . . 634.4 PtP/OST example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 654.5 Histograms of path lengths in PtP mode . . . . . . . . . . . . . . . . . . . . 714.6 Path length histogram for intel in PtP mode . . . . . . . . . . . . . . . . 724.7 Memory usage of ST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 725.1 Program turn (mutual exclusion by turn setting) . . . . . . . . . . . . . . 765.2 The Murϕ system for program turn . . . . . . . . . . . . . . . . . . . . . . 775.3 Murϕ system for the mixed abstraction of turn . . . . . . . . . . . . . . . 785.4 Finding paths through permutation . . . . . . . . . . . . . . . . . . . . . . . 80xi5.5 Simulation of S1 by S2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 825.6 Illustration of Lemma 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 845.7 Illustration of A`(i) |= AG (A→ EFr̂B) . . . . . . . . . . . . . . . . . . . . . 1006.1 Sets of interest when checking (p→ ♦q) . . . . . . . . . . . . . . . . . . . 1166.2 Example of PTFA updates . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1206.3 Response property MC plots for German6 . . . . . . . . . . . . . . . . . . . 1346.4 Response property MC plots for peterson6 wf . . . . . . . . . . . . . . . . 1346.5 Response property MC plots for snoop2 . . . . . . . . . . . . . . . . . . . . 135C.1 Schema for proving ruleset ABS SendGntS2 is underapproximate . . . . . . . 171C.2 Schema for proving ruleset ABS SendGntE1 is underapproximate . . . . . . . 173xiiList of Algorithms2.1 Basic distributed cycle detection . . . . . . . . . . . . . . . . . . . . . . . . 342.2 One-Way-Catch-Them-Young (OWCTY) . . . . . . . . . . . . . . . . . . . 352.3 Maximum Accepting Predecessor (MAP) . . . . . . . . . . . . . . . . . . . 362.4 A LTL DEMC algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393.1 Stern-Dill DEMC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443.2 PReach DEMC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536.1 OWCTY at a high level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1176.2 PReach-Resp root process . . . . . . . . . . . . . . . . . . . . . . . . . . . 1246.3 PReach-Resp worker process . . . . . . . . . . . . . . . . . . . . . . . . . 1266.4 PReach-Resp ExpandAndSend subroutine . . . . . . . . . . . . . . . . . 127xiiiList of AcronymsBDD Binary Decision DiagramCEGAR Counter-Example Guided Abstraction RefinementCMP Chou-Mannava-ParkCTL Computation Tree LogicDEMC Distributed Explicit-state Model CheckingDF Deadlock-FreedomDFS Depth-First SearchEMC Explicit-state Model CheckingFSCC Fair Strongly Connected ComponentHIR Heuristic Inference RuleLTL Linear Time LogicOA Over-ApproximateOST Outstanding-Search TableOWCTY One-Way-Catch-Them-YoungxivPTFA Predecessor Trace Fair ActionsPtP Pass-the-PathSCC Strongly Connected ComponentsST Search TableUA Under-ApproximateWHT Witness Hash TablexvAcknowledgmentsThanks to Mark Greenstreet for his supervision. His patience, advice and enthusiasm havebeen a constant inspiration. My supervisory committee of Alan Hu, John Harrison andKarthik Pattabiraman have all been helpful sources with regard to my studies. Thanksto Flavio M. de Paula, Jesse Bingham and John Erickson for all their hard work on thePReach project — without them this thesis would not exist. Flemming Andersen alsoprovided invaluable support on this project, which included guidance during two internshipswith his group. I thank John Erickson and Jim Grundy for their mentorship during thesework terms. I appreciate the kindness and support from the ISD lab students over theyears, and my good friends at UBC and beyond. Finally, I thank my family members fortheir deep interest in my education, and for always lending a hand.xviFor my parents, Rosemary and Paul, my role models in life.xviiChapter 1IntroductionConcurrency pervades modern hardware and software. From instruction-level parallelismexploited in CPU execution pipelines, to multiple threads running on a multicore ma-chine, to a fully distributed computation on a compute cluster, concurrency is critical toperformance. Modern CPU designs are becoming increasingly parallel. Energy and tem-perature considerations with current technologies prohibit faster clock speeds [97]. Becausesingle-core performance has reached its limit, multicore architectures are proliferating [3].Such hardware designs require careful considerations for the protocols that support thisconcurrency through inter-core and inter-cache communication.These hardware protocols are often devised and described at a high level by com-puter architects. The concurrent nature of these protocols make them particularly proneto subtle design bugs, as it is difficult for the human to consider all possible interleavingsof concurrent events. Unchecked, any fundamental bugs or mistakes may not be exposeduntil the protocol has been implemented in a hardware description language, amid manyother details, and simulations are run. But as simulation is almost necessarily incompletewith respect the the possible inputs, these bugs may escape to later stages of design andultimately persist into fabricated chips and sold to customers. Errors that are discoveredlate in this design, implementation and production process can have extremely expensive1consequences. Economically, we wish to catch these bugs as early in the design cycle aspossible, for example, before a hardware description exists. This drives the effort knownas formal verification — the attempt to prove formal properties of a computer chip design.Automated methods are easiest to use, but can require massive computational resources.To harness the required resources, we seek to scale our verification capabilities and makeuse of clusters of commodity machines.Let’s consider a simple example protocol that controls a traffic light. Suppose thelight controls 4 directions: north, south, east, west, and has the standard green, yellowand red signals. A controller will dictate which signals are on when the traffic light isfirst powered on, and then how they change over time from there. The controller uses atimer to measure how long to stay in the current state before changing. It may also useadditional information such as the time of day or sensory data that indicates the presenceof a waiting vehicle. At the protocol level, we omit this timing information and considerwhat the controller allows as far as changes to the signals. For example, we wouldn’t expecta good controller to allow a light to transition from green directly to red; we would expecta yellow light in between. For a simple controller, the state of the traffic light is preciselythe signals that are turned on in the 4 directions.How do we check if our controller is correct? One approach is to simply test it outand see if any accidents occur. Perhaps this is done in a controlled environment where nomoving cars are involved. But how much of this testing is sufficient? Ideally we would liketo exhaustively check every possible configuration and make sure all of them are okay. Weneed a formal notion of which states are “bad states”; such a description is called a safetyproperty. A safety property of this system could be “if north is green then east is red”,as we can all agree that violating this statement is indeed quite unsafe. But establishingsafety isn’t enough for a traffic light. Indeed, a light that has all 4 signals turned redforever is safe, but not very functional. This motivates the notion of liveness properties,that reason about events that will eventually occur. A liveness property of this system2could be “north will eventually be green”, because otherwise the southbound cars couldwait indefinitely.Once we decide on safety and liveness properties, we can use verification tools toautomatically verify them. To show safety, we need to exhaustively list all states that thesystem allows, and check that none of them meet the “bad” criteria. Liveness is moredifficult, as this reasons about sequences of states that may be of unbounded length. Thecomputational cost of both of these checks is mainly determined by the number of reachablestates, that is states that could occur in the controller. Unfortunately, this number tendsto increase rapidly as more detail is added to the system. In this example, determining thenumber of reachable states of this system would require a more formal description, but inpractice it is often proportional to the number of possible states. Assuming that exactlyone signal is always on in every direction, the number of possible states for this system(recalling there are 4 directions and 3 signals), is 3 × 3 × 3× 3 = 34 = 81. How does thischange if more detail is added to the system? Suppose that instead of 4 directions we havean addition direction to consider (say, from the north-west), and in addition there’s a leftturn arrow, i.e. there’s now 5 directions and 4 signals. Now the number of possible statesis 45 = 1024, a huge increase. This is an example of combinatorial explosion.In order to deal with state-explosion, many ideas have been proposed. One ofour main approaches is to distribute the exhaustive check among many computers, toleverage their aggregate memory and computing power. Such techniques are incentivized bymodern computers, as commodity multicore machines connected via commodity networksare sufficient to drastically increase our verification capabilities.Much of this thesis focuses on the liveness problem. But with most verificationresearch, it’s best to address safety first, as it is easier to deal with. Thus we presentour tool called PReach, which lists all possible states of a system and checks them forsafety. The motivation for this model checker was straightforward, as we had colleagues inindustry with large model checking problems, and existing tools lacked the robustness and3scalability they needed. PReach can handle systems with enormous numbers of states —on the order of billions — using a hundred machines or more. While the original PReachwas very effective for verifying safety properties, it highlighted the need to check livenessproperties, which are generally harder to deal with. Because it involves reasoning aboutsequences of states, model checking of liveness is susceptible to a much more severe form ofthe combinatorial explosion problem than safety. Furthermore, reasoning about sequencesintroduces mathematical technicalities that can make the properties to be checked incom-prehensible to the architects. To address both the computational and the psychologicalcomplexity of liveness checking, we consider three special cases that both simplify the prob-lem and address a large number of the liveness issues that arise in practice. In particular,we extend PReach to check for deadlock-freedom and response properties.In the context of the traffic light example, a deadlock freedom property could be“if new cars stop arriving at the intersection, all waiting cars will eventually face a greenlight”. This is important for hardware designs as it can demonstrate that the systemwill not get “stuck” in a state where pending operations cannot complete. An exampleresponse property is “every car that arrives will eventually face a green light”. This isa stronger statement than the deadlock freedom property, as it says something aboutliveness of the intersection even when there’s a steady stream of traffic. In practice, thisis a more difficult property to specify and verify. Finally, we also give a method forproving parameterized deadlock freedom properties. This means that we could show thatan intersection with any number of directions satisfies a deadlock freedom property. Suchtechniques are of particular importance when combinatorial explosion exhausts the capacityof our verification tools.In summary, this dissertation attacks the problem of automatically proving live-ness properties of real industrial hardware protocols. We proceed in the next section byformalizing the problem and our contributions using standard verification terminology.41.1 Formalized ProblemConsider the problem of verifying correctness of a hardware design. Often, a verificationengineer will formally express some key aspects of the design in a specification language,which implicitly describes a directed graph called a transition system (or simply a system).This is typically achieved by stating system variables, initial conditions on their values,and conditions for changes to their values. Such descriptions vary in detail, ranging fromthe protocol level, which models event-based nondeterministic behavior, to cycle-accuratemodels to bit-accurate models. The properties we would like to prove about the systemare often categorized as safety properties and liveness properties. These are formally dis-tinguished below, but a commonly used intuitive taxonomy quotes Lamport: “somethingwill not happen” for the former and “something must happen” for the latter [80]. Modelchecking [44] refers to algorithms and tools for automatically verifying these properties. Inthis thesis, we focus on explicit-state model checking (EMC), which represents each state(a valuation of the system variables) as a distinct object in memory. The state-space ex-plosion problem limits the effectiveness of model checking; as more detail is added to themodel the number of possible states grows exponentially. Typically the number of reach-able states grows in proportion to the number of possible states, which places increasingdemands on computational resources.Each vertex of the system corresponds to a state; the arcs of the system are calledtransitions. If p is a predicate over the system variables, then the set of states with avariable valuation that satisfies p are called p-states. The transition system includes apredicate for the initial states. State s is called reachable if there is a path from an initialstate to s. A trace is a walk of possibly infinite length that begins at an initial state. Exceptin highly degenerate transition systems, there are an uncountably infinite number of traces.We define a safety property as an assertion over all traces for which a counterexample tracehas finite length. An example is “Always p”, which means that every reachable state isa p-state. Verifying a safety property using EMC amounts to enumerating the reachable5states and checking something “local” to each state. A liveness property is an assertion overall traces for which a counterexample trace has infinite length. An example is “EventuallyAlways p”, which means that every infinite trace has an infinite postfix of p-states. Livenessproperties are more computationally intensive to verify because they require reasoning overtraces rather than reachable states; liveness verification calls for algorithms that detect andanalyze strongly connected components of the state transition graph [47, Section 22.5].To cope with state-space explosion and computers with limited memory, judiciouschoices must be made when deciding which aspects of the design are included in the model.If a model checking procedure exhausts memory resources, then removing details from themodel will usually reduce the size of the reachable state space. Suppose the model inquestion has is parameterized by an integer that monotonically increases the reachablestate space size. Then, reducing the value of this parameter may render model checkingtractable. For example, a cache coherence protocol description may be parameterized bythe number of caches; clearly increasing this number leads to a greater number of variablesand concurrent transitions, and a larger reachable state-space. Perhaps the design ofinterest has 32 caches, but we can only model check the protocol model with 4 caches. Theexpectation is that safety or liveness violations in the smaller model will have analogousviolations in the real design, thus increasing our confidence when model checking succeeds.Of course, this is does not rule out the possibility that 32 cache design allows violations thatare only present with more than 4 caches. Two different approaches for further improvingconfidence are: (1) using tools and techniques to increase our model checking capacity, and(2) using a parameterized verification technique, which proves something about the cachedesign for any parameter value, including 32. Both approaches are explored in this thesis.Despite the complementary nature of safety and liveness, the majority of verificationresearch and practice focuses on the former. This is due to safety being easier conceptually,theoretically and computationally. Our view is that liveness checking is a crucial componentof the verification task. Not only is liveness usually a desirable property for systems to6adhere to, but liveness checking can also catch modeling errors. Suppose the user intendsto specify a system that will have reachable states S, but the user makes a mistake whenwriting the model and the resulting system only reaches a subset of the intended statesSˆ ⊂ S. If all states of Sˆ satisfy safety but some state of S \ Sˆ does not, then the safetyviolation will be obscured by the user error. However, simple liveness checks can revealerrors that were hidden by a modelling mistake and give the user confidence that thecorrected model faithfully captures the system as intended.In this thesis we focus on two specific classes of properties of finite state systems.Because we emphasize industrial application, we choose properties that practicing designerswill find easy to interpret. Likewise, our emphasis on large-scale problems leads us to chooseproperties that are both useful to the designer and computationally tractable for automaticverification.The first liveness property that we consider is deadlock-freedom (DF). The basic ideais that given a system model, we identify some actions as injecting new transactions intothe system (for example, requesting a read to or write from memory); other actions makeprogress on some pending transaction; and some actions complete one or more pendingtransactions. Let q denote quiesence, a description of states in which there are no pendingtransactions.DF states that from any reachable state, the model can reach a q-state; inother words, all pending tasks can be completed. In CTL (computational tree logic —described in more detail in Section 2.1), we can express DF as AGEF q. AG means “alongall paths (i.e. at all reachable states), and EF q means “there exists a path to a q state”.Thus, AGEF q is CTL for “from every reachable state, there is a path to a state thatsatisfies q.” We present efficient approaches for verifying DF, which can also be appliedto the more general version AG (p → EF q) which means “from any reachable state thatsatisfies p, there exists a path to a state that satisfies q.” Note that the existence of a pathdoes not guarantee that the path is taken. For example, an unending sequence of new tasksthat pre-empt some older task might continue to arrive; or the model may allow the system7to make non-deterministic choices, only some of which lead to completing all pending tasks.To show that every task is eventually completed, we need a stronger property than DF.Intuitively, a response property says that every request for some service is eventuallygranted. Let p-states be those in which the service has been requested, and q-states be thosein which the service has been completed. A response property specifies that every tracethat reaches a p-state will later reach a q-state. In CTL we write this as AG (p → AF q);this means that from all reachable p-states (AGp), all paths eventually reach a q-state. Incontrast, the E in EF q (as used to describe DF) says that there exists a path along whichq is eventually satisfied.Adding a bit more notation, well note that there are two commonly used logics forreasoning about liveness: CTL and LTL (linear-time logic). The differences are describedin Section 2.1 — neither is strictly more expressive than the other. In LTL, indicates“henceforth” (i.e. the property holds for all subsequent states) and ♦ denotes “eventually”(i.e. the property holds for some subsequent state). The CTL formula AG (p → AF q)is equivalent to the LTL formula (p → ♦q). When describing response properties, theLTL notation is more commonly used in the research literature, and we will use the LTLnotation here1.In practice, it is necessary to impose fairness assumptions on the system that for-mally express which traces are deemed to be realistically allowed by the expected underlyingimplementation. For example, an arbiter that always grants to one requester and not theother could be viewed as unrealistic and ruled out using fairness. More generally, mostmodels consist of many concurrently operating subsystems. It is common to have a sce-nario with entities a, b, and c where a makes a request of b while c can continue indefinitelyworking on its own. Without fairness, we must consider traces where a makes a requestand all subsequent actions are those of c. For many models, it is unrealistic to considerscenarios where c performs an arbitrary number of actions while b does nothing. We need1Some authors use G instead of and F instead of ♦ in LTL formulas. We use and ♦ notationas it will not be confused with the CTL notation that describes DF properties.8Property Type Increasing Capacity ParameterizedSafety PReach MC [22] CMP Method [40]Deadlock-Freedom PReach-DF MC, CTL MC [23]Response Properties LTL MC, PReach-Resp MC [93, 18, 56]Table 1.1: Verification methodsa way to exclude these unrealistic, also referred to as unfair, traces. If Fair is a predicateover traces expressing these assumptions, then Fair → (p→ ♦q) says that every fair tracethat includes a p-state will include a q-state in the future, or equivalently, that any tracethat visits a p-state and then cycles forever without visiting a q-state necessarily violatesfairness.We expect designers are more interested in response properties than DF because itguarantees something good will happen as opposed to it merely being possible. However,DF avoids the technical and conceptual difficulties associated with specifying fairness, isless computationally complex, and we believe many real protocols that violate response infact also violate DF. Intuitively, DF can express that pending transactions in the system cancomplete if new transactions stop being injected. On the other hand, response propertiescan reason about a particular kind of transaction always eventually completing regardlessof other activities.The cross product of the property types {Safety, Deadlock-Freedom, Response Prop-erty} and the goals of {increasing capacity, parameterized verification} gives 6 pairs, assummarized in Table 1.1. We briefly cover this table and expand on each entry in Sec-tion 1.3.The work of this thesis is formulated using the specification language of the Murϕmodel checker [52], a guarded command language [50] for finite state transition systems.However, we believe the methods are general to the languages used by other model check-ers [65, 41, 5].Model checking of safety properties has several well-understood enhancements that9aim to increase capacity, including abstraction, symmetry reduction and partial order re-duction. Our contribution in this area is the distributed explicit-state model checking(DEMC) tool called PReach, which utilizes the aggregate memory of hundreds of ma-chines and is entirely compatible2 with these other techniques. We have extended PReachto check both DF (PReach-DF) and response properties (PReach-Resp) in a distributedenvironment.Parameterized verification is known to be an undecidable problem, so human guid-ance is usually a key ingredient. There are a number of previous works for both safety andliveness properties in this area. The Chou-Mannava-Park (CMP) method [40] for safetyproperties has been demonstrated as an effective approach when applied to Murϕ modelswith a symmetric parameter. We built upon this method and developed a theory for prov-ing parameterized DF properties, also using Murϕ models with a symmetric parameter,that relies on PReach support for MC-DF checks [23]. See Chapter 2 for a survey oftechniques that deal with proving that LTL formulas hold in parameterized systems.To summarize, this thesis addresses the following problem.The ubiquity of highly concurrent hardware and software designs calls for veri-fication tools that support both safety and liveness. Given the conceptual andcomputational complexity of specifying and verifying liveness, we need verifica-tion methods that support straightfoward liveness properties and that can beapplied to industrial-scale problems.1.2 The Explicit-State ApproachBefore I discuss our contributions that utilize explicit-state model checking, it is importantto justify the use of this old and simple method. Indeed, landmark contributions have2Abstraction can be performed to generate the input model, and symmetry reduction is built into the Murϕ front-end. Partial order reduction, on the other hand, is not supported by PReachbut it has been investigated in a parallel model checking setting [100].10been made with symbolic model checking techniques that represent a set of states (orapproximation thereof) with a boolean formula over system variables, and the next-statefunction with boolean formula over primed and unprimed variables. The first widespreadsymbolic model checkers used binary decision diagrams (BDD) [35] and demonstrated thatthe state-space explosion can be curbed for many systems of practical importance [90].Since then, other symbolic approaches have been employed, including those that utilizeSAT queries and interpolants [95, 33]. These methods are necessary for what is referredto as hardware model checking, where a sequential circuit is modeled as an and-invertergraph with some modest number of input wires. Explicit-state model checking (EMC) isoften infeasible for these kinds of models as the number of input combinations blows upexponentially. However, we note that while symbolic approaches are essential for certainproblem domains, there is no “silver bullet” to dealing with state-space explosion. Asnoted by Hu [67, Section 1.2.3], there are 22n boolean functions on n variables, and anyrepresentation scheme that uses a polynomial number of bits O(P (n)) can represent atmost 2O(P (n)) of these functions.EMC tends to be appropriate when the reachable state space features sufficientnon-uniformity. In such cases, symbolic expressions may degenerate to a large formulathat resembles a disjunction of conjuncts, where each conjunct describes only a smallnumber of states. Then, symbolic operations will in a sense require just as much workas explicit-state, but with a larger constant factor in terms of time and memory cost.The aforementioned work by Chou et al. [40] suggests that EMC is the right choice whenchecking cache coherence protocols. To paraphrase:1. BDD performance is more sensitive to the data structures used to describe the pro-tocol;2. EMC can take advantage of symmetry reduction [70];3. EMC is better suited for disk-based techniques;114. SAT-based methods are not known to outperform BDD-based methods on cachecoherence protocols.In addition, EMC seems to be better suited for distributed implementations, atleast compared with BDDs. Existing approaches involve maintaning the BDD on a singlemachine until it is about to exceed the memory capacity, and then carefully split the BDDin order to balance memory usage among two machines and minimize duplication [61, 60].In contrast, distributing the reachable state space among machines is trivially achieved byusing a uniform random hash mapping [104]. We note that Bradley’s original IC3 modelchecking paper involves experiments that use multiple cores and machines, but appears tosaturate in performance at 8 to 12 threads [33]. There are also promising IC3 methodsfor showing liveness [34], but to our knowledge the parallel performance has not beeninvestigated.1.3 Verification Methods of this ThesisPReach (Parallel REACH ability) is a distributed explicit-state model checker based onMurϕ [22]. It is designed to be scalable and harness the aggregate computing power ofclusters of machines. PReach verifies Murϕ models and borrows Murϕ’s C++ imple-mentations of key model checking components, such as a hash table to store states andcompilation of Murϕ models to C++ code for the efficient computation of state successors.The main module that handles communication and coordination of threads is written inthe functional language Erlang [2] and is roughly 1000 lines of code. This separation of thedistributed algorithm from model checking details offers both efficiency and simplicity. Ourexperiments have shown that in addition to linear speedups, PReach can utilize hundredsof compute nodes to explore the state space of the largest Murϕ models ever checked —on the order of 100 billion states. The clean code and extensibility of PReach has beenleveraged in subsequent projects [23, 26, 27].12First, we implemented a sequential DF checking procedure as part of a method forparameterized verification of DF [23]. For a symmetric system S described in a guardedcommand language and parameterized by n, we seek to show that for any n, each reachablestate of S(n) has a path to some q-state, i.e., S(n) |= AGEF q, where q is a predicate3.Our approach is inspired by, and builds upon the CMP method [40] which establishesproperties of the form S(n) |= AGp for predicate p, which may include universal quantifiersover the parameterized set. They use a counter example guided abstraction refinement(CEGAR) approach [45]. First, an abstraction A of S(n) for every n > k is computed,which essentially keeps k fixed parameter entities fully modeled, and overapproximatesthe behavior of all others. Next, a trace of A is found via model checker that reaches a¬p-state. The user examines the trace and devises a non-interference lemma φ1 which isthe abstraction of an assumed invariant of S(n). The intention is that if φ1 holds of S(n)for all n > k, then the spurious trace cannot occur. This is achieved by strengthening Awith φ1 by restricting when actions of the abstracted parameter entities may occur, andmodel checking is used again to verify AG (p∧φ1). The user continues this process until theabstraction satisfies AG (p∧φ1 ∧ ...∧φm), a counterexample is found, or the user is unableto find the next useful φi. If AG (p ∧ φ1 ∧ ... ∧ φm) is established, the assumed invariantsalong with p are actual invariants of S(n) because A simulates S(n) for n > k.Adapting this method to proving parameterized DF presents challenges. We needto show that every reachable state in S(n) has a “witness” — a path to some q-state. Inorder to find such paths, it’s often necessary to include the actions of abstracted parameterentities. However, because the abstraction overapproximates these actions, it is unsoundto assume that they may occur as there’s no guarantee that corresponding actions exist inthe concrete system.Suppose we have used the CMP method to verify some safety properties of param-3Typically, q is universally quantified over the parameterized set, and is interpreted by to theset of quiescent states. In cache coherence protocols, this corresponds to states where there are nopending messages.13eterized system S(n) via abstract system A. As the transitions of A are overapproximate,there could exist reachable states for which no concrete analogue is reachable in S(n) (forany n). In our method, we employ a mixed abstraction which augments A with underap-proximate transitions U ; denote the overapproximate transitions with O. Transitions ofU are guaranteed to have a concrete analogue, and thus soundly imply existential pathproperties in S(n). To check such properties, we use state space enumeration to explorestates reached through O-transitions, and for each reachable state a path is found com-posed of U -transitions to a q-state. PReach implements this model checking procedure.If there exists a reachable state s˜ for which there is no U -path to a q-state, a counterex-ample trace is generated. Following an analogous CEGAR procedure to that of the CMPmethod, the user must either prove that s˜ is unreachable by strengthening O, or that sucha path should exist from s˜ by weakening U . The latter is done by using inference rulesthat we refer to as “heuristics” [23]; see Chapter 5. The proof obligations of heuristics aredischarged by model checking the mixed abstraction as described above. We applied thismethod to both the German and FLASH cache coherence protocols as case studies whereDF for any number of caches was established.Following up on this work we designed and implemented two distributed algorithmsfor verifying DF of systems. The user provides both the quiescent predicate q, along witha set of helpful transitions H that are expected to be a sufficient set to form a path fromevery reachable state to some q-state. Similar to the paths formed using U -transitionsabove, both algorithms perform a directed search using only H-transitions. Identifyingthese transitions provides an expression of designer intent. This approach allowed us todiscover a bug in the Peterson mutual exclusion model that had persisted in the Murϕdistribution for about 20 years. We found that in most cases the average search pathis short and PReach can perform this lightweight liveness check for a small additionaloverhead as compared to safety checking.To attack the verification of a response properties, we implemented a variant of14the One-Way-Catch-Them-Young (OWCTY) algorithm [73, 38] in PReach, capable ofchecking response properties on systems augmented with fairness on actions. PreviousOWCTY algorithm descriptions were aimed at checking arbitrary LTL formulas, used datastructures that respresented states sets with full state descriptors, and generally did notinclude fairness assumptions. In PReach, states are hashed to 40 bit descriptors, so thatchecking membership of a set is easy but member states cannot easily be reconstructed.Our approach uses forward reachability only, and augments the hash table entries withvarious bookkeeping bits to emulate membership in the various sets. We have found thisimplementation to be capable of checking response properties on fair systems with hundredsof millions of reachable states.1.4 Thesis StatementGiven some hardware design model, the class of properties that can be described as DF orresponse are of practical interest. Capacity constraints limit the effectiveness of automatic,sequential model checking approaches, especially for protocol level descriptions. Manydesigns can be modeled as a system that is parameterized, for example by number of cachesor addresses. Two approaches are to set the parameter as large as our model checking toolsand hardware resources allow, or to employ a parameterized verification method. Theseapproaches trade-off human effort with strength of the verification result with respectto the original design. Another trade-off is verifying response properties versus DF; theformer is a stronger result, but the latter is easier both from a computational and humaneffort perspective. Regardless of the approach, there is particular importance to mitigatethe burden on the user. By leveraging the stable, tested and large scale distribution ofthe state-space provided by the PReach model checker, we are able to achieve scalablemethods for liveness verification.This brings us to the thesis statement:15This thesis develops and demonstrates tractable, practical and scalable dis-tributed explicit-state model checking methods for establishing liveness prop-erties of practical importance for large-scale models of hardware protocols.1.4.1 Contributions1. PReach: an industrial strength parallel, explicit-state model checker capable ofchecking the largest published Murϕ models by distributing the computation acrosshundreds of machines. My contributions include:(a) Created the initial prototype of the distributed model checking algorithm.(b) Implemented and evaluated various approaches to load balancing to addressboth performance and thread crashes.(c) Gathered statistics on Erlang communication throughput and latency relativeto message size, motivating state-batching, load balancing, and other optimiza-tions.2. Parameterized DF: A novel method for establishing DF properties in symmetricparameterized systems. This is the first work in parameterized DF, and complementsthe CMP method for parameterized safety properties.3. PReach-DF: Two approaches for DEMC of DF. Shows that given simple and eas-ily available user input, can model check DF on large models for a small overheadcompared with state-space enumeration.4. PReach-Resp: A novel algorithm implementation for distributed explicit-state modelchecking of response properties on fair systems. Demonstrated that in practice theOWCTY algorithm takes only a modest constant number of expansions per state —far less than the worst-case performance.161.4.2 RoadmapThe rest of the thesis is organized as follows. Chapter 2 is related work, comparing andcontrasting our contributions with literature in DEMC, parameterized verification, andLTL model checking. The Section 2.1 provides technical background and some definitionsused throughout the thesis. Chapter 3 summarizes the PReach model checker. Chapter 5explains our approach to parameterized DF proofs. Chapter 6 covers the implementationof the distributed OWCTY-like algorithm. Finally, Chapter 7 summarizes the thesis andpoints to areas of future work.17Chapter 2Related WorkHere, we survey the related research from the literature and compare it with our work.First, Section 2.1 provides some definitions, notation for CTL and LTL temporal logics andbackground for the Murϕ description language. These are used throughout the rest of thisthesis and are relevant to related work. The next three sections divide the previous workinto categories. Section 2.2 describes parallel and distributed explicit-state model checkingwith a focus on tools. Section 2.3 examines parameterized verification with an emphasison techniques. Section 2.4 describes LTL model checking with a focus on algorithms. Weconclude with a summary in Section 2.5 that puts the contributions of this thesis intocontext.2.1 PreliminariesA system S is a triple (S, I, T ) where S is a set of states, I ⊆ S are the initial states, andT ⊆ S × S is the transition relation. System S may be viewed as a digraph G = (S, T ).The reachable states of a system are the states s for which a path exists in G from aninitial state to s. A trace is a walk of possibly infinite length in G from an initial state. IfS′ ⊆ S, let 〈S′〉 be the subgraph of G induced by the vertices of S′.A system has a set of variables each with finite range that each state is a valuation18of. If p is a predicate on state variables, then we say that s is a p-state if p(s) is true. Wesometimes overload this terminology; if A is a set of states, then an A-state is a memberof A. If v is a variable of S and s is a state, then v(s) is the value of v in s.2.1.1 Temporal LogicHere we give a brief explanation of CTL and LTL. For a formal description, see [68].CTL is a logic over paths of G, and are meaningful in any state of G. SymbolsA and E are branching quantifiers, where A means “for all paths from the current state”and E means “at least one path from the current state”. Each branching quantifier isimmediately followed by a path-specific quantifier:• X φ means φ holds in the next state on the path.• F φ means φ holds eventually on the path.• G φ means φ holds everywhere on the path.• [φ U ψ] means that φ holds at least until some state where ψ holds, and that ψeventually holds.• [φ W ψ] means that φ holds at least until some state where ψ holds. If φ holdseverywhere on the path then φ U ψ holds for any ψ.Here, φ and ψ are CTL formulas consisting of braching/path-specific quantfier pairsor state predicates. CTL formulas may be combined with the usual logic connectives: ¬,∧, ∨ and →.Unless otherwise stated, when we say that CTL formula ψ holds in S, it means thatψ holds in every initial state. As examples,• AGp means “always-globally p”, or p holds in all states along all paths;• AGAFp means “always-globally, always-eventually p”, or in all states along all paths,p holds eventually’19• AGEF p means “always-globally, exists-eventually p”, or in all states along all paths,there exists a path to a state where p holds.LTL is a logic over all infinite traces the system allows. As with CTL, the logicalconnectives have the usual meaning and p is a state predicate. As LTL reasons over a setof traces, there is no notion of branching quantifiers, only path-specific quantifiers:• φ means φ holds everywhere along the trace.• ♦ φ means φ holds eventually along the trace.• © φ means φ holds in the next state on the trace.• φ U ψ means that φ holds at least until ψ holds, and φ eventually holds along thetrace.• φ R ψ means that φ holds at least until ψ holds, but φ need not eventually holdalong the trace.• p means that p holds in the first state of the trace.We use both CTL and LTL in this thesis. Neither is strictly more expressive thanthe other. For example, the CTL formula AGEF p cannot be expressed in LTL, and the LTLformula ♦p cannot be expressed in CTL. Even though response properties are expressiblein both CTL and LTL, we choose standard LTL notation as it is consistent with theliterature on response properties.2.1.2 The Murϕ Description LanguageHere we give an overview of the Murϕ description language which is the input languageof the Murϕ model checker [52] and PReach. For the Murϕ user manual and a tutorial,see [98, Murphi 3.1]. Examples of full Murϕ models can be found in Figures 5.2 and 5.3,as well as Appendix A.20A Murϕ system gives the variables that dictate the state space, a predicate for theinitial states, and implicitly describe the transition graph through a sequence of guardedcommands, or rules. It also contains a list of state invariants. When model checkingcommences, initial states are constructed and then their successors computed accordingto the list of rules. Each rule has a guard, which is a predicate on states. If the guardholds true in state s, the rule is said to fire and the command part of the rule specifies anatomic update to apply to s to generate successor state s′. If all reachable states satisfy allinvariants, model checking completes once all reachable states have been visited. If somereachable state does not satisfy an invariant, checking halts and a counterexample trace isprinted.We will construct a traffic light controller model as an example, to highlight thevarious syntactic elements.The keyword const is used to specify constants.const NUM_DIRECTIONS 4;The keyword type is used for type definitions. Some built-in types include booleanand num (for integer values). In order to exploit symmetry reduction, a scalarset typemust be used.type DIR : scalarset(NUM_DIRECTIONS);This indicates an index set of symmetric entities. The values of the indices cannotbe compared with less-than or greater-than, only for equality. Enumerated, record andarray types can be defined.COLOR: enum green, yellow, red;SIGNAL: record color : COLOR; waiting_car: boolean; end;The system variables come next, using the keyword var. In our example, the onlyvariable is an symmetric array of type SIGNAL:var intersection: array[DIR] of SIGNAL;21The startstate keyword precedes a command for the initial states. These assign-ments are atomic, and take the form variable := value;. For loops may be used torange over index sets such as scalarsets. The initial assignment may be given a name —here we choose the Init.startstate "Init"for d : DIR dointersection[d].color := red;intersection[d].waiting_car := false;end;end;This will initialize each direction’s color to red and set the waiting car flag to false.Note that any uninitialized variables will be set to the special value undefined, regardlessof type. It is a Murϕ runtime error to read a variable with value undefined unless it iswith the special function isundefined.A ruleset is a group of rules generated from an index set. For example, we coulduse the following to allow green to yellow transitions for all directions.ruleset d: DIR dorule "green_to_yellow"intersection[d].color = green==>intersection[d].color := yellow;end;end;Murϕ predicates (and guards) may include the symbols &, |, !, ->, = and != forlogical and, or, not, implication, and comparing for equality and inequality, respectively.Some other rules we might want:ruleset d: DIR dorule "yellow_to_red"22intersection[d].color = yellow==>intersection[d].color := red;end;rule "red_to_green"intersection[d].color = red==>intersection[d].color := green;end;rule "car_arrives"!intersection[d].car_waiting==>intersection[d].car_waiting := true;end;rule "car_proceeds"intersection[d].car_waiting & intersection[d].color = green==>intersection[d].car_waiting := false;end;Finally, we include invariants that all states must satisfy, otherwise Murϕ will haltwith a counterexample trace. This also introduced forall expressions that form a conjunctof predicates depending on values from an index set.invariant "No_Collision"forall d1 : DIR do forall d2 : DIR doi != j ->((intersection[d1].color = green & intersection[d1].car_waiting) ->!(intersection[d1].color = green & intersection[d1].car_waiting))end end;This invariant says that at most one direction will have a car waiting and a greensignal at a time. But this invariant doesn’t hold of our current system, as all 4 directions23might be green with a car waiting. We could address this by only allowing a change fromred to green when all lights are red, changing the guard of rule red to green.rule "red_to_green"forall d1: DIR do intersection[d1].color = red end==>...This new system satisfies the invariant, but probably isn’t a good traffic light sinceat most one direction can be green at a time. In order to allow opposing directions to bothbe green at once would likely require us to rethink the types and symmetry. We won’t gointo this level of detail, as what we have presented is enough to follow the Murϕ code ofthis thesis.2.2 Parallel and Distributed Model CheckingThe two most common methods of performing model checking are explicit-state space enu-meration and the use of symbolic representations. For some industrial protocols, explicit-state model checking is considered to be the more effective verification technique [104], andhence is employed by numerous tools such as Murϕ [52], SPIN [65], TLC [114] and JavaPathFinder [63]. Most of these model checkers explore the state space in a sequential man-ner which can be a hindrance in terms of both memory usage and runtime when verifyingsystems with a large state space. This motivates the use of distributed explicit-state modelchecking (DEMC) tools. These tools may be categorized into those that handle safetyonly [104, 82, 96, 32] and those that handle some form of liveness [13]. DEMC for safetyproperties has received more attention. Generally, liveness checking is more complex bothcomputationally and conceptually, yet it remains of industrial importance. To address thisneed, the main focus of this thesis is to design scalable solutions for interesting livenessproperties of real models.24The PReachmodel checker [22] is an implementation of the Stern-Dill algorithm forDEMC [106]. It reuses back-end computation components of the Murϕ model checker [52]and reads models written in the Murϕ language, which has become the de facto industrystandard for modelling hardware protocols. This algorithm statically partitions the statespace among processing nodes according to a uniform hash function, where each node is saidto own a set of states. When a state is expanded, the successors are sent to their respectiveowners where they are stored and subsequently queued for expansion, and duplicate statesare ignored. Thus, state ownership specifies the compute node that is responsible forstoring a given state. In PReach, a hash table implementation borrowed from Murϕ [103]is used for each node to record the states it has encountered. See Figure 2.1 for a flow-chartoverview of Stern-Dill DEMC.The fact that state ownership is independent of which compute node performs theexpansion leads to load balancing schemes on the states that are queued for expansion (thework queue). That is, as long as a given state is initially sent to its owner to check for hashtable membership, any compute node may perform the state expansion step. The needfor load balancing in DEMC has previously been identified by Behrmann in the context oftimed automata model checking [19], and later by Kumar and Mercer [77]. We contrast thelatter scheme with that of PReach in Section 3.3. Also, when the work queue becomeslong, we may store most of its states on disk [105] to keep the majority of main memorydedicated to the hash table. Because the work queue is written and read sequentially withlarge blocks of states per disk access, using disk storage has minimal impact on performancecompared with using memory only. States are batched into messages with typically 100 to1000 states before sending between compute nodes. This batching method has been usedby others in the context of DEMC [106, 96].The closest tool to PReach is a model checking tool that has been extensivelydescribed in the literature called DiVinE [111, 12, 10, 4, 6, 7, 13]. The tool originated inJiˇr´ı Barnat’s PhD work [15], and the project has been ongoing since. Initially the tool was25Insert state inhash table andwork queueRoot ProcessInitial StatesWorker Processesto ownersSend initial states (a) (b)Remove state fromwork queueCompute successor statesSend successor statesto ownersLOOPReport result Detect terminationReceive a stateLOOPNO YESin hash table?Figure 2.1: Stern-Dill DEMC flow-chart. Initial states are computed by a designated rootprocess and then sent to their respective owners. These worker processes receive incomingstates and check them for membership in a hash table. If the state is present, it is ignored.Otherwise, the state is inserted into the hash table and queued for expansion. Workerprocesses alternate between threads (a) and (b), where the former receives states and thelater computes state successors and sends them off to owners. Termination is detectedby the root process when all reachable states have been expanded or a state that violatessafety is found.intended to take advantage of distributed memory for model checking LTL formulas, andserved as a platform for algorithm evaluation and comparison. This led to further researchof algorithms and computing environments. Barant, Brim, and Rocˇkai [8] investigated therestricted problem of model checking weak LTL formulas using DiVinE. An LTL formulais weak if the product automaton of the formula and system that accepts counterexampleshas the property that in a given SCC every state is either accepting or none of them areaccepting. The same authors wrote an insightful paper on the challenge of applying partialorder reduction in a parallel setting [10]. On the compute environment side, DiVinEwas extended to run on CUDA enabled graphics cards [4]. They were able to utilizetwo graphics cards to speed up the model checking computation, the only work I am26aware of to do so. This work was for LTL model checking using the MAP Algorithm (seeAlgorithm 2.3). The initial state space is enumerated on a single multicore machine, andthen the reachable state graph is converted to a sparse matrix representation. This graphrepresentation is partitioned into at most two pieces and copied to the memory of theCUDA devices where the MAP algorithm runs. DiVinE has also been used to evaluate thebenefit of exploiting Flash memory for model checking algorithms that formerly utilizedmagnetic disk to store states [6]. Recently, DiVinE 3.0 was released with features aimedat software model checking [7]. Taking as input LLVM bytecode, it is capable of modelchecking C/C++ programs, along with other languages that compile to LLVM.While the DiVinE tool has explored many aspects of parallel and distributed LTLmodel checking, its scalability and applicability to industrial problems is unconfirmed. Onepublication reports it handling state spaces consisting of as many as 419 million states [11].However, another researcher has reported scalability issues when running DiVinE on 16machines [54]. The developers of DiVinE state that it is likely to crash when the computenodes are heterogenous, as it is targeted towards homogeneous clusters [13]. DiVinE has noexplicit algorithms for dealing with strong fairness assumptions, and model checking withmore than a handful of these assumptions renders the computation intractable (more detailsprovided in Section 6.7). While DiVinE has optimizations for models with weak fairnessassumptions [16], it is only possible to attach weak fairness to all rules, not a subset. Thus,DiVinE cannot catch the Murϕ Peterson bug explained in Chapter 4. DiVinE is unableto verify our notion of deadlock-freedom, as it cannot be specified in LTL. As far as I amaware, the benchmarks used to measure performance of DiVinE are substantially smallerthan the problems arising from real, industrial coherence and communication protocolsconsidered in this dissertation.LTSmin [32] is a model checking tool that handles a variety of model inputs and sup-ports (sequential) explicit-state, symbolic BDD-based, and DEMC of safety. It is designedin a modular fashion where input models are translated into an internal representation27that the model checking algorithms use, so it is straightforward to add new modellinglanguages. The Murϕ language is not currently supported, nor is DEMC of liveness prop-erties. The majority of work surrounding LTSmin appears to focus on multicore, sharedmemory model checking [85]. One paper that deals specifically with DEMC attacks theproblem of model checking models with unbounded recursive data types [31]. In thesemodels, states are not restricted to a fixed size, which leads to difficulties when hashing,distributing and comparing for equality. The largest model verified with LTSmin’s DEMCapproach is about 570 million states [32].Some other examples of well-known explicit-state parallel model checkers are adap-tations of Murϕ and SPIN. “Parallel Murϕ” [104] is a parallel version of the Murϕ modelchecker [52] based on parallel and distributed programming paradigms. Eddy Murϕ [96]has improved speed over Parallel Murϕ by the separation of concerns between next-stategeneration and communication during distributed model checking. It was intended to be adistributed model checker for Murϕ. Unlike previous work, Eddy Murϕ was implementedto take advantage of emerging multicore CPUs by using two MPI threads per computenode (one for computation and one for communication). As of this date, there is no activemaintainer of Eddy Murϕ. We have found that a recent version the software reports incon-sistent numbers of reachable states on multiple runs for the same, simple models. Thus,we do not believe that Eddy Murϕ is trustworthy in its current form. The 4500+ lines ofC++ code that comprise Eddy make it more complicated than PReach. The simplicityand stability of PReach make it the obvious foundation for our investigation of DEMCfor liveness properties.PSpin has also been used for performing distributed model checking with the ca-pability of handling up to 2.8 million states [82]. A distributed version of the Spin modelchecker [65] for checking safety only, PSpin departs somewhat from the Stern-Dill approachby allowing the hash function to be less uniform. Using a hash function that only dependson subset of the system variables can in some cases improve performance. In particular,28this non-uniform hashing exploits the structure of models written in Spin’s specificationlanguage, Promela. These models describe the system through processes with some in-dependence between them. Therefore, if many transitions only change variables that thehash function does not depend on, successor states need not be communicated to anotherworker and remain locally owned. The PSpin work shows that with ratios between theworker with the least states and the worker with the most states of as low as 0.71, largeperformance gains are possible when compared with uniform hashing. DEMC has alsobeen studied for models described by Petri nets [75, 20].2.2.1 Model Checking Deadlock FreedomOur approach for verifying the deadlock freedom property AGEF q involves firing helpfultransitions provided by the user. This essentially finds paths that follow pending systemtransactions (say, servicing a cache-miss) to completion, and q-states describe states with nopending transaction. This is reminiscent of techniques used in other verification contexts.Pipeline flushing used by Burch and Dill in their seminal paper [37] shows refinement ofprocessor pipelines by completing in-flight instructions that are beyond a commit point aspart of the refinement map. The idea of iteratively firing certain commands to completein-flight transactions is also similar to the completion functions used by Park and Dill [101],though again their goal was to verify refinement. Neither DiVinE [12] nor Eddy [96] arecapable of checking CTL properties such as AGEF q, and as mentioned above, neither hasbeen applied to large scale problems as has been done with PReach [22]. Our approachdiffers from the classical CTL model checking algorithm [44], which performs a pre-imagefix-point computation from q to compute the set of states that satisfy EFq, and thenchecks that the reachable states are contained in this set. With an explicit-state algorithmcomputing the pre-image fixpoint of q can be expensive. Suppose a Murϕ rule has anupdate of the form var := expr , where the rule guard and expr is independent of var .Then, var could have any type-consistent value in a previous state, many of which could29be unreachable states in the system. By using a forward search, we ensure that the spaceand time complexity is proportional to that of doing (explicit) state-space exploration.Doing CTL model checking only using forward searches has been investigated for symbolicmodel checking, e.g. the work of Iwashita et al. [72]. They show that using forward searchonly offers performance improvements for many CTL properties.2.3 Parameterized VerificationOur work in verifying parameterized deadlock-freedom builds upon the CMP method [40]for verifying parameterized safety properties. The CMP method has motivated otherworks, including its formalization [76], automation [28], and application to industrial pro-tocols [107, 99].There have been many previous efforts to extend compositional techniques to pa-rameterized safety property verification [42, 92, 93, 40, 76, 86, 28, 83, 99]. Pioneering workby Clarke et al. [42] showed that a mutual exclusion scheme with n nodes arranged in aring that passed around a token was correct for all n. This was established by means of abisimulation between a system with k nodes and one with k + 1 nodes for all k > 1. Thelogic to describe parameterized properties was Indexed CTL* (or ICTL*), which is CTL*without the next-time operator1, and indexed formulas of the form∧i f(i) or∨i f(i) wherei ranges over the nodes and f is a predicate over variables local to that node. More recently,the work of Lv et al. [86] gives a technique for automatic strengthening of a CMP-like ab-straction of symmetric systems. If the abstraction has m concrete nodes, they instantiatea system with m + 1 nodes and use invariants of this system as proposed variants of theabstraction.As for liveness-like properties, there are several notable works. McMillan’s work onusing compositional methods for LTL liveness properties [91] was applied to parametricliveness verification of the FLASH coherence protocol [93]. Using SMV, the user can1CTL* is a logic that is a superset of both CTL and LTL.30automatically generate a sound abstraction, and then use lightweight theorem-provingcommands for concretizing additional caches or proposing user-devised noninterferencelemmas, based on counter example analysis. Although [93] focuses on a proof of safety, thesame framework was used to show that whenever the directory is in the pending state, itis eventually not pending [94]. McMillan’s proof relies on a handful of lemmas and fairnessassumptions, designed and proven within SMV.Fang et al. proposed an interesting technique called invisible ranking [56] whichattempts to automatically guess ranking functions to prove response properties. The as-sociated proof obligations (from [88]) are decided using some small-model theorems (con-ditions under which it is sufficient to check some property on only a small instantiationof the parameterized system) and BDD based methods for proposing invariants for theparameterized system. The authors have previously use what they refer to as counterabstraction for parameterized liveness verification [102]. Here, liveness of parameterizedmutual exclusion protocols is established by an abstraction that maps counter variables tohave a range of {0, 1,∞}.Baukus et al. employ WS1S (Weak Second-order theory of 1 successor — a decidablesecond-order logic) to perform liveness verification of parameterized systems [17], and verifyresponse properties for the German protocol as a case study [18]. Like our approach,human effort is required, to select both abstract predicates and ranking predicates neededto create an appropriate abstraction. The complexity of deciding WS1S is well-known tobe super-exponential, hence scalability of this approach seems unlikely. It is unclear ifWS1S is expressive enough to model array variables indexed by the parametric type andwith values over the parametric type. Such variables appear in the Murϕ description ofthe Flash protocol.The earliest example we could find where both over-approximative and under-approximative abstractions of a transition system are employed for verification is thework of Larsen and Thomsen [81]. They distinguish between necessary and admissible31transitions; for a process to refine another it must over-approximate the former and under-approximate the latter. Of course our interests are in abstraction rather than refinement,which are in a sense inverses of each other. Later the work of Dams et al. [48] and in-dependently Cleaveland et al. [46] used mixed transition systems which are defined withtwo transition relations to formulate abstractions that preserve both universal and exis-tential properties of the modal µ-calculus. Our mixed-abstractions are similar, but theseauthors focus on the problem of curbing state-space explosion of a fixed system using anabstraction that preserves certain properties, and we utilize abstractions for parameterizedverification.2.4 Model Checking LTL FormulasThe standard approach to LTL model checking [110] involves constructing the productautomaton that is the synchronous cross product of the Bu¨chi automaton2 that acceptsthe negation of the property in question, and the Bu¨chi automaton for the system itself. Ifthe language accepted by the product automaton is empty, then the LTL property holds;otherwise, a counterexample trace is found. Counterexamples correspond to reachablecycles of the product automaton that do not include an accepting state (where all systemstates are accepting). LTL model checking suffers from state-space explosion and thisapproach takes time and space O(n2|φ|), where n is the number of states in the originalsystem, φ is the LTL formula and | · | is formula size, i.e., the number operators and state-predicates. In this standard approach, including the algorithms we will present shortly(Algorithms 2.1, 2.2 and 2.3), fairness assumptions are handled implicitly, i.e., by addingthem as antecedents on the formula to check.Explicit-state methods for LTL model checking must utilize some method of decom-posing graph (V,E) into its strongly connected components (SCCs). Using a sequential2A Bu¨chi automaton is similar to a nondeterministic finite automaton with accepting states F ,but accepts traces of infinite length — those that visit an F -state infinitely often.32algorithm for accepting cycle detection such as Tarjan’s [108], SCCs may be found inO(|V |+ |E|) time. This, along with several other algorithms for the same problem rely ona depth-first search (DFS) to compute SCCs, however, DFS is known to be P-completeand therefore is not easily parallelizable. However, such DFS-based algorithms are un-suited to parallelization unless3 P = NC [9]. In comparison, the time complexity of ourapproach using PReach for proving response properties differs by a factor of r/s wheres is the speedup of using multiple threads. Manna and Pnueli presented a sequential al-gorithm for model checking response properties of fair transitions systems [89], but thisnot easily parallelizable and thus scalability is limited. Recently, Holzmann implementedsome interesting liveness checking algorithms in a multicore version of SPIN [66]; howeverthis approach will only find counterexamples of bounded length. Other work related toours includes that of the authors of the LTSmin model checking tool, most notably theiralgorithms for parallel SCC decomposition on multicore machines [79, 55].BFS-based algorithms, on the other hand, are amendable to parallel implementa-tions. We present below (Algorithm 2.1) a simple such algorithm, referred to as BasicDistributed Cycle Detection4; but first some definitions.A Bu¨chi automata (S, I, T, F ) is a digraph with vertices S and arcs T ⊆ S × S.Initial states I are a subset of S, as are the accepting states F . In our presentation, wetypically express a Bu¨chi automata as a graph G = (S, I, T ) and associated accepting statepredicate F , so that different notions of trace acceptance are easy to write. Given X ⊆ S,Reachable(G,X) computes the set of states reachable in G in zero or more transitionsstarting from states of X. Reachable+(G,X) computes the set of states reachable in Gin one transition or more, starting from states of X.Each time around the loop, line 9 will remove from Cur all F -states that are un-3P is the complexity class of problems that are decidable in polynomial time. NC, known as“Nick’s Class”, is the set of problems decidable in polylogarithmic time on a parallel computer witha polynomial number of processors.4While variants of this approach are common in the literature, we are not aware of this particularformulation having been previously published.33Algorithm 2.1 Basic distributed cycle detection1: procedure Basic(G = (S, I, T ), F )2: . find accepting states reachable from themselves3: Cur : set of states4: Old : set of states5: Cur ← Reachable(G, I) ∩ F . All reachable F -states6: Old ← ∅7: while Cur 6= Old do8: Old ← Cur9: Cur ← Cur ∩ Reachable+(G,Cur) ∩ F10: end while11: if Cur = ∅ then12: property holds13: else14: Cur contains a cycle with an accepting state15: end if16: end procedurereachable from another F -state in S. Since any F -state on a cycle is reachable from itself,it will never be removed from S.The DiVinE model checker supports a number of SCC detection algorithms [14].We summarize their two main algorithms: One-Way-Catch-Them-Young and Maximum-Accepting-Predecessor. OWCTY was first described in [57] and a parallel implementationwas considered in [38]. It is similar to Algorithm 2.1, but tags states of Cur with a counterof the number of predecessors each state currently has in Cur . The reachability compu-tation determines the counters for each state reachable from the given set, and functionEliminateNoPreds utilizes these counters to removes any states from Cur without apredecessor in Cur , until a fixpoint is reached. This approach can result in fewer loopiterations than in Algorithm 2.1, as it can pre-emptively remove states. For example, if〈Cur〉 includes a long path of states that each have one incoming and one outgoing arc,except for the start of the path which has no incoming arcs, this entire path will be removedby the EliminateNoPreds operation. The MAP algorithm tags states of Cur with anF -state (or null), called its maximum accepting predecessor. The F -states are statically34assigned an ordering ≺, where the null value is ≺ any F -state. Before entering the loop,Cur contains all states that can be reached from some (reachable) F -state, all tagged withnull. The MAPs of are updated by finding, for each state of s ∈ Cur , the ≺-greatest statethat has a path to s within 〈Cur〉. The while loop checks if there exists a state of Curthat is its own MAP; if so, this F -state necessarily lies on a cycle. Otherwise, any stateappearing as a MAP of some state of Cur does not lie on a cycle and is thus removed.Likewise, any state with a MAP of null can be removed as well, as such states cannot lieon an accepting cycle.While the high-level pseudocode appearing in Algorithms 2.1, 2.2 and 2.3, werepresented as sequential, they have direct parallel and distributed message passing imple-mentations by using parallel methods for the breadth-first search [14].Algorithm 2.2 One-Way-Catch-Them-Young (OWCTY)1: procedure OWCTY(G = (S, I, T ), F )2: Cur : set of (state, integer) pairs . states tagged with ints3: Old : set of (state, integer) pairs4: Cur ← (Reachable(G, I) ∩ F )× {0}5: Old ← ∅6: while Cur 6= Old do7: Old ← Cur8: Cur ← ReachableCountPreds(G,Cur)9: Cur ← EliminateNoPreds(G,Cur)10: end while11: if Cur = ∅ then12: property holds13: else14: Cur contains accepting states on cycle15: end if16: end procedureA model checking algorithm is considered to be “on-the-fly”, if it may find a counter-example before constructing the entire graph. In [59], an on-the-fly algorithm for LTLmodel checking is given that attempts to use as little memory as possible by performing aDFS when checking the product automaton. This algorithm, utilized by the SPIN model35Algorithm 2.3 Maximum Accepting Predecessor (MAP)1: procedure MAP(G = (S, I, T ), F )2: Accepting : set of states3: Cur : set of (state, state or null) pairs4: ≺: ordering on F -states5: Accepting ← Reachable(G, I) ∩ F6: Cur ← Reachable(G,Accepting)× {null}7: Cur ← FindMAPs(G,Cur) . compute MAP fixpoint8: while Cur 6= ∅ ∧ ∀(s,m) ∈ Cur . s 6= m do . terminate if a state is its own MAP9: Cur ← EliminateMAPs(Cur) . remove all states that are MAPs10: Cur ← FindMAPs(G,Cur) . compute MAP fixpoint11: end while12: if Cur = ∅ then13: property holds14: else15: Cur contains accepting state on a cycle16: end if17: end procedurechecker, generates a Bu¨chi automaton for the negated property called the tableau graph byrecursively breaking down the formula in question, creating nodes tagged with a formulaholding presently, and with another formula that it’s successors will satisfy. For example, ifthe formula µ U ψ holds presently in a node, the key identity µ U ψ ≡ ψ∨ (µ∧© (µ U ψ))is applied. The node is split into 2 nodes, one with present formula µ and successorformula µ U ψ and another with present formula ψ and an empty successor formula. Onceall present formulas have been processed in node u, a successor node v is created with apresent formula set to the successor formula of u. The appeal of this approach is thatwhen paired with an on-the-fly method of exploring the model, this tableau graph mayalso be explored on-the-fly with parts of it generated on-demand. This avoids generatingthe entire product automaton in cases where the model does not satisfy the property.LTL model checking can also be performed using symbolic methods and BDDs [36].The nuSMV model checker [41] handles LTL formulas by translating them to CTL formulaswith fairness constraints [43]. Unlike our work, this model checker is symbolic and not36implemented to run on a distributed cluster of machines.Recent work includes that of Kesten et al. [74], where a symbolic LTL model check-ing algorithm is presented that explicitly integrates fairness in terms of compassion andjustice [88, 89]. In general, fairness assumptions are properties of “interesting” execu-tion traces, and traces that do not satisfy fairness are considered unrealistic and shouldbe ignored as potential counterexamples. Compassion, also known as strong fairness, as-sumes that some event occurring infinitely often will be followed by another event occurringinfinitely often. Justice, also known as weak fairness, assumes that some event occurs in-finitely often. The key idea is to find the set of feasible states, that is the set of statesthat lie on some fair trace of the product automata. If this set is empty for a negated LTLformula, then the original LTL formula holds. Let the predecessor closure of a set of statesA be the reachable states via reversed transition arcs, starting from A. Let J be a set ofpredicates Ji, where a fair trace visits a Ji-state infinitely often for all i. Let C be a set ofordered pairs of predicates (pi, qi), where a fair trace visits a qi-state infinitely often if itvisits a pi-state infinitely often for all i.The algorithm of Kesten et al. first symbolically computes the reachable states Sof the system. Then, the following three steps are iterated until a fixpoint is reached.1. Remove from S all states without a successor in S.2. For each Ji ∈ J , remove from S all states that are not in the predecessor closure ofthe J-states.3. For each (pi, qi) ∈ C, remove from S all p-states that are not in the predecessor closureof the q-states.Finally, return the predecessor closure of S.Previous symbolic algorithms dealt with compassion by adding the conditions asan antecedent to the LTL formula to check. Another approach is to transform a compas-sion constraint (p, q) into a justice constraint, which involves adding an auxiliary boolean37variable b to the system. This variable is initialized to false and every transition maynondeterministically set it to true, from which point it remains true with each subsequenttransition. Intuitively, this flag is “guessing” a point in the computation trace at which nofurther p-states will be visited. Thus, any (b ∧ p)-state is considered an error state wherecomputation halts, and cannot be on any accepting fair trace. The additional justice con-straint is b ∨ q; either a q-state is visited infinitely often, or b holds infinitely often (andpersistently), indicating that p cannot hold infinitely often.The authors of [74] compare the performance of these 3 approaches for dealing withcompassion (algorithmically, as an antecedent, transform to justice), and conclude thathandling both compassion and justice algorithmically has the best performance. This paperalso gives a clear and concise overview of the development of LTL tools and techniques.Like nuSMV, the approach here is symbolic and not distributed.Cˇerna´ and Pela´nek made the important observation that the sequential symbolicalgorithms of Kesten et al. are easily adapted to DEMC algorithms [38]. One such algorithmis Streett-Detect-Cycle (Algorithm 2.4). Input G = (S, I, T ) is a transition graph Cis the set of compassion constraints as above. The algorithm returns false if the containsa reachable fair cycle, and true otherwise. Although this algorithm does not take a Bu¨chiautomata as input and does not technically solve the LTL model checking problem, itdemonstrates how to check if any trace satisfies strong, state-based fairness.For set X ⊆ S, Elimination(G,A) is the set of states obtained by removing anystate from X without a predecessor in 〈X〉 until a fixpoint is reached. The algorithm isdistributed by assigning each state an owner thread with a random uniform hash func-tion. Each thread is responsible for the bookkeeping of all states it owns, including whichreachable states are currently members of S. As with Algorithms 2.1, 2.2 and 2.3, allcomputations Reachability and Elimination can be implemented with BFS and mes-sage passing between threads, similar to the Stern-Dill DEMC algorithm used by PReach(see Algorithm 3.1). Our adaptation of Algorithm 2.4 that deals specifically with response38properties is given in Chapter 6.Algorithm 2.4 A LTL DEMC Algorithm with Strong Fairness [38], slightly adapted1: procedure Streett-Detect-Cycle(G = (S, I, T ), C)2: Cur ← Reachable(G, I)3: Old ← ∅4: while Cur 6= Old do5: Old ← Cur6: for all (pi, qi) ∈ C do7: Cur ← (Cur − pi) ∪Reachable(G,Cur ∩ qi)8: end for9: Cur ← Elimination(Cur)10: end while11: Cur 6= ∅12: end procedure2.5 SummaryThis survey of the literature puts our contributions into context. We build directly on Sternand Dill’s parallel Murϕ tool [52, 106], this making the PReach tool a peer to Eddy [96].The CMP method for parameterized safety verification is the foundation of our contribu-tion in proving parameterized DF. Our approach of distributed model checking of DF isa new model checking technique. The directed search and utilization of “helpful rules” isinspired by previous verification methods [37, 101] that involved flushing of transactions.The earlier work verified safety properties via refinement and was domain specific. Ourtechnique provide a simple but useful and computationally tractable verification of DF thatcan be applied to a wide range of models. Finally, we leveraged previous model checkingobservations about handling fairness algorithmically [74] and the high level OWCTY al-gorithm for LTL model checking [57, 38] to devise a new algorithm for distributed modelchecking of response properties.We bring these insights and ideas together in the PReach model checker. PReachis a robust, scalable, distributed explicit-state model checker that has been applied to real-39world verification problems for hardware protocols. This tool was leveraged to prototypeefficient DEMC of response properties with fairness, a problem of practical interest. Ourparameterized verification work is novel in the space of parameterized liveness as it extendsthe successful CMP method to support deadlock freedom properties. In practice, theadditional effort to show DF is small once CMP has been used to show safety. Together,our algorithms and their implementation in PReach demonstrate that DEMC is a practicalway to automatically verify liveness properties that are of real-world significance and thathave previously been beyond the capability of model checking tools.40Chapter 3The PReach Model CheckerPReach is a distributed explicit-state model checker based on Murϕ. It was designedto be a reliable, easy to maintain, scalable model checker that was compatible with theMurϕ specification language and suitable for industrial protocols. PReach uses legacyMurϕ C++ code to handle the CPU and memory intensive aspects of the model checkingcomputation. The communication and coordination for the parallel/distributed part isimplemented in the concurrent functional language Erlang, chosen for its parallel program-ming elegance. This leverages the complementary strengths of C++ and Erlang and offerscompatability with existing Murϕ models.1We used the original Murϕ front-end to parse the model description, and borrowMurϕ’s back-end code for maintaining local hash tables and performing state expansions.The Erlang layer handles communication, load balancing, termination detection and otheraspects of the parallel algorithm. This approach offers a clean and simple implementation,with the core parallel algorithms written in under 1000 lines of code. This chapter describesthe high level algorithm, including the various features that are necessary for the largemodels we target. To date, the PReach model checker has verified an industrial cachecoherence protocol with approximately 90 billion states. To our knowledge, this is larger1This chapter is based on the first PReach model checker publication [22].41than any published model verified with an explicit-state model checker. PReach is inongoing development has been released to the public under an open source BSD license [30].3.1 AlgorithmPReach is based on the Stern and Dill distributed explicit-state model checking (DEMC)algorithm [104], a distributed breadth-first search2 that partitions the space across thecompute workers (a.k.a. nodes)3. A message passing environment is assumed, as well as auniform random hash function that associates an owner node with each state. Pseudocodefor Stern-Dill is given in Algorithm 3.1; details such as termination detection, error tracegeneration, as well as all PReach-specific features have been omitted for simplicity. Wehave also done some minor code reformatting compared to the original paper.Each worker has two main data structures T and WQ , which are a set of statesand a list of states, respectively (declared on lines 1 and 2). We say a state has beenvisited if it has been received by its owner, and a state is expanded if its successors havebeen computed. Set T is maintained to contain the states owned by the worker that havebeen visited, while WQ are those states that have been visited but not yet expanded. Thecomputation begins with each worker executing the main routine Search. After initializingT and WQ as empty and synchronizing with a barrier, Search sends the initial states totheir respective owners in lines 8 to 12. Next each worker enters a while loop (line 13) thatiterates until termination has been detected. Termination detection checks that each nodehas an empty work queue and no messages are in flight by comparing that total numberof messages sent to the total number received. We use the same termination detectionalgorithm as in [106]; details are presented in that paper.2Strictly speaking, the search order is not breadth-first because the timing of communicationactions results in various visitation interleavings. However, the algorithm is “breadth-first” in thateach worker operates on a FIFO of states.3The workers can run on distinct hosts or some can seamlessly sit on the same host, e.g. on amulticore machine.42In the body of the while loop, procedure GetStates is invoked on line 25, whichiteratively pulls incoming states from the mailbox (or message queue), that is, the runtimequeue of messages sent to this node but not yet received by the thread. Each state s inGetStates that is not in T is inserted into T and appended to the tail of WQ . Oncethere are no more states to receive from the mailbox, control returns to line 15 where WQis checked for emptiness. If it is nonempty, we pop a state s from WQ and compute itssuccessors. Each successor is first canonicalized for symmetry reduction [70] on line 18,and then the resulting state s′c is sent to its owner.We now discuss PReach’s implementation. To harness fast and reliable code,PReach uses existing Murϕ code for many key functions involved in explicit-state modelchecking, such as:• front end parsing of Murϕ models,• state expansion and initial state generation,• symmetry reduction,• state hash table T look-ups and insertions,• invariant and assertion violation detection,• pretty printing of states (for counter-example traces).To facilitate Erlang calls to Murϕ functions, we wrote a light-weight wrapper on top ofexisting Murϕ C code, which we call the Murϕ Engine. We also employ the Murϕ front-endthat compiles the Murϕ model into a C++ library.Aside from space and time efficiency during model checking, using the Murϕ Enginehas another clear benefit: the Erlang code need only handle management of the distributedaspects of the algorithm, while the Murϕ Engine handles key computations that pre-existing Murϕ code can do quickly and correctly. The Erlang code, at a high level, resemblesAlgorithm 3.1 with key functions such as Successors(), Canonicalize(), and Insert() calling43into the Murϕ Engine. One notable exception to this paradigm is the work queue WQ ; thisis handled in Erlang code and uses an optimized disk-file to allow for efficient processingof very long queues.Algorithm 3.1 Stern-Dill DEMC1: T : set of states2: WQ : list of states3:4: procedure Search( )5: T ← ∅6: WQ ← []7: barrier()8: if I am the root then9: for each start state s do10: Send s to Owner(s)11: end for12: end if13: while ¬Terminated() do14: GetStates()15: if WQ 6= [] then16: s← Pop(WQ)17: for all s′ ∈ Successors(s) do18: s′c ← Canonicalize(s′)19: Send s′c to Owner(s′c)20: end for21: end if22: end while23: end procedure24:25: procedure GetStates( )26: while there is an incoming state s do27: Receive(s)28: if s 6∈ T then29: Insert(s, T )30: Append(s,WQ)31: end if32: end while33: end procedure44Stern and Dill’s distributed model checking algorithm [106] partitions the statespace among processes with a uniform random hash function. Processes are said to ownstates that hash to their process IDs. Once a state has been visited, its owner process isresponsible for storing it locally. In PReach this is done with the Murϕ model checker’shash table [103] which uses a predetermined number of bits4 to represent each state. Theuse of hash compaction and bloom filters in explicit-state model checking is a thoroughlystudied area [113, 53] and lends itself to practical approaches. Hash table compressionadmits a small probability that some state will erroneously be viewed as visited when itactually hasn’t been. In our experience this probability is tiny; for example, a very largemodel checking experiment with about 100 billion states had only a 0.03% chance of amissed state [22]. The experiments in this chapter admit a much smaller probability thanthis; the German6 model with over 316 million states had a probability of a missed state ofless than 7.36 × 10−5. If this probability were of practical concern, the user could simplyre-run the tool using a different seed for randomization and reduce the probability of amissed state in both runs to less than (7.36× 10−5)2 < 5.42× 10−9.3.2 CreditingAn important problem encountered when running large models with early versions ofPReach was that some workers would slow in state expansion rate or completely crash. Ina naive implementation of DEMC, each node sends all the successors of a state to the own-ers of the successors immediately upon expanding a state. After analyzing Erlang mailbox5size statistics, we found that the problematic nodes were accumulating a disproportionatenumber of messages in the mailbox. Simple tests of producer-consumer code written inErlang have revealed that the time for an Erlang process to receive a single message in-4This number is a configuration parameter. The results in this chapter use the default value of40 bits.5The mailbox stores the incoming messages in the Erlang runtime that have yet to be receivedby the Erlang program. We also write message queue when referring to mailbox.45creases in the number of mailbox messages. Once a worker falls behind the others for anyreason, (such as one worker receiving many states at once or OS scheduling) it tends topermanently fall behind the others. As these mailbox messages piled up, the node wouldeventually allocate more memory and this would result in slower processing of messages.This effect cascaded until the node started paging and was unable to do anything at areasonable speed.Our remedy was to introduce a crediting mechanism. Each node is initially granted anumber of credits C per peer node; each credit allows one unacknowledged message from thepeer. When a message containing states is received by another node, an acknowledgmentis returned. This provides a hard bound on the total number of messages a node may havein its message queue at any one time, i.e., nC where n is the number of nodes. We notethat no performance penalty was observed due the additional acknowledgment messages.When no credits are available to send states to a particular node p, the states are locallyqueued in an outbox. A round-robin scheduler ensures that periodically, the number ofcredits for p is checked and states waiting in the outbox are sent. In cases when the outboxfor p grows large, this queue of messages is temporarily written to disk to avoid runningout of memory.3.3 Light Weight Load BalancingWhen a distributed program runs in a heterogeneous computing environment, the totalruntime is determined by the runtime of the slowest node. If one node takes much longerthan the others to finish its work, it does not matter how fast the others were. This can becaused by some nodes being intrinsically slower than others, or dynamically assigned morework than others. In the context of DEMC, the notion of “work” is typically measuredas the number of states awaiting expansion, which is the length of WQ . Load balancingtherefore plays an important role in efficiency, and as explained below, can prevent extremeslowdowns and crashes. Previous work by Behrmann [19] and Kumar and Mercer [77]46 0 500000 1e+06 1.5e+06 2e+06 2.5e+06 0 500 1000 1500 2000 2500 3000Work Queue StatesSecondsFigure 3.1: Load balancing with Kumar and Mercer’s algorithm 0 0.2e+06 0.4e+06 0.6e+06 0.8e+06 1e+06 1.2e+06 0 500 1000 1500 2000 2500Work Queue StatesSecondsFigure 3.2: Light weight load balancingobserved the problem of an unbalanced work load in the context of DEMC. Surprisingly,although at the end of the computation all nodes have expanded roughly the same numberof states (due to uniform hashing) [104], the dynamic work queue sizes during model47checking can vary wildly between workers.To address this problem, Kumar and Mercer proposed a rebalancing scheme thatattempts to keep the work queues of all nodes similar in length throughout the computation[77]. This is achieved by comparing work queue sizes between hypercube adjacent nodesand passing states to neighboring nodes with smaller work queues. This is done in anaggressive manner, with the goal of keeping all work queues roughly the same lengths.In PReach, work queues are kept on disk and we are not concerned about minimizingthe maximum work queue size (as Kumar and Mercer achieve). Rather, we seek to avoidnodes sitting idle with an empty work queue. Our scheme, which we call light weightload balancing, reduces the overhead of work queue balancing. Each node tracks the sizesof all other nodes’ work queues by sending work queue size information along with statemessages. When one node notices that a peer node has a work queue that is LBFactortimes smaller than its own (for a fixed parameter LBFactor), it sends some of its statesto the peer. Empirically, we found LBFactor = 5 to be a good choice; it allows enoughload balancing to occur so that nodes complete around the same time without doing toomuch unnecessary rebalancing. If disk usage were an issue, one could use a smaller factorto keep the maximum work queues smaller at the expense of some extra load balancing.To show the difference between the two schemes, we ran a simple 107 state countermodel using PReach with both schemes. In Figure 3.1 we see strict balancing that keepsthe work queue sizes of all nodes identical. In Figure 3.2 we see light weight load balancing,which allows the work queues to vary somewhat. In both cases, however, all nodes completearound the same time. The benefit of the light weight scheme is that it is able to processstates faster because it is doing less load balancing and it completes sooner, at 2404 secondsas opposed to the stricter scheme which finishes in 2768 seconds. Also notice that thelongest work queue in Figure 3.2 is about half the length of all work queues in Figure 3.1.This is due to more work piling up as more computational resources are used for loadbalancing states, as opposed to the light weight scheme.48 0 50000 100000 150000 200000 250000 300000 350000 400000 0 200 400 600 800 1000 1200Work Queue StatesSeconds 0 50000 100000 150000 200000 250000 300000 350000 400000 0 200 400 600 800 1000 1200Work Queue StatesSecondsFigure 3.3: Load balancing effect on work queues. German model with 9 caches using 20workers – load balancing enabled (top) and disabled (bottom).To show the improvement between a load balanced run and a non-load balancedrun, we ran several well known protocols: SCI, LDASH and German. These protocols areincluded in the Murϕ distribution. We ran them on 60, 40, and 20 nodes, respectively. Theresults for the German protocol are in Figure 3.3, which clearly demonstrate the benefit oflight weight load balancing. The comparisons for the LDASH and SCI protocols exhibitsimilar speedup and graph characteristics; see Figures 3.4 and 3.5.49 0 10000 20000 30000 40000 50000 60000 70000 0 50 100 150 200 250Work Queue StatesSeconds 0 10000 20000 30000 40000 50000 60000 70000 0 50 100 150 200 250Work Queue StatesSecondsFigure 3.4: LDash model load balancing (60 workers); load balancing on (left) and off(right) 0 2e+06 4e+06 6e+06 8e+06 1e+07 1.2e+07 1.4e+07 0 2000 4000 6000 8000 10000 12000Work Queue StatesSeconds 0 2e+06 4e+06 6e+06 8e+06 1e+07 1.2e+07 1.4e+07 0 2000 4000 6000 8000 10000 12000Work Queue StatesSecondsFigure 3.5: SCI model load balancing (40 workers); load balancing on (left) and off (right)3.4 Batching of StatesStern and Dill’s algorithm, and consequently PReach, involves communicating successorstates to their respective owners (see line 19 in Algorithm 3.1). The number of reachabletransitions is typically at least a factor of 4 larger than the number of reachable states; thus,we could easily end up communicating hundreds of billions of states in the large models wetarget. This motivates us to look at reducing the number of messages sent, which reducesthe number of calls to Erlang’s Send and Receive communication primitives, along withutilizing the network bandwidth more efficiently. Indeed, in simple producer-consumerErlang benchmarking tests we have observed a factor of 10 to 20 speedup by sending states50in batched lists of length 100 to 1000 as opposed to sending the states individually.A simple batching variant of the Stern-Dill algorithm, which was implemented in anearly PReach prototype is as follows. Each worker batches generated states in separate outqueue for each peer node before sending. This mechanism is controlled by two parameters:MaxBatch and FlushInterval . A batch of states for a peer node is sent when MaxBatchstates have accumulated for the peer, or of FlushInterval seconds have passed since thelast batch to the peer, whichever happens earlier. Furthermore, the out queues will neglectwaiting for a full batch of states before sending whenever the work queue is short. Thistypically occurs only near the start and near the end of the entire computation. We foundthat setting MaxBatch = 1000 and FlushInterval = 1 second achieved good performance.However, for very large models it is possible that different values would be needed, oran adaptive scheme where their values vary between nodes and over time. This was notexplored in depth because while batching helps reduce runtime, some models sufferedserious performance issues where a small number of nodes would become overwhelmedwith states; this problem was described in Sect. 3.2. Fortunately, batching of states iscompatible with both methods that alleviate this problem: load balancing and crediting.When load balancing is activated, it is known a priori how many states will be sent fromone node’s work queue to the other node’s work queue, so the states can be trivially batchedinto one message. With crediting, we must choose a value for MaxBatch, but the timeoutFlushInterval is unnecessary because MaxBatch states are sent to node p whenever a creditis available for p and the round-robin scheduler selects p. Thus, our batching approachinvolves only the parameter MaxBatch, which bounds the maximum number of statesaggregated into one message.Figure 3.6 shows the benefit of varying the MaxBatch parameter in conjunctionwith crediting.51 0 500 1000 1500 2000 2500 3000 3500 0 50 100 150 200 250 300 350 400 450Work Queue StatesSeconds 0 5000 10000 15000 20000 25000 30000 35000 40000 45000 0 20 40 60 80 100 120 140Work Queue StatesSecondsFigure 3.6: State batching effect on the counter model with 105 states using 2 workers.Message queue lengths with MaxBatch = 1 (left) and MaxBatch = 100 (right).3.5 PReach PseudocodeIn Algorithm 3.2 we show pseudocode for the PReach algorithm. The basic outline of thecode follows closely the original Stern-Dill algorithm. However, the key features of batching,crediting, and load balancing all require changes. First, in order to implement batching,the outgoing states are placed in the queue OutQ (line 22) instead of being sent directlyto their destination. These states are actually sent out later, in line 25, for the currentdestination. We visit the outgoing queues in a round robin manner and attempt to sendto one node after each state is expanded. Next, for crediting, we return acknowledgments(line 33) from states sent to us in GetStates. Each node keeps track of the credits ithas for sending to every other node with Cred . (Note: not shown is the increment anddecrement of Cred [p] when sending states to or receiving states from p, respectively). TheSendQ function contains the rest of the changes. It implements the logic that decideswhen we should send to a particular peer, using its estimate WQEst of how large the othernode’s WQ is. It checks to see that credits are available in Cred , and ensures that wedo not send states to a node that is currently sending us load balancing states (line 43).Next, we check to see if the node is eligible to receive load balancing states (line 44; notethat load balancing is only enabled if the worker’s work queue length is greater than some52Algorithm 3.2 PReach DEMC1: T : set of states2: WQ : list of states3: OutQ [n] : array of list of states . n is number of nodes4: Cred [n] : array of int . C is number of credits5: WQEst [n] : array of int6:7: procedure Search8: T ← ∅; WQ ← []; Cred ← [C, ..., C]9: OutQ ← [[], ..., []]; WQEst [n]← [0, ..., 0]10: CurDest ← 011: if I am the root then12: for each start state s do13: Send s to Owner(s)14: end for15: end if16: while ¬Terminated() do17: GetStates()18: if WQ 6= [] then19: s← Pop(WQ)20: for all s′ ∈ Successors(s) do21: s′c ← Canonicalize(s′)22: Enqueue(s′c,OutQ [Owner(s′c)])23: end for24: end if25: SendQ(CurDest)26: CurDest ← (CurDest + 1) (mod n)27: end while28: end procedurethreshold WQminLB). Following any load balanced messages we check to see if we shouldsend states owned by the current peer from the outbox. We try to wait until there are atleast MaxBatch messages in a batch (typically MaxBatch = 100), but if the destination’swork queue is low or if we don’t have any work ourselves, then we will send smaller batches(line 51).5329: function GetStates()30: while there is an incoming state-message M do31: . S list of states, p sender ID, L sender WQ length32: {S, p, L} ← Receive(M)33: SendAck(p)34: WQEst [p]← L35: for all s ∈ S such that s /∈ T do36: Insert(s, T )37: Enqueue(s,WQ)38: end for39: end while40: end function41:42: function SendQ(dest)43: if Cred [dest ] > 0 ∧ WQEst [dest ] < LBFactor ∗ length(WQ) then44: if length(WQ) ≥WQminLB ∧ LBFactor ∗WQEst [dest ] < length(WQ) then45: NumStates ← min{length(WQ),MaxBatch}46: Dequeue NumStates states from WQ and send to dest47: end if48: if OutQ [dest ] ≥ MaxBatch ∨49: (OutQ [dest ] > 0 ∧ (length(WQ) = 0 ∨ WQEst [dest ] < MaxBatch)) then50: NumStates ← min{length(OutQ [dest ]),MaxBatch}51: Dequeue NumStates states from OutQ [dest ] and send to dest52: end if53: end if54: end function3.6 ResultsHere we present a number of experiments detailing the performance of PReach. Mostof these experiments were run on machines from a heterogeneous computing pool consist-ing primarily of 2.5-3.5GHz Core 2 and Nehalem class Intel machines. Typically 4-8GBof memory was available per worker, although most experiments did not use all of this.Table 3.1 presents a few of the larger models we have verified with PReach.Though previous implementations of the Stern-Dill DEMC algorithm report linearspeed-up [96, 104], it is important to show that such speed up is also achieved in our imple-54Model States Nodes Time States per(×109) (hours) Sec per NodePet8 15.3 100 29.6 1493Intel3 5 10.1 61 24.7 1860Intel3 7 28.2 92 90.2 945Table 3.1: Large model experiments. Here Pet8 is Peterson’s mutual exclusion algorithmover 8 clients (with no symmetry reduction), and Intel3 is an Intel proprietary cache proto-col (with symmetry reduction enabled). The last two rows are for Intel3 with respectively5 and 7 transaction types enabled.mentation, especially given that a high level language (Erlang) is handling communicationdetails. Figure 3.7 shows the speed-up for n nodes against Murϕ for a few public domainmodels. In all cases except german7 (German’s protocol [58] over 7 caches) we see nearlinear speed-up. The diminishing returns of german7 is expected for smaller models; thenumber of reachable states (after symmetry reduction) is less than 2 × 106, and Murϕcompletes checking it in only 315 seconds. In contrast, Murϕ took an hour or more forthe three other protocols (german9: 6242 seconds, mcslock2: 8386 seconds, ldash: 3583seconds). The slope of the speedups of Figure 3.7 are all less than 0.3, where the slope ofthe mcslock2 speedup is especially underwhelming. This slope is determined by the ratio ofruntime spent in the Erlang code versus the Murϕ engine. For these models, the Murϕ en-gine time (for state expansions, checking invariants, etc.) is relatively small. However, forthe industrial Intel models in Table 3.1 the Murϕ Engine takes longer for these operations,and we estimate the slope of the speedup for these models to be about 0.5.55 0 5 10 15 20 25 0 10 20 30 40 50 60 70 80time(murphi)/time(n)n (Node Count)ldashmcslock2german7german9Figure 3.7: Speed up experimental results56Chapter 4Model Checking of DeadlockFreedomThis chapter presents a practical method for verifying Deadlock Freedom (DF) propertiesin guarded command systems. Such properties are expressed in CTL as AGEF q where q isa set of quiescent states. We require the user to provide transitions of the system that are“helpful” in reaching quiescent states. The distributed search constructs a path consistingof helpful transitions from each reachable state s to some state that is either quiescent oris known to have a path to a quiescent state, thereby demonstrating that such a path fors exists. We extended the PReach model-checker with these algorithms. Performancemeasurements on both academic and industrial large-scale models shows that the overheadof checking deadlock-freedom compared with state-space enumeration alone is small.1DF properties are a weak form of liveness. A system could satisfy DF and yetcontain cycles of non-q-states, and thus some executions may visit an infinite sequence ofnon-q-states. However, DF properties provide a necessary but not sufficient condition forappropriate liveness properties to hold. We show that DF is straightforward to formalize,in particular it avoids the specification of fairness conditions. It is also less computationally1This chapter is based on a published tool paper [26].57intensive to model check when compared with even simple liveness properties. We exploreone such liveness property called response in Chapter 6.4.1 OverviewreachablequiescentinitFigure 4.1: Illustration of deadlock freedomAutomatic checking of liveness properties is a challenging task. Approaches to ad-dress this generally require the user to carefully specify system fairness assumptions thatare necessary for liveness to hold. Furthermore, checking liveness is computationally ex-pensive, being more sensitive to the state-space explosion problem than simple state-spaceenumeration. A broad class of liveness failures of practical importance is deadlock, whereinone or more transactions are blocked, for example, due to a cyclic resource dependency [64].In such a state, there exists no path to a state where all transactions have completed. This is58our motivation for characterizing deadlock-freedom by a property AGEF q.2 This propertyis illustrated in Figure 4.1, where some reachable states are highlighted and shown to havepaths to a quiescent state. Notice that a system may have a cycle of non-quiescent statesand still satisfy deadlock freedom. There only need be a path from all states, includingthose in such cycles, to some quiescent state.A feature added to the PReach tool of the previous chapter and the focus ofthis chapter is an explicit state model checking technique to verify the CTL [44] propertyAGEF q. This property (of recent interest [62]) says “for all reachable states, there existsa path to some q-state”. In our approach to verifying AGEF q, the system is modeledusing guarded commands, and the user identifies a subset H of these commands as helpful.These commands are the ones that the user expects will cause the system to make progresstowards q. If s is a state and s′ 6= s can be reached from s by performing a helpfulcommand, then we say that s′ is a helpful successor of s. Thus, from any reachable states1,3 we seek a witness – a path to some q state. If s1 is a q-state, then the path is trivial;otherwise, PReach computes s2, a helpful successor of s1. If s2 is a q-state then we havefound a witness path for s1; otherwise, s3, a helpful successor of s2 is computed. Thisprocess iterates, building a witness ρ = s1, s2, . . . until a state si is found where either• si is a q-state (Figure 4.2a), or• si has no helpful successor (referred to as H-stuck ; Figure 4.2b), or• si already appears in ρ (Figure 4.2c).In the first case, ρ is a witness for s1 and PReach continues with its standard state-space exploration algorithm. If a witness is found for every reachable state then we saythat AGEFH q holds, since the “EF” paths consist entirely of transitions from H. Clearly2Some literature and tools identify deadlock with the much weaker property that all reachablestates have at least one (possibly unique) successor. We use the stronger form, AGEF q throughoutthis chapter.3 PReach can also verify the more general property AG (p→ EF q), but for this paper we assumefor brevity that p is true.59h1 h2 hks s’(q−state)(a) A path is found to some q-state s′h1 h2 hks(no helpful transition enabled)s’(b) A path is found to some state s′ with no helpful transition enabledh1 h2 hks s’(cycle)(c) A path is found that cyclesFigure 4.2: Outcomes of a witness searchAGEFH q is sufficient for AGEF q. In the other two cases, PReach halts and reports thepath ρ to the user. These cases do not imply ¬AGEF q. For example, if the helpful rule listis empty and there exists a reachable state that is not a q-state, then nontrivial witnesseswill never be found. Likewise, PReach may choose a sequence of transitions that leads toa cycle even though a path to a q-state exists. While either error could be a false negative,as we report in Section 4.4, in practice such failures can show that behaviors of the modelare not those intended by the designer and thereby reveal real errors.In our experience, many guarded command models have a clear partition betweencommands that inject a new request and those that service existing ones. Therefore, itis not difficult to identify a suitable set of rules H and a quiescent state predicate q. Weshow through experiments that using only helpful commands to form paths is sufficient60in practice to verify AGEF q. Assuming that EMC has already enumerated the reachablestates as part of safety property verification, the additional time to verify deadlock freedomis usually a small fraction of the time as compared with state-space enumeration.It is critical for performance to leverage the known witnesses during subsequentsearches. Suppose a witness path ρ1 has been found for state s, and s is encounteredon path ρ2 while searching for witness path for s′. Clearly, the concatenation of pathsρ1 and ρ2 is a witness path for s′. PReach uses a dedicated state hash table for thispurpose, called WHT (witness hash table). Each time a witness path is found for somestate, it is added to the WHT; when we check if some state si is a q-state, we also checkfor membership of WHT. Henceforth, we use q ∨WHT to denote states that are eitherq-states or members of WHT.4.1.1 Specification SyntaxThe Murϕ syntax is extended to specify these deadlock freedom (DF) properties. Thebasic structure isliveness "<Property Name>"<P-predicate>CANGETTO<Q-predicate>where liveness and CANGETTO are keywords, <Property Name> is a string that names theproperty, and P-predicate and Q-predicate are predicates describing p and q, written inthe same way as guard predicates are written in Murϕ programs. For example, the Ger-man cache coherence protocol Murϕ code contains a flag ExGntd that indicates if exclusiveaccess has been granted to the (single) cache line. Suppose we wanted to check that ifexclusive access is granted, there exists a sequence of actions that removes this access. In61Murϕ syntax, this is written asliveness "ExSurrendered"ExGntd = trueCANGETTOExGntd = falseThe set of helpful rules H are specified in terms of Murϕ rules. Each rule, whichdefines a set of transitions, is expressed in Murϕ asrule "<Rule Name>"<Guard>==><Command>The helpful rules are determined by passing nonhelpful rules flag -nhr R1 ... Rkto PReach4. Any rule with a name that contains any of the strings R1 ... Rk will beexcluded from H, and all other rules are included.4.2 AlgorithmsThis section describes our algorithms that extend the general Stern-Dill paradigm of stat-ically partitioning the state-space and assigning states to owner workers, as covered inChapter 3. The data structure WHT is assumed to only store states that are owned bya given worker. In a sequential implementation, states could be inserted into WHT whenthe search for a witness path begins, since WHT is not consulted by subsequent searchesuntil a witness path is found. The prefix of the search path must be maintained in order4For our benchmarks, there are more helpful rules than nonhelpful ones, so it is easier to providethe latter.62to detect a cycle, but can be discarded once a witness path is confirmed. In a parallelimplementation we assume state ownership is distributed across workers, i.e., a worker willonly insert owned states into WHT.Under this paradigm, it is unsound to insert states into WHT while it may beconsulted by other workers until it is confirmed that a witness path exists. This is dueto searching for witnesses for states that query one another. Consider Figure 4.3, whereworker 1 owns state a and worker 2 owns state b, and transitions (a, b) and (b, a) are helpful.Assuming that neither a or b are quiescent, then DF does not hold since neither state hasa witness. To find a witness path for a, worker 1 might query worker 2 to check if b is amember of q ∨ WHT, and vice-versa. In a distributed setting, these two queries mightoccur at the same time. If (say) worker 1 reports a witness exists for a (rather than a searchis pending), then worker 2 may erroneously conclude that a witness exists for b. Therefore,our algorithms must be wait for a confirmation of a witness path before inserting statesinto WHT.a bFigure 4.3: A subgraph that does not satisfy DF4.2.1 Local SearchThis search involves no communication to other workers during a witness search. Whena worker in the distributed reachability algorithm encounters a new state s, the workercomputes a path ρ as described in Section 4.1. That is, starting with [s], the path ρ isconstructed by iteratively computing a helpful successor of the current head of ρ. Oncesome state s′ ∈ q ∨WHT is encountered, all states of ρ that are owned by this worker will63be added to WHT, which necessarily includes s. This path computation is not distributedacross workers, and thus, redundant path computations occur. While this approach scalespoorly when the reachability analysis is run on a large number of machines, it provides abaseline that is free from communication overhead.4.2.2 Pass-the-PathPass-the-path (PtP) distributes the witness path searches by forwarding the current pathprefix to the owner of the next state. When a state s is found in the reachability analysis,if it is already in q ∨WHT then a witness path is known to exist and no further work isneeded. Otherwise, an enabled helpful rule is chosen; the successor state, s′ is computed,and the search message ([s], s′) is sent to the owner of s′. Here, [s] is a list of statesrepresenting the current prefix path. This process continues constructing a prefix path ρ,communicating search messages of the form (ρ, scur) to the owner of scur, until a memberof q ∨ WHT is reached, a cycle is encountered, or no helpful commands are enabled. Inthe first case, the owners of all states along the path are notified and they update theirWHTs, and otherwise a failure is reported. See Figure 4.4a for an example.Notice that PtP allows redundant searches and acknowledgments to occur becauseworkers keep no record of which states have pending searches for witness paths.4.2.3 Outstanding Search TableThese redundant searches can be avoided if workers keep track of which states have out-standing witness searches. We have implemented such an approach where each workermaintains the pending searches in local table of states pairs ST (search table). This ap-proach called OST has the benefit of search messages containing only a pair of states (s, s′).If such a message arrives and there is a pending search for s′ in the table, then s is addedto a list of states that must be acknowledged as having a witness once s′ is acknowledged.When starting a witness search for owned state s, if there is no element (s, ·) in ST,64quiescentinita bcd[a,b][a,b,c][a,b,c,d](a) PtP example. When state d is encountered by the witness search, the owner of d sends a messageconfirming the witness to for all states in the path.quiescentinita bcd(a,b) (b,c)(c,d)(b) OST example. When state d is encountered by the witness search, states in the path areconfirmed to have a witness in reverse order, chaining through entries in the OST tables.path in transition graphhelpful transitionWP acknowledgement(c) LegendFigure 4.4: PtP/OST example, assuming that d is in WHT, but a, b and c are not.then store (s, null) in ST; find s′, a helpful successor of s; and send (s, s′) to the owner ofs′. Otherwise, if (s, ·) exists in ST, it signals that a witness search is already pending fors. When the owner of s′ receives (s, s′) and s′ is not H-stuck, then• If s′ is a member of q ∨WHT, send an (WAck , s) message to the owner of s′.65• Else, if ST contains an entry (s′, ·), then insert (s′, s) into ST. There is a pendingwitness search for s′ and the witness search for s is blocked from propagating.• Otherwise, insert (s′, s) into ST; find s′′, a helpful successor of s′; and send themessage (s′, s′′) to the owner(s′′).In the case that s′ isH-stuck, a counterexample trace may be generated by sending message(cex, [s, s′]) to owner(s) and chaining these messages backwards through the path until thestate that started the search is found. When the owner of s receives message (WAck , s),insert s into WHT and send message (WAck , si) to the owner of non-null si for every entry(s, si) in ST, and delete these entries.Notice that witness searches that cycle are not immediately detected, but will resultin ST entries that are never removed. When state-space exploration completes and all sentmessages have been received, the termination detection of PReach will observe that eachnode has an empty work queue and no messages are in-flight, but there are non-empty STs.At this point a counterexample cycle can be generated similar to the H-stuck case above,starting with some ST entry (si, si−1). Thus OST has the disadvantage of not detectingthese cycles until all states have been visited, although heuristics could be designed thattry to discover cycles earlier. Counterexample trace generation in this mode is not yetimplemented.4.2.4 Implementation NotesFor ease of implementation and for repeatable experiments, we restrict the helpful successorof a given state to be fixed but arbitrary. Thus, there is exactly one helpful successor ofeach state provided that at least one exists.The Murϕ-based hash table described in Chapter 3 is once again utilized to storeexplored states. However, we “tag” each hash table entry with two bits – one to indicateif the state has been visited in the state enumeration and one to indicate if a witness has66been confirmed. These bits are a small memory overhead as the hash table uses a 40 bitvalue to store each state.The deadlock freedom algorithms can take advantage of load balancing similar tothe standard PReach algorithm. Recall that PReach sends a large batch of states ownedby a worker with a long WQ to a worker with a shorter one for expansion. In PtP and OSTmode, there are two additional kinds of messages, witness search and witness acknowledge,are sent in large numbers between workers. These message types each have a designatedoutbox, and messages are batched in the same way as Basic PReach (Chapter 3). Thus,during a typical round of sending, a worker will send 3 messages to every other worker,one containing states for expansion, one containing witness searches, and one containingwitness acknowledgments. When a worker receives a non-owned state s via load balancing,a witness search will be initiated for s regardless of s existing in the WHT of its owner.The possibility of redundant witness searches could be avoided by adding extra informationabout witnesses for load balanced states, but this was not implemented.We have experimented with a heuristic to avoid sending search messages to workersthat are running slowly. The main idea is to leverage PReach’s flow control mechanismto detect when a worker is slow to acknowledge received messages. Then, a worker mayavoid sending (ρ, s) to owner(s) for s /∈ q by locally computing helpful successor s′ of s,and sending ([s] ◦ ρ, s′) to owner(s′). The trade-off here is if s ∈WHT at owner(s), thenthis local computation is redundant as is the eventual witness acknowledge message sentto owner(s).4.3 PerformanceWe ran PReach-DF on a variety of combinations of Murϕ models and hardware con-figurations, summarized in Table 4.1. For each, we measured the performance of regularstate-space enumeration (no EF), local search mode (Section 4.2.1), Pass-the-Path mode(Section 4.2.2) and Outstanding-Search-Table mode (Section 4.2.3). Local search mode67takes an impractical amount of time for a few of these so we omit it from the results.The Murϕ models used are the German and Flash cache coherence protocols, the Petersonmutual exclusion algorithm, the MCS lock mutual exclusion algorithm and an Intel pro-prietary cache coherence protocol. We use GermanX and flashX to denote these modelsconfigured with X caches and two data values; peterson12 is Peterson’s algorithm with 12workers and mcslock6 is the MCS Lock algorithm with 6 workers. All benchmarks and thePReach-DF source code is available online[25]. The compute server farms are as follows:• multicore: 8 PReach workers on a 2 socket server machine, each processor is a Intel R©Xeon R© E5520 at 2.26 GHz with 4 cores.• small cluster: 80 PReach workers on a homogenous cluster of 20 Intel Core i7-2600Kat 3.40 GHz with 4 cores.• large cluster: 100 PReach workers on a heterogenous network of contemporaryIntel R© Xeon R© machines.The column “runtime” is the mean runtime of three trials, with the exception of thelarge cluster runs which are based on one trial (marked with †). The “overhead” columnsare the additional runtime relative to that of the “no EF” mode run of the same modelrun on the same hardware. In PtP mode, column “avg. path” is the number of helpfultransitions needed to reach a state that is a member of q ∨ WHT, averaged over thesearches launched for each non-q-state. This number is also the average number of timeseach non-q-state is acknowledged for insertion to WHT.4.4 SummaryWe have shown an efficient distributed algorithm and implementation for checking deadlockfreedom properties. The simpler approach of PtP does some redundant work, but performswell on models of cache coherence where paths to q-states tend to be relatively short (see68Figures 4.5 and 4.6). The approach is intolerably slow for the Peterson mutual exclusionalgorithm where these paths are much longer. In this case, our more involved OST algo-rithm has a favorable runtime and manageable memory overhead (as shown in Figure 4.7).In fact, our experiments reveal that the memory usage of OST is typically small relativeto the amount dedicated to the WHT, and has a small performance gain over PtP for thebenchmarks where it does not timeout. We find that using OST to check deadlock freedomis inexpensive – at most a 109% runtime penalty but typically much smaller. While webelieve OST is generally a better choice, a closer comparison with PtP is a topic of futurework. We hypothesize that the performance gap between them is a function of the averagepath length of a witness search; this could be verified by constructing a toy model wherethis average can be tuned.The utility of our approach to model checking deadlock freedom was underscoredwhen a counterexample trace was generated on the peterson12 benchmark. After carefullychecking our definitions of q and H, we found a critical typo in the Murϕ model, whichdespite being an example model in the popular Murϕ distribution, has persisted for nearly20 years. Interestingly, the bug in question was not revealed by checking the safety propertymutual exclusion, or by “Murϕ deadlock” (states with no successors) or even by checkingAGEF q. With the bug, there exists a state where a worker attempting to enter the criticalsection may not do so until another worker makes an attempt. Thus, the model does notsatisfy AGEFH q.Checking AGEFH q has a “buy-one, get-one free” appeal. Once model checkinghas been done for safety properties, a small amount of human effort is needed to identifyhelpful rules and write a quiescence predicate, q. While this approach cannot check forsubtle liveness errors, especially ones that rely on fairness constraints, AGEFH q can bevery effective at finding deadlocks and violations of designer intent as illustrated by thePeterson example.69model hardware no EF local PtP OSTruntime overhead overhead avg. path overheadGerman9 multicore 1365.12 0.76 0.16 1.005 0.21flash5 multicore 752.75 1.07 0.30 1.005 0.39peterson12 multicore 4018.34 1.11 timeout - 0.38mcslock6 multicore 230.09 1.12 0.43 1.093 0.67German9 small cluster 85.91 3.43 0.24 1.642 0.35flash5 small cluster 57.46 1.52 0.32 1.307 0.52flash6 small cluster 1854.42 1.56 0.23 1.021 0.29peterson12 small cluster 254.27 9.50 timeout - 0.50mcslock6 small cluster 22.04 1.45 2.73 4.763 1.09intel small† large cluster 1025.20 7.10 0.84 2.242 0.55intel large† large cluster 49041.70 timeout 0.35 1.309 0.44Table 4.1: Performance of DF checking algorithms. Model sizes in states: German9 –19844513, flash5 – 24063542, flash6 – 609827554, peterson12 – 116039964, mcslock6 –12838266, intel – 22738573, intel large – 906695343.70 1 10 100 1000 10000 100000 1e+06 1e+070 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29CountPath Length 1 10 100 1000 10000 100000 1e+06 1e+07 1e+08 1e+090 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25CountPath LengthFigure 4.5: Semi-log histograms of path lengths in PtP mode, as defined in Table 4.1. Thetop is from German9/multicore, the bottom is from flash6/small cluster.71 1 10 100 1000 10000 100000 1e+06 1e+07 1e+080 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34CountPath LengthFigure 4.6: Semi-log path length histogram for intel /large cluster in PtP mode 0 5e+06 1e+07 1.5e+07 2e+07 2.5e+07 3e+07 3.5e+07 0 50 100 150 200 250 300 350 400ST WordsSecondsFigure 4.7: Memory usage of ST for peterson12/small cluster in OST mode. Here,“words” are 8 bytes; the maximum memory usage reached over all workers is about 233MB.72Chapter 5Parameterized Deadlock FreedomClassical model checking assumes that the system under consideration is finite-state. Manyresearchers have explored techniques to generalize model checking to verify various classesof parameterized systems. A parameterized system P is a function that yields a finite-statesystem P(n) for all naturals n ≥ 1. Here n indicates the number of values involved insome type P (called the parametric type) used by the system, for example client IDs ormemory addresses. A parameterized model checking problem asks if P(n) satisfies somegiven specification for all1 n. Unless one puts severe restrictions on the class of systems,parameterized model checking is undecidable [1]. The systems we consider are symmetric,a restriction on those for the undecidable proof. However, in Section 5.2.4 we provide aproof that model checking symmetric paramaterized systems is also undecidable.2A promising approach to parameterized model checking is based on abstractionand compositional reasoning [92, 93, 40, 76, 86, 28, 83, 99] and is typically used to verifyuniversally-quantified (over P) state assertions roughly as follows. An initial abstraction A0is created from the syntax of P. By construction, the transitions of A0 over-approximatethose of P(n) for arbitrary n > k (typically for some small k). If the safety property ϕ1Many approaches, including ours, only verify P(n) for all n ≥ n0, where n0 is a small constant.This is not a shortcoming since the P(n) where n < n0 are either “uninteresting” or can bedispatched by finite-state model checking.2This chapter is based on [23].73holds of A0, then we can soundly conclude it holds for P(n). However, often this is notthe case and we must strengthen the transitions of A0 using a conjectured state invariantϕ1, yielding a tighter abstraction A1. This process iterates until we obtain a Aj whereinϕ along with all of ϕ1, . . . , ϕj hold. At this point our parametric verification goal has beenachieved. If ϕ is not invariant of P(n), then eventually the user will either give up or noticea real counter-example. This approach is sound, in that if ϕ does not hold of P(n) then ϕwill not hold of A0, regardless of the strengthening applied.Our contribution is an extension of the approach described above to handle Dead-lock Freedom (DF) properties of the form of the CTL formula AGEF q for a state predicateq (more generally, we also consider AG (p → EF q)). The key idea is to construct abstracttransition relations that not only over-approximate those of P(n), but also transition rela-tions that under-approximate P(n). The resulting verification framework is formalized interm of mixed abstractions [48], which are systems having two transition relations O and U ,which are respectively over-approximative (OA) and under-approximative (UA). As in thetraditional approaches, O is used to explore the reachable abstract states, which representan over-approximation of the reachable states of P(n). During this exploration, paths ofU are explored to check that the existential CTL formula EFq holds for each reachableabstract state. If so, it is safe to conclude DF of P(n); the existence of a path in U impliesthat of a corresponding path in P(n). Like O, it is straightforward to algorithmically con-struct an initial U ; however, analogous to how O might be too weak, we might find that Uis too strong.One of our key contributions is a set of heuristic methods that allow the user tosoundly weaken U . This is achieved by exploiting syntactic comparisons between theguarded-command transitions (i.e. Murϕ rules) of the parameterized system and those ofthe OA transitions. If rule’s guard is not weakened by the overapproximation, then weconclude the rule is UA. In cases when the guard has been weakened, we provide heuristicinference rules (HIRs) that allow us to conclude the rule is UA if some DF property74holds of the mixed abstraction. These properties can be decided by DF model checking.Determining which HIR to apply for a particular rule is straightforward, and generatingthe relevant DF property for a heuristic is automatable. When a DF property for the mixedabstraction fails to hold, the required human insight is similar to that the method of Chouet al. [40], which is used to prove parameterized saftey properties. Another contribution isa theorem that supports DF verification when q involves universal quantification over theparametric type.Note that this is an incomplete method for solving the problem of parameterizedDF for symmetric systems. The presented HIRs provide tools for the user to weaken theunderapproximation and, in our experience with case studies, are sufficient to establish aproof. They are heuristic in that they may or may not be successful in proving deadlockfreedom, but they are sound in that model checking certain side conditions is justificationto add transitions to the underapproximation.This chapter is organized as follows. We begin with a simple example of showingDF in a mutual exclusion scheme. Next, mixed abstractions are defined and some keylemmas are stated. In Section 5.3, we describe how mixed abstractions are modeled andcomputed in Murϕ. Section 5.4 discusses our approach for weakening U by employingHIRs. Section 5.5 outlines the DF verification of the German and FLASH cache coherenceprotocols, and Section 5.6 concludes the chapter. For convenience, a list of chapter symbolsis provided in Table 5.3. Appendix Sections C.1 and C.2 give specific examples of applyingtwo kinds of HIRs.5.1 A Simple ExampleTo keep this example simple, we have glossed over some technical details. These areaddressed in footnotes that are collected at the end of this section on page 80. Sections5.2–5.4 give a more detailed an formal description of our approach.Consider the parameterized program of Figure 5.1 that describes a mutual exclusion75in n : integer where n > 1local t: integer where t = 0n‖i=1P [i] ::`0 : loop forever do`1 : noncritical`2 : while t 6= i do`3 : 〈if t = 0 then t := i end if〉`4 : if t = i then`5 : criticalend ifend while`6 : t := 0end loopFigure 5.1: Program turn (mutual exclusion by turn setting) [89, Fig. 2.8]algorithm. This program can be described with the Murϕ system of Figure 5.2. For brevity,the lines `2 and `3 are combined (modeled as L3, waiting to enter the critical section) as arethe lines `4 and `5 (modeled as L5, the critical section). Line `0 is not modeled; the systembegins with all threads at L1. Instead of the value 0 for t, we use the built-in Murϕ valueundefined. In order to be a valid Murϕ system, we must provide a specific parametervalue for n. Any value greater than 1 will suffice; for this example, we arbitrarily choose 5.The Murϕ system of Figure 5.3 is the corresponding abstract system with 1 thread ex-plicitly modeled. The rest are subsumed into a new value called Other. This abstractionoverapproximates the behavior of the concrete system any n > 2 threads, essentially byassuming that these “other” threads 2, ..., n could be at any line.Some of these rules have identical syntax and names as their concrete counterparts.Since nothing about the state is assumed or overapproximated in the guards of such rules,the following rules are in fact underapproximate: L1 to L3, L3 to L5, L5 to L6, L6 to L1,and Stutter. However, the rules with ABS prefixes in the name do not meet this criteria.For example, compare the guard of the concrete rule L5 to L6 with the abstract rule76-- a mutual exclusion protocolconst NUM THREADS : 5; -- assume 5 threadstype THREAD : scalarset(THREAD NUM); -- a symmetric setLINE NUM : enum L1, L3, L5, L6; -- enumerated type representing the line numbervar Line : array [THREAD] of LINE NUM;t : THREAD;startstate "Init" -- initial statefor i : THREAD do Line[i] := L1; end;end;ruleset i : THREAD dorule "L1 to L3" Line[i] = L1==> Line[i] := L3; end;rule "L3 to L5" Line[i] = L3 & isundefined(t) -- & is logical AND==> Line[i] := L5; t := i; end;rule "L5 to L6" Line[i] = L5 & t = i==> Line[i] := L6; end;rule "L6 to L1" Line[i] = L6==> Line[i] := L1; undefine t; end;end; -- rulesetrule "Stutter" true==>end;Figure 5.2: The Murϕ system for program turnABS L5 to L6. The guard of the latter rule assumes that there exists some i > 1 such thatLine[i] = L5. Every rule of the abstraction contributes to the set of overapproximatetransitions O, and the list of UA rules contribute to the set of UA rules U . The aboveabstraction, augmented with the set of UA rules U , is in fact a mixed-abstraction A.We seek to show that for any n, every reachable state s of the concrete system P(n)has a path to some state s′ where t is undefined. These are states where no thread is in thecritical section. We use model checking of the mixed abstraction to check A |= AGEF q,77-- abstract systemconst THREAD NUM : 1;type THREAD : scalarset(THREAD NUM);ABS THREAD : union {THREAD, enum{Other}}; -- new value "Other" abstracts threads > 1LINE NUM : enum {L1, L3, L5, L6};var Line : array [THREAD] of LINE NUM;t : ABS THREAD;startstate "Init"for i : THREAD do Line[i] := L1; end;end;ruleset i : THREAD do-- underapproximate rulesrule "L1 to L3" Line[i] = L1==> Line[i] := L3; end;rule "L3 to L5" Line[i] = L3 & isundefined(t)==> Line[i] := L5; t := i; end;rule "L5 to L6" Line[i] = L5 & t = i==> Line[i] := L6; end;rule "L6 to L1" Line[i] = L6==> Line[i] := L1; undefine t; end;end; -- rulesetrule "Stutter" true==>end;-- overapproximate rulesrule "ABS L3 to L5" isundefined(t)==> t := Other; end;rule "ABS L5 to L6" t = Other==> undefine t; end;Figure 5.3: Murϕ system for the mixed abstraction of turnwhere q is isundefined(t). This DF property is expressed in Murϕ syntax using newkeywords liveness and CANGETTO as understood by PReach. Here, the property is:liveness "NoCrit"78true CANGETTO isundefined(t)This DF property holds if for every state reachable by O transitions, there exists a path ofU transitions to a state where t is undefined.When executed, a counterexample reveals that the underapproximate transitionrelation U is too strong. The counterexample path begins with ABS L3 to L5 firing fromthe initial state, reaching a state where Line[1] = L1 and t = Other. Then, L1 to L3fires, updating Line[1] with the value L3. At this state none of the UA rules are enabled,but non-UA rule ABS L5 to L6 is enabled. We do not consider this rule UA because itassumes that whatever node t refers to in some concretization, call it j > 1, has Line[j]= L5. Of course, it is easy to see in this example that in any concrete system, t = j ifand only if Line[j] = L5. However, for illustration we show how this can be proven usingpermutation. For any state where t = Other, consider swapping the “actual” value heldby t with thread 1. This permutation exploits the symmetry of the system to force thethread value of t to be visible in the mixed abstraction; see Figure 5.4. The resultingstates are described by the predicate isdefined(t) & t != Other3. We show that allreachable states satisfying this predicate have a path to some state that satisfies the guardof L3 to L5, written as exists i : THREAD do t = i & Line[i] = L5 end4. Thus, wecheck the following property5:liveness "H1"isdefined(t) & t != OtherCANGETTOexists i : THREAD do t = i & Line[i] = L5 endThis DF property holds, which proves ABS L3 to L5 is UA. Now, with this ruleadded to U , the property NoDeadlock is checked again and now holds. This completes theproof parameterized DF, i.e., that for any n > 1, every reachable state of P(n) has a pathto some state where no thread is in the critical section.79spipi(s) pi(s′)observedimpliedpi−1s′Figure 5.4: Finding paths through permutation. The path from s to s′ is implied by findinga path from their permuted counterparts, pi(s) and pi(s′).Notes3The function isdefined does not exist in Murϕ, but we use it as shorthand for !isundefined,where ! is logical NOT. The operator != means “not equal to”, i.e., t != Other is the same as!(t = Other).4We would prefer to write something like t = 1 instead of isdefined(t) & t != Other, and t= 1 & Line[1] = L5 instead of exists i : THREAD do t = i & Line[i] = L5 end. However,Murϕ will complain because the set of nodes are symmetric, and we therefore cannot refer to aparticular value directly.5For this check, instead of using all UA rules to find the requisite path, only rules that are“local” to t are used. These are rules that only change the ith index of array variables when t =i. This is to ensure that the changes of this path remain hidden when the inverse permutation istaken. In this example, rules L1 to L3 and L6 to L1 when i = 1, as well as Stutter are local tot when t = i.5.2 Formal FrameworkThis section presents the formal framework that we use to verify quiescence properties ofparameterized systems. Section 5.2.1 introduces mixed abstractions that have two tran-sition relations: one that under-approximates the behaviors of the concrete systems and80another that provides an over-approximation. Section 5.2.2 presents the idea of insuffi-ciency – a mixed abstraction may have an under-approximation that is too strong to verifythe desired quiescence property or an over-approximation that is too weak. Section 5.2.3describes the parameterized systems that we consider. These are less general than thoseof [1]. Section 5.2.4 shows that safety property checking (and hence DF checking) remainsundecidable even with our restrictions.5.2.1 Mixed AbstractionsA system S is a tuple (S, I, T ) where S is a set of states, I ⊆ S is the set of the initial states,and T ⊆ S×S is the transition relation. We write s1 T s2 to denote that (s1, s2) ∈ T ∗. Astate s is said to be S-reachable (or simply reachable if S is understood) if s0 T s for somes0 ∈ I. A state predicate p is simply a subset of S; if s ∈ p we call s a p-state. Followingstandard CTL syntax, for state predicates p and q, we write: S AGp if all reachablestates are p-states; AG (p → EF q), if for all reachable p-states s there exists a q-state s′such that s T s′; and AGEF q to mean AG (true → EF q), i.e., from any reachable statethere is a path to some q-state.To show that AG (p→ EF q) can be inferred for a concrete system S by establishingproperties of an abstraction A we employ Lynch and Vaandrager’s notion of forward simu-lation [87]. Let S1 = (S1, I1, T1) and S2 = (S2, I2, T2) be two systems and θ ∈ S1×S2 be anabstraction relation. With respect to θ, we say that T2 forward simulates (or “simulates”for short) T1 if for every (s1, s′1) ∈ T1 and for all s2 such that (s1, s2) ∈ θ, there is somes′2 ∈ S2 such that s2 T2 s′2 and (s′1, s′2) ∈ θ. This allows system S2 to take multiplesteps that may be invisible in S1 including possibly steps that have no “explanation” inS1. This general sense of simulation is motivated by our goal of showing the existenceof trajectories. See Figure 5.5. The domain of θ may be a subset of S1, in particular,restricted to the reachable subset of S1. We say T2 simulates T1 when θ is clear fromcontext. Now let s1, . . . , s` be a T1-path, and suppose T2 simulates T1 with respect to81θ. By induction on `, there exists an T2-path s′1, . . . , s′k and a non-decreasing surjectionf : {1, . . . , `} → {1, . . . , k} such that (si, s′f(i)) ∈ θ for all 1 ≤ i ≤ `.S1s1 s′1θs′2θS2s2Figure 5.5: Simulation of S1 by S2To show AG (p→ EF q) using abstraction, the abstract system must, for soundness,over-approximate the set of reachable p-states, and under-approximate the set of pathsfrom p-states to q-states. Thus, we introduce a mixed abstraction as defined below. Notethat unlike Lynch and Vaandrager, we require θ to be a function.Definition 1. Let S = (S, I, T ) be a system and let Reach be the S-reachable states. Amixed abstraction of S (relative to θ : S → SA) is a quadruple A = (SA, IA, U,O) suchthat• SA is a set of abstract states,• IA ⊆ SA are the initial abstract states and satisfy θ(I) ⊆ IA,82• O ⊆ SA × SA simulates T with respect to θ ∩ (Reach × SA), and• T simulates U ⊆ SA × SA with respect to (θ ∩ (Reach × SA))−1.Here U andO are respectively called the under-approximate (UA) and over-approximate(OA) transition relations of the mixed abstraction. When discussing mixed abstractions,we will often refer to S = (S, I, T ) as the concrete system, to S as the concrete states, etc.The following serves as the basis for our approach to proving deadlock freedom fora class of parameterized systems. Let p and q be state predicates over SA, and suppose forall (SA, IA, O)-reachable p-states s there exists a q-state r such that s U r. We assertthis by writingA AG (p→ EF q) (5.1)Hence, (5.1) holds of a mixed abstraction if for all p-states reachable using the over-approximative transition relation, there exists a path through the under-approximativetransition relation to a q state. See Figure 5.6 for a schematic of this relationship betweenthe mixed abstraction and a concrete system.Lemma 1. Let A be a mixed abstraction of S relative to θ and let p and q be state predicateson SA. If A AG (p→ EF q) then S AG (θ−1(p)→ EF θ−1(q)).Proof. Let Reach be the set of S-reachable states. Let w be any θ−1(p)-state in Reach.Because O simulates T with respect to θ ∩ (Reach × SA), θ(w) is (SA, IA, O)-reachable,and furthermore θ(w) is clearly a p-state. Let a0, . . . , am be a U -path from θ(w) = a0 to aq-state am. Because T simulates U with respect to (θ∩ (Reach ×SA))−1, for all 0 ≤ i < mand all wi ∈ θ−1(ai) there exists wi+1 ∈ θ−1(ai+1) such that wi T wi+1. Therefore,taking w0 = w, there is a path w T wm where wm ∈ θ−1(q).Note that the definition of mixed abstraction explicitly mentions the reachable statesof S (in the involved simulation relations), this is just our means of formalizing the minimalrequirements a mixed abstraction must satisfy in order to prove Lemma 1. In other words,83s′Sp qs˜ψ−1ReachO-path U -pathSA IAψ−1(s)ψ−1(s′)s∀w∃w′Figure 5.6: Illustration of Lemma 1. Suppose that for each s ∈ p that is reachable froman initial state through O transitions, there exists s′ ∈ q reachable from s through Utransitions. Then every reachable w of S that is a concretization of s has a path to somestate in the concretization of s′.any methodology that aims to construct mixed abstractions must guarantee at least thesesimulations. We emphasize that this is different than asking the user of such a methodologyto precisely characterize Reach (indeed our methodology does not make such a demand).We conclude this section by stating a connection between S and the transitions ofU and O.Lemma 2. Suppose S = (S, I, T ), SA, IA, and θ : S → SA are as in Def. 1. If U,O ⊆SA × SA satisfy1. for all (w,w′) ∈ T such that w is S-reachable we have (θ(w), θ(w′)) ∈ O, and2. (s, s′) ∈ U implies for all S-reachable w ∈ θ−1(s) there exists w′ ∈ θ−1(s′) such thatw T w′,then (SA, IA, O, U) is a mixed abstraction for S.84Proof: Follows directly from Def. 1.When performing reasoning that allows us to add (or remove) transitions from U andO in a mixed abstraction, we employ sufficient conditions of Lemma 2. This essentiallyrestricts the definition of mixed abstractions to those where each transition of T has acorresponding transition in O, and each transition of U has a corresponding path in T .5.2.2 InsufficiencyLemma 1 allows us to infer S AG (θ−1(p) → EF θ−1(q)) if model checking (or any othermeans) verifies A AG (p → EF q). However, the converse of the lemma does not holdin general (or even in common cases). Let us call the mixed abstraction A insufficient ifS AG (θ−1(p) → EF θ−1(q)) holds but A 6 AG (p → EF q). If A is insufficient, it followsthat there exists a p-state ap such that a0 O ap for some a0 ∈ IA but there is no q-stateaq such that ap U aq. There are two common causes for insufficiency:• OA Insufficiency. There is no (S, I, T )-reachable sp ∈ S such that θ(sp) = ap.Hence ap does not abstract any reachable state of S. This is often caused by O beingtoo weak, i.e. there exists a proper subset O′ ⊂ O such that (SA, IA, O′, U) is a mixedabstraction of S wherein ap becomes unreachable. Thus the CMP method should beapplied to strengthen the overapproximation.• UA Insufficiency. There is (S, I, T )-reachable s ∈ S such that θ(s) = a, howevernone of the T -paths s = s0, . . . , s` where s` ∈ θ−1(q) (at least one such T -pathmust exist), are simulations of any U -paths. In practice, this is often caused by Ubeing too strong. In this case, there may exist a proper superset U ′ ⊃ U such that(SA, IA, O, U ′) is a mixed abstraction of S.For the mixed abstractions we use to verify our parameterized systems, we willobserve that OA insufficiency is addressed in previous literature [40], however UA insuffi-ciency is not. Our basic approach is to identify UA insufficiency from a counter-example85trace. In practice, the transitions from O that are needed in U are often apparent from thiscounter-example. The basic idea is to show that for each such transition (s, s′) of O, thereis a corresponding set of paths in the concrete system, i.e., each state of θ−1(s) has a pathto some state of θ−1(s′). Sections 5.3 and 5.4 present how this can be done by syntacticpattern matching and model checking of the abstract system for properties with the formshown by Formula (5.1).There is also a third flavor of insufficiency,• Abstract Quiescence Insufficiency. Note that in Lemma 1, the quiescent pred-icate actually verified is of the form θ−1(q), where q is a predicate on the abstractstates. Suppose, however that there does not exist a q such that θ−1(q) characterizesthe desired set of quiescent concrete states. We experience this for our case studies:the desired quiescence predicate involves a universal quantification over the paramet-ric type that the underlying simulation relation cannot precisely characterize. Thatis, if the concrete quiescent states are characterized by a predicate of the form ∀i. φ(i),then there is no abstract predicate q such that θ−1(q) = ∀i. φ(i). We deal with thisform of insufficiency via Theorem 1 in Section 5.4.1.5.2.3 Parameterized SystemsFor the approach presented in this chapter, a parameterized system P is a function mappingnatural numbers to systems. We write P(n) = (S(n), I(n), T (n)) to denote the componentsof P(n) for an arbitrary n. The states S(n) are the type-consistent assignments to a set ofstate variables. For a state w of a parameterized system and a state variable v, we writev(w) to denote the value w assigns to v. We allow four types of variables:• finite types that are independent of n, such as booleans and enumerations; for sim-plicity, we denote all such types as B• a type which has cardinality n, denoted Pn86• arrays indexed by Pn with elements in B, denoted array [Pn] of B• arrays indexed by Pn with elements in Pn, denoted array [Pn] of PnWe identify the set Pn with the numbers {1, . . . , n} called nodes; the only operationssupported on nodes are equality comparison, assignment, and nondeterministic choice.6This ensures that for all n, P(n) is fully symmetric [69] in Pn; here we give a brief reviewof this notion. Let us write λi.e to denote the array a indexed by Pn where a[i] = e and eis an expression of the appropriate type. Let pi be a permutation on Pn. We overload pi toact on w ∈ S(n) by defining pi(w) ∈ S(n) to be the state such that for each state variablev, v(pi(w)) is equal to:• v(w), if v has type B• pi(v(w)), if v has type Pn• λi.(v(w)[pi−1(i)], if v has type array [Pn] of B• λi.pi(v(w))[pi−1(i)]), if v has type array [Pn] of PnThen (S(n), I(n), T (n)) is called fully symmetric if for all w,w′ ∈ S(n) and all permutationspi on Pn we have both that w ∈ I(n) iff pi(w) ∈ I(n), and (w,w′) ∈ T (n) iff (pi(w), pi(w′)) ∈T (n). The following lemma has a simple inductive proof using the latter.Lemma 3 (Path Symmetry). For w,w′ ∈ S(n) we have w T (n) w′ if and only ifpi(w) T (n) pi(w′).The method presented in this chapter is applicable to fully symmetric models. Sec-tion 5.3.1 gives the restrictions on parameterized systems that ensure our approach can beapplied. We refer to such system as admissible.6If i and j are nodes, a parameterized system is not allowed to perform a comparison such asi < j or increment a variable i := i+ 1.875.2.4 UndecidabilityHere we sketch a proof that model checking symmetric parameterized systems is undecid-able. Given a Turing machine M with no input, we simulate M using at most i tape cellswith S(i). Each node of S(i) represents a tape cell. The description of M can be encodedusing boolean variables in S(i). Two phases of the parameterized system occur. First,nodes are chosen to represent tape cells in some order. Initially, all cells are unused. Eachstep nondeterministically chooses some unused cell and marks it as used. The first nodeselected holds the symbol for the leftmost tape cell. The second node selected holds thesymbol for the second tape cell to the left and so on. Each node has two variables of typenode that act as pointers to the left and right, so the ordering of cells is maintained througha doubly linked-list. Note that the tape cells are represented by nodes, and the state ofthe Turing machine is stored in global (i.e. non-parameterized) variables. We include anadditional boolean variable halts which is initialized to false. Second, M is simulatedusing the tape as represented by the doubly linked-list of nodes. If M halts using at mosti cells, then our simulation will eventually reach a halting state and halts will be set true.Otherwise, simulation ofM on a tape with i cells will eventually run off the right end of thetape, at which point the simulation enters a TapeOut state and remains there forever. Or,M runs forever without using more than i cells. In either case, halts remains false. If Mhalts when run on empty input, then it uses some bounded number of cells before halting.If our simulation runs with at least that many nodes, then our simulation will reach astate where halts is set, so AG¬halts is not satisfied. Therefore, ∀i. S(i) |= AG¬halts.is satisfied iff M does not halt when run on empty input. Note that deadlock-freedom isa more general property than safety, as AG (¬p→ EF false)⇒ AGp. Since model checkingdeadlock-freedom of a symmetric parameterized system solves the co-halting problem, it isundecidable.885.3 Syntactical AbstractionWe assume that the parameterized system is modeled by Murϕ [51] or a similar guarded-command notation. Given a program P to describe the parameterized system, we usewell-established techniques [92, 93, 40, 76, 86, 28, 83, 99] to obtain an abstraction of P .Our formulation is inspired by the Krstic’s “syntactic” approach [76]; Section 5.3.1 statesrestrictions that we assume on the form of P , and Section 5.3.2 summarizes the abstractiontechnique. In Section 5.4, we show how the abstraction can be generalized to produce anunder-approximate transition relation, U , and how U can be soundly weakened to provequiescence properties.5.3.1 Syntax and RestrictionsWe assume that the guarded-command program that models the parameterized systemsatisfies certain syntactic restrictions described in this section. These restrictions ease thesyntactical abstraction process and simplify reasoning about the program because manyuseful properties are guaranteed by construction. We say that such a program is admissible.From the case studies reported in Section 5.5, we’ve found that these restrictions are notproblematic in practice.An admissible program has a set of variables of the types indicated in Table 5.1. Astate of the admissible program is a type-consistent assignment of values to these variables.If e is a term, we write e(s) to denote the value of e in state s. In Murϕ, a guarded commandis called a rule and has the form: guard _ action, where the guard is a boolean-valuedexpression, and the action is a sequence of one or more assignments. We write r : ρ _ ato denote rule r with guard ρ and action a.The denotation JrK of r is the set of tuples (s, s′) ∈ S × S such that ρ(s), and s′ isthe state reached by performing action a from state s. Murϕ has rulesets of the form:ruleset i : Pn do r(i) end;where r(i) is a rule (or a ruleset, as they may be nested). Here, i ∈ Pn is called the ruleset89parameter. If rs is the ruleset indicated above, thenJrsK = {(s, s′) | ∃i ∈ Pn. (s, s′) ∈ Jr(i)K} (5.2)A local boolean predicate L is a propositional formula over the variables of typearray [Pn] of B. For node i, we say L[i] holds of a state if L evaluates to true when itsvariables are assigned according to the ith array entries of the state. An admissible programmust satisfy the following syntactic restrictions. Rulesets have guards that are a conjunctof:• Boolean terms, composed of variables of type B or array [Pn] of B indexed by a rulesetparameter, and the logical connectives AND, OR and NOT.• At most one forall condition, appearing positively, of the form ∀i ∈ Pn. C[i] where Cis a local boolean predicate.• Any number of P-comparisons, of the form v1 = v2 or v1 6= v2, where v1 and v2are each either a ruleset parameter, a variable of type Pn, or a variable of typearray [Pn] of Pn indexed by a ruleset parameter. Without loss of generality, werestrict each ruleset parameter to appear in at most one P-comparison of equality.The initial states and ruleset commands given by a sequence assignments of thefollowing forms:• Assignments of the form b1 := b2, a1B[i] := b2, a1B[i] := a2B[i], where b1 and b2 arevariables of type B, a1B and a2B are variables of type array [Pn] of B, and i is a rulesetparameter. RHS values may also be the constants true and false.• Assignments of the form p1 := p2, a1P[i] := p2, a1P[i] := a2P[i], where p1 and p2 arevariables of type Pn, a1P and a2P are variables of type array [Pn] of Pn, and i is a rulesetparameter.90• Forall updates of the form ∀i ∈ Pn. aB[i] := `(i), where ` is a boolean functiondepending on variables of type B, Pn and on the ith index of array variables.The general form of the rulesets are as follows.ruleset i1 : Type1 doruleset i2 : Type2 do...ruleset iN : TypeN dorule "Rule1" guard1 ==> command1;...rule "RuleK" guardK ==> commandK;end;...end;end;These restrictions ensure that guards in admissible programs do not contain dis-junctions of comparisons between variables of type P and have no existentially quantifiedterms; updates in admissible programs do not contain if-then-else clauses. These constructscan be handled by a straightforward splitting into multiple rulesets. The Murϕ systems forGerman and FLASH are admissible, and from this experience, we believe that the systemsfor many other symmetric protocols will be admissible or easily modified to produce anadmissible equivalent.5.3.2 AbstractionLet P(n) = (S(n), I(n), T (n)) be the denotation of an admissible program P . We wantto construct a mixed abstraction, A = (SA, IA, O, U). In this section, we show how SA,IA, and O can be readily found by syntactic transformations of the source-code of P .Section 5.4 extends this approach to the construction of U . To create these abstractions,we introduce a new type to represent type Pn from the concrete system; this type requires91concrete type abstract type abstraction ψB B v(ψ(s)) = v(s)Pn Pk ∪ {Other} v(ψ(s)) = αk(v(s))array [Pn] of B array[Pk]of B ∀i ∈ Pk : v(ψ(s))[i] = v(s)[i]array [Pn] of Pn array[Pk]of Pk ∪ {Other} ∀i ∈ Pk : v(ψ(s))[i] = αk(v(s)[i])Table 5.1: Mapping of types from concrete system to mixed abstraction. The abstractstate space is SA and the abstraction function is ψ : S(n) → SA. For a system variable vand s ∈ S(n), the leftmost column gives the type of v in the concrete domain, the secondcolumn gives the type of v in SA, and the third column specifies the value v is assigned byψ(s) in terms of v(s). Function αk : Pn → Pk ∪ {Other} is defined as αk(v) = v for v ≤ k;αk(v) = Other otherwise.the user to choose a constant k. It is assumed throughout that k is at least the greatestnumber of ruleset parameters for any ruleset in P (typically, k ≤ 3). Table 5.1 specifies howeach variable of P is typed in A and how the abstraction function ψ acts on v. Intuitively,ψ(s) preserves B variable values, replaces values of type Pn greater than k with Other , andrestricts arrays to the indices Pk (hence all array entries v[i] for i > k are deleted by ψ).Although ψ is a function, we will treat it as a relation, ψ ⊆ S(n)× SA, and freely employits inverse ψ−1 ⊆ SA×S(n). We call elements of Pk non-abstracted and elements of Pn\Pkabstracted. For every ruleset parameter i interpreted as abstracted, all updates with aB[i]or aP[i] appearing on the LHS are deleted. All instances aB[i] or comparisons depending onaP[i] appearing positively in the guard are replaced with true; those appearing negativelyare replaced with false. Instances of i appearing on the RHS of assignments are replacedwith Other . Finally, equality comparisons with i appearing positively in the guard arereplaced with true. The state variables of A have the same names as those of P, with thetypes changed as shown in Table 5.1.We now overload ψ to map rules of P system to rules that generate the statetransitions of O. Rules of P that are not in rulesets are copied without change of syntax(therefore, with the implied change of types), to O. If ruleset r : ρ _ a depends on mruleset parameters, consider the set of rule instantiations, obtained from assigning each92ruleset parameter a value in Pn. This set is partitioned as r0, ..., r2m−1, where all ruleinstantiations of Rj have the same partitioning of ruleset parameters into Pk and Pn \ Pk(since there are m ruleset parameters, there are 2m possible partitions). Each set rjabstracts to an abstract ruleset r̂j according to the described syntactic transformation.We denote the set of corresponding abstract rulesets to concrete ruleset r by ψ(r) ={r̂0, ..., r̂2m−1}. Let r̂0 : ρ̂0 _ ̂a0 denote the unique element of ψ(r) such that all rulesetparameters are non-abstracted. Note that although the set of rule instantiations differdepending on the value of n, the set ψ(r) does not, for any n > k, hence we can fixn = k + 1 to perform this abstraction.In this chapter I will use the German cache coherence protocol [58] as an example;a Murϕ model for the protocol appears in Appendix A. The SendGntE ruleset sends amessage to inform a cache/node that it has been granted exclusive access to a cache line.Example: The concrete rule from German SendGntE isruleset i : NODE do rule "SendGntE"CurCmd = ReqE ∧ CurPtr = i ∧ Chan2[i].Cmd = Empty∧ ¬ExGntd ∧ forall j : NODE do ¬ShrSet[j] end ==>Chan2[i].Cmd := GntE; ShrSet[i] := true;ExGntd := true; CurCmd := Empty;The abstraction contains two corresponding rulesets, one where i is non-abstractedand one where i is abstracted, with the former corresponding to r̂0:ruleset i : NODE A do rule "ABS SendGntE1"CurCmd = ReqE ∧ CurPtr = i ∧ Chan2[i].Cmd = Empty∧ ¬ExGntd ∧ forall j : NODE A do ¬ShrSet[j] end ==>Chan2[i].Cmd := GntE; ShrSet[i] := true;ExGntd := true; CurCmd := Empty;rule "ABS SendGntE2"CurCmd = ReqE ∧ CurPtr = Other ∧ ¬ExGntd∧ forall j : NODE A do ¬ShrSet[j] end ==>ExGntd := true; CurCmd := Empty;93Notice the difference between the forall condition of SendGntE and the weaker oneof ABS SendGntE1, despite the identical syntax. The type NODE in the concrete systemranges over values from 1 to n, where as NODE A in the abstract system ranges from 1 tok < n.5.4 Verifying Universal QuiescenceWe want to verify properties of the formP(n) |= AGEFQn , (5.3)where P(n) is a parameterized system andQn = G ∧∧i∈Pn L[i] (5.4)is the quiescence property to be verified. Here, G is a boolean predicate, meaning G onlydepends on variables of type B, while L is a local boolean predicate (defined in Sect. 5.3.1).To verify (5.3), we construct a mixed abstraction, A = (SA, IA, O, U), and show that forall O-reachable states, there exists a U -path to a state that satisfies Qn. To do so, wemust address two key issues. First, Qn cannot be established directly from A, as Qn refersto variables of the concrete system that do not appear in the abstraction. This is the“abstract quiescence insufficiency” defined in Section 5.2.2; Section 5.4.1 shows how it canbe addressed. Second, U may omit transitions that are required to reach states that satisfyQn. This is the UA insufficiency from Section 5.2.2, and we address it in Sections 5.4.2and 5.4.3.5.4.1 Universally Quantified QuiescenceTo show (5.3), we need to show that L[i] holds for all i, not just non-abstracted i. Intu-itively, we show that A can reach a state where L[i] holds for all non-abstracted i, thenuse Lemma 3 (Path Symmetry) to exchange any abstracted j for which L[j] might not94hold with a non-abstracted i and find a path that establishes L[i]. We then establish (5.3)by induction. For this approach to be valid, we must show that if G holds, then for eachnon-abstracted node i, there is a U -path to a state that satisfies L[i] whose concretizationin P(n) does not falsify L[j] for any abstracted node j. To do this, we introduce the no-tion of L-preserving transitions. For the remainder of this chapter, let Reach denote thereachable states of S(n).Definition 2. For local boolean predicate L, abstract transition (s, s′) is L-preserving if∀w ∈ ψ−1(s) ∩ Reach. ∃w′ ∈ ψ−1(s′). w T w′∧ ∀i ∈ Pn \ Pk. w ∈ L[i]⇒ w′ ∈ L[i] .Abstract ruleset r̂ is L-preserving if all transitions in Jr̂K are L-preserving. A mixedabstraction is called L-preserving if its UA transitions contain only L-preserving rules.Let us fix a mixed abstraction A = (SA, IA, U,OA) for P(n), where n > k. Wealso use L to denote a local boolean predicate, and B = (SA, IA, UB, OA) to denote amixed abstraction with only L-preserving transitions for UB. Note that the same set OAis used for overapproximate transitions of both A and B. For i, j ∈ Pn, define Pji ⊆ Pn as{h ∈ Pn : i ≤ h ≤ j}.Let permutation pij↔h map elements of Pn according topij↔h(i) =j for i = h,h for i = j,i otherwise.(5.5)Let T be shorthand for T (n). We can now state our main theorem for showing universallyquantified quiescence.Theorem 1 (Universally Quantified Quiescence). Let G denote a boolean predicate, L alocal boolean predicate, and let A and B be mixed abstractions of P(n), and assume that Bis L-preserving. If951. A |= AGEF (G), and2. B |= AG (G→ EF (G ∧∧i∈Pk L[i]))then P(n) |= AGEF (G ∧∧i∈Pn L[i]).Proof. For 1 ≤ h ≤ n, let Jh denote the property ∀w ∈ G ∧ Reach. ∃w′ ∈ (G ∧∧i∈Ph L[i])where w T w′, and ∀i ∈ Pnk+1. w ∈ L[i] → w′ ∈ L[i]. Antecedent 2 implies Jk. AssumeJh holds for k ≤ h < n. Applying permutation pi1↔h+1 to Jk gives ∀w ∈ G ∧Reach. ∃w′ ∈(G ∧∧i∈Ph+12L[i]) where w T w′, and ∀i ∈ Pnk+1. w ∈ L[i] → w′ ∈ L[i]. This propertywith Antecedent 2 implies Jh+1 by transitivity. Thus, property Jn follows by induction.The paths implied by Antecedent 1 composed with those of Jn complete the proof bytransitivity.5.4.2 Abstract Rule TagsGiven a program P , we use the syntactical abstraction technique to produce an abstractprogram ̂P . In the remainder of this paper, we use the term “ruleset” to refer to Murϕ-style rulesets with any degree of ruleset nesting including no such quantification – e.g. a“ruleset” could be a simple rule. We want to identify which rulesets of ̂P have denotationsthat are UA, and which are L-preserving, for a given local boolean predicate L. We user : ρ _ a and r̂0 as defined in Section 5.3.2, and use r̂j to denote an arbitrary element ofψ(r).We tag abstract rulesets with tags from the set {AUG,AEG,AUC,AEC}; the first twoelements are called guard tags, and the last two are called command tags. These indicatereasons (in the guard and command, respectively) why the abstract ruleset is not triviallyUA or L-preserving. An abstract ruleset can be tagged with any of the 16 subsets of thesetags. AUG and AUC indicate that a universal quantifier has been abstracted; similarly AEGand AEC indicate that existential information has been abstracted.77 Note that AEG and AEC are not indication of explicit existential quantifiers in the concrete96We call ρ and ρ̂j syntactically equivalent if they are expressed with identical syntax,and ρ contains no forall conditions. In this case, we attach no guard tags to r̂j . Likewise, ifa and ̂aj have identical syntax and contain no forall updates, then we attach no commandtags. If a ruleset r has no guard or command tags, then it is simple to show that r is bothUA and L-preserving for any local predicate L. Typically, the set of all such rulesets isinsufficient to establish the desired quiescence property.The elements of ψ(r)\ r̂0 may not have syntatically equivalent guards because someruleset parameter i is abstracted, so the abstraction will syntactically change the guard(except for degenerate cases). These rulesets have guards that optimistically abstractaway references to abstracted i; this is safe when constructing the OA but not the UA.Such rulesets are tagged with AEG (abstract existential in guard)7. For example, ruleABS SendGntE2 in Section 5.3.2 is tagged with AEG due to the CurPtr = Other conjunct.When ρ contains a forall condition, it is necessarily weakened in every rule of ψ(r). In thiscase, every ruleset of ψ(r) including r̂0 is tagged with AUG (abstract universal in guard).Similarly, existential or universal updates may be missing from the command of anabstract ruleset, relative to the concrete version. If local update ab[i] := eb appears in afor ruleset parameter i, then any ruleset of ψ(r) where i is abstracted (that is, where theupdate ab[i] := eb vanishes), is tagged with AEC (abstract existential command).7 If acontains a forall update, then every ruleset of ψ(r) is tagged with AUC (abstract universalcommand).Examples: Referring to the example in Section 5.3.2, abstract ruleset ABS SendGntE1is tagged with AUG because the forall condition only ranges over concrete nodes. Ab-stract rule ABS SendGntE2 is tagged with AUG for the same reason, AEG because the guardconjunct Chan2[i].Cmd = Empty is removed, and AEC because the updates Chan2[i].Cmd:= GntE and ShrSet[i] := true are abstracted away.The concrete rule RecvReqS from the German protocol (Appendix A) issystem syntax; existential refers to the quantifier in the ruleset denotation, which ranges over theruleset parameter.97ruleset i : NODE do rule "SendGntE"CurCmd = ReqE ∧ CurPtr = i ∧ Chan2[i].Cmd = Empty∧ ¬ExGntd ∧ forall j : NODE do ¬ShrSet[j] end ==>Chan2[i].Cmd := GntE; ShrSet[i] := true;ExGntd := true; CurCmd := Empty;ruleset i : NODE do rule "RecvReqS"CurCmd = Empty & Chan1[i].Cmd = ReqS ==>CurCmd := ReqS; CurPtr := i; Chan1[i].Cmd := Empty;for j : NODE do InvSet[j] := ShrSet[j] end;end end;The abstraction contains two corresponding rulesets, one where i is non-abstractedand one where i is abstracted.ruleset i : NODE A do rule "ABS RecvReqS1"CurCmd = Empty & Chan1[i].Cmd = ReqS ==>CurCmd := ReqS; CurPtr := i; Chan1[i].Cmd := Empty;for j : NODE A do InvSet[j] := ShrSet[j] end;end end;rule "ABS RecvReqS2"CurCmd = Empty ==>CurCmd := ReqS; CurPtr := Other;for j : NODE A do InvSet[j] := ShrSet[j] end;end;Both rules ABS RecvReqS1 and ABS RecvReqs2 are tagged with AUC, as the forupdate only ranges over concrete nodes.5.4.3 HeuristicsEach tag assigned to a ruleset corresponds to a set of proof obligations for showing it isUA or L-preserving (for some local boolean predicate L). Either of these properties can98PropertyTag UA L-preservingAEG (Abstracted Existential Guard) Heuristic 1 Heuristic 2AUG (Abstracted Universal Guard) Heuristic 3 Heuristic 4AEC (Abstracted Existential Command) None Heuristic 2AUC (Abstracted Universal Command) None Heuristic 4Table 5.2: Heuristics for ruleset tag/property pairs and associated obligations. “None”means there is no obligation to show. A ruleset with no tags is L-preserving, while onewith no guard tags is UA.be established by discharging through the corresponding heuristic according to Table 5.2.Once a tag is discharged we may safely ignore it as a potential reason why the desiredproperty does not hold. Each of the heuristics involves model checking a mixed abstraction.In this section, the various heuristics are stated.An abstract ruleset is called local to i (as a special case of having no tags) when theguard only depends on variables of type B, P, and the ith index array variables aB, andthe command only updates the local state of non-abstracted i. Here, given an abstractor concrete state, the local state of i is simply the values of all array variables at index i.The transitions that compose such rules are called local transitions. A mixed abstractionwith UA set U composed only of rulesets local to i is denoted A`(i). Assuming ruleset r̂is UA, we write A`(i) |= AG (A → EFr̂B) when every O-reachable A-state has a path tosome B-state consisting of transitions of rules local to i and necessarily a single transitionof ruleset r̂. See Figure 5.7.When showing rulesets are UA (Heuristics 1 and 3), note that the tags AEG orAUG indicate guards that are OA because they have abstracted away information aboutabstracted nodes. Our heuristics compute O-reachable states and exploit Lemma 3 (PathSymmetry) to find the possible local state of abstracted nodes under some boolean predi-cate. Then, if the local state of node i does not have a required property, we find “hiddenpaths” composed entirely of rulesets local to i that reach a state that does have the prop-erty. This assures that although some states in the concretization of abstract guard ρ̂j99O-reachableA Br̂r̂r̂ℓ(i)ℓ(i) ℓ(i)ℓ(i)Figure 5.7: Illustration of A`(i) |= AG (A→ EFr̂B). Paths labeled `(i) are paths composedof transitions local to i, and transitions labeled r̂ are transitions for that rule.do not satisfy the corresponding concrete guard ρ, there is a guaranteed path that is notobservable in the abstract system from every ψ−1(ρ̂j) to a ρ-state. For simplicity, wepresent our heuristics for rulesets with at most one abstracted ruleset parameter; however,generalizing these heuristics to rulesets with more parameters is straightforward.When showing rulesets are L-preserving, it must be checked that aspects of theguard and update that have been abstracted away do not affect L-preservation in theabstracted nodes; Heuristics 2 and 4 pertain to this check. The obligations for theseheuristics require that a certain transition must fire on each path that justifies the deadlockfreedom property. Intuitively, when the heuristic obligation holds, the concrete paths thatjustify the tagged ruleset in question r̂ to be UA must have a certain form. For abstractednode i, each path is• a (possibly empty) path composed of transitions of rules local to i, followed by• a transition of concrete rule r (possibly changing non-local variables), followed by• a (possibly empty) path composed of transitions of rules local to i.Furthermore, we only seek a path when the starting state is an L[i]-state, and the finalstate must also be a L[i]-state. The i for which this is shown depends on the heuristic.100Heuristic 2 reasons about those abstracted nodes that are abstracted ruleset parametersin r̂. Heuristic 4 reasons about those abstracted nodes that are not abstracted rulesetparameters in r̂. Note that we assume a ruleset has been proven UA before it is provenL-preserving.A few definitions are needed for the heuristic statements. If r̂ is an abstract rulesetwith ruleset parameter i, let r̂|i=1 : ρ̂|i=1 _ ̂a|i=1 be the ruleset where all instances of iare replaced with the constant value 1. Also, let relax (ρ̂, i) be the guard ρ̂ but with thevalues of variables aB[i] and aP[i] unconstrained. If A ⊆ SA, let Γ(A) denote the strongestboolean predicate implied by A.We include proofs for all heuristics in Appendix B, and sketch the proof of Heuris-tic 1 here. are similar, and are omitted. Throughout these proofs, we fix a mixed abstrac-tion A = (SA, IA, U,O) for P(n), where n > k.Heuristic 1. For ruleset ̂rj tagged with AEG and with abstracted ruleset parameter i,suppose that r̂0 is UA. If A`(1) |= AG (relax (ρ̂0, i)|i=1 → EF (ρ̂0|i=1)) then tag AEG isdischarged for showing r̂j to be UA.Proof. (Sketch)This Heuristic handles the case where we aim to show that an abstracted node eventuallyperforms some action r̂j . The idea is to consider any concrete state that when abstracted,satisfies the guard of r̂0. Some of these concrete states may not satisfy the guard of theconcrete rule rv due to the values of local variables of the abstracted node. We use apermutation to “swap” the abstracted node with the non-abstracted node 1. Becausethe values of local variables of the abstracted node are unknown (other than that theysatisfy any safety properties that have been shown by suitable methods), we get that thestate reached by the permutation satisfies relax (ρ̂0, i)|i=1. We construct a local path toa state that satisfies ρ̂0|i=1 and then perform action r̂j . Details of the proof are given inAppendix B.101Heuristic 2. For ruleset r̂j tagged with AEG and/or AEC with abstracted ruleset parameteri, suppose that r̂0 is UA. If A`(1) |= AG ((relax (ρ̂0, i)|i=1 ∧ L[1]) → EFr̂0 (L[1])) then AEGand AEC are discharged for showing r̂j to be L-preserving.Heuristic 3. For ruleset r̂j tagged with AUG and not AEG, let ∀i ∈ Pn. C[i] be the forallcondition of ρ. If A`(1) |= AG (Γ(ρ̂j)→ EF (C[1])), then tag AUG is discharged for showingr̂j to be UA.Heuristic 4. For ruleset r̂j tagged with AUG and/or AUC, let r̂2m−1 ∈ ψ(r) be the abstractruleset where all ruleset parameters of r are abstracted. If A`(1) |= AG ((Γ(ρ̂j) ∧ L[1]) →EFr̂2m−1 (L[1])), then tags AUG and AUC are discharged for showing r̂j to be L-preserving.We apply each of these heuristics by performing model checking using a mixed ab-straction that uses only local rules for U . As local rules are identified entirely by syntax,they are known a priori ; therefore, we could take a brute force approach that attempts touse our heuristics to prove every abstract rule is UA and L-preserving. However, we preferto take a counter-example driven approach, as there are two distinct situations in which ourheuristics may not suffice that arose in our case studies. Firstly, additional auxiliary vari-ables may be needed to capture the system state with a slightly finer-grained abstraction.Secondly, if the ruleset is not underapproximate, manual guard strengthening or splittinginto multiple rulesets may help. These are illustrated with examples in Section 5.5. SeeAppendix C for detailed examples of applying Heuristics 1 and 3.5.5 Case StudiesMixed abstractions are expressed as Murϕ models. The OA rulesets are borrowed fromChou et al. [40] and the (initial) UA transitions are derived manually according to tags –those rules with no guard tags. Thus, the UA rulesets are maintained as a subset of theOA rulesets. Rulesets with no tags at all are identified as L-preserving, and the relevantsubset of these are identified as local.102We used PReach [22] for the mixed abstraction checks. As described in Chapter 3,PReach was originally designed to check state-invariants. We added a feature to checkCTL properties of the form AG (p → EF q). The search algorithm is simple: for every(p ∧ ¬q)-state s visited during the forward reachability computation, choose an enabledrule of U and fire it to reach a new state. Firing rules of U continues until one of thefollowing occurs.1. a q-state is found,2. a U -dead-end state is found, or3. a cycle is detected.In the first case, a path from s to a q-state exists and we proceed with the forward reach-ability computation. In the second case, there might not exist such a path (although webelieve that in practice this is strong evidence that no path exists). If a cycle is found,this is usually an indication that U contains rules that do not help us reach q-states, so wemight as well exclude them and try again8. For example, there are several easily identifi-able rules in both German and FLASH that initiate requests by injecting messages, andare not useful transitions in finding a quiescent state where all messages are consumed.Notice that deadlock freedom properties can be verified by a CTL model checker, but forour case studies we chose PReach because it was straightforward to implement the notionof UA rulesets and counterexample generation.This section contains a brief overview of the case studies. For a more detailedreport, the reader may refer to supplementary material [21] including the Murϕ sources.5.5.1 Automatic Deadlock Freedom PredicatesAs mentioned above, it is common when checking antecedent 1 of Theorem 1 to reach aU -dead-end state sˇ where no further progress can be made toward the goal. When this8Clearly, if U is an underapproximate transition relation and U ′ ⊂ U then U ′ is underapproxi-mate as well. Accordingly, removing transitions from U produces a mixed abstraction.103occurs, the model checker reports a failure and prints the rules of O that are enabled insˇ, as a guide to the user of which rules could be useful to prove UA and add to U . Theseenabled rulesets necessarily have tags AEG or AUG or both. We have written a simple toolthat, given a particular rule/ruleset name, will determine the tags and generate the modelchecking obligation to prove it is UA through Heuristics 1 and 3.Example: Suppose we seek to show ruleset r̂2 = ABS SendGntE2 is UA, and supposeit is already known by Heuristic 3 that associated r̂1 = r̂0 = ABS SendGntE1 is UA. RulesetABS SendGntE2 is tagged with AEG because ruleset parameter i is abstracted. The guardρ̂0|i=1 isCurCmd = ReqE ∧ CurPtr = 1 ∧ Chan2[1].Cmd = Empty ∧ ¬ExGntd∧ forall j : NODE do ¬ShrSet[j] endand relax (ρ̂0, i)|i=1 isCurCmd = ReqE ∧ CurPtr = 1 ∧ ¬ExGntd ∧ forall j : NODE do ¬ShrSet[j] endAs implemented, our tool does not support automatic generation of the propertiesto check for Heuristics 2 and 4. However, this is generally straightforward to do by hand,and could be automated as well. In cases when the deadlock freedom property for someheuristic when applied to ruleset r̂j fails the verification attempt, the user may use thecounterexample trace as a guide for strengthening ρ̂j manually. Any ruleset of O may beduplicated and strengthened with some predicate, which is trivially sound because the Otransitions are not changed. The resulting strengthened ruleset might satisfy the heuristicdeadlock freedom property and be proven UA or L-preserving. Such manual strengtheningis required in the verification of both German and FLASH.5.5.2 The German ProtocolThe system used for O is the abstract Murϕ model for German of Chou et al., instantiatedwith a single non-abstracted node (k = 1). The initial set of UA transitions U0 includesall rulesets with no guard tags and the local subset of these are also identified.104The property we verify is (5.4), where G states that that the directory is not cur-rently processing a transaction (CurCmd = Empty) and L[i] states that all communicationchannels associated with the ith cache are empty:Chan1[i].Cmd = Empty ∧ Chan2[i].Cmd = Empty ∧ Chan3[i].Cmd = EmptyAntecedent 1 of Theorem 1 requires A |= AGEF (G) for a mixed abstraction A. Check-ing this property, the model-checker gets stuck at a U -dead-end state where the ruleABS SendGntE1 is enabled (see Section 5.3.2). Our tool recognizes this as an AUG-taggedrule and generates the obligations to according to Heuristic 1 so the rule can be soundlyadded to U . The model checker discharges the obligation, and ABS SendGntE1 is added toU .Checking Antecedent 1 is repeated with the weakened U and gets stuck three moretimes: once where ABS SendGntE2 is enabled (tagged with AEG and AUG), and twice whereother AEG-tagged rulesets are enabled. The Heuristic 3 obligation for ABS SendGntE2 isidentical to the one previously shown for ABS SendGntE1, so there is no need to repeat itsverification. The tool generates the Heuristic 1 obligation and it is discharged by modelchecking. In the other two, AEG cases, the corresponding rulesets r̂0 are already known tobe in U , so we proceed directly with the tool, and obligations for Heuristic 1 are generated.One is discharged automatically; the other requires human guidance because the generateddeadlock freedom property fails to verify. An examination of the counterexample revealsthat when exclusive access has been granted to an abstracted node, there is no pointerindicating which node has been granted (only a flag to indicate that it has indeed beengranted, ExGntd). Without this pointer, the permutation of Heuristic 1 is not applied to theproper abstract node actually holding exclusive access. Although manual, the solution isstraightforward: add a new system variable EPtr of type P that points to the node holdingexclusive access; and strengthen the guard of the ruleset. This is done in a sound mannerwhere only the ruleset version we prove is UA is strengthened in this way; the originalruleset belonging to O is not modified. After this modification, the relevant property is105verified.Having added these four rules to U of mixed abstraction A, Antecedent 1 of The-orem 1 is established by model checking. We now describe the procedure to show An-tecedent 2. Initially, all rulesets that have no tags are known to be L-preserving, andthese are added to U for mixed abstraction B. Model checking then reveals that two ad-ditional rules are needed to establish the Antecedent: ABS SendGntE1 (tagged AUG) andABS RecvInvAck2 (tagged AEG and AEC). These tags are discharged by automaticallygenerating and checking the obligations of Heuristics 4 and 2, respectively. Adding thesetwo rules to U for mixed abstraction B allows Antecedent 2 to hold and completes theverification of the German protocol.5.5.3 The FLASH ProtocolThe quiescence property verified of FLASH is of the same form as (5.4), and states that allchannels are clear and the directory is not waiting to perform a write-back9. Antecedent 1of Theorem 1 holds immediately using the initial set of UA rulesets having no guard tags.To show Antecedent 2 of Theorem 1, we start with the set U of L-preserving statesprovided by tag examination and use model checking as with the German protocol. Fourrules, each tagged with AEG and AEC, must be shown L-preserving. We first show thatthey are UA, by applying Heuristic 1. For two of these rulesets, model checking theobligations for Heuristic 1 succeeds. For the other two, model checking fails upon reachinga dead-end state sˇ′ where no local rules fpr node 1 are enabled. The manual strengtheningneeded for these two rulesets is identical. Without loss of generality let the ruleset ber̂j . Inspecting the counter example reveals that the state s ∈ relax (r̂0, i)|i=1 that led tosˇ′ has different values for some B-type variables than those in sˇ, the original dead-end9Although the Murϕ system for the mixed abstraction of FLASH contains rules where two indexvariables have been instantiated as Other , none of these must be shown UA/L-preserving to proveour example property. Some such rules are needed to be shown UA if the conjunct ¬Pending isadded to the quiescent property. We omit these from this paper for ease of presentation, but notethat similar reasoning to Heuristic 1, which assumes only one such index variable, is sufficient.106state revealed when checking Antecedent 2, where r̂j is enabled. This indicates that theguard r̂j is too weak and must be strengthened with a predicate on these variables. Weduplicated the ruleset for the aforementioned reasons of soundness, and strengthened theguard with a predicate requiring these variables to match their value in sˇ. Then, theautomated procedure completed successfully and the four rules are established as UA. Toshow they are L-preserving, Heuristic 2 is applied to each ruleset and the obligations aredischarged automatically; this establishes the quiescence property by Theorem 1. Withregard to the manual strengthening step, we note that in principle the model checker couldclassify the reachable states of relax (r̂0, i)|i=1 for which a path to r̂0|i=1 is found versusthose where no such path is found. Thus, the strengthening predicate could be generatedautomatically.5.6 DiscussionMuch of the theory is restricted for ease of presentation and because it is sufficient forthe case studies. However, there are a number of generalizations and further directionsto consider. As this work has been guided by example protocols, in the future we mayapply these methods to a real-world protocol that requires one or more of the followingextensions.5.6.1 Permutations on More than One Abstracted NodeAs mentioned, the current method for dealing with the AEG tag focuses on permuting atmost a single abstracted node with some non-abstracted node. The same principle maybe applied to multiple abstracted nodes, where they are each swapped with a distinctnon-abstracted node. The local state of such nodes are unknown, and furthermore it isunknown if they are equal or not. Regardless, the reachability of the OA transitions maybe strong enough to show they must be equal or unequal, and may have highly constrained107local states10. This technique is also applicable to array variables. Suppose the guard ofsome concrete ruleset with parameter i contains the condition i = Ptr1 & MyArray[i] =Ptr2, where MyArray is of type array [P] of P, and Ptr1 and Ptr2 are of type P. An abstractruleset where i is abstracted will overapproximate this condition as Other = Ptr1, but apermutation can be designed that swaps in both i and MyArray[i], with isdefined(Ptr1)& Ptr1 != Other & (isdefined(Ptr2) -> Ptr2 != Other). This is the start predicateof the DF property to check for proving the abstract ruleset is UA (the end predicate is,exists i : NODE do i = Ptr1 & MyArray[i] = Ptr2 end).5.6.2 Local Rule GeneralizationsOur approach has a restricted view of what constitutes a “local” rule; the local state of atmost one node may change in the update, and the global state, the variables of type B or P,may not change. Indeed, if a protocol description contains no such rules, then the heuristicmethods cannot be applied to find “hidden paths”, the paths found by model checking thatestablish a DF property and composed of local rules. One could imagine generalizing thedefinition of local rules to depend on and/or update the local state of (say) two nodes i1and i2. However, this complicates the reasoning about hidden paths because it may benecessary to show that for some local state of i1, there always exists a node i2 such that alocal rule depending on i1 and i2 can fire.An orthogonal generalization is to allow local rules to change the global state. Ifevery hidden path changes the global state in an identical manner, then the ruleset thatis shown to be underapproximate must respect these changes in it’s update. Such side-effects of these hidden paths could easily be incorporated to our method, but this was not10As a side note, notice that if m is the number of nodes that are swapped from the set of ab-stracted nodes to the set of non-abstracted nodes, then the number of equivalence class realizationsis exactly the number of partitions of a set with m elements. This is known as the mth Bell numberBm, which can be expressed as a sum of Stirling numbers of the second kind:∑mi=0{mi}. The valuesof Bm’s for 1 ≤ m ≤ 10 are 1, 2, 5, 15, 52, 203, 877, 4140, 21147, 115975. Thus, this generalization tomore than one abstracted node seems reasonable for small m and intractable for m ≥ 10.108necessary for our case studies.5.6.3 Automatic StrengtheningIn the case study of the FLASH protocol, one of the abstract rulesets of O that we wanted toprove was UA had a guard that was too weak for the heuristic method to handle. Instead,we were able to show that a strengthened version of this rule was UA. The strengtheningpredicate, expressed in terms of variables of type B, was deduced by comparing the globalstate of s˜ (the dead-end state found when checking an antecedent of Theorem 1) with s˜′(the dead-end state found when checking the DF property of the heuristic). In the currentimplementation, PReach will halt as soon as the constructed path from a state satisfyingthe heuristic start predicate pstart reaches a dead-end. Instead, we could categorize allreachable pstart -states for which a path is found, ppath , and those for which a path isnot found, pno-path . Then, then a strengthening predicate over B-typed variables couldbe determined that includes every ppath -state but none of the pno-path -states. With thisapproach, human intervention is only necessary when key auxiliary variables are absentfrom the system.109Symbol MeaningP parameterized systemn natural numberk small fixed natural numberP parametric typeS a system (S, I, T )S set of system statesI system initial statesT system transition relationA,B mixed abstractionθ, ψ abstraction relationSA set of abstract statesIA set of abstract initial statesw,w′ a concrete states, s′ an abstract statew T w′ T -path from w to w′B boolean typePn parametric type with n valuespi permutation on Pnr a ruleρ a rule guarda a rule updateJrK transitions corresponding to rL a local boolean predicateψ(r) abstract rulesets corresponding to rr̂0 abstract ruleset with all ruleset parameters non-abstractedr̂2m−1 abstract ruleset all ruleset parameters abstractedQn parameterized quiescent property; see (5.4)G global boolean predicatePji {h ∈ Pn : i ≤ h ≤ j}pij↔h permutation that exchanges j and h; see (5.5)r̂j an element of ψ(r)A`(i) mixed abstraction with all UA transitions local to node iEFr̂ exists a path along which r̂ firesr̂|i=1 r̂ with instances of ruleset parameter i replaced with 1relax (ρ̂, i) guard ρ̂ with constraints local to i removedΓ(A) strongest boolean predicate implied by ATable 5.3: List of chapter symbols110Chapter 6Distributed Response PropertyCheckingA response property is a simple liveness property that, given state predicates p and q,asserts “whenever a p-state is visited, a q-state will be visited in the future”. This chap-ter presents an efficient and scalable implementation for explicit-state model checking ofresponse properties on systems with strongly- and weakly-fair actions, using a network ofmachines. Our approach is a novel twist on the One-Way-Catch-Them-Young (OWCTY)algorithm. Although OWCTY has a worst-case time complexity of O(n2m) where n isthe number of states of the model, and m is the number of fair actions, we show that inpractice, the run-time is a very small multiple of n. This allows our approach to handlelarge models with a large number of fairness constraints. Implemented with the PReachdistributed, explicit-state model checker introduced in Chapter 3, we demonstrate the ef-fectiveness of our approach by applying it to several standard benchmarks and on somereal-world, proprietary, architectural models.11This chapter is based on [27].1116.1 IntroductionResponse properties are liveness properties of the form “From any state in which proposi-tion p is satisfied, execution will eventually reach a state in which proposition q is satisfied.”In LTL such properties are expressed as (p→ ♦q); the corresponding CTL specificationis AG (p → AF q). Specifications of cache protocols and high-level architectural modelsoften include response properties — e.g. if a processor attempts to write to a memorylocation, the processor will eventually have an exclusive copy of that location in its cache;or, if an instruction is fetched, eventually either it will be executed and committed or that(speculative) path will be aborted. In comparison with the previous two chapters, responseproperties are stronger than the corresponding deadlock-freedom property AG (p→ EF q),but verification is harder. By providing methods to verify both properties, we allow theuser a choice in trading property strength versus model accuracy.The standard approach to explicit state model checking of LTL properties in-volves constructing a product automaton. This automaton is the synchronous productof the Bu¨chi automaton for the specification, and the Bu¨chi automaton for the systemitself [110, 59]. The specification automaton accepts the negation (i.e. violations) of theproperty, while the automaton with for system accepts traces that the system allows. Bythis construction, if the language accepted by the product automaton is empty, then theLTL property holds; otherwise, a counterexample trace is found. Counterexamples cor-respond to reachable cycles of the product automaton that do not include an acceptingstate. All model checking approaches are vulnerable to state-explosion problems, and theproduct-automaton construction for LTL model checking exacerbates this problem. If theoriginal system has n reachble states, and the LTL specification, φ, consists of |φ| symbolsand operators, then constructing the product automaton takes O(n2|φ|) time and space.Response properties for models without fairness assumptions can be expressed witha Bu¨chi automaton with only 2 states [110], and thus the blowup from the formula size iscurbed. Essentially, this Bu¨chi automaton has one state where the system is waiting for a112response, and the other state where no request is pending. Unfortunately, only contrivedsystems that contain no cycles along any path from a p-state to a q-state will satisfy suchresponse properties. In practice, response is verified subject to fairness assumptions thatattempt to characterize realistic traces. Response may be verified under those fairnessassumptions that can be written as the LTL formula Fair , by using LTL model checkingto verify the formula Fair → (p → ♦q). The Bu¨chi automaton for this formula growsexponentially in |Fair |, which in turn causes the number of states of the product automatonto explode.Instead of expressing fairness as an antecedent to the LTL property of interest,fairness can be expressed in terms of how the original system is defined or as a speciallyhandled input to the model checking algorithm. Kesten et al. [74] compare expressing fair-ness as a property antecedent with a “fair-aware” approach and show that latter achievesbetter performance. Manna and Pnueli [88, 89] present a model-checking algorithm prop-erty checking for response properties that takes advantage of two notions of action-basedfairness. We build upon these ideas and implemented a fair-aware version of OWCTYfor response properties in the PReach model checker. Our approach uses Manna andPneuli’s notions of strong and weak fairness. In the worst-case, the algorithm could per-form O(n2|Fair |) state expansions, where n is the number of reachable system states. Inthe typical scenario where |Fair | is much smaller than log(n), this far exceeds the numberof worst-case expansions of the Bu¨chi automaton approach which is O(n2|Fair |). However,our results for benchmark models vastly outperform the worst-case, even though the worstcase is achievable on a contrived example (see Section 6.3.1). In contrast, we also reportresults in Section 6.7 for a tool that implements the Bu¨chi automaton approach and usestime and memory as one would expect from the worst-case analysis.Our contributions are as follows.1. We present a novel, efficient, parallel approach for model checking response proper-ties.1132. The algorithm is implemented as an extension of the PReach [22, 30] model checker.3. Demonstration that verifying liveness for large, realistic systems augmented withboth strong and weak fairness is tractable using a modest network of machines.4. We show that the time requirements for One-Way-Catch-Them-Young style algo-rithms are far better in practice than would be expected from the worst case analysis.In practice, we observe that each state is visited a small number of times (typicallyless than 30).6.2 OverviewTo check response properties, we implemented an algorithm inspired by the set-based One-Way-Catch-Them-Young algorithm described in [73, 38]. We focus on systems with bothstrongly fair actions (a.k.a. compassion), denoted C and weakly fair actions (a.k.a. justice),denoted J .6.2.1 PreliminariesA fair transition system (FTS) is a tuple (S, I, T,J , C) where• S is a finite set of states;• I ⊆ S is the set of initial states;• transition relation T ⊆ S × S;• weakly fair actions J ⊆ 2T ;• strongly fair actions C ⊆ 2T .An action is a subset of T . For example, the set of transitions corresponding to aMurϕ rule could be an action. The rule guard _ action describes the set of transitions,114(s, s′) where s satisfies the state predicate guard and s′ is the state reached by performingupdate action starting from state s. Our implementation of action-based fairness associatesweak- and strong-fairness constraints with Murϕ rules.Function En : S → 2C∪J gives the set of actions enabled at state s, i.e. En(s) ={a ∈ C∪J : ∃s′. (s, s′) ∈ a}. State s enables action a if a ∈ En(s). Given state s we use theshorthand notations Cs and Js to refer to the sets of enabled actions that are strongly andweakly fair, respectively. Formally, Js = J ∩ En(s) and Cs = C ∩ En(s). For conveniencewe assume transitions that are not members of any element of J ∪ C are members of thenon-fair set, i.e. NF = T \(⋃a∈J∪C a). For A ⊆ S, 〈A〉 denotes the subgraph of thedigraph (S, T ) induced by A.A trace is a finite sequence of states s0 ◦ s1 ◦ . . . ◦ s` where so ∈ I, and (si, si+1) ∈ Tfor 0 ≤ i < `. A predecessor trace for state s is any trace where s` = s.An execution is an infinite sequence of states, s0 ◦ s1 ◦ . . ., where s0 ∈ I, and∀i ≥ 0. (si, si+1) ∈ T . For a given trace, action a satisfies• InfOftenTaken(a), if ∀i ≥ 0. ∃k ≥ i. (sk, sk+1) ∈ a,• InfOftenEn(a), if ∀i ≥ 0. ∃k ≥ i. a ∈ En(sk), and• InfOftenDisabled(a), if ∀i ≥ 0. ∃k ≥ i. a /∈ En(sk).An execution is called fair if∀a ∈ C. InfOftenEn(a)⇒ InfOftenTaken(a)∧ ∀a ∈ J . InfOftenTaken(a) ∨ InfOftenDisabled(a).In other words, an execution is fair if both of the following hold of any suffix of theexecution.1. If action a ∈ C is enabled in an infinite number of suffix states then a transition in amust eventually occur.1152. If action a ∈ J is enabled in all suffix states then a transition in a must eventuallyoccur.A strongly connected component (SCC) is called fair (a FSCC) if all strongly fair actionsenabled within the SCC are taken within the SCC, and all weakly fair action enabled withinin the SCC are either taken within the SCC or disabled at some state within the SCC.Section 6.3 presents an algorithm that detects FSCCs within the subgraph of reachablestates that can be reached on a path from some p-state without visiting a q-state along theway (this subset is referred to as pending ; see Figure 6.1). Such FSCCs are counterexamplesto the response property (p → ♦q), as traces are infinite and the system is finite state.Furthermore, every counterexample execution has an infinite suffix that only visits statesin a FSCC. Note that p is a subset of pending , and q is disjoint with pending . The initialstates are usually disjoint from both p and pending , but this need not be the case.reachableinit p qpendingFigure 6.1: Sets of interest when checking a system adheres to (p→ ♦q).6.2.2 A Note about StutteringWe note that fair systems may be defined with or without inherent stuttering, the formerassuming that every state has a transition to itself and the latter does not. For simplicityin the following presentation, we assume that stuttering is possible in all states, therebyrequiring a fair “reason” why indefinite stuttering cannot occur. This assumption impliesthat T is reflexive. Including stuttering simplifies the presentation; for example, it ensuresthat all traces can be extended to infinite executions.116Algorithm 6.1 High level algorithm1: function FindFairCycle(I ⊆ S, T ⊆ S × S, C ⊆ A,J ⊆ A, p ⊆ S, q ⊆ S)2: . A is the power set of the power set of S × S3: pending ,MaybeFair ,Prev ,ToExpand ⊆ S4: . Compute the pending states5: pending ← Reachability(S, I, T, p, q)6: ptfa ← new bit [pending ][J ∪ C] . array of bit-strings7: Clear(ptfa) . initialize to all 0s8: MaybeFair ← pending9: Prev ← ∅10: while MaybeFair 6= Prev do11: Prev ← MaybeFair12: ToExpand ← MaybeFair13: while ToExpand 6= ∅ do14: s← RemoveSomeElement(ToExpand)15: for all a ∈ J \ Js do . Weakly fair actions not enabled at s16: ptfa[s][a]← 117: end for18: Next ← Successors(s) \ q . Ignore q-states to remain within pending19: for all s′ ∈ Next do20: OldActions ← ptfa[s′]21: for all a ∈WhatActionsTaken(s, s′) do22: if a ∈ J ∪ C then23: ptfa[s′][a]← 1 . Record action taken24: end if25: end for26: ptfa[s′]← BitwiseOr(ptfa[s], ptfa[s′]) . Actions preceeding s alsopreceed s′27: if (ptfa[s′] 6= OldActions) then28: ToExpand ← ToExpand ∪ {s′}29: end if30: end for31: end while32: for all s ∈ MaybeFair do33: if ∃a ∈ Js ∪ Cs : ptfa[s][a] = 0 then34: MaybeFair ← MaybeFair \ {s}35: end if36: end for37: Clear(ptfa)38: end while39: return MaybeFair 6= ∅40: end function1176.3 AlgorithmOur distributed response checking algorithm is based on the One-Way-Catch-Them-Young(OWCTY) [73] approach. The key idea of the algorithm is to eliminate all states thatdo not belong to an FSCC and are unreachable from an FSCC. If all reachable statesare eliminated, then no counterexample exists; otherwise an FSCC is contained within theremaining states. Observe that if a state is part of some FSCC, then any fairness constraintsassociated with that state must be satisfied by an (infinite length) path within the FSCC. Inparticular, there will be an incoming path to the state that satisfies the fairness constraintsof the state. OWCTY propagates the satisfaction of fairness constraints forward, thusmarking each state with the fairness constraints that are satisfied by incoming paths.When this process reaches a fixpoint, states whose fairness constraints are not satisfiedcannot be part of a FSCC. These states are eliminated. This process is repeated until afixpoint is reached. If all reachable states are eliminated, then no counterexample exists;otherwise an FSCC is contained within the remaining states.This computation is implemented by first initializing a set MaybeFair with thepending states, and then iteratively removing states from MaybeFair that cannot belongto a FSCC. State s is removed when it is discovered that there is no predecessor traceof s in 〈MaybeFair〉 along which action a ∈ C is taken, where a ∈ Cs. Similarly, s isremoved if it is found that there is no predecessor trace of s in 〈MaybeFair〉 along whichaction a ∈ Js is either taken or disabled at some state s′ of the trace, where a ∈ Js. Theresponse property holds iff MaybeFair is empty when the algorithm terminates. To seethis, note that any state that is removed from MaybeFair cannot belong to a FSCC; thus,〈MaybeFair〉 contains all of the FSCCs of 〈pending〉.Furthermore, the SCCs of 〈pending〉 form a DAG. Consider an SCC C that does nothave any incoming edges. If C is not an FSCC, then it contains state s with some fairnessconstraint that is not satisfied by any transition within C. Because there are no incomingedges to C from other SCCs, there is no incoming path to s that satisfies the fairness118constraint under consideration, and s will be eliminated. Thus, if 〈pending〉 contains noFSCCs, then OWCTY will eventually eliminate all of its states.The FSCCs of 〈MaybeFair〉 form a DAG. Let F be any FSCC of 〈MaybeFair〉 thathas no predecessor FSCCs. It is straightforward to construct a cycle in F that satisfies allfairness constraints. By construction, this cycle is reachable from some initial state.The description of OWCTY from [38] for model checking LTL formulas with strongand weak state-based fairness operates on sets of states performing union and disjunctionoperations, as well as deleting all members from a set which have no predecessor within theset until a fixed point is reached2. As described in Section 3.1, PReach uses lossy com-pression when hashing states; thus, we cannot reconstruct states from hashtable entries.To retain the efficiency advantages of the Murϕ hashtables, we avoid the explicit represen-tation of large sets of states, and replace the union and intersection operations of OWCTYwith tag bit manipulations, where each hash table entry includes one such tag bit per fairaction. In Algorithm 6.1, these bits are stored in ptfa (predecessor trace fair actions) table,which is a two-dimensional array of bits (line 6) initialized to all 0s (line 7. The two indicescorrespond to the elements of pending (i.e. the hash table index), and the fair actions (i.e.which tag bit in the hash table entry). For any action a ∈ J ∪ C and state s ∈ MaybeFair ,bit ptfa[s][a] is set if a is taken in some predecessor trace of s in 〈MaybeFair〉, or if b ∈ J isdisabled at some state of a predecessor trace of s in 〈MaybeFair〉. The set pending storesthe states of interest for response, those that can be reached on a path from a p-statewithout visiting a q-state.Each iteration of the outer while-loop (lines 10–38) is called a round, and involvestwo phases.Action Propagation Phase (AP):This step is the while-loop from lines 13 to 31. Some state s is removed from ToExpandand the tag bits are set for each weakly fair action that is disabled at s; this is because2To the best of our knowledge, the algorithm from [38] not been implemented.119any eventual successor of s within 〈pending〉 may be part of an SCC with s. If so, thisSCC is fair with respect to these weakly fair actions. Then, the successors of s within〈pending〉 are computed. For each of these, the current tag bits are saved in OldActions. Ifthe transition that is taken from s to reach a successor s′ is a member of some a ∈ J ∪ C,the ptfa[s′][a] is set (line 23). Then, the bit-string ptfa[s] is ORed with the ptfa[s′], asany predecessor trace ρ for s implies a predecessor trace for s′, namely ρ ◦ s′. If any ofthese operations have set new bits for s′, it must be added to ToExpand so the bits arepropagated along. Otherwise, the s′ is discarded. This loop continues until a fixed pointis reached for the contents of ptfa.Figure 6.2 illustrates some operations of AP with an example. For this example,J = {a0, a1, a2, a3} and C = {a4, a5, a6, a7}, and PTFAs are represented as a7 . . . a0, asseen below each state. Assume that En(b) = {a0, a2, a3, a4}, En(c) = {a0, a1, a7}, andEn(d) = {a0, a1, a3, a5}. When b is expanded, the PTFA on the arc is passed to state ewhich changes the PTFA for e and requires e to be expanded. Subsequently, c is expandedand the PTFA for e is again updated and another e expansion is needed to communicationthe new PTFA to successors. Finally, when d is expanded the PTFA sent to e contains nonew actions, so e does not need another expansion.cdeb{a2, a7}{a2, a3, a5, a7}{a1, a2, a7}{a1 , a4 , a5}a7 takena4 taken{a2, a3, a5, a7}{a1, a5}a1 taken {a1, a4, a5}{a1, a2, a3, a4, a5, a7}∅Figure 6.2: Example of PTFA updates as states are expanded120State Deletion Phase (SD):This phase appears from lines 32 to 37. Any states that enabled a fair action a but with thecorresponding tag bit cleared cannot be part of a FSCC and are removed from MaybeFair .Soundness for the algorithm was described at the beginning of this section. Tosee that the algorithm terminates, we first note that the while-loop at lines 13–31 mustterminate because the flag bits in ptfa are strictly increasing with successive iterations ofthe loop. The while-loop at lines 10–38 terminates because the loop body adds no newelements to MaybeFair . Thus the number of elements in MaybeFair strictly decreases withsuccessive executions of the loop body.6.3.1 Worst-Case Time Complexity for OWCTYHere we present worst-case time-complexity bounds for the One-Way-Catch-Them-Youngalgorithm. The analysis is reasonably straightforward, but we are not aware of previouslypublished bounds for the algorithm. We include the analysis here for completeness andto show that the run-times for OWCTY in practice are much smaller than the worst-casevalues.First, consider the problem of finding all states reachable from some set of initialstates, V0. Let VR denote the set of reachable states (graph vertices), n = |VR|, and ERthe reachable transitions (graph edges). An explicit-state model checker must visit alledges, which provides an Ω(|ER|) time bound for a sequential algorithm. Standard graph-traversal algorithms achieve this bound. Thus, the time complexity for finding the set ofreachable states is Θ(|ER|). For a dense graph, |ER| ∈ Θ(n2), and the time complexityfor reachability is O(n2). In practice, reachability graphs are often quite sparse, with anaverage in-degree (or out-degree) per state less than 10. If we assume |ER| ∈ Θ(dn) whered is the average in- (or out-) degree for a state, then the time complexity for such sparsemodels is O(dn).Now consider OWCTY with m fairness constraints. The worst-case would be if121each round eliminated one node, there were Ω(n) such rounds, and each round required mpasses to propagate fairness bits to Ω(n) vertices over Ω(n2) edges (again, a dense reachablestate graph). We now describe such a graph. Typically, the number of fairness constraintswill be much smaller than the number of reachable stages, so we assume m n. Let nXand nZ be Θ(n), such that nX +(m+1)+nZ = n. The vertices are partitioned into threesets as described below:X = {xi | 0 ≤ i < nX}Y = {yi | 0 ≤ i ≤ m}Z = {zi | 0 ≤ i < nZ}VR = X ∪ Y ∪ ZEX1 = {(xi, xi) | 0 ≤ i < nX}EX2 = {(xi, xi+1) | 0 ≤ i < nX − 1}EX = EX1 ∪ EX2 ∪ {(xnX−1, y0)}EY 1 = {(yi, yi+1) | 0 ≤ i < m}EY 2 = {(ym, zi) | 0 ≤ i < nZ}EY = EY 1 ∪ EY 2EZ = {(zi, zj) | 0 ≤ i, j < |Z|}ER = EX ∪ EY ∪ EZThe graph vertices are VR; transitions are ER; and the initial states are {x0} ∪ Y . Thereare m strongly fair actions a0, ..., am−1 withaj = {(xi, xi) | xi ∈ X, i+ 1 (modm) = j} ∪{(xi, xi+1 | 0 ≤ i ≤ nX − 1, i (modm) = j} ∪{(yj , yj+1)}Number iterations of the while-loop at lines 10–38 of Algorithm 6.1 starting at0. In round 0, there is no path into x0 satisfying the fairness constraint for a0, so x0 isdeleted. In round 1, there is no path into x1 satisfying the fairness constraint for a1, so122x1 is deleted. For round 0 ≤ i < nX , vertex xi is deleted. Thus, OWCTY requires Ω(n)rounds for this example. Now, suppose that in each round the initial states are processedin order according to ym, ym−1, ..., y0, x0. In each round, each z ∈ Z will be expanded mtimes, and all outgoing edges from z will be explored. Thus, the total time for each roundis Ω(m|EZ |) = Ω(mn2). There are at least |X| ∈ Ω(n) rounds, and the total runtime isΩ(mn3). It is straightforward to show an upper-bound of O(mn3). Thus, the worst-caseruntime for OWCTY with dense reachability graphs is Θ(mn3).Typical models do not have the dense reachability graphs described above. Forexample, if a Murϕ model had d rules, then each state has at most d successors. Typically,d n. If the reachability graph is sparse with maximum out-degree d, we can partitionthe vertices of Z into Θ(n/d) separate cliques of d states each, reserving Θ(n/d) vertices tobuild a fan-out tree from vertex ym−1 to the Z vertices. This shows a worst-case runtimein Θ(mdn2).Comparing these bounds with those for reachability, we find that OWCTY has aworst-case run-time that is Θ(mn) times greater than that of reachability alone. If therun-times for real problems resembled these bounds, then OWCTY would be unusable. Asseen in Table 6.1, the actual number of rounds of the algorithm appears to be quite smallfor realistic examples. For our examples, at most 23 rounds were needed, and the “average”state was visited even fewer times. This enormous gap between the worst-case performanceand the observed performance makes the OWCTY algorithm useful in practice.6.4 Distributed ImplementationThe distributed version of this algorithm starts with a Stern-Dill style reachability compu-tation that identifies all p- and pending-states. Each worker process stores its p-states ondisk, and pending-states are marked with tag bits in the hash-table. Initially, the PTFA forpending-states are set to all fair actions, J ∪ C. The distributed algorithm then performsrounds that correspond to those of the sequential version, Algorithm 6.1. As described in123more detail below, each round propagates PTFA tags according to the next state relationuntil a fix point is reached. At the boundary between rounds, states are identified whosePTFAs do not satisfy the fairness constraints for the state. Such states cannot be part ofan FSCC and are marked as “dead” (i.e., removed from MaybeFair). The number of live,MaybeFair states is non-increasing. The algorithm terminates when this number no-longerdecreases. If at this point, all MaybeFair states have been eliminated, then the responseproperty is satisfied. Otherwise, a counter-example is generated. The remainder of thissection describes this algorithm in more detail.Algorithm 6.2 Root Process1: function RootStart(I, p, q)2: . Tags for initial states3: for all s ∈ I do4: SendState((s, ∅))5: end for6: CurMaybeFairCount ← Tally(nstates)7: PrevMaybeFairCount ← 08: while CurMaybeFairCount 6= PrevMaybeFairCount do9: Broadcast(doRound)10: PrevMaybeFairCount ← CurMaybeFairCount11: CurMaybeFairCount ← Tally(nstates)12: end while13: Broadcast(stop)14: if CurMaybeFairCount > 0 then15: return GenerateCounterexampleTrace(. . .)16: else17: return verified18: end if19: end functionAlgorithm 6.2 shows pseudo-code for the root process. It initiates the initial reach-ability computation to identify p- and pending-states, by sending each initial state to theirowners via SendState. It then initiates rounds of propagating PTFA tags and eliminat-ing pending-states until no further states can be eliminated. The termination detectionalgorithm from the original Stern and Dill approach is used to identify the end of each124round and compute the total number of MaybeFair -states. Function Tally sums theMaybeFair -states owned by each worker. This provides a barrier separating the compu-tations of successive rounds. After the final round, the root process notifies the workersthat the computation is complete and reports either that the response property has beenverified or provides a counter-example.Algorithm 6.3 shows pseudo-code for the worker processes. Like the reachabilitycomputation, each worker has two main activities: receiving incoming states and checkingif they have been “seen” previously, and expanding states to send their successors to theirowners. Algorithm 6.3 augments each of these activities to maintain the tags for PTFA.First, the set of owned and reachable p-states is found and stored via ComputePStates.This is similar to Stern-Dill reachability. Then, at the beginning of each round, each processchecks its subset of the p-states to determine which ones satisfied their associated fairnessconstraints in the previous round. Those that do not are marked as dead. All p-states areadded to the work-queue, ToExpand , even if they are dead to ensure that their successorsare examined in this round. When a state is received, the algorithm first checks to seeif this is the first time the state has been seen for the current round. If so, the state’sPTFA is checked to see if the state should be marked as dead, and all states are enteredinto ToExpand the first time they are visited in each round. If the state has been seenbefore, then if the new PTFA indicates incoming paths for fairness constraints that haven’talready been satisfied, these constraints are added to the state’s PTFA, and the state isenqueued in ToExpand to propagate this information to its successors. Notice that eachstate may be expanded multiple times in a given round; at most m times.When a worker removes a state from its work queue, ToExpand , it computes allsuccessor states as in the original reachability algorithm. Because the incoming paths tothis state are prefixes of incoming paths for its successors, the PTFA of the successor mustcontain the PTFA for this state. Furthermore, if the transition to the new state correspondto a fair action, then this action is added to the PTFA. These updates are made to the125Algorithm 6.3 Worker Process1: function Worker(S, I, T,J , C, p, q, rootPid)2: PS← ComputePStates(S, I, T,J , C, p, q) . Global queue for p-states3: RoundCount ← 04: while true do5: case Receive() of . Blocking receive6: doRound → ok7: stop → break while loop8: end case9: RoundCount ← RoundCount + 110: for all s ∈ PS do11: WQ← initStateForRound(s, ∅,RoundCount)12: end for13: while round not terminated do . Stern and Dill’s termination alg14: while (s, thisPTFA)← Receive() do . Nonblocking receive15: T ← HT.GetTags(s)16: if T.round 6= RoundCount then17: initStateForRound(s, thisPTFA,RoundCount)18: else if ¬T.dead ∧ (thisPTFA * T.PTFA) then19: T.PTFA← T.PTFA ∪ thisPTFA20: WQ.Insert((s, T ))21: HT.UpdateTags(s, T )22: end if23: end while24: ExpandAndSend(J , C) . See Algorithm 6.425: end while26: send (nstates,MyMaybeFairCount) to rootPid27: end while28: end function29:30: function initStateForRound(S, thisPTFA,RoundCount)31: T ← HT.GetTags(S)32: if Enabled(S) * T.PTFA then33: T.dead ← true34: thisPTFA← ∅35: end if36: T.round ← RoundCount37: T.PTFA← thisPTFA38: WQ.Insert((S, T ))39: HT.UpdateTags(S, T )40: end function126Algorithm 6.4 Dequeues a WQ state and sends next states with tags to their owners.1: function ExpandAndSend(J , C)2: if IsEmpty(WQ) then3: return done4: end if5: (X,Tags)← Dequeue(WQ)6: NextStates ← ComputeSuccessors(X)7: if Tags.dead then8: for all s′ ∈ NextStates do9: SendState((s′, ∅))10: end for11: return12: end if13: PTFA← Tags.PTFA14: PTFA← PTFA ∪ (J − Enabled(X))15: for all s′ ∈ NextStates do16: ActionTaken ← whatActionTaken(X, s′)17: . Successor PTFA is current state PTFA with the fair action taken18: if ActionTaken ∈ NF then19: NextPTFA← PTFA20: else21: NextPTFA← PTFA ∪ActionTaken22: end if23: SendState((s′,NextPTFA))24: end for25: return26: end function127PTFA for the successor, and the successor with this PTFA set is sent to the successor’sowner.Every operation either marks a state as dead or adds a fair action to some state’sPTFA. Thus, the activities for updating fairness information eventually reach a fixpointand the round terminates. Many optimizations are possible to improve the performance ofthis algorithm. These are described in the next section.6.5 OptimizationsEarly experiments with a prototype implementation revealed several opportunities to im-prove performance. We aim to address the average number of state expansions during aphase, the number of states visited during a phase, the number of rounds. A key observa-tion is that for many examples, the number of states in the pending set decreases rapidlywith successive rounds. Thus, it is important to avoid touching “dead” states so that thework done in later rounds decreases with the smaller size of pending . This also means thattypically most of the time is spent in the initial reachability computation and the first twoor three rounds of the liveness computation. Thus, optimizations should focus on theseearly rounds. Furthermore, the same state can be updated several times during a singleround. Consolidating these updates was simple and led to significant performance gains.The remainder of this section presents three methods of reducing each of these metricsin turn. In addition, various optimizations are inherited from PReach’s state space ex-ploration technique. Namely, load balancing of states offers modest speedups even in ahomogeneous network of machines. Batching of states into messages containing hundredsor thousands is also of benefit. See Chapter 3 for details regarding these optimizations.6.5.1 Saved ExpansionsThe description in the algorithms and implementations presented so far have states pairedwith their tags, including PTFA, when enqueued to the WQ. When the WQ grows large,128state s may arrive tagged with PTFA b2 while the same state is waiting for expansion inthe WQ while paired with PTFA b1, which matches the PTFA at the HT entry for s.When b2 has at least one bit set that b1 does not, s is enqueued for expansion in WQpaired with PTFA b1 ∪ b2. This renders the earlier WQ entry of (s, b1) redundant andunnecessary.To avoid this scenario, the HT is used to maintain PTFA information, and WQentries do not contain a PTFA. When a state s is enqueued, a new HT tag bit InWQis set; when s is dequeued, InWQ is cleared and the current HT value for PTFA is usedwhen computing the PTFA for s’s successors. If state s with PTFA b2 arrives when theHT entry has InWQ set, then HT PTFA bHT is set to bHT ∪ b2 and the just-arrived states is discarded. This approach reduces the number of state expansions at the cost of anadditional bit in HT per state, and one additional HT lookup.6.5.2 Dynamic KernelThe algorithm implementation above uses the reachable p-states as the kernel, defined asfollows.Definition 3. Given a FTS, K ⊆ S is a kernel for A ⊆ S if A is a subset of the reachablestates from K in the digraph (S, T ).Note that the set of initial states I is a kernel for any subset of the reachable states.In the code presented in Section 6.4, we used the reachable p-states Kp as a kernel forMaybeFair to initiate each phase because Kp is a kernel for every subset of pending . Ourexperiments showed that for typical examples, the number of states in MaybeFair dropsrapidly with each SD phase. The expansion of such deleted states can be avoided bymodifying K after each SD phase, using an extra HT tag bit InK and additional diskspace.During the initial phase, only the p-states have InK set to true, and these states aresaved to disk in the kernel-queue. When a state s is removed from MaybeFair during SD129that has InK set, this flag is cleared. When a process receives state s′ tagged with modedelete pred (signaling that a predecessor of s′ has just been removed from MaybeFair), thenif s′ has its InK flag cleared, it is set to true and s′ is added to the kernel-queue. Finally, atthe start of an AP phase the kernel-queue is copied to the WQ to serve as the set of initialstates, but any state encountered that has its InK flag cleared is ignored and removed fromthe kernel-queue.While this approach does not necessarily maintain the smallest possible kernel forMaybeFair , its simple implementation and low overhead lead to large performance gains.6.5.3 Deletion by Predecessor CountingThere are performance advantages when storing the number of predecessors each state hasin 〈MaybeFair〉. Under the assumption of stuttering and ensuring the safety property thatevery state s ∈ pending has |Js ∪ Cs| ≥ 1, any state with 0 predecessors in 〈MaybeFair〉will be deleted from MaybeFair in the next SD phase. However, storing the number ofpredecessors in HT allows detection of this case in order to preemptively remove suchstates. We choose to add 8 bits to the HT tags to store the predecessor count. Thisadditional bookkeeping complicates Algorithms 6.3 and 6.4 somewhat (details omitted).In particular, a state may be expanded more than once during an SD. This occurs when thefirst time a state is visited the condition on line 32 of Algorithm 6.3 holds, but subsequentlyall of its predecessors are deleted. However, this turns out to be a rare occurrence in thebenchmarks, and this strategy can reduce the number of phases. Note that the impactof this optimization is omitted from the Results section as it was inherent to our earlyimplementation versions.6.6 ResultsWe ran PReach on a variety of combinations of Murϕ models with all optimizations ofsection 6.5 enabled, summarized in Table 6.1. For each, we chose a suitable response130property such as “requests for exclusive access to a cache line are eventually granted”, or“processes waiting to enter the critical section will eventually do so”. The Murϕ modelsused are the German cache coherence protocol, the Peterson mutual exclusion algorithm,the MCS lock mutual exclusion algorithm, a snoopy protocol used as a benchmark inprevious verification work [39] and an Intel proprietary protocol. GermanX is the Germanmodel with X caches; petersonY is Peterson’s algorithm with Y processes and mcslock5is the MCS Lock algorithm with 5 processes; snoop2 is the snoopy protocol with 2 L1caches and 2 clusters. Models saw, gbn and swp are various sliding window communicationprotocols, with the response property that the sender can always eventually accept newdata to transmit. All models and the PReach code is provided online [24]. Each Murϕ“rule” (a.k.a. guarded command) is considered a separate action; we attached suitablefairness assumptions specific to the model. The network of machines used for experimentsare as follows:• UBC cluster: 40 PReach processes on a homogeneous cluster of 20 Intel Core i7-2600K at 3.40 GHz with 8 GB of memory (non-intel * models).• Intel cluster: 16 PReach processes on a heterogeneous network of contemporaryIntel R© Xeon R© machines, each with at least 8 GB of memory (intel * models).Not included in the table, but worth noting, is an Intel proprietary sliding windowprotocol model. With over 450 million states and tens of fairness (both strong and weak),we were able to verify response in about 5 and a half hours using 32 cores.131model runtime states p-states pending-states q-states rounds exp/state no -ko no -se no opt.German5 sf 189 15,836,445 3,699,486 4,858,596 5,103 1 3.48 0.98 2.42 2.86German6 sf 4,253 316,542,087 74,465,244 95,266,520 18,225 1 3.33 1.01 3.30 3.52peterson6 wf 820 13,817,679 2,947,800 12,111,713 45,209 14 12.91 1.65 1.30 1.95peterson6 sf 423 13,817,679 2,947,800 12,111,713 45,209 5 9.03 1.36 1.73 2.12peterson7 wf 26,957 380,268,668 79,029,594 340,549,743 775,138 17 14.19 1.65 1.66 2.16peterson7 sf 14,613 380,268,668 79,029,594 340,549,743 775,138 6 10.11 1.27 2.26 -mcslock5 wf 1415 59,318,541 27,785,789 51,474,427 2,780,517 3 5.09 1.17 1.10 1.25snoop2 sf 160 2,648,763 670,689 1,313,100 1,335,663 3 12.71 1.07 4.57 5.00saw20 sf 323 314,183 309,140 309,140 5,043 23 44.06 1.04 1.09 1.15gbn3 2 sf 369 12,753,395 7,859,200 7,859,200 4894195 6 6.44 1.60 1.95 2.56swp4 2 sf 503 18,595,425 11,715,440 11,715,440 6,879,985 6 6.58 1.59 1.63 2.22intel small sf 285 476,778 268,078 268,078 164,057 4 6.36 - - -intel med sf 1,015 2,696,059 1,944,360 1,944,360 635,672 4 8.59 - - -intel big sf 13,872 51,791,350 29,899,694 29,899,694 19,855,989 8 11.92 - - -Table 6.1: PReach-Resp benchmark results. Column “runtime” is given in seconds; “exp/state” is the averagenumber of times each pending-state was expanded. Model peterson6 sf is peterson6 with all actions strongly fair,as opposed to peterson6 wf where some rules were weakly fair and the rest as not fair (for example, the rule thatinitiates the move from the noncritical section to requesting to enter the critical section needs no fairness assumption).These two models have the same number of states of each type but perform a different number of expansions, andillustrate the benefit of only using more fairness than required for the response property to hold. All other modelsrequire strong fairness.132The rightmost three columns of Table 6.1 show the slowdown when benchmarksare run without the kernel optimization, without the saved expansions optimization andwithout either, respectively. The kernel optimization is of most benefit when the numberof rounds is large3. In particular, it is of no benefit for those benchmarks that only requirea single round, as the kernel states are only used during subsequent rounds. The savedexpansion optimization offers large performance gains in many cases. Typically, only 5 to10% of the total state expansions are explicitly avoided by detecting that a just-receivedstate state is present in the WQ. However, avoiding these redundant expansions can inturn save many expansions of successor states which in turn saves expansions of states thatare two transitions away. This cascading effect decreases the total number of expansionsby a significant factor.A few modifications were required when checking the snoop protocol. This modelwas created to represent a cache-coherence protocol in a realistic processor. The protocolappears to have been designed with an emphasis on safety, and liveness does not appear tohave been primary concern. For example, requests for cache lines are clearly not responsiveas they may be negatively acknowledged (Nackd) an arbitrary number of times. To avoidthis, we changed the protocol so that Nacks of this type are simply ignored, and therequest persists. This turned out to also not be responsive, although less obviously so –the counterexample trace included 72 transitions. Therefore, not all of the pending stateswere deleted. Figure 6.5 shows that about 65% of the the pending-states remained in theMaybeFair set when the algorithm terminated.Figures 6.3 through 6.5 show the length of the work queues and the size of MaybeFairfor each process during model checking of some of the benchmarks. The work queue plotsclearly demonstrate the effective load balancing used by PReach. The difference betweenthe shortest and longest work queue is at most a factor of 5. Notice that the WQ plotsclearly indicate the start and end of phases. Recall that the start of each phase, except for3One exception is saw 20 sf where a large proportion of the runtime is spent coordinatingthreads between rounds.133the first one, initializes the work queue to be the owned p-states for that process, causingthe rapid uptick at the start of each phase. Also recall that even numbered phases areSD – comparing the two plots for the same model shows the decline of MaybeFair -statesduring SD and plateaus during AP, as well as the initial discovery of the pending statesduring the first phase. 0 200000 400000 600000 800000 1e+06 1.2e+06 1.4e+06 1.6e+06 1.8e+06 2e+06 0 500 1000 1500 2000 2500 3000 3500 4000 4500Work Queue StatesSeconds 0 500000 1e+06 1.5e+06 2e+06 2.5e+06 0 500 1000 1500 2000 2500 3000 3500 4000 4500MaybeFair StatesSecondsFigure 6.3: Response property model checking plots for German6: WQ length (left) and|MaybeFair | (right) for each process 0 50000 100000 150000 200000 250000 300000 0 100 200 300 400 500 600 700 800Work Queue StatesSeconds 0 50000 100000 150000 200000 250000 300000 350000 0 100 200 300 400 500 600 700 800MaybeFair StatesSecondsFigure 6.4: Response property model checking plots for peterson6 wf: WQ length (left)and |MaybeFair | (right) for each process134 0 5000 10000 15000 20000 25000 0 20 40 60 80 100 120 140 160Work Queue StatesSeconds 0 5000 10000 15000 20000 25000 30000 35000 0 20 40 60 80 100 120 140 160Cur StatesSecondsFigure 6.5: Response property model checking plots for snoop2: WQ length (left) and|MaybeFair | (right) for each process.6.7 Comparison with DiVinEDiVinE is a parallel and distributed LTL model checker that is the closest tool to ours [7].DiVinE constructs a product Bu¨chi automaton to check liveness properties; thus, we ex-pect DiVinE’s space requirement to grow as the product of the number of states in thesystem model and the number in the system automaton. Applying DiVinE to the exam-ples from Section 6.6, we observed that it ran out of memory for all examples except forthose with no or a small number of strong fairness constraints. DiVinE provides a modefor models where all transitions are weakly fair. Using this feature, DiVinE performedwell for the Peterson example for which weak fairness constraints are sufficient to ensureresponsiveness. Note that the error found in the Murϕ model for Peterson is not detectedwhen all transitions are assumed to be weakly fair. The error arises because there areno fairness assumptions for the action where clients make requests, i.e. clients are not re-quired to make requests. Furthermore, many problems require strong fairness; for examplecache coherence protocols often include states where taking one action disables another.We found that for an encoding of the German protocol with 4 caches, the reachable statespace of DiVinE’s product automaton doubled with each additional fair action included.For only 6 fair rules, DiVinE on a multicore machine took 17 minutes to construct the135system automata, 13 minutes to perform the model checking task and used over 16 GB ofmain memory. In our experiments, adding one more fair rule exhausted the main memoryof our 32 GB machine and rendered the computation time infeasible. The protocol inquestion has on the order of 20 strongly fair actions.Our algorithm has a worst-case time complexity that is at least O(mn3) where nis the number of reachable states and m is the number of fair actions. Section 6.3.1 givesan example where the transition relation has O(n2) edges, and for which Algorithm 6.1removes one state per iteration of the outer while-loop. In practice, we observe that thetransition relation is sparse and Algorithm 6.1 converges in far fewer than n rounds –the most extreme case in Table 6.1 has 23 rounds. If |E| is the number of reachabletransitions, then the number of computed sucessor states is O(r(n+ |E|)). The worst-casetime complexity of DiVinE is better, O(n2|φ|) — DiVinE replaces a factor of the systemmodel size with the number of states for the checking Bu¨chi automaton. However, ourexperiments show that the actual time and memory requirements for DiVinE’s algorithmare fairly close to what one would expect from the worst-case bounds, while our approach,in practice, scales much more efficiently. We see this gap between worst-case and actualperformance as a promising area for further investigation.6.8 Conclusions and Future WorkWe have extended the PReach explicit-state, distributed model-checking tool to supportverification of response properties under both strong and weak fairness of actions. Ourapproach uses multiple rounds of reachability computation to implement a variation ofthe OWCTY algorithm. For a model with n states, m fairness constraints, OWCTYcould expand states O(nm) times on average. This would be prohibitively expensive.Our implementation shows that for practical examples, the number of rounds is small— typically less than 10, with a maximum of 23. Thus, OWCTY appears to providea practical approach to checking response properties for real-world problems. For these136examples, liveness checking is slower than safety checking, but not prohibitively so.Implementing our algorithm on top of the PReach distributed model checker allowsit to exploit the aggregate memory of large compute clusters. This enabled verification ofresponse properties for a sliding-window protocol with over 450 million states in about 512hours.We compared our approach with a tool that uses the standard product-automatonformulation, with one automaton for the system model, and the other for the LTL livenessproperty. As predicted by the worst-case analysis, we observed that the size of the propertyautomaton grew exponentially with the number of fairness constraints. The product-automaton approach was significantly faster than PReach for the problems that it couldcomplete. However, it ran out of memory for all but the smallest examples.This approach can be generalized in a number of directions. One is to handle othersimple liveness properties such as reactivity, expressed in LTL as ♦p∨♦q, where p andq are state predicates. This says that either p holds infinitely often, or eventually q holdsforever. Reactivity can express, for example, a strong-fairness condition. Futhermore,a conjunction of reactivity properties is as expressive as full LTL [84, 88], and so wouldenable model checking of a much wider class of properties. We hope to combine these modelchecking methods with the decompositional inference rules of Manna and Pnueli [88, 89].Such decompositions establish that a response property is implied by a handful of safetyproperties and “smaller” response properties, i.e. depending on a smaller fraction of thestate space. Adapting our algorithm to verify multiple such response properties in thesame model checking run would leverage human insight to improve scalability to enableverification of more detailed models with more expressive specifications.137Chapter 7ConclusionsThis thesis work successfully demonstrates that explicit-state model checking can be lever-aged for practical verification of liveness of hardware protocol models. With PReach,we were able to scale safety verification to new levels of model sizes. Our two extentionsof PReach to liveness checking utilized this achievement to verify more interesting prop-erties of like-sized models. All three of these tools have been applied to real, industrialexamples. Our approach for parameterized proofs of deadlock freedom was not appliedto an industrial example; this could require significant human effort. For the simple ex-amples we considered, German and FLASH, very little additional effort was required toshow deadlock-freedom given the non-interference lemmas inherited from the CMP methodfor safety verification. This is in stark contrast to previous approaches for parameterizedliveness verification, which are quite technical and likely require an expert user. We expectthat we would observe a similar outcome if our approach were applied to real, industrialproblems following the CMP method, but this is a topic for future work.The PReach model checker operates on Murϕ models, a widely adopted formatfor hardware protocols. Our liveness extensions have the benefit of straightforward use andinterpretation to verification engineers and hardware designers alike. PReach-DF requiresthe user to partition actions (i.e., Murϕ rules) into those that progress some transaction138and those that do not, and give a predicate describing “no transactions in flight”. PReach-Resp calls for the specification of weakly- and strongly-fair actions, notions that are oftenintuitively clear to architects. While the distributed scalability of PReach was not neededfor our parameterized deadlock freedom work, we do not view parameterized methods andscalable model checking tools as mutually exclusive. Indeed, during a recent internshipwith Intel, a parameterized abstraction model of an industrial sliding window protocolwas on the order of hundreds of millions of states — large enough that distributed modelchecking was necessary for this abstraction.The contributions of this thesis touch on different points in the trade-off betweeneffort and strength of the result in verification of protocols. PReach can automaticallycheck safety properties in Murϕ models, and can scale to tens of billions of states. Withsome human guidance in the form of helpful rules and a quiescent state predicate, PReach-DF can automatically check a DF property of like-sized models at the cost of a small timeoverhead (about 30%). If the user is willing to provide adequate fairness assumptions,PReach-Resp will verify the much stronger response properties, but with a memoryoverhead of roughly a factor of 5 and a time overhead that is typically a factor of 30 whencompared with safety verification. where n is the number of reachable states and m thenumber of fair actions. Finally, our contributions in parameterized DF require moderatehuman effort, but we expect it is proportional to the effort needed to apply the CMPmethod for parameterized safety, thus establishing a parameterized liveness-like result.Distributed explicit-state model checking allows one to exploit the aggregate mem-ory of many machines. PReach provides a platform for developing various practical DEMCbased methods. We have used this to address some liveness properties that were chosenbased on their usefulness and significance in practice. These are deadlock-freedom, param-eterized deadlock-freedom and response properties. This thesis enables the verification ofa new class of properties for real, industrial problems — especially for hardware protocols.1397.1 Contributions RecapRestating my thesis statement Chapter 1:This thesis develops and demonstrates tractable, practical and scalable dis-tributed explicit-state model checking methods for establishing liveness prop-erties of practical importance for large-scale models of hardware protocols.I have demonstrated this with the following, concrete contributions (also listed inChapter 1:1. PReach is an industrial strength distributed explicit-state model checker that hasverified properties of larger Murϕ models than any other published explicit-statemethods by distributing the computation across hundreds of machines. The PReachsoftware architecture achieves efficiency by building on the C++ code base from theMurϕ model checker and simplicity by using a layer of Erlang code to implementthe communication aspects. We have found this approach to be robust and exten-sible. PReach has been used extensively by our collaborators at Intel. Detailed inChapter 3.2. Chapter 4 presented two algorithms for verifying deadlock-freedom using DEMC.This property ensures that from all reachable states, the system has a path to somequiescent state, which is a state with no pending transactions. DF has the advantagethat it is easily understood by computer architects and hardware protocol designers.Our algorithms are computationally efficient, typically requiring a factor of 1.2 to 2of the time taken by safety property checking. We implemented these algorithms byextending PReach, and found an error in the Peterson model included in the Murϕdistribution, which had persisted for about twenty years.3. Chapter 5 extended our DF checks to include symmetric, parameterized systems.We showed how a mixed abstraction can be used, and sound techniques for weak-140ening the underapproximate transitions. This complements the CMP method [40]for strengthening overapproximate abstractions in an effort to prove parameterizedsafety properties. In practice, our approach appears to require only small additionalhuman effort once the CMP method has been used.4. PReach-Resp is our explicit-state approach to verifying response properties. Theseproperties are much stronger than DF, as they show that all requests are eventuallygranted. Similar to DF, they have the advantage of being intuitive to the non-expert.In practice, response property verification requires fairness assumptions that ignoreunrealistic counterexamples. We demonstrated that integrating fairness assumptionsinto the state space exploration of PReach produces a practical tool. In particular,our twist on the OWCTY algorithm [73, 38] takes only a modest constant number ofexpansions per state — far less than the theoretical worst-case. Detailed in Chapter 6.7.2 Future WorkThis work has opened the door for a number of future directions. The choice of Erlangas the chief programming language for PReach allows rapid development and testing ofnew approaches in the context of DEMC. While other tools have a codebase that wouldbe vexing for a new researcher to digest, PReach and its derivatives require only a fa-miliarity with Erlang, and the time to read 1000 to 1500 lines of code1. For example,other researchers have proposed heuristics for partial order reduction in DEMC [10, 78],but to the best of our knowledge, these methods have neither been implementation norexperimentally evaluated. Such experiments ought to be easy in PReach.As performance has not been a primary concern, I have not spent very much timesearching for bottlenecks. However, we suspect that as the number of threads assigned toa single multi-core machine exceeds 2 or 3, the disk IO becomes a limiting factor. This is1Certain modifications may also require tweaks to the Murϕ engine or the Murϕ compiler, butthese changes are typically local to a few C/C++ functions.141likely due to the fact that each worker reads and writes states from a different large diskfile. In many of our DEMC algorithms, the worker process that expands a state need notbe the state’s owner, thus we are eager to experiment with a dedicated Erlang process tomanage disk accesses and coalesce states into a single file. Another PReach limitation iscrash recovery; if a single worker crashes for any reason (such as Murϕ errors, running outof disk space, Erlang runtime errors) then all workers will die and model checking progressis lost. A recent master’s thesis [71] considered using a database to store the Stern-Dillalgorithm data structures, but it is too slow to be practical. We believe that snapshots ofthese data structures taken at infrequent intervals will offer minimal performance impactwhile providing a rollback point in case of a crash. Another idea in this space is to storestates redundantly, say by assigning multiple owners to each state. This would allow crashrecovery even if a snapshot for a given worker is lost.The modular design of the Erlang communication layer and Murϕ computationengines opens the door for PReach-like DEMC applied to other modelling languages.The Erlang code is agnostic to the specifics of a “state”; all queries regarding if a stateadheres to a predicate or which states are successors are calls through an interface to theMurϕ Engine. Indeed, during a recent Intel internship I was able to replace the MurϕEngine with a custom C++ based modelling language and use the Erlang layer for modelchecking. A potential drawback of using the Erlang runtime is its inherent distributionlimits. I have not tested PReach using more than 200 workers and typically use about50. It would be interesting to try a very large PReach experiment using hundreds ofworkers using WestGrid [112] or a similar large-scale cluster and see if and where thingsbreak down.There are a handful of technical aspects of PReach-DF and PReach-Resp thatshould be addressed. Unlike the original PReach, which has active users and a group ofmaintainers, PReach-Resp and PReach-DF have been implemented as research proto-types and they have not been made as easily available to a wider community. First and142foremost, the bitbucket online repositories for PReach, PReach-DF and PReach-Respshould be merged. The latter two were branches of PReach for research purposes, butclearly we shouldn’t expect users to manage three different installations. PReach-Respcould easily be extended to enforce state-based (as opposed to action-based) fairness as-sumptions. Some users may have Murϕ models that more naturally fit with this kindof fairness. Finally, the action-based fairness of PReach-Resp should be generalized topossibly range over an entire Murϕ ruleset, rather than considering each rule as a distinctaction. One case where these missing features cause problems occurs with rulesets whoseparameters do not appear in the rule’s guard, but only in the command. This is sometimesdone to express update assignments of nondeterministically chosen values. Taking eachrule as a different action often leads to excessively long counterexamples that are confusingto interpret and time consuming for the tool to generate.There are a number of directions for longer term future work. The usefulness ofmodel checking response properties with fairness would be expanded if we could also showrefinement from a high level Murϕ model to an RTL implementation. In particular, wewould need to show that1. the RTL is a refinement of the high level model, and2. the RTL adheres to the fairness assumptions of the high level modelWhile there is a number of researchers that have considered the first problem [109, 29, 49],I am unaware of work that deals with the second problem. To expand on the kinds of LTLproperties that PReach-Resp can handle, I am looking forward to considering reactivityproperties, which are a generalization of response properties and are expressive enough thatany LTL formula can be written as reactivity properties with conjunction and disjunction.Finally, the originally proposed idea of using Manna and Pnueli’s inference rules [88, 89]for decomposing response properties into simpler ones and dispatching to PReach formodel checking would increase our model checking performance through human insight.143Currently PReach-Resp does not support the checking of multiple response properties atonce, which needs to be implemented before this approach would see performance gains.We expect that checking multiple response properties within a single PReach-Resp run would be a straightforward feature to add, especially if we assume that themultiple response properties take a special form. Suppose the response properties to showare r0, ..., rn−1, where ri ≡ (φi → ♦∨nj=i+1 φj). If each ri holds, then it is trivial toshow that (φ0 → ♦φn) also holds. The extra bookkeeping would require an additionaldlog (n− 1)e bits per state, to tag each pending-state with an index value. This index forstates in MaybeFair is the greatest i < n such that some φi-state lies on a predecessor pathwithin 〈MaybeFair〉. This approach does not exploit the full generality afforded by Mannaand Pnueli’s inference rules, but I hypothesize it will reduce the runtime of PReach-Respif the user is willing to provide the φi predicates.As an example of a decomposed response property, consider German’s simple cachecoherence protocol, appearing in Appendix A. Suppose we aim to verify a response property(subject to fairness) where p is “ cache 1 has requested exclusive access”, and q is “cache 1has been granted exclusive access”. One way to decompose this property is to let φ0 = p,φ2 = q and φ1 be “the request for exclusive access has been received by the directory”. Theresponse property (φ0 → ♦φ1 ∨ φ2) is nontrivial to verify, as the directory cannot acceptthe request for exclusive access from cache 1 until it is finished servicing other requests.Likewise, the response property (φ1 → ♦φ2) is nontrivial as requests for exclusive accessmay involve invalidating the line in other caches. Splitting the original response propertyin this way results in a partitioned pending set, and only some of the fairness assumptionsare relevant to each sub-response property. This ought to lead to fewer state expansionsfor our OWCTY adaptation, which is the chief contributor of runtime.Our work in parameterized DF in symmetric protocols could be extended to showparameterized response properties. Consider a parameterized system that has fairness at-tached to parameterized actions. In the standard CMP-style overapproximate abstraction,144fairness assumptions on concrete actions will soundly persist, but it is unsound to assumefairness on actions that undergo abstraction and have weakened guards. Model checking aresponse property on the abstract system will almost certainly fail because such abstractrules are not obligated to fire. However, I believe that similar permutation/model checkingmethods as established in this thesis may be applied to prove that abstract traces wherea given abstract rule is persistently enabled have no concretization. Thus, the abstractionmay be strengthened by attaching a fairness assumption to this abstract rule that avoidssuch executions. As with the parameterized DF work, this method could be applied afterthe standard CMP method has been used and strengthening steps already applied to theabstraction.This thesis has shown that tractable verification of interesting liveness propertiesin large protocol models is possible. The proposed extensions above build upon this resultand give different ways of widening the applicability and scalability further. While modelchecking of arbitrary LTL liveness properties is difficult, this thesis lends itself to techniquesthat are tailored to specific properties and fairness assumptions. Proceeding with futurework with this idea in mind will lend itself to tractable verification of stronger results.In particular, using inference rules to decompose response properties will allow modesthuman insight to dramatically reduce the time needed for model checking and provingresponsiveness.145Bibliography[1] K. Apt and D. Kozen. Limits for automatic verification of finite-state concurrentsystems. Information Processing Letters, 15:307–309, 1986.[2] Joe Armstrong. Programming Erlang: software for a concurrent world. PragmaticBookshelf, 2007.[3] Krste Asanovic, Ras Bodik, Bryan Christopher Catanzaro, Joseph James Gebis, KurtKeutzer, David A. Patterson, William Lester Plishker, John Shalf, Samuel WebbWilliams, Katherine A. Yelick, Meetings Jim Demmel, William Plishker, John Shalf,Samuel Williams, and Katherine Yelick. The landscape of parallel computing re-search: A view from berkeley. Technical report, TECHNICAL REPORT, UCBERKELEY, 2006.[4] J. Barnat, P. Bauch, L. Brim, and M. Cˇesˇka. Employing Multiple CUDA Devices toAccelerate LTL Model Checking. In 16th International Conference on Parallel andDistributed Systems (ICPADS 2010), pages 259–266. IEEE Computer Society, 2010.[5] J. Barnat, L. Brim, I. Cerna, P. Moravec, P. Rockai, and P. Simecek. DiVinE – a toolfor distributed verification. In Computer Aided Verification, pages 278–281, 2006.[6] J. Barnat, L. Brim, S. Edelkamp, D. Sulewski, and P. Sˇimecˇek. Can Flash MemoryHelp in Model Checking. In Formal Methods for Industrial Critical Systems (FMICS2008), volume 5596 of LNCS, pages 150–165. Springer-Verlag, 2008.146[7] J. Barnat, L. Brim, V. Havel, J. Havl´ıcˇek, J. Kriho, M. Lencˇo, P. Rocˇkai, Vladimı´rSˇtill, and J. Weiser. DiVinE 3.0 – An Explicit-State Model Checker for MultithreadedC & C++ Programs. In Computer Aided Verification (CAV 2013), volume 8044 ofLNCS, pages 863–868. Springer, 2013.[8] J. Barnat, L. Brim, and P. Rocˇkai. A Time-Optimal On-the-Fly Parallel Algorithmfor Model Checking of Weak LTL Properties. In Formal Methods and Software En-gineering (ICFEM 2009), volume 5885 of LNCS, pages 407–425. Springer, 2009.[9] J. Barnat, L. Brim, and P. Rocˇkai. A Time-Optimal On-the-Fly Parallel Algorithmfor Model Checking of Weak LTL Properties. In Formal Methods and Software En-gineering (ICFEM 2009), volume 5885 of LNCS, pages 407–425. Springer, 2009.[10] J. Barnat, L. Brim, and P. Rocˇkai. Parallel Partial Order Reduction with TopologicalSort Proviso. In Software Engineering and Formal Methods (SEFM 2010), pages 222–231. IEEE Computer Society Press, 2010.[11] J. Barnat, L. Brim, P. Simecek, and M. Weber. Revisiting resistance speeds upI/O-efficient LTL model checking. In Tools and Algorithms for the Construction andAnalysis of Systems (TACAS), pages 48–62. Springer, 2008.[12] J. Barnat, L. Brim, M. Cˇesˇka, and T. Lamr. CUDA accelerated LTL model checking.In ICPADS. IEEE, 2009.[13] J. Barnat, L. Brim, M. Cˇesˇka, and P. Rocˇkai. DiVinE: Parallel Distributed ModelChecker (Tool paper). In Parallel and Distributed Methods in Verification and HighPerformance Computational Systems Biology (HiBi/PDMC 2010), pages 4–7. IEEE,2010.[14] J. Barnat, J. Chaloupka, and J. Van De Pol. Distributed Algorithms for SCC De-composition. Journal of Logic and Computation, 21(1):23–44, 2011.147[15] Jiri Barnat. Distributed Memory LTL Model Checking. PhD thesis, Masaryk Uni-versity Brno, Faculty of Informatics, 2004.[16] Jiˇr´ı Barnat, Jan Havl´ıcˇek, and Petr Rocˇkai. Distributed LTL Model Checking withHash Compaction. Electr. Notes Theor. Comput. Sci., 296:79–93, 2013.[17] K. Baukus, S. Bensalem, Y. Lakhnech, and K. Stahl. Abstracting WS1S systems toverify parameterized networks. In International Conference on Tools and Algorithmsfor Construction and Analysis of Systems (TACAS), pages 188–203, 2000.[18] K. Baukus, Y. Lakhnech, and K. Stahl. Parameterized verification of a cache co-herence protocol: Safety and liveness. In Verification, Model Checking, and AbstractInterpretation (VMCAI), pages 317–330, 2002.[19] G. Behrmann. A performance study of distributed timed automata reachabilityanalysis. Electr. Notes Theor. Comput. Sci., 68(4), 2002.[20] Alexander Bell, Er Bell, and Boudewijn R. Haverkort. Sequential and distributedmodel checking of petri net specifications. STTT, 7:43–60, 2002.[21] B. Bingham. Murphi sources for this chapter.http://www.cs.ubc.ca/~binghamb/cav2011.html. Accessed: July 13, 2013.[22] B. Bingham, J. Bingham, F. M. de Paula, J. Erickson, G. Singh, and M. Reitblatt.Industrial strength distributed explicit state model checking. In Parallel and Dis-tributed Model Checking, 2010.[23] B. Bingham, M. Greenstreet, and J. Bingham. Parameterized verification of deadlockfreedom in symmetric cache coherence protocols. In Formal Methods in Computer-Aided Design (FMCAD), 2011, pages 186–195, Oct 2011.[24] Brad Bingham. Preach response. https://bitbucket.org/binghamb/preach-response. Accessed: October 21, 2014.148[25] Brad Bingham, Jesse Bingham, and John Erickson. Preach-df online. https://bitbucket.org/binghamb/preach-brads-fork. Accessed: July 13, 2013.[26] Brad Bingham, Jesse Bingham, John Erickson, and Mark Greenstreet. Distributedexplicit state model checking of deadlock freedom. In Natasha Sharygina and Hel-mut Veith, editors, Computer Aided Verification, volume 8044 of Lecture Notes inComputer Science, pages 235–241. Springer Berlin Heidelberg, 2013.[27] Brad Bingham and Mark Greenstreet. Response property checking via distributedstate space exploration. Formal Methods in Computer Aided Design (FMCAD), 2014.[28] J. Bingham. Automatic non-interference lemmas for parameterized model checking.In Formal Methods in Computer Aided Design (FMCAD), 2008.[29] J. Bingham, J. Erickson, G. Singh, and F. Andersen. Industrial strength refinementchecking. In Formal Methods in Computer-Aided Design, 2009. FMCAD 2009, pages180 –183, nov. 2009.[30] Jesse Bingham, John Erickson, Brad Bingham, and Flavio M. de Paula. Open-sourcePReach. http://bitbucket.org/jderick/preach. Accessed: July 14, 2011.[31] Stefan Blom, Bert Lisser, Jaco Van De Pol, and Michael Weber. A database approachto distributed state-space generation. Journal of Logic and Computation, 21(1):45–62, 2011.[32] Stefan Blom, Jaco van de Pol, and Michael Weber. LTSmin: Distributed and sym-bolic reachability. In CAV’10, pages 354–359, 2010.[33] AaronR. Bradley. Sat-based model checking without unrolling. In Ranjit Jhala andDavid Schmidt, editors, Verification, Model Checking, and Abstract Interpretation,volume 6538 of Lecture Notes in Computer Science, pages 70–87. Springer BerlinHeidelberg, 2011.149[34] A.R. Bradley, F. Somenzi, Z. Hassan, and Yan Zhang. An incremental approach tomodel checking progress properties. In Formal Methods in Computer-Aided Design(FMCAD), 2011, pages 144–153, 2011.[35] Randal E. Bryant. Symbolic boolean manipulation with ordered binary-decisiondiagrams. ACM Comput. Surv., 24(3):293–318, September 1992.[36] J. R. Burch, E. M. Clarke, K. L. Mcmillan, D. L. Dill, and L. J. Hwang. Symbolicmodel checking: 1020 states and beyond, 1990.[37] Jerry Burch and David Dill. Automatic verification of pipelined microprocessor con-trol. pages 68–80. Springer-Verlag, 1994.[38] I. Cˇerna´ and R. Pela´nek. Distributed explicit fair cycle detection. In Proc. SPINworkshop, volume 2648 of LNCS, pages 49–74. Springer, 2003.[39] Xiaofang Chen, Yu Yang, M. Delisi, G. Gopalakrishnan, and Ching-Tsun Chou.Hierarchical cache coherence protocol verification one level at a time through assumeguarantee. In High Level Design Validation and Test Workshop, 2007. HLVDT 2007.IEEE International, pages 107–114, 2007.[40] C.-T. Chou, P. K. Mannava, and S. Park. A simple method for parameterized veri-fication of cache coherence protocols. In FMCAD, pages 382–398, 2004.[41] A. Cimatti, E.M. Clarke, F. Giunchiglia, and M. Roveri. NuSMV: a new symbolicmodel verifier. In N. Halbwachs and D. Peled, editors, Proceedings Eleventh Con-ference on Computer-Aided Verification (CAV’99), number 1633 in Lecture Notes inComputer Science, pages 495–499, Trento, Italy, July 1999. Springer.[42] E. M. Clarke, O. Grumberg, and M. C. Browne. Reasoning about networks with manyidentical finite-state processes. In Proceedings of the fifth annual ACM symposium150on Principles of distributed computing, PODC ’86, pages 240–248, New York, NY,USA, 1986. ACM.[43] E. M. Clarke, O. Grumberg, and D. E. Long. Model checking and abstraction. ACMTrans. Program. Lang. Syst., 16(5):1512–1542, 1994.[44] E. M. Clarke, O. Grumberg, and D. Peled. Model Checking. MIT Press, 1999.[45] Edmund Clarke, Orna Grumberg, Somesh Jha, Yuan Lu, and Helmut Veith.Counterexample-guided abstraction refinement. In Computer aided verification,pages 154–169. Springer, 2000.[46] R. Cleaveland, S. P. Iyer, and D. Yankelevich. Abstractions for preserving all CTL*formulae. Technical report, 1994. Tech. Rep. 9403, Dept. of Comp. Sc., NorthCarolina State University, Raleigh, NC.[47] Thomas H. Cormen, Clifford Stein, Ronald L. Rivest, and Charles E. Leiserson.Introduction to Algorithms. McGraw-Hill Higher Education, 2nd edition, 2001.[48] D. Dams, R. Gerth, and O. Grumberg. Abstract interpretation of reactive systems.ACM Trans. Program. Lang. Syst., 19(2):253–291, 1997.[49] N. Dave, M. C. Ng, and Arvind. Automatic synthesis of cache-coherence proto-col processors using bluespec. In Proceedings of the 2Nd ACM/IEEE InternationalConference on Formal Methods and Models for Co-Design, MEMOCODE ’05, pages25–34, Washington, DC, USA, 2005. IEEE Computer Society.[50] Edsger W. Dijkstra. Guarded commands, nondeterminacy and formal derivation ofprograms. Commun. ACM, 18(8):453–457, August 1975.[51] D. L. Dill, A. J. Drexler, A. J. Hu, and C. H. Yang. Protocol verification as ahardware design aid. In IEEE International Conference on Computer Design: VLSIin Computers and Processors, pages 522–525, 1992.151[52] David L. Dill. The murphi verification system. In International Conference onComputer Aided Verification, pages 390–393, London, UK, 1996.[53] Peter C Dillinger and Panagiotis Manolios. Bloom filters in probabilistic verification.In Formal Methods in Computer-Aided Design, pages 367–381. Springer, 2004.[54] John Erickson. Personal communication, 2010.[55] S. Evangelista, A. W. Laarman, L. Petrucci, and J. C. van de Pol. Improved multi-core nested depth-first search. In S. Ramesh, editor, Proceedings of the 10th Inter-national Symposium on Automated Technology for Verification and Analysis, ATVA2012, Thiruvananthapuram (Trivandrum), Kerala, Lecture Notes in Computer Sci-ence, pages 269–283, London, October 2012. Springer Verlag.[56] Y. Fang, N. Piterman, A. Pnueli, and L. Zuck. Liveness with invisible ranking. Int.J. Software Tools for Technology Transfer, 8(3):261–279, June 2006.[57] Kathi Fisler, Ranan Fraer, Gila Kamhi, Moshe Y. Vardi, and Zijiang Yang. Is therea best symbolic cycle-detection algorithm? In Proceedings of the 7th InternationalConference on Tools and Algorithms for the Construction and Analysis of Systems,TACAS 2001, pages 420–434, London, UK, 2001. Springer-Verlag.[58] Steven German. Personal communication, 2000.[59] Rob Gerth, Doron Peled, Moshe Y. Vardi, R. Gerth, Den Dolech Eindhoven, D. Peled,M. Y. Vardi, and Pierre Wolper. Simple on-the-fly automatic verification of lineartemporal logic. In In Protocol Specification Testing and Verification, pages 3–18.Chapman & Hall, 1995.[60] Orna Grumberg, Tamir Heyman, Nili Ifergan, and Assaf Schuster. Achievingspeedups in distributed symbolic reachability analysis through asynchronous com-putation. In In CHARME, pages 129–145. Springer, 2005.152[61] Orna Grumberg, Tamir Heyman, and Assaf Schuster. A work-efficient distributed al-gorithm for reachability analysis. In Jr. Hunt, WarrenA. and Fabio Somenzi, editors,Computer Aided Verification, volume 2725 of Lecture Notes in Computer Science,pages 54–66. Springer Berlin Heidelberg, 2003.[62] Z. Hassan, A. R. Bradley, and F. Somenzi. Incremental, inductive CTL model check-ing. In Proc. of Int’l. Conf. on Computer Aided Verification, pages 532–547, 2012.[63] Klaus Havelund and Thomas Pressburger. Model checking JAVA programs usingJAVA PathFinder. In International Journal on Software Tools for Technology Trans-fer (STTT), volume 2(4), pages 366–381. Springer-Verlag, March 2000.[64] R. C. Holt. Some deadlock properties of computer systems. ACM Computing Surveys,4(3):179–196, 1972.[65] G. J. Holzmann. The model checker SPIN. IEEE Trans. Softw. Eng., 23(5):279–295,1997.[66] Gerard J. Holzmann. Parallelizing the spin model checker. In Proceedings of the19th international conference on Model Checking Software, SPIN’12, pages 155–171,Berlin, Heidelberg, 2012. Springer-Verlag.[67] A. Hu. Techniques for Efficient Formal Verification Using Binary Decision Diagrams.PhD thesis, Stanford University, 1995.[68] Michael Huth and Mark Ryan. Logic in Computer Science: Modelling and ReasoningAbout Systems. Cambridge University Press, New York, NY, USA, 2004.[69] C. N. Ip and D. L. Dill. Better verification through symmetry. In 11th IFIP WG10.2International Conference on Computer Hardware Description Languages and theirApplications, pages 97–111, 1993.153[70] C. N. Ip and D. L. Dill. Better verification through symmetry. Formal Methods inSystem Design, 9(1/2):41–75, 1996.[71] Valerie L. Ishida. Fault tolerance for distributed explicit state model checking. Mas-ter’s thesis, University of British Columbia, Department of Computer Science, 2014.[72] Hiroaki Iwashita, Tsuneo Nakata, and Fumiyasu Hirose. CTL model checking basedon forward state traversal. In Proc. Int’l. Conf. Computer-Aided Design, pages 82–87,1996.[73] Yonit Kesten, Amir Pnueli, and Li on Raviv. Algorithmic verification of linear tem-poral logic specifications. In Proc. 25th Int. Colloq. Aut. Lang. Prog., volume 1443of Lect. Notes in Com p. Sci, pages 1–16. Springer-Verlag, 1998.[74] Yonit Kesten, Amir Pnueli, Li-On Raviv, and Elad Shahar. Model checking withstrong fairness. Form. Methods Syst. Des., 28:57–84, January 2006.[75] W. J. Knottenbelt, P. G. Harrison, M. A. Mestern, and P. S. Kritzinger. A proba-bilistic dynamic technique for the distributed generation of very large state spaces.Perform. Eval., 39(1-4):127–148, 2000.[76] S. Krstic´. Parameterized system verification with guard strengthening and parameterabstraction. In Automated Verification of Infinite-State Systems, 2005.[77] R. Kumar and E. G. Mercer. Load balancing parallel explicit state model checking.In Parallel and Distributed Model Checking, 2004.[78] A. W. Laarman and D. Farago´. Improved on-the-fly livelock detection: Combiningpartial order reduction and parallelism for dfsFIFO. In G. P. Brat, N. Rungta, andA. J. Venet , editors, Proceedings of the Fifth NASA Formal Methods Symposium,NFM 2013, Moffett Field, CA, USA, Lecture Notes in Computer Science, London,May 2013. Springer Verlag.154[79] A. W. Laarman, R. Langerak, J. C. van de Pol, M. Weber, and A. Wijs. Multi-core nested depth-first search. In T. Bultan and P-A. Hsiung, editors, Proceedingsof the 9th International Symposium on Automated Technology for Verification andAnalysis, ATVA 2011, Tapei, Taiwan, volume 6996 of Lecture Notes in ComputerScience, pages 321–335, London, July 2011. Springer Verlag.[80] L. Lamport. Proving the correctness of multiprocess programs. IEEE Trans. Softw.Eng., 3:125–143, March 1977.[81] K. G. Larsen and B. Thomsen. A modal process logic. In Third Annual Symposiumon Logic in Computer Science (LICS), pages 203–210, 1988.[82] F. Lerda and R. Sisto. Distributed-memory model checking with SPIN. In Proc. ofSPIN 1999, volume 1680 of LNCS., pages 22–39. Springer-Verlag, 1999.[83] Y. Li. Mechanized proofs for the parameter abstraction and guard strengtheningprinciple in parameterized verification of cache coherence protocols. In Proceedingsof the 2007 ACM symposium on Applied computing, pages 1534–1535, 2007.[84] Orna Lichtenstein, Amir Pnueli, and Lenore Zuck. The glory of the past. In RohitParikh, editor, Logics of Programs, volume 193 of Lecture Notes in Computer Science,pages 196–218. Springer Berlin Heidelberg, 1985.[85] LTSmin. Ltsmin related papers. fmt.cs.utwente.nl/tools/ltsmin/#sec2. Ac-cessed: December 1, 2015.[86] Y. Lv, H. Lin, and H. Pan. Computing invariants for parameter abstraction. InMEMOCODE ’07: Proceedings of the 5th IEEE/ACM International Conference onFormal Methods and Models for Codesign, pages 29–38, 2007.[87] N. Lynch and F. Vaandrager. Forward and backward simulations – part I: Untimedsystems. Information and Computation, 121(2):214–233, 1995.155[88] Zohar Manna and Amir Pnueli. Completing the temporal picture. Theor. Comput.Sci., 83:97–130, June 1991.[89] Zohar Manna and Amir Pnueli. Temporal verification of reactive systems: Progress(draft). http://theory.stanford.edu/~zm/tvors3.html, 1996.[90] K. L. McMillan. Symbolic model checking. In PhD thesis, Carnegie Mellon Univer-sity, Pittsburgh, May 1992.[91] K. L. McMillan. Circular compositional reasoning about liveness. In Correct Hard-ware Design and Verification Methods (CHARME), pages 342–345, 1999. An ex-tended version appeared as a Cadence technical report.[92] K. L. McMillan. Verification of infinite state systems by compositional model check-ing. In in CHARME, pages 219–233. Springer, 1999.[93] K. L. McMillan. Parameterized verification of the FLASH cache coherence proto-col by compositional model checking. In Correct Hardware Design and VerificationMethods (CHARME), pages 179–195, 2001.[94] Ken McMillan. Personal communication, 2011.[95] K.L. McMillan. Interpolation and sat-based model checking. In Jr. Hunt, WarrenA.and Fabio Somenzi, editors, Computer Aided Verification, volume 2725 of LectureNotes in Computer Science, pages 1–13. Springer Berlin Heidelberg, 2003.[96] I. Melatti, R. Palmer, G. Sawaya, Y. Yang, R. M. Kirby, and G. Gopalakrishnan.Parallel and distributed model checking in eddy. Int. J. Softw. Tools Technol. Transf.,11(1):13–25, 2009.[97] S. Naffziger, J. Warnock, and H. Knapp. Se2 when processors hit the power wall (or”when the cpu hits the fan”). In Solid-State Circuits Conference, 2005. Digest ofTechnical Papers. ISSCC. 2005 IEEE International, pages 16–17, Feb 2005.156[98] University of Utah School of Computing. Murphi model checker web page. http://www.cs.utah.edu/formal_verification/Murphi/. Accessed: December 1, 2014.[99] J. O’Leary, M. Talupur, and M. R. Tuttle. Parameterized verification using messageflows: An industrial experience. In International Conference on Formal Methods inComputer Aided Design (FMCAD), 2009.[100] R. Palmer and G. Gopalakrishnan. Partial order reduction assisted parallel modelchecking. In Proc. Parallel and Distributed Model Checking (PDMC) Workshop, 2002.[101] Seungjoon Park and David L. Dill. Protocol verification by aggregation of distributedtransactions. In Proc. 8th Int’l. Conf. on Computer Aided Verification, pages 300–310, 1996.[102] A. Pnueli, J. Xu, and L. D. Zuck. Liveness with (0, 1, infinity)-counter abstraction.In Proceedings of the 14th International Conference on Computer Aided Verification(CAV), pages 107–122, 2002.[103] U. Stern and D. L. Dill. Improved probabilistic verification by hash compaction. InCorrect Hardware Design and Verification Methods, IFIP WG 10.5 Advanced Re-search Working Conference, CHARME ’95, pages 206–224, 1995.[104] U. Stern and D. L. Dill. Parallelizing the murphi verifier. In International Conferenceon Computer Aided Verification, pages 256–278, 1997.[105] U. Stern and D. L. Dill. Using magnetic disk instead of main memory in the murphiverifier. In 10th International Conference on Computer Aided Verification (CAV),1998.[106] U. Stern and D. L. Dill. Parallelizing the murphi verifier. Formal Methods in SystemDesign, 18(2):117–129, 2001.157[107] Murali Talupur and Mark R. Tuttle. Going with the flow: parameterized verificationusing message flows. In Proceedings of the 2008 International Conference on FormalMethods in Computer-Aided Design, FMCAD ’08, pages 10:1–10:8, Piscataway, NJ,USA, 2008. IEEE Press.[108] Robert Endre Tarjan. Depth-first search and linear graph algorithms. Siam Journalon Computing, 1:146–160, 1972.[109] S. Tasiran, Yuan Yu, and B. Batson. Linking simulation with formal verification ata higher level. Design Test of Computers, IEEE, 21(6):472–482, Nov 2004.[110] Moshe Y. Vardi and Pierre Wolper. An automata-theoretic approach to automaticprogram verification. In Proceedings of the 1st Annual Symposium on Logic in Com-puter Science (LICS’86), pages 332–344. IEEE Comp. Soc. Press, June 1986.[111] K. Verstoep, H. Bal, J. Barnat, and L. Brim. Efficient large-scale model checking. InInternational Parallel and Distributed Processing Symposium, 2009.[112] WestGrid. Western canada research grid. http://www.westgrid.ca. Accessed:January 28, 2015.[113] Pierre Wolper and Denis Leroy. Reliable hashing without collision detection. In INCOMPUTER AIDED VERIFICATION. 5TH INTERNATIONAL CONFERENCE,pages 59–70. Springer-Verlag, 1993.[114] Y. Yu, P. Manolios, and L. Lamport. Model checking TLA+ specifications. In CorrectHardware Design and Verification Methods (CHARME), pages 54–66, 1999.158Appendix AGerman’s Protocol---- German’s Protocol---- A caching protocol proposed by Steven German of IBM as a challenge for-- parameterized verification.--const ---- Configuration parameters ----NODE_NUM : 2;DATA_NUM : 2;type ---- Type declarations ----NODE : scalarset(NODE_NUM);DATA : scalarset(DATA_NUM);CACHE_STATE : enum I, S, E;CACHE : record State : CACHE_STATE; Data : DATA; end;159MSG_CMD : enum Empty, ReqS, ReqE, Inv, InvAck, GntS, GntE;MSG : record Cmd : MSG_CMD; Data : DATA; end;var ---- State variables ----Cache : array [NODE] of CACHE; -- CachesChan1 : array [NODE] of MSG; -- Channels for Req*Chan2 : array [NODE] of MSG; -- Channels for Gnt* and InvChan3 : array [NODE] of MSG; -- Channels for InvAckInvSet : array [NODE] of boolean; -- Set of nodes to be invalidatedShrSet : array [NODE] of boolean; -- Set of nodes having S or E copiesExGntd : boolean; -- E copy has been grantedCurCmd : MSG_CMD; -- Current request commandCurPtr : NODE; -- Current request nodeMemData : DATA; -- Memory dataAuxData : DATA; -- Auxiliary variable for latest data---- Initial states ----ruleset d : DATA do startstate "Init"for i : NODE doChan1[i].Cmd := Empty; Chan2[i].Cmd := Empty; Chan3[i].Cmd := Empty;Cache[i].State := I; InvSet[i] := false; ShrSet[i] := false;end;ExGntd := false; CurCmd := Empty; MemData := d; AuxData := d;end end;---- State transitions ----ruleset i : NODE; d : DATA do rule "Store"160Cache[i].State = E==>Cache[i].Data := d; AuxData := d;end end;ruleset i : NODE do rule "SendReqS"Chan1[i].Cmd = Empty & Cache[i].State = I==>Chan1[i].Cmd := ReqS;end end;ruleset i : NODE do rule "SendReqE"Chan1[i].Cmd = Empty & (Cache[i].State = I | Cache[i].State = S)==>Chan1[i].Cmd := ReqE;end end;ruleset i : NODE do rule "RecvReqS"CurCmd = Empty & Chan1[i].Cmd = ReqS==>CurCmd := ReqS; CurPtr := i; Chan1[i].Cmd := Empty;for j : NODE do InvSet[j] := ShrSet[j] end;end end;ruleset i : NODE do rule "RecvReqE"CurCmd = Empty & Chan1[i].Cmd = ReqE==>CurCmd := ReqE; CurPtr := i; Chan1[i].Cmd := Empty;for j : NODE do InvSet[j] := ShrSet[j] end;end end;161ruleset i : NODE do rule "SendInv"Chan2[i].Cmd = Empty & InvSet[i] = true &( CurCmd = ReqE | CurCmd = ReqS & ExGntd = true )==>Chan2[i].Cmd := Inv; InvSet[i] := false;end end;ruleset i : NODE do rule "SendInvAck"Chan2[i].Cmd = Inv & Chan3[i].Cmd = Empty==>Chan2[i].Cmd := Empty; Chan3[i].Cmd := InvAck;if (Cache[i].State = E) then Chan3[i].Data := Cache[i].Data end;Cache[i].State := I; undefine Cache[i].Data;end end;ruleset i : NODE do rule "RecvInvAck"Chan3[i].Cmd = InvAck & CurCmd != Empty==>Chan3[i].Cmd := Empty; ShrSet[i] := false;if (ExGntd = true)then ExGntd := false; MemData := Chan3[i].Data; undefine Chan3[i].Data end;end end;ruleset i : NODE do rule "SendGntS"CurCmd = ReqS & CurPtr = i & Chan2[i].Cmd = Empty & ExGntd = false==>Chan2[i].Cmd := GntS; Chan2[i].Data := MemData; ShrSet[i] := true;CurCmd := Empty; undefine CurPtr;end end;ruleset i : NODE do rule "SendGntE"162CurCmd = ReqE & CurPtr = i & Chan2[i].Cmd = Empty & ExGntd = false &forall j : NODE do ShrSet[j] = false end==>Chan2[i].Cmd := GntE; Chan2[i].Data := MemData; ShrSet[i] := true;ExGntd := true; CurCmd := Empty; undefine CurPtr;end end;ruleset i : NODE do rule "RecvGntS"Chan2[i].Cmd = GntS==>Cache[i].State := S; Cache[i].Data := Chan2[i].Data;Chan2[i].Cmd := Empty; undefine Chan2[i].Data;end end;ruleset i : NODE do rule "RecvGntE"Chan2[i].Cmd = GntE==>Cache[i].State := E; Cache[i].Data := Chan2[i].Data;Chan2[i].Cmd := Empty; undefine Chan2[i].Data;end end;163Appendix BHIR ProofsRefer to equation (5.5) for the definition of pij↔h and let T be shorthand for T (n).Heuristic 1. For ruleset r̂j tagged with AEG and with abstracted ruleset parameter i,suppose that r̂0 is UA. If A`(1) |= AG (relax (ρ̂0, i)|i=1 → EF (ρ̂0|i=1)) then tag AEG isdischarged for showing r̂j to be UA.Proof. Without loss of generality, consider only states of ψ−1(ρ̂j) where the ruleset pa-rameters different from i have values in {2, ..., k}; denote the set of such states that areO-reachable with A. We claim that for any w ∈ ψ−1(A), there exists some h ∈ Pn \Pk andstate w′ where pih↔1(w′) ∈ ψ−1(relax (ρ̂0, i)|i=1) and either1. w = w′, or2. w T w′ and w′ only differs from w in the local state of node 1.Assume first that ρ̂0 contains no forall condition, i.e., r̂0 has no guard tags. Comparingψ−1(relax (ρ̂0, i)|i=1) with ρ, comparisons depending on i syntactically replace i with 1, andcomparisons depending on array variables indexed by i, a[i], are removed. If ρ contains theP-comparison i = eP (at most one such condition exists by definition of admissible guard),then ρ̂j contains the condition Other = eP if eP is not abstracted away, and thus there is164some h ∈ Pn \ Pk such that h = eP in w. If ρ does not contain i = eP, then choose anyh ∈ Pn \ Pk. In either case, pih↔1(w) ∈ ψ−1(relax (ρ̂0, i)|i=1), i.e. w = w′.On the other hand, suppose r̂ is tagged with AUG. Since r̂0 is UA, it must have beenestablished UA through Heuristic 3, i.e., A`(1) |= AG (Γ(ρ̂0) → EFC[1]). Thus, althoughpih↔1(w) may not satisfy C[1], there exists a path to state pih↔1(w′) ∈ C[1] that onlydiffers from pih↔1(w) in the local state of node 1. Now, pih↔1(w′) ∈ ψ−1(relax (ρ̂0, i)|i=1)by a similar argument as above, and the claim is established.The mixed abstraction property implies that pih↔1(w′) T pih↔1(w′′), where pih↔1(w′)and pih↔1(w′′) only differ in the local state of node 1, and pih↔1(w′′) ∈ ψ−1(ρ̂0|i=1). Casesplit on whether or not r̂0 is tagged with AUG.Case no AUG: This implies that pih↔1(w′′) ∈ ρ. Fire concrete rule r to transition frompih↔1(w′′) to pih↔1(w′′′). Notice that the local state of h does not change between pih↔1(w)and pih↔1(w′′′) if r̂0 has no command tags. Otherwise, if it is tagged with AUC, then someupdate is performed on the local state of h. Thus, in the path from w to w′′′ the local stateof 1 is updated identically, and this update is performed by ̂aj . This implies state w′′′ isin the preimage of the post-state of r̂j , i.e., {ψ(w′′′)} = ̂a ◦ {ψ(w)} (the state obtained byapplying update ̂a to ψ(w)).Case AUG: Similar to the previous case, with the difference that the local state of nodesPn\(Pk∪{h}) must satisfy C before r fires. Again, since r̂0 is UA, it must have been estab-lished UA through Heuristic 3, i.e., A`(1) |= AG (Γ(ρ̂0)→ EFC[1]). Since pih↔1(w′′) ∈ Γ(ρ̂0),path symmetry implies the path pih↔1(w′′) T pih↔1(w′′′) where pih↔1(w′′′) ∈ ρ. Fire con-crete rule r to transition from pih↔1(w′′′) to pih↔1(w(4)). Using a similar case split on r̂0no command tags or AUC as above, state w(4) is in the preimage of the post-state of r̂j .In either case, by path symmetry and noting that any changes to the local state of node hbetween w and w′ are not observable in the abstract system (ψ(w) = ψ(w′)), the proof iscompleted. Heuristic 2. For ruleset r̂j tagged with AEG and/or AEC with abstracted ruleset parameter165i, suppose that r̂0 is UA. If A`(1) |= AG ((relax (ρ̂0, i)|i=1 ∧ L[1]) → EFr̂0 (L[1])) then AEGand AEC are discharged for showing r̂j to be L-preserving.Proof. To show the claim, suffices to show that for each reachable (s, s′) ∈ r̂j ,∀w ∈ ψ−1(s). ∃w′ ∈ ψ−1(s′). ∀j ∈ Pnk+1. w T w′ ∧ L[j]⇒ w′ ∈ L[j] (B.1)Let h ∈ Pnk+1 be the concrete ruleset parameter in these w. We first show the weakerproperty that∀w ∈ ψ−1(s). ∃w′ ∈ ψ−1(s′). w T w′ ∧ L[h]⇒ w′ ∈ L[h] (B.2)noting that h here depends on w. Let Aj = relax (ρ̂0, i)|i=j ∧ L[j] ∧ Reach. By the mixedabstraction property,∀w ∈ ψ−1(A1). ∃w′ ∈ ψ−1(L[1]). w T w′where some transition from r with ruleset parameter value 1 is taken along the w T w′path. Applying permutation pi1↔ h to these path for each h ∈ Pnk+1 gives∀w ∈ ψ−1(Ah). ∃w′ ∈ ψ−1(L[h]). w T w′where some transition from r with ruleset parameter value h is taken along the w T w′path. This implies the property (B.2). To see that the strongly property (B.1) also holds,observe that if the ruleset is tagged with AUG or AUC then Heuristic 4 will imply thelocality property for nodes indexed with Pnk+1 \{h}. On the other hand, if the rulset is nottagged in this way then the local state of such nodes is unchanged because only local rulesfired on the w T w′ paths.Heuristic 3. For ruleset r̂j tagged with AUG and not AEG, let ∀i ∈ Pn. C[i] be the forallcondition of ρ. If A`(1) |= AG (Γ(ρ̂j)→ EF (C[1])), then tag AUG is discharged for showingr̂j to be UA.166Proof. Let A = Γ(ρ̂j) ∧ Reach. By the mixed abstraction property,∀w ∈ A. ∃w′ ∈ C[1]. w T w′ (B.3)where w and w′ only differ in the local state of node 1. For k + 1 ≤ h ≤ n, let Ih denotethe property∀w ∈ A. ∃w′ ∈∧i∈Phk+1C[i]. w T w′where w and w′ only differ in the local state of i ∈ Phk+1. Applying permutation pi1↔k+1to (B.3) gives property Ik+1. This is the base case for induction on h, assuming Ih holds.Apply permutation pi1↔h+1 to (B.3) and use transitivity to establish property Ih+1. Thus,In holds which implies the UA property.Heuristic 4. For ruleset r̂j tagged with AUG and/or AUC, let r̂2m−1 ∈ ψ(r) be the abstractruleset where all ruleset parameters of r are abstracted. If A`(1) |= AG ((Γ(ρ̂j) ∧ L[1]) →EFr̂2m−1 (L[1])), then tags AUG and AUC are discharged for showing r̂j to be L-preserving.Proof. Let Ah = Γ(ρ̂j) ∧ L[h] ∧ Reach. By the mixed abstraction property,∀w ∈ A1. ∃w′ ∈ L[1]. w T w′ (B.4)where each path w T w′ includes a transition from ruleset r where 1 is nota rulesetparameter value. For k + 1 ≤ h ≤ n, let Ih denote the property∀w ∈ Ah. ∃w′ ∈ L[h]. w T w′where each path w T w′ includes a transition from ruleset r where i is not a rulsetparameter value. Applying permutation pi1↔h to (B.4) gives property Ih. The set ofproperties Ih imply the claim, because each path is independent with respect to localvariables of h, with the exception of the transition t from ruleset r. For a given h, transitiont is agnostic to the local variables of other nodes, thus a t exists that satisfies all paths foreach property Ih.167Appendix CHeuristic ExamplesC.1 Discharging AEGWhen the only guard tag is AEG, at least one ruleset parameter has been abstracted.Our presentation makes the simplifying assumption that there is at most one such rulesetparameter when AEG is discharged, but the same approach generalizes to the case of morethan one. If an automatically generated DF property on the mixed abstraction holds, thistag is discharged, and the ruleset is proven UA, and thus may be added to the set of Utransitions. This property is based on a permutation of the guard predicate of the rulesetin question, and also the guard of the related ruleset where none of the ruleset parametersare abstracted.Consider the following concrete ruleset from the German protocol:ruleset i : NODE do rule "SendGntS"CurCmd = ReqS & CurPtr = i & Chan2[i].Cmd = Empty & ExGntd = false==>Chan2[i].Cmd := GntS; Chan2[i].Data := MemData; ShrSet[i] := true;CurCmd := Empty; undefine CurPtr;end end;This ruleset abstracts to two rulesets in the mixed abstraction:168ruleset i : NODE do rule "ABS SendGntS1"CurCmd = ReqS & CurPtr = i & Chan2[i].Cmd = Empty & ExGntd = false==>Chan2[i].Cmd := GntS; Chan2[i].Data := MemData; ShrSet[i] := true;CurCmd := Empty; undefine CurPtr;end end;rule "ABS SendGntS2"CurCmd = ReqS & CurPtr = Other & ExGntd = false==>CurCmd := Empty; undefine CurPtr;end;In ABS SendGntS1, the ruleset parameter i is non-abstracted, and the concrete ruleSendGntS contains no forall conditions in the guard. Thus, ABS SendGntS1 has no guardtags. Ruleset ABS SendGntS2, on the other hand, has i abstracted and is therefore taggedwith AEG. The generated DF property considers all reachable states satisfying the guard ofABS SendGntS2, but with CurPtr permuted to be some non-abstracted node. From thesestates, we seek a path composed of only local transitions to the value of CurPtr1 to a statesatisfying the guard of ABS SendGntS1. This is expressed in Murϕ as-- using only rules local to 1liveness "H1"-- the "start predicate"CurCmd = ReqS & (isdefined(CurPtr) & CurPtr != Other) & ExGntd = falseCANGETTO-- the "end predicate"CurCmd = ReqS & (exists i : NODE do CurPtr = i & Chan2[i].Cmd = Empty end)& ExGntd = falseend;1Due to symmetry we can assume this value is 1 without loss of generality.169It turns out that every reachable state satisfying the start predicate either1. also satisfies the end predicate, or2. requires local rule RecvGntS to fire, which clears a “grant shared access” messagefrom channel 2, and the resulting state satisfies the end predicate.This proves ABS SendGntS2 is UA. To understand why, consider the proof schema in Fig-ure C.1. In this example, r̂2 is the rule ABS SendGntS2, and ρ̂2 is its guard predicate.Permutation pi is specific to the given state w; it exchanges the value of CurPtr in w with1. Ruleset r refers to the concrete ruleset SendGntS, where r|CurPtr=1 is the rule when theruleset parameter i is 1, and r|CurPtr>1 is the set of rules when i is some other value.It is unknown if Chan2[CurPtr].Cmd = Empty for an arbitrary, reachable w ∈ψ−1(s) (i.e., a reachable concretization of s) because CurPtr = Other in s. Since CurPtr >1 in w, there exists some permutation pi such that CurPtr = 1 in pi(w). The output printedby PReach when DF property H1 is model checked indicates that in pi(w) either r|CurPtr=1is enabled or the concrete rule RecvGntS is enabled, and when fired reaches a state wherer|CurPtr=1 is enabled. In either case, there is a path from pi(w) to some state pi(w′) thatenables r|CurPtr=1, and when this rule fires reaches state pi(w′′); by symmetry, there is ananalogous path from w to some state w′′ resulting from some rule of r|CurPtr>1 firing. Sincew was chosen arbitrarily, it follows that ABS SendGntS2 is UA.C.2 Discharging AUGReferring to the rulesets of the example in Section 5.3.2, suppose we wish to show ABS SendGntE1is UA. Notice that although its guard is syntactically identical to that of the concrete rule-set SendGntE, it is not assumed underapproximate because the forall condition weakensthe guard. Thus, this rule is tagged with AUG. To discharge the tag, we show that in theconcrete system each abstracted node j can take a local path to some C[j]-state, whereC[j] is !ShrSet[j]. That is, we check170SApi−1ψψ−1(s)s′ ∈ post(r̂2)path local to 1 pi(w′) r|CurPtr=1 firespi(w′′)pi(w)s ∈ ρ̂2w w′′S(n)ψ−1post(r|CurPtr>1)pir̂2 firesFigure C.1: Schema for proving ABS SendGntS2 is UA, which follows a similar flow forhandling any rule tagged with AEG. For arbitrary reachable w ∈ ψ−1(s), the path frompi(w) to pi(w′′), by symmetry, implies a path exists from w to some w′′ ∈ ψ−1(s′). Thissatisfies the definition for the transition (s, s′) to be UA. The path from pi(w) to pi(w′) isimplied by model checking the DF property H1 in the mixed abstraction.liveness "H2" -- using only rules local to 1CurCmd = ReqE & isdefined(CurPtr) & !ExGntd -- the "start predicate"CANGETTO(forall j : NODE do !ShrSet[j] end) -- the "end predicate" (j = 1)end;This DF property holds in the mixed abstraction. For every reachable state ssatisfying the start predicate, one of the following cases applies.1711. State s satisfies the end predicate.2. A sequence of 4 local rules fire, starting from s, to reach some state satisfying theend predicate. This sequence begins at s where a pending message in channel 2 ofcache 1 that grants shared access; the first rule consumes this message. The secondrule sends invalidate message to cache 1 from the directory. The third rule consumesthis message and response with an invalidation acknowledgment message on channel3, and the fourth rule consumes this message at the directory and sets the sharer flagof cache 1 to false.3. Some post-fix of this sequence of rules is taken, starting from s to reach some statesatisfying the end predicate. That is, either the latter 3 rules above fires, or the latter2 rules, or the final rule.This establishes that for any local state of some cache in a reachable state satisfyingthe start predicate, there is a local path to some state where the sharer flag is false.Intuitively, although the local state of abstracted nodes are unknown when the guard ofABS SendGntE1 is enabled, the above check verifies the existence of an abstracted path toa state where the sharer flag is cleared for all abstracted caches. See Figure C.2.172w w′′SAs′ ∈ post(r̂1)s ∈ ρ̂1path local to 1w1 w′1path local to iC[1]C[i]wi w′iw′ψ−1(s) ∧allj C[j]Γ(ρ̂1) ⊆ S(n)ψ−1r firespipiψFigure C.2: Schema for proving ruleset ABS SendGntE1 is UA. For any reachable concretestate w1 satisfying the boolean predicate implied by ρ̂1, there is a path to w′1 ∈ C[1] thatonly changes the local state of 1. By symmetry, there is a path from any reachable wi ∈Γ(ρ̂1) to w′i ∈ C[i] that only changes the local state of i, for 1 ≤ i ≤ n. Concatenating thesepaths together over all i, there is a path from any w ∈ Γ(ρ̂1) to w′ satisfying∧1≤i≤nC[i].In particular, if w ∈ ψ−1(s), then w′ satisfies the guard of r, and this rule fires to reachw′′. The path from w to w′′ satisfies the definition of the transition (s, s′) to be UA. Thepath from w1 to w′1 is implied by model checking a DF property in the mixed abstraction.173
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Leveraging distributed explicit-state model checking...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Leveraging distributed explicit-state model checking for practical verification of liveness in hardware… Bingham, Brad 2015
pdf
Page Metadata
Item Metadata
Title | Leveraging distributed explicit-state model checking for practical verification of liveness in hardware protocols |
Creator |
Bingham, Brad |
Publisher | University of British Columbia |
Date Issued | 2015 |
Description | Protocol verification is a key component to hardware and software design. The proliferation of concurrency in modern designs stresses the need for accurate protocol models and scalable verification tools. Model checking is an approach for automatically verifying properties of designs, the main limitation of which is state-space explosion. As such, automatic verification of these designs can quickly exhaust the memory of a single computer. This thesis presents PReach, a distributed explicit-state model checker, designed to robustly harness the aggregate computing power of large clusters. The initial version verified safety properties, which hold if no error states can be reached. PReach has been demonstrated to run on hundreds of machines and explore state space sizes up to 90 billion, the largest published to date. Liveness is an important class of properties for hardware system correctness which, unlike safety, expresses more elaborate temporal reasoning. However, model checking of liveness is more computationally complex, and exacerbates scalability issues as compared with safety. The main thesis contribution is the extension of PReach to verify two key liveness-like properties of practical interest: deadlock-freedom and response. Our methods leverage the scalability and robustness of PReach and strike a balance between tractable verification for large models and catching liveness violations. Deadlock-freedom holds if from all reachable system states, there exists a sequence of actions that will complete all pending transactions. We find that checking this property is only a small overhead as compared to safety checking. We also provide a technique for establishing that deadlock-freedom holds of a parameterized system -- a system with a variable number of entities. Response is a stronger property than deadlock-freedom and is the most common liveness property of interest. In practical cases, fairness must be imposed on system models when model checking response to exclude those execution traces deemed inconsistent with the expected underlying hardware. We implemented a novel twist on established model checking algorithms, to target response properties with action-based fairness. This implementation vastly out-performs competing tools. This thesis shows that tractable verification of interesting liveness properties in large protocol models is possible. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2015-04-09 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivs 2.5 Canada |
DOI | 10.14288/1.0166129 |
URI | http://hdl.handle.net/2429/52670 |
Degree |
Doctor of Philosophy - PhD |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2015-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/2.5/ca/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2015_may_bingham_bradley.pdf [ 1.71MB ]
- Metadata
- JSON: 24-1.0166129.json
- JSON-LD: 24-1.0166129-ld.json
- RDF/XML (Pretty): 24-1.0166129-rdf.xml
- RDF/JSON: 24-1.0166129-rdf.json
- Turtle: 24-1.0166129-turtle.txt
- N-Triples: 24-1.0166129-rdf-ntriples.txt
- Original Record: 24-1.0166129-source.json
- Full Text
- 24-1.0166129-fulltext.txt
- Citation
- 24-1.0166129.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.24.1-0166129/manifest