UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Fault tolerance for distributed explicit-state model checking Ishida, Valerie Lynn 2014

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata


24-ubc_2014_november_ishida_valerie.pdf [ 1.38MB ]
JSON: 24-1.0167008.json
JSON-LD: 24-1.0167008-ld.json
RDF/XML (Pretty): 24-1.0167008-rdf.xml
RDF/JSON: 24-1.0167008-rdf.json
Turtle: 24-1.0167008-turtle.txt
N-Triples: 24-1.0167008-rdf-ntriples.txt
Original Record: 24-1.0167008-source.json
Full Text

Full Text

Fault Tolerance for DistributedExplicit-State Model CheckingbyValerie Lynn IshidaB.A., University of California, Berkeley, 2008A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFMASTER OF SCIENCEinThe Faculty of Graduate and Postdoctoral Studies(Computer Science)THE UNIVERSITY OF BRITISH COLUMBIA(Vancouver)October 2014c© Valerie Lynn Ishida 2014AbstractPReach, developed at the University of British Columbia and Intel, is a stateof the art parallel model checker. However, like many model checkers, it facesreliability problems. A single crash causes the loss of all progress in checkinga model. For computations that can take days, restarting from the beginningis a problem. To solve this, we have developed PReachDB, a modifiedversion of PReach. PReachDB maintains the state of the model checkingcomputation even across program crashes by storing key data structures ina database. PReachDB uses the Mnesia distributed database managementsystem for Erlang. PReachDB replicates data to allow the continuationof the computation after a node failure. This project provides a proof-of-concept implementation with performance measurements.iiPrefaceThis thesis is an original, unpublished, independent work by the author,Valerie Lynn Ishida. Some of the LATEX for the algorithms of this documentis adapted from [5].PReachDB was written independently by myself. PReachDB was writtenstarting from the code of the UBC PReach model checker written by BradBingham and Flavio dePaula as a course project for CPSC 538E: ParallelComputing, Spring 2009.Brad Bingham, Jesse Bingham, Flavio M. de Paula, John Erickson, Gau-rav Singh and Mark Reitblatt presented the Intel PReach model checker, arewrite of UBC PReach at Intel, in [5].iiiTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiiAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . ixDedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Model Checking . . . . . . . . . . . . . . . . . . . . . . . . . 21.2.1 Hardware Designs and Specifications . . . . . . . . . 21.2.2 Safety Properties, Invariants, Assertions . . . . . . . 21.2.3 Explicit and Symbolic Model Checking . . . . . . . . 31.3 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31.4 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52.1 Explicit-State Model Checking . . . . . . . . . . . . . . . . . 52.1.1 Murphi . . . . . . . . . . . . . . . . . . . . . . . . . . 62.1.2 State Explosion . . . . . . . . . . . . . . . . . . . . . 62.2 Disk-Based Explicit-State Model Checking . . . . . . . . . . 72.2.1 Disk-based Murphi . . . . . . . . . . . . . . . . . . . 72.2.2 Transition Locality . . . . . . . . . . . . . . . . . . . 82.3 Distributed Explicit-State Model Checking . . . . . . . . . . 112.3.1 Parallelizing the Murphi Verifier . . . . . . . . . . . . 112.3.2 Eddy . . . . . . . . . . . . . . . . . . . . . . . . . . . 14ivTable of Contents2.3.3 PReach . . . . . . . . . . . . . . . . . . . . . . . . . . 152.4 Fault Tolerance in Distributed Computing . . . . . . . . . . 172.4.1 CAP and Mnesia . . . . . . . . . . . . . . . . . . . . 172.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183 PReachDB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193.2 Distributed Explicit-State Model Checking Algorithm . . . . 213.3 Data Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . 213.3.1 Mnesia . . . . . . . . . . . . . . . . . . . . . . . . . . 213.3.2 Data Structures in PReach . . . . . . . . . . . . . . . 213.3.3 Data Structures in PReachDB . . . . . . . . . . . . . 223.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . 243.4.1 Launch . . . . . . . . . . . . . . . . . . . . . . . . . . 253.4.2 Table Creation and Management . . . . . . . . . . . . 253.4.3 Search() . . . . . . . . . . . . . . . . . . . . . . . . 263.5 Message Acknowledgments . . . . . . . . . . . . . . . . . . . 273.5.1 Sequence Numbers . . . . . . . . . . . . . . . . . . . 293.6 Failure Handling . . . . . . . . . . . . . . . . . . . . . . . . . 303.6.1 Table Fragmentation and Replication . . . . . . . . . 343.6.2 Hot Spare and Data Migration . . . . . . . . . . . . . 353.6.3 Work Migration . . . . . . . . . . . . . . . . . . . . . 393.7 Modified Algorithm . . . . . . . . . . . . . . . . . . . . . . . 403.7.1 Termination Detection . . . . . . . . . . . . . . . . . 403.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404 Results and Performance . . . . . . . . . . . . . . . . . . . . . 444.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444.1.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . 444.1.2 System-Wide Restart . . . . . . . . . . . . . . . . . . 454.1.3 Sequence Number on Acknowledgments . . . . . . . . 454.1.4 Single Node Termination . . . . . . . . . . . . . . . . 454.1.5 Hot Spare . . . . . . . . . . . . . . . . . . . . . . . . 464.2 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . 464.2.1 Performance without Replication . . . . . . . . . . . 474.2.2 In-memory Queue Spilling . . . . . . . . . . . . . . . 494.2.3 Replication Factor . . . . . . . . . . . . . . . . . . . . 504.2.4 Work Queue Replication . . . . . . . . . . . . . . . . 504.2.5 Dirty Table Operations . . . . . . . . . . . . . . . . . 514.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52vTable of Contents5 Discussion and Future Work . . . . . . . . . . . . . . . . . . . 555.1 Implementation Improvements . . . . . . . . . . . . . . . . . 565.2 Automated Recovery . . . . . . . . . . . . . . . . . . . . . . 565.3 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . 575.3.1 Load Balancing . . . . . . . . . . . . . . . . . . . . . 575.3.2 Messages . . . . . . . . . . . . . . . . . . . . . . . . . 575.3.3 Memory and Disk Usage . . . . . . . . . . . . . . . . 575.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59AppendicesCode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63Tabular Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64viList of Tables3.1 Presence of data for lifecycle stages. . . . . . . . . . . . . . . 28A1 Data for Figures 4.1, 4.2 . . . . . . . . . . . . . . . . . . . . . 65A2 Data for Figure 4.3 . . . . . . . . . . . . . . . . . . . . . . . . 66A3 Data for Figure 4.4 . . . . . . . . . . . . . . . . . . . . . . . . 67A4 Data for Figure 4.5 . . . . . . . . . . . . . . . . . . . . . . . . 68A5 Data for Figure 4.6 . . . . . . . . . . . . . . . . . . . . . . . . 69A6 Data for Figure 4.7 . . . . . . . . . . . . . . . . . . . . . . . . 70viiList of Figures2.1 Figure 1 from [33]. . . . . . . . . . . . . . . . . . . . . . . . . 92.2 Figure 6 from [10]. . . . . . . . . . . . . . . . . . . . . . . . . 102.3 Figure 1 from [35]. . . . . . . . . . . . . . . . . . . . . . . . . 133.1 A message lost in flight. . . . . . . . . . . . . . . . . . . . . . 273.2 Lifecycle stages of a model state in PReachDB. . . . . . . . . 283.3 Regenerating the lost message. . . . . . . . . . . . . . . . . . 293.4 Sequence Numbers: Example system configuration. . . . . . . 313.5 Sequence Numbers: Slow messages. . . . . . . . . . . . . . . . 313.6 Sequence Numbers: Resending the successors. . . . . . . . . . 323.7 Sequence Numbers: C arrives at Node 3 twice. . . . . . . . . 323.8 Sequence Numbers: Node 3 acknowledges predecessor A twice. 333.9 Sequence Numbers: Node 2 fails and B is missed. . . . . . . . 333.10 Three node replicating three fragments of a table. Node 3has failed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363.11 Hot spare Node 4 is introduced to replace Node 3. . . . . . . 363.12 Fragments are copied to Node 4 and Node 4 joins the com-putation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374.1 Without Mnesia, time on logarithmic scale: There is a longperiod with very little progress as process 0 catches up. . . . 474.2 Without Mnesia. . . . . . . . . . . . . . . . . . . . . . . . . . 484.3 With Mnesia: Progress is slow but steady. . . . . . . . . . . . 494.4 With Mnesia and no limit on the in-memory queue size. Progressis similar to with limiting enabled. . . . . . . . . . . . . . . . 504.5 R = 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514.6 With Mnesia and dirty reads and writes to the visited statetable and the global work queue. . . . . . . . . . . . . . . . . 534.7 With Mnesia and dirty reads and writes to the visited statetable and the global work queue. Both tables were held inmemory and not on disk by Mnesia. . . . . . . . . . . . . . . 54viiiAcknowledgmentsThanks to my supervisor, Mark Greenstreet, for helping me through theprogram. Thanks to my second reader, Rachel Pottinger, for teaching mesome neat skills. Thanks to my labmates for their mentorship, to my laband department for providing a great place to be, and to friends and familyfor moral support. Thanks to Michael Constant for companionship andguidance.ixDedicationDedicated to my mother, Connie Ishida.xChapter 1Introduction1.1 MotivationExplicit-state model checking is a graph search problem on a finite statespace, checking invariants at each reachable state. An example of whenmodel checking is used is in the verification of cache coherency protocols,where an example invariant is that no two processors have the same cacheline in the exclusive state at the same time. The model defines a set ofpossible states which correspond to configurations of the cache system atany point in time. In the example, a state could represent which cache linesare resident on which processors and in what cache state those lines are in.Large state spaces can be described succinctly using high level statetransition languages like Murphi [11], but the verification of the state spacestill requires visiting each reachable state. This exploration pushes the limitsof both traditional and distributed computation. Distributed model checkersto date have not focused on handling faults, and thus tend to be problematicwhen used at an industrial scale.PReach, a system for parallel reachability written in Erlang [3], imple-ments the Stern and Dill distributed model checking algorithm [35] and hasbeen successfully used for the verification of large models (billions of states),but it has no features to continue computation in the event of a system fail-ure. If a failure occurs, the computation is ceased with no way to resumeusing previously computed values. Likewise, if a compute node loses con-nectivity, its workload is not dynamically shifted to an online node, and thesystem cannot make progress.This document presents PReachDB, a new model checker based on PReachwith the key features of computation restart on program crash and datareplication with hot-swappable nodes, to address the reliability problems ofa distributed model checker. The goal of this project is to determine if usingdatabase disk and replication techniques for distributed model checker faultrecovery are feasible.11.2. Model Checking1.2 Model CheckingModel checking is the automatic verification of concurrent systems. A con-current system has multiple units working simultaneously. Concurrent sys-tems tend to have a protocol meant to guide the system towards the goalor desired functionality. Verifying that a protocol matches the desired func-tionality can be stated as a computational problem, where a model of thesystem is described in a high level language and the desired functionality isdescribed in a specification language. A verifier takes the model and specifi-cation files as input, and outputs whether the model meets the specification.We will shortly discuss what kinds of properties can be specified, but firstan example.1.2.1 Hardware Designs and SpecificationsAn example of applied model checking is the verification of hardware de-signs. Modern digital designs consist of many concurrently working units.In the creation of hardware designs, modularity is essential; and interfacedesign between modules is considered very difficult. Different modules maybe connected to different clocks, and thus beat to different drums, or beconnected to no clock at all. Protocol bugs found after fabrication can beextremely costly. Hardware specifications lay out properties about the itembeing checked that, if true, should guarantee that the item performs as de-sired. Note that it is possible for the specification to incorrectly describewhat “working” means to the designer. For this reason, hardware manufac-turers use the test-redesign cycle frequently.1.2.2 Safety Properties, Invariants, AssertionsSpecifications describe what the model must do to guarantee correctness.Correctness includes both safety and liveness properties. A safety propertyin a specification states that nothing bad happens, more specifically, thatthe system never reaches a failure state, such as computing a wrong answer,granting two processes simultaneous access to an exclusive resource, etc.Liveness properties state that something good happens; in other words thatthe system eventually reaches a good state. An example of a liveness prop-erty is responsiveness every request is eventually acknowledged. PReachDBverifies safety properties.A safety property is invariant if it holds for all reachable states. Whenusing a model checker, the user is usually interested in whether safety prop-21.3. Overviewerties are invariant. One way invariant checking is written in model checkinglanguages is to use assertions, which are program statements that cause anerror if the condition being asserted is false.1.2.3 Explicit and Symbolic Model CheckingFinite-state systems can be algorithmically checked as a graph search prob-lem. The set of all states reachable by following any path starting fromthe set of initial states is called the reachable state space. In computingthe reachable state space, we build the set of reachable states by iterativelyadding states to the set of states known to be reachable from the initialstates. A state transition is a change in the model state. A state transitionin the model is represented in the graph as a path from one state to another.A state reachable by a state transition from another state is a successor ofthat state. The computation is finished either when a reachable state isfound to violate the specification or if we have examined all reachable statesand found all safety properties invariant.A different approach uses the implicit graph and symbolic logic to decidewhether states that violate safety properties are reachable. Ordered BinaryDecision Diagrams (OBDDs) [8, 9] are often used to represent the set ofreachable states as a boolean function. For protocol verification, explicit-state model checking often outperforms symbolic model checking [20]. Forthis document, we focus on explicit-state model checking nearly exclusively.1.3 OverviewOur goal in this project is to implement a distributed, reliable system foruse with model checking with the specific features of disk persisted data andcomputation resumption in the event of a system error using a databasemanagement system (DBMS) designed for use with functional languages.The research goals are to show that the system can recover in the event ofthe system going down and to collect measurements on state exploration,memory usage, message queue sizes, and machine time.The main contribution of this project is a proof-of-concept implementa-tion of a distributed, crash-tolerant model checker capable of computationresumption or continuation on the event of a node failure. Resumption ispossible in the case of system crash when computation state is stored ondisk. Continuation is possible through the swap in of a hot-spare node for adisconnected node and through the use of data replication among the node31.4. Organizationpool. In addition we consider the theoretical likelihood of computation com-pletion for this system compared with that of a traditional, fault-intolerantmodel checker.This work does not address methods for making this system’s runtimeperformance compare well with that of a traditional, fault-intolerant modelchecker. This work is also a proof-of-concept, and some features are notrobustly implemented. For example, hot-spare swap-in is implemented fora single node failure only; the system could easily be engineered further toallow additional hot-spares, and the free use of hot-spares is assumed for thetheoretical failure handling model.1.4 OrganizationThis document is organized as follows. Related work in explicit-state modelchecking and in fault tolerance for distributed computing is discussed inChapter 2. Chapter 3 contains full details of the PReachDB model checkingsystem, including fault handling procedures. Chapter 4 shows the resultsof the system responding to failures and contains the performance analysis.Conclusions and future work are discussed in Chapter 5.4Chapter 2Related WorkThis chapter will discuss previous work relevant to the understanding of adistributed explicit-state model checker using disk and aiming to providesome fault tolerance. It will cover explicit-state model checkers and workdone on the two orthogonal features of using disk and distributing the com-putation. It will also cover basic fault tolerance measures in distributedcomputing.2.1 Explicit-State Model CheckingFor concurrent systems with a finite number of protocol states, one methodof verifying the safety properties is to enumerate all reachable protocolstates. We can do this as a graph search problem, of which Breadth FirstSearch (BFS) and Depth First Search (DFS) are algorithmic solutions.Definition 1. A state graph is a triple [21]:A = (Q,S,∆)where Q is the set of states.S is the set of start states, S ⊂ Q.∆ is the set of transition rules ri : Q→ Q ∪ {Error}.Reachable states include S and all states reachable from any applicationsequence of the transition rules from S.In practice, model checkers implement either Breadth First Search orDepth First Search and keep a table of protocol states visited in random-access memory. Two well known model checkers are Murphi [11] and SPIN[26]. This document focuses on Murphi because both PReach and PReachDBuse Murphi description files and both are geared towards the Murphi wayof working.52.1. Explicit-State Model Checking2.1.1 MurphiMurphi is both a description language and a verification system. The de-scription language is a high-level programming language that can describe aconcurrent system with protocol state type and variable declarations, tran-sition rules and invariant descriptions. The Murphi compiler generates anexecutable C++ program from a Murphi description file. The C++ pro-gram enumerates all reachable states and checks the safety properties ateach reachable state.PReach uses the Murphi language for describing the model to be checked,but PReach parallelizes the verification process.2.1.2 State ExplosionTime and memory usage are two major concerns when verifying a model.With the basic graph search approach, these both increase linearly withincreasing reachable state space size. However, reachable state space sizestend to increase much more rapidly than linearly with the model descriptionfile. This is known as the state explosion problem [21]. As explicit-statemodel checkers are used to verify larger and larger models, a point is reachedat which the set of reachable states no longer fits in a table in memory.Several techniques have been developed to reduce memory usage, whichis doubly beneficial in that it allows us to check larger models that would nototherwise fit in memory and it decreases the amount of time the machinespends reading and writing memory. One method is to store in the visitedstate table only a hash signature of the state rather than the full statedescriptor. This is known as hash compaction [37]. Improvements such asusing ordered hashing [2] have been examined in [34]. Using the hash insteadof the full descriptor may cause a problem if there is a hash collision. Ifa newly generated state hashes to the same signature as a state we havealready visited, the state and all its descendant states may not be visited.If all protocol error states go undetected, then the verifier is said to producefalse positives. Hash compaction provides bounds on the probability ofmissing even one state. In practice, the probability of missing a state canbe made very small, and reduced even further by repeating the verificationwith a different random-seed for producing the hash function [37].Another method is to create a table of bits. The bits are initially zero.When a state is inserted, two (or more) hash values for the state are gener-ated and the corresponding bits are set to one. A lookup on the table worksin the same fashion, and the state is considered present if the appropriate62.2. Disk-Based Explicit-State Model Checkingbits are set to one. This method is called bitstate hashing [18, 19]. Bitstatehashing provides an average for the percentage of reachable states visited,but not a bound on omission of an error state.Another approach exploits the inherent symmetry in typical protocols.Cache coherence and network protocols tend to be designed so that anyagent may perform actions (such as enter a critical section), where whichagent is acting may not matter. In [21] the authors add a data type called ascalarset to Murphi that allows for pruning of symmetric state graphs whenthe only difference between state descriptors is a permutation of a scalarset.This requires the users to change their Murphi description files, but if theuser requires verification over a highly symmetric protocol, a great deal ofmemory and runtime savings can be achieved.2.2 Disk-Based Explicit-State Model CheckingIn the previous section we discussed memory saving approaches to handlingstate explosion. In this and the next section we discuss approaches of aux-illary storage. This section covers the technique of utilizing hard disks tostore state spaces larger than fit in RAM, and the next section covers dis-tributed techniques utilizing networks of workstations or other aggregratesof machines.2.2.1 Disk-based MurphiIn [33] the authors investigate how to use magnetic disks for storage of thereachable state hash table while keeping the overhead low. Using magneticdisk allows the state table to be of greater size than when using RAMmemory alone. The difficult consideration is that the state table is normallyrandomly accessed, which if on disk would take orders of magnitudes longer.The key insight in [33] is that disk accesses do not have to be randomlyaccessed, incurring the seek time penalty with each access. Disk accessescan be grouped and performed using a linear read of the entire disk statequeue. The authors also report that hash compaction performs extremelywell in reducing runtime for algorithms using disks.The basic disk-based algorithm shown in Figure 1 of [33] is reproducedin Figure 2.1. For this algorithm, two state tables are used, one containedin memory and one on disk. The in memory table is accessed as normal,while disk accesses happen sequentially in CheckTable(). Unlike the usualbreadth first search algorithm, successor states are not checked for presencein the disk table or added to the FIFO work queue right away. Instead, for72.2. Disk-Based Explicit-State Model Checkingeach level of the breadth first seach, successors of all states in the queue aregenerated and stored in the memory table. Once the queue becomes empty,new states (those in the memory table and not in the disk table) are addedto both the disk table and the work queue. Determining which states arenew is done by linearly reading the disk table and removing from the inmemory table any previously stored states. In addition to per breadth firstsearch level, CheckTable() is also invoked when the memory table fills up.The authors of [33] estimate the performance overhead for using thisalgorithm, and find it to have increasing slowdown as the ratio of states ondisk to states in memory (called memory savings factor in [33]) increases.They find that, comparatively, for an instance of the SCI protocol theiralgorithm performs with 151% runtime overhead over the conventional al-gorithm using just memory, while the conventional algorithm, if seeking forevery access, would have a 4108% runtime overhead. When combined withhash compaction, the authors report runtime overhead around 15% for thedisk algorithm.Although disk-based Murphi allows larger state tables than can fit inRAM, it cannot process states in parallel and is thus much slower thanPReach.2.2.2 Transition LocalityA refinement presented in [10] to the disk-based algorithm uses the propertyof transition locality within the protocol transition graph to decrease diskaccesses. Transition locality is defined in terms of breadth-first search (BFS)levels [36], where at level k is defined as L(0) = I, L(k + 1) = {s′ | ∃s s.t.s ∈ L(k) andR(s, s′) and s′ /∈⋃i=ki=0 L(i)}. A graph exhibits 1-local transitionlocality if most transitions from a source state lead to a destination stateeither on the previous level (L(k − 1)), the same level (L(k)), or the nextlevel (L(k + 1)) as that of the source state (L(k)). In [36] the authors findthat for a handful of protocols provided with the Murphi distribution, “formost states more than 75% of the transitions are 1-local.”The locality-based disk-based BFS algorithm in [10] is similar to thealgorithm in [33] except for the function CheckTable(). Whereas in [33] theentire disk table is read to remove states from the memory table and thestate queue of unchecked states, in [10] only parts of the disk table are read.By decreasing the amount of time spent reading from disk, the authors areable to verify protocols using less runtime. Assuming states are added to thedisk table in BFS level order, the insight of this paper is to look primarilyat the tail of the disk table, which contains the 1-local transition states.82.2. Disk-Based Explicit-State Model CheckingFigure 2.1: Figure 1 from [33].92.2. Disk-Based Explicit-State Model CheckingFigure 2.2: Figure 6 from [10].To probabilistically improve the chance of finding a non-1-local state (a farback state) in the disk table, the authors propose a scheme for dynamicallychoosing which segments of disk states to read. The probability of selectionincreases as per Figure 6 of [10], reproduced in Figure 2.2, where variablevalues are chosen experimentally. Both axes are relative between values 0and 1, where the relative disk block index is ρ = block index / number ofblocks.The authors of [10] tune for how many states to read from disk basedon the disk table size. The disk table is segmented into a fixed number (N)of variable size blocks (of size dS/Ne where S is the number of state hashsignatures in the disk table). When dS/Ne is less than a minimum value B,B is used for block size instead to prevent unnecessary seeks when the disktable is small. Thus while the disk table is small, this algorithm reads theentire table for each BFS level, as in [33]. The program also dynamicallytunes for block selection effectiveness by periodically running a calibrationCheckTable() that reads the entire disk table while removing old states andthat counts how many states would have been in the disk block selectionand how many would not have been. Based on the counts, the programdecides whether or not the block selection is effective, and either increasesor decreases the number of blocks to read at each CheckTable() accordingly.The authors of [10] report that their algorithm is about 10 times faster102.3. Distributed Explicit-State Model Checkingthan the algorithm in [33]. For protocols where a tenth of the model size(M(p) = the minimum amount of memory needed to complete state spaceexploration for protocol p) fits in RAM, the authors find that the veriferis between 1.4 and 5.3 (3 on average) times slower than a RAM-BFS withenough RAM to complete the verification task. The overhead is likely largerthan that reported in [33] due to the significantly larger models being verifiedin [10]. The models in [33] are of size able to store 250K to just over 1Mreachable states, whereas in [10] they are of size able to store 2M to over125M reachable states. This suggests that tuning the disk algorithm iseffective at increasing the size of what we can verify with real hardware, butit cannot overcome the increasing slowdown as the memory savings factorincreases, as noted in [33]. For large models it is still not competitive withPReach for speed.2.3 Distributed Explicit-State Model CheckingDistributing state generation and storage over multiple machines has severaladvantages over using a single machine. The greater aggregate memoryallows us to check models of greater size than could be held by a singlemachine. Additional processors allow us to check properties and generatesuccessors of multiple protocol states at a time. Given the potential speedupand scalability advantages of distributing the task, it is not surprising thatmuch work has been done in this direction.2.3.1 Parallelizing the Murphi VerifierThe seminal paper on parallelizing an explicit-state model checker is byStern and Dill [35]. The authors took the existing Murphi verifier algorithmand implementation and presented a parallel and distributed version of each.Unlike previous work, which was aimed at reducing the size of the reachabil-ity graph or reducing memory footprint, this work aimed to use parallelismto reduce the runtime. This paper laid the foundation for our and othermodel checkers, and much of the terminology and methods originate fromthis paper.Parallel Murphi, the tool presented in [35], utilizes a statically parti-tioned state table to store reached protocol states. Each instance of ParallelMurphi, either as a thread or a process on either a separate or the samemachine, is called a node. Each state s has a single node as its owner. Theowner is computed statically using a hash function Owner(s). When a nodegenerates a new state, it is sent to the state’s owner. At each node, owned112.3. Distributed Explicit-State Model Checkingstates are stored in the local partition of the state table. In addition, eachnode utilizes a private work queue. When a state is received, if it has notbeen entered in the state table then it is added to both the state table andthe work queue. States are removed from the work queue when their suc-cessor protocol states have been generated and sent to each of their owningnodes.A received state may either be1. in neither the state table nor the work queue, indicating it is a newstate we have not reached before;2. in the state table and the work queue, indicating it is counted in thereachability graph, but its successor protocol states have not beengenerated for exploration yet; or3. in the state table and not in the work queue, indicating it is countedin the reachability graph, and its successors have been generated andsent to the appropriate nodes.The basic algorithm presented in Figure 1 of [35] is reproduced in Fig-ure 2.3. The basic algorithm uses active messages to communicate, whereboth the data and procedure address are sent. The authors of [35] chose ac-tive messages for their efficiency when nodes are executing asynchronously.Current message passing libraries provide the desired functionality in aneasy-to-use form, and thus our model checker abandons active messages forlibrary methods.Parallel Murphi’s termination conditions are that there are no more mes-sages (containing states) in flight and that there are no more states in theper-node work queues. Parallel Murphi designates one node as the masternode. The master node sends out the initial set of states at startup andis responsible for checking for termination conditions. Termination detec-tion in parallel Murphi and in PReach is determined at the master by theequationN−1∑i=0Senti −Receivedi + |Qi| = 0 (2.1)where Senti and Receivedi are counter variables for each of the i = 0, ..., N−1 nodes, where N is the number of nodes, and where |Qi| is the size ofthe work queue at node i. The master node queries each node for thesevalues when it has been idle longer than a threshold amount of time. Before122.3. Distributed Explicit-State Model Checkingvar // global variables, but local to each nodeT : hash table; // state tableQ: FIFO queue; // state queueStopSend : boolean; // for termination detectionWork, Sent, Received : integer;Search() // main routinebeginT := ;; Q := ;; // initializationStopSend := false; Sent := 0; Received := 0;barrier();if I am the master then // master generates startstatesfor each startstate s0doSend(s0);enddo // search loopif Q 6= ; then begins := top(Q);for all s02 successors(s) doSend(s0);endQ := Q fsg;endpoll();while not Terminated();endSend(s: state) // send state s to \random" node h(s)beginsc:= canonicalize(s); // symmetry reductionwhile StopSend do // wait for StopSend = falsepoll(); // (for termination detection)endSent++;send active message Receive(sc) to node h(sc);endReceive(s: state) // receive state (active message handler)beginReceived++;if s =2 T then begininsert s in T ;insert s in Q;endendFig. 1. Parallel Explicit State Enumeration5Figure 2.3: Figure 1 from [35].132.3. Distributed Explicit-State Model Checkinganswering the master node’s query, each node sets StopSend to disallowsending messages. This allows us a snapshot of the system.1Parallel Murphi writes out a log file at each node of its local state tablepartition. Each record contains a compressed state value and a pointer toa predecessor in the form of the predecessor’s owning node’s number anda record position in the predecessor’s owning node’s log file. This allowserror trace generation when a protocol error is found. PReach at Intel [5]can provide some error trace generation, but PReachDB currently does notsupport it.The authors of Parallel Murphi identified two areas of potential im-provements for a parallel model checker. They identified load balancing asa problem when bandwidth is non-uniformly available. Nodes with limitedbandwidth become bottlenecks. In such a case, the randomized load balanc-ing provided by the static hash function provides poor performance. Theyalso identified communication overhead as a variable affecting runtime. Theywere able to decrease runtime by aggregating messages containing states intobatches of 10 when 20 or more states exist in the work queue. These prob-lem areas continue to be of great importance to model checker performanceand are further addressed in the literature.PReach is based off of Parallel Murphi and PReach explores both areasfor improvement identified by the authors of Parallel Murphi.2.3.2 EddyThe Eddy project’s [27, 28] main achievement is the specialization of threadslocal to a compute node in order to achieve a modular design for a parallel,distributed model checker. Eddy uses shared memory to commuicate be-tween two threads on a single node (implemented in POSIX Theads [17]),and uses message passing (implemented in MPI [15]) to communicate be-tween nodes. The tasks on a compute node are separated into two cate-gories: state generation and inter-node communication. One thread takeson the role of generating successor states and checking safety properties,and another handles bundling states into packages and sending and receiv-ing messages containing states.Eddy’s architecture is structured to take advantage of MPI [15] andPThreads [17] primitives. For example, in termination detection, a token ispassed along the nodes in a ring accumulating the Senti and Receivedi whenthe node is inactive. If the counters are equal when passed back to the root1PReachDB’s termination detection is simplified by the use of acknowledgment mes-sages; see Section Distributed Explicit-State Model Checkingnode, the computation is complete. MPI broadcast is used to terminate allnodes in the case where a state is found not to satisfy the safety property.One way in which using the MPI interface with Eddy is slightly unwieldlyis the way Eddy batches states per destination node. Separate buffers,called lines, per destination node are stored on each node, and complicatedbookkeeping is required to manage which lines are ready to be sent. Lineswhich have been sent also must not be overwritten until the send operationhas completed, otherwise the sent data could be corrupted.The version of Eddy implemented to work with Murphi models is calledEddy Murphi. The authors present experimental performance results forEddy Murphi compared with single machine, standard Murphi on modelswith 106 to 108 states. They observe near linear speedup as the number ofnodes increases, with a note that one of their models does not perform wellwith their static state partitioning function for certain numbers of nodes.2The authors report being unable to reproduce the work in [35] due to chang-ing cluster technology. They report that an MPI port of parallel Murphi[32], performs worse than standard Murphi. They also report being able toverify the FLASH protocol [23] for 5 processors and 2 data values as pa-rameters, a model with more than 3× 109 states, in approximately 9 hourson Eddy Murphi with 60 nodes. This model would have been expensive toverify on a single machine at the time, requiring an estimate 15 GB of RAMmemory for the hash table on standard Murphi and requiring an estimatedcompute time of 3 weeks on disk Murphi [30].2.3.3 PReachPReach [5], short for parallel reachability, is a new implementation of dis-tributed, parallel Murphi written in Erlang. Erlang is a concurrent, func-tional language with elegant communication primitives. Implementing PReachin Erlang allows the authors to easily represent and make adjustments tothe communication aspects of distributed Murphi. The core algorithm isrepresented in under 1000 lines of code, and PReach has been used to verifyan industrial cache coherence protocol with approximately 30 billion states.PReach uses existing Murphi C code for “front end parsing of Murphimodels, state expansion and initial state generation, symmetry reduction,state hash table look-ups and insertions, invariant and assertion violationdetection, and state pretty printing (for error traces)” [5]. The Erlang codehandles the complex distributed communication code. This architectural2The authors cite [25], which more directly focuses on load balancing.152.3. Distributed Explicit-State Model Checkingdivision allows the authors to harness existing proven technology and tofocus on correctly implementing the communication layer.The basic algorithm is the same as that in [35]. Three features exploredin [5] are batching of states, load balancing, and crediting. I will discussthem in the following paragraphs.In the distributed algorithm, when states are generated, they are sentto their respective owners as messages in a message passing system. Asshown previously in [35] and [27], aggregating messages containing statesinto batches improves runtime performance by decreasing the percent of timespent as overhead processing messages. In the PReach paper [5], the au-thors observed a throughput benchmark speedup factor of 10 to 20 batchingmessages of 100 to 1000 states compared with messages of individual states.Unlike Eddy Murphi [27], which has a fixed size communication buffer of 8lines per destination, PReach has a single variable-size queue per destinationof states to be sent. A fixed size buffer may require the exploring thread toblock on insert if the buffer is full, but there is no chance of running out ofmemory due to the send queue. With the variable size buffer, explorationdoes not have to block, but the send queue may overrun memory. To handlethis, PReach will write the send queue to disk if it is sufficiently large.The problem of load imbalance in distributed explicit-state model check-ing has been noted and addressed often in the literature [4, 25, 35]. Loadimbalance occurs when the distribution of work is uneven per unit time. Themain symptom is a disproportionate work queue size distribution over theset of nodes. Since the runtime of a distributed problem is determined bythe slowest worker, techniques for evening out the size of the work queueshave been successful at decreasing runtime. Load balancing also has thebenefit of decreasing memory usage of the work queue, which is beneficialbecause model checkers are inherently memory-bound. In [25] the authorspresent an aggressive balancing scheme for achieving nearly perfectly equalqueue lengths. Balancing is done by comparing queue lengths with dimen-sional neighbors (for [25] nodes are grouped in a hypercube-like structure),and sending an amount of states equal to half of the difference betweenthe queue lengths. In [5] a more relaxed load balancing scheme is imple-mented with the goal of never having an idle node with an empty workqueue. PReach achieves this by including queue size information with mes-sages containing batched states. When a node receives size information froma peer and the peer’s queue size is smaller than its own by a factor of 5 (anempirically found factor), the node sends to its peer some states from itswork queue. This scheme achieves the desired result and achieves runtimessimilar to and sometimes better than that of [25].162.4. Fault Tolerance in Distributed ComputingCrediting is a method to protect against overwhelming nodes with exces-sive messages during the distributed computation. Erlang’s message passingsystem guarantees delivery eventually and is implemented with a variable-size message receive buffer. If a node receives large numbers of messagescontinuously faster than it can service the messages, the node will allocatemore memory for the message buffer and eventually start paging. In earlyversions of PReach this behavior is observed and often leads to the stand-still or crash of the node. Crediting is implemented by keeping on each nodecredit counters for each peer node. When a message is sent to the peer,the appropriate counter is decremented. When an acknowledgment messageis received, the appropriate counter in incremented. While the counter iszero no new messages are sent to that peer. This scheme provides an upperbound on the size of the inbound message queue of n × C where n is thenumber of nodes and C is the per peer initial credit amount.PReach is useful as a distributed model checker prototyping platformbecause it separates the distributed computation and model checking aspectscleanly. PReachDB builds off of an early version of PReach, and focuses onhandling node crashes.2.4 Fault Tolerance in Distributed ComputingFor a survey of fault tolerance in distributed computing, we direct the readerto [16].PReachDB is a distributed system that uses point-to-point message pass-ing for communication. The main tactics employed in PReachDB to handlefaults are to persist data to disk and to replicate data across multiple nodes.The Mnesia DBMS is part of the Erlang OTP framework. We use Mnesiato implement our distributed data storage.2.4.1 CAP and MnesiaThe CAP Theorem [7] describes a space of tradeoffs in designing a dis-tributed system. The rule of thumb is: from consistency (C), high availabil-ity (A), and tolerance to network partitions (P), a distributed system canhave at most two. In [6] the author clarifies that the apparent tradeoff isless strict if the creators of the system are able to handle operations duringa network partition with some finesse.Mnesia is a CP system. It guarantees consistency and is tolerant ofnetwork partitions, but it does not guarantee availability. In particular if172.5. Summaryall nodes storing some data are unreachable, then on a read or write of thatdata Mnesia will fail with an error.2.5 SummaryThe major difference between PReachDB and previous distributed modelchecker is the use of a database backend to provide persistent storage of keydata structures. To the best of our knowledge, there is no previous work ofusing persistent, database storage in model checkers.18Chapter 3PReachDB3.1 OverviewThe goal of this project is to implement and demonstrate a fault tolerant,distributed, explicit-state model checker. This chapter describes PReachDB,our extension to PReach that uses the persistent, transactional storage pro-vided by a disk-based database system. Here we address issues requiredfor fault tolerance and correctness. Chapter 4 evaluates the performance ofPReachDB, and Chapter 5 examines the feasibility and practicality of thisapproach. Using disk storage with model checkers for increasing the max-imum model size capable of being checked has been studied [33], but notwith the aim of handling faults of the distributed computation.Standard distributed explicit-state model checkers (DEMC) assume thatall nodes will remain online and connected to each other and that all mes-sages that are sent will be received. This guarantees that the entire statespace will be reached, given enough memory.In this project, we allow node failures and dropped messages to happen.Our goal is to build a DEMC with tolerance for these kinds of failures.Messages could be dropped or could be sent to an inactive node, and node-local state could be lost if a node crashes. PReachDB provides solutions forthese new circumstances, and we explain them in this chapter.This chapter gives details of the PReachDB tool developed for thisproject. For details of PReach, upon which the code was based, see Sec-tion 2.3.3 and [5]. After Section 3.2, in which we present the PReachDEMC algorithm, this chapter focuses on the differences between PReachand PReachDB. The extra data structures required to persist and replicatedata are discussed in depth in 3.3. A sketch of the program is presentedin 3.4. The need for message acknowledgments when nodes can fail is pre-sented in 3.5. The implementation of the fault tolerance features usingErlang’s Mnesia database system is presented in 3.6. The modified DEMCalgorithm and new termination detection method are shown in OverviewAlgorithm 1 Stern-Dill DEMC as presented in [5]1: T : set of states2: WQ : list of states3:4: procedure Search( )5: T := ∅6: WQ := []7: barrier()8: if I am the root then9: for each start state s do10: Send s to Owner(s)11: end for12: end if13: while ¬Terminated() do14: GetStates()15: if WQ 6= [] then16: s := Pop(WQ)17: Check(s)18: for all s′ ∈ Successors(s) do19: s′c := Canonicalize(s′)20: Send s′c to Owner(s′c)21: end for22: end if23: end while24: end procedure25:26: procedure GetStates( )27: while there is an incoming state s do28: Receive(s)29: if s 6∈ T then30: Insert(s, T )31: Append(s,WQ)32: end if33: end while34: end procedure203.2. Distributed Explicit-State Model Checking Algorithm3.2 Distributed Explicit-State Model CheckingAlgorithmThe PReach algorithm, shown in Algorithm 1, is based on Stern and Dill’sdistributed, explicit-state model checking (DEMC) algorithm. The Sternand Dill algorithm is described in [35], and the PReach algorithm in [5].The basic idea is graph search over an explicit-state space, starting from agiven set of states and exhaustively computing the reachable space. Whilevisiting each state, properties from the specification are checked againstthe state. If a violation of the specification occurs in the set of reachablestates, then the model does not meet the specification and the computationis terminated. If there are no further reachable states to check, then thecomputation is terminated and the model satisfies the specification.3.3 Data StorageStates in PReach are represented as Erlang tuples. PReachDB uses thesame representation and a similar set of data structures, but the key datastructures are stored in an Mnesia database instead of in memory.3.3.1 MnesiaMnesia [24] is a distributed DBMS for Erlang designed for fault toleranttelecommunications systems. It provides key/value store and lookup, shard-ing of a database table over a pool of computing nodes, replication of tablesor shards, and reconfiguration of the system while on-line. Given thesefeatures, it seems an ideal candidate for adding fault tolerance to PReach.In this section, an Mnesia table refers to a key/value store implementedin Mnesia. For tables keyed on states, we use type = set to enforce uniquekeys. Mnesia tables may be held in memory, on disk, or both.3.3.2 Data Structures in PReachIn the basic version of PReach, a state is inserted once into a hash table ofall the visited states when a compute node first visits the state. The entry isnever updated, and it is looked up several times by compute nodes decidingwhether this state should be visited or not. In more complicated versionsof PReach, each tuple may be updated several times, such as if in-degree isbeing tracked.213.3. Data StorageEach compute node has a work queue containing the states whose prede-cessors have been explored, but who themselves have not yet been explored.Each queue contains only the states owned by the compute node it is asso-ciated with. States are added to the queue when they are received over theErlang messaging system from other nodes or from the same node. Whenthe compute node is ready to explore a new state, it will pop a state fromthe queue.3.3.3 Data Structures in PReachDBThere are five data structures used in PReachDB. Two are held in programmemory, one is held on disk, and two are held either in memory or on diskdepending on the user’s preference. The data structures are1. a per-node, in-memory work queue;2. a per-node, in-memory table of states waiting for acknowledgments;3. a per-node, on-disk Mnesia table containing a single entry, the epochnumber of the node;4. the global work queue, implemented as an Mnesia table; and5. the global visited state table, implemented as an Mnesia table.Each of these will be explained in turn.The major difference between PReachDB’s and previous distributed modelcheckers’ data structures is the presence of Mnesia database tables. For thecurrent version of PReachDB, these tables hold data that is accessible fromany node when accessed through the Mnesia API. In Section 3.7 we showhow these additional structures fit into our fault tolerant DEMC algorithm.Per-Node Work QueueThe per-node, in-memory work queue in PReachDB is similar to that inPReach. It is implemented as an Erlang list argument to the tail-recursivefunction that performs the main loop in Search(). Within Search(), thefirst element of the in-memory queue is removed, verified for satisfying safetyproperties, and used to generate successor states. When a state that hasnever been seen before is received in a message, the state is added to theend of the in-memory queue. See Section 3.6.3 for further discussion of thedifferences between the Per-Node Work Queue and the Global Work Queueand of how they are used when work is migrated between nodes.223.3. Data StorageAcknowledgment TablePReach makes the assumption that the nodes will stay connected. As Erlangguarantees message delivery if possible, this is a reasonable assumption. ForPReachDB we do not make that assumption, as it is possible that a messagewill be sent but never received because the receiver crashes. This means thatif a message gets dropped, an entire portion of the search space may not getexplored. To handle this, we wait for acknowledgment messages from thereceivers before removing a state from the global work queue. We keep atable in memory consisting of a triple of {a state, the number of its succes-sors for whom we have not received acknowledgments, the sequence numberfor this round of acknowledgments}. The counter of unreceived acknowledg-ments is decremented when acknowledgment messages are received. Whenthe counter reaches 0, we remove the state from the global work queue. Thisprocess is further explained in Section 3.5. When all of the successors of astate have been acknowledged and safely stored to the disk work queue, itis safe to remove the predecessor state from the disk work queue, as we willnot need to regenerate those graph edges again.Epoch Number TableWhen we allow for a node to rejoin the computation after crashing andrestarting, we need to provide the means for a node to detect if the mes-sages it receives are no longer relevant to the data in memory. To solvethis problem we introduce epoch numbers. Each node has its own epochnumber which tracks how many times that node has started up. Theseper-node epoch numbers are stored using persistent storage, and each nodeincrements its number on start-up. Nodes include their current epoch num-bers in messages sent to other nodes, and acknowledgment messages notethe epoch of the message that is being acknowledged. In this way, a nodecan disregard acknowledgments for messages from earlier epochs. Section3.5 describes this process in more detail.Global Work QueueThe states yet to be explored are stored in an Mnesia table, along with stateswe may have to re-explore if a crash were to occur. The Mnesia work queuestored the states but does not maintain order. The states in the Mnesiawork queue are a superset of the states in the in-memory work queues.If a crash occurs, the in-memory lists are lost, but the Mnesia workqueue is preserved. On a restart, when a node starts up and has an empty233.4. Implementationin memory queue, it will ask Mnesia to read some states from the Mne-sia work queue. Currently PreachDB uses the mnesia:all keys/1 function,which returns all of the states, and pseudorandomly choses a sublist of statesto add to the in-memory work queue. Which states are explored by whichnode does not matter much, because all states from the Mnesia work queuewill eventually be fetched in this way or otherwise explored from anotherpredecessor link. States are removed from the Mnesia work queue when thenumber of acknowledgments outstanding is zero.In an earlier iteration of PReachDB, each node maintained its own diskwork queue. This configuration worked under the assumption that all nodeswould eventually come back online, and that swapping out a node wouldnot be required.Global Visited State TableThe global visited state table is an Mnesia table fragmented across all of thecompute nodes. The table can be held on disk and/or in memory. Fragmentreplication count can also be set between 1 and N, the number of computenodes. A disk copy of each fragment of the table must be kept by at leastone node for all the data to persist through a system restart. The replicationcount must be at least 2 to handle a single node disk failure.The key of the hash table is the state itself (an Erlang tuple). Thisis possible because Mnesia is an extended relational DBMS, which can usearbitrary Erlang data structures as keys. A hash of the state could be usedas a key if probabilistic search coverage is desired instead. Using the state asthe key makes the table much larger than it would be with hash compaction[37]. PReach uses hash compaction.3.4 ImplementationLike Parallel Murphi [35] and as described in 2.3.1, PReachDB designatesone node as the master node. The master has the same responsibilities asin Parallel Murphi but must also create the database tables.The sketch of the program with regards to Mnesia is as follows.1. Start up each compute node.2. Each compute node calls mnesia:start().3. If this is the first run, have the master node create the tables. Eachcompute node connects to the Mnesia tables.243.4. Implementation4. Each compute node waits for the tables to load.5. Each node does its work, accessing the tables.6. Each compute node calls mnesia:stop().3.4.1 LaunchNode pool launch was implemented in PReach using the Erlang moduleslave. slave is a module to start additional Erlang nodes on local or remotehosts. An Erlang node is a single thread of execution in the Erlang runtimeenvironment. When connecting to a remote host through ssh, slave doesnot correctly escape (backslash) quotes in command-line arguments. Mnesiarequires as command-line arguments the properly quoted filesystem locationwhere the schema table and data tables will be stored. It is not possible tostart Mnesia remotely using the slave module. We instead use Perl scriptsto call Erlang directly on each machine.3.4.2 Table Creation and ManagementSetting up the tables is a multi-step process that requires coordination be-tween the nodes. We use message passing to block a node’s progress untilcoordination messages are received. The following describes how we set upthe tables in roughly chronological order.Mnesia runs in the same address space as the application. Any Erlangconfiguration done on a node affects its Mnesia application. Each nodecalls mnesia:start(). The non-master nodes invoke a remote procedure onmaster to send to itself the list of db nodes registered to Mnesia. The non-master nodes add the retrieved db nodes to its extra db nodes through theMnesia configuration. This sets up the node’s Mnesia application to connectto the given nodes and share table definitions, including the schema. “Theschema is a special table which contains information such as the table namesand each table’s storage type.” [14] Each node then sets the schema to bestored both on disk and in RAM (the default is in RAM only). The schemaneeds to be set to type disc copies before any disk resident tables arecreated.Each non-master node sends a message to master that it has completedthe step to set schema to disk. When master has received ready messagesfrom each of the other nodes, it creates the tables.253.4. ImplementationThe epoch number table is created using mnesia:create table/2.3 Thistable is created with the local content option set to true, so each nodestores its own value in this table. This table is always stored to disk.We use an open source Erlang library called Fragmentron [29] to createour fragmented tables. Both the Global Work Queue and the Global VisitedState Table are created in this manner. We call fragmentron:create table/2with the following value for frag properties.[{n_fragments, length(NodePool)}, % N table fragments{node_pool, NodePool}, % Use our nodes in node pool{n_disc_copies, NDiscCopies}] % Disc fragment replicationThis command tells Fragmentron to create a table with N fragments, whereN is the number of nodes in our node set. The table’s node pool is set toour compute nodes. The number of fragment replicas to maintain is set toNDiscCopies. We make no distinction between the “true” fragment and itsreplica in this section. An appropriate value for NDiscCopies is discussedfurther in 3.6.1. If RAM only tables are desired, n disc copies should bereplaced with n ram copies.Fragmentron is a helper library for fragment replica balancing betweenthe nodes in the node pool. The addition of additional nodes to the nodepool or deletion of an existing node is handled gracefully. Fragment replicasare spread out evenly among nodes. If a node deletion causes the numberof replicas to drop below n disc copies, a new replica is created on anexistent node in the node pool.Once the master creates the tables, the master sends a message to thenon-master nodes to proceed. All nodes then call mnesia:wait for tables/2to wait for the tables to load.3.4.3 Search()The core logic of Search() remains largely the same as in the originalPReach. Successor states are generated from the model using the same suc-cessor function, and each successor is in turn sent to a node to be processed.However, many of the data structures present in PReach are replaced withMnesia tables in PReachDB. These tables are accessed using Mnesia trans-3 Erlang is untyped, but functions can be overloaded according to the number ofparameters that the function has. Functions in external modules are referenced bythe module name, the function name, and the number of arguments with the syntaxmoduleName:functionName/argumentCount.263.5. Message Acknowledgmentsactions. In Section 5.3, we discuss why this is bad for performance andpropose possible next steps for improving data access performance.For this project we focused more on correctness after system faults ratherthan on performance. The two most interesting problems we investigatedwere what considerations we must make if messages containing states tobe explored are arbitrarily dropped and how to deal with node crashes.Changes to the implementation of Search() mostly reflect how we dealtwith these.3.5 Message AcknowledgmentsWhen processing a state, a compute node enumerates the successors of thestate. Each successor state is sent to a compute node, which explores thenew state. In PReach the destination for each state is based on a mappingfrom the hash of the state to one of the nodes. A node, in a sense, owns aset of states; it is responsible for exploring the set of states mapped to it.  ● When all nodes go down, we may lose a message in flight● The message represents a state that is not stored anywhereDisk 2Disk 3Node 2Node 3Message with state BFigure 3.1: A message lost in flight.If we allow for failures, a state could fail to reach its destination due tothe destination node being down or due to a network partitioning. Not ex-ploring the state means the reachable state space was incompletely explored;273.5. Message Acknowledgmentsthis is a failure in correctness. The solution we use in PReachDB is to havea node send an acknowledgment (ACK) in response to receiving a state. Asending node stores the number of successors of the state in the Acknowl-edgment Table and waits for that many ACK messages in response. When areceiving node receives a state, it logs the state to disk in the Global VisitedState Table and sends an ACK to the sending node. When the number ofoutstanding ACKs for a state is 0, we know that all its successors have beenlogged to disk.New Queued Pending Retired- - -Figure 3.2: Lifecycle stages of a model state in PReachDB.The lifecycle of a state throughout the computation is as follows. A stateis first discovered to be part of the reachable set by being enumerated asa successor of another state in the reachable set. At this point it is “new”as shown in Figure 3.2; it is not stored to memory or to disk and it is notACKed. The state is sent as an Erlang message to its owner node. Theowner stores it in the global work queue and in the visited state table asunexplored. Once stored, the state is “queued.” An ACK is sent back tothe sending node. After the owner explores the state and sends out thesuccessors of the state, the state is “pending.” When ACKs for all of thesuccessors have been received, the node removes the state from the workqueue, and the state becomes “retired.”Life stage Global State Table Global Queue ACKs outstandingNew Not present Not present Not yet sentQueued Present Present Not yet sentPending Present Present YesRetired Present Not present NoTable 3.1: Presence of data for lifecycle stages.In the case where not all successors of a state are acknowledged aftera certain amount of time, that is the outstanding ACK counter is greaterthan 0, the situation can be remedied by resending all of the successorsand resetting the outstanding ACK counter for that predecessor state to its283.5. Message Acknowledgments  ● We can regenerate that state on restart if we know its parent state● Don't remove a state from the DWQ until we receive Acknowledgments for all childrenDisk 2Disk 3Node 2Node 3Regenerate state B from parent AACK for AMessage (state B, parent A)Figure 3.3: Regenerating the lost message.number of successors. For example, a node receives a state it has alreadyprocessed whose outstanding ACK counter is greater than 0. PReachDBdoes not keep track of which successors have not been ACKed, and there mayhave been messages lost in transit previously. The node resends all successorsstates. To avoid double counting acknowledgments, the node resets thecounter to the number of successors. The nodes that receive the successorssend ACKs regardless of whether they have received the states before or not.Furthermore, the states are sent with sequence and epoch numbers that aresubsequently included in the acknowledgments as described below in Section3.5.1. When the outstanding ACK count reaches 0 for the predecessor state,the predecessor state is removed from the Global Work Queue because thenode does not ever have to resend that state’s successors states.3.5.1 Sequence NumbersResending the successors generates a new problem. It is possible for a mes-sage from the original dispatch to be delayed in the system long enough thatthe original ACK is not received until after the successors states have beenresent and the outstanding ACK counter reset. These delayed ACKs couldrun down the counter to 0, triggering the removal of the predecessor state293.6. Failure Handlingfrom the disk work queue, before all successors states have been logged todisk. Should a successor state, falsely counted off, not be received or loggedto disk, we would not know that we had missed it.In PReachDB we solve this problem by keeping track of an increasingsequence number associated with each dispatch. When successors states aredispatched, they are sent with a sequence number incremented from thatof the previous dispatch and with the epoch number of the sending node.The current sequence number is stored in the Acknowledgment Table alongwith the outstanding ACK count. When an ACK is received, if its sequencenumber is less than the current or if the epoch number does not match,the ACK is ignored; if its numbers match, the outstanding ACK counter isdecremented. On the last decrement, the state is deleted from the GlobalWork Queue and from the Acknowledgment Table.The full message and acknowledgment protocol also includes the epochnumber of the sender. If the epoch number of an ACK does not match theepoch number the sender is currently on, the ACK is discarded.Figures 3.4 to 3.9 illustrate the motivation for using sequence numbersin the acknowledgment protocol. Consider a system with three computenodes with point-to-point communication. Node 1 is processing state A,and generates states B and C from Successors(A). B is sent to Node 2 andC is sent to Node 3. However the messages are slow, and Node 1 beginsreprocessing A before any ACKs have been received. Node 1 resends B andC to Nodes 2 and 3 and resets its ACK counter to 2. While the messages aretaking a long time to reach Node 2, Node 3 receives state C twice, sendingan ACK for each. Node 1 received 2 ACKs for A and decrements its countertwice. Since the counter is 0, A is removed from the Global Work Queue.Meanwhile, Node 2 has not yet received B and then suddenly fails. B isnever processed and the system misses some of the reachable state space.3.6 Failure HandlingA node failure may be caused by anything, but the result is that the PReachDBprocess running on that node is ended, forcibly or not. In PreachDB we ad-dress two types of failures on a node, where either the node can recover byrestarting the node and/or the PReachDB process, or where the node is notrecoverable and is essentially dead. We consider two failure scenarios.One scenario is that all N nodes fail at the same time with their dataintact. This scenario might happen if the user is running on a scheduledcluster and his reservation has ended. The user must halt computation, but303.6. Failure Handling  ● ConsiderDisk 1Disk 2Disk 3Node 1Node 2Node 3AB CNode 1 generates Successors(A)Figure 3.4: Sequence Numbers: Example system configuration.  Disk 1Disk 2Disk 3Node 1Node 2Node 3AB C(state B, parent A)Message is slow!(state C, parent A)Message is slow!State # ACKs outstandingA 2Node 1's ACK TableFigure 3.5: Sequence Numbers: Slow messages.313.6. Failure Handling  Disk 1Disk 2Disk 3Node 1Node 2Node 3AB CState # ACKs outstandingA 2Node 1's ACK TableNode 1 regenerates Successors(A)(state C, parent A)Messages are slow!(state B, parent A)Messages are slow!Figure 3.6: Sequence Numbers: Resending the successors.  Disk 1Disk 2Disk 3Node 1Node 2Node 3AB CState # ACKs outstandingA 2Node 1's ACK Table (state B, parent A)Messages are slow!(state C, parent A)Messages arriveFigure 3.7: Sequence Numbers: C arrives at Node 3 twice.323.6. Failure Handling  Disk 1Disk 2Disk 3Node 1Node 2Node 3AB CState # ACKs outstandingA 0Node 1's ACK Table (state B, parent A)Messages are slow!ACK A ACK AFigure 3.8: Sequence Numbers: Node 3 acknowledges predecessor A twice.  Disk 1Disk 2Disk 3Node 1Node 2Node 3AB CState # ACKs outstandingA 0Node 1's ACK Table (state B, parent A)Messages are slow!Figure 3.9: Sequence Numbers: Node 2 fails and B is missed.333.6. Failure Handlingthe progress is not lost.Another scenario is that a node has a non-recoverable failure. For a non-recoverable failure we look at two options: either continuing the computationwith one fewer node and making the other nodes pick up the dead node’swork, or inserting a new node into the node pool to replace the dead node.Both options require enough redundancy of data for the remaining nodes toaccess to the dead node’s work. We discuss the implementation of the dataredundancy and strategies for dealing with node failure in the rest of thissection.3.6.1 Table Fragmentation and ReplicationWe use data redundancy across nodes to prevent data loss in the event that anode is disabled. There are many means of disabling a node, such as networkdisconnect, hardware failure, and loss of power. We will not distinguishbetween these in the current work, and consider only the case where a singlenode or where all nodes become completely unresponsive. Handling moregeneral failure scenarios such as multiple nodes failing simultaneously or insequence is a topic for future work.One scheme for storing redundant data is create a single master databasenode through which all database writes and reads are done. The masterdatabase is passively replicated by slave database nodes. In the event of afailure of the master, a slave is promoted to master. We did not utilize thisscheme for this project as a matter of choice based on the design of PReachand the existing standard libraries for Erlang. This master database wouldcreate a bottleneck, as it would have to handle all hash table and work queuetransactions. On standard hardware we would quickly saturate the masterdatabase’s throughput.Another redundancy scheme is to store all global tables in their entiretyon every node. This is expensive in storage and in effort required to coor-dinate. Decreasing the number of replicas of the global tables, R, decreasesthe storage space required per state during the computation. We would likethis number R to be high enough to provide reasonable reliability withoutunnecessarily taking up space. If we ignore the data location for a momentand consider only how many replicas we need, we set R = 3 [31] for mostcases but allow the user to override. For our experimental setup, we useR = 2 since we are concerned with only one node failure.We like keeping the data on the nodes, as was done in PReach. Butrather than storing entire global tables on every node, we can utilize Mne-sia’s table fragmentation feature to partition the global tables into chunks.343.6. Failure HandlingMnesia maps each data item in the data table onto a fragment; for example,the default mapping is based on the hash of the data item. A fragment ofa table can be replicated through Mnesia onto any number of nodes in thenode pool. If we set the number of fragments per table to N , the number ofnodes, then we can have one fragment per node (when there are no replicas)to evenly distribute the data across the node pool, assuming uniform hashdistribution. Ideally the fragment residing locally to the node stores thedata for the states that are owned by the node; local data accesses improveread performances especially.When R = 3 and the number of fragments is set to N , the number ofnodes, a node stores three fragments (per table) locally. Should a node godown permanently, we lose one replica of each fragment that was stored onthat node, but two other replicas of those fragments exist on other nodes.We can repair the loss of redundancy by making a copy of each affectedfragment onto a new node from the existent replicas. In the common casewhere R < N , the copying of fragments is less work than making a completecopy of the entire table.Tables created with fragmentron:create table/2 will automaticallybalance the location of table fragments and create new fragment replicas asnodes join or leave the node pool. In Section 3.4.2, we discussed values forfrag properties. n fragments is set to N and n disc copies is set to R.3.6.2 Hot Spare and Data MigrationFigures 3.10, 3.11, and 3.12 show how a hot spare node can be introducedto a system to replace a dead node. In the figures, Nodes 1, 2, and 3 storea database table. The table has 3 fragments, colored blue, pink, and green,and each node stores two different fragments. Node 3 has failed and itsreplicas of the pink and green table fragments are lost. There are still activereplicas of those fragments on Nodes 1 and 2. We introduce Node 4 to thesystem to replace Node 3. The system is paused while Node 4 is added tothe Mnesia node pool and pink and green fragment replicas are created onNode 4. The system updates its message destinations to include Node 4 anddrops Node 3. The system then resumes computation.We created messages to pause and resume all nodes in the system. Whena hot spare is started, it sends these message in its startup sequence.On receiving a resume message, an existing node updates its fragmentowner map. This mapping of fragment to owning node is used for lookingup the destination for every generated successor state. The fragment ownermapping is a small optimization when choosing for each state which node to353.6. Failure Handling  Node 1Node 2Node 3Figure 3.10: Three node replicating three fragments of a table. Node 3 hasfailed.  Node 1Node 2Node 3Node 4Figure 3.11: Hot spare Node 4 is introduced to replace Node 3.363.6. Failure Handling  Node 1Node 2Node 3Node 4Figure 3.12: Fragments are copied to Node 4 and Node 4 joins the compu-tation.send it to. We could choose randomly each time, and we could ask Mnesiafrom which nodes a data item can be written or read and then send it toone of those. But if we know which fragment a data item would be writtento, we can use the fragment owner map to send the state to a node whichstores that fragment locally. Which node stores which fragment is knownafter table creation, and we create the mapping then. After the node poolchanges, the mapping must be updated.One caveat is that if there are several choices of nodes for location toread/write, Mnesia will pick a node based on network latency. This means anode with slightly higher network latency would never normally be chosen,even if it is starved for work. Adding a work queue aware load balancer tothe system is one direction for future work.The mapping is created using Algorithm 2.In selecting an owner, we will choose from among hosts where the frag-ment is resident and will prefer hosts that have been passed over for selectionpreviously.373.6. Failure HandlingAlgorithm 2 Fragment owner mapping creation1: m : map of fragment to host2: seenBefore : map of host to counter3: hostsf : list of hosts4:5: procedure Owners( )6: m := {}7: seenBefore := {}8:9: for each fragment f do10: hostsf = mnesia : table info(f, where to write)11: if hostsf = [] then12: halt()13: end if14: sort hostsf on host h by value of seenBefore[h] descending15: m[f ] = hd(hostsf )16: for each host h in tl(hostsf ) do17: insert seenBefore[h] if needed18: Increment(seenBefore[h])19: end for20: end for21: return m22:23: end procedure383.6. Failure Handling3.6.3 Work MigrationIn PReach, a node’s work queue, represented as an Erlang list, is maintainedin memory as the first argument to the recursive function reach, whichimplements the loop body of Search(). PreachDB maintains this samestructure. The states yet to be explored are also stored in the Global WorkQueue, along with states we may have to re-explore if a crash were to occur.The Global Work Queue stored the states but does not maintain order. Thestates in the Global Work Queue are a superset of the states in any node’sin-memory work queue.If a crash occurs, the in-memory list is lost, but the Global Work Queueis preserved. On a restart, when a node starts up and has an empty in-memory queue, it will ask Mnesia to read a selection of states from theGlobal Work Queue. PreachDB grabs a random selection of keys in theGlobal Work Queue from the mnesia:all keys() function. Which statesare returned does not matter, because all states from the Global Work Queuewill eventually be fetched or otherwise explored. States are removed fromthe Global Work Queue when the number of acknowledgments outstandingis zero.In fact, whenever a node reaches an in-memory work queue size of zero,it will retrieve states from the Global Work Queue in the same fashion. Allstates from all nodes are inserted into the Global Work Queue, and any nodecould read and process (check invariants on and enumerate the successorsof) any state. When retrieving states that belong to a remote node, thereis an inefficiency in that the node doing the exploring does not read andwrite states that are locally resident on the disk. Each Mnesia transactionwill go over the network. However this does allow work stealing, by whicha node that is making progress quickly may run its local work queue tozero and then retrieve states owned by a slower node. This can help reducethe bogging down of one particular node. Reading from the Global WorkQueue when the in-memory work queue is empty essentially provides theload balancing features of PReach through Mnesia without additional code.Another tactic implemented in PReachDB to try to alleviate boggingdown is a limit on the in-memory queue size. PReachDB nodes will store upto 5,000 states in their work queues before spilling them off. When a stateis to be added to a work queue that is full, it is written to the Global WorkQueue as normal, but it is not written to the in-memory queue of the node.This data is not lost because it remains in the Global Work Queue. Sincethe computation does not end until the Global Work Queue is empty, andsince a node with an empty in-memory work queue will fetch states from393.7. Modified Algorithmthe Global Work Queue, that spilled off state will eventually be processed.3.7 Modified AlgorithmWe present in Algorithms 3 and 4 the modified DEMC algorithm includingthe additional message passing used in PReachDB.The definition of Owner(s) differs from that described in 2.3.1. InPReachDB, the owning node for a state can change over the course of thecomputation due to a node failing and its states being adopted by othernodes. Thus, we cannot use PReach’s static mapping from states to nodesthat was described in Section 2.3.1. Instead, states are dynamically mappedto nodes based on the fragment number for the state. See Algorithm Termination DetectionThe termination detection algorithm for PReachDB differs from PReach’sin that it uses just the global work queue size to determine if there is workoutstanding rather than the message counters and queue size of each node.Adding a node to the queue increases the queue size. A state is notremoved from the global work queue unless each of its successors has eitherbeen added to the global work queue or been processed already. A queuesize of zero means all start states and all states reachable from the startstates have been processed.|Q| = 0 (3.1)where Q is the global work queue.3.8 SummaryPReachDB builds on PReach to add fault tolerance for node failures. Itmaintains the same general structure of PReach while changing the datastructure accesses to be mostly database accesses. Saving data to disk andreplicating the data on multiple nodes allows us to recover the system whileit stays online. In addition to database accesses, the main changes arethe addition of message acknowledgments, the addition of sequence andepoch numbers to messages, the modification of the termination detectionalgorithm, and the addition of the procedure to add a new node to the nodepool.403.8. SummaryAlgorithm 3 PReachDB modified DEMC algorithm: per-node process1: pGT : persistent globally accessible set of states seen so far2: pGWQ : persistent globally accessible set of states yet to process3: pEpoch : persistent per-node epoch counter4: WQ : queue of states to process locally5: ACK : map of states to (seq, acknowledgments outstanding) tuples6:7: procedure Search( )8: ACK := {}9: WQ := []10: initialize pEpoch if needed11: Increment(pEpoch)12: if I am the root then13: initialize pGT and pGWQ if needed14: end if15: wait for Mnesia ready16: barrier()17: if I am the root then18: for each s ∈ initialStates do19: Send s to Owner(s)20: end for21: end if22: while ¬Terminated() do23: GetStates()24: if WQ 6= [] then25: s := Pop(WQ)26: if s 6∈ pGWQ then27: seq := 1 + (hd(ACK [s]) or 0)28: ACK [s] := (seq , |Successors(s)|)29: Check(s) . verify s satisfies specified safety properties30: for all s′ ∈ Successors(s) do31: s′c := Canonicalize(s′)32: Send state (s′c, s, seq , pEpoch,Self ()) to Owner(s′c)33: end for34: end if35: end if36: end while37: end procedure413.8. SummaryAlgorithm 4 PReachDB modified DEMC algorithm, continued1: procedure GetStates( )2: while there is an incoming message do3: if Receive ack (predecessor , seq , epoch) then4: if epoch = pEpoch and hd(ACK [predecessor ]) = seq then5: acks := Decrement(tl(ACK [predecessor ]))6: if acks <= 0 then7: Remove(predecessor , pGWQ)8: Remove(predecessor ,ACK )9: end if10: end if11: end if12: if Receive state (s, predecessor , seq , epoch, sender) then13: if s 6∈ pT then14: Insert(s, pT )15: Insert(s, pGWQ)16: if |WQ | < spill threshold then17: Append(s,WQ)18: end if19: end if20: Send ack (predecessor , seq , epoch) to sender21: end if22: if Timeout and pGWQ 6= [] then23: Append(RandomSubset(pGWQ),WQ)24: end if25: end while26: end procedure423.8. SummaryThe main benefit of PReachDB is that by using the Mnesia DBMS weget fault recovery with almost no additional code. We merely need to checkif the database already exists on system start. Another nice side effect ofusing a database is that the termination detection algorithm is now evensimpler than in PReach.43Chapter 4Results and PerformanceThe results of this project are the proof-of-concept demostrations of thefollowing scenarios.1. Terminate all nodes and restart without significant loss of progress.2. Terminate one node and have the rest of the nodes run to completionutilizing data replication.3. Terminate one node, replace it with a hot spare, and have the hotspare and the rest of the nodes run to completion.We also present a performance comparison between PReachDB andPReach.4.1 ResultsWe present here the experimental setup and the results from four test runsdemonstrating different features added in PReachDB.4.1.1 SetupThe machines used for testing were of these two configurations runningLinux.Processor MemoryIntel R© CoreTM2 Duo CPU E6550 @ 2.33GHz 4GBIntel R© Xeon R© CPU 5160 @ 3.00GHz 6GBThe machines were connected over the UBC Computer Science departmentinternal network, and average ping time was less than 0.1 ms. The Mnesiadatabase files were stored locally per machine. The code path was to ashared server filesystem.Each of the following tests was performed on the model filen5 peterson modified.erl. Fragment replication, where applicable, wasR = 2.444.1. Results4.1.2 System-Wide RestartThe purpose of this test was to demonstrate that the Mnesia disk tablesretain the data over a PReachDB restart. When all nodes are forced toexit, we can restart the entire system with all progress retained that wascommitted prior to the crash.For this test we ran PReachDB with N = 3 nodes and with table frag-ment replication enabled. We let PReachDB run through the initializationsequence and begin processing states from the model. We then ran a shellscript similar to the ptest start up script. The script connected to eachmachine via ssh and ran kill -9 on all Erlang processes running under ouraccount. We then ran ptest to restart all of the nodes.We verified that the total number of states in the Mnesia visited statetable on completion of the of the program (visited in aggregate over allprocessing sessions) matched that of our control run using PReach. Theprogram output the number of states visited in the most recent session,and it was less than the number of states contained in the Mnesia visitedstates table. This shows that PReachDB did not visit all states in themodel during the last session. PReachDB retained the progress made in theprevious session.4.1.3 Sequence Number on AcknowledgmentsThe purpose of this test was to demostrate the ACK counter decrementissue described in Section 3.5.1.We ran PReachDB (with N = 3 nodes and with table fragmentation)uninterrupted with and without the logic for sequence numbers on acknowl-edgments. We included a debug conditional to print a debug statement ifan ACK counter in the ACK table is decremented below 0. Without thesequence number implemented, we observed the debug print, and with thesequence number implemented, we did not.4.1.4 Single Node TerminationThe purpose of this test was to demonstrate that with distributed replica-tion enabled we could complete the model exploration with a single nodecompletely removed from the node pool part way through. With N = 3nodes and R = 2 fragment replication, the data on the node that is removedis still held on the other two nodes. The remaining two nodes should pro-vide the necessary data to complete the computation without pausing or454.2. Performancerestarting the system, as the remaining nodes combine have the same dataas the removed node.We ran PReachDB on three nodes, letting the initialization complete,and then ran kill -9 on all Erlang processes on a single machine. Weobserved the remaining two nodes continue to output debug messages duringexploration. The remaining two nodes processed the entire reachable statespace and output the same number of unique explored states as the controlrun without terminating a node.4.1.5 Hot SpareThe purpose of this test was to demonstrate that when one node dies, com-putation can continue after a hot spare replaces the dead node. After thehot spare adds itself to the node pool with fragmentron, Mnesia copiesover the fragments to the new node in the background. The hot spare waitsfor the Mnesia tables to be ready, updates the other nodes to send states toit, and then calls reach.We ran PReachDB on three nodes and used kill -9 on one node, similarto the previous test. We then sent a user command to pause the all nodes.The two remaining nodes responded and waited. We ran the script to adda spare node from a different machine, and when it was ready ran the usercommand to resume all nodes. Although the spare had not participatedin the work before, it had access to the same data as the dead node. Thethree nodes were able to complete and reported the same number of uniqueexplored states as the control.4.2 PerformanceA strong benefit of using Mnesia is how well it provides recovery from systemfaults. Recovery time for most operations appears to the user as practicallyinstant. Both restarting a node with a disk table and inserting a hot sparenode into the Mnesia node pool for fragment replication do not negativelyimpact the other nodes. Mnesia copies table fragments in the background tothe hot spare. This does have the possible effect of overloading the Mnesiasystem.Unfortunately, in the normal usage of PReachDB, Mnesia reported fre-quently that it was overloaded. These messages decreased when replicationwas reduced to R = 1, but once any node became bogged down the systembecame stuck.464.2. PerformanceFigure 4.1: Without Mnesia, time on logarithmic scale: There is a longperiod with very little progress as process 0 catches up.With Mnesia transactions, writes to remote nodes (required for repli-cation) over the network must complete in full before a node can continueprocessing. We use transactions for correctness because Mnesia cannot guar-antee ACID properties with dirty database writes. We relax this later inthis section to test the performance with dirty operations, which return assoon as the operation completes for at least one node in the Mnesia nodepool.4.2.1 Performance without ReplicationRunning PReachDB without the –mnesia flag causes it to run with thePReach data structures implemented at the time of code forking. Eachnode runs with a local visited state table and a local state queue. Thevisited state table is by default a bloom filter based on [1] with fixed capacityN = 40, 000, 000 and error probability E = 0.000001. The state queue ismaintained as a list of states sent as argument to the recursive functionreach. The acknowledgment table is not used.Figure 4.1 shows the time in seconds on a logarithmic scale per 10,000states visited and the Erlang process memory in MB (log scale) of a typical474.2. PerformanceFigure 4.2: Without Mnesia.run. Figure 4.2 shows the same information with time on a non-logarithmicscale. On the non-logarithmic time scale, the horizontal distance betweensuccessive data points is the time to process 10,000 states. Longer distancesmean a slower rate. There is a long gap on process 0 between 130,000and 140,000 states visited, and between those data points, process memoryincreases greatly. State queue size also jumps by an order of magnitudebetween those data points, from 2,308 to 34,354 states. The other processeswait for process 0 to catch up, and begin processing new states receivedfrom process 0 as soon as they are ready.The slowdown may an artifact of the older PReach code base, as currentPReach does not have this behavior. The load balancing added to PReachaddresses the bogging down of a particular node, which in turn helps thewhole system continue to make progress.Figure 4.3 shows the time and process memory of PReachDB with the–mnesia flag enabled. Progress is slow but steady. When –mnesia is enabled,the size of the in-memory state queue is limited to a fixed maximum size of5,000 states. When the in-memory state queue reaches that length, addi-tional states are written to the global work queue but not to the in-memoryqueue. This queue length limitation plus work queue stealing prevents anyone process from becoming bogged down and holding up the other processes.484.2. PerformanceFigure 4.3: With Mnesia: Progress is slow but steady.4.2.2 In-memory Queue SpillingOne possible reason for PReachDB’s slow performance, as suggested by oneof the authors of PReach, is Erlang’s inefficient list operations when thenumber of states in the per-node work queue is large. The Erlang docu-mentation warns that improperly implemented list operations will result ina O(n2) operation due to repeated list copying [12]. To test if performanceis impacted by this issue, we implemented queue spilling for the in-memorywork queue. This was previously discussed in Section 3.6.3.Without limiting the in-memory work queue size, PReachDB with –mnesia does not slow down the same as PReach when the in-memory workqueue of any node grows large. Figure 4.4 shows that the run time withoutlimiting the queue size is similar to the run time when it is limited. Thisimplies that the queue size limitation has no effect on performance. Thelist operations may be implemented as suggested in [12]. It is also possiblethat the Mnesia write/read bottleneck masks a slow down caused by thein-memory work queue size. If this effect is happening, then PReach woulddemonstrate the slow down while PReachDB would not.494.2. PerformanceFigure 4.4: With Mnesia and no limit on the in-memory queue size. Progressis similar to with limiting enabled.4.2.3 Replication FactorThere does not seem to be significant performance impact when running with–mnesia between a replication factor of R = 1 and R = 2. The experimentwe ran with R = 2 (Figure 4.5) finished sooner than for R = 1 (Figure 4.3).4.2.4 Work Queue ReplicationWe tried running PReachDB with Mnesia-backed non-global work queues,where the option local content on the work queue is set to true for eachnode. This inherently disallows work stealing between nodes.This exacerbated the bogging down problem. The fast nodes repeatedlysent states they had already sent. Since states are not removed from thework queue until all acknowledgments are received, the fast nodes wouldrevisit unacknowledged states in its work queue before the slow node couldsend its initial acknowledgment. With no work stealing, the slow nodesnever caught up and the reachability computation did not complete. Thesystem got stuck in livelock. By the time the slow node acknowledged astate from the fast node, the fast node had resent that state with a new504.2. PerformanceFigure 4.5: R = 2sequence number, and so the fast node discarded the acknowledgment.4.2.5 Dirty Table OperationsIn this section we compare the performance of PReachDB using dirty databasetable reads and writes to the visited state table and the global work queueversus using synchronous transactions. According to the Mnesia documen-tation, a dirty operation will not wait for changes to be fully replicated toall nodes, but will instead return as soon as one node completes the oper-ation. If the table fragment is resident on the node, this operation shouldtake less time than waiting for all remote nodes to acknowledge the replica-tion. “This still involves logging, replication and subscriptions, but there isno locking, local transaction storage, or commit protocols involved. Check-point retainers and indices are updated, but they will be updated dirty. ...A dirty operation does, however, guarantee a certain level of consistencyand it is not possible for the dirty operations to return garbled records. ...However, it must be noted that it is possible for the database to be left inan inconsistent state if dirty operations are used to update it.” [13]The dirty operations we perform are:1. Read from the visited state table if a state exists and if not insert the514.3. Summarystate into the visited state table and the global work queue.2. Read from the global work queue to determine if a state being pro-cessed has already been retired and is no longer in the global workqueue.3. Delete the state from the global work queue when its acknowledgmenttable counter reaches 0 outstanding ACKs.The main concern here is that with work stealing, it is possible andprobable for two nodes to explore the same state and read and write lo-cally without seeing the dirty operations of the other node. In the case ofPReachDB, this should not affect the correctness of the reachability com-putation, but it may do redundant work. The correctness claim is justifiedas follows.One problem with this approach is termination detection. An incorrectread of 0 states left in the global work queue could cause the nodes to attemptearly termination. We fix this by doing a synchronous read of the globalwork queue size if the dirty read returns 0 states left. The synchronous readreturns the actual number.Figure 4.6 shows that the performance is similar to that with syn-chronous transactions. Synchronous transactions adds a 21% overhead overdirty operations as implemented, with run times of 1435s for dirty operationsand 1740s for synchronous transactions.We also tried dirty operations with Mnesia in-memory only tables, shownin 4.7. This did not significantly affect performance versus saving to disk.4.3 SummaryWe tested PReachDB under a selection of simple fault scenarios. It suc-cessfully recovered from temporary system-wide node failure and from per-manent single node failures with and without adding a replacement node tothe node pool. However, performance of PReachDB is currently much worsethan that of PReach; thus it is not yet suitable for practical use.524.3. SummaryFigure 4.6: With Mnesia and dirty reads and writes to the visited state tableand the global work queue.534.3. SummaryFigure 4.7: With Mnesia and dirty reads and writes to the visited state tableand the global work queue. Both tables were held in memory and not ondisk by Mnesia.54Chapter 5Discussion and Future WorkDistributed explicit-state model checking attempts to provide an edge overthe state space explosion problem by distributing work across multiple com-pute nodes. As with other distributed computation, it is brittle in the faceof failures. It is a long-running, memory-intensive, communication-intensivedistributed computation. Some possible failures are:1. A node has a system failure.2. A node runs out of memory.3. In the case of persisted data, a node has a disk failure.4. There is a network partition.Ideally none of these happen while trying to verify a model with a largestate space. However, failures do occur in practice and there are measuresthat can be taken to save the progress of the system. Keeping track of whichstates are retired and which states are queued or pending in a durable wayis required for a solution. To recover from a complete node loss also requiresdata redundancy.In PReachDB, we implement redundancy and persistency through Mne-sia. Mnesia provides distributed database tables with configuration to enablereplication, table fragmentation, and disk storage. We are able to demon-strate recovery from a single node crashing in three ways: by restarting thenode, by inserting a hot spare new node, and by letting the remaining nodesfinish the computation without a replacement node.We do not implement explicit checkpointing, but rather use the defaulttransaction management through Mnesia, which does checkpointing withinthe Mnesia subsystem. One option for performance improvement would beto implement explicit checkpointing while doing fewer Mnesia operations.We do not address Byzantine failures, which can occur in PReachDB ina few ways. For instance, a node could make an error while verifying aninvariant predicate on a reachable state. A node could make an error in555.1. Implementation Improvementsapplying Successors(s) and potentially miss a large chunk of the reachablestate space. Or in the case of dirty database operations, an error occurringduring a dirty write could leave a record in an inconsistent state.We do however address the problems of how to handle messages beingdropped and how to handle messages when a node has been restarted. Theuse of acknowledgments also simplies the termination detection equation.As discussed in Section 4.2.1, the performance comparison between PReachand PReachDB is affected by an artifact of the older code base. CurrentPReach with load balancing would finish at least an order of magnitudefaster than presented in 4.2.1. Given that, the overhead of enabling dis-tributed tables in Mnesia is very high even without replication. Limitingthe in-memory work queue size did not seem to affect the performance.Using dirty database operations did marginally improve performance, butbased on [13] the expected improvement from using dirty operations shouldhave been higher. Likewise the impact of disk versus in-memory Mnesiatables was lower than expected, and suggests that using Mnesia at all maybe the problem.The following sections describe some avenues for improvements and fu-ture work.5.1 Implementation ImprovementsThe implementation was written assuming that the root node is alwaysreachable. The root node is used as the host for generating the list of activenodes when a hot spare is added. Even with this limitation, PReachDB ismore tolerant to node failures than PReach, which will fail when any singlenode fails.The implementation also assumes that the number of table fragmentsequals the number of compute nodes at table creation. When the programis started, the tables are created with a fixed number of fragments, and thisnumber is assumed to stay constant until completion. It would make thesystem more robust to allow the number of table fragments to grow or shrinkover time as the number of compute nodes changes.5.2 Automated RecoveryRecovering from system failures in PReachDB is a manual process and re-quires the user to run shell scripts. There are user commands to pause orresume the system, restart a node, or insert a new node into the node pool565.3. Performanceas a spare. It would be helpful for usage of the system and for runningexperiments to automate the recovery process. This would involve writingevent handlers to automatically trigger recovery using Erlang and Mnesiaevents.5.3 PerformanceThere are several places where the performance of PReachDB could be im-proved. There is already work in the literature that could be leveraged inmost of these cases. Further study of the Mnesia system and best practiceswhen dealing with it would also be beneficial.5.3.1 Load BalancingDuring normal usage of PReachDB, Mnesia reports that it is overloadedseveral times. The node from which the warning messages originate alsotends to get bogged down and progresses very slowly. Adding a work queueaware load balancer to the system is one direction for future work which hasbeen met with success in other model checkers. [5, 25]5.3.2 MessagesPReachDB is structured to send at least as many messages as there arereachable state transitions. The cost of communication is particularly im-portant because it is higher for PReachDB than for PReach. PReachDBis designed to run on separate machines due to disk IO, while PReach hasno disk IO and can run multiple nodes on one machine with multithread-ing. Given that [5] found multiple factor speedup by batching messages intogroups of 100 or 1000, applying the same method to PReachDB should yieldsizable performance improvement.5.3.3 Memory and Disk UsageBatching can also be applied to writes to Mnesia. The implementationwrites states to the Global Visited State Table and the Global Work Queueas it explores, but it would be possible to decouple exploration from savingby using different threads for the two tasks. This would allow the explorerto run ahead. In the event of a failure any work done by the explorer thatwas not saved would need to be repeated. Batching writes may fit more575.4. Conclusionnaturally into PReachDB than explicit checkpointing, since we are alreadydoing transactions with single writes.We could decrease both memory and disk footprint by using shorter keysto the Mnesia tables. The key is currently the full state descriptor Erlangobject.Another method we could try is to use an in-memory Erlang ets cacheas the first line of defense before reading from Mnesia. If the cache doeshave the item, then we do not need to do the lookup through Mnesia andwe can avoid resending successors states which have recently been sent.We could potentially try a different approach using Mnesia as a dis-tributed log of operations done on in-memory hash tables. There may be anice way to use the Murphi in-memory hash table, which is tuned for thisproblem. Writes to the log can be batched. Any operations not written tothe log and lost when a node crashes can be rediscovered by replaying thelog and continuing from there.5.4 ConclusionPReachDB adds the fault tolerance capabilities of redundancy and persis-tence to the PReach distributed explicit-state model checker. It does thisthrough use of the Mnesia distributed database system for Erlang. Thisproject provides demonstration that PReachDB can recover from faults bothoff- and on-line. The overhead of the proof-of-concept implementation ex-plored here is too high to recommend it for practical usage. Many of the pos-sible performance improvements suggested are equally applicable to PReachas to PReachDB. We currently recommend using PReach and investigatingperformance improvements that could be applied to both model checkers.58Bibliography[1] Paulo Se´rgio Almeida, Carlos Baquero, Nuno Preguic¸a, and DavidHutchison. Scalable bloom filters. Information Processing Letters,101(6):255–261, 2007.[2] O. Amble and D. E. Knuth. Ordered hash tables. The ComputerJournal, 17(2):135–142, 1974.[3] Joe Armstrong. Erlang - A survey of the language and its industrialapplications. The Ninth Exhibitions and Symposium on Industrial Ap-plications of Prolog, 1996.[4] Gerd Behrmann. A Performance Study of Distributed Timed AutomataReachability Analysis. Electronic Notes in Theoretical Computer Sci-ence, 68(4):486–502, 2002. PDMC 2002, Parallel and Distributed ModelChecking (Satellite Workshop of CONCUR 2002).[5] Brad Bingham, Jesse Bingham, Flavio M. de Paula, John Erickson,Gaurav Singh, and Mark Reitblatt. Industrial Strength DistributedExplicit State Model Checking. Parallel and Distributed Methods inVerification, 2010 Ninth International Workshop on, and High Perfor-mance Computational Systems Biology, Second International Workshopon, 0:28–36, 2010.[6] Eric Brewer. Cap twelve years later: How the “rules” have changed.Computer, 45(2):23–29, 2012.[7] Eric A Brewer. Towards robust distributed systems. In PODC, page 7,2000.[8] R.E. Bryant. Graph-Based Algorithms for Boolean Function Manipu-lation. IEEE Transactions on Computers, C-35(8):677–691, 1986.[9] J.R. Burch, E.M. Clarke, K.L. McMillan, D.L. Dill, and L.J. Hwang.Symbolic model checking: 1020 states and beyond. Information andComputation, 98(2):142–170, 1992.59Bibliography[10] Giuseppe Della Penna, Benedetto Intrigila, Enrico Tronci, and MarisaZilli. Exploiting Transition Locality in the Disk Based Murφ Verifier.In Formal Methods in Computer-Aided Design, volume 2517 of LectureNotes in Computer Science, pages 202–219. Springer Berlin / Heidel-berg, 2002.[11] David L. Dill, Andreas J. Drexler, Alan J. Hu, and C. Han Yang. Pro-tocol Verification as a Hardware Design Aid. In IEEE InternationalConference on Computer Design: VLSI in Computers and Processors.ICCD ’92. Proceedings., pages 522–525, October 1992.[12] Ericsson AB. Efficiency Guide User’s Guide, 6.1 edition. http://www.erlang.org/doc/efficiency_guide/users_guide.html.[13] Ericsson AB. Mnesia Reference Manual, 4.12.1 edition. http://www.erlang.org/doc/man/mnesia.html.[14] Ericsson Utvecklings AB. MNESIA User’s Guide, 3.9.2 edi-tion. http://www.erlang.org/documentation/doc-5.0.1/lib/mnesia-3.9.2/doc/html/part_frame.html.[15] Message Passing Forum. MPI: A Message-Passing Interface Standard.Technical report, University of Tennessee, Knoxville, TN, USA, 1994.[16] Felix C Ga¨rtner. Fundamentals of fault-tolerant distributed comput-ing in asynchronous environments. ACM Computing Surveys (CSUR),31(1):1–26, 1999.[17] The Open Group. Information technology - Portable Operating SystemInterface (POSIX). ISO/IEC/IEEE 9945 (First edition 2009-09-15),pages c1–3830, Sept 2009.[18] Gerard J. Holzmann. On limits and possibilities of automated proto-col analysis. In Protocol Specification, Testing, and Verification. 7thInternational Conference, pages 339–44, 1987.[19] Gerard J. Holzmann. An Analysis of Bitstate Hashing. Formal Methodsin System Design, 13:289–307, 1998.[20] Alan J. Hu, Gary York, and David L. Dill. New techniques for efficientverification with implicitly conjoined BDDs. In Proceedings of the 31stannual Design Automation Conference, DAC ’94, pages 276–282, NewYork, NY, USA, 1994. ACM.60Bibliography[21] C. Norris Ip and David L. Dill. Better Verification Through Symme-try. In Proceedings of the 11th IFIP WG10.2 International Conferencesponsored by IFIP WG10.2 and in cooperation with IEEE COMPSOCon Computer Hardware Description Languages and their Applications,CHDL ’93, pages 97–111, 1993.[22] Valerie Ishida, Brad Bingham, and Flavio M. de Paula. PReachDB.https://github.com/ishidav/PreachDB.[23] J. Kuskin and D. Ofelt, et al. The Stanford FLASH multiprocessor. InProc. of SIGARCH 1994, pages 302–313, 1994.[24] H˚akan Mattsson, Hans Nilsson, and Claes Wikstrom. Mnesia - A Dis-tributed Robust DBMS for Telecommunications Applications. FirstInternational Workshop on Practical Aspects of Declarative Languages,1999.[25] R. Kumar and E. Mercer. Load balancing parallel explicit state modelchecking. Proc. of PDMC 2004, volume 128 issue 3 of Electronic Notesin Theoretical Computer Science, pages 19–34, 2004.[26] F. Lerda and R. Sisto. Distributed-memory model checking with SPIN.Proc. of SPIN 1999, volume 1680 of LNCS. Springer, 1999.[27] Igor Melatti, Robert Palmer, Geoffrey Sawaya, Yu Yang, Robert M.Kirby, and Ganesh Gopalakrishnan. Parallel and Distributed ModelChecking in Eddy. SPIN, 2006.[28] Igor Melatti, Robert Palmer, Geoffrey Sawaya, Yu Yang, Robert M.Kirby, and Ganesh Gopalakrishnan. Parallel and Distributed ModelChecking in Eddy. STTT, 2008.[29] Paul Mineiro. Fragmentron. http://code.google.com/p/fragmentron/.[30] G. Della Penna, B. Intrigila, I. Melatti, E. Tronci, and M. VenturiniZilli. Exploiting transition locality in automatic verification of finite-state concurrent systems. International Journal on Software Tools forTechnology Transfer (STTT), 6(4):320–341, 2004.[31] S. Ghemawat, et al. The Google file system. SOSP, 2003.[32] H. Sivaraj and G. Gopalakrishnan. Random walk based heuristic al-gorithms for distributed memory model checking. In Proc. of PDMC612003, volume 89 issue 1 of Electronic Notes in Theoretical ComputerScience, pages 51–67. Elsevier, 2003.[33] Ulrich Stern and David Dill. Using magnetic disk instead of main mem-ory in the Murφ verifier. In Computer Aided Verification, volume 1427of Lecture Notes in Computer Science, pages 172–183. Springer Berlin/ Heidelberg, 1998.[34] Ulrich Stern and David L. Dill. A New Scheme for Memory-EfficientProbabilistic Verification. In Joint International Conference on FormalDescription Techniques for Distributed Systems and CommunicationProtocols, and Protocol Specification, Testing, and Verification, pages333–348, 1996.[35] Ulrich Stern and David L. Dill. Parallelizing the Murφ Verifier. InComputer Aided Verification. 9th International Conference, pages 256–267. Springer-Verlag, 1997.[36] Enrico Tronci, Giuseppe Della Penna, Benedetto Intrigila, and MarisaZilli. Exploiting Transition Locality in Automatic Verification. In Cor-rect Hardware Design and Verification Methods, volume 2144 of LectureNotes in Computer Science, pages 259–274. Springer Berlin / Heidel-berg, 2001.[37] Pierre Wolper and Denis Leroy. Reliable hashing without collisiondetection. In Costas Courcoubetis, editor, Computer Aided Verifica-tion, volume 697 of Lecture Notes in Computer Science, pages 59–70.Springer Berlin / Heidelberg, 1993.62CodeThe PReachDB code is hosted on Github. See [22].63Tabular DataFor each table, time is in seconds and memory is in MB.64Tabular DataTable A1: Data for Figures 4.1, 4.2# states visitedproc 0 proc 1 proc 2time memory time memory time memory0 0.0 4.7 0.0 4.6 0.0 4.510000 2.3 5.7 5.9 13.6 6.0 13.620000 4.7 7.4 12.5 20.1 12.3 20.130000 7.2 7.4 19.2 31.2 19.0 31.240000 9.8 7.4 25.0 31.2 24.9 31.250000 12.5 7.4 30.7 31.2 30.6 31.260000 15.1 7.4 36.6 31.2 36.6 31.270000 17.7 7.4 42.1 31.2 42.2 31.280000 20.4 7.4 47.7 31.2 47.7 31.290000 25.1 17.2 52.1 31.2 52.2 31.2100000 30.1 22.0 56.5 31.2 56.6 31.2110000 32.8 17.1 60.6 31.2 60.7 31.2120000 37.6 17.1 64.7 31.2 64.9 31.2130000 43.5 11.9 69.0 31.2 69.1 31.2140000 1479.4 235.0 73.0 31.2 73.1 31.2150000 1481.7 235.0 76.9 31.2 77.1 31.2160000 1483.4 235.0 80.6 31.2 80.9 31.2170000 1485.1 235.0 84.2 31.2 84.4 31.2180000 1486.8 235.0 1510.5 35.8 1649.8 13.7190000 1658.6 235.0 1657.1 22.8 1659.8 5.665Tabular DataTable A2: Data for Figure 4.3# states visitedproc 0 proc 1 proc 2time memory time memory time memory0 0.0 5.0 0.0 4.9 0.0 4.810000 220.0 7.7 78.6 7.6 74.0 6.620000 426.2 12.2 160.2 7.6 151.2 9.330000 641.0 12.2 243.4 15.4 229.1 9.340000 840.5 12.2 326.8 12.1 308.2 9.350000 1039.1 12.2 410.0 12.1 387.2 9.360000 1227.7 12.2 494.1 12.1 465.6 9.370000 1417.0 12.2 579.4 12.1 546.7 9.380000 1593.4 12.1 664.3 12.1 631.8 9.490000 1718.6 12.2 750.6 12.1 713.9 9.3100000 1881.3 7.7 842.6 58.5 802.6 81.9110000 2040.9 9.4 932.1 97.4 888.9 81.9120000 1016.2 164.9 985.4 81.9130000 1109.9 343.2 1078.7 82.0140000 1195.7 352.7 1168.2 81.9150000 1285.2 58.5 1257.2 82.0160000 1368.8 116.0 1359.9 81.9170000 1458.4 116.0 1453.7 81.9180000 1541.9 82.1 1548.6 81.9190000 1632.5 97.5 1657.6 82.0200000 1720.7 236.1 1759.8 81.9210000 1802.1 74.6 1856.3 82.0220000 1888.8 36.2 1951.1 81.9230000 1959.2 53.7 2030.6 81.9240000 2034.5 7.766Tabular DataTable A3: Data for Figure 4.4# states visitedproc 0 proc 1 proc 2time memory time memory time memory0 0.0 4.9 0.0 4.9 0.0 4.910000 55.9 5.6 491.8 69.1 57.4 8.420000 113.0 5.3 747.8 100.2 115.6 6.530000 172.4 5.2 973.6 97.4 175.0 6.640000 233.2 7.7 1173.1 97.3 234.1 6.650000 293.6 7.7 1359.9 97.4 295.4 5.160000 356.6 45.8 1519.6 97.4 359.8 58.570000 420.2 6.7 1658.5 97.4 427.3 99.480000 490.8 7.7 1699.7 97.4 493.2 82.090000 552.7 7.7 1719.1 97.4 559.5 115.9100000 622.4 6.7 1730.7 97.4 627.4 69.2110000 702.4 6.0 1737.0 97.4 698.7 116.0120000 775.7 7.8 1741.9 97.4 769.5 116.0130000 857.1 7.8 1746.4 97.4 842.1 82.1140000 928.2 6.7 906.5 116.0150000 999.6 7.7 979.2 82.1160000 1069.9 6.7 1056.1 156.6170000 1138.5 7.8 1125.9 7.7180000 1213.0 7.7 1200.6 179.1190000 1288.8 6.7 1269.2 215.0200000 1363.3 7.8 1338.4 6.1210000 1446.1 50.6 1405.8 69.2220000 1519.9 50.6 1481.5 48.7230000 1600.3 50.7 1549.7 32.3240000 1670.5 50.6 1614.8 7.81675.3 30.367Tabular DataTable A4: Data for Figure 4.5# states visitedproc 0 proc 1 proc 2time memory time memory time memory0 0.0 5.0 0.0 4.9 0.0 4.910000 56.1 6.0 514.4 12.1 57.9 7.620000 113.6 5.9 757.4 12.1 118.2 7.630000 172.3 5.5 976.9 12.2 178.1 7.740000 232.9 7.8 1167.8 12.2 238.1 6.650000 292.4 7.7 1353.7 12.1 300.1 5.160000 354.2 5.1 1491.4 12.1 363.3 97.870000 416.7 7.7 1617.9 44.0 428.0 144.780000 480.0 7.7 491.9 97.590000 544.3 7.7 558.5 116.0100000 612.6 6.0 630.5 130.2110000 680.9 7.7 698.0 176.9120000 749.2 7.7 766.3 227.7130000 820.2 64.0 836.2 172.7140000 896.5 7.7 910.2 177.0150000 974.8 7.7 994.6 287.5160000 1048.6 7.8 1063.3 266.7170000 1123.8 6.0 1134.6 116.0180000 1202.6 7.7 1209.9 262.8190000 1275.0 6.1 1282.6 123.5200000 1347.7 58.5 1353.2 69.2210000 1419.3 58.5 1423.0 69.2220000 1497.5 58.5 1497.5 58.5230000 1568.5 58.5 1564.0 32.3240000 1642.2 58.5 1631.3 31.6250000 1706.7 58.5 1697.6 14.168Tabular DataTable A5: Data for Figure 4.6# states visitedproc 0 proc 1 proc 2time memory time memory time memory0 0.0 4.8 0.0 4.7 0.0 4.810000 32.1 5.9 112.9 12.0 33.0 6.520000 66.0 8.1 224.8 12.0 68.0 9.330000 100.5 8.1 319.6 12.0 103.4 12.740000 135.6 8.0 384.1 12.4 139.2 9.350000 170.7 8.1 431.1 12.4 175.4 9.260000 205.7 6.7 478.4 12.4 212.0 9.370000 241.5 8.0 524.0 15.8 249.3 9.380000 278.3 8.0 560.9 15.8 283.5 6.590000 316.6 7.0 608.3 15.7 323.6 9.7100000 356.6 9.7 656.3 7.9 372.0 8.3110000 414.3 8.0 720.8 86.7 431.7 8.3120000 474.6 8.0 826.0 116.2 493.7 6.6130000 555.4 9.8 999.9 116.2 564.5 7.2140000 614.8 8.4 1134.3 7.9 646.5 8.7150000 679.1 8.0 1271.1 82.2 750.3 154.8160000 766.5 8.0 1381.9 17.6 883.5 62.5170000 934.0 8.0 981.2 161.3180000 1067.2 7.0 1051.9 97.8190000 1213.1 58.8 1157.9 69.4200000 1307.4 58.8 1221.4 44.6210000 1372.4 58.8 1323.9 49.0220000 1387.2 14.369Tabular DataTable A6: Data for Figure 4.7# states visitedproc 0 proc 1 proc 2time memory time memory time memory0 0.0 5.1 0.0 4.8 0.0 4.910000 28.1 6.7 82.6 7.5 28.3 6.520000 57.9 9.4 167.8 7.5 58.2 9.330000 88.6 9.4 254.8 12.1 88.9 9.340000 119.3 9.4 313.6 12.1 119.9 9.350000 151.1 9.4 357.8 12.0 151.7 9.360000 182.2 9.5 403.7 12.1 182.9 9.370000 214.0 9.4 445.8 12.1 214.8 9.380000 247.3 9.5 488.8 12.1 246.5 9.390000 282.4 7.7 535.1 12.1 282.5 7.6100000 332.8 7.8 569.1 12.1 326.5 7.6110000 383.1 6.7 645.9 6.5 384.7 6.3120000 442.1 82.2 779.7 69.1 439.4 8.3130000 501.6 141.0 877.8 178.6 501.0 119.5140000 574.2 82.1 1048.1 164.9 572.8 6.6150000 684.9 82.1 1161.3 5.9 660.1 115.9160000 829.7 82.1 1261.0 58.4 781.9 82.1170000 974.3 129.6 1386.7 58.4 950.6 122.3180000 1121.3 82.1 1055.8 52.0190000 1245.3 82.1 1180.2 58.4200000 1360.1 82.1 1359.1 26.2210000 1459.0 82.1 1442.5 14.070


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items