Verifying a Self-Timed Division Chip by Tarik Ono-Tesfaye Diplom, Universitat Wiirzburg, Germany,1995 A THESIS S U B M I T T E D IN P A R T I A L F U L F I L L M E N T O F T H E R E Q U I R E M E N T S F O R T H E D E G R E E O F Master of Science in T H E F A C U L T Y O F G R A D U A T E STUDIES (Department of Computer Science) We accept this thesis as conforming to the required standard The University of British Columbia October 1997 © Tarik Ono-Tesfaye, 1997 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall hot be allowed without my written permission; Department of CQ?1?U (fcR. £Q£)JC£ t h e University of British Columbia Vancouver, Canada Date OO l L < , , DE-6 (2/88) Abstract The formal verification of asynchronous circuits is challenging, because they often have non-deterministic t iming relations between circuit components. Th i s thesis presents a formal proof strategy for verifying the correctness of a dual-rai l , asyn-chronous divider chip. For that purpose, this thesis wi l l extend t iming verification techniques that were originally developed for combinat ional logic to apply to dual-rai l , self-timed designs. The proof strategy is refinement based and employs four progressively more detailed models of the divider. K e y safety properties are first established for the abstract models using model checking. These properties are then shown to hold for the more detailed models by verifying refinement between adjacent levels in the model hierarchy. Contents Abstract ii Contents i" List of Tables vi List of Figures vii Acknowledgements viii 1 Introduction 1 1.1 Related Work 2 1.2 Overview of Thesis 4 2 Background 5 2.1 The Division Chip 5 2.2 Synchronized Transitions 8 2.3 Definitions 9 2.3.1 State Sets and State Transition Relations 9 2.3.2 Speed Independence, Invariants and Safety Properties . . . . 10 2.3.3 Refinement 12 iii 3 Proof Strategy 1 6 4 The Word Level Models 19 4.1 Word Level Programs 19 4.2 Speed-Independent Word Level Program 22 4.3 Timed Word Level Program 23 4.3.1 Time Model 24 4.3.2 The Program 25 4.3.3 Simulation 26 4.3.4 Non-Zenoness 27 4.4 Proof of Refinement 27 5 The Transistor Level Model 31 5.1 Description of the Model 32 5.1.1 Data Representation 33 5.1.2 Implementation Details 37 5.2 Definitions 39 5.3 Proof of Refinement 47 5.3.1 The Abstraction Mapping A t l ) W l 48 5.3.2 Overview of the Proof 51 5.3.3 Timing Graph Q ( p t i , w ( T ^ ) ) \ ^ M i ) 5 3 5.3.4 The Abstract Programs P a b 3 > s i and P a b 3 55 5.3.5 Pti is a refinement of P a b 3 57 5.3.6 P t i is a refinement of P w l 62 6 Conclusion 65 6.1 Future Work 66 iv Bibliography Appendix A Source Code List of Tables 4.1 Speed-Independent Div ider P rogram 22 4.2 T i m e d W o r d Level P rogram 25 5.1 Transistor Level P rogram 33 5.2 D u a l - R a i l Fu l l -Adder 40 v i List of Figures 2.1 Array of Stages "6 2.2 Structure of an Arithmetic Element 7 3.1 Proof Strategy 17 4.1 State Transition of the Arithmetic Array 20 4.2 Timing Diagram 21 5.1 Structure of An Arithmetic Stage 32 5.2 Inputs and Outputs of An Arithmetic Stage 34 5.3 Bit-Adder for Sum Bit 36 5.4 Bit-Adder for Carry Bit 36 5.5 2-Bit Full-Adder 41 5.6 Dependency Graph 45 5.7 Refinement Proof 52 5.8 The Abstract Stage 56 5.9 Hybrid Program 58 vii Acknowledgements I am very grateful to D r . M a r k Greenstreet for all his invaluable help and suggestions in getting this thesis done. Throughout the last two years he has been a very helpful and understanding supervisor. I 'd also like to thank Chr i s toph K e r n for the support in using the proof-checker and my second reader, D r . A l a n H u , for his t ime and patience, M a n y thanks to my parents as well , who have always been there for me. T A R I K O N O - T E S F A Y E The University of British Columbia October 1997 v i i i Chapter 1 Introduction Recent well publicized and exorbitantly expensive mishaps due to software or hard-ware bugs such as the Intel Pentium floating point divider or the European Space Agency's Ariane 5 rocket have demonstrated the motivation for more formal, math-ematically rigorous verification methods in hardware and software design — either as a supplement to conventional methods such as testing and simulation or as a replacement of those. In the hardware field, formal verification has started to gain acceptance in the past few years. However, not many industry relevant examples exist yet. Asyn-chronous circuits, which often have non-deterministic timing relations between cir-cuit components, are especially challenging. Asynchronous circuits do not have a global clock to synchronize components; instead, signals that coordinate the actions of different components are derived internally. Self-timed designs have the poten-tial of being faster than their synchronous counterparts because components in a self-timed design don't have to wait for a clock signal before starting their computa-tion; more modular because communication is regulated by handshaking protocols 1 rather than global timing constraints; and more intellectually challenging because self-timed design is not as widespread as synchronous approaches. This thesis will identify a four level, refinement based, formal proof strategy for a dual-rail, asynchronous divider chip. The feasibility of the strategy will be demonstrated by proving two of the refinement relations. Although asynchronous circuits are often speed-independent, performance considerations have motivated a divider chip design that exploits timing relationships between different components. In verifying such a circuit, a method needs to be developed for introducing time into the specification and for formulating and verifying timing constraints. This thesis will extend timing verification techniques that were originally developed for combinational logic to apply to dual-rail, self-timed designs. In the division chip considered here, the fact that the computation iterates around an array of arithmetic elements presents additional complexity to the timing verification, as the timing dependencies are circular. To overcome that, the proof strategy exploits the fact that the dual-rail, precharged circuit has monotonic transitions in the precharging and evaluation phase. Synchronized Transitions, a concurrent programming language, was used to model the division chip, as it provided a natural way to describe the non-deterministic next state relation of the self-timed chip. 1.1 Related Work After the Intel Pentium floating point divider bug, several papers were published that dealt with the verification of SRT division and synchronous division chips. Ruefi et al. [13] verified the general SRT division theory using the theorem proving system P V S [12]. They then extend the proof to a simple SRT divider implemen-2 tation. Bryant [3] demonstrates how BDD-based verification can be used to verify one iteration of a radix-4 SRT division at the bit level. He first generates a "checker circuit" that determines if the inputs and outputs of the divider, modeled at the gate level, satisfy the desired arithmetic properties. "Checker circuits" have the advantage that they can be reused, but they can be much larger than the actual circuit that is being verified and errors can be introduced in their construction. Both of these papers deal with clocked circuits without timing constraints between components that need to be verified. A methodology for designing and analyzing speed-independent asynchronous circuits is described by Spars0 and Staunstrup [15]. Their design technique uses a compositional approach, where pipelines and rings are composed to create multi-ring structures. They show that is possible to analyze the performance of such multi-ring based asynchronous circuits, even for non-trivial designs. Lee et al. [9] show how more general asynchronous circuits can be modeled in ST (see 2.2) and then verified using BDDs. Their verification is based on the automatic derivation of an invariant that guarantees a desired safety property of the circuit. The first part of the proof in this thesis is based on their approach. However, for the larger divider model used here, their approach is impractical, as will be shown later. There have been other asynchronous SRT division chip implementations de-scribed in the literature. Williams and Horowitz [20] describe a modified divider chip from the one considered in this thesis. Their design uses five arithmetic stages rather than three, which allows overlapping execution in the stages. In their five-stage design, control and precharging overhead are fully overlapped by evaluation of quotient bits and partial remainders. The authors claim that this leads to "zero-overhead" : an iterative design that is much smaller than a full combinational array 3 and divides as quickly. The precharging and return-to-zero encoding of this dual-rai l dynamic circuit leads to increased power consumption. For that reason, Ma t sub -ara and Ide [11] combine a single-rail static circuit wi th a dual-rai l dynamic circuit wi thout computat ion t ime penalty. T i m i n g verification has attracted much attention in the real-time systems community. In real-time systems the correctness of the system depends not only on the value of the result of a computat ion, but also on the t ime at which the result was computed [16]. A self-timed chip that is not speed-independent represents by this definition a real-time system. Henzinger et a l . [8] extend temporal logic to allow the verification of t iming constraints in real-time systems while Ber thomieu and Diaz [2] use time Petri nets, an extension of Pe t r i nets wi th lower and upper t iming bounds for transitions, for modeling and verifying real-time systems. 1.2 Overview of Thesis The next chapter introduces notat ion and terminology used in this thesis and de-scribes the divider chip. Chapter 3 then delineates the proof strategy developed for proving the chip correct. Chapter 4 describes the asynchronous word level models of the divider and the proof that the t imed model is a refinement of the speed-independent model. The detailed transistor level model of the divider is presented in chapter 5 and it is proven that it is a refinement of the t imed word level model. Chapter 6 summarizes the results of this thesis and suggests possible extensions to this research. 4 Chapter 2 Background This chapter describes the division chip verified in this thesis and establishes the theoretical foundations on which this thesis is based. The SRT division chip is described in the next section and the following section gives a short introduction to ST, the language in which the divider was modeled. Section 2.3 then defines terminology used in this thesis. 2 . 1 The Division Chip The divider chip implemented by Ted Williams et al. [18] implements radix-2 SRT division for floating point numbers. In SRT division, unlike the common "paper and pencil" method, the quotient digits are chosen out of a redundant digit set. This means that there is more than one representation of the quotient. In irredundant division it is necessary to compute the entire partial remainder before the next quotient digit can be determined. The main advantage of using a redundant quotient representation is speed. Because of the redundancy, an approximation of the partial remainder can be used to compute the next quotient digit. It is thus not necessary to 5 propagate the carry value across the entire length of the remainder, but rather only a fraction thereof (see [5] for details). A "small mistake" in the choice of the quotient digits due to the imprecise approximation can be corrected in later iterations. Quotient Output Registers Dividend Arithmetic Element 0 Quotient Remainder Arithmetic Element 1 Quotient Arithmetic Element 2 Quotient Remainder Figure 2.1: Array of Stages The divider chip consists of an array of three arithmetic elements (hence-forth also referred to as stages), over which the computation is iterated (see fig-ure 2.1). Each arithmetic element determines one digit of the quotient and the next partial remainder. The cyclic computation around this array continues until the arrival of a completion signal, which is generated by the shift registers that hold the quotient digits. All the signals that control the speed of the elements are derived from within the array. There is no global clock; thus, the array is self-timed. For performance, the arithmetic elements use precharged function block logic. Each element can be in one of the three states, precharging, evaluating or hold-ing. When a stage is precharging, all its signals are reset to low. During the evaluation state, the stage determines the next quotient digit and the next partial remainder. A stage is in the holding state when it is simply holding its output 6 unchanged for its successor stage. When one stage completes its evaluation (i.e. it has determined the next partial remainder and the next quotient digit), precharging in its predecessor stage is enabled. Furthermore, the precharge signal of its successor stage is removed to enable evaluation in that stage. For example when stage 1 has determined its partial remainder and its quotient digit, precharging in stage 0 is started. Next, the precharge signal of stage 2 is removed to start the evaluation. This control logic is explained in more detail in section 4.1. P b(i) • sum(i-l) ' carry(i-l) • divisor • 0 • - divisor " qiiotient(i-l) • M U X C S A C P A sum(i) carry(i) Q S L quotient(i) Figure 2.2: Structure of an Arithmetic Element Figure 2.2 depicts the structure of the arithmetic elements. They consist of a multiplexer, a 55-bit wide carry-save adder, a 3-bit wide carry-propagate adder and the quotient selection logic. During evaluation, the multiplexer chooses — based on the quotient digit of the previous stage — either the divisor, the negative of the divisor or zero. The multiplexer output is added in the carry-save adder to the carry and sum from the previous stage. This result is the new remainder. When the top three bits of both carry-save adder outputs (the carry and the sum output) have been determined, their sum is computed in the carry-propagate adder. The 7 quotient selection logic then determines the next quotient digit based only on these three digits (see [19] for details). 2 . 2 Synchronized Transitions A l l the divider chip models were wri t ten in Synchronized Transitions (ST) , a hardware description language (see [17]). In S T , a circuit is modeled as a collection of state variables and a set of transitions. A transitions has the form Ti =< Gi -»• Mi >, where Gi is the guard and M,- the multi-assignment of the transi t ion. The guard is a Boolean valued expression and the multi-assignment can be executed only if the guard is satisfied. A transi t ion is said to be enabled, i f its guard is satisfied. For example <C a A b —> x, y := y, x » is a transit ion wi th the guard a A b and the multi-assignment x,y •:= y,x. The transit ion is enabled to swap the values of x and y in program states where a Ab holds. Transitions are executed atomically, i.e. the evaluation of the guard and performing the multi-assignment is a single indivisible operation. Transit ions can be combined using the asynchronous combinator, | |. A t each step in program execution, a transi t ion is selected non-deterministically from amongst those that are enabled. For example, given the following program: < a ->• c := d > || < 6 - > - c : = e > , 8 if only one of a or 6 holds, the corresponding multi-assignment will be executed. If however both a and b hold, one of the transitions will be chosen arbitrarily. ST provides two other combinators that can be used in synchronous models. These are not used in this thesis. A description of these operators is given in [17]. 2.3 Definitions 2.3.1 State Sets and State Transition Relations Definition 1 (State Space) The state space V of a ST program P with the vari-ables VQ, VI, ..., vn is defined as D(v0) X D(v{) x . . . X D(vn), the cross product of the variable domains. Definition 2 (Initial State Predicate and Initial State Set) An initial state predicate QQ is a Boolean predicate over the state space V. The initial state set Q is the subset ofD that satisfies QQ. A state that satisfies the initial state predicate is an initial state. Definition 3 (State Transit ion Relation) Given a program TQ \\T\\\ ... \\ Tm and a set of states S C V, the state transition relation TZ C V X V is defined as: Vs,s 'e<S : (s,s')eTZ 3{ : Gi(s) As - Mi(s), where Gi denotes the guard and Mi the multi-assignment of transition The set of reachable states is defined as the subset of the state space that can be reached from a state satisfying QQ after executing a finite number of transitions: 9 Definition 4 (Reachable States) Let P be a ST program with the initial state set Q C V and the state transition relation TZ. The set of reachable states, S, is defined recursively by: • IfseQ then s G S. • If s G S and (s, s') 6 TZ, then s' £ S. • No other states are in S. 2.3.2 Speed Independence, Invariants and Safety Properties In the following definitions P denotes a ST model of a circuit with • the set of transitions T = {T0, Tu ..., Tn} where T{ =< G{ M t > ; the transitions are combined asynchronously, • the set of reachable states <S, • the state transition relation TZ defined by S and T, • the initial state predicate of the program QQ. A variable is read by a transition T,-, if it appears in its guard G; or on the right side of its multiassignment Mi. A variable is written by a transition T;, if it appears on the left side of its multiassignment Mi. Let r[Ti) denote the set of variables read by transition T,- and w(Ti) the set of variables written by transition Ti. A property of P is an invariant of P if once it holds in one state of P it holds in all subsequent states of P. A safety property of P is a predicate that holds in every reachable state of P. This is formalized in these two definitions: 10 Definition 5 (Invariant) I is an invariant of program P iff V(s,s')-eK : I(s) =>• I(s') Definition 6 (Safety Property) Q is a safety property of program P iff VseS : Q(s) A program P is said to be speed-independent i f once a transi t ion is en-abled, it remains enabled unt i l it performs some action and its multiassignment is not affected by other transitions. M o r e formally: Definition 7 (Speed Independence) A program P is speed-independent iff V s , s eS : V ? = 0 : (s,sO € n A Gi(s) A (s ^ Mi(s)) (8 = Mi(8)) V (Gi(s') A V i / € w(Ti) : v(Mi(s)) = v(Mi(s'))) The stable inputs property implies speed-independence and states that when a transi t ion is enabled, none of the variables that it reads are changed by other transit ions: Definition 8 (Stable Inputs) V s , s 6 S : V7=0 : (s,s') eTZ A Gi(s) A (s^Mi(s)) => (s = Mi(s)) V (V i / G r(Ti) : v(s) = v(s')) Speed independence and stable inputs are both safety properties. 11 In this thesis, two methods are used to establish safety properties. In model checking, a sequence of sets of states, So, Si,... is constructed as shown below: Si Qo, i f i = 0, Si-i U (RoSi-i), i f »" > 0 where noSi = { s e v \ 3seSi •. (s,s)en). Note that l i m ; _ > 0 0 Si = S. For finite state systems, this fix-point is reached in a finite number of steps. Once S is thus computed, one can show that Q is a safety property by verifying S => Q. In practice, it is often convenient to represent the sets of states symbolical ly using B D D s [4]. Th i s yields symbolic model checking. In this thesis, the term model checking is used to refer to the symbolic approach. M o d e l checking has an advantage that the verification can be completely automated. However, it is often infeasible for complex designs due to the size of the state spaces involved. In such cases, the fixpoint calculation can be avoided if the designer can propose an invariant, I, such that QQ => I and I = > Q. It is straightforward to show that this establishes that Q is a safety property. Typica l ly , a designer finds the invariant I based on some intui t ion about the operation of the circuit . 2.3.3 Refinement The refinement property enables one to reason about the relationship of one pro-gram to another; typical ly these are the specification and the implementat ion mod-els. Refinement is defined under a function that maps the states of one program to those of the other program: 12 Definition 9 (Abstraction Mapping) An abstraction mapping map is a func-tion that maps a state from the state space of the implementation to a state from the state space of the specification: map = Vimp ->• Vspcc Note that this mapping may be a part ial function. Definition 10 (Refinement) A program Pimp with initial condition Qi0 is a re-finement of a program Pspcc with initial condition QSo under the abstraction mapping map, if map o Qio CQSo and Vs, s € S : (s, s) e TZimp (map(s) — map(s')) V ((map(s), map(s')) e 7lspec, where lZ,mp and lZsptc are the state transition relations of Pimp and Pspec. P i m p is thus a refinement of P3pec i f either the state of the implementat ion before and the state after a transi t ion map to the same state in the specification or there exists an equivalent state transi t ion in the specification. Instead of proving safety properties on a detailed model, it is often easier to prove the safety properties on a more abstract model and then to show that the detailed model is a refinement of the abstract model . T h e following theorem states that safety properties are preserved in refinement relations. Theorem 1 If the program Pimp is a refinement of the program Pspec under the abstraction mapping map, then for every safety property Q of Pspec, Q ° map is a safety property of Pimp-13 The proof of this theorem is straightforward and is omit ted here. The next definition states how transitions of the implementat ion can be matched against transitions in the specification. Theorem 2 then shows how transi-t ion matching can be used to show that the implementation program is a refinement of the specification (see [10]). Def in i t ion 11 (Trans i t ion M a t c h i n g ) Let be a transition in the concrete implementation program and P,pcc be the abstract specification program L^spec —— llj = l ^ ^ ^specj ^ ^^specj • T,mp is matched against Pspec under the mapping map iff Vs e S,mp : Gimp(s) =^ ( map(s) = map(M,mp(s)) ) V ( 3 i : GspeCt{map(s)) A (map(M,mp(s)) = Mspeci(map(s)))Sj T h e function match(Timp, P3pec, map) denotes that T i m p is matched against P 3 p e c under the mapping map. T h e o r e m 2 Pimp is a refinement of P,pec under the abstraction mapping map, iff (Qi0 Qs0 o map) A (^Ai=1match(Timpi, Pspec, map)) is a tautology. Theorem 2 states that if al l transitions of the implementation can be matched against transit ions of the specification under the abstraction mapping map and i f (Qi0 => 14 QSQ o map) holds, then P i m p is a refinement of Pspec. The proof of theorem 2 straightforward and is omit ted here. 15 Chapter 3 Proof Strategy This chapter describes the proof strategy developed for verifying the correctness of the divider. The transistor level model of the divider is too large to permit model checking, and too complicated to verify from first principles using a theorem prover. Therefore it is desirable to prove safety properties on a less detailed, higher level model. Accord ing to theorem 1, i f the more detailed model is a refinement of the higher level model , the safety property wi l l hold in the detailed model as well . The strategy described in this chapter is based on the fact that safety properties are preserved by a refinement relation. Four models of the divider, three word level models and one detailed t ran-sistor level model, are used for the proof. The word level models of the divider chip determine the quotient digits at the word level rather than modeling the actual hard-ware of the chip at the gate level. They only implement the handshaking protocol between the ari thmetic stages of the divider. T h e first model of the divider chip is a synchronous, deterministic model . The second model of the divider is asynchronous and has no t iming information. The th i rd program is a more accurate model of the 16 protocol between the stages and includes timing information. The transistor level program implements the entire transistor level logic of the divider stages. It has lower and upper timing bounds for all transitions. Synchronous Word Level Divider Model P,y„ch r e f i n e m e n t Asynchronous Speed-Indep. Word Level Divider Model PA r e f i n e m e n t ] Asynchronous Timed Word Level Divider Model P„, f r e f i n e m e n t ! Transistor Level Model P u Figure 3.1: Proof Strategy The proof strategy consists of five steps (see figure 3.1), which correspond to five proof obligations: (i) the synchronous, word level model P g y n c h of the divider divides cor-rectly; (ii) the asynchronous, word level model without timing properties Psi is a refinemnt of the synchronous model; (iii) Psi is speed-independent; functional correctness 17 (iv) the t imed, word level model F w , is a refinement of the speed-independent model P3i; (v) the transistor level model Ptl is a refinement of the t imed word level model P w l . The last three properties are proven in this thesis, the first two properties are not proven and wi l l be left for future work. Even though this means that this thesis doesn't prove that the divider divides correctly, it w i l l show that races or other t im-ing hazards cannot occur during the operation of the divider . Th i s follows from the fact that speed-independent circuits function correctly regardless of the internal de-lays of modules. If the transistor level model is a refinement of the speed-independent model, it follows that the behaviour of the transistor level model must be explainable by the (speed-independent) behaviour of the speed-independent model . Obligations (iii) and (iv) are proven in chapter 4, which also includes detailed descriptions of the two asynchronous word level models. Chapter 5 then proves that P t l is a refinement of P w l . 18 Chapter 4 The Word Level Models This chapter focuses on the two asynchronous word level divider models - one with-out timing information and one with. In the first two sections, the two programs are described and the non-timed model is shown to be speed-independent. Section 4.3 then explains how time was modeled in the timed program and section 4.4 proves that the timed model is a refinement of the non-timed model. All the divider models were written in Synchronized Transitions. 4.1 W o r d Level Programs The two higher level programs implement the control logic (i.e. the protocol) between the stages of the arithmetic array and determine quotient digits at the word level. Each arithmetic stage has a precharge_bar and a quotient_done signal as both input and output signals. The input prechargeJbar signal tells a stage when to precharge. When it is low, a stage precharges and sets the output quotient .done signal to low. The input quotient_done signal is generated by the previous stage when it has finished evaluating its quotient. For a stage to evaluate, both the input 19 precha rge_bar signal and the input quo t i en t_done signal have to be high. W h e n both are high, it can set its output quo t i en t_done signal to high. A stage can start precharging as soon as the following stage has finished evaluating its quotient, because the value that it is holding is not needed anymore. Consequently, the input p recharge_bar signal to stage i can be set to low after stage i © 1 sets its output quo t i en t_done signal to high (the operators and " © " represent the addit ion and subtraction operators modulo 3, e.g. i © 1 means ( i + l ) % 3 ) . p!>(2) t- T iMi.i- ' F T T T T F 9 ( 0 ) < - F FF [ F T T T FF P H o ) « - F / » (° )«- ^ F F T T wy^m «- T ? T T TF) [TT T T FT) 9 ( 1 ) <r-T . 9(2) <-r \ [ F F T T TT) pb{l) <- F [TT T F FF) , ( 2 ) < - F / \ p 6 ( l ) <- T [ F F F T I T ) 9 ( 1 ) < - F / \ p 6 ( 0 ) « - T H , E , P ] [ T T F F FF) I T T F FT) ^ r — ' 9 ( 2 ) < - F pi.(l) f F F F F TT) L\-1MI ^ -r } III' FL J [ T T F F FT) < p 6 ( 2 ) <- F P 6 ( 0 ) <- T T T F F T T [ T F F F TT) 9 ( 0 ) < - T Figure 4.1: State Transi t ion of the Ar i thme t i c A r r a y Each stage can be in one of the three states, precharging (P), evaluating (E) or holding (H). F igure 4.1 shows the state transi t ion diagram of the divider . Each state is described by a six-tuple, which corresponds to the values of pb(0), q(0), pb(l), g ( l ) , pb(2) and q(2). There are three states in which exactly one stage 20 is precharging (pb = F A L S E , q — T R U E ) , one stage is evaluating (pb = T R U E , q = F A L S E ) , and one stage is holding (pb = T R U E , q = T R U E ) . For example the state (P ,H ,E) in the figure refers to the state of the ari thmetic array where stage 0 is precharging, stage 1 holding and stage 2 evaluating. The arcs in the diagram are labeled by the signals that trigger the state transi t ion and show the value of the signal after the t ransi t ion. The input p r e c h a r g e J b a r signal of stage i is denoted as pb(i) and the output quo t i en t_done signal as q(i). F igure 4.2 depicts the t iming diagram of the pb(i) and q(i) signals. E.P.H) (H-F.-P) E.P.tl) (H,E,P Pb(0) q(0) P b(i) q(i) pb(2) q(2) r - fall lillll I|H1 prec large time evaluate time Figure 4.2: T i m i n g D iag ram Figure 4.2 is a t iming diagram of the signals in the divider. 21 < ->pb(i © 1) -> pb(i) : = T R U E » < p6(i 9 1) A q(i 9 1) p6(i), sum(i), carry(i), qbit(i) : = F A L S E , F A L S E , F A L S E , 0 > < ->p6(i) ->• : = F A L S E » « p6(«) A ->q(i © 1) - J -sum(i), carry(i), qbit(i) := T R U E , 5i?Ts«m(sum(« 0 1), carry(i9 1), qb i t ( z '©l ) , divisor), SRTcarry(sum(.i Q 1), carry(i9 1), qb i t ( i© 1), divisor), SRTquot(sum(i Q 1), c a r r y ( i © l ) , qbi t ( ie l ) , divisor) > Table 4.1: Speed-Independent Div ider P rogram 4.2 Speed-Independent Word Level Program The S T implementation of the speed-independent program is given in table 4.1. The remainder value determined by stage i is represented using two bit-vectors, c a r r y ( i ) and s u m ( i ) , which correspond to the carry and sum part of the remainder, q b i t ( i ) is the quotient bit determined by stage i. The first t ransi t ion sets the p recharge_bar signal of stage i to true, when the successor stage has its p recharge_bar signal low and is consequently precharging. Th i s allows stage i to evaluate the next part ial re-mainder and quotient bit when inputs become available from stage iQl. The second transi t ion enables precharging in stage i, when its successor has finished evaluating. W h e n precharging is started, each bit of the vectors c a r r y ( i ) and s u m ( i ) is set to FALSE. The quotient bit q b i t ( i ) is set to zero l . W h e n the p recharge_bar signal 1 It doesn't matter what values carry ( i ) , sum(i) and qbi t ( i ) are set to as long as these values are known. It is necessary to assign them known values for the proof that the transistor level program is a refinement of the word level program. The definition of the abstraction mapping depends upon these values. The variables are not read by other transitions while they hold this 22 is low, the quo t ien t_done signal wi l l get set to low by transi t ion three. The last transit ion sets the q u o t i e n t - d o n e signal to T R U E and assigns values to c a r r y ( i ) , s u m ( i ) and q b i t ( i ) using the functions SRTsum, SRTcarry and SRTquot, which represent the operations of the S R T algor i thm. The second guard, -*q{i © 1), is necessary to ensure the speed-independence of the program. It guarantees that a stage wi l l set its quo t i en t_done signal to high after the successor stage has finished precharging (i.e. set its q u o t i e n t - d o n e signal to low). The in i t ia l state predicate of the speed-independent word level program is: QS0 = pb(0).v A q(0).v A ->pb{l).v A ->q(l).v A pb(2).v A q(2).v The speed-independence of this program was shown using model checking. 4.3 Timed Word Level Program In the actual divider chip, precharge operations can be shown to always take less t ime than evaluation. A s seen in figure 4.1, when pb(i © 1) goes low, this enables precharging in stage z © l . The same event enables setting pb(i) to true. W h e n g ( i © l ) is F A L S E (i.e. precharged), and pb(i) is true, then the guard of the fourth transit ion is satisfied, and stage i can start evaluation. Because precharging is faster than evaluation, the design can be modified so that evaluation starts as soon as pb(i) is true: evaluation wi l l not complete unt i l after stage i © l is precharged and the guard of the fourth transit ion is satisfied. Th i s allows precharging in stage i © 1 to overlap evaluation in stage i, rather than performing these two actions sequentially. B y removing precharging from the cr i t ical t iming path, the performance of the divider is increased, values. 23 The arguments of the previous paragraph describe the intui t ion and motiva-t ion behind the t imed design of the actual divider. To verify the resulting design, the t imed design is modeled wi th an S T program, and this program is shown to be a refinement of the speed-independent one. 4.3.1 T i m e M o d e l In the t imed word level program stages have bounds for precharge t ime and evaluate t ime. T h e next section describes how t iming is used in this program and how lower and upper bounds are implemented (see also [7], [1]). In the section thereafter the actual t imed program and its differences to the speed-independent program are presented. T i m e is modeled as a real-valued variable r and is part of the environment in which the programs operate. Protocols describe the relationships between states before and after environment actions. The expression r.pre denotes the value of T before an environment action and r.post its value after the environment action. W h e n it is necessary to distinguish the value of a variable before a state transi t ion and after a state transi t ion, the variable wi l l appear wi th the .pre and .post suffix. Variables that are unchanged by the environment action or state t ransi t ion generally appear without a .pre or .post field. In the timed divider model, only r can be changed by environment actions. In the t imed program the variables pb(i) and q(i) are now structures wi th two fields, T and v. The discrete value of the variable is stored in v and the t ime at which this value was assigned is stored in r . To add lower t iming bounds to a transi t ion in a S T program, an addi t ional 24 { I l iU I <S —*pb(i © l).i> A —ipb(i).v —>• pb(i).v,pb(i).T : = T R U E , T » || < pb(i®l).v A g ( z © l ) . v A pb(i).v -> p6(i).w,p6(f).r,sum(i), c a r r y ( i ) , q b i t ( i ) := F A L S E , T , F A L S E , F A L S E , 0 » || - C -ip6(I).U A q(i).v -¥ q(i).v,q(i).T : = F A L S E , T » || < p & ( I ) . V A -ig(i).i> A ( r > p6(i).r + T e v a l u a t e ) ^ q(i).v,q(i).T, s u m ( i ) , c a r r y ( i ) , q b i t ( i ) := T R U E , T, SRTsum(sum(i 0 1), c a r r y ( i © l ) , q b i t ( i © l ) , d i v i s o r ) , SRTcarry(sum(i 0 1), c a r r y ( i © l ) , q b i t ( i © l ) , d i v i s o r ) , SRTquot(sum(i © 1), c a r r y ( i 0 1), q b i t ( i © l ) , d i v i s o r ) » Table 4.2: T i m e d W o r d Level P rogram condit ion of the form T > (variable.r + T l o w e r b o u n d ) is added to its guard. Upper t iming bounds cannot be implemented in this way. Instead the pro-tocol for the environment is changed to express these bounds. For example (var.v = T R U E ) T.post < (var.r + T u p p e r b o u n d ) means that var.v must change to F A L S E before r reaches the t ime var.T+TnppeT b o u n d . 4.3.2 T h e Program T h e transitions of the t imed word level program are the same as those of the speed-independent program, except for the last one (see table 4.2). T e v a l u a t e is the min imum 25 t ime it wi l l take for a stage to evaluate. Note how the last transit ion doesn't have -*q(i © 1) in its guard, unlike the speed-independent program. In addit ion to these transitions we have a protocol that states that t ime is monotonic and how long precharging wi l l take at most: Proto : r.pre < r.post A (q(i) A -ipb(i).v => r.post < (pb(i).T + T p r e c h a r , Tpr<,charge is the max imum amount of t ime for precharging in a stage. In the t imed program we make the following assumption about t iming : •^timing • ( Tprecharge ^ Tev&\ua.te) A ( 0 <^ -^~precharge ) ^timing states that precharging takes less t ime than evaluating and that precharging takes a positive amount of t ime. The in i t ia l state predicate of the t imed word level program is: Qi0 = pb(0).v A q(0).v A -^pb(l).v A ~>q(l).v A pb(2).v A q{2).v A (pb(2).r < pb(l).r) A (pb(l).r < pb(0).r) A (pb(0).r<T) A ( p & ( l ) . r < r ) A (pb{2).T<r) A ( g ( 0 ) . r < r ) A ( g r ( l ) . r < r ) A ( g ( 2 ) . r < r ) . 4 . 3 . 3 S i m u l a t i o n W h e n s imulat ing the divider program, the moving forward of t ime is modeled by having a random number generator that chooses a floating point value number as the "new" t ime. Th i s t ime has to be greater than the current t ime. The protocol is realized by placing the value p 6 ( i ) . r + r p r e c h a r g e in an array at the same t ime aspb(i).v is set to low. The array entry is deleted when q(i).v goes low. The module that 26 moves t ime forward looks at this array and chooses a real value that is greater than the current t ime but less than the smallest t ime in the array. This same method is also applied when s imulat ing the transistor level divider. 4 . 3 . 4 Non-Zenoness A safety property that allows the system to reach a state that bounds the progress of t ime is called Z e n o (see [1]). Zeno specifications can be a source of incomplete-ness for proof methods. In this thesis formal proofs of Non-Zenoness are omi t ted . However, it can be shown that in all t imed programs described in this thesis there is a lower bound on the amount of time by which the upper bound is incremented. Furthermore, bounds on t ime are introduced to assert upper bounds on the t ime between when a transi t ion is enabled and when it performs its act ion. Th i s means that there is always a transit ion that is enabled to remove the upper bound on the progress of t ime. 4.4 Proof of Refinement Accord ing to theorem 2, to prove that the t imed program is a refinement of the speed-independent program it has to be shown that each transit ion in the t imed program can be matched against a t ransi t ion in the speed-independent program. Let Pspec be the speed-independent program and P , - m p the t imed program. Let A be a function from the states in the t imed program to the states in the speed-independent program that maps the variables pb(i).v and q(i).v of the t imed program to the variables pb(i) and q(i) of the speed-independent program. In the speed-independent program any state change effected by a transit ion changes the .v field of at least one variable. State changes due to environment actions only 27 change the r value, and states that differ only by their values for r map to the same specification state. It follows that V(s , s ) e TZimp : s ^ / ^ A(s) ^ A(s'). Since transitions cannot change r , showing that the t imed program is a refinement of the speed-independent program is equivalent to proving: {Qio => Qs0 oA) A V <C Gimp Mi V s € Simp • Gimp(s) ^ ^3 <^ Gspec ^ ^d-spec G Tspcc • Gspec{A(s)) A A ( M t m p ( s ) ) = M s p e c ( A ( s ) ) ) This follows directly from theorem 2. Since Qi0 Q 5 o o A and the first three transit ions of the t imed program are t r iv ia l ly matched by the first three transitions of the speed-independent program, this is equivalent to proving that whenever the transi t ion < pb(i).v A (r > p6(i).r + T e v a l l i a t e ) A -.g(i) -> q{i).v, q(i).T : = T R U E , r > is enabled, the transi t ion < A ->g(z © 1) A - i g ( i ) ->• : = TRUE > in the speed-independent program is enabled as well . T h i s means that the impl ica-t ion pb(i).v A (r > p6(i).r + T e v a l u a t e ) A ->q(i).v =^ pb(i).v A -<q(i(Bl).v A ->q(i).v 28 has to be a safety property of the t imed program. The implicat ion can be rewritten as pb(i).v A (r > p6( i ) . r + T e v a l u a t e ) A q(i®l).v q(i).v. Theorem 3 Qs : pb(i).v A (T>pb{i).T + Tevaluate) A q(i®l).v q(i)-v is a safety property of the timed program Pimp with the protocol P r o t o and the initial state predicate Qi0. Proof. Let Inv = 7,(0) A 7,(1) A 7i(2) A 7 2(0) A 7 2 (1) A 7 2(2) A ( / 3 ( 0 ) V J 3 ( l ) V 7 3 ( 2 ) ) . where / ! (») = (( g ( i © l).v\t ( ( r > pb(i® l ) . r + - ^ e v a l u a t e ) A pb(i®l).v) ) A~<q(i).v A pb(i).vj => ( r < pb(i).r + - ^ e v a l u a t e ) I2(i) = (pb(i).T < T) A {{pb(i).v A pb(i®l).v A -^pb(i Q l).v) ((p6(*'©l).r > p 6 ( i 0 1 ) . T ) A (p6 (*0l ) . r >p6(i).r))) A A - ip6 ( i© l ) . u A -ip6(z© l ) . u ) . ((p6(i© l ) . r > pb(i).r) A [pb(i).T > pb(i® l ) . r ) ) ) 29 / 3 ( i ) = (pb(i).v A -iq(i).v A -ipb(i®l).v A pb(iQl).v A ? ( ? 9 1 ) . » ) V A A ->pb(i®l).v A^q(i®l).v A p 6 ( i e i ) . u A g ( i 9 1 ) - « ) V (pfr(i).t; A -i<7(i).u A - i p 6 ( i © l ) . u A - ^ ( i © l ) . u A - i p 6 ( J 0 The clause J i is necessary to imply the safety property and was derived using the invariant derivation method described by Lee [10]. Clause I? states the t iming relationship between the p recha rge_bar signals and I3 is an enumeration of the possible states of the divider. Us ing a decision procedure for predicate logic and linear inequality the fol-lowing properties for Inv were proven: 1. Q0 =^ Inv 2. ( Inv A A t i m i n g ) =^ wp(Pimp, Inv) 3. ( A t i m i n g .pre A Inv.jore A P r o t o ) =>- Iiiv.post 4. I nv Qs Thus , Qs is a safety property of P ; m p as claimed. S This completes the proof that _ P ; m p is a refinement of Pspec-30 Chapter 5 The Transistor Level Model The most detailed divider program that was used to verify the correctness of the divider models the chip at the transistor level. To increase the speed of the divider, the divider circuit design takes advantage of of two t iming properties of the cir-cuit . F i rs t ly , instead of having an explicit done detection where all the outputs of a stage are tested to see if they have been computed, only the output of the quotient selection logic is tested. Th i s opt imizat ion is based on the assumption that the quotient selection logic wi l l always take longer than the carry-save adder. Secondly, it is assumed that precharging takes less t ime than evaluating. Th i s means that the outputs of a stage do not need to be tested to determine if the stage has fin-ished precharging, when its predecessor stage has finished evaluating. These t iming assumptions wi l l be proven in this chapter. The next section describes the transistor level model and section 5.2 defines notat ion used in this chapter. Section 5.3 then presents the proof that the transistor level program is a refinement of the word level program. Throughout this chapter the transistor level program w i l l be denoted by Pti, its set of transit ions by Tti and 31 its set of variables by V t l . 5.1 Description of the Model Pb(i) 1 C S A Done sum(i-l) carry(i-l) I sum(i) carry(i) C S A D o n e divisor qbit(i-l) M U X C P A Q S L qbit(i) Figure 5.1: Structure of An Arithmetic Stage In the transistor level program Ptl each arithmetic stage is modeled by one ST cell consisting of four sub-cells: • a multiplexer (MUX) • a carry-save adder (CSA) • a carry-propagate adder (CPA) • a quotient selection logic (QSL) which model the modules of the arithmetic elements as shown in figure 2.2. They operate as explained in section 2.1. In addition to these modules, there is a Done and a CSAJ)one module. The Done module determines when the stage has finished selecting a quotient bit and the CSAJ)one module determines when the carry save adder has computed all the sum and carry bits. Figure 5.1 depicts the structure of one arithmetic stage and the corresponding ST code is shown in table 5.1. If pb(i).v is FALSE, the module Done sets q(i).v and qs l_done to FALSE. When pb(i).v is high, 32 { IlLo i < -<pb(i(&\).v A -<pb(i).v - > pb(i).v,pb(i).T := T R U E , r > || <C pb(i®l).v A q(i®l).v A pb(i).v,pb(i).T : = FALSE, r > || M U X ( pb(i).v, q b i t ( i 0 l ) , d i v i s o r , muxou t ) || CSA ( pb(i).v, c a r r y ( i © 1), sum(i © l ) , muxout q(i).v and qs l_done are only set to high after a quotient bit has been selected by the cell QSL. The signal csa_done is set to FALSE by CSA-Done if pb(i).v is FALSE, and it is set to TRUE, if all the sum and carry bits have been determined. B o t h these modules are assumed to execute in zero t ime and were introduced to simplify the proof. They do not change the functionality of the divider and are not read by any transi t ion. In the remainder of this chapter, the .v notat ion of variables wi l l be dropped whenever it is obvious that it is the variable value and not the t ime at which it was assigned that is referenced. 5 . 1 . 1 D a t a R e p r e s e n t a t i o n In the transistor level model, al l the data except the quotient are encoded using a d u a l - r a i l code. T w o wires are used for dual-rai l encoded signals. In the S T models, the variables that represent these signals have two Boolean valued fields, F and T, II II II c a r r y ( i ) , sum(i) ) CPA( pb(i).v, c a r r y ( i ) , sum(i) , cpsum ) QSL( pb(i).v, cpsum, q b i t ( i ) ) Done ( pb(i).v, q b i t ( i ) , q(i), qsl_done ) CSA-Done ( pb(i).v, c a r r y ( i ) , sum(i) , csa .done ) } Table 5.1: Transistor Level P rogram 33 with the following translation to truth values for the variable: x.T x.F value F A L S E F A L S E empty F A L S E T R U E F A L S E T R U E F A L S E T R U E T R U E T R U E undefined During correct operation, no variable will have the undefined value. When a stage is precharging, all the variables are reset to empty, which in the hardware implementation corresponds to resetting all signals to low. While a stage is evaluat-ing, a variable can only change its value from empty to F A L S E or T R U E . It cannot go back to the empty value. Consequently, if a variable has the value T R U E or F A L S E , it will retain this value until the stage is precharging. Henceforth, the expression variable will not be used to refer to a dual-rail variable. Rather, it will be addressing a variable of the form x.T or x.F. d i v i d e r s u m ( i - l ) c a r r y (i-1) q b i t ( i - l ) pb 55 / „ i / '55 / '55 r^ Arithmetic Stage 00 / „ '55 / * ' 5 5 / „ 55 / „ 55 / „ 55 / 55 / . 3 / „ / 3 / „ c a r r y ( i ) q b i t ( i ) Figure 5.2: Inputs and Outputs of An Arithmetic Stage 34 The inputs and outputs of an ari thmetic stage are depicted in figure 5.2. For the double precision floating point ari thmetic used in the divider, 55 bits are required to represent mantissas. Dua l - ra i l encoding doubles the number of wires to 110. The quotient is encoded using three wires ( q b i t _ 1 , q b i t 0 and q b i t j , one wire for each of the three possible quotient values (-1, 0, 1). T o il lustrate how transistor-level models using dual-rai l encoding can be rep-resented in the S T language, the following example describes the S T implementation of an adder. E x a m p l e > The carry-save and carry-propagate adders used in the divider consist of one-bit full-adders, which operate in parallel in the carry-save adder and in series in the carry-propagate adder. The bit-adder determines the sum bit and the carry bit , both of which are encoded in dual-rai l . Precharged logic is used throughout the divider design. W h e n the pb input to a logic element is low, the p-channel pull-up transistor is turned on, pull ing the input of the inverter high, which then drives the output low. W h e n pb is high, the input to the inverter wi l l be pulled low if there is a path to ground through the network of n-channel transistors, and the output of the inverter wi l l go high. Otherwise, the input of the inverter remains high, and its output remains low. Published descriptions of the divider design do not include detailed designs for most functional blocks. Instead, we derived models for these blocks based on the documentation that pre-charged logic was used and earlier discussions wi th the designer. The design presented here should be representative of the actual design, but is probably not an exact replica. If the exact design were available, we expect that the verification techniques described in this thesis would be equally applicable. For example, figure 5.3 shows a schematic o f the part o f the one-bit full-adder that 35 -HE rin.T |f cin.F If j | c i u . T j l cin.F Figure 5.3: Bit-Adder for Sum Bit Figure 5.4: Bit-Adder for Carry Bit determines the sum.T value of three dual-rai l encoded inputs. T h e design is such that the pull-up overpowers the pull-down chain. For example i f pb is low and a.T, b.T and cin.T are high, the inverter output wi l l s t i l l be low. On ly when the pull-up is removed can the output of the inverter become T R U E . Th is approach produces a faster design at a cost of greater power consumption compared wi th one where each pull-down path includes a n-channel transistor controlled by pb. In S T , the one-bit full-adder can be modeled in the following way: < ->pb ->• sum.T : = F A L S E > || < pb A ((a.T Ab.T Acin.T) V (a.F A ((b.T A cin.F) V (b.F A cin.T))) V (a.T A b.F A cin.F)) -> sum.T :— T R U E > T h e guard for the transit ion that sets sum.T to true is constructed direct ly from the schematic: series connected transistors or networks are combined wi th conjunction; parallel connections are represented by disjunction. The sum.F value is determined in an analogous fashion, subst i tut ing a .T , b.T, cin.T wi th a.F, b.F, cin.F and vice-versa. 36 Likewise, figure 5.4 shows a precharged logic element that computes cout.T. Its ST model is: < -ipb ->• cout.T := F A L S E > || < pb A {{a.T A 6.T) V (a.T A cin.T) V (6.T A c m . T ) ) ->• cout.T := T R U E > Each variable f in the transistor level program F t l , is written by exactly two transitions: one transition that sets the variable to T R U E and one that sets it to F A L S E . The notation TTRUBM(u) and TFALSE| t l(u) will be used to refer to these transitions and GT[a,Eiti(v) and GFALSEtti(v) to refer to their guards. A l l transitions write only one variable. 5.1.2 Implementation Details Section 4.3.1 described how upper and lower bounds for transitions are incorporated into the word level program. Lower bounds were implemented by adding an addi-tional constraint of the form (r > a.r) to the guard of a transition. Upper bounds are described in protocols. For simplicity, it is assumed that all the transitions in the transistor level model — except those of the Done and CSA-Done modules — have the same lower and upper bound for evaluating, once they have been enabled. These execution time bounds, which will be referred to as 8j,a and #p x , correspond to the minimum and maximum gate delays. Similarly it is assumed that all gates have the same lower and upper time bound when precharging. These will be referred to as 7y i n and /yfa-x. As described above, disjunctions of variables in transitions of Pti represent transistors in parallel in the circuit and conjunctions represent transistors in series. Based on that observation, the enabled time of a transition T; with the 37 guard Gi is defined to be E(Gi), where 1/Q.T, i f v VQ w i th VQ £ V, t i t max{z/o .T, VI.T}, if v UQ A and E(v) = min{z/o .T, z^i.r}, i f f i / o V f i , where UQ and 1/1 are either a variable of V t i or a Boolean formula over variables in V t ] . The lower t iming bound (r > E(Gi) + <5fin) is added to the guard of each transi t ion. The max imum delay before transit ion T; executes once enabled is E(Gi) + 5™*- The protocol for the transistor level program is constructed using these upper bounds for transitions and the requirement T.pre < T.post, when an environment action occurs (see section 4.3.1). E(Gi) is not necessarily the t ime at which the guard Gi was enabled, as the actual enabling t ime depends on the values of the variables in the guard. For example, given the transit ion if the values of both VQ and V\ are T R U E , then the guard is enabled at mm{y§.T, VI.T}. If'i/o is F A L S E , then the guard is enabled at V\.T, even i f V\.T > VQ.T. However, E(v) is an upper bound on the t ime at which the transi t ion wr i t ing v becomes enabled. T h i s yields conservative upper bounds on the t ime at which variables are updated. The model may admit behaviours that the real circuit does not. Because the correctness conditions for the divider are safety properties, the verification remains sound. < v0 V vx v2 := V2, > , 38 5.2 Definitions This section introduces notations and definitions used in this chapter. Some of the definitions will be specific to the transistor level program, others will be general and applicable to any ST program. To illustrate these, a dual-rail carry-propagate adder will be used throughout this section as example. Definition 12 (Notation) For a variable a of a ST program P with the set of transitions T, let • p(a) denote the set of variables that appear in the read set of transitions that write a: v £ p{a) 3 Ti E T : v £ r(T;) A a e w(Ti) Wa C T denote the set of transitions that write the variable a, p be a predicate over p(a) Wa\p denote the set of transitions that can write a given that p holds. p(a)\p denote the set of variables that appear in the read set of transitions that write a, given that p holds: vep(a)\p O 3TieWa\p : v£r(Ti) Example > The dual-rail carry-propagate adder depicted in figure 5.5 consists of one-bit full-adders as those shown in figure 5.3 and 5.4 and adds two dual-rail vectors, a = [ai, 02] and b = [b\, 62] and a likewise dual-rail encoded carry bit cin to produce a dual-rail sum vector sum and carry output cout (see 5.5). Table 5.2 shows the ST 39 < ->pb -> c .T := F A L S E > || < pb A ( (01 .T A (bi .T V cin.T)) V (h .T A cin.T) ) C . T := T R U E > | | <C ->pb -> c . F := F A L S E » || < pb A ( ( a i . F A {bi.FV cin.F)) V (b^.F A cin.F) ) ->• c . F := T R U E > || < —>• sumi.T := F A L S E > || < p 6 A ( ( o i . T A {{h.F A cin.F) V ( & i . T A cin.T))) V ( a , . F A ( ( 6 i . T A c m . F ) V ( 6 . F A c in .T) ) ) ) ->• sumi.T : = T R U E > || <C —>• sum\.F :— F A L S E > || < p & A ( ( a i . F A ( ( 6 , . T A c in .T) V ( & , . F A c in .F ) ) ) V(a i .T A ( (61 . F A c in .T) V (b.T A cin.F)))) ->• sumx.F :— T R U E > || < -<pb -» sum2.T : = F A L S E > || < p6 A ( ( a 2 . T A ((6 2.r A c.T) V (6 2.F A c .F ) ) ) V ( a 2 . F A ((62.F A c.T) V (6.T A c .F) ) ) ) ->• sum2.T :— T R U E > || <C —>• sum2.F :— F A L S E > || < p i A ( ( a 2 . F A ( ( 6 2 . F A c .F ) V (6 2.T A c.T))) V(a2.T A ((b2.T A c.F) V (b.F A c.T)))) ->• sum2.F := T R U E > || <C -ip6 ->• cout.T : = F A L S E > || < p6 A ( (a2.T A (b2.T V c.T)) V (62.T A c.T) ) —• cout.T : = T R U E > || < -» cout.F := F A L S E > || < p6 A ( ( a 2 . F A (6 2.F V c .F) ) V (6 2.F A c . F ) ) ->• cout.F : = T R U E > Table 5.2: D u a l - R a i l Fu l l -Adder 40 pb cin.T -cin.F • o , .T-ai.F • i , . r -bi.F-a2.T • a2.F • b2.T-b2.F-Full-Adder c.F Full-Adder sum\.T sum\.F sumi.T sum^.F cout.T cout.F Figure 5.5: 2-Bi t Fu l l -Adder program for this adder. Consider the variable c.T. The set p(c.T) = {pb, a i . T , & i . T , cin.T}, and the set WC.T consists of the first two transitions. If the predicate —>p6 holds, only the first t ransit ion is enabled to write c.T. Therefore, W c .x | - . P & contains only the transi t ion < -ipb -t c.T := F A L S E > . Definition 13 (Input Variable) An input variable of a ST program P is a vari-able that is not written by any transition of P and is read by at least one transition. Definition 14 (Non-Input Variable) All variables of P that are not input vari-ables are non-input variables. Definition 15 (Trace) Given a program P with the initial predicate QQ, the set of reachable states S and the state transition relation 1Z, a trace is a sequence of states (to, t\, £ 2 , . . . , tk) such that Qo(to) and (t,-_i, U) G TZ for i = 1, 2 , . . . , k. The set of all traces is denoted by T(P,QQ). A trace t G T(P,QQ) is said to have length k i f it consists of a sequence of k states. The notation tk w i l l be used to 41 refer to a trace of length k and t\ to refer to the «-th state of the trace. The predicate settled(v>) evaluates to true if no transition writing v is en-abled and all inputs to v are settled: Definition 16 (Settled Signal) If v is an input variable to a program P with the set of transitions T, then settled(v) = T R U E . If v is a non-input variable, then settled(v) =4> V T . S T : ->Gi A W € p(f) : settled(v ) where G{ denotes the guard of the transition T t . The next four definitions have mutually recursive definitions and are used to determine the time by which an internal signal will have settled. In a precharged circuit, signals can only change from F A L S E to T R U E in the evaluation phase (i.e. when the precharge signal is low). That means that any transition that is enabled will set a signal to T R U E . A transition that has a conjunction of terms in its guard will always have to wait for the last one of these variables to be settled before it can be enabled. On the other hand, if a transition has a disjunction of terms in its guard, the transition can be enabled as early as the earliest settling time of either term. In the following: e(T\p) is earliest time at which the guard of transition T can be satisfied, and e(x\p) is the earliest time at which expression x can become true. Likewise, l(T\p) and l(x\p) are the latest possible times for these events. The earliest time at which variable v can be set to true given that p holds is given by minu\p, and maxu\p gives the maximum time. Formal definitions are given below. 42 Definition 17 (Earliest Enabled Time) Let a be a non-input variable and let p be a predicate over p(a) such that Wa\p consists of only one element. This transition will be denoted by Ta\p and its guard by G. The earliest enabled time for transition TQ\P is denoted by e(Ta\p) and is defined as: e(Ta\p) = e(G\p) where minVo\pi if v = i/o with VQ € p(ot), max{minVo\p, minVl \p}, if v = vo A v\ and eiv\p) = { mm{minVo\p, miny^p}, if v = VQVVI, where vo and v\ are either variables of p(a) or Boolean formulas over variables in p(a) and where defined below. Definition 18 (Latest Enabled Time) The latest enabled time for transition Ta\p is denoted by l(Ta\p) and is defined as: l(Ta\p) = l(G\p) where max., - „ 0 i p , if v = i>0 with vo e p{a), ma,x{maxVo|p, maxVl\p}, if v — Vo A v\ or v = VQ V v\, where vo and v\ are either a variables of p(a) or Boolean formulas over variables in p(a) and where miny0\p and maxVo\p are defined below. The min imum settl ing t ime is the earliest t ime by which settled(a) can hold and the max imum settling t ime is the t ime by which a signal must have settled. 43 mina\p = < Definition 19 (Minimum Settling Time) The minimum settling time of a, given that p holds is e(Ta\p) -\-Sfm, if a is a non-input variable, 0, otherwise Definition 20 (Maximum Settling Time) The maximum settling time of a, given that p holds is l(Ta\p) + 5jax, if a is a non-input variable, maxa\p = < 0, otherwise Example > In the adder program (see table 5.2), Ta\pb and Ta\pb are uniquely determined for all a that are non-input variables. Assuming that al l internal signals have been precharged and have settled to F A L S E and all input signals are stable (i.e. don' t change their values) by the t ime the precharge signal pb goes high, the min imum time until settled(c.T) evaluates to true is the min imum time it takes for the transit ion wri t ing c.T to be executed: minc,T\pb = e(Tc.T\pb) + 6fa = e{pb A -ic.T A ((oi.r A (bx.T V cin.T)) V (bx.T A cin.T))) + Sf" T h e max imum settling t ime for c.T is analogously maxc.T\Pb = £fax The sum variables sum2.T and sum^.F depend on the internal carry variables c.T and c.F. They are wri t ten by the ninth, tenth, eleventh and twelfth transit ion of table 5.2. After pb is set to T R U E , the min imum settl ing t ime of sum2-T is minsum.T\pb = e ( ( o i . T A ( ( i i . T A c . r ) V ( 6 i . F A c . f ) ) ) 44 V ( a i . F A {(h.F A c.T) V {b.T A c .F) ) ) ) + Sfin = m i n { m m c . T | p i ) , minc_F\pb} + 6fm = mm{Sfn,5fn} + Sfn = 25fa. < avT m.F bx.T bi.F cin.T cin.F a2.T a2.F b2.T b2.F sumi.T SWTIQ.F cout.T cout.F Figure 5.6: Dependency G r a p h For a variable a, the function />(<*) gives the predecessors of a in the depen-dency graph for the program. A t iming graph is a dependency graph annotated wi th t iming information. The dependency graph of the example adder program is depicted in figure 5.6. In the dependency graph, the nodes are the variables of the adder and an arc exists from a node to another, i f one of the nodes has the other as an input . 45 Definition 21 (Timing Graph) The timing graph of program P for the set of variables A under the assumption that p holds, denoted by G(P,A)\P> consists of the set of nodes N(p^\p, the set of arcs E(p^\p and the timing properties K(P,A)\P S U C A that • n £ N(PtA)\p <3> n £ A V 3a £ A : n £ p{ot)\p • E(P,A)\P - { ( ^ 1 , ^ 2 ) e V x V I n2 E A A ni £ p{n2)\P} • V n : (n,minn\p,maxn\p) £ K(pjA)\p. Let Tt] denote the set of transitions wi th in stage i of the transistor level divider. These are the transitions in the i-th instantiat ion of the MUX, CSA, CPA, QSL and Done cells as well as the two transitions that write pb(i). Stage i in the divider is in evaluation mode, when its precharge signal pb(i) is high and q(i) signal is low: Definition 22 (Evaluation Mode) eval(i) pb(i) A ~<q(i). The inputs to stage i are stable, i f no transi t ion from outside of stage i is enabled to write a variable wri t ten or read by a transit ion of stage i. Definition 23 (Stable Inputs) stable{i) « • ( V i ; eTtl-V, : Gj w{Tf) D (w(V,) U r (7^ ' ) ) = 0 ) Let r(7l\) € V t ] denote the set of variables read by transitions in Tt\ and let w{T2) G V t i denote the set of variables wri t ten by T^. The expression for empty in the next definition stems from the fact that a dual-rai l encoded variable is empty, when both its .T and .F field are F A L S E . 46 Definition 24 (Empty Stage) empty(i) & (a £ w{T?,) =>• ->a.) The inputs to a stage are valid, if the quotient bit from the previous stage is valid and all dual-rail inputs are valid. A dual-rail valued variable is valid, if it evaluates to T R U E or F A L S E and the quotient bit is valid, if exactly one of the wires representing the choices 0, 1 or -1 is high. Definition 25 (Valid Inputs) valid(i) O -Vi/.T, v.F e (r(73) - tw(73')) : iy.T A -.i/.F) V (-.i/.T A v.F) A 3j_ 0 : qbit^i) A V f c # j : -.qbitfe(») where the set r(T^) — w(T^) is the set of input variables of TJ. The next predicate states what the values of variables should be when they are settled after evaluating. Definition 26 (Settled Value) settlejval(i) <=> v € w(Tft) : settled(v) A eval(i) =$> v = GTRUE),;(v)) The predicates empty(i), stable(i), eval(i), valid(i) and settlejval(i) will ap-pear with a .pre and .post suffix, when it is necessary to distinguish their value before and after a state transition. The .r field denotes the time when the predicate last changed its value. 5 . 3 Proof of Refinement This section will prove that the transistor level program P t l is a refinement of the word level program P w l and is structured as follows. First a suitable abstraction 47 mapping will be defined and it will be discussed what obligations need to be shown for the refinement proof. In section 5.3.2 an overview of the proof will be given and the proof obligations will be proven in the remainder of this chapter. 5.3.1 The Abstraction Mapping A t l ) W l In the word level program all the bits of c a r r y ( i ) , sum(i) and q b i t ( i ) are assigned values simultaneously and always have valid values. In the transistor level program each transition only writes one variable. This means that there are reachable states of P t l where some of the dual-rail encoded output bits of a stage are empty and others are T R U E or F A L S E . The abstraction mapping between P t l and P w l needs to be defined in such a way that transitions writing variables of a stage appear as "stuttering" steps until all the outputs of this stage have valid values. In effect the abstraction mapping has to "hide" those states of the transistor level program that have output values that don't conform to those of P w l and only make the output of a stage "visible" when it has a valid value. Under this abstraction mapping, it then needs to be shown that the behaviour of the transistor level program is explainable by the behaviour of the word level program. Since the transistor level model uses dual-rail encoded variables and the word level model bit vectors (vectors of Boolean valued "bits") for the sum and c a r r y values, the abstraction mapping of states in »Stl to states in Swi has to convert the dual-rail variables to Boolean values. An appropriate data abstraction for q b i t has to be defined as well, since P t l uses three Boolean valued variables for each qbit( i ) whereas in the word level program qbit( i ) is one integer valued variable. Let / be a function that maps dual-rail variables of the transistor level stage to the corresponding variable in the word level program, with the following conversion of 48 dual-rai l values to Boolean: f (x.T, x.F) F A L S E , if - i x . T , < [ T R U E , i f X.T. Furthermore, let g be a mapping from the three transistor level q b i t ( i ) variables to the integer valued q b i t variable in P w l : Let A t l > w l be a mapping from states in the transistor level program Ptl to states in the word level program P w l . If pb(i) A q(i) holds for stage i, then the transistor level sum and carry variables of stage i are mapped to the word level sum and carry variables using the function / and q b i t ( i ) is mapped to ^ ( q b i t ( i ) ) . Otherwise, A t l ) W l maps the q b i t variables to the value 0 and all the transistor level sum and carry dual-rail variables to F A L S E . The variables pb(i) and q(i) are mapped to the variables wi th the same names in the word level program. The internal variables of the transistor level program are not part of the state space of P w l and are dropped by the abstraction mapping. Essentially, A t l i W l only makes the output values of a stage "visible" when it is in its holding phase. A l l transitions that take place while a stage is precharging or evaluating appear as s tut ter ing steps in the state space of P w i and the output variables are mapped to F A L S E and zero respectively. It follows that i f a state transi t ion in P t l from state s to state s' is caused by a transi t ion in one of the CSA, MUX, CPA, QSL or CSA-Done modules, both s and s' map to the same state under the abstraction mapping A t l ) V V l . O n l y the transitions that change one of the pb(i) - 1 , i f q b i t . ^ i ) , </(qbit(i)) = { 1, if q b i t , ( i ) 0, else. 49 or q(i) variables can lead to a state change at the word level after the abstraction mapping. The transitions TFAL5EM(q(i)) and T F A L S E , w l (g ( i ) ) both only have -ipb(i) in their guard and are identical . The transitions TTRVEiti(pb(i)) and T T R U E > w l (p6 ( i ) ) are also identical . A s for the transit ion T F A L S E j t l (p&( i ) ) , its guard is identical to the guard of the corresponding word level t ransi t ion. Note that after performing the transistor level t ransi t ion, the abstraction mapping wi l l map the output signals of this stage to the F A L S E vector and the value zero. These are the same values that are assigned to the outputs by transit ion T F A L S E W 1 (p6 ( i ) ) at the word level. It follows that the refinement obligation is t r iv ia l ly discharged for al l state transitions except those where the value of q(i) changes from F A L S E to T R U E . Protocols are used in both programs solely to assert upper bounds on the times between when a transit ion is enabled and when it performs its act ion. A tighter bound is more restrictive on allowed behaviours of the component modeled by the t ransi t ion. Therefore, we require that the protocol of the implementat ion be at least as strong as the protocol of the specification. Note that for both programs, the protocols allow an unbounded advance of t ime. Therefore, proving that P t l is a refinement of P w l under the abstraction map-ping A i , w i requires proving the following obligations: • W h e n a transi t ion in P t l is enabled to set q(i) to high, the t ransi t ion in P w i that sets q(i) to high is also enabled. • Furthermore, i t has to be shown that after q(i) is set to high and the values of c a r r y ( i ) , sum(i) and q b i t ( i ) become "visible" under the abstraction mapping, these values agree wi th the values computed by the word level program. 50 • The protocol of P t l implies the protocol of P w l composed wi th the abstraction mapping A t l ) W l . The next section gives an overview of how these obligations wi l l get dis-charged. 5.3.2 O v e r v i e w o f t h e P r o o f The refinement proof wi l l proceed as follows (see figure 5.7): • A t iming graph of stage i of P t l is constructed and t iming properties derived from i t . Th i s t iming graph represents stage i of P t , dur ing its evaluation phase. T i m i n g bounds for the quotient selection logic and carry-save adder can be de-rived from this graph as well as the input-output relation of the combinat ional network implemented by the internal gates of the stage. These derivations are only valid under the assumption that the following four properties hold: Qi : -*eval(i) A -iq(i) A GTRVEiii(pb(i)) empty(i) Q2 : eval(i) => stable(i) Q3 : the t iming graph is acyclic Q4 : i f eval(i), then only those transit ions in 72 that set variables to T R U E are enabled Qi states that the stage i is empty at the start of evaluation and Q2 states that its inputs are stable. Q3 can easily be verified using a graph traversal algori thm and Q4 follows from Q2. • Since model checking the transistor level model is infeasible due to the large reachable state space, a speed-independent abstract program, P a bs, si) is con-structed, and i t is shown that Qi, Q2 and Q4 are safety properties of P a b 3 ] s i . 51 Asynchronous Timed Word Level Divider Model P„, r e f i n e m e n t Speed-Indep. Abstract Model • r e f i n e m e n t eval(i) valid(i) eval(i) =£• stable(i) -^eval(i) A ->q(i) A Gm(pb(i)) => empty(i) r e f i n e m e n t .. Timed Abstract Model Transistor Level Model P„ Figure 5.7: Refinement Proof The abstract program is described in section 5.3.4. In addition, the properties Q 5 : eval(i) =>• valid(i) FALSE,tl ( < ? ( » ) ) = > -.p6(i) TRUE.tl are proven to be safety properties of P a b 3 i s i . These properties are needed to prove that P t l is a refinement of P w l . • A timed version of P a b 3 i 3 i is constructed and it is shown that the timed program is a refinement of the speed-independent program. This proof is analogous to the refinement proof in chapter 4. • Next it is proven in section 5.3.5 that P t l is a refinement of the timed, abstract program. Consequently, safety properties of P a b 3 are also safety properties of • Subsection 5.3.6 completes the proof that P t l is a refinement of the word level program P w l . The timing properties derived from the graph as well as prop-52 erties Q\ — Q7 are used in the proof. 5.3.3 T i m i n g G r a p h £(ptl,w(rt'))Ui(i) Assuming that a stage i is empty at the start of the evaluation and that all the variables read by transitions in are stable, the earliest and latest settling times for variables in w(Tt\) can be determined from the timing graph G'™^^^. This is captured in the next theorem. T h e o r e m 4 If the following properties hold for Pltl: (i) —<eval(i) A —>a(i) A G TRUE,u(pb(i)) empty(i) (ii) eval(i) stable(i) (iii) the timing graph G(ptlW(Til))\^ai(i) of stage i is acyclic (iv) if eval(i), then only those transitions in 7~*, that set variables to T R U E are enabled then Property (i) is an assertion about the internal and output signals of stage i and states that they all have to be F A L S E , when the transition TTRUBM(pb(i)) is enabled. After TrmeM(pb(i)) has executed, eval(i) will hold. Property (ii) states that when a stage is evaluating, all the inputs to the stage have to be stable. The last property follows from property (ii), because all transitions that set variables to F A L S E have ->pb(i) in their guard, except the transition that writes pb(i) itself. Since 53 pb(i) is an input to stage i, stable(i) implies that the transi t ion T f a l 3 e > t l (p6(i)) is not enabled. The proof of theorem 4 wi l l be omit ted here, but it can be proven using an inductive proof over the depth of nodes in the tree. Theorem 4 applies to any pipeline or ring structure that uses dual-rai l coding and precharged logic. Thus , it provides a framework for verifying other designs in the same style of this divider. Coro l la ry 1 below specializes the theorem for application to the divider. Corollary 1 With the same hypotheses as theorem 4'-The theorem enables reasoning about the relationship between the t ime when the signal csa_done goes high and when the qs l_done signal goes high. Since the transistor level divider uses only the output of the QSL module to determine when a stage has finished evaluating, part of the refinement proof involves proving that the CSA module wi l l always be finished by the t ime the qs l_done signal is set to T R U E . Th i s is stated in the next theorem. However, for the signals csa_done and qs l_done to be settled does not imply that they have the value T R U E . T h i s only follows from the property that valid(i) was true and is expressed in theorem 6. Theorem 5 / / the four properties of theorem 4 hold for Pla, then A A < min, =>• (seri/ed(qsl_done) seM/ed(csa_done)). 54 This follows directly from theorem 4. Theorem 6 // , in addition to the four properties of theorem 4, eval(i) =>• valid(i) holds for P\v it follows then that eval(i) => (Vv G w(TJ) : settled(v)) => valid(i © 1) and eval(i) => ( seM/ed(qsl_done) =>• q s l . d o n e A se^/ec?(csa_done) csa_done) This theorem makes a statement about the relationship between inputs and outputs of a stage and follows from the settle-val(i) predicate by Boolean algebra. Theorem 7 The timing graph Q(ptuW{T}t))\ of stage i is acyclic. Proof. This theorem is proven using a depth first search based algorithm on the graph. & 5.3.4 The Abstract Programs P a b S ) s i and P a b s Two additional programs have to be constructed to prove that the transistor level model is a refinement of the word level program. They consist of three stages that have the same inputs and outputs as the transistor level stages, but they do not divide. The first one of these abstract programs, P a b a > a i is speed-independent and is used to derive safety properties for P t l . It doesn't have any timing information and can be model checked. The second abstract program, P a b s , has the same timing bounds as P w l and is only used as an intermediate model to prove that the transistor 55 sum(i-l) carry (i-1) quolient(i-l) divisor "CSA" "QSL" CSA Don. qsl_done QSL Dond X Figure 5.8: The Abstract Stage sum(i) carry (i) quotientbit(i) level program is a refinement of P a b 3_ s i . Both abstract programs have the same variables as the transistor level program and the same initial state predicate. Figure 5.8 shows the structure of an abstract stage. The cells " C S A " and "QSL" are in quotes, as they don't actually add correctly or select a quotient sen-sibly. Rather, each of the 55 carry and sum bits is assigned T R U E or F A L S E nonde-terministically under the restriction that dual-rail pair cannot be set to undefined. A l l internal variables are also assigned values in this way. The quotient is nonde-terministically set to -1, 0 or 1. During the precharge phase, all signals are set to F A L S E . The abstract models, like the transistor level divider, also have q(i) signals that control the evaluation flow. In the abstract models this signal is only set to high after explicitly testing that all the outputs of a stage have valid values (see figure 5.8). In addition to testing that all outputs of a stage have valid values, P a b 3 i 3 i has ->q{i © 1) in the guard of the transition that sets q(i) to high. 56 T h e o r e m 8 The following properties are safety properties of P0 abs, si • (iii) (iv) (ii) (v) (0 i=0 i=0 GTRms,u(pb(i)) =>• ->q(i) GFALSE,tl(q(i)) => ->pb(i) ->eval(i) A -<q{i) A GrRVE>t,(pb(i)) =>• empty(i) eval(i) => valid(i) eval(i) stable(i) P r o o f . All five properties were shown using model checking. O The timed abstract program has the lower timing bound r > p&(i).r+T e v a l u a t e in the guard of the transition that sets the q(i) signal to high and a protocol that specifies how long precharging will take at most. It includes upper bounds of the form for each variable v as well as for the signal q(i). The proof that P a b 3 is a refinement of P a b 3 , 8 i is analogous to the refinement proof in chapter 4 and will not be described here. Since the interfaces between a stage of P a b 3 and a stage of P t l are identical, one can easily exchange a stage of one program with that of the other. This property will be used in the next section to prove that P t l is a refinement of P a b 3 . 5 .3 .5 P t l i s a r e f i n e m e n t o f P a b s Instead of showing that the entire transistor level divider P t I is a refinement of the abstract program P a b 3 , one stage at a time of P a b 3 is replaced by a stage of P t l , as shown in figure 5.9. Let P h J y b r i d denote the program that is obtained by replacing 57 Abstract Stage Arithmetic Stage Abstract Stage Figure 5.9: H y b r i d P rogram stage i of P a b 3 w i th a stage from P t l It is then proven that the hybrid programs - ^ h y b r i d ) - ^ h y b r i d a n d h y b r i d a r e e a c n a refinement of P a b 3 . A s the stages are identical, the proof is identical for al l three programs. It follows then that P t l is a refinement of P a b 3 (see [6]). In proving that the transistor level program is a refinement of the abstract program, it is necessary to use some of the safety properties of P a b 3 . However, since it hasn't been proven yet that P t l is indeed a refinement of P a b 3 , the safety properties cannot be utilized a priori. The next theorem enables the use of safety properties of a specification program in proving that an implementat ion is a refinement of the specification. In the following let P 3 p e c ( < S 3 p e c , TZspec, <2o,3Pec) denote a specification program wi th the set of reachable states <S3pec, the state transi t ion relation 7Zspec and the ini t ia l state predicate <5o,spec- The implementation program is denoted by P i m p ( > S i m p , T ^ i m p j Q o , i m P ) - To simplify the notat ion, it wi l l be assumed that the state transi t ion relations include al l "stuttering" steps, i.e. V s € S : (s, s) € TZ. The set of traces of the implementat ion is denoted by T ( P i m p , Q o , i m P ) -T h e o r e m 9 Given the properties 58 (*) Q'(si) => ( ( s i , s 2 ) G n i m p (A{SL), A(s2)) G TZapec) (ii) V s G <Sspec : Qspec(s) (iii) W eT(P,mp,Q0iimp) : ( V t o Q » p e c ( A ( £ f ) ) =• V j L 0 V s G <S imp : Qo , ,m P (s) Q0,spec(MS)) the implementation P,mp is a refinement of the specification Pspec under the abstrac-tion mapping A : V s , s G S,mp : (s, s) G 1limp = • ( (A(S), A(s') ) G Kspcc). Proper ty (i) states that the predicate Q' implies that Pimp is a refinement of Pspec under the abstraction mapping A . Proper ty (ii) states that Qspec is a safety property of the specification program. If property (iii) can be shown for al l k, where is a natural number, then it follows that Q' is a safety property of P i m p . The last property states that i f s is an in i t ia l state of the implicat ion program, its mapping A(s) is an ini t ia l state in the specification P 3 p e c and thus a reachable state in Pspec. P r o o f . The proof is an inductive proof over the length of traces. It wi l l first be shown that the theorem holds for al l traces of length 1. The inductive step wi l l prove that i f the theorem holds for traces of length /, it w i l l hold for al l traces of length / + 1. B y the definition of the set of reachable states, it follows then that the theorem holds for V s , s G <S imp w i th (s, s) G 7 ^ i m p -59 V i 1 e T ( P i m p , Q 0 , i m p ) , : g 0 , i m p ( i o ) => Q o , 3 P e c ( A ( i 0 ) ) => A(tl) G Sspec => Q'(th) (A{t0),A{t\)) e n s p e c F r o m the last implicat ion follows that the mapping of t\ is also a reachable state in P 3 p e c . Since Q 3 p e c ( A ( i 0 ) ) and Q 3 p e c ( A ( i } ) ) holds, Q'(t\) must hold as well by property ( i n ) . Assuming that for a natural number / it has been shown that Q' holds for all states of all traces of length I and that V ' : 0 (A(tj.),A(tj.+1)) € 7 e 3 p e c , it follows that W + 1 e T ( P i m p , Q o , m p ) , : Q'(t\+1) (A(t'l+1),A(tlll\)) eR3pec and wi th property (iii) this implies that Q'(iJ+j) holds. S Since P a b 3 and P h y b rid have the same variables, let the abstraction mapping A h y b r i d , a b 3 ) which maps states from P h y b r i d to states in P a b 3 , be defined as the identity function: V s G £ h y b r i d : A h y b r i d ] a b 3(s) = (s). T h e o r e m 10 The program Phybrid with the initial state predicate Qo,hybrid is a refine-ment of the abstract program P a { , s with the initial state predicate Qo,ab, under the abstraction mapping Ahybridiab,. 60 P r o o f S k e t c h . This theorem was proven using a proof checker. Instead of present-ing the details of the proof here, only an overview of the strategy applied wi l l be presented. It suffices to show that there exists a predicate Q'(s) and a safety prop-erty Q3pec(s) such that the first three properties of theorem 9 hold. A s by definition Qo .hybr id = Qo .abs ° A h v b r i d i a b 3 , the last property holds t r ivial ly . For stage i of the abstract model , let the predicates of section 5.2 be defined in an analogous fashion and let Qspec be a predicate over the state space of program P a b 3 defined as: QsPec{s) = V?=0 eval(i) => stable(i) A V - = 0 ->eval(i) A -iq(i) A GTm]EM(pb(i)) => empty(i) A V _^o eval(i) =$> valid(i). Let Q' be a predicate over the hybrid program P , j y b r i d : Q'(s) = eval(i) => valid(i) A stable(i) A settle-val(i) A ( r > eval(i).T + m a a ; c s a . d 0 „ o | e „ a , w => csa_done) A ( r < eval(i).r + m m q ! l . d o n 9 | e „ a , w - i qs l . done) Proper ty (i) of theorem 9 is discharged by Boolean manipulat ion using a proof checker and property (ii) is a safety property of P a b 3 _ s i and hence also of P a b 3 . If eval(i) evaluates to false, Q'(s) is t r iv ia l ly true and property (iii) holds. The t iming graph analysis discharges property (iii) for the states where eval(i) holds (see theorems 4, 6 and 7). S Let A t l ) a b 3 be defined as the identity function over the set of states <St]. T h e o r e m 11 The program P„ with the initial state predicate Qot„ = Qo,ab, ° A „ ) 0 6 , is a refinement of program Pab, with the initial state predicate Qo,ab, under the ab-straction mapping A „ ) ( l 6 s . 61 This follows from theorem 10 and [6]. 5 . 3 . 6 P t l is a refinement of Pwl Theorem 12 The program Pu is a refinement of Pml under the abstraction mapping Proof Sketch. Th i s refinement proof was also proven using a proof checker. On ly a sketch of the proof wi l l be presented here to give an intuit ive understanding of the proof. A s described in section 5.3.1, these obligations must be discharged: (i) W h e n a transi t ion in P t l is enabled to set q(i) to high, the transi t ion in P w i that sets q(i) to high is also enabled. (n) The variables carry(i), sum(i) and q b i t ( i ) of P t l under the abstraction mapping A i . w i have the same values as the variables in P w l . (iii) The protocol of Pti implies the protocol of P m l o A t l ; W l . Obligation (i): Since P t l is a refinement of P a b 3 (see theorem 11), the previously proven safety properties of the abstract program are also safety properties of P t l , subject to the abstraction mapping. These safety properties then satisfy the assumptions of theo-rem 4. It then follows that Q'(s) is a safety property of P t l , where Q'(s) = V^_0 eval(i) =>• valid(i) A stable(i) A settlejval(i) A (T > eval(i).T + maa; I H J l i n , csa_done) A (r < eval(i).r + m m q s l . d 0 „ 8 -iqsl.done). 62 The variable q s l . d o n e is set to T R U E without any t ime delay as soon as one of the q b i t variables goes high, hence it follows that V t 2 = 0 eval{i) =4> ( r < eval(i).r + minvi_„„„, V j _ 0 - i q b i t ^ i ) ) Furthermore, the properties FALSE,tl (q(i)) => -.pft(i) * 2 = 0 ^TRUE.t l (p&(*)) =>• --ffW are safety properties of P a b 3 ] 3 i and therefore of P t ! (see theorem 8). They imply that, for eval(i) to hold, first g(i) must have been set to false and then pb(i) to high. It follows that V 2 _ 0 eval(i) eval(i).r = pb(i).r and that V f _ 0 eval(i) A q b i t ^ ( i ) => r > pb(i).r + mzn q s l . d < i n „ Note that according to theorem 6 only one q b i t - ( i ) can have the value T R U E . The transit ion TTRVEM(q(i)) which sets q(i) to T R U E in the transistor level program is enabled iff exactly one of the q b i t ( i ) variables is T R U E and eval(i) holds. If m m , , u . n . > T e v a l u a t e , it follows that (q(i)) G TRUE.wl where G T R U E , w i ( g ( « ) ) is the guard of the transit ion setting q(i) to T R U E in the word level program. T h i s discharges obligation (i). Obligation (ii): The predicate settle-val(i) establishes a connection between the inputs and the out-puts of a stage. Based on this predicate, tautology checking using the proof checker 63 proved that the values of the transistor level stage outputs were the same under the abstraction mapping as the values of the word level stages. Obligation (iii): T h e protocol of P t l includes an upper bound for the transi t ion that sets q(i) to F A L S E : (-ipb(i) A q(i)) ( r < pb(i).T + 7? a x). Therefore, if *yfax < T f precharge ? the protocol of P t l implies the protocol of P w l under the abstraction mapping A t l ) W l and obligation (ii) is discharged. Th i s completes the proof that P t l is a refinement of P w l . 64 Chapter 6 Conclusion T h i s thesis presented a refinement based proof strategy for verifying the correctness of an asynchronous divider chip. Due to the large state space of the transistor level model , automatic model checking is infeasible. Likewise, constructing a proof from first principles in a t radi t ional theorem prover would be extremely t ime consuming and tedious. Th is thesis presented a pract ical approach employing four progressively more detailed models of the divider . K e y safety properties are established for the abstract models using model checking. These properties are shown to hold for the more detailed models by verifying refinement between adjacent levels in the model hierarchy. Funct ional correctness of the divider is proven on the first, most abstract model , and the second model is shown to be speed-independent. T h e pract ical i ty of the strategy is demonstrated by proving three out of the five steps that constitute the correctness proof for the divider . Furthermore, this thesis showed how different formal verification methodolo-gies can be combined to prove a chip design correct. The proof strategy combined techniques from model checking, theorem proving as well as t iming verification using 65 graph based approaches to overcome the l imitat ions that each exhibits on its own. M o d e l checking was used to automatical ly prove safety properties on the abstract models and a t iming graph was used to extract t iming properties of the transis-tor level model . These were needed to prove that its behaviour corresponds to the behaviour allowed by the word-level models. The aforementioned verification tech-niques were formalized as domain-specific decision procedures in a proof checker, which also includes general purpose decision procedures for proposit ional logic and linear equalities and inequalities. T h e proof checker ensured that al l pending proof obligations were discharged, and thus ascertained soundness of the proof, given the soundness of the decision procedures. The proof checker used for this research is s t i l l under development and the research presented in this thesis provided a significant example for its development. The first prototype of the proof checker was wri t ten in the F L language [14]. The proofs attempted here revealed performance problems in the first prototype; verify-ing speed-independence of the word-level model took several hours. A newer version, implemented in M L has significantly better performance; the proofs described in this thesis took only a few minutes. To achieve this performance, it was necessary to specify good orderings for variables for the B D D based methods. Variable ordering affects the performance but not the semantics of these decision procedures. Inter-faces for specifying variable ordering were added to the proof checker in response to experience gained in developing the proofs presented here. 6.1 Future Work To complete the proof of the divider proposed in chapter 3, two more steps are required. F i r s t , a suitable synchronous divider model needs to be constructed, 66 and it need to be proven that its output converges towards the quotient of the inputs. Second, it has to be shown that the speed-independent word level model is a refinement of the synchronous model. It would furthermore be interesting to apply the research presented in this thesis to other t imed chip designs. The proof strategy of this thesis was geared towards proving this particular divider chip correct. A p p l y i n g the method to other t imed chip designs might lead to a more general description of the proof method-ology. For example extending the proof of the three stage divider to the five stage divider described in [20] could be a first step towards developing a general strategy in proving t imed chips correct. 67 Bibliography [1] Martin Abadi and Leslie Lamport. An old-fashioned recipe for real time. In J . W . de Bakker et al., editors, Proceedings of the REX Workshop, "Real-Time: Theory in Practice". Springer-Verlag, 1992. L N C S 600. [2] Bernard Bertholomieu and Michel Diaz. Modeling and verification of time dependent systems using time petri nets. IEEE Transactions on Software En-gineering, 17(3):259-273, March 1991. [3] Randal E . Bryant. Bit-level analysis of an SRT divider circuit. Technical Re-port CMU-CS-95-140, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 1995. [4] J . R. Burch, E . M . Clarke, D . E . Long, K . L . McMil lan , and D . L . Di l l . Symbolic model checking for sequential circuit verification. Technical Report C M U - C S -93-211, School of Computer Science, Carnegie Mellon Univeristy, July 1993. [5] Joseph J . F . Cavanagh. Digital Computer Arithmetic : Design and Implemen-tation. McGraw-Hil l , New York, 1984. [6] Mark Greenstreet. Synchronized Transitions, unpublished manuscript, 1997. 68 [7] Mark R. Greenstreet. START. A Technique for High-Bandwidth Communi-cation. PhD thesis, Department of Computer Science, Princeton University, January 1993. [8] Thomas A . Henzinger, Zohar Manna, and Amir Pnueli. Temporal proof methodologies for real-time systems. In ACM Symposium on Principles of Programming Languages, pages 353-366, 1991. [9] Trevor Lee, Mark Greenstreet, and Carl-Johan H . Seger. Automatic verification of asynchronous circuits. Technical Report TR-93-40, Department of Computer Science, University of British Columbia, Vancouver, November 1993. U R L : http://www.cs.ubc.ca/tr/1993/TR-93-40.ps. [10] Trevor W . S. Lee. ST to F L translation for hardware design verification. Mas-ter's thesis, Department of Computer Science, University of British Columbia, Apri l 1994. [11] Gensoh Matsubara and Nobuhiro Ide. A low power zero-overhead self-timed division and square root unit. In Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems. I E E E Computer Society Press, Apri l 1997. [12] S. Owre, J . M . Rushby, and N . Shankar. P V S : A prototype verification system. In Deepak Kapur, editor, 11th International Conference on Automated Deduc-tion (CADE), number 607 in Lecture Notes in Artificial Intelligence, pages 748-752, Saratoga, N Y , 1992. Springer Veriag. [13] H . Ruefi, N . Shankar, and M . K . Srivas. Modular verification of SRT division. In R. Alur and T. A . Henzinger, editors, Computer-Aided Verification, CAV '96, 69 Lecture Notes in Computer Science, New Brunswick, N J , July 1996. Springer-Verlag. To appear. [14] Carl-Johan Seger. Voss — A formal hardware verification system: User's guide. Technical Report UBC-CS-93-45, Department of Computer Science, University of British Columbia, Vancouver, B C , 1993. [15] Jens Spars0 and J0rgen Staunstrup. Delay-insensitive multi-ring structures. INTEGRATION, 15(3):313-340, October 1993. [16] J . A . Stankovic. Misconceptions about real-time computing. IEEE Computer, 21(10):10-19, October 1988. [17] J0rgen Staunstrup. A formal approach to hardware design. Kluwer Academic Publishers, Boston, 1994. [18] T. E . Williams, M . A . Horowitz, R. L . Alverson, and T. S. Yang. A self-timed chip for division. In Stanford Conference on Advanced Research in VLSI, pages 75-96, March 1987. [19] Ted E . Williams and Mark A . Horowitz. SRT division diagrams and their usage in designing custom integrated circuits for division. Technical Report C S L - T R -87-326, Computer Systems Lab, Dept. of E E , Stanford, November 1986. [20] Ted E . Williams and Mark A . Horowitz. A 160ns 54 bit C M O S division imple-mentation using self-timing and symmetrically overlapped SRT stages. In 10th IEEE Symposium on Computer Arithmetic, June 1991. 70 Appendix A Source Code c* * d e f i n i t i o n module for the main function * of the timed word l e v e l program *) DEFINITION MODULE main; FROM types IMPORT BinNumber, BinElement, ResElement, ResNumb DRArray, ArrayOfDRA; FROM divider IMPORT BinArray; EXPORT Divider; TYPE IntArray(n: INTEGER) = ARRAY(i: INDEX(n)): INTEGER; T = CELL(STATIC realdividend, r e a l d i v i s o r : REAL); F l = FUNCTION(binelem: BinElement): BOOLEAN; F2 = FUNCTION(binnum: BinNumber; n: INTEGER): BinArray; F3 = FUNCTION(binnum: BinNumber; n: INTEGER): IntArray; STATIC Divider: T; BinElemToBool: F l ; BinNumToBool: F2 ; BinNumToInt: F3 ; END. (* main *) 71 (* * implementation module for the main function * of the timed word le v e l program * ) IMPLEMENTATION MODULE main; FROM stringlO IMPORT WriteString; FROM strings IMPORT IntToString, BoolToString; FROM cast IMPORT round; FROM mycast IMPORT Makebin, MakeString; FROM divider IMPORT DividerArray; FROM higherlevel IMPORT BinToReal, s h i f t b i n , GetMantissa, GetExponent; IMPORT con; IMPORT mycast; STATIC changequot e([,]) ifelse(PREC, 1, [Precision : INTEGER = 6;], PREC, 2, [Precision INTEGER = 32;], [Precision : INTEGER = 64] ) ifelse(PREC, 1, [MantPrec : INTEGER = 3;], PREC, 2, [MantPrec : INTEGER = 23;], [MantPrec : INTEGER = 52] ) ifelse(PREC, 1, [ExpPrec : INTEGER =2;], PREC, 2, [ExpPrec : INTEGER =8;], [ExpPrec : INTEGER =11]) ResBits : INTEGER = MantPrec; MantBits : INTEGER = MantPrec + 3; Divider: T = STATE step: INTEGER = 1; STATIC (* * function to change a binary * number into a boolean variable *) 72 BinElemToBool: Fl= BEGIN { ONE | (binelem = 1) ? TRUE, (binelem = 0) ? FALSE } END; (* BinElemToBool *) BinNumToBool: F2 = STATIC z: BinArray(n) = BEGIN BinElemToBool(binnum(i)) END; BEGIN z END; (* BinNumToBool *) binrealdividend: BinNumber(Precision) = Makebin(realdividend); bi n r e a l d i v i s o r : BinNumber(Precision) = Makebin(realdivisor); booldividend: BinArray(Precision) = BinNumToBool(binrealdividend, binrealdividend.n); booldivisor: BinArray(Precision) = BinNumToBool(binrealdivisor, binrealdivisor.n); dividendexp: REAL =(BinToReal(GetExponent(booldividend), 2)); divisorexp: REAL = (BinToReal(GetExponent(booldivisor), 2)); BEGIN (* * print out the arguments i n f i r s t 2 steps *) « step= 1 -> step := step+1 » * WriteString( BEGIN con .cat2("dividend: ", MakeString(realdividend)) END) I | « step= 2 -> step := step+1 » * WriteString( BEGIN con.cat2("divisor: ", MakeString(realdivisor)) END) 73 (* * c a l l the divider array * ) I | « (step = 3) » * DividerArray( GetMantissa(booldividend), GetMantissa(booldivisor), round(dividendexp - divisorexp), MantBits, ResBits) END; (* Divider *) END. (* main module *) (* * d e f i n i t i o n module f o r WLTimedDivider *) DEFINITION MODULE WLTimedDivider ; FROM types IMPORT ResElement, ResNumber; FROM timer IMPORT Time, TimeArray; EXPORT DividerArray, BinArray; TYPE BinArray(n: INTEGER) = ARRAY(i: INDEX(n)): BOOLEAN; PrechargeBar = RECORD t : Time; (* time at which value was l a s t changed *) v : BOOLEAN (* value *) END; Output = RECORD END; StageState = RECORD pb: PrechargeBar; q: Output END; StageArray(n:INTEGER) = ARRAY(i:INDEX(n)):StageState; t : Time; (* time at which value was la s t changed *) v : BOOLEAN (* value *) 7 4 F = CELL(me, next, prev: StageState; r e s u l t : R'esNumber; quotientnumber: INTEGER; remainder, divisor:BinArray; STATIC MantBits: INTEGER; STATIC exp: INTEGER; tau : Time; NextTime: TimeArray; STATIC number: INTEGER); CI = CELL(STATIC bindividend, bindivisor:BinArray; STATIC exp INTEGER; STATIC MantBits, ResBits: INTEGER); STATIC process: F; DividerArray: CI; END. (* divider *) (**********************************************) (* * implementation module f o r WLTimedDivider *) IMPLEMENTATION MODULE WLTimedDivider; FROM stringlO IMPORT WriteString; FROM strings IMPORT IntToString, BoolToString; FROM WLTimedHigherlevel IMPORT divide, inverter, binprint; FROM timer IMPORT ChangeTime, StopPrechargeTime, StartEvaluationTime; IMPORT con; STATIC NoOfStage : INTEGER = 3 ; process: F = BEGIN (* * enable evaluation, i f the successor stage * has started to precharge * (just added the additional constraint that * the output has to drop f i r s t ) *) « NOT next.pb.v AND NOT me.pb.v -> me.pb.v, me.pb.t := TRUE, tau » 7 5 (* * i f successor stage has finish e d 0 . * and i s holding the output, we can start * precharging *) I I « next.pb.v AND next.q.v AND me.pb.v -> me.pb.v, NextTime(number).otime, me.pb.t := FALSE, tau + StopPrechargeTime, tau » (* * as soon as we start precharging, we can * drop the output signal *) I I « NOT me.pb.v AND me.q.v -> me.q.v, NextTime(number).otime, me.q.t := FALSE, 0 . 0 , tau » (* * i f in 0 . mode ( i . e . no precharge), * and predecessor has value, can determine * the next quotient * * setting count to zero enables the divider * which sets me.q.v to TRUE after determining * the next quotient d i g i t *) I I « me.pb.v AND (tau >= me.pb.t + StartEvaluationTime) AND NOT me.q.v » * divide(remainder, d i v i s o r , r e s u l t , exp, MantBits, quotientnumber, me.q.v) END; (* process *) STATIC DividerArray: CI = TYPE LCI = CEL L ( f i r s t , second, t h i r d : StageState; r e s u l t : ResNumber); LF1 = FUNCTION(first, second, t h i r d : StageState): STRING; 76 LF2 = FUNCTION(bindividend, bindivisor: BinArray; n: INTEGER): ResElement; LF3 = FUNCTION(bindividend: BinArray; n: INTEGER): ResElement; STATE NextTime: TimeArray(NoOfStage); tau : Time = 0.0; f i r s t : StageState; second: StageState; t h i r d : StageState; I n i t : BOOLEAN = TRUE; counter: INTEGER = 0; quotientnumber: INTEGER = ResBits-2; result : ResNumber(ResBits); (* * work with positive dividend and di v i s o r *) remainder: BinArray(MantBits) = BEGIN {ONE | ( NOT bindividend(MantBits - D ) ? bindividend(i), bindividend(MantBits-l) ? inverter(bindividend, MantBits)(i) } END; posdivisor: BinArray(MantBits) = BEGIN {ONE I ( NOT bindivisor(MantBits-1)) ? b i n d i v i s o r ( i bindivisor(MantBits-1) ? inverter(bindivisor, MantBits)(i) } END; negdivisor: BinArray(MantBits) = BEGIN {ONE | bindivisor(MantBits-l) ? b i n d i v i s o r ( i ) , ( NOT bindivisor(MantBits-l))? inverter(bindivisor, MantBits)(i) } END; STATIC detsign: LF2 = (* function to determine sign of result *) BEGIN { ONE| (NOT bindividend(n-1)) AND (NOT bi n d i v i s o r ( n - l ) ) ? 0 bindividend(n-1) AND bindivisor(n-1)? 0, (NOT (bindividend(n-l)=bindivisor(n-l)))? 1 } 77 END; STATIC detquotient: LF3 = (* function to determine quotient b i t *) BEGIN { ONE | NOT (bindividend(n-l)) ? 1, (bindividend(n-l) AND bindividend(n-2) ) AND bindividend(n-3) ? 0, (bindividend(n-1)) AND ( NOT (bindividend(n-2)) OR NOT (bindividend(n-3))) ? -1 } END; STATIC I n i t i a l i z e : LCI = (* * i n i t i a l i z e state of stage * and put the sign of the * quotient into result.n-1 * ) BEGIN « fi r s t . p b . v , f i r s t . q . v , second.pb.v, second.q.v, third.pb.v, third.q.v, result(result.n-1), result(result.n-2), f i r s t . p b . t , second.pb.t, third.pb.t := TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, detsign(bindividend, bi n d i v i s o r , MantBits), detquotient(remainder, MantBits), 0.0, 0.0, 0.0 » * WriteString(BEGIN " I n i t i a l i z i n g . . . " END) END; STATIC Printer: LF1 = (* pr i n t state of the stages *) BEGIN con.cat3( con.cat3( BoolToString(first.pb.v, FALSE), BoolToString(first.q.v, FALSE), " " ), con.cat3( BoolToString(second.pb.v, FALSE), BoolToString(second.q.v, FALSE), " " ), con.cat3( BoolToString(third.pb.v, FALSE), 78 Boo lToStr ing ( th i rd .q .v , FALSE), " " ) ) END; STATE sign : ResElement = dets ign(bindividend, b i n d i v i s o r , MantBits); BEGIN « In i t -> In i t := FALSE » * I n i t i a l i z e d i r s t , second, t h i r d , resu l t ) I I « NOT In i t AND (counter < 2000) -> counter := counter + 1 » * ( process( f i r s t . s e c o n d , t h i r d , r e s u l t , quotientnumber, remainder, posd iv i sor , MantBits, exp, tau , NextTime, 0) II process( s e c o n d , t h i r d , f i r s t , r e s u l t , quotientnumber, remainder, posd iv i sor , MantBits, exp, tau , NextTime, 1) II process( t h i r d , f i r s t , second, r e s u l t , quotientnumber, remainder, posd iv i sor , MantBits, exp, tau , NextTime, 2) I | ChangeTime(tau, tau , NextTime) ) END; (* DividerArray *) END. (* d iv ider module *) (* * d e f i n i t i o n module for WLTimedHigherlevel *) DEFINITION MODULE WLTimedHigherlevel; FROM types IMPORT ResElement, ResNumber, BinNumber, ArrayOfBool; FROM WLTimedDivider IMPORT BinArray; EXPORT BinToReal, d i v i d e , GetExponent, GetMantissa, r e s p r i n t , s h i f t b i n , p r i n t r e s u l t , inver ter , b i n p r i n t ; 79 TYPE F l = FUMCTIOM(m, nbits: INTEGER): BinArray; F2 = FUNCTION(bindividend, bindivisor: BinArray; n: INTEGER): BOOLEAN; F4 = FUNCTION(bool: BOOLEAN): INTEGER; F5 = FUNCTION(bindividend, bindivisor: BinArray; n: INTEGER): BinArray; F6 = FUNCTION(binnumber: BinArray; n: INTEGER): BOOLEAN; F7 = FUNCTION(binnumber: BinArray; n: INTEGER): BinArray; F8 = FUNCTION(binnumber: BinArray; n: INTEGER): REAL; F9 = FUNCTION(bindividend, bindivisor: BinArray; n, m: INTEGER): BinArray; F10 = FUNCTION(result: ResNumber; n, exp: INTEGER): STRING; F l l = FUNCTION(bindividend, bindivisor: BinArray; n: INTEGER): ResElement; F12 = FUNCTION(bindivisor: BinArray; n: INTEGER; m: ResElement) BinArray; F13 = FUNCTION(a: INTEGER): REAL; F14 = FUNCTION(bindividend: BinArray; n: INTEGER): ResElement; F15 = FUNCTION(binnumber: BinArray): BinArray; F16 = FUNCTION(binnumber: BinArray): BinArray; F17 = FUNCTION( num: BinArray; n: INTEGER):STRING; F18 = FUNCTI0N( num: ResNumber; n: INTEGER):STRING; F19 = FUNCTION(summand: BinArray; n: INTEGER):BinArray; CI = CELL(remainder, divisor:BinArray ; r e s u l t : ResNumber; STATIC exp: INTEGER; STATIC n: INTEGER; quotientnumber: INTEGER; done: BOOLEAN ); STATIC change2bin: F l ; bitadd: F2; carry: F2; BoolToInt: F4; bitadder: F5; adder: F9; detsign: F l l ; 80 getsummand: F12; detquotient: F14; BinToReal: F8; divide: CI; ca r r i e r : F5; invert: F6; inverter: F7; s h i f t b i n : F7; printresu.lt: F10; power2: F13; IntToReal: F13; binprint: F17; resprint: F18; getshort: F19; GetMantissa: F15; GetExponent: F16; END. (* higherlevel *) (* * implementation module f o r WLTimedHigherlevel *) IMPLEMENTATION MODULE WLTimedHigherlevel; FROM stringlO IMPORT WriteString; FROM strings IMPORT IntToString, BoolToString; FROM cast IMPORT r e a l , round; FROM mycast IMPORT Makebin, MakeString; IMPORT con; IMPORT mycast; STATIC changequot e([, ] ) ifelse(PREC, 1, [Precision : INTEGER = 6;], PREC, 2, [Precision INTEGER = 32;], [Precision : INTEGER = 64] ) ifelse(PREC, 1, [MantPrec : INTEGER = 3;], PREC, 2, [MantPrec : INTEGER = 23;], [MantPrec : INTEGER = 52] ) ifelse(PREC, 1, [ExpPrec : INTEGER = 2;], PREC, 2, [ExpPrec : 81 INTEGER =8;], [ExpPrec : INTEGER =11]) ResBits : INTEGER = MantPrec; MantBits : INTEGER = MantPrec + 3; (* * function to compute the powers of 2 *) power2: F13 = BEGIN ONE 1 a = -1 ? 0.5, a = -2 ? 0.25, a = -3 ? 0.125, a = -4 ? 0.0625, a = -5 ? 0.03125, a < -5 ? 0.03125* a = 0 ? 1, a = 1 ? 2, a = 2 ? 4, a = 3 ? 8, a = 4 ? 16, a = 5 ? 32, a = 6 ? 64, a = 7 ? 128, a = 8 ? 256, a = 9 ? 512, a > 9 ? 1024.0 *p } END; resprint: F18 = (* function for printing *) BEGIN con.cat2( { ONE | n = 0 ? IntToString(num(n), 3), n > 0 ? con.cat2(IntToString(num(n), 3), resprint(num, n-1)) } , END; (* resprint *) 82 STATIC binprint: F17 = (* function f or p r i n t i n g *) BEGIN con.cat2( •C ONE | n = 0 ? BoolToString(num(n) , FALSE), n > 0 ? con.cat2(BoolToString(num(n), FALSE), binprint(num, n-1)) >, M I I ^ END; (* binprint *) (* * function to change a binary * number into a decimal number *) BinToReal: F8 = TYPE ArrayIndex = INTEGER SUBRANGE 0 .. binnumber.n-2; BEGIN • {+ i : ArrayIndex I real( BoolToInt(binnumber(i))) * (power2(i-(n-2))) } * { ONE I ( NOT binnumber(binnumber.n-1))? 1.0, binnumber(binnumber.n-1) ? -1.0 } END; (* * function to change a number into a binary number * ( f i r s t b i t i s sign b i t ) *) change2bin: F l = STATIC z: BinArray(nbits) = BEGIN FALSE END; BEGIN z END; (* change2binary *) (* * function to a binary number * to the l e f t *) s h i f t b i n : F7= 83 STATIC z: BinArray(n) = BEGIN { ONE | i = n-1 ? binnumber(n-l), i = 0 ? 0, ((i>0) AND (i<n-l))? binnumber(i-l) } END; BEGIN z END; (* * function to change a boolean * into a binary number *) BoolToInt: F4= BEGIN { ONE I bool = TRUE ? 1, bool = FALSE ? 0 } END; (* BoolToInt *) (* * function that adds two b i t s *) bitadd: F2 = BEGIN (bindividend(n) XOR bindivisor(n)) END; (* add *) (* * function that adds the b i t s * of a binary number *) bitadder: F5 = STATIC z: BinArray(n) = BEGIN bitadd(bindividend, bindivisor, END; BEGIN z END; (* bitadder *) 84 (* * function to add two binary numbers *) adder: F9 = STATIC z: BinArray(n) = { ONE | m>0 ? adder( bitadder(bindividend, bindivisor, i carrier(bindividend, bindivisor, n, m-1), m = 0 ? bitadder(bindividend, bindivisor, n) } ; BEGIN z END; (* adder *) (* * function that determines the carry b i t of two b i t s *) carry: F2 = BEGIN { ONE I n = 0 ? FALSE, n > 0 ? (bindividend(n-l) AND b i n d i v i s o r ( n - l ) ) } END; (* carry *) (* * function that determines the carry * b i t s of two binary numbers *) ca r r i e r : F5 = STATIC z: BinArray(n) = BEGIN carry(bindividend, bindivisor, i ) END; BEGIN z END; (* c a r r i e r *) (* * function to invert b i t s *) invert: F6 = 85 BEGIN ( NOT binnumber(n) ) END; (* invert *) (* * function to invert a binary number * (hope t h i s works for a l l cases! ) *) inverter: F7 = STATIC z: BinArray(n) = BEGIN invert (birinumber, i ) END; x: BinArray(n) = BEGIN { 0NE| i = 0? TRUE, i > 0? FALSE } END; BEGIN adder(z, x, n, n) END; (* inverter *) (* * c e l l that does the d i v i s i o n * i t f i r s t checks i f the mantissa of the * dividend and diviso r are the same * ) divide: CI = STATIC getsummand: F12 = (* function to determine the summand *) STATIC z: BinArray(n) = { ONE | m = 0 ? change2bin(0,n), m = -1 ? bind i v i s o r , m = 1 ? inverter(bindivisor, n) >; BEGIN z END; detquotient: F14 = (* function to determine quotient b i t *) BEGIN { ONE | NOT (bindividend(n-l)) ? 1, ((bindividend(n-l)) AND (bindividend(n-2)) ) AND (bindividend(n-3)) ? 0, 86 (bindividend(n-l)) AND ( NOT (bindividend(n-2)) OR NOT (bindividend(n-3))) ? -1 } END; detsign: F l l = (* fmiction to determine sign of result BEGIN { ONE| (NOT bindividend(n-l)) AND (NOT bindivisor(n-1))? 0, bindividend(n-1) AND bindivisor(n-1)? 0, (NOT (bindividend(n-l)=bindivisor(n-l)))? 1 } END; getshort: F19 = STATIC z: BinArray(3) = BEGIN summand(n-(4-i)) END; BEGIN z END; STATE count : INTEGER = 0; BEGIN (* * note that the quotient i s determined using approx * rather than remainder; t h i s i s to simulate the * quotient selection of the tr a n s i s t o r l e v e l divider * ) « NOT done AND ((quotientnumber >= 1) AND (quotientnumber <= ResBits-1)) -> result(quotientnumber-1), remainder , quotientnumber, done := detquotient(adder(getshort(remainder, n) , getshort(getsummand(divisor, n, ^ result(quotientnumber)), n), 3, 3), 3), sh i f t b i n ( adder(remainder, 87 getsummand(divisor, n, result(quotientnumber)), n, n), n), quotientnumber-1, TRUE » I | « NOT done AND (count = 0) AND (quotientnumber = 0) -> quotientnumber := -1 » * WriteString( BEGIN p r i n t r e s u l t ( r e s u l t , ResBits, exp) END) END; p r i n t r e s u l t : F10 = STATIC posbits: BinArray(n) = BEGIN { 0NE| r e s u l t ( i ) = 0 ? FALSE, r e s u l t ( i ) = 1 ? TRUE, r e s u l t ( i ) = -1 ? FALSE } END; negbits: BinArray(n) = BEGIN { ONE I r e s u l t ( i ) = 0 ? FALSE, r e s u l t ( i ) = 1 ? FALSE, r e s u l t ( i ) = -1 ? TRUE } END; getsummand: F5 = STATIC z: BinArray(n) = { ONE | bindividend(n-l) ? inverter(bindivisor, n), ( NOT bindividend(n-l) ) ? inverter(bindivisor, n) >; BEGIN z END; BEGIN con.cat3("result : ", MakeString(BinToReal(adder(posbits, getsummand(posbits, negbits.result.n), result.n, result.n), result.n) * power2(exp)), "\n") END; (* * function to get the mantissa of the re a l number * (in single format); note that the i m p l i c i t "1" * at the most s i g n i f i c a n t b i t i s e x p l i c i t here; * i n case i t i s a neg number, have to use the inverted * number 88 *) GetMantissa: F15 = TYPE LF1 = FUNCTION(binnumber: BinArray; i : INTEGER): INTEGER; STATIC bininvert : BinArray(MantPrec+2) = BEGIN inverter(binnumber, MantPrec+2)(i) END; (* * need to check i f the whole mantissa * (without the unity and i m p l i c i t * b i t ) i s zero, because i n that case inverting the * number gives an * overflow; i f i t i s zero, the i m p l i c i t b i t has to be * set to 1 * e.g. i n the case of 4 or 8 *) checkifzero: LF1 = BEGIN { ONE| i > 0 ? { ONE| binnumber(i) ? 1, ( NOT binnumber(i) )? checkifzero(binnumber, i-1) }, i = 0? { ONE I binnumber(0) ? 1, ( NOT binnumber(0)) ? 0 } }END; z: BinArray(MantBits) = BEGIN { ONE | i = (MantBits-3) ? { ONE I checkifzero(binnumber, MantPrec-1)=0 ? TRUE , checkifzero(binnumber, MantPrec-1) = 1 ? (NOT binnumber(Precision-l) ) }, (* " i m p l i c i t " b i t *) i = (MantBits-2) ? (* unity b i t *) { 0NE| ( NOT binnumber(Precision-l) )? FALSE, binnumber(Precision-l) ? TRUE } , i = (MantBits-1) ? binnumber(Precision-l), (* sign b i t *) ( i < MantBits-3) ? { ONE| NOT binnumber(Precision-l) ? binnumber(i), 89 binnumber(Precision-l) ? bin i n v e r t ( i ) } } END; BEGIN z END; (* * function to get the exponent of the r e a l number * b i t number ExpPrec+1 s i g n i f i e s the * sign of the exponent *) GetExponent: F16 = STATIC z: BinArray(ExpPrec + 1) = BEGIN { ONE I i < (ExpPrec)? { ONE I binnumber(i + MantPrec) ? TRUE, (NOT binnumber(i + MantPrec)) ? FALSE }, i = (ExpPrec)? FALSE }END; BEGIN z END; END. (* higherlevel module *) (***************************************) (* * d e f i n i t i o n module for the main function * of the tra n s i s t o r l e v e l program *) DEFINITION MODULE main; FROM types IMPORT BinNumber, BinElement, ResElement, ResNumber, DRArray, ArrayOfDRA; EXPORT Divider; TYPE T = CELL(STATIC realdividend, r e a l d i v i s o r : REAL); STATIC 90 Divider: T; END. (* main *) (* * implementation module for the main function * of the tr a n s i s t o r l e v e l program * ) IMPLEMENTATION MODULE main; FROM stringlO IMPORT WriteString; FROM strings IMPORT IntToString, BoolToString; FROM cast IMPORT round; FROM mycast IMPORT Makebin, MakeString; FROM t r a n s i s t o r l e v e l IMPORT PrintDRArray, TransistorLevelDivider; FROM higherlevel IMPORT getmantissa, BinToReal, divide, BinToDR, getexponent, s h i f t b i n , inverter; IMPORT con; IMPORT mycast; STATIC changequot e([,]) ifelse(PREC, 1, [Precision : INTEGER = 6;], PREC, 2, [Precision : INTEGER = 32;], [Precision : INTEGER = 64] ) ifelse(PREC, 1, [MantPrec : INTEGER =3;], PREC, 2, [MantPrec : INTEGER = 23;], [MantPrec : INTEGER = 52] ) ifelse(PREC, 1, [ExpPrec : INTEGER =2;], PREC, 2, [ExpPrec : INTEGER =8;], [ExpPrec : INTEGER =11]) ResBits : INTEGER = MantPrec; MantBits : INTEGER = MantPrec + 3; Divider: T = TYPE LF1 = FUNCTION(bindividend, bindivisor: BinNumber; n: INTEGER): BOOLEAN; qindex = INTEGER SUBRANGE 0 .. MantBits-1; 91 STATIC binrealdividend: BinNumber(Precision) = Makebin(realdividend); bi n r e a l d i v i s o r : BinNumber(Precision) = Makebin(realdivisor); dividendexp: REAL= (BinToReal(getexponent(binrealdividend), 2) ) ; divisorexp: REAL = (BinToReal(getexponent(binrealdivisor), 2)); STATE quotientreg: ArrayOfDRA(MantBits, 3); result : ResNumber(ResBits); step: INTEGER =1; dividerdone: BOOLEAN = FALSE; DRdivisor: DRArray(MantBits); DRdividend: Array0fDRA(3,MantBits); bindivisor: BinNumber(MantBits) = getmantissa(binrealdivisor); bindividend: BinNumber(MantBits) = getmantissa(binrealdividend); sign: DRArray(3); quotient: INTEGER = 0; STATIC posdividend: BinNumber(MantBits) = BEGIN {ONE | bindividend(MantBits-l) = 0? bindividend(i), bindividend(MantBits-l) = 1? inverter(bindividend, MantBits)(i) }END; posdivisor: BinNumber(MantBits) = BEGIN {ONE | bindivisor(MantBits-l) = 0? b i n d i v i s o r ( i ) , bindivisor(MantBits-l) = 1? inverter(bindivisor, MantBits)(i) } END; detsign: LF1 = (* function to determine sign of result *) BEGIN { 0NE| (bindividend(n-l)=0) AND (bindivisor(n-l)=0)? FALSE, (bindividend(n-l)=l) AND (bindivisor(n-l)=l)? 92 FALSE, (NOT (bindividend(n-l)=bindivisor(n-l)))? TRUE } END; BEGIN « step= 1 -> step := step+1 » * WriteString( BEGIN con.cat2("dividend: ", MakeString(realdividend)) END) I I « step= 2 -> step := step+1 » * WriteString( BEGIN con.cat2("divisor: ", MakeString(realdivisor)) END) I | « step = 3 -> step := step + 1 » * BinToDR(shiftbin(posdivisor, MantBits), DRdivisor) * BinToDR(posdividend, DRdividend(l)) I | « step = 4 -> step := step + 1 » * WriteString( BEGIN con.cat2("Divisor: ", PrintDRArray(DRdivisor)) END) * WriteString( BEGIN con.cat2("Dividend: ", PrintDRArray(DRdividend(l))) END) I | « step = 5 » * TransistorLevelDivider(DRdivisor, DRdividend, quotientreg, round(dividendexp - divisorexp)) END; (* Divider *) END. (* main module *) ^^^^^^^^^^^^^^^ic^^^^^^^:^:^:^^*************) (* * d e f i n i t i o n module for t r a n s i s t o r l e v e l * of the timed word l e v e l program *) DEFINITION MODULE t r a n s i s t o r l e v e l ; FROM types IMPORT BinNumber, BinElement, ResElement, ResNumber, DualRail, DRArray, ArrayOfDRA, ArrayOfBool, BoolArray; FROM timer IMPORT Time, TimeRecord, ArrayOfPrechargeBar, PrechargeBar; EXPORT TransistorLevelDivider, PrintDRArray, DRtoRes ; 93 TYPE DR2B = FUNCTION(x: DualRail): BOOLEAN; C101 = CELL(pb: BOOLEAN; a, b, carryin, sum : DualRail); C102 = CELL(pb: BOOLEAN; a, b, carryin, carryout : DualRail); C103 = CELL(a, b, c, d : DRArray); C104 = CELL(pb: BOOLEAN; a, b, carryin, carryout, sum : DualRail); C105 = CELL(pb: BOOLEAN; carry, remainder, d i v i s o r , carryout, sum : DRArray; carrybit: DualRail; TimeTable: TimeRecord;STATIC me: INTEGER); C106 = CELL(pb: BOOLEAN; quotient:ArrayOfBool; d i v i s o r , output: DRArray; carrybit: DualRail; TimeTable: TimeRecord; STATIC me: INTEGER); C107 = CELL(pb: PrechargeBar; sum: DRArray; quotient: ArrayOfBool; qsldone: DualRail; TimeTable: TimeRecord; tau: Time; STATIC me: INTEGER); C108 = CELL(pb: BOOLEAN; a, b, sum : DRArray; TimeTable: TimeRecord; STATIC me: INTEGER); C109 = CELL(drarray: DRArray); C1010 = CELL(a, nega: DRArray); C1011 = CELL(pb: BOOLEAN; input: DRArray; done: DualRail); C1012 = CELL(pb: PrechargeBar;STATIC number: INTEGER; qsldone: DualRail; done, donenext: DRArray; tau: Time); ProcessCell = CELL(pb: ArrayOfPrechargeBar;STATIC number: INTEGER; diviso r : DRArray; carry, sum: ArrayOfDRA; quotient: BoolArray; done: ArrayOfDRA; quotientnumber: INTEGER; tau : Time; TimeTable: TimeRecord); StageCell = CELL(pb: ArrayOfPrechargeBar;STATIC number: INTEGER; di v i s o r : DRArray; carry, sum: ArrayOfDRA; quotient: BoolArray; done: ArrayOfDRA; TimeTable: TimeRecord; tau: Time); C1013 = CELL(quotient: DRArray; quotientreg: ArrayOfDRA; quotientnumber: INTEGER); C1014 = CELL(resnumber: ResNumber; dualrailnumber:ArrayOfDRA); C1015 = CELL(dualrailnum, shiftednum: DRArray); C1016 = CELL(pb: BOOLEAN; input: ArrayOfBool; done: DualRail; TimeTable: TimeRecord;STATIC me: INTEGER); F101 = FUNCTION( a, b, carryin, carryout, sum : DRArray) 94 : STRING; F102 = FUNCTION( a, b, carryin, carryout, sum : DRArray; i INTEGER): STRING; F103 = FUNCTI0N( a, b, carryin, carryout, sum : DualRail): STRING; F104= FUNCTION(a: DualRail): STRING; F105= FUNCTION(array: DRArray): STRING; F106= FUNCTION(a: INTEGER): INTEGER; F107= FUNCTIONS: DualRail): BOOLEAN; T10 = CELL(divisor: DRArray; sum, quotientreg: ArrayOfDRA; STATIC exp: INTEGER ); STATIC AdderPos : C101; AdderNeg : C101; CarryPos : C102; CarryNeg : C102; I n i t i a l i z e : C103; AdderStage: C104; CarrySave: C105; MUX: C106; QSL: C107; CarryPropagat e: C108; InitEmpty: C109; InitFalse: C109; MakeNeg: C1010; StageProcess: StageCell; Process: ProcessCell; Done: C1012; AdderDone: C1011; QSLDone: C1016; DRPrinter: F103; PrintDR: F104; PrintDRArray: F105; F u l l : F107; TransistorLevelDivider: T10; DRtoRes: C1014; 95 ShiftDR: C1015; END. (* t r a n s i s t o r l e v e l *) (* * implementation module f o r the t r a n s i s t o r l e v e l *) IMPLEMENTATION MODULE t r a n s i s t o r l e v e l ; FROM stringlO IMPORT WriteString; FROM strings IMPORT IntToString, BoolToString; FROM higherlevel IMPORT resprint, p r i n t r e s u l t ; FROM mycast IMPORT Rand, RealPrint; FROM timer IMPORT ChangeTime, PrintTime, GetNextTime, ChangeTTPrecharge, ChangeTTEvaluate, AdderPMaxTime, MUXPMaxTime, QSLPMaxTime, AdderEMinTime, AdderEMaxTime, MUXMaxTime, QSLMaxTime, QSLMinTime, StopPrechargeTime, StartEvaluationTime; FROM STVerification IMPORT Proto, TimingLowerBound; IMPORT con; FROM constants IMPORT Precision, MantPrec, ExpPrec, ResBits, MantBits; STATE STATIC (* * function to a DRArray number by 1 b i t to the l e f t *) ShiftDR: C1015 = TYPE Index = INTEGER SUBRANGE 1 .. dualrailnum.n-1; BEGIN {* i : Index I « shiftednum(i).f, shiftednum(i).t := dualrailnum(i-l).f, dualrailnum(i-l).t » } * « shiftednum(O).f, shiftednum(O).t := TRUE, FALSE » END; O * function to copy a DRArray number 96 *) CopyDR: C1015 = TYPE Index = INTEGER SUBRANGE 0 .. dualrailnum.n-1; BEGIN {* i : Index I « shiftednum(i).f, shiftednum(i).t := dualrailnum(i).f, dualrailnum(i).t » } END; (* * function to change DRArray (quotientreg.) into ResNumber *) DRtoRes: C1014 = TYPE LF1= FUNCTION(dualrailnumber: DRArray): ResElement; STATIC DRtoResElem: LF1 = BEGIN { 0NE| dualrailnumber(2).t ? 1, dualrailnumber(1).t ? 0, dualrailnumber(0).t ? -1} END; BEGIN {* i : INDEX(resnumber.n-1) | « resnumber(i) := DRtoResElem(dualrailnumber(resnumber.n-l-i)) » > END; Empty: F107 = BEGIN (NOT x.t AND NOT x.f) END; F u l l : F107 = BEGIN (x.t OR x.f) END; InitEmpty: C109 = 9 7 BEGIN {* i : INDEX(drarray.n) I « d r a r r a y ( i ) . f , d r a r r a y ( i ) . t := FALSE, FALSE » } END; InitFalse: C109 = BEGIN {* i : INDEX(drarray.n) I « d r a r r a y ( i ) . f , d r a r r a y ( i ) . t := TRUE, FALSE » } END; InitTrue: C109 = BEGIN {* i : INDEX(drarray.n) I « d r a r r a y ( i ) . t , d r a r r a y ( i ) . f := TRUE, FALSE » } END; (* * i f an odd number of inputs i s high, the sum * output i s also high *) AdderPos: C101 = BEGIN « sum.t AND NOT pb -> sum.t := FALSE » I | « NOT sum.t AND pb AND a.t AND b.t AND carryin.t -> sum.t := TRUE » II « NOT sum.t AND pb AND a.f AND ((b.f AND carryin.t) OR (b.t AND carryin.f)) -> sum.t := TRUE » II « NOT sum.t AND pb AND a.t AND b.f AND carryin.f -> sum.t := TRUE » END; (* * i f zero or an even number of inputs i s high, * the sum output i s low * ) AdderNeg: C101 = BEGIN « sum.f AND NOT pb -> sum.f := FALSE » I | « NOT sum.f AND pb AND a.f AND b.f AND carryin.f -> sum.f 98 := TRUE » II « NOT sum.f AND pb AND a.t AND ((b.t AND carryin.f) OR (b.f AND carryin.t)) -> sum.f := TRUE » I | « NOT sum.f AND pb AND a.f AND b.t AND carryin.t -> sum.f := TRUE » END; (* * i f at least two inputs are high, * the carry output i s also high *) CarryPos: C102 = BEGIN « carryout.t AND NOT pb -> carryout.t := FALSE » II « NOT carryout.t AND pb AND a.t AND b.t AND (carryin.f OR carryin.t) -> carryout.t := TRUE » I | « NOT carryout.t AND pb AND a.t AND b.f AND carryin.t -> carryout.t := TRUE » I ! « NOT carryout.t AND pb AND a.f AND b.t AND carryin.t -> carryout.t := TRUE » END; (* * i f less than two inputs are high, * the carry output i s low *) CarryNeg: C102 = BEGIN « carryout.f AND NOT pb -> carryout.f := FALSE » II « NOT carryout.f AND pb AND a.f AND b.f AND (carryin.t OR carryin.f) -> carryout.f := TRUE » I | « NOT carryout.f AND pb AND a.f AND b.t AND carryin.f -> carryout.f := TRUE » I | « NOT carryout.f AND pb AND a.t AND b.f AND carryin.f -> carryout.f := TRUE » END; AdderStage: C104 = BEGIN AdderPos(pb, a, b, carryin, sum) I | AdderNeg(pb, a, b, carryin, sum) 99 I I CarryPos(pb, a, b, carryin, carryout) I | CarryNegCpb, a, b, carryin, carryout) END; (* * carry propagate adder *) CarryPropagate: C108 = STATE carryout: DRArray(3); carry: DRArray(1); i n i t : BOOLEAN = TRUE; STATIC n: INTEGER = a.n-1; BEGIN « i n i t -> i n i t := FALSE » * InitFalse(carry) * InitEmpty(carryout) * WriteString(BEGIN " I n i t i a l i z i n g i n CarryPropagate" END) I I « NOT i n i t » * ( AdderStage(pb, a(n-4), b(n-3), carry(O), carryout(0), sum(0)) II AdderStage(pb, a(n-3), b(n-2), carryout(O), carryout(l), sum(l)) II AdderStage(pb, a(n-2), b ( n - l ) , carryout(l), carryout(2), sum(2)) ) END; CarrySave: C105 = TYPE Stagelndex = INTEGER SUBRANGE 2 .. remainder.n-1; STATE temp: DualRail; i n i t : BOOLEAN = TRUE; BEGIN « i n i t -> i n i t , temp.f, temp.t := FALSE, TRUE, FALSE » I | « NOT i n i t » *( { I I i : Stagelndex I AdderStage(pb, c a r r y ( i - 2 ) , remainder(i-1), 100 d i v i s o r ( i ) , carryout(i), sum(i)) } II AdderStage(pb, carrybit, temp, divisor(O), carryout(0), sum(0)) II AdderStage(pb, temp, remainder(0), d i v i s o r ( i ) , c a rryout(l), sum(l)) ) END; (* * multiplexer; * selects - depending on the quotient - either * the diviso r or the complement of the di v i s o r * (in that case "1" s t i l l has to be added to the * MUX output, when inserted into the adder * i f the quotient i s 0, the output of the MUX * i s FALSE *) MUX: C106 = TYPE LCI = CELL(pb: BOOLEAN; d i v i s o r , output: DualRail; sn, sz, sp: BOOLEAN); . Stagelndex = INTEGER SUBRANGE 0 .. divisor.n-1; STATE I n i t : BOOLEAN = TRUE; STATIC posmux: LCI = BEGIN « NOT pb AND output.t -> output.t :=FALSE » II « NOT output.t AND pb AND ((sp AND d i v i s o r . t ) OR (sn AND d i v i s o r . f ) ) -> output.t := TRUE » II « NOT pb AND ca r r y b i t . t -> c a r r y b i t . t := FALSE » II « NOT car r y b i t . t AND pb AND sn -> ca r r y b i t . t := TRUE » END; negmux: LCI = BEGIN « NOT pb AND output.f -> output.f := FALSE » II « NOT output.f AND pb AND ((sp AND d i v i s o r . f ) OR 101 (sn AND d i v i s o r . t ) OR (sz)) -> output.f := TRUE » II « NOT pb AND car r y b i t . t -> carrybit.f := FALSE » I | « NOT carrybit.f AND pb AND NOT sn -> carry b i t . f := TRUE » END; BEGIN { I I i : Stagelndex I posmuxCpb, d i v i s o r ( i ) , output(i), quotient(2), quotient(l), quotient(O)) II negmux(pb, d i v i s o r ( i ) , output(i), quotient(2), quotient(l), quotient(O)) } END; (* * remove the qsldone input l a t e r ; added * i t to make the simulation run faster *) QSL: C107 = (* gets as input 3 sum b i t s and outputs * 3 DR numbers, one of which i s T, the * others are F * quotient(2) -> -1 * quotient(1) -> 0 * quotient(0) -> +1 * ) BEGIN « NOT pb.v -> quotient(2) := FALSE » II « NOT pb.v -> quotient(l) := FALSE » II « NOT pb.v -> quotient(0) := FALSE » II « pb.v AND sum(2).f AND (NOT quotient(2)) -> quotient(2) := TRUE » II « pb.v AND sum(2).t AND sum(l).t AND sum(0).t AND (NOT quotient(1)) -> quotient(1) := TRUE » II « pb.v AND sum(2).t AND (sum(l).f OR sum(O).f) AND (NOT quotient(O)) -> quotient(O) := TRUE » END; (* * function f o r converting DRArray to String 102 *) PrintDRArray: F105 = TYPE LF1= FUNCTION(a: DualRail): STRING; LF2= FUNCTION(array: DRArray; i : INTEGER): STRING; STATIC printDR: LF1 = (* function f o r converting DR to String *) BEGIN { ONE | (NOT a.t) AND (NOT a.f) ? "_ n, (* empty *) (NOT a.t) AND a.f ? "0",(* " f a l s e " *) a.t AND (NOT a.f) ? "1",(* "true" *) a.t AND a.f • ? "#" (* forbidden *) > END; printDRArray: LF2 = BEGIN con.cat2( printDR(array(i)), { 0NE| i = 0? "", i > 0? con.cat2(printDRArray(array, i - 1 ) , " ") } ) END; BEGIN printDRArray(array, array.n-1) END; (* * function to check i f the adder i s done * (do I need dual r a i l f o r this??) *) AdderDone: C1011 = BEGIN « pb AND (NOT done.t) » * {* i : INDEX(input.n) | « F u l l ( i n p u t ( i ) ) » } * « done.t := TRUE » I | « NOT pb AND done.t » * •C* i : INDEX(input.n) I « Empty(input(i)) » } * « done.t := FALSE » 103 END; (* * function to check i f the QSL is done *) QSLDone: C1016 = BEGIN « pb AND ((input(O) OR input(l)) OR input(2)) AND NOT done.t -> done.t := TRUE » I I « NOT pb AND NOT ((input(O) OR input(1)) OR input(2)) AND done.t -> done.t := FALSE » END; (* * function to check i f the qsl is done *) Done: C1012 = BEGIN « pb.v AND NOT done(O).t AND qsldone.t -> done(0).t := TRUE » I I « NOT pb.v AND done(O).t -> done(O).t := FALSE » END; (* * function to put the quotient bits into * a register *) PutlnRegister: C1013 = BEGIN {* i : INDEX(quotient.n) I « quotientreg(quotientnumber)(i).t, quotientreg(quotientnumber)(i).f := quotient(i).t, quotient(i).f » } END; TransistorLevelDivider: T10 = STATE carry: ArrayOfDRA(3,MantBits); 104 quotient: BoolArrayO,3) ; done: ArrayOfDRA(3,1); pb: ArrayOfPrechargeBar(3); Init: BOOLEAN = TRUE; counter: INTEGER = 0; quotientnumber: INTEGER = 1; carrybit: DualRail; result: ResNumber(ResBits+3); tau : Time = 0.0; TimeTable: TimeRecord(3,10); TYPE Qlndex = INTEGER SUBRANGE 1 .. MantBits-1; BEGIN O * to start the run, I set the f i r s t 2 stages pb to TRUE, and * the done signal of stage no. 1 to T; this way, stage 1 will * be "holding" its output, and stage 0 can start precharging; * stage 2 can then start evaluating *) « Init -> Init, pb(0).v, pb(l).v, pb(2).v, pb(0).tau, pb(l).tau, pb(2).tau, done(l)(0).t, done(l)(0).f, quotient(1)(0), quotient(l)(1), quotient(l)(2) := FALSE, TRUE, TRUE, FALSE, 0.0, 0.0, 0.0, TRUE, FALSE, FALSE, FALSE, TRUE » * « carrybit.t, carrybit.f := FALSE, FALSE » * InitEmpty(sum(0)) * InitEmpty(sum(2)) * InitEmpty(carry(0)) * InitFalse(carry(l)) * InitEmpty(carry(2)) * {* i : Qlndex I InitFalse(quotientreg(i))} * « quotient(0)(0), quotient(0)(1), quotient(0)(2), quotient(2)(0), quotient(2)(1), quotient(2)(2) := FALSE, TRUE, FALSE, FALSE, FALSE, FALSE » * InitEmpty(done(0)) * InitEmpty(done(2)) * WriteString(BEGIN "Initializing" END) 105 I I « NOT I n i t AND (quotientnumber < MantBits) » *( Process(pb, 0,divisor, carry, sum, quotient, done, quotientnumber, tau, TimeTable) II Process(pb, 1,divisor, carry, sum, quotient, done, quotientnumber, tau, TimeTable) II Process(pb, 2,divisor, carry, sum, quotient, done, quot ientnumber, tau, TimeTable) ) I | « (quotientnumber = MantBits ) -> quotientnumber := MantBits + 1 » *{* i:QIndex I WriteString(BEGIN con.cat3(IntToString(i, 10), 11 : ", PrintDRArray(quotientreg(i))) END) } I I « quotientnumber = MantBits + 1 -> quotientnumber : = quotientnumber + 1 » DRtoRes(result, quotientreg) I | « quotientnumber = MantBits + 2 -> quotientnumber := quotientnumber + 1 » * WriteString( BEGIN con.cat2( "result : ", res p r i n t ( r e s u l t , ResBits+2)) END) * WriteString( BEGIN p r i n t r e s u l t ( r e s u l t , ResBits+3, exp) END) END; Process: ProcessCell = STATE I n i t : BOOLEAN = TRUE; STATIC prev: INTEGER = (number+2) MOD 3; next: INTEGER = (number+1) MOD 3; me: INTEGER = number; BEGIN (* * enable evaluation, i f the successor stage 106 * has started to precharge *) ( « NOT pb(me).v AND NOT pb(next).v -> pb(me).v, pb(me).tau := TRUE, tau » * « quotientnumber := quotientnumber + 1 » (* * i f successor stage has finished 0. * and i s holding the output, we can star t * precharging *) M « pb(me).v AND pb(next).v AND done(next)(0).t -> pb(me).v := FALSE » *PutInRegister(quotient(next), quotientreg, quotientnumber) (* * when precharging i s finished, the done * signal should drop; * i f i n 0. mode ( i . e . no precharge), * and predecessor has value, can signal * output value *) I | StageProcess( pb, me,divisor, carry, sum, quotient, done, TimeTable, tau) I | ChangeTime(tau, tau, TimeTable) ) END; StageProcess: StageCell = STATE muxout: DRArray(MantBits); propagatesum: DRArrayO); adderdone: DualRail; padderdone: DualRail; cadderdone: DualRail; qsldone: DualRail; carrybit: DualRail; I n i t : BOOLEAN = TRUE; STATIC prev: INTEGER = (number+2) MOD 3; 107 next: INTEGER = (number+1) MOD 3; me: INTEGER = number; BEGIN « Ini t -> adderdone.t, adderdone.f, qsldone.t, qsldone.f, c a r r y b i t . t , c a r r y b i t . f , I n i t , padderdone.t, padderdone.f := FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE » * InitEmpty(muxout) * InitEmpty(propagatesum) * WriteString(BEGIN con.cat2( " I n i t i a l i z i n g i n StageProcess ", IntToString(me, 10)) END) I | « NOT I n i t » *( MUX(pb(me).v, quotient(prev), d i v i s o r , muxout, carrybit, TimeTable, me) II CarrySave(pb(me).v, carry(prev), sum(prev), muxout, carry(me), sum(me), carrybit, TimeTable, me) II CarryPropagate(pb(me).v, carry(me), sum(me), propagatesum, TimeTable, me) M QSL(pb(me), propagatesum, quotient(me), qsldone, TimeTable, tau, me) II AdderDone(pb(me).v, sum(me), adderdone) II AdderDone(pb(me).v, carry(me), cadderdone) II QSLDone(pb(me).v, quotient(me), qsldone, TimeTable, me) II Done(pb(me), number, qsldone, done(me), done(next), tau) ) END; END. (* t r a n s i s t o r l e v e l module *) 108
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Verifying a self-timed division chip
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Verifying a self-timed division chip Ono-Tesfaye, Tarik 1997
pdf
Page Metadata
Item Metadata
Title | Verifying a self-timed division chip |
Creator |
Ono-Tesfaye, Tarik |
Date Issued | 1997 |
Description | The formal verification of asynchronous circuits is challenging, because they often have non-deterministic timing relations between circuit components. This thesis presents a formal proof strategy for verifying the correctness of a dual-rail, asynchronous divider chip. For that purpose, this thesis will extend timing verification techniques that were originally developed for combinational logic to apply to dual-rail, self-timed designs. The proof strategy is refinement based and employs four progressively more detailed models of the divider. Key safety properties are first established for the abstract models using model checking. These properties are then shown to hold for the more detailed models by verifying refinement between adjacent levels in the model hierarchy. |
Extent | 3626084 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
FileFormat | application/pdf |
Language | eng |
Date Available | 2009-03-26 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
IsShownAt | 10.14288/1.0050991 |
URI | http://hdl.handle.net/2429/6572 |
Degree |
Master of Science - MSc |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 1997-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-ubc_1997-0580.pdf [ 3.46MB ]
- Metadata
- JSON: 831-1.0050991.json
- JSON-LD: 831-1.0050991-ld.json
- RDF/XML (Pretty): 831-1.0050991-rdf.xml
- RDF/JSON: 831-1.0050991-rdf.json
- Turtle: 831-1.0050991-turtle.txt
- N-Triples: 831-1.0050991-rdf-ntriples.txt
- Original Record: 831-1.0050991-source.json
- Full Text
- 831-1.0050991-fulltext.txt
- Citation
- 831-1.0050991.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0050991/manifest