"Applied Science, Faculty of"@en . "Electrical and Computer Engineering, Department of"@en . "DSpace"@en . "UBCV"@en . "Darwish, Mohammad Mostafa"@en . "2009-02-27T20:20:42Z"@en . "1994"@en . "Master of Applied Science - MASc"@en . "University of British Columbia"@en . "Designing a microprocessor is a significant undertaking. Modern RISC processors are\r\nno exception. Although RISC architectures originally were intended to be simpler than\r\nCISC processors, modern RISC processors are often very complex, partially due to the\r\nprominent use of pipelining. As a result, verifying the correctness of a RISC design is an\r\nextremely difficult task. Thus, it has become of great importance to find more efficient\r\ndesign verification techniques than traditional simulation. The objective of this thesis is\r\nto show that symbolic trajectory evaluation is such a technique.\r\nFor demonstration purposes, we designed and implemented a 32-bit pipelined RISC\r\nprocessor, called Oryx. It is a fairly generic first-generation RISC processor with a fivestage\r\npipeline and is similar to the MIPS-X and DLX processors. The Oryx processor\r\nis designed down to a detailed gate-level and is described in VHDL. Altogether, the\r\nprocessor consists of approximately 30,000 gates.\r\nWe also developed an abstract, non-pipelined, specification of the processor. This\r\nspecification is precise and is intended for a programmer. We made a special effort not\r\nto over-specify the processor, so that a family of processors, ranging in complexity and\r\nspeed, could theoretically be implemented all satisfying the same specification.\r\nWe finally demonstrated how to use an implementation mapping and the Voss veri\r\nfication system to verify important properties of the processor. For example, we verified\r\nthat in every sequence of valid instructions, the ALU instructions are implemented cor\r\nrectly. 
This included both the actual operation performed as well as the control for\r\nfetching the operands and storing the result back into the register file. Carrying out this\r\nverification task required less than 30 minutes on an IBM RS6000 based workstation."@en . "https://circle.library.ubc.ca/rest/handle/2429/5260?expand=metadata"@en . "1550522 bytes"@en . "application/pdf"@en . "FORMAL VERIFICATION OF A 32-BIT PIPELINED RISC PROCESSORByMOHAMMAD MOSTAFA DARWISHB. Sc. (Computer Engineering) University of Petroleum & Minerals, 1991A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFMASTER OF APPLIED SCIENCEinTHE FACULTY OF GRADUATE STUDIESDEPARTMENT OF ELECTRICAL ENGINEERINGWe accept this thesis as conformingto the required standardE.THE UNIVERSITY OF BRITISH COLUMBIAJuly 1994\u00C2\u00A9 MOHAMMAD MOSTAFA DARWISH, 1994In presenting this thesis in partial fulfilment of the requirements for an advanceddegree at the University of British Columbia, I agree that the Library shall make itfreely available for reference and study. I further agree that permission for extensivecopying of this thesis for scholarly purposes may be granted by the head of mydepartment or by his or her representatives. ,t is understood that copying orpublication of this thesis for financial gain shall not be allowed without my writtenpermission.(Signature)Department of e\u00E2\u0080\u0099cc4, a.\u00E2\u0080\u0094/The University of British ColumbiaVancouver, CanadaDate \u00E2\u0080\u0098, 19911DE-6 (2)88)AbstractDesigning a microprocessor is a significant undertaking. Modern RISC processors areno exception. Although RISC architectures originally were intended to be simpler thanCISC processors, modern RISC processors are often very complex, partially due to theprominent use of pipelining. As a result, verifying the correctness of a RISC design is anextremely difficult task. 
Thus, it has become of great importance to find more efficient design verification techniques than traditional simulation. The objective of this thesis is to show that symbolic trajectory evaluation is such a technique.

For demonstration purposes, we designed and implemented a 32-bit pipelined RISC processor, called Oryx. It is a fairly generic first-generation RISC processor with a five-stage pipeline and is similar to the MIPS-X and DLX processors. The Oryx processor is designed down to a detailed gate level and is described in VHDL. Altogether, the processor consists of approximately 30,000 gates.

We also developed an abstract, non-pipelined specification of the processor. This specification is precise and is intended for a programmer. We made a special effort not to over-specify the processor, so that a family of processors, ranging in complexity and speed, could theoretically be implemented all satisfying the same specification.

We finally demonstrated how to use an implementation mapping and the Voss verification system to verify important properties of the processor. For example, we verified that in every sequence of valid instructions, the ALU instructions are implemented correctly. This included both the actual operation performed as well as the control for fetching the operands and storing the result back into the register file.
Carrying out this verification task required less than 30 minutes on an IBM RS6000-based workstation.

Table of Contents

Abstract
List of Tables
List of Figures
Acknowledgments
Dedication

1 Introduction
  1.1 Symbolic Trajectory Evaluation
  1.2 Research Objectives
  1.3 Related Work
  1.4 Thesis Overview

2 Architecture
  2.1 Memory Organization
  2.2 Registers
    2.2.1 General Registers
    2.2.2 Multiplication and Division Register
    2.2.3 Status and Control Registers
  2.3 Instruction Formats
  2.4 Addressing Modes
  2.5 Operand Selection
  2.6 Co-processor Interface
  2.7 Instructions
    2.7.1 Arithmetic Instructions
    2.7.2 Logical Instructions
    2.7.3 Control Transfer Instructions
    2.7.4 Miscellaneous Instructions
  2.8 Interrupts

3 Formal Specification
  3.1 Abstract Model
  3.2 Formal Specification
    3.2.1 Example I: Register File Specification
    3.2.2 Example II: PC Specification

4 Implementation
  4.1 Design Overview
  4.2 The Control Path
  4.3 The Data Path

5 Formal Verification
  5.1 Verification Difficulty
  5.2 Property Verification
  5.3 Implementation Mapping
    5.3.1 Cycle Mapping
    5.3.2 Instant Abstract State Mapping
    5.3.3 Combined Mapping
  5.4 Verification Condition
  5.5 Practical Verification
  5.6 Design Errors

6 Concluding Remarks and Future Work

Bibliography

A The FL Verification Script

List of Tables

2.1 PSW bits definition
2.2 The Oryx processor instruction set
3.3 Function abbreviations
3.4 Operation definitions
4.5 The Oryx processor pipeline action during exceptions
4.6 The Oryx processor pipeline action during instruction squashing
4.7 Pipeline action during instruction processing phases
5.8 Some experimental results

List of Figures

1.1 The state space partial order
1.2 The Voss verification system
1.3 A simple latch
2.4 Little Endian byte ordering
2.5 A system block diagram of the Oryx processor
2.6 The Oryx processor instruction formats
3.7 The abstract model of the Oryx processor
4.8 The Oryx processor pipeline
4.9 i&2 logic diagram
4.10 Program counter chain logic diagram
4.11 b2FC logic diagram
4.12 The control path logic diagram
4.13 The exception FSM logic diagram
4.14 The squash FSM logic diagram
4.15 The PC chain FSM logic diagram
4.16 A typical instruction fetch timing diagram
4.17 ICache page fault timing diagram
4.18 The PC unit logic diagram
4.19 The TakeBranch signal logic diagram
4.20 A typical memory read timing
4.21 The Oryx processor data path logic diagram
4.22 The Oryx processor register file logic diagram
4.23 A register file memory bit logic diagram
4.24 The NoDest signal logic diagram
4.25 The ALU logic diagram
4.26 The AU logic diagram
4.27 The LU logic diagram
4.28 The shifter logic diagram
4.29 A typical MD bit logic diagram
4.30 PSWCurrent-PSWOther pair
5.31 Time mapping
5.32 State mapping
5.33 Register file mapping
5.34 Combined mapping

Acknowledgments

There are several people who were major forces in the completion of this work.
First and foremost, warm and very grateful thanks go to Carl Seger, who introduced me to formal verification and has nurtured my knowledge with amazing generosity and patience. Rabab Ward believed in me and encouraged me to carry on with my work despite all the difficulties. Carl and Rabab have provided support in times of crisis and criticism in times of confidence.

I am heavily obliged to André Ivanov, Scott Hazelhurst, and Alimuddin Mohammad for donating their time, the most precious of all gifts, to help me improve the presentation. Many ideas in this work come from fascinating discussions with Mark Greenstreet, Andrew Martin, Nancy Day, John Tanlimco, and Gary Hovey.

Warm thanks go to my friends who made my life more enjoyable. In particular, I would like to mention Ammar Muhiyaddin, who has offered me his friendship over the years, Nasir Aljameh, who has always remembered his youth friend despite the distance, and Rafeh Hulays, who has been a good friend since the very first day I arrived in Vancouver. I also would like to mention Mohammad Estilaei, Catherine Leung, Hélène Wong, Neil Fazel, Mike Donat, Thomas Ligocki, Kevin Chan, Barry Tsuji, Andrew Bishop, Matthew Mendelsohn, William Low, Kendra Cooper, Peter Meisl, Chris Cobbold, Robert Link, and Jeff Chow. To those I have surely forgotten, I must apologize.

Special thanks go to the Original Beanery Coffee House, where I wrote most of this work. Gordon Schmidt has been kind enough to offer me a quiet place where I could sit down and do all the thinking. I will let you discover the secrets of that place on your own.

I have been fortunate in obtaining support for my studies from the Semiconductor Research Corporation as well as the Department of Electrical Engineering and the Department of Computer Science.

Finally, and above all, I owe a great debt of gratitude to my parents, Bedour and Mostafa, who inspired my life and pushed me to the limit.
This work is my way of saying thank you, mother and father.

To my mother and father

Chapter 1

Introduction

The arrival of very large scale integration (VLSI) technology has paved the way for innovative digital system designs, like reduced instruction set computer (RISC) processors. However, the design of RISC processors is very complex and demands considerable effort, partially due to the problems that designers are faced with in specifying and verifying this type of system.

From a marketing point of view, there is no doubt that any enterprise should release new products on time in order to keep its market share. Unfortunately, the design time of RISC processors, like other digital systems, is strongly influenced by the number of design errors. The MIPS 4000, for example, was estimated to cost $3-$8 million for each month of additional design time, and 27% of the design time was spent verifying the design [8]. This example clearly indicates the importance of finding better ways to uncover design errors as early as possible to avoid any delay in releasing the product.

Today, design verification is done using (traditional) simulation [10]. A good design of a RISC processor is composed of several smaller modules. Each module is simulated using switch-level and gate-level simulators by trying relatively few test cases, one at a time, and checking whether the results are correct. Towards the end of the design phase, all modules are put together and the final design is simulated for an extended period of time. This simulation is usually done using behavioral models of the components rather than at the switch or gate level. A common approach is to run some reasonably large programs (e.g. boot UNIX) on the simulated design. It is very common to spend months simulating the final design on large computers [12].

In spite of this, traditional simulation is still inadequate for verifying RISC processors, as many design errors can be left undetected [1].
The deficiency of traditional simulation is partially due to the introduction of aggressive pipelining and concurrently-operating sub-systems, which make predicting the interactions between logically unrelated system activities very difficult. For example, it is often impossible to simulate all possible instruction sequences that might lead to a trap, because instructions, which could be logically unrelated, all of a sudden are being processed in parallel and may even be processed out of order.

In addition to the above, even the best design team can make mistakes. For example, an early revision of the DLX processor contained a flaw in the pipeline despite the fact that the design was reviewed by more than 100 people and was taught at 18 universities. The flaw was detected when someone actually tried to implement the processor [11].

From the above discussion, we have established the need to find a more effective technique for verifying RISC processors. When we say effective, we mean three things: fast, automated (i.e. requires minimum human intervention), and thorough (i.e. uncovers all those design errors that cause the system to behave in a way other than according to the system specification). Ideally, the method should allow the design team to be able to go home at the end of the day with good confidence that the design is going to work, rather than having to come in the next day to try yet another simulation run to uncover more design errors.

1.1 Symbolic Trajectory Evaluation

A way to improve traditional simulation is to consider symbols rather than actual values for the design under simulation; hence, the term symbolic simulation is used to distinguish it from traditional simulation.

Figure 1.1: The state space partial order

Using symbolic simulation for circuit verification was first proposed by researchers at IBM Yorktown Heights in the late 1970's.
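The step from traditional to symbolic simulation is easiest to see in its simplest form, three-valued (ternary) simulation, where a single X value stands in for an unknown input and one run covers many Boolean runs. The following Python sketch is purely illustrative; real symbolic simulators manipulate OBDD-represented functions, not value strings:

```python
# Three-valued gate evaluation: "X" means "unknown".
def and3(a, b):
    # A 0 on either input forces the output, whatever the other input is.
    if a == "0" or b == "0":
        return "0"
    if a == "1" and b == "1":
        return "1"
    return "X"

def not3(a):
    return {"0": "1", "1": "0", "X": "X"}[a]

# One ternary run stands for two Boolean runs: with b unknown,
# a = 0 alone determines the AND output.
print(and3("0", "X"))  # prints 0
print(and3("1", "X"))  # prints X
```

A traditional simulator would need a separate run for each concrete value of b; the ternary simulator resolves both cases at once whenever the known inputs suffice.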
However, the power of symbolic simulation was not fully utilized until an efficient method of manipulating symbols emerged, when ordered binary decision diagrams (OBDDs) [3] were developed for representing Boolean functions.

Bryant and Seger [15] developed a new generation of symbolic simulators based on a technique they called symbolic trajectory evaluation (STE). In this technique, the state space of a system is embedded in a lattice over the values {0, 1, X, ⊤}. The elements of the lattice are partially ordered by their information content as shown in Figure 1.1, where X represents no information about the node value, 0 and 1 represent fully defined information, and ⊤ represents over-constrained values. As a result, the behavior of the system can be expressed as a sequence of points in the lattice determined by the initial state and the functionality of the system. The partial order between sequences of states is defined by extending the partial order on the state space in the natural way.

The model structure of the system is a tuple M = [(S, ⊑), Y], where (S, ⊑) is a complete lattice (S being the state space and ⊑ a partial order on S) and Y is a monotone successor function Y : S → S. A sequence σ^0 σ^1 σ^2 ... is a trajectory if and only if Y(σ^i) ⊑ σ^(i+1) for i ≥ 0.

The key to the efficiency of trajectory evaluation is the restricted language that can be used to phrase questions about the model structure. The basic specification language is very simple, but expressive enough to capture the properties to be checked for in verifying a RISC processor.

A predicate over S is a mapping from S to the lattice {false, true} (where false ⊑ true). Informally, a predicate describes a potential state of the system (e.g. a predicate might be (A is x), which says that node A has the value x). A predicate p is simple if it is monotone and there is a unique weakest s ∈ S for which p(s) = true. A trajectory formula is defined recursively as:

1. 
Simple predicates: Every simple predicate over S is a trajectory formula.

2. Conjunction: (F1 ∧ F2) is a trajectory formula if F1 and F2 are trajectory formulas.

3. Domain restriction: (e → F) is a trajectory formula if F is a trajectory formula and e is a Boolean expression over some set of Boolean variables.

4. Next time: (NF) is a trajectory formula if F is a trajectory formula.

The truth semantics of a trajectory formula is defined relative to a model structure and a trajectory. Whether a trajectory σ = σ^0 σ^1 σ^2 ... satisfies a formula F (written as σ ⊨_M F) is given by the following rules:

1. σ ⊨_M p iff p(σ^0) = true.

2. σ ⊨_M (F1 ∧ F2) iff σ ⊨_M F1 and σ ⊨_M F2.

3. σ ⊨_M (e → F) iff φ(e) ⇒ (σ ⊨_M F), for all interpretations φ mapping the Boolean variables to {true, false}.

4. σ^0 σ^1 σ^2 ... ⊨_M NF iff σ^1 σ^2 ... ⊨_M F.

Given a formula F, there is a unique defining sequence, δ_F, which is the weakest sequence that satisfies the formula.¹ The defining sequence can usually be computed very efficiently. From δ_F a unique defining trajectory, τ_F, can be computed (often efficiently). This is the weakest trajectory that satisfies the formula: all trajectories that satisfy the formula must be greater than it in terms of the partial order.

If the main verification task can be phrased in terms of "for every trajectory σ that satisfies the trajectory formula A, verify that the trajectory also satisfies the formula C", verification can be carried out by computing the defining trajectory for the formula A and checking that the formula C holds for this trajectory. Such a result is written as ⊨_M [A ⟹ C]. The fundamental result of STE is given below.

Theorem 1.1.1 Assume A and C are two trajectory formulas. Let τ_A be the defining trajectory for formula A and let δ_C be the defining sequence for formula C.
Then ⊨_M [A ⟹ C] iff δ_C ⊑ τ_A.

A key reason why STE is an efficient verification method is that the cost of performing STE depends more on the size of the formula being checked than on the size of the system model.

The Voss system [13], a formal verification system developed at the University of British Columbia by Carl Seger, supports STE. Conceptually, the Voss system consists of two major components, as shown in Figure 1.2.

Figure 1.2: The Voss verification system (a verification script and verification libraries drive the FL front-end; the symbolic simulator back-end reports "OK" or a counter-example)

The front-end of the Voss system is an interpreter for the FL language, which is the command language of the Voss system. The FL language is a strongly typed, polymorphic, and fully lazy language. Moreover, the FL language has an efficient implementation of OBDDs built in. In other words, every object of type Boolean in the system is internally represented as an OBDD. A specification is written as a program, which when executed builds up the simulation sequence necessary for verifying the circuit.

The back-end of the Voss system is an extended symbolic simulator with very comprehensive delay and race analysis capabilities, which uses an externally generated finite state description to compute the needed trajectories. As shown in Figure 1.2, this description can be generated from a variety of hardware description languages (HDLs).

¹"Weakest" is defined in terms of the partial order.

1.2 Research Objectives

The goal of this research was to demonstrate that STE is an effective technique for verifying RISC processors. To demonstrate the technique, we applied it to the Oryx processor, which was designed and implemented for verification purposes. The Oryx processor is a fairly generic model for RISC processors.
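Looking back at the machinery of Section 1.1, the information ordering ⊑ on the lattice {0, 1, X, ⊤} and its pointwise lifting to sequences can be encoded very directly. The Python sketch below is our own illustrative encoding, not the representation used inside Voss:

```python
# Information ordering on the state lattice {X, 0, 1, T}:
# X (no information) is below everything, T (over-constrained)
# is above everything, and 0 and 1 are incomparable.
X, ZERO, ONE, TOP = "X", "0", "1", "T"

def leq(a, b):
    """a ⊑ b: b carries at least as much information as a."""
    return a == X or b == TOP or a == b

def join(a, b):
    """Least upper bound; joining 0 with 1 over-constrains to T."""
    if leq(a, b):
        return b
    if leq(b, a):
        return a
    return TOP

def seq_leq(s1, s2):
    """Pointwise extension of ⊑ to equal-length state sequences,
    as used in the check delta_C ⊑ tau_A of Theorem 1.1.1."""
    return all(leq(a, b) for a, b in zip(s1, s2))
```

For a single node, joining the constraint "node is 0" with "node is 1" yields ⊤, signalling an over-constrained (unsatisfiable) requirement, while X is absorbed by any other value.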
The model is intended to capture the fundamental RISC characteristics, and it is very similar to other RISC processors, such as the DLX [9] and the MIPS-X [5]. The Oryx processor is described in more detail in later chapters.

1.3 Related Work

In the formal methods community, there have been several efforts to verify RISC processors. The verification process of a RISC processor involves first writing a formal specification of the processor. A hardware model of the RISC processor is then compared against the formal specification to prove the correctness of the design. So far, most researchers have used extremely simple architectures as hardware models.

Tahar and Kumar [18] proposed a methodology for verifying RISC processors based on a hierarchical model of interpreters. Their model is based on abstraction levels, which are introduced between the formal specification and the circuit implementation. The abstraction levels are called stage level, phase level, and class level. A stage instruction is a set of transfers which occur during the corresponding pipeline segment, while a phase instruction is the set of sub-transfers which occur during that clock phase. The top level of the model is the class instruction, which corresponds to the set of instructions with similar semantics. The model can be used to successively prove the correctness of all phase, stage, and class instructions by instantiating the proofs for each particular instruction. The specification of the DLX processor was described in HOL, and parts of the proof tasks were carried out.

Srivas and Bickford [16] used Clio, a functional-language-based verification system, to prove the correctness of the Mini Cayuga processor. The Mini Cayuga is a three-stage instruction-pipelined RISC processor with a single interrupt. The correctness properties of the Mini Cayuga were expressed in the Clio assertion language.
The Clio system supports an interactive theorem prover, which can be used to prove the correctness of the properties.

The Mini Cayuga was specified at two levels, an abstract specification and a realization specification. The abstract specification described the instruction-level behavior from a programmer's point of view, while the realization specification described the processor's design by defining the activities that occur during a single clock cycle.

The abstract model of the Mini Cayuga is specified as a function that generates an infinite trace of states over time as the processor executes instructions. The abstract state of the Mini Cayuga is represented as a tuple with four components: the state of the memory, the state of the register file, the next program counter, and the current instruction. The progression of state as the processor executes instructions is done with the help of the Step function, which takes as parameters a state and an interrupt list. This function defines the effect of executing a single instruction on the state of the processor, including the effect of the possible arrival of one or more interrupts during execution. The correctness criterion of the Mini Cayuga requires proving that a realization state and an abstract state of the processor are equivalent under some conditions.

Burch and Dill [4] described an automated technique for verifying the control path of RISC processors, assuming the data path computations are correct. They showed an approach for extracting the correspondence between the specification and the implementation. The statement of correspondence is written in a decidable logic of propositional connectives, equality, and uninterpreted functions. Uninterpreted functions are used to represent combinational logic, while propositional connectives and equality are used in describing control in the specification and the implementation, and in comparing them. This technique compares a pipelined implementation to an architectural description. The description is written in a simple HDL based on a small subset of Common LISP and is compiled through a kind of symbolic simulator to a transition function, which takes as arguments a state and the current inputs, and returns the next state.

The Ph.D. dissertation of Beatty [1] addressed the processor correctness problem and provided a step toward a solution. Because of the similarities between Beatty's work and our work, we will explain his method in some detail. To best illustrate the method, we will consider the same example Beatty used in his dissertation.

The example starts by describing an extremely simple nMOS implementation of the latch, which is considered to be the simplest sequential circuit. As shown in Figure 1.3, the latch has a control input, a data input, and an output, which are labeled L, D, and Q, respectively. The D input provides the data to be loaded when L is pulsed to load the latch. The capacitance on node S retains the latched value when the latch is holding the data. In other words, the latch performs two operations: load and hold.

Figure 1.3: A simple latch

Beatty's approach to verification is through symbolic simulation accompanied by the reliability of mathematics to establish correctness results. He started by examining the nature of circuits and his simulation model of them to understand their properties, which yielded requirements on his verification method.

By considering the latch and its timing diagram more closely, Beatty observed that the basic timing of the latch is defined by the operations the latch performs and not by
In other words, operations dictate their own timing and there is no needto distinguish clock signals from other input signals. Therefore, the beginning and theending of each operation must be marked. This can be done by identifying the nominalstart and the nominal end of each operation.Conceptually, the start marker of each operation is aligned with the end marker ofits predecessor to perform a sequence of operations. Moreover, these markers providea way to define the timing of state storage. For example, the state of the latch can beconsidered at a particular time, such as at the nominal beginning of an operation.The specification model Beatty used is an abstract Moore machine, structured as aset of symbolic assertions, where each assertion corresponds to an operation. By usingsymbols, a single assertion can concisely represent the operation for all possible datavalues. The latch can be specified with two assertions:((operation = load) A (D = b) 4 (Q = b)((operation = hold) A (Q = c) z. (Q = c)The left-hand side of the relation denotes the conditions on the system in thecurrent state, while the right-hand side denotes the conditions in its successor state.Beatty showed that specifications expressed in such a language define an abstract machineand the checks he made between specifications and circuits entail a formal relationshipbetween machines. The specifications are made sufficiently abstract, hence, they includeno particular details of a specific circuit implementation. Consequently, the same highlevel specification can be used to verify several circuit implementations by providing someChapter 1. Introduction 11formal relation between the specification and the circuit implementation.Beatty designed a formal language to support such a style of specifications. Thelanguage consists of writing assertions. Each assertion consists of two parts, an antecedentand a consequent, which are written in a restricted logic. There are two kind of variables,state and case. 
Case variables appear in the specifications, but are not components ofthe state space. They simply keep the specifications concise. The abstract specificationof the latch is given bellow:SMALversion 0 specification latchtypesvalue word 1;operation enumeration load, hold.stateop : operation;D value;Q value.assertionsop = load /\ D = b:value > Q = b;op = hold /\ Q = c:value > Q = c.Beatty used his methodology to verify the Hector, which is a 16-bit processor. Thoughthe Hector is a non-pipelined processor, the methodology can be applied to pipelinedsystems as well. The physical layout of the Hector was extracted and used as a modelin the symbolic simulation. This can be a tedious and time consuming process, whichexplains why Beatty did not use a more complex system to apply his methodology.1.4 Thesis OverviewThis thesis consists of six chapters with one appendix. In Chapter 2, we give an informaldescription of the Oryx processor. This description is what usually can be found inany programmer\u00E2\u0080\u0099s manual. Of course, this description is vague and cannot be used forverifying the Oryx processor as it is just a natural language description of what theOryx processor does. In Chapter 3, we provide a more precise description of the OryxChapter 1. Introduction 12processor based on an abstract model. The abstract model is used to reason about thebehavior of the Oryx processor and to facilitate the verification process. In Chapter4, we describe in details the implementation of the Oryx processor. In Chapter 5, weexplain the verification process and provides some experimental results. In Chapter 6,we conclude the thesis by discussing some future work. The FL verification script of theOryx processor can be found in Appendix A.Chapter 2ArchitectureThis chapter describes the architectural features of the Oryx processor from a programmer\u00E2\u0080\u0099s point of view. 
The chapter contains a section for each feature of the architecture normally visible to the machine-language programmer and the compiler writer.

2.1 Memory Organization

The physical memory of the Oryx processor is organized as a sequence of words. A word is four bytes occupying four consecutive addresses. Each word is assigned a unique physical address, which ranges from 0 to 2^32 - 1 (4 Gigawords).

The byte ordering within a word is Little Endian, as shown in Figure 2.4. A word contains 32 bits. The bits of a word are numbered from 0 through 31. The byte containing bit 0 of a word is the least significant byte, and the byte containing bit 31 of a word is the most significant byte.

Figure 2.4: Little Endian byte ordering (bit 31 down to bit 0: Byte 3, Byte 2, Byte 1, Byte 0)

The Oryx is a cache-oriented processor. A system block diagram of the Oryx processor is shown in Figure 2.5. In order to achieve high performance, the Oryx processor is provided with two high-speed cache memory modules. One cache is dedicated to instructions and the other to data. It is left to the system designer to decide what cache memory sizes should be used. This is usually a performance-cost tradeoff.

Figure 2.5: A system block diagram of the Oryx processor (the Oryx integer unit, containing the register file, instruction unit, and data unit, connects over the result, source 1, and source 2 buses; separate instruction and data cache/memory management units with bus control logic sit between the processor and the memory bus)

The Oryx processor is designed with the assumption that the processor does not directly address main memory. When the Oryx processor requires instructions or data, it generates a request for that information and sends it to a memory management unit (MMU). If the MMU does not find the necessary information in the cache memory, then the Oryx processor waits while the MMU collects the information (i.e. instructions or data) from main memory and moves it into the cache memory.

The Oryx processor is designed to support virtual memory. In other words, the Oryx processor operates under the illusion that the system's disk storage is actually part of the system's main memory.

The physical address space is broken into fixed blocks called pages. An address generated by the Oryx processor is called a logical address. The logical address space is mapped into the physical address space, which is, in turn, mapped into some number of pages.

A page may be either in main memory or on disk. When the Oryx processor issues a logical address, it is translated into an address for a page in memory, or an interrupt is generated. The interrupt gives the operating system a chance to read the page from disk, update the page mapping, and restart the instruction.

To provide the protection needed to implement an operating system, the physical address space is divided into system space and user space. The Oryx processor reserves the first 2 Gigawords of the physical memory for executing system programs and the other 2 Gigawords for executing user programs.

2.2 Registers

The Oryx processor has thirty-six registers, which are visible to the programmer. These registers may be grouped as:

• General registers.

• Multiplication and division register.

• Status and control registers.

2.2.1 General Registers

The Oryx processor has thirty-two 32-bit general registers R0, R1, ..., R31. For architectural reasons, register R0 is permanently hardwired to a zero value. This register can be read only. A write attempt into register R0 has no effect. Registers R1 to R31 are used to hold operands for arithmetic and logical operations.
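The R0 convention just described (reads always return zero, writes are silently ignored) can be captured in a few lines. The following RegisterFile class is a hypothetical Python model for illustration, not part of the Oryx VHDL design:

```python
# Programmer-visible general register model: 32 registers of 32 bits,
# with R0 permanently hardwired to zero.
class RegisterFile:
    def __init__(self):
        self.regs = [0] * 32

    def read(self, n):
        # R0 always reads as zero.
        return 0 if n == 0 else self.regs[n]

    def write(self, n, value):
        if n != 0:                              # writes to R0 have no effect
            self.regs[n] = value & 0xFFFFFFFF   # truncate to 32 bits

rf = RegisterFile()
rf.write(0, 123)   # silently ignored
rf.write(5, 123)
print(rf.read(0), rf.read(5))  # prints 0 123
```

Hardwiring R0 to zero is a common RISC design choice: it gives the instruction set a free zero operand, so that moves, comparisons with zero, and no-ops need no dedicated encodings.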
They may also be used to hold operands for address calculations.

2.2.2 Multiplication and Division Register

The multiplication and division (MD) register is used to facilitate the implementation of shift-and-add integer multiplication and division algorithms. The MD register stores the 32-bit value of the multiplier for multiplication operations and the dividend for division operations.

2.2.3 Status and Control Registers

The Oryx processor has one status and two control registers, which report and allow modification of the state of the processor.

Table 2.1: PSW bits definition
  S    CRA queue shift enable
  E    Non-maskable interrupt flag
  I    Maskable interrupt flag
  M    Interrupt mask
  V    Overflow mask
  O    Overflow flag
  U    User/System mode
  IPF  Instruction page fault flag
  DPF  Data page fault flag

Program Counter Register

The program counter (PC) register holds a 32-bit logical address of the next instruction to be executed. The PC is not directly available to the programmer. It is controlled implicitly by control-transfer instructions and interrupts.

Control Return Address Registers

The Oryx processor has four control return address (CRA) registers: CRA0, CRA1, CRA2, and CRA3. These registers form a queue, with the CRA3 register being the head of the queue and the CRA0 register being the tail. Every clock cycle, the value of the PC register is shifted into the queue. When an interrupt occurs, the shifting of the queue is disabled to give the programmer a chance to save the CRA registers. The actual content of the CRA registers is implementation dependent.

Program Status Word Register

The program status word (PSW) register contains nine status bits, which control certain operations and indicate the current state of the processor. Table 2.1 shows the definition of these bits within the PSW register. The programmer (in supervisor mode) can set the PSW register bits, which are also set by the hardware on an interrupt to a pre-determined value.

2.3 Instruction Formats

The Oryx processor distinguishes four instruction formats: memory, branch, compute, and compute immediate. An instruction format has various fixed-size fields, as shown in Figure 2.6. The type and opcode fields are always present. The other fields may or may not be present depending on the operation. The fields of an instruction format are described below:

• Type: is a 2-bit field used to identify the instruction format.

• Opcode: is a 3-bit field used to specify the operation performed by the instruction. Several instructions may have the same opcode. In this case, some bits of other fields are used to distinguish between instructions.

• Source Register Specifiers: are 5-bit fields used to specify source register(s). Instructions may specify one or two source registers. In the memory instruction format, one source register is used for address calculations.

• Destination Register Specifier: is also a 5-bit field, used to specify a destination register into which the result of an operation will be moved.

• Sq (Squash) Bit: is a 1-bit field used in the branch instruction format only. When set, it indicates that the branch is statically predicted to be taken.

• Offset: is a 17-bit field used in the memory instruction format. This field holds a signed value which is used to generate the logical address of a word in memory.

Figure 2.6: The Oryx processor instruction formats
  Memory:             | 10 | Op   | Src1 | Dest |      Offset (17) |
  Branch:             | 00 | Cond | Src1 | Src2 | Sq |   Disp (16) |
  Compute:            | 01 | Op   | Src1 | Src2 | Dest | Func (12) |
  Compute Immediate:  | 11 | Op   | Src1 | Dest |      Immed (17) |
  (field boundaries at bits 31-30, 27, 22, 17 and, where present, 16 and 12)

• Disp (Displacement): is a 16-bit field used in the branch instruction format.
This field is similar to the offset field except that it is used to determine the branch target address.

• Func (Function): is a 12-bit field used in the compute instruction format. When compute instructions share the same opcode, this field is used to further define the functionality of the instructions.

• Immed (Immediate): is a 17-bit field used in the compute immediate instruction format. This field holds a signed two's complement value of an immediate operand. The value is sign-extended to form a 32-bit value.

2.4 Addressing Modes

The Oryx processor supports only one addressing mode. The logical address is obtained by adding the content of a base register to the address part of the instruction. The base register is assumed to hold a base address, and the address field, also known as the offset, gives a displacement relative to this base register.

2.5 Operand Selection

An Oryx processor instruction may take zero or more operands. Of the instructions which have operands, some specify operands explicitly and others specify operands implicitly. An operand is held either in the instruction itself, as in compute immediate instructions, or in a general register. Since no memory operands are allowed, access to operands is very fast as they are available on-chip.

2.6 Co-processor Interface

The Oryx processor's memory instructions can communicate with other types of hardware besides the memory system. This is used to implement a simple co-processor interface. Co-processor instructions use the memory instruction format. These instructions are used to transfer data between the processor registers and the co-processor registers. It is also possible to load and store the co-processor registers without using the processor registers as an intermediate step.

The instruction bits for the co-processor are placed in the offset field of the memory instruction.
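The field layout of Figure 2.6 can be sketched as a small decoder (illustrative Python, not part of the thesis; the bit positions are read off the figure, and sign extension of the offset/immediate is omitted for brevity).

```python
def decode(word):
    """Split a 32-bit Oryx instruction word into the fields of Figure 2.6.

    Boundaries follow the figure: type [31:30], opcode [29:27],
    src1 [26:22]; the remaining fields depend on the format.
    """
    fields = {
        "type":   (word >> 30) & 0x3,
        "opcode": (word >> 27) & 0x7,
        "src1":   (word >> 22) & 0x1F,
    }
    t = fields["type"]
    if t == 0b10:            # memory format
        fields["dest"] = (word >> 17) & 0x1F
        fields["offset"] = word & 0x1FFFF      # 17-bit signed offset (raw bits)
    elif t == 0b00:          # branch format
        fields["src2"] = (word >> 17) & 0x1F
        fields["sq"] = (word >> 16) & 0x1
        fields["disp"] = word & 0xFFFF         # 16-bit displacement (raw bits)
    elif t == 0b01:          # compute format
        fields["src2"] = (word >> 17) & 0x1F
        fields["dest"] = (word >> 12) & 0x1F
        fields["func"] = word & 0xFFF
    else:                    # 0b11: compute immediate format
        fields["dest"] = (word >> 17) & 0x1F
        fields["immed"] = word & 0x1FFFF       # 17-bit signed immediate (raw bits)
    return fields
```

Note how the five field widths of each format sum to 32 bits, so a single shift-and-mask per field suffices.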
The co-processor can read these bits when they appear on the address lines of the processor.

Instruction  Operands         Operation
MOV          Src1,Dest        Dest := Src1
MOVFRS       SpecReg,Dest     Dest := SpecReg
MOVTOS       Src1,SpecReg     SpecReg := Src1
MOVFRC       CopInstr,Dest    Dest := CopReg
MOVTOC       CopInstr,Src2    CopReg := Src2
ALUC         CopInstr         Cop executes CopInstr
LD           X[Src1],Dest     Dest := M[X+Src1]
ST           X[Src1],Src2     M[X+Src1] := Src2
LDT          X[Src1],Dest     Dest := M[X+Src1]
STT          X[Src1],Src2     M[X+Src1] := Src2
LDF          X[Src1],fDest    fDest := M[X+Src1]
STF          X[Src1],fSrc2    M[X+Src1] := fSrc2
ADD          Src1,Src2,Dest   Dest := Src1 + Src2
ADDI         Src1,#N,Dest     Dest := N + Src1
SUB          Src1,Src2,Dest   Dest := Src1 - Src2
SUBNC        Src1,Src2,Dest   Dest := Src1 - Src2 - 1
DSTEP        Src1,Src2,Dest
MSTART       Src2,Dest
MSTEP        Src1,Src2,Dest
AND          Src1,Src2,Dest   Dest := Src1 bitwise and Src2
BIC          Src1,Src2,Dest   Dest := (not Src1) bitwise and Src2
NOT          Src1,Dest        Dest := not Src1
OR           Src1,Src2,Dest   Dest := Src1 bitwise or Src2
XOR          Src1,Src2,Dest   Dest := Src1 bitwise exclusive or Src2
SRA          Src1,Dest        Dest := Src1 shifted right 1 position (arithmetic)
SLL          Src1,Dest        Dest := Src1 shifted left 1 position
SRL          Src1,Dest        Dest := Src1 shifted right 1 position (logical)
IRET                          PC := CRA3
JUMP         Src1,Dest        PC := Src1, Dest := PC + 1
BEQ,BEQSQ    Src1,Src2,Label  PC := PC + Label if Src1 = Src2
BGT,BGTSQ    Src1,Src2,Label  PC := PC + Label if Src1 > Src2
BLT,BLTSQ    Src1,Src2,Label  PC := PC + Label if Src1 < Src2

S* = {<s1, s2, ...> | si ∈ {0,1}^m} is the set of run-time abstract state sequences, where m is the size of S.

Traditional transition functions map one state to another. However, this definition is not sufficient for modeling the Oryx processor, as we may need information from more than one abstract state (i.e. a sequence of abstract states) in order to determine the next abstract state.

For example, if the Oryx processor is currently executing an instruction in the second delay slot of a branch instruction, then the transition function will require some information from two previous abstract states to determine the new value of the PC register.

Therefore, we generalize the definition of a transition function so that it may take a sequence of abstract states instead of a single abstract state.

Definition 3.2 A transition function Γ : S* → S maps a sequence of abstract states to an abstract state.

It is important to note that a transition function Γ is a partial function; hence, a mapping may not exist for some sequences of abstract states. These sequences represent a class of programs which do not comply with the specifications of the Oryx processor. The behavior of the Oryx processor is simply not defined for such programs. It is the programmer's responsibility to write valid programs if a correct result is desired.

To identify the sequences of abstract states which can be obtained by executing valid programs only, we define what we call an abstract history.

Definition 3.3 (Abstract History) An abstract history A is a sequence of abstract states such that each prefix of A is consistent with the transition function. More formally, an abstract history is a sequence <s0, s1, ...> ∈ S* such that for all j ≥ 0, Γ(<s0, ..., sj>) = sj+1.

A specification, as defined below, describes how the abstract state of the Oryx processor is affected by executing an instruction every cycle.

Definition 3.4 (Specification) A specification F : A → A maps an abstract history to a new abstract history.

Keeping the above discussion in mind, we now present the abstract model of the Oryx processor as depicted in Figure 3.7.
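As a toy illustration of a transition function over a sequence of states (illustrative Python, not the thesis's notation): the next PC is determined by looking two states back, mirroring the branch-delay-slot example above. State fields and names here are assumptions of the sketch.

```python
def next_pc(history):
    """Toy transition function: maps a sequence of abstract states to
    the next PC. Each state is a dict; a branch resolved two states
    ago (its two delay slots now filled) redirects the PC."""
    current = history[-1]
    if len(history) >= 3 and history[-3].get("branch_taken"):
        return history[-3]["branch_target"]
    return current["pc"] + 1

history = [
    {"pc": 100, "branch_taken": True, "branch_target": 200},
    {"pc": 101},   # first delay slot
    {"pc": 102},   # second delay slot
]
```

With this history, the function consults the state three entries back, so the PC after the second delay slot becomes the branch target rather than 103.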
Assuming that the Oryx processor is in a state S0, the specification determines what the next state S1 is going to be based on the current (recent) history of the Oryx processor.

Figure 3.7: The abstract model of the Oryx processor

3.2 Formal Specification

This section shows how to write the formal specification of the Oryx processor by giving an example. The full formal specification can be derived in a similar way. Before giving the example, however, we introduce some notations and conventions which will be used throughout this section.

We write Sτ to refer to the abstract state at time τ. S0 refers to the current abstract state. Negative indices are used to refer to previous abstract states. For example, S_-1 is the abstract state one instruction cycle ago, S_-2 is the abstract state two instruction cycles ago, and so on. Similarly, positive indices are used to refer to next abstract states.

Writing formal specifications can be a very tedious task since it usually involves a lot of formulation. Therefore, a formal specification syntax should be made as simple as possible. Table 3.3 lists some abbreviations which are used to make the formal specification of the Oryx processor more readable, and Table 3.4 defines some symbols.

Table 3.3: Function abbreviations
  RF_τ(i)        S_τ.RF(i)
  PC_τ           S_τ.PC
  MD_τ           S_τ.MD
  Src1_τ         S_τ.ICacheData[5..9]
  Src2_τ         S_τ.ICacheData[10..14]
  Dest_τ         S_τ.ICacheData[15..19]
  Immed_τ        S_τ.ICacheData[15..31]
  Offset_τ       S_τ.ICacheData[15..31]
  DCacheData_τ   S_τ.DCacheData
  SpecReg_τ      S_τ.MD
  CRA_τ(i)       S_τ.CRA(i)
  NMInterrupt_τ  S_τ.NMInterrupt
  Interrupt_τ    S_τ.Interrupt
  IPFault_τ      S_τ.IPFault
  DPFault_τ      S_τ.DPFault
  Reset_τ        S_τ.Reset

Table 3.4: Operation definitions
  +   Binary signed addition
  -   Binary signed subtraction
  ∧   Bitwise logical AND
  ∨   Bitwise logical OR
  ⊕   Bitwise logical XOR
  :   Bit concatenation

Throughout the remainder of this chapter, we will show by example how to write the formal specification of the Oryx processor. We will provide a hierarchical definition of two transition functions, namely the register file and the PC register, under normal conditions (i.e. no interrupts and no reset). We then go down in the hierarchy and define all functions which compose the top-level definition of these transition functions.

3.2.1 Example I: Register File Specification

Trf(i)(S) =
    NewRF(S)   if Dest_0 = i
    RF_0(i)    otherwise

NewRF(S) =
    ArithmRes(S)     if ArithmInstr(S)
    LogicRes(S)      else if LogicInstr(S)
    TransferRes(S)   else if TransferInstr(S)
    MDRes(S)         else if MDInstr(S)
    ShiftRes(S)      else if ShiftInstr(S)
    JumpRes(S)       else if JumpInstr(S)

ArithmInstr(S) = InstrIsADD(Opcode_0) ∨ InstrIsADDI(Opcode_0) ∨ InstrIsSUB(Opcode_0) ∨ InstrIsSUBNC(Opcode_0)

LogicInstr(S) = InstrIsAND(Opcode_0) ∨ InstrIsBIC(Opcode_0) ∨ InstrIsOR(Opcode_0) ∨ InstrIsXOR(Opcode_0) ∨ InstrIsNOT(Opcode_0)

TransferInstr(S) = InstrIsMOV(Opcode_0) ∨ InstrIsLD(Opcode_0) ∨ InstrIsLDT(Opcode_0) ∨ InstrIsMOVFRS(Opcode_0) ∨ InstrIsMOVFRC(Opcode_0)

MDInstr(S) = InstrIsMSTART(Opcode_0) ∨ InstrIsMSTEP(Opcode_0) ∨ InstrIsDSTEP(Opcode_0)

ShiftInstr(S) = InstrIsSRA(Opcode_0) ∨ InstrIsSLL(Opcode_0) ∨ InstrIsSRL(Opcode_0)

JumpInstr(S) = InstrIsJUMP(Opcode_0)

ArithmRes(S) =
    RF_0(Src1_0) + RF_0(Src2_0)       if InstrIsADD(Opcode_0)
    RF_0(Src1_0) + Immed_0            else if InstrIsADDI(Opcode_0)
    RF_0(Src1_0) - RF_0(Src2_0)       else if InstrIsSUB(Opcode_0)
    RF_0(Src1_0) - RF_0(Src2_0) - 1   else if InstrIsSUBNC(Opcode_0)

LogicRes(S) =
    RF_0(Src1_0) ∧ RF_0(Src2_0)         if InstrIsAND(Opcode_0)
    (not RF_0(Src1_0)) ∧ RF_0(Src2_0)   else if InstrIsBIC(Opcode_0)
    RF_0(Src1_0) ∨ RF_0(Src2_0)         else if InstrIsOR(Opcode_0)
    RF_0(Src1_0) ⊕ RF_0(Src2_0)         else if InstrIsXOR(Opcode_0)
    not RF_0(Src1_0)                    else if InstrIsNOT(Opcode_0)

TransferRes(S) =
    RF_0(Src1_0)    if InstrIsMOV(Opcode_0)
    DCacheData_0    else if InstrIsLD(Opcode_0)
    DCacheData_0    else if InstrIsLDT(Opcode_0)
    DCacheData_0    else if InstrIsMOVFRC(Opcode_0)
    SpecReg_0       else if InstrIsMOVFRS(Opcode_0)

MDRes(S) =
    FirstMultStep(S)   if InstrIsMSTART(Opcode_0)
    MultStep(S)        else if InstrIsMSTEP(Opcode_0)
    DivStep(S)         else if InstrIsDSTEP(Opcode_0)

FirstMultStep(S) =
    0 - RF_0(Src2_0)   if MSB(MD_0) = 1
    0                  otherwise

MultStep(S) =
    2 × RF_0(Src1_0) + RF_0(Src2_0)   if MSB(MD_0) = 1
    2 × RF_0(Src1_0)                  otherwise

DivStep(S) =
    DivStepResA(S)   if MSB(DivStepResB(S)) = 1
    DivStepResB(S)   otherwise

DivStepResA(S) = 2 × RF_0(Src1_0) + MSB(MD_0)

DivStepResB(S) = DivStepResA(S) - RF_0(Src2_0)

ShiftRes(S) =
    RF_0(Src1_0)[31] : RF_0(Src1_0)[31..1]   if InstrIsSRA(Opcode_0)
    RF_0(Src1_0)[30..0] : 0                  else if InstrIsSLL(Opcode_0)
    0 : RF_0(Src1_0)[31..1]                  else if InstrIsSRL(Opcode_0)

JumpRes(S) = PC_0 + 1

3.2.2 Example II: PC Specification

Tpc(S) =
    0                   if Exception(S)
    CRA_0(3)            else if ReturnFromException(S)
    PC_0 + Disp_-2      else if TakeBranch(S)
    RF_-2(Src1_-2)      else if DoJump(S)
    PC_0 + 1            otherwise

TakeBranch(S) =
    (InstrIsBEQ(Opcode_-2) ∧ (RF_-2(Src1_-2) = RF_-2(Src2_-2))) ∨
    (InstrIsBNE(Opcode_-2) ∧ (RF_-2(Src1_-2) ≠ RF_-2(Src2_-2))) ∨
    (InstrIsBGT(Opcode_-2) ∧ (RF_-2(Src1_-2) > RF_-2(Src2_-2))) ∨
    (InstrIsBLT(Opcode_-2) ∧ (RF_-2(Src1_-2) < RF_-2(Src2_-2)))

ReturnFromException(S) = InstrIsIRET(Opcode_-3)

DoJump(S) = InstrIsJUMP(Opcode_-2)

Exception(S) = NMInterrupt_0 ∨ Interrupt_0 ∨ IPFault_0 ∨ DPFault_0 ∨ Reset_0

Chapter 4

Implementation

This chapter describes the implementation of the Oryx processor in the VHSIC hardware description language (VHDL). The VHDL code is a detailed gate-level logic implementation of the Oryx processor [6].

The logical way to describe the Oryx processor implementation is to divide it into two parts: control path and data path.
This chapter contains a section for each path, but first we start by describing the Oryx processor from a designer's point of view.

4.1 Design Overview

The Oryx is a pipelined processor. The pipeline consists of five stages, as shown in Figure 4.8. The pipeline stages are typically called the instruction fetch (IF) stage, the instruction decode (ID) stage, the execute (EXE) stage, the memory (MEM) stage, and the write-back (WB) stage, respectively.

Figure 4.8: The Oryx processor pipeline

Each stage of the pipeline is a pure combinational circuit. High-speed interface latches separate the pipeline stages. The latches are fast registers which hold intermediate results between the stages. A common clock is simultaneously applied to all the latches to control information flow between adjacent stages.

The functionality of the pipeline stages is explained by describing what each stage of the pipeline does from the time an instruction is issued to the time the instruction is processed completely.

The IF stage reads the instruction from the instruction cache (ICache) memory using the address in the PC register. The ID stage identifies the instruction and prepares its operands.

The computation performed by the EXE stage depends on the instruction being executed. For compute or compute immediate instructions, this stage performs the obvious computation. During the execution of memory instructions, the EXE stage computes the address of the desired memory location. For branch instructions, the EXE stage compares two registers to evaluate the branch condition.

The MEM stage reads from and writes into the data cache (DCache) memory. The WB stage is the final stage in the pipeline, where the result of the instruction is written into the register file.

Pipelining is a form of temporal parallelism. Several instructions occupy different pipeline stages at the same time. These instructions could very likely depend on each other.
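The overlap is easy to see in a toy model (illustrative Python, not the thesis's VHDL; the instruction strings are assumptions of the sketch). When an instruction that reads a register reaches EXE, the earlier instruction that produces it may still be several stages away from WB:

```python
STAGES = ["IF", "ID", "EXE", "MEM", "WB"]

def tick(pipeline, next_instr):
    """Advance the pipeline one clock: the WB instruction retires,
    everything else moves one stage along, and next_instr enters IF."""
    retired = pipeline[-1]
    return [next_instr] + pipeline[:-1], retired

pipe = [None] * 5
program = ["ADD R1,R2,R3", "SUB R3,R4,R5", "NOP", "NOP", "NOP"]
for instr in program:
    pipe, _ = tick(pipe, instr)

occupancy = dict(zip(STAGES, pipe))
# The SUB, which reads R3, is already in MEM while the ADD that
# produces R3 is only now in WB: its result is not yet in the
# register file when the SUB needed it.
```

This is exactly the read-after-write situation that motivates the bypassing described next.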
Therefore, internal bypassing is provided so that an instruction does not have to wait for previous instructions to write their results into the register file before being able to access those results. With bypassing, two instructions can be issued in sequence even if the second one uses the result of the first one. The only exception to this is the load instructions.

The Oryx processor uses a two-phase non-overlapping clocking scheme. The two clock phases are called Phi1 (φ1) and Phi2 (φ2), respectively. There are also three derived clocks called Psi1 (ψ1), Psi2 (ψ2), and Psi2PC (ψ2PC).

The logic which generates the ψ2 clock is shown in Figure 4.9. The ψ2 clock is controlled by both the instruction acknowledgment (IAck) and data acknowledgment (DAck) external signals.

Figure 4.9: ψ2 logic diagram

The memory elements used in the implementation are all negative-edge triggered flipflops. If either IAck or DAck is not asserted, due to an ICache miss or a DCache miss respectively, the Oryx processor is easily stalled by preventing the ψ2 clock from falling. The ψ1 clock is allowed to go high only if the ψ2 clock is low.

The ψ2PC is a special clock which is used in the PC chain (PCChain). The PCChain is used to store the addresses of the instructions which need to be restarted after an exception. The PCChain is implemented as a four-stage shift register. As shown in Figure 4.10, the four stages are called PCm1 (PC minus 1), PCm2, PCm3, and PCm4, respectively.

Figure 4.10: Program counter chain logic diagram

On an exception, the ψ2PC clock is disabled to give the programmer a chance to save the contents of the PCChain registers. While the ψ2PC clock is disabled, it can go high only for a "MOVFRS PCm4" instruction. Once the programmer has saved the PCChain registers, the ψ2PC clock is re-activated. The ψ2PC clock logic diagram is shown in Figure 4.11.

Figure 4.11: ψ2PC logic diagram

Instructions can change the Oryx processor state only when they reach their last pipeline stage. When an instruction causes an interrupt (e.g. an instruction page fault), an interrupt flag is carried along the pipeline stages and is detected in the WB stage. The interrupt flag is used to prevent the instruction from doing anything as it goes through the pipeline stages. In other words, the instruction is turned into a "NOP" instruction.

4.2 The Control Path

The control path is 32 bits wide. The logic diagram of the control path is shown in Figure 4.12. The design of the Oryx processor is kept as simple as possible, and it does not require very complex control. The control is primarily done with logic which is local to each stage of the pipeline.

Figure 4.12: The control path logic diagram

There are a few global control signals used in most stages of the pipeline. They are the reset signal, the exception signal, the squash signal, and the PCChainToPC signal.

The reset signal puts the Oryx processor in a known state and forces an exception to take place. This is an active-high signal which is controlled by an external pin.

The exception signal is an active-high signal which originates in the exception FSM. The exception FSM is a simple six-state machine implemented as a six-stage shift register. The logic diagram of the exception FSM is shown in Figure 4.13.

Figure 4.13: The exception FSM logic diagram

The exception FSM is activated when an interrupt is detected in the WB stage. The exception signal stays active for five clock cycles. This is the time needed to flush the pipeline. The pipeline is flushed by feeding in five "NOP" instructions and preventing the instructions which are currently in the pipeline from writing their results back into the register file. The Oryx processor starts fetching instructions at the beginning of the ISR when the exception signal goes low.

Table 4.5 shows the pipeline action during exceptions. Instruction I0 is assumed to have caused an exception. Consequently, instructions I0 to I4 are canceled. The Oryx processor then starts fetching instruction I5, the first instruction of the ISR.

Table 4.5: The Oryx processor pipeline action during exceptions
  IF   ID   EXE  MEM  WB
  I0   X    X    X    X
  I1   I0   X    X    X
  I2   I1   I0   X    X
  I3   I2   I1   I0   X
  I4   I3   I2   I1   I0
  NOP  I4   I3   I2   I1
  NOP  NOP  I4   I3   I2
  NOP  NOP  NOP  I4   I3
  NOP  NOP  NOP  NOP  I4
  I5   NOP  NOP  NOP  NOP

The squash signal is an active-high signal which originates in the squash FSM. The squash FSM is similar to the exception FSM. It is a three-state machine implemented as a two-stage shift register.
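A shift-register FSM of this kind is simple to model. The sketch below (an illustration in Python, not the actual logic of the figures) keeps its output high for exactly two clock cycles once triggered, which is the behavior the squash FSM needs:

```python
class SquashFSM:
    """Two-stage shift register: one trigger yields two cycles of squash."""

    def __init__(self):
        self.stages = [0, 0]

    def clock(self, trigger):
        # Shift the trigger in; squash is high while either stage holds a 1.
        self.stages = [1 if trigger else 0, self.stages[0]]
        return self.stages[0] or self.stages[1]

fsm = SquashFSM()
outputs = [fsm.clock(t) for t in (1, 0, 0, 0)]
```

Here `outputs` is high for the trigger cycle and the one after it, then drops, matching the two delay-slot instructions that must be canceled. The longer exception FSM works the same way with more stages.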
The logic diagram of the squash FSM is shown in Figure 4.14.

Figure 4.14: The squash FSM logic diagram

The squash FSM is activated when the instruction being executed is a squashing branch instruction and the branch is not taken. The squash signal stays active for two clock cycles. This is the time needed to cancel the two instructions which are in the delay slots.

Table 4.6 shows the pipeline action during instruction squashing. Instruction I4 is assumed to be a squashing branch instruction. The two instructions following I4 are turned into "NOP" instructions. The Oryx processor starts fetching instruction I5, which follows the two delay slots.

Table 4.6: The Oryx processor pipeline action during instruction squashing
  IF   ID   EXE  MEM  WB
  I0   X    X    X    X
  I1   I0   X    X    X
  I2   I1   I0   X    X
  I3   I2   I1   I0   X
  I4   I3   I2   I1   I0
  NOP  I4   I3   I2   I1
  NOP  NOP  I4   I3   I2
  I5   NOP  NOP  I4   I3

The PCChainToPC is an active-high signal which originates in the PCChain FSM. The PCChain FSM is a five-state machine implemented as a four-stage shift register. The logic diagram of the PCChain FSM is shown in Figure 4.15. The PCChain FSM is triggered by an IRET instruction. The PCChainToPC signal stays active for four clock cycles. During this time, the PC register gets its new value directly from the PCm4 register.

Figure 4.15: The PC chain FSM logic diagram

In order to understand how the control works in the Oryx processor, it is best to examine what the hardware does in each processing phase of an instruction (Table 4.7).

Table 4.7: Pipeline action during instruction processing phases
  Phase  Clock  Pipeline action
  IF     φ1     ICache address pads <= PC
         φ2     Do ICache access; instruction register <= ICache
  ID     φ1     Calculate new PC; do bypass register comparisons;
                detect squashing branch instructions;
                Src1 bus <= Src1 register or bypass register;
                Src2 bus <= Src2 register, bypass register, or
                offset field in memory instructions
         φ2     No action
  EXE    φ1     Do ALU operation or shift operation;
                evaluate TakeBranch signal for branch instructions
         φ2     EXEout register <= result bus; SMDR <= Src2 bus; PCm1 <= PC
  MEM    φ1     No action
         φ2     Do DCache access; TEMP register <= EXEout register;
                LMDR register <= DCache input pads
  WB     φ1     Detect interrupts
         φ2     WBout register <= TEMP register or LMDR;
                register file <= WBout register

The instruction fetch starts during the IF phase when the PC register is loaded with the address of the next instruction to be fetched. A typical instruction fetch timing diagram is shown in Figure 4.16. On φ1 of the IF phase, the ICache address pads are driven with the instruction address and the fetch signal is asserted by the Oryx processor. The ICache is accessed on φ2 of the IF phase. In the ideal situation (i.e. a cache hit), the MMU responds by asserting the IAck signal to tell the Oryx processor that an instruction is now available on the ICache pads. Consequently, the value on the ICache pads is loaded into the instruction register (IR).

Figure 4.16: A typical instruction fetch timing diagram

Since the Oryx processor supports virtual memory, the design has a mechanism to detect page faults. A typical ICache page fault timing diagram is shown in Figure 4.17. The same diagram can be used to show DCache page fault timing by relabeling the waveforms so they correspond to a DCache memory operation. An ICache page fault is similar to an ICache miss except that the MMU waits for one clock cycle and then asserts the instruction page fault (IPFault) signal.

Figure 4.17: ICache page fault timing diagram

On φ1 of the ID phase, squashing branch instructions are detected. In the PC unit (Figure 4.18), the address of the next instruction to be fetched is calculated. At the same time, in the bypass control unit, the source register addresses of the current instruction are compared with the destination register addresses of the previous two instructions using four identical comparators. As a result of the comparison, either the source register or the pipeline intermediate values are driven on the source 1 (Src1) and source 2 (Src2) buses.

Figure 4.18: The PC unit logic diagram

On φ1 of the EXE phase, either an ALU operation or a shift operation is performed. On φ2, the result of the operation is moved into the EXEout register and the value of the Src2 bus is loaded into the SMDR register. For memory instructions, the DCache address pads are driven with the content of the EXEout register and the DCache output pads are driven with the content of the SMDR register. For branch instructions, the TakeBranch signal indicates whether fetching instructions should continue sequentially or at the branch target. The TakeBranch signal logic diagram is shown in Figure 4.19.

Figure 4.19: The TakeBranch signal logic diagram

A typical memory read timing diagram is shown in Figure 4.20. On φ2 of the MEM phase, the content of the DCache input pads is loaded into the LMDR register and the content of the EXEout register is moved into the TEMP register. The Oryx processor asserts the read signal.
The DCache is accessed on φ2 of the next clock cycle, giving the MMU enough time to find the required information.

Figure 4.20: A typical memory read timing diagram

On φ1 of the WB phase, the interrupt flags are scanned according to their priority. When an interrupt is pending, the exception FSM is activated before the content of the WBout register is written into the register file, so as to prevent the contents of the registers from being changed.

On φ2, the content of the TEMP register is moved into the WBout register, except for memory load instructions, where the WBout register gets the content of the LMDR register instead.

4.3 The Data Path

The width of the data path is parameterized so it can take on a value equal to N, where N = 2^i, i ∈ {2, 3, 4, 5}. The data path logic diagram is shown in Figure 4.21. It mainly consists of two parts: the register file and the execute unit.

Figure 4.21: The Oryx processor data path logic diagram

The register file provides the operands which are needed to perform an operation. It is an M-word memory, where M = 2^i, i ∈ {1, 2, 3, 4, 5}, which can be accessed through three ports, one of which is used for writing while the other two are used for reading. As shown in Figure 4.22, the register file can be logically divided into three components: the memory array, the destination address decoder, and the output multiplexors.

Figure 4.22: The Oryx processor register file logic diagram

Each memory bit is a D-flipflop with an input multiplexor, as shown in Figure 4.23. The content of the memory bit is changed when the load signal becomes active. The N load signals are grouped together to form a word address line which is connected to one output of the address decoder.

Figure 4.23: A register file memory bit logic diagram

The NoDest signal originates in the ID stage and is carried along the pipeline stages. It is used to prevent instructions which do not write into the register file from changing the contents of the registers. The logic which generates the NoDest signal is shown in Figure 4.24.

Figure 4.24: The NoDest signal logic diagram

An M-to-1 word multiplexor is used for each of the register file output ports. Reading from the register file is done asynchronously. When the source register address becomes stable at the multiplexor select lines, the required memory word is made available at the output port after some propagation delay.

The design of the register file is somewhat naive. The main reason is that the tools which we used, specifically the VHDL compiler, do not support switch-level components (i.e. transistors), which are necessary for a realistic implementation. In practice, the register file should be implemented as a dual-ported RAM with pre-charge lines and differential sense amplifiers, although designs like ours have been reported [16].

The execute unit (Figure 4.21) consists of five components: the ALU, the shifter, the MD register, the PSW register, and the PCChain.

The ALU consists of two parts: the arithmetic unit (AU) and the logic unit (LU). The logic diagram of the ALU is shown in Figure 4.25. Data arrive at the ALU input ports on either the source buses or the immediate bus. A multiplexor is used to select between the Src2 bus and the immediate bus. Similarly, another multiplexor selects between Src1 and Src1 shifted by one bit (used to support the multiplication/division instructions). A third multiplexor selects between the output of the AU and the output of the LU.

Figure 4.25: The ALU logic diagram

The AU is an N-bit carry-propagate adder composed of a cascade of carry look-ahead adders. The logic diagram of the AU is shown in Figure 4.26. The AU supports both two's complement addition and subtraction. The ALUComplement signal is used to generate the two's complement of the second operand for subtract instructions.

The AU is also used to perform register comparison for branch instructions. The Z signal takes on a "0" value when the contents of the two source registers of a branch instruction are equal.

Figure 4.26: The AU logic diagram

The logic diagram of the LU is shown in Figure 4.27. The LU supports bitwise logical "and", "or", "exclusive or", "not", and "bit clear" operations.

Figure 4.27: The LU logic diagram

The shifter is a combinational circuit which performs standard shifts such as logical shifts and arithmetic shifts. The logic diagram of the shifter is shown in Figure 4.28.

The shifter has two control lines, ShiftLeft and ShiftRight, for selecting the type of operation. The diagram shows the first stage, the last stage, and a typical stage.
The shifter consists of N such identical stages. The shifting to the right or left is by one bit position. The serial input to the least significant stage is always "0". The ShifterMSBIn signal serves as a serial input for the most significant stage. For arithmetic shift operations, ShifterMSBIn takes on the value of the sign bit, and for shift right logical operations it becomes "0".

Figure 4.28: The shifter logic diagram
Figure 4.29: A typical MD bit logic diagram

The MD register is an N-bit special register with built-in hardware to facilitate the implementation of the shift-and-add multiplication and division instructions.

A typical MD register bit consists of an input multiplexor and two D-flipflops as shown in Figure 4.29. On φ1, the output value of the input multiplexor is loaded into the first D-flipflop. The second D-flipflop, which is known as the shadow flipflop, saves the content of the first D-flipflop on φ2 so that the value of the MD register can be restored after an exception.

The LdMDReg signal loads the MD register with the value on the Src1 bus. This signal becomes high only for a "MOVTOS MD" instruction which is not squashed. The MDRegShift signal causes the MD register to shift left by one bit for multiplication/division instructions which are not squashed.

On an exception, both the LdMDReg and the MDRegShift signals stay low. Before starting the ISR, the RestoreOldMDReg signal rises to restore the value of the MD register which was held immediately before the exception signal became active.
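The MD register behaviour just described, loading from Src1, shifting left with a serial input into the low-order bit, and saving/restoring a shadow copy around exceptions, can be sketched in Python. The class and method names here are ours; they stand in for the LdMDReg, MDRegShift, and RestoreOldMDReg signals and the shadow flipflop.

```python
class MDRegister:
    """Illustrative model of the MD register and its shadow copy."""

    def __init__(self, width=32):
        self.mask = (1 << width) - 1
        self.value = 0    # first D-flipflop per bit
        self.shadow = 0   # shadow flipflop per bit

    def load(self, src1):
        # LdMDReg: load from the Src1 bus (MOVTOS MD, not squashed)
        self.value = src1 & self.mask

    def save_shadow(self):
        # Shadow flipflop samples the main value (on phi2 in the hardware)
        self.shadow = self.value

    def shift(self, lsb_in):
        # MDRegShift: shift left by one bit; lsb_in models MDRegLSBIn
        self.value = ((self.value << 1) | (lsb_in & 1)) & self.mask

    def restore(self):
        # RestoreOldMDReg: recover the pre-exception value before the ISR
        self.value = self.shadow
```

A shift-and-add step then shifts the register left while the shadow copy preserves the value held before an exception, which `restore` brings back.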
The MDRegLSBIn is shifted into the low order bit of the MD register when the MDRegShift signal is high. This bit is simply the sign bit of the ALU output.

Figure 4.30: PSWCurrent-PSWOther pair

The PSW register consists of nine pairs of bits. The two bits of each pair are called PSWCurrent and PSWOther, respectively. The logic diagram for each PSWCurrent-PSWOther pair is shown in Figure 4.30. The top D-flipflop makes up the PSWCurrent bit and the bottom D-flipflop makes up the PSWOther bit. Each PSWCurrent bit can be saved in and restored from its PSWOther bit.

On an exception, the PSWCurrent bits are saved in the PSWOther bits and a particular value is forced into most of the PSWCurrent bits, except the E, I, O, IPF, and DPF bits. These bits are forced to values which depend on whether or not the exception was caused by a non-maskable interrupt, a maskable interrupt, an overflow, an ICache page fault, or a DCache page fault, respectively.

The LdPSW signal loads the PSW register with the value on the Src1 bus. This signal becomes high only for a "MOVTOS PSW" instruction executed when the Oryx processor is in the system mode. The LdPSW signal stays low in case of an exception.

On an exception, the ForcePSW signal rises to save the PSWCurrent bits in their respective PSWOther bits and force the PSWCurrent bits to predetermined values.

The PSWOtherToPSW signal is triggered by an IRET instruction as part of the ISR return sequence. This signal restores the PSWCurrent bits with the values of their PSWOther bits.

Chapter 5

Formal Verification

In this chapter, we demonstrate the application of symbolic trajectory evaluation for verifying the Oryx processor. The chapter contains six sections. First, we briefly discuss the inherent difficulty in carrying out the verification of the Oryx processor.
Second, we explain what it means to verify the processor. Third, we show how implementation mappings can be used to overcome some of the difficulty in verifying the Oryx processor. Fourth, we state the verification condition of the processor which allows us to establish the correctness of the processor. Fifth, we show how the verification is actually done and we then present some experimental results. Finally, we list some of the design errors which were detected during the verification.

5.1 Verification Difficulty

In practice, the Voss system is used to carry out the verification task of the Oryx processor. Generally speaking, the formal specification of the Oryx processor is expressed in the FL language and the simulation model needed by the Voss system is extracted from the VHDL implementation of the processor.

One of the difficulties in verifying the correctness of the Oryx processor is that the formal specification of the processor is very abstract, so that it does not constrain the implementation of the processor. On the other hand, our implementation of the Oryx processor is fairly complex. Therefore, a major part of the verification effort was to find the relationship between the formal specification of the Oryx processor and its pipelined implementation. In particular, we had to deal with both temporal abstraction and structural abstraction.

In the abstract domain, time is measured in terms of instruction cycles. The processor executes one instruction every cycle. In the implementation domain, however, time is measured in terms of clock cycles. It takes an instruction several clock cycles to emerge from the pipeline of the Oryx processor. For example, an ADD instruction takes 5 clock cycles to complete in the implementation. Furthermore, each clock cycle consists of many basic time units.
Assuming that each clock cycle is equal to 1000 time units, we thus need to map 1 abstract instruction cycle to 5000 implementation time units.

The problem imposed by structural abstraction is due to the fact that a data value in the abstract domain could have several images in the implementation domain, and which image should be used at a given point of time depends on the state of the Oryx processor. To further illustrate the problem, consider the following example. In the abstract domain, the value of the operands can be obtained from one source called the virtual register file. In the implementation domain, however, the value of an operand could be in the register file or anywhere along the stages of the pipeline. We will return to this shortly.

5.2 Property Verification

If the programmer wants to use the Oryx processor to run a program, the implementation of the processor should perform the "correct" function, as specified by the program. In very general terms, answering this is the essential verification task of the Oryx processor. However, carrying out the verification for any arbitrary program is impossible due to the complexity of the problem. Instead, we will show how to verify properties of the Oryx processor using bounded-length, arbitrary, but valid (i.e. written according to the formal specification) instruction sequences. A property of the Oryx processor to verify might be that the PC register is updated properly or that the ALU computes the sum of two operands correctly.

At this point, we will consider the abstract model of the Oryx processor to show how to check for properties in the abstract domain. Later in this chapter, we will show how to check for the same properties in the implementation domain. We express the properties we want to check for in the form A ⟹ NC, where A and C are instantaneous formulas that relate to the abstract model of the Oryx processor and N is a next-time operator.
If the property we are checking for holds in the abstract domain, we write ⊨ A ⟹ NC.

Referring back to Chapter 1, one can realize that STE is a very suitable technique for checking properties of the form given above. The formula A, also known as the antecedent, describes the abstract state of the Oryx processor before we perform some instruction. Intuitively, the antecedent selects, out of the arbitrary instruction sequences, those sequences that are relevant for the property we wish to verify. Similarly, the formula C, also known as the consequent, describes what the state of the Oryx processor should be if the particular property holds and we allow the Oryx processor to run for one more cycle.

Since we are verifying properties of the Oryx processor only, it may not be clear to the reader how that results in verifying the Oryx processor as proposed in the beginning of this section. Nevertheless, it is possible, after each single property has been verified, to compose the individual results so that the correctness of the abstract model can be established [7]. However, describing those techniques is beyond the scope of this thesis.

5.3 Implementation Mapping

Implementation mapping is a technique to close the gap between the abstract formal specification of the Oryx processor and its pipelined implementation. The technique involves finding a well-defined relationship between the abstract domain and the implementation domain. Due to the size of the gap, the mapping is relatively complex.

Figure 5.31: Time mapping

Since we have two types of abstraction, we will first show how to do the mapping to handle each type of abstraction. We will then show how to combine the two mappings to obtain one implementation mapping.

5.3.1 Cycle Mapping

The first entity we need to map is time. As shown in Figure 5.31, one instruction cycle in the abstract domain corresponds to many basic time units in the implementation domain.
To relate time in both domains, we define a linear function which takes an instruction cycle i and maps it to the corresponding implementation time. The definition of this function depends on the delays in the components of the implementation. In our case the function is simply 100000 × i.

5.3.2 Instant Abstract State Mapping

The second entity we need to map is the instant abstract state. The term instant is used here to emphasize that the abstract state does not involve time explicitly. Thus, this mapping deals primarily with the structural abstraction. As shown in Figure 5.32, we define a function, φ(i), which takes an abstract state i and maps it to a sequence of implementation states. We impose some constraints on φ(i) to make sure that all implementation states within a sequence are consistent with respect to each other.

Figure 5.32: State mapping

This mapping is best illustrated by an example. Recall from Chapter 4 that bypassing is a technique used to resolve data conflicts within the pipeline. This means operands can be obtained from the register file or anywhere along the pipeline stages. To hide this fact in the abstract domain, we refer to a virtual register file whenever we need to obtain an operand.

To illustrate the basic idea behind the mapping, we present two examples as shown in Figure 5.33. The upper part of Figure 5.33 shows the abstract model of the Oryx processor with some fields filled in, while the lower part shows a simplified version of its implementation.

The first example shown in Figure 5.33(a) assumes that the current instruction is an ADD 2,3,4 instruction which is supposed to add the contents of the virtual register 2 and the virtual register 3, and put the result in the virtual register 4. Since the addresses of the instruction operands are not equal to any of the addresses of the destination registers of the previous two instructions, bypassing is not done.
Thus, the virtual register file is mapped to the physical register file as shown by the arrows.

Figure 5.33: Register file mapping

The second example shown in Figure 5.33(b) assumes that the current instruction is the same. However, in this case one of the operand addresses is equal to the address of the previous destination register. Thus, the virtual register is mapped to the EXEout register.

5.3.3 Combined Mapping

In the previous discussion, we showed how to deal with temporal and structural abstraction separately. Here we show how to combine both mappings to define a single implementation mapping function that maps both time and state. We call this function μ. The way the function works is depicted in Figure 5.34. The function takes an abstract state i and maps it to a sequence of implementation states as explained earlier. The function then takes the next abstract state i + 1, maps it to a sequence of implementation states, adds time elements to the result, and combines it with the previously generated sequences. Again, some constraints are imposed on the mapping to make sure that the resulting combination of sequences is conflict-free.

Figure 5.34: Combined mapping

5.4 Verification Condition

To establish the correctness of the Oryx processor, we need to show the following is true:

    ∀A, C: if ⊨ A ⟹ NC then ⊨_M μ(A ⟹ NC)

This basically says that if some properties hold in the abstract domain, then we can prove that the same properties hold in the implementation domain by using implementation mapping.

5.5 Practical Verification

In this section, we demonstrate how to write implementation mapping functions in the FL language.
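The bypass selection behind the virtual register file mapping of Figure 5.33 can be sketched in Python. This is an illustrative sketch only: the function and names are ours, and the register 0 exclusion reflects the boundary condition noted among the design errors in Section 5.6.

```python
def operand_source(addr, prev_dests):
    """Decide where an operand value lives under the bypassing rules
    sketched above.

    addr        -- source register address of the current instruction
    prev_dests  -- destination register addresses of the previous two
                   instructions, most recent first
    Register 0 is never bypassed (forgotten in the original design,
    caught during verification).
    """
    if addr != 0:
        if addr == prev_dests[0]:
            return "EXEout"   # result still sits in the execute stage
        if addr == prev_dests[1]:
            return "MEMout"   # result still sits in the memory stage
    return "RegFile"          # no conflict: read the physical register file
```

Applied to the ADD 2,3,4 examples: with previous destinations 6 and 7 both operands come from the register file (Figure 5.33(a)); if the most recent destination is register 2, that operand is taken from EXEout instead (Figure 5.33(b)).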
We also show how to use STE to prove properties of the Oryx processor.

To illustrate implementation mappings, consider the following simple example of an output signal:

    let DCacheAddris cyc dv = DCacheAddr isv dv
                              from ((Phi2F (cyc+2))+LatchDelay)
                              to   ((Phi2F (cyc+4))+LatchDelay);

The DCacheAddris function takes two arguments, an abstract cycle number (cyc) and a data vector (dv). This function basically says that the DCache address output pads of the Oryx processor take on the value dv for a period of time. The abstract time is mapped to implementation time with the help of the Phi2F function. The DCacheAddris function also takes into account the set-up time and hold time of the data value.

A more sophisticated example of implementation mapping is given below:

    let VRegFile cyc addr dv instrlist =
        let BypassFromExe = BypassFromExe addr instrlist in
        let BypassFromMem = BypassFromMem addr instrlist in
        let NoBypass = (NOT BypassFromExe) AND (NOT BypassFromMem) in
        (if (BypassFromExe) (EXEoutis (cyc+1) dv)) @
        (if (BypassFromMem) (MEMoutis (cyc+1) dv)) @
        (if (NoBypass) (RegFileis (cyc+1) addr dv));

The VRegFile function takes four arguments: an abstract cycle number, a register address, a data vector, and an instruction list. The function compares the register address with the destination register addresses of the previous two instructions and determines the bypassing conditions. Given the bypassing conditions, the function then asserts or checks the appropriate values at the appropriate time.

At this point, we are ready to show how to verify a property of the Oryx processor. Recall that the instructions of the Oryx processor fall into four classes, of which the third class is the ALU and shift instructions.
Thus, we write a function called VerifyClass3Instr as shown below:

    let VerifyClass3Instr =
        let NewInstr = (hd NewInstrList) in
        let Ra = RaAddr NewInstr in
        let Rb = RbAddr NewInstr in
        let Rd = RdAddr NewInstr in
        let Ant =
            (GenOryxClock Clocks) @
            (Reset is F from (Cycle 0) to (Cycle Clocks)) @
            (Exception is F from (Cycle 0) to (Cycle Clocks)) @
            (Squash is F from (Cycle 0) to (Cycle Clocks)) @
            (AssignState 1 ControlList InstrList) @
            (if (GlobalConstraint) (
                (if (TakesTwoSources U V NewInstrList OldInstrList)
                    ((VRegFile (Target) Ra U OldInstrList) @
                     (VRegFile (Target) Rb V OldInstrList))) @
                (if (TakesOneSource U V NewInstrList OldInstrList)
                    (VRegFile (Target) Ra U OldInstrList)))) in
        let Cons =
            (if (GlobalConstraint) (
                (if (TakesTwoSources U V NewInstrList OldInstrList)
                    (EvaluateClass3InstrFirst Target Rd NewInstr OldInstrList U V)) @
                (if (TakesOneSource U V NewInstrList OldInstrList)
                    (EvaluateClass3InstrSecond Target Rd NewInstr OldInstrList U V)))) in
        (nverify Options Oryx VarList Ant Cons TraceList);

When we evaluate this function, it will check that the shifter and ALU perform correctly, that the control logic and bypass logic are correct, and that reading from and writing to the register file are done properly. The time it takes the Voss system to evaluate the function for various data path widths is given in Table 5.8.

    Data Path Width (Bits)   Verification Engine        Time (Minutes)
    4                        SPARC 10 with 64 MB RAM     5
    32                       RS6000 with 375 MB RAM     30

Table 5.8: Some experimental results

5.6 Design Errors

The following is a non-exhaustive list of errors which were uncovered during the verification of the Oryx processor. The purpose of this list is to show that the verification can detect various types of mistakes that a designer typically makes.

1. Pipeline bypasses when source register is 0. (forgotten boundary conditions)
2. Both Src1 and the ALU drive the result bus. (logic error)
3. Trap instructions take effect before reaching their WB stage. (logic error)
4.
Bypass logic uses Rdm2 and Rdm3 instead of Rdm1 and Rdm2. (typographical error)
5. Register file writes data at the wrong edge of the clock. (timing error)
6. No way to distinguish between an exception due to reset and an exception due to an NMInterrupt. (misunderstanding error)
7. Shifter uses some buffered control signals but the buffers do not exist. (VHDL compiler bug)

Chapter 6

Concluding Remarks and Future Work

In this thesis, we established that symbolic trajectory evaluation is an effective and practical technique for verifying even very complex digital systems. For demonstration purposes, we designed and implemented a 32-bit RISC processor which is very similar to many commercial RISC processors. We called it the Oryx processor.

Moreover, we showed how to describe the Oryx processor in an abstract domain. The motivation for this is to avoid over-specifying the architecture and thus constraining the implementation of the processor. In other words, our implementation of the Oryx processor can be replaced with another implementation at any point of time without having to change the specification. Only the circuit description and the implementation mapping would have to be revised. We also showed how to relate the abstract model of the Oryx processor to its pipelined implementation using implementation mappings. Finally, we used the Voss system to carry out the practical verification of the Oryx processor.

Although we have not completely verified the processor, the properties verified were sufficiently complex that we do not foresee any major difficulties in completing the verification. However, we estimate that completing the verification would require 3-6 additional man-months. Furthermore, the work has been a very good learning experience. We now have a much better understanding of the verification problem, which will be an asset towards any future work.
Like other related verification work, we realize the need to develop a general methodology for verifying this type of system. We have already begun the development of such a methodology, but more research is still needed. First, due to the complexity of the systems we are dealing with, it would be ideal if we could break down the major verification task into smaller tasks and then use result composition to establish the correctness of the system. Result composition can be used in the abstract domain or in the implementation domain. However, showing that composing a sequence of results in the abstract domain would give the same result as composing the corresponding sequence in the implementation domain is subject to future work. Second, finding the relation between the abstract domain and the implementation domain is a very tedious job. A better way is needed to analyze the mapping conditions to ensure correct mappings.

The application of symbolic trajectory evaluation to the verification of RISC processors is only one example of the type of system we can verify using this technique. The general framework we presented can be adapted to verify other systems as well. This work is by no means a complete solution to the verification problem. However, we think that it is a good start and a very practical way to establish the correctness of this type of system.

Bibliography

[1] Derek Beatty. A Methodology for Formal Hardware Verification with Application to Microprocessors. Ph.D. Thesis, CMU-CS-93-190, Department of Computer Science, Carnegie Mellon University, 1993.
[2] Graham Birtwistle and P.A. Subrahmanyam. VLSI Specification, Verification and Synthesis. Kluwer Academic Publishers, Boston, 1988.
[3] Randal Bryant. "Graph-Based Algorithms for Boolean Function Manipulation". IEEE Transactions on Computers, Vol. C-35, No. 8, August 1986, pp. 677-691.
[4] Jerry R. Burch and David L. Dill.
"Automatic Verification of Pipelined Microprocessor Control". CAV '94: Proceedings of the Sixth International Conference on Computer-Aided Verification. Lecture Notes in Computer Science 818, Springer-Verlag, June 1994, pp. 68-80.
[5] Paul Chow. The MIPS-X RISC Microprocessor. Kluwer Academic Publishers, Boston, 1989.
[6] Mohammad Darwish. Design of a 32-bit Pipelined RISC Processor. Technical Report in preparation.
[7] Scott Hazelhurst and Carl-Johan Seger. "Composing Symbolic Trajectory Evaluation Results". CAV '94: Proceedings of the Sixth International Conference on Computer-Aided Verification. Lecture Notes in Computer Science 818, Springer-Verlag, June 1994, pp. 273-285.
[8] J. L. Hennessy. "Designing a computer as a microprocessor: Experience and lessons from the MIPS 4000". A lecture at the Symposium on Integrated Systems, Seattle, Washington, 1993.
[9] J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers, San Mateo, 1990.
[10] Lee Howard. "The design of the SPARC processor". A lecture at the University of British Columbia, Vancouver, British Columbia, 1994.
[11] D. A. Patterson and J. L. Hennessy. Computer Organization and Design: The Hardware/Software Interface. Morgan Kaufmann Publishers, San Mateo, 1994.
[12] S. Mirapuri, M. Woodacre, and N. Vasseghi. "The MIPS R4000 Processor". IEEE Micro, April 1992, pp. 10-22.
[13] Carl-Johan Seger. Voss - A Formal Hardware Verification System: User's Guide. Technical Report UBC-CS-93-45, Department of Computer Science, University of British Columbia, Vancouver, British Columbia, 1993.
[14] Carl-Johan Seger. An Introduction to Formal Hardware Verification.
Technical Report UBC-CS-92-13, Department of Computer Science, University of British Columbia, Vancouver, British Columbia, 1992.
[15] Carl-Johan Seger and Randal Bryant. Formal Verification by Symbolic Evaluation of Partially-Ordered Trajectories. Technical Report UBC-CS-93-8, Department of Computer Science, University of British Columbia, Vancouver, British Columbia, 1993.
[16] Mandayam Srivas and Mark Bickford. "Formal Verification of a Pipelined Microprocessor". IEEE Software, September 1990, pp. 52-64.
[17] Daniel Tabak. RISC Systems. Research Studies Press Ltd., England, 1990.
[18] Sofiene Tahar and Ramayya Kumar. "Implementing a Methodology for Formally Verifying RISC Processors in HOL". HUG '93: Proceedings of the Sixth International Workshop on Higher Order Logic Theorem Proving and its Applications. Lecture Notes in Computer Science 780, Springer-Verlag, August 1993, pp. 281-294.

Appendix A

The FL Verification Script

// include some libraries
load "verification.fl";
load "arithm.fl";
load "HighLowEx.fl";
load "OryxLib.fl";

// define timing parameters
let ClockPeriod = 2000000;
let ScaleFactor = 20;
let CycleLength = ClockPeriod/ScaleFactor;
let LatchDelay = 100000/ScaleFactor;
let Setup = 100000/ScaleFactor;
let Hold = 10/ScaleFactor;

// define clock phases
let Phi1Roff = 10*CycleLength/100;
let Phi1Foff = 36*CycleLength/100;
let Phi2Roff = 60*CycleLength/100;
let Phi2Foff = 85*CycleLength/100;
let Phi1F i = (Cycle (i-1))+Phi1Foff;
let Phi1R i = (Cycle (i-1))+Phi1Roff;
let Phi2F i = (Cycle (i-1))+Phi2Foff;
let Phi2R i = (Cycle (i-1))+Phi2Roff;
let Cycle k = k*CycleLength;

// define implementation parameters
let DataWidth = 32;
let InstrWidth = 32;
let DataAddrWidth = 32;
let InstrAddrWidth = 32;
let
FPRegAddrWidth = 5;
let RxAddrWidth = 5;
let RegFileSize = 32;
let CompFuncWidth = 12;
let OpcodeWidth = 5;

// short-hands for clock signals
let Phi1 = "Phi1";
let Phi2 = "Phi2";

// define two-phase non-overlapping clock signals
letrec GenPhi1 n = (n=0) => (UNC for 0) |
    (((((Phi1 is F for Phi1Roff) then
        (Phi1 is T for (Phi1Foff - Phi1Roff))) then
        (Phi1 is F for (Phi2Roff - Phi1Foff))) then
        (Phi1 is F for (Phi2Foff - Phi2Roff))) then
        (Phi1 is F for (CycleLength - Phi2Foff))) then
    (GenPhi1 (n-1));

letrec GenPhi2 n = (n=0) => (UNC for 0) |
    (((((Phi2 is F for Phi1Roff) then
        (Phi2 is F for (Phi1Foff - Phi1Roff))) then
        (Phi2 is F for (Phi2Roff - Phi1Foff))) then
        (Phi2 is T for (Phi2Foff - Phi2Roff))) then
        (Phi2 is F for (CycleLength - Phi2Foff))) then
    (GenPhi2 (n-1));

let GenOryxClock n = (GenPhi1 n) @ (GenPhi2 n);

// define histories of length (n)
letrec GenControl n l = (n=0) => l |
    let new = [F,F,F,F,T,T] in
    let nl = new:l in
    GenControl (n-1) nl;

letrec GenData n l = (n=0) => l |
    let new = (variable_vector ("DCacheIn."^(int2str (n-1))^".") DataWidth) in
    let nl = new:l in
    GenData (n-1) nl;

letrec GenInstr n l = (n=0) => l |
    let new = (variable_vector ("ICache."^(int2str (n-1))^".") InstrWidth) in
    let nl = new:l in
    GenInstr (n-1) nl;

// short-hands for some signals
let Reset = "Reset";
let Exception = "ExceptionSignal";
let Squash = "SquashSignal";
let IPFault = "IPFault";
let DPFault = "DPFault";
let NMInterrupt = "NMInterrupt";
let Interrupt = "Interrupt";
let DAck = "DAck";
let IAck = "IAck";
let DCacheIn = nvector "DCacheIn" DataWidth;
let ICache = nvector "ICache" InstrWidth;
let BypassCache = "BypassCache";
let MemCycle = "MemCycle";
let CopCycle = "CopCycle";
let ReadWriteb = "ReadWriteb";
let IFetch = "IFetch";
let FPReg = nvector "FPReg" FPRegAddrWidth;
let DCacheAddr = nvector "DCacheAddr" DataAddrWidth;
let DCacheOut = nvector "DCacheOut" DataWidth;
let ICacheAddr = nvector "ICacheAddr" InstrAddrWidth;

let EXEout =
    letrec row x = (x=0) => [] |
        (["ExecutionStage/DataRegisterCell0/DffrsCell0_"^(int2str (x-1))^"/m4"]) @ (row (x-1)) in
    row DataWidth;

let TEMPout =
    letrec row x = (x=0) => [] |
        (["MemoryStage/DataRegisterCell0/DffrsCell0_"^(int2str (x-1))^"/m4"]) @ (row (x-1)) in
    row DataWidth;

let LMDRout =
    letrec row x = (x=0) => []
        |
        (["MemoryStage/DataRegisterCell1/DffrsCell0_"^(int2str (x-1))^"/m4"]) @ (row (x-1)) in
    row DataWidth;

let RegFileNodes =
    letrec row x y = (x=0) => [] |
        (["RegisterFileStage/PLDataRegisterCell"^(int2str y)^"/DffrsCell0_"^(int2str (x-1))^"/m4"]) @ (row (x-1) y) in
    letrec column y = (y=0) => [] |
        ([row DataWidth y]) @ (column (y-1)) in
    column (RegFileSize-1);

// define opcodes
let LD_OPCODE = [T,F,F,F,F];
let ST_OPCODE = [T,F,F,T,F];
let LDF_OPCODE = [T,F,T,F,F];
let STF_OPCODE = [T,F,T,T,F];
let LDT_OPCODE = [T,F,F,F,T];
let STT_OPCODE = [T,F,F,T,T];
let MOVFRC_OPCODE = [T,F,T,F,T];
let MOVTOC_OPCODE = [T,F,T,T,T];
let ALUC_OPCODE = [T,F,T,T,T];
let BEQ_OPCODE = [F,F,F,F,T];
let BGE_OPCODE = [F,F,T,T,T];
let BLT_OPCODE = [F,F,F,T,T];
let BNE_OPCODE = [F,F,T,F,T];
let ADD_OPCODE = [F,T,T,F,F];
let DSTEP_OPCODE = [F,T,F,F,F];
let MSTART_OPCODE = [F,T,F,F,F];
let MSTEP_OPCODE = [F,T,F,F,F];
let SUB_OPCODE = [F,T,T,F,F];
let SUBNC_OPCODE = [F,T,T,F,F];
let AND_OPCODE = [F,T,T,F,F];
let BIC_OPCODE = [F,T,T,F,F];
let NOT_OPCODE = [F,T,T,F,F];
let OR_OPCODE = [F,T,T,F,F];
let XOR_OPCODE = [F,T,T,F,F];
let MOV_OPCODE = [F,T,T,F,F];
let SLL_OPCODE = [F,T,F,F,T];
let SRL_OPCODE = [F,T,F,F,T];
let SRA_OPCODE = [F,T,F,F,T];
let NOP_OPCODE = [F,T,T,F,F];
let ADDI_OPCODE = [T,T,T,F,F];
let J_OPCODE = [T,T,F,F,F];
let IRET_OPCODE = [T,T,T,T,T];
let MOVFRS_OPCODE = [T,T,F,T,T];
let MOVTOS_OPCODE = [T,T,F,T,F];
let TRAP_OPCODE = [T,T,T,T,F];

let ADD_COMPFUNC = [F,F,F,F,F,F,F,T,F,F,F,F];
let DSTEP_COMPFUNC = [F,F,F,T,F,T,F,T,F,F,F,F];
let MSTART_COMPFUNC =
[F,F,F,F,T,T,F,T,F,F,F,F];
let MSTEP_COMPFUNC = [F,F,F,F,T,F,F,T,F,F,F,F];
let SUB_COMPFUNC = [F,F,F,F,F,T,T,T,F,F,F,F];
let SUBNC_COMPFUNC = [F,F,F,F,F,F,T,T,F,F,F,F];
let AND_COMPFUNC = [F,F,F,F,F,F,F,F,F,T,F,F];
let BIC_COMPFUNC = [F,F,F,F,F,F,T,F,F,T,F,F];
let NOT_COMPFUNC = [F,F,F,F,F,F,F,F,F,F,F,T];
let OR_COMPFUNC = [F,F,F,F,F,F,F,F,T,F,F,F];
let XOR_COMPFUNC = [F,F,F,F,F,F,F,F,F,F,T,F];
let MOV_COMPFUNC = [F,F,F,F,F,F,F,T,F,F,F,F];
let SLL_COMPFUNC = [F,F,T,F,F,F,F,F,F,F,F,F];
let SRL_COMPFUNC = [F,F,F,T,F,F,F,F,F,F,F,F];
let SRA_COMPFUNC = [F,F,F,F,T,F,F,F,F,F,F,F];
let NOP_COMPFUNC = [F,F,F,F,F,F,F,T,F,F,F,F];

// opcode tables
let OpcodesTable1 = [LD_OPCODE, ST_OPCODE, LDF_OPCODE, STF_OPCODE,
                     LDT_OPCODE, STT_OPCODE, MOVFRC_OPCODE, MOVTOC_OPCODE,
                     ALUC_OPCODE, BEQ_OPCODE, BGE_OPCODE, BLT_OPCODE,
                     BNE_OPCODE, ADDI_OPCODE, J_OPCODE, IRET_OPCODE,
                     MOVFRS_OPCODE, MOVTOS_OPCODE, TRAP_OPCODE];

let OpcodesTable2 = [(ADD_OPCODE, ADD_COMPFUNC),
                     (DSTEP_OPCODE, DSTEP_COMPFUNC),
                     (MSTART_OPCODE, MSTART_COMPFUNC),
                     (MSTEP_OPCODE, MSTEP_COMPFUNC),
                     (SUB_OPCODE, SUB_COMPFUNC),
                     (SUBNC_OPCODE, SUBNC_COMPFUNC),
                     (AND_OPCODE, AND_COMPFUNC),
                     (BIC_OPCODE, BIC_COMPFUNC),
                     (NOT_OPCODE, NOT_COMPFUNC),
                     (OR_OPCODE, OR_COMPFUNC),
                     (XOR_OPCODE, XOR_COMPFUNC),
                     (MOV_OPCODE, MOV_COMPFUNC),
                     (SLL_OPCODE, SLL_COMPFUNC),
                     (SRL_OPCODE, SRL_COMPFUNC),
                     (SRA_OPCODE, SRA_COMPFUNC),
                     (NOP_OPCODE, NOP_COMPFUNC)];
// these instructions generate results
let OpGeneratesTable1 = [LD_OPCODE, LDT_OPCODE, MOVFRC_OPCODE,
                         ADDI_OPCODE, MOVFRS_OPCODE];

let OpGeneratesTable2 = [(ADD_OPCODE,   ADD_COMPFUNC),
                         (SUB_OPCODE,   SUB_COMPFUNC),
                         (SUBNC_OPCODE, SUBNC_COMPFUNC),
                         (AND_OPCODE,   AND_COMPFUNC),
                         (BIC_OPCODE,   BIC_COMPFUNC),
                         (NOT_OPCODE,   NOT_COMPFUNC),
                         (OR_OPCODE,    OR_COMPFUNC),
                         (XOR_OPCODE,   XOR_COMPFUNC),
                         (MOV_OPCODE,   MOV_COMPFUNC),
                         (SLL_OPCODE,   SLL_COMPFUNC),
                         (SRL_OPCODE,   SRL_COMPFUNC),
                         (SRA_OPCODE,   SRA_COMPFUNC)];

// instructions to be verified
let SelectedInstrTable2 = [(SRA_OPCODE,   SRA_COMPFUNC),
                           (SRL_OPCODE,   SRL_COMPFUNC),
                           (SLL_OPCODE,   SLL_COMPFUNC),
                           (BIC_OPCODE,   BIC_COMPFUNC),
                           (NOT_OPCODE,   NOT_COMPFUNC),
                           (XOR_OPCODE,   XOR_COMPFUNC),
                           (OR_OPCODE,    OR_COMPFUNC),
                           (AND_OPCODE,   AND_COMPFUNC),
                           (SUBNC_OPCODE, SUBNC_COMPFUNC),
                           (SUB_OPCODE,   SUB_COMPFUNC),
                           (ADD_OPCODE,   ADD_COMPFUNC)];

// instructions which take two source operands
let Needs2SrcTable = [(BIC_OPCODE,   BIC_COMPFUNC),
                      (XOR_OPCODE,   XOR_COMPFUNC),
                      (OR_OPCODE,    OR_COMPFUNC),
                      (AND_OPCODE,   AND_COMPFUNC),
                      (SUBNC_OPCODE, SUBNC_COMPFUNC),
                      (SUB_OPCODE,   SUB_COMPFUNC),
                      (ADD_OPCODE,   ADD_COMPFUNC)];
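Several ALU instructions share the same 5-bit opcode and are distinguished only by their 12-bit COMPFUNC vector, so classifying an instruction means folding a match test over an (opcode, compfunc) table, as the `itlist Comb2 ... F` folds later in the script do. The following Python sketch (not part of the FL script; names and the two-entry table are illustrative) mirrors that fold over Boolean vectors:

```python
# Sketch of the opcode-table classification used by the FL script.
# T/F bit vectors are modeled as Python bool lists; the table is a
# small illustrative subset of OpcodesTable2.
T, F = True, False

ADD_OPCODE   = [F, T, T, F, F]
SUB_OPCODE   = [F, T, T, F, F]          # same opcode as ADD ...
ADD_COMPFUNC = [F,F,F,F,F,F,F,T,F,F,F,F]
SUB_COMPFUNC = [F,F,F,F,F,T,T,T,F,F,F,F]  # ... different COMPFUNC

OPCODES_TABLE2 = [(ADD_OPCODE, ADD_COMPFUNC),
                  (SUB_OPCODE, SUB_COMPFUNC)]

def matches_table2(opcode, compfunc, table):
    # Fold over the table, OR-ing per-entry matches -- the same shape
    # as FL's `itlist Comb2 OpcodesTable2 F`.
    result = False
    for op, fn in table:
        result = (opcode == op and compfunc == fn) or result
    return result

assert matches_table2(ADD_OPCODE, ADD_COMPFUNC, OPCODES_TABLE2)
assert not matches_table2(ADD_OPCODE, [F] * 12, OPCODES_TABLE2)
```

In the actual script the opcode and compfunc fields are symbolic Boolean expressions rather than concrete values, so the equality tests build BDDs instead of returning concrete booleans, but the fold structure is the same.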
// some implementation mapping functions
let Squashis v cyc = (Squash is v)
                     from ((Phi1F cyc)+LatchDelay)
                     to   (Cycle cyc);

let ICacheis cyc i src =
    (((Opcode ICache)   isv (Opcode i)) @
     ((RdAddr ICache)   isv (RdAddr i)) @
     (((RaAddr ICache)  isv (RaAddr i)) when src) @
     (((RbAddr ICache)  isv (RbAddr i)) when src) @
     ((CompFunc ICache) isv (CompFunc i)))
    from ((Phi1F cyc)+LatchDelay)
    to   (Cycle cyc);

let ICacheAddris cyc a = ICacheAddr isv a
                         from ((Phi1F cyc)+LatchDelay)
                         to   ((Phi2F cyc)+LatchDelay);

let DCacheAddris cyc a = DCacheAddr isv a
                         from ((Phi2F (cyc+2))+LatchDelay)
                         to   ((Phi2F (cyc+4))+LatchDelay);

let DCacheOutis cyc dv = DCacheOut isv dv
                         from ((Phi2F (cyc+2))+LatchDelay)
                         to   ((Phi2F (cyc+4))+LatchDelay);

let DCacheInis cyc dv = DCacheIn isv dv
                        from ((Phi2F cyc)+LatchDelay)
                        to   ((Phi2F cyc)+LatchDelay);

let IAckis cyc v = IAck is v
                   from (Cycle (cyc-1))
                   to   ((Phi2F cyc)+LatchDelay);

let DAckis cyc v = DAck is v
                   from (Cycle (cyc-1))
                   to   ((Phi2F cyc)+LatchDelay);

let IPFaultis cyc v = IPFault is v
                      from (Phi1F cyc)
                      to   ((Phi2F cyc)+LatchDelay);

let DPFaultis cyc v = DPFault is v
                      from ((Phi2R cyc)+LatchDelay)
                      to   ((Phi2F cyc)+LatchDelay);

let NMInterruptis cyc v = NMInterrupt is v
                          from ((Phi2R cyc)+LatchDelay)
                          to   ((Phi2F cyc)+LatchDelay);

let Interruptis cyc v = Interrupt is v
                        from ((Phi2R cyc)+LatchDelay)
                        to   ((Phi2F cyc)+LatchDelay);

let Resetis cyc v = Reset is v
                    from (Cycle (cyc-1))
                    to   (Cycle cyc);

let BypassCacheis cyc v = BypassCache is v
                          from ((Phi2F (cyc+2))+LatchDelay)
                          to   ((Phi2F (cyc+4))+LatchDelay);

let MemCycleis cyc v = MemCycle is v
                       from ((Phi2F (cyc+2))+LatchDelay)
                       to   ((Phi2F (cyc+4))+LatchDelay);

let CopCycleis cyc v = CopCycle is v
                       from ((Phi2F (cyc+2))+LatchDelay)
                       to   ((Phi2F (cyc+4))+LatchDelay);

let ReadWritebis cyc v = ReadWriteb is v
                         from ((Phi2F (cyc+2))+LatchDelay)
                         to   ((Phi2F (cyc+4))+LatchDelay);

let FPRegis cyc dv = FPReg isv dv
                     from ((Phi2F (cyc+2))+LatchDelay)
                     to   ((Phi2F (cyc+4))+LatchDelay);

let IFetchis cyc v = IFetch is v
                     from ((Phi1R cyc)+LatchDelay)
                     to   ((Phi2F cyc)+LatchDelay);

let EXEoutis cyc dv = EXEout isv dv
                      from ((Phi2F cyc)+LatchDelay)
                      to   (Cycle cyc);

let MEMoutis cyc dv = (TEMPout isv dv
                       from ((Phi2F cyc)+LatchDelay)
                       to   (Cycle cyc)) @
                      (LMDRout isv dv
                       from ((Phi2F cyc)+LatchDelay)
                       to   (Cycle cyc));

let RegFileis cyc addr dv =
    let rRegFileNodes = rev RegFileNodes in
    let sRegFileis i dv =
        (el i rRegFileNodes) isv dv
        from ((Phi2R cyc)-Setup)
        to   ((Phi2R cyc)+LatchDelay) in
    letrec iterate i =
        i = 0 => UNC |
        let iv = num2bv RxAddrWidth i in
        (if (iv = addr) (sRegFileis i dv)) @ (iterate (i-1)) in
    iterate (RegFileSize-1);

// check if an instruction generates a destination
let GeneratesDestination instr =
    let Operation = Opcode instr in
    let Function = CompFunc instr in
    let Comb1 opcode res = (Operation = opcode) OR res in
    let Comb2 (opcode,fncd) res =
        ((Operation = opcode) AND (Function = fncd)) OR res in
    (itlist Comb1 OpGeneratesTable1 F) OR (itlist Comb2 OpGeneratesTable2 F);

// check if an instruction is allowed in the instruction sequence
let SelectedInstr instr =
    let operation = Opcode instr in
    let function = CompFunc instr in
    let comb (opcode,fncd) res =
        ((operation = opcode) AND (function = fncd)) OR res in
    (itlist comb SelectedInstrTable2 F);

// check if an instruction takes two source operands
let Needs2Src instr =
    let operation = Opcode instr in
    let function = CompFunc instr in
    let comb (opcode,fncd) res =
        ((operation = opcode) AND (function = fncd)) OR res in
    (itlist comb Needs2SrcTable F);

// check if an instruction is valid
let ValidInstr instr =
    let Operation = Opcode instr in
    let Function = CompFunc instr in
    let Comb1 opcode res = (Operation = opcode) OR res in
    let Comb2 (opcode,fncd) res =
        ((Operation = opcode) AND (Function = fncd)) OR res in
    ((itlist Comb1 OpcodesTable1 F) OR
     (itlist Comb2 OpcodesTable2 F)) AND (SelectedInstr instr);

letrec ValidInstrList l = l = [] => T |
    ValidInstr (hd l) AND ValidInstrList (tl l);

// determine bypass conditions
let BypassFromExe src instrlist =
    let PrevInstr = hd instrlist in
    let Generate = GeneratesDestination PrevInstr in
    Generate AND (src = (RdAddr PrevInstr)) AND (src != ZeroAddr);

let BypassFromMem src instrlist =
    let PrevInstr = hd instrlist in
    let PrevPrevInstr = hd (tl instrlist) in
    let Generate = GeneratesDestination PrevPrevInstr in
    NOT (BypassFromExe src instrlist) AND
    Generate AND (src = (RdAddr PrevPrevInstr)) AND (src != ZeroAddr);

// virtual register file implementation mapping function
let VRegFile cyc addr dv instrlist =
    let BypassFromExe = BypassFromExe addr instrlist in
    let BypassFromMem = BypassFromMem addr instrlist in
    let NoBypass = (NOT BypassFromExe) AND (NOT BypassFromMem) in
    (if (BypassFromExe) (EXEoutis (cyc+1) dv)) @
    (if (BypassFromMem) (MEMoutis (cyc+1) dv)) @
    (if (NoBypass) (RegFileis (cyc+1) addr dv));

// load the circuit
let Oryx = load_exe "../../bin/Oryx.exe";

// length of the instruction sequence
let Clocks = 5;

// which instruction in the sequence we will check for
let Target = 3;

// generate histories
let ControlList = (GenControl Clocks []);
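The two bypass predicates above encode standard pipeline forwarding: a source register is read from the EXE stage output if the immediately preceding instruction writes it, from the MEM stage output if the instruction two back writes it and no EXE bypass applies, and from the register file otherwise; register zero is never bypassed. A concrete Python sketch of that decision (illustrative dictionary-based instruction records, not the script's symbolic BDD logic; `instrs` is most-recent-first like OldInstrList):

```python
# Concrete sketch of the BypassFromExe/BypassFromMem priority scheme.
ZERO_ADDR = 0  # register r0 is hard-wired to zero and never bypassed

def bypass_from_exe(src, instrs, generates):
    # Forward from EXE if the newest completed instruction writes src.
    prev = instrs[0]
    return generates(prev) and src == prev["rd"] and src != ZERO_ADDR

def bypass_from_mem(src, instrs, generates):
    # Forward from MEM only when no EXE bypass applies and the
    # instruction two back writes src.
    prev_prev = instrs[1]
    return (not bypass_from_exe(src, instrs, generates)
            and generates(prev_prev)
            and src == prev_prev["rd"]
            and src != ZERO_ADDR)

gen = lambda i: i["writes"]
hist = [{"rd": 3, "writes": True}, {"rd": 5, "writes": True}]
assert bypass_from_exe(3, hist, gen)      # newest instruction wrote r3
assert bypass_from_mem(5, hist, gen)      # older instruction wrote r5
assert not bypass_from_mem(3, hist, gen)  # EXE bypass takes priority
```

The `NOT (BypassFromExe ...)` conjunct is what gives EXE forwarding priority over MEM forwarding when both earlier instructions write the same register.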
let DataList = (GenData Clocks []);
let InstrList = (GenInstr Clocks []);

// assign values to input signals using generated histories
let CurrentState cyc c d src omit =
    (NMInterruptis cyc (el 1 c)) @
    (Interruptis cyc (el 2 c)) @
    (IPFaultis cyc (el 3 c)) @
    (DPFaultis cyc (el 4 c)) @
    (IAckis cyc (el 5 c)) @
    (DAckis cyc (el 6 c)) @
    (omit => UNC | (if (ValidInstr d) (ICacheis cyc d src)));

letrec AssignState cyc cl il = cyc > Clocks => UNC |
    (CurrentState cyc (hd cl) (hd il) (cyc=Target) (cyc>Target)) @
    (AssignState (cyc+1) (tl cl) (tl il));

// define data values as Boolean vectors
let U = variable_vector "U." DataWidth;
let V = variable_vector "V." DataWidth;

// specify global constraint here
let GlobalConstraint = ValidInstrList (prefix Target InstrList);

// evaluate the result of an instruction
let EvaluateClass3InstrFirst Target Rd NewInstr OldInstrList U V =
    ((if (InstrIsADD NewInstr)
        (VRegFile (Target+1) Rd (U add V) (NewInstr:OldInstrList))) @
     (if (InstrIsSUB NewInstr)
        (VRegFile (Target+1) Rd (U subtract V) (NewInstr:OldInstrList))) @
     (if (InstrIsSUBNC NewInstr)
        (VRegFile (Target+1) Rd (U subtractnc V) (NewInstr:OldInstrList))) @
     (if (InstrIsAND NewInstr)
        (VRegFile (Target+1) Rd (U bvAND V) (NewInstr:OldInstrList))) @
     (if (InstrIsOR NewInstr)
        (VRegFile (Target+1) Rd (U bvOR V) (NewInstr:OldInstrList))) @
     (if (InstrIsXOR NewInstr)
        (VRegFile (Target+1) Rd (U bvXOR V) (NewInstr:OldInstrList))) @
     (if (InstrIsNOT NewInstr)
        (VRegFile (Target+1) Rd (bvNOT U) (NewInstr:OldInstrList))) @
     (if (InstrIsBIC NewInstr)
        (VRegFile (Target+1) Rd (U bvBIC V) (NewInstr:OldInstrList))));

let EvaluateClass3InstrSecond Target Rd NewInstr OldInstrList U V =
    (if (InstrIsSLL NewInstr)
        (VRegFile (Target+1) Rd (sll U) (NewInstr:OldInstrList))) @
    (if (InstrIsSRL NewInstr)
        (VRegFile (Target+1) Rd (srl U) (NewInstr:OldInstrList))) @
    (if (InstrIsSRA NewInstr)
        (VRegFile (Target+1) Rd (sra U) (NewInstr:OldInstrList)));

// split the instruction sequence into two parts:
// the instructions that the processor has executed so far
// and the instructions that the processor will execute in the future
let OldInstrList = (rev (prefix (Target-1) InstrList));
let NewInstrList = (suffix (Clocks-Target+1) InstrList);

// check how many source operands the instruction takes
let TakesTwoSources U V NewInstrList OldInstrList =
    let NewInstr = (hd NewInstrList) in
    let PrevInstr = (hd OldInstrList) in
    let PrevPrevInstr = (hd (tl OldInstrList)) in
    let Ra = (RaAddr NewInstr) in
    let Rb = (RbAddr NewInstr) in
    (Needs2Src NewInstr) AND
    ((Ra != Rb) OR (U = V)) AND
    ((Ra != ZeroAddr) OR (U = ZeroData)) AND
    ((Rb != ZeroAddr) OR (V = ZeroData)) AND
    (NOT(InstrIsSLL PrevInstr) OR ((last U) = F)) AND
    (NOT(InstrIsSLL PrevPrevInstr) OR ((last U) = F)) AND
    (NOT(InstrIsSLL PrevInstr) OR ((last V) = F)) AND
    (NOT(InstrIsSLL PrevPrevInstr) OR ((last V) = F)) AND
    (NOT(InstrIsSRL PrevInstr) OR ((hd U) = F)) AND
    (NOT(InstrIsSRL PrevPrevInstr) OR ((hd U) = F)) AND
    (NOT(InstrIsSRL PrevInstr) OR ((hd V) = F)) AND
    (NOT(InstrIsSRL PrevPrevInstr) OR ((hd V) = F)) AND
    (NOT(InstrIsSRA PrevInstr) OR ((hd U) = (hd (tl U)))) AND
    (NOT(InstrIsSRA PrevPrevInstr) OR ((hd U) = (hd (tl U)))) AND
    (NOT(InstrIsSRA PrevInstr) OR ((hd V) = (hd (tl V)))) AND
    (NOT(InstrIsSRA PrevPrevInstr) OR ((hd V) = (hd (tl V))));

let TakesOneSource U V NewInstrList OldInstrList =
    let NewInstr = (hd NewInstrList) in
    let PrevInstr = (hd OldInstrList) in
    let PrevPrevInstr = (hd (tl OldInstrList)) in
    let Ra = (RaAddr NewInstr) in
    let Rb = (RbAddr NewInstr) in
    (NOT(Needs2Src NewInstr)) AND
    ((Ra != ZeroAddr) OR (U = ZeroData)) AND
    (NOT(InstrIsSLL PrevInstr) OR ((last U) = F)) AND
    (NOT(InstrIsSLL PrevPrevInstr) OR ((last U) = F)) AND
    (NOT(InstrIsSRL PrevInstr) OR ((hd U) = F)) AND
    (NOT(InstrIsSRL PrevPrevInstr) OR ((hd U) = F)) AND
    (NOT(InstrIsSRA PrevInstr) OR ((hd U) = (hd (tl U)))) AND
    (NOT(InstrIsSRA PrevPrevInstr) OR ((hd U) = (hd (tl U))));

// verify ALU and shift instructions
let VerifyClass3Instr =
    let NewInstr = (hd NewInstrList) in
    let Ra = RaAddr NewInstr in
    let Rb = RbAddr NewInstr in
    let Rd = RdAddr NewInstr in
    let Ant =
        (GenOryxClock Clocks) @
        (Reset is F from (Cycle 0) to (Cycle Clocks)) @
        (Exception is F from (Cycle 0) to (Cycle Clocks)) @
        (Squash is F from (Cycle 0) to (Cycle Clocks)) @
        (AssignState 1 ControlList InstrList) @
        (if (GlobalConstraint)
            ((if (TakesTwoSources U V NewInstrList OldInstrList)
                ((VRegFile Target Ra U OldInstrList) @
                 (VRegFile Target Rb V OldInstrList))) @
             (if (TakesOneSource U V NewInstrList OldInstrList)
                (VRegFile Target Ra U OldInstrList)))) in
    let Cons =
        (if (GlobalConstraint)
            ((if (TakesTwoSources U V NewInstrList OldInstrList)
                (EvaluateClass3InstrFirst Target Rd NewInstr OldInstrList U V)) @
             (if (TakesOneSource U V NewInstrList OldInstrList)
                (EvaluateClass3InstrSecond Target Rd NewInstr OldInstrList U V)))) in
    (nverify Options Oryx VarList Ant Cons TraceList);

// find verification time
time(VerifyClass3Instr);
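Each consequent above asserts that the destination register holds a specific function of the symbolic operand vectors, such as U add V, U bvAND V, or sll U. A Python sketch of what a few such bit-vector operators mean on concrete values (bool lists, most significant bit first; operator names here are illustrative stand-ins for the FL script's add, bvAND, and sll):

```python
# Bool-list bit-vector operations, MSB first, sketching the semantics
# assumed by the verification consequents.
def bv2int(bv):
    return sum(int(b) << i for i, b in enumerate(reversed(bv)))

def int2bv(n, width):
    return [bool((n >> i) & 1) for i in reversed(range(width))]

def bv_add(u, v):
    # U add V: addition modulo 2^width (carry out of the MSB is dropped)
    return int2bv(bv2int(u) + bv2int(v), len(u))

def bv_and(u, v):
    # U bvAND V: bitwise conjunction
    return [a and b for a, b in zip(u, v)]

def bv_sll(u):
    # sll U: logical shift left by one, zero fill on the right
    return u[1:] + [False]

def bv_sra(u):
    # sra U: arithmetic shift right by one, sign bit replicated
    return [u[0]] + u[:-1]

u, v = int2bv(6, 8), int2bv(3, 8)
assert bv2int(bv_add(u, v)) == 9
assert bv2int(bv_and(u, v)) == 2
assert bv2int(bv_sll(u)) == 12
assert bv2int(bv_sra(u)) == 3
```

In the symbolic trajectory evaluation itself, U and V are vectors of BDD variables (built by variable_vector above), so these operators are evaluated symbolically once and cover all 2^64 concrete operand pairs in a single verification run.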