TECHNOLOGY MAPPING AND LAYOUT SYNTHESIS OF DCVSByCarly WongB. A. Sc. (Electrical Engineering) University of British Columbia, 1990A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFMASTER OF APPLIED SCIENCEinTHE FACULTY OF GRADUATE STUDIESELECTRICAL ENGINEERINGWe accept this thesis as conformingto the required standardTHE UNIVERSITY OF BRITISH COLUMBIAApril 1992© Carly Wong, 1992In presenting this thesis in partial fulfilment of the requirements for an advanced degree at theUniversity of British Columbia, I agree that the Library shall make it freely available for refer-ence and study. I further agree that permission for extensive copying of this thesis for scholarlypurposes may be granted by the head of my department or by his or her representatives. Itis understood that copying or publication of this thesis for financial gain shall not be allowedwithout my written permission.Electrical EngineeringThe University of British Columbia2356 Main Mall Vancouver, CanadaV6T 1Z4Date:4pra z S. Igq?...AbstractDifferential Cascode Voltage Switch (DCVS) logic is a dynamic logic family that has a numberof desirable properties. In particular, it is hazard-free, easy to make fully robust path delay-fault testable, and has a number of unique timing properties that make it very suitable forself-timed circuits.This thesis investigates the problem of implementing logic with DCVS, and in particular, theautomatic synthesis of DCVS circuits. This task is challenging because DCVS gates are usuallysignificantly more complex than standard static CMOS both in terms of internal connectivityand number of inputs. Most of the work done in circuit synthesis, such as technology mappingand layout synthesis, are not directly applicable to DCVS.We present DMAP, a technology mapping and layout synthesis system for DCVS. The sys-tem operates in three steps. First, the combinational circuit is decomposed and partitioned intosizable Boolean function clusters. Secondly, DMAP takes each cluster function, and generates aDCVS cell layout that implements that function. We develop a heuristic algorithm for finding asuitable transistor path to lay out the DCVS pull-down network. Finally, the generated DCVScells are placed and routed inside the CADENCE environment.Experimental results indicate that DCVS circuits can be implemented with considerablyfewer cells than by conventional mapping techniques. Furthermore, the number of nets thatneed to be wired is typically less than twice the number of nets used in standard mapping. Thisis better than the intuitive assumption that dual-rail circuits require twice as many wires astheir single-rail counterparts. Large DCVS circuits can be feasibly produced. DMAP is a firststep towards an integrated system for dynamic circuit synthesis.iiTable of ContentsAbstractList of Figures^ viList of Tables^ viiiAcknowledgement^ ix1 Introduction^1.1^Outline of Thesis ^^1.2^Basic Definitions and Notations ^1121.2.1 Integrated Circuit Theory 21.2.2 Some Computer-Aided Design (CAD) Terms ^ 41.2.3 Graph Theory ^ 52 Discussion of Problem 72.1 Technology Mapping ^ 112.1.1 Network Representation ^ 112.1.2 DAG Matching ^ 112.1.3 DAG Matching Applied to Technology Mapping ^ 122.2 Cell Layout Synthesis ^ 152.2.1 Static CMOS Cell Layout Synthesis ^ 162.2.2 Transistor Placement for Static CMOS 16iii2.3 Differential Cascode Voltage Switch Logic ^ 202.3.1^Other Dynamic CMOS Technologies 202.3.2^DCVS Operation ^ 232.3.3^DCVS Properties 242.3.4^DCVS Applications ^ 272.4 Technology Mapping and Layout Synthesis for DCVS ^ 292.4.1^The Problem ^ 302.4.2^Previous Work in DCVS Synthesis ^ 313 DMAP: A DCVS Technology Mapping and Layout Synthesis System 353.1 System Overview ^ 363.2 First Step: Clustering 373.2.1^Circuit Clustering in DMAP ^ 383.2.2^Lawler's Algorithm ^ 383.2.3^Modified Lawler's Algorithm ^ 393.3 Second Step: Cell Generation 433.3.1^Obtaining the DCVS Logic Tree ^ 453.3.2^Obtaining the Transistor Layout Path 483.3.3^Obtaining the Track Connections ^ 513.3.4^Layout Issues ^ 523.4 Third Step: Connecting Cells Together ^ 554 Experimental Results 564.1 Evaluating DMAP's Stand-Alone Performance ^ 564.2 Comparing DMAP with Conventional Mapping 57iv4.3 After Placement and Routing ^ 585 Conclusions and Future Work^ 685.1 Main Contributions ^ 685.2 Future Directions 69Bibliography^ 71A Algorithm for Finding an Euler Path^ 76B CADENCE Specific Notes^ 78B.1 Fixing Cells ^ 78B.2 Making the Chip 79vList of Figures1.1 CMOS transistor symbols^ 31.2 Static and dynamic CMOS. 32.1 Levels of representation during digital design automation. ^ 82.2 Directed Acylic Graphs ^ 122.3 Non-isomorphic DAG representations for the function f = abcd. ^ 132.4 Illustration of tree matching in technology mapping. ^ 142.5 Finding the Euler path and transistor placement 172.6 Adding pseudo-edge to find an Euler path. ^ 192.7 Differential Cascode Voltage Switch 212.8 Dynamic DOMINO design. ^ 222.9 Dynamic NORA design. 222.10 3-Input DCVS AND/NAND gate. ^ 242.11 Generic asynchronous pipeline. 282.12 DCVS completion signal generation. ^ 292.13 Six different DCVS trees for the function (f = AB + C). ^ 332.14 Illustration of local optimization in ACORN. ^ 343.1 DMAP system flow diagram.^ 353.2 Illustration of Lawler's clustering algorithm (M = 3). ^ 403.3 Illustration of DMAP clustering. ^ 42vi3.4 Layout of a generated DCVS 3-input XOR cell. ^ 443.5 Unordered and ordered BDDs. ^ 463.6 Converting a BDD to a DCVS pull-down tree. ^ 473.7 Example of BDDs for the same function using different variable orderings^483.8 Illustration of DMAP's transistor path finding algorithm. ^ 503.9 Illustration of Left-Edge-Algorithm. ^ 533.10 Some generated DCVS cell layouts 534.1 MIS-II mapped chip^ 634.2 DMAP mapped chip 644.3 Core area of a MIS-II mapped chip. ^ 654.4 Core area of a DMAP mapped chip. 66viiList of Tables4.1 DMAP generated cells for the ISCAS benchmark circuits. ^ 584.2 DMAP generated cells for the MCNC benchmark circuits 594.3 DMAP vs. MIS-II in cell and net count for ISCAS benchmarks. ^ 604.4 DMAP vs. MIS-II in cell and net count for MCNC benchmarks. ^ 614.5 DMAP vs. MIS-II in cell area for MCNC benchmarks^ 624.6 DMAP vs. MIS-II in cell area for ISCAS benchmarks. 67viiiAcknowledgementI would like to express my sincere thanks to:Dr. Carl Seger, mentor and true scholar, who supervised me through-out the course of this work, through all the nitty gritty details, andwhose incredible patience and dedication to his students and to hisresearch will always be an inspiration to me;Dr. Patrick McGeer, who despite being 1000 miles away, still gavetremendous encouragement, to-the-point email responses, and whotaught me many things from how to give a presentation to thedifference between "it's" and "its";Dr. Dan Camporese, who first introduced me to, and encouraged meto continue in, the wonderful world of graduate studies, VLSI, andCAD of VLSI;Dave Gagne, who taught me many tricks in EDIF and EDGE and allthat fun stuff;Elizabeth Seto, who proofread my thesis draft with great care;Many other faculty and staff members in EE and CS, who helped tomake life easier and enjoyable;All my friends, some nearby and some far away, who listened andshared with me, who laughed and cried with me;And last, but not least, my parents, whose unfailing love and supportwere behind me at all times.I thank God for all of you for making this all possible.ixChapter 1IntroductionDynamic logic families, such as Differential Cascode Voltage Switch (DCVS) [24], have veryattractive properties both at the device level as well as the design level. This thesis analyzesthe problem of implementing combinational logic with DCVS. DMAP, a technology mappingand layout synthesis system for DCVS, is presented. The system takes as input, arbitrarycombinational logic, and produces layout cells ready for placement and routing in the CADENCEdesign environment.1.1 Outline of ThesisThe remainder of Chapter 1 reviews basic definitions and notations used in this thesis report.Discussion of the thesis topic begins in Chapter 2. First, the Computer-Aided Design automa-tion process is presented. We show, in particular, how the problems of technology mapping andlayout synthesis fit into the design cycle. Conventional methods of attacking these problems arereviewed. Next, DCVS logic is presented as an attractive design technology. We discuss DCVS'sdynamic operation, DCVS properties, and DCVS applications. We then describe the problemthis thesis seeks to solve: technology mapping and layout synthesis for DCVS. Previous workrelated to solving this problem is reviewed.Chapter 3 describes in detail DMAP, a system which performs technology mapping andlayout synthesis for DCVS. First, an overview of the system is given. Each of the system steps:circuit clustering, cell generation, and cell connecting, are elaborated on. Clustering is done to1Chapter 1. Introduction^ 2obtain sizable functions from which to implement the DCVS gates in order to achieve minimumdelay between the circuit's inputs and outputs. DCVS cells are generated in a systematic wayin a one-dimensional transistor strip layout style. The cell generation step investigates thesub-problems of determining suitable DCVS logic trees, transistor path finding, and intracellrouting. CADENCE placement and routing tools connect the generated cell library together toproduce a final DCVS chip layout.Experimental results of DMAP, and the discussion of these results, are given in Chapter4. The system presented in this thesis is compared with conventional technology mappingsystems in terms of various factors such as gate count, wire count, and cell area count. Finally,conclusions and future directions appear in Chapter 5.1.2 Basic Definitions and NotationsThis section is devoted to reviewing some common definitions and notations in graph theory andintegrated circuit theory that will be used in the development of this thesis. Readers alreadyfamiliar with the jargon and basics in these areas may wish to skip this section, and proceed toChapter 2.1.2.1 Integrated Circuit TheoryComplementary Metal Oxide Silicon (CMOS) technology has become the most common fabri-cation process for digital integrated circuit industry. It provides two types of transistors as thebasic building blocks in digital circuit design: the N- type or nMOS transistor; and the P- typeor pMOS transistor. The symbols used to represent them are shown in Figure 1.1. The N-typetransistor is an active-high switching device which acts as a closed switch when its input ishigh, i.e., when GATE is high, SOURCE and DRAIN are connected. The P-type transistoris an active-low switching device which acts as a closed switch when its input is low, i.e., whenChapter 1. Introduction^ 3GATE is low, SOURCE and DRAIN are connected.SOURCE^ SOURCEGATE GATEDRAIN^ DRAINPMOSP—type( b) NMOSN —typeFigure 1.1: CMOS transistor symbols.These transistors can be used in different ways to design digital systems. There are twomajor classifications of design styles or design technologies used with these CMOS devices:static CMOS and dynamic CMOS.PMOS 0PULL—UP ATREEz=n-nZ — AllNMOS AHNMOS PULL—DOWNPULL—DOWN TREETREE B0(a)^(b)Figure 1.2: (a) Static CMOS (b) Dynamic CMOSStatic CMOS is also referred to as fully complementary CMOS logic. Figure 1.2(a) showsChapter 1. Introduction^ 4a static CMOS 2-input NAND gate. The P-type transistor network, or pull-up (to power), andthe N-type network, or pull-down (to ground), are series-parallel duals of each other. This factsuggests obvious transistor redundancy, since both networks serve to realize the same function.This led to the development of dynamic circuits.A basic dynamic CMOS gate is shown in Figure 1.2(b). It consists of a single logic networkimplemented in the pull-down nMOS region (in this case, a NAND function), and operates intwo stages. During the precharge phase when 0 = 0, the P-type transistor is turned on while theN-type transistor is turned off. This causes the output to be pulled high, while the pull-downnetwork is "disabled", or non-conducting. During the evaluation phase when 0 = 1, the P-typetransistor is turned off while the N-type transistor is on. As a result, the pull-down network is"enabled", or conducting.There are two types of logic in digital circuits. Combinational (Boolean) logic have outputsthat are a function solely of the input values, while sequential logic have outputs that arefunctions of the inputs as well as data stored in the circuit.There are two basic sequencing techniques in digital systems. A synchronous system usesa fixed global clock with a period that is longer than the worst-case propagation delay. Themajor drawbacks of this method are loss of efficiency, clock signal routing, and difficulty ininterfacing with the asynchronous world. An asynchronous or self-timed system have eventsinitiated by completion signals created by other events. Although timing depends on circuitdelays, the system is guaranteed to sequence properly by adhering to a protocol. This allows formodular design of the system, and there is no upper limit on the size of the system. However,asynchronous circuits generally require more hardware than their synchronous counterparts,and are difficult to design.1.2.2 Some Computer-Aided Design (CAD) TermsStandard cell or polycell refers to an existing library of gate cells used by CAD systems toChapter 1. Introduction^ 5build circuits. A macrocell is a generated cell, usually implementing more complex logic thanstandard cells. Gate arrays and other regular structures are often considered macrocells.A net is simply a set of interconnected nodes in a circuit. A netlist is a circuit representation(which may be a text file, or some other data format) that describes the circuit in terms ofinterconnections between components. These components may be transistors, gate cells, orother circuit building blocks. In other words, a netlist is a "list" of "nets" and the objectsconnected to the nets. An input or output rail is a wire coming into or out of a component,and which connects to other components. Routing refers to connections between inputs andoutputs in a circuit. Global or intercell routing refers to connections between gates or cells in acircuit, while internal or intracell routing refers to connections between devices within a cell.The goal of circuit synthesis is to reduce the effort required to design a circuit while atthe same time produce circuits of comparable, if not better, quality than conventional hand-designed ones. Synthesis in CAD is used at different levels of the design hierarchy to producedifferent intermediate circuit representations. We will return to this in Chapter 2.1.2.3 Graph TheoryMany CAD algorithms use graph representations of circuit networks. Here we define somecommon graph terminology. For a more comprehensive coverage of graph theory and graphalgorithms, the reader is referred to [26, 52].A graph consists of a finite set of vertices or nodes, (i = 1, 2, ...n), and a finite set of edgesor arcs, ei,j, joining a pair of vertices Vi and Vi (ei,i is also said to be incident on V and I/j).Two vertices are adjacent or neighbours if they are joined by an edge. Two edges are adjacent ifthey are both incident to the same vertex. A vertex is of degree k if there are k incident edges.A directed graph is a graph where all the edges have a direction (e.g., an edge can be followedfrom V to V." but not from 173 to Vi). A directed acyclic graph (DAG) is a directed graph with noChapter 1. Introduction^ 6cycles. Two graphs are isomorphic if there is a one-to-one correspondence between the verticesand edges. For directed graphs, the edge directions also correspond. A tree is a connectedgraph with no cycles. A rooted tree T is a tree with one distinguishing node, called the root,and the remaining nodes are disjoint subtrees T 1 , T2 , ..., Trn . Nodes with no subtrees are calledleaves; remaining nodes are called internal nodes.A Path (V1 , V2 , ..., Vn) is an ordered sequence of vertices such that from every 14 to V, +1there exists an edge. A graph is connected if there is a path between any two nodes of the graph.A path where V1 = V?, is called a cycle. An Euler path, or an Eulerian path, in an undirectedand connected graph, is a path that traverses every edge of the graph exactly once.Chapter 2Discussion of ProblemIn Very-Large-Scale-Integration (VLSI) circuit design, Computer-Aided Design (CAD) toolsare used to automate parts or all of the design process. Complete design systems, referred toas silicon compilers [19], automate the whole design process by taking, as input, a high levelbehavioral description, and producing final layout. Individual CAD tools automate part of thedesign process, producing various intermediate representations. The goal of design automationis to produce near-optimal results of comparable and/or better quality than manually designedcircuits. The dividing lines between different levels of design abstraction are sometimes quitefuzzy, but can be generally classified as follows (see Figure 2.1):1. Behavioral Representation. A behavioral description of a digital system describes thealgorithm, flow and control of information in the system. It is usually described in ahigh-level programming-like language such as VHDL [5].2. Register Transfer Level (RTL) / Logical Representation. At this level, the system isdescribed in terms of its modules, and their interconnections. The datapath and controlparts of the design are separated, and the purely combinational parts and memory storageparts of the design are also separated. The control parts are created through a processcalled finite state machine synthesis.3. Gate Level Representation. This level of representation specifies the interconnectionsbetween logic gates. Here, a "gate" is defined as a physical component that realizes aspecific logic function using transistors. For example, a gate level netlist may contain7TransistorNetlistbeginif RUN='1' thenSTOP='0'elseSTOP=1'end if;BehavioralDescriptionGate—LevelNetlistRTLDescriptionmemory elements + combinational logiclogic minimizationdecomposed networktechnology mappinga)-oa)a)-occri>‘00_c0Chapter 2. Discussion of Problem^ 8High—Level Synthesis(Behavioral Synthesis)Logic SynthesisLow—Level Synthesis(Layout Synthesis)VLayoutRepresentationd.V.t.ffcb ri I E ‘ ,`2Eli^lki trolci3t- _milcr-151carFigure 2.1: Levels of representation during digital design automation.Chapter 2. Discussion of Problem^ 9programmable-devices, standard-cells, or macrocells.4. Transistor Level Representation. At this level, the complete circuit is described as inter-connections between transistor elements.5. Physical Layout. This is the physical layout representation of the design at the polygonlevel. The layout of a system includes the layout of the sequential and combinationalparts, as well as the routing of metal wires between the components.The process of going from level 1 to level 2 is generally referred to as high-level synthesis orbehavioral synthesis, while going from level 3 down to layout is considered low-level synthesis orlayout synthesis. Taking the combinational blocks from the RTL level down to the gate level iscalled logic synthesis. In this thesis, we are interested in logic synthesis and low-level synthesis.In most of the current design automation procedures, the storage elements at the RTL level ofthe design are assumed to be predesigned and ready to use. The task of logic synthesis is toproduce a correct final gate level implementation which meets timing and testability constraints,and tries to minimize area. After this step, the combinational blocks are combined with thememory elements to form the overall design.Logic synthesis can be divided into two approaches: two-level synthesis and multi-levelsynthesis [12]. Two-level logic synthesis algorithms, generally used for two-level programmable-logic-array's (PLA) implementations, concentrate on minimizing area by minimizing the numberof PLA product terms [11]. Multi-level logic synthesis, on the other hand, optimizes randomlogic, to be implemented with non-array structured gates, such as individual NAND gates orAND-OR-INVERT (A01) gates. Though some applications are naturally suited to two-level im-plementations, the multi-level realization generally offers better design freedom and flexibility.The multi-level logic synthesis task can be considered in two consecutive stages: a technology-independent optimization step, followed by a technology-dependent mapping step. The firststage involves applying algebraic Boolean transformations on the combinational circuit, in anChapter 2. Discussion of Problem^ 10attempt to minimize overall layout area of the final chip, the critical path delay time, and tomaximize circuit testability [13, 11]. The second processing step is referred to as technologymapping. It takes an arbitrary combinational logic description, and "maps" that into somedesign technology dependent description, which may use standard cells or programmable de-vices or macrocells. In other words, it implements the circuit with specific gate choices. Itperforms this mapping without significantly altering the structure of the circuit network thatwas produced after the first optimization step.The technology mapping step depends on the design method chosen. The use of pro-grammable devices, such as programmable logic devices (PLDs) and programmable gate arraysis becoming increasingly popular in the application-specific-integrated-circuit (ASIC) industry,and their usage is currently fueling much new research. Examples of some popular architecturesare those by A.M.D, Altera, Xilinx and Actel.If the standard cell design method is used, it is assumed that the library cells have beenmanually predesigned. This is the most common method of technology mapping. Automaticallygenerated cells are also a design option. Arbitrary logic cells need to be individually synthesizedfrom the logic description level down to cell layout. This process can be referred to as cell layoutsynthesis.The next two sections in this chapter (Section 2.1 and Section 2.2) will discuss in greaterdetail the general problem of technology mapping, and in particular, mapping for standardcells, as well as the problem of layout synthesis. Subsequently in Section 2.3, we describe arelatively new design technology called DCVS. DMAP, the system described in Chapter 3 ofthis thesis, performs technology mapping and layout synthesis for DCVS.Chapter 2. Discussion of Problem^ 112.1 Technology MappingIn conventional standard cell library design, the problem of technology mapping is one ofchosing the right gates from the cell library to implement the given combinational circuit, whileoptimizing for certain constraints such as area and speed. A good mapper should be able toadapt to changing libraries, and handle different technology-dependent cost functions. Themost popular technology mapping technique is the graph covering method. We will begin bydescribing this technique.2.1.1 Network RepresentationAny Boolean network can be represented as a directed acyclic graph (DAG), such as the onein Figure 2.2(a) 1 . Each node is associated with an output variable y2 , and a representation ofa logic function ft . A directed arc from a node i to a node j exists if the function f3 dependson the variable yi. The leaves of the graph are the network inputs or primary inputs, while theroots are the circuit outputs, or primary outputs. If there is an arc from node j to node i, nodej is a fanin of node i, and node i is a fanout of j.An arbitrary DAG can be partitioned into a forest of trees by making each node with fanoutgreater than one, the root of the new tree, as illustrated in Figure 2.2(b).2.1.2 DAG MatchingIt was observed that the standard cell library technology mapping problem is much like thecompiler code generation problem, where a set of high level instructions are mapped onto aset of machine instructions for a particular target machine [28]. Each high level code sequencecan be represented as a DAG, called a subject graph; each machine instruction for the targetmachine can also be decomposed into a small DAG, called a pattern graph. Each pattern DAG'Henceforth, the terms graph, network, and circuit will be used interchangeably.Chapter 2. Discussion of Problem^ 12(a)^(b )Figure 2.2: (a) DAG (b) partitioned DAG (forest of trees)has an associated execution time cost. The optimum code is generated by finding an optimummapping to cover the large DAG with the little pattern DAGs [6].2.1.3 DAG Matching Applied to Technology MappingSimilarly in technology mapping, each combinational circuit is decomposed into DAGs of stan-dard form (i.e., the nodes of the graph are base functions such as NANDS and INVERTERS, orNoits and INVERTERS). If there is any node with fanout greater than one, partitioning is per-formed by making that node the root of a new DAG. The resulting DAGs form the set ofsubject graphs. Each gate in the standard cell library is represented by several small-sizedDAGs decomposed from the logic function of that gate. Since there are many possible treesfor one gate, only the non-isomorphic patterns are used. For example, Figure 2.3 shows twodifferent DAG patterns for a 4-input NAND gate using 2-input NAND gates and INVERTERS. Theunion of the set of DAGs for each library function is used as the set of pattern graphs.Chapter 2. Discussion of Problem^ 13ff = abcdFigure 2.3: Non-isomorphic DAG representations for the function f = abcd.The technology mapping problem boils down to matching every branch in the subject graphwith one of the tree patterns from the standard cell library. The problem of optimal codegeneration, where a subject DAG is matched by pattern DAGs, is an NP-complete problem[6]. However, in technology mapping where the DAGs are restricted to trees, the problembecomes linear in both the size of the subject tree and pattern forest. The choice of patternsdepends on the technology mapper's cost evaluation, which may consider area, speed, and otherparameters. Each pattern graph has an associated cost function. Achieving a complete coveror map means that every node in the subject graph must be contained in one or more of thepattern graphs, and that each input of a pattern graph is the output of another pattern graph.DAGON is a technology mapper based on DAG matching developed by Keutzer [28] at AT&T.This approach was later extended at Berkeley in the MIS logic synthesis system [10].To illustrate the concept of graph matching, consider Figure 2.4 where technology mappingis performed on a full-adder circuit using MIS. Figure 2.4(a) shows one decomposition for anumber of basic gates into base form tree patterns. Recall that there are several possible decom-positions for each gate, and all of them are considered as patterns in matching. Figure 2.4(b)(a) (C)^—).nand3aoi22coutoai21SUMChapter 2. Discussion of Problem^ 14cincinaLA InvaInvD-1inv nand3N sum^ nand3nond4 —sum^ nand3nand3cout—D— oai21^—cout^ aoi22 ^(b) (d)Figure 2.4: Illustration of tree matching in technology mapping.(a) A possible tree pattern for some library gates.(b) Decomposed adder circuit before mapping.(c) Decomposed adder circuit with INVERTERS inserted.(d) Adder circuit after mapping with MIS-IIChapter 2. Discussion of Problem^ 15shows the decomposed full-adder circuit. Technology mapping in MIS differs from DAGON pri-marily by the fact that MIS inserts INVERTERS in the circuit (without affecting functionality)to enhance the number of successful pattern "matches". From Figure 2.4(c), we can see thatonce INVERTERS are added to the circuit, a number of familiar patterns from Figure 2.4(a) canbe identified in the circuit. Finally, Figure 2.4(d) shows the result of the graph matching.After technology mapping, the structure of the circuit is fixed, and optimizations can nolonger be performed on the combinational circuit as a whole. Any further optimizations wouldhave to occur inside the gate level. For standard cells, the layout is predesigned and fixed, andnothing more can be done with the layout before global placement and routing. For generatedcells, the gate's internals can be synthesized to meet still more optimization parameters. Indeed,this is one of the advantages, and objectives, of cell layout synthesis.2.2 Cell Layout SynthesisLaying out a circuit at the physical level is a tedious and time consuming process. It becomesvery costly to do the layout portion of the design (draw, check, and correct layout) by hand.Therefore, it makes sense to use either layout cells from a pre-drawn standard cell library whichare usually designed to be very compact and efficient, or to use off-the-shelf programmabledevices. As design methods and semiconductor technologies rapidly evolve, it becomes verycostly to maintain a standard cell library. This problem can be significantly alleviated if alayout synthesis program is used to generate cell layouts of reasonable area and performance.This way, the technology-dependent factors can easily be modified inside the synthesis program.Whether to generate basic standard cell libraries, or more complex cells, layout synthesis is ofconsiderable interest. The automatic design and generation of such polycells include two mainsteps:1. Determining the transistor network (i.e., transistor netlist).Chapter 2. Discussion of Problem^ 162. Placing the transistors to meet the design rule specifications.2.2.1 Static CMOS Cell Layout SynthesisMost of the work done in layout synthesis has been for complementary static CMOS design.There is little published work on layout synthesis for dynamic CMOS design. In static CMOS,the first step of obtaining the transistor netlist is relatively straightforward for smaller functions,since the logic function of the cell can easily be translated to a series-parallel connection oftransistors, where the pull-up and pull-down networks are duals of each other.Significant work has been done in cell synthesis for static CMOS standard cells to performthe second step above: given a transistor netlist, lay out the network to meet design rulerequirements.2.2.2 Transistor Placement for Static CMOSUehara and vanCleemput [58] introduced the automatic generation of cell layout in one-dimensionaltransistor arrays. Subsequent work by other people has built on this concept. Given the tran-sistor netlist, the CMOS static circuit is converted into two graphs, one for the pull-up (P-typetransistor region), and one for the pull-down (N-type transistor region). The vertices corre-spond to the transistor connections (i.e., SOURCE/DRAIN), and the edges correspond tothe transistors. Figure 2.5(a) shows an example of a CMOS logic gate f = (AB)(C D), andFigure 2.5(b) shows the two corresponding graph representations for the pull-up and pull-downnetworks.Uehara and vanCleemput's algorithm for finding the layout path is essentially one of findingEuler paths in the pull-down and pull-up graphs, and then laying out the transistors in twostrips according to the Euler path labeling. An algorithm for finding an Euler path is givenin Appendix A. Sometimes, a graph will not have an Euler path, in which case imaginaryC C D^C( 1D0(out)(out)(gnd)(gnd)vddoutgnd(a) (b)^ (c ) (d)BA C D(vdd) (vgld)path =path =B Chapter 2. Discussion of Problem^ 17Figure 2.5:Finding the Euler path and transistor placement:(a) Static CMOS gate for the function (AB) * (C D).(b) Graph corresponding to pull-up and pull-down logic tree.(c) Euler paths with the same labeling for the pull-up and pull-down graphs.(d) Corresponding layout of the gate with two transistor strips.Chapter 2. Discussion of Problem^ 18edges called pseudo-edges are added to the graph. The Uehara and vanCleemput method fortransistor placement can be summarized as follows.1. Find all the Euler paths that cover the two disjoint pull-up and pull-down graphs.The objective for finding this path is to allow consecutive transistors in the path to sharea diffusion region. In other words, the source-drain connections of adjacent transistors onthe path are made by abutment in diffusion. Figure 2.5(c) shows an Euler path for thepull-up graph and one for the pull-down graph.2. Find a pull-up path and a pull-down path such that the labeling of the two paths are thesame.Using two paths with the same labeling allows the transistor inputs to be aligned, so thatthe inputs can be connected with a vertical polysilicon input rail without any horizontalrouting. This saves area and reduces internal routing complexity. Figure 2.5(d) showsthe corresponding layout of our CMOS circuit example.3. If the paths in Step 2 are not found, add pseudo-edges to the graph(s) so that the pathsin Step 2 can be found.When pseudo-edges are added, the resulting Euler path is equivalent to two or moreindividual Euler paths joined by edge separations. When laid out, these separationsbecome diffusion gaps. If the pull-up and pull-down transistor arrays do not have aligningpseudo-edges, one of the transistor strips will need to have a slightly wider diffusionconnection to allow for alignment. Figure 2.6(a) shows a graph where a pseudo-edgeneeds to be added to obtain an Euler path. Figure 2.6(b) shows the added pseudo edgefrom node 5 to node 3 and the resulting Euler path.An improvement of Uehara and vanCleemput's non-optimal heuristic was described in [33],where an optimal, non-exhaustive, method of minimizing the layout area of complementaryChapter 2. Discussion of Problem^ 19(a)^(b)Figure 2.6: Adding pseudo-edge to find an Euler path.series-parallel CMOS cells in the standard cell style was developed. A number of other algo-rithms that extended the basic Uehara algorithm to minimize diffusion or reduce internal wiringhave been described in [9, 25, 44, 47, 61]. Further generalization of the Uehara and vanCleem-put method was presented in [8], where the layout generation is driven by a combination ofoptimization criteria and composition constraints to control various aspects of the layout, suchas wire length, metal utilization, and diffusion breaks.In addition to this standard-cell transistor-row style, other layout styles have been studiedas well, such as those in [49, 54, 50] and gate matrix styles in [62, 63, 55]. A common feature forall of them, however, is that the symmetry between the P-type transistor and N-type transistornetworks in full static CMOS design is considered. For dynamic CMOS, mainly the N-typetransistor region is of concern.Chapter 2. Discussion of Problem^ 202.3 Differential Cascode Voltage Switch LogicIt is desirable to build VLSI circuits using devices of small power consumption, small area anddelay. It was with these goals in mind that the logic design family, Cascode Voltage LogicSwitch (CVSL) [24], surfaced as an attractive design technology. The logic is achieved bycascoding differential pairs of N-type transistors to form a stacked pull-down tree capable ofrealizing complex Boolean functions within a single gate delay. Its single-ended form, Single-Ended Cascode Voltage Switch (SCVS) produces one output function from a binary pull-downtree. On the other hand, its differential form, Differential Cascode Voltage Switch (DCVS) 2 ,produces both the true and complemented form of the output i.e., each cell computes both qand the complement v. DCVS exists in both static and clocked (or dynamic) styles as shownin Figure 2.7. In this thesis, we consider the dynamic version. Before proceeding to describethe operation of the dynamic DCVS cell, we briefly survey some previous dynamic CMOStechniques and some of their shortcomings.2.3.1 Other Dynamic CMOS TechnologiesFirst, let us consider the generic dynamic circuit of Figure 1.2(b). The initial precharge or"set-up" state of the pull-down network must be nonconducting. To achieve this, a commondesign practice is to make all the inputs low to ensure that there is no pull-down path. Duringevaluation, when the pull-down network conducts, the output of the network will be pulledlow to ground. This implies that the output of the network must have initially been high forthis transition to occur. However, if this output is connected to the input of another dynamicgate in a cascading configuration, the required "set-up" condition for the next gate cannot beachieved.Krambeck, Lee, and Law [30] solved this dilemma by putting a static INVERTER gate at theoutput of the pull-down network. This is called DOMINO logic (Figure 2.8). The output f of2 DCVS is sometimes referred to as Differential Cascode Voltage Switch Logic (DCVSL) in the literature.prechargedual ^railinputsdualrailinputsDCVS NMOSPULL—DOWN TREEPULL—DOWN TREEDCVS N MOSprecharge H(a) (b)Chapter 2. Discussion of Problem^ 21Figure 2.7: Differential Cascode Voltage Switch: (a) Static (b) Dynamic.the INVERTER is the real output of the gate, and is precharged low. When this output is fedinto subsequent DOMINO gates, this initial precharged value turns the transistors off. Duringevaluation when the pull-down network is conducting, f goes high.The major drawback of DOMINO is that the gate is non-inverting. DOMINO is, by itself,an incomplete logic. The pull-down network realizes the complement of some function, whilethe static INVERTER forces the gate output to be the uncomplemented function. This makesimplementing arbitrary combinational logic awkward with DOMINO.An alternative design which provides more logic flexibility is NORA [22], where nMOS pull-down gates and pMOS pull-up gates alternate. Figure 2.9 shows a N-type gate feeding into aP-type gate, which feeds into an N-type gate. Since the P-type gate is disabled with invertedinputs, the first N-type gate output need not be inverted. The P-type gate is precharged whenis high, so that the network is disabled, and the output is connected to ground.The major drawback of NORA, however, is that the P-type gates tend to switch slower thanNMOSPULL—DOWNTREE(a)inputsNMOSPULL—DOWNTREENMOSPULL—DOWNTREE--1PMOSPULL—UPTREE0-cLi^t7t_ciChapter 2. Discussion of Problem^ 22f=AB+CD(b) Figure 2.8: Dynamic DOMINO design:(a) Generic DOMINO gate.(b) Detailed schematic of a DOMINO gate for the function (f = AB + CD).Figure 2.9: Dynamic NORA design.Chapter 2. Discussion of Problem^ 23the N-type gates. This is due to lower electron mobility in the p-well. Furthermore, the need toalternate the two types of gates makes NORA circuits very awkward to design. In comparison,DCVS circuits have all the advantages of fast-switching dynamic CMOS, while also being easyto design since every gate realizes both the complemented and uncomplemented form of thelogic function.2.3.2 DCVS OperationA basic DCVS cell consists of: a fixed pull-up network, a function dependent pull-down network,a single pull-down transistor controlled by the precharge signal, and two output inverters. Thesecomponents operate together to form a very powerful circuit building block. DCVS takes intrue and complemented input control signals. Figure 2.10 shows a 3-input clocked DCVSAND/NAND gate. Its operation cycle is as follows.First, the cell is precharged, i.e., the precharge signal is set low. Consequently, the outputs fand f of the DCVS cell will both be pulled low. When the precharge signal goes high, dependingon the values on the inputs of the function specific pull-down tree, one of the nodes q and q willbe pulled down, and thus either f or f will go high. If the pull-down tree is designed properly,i.e., the path function for connecting q to the final precharge transistor is the complement ofthe path function for connecting q to the same transistor, then it is easy to verify that for anyDCVS cell, at most one of f and f will ever be high at the same time.In conventional dynamic design, the gates must be clocked (or precharged) at a minimumoperating frequency to maintain the output values. There are two extra P-type pull-up tran-sistors in a cascode configuration acting as "staticizers". When either q or q is realized, thepair of feedback transistors cause the outputs to immediately stabilize. Hence, there is no lowerlimit on the frequency of the precharge signal in a DCVS cell. This is important for reliabilityreasons as well as when using the logic in more unconventional circuit designs. For exam-ple, very efficient self-timed circuits can be designed using DCVS logic. In such designs, theChapter 2. Discussion of Problem^ 24precharge^\i•Figure 2.10: 3-Input DCVS AND/NAND gate.precharge/evaluation cycle is initiated by the completion of some previous circuit component,whose timing may be impossible to predict.2.3.3 DCVS PropertiesEarly interest in dynamic logic arose from a naive view of circuit layout tradeoffs. It wasthought that due to the absence of a pull-up network, a dynamic logic layout, with roughly halfthe transistors of static logic, should be more compact than its static cousin. This argumentwas misleading for a couple of reasons. First, most of the area consumption in VLSI layout isdue to wiring, not transistors. Secondly, prevention of latchup in CMOS technologies requiresa comparatively large separation between the pull-up and pull-down regions. This in turnenforces a very confined topology on any CMOS cell, which in turn prevents either designers orlayout tools from taking advantage of the relatively empty pull-up region presented by dynamiclogic technologies.Chapter 2. Discussion of Problem^ 25Subsequently, research in dynamic logic has concentrated on the perceived advantages inpower consumption. It was thought that dynamic logic might save power due to two centraleffects:• In static logic, as each gate switches, a short connection between power and ground isestablished, with only two weak resistors in the path; this short effect is responsible for afair amount of static logic's power consumption.• Power is consumed each time a gate switches; in static logic, each gate switches potentiallymany times during a single evaluation phase. In contrast, a dynamic gate switches onlyonce.However, detailed performance comparisons of DCVS with other design methods performedby Chu and Pulfrey [17] revealed that DCVS power gain is unfounded. For example, whencompared to static design, it was found that DCVS gains in terms of input capacitance anddevice count, but looses in regards to power dissipation. In comparison to other dynamic CMOSdesign styles (e.g. NORA and DOMINO), DCVS is faster, at the expense of increased devicecount and power dissipation.Recent experiments by Meng[42] further supported this fact. Dynamic logic, due to therestoring inverter at the output of this gate, does short. Further, power is consumed as eachcapacitor in the circuit is charged during the precharge phase, and as half of these nodes aredischarged during evaluation. Finally, though each gate makes a transition only once duringevaluation, there are many more switches than in typical static networks. In fact in DCVS, halfof all the gates switch during partial evaluation.Despite the illusory advantages of power and size, it is in the area of speed and testabilitywhere dynamic logic shows proven advantages over its static counterpart. More recently, re-search in dynamic logics has concentrated on advantages in the fields of asynchronous design,timing verification, testability for various classes of fault, and hazard freedom.Chapter 2. Discussion of Problem^ 26The most obvious advantage is that dynamic logic, unlike static logic, does not glitch. Ahazard is a momentary transition of the output that should have remained stable. A hazardcan become fatal if it causes a transistion to an improper system state. A glitch is a hazardthat is visible, such as a 0-1-0 or a 1-0-1 transition. In fact, in [37, 39], it was shown that thetwo essential properties that a circuit must possess in order to remain hazard-free are:1. that it be internally noninverting; and2. that each node be precharged to a specific value before evaluation.Dynamic CMOS logic is currently the only logic family possessing these qualities. In dy-namic gates, the outputs are affirmatively driven, and there can only be at most one outputtransistion after a precharge. For example, if the outputs are precharged to 0, then they willeither stay at 0, or be driven to 1 during the evaluation stage. Freedom from glitches has somenice consequences in circuit testing.A delay fault is a fault which slows down a circuit and impairs the clocked operation of thecircuit. Typically, a delay-fault test consists of a set-up vector, Vl , which is first applied to thecircuit, and a second vector, V2, to see if the outputs change. A path delay-fault test is saidto be robust if it detects a fault in that path regardless of whether or not there are faults inother paths. Robust delay-fault testing for integrated circuits is regarded as both difficult andextremely demanding.Since dynamic circuits have all the nodes initialized during precharge, it can be seen intu-itively that only the second test vector, V2 is needed to test a path delay-fault. Together withthe hazard-free property of dynamic circuits, McGeer showed that the condition for robust pathdelay fault testability (RPDFT) on dynamic circuits was much less demanding than on staticcircuits [34]. In particular, it was demonstrated that every minimal sum-of-products form wasrobustly path-delay fault testable, and that every multi-level circuit could be made robustlypath-delay fault testable by transformation. In the era of designing for testability, RPDFT isChapter 2. Discussion of Problem^ 27a very desirable circuit property.Perhaps the simplest and most basic fault model for Boolean networks is the stuck-at fault.A wire is said to have a stuck-at-0 fault if the output of the wire terminal is 0 regardless ofthe wire input. Similarly, a wire has a stuck-at-1 fault if the output of the wire terminal is 1regardless of the wire input. T-irredundant faults are a newly-identified class of stuck-at-faultswhere the faults do not change the function of the circuit, but instead act to slow down thecomputation of the circuit [40]. In [38, 39], McGeer further demonstrated that one could obtaina precise upper bound on the true delay of a dynamic logic network even in the presence ofuncertainties in the individual gate delays. This can be shown to be very difficult for static logicnetworks[35, 36, 39]. This precision in delay estimate permits us to test for the T-irredundantfaults on dynamic logic networks.In fact, a number of other timing related questions can be more precisely addressed fordynamic circuits [40, 38]. This, plus the hazard-free property of DCVS logic and its dual-railnature, make DCVS particularly attractive for asynchronous applications.2.3.4 DCVS ApplicationsThe 1989 Turing award lecture by I.E. Sutherland on his work on "micropipelines" [56] isan excellent example of the growing importance and potential of asynchronous circuits. Amicropipeline can simply be viewed as a series of fast processing asynchronous computations,such as multiplication or division, under the control of some small logic blocks, as shown inFigure 2.11. The control logic generates a precharge signal, P, after the completion signal, C,from the computational block is activated.Several designs have been suggested for the control block [56, 59, 48]. The control blockuses a handshaking protocol, such as the H-Protocol[48], to interface with other logic blocksegments of the pipeline. When one computational block finishes its job, it sends an "I'mdone" signal (C) to the control block. The control block uses this signal, together with controlChapter 2. Discussion of Problem^ 28controlpathCONTROLBLOCKcontrolpath CONTROLBLOCKCONTROLBLOCKCLdatapathLOGICBLOCKdatapathLOGICBLOCKLOGICBLOCKFigure 2.11: Generic asynchronous pipeline.signals from the previous and next stage computational blocks to determine the current state ofthis pipeline section. Eventually, it returns a "go ahead" signal (P) to the logic block when it issafe for the next round of computation to begin. A completion circuit that generates C can beeasily implemented in DCVS by taking advantage of the gate's differential outputs, as shownin Figure 2.12. The completion line goes high when both outputs are stable, and remains lowduring precharge. A DCVS gate only produces three possible output states for (f,7): (0,0),(0,1), and (1,0). Note that the pair of staticizing transistors in the DCVS gate ensures that thestate (1,1) is never achieved.DCVS naturally lends itself to very efficient self-timed circuits. Since asynchronous logicblocks can compute as fast as the signals propagate through the circuit, micropipeline designshave superior speed advantages over centrally-clocked systems. Meng and others have designeda series of circuits, such as multipliers and arithmetic-logic-units, using DCVS signal completion.These circuits have been demonstrated to operate at speeds comparable to their synchronous_JrequestrequestinputsNMOSPULL—DOWN TREE^Ccompletion0^Chapter 2. Discussion of Problem^ 29Figure 2.12: DCVS completion signal generation.counterparts [41, 43, 27]. More recently, Williams has devised a self-timed IEEE floating-point division ring which operates in 100 nanoseconds using this technique[60]. However, inthese works the routing of the DCVS differential lines required significant active area penalty.Furthermore, most of them were built from basic building blocks, such as adders, which werehand designed. Automatic design synthesis for DCVS is needed for building micropipelines forother interesting applications. In particular, we need computer tools to help build any arbitrarycombinational DCVS logic blocks.2.4 Technology Mapping and Layout Synthesis for DCVSAlthough DCVS technology has generated much interest due to its attractive features, theproblem of technology mapping for DCVS has not been thoroughly studied. Implementing aDCVS circuit is challenging due to the fact that the logic is complementary. Consequently, allsignals in the circuit must be duplicated. Intuitively, this leads to twice the number of internalChapter 2. Discussion of Problem^ 30nets and connections, and causes global routing area to increase by a factor of two or more.Since wiring usually take up more area on a chip than the transistors themselves, this wouldappear to make DCVS logic highly area-inefficient. However, this issue is not that clear-cut. Itis possible that the larger area could be compensated for by fewer cells of larger complexity.There are two possible avenues for DCVS design. We can either design a very large celllibrary containing very many complex cells and use the standard tree-matching algorithm fortechnology mapping, or we can develop a synthesis system that can generate an efficient layoutfor an arbitrary DCVS cell given only the Boolean function the cell is to implement.Since DCVS gates can realize complex Boolean functions efficiently, standard cell librariescannot fully take advantage of this feature. Using a standard mapper with a very large libraryhas several shortcomings First of all, since a complete cell library for all, say 5-input gates,would contain somewhere in the range of a billion cells, we would clearly have to compromiseand use an incomplete cell library. Hence, the mapper would often be forced to use smaller cellsthan absolutely necessary, simply because the cell needed was not in the library. On a morepractical level, it is very time-consuming and costly to design and maintain such a large celllibrary. Finally, experimental results [18, 29] have indicated that tree matching-based mapperscan be computationally expensive in matching large cells. In conclusion, using cell libraries oflimited size and functionality is inappropriate for DCVS mapping.The second approach of cell generation is more appropriate for DCVS. The goal then, is togenerate fewer cells of larger complexity. This approach will maximize the functionality of eachcell and optimize transistor usage, while minimizing the number of cells that need to be routed.2.4.1 The ProblemGiven a circuit description in terms of a set of Boolean equations, we are interested in its finalrealization using DCVS cells as building blocks. In general then, there are four steps requiredChapter 2. Discussion of Problem^ 31in transforming the general combinational circuit (assuming some technology-independent op-timization has been done) into a finished DCVS circuit layout:1. Decomposing the original large circuit into a set of smaller Boolean functions.2. Designing the pull-down tree for the DCVS cell corresponding to each such Booleanfunction.3. Laying out each DCVS cell.4. Placing the cells and connecting them together.2.4.2 Previous Work in DCVS SynthesisAlthough technology mapping and automatic layout synthesis have been studied in general, nosynthesis system has been developed specifically tailored to DCVS logic. In particular, we arenot aware of any system that can take an arbitrary combinational circuit and decompose it intoa collection of "as-large-as-possible" DCVS cells and then automatically generate the neededcells.Various automated layout generation tools have been developed over the years. However,these tools have mostly been tailored towards generating layout for standard CMOS cells,i.e., cells that have both function dependent pull-up and pull-down transistors. Consequently,the techniques used in these systems are not directly applicable to DCVS layout generation.Furthermore, DCVS cells are usually significantly more complex than standard CMOS cellsboth in terms of internal connectivity and in terms of number of inputs.Some previous work has been done towards solving the second step of obtaining compactDCVS pull-down trees at the transistor netlist level given a Boolean function [16]. Also, theclose relation between binary decision diagrams and the pull-down tree of DCVS cells waspointed out earlier [13].Chapter 2. Discussion of Problem^ 32ACORN is an automated physical design system developed at IBM by Yoffa and Haugefor DCVS macro design that does the second and third steps of the problem [65, 64], namely,designing the pull-down trees and laying them out. The system uses the brickwall approach forthe physical layout of the DCVS trees, where a two-dimensional tree array style is used. First,it performs tree placement, followed by local customizations to achieve variable alignmentsto improve intercell wirability. The goal is to obtain high density DCVS layout by reducingcongestion caused by the complex internal tree wiring between transistors. ACORN exploitsthe fact that for any Boolean function, there are several distinct tree implementations, andthat could lead to different array arrangements. Figure 2.13 shows the function (f = AB + C)implemented by six different trees. Each square is a transistor controlled by the input labeledinside the square.The program chooses the trees to achieve common input alignments between all pairs ofadjacent trees in each of the input rails. This rail optimization allows for signal bussing, hencereduces metal wiring between blocks. To illustrate the concept of local optimization, considertwo neighbouring DCVS trees as shown in Figure 2.14(a). If the two circuits are wired as is,the connections would be as in Figure 2.14(b). However, by rearranging the transistors of oneof the trees Figure 2.14(c), the wiring between trees is simplified Figure 2.14(d).In [53], a better DCVS placement technique is presented, which builds upon the algorithmsin ACORN to achieve a layout system that is provably good. The main drawback with thesesystems is that they assume that the initial decomposition step has been performed, i.e., theyassume that the user already has partitioned the circuit into suitable "chunks". Since the sizeand characteristics of the final DCVS cell are very difficult to estimate a priori, the partitioningwill likely be non-optimal.Placement and routing is a significant concern in DCVS design because of the differentialinput and outputs. In the brickwall layout approach, no channels are used for the routing.Instead, all wires pass above the macro regions. ACORN performs this with a wiring tool alsoChapter 2. Discussion of Problem^ 33f BAf f(d) (e)Figure 2.13: Six different DCVS trees for the function (f = AB + C).Chapter 2. Discussion of Problem^ 34 P PQ(b) (d)Figure 2.14: Illustration of local optimization in ACORN:(a) Two adjacent DCVS trees after synthesis.(b) Trees wired together.(c) Trees after transistor rearrangement.(d) Trees re-wired after local optimization.developed at IBM [20]. On the other hand, in DCVS polycell designs, routing channels areallocated. The placement and routing stage is no different for DCVS logic than for most otherCMOS standard cell approaches. The next chapter presents a system which incorporates steps1 to 4 of the DCVS problem, and which uses a polycell routing method.DMAPCLUSTERINGindividual file(BLIF)DCVS CELLGENERATIONcell library(EDGE)netlist file(EDIF)Chapter 3DMAP: A DCVS Technology Mapping and Layout Synthesis SystemDMAP is a technology mapping and layout synthesis system for DCVS. Figure 3.1 shows theflow diagram for the DMAP system. This system consists of two major modules: the clusteringmodule and the cell generation module. DMAP employs the CADENCE Design System [3] toperform placement and routing.circuit inputPLACE & ROUTEcircuit layout(EDGE)Figure 3.1: DMAP system flow diagram.35Chapter 3. DMAP: A DCVS Technology Mapping and Layout Synthesis System^363.1 System OverviewThe goal of the system is to take, as input, a technology independent circuit description, andproduce DCVS layout cells ready for placement and routing in CADENCE.The DMAP program is incorporated into Berkeley's MIS-II program, and hence can takean arbitrary circuit input in one of MIS-II's many input formats, such as BLIF (BerkeleyLogic Interchange Format [2]). Once read into the system, the circuit is decomposed, andthen partitioned into single-output, high functionality clusters (or modules). The cluster sizeis controlled by layout constraints. In particular, the system makes sure that each clustercan be laid out in the chosen cell style without violating any design rules. For example, thecluster size is constrained so that the final cell will have internal transistors that can be routedin the given layout style. The clustering module produces two outputs: a global netlist filewhich specifies the interconnection between the clusters (specified in EDIF, Electronic DesignInterchange Format [4]), and circuit descriptions (at the logic function level) of the individualclusters (specified in BLIF).The individual clusters are input to the DCVS cell generation module, where the layout ofeach cell is determined, and deposited in an intermediate layout representation (EDIF). Afterthis, a CADENCE (EDGE) layout representation for each of the cluster functions is generated.The output of this cell generation module is a library of DCVS cell layouts realizing complexfunctions.Finally, the CADENCE automatic placement and routing program takes the cell library(EDGE representations), together with the netlist specification of the whole circuit (EDIF),and produces the final circuit layout (EDGE representation). Each of these steps: clustering,cell generation, and cell connecting, are now expanded.Chapter 3. DMAP: A DCVS Technology Mapping and Layout Synthesis System^373.2 First Step: ClusteringThe problem of circuit clustering or logic partitioning has been of interest for quite some time.In the days of small scale integration (SSI), it was desired to partition a complex system intophysical components (chips) subject to delay constraints. Since inter-chip delay is significant,it was desired to minimize the maximum delay (along the critical path) through the wholedesign during partitioning. The optimization was carried out using objective functions suchas total wire length, wire length in the longest closed path, or the wire length in the longestpath from a network source to a network sink. To minimize cost, the number of chips shouldalso be as small as possible. This lead to the earliest form of the "clustering" problem. Thatis, the partitioning of a circuit into chips and the routing between them. The first solution tothis problem was an algorithm presented by Lawler, Levitt and Turner [31] 1 for combinationalnetworks. They assumed a "unit delay" model, defined as follows:• all gate delays are zero;• the delay between gates in the same cluster is zero;• the delay between gates in different clusters is one.Subsequently, when the scale of integration increased, clustering was applied at many otherlevels of the design hierarchy. Touati applied clustering to technology independent circuit delayoptimization [57]. Rather than identifying critical paths, his idea was to cluster the circuit tominimize the maximum number of clusters on a path. Each of these clusters are collapsed ontoa single node, then simplified. Recently, Murgai et.al . [45] generalized Lawler's algorithm fora general delay model. This is needed for high capacity clusters (such as LSI and VLSI chips),which very likely have critical path delays inside the cluster that are comparable to the totaldelay of the system.'Henceforth referred to simply as Lawler's algorithmChapter 3. DMAP: A DCVS Technology Mapping and Layout Synthesis System^38Network clustering has also been applied in the automated design of programmable devices,such as programmable gate arrays (PGA's) [46]. There are a limited number of configurablelogic blocks (CLBs) on one chip. For example, a typical Xilinx [1] chip has about 320 of theseCLBs to implement random logic. This is further constrained by a maximum limit on thenumber of inputs to each CLB and the total wiring connections that are available between theCLBs. The clustering goal is to minimize the number of blocks used, reduce routing complexity,and minimize delay on the critical path. The goal of DCVS clustering is quite similar.3.2.1 Circuit Clustering in DMAPDMAP partitions the decomposed input circuit onto clusters that can be realized as DCVSgates. The clustering in DMAP is constrained by the following:• If all the nodes in the cluster were to form the function of a DCVS pull-down tree, thenumber of internal routing tracks needed for internal connections would not exceed amaximum limit.• The number of inputs to any cluster does not exceed a maximum limit.The resulting clusters are rooted trees, each of which is collapsed onto a single function (ornode) that is implemented as a DCVS pull-down tree at the transistor level. Since the delaybetween transistors within a cluster is negligible compared to the delay between clusters, it issafe to assume the unit delay model defined earlier. Hence, it seems that some variation ofLawler's algorithm, which will be explained in detail below, is quite suitable for our needs.3.2.2 Lawler's AlgorithmLawler's algorithm, in its basic form, clusters the circuit to minimize the delay throughout thenetwork. The clustering procedure is essentially a labeling algorithm, where the label of node i,Chapter 3. DMAP: A DCVS Technology Mapping and Layout Synthesis System^39A(i), represents the longest delay along any path from a network input to i. Hence, the longestdelay in the network is the largest label value. Let wi(k) be the sum of the weights of all thepredecessor nodes of node i with label k, and wi be the weight of node i. The weight of a nodeis a value associated with some property of the node. A node is added to a cluster of the samelabel, wi plus wi(k), where k, the label of node i, does not exceed a maximum M. Figure 3.2shows a graph after clustering with the Lawler's algorithm for M = 3 and w = 1 for all nodes.The nodes that belong to the same cluster are drawn together, and the label for each node isshown in parenthesis beside the node. The numbering of the nodes is in the order that theyare traversed in the algorithm. The labeling algorithm is outlined as follows:1. Label all input nodes 0.2. Find an unlabeled node i, all of whose predecessors have been labeled.Let k be the largest label of any predecessor.If (wi wi(k) M)A(i) = kElse A(i) = k +1.3. Repeat Step 2 until there are no more nodes.3.2.3 Modified Lawler's AlgorithmThe DMAP clustering algorithm differs from Lawler's in three ways. First, as the reader maynotice from Figure 3.2, Lawler's method produces some inefficient single-noded clusters. Thatis, there are single nodes whose fanins and fanouts are of a different label. Without violatingcluster-constraints, nodes 6, 7, and 8 can be put into one cluster; node 10 can be included intothe 9 and 11 cluster; nodes 4, 5, and 12 can be one cluster. The DMAP algorithm recognizesthese cases. Secondly, DMAP uses a different capacity constraint. In fact, DMAP verifies thatthe clusters can be implemented in less than the maximum number of tracks allowable by a cellJAW I4Me w a 1LII01 puor iaEOM^D" .94=ICE01(a)^ (b)Chapter 3. DMAP: A DCVS Technology Mapping and Layout Synthesis System^40Figure 3.2: Illustration of Lawler's clustering algorithm (M = 3).of standard height and that the number of cluster inputs does not exceed a maximum. Finally,once the labeling count has been advanced during the node traversal procedure, the completedclusters are collapsed and generated immediately.Incorporating these changes then, we develop a "modified Lawler's algorithm". Given anynode 71 in a circuit, we define Tk(n) as the tree built by taking n as root, and including allnodes labeled k which will ultimately fanin to 71. The algorithm is as follows:1. Label all input nodes 0.2. Find an unlabeled node i, all of whose predecessors have been labeled.Let k be the largest label applied to any predecessors.Let tree = Tk(i).If (Satisfy_Constraints(tree))A(i) = kElseA(i) = k + 1Foreach p = fanin(i)Let test_tree = Tk(i) plus the node pChapter 3. DMAP: A DCVS Technology Mapping and Layout Synthesis System^41If (Single_Node(p)) AND (S atisfy_Constraints(test_tree)) )A(p) = A(i)Else If (A(i) > \(p))Generate_Cell(p)3. Repeat Step 2 until there are no more nodes.The function Satisfy_Constraints(n) checks to see if the tree with root n containing onlythose predecessor nodes with the same label, has a logic function that will result in less than themaximum number of allowed routing tracks when run through a somewhat simplified version ofthe cell generation routine. For a 1.2 micron double metal technology, for example, this uppertrack limit is 7. Generate_Cell(n) will take the tree with root n containing nodes of the samelabel, collapse the tree into one function, and generate the DCVS layout cell for that function.This operation will be described in Section 3.3.Figure 3.3 illustrates the DMAP clustering algorithm. Part (a) shows the tree to be clus-tered, and Part (b) shows the results of clustering. First of all, nodes 1 to 14 are labeled 0.Suppose that To(15) does not satisfy the cluster-constraints, and node 15 is labeled 1. At thispoint, node 15's two fanin trees (To(7) and T0 (14)) are collapsed and the first two clusters, Cland C2, are generated. Next, nodes 16 to 30 are labeled 0 because they all fit into one cluster.The next node to be labeled is node 31. The largest predecessor label is k 1 (node 15). T1 (31)consists of nodes 15 and 31 (C1 and C2 are considered as inputs to node 16 now). Nodes 16-30have been labeled 0, and hence are not part of the tree T1 (31). It is found that this tree satisfiesthe cluster constraints, and so \(31) = 1. Before proceeding to label the next node, To(30) canbe collapsed and the corresponding cell can be generated to form C3. Similarly, nodes 32 to61 are labeled, and C5 and C6 are generated. This time, the tree consisting of nodes 46 and62 (T1 (62)) does not satisfy the cluster constraints, and so A(62) = 2. Then C7 is generated.Although node 46 is a single-node cluster, its label cannot be incremented without violatingsome cluster constraints. C8 is generated. Then, nodes 63 to 72 are labeled 0, and node 73 islabeled 0. Lastly, we label node 74. At this point, k --,- 2. The tree consisting of nodes 74 andC 10Chapter 3. DMAP: A DCVS Technology Mapping and Layout Synthesis System^42Figure 3.3: Illustration of DMAP clustering: (a) original DAG (b) DAG after clustering.Chapter 3. DMAP: A DCVS Technology Mapping and Layout Synthesis System^4362 is tested, and found to satisfy constraints, and so A(74) = 2. C4 and C9 are generated. Itis found that adding node 73 to T2 (74) still satisfies constraints. So, finally, the cluster C10,consisting of nodes 62, 73, and 74 is formed and generated.In the example just shown, DMAP clustered the circuit into ten clusters. During the process,two single-noded clusters were encountered (node 46 and node 73), but only node 73 could bere-labeled and included into a bigger cluster. It is likely that node 46 already contained fairlyhigh functionality. The longest wire delay in the network is 2 since the largest label is 2. Forexample, the path (C6, C8, C10) has a wire delay of 2 and a gate delay of 3.Finally, after all the nodes have been labeled and all the cells have been generated, the laststep is to output a global netlist file which specifies the connection between the generated cells.Generate_Cell(n) takes a single output function of single-rail inputs and produces a DCVS cellwith two output functions of dual-rail inputs. Therefore, the original circuit just after clusteringmust be dualized, i.e., to correctly duplicate the appropriate rails so the connections to the dual-rail DCVS cells are correctly made. For example, in Figure 3.3, a netlist of 10 cells, 9 internalnets, 32 inputs, and 1 output becomes a netlist of 10 cells, 18 internal nets, 64 inputs, 2 outputs,plus one additional input for the precharge signal. Any DCVS circuit will have about twice asmany outputs and roughly twice as many inputs, as its single-rail counterpart.3.3 Second Step: Cell GenerationThe goal of the DCVS cell generation step is as follows. Given an arbitrary Boolean functionf, lay out the pull-down tree as well as the other transistors needed to construct a DCVS gatethat implements f. By nature of the DCVS structure, this also means implementing T. Inorder to facilitate automatic synthesis, a structured layout style is used. Figure 3.4 shows thelayout of a generated 3-input XOR DCVS gate. The shaded and dotted regions are the devicewells, while the unshaded and dotted regions intersecting them are polysilicon. The unshadedstriped regions are metal2, while the unshaded cross-hatched regions are metall. The solidleft—half right half/'/'/Y/Y/X//`/X/W\‘/`//s/'/y/X/X//`//‘'//,, //'/.;:/X/`/Y/Y/`/`•1...11•1EIMIGNINIIN^-^ amidimpg ' li 'Ii .4 I! . ^ampaungi a0 OWEN OEM Drill CA 2 Cl Monona asstr.1 01.11607.rsv lesmplea6,1i11..crl t llmrmr0mr° ° 0, ra A El El A El A r2 A ° A °..m. el Ci C3 C3ram,,`/‘/,`^/`/\/'/\/,`/\/\/,‘/\/'/\/,`/\/‘//,`/ /,`/`/\/,'/,`/,`//,`/, \/\/,‘/,`/,‘/,`/,`/,/ WM11„,411641iE12:0=1,4019.7Z..."1.1imgvalri^mom vaDEEM/,/,/,/,`//‘/,`/'/ENE KIELWAINWMCNYMErff&TOEWIKIIC,:lIMMORMEn^MEMERMIN^ r4ff&rAilMeGiriRtr&V"WtOWffiLrffer[fInIWOrlrtregemewejamrwerdrwanwerm0 Offtweitanwer....wmp A/elmaaammamaaawarrgsiiIIIMIirsirjmodummrammaxmaneausweessebnaffewassawasmaffai0rilwffesymmummcmgE3^1111willnmilW1=11MIlinro^ro^El^El^El^Elvia vAChapter 3. DMAP: A DCVS Technology Mapping and Layout Synthesis System^44regions are either contact or via. The cell consists of two parts: a standard left-half, and agenerated right-half. The two halves are connected by the power and ground rails and threerouting tracks: one for the connection to the source of the N-type precharge transistor; one toq; and one to q (refer to Figure 2.7).Figure 3.4: Layout of a generated DCVS 3-input XOR cell.The left-half contains nine transistors: the two precharge and cross-coupled P-type transis-tors, the single N-type precharge transistor, and the two static INVERTERS (one N-type transistorand one P-type transistor). This part is hand-designed and is the same for all the DCVS cells.The right-half of the cell is the layout of the DCVS pull-down tree. It includes a single strip oftransistors placed along the bottom of the cell and the routing tracks for transistor connectionsconsuming the remaining space. A one-dimensional transistor layout style is adopted. TheDMAP program takes several steps to generate such a layout, each of which will be elaboratedChapter 3. DMAP: A DCVS Technology Mapping and Layout Synthesis System^45on in the following subsections:• Given the logic function, obtain a DCVS pull-down transistor netlist.• Given the pull-down netlist, obtain a transistor layout path for this netlist.• Given the transistor layout path, obtain the track connections between transistors.• Output the physical layout.3.3.1 Obtaining the DCVS Logic TreeThere is a very natural relation between binary decision diagrams (BDD) [7] and the pull-downtree of a DCVS. BDDs are an alternative way of representing logic functions. A BDD is adirected acyclic graph with two paths directed away from every node: one for the node assertedtrue, and one for the node asserted false. The root of the tree is the function to be implemented,the leaves are 0 or 1 terminals, and the nodes are function variables. Bryant introduced orderedBDD [14, 15], which requires that for any path from the root to a leaf node, the variables musthave the same ordering. Figure 3.5(a) shows an unordered BDD, and Figure 3.5(b) is an orderedBDD. Most of the BDD research has focused on ordered BDDs. If unspecified, a "BDD" willbe assumed to be ordered. Although in this thesis we will use algorithms for ordered BDDs, itshould be noted the relationship between BDD and DCVS logic is just as valid for unorderedBDDs. Should future algorithms be developed for unordered BDDs, they can be used for ourpurpose as well.To obtain a DCVS pull-down tree for a function f, the basic idea is to turn the BDDrepresenting f "upside down". After this, the following conversions are made. The "1" terminalis labeled q, and the "0" terminal is labeled q. The "1" branch of a variable is labeled with thevariable name, while the "0" branch of the variable is labeled with the complemented variablename. The root node of the BDD is labeled precharge, and will be connected to the prechargetransistor that conducts to ground. The other nodes are labeled arbitrarily, and correspond toChapter 3. DMAP: A DCVS Technology Mapping and Layout Synthesis System^46(a)^(b)Figure 3.5: (a) Unordered BDD (b) Ordered BDDelectrical nodes in the pull-down circuit. Each edge of the BDD becomes a transistor controlledby one of the DCVS cell inputs.To illustrate the transformation of a BDD to the pull-down tree of a DCVS gate, considerFigure 3.6. Part (a) is a possible BDD for a 3-input XOR function, and part (b) is the same BDDturned upside down. Part (c) shows the relabeling, and part (d) shows the corresponding DCVSpull-down tree. Each BDD node converts to two transistors controlled by the complementedand uncomplemented form of the node variable. Given this straightforward transformation,one way of synthesizing a DCVS cell is to derive a small BDD representation for the desiredfunction and then carry out the transformation.The order that variables appear in a BDD can significantly affect the size of the BDD [15].Figure 3.7 shows two different BDDs for the same function x i x 2 + x3 x 4 , using different variableordering. Clearly, a poor initial choice of variable ordering can cause undesirable effects.Chapter 3. DMAP: A DCVS Technology Mapping and Layout Synthesis System^47f=a ebec(a) (b)SW* 1>_ prechargeprecharge(c ) (d) precharge--I(PR)Figure 3.6: Converting a BDD to a DCVS pull-down tree:(a) 3-Input XOR BDD.(b) "upside down" BDD.(c) Relabeled BDD.(d) Corresponding DCVS gate schematic.Chapter 3. DMAP: A DCVS Technology Mapping and Layout Synthesis System^48<3,4,1,2>^ <3,1,4,2>(a) (b)Figure 3.7: Example of BDDs for the same function using different variable orderingsDMAP creates the DCVS pull-down tree by building an ordered BDD of optimal variableordering using the algorithm given in [21]. During the clustering process, a simplified variableordering heuristic algorithm [32], implemented in Berkeley's MIS-11 logic synthesis program, isused to check if a node belongs in a cluster.3.3.2 Obtaining the Transistor Layout PathGiven the transistor netlist for the pull-down tree and the chosen one-dimensional transistorlayout scheme, the transistors must be laid out in an manner that will give good cell area andspeed, as well as facilitate automatic global placement and routing. The problem of finding thebest layout path to minimize the number of internal transistor connections is very difficult tosolve exactly. Hence, DMAP uses heuristics that appear to give good results.The basic idea is to perform a priority-based graph walk. The heuristic first tries to find aChapter 3. DMAP: A DCVS Technology Mapping and Layout Synthesis System^49path that will lead to any node that has already been encountered but which still has transistorsconnected to it that has not been placed. This seeks to localize all the connections made toa single node, so that as many different node connections can be shared on one track lineas possible. Given the above, the heuristic then tries to find a path along adjacent nodes tomaximize diffusion sharing. This is essentially equivalent to finding an Euler path [58] in thetransistor graph. If this cannot be done, a pseudo edge separation is inserted, which results inmaking the cell width greater. The pseudo code for the algorithm is given below. Initially, allnodes are marked as INACTIVE. A node become ACTIVE once a transistor connected to thenode has been placed.Find_Transistor_Path(n) {While (Edge Set is not empty) {If (degree(n)=0)Mark n as FINISHED;S = set of ACTIVE nodes;ElseAN = set of ACTIVE neighboursIf (AN is not empty)S = AN;ElseRA = set of ACTIVE nodes excluding n;if (RA is not empty)S = set of neighbours closest to any node in RA;ElseS = set of neighbours;n_next = node in S with smallest degree;Output edge (n,n_next);Remove edge (n,n_next) from Edge Set;Add n_next to the set of ACTIVE nodes;= n_next;}}In Figure 3.8 we illustrate the first steps in the algorithm for a 3-input XOR function assumingPath=<0,1,2>Active=10,1,21(c)(precharge)(q) (i)Path = <>Active=101(a)Path=<0,1>Active=10,11(b)(precharge)C)/Chapter 3. DMAP: A DCVS Technology Mapping and Layout Synthesis System^50Path=<0,1,2,6>Active=10,1,2,61(d)Path=<0,1,2,6,0>Active=10,1,2,6/(e)Path=<0,1,2,6,0,1,5,62,3,5,4,2>(f)Figure 3.8: Illustration of DMAP's transistor path finding algorithm.Chapter 3. DMAP: A DCVS Technology Mapping and Layout Synthesis System^51the path is started from node 0. ACTIVE nodes are marked with a "tick". Since node 0 hasdegree 2, and has no ACTIVE neighbours, the next node can either be node 1 or node 6—bothof which have the same degree. Arbitrarily, node 1 is chosen (Figure 3.8(b)). Node 1 has degree2, and does not have any ACTIVE neighbours. The set of active nodes, excluding node 1,only contains node 0, i.e., RA = {0}. There are two shortest paths from node 1 to node 0:< 1, 2,6, 0 > and < 1,5,6,0 >. The set of neighbours closest to any node in RA is hence {2, 5}.Arbitrarily, node 2 is chosen, and the Edge and ACTIVE Sets are updated (Figure 3.8(c)). Thenext nodes chosen are nodes 6 and 0 (Figure 3.8(d) and Figure 3.8(e)). At this point a pseudoedge separation must be added from node 0 to an ACTIVE node with smallest indegree—inthis case node 1. Continuing the procedure, the final path, < 0,1, 2,6,0,1,5,6,2,3,5,4,2 >, isshown in Figure 3.8(f). In summary, the 3-input XOR function is laid out as a DCVS cell inwhich the pull-down tree uses ten transistors with only two pseudo edge separations added.3.3.3 Obtaining the Track ConnectionsAfter the transistor layout path is obtained, the next step is to connect the transistors together.DMAP allows the upper portions of the cell's right-half for such connections. This is done ina series of tracks parallel to the transistor strip. For our standard design height of 75 designmicrons, a maximum of 7 tracks are allowed. As mentioned at the beginning of this section, 3of these tracks are already used up: one of the nodes to the pull-down precharge, one for thepositive function realization, and one for the complement of the function. This leaves 4 tracksfree for general connections. Once the transistor path is known, the connection points betweentransistors is fixed. The problem is to place these connections in a way that will share as manytracks as possible, hence using as few tracks as possible.DMAP achieves this by using the Left-Edge-Algorithm (LEA) first given by Hashimoto andStevens in [23], and later employed by Berkeley's YACR-II channel router [51]. The connectionsbetween transistors can be viewed as wire segments (Ii), whose ends [xi, yi] are connected byChapter 3. DMAP: A DCVS Technology Mapping and Layout Synthesis System^52a vertical piece of wire down to the transistor row. As in channel routing, the object is toplace these segments on the fewest number of tracks without overlap. The algorithm, whichguarantees minimum tracks, is essentially as follows:1. Search the set of intervals for the element which has the smallest lower bound.2. This element is assigned to the first track and removed from the interval set.3. Search the element in the interval set which has the smallest lower bound which is greaterthan the upper bound of the previously chosen interval.4. This interval is assigned to the track and removed from the interval set.5. Repeat Steps 3 and 4 until no intervals are left that satisfy the requirement.6. If the interval set is non-empty, go to Step 1 to start on the assigning for the next track.Figure 3.9(a) shows 7 track segments, sorted according to left-edge. Figure 3.9(b) illustratesthe tracks after LEA is applied. The horizontal axes of the graphs are the connection points onthe transistor path (which determine the cell-width); the vertical axes are the routing tracks(which determine the cell-height). The segments 1, 4 and 8 are chosen to put on the first track;segments 2 and 6 share the second track; and segments 3 and 5 are on the third track.3.3.4 Layout IssuesOnce the transistors and track connection positions are all known, the right-half portion of thecell can be added together with the standard left-half portion of the cell. The cell information iswritten onto a layout specification file in EDIF, which can be read into CADENCE. Figure 3.10shows some typical generated DCVS cell layouts.Our designs use a 1.2 micron, Northern Telecom CMOS double metal technology. The cellsare of standard height (75 design units) and variable width. This allows for 7 internal routing\ I \-\^ NNN^ \ Chapter 3. DMAP: A DCVS Technology Mapping and Layout Synthesis System^53TRACK 7TRACK 6TRACK 5TRACK 4 x4x6 y6x7^y7Y5x5 1 II y4TRACK 3 x3 I Y3 I I I TRACK 3 x3^y3 x5 y5I I I I I I I ITRACK 2 x2 1 y4 I I TRACK 2 x21^1 y3^x6 y6 ITRACK 1IJ21I I ITRACK 1I^I IIx1i4Lx4 II^IIII y4 x7 I y7I11^I I I I I I I^I^I^I^I^I I^I I^I I I^ITRANSISTOR PATH^TRANSISTOR PATH(a) (b)Figure 3.9: Illustration of Left-Edge-Algorithm.Figure 3.10: Some generated DCVS cell layouts.Chapter 3. DMAP: A DCVS Technology Mapping and Layout Synthesis System^54tracks in the right-half portion of the cell. Metal 1 is used for horizontal routing of tracksegments, while metal 2 is used for the vertical connection of these segments to the transistors.The function inputs are pulled to the bottom of the right-half portion of the cell. The prechargeand two output lines are at the left-half portion of the cell. Jumper pins are inserted wherepossible to facilitate global routing.The right-half portion of the cell can waste space in the routing region. Also, the one-dimensional array of transistors tends to produce long and short cells, which may make itdifficult for the placement and routing program. However, there are a number of advantagesfor us adopting this layout style.Having a standard left-half that is independent of the pull-down tree laid out in the right-half allows for flexible changes in the pull-up network and pull-down network. For example, itmay be desired to give the static INVERTERS stronger driving power (since DCVS cells typicallydrive many inputs). The size of the INVERTERS transistors can be changed without affectingthe rest of the system. On the other hand, new ways for laying out the pull-down tree can beeasily adopted into the system. This style allows for future extensions of our system for otherforms of dynamic logic.The one-dimensional transistor strip allows the cells to be "porous" with respect to globalrouting. In other words, the input pins act as feedthrough pins, where the input signals canpass through the pin from the top and bottom directions. Furthermore, the one-dimensionalstrip allows for transistor sizing. One of the advantages of cell synthesis is the freedom to varythe transistor sizes for better performance.Chapter 3. DMAP: A DCVS Technology Mapping and Layout Synthesis System^553.4 Third Step: Connecting Cells TogetherAt this stage, we have a set of DCVS layout cells specified in EDIF files, and a circuit netlist de-scription of the interconnection between these cells, also in EDIF format. After some CADENCE-specific preprocessing steps (Appendix B documents these "fixes" in full detail and glory), thelayout files are converted to EDGE cell representations. CADENCE then reads in the globalnetlist file, searches for the layout instances, and performs placement and global routing. Theresult is a complete DCVS circuit layout.Chapter 4Experimental ResultsThe system described in the previous chapter was implemented in the C-programming language,inside the MIS-II programming environment. Exhaustive experiments were done to evaluatethe performance of the system. The primary concern in DCVS synthesis is the final circuitarea, which consists of:• the total area of the generated cells, and• the total area of global routing.In DMAP, the area of each cell is determined by the number transistors in the layout stripand the number of pseudo-edges. The total area of generated cells is also affected by the numberof cells that are produced by the clustering function. The number of internal routing tracksused, give an indication of how well the cell area is being utilized. The global routing area isdetermined by the number of cells that need to be wired together, as well as the number ofnet connections between these cells. We now look at experimental results and examine some ofthese factors that influence the final area.4.1 Evaluating DMAP's Stand-Alone PerformanceDMAP was run with industrial benchmark circuits from MCNC (Microelectronics Corporationof North Carolina) and ISCAS (International Symposium on Circuits and Systems). Table4.1 and Table 4.2 summarize the statistics of these experiments. Column 2 shows the average56Chapter 4. Experimental Results^ 57number of transistors in the layout path, while Column 3 shows the number of pseudo-edges inthe layout path. Typically, each cluster contains approximately 20 transistors, which implementreasonably large sized functions. The small pseudo-edge to transistor count ratio in Column5 indicates that the path finding algorithm is able to find fairly good layout paths. Column4 shows the average number of routing tracks that are used for transistor connections in thatcell. It appears that most of the cells do not actually use up all the allowable tracks. Thissuggests that better use of space in the right-half portion of the cell can be made. We willdiscuss methods of improvement later.4.2 Comparing DMAP with Conventional MappingSince DMAP maps single rail networks onto dual rail circuits, there will always be an areapenalty for using DCVS. In order to understand what the size of the penalty is, we compareDCVS circuits produced by DMAP with static CMOS circuits mapped with MIS-II using theIJBC-Northern Telecom CMOS4 standard cell library.Table 4.3 and Table 4.4 summarize the cell and internal net count of the two approaches.The number of cells used by DMAP is considerably fewer than that used by the standard cellmethod. Often DMAP uses only one fifth as many cells. This means that although DCVSdesigns require dual rails, fewer cells need to be routed. Furthermore, the total number of netsthat need to be wired, i.e., unique signals that need to be routed to each cell, is typically lessthan twice the number of nets in standard mapping. In fact, a number of circuits have fewertotal nets. This is better than the intuitive assumption that dual-rail circuits require twice asmany wires as their single-rail counterparts.Table 4.5 and Table 4.6 summarize some cell area comparisons that were made between thetwo approaches. For most circuits, the total DCVS cell area is slightly more than twice thetotal cell area of the standard cells. This is a relatively good result considering the simplicityof the chosen cell layout style. It is likely that a more sophisticated layout style will improveChapter 4. Experimental Results^ 58this significantly.CircuitNameAverage No.Transistors inPulldown TreeAverage No.Pseudo EdgesAverage No.InternalRouting TracksPseudo-edge/TransistorRatioC1355 16.8 3.8 5.8 0.2C17 9.0 1.5 5.5 0.2C1908 16.9 3.9 5.9 0.2C2670 16.7 3.4 5.8 0.2C3540 17.4 3.5 6.0 0.2C432 18.1 4.0 6.1 0.2C499 17.8 3.8 5.8 0.2C5315 20.1 4.1 6.0 0.2C6288 20.3 4.2 6.2 0.2C7552 19.7 3.9 5.9 0.2C880 14.1 3.0 5.1 0.2Table 4.1: DMAP generated cells for the ISCAS benchmark circuits.4.3 After Placement and RoutingA few circuits were placed and routed with the CADENCE placement and routing tool. Theywere found to have a total circuit area much greater than the corresponding standard cellcircuit. For example, the DUKE2 DCVS circuit was found to be about 3.5 times larger in areathan the standard cell circuit (see Figure 4.1 and Figure 4.2). Note that the DCVS chip hastwice the number of pads as the single rail chip. The ruler scale beside the core area are thesame in both Figures, and can be used to estimate the core areas. However, it is meaninglessto make such comparisons, as the total area is very much dependent upon the placement androuting program, and which parameters the program optimizes for. Figure 4.3 and Figure 4.4,which show the core area of the circuit MISEX2, exemplify how much the placement and routingcan affect the circuit area.Chapter 4. Experimental Results^ 59CircuitNameAverage No.Transistorsin Layout PathAverage No.Pseudo-edgesin Layout PathAverage No.InternalRouting TracksPseudo-edgeto TransistorRatio5xpl-hdl 9.9 2.2 4.9 0.25xpl 15.1 3.2 5.6 0.29sym-hdl 14.7 3.1 5.7 0.29sym 18.6 4.5 5.6 0.29symml 13.8 3.1 5.7 0.2alupla 22.5 4.8 6.4 0.2bw 13.8 3.2 5.7 0.2conl 10.3 2.2 5.3 0.2duke2 21.6 5.4 5.3 0.3f2 10.0 3.0 6.0 0.3f51m-hdl 9.4 2.1 4.8 0.2f51m 16.6 3.8 5.8 0.2misexl 12.1 2.9 5.2 0.2misex2 19.8 4.9 5.0 0.2misex3 14.4 3.6 5.3 0.2misex3c 20.9 5.1 5.7 0.25xpl 15.1 3.2 5.6 0.2rd53-hdl 19.3 3.0 6.0 0.2rd53 19.3 3.0 6.0 0.2rd73-hdl 19.3 4.3 6.0 0.2rd73 21.0 5.0 6.2 0.2rd84-hdl 15.7 3.6 5.6 0.2rd84 20.8 5.1 6.0 0.2sao2-hdl 20.1 4.8 5.8 0.2sao2 18.1 4.4 5.4 0.2seq 20.0 4.6 6.2 0.2vg2 19.4 3.8 6.0 0.2z4ml-hd1 16.9 3.9 5.6 0.2z4m1 17.0 3.9 5.7 0.2Table 4.2: DMAP generated cells for the MCNC benchmark circuits.Chapter 4. Experimental Results^ 60CircuitNameMIS-IIIMapped (a) DMAPIMapped (b)I(b)/(a)i# cells # nets # cells # nets # cells # netsC1355 732 1250 134 1875 0.2 1.5C17 6 12 2 18 0.3 1.5C1908 643 1164 170 1862 0.3 1.6C2670 1036 1852 233 2648 0.2 1.4C3540 1478 2721 393 5172 0.3 2.0C432 260 473 131 1088 0.5 2.3C499 558 892 98 1744 0.2 2.0C5315 2033 3992 656 9182 0.3 2.3C6288 2402 4755 810 9510 0.3 2.0C7552 3036 6200 652 10544 0.2 1.7C880 435 760 148 1140 0.3 1.5Table 4.3: DMAP vs. MIS-II in cell and net count for ISCAS benchmarks.However, as technology allows for more than two levels of interconnect in the layout, itbecomes possible to perform routing on top of the cells. In this case, only the cell areas wouldbe of concern. DCVS uses considerably fewer cells than standard mapping. By focusing onimproving the layout style, the DCVS area penalty can be reduced.Chapter 4. Experimental Results^ 61CircuitNameMIS-IIIMapped (a) DMAPIMapped (b) (b)/(a)I# cells # nets # cells # nets # cells # nets5xpl-hdl 135 275 28 224 0.2 0.85xpl 82 130 25 253 0.3 1.99sym-hdl 138 225 57 479 0.4 2.19sym 203 462 44 664 0.2 1.49symml 161 383 48 504 0.3 1.3alupla 137 246 44 704 0.3 2.9bw 222 491 36 328 0.2 0.7conl 23 41 6 44 0.3 1.1duke2 465 1174 114 2120 0.2 1.8f2 24 44 4 32 0.2 0.7f51m-hdl 78 124 30 237 0.4 1.9f51m 143 273 27 286 0.2 1.0misexl 69 149 16 183 0.2 1.2misex2 89 234 17 286 0.2 1.2misex3 708 1674 229 2656 0.3 1.6misex3c 514 1244 117 1989 0.2 1.6rd53-hdl 56 88 3 30 0.1 0.3rd53 50 98 3 30 0.1 0.3rd73-hdl 90 141 24 230 0.3 1.6rd73 136 317 21 284 0.2 0.9rd84-hdl 122 190 45 410 0.4 2.2rd84 303 665 50 765 0.2 1.2sao2-hdl 307 538 172 2924 0.6 5.4sao2 131 305 33 478 0.3 1.6seq 2093 4456 685 7980 0.3 1.8vg2 87 159 30 293 0.3 1.8z4ml-hdl 69 140 23 220 0.3 1.6z4m1 54 92 21 93 0.4 1.0Table 4.4: DMAP vs. MIS-II in cell and net count for MCNC benchmarks.Chapter 4. Experimental Results^ 62CircuitNameMIS-II TotalCell Area (a)DMAP TotalCell Area (b)(b)/(a)5xpl-hdl 151 303 2.05xpl 287 385 1.39sym-hdl 252 852 3.49sym 47 821 1.79symml 401 684 1.7alupla 246 964 3.9bw 503 516 1.0conl 41 67 1.6duke2 1249 2447 2.0f2 44 45 1.0f51m-hdl 159 310 2.0f51m 293 453 1.5misexl 159 206 1.3misex2 234 336 1.4misex3 1774 3419 1.9misex3c 1256 2425 1.9rd53-hdl 100 56 0.6rd53 104 86 0.8rd73-hdl 159 461 2.9rd73 326 436 1.3rd84-hdl 211 720 3.4rd84 685 1033 1.5sao2-hdl 538 3433 6.4sao2 317 602 1.9seq 5263 13590 2.6vg2 405 571 1.4z4ml-hdl 102 393 3.9z4m1 122 360 3.0Table 4.5: DMAP vs. MIS-II in cell area for MCNC benchmarks.Chapter 4. Experimental Results^ 63Figure 4.1: MIS-II mapped chip.Chapter 4. Experimental Results^ 64millkommumunommossommomeEmmaM ^'^ =-- I.^ 011■1 li 1^ 111I INIIIIMI^1 1^ , 1111NI illill ' '1^,^,III 1 ,^II, ■•^1^1 N^IIIII• m„....m,..i ..._... ,...• .• i 1 •••■• s•■ • il : . SW/ .. lie Ill ommor,......MiliMilMIII.1.= ‘;.;:::::: ".....:::: .1::77:;■■: 1111111 .11.11.=.11=1111.1EMI. '' • . • . • •.... •••. .'•••• . •• • .! ' " ” •^ al-"- _•2:.1■•■•■1111.1■11■1■ENI.: --LIM IMINIMANTIC.: .:.::77.7::::::,177,. "7:1 Illill PI^PIPPIIPPOIL ■11■1^11111111111111111IN■■■■■■■■■■■■■■■■■■■■■■■■■■■■Figure 4.2: DMAP mapped chip.11:1- 111111,i^:1 1 111I him^- - • - .41■1111 WIN 11= I • Tr.. .^_I 11;^= .r; „;=--Pum =!,Chapter 4. Experimental Results^ 65Figure 4.3: Core area of a MIS-II mapped chip.^MIll - - =^ - - -^- -^EEEEl i -H=FIF._■-■ -■:■74911_11... 1 111111 01V-m r-4 1g1MIL-77 -7 -_-_-=,--1111111111111111111111111111111111111111MM11111111111111-111-1111- 11111111111111•^111111111111.11^ WIN^Chapter 4. Experimental Results^ 6611011111E1111111MMIIIMIUM----Figure 4.4: Core area of a DMAP mapped chip.Chapter 4. Experimental Results^ 67CircuitNameMIS-II TotalCell Area (a)DMAP TotalCell Area (b)(b)/(a)C1355 1250 2272 1.8C17 12 20 1.6C1908 1164 2903 2.5C2670 1852 3895 2.1C3540 2721 6806 2.5C432 473 2466 5.2C499 1012 3409 3.4C5315 4103 12936 3.2C6288 4755 16135 3.4C7552 6350 12598 2.0C880 811 2143 2.6Table 4.6: DMAP vs. MIS-II in cell area for ISCAS benchmarks.Chapter 5Conclusions and Future WorkThis thesis addresses the problem of using CAD to implement combinational logic with DCVS.First, we looked at the design automation process for VLSI, and more specifically, the problemsof technology mapping and layout synthesis. We examined dynamic DCVS logic, its operation,its distinguishing properties, and DCVS applications. The DCVS synthesis task was analyzed,and we outlined a system which produces DCVS circuits. DMAP, a technology mapping layoutsynthesis system for DCVS was presented in detail. The system performs the tasks of circuitclustering, transistor path finding, and cell layout connecting. Experimental results were given,and their implications discussed.5.1 Main ContributionsThe major contributions of this thesis are summarized and highlighted as follows.• The problem of DCVS circuit synthesis was formulated. DCVS circuits are difficult tocreate because of the intricate internal connections and the dual-rail inputs and outputsassociated with each cell.• We outlined a solution to this problem by introducing an integrated system called DMAP.By generating as large as possible DCVS cells, the final circuit uses considerably less cellsthan if conventional technology mapping were used. As a result, the area penalty of usingDCVS can be reduced because there are less wires to route in the circuit.68Chapter 5. Conclusions and Future Work^ 69• The DMAP system was designed and implemented. This system takes any combinationallogic block and outputs a DCVS circuit layout. More specifically, it performs the tasks oftechnology mapping and layout synthesis. Large DCVS circuits can be produced.• An algorithm was developed for synthesizing the layout of the DCVS pull-down block ina systematic and straightforward manner. In fact, this algorithm can be applied to layoutthe pull-down block of any dynamic logic, such as NORA and DOMINO. At present, thesynthesis of dynamic cell layout is still unchartered territory. DMAP's layout style canbe easily extended to synthesize other types of dynamic cells.• An algorithm for circuit clustering was developed based on Lawler's node-labeling pro-cedure. It achieves minimum delay through the network and partitions the circuit intomodules of sizable functions, subject to a set of constraints. This is a very useful algo-rithm that can be extended for use in other areas, such as the FPGA (field programmablegate array) technology mapping problem.5.2 Future DirectionsDCVS circuits are costly in terms of total area. However, we believe that the performance ofDMAP can be improved through further work. In obtaining the DCVS pull-down tree, orderedBDDs were used. It is possible that unordered BDDs may give better results. Also, othermethods of obtaining the DCVS pull-down tree, such as that in [16] can be experimented with.It is unclear which decomposed circuit form would better facilitate clustering. In DMAP, 2-input NANDS and 2-input NORS were used; other circuit decompositions should be experimentedwith as well. Using a more sophisticated layout style can improve the total cell area results.One of the benefits with using a layout synthesis approach, rather than a standard cellsapproach for technology mapping, lies in the flexibility in cell style. In particular, a layoutsynthesis system can be modified to alter the sizes of different transistors in order to speed upChapter 5. Conclusions and Future Work^ 70the circuit. In fact, the circuit speed can often be dramatically improved by increasing certaintransistor widths. For the layout style we have chosen, it is easy to see how the output drivingINVERTERS can be made wider without significantly increasing the cell area. Furthermore, it isalso possible to reduce the delay in the pull-down network, and thus decrease the delay in theDCVS cell itself, by modifying the width of some of the input transistors. This later modificationis also possible without a significant area penalty. For a practical DCVS technology mapper,this "size modification" addition to the mapper is necessary if the maximum speed advantageof DCVS logic is to be fully utilized.Combining the ideas of dynamic logic synthesis, and the partitioning of circuits into blocksfor programmable arrays, one interesting direction of development would be "dynamic pro-grammable gate arrays". It is possible that the right-half portion of the cell can be replacedwith a programmable gate array that can allow for any DCVS function to be programmed.Each "configurable logic block" will have a standard pull-up layout section to implement thedynamic precharging functions of the gate. The current one-dimensional transistor row layoutcan be extended to a structured array approach with input rails and routing rails. The circuitcan be clustered into DCVS logic blocks in much the same way partitioning needs to be donefor FPGAs. Also, it may be possible to consider the alignment of input variables between thearrays to facilitate global routing.Dynamic devices such as DCVS are growing in importance and potential. As asynchronoustheory is further developed and more discoveries into dynamic behavior are made, the use ofDCVS will become increasingly popular. We need intelligent and flexible computer tools tohelp build combinational dynamic logic blocks. This thesis is an advance in that direction.Bibliography[1] Xilinx Programmable Gate Array User's Guide. Xilinx, Inc., 1988.[2] Berkeley Logic Interface Format. University of California, Berkeley, 1989.[3] Cadence Design Systems Users Manual. 1989.[4] Electronic Design Interface Format 2 0 0. 1989.[5] VHDL Hardware Description Language IEEE Standard 1076/87. 1992.[6] A.V. Aho and S.C. Johnson. Optimum code generation for expression trees. ACM, 7:488—501,1976.[7] S. B. Akers. Binary decision diagrams. IEEE Transactions on Computers, CAD-27(6):509-516, August 1978.[8] R. Bar-Yehuda, J.A. Feldman, R.Y. Pinter, and S. Wimer. Depth-first-search and dy-namic programming algorithms for efficient CMOS cell generation. IEEE Transactions onComputer-Aided Design, 8(7):737-743, July 1989.[9] J. Bhasker and S. Sahni. Optimal linear arrangement of circuit components. Journal VLSIComputer Systems, pages 87-109,1987.[10] R. K. Brayton, R. L. Rude11, A. L. Sangiovanni-Vincentelli, and A. R. R. Wang. MIS: Amultiple-level logic optimization system. IEEE Transactions on Computer-Aided Design,CAD-6(6):1062-1081, November 1987.[11] R.K. Brayton, G. Hachtel, C. McMullen, and A. Sangiovanni-Vincentelli. Logic Minimiza-tion Algorithms for VLSI Synthesis. Kluwer Academic Publishers, 1984.[12] R.K. Brayton, G.D. Hatchel, and A. Sangiovanni-Vincentelli. Multilevel logic synthesis.Proceedings of the IEEE, 78(2):264-300, February 1990.[13] R.K. Brayton and C. McCullen. The decomposition and factorization of boolean expres-sions. IEEE International Symposium on Circuits and Systems, pages 49-54, May 1982.[14] R. Bryant. Symbolic manipulation of boolean functions using a graphical representation.Design Automation Conference, pages 688-694, November 1985.[15] R. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Transactionson Computers, CAD-35(8), August 1986.71Bibliography^ 72[16] K.M. Chu and D.L. Pulfrey. Design procedures for differential cascode voltage switchcircuits. IEEE Journal of Solid-State Circuits, SC-21(6):1082-1087, December 1986.[17] K.M. Chu and D.L. Pulfrey. A comparison of CMOS circuit techniques: Differential cas-code voltage switch logic versus conventional logic. IEEE Journal of Solid-State Circuits,SC-22:528-532, August 1987.[18] E. Detjens, G. Gannot, R. L. Rudell, A. L. Sangiovanni-Vincentelli, and A. Wang. Tech-nology mapping in MIS. In Design Automation Conference, pages 116-119,1987.[19] D. Gajski (Editor). Silicon Compilation. Addison-Wedison Publishing Company, Inc.,1988.[20] P.C. Elmendorf. KWIRE: A multiple-technology user-reconfigurable wiring tool forVLSI.IBM Journal of Research and Development, 28:603-612, 1984.[21] S.J. Friedman and K.J. Supowit. Finding the optimal variable ordering for binary decisiondiagrams. IEEE Transactions on Computers, CAD-39(5):710-713, May 1990.[22] N.F. Goncalves and H.J. DeMan. NORA:a racefree dynamic CMOS technique for pipelinedlogic structures. IEEE Journal of Solid-State Circuits, pages 261-266,1983.[23] A. Hashimoto and J. Stevens. Wire routing by optimizing channel assignment within largeapertures. Design Automation Conference, 1971.[24] L.G. Heller, W.R. Griffin, J.W. Davis, and N.G. Thomas. Cascode voltage switch logic: Adifferential CMOS logic family. IEEE International Solid State Circuits Conference, pages16-17, February 1984.[25] D.D. Hill. Sc2 - a hybrid automatic layout system. IEEE International Conference onComputer-Aided Design, pages 172-174,1985.[26] T.C. Hu. Combinatorial Algorithms. Addison Wesley, 1982.[27] G. Jacobs and R.W. Brodersen. Self-timed integrted circuits for digital signal processingapplications. VLSI Signal Processing III, IEEE PRESS, November 1988.[28] K. Keutzer. Dagon: Technology binding and local optimization by dag matching. DesignAutomation Conference, pages 341-347,1987.[29] K. Keutzer. Impact of library size on the quality of automated synthesis. Design Automa-tion Conference, pages 120-123,1987.[30] R. H. Krambeck, C. M. Lee, and H-F. S. Law. High-speed compact circuits with CMOS.IEEE Journal of Solid-State Circuits, SC-17(3):614--619, June 1982.[31] E. Lawler, K. Levitt, and J. Turner. Module clustering to minimize delay in digital net-works. IEEE Transactions on Computers, pages 47-57, January 1969.Bibliography^ 73[32] S. Malik. Logic verification using binary decision diagrams in a logic synthesis environment.IEEE International Conference on Computer-Aided Design, November 1988.[33] R.L. Maziasz and J.P. Hayes. Layout optimization of CMOS functional cells. DesignAutomation Conference, pages 544-551, 1987.[34] P. C. McGeer. Robust path delay-fault testability for dynamic CMOS circuits. In IEEEInternational Conference on Computer Design, 1991.[35] P. C. McGeer and Robert K. Brayton. Efficient algorithms for computing the longestviable path in a combinational network. In Design Automation Conference, 1989.[36] P. C. McGeer and Robert K. Brayton. Provably correct critical paths. In DecennialCalTech VLSI Conference, 1989.[37] P. C. McGeer and Robert K. Brayton. Hazard prevention in combinational circuits. InHawaii International Conference on the System Sciences, 1990.[38] P. C. McGeer and Robert K. Brayton. Timing analysis on precharged-unate networks.Design Automation Conference, pages 124-129, 1990.[39] P. C. McGeer and Robert K. Brayton. Integrating Functional and Temporal Domains inLogic Design: The False Path Problem and its Implications. Kluwer Academic Publishers,1991[40] P. C. McGeer, Robert K. Brayton, Richard L. Rudell, and Alberto L. Sangiovanni-Vincentelli. Extended stuck-fault testability for combinational networks. MIT Conferenceon Advanced Research in VLSI, pages 239-259, 1990.[41] T. Meng. Synchronization Design for Digital Systems. Kluwar Academic Publishers, Nor-well, Massachusetts, 1991.[42] T. Meng. private communication. 1992.[43] T. Meng, R.W. Brodersen, and D.G. Messerschmitt. Implementation of high sampling rateadaptive filters using asynchronous design techniques. VLSI Signal Processing III, IEEEPRESS, November 1988.[44] R. Muller and T. Lengauer. Linear algorithms for two CMOS layout problems. Proc.Aegean Workshop on Computing, July 1986.[45] R. Murgai, R.K. Brayton, and A. Sangiovanni-Vincentelli. On clustering for minimum de-lay/area. IEEE International Conference on Computer-Aided Design, pages 6-9, Novem-ber 1991.[46] R. Murgai, Y. Nishizaki, N. Shenoy, R.K. Brayton, and A. Sangiovanni-Vincentelli. Logicsynthesis for programmable gate arrays. Design Automation Conference, pages 620-625,1990.Bibliography^ 74[47] R. Nair, A. Bruss, and J. Reif. Linear Time Algorithms for Optimal CMOS Layout.Elsevier, New York, 1985.[48] N. Newell. Asynchronous control protocol for a DCVS logic micropipeline. unpublishedmanuscript, 1991.[49] T. Nogatch and T. Hedges. Automated design of CMOS leaf cells. VLSI Systems Design,pages 66-78, November 1985.[50] C.J. Poirier. Excellerator: Custom CMOS leaf cell layout generator. IEEE Transactionson Computer-Aided Design, 8(7):744-755, July 1989.[51] J. Reed, A. Sangiovanni-Vincentelli, and M. Santomauro. A new symbolic channel router:YACR2. IEEE Transactions on Computers, CAD-4(3):208-219, July 1985.[52] E.M. Reingold, J. Nievergelt, and N. Deo. Cominatorial Algorithms. Prentice-Hall Inc.,Englewood Cliffs, New Jersey, 1977.[53] M. Sarrafzadeh. Tree placement in cascode-switch macros. Integration, the VLSI Journal,(5):127-139, 1991.[54] G. Saucier and G. Thuau. Sytematic and optimized layout of mos cells. Design AutomationConference, 1985.[55] C.C. Su and M. Sarrafzadeh. Optimal gate-matrix layout of CMOS functional cells. Inte-gration, the VLSI Journal, 9:3-23, 1990.[56] I.E. Sutherland. Micropipelines. ACM, pages 729-738, 1989.[57] H.J. Touati, H. Savoj, and R.K. Brayton. Delay optimization of combinatioal logic circuitsby clustering and partial collapsing. IEEE International Conference on Computer-AidedDesign, pages 118-121, November 1991.[58] T. Uehara and W.M. vanCleemput. Optimal layout of CMOS functional arrays. IEEETransactions on Computers, C-30(5):305-312, May 1981.[59] T. Williams. A zero-overhead self-timed iterating ring implementating 54b division with45ns to 16Ons latency. IEEE VLSI Workshop, Orlando Florida, 1991.[60] T. Williams. Analyzing and improving throughput in self-timed circuits and rings. Pro-ceedings of TAU '92, 1992.[61] S. Wimer, R.Y. Pinter, and J.A. Feldman. Optimal chaining of CMOS transistors in afunctional cell. IEEE Transactions on Computer-Aided Design, CAD-6:795-801, 1987.[62] 0. Wing. Automated gate matrix layout. IEEE International Symposium on Circuits andSystems, pages 681-685, 1982.[63] 0. Wing. Interval-graph-based circuit layout. IEEE International Conference onComputer-Aided Design, pages 84-85, 1983.Bibliography^ 75[64] E.J. Yoffa and P.S. Hauge. Acorn: A system for CVS macro design by tree placement andtree customization. IBM Journal of Research and Development, 28:596-602, 1984.[65] E.J. Yoffa and P.S. Hauge. Acorn: A local customization approach to DCVS physicaldesign. Design Automation Conference, pages 32-38, 1985.Appendix AAlgorithm for Finding an Euler PathThe following is a C-like pseudo-code algorithm for finding an Euler Path in a graph.typedef struct vertex_struct vertex_t;struct vertex_struct {char *name;linked list of edges Edges;};typedef struct edge_struct edge_t;struct edge_struct {vertex_t *vertexl, *vertex2;char *label;linked list pointers *list_entries[3]; /* One on each vertex, and one globally */};linked list of edges global_edges;edge_t *traverse_arc(m)vertex_t *m;{edge_t *edge;edge = first arc incident on m to any node with at least two unused edges;if there are no such edges thenedge = first arc incident on mdelete edge from all three lists it appears on;return edge;}#define other_terminal(edge, vertex) (vertex == edge->vertexl?edge->vertex2:edge->vertexl)euler_path(n)vertex_t *n;{edge_t *edge;edge = traverse_arc(n);m = other_terminal(edge, n);push edge on path;while(m != n) {edge = traverse_arc(m);push edge on path;76Appendix A. Algorithm for Finding an Euler Path^ 77m = other_terminal(edge, m);}while there are unused edges (edges left on global) {foreach edge on path {if either vertex of edge has any unused edges {let m be that vertex;pathl = euler_path(m);insert pathl on path after edge;}}}return path;}Appendix BCADENCE Specific NotesThis document describes the processing steps that need to be done in CADENCE to convert the EDIFfiles output from the DMAP clustering program, to final chip layout incorporating of all the cells. Thereader is assumed to be familiar with CADENCE and CADENCE'S programming language SKILL.The SKILL command ic("info-filename") takes as input a "info-filename" which contains the clus-tering summary file produced by the clustering program for each circuit. The command consists of twosteps:1. Fix Cells; To fix the individual cells so that they can be used for our purposes in CADENCE.2. Make Chip; To create the chip out of those cells, and the lobal circuit netlist.B.1 Fixing Cells1. Fix jumper Pins: fixPin(blockname)In CADENCE, I0 pins are drawn in metal, and are distinguished as "terminals" by using the "pin"layer of that metal. In EDIF, IO pins are specified terminal ports, but there is no way to specifythat the pin contains a geometry of pin layer (the clustering program simply gives it a bogus"wire" layer).The program edif2edge converts the EDIF file to EDGE Layout Rep. fixPin() searches for allthe terminals in the Layout Rep., and creates the geometry of the metal piece. Access directionspecifies which directions the global router can access the pin or port. These are added to thepins.All jumpers are named "jumperX" where X is the unique number in the cell. Jumpers are alsodeclared as ports in EDIF, of layer "wire". They are searched for the string "jumper" in the name,and then made a pin.2. Generate Symbol Rep: gensCell(blockname)This function opens the Layout Rep of the cell, reads in the input and output pin names, andcreates lists for them. These two lists are used in calling the SKILL function GenS which creates78Appendix B. CADENCE Specific Notes^ 79a symbol with this specified I0 pins.3. Generate Abstract Rep: GenAb(blockname)This function generates an Abstract Rep from the blockfile using the function abgen.4. Fix abstract RepAfter GenAb() is run, we need to change the "Representation/type" property of the Abstract Repto "standard" , so that it can be properly recognized.B.2 Making the ChipAfter all the cells have been fixed, the following is done:1. Run "edif2edge" on the global EDIF file. This creates a Netlist Rep of the circuit.2. Run the SKILL function "globalize" on the Netlist Rep,so that CADENCE will join the vdd and gnd with the global vdd and gnd during P&R.3. Generate a Symbol Rep for the circuit from the Netlist Rep. The function gensCct() scans theinput block for terminals, and creates a list for the input and output pins. This is used to call thefunction GenS() to generate Symbol.4. Generate a new schematic that will be the chip, containing the circuit netlist (placed as a Symbol),plus all the I0 pads.The function AddPads(block "symbol" chip "schematic") opens a new Rep called chip/schematic.In this Rep, it creates and instance of block/symbol. Then it scans for all terminals in block/symbol,and adds input and output pads as required, including the gnd and vdd pads.