UBC Theses and Dissertations


Design and analysis of active and passive decoupling capacitors for on-chip power supply noise management 2009

Full Text

DESIGN AND ANALYSIS OF ACTIVE AND PASSIVE DECOUPLING CAPACITORS FOR ON-CHIP POWER SUPPLY NOISE MANAGEMENT

by

XIONGFEI MENG

B.A.Sc., The University of British Columbia, 2004
M.A.Sc., The University of British Columbia, 2006

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES (Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

July 2009

© Xiongfei Meng, 2009

ABSTRACT

On-chip decoupling capacitors (decaps) in the form of MOS transistors are widely used to reduce power supply noise in both standard-cell blocks and white spaces between blocks. This research provides guidelines for layouts of decaps that properly trade off high-frequency response, electrostatic discharge (ESD) reliability and gate tunneling leakage for use within standard-cell blocks in ASIC designs in 90nm and 65nm CMOS technologies. A simple but effective metric is developed to determine the optimal decap layout based on the frequency response. Novel active designs are also presented. If an IR-drop violation (hot spot) is found after the physical design is completed, it is usually difficult to implement a quick fix to the problem. In this dissertation, the use of an active decap in white-space areas as a drop-in replacement for passive decaps is investigated to provide noise reduction for these “hot-spot” problems found late in the design process. A modified active decap design is proposed for ASIC applications operating up to 1GHz, and the use of latch-based comparators provides a better power-delay trade-off. Measurement results from a test chip show that the noise reduction using active decaps improves as operating frequency increases, and provides between 10%-20% noise reduction at 200MHz-1GHz over its passive counterpart. The concept of active decap is further extended to achieve lower supply noise.
It is found that an active decap with a stack height of three (i.e., number of pieces switching) provides the best noise reduction if the supply noise level is between 7%-14%, but a stack height of two is best if the noise level is between 14%-16%. In addition, a novel charge-borrowing decap circuit is introduced which outperforms all forms of active decaps for a fixed area in terms of removing local hot spots.

TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgments
Chapter 1 Introduction
1.1 Motivation
1.2 Research Objectives
1.3 Organization of the Dissertation
Chapter 2 Background
2.1 Introduction
2.2 Decoupling Capacitor Basics and Design Challenges
2.3 Thin-Oxide Gate Tunneling Leakage
2.4 Electrostatic Discharge Reliability in Decap Design
2.5 Standard-Cell Decap Layout and Placement
2.6 Metric for Power Supply Noise Management
2.7 Summary
Chapter 3 Passive Decoupling Capacitor Design
3.1 Introduction
3.2 High-Frequency Response of Decoupling Capacitors
3.3 Cross-Coupled Decoupling Capacitor Designs
3.4 Summary
Chapter 4 Active Decoupling Capacitor Design
4.1 Introduction
4.2 Active Decoupling Capacitor Analysis and Design
4.2.1 Active Decap Concept and Design Considerations
4.2.2 Overall Active Decap Architecture
4.2.3 Design Specifications
4.2.4 Latch-Based Comparator Design
4.3 Chip Design and Experimental Results
4.3.1 Test Chip Setup
4.3.2 Test Chip Simulations
4.3.3 Test Chip Measurements
4.3.4 Measurement Results on One Typical Chip
4.4 Active Decap Size and Placement
4.5 Summary
Chapter 5 Generalized Active Decap and Charge-Borrowing Decap
5.1 Introduction
5.2 Extended Active Decoupling Capacitor
5.2.1 Optimal Stack Height n
5.2.2 Design and Layout of n=3 Extended Active Decap
5.2.3 Simulation Results
5.3 Charge-Borrowing Decap (CBD)
5.3.1 Charge-Borrowing Decap Concept
5.3.2 “Clk” Signal Generation
5.3.3 Design of Charge-Borrowing Decap
5.3.4 Simulation Results
5.4 Test Chip Setup and Measurement Results
5.5 Summary
Chapter 6 Conclusions and Future Work
6.1 Summary and Conclusions
6.2 Contributions in this Dissertation
6.3 Future Work
Appendix: Comparator Design Fundamentals
References

LIST OF TABLES

Table 1.1: Comparison on active and passive decap implementations
Table 3.1: Optimal number of fingers for different frequency ranges
Table 3.2: Comparison of the passive decap designs and their gate leakage current
Table 4.1: Design specifications of the active decap
Table 4.2: Transistor sizes of the comparators
Table 4.3: Simulated switching circuit design specification comparison
Table 4.4: Comparator delay td and delay difference Δtd in different corners
Table 4.5: Measured active decap performance for different process corners
Table 4.6: Comparison between equation and simulated result after correlation
Table 4.7: Active decap bandwidth versus average comparator delay under process corners
Table 5.1: Optimal stack height n selection based on the supply noise k (from formula)
Table 5.2: Optimal stack height n selection based on the supply noise k (from simulation)
Table 5.3: High-level comparison among passive, active, and charge-borrowing decaps

LIST OF FIGURES

Figure 1.1: Effectiveness of on-chip decaps: (a) without decaps, the power supply noise level can be high, compared to (b) where the noise level is low due to the use of decaps
Figure 2.1: Decoupling capacitor implemented using an NMOS device
Figure 2.2: Cross-coupled decap schematic [21]
Figure 2.3: Active decap concept shown in (a) and two previous designs from (b) [37][38] and (c) [39]
Figure 2.4: Gate leakage current versus gate area
Figure 2.5: Gate leakage current density Jleak versus oxide thickness tox
Figure 2.6: Complete ESD protection scheme
Figure 2.7: Simulation setup for ESD analysis [24]
Figure 2.8: Sample layout of standard-cell N+P decap (a) with one finger and (b) with two fingers
Figure 2.9: DVDavg and DVDm: metric used to evaluate DVD profiles
Figure 3.1: (a) A standard cell decap can be implemented as a NMOS in parallel with a PMOS device. The corresponding layout is shown in (b)
Figure 3.2: A decap can be implemented as a NMOS device and modeled as a lumped-RC circuit with effective resistance and effective capacitance as functions of frequency, f
Figure 3.3: The circuit setup to extract the effective resistance and the effective capacitance values from an ac analysis
Figure 3.4: Plots of Ceff and Reff for three device sizes (W x L): 10x10µm, 15x5µm, and 5x15µm
Figure 3.5: Plots of Ceff and Reff for three NMOS devices (HSPICE versus model)
Figure 3.6: The effective capacitance, Ceff(f), of NMOS and PMOS decaps in 90nm for different numbers of fingers in a fixed area of Y=2µm and X=9µm
Figure 3.7: The effective capacitance, Ceff(f), of NMOS and PMOS decaps in 90nm for different numbers of fingers in a fixed area of Y=2µm and X=9µm
Figure 3.8: Ceff(f) and Reff(f) comparison of fixed-area standard decap and cross-coupled decap: same MOS device sizes but different poly connections
Figure 3.9: Sample layouts of cross-coupled decap cells for (a) 3N-4P (b) 8N-9P
Figure 3.10: Sample layouts of improved decap cells for (a) 1N-9P (b) 1N-16P
Figure 3.11: Frequency response of various cross-coupled designs
Figure 4.1: Active decap concept and its MOS implementation
Figure 4.2: The reductive factors f and g for the boosted voltage as a function of (a) “on” resistances of the switches, R, and (b) leakage due to the size of decap Cdecap
Figure 4.3: Active decap architecture
Figure 4.4: Difference in the input voltages of the comparators as a function of supply noise k
Figure 4.5: The reductive factor h for the boosted voltage as a function of delay difference Δtd
Figure 4.6: Complementary comparator design: (a) n-type input for the top comparator, and (b) p-type input for the bottom comparator
Figure 4.7: (a) DC and (b) AC (compensated) characteristic curves for the two-stage latch-based comparator design (n-type input shown)
Figure 4.8: AC characteristic curves for the three designs
Figure 4.9: Test chip setup
Figure 4.10: Layout of active decap showing the relative size of the components
Figure 4.11: Final voltage a as a function of sensing and switching circuitry area overhead x
Figure 4.12: Annotated test chip microphotograph
Figure 4.13: Simulated VDD voltage (on a 500MHz clock) with active decap on and off
Figure 4.14: Simulated VDD voltage with active decap on for different process corners
Figure 4.15: Grouped scatter plot for active decap noise reduction for the tested sample chips
Figure 4.16: Comparator delay td and delay difference Δtd as a function of supply noise k
Figure 4.17: Simulated average VDD voltage with active decap on and off versus clock frequency for (a) slow, (b) typical, and (c) fast process corners
Figure 4.18: Measured results (on a 500MHz clock) for (a) active decap on and (b) plotted comparison between active decap on and off
Figure 4.19: Measured (0.2 - 1GHz) and simulated (1 - 2.5GHz) average VDD voltage with active decap on and off versus clock frequency
Figure 4.20: Simulated average VDD noise per clock cycle versus normalized decap size
Figure 4.21: Power supply noise reduction difference from active decap and passive decap with area overhead from switching circuit of active decap
Figure 4.22: Improvement on average VDD noise for using active decaps in different placement locations by varying Rdist and Rmesh
Figure 5.1: Active decap concept showing different stack height: (a) n=2, (b) n=3, and (c) n=4
Figure 5.2: MOS implementation of the extended active decap (n=3)
Figure 5.3: Final voltage aVDD as a function of stack height n with k varying (fixed area)
Figure 5.4: Final voltage aVDD as a function of k with different stack height n (from formula)
Figure 5.5: Extended active decap (n=3) architecture
Figure 5.6: Layout of extended active decap (n=3) showing the relative size of the components
Figure 5.7: Simulated VDD voltage with extended active decap (n=3) on for two different k levels
Figure 5.8: Average VDD voltage as a function of k with different stack height n (from simulation)
Figure 5.9: Charge-borrowing decap concept shown in (b), compared to a passive decap in (a)
Figure 5.10: Diode inserted charge-borrowing decap showing the states of the clocking signal: (a) Clk at 0, (b) Clk rises to VDD, and (c) Clk falls back to 0
Figure 5.11: Possible implementations of charge-borrowing decap with (a) NMOS-formed diodes and (b) PMOS-formed diodes
Figure 5.12: Improved implementation of charge-borrowing decap with PMOS formed diodes
Figure 5.13: Generation of the boosted voltage on node B2
Figure 5.14: “Clk” signal generation using ring oscillator
Figure 5.15: Buffer size determines the edge delay while the buffer chain (with ring oscillator) consumes dynamic power
Figure 5.16: Complete circuit diagram of charge-borrowing decap
Figure 5.17: Simulated VDD voltage (on a 1GHz clock) with a CBD on and off (best case)
Figure 5.18: Simulated VDD voltage (on a 1GHz clock) with a CBD on and off (worst case)
Figure 5.19: Simulated average VDD voltage as a function of k showing the case of CBD
Figure 5.20: Test chip 2 setup
Figure 5.21: Layout of charge-borrowing decap showing the relative size of the components
Figure 5.22: (a) Annotated microphotograph and (b) layout of the second test chip
Figure 5.23: Scatter plot comparing average VDD for the tested sample chips
Figure 5.24: Superimposed waveforms showing a CBD and a passive decap on a 1GHz clock. In the top panel of the figure, the upper trace is when the CBD is on, while the lower trace is when the passive decap is on
Figure 5.25: Measured average VDD voltages at different clock frequencies
Figure A.1: A differential input, single-ended output comparator symbol
Figure A.2: DC transfer characteristics of (a) an ideal comparator and (b) a practical comparator with finite gain and offset voltage
Figure A.3: Circuit schematics showing positive feedback: (a) a basic differential pair with diode-connected load, (b) a gain-enhanced differential pair, and (c) a differential pair using positive feedback to provide increased gain

ACKNOWLEDGMENTS

I would like to express my gratitude to my academic supervisor, Dr. Resve Saleh, whose expertise, understanding, encouragement, and support, added significantly to my graduate experience. I appreciate his profound and vast knowledge in many areas, both within and outside of the scope of my research. I would like to thank the exam committee, Dr. Steve Wilton, Dr. David Pulfrey, Dr. Mark Greenstreet, Dr. John Madden, Dr. Lutz Lampe and Dr. Tim Salcudean, for reviewing this dissertation and providing valuable feedback. A very special thanks goes to my colleagues and friends in the SoC group, for their technical advice and kindness. More specifically, I would like to thank Dr. Shahriar Mirabbasi, Dr. Roberto Rosales, Dipanjan Sengupta, Dr. Mehdi Alimadadi, Dr. Samad Sheikhaei, Jeff Mueller, and Sohaib Majzoub. I also acknowledge Dr. Karim Arabi of Qualcomm and Asad Shayan of PMC-Sierra for their suggestions and help in this study.
I recognize that this research would not have been possible without the financial support from NSERC and PMC-Sierra Inc., and express my gratitude to them. I also thank CMC Microsystems for providing chip fabrication and the CAD tools. Last but not least, I would especially like to thank my family for both giving and encouraging me to seek for myself a demanding and meaningful education. In particular, I must acknowledge my wife, Liming, for her love, caring and patience through many years of my life. I would not have accomplished this research without her support. The appreciation extends beyond any words at my command.

Chapter 1 Introduction

1.1 Motivation

Scaling of CMOS technology allows higher speed and higher functional density. As the clock frequency increases and the supply voltage decreases to about 1V, maintaining the quality of the power supply has become a primary issue. Power supply noise in the form of voltage variations arises due to IR drop and Ldi/dt effects [1]. The IR drop has been increasing over time due to increased resistances in the power grid as the metal widths continue to shrink with each successive technology generation. The inductive Ldi/dt effects are also increasing due to the higher current demands in more complex chips. However, the pin and package inductance overwhelms the inductance of the on-chip power distribution network, and therefore the on-chip inductance can be neglected [2], although the on-chip inductance may be considered in certain applications [3][4]. Having the two components together, the overall voltage drop ΔVdrop at any point in the power grid is [5][6][7]:

$\Delta V_{drop} = I_{supply} R_{mesh} + L_{pack} \frac{dI_{supply}}{dt}$ (1.1)

where Rmesh is the power grid (mesh) resistance, Lpack is the package and pin inductance, and Isupply is the current flow through the user logic circuits.
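To make the two contributions in Equation (1.1) concrete, the following minimal Python sketch evaluates the resistive (IR) term and the inductive (L di/dt) term separately. The function name and every numerical value are illustrative assumptions and are not taken from this dissertation.

```python
def supply_voltage_drop(i_supply, r_mesh, l_pack, di_dt):
    """Return (ir_term, ldidt_term, total) in volts, following the form of Eq. (1.1).

    i_supply : current drawn by the logic circuits, in amperes
    r_mesh   : power grid (mesh) resistance, in ohms
    l_pack   : package and pin inductance, in henries
    di_dt    : rate of change of the supply current, in amperes per second
    """
    ir_term = i_supply * r_mesh
    ldidt_term = l_pack * di_dt
    return ir_term, ldidt_term, ir_term + ldidt_term


# Hypothetical operating point (assumed values, for illustration only).
ir, ldidt, total = supply_voltage_drop(
    i_supply=0.5,  # A
    r_mesh=0.05,   # ohm
    l_pack=1e-9,   # H
    di_dt=5e7,     # A/s, i.e. 50 mA per nanosecond
)
vdd = 1.0  # nominal supply voltage in volts (assumed)
print(f"IR term: {ir * 1e3:.1f} mV, L di/dt term: {ldidt * 1e3:.1f} mV")
print(f"Total drop: {total * 1e3:.1f} mV ({100 * total / vdd:.1f}% of VDD)")
```

How the drop splits between the two terms depends entirely on the assumed inputs; the sketch only shows how each term enters the sum.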
Figure 1.1: Effectiveness of on-chip decaps: (a) without decaps, the power supply noise level can be high, compared to (b) where the noise level is low due to the use of decaps.

A variety of different methods can be used to manage supply voltage drops. Among them, the most popular is to use on-chip decoupling capacitors (decaps) to maintain the power supply within a certain percentage (e.g., 10%) of the nominal supply voltage [1][6]. Decaps are typically placed in regions between areas of high current demand and the power pads and I/O pins [7][8][9][10][11]. The effectiveness of decaps can be illustrated conceptually in Fig. 1.1, where the power system can be modeled as a distributed RLC network [8][12][13][14]. The power supply noise level of Fig. 1.1(a) is reduced in Fig. 1.1(b) due to the use of decaps.

In Application Specific Integrated Circuit (ASIC) designs, two types of decaps can be identified: white-space decaps and standard-cell decaps. White-space decaps are placed in the open areas of the chip between intellectual property (IP) blocks, and are made from passive decaps or active decaps. Standard-cell decaps are always passive decaps placed within the IP blocks themselves [9], typically as filler cells.

Passive decaps can be implemented with MOS transistors or metal-insulator-metal (MIM) capacitors. White-space decaps are usually implemented with NMOS transistors since they provide a high capacitance value per unit area. In certain applications, PMOS devices and MIM capacitors [15][16][17][18] are two alternatives for white-space decaps. Standard-cell decaps normally use both NMOS and PMOS devices. A more recent approach is the active decap which requires dynamic switching of passive decaps to boost up the power rail voltages when excessive voltage drop is detected. Due to its area requirement, it can only be used in the open areas between blocks.
The implementation of active and passive decaps in terms of NMOS and PMOS devices, in either white-space areas or standard cells, is summarized in Table 1.1. This research addresses design and implementation issues for both active and passive decaps.

Table 1.1: Comparison on active and passive decap implementations.

                 Active decap      Passive decap
White space      NMOS and PMOS     NMOS or PMOS (or MIM)
Standard cell    Not used          NMOS and PMOS

The lack of sufficient decaps can result in unsatisfactory timing and even functional failure for the logic circuits and memory cells [19]. On the other hand, overdesign may cost too much area. It is necessary to develop a metric to evaluate the decap effectiveness in terms of power supply noise management [20].

Starting from 90nm, a number of relatively new issues [21] must be addressed that impact the design and layout of decaps. This research addresses three important decap problems including frequency response [22][23], electrostatic discharge (ESD) protection [24], and gate tunneling leakage [6][25][26][27][28][29][30]. The frequency response controls the performance of decaps at increasingly higher operating frequencies. A potential ESD event across a thin gate oxide increases the likelihood that a chip will be permanently damaged due to a short circuit in the decap itself. Higher gate leakage significantly increases the total static power consumption of the chip.

In white-space areas, prior to the 90nm technology, the use of passive decaps was sufficient. At 90nm, the oxide thickness has been reduced to 2nm or less. Therefore, decaps have been redesigned into a cross-coupled form [21] to protect the device from potential electrostatic discharge (ESD) induced oxide breakdown [24]. However, the additional series resistance considerably reduces the transient response of the decap [23]. As a result, large IR-drop levels in localized regions (usually called “hot-spot” IR-drop violations) may unexpectedly be present in high-speed ASICs.
These unresolved hot spots cause timing closure problems or result in functional failures in extreme cases. To remove them, designers must consider many options such as moving the logic blocks, adding or rearranging the power pins, and/or modifying power grid design. Near the tapeout deadline, such time-consuming design iterations may not always be feasible. In these situations, an active decap design can play an important role in reducing supply noise without major changes to the design [31]. That is, the active decap can be a drop-in replacement of the passive decap in an attempt to remove any remaining hot spots, thereby saving time and effort [32].

The concept of active decap can be extended with more switches and decaps to ideally achieve lower supply noise. These extended active decap designs need to be evaluated for advantages and limitations so that an optimal design can be made practical. In addition, a novel circuit called charge-borrowing decap that transfers charge from a clean supply node to a noisy supply node will be introduced to produce superior performance to the basic and the extended active decaps.

1.2 Research Objectives

Research Statement: To investigate new designs and proper placement of active and passive decoupling capacitors to efficiently manage on-chip power supply noise in white-space areas and standard cells for ASIC applications.

Specific Research Goals and Contributions:

• Design passive decaps in standard-cell arrays that properly trade off gate leakage, ESD and transient response, and provide decap design metrics to determine the optimal layout to obtain a desired capacitance level over a target operating frequency. Develop empirical formulae with only a few parameters to capture the frequency responses for 90nm and 65nm CMOS technologies.

• Design the basic active decap to provide better power-delay trade-off than prior approaches for white-space hot-spot removal.
Identify and resolve limitations of the active decap in terms of suitability for ASIC applications in deep-submicron technologies. Explore the placement of active decaps to remove late-stage IR-drop violations. Validate the design of the active decap using a chip design that provides testing mechanisms to evaluate improvements in dynamic voltage drops.

• Extend the concept of active decap by modifying its design for improved supply noise management. Achieve a better active decap design that provides a higher level of power supply noise reduction than the basic form of active decaps within a fixed area. Propose a novel circuit that outperforms both the basic and the extended active decaps with help from a clean supply rail.

1.3 Organization of the Dissertation

The remainder of this dissertation is organized as follows. Chapter 2 provides the necessary background on decap design basics and challenges, gate tunneling leakage through decaps, ESD reliability of thin-oxide gates, standard-cell layout and placement of decaps, and metrics for power supply noise management.

Chapter 3 develops a set of new passive decap designs based on the cross-coupled decap. The modeling of the new designs is described and design metrics are provided to allow hand calculations and analyses to be carried out. Based on the simulation results, the proper layout of these designs is described.

Chapter 4 proposes a modified active decap design for hot spot removal in ASIC applications. The design advantages and disadvantages are compared against prior work. Measurement results from a test chip are used to validate the design. After correlation with measurement results, further simulation is carried out to explore the efficiency of active decap placement.

Chapter 5 extends the concept of the active decap to achieve a better design that has a higher level of power supply noise reduction.
Also presented in the chapter is a novel charge-borrowing decap circuit that outperforms the basic and the extended active decaps.

Chapter 6 summarizes the results of the dissertation and provides conclusions. Future research directions are provided.

Chapter 2 Background

2.1 Introduction

The topics in this chapter provide the necessary background for the rest of the dissertation. Some fundamental and practical decap design issues are also highlighted to motivate the topics in the remainder of the dissertation.

This chapter begins with an overview of design challenges and problems associated with decoupling capacitors in 90nm and 65nm. The overview includes gate tunneling leakage, electrostatic discharge phenomenon and protection, and standard-cell decap placement. Gate leakage is introduced from a physical point of view, and useful information from recent technologies is given. ESD reliability is presented and typical phenomena during an ESD event are discussed. Primary and local ESD protection schemes are briefly illustrated. Since ASIC designs typically utilize standard cells, the decap insertion and placement procedure within standard-cell blocks is briefly introduced. A simple metric for supply noise management is proposed to assess the profiles showing power supply noise and to compare the designs providing different decap performance.

2.2 Decoupling Capacitor Basics and Design Challenges

A passive decap in the white spaces of a chip can be implemented using an NMOS transistor with the gate connected to VDD and both source and drain connected to VSS, as shown in Fig. 2.1. This approach is considered effective because the thin-oxide capacitance of the transistor gate provides a higher capacitance than any other oxide capacitance available in a standard CMOS fabrication process [21]. For design purposes, an approximation for hand calculating the capacitance of this MOS decap can be given by [7]:

$C_{decap} \approx W \cdot L \cdot C_{ox}$ (2.1)

where W is the transistor width, L is the transistor length, and Cox is the oxide capacitance per unit area. A more accurate capacitance model needs to include the parasitic fringing and overlap capacitance of the transistor, and will be discussed in greater detail in Chapter 3. Passive white-space decaps can also be implemented with thick-oxide MOS devices, depending on the requirements on ESD and leakage, knowing that there is a capacitance density penalty.

Figure 2.1: Decoupling capacitor implemented using an NMOS device.

In the past, the analysis techniques and design metrics dealing with power supply voltage drop were overly simplistic [33][34][35]. Designers analyzed power supply noise with static voltage drop (SVD) analysis, which might not reflect the true nature of power supply fluctuations, leading to either unnecessary overdesign or risk of timing failures [20]. Although SVD analysis can provide useful feedback in terms of certain glaring errors in the power grid design, it does not take into account the impact of decaps and many other important factors. Dynamic voltage drop (DVD) analysis is emerging as a replacement of SVD analysis to capture the impact of decaps, package inductance, and simultaneous switching events. The drawback is that DVD analysis does not return a fixed value that can assess the degree of improvement. Currently, there is no signoff or analysis metric to characterize a DVD profile [36]. Therefore, a good metric for DVD analysis is desired to evaluate decap design and placement, for the purposes of this research.

At 90nm, the oxide thickness has been reduced to about 2nm or less. Oxide thickness reduction causes two problems: possible oxide breakdown during an ESD event and increased leakage current. ESD is a transient process of static charge transfer that can typically arise from human contact with any IC pin [24].
Additional input resistance can be inserted in series with passive decaps to protect from ESD. However, this input resistance causes the decap to suffer from a degraded frequency response, resulting in poor performance in terms of managing power supply noise. Moreover, increased gate leakage should be considered. If decaps can be disconnected from the power rails when they are not needed (e.g., the logic circuit nearby is quiet), gate leakage reduction can be achieved. Therefore, overdesign of decaps should be avoided.

Standard-cell layouts of an IP block consist of rows of fixed-height cells in the ASIC design flow. Typically, decap filler cells have both NMOS and PMOS devices. From the 90nm node, a cross-coupled decap design has been proposed by cell library developers [21] to address the issue of ESD reliability, as shown in Fig. 2.2. The cross-coupled design provides additional ESD protection to the thin-oxide gate of the device [21]. Standard-cell decaps are generally implemented with thin-oxide MOS devices.

Figure 2.2: Cross-coupled decap schematic [21].

The major concern for white-space decaps at 90nm and 65nm is a reduced budget for power supply noise as the supply voltage decreases to about 1V [19]. In certain situations, the use of passive decaps only in white-space areas is not sufficient and hot spots at certain locations may appear at a late design stage. Active decaps in the white space may be used to remove hot spots. The goal is to investigate strengths, limitations, design issues and placement strategies for active decaps in ASIC applications. Active decaps were originally intended for custom designs. Our goal is to optimize them for ASIC designs. Two active decap circuits using switched capacitances were proposed in the past to regulate the supply voltages [37][38][39].
The basic concept of the active decap is to switch a pair of passive decaps Cdecap either in parallel or in series to increase charge delivery capability, as shown in Fig. 2.3(a).

Figure 2.3: Active decap concept shown in (a) and two previous designs from (b) [37][38] and (c) [39]. Panel labels: (b) single-input single-output amplifiers [37][38]; (c) opamp with chain of inverters [39].

The two designs in [37] and [39] are effective for power supply noise reduction, but they also have certain limitations. The design in [37] can respond quickly to supply noise but dissipates large power, whereas [39] saves power but experiences long switching delays. Therefore, an improved design with a better power-delay trade-off is desired. The two previous designs [37][38][39] are illustrated in Fig. 2.3(b) and 2.3(c), respectively.

The designs in [37][38][39] mitigate the effects of LC resonance, typically in the 20-400MHz band [40]. Recent work has been done to reduce LC resonance using a switched decap technique [41]. Researchers have also reported on a distributed active decap to greatly boost the effective decap value while reducing the area requirements [41]. In addition, the issue of directly replacing an area occupied by a passive decap with an active decap needs to be addressed to determine the degree of noise reduction that can be obtained and the associated tradeoffs of area and power.

2.3 Thin-Oxide Gate Tunneling Leakage

In 90nm and 65nm processes, a new design issue for decaps due to oxide thickness reduction is the thin-oxide gate tunneling current. The current is in the form of tunneling electrons or holes from substrate to gate or from gate to substrate through the gate oxide, depending on the voltage biasing conditions [26]. Two forms of gate tunneling exist: Fowler-Nordheim (FN) tunneling and direct tunneling.
For normal operations on short-channel devices, FN tunneling is negligible, and direct tunneling is dominant [26]. In the case of direct tunneling, the gate leakage current in PMOS is much less than in NMOS, and it has been shown experimentally that PMOS gate leakage is roughly three times smaller than NMOS gate leakage for same size transistors [27][43]. The gate leakage simulations can be carried out by using BSIM4 SPICE models [44][45].

Assuming a 90nm technology with 1.7nm oxide thickness and 1.0V power supply, the gate leakage current Ileak is shown in Fig. 2.4.

Figure 2.4: Gate leakage current versus gate area. [Plot: NMOS and PMOS Ileak and Cdecap versus transistor area WL (µm²).]

Figure 2.5: Gate leakage current density Jleak versus oxide thickness tox. [Plot: NMOS and PMOS devices, oxide thickness tox from 1.5nm to 2nm.]

Clearly, as indicated from simulation results, the gate leakage is proportional to the transistor area. That is,

$I_{leak} = J_{leak} \cdot W L$ (2.2)

where Jleak is the gate leakage current density. From simulation, the decoupling capacitance values of the transistor, Cdecap, are also shown in the figure. As described earlier, Cdecap is equal to WLCox to the first order. Since Cox is fixed in this case, Cdecap is proportional to the decap area of WL.

The gate leakage current density Jleak and the oxide thickness tox have an empirical relationship as follows, assuming the voltage across the oxide Vox is fixed [43]:

$\log J_{leak} = K_1 - K_2 \cdot t_{ox}$ (2.3)

where K1 and K2 are non-negative experimental constants and are process dependent. Equation (2.3) implies that the gate leakage current is exponentially related to the oxide thickness. A typical Jleak and tox relationship for a fixed Vox at 1V is illustrated in Fig. 2.5.
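The trends described by Equations (2.2) and (2.3), together with the first-order capacitance estimate Cdecap = W·L·Cox, can be sketched numerically as follows. This is a minimal Python illustration and not the BSIM4 simulation behind Fig. 2.4 and Fig. 2.5: the base-10 logarithm, the constants K1 and K2, the value of Cox, and all function names are assumptions chosen only to show the proportionalities.

```python
def leakage_density(t_ox_nm, k1, k2):
    """Leakage density from log10(J_leak) = K1 - K2 * t_ox (form of Eq. 2.3).

    The base-10 logarithm is an assumption; the result is in arbitrary units.
    """
    return 10.0 ** (k1 - k2 * t_ox_nm)


def gate_leakage(w_um, l_um, j_leak):
    """Gate leakage current I_leak = J_leak * W * L (form of Eq. 2.2)."""
    return j_leak * w_um * l_um


def decap_first_order(w_um, l_um, c_ox):
    """First-order decap value C_decap = W * L * C_ox."""
    return w_um * l_um * c_ox


# Hypothetical constants (assumed; real values are process dependent).
K1, K2 = 8.0, 3.0   # fit constants, arbitrary units
C_OX = 20.0         # fF per um^2, assumed
T_OX_NM = 1.7       # oxide thickness used for the area sweep

# Area sweep at fixed oxide thickness: leakage and capacitance both scale with W*L.
j_fixed = leakage_density(T_OX_NM, K1, K2)
for w_um in (0.5, 1.0, 2.0):
    i_leak = gate_leakage(w_um, 1.0, j_fixed)
    c_decap = decap_first_order(w_um, 1.0, C_OX)
    print(f"W*L={w_um:.1f} um^2  I_leak={i_leak:.1f} (a.u.)  C_decap={c_decap:.1f} fF")

# Thickness sweep: leakage density falls exponentially as the oxide gets thicker.
for t_ox in (1.5, 1.7, 2.0):
    print(f"t_ox={t_ox} nm  J_leak={leakage_density(t_ox, K1, K2):.1f} (a.u.)")
```

Under these assumed constants, doubling the gate area doubles both the leakage current and the capacitance, while each additional nanometre of oxide reduces the leakage density by a factor of 10^K2.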
It is evident that at 90nm and 65nm technologies, the gate leakage from decaps will be significant [27]. The gate leakage contributes to the total static power consumption, and decaps usually occupy a large on-chip area. The use of PMOS devices exclusively is not a viable solution for high-frequency circuits since they have a poor frequency response relative to the NMOS devices for 90nm and 65nm. In addition, the amount of gate leakage is also a strong function of the applied bias [28]. If the transistor has a voltage across the oxide, Vox, roughly equal to VDD, the leakage current density is largest. If the transistor has a Vox set close to or below the threshold voltage VT, it leaks significantly less. Indeed, under such a condition, the gate leakage current is typically 3-6 orders of magnitude less, depending on the values of VDD and tox [28]. Thus, the gate leakage in the second condition can be roughly considered to be zero. In decaps, the gate is at VDD and the source and drain of a transistor are tied together. Therefore, decaps would experience the highest levels of leakage, as a function of Vox.

The oxide capacitance Cox is a critical factor to many physical properties of MOS transistors since the drain current of a transistor is proportional to Cox. A larger Cox results in a larger drain current and hence a faster transition or a shorter gate delay. On the other hand, the subthreshold current is related to Cox: a smaller Cox (or a larger tox) corresponds to a higher threshold voltage VT and therefore smaller subthreshold current [26]. Each technology generation attempts to increase Cox by roughly 1.4x while reducing the channel length L to 0.7x of the previous technology's channel length [7]. The result is that the product CoxL has been maintained relatively constant as technology scales. The proper Cox selection balances the trade-off between the drain current and the subthreshold current in each technology node.
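The claim that the product of oxide capacitance and channel length stays roughly constant follows directly from the two scaling factors quoted above. As a one-line check, with primes denoting the next technology node (a notation introduced here only for illustration):

```latex
C_{ox}'\,L' \;=\; (1.4\,C_{ox})\,(0.7\,L) \;=\; 0.98\,C_{ox}L \;\approx\; C_{ox}L
```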
From Equation (2.3), the gate leakage density falls exponentially as tox increases, so a smaller tox leads to exponentially increasing gate leakage. From a gate leakage perspective, the oxide thickness tox should be kept large. However, Cox is determined by:

$$ C_{ox} = \frac{\varepsilon_{ox}}{t_{ox}} \tag{2.4} $$

where εox is the permittivity of the oxide and is fixed for a given oxide material. Equation (2.4) suggests that if εox is kept unchanged, the increase in Cox will lead to a decrease in tox and hence an exponential growth in gate leakage [46]. Knowing that the gate leakage increase may be excessive for 90nm and 65nm, in order to keep tox thick while increasing Cox, one can adjust the dielectric constant, k, where εox = k ε0, and ε0 is the vacuum permittivity. If a high permittivity (high-k) dielectric can be used instead of the normal SiO2 oxide, the physical oxide thickness tox would no longer be limited by its electrical property Cox. This concept of using high-k dielectrics was presented in [47], and researchers and process engineers have continued to pursue better high-k materials [48]. One of the main goals of using high-k gate dielectrics is to keep the gate leakage under control [48]. Commonly suggested high-k materials include HfO2, HfSiON, ZrO2 and Al2O3, whose permittivity ranges from 10 to 30 [48][49], compared to 3.97 for SiO2. Commercial microprocessors fabricated at a 45nm technology have been developed using high-k materials, and it has been reported that a 10X gate leakage reduction was achieved [50]. Starting from 45nm, most fabrication facilities are anticipated to shift to high-k technology to reduce gate leakage [51]. However, for 90nm and 65nm, the concerns on gate leakage are still significant [27][51].

2.4 Electrostatic Discharge Reliability in Decap Design

ESD protection due to the thin oxide has become an important concern starting from the 90nm technology node. ESD is the process of static discharge that can typically arise from human contact with any IC pin.
Approximately 0.6µC of charge is carried on a body capacitance of 100pF, generating a potential of 2kV or higher to discharge from the contacted IC pin to ground for a duration of more than 100ns [24]. Under such an event, the peak discharge current is in the ampere range, leading to permanent damage on certain transistors in the chip if not properly protected. The damage can be in one of two forms, or a combination of the two. The first is thermal burnout in devices or interconnects, while the other is oxide breakdown of devices due to the high voltage across the oxide [24]. When running simulations for an ESD event, the maximum current density J of devices and interconnects is measured to check for potential thermal damage. The oxide voltage also needs to be measured to compare with the oxide breakdown voltage of a device for a given fabrication process. The oxide breakdown voltage is almost linearly proportional to the oxide thickness [24]. For instance, assuming a 90nm process uses 1.7nm of oxide thickness, the corresponding oxide breakdown voltage is just below 5V. If the thickness is doubled, the oxide breakdown voltage is also doubled to around 10V [24].

An ESD event can be delivered between any two pins of an IC. To properly protect an IC from ESD damage, an ESD circuit must shunt ESD current between these two pins [24]. In the case of decaps within standard cells, the only two pins that the decaps have access to are the two local power rails, namely VDD and VSS. Primary and local (sometimes called secondary) protection elements are needed to protect the two rails by limiting the voltage difference between the two rails to a value below the oxide breakdown voltage. The primary element will shunt most of the ESD current, whereas the local element serves to limit the voltage or current at the local circuit until the primary element is fully operational [24].
A primary element can be a thick oxide transistor, a silicon-controlled rectifier, an open-gate, grounded-gate or coupled-gate NMOS transistor, or a large diode [24]. A local protection element can be simply a diode formed by a grounded-gate NMOS transistor [24].

A typical ESD protection scheme is illustrated in Fig. 2.6. In addition to the primary and local elements, a resistor R1 is required to limit the maximum current flow to the decap and to limit the voltage seen from the gate of the decap. For better ESD protection, this resistance is normally large and can be in the form of polysilicon, diffusion, n-well, or channel resistance [24]. The resistance is generally not implemented together with primary and local protection devices. Rather, it is usually inserted within standard cells where ESD damage is a concern.

Previous decap designs (typically before 90nm technology) did not consider ESD performance for two reasons. First, the transistor's oxide thickness was large and the oxide breakdown voltage was high enough that the transistor was likely to survive during an ESD event with adequate protection circuits. Second, insertion of the large resistance dramatically degrades the transient response of the decap. However, starting from 90nm, the gate oxide is so thin that the designer cannot ignore the increased ESD risk. A large resistance is therefore recommended to be placed inside the decap cells to protect the circuit from potential ESD damage. Hence, this tradeoff between ESD reliability and transient response becomes one of the major decap design challenges in 90nm and 65nm technologies.

Figure 2.6: Complete ESD protection scheme.

The ESD simulation requires an ESD generation model. Among all the existing models, the human body model (HBM) was adopted for simplicity. Following the standard MIL-STD-883x method 3015.7 [24], a human body can be simulated as a series combination of a 1.5kΩ resistance RHBM and a 100pF capacitance CHBM.
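The HBM component values make it easy to check the orders of magnitude quoted for an ESD event. The sketch below assumes the 2kV pre-charge level mentioned in the text and, as a simplification that is not part of the simulation setup, an ideal short circuit as the load (no protection devices), so the discharge is a simple RC decay.

```python
import math

R_HBM = 1.5e3    # series resistance of the human body model, in ohms
C_HBM = 100e-12  # body capacitance of the human body model, in farads
V_HBM = 2.0e3    # assumed initial voltage on C_HBM, in volts


def hbm_current(t_s, v0=V_HBM, r=R_HBM, c=C_HBM):
    """Discharge current of C_HBM through R_HBM into an ideal short
    (illustrative assumption; a real pin presents a nonlinear load)."""
    return (v0 / r) * math.exp(-t_s / (r * c))


peak_current = hbm_current(0.0)  # V / R at t = 0
time_constant = R_HBM * C_HBM    # RC decay time constant
stored_charge = C_HBM * V_HBM    # Q = C * V at the 2kV level

if __name__ == "__main__":
    print(f"peak current  : {peak_current:.2f} A")
    print(f"time constant : {time_constant * 1e9:.0f} ns")
    print(f"stored charge : {stored_charge * 1e6:.2f} uC")
```

Under these assumptions the peak current is in the ampere range and the decay time constant exceeds 100ns, which agrees with the qualitative description of the ESD event.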
The capacitor CHBM is initially charged to 2kV, which must be discharged through the primary elements. The primary element is arbitrarily chosen to be an ESD diode plus a gate-coupled NMOS device (GCNMOS) with an n-well resistor Rnwell (15kΩ) and an NMOS bootstrap capacitor Cb. Two identical primary elements are used to protect the circuit placed in between the HBM generation and the elements, as shown in Fig. 2.7. For simplicity, no secondary element is used. Since the primary elements are designed to handle large current flow, the maximum current density, Jmax, is assumed to be within the safe range and is not measured. HBM generation raises the voltage level at node VDD, and hence turns on the primary elements to discharge. For device protection from oxide breakdown, the voltage across the oxide Vox of each decap transistor needs to be observed in simulation. The Vox voltages should be kept as low as possible, given that the oxide breakdown voltage for a typical 90nm process is below 5V.

Figure 2.7: Simulation setup for ESD analysis [24]. (Blocks: HBM generation, initially charged to 2kV; primary element; duplicate primary element.)

2.5 Standard-Cell Decap Layout and Placement

In the white spaces around the chip, decaps are usually made of NMOS devices, as described earlier. However, within standard cells, it is more convenient to make decaps using both NMOS and PMOS transistors to form a decap filler cell. This is because the n-well is already implemented and usually reserved for PMOS devices. Only about half of the cell area is available for NMOS devices. One sample standard-cell decap layout is illustrated in Fig. 2.8. In the figure, the NMOS decap occupies roughly the bottom half of the cell area, whereas the PMOS decap is located in the n-well. The capacitor areas are the polysilicon gates placed on top of the channel regions of the MOS transistors.
For standard cells, the height of the cell is always fixed, and the designers can only adjust the cell width. Once the cell width is determined, the size of the decap and the capacitance of the decap are established. Fig. 2.8(a) illustrates a large decap cell (measured in cell width) with long channel transistors. A fingering technique is commonly used to obtain a smaller effective channel length and thereby improve the decap frequency response. Fig. 2.8(b) depicts the same decap cell but with two fingers.

Figure 2.8: Sample layout of standard-cell N+P decap (a) with one finger and (b) with two fingers.

During the placement procedure, computer-aided design (CAD) tools place standard cells into rows. Because the height of each cell is always the same, when cells are placed adjacent to each other, the n-well region and the VDD and VSS lines are automatically aligned. The cells for placement are obtained from the standard-cell library, where all the cells are predefined in width and driving strength. Since the total width of the row is fixed and the individual cell widths are fixed, some empty spaces (typically small) between the cells are left after placing cells. Those empty spaces are good candidates for the placement of decap cells [9]. In fact, a set of decap cells with different cell widths is often included in the standard-cell library.

Decap insertion is considered as a part of the complete design flow. In a typical ASIC design flow, once the standard-cell blocks are synthesized, placed and routed by CAD tools, the decap cells are placed into the empty spaces. Generally, since the spaces are filled using a library of decap cells with various sizes, the decap placement is done without affecting the placement of other logic cells. After placement and routing, chip-level timing is analyzed and timing violations will be fixed by replacement and/or rerouting.
Then, chip-level IR-drop analysis is carried out by a CAD tool (e.g., Apache Redhawk) such that the hot spots of severe voltage-drop areas are identified [52]. If the voltage drop at the hot spots exceeds the noise budget, more decaps will be inserted into the violation regions and a modification of the placement of other logic cells may have to be done. The logic cell movement requires additional timing and routability analysis before moving on to the next step. Then, the chip's IR drop is analyzed again for the remaining hot spots. These steps in the design flow are iterated until all the hot spots are eliminated and all the logic circuits pass timing analysis. Typically, it may take one or two (occasionally even more) iterations to eliminate all the hot spots [7][9]. In addition, the potential problem of electromigration is also checked alongside the IR-drop analysis [7].

This commonly used decap placement approach is not optimal simply because the empty cells may not be located near the high IR-drop regions. After the hot spots are first identified, the remaining empty spaces near the hot spots may not be large enough. Hence, the logic cells may have to be shifted, resulting in a need for additional timing analysis. In order to improve the placement efficiency, researchers suggest a few approaches including: global decap placement between standard-cell blocks [53], decap placement using activity [1], standard-cell decap placement not affecting relative placement of logic cells [9], and earlier-stage decap placement decisions [24]. Since decaps experience excessive gate leakage, decap placement methods considering leakage current are proposed in [6] and [25].

2.6 Metric for Power Supply Noise Management

Static Voltage Drop (SVD) analysis and verification has been an essential part of the overall physical design and verification flow in the semiconductor industry for over ten years [20].
In this approach, the IR drops across the chip are computed by averaging the current drawn by the transistors and blocks from the power grid. The computed fixed values can be fed to timing verifiers to assess the impact on delay. In the past, SVD analysis provided useful feedback in terms of major errors in the power grid design. Going forward, at 90nm and smaller technologies, SVD verification is not enough to ensure power integrity. SVD does not take into account the contribution of power density, variations in the switching activity profile, and the impact of inductance and decoupling capacitors (including LC resonance effects) [2]. Therefore, SVD is not an adequate approach to analyze and optimize power delivery networks in SoC designs.

Recently, industry began to use Dynamic Voltage Drop (DVD) analysis as a way to capture the impact of decaps, inductance, and spatial and temporal switching events in the design [20]. DVD is emerging as a replacement for SVD to capture the impact of power supply noise on the timing behavior of logic and memory cells.

In order to evaluate the design of active or passive decoupling capacitors on a DVD profile and to qualify its impact on a design's timing performance, two quantities can be used as a metric: DVDavg and DVDmax, as shown in Fig. 2.9. DVDavg is the DVD profile's average value in the timing cycle, whereas DVDmax is the DVD profile's peak value in the same timing cycle. A design is considered better if it has smaller DVDavg and DVDmax. Users should add design margins to these metrics to account for the metric's simplifications, or they should perform the final signoff process with actual DVD profiles.

Figure 2.9: DVDavg and DVDmax: metric used to evaluate DVD profiles.

Ref. [20] validates the use of this metric.
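Given a simulated supply waveform over one clock cycle, the two quantities are straightforward to compute. The sketch below assumes uniformly spaced samples of the local supply voltage, so that a plain mean approximates the time average; the waveform values and the function name are hypothetical and serve only as an illustration.

```python
def dvd_metrics(vdd_nominal, vdd_samples):
    """Return (DVD_avg, DVD_max) for one timing cycle.

    vdd_samples: local supply voltage sampled at uniform time steps
    (an assumption of this sketch) over the cycle of interest.
    """
    drops = [vdd_nominal - v for v in vdd_samples]
    dvd_avg = sum(drops) / len(drops)  # average drop over the cycle
    dvd_max = max(drops)               # peak drop in the same cycle
    return dvd_avg, dvd_max


# Hypothetical supply waveform (volts) for a 1.0V nominal supply.
samples = [1.00, 0.95, 0.88, 0.92, 0.97, 1.02, 1.00, 0.98]
dvd_avg, dvd_max = dvd_metrics(1.0, samples)

if __name__ == "__main__":
    print(f"DVD_avg = {dvd_avg * 1e3:.0f} mV, DVD_max = {dvd_max * 1e3:.0f} mV")
```

Note that samples above the nominal supply (overshoot) contribute negative drops and therefore lower DVDavg, while DVDmax is set by the single worst sample.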
It can be established that the impact of the voltage drop profile on the timing performance of a digital path is equivalent to applying a fixed supply voltage of VDD - DVDavg to the same path. To show this, a logic path was first simulated in the presence of a true dynamic voltage profile on the power supply, and then a DC voltage equal to the average of the voltage profile was used on the power supply [20]. The results show that the timing behaviour of the two cases matches.

The intuitive reason for this relationship can be illustrated with Fig. 2.9. Considering the delay of the critical path of the circuit between the two flops, gate delay is reduced when the supply voltage overshoots (VDD(t) > VDD nominal) and increased when the supply voltage undershoots (VDD(t) < VDD nominal). When the supply voltage fluctuates, gates that see a supply voltage higher than VDD - DVDavg accelerate and gates that see a supply voltage lower than VDD - DVDavg decelerate, compared to the situation where all gates see a supply voltage equal to VDD - DVDavg. Therefore, DVDavg is a good measure of the average effect of the IR drop on delay. The use of DVDavg as a metric for IR-drop analysis has been shown to be valid on several industrial designs [20].

Applying the metric, any approach (e.g., decap insertion) that reduces the average dynamic voltage drop (DVDavg) or raises the average supply voltage (VDD - DVDavg) can be considered as a valid solution for reducing power supply noise to improve timing performance, although the instantaneous supply voltage drop may not be affected.

The DVDmax value represents the worst-case voltage drop, with a safety margin, that causes a failure in logic circuits and memory cells. That is, if the voltage drop exceeds DVDmax by the margin of safety, the behaviour of standard cells or memory cells will be unpredictable. The value of DVDmax depends on the tolerance of individual IP blocks to power supply noise. Ref. [20] suggests that DVDavg should not be larger than 10% of VDD - VSS and DVDmax should not be larger than 20% of VDD - VSS. These percentage values are commonly used in the industry [20]. These limits are considered pessimistic enough to account for the simplified nature of the metric.

By analyzing different DVD profiles using the metric of DVDavg and DVDmax [20], an important conclusion can be made: the L di/dt contribution to the voltage drop is not as critical as the IR contribution. L di/dt only affects DVDmax and therefore has minimal impact on DVDavg as long as the transfer of charge is completed within the cycle. Therefore, the L di/dt voltage drop may not significantly affect the timing performance of the chip, as long as it does not create supply fluctuations that exceed DVDmax. The above observation is consistent with [2].

2.7 Summary

This chapter summarized a number of decap design issues including gate tunneling leakage, ESD protection, standard-cell layout and placement requirements, and the lack of useful metrics for evaluating the results from DVD analysis. The decap design challenges for 90nm and 65nm were described. A simple metric, consisting of DVDavg and DVDmax, was proposed to interpret the DVD results from CAD tools. The metric is best used to compare and evaluate different decap designs.

Chapter 3 Passive Decoupling Capacitor Design

3.1 Introduction

In an ASIC design flow, after placement and routing, empty spaces naturally exist within standard-cell rows. Passive decoupling capacitors, as filler cells, are usually used to fill these empty spaces to reduce IR drop problems locally. This chapter addresses the design and layout of passive decaps [9][23][24] for standard cells at the 90nm technology node. As described in the previous sections, the IR drop has been increasing over time due to increased resistances in the power grid as the metal widths continue to shrink with each successive technology generation.
The inductive L di/dt effects are also increasing due to the high current demands of ASIC designs in deep submicron technologies [7][10][11]. The increased supply noise level challenges the design and layout of passive decaps. A number of relatively new issues for standard-cell decaps must be addressed that impact the design and layout of these cells at scaled technology nodes.

Two important problems of decap frequency response and electrostatic discharge (ESD) protection [24] will be addressed. Since decaps are required to perform at increasingly higher operating frequencies, the frequency response [10][12][54] of passive decaps will be investigated first to propose improvements that optimize decap layouts. Next, the problems of reduced oxide thickness of a transistor, namely, ESD [24] and thin-oxide gate leakage [6][11], will be explored in the context of decap design. A potential ESD event across a thin gate oxide increases the likelihood that a chip will be permanently damaged due to a short circuit in the decap itself. Higher gate leakage increases the total static power consumption of the chip.

A cross-coupled standard-cell design was proposed [21] to address the issue of ESD performance. The design provides sufficient ESD protection, but does not offer any savings in gate leakage and it may compromise the frequency response. This chapter aims to suggest improved layouts of the cross-coupled design that properly trade off frequency response and ESD performance, while greatly reducing gate leakage current.

3.2 High-Frequency Response of Decoupling Capacitors

Standard-cell layouts of an IP block consist of rows of fixed-height cells in the ASIC design flow. After cell placement is completed, there are a number of empty cells that can be filled with decaps of various sizes depending on the space available. Previous work has addressed the automatic placement and sizing of decap cells [9]. The focus in this chapter is on the optimal layout of each decap filler cell.
Typically, these standard cells have both NMOS and PMOS devices as shown in Fig. 3.1(a), with a corresponding layout in Fig. 3.1(b). Thin-oxide MOS devices are generally used for standard-cell decap implementation.

Figure 3.1: (a) A standard cell decap can be implemented as an NMOS in parallel with a PMOS device. The corresponding layout is shown in (b).

As the frequency of operation increases, a fingering approach is required to implement the layout. That is, a single transistor is split into a number of parallel transistors with the same width, but smaller channel lengths. The overhead of this approach is additional spacing for source/drain contacts and an overall reduction in the low-frequency capacitance. However, the average capacitance of the decap over a given frequency range improves as the number of fingers increases. Therefore, the problem of how many fingers to use given a fixed area of a filler cell and fixed gate-oxide thickness needs to be addressed. The objective here is to develop a useful metric to capture the frequency response characteristics in order to choose the optimal number of fingers.

To derive the needed equations, an NMOS decoupling capacitor is depicted first in Fig. 3.2. Non-idealities associated with MOS devices are modeled as a lumped-RC circuit [12] where both the effective resistance, Reff, and effective capacitance, Ceff, are functions of frequency, f, as shown in Fig. 3.2.

Figure 3.2: A decap can be implemented as an NMOS device and modeled as a lumped-RC circuit with effective resistance and effective capacitance as functions of frequency, f.
The DC capacitance Ceff,0 and resistance Reff,0 are given by [12][24][55] (where the subscript 0 indicates zero frequency at DC):

$$ C_{eff,0} = C_{ox} W L + 2 C_{OL} W \tag{3.1} $$

$$ R_{eff,0} = \frac{L}{12\,\mu C_{ox} W (V_{GS} - V_T)} \tag{3.2} $$

where Cox is the oxide capacitance per unit area, COL is the overlap and fringing capacitance per unit width of the device, µ is the channel mobility, VGS is the voltage across the oxide, and VT is the threshold voltage.

Assuming that a given filler cell has a horizontal dimension X and vertical dimension Y, the channel length of each device in a fingered layout is:

$$ L_m = \frac{X - (m-1)\,x_{contact}}{m} \tag{3.3} $$

where m is the number of fingers and xcontact is the distance between fingers required by contact spacing rules. Modified expressions for Ceff,0 and Reff,0 can be derived as a function of the number of fingers. Thus, the effective capacitance at DC is given by:

$$ C_{eff,0(m)} = m \cdot \left( C_{ox} Y L_m + 2 C_{OL} Y \right) \tag{3.4} $$

For capacitance, each additional finger adds extra overlap and fringing capacitances but loses area due to the contact spacing. Therefore, the capacitance actually decreases linearly as the number of fingers increases. The corresponding equation for Reff,0 with m fingers in a parallel combination is:

$$ R_{eff,0(m)} = \frac{1}{m} \cdot \frac{L_m}{12\,\mu C_{ox} W (V_{GS} - V_T)} \approx \frac{R_{eff,0}}{m^2} \tag{3.5} $$

In previous work [12], the resistance was used to select the channel length and there were no area constraints involved. However, since the resistance drops off as 1/m², it is not as important in the selection of a suitable m. In fact, the goal of an optimal layout should be to provide the highest capacitance value in the given area over a desired operating frequency range, 0 to f0, while delivering a low resistance. A simple metric is needed to evaluate layouts with differing numbers of fingers. The easiest choice for a metric is to use the average capacitance over the frequency range up to f0, as follows:

$$ C_{avg(m)} = \frac{C_{eff,0(m)} + C_{eff(m)}(f_0)}{2} \tag{3.6} $$

where Ceff,0(m) is obtained from Equation (3.4) and Ceff(m)(f0) is the effective capacitance with m fingers at frequency f0.
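The DC expressions for a fingered layout can be evaluated directly. The sketch below implements the per-finger channel length, the DC capacitance of m fingers, and the DC resistance relative to the single-finger case. All numeric values (cell dimensions, contact spacing, Cox, COL) are hypothetical placeholders chosen only to show the trends; they are not the process parameters behind the results in this chapter.

```python
def finger_length(x_um, m, x_contact_um):
    """Channel length per finger: L_m = [X - (m - 1) * x_contact] / m."""
    return (x_um - (m - 1) * x_contact_um) / m


def ceff0(m, x_um, y_um, x_contact_um, cox_ff_um2, col_ff_um):
    """DC capacitance of m fingers: m * (Cox * Y * L_m + 2 * C_OL * Y), in fF."""
    l_m = finger_length(x_um, m, x_contact_um)
    return m * (cox_ff_um2 * y_um * l_m + 2.0 * col_ff_um * y_um)


def reff0_ratio(m, x_um, x_contact_um):
    """R_eff,0(m) / R_eff,0(1) = (L_m / m) / L_1; the mobility, Cox,
    width and overdrive terms cancel in the ratio."""
    return finger_length(x_um, m, x_contact_um) / (
        m * finger_length(x_um, 1, x_contact_um))


# Hypothetical parameters for illustration only.
X_UM, Y_UM = 9.0, 1.0  # available channel area of the cell
X_CONTACT_UM = 0.3     # spacing consumed between adjacent fingers
COX = 20.0             # fF/um^2
COL = 0.3              # fF/um

if __name__ == "__main__":
    for m in range(1, 6):
        c = ceff0(m, X_UM, Y_UM, X_CONTACT_UM, COX, COL)
        r = reff0_ratio(m, X_UM, X_CONTACT_UM)
        print(f"m={m}: L_m={finger_length(X_UM, m, X_CONTACT_UM):.2f}um, "
              f"Ceff0={c:.1f}fF, Reff0/Reff0(1)={r:.3f}")
```

With these placeholder values the DC capacitance falls by a constant amount per added finger, while the resistance ratio falls slightly faster than 1/m² because the contact spacing shortens each finger further.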
A weighted average is also feasible, but it was observed that the simple average works well in practice. The main issue with the metric is that Ceff(m)(f0) is difficult to compute without the aid of HSPICE or an equivalent simulation tool. To facilitate the process, simple frequency-dependent models for both Ceff and Reff are developed. Also, the characteristics of both functions need to be accurate as technology scales.

First, a number of AC simulations were performed in HSPICE for a 90nm CMOS technology using non-quasi-static (NQS) models, which are essential when simulating decaps in the gigahertz frequency range of operation. Two parameters, ACNQSMOD and TRNQSMOD, were set to "1" in BSIM4 [55]. The circuit in Fig. 3.3 was used to extract the effective resistance and capacitance from HSPICE results as follows [52]:

$$ R_{eff}(f) = \frac{\mathrm{Re}(I_{RC})}{\mathrm{Mag}(I_{RC})^2} \tag{3.7} $$

$$ C_{eff}(f) = \frac{\mathrm{Mag}(I_{RC})^2}{2\pi f\,\mathrm{Im}(I_{RC})} \tag{3.8} $$

where Re(IRC), Im(IRC), and Mag(IRC) are the real part, imaginary part, and magnitude of IRC, respectively. It is assumed in Equations (3.7) and (3.8) that the applied AC voltage Vac is 1∠0° V.

Figure 3.3: The circuit setup to extract the effective resistance and the effective capacitance values from an ac analysis.

Both NMOS and PMOS decaps were simulated with W x L sizes as follows: 15µm x 5µm, 10µm x 10µm, and 5µm x 15µm. The simulation frequency ranged from 0 to 10GHz. Typical ASIC clock rates today are in the range of 500MHz to 1GHz, but it is important to study the frequency response well beyond the clock frequency. Most of the spectral power density of digital signals lies within frequencies of up to fknee = 1/(2 trise), where trise is a signal's rise time (which can be on the order of 50ps or less), and fknee is the 3-dB cutoff frequency of the spectral power density [56]. It was assumed conservatively that trise = 50ps, and the analysis was carried out up to 10GHz.
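The extraction step can be checked on a known series RC network. For a series RC driven by a 1∠0° V source, the impedance is Z = 1/I, so the resistance is Re(I)/|I|² and the capacitance is |I|²/(2πf Im(I)). The sketch below applies that relationship to a synthetic current phasor rather than to HSPICE output; the function names and test values are assumptions made for illustration. It also evaluates the knee frequency 1/(2 trise) for a 50ps rise time.

```python
import math


def extract_series_rc(i_rc, freq_hz):
    """Recover (R_eff, C_eff) of a series RC from the current phasor
    produced by a 1V, zero-phase AC source at freq_hz."""
    mag_sq = abs(i_rc) ** 2
    r_eff = i_rc.real / mag_sq
    c_eff = mag_sq / (2.0 * math.pi * freq_hz * i_rc.imag)
    return r_eff, c_eff


def knee_frequency(t_rise_s):
    """Frequency below which most digital spectral power lies: 1 / (2 * t_rise)."""
    return 1.0 / (2.0 * t_rise_s)


def series_rc_current(r_ohm, c_farad, freq_hz, v_ac=1.0):
    """Synthetic stand-in for a simulator result: I = V / (R + 1/(j*2*pi*f*C))."""
    z = complex(r_ohm, -1.0 / (2.0 * math.pi * freq_hz * c_farad))
    return v_ac / z


if __name__ == "__main__":
    i_rc = series_rc_current(100.0, 1e-12, 1e9)  # hypothetical 100 ohm, 1pF
    r, c = extract_series_rc(i_rc, 1e9)
    print(f"R_eff = {r:.1f} ohm, C_eff = {c * 1e15:.1f} fF")
    print(f"f_knee for 50ps rise time = {knee_frequency(50e-12) / 1e9:.1f} GHz")
```

Because the check uses an ideal RC, the extracted values are frequency independent; with a real NQS device model the same two expressions return the frequency-dependent Reff(f) and Ceff(f).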
Figure 3.4: Plots of Ceff and Reff for three device sizes (W x L): 10x10µm, 15x5µm, and 5x15µm. (Panels: large-length Ceff and Reff for NMOS and for PMOS, each versus frequency from 0 to 10GHz.)

The results of the simulations are shown in Fig. 3.4. As f increases, there is a noticeable roll-off in the curves due to finite transit time effects. Devices with large L have a more pronounced effect. In fact, the Ceff curve for 5µm x 15µm quickly decays in value relative to 15µm x 5µm. A general observation on NMOS and PMOS decaps can also be made: NMOS is superior to PMOS in its high-frequency behavior since it has a larger Ceff and a smaller Reff at high frequencies, assuming the area is fixed. Although standard cells employ both NMOS and PMOS devices for decaps, these results show that NMOS decaps would provide better frequency response characteristics.

For modeling purposes, based on the frequency responses of Ceff and Reff, it is suggested [55] that the functions can be postulated in the form:

$$ C_{eff}(f) = \frac{C_{eff,0}}{1 + (f/f_T)^2\,\tau_1} \tag{3.9} $$

$$ R_{eff}(f) = \frac{R_{eff,0}}{1 + (f/f_T)^2\,\tau_1} \tag{3.10} $$

where τ1 = 1/12 [55] and the transition frequency is f_T = µ(V_GS - V_T)/(2πL²). Shown in Fig. 3.5 are the results of curve-fitting using Equations (3.9) and (3.10) against HSPICE for the NMOS device. The results are very close. To produce the results shown in Fig. 3.5, the equation for fT had to be adjusted by a fitting factor of 0.5. Similar results were obtained for PMOS devices.
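The first-order roll-off model for the effective capacitance is simple enough to evaluate by hand or in a few lines of code. The sketch below takes the transition frequency as a direct input (so any fitting factor is assumed to be already folded into it) and uses τ1 = 1/12 as stated in the text; the DC capacitance and transition frequency values are hypothetical.

```python
TAU_1 = 1.0 / 12.0  # model constant quoted in the text


def ceff_model(f_hz, ceff0, f_t_hz, tau=TAU_1):
    """First-order roll-off: Ceff(f) = Ceff0 / (1 + (f / fT)^2 * tau).

    f_t_hz is the (already fitted) transition frequency of the device.
    """
    return ceff0 / (1.0 + (f_hz / f_t_hz) ** 2 * tau)


def half_capacitance_frequency(f_t_hz, tau=TAU_1):
    """Frequency at which the model predicts Ceff = Ceff0 / 2,
    obtained by solving (f / fT)^2 * tau = 1."""
    return f_t_hz / tau ** 0.5


if __name__ == "__main__":
    CEFF0_FF = 1000.0  # hypothetical DC capacitance, fF
    F_T_HZ = 2.0e9     # hypothetical transition frequency, Hz
    for f_ghz in (0, 2, 4, 6, 8, 10):
        c = ceff_model(f_ghz * 1e9, CEFF0_FF, F_T_HZ)
        print(f"f = {f_ghz:2d} GHz: Ceff = {c:7.1f} fF")
    print(f"half-capacitance frequency = "
          f"{half_capacitance_frequency(F_T_HZ) / 1e9:.2f} GHz")
```

Since the transition frequency scales inversely with the square of the channel length, halving the channel length moves the roll-off out by a factor of four in frequency, which is the motivation for fingering.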
This demonstrates that the first-order equations for Ceff(f) and Reff(f) are reasonably accurate and, perhaps more importantly, that Ceff(m)(f0) can be easily computed for the metric without running HSPICE.

Figure 3.5: Plots of Ceff and Reff for three NMOS devices (HSPICE versus model). (Curves: 10x10, 15x5 and 5x15µm devices, HSPICE and calculated, versus frequency from 0 to 10GHz.)

At this point, all the necessary information is obtained to determine the number of fingers based on the frequency response. From Equation (3.9), the effective capacitance in Equation (3.6) at f0 with m fingers is:

$$ C_{eff(m)}(f_0) = \frac{C_{eff,0(m)}}{1 + (f_0/f_{T(m)})^2\,\tau_1}, \quad \text{where } f_{T(m)} = \frac{\mu (V_{GS} - V_T)}{2\pi L_m^2} \tag{3.11} $$

To demonstrate the efficacy of the metric, it was applied to the layout of a standard-cell decap in an available area of 2µm x 9µm. Using Equation (3.6), Table 3.1 lists the Cavg(m) metric values for the NMOS or PMOS devices for different frequency ranges. The optimal number of fingers corresponds to the largest entry in each row, marked with an asterisk. For example, if the frequency range of interest is 0 to 10GHz, then 3 NMOS fingers and 4 PMOS fingers are optimal relative to the metric. Of course, if the range is 0 to 2GHz, two fingers are sufficient for both N and P devices. Note that PMOS devices typically require one more finger than NMOS devices at higher frequencies of operation.

Table 3.1: Optimal number of fingers for different frequency ranges.

Frequency range          Cavg metric versus number of fingers
0 - f0        Device     m=1      m=2      m=3      m=4      m=5
0 - 2GHz      N          180fF    187fF*   182fF    177fF    171fF
0 - 2GHz      P          150fF    183fF*   182fF    177fF    171fF
0 - 5GHz      N          150fF    184fF*   182fF    177fF    171fF
0 - 5GHz      P          110fF    165fF    178fF*   176fF    171fF
0 - 10GHz     N          120fF    173fF    180fF*   176fF    171fF
0 - 10GHz     P          100fF    135fF    166fF    172fF*   169fF

Table 3.1 was shown to illustrate the use of the metric in determining the optimal number of fingers. In practice, the design process would be as follows. First, the area of a filler cell (in particular, the X dimension of the cell) and frequency range of operation are used as input parameters. Then, the capacitance value as a function of m is computed using Equation (3.6). Finally, the value of m producing the highest capacitance is used to implement the layout.

The results in Table 3.1 can be validated by using Equations (3.4) and (3.9) to generate Ceff(m) plots for both NMOS and PMOS devices, as shown in Fig. 3.6. The results in the plot were verified with HSPICE to ensure consistency. As an example, consider the cases with 1 finger and f0 = 10GHz. For the NMOS case in Fig. 3.6, Cavg(1),N = (Ceff,0 + Ceff(1)(f0))/2 = (190fF + 50fF)/2 = 120fF, whereas for PMOS, Cavg(1),P = (Ceff,0 + Ceff(1)(f0))/2 = (190fF + 10fF)/2 = 100fF. These are the same values that are found in the last two rows of the table with m = 1. The rest of the table is produced in the same manner for different values of m and frequency range, 0 - f0.

By inspection, the plots indicate that 3 fingers would be optimal for NMOS decaps and 4 fingers would be optimal for PMOS decaps, based on the flatness of the lines and the initial value of the capacitance. This conclusion is consistent with Table 3.1. However, by using the metric, designers can quickly obtain the optimal number of fingers for a target operating frequency, without the need for such plots or SPICE simulations.
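The selection step itself reduces to picking the finger count with the largest average capacitance. The sketch below applies that rule to the Cavg values listed in Table 3.1 and repeats the single-finger worked example at 10GHz; the data structure and function names are illustrative choices, while the numbers are those given in the text.

```python
# Cavg values in fF from Table 3.1, indexed by finger count m = 1..5.
TABLE_3_1 = {
    ("0-2GHz", "N"): [180, 187, 182, 177, 171],
    ("0-2GHz", "P"): [150, 183, 182, 177, 171],
    ("0-5GHz", "N"): [150, 184, 182, 177, 171],
    ("0-5GHz", "P"): [110, 165, 178, 176, 171],
    ("0-10GHz", "N"): [120, 173, 180, 176, 171],
    ("0-10GHz", "P"): [100, 135, 166, 172, 169],
}


def cavg(ceff_dc_ff, ceff_f0_ff):
    """Simple average of the DC and f0 capacitances (the Cavg metric)."""
    return (ceff_dc_ff + ceff_f0_ff) / 2.0


def optimal_fingers(cavg_by_m):
    """Return the finger count m (1-based) with the largest Cavg."""
    best_index = max(range(len(cavg_by_m)), key=lambda i: cavg_by_m[i])
    return best_index + 1


best = {key: optimal_fingers(values) for key, values in TABLE_3_1.items()}

if __name__ == "__main__":
    for (freq_range, device), m in best.items():
        print(f"{freq_range:8s} {device}: m = {m}")
    # Worked example from the text: one finger, f0 = 10GHz.
    print("NMOS Cavg(1):", cavg(190, 50), "fF")
    print("PMOS Cavg(1):", cavg(190, 10), "fF")
```

In a filler-cell generator, the table lookup would be replaced by evaluating the metric for each candidate m, but the final argmax step stays the same.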
[Figure 3.6: The effective capacitance, Ceff(f), of NMOS and PMOS decaps in 90nm for different numbers of fingers in a fixed area of Y=2µm and X=9µm. Panels: calculated Ceff NMOS and calculated Ceff PMOS versus frequency (GHz), with curves for 1 to 4 fingers.]

[Figure 3.7: The effective capacitance, Ceff(f), of NMOS and PMOS decaps in 90nm for different numbers of fingers in a fixed area of Y=2µm and X=9µm. Panels (a) and (b).]

Fig. 3.7 illustrates how standard cell layouts would be implemented using the above results, assuming a 10GHz operating range. These layouts would be automatically created by a decap filler cell generator. Two possible layouts are shown: (a) uses the N and P devices and (b) is NMOS only. Fig. 3.7(a) uses 3 fingers for the NMOS device and 4 fingers for the PMOS device. From an average capacitance perspective, the NMOS-only layout style of Fig. 3.7(b) is better, and this is also reflected in Table 3.1. To implement this type of layout in a standard cell, the p-well region must be extended to cover the entire area, which is not typical of standard cell design. This approach can be used as long as the design rules at the boundaries of adjacent standard cells are satisfied.

3.3 Cross-Coupled Decoupling Capacitor Designs

At the 90nm technology node, there is the possibility of oxide breakdown during an ESD event. A simple ESD protection scheme for decaps is to insert a relatively large resistance in series to limit the maximum voltage seen at the gate of the decap [24]. A minimum Reff,0 is needed to ensure ESD reliability for decap cells. A cross-coupled decap design has been proposed by cell library developers [21] to address the issue of ESD reliability. As shown previously in Fig.
2.2, the drain of the PMOS device is connected to the gate of the NMOS, and vice-versa [21]. The cross-coupled design adds series resistance to the inherent decap resistance to increase Reff,0. The frequency response characteristics of this new configuration can be evaluated to determine if the results obtained in the last section can be applied directly to the new circuit. A standard 3N-4P decap of Fig. 3.7(a) is first compared to a same-area cross-coupled decap using the HSPICE ac analysis of Fig. 3.3, and the results are shown in Fig. 3.8. The standard 3N-4P decap has a very low resistance (around 30Ω), which makes it prone to ESD failure. The cross-coupled 3N-4P design has a much higher DC Reff,0 (around 3500Ω) but a poorer frequency response for Ceff. Consequently, the tradeoff between ESD reliability and frequency response must be considered in the design process and decap layout.

To improve the frequency response, additional fingers must be used. The target resistance Reff,0_target for ESD protection in our case is a minimum of 500Ω. The number of fingers was increased to reduce the resistance from 3500Ω down to that required by ESD. According to Equation (3.5), the scale factor on the 3N-4P design can be found as follows:

$$R_{eff,0\_target} = \frac{3500}{m^2} = 500 \quad \therefore \; m = \sqrt{3500/500} \approx 2.6 \tag{3.12}$$

Scaling the 3N-4P by approximately this amount, a cross-coupled 8N-9P decap was produced. Similarly, for an ESD target Reff,0_target = 1000Ω, it was found that m=1.9, so a cross-coupled 6N-7P decap was chosen. The plots for 6N-7P and 8N-9P fingers are also illustrated in Fig. 3.8. The results show that the 8N-9P cross-coupled version is the best configuration to address both frequency response and ESD protection.
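The scale-factor arithmetic of Equation (3.12) can be checked numerically. The sketch below assumes only the 1/m² resistance scaling stated above; the function name `finger_scale_factor` is a hypothetical helper, and the final rounding to whole finger counts (8N-9P, 6N-7P) remains a layout decision that the code does not attempt to reproduce.

```python
import math


def finger_scale_factor(r_initial_ohm, r_target_ohm):
    """Scale factor m such that r_initial_ohm / m**2 equals r_target_ohm."""
    if r_initial_ohm <= 0 or r_target_ohm <= 0:
        raise ValueError("resistances must be positive")
    return math.sqrt(r_initial_ohm / r_target_ohm)


if __name__ == "__main__":
    # Cross-coupled 3N-4P starts at about 3500 ohms.
    for target in (500.0, 1000.0):
        m = finger_scale_factor(3500.0, target)
        print(f"target {target:.0f} ohm -> scale factor m = {m:.2f}")
```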
[Figure 3.8: Ceff(f) and Reff(f) comparison of fixed-area standard decap and cross-coupled decap: same MOS device sizes but different poly connections. Curves versus frequency (GHz): standard decap 3-finger NMOS and 4-finger PMOS; cross-coupled 3N-4P, 6N-7P and 8N-9P.]

From a layout perspective, the cross-coupled decaps can be realized by simply rerouting the poly connections of the standard decaps, while keeping the MOS devices the same. The layouts of two cases, 3N-4P and 8N-9P, are shown in Fig. 3.9.

[Figure 3.9: Sample layouts of cross-coupled decap cells for (a) 3N-4P (b) 8N-9P.]

One other thin-oxide issue must be addressed: the gate leakage current of the decap, which contributes to the chip's total static power. Using HSPICE, the standard and cross-coupled decap circuits were found to have almost identical gate leakage. That is, since the cell area is fixed and only the poly terminal connections are swapped, the cross-coupled design provides no inherent savings in gate leakage as compared to the standard design. There exists a simple design approach to save gate leakage. Simulations using BSIM4 SPICE models [44] indicated that PMOS gate leakage is roughly 3 times smaller than NMOS gate leakage for same-size transistors [27][43]. Therefore, PMOS devices are preferred from a leakage perspective.
Since PMOS devices have a poor frequency response, more fingers can be used to obtain the desired result. But this must be carried out in the context of the cross-coupled design to preserve ESD protection. The basic idea to control leakage is to have the smallest possible NMOS device cross-coupled with the largest possible multi-fingered PMOS device. This way, the advantages of PMOS leakage and cross-coupling ESD protection are preserved. The layouts of two configurations are illustrated in Fig. 3.10. A small NMOS device is used in both cases. Note the n-well regions have been expanded in both layouts to accommodate the larger PMOS device. Fig. 3.10(a) uses 9 PMOS fingers while Fig. 3.10(b) has a total of 16 fingers. The same cell area as before (2µm × 9µm) was used for the two designs.

Table 3.2: Comparison of the passive decap designs and their gate leakage current.

Decap cell             Layout description                                       Gate leakage
Std. 3N-4P             Std. decap with 3 fingers for N and 4 fingers for P      262.4 nA
Cross-coupled 3N-4P    Cross coupled with 3 fingers for N and 4 fingers for P   260.8 nA
Cross-coupled 8N-9P    Cross coupled with 8 fingers for N and 9 fingers for P   206.8 nA
Modified 9 fingers     Cross coupled with smallest N and 9 fingers for P        119.1 nA
Modified 16 fingers    Cross coupled with smallest N and 16 fingers for P       99.7 nA

Table 3.2 summarizes the leakage values for the different cases. The standard and cross-coupled 3N-4P decaps have roughly the same leakage. It is somewhat reduced for the 8N-9P case since there is less area for leakage. However, for the two layouts with the small NMOS devices, the leakage is cut in half. In fact, for the 1N-16P case, the leakage is 62% less than for the standard 3N-4P decap. The Reff,0_target of the cross-coupled design must be set based on ESD considerations, but that also controls the maximum number of fingers permitted, mmax.
Since the NMOS device is fixed while the PMOS device is multi-fingered, the following equation can be used to determine the resistance:

$$R_{eff,0\_target} = R_N \parallel \frac{R_P}{m_{max}^2} \tag{3.13}$$

where RN and RP are the resistances of the decaps without fingers, and "||" means "in parallel with." This target Reff,0 sets up the equation for the maximum number of fingers. That is,

$$m_{max} = \sqrt{\left(\frac{1}{R_{eff,0\_target}} - \frac{1}{R_N}\right) R_P} \tag{3.14}$$

As described in the previous section, the optimal m depends on the frequency response (i.e., Cavg(m)), but the number of fingers selected should not exceed mmax in order to satisfy ESD requirements.

[Figure 3.11: Frequency response of various cross-coupled designs. Curves versus frequency (GHz): standard decap 3N-4P; cross-coupled 3N-4P and 8N-9P; modified cross-coupled 1N-9P and 1N-16P.]

Fig. 3.11 illustrates the frequency response of the various designs from 0-10GHz. All of the configurations provide similar Ceff,0 values but are dramatically different in the frequency response characteristics. The standard 3N-4P case is the best, followed by the modified cross-coupled 1N-16P. The Reff,0 values are different in all cases but only the standard 3N-4P case is unsuitable for ESD protection. However, it is desirable to select the configuration with the lowest Reff,0 that satisfies the ESD criteria (500Ω in this case) for a rapid time-domain response. Overall, the cross-coupled 1N-16P layout is recommended because it provides the required Reff,0 for ESD reliability and saves at least 50-60% on gate leakage.

To summarize, for 90nm and 65nm, standard-cell passive decap design should follow the layout strategy shown in Fig. 3.10.
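As a rough illustration of the finger limit, the sketch below treats the fixed NMOS resistance R_N as being in parallel with the PMOS resistance R_P scaled by 1/m², and solves for the largest whole m that keeps Reff,0 at or above the ESD target. The resistance values in the usage section are hypothetical placeholders, not data from this work, and the helper names are assumptions.

```python
import math


def parallel(r_a, r_b):
    """Resistance of r_a in parallel with r_b."""
    return r_a * r_b / (r_a + r_b)


def r_eff0(r_n, r_p, m):
    """Reff,0 for a fixed NMOS (r_n) and an m-finger PMOS (r_p / m**2)."""
    return parallel(r_n, r_p / m**2)


def max_fingers(r_target, r_n, r_p):
    """Largest whole finger count m that keeps Reff,0 >= r_target."""
    headroom = 1.0 / r_target - 1.0 / r_n
    if headroom <= 0:
        # R_N in parallel with anything is below R_N, so the target
        # cannot be met unless it is smaller than R_N.
        raise ValueError("r_target must be smaller than r_n")
    return math.floor(math.sqrt(headroom * r_p))


if __name__ == "__main__":
    # Hypothetical values in ohms, chosen only to exercise the formula.
    r_target, r_n, r_p = 500.0, 1000.0, 260000.0
    m_max = max_fingers(r_target, r_n, r_p)
    print("m_max =", m_max)
    print("Reff,0 at m_max     =", round(r_eff0(r_n, r_p, m_max), 1))
    print("Reff,0 at m_max + 1 =", round(r_eff0(r_n, r_p, m_max + 1), 1))
```

Rounding down is the conservative choice here, because adding fingers lowers the PMOS branch resistance and therefore Reff,0.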
By using the smallest NMOS device and the largest multi-fingered PMOS device in the cross-coupled form, the decap has the lowest leakage and is able to satisfy the ESD requirements.

3.4 Summary

This chapter investigated the tradeoffs between high-frequency performance of decaps and ESD protection and their impact on the layout of standard-cell passive decaps. A design metric was introduced to determine the optimal number of fingers to use in the standard-cell layout to obtain a desired capacitance level over a target operating frequency. Models were developed to capture the frequency responses of Reff and Ceff for a given technology with only a few parameters. As a result, the models can be used to predict the same characteristics of future technologies.

For ESD protection, a cross-coupled design was proposed by cell library developers to provide a large series resistance, but it suffers from reduced frequency response and provides no savings in gate leakage. This chapter demonstrated that more fingers are needed with the cross-coupled standard-cell layouts to provide the target resistance value for ESD protection. The design of the target resistance can follow the formulae provided in this chapter. The layout with the smallest NMOS device and a multi-fingered PMOS device delivers acceptable frequency response and ESD reliability, while providing the lowest leakage.

Chapter 4

Active Decoupling Capacitor Design

4.1 Introduction

Passive decaps described previously have a small layout and are useful within the block of standard cells. However, for large global decaps (i.e., outside the block), other approaches can be used. This chapter addresses a novel application of the active decoupling capacitor. The objective is to investigate the effectiveness of removing local IR-drop violations (usually called "hot spots") by replacing passive decaps with active decaps.
Starting from 90nm, large power supply noise levels in localized regions may unexpectedly be present in high-speed ASICs [19]. These unresolved hot spots cause timing closure problems or result in functional failures in extreme cases. These hot spots are often detected late in the design cycle so they become problematic and difficult to remove. To remove them, designers must consider many options such as moving the logic blocks, adding or rearranging the power pins, and/or modifying the power grid design. Near the tapeout deadline, such time-consuming design iterations may not always be feasible. In these situations, an active decap design can play an important role in reducing supply noise without major changes to the design [31]. That is, the active decap can be a drop-in replacement for the passive decap in an attempt to remove any remaining hot spots, thereby saving time and effort. In this chapter, to explore the effectiveness of an active decap, quantitative data will be provided on the expected improvements, sizing considerations and placement of an active decap relative to the hot spot.

Two active decap circuits using switched capacitances were proposed in the past to regulate the supply voltages [37][38][39]. By increasing charge delivery capability, the two designs in [37] and [39] are quite effective at reducing supply noise, but they also have certain limitations. The design in [37] can switch quickly but dissipates large power, whereas [39] saves power but experiences long switching delays. They both mitigate the effects of LC resonance [40], which is typically in the 20-400MHz band. Recent work has been done to reduce LC resonance using an improved switched decap technique [41]. Further work has also been reported on a distributed active decap to greatly boost the effective decap value while reducing the area requirements [41].
In this chapter, a modified active decap design that has lower power than [37] and a better response time than [39] is proposed, targeting ASIC applications up to 1GHz. The issue of directly replacing an area occupied by a passive decap with an active decap is addressed to determine the degree of noise reduction that can be obtained and the associated tradeoffs of area and power.

4.2 Active Decoupling Capacitor Analysis and Design

4.2.1 Active Decap Concept and Design Considerations

The basic idea of an active decap is to switch a pair of passive decaps, Cdecap, from parallel to series to provide a local boost in the supply voltage [37][38][39]. As illustrated in Fig. 4.1(a), the decaps are initially in a parallel configuration with a full charge developed across both capacitors. In this standby state, the equivalent capacitance is 2Cdecap. When placed in a series stack, as in Fig. 4.1(b), the boosted voltage is ideally 2VDD while the equivalent capacitance is reduced to Cdecap/2. When switched back to parallel, the voltage returns to the original value of VDD. In this case, the stacking level n is 2.

[Figure 4.1: Active decap concept and its MOS implementation. (a) decaps in parallel, (b) decaps in series, (c) circuit implementation.]

The active decap circuit is depicted in Fig. 4.1(c), with Cdecap and the switches implemented using NMOS and PMOS transistors [37][38][39]. When the capacitors are in parallel, both Mn1 and Mp1 are on while Mn2 and Mp2 are off (i.e., subthreshold). When the capacitors are in series, both Mn1 and Mp1 are off while Mn2 and Mp2 are on. The switches exhibit finite "on" resistances, as indicated in the figure, and there is also thin-oxide gate leakage through the decaps, Ileak, especially in 90nm and 65nm CMOS technologies. Both of these effects reduce the performance of the active decap, as described below.
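The ideal parallel and series states can be written down directly. The sketch below is a simplified model that ignores the switch resistances and leakage just mentioned; the function names are hypothetical, and the capacitance is expressed in arbitrary units.

```python
def parallel_state(c_decap, v_dd, n=2):
    """Ideal standby state: n decaps in parallel across the supply.

    Returns (equivalent capacitance, voltage across the bank).
    """
    return n * c_decap, v_dd


def series_state(c_decap, v_dd, n=2):
    """Ideal boosted state: n fully charged decaps stacked in series.

    Returns (equivalent capacitance, ideal stacked voltage).
    """
    return c_decap / n, n * v_dd


if __name__ == "__main__":
    # Example with Cdecap = 700 (in pF) and VDD = 1V.
    print("parallel:", parallel_state(700.0, 1.0))
    print("series:  ", series_state(700.0, 1.0))
```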
For the general case of stacking n parallel decaps into a series chain, the maximum improvement can be characterized in terms of a gain, G [37]. If k is the voltage regulation tolerance, where kVDD is the permissible drop in voltage, then the charge delivered by n parallel capacitors is:

$$Q_{parallel} = kV_{DD} \cdot nC_{decap} \tag{4.1}$$

When the capacitors are stacked in series, the charge delivered for the same voltage drop is:

$$Q_{series} = \left[nV_{DD} - (1-k)V_{DD}\right] \cdot C_{decap}/n \tag{4.2}$$

The overall charge gain is:

$$G = \frac{Q_{series}}{Q_{parallel}} = \frac{\left[nV_{DD} - (1-k)V_{DD}\right] \cdot C_{decap}/n}{kV_{DD} \cdot nC_{decap}} \tag{4.3}$$

Therefore, as given in [37], the gain is controlled by n and k:

$$G = \frac{n - 1 + k}{n^2 k} \tag{4.4}$$

There exists a value of k such that the regular decap outperforms the active decap. For example, setting G=1 and n=2, it is found that k=1/3. For values of k > 1/3, the active decap is of no value. However, if k is below this value, the active decap is able to deliver more charge. For example, if k=0.1 and n=2, then G=2.75. This implies that 2.75 times more charge can be delivered by the active decap before its output voltage drops to the same level as the passive decap.

Previous research [37][38][39] provides no information on practical limitations when using Equation (4.4). In practice, this level of improvement is not achievable because of the switch resistances and leakage currents. In fact, the boosted voltage cannot reach nVDD but instead reaches a lower voltage of bVDD. Therefore, the gain equation should be rewritten as:

$$G = \frac{\left[bV_{DD} - (1-k)V_{DD}\right] \cdot C_{decap}/n}{kV_{DD} \cdot nC_{decap}} \tag{4.5}$$

where

$$b = n \cdot f(R_{on}) \cdot g(C_{decap}) \tag{4.6}$$

The reduction factors, f(Ron) and g(Cdecap), depend on the switch resistance, Ron, and the leakage current which, in turn, is proportional to Cdecap. Using circuit simulation, normalized plots of f(Ron) and g(Cdecap) are provided in Fig. 4.2. The switch resistance has a more pronounced effect on b as compared to the leakage current. For example, with Ron=10Ω and Cdecap=700pF, it can be obtained that f=0.9 and g=0.95 from Fig. 4.2.
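The gain expressions lend themselves to a quick numerical check. In the sketch below, the ideal gain is (n - 1 + k)/(n²k) and the derated gain replaces the ideal stacked voltage nVDD with bVDD, where b = n·f·g; the function names are hypothetical, and f and g are simply the two example readings quoted from Fig. 4.2.

```python
def ideal_gain(n, k):
    """Ideal charge gain of n stacked decaps for a tolerance k."""
    return (n - 1 + k) / (k * n**2)


def boost_factor(n, f, g):
    """Achievable boost b = n * f * g, with f and g read from Fig. 4.2."""
    return n * f * g


def derated_gain(b, n, k):
    """Charge gain when the stack only reaches b * VDD instead of n * VDD."""
    return (b - (1 - k)) / (k * n**2)


if __name__ == "__main__":
    n, k = 2, 0.1
    print("ideal gain:", ideal_gain(n, k))
    print("break-even check, k = 1/3:", ideal_gain(n, 1.0 / 3.0))
    b = boost_factor(n, f=0.9, g=0.95)
    print("b =", round(b, 2), "-> derated gain:", round(derated_gain(b, n, k), 2))
```

When b equals n, the derated expression reduces to the ideal one, which is a convenient sanity check on any values read from the plots.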
If the two effects are combined, then b = 2(0.9)(0.95) ≈ 1.7 instead of 2. With k=0.1, the achievable gain is now reduced to G ≈ 2.0. The actual final voltage value, aVDD, when the active decap supplies the same charge as the passive decap is determined by setting G=1 and solving for a in the following equation:

$$G = \frac{\left(bV_{DD} - aV_{DD}\right) \cdot C_{decap}/n}{kV_{DD} \cdot nC_{decap}} \tag{4.7}$$

In this case, with b=1.7, k=0.1 and n=2, one can obtain a=1.3, which implies that the active decap will be boosted initially to 1.7V (instead of 2V) and then fall back to 1.3V due to the charge demand of a nearby logic circuit. In the passive case, the initial voltage of 1V would be reduced to 0.9V, so the active decap is still superior even with the nonidealities included.

[Figure 4.2: The reductive factors f and g for the boosted voltage as a function of (a) "on" resistances of the switches, Ron (Ω), and (b) leakage due to the size of decap Cdecap.]

To design the sizes of the MOS switches, a number of issues must be considered. From the above analysis, a small resistance value is preferable to increase the voltage boosting capability of the active decap, and to improve transient response times. The "on" resistances also provide ESD protection because they are in series with the decaps. Any large voltage fluctuations are absorbed by the resistors to reduce the drop across the thin-oxide gates of the decaps, similar to the effect of cross-coupling decaps [22]. Therefore, this resistance must be large enough to safely protect the thin-oxide gates. Considering the factors of boosted voltage level, decap performance, and ESD reliability, the "on" resistances should be designed to be in the range of 10-20Ω by proper selection of transistor widths. This will require rather large switches.
Once their sizes are determined, the buffers generating the switching signals must supply enough current to drive the large capacitances, resulting in large sensing and switching circuitry that consumes a considerable amount of power and area. Therefore, these active decaps should be used sparingly in ASIC designs but are particularly suitable for localized hot-spot removal.

4.2.2 Overall Active Decap Architecture

[Figure 4.3: Active decap architecture. Blocks between the global VDD and VSS rails: reference voltage generator, high-pass filters, comparators, switched decaps.]

Fig. 4.3 illustrates the complete active decap design containing four blocks: a reference voltage generator, a pair of high-pass filters, two comparators, and the switched decaps. Compared to the previous work [37][38][39], the key difference in this new approach is the use of latch-based comparators. The user logic circuit block shown in the figure is considered to be the main cause of power supply noise violation.

The switch control circuit for the active decap is realized using two comparators. The differential inputs of each comparator decide the standby voltage level at the outputs of the comparators. In the standby mode, the top comparator has an output at VDD, whereas the bottom comparator is set to VSS. When the power grid discharges, VDD will drop and VSS will rise. The voltage variations are passed through the high-pass filters to the comparator inputs, causing the comparators to reverse their output values and switching the decaps from parallel to series. Later, when the power grid charges up, the comparator inputs and outputs switch back to their original values. The use of latch-based comparators with hysteresis to switch the decaps is one of the main contributions of this work. An enable signal is provided for testing purposes to allow the active decap circuitry to be turned on or off. When off, the design behaves purely as a passive decap. This allows for a comparison between active and passive decaps.
The trigger voltage for the circuit is set by the comparators and the resistor R1. In Fig. 4.3, the reference voltages are generated by a simple voltage divider and are set to roughly VDD/2. However, depending on the comparator design, the absolute input levels of VDD/2 are somewhat flexible due to the differential nature of the inputs. The diode-connected transistors in the reference generator should have large length and small width to control the static current. Inserting a small resistor, R1, between the two transistors is intended to separate the reference voltages by approximately 30mV. Then, if the comparators are designed to switch when the voltage difference at the inputs is 10-15mV (plus an additional 5mV of hysteresis), the overall design will trigger at approximately 50mV.

If R1 is chosen to be smaller, the sensitivity of the active decaps is improved [41], at a cost of significantly increased dynamic power because the active decap is triggered more often. If R1 is designed to be larger, the resulting supply noise k will increase, as shown in Fig. 4.4. When the two input signals of the comparators are separated by a level that exceeds the supply noise generated by the nearby logic, the comparators will not switch, making it a passive decap. In the plot, the active decap stops switching when the input voltages differ by approximately 130-150mV.

[Figure 4.4: Difference in the input voltages of the comparators as a function of supply noise k. Axes: supply noise k versus comparator input voltage difference (mV).]

The targeted supply noise k value is 0.09-0.1 (i.e., the kVDD value is 90-100mV), resulting in a triggering input voltage between 20mV and 70mV. Thus, the comparator can be designed with a 20mV switching threshold, and the input voltage difference can then be adjusted by varying R1 to the midpoint of about 50mV.
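The trigger-level budget described above is a simple sum, sketched below under the assumption that the reference separation, the comparator switching threshold and the hysteresis add directly. The function name is a hypothetical helper, and the values are the approximate ones quoted in the text.

```python
def trigger_voltage_mv(separation_mv, comparator_threshold_mv, hysteresis_mv):
    """Supply disturbance (mV) needed to flip a comparator.

    Assumes the three contributions add directly: the static separation of
    the reference inputs, the comparator switching threshold, and hysteresis.
    """
    return separation_mv + comparator_threshold_mv + hysteresis_mv


if __name__ == "__main__":
    # About 30mV separation, 10-15mV threshold and 5mV hysteresis.
    low = trigger_voltage_mv(30, 10, 5)
    high = trigger_voltage_mv(30, 15, 5)
    print(f"trigger level is roughly {low}-{high} mV")
```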
From another perspective, since the maximum IR drop is allowed to be 100mV, the active decap should trigger at about half of 100mV, which is 50mV, because there is a delay between the time that the active decap is switched and the time that it actually boosts the supply voltage. Of course, an input triggering voltage of 20mV would be better in that sense, but the active decap would then switch too often, raising concerns about large power consumption. Designing the triggering voltage at 50mV appears to be a good tradeoff between power and performance. Therefore, the comparators are required to switch once the voltage (VDD-VSS) discharges by about 50mV, which can be considered as the input sensitivity of the active decap.

The delay between detection and activation of the switched decap, td, and the delay difference between the two outputs of the comparators, Δtd, impact the bandwidth of the active decap and the boosted voltage, respectively. Specifically, the delay of the comparators td is inversely proportional to the bandwidth of the active decap, BW. That is,

$$BW \propto \frac{1}{t_d} \tag{4.8}$$

If the operating frequency is below the bandwidth, the active decap reduces the supply noise relative to a same-area passive decap. On the other hand, if the clock frequency is beyond the bandwidth, the active decap may result in equal or more supply noise than the passive decap. If it takes an entire clock period to switch the decaps, then they behave just like the passive case because they will not switch during the whole clock cycle. Before they actually switch, the supply voltage returns to the right level and they are forced not to switch any more. The same situation happens on the next cycle. Therefore, as the switching frequency of the logic increases, the active decap is less and less effective. Then, at 1/td, it looks like a passive decap.
Beyond that point, due to larger static power consumption and varying "on" resistance of the switches, the active decap becomes worse than the passive decap. The above observation will be validated in the next section.

Ideally, the delay of the top and the bottom comparators should be the same. In practice, a difference in delay Δtd between the top and the bottom comparators exists and can be defined as:

$$\Delta t_d = t_{d,top} - t_{d,bottom} \tag{4.9}$$

The delay difference will not result in a short connection between power and ground or an open circuit where no decoupling capacitance is present. The effect, however, is that the boosted voltage b is degraded by leakage current when the switches are not turned on and off at the same time. A function, h(Δtd), can be used to capture this effect, as shown in Fig. 4.5. Therefore, Equation (4.6) should be re-written as:

$$b = n \cdot f(R_{on}) \cdot g(C_{decap}) \cdot h(\Delta t_d) \tag{4.10}$$

It is desired to keep the delay difference of the two comparators small even under process/voltage/temperature (PVT) variations to ensure sufficient improvement of the boosted voltage.

[Figure 4.5: The reductive factor h for the boosted voltage as a function of delay difference Δtd (ps).]

4.2.3 Design Specifications

When designing the active decap, a certain value for Cdecap is needed to keep the supply noise small. However, in the case of drop-in replacement, this design freedom is not available. Therefore, as the first step, designers should check through simulation whether replacing the passive decap with an active decap can remove the nearby hot spot. The next step is to select a proper "on" resistance of the switches, Ron. For a high boosted voltage b, Ron should be kept low enough. On the other hand, Ron must also be high enough for ESD protection. Designers should make this tradeoff in selecting Ron to design the switches. The size of the switches and the passive decaps determines the load capacitance on the comparators.
The comparator design should generally satisfy the low power and high speed requirements of ASIC applications. For example, a static power of 5mW or below can be achieved while the bandwidth of the active decap should be able to handle a 1GHz clock. This sets the average comparator delay to approximately 1ns. Note that the delay requirement should be fulfilled under PVT variations. Therefore, the worst-case delay should be used here. The output of the comparators in the standby state should be close to VDD or VSS to reduce leakage current through the passive decaps and switches to save power. The comparator should be designed to provide high gain in the switching region such that the output will swing from low to high when the input varies by 10-15mV. Higher gain can ease the requirement on input DC biasing and lower the static power consumption. From a stability perspective, a certain amount of hysteresis is desired to reduce the risk of oscillation. A practical value of 5mV for hysteresis is reasonable. Overall, the design specifications are summarized in Table 4.1.

Table 4.1: Design specifications of the active decap.

Specification                  Value
Worst-case switching delay     < 1 ns
Bandwidth                      > 1 GHz
Static power                   < 5 mW

Specific design values are as follows. In the reference voltage generation circuit, the size of the transistors is chosen to provide a branch current of 5-6µA. R1 can then be implemented with a value of 5kΩ to produce a separation voltage at the comparator inputs of about 30mV. For the RC-based high-pass filters, a cut-off frequency above 10MHz is used to filter out low-frequency supply noise to save power. However, if the cut-off frequency is set too high, it may cause oscillation at the supply rails. A cut-off frequency of 16MHz was finally selected. The two resistors have the value of R2 = R3 = 10kΩ, while the capacitors (C2 and C3) are implemented with the same value of 1pF.
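These component values can be cross-checked with two one-line formulas. The sketch below assumes a first-order RC filter, for which the cut-off frequency is 1/(2πRC), and assumes that the reference separation is simply the branch current times R1; both function names are hypothetical helpers.

```python
import math


def rc_cutoff_hz(r_ohm, c_farad):
    """Cut-off frequency of a first-order RC filter."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


def reference_separation_v(branch_current_a, r1_ohm):
    """Voltage developed across R1 by the reference branch current."""
    return branch_current_a * r1_ohm


if __name__ == "__main__":
    f_c = rc_cutoff_hz(10e3, 1e-12)  # R2 = R3 = 10k ohm, C2 = C3 = 1pF
    print(f"high-pass cut-off: {f_c / 1e6:.1f} MHz")
    for i_branch in (5e-6, 6e-6):  # 5-6 uA branch current, R1 = 5k ohm
        sep = reference_separation_v(i_branch, 5e3)
        print(f"separation at {i_branch * 1e6:.0f} uA: {sep * 1e3:.0f} mV")
```

With these values the cut-off evaluates to just under 16MHz and the separation to 25-30mV, in line with the figures quoted above.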
These RC values are somewhat flexible, unless the cut-off frequency is set exceedingly high. For instance, it was observed that for a cut-off frequency of 1.6GHz or above, oscillation on the supply rail will occur. Therefore, it is a good approach to design the cut-off frequency to be a few orders of magnitude smaller than the oscillation frequency.

4.2.4 Latch-Based Comparator Design

There are a wide variety of ways to design a comparator [57][58][59][60]. Also, the Appendix of this dissertation provides the fundamentals of comparator designs. In this specific application, the two comparators must be able to sense voltage variations that exceed the pre-specified sensitivity level (i.e., 10-15mV in this case). When the decaps are in parallel, the subthreshold leakage from the switches consumes considerable power due to the large sizes of the switch transistors. To reduce leakage current, the outputs of the comparators should be as close as possible to either VDD or VSS. The supply noise budget for a 1V power supply is normally less than 50-100mV and the output is full swing, indicating the need for high gain in the switching region.

With the above considerations, a latch-based comparator was selected for this application, as shown in Fig. 4.6. The exact transistor sizes are listed in Table 4.2. The branch currents of the comparators are as follows: Ib1 = 826µA, I7 = 377µA, I8 = 135µA, Ib2 = 900µA, I17 = 339µA, and I18 = 136µA.

[Figure 4.6: Complementary comparator design: (a) n-type input for the top comparator, and (b) p-type input for the bottom comparator.]

Table 4.2: Transistor sizes of the comparators.
Transistors       Width/Length       Transistors        Width/Length
Mb1 (NMOS)        75µm/0.1µm         Mb2 (PMOS)         150µm/0.1µm
M1/M2 (NMOS)      37.5µm/0.1µm       M11/M12 (PMOS)     75µm/0.1µm
M3/M4 (PMOS)      37.5µm/0.1µm       M13/M14 (NMOS)     18.75µm/0.1µm
M5/M6 (PMOS)      48µm/0.1µm         M15/M16 (NMOS)     24µm/0.1µm
M7/M8 (NMOS)      37.5µm/0.1µm       M17/M18 (PMOS)     75µm/0.1µm
M9/M10 (PMOS)     75µm/0.1µm         M19/M20 (NMOS)     18.75µm/0.1µm

This two-stage architecture satisfies the need for high gain and full swing, but must be designed to avoid any potential stability or oscillation problems. For a latch-based first stage, introducing a certain amount of hysteresis will prevent the comparator from switching back to the standby state in the presence of small variations around the switching region. For the n-type input comparator shown in Fig. 4.6(a), the hysteresis voltage, Vhys, is given by [57]:

$$V_{hys} = \sqrt{\frac{2I_{b1}}{\mu_n C_{ox} (W/L)_{M1}}} \cdot \frac{\sqrt{\lambda} - 1}{\sqrt{1 + \lambda}} \tag{4.11}$$

where Ib1 is the bias current. In Equation (4.11), λ is the size ratio [(W/L)M5 / (W/L)M3] and λ>1 for a latch. Once the slew rate and the bias current are determined, both Ib1 and (W/L)M1 are fixed, leaving λ as the only parameter for Vhys. In this case, a λ value of 1.28 was chosen, producing a hysteresis voltage of around 5mV.

The second stage converts the differential signals into a single-ended output and provides the requisite level shifting. The second stage is also used as an output buffer to drive the large switches, where the desired slew rate can be achieved by adjusting the bias currents and transistor sizes. Complementary designs are used for the top and the bottom comparators to have roughly equal switching delays. The bias voltages for the comparators are generated by simple current mirrors. PVT variations on the comparator and the bias generation can cause delay differences in the comparator outputs. This delay difference acts to further reduce the boosted voltage, as illustrated earlier in Fig. 4.5.
During the design stage, great care was taken to ensure that the delay differences are within 100ps under all PVT variation simulations, which results in an additional 5% loss in the boosted voltage (i.e., 1.6V rather than 1.7V). The dominant poles of this two-stage comparator were identified for stability compensation since there is a feedback path through the supply rails back to the comparator inputs. Therefore, the output resistance and the load capacitance of the comparator need to be carefully designed to properly position the dominant pole. In this case, a Miller compensation capacitance Cc is added to shift the dominant pole to a low frequency to improve stability. Also, a nulling resistor Rz is present to cancel the right-half-plane zero [59].

The simulated large-signal DC characteristics of the n-type input comparator are illustrated in Fig. 4.7(a), where the curves with hysteresis are shown. Here, the switching region of the comparator is in the range of ±10mV. A λ value of 1.28 from Equation (4.11) was selected to produce about 5mV of hysteresis. The peak DC gain is around 48dB. The AC curve for the comparator is shown in Fig. 4.7(b), where the phase margin (PM) at unity gain is indicated as 39°.

Figure 4.7: (a) DC and (b) AC (compensated) characteristic curves for the two-stage latch-based comparator design (n-type input shown).
The active decap must be able to boost the supply voltage within one clock cycle such that the average supply noise per clock cycle is reduced, since this factor controls the path delay of the logic blocks [20][61].
In this case, our design goal was set to a maximum clock speed of about 1GHz, which makes it suitable for today's high-end ASICs, and even medium-speed custom designs. When the supply voltage drops to 0.9V (implying 100mV of noise, i.e., k=0.1), the average switching delay for a full output swing was designed to be 0.5ns, which should allow proper operation up to 2GHz. The boosted voltage, based on prior considerations, should be in the range of 1.6V. The charge demand of the logic circuit itself will cause an additional voltage drop of kVDDn² = 0.1 · 1 · 2² = 0.4V, resulting in an expected final voltage of 1.2V. In addition, the current drive of the comparators will act to reduce the supply voltage further, but should keep the value above 0.9V, which is the noise budget.

The active decap was simulated and compared to prior architectures that were also redesigned in 90nm CMOS to quantify the improvement and design tradeoffs. The circuit proposed in [39] was implemented first; it uses opamps in place of comparators, followed by a chain of inverters to drive the switches. The inverters were optimally sized according to logical effort [39]. However, its minimum delay was about 0.9ns (1.2ns for the slow process corner), which is almost unsuitable for typical ASIC speeds, although its power dissipation was only 0.8mW. In the second design [37][38], the sensing circuitry is formed by a pseudo-cascode amplifier delivering high speed at the cost of high power. The original design [37] was implemented in a 0.15µm process. The design was adapted to the 90nm process and it was found that, by replacing their comparator design with the latch-based version, the static power consumption of the switching circuitry is reduced from 13mW to 2.8mW, an improvement factor of almost 5X, while the delay only increases slightly from 0.4ns to 0.5ns.
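The targets quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch rather than part of the original design flow: the function and variable names are illustrative, VDD is taken as 1V, and the relation used is the one stated in the text (boosted level minus the drop kVDDn²).

```python
def final_voltage(vdd, b, k, n):
    """Expected supply level after the boost: b*VDD minus the drop k*VDD*n^2
    caused by the charge demand of the logic (symbols as used in the text)."""
    return b * vdd - k * vdd * n ** 2

# Values quoted in the text; the names are hypothetical.
vdd, b, k, n = 1.0, 1.6, 0.1, 2
v_final = final_voltage(vdd, b, k, n)   # 1.6 V boost minus 0.4 V drop
power_ratio = 13.0 / 2.8                # static power: pseudo-cascode vs latch-based
max_clock_ghz = 1.0 / 0.5               # one 0.5 ns output swing per clock period

print(f"final voltage   : {v_final:.2f} V")
print(f"power reduction : {power_ratio:.1f}x")
print(f"max clock       : {max_clock_ghz:.1f} GHz")
```

The power ratio evaluates to roughly 4.6, which is the "almost 5X" figure, and the 0.5ns swing corresponds to the 2GHz upper limit.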
The comparison of the three designs is provided in Table 4.3, where the design parameters that fail to satisfy the specifications are shown in italic. Note that the new design features hysteresis while the other two do not. The small-signal AC characteristics of the three designs are shown in Fig. 4.8. For the circuit in [39], the chain of inverters was removed for small-signal analysis.

Table 4.3: Simulated switching circuit design specification comparison.
Parameter                   Specification   [37][38]   [39]      This work
Process                     1V-core 90nm STM
Switching delay (typical)   -               0.4 ns     0.9 ns    0.5 ns
Switching delay (slow)      < 1 ns          0.5 ns     1.2 ns    0.75 ns
Bandwidth                   > 1 GHz         2 GHz      0.8 GHz   1.5 GHz
Static power*               < 5 mW          13 mW      0.8 mW    2.8 mW
Hysteresis voltage          -               0          0         4.2 mV
* Switching circuitry only

Figure 4.8: AC characteristic curves for the three designs.

4.3 Chip Design and Experimental Results

4.3.1 Test Chip Setup

A test chip was fabricated in a standard 90nm 1V-core CMOS process with seven metal layers to validate the results and to quantify the degree of improvement as operating frequency increases when an active decap is used as a drop-in replacement for a passive decap. The test chip setup is shown in Fig. 4.9, where an active decap, a passive decap, and some user logic are implemented. The layout of the active decap is shown in Fig. 4.10. The switching circuitry is located in the center, with the two parallel decaps on either side. The decap on the left is PMOS, and the one on the right is NMOS. The total layout area for the active decap is 600µm × 142µm = 0.085mm², in which the two decaps on either side combine for an area of 0.077mm². The switching circuitry, including switch transistors (Mn1, Mn2, Mp1 and Mp2), accounts for only 10% of the area and this does not greatly affect the final voltage drop, as shown below.

Figure 4.9: Test chip setup.

Figure 4.10: Layout of active decap showing the relative size of the components.

The area overhead consumed by the sensing and switching circuitry should be considered as an additional penalty for the active decap performance. If the percentage area overhead is defined as x, the charge provided by the active decap is:

$$Q_{series} = \left[ b V_{DD} - (1 - k) V_{DD} \right] \cdot (1 - x) C_{decap} / n \qquad (4.12)$$

Thus, Equation (4.7) should be re-written as:

$$a V_{DD} = b V_{DD} - \frac{k V_{DD} \cdot n C_{decap}}{(1 - x) C_{decap} / n} = b V_{DD} - \frac{k V_{DD} n^2}{1 - x} \qquad (4.13)$$

Assuming k=0.1, n=2, and b=1.5, the final voltage a can be plotted as a function of the area overhead x, as shown in Fig. 4.11. From the figure, it is clear that the area overhead should be limited to within 30% to achieve a reasonable final voltage. If the area overhead is above 43%, using the active decap brings no benefit. In our case, x=10% so that the penalty is only 50mV.

Figure 4.11: Final voltage a as a function of sensing and switching circuitry area overhead x.

The switch sizes were chosen to have a suitable parasitic series resistance to provide ESD protection, sufficient transient response [12] and good damping for potential LC resonance [13].
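The effect of the area overhead x on the final voltage can be explored numerically. The sketch below is an illustration under stated assumptions, not the author's script: it assumes the final voltage ratio takes the form a = b - k·n²/(1 - x), and it uses b = 1.6 (the boosted level quoted in Section 4.2.4) rather than the b = 1.5 quoted for Fig. 4.11, because that value reproduces the 43% break-even point. All names are hypothetical.

```python
def final_voltage_ratio(x, b=1.6, k=0.1, n=2):
    """Final voltage ratio a for area overhead x (0 <= x < 1).
    Assumed form: a = b - k*n^2 / (1 - x)."""
    return b - k * n ** 2 / (1.0 - x)

def break_even_overhead(b=1.6, k=0.1, n=2):
    """Overhead x at which a falls to the unboosted level (1 - k),
    i.e. the point where the active decap brings no benefit."""
    return 1.0 - k * n ** 2 / (b - (1.0 - k))

# Penalty of a 10% overhead relative to an ideal zero-overhead design, in mV for VDD = 1V.
penalty_mv = (final_voltage_ratio(0.0) - final_voltage_ratio(0.10)) * 1000
x_max = break_even_overhead()

for x in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"x = {x:4.0%}  a = {final_voltage_ratio(x):.3f}")
print(f"penalty at x = 10%: {penalty_mv:.0f} mV")
print(f"break-even overhead: {x_max:.0%}")
```

Under these assumptions the penalty at x = 10% comes out near 44mV and the break-even overhead near 43%. With b = 1.5 the same expression gives a lower break-even point, so the choice of b should be treated as a parameter of the sketch.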
In our case, the two parallel decaps are formed using thin-oxide transistors to improve area efficiency, since ESD is not a major concern due to switch resistances inherent in the circuit. The decoupling capacitance values in the standby mode are 0.34nF each, resulting in a total of 0.68nF in parallel.

The extra passive decap of Fig. 4.9 was used to represent fixed decap that is always present in the neighborhood of the active decap. It cannot be shut off. It also employs a series PMOS device to protect it from ESD risks. Both active and passive decaps are placed about 600µm away from the user logic. Ref. [41] uses a linear feedback shift register (LFSR) as the user logic to generate power supply noise because the resulting noise pattern is somewhat randomized. In our design, a large buffer with a large capacitive load was used to create supply noise, with the input controlled by an external signal. This way, the switching frequency can be controlled and modified directly from the input. The size of the decaps was chosen to be only a few times larger than the capacitive load to create a ~100mV voltage drop for the experiments. The three resistors (R1, R2 and R3 shown in Fig. 4.3) were implemented using p+ poly resistors. The two capacitors C2 and C3 in Fig. 4.3 were implemented using MOS transistors to minimize area overhead. A test chip microphotograph is shown in Fig. 4.12. The total test chip area is 1.2 × 0.8 mm².

To measure the on-chip supply noise, a packaged die was not used because the intent was to observe internal voltages near the logic block. Thus, the supply variations were measured directly with probes. Power supply noise comes from both IR drop and Ldi/dt effects. The inductance L in the Ldi/dt effect is mainly due to the package, as the on-chip wire inductances are normally negligible [2].
Since an actual package was not used, two on-chip spiral inductors were implemented to mimic the package inductances, one on the supply path and the other on the return path. The value of the spiral inductors is close to a typical wire-bond package inductance. The user logic and the decaps were placed far away from the supply/ground pads (about 600µm) to create a large mesh resistance. Effectively, the pad and mesh of the test chip were designed to produce a measurable amount of power supply noise such that any improvement from using active decaps could be easily observed.

Figure 4.12: Annotated test chip microphotograph.

4.3.2 Test Chip Simulations

Before showing the measurement results of the test chips, it is useful to present the simulation results first to demonstrate the close relationship between the two and to provide a better understanding of how the active decap behaves when large supply noise is present. The simulation setup follows exactly the test chip in Fig. 4.9. When a clock signal is fed into the test chip, the buffer, acting as the user logic circuit, switches and draws current from the supply rails. The current peaks created by the buffer cause the supply voltage VDD to drop, which causes the two input voltages of the comparators to swap. After a certain delay, the outputs of the comparators switch in response to the input swap, causing a local boost of the supply voltage VDD. Once the supply is boosted, the inputs of the comparators move back to their nominal values. As a result, the outputs of the comparators switch back after some delay and the active decap waits for the next voltage drop. When the active decap is turned off, the circuit behaves like a passive decap. In such a scenario, the supply voltage follows the current draw of the buffer, and no boost in the supply voltage is achieved.
From post-layout HSPICE simulation, the current draw, the supply voltage VDD and the internal signal switching are shown in Fig. 4.13, where the clock frequency is set at 500MHz.

[Panels, all versus time: current taken from buffer; VDD voltage with active decap on and off; comparator inputs Vin+ and Vin-; comparator outputs Vout (top) and Vout (bottom).]
Figure 4.13: Simulated VDD voltage (on a 500MHz clock) with active decap on and off.

Random process variations affect the effectiveness of the active decap. At the slow corner, the delay of the comparators is increased, resulting in a later boosting point at the supply rail. On the other hand, for the fast process, the comparators take a shorter time to switch so that the local supply boost occurs earlier. As a consequence, the supply voltage remains high for a longer time for the fast corner, and for a shorter time for the slow corner, as illustrated in Fig. 4.14. Therefore, the average supply noise per clock cycle should be less for a test chip at the fast process corner. Conversely, a larger average supply noise level can be expected if a test chip happens to be at the slow process corner. Designers should make sure that the active decap provides satisfactory improvement to remove hot spots under process variations, especially at the slow process corner.

[Panels, all versus time: VDD voltage (typical); VDD voltage (slow); VDD voltage (fast).]
Figure 4.14: Simulated VDD voltage with active decap on for different process corners.

4.3.3 Test Chip Measurements

This section describes the measurement results obtained on 15 test chips and validates the simulation results, as illustrated in Fig. 4.15.
An Agilent 86130A bit error rate tester was used to drive the inputs, while an Agilent DSO81304A oscilloscope was used to observe the results. As mentioned earlier, the passive decap shown in the figure is always present in the test chip. The enable signal is used to selectively turn the active decap on or off by applying a high or low voltage. When disabled, the decaps are biased in parallel, utilizing the maximum standby capacitance. In that case, the active decap behaves purely as a passive decap. When enabled, the active decap is triggered by voltage drops of about 50mV. By turning the active decap on and off, the average VDD voltage improvement can be measured.

A collection of 15 sample chips was tested, with the clock frequency fixed at 1GHz to observe the improvement across the sample space. The test results from these sample chips were categorized into three groups, slow, typical, and fast, to reflect the nature of random process variation on silicon. Note that in all cases, the active decap moved the IR drop inside the 100mV noise budget. The average VDD voltages of each group line up closely with simulation under process variations.

[Axes: average VDD voltage (mV) versus sample number (1 to 15), grouped as slow, typical and fast; markers for active decap ON and active decap OFF; the noise budget is marked at 900mV.]
Figure 4.15: Grouped scatter plot for active decap noise reduction for the tested sample chips.

In Fig. 4.15, when the active decap is off, the average supply voltage varies significantly, from 873mV to 910mV. This is due to the random process variation on Rmesh and Lpack, which determine the IR drop on the supply rails. More importantly, a higher average VDD value when the active decap is off results in a smaller improvement when the active decap is turned on. This is caused by the nature of the comparators.
Specifically, if the input of the comparator varies with a large swing, then the comparator will generate an output with a shorter delay. That is, for a larger supply noise k, the delay of the comparator td is smaller. This effect is illustrated in Fig. 4.16. Longer delay results in lower bandwidth, which affects the active decap performance at high frequencies. Note that the delay difference Δtd also increases as k increases, which will reduce the boosted voltage b slightly.

[Axes: comparator delay td (ns) and delay difference Δtd (ns) versus supply noise k, from 0.05 to 0.19.]
Figure 4.16: Comparator delay td and delay difference Δtd as a function of supply noise k.

The different process corners have a direct impact on comparator delay and delay difference. For a fixed value of k=0.1, the comparator delay and delay difference under process variation are highlighted in Table 4.4.

Table 4.4: Comparator delay td and delay difference Δtd in different corners.
Corners    td (top)   td (bottom)   Δtd       Average td
Slow       0.77 ns    0.72 ns       0.05 ns   0.75 ns
Typical    0.53 ns    0.48 ns       0.05 ns   0.51 ns
Fast       0.32 ns    0.30 ns       0.02 ns   0.31 ns

The average delay between the top and the bottom comparator varies from 0.75ns (slow) to 0.31ns (fast), a variation of more than 140%. This effect explains the on-die variation in the active decap performance measurements. However, the delay difference of the comparators only varies slightly, indicating that the boosted voltage b will not be affected greatly by process variations.

By averaging the average supply voltage from each group in Fig. 4.15, the overall improvement from using the active decap can be assessed, as highlighted in Table 4.5. Note that in the test chip the passive decap is always connected. To illustrate the improvement solely due to use of the active decap, simulations were carried out with the passive decap completely removed.
The simulated results showing active decap only versus passive decap are summarized in Table 4.6, where the active decap provides a higher average supply voltage across the process corners. Due to the dynamic power consumption of the comparators during switching and other non-idealities, the active decap cannot practically reach the final voltage value described in Equation (4.13). Given the level of the supply noise when the active decap is off, the final voltage can be calculated from Equation (4.13). Comparing the expected and the actual final voltage when the active decap is turned on, the two values are close (~100mV of difference) for the typical and fast process corners, as summarized in Table 4.6.

Table 4.5: Measured active decap performance for different process corners.
           Average VDD voltage
Corners    active decap ON   active decap OFF   Improvement
Slow       914 mV            903 mV             11 mV
Typical    904 mV            878 mV             26 mV
Fast       917 mV            872 mV             45 mV

Table 4.6: Comparison between equation and simulated result after correlation.
           Final voltage aVDD   Simulated avg VDD:               Measured average VDD voltage
Corners    (from eqn.)          active decap only (correlated)   active decap ON   active decap OFF
Slow       1169 mV              932 mV                           914 mV            903 mV
Typical    1058 mV              909 mV                           904 mV            878 mV
Fast       1031 mV              921 mV                           917 mV            872 mV

It is now possible to validate Equation (4.8). Although the test equipment has a limited bandwidth of less than 1.5GHz, the test results for a 1GHz clock can be used to correlate simulations for high-frequency effects. The three process corners are used here. Simulations were carried out from 1GHz to 3GHz, showing the cross points of active decap and passive decap for the average supply voltage. The simulation results are then correlated with measurement results for the three process corners. The correlated results are shown in Fig. 4.17. The relationship between the active decap bandwidth (crossing point) and the average comparator delay is summarized in Table 4.7.
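The per-corner figures quoted in Tables 4.4 and 4.5 can be reproduced directly from the tabulated values. The short sketch below is only a bookkeeping aid; the dictionary layout and names are illustrative and not from the original work.

```python
# Average VDD (mV) per corner from Table 4.5: (active decap ON, active decap OFF)
measured = {"slow": (914, 903), "typical": (904, 878), "fast": (917, 872)}
# Comparator delays (ns) per corner from Table 4.4: (top, bottom)
delays = {"slow": (0.77, 0.72), "typical": (0.53, 0.48), "fast": (0.32, 0.30)}

# Improvement in average VDD when the active decap is enabled.
improvements = {corner: on - off for corner, (on, off) in measured.items()}

# Average delay and delay difference between the top and bottom comparators.
avg_delay = {corner: (top + bot) / 2 for corner, (top, bot) in delays.items()}
delay_diff = {corner: top - bot for corner, (top, bot) in delays.items()}

# Spread of the average delay from the fast corner to the slow corner.
delay_variation = (avg_delay["slow"] - avg_delay["fast"]) / avg_delay["fast"]

for corner in measured:
    print(f"{corner:8s} improvement = {improvements[corner]:2d} mV, "
          f"avg td = {avg_delay[corner]:.3f} ns, "
          f"delta td = {delay_diff[corner]:.2f} ns")
print(f"slow-to-fast delay variation: {delay_variation:.0%}")
```

The slow-to-fast spread evaluates to slightly above 140%, in line with the figure quoted for the average comparator delay.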
Clearly, Equation (4.8) captures the effect of process variation corners.

Table 4.7: Active decap bandwidth versus average comparator delay under process corners.
Corners    Bandwidth    Average delay
Slow       1.55 GHz     0.75 ns
Typical    2.4 GHz      0.51 ns
Fast       > 3 GHz      0.31 ns

[Axes for each panel: average VDD voltage (mV) versus Clk Frequency (MHz), from 1000 to 3000; markers for active decap ON and active decap OFF.]
Figure 4.17: Simulated average VDD voltage with active decap on and off versus clock frequency for (a) slow, (b) typical, and (c) fast process corners.

4.3.4 Measurement Results on One Typical Chip

Three sample chips were found to be at the typical process corner. The average VDD voltages when the active decap is on and off are listed in Table 4.5. A sample chip with a value close to the average voltage of the three typical chips was used for further analysis in this section. As before, the average VDD voltage improvement was measured by turning the active decap on and off. This is shown in Fig. 4.18, where the input is set at 500MHz, typical of ASIC designs. In Fig. 4.18(a), an actual screen shot of the input and supply voltages is provided with the active decap enabled. In Fig. 4.18(b), the supply waveforms for the passive and active cases are superimposed. The average supply voltage increases from 900mV to 914mV. Therefore, the noise level dropped from 100mV to 86mV, an improvement of 14mV (or about 14% less noise). This improvement can be expected to be almost doubled for an isolated active decap, as illustrated later using simulation.

Figure 4.18: Measured results (on a 500MHz clock) for (a) active decap on and (b) plotted comparison between active decap on and off.

Fig. 4.19 shows the measured points as the external input frequency is increased from 200MHz to 1GHz. The measurements at 500MHz described above are circled. Two solid trend lines are provided, corresponding to active decap on and off. The gap between the two trend lines initially widens, indicating that the benefit of active decaps increases as frequency increases. The test chip validated that the active decap has a maximum improvement of 23mV (or about 20% less noise) for a 1GHz design.

Circuit simulation was used to further study the effect of higher clock frequencies, also shown in Fig. 4.19 as dashed lines. Below 2GHz, the active decap can provide more charge than the passive decap. Above 2GHz, its performance diminishes because of the fixed switching speed. The crossing point of the two trend lines at about 2.4GHz indicates the bandwidth limit due to the switching nature of this active decap design. However, today's high-end ASIC designs still run below 1GHz, so this active decap is quite acceptable.

[Axes: average VDD voltage (mV) versus Clk Frequency (MHz), from 0 to 2500; markers for measured and simulated active decap ON and OFF.]
Figure 4.19: Measured (0.2 - 1GHz) and simulated (1 - 2.5GHz) average VDD voltage with active decap on and off versus clock frequency.

In Fig. 4.19, the maximum difference between the active decap on and off occurs at about 1GHz, about half of the active decap bandwidth. As described in the previous section, the high-frequency cross point of the active decap and the passive decap is at about 1/0.5ns = 2GHz, where 0.5ns is the comparator delay. At low frequencies, the active decap and the passive decap have similar average VDD values because the clock period is long enough to eliminate the advantage of using the active decap when taking an average VDD level per clock cycle.
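The rule of thumb in this discussion (cross point near 1/td, maximum benefit near half of that) and the quoted noise-reduction percentage can be expressed as a short sketch. The helper names are hypothetical, and the 1/td approximation is the one used in the text rather than an exact model.

```python
def crossover_ghz(td_ns):
    """Frequency (GHz) above which the active decap no longer outperforms
    the passive decap, approximated as 1/td as in the text."""
    return 1.0 / td_ns

def peak_benefit_ghz(td_ns):
    """Frequency (GHz) of maximum benefit, taken as half of the cross point."""
    return crossover_ghz(td_ns) / 2.0

def noise_reduction(v_off_mv, v_on_mv, nominal_mv=1000.0):
    """Fractional reduction in average supply noise when the active decap
    is enabled, relative to the noise with it disabled."""
    noise_off = nominal_mv - v_off_mv
    noise_on = nominal_mv - v_on_mv
    return (noise_off - noise_on) / noise_off

td = 0.5  # comparator delay in ns, from the text
print(f"cross point  : {crossover_ghz(td):.1f} GHz")
print(f"peak benefit : {peak_benefit_ghz(td):.1f} GHz")
print(f"reduction at 500MHz: {noise_reduction(900, 914):.0%}")
```

For the 500MHz measurement (900mV rising to 914mV) the reduction evaluates to 14%, and a 0.5ns delay gives a 2GHz cross point with the peak benefit near 1GHz.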
As a result, since the difference is almost zero at low frequency and at the bandwidth frequency, the maximum difference (or benefit) can be considered to occur at roughly 1/2 of the active decap bandwidth, in this case, 2GHz/2 = 1GHz. Therefore, to maximize the effectiveness of the active decap, designers should ensure that the comparator delay td is always less than 1/2 of the clock cycle.

4.4 Active Decap Size and Placement

While the test chips are useful in quantifying active decap improvement over passive decap as frequency increases, the proper sizing and placement of the active decap determine the effectiveness of the drop-in replacement approach. When converting a fixed area from passive decaps into active decaps, the standby capacitance is always smaller because of the area overhead of the switches and comparators in the active decap. The actual noise improvement depends on the area available and the location of the active decap relative to the hot spot. Intuitively, if the area available for active decaps is small and the overhead area is a large percentage of the total area, active decaps may not be an effective replacement for passive decaps. On the other hand, if the area available is too large, the noise reduction for active and passive decaps may be similar because of excessive delays in the switching circuitry trying to switch the oversized Cdecap. Therefore, there exists an optimal area for active decaps where they are most effective. Similarly, the instantaneous boost provided by the active decap must be close to the hot spot to be effective, but this should be traded off against the distance from the supply that replenishes the charge. To explore these aspects, the test chip measurements were first calibrated against HSPICE simulation using exactly the same setup as in Fig. 4.9.
As a calibration metric, the average VDD noise per clock cycle, VDD,noise, was used since it is known to be the controlling factor of the critical path delay of logic circuits [20]:

$$V_{DD,noise} = V_{DD,nominal} - V_{DD,avg} = 1\,\mathrm{V} - V_{DD,avg} \qquad (4.14)$$

With a 500MHz input and the active decap turned on, the waveforms from circuit simulation of the supply voltage and internal signals were previously shown in Fig. 4.13. These results are not identical to the measured results of Fig. 4.18 but they do have a similar average value. For one clock cycle, VDD,avg = 918mV, which is fairly close to the measured average of 914mV. It was found that, for other frequencies, VDD,avg closely matched the running average from measurement. Since the measured and simulated values are correlated in this way, the average was used for the rest of the analysis. The fixed passive decap was also removed from the circuit in further analysis to study the improvements derived from the stand-alone active decap.

While VDD,noise is determined by many other factors including package/pad/power grid design and clock frequency, for the purpose of this analysis the power grid design and the 500MHz input were kept fixed, and only the decap size was varied. The average noise for same-area passive and active decaps was compared. The size of the center switching circuitry of Fig. 4.10 also remained constant. In the passive decap simulations, standard cross-coupled designs were used. In Fig. 4.20, the average noise VDD,noise is plotted versus passive and active decap size varying from 85µm² to 8.5mm². The plot shows that the active decap reduces the noise relative to passive decaps for sizes between 0.001mm² and 0.6mm². However, if the available area for decap insertion is smaller than 0.001mm² or greater than 0.6mm², there is little difference between the two. When the area is small, the active decap is not as effective since the amount of capacitance switched in series is small.
When the area is large, the fixed switching circuit in the active decap cannot switch the decaps effectively because the capacitive load exceeds its capability.

[Axes: average VDD noise (mV) versus decap size (mm², log scale), from 0.0001 to 10; curves for passive decap and active decap.]
Figure 4.20: Simulated average VDD noise per clock cycle versus normalized decap size.

Fig. 4.21 is used to illustrate the optimal size for the active decap design, where the solid curve indicates the noise reduction difference between passive and active decaps. The maximum difference occurs in the range of 0.01mm² to 0.1mm². If the active decap is designed in this range, it has the greatest advantage over passive decaps in terms of average supply noise reduction. The test chip was designed to be 0.085mm² to obtain a close-to-optimum improvement of 23mV, as described in the previous section. Fig. 4.21 also shows that the area overhead of the switching circuitry in the design is only 10% in the region around the optimal value.

[Axes: noise reduction difference and switching-circuit area overhead versus decap size (mm², log scale), from 0.0001 to 10.]
Figure 4.21: Power supply noise reduction difference from active decap and passive decap with area overhead from switching circuit of active decap.

As mentioned earlier, another important factor in the resulting improvement is the actual placement of active decaps relative to the hot spot. Referring back to Fig. 4.9, the effective distance from the hot spot can be adjusted by varying Rdist. Similarly, the distance from the charge re-supply path can be controlled by varying Rmesh. Simulations were carried out to observe the voltage drops while changing only Rmesh and Rdist. The simulation results are shown in Fig. 4.22. The decap size was fixed at the optimal value of 0.02mm² from Fig. 4.20 so that the maximum improvement could be observed.
As the distance between the decap and the user logic is varied from 10Rdist to 0.1Rdist, the average noise level in the passive case changes from 134mV to 124mV. However, for the active case, the average noise level reduces from 133mV to 74mV. Therefore, the active decap is more sensitive to placement than the passive decap. This makes intuitive sense because the active decap provides a short-term boost in the charge which acts in a small, localized neighborhood. However, the passive and active decaps exhibit similar characteristics as a function of Rmesh, according to the results in Fig. 4.22. As a result, the active decap should be placed as close as possible to the hot spot to be most effective.

[Axes: average VDD noise (mV) for 0.1Rdist, 1.0Rdist and 10Rdist, and for 0.1Rmesh, 1.0Rmesh and 10Rmesh.]
Figure 4.22: Improvement on average VDD noise for using active decaps in different placement locations by varying Rdist and Rmesh.

4.5 Summary

This chapter described the effectiveness of active decaps as a late-stage drop-in replacement for passive decaps so that a completed chip layout need not be disrupted near the tapeout deadline. Improvements to the design of an active decoupling capacitor were described for removal of hot-spot power supply noise in ASIC designs up to 1GHz operation. The modified active decap using latch-based comparators in 90nm CMOS is able to switch in 0.5ns and consumes a relatively low power of 2.8mW, which is about 5X lower than a previous design running at approximately the same speed. This reduced power makes it more suitable for use in ASIC designs. Measurement results from test chips indicate an improvement over passive decaps of 10% - 20% when operating from 200MHz to 1GHz. The optimal active decap size to maximally remove hot-spot noise was identified.
Placement analysis was also carried out and it was found that the active decap is most effective when placed in close proximity to the hot spot, whereas the passive decap is not as sensitive to the exact location. In summary, if sized and placed properly, active decaps can be up to 20% better when used as drop-in replacements for passive decaps for power supply noise reduction.

Chapter 5 Generalized Active Decap and Charge-Borrowing Decap

5.1 Introduction

The previous chapter explored the effectiveness of using active decaps to remove hot-spot IR-drop violations. This chapter investigates advanced versions of the active decoupling capacitor and proposes a novel design of a charge-borrowing decap (CBD). The extension of the active decap concept is derived by increasing the stack height n to a larger value to ideally achieve a higher boosted voltage than the basic n=2 active decap. The optimal value of n will be evaluated in theory and in practical applications [62].

The CBD design is a completely different approach to addressing the hot-spot removal problem [63]. The new design aims to provide enough charge during every cycle to reduce IR drop with only a minimum power overhead. However, the location of the IR-drop problem must be known in advance and sufficient area must be available with a relatively clean supply to implement the solution. The applications and limitations of the charge-borrowing decap will be evaluated in this chapter.

5.2 Extended Active Decoupling Capacitor

5.2.1 Optimal Stack Height n

The concept of the extended active decap is simply to increase the stack height n of the basic active decap described in Chapter 4. The motivation for a larger stack height (n>2) is to generate a higher boosted voltage nVDD to potentially achieve a better improvement when applied to reduce supply noise. For example, the ideal boosted voltage is 3VDD for n=3, 4VDD for n=4, and so on.
Therefore, it seems that the stack height should be designed as large as possible to obtain a high enough local boost so that the supply noise can be reduced to an arbitrarily small level. However, this is not true in practice. The higher boosted levels cannot be reached due to the nonidealities of the circuit. Also, as the stack height increases, more switches are required to connect the decaps in parallel or in series. The active decap circuits for n=2, n=3 and n=4 are illustrated in Fig. 5.1(a), 5.1(b) and 5.1(c), respectively. In the figure, it is assumed that the total area available for the decaps is fixed at 2Cdecap. Therefore, each decap occupies an area of Cdecap for n=2, while each decap in the cases of n=3 and n=4 has an area of (2/3)Cdecap and (1/2)Cdecap, respectively.

Figure 5.1: Active decap concept showing different stack height (decaps in parallel and decaps in series): (a) n=2, (b) n=3, and (c) n=4.

The practical constraints of stacking the decaps can be illustrated by first showing the actual transistor implementation of the n=3 case in Fig. 5.2, where the switches are implemented using MOS transistors. Note that the number of switches required increases by three every time the stack height is increased by one. That is, for n≥2, the number of switches is 3(n-1). In the figure, each horizontal switch is implemented with two transistors (NMOS and PMOS), whereas each vertical switch requires only one transistor (NMOS or PMOS). The design for a stack height of n=4 can be realized in a similar manner, although it is not shown here.

Figure 5.2: MOS implementation of the extended active decap (n=3).
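The fixed-area bookkeeping above can be summarized with a short sketch. It assumes that the total decap area of 2Cdecap is split evenly across the n stacked pieces, uses the 3(n-1) switch count stated in the text, and reports only the ideal (lossless) boosted voltage. The function and field names are illustrative and do not come from the thesis.

```python
from fractions import Fraction


def stack_summary(n, total_area=Fraction(2)):
    """Ideal fixed-area bookkeeping for an active decap of stack height n >= 2.

    total_area is expressed in units of Cdecap (2*Cdecap in the text).
    Returns the size of each decap piece (units of Cdecap), the ideal
    boosted voltage (units of VDD), and the switch count 3*(n-1).
    """
    if n < 2:
        raise ValueError("an active decap needs a stack height of at least 2")
    return {
        "per_decap": total_area / n,   # even split of the fixed area (assumption)
        "ideal_boost": n,              # n*VDD, ignoring all circuit nonidealities
        "switches": 3 * (n - 1),       # three more switches per extra stack level
    }


for n in (2, 3, 4):
    s = stack_summary(n)
    print(f"n={n}: each decap = {s['per_decap']} Cdecap, "
          f"ideal boost = {s['ideal_boost']} VDD, switches = {s['switches']}")
```

The sketch makes the trade-off visible: each additional stack level raises the ideal boost by one VDD but shrinks every decap piece and adds three switches to the same area.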
Due to the additional switches, two design methods exist: one is to expand the area occupied by the switch transistors, and the other is to reduce the size of each switch transistor if a relatively constant total area needs to be maintained. The first method increases the area overhead x of the active decap, and also causes a longer comparator delay because of the additional loading capacitance. The increased area overhead reduces the final voltage aVDD to which the active decap can boost the supply voltage. More importantly, a longer delay results in a reduced bandwidth, which lowers the operating frequency, as described in the previous chapter.

The second method uses a fixed total area occupied by the switch transistors. With this approach, the delay of the comparators should remain roughly the same since there is only a minimal change in the loading capacitance. Therefore, the operating frequency is not affected. However, packing more switch transistors into the same area increases the "on" resistance R of each transistor, which reduces the boosted voltage bVDD.

The second method was used to implement the extended active decap, since the active decap bandwidth should not be compromised for high-end ASIC applications. That is, the different designs have the same operating bandwidth. The final voltage aVDD can be obtained by varying the stack height, n, as illustrated in Fig. 5.3, for different supply noise levels, k. The large dots indicate the highest final voltage points on the different k curves. Note that the starting point of n=1 implies a passive decap. Clearly, for k=0.05, the optimal n value that produces the highest final voltage is 4, whereas for k=0.1, the optimal value is n=3. For an increased level of k=0.15, the optimal n value reduces to 2. As the supply noise k increases further, the use of passive decaps (n=1) is recommended, as for the case of k=0.2.
From the figure, one can conclude that a higher n should be used if k is small, and vice versa. Therefore, it is important to select n based on the k range where the resulting final voltage aVDD is the highest. In order to do this, the optimal k ranges for each n and the crossover points between ranges must be determined.

Figure 5.3: Final voltage aVDD as a function of stack height n with k varying (fixed area).

The supply noise crossover point, kn1,n2, for two different stacking levels, n=n1 and n=n2, is defined as the noise level at which both cases produce the same final voltage. This can be used to identify suitable ranges for each stacking level. To obtain kn1,n2, Equation (4.13) can be rearranged into the following form:

k_{n_1,n_2} = \frac{(b_{n_2} - b_{n_1})(1 - x_{n_1})(1 - x_{n_2})}{(n_2 - n_1) + n_1 x_{n_2} - n_2 x_{n_1}}    (5.1)

where n1≠n2 and n2>n1. Plugging in numbers, the crossover noise value from n=2 to n=3 is k2,3=0.12, and from n=3 to n=4 it is k3,4=0.08. Effectively, the crossover point k4,5=0.05 determines the boundary where a passive decap should be used, since the case of n=5 would not be used if the noise was 5% or less (i.e., an acceptable level). Similarly, k1,2=0.17 produces the same final voltage for n=2 and n=1 (passive decap). When k is above 0.17, the passive decap should be used.

The results are presented in graphical form in Fig. 5.4. The four lines represent n=1, 2, 3 and 4, respectively. The line with the highest value in each region gives the optimal value for n. For low values of k, the best choice is n=4. Starting at k3,4 the best choice is n=3. At k2,3 the best choice becomes n=2. At k1,2 the best choice is n=1 from that point onward. The results are summarized in Table 5.1.

Table 5.1: Optimal stack height n selection based on the supply noise k (from formula).
Condition            Optimal n
k < 0.05             n=1 (use passive decap)
0.05 < k < 0.08      n=4
0.08 < k < 0.12      n=3
0.12 < k < 0.17      n=2
k > 0.17             n=1 (use passive decap)

Figure 5.4: Final voltage aVDD as a function of k with different stack height n (from formula).

As described earlier, if the supply noise is above 0.17, no form of the active decap can boost the supply voltage to a satisfactory level. Other design approaches to reduce the supply noise may have to be used in that situation. However, the more interesting range of k is from 0.08 to 0.17, since this noise level is typically unacceptable. If the supply noise k is below 0.12 but above 0.08, then the active decap should be designed with n=3 to produce the minimum noise. Similarly, if k is above 0.12 but below 0.17, the basic active decap with n=2 is optimal.

5.2.2 Design and Layout of n=3 Extended Active Decap

To validate the results of optimal n selection, the extended active decaps with n=3 and n=4 were implemented. For simplicity, only the design of n=3 is illustrated in this section. Similar to the basic active decap design, Fig. 5.5 illustrates the extended active decap for n=3 containing four blocks: a reference voltage generator, a pair of high-pass filters, two comparators, and the switched decaps. The operation of the circuit remains the same: the differential inputs of each comparator decide the standby voltage level at the outputs of the comparators. When the power grid discharges, VDD will drop and VSS will rise. The voltage variations are passed through the high-pass filters to the comparator inputs, causing the comparators to reverse their output values and switch the three-piece decaps from parallel to series. Later, when the power grid charges up, the comparator inputs and outputs switch back to their original values. An enable signal is provided for testing purposes to allow the active decap circuitry to be connected to the global supply rail.
Unlike the previous design, when off, the extended active decap is disconnected from the rest of the circuit. This allows for a comparison between the basic and the extended decaps.

Figure 5.5: Extended active decap (n=3) architecture.

Figure 5.6: Layout of extended active decap (n=3) showing the relative size of the components.

The layout of this extended active decap is shown in Fig. 5.6. The switching circuitry is located offset from the center, with the three parallel decaps on either side. The two decaps on the left are PMOS, and the one on the right is NMOS. The total layout area for the active decap is 600µm x 142µm = 0.085mm2, in which the three decaps combine for an area of 0.077mm2. Each switch transistor was laid out with only half of the area used before, resulting in an almost doubled "on" resistance. As a result, the switching circuitry, including the switch transistors, still amounts to an area overhead of only 10%. Note that the total area is the same as that of the basic active decap so that a comparison between the two can be carried out. Although not shown, the design and layout of the n=4 case is similar to the case of n=3.

5.2.3 Simulation Results

The next step was to use simulations to obtain the supply voltage waveforms under different supply noise levels, k. By increasing the size of the user logic buffer, k can be varied and the supply voltages will differ in the cases of n=2 and n=3, as illustrated in Fig. 5.7. When k=0.12, the n=3 case provides a larger boost than n=2. On the other hand, when k=0.15, both n=2 and n=3 are insufficient in terms of delivering charge to boost the supply, but the n=3 case is even worse. As shown in the figure, by using the second method, the delay of the comparators remains roughly the same.
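The formula-based selection rule from Section 5.2.1, against which these simulations are compared, can be sketched in a few lines. The crossover function follows the form of Eq. (5.1) as read here, which is an assumption because it depends on Eq. (4.13) from the previous chapter; the thresholds are those listed in Table 5.1. The numeric values of b and x used in the demonstration are hypothetical and are not taken from the thesis.

```python
import math


def crossover_k(n1, n2, b1, b2, x1, x2):
    """Noise level at which stack heights n1 < n2 give the same final voltage.

    b1, b2: boosted-voltage factors (bVDD) of the two designs.
    x1, x2: area overheads of the two designs.
    Assumed form of Eq. (5.1); not verified against Eq. (4.13).
    """
    if n2 <= n1:
        raise ValueError("expected n2 > n1")
    denominator = (n2 - n1) + n1 * x2 - n2 * x1
    return (b2 - b1) * (1 - x1) * (1 - x2) / denominator


# Upper bounds of each k range and the stack height recommended below that bound,
# copied from Table 5.1. Treatment of the exact boundary values is an assumption.
FORMULA_RANGES = [(0.05, 1), (0.08, 4), (0.12, 3), (0.17, 2)]


def optimal_stack_height(k):
    """Return the stack height suggested by Table 5.1 (n=1 means passive decap)."""
    for upper_bound, n in FORMULA_RANGES:
        if k < upper_bound:
            return n
    return 1  # above the last crossover, fall back to the passive decap


# Hypothetical b and x values, for illustration only.
print(crossover_k(n1=2, n2=3, b1=1.8, b2=2.4, x1=0.1, x2=0.1))
print([optimal_stack_height(k) for k in (0.03, 0.06, 0.10, 0.15, 0.20)])
```

Replacing the thresholds in FORMULA_RANGES with the simulated crossover points would give the corresponding simulation-based rule.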
The n=4 active decap design was also simulated, and the results are similar to those of the n=2 basic active decap. One noticeable difference is that as k increases above 0.12, the average VDD voltage drops earlier for the n=4 case than for the n=2 case. However, when k is at around 0.05, the average VDD voltage for n=4 is slightly higher than in both the n=2 and n=3 cases.

Figure 5.7: Simulated VDD voltage with extended active decap (n=3) on for two different k levels. (Panels: current taken from the buffer to produce k=0.12 or k=0.15; VDD voltage at k=0.12 and at k=0.15 for the basic and extended active decaps.)

Table 5.2: Optimal stack height n selection based on the supply noise k (from simulation).

Condition            Optimal n (simulated)
k < 0.05             n=1 (use passive decap)
0.05 < k < 0.07      n=4
0.07 < k < 0.14      n=3
0.14 < k < 0.16      n=2
k > 0.16             n=1 (use passive decap)

Figure 5.8: Average VDD voltage as a function of k with different stack height n (from simulation). (Curves: passive decap (n=1) and active decaps with n=2, n=3 and n=4.)

Using simulations, the average supply voltages per clock cycle for the n=2, 3 and 4 cases were compared as the supply noise k varies from 0.02 to 0.2, and are plotted in Fig. 5.8. The corresponding optimal stack height n as a function of the supply noise k is summarized in Table 5.2. The crossover point k1,2 is similar between formula and simulation. The most important crossover point, between the n=2 and n=3 cases, is k2,3=0.14 from simulation, somewhat higher than the calculated value of 0.12. Above 0.14, no approach can raise the supply level back to 900mV, making the use of active decaps less valuable in this region.
On the other hand, the lower bound k3,4 is at 0.07, slightly below the calculated level of 0.08. Therefore, a slightly wider k range of 0.07 to 0.14 for n=3 makes it superior to the basic active decap. Unlike in the formula, when the k value is low, the active decaps do not switch due to the fixed triggering voltage of about 50mV, so they produce a slightly worse average supply voltage than the passive decap. As a result, the active decaps are worse than the passive decap when k<0.05 and should not be used there. Although the n=4 case is the best when k is in the range of 0.05 to 0.07, it has only limited value since this k range is small and the improvement over the n=3 active decap is marginal. Thus, it can be concluded from simulation that n=3 provides the optimal level of the average supply voltage across a wide supply noise range below 0.14. If the supply noise is above 0.14, a larger area is required to increase the average supply to a satisfactory level above 900mV.

5.3 Charge-Borrowing Decap (CBD)

5.3.1 Charge-Borrowing Decap Concept

The main purpose of the active decap is to boost the voltage locally to reduce supply noise. Therefore, any technique that offers this type of improvement would also qualify as a viable alternative. For example, if charge is "borrowed" from a clean supply to boost a noisy supply, it would help reduce the hot-spot IR-drop problem. That is the basic concept behind a charge-borrowing decap (CBD), which is a novel but relatively simple idea illustrated in Fig. 5.9. The key idea here is based on capacitive feedthrough. Assuming that the total area available is the same as before, the decoupling capacitance is 2Cdecap. In Fig. 5.9(a), when power supply noise kVDD is present, a passive decap provides charge equal to (2Cdecap)(kVDD) = (2k)CdecapVDD. In the case of the CBD, as shown in Fig. 5.9(b), it can ideally boost the local supply voltage to 2VDD, similar to the active decaps.
From another perspective, the charge provided by the CBD circuit in one clock cycle can ideally be up to (2Cdecap)ΔVclk = 2CdecapVDD, where ΔVclk is the clock swing. Therefore, over one clock cycle, the charge-borrowing decap provides significantly more charge than a same-area passive decap.

Figure 5.9: Charge-borrowing decap concept shown in (b), compared to a passive decap in (a).

Table 5.3: High-level comparison among passive, active, and charge-borrowing decaps.

                                 Charge provided (ideal)    Boosted voltage (ideal)
Passive decap                    (2k)CdecapVDD              -
Active decap (n=2)               (1/2)CdecapVDD             2VDD
Active decap (n=3)               (4/9)CdecapVDD             3VDD
Active decap (n=4)               (3/8)CdecapVDD             4VDD
Charge-borrowing decap (CBD)     2CdecapVDD                 2VDD

A high-level comparison between the passive decap, the active decaps, and the CBD is given in Table 5.3, where the differences in charge available and boosted voltage are shown. Note that the total decap area is fixed at 2Cdecap in the comparison. In Table 5.3, the basic active decap (n=2) provides more charge than the passive decap if 2k is less than 0.5 (or, k<0.25), and the active decap also provides a local boost in the supply voltage. The charge-borrowing decap is always superior to the basic active decap since it supplies more charge while generating the same boosted voltage level. The charge supplied by active decaps with n=3 or n=4 is about 11% or 25% less than that of the basic active decap (n=2), respectively, but their boosted voltages are higher. This intuitively explains why the extended active decap is better than the basic case only in limited situations. From the concept alone, it is difficult to decide between the extended active decaps and the CBD; more design details need to be studied before a conclusion can be drawn. From the table, the design that provides the most charge per clock cycle is the charge-borrowing decap, and it was named after this feature.
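The ratios quoted from Table 5.3 can be reproduced with exact fractions. The sketch below simply copies the ideal charge entries from the table (in units of CdecapVDD) and derives the 11% and 25% reductions and the k<0.25 break-even point; the variable and function names are illustrative.

```python
from fractions import Fraction

# Ideal charge provided, in units of Cdecap*VDD, copied from Table 5.3.
ACTIVE_CHARGE = {2: Fraction(1, 2), 3: Fraction(4, 9), 4: Fraction(3, 8)}
CBD_CHARGE = Fraction(2)


def passive_charge(k):
    """Ideal charge from a passive decap of area 2*Cdecap at noise level k."""
    return 2 * k


def reduction_vs_basic(n):
    """Fractional loss in charge of stack height n relative to the basic n=2 case."""
    return 1 - ACTIVE_CHARGE[n] / ACTIVE_CHARGE[2]


def passive_breakeven_k():
    """Noise level at which the passive decap matches the basic active decap."""
    return ACTIVE_CHARGE[2] / 2  # solves 2k = 1/2


print(f"n=3 vs n=2: {float(reduction_vs_basic(3)):.1%} less charge")
print(f"n=4 vs n=2: {float(reduction_vs_basic(4)):.1%} less charge")
print(f"passive break-even k: {float(passive_breakeven_k())}")
print(f"CBD vs basic active decap: {CBD_CHARGE / ACTIVE_CHARGE[2]}x the charge")
```

In these ideal terms the CBD supplies four times the charge of the basic active decap for the same area, which is the basis of the comparison made in the text.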
If designed properly, charge-borrowing decaps can potentially remove hot spots as a drop-in replacement for passive decaps, similar to the active decaps. The rest of this chapter provides results to support this argument.

In Fig. 5.9(b), there is a problem with the falling edge of the clock. When the Clk signal rises from 0 to VDD, the supply is ideally boosted to 2VDD due to capacitive feedthrough. Then, the nearby user logic switches and withdraws a certain amount of charge from the supply, so the supply voltage drops by ΔV. When the Clk signal then falls from VDD back to 0, it forces the supply voltage to drop from 2VDD-ΔV to VDD-ΔV, which may not be acceptable if ΔV is excessively large.

Therefore, the charge-borrowing decap concept needs some additional circuitry to function properly, as shown in Fig. 5.10(a). Two diodes are inserted at node B1, one connected to the clean supply and the other to the noisy supply. Without the connection to the clean VDD, when the clock is low, B1 stays at roughly VSS since the current flow from VDD to B1 is prevented by the diode D2. When the clock goes high, B1 rises to VDD. This boost in voltage will not trigger current flow from B1 to the supply because both B1 and the supply are at the same level of VDD. Therefore, the voltage at B1 should be around VDD when the clock is low. This ensures that the voltage at B1 can reach about 2VDD when the clock goes high so that the supply can be charged. To achieve that, access to a clean supply of VDD is needed through D1. Therefore, the implementation of the charge-borrowing decap requires a clocking signal, two diodes, and a supply node that has less noise (clean).

Assuming there is one threshold voltage drop VT across each diode, the operation of the CBD circuit is illustrated in Fig. 5.10. When Clk is at 0, the noisy supply node is assumed to be at VDD, while B1 is charged to VDD-VT. When Clk rises to VDD, B1 rises to 2VDD-VT at the same time.
As a result, the noisy supply node is increased to 2VDD-2VT. Before the clock falls, some drop occurs at the supply, causing it to drop by ΔV to 2VDD-2VT-ΔV. Then, Clk falls back to 0, so that B1 also reduces to VDD-VT. However, D2 prevents charge from flowing back to B1 from the supply. Therefore, the noisy supply remains at 2VDD-2VT-ΔV.

Figure 5.10: Diode inserted charge-borrowing decap showing the states of the clocking signal: (a) Clk at 0, (b) Clk rises to VDD, and (c) Clk falls back to 0.

In a CMOS process, the diodes shown in Fig. 5.10 can generally be implemented in one of two forms, NMOS and PMOS, as illustrated in Fig. 5.11(a) and 5.11(b), respectively [58][64]. In Fig. 5.11(a), the voltage at B1 when the clock signal is low is VDD-VTn1, where VTn1 is the threshold voltage of Mdn1. When the clock rises, B1 is increased to 2VDD-VTn1, causing current flow to charge the noisy supply rail. Assuming that the supply is localized, the boosted voltage is 2VDD-VTn1-VTn2 due to the threshold voltage degradation of Mdn2. Similarly, in Fig. 5.11(b), the boosted voltage is 2VDD-|VTp1|-|VTp2|, where |VTp1| and |VTp2| are the absolute threshold voltages of Mdp1 and Mdp2, respectively.

Figure 5.11: Possible implementations of charge-borrowing decap with (a) NMOS-formed diodes and (b) PMOS-formed diodes.

Both forms in Fig. 5.11(a) and 5.11(b) have practical limitations. In Fig. 5.11(a), when B1 reaches 2VDD-VTn1, the same voltage is applied at the gate of Mdn2. Such a high gate voltage may cause oxide breakdown in deep submicron processes, particularly 90nm and below [24]. A thick-oxide transistor should be used for Mdn2 to protect it from breakdown. In Fig. 5.11(b), when B1 is at 2VDD-|VTp1|, the drain of the transistor Mdp1 is at the same voltage, while the body of the transistor is tied to VDD. This also creates a reliability concern because the pn junction of the transistor Mdp1 is forward-biased, resulting in the injection of a large amount of current back into the clean supply rail. Therefore, the PMOS implementation in Fig. 5.11(b) has to be modified. For example, if the gate voltage of Mdp1 were controlled separately using an appropriate voltage level and the bulk of Mdp1 were connected to B1, the forward biasing of the pn junction could be avoided.

Figure 5.12: Improved implementation of charge-borrowing decap with PMOS-formed diodes.

The modified circuit shown in Fig. 5.12 resolves the issue of forward-biasing the transistor Mdp1. When the clock is low, B2 is set at VSS, which allows B1 to be charged to VDD instead of VDD-|VTp1|. Forward-biasing the pn junction of Mdp1 here is not a problem because the transistor behaves as a diode. When the clock rises, B1 is switched to 2VDD, and B2 is also raised to 2VDD. The high gate-source voltage of Mdp1 disables current flow back to the supply. The pn junction is now reverse-biased to prevent leakage. Note that in Fig. 5.12, the gate-body voltages of both transistors Mdp1 and Mdp2 are always within one VDD of each other, ensuring gate-oxide reliability. Moreover, as a side effect, this circuit can generate a boosted voltage of 2VDD-|VTp2| instead of 2VDD-|VTp1|-|VTp2|, which improves the design.

Figure 5.13: Generation of the boosted voltage on node B2.

To generate the bootstrapped signal at node B2, the concept of a clock multiplier [58][65][66] can be used. The circuit for generating the bootstrapped signal at B2 is shown in Fig. 5.13.
If one assumes that the clock has no previous activity, then due to leakage through the transistors, it can be verified that both nodes B3 and B4 are at roughly VDD while both transistors Mcn1 and Mcn2 are shut off. When the clock is turned low, B3 goes to 2VDD and B4 is roughly at VDD, which causes Mcn2 to turn on. The output node B2 is low, along with the clock signal. When the clock rises to VDD, B3 is discharged from 2VDD to VDD, whereas B4 is charged up to 2VDD. This causes Mcn1 to turn on and Mcn2 to turn off. The rise of the clock signal also turns on Mip, allowing the output node B2 to follow B4. Because the gate of Mip is at VSS, the voltage at B2 can rise to 2VDD without any voltage loss. Similar to the previous design approach, the body of the transistor Mip is connected to B4 to ensure reverse-biased pn junctions. In Fig. 5.13, the two inverters are powered by the clean supply rail. The PMOS-formed capacitors should be large enough to supply sufficient charge to the load capacitance on B2. For Mcn1 and Mcn2, thick-oxide transistors are used to reduce the risk of oxide breakdown because their gate-body voltages can be up to 2VDD.

5.3.2 "Clk" Signal Generation

A critical concern about the charge-borrowing decap design is the additional capacitive loading on the clock distribution system if the main clock of the chip is connected to the Clk input of the CBD. Since the CBD has a large capacitance value in the range of hundreds of picofarads, such a large capacitance loaded on the clock tree may cause an imbalance of the tree and introduce more clock skew and jitter [67]. In extreme cases, this extra loading may cause a functional failure in the clock distribution network. Therefore, the Clk input of the CBD should be generated from some other source to keep the main clock tree unaffected. The Clk input of the CBD simply requires a repetitive signal with enough buffer strength to drive a large capacitor.
The frequency of the Clk signal should be roughly in the range of the chip's operating frequency to ensure that sufficient charge is pumped into the supply rail at every clock cycle. If the chip operates at a lower frequency, the extra charge provided by the CBD will not harm the logic circuits connected to the local supply as long as the slew rate of the boosted voltages remains controlled within practical limits. Another useful feature is an enable/disable function for the CBD block. When disabled, the block should behave like a regular passive decap. The CBD is turned on only when the IR drop exceeds a certain predefined level, or only during periods in which the logic circuits connected to the local supply experience higher activity.

Figure 5.14: "Clk" signal generation using ring oscillator (ring oscillator: 39 stages; buffer chain: 7 stages).

To satisfy the requirements above, a simple ring oscillator was selected to generate the Clk signal for the CBD, as illustrated in Fig. 5.14. A total of 39 inverter stages were used to provide an oscillation frequency of 1GHz, the upper limit of the targeted ASIC applications. A NAND gate replaces the regular inverter at the first stage to incorporate an enable signal. By using unit-sized inverters, the ring oscillator consumes about 65µW of dynamic power. To provide enough current flow to charge and discharge the decoupling capacitance, in this case 2Cdecap, a chain of inverters was added and sized according to logical effort [7]. The fan-out factor of the inverter chain was selected to be about 3 to 4 so that the delay through the chain was minimized. The number of stages required to realize this fan-out factor was then calculated. Of course, the circuit in Fig. 5.14 will cause additional supply noise on the "clean" supply, especially with a large last stage in the buffer chain.
One has to ensure that the additional noise on the relatively clean supply node does not exceed a certain noise budget when transferring charge from the clean supply to the noisy supply. As the size of the inverter chain increases, its dynamic power also increases, with the benefit of an improved slew rate (SR). This effect is shown in Fig. 5.15. Note that in the figure the dynamic power includes the ring oscillator plus the entire buffer chain, in which each buffer is sized to produce the minimum path delay.

Figure 5.15: Buffer size determines the edge delay while the buffer chain (with ring oscillator) consumes dynamic power. (Horizontal axis: final stage size of the NMOS transistor, in µm.)

The current through the last stage of the chain, Ilast_stage, controls the decap charge and discharge, so it determines the slew rate. The edge delay can be defined here as the delay time for the positive side of the decap to rise a full swing, as follows:

\text{Edge Delay} = \frac{V_{DD}}{SR} = \frac{2C_{decap}V_{DD}}{I_{last\_stage}}    (5.2)

The edge delay is a better term in this scenario to illustrate the design tradeoffs in Fig. 5.15. For a target clock frequency of 1GHz, the corresponding clock period is 1ns. Having an edge delay in the range of 50 to 100ps (i.e., 1/20 to 1/10 of the clock period) is reasonable. Therefore, a buffer size of 300µm/600µm (NMOS/PMOS) for the last stage was selected to produce an edge delay of about 50ps, while the total dynamic power was around 3.8mW. In that case, the buffer chain was designed to have 7 stages sized according to logical effort. When determining the size of the buffer chain, the capacitance 2Cdecap is fixed at 0.68nF, consistent with the basic active decap design in Chapter 4. Clearly, assuming a fixed edge delay, the size of buffer required is proportional to the decap value.
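The proportionality between buffer drive and decap value can be illustrated with a minimal sketch. It assumes a constant drive current, so that the slew rate is SR = I/C and the edge delay for a full swing is C·V/I; this is a first-order model and not necessarily the exact relation used in the thesis. The numeric values are hypothetical and chosen only for easy arithmetic.

```python
def edge_delay(c_load, v_swing, i_drive):
    """Time (s) for the load to slew a full swing at constant current.

    c_load: load capacitance in farads; v_swing: voltage swing in volts;
    i_drive: drive current in amperes. First-order model: delay = C*V/I.
    """
    if i_drive <= 0:
        raise ValueError("drive current must be positive")
    return c_load * v_swing / i_drive


def required_drive_current(c_load, v_swing, target_delay):
    """Drive current (A) needed to meet a target edge delay under the same model."""
    if target_delay <= 0:
        raise ValueError("target delay must be positive")
    return c_load * v_swing / target_delay


# Hypothetical values: a 1 pF load, 1 V swing and 10 mA of drive current.
print(edge_delay(1e-12, 1.0, 10e-3))

# For a fixed target edge delay, doubling the load doubles the required current.
i_small = required_drive_current(1e-12, 1.0, 50e-12)
i_large = required_drive_current(2e-12, 1.0, 50e-12)
print(i_large / i_small)
```

Under this model the drive current, and hence the final-stage width, scales linearly with the capacitance being driven when the edge delay target is held fixed.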
If a smaller decap is used, the buffer size can be made smaller to dissipate less power. The power drawn from the clean supply node is critical: the supply noise caused by the ring oscillator and the buffer chain must not rise beyond the noise budget in its localized region. That is, the goal of generating the "Clk" signal from a clean VDD is to provide charge from a clean supply that is not connected to the main system clock or any important circuitry. Designers must ensure that the clean supply itself does not become so noisy that the local supply integrity is compromised.

5.3.3 Design of Charge-Borrowing Decap

With the considerations from the previous section, the complete charge-borrowing decap circuit is depicted in Fig. 5.16. An enable signal is provided to turn off the CBD for test purposes. When the enable signal is low, the transistor Msp is off, preventing current flow from the clean supply. The decap is implemented using PMOS transistors, and its value is set to 2Cdecap for comparison. When enabled, the voltage at B2 varies from 0 to 2VDD. The gate-source capacitance of Mdp1 creates some noise on the clean supply node due to clock feedthrough. The presence of Msp shields the clean supply from this clock feedthrough and reduces the noise. With this circuit configuration, the practical boosted voltage is 2VDD-|VTp2|, while the charge provided is 2CdecapVDD. Note that the ESD concern on the thin-oxide decap in this circuit should be addressed by proper sizing of the two transistors, Mdp1 and Mdp2.

Figure 5.16: Complete circuit diagram of charge-borrowing decap.

As described earlier, after a local hot spot is identified, its nearby white space or passive decap area is occupied with an active decap or a CBD to reduce the supply noise. In the case of the CBD, only the passive decap 2Cdecap and the transistor Mdp2 need to be implemented locally. The other circuits shown in Fig. 5.16 can be located away from the hot spot but near a clean supply node, once such a clean supply is identified. Two global interconnects may be required to connect the two parts of the circuit at nodes Clk and B1. The actual placement of the ring oscillator and the buffer chain will depend on the floorplan and the location of the power pins of the chip itself. Compared to the size of the passive decap, the size of Mdp2 is fairly small and is negligible as an area overhead. Since the clean supply node does not require a large area of decaps, the area occupied by the circuits at the clean supply is relatively small. Thus, it is assumed that the CBD block requires a minimal area overhead, relative to the hot-spot area where the CBD is placed.

5.3.4 Simulation Results

To validate the concept of the charge-borrowing decap, HSPICE simulations were carried out. In the simulation setup, one charge-borrowing decap with an enable signal is present, and no decoupling capacitor is connected between VDD and VSS. The load on the supply rail is a large buffer whose current demand can be controlled. As mentioned previously, the CBD can boost the supply voltage when the clock signal rises. The best scenario occurs when the current demand of the buffer also lines up with the rising edge of the clock. As a result, the dips on the supply voltage produced by the current demand and the peaks generated by the CBD cancel each other, resulting in a relatively low-noise voltage profile. Such a case is illustrated in Fig. 5.17.

In Fig. 5.17, the top part of the figure shows the clock signal, whereas the second part depicts the supply voltage when both the passive decap and the CBD are disconnected from the supply rail. The load buffer is designed to create voltage sags at the rising clock edges. In the third part of Fig. 5.17, the load buffer is removed and only the CBD is connected and turned on. Clearly, the supply voltage is boosted at the rising edges.
The dips near the falling edges can be considered as ripples since there is no decoupling capacitance connected. The last (fourth) part of the figure illustrates the voltage waveform when the CBD is turned on and the load is switching. The resulting supply voltage experiences a low level of noise because of the cancellation of peaks and dips.

Figure 5.17: Simulated VDD voltage (on a 1GHz clock) with a CBD on and off (best case). (Panels: Clk; VDD voltage with CBD off and no decap; VDD voltage with CBD on and no load; VDD voltage with CBD on and with load.)

The above example can be considered a best-case scenario because the current demand of the load and the rising clock edge are synchronized. On the other hand, the worst case would be that the current demand and the falling clock edge are lined up. The simulation result at a 1GHz external clock for the worst case is shown in Fig. 5.18. In the figure, the voltage sags created by the current demand are similar whether the CBD circuit is turned on or off. The peaks created by the CBD are out of phase with the sags produced by the current demand. However, if the average VDD value per clock cycle is used, similar to the previous approaches, it is clear that the CBD circuit produces a much higher average VDD voltage because the voltage peaks in every clock cycle raise the average VDD.

Figure 5.18: Simulated VDD voltage (on a 1GHz clock) with a CBD on and off (worst case). (Panels: Clk; VDD voltage with CBD on and CBD off.)
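The per-cycle average VDD metric used in this comparison can be sketched as follows. The sketch assumes a uniformly sampled waveform and takes the plain arithmetic mean of the samples falling in each clock period; the waveform values below are synthetic and are not taken from the simulations in this chapter.

```python
from collections import defaultdict


def per_cycle_average(times, volts, period):
    """Average a sampled waveform over each clock period.

    times and volts are equal-length sequences; times and period share a unit.
    Samples are assumed to be uniformly spaced, so an unweighted mean is used.
    Returns one average per clock cycle, in time order.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, v in zip(times, volts):
        cycle = int(t // period)  # index of the clock cycle containing this sample
        sums[cycle] += v
        counts[cycle] += 1
    return [sums[c] / counts[c] for c in sorted(sums)]


# Synthetic example: two 1 ns cycles sampled every 0.25 ns (times in ns, volts in V).
times = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
sag_only = [1.0, 1.0, 0.9, 1.0, 1.0, 1.0, 0.9, 1.0]         # sags, no boost
sag_with_peaks = [1.1, 1.0, 0.9, 1.0, 1.1, 1.0, 0.9, 1.0]   # same sags plus out-of-phase peaks

print(per_cycle_average(times, sag_only, period=1.0))
print(per_cycle_average(times, sag_with_peaks, period=1.0))
```

In the synthetic data the sags are identical in both waveforms, yet the added peaks raise every per-cycle average, which mirrors the worst-case argument made in the text.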
The simulations above are intended only to illustrate how the CBD circuit behaves under controlled conditions. Another set of simulations was used to compare the CBD with the passive decap and the active decaps (both n=2 and n=3) for a clock frequency of 1GHz. As before, the size of the user logic buffer is changed to produce different supply noise levels k. The results are plotted in Fig. 5.19. At all k levels, the average VDD voltage for the CBD is higher than that of the passive decap and the two active decaps. When k=0.15 for the passive decap, the average supply noise for the CBD is still at a satisfactory level of 100mV. At the same k level, the average noise of the basic and extended active decaps falls close to that of the passive decap. This indicates that the CBD is more effective as a drop-in replacement than the other schemes.

Figure 5.19: Simulated average VDD voltage as a function of k showing the case of CBD.

5.4 Test Chip Setup and Measurement Results

To validate this new approach to hot-spot removal, another test chip was fabricated in the same 90nm process. Of interest is the degree of improvement, as the operating frequency increases, when a basic active decap, an extended active decap, or a charge-borrowing decap is used as a drop-in replacement for a passive decap. The test chip setup is shown in Fig. 5.20. The decap circuits can be individually connected to or disconnected from the supply rail. In this way, the passive decap can be disconnected from the supply to observe the performance of the active decaps and the CBD by themselves.

Figure 5.20: Test chip 2 setup. (Nominal supply: 1V.)

Figure 5.21: Layout of charge-borrowing decap showing the relative size of the components.
The layout of the charge-borrowing decap for the test chip is shown in Fig. 5.21. Unlike the circuit diagram shown in Fig. 5.16, the ring oscillator and the buffer chain were not implemented, as the Clk signal was provided from the same external clock for synchronization purposes. The rest of the circuit, along with the passive decap 2Cdep, was implemented on the test chip. The switching circuitry is located on the left, with the PMOS-formed decap on the right. The total layout area for the CBD is about 600µm × 150µm = 0.09mm², in which the decap occupies an area of 0.083mm². The switching circuitry, including the diode transistors and the bootstrapping circuit, accounts for 8% of the area. The gate-source voltage across the decap oxide is always less than or equal to one VDD, so thin-oxide devices are acceptable in this case. Thus, thin-oxide PMOS transistors were used to implement the decap in the CBD.

Figure 5.22: (a) Annotated microphotograph and (b) layout of the second test chip.

The microphotograph and the layout of the second test chip are shown in Fig. 5.22. In the figure, the decap circuits are placed about 600µm away from the user logic, in which a large buffer with a large capacitive load was used to create supply noise, with the input controlled by an external signal. Similar to the first test chip, the size of the decaps was chosen to be only a few times larger than the capacitive load to create a voltage drop of about 100mV for the experiments. Note that the load cannot be dynamically changed to produce a variable k level. The clock frequency can be controlled externally by providing an input from a bit-rate analyzer. The probed pad of the supply node can be connected to an oscilloscope to measure the voltage waveforms. The test chip area is 1.1mm × 0.86mm = 0.95mm² in total.
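The area figures quoted for the CBD layout and the test chip can be cross-checked with a few lines of arithmetic. The sketch below uses only the dimensions stated in the text; the variable names are illustrative, and the chip dimensions are assumed to be in millimetres since the stated product is in mm².

```python
# Dimensions quoted in the text.
cbd_width_um, cbd_height_um = 600.0, 150.0   # CBD layout, micrometres
decap_area_mm2 = 0.083                       # decap portion of the CBD
chip_width_mm, chip_height_mm = 1.1, 0.86    # test chip (assumed millimetres)

UM2_PER_MM2 = 1.0e6  # 1 mm^2 = 1e6 um^2

cbd_area_mm2 = cbd_width_um * cbd_height_um / UM2_PER_MM2
switching_area_mm2 = cbd_area_mm2 - decap_area_mm2
switching_fraction = switching_area_mm2 / cbd_area_mm2
chip_area_mm2 = chip_width_mm * chip_height_mm

print(f"CBD area:            {cbd_area_mm2:.3f} mm^2")
print(f"Switching circuitry: {switching_fraction:.1%} of the CBD area")
print(f"Test chip area:      {chip_area_mm2:.2f} mm^2")
```

The switching circuitry works out to just under 8% of the CBD area, consistent with the rounded figure given in the text.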
A collection of 9 sample chips was tested. The clock frequency was fixed at 1GHz for this test. The improvement across the sample space is shown in Fig. 5.23. The sample chips that are in the slow or normal process corners cannot be distinguished easily. However, the two sample chips that are in the fast process corner can be identified by correlating with simulation. As mentioned in Chapter 4, the average supply voltage varies significantly, mainly due to the random process variation on Rmesh and Lpack, which determine the IR drop on the supply rails. Note that the supply noise reduction obtained by using the CBD is rather consistent under process variations across dies, which indicates the robustness of the design. The sample chip that provides the best improvement was used for further analysis.

Figure 5.23: Scatter plot comparing average VDD for the tested sample chips. Horizontal axis: sample number (1 to 9), with the noise budget and the fast-corner samples marked. Legend: passive decap, active decap n=2, active decap n=3, CBD.

The waveforms of a test chip with a 1GHz clock are depicted in Fig. 5.24. The dark gray curve is measured with the CBD on, and the light gray (red in color) curve is measured with the passive decap on. The two curves are superimposed. As expected, the supply voltage returns to a high level when extra charge is fed through from the clean supply on the rising edge of the clock, improving the average VDD level per cycle.

Figure 5.24: Superimposed waveforms showing a CBD and a passive decap on a 1GHz clock. In the top panel of the figure, the upper trace is when the CBD is on, while the lower trace is when the passive decap is on.

It is also useful to obtain the CBD performance over the operating frequency range. Fig.
5.25 shows the impact of changing the external input frequency from 100MHz to 1.5GHz. Two solid trend lines are provided, corresponding to the CBD and the passive decap. The gap between the two trend lines widens, indicating that the benefit of using the CBD increases as frequency increases. Even at 1.5GHz, the average VDD level for the CBD is significantly higher than that of the passive decap, suggesting that the CBD is suitable for today’s high-speed ASICs and medium-speed custom designs. At 1.5GHz, the test chip validated that the CBD has a maximum improvement of 93mV (or about 53% less noise) over the passive decap, 74mV (48% less noise) over the basic active decap with n=2, or 46mV (36% less noise) over the extended active decap with n=3. From 100MHz to 1.5GHz, compared to the passive decap, using the CBD reduces the supply noise by 42% to 55%. Note that the CBD outperforms both the basic and the extended active decaps across the operating frequency range.

Figure 5.25: Measured average VDD voltages at different clock frequencies. Horizontal axis: input clock frequency (GHz). Legend: passive decap, active decap n=2, active decap n=3, CBD.

Another important observation is that, unlike active decaps, the CBD does not seem to have a specific frequency at which the average VDD voltages from the CBD and the passive decap cross over. In other words, there is always a gap in the average supply voltage between the CBD and the passive decap across the frequency range. This makes intuitive sense since the CBD boosts the supply voltage at every clock cycle regardless of the clock frequency. Although the noise increases as the frequency increases, the amount of charge provided by the CBD circuit remains roughly the same at every cycle (assuming that it is running at the clock frequency).
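The improvement figures quoted at 1.5GHz pair an absolute gain in average VDD with a percentage reduction in noise, so they can be checked against each other. The sketch below assumes that "noise" means the drop of the average VDD below the nominal supply and that each percentage is the improvement divided by the reference scheme's noise; the function name and the back-calculated noise values are illustrative and are not reported in the text.

```python
def implied_reference_noise_mv(improvement_mv, fractional_reduction):
    """Noise of the reference scheme implied by an absolute and a relative gain."""
    return improvement_mv / fractional_reduction


# (improvement of CBD over the scheme in mV, quoted fractional noise reduction)
quoted = {
    "passive decap": (93.0, 0.53),
    "active decap n=2": (74.0, 0.48),
    "active decap n=3": (46.0, 0.36),
}

implied_cbd_noise_mv = {}
for scheme, (improvement_mv, fraction) in quoted.items():
    reference_mv = implied_reference_noise_mv(improvement_mv, fraction)
    implied_cbd_noise_mv[scheme] = reference_mv - improvement_mv
    print(f"{scheme}: implied noise {reference_mv:.0f} mV, "
          f"implied CBD noise {implied_cbd_noise_mv[scheme]:.0f} mV")
```

Under these assumptions, all three comparisons imply roughly the same residual noise for the CBD, which suggests the quoted millivolt and percentage figures are mutually consistent.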
From the test results of the sample chips, it can be concluded that the charge-borrowing decap is capable of reducing the supply noise more efficiently than the passive and active decaps. In addition, the CBD has other attractive features, such as higher bandwidth, simplicity, and robustness. Moreover, the use of active decaps will increase the power consumption of the chip because of the internal switching circuitry. In the case of the CBD, the local power overhead is fairly small since there is only a very limited leakage current and most of the charge transferred from the clean supply to the noisy supply is not wasted. Specifically, the leakage power of the CBD is about 4µW, about 0.1% of the total power consumption. However, the dynamic power of the Clk generation circuit is comparable to that of the active decap. Therefore, a comparable power overhead combined with better performance makes the CBD circuit an appealing alternative to the active or passive decaps.

Although the charge-borrowing decap provides many advantages, it has a number of limitations. One important issue is that the supply voltage changes abruptly once the CBD is turned on. High-frequency glitches on the supply may affect the logic circuit powered by it. To smooth out the supply voltage, a large amount of passive decap should be present in its vicinity. Although parasitic decoupling capacitance is inherently present on the supply, more decaps may still be needed. As a result, the existing area may not be completely replaceable by the CBD, as a certain portion of the area should be reserved for the passive decaps. For a fixed area, the proportion of the CBD and the passive decap depends on the sensitivity of the local logic circuit to the supply glitches. Questions such as the maximum allowed distance from the decap to the CBD and the minimum required size of the decap remain unanswered at this stage.
However, since the local circuitry of the CBD is relatively simple (passive decap plus a diode-connected transistor), it is still an attractive alternative to the active decaps.

5.5 Summary

This chapter further extended the concept of the active decap by increasing the stack height n to find an optimal value, depending on the supply noise level k present at the local supply. It was found that the extended active decap with n=3 provided superior performance by delivering a higher average supply voltage than the basic active decap with n=2 when k is below 14%. This chapter also introduced a novel charge-borrowing decap design that provides better supply noise reduction than the basic and the extended active decaps. The charge-borrowing decap delivers more charge and an increased supply boost over a wide range of operating frequencies. Its relatively simple design and implementation contribute to its robustness.

Chapter 6
Conclusions and Future Work

6.1 Summary and Conclusions

As technology scales further into the deep submicron regime, increasing clock frequency and decreasing supply voltage make maintaining the quality of the power supply a critical issue. On-chip power supply noise, due to IR drop and Ldi/dt effects, has a great impact on delay variation, and may even cause improper functionality. Power supply noise can be reduced by placing decoupling capacitors close to power pads and large drivers throughout the power distribution system. Decaps provide locally “instantaneous” current to the switching drivers and keep the power supply within certain noise budgets. Traditionally, a decap is made from an NMOS transistor outside the standard-cell blocks, or a pair of NMOS and PMOS transistors within the blocks. However, starting from 90nm technology, the reduction in oxide thickness of MOS transistors causes an increased ESD risk and more gate leakage. Standard decap designs, therefore, may no longer be appropriate for 90nm and beyond.
In this dissertation, the goal was to provide practical solutions to active and passive decap designs targeting ASIC applications in both white-space and standard-cell areas. The dissertation began with an overview of decap design basics, gate leakage phenomenon, ESD concerns, and standard-cell decap layout and placement. Some essential decap design issues were highlighted through the background section to motivate the topics for the rest of the dissertation. More importantly, the metric for power supply noise management was proposed and validated for decap performance comparisons used throughout the dissertation.

Next, the tradeoffs between high-frequency performance of decaps and ESD protection were investigated, and their impacts on the layout of standard-cell passive decaps were discussed. A design metric was introduced to determine the optimal number of fingers to use in the standard-cell layout to obtain a desired capacitance level over a target operating frequency. Models were developed to capture the frequency responses of Reff and Ceff for any given technology with only a few parameters. For ESD protection, a cross-coupled design that had been previously proposed by cell library developers was shown to suffer from reduced frequency response and provided no savings in gate leakage. It was shown that more fingers than typically used were needed to provide the target resistance value for sufficient ESD protection. The layout with the smallest NMOS device and a multi-fingered PMOS device was described to deliver acceptable frequency response and ESD reliability, while providing the lowest leakage.

For white-space areas, the effectiveness of active decaps as a late-stage drop-in replacement for passive decaps was evaluated so that a completed chip layout need not be disrupted near the tapeout deadline.
Improvements to the design of an active decoupling capacitor were described for removal of hot-spot IR-drop violations in ASIC designs running at up to 1GHz. The modified active decap using latch-based comparators in 90nm CMOS is able to switch in 0.5ns and consumes a relatively low power of 2.8mW, which is about 5× lower than a previous design running at approximately the same speed. This reduced power makes it more suitable for use in ASIC designs. Measurement results from test chips indicate an improvement over passive decaps of 10% to 20% when operating from 200MHz to 1GHz. The optimal active decap size to maximally remove hot-spot IR drop was identified. Placement analysis was also carried out, and it was found that the active decap is most effective when placed in close proximity to the hot spot, whereas the passive decap is not as sensitive to the exact location. If sized and placed properly, active decaps can be up to 20% better when used as drop-in replacements for passive decaps for power supply noise reduction.

Further research on active decap design was explored. By increasing the stack height n to an optimal value, which depends on the supply noise level k present at the local supply, a maximum supply boost can be achieved. It was found that the extended active decap with n=3 provided superior performance by delivering a higher average supply voltage than the n=2 and n=4 cases when the supply noise k is in the range of 7-14%. When k is above 14%, n=2 must be used, and beyond 16%, the area for the drop-in replacement of active decaps must be expanded to produce a satisfactory improvement over the passive decap.

Finally, the novel design for the charge-borrowing decap was proposed. This design provides better supply noise reduction than all other forms of active decaps. The charge-borrowing decap efficiently transfers charge from a clean supply rail to eliminate the hot spots on relatively noisy supply nodes.
The CBD requires only a minimal power overhead and delivers a maximum supply boost over a wide range of operating frequencies. Test results indicate that the CBD outperforms both the basic and the extended active decaps by reducing the supply noise to a lower level. The design and implementation of the CBD were kept relatively simple so that the robustness of the design can be maintained.

6.2 Contributions in this Dissertation

The following summarizes the major contributions in this dissertation:

• Developed standard-cell passive decap designs that properly trade off gate leakage, ESD reliability, and transient response. Provided simple and yet practical decap design metrics and guidelines for 90nm and 65nm CMOS technologies.
• Designed and implemented a white-space active decap using latch-based comparators that provides adequate supply noise reduction while consuming relatively low static power. Validated the design through a test chip. Explored the placement issues of the active decap.
• Extended the concept of active decap for an optimal design that produced the highest supply boost for the maximum supply noise reduction in a local area. Proposed a simple but novel charge-borrowing decap circuit that outperforms the basic and the extended active decaps. Validated the design through another test chip.

6.3 Future Work

The limitations of the charge-borrowing decap design need to be evaluated further. Quantitative answers to questions such as the optimal distance between the CBD and the logic, the appropriate proportion between the CBD area and the nearby passive decap area, and the maximum allowable charge transferred from the clean supply node while still maintaining its supply integrity should be investigated. A test chip with real industrial blocks placed as the user logic circuit is desired for the most accurate decap performance evaluation.

Another issue is the scalability of the active decaps and the CBD.
The design and implementation of the active decap were only accomplished and validated in 90nm CMOS. It is desirable to make the active decap designs useful in future technologies, such as 65nm and 45nm CMOS. As technology scales, more design challenges will occur. If any part of the design needs to be modified to accommodate the more advanced technology, it should be investigated in future research.

Monitoring power supply fluctuations on-chip in real time is also an emerging area of research [20]. The measured results of real-time power supply noise from monitoring circuitry can be used as a validation of decap design and placement. Many techniques have been used to monitor power supply noise on-chip [68][69]. These techniques are not suitable for production environments as they either need a significant area or require complex data processing off-chip. To overcome these limitations, a simple monitoring scheme based on an under-sampling technique [70] is worth investigating. Under-sampling is used to capture a high-frequency periodic signal from a large number of cycles using a slower sampling signal to achieve an effective high-speed sampling rate. In the case of power supply noise, the dynamic voltage drop is not periodic in nature, but the same experiment could be repeated several times, each time skewing the sampling point by a small time shift, Δt, which represents the sampling period, resulting in an equivalent sampling frequency of f_under-sampling = 1/Δt [70]. Using this technique, the measurements may be repeated to average and cancel out the noise effect. This approach needs to be evaluated from concept to test chip in order to finally validate its advantages and disadvantages, and serves as a promising area of future work.

APPENDIX: COMPARATOR DESIGN FUNDAMENTALS

Comparators are one of the most widely used components in analog integrated circuits.
A voltage comparator is a circuit that compares the instantaneous value of an input signal with a reference signal and produces an output at logic level, depending on whether the input is greater or smaller than the reference level [57]. One important application for high-speed voltage comparators is in data converters, where the conversion speed is limited by the response time of the comparators [57]. Other issues related to comparator design include finite resolution, offset, power, and area. As technology scales, more advanced CMOS technologies allow comparators to be realized for higher speed and potentially smaller area and power. However, it is difficult to achieve high speed and high accuracy at the same time because of the existence of device mismatches [71].

A widely used comparator configuration is a high-gain differential-input, single-ended-output amplifier, whose symbol is shown in Fig. A.1. The output of the comparator should have a large swing, ideally from VDD to Vss, as the input varies across a small swing, typically in the millivolt range [57]. In many applications, a comparator is used in open-loop operation, such that no frequency compensation is required [57][58][59]. However, in certain cases, due to the nature of AC coupling of the output and the input, a comparator may need frequency compensation to avoid oscillations [32].

Figure A.1: A differential input, single-ended output comparator symbol.

Figure A.2: DC transfer characteristics of (a) an ideal comparator and (b) a practical comparator with finite gain and offset voltage.

The DC characteristic curve of an ideal differential comparator is shown in Fig. A.2(a). When the positive input Vin+ is greater than the negative input Vin−, the output is high (i.e., at VDD). When Vin+ is less than Vin−, the output is low (i.e., at Vss).
This ideal DC transfer curve corresponds to a differential gain of infinity. That is, an infinitesimally small polarity change in (Vin+ − Vin−) will cause the output to switch. A more realistic DC transfer curve of a comparator is depicted in Fig. A.2(b). In practice, the differential gain is finite and equal to Av. In Fig. A.2(b), the two voltages VIL and VIH are the overdrive voltages (also called the input excess voltages). The overdrive is the input level that drives the comparator from an initial saturated input condition to an input level that barely causes the output of the comparator to switch its level [57]. Another non-ideal effect of the comparator is the input-referred DC offset voltage, which is mainly caused by device mismatches. If no offset voltage is present, the comparator DC transfer curve will be symmetrical around the point where Vin+ = Vin−. However, for a finite offset voltage of Vos, the comparator output will switch at (Vin+ − Vin− + Vos). In general, the output voltage Vout of a comparator can be written as [57][58]:

$$V_{out} = \begin{cases} V_{DD} & \text{if } \Delta V_{in} > V_{IH} \\ A_V \, \Delta V_{in} & \text{if } V_{IL} < \Delta V_{in} < V_{IH} \\ V_{SS} & \text{if } \Delta V_{in} < V_{IL} \end{cases} \qquad (A.1)$$

where ΔVin = Vin+ − Vin−. Clearly, the finite gain and offset voltage affect the accuracy of the comparator. It is desirable for a comparator to have a high gain and a small offset voltage.

Three major challenges exist in any comparator design: high speed, high resolution, and low power [72]. High speed is achieved by having a fast response time. That is, following an input polarity change, the comparator should switch its output between VDD and Vss with fast rise and fall times. In order to achieve the highest comparison speed, the minimum channel length for a specific technology is often used in comparator designs [71]. In addition, low power consumption is always desired. As technology scales, due to VDD scaling, the dynamic power of a comparator will scale accordingly.
Meanwhile, the static power consumption of the comparator may increase due to larger leakage current in a more advanced deep-submicron technology. Overall, the total power consumption including both dynamic and static power may decrease as technology shrinks [72].

A high gain Av is essential to achieve high resolution in a comparator design. For example, if the comparator must resolve 1mV of input variation and the output must switch a full swing of 1V (at VDD = 1V), the required voltage gain Av is 1V/1mV = 1000. It is difficult to achieve such a high gain within one stage of amplification. Hence, a multi-stage amplifier or a regenerative latch using positive feedback may be used as a comparator to meet the high gain requirement. A latch is normally faster than a multi-stage amplifier achieving the same gain [60][71][73]. Therefore, latch-based comparators are often used in practice.

Figure A.3: Circuit schematics showing positive feedback: (a) a basic differential pair with diode-connected load, (b) a gain-enhanced differential pair, and (c) a differential pair using positive feedback to provide increased gain.

The concept of positive feedback used in the latch approach can be explained with the following example, as shown in Fig. A.3. In Fig. A.3(a), a differential pair is loaded with two diode-connected PMOS devices. Its small-signal gain AV(a) can be given by [64][73]:

$$A_{V(a)} = \frac{g_{m1}}{g_{m3}} = \sqrt{\frac{\mu_n (W/L)_{M1}}{\mu_p (W/L)_{M3}}} \qquad (A.2)$$

where gm1 and gm3 are the transconductances of M1 and M3, and µn and µp are the channel mobilities of NMOS and PMOS devices, respectively. In Fig. A.3(b), a gain enhancement approach is added to the circuit with two additional transistors, M5 and M6. The small-signal gain AV(b) can be given by [54][57]:

$$A_{V(b)} = \frac{A_{V(a)}}{\sqrt{1 - (2 I_5 / I_{b1})}} \qquad (A.3)$$

where I5 and Ib1 are the currents flowing through M5 and Mb1, respectively.
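The gain requirement and the diode-connected-load gain discussed above can be put side by side numerically. The sketch below is illustrative only: it assumes the square-root form AV(a) = gm1/gm3 = sqrt(µn(W/L)M1 / (µp(W/L)M3)), and the mobility ratio and device size ratios are hypothetical values, not taken from this work.

```python
import math


def required_gain(output_swing_v, input_resolution_v):
    """Gain needed for the smallest resolvable input to produce a full output swing."""
    return output_swing_v / input_resolution_v


def diode_load_gain(mobility_ratio, wl_m1, wl_m3):
    """Gain of a differential pair with diode-connected loads (square-law assumption).

    mobility_ratio is mu_n / mu_p; wl_m1 and wl_m3 are the W/L ratios of M1 and M3.
    """
    return math.sqrt(mobility_ratio * wl_m1 / wl_m3)


# Requirement from the text: resolve 1 mV with a 1 V output swing.
target = required_gain(1.0, 1e-3)

# Hypothetical device parameters, chosen only for illustration.
stage_gain = diode_load_gain(mobility_ratio=2.0, wl_m1=32.0, wl_m3=1.0)

# Number of identical cascaded stages needed to reach the target gain.
stages_needed = math.ceil(math.log(target) / math.log(stage_gain))
print(f"target gain {target:.0f}, single-stage gain {stage_gain:.1f}, "
      f"stages needed {stages_needed}")
```

Even with a generous size ratio, a single diode-loaded stage falls well short of the target in this example, which is the motivation for multi-stage or regenerative (positive-feedback) structures.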
By properly choosing the size of M5, the small-signal gain can be improved. In Fig. A.3(c), the drain terminals of M5 and M6 are cross-connected, creating a form of positive feedback. The positive feedback functions as follows: when Vin+ is slightly larger than Vin−, it causes Vout1 to be slightly smaller than Vout2. The larger Vout2 forces M6 to deliver less current to the node Vout1. Similarly, the smaller Vout1 forces M5 to deliver more current to charge up Vout2. This positive loop reinforces Vout2 to reach VDD and Vout1 to reach Vss [54][57]. The small-signal gain AV(c) can be given by [57]:

$$A_{V(c)} = A_{V(a)} \cdot \frac{1}{1 - \dfrac{(W/L)_{M5}}{(W/L)_{M3}}} \qquad (A.4)$$

Equation A.4 requires that the size ratio (W/L)M5 / (W/L)M3 be less than 1. When the size ratio approaches unity, the small-signal gain tends to infinity; at or beyond unity, the circuit operates as a regenerative latch [57].

REFERENCES

[1] H. H. Chen and S. E. Schuster, “On-chip decoupling capacitor optimization for high-performance VLSI design,” Symposium on VLSI Technology, Systems, and Applications, pp. 99-103, May-Jun. 1995.
[2] S. Pant and E. Chiprout, “Power grid physics and implications for CAD,” IEEE/ACM Design Automation Conference, pp. 199-204, Jul. 2006.
[3] N. Srivastava, X. Qi, and K. Banerjee, “Impact of on-chip inductance on power distribution network design for nanometer scale integrated circuits,” International Symposium on Quality of Electronic Design (ISQED), pp. 346-351, Mar. 2005.
[4] C. W. Fok and D. L. Pulfrey, “Full-chip power-supply noise: the effect of on-chip power-rail inductance,” International Journal of High Speed Electronics and Systems, vol. 12, no. 2, pp. 573-582, Jun. 2002.
[5] J. Kim, B. Choi, H. Kim, W. Ryu, Y.-H. Yun, S.-H. Hamm, S.-H. Kim, and Y.-H. Lee, “Separated role of on-chip and on-PCB decoupling capacitors for reduction of radiated emission on printed circuit board,” IEEE International Symposium on Electromagnetic Compatibility, pp. 531-536, Aug. 2001.
[6] H. H. Chen, J. S. Neely, M. F. Wang, and G.
Co, “On-chip decoupling capacitor optimization for noise and leakage reduction,” Symposium on Integrated Circuits and Systems Design, pp. 319-326, Sep. 2003.
[7] D. A. Hodges, H. G. Jackson, and R. A. Saleh, Analysis and Design of Digital Integrated Circuits in Deep Submicron Technology, 3rd ed., New York: McGraw-Hill, 2004.
[8] N. Na, T. Budell, C. Chiu, E. Tremble, and I. Wemple, “The effects of on-chip and package decoupling capacitors and an efficient ASIC decoupling methodology,” Electronic Components and Technology Conference (ECTC), pp. 556-567, Jun. 2004.
[9] H. Su, S. S. Sapatnekar, and S. R. Nassif, “Optimal decoupling capacitor sizing and placement for standard-cell layout designs,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 22, no. 4, pp. 428-436, Apr. 2003.
[10] M. Popovich and E. G. Friedman, “Decoupling capacitors for multi-voltage power distribution systems,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 14, no. 3, pp. 217-228, Mar. 2006.
[11] M. Popovich, E. G. Friedman, M. Sotman, A. Kolodny, and R. M. Secareanu, “Maximum effective distance of on-chip decoupling capacitors in power distribution grids,” IEEE/ACM Great Lakes Symposium on VLSI, pp. 173-179, May 2006.
[12] P. Larsson, “Parasitic resistance in an MOS transistor used as on-chip decoupling capacitance,” IEEE Journal of Solid-State Circuits, vol. 32, no. 4, pp. 574-576, Apr. 1997.
[13] P. Larsson, “Resonance and damping in CMOS circuits with on-chip decoupling capacitance,” IEEE Transactions on Circuits and Systems-I: Fundamental Theory and Applications, vol. 45, no. 8, pp. 849-858, Aug. 1998.
[14] M. D. Powell and T. N. Vijaykumar, “Exploiting resonant behavior to reduce inductive noise,” International Symposium on Computer Architecture, pp. 288-299, Jun. 2004.
[15] C. H. Ng, C. S. Ho, S.-F. S. Chu, and S.-C.
Sun, “MIM capacitor integration for mixed-signal/RF applications,” IEEE Transactions on Electron Devices, vol. 52, no. 7, pp. 1399-1409, Jul. 2005.
[16] M. W. C. Goh, Q. Lim, R. A. Keating, A. V. Kordesch, and Y. Bin Mohd Yusof, “Design of radio frequency metal-insulator-metal (MIM) capacitors,” International Conference on Solid-State and Integrated Circuits Technology, pp. 209-212, Oct. 2004.
[17] C. H. Ng, C. S. Ho, N. G. Toledo, and S.-F. Chu, “Characterization and comparison of single and stacked MIMC in copper interconnect process for mixed-mode and RF applications,” IEEE Electron Device Letters, vol. 25, no. 7, pp. 489-491, Jul. 2004.
[18] C. H. Ng, C. S. Ho, S.-F. S. Chu, and S.-C. Sun, “MIM capacitor integration for mixed-signal/RF applications,” IEEE Transactions on Electron Devices, vol. 52, no. 7, pp. 1399-1409, Jul. 2005.
[19] H. Yamamoto and J. A. Davis, “Decreased effectiveness of on-chip decoupling capacitance in high-frequency operation,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 15, no. 6, pp. 649-659, Jun. 2007.
[20] K. Arabi, R. Saleh, and X. Meng, “Power supply noise in SoCs: metrics, management, and measurement,” IEEE Design and Test of Computers, vol. 24, no. 3, pp. 236-244, May-Jun. 2007.
[21] TSMC 90nm CLN90G Process SAGE-X v3.0 Standard Cell Library Databook, Release 1.0, Artisan Components Inc., Sunnyvale, CA, 2004.
[22] X. Meng, R. Saleh, and K. Arabi, “Layout of decoupling capacitors in IP blocks for 90-nm CMOS,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 16, no. 11, pp. 1581-1588, Nov. 2008.
[23] X. Meng, K. Arabi, and R. Saleh, “Novel decoupling capacitor designs for sub-90nm CMOS technology,” International Symposium on Quality Electronic Design (ISQED), pp. 266-272, Mar. 2006.
[24] A. Amerasekera and C. Duvvury, ESD in Silicon Integrated Circuits, 2nd ed., Hoboken, NJ: John Wiley & Sons, 2002.
[25] J. Fu, Z. Luo, X. Hong, T. Cai, S. X.-D. Tan, and Z.
Pan, “VLSI on-chip power/ground network optimization considering decap leakage currents,” Asia and South Pacific Design Automation Conference, vol. 2, pp. 735-738, Jan. 2005.
[26] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, “Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits,” Proceedings of IEEE, vol. 91, no. 2, pp. 305-327, Feb. 2003.
[27] F. Hamzaoglu and M. Stan, “Circuit-level techniques to control gate leakage for sub-100nm CMOS,” International Symposium on Low Power Electronics and Design, pp. 60-63, Aug. 2002.
[28] R. S. Guindi and F. N. Najm, “Design techniques for gate-leakage reduction in CMOS circuits,” International Symposium on Quality Electronic Design (ISQED), pp. 61-65, Mar. 2003.
[29] D. Lee, W. Kwong, D. Blaauw, and D. Sylvester, “Analysis and minimization techniques for total leakage considering gate oxide leakage,” IEEE/ACM Design Automation Conference, pp. 175-180, Jun. 2003.
[30] L. Chang, K. J. Yang, Y.-C. Yeo, Y.-K. Choi, T.-J. King, and C. Hu, “Reduction of direct-tunneling gate leakage current in double-gate and ultra-thin body MOSFETs,” IEEE Transactions on Electron Devices, vol. 49, no. 12, pp. 2288-2295, Dec. 2002.
[31] X. Meng, K. Arabi, and R. Saleh, “A novel active decoupling capacitor design in 90nm CMOS,” IEEE International Symposium on Circuits and Systems (ISCAS), pp. 657-660, May 2007. (Top 10 Honorable Mention Award)
[32] X. Meng and R. Saleh, “An improved active decoupling capacitor for “hot-spot” supply noise reduction in ASIC designs,” IEEE Journal of Solid-State Circuits, vol. 44, no. 2, pp. 584-593, Feb. 2009.
[33] R. Saleh, D. Overhauser, S. Taylor, “Full-chip verification of UDSM designs,” IEEE/ACM International Conference on Computer-Aided Design, pp. 453-460, Nov. 1998.
[34] S. Sapatnekar, “High-performance power grids for nanometer technologies,” IEEE International Conference on VLSI Design, pp. 839-844, Jan. 2004.
[35] G. Bai, S. Bobba, and I. N.
Hajj, “Static timing analysis including power supply noise effect on propagation delay in VLSI circuit,” IEEE/A CM Design Automation Conference, pp. 295-300, Jun. 2001. [36] H. Harizi, R. HauBler, M. Olbrich, and E. Barke, “Efficient modeling techniques for dynamic voltage drop analysis,” IEEE/A CMDesign Automation Conference, pp. 706-711, Jun. 2007. [37] M. Ang, R. Salem, and A. Taylor, “An on-chip voltage regulator using switched decoupling capacitors,” IEEE International Solid-State Circuits Conference, pp. 438-439, Feb. 2000. [38] M. A. Ang, and A. D. Taylor, “Voltage regulating circuit for attenuating inductance induced on-chip supply variations,” U.S. Patent 6509785, Jan. 21, 2003. 142 [39] C. Giacomotto, R. P. Masleid, and A. Harada, “Four-state switched decoupling capacitor system for active power stabilizer,” U.S. Patent 6744242 Bi, Jun. 1, 2004. [40] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits. A Design Perspective, 2nd ed., Upper Saddle River, NJ: Prentice Hall, 2004. [41] J. Gu, H. Eom, and C. Kim, “A switched decoupling capacitor circuit for on-chip supply resonance damping”, Symposium on VLSI Circuits, pp. 126-127, Jun. 2007. [42] J. Gu, R. Harjani, and C. Kim, “Distributed active decoupling capacitors for on-chip supply noise cancellation in digital VLSI circuits”, Symposium on VLSI Circuits, pp. 216- 217, Jun. 2006. [43] W. C. Lee and C. Hu, “Modeling gate and substrate currents due to conduction- and valence-band electron and hole tunneling,” Symposium on VLSI Technology, pp. 198-199, Jun. 2000. [44] K. Cao, W. -C. Lee, W. Liu, X. Jin, P. Su, S. K. H. Fung, J. X. An, B. Yu, and C. Hu, “BSIM4 gate leakage model including source drain partition,” International Electron Devices Meeting (IEDM), pp. 815-818, Dec. 2000. [45] X. Xi, M. Dunga, J. He, W. Liu, K. M. Cao, X. Jin, J. J. Ou, M. Chan, A. M. Niknejad, and C. Hu, “BSIM4.4.0 MOSFET model user’s manual,” University of California, Berkeley, 2004. [46] C. K. 
Alexander and M. N. 0. Sadiku, Fundamentals of Electric Circuits, New York: McGraw-Hill, 2000. [47] X. W. Wang, Y. Shi, T. P. Ma, G. J. Cui, T. Tamagawa, J. W. Golz, B. L. Halpen, and J. J. Schmitt, “Extending gate dielectric scaling limit by use of nitride or oxynitride,” Symposium on VLSI Technology, pp. 109-110, Jun. 1995. [48] T. P. Ma, “Opportunities and challenges for high-k gate dielectrics,” International Symposium on the Physical and Failure Analysis of Integrated Circuits (IPFA), pp. 1-4, Jul. 2004. [49] T. P. Ma, “Electrical characterization of high-k gate dielectrics,” International Conference on Solid-State and Integrated Circuits Technology, pp. 36 1-365, Oct. 2004. [50] V. George, S. Jahagirdar, C. Tong, K. Smits, S. Damaraju, S. Siers, V. Naydenov, T. Khondker, S. Sarkar, and P. Singh, “Penryn: 45-nm next generation Intel® CoreTM 2 processor,” IEEE Asian Solid-State Circuits Conference (ASSCC), pp. 14-17, Nov. 2007. [51] M. T. Bohr, R. S. Chau, T. Ghani, and K. Mistry, “The high-k solution,” IEEE Spectrum, vol. 40, no. 10, pp. 29-35, Oct. 2007. [52] J. Chia, “Design, layout and placement of on-chip decoupling capacitors in IP blocks,” MA.Sc Thesis, University of British Columbia, 2004. 143Q [53] S. Zhao, K. Roy and C. -K. Koh, “Decoupling capacitance allocation and its application to power-supply noise-aware floorplanning,” IEEE Transactions on Computer-Aided Design ofIntegrated Circuits and Systems, vol. 21, no. 1, pp 81-92, Jan. 2002. [54] J. R. Hauser, “Bias sweep rate effects on quasi-static capacitance of MOS capacitors,” IEEE Transactions on Elecfron Devices, vol. 44, no. 6, pp. 1009-1012, Jun. 1997. [55] W. Liu, MOSFET Models for SPICE Simulation Including BSIM3v3 and BSIM4, Wiley IEEE Press, 2001. [56] H. Johnson and M. Graham, High-Speed Digital Design, Prentice-Hall, 1993. [57] R. Gregorian, Introduction to CMOS Op-Amps and Comparators, New York: John Wiley & Sons, 1999. [58] R. J. 
Baker, CMOS: Circuit Design, Layout, and Simulation, 2nd ed., Piscataway, NJ: IEEE Press, 2005. [59] D. A. Jones and K. Martin, Analog Integrated Circuit Design, New York: John Wiley & Sons, 1997. [60] 5. Sheikhaei, S. Mirabbasi, and A. Ivanov, “A 43mW single-channel 4G5/s 4-bit flash ADC in 0.l8jim CMOS,” IEEE Custom Integrated Circuits Conference (CICC), pp. 353- 356, Sep. 2007. [61] E. Alon and M. Horowitz, “Integrated regulation for energy-efficient digital circuits,” IEEE Journal ofSolid-State Circuits, vol. 43, no. 8, pp. 1795-1807, Aug. 2008. [62] X. Meng and R. Saleh, “Active decap design considerations for optimal supply noise reduction,” International Symposium on Quality Electronic Design (ISQED), pp. 765-769, Mar. 2009. [63] X. Meng, R. Saleh, and S. Wilton, “Charge-borrowing decap: a novel circuit for removal of local supply noise violations,” accepted to IEEE Custom Integrated Circuits Conference (CICC), Sep. 2009. [64] B. Razavi, Design ofAnalog CMOS Integrated Circuits, New York: McGraw Hill, 2001. [65] T. B. Cho and P. R. Gray, “A lOb, 2oMsamples/s, 35mW pipeline AID converter,” IEEE Journal ofSolid-State Circuits, vol. 30, no. 3, pp. 166-172, Mar. 1995. [66] A. M Abo and P. R. Gray, “A 1 .5-V, 10-bits, 14.3-MS/s CMOS pipeline analog-to-digital converter,” IEEE Journal ofSolid-State Circuits, vol. 34, no. 5, pp. 599-606, May 1999. [67] K. A. Jenkins, K. L. Shepard, and Z. Xu, “On-chip circuit for measuring period jitter and skew of clock distribution networks,” IEEE Custom Integrated Circuits Conference (CICC), pp. 157-160, Sep. 2007. 144 [68] E. Alon, V. Stojanovic, and M. A. Horowitz, “Circuits and techniques for high-resolution measurement of on-chip power supply noise,” IEEE Journal of Solid-State Circuits, vol. 40, no. 4, pp. 820-828, Apr. 2005. [69] T. Nakura, M. Ikeda, and K. Asada, “Design and measurement of on-chip dildt detector circuit for power supply line,” IEEE Asia-PacUlc Conference on Advanced System Integrated Circuits, pp. 
426-427, Aug. 2004. [70] B. Kaminska and K. Arabi, “Mixed signal DFT: a concise overview,” IEEE International Conference on Computer-Aided Design, pp. 672-680, Nov. 2003. [71] B. Murmann, “AID converter trends: power dissipation, scaling and digitally assisted architectures,” IEEE Custom Integrated Circuits Conference (CICC), pp. 105-112, Sep. 2008. [72] R. J. van de Plassche, J. H. Huij sing, and W. Sansen, Analog Circuit Design: High-Speed Analog-to-Digital Converters; Mixed-Signal Design; PLL ‘s and Synthesizers, Boston: Kluwer Academic Publishers, 2000. [73] M. Gustavsson, J. J. Wikner, and N. Tan, CMOS Data Converters for Communications, Boston: Kiuwer Academic Publishers, 2000. 145
