UBC Theses and Dissertations

Design and analysis of active and passive decoupling capacitors for on-chip power supply noise management Meng, Xiongfei 2009

DESIGN AND ANALYSIS OF ACTIVE AND PASSIVE DECOUPLING CAPACITORS FOR ON-CHIP POWER SUPPLY NOISE MANAGEMENT

by

XIONGFEI MENG

B.A.Sc., The University of British Columbia, 2004
M.A.Sc., The University of British Columbia, 2006

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES (Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

July 2009

© Xiongfei Meng, 2009

ABSTRACT

On-chip decoupling capacitors (decaps) in the form of MOS transistors are widely used to reduce power supply noise in both standard-cell blocks and white spaces between blocks. This research provides guidelines for layouts of decaps that properly trade off high-frequency response, electrostatic discharge (ESD) reliability and gate tunneling leakage for use within standard-cell blocks in ASIC designs in 90nm and 65nm CMOS technologies. A simple but effective metric is developed to determine the optimal decap layout based on the frequency response. Novel active designs are also presented. If an IR-drop violation (hot spot) is found after the physical design is completed, it is usually difficult to implement a quick fix to the problem. In this dissertation, the use of an active decap in white-space areas as a drop-in replacement for passive decaps is investigated to provide noise reduction for these “hot-spot” problems found late in the design process. A modified active decap design is proposed for ASIC applications operating up to 1GHz, and the use of latch-based comparators provides a better power-delay trade-off. Measurement results from a test chip show that the noise reduction using active decaps improves as operating frequency increases, and provides between 10% and 20% noise reduction at 200MHz-1GHz over its passive counterpart. The concept of active decap is further extended to achieve lower supply noise.
It is found that an active decap with a stack height of three (i.e., number of pieces switching) provides the best noise reduction if the supply noise level is between 7% and 14%, but a stack height of two is best if the noise level is between 14% and 16%. In addition, a novel charge-borrowing decap circuit is introduced which outperforms all forms of active decaps for a fixed area in terms of removing local hot spots.

TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgments
Chapter 1 Introduction
  1.1 Motivation
  1.2 Research Objectives
  1.3 Organization of the Dissertation
Chapter 2 Background
  2.1 Introduction
  2.2 Decoupling Capacitor Basics and Design Challenges
  2.3 Thin-Oxide Gate Tunneling Leakage
  2.4 Electrostatic Discharge Reliability in Decap Design
  2.5 Standard-Cell Decap Layout and Placement
  2.6 Metric for Power Supply Noise Management
  2.7 Summary
Chapter 3 Passive Decoupling Capacitor Design
  3.1 Introduction
  3.2 High-Frequency Response of Decoupling Capacitors
  3.3 Cross-Coupled Decoupling Capacitor Designs
  3.4 Summary
Chapter 4 Active Decoupling Capacitor Design
  4.1 Introduction
  4.2 Active Decoupling Capacitor Analysis and Design
    4.2.1 Active Decap Concept and Design Considerations
    4.2.2 Overall Active Decap Architecture
    4.2.3 Design Specifications
    4.2.4 Latch-Based Comparator Design
  4.3 Chip Design and Experimental Results
    4.3.1 Test Chip Setup
    4.3.2 Test Chip Simulations
    4.3.3 Test Chip Measurements
    4.3.4 Measurement Results on One Typical Chip
  4.4 Active Decap Size and Placement
  4.5 Summary
Chapter 5 Generalized Active Decap and Charge-Borrowing Decap
  5.1 Introduction
  5.2 Extended Active Decoupling Capacitor
    5.2.1 Optimal Stack Height n
    5.2.2 Design and Layout of n=3 Extended Active Decap
    5.2.3 Simulation Results
  5.3 Charge-Borrowing Decap (CBD)
    5.3.1 Charge-Borrowing Decap Concept
    5.3.2 “Clk” Signal Generation
    5.3.3 Design of Charge-Borrowing Decap
    5.3.4 Simulation Results
  5.4 Test Chip Setup and Measurement Results
  5.5 Summary
Chapter 6 Conclusions and Future Work
  6.1 Summary and Conclusions
  6.2 Contributions in this Dissertation
  6.3 Future Work
Appendix: Comparator Design Fundamentals
References

LIST OF TABLES

Table 1.1: Comparison on active and passive decap implementations
Table 3.1: Optimal number of fingers for different frequency ranges
Table 3.2: Comparison of the passive decap designs and their gate leakage current
Table 4.1: Design specifications of the active decap
Table 4.2: Transistor sizes of the comparators
Table 4.3: Simulated switching circuit design specification comparison
Table 4.4: Comparator delay td and delay difference Δtd in different corners
Table 4.5: Measured active decap performance for different process corners
Table 4.6: Comparison between equation and simulated result after correlation
Table 4.7: Active decap bandwidth versus average comparator delay under process corners
Table 5.1: Optimal stack height n selection based on the supply noise k (from formula)
Table 5.2: Optimal stack height n selection based on the supply noise k (from simulation)
Table 5.3: High-level comparison among passive, active, and charge-borrowing decaps

LIST OF FIGURES

Figure 1.1: Effectiveness of on-chip decaps: (a) without decaps, the power supply noise level can be high, compared to (b) where the noise level is low due to the use of decaps
Figure 2.1: Decoupling capacitor implemented using an NMOS device
Figure 2.2: Cross-coupled decap schematic [21]
Figure 2.3: Active decap concept shown in (a) and two previous designs from (b) [37][38] and (c) [39]
Figure 2.4: Gate leakage current versus gate area
Figure 2.5: Gate leakage current density Jleak versus oxide thickness tox
Figure 2.6: Complete ESD protection scheme
Figure 2.7: Simulation setup for ESD analysis [24]
Figure 2.8: Sample layout of standard-cell N+P decap (a) with one finger and (b) with two fingers
Figure 2.9: DVDavg and DVDm: metric used to evaluate DVD profiles
Figure 3.1: (a) A standard cell decap can be implemented as a NMOS in parallel with a PMOS device. The corresponding layout is shown in (b)
Figure 3.2: A decap can be implemented as a NMOS device and modeled as a lumped-RC circuit with effective resistance and effective capacitance as functions of frequency, f
Figure 3.3: The circuit setup to extract the effective resistance and the effective capacitance values from an ac analysis
Figure 3.4: Plots of Ceff and Reff for three device sizes (W x L): 10x10µm, 15x5µm, and 5x15µm
Figure 3.5: Plots of Ceff and Reff for three NMOS devices (HSPICE versus model)
Figure 3.6: The effective capacitance, Ceff(f), of NMOS and PMOS decaps in 90nm for different numbers of fingers in a fixed area of Y=2µm and X=9µm
Figure 3.7: The effective capacitance, Ceff(f), of NMOS and PMOS decaps in 90nm for different numbers of fingers in a fixed area of Y=2µm and X=9µm
Figure 3.8: Ceff(f) and Reff(f) comparison of fixed-area standard decap and cross-coupled decap: same MOS device sizes but different poly connections
Figure 3.9: Sample layouts of cross-coupled decap cells for (a) 3N-4P (b) 8N-9P
Figure 3.10: Sample layouts of improved decap cells for (a) 1N-9P (b) 1N-16P
Figure 3.11: Frequency response of various cross-coupled designs
Figure 4.1: Active decap concept and its MOS implementation
Figure 4.2: The reductive factors f and g for the boosted voltage as a function of (a) “on” resistances of the switches, R, and (b) leakage due to the size of decap Cdecap
Figure 4.3: Active decap architecture
Figure 4.4: Difference in the input voltages of the comparators as a function of supply noise k
Figure 4.5: The reductive factor h for the boosted voltage as a function of delay difference Δtd
Figure 4.6: Complementary comparator design: (a) n-type input for the top comparator, and (b) p-type input for the bottom comparator
Figure 4.7: (a) DC and (b) AC (compensated) characteristic curves for the two-stage latch-based comparator design (n-type input shown)
Figure 4.8: AC characteristic curves for the three designs
Figure 4.9: Test chip setup
Figure 4.10: Layout of active decap showing the relative size of the components
Figure 4.11: Final voltage a as a function of sensing and switching circuitry area overhead x
Figure 4.12: Annotated test chip microphotograph
Figure 4.13: Simulated VDD voltage (on a 500MHz clock) with active decap on and off
Figure 4.14: Simulated VDD voltage with active decap on for different process corners
Figure 4.15: Grouped scatter plot for active decap noise reduction for the tested sample chips
Figure 4.16: Comparator delay td and delay difference Δtd as a function of supply noise k
Figure 4.17: Simulated average VDD voltage with active decap on and off versus clock frequency for (a) slow, (b) typical, and (c) fast process corners
Figure 4.18: Measured results (on a 500MHz clock) for (a) active decap on and (b) plotted comparison between active decap on and off
Figure 4.19: Measured (0.2-1GHz) and simulated (1-2.5GHz) average VDD voltage with active decap on and off versus clock frequency
Figure 4.20: Simulated average VDD noise per clock cycle versus normalized decap size
Figure 4.21: Power supply noise reduction difference from active decap and passive decap with area overhead from switching circuit of active decap
Figure 4.22: Improvement on average VDD noise for using active decaps in different placement locations by varying Rdist and Rmesh
Figure 5.1: Active decap concept showing different stack height: (a) n=2, (b) n=3, and (c) n=4
Figure 5.2: MOS implementation of the extended active decap (n=3)
Figure 5.3: Final voltage aVDD as a function of stack height n with k varying (fixed area)
Figure 5.4: Final voltage aVDD as a function of k with different stack height n (from formula)
Figure 5.5: Extended active decap (n=3) architecture
Figure 5.6: Layout of extended active decap (n=3) showing the relative size of the components
Figure 5.7: Simulated VDD voltage with extended active decap (n=3) on for two different k levels
Figure 5.8: Average VDD voltage as a function of k with different stack height n (from simulation)
Figure 5.9: Charge-borrowing decap concept shown in (b), compared to a passive decap in (a)
Figure 5.10: Diode inserted charge-borrowing decap showing the states of the clocking signal: (a) Clk at 0, (b) Clk rises to VDD, and (c) Clk falls back to 0
Figure 5.11: Possible implementations of charge-borrowing decap with (a) NMOS-formed diodes and (b) PMOS-formed diodes
Figure 5.12: Improved implementation of charge-borrowing decap with PMOS formed diodes
Figure 5.13: Generation of the boosted voltage on node B2
Figure 5.14: “Clk” signal generation using ring oscillator
Figure 5.15: Buffer size determines the edge delay while the buffer chain (with ring oscillator) consumes dynamic power
Figure 5.16: Complete circuit diagram of charge-borrowing decap
Figure 5.17: Simulated VDD voltage (on a 1GHz clock) with a CBD on and off (best case)
Figure 5.18: Simulated VDD voltage (on a 1GHz clock) with a CBD on and off (worst case)
Figure 5.19: Simulated average VDD voltage as a function of k showing the case of CBD
Figure 5.20: Test chip 2 setup
Figure 5.21: Layout of charge-borrowing decap showing the relative size of the components
Figure 5.22: (a) Annotated microphotograph and (b) layout of the second test chip
Figure 5.23: Scatter plot comparing average VDD for the tested sample chips
Figure 5.24: Superimposed waveforms showing a CBD and a passive decap on a 1GHz clock. In the top panel of the figure, the upper trace is when the CBD is on, while the lower trace is when the passive decap is on
Figure 5.25: Measured average VDD voltages at different clock frequencies
Figure A.1: A differential input, single-ended output comparator symbol
Figure A.2: DC transfer characteristics of (a) an ideal comparator and (b) a practical comparator with finite gain and offset voltage
Figure A.3: Circuit schematics showing positive feedback: (a) a basic differential pair with diode-connected load, (b) a gain-enhanced differential pair, and (c) a differential pair using positive feedback to provide increased gain

ACKNOWLEDGMENTS

I would like to express my gratitude to my academic supervisor, Dr. Resve Saleh, whose expertise, understanding, encouragement, and support added significantly to my graduate experience. I appreciate his profound and vast knowledge in many areas, both within and outside of the scope of my research. I would like to thank the exam committee, Dr. Steve Wilton, Dr. David Pulfrey, Dr. Mark Greenstreet, Dr. John Madden, Dr. Lutz Lampe and Dr. Tim Salcudean, for reviewing this dissertation and providing valuable feedback.

A very special thanks goes to my colleagues and friends in the SoC group, for their technical advice and kindness. More specifically, I would like to thank Dr. Shahriar Mirabbasi, Dr. Roberto Rosales, Dipanjan Sengupta, Dr. Mehdi Alimadadi, Dr. Samad Sheikhaei, Jeff Mueller, and Sohaib Majzoub. I also acknowledge Dr. Karim Arabi of Qualcomm and Asad Shayan of PMC-Sierra for their suggestions and help in this study.
I recognize that this research would not have been possible without the financial support from NSERC and PMC-Sierra Inc., and express my gratitude to them. I also thank CMC Microsystems for providing chip fabrication and the CAD tools.

Last but not least, I would especially like to thank my family for both giving and encouraging me to seek for myself a demanding and meaningful education. In particular, I must acknowledge my wife, Liming, for her love, caring and patience through many years of my life. I would not have accomplished this research without her support. The appreciation extends beyond any words at my command.

Chapter 1 Introduction

1.1 Motivation

Scaling of CMOS technology allows higher speed and higher functional density. As the clock frequency increases and the supply voltage decreases to about 1V, maintaining the quality of the power supply has become a primary issue. Power supply noise in the form of voltage variations arises due to IR drop and Ldi/dt effects [1]. The IR drop has been increasing over time due to increased resistances in the power grid as the metal widths continue to shrink with each successive technology generation. The inductive Ldi/dt effects are also increasing due to the higher current demands in more complex chips. However, the pin and package inductance overwhelms the inductance of the on-chip power distribution network, and therefore the on-chip inductance can be neglected [2], although the on-chip inductance may be considered in certain applications [3][4]. Having the two components together, the overall voltage drop ΔVdrop at any point in the power grid is [5][6][7]:

$$\Delta V_{drop} = I_{supply} R_{mesh} + L_{pack} \frac{dI_{supply}}{dt} \qquad (1.1)$$

where Rmesh is the power grid (mesh) resistance, Lpack is the package and pin inductance, and Isupply is the current flow through the user logic circuits.
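As a rough numerical illustration of Equation (1.1), the sketch below reads the equation as the sum of a resistive term (Isupply times Rmesh) and an inductive term (Lpack times the rate of change of Isupply). The function name and all component values are hypothetical, chosen only to show the order of magnitude against a 1V supply; they are not taken from this dissertation.

```python
def supply_voltage_drop(i_supply, di_dt, r_mesh, l_pack):
    """Eq. (1.1): resistive IR drop plus inductive L*di/dt drop, in volts."""
    return i_supply * r_mesh + l_pack * di_dt

# Hypothetical values, for illustration only.
I_SUPPLY = 0.5        # A, current drawn by the logic
DI_DT = 0.5 / 1e-9    # A/s, a 0.5 A ramp over 1 ns
R_MESH = 0.05         # ohm, power mesh resistance
L_PACK = 0.1e-9       # H, package and pin inductance
V_NOMINAL = 1.0       # V, nominal supply

drop = supply_voltage_drop(I_SUPPLY, DI_DT, R_MESH, L_PACK)
print(f"drop = {drop * 1e3:.1f} mV ({100 * drop / V_NOMINAL:.1f}% of nominal)")
```

With these assumed numbers the inductive term is twice the resistive term, which is one way to see why package inductance matters as switching edges get faster.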
Figure 1.1: Effectiveness of on-chip decaps: (a) without decaps, the power supply noise level can be high, compared to (b) where the noise level is low due to the use of decaps. (Both panels show the package, bond wire/pad, and power mesh.)

A variety of different methods can be used to manage supply voltage drops. Among them, the most popular is to use on-chip decoupling capacitors (decaps) to maintain the power supply within a certain percentage (e.g., 10%) of the nominal supply voltage [1][6]. Decaps are typically placed in regions between areas of high current demand and the power pads and I/O pins [7][8][9][10][11]. The effectiveness of decaps is illustrated conceptually in Fig. 1.1, where the power system can be modeled as a distributed RLC network [8][12][13][14]. The power supply noise level of Fig. 1.1(a) is reduced in Fig. 1.1(b) due to the use of decaps.

In Application Specific Integrated Circuit (ASIC) designs, two types of decaps can be identified: white-space decaps and standard-cell decaps. White-space decaps are placed in the open areas of the chip between intellectual property (IP) blocks, and are made from passive decaps or active decaps. Standard-cell decaps are always passive decaps placed within the IP blocks themselves [9], typically as filler cells.

Passive decaps can be implemented with MOS transistors or metal-insulator-metal (MIM) capacitors. White-space decaps are usually implemented with NMOS transistors since they provide a high capacitance value per unit area. In certain applications, PMOS devices and MIM capacitors [15][16][17][18] are two alternatives for white-space decaps. Standard-cell decaps normally use both NMOS and PMOS devices. A more recent approach is the active decap which requires dynamic switching of passive decaps to boost up the power rail voltages when excessive voltage drop is detected.
Due to its area requirement, it can only be used in the open areas between blocks. The implementation of active and passive decaps in terms of NMOS and PMOS devices in either white-space areas or standard cells is summarized in Table 1.1. This research addresses design and implementation issues for both active and passive decaps.

Table 1.1: Comparison on active and passive decap implementations.

                Active decap     Passive decap
White space     NMOS and PMOS    NMOS or PMOS (or MIM)
Standard cell   Not used         NMOS and PMOS

The lack of sufficient decaps can result in unsatisfactory timing and even functional failure for the logic circuits and memory cells [19]. On the other hand, overdesign may cost too much area. It is necessary to develop a metric to evaluate the decap effectiveness in terms of power supply noise management [20]. Starting from 90nm, a number of relatively new issues [21] must be addressed that impact the design and layout of decaps. This research addresses three important decap problems including frequency response [22][23], electrostatic discharge (ESD) protection [24], and gate tunneling leakage [6][25][26][27][28][29][30]. The frequency response controls the performance of decaps at increasingly higher operating frequencies. A potential ESD event across a thin gate oxide increases the likelihood that a chip will be permanently damaged due to a short circuit in the decap itself. Higher gate leakage significantly increases the total static power consumption of the chip.

In white-space areas, prior to the 90nm technology, the use of passive decaps was sufficient. At 90nm, the oxide thickness has been reduced to 2nm or less. Therefore, decaps have been redesigned into a cross-coupled form [21] to protect the device from potential electrostatic discharge (ESD) induced oxide breakdown [24]. However, the additional series resistance considerably reduces the transient response of the decap [23].
As a result, large IR-drop levels in localized regions (usually called “hot-spot” IR-drop violations) may unexpectedly be present in high-speed ASICs. These unresolved hot spots cause timing closure problems or result in functional failures in extreme cases. To remove them, designers must consider many options such as moving the logic blocks, adding or rearranging the power pins, and/or modifying power grid design. Near the tapeout deadline, such time-consuming design iterations may not always be feasible. In these situations, an active decap design can play an important role in reducing supply noise without major changes to the design [31]. That is, the active decap can be a drop-in replacement of the passive decap in an attempt to remove any remaining hot spots, thereby saving time and effort [32].

The concept of active decap can be extended with more switches and decaps to ideally achieve lower supply noise. These extended active decap designs need to be evaluated for advantages and limitations so that an optimal design can be made practical. In addition, a novel circuit called charge-borrowing decap that transfers charge from a clean supply node to a noisy supply node will be introduced to produce superior performance to the basic and the extended active decaps.

1.2 Research Objectives

Research Statement: To investigate new designs and proper placement of active and passive decoupling capacitors to efficiently manage on-chip power supply noise in white-space areas and standard cells for ASIC applications.

Specific Research Goals and Contributions:

•  Design passive decaps in standard-cell arrays that properly trade off gate leakage, ESD and transient response, and provide decap design metrics to determine the optimal layout to obtain a desired capacitance level over a target operating frequency. Develop empirical formulae with only a few parameters to capture the frequency responses for 90nm and 65nm CMOS technologies.
•  Design the basic active decap to provide better power-delay trade-off than prior approaches for white-space hot-spot removal. Identify and resolve limitations of the active decap in terms of suitability for ASIC applications in deep-submicron technologies. Explore the placement of active decaps to remove late-stage IR-drop violations. Validate the design of the active decap using a chip design that provides testing mechanisms to evaluate improvements in dynamic voltage drops.

•  Extend the concept of active decap by modifying its design for improved supply noise management. Achieve a better active decap design that provides a higher level of power supply noise reduction than the basic form of active decaps within a fixed area. Propose a novel circuit that outperforms both the basic and the extended active decaps with help from a clean supply rail.

1.3 Organization of the Dissertation

The remainder of this dissertation is organized as follows. Chapter 2 provides the necessary background on decap design basics and challenges, gate tunneling leakage through decaps, ESD reliability of thin-oxide gates, standard-cell layout and placement of decaps, and metrics for power supply noise management.

Chapter 3 develops a set of new passive decap designs based on the cross-coupled decap. The modeling of the new designs is described and design metrics are provided to allow hand calculations and analyses to be carried out. Based on the simulation results, the proper layout of these designs is described.

Chapter 4 proposes a modified active decap design for hot spot removal in ASIC applications. The design advantages and disadvantages are compared against prior work. Measurement results from a test chip are used to validate the design. After correlation with measurement results, further simulation is carried out to explore the efficiency of active decap placement.
Chapter 5 extends the concept of the active decap to achieve a better design that has a higher level of power supply noise reduction. Also presented in the chapter is a novel charge-borrowing decap circuit that outperforms the basic and the extended active decaps.

Chapter 6 summarizes the results of the dissertation and provides conclusions. Future research directions are provided.

Chapter 2 Background

2.1 Introduction

The topics in this chapter provide the necessary background for the rest of the dissertation. Some fundamental and practical decap design issues are also highlighted to motivate the topics in the remainder of the dissertation. This chapter begins with an overview of design challenges and problems associated with decoupling capacitors in 90nm and 65nm. The overview includes gate tunneling leakage, electrostatic discharge phenomenon and protection, and standard-cell decap placement. Gate leakage is introduced from a physical point of view, and useful information from recent technologies is given. ESD reliability is presented and typical phenomena during an ESD event are discussed. Primary and local ESD protection schemes are briefly illustrated. Since ASIC designs typically utilize standard cells, the decap insertion and placement procedure within standard-cell blocks is briefly introduced. A simple metric for supply noise management is proposed to assess the profiles showing power supply noise and to compare the designs providing different decap performance.

2.2 Decoupling Capacitor Basics and Design Challenges

A passive decap in the white spaces of a chip can be implemented using an NMOS transistor with the gate connected to VDD and both source and drain connected to VSS, as shown in Fig. 2.1. This approach is considered effective because the thin-oxide capacitance of the transistor gate provides a higher capacitance than any other oxide capacitance available in a standard CMOS fabrication process [21].
For design purposes, an approximation for hand calculating the capacitance of this MOS decap can be given by [7]:

$$C_{decap} \approx W \cdot L \cdot C_{ox} \qquad (2.1)$$

where W is the transistor width, L is the transistor length, and Cox is the oxide capacitance per unit area. A more accurate capacitance model needs to include the parasitic fringing and overlap capacitance of the transistor, and will be discussed in greater detail in Chapter 3. Passive white-space decaps can also be implemented with thick-oxide MOS devices, depending on the requirements on ESD and leakage, knowing that there is a capacitance density penalty.

Figure 2.1: Decoupling capacitor implemented using an NMOS device.

In the past, the analysis techniques and design metrics dealing with power supply voltage drop were overly simplistic [33][34][35]. Designers analyzed power supply noise with static voltage drop (SVD) analysis, which might not reflect the true nature of power supply fluctuations, leading to either unnecessary overdesign or risk of timing failures [20]. Although SVD analysis can provide useful feedback in terms of certain glaring errors in the power grid design, it does not take into account the impact of decaps and many other important factors. Dynamic voltage drop (DVD) analysis is emerging as a replacement of SVD analysis to capture the impact of decaps, package inductance, and simultaneous switching events. The drawback is that DVD analysis does not return a fixed value that can assess the degree of improvement. Currently, there is no signoff or analysis metric to characterize a DVD profile [36]. Therefore, a good metric for DVD analysis is desired to evaluate decap design and placement, for the purposes of this research.

At 90nm, the oxide thickness has been reduced to about 2nm or less. Oxide thickness reduction causes two problems: possible oxide breakdown during an ESD event and increased leakage current.
ESD is a transient process of static charge transfer that can typically arise from human contact with any IC pin [24]. Additional input resistance can be inserted in series with passive decaps to protect from ESD. However, this input resistance causes the decap to suffer from a degraded frequency response, resulting in poor performance in terms of managing power supply noise. Moreover, increased gate leakage should be considered. If decaps can be disconnected from the power rails when they are not needed (e.g., the logic circuit nearby is quiet), gate leakage reduction can be achieved. Therefore, overdesign of decaps should be avoided.

Standard-cell layouts of an IP block consist of rows of fixed-height cells in the ASIC design flow. Typically, decap filler cells have both NMOS and PMOS devices. From the 90nm node, a cross-coupled decap design has been proposed by cell library developers [21] to address the issue of ESD reliability, as shown in Fig. 2.2. The cross-coupled design provides additional ESD protection to the thin-oxide gate of the device [21]. Standard-cell decaps are generally implemented with thin-oxide MOS devices.

Figure 2.2: Cross-coupled decap schematic [21].

The major concern for white-space decaps at 90nm and 65nm is a reduced budget for power supply noise as the supply voltage decreases to about 1V [19]. In certain situations, the use of passive decaps only in white-space areas is not sufficient and hot spots at certain locations may appear at a late design stage. Active decaps in the white space may be used to remove hot spots. The goal is to investigate strengths, limitations, design issues and placement strategies for active decaps in ASIC applications.

Active decaps were originally intended for custom designs. Our goal is to optimize them for ASIC designs. Two active decap circuits using switched capacitances were proposed in the past to regulate the supply voltages [37][38][39].
The basic concept of the active decap is to switch a pair of passive decaps Cdecap either in parallel or in series to increase charge delivery capability, as shown in Fig. 2.3(a).

Figure 2.3: Active decap concept shown in (a) and two previous designs from (b) [37][38] and (c) [39]. (Panel (b): single-input single-output amplifiers; panel (c): opamp with chain of inverters.)

The two designs in [37] and [39] are effective for power supply noise reduction, but they also have certain limitations. The design in [37] can respond quickly to supply noise but dissipates large power, whereas [39] saves power but experiences long switching delays. Therefore, an improved design with a better power-delay trade-off is desired. The two previous designs [37][38][39] are illustrated in Fig. 2.3(b) and 2.3(c), respectively. Refs. [37][38][39] mitigate the effects of LC resonance, typically in the 20-400MHz band [40]. Recent work has been done to reduce LC resonance using a switched decap technique [41]. Researchers have also reported on a distributed active decap to greatly boost the effective decap value while reducing the area requirements [41]. In addition, the issue of directly replacing an area occupied by a passive decap with an active decap needs to be addressed to determine the degree of noise reduction that can be obtained and the associated tradeoffs of area and power.

2.3 Thin-Oxide Gate Tunneling Leakage

In 90nm and 65nm processes, a new design issue for decaps due to oxide thickness reduction is the thin-oxide gate tunneling current. The current is in the form of tunneling electrons or holes from substrate to gate or from gate to substrate through the gate oxide, depending on the voltage biasing conditions [26]. Two forms of gate tunneling exist: Fowler-Nordheim (FN) tunneling and direct tunneling.
For normal operations on short-channel devices, FN tunneling is negligible, and direct tunneling is dominant [26]. In the case of direct tunneling, the gate leakage current in PMOS is much less than in NMOS, and it has been shown experimentally that PMOS gate leakage is roughly three times smaller than NMOS gate leakage for same size transistors [27][43]. The gate leakage simulations can be carried out by using BSIM4 SPICE models [44][45]. Assuming a 90nm technology with 1.7nm oxide thickness and 1.0V power supply, the gate leakage current Ileak is shown in Fig. 2.4.

Figure 2.4: Gate leakage current versus gate area. (Plotted: NMOS and PMOS Ileak and Cdecap versus transistor area WL in µm².)

Figure 2.5: Gate leakage current density Jleak versus oxide thickness tox. (Plotted: NMOS and PMOS devices versus oxide thickness tox from 1.5nm to 2nm.)

Clearly, as indicated from simulation results, the gate leakage is proportional to the transistor area. That is,

$$I_{leak} = J_{leak} \cdot W L \qquad (2.2)$$

where Jleak is the gate leakage current density. From simulation, the decoupling capacitance values of the transistor, Cdecap, are also shown in the figure. As described earlier, Cdecap is equal to WLCox to the first order. Since Cox is fixed in this case, Cdecap is proportional to the decap area WL.

The gate leakage current density Jleak and the oxide thickness tox have an empirical relationship as follows, assuming the voltage across the oxide Vox is fixed [43]:

$$\log J_{leak} = K_1 - K_2 \, t_{ox} \qquad (2.3)$$

where K1 and K2 are non-negative experimental constants and are process dependent. Equation (2.3) implies that the gate leakage current is exponentially related to the oxide thickness. A typical Jleak and tox relationship for a fixed Vox at 1V is illustrated in Fig. 2.5.
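The relations in Equations (2.1) to (2.3) can be sketched numerically as follows. This is a minimal illustration and not the BSIM4-based simulation used in this work: the constants K1, K2 and Cox are hypothetical placeholders, the logarithm in Equation (2.3) is assumed to be base 10, and the units of Jleak are arbitrary.

```python
import math

def decap_capacitance(width_um, length_um, c_ox_ff_per_um2):
    """Eq. (2.1), first order: C_decap = W * L * C_ox (result in fF)."""
    return width_um * length_um * c_ox_ff_per_um2

def leakage_density(t_ox_nm, k1, k2):
    """Eq. (2.3), assuming a base-10 log: log10(J_leak) = K1 - K2 * t_ox."""
    return 10.0 ** (k1 - k2 * t_ox_nm)

def gate_leakage(width_um, length_um, j_leak):
    """Eq. (2.2): I_leak = J_leak * W * L."""
    return j_leak * width_um * length_um

# Hypothetical constants, for illustration only (not taken from the thesis).
C_OX = 20.0        # fF/um^2
K1, K2 = 8.0, 4.0  # K2 in 1/nm; J_leak in arbitrary units per um^2

j_17 = leakage_density(1.7, K1, K2)
for width in (1.0, 2.0, 4.0):
    length = 1.0
    # Capacitance and leakage both grow linearly with the gate area W * L.
    print(width * length,
          decap_capacitance(width, length, C_OX),
          gate_leakage(width, length, j_17))

# Thinning the oxide by 0.1 nm multiplies J_leak by 10 ** (K2 * 0.1).
thinning_penalty = leakage_density(1.6, K1, K2) / j_17
print(thinning_penalty)
```

The point of the sketch is the shape of the trade-off: at a fixed oxide thickness, any extra decap area buys capacitance and leakage in the same proportion, while a thinner oxide raises leakage exponentially.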
It is evident that at 90nm and 65nm technologies, the gate leakage from decaps will be significant [27]. The gate leakage contributes to the total static power consumption, and decaps usually occupy a large on-chip area. The use of PMOS devices exclusively is not a viable solution for high-frequency circuits since they have a poor frequency response relative to the NMOS devices for 90nm and 65nm.

In addition, the amount of gate leakage is also a strong function of the applied bias [28]. If the transistor has a voltage across the oxide, Vox, roughly equal to VDD, the leakage current density is largest. If the transistor has a Vox set close to or below the threshold voltage VT, it leaks significantly less. Indeed, under such a condition, the gate leakage current is typically 3-6 orders of magnitude less, depending on the values of VDD and tox [28]. Thus, the gate leakage in the second condition can be roughly considered to be zero. In decaps, the gate is at VDD and the source and drain of a transistor are tied together. Therefore, decaps would experience the highest levels of leakage, as a function of Vox.

The oxide capacitance Cox is a critical factor to many physical properties of MOS transistors since the drain current of a transistor is proportional to Cox. A larger Cox results in a larger drain current and hence a faster transition or a shorter gate delay. On the other hand, the subthreshold current is related to Cox: a smaller Cox (or a larger tox) corresponds to a higher threshold voltage VT and therefore smaller subthreshold current [26]. Each technology generation attempts to increase Cox by roughly 1.4x while reducing the channel length L to 0.7x of the previous technology's channel length [7]. The result is that the product CoxL has been maintained relatively constant as technology scales. The proper Cox selection balances the trade-off between the drain current and the subthreshold current in each technology node.
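The claim that the product CoxL stays roughly constant follows directly from the quoted scaling factors, since 1.4 x 0.7 = 0.98. A minimal sketch, using normalized starting values and a hypothetical helper name:

```python
def scale_node(cox, length, cox_factor=1.4, length_factor=0.7):
    """Apply one technology generation of the scaling quoted in the text."""
    return cox * cox_factor, length * length_factor


# Normalized starting point (arbitrary units, for illustration only).
cox, length = 1.0, 1.0
for _ in range(3):
    cox, length = scale_node(cox, length)

# After three generations the product has changed by only 0.98**3.
product = cox * length
```

Even after three generations the product differs from its starting value by about 6%, which is what "relatively constant" means here.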
From Equation (2.3), the gate leakage density is inversely related to tox. A smaller tox leads to exponentially increasing gate leakage. From a gate leakage perspective, the oxide thickness tox should be kept large. However, Cox is determined by:

$$C_{ox} = \frac{\varepsilon_{ox}}{t_{ox}} \tag{2.4}$$

where εox is the permittivity of the oxide and is fixed for a given oxide material. Equation (2.4) suggests that if εox is kept unchanged, the increase in Cox will lead to a decrease in tox and hence an exponential growth in gate leakage [46].

Knowing that the gate leakage increase may be excessive for 90nm and 65nm, in order to keep tox thick while increasing Cox, one can adjust the dielectric constant, k, where εox = kε0, and ε0 is the vacuum permittivity. If a high permittivity (high-k) dielectric can be used instead of the normal SiO2 oxide, the physical oxide thickness tox would no longer be limited by its electrical property Cox. This concept of using high-k dielectrics was presented in [47], and researchers and process engineers have continued to pursue better high-k materials [48]. One of the main goals of using high-k gate dielectrics is to keep the gate leakage under control [48]. Commonly suggested high-k materials include HfO2, HfSiON, ZrO2, and Al2O3, whose permittivity ranges from 10 to 30 [48][49], compared to 3.97 of SiO2. Commercial microprocessors fabricated at a 45nm technology have been developed using high-k materials, and it has been reported that a 10X gate leakage reduction was achieved [50]. Starting from 45nm, most fabrication facilities are anticipated to shift to high-k technology to reduce gate leakage [51]. However, for 90nm and 65nm, the concerns on gate leakage are still significant [27][51].

2.4 Electrostatic Discharge Reliability in Decap Design

ESD protection due to the thin oxide has become an important concern starting from the 90nm technology node.
ESD is the process of static discharge that can typically arise from human contact with any IC pin. Approximately 0.6µC of charge is carried on a body capacitance of 100pF, generating a potential of 2kV or higher to discharge from the contacted IC pin to ground for a duration of more than 100ns [24]. Under such an event, the peak discharge current is in the ampere range, leading to permanent damage on certain transistors in the chip if not properly protected. The damage can be in one of two forms, or a combination of the two. The first is thermal burnout in devices or interconnects, while the other is oxide breakdown of devices due to the high voltage across the oxide [24]. When running simulations for an ESD event, the maximum current density J of devices and interconnects is measured to check for potential thermal damage. The oxide voltage also needs to be measured to compare with the oxide breakdown voltage of a device for a given fabrication process. The oxide breakdown voltage is almost linearly proportional to the oxide thickness [24]. For instance, assuming a 90nm process uses 1.7nm of oxide thickness, the corresponding oxide breakdown voltage is just below 5V. If the thickness is doubled, the oxide breakdown voltage is also doubled to around 10V [24].

An ESD event can be delivered between any two pins of an IC. To properly protect an IC from ESD damage, an ESD circuit must shunt ESD current between these two pins [24]. In the case of decaps within standard cells, the only two pins that the decaps have access to are the two local power rails, namely VDD and VSS. Primary and local (sometimes called secondary) protection elements are needed to protect the two rails by limiting the voltage difference between the two rails to a value below the oxide breakdown voltage.
The primary element will shunt most of the ESD current, whereas the local element serves to limit the voltage or current at the local circuit until the primary element is fully operational [24]. A primary element can be a thick oxide transistor, a silicon-controlled rectifier, an open-gate, grounded-gate or coupled-gate NMOS transistor, or a large diode [24]. A local protection element can be simply a diode formed by a grounded-gate NMOS transistor [24].

A typical ESD protection scheme is illustrated in Fig. 2.6. In addition to the primary and local elements, a resistor R1 is required to limit the maximum current flow to the decap and to limit the voltage seen from the gate of the decap. For better ESD protection, this resistance is normally large and can be in the forms of polysilicon, diffusion, n-well, or channel resistance [24]. The resistance is generally not implemented together with primary and local protection devices. Rather, it is usually inserted within standard cells where ESD damage is a concern.

Figure 2.6: Complete ESD protection scheme.

Previous decap designs (typically before 90nm technology) did not consider ESD performance for two reasons. First, the transistor's oxide thickness was large and the oxide breakdown voltage was high enough that the transistor was likely to survive during an ESD event with adequate protection circuits. Second, insertion of the large resistance dramatically reduces the transient response of the decap. However, starting from 90nm, the gate oxide is so thin that the designer cannot ignore the increased ESD risk. A large resistance is therefore recommended to be placed inside the decap cells to protect the circuit from potential ESD damage. Hence, this tradeoff between ESD reliability and transient response becomes one of the major decap design challenges in 90nm and 65nm technologies.

The ESD simulation requires an ESD generation model.
Among all the existing models, the human body model (HBM) was adopted for simplicity. Following the standard MIL-STD-883x method 3015.7 [24], a human body can be simulated as a series combination of a 1.5kΩ resistance RHBM and a 100pF capacitance CHBM. The capacitor CHBM is initially charged to 2kV, which needs to be discharged through some primary elements. The primary element is arbitrarily chosen to be an ESD diode plus a gate-coupled NMOS device (GCNMOS) with an n-well resistor Rnwell (15kΩ) and an NMOS bootstrap capacitor Cb. Two identical primary elements are used to protect the circuit placed in between the HBM generation and the elements, as shown in Fig. 2.7. For simplicity, no secondary element is used.

Figure 2.7: Simulation setup for ESD analysis [24]. (Blocks: HBM generation, initially charged to 2kV; primary element; duplicate primary element.)

Since the primary elements are designed to handle large current flow, the maximum current density, Jmax, is assumed to be within the safe range and is not measured. HBM generation raises the voltage level at node VDD, and hence turns on the primary elements to discharge. For device protection from oxide breakdown, the voltage across the oxide Vox of each decap transistor needs to be observed in simulation. The Vox voltages should be kept as low as possible, given that the oxide breakdown voltage for a typical 90nm process is below 5V.

2.5 Standard-Cell Decap Layout and Placement

In the white spaces around the chip, decaps are usually made of NMOS devices, as described earlier. However, within standard cells, it is more convenient to make decaps using both NMOS and PMOS transistors to form a decap filler cell. This is because the n-well is already implemented and usually reserved for PMOS devices. Only about a half-cell area is for NMOS devices. One sample standard-cell decap layout is illustrated in Fig. 2.8.
In the figure, the NMOS decap occupies roughly the bottom half of the cell area, whereas the PMOS decap is located in the n-well. The capacitor areas are the polysilicon gates placed on top of the channel regions of the MOS transistors. For standard cells, the height of the cell is always fixed, and the designers can only adjust the cell width. Once the cell width is determined, the size of the decap and the capacitance of the decap are established. Fig. 2.8(a) illustrates a large decap cell (measured in cell width) with long channel transistors. A fingering technique is commonly used to have a smaller effective channel length to improve the decap frequency response. Fig. 2.8(b) depicts the same decap cell but with two fingers.

Figure 2.8: Sample layout of standard-cell N+P decap (a) with one finger and (b) with two fingers.

During the placement procedure, computer-aided design (CAD) tools place standard cells into rows. Because the height of each cell is always the same, when cells are placed adjacent to each other, the n-well region and the VDD and VSS lines are automatically aligned. The cells for placement are obtained from the standard-cell library, where all the cells are predefined in width and driving strength. Since the total width of the row is fixed and the individual cell widths are fixed, some empty spaces (typically small) between the cells are left after placing cells. Those empty spaces are good candidates for the placement of decap cells [9]. In fact, a set of decap cells with different cell widths is often included in the standard-cell library.

Decap insertion is considered as a part of the complete design flow. In a typical ASIC design flow, once the standard-cell blocks are synthesized, placed and routed by CAD tools, the decap cells are placed into the empty spaces.
Generally, since the spaces are filled using a library of decap cells with various sizes, the decap placement is done without affecting the placement of other logic cells. After placement and routing, chip-level timing is analyzed and timing violations are fixed by re-placement and/or rerouting. Then, chip-level IR-drop analysis is carried out by a CAD tool (e.g., Apache Redhawk) such that the hot spots of severe voltage-drop areas are identified [52]. If the voltage drop at the hot spots exceeds the noise budget, more decaps will be inserted into the violation regions and a modification of the placement of other logic cells may have to be done. The logic cell movement requires additional timing and routability analysis before moving on to the next step. Then, the chip's IR drop is analyzed again for the remaining hot spots. These steps in the design flow are iterated until all the hot spots are eliminated and all the logic circuits pass timing analysis. Typically, it may take one or two (occasionally even more) iterations to eliminate all the hot spots [7][9]. In addition, the potential problem of electromigration is also checked alongside the IR-drop analysis [7].

This commonly used decap placement approach is not optimal simply because the empty cells may not be located near the high IR-drop regions. After the hot spots are first identified, the remaining empty spaces near the hot spots may not be large enough. Hence, the logic cells may have to be shifted, resulting in a need for additional timing analysis. In order to improve the placement efficiency, researchers suggest a few approaches including: global decap placement between standard-cell blocks [53], decap placement using activity [1], standard-cell decap placement not affecting relative placement of logic cells [9], and earlier-stage decap placement decisions [24]. Since decaps experience excessive gate leakage, decap placement methods considering leakage current are proposed in [6] and [25].
2.6 Metric for Power Supply Noise Management

Static Voltage Drop (SVD) analysis and verification has been an essential part of the overall physical design and verification flow in the semiconductor industry for over ten years [20]. In this approach, the IR drops across the chip are computed by averaging the current drawn by the transistors and blocks from the power grid. The computed fixed values can be fed to timing verifiers to assess the impact on delay. In the past, SVD analysis provided useful feedback in terms of major errors in the power grid design. Going forward, at 90nm and smaller technologies, SVD verification is not enough to ensure power integrity. SVD does not take into account the contribution of power density, variations in the switching activity profile and the impact of inductance and decoupling capacitors (including LC resonance effects) [2]. Therefore, SVD is not an adequate approach to analyze and optimize power delivery networks in SoC designs.

Recently, industry began to use Dynamic Voltage Drop (DVD) analysis as a way to capture the impact of decaps, inductance, and spatial and temporal switching events in the design [20]. DVD is emerging as a replacement for SVD to capture the impact of power supply noise on the timing behavior of logic and memory cells. In order to evaluate the design of active or passive decoupling capacitors on a DVD profile and to quantify its impact on a design's timing performance, two quantities can be used as a metric: DVDavg and DVDmax, as shown in Fig. 2.9. DVDavg is the DVD profile's average value in the timing cycle, whereas DVDmax is the DVD profile's peak value in the same timing cycle. A design is considered better if it has smaller DVDavg and DVDmax. Users should add design margins to these metrics to account for the metric's simplifications, or they should perform the final signoff process with actual DVD profiles.
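As a rough illustration of how the two quantities could be computed from a simulated supply waveform, the sketch below assumes that the supply voltage is sampled uniformly over one timing cycle and that the drop is measured relative to the nominal VDD. The function name and the sample values are hypothetical and do not come from any measurement in this work.

```python
def dvd_metrics(vdd_nominal, supply_samples):
    """Return (DVDavg, DVDmax) for uniformly spaced supply samples in one cycle."""
    drops = [vdd_nominal - v for v in supply_samples]
    dvd_avg = sum(drops) / len(drops)  # average drop over the cycle
    dvd_max = max(drops)               # peak drop in the same cycle
    return dvd_avg, dvd_max


# Made-up waveform for a 1.0V supply; an overshoot counts as a negative drop.
samples = [1.00, 0.95, 0.85, 0.90, 1.00, 1.02]
avg, peak = dvd_metrics(1.0, samples)
print(f"DVDavg = {avg:.3f} V, DVDmax = {peak:.3f} V")
```

Two decap designs can then be compared by running the same switching activity through each and checking which one yields the smaller pair of values.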
Figure 2.9: DVDavg and DVDmax: metric used to evaluate DVD profiles.

Ref. [20] validates the use of this metric. It can be established that the impact of the voltage drop profile on the timing performance of a digital path is equivalent to applying a fixed supply voltage of VDD - DVDavg to the same path. To show this, a logic path was first simulated in the presence of a true dynamic voltage profile on the power supply, and then a DC voltage equal to the average of the voltage profile was used on the power supply [20]. The results show that the timing behaviour of the two cases matches. The intuitive reason for this relationship can be illustrated in Fig. 2.9. Considering the delay of the critical path of the circuit between the two flops, gate delay is reduced when the supply voltage overshoots (VDD(t) > VDD nominal) and increased when the supply voltage undershoots (VDD(t) < VDD nominal). When the supply voltage fluctuates, gates that see a supply voltage higher than VDD - DVDavg accelerate and gates that see a supply voltage lower than VDD - DVDavg decelerate compared to the situation where all gates see a supply voltage equal to VDD - DVDavg. Therefore, DVDavg is a good measure of the average effect of the IR drop on delay. The use of DVDavg as a metric for IR-drop analysis has been shown to be valid on several industrial designs [20].

Applying the metric, any approach (e.g., decap insertion) that reduces the average dynamic voltage drop (DVDavg) or raises the average supply voltage (VDD - DVDavg) can be considered as a valid solution for reducing power supply noise to improve timing performance, although the instantaneous supply voltage drop may not be affected.

The DVDmax value represents the worst-case voltage drop, with a safety margin, that causes a failure in logic circuits and memory cells.
That is, if the voltage drop exceeds DVDmax by the margin of safety, the behaviour of standard cells or memory cells will be unpredictable. The value of DVDmax depends on the tolerance of individual IP blocks to power supply noise.

Ref. [20] suggests that DVDavg should not be bigger than 10% of VDD-VSS and DVDmax should not be bigger than 20% of VDD-VSS. These percentage values are commonly used in the industry [20]. These limits are considered pessimistic enough to account for the simplified nature of the metric.

By analyzing different DVD profiles using the metric of DVDavg and DVDmax [20], an important conclusion can be made: the Ldi/dt contribution to voltage drop is not as critical as the IR contribution. Ldi/dt only affects DVDmax and therefore has minimal impact on DVDavg as long as the transfer of charge is completed within the cycle. Therefore, Ldi/dt voltage drop may not significantly affect the timing performance of the chip, as long as it does not create supply fluctuations that exceed DVDmax. The above observation is consistent with [2].

2.7 Summary

This chapter summarized a number of decap design issues including gate tunneling leakage, ESD protection, standard-cell layout and placement requirements, and the lack of useful metrics for evaluating the results from DVD analysis. The decap design challenges for 90nm and 65nm were described. A simple metric, DVDavg and DVDmax, was proposed to interpret the DVD results from CAD tools. The metric is best used to compare and evaluate different decap designs.

Chapter 3

Passive Decoupling Capacitor Design

3.1 Introduction

In an ASIC design flow, after placement and routing, empty spaces naturally exist within standard cells. Passive decoupling capacitors, as filler cells, are usually used to fill these empty spaces to reduce IR drop problems locally. This chapter addresses the design and layout of passive decaps [9][23][24] for standard cells at the 90nm technology node.
As described in the previous sections, the IR drop has been increasing over time due to increased resistances in the power grid as the metal widths continue to shrink with each successive technology generation. The inductive Ldi/dt effects are also increasing due to the high current demands of ASIC designs in deep submicron technologies [7][10][11]. The increased supply noise level challenges the design and layout of passive decaps.

A number of relatively new issues for standard-cell decaps must be addressed that impact the design and layout of these cells at scaled technology nodes. Two important problems of decap frequency response and electrostatic discharge (ESD) protection [24] will be addressed. Since decaps are required to perform at increasingly higher operating frequencies, the frequency response [10][12][54] of passive decaps will be investigated first to propose improvements to optimize decap layouts. Next, the problems of reduced oxide thickness of a transistor, namely, ESD [24] and thin-oxide gate leakage [6][11], will be explored in the context of decap design. A potential ESD event across a thin gate oxide increases the likelihood that a chip will be permanently damaged due to a short circuit in the decap itself. Higher gate leakage increases the total static power consumption of the chip.

A cross-coupled standard-cell design was proposed [21] to address the issue of ESD performance. The design provides sufficient ESD protection, but does not offer any savings in gate leakage and it may compromise the frequency response. This chapter aims to suggest improved layouts of the cross-coupled design that properly trade off frequency response and ESD performance, while greatly reducing gate leakage current.

3.2 High-Frequency Response of Decoupling Capacitors

Standard-cell layouts of an IP block consist of rows of fixed-height cells in the ASIC design flow.
After cell placement is completed, there are a number of empty cells that can be filled with decaps of various sizes depending on the space available. Previous work has addressed the automatic placement and sizing of decap cells [9]. The focus in this chapter is on optimal layout of each decap filler cell. Typically, these standard cells have both NMOS and PMOS devices as shown in Fig. 3.1(a), with a corresponding layout in Fig. 3.1(b). Thin-oxide MOS devices are generally used for standard-cell decap implementation.

Figure 3.1: (a) A standard cell decap can be implemented as an NMOS in parallel with a PMOS device. The corresponding layout is shown in (b).

As the frequency of operation increases, a fingering approach is required to implement the layout. That is, a single transistor is split into a number of parallel transistors with the same width, but smaller channel lengths. The overhead of this approach is additional spacing for source/drain contacts and an overall reduction in the low-frequency capacitance. However, the average capacitance of the decap over a given frequency range improves as the number of fingers increases. Therefore, the problem of how many fingers to use given a fixed area of a filler cell and fixed gate-oxide thickness needs to be addressed. The objective here is to develop a useful metric to capture the frequency response characteristics in order to choose the optimal number of fingers.

To derive the needed equations, an NMOS decoupling capacitor is depicted first in Fig. 3.2. Non-idealities associated with MOS devices are modeled as a lumped-RC circuit [12] where both the effective resistance, Reff, and effective capacitance, Ceff, are functions of frequency, f, as shown in Fig. 3.2.

Figure 3.2: A decap can be implemented as an NMOS device and modeled as a lumped-RC circuit with effective resistance and effective capacitance as functions of frequency, f.
The DC capacitance Ceff,0 and resistance Reff,0 are given by [12][24][55] (where the subscript 0 indicates zero frequency at DC):

$$C_{eff,0} = C_{ox} W L + 2 C_{OL} W \tag{3.1}$$

$$R_{eff,0} = \frac{L}{12\, \mu\, C_{ox} W (V_{GS} - V_T)} \tag{3.2}$$

where Cox is the oxide capacitance per unit area, COL is the overlap and fringing capacitances per unit width of the device, µ is the channel mobility, VGS is the voltage across the oxide, and VT is the threshold voltage.

Assuming that a given filler cell has a horizontal dimension X and vertical dimension Y, the channel length of each device in a fingered layout is:

$$L_m = \frac{X - (m-1)\, x_{contact}}{m} \tag{3.3}$$

where m is the number of fingers and xcontact is the distance between fingers required by contact spacing rules. Modified expressions for Ceff,0 and Reff,0 can be derived as a function of the number of fingers. Thus, the effective capacitance at DC is given by:

$$C_{eff,0(m)} = m\,(C_{ox} Y L_m + 2 C_{OL} Y) \tag{3.4}$$

For capacitance, each additional finger adds extra overlap and fringing capacitances but loses area due to the contact spacing. Therefore, the capacitance actually decreases linearly as the number of fingers increases. The corresponding equation for Reff,0 with m fingers in a parallel combination is:

$$R_{eff,0(m)} = \frac{1}{m} \cdot \frac{L_m}{12\, \mu\, C_{ox} W (V_{GS} - V_T)} \approx \frac{R_{eff,0}}{m^2} \tag{3.5}$$

In previous work [12], the resistance was used to select the channel length and there were no area constraints involved. However, since the resistance drops off as 1/m², it is not as important in the selection of a suitable m. In fact, the goal of an optimal layout should be to provide the highest capacitance value in the given area over a desired operating frequency range, 0 to f0, while delivering a low resistance. A simple metric is needed to evaluate layouts with differing numbers of fingers.
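The finger-dependent DC expressions in Equations (3.3) to (3.5) can be tabulated with a few lines of code. The sketch below is illustrative only: X and Y follow the 2µm x 9µm example cell used in this chapter, the finger width is taken as W = Y, and the values of Cox, COL, µ, VGS - VT and xcontact are placeholders rather than 90nm process data. All function names are hypothetical.

```python
def finger_length(x, m, x_contact):
    """Eq. (3.3): channel length of each of m fingers in a cell of width x."""
    return (x - (m - 1) * x_contact) / m


def ceff0(m, x, y, cox, col, x_contact):
    """Eq. (3.4): DC capacitance of m fingers, each of width y."""
    return m * (cox * y * finger_length(x, m, x_contact) + 2.0 * col * y)


def reff0(m, x, y, cox, mu, vov, x_contact):
    """Eq. (3.5): DC resistance of m fingers in parallel (vov = VGS - VT)."""
    per_finger = finger_length(x, m, x_contact) / (12.0 * mu * cox * y * vov)
    return per_finger / m


# X and Y follow the example cell; every other value is a placeholder.
X, Y = 9e-6, 2e-6            # cell dimensions in m
COX, COL = 10.5e-3, 0.3e-9   # F/m^2 and F/m (placeholders)
MU, VOV = 0.03, 0.7          # m^2/(V s) and V (placeholders)
X_CONTACT = 0.3e-6           # contact spacing in m (placeholder)

for m in range(1, 6):
    c = ceff0(m, X, Y, COX, COL, X_CONTACT)
    r = reff0(m, X, Y, COX, MU, VOV, X_CONTACT)
    print(f"m={m}: Ceff,0={c * 1e15:.1f} fF, Reff,0={r:.1f} ohm")
```

With these placeholder values the DC capacitance falls slowly and linearly with m, while the DC resistance falls slightly faster than 1/m² because each finger is shorter than L/m.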
The easiest choice for a metric is to use the average capacitance over this frequency range up to f0, as follows:

$$C_{avg(m)} = \frac{C_{eff,0(m)} + C_{eff(m)}(f_0)}{2} \tag{3.6}$$

where Ceff,0(m) is obtained from Equation (3.4) and Ceff(m)(f0) is the effective capacitance with m fingers at frequency f0. A weighted average is also feasible, but it was observed that the simple average works well in practice.

The main issue with the metric is that Ceff(m)(f0) is difficult to compute without the aid of HSPICE or an equivalent simulation tool. To facilitate the process, simple frequency-dependent models for both Ceff and Reff are developed. Also, the characteristics of both functions need to be accurate as technology scales. First, a number of AC simulations were performed in HSPICE for a 90nm CMOS technology using non-quasi-static (NQS) models, which are essential when simulating decaps in the gigahertz frequency range of operation. Two parameters, ACNQSMOD and TRNQSMOD, were set to "1" in BSIM4 [55]. The circuit in Fig. 3.3 was used to extract the effective resistance and capacitance from HSPICE results as follows [52]:

$$R_{eff}(f) = \frac{\mathrm{Re}(I_{RC})}{\mathrm{Mag}^2(I_{RC})} \tag{3.7}$$

$$C_{eff}(f) = \frac{\mathrm{Mag}^2(I_{RC})}{2\pi f\, \mathrm{Im}(I_{RC})} \tag{3.8}$$

where Re(IRC), Im(IRC), and Mag(IRC) are the real, imaginary, and magnitude components of IRC, respectively. It is assumed in Equations (3.7) and (3.8) that the applied AC voltage Vac is 1∠0° V, so that the series-RC impedance is simply 1/IRC.

Figure 3.3: The circuit setup to extract the effective resistance and the effective capacitance values from an ac analysis.

Both NMOS and PMOS decaps were simulated with W x L sizes as follows: 15µm x 5µm, 10µm x 10µm, and 5µm x 15µm. The simulation frequency ranged from 0 to 10GHz. Typical ASIC clock rates today are in the range of 500MHz to 1GHz, but it is important to study frequency response well beyond the clock frequency.
Most of the spectral power density of digital signals lies within frequencies of up to fknee = 1/(2trise), where trise is a signal's rise time (which can be on the order of 50ps or less), and fknee is the 3-dB cutoff frequency of the spectral power density [56]. It was assumed conservatively that trise = 50ps, and the analysis was carried out up to 10GHz.

Figure 3.4: Plots of Ceff and Reff for three device sizes (W x L): 10x10µm, 15x5µm, and 5x15µm. (Panels: large-length Ceff NMOS, Reff NMOS, Ceff PMOS and Reff PMOS versus frequency, 0 to 10GHz.)

The results of the simulations are shown in Fig. 3.4. As f increases, there is a noticeable roll-off in the curves due to finite transit time effects. Devices with large L's have a more pronounced effect. In fact, the Ceff curve for 5µm x 15µm quickly decays in value relative to 15µm x 5µm. A general observation on NMOS and PMOS decaps can also be made: NMOS is superior to PMOS in its high-frequency behavior since it has a larger Ceff and a smaller Reff at high frequencies, assuming the area is fixed. Although standard cells employ both NMOS and PMOS devices for decaps, these results show that NMOS decaps would provide better frequency response characteristics.

For modeling purposes, based on the frequency responses of Ceff and Reff, it is suggested [55] that the functions can be postulated into the form:

$$C_{eff}(f) = \frac{C_{eff,0}}{1 + (f/f_T)^{2}\,\tau} \tag{3.9}$$

$$R_{eff}(f) = \frac{R_{eff,0}}{1 + (f/f_T)^{2}\,\tau} \tag{3.10}$$

where τ = 1/12 [55] and the transition frequency fT = µ(VGS - VT)/(2πL²). Shown in Fig.
3.5 are the results of curve-fitting using Equations (3.9) and (3.10) against HSPICE for the NMOS device. The results are very close. A factor of 1/2 was applied to fT in order to produce the results shown in Fig. 3.5. That is, the equation for fT must be adjusted by a fitting factor of 0.5 in order to obtain good results. Similar results were obtained for PMOS devices. This demonstrates that the first-order equations for Ceff(f) and Reff(f) are reasonably accurate and, perhaps more importantly, that Ceff(m)(f0) can be easily computed for the metric without running HSPICE.

Figure 3.5: Plots of Ceff and Reff for three NMOS devices (HSPICE versus model).

At this point, all the necessary information is obtained to determine the number of fingers based on the frequency response. From Equation (3.9), the effective capacitance in Equation (3.6) at f0 with m fingers is:

$$C_{eff(m)}(f_0) = \frac{C_{eff,0(m)}}{1 + (f_0/f_{T(m)})^{2}\,\tau} \tag{3.11}$$

where fT(m) is the transition frequency of a finger with channel length Lm.

To demonstrate the efficacy of the metric, it was applied to the layout of a standard-cell decap in an available area of 2µm x 9µm. Using Equation (3.6), Table 3.1 lists the Cavg(m) metric values for the NMOS or PMOS devices for different frequency ranges. The optimal number of fingers corresponds to the largest entry in each row. For example, if the frequency range of interest is 0 to 10GHz, then 3 NMOS fingers and 4 PMOS fingers are optimal relative to the metric. Of course, if the range is 0 to 2GHz, two fingers are sufficient for both N and P devices.
Note that PMOS devices typically require one more finger than NMOS devices at higher frequencies of operation.

Table 3.1: Optimal number of fingers for different frequency ranges (Cavg metric versus number of fingers).

Frequency range 0 - f0 | Device | m=1   | m=2   | m=3   | m=4   | m=5
0 - 2GHz               | N      | 180fF | 187fF | 182fF | 177fF | 171fF
0 - 2GHz               | P      | 150fF | 183fF | 182fF | 177fF | 171fF
0 - 5GHz               | N      | 150fF | 184fF | 182fF | 177fF | 171fF
0 - 5GHz               | P      | 110fF | 165fF | 178fF | 176fF | 171fF
0 - 10GHz              | N      | 120fF | 173fF | 180fF | 176fF | 171fF
0 - 10GHz              | P      | 100fF | 135fF | 166fF | 172fF | 169fF

Table 3.1 was shown to illustrate the use of the metric in determining the optimal number of fingers. In practice, the design process would be as follows. First, the area of a filler cell (in particular, the X dimension of the cell) and frequency range of operation are used as input parameters. Then, the capacitance value as a function of m is computed using Equation (3.6). Finally, the value of m producing the highest capacitance is used to implement the layout.

The results in Table 3.1 can be validated by using Equations (3.4) and (3.9) to generate Ceff(m) plots for both NMOS and PMOS devices, as shown in Fig. 3.6. The results in the plot were verified with HSPICE to ensure consistency. As an example, consider the cases with 1 finger and f0 = 10GHz. For the NMOS case in Fig. 3.6, Cavg(1)N = (Ceff,0 + Ceff(1)(f0))/2 = (190fF + 50fF)/2 = 120fF, whereas for PMOS, Cavg(1)P = (Ceff,0 + Ceff(1)(f0))/2 = (190fF + 10fF)/2 = 100fF. These are the same values that are found in the 0-10GHz rows of the table with m=1. The rest of the table is produced in the same manner for different values of m and frequency range, 0 - f0.

By inspection, the plots indicate that 3 fingers would be optimal for NMOS decaps and 4 fingers would be optimal for PMOS decaps, based on the flatness of the lines and the initial value of the capacitance. This conclusion is consistent with Table 3.1.
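The worked example and the selection step can be reproduced in a few lines. The sketch below uses only numbers quoted in the text and in the 0-10GHz rows of Table 3.1; the function names are hypothetical and chosen for illustration.

```python
def c_avg(ceff_dc, ceff_f0):
    """Eq. (3.6): simple average of the DC and f0 capacitances."""
    return (ceff_dc + ceff_f0) / 2.0


def best_fingers(cavg_by_m):
    """Return the finger count m with the largest Cavg."""
    return max(cavg_by_m, key=cavg_by_m.get)


# Worked example from the text: one finger, f0 = 10GHz (values in fF).
nmos_1 = c_avg(190.0, 50.0)
pmos_1 = c_avg(190.0, 10.0)

# 0-10GHz rows of Table 3.1 (fF), keyed by number of fingers.
nmos_row = {1: 120, 2: 173, 3: 180, 4: 176, 5: 171}
pmos_row = {1: 100, 2: 135, 3: 166, 4: 172, 5: 169}

print(nmos_1, pmos_1, best_fingers(nmos_row), best_fingers(pmos_row))
# prints: 120.0 100.0 3 4
```

The same two functions apply to any other frequency range once the corresponding Ceff(m)(f0) values are available.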
However, by using the metric, designers can quickly obtain the optimal number of fingers for a target operating frequency, without the need for such plots or SPICE simulations.

[Figure 3.6 plots: "Calculated Ceff NMOS" and "Calculated Ceff PMOS", effective capacitance versus frequency (0 to 10 GHz) for different numbers of fingers.]

Figure 3.6: The effective capacitance, Ceff(f), of NMOS and PMOS decaps in 90nm for different numbers of fingers in a fixed area of Y=2μm and X=9μm.

[Figure 3.7 layouts: panels (a) and (b).]

Figure 3.7: Standard-cell decap layouts for a 10GHz operating range: (a) NMOS and PMOS devices, and (b) NMOS only.

Fig. 3.7 illustrates how standard-cell layouts would be implemented using the above results, assuming a 10GHz operating range. These layouts would be automatically created by a decap filler cell generator. Two possible layouts are shown: (a) uses both N and P devices and (b) is NMOS only. Fig. 3.7(a) uses 3 fingers for the NMOS device and 4 fingers for the PMOS device. From an average capacitance perspective, the NMOS-only layout style of Fig. 3.7(b) is better, and this is also reflected in Table 3.1. To implement this type of layout in a standard cell, the p-well region must be extended to cover the entire area, which is not typical of standard-cell design. This approach can be used as long as the design rules at the boundaries of adjacent standard cells are satisfied.

3.3 Cross-Coupled Decoupling Capacitor Designs

At the 90nm technology node, there is the possibility of oxide breakdown during an ESD event. A simple ESD protection scheme for decaps is to insert a relatively large resistance in series to limit the maximum voltage seen at the gate of the decap [24].
A minimum Reff,0 is needed to ensure ESD reliability for decap cells. A cross-coupled decap design has been proposed by cell library developers [21] to address the issue of ESD reliability. As shown previously in Fig. 2.2, the drain of the PMOS device is connected to the gate of the NMOS, and vice-versa [21]. The cross-coupled design adds series resistance to the inherent decap resistance to increase Reff,0.

The frequency response characteristics of this new configuration can be evaluated to determine if the results obtained in the last section can be applied directly to the new circuit. A standard 3N-4P decap of Fig. 3.7(a) is first compared to a same-area cross-coupled decap using the HSPICE AC analysis of Fig. 3.3, and the results are shown in Fig. 3.8.

The standard 3N-4P decap has a very low resistance (around 30Ω), which makes it prone to ESD failure. The cross-coupled 3N-4P design has a much higher DC Reff,0 (around 3500Ω) but a poorer frequency response for Ceff. Consequently, the tradeoff between ESD reliability and frequency response must be considered in the design process and decap layout. To improve the frequency response, additional fingers must be used. The target resistance Reff,0_target for ESD protection in our case is a minimum of 500Ω. The number of fingers was increased to reduce the resistance from 3500Ω down to that required by ESD. According to Equation (3.5), the scale factor on the 3N-4P design can be found as follows:

$$R_{eff,0\_target} = \frac{3500\,\Omega}{m^2} = 500\,\Omega \quad \therefore \; m = \sqrt{3500/500} \approx 2.6 \qquad (3.12)$$

Scaling the 3N-4P design by approximately this amount, a cross-coupled 8N-9P decap was produced. Similarly, for an ESD target Reff,0_target = 1000Ω, it was found that m=1.9, so a cross-coupled 6N-7P decap was chosen. The plots for 6N-7P and 8N-9P fingers are also illustrated in Fig. 3.8. The results show that the 8N-9P cross-coupled version is the best configuration to address both frequency response and ESD protection.
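The scaling step in Equation (3.12) is easy to reproduce. The short Python sketch below assumes, as the text does, that the DC resistance falls as 1/m^2 when the finger count is scaled by m; the function name `finger_scale` is illustrative only.

```python
import math


def finger_scale(r_dc_ohm, r_target_ohm):
    """Scale factor m such that r_dc / m**2 equals the ESD target resistance."""
    return math.sqrt(r_dc_ohm / r_target_ohm)


if __name__ == "__main__":
    r_dc = 3500.0  # DC Reff,0 of the cross-coupled 3N-4P decap (ohms)
    for target in (500.0, 1000.0):
        m = finger_scale(r_dc, target)
        print(f"target {target:.0f} ohm -> m = {m:.2f}")
```

The resulting factors are then rounded to a realizable finger count, which is how the 8N-9P and 6N-7P configurations were selected in the text.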
[Figure 3.8 plots: Ceff and Reff versus frequency (0 to 10 GHz) for the standard 3N-4P decap and the cross-coupled 3N-4P, 6N-7P and 8N-9P decaps.]

Figure 3.8: Ceff(f) and Reff(f) comparison of fixed-area standard decap and cross-coupled decap: same MOS device sizes but different poly connections.

From a layout perspective, the cross-coupled decaps can be realized by simply rerouting the poly connections of the standard decaps, while keeping the MOS devices the same. The layouts of two cases, 3N-4P and 8N-9P, are shown in Fig. 3.9.

[Figure 3.9 layouts: panels (a) and (b).]

Figure 3.9: Sample layouts of cross-coupled decap cells for (a) 3N-4P (b) 8N-9P.

It is important to address one other issue of thin-oxide devices: the gate leakage current of the decap, which contributes to the chip's total static power. Using HSPICE, the standard and cross-coupled decap circuits were found to have almost identical gate leakage. That is, since the cell area is fixed and only the poly terminal connections are swapped, the cross-coupled design provides no inherent savings in gate leakage as compared to the standard design. There exists a simple design approach to save gate leakage. Simulations using BSIM4 SPICE models [44] indicated that PMOS gate leakage is roughly 3 times smaller than NMOS gate leakage for same-size transistors [27][43]. Therefore, PMOS devices are preferred from a leakage perspective.
Since PMOS devices have a poor frequency response, more fingers can be used to obtain the desired result. But this must be carried out in the context of the cross-coupled design to preserve ESD protection.

The basic idea to control leakage is to have the smallest possible NMOS device cross-coupled with the largest possible multi-fingered PMOS device. This way, the advantages of PMOS leakage and cross-coupling ESD protection are preserved. The layouts of two configurations are illustrated in Fig. 3.10. A small NMOS device is used in both cases. Note that the n-well regions have been expanded in both layouts to accommodate the larger PMOS device. Fig. 3.10(a) uses 9 PMOS fingers while Fig. 3.10(b) has a total of 16 fingers. The same cell area as before (2μm x 9μm) was used for the two designs.

[Figure 3.10 layouts: cross-coupled cells with a small NMOS device and (a) 9 PMOS fingers, (b) 16 fingers.]

Table 3.2: Comparison of the passive decap designs and their gate leakage current.

Decap cell layout       Description                                              Gate leakage
Std. 3N-4P              Std. decap with 3 fingers for N and 4 fingers for P      262.4 nA
Cross-coupled 3N-4P     Cross coupled with 3 fingers for N and 4 fingers for P   260.8 nA
Cross-coupled 8N-9P     Cross coupled with 8 fingers for N and 9 fingers for P   206.8 nA
Modified 9 fingers      Cross coupled with smallest N and 9 fingers for P        119.1 nA
Modified 16 fingers     Cross coupled with smallest N and 16 fingers for P       99.7 nA

Table 3.2 summarizes the leakage values for the different cases. The standard and cross-coupled 3N-4P decaps have roughly the same leakage. It is somewhat reduced for the 8N-9P case since there is less area for leakage. However, for the two layouts with the small NMOS devices, the leakage is cut in half. In fact, for the 1N-16P case, the leakage is 62% less than that of the standard 3N-4P decap.

The Reff,0_target of the cross-coupled design must be set based on ESD considerations, but that also controls the maximum number of fingers permitted, mmax.
Since the NMOS device is fixed while the PMOS device is multi-fingered, the following equation can be used to determine the resistance:

$$R_{eff,0\_target} = R_N \parallel \frac{R_P}{m_{max}^2} \qquad (3.13)$$

where RN and RP are the resistances of the decaps without fingers, and "||" means "in parallel with." This target Reff,0 sets up the equation for the maximum number of fingers:

$$m_{max} = \sqrt{\left(\frac{1}{R_{eff,0\_target}} - \frac{1}{R_N}\right) R_P} \qquad (3.14)$$

As described in the previous section, the optimal m depends on the frequency response (i.e., Cavg(m)), but the number of fingers selected should not exceed mmax to satisfy ESD requirements. That is, m ≤ mmax.

[Figure 3.11 plots: Ceff and Reff versus frequency (0 to 10 GHz) for the standard 3N-4P decap, the cross-coupled 3N-4P and 8N-9P decaps, and the modified cross-coupled 1N-9P and 1N-16P decaps.]

Figure 3.11: Frequency response of various cross-coupled designs.

Fig. 3.11 illustrates the frequency response of the various designs from 0-10GHz. All of the configurations provide similar Ceff,0 values but are dramatically different in their frequency response characteristics. The standard 3N-4P case is the best, followed by the modified cross-coupled 1N-16P. The Reff,0 values are different in all cases but only the standard 3N-4P case is unsuitable for ESD protection. However, it is desirable to select the configuration with the lowest Reff,0 that satisfies the ESD criteria (500Ω in this case) for a rapid time-domain response. Overall, the cross-coupled 1N-16P layout is recommended because it provides the required Reff,0 for ESD reliability and saves at least 50-60% on gate leakage.

To summarize, for 90nm and 65nm, standard-cell passive decap design should follow the layout strategy shown in Fig. 3.10.
By using the smallest NMOS device and the largest multi-fingered PMOS device in the cross-coupled form, the decap has the lowest leakage and is able to satisfy the ESD requirements.

3.4 Summary

This chapter investigated the tradeoffs between high-frequency performance of decaps and ESD protection and its impact on the layout of standard-cell passive decaps. A design metric was introduced to determine the optimal number of fingers to use in the standard-cell layout to obtain a desired capacitance level over a target operating frequency. Models were developed to capture the frequency responses of Reff and Ceff for a given technology with only a few parameters. As a result, the models can be used to predict the same characteristics of future technologies.

For ESD protection, a cross-coupled design was proposed by cell library developers to provide a large series resistance, but it suffers from reduced frequency response and provides no savings in gate leakage. This chapter demonstrated that more fingers are needed with the cross-coupled standard-cell layouts to provide the target resistance value for ESD protection. The design of the target resistance can follow the formulae provided in this chapter. The layout with the smallest NMOS device and a multi-fingered PMOS device delivers acceptable frequency response and ESD reliability, while providing the lowest leakage.

Chapter 4

Active Decoupling Capacitor Design

4.1 Introduction

Passive decaps described previously have a small layout and are useful within the block of standard cells. However, for large global decaps (i.e., outside the block), other approaches can be used. This chapter addresses a novel application of the active decoupling capacitor. The objective is to investigate the effectiveness of removing local IR-drop violations (usually called "hot spots") by replacing passive decaps with active decaps.
Starting from 90nm, large power supply noise levels in localized regions may unexpectedly be present in high-speed ASICs [19]. These unresolved hot spots cause timing closure problems or result in functional failures in extreme cases. These hot spots are often detected late in the design cycle so they become problematic and difficult to remove.

To remove them, designers must consider many options such as moving the logic blocks, adding or rearranging the power pins, and/or modifying the power grid design. Near the tapeout deadline, such time-consuming design iterations may not always be feasible. In these situations, an active decap design can play an important role in reducing supply noise without major changes to the design [31]. That is, the active decap can be a drop-in replacement of the passive decap in an attempt to remove any remaining hot spots, thereby saving time and effort. In this chapter, to explore the effectiveness of an active decap, quantitative data will be provided on the expected improvements, sizing considerations and placement of an active decap relative to the hot spot.

Two active decap circuits using switched capacitances were proposed in the past to regulate the supply voltages [37][38][39]. By increasing charge delivery capability, the two designs in [37] and [39] are quite effective at reducing supply noise, but they also have certain limitations. The design in [37] can switch quickly but dissipates large power, whereas [39] saves power but experiences long switching delays. They both mitigate the effects of LC resonance [40], which is typically in the 20-400MHz band. Recent work has been done to reduce LC resonance using an improved switched decap technique [41]. Further work has also been reported on a distributed active decap to greatly boost the effective decap value while reducing the area requirements [41].
In this chapter, a modified active decap design that has lower power than [37] and a better response time than [39] is proposed, targeting ASIC applications up to 1GHz. The issue of directly replacing an area occupied by a passive decap with an active decap is addressed to determine the degree of noise reduction that can be obtained and the associated tradeoffs of area and power.

4.2 Active Decoupling Capacitor Analysis and Design

4.2.1 Active Decap Concept and Design Considerations

The basic idea of an active decap is to switch a pair of passive decaps, Cdecap, from parallel to series to provide a local boost in the supply voltage [37][38][39]. As illustrated in Fig. 4.1(a), the decaps are initially in a parallel configuration with a full charge developed across both capacitors. In this standby state, the equivalent capacitance is 2Cdecap. When placed in a series stack, as in Fig. 4.1(b), the boosted voltage is ideally 2VDD while the equivalent capacitance is reduced to Cdecap/2. When switched back in parallel, the voltage returns to the original value of VDD. In this case, the stacking level n is 2.

[Figure 4.1 schematics: (a) decaps in parallel, (b) decaps in series, (c) circuit implementation.]

Figure 4.1: Active decap concept and its MOS implementation.

The active decap circuit is depicted in Fig. 4.1(c), with Cdecap and the switches implemented using NMOS and PMOS transistors [37][38][39]. When the capacitors are in parallel, both Mn1 and Mp1 are on while Mn2 and Mp2 are off (i.e., subthreshold). When the capacitors are in series, both Mn1 and Mp1 are off while Mn2 and Mp2 are on. The switches exhibit finite "on" resistances, Ron, and there is also thin-oxide gate leakage through the decaps, Ileak, especially in 90nm and 65nm CMOS technologies. Both of these effects reduce the performance of the active decap, as described below.
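As a quick illustration of the ideal switching behaviour described above, the Python sketch below computes the equivalent capacitance and the ideal terminal voltage for n identical decaps in the standby and stacked states. It ignores switch resistance and leakage entirely, and the function names are hypothetical.

```python
def parallel_state(c_decap, vdd, n=2):
    """Standby state: n decaps in parallel, each charged to VDD."""
    return n * c_decap, vdd


def series_state(c_decap, vdd, n=2):
    """Boost state: the same n decaps stacked in series (ideal, lossless)."""
    return c_decap / n, n * vdd


if __name__ == "__main__":
    c_decap, vdd = 1.0, 1.0  # normalized capacitance and supply voltage
    print("parallel (C_eq, V):", parallel_state(c_decap, vdd))
    print("series   (C_eq, V):", series_state(c_decap, vdd))
```

For n = 2 this reproduces the 2Cdecap and Cdecap/2 equivalent capacitances and the ideal 2VDD boost quoted in the text.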
For the general case of stacking n parallel decaps into a series chain, the maximum improvement can be characterized in terms of a gain, G [37]. If k is the voltage regulation tolerance, where kVDD is the permissible drop in voltage, then the charge delivered by n parallel capacitors is:

$$Q_{parallel} = kV_{DD} \cdot nC_{decap} \qquad (4.1)$$

When the capacitors are stacked in series, the charge delivered for the same voltage drop is:

$$Q_{series} = [nV_{DD} - (1-k)V_{DD}] \cdot \frac{C_{decap}}{n} \qquad (4.2)$$

The overall charge gain is:

$$G = \frac{Q_{series}}{Q_{parallel}} = \frac{[nV_{DD} - (1-k)V_{DD}] \cdot C_{decap}/n}{kV_{DD} \cdot nC_{decap}} \qquad (4.3)$$

Therefore, as given in [37], the gain is controlled by n and k:

$$G = \frac{n - 1 + k}{k n^2} \qquad (4.4)$$

There exists a value of k such that the regular decap outperforms the active decap. For example, setting G=1 and n=2, it is found that k=1/3. For values of k > 1/3, the active decap is of no value. However, if k is below this value, the active decap is able to deliver more charge. For example, if k=0.1 and n=2, then G=2.75. This implies that 2.75 times more charge can be delivered by the active decap before its output voltage drops to the same level as the passive decap.

Previous research [37][38][39] provides no information on practical limitations when using Equation (4.4). For design purposes, this level of improvement is not possible due to the switch resistances and leakage currents. In fact, the boosted voltage cannot reach nVDD but instead reaches a lower voltage of bVDD. Therefore, the gain equation should be rewritten as:

$$G = \frac{[bV_{DD} - (1-k)V_{DD}] \cdot C_{decap}/n}{kV_{DD} \cdot nC_{decap}} \qquad (4.5)$$

where

$$b = n \cdot f(R_{on}) \cdot g(C_{decap}) \qquad (4.6)$$

The reduction factors, f(Ron) and g(Cdecap), depend on the switch resistance, Ron, and the leakage current which, in turn, is proportional to Cdecap. Using circuit simulation, normalized plots of f(Ron) and g(Cdecap) are provided in Fig. 4.2. The switch resistance has a more pronounced effect on b as compared to the leakage current. For example, with Ron = 10Ω and Cdecap = 700pF, it can be obtained that f=0.9 and g=0.95 from Fig.
4.2. If the two effects are combined, then b = 2(0.9)(0.95) ≈ 1.7 instead of 2. With k=0.1, the achievable gain is now reduced to G ≈ 2.0. The actual final voltage value, aVDD, when the active decap supplies the same charge as the passive decap is determined by setting G=1 and solving for a in the following equation:

$$G = \frac{[bV_{DD} - aV_{DD}] \cdot C_{decap}/n}{kV_{DD} \cdot nC_{decap}} = 1 \qquad (4.7)$$

In this case, with b=1.7, k=0.1 and n=2, one can obtain a=1.3, which implies that the active decap will be boosted initially to 1.7V (instead of 2V) and then falls back to 1.3V due to the charge demand of a nearby logic circuit. In the passive case, the initial voltage of 1V would be reduced to 0.9V, so the active decap is still superior even with the nonidealities included.

[Figure 4.2 plots: (a) reduction factor versus "on" resistance of the switches, Ron (0 to 1000Ω); (b) reduction factor versus decap value, Cdecap.]

Figure 4.2: The reductive factors f and g for the boosted voltage as a function of (a) "on" resistances of the switches, Ron, and (b) leakage due to the size of decap Cdecap.

To design the sizes of the MOS switches, a number of issues must be considered. From the above analysis, a small resistance value is preferable to increase the voltage boosting capability of the active decap, and to improve transient response times. The "on" resistances also provide ESD protection because they are in series with the decaps. Any large voltage fluctuations are absorbed by the resistors to reduce the drop across the thin-oxide gates of the decaps, similar to the effect of cross-coupling decaps [22]. Therefore, this resistance must be large enough to safely protect the thin-oxide gates. Considering the factors of boosted voltage level, decap performance, and ESD reliability, the "on" resistances should be designed to be in the range of 10-20Ω by proper selection of transistor widths. This will require rather large switches.
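The numbers quoted in this section can be checked with a short Python sketch of Equations (4.4), (4.5) and (4.7). It is an illustration under the same idealized charge model used above, with the reduction factors f and g simply entered as constants read from Fig. 4.2; the function names are not from the original work.

```python
def ideal_gain(n, k):
    """Equation (4.4): charge gain of an ideal n-level stack for tolerance k."""
    return (n - 1 + k) / (k * n**2)


def boosted_level(n, f, g):
    """Equation (4.6): boosted voltage b (in units of VDD) with losses f and g."""
    return n * f * g


def practical_gain(b, n, k):
    """Equation (4.5): charge gain when the stack only reaches b*VDD."""
    return (b - (1 - k)) / (k * n**2)


def final_level(b, n, k):
    """Equation (4.7) with G = 1: (b - a) / (k * n**2) = 1, solved for a."""
    return b - k * n**2


if __name__ == "__main__":
    n, k = 2, 0.1
    print("ideal gain        :", ideal_gain(n, k))
    print("break-even gain   :", ideal_gain(n, 1 / 3))
    print("boosted level b   :", boosted_level(n, 0.9, 0.95))
    print("practical gain    :", practical_gain(1.7, n, k))
    print("final level a     :", final_level(1.7, n, k))
```

With a 1V supply, the last two values correspond to the gain of about 2.0 and the fall-back voltage of about 1.3V discussed in the text.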
Once their sizes are determined, the buffers generating the switching signals must supply enough current to drive the large capacitances, resulting in large sensing and switching circuitry that consumes a considerable amount of power and area. Therefore, these active decaps should be used sparingly in ASIC designs but are particularly suitable for localized hot-spot removal.

4.2.2 Overall Active Decap Architecture

[Figure 4.3 block diagram: reference voltage generator, high-pass filters, comparators and switched decaps connected to the global supply rails.]

Figure 4.3: Active decap architecture.

Fig. 4.3 illustrates the complete active decap design containing four blocks: a reference voltage generator, a pair of high-pass filters, two comparators, and the switched decaps. Compared to the previous work [37][38][39], the key difference in this new approach is the use of latch-based comparators. The user logic circuit block shown in the figure is considered to be the main cause of power supply noise violation. The switch control circuit for the active decap is realized using two comparators. The differential inputs of each comparator decide the standby voltage level at the outputs of the comparators. In the standby mode, the top comparator has an output at VDD, whereas the bottom comparator is set to Vss. When the power grid discharges, VDD will drop and Vss will rise. The voltage variations are passed through the high-pass filters to the comparator inputs, causing the comparators to reverse their output values and switching the decaps from parallel to series. Later, when the power grid charges up, the comparator inputs and outputs switch back to their original values. The use of latch-based comparators with hysteresis to switch the decaps is one of the main contributions of this work. An enable signal is provided for testing purposes to allow the active decap circuitry to be turned on or off. When off, the design behaves purely as a passive decap.
This allows for a comparison between active and passive decaps.

The trigger voltage for the circuit is set by the comparators and the resistor R1. In Fig. 4.3, the reference voltages are generated by a simple voltage divider and are set to roughly VDD/2. However, depending on the comparator design, the absolute input levels of VDD/2 are somewhat flexible due to the differential nature of the inputs. The diode-connected transistors in the reference generator should have large length and small width to control the static current. Inserting a small resistor, R1, between the two transistors is intended to separate the reference voltages by approximately 30mV. Then, if the comparators are designed to switch when the voltage difference at the inputs is 10-15mV (plus an additional 5mV of hysteresis), the overall design will trigger at approximately 50mV. If R1 is chosen to be smaller, the sensitivity of the active decaps is improved [41], at a cost of significantly increased dynamic power because the active decap is triggered more often. If R1 is designed to be larger, the resulting supply noise k will increase, as shown in Fig. 4.4. When the two input signals of the comparators are separated by a level that exceeds the supply noise generated by the nearby logic, the comparators will not switch, making it a passive decap. In the plot, the active decap stops switching when the input voltages differ by approximately 130-150mV.

[Figure 4.4 plot: supply noise k (0.08 to 0.12) versus comparator input voltage difference (0 to 200 mV).]

Figure 4.4: Difference in the input voltages of the comparators as a function of supply noise k.

The targeted supply noise k value is 0.09-0.1 (i.e., the kVDD value is 90-100mV), resulting in a triggering input voltage between 20mV and 70mV. Thus, the comparator can be designed with a 20mV switching threshold, and the input voltage difference can then be adjusted by varying R1 to the midpoint of about 50mV.
From another perspective, since the maximum IR drop is allowed to be 100mV, the active decap should trigger at about half of 100mV, which is 50mV, because there is a delay between the time that the active decap is switched and the time that it actually boosts the supply voltage. Of course, a 20mV input triggering voltage would be better in that sense, but the active decap would switch too often, raising concerns about large power consumption. Designing the triggering voltage at 50mV seems to be a good tradeoff between power and performance. Therefore, the comparators are required to switch once the voltage (VDD-VSS) discharges by about 50mV, which can be considered as the input sensitivity of the active decap.

The delay between detection and activation of the switched decap, td, and the delay difference between the two outputs of the comparators, Δtd, impact the bandwidth of the active decap and the boosted voltage, respectively. Specifically, the delay of the comparators, td, is inversely proportional to the bandwidth of the active decap, BW. That is,

$$BW \propto \frac{1}{t_d} \qquad (4.8)$$

If the operating frequency is below the bandwidth, the active decap reduces the supply noise relative to a same-area passive decap. On the other hand, if the clock frequency is beyond the bandwidth, the active decap may result in equal or more supply noise than the passive decap. If it takes an entire clock period to switch the decaps, then they behave just like the passive case because they will not switch during the whole clock cycle. Before they actually switch, the supply voltage goes back to the right level and they are forced not to switch any more. The same situation happens on the next cycle. Therefore, as the switching frequency of the logic increases, the active decap is less and less effective. Then, at 1/td, it looks like a passive decap.
Beyond that point, due to larger static power consumption and varying "on" resistance of the switches, the active decap becomes worse than the passive decap. The above observation will be validated in the next section.

Ideally, the delay of the top and the bottom comparators should be the same. In practice, a difference in delay, Δtd, between the top and the bottom comparators exists and can be defined as:

$$\Delta t_d = t_{d,top} - t_{d,bottom} \qquad (4.9)$$

The delay difference will not result in a short connection between power and ground or an open circuit where no decoupling capacitance is present. The effect, however, is that the boosted voltage b is degraded by leakage current when the switches are not turned on/off at the same time. A function, h(Δtd), can be used to capture this effect, as shown in Fig. 4.5. Therefore, Equation (4.6) should be rewritten as:

$$b = n \cdot f(R_{on}) \cdot g(C_{decap}) \cdot h(\Delta t_d) \qquad (4.10)$$

It is desired to keep the delay difference of the two comparators small even under process/voltage/temperature (PVT) variations to ensure sufficient improvement of the boosted voltage.

[Figure 4.5 plot: reduction factor h (0.75 to 1.05) versus delay difference Δtd (-500 to 500 ps).]

Figure 4.5: The reductive factor h for the boosted voltage as a function of delay difference Δtd.

4.2.3 Design Specifications

When designing the active decap, a certain value for Cdecap is needed to keep the supply noise small. However, in the case of drop-in replacement, this design freedom is not available. Therefore, as the first step, designers should check through simulation whether replacing the passive decap with an active decap can remove the nearby hot spot. The next step is to select a proper "on" resistance of the switches, Ron. For a high boosted voltage b, Ron should be kept low enough. On the other hand, Ron must also be high enough for ESD protection. Designers should make this tradeoff for Ron selection to design the switches.
The size of the switches and the passive decaps determines the load capacitance on the comparators.

The comparator design should generally satisfy the low power and high speed requirements of ASIC applications. For example, a static power of 5mW or below can be achieved while the bandwidth of the active decap should be able to handle a 1GHz clock. This sets the average comparator delay to approximately 1ns. Note that the delay requirement should be fulfilled under PVT variations. Therefore, the worst-case delay should be used here. The output of the comparators in the standby state should be close to VDD or Vss to reduce leakage current through the passive decaps and switches to save power. The comparator should be designed to provide high gain in the switching region such that the output will swing from low to high when the input varies by 10-15mV. Higher gain can ease the requirement on input DC biasing and lower the static power consumption. From a stability perspective, a certain amount of hysteresis is desired to reduce the risk of oscillation. A practical value of 5mV for hysteresis is reasonable. Overall, the design specifications can be summarized in Table 4.1.

Table 4.1: Design specifications of the active decap.

Specification                  Value
Worst-case switching delay     < 1 ns
Bandwidth                      > 1 GHz
Static power                   < 5 mW

Specific design values are as follows. In the reference voltage generation circuit, the size of the transistors is chosen to provide a branch current of 5-6μA. R1 can then be implemented with a value of 5kΩ to produce a separation voltage of approximately 30mV at the comparator inputs. For the RC-based high-pass filters, a cut-off frequency above 10MHz is used to filter out low-frequency supply noise to save power. However, if the cut-off frequency is set too high, it may cause oscillation at the supply rails. A cut-off frequency of 16MHz was finally selected.
The two resistors have the value of R2 = R3 = 10kΩ, while the capacitors (C2 and C3) are implemented with the same value of 1pF. These RC values are somewhat flexible, unless the cut-off frequency is set exceedingly high. For instance, it was observed that for a cut-off frequency of 1.6GHz or above, oscillation on the supply rail will occur. Therefore, it is a good approach to have the cut-off frequency designed to be a few orders of magnitude smaller than the oscillation frequency.

4.2.4 Latch-Based Comparator Design

There are a wide variety of ways to design a comparator [57][58][59][60]. Also, the Appendix of this dissertation provides the fundamentals of comparator designs. In this specific application, the two comparators must be able to sense voltage variations that exceed the pre-specified sensitivity level (i.e., 10-15mV in this case). When the decaps are in parallel, the subthreshold leakage from the switches consumes considerable power due to the large sizes of the switch transistors. To reduce leakage current, the outputs of the comparators should be as close as possible to either VDD or Vss. The supply noise budget for a 1V power supply is normally less than 50-100mV and the output is full swing, indicating the need for high gain in the switching region.

With the above considerations, a latch-based comparator was selected for this application, as shown in Fig. 4.6. The exact transistor sizes are listed in Table 4.2. The branch currents of the comparators are as follows: Ib1 = 826μA, I7 = 377μA, I8 = 135μA, Ib2 = 900μA, I17 = 339μA, and I18 = 136μA.

[Figure 4.6 schematics: (a) n-type input comparator (M1-M10, Mb1, Vbias1, Ib1) and (b) p-type input comparator (M11-M20, Mb2, Vbias2, Ib2), each with inputs Vin+ and Vin-, output Vout, compensation network Rz and Cc, and load CL.]

Figure 4.6: Complementary comparator design: (a) n-type input for the top comparator, and (b) p-type input for the bottom comparator.
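As a numerical cross-check of the values chosen in Section 4.2.3, the Python sketch below evaluates the first-order RC cut-off frequency, 1/(2πRC), for the 10kΩ and 1pF filter components, and the voltage developed across R1 by the reference branch current. The first-order RC expression is an assumption about the filter topology, and the function names are illustrative.

```python
import math


def rc_cutoff_hz(r_ohm, c_farad):
    """Cut-off frequency of a first-order RC high-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


def reference_separation_v(branch_current_a, r1_ohm):
    """Voltage across R1 in the reference generator (Ohm's law)."""
    return branch_current_a * r1_ohm


if __name__ == "__main__":
    fc = rc_cutoff_hz(10e3, 1e-12)
    print(f"filter cut-off: {fc / 1e6:.1f} MHz")
    for i_branch in (5e-6, 6e-6):
        dv = reference_separation_v(i_branch, 5e3)
        print(f"I = {i_branch * 1e6:.0f} uA -> separation = {dv * 1e3:.0f} mV")
```

The computed cut-off of roughly 16MHz and the 25-30mV separation are consistent with the design values stated in the text.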
Table 4.2: Transistor sizes of the comparators.

Transistors        Width/Length       Transistors        Width/Length
Mb1 (NMOS)         75μm/0.1μm         Mb2 (PMOS)         150μm/0.1μm
M1/M2 (NMOS)       37.5μm/0.1μm       M11/M12 (PMOS)     75μm/0.1μm
M3/M4 (PMOS)       37.5μm/0.1μm       M13/M14 (NMOS)     18.75μm/0.1μm
M5/M6 (PMOS)       48μm/0.1μm         M15/M16 (NMOS)     24μm/0.1μm
M7/M8 (NMOS)       37.5μm/0.1μm       M17/M18 (PMOS)     75μm/0.1μm
M9/M10 (PMOS)      75μm/0.1μm         M19/M20 (NMOS)     18.75μm/0.1μm

This two-stage architecture satisfies the need for high gain and full swing, but must be designed to avoid any potential stability or oscillation problems. For a latch-based first stage, introducing a certain amount of hysteresis will prevent the comparator from switching back to the standby state in the presence of small variations around the switching region. For the n-type input comparator shown in Fig. 4.6(a), the hysteresis voltage, Vhys, is given by [57]:

[Equation (4.11): Vhys expressed in terms of Ib1, μnCox, (W/L)M1 and λ]

where Ib1 is the bias current. In Equation (4.11), λ is the size ratio [(W/L)M5 / (W/L)M3] and λ > 1 for a latch. Once the slew rate and the bias current are determined, both Ib1 and (W/L)M1 are fixed, leaving λ as the only parameter for Vhys. In this case, a λ value of 1.28 was chosen, producing a hysteresis voltage of around 5mV.

The second stage converts the differential signals into a single-ended output and provides the requisite level shifting. The second stage is also used as an output buffer to drive the large switches, where the desired slew rate can be achieved by adjusting the bias currents and transistor sizes. Complementary designs are used for the top and the bottom comparators to have roughly equal switching delays. The bias voltages for the comparators are generated by simple current mirrors. PVT variations on the comparator and the bias generation can cause delay differences in the comparator outputs.
This delay difference acts to further reduce the boosted voltage, as illustrated earlier in Fig. 4.5. During the design stage, great care was taken to ensure that the delay differences are within 100ps under all PVT variation simulations, which results in an additional 5% loss in the boosted voltage (i.e., 1.6V rather than 1.7V).

The dominant poles of this two-stage comparator were identified for stability compensation since there is a feedback path through the supply rails back to the comparator inputs. Therefore, the output resistance and the load capacitance of the comparator need to be carefully designed to properly position the dominant pole. In this case, a Miller compensation capacitance Cc is added to shift the dominant pole to a low frequency to improve stability. Also, a nulling resistor Rz is present to cancel the right-half-plane zero [59].

The simulated large-signal DC characteristics of the n-type input comparator are illustrated in Fig. 4.7(a), where the curves with hysteresis are shown. Here, the switching region of the comparator is in the range of ±10mV. A size ratio of 1.28 in Equation (4.11) was selected to produce about 5mV of hysteresis. The peak DC gain is around 48dB. The AC curve for the comparator is shown in Fig. 4.7(b), where the phase margin (PM) at unity gain is indicated as 39°.
Figure 4.7: (a) DC and (b) AC (compensated) characteristic curves for the
two-stage latch-based comparator design (n-type input shown).

The active decap must be able to boost the supply voltage within one clock cycle such that the average supply noise per clock cycle is reduced, since this factor controls the path delay of the logic blocks [20][61]. In this case, our design goal was set to a maximum clock speed of about 1GHz, which makes it suitable for today's high-end ASICs, and even medium-speed custom designs. When the supply voltage drops to 0.9V (implying 100mV of noise, i.e., k=0.1), the average switching delay for a full output swing was designed to be 0.5ns, which should allow proper operation up to 2GHz. The boosted voltage, based on prior considerations, should be in the range of 1.6V. The charge demand of the logic circuit itself will cause an additional voltage drop of k·VDD·n² = 0.1 × 1 × 2² = 0.4V, resulting in an expected final voltage of 1.2V. In addition, the current drive of the comparators will act to reduce the supply voltage further, but hopefully keep the value above 0.9V, which is the noise budget.

The active decap was simulated and compared to prior architectures that were also redesigned in 90nm CMOS to quantify the improvement and design tradeoffs. The circuit proposed in [39] was first implemented, where it uses opamps in place of comparators, followed by a chain of inverters to drive the switches. The inverters were optimally sized according to logical effort [39]. However, its minimum delay was about 0.9ns (1.2ns for the slow process corner), which is almost unsuitable for typical ASIC speeds, although its power dissipation was only 0.8mW. In the second design in [37][38], the sensing circuitry is formed by a pseudo-cascode amplifier delivering high speed at the cost of high power. The original design [37] was implemented in a 0.15μm process.
The design was adapted to the 90nm process and it was found that, by replacing their comparator design with the latch-based version, the static power consumption of the switching circuitry is reduced from 13mW to 2.8mW, an improvement factor of almost 5X, while the delay only increases slightly from 0.4ns to 0.5ns. The comparison of the three designs is provided in Table 4.3, where the design parameters that fail to satisfy the specifications are shown in italic. Note that the new design features hysteresis while the other two do not. The small-signal ac characteristics of the three designs are shown in Fig. 4.8. For the circuit in [39], the chain of inverters was removed for small-signal analysis.

Table 4.3: Simulated switching circuit design specification comparison.

Parameter                    Specification    [37][38]    [39]       This work
Process                      1V-core 90nm STM
Switching delay (typical)    < 1 ns           0.4 ns      0.9 ns     0.5 ns
Switching delay (slow)       < 1 ns           0.5 ns      1.2 ns     0.75 ns
Bandwidth                    > 1 GHz          2 GHz       0.8 GHz    1.5 GHz
Static power*                < 5 mW           13 mW       0.8 mW     2.8 mW
Hysteresis voltage                            0           0          4.2 mV

* Switching circuitry only

Figure 4.8: AC characteristic curves for the three designs.

4.3 Chip Design and Experimental Results

4.3.1 Test Chip Setup

A test chip was fabricated in a standard 90nm 1V-core CMOS process with seven metal layers to validate the results and to quantify the degree of improvement as operation frequency increases when an active decap is used as a drop-in replacement for a passive decap. The test chip setup is shown in Fig.
4.9, where an active decap, a passive decap, and some user logic are implemented. The layout of the active decap is shown in Fig. 4.10. The switching circuitry is located in the center, with the two parallel decaps on either side. The decap on the left is PMOS, and the one on the right is NMOS. The total layout area for the active decap is 600μm × 142μm = 0.085mm², in which the two decaps on either side combine for an area of 0.077mm². The switching circuitry, including switch transistors (Mn1, Mn2, Mp1 and Mp2), accounts for only 10% of the area and this does not greatly affect the final voltage drop, as shown below.

Figure 4.9: Test chip setup.

Figure 4.10: Layout of active decap showing the relative size of the components.

The area overhead consumed by the sensing and switching circuitry should be considered as an additional penalty for the active decap performance. Defining the percentage area overhead as x, the charge provided by the active decap is:

Q_{series} = [b V_{DD} - (1 - k) V_{DD}] (1 - x) C_{decap} / n    (4.12)

Thus, Equation (4.7) should be re-written as:

a V_{DD} = b V_{DD} - \frac{k V_{DD} \, n C_{decap}}{(1 - x) C_{decap} / n}    (4.13)

Assuming k=0.1, n=2, and b=1.5, the final voltage a can be plotted as a function of the area overhead x, as shown in Fig. 4.11. From the figure, it is clear that the area overhead should be limited to within 30% to achieve a reasonable final voltage. If the area overhead is above 43%, using the active decap brings no benefit. In our case, x=10% so that the penalty is only 50mV.

Figure 4.11: Final voltage a as a function of sensing and switching circuitry area overhead x.
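The dependence of the final voltage on area overhead can be tabulated with a few lines of Python. This is a hedged sketch rather than the author's own calculation: it assumes Equation (4.13) reduces to the normalized form a = b - k·n²/(1 - x), a reading that reproduces the "from eqn." entries of Table 4.6 when b=1.6 and x=0.1 are used. The function name and the sweep values are illustrative.

```python
def final_voltage(b: float, k: float, n: int, x: float) -> float:
    """Normalized final voltage a = b - k*n^2/(1 - x) (assumed form of Eq. 4.13)."""
    if not 0.0 <= x < 1.0:
        raise ValueError("area overhead x must be in [0, 1)")
    return b - k * n**2 / (1.0 - x)

# Parameter values quoted in the text for Fig. 4.11
B, K, N = 1.5, 0.1, 2

# Sweep the area overhead and print the resulting final voltage
for x in (0.0, 0.1, 0.2, 0.3, 0.4):
    print(f"x = {x:4.0%}  a = {final_voltage(B, K, N, x):.3f}")

# Penalty of a 10% overhead relative to an overhead-free active decap
penalty_mv = 1e3 * (final_voltage(B, K, N, 0.0) - final_voltage(B, K, N, 0.1))
print(f"penalty at x = 10%: {penalty_mv:.0f} mV")
```

With these inputs the 10% overhead costs a little under 50mV, in line with the figure quoted in the text. Calling `final_voltage(1.6, 0.122, 2, 0.1)` gives about 1.058, which matches the typical-corner value listed in Table 4.6.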
The switch sizes were chosen to have a suitable parasitic series resistance to provide ESD protection, sufficient transient response [12] and good damping for potential LC resonance [13]. In our case, the two parallel decaps are formed using thin-oxide transistors to improve area efficiency, since ESD is not a major concern due to switch resistances inherent in the circuit. The decoupling capacitance values in the standby mode are 0.34nF each, resulting in a total of 0.68nF in parallel.

The extra passive decap of Fig. 4.9 was used to represent fixed decap that is always present in the neighborhood of the active decap. It cannot be shut off. It also employs a series PMOS device to protect it from ESD risks. Both active and passive decaps are placed about 600μm away from the user logic. Ref. [41] uses a linear feedback shift register (LFSR) as the user logic to generate power supply noise because the resulting noise pattern is somewhat randomized. In our design, a large buffer with a large capacitive load was simply used to create supply noise, with the input controlled by an external signal. This way, the switching frequency can be controlled and modified directly from the input. The size of the decaps was chosen to be only a few times larger than the capacitive load to create a ~100mV voltage drop for the experiments. The three resistors (R1, R2 and R3 shown in Fig. 4.3) were implemented using p+ poly resistances. The two capacitors C2 and C3 in Fig. 4.3 were implemented using MOS transistors to minimize area overhead. A test chip microphotograph is illustrated in Fig. 4.12. The total test chip area is 1.2 × 0.8 mm².

Figure 4.12: Annotated test chip microphotograph.

To measure the on-chip supply noise, a packaged die was not used because it was intended to observe internal voltages near the logic block. Thus, the supply variations were measured directly with probes.
Power supply noise comes from both IR drop and Ldi/dt effects. The inductance L in the Ldi/dt effect is mainly due to the package, as the on-chip wire inductances are normally negligible [2]. Since an actual package was not used, two on-chip spiral inductors were implemented to mimic the package inductances, one on the supply path and the other on the return path. The value of the spiral inductors is close to a typical wire-bond package inductance. The user logic and the decaps were placed far away from the supply/ground pads (about 600μm) to create a large mesh resistance. Effectively, the pad and mesh of the test chip were designed to produce a measurable amount of power supply noise such that any improvements of using active decaps could be easily observed.

4.3.2 Test Chip Simulations

Before showing the measurement results of the test chips, the simulation results are presented first to demonstrate the close relationship between the two and to provide a better understanding of how the active decap actually behaves when large supply noise is present. The simulation setup follows exactly the test chip in Fig. 4.9. When a clock signal is fed into the test chip, the buffer, as the user logic circuit, switches and draws a certain current from the supply rails. The current peaks created by the buffer cause the supply voltage VDD to drop, resulting in the two input voltages of the comparators swapping. The comparators then switch their output levels according to the input swap. After a certain delay, the outputs of the comparators switch, causing a local boost at the supply voltage VDD. Once the supply is boosted, the inputs of the comparators move back to their nominal values. As a result, the outputs of the comparators switch back after some delay and the active decap waits for the next voltage drop. When the active decap is turned off, the circuit behaves like a passive decap.
In such a scenario, the supply voltage follows the current draw of the buffer, and no boost in the supply voltage is achieved. From post-layout HSPICE simulation, the current draw, the supply voltage VDD and the internal signal switching are shown in Fig. 4.13, where the clock frequency is set at 500MHz.

Figure 4.13: Simulated VDD voltage (on a 500MHz clock) with active decap on and off.

Random process variations affect the effectiveness of the active decap. At the slow corner, the delay of the comparators is increased, resulting in a later boosting point at the supply rail. On the other hand, for the fast process, the comparators take a shorter delay to switch so that the local supply boost occurs earlier. As a consequence, the supply voltage remains high for a longer time for the fast corner, and for a shorter time for the slow corner, as illustrated in Fig. 4.14. Therefore, the average supply noise per clock cycle should be less for a test chip at the fast process corner. Alternatively, a larger average supply noise level can be expected if a test chip happens to be at the slow process corner. Designers should make sure that the active decap provides satisfactory improvement to remove hot spots under process variations, especially at the slow process corner.

Figure 4.14: Simulated VDD voltage with active decap on for different process corners.
4.3.3 Test Chip Measurements

This section describes the measurement results obtained on 15 test chips and validates the simulation results, as illustrated in Fig. 4.15. An Agilent 86130A bit error rate tester was used to drive the inputs, while an Agilent DSO81304A oscilloscope was used to observe the results. As mentioned earlier, the passive decap shown in the figure is always present in the test chip. The enable signal is used to selectively turn on or off the active decap by applying a high or low voltage. When disabled, the decaps are biased in parallel, utilizing a maximum standby capacitance. In that case, the active decap behaves purely as a passive decap. When enabled, the active decap is triggered by voltage drops of about 50mV. By turning on and off the active decap, the average VDD voltage improvement can be measured. A collection of 15 sample chips was tested, where the clock frequency is fixed at 1GHz to observe the improvement across the sample space. The test results from these sample chips were categorized into three groups: slow, typical, and fast, to reflect the nature of random process variation on silicon. Note that in all cases, the active decap moved the IR drop inside the 100mV noise budget. The average VDD voltages of each group line up closely with simulation under process variations.

Figure 4.15: Grouped scatter plot for active decap noise reduction for the tested sample chips.

In Fig. 4.15, when the active decap is off, the average supply voltage varies significantly from 873mV to 910mV. This is due to the random process variation on Rmesh and Lpack, which determines the IR drop on the supply rails.
But more importantly, a higher average VDD value when the active decap is off results in a smaller improvement when turning on the active decap. This fact is caused by the nature of the comparators. Specifically, if the input of the comparator varies with a large swing, then the comparator will generate an output with a shorter delay. That is, for a larger supply noise k, the delay of the comparator td is smaller. This effect is illustrated in Fig. 4.16. Longer delay results in lower bandwidth, which affects the active decap performance at high frequencies. Note that the delay difference Δtd also increases as k increases, which will reduce the boosted voltage b slightly.

Figure 4.16: Comparator delay td and delay difference Δtd as a function of supply noise k.

Table 4.4: Comparator delay td and delay difference Δtd in different corners.

Corners    td (top)    td (bottom)    Δtd        Average td
Slow       0.77 ns     0.72 ns        0.05 ns    0.75 ns
Typical    0.53 ns     0.48 ns        0.05 ns    0.51 ns
Fast       0.32 ns     0.30 ns        0.02 ns    0.31 ns

The different process corners have direct impact on comparator delay and delay difference. For a fixed value of k=0.1, the comparator delay and delay difference under process variation are highlighted in Table 4.4. The average delay between the top and the bottom comparator varies from 0.75ns (slow) to 0.31ns (fast), more than 140% of variation. This effect explains the on-die variation for the active decap performance measurements. However, the delay difference of the comparators only varies slightly, indicating that the boosted voltage b will not be affected greatly by process variations.

By averaging the average supply voltage from each group in Fig. 4.15, the overall improvement of using the active decap can be assessed, as highlighted in Table 4.5.
Note that in the test chip the passive decap is always connected. To illustrate the improvement solely due to use of active decap, simulations were carried out with the passive decap completely removed. The simulated results showing active decap only versus passive decap are summarized in Table 4.6, where the case for the active decap provides a higher average supply voltage across the process corners. Due to the dynamic power consumption of the comparators during switching and other non-idealities, the active decap cannot practically reach the final voltage value described in Equation (4.13). Given a level of the supply noise when the active decap is off, the final voltage can be calculated from Equation (4.13). Comparing the expected and the actual final voltage when the active decap is turned on, the two values are close (~100mV of difference) for the cases of typical and fast process corners, as summarized in Table 4.6.

Table 4.5: Measured active decap performance for different process corners.

Corners    Average VDD voltage                       Improvement
           active decap ON     active decap OFF
Slow       914 mV              903 mV                11 mV
Typical    904 mV              878 mV                26 mV
Fast       917 mV              872 mV                45 mV

Table 4.6: Comparison between equation and simulated result after correlation.

Corners    Final voltage aVDD    Simulated avg VDD:      Measured average VDD voltage
           (from eqn.)           active decap only       active decap ON    active decap OFF
                                 (correlated)
Slow       1169 mV               932 mV                  914 mV             903 mV
Typical    1058 mV               909 mV                  904 mV             878 mV
Fast       1031 mV               921 mV                  917 mV             872 mV

It is now possible to validate Equation (4.8). Although the test equipment has a limited bandwidth of less than 1.5GHz, the test results for a 1GHz clock can be used to correlate simulations for high frequency effects. The three process corners are used here. Simulations were carried out from 1GHz to 3GHz, showing the cross points of active decap and passive decap for the average supply voltage.
The simulation results are then correlated with measurement results for the three process corners. The correlated results are shown in Fig. 4.17. The relationship between the active decap bandwidth (crossing point) and the average comparator delay is summarized in Table 4.7. Clearly, Equation (4.8) captures the effect of process variation corners.

Table 4.7: Active decap bandwidth versus average comparator delay under process corners.

Corners    Bandwidth    Average delay
Slow       1.55 GHz     0.75 ns
Typical    2.4 GHz      0.51 ns
Fast       > 3 GHz      0.31 ns

Figure 4.17: Simulated average VDD voltage with active decap on and off versus clock frequency for (a) slow, (b) typical, and (c) fast process corners.

4.3.4 Measurement Results on One Typical Chip

Three sample chips were found to be at the typical process corner. The average VDD voltages when the active decap is on and off can be found back in Table 4.5. A sample chip that has a close value to the average voltage of the three typical chips was used for further analysis in this section. As before, by turning on and off the active decap, the average VDD voltage improvement was measured. This is shown in Fig. 4.18, where the input is set at 500MHz, typical of ASIC designs. In Fig. 4.18(a), an actual screen shot of the input and supply voltages is provided with the active decap enabled. In Fig. 4.18(b), the supply waveforms for the passive and active cases are superimposed. The average supply voltage increases from 900mV to 914mV. Therefore, the noise level dropped from 100mV to 86mV, an improvement of 14mV (or about 14% less noise).
This improvement can be expected to be almost doubled for an isolated active decap, as illustrated later using simulation.

Figure 4.18: Measured results (on a 500MHz clock) for (a) active decap on and (b) plotted comparison between active decap on and off.

Fig. 4.19 shows the measured points as the external input frequency increased from 200MHz to 1GHz. The measurements at 500MHz described above are circled. Two solid trend lines are provided corresponding to active decap on and off. The gap between the two trend lines initially widens, indicating that the benefit of active decaps increases as frequency increases. The test chip validated that the active decap has a maximum improvement of 23mV (or about 20% less noise) for a 1GHz design. Circuit simulation was used to further study the effect of higher clock frequencies, also shown in Fig. 4.19 as dashed lines. Below 2GHz, the active decap can provide more charge than the passive decap. Above 2GHz, its performance diminishes because of the fixed switching speed. The crossing point of the two trend lines at about 2.4GHz indicates the bandwidth limits due to the switching nature of this active decap design. However, today's high-end ASIC designs still run at below 1GHz, so this active decap is quite acceptable.

Figure 4.19: Measured (0.2 - 1GHz) and simulated (1 - 2.5GHz) average VDD voltage with active decap on and off versus clock frequency.

In Fig. 4.19, the maximum difference between the active decap on and off occurs at about 1GHz, about a half of the active decap bandwidth.
As described in the previous section, the high-frequency cross point of the active decap and the passive decap is at about 1/0.5ns=2GHz, where 0.5ns is the comparator delay. At low frequencies, the active decap and the passive decap have similar average VDD values because the clock period is long enough to eliminate the advantage of using the active decap when taking an average VDD level per clock cycle. As a result, since the difference is almost zero at low frequency and at the bandwidth frequency, the maximum difference (or benefit) can be considered to occur at roughly 1/2 of the active decap bandwidth, in this case, 2GHz/2=1GHz. Therefore, to maximize the effectiveness of the active decap, designers should ensure that the comparator delay td is always less than 1/2 of the clock cycle.

4.4 Active Decap Size and Placement

While the test chips are useful in quantifying active decap improvement over passive decap as frequency increases, the proper sizing and placement of the active decap determines the effectiveness of the drop-in replacement approach. When converting a fixed area from passive decaps into active decaps, the standby capacitance is always smaller because of the area overhead of the switches and comparators in the active decap. The actual noise improvement depends on the area available and the location of the active decap relative to the hot spot. Intuitively, if the area available for active decaps is small and the overhead area is a large percentage of the total area, active decaps may not be an effective replacement for passive decaps. On the other hand, if the area available is too large, the noise reduction for active and passive decaps may be similar because of excessive delays in the switching circuitry trying to switch the oversized Cdecap's. Therefore, there exists an optimal area for active decaps where they are most effective.
Similarly, the instantaneous boost provided by active decap must be close to the hot spot to be effective, but this should be traded off against the distance from the supply to replenish the charge.

To explore these aspects, extensive circuit simulation was used to first calibrate the test chip measurements with HSPICE simulation using exactly the same setup in Fig. 4.9. As a calibration metric, the average VDD noise per clock cycle, VDD_noise, was used since it is known to be the controlling factor of the critical path delay of logic circuits [20]:

V_{DD\_noise} = V_{DD\_nominal} - V_{DD\_avg} = 1\,\mathrm{V} - V_{DD\_avg}    (4.14)

With a 500MHz input and the active decap turned on, the waveforms from circuit simulation of the supply voltage and internal signals were previously shown in Fig. 4.13. These results are not identical to the measured results of Fig. 4.18 but they do, in fact, have a similar average value. For one clock cycle, VDD_avg = 918mV, which is fairly close to the measured average of 914mV. It was found that, for other frequencies, VDD_avg closely matched the running average from measurement. Since the measured and simulated values are correlated in this way, the average was used for the rest of the analysis. The fixed passive decap was also removed from the circuit in further analysis to study the improvements derived from the stand-alone active decap.

While VDD_noise is determined by many other factors including package/pad/power grid design and clock frequency, for the purpose of this analysis the power grid design and the 500MHz input were kept the same, and only the decap size was varied. The average noise for same-area passive and active decaps was compared. The size of the center switching circuitry of Fig. 4.10 also remained constant. In the passive decap simulations, standard cross-coupled designs were used. In Fig. 4.20, the average noise VDD_noise is plotted versus passive and active decap size varying from 85μm² to 8.5mm². The plot shows that the active decap reduces the noise relative to passive decaps for sizes between 0.001mm² and 0.6mm². However, if the available area for decap insertion is smaller than 0.001mm² or greater than 0.6mm², there is little difference between the two. When the area is small, the active decap is not as effective since the amount of capacitance switched in series is small. When the area is large, the fixed switching circuit in the active decap cannot switch the decaps effectively because the capacitive load exceeds its capability.

Figure 4.20: Simulated average VDD noise per clock cycle versus normalized decap size.

Fig. 4.21 is used to illustrate the optimal size for the active decap design, where the solid curve indicates the noise reduction difference between passive and active decaps. The maximum difference occurs in the range of 0.01mm² to 0.1mm². If the active decap is designed in this range, it has the greatest advantage over passive decaps in terms of average supply noise reduction. The test chip was designed to be 0.085mm² to obtain close to optimum improvement of 23mV, as described in the previous section. In Fig. 4.21, it is also shown that the area overhead of the switching circuitry in the design is only 10% in the region around the optimal value.

Figure 4.21: Power supply noise reduction difference from active decap and passive decap with area overhead from switching circuit of active decap.

As mentioned earlier, another important factor in the resulting improvement is the actual placement of active decaps relative to the hot spot. Referring back to Fig.
4.9, the effective distance from the hot spot can be adjusted by varying Rdist. Similarly, the distance from the charge re-supply path can be controlled by varying Rmesh. Simulations were carried out to observe the voltage drops while changing only Rmesh and Rdist. The simulation results are shown in Fig. 4.22. The decap size was fixed at the optimal value of 0.02mm² from Fig. 4.20 so that the maximum improvement could be observed. As the distance between the decap and the user logic is varied from 10Rdist to 0.1Rdist, the average noise level in the passive case changes from 134mV to 124mV. However, for the active case, the average noise level reduces from 133mV to 74mV. Therefore, the active decap is more sensitive to placement than the passive decap. This makes intuitive sense because the active decap provides a short-term boost in the charge which acts in a small, localized neighborhood. However, the passive and active decaps exhibit similar characteristics as a function of Rmesh, according to the results in Fig. 4.22. As a result, the active decap should be placed as close as possible to the hot spot to be most effective.

Figure 4.22: Improvement on average VDD noise for using active decaps in different placement locations by varying Rdist and Rmesh.

4.5 Summary

This chapter described the effectiveness of active decaps as a late-stage drop-in replacement for passive decaps so that a completed chip layout need not be disrupted near the tapeout deadline. Improvements to the design of an active decoupling capacitor were described for removal of hot-spot power supply noise in ASIC designs up to 1GHz operation. The modified active decap using latch-based comparators in 90nm CMOS is able to switch in 0.5ns and consumes a relatively low power of 2.8mW, which is about 5X lower than a previous design running at approximately the same speed.
This reduced power makes it more suitable for use in ASIC designs. Measurement results from test chips indicate an improvement over passive decaps of 10%-20% when operating from 200MHz to 1GHz. The optimal active decap size to maximally remove hot-spot noise was identified. Placement analysis was also carried out and it was found that the active decap is most effective when placed in close proximity to the hot spot, whereas the passive decap is not as sensitive to the exact location. In summary, if sized and placed properly, active decaps can be up to 20% better when used as drop-in replacements of passive decaps for power supply noise reduction.

Chapter 5

Generalized Active Decap and Charge-Borrowing Decap

5.1 Introduction

The previous chapter explored the effectiveness of using active decaps to remove hot-spot IR-drop violations. This chapter investigates advanced versions of the active decoupling capacitor and proposes a novel design of a charge-borrowing decap (CBD). The extension of the active decap concept is derived by increasing the stack height n to a larger value to ideally achieve a higher boosted voltage than the basic n=2 active decap. The optimal value of n will be evaluated in theory and in practical applications [62]. The CBD design is a completely different approach to addressing the hot-spot removal problem [63]. The new design aims to provide enough charge during every cycle to reduce IR drop with only a minimum power overhead. However, the location of the IR-drop problem must be known in advance and sufficient area must be available with a relatively clean supply to implement the solution. The applications and limitations of the charge-borrowing decap will be evaluated in this chapter.

5.2 Extended Active Decoupling Capacitor

5.2.1 Optimal Stack Height n

The concept of the extended active decap is simply to increase the stack height n of the basic active decap described in Chapter 4.
The motivation to have a larger stack height (n>2) is to generate a higher boosted voltage nVDD to potentially achieve a better improvement when applied to reduce supply noise. For example, the ideal boosted voltage is 3VDD for n=3, 4VDD for n=4, and so on. Therefore, it seems that the stack height should be designed as large as possible to obtain a high enough local boost so that the supply noise can be reduced to an arbitrarily small level. However, this is not true in practice. The higher boosted levels cannot be reached due to the nonidealities of the circuit. Also, by increasing the stack height, more switches will be required to switch the decaps between parallel and series. The active decap circuits for n=2, n=3 and n=4 are illustrated in Fig. 5.1(a), 5.1(b) and 5.1(c), respectively. In the figure, it is assumed that the total area available for the decaps is fixed at 2Cdecap. Therefore, each decap occupies an area of Cdecap for n=2, while in the cases of n=3 and n=4 each decap has an area of (2/3)Cdecap and (1/2)Cdecap, respectively.

Figure 5.1: Active decap concept showing different stack height: (a) n=2, (b) n=3, and (c) n=4.

The practical constraints of stacking the decaps can be illustrated by first showing the actual transistor implementation of the n=3 case in Fig. 5.2, where the switches are implemented using MOS transistors. Note that the number of switches required is increased by three every time the stack height is increased by one. That is, for n≥2, the number of switches is 3(n-1).
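The scaling rules stated above (per-decap capacitance, ideal boosted voltage, and switch count) can be collected in a short Python sketch. The function name and dictionary keys are illustrative choices, not part of the thesis; capacitance is expressed in units of Cdecap and voltage in units of VDD, assuming the fixed total area of 2Cdecap used in Fig. 5.1.

```python
from fractions import Fraction


def stack_parameters(n):
    """Ideal parameters of an n-high active decap with a fixed total of 2*Cdecap.

    Hypothetical helper for illustration only; it restates the relations given
    in the text and ignores all circuit nonidealities.
    """
    if n < 2:
        raise ValueError("an active decap needs a stack height of at least 2")
    return {
        "per_decap_cap": Fraction(2, n),  # each decap, in units of Cdecap
        "ideal_boost": n,                 # ideal boosted voltage, in units of VDD
        "switches": 3 * (n - 1),          # three more switches per extra level
    }


for n in (2, 3, 4):
    print(n, stack_parameters(n))
```

For n=3 this gives a per-decap capacitance of (2/3)Cdecap and six switches, matching the values quoted in the text.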
In the figure, each horizontal switch is implemented with two transistors (NMOS and PMOS), whereas each vertical switch requires only one transistor (NMOS or PMOS). The design for a stack height of n=4 can be realized in a similar manner, although it is not shown here.

Figure 5.2: MOS implementation of the extended active decap (n=3).

Due to the additional switches, two methods of design exist: one is to expand the area occupied by the switch transistors, and the other is to reduce the size of each switch transistor if a relatively constant total area needs to be maintained. The first method increases the area overhead x of the active decap, and also causes a longer comparator delay because of the additional loading capacitance. The increased area overhead reduces the final voltage aVDD to which the active decap can boost the supply voltage. But more importantly, a longer delay results in a reduced bandwidth, which lowers the operating frequency, as described in the previous chapter. The second method uses a fixed total area occupied by the switch transistors. With this approach, the delay of the comparators should remain roughly the same since there is only a minimal change in the loading capacitance. Therefore, the operating frequency is not affected. However, packing more switch transistors into the same area causes the "on" resistance R of each transistor to increase, which reduces the boosted voltage bVDD.

The second method was used to implement the extended active decap since the active decap bandwidth should not be compromised for high-end ASIC applications. That is, the different designs have the same operating bandwidth. The final voltage aVDD can be obtained by varying the stack height, n, as illustrated in Fig. 5.3, with different supply noise levels, k. The large dots indicate the highest final voltage points on the different k curves. Note that the starting point of n=1 implies a passive decap.
Clearly, for k=0.05, the optimal n value that produces the highest final voltage is 4, whereas for k=0.1, the optimal value is n=3. For an increased level of k=0.15, the optimal n value reduces to 2. As the supply noise k further increases, the use of passive decaps (n=1) is recommended, as for the case of k=0.2. From the figure, one can conclude that a higher n should be used if k is small, and vice versa. Therefore, it is important to select n based on the k range where the resulting final voltage aVDD is the highest. In order to do this, the optimal k ranges for each n and the crossover points between ranges must be determined.

Figure 5.3: Final voltage aVDD as a function of stack height n with k varying (fixed area).

The supply noise crossover point, k_{n1,n2}, for two different stacking levels, n=n1 and n=n2, is defined as the noise level at which both cases produce the same final voltage. This can be used to identify suitable ranges for each stacking level. To obtain k_{n1,n2}, Equation (4.13) can be rearranged into the following form:

k_{n1,n2} = (expression in x, b, n1 and n2; the right-hand side is illegible in the extracted text)    (5.1)

where n1≠n2 and n2>n1. Plugging in numbers, the crossover noise value from n=2 to n=3 is k_{2,3}=0.12, and from n=3 to n=4 is k_{3,4}=0.08. Effectively, the crossover point k_{4,5}=0.05 determines the boundary where a passive decap should be used since the case of n=5 would not be used if the noise was 5% or less (i.e., an acceptable level). Similarly, k_{1,2}=0.17 produces the same final voltage for n=2 and n=1 (passive decap). When k is above 0.17, the passive decap should be used. The results are presented in graphical form in Fig. 5.4. The four lines represent n=1, 2, 3 and 4, respectively. The line with the highest value in each region gives the optimal value for n. For low values of k, the best choice is n=4. Starting at k_{3,4}, the best choice is n=3. At k_{2,3}, the best choice becomes n=2.
At k_{1,2}, the best choice is n=1 from that point onward. The results are summarized in Table 5.1.

Table 5.1: Optimal stack height n selection based on the supply noise k (from formula).

Condition            Optimal n
k < 0.05             n=1 (use passive decap)
0.05 < k < 0.08      n=4
0.08 < k < 0.12      n=3
0.12 < k < 0.17      n=2
k > 0.17             n=1 (use passive decap)

Figure 5.4: Final voltage aVDD as a function of k with different stack height n (from formula).

As described earlier, if the supply noise is above 0.17, the use of any form of the active decap cannot boost the supply voltage to a satisfactory level. Other design approaches to reduce the supply noise may have to be used in that situation. However, the more interesting range of k is from 0.08 to 0.17, since this noise level is typically unacceptable. If the supply noise k is below 0.12 but above 0.08, then the active decap should be designed with n=3 to produce the minimum noise. Similarly, if k is above 0.12 but below 0.17, the basic active decap with n=2 is optimal.

5.2.2 Design and Layout of n=3 Extended Active Decap

To validate the results of optimal n selection, the extended active decaps with n=3 and n=4 were implemented. For simplicity, only the design of n=3 is illustrated in this section. Similar to the basic active decap design, Fig. 5.5 illustrates the extended active decap for n=3 containing four blocks: a reference voltage generator, a pair of high-pass filters, two comparators, and the switched decaps. The operation of the circuit remains the same: the differential inputs of each comparator decide the standby voltage level at the outputs of the comparators. When the power grid discharges, VDD will drop and Vss will rise. The voltage variations are passed through the high-pass filters to the comparator inputs, causing the comparators to reverse their output values and switch the three-piece decaps from parallel to series.
Later, when the power grid charges up, the comparator inputs and outputs switch back to their original values. An enable signal is provided for testing purposes to allow the active decap circuitry to be connected to the global supply rail. Unlike the previous design, when off, the extended active decap is disconnected from the rest of the circuit. This allows for a comparison between the basic and the extended decaps.

Figure 5.5: Extended active decap (n=3) architecture.

Figure 5.6: Layout of extended active decap (n=3) showing the relative size of the components.

The layout of this extended active decap is shown in Fig. 5.6. The switching circuitry is located offset from the center, with the three parallel decaps on either side. The two decaps on the left are PMOS, and the one on the right is NMOS. The total layout area for the active decap is 600µm x 142µm = 0.085mm², in which the three decaps combine for an area of 0.077mm². Each switch transistor was laid out with only half of the area used before, resulting in almost a doubling of the "on" resistance. As a result, the switching circuitry, including the switch transistors, still accounts for an area overhead of only 10%. Note that the total area is the same as that of the basic active decap so that a comparison between the two can be carried out. Although not shown, the design and layout of the n=4 case is similar to the case of n=3.

5.2.3 Simulation Results

The next step was to use simulations to obtain the supply voltage waveforms under different supply noise levels, k. By increasing the size of the user logic buffer, k can be varied and the supply voltages will differ in the cases of n=2 and n=3, as illustrated in Fig. 5.7. When k=0.12, case n=3 provides a larger boost than n=2.
On the other hand, when k=0.15, both n=2 and n=3 were insufficient in terms of delivering charge to boost up the supply, but the n=3 case is even worse. As shown in the figure, by using the second method, the delay of the comparators remains roughly the same. The design of the n=4 active decap was also simulated, and the results are similar to the n=2 basic active decap case. One noticeable difference is that as k increases above 0.12, the average VDD voltage drops earlier for the n=4 case than the n=2 case. However, when k is at around 0.05, the average VDD voltage for n=4 is slightly higher than both the n=2 and n=3 cases.

Figure 5.7: Simulated VDD voltage with extended active decap (n=3) on for two different k levels.

Figure 5.8: Average VDD voltage as a function of k with different stack height n (from simulation).

Table 5.2: Optimal stack height n selection based on the supply noise k (from simulation).

Condition            Optimal n (simulated)
k < 0.05             n=1 (use passive decap)
0.05 < k < 0.07      n=4
0.07 < k < 0.14      n=3
0.14 < k < 0.16      n=2
k > 0.16             n=1 (use passive decap)

Using simulations, the average supply voltages per clock cycle for the n=2, 3 and 4 cases when the supply noise k varies from 0.02 to 0.2 were compared and plotted in Fig. 5.8. The corresponding optimal stack height n as a function of the supply noise k is summarized in Table 5.2.
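The selection rules of Table 5.1 (formula) and Table 5.2 (simulation) amount to a range lookup on k, which can be sketched in Python as follows. The names are hypothetical and the code only restates the two tables. One assumption is added: the tables use strict inequalities and do not define the choice exactly at a boundary, so this sketch assigns a boundary value to the range above it.

```python
import bisect

# Crossover values of k separating the ranges, taken from the two tables.
FORMULA_BOUNDS = [0.05, 0.08, 0.12, 0.17]     # Table 5.1
SIMULATION_BOUNDS = [0.05, 0.07, 0.14, 0.16]  # Table 5.2

# Optimal stack height in each successive range; n=1 means a passive decap.
OPTIMAL_N = [1, 4, 3, 2, 1]


def optimal_stack_height(k, bounds):
    """Return the optimal stack height n for supply noise level k.

    Boundary handling is an assumption of this sketch: a value of k that
    equals a crossover point is placed in the range above it.
    """
    return OPTIMAL_N[bisect.bisect_right(bounds, k)]


for k in (0.03, 0.06, 0.10, 0.13, 0.15, 0.20):
    print(k,
          optimal_stack_height(k, FORMULA_BOUNDS),
          optimal_stack_height(k, SIMULATION_BOUNDS))
```

The two tables disagree only near the crossover points; for example, k=0.13 selects n=2 by formula but n=3 by simulation.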
The crossover points k_{1,2} between n=1 and n=2 are similar between formula and simulation. The most important crossover point, that of the n=2 and n=3 cases, is k_{2,3}=0.14 from simulation, somewhat higher than the calculated value of 0.12. Above 0.14, no approach can raise the supply level back to 900mV, making the use of active decaps less valuable in this region. On the other hand, the lower bound k_{3,4} is at 0.07, slightly below the calculated level of 0.08. Therefore, a slightly wider k range of 0.07-0.14 for n=3 makes it superior to the basic active decap. Unlike the formula, when the k value is low, the active decaps do not switch due to the fixed triggering voltage of about 50mV, resulting in the active decaps producing a slightly worse average supply voltage than the passive decap. Because the active decaps are worse than the passive decap when k<0.05, they should not be used in that range. Although the n=4 case is the best when k is in the range of 0.05-0.07, it only has limited value since this k range is small and the improvement over the n=3 active decap is marginal. Thus, it can be concluded from simulation that n=3 provides the optimal level of the average supply voltage across a wide supply noise range below 0.14. If the supply noise is above 0.14, a larger area is required to increase the average supply level to a satisfactory level above 900mV.

5.3 Charge-Borrowing Decap (CBD)

5.3.1 Charge-Borrowing Decap Concept

The main purpose of the active decap is to boost the voltage locally to reduce supply noise. Therefore, any technique that offers this type of improvement would also qualify as a viable alternative. For example, if charge is "borrowed" from a clean supply to boost up a noisy supply, it would help reduce the hot-spot IR-drop problem. That is the basic concept behind a charge-borrowing decap (CBD), which is a novel but relatively simple idea illustrated in Fig. 5.9. The key idea here is based on capacitive feedthrough.
Assuming that the total area available is the same as before, the decoupling capacitance is 2Cdecap. In Fig. 5.9(a), when power supply noise kVDD is present, a passive decap provides charge equal to (2Cdecap)(kVDD) = (2k)CdecapVDD. In the case of the CBD, as shown in Fig. 5.9(b), it can ideally boost the local supply voltage to 2VDD, similar to the active decaps. From another perspective, the charge provided by the CBD circuit in one clock cycle can ideally be up to (2Cdecap)ΔVclk = 2CdecapVDD, where ΔVclk is the clock swing. Therefore, over one clock cycle, the charge-borrowing decap provides significantly more charge than a same-area passive decap.

Figure 5.9: Charge-borrowing decap concept shown in (b), compared to a passive decap in (a).

Table 5.3: High-level comparison among passive, active, and charge-borrowing decaps.

Decap type                      Charge provided (ideal)   Boosted voltage (ideal)
Passive decap                   (2k)CdecapVDD             -
Active decap (n=2)              (1/2)CdecapVDD            2VDD
Active decap (n=3)              (4/9)CdecapVDD            3VDD
Active decap (n=4)              (3/8)CdecapVDD            4VDD
Charge-borrowing decap (CBD)    2CdecapVDD                2VDD

A high-level comparison between the passive decap, active decaps, and CBD is given in Table 5.3, where the differences in charge available and boosted voltage are shown. Note that the total decap area is fixed to 2Cdecap in the comparison. In Table 5.3, the basic active decap (n=2) is better in charge provided than the passive decap if 2k is less than 0.5 (or, k<0.25), and the active decap also provides a local boost in the supply voltage. The charge-borrowing decap is always superior to the basic active decap since it supplies more charge while generating the same boosted voltage level. The charge supplied from active decaps with n=3 or n=4 is about 11% or 25% less than the basic active decap (n=2), respectively, but their boosted voltages are higher.
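The ideal charge entries of Table 5.3 and the percentages quoted above can be checked with a few lines of Python. The closed form used for the active decap is an interpretation that reproduces the table, not a formula quoted from the thesis: it assumes n decaps of (2/n)Cdecap each, stacked in series and precharged to nVDD, that discharge until the stack reaches VDD. All names are illustrative, and charge is in units of CdecapVDD.

```python
from fractions import Fraction


def active_decap_charge(n):
    """Ideal charge from an n-high active decap, in units of Cdecap*VDD.

    Assumed model (consistent with Table 5.3): the series stack has a
    capacitance of 2/n**2 * Cdecap and swings from n*VDD down to VDD.
    """
    series_cap = Fraction(2, n * n)
    return series_cap * (n - 1)


def passive_decap_charge(k):
    """Ideal charge from a passive decap of 2*Cdecap for a noise level k."""
    return 2 * k


CBD_CHARGE = Fraction(2)  # (2*Cdecap) * VDD for a full clock swing

for n in (2, 3, 4):
    q = active_decap_charge(n)
    shortfall = 1 - q / active_decap_charge(2)
    print(f"n={n}: charge={q}, shortfall vs n=2 = {float(shortfall):.1%}")

# The passive decap matches the n=2 active decap when 2k = 1/2, i.e. k = 1/4.
print(passive_decap_charge(Fraction(1, 4)) == active_decap_charge(2))
```

Under this model the n=3 and n=4 stacks deliver 1/9 (about 11%) and 1/4 (25%) less charge than the n=2 case, in agreement with the text.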
This intuitively explains the fact that the extended active decap is better than the basic case only in limited situations. From the concept alone, it is difficult to decide whether the extended active decaps or the CBD is superior. More design details need to be studied before a conclusion can be made. From the table, the design that provides the most charge per clock cycle is the charge-borrowing decap, which is how it got its name. If designed properly, charge-borrowing decaps can potentially remove hot spots as a drop-in replacement of passive decaps, similar to the active decaps. The rest of this chapter will provide results to support this argument.

In Fig. 5.9(b), there is a problem with the falling edge of the clock. When the Clk signal rises from 0 to VDD, the supply is ideally boosted to 2VDD due to capacitive feedthrough. Then, the user logic nearby switches, withdrawing a certain amount of charge from the supply. The supply voltage will drop by ΔV. Then the Clk signal falls from VDD back to 0, forcing the supply voltage to drop from 2VDD-ΔV to VDD-ΔV, which may not be acceptable if ΔV is excessively large. Therefore, the concept of the charge-borrowing decap needs some additional circuitry to function properly, as shown in Fig. 5.10(a). Two diodes are inserted at node B1, one connected to the clean supply and the other to the noisy supply. Without the connection to the clean VDD, when the clock is low, B1 stays at roughly Vss since the current flow from VDD to B1 is prevented by the diode, D2. When the clock goes high, B1 rises to VDD. This boost in voltage will not trigger current flow from B1 to the supply because both B1 and the supply are at the same level of VDD. Therefore, the voltage at B1 should be around VDD when the clock is low. This ensures that the voltage at B1 can reach about 2VDD when the clock goes high and the supply can be charged.
To achieve that, access to a clean supply of VDD is needed through D1. Therefore, the implementation of the charge-borrowing decap requires a clocking signal, two diodes, and a supply node that has less noise (clean).

Assuming there is one threshold voltage VT drop across each diode, the operation of the CBD circuit can be illustrated in Fig. 5.10. When Clk is at 0, the noisy supply node is assumed to be at VDD, while B1 is charged to VDD-VT. When Clk rises to VDD, B1 rises to 2VDD-VT at the same time. As a result, the noisy supply node is increased to 2VDD-2VT. Before the clock falls, some drop occurs at the supply, causing it to drop by ΔV to 2VDD-2VT-ΔV. Then, Clk falls back to 0, so that B1 also reduces to VDD-VT. However, D2 prevents charge from flowing back to B1 from the supply. Therefore, the noisy supply remains at 2VDD-2VT-ΔV.

Figure 5.10: Diode inserted charge-borrowing decap showing the states of the clocking signal: (a) Clk at 0, (b) Clk rises to VDD, and (c) Clk falls back to 0.

Figure 5.11: Possible implementations of charge-borrowing decap with (a) NMOS-formed diodes and (b) PMOS-formed diodes.

In a CMOS process, the diodes shown in Fig. 5.10 can generally be implemented in one of two forms: NMOS and PMOS, as illustrated in Fig. 5.11(a) and 5.11(b), respectively [58][64]. In Fig. 5.11(a), the voltage at B1 when the clock signal is low is VDD-VTn1, where VTn1 is the threshold voltage of Mdn1. When the clock rises, B1 is increased to 2VDD-VTn1, causing current flow to charge up the noisy supply rail. Assuming that the supply is localized, the boosted voltage is 2VDD-VTn1-VTn2 due to the threshold voltage degradation of Mdn2.
Similarly, in Fig. 5.11(b), the boosted voltage is 2VDD-|VTp1|-|VTp2|, where |VTp1| and |VTp2| are the absolute threshold voltages of Mdp1 and Mdp2, respectively.

Both forms in Fig. 5.11(a) and 5.11(b) have practical limitations. In Fig. 5.11(a), when B1 reaches 2VDD-VTn1, the same voltage is applied at the gate of Mdn2. Such a high gate voltage may cause oxide breakdown in deep submicron processes, particularly 90nm and below [24]. A thick-oxide transistor should be used for Mdn2 to protect it from breakdown. In Fig. 5.11(b), when B1 is at 2VDD-|VTp1|, the drain of the transistor Mdp1 is at the same voltage, while the body of the transistor is tied to VDD. This also creates a reliability concern in that the pn junction of the transistor Mdp1 is forward-biased, resulting in the injection of a large amount of current back to the clean supply rail. Therefore, the PMOS implementation in Fig. 5.11(b) has to be modified. For example, if the gate voltage of Mdp1 were controlled separately using an appropriate voltage level and the bulk of Mdp1 were connected to B1, the forward biasing of the pn junction could be avoided.

Figure 5.12: Improved implementation of charge-borrowing decap with PMOS-formed diodes.

The modified circuit shown in Fig. 5.12 resolves the issue of forward-biasing the transistor Mdp1. When the clock is low, B2 is set at Vss, which allows B1 to be charged to VDD instead of VDD-|VTp1|. Forward-biasing the pn junction of Mdp1 here is not a problem because the transistor behaves as a diode. When the clock rises, B1 is switched to 2VDD, and B2 is also raised to 2VDD. The high gate-source voltage of Mdp1 disables the current flow back to the supply. The pn junction is now reverse-biased to prevent leakage. Note that in Fig. 5.12, the gate-body voltages of both transistors Mdp1 and Mdp2 are always within one VDD of each other, ensuring gate-oxide reliability.
Moreover, as a side effect, this circuit can generate a boosted voltage of 2VDD-|VTp2| instead of 2VDD-|VTp1|-|VTp2|, which improves the design.

Figure 5.13: Generation of the boosted voltage on node B2.

To generate the bootstrapped signal at node B2, the concept of a clock multiplier [58][65][66] can be used. The circuit for generating the bootstrapped signal at B2 is shown in Fig. 5.13. If one assumes that the clock has no previous activity, due to leakage through the transistors, it can be verified that both nodes B3 and B4 are at roughly VDD while both transistors Mcn1 and Mcn2 are shut off. When the clock is turned low, B3 goes to 2VDD and B4 is roughly at VDD, which causes Mcn2 to turn on. The output node B2 is low, along with the clock signal. When the clock rises to VDD, B3 is discharged from 2VDD to VDD, whereas B4 is charged up to 2VDD. This causes Mcn1 to turn on and Mcn2 to turn off. The rise of the clock signal also turns on Mip, allowing the output node B2 to follow B4. Because the gate of Mip is at Vss, the voltage at B2 can rise to 2VDD without any voltage loss. Similar to the previous design approach, the body of the transistor Mip is connected to B4 to ensure reverse-biased pn junctions. In Fig. 5.13, the two inverters are powered by the clean supply rail. The PMOS-formed capacitors should be large enough to supply enough charge to the load capacitance on B2. For Mcn1 and Mcn2, thick-oxide transistors are used to reduce the risk of potential oxide breakdown because their gate-body voltages can be up to 2VDD.

5.3.2 "Clk" Signal Generation

A critical concern about the charge-borrowing decap design is the additional capacitive loading on the clock distribution system if the main clock of the chip is connected to the Clk input of the CBD.
Since the CBD has a large capacitance value in the range of hundreds of picofarads, such a large capacitance loaded on the clock tree may cause an imbalance of the tree and introduce more clock skew and jitter [67]. In extreme cases, this extra loading may cause a functional failure in the clock distribution network. Therefore, the Clk input of the CBD should be generated from some other source to keep the main clock tree unaffected.

The Clk input of the CBD simply requires a repetitive signal with enough buffer strength to drive a large capacitor. The frequency of the Clk signal should be roughly in the range of the chip's operating frequency to ensure that sufficient charge is pumped into the supply rail at every clock cycle. If the chip operates at a lower frequency, the extra charge provided from the CBD will not harm the logic circuits connected to the local supply as long as the slew rate of the boosted voltages remains controlled within practical limitations. Another useful feature would be to implement an enable/disable function in the CBD block. When disabled, the block should behave like a regular passive decap. The CBD is turned on only when the IR drop exceeds a certain predefined level or only during the period in which the logic circuits connected to the local supply experience higher activity.

Figure 5.14: "Clk" signal generation using ring oscillator.

To satisfy the requirements above, a simple ring oscillator was selected to generate the Clk signal for the CBD, as illustrated in Fig. 5.14. A total of 39 inverter stages were used to provide an oscillation frequency of 1GHz, the upper limit of the targeted ASIC applications. A NAND gate replaces the regular inverter at the first stage to incorporate an enable signal. By using unit-sized inverters, the ring oscillator consumes about 165µW of dynamic power.
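As a rough sanity check on these numbers, the per-stage delay implied by a 39-stage ring running at 1GHz can be estimated in Python. The relation f = 1/(2·N·t_d) is the standard first-order ring oscillator approximation and is an assumption brought in here, not a formula stated in the thesis; it treats all stages, including the NAND gate, as having the same delay t_d. The function name is illustrative.

```python
def ring_oscillator_stage_delay(num_stages, frequency_hz):
    """Per-stage delay (in seconds) implied by f = 1 / (2 * N * t_d).

    First-order textbook approximation; assumes an odd number of identical
    inverting stages, which is an idealization of the circuit in Fig. 5.14.
    """
    if num_stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverting stages")
    return 1.0 / (2 * num_stages * frequency_hz)


delay = ring_oscillator_stage_delay(39, 1e9)
print(f"{delay * 1e12:.1f} ps per stage")  # about 12.8 ps
```

A stage delay on the order of 13ps is the figure this approximation yields for the stated stage count and frequency; the actual delay depends on loading and layout.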
To provide enough current flow to charge and discharge the decoupling capacitance, in this case 2Cdecap, a chain of inverters was added and sized according to logical effort [7]. The fan-out factor of the inverter chain was selected to be about 3-4 so that the delay through the chain was minimized. The number of stages required to achieve a fan-out factor of 3 to 4 was then calculated. Of course, the circuit in Fig. 5.14 will cause additional supply noise on the "clean" supply, especially with a large last stage of the buffer chain. One has to ensure that the additional noise on the relatively clean supply node does not exceed a certain noise budget when transferring charge from the clean supply to the noisy supply. As the size of the inverter chain increases, its dynamic power also increases, with the benefit of an improved slew rate (SR). This effect is shown in Fig. 5.15. Note that in the figure the dynamic power includes the ring oscillator plus the entire buffer chain, in which each buffer is sized up properly to produce the minimum path delay.

Figure 5.15: Buffer size determines the edge delay while the buffer chain (with ring oscillator) consumes dynamic power.

The current through the last stage of the chain, I_last_stage, controls the decap charge and discharge, so it determines the slew rate. The edge delay can be defined here as the delay time for the positive side of the decap to rise a full swing, as follows:

\text{Edge Delay} = \frac{V_{DD}}{SR} = \frac{2C_{decap}V_{DD}}{I_{last\_stage}}    (5.2)

The edge delay is a better term in this scenario to illustrate the design tradeoffs in Fig. 5.15. For a target clock frequency of 1GHz, the corresponding clock period is 1ns. Having an edge delay in the range of 50 to 100ps (i.e., 1/20 - 1/10 of the clock period) is reasonable.
Therefore, a buffer size of 300µm/600µm (NMOS/PMOS) for the last stage was selected to produce an edge delay of about 50ps, while the total dynamic power was around 3.8mW. In that case, the buffer chain was designed to have 7 stages sized up according to logical effort.

When determining the size of the buffer chain, the capacitance of 2Cdecap is fixed at 0.68nF, consistent with the basic active decap design in Chapter 4. Clearly, assuming a fixed edge delay, the size of buffer required is proportional to the decap value. If a smaller decap is used, the buffer size can be made smaller to dissipate less power. The power drawn from the clean supply node is critical: the supply noise caused by the ring oscillator and the buffer chain must not rise beyond the noise budget in its localized region. That is, the goal of generating the "Clk" signal from a clean VDD is to provide charge from a clean supply that is not connected to the main system clock or any important circuitry. Designers must ensure that the clean supply itself does not become so noisy that the local supply integrity is compromised.

5.3.3 Design of Charge-Borrowing Decap

With the considerations from the previous section, the complete charge-borrowing decap circuit is depicted in Fig. 5.16. An enable signal is provided to turn off the CBD for test purposes. When the enable signal is low, the transistor Msp is off, preventing current flow from the clean supply. The decap is implemented using PMOS transistors, and its value is set to 2Cdecap for comparison. When enabled, the voltage at B2 varies from 0 to 2VDD. The gate-source capacitance of Mdp1 creates some noise on the clean supply node due to clock feedthrough. The presence of Msp shields the clean supply from the clock feedthrough to reduce the noise. With this circuit configuration, the practical boosted voltage is 2VDD-|VTp2|, while the charge provided is 2CdecapVDD.
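The boosted voltages of the different CBD implementations discussed in this section can be compared side by side with a small Python sketch. The expressions are those given in the text for Figs. 5.11(a), 5.11(b) and 5.12; the function name, the variant labels, and the numerical values (VDD = 1.0V, threshold magnitudes of 0.3V) are hypothetical and chosen only for illustration.

```python
def cbd_boosted_voltage(vdd, variant, vt1=0.0, vt2=0.0):
    """Boosted voltage at the noisy supply for a given CBD implementation.

    vt1 and vt2 are the threshold voltages of the diode-connected devices on
    the clean side and the noisy side; magnitudes are used so that PMOS
    thresholds may be passed as negative numbers.
    """
    if variant == "ideal":
        return 2 * vdd
    if variant in ("nmos_diodes", "pmos_diodes"):   # Fig. 5.11(a) and 5.11(b)
        return 2 * vdd - abs(vt1) - abs(vt2)
    if variant == "improved_pmos":                  # Fig. 5.12 and Fig. 5.16
        return 2 * vdd - abs(vt2)
    raise ValueError(f"unknown variant: {variant}")


# Hypothetical values, not taken from the thesis.
VDD, VT = 1.0, 0.3
for variant in ("ideal", "nmos_diodes", "pmos_diodes", "improved_pmos"):
    print(variant, round(cbd_boosted_voltage(VDD, variant, VT, VT), 3))
```

With these example values, bootstrapping the gate of Mdp1 recovers one threshold drop, raising the boosted level from 2VDD-2|VT| to 2VDD-|VT|.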
Note that the ESD concern on the thin-oxide decap in this circuit should be addressed by proper sizing of the two transistors, Mdp1 and Mdp2.

Figure 5.16: Complete circuit diagram of charge-borrowing decap.

As described earlier, after a local hot spot is identified, its nearby white space or passive decap area is occupied with an active decap or a CBD to reduce the supply noise. In the case of the CBD, only the passive decap 2Cdecap and the transistor Mdp2 need to be implemented locally. The other circuits shown in Fig. 5.16 can be located away from the hot spot but near a clean supply node, once such a clean supply is identified. Two global interconnects may be required to connect the two parts of the circuit at nodes Clk and B1. The actual placement of the ring oscillator and the buffer chain will depend on the floorplan and the location of the power pins of the chip itself. Compared to the size of the passive decap, the size of Mdp2 is fairly small, small enough to be neglected in the area overhead. Since the clean supply node does not require a large area of decaps, the area occupied by the circuits at the clean supply is relatively small. Thus, it is assumed that the CBD block requires a minimal area overhead, relative to the hot-spot area where the CBD is placed.

5.3.4 Simulation Results

To validate the concept of the charge-borrowing decap, HSPICE simulations were carried out. In the simulation setup, one charge-borrowing decap with an enable signal is present, and no decoupling capacitor is connected between VDD and Vss. The load on the supply rail is a large buffer whose current demand can be controlled. As mentioned previously, the CBD can boost the supply voltage when the clock signal rises. The best scenario occurs when the current demand of the buffer also lines up with the rising edge of the clock.
As a result, the dips on the supply voltage produced by the current demand and the peaks generated by the CBD cancel each other, producing a relatively low noise voltage profile. Such a case is illustrated in Fig. 5.17.

In Fig. 5.17, the top part of the figure shows the clock signal, whereas the second part depicts the supply voltage when both the passive decap and the CBD are disconnected from the supply rail. The load buffer is designed to create voltage sags at the rising clock edges. In the third portion of Fig. 5.17, the load buffer is removed and only the CBD is connected and turned on. Clearly, the supply voltage is boosted at the rising edges. The dips near the falling edges can be considered as ripples since there is no decoupling capacitance connected. The last (fourth) part of the figure illustrates the voltage waveform when the CBD is turned on and the load is switching. The resulting supply voltage experiences a low level of noise because of the cancellation of peaks and dips.

Figure 5.17: Simulated VDD voltage (on a 1GHz clock) with a CBD on and off (best case). Panels, top to bottom: Clk; VDD voltage, CBD off (no decap); VDD voltage, CBD on (no load); VDD voltage, CBD on (with load).

The above example can be considered a best-case scenario because the current demand of the load and the rising clock edge are synchronized. The worst case, on the other hand, occurs when the current demand lines up with the falling clock edge. The simulation result at a 1GHz external clock for the worst case is shown in Fig. 5.18.
In the figure, the voltage sags created by the current demand are similar whether the CBD circuit is turned on or off. The peaks created by the CBD are out of phase with the sags produced by the current demand. However, if the average VDD value per clock cycle is used, similar to the previous approaches, it is clear that the CBD circuit produces a much higher average VDD voltage because the voltage peaks in every clock cycle raise the average VDD.

Figure 5.18: Simulated VDD voltage (on a 1GHz clock) with a CBD on and off (worst case). Panels, top to bottom: Clk; VDD voltage with CBD on and CBD off.

The simulations above are intended only to illustrate how the CBD circuit behaves under controlled conditions. Another set of simulations was used to compare the CBD with the passive decap and the active decap (both n=2 and n=3) for a clock frequency of 1GHz. As before, the size of the user logic buffer is changed to produce different supply noise levels k. The results are plotted in Fig. 5.19. At all k levels, the average VDD voltage for the CBD is higher than for the passive decap and the two active decaps. When k=0.15 for the passive decap, the average supply noise for the CBD is still at a satisfactory level of 100mV. At the same k level, the average noise from the basic and extended active decaps falls close to that of the passive decap. This indicates that the CBD is more effective as a drop-in replacement than the other schemes.

Figure 5.19: Simulated average VDD voltage as a function of k showing the case of CBD.

5.4 Test Chip Setup and Measurement Results

To validate this new approach to hot-spot removal, another test chip was fabricated in the same 90nm process.
Of interest is how the degree of improvement changes as the operating frequency increases when a basic active decap, an extended active decap, or a charge-borrowing decap is used as a drop-in replacement for a passive decap. The test chip setup is shown in Fig. 5.20. The decap circuits are individually controlled to be connected to the supply rail. In this case, the passive decap can be disconnected from the supply to observe the performance of the active decaps and the CBD by themselves.

Figure 5.20: Test chip 2 setup. (Diagram labels: nominal supply (1V), mimicked inductance, parasitic capacitances.)

Figure 5.21: Layout of charge-borrowing decap showing the relative size of the components.

The layout of the charge-borrowing decap for the test chip is shown in Fig. 5.21. Unlike the circuit diagram shown in Fig. 5.16, the ring oscillator and the buffer chain were not implemented, as the Clk signal was provided from the same external clock for synchronization purposes. The rest of the circuit, along with the passive decap 2Cdecap, was implemented on the test chip. The switching circuitry is located on the left, with the PMOS-formed decap on the right. The total layout area for the CBD is about 600µm x 150µm = 0.09mm², in which the decap occupies an area of 0.083mm². The switching circuitry, including the diode transistors and the bootstrapping circuit, accounts for 8% of the area. The gate-source voltage across the decap oxide is always less than or equal to one VDD, so thin-oxide devices are acceptable in this case. Thus, thin-oxide PMOS transistors were used to implement the decap in the CBD.

Figure 5.22: (a) Annotated microphotograph and (b) layout of the second test chip.
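The area figures quoted for the CBD layout (about 600µm x 150µm in total, 0.083mm² of decap, and 8% for the switching circuitry) can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the function and variable names are hypothetical, and the dimensions are the approximate values given in the text.

```python
# Cross-check of the CBD layout area quoted in the text (approximate values).
UM_PER_MM = 1000.0


def area_mm2(width_um, height_um):
    """Rectangle area in mm^2 from dimensions given in micrometers."""
    return (width_um / UM_PER_MM) * (height_um / UM_PER_MM)


total_area = area_mm2(600.0, 150.0)   # total CBD layout, about 0.09 mm^2
decap_area = 0.083                    # mm^2 occupied by the decap, as quoted

# Whatever is not decap is attributed to the switching circuitry.
switching_fraction = (total_area - decap_area) / total_area

print(f"total area: {total_area:.3f} mm^2")
print(f"switching circuitry share: {switching_fraction:.1%}")
```

The remaining share comes out at roughly 7.8%, which rounds to the quoted 8%.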
The microphotograph and the layout of the second test chip are shown in Fig. 5.22. In the figure, the decap circuits are placed about 600µm away from the user logic, in which a large buffer with a large capacitive load was used to create supply noise, with the input controlled by an external signal. Similar to the first test chip, the size of the decaps was chosen to be only a few times larger than the capacitive load to create a voltage drop of about 100mV for the experiments. Note that the load cannot be dynamically changed to produce a variable k level. The clock frequency can be controlled externally by providing an input from a bit-rate analyzer. The probed pad of the supply node can be connected to an oscilloscope to measure the voltage waveforms. The test chip area is 1.1mm x 0.86mm = 0.95mm² in total.

A collection of 9 sample chips was tested. The clock frequency was fixed at 1GHz for this test. The improvement across the sample space is shown in Fig. 5.23. The sample chips that are in the slow or normal process corners cannot be distinguished easily. However, the two sample chips that are in the fast process corner can be identified by correlating with simulation. As mentioned in Chapter 4, the average supply voltage varies significantly, mainly due to the random process variation in Rmesh and Lpack, which determines the IR drop on the supply rails. Note that the supply noise reduction achieved by the CBD is rather consistent under process variations across dies, which indicates the robustness of the design. The sample chip that provides the best improvement was used for further analysis.

Figure 5.23: Scatter plot comparing average VDD for the tested sample chips. (Legend: passive decap, active decap n=2, active decap n=3, CBD; the noise budget is marked; horizontal axis: sample number, 1 to 9, with the fast-corner samples labeled.)
The waveforms of a test chip with a 1GHz clock are depicted in Fig. 5.24. The dark gray curve is when the CBD is on, and the light gray (red in color) curve is when the passive decap is on. The two curves are superimposed. As expected, the supply voltage returns to high when extra charge is fed through from the clean supply on the rising edge of the clock, improving the average level per cycle.

Figure 5.24: Superimposed waveforms showing a CBD and a passive decap on a 1GHz clock. In the top panel of the figure, the upper trace is when the CBD is on, while the lower trace is when the passive decap is on.

It is also useful to obtain the CBD performance over the operating frequency range. Fig. 5.25 shows the impact of changing the external input frequency from 100MHz to 1.5GHz. Two solid trend lines are provided, corresponding to the CBD and the passive decap. The gap between the two trend lines widens, indicating that the benefit of using the CBD increases as frequency increases. Even at 1.5GHz, the average VDD level for the CBD is significantly higher than for the passive decap, suggesting that the CBD is suitable for today’s high-speed ASICs and medium-speed custom designs. At 1.5GHz, the test chip validated that the CBD has a maximum improvement of 93mV (or about 53% less noise) over the passive decap, 74mV (48% less noise) over the basic active decap with n=2, or 46mV (36% less noise) over the extended active decap with n=3. From 100MHz to 1.5GHz, compared to the passive decap, using the CBD reduces the supply noise by 42% to 55%. Note that the CBD outperforms both the basic and the extended active decaps across the operating frequency range.
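The paired figures reported at 1.5GHz (an absolute improvement in mV and a relative noise reduction in percent) imply a baseline noise level for each reference scheme, which gives a quick consistency check. The sketch below is not taken from the thesis. It assumes that each percentage is the improvement divided by the average noise of the scheme being compared against, and the function names are hypothetical.

```python
def implied_baseline_noise_mv(improvement_mv, fractional_reduction):
    """Baseline noise (mV) implied by an absolute and a relative improvement."""
    return improvement_mv / fractional_reduction


def implied_cbd_noise_mv(improvement_mv, fractional_reduction):
    """Residual CBD noise (mV): implied baseline noise minus the improvement."""
    baseline = implied_baseline_noise_mv(improvement_mv, fractional_reduction)
    return baseline - improvement_mv


# (improvement in mV, fractional noise reduction) at 1.5GHz, as quoted in the text
reported = {
    "passive decap": (93.0, 0.53),
    "active decap n=2": (74.0, 0.48),
    "active decap n=3": (46.0, 0.36),
}

for name, (dv, frac) in reported.items():
    baseline = implied_baseline_noise_mv(dv, frac)
    cbd = implied_cbd_noise_mv(dv, frac)
    print(f"{name}: baseline ~{baseline:.0f} mV, CBD ~{cbd:.0f} mV")
```

Under that assumption, all three comparisons point to a CBD noise level of roughly 80 to 83mV, so the reported numbers are mutually consistent.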
Figure 5.25: Measured average VDD voltages at different clock frequencies. (Legend: passive decap, active decap n=2, active decap n=3, CBD; horizontal axis: input clock frequency in GHz.)

Another important observation is that, unlike the active decaps, the CBD does not seem to have a specific frequency at which the average VDD voltages from the CBD and the passive decap cross over. In other words, there is always a gap in the average supply voltage between the CBD and the passive decap across the frequency range. This makes intuitive sense since the CBD boosts the supply voltage at every clock cycle regardless of the clock frequency. Although the noise increases as the frequency increases, the amount of charge provided by the CBD circuit remains roughly the same at every cycle (assuming that it is running at the clock frequency).

From the test results of the sample chips, it can be concluded that the charge-borrowing decap is capable of reducing the supply noise more efficiently than the passive and active decaps. In addition, the CBD has other attractive features, such as higher bandwidth, simplicity, and robustness. Moreover, the use of active decaps increases the power consumption of the chip because of the internal switching circuitry. In the case of the CBD, the local power overhead is fairly small since there is only very limited leakage current and most charge transferred from the clean supply to the noisy supply is not wasted. Specifically, the leakage power of the CBD is about 4µW, about 0.1% of the total power consumption. However, the dynamic power of the Clk generation circuit is comparable to that of the active decap. Therefore, an equivalent power overhead combined with better performance makes the CBD circuit an appealing alternative to the active or passive decaps.

Although the charge-borrowing decap provides many advantages, it has a number of limitations.
One important issue is that the supply voltage change is abrupt once the CBD is turned on. High-frequency glitches on the supply may affect the logic circuit powered by it. To smooth out the supply voltage, a large amount of passive decap should be present in its vicinity. Although parasitic decoupling capacitance is inherently present on the supply, more decaps may still be needed. As a result, the existing area may not be completely replaceable by the CBD, as a certain portion of the area should be reserved for the passive decaps. For a fixed area, the proportion of the CBD and the passive decap depends on the sensitivity of the local logic circuit to the supply glitches. Questions such as the maximum allowed distance from the decap to the CBD and the minimum required size of the decap remain unanswered at this stage. However, since the local circuitry of the CBD is relatively simple (a passive decap plus a diode-connected transistor), it is still an attractive alternative to the active decaps.

5.5 Summary

This chapter further extended the concept of the active decap by increasing the stack height n to find an optimal value, depending on the supply noise level k present at the local supply. It was found that the extended active decap with n=3 provided superior performance by delivering a higher average supply voltage than the basic active decap with n=2 when k is below 14%. This chapter also introduced a novel design of charge-borrowing decap to provide better supply noise reduction than the basic and the extended active decaps. The charge-borrowing decap delivers more charge and an increased supply boost for a wide range of operating frequencies. Its relatively simple design and implementation help ensure its robustness.
Chapter 6 Conclusions and Future Work

6.1 Summary and Conclusions

As technology scales further into the deep submicron regime, increasing clock frequency and decreasing supply voltage make maintaining the quality of the power supply a critical issue. On-chip power supply noise, due to IR drop and L di/dt effects, has a great impact on delay variation, and may even cause improper functionality. Power supply noise can be reduced by placing decoupling capacitors close to power pads and large drivers throughout the power distribution system. Decaps provide locally “instantaneous” current to the switching drivers and keep the power supply within certain noise budgets. Traditionally, a decap is made from an NMOS transistor outside the standard-cell blocks, or a pair of NMOS and PMOS transistors within the blocks. However, starting from 90nm technology, the reduction in oxide thickness of MOS transistors causes an increased ESD risk and more gate leakage. Standard decap designs, therefore, may no longer be appropriate for 90nm and beyond.

In this dissertation, the goal was to provide practical solutions to active and passive decap designs targeting ASIC applications in both white-space and standard-cell areas. The dissertation began with an overview of decap design basics, the gate leakage phenomenon, ESD concerns, and standard-cell decap layout and placement. Some essential decap design issues were highlighted in the background section to motivate the topics for the rest of the dissertation. More importantly, the metric for power supply noise management was proposed and validated for the decap performance comparisons used throughout the dissertation.

Next, the tradeoffs between high-frequency performance of decaps and ESD protection were investigated, and their impacts on the layout of standard-cell passive decaps were discussed.
A design metric was introduced to determine the optimal number of fingers to use in the standard-cell layout to obtain a desired capacitance level over a target operating frequency. Models were developed to capture the frequency responses of Reff and Ceff for any given technology with only a few parameters. For ESD protection, a cross-coupled design that had been previously proposed by cell library developers was shown to suffer from reduced frequency response and to provide no savings in gate leakage. It was shown that more fingers than typically used were needed to provide the target resistance value for sufficient ESD protection. The layout with the smallest NMOS device and a multi-fingered PMOS device was described as delivering acceptable frequency response and ESD reliability, while providing the lowest leakage.

For white-space areas, the effectiveness of active decaps as a late-stage drop-in replacement for passive decaps was evaluated so that a completed chip layout need not be disrupted near the tapeout deadline. Improvements to the design of an active decoupling capacitor were described for removal of hot-spot IR-drop violations in ASIC designs running at up to 1GHz. The modified active decap using latch-based comparators in 90nm CMOS is able to switch in 0.5ns and consumes a relatively low power of 2.8mW, which is about 5X lower than a previous design running at approximately the same speed. This reduced power makes it more suitable for use in ASIC designs. Measurement results from test chips indicate an improvement over passive decaps of 10%-20% when operating from 200MHz to 1GHz. The optimal active decap size to maximally remove hot-spot IR drop was identified. Placement analysis was also carried out, and it was found that the active decap is most effective when placed in close proximity to the hot spot, whereas the passive decap is not as sensitive to the exact location.
If sized and placed properly, active decaps can be up to 20% better when used as drop-in replacements for passive decaps for power supply noise reduction.

Further research on active decap design was explored. By increasing the stack height n to an optimal value that depends on the supply noise level k present at the local supply, a maximum supply boost can be achieved. It was found that the extended active decap with n=3 provided superior performance by delivering a higher average supply voltage than the n=2 and n=4 cases when the supply noise k is in the range of 7-14%. When k is above 14%, n=2 must be used, and beyond 16%, the area for the drop-in replacement of active decaps must be expanded to produce satisfactory improvement over the passive decap.

Finally, the novel design for the charge-borrowing decap was proposed. This design provides better supply noise reduction than all other forms of active decaps. The charge-borrowing decap efficiently transfers charge from a clean supply rail to eliminate the hot spots on relatively noisy supply nodes. The CBD requires only a minimal power overhead and delivers a maximum supply boost for a wide range of operating frequencies. Test results indicate that the CBD outperforms both the basic and the extended active decaps by reducing the supply noise to a lower level. The design and implementation of the CBD was kept relatively simple so that the robustness of the design can be maintained.

6.2 Contributions in this Dissertation

The following summarizes the major contributions in this dissertation:

• Developed standard-cell passive decap designs that properly trade off gate leakage, ESD reliability, and transient response. Provided simple yet practical decap design metrics and guidelines for 90nm and 65nm CMOS technologies.

• Designed and implemented a white-space active decap using latch-based comparators that provides adequate supply noise reduction while consuming relatively low static power.
Validated the design through a test chip. Explored the placement issues of the active decap.

• Extended the concept of the active decap to an optimal design that produced the highest supply boost for the maximum supply noise reduction in a local area. Proposed a simple but novel charge-borrowing decap circuit that outperforms the basic and the extended active decaps. Validated the design through another test chip.

6.3 Future Work

The limitations of the charge-borrowing decap design need to be evaluated further. Quantitative answers to questions such as the optimal distance between the CBD and the logic, the accurate proportion between the CBD area and the nearby passive decap area, and the maximum allowable charge transfer from the clean supply node while still maintaining its supply integrity should be investigated. A test chip with real industrial blocks placed as the user logic circuit is desired for the most accurate decap performance evaluation.

Another issue is the scalability of the active decaps and the CBD. The design and implementation of the active decap were only accomplished and validated in 90nm CMOS. It is desirable to make the active decap designs useful in future technologies, such as 65nm and 45nm CMOS. As technology scales, more design challenges will occur. If any part of the design needs to be modified to accommodate the more advanced technology, it should be investigated in future research.

Monitoring power supply fluctuations on-chip in real time is also an emerging area of research [20]. The measured results of real-time power supply noise from monitoring circuitry can be used as a validation of decap design and placement. Many techniques have been used to monitor power supply noise on-chip [68][69]. These techniques are not suitable for production environments as they either need a significant area or require complex data processing off-chip.
To overcome these limitations, a simple monitoring scheme based on an under-sampling technique [70] is worth investigating. Under-sampling is used to capture a high-frequency periodic signal over a large number of cycles using a slower sampling signal to achieve an effective high-speed sampling rate. In the case of power supply noise, the dynamic voltage drop is not periodic in nature, but the same experiment could be repeated several times, with the sampling point skewed each time by a small time shift, Δt, which represents the effective sampling period, resulting in an equivalent sampling frequency of f_under-sampling = 1/Δt [70]. Using this technique, the measurements may be repeated to average and cancel out the noise effect. This approach needs to be evaluated from concept to test chip in order to validate its advantages and disadvantages, and it serves as a promising area of future work.

APPENDIX: COMPARATOR DESIGN FUNDAMENTALS

Comparators are one of the most widely used components in analog integrated circuits. A voltage comparator is a circuit that compares the instantaneous value of an input signal with a reference signal and produces an output at logic level, depending on whether the input is greater or smaller than the reference level [57]. One important application for high-speed voltage comparators is in data converters, where the conversion speed is limited by the response time of the comparators [57]. Other issues related to comparator design include finite resolution, offset, power, and area. As technology scales, more advanced CMOS technologies allow comparators to be realized for higher speed and potentially smaller area and power. However, it is difficult to achieve high speed and high accuracy at the same time because of the existence of device mismatches [71].

A widely used comparator configuration is a high-gain differential input, single-ended output amplifier, whose symbol is shown in Fig. A.1.
The output of the comparator should have a large swing, ideally from VDD to VSS, as the input varies across a small swing, typically in the millivolt range [57]. In many applications, a comparator is used in open-loop operation, such that no frequency compensation is required [57][58][59]. However, in certain cases, due to the nature of AC coupling of the output and the input, a comparator may need frequency compensation to avoid oscillations [32].

Figure A.1: A differential input, single-ended output comparator symbol.

Figure A.2: DC transfer characteristics of (a) an ideal comparator and (b) a practical comparator with finite gain and offset voltage.

The DC characteristic curve of an ideal differential comparator is shown in Fig. A.2(a). When the positive input V+ is greater than the negative input V-, the output is high (i.e., at VDD). When V+ is less than V-, the output is low (i.e., at VSS). This ideal DC transfer curve corresponds to a differential gain of infinity. That is, an infinitely small polarity change in (V+ - V-) will cause the output to switch. A more realistic DC transfer curve of a comparator is depicted in Fig. A.2(b). In practice, the differential gain is finite and equal to Av. In Fig. A.2(b), the two voltages VIL and VIH are the overdrive voltages (also called the input excess voltages). The overdrive is the input level that drives the comparator from an initial saturated input condition to an input level that barely causes the output of the comparator to switch its level [57]. Another non-ideal effect of the comparator is the input-referred DC offset voltage, which is mainly caused by device mismatches. If no offset voltage is present, the comparator DC transfer curve will be symmetrical around the point where V+ = V-.
However, for a finite offset voltage of VOS, the comparator output will switch at (V+ - V- + VOS) = 0. In general, the output voltage Vout of a comparator can be written as [57][58]:

$$V_{out} = \begin{cases} V_{DD}, & \text{if } \Delta V_{in} > V_{IH} \\ A_v \, \Delta V_{in}, & \text{if } V_{IL} \le \Delta V_{in} \le V_{IH} \\ V_{SS}, & \text{if } \Delta V_{in} < V_{IL} \end{cases} \tag{A.1}$$

where ΔVin = V+ - V- + VOS. Clearly, the finite gain and offset voltage affect the accuracy of the comparator. It is desired that a comparator is designed to have a high gain and a small offset voltage.

Three major challenges exist in any comparator design: high speed, high resolution, and low power [72]. High speed is achieved by having a fast response time. That is, following an input polarity change, the comparator should switch its output between VDD and VSS with fast rise and fall times. In order to achieve the highest comparison speed, the minimum channel length for a specific technology is often used in comparator designs [71]. In addition, low power consumption is always desired. As technology scales, due to VDD scaling, the dynamic power of a comparator will scale accordingly. Meanwhile, the static power consumption of the comparator may increase due to larger leakage current in a more advanced deep-submicron technology. Overall, the total power consumption, including both dynamic and static power, may reduce as technology shrinks [72].

A high gain Av is essential to achieve high resolution in a comparator design. For example, if the comparator needs to resolve 1mV of input variation while the output switches a full swing of 1V (VDD), the required voltage gain Av is 1V/1mV = 1000. It is difficult to achieve such a high gain within one stage of amplification. Hence, a multi-stage amplifier or a regenerative latch using positive feedback may be used as a comparator to achieve the high gain requirement. A latch is normally faster than a multi-stage amplifier achieving the same gain [60][71][73]. Therefore, latch-based comparators are often used in practice.
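The piecewise DC transfer characteristic of a practical comparator and the gain requirement worked out above can be sketched in a few lines. This is an illustrative model only, not the comparator used in the dissertation: the function names are hypothetical, the sign of the offset term follows the (V+ - V- + VOS) expression in the text, and the rail clamp assumes that the linear region ends exactly where the scaled input reaches VDD or VSS (that is, VIH = VDD/Av and VIL = VSS/Av).

```python
def comparator_output(v_plus, v_minus, gain, v_os=0.0, vdd=1.0, vss=0.0):
    """Piecewise-linear DC transfer curve: slope `gain`, clamped to the rails."""
    # Effective differential input; the sign convention for v_os is an assumption.
    dv_in = v_plus - v_minus + v_os
    # Linear region in the middle, saturated at VDD and VSS outside it.
    return min(vdd, max(vss, gain * dv_in))


def required_gain(output_swing_v, input_resolution_v):
    """Voltage gain needed for a given output swing and input resolution."""
    return output_swing_v / input_resolution_v


print(required_gain(1.0, 1e-3))                # 1V swing resolved from 1mV
print(comparator_output(0.51, 0.50, 1000.0))   # saturates at VDD
print(comparator_output(0.50, 0.51, 1000.0))   # saturates at VSS
```

With a gain of 1000, a 10mV input difference is already far outside the linear region, so the output sits at a rail. Only differences below about 1mV produce an intermediate output, which is what limits the resolution.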
Figure A.3: Circuit schematics showing positive feedback: (a) a basic differential pair with diode-connected load, (b) a gain-enhanced differential pair, and (c) a differential pair using positive feedback to provide increased gain.

The concept of positive feedback used in the latch approach can be explained with the example shown in Fig. A.3. In Fig. A.3(a), a differential pair is loaded with two diode-connected PMOS devices. Its small-signal gain Av(a) is given by [64][73]:

$$A_{v(a)} = -\frac{g_{m1}}{g_{m3}} = -\sqrt{\frac{\mu_n (W/L)_{M1}}{\mu_p (W/L)_{M3}}} \tag{A.2}$$

where gm1 and gm3 are the transconductances of M1 and M3, and µn and µp are the channel mobilities of NMOS and PMOS devices, respectively. In Fig. A.3(b), a gain enhancement approach is added to the circuit with two additional transistors, M5 and M6. The small-signal gain Av(b) is given by [54][57]:

$$A_{v(b)} = \frac{A_{v(a)}}{1 - (2 I_5 / I_{b1})} \tag{A.3}$$

where I5 and Ib1 are the currents flowing through M5 and Mb1, respectively. By properly choosing the size of M5, the small-signal gain can be improved. In Fig. A.3(c), the drain terminals of M5 and M6 are cross-connected, creating a form of positive feedback. The positive feedback functions as follows: when V+ is slightly larger than V-, it causes Vo1 to be slightly smaller than Vo2. The larger Vo2 forces M6 to deliver less current to the node Vo1. Similarly, the smaller Vo1 forces M5 to deliver more current to charge up Vo2. This positive loop reinforces Vo2 to reach VDD and Vo1 to reach VSS [54][57]. The small-signal gain Av(c) is given by [57]:

$$A_{v(c)} = \frac{A_{v(a)}}{1 - \dfrac{(W/L)_{M5}}{(W/L)_{M3}}} \tag{A.4}$$

Equation A.4 requires that the size ratio (W/L)M5 / (W/L)M3 is less than 1. When the size ratio is greater than unity, the small-signal gain will become infinite and the circuit will operate as a regenerative latch [57].

REFERENCES

[1] H. H. Chen and S. E.
Schuster, “On-chip decoupling capacitor optimization for high-performance VLSI design,” Symposium on VLSI Technology, Systems, and Applications, pp. 99-103, May-Jun. 1995.
[2] S. Pant and E. Chiprout, “Power grid physics and implications for CAD,” IEEE/ACM Design Automation Conference, pp. 199-204, Jul. 2006.
[3] N. Srivastava, X. Qi, and K. Banerjee, “Impact of on-chip inductance on power distribution network design for nanometer scale integrated circuits,” International Symposium on Quality of Electronic Design (ISQED), pp. 346-351, Mar. 2005.
[4] C. W. Fok and D. L. Pulfrey, “Full-chip power-supply noise: the effect of on-chip power-rail inductance,” International Journal of High Speed Electronics and Systems, vol. 12, no. 2, pp. 573-582, Jun. 2002.
[5] J. Kim, B. Choi, H. Kim, W. Ryu, Y.-H. Yun, S.-H. Hamm, S.-H. Kim, and Y.-H. Lee, “Separated role of on-chip and on-PCB decoupling capacitors for reduction of radiated emission on printed circuit board,” IEEE International Symposium on Electromagnetic Compatibility, pp. 531-536, Aug. 2001.
[6] H. H. Chen, J. S. Neely, M. F. Wang, and G. Co, “On-chip decoupling capacitor optimization for noise and leakage reduction,” Symposium on Integrated Circuits and Systems Design, pp. 319-326, Sep. 2003.
[7] D. A. Hodges, H. G. Jackson, and R. A. Saleh, Analysis and Design of Digital Integrated Circuits in Deep Submicron Technology, 3rd ed., New York: McGraw-Hill, 2004.
[8] N. Na, T. Budell, C. Chiu, E. Tremble, and I. Wemple, “The effects of on-chip and package decoupling capacitors and an efficient ASIC decoupling methodology,” Electronic Components and Technology Conference (ECTC), pp. 556-567, Jun. 2004.
[9] H. Su, S. S. Sapatnekar, and S. R. Nassif, “Optimal decoupling capacitor sizing and placement for standard-cell layout designs,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 22, no. 4, pp. 428-436, Apr. 2003.
[10] M. Popovich and E. G.
Friedman, “Decoupling capacitors for multi-voltage power distribution systems,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 14, no. 3, pp. 217-228, Mar. 2006.
[11] M. Popovich, E. G. Friedman, M. Sotman, A. Kolodny, and R. M. Secareanu, “Maximum effective distance of on-chip decoupling capacitors in power distribution grids,” IEEE/ACM Great Lakes Symposium on VLSI, pp. 173-179, May 2006.
[12] P. Larsson, “Parasitic resistance in an MOS transistor used as on-chip decoupling capacitance,” IEEE Journal of Solid-State Circuits, vol. 32, no. 4, pp. 574-576, Apr. 1997.
[13] P. Larsson, “Resonance and damping in CMOS circuits with on-chip decoupling capacitance,” IEEE Transactions on Circuits and Systems-I: Fundamental Theory and Applications, vol. 45, no. 8, pp. 849-858, Aug. 1998.
[14] M. D. Powell and T. N. Vijaykumar, “Exploiting resonant behavior to reduce inductive noise,” International Symposium on Computer Architecture, pp. 288-299, Jun. 2004.
[15] C. H. Ng, C. S. Ho, S.-F. S. Chu, and S.-C. Sun, “MIM capacitor integration for mixed-signal/RF applications,” IEEE Transactions on Electron Devices, vol. 52, no. 7, pp. 1399-1409, Jul. 2005.
[16] M. W. C. Goh, Q. Lim, R. A. Keating, A. V. Kordesch, and Y. Bin Mohd Yusof, “Design of radio frequency metal-insulator-metal (MIM) capacitors,” International Conference on Solid-State and Integrated Circuits Technology, pp. 209-212, Oct. 2004.
[17] C. H. Ng, C. S. Ho, N. G. Toledo, and S.-F. Chu, “Characterization and comparison of single and stacked MIMC in copper interconnect process for mixed-mode and RF applications,” IEEE Electron Device Letters, vol. 25, no. 7, pp. 489-491, Jul. 2004.
[18] C. H. Ng, C. S. Ho, S.-F. S. Chu, and S.-C. Sun, “MIM capacitor integration for mixed-signal/RF applications,” IEEE Transactions on Electron Devices, vol. 52, no. 7, pp. 1399-1409, Jul. 2005.
[19] H. Yamamoto and J. A.
Davis, “Decreased effectiveness of on-chip decoupling capacitance in high-frequency operation,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 15, no. 6, pp. 649-659, Jun. 2007. [20] K. Arabi, R. Saleh, and X. Meng, “Power supply noise in SoCs: metrics, management, and measurement,” IEEE Design and Test of Computers, vol. 24, no. 3, pp. 23 6-244, May-Jun. 2007. [21] TSMC 9Onm CLN9OG Process SA GE-X v3. 0 Standard Cell Library Databook, Release 1.0, Artisan Components Inc., Sunnyvale, CA, 2004. [22] X. Meng, R. Saleh, and K. Arabi, “Layout of decoupling capacitors in IP blocks for 90-nm CMOS,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 16, no. 11, pp. 1581-1588, Nov. 2008. [23) X. Meng, K. Arabi, and R. Saleh, “Novel decoupling capacitor designs for sub-9Onm CMOS technology,” International Symposium on Quality Electronic Design (ISQED), pp. 266-272, Mar. 2006. [24] A. Amerasekera and C. Duvvury, ESD in Silicon Integrated Circuits, 2 ed., Hoboken, NY: John Wiley & Sons, 2002. [25] J. Fu, Z. Luo, X. Hong, T. Cai, S. X. -D. Tan, and Z. Pan, “VLSI on-chip power/ground network optimization considering decap leakage currents,” Asia and South PacJIc Design Automation Conference, vol. 2, pp. 735-738, Jan. 2005.  141  [261 K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, “Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits,” Proceedings of IEEE, vol. 91, no. 2, pp. 305-327, Feb. 2003. [27] F. Hamzaoglu and M. Stan, “Circuit-level techniques to control gate leakage for sub lOOnm CMOS,” International Symposium on Low Power Electronics and Design, pp. 60— 63, Aug. 2002. [28] R. S. Guindi and F. N. Najm, “Design techniques for gate-leakage reduction in CMOS circuits,” International Symposium on Quality Electronic Design (ISQED), pp. 6 1-65, Mar. 2003. [29] D. Lee, W. Kwong, D. Blaauw, and D. 
Sylvester, “Analysis and minimization techniques for total leakage considering gate oxide leakage,” IEEE/A CM Design Automation Conference, pp 175-180, Jun. 2003. [30] L. Chang, K. J. Yang, Y. -C. Yeo, Y. -K. Choi, T. -J. King, and C. Hu, “Reduction of direct-tunneling gate leakage current in double-gate and ultra-thin body MOSFETs,” IEEE Transactions on Electron Devices, vol. 49, no. 12, pp. 2288-2295, Dec. 2002. [311 X. Meng, K. Arabi, and R. Saleh, “A novel active decoupling capacitor design in 9Onm CMOS,” IEEE International Symposium on Circuits and Systems (ISCAS), pp. 657-660, May 2007. (Top 10 Honorable Mention Award) [32] X. Meng and R. Saleh, “An improved active decoupling capacitor for “hot-spot” supply noise reduction in ASIC designs,” IEEE Journal of Solid-State Circuits, vol. 44, no. 2, pp. 584-593, Feb. 2009. [33] R. Saleh, D. Overhauser, S. Taylor, “Full-chip verification of UDSM designs,” IEEE/A CM International Conference on Computer-Aided Design, pp. 45 3-460, Nov. 1998. [34] S. Sapatnekar, “High-performance power grids for nanometer technologies,” IEEE International Conference on VLSIDesign, pp. 839-844, Jan. 2004. [35] G. Bai, S. Bobba, and I. N. Hajj, “Static timing analysis including power supply noise effect on propagation delay in VLSI circuit,” IEEE/A CM Design Automation Conference, pp. 295-300, Jun. 2001. [36] H. Harizi, R. HauBler, M. Olbrich, and E. Barke, “Efficient modeling techniques for dynamic voltage drop analysis,” IEEE/A CMDesign Automation Conference, pp. 706-711, Jun. 2007. [37] M. Ang, R. Salem, and A. Taylor, “An on-chip voltage regulator using switched decoupling capacitors,” IEEE International Solid-State Circuits Conference, pp. 438-439, Feb. 2000. [38] M. A. Ang, and A. D. Taylor, “Voltage regulating circuit for attenuating inductance induced on-chip supply variations,” U.S. Patent 6509785, Jan. 21, 2003.  142  [39] C. Giacomotto, R. P. Masleid, and A. 
Harada, “Four-state switched decoupling capacitor system for active power stabilizer,” U.S. Patent 6744242 Bi, Jun. 1, 2004. [40] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits. A Design nd ed., Upper Saddle River, NJ: Prentice Hall, 2004. Perspective, 2 [41] J. Gu, H. Eom, and C. Kim, “A switched decoupling capacitor circuit for on-chip supply resonance damping”, Symposium on VLSI Circuits, pp. 126-127, Jun. 2007. [42] J. Gu, R. Harjani, and C. Kim, “Distributed active decoupling capacitors for on-chip supply noise cancellation in digital VLSI circuits”, Symposium on VLSI Circuits, pp. 216217, Jun. 2006. [43] W. C. Lee and C. Hu, “Modeling gate and substrate currents due to conduction- and valence-band electron and hole tunneling,” Symposium on VLSI Technology, pp. 198-199, Jun. 2000. [44] K. Cao, W. -C. Lee, W. Liu, X. Jin, P. Su, S. K. H. Fung, J. X. An, B. Yu, and C. Hu, “BSIM4 gate leakage model including source drain partition,” International Electron Devices Meeting (IEDM), pp. 815-818, Dec. 2000. [45] X. Xi, M. Dunga, J. He, W. Liu, K. M. Cao, X. Jin, J. J. Ou, M. Chan, A. M. Niknejad, and C. Hu, “BSIM4.4.0 MOSFET model user’s manual,” University of California, Berkeley, 2004. [46] C. K. Alexander and M. N. 0. Sadiku, Fundamentals of Electric Circuits, New York: McGraw-Hill, 2000. [47] X. W. Wang, Y. Shi, T. P. Ma, G. J. Cui, T. Tamagawa, J. W. Golz, B. L. Halpen, and J. J. Schmitt, “Extending gate dielectric scaling limit by use of nitride or oxynitride,” Symposium on VLSI Technology, pp. 109-110, Jun. 1995. [48] T. P. Ma, “Opportunities and challenges for high-k gate dielectrics,” International Symposium on the Physical and Failure Analysis of Integrated Circuits (IPFA), pp. 1-4, Jul. 2004. [49] T. P. Ma, “Electrical characterization of high-k gate dielectrics,” International Conference on Solid-State and Integrated Circuits Technology, pp. 36 1-365, Oct. 2004. [50] V. George, S. Jahagirdar, C. Tong, K. Smits, S. Damaraju, S. 
Siers, V. Naydenov, T. Khondker, S. Sarkar, and P. Singh, “Penryn: 45-nm next generation Intel® CoreTM 2 processor,” IEEE Asian Solid-State Circuits Conference (ASSCC), pp. 14-17, Nov. 2007. [51] M. T. Bohr, R. S. Chau, T. Ghani, and K. Mistry, “The high-k solution,” IEEE Spectrum, vol. 40, no. 10, pp. 29-35, Oct. 2007. [52] J. Chia, “Design, layout and placement of on-chip decoupling capacitors in IP blocks,” MA.Sc Thesis, University of British Columbia, 2004.  Q 143  [53] S. Zhao, K. Roy and C. -K. Koh, “Decoupling capacitance allocation and its application to power-supply noise-aware floorplanning,” IEEE Transactions on Computer-Aided Design ofIntegrated Circuits and Systems, vol. 21, no. 1, pp 81-92, Jan. 2002. [54] J. R. Hauser, “Bias sweep rate effects on quasi-static capacitance of MOS capacitors,” IEEE Transactions on Elecfron Devices, vol. 44, no. 6, pp. 1009-1012, Jun. 1997. [55] W. Liu, MOSFET Models for SPICE Simulation Including BSIM3v3 and BSIM4, Wiley IEEE Press, 2001. [56] H. Johnson and M. Graham, High-Speed Digital Design, Prentice-Hall, 1993. [57] R. Gregorian, Introduction to CMOS Op-Amps and Comparators, New York: John Wiley & Sons, 1999. [58] R. J. Baker, CMOS: Circuit Design, Layout, and Simulation, Press, 2005.  nd 2  ed., Piscataway, NJ: IEEE  [59] D. A. Jones and K. Martin, Analog Integrated Circuit Design, New York: John Wiley & Sons, 1997. [60] 5. Sheikhaei, S. Mirabbasi, and A. Ivanov, “A 43mW single-channel 4G5/s 4-bit flash ADC in 0.l8jim CMOS,” IEEE Custom Integrated Circuits Conference (CICC), pp. 353356, Sep. 2007. [61] E. Alon and M. Horowitz, “Integrated regulation for energy-efficient digital circuits,” IEEE Journal ofSolid-State Circuits, vol. 43, no. 8, pp. 1795-1807, Aug. 2008. [62] X. Meng and R. Saleh, “Active decap design considerations for optimal supply noise reduction,” International Symposium on Quality Electronic Design (ISQED), pp. 765-769, Mar. 2009. [63] X. Meng, R. Saleh, and S. 
Wilton, “Charge-borrowing decap: a novel circuit for removal of local supply noise violations,” accepted to IEEE Custom Integrated Circuits Conference (CICC), Sep. 2009. [64] B. Razavi, Design ofAnalog CMOS Integrated Circuits, New York: McGraw Hill, 2001. [65] T. B. Cho and P. R. Gray, “A lOb, 2oMsamples/s, 35mW pipeline AID converter,” IEEE Journal ofSolid-State Circuits, vol. 30, no. 3, pp. 166-172, Mar. 1995. [66] A. M Abo and P. R. Gray, “A 1 .5-V, 10-bits, 14.3-MS/s CMOS pipeline analog-to-digital converter,” IEEE Journal ofSolid-State Circuits, vol. 34, no. 5, pp. 599-606, May 1999. [67] K. A. Jenkins, K. L. Shepard, and Z. Xu, “On-chip circuit for measuring period jitter and skew of clock distribution networks,” IEEE Custom Integrated Circuits Conference (CICC), pp. 157-160, Sep. 2007.  144  [68] E. Alon, V. Stojanovic, and M. A. Horowitz, “Circuits and techniques for high-resolution measurement of on-chip power supply noise,” IEEE Journal of Solid-State Circuits, vol. 40, no. 4, pp. 820-828, Apr. 2005. [69] T. Nakura, M. Ikeda, and K. Asada, “Design and measurement of on-chip dildt detector circuit for power supply line,” IEEE Asia-PacUlc Conference on Advanced System Integrated Circuits, pp. 426-427, Aug. 2004. [70] B. Kaminska and K. Arabi, “Mixed signal DFT: a concise overview,” IEEE International Conference on Computer-Aided Design, pp. 672-680, Nov. 2003. [71] B. Murmann, “AID converter trends: power dissipation, scaling and digitally assisted architectures,” IEEE Custom Integrated Circuits Conference (CICC), pp. 105-112, Sep. 2008. [72] R. J. van de Plassche, J. H. Huij sing, and W. Sansen, Analog Circuit Design: High-Speed Analog-to-Digital Converters; Mixed-Signal Design; PLL ‘s and Synthesizers, Boston: Kluwer Academic Publishers, 2000. [73] M. Gustavsson, J. J. Wikner, and N. Tan, CMOS Data Converters for Communications, Boston: Kiuwer Academic Publishers, 2000.  145  


