UBC Theses and Dissertations


Design and analysis of active and passive decoupling capacitors for on-chip power supply noise management
Meng, Xiongfei
2009-11-12


DESIGN AND ANALYSIS OF ACTIVE AND PASSIVE DECOUPLING CAPACITORS FOR ON-CHIP POWER SUPPLY NOISE MANAGEMENT

by

XIONGFEI MENG

B.A.Sc., The University of British Columbia, 2004
M.A.Sc., The University of British Columbia, 2006

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES
(Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

July 2009

© Xiongfei Meng, 2009

ABSTRACT

On-chip decoupling capacitors (decaps) in the form of MOS transistors are widely used to reduce power supply noise in both standard-cell blocks and white spaces between blocks. This research provides guidelines for layouts of decaps that properly trade off high-frequency response, electrostatic discharge (ESD) reliability and gate tunneling leakage for use within standard-cell blocks in ASIC designs in 90nm and 65nm CMOS technologies. A simple but effective metric is developed to determine the optimal decap layout based on the frequency response. Novel active designs are also presented.

If an IR-drop violation (hot spot) is found after the physical design is completed, it is usually difficult to implement a quick fix to the problem. In this dissertation, the use of an active decap in white-space areas as a drop-in replacement for passive decaps is investigated to provide noise reduction for these "hot-spot" problems found late in the design process. A modified active decap design is proposed for ASIC applications operating up to 1GHz, and the use of latch-based comparators provides a better power-delay trade-off. Measurement results from a test chip show that the noise reduction using active decaps improves as operating frequency increases, and provides between 10%-20% noise reduction at 200MHz-1GHz over its passive counterpart.

The concept of active decap is further extended to achieve lower supply noise.
It is found that an active decap with a stack height of three (i.e., number of pieces switching) provides the best noise reduction if the supply noise level is between 7%-14%, but a stack height of two is best if the noise level is between 14%-16%. In addition, a novel charge-borrowing decap circuit is introduced which outperforms all forms of active decaps for a fixed area in terms of removing local hot spots.

TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgments
Chapter 1 Introduction
  1.1 Motivation
  1.2 Research Objectives
  1.3 Organization of the Dissertation
Chapter 2 Background
  2.1 Introduction
  2.2 Decoupling Capacitor Basics and Design Challenges
  2.3 Thin-Oxide Gate Tunneling Leakage
  2.4 Electrostatic Discharge Reliability in Decap Design
  2.5 Standard-Cell Decap Layout and Placement
  2.6 Metric for Power Supply Noise Management
  2.7 Summary
Chapter 3 Passive Decoupling Capacitor Design
  3.1 Introduction
  3.2 High-Frequency Response of Decoupling Capacitors
  3.3 Cross-Coupled Decoupling Capacitor Designs
  3.4 Summary
Chapter 4 Active Decoupling Capacitor Design
  4.1 Introduction
  4.2 Active Decoupling Capacitor Analysis and Design
    4.2.1 Active Decap Concept and Design Considerations
    4.2.2 Overall Active Decap Architecture
    4.2.3 Design Specifications
    4.2.4 Latch-Based Comparator Design
  4.3 Chip Design and Experimental Results
    4.3.1 Test Chip Setup
    4.3.2 Test Chip Simulations
    4.3.3 Test Chip Measurements
    4.3.4 Measurement Results on One Typical Chip
  4.4 Active Decap Size and Placement
  4.5 Summary
Chapter 5 Generalized Active Decap and Charge-Borrowing Decap
  5.1 Introduction
  5.2 Extended Active Decoupling Capacitor
    5.2.1 Optimal Stack Height n
    5.2.2 Design and Layout of n=3 Extended Active Decap
    5.2.3 Simulation Results
  5.3 Charge-Borrowing Decap (CBD)
    5.3.1 Charge-Borrowing Decap Concept
    5.3.2 "Clk" Signal Generation
    5.3.3 Design of Charge-Borrowing Decap
    5.3.4 Simulation Results
  5.4 Test Chip Setup and Measurement Results
  5.5 Summary
Chapter 6 Conclusions and Future Work
  6.1 Summary and Conclusions
  6.2 Contributions in this Dissertation
  6.3 Future Work
Appendix: Comparator Design Fundamentals
References

LIST OF TABLES

Table 1.1: Comparison on active and passive decap implementations
Table 3.1: Optimal number of fingers for different frequency ranges
Table 3.2: Comparison of the passive decap designs and their gate leakage current
Table 4.1: Design specifications of the active decap
Table 4.2: Transistor sizes of the comparators
Table 4.3: Simulated switching circuit design specification comparison
Table 4.4: Comparator delay td and delay difference Δtd in different corners
Table 4.5: Measured active decap performance for different process corners
Table 4.6: Comparison between equation and simulated result after correlation
Table 4.7: Active decap bandwidth versus average comparator delay under process corners
Table 5.1: Optimal stack height n selection based on the supply noise k (from formula)
Table 5.2: Optimal stack height n selection based on the supply noise k (from simulation)
Table 5.3: High-level comparison among passive, active, and charge-borrowing decaps

LIST OF FIGURES

Figure 1.1: Effectiveness of on-chip decaps: (a) without decaps, the power supply noise level can be high, compared to (b) where the noise level is low due to the use of decaps
Figure 2.1: Decoupling capacitor implemented using an NMOS device
Figure 2.2: Cross-coupled decap schematic [21]
Figure 2.3: Active decap concept shown in (a) and two previous designs from (b) [37][38] and (c) [39]
Figure 2.4: Gate leakage current versus gate area
Figure 2.5: Gate leakage current density Jleak versus oxide thickness tox
Figure 2.6: Complete ESD protection scheme
Figure 2.7: Simulation setup for ESD analysis [24]
Figure 2.8: Sample layout of standard-cell N+P decap (a) with one finger and (b) with two fingers
Figure 2.9: DVDavg and DVDm: metric used to evaluate DVD profiles
Figure 3.1: (a) A standard cell decap can be implemented as a NMOS in parallel with a PMOS device. The corresponding layout is shown in (b)
Figure 3.2: A decap can be implemented as a NMOS device and modeled as a lumped-RC circuit with effective resistance and effective capacitance as functions of frequency, f
Figure 3.3: The circuit setup to extract the effective resistance and the effective capacitance values from an ac analysis
Figure 3.4: Plots of Ceff and Reff for three device sizes (W x L): 10x10µm, 15x5µm, and 5x15µm
Figure 3.5: Plots of Ceff and Reff for three NMOS devices (HSPICE versus model)
Figure 3.6: The effective capacitance, Ceff(f), of NMOS and PMOS decaps in 90nm for different numbers of fingers in a fixed area of Y=2µm and X=9µm
Figure 3.7: The effective capacitance, Ceff(f), of NMOS and PMOS decaps in 90nm for different numbers of fingers in a fixed area of Y=2µm and X=9µm
Figure 3.8: Ceff(f) and Reff(f) comparison of fixed-area standard decap and cross-coupled decap: same MOS device sizes but different poly connections
Figure 3.9: Sample layouts of cross-coupled decap cells for (a) 3N-4P (b) 8N-9P
Figure 3.10: Sample layouts of improved decap cells for (a) 1N-9P (b) 1N-16P
Figure 3.11: Frequency response of various cross-coupled designs
Figure 4.1: Active decap concept and its MOS implementation
Figure 4.2: The reductive factors f and g for the boosted voltage as a function of (a) "on" resistances of the switches, R, and (b) leakage due to the size of decap Cdecap
Figure 4.3: Active decap architecture
Figure 4.4: Difference in the input voltages of the comparators as a function of supply noise k
Figure 4.5: The reductive factor h for the boosted voltage as a function of delay difference Δtd
Figure 4.6: Complementary comparator design: (a) n-type input for the top comparator, and (b) p-type input for the bottom comparator
Figure 4.7: (a) DC and (b) AC (compensated) characteristic curves for the two-stage latch-based comparator design (n-type input shown)
Figure 4.8: AC characteristic curves for the three designs
Figure 4.9: Test chip setup
Figure 4.10: Layout of active decap showing the relative size of the components
Figure 4.11: Final voltage a as a function of sensing and switching circuitry area overhead x
Figure 4.12: Annotated test chip microphotograph
Figure 4.13: Simulated VDD voltage (on a 500MHz clock) with active decap on and off
Figure 4.14: Simulated VDD voltage with active decap on for different process corners
Figure 4.15: Grouped scatter plot for active decap noise reduction for the tested sample chips
Figure 4.16: Comparator delay td and delay difference Δtd as a function of supply noise k
Figure 4.17: Simulated average VDD voltage with active decap on and off versus clock frequency for (a) slow, (b) typical, and (c) fast process corners
Figure 4.18: Measured results (on a 500MHz clock) for (a) active decap on and (b) plotted comparison between active decap on and off
Figure 4.19: Measured (0.2 - 1GHz) and simulated (1 - 2.5GHz) average VDD voltage with active decap on and off versus clock frequency
Figure 4.20: Simulated average VDD noise per clock cycle versus normalized decap size
Figure 4.21: Power supply noise reduction difference from active decap and passive decap with area overhead from switching circuit of active decap
Figure 4.22: Improvement on average VDD noise for using active decaps in different placement locations by varying Rdist and Rmesh
Figure 5.1: Active decap concept showing different stack height: (a) n=2, (b) n=3, and (c) n=4
Figure 5.2: MOS implementation of the extended active decap (n=3)
Figure 5.3: Final voltage aVDD as a function of stack height n with k varying (fixed area)
Figure 5.4: Final voltage aVDD as a function of k with different stack height n (from formula)
Figure 5.5: Extended active decap (n=3) architecture
Figure 5.6: Layout of extended active decap (n=3) showing the relative size of the components
Figure 5.7: Simulated VDD voltage with extended active decap (n=3) on for two different k levels
Figure 5.8: Average VDD voltage as a function of k with different stack height n (from simulation)
Figure 5.9: Charge-borrowing decap concept shown in (b), compared to a passive decap in (a)
Figure 5.10: Diode inserted charge-borrowing decap showing the states of the clocking signal: (a) Clk at 0, (b) Clk rises to VDD, and (c) Clk falls back to 0
Figure 5.11: Possible implementations of charge-borrowing decap with (a) NMOS-formed diodes and (b) PMOS-formed diodes
Figure 5.12: Improved implementation of charge-borrowing decap with PMOS-formed diodes
Figure 5.13: Generation of the boosted voltage on node B2
Figure 5.14: "Clk" signal generation using ring oscillator
Figure 5.15: Buffer size determines the edge delay while the buffer chain (with ring oscillator) consumes dynamic power
Figure 5.16: Complete circuit diagram of charge-borrowing decap
Figure 5.17: Simulated VDD voltage (on a 1GHz clock) with a CBD on and off (best case)
Figure 5.18: Simulated VDD voltage (on a 1GHz clock) with a CBD on and off (worst case)
Figure 5.19: Simulated average VDD voltage as a function of k showing the case of CBD
Figure 5.20: Test chip 2 setup
Figure 5.21: Layout of charge-borrowing decap showing the relative size of the components
Figure 5.22: (a) Annotated microphotograph and (b) layout of the second test chip
Figure 5.23: Scatter plot comparing average VDD for the tested sample chips
Figure 5.24: Superimposed waveforms showing a CBD and a passive decap on a 1GHz clock. In the top panel of the figure, the upper trace is when the CBD is on, while the lower trace is when the passive decap is on
Figure 5.25: Measured average VDD voltages at different clock frequencies
Figure A.1: A differential input, single-ended output comparator symbol
Figure A.2: DC transfer characteristics of (a) an ideal comparator and (b) a practical comparator with finite gain and offset voltage
Figure A.3: Circuit schematics showing positive feedback: (a) a basic differential pair with diode-connected load, (b) a gain-enhanced differential pair, and (c) a differential pair using positive feedback to provide increased gain

ACKNOWLEDGMENTS

I would like to express my gratitude to my academic supervisor, Dr. Resve Saleh, whose expertise, understanding, encouragement, and support added significantly to my graduate experience. I appreciate his profound and vast knowledge in many areas, both within and outside of the scope of my research. I would like to thank the exam committee, Dr. Steve Wilton, Dr. David Pulfrey, Dr. Mark Greenstreet, Dr. John Madden, Dr. Lutz Lampe and Dr. Tim Salcudean, for reviewing this dissertation and providing valuable feedback.

A very special thanks goes to my colleagues and friends in the SoC group, for their technical advice and kindness. More specifically, I would like to thank Dr. Shahriar Mirabbasi, Dr. Roberto Rosales, Dipanjan Sengupta, Dr. Mehdi Alimadadi, Dr. Samad Sheikhaei, Jeff Mueller, and Sohaib Majzoub. I also acknowledge Dr. Karim Arabi of Qualcomm and Asad Shayan of PMC-Sierra for their suggestions and help in this study.

I recognize that this research would not have been possible without the financial support from NSERC and PMC-Sierra Inc., and express my gratitude to them. I also thank CMC Microsystems for providing chip fabrication and the CAD tools.

Last but not least, I would especially like to thank my family for both giving and encouraging me to seek for myself a demanding and meaningful education.
In particular, I must acknowledge my wife, Liming, for her love, caring and patience through many years of my life. I would not have accomplished this research without her support. The appreciation extends beyond any words at my command.

Chapter 1
Introduction

1.1 Motivation

Scaling of CMOS technology allows higher speed and higher functional density. As the clock frequency increases and the supply voltage decreases to about 1V, maintaining the quality of the power supply has become a primary issue. Power supply noise in the form of voltage variations arises due to IR drop and Ldi/dt effects [1]. The IR drop has been increasing over time due to increased resistances in the power grid as the metal widths continue to shrink with each successive technology generation. The inductive Ldi/dt effects are also increasing due to the higher current demands in more complex chips. However, the pin and package inductance overwhelms the inductance of the on-chip power distribution network, and therefore the on-chip inductance can be neglected [2], although the on-chip inductance may be considered in certain applications [3][4]. Combining the two components, the overall voltage drop ΔVdrop at any point in the power grid is [5][6][7]:

\Delta V_{drop} = I_{supply} R_{mesh} + L_{pack} \frac{dI_{supply}}{dt}    (1.1)

where Rmesh is the power grid (mesh) resistance, Lpack is the package and pin inductance, and Isupply is the current flow through the user logic circuits.

Figure 1.1: Effectiveness of on-chip decaps: (a) without decaps, the power supply noise level can be high, compared to (b) where the noise level is low due to the use of decaps.

A variety of different methods can be used to manage supply voltage drops. Among them, the most popular is to use on-chip decoupling capacitors (decaps) to maintain the power supply within a certain percentage (e.g., 10%) of the nominal supply voltage [1][6]. Decaps are typically placed in regions between areas of high current demand and the power pads and I/O pins [7][8][9][10][11].
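As a rough numerical illustration of Equation (1.1), the sketch below adds the resistive and inductive terms for one operating point. The function name and all component values are hypothetical, chosen only to show the arithmetic; they are not taken from the dissertation.

```python
def supply_voltage_drop(i_supply, di_dt, r_mesh, l_pack):
    """Voltage drop per Equation (1.1): IR drop plus the L*di/dt term.

    i_supply : supply current in amperes
    di_dt    : rate of change of the supply current in A/s
    r_mesh   : power grid (mesh) resistance in ohms
    l_pack   : package and pin inductance in henries
    """
    return i_supply * r_mesh + l_pack * di_dt


# Hypothetical operating point (illustrative values only).
i_supply = 0.5    # A
di_dt = 0.05e9    # A/s, i.e. 50 mA per nanosecond
r_mesh = 0.1      # ohm
l_pack = 1e-9     # H
vdd = 1.0         # V, nominal supply

drop = supply_voltage_drop(i_supply, di_dt, r_mesh, l_pack)
print(f"IR term     : {i_supply * r_mesh * 1e3:.1f} mV")
print(f"L di/dt term: {l_pack * di_dt * 1e3:.1f} mV")
print(f"total drop  : {drop * 1e3:.1f} mV ({100 * drop / vdd:.1f}% of VDD)")
```

With these assumed values the two terms contribute equally, which is one way to see why both the grid resistance and the package inductance matter when a noise budget of roughly 10% of the supply is targeted.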
The effectiveness of decaps can be illustrated conceptually in Fig. 1.1, where the power system can be modeled as a distributed RLC network [8][12][13][14]. The power supply noise level of Fig. 1.1(a) is reduced in Fig. 1.1(b) due to the use of decaps.

In Application Specific Integrated Circuit (ASIC) designs, two types of decaps can be identified: white-space decaps and standard-cell decaps. White-space decaps are placed in the open areas of the chip between intellectual property (IP) blocks, and are made from passive decaps or active decaps. Standard-cell decaps are always passive decaps placed within the IP blocks themselves [9], typically as filler cells.

Passive decaps can be implemented with MOS transistors or metal-insulator-metal (MIM) capacitors. White-space decaps are usually implemented with NMOS transistors since they provide a high capacitance value per unit area. In certain applications, PMOS devices and MIM capacitors [15][16][17][18] are two alternatives for white-space decaps. Standard-cell decaps normally use both NMOS and PMOS devices. A more recent approach is the active decap, which requires dynamic switching of passive decaps to boost the power rail voltages when excessive voltage drop is detected. Due to its area requirement, it can only be used in the open areas between blocks. The implementation of active and passive decaps in terms of NMOS and PMOS devices in either white-space areas or standard cells is summarized in Table 1.1. This research addresses design and implementation issues for both active and passive decaps.

Table 1.1: Comparison on active and passive decap implementations.

                 Active decap     Passive decap
White space      NMOS and PMOS    NMOS or PMOS (or MIM)
Standard cell    Not used         NMOS and PMOS

The lack of sufficient decaps can result in unsatisfactory timing and even functional failure for the logic circuits and memory cells [19].
On the other hand, overdesign may cost too much area. It is necessary to develop a metric to evaluate the decap effectiveness in terms of power supply noise management [20]. Starting from 90nm, a number of relatively new issues [21] must be addressed that impact the design and layout of decaps. This research addresses three important decap problems: frequency response [22][23], electrostatic discharge (ESD) protection [24], and gate tunneling leakage [6][25][26][27][28][29][30]. The frequency response controls the performance of decaps at increasingly higher operating frequencies. A potential ESD event across a thin gate oxide increases the likelihood that a chip will be permanently damaged due to a short circuit in the decap itself. Higher gate leakage significantly increases the total static power consumption of the chip.

In white-space areas, prior to the 90nm technology, the use of passive decaps was sufficient. At 90nm, the oxide thickness has been reduced to 2nm or less. Therefore, decaps have been redesigned into a cross-coupled form [21] to protect the device from potential electrostatic discharge (ESD) induced oxide breakdown [24]. However, the additional series resistance considerably reduces the transient response of the decap [23]. As a result, large IR-drop levels in localized regions (usually called "hot-spot" IR-drop violations) may unexpectedly be present in high-speed ASICs. These unresolved hot spots cause timing closure problems or result in functional failures in extreme cases. To remove them, designers must consider many options such as moving the logic blocks, adding or rearranging the power pins, and/or modifying the power grid design. Near the tapeout deadline, such time-consuming design iterations may not always be feasible. In these situations, an active decap design can play an important role in reducing supply noise without major changes to the design [31].
That is, the active decap can be a drop-in replacement of the passive decap in an attempt to remove any remaining hot spots, thereby saving time and effort [32].

The concept of active decap can be extended with more switches and decaps to ideally achieve lower supply noise. These extended active decap designs need to be evaluated for advantages and limitations so that an optimal design can be made practical. In addition, a novel circuit called charge-borrowing decap that transfers charge from a clean supply node to a noisy supply node will be introduced to produce superior performance to the basic and the extended active decaps.

1.2 Research Objectives

Research Statement:

To investigate new designs and proper placement of active and passive decoupling capacitors to efficiently manage on-chip power supply noise in white-space areas and standard cells for ASIC applications.

Specific Research Goals and Contributions:

• Design passive decaps in standard-cell arrays that properly trade off gate leakage, ESD and transient response, and provide decap design metrics to determine the optimal layout to obtain a desired capacitance level over a target operating frequency. Develop empirical formulae with only a few parameters to capture the frequency responses for 90nm and 65nm CMOS technologies.

• Design the basic active decap to provide better power-delay trade-off than prior approaches for white-space hot-spot removal. Identify and resolve limitations of the active decap in terms of suitability for ASIC applications in deep-submicron technologies. Explore the placement of active decaps to remove late-stage IR-drop violations. Validate the design of the active decap using a chip design that provides testing mechanisms to evaluate improvements in dynamic voltage drops.

• Extend the concept of active decap by modifying its design for improved supply noise management. Achieve a better active decap design that provides a higher level of power supply noise reduction than the basic form of active decaps within a fixed area. Propose a novel circuit that outperforms both the basic and the extended active decaps with help from a clean supply rail.

1.3 Organization of the Dissertation

The remainder of this dissertation is organized as follows. Chapter 2 provides the necessary background on decap design basics and challenges, gate tunneling leakage through decaps, ESD reliability of thin-oxide gates, standard-cell layout and placement of decaps, and metrics for power supply noise management.

Chapter 3 develops a set of new passive decap designs based on the cross-coupled decap. The modeling of the new designs is described and design metrics are provided to allow hand calculations and analyses to be carried out. Based on the simulation results, the proper layout of these designs is described.

Chapter 4 proposes a modified active decap design for hot spot removal in ASIC applications. The design advantages and disadvantages are compared against prior work. Measurement results from a test chip are used to validate the design. After correlation with measurement results, further simulation is carried out to explore the efficiency of active decap placement.

Chapter 5 extends the concept of the active decap to achieve a better design that has a higher level of power supply noise reduction. Also presented in the chapter is a novel charge-borrowing decap circuit that outperforms the basic and the extended active decaps.

Chapter 6 summarizes the results of the dissertation and provides conclusions. Future research directions are provided.

Chapter 2
Background

2.1 Introduction

The topics in this chapter provide the necessary background for the rest of the dissertation. Some fundamental and practical decap design issues are also highlighted to motivate the topics in the remainder of the dissertation.
This chapter begins with an overview of design challenges and problems associated with decoupling capacitors in 90nm and 65nm. The overview includes gate tunneling leakage, electrostatic discharge phenomenon and protection, and standard-cell decap placement. Gate leakage is introduced from a physical point of view, and useful information from recent technologies is given. ESD reliability is presented and typical phenomena during an ESD event are discussed. Primary and local ESD protection schemes are briefly illustrated. Since ASIC designs typically utilize standard cells, the decap insertion and placement procedure within standard-cell blocks is briefly introduced. A simple metric for supply noise management is proposed to assess the profiles showing power supply noise and to compare the designs providing different decap performance.

2.2 Decoupling Capacitor Basics and Design Challenges

A passive decap in the white spaces of a chip can be implemented using an NMOS transistor with the gate connected to VDD and both source and drain connected to VSS, as shown in Fig. 2.1. This approach is considered effective because the thin-oxide capacitance of the transistor gate provides a higher capacitance than any other oxide capacitance available in a standard CMOS fabrication process [21]. For design purposes, an approximation for hand calculating the capacitance of this MOS decap can be given by [7]:

C_{decap} \approx W \cdot L \cdot C_{ox}    (2.1)

where W is the transistor width, L is the transistor length, and Cox is the oxide capacitance per unit area. A more accurate capacitance model needs to include the parasitic fringing and overlap capacitance of the transistor, and will be discussed in greater detail in Chapter 3.
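The hand calculation in Equation (2.1) can be sketched in a few lines. This is only an illustration: the helper names are hypothetical, Cox is estimated with the standard parallel-plate relation (permittivity divided by thickness), and the 2nm SiO2 oxide with relative permittivity 3.97 and the 10µm x 10µm device are assumed example values.

```python
EPS0 = 8.854e-12  # vacuum permittivity in F/m


def oxide_capacitance_per_area(t_ox_m, k_rel=3.97):
    """Parallel-plate estimate of Cox in F/m^2 (assumed SiO2, k_rel = 3.97)."""
    return k_rel * EPS0 / t_ox_m


def decap_capacitance(w_m, l_m, c_ox):
    """First-order MOS decap capacitance per Equation (2.1): W * L * Cox.

    Fringing and overlap capacitance are ignored, as in the hand formula.
    """
    return w_m * l_m * c_ox


# Assumed example: 2 nm oxide, 10 um x 10 um NMOS decap.
c_ox = oxide_capacitance_per_area(2e-9)
c_decap = decap_capacitance(10e-6, 10e-6, c_ox)
print(f"Cox    = {c_ox * 1e3:.2f} fF/um^2")   # 1 F/m^2 equals 1e3 fF/um^2
print(f"Cdecap = {c_decap * 1e12:.2f} pF")
```

Because the estimate is linear in W and L, doubling either dimension doubles the capacitance; the frequency-dependent behaviour is outside the scope of this first-order formula.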
Passive white-space decaps can also be implemented with thick-oxide MOS devices, depending on the requirements on ESD and leakage, knowing that there is a capacitance density penalty.

Figure 2.1: Decoupling capacitor implemented using an NMOS device.

In the past, the analysis techniques and design metrics dealing with power supply voltage drop were overly simplistic [33][34][35]. Designers analyzed power supply noise with static voltage drop (SVD) analysis, which might not reflect the true nature of power supply fluctuations, leading to either unnecessary overdesign or risk of timing failures [20]. Although SVD analysis can provide useful feedback in terms of certain glaring errors in the power grid design, it does not take into account the impact of decaps and many other important factors. Dynamic voltage drop (DVD) analysis is emerging as a replacement of SVD analysis to capture the impact of decaps, package inductance, and simultaneous switching events. The drawback is that DVD analysis does not return a fixed value that can assess the degree of improvement. Currently, there is no signoff or analysis metric to characterize a DVD profile [36]. Therefore, a good metric for DVD analysis is desired to evaluate decap design and placement, for the purposes of this research.

At 90nm, the oxide thickness has been reduced to about 2nm or less. Oxide thickness reduction causes two problems: possible oxide breakdown during an ESD event and increased leakage current. ESD is a transient process of static charge transfer that can typically arise from human contact with any IC pin [24]. Additional input resistance can be inserted in series with passive decaps to protect from ESD. However, this input resistance degrades the frequency response of the decap, resulting in poor performance in terms of managing power supply noise. Moreover, increased gate leakage should be considered.
If decaps can be disconnected from the power rails when they are not needed (e.g., the logic circuit nearby is quiet), gate leakage reduction can be achieved. Therefore, overdesign of decaps should be avoided.

Standard-cell layouts of an IP block consist of rows of fixed-height cells in the ASIC design flow. Typically, decap filler cells have both NMOS and PMOS devices. From the 90nm node, a cross-coupled decap design has been proposed by cell library developers [21] to address the issue of ESD reliability, as shown in Fig. 2.2. The cross-coupled design provides additional ESD protection to the thin-oxide gate of the device [21]. Standard-cell decaps are generally implemented with thin-oxide MOS devices.

Figure 2.2: Cross-coupled decap schematic [21].

The major concern for white-space decaps at 90nm and 65nm is a reduced budget for power supply noise as the supply voltage decreases to about 1V [19]. In certain situations, the use of passive decaps only in white-space areas is not sufficient and hot spots at certain locations may appear at a late design stage. Active decaps in the white space may be used to remove hot spots. The goal is to investigate strengths, limitations, design issues and placement strategies for active decaps in ASIC applications.

Active decaps were originally intended for custom designs. Our goal is to optimize them for ASIC designs. Two active decap circuits using switched capacitances were proposed in the past to regulate the supply voltages [37][38][39]. The basic concept of the active decap is to switch a pair of passive decaps Cdecap either in parallel or in series to increase charge delivery capability, as shown in Fig. 2.3(a).

Figure 2.3: Active decap concept shown in (a) and two previous designs from (b) [37][38] and (c) [39].
(Panel labels: (b) [37][38]: single-input single-output amplifiers; (c) [39]: opamp with chain of inverters.)

The two designs in [37] and [39] are effective for power supply noise reduction, but they also have certain limitations. The design in [37] can respond quickly to supply noise but dissipates large power, whereas [39] saves power but experiences long switching delays. Therefore, an improved design with a better power-delay trade-off is desired. The two previous designs [37][38][39] are illustrated in Fig. 2.3(b) and 2.3(c), respectively. The designs in [37][38][39] mitigate the effects of LC resonance, typically in the 20-400MHz band [40]. Recent work has been done to reduce LC resonance using a switched decap technique [41]. Researchers have also reported on a distributed active decap to greatly boost the effective decap value while reducing the area requirements [41]. In addition, the issue of directly replacing an area occupied by a passive decap with an active decap needs to be addressed to determine the degree of noise reduction that can be obtained and the associated tradeoffs of area and power.

2.3 Thin-Oxide Gate Tunneling Leakage

In 90nm and 65nm processes, a new design issue for decaps due to oxide thickness reduction is the thin-oxide gate tunneling current. The current is in the form of tunneling electrons or holes from substrate to gate or from gate to substrate through the gate oxide, depending on the voltage biasing conditions [26]. Two forms of gate tunneling exist: Fowler-Nordheim (FN) tunneling and direct tunneling. For normal operations on short-channel devices, FN tunneling is negligible, and direct tunneling is dominant [26].
In the case of direct tunneling, the gate leakage current in PMOS is much less than in NMOS, and it has been shown experimentally that PMOS gate leakage is roughly three times smaller than NMOS gate leakage for same size transistors [27][43]. The gate leakage simulations can be carried out by using BSIM4 SPICE models [44][45]. Assuming a 90nm technology with 1.7nm oxide thickness and 1.0V power supply, the gate leakage current Ileak is shown in Fig. 2.4.

Figure 2.4: Gate leakage current versus gate area. (Curves: NMOS Ileak, PMOS Ileak, NMOS Cdecap, PMOS Cdecap; horizontal axis: transistor area WL in µm².)

Figure 2.5: Gate leakage current density Jleak versus oxide thickness tox. (Curves: NMOS device, PMOS device; horizontal axis: oxide thickness tox in nm.)

Clearly, as indicated from simulation results, the gate leakage is proportional to the transistor area. That is,

I_{leak} = J_{leak} W L    (2.2)

where Jleak is the gate leakage current density. From simulation, the decoupling capacitance values of the transistor, Cdecap, are also shown in the figure. As described earlier, Cdecap is equal to WLCox to the first order. Since Cox is fixed in this case, Cdecap is proportional to the decap area WL.

The gate leakage current density Jleak and the oxide thickness tox have an empirical relationship as follows, assuming the voltage across the oxide Vox is fixed [43]:

\log J_{leak} = K_1 - K_2 t_{ox}    (2.3)

where K1 and K2 are non-negative experimental constants and are process dependent. Equation (2.3) implies that the gate leakage current is exponentially related to the oxide thickness. A typical Jleak and tox relationship for a fixed Vox at 1V is illustrated in Fig. 2.5.

It is evident that at 90nm and 65nm technologies, the gate leakage from decaps will be significant [27]. The gate leakage contributes to the total static power consumption, and decaps usually occupy a large on-chip area.
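A minimal sketch of how Equations (2.2) and (2.3) combine is given below. The constants K1 and K2, the base-10 logarithm, and the arbitrary current-density units are all assumptions made for illustration; the actual values are process dependent and are not given in the text.

```python
# Hypothetical fitting constants for Equation (2.3); real values are process dependent.
K1 = 12.0  # dimensionless offset
K2 = 5.0   # slope per nanometre of oxide thickness


def leakage_density(t_ox_nm, k1=K1, k2=K2):
    """Jleak from Equation (2.3), assuming a base-10 logarithm (arbitrary units)."""
    return 10.0 ** (k1 - k2 * t_ox_nm)


def gate_leakage(w_um, l_um, t_ox_nm):
    """Ileak from Equation (2.2): current density times gate area W*L."""
    return leakage_density(t_ox_nm) * w_um * l_um


# Leakage scales linearly with area and exponentially with oxide thickness.
ratio_area = gate_leakage(2.0, 1.0, 1.7) / gate_leakage(1.0, 1.0, 1.7)
ratio_tox = leakage_density(1.7) / leakage_density(2.0)
print(f"doubling the area multiplies Ileak by {ratio_area:.1f}")
print(f"thinning tox from 2.0 nm to 1.7 nm multiplies Jleak by {ratio_tox:.1f}")
```

Under these assumed constants a 0.3nm reduction in oxide thickness raises the leakage density by more than an order of magnitude, while the same change raises the capacitance per unit area by only about 18%, which illustrates the trade-off described above.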
The use of PMOS devices exclusively is not a viable solution for high-frequency circuits since they have a poor frequency response relative to the NMOS devices for 90nm and 65nm.

In addition, the amount of gate leakage is also a strong function of the applied bias [28]. If the transistor has a voltage across the oxide, $V_{ox}$, roughly equal to $V_{DD}$, the leakage current density is largest. If the transistor has a $V_{ox}$ close to or below the threshold voltage $V_T$, it leaks significantly less. Indeed, under such a condition, the gate leakage current is typically 3-6 orders of magnitude less, depending on the values of $V_{DD}$ and $t_{ox}$ [28]. Thus, the gate leakage in the second condition can be roughly considered to be zero. In decaps, the gate is at $V_{DD}$ and the source and drain of a transistor are tied together. Therefore, decaps would experience the highest levels of leakage, as a function of $V_{ox}$.

The oxide capacitance $C_{ox}$ is a critical factor for many physical properties of MOS transistors since the drain current of a transistor is proportional to $C_{ox}$. A larger $C_{ox}$ results in a larger drain current and hence a faster transition or a shorter gate delay. On the other hand, the subthreshold current is related to $C_{ox}$: a smaller $C_{ox}$ (or a larger $t_{ox}$) corresponds to a higher threshold voltage $V_T$ and therefore smaller subthreshold current [26]. Each technology generation attempts to increase $C_{ox}$ by roughly 1.4x while reducing the channel length L to 0.7x of the previous technology's channel length [7]. The result is that the product $C_{ox} L$ has been maintained relatively constant as technology scales. The proper $C_{ox}$ selection balances the trade-off between the drain current and the subthreshold current in each technology node.

From Equation (2.3), the gate leakage density is inversely related to $t_{ox}$. A smaller $t_{ox}$ leads to exponentially increasing gate leakage. From a gate leakage perspective, the oxide thickness $t_{ox}$ should be kept large. 
However, $C_{ox}$ is determined by:

$$C_{ox} = \frac{\varepsilon_{ox}}{t_{ox}} \qquad (2.4)$$

where $\varepsilon_{ox}$ is the permittivity of the oxide and is fixed for a given oxide material. Equation (2.4) suggests that if $\varepsilon_{ox}$ is kept unchanged, the increase in $C_{ox}$ will lead to a decrease in $t_{ox}$ and hence an exponential growth in gate leakage [46].

Knowing that the gate leakage increase may be excessive for 90nm and 65nm, in order to keep $t_{ox}$ thick while increasing $C_{ox}$, one can adjust the dielectric constant, k, where $\varepsilon_{ox} = k \varepsilon_0$, and $\varepsilon_0$ is the vacuum permittivity. If a high permittivity (high-k) dielectric can be used instead of the normal SiO2 oxide, the physical oxide thickness $t_{ox}$ would no longer be limited by its electrical property $C_{ox}$. This concept of using high-k dielectrics was presented in [47], and researchers and process engineers have continued to pursue better high-k materials [48]. One of the main goals of using high-k gate dielectrics is to keep the gate leakage under control [48]. Commonly suggested high-k materials include HfO2, HfSiON, ZrO2, and Al2O3, whose permittivity ranges from 10 to 30 [48][49], compared to 3.97 of SiO2. Commercial microprocessors fabricated at a 45nm technology have been developed using high-k materials, and it has been reported that a 10X gate leakage reduction was achieved [50]. Starting from 45nm, most fabrication facilities are anticipated to shift to high-k technology to reduce gate leakage [51]. However, for 90nm and 65nm, the concerns on gate leakage are still significant [27][51].

2.4 Electrostatic Discharge Reliability in Decap Design

ESD protection due to the thin oxide has become an important concern starting from the 90nm technology node. ESD is the process of static discharge that can typically arise from human contact with any IC pin. Approximately 0.6µC of charge is carried on a body capacitance of 100pF, generating a potential of 2kV or higher to discharge from the contacted IC pin to ground for a duration of more than 100ns [24]. 
Under such an event, the peak discharge current is in the ampere range, leading to permanent damage on certain transistors in the chip if not properly protected. The damage can be in one of two forms, or a combination of the two. The first is thermal burnout in devices or interconnects, while the other is oxide breakdown of devices due to the high voltage across the oxide [24]. When running simulations for an ESD event, the maximum current density J of devices and interconnects is measured to check for potential thermal damage. The oxide voltage also needs to be measured to compare with the oxide breakdown voltage of a device for a given fabrication process. The oxide breakdown voltage is almost linearly proportional to the oxide thickness [24]. For instance, assuming a 90nm process uses 1.7nm of oxide thickness, the corresponding oxide breakdown voltage is just below 5V. If the thickness is doubled, the oxide breakdown voltage is also doubled to around 10V [24].

An ESD event can be delivered between any two pins of an IC. To properly protect an IC from ESD damage, an ESD circuit must shunt ESD current between these two pins [24]. In the case of decaps within standard cells, the only two pins that the decaps have access to are the two local power rails, namely $V_{DD}$ and $V_{SS}$. Primary and local (sometimes called secondary) protection elements are needed to protect the two rails by limiting the voltage difference between the two rails to a value below the oxide breakdown voltage. The primary element will shunt most of the ESD current, whereas the local element serves to limit the voltage or current at the local circuit until the primary element is fully operational [24]. A primary element can be a thick oxide transistor, a silicon-controlled rectifier, an open-gate, grounded-gate or coupled-gate NMOS transistor, or a large diode [24]. A local protection element can be simply a diode formed by a grounded-gate NMOS transistor [24].

A typical ESD protection scheme is illustrated in Fig. 2.6. 
In addition to the primary and local elements, a resistor R1 is required to limit the maximum current flow to the decap and to limit the voltage seen from the gate of the decap. For better ESD protection, this resistance is normally large and can be in the form of polysilicon, diffusion, n-well, or channel resistance [24]. The resistance is generally not implemented together with primary and local protection devices. Rather, it is usually inserted within standard cells where ESD damage is a concern.

Previous decap designs (typically before 90nm technology) did not consider ESD performance for two reasons. First, the transistor's oxide thickness was large and the oxide breakdown voltage was high enough that the transistor was likely to survive during an ESD event with adequate protection circuits. Second, insertion of the large resistance dramatically degrades the transient response of the decap. However, starting from 90nm, the gate oxide is so thin that the designer cannot ignore the increased ESD risk. A large resistance is therefore recommended to be placed inside the decap cells to protect the circuit from potential ESD damage. Hence, this tradeoff between ESD reliability and transient response becomes one of the major decap design challenges in 90nm and 65nm technologies.

[Figure 2.6: Complete ESD protection scheme.]

The ESD simulation requires an ESD generation model. Among all the existing models, the human body model (HBM) was adopted for simplicity. Following the standard MIL-STD-883x method 3015.7 [24], a human body can be simulated as a series combination of a 1.5kΩ resistance $R_{HBM}$ and a 100pF capacitance $C_{HBM}$. The capacitor $C_{HBM}$ is initially charged to 2kV, which needs to be discharged through some primary elements. The primary element is arbitrarily chosen to be an ESD diode plus a gate-coupled NMOS device (GCNMOS) with an n-well resistor $R_{nwell}$ (15kΩ) and an NMOS bootstrap capacitor $C_b$. 
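As a rough sanity check on the HBM values quoted above (1.5kΩ, 100pF, 2kV), the following sketch evaluates an idealized discharge. It assumes the capacitor discharges through $R_{HBM}$ into a perfect short, ignoring the voltage across the protection elements, so it is only a first-order estimate and not the simulation setup used in this work.

```python
import math

R_HBM = 1.5e3    # ohms, HBM series resistance
C_HBM = 100e-12  # farads, HBM body capacitance
V_0 = 2.0e3      # volts, initial voltage on C_HBM


def hbm_current(t, r=R_HBM, c=C_HBM, v0=V_0):
    """Ideal RC discharge into a short: i(t) = (v0 / r) * exp(-t / (r * c))."""
    return (v0 / r) * math.exp(-t / (r * c))


charge = C_HBM * V_0       # stored charge in coulombs
tau = R_HBM * C_HBM        # discharge time constant in seconds
i_peak = hbm_current(0.0)  # peak current in amperes

print(f"stored charge = {charge * 1e6:.2f} uC")
print(f"time constant = {tau * 1e9:.0f} ns")
print(f"peak current  = {i_peak:.2f} A")
```

Under this idealization the peak current is about 1.3A and the time constant is 150ns, consistent with the ampere-range current and the discharge duration of more than 100ns stated earlier in this section.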
Two identical primary elements are used to protect the circuit placed in between the HBM generation and the elements, as shown in Fig. 2.7. For simplicity, no secondary element is used.

Since the primary elements are designed to handle large current flow, the maximum current density, $J_{max}$, is assumed to be within the safe range and is not measured. HBM generation raises the voltage level at node $V_{DD}$, and hence turns on the primary elements to discharge. For device protection from oxide breakdown, the voltage across the oxide $V_{ox}$ of each decap transistor needs to be observed in simulation. The $V_{ox}$ voltages should be kept as low as possible, given that the oxide breakdown voltage for a typical 90nm process is below 5V.

[Figure 2.7: Simulation setup for ESD analysis [24]. Blocks: HBM generation (capacitor initially charged to 2kV), primary element, primary element (duplicate).]

2.5 Standard-Cell Decap Layout and Placement

In the white spaces around the chip, decaps are usually made of NMOS devices, as described earlier. However, within standard cells, it is more convenient to make decaps using both NMOS and PMOS transistors to form a decap filler cell. This is because the n-well is already implemented and usually reserved for PMOS devices. Only about half of the cell area is for NMOS devices. One sample standard-cell decap layout is illustrated in Fig. 2.8. In the figure, the NMOS decap occupies roughly the bottom half of the cell area, whereas the PMOS decap is located in the n-well. The capacitor areas are the polysilicon gates placed on top of the channel regions of the MOS transistors. For standard cells, the height of the cell is always fixed, and the designers can only adjust the cell width. Once the cell width is determined, the size of the decap and the capacitance of the decap are established. Fig. 2.8(a) illustrates a large decap cell (measured in cell width) with long channel transistors. 
A fingering technique is commonly used to obtain a smaller effective channel length and thereby improve the decap frequency response. Fig. 2.8(b) depicts the same decap cell but with two fingers.

[Figure 2.8: Sample layout of standard-cell N+P decap (a) with one finger and (b) with two fingers.]

During the placement procedure, computer-aided design (CAD) tools place standard cells into rows. Because the height of each cell is always the same, when cells are placed adjacent to each other, the n-well region and the $V_{DD}$ and $V_{SS}$ lines are automatically aligned. The cells for placement are obtained from the standard-cell library, where all the cells are predefined in width and driving strength. Since the total width of the row is fixed and the individual cell widths are fixed, some empty spaces (typically small) between the cells are left after placing cells. Those empty spaces are good candidates for the placement of decap cells [9]. In fact, a set of decap cells with different cell widths is often included in the standard-cell library.

Decap insertion is considered a part of the complete design flow. In a typical ASIC design flow, once the standard-cell blocks are synthesized, placed and routed by CAD tools, the decap cells are placed into the empty spaces. Generally, since the spaces are filled using a library of decap cells with various sizes, the decap placement is done without affecting the placement of other logic cells. After placement and routing, chip-level timing is analyzed and timing violations are fixed by re-placement and/or rerouting. Then, chip-level IR-drop analysis is carried out by a CAD tool (e.g., Apache Redhawk) such that the hot spots of severe voltage-drop areas are identified [52]. If the voltage drop at the hot spots exceeds the noise budget, more decaps will be inserted into the violation regions and a modification of the placement of other logic cells may have to be done. The logic cell movement requires additional timing and routability analysis before moving on to the next step. 
Then, the chip's IR drop is analyzed again for the remaining hot spots. These steps in the design flow are iterated until all the hot spots are eliminated and all the logic circuits pass timing analysis. Typically, it may take one or two (occasionally even more) iterations to eliminate all the hot spots [7][9]. In addition, the potential problem of electromigration is also checked alongside the IR-drop analysis [7].

This commonly used decap placement approach is not optimal simply because the empty cells may not be located near the high IR-drop regions. After the hot spots are first identified, the remaining empty spaces near the hot spots may not be large enough. Hence, the logic cells may have to be shifted, resulting in a need for additional timing analysis. In order to improve the placement efficiency, researchers suggest a few approaches including: global decap placement between standard-cell blocks [53], decap placement using activity [1], standard-cell decap placement not affecting relative placement of logic cells [9], and earlier-stage decap placement decisions [24]. Since decaps experience excessive gate leakage, decap placement methods considering leakage current are proposed in [6] and [25].

2.6 Metric for Power Supply Noise Management

Static Voltage Drop (SVD) analysis and verification has been an essential part of the overall physical design and verification flow in the semiconductor industry for over ten years [20]. In this approach, the IR drops across the chip are computed by averaging the current drawn by the transistors and blocks from the power grid. The computed fixed values can be fed to timing verifiers to assess the impact on delay. In the past, SVD analysis provided useful feedback in terms of major errors in the power grid design. Going forward, at 90nm and smaller technologies, SVD verification is not enough to ensure power integrity. 
SVD does not take into account the contribution of power density, variations in the switching activity profile, and the impact of inductance and decoupling capacitors (including LC resonance effects) [2]. Therefore, SVD is not an adequate approach to analyze and optimize power delivery networks in SoC designs.

Recently, industry began to use Dynamic Voltage Drop (DVD) analysis as a way to capture the impact of decaps, inductance, and spatial and temporal switching events in the design [20]. DVD is emerging as a replacement for SVD to capture the impact of power supply noise on the timing behavior of logic and memory cells. In order to evaluate the design of active or passive decoupling capacitors on a DVD profile and to quantify its impact on a design's timing performance, two quantities can be used as a metric: $DVD_{avg}$ and $DVD_{max}$, as shown in Fig. 2.9. $DVD_{avg}$ is the DVD profile's average value in the timing cycle, whereas $DVD_{max}$ is the DVD profile's peak value in the same timing cycle. A design is considered better if it has smaller $DVD_{avg}$ and $DVD_{max}$. Users should add design margins to these metrics to account for the metric's simplifications, or they should perform the final signoff process with actual DVD profiles.

[Figure 2.9: $DVD_{avg}$ and $DVD_{max}$: metric used to evaluate DVD profiles.]

Ref. [20] validates the use of this metric. It can be established that the impact of the voltage drop profile on the timing performance of a digital path is equivalent to applying a fixed supply voltage of $V_{DD} - DVD_{avg}$ to the same path. To show this, a logic path was first simulated in the presence of a true dynamic voltage profile on the power supply, and then a DC voltage equal to the average of the voltage profile was used on the power supply [20]. The results show that the timing behaviour of the two cases matches. The intuitive reason for this relationship can be illustrated in Fig. 2.9. 
Considering the delay of the critical path of the circuit between the two flops, gate delay is reduced when the supply voltage overshoots ($V_{DD}(t) > V_{DD,nominal}$) and increased when the supply voltage undershoots ($V_{DD}(t) < V_{DD,nominal}$). When the supply voltage fluctuates, gates that see a supply voltage higher than $V_{DD} - DVD_{avg}$ speed up, and gates that see a supply voltage lower than $V_{DD} - DVD_{avg}$ slow down, compared to the situation where all gates see a supply voltage equal to $V_{DD} - DVD_{avg}$. Therefore, $DVD_{avg}$ is a good measure of the average effect of the IR drop on delay. The use of $DVD_{avg}$ as a metric for IR-drop analysis has been shown to be valid on several industrial designs [20].

Applying the metric, any approach (e.g., decap insertion) that reduces the average dynamic voltage drop ($DVD_{avg}$) or raises the average supply voltage ($V_{DD} - DVD_{avg}$) can be considered a valid solution for reducing power supply noise to improve timing performance, although the instantaneous supply voltage drop may not be affected.

The $DVD_{max}$ value represents the worst-case voltage drop, with a safety margin, that causes a failure in logic circuits and memory cells. That is, if the voltage drop exceeds $DVD_{max}$ by the margin of safety, the behaviour of standard cells or memory cells will be unpredictable. The value of $DVD_{max}$ depends on the tolerance of individual IP blocks to power supply noise.

Ref. [20] suggests that $DVD_{avg}$ should not be bigger than 10% of $V_{DD} - V_{SS}$ and $DVD_{max}$ should not be bigger than 20% of $V_{DD} - V_{SS}$. These percentage values are commonly used in the industry [20]. These limits are considered pessimistic enough to account for the simplified nature of the metric.

By analyzing different DVD profiles using the metric of $DVD_{avg}$ and $DVD_{max}$ [20], an important conclusion can be made: the L di/dt contribution to the voltage drop is not as critical as the IR contribution. L di/dt only affects $DVD_{max}$ and therefore has minimal impact on $DVD_{avg}$ as long as the transfer of charge is completed within the cycle. 
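The observation above can be illustrated with a small numerical sketch of the metric. The waveforms below are hypothetical (a flat 50mV drop standing in for the IR component, plus a brief 150mV spike standing in for an L di/dt event); the function names are illustrative, and the 10% and 20% limits are the ones quoted from [20].

```python
def dvd_metrics(vdd_nominal, samples):
    """Return (DVD_avg, DVD_max) for uniformly spaced supply samples over one cycle."""
    drops = [vdd_nominal - v for v in samples]
    return sum(drops) / len(drops), max(drops)


def within_budget(dvd_avg, dvd_max, vdd, vss=0.0, avg_limit=0.10, max_limit=0.20):
    """Check DVD_avg <= 10% and DVD_max <= 20% of (VDD - VSS), as suggested in [20]."""
    swing = vdd - vss
    return dvd_avg <= avg_limit * swing and dvd_max <= max_limit * swing


VDD = 1.0

# Hypothetical cycle of 100 samples with a steady 50mV IR-like drop.
ir_only = [VDD - 0.05] * 100

# Same cycle with a narrow 150mV spike (2 samples) standing in for L di/dt.
with_spike = list(ir_only)
with_spike[10] = with_spike[11] = VDD - 0.15

avg_a, max_a = dvd_metrics(VDD, ir_only)
avg_b, max_b = dvd_metrics(VDD, with_spike)

print("IR only   :", avg_a, max_a, within_budget(avg_a, max_a, VDD))
print("with spike:", avg_b, max_b, within_budget(avg_b, max_b, VDD))
```

In this made-up example the spike triples the peak drop but moves the average by only 2mV, which is the sense in which a short L di/dt event mainly affects $DVD_{max}$.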
Therefore, L di/dt voltage drop may not significantly affect the timing performance of the chip, as long as it does not create supply fluctuations that exceed $DVD_{max}$. The above observation is consistent with [2].

2.7 Summary

This chapter summarized a number of decap design issues including gate tunneling leakage, ESD protection, standard-cell layout and placement requirements, and the lack of useful metrics for evaluating the results from DVD analysis. The decap design challenges for 90nm and 65nm were described. A simple metric, $DVD_{avg}$ and $DVD_{max}$, was proposed to interpret the DVD results from CAD tools. The metric is best used to compare and evaluate different decap designs.

Chapter 3
Passive Decoupling Capacitor Design

3.1 Introduction

In an ASIC design flow, after placement and routing, empty spaces naturally exist within standard cells. Passive decoupling capacitors, as filler cells, are usually used to fill these empty spaces to reduce IR drop problems locally. This chapter addresses the design and layout of passive decaps [9][23][24] for standard cells at the 90nm technology node. As described in the previous sections, the IR drop has been increasing over time due to increased resistances in the power grid as the metal widths continue to shrink with each successive technology generation. The inductive L di/dt effects are also increasing due to the high current demands of ASIC designs in deep submicron technologies [7][10][11]. The increased supply noise level challenges the design and layout of passive decaps.

A number of relatively new issues for standard-cell decaps must be addressed that impact the design and layout of these cells at scaled technology nodes. Two important problems of decap frequency response and electrostatic discharge (ESD) protection [24] will be addressed. Since decaps are required to perform at increasingly higher operating frequencies, the frequency response [10][12][54] of passive decaps will be investigated first to propose improvements to optimize decap layouts. 
Next, the problems of reduced oxide thickness of a transistor, namely, ESD [24] and thin-oxide gate leakage [6][11], will be explored in the context of decap design. A potential ESD event across a thin gate oxide increases the likelihood that a chip will be permanently damaged due to a short circuit in the decap itself. Higher gate leakage increases the total static power consumption of the chip.

A cross-coupled standard-cell design was proposed [21] to address the issue of ESD performance. The design provides sufficient ESD protection, but does not offer any savings in gate leakage and it may compromise the frequency response. This chapter aims to suggest improved layouts of the cross-coupled design that properly trade off frequency response and ESD performance, while greatly reducing gate leakage current.

3.2 High-Frequency Response of Decoupling Capacitors

Standard-cell layouts of an IP block consist of rows of fixed-height cells in the ASIC design flow. After cell placement is completed, there are a number of empty cells that can be filled with decaps of various sizes depending on the space available. Previous work has addressed the automatic placement and sizing of decap cells [9]. The focus in this chapter is on optimal layout of each decap filler cell. Typically, these standard cells have both NMOS and PMOS devices as shown in Fig. 3.1(a), with a corresponding layout in Fig. 3.1(b). Thin-oxide MOS devices are generally used for standard-cell decap implementation.

[Figure 3.1: (a) A standard cell decap can be implemented as an NMOS in parallel with a PMOS device. The corresponding layout is shown in (b).]

As the frequency of operation increases, a fingering approach is required to implement the layout. That is, a single transistor is split into a number of parallel transistors with the same width, but smaller channel lengths. The overhead of this approach is additional spacing for source/drain contacts and an overall reduction in the low-frequency capacitance. 
However, the average capacitance of the decap over a given frequency range improves as the number of fingers increases. Therefore, the problem of how many fingers to use given a fixed area of a filler cell and fixed gate-oxide thickness needs to be addressed. The objective here is to develop a useful metric to capture the frequency response characteristics in order to choose the optimal number of fingers.

To derive the needed equations, an NMOS decoupling capacitor is depicted first in Fig. 3.2. Non-idealities associated with MOS devices are modeled as a lumped-RC circuit [12] where both the effective resistance, $R_{eff}$, and effective capacitance, $C_{eff}$, are functions of frequency, f, as shown in Fig. 3.2.

[Figure 3.2: A decap can be implemented as an NMOS device and modeled as a lumped-RC circuit with effective resistance and effective capacitance as functions of frequency, f.]

The DC capacitance $C_{eff,0}$ and resistance $R_{eff,0}$ are given by [12][24][55] (where the subscript 0 indicates zero frequency at DC):

$$C_{eff,0} = C_{ox} W L + 2 C_{OL} W \qquad (3.1)$$

$$R_{eff,0} = \frac{L}{12 \mu C_{ox} W (V_{GS} - V_T)} \qquad (3.2)$$

where $C_{ox}$ is the oxide capacitance per unit area, $C_{OL}$ is the overlap and fringing capacitance per unit width of the device, $\mu$ is the channel mobility, $V_{GS}$ is the voltage across the oxide, and $V_T$ is the threshold voltage.

Assuming that a given filler cell has a horizontal dimension X and vertical dimension Y, the channel length of each device in a fingered layout is:

$$L_m = [X - (m - 1) x_{contact}] / m \qquad (3.3)$$

where m is the number of fingers and $x_{contact}$ is the distance between fingers required by contact spacing rules. Modified expressions for $C_{eff,0}$ and $R_{eff,0}$ can be derived as a function of the number of fingers. Thus, the effective capacitance at DC is given by:

$$C_{eff,0}(m) = m \cdot (C_{ox} Y L_m + 2 C_{OL} Y) \qquad (3.4)$$

For capacitance, each additional finger adds extra overlap and fringing capacitances but loses area due to the contact spacing. Therefore, the capacitance actually decreases linearly as the number of fingers increases. 
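The linear dependence on m can be checked with a short sketch of Equations (3.3) and (3.4). The cell dimensions follow the 2µm x 9µm example used later in this chapter; the values of `C_OX`, `C_OL`, and `X_CONTACT` are hypothetical placeholders chosen only for illustration, not process data.

```python
def finger_length(x_um, m, x_contact_um):
    """Equation (3.3): channel length per finger, L_m = [X - (m - 1) * x_contact] / m."""
    return (x_um - (m - 1) * x_contact_um) / m


def c_eff0(m, x_um, y_um, c_ox, c_ol, x_contact_um):
    """Equation (3.4): DC capacitance of m fingers, each of width Y and length L_m."""
    l_m = finger_length(x_um, m, x_contact_um)
    return m * (c_ox * y_um * l_m + 2.0 * c_ol * y_um)


X, Y = 9.0, 2.0   # um, filler cell dimensions
C_OX = 10.0       # fF/um^2 (assumed placeholder)
C_OL = 0.3        # fF/um   (assumed placeholder)
X_CONTACT = 0.3   # um      (assumed placeholder)

caps = [c_eff0(m, X, Y, C_OX, C_OL, X_CONTACT) for m in range(1, 6)]
steps = [b - a for a, b in zip(caps, caps[1:])]

print("C_eff0(m) in fF for m = 1..5:", caps)
# Each added finger changes the capacitance by 2*C_OL*Y - C_OX*Y*x_contact.
print("change per added finger:", steps)
```

Expanding Equation (3.4) gives $C_{eff,0}(m) = C_{ox} Y [X - (m-1) x_{contact}] + 2 m C_{OL} Y$, so the change per added finger is the constant $2 C_{OL} Y - C_{ox} Y x_{contact}$, which is negative whenever the lost gate area outweighs the added overlap capacitance.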
The corresponding equation for $R_{eff,0}$ with m fingers in a parallel combination is:

$$R_{eff,0}(m) = \frac{1}{m} \left( \frac{L_m}{12 \mu C_{ox} W (V_{GS} - V_T)} \right) \approx \frac{1}{m^2} R_{eff,0} \qquad (3.5)$$

In previous work [12], the resistance was used to select the channel length and there were no area constraints involved. However, since the resistance drops off as $m^2$, it is not as important in the selection of a suitable m.

In fact, the goal of an optimal layout should be to provide the highest capacitance value in the given area over a desired operating frequency range, 0 to $f_0$, while delivering a low resistance. A simple metric is needed to evaluate layouts with differing numbers of fingers. The easiest choice for a metric is to use the average capacitance over this frequency range up to $f_0$, as follows:

$$C_{avg}(m) = \frac{C_{eff,0}(m) + C_{eff(m)}(f_0)}{2} \qquad (3.6)$$

where $C_{eff,0}(m)$ is obtained from Equation (3.4) and $C_{eff(m)}(f_0)$ is the effective capacitance with m fingers at frequency $f_0$. A weighted average is also feasible, but it was observed that the simple average works well in practice.

The main issue with the metric is that $C_{eff(m)}(f_0)$ is difficult to compute without the aid of HSPICE or an equivalent simulation tool. To facilitate the process, simple frequency-dependent models for both $C_{eff}$ and $R_{eff}$ are developed. Also, the characteristics of both functions need to be accurate as technology scales. First, a number of AC simulations were performed in HSPICE for a 90nm CMOS technology using non-quasi-static (NQS) models, which are essential when simulating decaps in the gigahertz frequency range of operation. Two parameters, ACNQSMOD and TRNQSMOD, were set to "1" in BSIM4 [55]. The circuit in Fig. 3.3 was used to extract the effective resistance and capacitance from HSPICE results as follows [52]:

$$R_{eff}(f) = \frac{\mathrm{Re}(I_{RC})}{\mathrm{Mag}(I_{RC})^2} \qquad (3.7)$$

$$C_{eff}(f) = \frac{\mathrm{Mag}(I_{RC})^2}{2 \pi f \, \mathrm{Im}(I_{RC})} \qquad (3.8)$$

where $\mathrm{Re}(I_{RC})$, $\mathrm{Im}(I_{RC})$, and $\mathrm{Mag}(I_{RC})$ are the real, imaginary, and magnitude components of $I_{RC}$, respectively. 
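The extraction step can be sketched numerically under the assumption that the decap behaves as a series RC driven by a 1V, 0-degree AC source, so that the impedance is simply the reciprocal of the current phasor. The helper names and the R, C, and frequency values below are illustrative and are not taken from the HSPICE runs.

```python
import math


def series_rc_current(r_ohm, c_farad, f_hz, v_ac=1.0 + 0.0j):
    """Current phasor drawn by a series RC from an AC source of phasor v_ac."""
    z = r_ohm + 1.0 / (1j * 2.0 * math.pi * f_hz * c_farad)
    return v_ac / z


def extract_rc(i_rc, f_hz):
    """Recover (R_eff, C_eff) from the current phasor, assuming a 1V, 0-degree source.

    With Z = 1 / I = R - j / (2*pi*f*C):
      R = Re(I) / |I|^2  and  C = |I|^2 / (2*pi*f*Im(I)).
    """
    mag_sq = abs(i_rc) ** 2
    r_eff = i_rc.real / mag_sq
    c_eff = mag_sq / (2.0 * math.pi * f_hz * i_rc.imag)
    return r_eff, c_eff


# Round trip with illustrative values: 30 ohms in series with 180fF at 5GHz.
i_rc = series_rc_current(30.0, 180e-15, 5e9)
r_eff, c_eff = extract_rc(i_rc, 5e9)
print(r_eff, c_eff)
```

The round trip returns the R and C that generated the current, which is a quick way to confirm that the extraction expressions are dimensionally and algebraically consistent with a series-RC model.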
It is assumed in Equations (3.7) and (3.8) that the applied AC voltage $V_{ac}$ is 1∠0° V.

[Figure 3.3: The circuit setup to extract the effective resistance and the effective capacitance values from an ac analysis.]

Both NMOS and PMOS decaps were simulated with W x L sizes as follows: 15µm x 5µm, 10µm x 10µm, and 5µm x 15µm. The simulation frequency ranged from 0 to 10GHz. Typical ASIC clock rates today are in the range of 500MHz to 1GHz, but it is important to study frequency response well beyond the clock frequency. Most of the spectral power density of digital signals lies within frequencies of up to $f_{knee} = 1/(2 t_{rise})$, where $t_{rise}$ is a signal's rise time (which can be on the order of 50ps or less), and $f_{knee}$ is the 3-dB cutoff frequency of the spectral power density [56]. It was assumed conservatively that $t_{rise}$ = 50ps, and the analysis was carried out up to 10GHz.

[Figure 3.4: Plots of $C_{eff}$ and $R_{eff}$ for three device sizes (W x L): 10x10µm, 15x5µm, and 5x15µm. Panels: large-length $C_{eff}$ NMOS, large-length $R_{eff}$ NMOS, large-length $C_{eff}$ PMOS, large-length $R_{eff}$ PMOS; x-axis: frequency (GHz), 0 to 10.]

The results of the simulations are shown in Fig. 3.4. As f increases, there is a noticeable roll-off in the curves due to finite transit time effects. Devices with large L's have a more pronounced effect. In fact, the $C_{eff}$ curve for 5µm x 15µm quickly decays in value relative to 15µm x 5µm. A general observation on NMOS and PMOS decaps can also be made: NMOS is superior to PMOS in its high-frequency behavior since it has a larger $C_{eff}$ and a smaller $R_{eff}$ at high frequencies, assuming the area is fixed. 
Although standard cells employ both NMOS and PMOS devices for decaps, these results show that NMOS decaps would provide better frequency response characteristics.

For modeling purposes, based on the frequency responses of $C_{eff}$ and $R_{eff}$, it is suggested [55] that the functions can be postulated into the form:

$$C_{eff}(f) = \frac{C_{eff,0}}{1 + (f / f_T)^{2\tau}} \qquad (3.9)$$

$$R_{eff}(f) = \frac{R_{eff,0}}{1 + (f / f_T)^{2\tau}} \qquad (3.10)$$

where $\tau = 1/12$ [55] and the transition frequency $f_T = \frac{\mu (V_{GS} - V_T)}{2 \pi L^2}$. Shown in Fig. 3.5 are the results of curve-fitting using Equations (3.9) and (3.10) against HSPICE for the NMOS device. The results are very close. A factor of 1/2 was applied to $f_T$ in order to produce the results shown in Fig. 3.5. That is, the equation for $f_T$ must be adjusted by a fitting factor of 0.5 in order to obtain good results. Similar results were obtained for PMOS devices. This demonstrates that the first-order equations for $C_{eff}(f)$ and $R_{eff}(f)$ are reasonably accurate and, perhaps more importantly, that $C_{eff(m)}(f_0)$ can be easily computed for the metric without running HSPICE.

[Figure 3.5: Plots of $C_{eff}$ and $R_{eff}$ for three NMOS devices (HSPICE versus model). Panels: large-length $C_{eff}$ NMOS, large-length $R_{eff}$ NMOS; curves: 10x10, 15x5, and 5x15 (W x L, µm), each HSPICE and calculated; x-axis: frequency (GHz), 0 to 10.]

At this point, all the necessary information is obtained to determine the number of fingers based on the frequency response. From Equation (3.9), the effective capacitance in Equation (3.6) at $f_0$ with m fingers is:

$$C_{eff(m)}(f_0) = \frac{C_{eff,0}(m)}{1 + (f_0 / f_{T(m)})^{2\tau}}, \quad \text{where } f_{T(m)} = \frac{\mu (V_{GS} - V_T)}{2 \pi L_m^2} \qquad (3.11)$$

To demonstrate the efficacy of the metric, it was applied to the layout of a standard-cell decap in an available area of 2µm x 9µm. Using Equation (3.6), Table 3.1 lists the $C_{avg}(m)$ metric values for the NMOS or PMOS devices for different frequency ranges. 
The optimal number of fingers corresponds to the largest entry in each row (marked with an asterisk). For example, if the frequency range of interest is 0 to 10GHz, then 3 NMOS fingers and 4 PMOS fingers are optimal relative to the metric. Of course, if the range is 0 to 2GHz, two fingers are sufficient for both N and P devices. Note that PMOS devices typically require one more finger than NMOS devices at higher frequencies of operation.

Table 3.1: Optimal number of fingers for different frequency ranges ($C_{avg}$ metric versus number of fingers).

Frequency range 0-f0   Device   m=1     m=2      m=3      m=4      m=5
0-2GHz                 N        180fF   187fF*   182fF    177fF    171fF
0-2GHz                 P        150fF   183fF*   182fF    177fF    171fF
0-5GHz                 N        150fF   184fF*   182fF    177fF    171fF
0-5GHz                 P        110fF   165fF    178fF*   176fF    171fF
0-10GHz                N        120fF   173fF    180fF*   176fF    171fF
0-10GHz                P        100fF   135fF    166fF    172fF*   169fF

Table 3.1 was shown to illustrate the use of the metric in determining the optimal number of fingers. In practice, the design process would be as follows. First, the area of a filler cell (in particular, the X dimension of the cell) and frequency range of operation are used as input parameters. Then, the capacitance value as a function of m is computed using Equation (3.6). Finally, the value of m producing the highest capacitance is used to implement the layout.

The results in Table 3.1 can be validated by using Equations (3.4) and (3.9) to generate $C_{eff(m)}$ plots for both NMOS and PMOS devices, as shown in Fig. 3.6. The results in the plot were verified with HSPICE to ensure consistency. As an example, consider the cases with 1 finger and $f_0$ = 10GHz. For the NMOS case in Fig. 3.6, $C_{avg}(1)_N = (C_{eff,0} + C_{eff(1)}(f_0))/2 = (190fF + 50fF)/2 = 120fF$, whereas for PMOS, $C_{avg}(1)_P = (C_{eff,0} + C_{eff(1)}(f_0))/2 = (190fF + 10fF)/2 = 100fF$. These are the same values that are found in the 0-10GHz rows of the table with m = 1. 
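The selection step of this procedure can be sketched directly from the tabulated values. The dictionary below simply transcribes the $C_{avg}$ entries of Table 3.1 (in fF); the function and variable names are illustrative and not part of any tool described in this work.

```python
# C_avg values in fF transcribed from Table 3.1; list positions are m = 1..5.
TABLE_3_1 = {
    ("0-2GHz", "N"): [180, 187, 182, 177, 171],
    ("0-2GHz", "P"): [150, 183, 182, 177, 171],
    ("0-5GHz", "N"): [150, 184, 182, 177, 171],
    ("0-5GHz", "P"): [110, 165, 178, 176, 171],
    ("0-10GHz", "N"): [120, 173, 180, 176, 171],
    ("0-10GHz", "P"): [100, 135, 166, 172, 169],
}


def c_avg(c_eff_dc, c_eff_f0):
    """Equation (3.6): simple average of the DC and f0 effective capacitances."""
    return (c_eff_dc + c_eff_f0) / 2.0


def optimal_fingers(c_avg_by_m):
    """Return the finger count m (1-based) that gives the largest C_avg."""
    best_index = max(range(len(c_avg_by_m)), key=lambda i: c_avg_by_m[i])
    return best_index + 1


for (freq_range, device), row in TABLE_3_1.items():
    print(freq_range, device, "->", optimal_fingers(row), "finger(s)")

# Single-finger check at f0 = 10GHz using the values read from Fig. 3.6.
print(c_avg(190, 50), c_avg(190, 10))
```

For the 0-10GHz range this picks 3 fingers for NMOS and 4 for PMOS, matching the discussion above, and the single-finger averages reproduce the 120fF and 100fF entries.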
The rest of the table is produced in the same manner for different values of m and frequency range, 0-$f_0$.

By inspection, the plots indicate that 3 fingers would be optimal for NMOS decaps and 4 fingers would be optimal for PMOS decaps, based on the flatness of the lines and the initial value of the capacitance. This conclusion is consistent with Table 3.1. However, by using the metric, designers can quickly obtain the optimal number of fingers for a target operating frequency, without the need for such plots or SPICE simulations.

[Figure 3.6: The effective capacitance, $C_{eff}(f)$, of NMOS and PMOS decaps in 90nm for different numbers of fingers in a fixed area of Y=2µm and X=9µm. Panels: calculated $C_{eff}$ NMOS, calculated $C_{eff}$ PMOS; curves: 1 to 5 fingers; x-axis: frequency (GHz), 0 to 10.]

[Figure 3.7: Standard-cell decap layouts using (a) NMOS and PMOS devices and (b) NMOS only.]

Fig. 3.7 illustrates how standard cell layouts would be implemented using the above results, assuming a 10GHz operating range. These layouts would be automatically created by a decap filler cell generator. Two possible layouts are shown: (a) uses the N and P devices and (b) is NMOS only. Fig. 3.7(a) uses 3 fingers for the NMOS device and 4 fingers for the PMOS device. From an average capacitance perspective, the NMOS-only layout style of Fig. 3.7(b) is better, and this is also reflected in Table 3.1. 
To implement this type of layout in a standard cell, the p-well region must be extended to cover the entire area, which is not typical of standard cell design. This approach can be used as long as the design rules at the boundaries of adjacent standard cells are satisfied.

3.3 Cross-Coupled Decoupling Capacitor Designs

At the 90nm technology node, there is the possibility of oxide breakdown during an ESD event. A simple ESD protection scheme for decaps is to insert a relatively large resistance in series to limit the maximum voltage seen at the gate of the decap [24]. A minimum Reff,0 is needed to ensure ESD reliability for decap cells. A cross-coupled decap design has been proposed by cell library developers [21] to address the issue of ESD reliability. As shown previously in Fig. 2.2, the drain of the PMOS device is connected to the gate of the NMOS, and vice-versa [21]. The cross-coupled design provides additional series resistance to the inherent decap resistance to increase Reff,0.

The frequency response characteristics of this new configuration can be evaluated to determine if the results obtained in the last section can be applied directly to the new circuit. A standard 3N-4P decap of Fig. 3.7(a) is first compared to a same-area cross-coupled decap using the HSPICE ac analysis of Fig. 3.3, and the results are shown in Fig. 3.8.

The standard 3N-4P decap has a very low resistance (around 30Ω), which makes it prone to ESD failure. The cross-coupled 3N-4P design has a much higher DC Reff,0 (around 3500Ω) but a poorer frequency response for Ceff. Consequently, the tradeoff between ESD reliability and frequency response must be considered in the design process and decap layout. To improve the frequency response, additional fingers must be used. The target resistance Reff,0_target for ESD protection in our case is a minimum of 500Ω. The number of fingers was increased to reduce the resistance from 3500Ω down to that required by ESD.
According to Equation (3.5), the scale factor on the 3N-4P design can be found as follows:

\[ m^2 = \frac{3500}{R_{eff,0\_target}} = \frac{3500}{500} \tag{3.12} \]

\[ \therefore\; m = \sqrt{3500/500} \approx 2.6 \]

Scaling the 3N-4P by approximately this amount, a cross-coupled 8N-9P decap was produced. Similarly, for an ESD target Reff,0_target = 1000Ω, it was found that m=1.9, so a cross-coupled 6N-7P decap was chosen. The plots for 6N-7P and 8N-9P fingers are also illustrated in Fig. 3.8. The results show that the 8N-9P cross-coupled version is the best configuration to address both frequency response and ESD protection.

Figure 3.8: Ceff(f) and Reff(f) comparison of fixed-area standard decap and cross-coupled decap: same MOS device sizes but different poly connections. (Curves: standard 3N-4P; cross-coupled 3N-4P, 6N-7P and 8N-9P; x-axis: frequency, 0-10GHz.)

From a layout perspective, the cross-coupled decaps can be realized by simply rerouting the poly connections of the standard decaps, while keeping the MOS devices the same. The layouts of two cases, 3N-4P and 8N-9P, are shown in Fig. 3.9.

Figure 3.9: Sample layouts of cross-coupled decap cells for (a) 3N-4P (b) 8N-9P.

It is important to address one other issue of thin oxide: the gate leakage current of the decap, which contributes to the chip's total static power.
Using HSPICE, the standard and cross-coupled decap circuits were found to have almost identical gate leakage. That is, since the cell area is fixed and only the poly terminal connections are swapped, the cross-coupled design provides no inherent savings in gate leakage as compared to the standard design. There exists a simple design approach to save gate leakage. Simulations using BSIM4 SPICE models [44] indicated that PMOS gate leakage is roughly 3 times smaller than NMOS gate leakage for same-size transistors [27][43]. Therefore, PMOS devices are preferred from a leakage perspective. Since PMOS devices have a poor frequency response, more fingers can be used to obtain the desired result. But this must be carried out in the context of the cross-coupled design to preserve ESD protection.

The basic idea to control leakage is to have the smallest possible NMOS device cross-coupled with the largest possible multi-fingered PMOS device. This way, the advantages of PMOS leakage and cross-coupling ESD protection are preserved. The layouts of two configurations are illustrated in Fig. 3.10. A small NMOS device is used in both cases. Note the n-well regions have been expanded in both layouts to accommodate the larger PMOS device. Fig. 3.10(a) uses 9 PMOS fingers while Fig. 3.10(b) has a total of 16 fingers. The same cell area as before (2µm x 9µm) was used for the two designs.

Table 3.2: Comparison of the passive decap designs and their gate leakage current.

Decap Cell            Layout Description                                        Gate Leakage
Std. 3N-4P            Std. decap with 3 fingers for N and 4 fingers for P       262.4 nA
Cross-Coupled 3N-4P   Cross coupled with 3 fingers for N and 4 fingers for P    260.8 nA
Cross-Coupled 8N-9P   Cross coupled with 8 fingers for N and 9 fingers for P    206.8 nA
Modified 9 fingers    Cross coupled with smallest N and 9 fingers for P         119.1 nA
Modified 16 fingers   Cross coupled with smallest N and 16 fingers for P        99.7 nA

Table 3.2 summarizes the leakage values for the different cases. The standard and cross-coupled 3N-4P decaps have roughly the same leakage. It is somewhat reduced for the 8N-9P case since there is less area for leakage. However, for the two layouts with the small NMOS devices, the leakage is cut in half. In fact, for the 1N-16P case, the leakage is 62% less than that of the standard 3N-4P decap.

The Reff,0_target of the cross-coupled design must be set based on ESD considerations, but that also controls the maximum number of fingers permitted, m_max. Since the NMOS device is fixed while the PMOS device is multi-fingered, the following equation can be used to determine the resistance:

\[ R_{eff,0\_target} = R_N \,\|\, \frac{R_P}{m_{max}^2} \tag{3.13} \]

where R_N and R_P are the resistances of the decaps without fingers, and "||" means "in parallel with." This target Reff,0 sets up the equation for a maximum number of fingers. That is,

\[ m_{max} = \sqrt{\left(\frac{1}{R_{eff,0\_target}} - \frac{1}{R_N}\right) R_P} \tag{3.14} \]

As described in the previous section, the optimal m depends on the frequency response (i.e., Cavg(m)), but the number of fingers selected should not exceed m_max to satisfy ESD requirements.

Figure 3.11: Frequency response of various cross-coupled designs. (Panels: Ceff and Reff versus frequency, 0-10GHz; curves: standard 3N-4P, cross-coupled 3N-4P, cross-coupled 8N-9P, modified cross-coupled 1N-9P and 1N-16P.)

Fig. 3.11 illustrates the frequency response of the various designs from 0-10GHz.
All of the configurations provide similar Ceff,0 values but are dramatically different in the frequency response characteristics. The standard 3N-4P case is the best, followed by the modified cross-coupled 1N-16P. The Reff,0 values are different in all cases but only the standard 3N-4P case is unsuitable for ESD protection. However, it is desirable to select the configuration with the lowest Reff,0 that satisfies the ESD criteria (500Ω in this case) for a rapid time-domain response. Overall, the cross-coupled 1N-16P layout is recommended because it provides the required Reff,0 for ESD reliability and saves at least 50-60% on gate leakage.

To summarize, for 90nm and 65nm, standard-cell passive decap design should follow the layout strategy shown in Fig. 3.10. By using the smallest NMOS device and the largest multi-fingered PMOS device in the cross-coupled form, the decap has the lowest leakage and is able to satisfy the ESD requirements.

3.4 Summary

This chapter investigated the tradeoffs between high-frequency performance of decaps and ESD protection and its impact on the layout of standard-cell passive decaps. A design metric was introduced to determine the optimal number of fingers to use in the standard-cell layout to obtain a desired capacitance level over a target operating frequency. Models were developed to capture the frequency responses of Reff and Ceff for a given technology with only a few parameters. As a result, the models can be used to predict the same characteristics of future technologies.

For ESD protection, a cross-coupled design was proposed by cell library developers to provide a large series resistance, but it suffers from reduced frequency response and provides no savings in gate leakage. This chapter demonstrated that more fingers are needed with the cross-coupled standard-cell layouts to provide the target resistance value for ESD protection. The design of the target resistance can follow the formulae provided in this chapter.
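As a rough numerical companion to the resistance-targeting formulae, the sketch below evaluates the finger scale factor of Equation (3.12) and the finger limit of Equation (3.14). It assumes the 1/m² resistance scaling that Equation (3.12) relies on; the function names and the R_N and R_P values in the usage note are hypothetical and are not taken from the thesis.

```python
import math


def finger_scale_factor(r_current_ohm, r_target_ohm):
    """Eq. (3.12): resistance falls as 1/m^2, so m = sqrt(R_current / R_target)."""
    return math.sqrt(r_current_ohm / r_target_ohm)


def max_pmos_fingers(r_target_ohm, r_n_ohm, r_p_ohm):
    """Eq. (3.14): solve R_target = R_N || (R_P / m^2) for m and round down.

    Rounding down keeps the resulting resistance at or above the ESD target.
    """
    remaining_conductance = 1.0 / r_target_ohm - 1.0 / r_n_ohm
    if remaining_conductance <= 0:
        # The fixed NMOS branch alone is already at or below the target resistance.
        raise ValueError("R_N must be larger than the target resistance")
    return math.floor(math.sqrt(remaining_conductance * r_p_ohm))


if __name__ == "__main__":
    print(round(finger_scale_factor(3500, 500), 1))   # 500 ohm ESD target
    print(round(finger_scale_factor(3500, 1000), 1))  # 1000 ohm ESD target
    # Hypothetical single-finger resistances, for illustration only.
    print(max_pmos_fingers(500, 1000, 20000))
```

With the hypothetical values R_N = 1000Ω and R_P = 20000Ω, four PMOS fingers give 1000Ω in parallel with 1250Ω (about 556Ω), while five fingers would drop below the 500Ω target.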
The layout with the smallest NMOS device and a multi-fingered PMOS device delivers acceptable frequency response and ESD reliability, while providing the lowest leakage.

Chapter 4
Active Decoupling Capacitor Design

4.1 Introduction

Passive decaps described previously have a small layout and are useful within the block of standard cells. However, for large global decaps (i.e., outside the block), other approaches can be used. This chapter addresses a novel application of the active decoupling capacitor. The objective is to investigate the effectiveness of removing local IR-drop violations (usually called "hot spots") by replacing passive decaps with active decaps. Starting from 90nm, large power supply noise levels in localized regions may unexpectedly be present in high-speed ASICs [19]. These unresolved hot spots cause timing closure problems or result in functional failures in extreme cases. These hot spots are often detected late in the design cycle, so they become problematic and difficult to remove.

To remove them, designers must consider many options such as moving the logic blocks, adding or rearranging the power pins, and/or modifying the power grid design. Near the tapeout deadline, such time-consuming design iterations may not always be feasible. In these situations, an active decap design can play an important role in reducing supply noise without major changes to the design [31]. That is, the active decap can be a drop-in replacement of the passive decap in an attempt to remove any remaining hot spots, thereby saving time and effort. In this chapter, to explore the effectiveness of an active decap, quantitative data will be provided on the expected improvements, sizing considerations and placement of an active decap relative to the hot spot.

Two active decap circuits using switched capacitances were proposed in the past to regulate the supply voltages [37][38][39].
By increasing charge delivery capability, the two designs in [37] and [39] are quite effective at reducing supply noise, but they also have certain limitations. The design in [37] can switch quickly but dissipates large power, whereas [39] saves power but experiences long switching delays. They both mitigate the effects of LC resonance [40], which is typically in the 20-400MHz band. Recent work has been done to reduce LC resonance using an improved switched decap technique [41]. Further work has also been reported on a distributed active decap to greatly boost the effective decap value while reducing the area requirements [41].

In this chapter, a modified active decap design that has lower power than [37] and a better response time than [39] is proposed, targeting ASIC applications up to 1GHz. The issue of directly replacing an area occupied by a passive decap with an active decap is addressed to determine the degree of noise reduction that can be obtained and the associated tradeoffs of area and power.

4.2 Active Decoupling Capacitor Analysis and Design

4.2.1 Active Decap Concept and Design Considerations

The basic idea of an active decap is to switch a pair of passive decaps, Cdecap, from parallel to series to provide a local boost in the supply voltage [37][38][39]. As illustrated in Fig. 4.1(a), the decaps are initially in a parallel configuration with a full charge developed across both capacitors. In this standby state, the equivalent capacitance is 2Cdecap. When placed in a series stack, as in Fig. 4.1(b), the boosted voltage is ideally 2VDD while the equivalent capacitance is reduced to Cdecap/2. When switched back in parallel, the voltage returns to the original value of VDD. In this case, the stacking level n is 2.

Figure 4.1: Active decap concept and its MOS implementation: (a) decaps in parallel, (b) decaps in series, (c) circuit implementation.

The active decap circuit is depicted in Fig.
4.1(c), with Cdecap and the switches implemented using NMOS and PMOS transistors [37][38][39]. When the capacitors are in parallel, both Mn1 and Mp1 are on while Mn2 and Mp2 are off (i.e., subthreshold). When the capacitors are in series, both Mn1 and Mp1 are off while Mn2 and Mp2 are on. The switches exhibit finite "on" resistances, indicated as Rn and Rp, and there is also thin-oxide gate leakage through the decaps, Ileak, especially in 90nm and 65nm CMOS technologies. Both of these effects reduce the performance of the active decap, as described below.

For the general case of stacking n parallel decaps into a series chain, the maximum improvement can be characterized in terms of a gain, G [37]. If k is the voltage regulation tolerance, where kVDD is the permissible drop in voltage, then the charge delivered by n parallel capacitors is:

\[ Q_{parallel} = k V_{DD} \cdot n C_{decap} \tag{4.1} \]

When the capacitors are stacked in series, the charge delivered for the same voltage drop is:

\[ Q_{series} = \left[ n V_{DD} - (1-k) V_{DD} \right] \cdot C_{decap}/n \tag{4.2} \]

The overall charge gain is:

\[ G = \frac{Q_{series}}{Q_{parallel}} = \frac{\left[ n V_{DD} - (1-k) V_{DD} \right] \cdot C_{decap}/n}{k V_{DD} \cdot n C_{decap}} \tag{4.3} \]

Therefore, as given in [37], the gain is controlled by n and k:

\[ G = \frac{n - 1 + k}{k n^2} \tag{4.4} \]

There exists a value of k such that the regular decap outperforms the active decap. For example, setting G=1 and n=2, it is found that k=1/3. For values of k > 1/3, the active decap is of no value. However, if k is below this value, the active decap is able to deliver more charge. For example, if k=0.1 and n=2, then G=2.75. This implies that 2.75 times more charge can be delivered by the active decap before its output voltage drops to the same level as the passive decap.

Previous research [37][38][39] provides no information on practical limitations when using Equation (4.4). For design purposes, this level of improvement is not possible due to the switch resistances and leakage currents. In fact, the boosted voltage cannot reach nVDD but instead reaches a lower voltage of bVDD.
Therefore, the gain equation should be rewritten as:

\[ G = \frac{\left[ b V_{DD} - (1-k) V_{DD} \right] \cdot C_{decap}/n}{k V_{DD} \cdot n C_{decap}} \tag{4.5} \]

where

\[ b = n \cdot f(R_{on}) \cdot g(C_{decap}) \tag{4.6} \]

The reduction factors, f(Ron) and g(Cdecap), depend on the switch resistance, Ron, and the leakage current which, in turn, is proportional to Cdecap. Using circuit simulation, normalized plots of f(Ron) and g(Cdecap) are provided in Fig. 4.2. The switch resistance has a more pronounced effect on b as compared to the leakage current. For example, with Ron=10Ω and Cdecap=700pF, it can be obtained that f=0.9 and g=0.95 from Fig. 4.2. If the two effects are combined, then b=2(0.9)(0.95)≈1.7 instead of 2. With k=0.1, the achievable gain is now reduced to G=2.0. The actual final voltage value, aVDD, when the active decap supplies the same charge as the passive decap is determined by setting G=1 and solving for a in the following equation:

\[ 1 = \frac{\left[ b V_{DD} - a V_{DD} \right] \cdot C_{decap}/n}{k V_{DD} \cdot n C_{decap}} \tag{4.7} \]

In this case, with b=1.7, k=0.1 and n=2, one can obtain a=1.3, which implies that the active decap will be boosted initially to 1.7V (instead of 2V) and then falls back to 1.3V due to the charge demand of a nearby logic circuit. In the passive case, the initial voltage of 1V would be reduced to 0.9V, so the active decap is still superior even with the nonidealities included.

Figure 4.2: The reductive factors f and g for the boosted voltage as a function of (a) "on" resistances of the switches, Ron, and (b) leakage due to the size of decap Cdecap.

To design the sizes of the MOS switches, a number of issues must be considered. From the above analysis, a small resistance value is preferable to increase the voltage boosting capability of the active decap, and to improve transient response times.
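The numbers quoted in this analysis can be reproduced with a short Python sketch of Equations (4.4) to (4.7). VDD and Cdecap cancel in each ratio, so only n, k, b, f and g appear. The function names are hypothetical, and the f and g values are simply the ones read from Fig. 4.2 in the text, not a model of the curves.

```python
def ideal_gain(n, k):
    """Eq. (4.4): charge gain of n stacked decaps with regulation tolerance k."""
    return (n - 1 + k) / (k * n ** 2)


def boost_factor(n, f_ron, g_cdecap):
    """Eq. (4.6): boosted level b after the switch-resistance and leakage losses."""
    return n * f_ron * g_cdecap


def practical_gain(b, n, k):
    """Eq. (4.5): gain when the boosted voltage only reaches b*VDD."""
    return (b - 1 + k) / (k * n ** 2)


def final_voltage_factor(b, n, k):
    """Eq. (4.7) with G = 1: the level a (times VDD) after matching the passive charge."""
    return b - k * n ** 2


if __name__ == "__main__":
    n, k = 2, 0.1
    b = boost_factor(n, 0.9, 0.95)          # f and g as read from Fig. 4.2
    print(ideal_gain(n, k))                  # ideal case
    print(b, practical_gain(1.7, n, k))      # text rounds b to 1.7
    print(final_voltage_factor(1.7, n, k))
```

Setting ideal_gain(2, k) equal to 1 gives the break-even tolerance k = 1/3 mentioned in the text.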
The "on" resistances also provide ESD protection because they are in series with the decaps. Any large voltage fluctuations are absorbed by the resistors to reduce the drop across the thin-oxide gates of the decaps, similar to the effect of cross-coupling decaps [22]. Therefore, this resistance must be large enough to safely protect the thin-oxide gates. Considering the factors of boosted voltage level, decap performance, and ESD reliability, the "on" resistances should be designed to be in the range of 10-20Ω by proper selection of transistor widths. This will require rather large switches. Once their sizes are determined, the buffers generating the switching signals must supply enough current to drive the large capacitances, resulting in large sensing and switching circuitry that consumes a considerable amount of power and area. Therefore, these active decaps should be used sparingly in ASIC designs but are particularly suitable for localized hot-spot removal.

4.2.2 Overall Active Decap Architecture

Figure 4.3: Active decap architecture. (Blocks: reference voltage generator, high-pass filters, comparators, switched decaps, connected to the global supply rails.)

Fig. 4.3 illustrates the complete active decap design containing four blocks: a reference voltage generator, a pair of high-pass filters, two comparators, and the switched decaps. Compared to the previous work [37][38][39], the key difference in this new approach is the use of latch-based comparators. The user logic circuit block shown in the figure is considered to be the main cause of power supply noise violation. The switch control circuit for the active decap is realized using two comparators. The differential inputs of each comparator decide the standby voltage level at the outputs of the comparators. In the standby mode, the top comparator has an output at VDD, whereas the bottom comparator is set to VSS. When the power grid discharges, VDD will drop and VSS will rise.
The voltage variations are passed through the high-pass filters to the comparator inputs, causing the comparators to reverse their output values and switching the decaps from parallel to series. Later, when the power grid charges up, the comparator inputs and outputs switch back to their original values. The use of latch-based comparators with hysteresis to switch the decaps is one of the main contributions of this work. An enable signal is provided for testing purposes to allow the active decap circuitry to be turned on or off. When off, the design behaves purely as a passive decap. This allows for a comparison between active and passive decaps.

The trigger voltage for the circuit is set by the comparators and the resistor R1. In Fig. 4.3, the reference voltages are generated by a simple voltage divider and are set to roughly VDD/2. However, depending on the comparator design, the absolute input levels of VDD/2 are somewhat flexible due to the differential nature of the inputs. The diode-connected transistors in the reference generator should have large length and small width to control the static current. Inserting a small resistor, R1, between the two transistors is intended to separate the reference voltages by approximately 30mV. Then, if the comparators are designed to switch when the voltage difference at the inputs is 10-15mV (plus an additional 5mV of hysteresis), the overall design will trigger at approximately 50mV. If R1 is chosen to be smaller, the sensitivity of the active decaps is improved [41], at a cost of significantly increased dynamic power because the active decap is triggered more often. If R1 is designed to be larger, the resulting supply noise k will increase, as shown in Fig. 4.4. When the two input signals of the comparators are separated by a level that exceeds the supply noise generated by the nearby logic, the comparators will not switch, making it a passive decap.
In the plot, the active decap stops switching when the input voltages differ by approximately 130-150mV.

Figure 4.4: Difference in the input voltages of the comparators as a function of supply noise k. (x-axis: comparator input voltage difference, 0-200mV; y-axis: supply noise k.)

The targeted supply noise k value is 0.09-0.1 (i.e., the kVDD value is 90-100mV), resulting in a triggering input voltage between 20 and 70mV. Thus, the comparator can be designed with a 20mV switching threshold, and the input voltage difference can then be adjusted by varying R1 to the midpoint of about 50mV. From another perspective, since the maximum IR drop is allowed to be 100mV, the active decap should trigger at about half of 100mV, which is 50mV, because there is a delay between the time that the active decap is switched and the time that it actually boosts the supply voltage. Of course, a 20mV input triggering voltage would be better in that sense, but the active decap would switch too often, raising concerns about large power consumption. Setting the triggering voltage at 50mV appears to be a good tradeoff between power and performance. Therefore, the comparators are required to switch once the voltage (VDD-VSS) discharges by about 50mV, which can be considered as the input sensitivity of the active decap.

The delay between detection and activation of the switched decap, td, and the delay difference between the two outputs of the comparators, Δtd, impact the bandwidth of the active decap and the boosted voltage, respectively. Specifically, the delay of the comparators td is inversely proportional to the bandwidth of the active decap, BW. That is,

\[ BW \propto \frac{1}{t_d} \tag{4.8} \]

If the operating frequency is below the bandwidth, the active decap reduces the supply noise relative to a same-area passive decap.
On the other hand, if the clock frequency is beyond the bandwidth, the active decap may result in equal or more supply noise than the passive decap. If it takes an entire clock period to switch the decaps, then they are just like the passive case because they will not switch during the whole clock cycle. Before they actually switch, the supply voltage goes back to the right level and they are forced not to switch any more. The same situation happens on the next cycle. Therefore, as the switching frequency of the logic increases, the active decap is less and less effective. Then, at 1/td, it looks like a passive decap. Beyond that point, due to larger static power consumption and varying "on" resistance of the switches, the active decap becomes worse than the passive decap. The above observation will be validated in the next section.

Ideally, the delay of the top and the bottom comparators should be the same. In practice, a difference in delay Δtd between the top and the bottom comparators exists and can be defined as:

\[ \Delta t_d = t_{d,top} - t_{d,bottom} \tag{4.9} \]

The delay difference will not result in a short connection between power and ground or an open circuit where no decoupling capacitance is present. The effect is, however, that the boosted voltage b will be degraded due to leakage current when the switches are not turned on/off at the same time. A function, h(Δtd), can be used to capture this effect, as shown in Fig. 4.5. Therefore, Equation (4.6) should be re-written as:

\[ b = n \cdot f(R_{on}) \cdot g(C_{decap}) \cdot h(\Delta t_d) \tag{4.10} \]

It is desired to keep the delay difference of the two comparators small even under process/voltage/temperature (PVT) variations to ensure sufficient improvement of the boosted voltage.

Figure 4.5: The reductive factor h for the boosted voltage as a function of delay difference Δtd. (x-axis: delay difference Δtd, -500ps to 500ps.)

4.2.3 Design Specifications

When designing the active decap, a certain value for Cdecap is needed to keep the supply noise small.
However, in the case of drop-in replacement, this design freedom is not available. Therefore, as the first step, designers should check through simulation whether the replacement active decap can remove the nearby hot spot. The next step is to select a proper "on" resistance of the switches, Ron. For a high boosted voltage b, Ron should be kept low enough. On the other hand, Ron must also be high enough for ESD protection. Designers should make this tradeoff for Ron selection to design the switches. The size of the switches and the passive decaps determines the load capacitance on the comparators.

The comparator design should generally satisfy the low power and high speed requirements of ASIC applications. For example, a static power of 5mW or below can be achieved while the bandwidth of the active decap should be able to handle a 1GHz clock. This sets the average comparator delay to approximately 1ns. Note that the delay requirement should be fulfilled under PVT variations. Therefore, the worst-case delay should be used here. The output of the comparators in the standby state should be close to VDD or VSS to reduce leakage current through the passive decaps and switches to save power. The comparator should be designed to provide high gain in the switching region such that the output will swing from low to high when the input varies by 10-15mV. Higher gain can ease the requirement on input DC biasing and lower the static power consumption. From a stability perspective, a certain amount of hysteresis is desired to reduce the risk of oscillation. A practical value of 5mV for hysteresis is reasonable. Overall, the design specifications can be summarized in Table 4.1.

Table 4.1: Design specifications of the active decap.

Specifications
Worst-case switching delay   < 1 ns
Bandwidth                    > 1 GHz
Static power                 < 5 mW

Specific design values are as follows. In the reference voltage generation circuit, the size of the transistors is chosen to provide a branch current of 5-6µA.
R1 can then be implemented with a value of 5kΩ to produce a separation voltage of approximately 30mV at the comparator inputs. For the RC-based high-pass filters, a cut-off frequency above 10MHz is used to filter out low-frequency supply noise to save power. However, if the cut-off frequency is set too high, it may cause oscillation at the supply rails. A cut-off frequency of 16MHz was finally selected. The two resistors have the value of R2=R3=10kΩ, while the capacitors (C2 and C3) are implemented with the same value of 1pF. These RC values are somewhat flexible, unless the cut-off frequency is set exceedingly high. For instance, it was observed that for a cut-off frequency of 1.6GHz or above, oscillation on the supply rail will occur. Therefore, it is a good approach to have the cut-off frequency designed to be a few orders of magnitude smaller than the oscillation frequency.

4.2.4 Latch-Based Comparator Design

There are a wide variety of ways to design a comparator [57][58][59][60]. Also, the Appendix section of this dissertation provides the fundamentals of comparator designs. In this specific application, the two comparators must be able to sense voltage variations that exceed the pre-specified sensitivity level (i.e., 10-15mV in this case). When the decaps are in parallel, the subthreshold leakage from the switches consumes considerable power due to the large sizes of the switch transistors. To reduce leakage current, the outputs of the comparators should be as close as possible to either VDD or VSS. The supply noise budget for a 1V power supply is normally less than 50-100mV and the output is full swing, indicating the need for high gain in the switching region.

With the above considerations, a latch-based comparator was selected for this application, as shown in Fig. 4.6.
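The design values quoted in Section 4.2.3 can be cross-checked with a small Python sketch. It uses the standard first-order RC corner formula and Ohm's law; the function names are hypothetical, and treating the overall trigger level as the simple sum of the reference separation, the comparator switching threshold and the hysteresis is an assumption made here to reproduce the "approximately 50mV" figure, not a formula stated in the text.

```python
import math


def rc_cutoff_hz(r_ohm, c_farad):
    """First-order RC high-pass corner frequency, f_c = 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


def reference_separation_v(branch_current_a, r1_ohm):
    """Voltage dropped across R1 by the reference-generator branch current."""
    return branch_current_a * r1_ohm


def trigger_estimate_v(separation_v, switch_threshold_v, hysteresis_v):
    """Assumed additive estimate of the overall trigger voltage."""
    return separation_v + switch_threshold_v + hysteresis_v


if __name__ == "__main__":
    print(rc_cutoff_hz(10e3, 1e-12) / 1e6, "MHz")        # R2 = R3 = 10 kOhm, C2 = C3 = 1 pF
    print(reference_separation_v(6e-6, 5e3) * 1e3, "mV")  # 6 uA through R1 = 5 kOhm
    print(trigger_estimate_v(0.030, 0.015, 0.005) * 1e3, "mV")
```

With 10kΩ and 1pF the corner lands at about 15.9MHz, consistent with the 16MHz value selected, and 5-6µA through 5kΩ gives 25-30mV of reference separation.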
The exact transistor sizes are listed in Table 4.2. The branch currents of the comparators are as follows: Ib1=826µA, I7=377µA, I8=135µA, Ib2=900µA, I17=339µA, and I18=136µA.

Figure 4.6: Complementary comparator design: (a) n-type input for the top comparator, and (b) p-type input for the bottom comparator.

Table 4.2: Transistor sizes of the comparators.

Transistors      Width/Length      Transistors       Width/Length
Mb1 (NMOS)       75µm/0.1µm        Mb2 (PMOS)        150µm/0.1µm
M1/M2 (NMOS)     37.5µm/0.1µm      M11/M12 (PMOS)    75µm/0.1µm
M3/M4 (PMOS)     37.5µm/0.1µm      M13/M14 (NMOS)    18.75µm/0.1µm
M5/M6 (PMOS)     48µm/0.1µm        M15/M16 (NMOS)    24µm/0.1µm
M7/M8 (NMOS)     37.5µm/0.1µm      M17/M18 (PMOS)    75µm/0.1µm
M9/M10 (PMOS)    75µm/0.1µm        M19/M20 (NMOS)    18.75µm/0.1µm

This two-stage architecture satisfies the need for high gain and full swing, but must be designed to avoid any potential stability or oscillation problems. For a latch-based first stage, introducing a certain amount of hysteresis will prevent the comparator from switching back to the standby state in the presence of small variations around the switching region. For the n-type input comparator shown in Fig. 4.6(a), the hysteresis voltage, Vhys, is given by Equation (4.11) [57]:

[Equation (4.11): Vhys expressed in terms of the bias current Ib1, µnCox, (W/L)M1 and the size ratio (W/L)M5/(W/L)M3; the expression is not recoverable from the extracted text.]

where Ib1 is the bias current. In Equation (4.11), the size ratio (W/L)M5/(W/L)M3 must be greater than 1 for a latch. Once the slew rate and the bias current are determined, both Ib1 and (W/L)M1 are fixed, leaving the size ratio as the only parameter for Vhys. In this case, a ratio of 1.28 was chosen, producing a hysteresis voltage of around 5mV.

The second stage converts the differential signals into a single-ended output and provides the requisite level shifting. The second stage is also used as an output buffer to drive the large switches, where the desired slew rate can be achieved by adjusting the bias currents and transistor sizes.
Complementary designs are used for the top and the bottom comparators to have roughly equal switching delays. The bias voltages for the comparators are generated by simple current mirrors. PVT variations on the comparator and the bias generation can cause delay differences in the comparator outputs. This delay difference acts to further reduce the boosted voltage, as illustrated earlier in Fig. 4.5. During the design stage, great care was taken to ensure that the delay differences are within 100ps under all PVT variation simulations, which results in an additional 5% loss in the boosted voltage (i.e., 1.6V rather than 1.7V).

The dominant poles of this two-stage comparator were identified for stability compensation since there is a feedback path through the supply rails back to the comparator inputs. Therefore, the output resistance and the load capacitance of the comparator need to be carefully designed to properly position the dominant pole. In this case, a Miller compensation capacitance Cc is added to shift the dominant pole to a low frequency to improve stability. Also, a nulling resistor Rz is present to cancel the right-half-plane zero [59].

The simulated large-signal DC characteristics of the n-type input comparator are illustrated in Fig. 4.7(a), where the curves with hysteresis are shown. Here, the switching region of the comparator is in the range of ±10mV. A size ratio of 1.28 in Equation (4.11) was selected to produce about 5mV of hysteresis. The peak DC gain is around 48dB. The AC curve for the comparator is shown in Fig.
4.7(b) where the phase margin (PM) at unity gain is indicated as 39°.

Figure 4.7: (a) DC and (b) AC (compensated) characteristic curves for the two-stage latch-based comparator design (n-type input shown). (x-axes: (a) ΔVin from -0.05V to 0.05V; (b) frequency in MHz.)

The active decap must be able to boost the supply voltage within one clock cycle such that the average supply noise per clock cycle is reduced, since this factor
controls the path delay of the logic blocks [20][61]. In this case, our design goal was set to a maximum clock speed of about 1GHz, which makes it suitable for today's high-end ASICs, and even medium-speed custom designs. When the supply voltage drops to 0.9V (implying 100mV of noise, i.e., k=0.1), the average switching delay for a full output swing was designed to be 0.5ns, which should allow proper operation up to 2GHz. The boosted voltage, based on prior considerations, should be in the range of 1.6V. The charge demand of the logic circuit itself will cause an additional voltage drop of $k V_{DD} n^2 = 0.1 \cdot 1 \cdot 2^2 = 0.4$ V, resulting in an expected final voltage of 1.2V. In addition, the current drive of the comparators will act to reduce the supply voltage further, but hopefully keep the value above 0.9V, which is the noise budget.

The active decap was simulated and compared to prior architectures that were also redesigned in 90nm CMOS to quantify the improvement and design tradeoffs. The circuit proposed in [39] was implemented first; it uses opamps in place of comparators, followed by a chain of inverters to drive the switches. The inverters were optimally sized according to logical effort [39]. However, its minimum delay was about 0.9ns (1.2ns for the slow process corner), which is almost unsuitable for typical ASIC speeds, although its power dissipation was only 0.8mW. In the second design in [37][38], the sensing circuitry is formed by a pseudo-cascode amplifier delivering high speed at the cost of high power. The original design [37] was implemented in a 0.15µm process. The design was adapted to the 90nm process and it was found that, by replacing their comparator design with the latch-based version, the static power consumption of the switching circuitry is reduced from 13mW to 2.8mW, an improvement factor of almost 5X, while the delay only increases slightly from 0.4ns to 0.5ns.
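The power and delay figures quoted above can be put side by side with a small calculation. The sketch below is only illustrative: the power-delay product is a convenience metric chosen here and is not a figure of merit defined in the thesis, and the dictionary keys are hypothetical labels.

```python
# Typical-corner delay and static power quoted in the text (90nm redesigns).
designs = {
    "[39] opamp + inverter chain": {"delay_ns": 0.9, "static_mw": 0.8},
    "[37][38] pseudo-cascode": {"delay_ns": 0.4, "static_mw": 13.0},
    "latch-based (this work)": {"delay_ns": 0.5, "static_mw": 2.8},
}


def power_delay_pj(delay_ns: float, static_mw: float) -> float:
    """Static power times switching delay; mW * ns gives pJ."""
    return static_mw * delay_ns


for name, d in designs.items():
    pdp = power_delay_pj(d["delay_ns"], d["static_mw"])
    print(f"{name:30s} delay = {d['delay_ns']:.1f} ns, "
          f"power = {d['static_mw']:.1f} mW, product = {pdp:.2f} pJ")

# Ratios behind the "almost 5X" power saving and the small delay increase.
power_ratio = designs["[37][38] pseudo-cascode"]["static_mw"] / \
    designs["latch-based (this work)"]["static_mw"]
delay_ratio = designs["latch-based (this work)"]["delay_ns"] / \
    designs["[37][38] pseudo-cascode"]["delay_ns"]
print(f"power ratio = {power_ratio:.2f}x, delay ratio = {delay_ratio:.2f}x")
```

The ratios show the trade being made: static power falls by a factor of about 4.6 while the delay grows by a factor of 1.25.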
The comparison of the three designs is provided in Table 4.3, where the design parameters that fail to satisfy the specifications are shown in italic. Note that the new design features hysteresis while the other two do not. The small-signal AC characteristics of the three designs are shown in Fig. 4.8. For the circuit in [39], the chain of inverters was removed for small-signal analysis.

Table 4.3: Simulated switching circuit design specification comparison.

Specification               Target     [37][38]   [39]      This work
Process                     1V-core 90nm STM
Switching delay (typical)              0.4 ns     0.9 ns    0.5 ns
Switching delay (slow)      < 1 ns     0.5 ns     1.2 ns    0.75 ns
Bandwidth                   > 1 GHz    2 GHz      0.8 GHz   1.5 GHz
Static power*               < 5 mW     13 mW      0.8 mW    2.8 mW
Hysteresis voltage                     0          0         4.2 mV
* Switching circuitry only

Figure 4.8: AC characteristic curves for the three designs.

4.3 Chip Design and Experimental Results

4.3.1 Test Chip Setup

A test chip was fabricated in a standard 90nm 1V-core CMOS process with seven metal layers to validate the results and to quantify the degree of improvement as operation frequency increases when an active decap is used as a drop-in replacement for a passive decap. The test chip setup is shown in Fig. 4.9, where an active decap, a passive decap, and some user logic are implemented. The layout of the active decap is shown in Fig. 4.10. The switching circuitry is located in the center, with the two parallel decaps on either side. The decap on the left is PMOS, and the one on the right is NMOS.
The total layout area for the active decap is 600µm × 142µm = 0.085mm², in which the two decaps on either side combine for an area of 0.077mm². The switching circuitry, including switch transistors (Mn1, Mn2, Mp1 and Mp2), accounts for only 10% of the area and this does not greatly affect the final voltage drop, as shown below.

Figure 4.9: Test chip setup. [Schematic labels: nominal supply (1V), external VDD input, package inductance Lpack, mesh resistance Rmesh, Rdist, passive decap (ESD protected), user logic (buffer).]

Figure 4.10: Layout of active decap showing the relative size of the components.

The area overhead consumed by the sensing and switching circuitry should be considered as an additional penalty for the active decap performance. If the percentage area overhead is defined as x, then the charge provided by the active decap is:

$$Q_{series} = \left[ b V_{DD} - (1-k) V_{DD} \right] \cdot \frac{(1-x)\, C_{decap}}{n} \qquad (4.12)$$

Thus, Equation (4.7) should be re-written as:

$$a V_{DD} = b V_{DD} - \frac{k V_{DD}\, n\, C_{decap}}{(1-x)\, C_{decap} / n} = b V_{DD} - \frac{k V_{DD}\, n^2}{1-x} \qquad (4.13)$$

Assuming k=0.1, n=2, and b=1.5, the final voltage a can be plotted as a function of the area overhead x, as shown in Fig. 4.11. From the figure, it is clear that the area overhead should be limited to within 30% to achieve a reasonable final voltage. If the area overhead is above 43%, using the active decap brings no benefit.
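The dependence of the final voltage on the area overhead can be tabulated directly. The sketch below is a minimal illustration that assumes Equation (4.13) takes the form a = b - k·n²/(1 - x) with VDD normalized to 1V; that form is a reconstruction from the surrounding text, and the function name is hypothetical.

```python
def final_voltage(k: float, n: int, b: float, x: float, vdd: float = 1.0) -> float:
    """Final voltage a*VDD, assuming a*VDD = b*VDD - k*VDD*n^2 / (1 - x).

    k: supply noise as a fraction of VDD
    n: stack height
    b: boosted voltage as a multiple of VDD
    x: area overhead of the sensing and switching circuitry (0 <= x < 1)
    """
    if not 0.0 <= x < 1.0:
        raise ValueError("area overhead x must satisfy 0 <= x < 1")
    return b * vdd - k * vdd * n ** 2 / (1.0 - x)


# Parameters used in the text for the plot of a versus x.
k, n, b = 0.1, 2, 1.5
for x in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"x = {x:4.0%}  a*VDD = {final_voltage(k, n, b, x):.3f} V")

# Penalty of a 10% overhead relative to an ideal, overhead-free design.
penalty_mv = 1000.0 * (final_voltage(k, n, b, 0.0) - final_voltage(k, n, b, 0.1))
print(f"penalty at x = 10%: {penalty_mv:.1f} mV")
```

Under this assumed form, a 10% overhead costs a few tens of millivolts, and the loss grows quickly as x increases because the remaining series capacitance shrinks.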
In our case, x=10% so that the penalty is only 50mV.

Figure 4.11: Final voltage a as a function of sensing and switching circuitry area overhead x.

The switch sizes were chosen to have a suitable parasitic series resistance to provide ESD protection, sufficient transient response [12] and good damping for potential LC resonance [13]. In our case, the two parallel decaps are formed using thin-oxide transistors to improve area efficiency, since ESD is not a major concern due to switch resistances inherent in the circuit. The decoupling capacitance values in the standby mode are 0.34nF each, resulting in a total of 0.68nF in parallel.

The extra passive decap of Fig. 4.9 was used to represent fixed decap that is always present in the neighborhood of the active decap. It cannot be shut off. It also employs a series PMOS device to protect it from ESD risks. Both active and passive decaps are placed about 600µm away from the user logic. Ref. [41] uses a linear feedback shift register (LFSR) as the user logic to generate power supply noise because the resulting noise pattern is somewhat randomized. In our design, a large buffer with a large capacitive load was simply used to create supply noise, with the input controlled by an external signal. This way, the switching frequency can be controlled and modified directly from the input. The size of the decaps was chosen to be only a few times larger than the capacitive load to create a ~100mV voltage drop for the experiments. The three resistors (R1, R2 and R3 shown in Fig. 4.3) were implemented using p+ poly resistances. The two capacitors C2 and C3 in Fig. 4.3 were implemented using MOS transistors to minimize area overhead. A test chip microphotograph is illustrated in Fig. 4.12. The total test chip area is 1.2 × 0.8 mm².

To measure the on-chip supply noise, a packaged die was not used because it was intended to observe internal voltages near the logic block.
Thus, the supply variations were measured directly with probes. Power supply noise comes from both IR drop and Ldi/dt effects. The inductance L in the Ldi/dt effect is mainly due to the package, as the on-chip wire inductances are normally negligible [2]. Since an actual package was not used, two on-chip spiral inductors were implemented to mimic the package inductances, one on the supply path and the other on the return path. The value of the spiral inductors is close to a typical wire-bond package inductance. The user logic and the decaps were placed far away from the supply/ground pads (about 600µm) to create a large mesh resistance. Effectively, the pad and mesh of the test chip were designed to produce a measurable amount of power supply noise such that any improvements of using active decaps could be easily observed.

Figure 4.12: Annotated test chip microphotograph.

4.3.2 Test Chip Simulations

Before showing the measurement results of the test chips, the simulation results are presented first to demonstrate the close relationship between the two and to provide a better understanding of how the active decap actually behaves when large supply noise is present. The simulation setup follows exactly the test chip in Fig. 4.9. When a clock signal is fed into the test chip, the buffer, as the user logic circuit, switches and draws current from the supply rails. The current peaks created by the buffer cause the supply voltage VDD to drop, causing the two input voltages of the comparators to swap. The comparators then switch their output levels according to the input swap. After a certain delay, the outputs of the comparators switch, causing a local boost at the supply voltage VDD. Once the supply is boosted, the inputs of the comparators move back to their nominal values. As a result, the outputs of the comparators switch back after some delay and the active decap waits for the next voltage drop.
When the active decap is turned off, the circuit behaves like a passive decap. In such a scenario, the supply voltage follows the current draw of the buffer, and no boost in the supply voltage is achieved. From post-layout HSPICE simulation, the current draw, the supply voltage VDD and the internal signal switching are shown in Fig. 4.13, where the clock frequency is set at 500MHz.

Figure 4.13: Simulated VDD voltage (on a 500MHz clock) with active decap on and off. [Panels versus time: current taken from buffer; VDD voltage with active decap on and off; comparator inputs Vin+ and Vin-; comparator outputs (top and bottom).]

Random process variations affect the effectiveness of the active decap. At the slow corner, the delay of the comparators is increased, resulting in a later boosting point at the supply rail. On the other hand, for the fast process, the comparators take a shorter delay to switch so that the local supply boost occurs earlier. As a consequence, the supply voltage remains high for a longer time for the fast corner, and for a shorter time for the slow corner, as illustrated in Fig. 4.14. Therefore, the average supply noise per clock cycle should be less for a test chip at the fast process corner. Alternatively, a larger average supply noise level can be expected if a test chip happens to be at the slow process corner. Designers should make sure that the active decap provides satisfactory improvement to remove hot spots under process variations, especially at the slow process corner.

Figure 4.14: Simulated VDD voltage with active decap on for different process corners. [Panels versus time: VDD voltage (typical), VDD voltage (slow), VDD voltage (fast).]

4.3.3 Test Chip Measurements

This section describes the measurement results obtained on 15 test chips and validates the simulation results, as illustrated in Fig. 4.15.
An Agilent 86130A bit error rate tester was used to drive the inputs, while an Agilent DSO81304A oscilloscope was used to observe the results. As mentioned earlier, the passive decap shown in the figure is always present in the test chip. The enable signal is used to selectively turn on or off the active decap by applying a high or low voltage. When disabled, the decaps are biased in parallel, utilizing a maximum standby capacitance. In that case, the active decap behaves purely as a passive decap. When enabled, the active decap is triggered by voltage drops of about 50mV. By turning on and off the active decap, the average VDD voltage improvement can be measured. A collection of 15 sample chips was tested, where the clock frequency is fixed at 1GHz to observe the improvement across the sample space. The test results from these sample chips were categorized into three groups: slow, typical, and fast, to reflect the nature of random process variation on silicon. Note that in all cases, the active decap moved the IR drop inside the 100mV noise budget. The average VDD voltages of each group line up closely with simulation under process variations.

Figure 4.15: Grouped scatter plot for active decap noise reduction for the tested sample chips. [Average VDD voltage versus sample number (1 to 15), grouped as slow, typical and fast, with active decap ON and OFF and the noise budget marked.]

In Fig. 4.15, when the active decap is off, the average supply voltage varies significantly from 873mV to 910mV. This is due to the random process variation on Rmesh and Lpack, which determine the IR drop on the supply rails. But more importantly, a higher average VDD value when the active decap is off results in a smaller improvement when turning on the active decap. This is caused by the nature of the comparators. Specifically, if the input of the comparator varies with a large swing, then the comparator will generate an output with a shorter delay.
That is, for a larger supply noise k, the delay of the comparator td is smaller. This effect is illustrated in Fig. 4.16. Longer delay results in lower bandwidth, which affects the active decap performance at high frequencies. Note that the delay difference Δtd also increases as k increases, which will reduce the boosted voltage b slightly.

Figure 4.16: Comparator delay td and delay difference Δtd as a function of supply noise k.

The different process corners have direct impact on comparator delay and delay difference. For a fixed value of k=0.1, the comparator delay and delay difference under process variation are highlighted in Table 4.4. The average delay between the top and the bottom comparator varies from 0.75ns (slow) to 0.31ns (fast), more than 140% of variation. This effect explains the on-die variation for the active decap performance measurements. However, the delay difference of the comparators only varies slightly, indicating that the boosted voltage b will not be affected greatly by process variations.

Table 4.4: Comparator delay td and delay difference Δtd in different corners.

Corners    td (top)   td (bottom)   Δtd       Average td
Slow       0.77 ns    0.72 ns       0.05 ns   0.75 ns
Typical    0.53 ns    0.48 ns       0.05 ns   0.51 ns
Fast       0.32 ns    0.30 ns       0.02 ns   0.31 ns

By averaging the average supply voltage from each group in Fig. 4.15, the overall improvement of using the active decap can be assessed, as highlighted in Table 4.5. Note that in the test chip the passive decap is always connected. To illustrate the improvement solely due to use of active decap, simulations were carried out with the passive decap completely removed.
The simulated results showing active decap only versus passive decap are summarized in Table 4.6, where the case for the active decap provides a higher average supply voltage across the process corners. Due to the dynamic power consumption of the comparators during switching and other nonidealities, the active decap cannot practically reach the final voltage value described in Equation (4.13). Given a level of the supply noise when the active decap is off, the final voltage can be calculated from Equation (4.13). Comparing the expected and the actual final voltage when the active decap is turned on, the two values are close (~100mV of difference) for the cases of typical and fast process corners, as summarized in Table 4.6.

Table 4.5: Measured active decap performance for different process corners.

           Average VDD voltage
Corners    active decap ON   active decap OFF   Improvement
Slow       914 mV            903 mV             11 mV
Typical    904 mV            878 mV             26 mV
Fast       917 mV            872 mV             45 mV

Table 4.6: Comparison between equation and simulated result after correlation.

           Final voltage      Simulated avg VDD:        Measured average VDD voltage
Corners    (from eqn.) aVDD   active decap only         active decap ON   active decap OFF
                              (correlated)
Slow       1169 mV            932 mV                    914 mV            903 mV
Typical    1058 mV            909 mV                    904 mV            878 mV
Fast       1031 mV            921 mV                    917 mV            872 mV

It is now possible to validate Equation (4.8). Although the test equipment has a limited bandwidth of less than 1.5GHz, the test results for a 1GHz clock can be used to correlate simulations for high frequency effects. The three process corners are used here. Simulations were carried out from 1GHz to 3GHz, showing the cross points of active decap and passive decap for the average supply voltage. The simulation results are then correlated with measurement results for the three process corners. The correlated results are shown in Fig. 4.17. The relationship between the active decap bandwidth (crossing point) and the average comparator delay is summarized in Table 4.7.
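The equation-based column of Table 4.6 can be cross-checked from the measured "active decap OFF" voltages. The sketch below assumes Equation (4.13) has the form a·VDD = b·VDD - k·VDD·n²/(1 - x), with b = 1.6, n = 2 and x = 10% taken from the design values quoted earlier in the chapter; that these exact values were used to fill the table is an assumption, and the function name is hypothetical.

```python
def final_voltage_mv(vdd_off_mv: float, b: float = 1.6, n: int = 2,
                     x: float = 0.10, vdd_nominal_mv: float = 1000.0) -> float:
    """Expected final voltage in mV from the average VDD with the active decap off.

    The supply noise fraction k is derived from the measured average voltage,
    then the assumed form of Eq. (4.13) is applied.
    """
    k = (vdd_nominal_mv - vdd_off_mv) / vdd_nominal_mv
    return b * vdd_nominal_mv - k * vdd_nominal_mv * n ** 2 / (1.0 - x)


# Measured average VDD with the active decap OFF (Tables 4.5 and 4.6).
measured_off_mv = {"slow": 903.0, "typical": 878.0, "fast": 872.0}

expected_mv = {corner: final_voltage_mv(v) for corner, v in measured_off_mv.items()}
for corner, value in expected_mv.items():
    print(f"{corner:8s} k = {(1000.0 - measured_off_mv[corner]) / 1000.0:.3f}  "
          f"final voltage = {value:.0f} mV")
```

With these assumed parameters the computed values round to the three entries listed under "Final voltage (from eqn.)" in Table 4.6.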
Clearly, Equation (4.8) captures the effect of process variation corners.

Table 4.7: Active decap bandwidth versus average comparator delay under process corners.

Corners    Bandwidth    Average delay
Slow       1.55 GHz     0.75 ns
Typical    2.4 GHz      0.51 ns
Fast       > 3 GHz      0.31 ns

Figure 4.17: Simulated average VDD voltage with active decap on and off versus clock frequency for (a) slow, (b) typical, and (c) fast process corners.

4.3.4 Measurement Results on One Typical Chip

Three sample chips were found to be at the typical process corner. The average VDD voltages when the active decap is on and off are given in Table 4.5. A sample chip that has a close value to the average voltage of the three typical chips was used for further analysis in this section. As before, by turning on and off the active decap, the average VDD voltage improvement was measured. This is shown in Fig. 4.18, where the input is set at 500MHz, typical of ASIC designs. In Fig. 4.18(a), an actual screen shot of the input and supply voltages is provided with the active decap enabled. In Fig. 4.18(b), the supply waveforms for the passive and active cases are superimposed. The average supply voltage increases from 900mV to 914mV. Therefore, the noise level dropped from 100mV to 86mV, an improvement of 14mV (or about 14% less noise). This improvement can be expected to be almost doubled for an isolated active decap, as illustrated later using simulation.

Figure 4.18: Measured results (on a 500MHz clock) for (a) active decap on and (b) plotted comparison between active decap on and off.

Fig. 4.19 shows the measured points as the external input frequency increased from 200MHz to 1GHz. The measurements at 500MHz described above are circled.
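The noise-reduction percentage quoted for the 500MHz measurement follows from simple arithmetic on the average supply voltages. The sketch below reproduces it, assuming a 1V nominal supply as stated for this process; the function names are hypothetical.

```python
def supply_noise_mv(vdd_avg_mv: float, vdd_nominal_mv: float = 1000.0) -> float:
    """Average supply noise: nominal supply minus the average VDD over a cycle."""
    return vdd_nominal_mv - vdd_avg_mv


def noise_reduction(vdd_off_mv: float, vdd_on_mv: float,
                    vdd_nominal_mv: float = 1000.0) -> tuple[float, float]:
    """Return (absolute reduction in mV, relative reduction in percent)."""
    noise_off = supply_noise_mv(vdd_off_mv, vdd_nominal_mv)
    noise_on = supply_noise_mv(vdd_on_mv, vdd_nominal_mv)
    delta_mv = noise_off - noise_on
    return delta_mv, 100.0 * delta_mv / noise_off


# 500MHz measurement on the typical chip: 900mV (off) versus 914mV (on).
delta_mv, percent = noise_reduction(900.0, 914.0)
print(f"noise reduced by {delta_mv:.0f} mV ({percent:.0f}% less noise)")
```

Because the percentage is taken relative to the noise with the active decap off, the same millivolt gain looks larger on chips whose passive-only noise is smaller.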
Two solid trend lines are provided corresponding to active decap on and off. The gap between the two trend lines initially widens, indicating that the benefit of active decaps increases as frequency increases. The test chip validated that the active decap has a maximum improvement of 23mV (or about 20% less noise) for a 1GHz design. Circuit simulation was used to further study the effect of higher clock frequencies, also shown in Fig. 4.19 as dashed lines. Below 2GHz, the active decap can provide more charge than the passive decap. Above 2GHz, its performance diminishes because of the fixed switching speed. The crossing point of the two trend lines at about 2.4GHz indicates the bandwidth limits due to the switching nature of this active decap design. However, today's high-end ASIC designs still run at below 1GHz, so this active decap is quite acceptable.

Figure 4.19: Measured (0.2 - 1GHz) and simulated (1 - 2.5GHz) average VDD voltage with active decap on and off versus clock frequency.

In Fig. 4.19, the maximum difference between the active decap on and off occurs at about 1GHz, about a half of the active decap bandwidth. As described in the previous section, the high-frequency cross point of the active decap and the passive decap is at about 1/0.5ns = 2GHz, where 0.5ns is the comparator delay. At low frequencies, the active decap and the passive decap have similar average VDD values because the clock period is long enough to eliminate the advantage of using the active decap when taking an average VDD level per clock cycle. As a result, since the difference is almost zero at low frequency and at the bandwidth frequency, the maximum difference (or benefit) can be considered to occur at roughly 1/2 of the active decap bandwidth, in this case, 2GHz/2 = 1GHz.
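The first-order rule used above (cross point near 1/td, largest benefit near half of that) can be applied to the corner delays of Table 4.7. The sketch below is only an estimate under that rule; it is not Equation (4.8), which is not reproduced in this section, and the function names are hypothetical.

```python
def crossover_ghz(td_ns: float) -> float:
    """First-order estimate of the active/passive cross point: 1 / td."""
    return 1.0 / td_ns


def peak_benefit_ghz(td_ns: float) -> float:
    """Clock frequency of largest benefit, taken as half of the cross point."""
    return 0.5 / td_ns


# Average comparator delay and simulated bandwidth per corner (Table 4.7).
corners = {
    "slow": (0.75, "1.55 GHz"),
    "typical": (0.51, "2.4 GHz"),
    "fast": (0.31, "> 3 GHz"),
}

for corner, (td_ns, simulated) in corners.items():
    print(f"{corner:8s} td = {td_ns:.2f} ns  1/td = {crossover_ghz(td_ns):.2f} GHz  "
          f"peak benefit near {peak_benefit_ghz(td_ns):.2f} GHz  "
          f"(simulated bandwidth: {simulated})")
```

The 1/td estimate tracks the trend of the simulated bandwidths across corners, although for the slow and typical corners it falls somewhat below the simulated cross points, so it is best read as a conservative rule of thumb.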
Therefore, to maximize the effectiveness of the active decap, designers should ensure that the comparator delay td is always less than 1/2 of the clock cycle.

4.4 Active Decap Size and Placement

While the test chips are useful in quantifying active decap improvement over passive decap as frequency increases, the proper sizing and placement of the active decap determines the effectiveness of the drop-in replacement approach. When converting a fixed area from passive decaps into active decaps, the standby capacitance is always smaller because of the area overhead of the switches and comparators in the active decap. The actual noise improvement depends on the area available and the location of the active decap relative to the hot spot. Intuitively, if the area available for active decaps is small and the overhead area is a large percentage of the total area, active decaps may not be an effective replacement for passive decaps. On the other hand, if the area available is too large, the noise reduction for active and passive decaps may be similar because of excessive delays in the switching circuitry trying to switch the oversized decaps. Therefore, there exists an optimal area for active decaps where they are most effective. Similarly, the instantaneous boost provided by the active decap must be close to the hot spot to be effective, but this should be traded off against the distance from the supply to replenish the charge.

To explore these aspects, extensive circuit simulation was used to first calibrate the test chip measurements with HSPICE simulation using exactly the same setup in Fig. 4.9. As a calibration metric, the average VDD noise per clock cycle, VDD,noise, was used since it is known to be the controlling factor of the critical path delay of logic circuits [20]:

$$V_{DD,noise} = V_{DD,nominal} - V_{DD,avg} = 1\,\mathrm{V} - V_{DD,avg} \qquad (4.14)$$

With a 500MHz input and the active decap turned on, the waveforms from circuit simulation of the supply voltage and internal signals were previously shown in Fig. 4.13.
These results are not identical to the measured results of Fig. 4.18 but they do, in fact, have a similar average value. For one clock cycle, VDD,avg = 918mV, which is fairly close to the measured average of 914mV. It was found that, for other frequencies, VDD,avg closely matched the running average from measurement. Since the measured and simulated values are correlated in this way, the average was used for the rest of the analysis. The fixed passive decap was also removed from the circuit in further analysis to study the improvements derived from the stand-alone active decap.

While VDD,noise is determined by many other factors including package/pad/power grid design and clock frequency, for the purpose of this analysis the power grid design and the 500MHz input were kept the same, and only the decap size was varied. The average noise for same-area passive and active decaps was compared. The size of the center switching circuitry of Fig. 4.10 also remained constant. In the passive decap simulations, standard cross-coupled designs were used. In Fig. 4.20, the average noise VDD,noise is plotted versus passive and active decap size varying from 85µm² to 8.5mm². The plot shows that the active decap reduces the noise relative to passive decaps for sizes between 0.001mm² and 0.6mm². However, if the available area for decap insertion is smaller than 0.001mm² or greater than 0.6mm², there is little difference between the two. When the area is small, the active decap is not as effective since the amount of capacitance switched in series is small. When the area is large, the fixed switching circuit in the active decap cannot switch the decaps effectively because the capacitive load exceeds its capability.

Figure 4.20: Simulated average VDD noise per clock cycle versus normalized decap size. [Passive and active decap curves versus decap size (mm², log scale).]

Fig. 4.21 is used to illustrate the optimal size for the active decap design, where the solid curve indicates the noise reduction difference between passive and active decaps. The maximum difference occurs in the range of 0.01mm² to 0.1mm². If the active decap is designed in this range, it has the greatest advantage over passive decaps in terms of average supply noise reduction. The test chip was designed to be 0.085mm² to obtain close to optimum improvement of 23mV, as described in the previous section. In Fig. 4.21, it is also shown that the area overhead of the switching circuitry in the design is only 10% in the region around the optimal value.

Figure 4.21: Power supply noise reduction difference from active decap and passive decap with area overhead from switching circuit of active decap. [Plotted versus decap size (mm², log scale).]

As mentioned earlier, another important factor in the resulting improvement is the actual placement of active decaps relative to the hot spot. Referring back to Fig. 4.9, the effective distance from the hot spot can be adjusted by varying Rdist. Similarly, the distance from the charge re-supply path can be controlled by varying Rmesh. Simulations were carried out to observe the voltage drops while changing only Rmesh and Rdist. The simulation results are shown in Fig. 4.22. The decap size was fixed at the optimal value of 0.02mm² from Fig. 4.20 so that the maximum improvement could be observed. As the distance between the decap and the user logic is varied from 10Rdist to 0.1Rdist, the average noise level in the passive case changes from 134mV to 124mV. However, for the active case, the average noise level reduces from 133mV to 74mV. Therefore, the active decap is more sensitive to placement than the passive decap. This makes intuitive sense because the active decap provides a short-term boost in the charge which acts in a small, localized neighborhood.
However, the passive and active decaps exhibit similar characteristics as a function of Rmesh, according to the results in Fig. 4.22. As a result, the active decap should be placed as close as possible to the hot spot to be most effective.

Figure 4.22: Improvement on average VDD noise for using active decaps in different placement locations by varying Rdist and Rmesh.

4.5 Summary

This chapter described the effectiveness of active decaps as a late-stage drop-in replacement for passive decaps so that a completed chip layout need not be disrupted near the tapeout deadline. Improvements to the design of an active decoupling capacitor were described for removal of hot-spot power supply noise in ASIC designs up to 1GHz operation. The modified active decap using latch-based comparators in 90nm CMOS is able to switch in 0.5ns and consumes a relatively low power of 2.8mW, which is about 5X lower than a previous design running at approximately the same speed. This reduced power makes it more suitable for use in ASIC designs. Measurement results from test chips indicate improvement over passive decaps of 10% - 20%, operating from 200MHz to 1GHz. The optimal active decap size to maximally remove hot-spot noise was identified. Placement analysis was also carried out and it was found that the active decap is most effective when placed in close proximity to the hot spot, as compared to the passive decap, which is not as sensitive to the exact location. In summary, if sized and placed properly, active decaps can be up to 20% better when used as drop-in replacements for passive decaps for power supply noise reduction.

Chapter 5
Generalized Active Decap and Charge-Borrowing Decap

5.1 Introduction

The previous chapter explored the effectiveness of using active decaps to remove hot-spot IR drop violations.
This chapter investigates advanced versions of the active decoupling capacitor and proposes a novel design of a charge-borrowing decap (CBD). The extension of the active decap concept is derived by increasing the stack height n to a larger value to ideally achieve a higher boosted voltage than the basic n=2 active decap. The optimal value of n will be evaluated in theory and practical applications [62]. The CBD design is a completely different approach to addressing the hot-spot removal problem [63]. The new design aims to provide enough charge during every cycle to reduce IR drop with only a minimum power overhead. However, the location of the IR-drop problem must be known in advance and sufficient area must be available with a relatively clean supply to implement the solution. The applications and limitations of the charge-borrowing decap will be evaluated in this chapter.

5.2 Extended Active Decoupling Capacitor

5.2.1 Optimal Stack Height n

The concept of the extended active decap is simply to increase the stack height n of the basic active decap described in Chapter 4. The motivation to have a larger stack height (n>2) is to generate a higher boosted voltage nVDD to potentially achieve a better improvement when applied to reduce supply noise. For example, the ideal boosted voltage is 3VDD for n=3, 4VDD for n=4, and so on. Therefore, it seems that the stack height should be designed as large as possible to obtain a high enough local boost so that the supply noise can be reduced to an arbitrarily small level. However, this is not true in practice. The higher boosted levels cannot be reached due to the nonidealities of the circuit. Also, by increasing the stack height, more switches will be required to turn the decaps in parallel or in series. The active decap circuits for n=2, n=3 and n=4 are illustrated in Fig. 5.1(a), 5.1(b) and 5.1(c), respectively.
In the figure, it is assumed that the total area available for the decaps is fixed at 2Cdecap. Therefore, each decap occupies an area of Cdecap for n=2, while the cases of n=3 and n=4 have an area of (2/3)Cdecap and (1/2)Cdecap, respectively.

Figure 5.1: Active decap concept showing different stack height: (a) n=2, (b) n=3, and (c) n=4. [Each panel shows the decaps in parallel and in series, with ideal boosted voltages of 2VDD, 3VDD and 4VDD.]

The practical constraints of stacking the decaps can be illustrated by first showing the actual transistor implementation of the n=3 case in Fig. 5.2, where the switches are implemented using MOS transistors. Note that the number of switches required is increased by three every time the stack height is increased by one. That is, for n≥2, the number of switches is 3(n-1). In the figure, each horizontal switch is implemented with two transistors (NMOS and PMOS), whereas each vertical switch requires only one transistor (NMOS or PMOS). The design of the stack height of n=4 can be realized in a similar manner, although not shown here.

Figure 5.2: MOS implementation of the extended active decap (n=3).

Due to the additional switches, two methods of design exist: one is to expand the area occupied by the switch transistors, and the other is to reduce the size of each switch transistor if a relatively constant total area needs to be maintained. The first method increases the area overhead x of the active decap, and also causes a longer comparator delay because of the additional loading capacitance. The increased area overhead reduces the final voltage aVDD to which the active decap can boost the supply voltage. But more importantly, a longer delay results in a reduced bandwidth, which lowers the operating frequency, as described in the previous chapter.
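The bookkeeping for a given stack height can be summarized in a few lines. The sketch below uses the switch count 3(n-1) and the fixed total area of 2Cdecap from the text; the split into n-1 horizontal (series) switches and 2(n-1) vertical (parallel) switches is an assumption made here for illustration, chosen because it gives the four switch transistors of the basic n=2 design. The function name and dictionary keys are hypothetical.

```python
def stack_resources(n: int, total_area_units: float = 2.0) -> dict:
    """Resources for an active decap of stack height n (fixed total decap area).

    total_area_units is the total decap area in units of Cdecap (2 in the text).
    """
    if n < 2:
        raise ValueError("an active decap needs a stack height of at least 2")
    horizontal = n - 1        # assumed: one series switch between adjacent decaps
    vertical = 2 * (n - 1)    # assumed: remaining switches tie decaps to the rails
    return {
        "switches": horizontal + vertical,       # equals 3(n-1), as in the text
        "transistors": 2 * horizontal + vertical,  # 2 per horizontal, 1 per vertical
        "area_per_decap": total_area_units / n,  # in units of Cdecap
        "ideal_boost": n,                        # ideal boosted voltage, n * VDD
    }


for n in (2, 3, 4):
    r = stack_resources(n)
    print(f"n = {n}: {r['switches']} switches, {r['transistors']} transistors, "
          f"{r['area_per_decap']:.2f} Cdecap per decap, ideal boost {r['ideal_boost']} VDD")
```

The growth in switch transistors with n is what forces the choice between a larger area overhead and smaller, more resistive switches.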
The second method uses a fixed total area occupied by the switch transistors. By this approach, the delay of the comparators should remain roughly the same since there is only a minimum change in the loading capacitance. Therefore, the operating frequency is not affected. However, packing more switch transistors into the same area can cause the "on" resistance R of each transistor to increase, which reduces the boosted voltage bVDD.

The second method to implement the extended active decap was used since the active decap bandwidth should not be compromised for high-end ASIC applications. That is, the different designs are designed to have the same operating bandwidth. The final voltage aVDD can be obtained by varying the stack height, n, as illustrated in Fig. 5.3, with different supply noise levels, k. The large dots indicate the highest final voltage points on the different k curves. Note that the starting point of n=1 implies a passive decap. Clearly, for k=0.05, the optimal n value that produces the highest final voltage is 4, whereas for k=0.1, the optimal value is n=3. For an increased level of k=0.15, the optimal n value reduces to 2. As the supply noise k further increases, the use of passive decaps (n=1) is recommended, as for the case of k=0.2. From the figure, one can conclude that a higher n should be used if k is small, and vice versa. Therefore, it is important to select n based on the k range where the resulting final voltage aVDD is the highest. In order to do this, the optimal k ranges for each n and the crossover points between ranges must be determined.

Figure 5.3: Final voltage aVDD as a function of stack height n with k varying (fixed area).

The supply noise crossover point, $k_{n_1,n_2}$, for two different stacking levels, n=n1 and n=n2, is defined when both cases produce the same final voltage. This can be used to identify suitable ranges for each stacking level.
To obtain the crossover point kn1,n2, Equation (4.13) can be rearranged into the following form:

\[ k_{n_1,n_2} = \frac{(b_{n_2} - b_{n_1})(1 - x_{n_1})(1 - x_{n_2})}{(n_2 - n_1) + n_1 x_{n_2} - n_2 x_{n_1}} \qquad (5.1) \]

where n1≠n2 and n2>n1. Plugging in numbers, the crossover noise value from n=2 to n=3 is k2,3=0.12, and from n=3 to n=4 is k3,4=0.08. Effectively, the crossover point k4,5=0.05 determines the boundary where a passive decap should be used, since the case of n=5 would not be used if the noise was 5% or less (i.e., an acceptable level). Similarly, k1,2=0.17 produces the same final voltage for n=2 and n=1 (passive decap). When k is above 0.17, the passive decap should be used. The results are presented in graphical form in Fig. 5.4. The four lines represent n=1, 2, 3 and 4, respectively. The line with the highest value in each region is the optimal value for n. For low values of k, the best choice is n=4. Starting at k3,4 the best choice is n=3. At k2,3 the best choice becomes n=2. At k1,2 the best choice is n=1 from that point onward. The results are summarized in Table 5.1.

Table 5.1: Optimal stack height n selection based on the supply noise k (from formula).

Condition            Optimal n
k < 0.05             n=1 (use passive decap)
0.05 < k < 0.08      n=4
0.08 < k < 0.12      n=3
0.12 < k < 0.17      n=2
k > 0.17             n=1 (use passive decap)

Figure 5.4: Final voltage aVDD as a function of k with different stack height n (from formula).

As described earlier, if the supply noise is above 0.17, the use of any form of the active decap cannot boost the supply voltage to a satisfactory level. Other design approaches to reduce the supply noise may have to be used in that situation. However, the more interesting range of k is from 0.08 to 0.17, since this noise level is typically unacceptable. If the supply noise k is below 0.12 but above 0.08, then the active decap should be designed with n=3 to produce the minimum noise.
Similarly, if k is above 0.12 but below 0.17, the basic active decap with n=2 is optimal.

5.2.2 Design and Layout of n=3 Extended Active Decap

To validate the results of optimal n selection, the extended active decaps with n=3 and n=4 were implemented. For simplicity, only the design of n=3 is illustrated in this section. Similar to the basic active decap design, Fig. 5.5 illustrates the extended active decap for n=3 containing four blocks: a reference voltage generator, a pair of high-pass filters, two comparators, and the switched decaps. The operation of the circuit remains the same: the differential inputs of each comparator decide the standby voltage level at the outputs of the comparators. When the power grid discharges, VDD will drop and VSS will rise. The voltage variations are passed through the high-pass filters to the comparator inputs, causing the comparators to reverse their output values and switch the three-piece decaps from parallel to series. Later, when the power grid charges up, the comparator inputs and outputs switch back to their original values. An enable signal is provided for testing purposes to allow the active decap circuitry to be connected to the global supply rail. Unlike the previous design, when off, the extended active decap is disconnected from the rest of the circuit. This allows for a comparison between the basic and the extended decaps.

Figure 5.5: Extended active decap (n=3) architecture.

Figure 5.6: Layout of extended active decap (n=3) showing the relative size of the components.

The layout of this extended active decap is shown in Fig. 5.6. The switching circuitry is located offset from the center, with the three parallel decaps on either side. The two decaps on the left are PMOS, and the one on the right is NMOS.
The total layout area for the active decap is 600µm x 142µm = 0.085mm², in which the three decaps combine for an area of 0.077mm². Each switch transistor was laid out with only half of the area used before, which almost doubles the “on” resistance. As a result, the switching circuitry, including the switch transistors, still accounts for an area overhead of only about 10%. Note that the total area is the same as the basic active decap so that a comparison between the two can be carried out. Although not shown, the design and layout of the n=4 case is similar to the case of n=3.

5.2.3 Simulation Results

The next step was to use simulations to obtain the supply voltage waveforms under different supply noise levels, k. By increasing the size of the user logic buffer, k can be varied and the supply voltages will differ in the cases of n=2 and n=3, as illustrated in Fig. 5.7. When k=0.12, case n=3 provides a larger boost than n=2. On the other hand, when k=0.15, both n=2 and n=3 were insufficient in terms of delivering charge to boost up the supply, but the n=3 case is even worse. As shown in the figure, by using the second method, the delay of the comparators remains roughly the same. The design of the n=4 active decap was also simulated, and the results are similar to the n=2 basic active decap case. One noticeable difference is that as k increases above 0.12, the average VDD voltage drops earlier for the n=4 case than the n=2 case. However, when k is at around 0.05, the average VDD voltage for n=4 is slightly higher than both n=2 and n=3 cases.
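The selection rule tabulated in Table 5.1 can be written as a small lookup, which makes it easy to compare formula-based and simulated crossover points. The following is only a minimal sketch: the function name, the tuple layout, and the handling of values that fall exactly on a crossover are assumptions made for illustration, not part of the original design flow.

```python
# Crossover points (k4,5, k3,4, k2,3, k1,2) copied from Table 5.1.
FORMULA_CROSSOVERS = (0.05, 0.08, 0.12, 0.17)


def optimal_stack_height(k, crossovers=FORMULA_CROSSOVERS):
    """Return the stack height n suggested for supply noise level k.

    `crossovers` holds (k45, k34, k23, k12) in ascending order.
    A value of k exactly on a crossover is a tie in the table; the
    branch order below decides such ties arbitrarily.
    """
    k45, k34, k23, k12 = crossovers
    if k < k45 or k >= k12:
        return 1  # passive decap
    if k < k34:
        return 4
    if k < k23:
        return 3
    return 2


if __name__ == "__main__":
    for k in (0.03, 0.06, 0.10, 0.15, 0.20):
        print(f"k={k:.2f} -> n={optimal_stack_height(k)}")
```

A different set of crossover points, such as values extracted from simulation, can be passed through the `crossovers` argument without changing the function.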
Figure 5.7: Simulated VDD voltage with extended active decap (n=3) on for two different k levels. [Panels: current taken from the buffer to produce k=0.12 or k=0.15; VDD voltage at k=0.12; VDD voltage at k=0.15. Each voltage panel compares the basic and the extended active decap.]

Table 5.2: Optimal stack height n selection based on the supply noise k (from simulation).

Condition            Optimal n (simulated)
k < 0.05             n=1 (use passive decap)
0.05 < k < 0.07      n=4
0.07 < k < 0.14      n=3
0.14 < k < 0.16      n=2
k > 0.16             n=1 (use passive decap)

Figure 5.8: Average VDD voltage as a function of k with different stack height n (from simulation). [Curves: passive decap (n=1) and active decaps with n=2, n=3 and n=4.]

Using simulations, the average supply voltages per clock cycle for the n=2, 3 and 4 cases when the supply noise k varies from 0.02 to 0.2 were compared and plotted in Fig. 5.8. The corresponding optimal stack height n as a function of the supply noise k is summarized in Table 5.2. The crossover points k1,2 are similar between formula and simulation. The most important crossover point, between the n=2 and n=3 cases, is k2,3=0.14 from simulation, somewhat higher than the calculated value of 0.12. Above 0.14, no approach can raise the supply level back to 900mV, making the use of active decaps less valuable in this region. On the other hand, the lower bound k3,4 is at 0.07, slightly below the calculated level of 0.08. Therefore, a slightly wider k range of 0.07-0.14 for n=3 makes it superior to the basic active decap. Unlike the formula prediction, when the k value is low, the active decaps do not switch because of the fixed triggering voltage of about 50mV, so they produce a slightly worse average supply voltage than the passive decap. The active decaps are therefore worse than the passive decap when k<0.05 and should not be used in that range. Although the n=4 case is the best when k is in the range of 0.05-0.07, it only has limited value since this k range is small and the improvement over the n=3 active decap is marginal.
Thus, it can be concluded from simulation that n=3 provides the optimal level of the average supply voltage across a wide supply noise range below 0.14. If the supply noise is above 0.14, a larger area is required to raise the average supply to a satisfactory level above 900mV.

5.3 Charge-Borrowing Decap (CBD)

5.3.1 Charge-Borrowing Decap Concept

The main purpose of the active decap is to boost the voltage locally to reduce supply noise. Therefore, any technique that offers this type of improvement would also qualify as a viable alternative. For example, if charge is “borrowed” from a clean supply to boost up a noisy supply, it would help reduce the hot-spot IR-drop problem. That is the basic concept behind a charge-borrowing decap (CBD), which is a novel but relatively simple idea illustrated in Fig. 5.9. The key idea here is based on capacitive feedthrough. Assuming that the total area available is the same as before, the decoupling capacitance is 2Cdecap. In Fig. 5.9(a), when power supply noise kVDD is present, a passive decap provides charge equal to (2Cdecap)(kVDD) = (2k)CdecapVDD. In the case of the CBD, as shown in Fig. 5.9(b), it can boost the local supply voltage to 2VDD ideally, similar to the active decaps. From another perspective, the charge provided by the CBD circuit in one clock cycle can ideally be up to (2Cdecap)ΔVclk = 2CdecapVDD, where ΔVclk is the clock swing. Therefore, over one clock cycle, the charge-borrowing decap provides significantly more charge than a same-area passive decap.
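The ideal charge figures above, together with the active-decap entries listed in Table 5.3, can be checked with exact fractions. This is a sketch under stated assumptions: charge is expressed in units of Cdecap·VDD, the function names are illustrative, and the closed form 2(n-1)/n² for a stack of n decaps is an inference that reproduces the tabulated values (n equal decaps of (2/n)Cdecap in series, lifted by (n-1)VDD), not a formula quoted from the text.

```python
from fractions import Fraction


def passive_charge(k):
    """Ideal charge from a passive decap of size 2*Cdecap at noise level k."""
    return 2 * k


def cbd_charge():
    """Ideal charge from the CBD per clock cycle (clock swing equals VDD)."""
    return Fraction(2)


def stacked_active_charge(n):
    """Assumed closed form for an active decap with stack height n.

    Each decap is (2/n)*Cdecap; n of them in series give (2/n**2)*Cdecap,
    and the series stack is lifted by (n - 1)*VDD.
    """
    return Fraction(2 * (n - 1), n * n)


if __name__ == "__main__":
    k = Fraction(1, 10)  # example noise level, chosen for illustration
    print("passive:", passive_charge(k))
    for n in (2, 3, 4):
        print(f"active n={n}:", stacked_active_charge(n))
    print("CBD:", cbd_charge())
    # Ratio of CBD charge to passive charge is 1/k in this ideal picture.
    print("CBD / passive:", cbd_charge() / passive_charge(k))
```

In this idealized view the CBD-to-passive charge ratio is 1/k, so the advantage is largest when the noise level is small; non-idealities such as diode drops and finite clock slew are not modeled here.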
Figure 5.9: Charge-borrowing decap concept shown in (b), compared to a passive decap in (a).

Table 5.3: High-level comparison among passive, active, and charge-borrowing decaps.

                                  Charge provided (ideal)    Boosted voltage (ideal)
Passive decap                     (2k)CdecapVDD              -
Active decap (n=2)                (1/2)CdecapVDD             2VDD
Active decap (n=3)                (4/9)CdecapVDD             3VDD
Active decap (n=4)                (3/8)CdecapVDD             4VDD
Charge-borrowing decap (CBD)      2CdecapVDD                 2VDD

A high-level comparison between the passive decap, the active decaps, and the CBD is given in Table 5.3, where the differences in charge available and boosted voltage are shown. Note that the total decap area is fixed to 2Cdecap in the comparison. In Table 5.3, the basic active decap (n=2) is better in charge provided than the passive decap if 2k is less than 0.5 (or, k<0.25), plus the active decap also provides a local boost in the supply voltage. The charge-borrowing decap is always superior to the basic active decap since it supplies more charge while generating the same boosted voltage level. The charge supplied from active decaps with n=3 or n=4 is about 11% or 25% less than the basic active decap (n=2), respectively, but their boosted voltages are higher. This intuitively explains the fact that the extended active decap is better than the basic case only in limited situations. From the concept alone, it is difficult to decide between the extended active decaps and the CBD. More design details need to be studied before a conclusion can be made. From the table, the design that provides the most charge per clock cycle is the charge-borrowing decap, which is how it got its name. If designed properly, charge-borrowing decaps can potentially remove hot spots as a drop-in replacement for passive decaps, similar to the active decaps. The rest of this chapter will provide results to support this argument.

In Fig. 5.9(b), there is a problem with the falling edge of the clock.
When the Clk signal rises from 0 to VDD, the supply is boosted to 2VDD ideally due to capacitive feedthrough. Then, the user logic nearby switches, withdrawing a certain amount of charge from the supply. The supply voltage will drop by ΔV. And then the Clk signal falls from VDD back to 0, forcing the supply voltage to drop from 2VDD-ΔV to VDD-ΔV, which may not be acceptable if ΔV is excessively large. Therefore, the concept of the charge-borrowing decap needs some additional circuitry to function properly, as shown in Fig. 5.10(a). Two diodes are inserted at node B1, one connecting to the clean supply and the other to the noisy supply. Without the connection to the clean VDD, when the clock is low, B1 stays at roughly VSS since the current flow from VDD to B1 is prevented by the diode D2. When the clock goes high, B1 rises to VDD. This boost in voltage will not trigger current flow from B1 to the supply because both B1 and the supply are at the same level of VDD. Therefore, the voltage at B1 should be around VDD when the clock is low. This ensures that the voltage at B1 can reach about 2VDD when the clock goes high and the supply can be charged. To achieve that, access to a clean supply of VDD is needed through D1. Therefore, the implementation of the charge-borrowing decap requires a clocking signal, two diodes, and a supply node that has less noise (clean).

Assuming there is one threshold voltage VT drop across each diode, the operation of the CBD circuit can be illustrated as in Fig. 5.10. When Clk is at 0, the noisy supply node is assumed to be at VDD, while B1 is charged to VDD-VT. When Clk rises to VDD, B1 rises to 2VDD-VT at the same time. As a result, the noisy supply node is raised to 2VDD-2VT, one further diode drop below B1. Before the clock falls, some drop occurs at the supply, causing it to drop by ΔV to 2VDD-2VT-ΔV. Then, Clk falls back to 0, so that B1 also reduces to VDD-VT. However, D2 prevents charge from flowing back to B1 from the supply.
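The three clock states described above can be traced numerically. The sketch below is illustrative only: it assumes ideal diodes with a fixed threshold drop, a localized noisy supply that follows B1 minus one diode drop, and example values for VDD, the thresholds and ΔV that are hypothetical rather than taken from the design.

```python
def cbd_states(vdd, vt1, vt2, dv):
    """Ideal node voltages for the three clock states of the diode-based CBD.

    vt1: drop across the diode from the clean supply to B1 (D1).
    vt2: drop across the diode from B1 to the noisy supply (D2).
    dv:  droop caused by the load before the clock falls.
    Returns a list of (state, clk, b1, noisy_supply) tuples.
    """
    b1_low = vdd - vt1                # B1 precharged through D1
    b1_high = b1_low + vdd            # capacitive feedthrough of the clock
    boosted = b1_high - vt2           # supply charged through D2
    held = boosted - dv               # D2 blocks reverse flow when Clk falls
    return [
        ("Clk at 0", 0.0, b1_low, vdd),
        ("Clk rises to VDD", vdd, b1_high, boosted),
        ("Clk falls back to 0", 0.0, b1_low, held),
    ]


if __name__ == "__main__":
    # Hypothetical example values, chosen only to make the arithmetic visible.
    for state, clk, b1, supply in cbd_states(vdd=1.0, vt1=0.25, vt2=0.25, dv=0.125):
        print(f"{state:22s} Clk={clk:.3f}  B1={b1:.3f}  supply={supply:.3f}")
```

With equal thresholds the boosted level reduces to 2VDD-2VT, and the held level after the falling edge to 2VDD-2VT-ΔV.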
Therefore, the noisy supply remains at 2VDD-2VT-ΔV.

Figure 5.10: Diode inserted charge-borrowing decap showing the states of the clocking signal: (a) Clk at 0, (b) Clk rises to VDD, and (c) Clk falls back to 0.

In a CMOS process, the diodes shown in Fig. 5.10 can generally be implemented in one of two forms: NMOS and PMOS, as illustrated in Fig. 5.11(a) and 5.11(b), respectively [58][64]. In Fig. 5.11(a), the voltage at B1 when the clock signal is low is VDD-VTn1, where VTn1 is the threshold voltage of Mdn1. When the clock rises, B1 is increased to 2VDD-VTn1, causing current flow to charge up the noisy supply rail. Assuming that the supply is localized, the boosted voltage is 2VDD-VTn1-VTn2 due to the threshold voltage degradation of Mdn2. Similarly, in Fig. 5.11(b), the boosted voltage is 2VDD-|VTp1|-|VTp2|, where |VTp1| and |VTp2| are the absolute threshold voltages of Mdp1 and Mdp2, respectively.

Figure 5.11: Possible implementations of charge-borrowing decap with (a) NMOS-formed diodes and (b) PMOS-formed diodes.

Both forms in Fig. 5.11(a) and 5.11(b) have practical limitations. In Fig. 5.11(a), when B1 reaches 2VDD-VTn1, the same voltage is applied at the gate of Mdn2. Such a high gate voltage may cause oxide breakdown in deep submicron processes, particularly 90nm and below [24]. A thick-oxide transistor should be used for Mdn2 to protect it from breakdown. In Fig. 5.11(b), when B1 is at 2VDD-|VTp1|, the drain of the transistor Mdp1 is at the same voltage, while the body of the transistor is tied to VDD. This also creates a reliability concern: the pn junction of the transistor Mdp1 is forward-biased, resulting in the injection of a large amount of current back to the clean supply rail. Therefore, the PMOS implementation in Fig.
5.11(b) has to be modified. For example, if the gate voltage of Mdp1 were controlled separately using an appropriate voltage level and the bulk of Mdp1 were connected to B1, the forward biasing of the pn junction could be avoided.

Figure 5.12: Improved implementation of charge-borrowing decap with PMOS-formed diodes.

The modified circuit shown in Fig. 5.12 resolves the issue of forward-biasing the transistor Mdp1. When the clock is low, B2 is set at VSS, which allows B1 to be charged to VDD instead of VDD-|VTp1|. Forward-biasing the pn junction of Mdp1 here is no problem because the transistor behaves as a diode. When the clock rises, B1 is switched to 2VDD, and B2 is also raised to 2VDD. The high gate-source voltage of Mdp1 disables the current flow back to the supply. The pn junction is now reverse-biased to prevent leakage. Note that in Fig. 5.12, the gate-body voltages of both transistors Mdp1 and Mdp2 are always within one VDD of each other, ensuring gate-oxide reliability. Moreover, as a side effect, this circuit can generate a boosted voltage of 2VDD-|VTp2| instead of 2VDD-|VTp1|-|VTp2|, which improves the design.

Figure 5.13: Generation of the boosted voltage on node B2.

To generate the bootstrapped signal at node B2, the concept of a clock multiplier [58][65][66] can be used. The circuit for generating the bootstrapped signal at B2 is shown in Fig. 5.13. If one assumes that the clock has no previous activity, due to leakage through the transistors, it can be verified that both nodes B3 and B4 are at roughly VDD while both transistors Mcn1 and Mcn2 are shut off. When the clock is turned low, B3 goes to 2VDD and B4 is roughly at VDD, which causes Mcn2 to turn on. The output node B2 is low, along with the clock signal. When the clock rises to VDD, B3 is discharged from 2VDD to VDD, whereas B4 is charged up to 2VDD. This causes Mcn1 to turn on and Mcn2 to turn off.
The rise of the clock signal also turns on Mip, allowing the output node B2 to follow B4. Because the gate of Mip is at VSS, the voltage at B2 can rise to 2VDD without any voltage loss. Similar to the previous design approach, the body of the transistor Mip is connected to B4 to ensure reverse-biased pn junctions. In Fig. 5.13, the two inverters are powered by the clean supply rail. The sizes of the PMOS-formed capacitors should be large enough to supply enough charge to the load capacitance on B2. For Mcn1 and Mcn2, thick-oxide transistors are used to reduce the risk of potential oxide breakdown because their gate-body voltages can be up to 2VDD.

5.3.2 “Clk” Signal Generation

A critical concern about the charge-borrowing decap design is the additional capacitive loading on the clock distribution system if the main clock of the chip is connected to the Clk input of the CBD. Since the CBD has a large capacitance value in the range of hundreds of picofarads, such a large capacitance loaded on the clock tree may cause an imbalance of the tree and introduce more clock skew and jitter [67]. In extreme cases, this extra loading may cause a functional failure in the clock distribution network. Therefore, the Clk input of the CBD should be generated from some other source to keep the main clock tree unaffected.

The Clk input of the CBD simply requires a repetitive signal with enough buffer strength to drive a large capacitor. The frequency of the Clk signal should be roughly in the range of the chip’s operating frequency to ensure that sufficient charge is pumped into the supply rail at every clock cycle. If the chip operates at a lower frequency, the extra charge provided by the CBD will not harm the logic circuits connected to the local supply as long as the slew rate of the boosted voltages remains controlled within practical limitations. Another useful feature would be to implement an enable/disable function for the CBD block. When disabled, the block should behave like a regular passive decap.
The CBD is turned on only when the IR drop exceeds a certain predefined level or only during the period that the logic circuits connected to the local supply experience higher activity.

Figure 5.14: “Clk” signal generation using ring oscillator. [Blocks: ring oscillator with enable input, 39 stages; buffer chain, 7 stages; powered from the clean VDD.]

Satisfying the requirements above, a simple ring oscillator was selected to generate the Clk signal for the CBD, as illustrated in Fig. 5.14. A total of 39 inverter stages were used to provide an oscillation frequency of 1GHz, the upper limit of the targeted ASIC applications. A NAND gate replaces the regular inverter at the first stage to incorporate an enable signal. By using unit-sized inverters, the ring oscillator consumes about 65µW of dynamic power. To provide enough current flow to charge and discharge the decoupling capacitance, in this case 2Cdecap, a chain of inverters was added and sized according to logical effort [7]. A fan-out factor of about 3 to 4 was selected for the inverter chain so that the delay through the chain was minimized. The number of stages required to produce the fan-out factor of 3 to 4 was then calculated. Of course, the circuit in Fig. 5.14 will cause additional supply noise on the “clean” supply, especially with a large last stage in the buffer chain. One has to ensure that the additional noise on the relatively clean supply node does not exceed a certain noise budget when transferring charge from the clean supply to the noisy supply. As the size of the inverter chain increases, its dynamic power also increases, with the benefit of an improved slew rate (SR). This effect is shown in Fig. 5.15. Note that in the figure the dynamic power includes the ring oscillator plus the entire buffer chain, in which each buffer is sized up properly to produce the minimum path delay.
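The stage counts quoted above can be related to per-stage delay and fan-out with two textbook relations. This is a back-of-the-envelope sketch, not the sizing procedure actually used: it assumes that every ring stage (including the NAND gate) has the same delay, so that f = 1/(2·N·t_stage), and that a chain of N stages with equal fan-out f spans a total drive ratio of f^N. The function names are illustrative.

```python
import math


def ring_stage_delay(n_stages, f_osc_hz):
    """Per-stage delay of an N-stage ring oscillator, assuming equal stages."""
    return 1.0 / (2.0 * n_stages * f_osc_hz)


def per_stage_fanout(total_ratio, n_stages):
    """Equal per-stage fan-out needed to span total_ratio in n_stages."""
    return total_ratio ** (1.0 / n_stages)


def stages_needed(total_ratio, fanout):
    """Number of stages (possibly fractional) for a given per-stage fan-out."""
    return math.log(total_ratio) / math.log(fanout)


if __name__ == "__main__":
    t_stage = ring_stage_delay(39, 1e9)
    print(f"per-stage delay for 39 stages at 1GHz: {t_stage * 1e12:.2f} ps")
    # A 7-stage chain with fan-out between 3 and 4 spans a drive ratio
    # between 3**7 and 4**7 under the equal-fan-out assumption.
    print("drive ratio range for 7 stages:", 3 ** 7, "to", 4 ** 7)
    print("stages for ratio 4**7 at fan-out 4:", stages_needed(4 ** 7, 4))
```

Under these assumptions a 39-stage ring at 1GHz implies a per-stage delay of roughly 12.8ps; real stage delays differ between the NAND gate and the inverters and depend on loading.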
Figure 5.15: Buffer size determines the edge delay while the buffer chain (with ring oscillator) consumes dynamic power. [Horizontal axis: final stage size (NMOS transistor) in µm, from 0 to 1000.]

The current through the last stage of the chain, Ilast_stage, controls the decap charge and discharge, so it determines the slew rate. The edge delay can be defined here as the delay time for the positive side of the decap to rise a full swing, as follows:

\[ \text{Edge Delay} = \frac{1}{SR} = \frac{2C_{decap}}{I_{last\_stage}} \qquad (5.2) \]

The edge delay is a better term in this scenario to illustrate the design tradeoffs in Fig. 5.15. For a target clock frequency of 1GHz, the corresponding clock period is 1ns. Having an edge delay in the range of 50 to 100ps (i.e., 1/20 to 1/10 of the clock period) is reasonable. Therefore, a buffer size of 300µm/600µm (NMOS/PMOS) for the last stage was selected to produce an edge delay of about 50ps, while the total dynamic power was around 3.8mW. In that case, the buffer chain was designed to have 7 stages sized up according to logical effort.

When determining the size of the buffer chain, the capacitance of 2Cdecap is fixed at 0.68nF, consistent with the basic active decap design in Chapter 4. Clearly, assuming a fixed edge delay, the size of buffer required is proportional to the decap value. If a smaller decap is used, the buffer size can be made smaller to dissipate less power. The power drawn from the clean supply node is critical: the supply noise caused by the ring oscillator and the buffer chain should not rise beyond the noise budget in its localized region. That is, the goal of generating the “Clk” signal from a clean VDD is to provide charge from the clean supply that is not connected to the main system clock or any important circuitry.
Designers must ensure that the clean supply itself does not become excessively noisy, which would compromise the local supply integrity.

5.3.3 Design of Charge-Borrowing Decap

With the considerations from the previous section, the complete charge-borrowing decap circuit is depicted in Fig. 5.16. An enable signal is provided to turn off the CBD for test purposes. When the enable signal is low, the transistor Msp is off, preventing current flow from the clean supply. The decap is implemented using PMOS transistors, whose value is set to 2Cdecap for comparison. When enabled, the voltage at B2 varies from 0 to 2VDD. The gate-source capacitance of Mdp1 creates some noise on the clean supply node due to clock feedthrough. The presence of Msp shields the clean supply from the clock feedthrough and reduces this noise. With this circuit configuration, the practical boosted voltage is 2VDD-|VTp2|, while the charge provided is 2CdecapVDD. Note that the ESD concern on the thin-oxide decap in this circuit should be addressed by proper sizing of the two transistors, Mdp1 and Mdp2.

Figure 5.16: Complete circuit diagram of charge-borrowing decap.

As described earlier, after a local hot spot is identified, its nearby white space or passive decap area is occupied with an active decap or a CBD to reduce the supply noise. In the case of the CBD, only the passive decap 2Cdecap and the transistor Mdp2 need to be implemented locally. The other circuits shown in Fig. 5.16 can be located away from the hot spot but near a clean supply node, once such a clean supply is identified. Two global interconnects may be required to connect the two parts of the circuit at nodes Clk and B1. The actual placement of the ring oscillator and the buffer chain will depend on the floorplan and the location of the power pins of the chip itself. Compared to the size of the passive decap, the size of Mdp2 is fairly small and even negligible to include in an area overhead.
Since the clean supply node does not require a large area of decaps, the area occupied by the circuits at the clean supply is relatively small. Thus, it is assumed that the CBD block requires a minimal area overhead, relative to the hot-spot area where the CBD is placed.

5.3.4 Simulation Results

To validate the concept of the charge-borrowing decap, HSPICE simulations were carried out. In the simulation setup, one charge-borrowing decap with enable signal is present, and no decoupling capacitor is connected between VDD and VSS. The load on the supply rail is a large buffer whose current demand can be controlled. As mentioned previously, the CBD can boost the supply voltage when the clock signal rises. The best scenario occurs when the current demand of the buffer also lines up with the rising edge of the clock. As a result, the dips on the supply voltage produced by the current demand and the peaks generated by the CBD cancel each other, causing a relatively low noise voltage profile. Such a case is illustrated in Fig. 5.17.

In Fig. 5.17, the top part of the figure shows the clock signal, whereas the second part depicts the supply voltage when both the passive decap and the CBD are disconnected from the supply rail. The load buffer is designed to create voltage sags at the rising clock edges. In the third portion of Fig. 5.17, the load buffer is removed and only the CBD is connected and turned on. Clearly, the supply voltage is boosted at the rising edges. The dips near the falling edges can be considered as ripples since there is no decoupling capacitance connected. The last (fourth) part of the figure illustrates the voltage waveform when the CBD is turned on and the load is switching.
The resulting supply voltage experiences a low level of noise because of the cancellation of peaks and dips.

Figure 5.17: Simulated VDD voltage (on a 1GHz clock) with a CBD on and off (best case). [Panels, top to bottom: Clk; VDD voltage with CBD off (no decap); VDD voltage with CBD on (no load); VDD voltage with CBD on (with load).]

The above example can be considered as a best-case scenario because the current demand of the load and the clock rising edge are synchronized. On the other hand, the worst case would be that the current demand and the falling clock edge are lined up. The simulation result at a 1GHz external clock for the worst case is shown in Fig. 5.18. In the figure, the voltage sags created by the current demand are similar when the CBD circuit is turned on or off. The peaks created by the CBD are out of phase with the sags produced by the current demand. However, if the average VDD value per clock cycle is used, similar to the previous approaches, it is clear that the CBD circuit produces a much higher average VDD voltage because the voltage peaks in every clock cycle raise the average VDD.

Figure 5.18: Simulated VDD voltage (on a 1GHz clock) with a CBD on and off (worst case). [Panels: Clk; VDD voltage with CBD on and CBD off.]

The simulations above are intended only to illustrate how the CBD circuit behaves under controlled conditions. Another set of simulations was used to compare the CBD with the passive decap and the active decaps (both n=2 and n=3) for a clock frequency of 1GHz. As before, the size of the user logic buffer is changed to produce different supply noise levels k. The results are plotted in Fig. 5.19.
At all k levels, the average VDD voltage for the CBD is higher than for the passive decap and the two active decaps. When k=0.15 for the passive decap, the average supply noise for the CBD is still at a satisfactory level of 100mV. By comparison, at the same k level the average noise of the basic and extended active decaps falls close to that of the passive decap. This indicates that the CBD is more effective as a drop-in replacement than the other schemes.

Figure 5.19: Simulated average VDD voltage as a function of k showing the case of CBD.

5.4 Test Chip Setup and Measurement Results

To validate this new approach to hot-spot removal, another test chip was fabricated in the same 90nm process. The degree of improvement is of interest as operation frequency increases when a basic active decap, an extended active decap, or a charge-borrowing decap is used as a drop-in replacement for a passive decap. The test chip setup is shown in Fig. 5.20. The decap circuits are individually controlled to be connected to the supply rail. In this case, the passive decap can be disconnected from the supply to observe the performance of the active decaps and the CBD by themselves.

Figure 5.20: Test chip 2 setup. [Labels: nominal supply (1V); inductance (mimicked); parasitic capacitances.]

Figure 5.21: Layout of charge-borrowing decap showing the relative size of the components.

The layout of the charge-borrowing decap for the test chip is shown in Fig. 5.21. Unlike the circuit diagram shown in Fig. 5.16, the ring oscillator and the buffer chain were not implemented, as the Clk signal was provided from the same external clock for synchronization purposes. The rest of the circuit, along with the passive decap 2Cdecap, was implemented on the test chip.
The switching circuitry is located on the left, with the PMOS-formed decap on the right. The total layout area for the CBD is about 600µm x 150µm = 0.09mm², in which the decap occupies an area of 0.083mm². The switching circuitry, including the diode transistors and the bootstrapping circuit, accounts for 8% of the area. The gate-source voltage across the decap oxide is always less than or equal to one VDD so that thin-oxide devices are acceptable in this case. Thus, thin-oxide PMOS transistors were used to implement the decap in the CBD.

Figure 5.22: (a) Annotated microphotograph and (b) layout of the second test chip.

The microphotograph and the layout of the second test chip are shown in Fig. 5.22. In the figure, the decap circuits are placed about 600µm away from the user logic, in which a large buffer with a large capacitive load was used to create supply noise, with the input controlled by an external signal. Similar to the first test chip, the size of the decaps was chosen to be only a few times larger than the capacitive load to create a voltage drop of about 100mV for the experiments. Note that the load cannot be dynamically changed to produce a variable k level. The clock frequency can be controlled externally by providing an input from a bit-rate analyzer. The probed pad of the supply node can be connected to an oscilloscope to measure the voltage waveforms. The test chip area is 1.1 x 0.86 = 0.95mm² in total.

A collection of 9 sample chips was tested. The clock frequency was fixed at 1GHz for this test. The improvement across the sample space is shown in Fig. 5.23. The sample chips that are in the slow or normal process corners cannot be distinguished easily. However, the two sample chips that are in the fast process corner can be identified by correlating with simulation.
As mentioned in Chapter 4, the average supply voltage varies significantly, mainly due to the random process variation on Rmesh and Lpack, which determine the IR drop on the supply rails. Note that the supply noise reduction achieved by the CBD is rather consistent under process variations across dies, which indicates the robustness of the design. The sample chip that provides the best improvement was used for further analysis.

Figure 5.23: Scatter plot comparing average VDD for the tested sample chips. [Horizontal axis: sample number, 1 to 9, with the fast-corner samples marked; the noise budget is indicated; series: passive decap, active decap n=2, active decap n=3, CBD.]

The waveforms of a test chip with a 1GHz clock are depicted in Fig. 5.24. The dark gray curve is when the CBD is on, and the light gray (red in color) curve is when the passive decap is on. The two curves are superimposed. As expected, the supply voltage returns to high when extra charge is fed through from the clean supply on the rising edge of the clock, improving the average VDD level per cycle.

Figure 5.24: Superimposed waveforms showing a CBD and a passive decap on a 1GHz clock. In the top panel of the figure, the upper trace is when the CBD is on, while the lower trace is when the passive decap is on.

It is also useful to obtain the CBD performance over the operating frequency range. Fig. 5.25 shows the impact of changing the external input frequency from 100MHz to 1.5GHz. Two solid trend lines are provided corresponding to the CBD and the passive decap. The gap between the two trend lines widens, indicating that the benefit of using the CBD increases as frequency increases. Even at 1.5GHz, the average VDD level for the CBD is significantly higher than for the passive decap, suggesting that the CBD is suitable for today’s high-speed ASICs and medium-speed custom designs.
At 1.5GHz, the test chip validated that the CBD has a maximum improvement of 93mV (or about 53% less noise) over the passive decap, 74mV (48% less noise) over the basic active decap with n=2, or 46mV (36% less noise) over the extended active decap with n=3. From 100MHz to 1.5GHz, compared to the passive decap, using the CBD reduces the supply noise by 42% to 55%. Note that the CBD outperforms both the basic and the extended active decaps across the operating frequency range.

Figure 5.25: Measured average VDD voltages at different clock frequencies. [Plot: average VDD versus input clock frequency (GHz) for the passive decap, the active decap with n=2, the active decap with n=3, and the CBD.]

Another important observation is that, unlike active decaps, the CBD does not seem to have a specific frequency at which the average VDD voltages from the CBD and the passive decap cross over. In other words, there is always a gap in the average supply voltage between the CBD and the passive decap across the frequency range. This makes intuitive sense since the CBD boosts the supply voltage at every clock cycle no matter the clock frequency. Although the noise increases as the frequency increases, the amount of charge provided by the CBD circuit remains roughly the same at every cycle (assuming that it is running at the clock frequency).

From the test results of the sample chips, it can be concluded that the charge-borrowing decap is capable of reducing the supply noise more efficiently than the passive and active decaps. In addition, there are other attractive features of the CBD, such as higher bandwidth, simplicity, and robustness. Moreover, the use of active decaps will increase the power consumption of the chip because of the internal switching circuitry.
In the case of CBD, the local power overhead is fairly small since there is only very limited leakage current and most charge transferred from the clean supply to the noisy supply is not wasted. Specifically, the leakage power of the CBD is about 4µW, about 0.1% of the total power consumption. However, the dynamic power of the Clk generation circuit is comparable to the active decap. Therefore, the equivalent power overhead with better performance makes the CBD circuit an appealing alternative to the active or passive decaps.

Although the charge-borrowing decap provides many advantages, it has a number of limitations. One important issue is that the supply voltage change is abrupt once the CBD is turned on. High-frequency glitches on the supply may affect the logic circuit powered by it. To smooth out the supply voltage, a large amount of passive decap should be present in its vicinity. Although parasitic decoupling capacitance is inherently present on the supply, more decaps may still be needed. As a result, the existing area may not be completely replaceable by the CBD, as a certain portion of the area should be reserved for the passive decaps. For a fixed area, the proportion of the CBD and the passive decap depends on the sensitivity of the local logic circuit to the supply glitches. Questions like the maximum allowed distance from the decap to the CBD and the minimum required size of the decap remain unanswered at this stage. However, since the local circuitry of the CBD is relatively simple (passive decap plus a diode-connected transistor), it is still an attractive alternative to the active decaps.

5.5 Summary

This chapter further extended the concept of active decap by increasing the stack height n to find an optimal value, depending on the supply noise k level presented at the local supply. It was found that the extended active decap with n=3 provided superior performance by delivering a higher average supply voltage than the basic active decap with n=2 when k is <14%.
This chapter also introduced a novel design of charge-borrowing decap to provide better supply noise reduction than the basic and the extended active decaps. The charge-borrowing decap delivers more charge and an increased supply boost for a wide range of operating frequencies. Its relatively simple design and implementation ensure its robustness.

Chapter 6
Conclusions and Future Work

6.1 Summary and Conclusions

As technology scales further into the deep submicron regime, increasing clock frequency and decreasing supply voltage make maintaining the quality of the power supply a critical issue. On-chip power supply noise, due to IR drop and Ldi/dt effects, has a great impact on delay variation, and may even cause improper functionality. Power supply noise can be reduced by placing decoupling capacitors close to power pads and large drivers throughout the power distribution system. Decaps provide locally “instantaneous” current to the switching drivers and keep the power supply within certain noise budgets. Traditionally, a decap is made from an NMOS transistor outside the standard-cell blocks, or a pair of NMOS and PMOS transistors within the blocks. However, starting from 90nm technology, the reduction in oxide thickness of MOS transistors causes an increased ESD risk and more gate leakage. Standard decap designs, therefore, may no longer be appropriate for 90nm and beyond.

In this dissertation, the goal was to provide practical solutions to active and passive decap designs targeting ASIC applications in both white-space and standard-cell areas. The dissertation began with an overview of decap design basics, gate leakage phenomenon, ESD concerns, and standard-cell decap layout and placement. Some essential decap design issues were highlighted through the background section to motivate the topics for the rest of the dissertation.
More importantly, the metric for power supply noise management was proposed and validated for decap performance comparisons used throughout the dissertation.

Next, the tradeoffs between high-frequency performance of decaps and ESD protection were investigated, and their impacts on the layout of standard-cell passive decaps were discussed. A design metric was introduced to determine the optimal number of fingers to use in the standard-cell layout to obtain a desired capacitance level over a target operating frequency. Models were developed to capture the frequency responses of Reff and Ceff for any given technology with only a few parameters. For ESD protection, a cross-coupled design that had been previously proposed by cell library developers was shown to suffer from reduced frequency response and provided no savings in gate leakage. It was shown that more fingers than typically used were needed to provide the target resistance value for sufficient ESD protection. The layout with the smallest NMOS device and a multi-fingered PMOS device was described to deliver acceptable frequency response and ESD reliability, while providing the lowest leakage.

For white-space areas, the effectiveness of active decaps as a late-stage drop-in replacement for passive decaps was evaluated so that a completed chip layout need not be disrupted near the tapeout deadline. Improvements to the design of an active decoupling capacitor were described for removal of hot-spot IR-drop violations in ASIC designs running at up to 1GHz. The modified active decap using latch-based comparators in 90nm CMOS is able to switch in 0.5ns and consumes a relatively low power of 2.8mW, which is about 5X lower than a previous design running at approximately the same speed. This reduced power makes it more suitable for use in ASIC designs. Measurement results from test chips indicate improvement over passive decaps of 10% - 20%, operating from 200MHz to 1GHz.
The optimal active decap size to maximally remove hot-spot IR drop was identified. Placement analysis was also carried out and it was found that the active decap is most effective when placed in close proximity to the hot spot, as compared to the passive decap which is not as sensitive to the exact location. If sized and placed properly, active decaps can be up to 20% better when used as drop-in replacements of passive decaps for power supply noise reduction.

Further research on active decap design was explored. By increasing the stack height n to an optimal value, and depending on the supply noise k level presented at the local supply, a maximum supply boost can be achieved. It was found that the extended active decap with n=3 provided a superior performance by delivering a higher average supply voltage than the n=2 and n=4 cases when the supply noise k is in the range of 7-14%. When k is above 14%, n=2 must be used and beyond 16%, the area for the drop-in replacement of active decaps must be expanded to produce satisfactory improvement over the passive decap.

Finally, the novel design for charge-borrowing decap was proposed. This design provides better supply noise reduction than all other forms of active decaps. The charge-borrowing decap efficiently transfers charge from a clean supply rail to eliminate the hot spots on relatively noisy supply nodes. The CBD only requires a minimum power overhead and delivers a maximum supply boost for a wide range of operating frequencies. Test results indicate that the CBD outperforms both the basic and the extended active decaps by reducing the supply noise to a lower level. The design and implementation of the CBD was kept relatively simple so that the robustness of the design can be maintained.

6.2 Contributions in this Dissertation

The following summarizes the major contributions in this dissertation:

• Developed standard-cell passive decap designs that properly trade off gate leakage, ESD reliability, and transient response.
Provided simple and yet practical decap design metrics and guidelines for 90nm and 65nm CMOS technologies.
• Designed and implemented a white-space active decap using latch-based comparators that provides adequate supply noise reduction while consuming relatively low static power. Validated the design through a test chip. Explored the placement issues of the active decap.
• Extended the concept of active decap for an optimal design that produced the highest supply boost for the maximum supply noise reduction in a local area. Proposed a simple but novel charge-borrowing decap circuit that outperforms the basic and the extended active decaps. Validated the design through another test chip.

6.3 Future Work

The limitations of the charge-borrowing decap design need to be evaluated further. Quantitative answers to questions like the optimal distance between the CBD and the logic, the accurate proportion between the CBD area and the nearby passive decap area, and the maximum allowable charge transferred from the clean supply node while still maintaining its supply integrity, should be investigated. A test chip with real industrial blocks that are placed as the user logic circuit is desired for the most accurate decap performance evaluation.

Another issue is the scalability of the active decaps and the CBD. The design and implementation of the active decap were only accomplished and validated in 90nm CMOS. It is desirable to make the active decap designs useful in future technologies, such as 65nm and 45nm CMOS. As technology scales, more design challenges will occur. If any part of the design needs to be modified to accommodate the more advanced technology, it should be investigated in future research.

Monitoring power supply fluctuations on-chip in real-time is also an emerging area of research [20]. The measured results of real-time power supply noise from a monitoring circuitry can be used as a validation of decap design and placement.
Many techniques have been used to monitor power supply noise on-chip [68][69]. These techniques are not suitable for production environments as they either need a significant area or require complex data processing off-chip. To overcome these limitations, a simple monitoring scheme based on an under-sampling technique [70] is worthy of being investigated. Under-sampling is used to capture a high-frequency periodic signal from a large number of cycles using a slower sampling signal to achieve an effective high-speed sampling rate. In the case of power supply noise, the dynamic voltage drop is not periodic in nature, but the same experiment could be repeated several times, each time skewing the sampling point by a small time shift, Δt, which represents the sampling period, resulting in an equivalent sampling frequency of $f_{\text{under-sampling}} = 1/\Delta t$ [70]. Using this technique, the measurements may be repeated to average and cancel out the noise effect. This approach needs to be evaluated from concept to test chip in order to finally validate its advantages and disadvantages, and it serves as a promising area of future work.

APPENDIX: COMPARATOR DESIGN FUNDAMENTALS

Comparators are one of the most widely used components in analog integrated circuits. A voltage comparator is a circuit that compares the instantaneous value of an input signal with a reference signal and produces an output at logic level, depending on whether the input is greater or smaller than the reference level [57]. One important application for high-speed voltage comparators is in data converters, where the conversion speed is limited by the response time of the comparators [57]. Other issues related to comparator design include finite resolution, offset, power, and area. As technology scales, more advanced CMOS technologies allow comparators to be realized for higher speed and potentially smaller area and power.
However, it is difficult to achieve high speed and high accuracy at the same time because of the existence of device mismatches [71].

A widely used comparator configuration is a high-gain differential input, single-ended output amplifier, whose symbol is shown in Fig. A.1. The output of the comparator should have a large swing, ideally from VDD to VSS, as the input varies across a small swing, typically in the millivolt range [57]. In many applications, a comparator is used in open-loop operations, such that no frequency compensation is required [57][58][59]. However, in certain cases, due to the nature of AC coupling of the output and the input, a comparator may need frequency compensation to avoid oscillations [32].

Figure A.1: A differential input, single-ended output comparator symbol.

Figure A.2: DC transfer characteristics of (a) an ideal comparator and (b) a practical comparator with finite gain and offset voltage.

The DC characteristic curve of an ideal differential comparator is shown in Fig. A.2(a). When the positive input Vin+ is greater than the negative input Vin−, the output is high (i.e., at VDD). When Vin+ is less than Vin−, the output is low (i.e., at VSS). This ideal DC transfer curve corresponds to a differential gain of infinity. That is, an infinitely small polarity change in (Vin+ − Vin−) will cause the output to switch. A more realistic DC transfer curve of a comparator is depicted in Fig. A.2(b). In practice, the differential gain is finite and equal to Av. In Fig. A.2(b), the two voltages VIL and VIH are the overdrive voltages (also called the input excess voltages). The overdrive is the input level that drives the comparator from an initial saturated input condition to an input level that barely causes the output of the comparator to switch its level [57]. Another non-ideal effect of the comparator is the input referred DC offset voltage that is mainly caused by device mismatches.
If no offset voltage is present, the comparator DC transfer curve will be symmetrical around the point where Vin+ = Vin−. However, for a finite offset voltage of VOS, the comparator output will switch at (Vin+ − Vin− + VOS). In general, the output voltage Vout of a comparator can be written as [57][58]:

$$V_{out} = \begin{cases} V_{DD} & \text{if } \Delta V_{in} > V_{IH} \\ A_v\,\Delta V_{in} & \text{if } V_{IL} \le \Delta V_{in} \le V_{IH} \\ V_{SS} & \text{if } \Delta V_{in} < V_{IL} \end{cases} \qquad \text{(A.1)}$$

where $\Delta V_{in} = V_{in+} - V_{in-}$. Clearly, the finite gain and offset voltage affect the accuracy of the comparator. It is desired that a comparator is designed to have a high gain and a small offset voltage.

Three major challenges exist in any comparator design: high speed, high resolution, and low power [72]. High speed is achieved by having a fast response time. That is, following an input polarity change, the comparator should switch its output between VDD and VSS with fast rise and fall times. In order to achieve the highest comparison speed, the minimum channel length for a specific technology is often used in comparator designs [71]. In addition, low power consumption is always desired. As technology scales, due to VDD scaling, the dynamic power of a comparator will scale accordingly. Meanwhile, the static power consumption of the comparator may increase due to larger leakage current in a more advanced deep-submicron technology. Overall, the total power consumption including both dynamic and static power may reduce as technology shrinks [72].

A high gain Av is essential to achieve high resolution in a comparator design. For example, if the input of the comparator needs to resolve 1mV of input variation, which requires the output to switch a full swing of 1V at VDD, the voltage gain Av is therefore 1V/1mV = 1000. It is difficult to achieve such a high gain within one stage of amplification. Hence, a multi-stage amplifier or a regenerative latch using positive feedback may be used as a comparator to achieve the high gain requirement. A latch is normally faster than a multi-stage amplifier achieving the same gain [60][71][73].
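As a rough numerical companion to Equation A.1 and the 1V/1mV gain example, the following Python sketch evaluates the piecewise DC transfer curve and the gain needed for a given input resolution. It is an illustration only: the function names, the symmetric ±0.5V rails, and the choice VIH = VDD/Av and VIL = VSS/Av (so that the linear segment meets the rails) are assumptions made here, not values taken from the dissertation, and the offset voltage is left out.

```python
def required_gain(output_swing_v, input_resolution_v):
    """Gain needed so the smallest resolvable input step yields a full output swing."""
    return output_swing_v / input_resolution_v


def comparator_vout(v_in_plus, v_in_minus, gain, v_dd=0.5, v_ss=-0.5):
    """Piecewise DC transfer curve in the form of Equation A.1 (offset ignored).

    Assumption (hypothetical, for illustration): symmetric rails with
    V_IH = V_DD / gain and V_IL = V_SS / gain.
    """
    delta_v_in = v_in_plus - v_in_minus
    v_ih = v_dd / gain
    v_il = v_ss / gain
    if delta_v_in > v_ih:
        return v_dd            # output saturated high
    if delta_v_in < v_il:
        return v_ss            # output saturated low
    return gain * delta_v_in   # finite-gain linear region


if __name__ == "__main__":
    gain = required_gain(1.0, 1e-3)  # 1V swing, 1mV resolution
    print(f"required gain: {gain:.0f}")
    for dv in (-5e-3, -0.2e-3, 0.0, 0.2e-3, 5e-3):
        v_out = comparator_vout(dv, 0.0, gain)
        print(f"dVin = {dv * 1e3:+.1f} mV -> Vout = {v_out:+.3f} V")
```

Under these assumptions, inputs larger in magnitude than 0.5mV saturate the output, while smaller inputs fall in the linear region; a real comparator would additionally shift the switching point by its offset voltage.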
Therefore, latch-based comparators are often used in practice.

Figure A.3: Circuit schematics showing positive feedback: (a) a basic differential pair with diode-connected load, (b) a gain-enhanced differential pair, and (c) a differential pair using positive feedback to provide increased gain.

The concept of positive feedback used in the latch approach can be explained in the following example, as shown in Fig. A.3. In Fig. A.3(a), a differential pair is loaded with two diode-connected PMOS devices. Its small-signal gain Av(a) can be given by [64][73]:

$$A_{v(a)} = \frac{g_{m1}}{g_{m3}} = \sqrt{\frac{\mu_n (W/L)_{M1}}{\mu_p (W/L)_{M3}}} \qquad \text{(A.2)}$$

where gm1 and gm3 are the transconductances of M1 and M3, and µn and µp are the channel mobilities of NMOS and PMOS devices, respectively. In Fig. A.3(b), a gain enhancement approach is added to the circuit with two additional transistors, M5 and M6. The small-signal gain Av(b) can be given by [54][57]:

$$A_{v(b)} = \frac{A_{v(a)}}{1 - (2 I_5 / I_{b1})} \qquad \text{(A.3)}$$

where I5 and Ib1 are the currents flowing through M5 and Mb1, respectively. By properly choosing the size of M5, the small-signal gain can be improved. In Fig. A.3(c), the drain terminals of M5 and M6 are cross connected, creating a form of positive feedback. The positive feedback functions as follows: when Vin+ is slightly larger than Vin−, it causes Vout1 to be slightly smaller than Vout2. Larger Vout2 forces M6 to deliver less current to the node Vout1. Similarly, smaller Vout1 forces M5 to deliver more current to charge up Vout2. This positive loop reinforces Vout2 to reach VDD and Vout1 to reach VSS [54][57]. The small-signal gain Av(c) can be given by [57]:

$$A_{v(c)} = A_{v(a)} \cdot \frac{1}{1 - \dfrac{(W/L)_{M5}}{(W/L)_{M3}}} \qquad \text{(A.4)}$$

Equation A.4 requires that the size ratio (W/L)M5 / (W/L)M3 is less than 1. When the size ratio is greater than unity, the small-signal gain will become infinity and the circuit will operate as a regenerative latch [57].

REFERENCES

[1] H. H. Chen and S. E.
Schuster, “On-chip decoupling capacitor optimization for high-performance VLSI design,” Symposium on VLSI Technology, Systems, and Applications, pp. 99-103, May-Jun. 1995.
[2] S. Pant and E. Chiprout, “Power grid physics and implications for CAD,” IEEE/ACM Design Automation Conference, pp. 199-204, Jul. 2006.
[3] N. Srivastava, X. Qi, and K. Banerjee, “Impact of on-chip inductance on power distribution network design for nanometer scale integrated circuits,” International Symposium on Quality of Electronic Design (ISQED), pp. 346-351, Mar. 2005.
[4] C. W. Fok and D. L. Pulfrey, “Full-chip power-supply noise: the effect of on-chip power-rail inductance,” International Journal of High Speed Electronics and Systems, vol. 12, no. 2, pp. 573-582, Jun. 2002.
[5] J. Kim, B. Choi, H. Kim, W. Ryu, Y.-H. Yun, S.-H. Hamm, S.-H. Kim, and Y.-H. Lee, “Separated role of on-chip and on-PCB decoupling capacitors for reduction of radiated emission on printed circuit board,” IEEE International Symposium on Electromagnetic Compatibility, pp. 531-536, Aug. 2001.
[6] H. H. Chen, J. S. Neely, M. F. Wang, and G. Co, “On-chip decoupling capacitor optimization for noise and leakage reduction,” Symposium on Integrated Circuits and Systems Design, pp. 319-326, Sep. 2003.
[7] D. A. Hodges, H. G. Jackson, and R. A. Saleh, Analysis and Design of Digital Integrated Circuits in Deep Submicron Technology, 3rd ed., New York: McGraw-Hill, 2004.
[8] N. Na, T. Budell, C. Chiu, E. Tremble, and I. Wemple, “The effects of on-chip and package decoupling capacitors and an efficient ASIC decoupling methodology,” Electronic Components and Technology Conference (ECTC), pp. 556-567, Jun. 2004.
[9] H. Su, S. S. Sapatnekar, and S. R. Nassif, “Optimal decoupling capacitor sizing and placement for standard-cell layout designs,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 22, no. 4, pp. 428-436, Apr. 2003.
[10] M. Popovich and E. G.
Friedman, “Decoupling capacitors for multi-voltage power distribution systems,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 14, no. 3, pp. 217-228, Mar. 2006.
[11] M. Popovich, E. G. Friedman, M. Sotman, A. Kolodny, and R. M. Secareanu, “Maximum effective distance of on-chip decoupling capacitors in power distribution grids,” IEEE/ACM Great Lakes Symposium on VLSI, pp. 173-179, May 2006.
[12] P. Larsson, “Parasitic resistance in an MOS transistor used as on-chip decoupling capacitance,” IEEE Journal of Solid-State Circuits, vol. 32, no. 4, pp. 574-576, Apr. 1997.
[13] P. Larsson, “Resonance and damping in CMOS circuits with on-chip decoupling capacitance,” IEEE Transactions on Circuits and Systems-I: Fundamental Theory and Applications, vol. 45, no. 8, pp. 849-858, Aug. 1998.
[14] M. D. Powell and T. N. Vijaykumar, “Exploiting resonant behavior to reduce inductive noise,” International Symposium on Computer Architecture, pp. 288-299, Jun. 2004.
[15] C. H. Ng, C. S. Ho, S.-F. S. Chu, and S.-C. Sun, “MIM capacitor integration for mixed-signal/RF applications,” IEEE Transactions on Electron Devices, vol. 52, no. 7, pp. 1399-1409, Jul. 2005.
[16] M. W. C. Goh, Q. Lim, R. A. Keating, A. V. Kordesch, and Y. Bin Mohd Yusof, “Design of radio frequency metal-insulator-metal (MIM) capacitors,” International Conference on Solid-State and Integrated Circuits Technology, pp. 209-212, Oct. 2004.
[17] C. H. Ng, C. S. Ho, N. G. Toledo, and S.-F. Chu, “Characterization and comparison of single and stacked MIMC in copper interconnect process for mixed-mode and RF applications,” IEEE Electron Device Letters, vol. 25, no. 7, pp. 489-491, Jul. 2004.
[18] C. H. Ng, C. S. Ho, S.-F. S. Chu, and S.-C. Sun, “MIM capacitor integration for mixed-signal/RF applications,” IEEE Transactions on Electron Devices, vol. 52, no. 7, pp. 1399-1409, Jul. 2005.
[19] H. Yamamoto and J. A.
Davis, “Decreased effectiveness of on-chip decoupling capacitance in high-frequency operation,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 15, no. 6, pp. 649-659, Jun. 2007.
[20] K. Arabi, R. Saleh, and X. Meng, “Power supply noise in SoCs: metrics, management, and measurement,” IEEE Design and Test of Computers, vol. 24, no. 3, pp. 236-244, May-Jun. 2007.
[21] TSMC 90nm CLN90G Process SAGE-X v3.0 Standard Cell Library Databook, Release 1.0, Artisan Components Inc., Sunnyvale, CA, 2004.
[22] X. Meng, R. Saleh, and K. Arabi, “Layout of decoupling capacitors in IP blocks for 90-nm CMOS,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 16, no. 11, pp. 1581-1588, Nov. 2008.
[23] X. Meng, K. Arabi, and R. Saleh, “Novel decoupling capacitor designs for sub-90nm CMOS technology,” International Symposium on Quality Electronic Design (ISQED), pp. 266-272, Mar. 2006.
[24] A. Amerasekera and C. Duvvury, ESD in Silicon Integrated Circuits, 2nd ed., Hoboken, NJ: John Wiley & Sons, 2002.
[25] J. Fu, Z. Luo, X. Hong, T. Cai, S. X.-D. Tan, and Z. Pan, “VLSI on-chip power/ground network optimization considering decap leakage currents,” Asia and South Pacific Design Automation Conference, vol. 2, pp. 735-738, Jan. 2005.
[26] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, “Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits,” Proceedings of IEEE, vol. 91, no. 2, pp. 305-327, Feb. 2003.
[27] F. Hamzaoglu and M. Stan, “Circuit-level techniques to control gate leakage for sub-100nm CMOS,” International Symposium on Low Power Electronics and Design, pp. 60-63, Aug. 2002.
[28] R. S. Guindi and F. N. Najm, “Design techniques for gate-leakage reduction in CMOS circuits,” International Symposium on Quality Electronic Design (ISQED), pp. 61-65, Mar. 2003.
[29] D. Lee, W. Kwong, D. Blaauw, and D.
Sylvester, “Analysis and minimization techniques for total leakage considering gate oxide leakage,” IEEE/ACM Design Automation Conference, pp. 175-180, Jun. 2003.
[30] L. Chang, K. J. Yang, Y.-C. Yeo, Y.-K. Choi, T.-J. King, and C. Hu, “Reduction of direct-tunneling gate leakage current in double-gate and ultra-thin body MOSFETs,” IEEE Transactions on Electron Devices, vol. 49, no. 12, pp. 2288-2295, Dec. 2002.
[31] X. Meng, K. Arabi, and R. Saleh, “A novel active decoupling capacitor design in 90nm CMOS,” IEEE International Symposium on Circuits and Systems (ISCAS), pp. 657-660, May 2007. (Top 10 Honorable Mention Award)
[32] X. Meng and R. Saleh, “An improved active decoupling capacitor for “hot-spot” supply noise reduction in ASIC designs,” IEEE Journal of Solid-State Circuits, vol. 44, no. 2, pp. 584-593, Feb. 2009.
[33] R. Saleh, D. Overhauser, S. Taylor, “Full-chip verification of UDSM designs,” IEEE/ACM International Conference on Computer-Aided Design, pp. 453-460, Nov. 1998.
[34] S. Sapatnekar, “High-performance power grids for nanometer technologies,” IEEE International Conference on VLSI Design, pp. 839-844, Jan. 2004.
[35] G. Bai, S. Bobba, and I. N. Hajj, “Static timing analysis including power supply noise effect on propagation delay in VLSI circuit,” IEEE/ACM Design Automation Conference, pp. 295-300, Jun. 2001.
[36] H. Harizi, R. Häußler, M. Olbrich, and E. Barke, “Efficient modeling techniques for dynamic voltage drop analysis,” IEEE/ACM Design Automation Conference, pp. 706-711, Jun. 2007.
[37] M. Ang, R. Salem, and A. Taylor, “An on-chip voltage regulator using switched decoupling capacitors,” IEEE International Solid-State Circuits Conference, pp. 438-439, Feb. 2000.
[38] M. A. Ang, and A. D. Taylor, “Voltage regulating circuit for attenuating inductance induced on-chip supply variations,” U.S. Patent 6509785, Jan. 21, 2003.
[39] C. Giacomotto, R. P. Masleid, and A. Harada, “Four-state switched decoupling capacitor system for active power stabilizer,” U.S.
Patent 6744242 B1, Jun. 1, 2004.
[40] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed., Upper Saddle River, NJ: Prentice Hall, 2004.
[41] J. Gu, H. Eom, and C. Kim, “A switched decoupling capacitor circuit for on-chip supply resonance damping,” Symposium on VLSI Circuits, pp. 126-127, Jun. 2007.
[42] J. Gu, R. Harjani, and C. Kim, “Distributed active decoupling capacitors for on-chip supply noise cancellation in digital VLSI circuits,” Symposium on VLSI Circuits, pp. 216-217, Jun. 2006.
[43] W. C. Lee and C. Hu, “Modeling gate and substrate currents due to conduction- and valence-band electron and hole tunneling,” Symposium on VLSI Technology, pp. 198-199, Jun. 2000.
[44] K. Cao, W.-C. Lee, W. Liu, X. Jin, P. Su, S. K. H. Fung, J. X. An, B. Yu, and C. Hu, “BSIM4 gate leakage model including source drain partition,” International Electron Devices Meeting (IEDM), pp. 815-818, Dec. 2000.
[45] X. Xi, M. Dunga, J. He, W. Liu, K. M. Cao, X. Jin, J. J. Ou, M. Chan, A. M. Niknejad, and C. Hu, “BSIM4.4.0 MOSFET model user’s manual,” University of California, Berkeley, 2004.
[46] C. K. Alexander and M. N. O. Sadiku, Fundamentals of Electric Circuits, New York: McGraw-Hill, 2000.
[47] X. W. Wang, Y. Shi, T. P. Ma, G. J. Cui, T. Tamagawa, J. W. Golz, B. L. Halpen, and J. J. Schmitt, “Extending gate dielectric scaling limit by use of nitride or oxynitride,” Symposium on VLSI Technology, pp. 109-110, Jun. 1995.
[48] T. P. Ma, “Opportunities and challenges for high-k gate dielectrics,” International Symposium on the Physical and Failure Analysis of Integrated Circuits (IPFA), pp. 1-4, Jul. 2004.
[49] T. P. Ma, “Electrical characterization of high-k gate dielectrics,” International Conference on Solid-State and Integrated Circuits Technology, pp. 361-365, Oct. 2004.
[50] V. George, S. Jahagirdar, C. Tong, K. Smits, S. Damaraju, S. Siers, V. Naydenov, T. Khondker, S. Sarkar, and P.
Singh, “Penryn: 45-nm next generation Intel® Core™ 2 processor,” IEEE Asian Solid-State Circuits Conference (ASSCC), pp. 14-17, Nov. 2007.
[51] M. T. Bohr, R. S. Chau, T. Ghani, and K. Mistry, “The high-k solution,” IEEE Spectrum, vol. 40, no. 10, pp. 29-35, Oct. 2007.
[52] J. Chia, “Design, layout and placement of on-chip decoupling capacitors in IP blocks,” M.A.Sc. Thesis, University of British Columbia, 2004.
[53] S. Zhao, K. Roy and C.-K. Koh, “Decoupling capacitance allocation and its application to power-supply noise-aware floorplanning,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 21, no. 1, pp. 81-92, Jan. 2002.
[54] J. R. Hauser, “Bias sweep rate effects on quasi-static capacitance of MOS capacitors,” IEEE Transactions on Electron Devices, vol. 44, no. 6, pp. 1009-1012, Jun. 1997.
[55] W. Liu, MOSFET Models for SPICE Simulation Including BSIM3v3 and BSIM4, Wiley IEEE Press, 2001.
[56] H. Johnson and M. Graham, High-Speed Digital Design, Prentice-Hall, 1993.
[57] R. Gregorian, Introduction to CMOS Op-Amps and Comparators, New York: John Wiley & Sons, 1999.
[58] R. J. Baker, CMOS: Circuit Design, Layout, and Simulation, 2nd ed., Piscataway, NJ: IEEE Press, 2005.
[59] D. A. Jones and K. Martin, Analog Integrated Circuit Design, New York: John Wiley & Sons, 1997.
[60] S. Sheikhaei, S. Mirabbasi, and A. Ivanov, “A 43mW single-channel 4GS/s 4-bit flash ADC in 0.18µm CMOS,” IEEE Custom Integrated Circuits Conference (CICC), pp. 353-356, Sep. 2007.
[61] E. Alon and M. Horowitz, “Integrated regulation for energy-efficient digital circuits,” IEEE Journal of Solid-State Circuits, vol. 43, no. 8, pp. 1795-1807, Aug. 2008.
[62] X. Meng and R. Saleh, “Active decap design considerations for optimal supply noise reduction,” International Symposium on Quality Electronic Design (ISQED), pp. 765-769, Mar. 2009.
[63] X. Meng, R. Saleh, and S.
Wilton, “Charge-borrowing decap: a novel circuit for removal of local supply noise violations,” accepted to IEEE Custom Integrated Circuits Conference (CICC), Sep. 2009.
[64] B. Razavi, Design of Analog CMOS Integrated Circuits, New York: McGraw Hill, 2001.
[65] T. B. Cho and P. R. Gray, “A 10b, 20Msamples/s, 35mW pipeline A/D converter,” IEEE Journal of Solid-State Circuits, vol. 30, no. 3, pp. 166-172, Mar. 1995.
[66] A. M. Abo and P. R. Gray, “A 1.5-V, 10-bits, 14.3-MS/s CMOS pipeline analog-to-digital converter,” IEEE Journal of Solid-State Circuits, vol. 34, no. 5, pp. 599-606, May 1999.
[67] K. A. Jenkins, K. L. Shepard, and Z. Xu, “On-chip circuit for measuring period jitter and skew of clock distribution networks,” IEEE Custom Integrated Circuits Conference (CICC), pp. 157-160, Sep. 2007.
[68] E. Alon, V. Stojanovic, and M. A. Horowitz, “Circuits and techniques for high-resolution measurement of on-chip power supply noise,” IEEE Journal of Solid-State Circuits, vol. 40, no. 4, pp. 820-828, Apr. 2005.
[69] T. Nakura, M. Ikeda, and K. Asada, “Design and measurement of on-chip di/dt detector circuit for power supply line,” IEEE Asia-Pacific Conference on Advanced System Integrated Circuits, pp. 426-427, Aug. 2004.
[70] B. Kaminska and K. Arabi, “Mixed signal DFT: a concise overview,” IEEE International Conference on Computer-Aided Design, pp. 672-680, Nov. 2003.
[71] B. Murmann, “A/D converter trends: power dissipation, scaling and digitally assisted architectures,” IEEE Custom Integrated Circuits Conference (CICC), pp. 105-112, Sep. 2008.
[72] R. J. van de Plassche, J. H. Huijsing, and W. Sansen, Analog Circuit Design: High-Speed Analog-to-Digital Converters; Mixed-Signal Design; PLL's and Synthesizers, Boston: Kluwer Academic Publishers, 2000.
[73] M. Gustavsson, J. J. Wikner, and N. Tan, CMOS Data Converters for Communications, Boston: Kluwer Academic Publishers, 2000.
