RECYCLING CLOCK NETWORK ENERGY IN HIGH-PERFORMANCE DIGITAL DESIGNS USING ON-CHIP DC-DC CONVERTERS

by Mehdi Alimadadi

M.A.Sc., University of British Columbia, 2000
B.A.Sc., Iran University of Science and Technology, 1989

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in The Faculty of Graduate Studies (Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

July 2008

© Mehdi Alimadadi, 2008

ABSTRACT

Power consumption of CMOS digital logic designs has increased rapidly over the last several years. It has become an important issue, not only in battery-powered applications, but also in high-performance digital designs because of packaging and cooling requirements. At the multi-GHz clock rates in use today, charging and discharging CMOS gates and wires, especially in clocks with their relatively large capacitances, leads to significant power consumption. Recovering and recycling the stored charge or energy about to be lost when these nodes are discharged to ground is a potentially good strategy that must be explored for use in future energy-efficient design methodologies.

This dissertation investigates a number of novel clock energy recycling techniques to improve the overall power dissipation of high-performance logic circuits. If efficient recycling of clock network energy can be demonstrated, it might be used in many high-performance chip designs to lower power and save energy.

A number of chip prototypes were designed and constructed to demonstrate that this energy can be successfully recycled or recovered in different ways:

• Recycling clock network energy by supplying a secondary DC-DC power converter: the output of this power converter can be used to supply another region of the chip, thereby avoiding the need to draw additional energy from the primary supply.
One test chip demonstrates that energy in the final clock load can be recycled, while another demonstrates that clock distribution energy can be recycled.
• Recovering clock network energy and returning it to the power grid: in each clock cycle, a portion of the energy just drawn from the supply is transferred back at the end of the cycle, effectively reducing the power consumption of the clock network.

The recycling methods described in this thesis are able to preserve a nearly square clock shape; the loss of this shape has been a limitation of previous work in this area. Overall, the results provided in this thesis demonstrate that energy recycling is very promising and must be pursued in a number of other areas of the chip in order to obtain an energy-efficient design.

TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
List of Abbreviations
List of Symbols
Acknowledgments
Dedication
1 Introduction
    1.1 Main Motivation
    1.2 Research Challenges and Objectives
    1.3 Research Contributions
    1.4 Thesis Outline
2 Background
    2.1 Discrete Switching Power Converters
        2.1.1 Basic Switching Converters
        2.1.2 Zero Voltage Switching
    2.2 CMOS Inverter Driver Circuit
    2.3 Integrated Switching Power Converters
    2.4 Literature Survey
        2.4.1 Switching Power Converters
        2.4.2 Low-Swing Power Converters
        2.4.3 Resonant Clock Strategies
    2.5 Implementation Considerations
3 Integrated Buck Converters
    3.1 Integrated Clock Driver/Buck Converter
        3.1.1 Introduction
        3.1.2 Circuit Design
        3.1.3 Complete Circuit
        3.1.4 Simulation
        3.1.5 Chip Implementation
        3.1.6 Chip Measurements
        3.1.7 Summary
    3.2 Low-Swing Buck Converter
        3.2.1 Introduction
        3.2.2 Circuit Design
        3.2.3 Complete Circuit
        3.2.4 Simulation
        3.2.5 Chip Implementation
        3.2.6 Chip Measurements
        3.2.7 Summary
4 Integrated Boost and Buck-Boost Converters
    4.1 Integrated Clock Driver/Boost Converter
        4.1.1 Introduction
        4.1.2 Circuit Design
        4.1.3 Complete Circuit
        4.1.4 Simulation
        4.1.5 Chip Implementation
        4.1.6 Chip Measurements
        4.1.7 Summary
    4.2 Integrated Clock Driver/Buck-Boost Converter
        4.2.1 Introduction
        4.2.2 Circuit Design
        4.2.3 Complete Circuit
        4.2.4 Simulation
        4.2.5 Chip Implementation
        4.2.6 Chip Measurements
        4.2.7 Summary
    4.3 Conclusions
5 Low-Power Clock Driver
    5.1 Introduction
    5.2 Circuit Design
    5.3 Complete Circuit
    5.4 Simulation
    5.5 Chip Implementation
    5.6 Chip Measurements
    5.7 Summary
6 Conclusions
    6.1 Future Work
        6.1.1 Continuation of the Work
        6.1.2 Investigating New Ideas
References
Appendices
    A Discrete Switching Power Converters
        A.1 Buck (Step-Down) Switching Converters
        A.2 Boost (Step-Up) Switching Converters
        A.3 Buck-Boost Switching Converters
    B On-Chip Passive Components
        B.1 Inductors
        B.2 Capacitors

LIST OF TABLES

Table 2.1. Performance comparison of reviewed converters
Table 2.2. Comparison of recent microprocessors
Table 3.1. Summary of comparison between integrated buck converters
Table 6.1. Chip prototype results

LIST OF FIGURES

Figure 1.1. Recycling clock energy with a DC-DC converter (approximate model)
Figure 1.2. Reducing clock power consumption with an inductor
Figure 2.1. Basic switching converter topologies
Figure 2.2. ZVS operation in a synchronous buck converter
Figure 2.3. A CMOS inverter driver with tapering factor r
Figure 2.4. A CMOS inverter chain driving a CMOS buck converter
Figure 2.5. ZVS operation in a CMOS synchronous buck converter
Figure 2.6. Block diagram of a four-phase interleaved DC-DC converter [13]
Figure 2.7. Circuit diagram of the fully integrated two-stage buck converter [15]
Figure 2.8. Circuit diagram of the fully integrated boost converter [21]
Figure 2.9. Low swing DC-DC conversion technique [24]
Figure 2.10. Cascode bridge circuit [26]
Figure 2.11. Block diagram of a power management on-chip [27]
Figure 2.12. Implicit DC-DC conversion through charge recycling [28]
Figure 2.13. Simple lumped circuit model of the resonant clock distribution [11]
Figure 2.14. A tapered H-tree clock distribution network
Figure 2.15. Components of a resonant clock sector [10]
Figure 3.1. Efficiency block diagram
Figure 3.2. Integrated clock driver/buck converter
Figure 3.3. Circuit of the reference clock for the integrated clock driver/buck converter
Figure 3.4. Circuit diagram of the integrated clock driver/buck converter
Figure 3.5. Timing diagram of Vclk
Figure 3.6. Simplified circuit model for analyzing Vclk during clock fall time
Figure 3.7. Simulated waveforms for the integrated clock driver/buck converter
Figure 3.8. Simulated output voltage and input power of the integrated buck converter
Figure 3.9. Simulated raw and effective efficiencies of the integrated buck converter
Figure 3.10. Implementation of the integrated clock driver/buck converter
Figure 3.11. Block diagram of the test bench setup
Figure 3.12. The effect of Fsw on Vout
Figure 3.13. The effect of D on Vout
Figure 3.14. The effect of Fsw on Pin1 and Pin2
Figure 3.15. The effect of D on Pin1 and Pin2
Figure 3.16. The effect of Fsw on η
Figure 3.17. The effect of D on η
Figure 3.18. The effect of Fsw on ηeff
Figure 3.19. The effect of D on ηeff
Figure 3.20. Low-swing buck converter
Figure 3.21. Circuit diagram of the low-swing buck converter
Figure 3.22. Simulation results for each variant of the circuit
Figure 3.23. Deep n-well implementation cross sectional view
Figure 3.24. Chip micrograph
Figure 3.25. Measured prototype performance
Figure 4.1. Integrated clock driver/boost converter
Figure 4.2. Circuit diagram of the integrated clock driver/boost converter
Figure 4.3. Simulation results of the integrated clock driver/boost converter
Figure 4.4. Chip micrograph of the integrated clock driver/boost converter
Figure 4.5. Integrated clock driver/buck-boost converter
Figure 4.6. Circuit diagram of the integrated clock driver/buck-boost converter
Figure 4.7. Simulation results of the integrated clock driver/buck-boost converter
Figure 4.8. Chip micrograph of the integrated clock driver/buck-boost converter
Figure 5.1. Low-power clock driver
Figure 5.2. Circuit diagram of the low-power clock driver and the reference clock
Figure 5.3. Simulated clock waveforms of Figure 5.1(b) and Figure 5.2
Figure 5.4. Simulated Mp2 drain current waveforms of Figure 5.1(b) and Figure 5.2
Figure 5.5. Simulated Mn1 drain current waveforms of Figure 5.1(b) and Figure 5.2
Figure 5.6. Effect of changing inductor value on power savings in Figure 5.2
Figure 5.7. Chip micrograph
Figure 5.8. Test and simulation results
Figure A.1. A basic buck converter
Figure A.2. Operation states of the buck converter in CCM mode
Figure A.3. Waveforms of a buck converter in CCM mode
Figure A.4. Waveforms of a buck converter in DCM mode
Figure A.5. A synchronous buck converter
Figure A.6. Effect of Fsw and Iout on buck converter inductor
Figure A.7. Effect of Fsw and Iout on buck converter capacitor
Figure A.8. A basic boost converter
Figure A.9. Operation states of the boost converter in CCM mode
Figure A.10. A buck-boost converter
Figure A.11. Operation states of a buck-boost converter in CCM mode
Figure B.1. A simplified pi model of an inductor
Figure B.2. A wide bar PGS
Figure B.3. Effect of high frequency on an inductor characteristic
Figure B.4. Gate capacitance vs. gate voltage for an NMOS device
Figure B.5. Model of a MOSFET gate capacitor
Figure B.6. ESR as a function of transistor aspect ratio W/L

LIST OF ABBREVIATIONS

ABB: Adaptive Body Biasing
AC: Alternating Current
ALUCAP: Aluminum Cap
ASITIC: Analysis and Simulation of Inductors and Transformers for Integrated Circuits
CCM: Continuous Conduction Mode
CMOS: Complementary Metal Oxide Semiconductor
DC: Direct Current
DCM: Discontinuous Conduction Mode
DVFS: Dynamic Voltage and Frequency Scaling
DRC: Design Rule Checking
ESR: Equivalent Series Resistance
PGS: Patterned Ground Shield
I/O: Input/Output
LVDS: Low-Voltage Differential Signaling
MIM: Metal-Insulator-Metal
MOSFET: Metal-Oxide-Semiconductor Field-Effect Transistor
MSV: Multiple Supply Voltages
NMOS: Negative-Channel Metal-Oxide Semiconductor
PDA: Personal Digital Assistant
PMOS: Positive-Channel Metal-Oxide Semiconductor
PWM: Pulse Width Modulation
RAM: Random Access Memory
SoC: System on Chip
ZVS: Zero-Voltage Switching

LIST OF SYMBOLS

Cclk: Clock Load Capacitance
CF: Filter Capacitor
Cox: Oxide Capacitance
Cx: Parasitic Capacitance
D: Clock Duty Cycle
Fsw: Clock/Switching Frequency
IL: Inductor Current
Iout: Output Current
L: CMOS Transistor Length
LF: Filter Inductor
Mn: NMOS Transistor
Mp: PMOS Transistor
η: Raw Efficiency
ηeff: Effective Efficiency
Pin: Input Power
Pout: Output Power
r: Tapering Factor
Tdelay: ZVS delay-time
Tsw: Clock/Switching Period
Vclk: Clock Node Voltage
VDD: Supply Voltage
Vgs: Gate-to-Source Voltage
Vt: Threshold Voltage
W: CMOS Transistor Width

ACKNOWLEDGMENTS

I offer my enduring gratitude to the faculty, staff and my fellow students at UBC, who have inspired me to continue my work in this field. I owe particular thanks to (in alphabetical order) Drs. William Dunford, Guy Lemieux, Shahriar Mirabbasi, Patrick Palmer and Resve Saleh for enlarging my vision of science and providing coherent answers to my endless questions. I also thank my colleague, Ph.D. student Samad Sheikhaei, without whose help my work would not have been as successful.
Special thanks are owed to my parents, who have supported me throughout my years of education.

DEDICATION

To my parents and my aunt and uncle who encouraged me to continue my education …

1 INTRODUCTION

1.1 Main Motivation

Power consumption of digital logic has increased rapidly over the last several years. It has become an important issue, not only in battery-powered applications, but also in high-performance digital designs because of packaging and cooling requirements. As manufacturing processes have reached the nanometer range, clock switching frequencies have increased dramatically. This further increases the dynamic power loss of those designs because of the continuous charging and discharging of capacitance that characterizes CMOS logic behavior.

In high-performance chip designs, the clock itself consumes a significant amount of power. For example, the clock network in IBM’s POWER6 processor can operate above 5GHz and consumes 22% (roughly 22W) of the total power, second only to leakage power [1]. As another example, the Intel Itanium 2 microprocessor clock, with adaptive frequency changes around 2GHz, consumes 25% (roughly 25W) of the total power [2]. In an older 1GHz Itanium 2 microprocessor, the clock consumes 33% (roughly 43W) of the total power [3]. Clearly, it is very important to reduce clock power consumption as much as possible.

A typical buffered clock network consists of a balanced H-tree distribution network terminated by a chain of inverter drivers. The final drivers are sized large enough to drive hundreds to thousands of latches and very long wires [4]. To reduce skew, groups of these final drivers can be shorted by a mesh, effectively producing a larger driver and clock capacitance. Charging and discharging a large clock capacitance of this nature is the main cause of the high power consumption.
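To make the scale of this loss concrete, the following minimal sketch evaluates the standard dynamic power relation P = C·V²·f for a capacitance that is charged to VDD and discharged to ground once per cycle. Of the energy C·V² drawn from the supply each cycle, half is stored on the capacitance (½·C·V²) and is the portion that recycling could in principle target. The function names and the capacitance, voltage, and frequency values are hypothetical placeholders chosen for illustration, not figures from the processors cited above.

```python
def dynamic_power(c_farads, vdd_volts, f_hertz, activity=1.0):
    """Average supply power for charging a capacitance to VDD once per cycle.

    activity is the fraction of cycles in which the node switches;
    a free-running clock switches every cycle, so it defaults to 1.0.
    """
    return activity * c_farads * vdd_volts ** 2 * f_hertz


def stored_energy_per_cycle(c_farads, vdd_volts):
    """Energy held on the capacitance at full swing (lost if dumped to ground)."""
    return 0.5 * c_farads * vdd_volts ** 2


# Hypothetical values (assumptions): 1 nF total clock load, 1.0 V supply, 3 GHz clock.
C_CLK = 1e-9
VDD = 1.0
F_SW = 3e9

power = dynamic_power(C_CLK, VDD, F_SW)
recoverable = stored_energy_per_cycle(C_CLK, VDD) * F_SW

print(f"Clock power drawn from supply: {power:.2f} W")
print(f"Upper bound on recyclable power: {recoverable:.2f} W")
```

Because power scales linearly with both capacitance and frequency, shorting the final drivers with a mesh (which increases the capacitance) and raising the clock rate both increase the power in direct proportion.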
Several methods of clock energy reduction have been reported in the literature, such as gated clocks, low-swing signals, double-edge triggered flip-flops, adiabatic switching, and resonant clocking. Clock gating, which is the most common method, masks the clock input to a sub-circuit with an appropriate signal to cut down its activity and thus its power [5] [6]. One drawback is the high level of design effort needed to ensure that clock gating introduces no timing problems in the circuit. Another disadvantage is the resulting explosion of different clock gating states, which makes the circuits difficult to verify and test.

Low-swing signaling and double-edge triggered flip-flops utilize complex circuitry and are sometimes employed in high-performance designs. Low-swing signaling is used in the distribution of the clock but not in the final drivers [7]. Double-edge triggered flip-flops are sometimes employed in ASICs [8], which often operate below 1GHz, but not in custom microprocessors operating over 1GHz. Adiabatic switching slowly charges and discharges the clock [9], but it is too slow to be employed in high-performance circuits.

Resonant clocking is a promising technique for high-speed clocks. It operates by recycling the clock energy using another charge reservoir [10] or by exchanging the charge between the load capacitances of two differential clock networks [11]. In both methods, because of the resonating nature of the circuits, a sinusoidal clock waveform is generated. This type of clock waveform is problematic because sharp edges are needed to define precise timing points. Although promising, resonant techniques are not yet practical enough for most applications.

1.2 Research Challenges and Objectives

The main objective of this dissertation is to investigate methods of recycling or recovering charge in the clock network by using fully integrated power conversion techniques in a system-on-chip.
It takes an extremely large amount of energy to operate a high frequency clock in high-performance designs. In each cycle, the energy stored in the clock is wasted by discharging it to ground. One way of recovering energy is to re-deploy the charge elsewhere in the circuit as a second voltage source. Such re-deployment is called energy recycling in this thesis, and it can be used to enable further energy-reducing strategies. Another way of recovering the energy is to return it to the original power grid, a concept called energy recovery.

The goals of this thesis are to:
• apply energy recycling and energy recovery techniques to energy stored in clocked capacitances,
• design switching converters that operate at a very high frequency so passive components are small enough to fit on-chip, and
• demonstrate the proposed solutions through chip fabrication and testing.

In this dissertation, novel voltage converter circuits are introduced to recycle energy stored in the clock network on every cycle. As shown in Figure 1.1, these converters operate by taking their input energy from the clock network and producing a useful DC energy source. By running at a high clock frequency of roughly 3GHz, the size of the passive components for the low-pass filter needed by the switch-mode power supply is greatly reduced, and this enables on-chip integration. However, operating at a high switching frequency results in high switching losses in the power converter. These losses are reduced by employing zero voltage switching (ZVS) techniques and by directly integrating clock-tree drivers with converter power-transistor drivers. Also, using low-swing signaling helps in reducing dynamic losses in the gate driver.

Figure 1.1. Recycling clock energy with a DC-DC converter (approximate model)

Although the main goal of this thesis is to reduce energy by recovering stored energy in the clock, many energy-saving techniques rely upon having voltages other than the primary VDD supply available on-chip.
For example, since dynamic power is a quadratic function of voltage, circuitry that is not performance-critical can operate at a lower supply voltage to save significant energy. Also, adaptive body biasing can use new voltage sources to dynamically adjust transistor threshold voltages between high-performance and low-power modes. Generating these voltages with an on-chip power converter rather than bringing them in from outside can simplify chip and board design and reduce costs.

Since the extra regulated voltage may not always be needed, a more practical solution would recycle energy in a way that more directly reduces power consumption of the clock network. Figure 1.2 shows one way this can be done using an inductor to recover energy from Cclk. Here, rather than providing a second voltage supply, the energy is returned to the on-chip power-supply grid through a circuit configuration resembling a DC-DC boost converter. To achieve a nearly square clock waveform, the energy is transferred in a non-resonant way.

Figure 1.2. Reducing clock power consumption with an inductor

This thesis investigates recycling and recovery strategies to reduce the effective clock power consumption. Here, the main challenge comes from the necessity of being limited to on-chip CMOS power transistors and passive components.
Other challenges to overcome to make these methods feasible are:
• minimizing the impact of using charge-recycling methods and driver integration on internal signals of the original system,
• reducing the increased dynamic losses that result from the high switching frequency needed to shrink the passive components,
• avoiding complex circuit solutions, as they are prone to malfunction, have a higher chance of failure, and can easily introduce more energy losses than they save,
• avoiding technology-dependent solutions that may not be valid for future generations of finer feature size CMOS technologies, and
• coping with inaccuracy of models and simulation results at very high frequencies with very large current densities.

1.3 Research Contributions

Overall, this dissertation presents several “firsts” in the field of on-chip power supplies and clock distribution. It is the first work to consider recycling or returning energy stored in the clock distribution network back to the power supply system. The work also includes switch-mode DC-DC converters with the smallest area, at 0.27 mm², and the highest operating frequency, at 3+ GHz, reported to date. Furthermore, it represents the first work to employ ZVS at such high operating frequencies.

To enable successful on-chip integration, the following techniques have been applied:
• A high switching frequency of 3GHz was used to reduce the size of the converter passive components, so they could be moved on-chip.
• Converter switching drivers were integrated with clock-tree drivers, to improve efficiency of the voltage converter by reducing the power needed to drive the converter at such high frequencies.
• Zero-voltage switching was employed in the clock network to recycle its energy by redeploying it through the power converter.
• A novel delay circuit was created to provide the time delay needed to implement ZVS at such a high switching frequency.
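The first technique in this list relies on the way filter component values shrink as the switching frequency rises. As a rough sketch only, assuming the textbook inductor-ripple relation for an ideal buck converter in continuous conduction (an assumption, not a formula taken from this chapter), the filter inductance needed for a given peak-to-peak ripple current is L = Vout·(1 − D)/(ΔI·Fsw) with D = Vout/Vin. The function name, voltages, and ripple target below are hypothetical.

```python
def buck_filter_inductance(v_in, v_out, f_sw, ripple_current):
    """Inductance for a target peak-to-peak ripple in an ideal CCM buck converter.

    Assumes lossless switches and a constant output voltage, so the
    inductor current ramps linearly during each switching interval.
    """
    duty = v_out / v_in
    return v_out * (1.0 - duty) / (ripple_current * f_sw)


# Hypothetical operating point (assumptions): 1.0 V in, 0.5 V out, 100 mA ripple.
for f_sw in (1e6, 100e6, 3e9):
    inductance = buck_filter_inductance(1.0, 0.5, f_sw, 0.1)
    print(f"Fsw = {f_sw:9.3g} Hz  ->  L = {inductance * 1e9:9.3g} nH")
```

Under these assumptions the required inductance falls in inverse proportion to frequency, which is why a multi-GHz switching rate brings the inductor into the range that can plausibly be built on-chip.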
Also, this dissertation introduces “reduce, reuse and recycle” as a complete energy savings strategy, using an on-chip buck converter circuit as an example. While the previous contributions focused on recycling energy at the final output of a clock driver circuit, this method focuses on the energy used to operate the “front end” of a converter circuit, and it can also be applied to clock networks. The following additional contributions were made:
• Low-swing signaling is used to reduce energy in the front-end drive chain (energy reduction).
• Supply stacking of two separate front-end drive chains allows the charge used by the PMOS drive chain to be re-used by the NMOS drive chain (energy reuse).
• Surplus charge from the PMOS drive chain is sent to the load by the switching converter (energy recycling).

Although the first two concepts (low-swing signaling and supply stacking) have been implemented before, this thesis demonstrates how they are part of an overall strategy encompassing reducing, reusing, and recycling energy. The third concept, energy recycling, is a new contribution.

This dissertation also presents a novel clock driver circuit that returns energy back to the power supply grid with the help of a charged inductor. The circuit configuration is based on the boost converter topology. This method differs from the previous contributions by improving the power consumption of the clock tree itself instead of producing an auxiliary voltage supply. The following additional contributions were made:
• The gating delay circuit also provides ZVS delay time for turning on the final PMOS driver.
• Compared to clock resonant schemes, the clock waveform is kept nearly square.
• The clock duty cycle is fixed to avoid concerns of clock jitter and timing uncertainty.
• This method is simpler than the other circuits presented here.
Modification to the original clock driver is done by adding only one inductor and two transistors; the other methods require a large filter capacitor.

1.4 Thesis Outline

The remainder of this dissertation is organized as follows. Chapter 2 provides background on discrete and integrated DC-DC switching power converters, including an explanation of typical switch-mode power converter topologies. It also describes some of the previous work that has been done in the area of on-chip converters and high-performance clocking. Chapter 3 presents the charge-recycling architectures using the buck converter topology. Chapter 4 explores alternative converter topologies, namely boost and buck-boost designs. Chapter 5 presents the low-power clock driver design. Lastly, Chapter 6 provides a final summary and discusses future work.

2 BACKGROUND

Discrete switch-mode power converters [12] are popular as they are very efficient regulator circuits. The use of switch-mode DC-to-DC power converters has increased in recent years as more electronic devices, such as laptop computers and cell phones, are powered from batteries. When electronic circuits are powered through a DC-DC converter, they receive a regulated voltage even as the battery voltage drops. DC-DC converters can also adjust the voltage level needed to supply different sub-circuits of a system, as they can provide higher or lower voltage levels than the battery voltage, or even a negative voltage, if needed.

A key quality metric for power converters is the conversion efficiency. Typical efficiencies are 50-70% at the lower end and 80-95% at the higher end. Other key quality metrics are the output voltage regulation and the output voltage ripple, which is usually kept below 5% peak-to-peak.

In a discrete switch-mode converter, efficiency is compromised due to the parasitic elements of the circuit. Integrating the converter within a system-on-chip diminishes the problem by reducing the stray components.
Therefore, a number of efforts are underway to move the power converter on-chip [13] [14] [15]. This could also lower the number of required power pins and improve the quality of voltage regulation.

The rest of this chapter provides background on discrete and integrated DC-DC power converters, including a brief explanation of the basic topologies and a detailed survey of the previously published on-chip power converters. It also describes some of the previous work that has been done in the area of high-performance clocking.

2.1 Discrete Switching Power Converters

2.1.1 Basic Switching Converters

Switch-mode converters consist of an inductor that is periodically connected in different configurations. Usually the input of these converters is an unregulated DC voltage, such as a rectified AC voltage or a battery. By adjusting the ratio of time spent in each configuration, i.e., the duty cycle, the output voltage can be established and regulated. Switching frequency itself does not have an effect on the output voltage.

Switch-mode converters are more efficient than linear converters, with efficiencies in the range of 80% to 95% for a discrete design. As switches are either fully closed or fully open, the voltage drop happens only across the inductor, which ideally is a no-loss component. That is, the voltage drop is due to energy stored, but not dissipated, in the inductor. The higher efficiency of these converters has made them attractive for all types of applications. For example, using these converters can increase battery life in a portable device. As there are switches in the circuit that are closed and opened, harmonics are present in the system and need to be dealt with by employing a suitable filter.

There are two basic DC-DC converter topologies that can generate output voltages that are lower or higher than the input voltage.
A third simple configuration can be derived from the two basic ones to generate a negative output voltage with a magnitude that is either greater than or less than the input voltage magnitude [12].

One of the basic switching conversion topologies is the step-down or buck converter, shown in Figure 2.1(a). Basically, its operation can be described as averaging a PWM square wave signal by passing it through a low-pass filter. The average or DC value is D×VDD, which implies that the output voltage is a function of the magnitude (VDD) and also the duty cycle (D) of the square waveform. The operation of the buck converter is fairly simple as there are only two operational states. In the first state, the switch is closed, the diode is reverse-biased and current builds up in the inductor. In the second state, the switch is opened. Current in the inductor cannot change instantly, so the current finds its way through the diode and the energy is transferred from the inductor to the load. The ideal DC gain of a buck converter is D.

(a) Buck configuration (b) Boost configuration (c) Buck-boost configuration
Figure 2.1. Basic switching converter topologies

Another basic DC-DC conversion topology is the step-up or boost converter. The components used are similar to those of the buck converter but connected in a different configuration, as shown in Figure 2.1(b). Similarly, there are two operational states. In the first state, the switch is closed, current builds up in the inductor and the diode is reverse-biased, isolating the output stage. In the second state, the switch is opened. Current in the inductor cannot change instantly, so the current finds its way through the diode. The inductor voltage will be in series with the source voltage, so the output capacitor receives a voltage that is higher than the supply voltage. The load receives energy from the input source as well as the inductor. Therefore, the ideal DC gain of a boost converter is 1/(1 − D).
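The ideal gains quoted above for the buck and boost topologies can be checked numerically. The following is a minimal sketch rather than part of the original work: the function names, the supply voltage and the duty cycle are illustrative assumptions, and all losses and parasitics are ignored.

```python
def buck_gain(duty: float) -> float:
    """Ideal DC gain Vout/Vin of a buck converter, equal to the duty cycle D."""
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty cycle must lie in [0, 1]")
    return duty


def boost_gain(duty: float) -> float:
    """Ideal DC gain Vout/Vin of a boost converter, 1 / (1 - D)."""
    if not 0.0 <= duty < 1.0:
        raise ValueError("duty cycle must lie in [0, 1)")
    return 1.0 / (1.0 - duty)


if __name__ == "__main__":
    vdd = 1.0   # assumed supply voltage in volts (illustrative only)
    duty = 0.5  # assumed duty cycle (illustrative only)
    print(f"buck:  Vout = {buck_gain(duty) * vdd:.2f} V")
    print(f"boost: Vout = {boost_gain(duty) * vdd:.2f} V")
```

With these assumed values the ideal buck output is half the supply and the ideal boost output is twice the supply; a real converter would deliver less because of conduction and switching losses.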
The buck-boost topology also uses the same components as the buck converter but connected in yet another configuration, as shown in Figure 2.1(c). Again, there are two operational states. In the first state, the switch is closed, so current builds up in the inductor and the diode is reverse-biased, isolating the output stage. In the second state, the switch is opened. Current in the inductor cannot change instantly, so the current finds its way through the diode. The inductor will be in parallel with the output and the energy is transferred from the inductor to the load. Hence the ideal DC gain of a buck-boost converter is −D/(1 − D).

2.1.2 Zero Voltage Switching

In advanced switch-mode power converters, zero voltage switching (ZVS) operation is used to manage dynamic power losses in the power transistors [16]. The basic idea of ZVS is that these power transistors are turned on when the voltage across their terminals is zero, which ideally results in no power loss during switching.

Consider the circuit diagram of a synchronous buck converter shown in Figure 2.2(a). S1 acts as the switch that connects to the supply and S2 takes the place of the diode in Figure 2.1(a). Cx includes all capacitances at node Vinv. When S1 is on, Vinv = VDD and the current in the inductor is increasing. S1 is turned off in accordance with the required converter output voltage, in other words, the duty cycle of the gating signal. S2 is kept off and the inductor current moves the charge stored in Cx to CF and, as a result, Vinv decreases. When Vinv = 0, S2 is turned on to achieve ZVS for S2. Because S1 is off and no supply voltage is connected to the circuit, the inductor current decreases and, by design, reaches some negative value. At this time, S2 is turned off and the negative inductor current charges Cx. Vinv increases and, when Vinv = VDD, S1 is turned on to achieve ZVS for S1.

(a) Synchronous buck configuration (b) Idealized timing diagram
Figure 2.2.
ZVS operation in a synchronous buck converter

In Figure 2.2, the value of Cx affects the rise and fall times of Vinv, as a larger capacitor will slow down the transitions of the node Vinv. The output voltage has a ripple due to the switching action. The percentage ripple in the output voltage is usually specified to be less than, for example, 5% peak-to-peak. Therefore, as a first-order approximation, it is valid to assume that Vout is constant.

2.2 CMOS Inverter Driver Circuit

CMOS transistors used as switches in the converter of Figure 2.2 are big compared to other transistors used in digital logic. The gate inputs of these transistors have significant capacitance. To achieve rapid turn-on and turn-off transitions, a tapered driver, which is a chain of inverters whose size successively grows by the tapering factor r, is used to drive those big transistors. Figure 2.3 shows such a driver chain with n stages. To increase the overall efficiency of a power converter, the driver circuit should be designed so that the power consumption in the driver chain is minimized.

Figure 2.3. A CMOS inverter driver with tapering factor r

As described in [17], two parameters β and r characterize the inverter chain: β is the ratio of PMOS to NMOS transistor sizes in an inverter, and r is the ratio of the transistor sizes in consecutive inverter stages. A common practice is to widen PMOS transistors so that, in an inverter stage, the resistances of the PMOS and NMOS transistors are matched [17]. This typically requires β = 2.5 to 3.5. As a result, high-to-low and low-to-high propagation delays are equalized. In addition, the rise/fall times of inputs and outputs of the inverters are equalized, which minimizes the short-circuit dissipation. As such, most power dissipation in an inverter driver is associated with the dynamic power, and only a minor fraction (<10%) is due to short-circuit currents. Increasing r is also a key to reducing power consumed by the front-end inverter chain.
In this work, to keep the design simple without varying too many variables, a value of r = 4, corresponding to a fan-out of four, is chosen for the inverter chain. Using a fan-out of four (FO4) is a common practice that minimizes propagation delay in the inverter chain [17].

2.3 Integrated Switching Power Converters

To be able to implement switch-mode DC-DC power converters on chip, the power switches of the converter are replaced by CMOS transistors. As an example, comparing the circuit diagram of the buck converter in Figure 2.1(a) with the CMOS inverter in Figure 2.4 reveals a similarity, which leads to the use of a CMOS inverter as the power switches of a buck converter.

Figure 2.4. A CMOS inverter chain driving a CMOS buck converter

The CMOS transistors in the converter of Figure 2.4 play the role of power transistors, so they need to be big to pass high currents. Big transistors have small on-state resistance, resulting in a reduced static power loss. On the other hand, larger transistors have larger stray capacitances, which increases the dynamic power loss; this loss needs to be addressed. A bigger transistor will need a bigger gate driver circuit as well.

Integrated power converters usually work at higher switching frequencies to shrink the size of the passive components. Therefore, to reduce the amount of heat generated, higher efficiency values are much preferred and, as such, implementation of ZVS operation is very important. To do this, separate gating signals are needed for each transistor, as shown in Figure 2.5.

Figure 2.5. ZVS operation in a CMOS synchronous buck converter

Here, after Mp is turned off, Mn is kept off until the inductor current discharges Cx to zero and then Mn is turned on, achieving ZVS for Mn. Similarly, after Mn is turned off, Mp is kept off until the negative inductor current charges Cx to VDD and then Mp is turned on, achieving ZVS for Mp.
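The time for which Mn must be kept off after Mp turns off can be estimated to first order. The sketch below assumes that the inductor current stays roughly constant while it discharges Cx from VDD to zero, so that the delay is approximately Cx×VDD/IL. This constant-current assumption, the function name, and all numeric values are illustrative and are not taken from the designs in this thesis.

```python
def zvs_dead_time(cx_farads: float, vdd_volts: float, il_amps: float) -> float:
    """First-order ZVS delay: time for a constant current il_amps to move the
    charge cx_farads * vdd_volts off the switching node (t = C * V / I)."""
    if cx_farads < 0.0 or vdd_volts < 0.0:
        raise ValueError("capacitance and voltage must be non-negative")
    if il_amps <= 0.0:
        raise ValueError("inductor current must be positive to discharge Cx")
    return cx_farads * vdd_volts / il_amps


if __name__ == "__main__":
    cx = 20e-12  # assumed node capacitance: 20 pF (hypothetical value)
    vdd = 1.0    # assumed supply voltage: 1.0 V (hypothetical value)
    il = 0.1     # assumed inductor current at turn-off: 100 mA (hypothetical value)
    print(f"estimated ZVS delay: {zvs_dead_time(cx, vdd, il) * 1e12:.0f} ps")
```

The estimate shows the trade-off described in the text: a larger Cx or a smaller inductor current lengthens the transition, so the gating delay has to be matched to the operating point.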
For integrated power converter designs, to save on-chip area, smaller inductor and capacitor values are preferred. Graphs illustrating the required inductor and capacitor values for different currents and switching frequencies are given in Appendix A. Choosing a mid-level output current will give a good compromise between inductor and capacitor values, while higher switching frequencies will reduce both.

2.4 Literature Survey

Discrete power converters have been around for many years and, as such, they have been studied in detail in many publications. While the earlier papers, such as [18] and [19], focused on design optimization, more recent papers have focused on using advanced control methods that could be employed using digital signal processors [8].

As on-chip power converters are becoming popular, various approaches to integrating them on-chip have been reported in the latest literature. Some have tried to implement discrete design techniques, such as using a multi-phase configuration to improve the quality of the output voltage, while others have tried to implement integrated design techniques, such as using low-swing transistors to reduce the consumption of the converter itself. Those designs are discussed briefly in the following sections.

2.4.1 Switching Power Converters

Physical constraints push on-chip integrated power converters to use small inductors and capacitors. Recent work has focused on reducing the size of these components while maintaining high efficiency. In [20], an analytical solution is derived for the optimal DC-DC converter design, linking power efficiency directly to CMOS front-end parameters and inductor technology.

In recent years, many integrated power converters have been reported, mostly switching at a few megahertz and with off-chip passive components. A converter switching at 480MHz [14] operates at one of the highest reported frequencies.
It contains four single-phase modules that operate as stand-alone converters and receive synchronization signals from a block synchronizer, as shown in Figure 2.6. In a multiphase topology, the switching times of the inductors are staggered to cancel out the output voltage ripple. This design utilizes air-core inductors mounted on the package of a 90nm CMOS chip. At 233MHz, a power efficiency of 83.2% has been reported with voltage conversion of 1.2V to 0.9V at 0.3A load current and, at 480MHz, an efficiency of 72% was reported with voltage conversion of 1.8V to 0.9V at 0.5A.

On the other hand, [15] is an example of a fully integrated step-down converter fabricated in a 0.18µm SiGe RF BiCMOS process. The converter provides a programmable 1.5V to 2V output voltage at a 200mA current rating with a switching frequency of 45MHz. This design, shown in Figure 2.7, utilizes a two-stage interleaved ZVS synchronous buck topology, and has a maximum efficiency of 65%.

Also, [21] is an example of a fully integrated step-up converter fabricated in a 0.5µm process. The circuit diagram for this design is shown in Figure 2.8. The target specifications are input and output voltages of 5V, a maximum load current of 200mA, and an average switching frequency of 75MHz. The conversion efficiency was not reported for this design.

Figure 2.6. Block diagram of a four-phase interleaved DC-DC converter [13]

Figure 2.7. Circuit diagram of the fully integrated two-stage buck converter [15]

Figure 2.8. Circuit diagram of the fully integrated boost converter [21]

Table 2.1, partially taken from [13], shows a comparison of the previously reported on-chip converters, some of which have on-chip passives. Among fully integrated converters, [21] has the highest switching frequency of 75MHz and uses an area of 1.5mm × 1.5mm to fit the large on-chip passive components. On the other hand, [14] has the highest reported switching frequency of 480MHz and uses on-package inductors.
It later appeared in [13], switching at a lower frequency of 233MHz, to boost the efficiency.

Table 2.1. Performance comparison of reviewed converters

| Reference | [22] | [23] | [24] | [14] | [13] | [21] | [25] | [15] |
|---|---|---|---|---|---|---|---|---|
| Year | 2001 | 2004 | 2004 | 2004 | 2005 | 2000 | 2005 | 2006 |
| Technology | 0.35µm | 1.5µm | 0.18µm | 90nm | 90nm | 0.50µm | 1.5µm | 0.18µm SiGe RF BiCMOS |
| Switching frequency, Fsw (MHz) | 1 | 2 | 102 | 480 | 233 | 75 | 10 | 45 |
| Input voltage, Vin (V) | 1.2 | 3.3 | 1.8 | 1.8 | 1.2 ~ 1.4 | 5 | 4 | 2.8 |
| Output voltage, Vout (V) | 0.5 | 1.7 | 0.9 | 0.9 | 0.9 ~ 1.1 | 5 | 2 | 1.8 |
| Output current, Iout (A) | 0.02 | 0.07 | 0.25 | 0.5 | 0.3 ~ 0.4 | 0.2 | 0.25 | 0.2 |
| Efficiency, η (%) | 91 | 92 | 88 | 72 | 83 ~ 85 | N/A | 50 | 65 |
| Filter inductor, LF | 10µH | 4.7µH | 8.8nH | 3.6nH * | 6.8nH ** | 50nH | 1µH | 11nH *** |
| Filter capacitor, CF | 20µF | 10µF | 3.0nF | 2.5nF | 2.5nF | 650pF | 180nF | 6nF |
| On/off-chip passives | off | off | off | Off-chip, on-package inductors | Off-chip, on-package inductors | on | on | on |

* This design uses four inductors, 3.6nH each.
** This design uses four inductors, 6.8nH each.
*** This design uses two inductors, 11nH each.

To reduce the size and footprint of the passive components, the switching frequency of the converters needs to be increased. The integrated clock driver/power converter designs introduced in this dissertation use GHz-range switching frequencies for full on-chip passive component integration and a smaller passive component footprint. Reduced efficiencies at those higher frequencies are compensated for by employing charge recycling methods, as described in detail later in this dissertation.

2.4.2 Low-Swing Power Converters

To enhance the efficiency characteristics of high-frequency switching DC-DC converters, [24] proposes a low-voltage-swing MOSFET gate drive technique, as shown in Figure 2.9. It has been reported that an efficiency of 88% at a switching frequency of 102MHz is achieved for a voltage conversion from 1.8V to 0.9V with a low-swing DC-DC converter based on a 0.18µm CMOS technology. This corresponds to a power reduction of 27.9% as compared to a standard full-swing DC-DC converter.

Figure 2.9.
Low swing DC-DC conversion technique [24]

Another low-swing design presented in [26] utilizes a cascode bridge circuit, as shown in Figure 2.10. The circuit can operate at input voltages higher than the maximum voltage that can be applied directly across the terminals of a MOSFET. It has been reported that an efficiency of 79.6% is achieved for 5.4V to 0.9V conversion in a 0.18µm CMOS technology. This DC-DC converter operates at a switching frequency of 97MHz while supplying a DC current of 250mA to the load.

Moreover, [27] combines the low-swing idea with digital control as a power management solution, as shown in Figure 2.11. In this scheme, the DC-DC converter normally works in pulse width modulation (PWM) mode to achieve high-quality regulation as well as good efficiency. However, in standby mode, in which the load current is very low, pulse width modulation control leads to low efficiency due to excessive switching loss. To extend the standby time, pulse frequency modulation is used for light-load operation to achieve good efficiency. This digitally-controlled buck converter is implemented in 0.25µm CMOS. The PWM switching frequency is 1.5MHz. The converter achieves a maximum of 91% efficiency at 200mA output current. The maximum input voltage is 5.5V and the output voltage ranges from 1V to 1.8V.

Figure 2.10. Cascode bridge circuit [26]

Figure 2.11. Block diagram of a power management on-chip [27]

In contrast, using a different methodology, the approach in [28] consists of stacking CMOS logic domains to operate from a voltage supply that is a multiple of the nominal supply voltage. DC-DC down conversion is performed using charge recycling without the need for explicit power converters, as shown in Figure 2.12. This high-voltage power delivery system would need start-up devices to avoid device overstress during power-on. Also, level shifters that translate logic levels between stacked domains are needed.
The approach clearly requires that the stacked loads have well-balanced charge utilization for high efficiency. One context in which this approach may be more easily applicable is a multi-core microprocessor in which each core could be designed to operate in a different stacked domain. Current utilization in each domain could be controlled with workload balancing; level-shifting voltage interfaces would only have to be present to interface between cores or with the chip pads.

Figure 2.12. Implicit DC-DC conversion through charge recycling [28]

The low-swing buck converter design introduced in this dissertation improves upon this previous work by introducing “reduce, reuse and recycle” as a complete energy savings strategy for an on-chip buck converter circuit. Energy reduction in the front-end drive chain is achieved by using low-swing signaling at 660MHz. Charge reuse is achieved by supply-stacking separate front-end drive-chains for the output transistors. And finally, energy recycling is achieved by taking surplus charge available from the top front-end drive-chain, along with the charge available in the clock load capacitance, and sending it to the load as a regulated supply. Although the first two concepts have been implemented before, the third concept of energy recycling is a new contribution, as described in detail later in this dissertation.

2.4.3 Resonant Clock Strategies

A clock signal distribution network in an integrated circuit requires a capacitive clock distribution model. An approach to global clock distribution presented in [29] augments traditional tree-driven grids with on-chip inductors. The large clock capacitance then resonates with the inductance Lspiral shown in Figure 2.13. This approach promises to significantly reduce the power necessary to drive the grid, since the energy of the fundamental resonates back and forth between electric and magnetic forms rather than being dissipated as heat.
Consequently, the clock drivers must only supply the energy needed to overcome losses at the fundamental. Furthermore, because the effective capacitance of the clock network is dramatically reduced, the number of gain stages and the associated latency required to drive the clock are reduced as well, resulting in considerable improvement in skew and jitter.

Figure 2.13. Simple lumped circuit model of the resonant clock distribution [11]

While the non-resonant power scales linearly with frequency, [29] and [10] report that the resonant power is fairly constant, with better-than-80% power savings at the desired resonance frequency of 1.1GHz. To minimize energy dissipation at the fundamental, there might be some need to tune the grid resonance to the clock frequency with MOS capacitors that can be switched onto the clock load. Local buffering would not be resonant and would dissipate the same amount of power as a non-resonant distribution. Hence, with resonant clocking there would be a desire to shift more of the clock load to the resonant grid. This approach can scale to higher clock frequencies for a given clock load by the addition of more inductors to the network. Sinusoidal clocks are, however, generally undesirable because of slower signal transition times. The slow transition results in increased skew and jitter, as there is no precise moment to define the clock event.

The concept presented in [29] is improved in [11] by introducing a distributed differential oscillator global clock network. Here, the distribution is differential with the use of symmetric inductors placed between the two clock phases, eliminating the need to add large capacitors to the clock distribution as in the resonant single-ended distribution of [29].

The low-power clock driver circuit introduced in this thesis differs from resonant clocking by providing a quasi-square clock waveform with sharp edges at the frequency of 4GHz using a circuit configuration that resembles a boost converter.
This improves the power consumption of the clock tree itself.

2.5 Implementation Considerations

The designs introduced here rely on the charge stored in the clock load capacitance. Thus, the exact location to connect these circuits depends on where the clock load capacitance is located, i.e., it depends on the configuration of the clock distribution network.

Clock distribution networks have been studied in detail in [4]. They usually form a tree structure. If the ends of the branches are connected to each other, a mesh structure is formed, which has the benefit of reduced interconnect resistance within the clock tree. A single buffer can be used to drive the entire clock if the clock is distributed entirely on metal. The buffer needs to be able to provide enough current to drive the clock network capacitance while keeping the clock waveform intact.

One of the goals of clock designers is to minimize the clock skew. One common way of achieving this is by using a symmetric layout such as an H-tree, as shown in Figure 2.14. As a result, each clock path from the clock source to the individual clock loads has the same delay, assuming exact matching of the layout and no process variations. The interconnect capacitance in an H-tree is greater than in a standard clock tree [30] because the total wire length tends to be greater. Thus, using an H-tree reduces skew while increasing the power.

In the example of Figure 2.14, the last inverter that drives the clock tree trunk is the biggest inverter, which drives the capacitance of the whole H-tree. Therefore, the last inverter could be used for the power switches in the integrated clock driver/buck converter, or it could be where the inductor would be located in the low-power clock driver design.

Figure 2.14. A tapered H-tree clock distribution network

Another approach is to distribute buffers throughout the clock network. This method requires more area but will be necessary if the resistance of the clock interconnects is significant.
In a well-balanced clock distribution network, buffers are the primary source of clock skew [4], as active device characteristics vary more than passive device characteristics. Buffers may also be used to drive local loads. In this case, the integrated clock driver/buck converter must be replicated for each region that has its own local load buffer.

Another concern is the area overhead when these energy-recycling designs are utilized in a real microprocessor environment. To investigate this concern, the power consumption of a few recent microprocessor designs is summarized in Table 2.2, which is partially taken from [3].

Table 2.2. Comparison of recent microprocessors

| Microprocessor | Itanium 2 [3] | Power 4 [3] (2 core) | Montecito [31] (2 core) | Power 6 [1] (2 core) |
|---|---|---|---|---|
| Switching frequency, Fsw (GHz) | 1.0 | 1.3 | ~2.0 | 5+ |
| Overall power consumption (W) | 130 | 125 | 100 | ~100 |
| Chip area (mm²) | 421 | 400 est. | 596 | 341 |
| Clock power share (%) | 33 | 30 | 25 | 22 |

As an example, the clock network in IBM’s POWER6 processor consumes 22% of the overall chip power, or about 22W, on a chip with an overall area of 341mm². This corresponds to an estimated clock power density of 65mW/mm². Using $P = C V_{DD}^{2} F_{sw}$, in which P, C, VDD and Fsw are the clock power consumption, clock capacitance, supply voltage and clock frequency, respectively, and assuming VDD = 1.0V with Fsw = 5GHz from Table 2.2, it can be estimated that the overall clock capacitance of the chip is 4.4nF or, in other words, 13pF/mm².

In the low-power clock driver design, a clock load gate capacitance of 21pF has been used, which corresponds to an area of 1.6mm² of a high-performance microprocessor. In contrast, the area needed to implement the inductor in that design is 0.1mm², which is much smaller than the 1.6mm² region it recovers power from.

In comparison, [10] reports a 90nm CMOS resonant clocking test-chip with Cclk = 7.5pF and four sets of LC passives, as shown in Figure 2.15. This results in a clock phase and amplitude that are both uniform across the entire clock network [11].
Local buffering would not be resonant and would dissipate power as a non-resonant network [29]. Tuning of the grid to the clock resonance frequency could be done by switching MOS capacitors onto the clock load, but if the Q of the resonator is small, resonance can be achieved over a wide frequency band [29].

In Figure 2.15, each Cdecap is 20pF and L is 1nH, occupying a chip area of 80µm × 80µm and 90µm × 90µm, respectively [10]. Since the H-tree itself does not include any buffers, the four sets of LC passives are in parallel, which results in an effective decoupling capacitance of 80pF and an effective inductance of 250pH at the clock resonance frequency of 3.7GHz. It has been reported that approximately 20% of the clock power is being recycled in the test chip which, with a redesign, would likely approach the 80% observed in 0.18µm test-chips [10].

Figure 2.15. Components of a resonant clock sector [10]

3 INTEGRATED BUCK CONVERTERS

In this chapter, an integrated clock driver/buck converter design and a low-swing buck converter design are discussed. In the first design, the energy stored in the clock load capacitance provides the input power to voltage converters operating at the clock speed of roughly 3GHz.

The second design, a low-swing buck converter, introduces “reduce, reuse and recycle” as a complete energy savings strategy at 660MHz. Energy reduction in the front-end drive chain is achieved by using low-swing signaling (half-rail swings instead of full-rail). Charge reuse is achieved by one drive-chain reusing charge from the other drive-chain. And finally, energy recycling is achieved by taking surplus charge available from one drive-chain, along with the charge available in the clock-load-capacitance, and sending it to the load as a regulated supply.

In the designs introduced here, high-speed switching losses are reduced by employing zero voltage switching and by directly integrating the clock-tree drivers with the converter power-transistor drivers.
Also, the designs are implemented in an open loop, lacking output voltage regulation, but with the goal of having less than 5% ripple on Vout. The techniques proposed in these designs are valid for finer feature size CMOS technologies as well.

3.1 Integrated Clock Driver/Buck Converter

3.1.1 Introduction

This section describes a new method where the energy of the clock is recovered to supply on-chip DC-DC converters [32] [33]. This work differs from resonant clocking by providing a quasi-square clock waveform with sharp edges, because the inductors are not working in a resonating mode with the clock capacitors. Here, part of the challenge comes from the necessity of being limited to on-chip CMOS power transistors and passive components, as the design is limited to the same technology as the rest of the circuit.

Directly integrating a clock driver intended for high-performance logic with a DC-DC power converter merges several compatible concepts. The converter switching losses are merged into the clock-tree switching losses, the multi-GHz clock frequency reduces the size of the converter passive components so that they can be put on-chip, and the final clock drivers and the DC-DC converter power transistors are both very wide to improve the switching time of the clock and reduce the static losses of the converter. Also, these large, low-impedance transistors need to be driven by a tapered inverter chain to keep up with the very high frequency. Similarly, the power used by this chain should be minimized in both cases.

However, a higher switching frequency increases the dynamic power loss. To compensate for this loss, two major ideas have been used in this work: charge-recycling and zero-voltage switching.

Output voltage regulation can be achieved by modulating the clock duty cycle, a scheme compatible with single-edge triggered clocking.
The converters’ output voltage can be used to supply sub-circuits that operate at other voltage levels, as it is challenging to bring in and distribute several voltage domains. Since the switching DC-DC converters are small, several of them can be deployed in different regions to produce independent, regional power supplies. This allows several different regulated voltages to be on-chip at the same time, all powered from the same off-chip primary supply. Many power-saving techniques, such as mixed-voltage islands and adaptive body biasing (ABB) [34], can utilize these additional supply voltages. An on-chip DC-DC converter can power these schemes without the need for external pins, external components, or board design effort. Another advantage of on-chip converters is the ability to respond quickly to dynamic load conditions in many-core processors, a key requirement for achieving the savings promised by dynamic voltage and frequency scaling (DVFS) [35].

Figure 3.1 shows how integrating the clock driver with the power converter helps in increasing the overall efficiency. The integrated clock driver/power converter in Figure 3.1(a) receives Pin1. Part of Pin1 is required to operate the clock network. If a dedicated clock driver were constructed, this power consumption would be Pin2. We use Pin1 − Pin2 to operate the power converter and recycle energy from the clock driver. As shown in Figure 3.1(b), if this power and circuitry were removed from the integrated design, a stand-alone power converter would still be needed that provides Pout using just the incremental power $P_{in1} - P_{in2}$. Recycling the clock power increases the effective efficiency.

(a) Raw efficiency (b) Effective efficiency
Figure 3.1. Efficiency block diagram

To compare the dual-purpose circuit with traditional on-chip power converters, a new concept is introduced as effective efficiency.
Effective efficiency (ηeff) is defined as the output power of the converter divided by the incremental power to operate the converter:

\[ \eta_{eff} = \frac{P_{out}}{P_{in1} - P_{in2}} \qquad (3.1) \]

Effective efficiency captures how efficient a traditional converter would have to be if it were to supply the same output power using just the additional input power needed by the dual-purpose circuit.

3.1.2 Circuit Design

One of the basic switch-mode DC-DC conversion topologies is the step-down or buck converter. Its operation can be described as averaging a square wave signal by passing it through a low-pass filter, as shown in Figure 3.2(a). The average or DC value is D×VDD, which implies that the output voltage is a function of the magnitude, VDD, and also the duty cycle, D, of the square waveform.

A basic integrated clock driver/buck converter circuit is shown in Figure 3.2(b). Here, a chain of cascaded inverters (not shown) is used as a driver buffer for node Vclk-in. Cclk is the sum of all transistor and wiring capacitances that are connected to the clock node. The idealized timing diagram of the internal signals is presented in Figure 3.2(c), where D, Tsw, and Tdelay represent clock duty cycle, switching period (i.e., clock period), and ZVS delay-time, respectively.

Figure 3.2. Integrated clock driver/buck converter: (a) a typical buck converter; (b) simplified circuit diagram of the integrated clock driver/buck converter; (c) idealized timing diagram.

As shown in Figure 3.2(c), there are three intervals of operation:
• Interval 1 (time 0 to D×Tsw) is intended to drive the load and charge Cclk through Mp. During this time, the inductor current increases linearly since the voltage across it is constant.
• Interval 2 (time D×Tsw to D×Tsw+Tdelay) is intended for charge recycling. Therefore, both Mn and Mp are off. The charge that is stored in Cclk is moved to the output circuit through the inductor, as the inductor current cannot be disrupted abruptly.
This results in a rapid drop of Vclk, which is intended. In this short period of time, the inductor current can be assumed to be approximately constant. It is worth mentioning that if there were no delay, then at time D×Tsw, Cclk would be discharged to ground through Mn, wasting the stored energy.
• Interval 3 (time D×Tsw+Tdelay to Tsw) starts when the voltage across Mn is close to zero. At this time, Mn is turned on to provide a low-resistance path for the inductor current. As there is no energy supplied to the system and the voltage across the inductor is constant, the inductor current decreases linearly. ZVS operation occurs when Mn is turned on while its source-drain voltage is close to zero, thereby reducing dynamic power loss.

Theoretically, in interval 3, when the falling inductor current crosses zero, Mn could be turned off to allow charging Cclk with the negative inductor current. Then, at the beginning of the next switching cycle, Mp would be turned on with zero voltage across it, i.e., ZVS operation for Mp. In practice, this might increase the output voltage ripple, as CF would have to provide the required charge for the large Cclk. Moreover, the inductor RMS current, and thus the power loss in the inductor resistance, would be increased. In this design, the minimum inductor current is set to be close to zero; therefore, no ZVS operation is implemented for Mp. In practice, due to process variation, the inductor current may go slightly negative. However, as the inductor current does not stop at zero, the converter is considered to be operating in continuous conduction mode (CCM).

At the end of interval 3, Mp is turned on and Mn turned off at roughly the same time. That is, the delay element should only delay the rising edge of Vn (the gate of Mn), not its falling edge.

3.1.3 Complete Circuit

To be able to calculate the effective efficiency using Equation (3.1), a reference clock driver is needed. In this section, this reference circuit will be described first.
This will be followed by the integrated clock/converter circuit.

Reference Clock Circuit

To evaluate the performance of the integrated clock driver/power converter circuit, a reference circuit containing the tapered inverters to form a clock driver was designed using a reference clock capacitance Cclk. In this work, the clock capacitance Cclk is assumed to be 12pF. The approach described in [17] is used here to design the inverter chain. A common practice is to use wider PMOS transistors than NMOS transistors so that the resistance of PMOS and NMOS transistors is matched. In this circuit, PMOS transistors are three times wider than NMOS transistors, except for the last inverter stage in which the PMOS is four times wider, as shown in Figure 3.3. This is done to keep the reference circuit similar to the integrated design, where Mp needs to be wider to drive Cclk and LF simultaneously. As is common practice, a fan-out ratio of four is chosen for the inverter chain. To increase the overall efficiency of a power converter, the driver circuit can be designed so that the power consumption in the drive chain is minimized.

Figure 3.3. Circuit of the reference clock for the integrated clock driver/buck converter

Integrated Clock Driver/Buck Converter

A detailed circuit diagram of the integrated clock driver/buck converter, including the buffer delay circuitry, is shown in Figure 3.4. Some transistors have been added to implement the capacitors. To control the exact on/off timing of Mn and Mp, the inverter driving those transistors is replaced with two separate inverters, with the same total transistor sizes and roughly the same power consumption as the original single driver. To implement the delay time, the gate of M1 is connected to Vclk instead of being connected to the gate of M2. Therefore, compared to Vp, the rising edge of Vn is delayed by Tdelay, a duration which depends on how quickly LF drains Cclk and how fast M1 turns on to raise Vn.
A drop in Vclk causes M1, and then Mn, to turn on, and consequently Vclk drops faster. Since the gate of M2 is connected to Vm, no falling edge delay is observed for Vn. To prevent M1 and M2 from being on concurrently at the rising edge of Vm, the source of M1 is connected to Vp instead of VDD. Therefore, Vn falls at the falling edge of Vp.

Figure 3.4. Circuit diagram of the integrated clock driver/buck converter. Labels in the figure: nodes Vclk-PWM, Vm, Vn, Vp, Vclk and Vout; devices Mp, Mn and M1 to M4; the ZVS delay circuit; inductor current ILf; Cclk = 12pF, LF = 320pH, CF = 350pF. Transistor dimensions are in µm; W/L labels in extraction order: Wp/Lp = 24/0.1, Wn/Ln = 8/0.1, 96/0.1, Wp/Lp = 96/0.1, Wn/Ln = 32/0.1, 288/0.1, 96/0.1, 32/0.1, 2048/0.1, 512/0.1, Wp/Lp = 6/0.1, Wn/Ln = 2/0.1, 18000/1.5, 6144/0.1, 2048/0.1.

In interval 1 of the operation, Cclk stores some energy which is then delivered to the load in interval 2. The output voltage is therefore given by \( V_{out} = D_{eff} \times V_{DD} \), where

\[ D_{eff} = D + \frac{1}{2} \cdot \frac{T_{delay} - T_{fall}}{T_{sw}} \qquad (3.2) \]

Tfall is the fall time of Vclk if there were no ZVS delay, and Tdelay is the fall time of Vclk in the presence of ZVS, as shown in Figure 3.5.

Figure 3.5. Timing diagram of Vclk

Equation (3.2) suggests that if Tdelay is equal to Tfall, the duty cycle remains unchanged. Any Tdelay larger than Tfall would increase the effective duty cycle accordingly. Tdelay can be calculated using the simplified circuit model given in Figure 3.6.

Figure 3.6. Simplified circuit model for analyzing Vclk during clock fall time

At time t = 0, when Mp turns off, \( V_{clk}(0) = V_{DD} - I_{Lmax} \cdot R_{on-PMOS} \). During clock fall time, ILmax can be assumed to be constant, therefore \( V_{clk}(t) = V_{clk}(0) - \frac{1}{C_{clk}} \cdot I_{Lmax} \cdot t \). The time that it takes for Vclk to reach zero can be determined by:

\[ T_{delay} = C_{clk} \left( \frac{V_{DD}}{I_{Lmax}} - R_{on-PMOS} \right) \qquad (3.3) \]

3.1.4 Simulation

To evaluate the performance of the integrated clock driver/power converter circuit, it is simulated in 90nm CMOS technology using standard-Vt transistors.
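As a numerical illustration of Equations (3.2) and (3.3), the following minimal Python sketch evaluates Tdelay and the effective duty cycle. Cclk = 12pF and VDD = 1.0V follow the text; the values used for ILmax, the PMOS on-resistance, Tfall and the 3GHz switching frequency in the example call are illustrative assumptions chosen only to exercise the formulas, and the function names are hypothetical.

```python
def t_delay(c_clk, v_dd, i_lmax, r_on_pmos):
    """Equation (3.3): time for Vclk to fall to zero when a constant
    inductor current i_lmax discharges c_clk, starting from
    Vclk(0) = v_dd - i_lmax * r_on_pmos."""
    return c_clk * (v_dd / i_lmax - r_on_pmos)


def effective_duty(duty, t_delay_s, t_fall_s, t_sw_s):
    """Equation (3.2): effective duty cycle including the ZVS delay."""
    return duty + 0.5 * (t_delay_s - t_fall_s) / t_sw_s


# Values from the text
C_CLK = 12e-12   # clock capacitance, F
V_DD = 1.0       # supply voltage, V

# Illustrative assumptions (not design data)
I_LMAX = 0.19    # peak inductor current, A
R_ON = 0.5       # PMOS on-resistance, ohm
T_FALL = 30e-12  # Vclk fall time without ZVS delay, s
T_SW = 1 / 3e9   # switching period at 3GHz, s

td = t_delay(C_CLK, V_DD, I_LMAX, R_ON)
d_eff = effective_duty(0.5, td, T_FALL, T_SW)
print(f"Tdelay = {td * 1e12:.1f} ps")
print(f"Deff   = {d_eff:.3f}, Vout = {d_eff * V_DD:.3f} V (ideal, lossless)")
```

With these assumed inputs, Tdelay exceeds Tfall, so the effective duty cycle comes out slightly above the nominal 50%, which is the behaviour Equation (3.2) predicts.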
A square wave signal with ~30ps rise/fall time, which is about the rise/fall time of an inverter with a fan-out of four, is used as the clock source.

Simulated waveforms for the integrated buck converter are shown in Figure 3.7. The circuit is simulated with a 50% duty cycle and 70mA load current. The inductor current, shown as Lf in Figure 3.7(b), exhibits a triangular shape as expected, with minimum and maximum values of around −50mA and 190mA, respectively. In the first half cycle of the clock, Mp source current provides the energy to charge up Cclk as well as LF. Because of the high current, there is a voltage drop of ~0.1V across Mp, as suggested by the droop of Vclk to ~0.9V in Figure 3.7(a). In this figure, the reference clock circuit output is shown as Vclk-ref. Both clocks have similar edge slopes. In the second half cycle of the clock, inductor current discharges Cclk. As can be seen in Figure 3.7(b), Mn source current is always positive, which means that all the charge in Cclk is delivered to the load instead of the ground.

Simulation results of the buck converter circuit at different duty cycles and output currents are given in Figure 3.8 and Figure 3.9. Pout can also be derived from Figure 3.8(a). The output voltage increases as D is increased and, at the same time, the effective efficiency decreases. For example, at 70mA output current, by varying the duty cycle from 30% to 70%, the output voltage changes from 0.27V to 0.7V. The corresponding effective efficiency ranges from 286% down to 135%. For the reference circuit (the clock driver alone), simulations determined that its power consumption, Pin2, was 41mW.

Figure 3.7. Simulated waveforms for the integrated clock driver/buck converter: (a) voltage waveforms (Vclk, Vclk-ref and Vout vs. time in ns); (b) current waveforms (Lf, Mn and Mp vs. time in ns).

Figure 3.8. Simulated output voltage and input power of the integrated buck converter: (a) output voltage vs. duty ratio; (b) input power Pin1 vs. duty ratio, for Iout = 30, 50, 70 and 100mA.

Figure 3.9. Simulated raw and effective efficiencies of the integrated buck converter: (a) raw efficiency vs. output current; (b) effective efficiency vs. output current, for D = 30% to 70%.

3.1.5 Chip Implementation

As models and simulation results of large passive on-chip components are inaccurate at very high frequencies and current densities, the integrated clock driver/buck converter is fabricated to assess the difficulties of implementing power regulation in deep-submicron technologies. The block diagram and micrograph of the clock driver/buck converter chip are shown in Figure 3.10. The area of the integrated converter including LF and CF is 0.27mm². The inductor alone is 0.1mm². The total die area is 1mm² to allow for probe station testing.

In order to avoid potential hot spots on the chip, especially at high load currents, some layout decisions were made to transfer heat out of the chip as quickly as possible. Higher metal layers such as M6 and M7 are better for transferring heat as they are the thickest. Power and ground grids are connected to high-power transistors through a large number of vias. These vias transfer the heat from the transistors located on the substrate to the surface of the chip and then to the probe pins, which serve as heat sinks.
In order to satisfy the specified maximum current densities and to avoid electromigration, paths that would normally carry high currents are widened. This also helps in reducing resistive voltage drops across the circuit. To satisfy DRC rules for maximum width and density of metal layers in the 90nm CMOS process, wide paths such as those used in the inductor layout are slotted.

Large transistors inject high currents into the substrate through the large drain junction capacitances and by forward-biasing the source-bulk junction diodes. In order to prevent latch-up caused by those high currents, the layout of the circuit incorporates substrate contacts with sufficiently small spacing to minimize the resistance [36].

A few provisions in the chip layout are needed for testing purposes. To match the chip input impedance with the signal generator output impedance, a 50Ω termination resistor is added on chip. The probes available in the lab provide a limited number of connections that can be made simultaneously. Also, since it is very difficult to monitor 3GHz waveforms on the chip without being invasive, these types of measurements were not attempted.

Figure 3.10. Implementation of the integrated clock driver/buck converter: (a) chip block diagram; (b) chip micrograph.

3.1.6 Chip Measurements

The test bench for the integrated clock driver/buck converter was set up as shown in Figure 3.11. For precise power measurement, all the parasitic resistances in the test setup were accounted for through measurement and calibration. As a result, a supply voltage of 1.0V was applied at the chip probe pads. An external signal generator provides the clock signal to the chip under test.

Figure 3.11. Block diagram of the test bench setup

Investigating the Output Voltage

The converter output voltage vs. the output current is plotted in Figure 3.12. In each graph, the duty cycle is kept constant and the switching frequency and load are changed to produce different curves.
As expected, the output voltage does not change much with frequency. However, at 3.5GHz, Figure 3.12(a) suggests that the chip may not be working properly, since the output voltage is significantly higher than the other data points. Figure 3.13 can be derived from Figure 3.12 by keeping the frequency constant while the duty cycle is changed. It shows that at higher duty ratios, the output voltage is higher, as expected.

Figure 3.12. The effect of Fsw on Vout (Vout vs. Iout in mA): (a) with D = 50%, for Fsw = 2, 2.5, 3 and 3.5GHz; (b) with D = 66%, for Fsw = 2 and 2.5GHz.

Figure 3.13. The effect of D on Vout (Vout vs. Iout in mA, for D = 50% and 66%): (a) with Fsw = 2GHz; (b) with Fsw = 2.5GHz.

Investigating the Input Power

The input power to the integrated clock driver/buck converter, Pin1, is plotted in Figure 3.14 along with the input power to the reference clock circuit, Pin2. Figure 3.14 shows that as the frequency is increased, the input power to both circuits increases due to more switching activity. Because of a test anomaly, there is not much difference between data points at 2GHz and 2.5GHz. Similar to the previous conclusion, Figure 3.14(a) suggests that the chip may not be working properly at 3.5GHz, as Pin1 is lower than other data points.

By keeping the frequency constant while the duty cycle is changed, Figure 3.15 can be derived. Higher duty cycles increase Pin1 because the clock duty cycle sets the conversion duty cycle of the power converter. This figure also suggests that the change in duty cycle does not affect Pin2, which is expected since it does not change the switching activity.
However, this conclusion cannot be drawn from the data due to the test anomaly described earlier.

Figure 3.14. The effect of Fsw on Pin1 and Pin2 (power in mW vs. Iout in mA): (a) with D = 50%, for Fsw = 2, 2.5, 3 and 3.5GHz; (b) with D = 66%, for Fsw = 2 and 2.5GHz.

Figure 3.15. The effect of D on Pin1 and Pin2 (power in mW vs. Iout in mA, for D = 50% and 66%): (a) with Fsw = 2GHz; (b) with Fsw = 2.5GHz.

Investigating Raw and Effective Efficiencies

Raw efficiency of the integrated converter is defined by \( \eta = \frac{P_{out}}{P_{in1}} \) and is plotted in Figure 3.16 and Figure 3.17. These figures show that the raw efficiency does not change much at different duty ratios and different frequencies. Again, the chip may not be working properly at 3.5GHz.

The key metric for measuring the performance of the integrated clock driver/power converters is the effective efficiency. Since the overall input power operates two separate functions, the amount of power needed to operate the reference stand-alone clock network is not included as input power to the converter when calculating effective efficiency. Instead, only the incremental amount of power is counted as input power. When some additional energy is recycled from the clock, it is possible for the output power to exceed the incremental input power. Since energy cannot be spontaneously created, an effective efficiency greater than 100% is proof that energy recycling is taking place. Effective efficiency also represents the efficiency required of a stand-alone power converter to compete with an energy recycling architecture.
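The two efficiency definitions above can be sketched in a few lines of Python. The example uses the simulated operating point quoted in Section 3.1.4 (Iout = 70mA, Vout = 0.7V, ηeff = 135%, Pin2 = 41mW); the Pin1 value it prints is back-calculated from those numbers and is not a figure reported in the text. The function names are hypothetical.

```python
def raw_efficiency(p_out, p_in1):
    """Raw efficiency: output power over total input power Pin1."""
    return p_out / p_in1


def effective_efficiency(p_out, p_in1, p_in2):
    """Equation (3.1): output power over the incremental power
    Pin1 - Pin2 needed beyond the stand-alone clock driver."""
    incremental = p_in1 - p_in2
    if incremental <= 0:
        raise ValueError("Pin1 must exceed Pin2 for Equation (3.1) to apply")
    return p_out / incremental


def implied_p_in1(p_out, p_in2, eta_eff):
    """Invert Equation (3.1) to recover the Pin1 implied by a quoted
    effective efficiency (a derived value, not a measurement)."""
    return p_in2 + p_out / eta_eff


# Simulated point from Section 3.1.4: 70mA at 0.7V (D = 70%)
p_out = 0.7 * 0.070   # W
p_in2 = 0.041         # W, reference clock driver
eta_eff = 1.35        # 135%

p_in1 = implied_p_in1(p_out, p_in2, eta_eff)
print(f"Pout        = {p_out * 1e3:.1f} mW")
print(f"implied Pin1 = {p_in1 * 1e3:.1f} mW")
print(f"raw eff      = {raw_efficiency(p_out, p_in1):.1%}")
print(f"effective    = {effective_efficiency(p_out, p_in1, p_in2):.1%}")
```

The raw efficiency stays below 100% as it must, while the effective efficiency exceeds 100% because part of the output power is supplied by charge that the stand-alone clock driver would have discarded.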
Also, the effective efficiency, which is defined by \( \eta_{eff} = \frac{P_{out}}{P_{in1} - P_{in2}} \), is very sensitive to the value of Pin1 − Pin2. This problem is especially pronounced at lower output currents, where Pin1 would be close to Pin2. If there is a slight inaccuracy in the measured values of Pin1 and Pin2, the corresponding effective efficiency value can change dramatically.

As can be seen in Figure 3.18 and Figure 3.19, effective efficiency is increased at lower output currents. Since the available energy in Cclk is constant with respect to output current, at low output currents a greater proportion of the output energy comes from recycling. Moreover, higher Fsw results in more energy being stored in the capacitor per second. Hence, ηeff benefits from increasing the frequency and lowering the output current. Achieving an effective efficiency above 100% is definitive proof that energy is being recovered from the clock.

Figure 3.16. The effect of Fsw on η (raw efficiency in % vs. Iout in mA): (a) with D = 50%, for Fsw = 2, 2.5, 3 and 3.5GHz; (b) with D = 66%, for Fsw = 2 and 2.5GHz.

Figure 3.17. The effect of D on η (raw efficiency in % vs. Iout in mA, for D = 50% and 66%): (a) with Fsw = 2GHz; (b) with Fsw = 2.5GHz.

Figure 3.18. The effect of Fsw on ηeff (effective efficiency in % vs. Iout in mA): (a) with D = 50%, for Fsw = 2, 2.5, 3 and 3.5GHz; (b) with D = 66%, for Fsw = 2 and 2.5GHz.

Figure 3.19. The effect of D on ηeff (effective efficiency in % vs. Iout in mA, for D = 50% and 66%): (a) with Fsw = 2GHz; (b) with Fsw = 2.5GHz.

3.1.7 Summary

The integrated clock driver/power converter designs presented here are capable of recovering energy from the clock and supplying it to the converter. The results show that the use of on-chip passives with power switching by CMOS inverters in ZVS mode allows for good efficiency [32] [33]. By converting unused potential energy into a useful regulated supply, the designer can power other parts of a circuit instead of wasting energy by simply dissipating unwanted charge to ground. Many applications can benefit from this new design technique. Optimization of the designs will require further investigation into the simulation tools, particularly their use in designing on-chip passives.

Table 3.1 provides a summary of performance comparison between this work and two other previously published buck converters. The output voltage ripple given is part of the design specification. Note the high levels of efficiency relative to the other designs.

Table 3.1. Summary of comparison between integrated buck converters

| | Previous work | Previous work | This work |
|---|---|---|---|
| Converter type | 4-Phase Buck [14] | 2-Phase Buck [15] | Buck [32] [33] |
| Technology | 90nm CMOS | 0.18µm SiGe RF BiCMOS | 90nm CMOS |
| Layout area (mm²) | 0.14 * (excludes L) | 27 | 0.27 |
| Switching frequency, Fsw (MHz) | 480 | 45 | 3 000 |
| Inductor, LF (pH) | 3 600 (per phase) | 11 000 (per phase) | 320 |
| Capacitor, CF (pF) | 2 500 | 6 000 | 350 |
| Supply voltage, Vin (V) | 1.8 | 2.8 | 1.0 |
| Output voltage, Vout (V) | 0.9 | 1.5 ~ 2 | 0.53 ~ 0.75 |
| Output current, Iout (mA) | 500 | 200 | 40 ~ 100 |
| Effective efficiency, ηeff (%) | 72 | 65 | 184 (Vout=0.75V), 102 (Vout=0.63V), 74 (Vout=0.53V) |

Output voltage ripple: < 5%.
* Layout area was reported in [13].
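Since Section 3.1.2 describes the buck converter as a low-pass filter that averages a square wave, one quick sanity check on the LF and CF values in Table 3.1 is to compare the LC corner frequency with the switching frequency. The sketch below does this for the "This work" column only; it uses the standard ideal second-order corner formula and ignores load, parasitic resistance and Cclk, so it is a rough illustration rather than a ripple calculation. The function name is hypothetical.

```python
import math


def lc_corner_frequency(l_f, c_f):
    """Corner frequency of an ideal LC low-pass filter, in Hz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_f * c_f))


# "This work" column of Table 3.1
L_F = 320e-12    # H
C_F = 350e-12    # F
F_SW = 3000e6    # Hz

f_c = lc_corner_frequency(L_F, C_F)
print(f"LC corner frequency: {f_c / 1e6:.0f} MHz")
print(f"Fsw / f_c          : {F_SW / f_c:.1f}")
```

The switching frequency sits several times above the filter corner, which is what allows such small on-chip passives to average the clock waveform into a DC output.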
Among the previously published on-chip DC-DC converters, [14] has the highest reported switching frequency, 480MHz, but it still uses on-package inductors. In contrast, [15] implemented a fully on-chip buck converter in 0.18µm SiGe RF BiCMOS technology that was 65% efficient. It also used an area of 27mm² to fit the large passive components. The buck converter in this work achieves a much higher effective efficiency using only 1/100th of the area.

3.2 Low-Swing Buck Converter

3.2.1 Introduction

A high switching frequency is the key design parameter that enables the full integration of active and passive devices of a switching converter. At these high frequencies, the energy dissipated in the power MOSFETs and gate drivers is a significant part of the total losses of a DC-DC converter. Although the integrated clock driver/converter circuit presented in Section 3.1 recycles the energy stored in the main clock capacitor, it does not attempt to save energy used in the "front-end" driver chain. In this section, the energy-conscious techniques of reduce, reuse and recycle are applied to the front-end driver chain.

In this design, two separate chains of inverters are used to drive each of the power transistors in a buck converter circuit. A switching frequency near 1GHz (see footnote 1) results in a reduction in the filter inductor and capacitor area, which allows full integration of these power supplies. To compensate for the switching power loss under high-frequency operation, low-swing drivers and supply stacking techniques are used together with charge recycling of the PMOS drive chain to improve conversion efficiency [37].

Footnote 1: This design was implemented in older 0.18µm CMOS technology for reasons of cost and fabrication schedule. All other implementations in this thesis were designed in newer 90nm CMOS technology.

3.2.2 Circuit Design

The circuit diagram of a CMOS-based buck converter is shown in Figure 3.20(b).
Cx includes all the parasitic capacitances at node Vinv, including the Mp and Mn drain-to-ground capacitances. When both Mp and Mn are off, a positive inductor current will remove charge from Cx, reducing Vinv, while a negative inductor current will charge Cx, increasing Vinv. When Vinv = 0, the Mn transistor is turned on, while when Vinv = VDD, the Mp transistor is turned on. In this way, ZVS operation is achieved for both Mn and Mp transistors by independently driving their gates.

In Figure 3.20(c), the two time periods when both transistors are off are characterized as Tdelay1 and Tdelay2, corresponding to the delay-time needed to implement ZVS operation for the Mn and Mp transistors, respectively. There are four intervals of operation:
• Interval 1 (time 0 to D×Tsw). Mp is on. During this time, the inductor current increases linearly since the voltage across it is constant. At the end of this interval, Mp is turned off in accordance with the required converter output voltage (the duty cycle).
• Interval 2 (time D×Tsw to D×Tsw+Tdelay1). Both Mp and Mn are off. The charge that is stored in the parasitic capacitance Cx is moved to the output circuit through the inductor, as the inductor current cannot be disrupted abruptly. This results in a rapid drop of Vinv. In this short period of time, the inductor current can be assumed to be constant, as shown.
• Interval 3 (time D×Tsw+Tdelay1 to Tsw−Tdelay2) starts when the voltage across Mn is close to zero. At this time, Mn is turned on under ZVS to provide a low-resistance path for the inductor current. As there is no energy supplied to the system and the voltage across the inductor is constant, the inductor current decreases linearly and by design reaches some negative value. At this point in time, Mn is turned off.
• Interval 4 (time Tsw−Tdelay2 to Tsw). Both Mp and Mn are off. Parasitic capacitance Cx is charged, as the inductor current cannot be disrupted abruptly. This results in a rapid increase of Vinv.
At the end of this interval, Vinv is close to VDD and Mp is ready to be turned on under ZVS.

Figure 3.20. Low-swing buck converter: (a) a typical buck converter; (b) simplified circuit diagram of the low-swing buck converter; (c) idealized timing diagram.

3.2.3 Complete Circuit

In this design, two separate inverter chains are used to drive each of the power transistors of the buck converter circuit, as shown in Figure 3.21. The tapered inverter chains are voltage-stacked to use the same VDD supply, similar to [27]. As a result, the inverter chains each have a lower supply voltage, resulting in low-swing operation to save gate and driver power.

Figure 3.21. Circuit diagram of the low-swing buck converter

The size of transistor Mp is set to be three times the size of transistor Mn for symmetrical behavior. The chain to drive Mp is similarly three times larger than the bottom chain, which is optimized to drive Mn. Since the PMOS chain is larger, charge accumulates in the middle capacitor Cm, which should operate near VDD/2. In [27], the excess charge is dissipated to Vss through an additional regulator forcing node Vm to VDD/2. Here, the extra charge is delivered to the converter output circuit to increase efficiency. This task is performed by two series diode-connected NMOS transistors, D1 and D2. These diodes automatically deliver charge to the load when Vinv < (Vm − 2Vt) without a need for additional gating signals. Two diodes in series are needed to act as a voltage regulator for Vm when Mn is on and Vinv is low. The goal is to keep Vm near VDD/2. Hence, accumulated charge at Cm is removed through the diodes by inductor LF instead of an external regulator. The voltage divider R1 and R2 puts Vm near VDD/2 at startup and does not significantly contribute to operational power.

Charge recycling occurs during intervals 2 and 4 when both Mp and Mn are off and Vinv is in transition.
In particular, when Vinv is rising there is significant charge stored on the gate of Mp that is discharged through the upper driver to the Cm node at the same time that current is drawn from this node into Cx. When Vinv is falling, any additional surplus charge from the PMOS drivers can also be delivered to Cx.

In this design, the reduce, reuse and recycle design technique has been employed as follows [38]:
• Reduce: The wide NMOS and PMOS output transistors have large input gate capacitance, requiring them to be driven by a chain of tapered inverters referred to here as the front-end drive chain. Separate drive chains are required to allow precise control of the NMOS and PMOS turn-on and turn-off times to achieve ZVS. Despite ZVS, which reduces energy waste in the final NMOS/PMOS pair, significant losses are associated with operating the two drive chains and the gates of the output transistors at high switching frequencies. To reduce the energy lost at every transition, each drive chain employs low-swing signaling by swinging only half-rail, between 0 and VDD/2 or between VDD/2 and VDD for NMOS and PMOS, respectively. This saves a significant amount of energy compared to full-rail switching. However, the outputs of the low-swing drive chains must turn on their respective NMOS and PMOS output transistors, so it is essential that VDD/2 > Vt-NMOS and VDD/2 > |Vt-PMOS|. To increase overdrive, it is recommended that low-Vt devices be used for the NMOS and PMOS output transistors as well as the rest of the drive chain.
• Reuse: A half-rail swing for both drive chains offers a further advantage: the NMOS and PMOS chains can share the common reference voltage of VDD/2. This allows energy reuse in the form of voltage supply stacking, as shown in Figure 3.21. Charge used by the upper PMOS drive chain still has unused potential, so it can be reused by the lower NMOS drive chain. A more general case of supply stacking is called charge recycling in [28].
• Recycle: The PMOS output transistor Mp in Figure 3.21 is three times wider than NMOS output transistor Mn. As a result, the drive chain of the PMOS (top inverter chain) is much larger and requires much more charge to operate than the drive chain of the NMOS (bottom inverter chain). Charge accumulates at node Vm and is stored in the middle capacitor Cm. The excess charge is recycled by delivering it to the converter output load through the two series diode-connected NMOS transistors, D1 and D2.

In this design, weak negative feedback helps keep Vm near a stable operating point of VDD/2. If Vm increases, the bottom chain receives a higher supply voltage, which increases its power intake and causes Vm to drop. At the same time, Mn turns on with a higher Vgs and Vinv is pulled closer to VSS, giving D1 and D2 higher Vgs, facilitating charge removal from Cm. Similarly, if Vm decreases, the top chain receives a higher supply voltage, which increases its power intake and causes Vm to increase. Also, a lower Vm causes D1 and D2 to receive lower Vgs, facilitating accumulation of charge in Cm.

Capacitance Cm was chosen to be 20 times larger than the NMOS Cgate to limit ripple at Vm. LF and CF values were chosen to be 4.38nH and 1.1nF, respectively, to operate at a switching frequency of 660MHz with a voltage ripple of less than 5% at 50mA load.

3.2.4 Simulation

Three variants of the circuit were simulated: (i) a baseline converter using full-swing drivers; (ii) the same converter with a low-swing/stacked drive chain added to reduce and reuse energy; and (iii) variant (ii) with diodes and Cm added to recycle energy, similar to the prototype. Here, changes to the original baseline converter are made in two stages so that the effect of each modification can be studied. Using low-Vt transistors would have facilitated the operation of the supply-stacked low-swing transistors.
Due to the lack of low-Vt transistors in the available 0.18µm CMOS kit, simulations of these designs are done at 2.2V instead of the typical 1.8V for this technology.

Simulation results for a fixed load current of 50mA are shown in Figure 3.22. As expected, the circuit with all the options has the highest efficiency. Indeed, the efficiencies show improvement with each additional change. For example, at a 40% duty cycle, the efficiencies of the circuits are (i) baseline 22%, (ii) low-swing 30%, and (iii) energy recycling diodes 35%. Thus the efficiency improves from 22% to 35% with the reduce, reuse and recycle methodology. Figure 3.22(a) also shows that while circuits (ii) and (iii) are more efficient than (i), they have lower Vout at the same duty cycle.

Figure 3.22. Simulation results for each variant of the circuit: (a) output voltage vs. duty cycle; (b) efficiency vs. duty cycle. Curves: low-swing with energy recycling (iii), low-swing without energy recycling (ii), baseline full-swing (i).

3.2.5 Chip Implementation

The chip was fabricated in 0.18µm CMOS. Node Vm, the middle voltage that should remain at VDD/2 for supply stacking, is made available off-chip to be externally probed or adjusted if necessary. Input resistors R3 and R4 in Figure 3.21 are 50Ω terminators so Vpmos-in and Vnmos-in can be driven by external signal generators at the high frequency of 660MHz. To keep things simple due to fabrication deadlines, this design does not automatically delay signals to achieve ZVS. Instead, the implementation relies upon the test equipment to generate input signals Vpmos-in and Vnmos-in with the appropriate timing.
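The timing that the test equipment has to reproduce follows directly from the four intervals of Section 3.2.2. The Python sketch below computes the on-windows of Mp and Mn within one switching period; it deliberately reports conduction windows rather than signal levels, because the polarity seen at Vpmos-in and Vnmos-in depends on the number of inverter stages. The 50ps values for Tdelay1 and Tdelay2 are illustrative assumptions, not measured settings, and the names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class GateWindows:
    """On-windows (start, end) in seconds within one switching period."""
    mp_on: Tuple[float, float]
    mn_on: Tuple[float, float]
    t_sw: float


def zvs_gate_windows(f_sw, duty, t_delay1, t_delay2):
    """Map D, Tsw, Tdelay1 and Tdelay2 onto the four intervals:
    1: Mp on              [0, D*Tsw]
    2: both off (Tdelay1)
    3: Mn on              [D*Tsw + Tdelay1, Tsw - Tdelay2]
    4: both off (Tdelay2)
    """
    t_sw = 1.0 / f_sw
    mp_off = duty * t_sw
    mn_start = mp_off + t_delay1
    mn_end = t_sw - t_delay2
    if mn_end <= mn_start:
        raise ValueError("dead times leave no conduction window for Mn")
    return GateWindows(mp_on=(0.0, mp_off), mn_on=(mn_start, mn_end), t_sw=t_sw)


# 660MHz and a 40% duty cycle as in the text; dead times are assumed
w = zvs_gate_windows(f_sw=660e6, duty=0.40, t_delay1=50e-12, t_delay2=50e-12)
print(f"Tsw   = {w.t_sw * 1e12:.0f} ps")
print(f"Mp on = {w.mp_on[0] * 1e12:.0f} .. {w.mp_on[1] * 1e12:.0f} ps")
print(f"Mn on = {w.mn_on[0] * 1e12:.0f} .. {w.mn_on[1] * 1e12:.0f} ps")
```

In practice the two dead times would be tuned on the signal generator until Vinv reaches the opposite rail just as the next transistor turns on.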
Although it is difficult to employ ZVS at a high frequency, it has been successfully implemented in the other designs of this thesis.

The NMOS transistors in the top inverter chain for Mp need to have zero body voltage with respect to their sources, so they are isolated from the p-substrate using n-well and deep n-well implantation as described in [39] and shown in Figure 3.23. The same procedure is used for D1 and D2, where the body should be connected to the drain to reverse bias the intrinsic body diode.

Figure 3.23. Deep n-well implementation cross sectional view

The chip micrograph is shown in Figure 3.24. The chip is laid out for on-chip probing. Here, the inductor LF design is two turns of simple concentric coils implemented in the top four metal layers of the chip. The tracks include shorts along their length to reduce series resistance. The ground shield (PGS) is implemented using the lowest of the six available metal layers. The current density is 0.122mA/µm². The value of inductance was extracted using ASITIC [40]. The inductance extracted was found to be 4.38nH, at 660MHz, with lumped pi model capacitances of 6.5pF and a quality factor of 10 at a resonant frequency around 1GHz. A DC series resistance of 0.7Ω was also extracted. The integrated capacitor CF is implemented using gate capacitance of an array of NMOS transistors. The 3.4mm² total die area uses 2.5mm² for the converter. Even at 660MHz, the inductor, which occupies 1.8mm², dominates the area. Designed for an output current of 50mA at 1V, the power converter achieves a power-to-area ratio of 50/2.5=20mW/mm².

Figure 3.24. Chip micrograph.

There are a few limitations with the implemented prototype. First, Mp and Mn and the drive chains should all be implemented with low-Vt transistors. Using them would help the drivers fully turn on with the low-voltage supply, thereby reducing power consumption in the drive chains and improving power delivery to the output load.
However, these were not available in the CMOS process that was used. Instead, regular transistors were used, resulting in degraded efficiency in both simulation results and the manufactured prototype. Using an ad hoc method of simulating low-Vt transistors, conversion efficiency at a 40% duty cycle is improved to 46% (up from 35%).

Second, power is lost due to the voltage drop across diodes D1 and D2. The diodes were used to keep the proof-of-concept simple, but a more complex circuit could be devised. Nonetheless, it is clear from the simulations that the concept is working and a significant improvement in efficiency is gained by the use of the driver energy recycling. Although there is a drop in Vout after switching to low-swing drivers, Figure 3.22(a) clearly shows that the addition of the energy recycling diodes is able to improve energy conversion to the point where Vout is nearly restored to the same level obtained with the original full-swing drivers. The restoration in the voltage conversion ratio (Figure 3.22(a)) also implies that the rising edge of Vinv is sped up. Speeding it up by means of an increased reverse inductor current would be detrimental to the conversion efficiency because it would discharge CF and increase the losses through a higher ripple current.

Third, the ZVS timing delays were controlled by the signal generator, but a proper circuit needs to be added to control these delays itself. This was not implemented to keep the design simple.

3.2.6 Chip Measurements

Testing of this chip was done at 2.2V, like the simulations. Conversion efficiency and output voltage measurements are presented in Figure 3.25. The physical measurements required the use of an external supply of 1.1V connected to Vm because Vm was higher than the expected voltage of VDD/2. However, measurements show that this supply was not delivering any power to the circuit, as it was always sinking current to reduce Vm.
The output is adjustable between 0.75V and 1V by varying duty cycle D from 45% to 64% with a fixed Rload = 18.3Ω. Conversion efficiency, Pout/Pin, ranges from 25% to 31%. The need to use the external voltage source as a sink indicates that the simulation of the gate driver inverters is not as accurate as required when using standard transistors in the supply-stacked manner.

The efficiency of the prototype could be improved in a few ways. Using low-Vt transistors would help the drivers fully turn on with the low-swing voltage supply, thereby reducing power consumption in the drive chains. Power is also lost due to the voltage drop across diodes D1 and D2. The diodes keep the design simple, but a more complex circuit could be devised. For example, [41] mimics the behavior of a diode using a transistor, where the gate is driven by a voltage comparator sensing VDS. However, gating circuitry used here must operate much more quickly, on the order of tens of picoseconds [41].

Figure 3.25. Measured prototype performance: (a) output voltage vs. duty cycle; (b) raw efficiency vs. duty cycle.

3.2.7 Summary

The low-swing buck converter design presented here demonstrates the operation of a 660MHz converter implemented in a 0.18µm process, including on-chip passives. The measured efficiency obtained is promising for such a prototype and for such a high switching frequency [37]. However, the important result is that energy recycling is shown to be a feasible way to reduce energy loss in the front-end drive chain and to boost overall conversion efficiency. The chip area consumed by the converter is dominated by the inductor even at 660MHz. However, the inductor was designed for a current of 50mA and this represents a power-to-area ratio of 50mW/2.5mm².
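As a quick arithmetic check on the measured operating range and on the power-to-area figure quoted in this section, the load current and output power follow directly from Vout and Rload. The sketch below assumes an ideal, purely resistive load; the function and constant names are illustrative only and are not part of the original design or test flow.

```python
# Minimal sketch: operating-point arithmetic for the measured buck prototype.
# Assumption: the load is an ideal resistor. All names here are hypothetical.

R_LOAD_OHM = 18.3           # fixed load resistance used in the measurements
CONVERTER_AREA_MM2 = 2.5    # converter area reported for the chip
DESIGN_POWER_MW = 50.0      # design point: 50 mA at 1 V


def load_current_ma(v_out: float, r_load: float = R_LOAD_OHM) -> float:
    """Load current in mA for a resistive load (Ohm's law)."""
    return 1e3 * v_out / r_load


def output_power_mw(v_out: float, r_load: float = R_LOAD_OHM) -> float:
    """Power in mW delivered to a resistive load."""
    return 1e3 * v_out ** 2 / r_load


def power_density_mw_per_mm2(p_mw: float = DESIGN_POWER_MW,
                             area_mm2: float = CONVERTER_AREA_MM2) -> float:
    """Power-to-area ratio in mW/mm^2."""
    return p_mw / area_mm2


if __name__ == "__main__":
    # Endpoints of the measured output range (0.75 V and 1 V).
    for v in (0.75, 1.0):
        print(f"Vout = {v:.2f} V: Iout = {load_current_ma(v):.1f} mA, "
              f"Pout = {output_power_mw(v):.1f} mW")
    print(f"Power-to-area ratio: {power_density_mw_per_mm2():.1f} mW/mm^2")
```

With Rload = 18.3Ω, the upper end of the measured range (1V) corresponds to a load current slightly above the 50mA design point, which is why the 50mW design figure is used for the power-to-area ratio.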
By combining the techniques in this chip with clock energy recycling introduced in the integrated clock driver/power converter circuits, it should be possible to boost the raw efficiency above 50%.

4 INTEGRATED BOOST AND BUCK-BOOST CONVERTERS

In this chapter, two more integrated clock driver/power converter designs are discussed that operate at 3GHz. First, a boost converter configuration is used to provide higher output voltage levels than buck converters. Second, a buck-boost converter is used to generate a negative supply voltage, which may be useful for analog circuits. Similar to the previous designs introduced here, high-speed switching losses are reduced by employing zero voltage switching and by directly integrating the clock-tree drivers with the converter power-transistor drivers. Also, the designs are implemented in open-loop, with the goal of having less than 5% ripple on Vout. The techniques proposed in these designs are valid for finer feature size CMOS technologies as well.

4.1 Integrated Clock Driver/Boost Converter

4.1.1 Introduction

Compared to discrete designs, on-chip converters have relatively higher static power losses. Also, clocks always require a minimum low time. As a result, a buck converter won't be able to practically provide an output voltage that is close to VDD. To remedy this, a boost converter configuration is investigated here that provides higher output voltage levels [42].

4.1.2 Circuit Design

In the typical boost converter of Figure 4.1(a), when the switch is on, voltage Vin will be across the inductor LF and current will build up in the inductor. In the next phase, when the switch is off, inductor current finds its way through the diode and charges the output capacitor to Vout = VLf + Vin. The diode plays an important role as it will automatically turn off to prevent shorting Vout to ground when the switch turns on.
One challenge comes from the fact that a low-loss power diode is not available in CMOS technology. (Diodes consisting of a simple p-n junction can be built in CMOS, but the associated voltage drop in modern CMOS is large relative to Vin, Vout and VDD.) The integrated clock driver/boost converter circuit shown in Figure 4.1(b) uses a switched-capacitor voltage-shifter circuit to generate a shifted gating signal for the PMOS transistor used in place of a power diode. Similar to Figure 3.2(b), a chain of inverters is used to drive Cclk and ZVS needs to be employed to recover the energy stored in the capacitor.

In addition to providing output voltage levels higher than VDD, the circuit also produces a buffered version of the clock, Vclk_scaled, at the same magnitude as Vout. This clock signal can be used in the circuitry powered by the converter, but allowances for clock skew and level-conversion will need to be made in the data path logic.

Ignoring turn on/off times of the transistors, there are two intervals of operation as shown in Figure 4.1(c):

• Interval 1: At the beginning of this interval, Vclk goes high and Mn turns on. Consequently, voltage VDD will be across the inductor LF and the inductor current increases linearly (assuming a constant voltage across the inductor). At the same time, the voltage of capacitor Cshift will be added to Vclk so that Vshift reaches voltage Vmax, a higher voltage than Vout. The diodes Dshift are reverse biased. As Cshift is pre-charged to Vout – 2Vdiode_drop in the previous interval 2, Vgs of Mp would be equal to: Vmax – Vout = (VDD + (Vout – 2Vdiode_drop)) – Vout = VDD – 2Vdiode_drop which has a positive value and Mp turns off completely.

• Interval 2: As a new Vclk half-cycle starts, Mp turns on and Mn turns off. Capacitor Cshift will be charged through diodes Dshift to a value of Vout – 2Vdiode_drop.
As the diodes are forward biased, Vgs of Mp becomes equal to –2Vdiode_drop, a negative value whose magnitude exceeds the threshold voltage of Mp, turning it on completely. At this time, inductor current finds its way through Mp and will charge up the output capacitor CF.

Figure 4.1. Integrated clock driver/boost converter: (a) a typical boost converter; (b) simplified circuit diagram of the integrated clock driver/boost converter; (c) idealized timing diagram.

In the above discussion, the average voltage of Vclk is D×VDD, where D is the duty cycle. This is the operating voltage available to the boost converter (and not VDD). Ideally, the output voltage would be

$$V_{out} = \frac{1}{1-D} \times V_{clk} = \frac{D}{1-D} \times V_{DD}.$$

With D > 50%, the output voltage will be higher than VDD. Voltages higher than VDD could be used for 1) high-voltage I/O circuits, 2) gating signals of NMOS pass transistors such as those used in sampling circuits, 3) providing PMOS transistors with body bias voltages higher than VDD, which are used to dynamically change the threshold voltage to achieve speed and power scaling, and 4) speeding up the operation of some parts of the circuit by increasing VDD.

4.1.3 Complete Circuit

The complete circuit diagram of the integrated clock driver/boost converter circuit is shown in Figure 4.2. Mn2 and Mp2 introduce the turn-on delay for Mn1 as in the integrated clock driver/buck converter from Chapter 3. The drain node of Mn3 and Mp3, denoted as Vclk_scaled, swings from zero to Vout. Here, the value of the scaled clock capacitor is selected to be 2.2pF. Some of the recovered energy is subsequently lost when this capacitor is discharged, so it should be kept small. To keep the output ripple on Vout < 5%, a large capacitance CF is needed for bulk energy storage.

In Figure 4.2, the gating signal for Mn3 changes from VDD to zero.
However, as the source of Mp3 is connected to Vout, the appropriate gating signal for Mp3 should instead change from Vout to Vout – VDD; therefore a voltage shift greater than or equal to Vout – VDD is needed. The combination of diodes Dshift, capacitor Cshift, and transistors Mn1 and Mp1 acts as a switched-capacitor voltage shifter. In interval 2, the top plate of Cshift is connected to Vout through Dshift diodes and the bottom plate is connected to the ground through Mn1. The top plate of Cshift is connected to the gate of Mp3 and turns it on due to a gating voltage of Vout – 2Vdiode_drop. In interval 1, the bottom plate is switched to VDD through Mp1. The capacitor Cshift retains its charge since the diodes Dshift are reverse biased, so the top plate of Cshift jumps up by VDD to Vout – 2Vdiode_drop + VDD. Since 2Vdiode_drop is smaller than VDD, this gate voltage is still above Vout, so transistor Mp3 receives an acceptable gating signal and turns off.

Figure 4.2. Circuit diagram of the integrated clock driver/boost converter. Component values labeled in the schematic include Cshift = 21pF, Cclk = 25pF, Cclk_scaled = 2.2pF, CF = 378pF, LF = 310pH and a 1kΩ resistor.

Except for Dshift and Cshift, all transistor body terminals are connected to their source pins. The body terminals of Dshift and Cshift are connected to ground instead. This prevents forward biasing of the body-drain intrinsic diode, in case the drain voltage goes lower than the source voltage. It also makes the layout implementation easier, since no deep n-well structure is required. Finally, a 1kΩ resistor is added in parallel to Cshift to bias the Dshift diodes and provide a DC current path to avoid floating nodes when Dshift is off.
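The idealized relations used in this section (the boost output voltage derived from the average clock voltage, and the gate-to-source levels produced by the switched-capacitor voltage shifter) can be checked numerically. The following is a minimal sketch rather than a model of the fabricated circuit: it ignores all losses and switching times, and the supply voltage and diode drop used in the example are placeholder assumptions, not values taken from the design.

```python
# Minimal sketch of the idealized boost and voltage-shifter relations.
# Assumptions: lossless converter, ideal switches, fixed diode drop.
# The numeric values in the demo (V_DD, V_DIODE) are illustrative only.


def ideal_boost_vout(duty: float, v_dd: float) -> float:
    """Ideal output voltage.

    The converter operates from mean(Vclk) = D * VDD and boosts it
    by 1 / (1 - D), giving Vout = D / (1 - D) * VDD.
    """
    if not 0.0 < duty < 1.0:
        raise ValueError("duty cycle must be strictly between 0 and 1")
    v_clk_mean = duty * v_dd
    return v_clk_mean / (1.0 - duty)


def shifter_gate_levels(v_out: float, v_dd: float, v_diode: float):
    """Return (Vgs_on, Vgs_off) of the PMOS whose source sits at Vout.

    On  (interval 2): gate = Vout - 2*Vdiode, so Vgs = -2*Vdiode.
    Off (interval 1): gate is lifted by VDD, so Vgs = VDD - 2*Vdiode.
    """
    v_gate_on = v_out - 2.0 * v_diode
    v_gate_off = v_gate_on + v_dd
    return v_gate_on - v_out, v_gate_off - v_out


if __name__ == "__main__":
    V_DD = 1.0      # assumed supply voltage (placeholder)
    V_DIODE = 0.3   # assumed diode drop (placeholder)
    for duty in (0.4, 0.5, 0.6, 0.7, 0.8):
        v_out = ideal_boost_vout(duty, V_DD)
        vgs_on, vgs_off = shifter_gate_levels(v_out, V_DD, V_DIODE)
        print(f"D = {duty:.0%}: ideal Vout = {v_out:.2f} V, "
              f"Vgs(on) = {vgs_on:+.2f} V, Vgs(off) = {vgs_off:+.2f} V")
```

In this idealized picture the gate-to-source levels do not depend on Vout, only on VDD and the diode drop, which is why the shifter tracks the output as the duty cycle changes. The ideal output voltage is an upper bound; a real converter with resistive and switching losses would fall below it at any nonzero load.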
4.1.4 Simulation

Figure 4.3 shows the output voltage and the effective efficiency of the boost converter at different duty cycles and output currents. As D is increased, the output voltage increases and the effective efficiency decreases. As the duty cycle is varied, the peak effective efficiency shifts to a different output current level. A maximum effective efficiency of 111% is achieved at D = 40% with Iout = 30mA. At Iout = 50mA, varying D from 40% to 80% changes Vout from 0.75V to 1.73V. The corresponding effective efficiency ranges from 98% down to 24%. For the reference circuit consisting of a clock driver only, simulations determined that its power consumption, Pin2, was 100mW. Compared to the integrated clock driver/buck converter circuit, Pin2 is higher here because a larger Cclk has been selected (25pF vs. 12pF). Also, all the transistors are low-Vt type to facilitate operation at lower VDD levels.

Figure 4.3. Simulation results of the integrated clock driver/boost converter: (a) output voltage vs. duty cycle for Iout = 10, 30, 50, 70 and 100mA; (b) effective efficiency vs. output current for D = 40%, 50%, 60%, 70% and 80%.

4.1.5 Chip Implementation

The micrograph of the clock driver/boost converter chip is shown in Figure 4.4. The area of the integrated clock driver/boost converter including LF is 0.26mm² and the area of the reference clock driver is 0.03mm². The inductor alone is 0.1mm². The total die area of 2mm² is shared with two other designs in this work (the integrated clock driver/buck-boost converter later in this chapter and the low-power clock driver circuit in Chapter 5).

Figure 4.4. Chip micrograph of the integrated clock driver/boost converter

4.1.6 Chip Measurements

Unfortunately, this circuit was not functional due to a number of suspected problems.
Higher peak-to-peak current levels compared to the buck design might have been a reason. Since the inductor current in this design is much higher, the resistive voltage drop across the inductor and current paths may be significant. Although it used wider/thicker paths than the buck design, using even more metal is suggested for future layouts.

Also, this circuit shares the die with two other designs, a buck-boost converter and a low-power clock driver. Some elements of the circuit, such as the voltage shifter circuit, were also used in the buck-boost design, which also didn’t work. This leads to the conclusion that the present voltage shifter design might be very sensitive to fabrication variation, and/or that necessary layout masks have not been used, specifically for the 1kΩ resistor. If the voltage shifter circuit is faulty, there won’t be enough gate voltage Vshift to turn off the Mp3 transistor, thus it stays on and drains the output capacitor CF. This agrees with chip measurements, which show an output voltage of a few hundred millivolts, indicating that the output may be shorted within the chip. Inspection of the voltage shifter circuit is suggested for future layouts.

4.1.7 Summary

The idea of energy recovery from a high-speed clock load in high-speed digital circuits was investigated by exploring the integration of the boost converter topology with a high-speed clock driver [42]. While simulation shows promising results of effective efficiency above 100%, chip measurement results are unable to confirm this due to non-functional fabricated chips.

4.2 Integrated Clock Driver/Buck-Boost Converter

4.2.1 Introduction

Another basic switching converter investigated here is a buck-boost converter, which has a negative output voltage with respect to the common terminal of the input voltage [42].
4.2.2 Circuit Design

In the typical buck-boost converter of Figure 4.5(a), when the switch is on, voltage Vin will be across the inductor LF and current will build up in the inductor. In the next phase, when the switch is off, inductor current finds its way through the diode and charges the output capacitor to Vout = VLf, which has a negative value. Here, the diode prevents shorting Vout to VDD when the switch is on.

The integrated clock driver/buck-boost converter circuit shown in Figure 4.5(b) uses a switched-capacitor voltage shifter circuit to generate a shifted gating signal for the NMOS transistor used in place of the power diode. Similar to Figure 3.2(b), a chain of inverters is used to drive the converter and ZVS needs to be employed. An extra switch Sclk is also added between nodes Vclk and Vinv. This switch prevents Vclk from becoming negative as Vinv goes below zero when Mn is on.

Ignoring turn on/off times of the transistors, there are two intervals of operation as shown in Figure 4.5(c):

• Interval 1: At the beginning of this interval, V̄clk goes to zero and Mp turns on. Switch Sclk is closed and Cclk is charged up. Consequently, voltage VDD will be across the inductor LF and current in the inductor increases linearly, assuming a constant voltage across the inductor. At the same time, the voltage of the capacitor Cshift from the previous interval 2 will be added to V̄clk and Vshift reaches a lower value than Vout as diodes Dshift are reverse biased. Since Cshift is pre-charged to VDD – (Vout + 3Vdiode_drop) in the previous interval 2, the Vgs of Mn would be equal to: Vshift – Vout = (– VDD + (Vout + 3Vdiode_drop)) – Vout = – VDD + 3Vdiode_drop which has a negative value and Mn turns off completely.

• Interval 2: As V̄clk is high, Mp is off and Mn is on. At the same time, capacitor Cshift will be charged through diodes Dshift to a value of VDD – (Vout + 3Vdiode_drop).
Since the diodes are forward biased, the Vgs of Mn is equal to 3Vdiode_drop, which is a positive value larger than the threshold voltage of Mn, thus ensuring it turns on completely. At this time, inductor current finds its way through Mn and will charge up the output capacitor CF to a negative voltage value. The switch Sclk is closed at the beginning of interval 2 to allow the inductor to discharge Cclk. However, when Vinv starts to go negative, the switch is opened to keep Vclk at zero.

Figure 4.5. Integrated clock driver/buck-boost converter: (a) a typical buck-boost converter; (b) simplified circuit diagram of the integrated clock driver/buck-boost converter; (c) idealized timing diagram.

In the above discussion, the available input voltage to the converter is

$$V_{in} = \mathrm{mean}(V_{clk}) = D \times V_{DD}.$$

Hence, the ideal output voltage is calculated by

$$V_{out} = -\frac{D}{1-D} \times V_{in} = -\frac{D^{2}}{1-D} \times V_{DD},$$

which is negative. A negative output voltage could be used for 1) gating signals of PMOS pass transistors such as those used in sampling circuits, 2) providing NMOS transistors with negative body bias voltages, which are used to dynamically change the threshold voltage to achieve speed and power scaling, and 3) a negative supply voltage for analog circuits.

4.2.3 Complete Circuit

A complete implementation of the integrated clock driver/buck-boost converter is shown in Figure 4.6. Many of the changes are similar in nature to those used to implement the boost circuit, e.g., the addition of Mp3 and Mn3 to delay the energy-wasting discharge of Cclk.

In Figure 4.6, the gating signal for Mp1 changes from zero to VDD. However, as the source of Mn1 is connected to Vout, the appropriate gating signal for Mn1 should instead change from Vout to Vout + VDD; therefore a voltage shift equal to Vout is needed. The combination of diodes Dshift, capacitor Cshift, and transistors Mn3 and Mp3 acts as a switched-capacitor voltage shifter.
The bottom plate of Cshift is connected to Vout through Dshift diodes and the top plate is connected to V̄clk through Mp3, which is connected to VDD in interval 2. In interval 1, Dshift diodes are reverse biased and the top plate is switched to ground through Mn3. As the capacitor Cshift retains its charge, the bottom plate of Cshift jumps down by VDD. The switched capacitor voltage is –VDD + (Vout + 3Vdiode_drop) instead of Vout. However, since 3Vdiode_drop is smaller than VDD, transistor Mn1 still receives an acceptable gating signal to turn off.

There are three implementation decisions in Figure 4.6 that warrant further discussion. First, transistors Mp2 and Mn2 are added to protect Mp1 and Mn1 from potentially large voltage drops across them, since Vinv switches between VDD and Vout. Connecting the gates of transistors Mp2 and Mn2 to ground will provide for automatic on-off timing and proper operation of the circuit.

Second, transistor Mp4 acts as the switch to prevent Vclk from going negative. The gate of Mp4 is connected to Vbias, which is set at the threshold voltage of PMOS transistor Mp5. When Vinv is positive, Mp4 is on and provides the path for the inductor current to discharge Cclk. When Vinv falls below zero, Mp4 turns off and nodes Vinv and Vclk are disengaged. Meanwhile, Mn2 turns on and provides a path for the inductor current. In this design, Vbias is generated by a small DC current passing through the diode-connected PMOS transistor Mp5. To stabilize the voltage, capacitor Cbias is added to the node Vbias.

Figure 4.6. Circuit diagram of the integrated clock driver/buck-boost converter

Third, the body terminals of all NMOS transistors need to be connected to their source node or the most negative voltage in the system to prevent forward biasing of body-source intrinsic diodes. For transistors Mn1 and Mn2, the body is connected to the (non-ground) source node, so these transistors need to be isolated inside a deep n-well structure for layout.
The body terminals of all other transistors are also connected to their sources. For layout implementation of CF and Cbias, PMOS transistors are used. If NMOS transistors were used, since Vout and Vbias are both negative, the gate and source nodes would have to be connected to ground and a negative voltage, respectively, to have a positive Vgs. Therefore, a deep n-well would be needed in order to connect each body to its source.

4.2.4 Simulation

Figure 4.7 shows the output voltage and effective efficiency of the integrated buck-boost converter circuit. Here, a maximum effective efficiency of 66% is achieved at D = 20% with Iout = 50mA. In this case, at 50mA output current, the output voltage changes from −0.5V to −1.43V when varying the duty cycle from 20% to 60%. The corresponding effective efficiency ranges from 66% down to 35%. Simulated Pin2 was 100mW. The lower efficiency compared to the previous circuits is a result of more transistors in the main current path. Also, all the transistors are low-Vt type to facilitate operation at lower VDD levels.

In these circuits, the effective efficiency can only exceed 100% when clock energy is being recycled, since it is not counted as the input power by the effective efficiency metric. In this buck-boost design, the effective efficiency does not exceed 100%, so it does not offer any proof that clock energy is being recycled. However, looking at Figure 4.6 reveals that during recycling time, there is no path from Cclk to ground except through LF. This means charge in Cclk is being recycled to current in LF while Mn2 is off. During this recycling time, it should be noted that Mn2 is off because Vinv ≥ −Vt_nmos, so it is not conducting.

Figure 4.7. Simulation results of the integrated clock driver/buck-boost converter: (a) output voltage vs. duty cycle for Iout = 10, 30, 50, 70 and 90mA; (b) effective efficiency vs. output current for D = 20%, 30%, 40%, 50%, 60% and 70%.

4.2.5 Chip Implementation

The micrograph of the clock driver/buck-boost converter chip is shown in Figure 4.8. The area of the integrated clock/converter including LF is 0.2mm². The inductor alone is 0.1mm². Although it is more complex, it is smaller than the boost design because more effort was put into its layout design. This design shares the same die as the integrated clock driver/boost converter presented earlier in this chapter and the low-power clock driver design presented in Chapter 5.

Figure 4.8. Chip micrograph of the integrated clock driver/buck-boost converter

4.2.6 Chip Measurements

Unfortunately, this circuit was not functional due to a number of suspected problems similar in nature to the boost design presented earlier. The chip can provide a negative output voltage of a few hundred millivolts, leading to similar conclusions as the boost design.

4.2.7 Summary

The idea of energy recovery from a high-speed clock load in high-speed digital circuits was investigated by exploring the integration of the buck-boost converter topology with a high-speed clock driver [42]. While the simulation results are promising, test results are not available due to non-functional fabricated chips.

4.3 Conclusions

The two designs presented in this chapter work in simulation, but there appear to be related layout issues that prevent the fabricated prototype from operating correctly. This highlights some of the difficulty of designing these new types of circuits. It is essential to fabricate prototypes and test them due to difficulties with modeling and simulating these high power circuits with magnetic fields, heat, and other practical issues.
Although the two designs presented in this chapter did not result in a functional prototype, they inspired the design of a third circuit which is presented in the next chapter. It borrows from the integrated clock driver/boost converter to produce a low-power clock driver. This third prototype circuit did operate correctly and results in 35% lower power in the clock drivers.

5 LOW-POWER CLOCK DRIVER

5.1 Introduction

In this chapter, a low-power clock driver is designed to return the energy stored in the clock capacitance back to the power grid [43]. This way, instead of producing a secondary regulated output voltage like all of the other circuits in this thesis, the energy needed to operate the clock driver itself is effectively reduced. The circuit configuration of this low-power clock driver resembles a boost converter or full-bridge DC-DC converter [12].

5.2 Circuit Design

A simplified schematic of the proposed low-power clock driver circuit is shown in Figure 5.1(b). This circuit incorporates an inductor at the clock node, but unlike resonant clocking schemes, the inductor appears in the driver side not the load side. Cclk and Cint are the sum of wiring and transistor capacitances that are connected to nodes Vclk and Vint, respectively. Assuming a fan-out of four as the inverter taper factor, Cint is one-fourth of Cclk. In the discharging phase of Cclk, the energy stored in the capacitor is transferred to the inductor instead of being discharged to ground. Some of this inductor energy is returned to the power grid through Mp2, effectively reducing power consumption of the clock driver.

Figure 5.1. Low-power clock driver: (a) a typical full-bridge converter; (b) simplified circuit diagram of the low-power clock driver; (c) idealized timing diagram.

The circuit in Figure 5.1(b) resembles a full-bridge DC-DC converter in which Mp1, Mn1, Mp2 and Mn2 are the bridge switches, and Cint, Cclk and LF are the bridge load.
CF represents the intrinsic power-grid capacitance and the on-chip decoupling capacitances commonly added to digital designs.

The input to the generic full-bridge converter shown in Figure 5.1(a) is a fixed DC voltage but the DC magnitude and polarity of the bridge load voltage (Vclk – Vint) can be adjusted by pulse-width modulating the gating signals. Switches (Mp2, Mn1) and (Mn2, Mp1) are treated as two pairs. Because of the inductive load, depending on the direction of the load voltage and current, the load may consume or return power. The load current does not become discontinuous but the input current to the bridge can change its direction, so it is important that the source has low internal impedance. A bigger CF would better facilitate this requirement.

If the bridge stays in a particular state long enough, the energy stored in the inductor would be large enough to be used for charging/discharging the load capacitors. In practice, non-ideality of Mn2 and Mn1 results in their slow turn-on, providing the time needed for the inductor current to discharge Cint and Cclk. Similarly, non-ideality of Mp2 and Mp1 gives the inductor time to charge those capacitors.

In the simplified design of Figure 5.1(b), the CMOS inverter propagation delay (from Vint to Vclk) helps provide more time for the inductor to charge/discharge capacitor Cclk. This is observed, for example, after Mp2 turns on and raises Vint with the assistance of the inductor before Vclk falls due to the turn-on of Mn1. The complete circuit, which will be discussed in detail later, utilizes zero-voltage switching (ZVS) to provide an even longer delay that is dynamically adjusted.

Operation of the circuit in Figure 5.1(b) can be explained using the idealized timing diagram shown in Figure 5.1(c). There are eight intervals:

• Interval 1: Mp1 and Mn2 are on. Cclk is already charged up and Vclk is high. Inductor current is positive and is increasing linearly.
• Interval 2: Mn2 is turned off and Mp2 is turned on. Vint increases.

• Interval 3: Mn1 is turned on and Mp1 is turned off. Vclk decreases. For a short time the inductor current continues to rise. When Mp1 is off, the inductor takes energy from Cclk rather than VDD and helps Vclk to fall rapidly. The inductor will first transfer energy to Cint, helping Mp2 to increase Vint quickly, and then transfer energy to the on-chip power grid through Mp2. Inductor current peaks when Vint = Vclk, i.e., when the voltage across LF is zero. The inductor current starts to decrease. Vclk and Vint reach low and high values, respectively.

• Interval 4: Mp2 and Mn1 are on. Cclk is already discharged and Vclk is low. Inductor current is positive and is decreasing linearly.

• Intervals 1′–4′: With the direction of the inductor current reversed, intervals 1–4 repeat in the opposite sense to help charge capacitor Cclk from the stored energy in Cint and LF. When Cint is discharged, Mn2 keeps Vint at zero, providing the current path for LF to charge up Cclk.

In the above discussion, whenever the absolute value of the inductor current is decreasing, the energy stored in the inductor is being delivered to another element of the circuit. Here, the destination of the charge can be CF, Cclk, or Cint. Energy recycling occurs when Cclk charge is returned to the power grid via the inductor during interval 3. The inductor also reduces the amount of energy consumed by helping to precharge Cclk from the energy stored in itself and Cint during interval 3′. However, as Cint is smaller than Cclk, there is no opportunity to return energy to the power grid in this interval. Additional energy recycling occurs when LF magnetic energy is returned to the power grid during intervals 4 and 4′.

5.3 Complete Circuit

Ideally, all of the energy stored in Cclk should be recovered (by moving it to Cint and/or CF) rather than being wasted by discharging Cclk into the ground.
Thus, to maximize the energy savings, the turn-on of Mn1 should be delayed. This is shown in Figure 5.2 with the addition of transistors Mn3 and Mp3. Furthermore, Mn3 and Mp3 also delay the turn-on of Mp1, allowing Cclk to be precharged by the inductor. This achieves zero-voltage switching in the final drive stage and reduces switching power loss.

The main benefit of implementing ZVS for Mn1 is that Cclk is no longer shorted to ground. During the ZVS dead-time, the charge is removed (recovered) by the inductor current and consequently Vclk is reduced to zero. After this, Mn1 is turned on to provide a low-loss path for current and also to keep Vclk around zero. If Mn1 were not turned on, the inductor current would turn on the intrinsic body-drain diode of Mn1. The resultant voltage drop across this diode, –Vdiode_drop, would contribute to the overall power consumption of the system. In the charging phase of Cclk, ZVS for Mp1 causes Cclk to be charged mainly through the inductor LF.

Figure 5.2. Circuit diagram of the low-power clock driver and the reference clock

5.4 Simulation

The circuit of Figure 5.2, consisting of an inductor and two ZVS transistors, returns part of the Cclk energy to the power grid; thus, the power consumption of the clock driver is reduced in a non-resonant fashion. In comparison, clock-resonance schemes such as [10] and [11] reduce energy by resonating Cclk with an inductor, resulting in nearly sinusoidal clock waveforms.

Simulation results of the implemented low-power clock driver operating at 4 GHz are shown in Figure 5.3. As shown in the figure, the proposed technique preserves the sharp edges of the clock in the presence of the inductor. Compared to the reference clock driver implemented in the same process, the slope of the rising clock edge in the new circuit is similar, although the falling slope is slightly slower because ZVS transistors Mn3 and Mp3 are in the path of charging the Vintn node.
Thus, Mn1 turns on slightly slower and hence Vclk has a slower falling edge.

Figure 5.3. Simulated clock waveforms of Figure 5.1(b) and Figure 5.2 (axes: voltage in V versus time in ns; traces: complete, simplified, and reference clock drivers)

To investigate the effect of the ZVS transistors on circuit operation, the Mp2 and Mn1 drain currents are plotted in Figure 5.4 and Figure 5.5, respectively. A positive Mp2 drain current means that Cclk charge is being returned to VDD, and a positive Mn1 drain current means that Cclk is being discharged to ground. Figure 5.4 shows that there are periods of time in which the Mp2 drain current, in both the simplified and complete circuit versions, has a positive “area under the curve”, with the complete circuit having the bigger area. Similarly, Figure 5.5 shows that the Mn1 drain current in both the simplified and complete versions has a smaller “area under the curve” than in the reference circuit, with the complete circuit having the smallest area.

Simulations show that the “areas under the curves” in Figure 5.4 are 1.3, −0.7 and −7.9 pA·s for the complete, simplified and reference circuits, respectively. Similarly, the “areas under the curves” in Figure 5.5 are 16.2, 19.3 and 24.0 pA·s for those circuits. These results, which are for the PMOS and NMOS transistors in the Cclk discharge path, help in comparing the three variants of the circuit. The complete circuit has the biggest Mp2 area, confirming the most recycling to VDD, and has the smallest Mn1 area, confirming the least dissipation of Cclk charge to ground.

The low-power clock driver of Figure 5.2 was also simulated at different switching frequencies along with its simplified version from Figure 5.1(b) and the reference clock driver. The simulation results in Figure 5.8 show a trend of improved power savings as the clock frequency is increased.
The simplified circuit does not perform as well as the complete circuit since the ZVS transistors Mn3 and Mp3 in Figure 5.2 assist in energy return to the power grid. Also, simulation results at 4 GHz show a percentage power saving equal to (Pin2−Pin1)/Pin2 = 37%. Here, Pin1 = 86mW and Pin2 = 136mW are the power consumption of the complete and the reference circuits, respectively.

Figure 5.4. Simulated Mp2 drain current waveforms of Figure 5.1(b) and Figure 5.2 (axes: Mp2 drain current in mA versus time in ns; traces: complete, simplified, and reference clock drivers)

Figure 5.5. Simulated Mn1 drain current waveforms of Figure 5.1(b) and Figure 5.2 (axes: Mn1 drain current in mA versus time in ns; traces: complete, simplified, and reference clock drivers)

To evaluate the effect of inductor value on power consumption, the complete circuit is simulated with different inductor values by varying a factor K such that LF = K×310pH. Figure 5.6 shows the results and suggests that a different optimum inductor value is needed for each frequency range. For example, at K = 1, minimum power consumption is achieved over the clock frequency range of 3 to 4GHz. This value of inductance corresponds to the fabricated prototype.

Figure 5.6. Effect of changing inductor value on power savings in Figure 5.2 (axes: power in mW versus Fsw in GHz; traces: reference with no inductor, K = 0.5, K = 0.7, K = 1.0 (fabricated), K = 1.4, K = 2.0)

5.5 Chip Implementation

As a proof of concept, the two circuits in Figure 5.2 have been fabricated in a 1P7M2T 90nm CMOS process using low-Vt transistors to facilitate operation at lower VDD levels. The 310pH inductor is made with a single loop using the four top metal and one extra aluminum (ALUCAP) layers in parallel. The inductor was modeled using ASITIC.
A Patterned Ground Shield (PGS) was also placed between the inductor coil and the substrate.

In the chip, the total capacitance connected to node Vclk (shown as Cclk in Figure 5.2) is 25pF. The load gate capacitance connected to node Vclk, which presents a fanout-of-4 load to the clock driver, is 21pF; it is implemented using the gate capacitance of a 2016/0.75µm NMOS transistor array. All transistor bodies are connected to their sources, except for Mn3, whose body is connected to ground. This prevents forward biasing of the body-drain intrinsic diode and avoids the need for a deep n-well structure.

The chip micrograph is shown in Figure 5.7. The inductor area is 0.1mm². The low-power clock driver (including the inductor) and the reference circuit occupy 0.15mm² and 0.03mm², respectively.

Figure 5.7. Chip micrograph

5.6 Chip Measurements

Chip measurement results in Figure 5.8 show energy savings for a clock frequency range of 2.75 to 4GHz. The measurements show increasing power savings as the clock frequency increases to 4GHz. At lower frequencies, the inductor current has more time to build up, which results in an increased resistive voltage drop across the inductor. Thus the energy savings are reduced. To improve this, a larger inductance is needed, as shown in Figure 5.6.

The simulation results show very good agreement with the measured results below 3.5GHz, but begin to deviate at higher frequencies. Measurements above 4GHz were not possible due to the limits of our test equipment. At 4GHz, measurements confirm that the power consumption is reduced from 117mW (in the reference circuit) to 76mW (in the complete circuit), a net power savings of 35%.

Figure 5.8. Test and simulation results (axes: power in mW versus Fsw in GHz; traces: reference clock driver, simulated and measured; simplified low-power clock driver, simulated; low-power clock driver, simulated and measured)

The clock waveforms are made available off-chip using open-drain PMOS buffers. At 4GHz, the RMS clock jitter is measured to be 1.25ps and 1.17ps for the complete low-power clock driver and the reference clock driver, respectively. Thus, the jitter added by the inductor and ZVS transistors is negligible.

5.7 Summary

The design introduced here benefits from the charge stored in the clock load capacitance. Thus, the exact location for including the proposed clock driver circuits depends on the configuration of the clock distribution network, as was discussed in Section 2.5.

In many situations, it is desirable to “stop the clock” to save power by gating the incoming clock signal. With a stopped clock, the inductor would be continuously conducting and would dissipate significant static power. To solve this problem, power gating with a header transistor can disconnect the power supply from the driver, which also reduces standby leakage [44]. This introduces a new concern: the LC components can oscillate and introduce additional unwanted clock transitions until the stored energy in the system is dissipated. To address this issue, an extra NMOS transistor can be added in parallel to Cclk to provide a discharge path for the clock, keeping Vclk at zero and immediately shorting any unwanted oscillations. This shorting transistor can share the same gating signal as the header power-gating transistor.

One of the strengths of the circuit presented in this chapter compared to earlier chapters is its simplicity. It requires relatively few components, and it does not require changing the operation of the clock by duty cycle modulation or low-swing distribution. Hence, the application of energy recycling concepts and on-chip DC-DC converter technology resulted in significant power savings in a very important circuit.
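As a quick cross-check of the figures quoted in Sections 5.4 and 5.6, the following sketch recomputes the percentage savings from the stated power values, along with the reduction in charge discharged to ground through Mn1 from the stated areas under the curves. The function and variable names are illustrative and are not part of the original work; only the numeric inputs come from the text.

```python
def percent_saving(reference, improved):
    """Return the relative reduction of `improved` versus `reference`, in percent."""
    return 100.0 * (reference - improved) / reference


# Power consumption at 4 GHz in mW, as quoted in the text.
sim_saving = percent_saving(reference=136.0, improved=86.0)    # simulation
meas_saving = percent_saving(reference=117.0, improved=76.0)   # measurement

# Charge discharged to ground through Mn1 (areas under the Figure 5.5 curves),
# reference versus complete circuit, in pA.s as quoted in the text.
mn1_charge_reduction = percent_saving(reference=24.0, improved=16.2)

print(f"simulated power saving: {sim_saving:.1f}%")
print(f"measured power saving:  {meas_saving:.1f}%")
print(f"Mn1 charge-to-ground reduction: {mn1_charge_reduction:.1f}%")
```

Rounded to whole percentages, the first two values reproduce the 37% and 35% savings reported above.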
6 CONCLUSIONS

As an energy saving strategy, recycling energy stored in the clock that would otherwise be discharged to ground has been the subject of this dissertation. Two methods of reusing this energy have been investigated: 1) using the energy to provide an extra supply voltage for other circuits and 2) transferring the energy back to the power grid to reduce the power consumption of the circuit.

Power losses in a system can be divided into two categories: resistive and dynamic. Reducing resistive power loss by optimizing the current paths has always been a goal for circuit designers, while reducing dynamic losses has been achieved by minimizing the gate capacitance and reducing the supply voltage and/or switching frequency.

The voltage across an open switch is stored as electric energy in the stray capacitance of the switch. It has been known that an inductor is needed to successfully remove this energy in full, a common practice in designing an LC oscillator circuit. The inductor is a good candidate for receiving this energy as, ideally, there is no loss in the transfer process. However, in power circuits, oscillation is avoided by choosing a switching frequency much higher than the resonant frequency of the LC circuit. As a result, the state of the circuit is changed many times before one oscillation cycle is completed. In one state of the circuit, current from the supply voltage builds up in the inductor. In the other state, the circuit configuration is changed so that current continues to flow from the capacitor, since current cannot change instantly in an inductor. When the capacitor is fully discharged, i.e., its voltage is zero, the switch should close to provide a low-resistance path for the inductor current; otherwise, the intrinsic diode of the switch would turn on. This is not desirable as the voltage drop across a diode is bigger than the voltage drop across a closed switch.
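To make the frequency argument above concrete, the sketch below estimates the natural frequency of an ideal LC pair and compares it with a switching frequency. Pairing the 310pH inductor with the 25pF clock load from Chapter 5 as an isolated LC tank is an assumption made purely for illustration: the real circuit also involves Cint, CF and parasitics, so the result is only a rough indication. All names are hypothetical.

```python
import math


def lc_resonant_frequency(inductance_h, capacitance_f):
    """Natural frequency f = 1 / (2*pi*sqrt(L*C)) of an ideal LC pair, in Hz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


# Values quoted in Chapter 5; treating them as an isolated LC tank is an assumption.
L_F = 310e-12    # inductor, henries
C_CLK = 25e-12   # clock load capacitance, farads
F_SW = 4e9       # switching (clock) frequency, hertz

f_res = lc_resonant_frequency(L_F, C_CLK)
ratio = F_SW / f_res

print(f"f_res = {f_res / 1e9:.2f} GHz, f_sw / f_res = {ratio:.2f}")
```

Under this simplification the switching frequency is a little over twice the estimated natural frequency; the effective values in the fabricated circuit may differ.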
The energy improvement methods proposed here are based on the Zero Voltage Switching technique, which delays turning on a switch until the voltage across it is zero. During the delay time, the voltage across the device is reduced, i.e., the energy that was about to discharge to ground through the device is transferred to another passive reservoir. Consequently, power loss related to dynamic losses in the system is reduced.

To further reduce the total power loss, resistive losses in the system have been reduced by using wider and/or thicker current paths. Wider transistors have also been employed, although their large gate capacitance requires them to be driven by bigger drivers.

As the inductor and capacitor used in a power converter are big, the only way that a converter can fit on-chip is to reduce their size. Converter transistors can switch at higher frequencies that allow converter passive components to be integrated on-chip. However, these higher frequencies were traditionally avoided due to dynamic losses. By reducing these dynamic losses, on-chip integration of power converters can be made more practical.

This thesis has investigated several methods of reducing these dynamic losses by recycling energy stored in the clock network. The first circuit is integrated with the clock driver in a way that delivers the final clock load energy to the switching converter. The second circuit recognizes that the high switching losses of the front-end driver chain can be reduced through supply stacking and that some of the resulting excess energy can also be delivered to the switching converter. The third and fourth circuits explore boost and buck-boost topologies. The fifth circuit uses power converter circuitry to directly reduce the power consumption of a clock driver in a manner that resembles a boost converter.

The charge recycling methods used here are able to exploit a large clock capacitance.
Although traditional design practices try to keep the clock capacitance as small as possible, these techniques provide a power-saving alternative when that is no longer an option. The proposed methods are generic, technology-independent solutions that could be valid for future generations of finer feature size CMOS technologies.

The implementations in this thesis attempt to diminish the impact of the integration on the original system by minimizing the effect of charge recycling on internal signals. Stability and robustness of the proposed solutions are also promoted by avoiding complex circuit configurations, which are prone to malfunction and have higher failure rates.

The results demonstrated here are very promising. A significant amount of work remains to be done to optimize these circuits before they are practical. Although simulations show good-quality, quasi-square clock waveforms, there is concern that clock jitter may be increased as a result of the power converter integration. Placement of these circuits in the clock network could also increase skew. Future work is needed to address concerns regarding integrating the new circuits in a real, complex chip design; for example, interaction between the converters and the system power grid and/or decoupling capacitors.

As a limiting factor, the new clock waveform from the integrated converter/driver circuits is only suitable for positive-edge-triggered digital blocks, as the converter output voltage is adjusted by pulse width modulation of the clock waveform. The low-power clock driver does not have this drawback, since it can work at a fixed duty cycle.

If these methods of charge recycling prove to be successful in practice, they could potentially be used in many high-performance, high-frequency designs to lower power and save energy. This new design approach may transform a regular CMOS designer's way of thinking to take into account energy recycling.
For example, designers typically minimize capacitance, but bigger capacitors in some areas may lead to more energy recovery and provide benefits in other areas.

Table 6.1 summarizes the overall results of the fabricated prototypes that are presented in this thesis:

Table 6.1. Chip prototype results
Type | Buck | Boost | Buck-Boost
Integrated clock driver/power converter | 90nm CMOS; simulation works; prototype works | 90nm CMOS; simulation works; prototype not working | 90nm CMOS; simulation works; prototype not working
Low-swing power converter | 0.18µm CMOS; simulation works; prototype works | N/A | N/A
Low-power clock driver | N/A | 90nm CMOS; simulation works; prototype works | N/A

6.1 Future Work

The plan for future extension of this work can be divided into two key categories: continuation of the previous work and finding new ideas for charge recycling.

6.1.1 Continuation of the Work

As a continuation of the previous work, the two previous buck converter concepts can be merged into a single design. That is, two chains of gate drivers can be considered in the clock-tree converter design. The advantage of this configuration is that the electric charge in the clock-tree circuit as well as in the transistor gating circuits will be reused to improve efficiency.

There are potential problems with this approach. For example, there might be a mismatch between the parallel inverter chains, and therefore gating signals arriving at the power transistors may go out of synchronization. This is particularly a problem at very high speeds. This problem could be alleviated by using new adaptive delay circuits that are part of the ZVS function to either re-sync or tolerate mismatch better.

The following improvements to this new design can also be pursued:
• NMOS ZVS operation has been implemented in the integrated clock driver/power converter designs. The implementation of PMOS ZVS operation and its effect on converter efficiency could be considered.
A dual circuit similar to the NMOS delay circuit can be used for this purpose. Early simulation results (using a different circuit) had shown that the negative inductor current needed for ZVS operation of Mp resulted in increased power loss due to the inductor series resistance Rs. The new delay circuit and reduced Rs through inductor thickening may alleviate this power loss.
• The existing diode-connected NMOS transistors in the low-swing buck converter suffer from power loss during the on state. A simplified gating circuit is needed to fully turn on an NMOS transistor while mimicking diode behavior. This could be achieved by connecting the gate of a wide transistor to a comparator circuit that senses the voltage difference across the transistor [41]. The challenge comes from the fact that the comparator circuit needs to react very quickly while driving a big transistor.
• The existing transistors in the low-swing buck converter are of standard-Vt type because low-Vt devices were not available in the design kit. A low-Vt device is needed to facilitate operation at VDD/2 levels.
• To improve the chip layouts, some fine tuning could be performed to reduce resistance across the circuit. This would include resizing of the power transistors and changing the width of the circuit paths and/or the inductor path. This would increase area but would improve efficiency by decreasing the resistive power loss.
• In the clock-tree charge-recycling scheme, the clock signal has been disturbed in order to achieve converter voltage regulation. However, the quality of the clock signal is important to a logic designer. Clock jitter and duty cycle in the clock-tree scheme should be measured and improved, perhaps by using on-chip structures and experimentation.
• Stacking of the passive filter components, specifically putting the filter inductor above the filter capacitor to save area, could be considered.
The area under the inductor has not been used here due to concerns of negative impact on inductance and/or eddy current losses. Recently these concerns have been studied in [45] with reassuring results.
• Power grid capacitance could be integrated with the converter output filter. They behave like a distributed capacitor across the chip. Power grids can potentially oscillate due to L and C effects in the grid itself. This effect needs to be taken into account while studying the stability of the system. This idea would reduce the size of the output capacitor and, as a result, reduce the converter area.
• The effect of injecting charge back to the on-chip DC power distribution grid could be investigated. A large system can potentially have several of the integrated low-power clock drivers working in parallel, raising concerns with their possibly synchronized operation.
• To simplify the designs, the current chips have limited controllability and observability. On-chip voltage buffer circuits can be added to view internal signals such as the clock waveform. Also, on-chip jitter measurement circuits would help in accurate jitter measurement.
• The effect of delivering a voltage surge back to the power grid that is perfectly synchronized to the clock is unknown. It could deliver energy just-in-time to reduce resistive voltage drop, or it could be at the wrong time and increase it.

6.1.2 Investigating New Ideas

Clock networks are one of the charge dissipating sub-circuits in a system. There are other sub-circuits of an integrated system that have capacitors with a charging/discharging operation cycle. Those circuits could be investigated in order to apply charge recycling methods to feed DC-DC converters or for returning the charge back to the power grid. Examples of other charge dissipating circuits include:
• On-chip memories: In a synchronous random access memory (RAM), one entire word line is always fully charged/discharged every access cycle.
As well, all bit lines are pre-charged (possibly not to full VDD, but halfway) and during the read cycle they are partially or fully discharged. DRAM empties its capacitor storage onto the bit line, but the charge change is very small and probably cannot be captured. However, SRAM uses a pull-down NMOS to drain the bit line to ground. It might be possible to capture and collect the SRAM pull-down charge in a “pseudo-ground” grid, and feed it to a DC-DC converter.
• I/O pads: I/O pads usually have large capacitances that are charged and discharged at every change of output state. Instead of discharging the pad capacitance to ground, the charge can be delivered to a power converter. There are two common types of I/O pads: full-swing digital pads that are used in low-speed signaling and low-voltage differential signaling (LVDS) pads that are used in high-speed signaling. Different charge recycling methods could be applied to those pads.
• Tail current source in differential pairs and biased circuits: Instead of sinking current to ground, it can be redirected to a DC-DC boost converter. This circuit is different from the others in the sense that the charge is not recycled from a capacitor but from a continuous current source.

From the list above, the most advantageous ones could be identified and selected to demonstrate the advantages of charge recycling. This would define a new category of designs that reuse, recover, and recycle energy, called “green” chips or environmentally friendly electronic circuits. With reduced energy consumption, green chips can be powered from the renewable energy of the environment, such as sunlight or human body heat. Living off free ambient energy, they will be closer to zero-footprint and can become true wireless devices.

REFERENCES

[1] J. Friedrich, B. McCredie, N. James, B. Huott, B. Curran, E. Fluhr, G. Mittal, E. Chan, Y. Chan, D. Plass, S. Chu, H. Le, L. Clark, J. Ripley, S. Taylor, J. Dilullo, and M.
Lanzerotti, "Design of the Power6 microprocessor," Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), 2007, pp. 96-97.
[2] S. Naffziger, B. Stackhouse, T. Grutkowski, D. Josephson, J. Desai, E. Alon, and M. Horowitz, "The implementation of a 2-core, multi-threaded Itanium family processor," IEEE J. Solid-State Circuits, vol. 41, no. 1, Jan. 2006, pp. 197-209.
[3] S. Naffziger, G. Colon-Bonet, T. Fischer, R. Riedlinger, T. Sullivan, and T. Grutkowski, "The implementation of the Itanium 2 microprocessor," IEEE J. Solid-State Circuits, vol. 37, no. 11, Nov. 2002, pp. 1448-1460.
[4] E. G. Friedman, "Clock distribution networks in synchronous digital integrated circuits," Proc. of the IEEE, vol. 89, no. 5, May 2001, pp. 665-689.
[5] A. Strollo, E. Napoli, and D. De Caro, "New clock-gating techniques for low-power flip-flops," Proc. IEEE Int. Symposium on Low Power Electronics and Design (ISLPED), 2000, pp. 114-119.
[6] V. Tiwari, R. Donnelly, S. Malik, and R. Gonzalez, "Dynamic power management for microprocessors: a case study," Proc. IEEE Int. Conf. on Very Large Scale Integration Design (VLSI Design), 1997, pp. 185-192.
[7] H. Kojima, S. Tanaka, and K. Sasaki, "Half-swing clocking scheme for 75% power saving in clocking circuitry," IEEE J. Solid-State Circuits, vol. 30, no. 4, Apr. 1995, pp. 432-435.
[8] A. Oliva, S. Ang, and G. Bortolotto, "Digital control of a voltage-mode synchronous buck converter," IEEE Trans. Power Electronics, vol. 21, no. 1, Jan. 2006, pp. 157-163.
[9] M. Stan and W. Burleson, "Low-power CMOS clock drivers," Proc. ACM/IEEE Int. Workshop on Timing Issues in the Specification and Synthesis of Digital Systems (TAU Workshop), 1995, pp. 149-156.
[10] S. C. Chan, K. L. Shepard, and P. J. Restle, "Uniform-phase uniform-amplitude resonant-load global clock distributions," IEEE J. Solid-State Circuits, vol. 40, no. 1, Jan. 2005, pp. 102-109.
[11] S. C. Chan, K. L. Shepard, and P. J. Restle, "Distributed differential oscillators for global clock networks," IEEE J. Solid-State Circuits, vol. 41, no. 9, Sept. 2006, pp. 2083-2094.
[12] N. Mohan, T. M. Undeland, and W. P. Robbins, Power electronics: converters, applications, and design, 2nd ed. New York: John Wiley & Sons Inc., 1995.
[13] P. Hazucha, G. Schrom, J. Hahn, B. A. Bloechel, P. Hack, G. E. Dermer, S. Narendra, D. Gardner, T. Karnik, V. De, and S. Borkar, "A 233MHz 80%-87% efficient four-phase DC-DC converter utilizing air-core inductors on package," IEEE J. Solid-State Circuits, vol. 40, no. 4, Apr. 2005, pp. 838-845.
[14] G. Schrom, P. Hazucha, J. Hahn, D. S. Gardner, B. A. Bloechel, G. Dermer, S. G. Narendra, T. Karnik, and V. De, "A 480-MHz, multi-phase interleaved buck DC-DC converter with hysteretic control," Proc. IEEE Power Electronics Specialists Conf. (PESC), 2004, pp. 4702-4707.
[15] S. Abedinpour, B. Bakkaloglu, and S. Kiaei, "A multi-stage interleaved synchronous buck converter with integrated output filter in a 0.18µm SiGe process," Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), 2006, pp. 356-357.
[16] A. J. Stratakos, S. R. Sanders, and R. W. Brodersen, "A low-voltage CMOS DC-DC converter for a portable battery-operated system," Proc. IEEE Power Electronics Specialists Conf. (PESC), 1994, pp. 619-626.
[17] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital integrated circuits, 2nd ed. New Jersey: Pearson Education International, 2003.
[18] S. Rahman and F. Lee, "Computer simulations of optimum boost and buck-boost converters," IEEE Trans. Aerospace and Electronic Systems, vol. AES-18, no. 5, Sept. 1982, pp. 598-608.
[19] Y. Yu, F. Lee, and J. Triner, "Power converter design optimization," IEEE Trans. Aerospace and Electronic Systems, vol. AES-15, no. 3, May 1979, pp. 344-355.
[20] G. Schrom, P. Hazucha, F. Paillet, D. Gardner, S. Moon, and T. Karnik, "Optimal design of monolithic integrated DC-DC converters," Proc. IEEE Int. Conf. on Integrated Circuit Design and Technology (ICICDT), 2006, pp. 1-3.
[21] E. A. McShane and K. Shenai, "Monolithic DC power supplies for wireless telecommunications and multimedia systems," Proc. IEEE Int. Telecommunications Energy Conf. (INTELEC), 2000, pp. 733-740.
[22] T. Fuse, A. Kameyama, M. Ohta, and K. Ohuchi, "A 0.5V power-supply scheme for low power LSIs using multi-Vt SOI CMOS technology," Proc. IEEE Symposium on Very Large Scale Integration (VLSI) Circuits, 2001, pp. 219-220.
[23] Z. Chuang, M. Dongsheng, and A. Srivastava, "Integrated adaptive DC/DC conversion with adaptive pulse-train technique for low-ripple fast-response regulation," Proc. IEEE Int. Symposium on Low Power Electronics and Design (ISLPED), 2004, pp. 257-262.
[24] V. Kursun, S. Narendra, V. De, and E. Friedman, "Low-voltage-swing monolithic DC-DC conversion," IEEE Trans. Circuits and Systems, vol. 51, no. 5, May 2004, pp. 241-248.
[25] S. Musunuri, P. L. Chapman, Z. Jun, and L. Chang, "Design issues for monolithic DC-DC converters," IEEE Trans. Power Electronics, vol. 20, no. 3, May 2005, pp. 639-649.
[26] V. Kursun, S. Narendra, V. De, and E. Friedman, "High input voltage step-down DC-DC converters for integration in a low voltage CMOS process," Proc. IEEE Int. Symposium on Quality Electronic Design (ISQED), 2004, pp. 517-521.
[27] J. Xiao, A. Peterchev, J. Zhang, and S. Sanders, "A 4µA-quiescent-current dual-mode buck converter IC for cellular phone applications," Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), 2004, pp. 280-283.
[28] S. Rajapandian, K. L. Shepard, P. Hazucha, and T. Karnik, "High-voltage power delivery through charge recycling," IEEE J. Solid-State Circuits, vol. 41, no. 6, June 2006, pp. 1400-1410.
[29] S. Chan, K. Shepard, and P. Restle, "Design of resonant global clock distributions," Proc. IEEE Int. Conf. on Computer Design (ICCD), 2003, pp. 248-253.
[30] D. Somasekhar and V. Visvanathan, "A 230-MHz half-bit level pipelined multiplier using true single-phase clocking," IEEE Trans. Very Large Scale Integration Systems, vol. 1, no. 4, Dec. 1993, pp. 415-422.
[31] S. Naffziger, B. Stackhouse, and T. Grutkowski, "The implementation of a 2-core multi-thread Itanium-family processor," Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), 2005, pp. 182-183.
[32] M. Alimadadi, S. Sheikhaei, G. Lemieux, S. Mirabbasi, and P. Palmer, "A 3GHz switching DC-DC converter using clock-tree charge-recycling in 90nm CMOS with integrated output filter," Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), 2007, pp. 532-533.
[33] M. Alimadadi, S. Sheikhaei, G. Lemieux, S. Mirabbasi, W. Dunford, and P. Palmer, "Energy recycling from multi-GHz clocks using fully integrated switching converters," submitted to IEEE J. Solid-State Circuits, 2008.
[34] J. W. Tschanz, J. T. Kao, S. G. Narendra, R. Nair, D. A. Antoniadis, A. P. Chandrakasan, and V. De, "Adaptive body bias for reducing impacts of die-to-die and within-die parameter variations on microprocessor frequency and leakage," IEEE J. Solid-State Circuits, vol. 37, no. 11, Nov. 2002, pp. 1396-1402.
[35] W. Kim, M. Gupta, G.-Y. Wei, and D. Brooks, "System level analysis of fast, per-core DVFS using on-chip switching regulators," Proc. IEEE Int. Symposium on High Performance Computer Architecture (HPCA), 2008, pp. 123-134.
[36] B. Razavi, Design of analog CMOS integrated circuits, 1st ed. Boston: McGraw-Hill Companies Inc., 2001.
[37] M. Alimadadi, S. Sheikhaei, G. Lemieux, S. Mirabbasi, P. Palmer, and W. Dunford, "A 660MHz ZVS DC-DC converter using gate-driver charge-recycling in 0.18µm CMOS with an integrated output filter," Proc. IEEE Power Electronics Specialists Conf. (PESC), 2008, pp. 140-146.
[38] G. Lemieux, M. Alimadadi, S. Sheikhaei, S. Mirabbasi, and P. Palmer, "SoC energy savings = reduce + reuse + recycle: A case study using a 660MHz DC-DC converter with integrated output filter," Proc. IEEE Canadian Conf. on Electrical and Computer Engineering (CCECE), 2008, pp. 947-950.
[39] J. Lin, "Challenges for SoC Design in Very Deep Submicron Technologies," National Semiconductor Corp. [Online]. Available: http://www.ece.uci.edu/codes+isss/Invited/JamesLin.pdf
[40] A. M. Niknejad and R. G. Meyer, "Analysis, design, and optimization of spiral inductors and transformers for Si RF IC’s," IEEE J. Solid-State Circuits, vol. 33, no. 10, Oct. 1998, pp. 1470-1481.
[41] T. Y. Man, P. K. T. Mok, and M. Chan, "A CMOS-control rectifier for discontinuous-conduction mode switching DC-DC converters," Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), 2006, pp. 358-359.
[42] M. Alimadadi, S. Sheikhaei, G. Lemieux, S. Mirabbasi, P. Palmer, and W. Dunford, "Energy recovery from high-frequency clocks using DC-DC converters," Proc. IEEE Int. Symposium on Very Large Scale Integration (ISVLSI), 2008, pp. 162-167.
[43] M. Alimadadi, S. Sheikhaei, G. Lemieux, S. Mirabbasi, P. Palmer, and W. Dunford, "A 4GHz non-resonant clock driver with power-grid energy return," submitted to IEEE Trans. Circuits and Systems II, 2008.
[44] J. W. Tschanz, S. G. Narendra, Y. Ye, B. A. Bloechel, S. Borkar, and V. De, "Dynamic Sleep transistor and body bias for active leakage power control of microprocessors," IEEE J. Solid-State Circuits, vol. 38, no. 11, Nov. 2003, pp. 1838-1845.
[45] F. Zhang and P. R. Kinget, "Design of components and circuits underneath integrated inductors," IEEE J. Solid-State Circuits, vol. 41, no. 10, Oct. 2006, pp. 2265-2271.
[46] S. C. O. Mathuna, T. O'Donnell, W. Ningning, and K. Rinne, "Magnetics on silicon: an enabling technology for power supply on chip," IEEE Trans. Power Electronics, vol. 20, no. 3, May 2005, pp. 585-592.
[47] C. Yue and S. Wong, "On-chip spiral inductors with patterned ground shields for Si-based RF ICs," IEEE J. Solid-State Circuits, vol. 33, no. 5, 1998, pp. 743-752.
[48] J. N.
Burghartz, "Progress in RF inductors on silicon - understanding substrate losses," Proc. IEEE Int. Electron Devices Meeting (IEDM), 1998, pp. 523-526.
[49] J. N. Burghartz, D. C. Edelstein, M. Soyuer, H. A. Ainspan, and K. A. Jenkins, "RF circuit design aspects of spiral inductors on silicon," IEEE J. Solid-State Circuits, vol. 33, no. 12, Dec. 1998, pp. 2028-2034.
[50] J. Gil and S. Hyungcheol, "A simple wide-band on-chip inductor model for silicon-based RF ICs," IEEE Trans. Microwave Theory and Techniques, vol. 51, no. 9, Sept. 2003, pp. 2023-2028.
[51] H. Samavati, A. Hajimiri, A. R. Shahani, G. N. Nasserbakht, and T. H. Lee, "Fractal capacitors," IEEE J. Solid-State Circuits, vol. 33, no. 12, Dec. 1998, pp. 2035-2041.
[52] G. Villar, E. Alarcon, F. Guinjoan, and A. Poveda, "Optimized design of MOS capacitors in standard CMOS technology and evaluation of their equivalent series resistance for power applications," Proc. IEEE Int. Symposium Circuits and Systems (ISCAS), 2003, pp. 451-454.
[53] X. Meng, R. Saleh, and K. Arabi, "Layout of decoupling capacitors in IP blocks for 90nm CMOS," to be published in IEEE Trans. Very Large Scale Integration Systems.

APPENDICES

A Discrete Switching Power Converters

A switch-mode converter is built around an inductor that is periodically switched between different circuit configurations. By adjusting the ratio of time spent in each configuration, the output voltage can be regulated. This method is highly efficient, in the range of 80% to 95% for a discrete design, because the switches are either fully on or fully off and the voltage drop ideally appears only across the inductor, which is a lossless component: a voltage across the inductor causes energy to be stored, not dissipated. For the sake of simplicity, power losses in the circuits are neglected in the following discussions, i.e., Pin = Pout.

A.1 Buck (Step-Down) Switching Converters

One of the basic switch-mode DC-DC conversion topologies is the step-down or buck converter.
In essence, its operation can be described as averaging a square-wave signal by passing it through a low-pass filter. The average (DC) value is $D \times V_{DD}$, so the output voltage is a function of both the amplitude and the duty cycle $D$ of the square waveform. As shown in Figure A.1, the square waveform is generated using two switches: one transistor and one diode. Using the diode simplifies the circuit because it switches automatically and does not need a gating signal.

Figure A.1. A basic buck converter

The operation of the buck converter is fairly simple. If the inductor current never stays at zero, the converter is said to be operating in Continuous Conduction Mode (CCM). As shown in Figure A.2, there are two operational states.

Figure A.2. Operation states of the buck converter in CCM mode

In the first state, the transistor is on, the diode is reverse biased, and current builds up in the inductor. In the second state, the transistor is turned off. The inductor current cannot change instantly, so it finds its way through the diode. Since the supply is disconnected from the circuit, the inductor current decreases as energy is transferred from the inductor to the load.

In steady-state operation, the integral of the inductor voltage over one switching period, in other words the average inductor voltage, must be zero. Therefore

$$(V_{DD} - V_{out})\, t_{on} = V_{out}\,(T_{sw} - t_{on})$$

or

$$\frac{V_{out}}{V_{DD}} = \frac{t_{on}}{T_{sw}} = D \tag{A.1}$$

which implies that the converter has a linear ideal DC gain, i.e., it behaves like a DC transformer. Also, since no DC current flows through the capacitor in steady state, the average inductor current is equal to the DC output current.

Suppose the DC load current is decreased slowly. The average inductor current falls until the minimum inductor current reaches zero. At this point the average inductor current is

$$I_L = \frac{1}{2}\Delta i_L = \frac{1}{2} i_{L,max} = \frac{t_{on}}{2L}(V_{DD} - V_{out}) = \frac{D\,T_{sw}}{2L}(V_{DD} - V_{out})$$

as shown in Figure A.3.
Noting that $I_L = I_{out}$, Equation A.2 gives the minimum inductance needed to keep the converter in CCM when $I_{out}$ is set to the minimum design load current $I_{out\text{-}min}$. In practice, $I_{out\text{-}min} \cong 0.1 \times I_{out}$ is typically assumed.

$$L = \frac{D\,T_{sw}}{2 I_{out}}(V_{DD} - V_{out}) \tag{A.2}$$

Figure A.3. Waveforms of a buck converter in CCM mode

If the DC load current is decreased further, the minimum inductor current stays at zero because the diode cannot conduct a negative current, as shown in Figure A.4. This is called Discontinuous Conduction Mode (DCM). In DCM the converter behavior is not linear, which requires a complex control algorithm for voltage regulation.

Figure A.4. Waveforms of a buck converter in DCM mode

In reality there is a significant voltage drop across the diode in the basic buck converter. Because the on-state voltage drop of a transistor is smaller than that of a diode, the diode can be replaced by a transistor as shown in Figure A.5. This configuration is referred to as a Synchronous Buck Converter. It also avoids the complexity of DCM: because the second transistor can conduct negative currents, the converter always stays in CCM.

Figure A.5. A synchronous buck converter

The filter capacitor connected directly at the output of the converter makes the converter appear as a voltage source to the load. A larger capacitor makes the output voltage ripple smaller. It can be shown that the peak-to-peak output voltage ripple can be written as

$$\frac{\Delta V_{out,pp}}{V_{out}} = \frac{\pi^2}{2}\,(1 - D)\left(\frac{F_c}{F_{sw}}\right)^2$$

where

$$F_c = \frac{1}{2\pi\sqrt{LC}}$$

is the corner frequency of the filter. Choosing $F_c \cong 0.1 \times F_{sw} \ll F_{sw}$ keeps the ripple small. This also shows that the output voltage ripple is independent of the output current in CCM. Substituting $F_c$ into the ripple expression, the filter capacitor can be derived using Equation A.3:

$$C = \frac{1 - D}{8\,L\,(\Delta V_{out,pp}/V_{out})\,F_{sw}^2} \tag{A.3}$$

It is also worth noting that in the second CCM state, when the diode in the basic configuration is conducting, the converter model can be simplified to an LCR circuit.
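As a quick numerical check of the sizing relations above, the following is a minimal Python sketch that evaluates Equation A.2 and Equation A.3 (written here with the inductance L explicit, as obtained by substituting the corner frequency into the ripple expression). The function names and the operating point (1.0 V supply, 0.5 V output, 50 mA minimum load, 3 GHz switching, 1% ripple) are illustrative assumptions, not values taken from the test chips.

```python
import math


def buck_min_inductance(v_dd, v_out, i_out_min, f_sw):
    """Equation A.2: inductance at the CCM/DCM boundary of an ideal buck."""
    d = v_out / v_dd                 # ideal duty cycle, Equation A.1
    t_sw = 1.0 / f_sw                # switching period
    return d * t_sw * (v_dd - v_out) / (2.0 * i_out_min)


def buck_filter_capacitance(v_dd, v_out, inductance, f_sw, ripple_ratio):
    """Equation A.3 with L explicit: C = (1 - D) / (8 L (dV/V) Fsw^2)."""
    d = v_out / v_dd
    return (1.0 - d) / (8.0 * inductance * ripple_ratio * f_sw ** 2)


def corner_frequency(inductance, capacitance):
    """Corner frequency of the LC output filter."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance * capacitance))


def ripple_ratio(d, f_c, f_sw):
    """Peak-to-peak ripple relative to Vout in CCM."""
    return (math.pi ** 2 / 2.0) * (1.0 - d) * (f_c / f_sw) ** 2


# Hypothetical operating point, chosen only for illustration.
V_DD, V_OUT, I_MIN, F_SW, RIPPLE = 1.0, 0.5, 0.05, 3e9, 0.01

L_min = buck_min_inductance(V_DD, V_OUT, I_MIN, F_SW)
C_f = buck_filter_capacitance(V_DD, V_OUT, L_min, F_SW, RIPPLE)
F_c = corner_frequency(L_min, C_f)

print(f"L = {L_min * 1e9:.3f} nH, C = {C_f * 1e9:.3f} nF, Fc/Fsw = {F_c / F_SW:.3f}")
```

Feeding the resulting corner frequency back into `ripple_ratio` should reproduce the requested ripple, which is a convenient self-consistency check on the two equations.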
Choosing $F_c \ll F_{sw}$ also prevents potential oscillation of the LCR circuit formed in the second CCM state.

In integrated power converter designs, smaller inductor and capacitor values are strongly preferred in order to save on-chip area. Using Equations A.2 and A.3 for the basic buck converter, the effects of switching frequency and converter output current on the converter inductor and capacitor are illustrated in Figure A.6 and Figure A.7, respectively. Choosing a mid-level converter output current gives a good compromise between inductor and capacitor values, while higher switching frequencies reduce both.

[Plot "Converter Inductor": Lf (nH) versus Fsw (GHz) and Iout (mA)]
Figure A.6. Effect of Fsw and Iout on buck converter inductor

[Plot "Converter Capacitor": Cf (nF) versus Fsw (GHz) and Iout (mA)]
Figure A.7. Effect of Fsw and Iout on buck converter capacitor

A.2 Boost (Step-Up) Switching Converters

Another basic DC-DC conversion topology is the step-up or boost converter. The components used are similar to those of the buck converter but are connected in a different configuration, as shown in Figure A.8.

Figure A.8. A basic boost converter

The operation of the boost converter is fairly simple. If the inductor current never stays at zero, the converter is said to be operating in Continuous Conduction Mode (CCM). As shown in Figure A.9, there are two operational states.

Figure A.9. Operation states of the boost converter in CCM mode

In the first state, the transistor is on, current builds up in the inductor, and the diode is reverse biased, isolating the output stage. In the second state, the transistor is turned off. The inductor current cannot change instantly, so it finds its way through the diode. The inductor voltage is now in series with the source voltage, so the output capacitor receives a voltage that is higher than the supply voltage.
The load receives energy from the input source as well as from the inductor, and therefore the inductor current decreases.

In steady-state operation, the integral of the inductor voltage over one switching period, in other words the average inductor voltage, must be zero. Therefore

$$V_{DD}\, t_{on} + (V_{DD} - V_{out})\, t_{off} = 0$$

or

$$\frac{V_{out}}{V_{DD}} = \frac{T_{sw}}{t_{off}} = \frac{1}{1 - D} \tag{A.4}$$

which implies that the converter has a non-linear ideal DC gain even in CCM. Also, in steady state the average inductor current is equal to the average input current.

Suppose the DC load current is decreased slowly. The average inductor current falls until the minimum inductor current reaches zero. At this point the average inductor current is

$$I_L = \frac{1}{2}\Delta i_L = \frac{1}{2} i_{L,max} = \frac{1}{2}\,\frac{V_{DD}}{L}\, t_{on} = \frac{T_{sw} V_{out}}{2L}\, D\,(1 - D).$$

Noting that

$$I_L = I_{DD} = \frac{I_{out}}{1 - D},$$

this equation gives the minimum inductance needed to keep the converter in CCM at a minimum design load current. If the DC load current is decreased further, the inductor discharges completely before the end of the commutation cycle and the minimum inductor current stays at zero. This is called Discontinuous Conduction Mode (DCM).

A.3 Buck-Boost Switching Converters

The buck-boost topology is a combination of the two basic configurations. The components used are similar to those of the buck converter but are connected in a different configuration, as shown in Figure A.10.

Figure A.10. A buck-boost converter

The operation of the buck-boost converter is fairly simple. If the inductor current never stays at zero, the converter is said to be operating in Continuous Conduction Mode (CCM). As shown in Figure A.11, there are two operational states.

Figure A.11. Operation states of a buck-boost converter in CCM mode

In the first state, the transistor is on, current builds up in the inductor, and the diode is reverse biased, isolating the output stage. In the second state, the transistor is turned off. The inductor current cannot change instantly, so it finds its way through the diode.
The inductor is then connected in parallel with the output. Since the supply is disconnected from the circuit, the inductor current decreases as energy is transferred from the inductor to the load.

In steady-state operation, the integral of the inductor voltage over one switching period, in other words the average inductor voltage, must be zero. Therefore

$$V_{DD}\, t_{on} + (-V_{out})\, t_{off} = 0$$

or

$$\frac{V_{out}}{V_{DD}} = \frac{t_{on}}{t_{off}} = \frac{D}{1 - D} \tag{A.5}$$

which implies that the converter has a non-linear ideal DC gain even in CCM. Also, in steady state the average inductor current is equal to the sum of the average input current and the average output current.

Suppose the DC load current is decreased slowly. The average inductor current falls until the minimum inductor current reaches zero. At this point the average inductor current is

$$I_L = \frac{1}{2}\Delta i_L = \frac{1}{2} i_{L,max} = \frac{T_{sw} V_{DD}}{2L}\, D = \frac{T_{sw} V_{out}}{2L}\,(1 - D).$$

Noting that

$$I_L = I_{DD} + I_{out} = \frac{D}{1 - D}\, I_{out} + I_{out},$$

this equation gives the minimum inductance needed to keep the converter in CCM at a minimum design load current. If the DC load current is decreased further, the inductor discharges completely before the end of the commutation cycle and the minimum inductor current stays at zero. This is called Discontinuous Conduction Mode (DCM).

B On-Chip Passive Components

B.1 Inductors

An inductor is an integral part of any switch-mode power converter. Traditionally, magnetic materials are used in the construction of inductors to confine the magnetic field close to the coil, thereby increasing the inductance. Magnetics on silicon have been reported in the literature [46]. However, to keep the inductor design compatible with a conventional CMOS process, coreless inductors are used in this work.
A simplified pi model of an inductor is shown in Figure B.1. It consists of an ideal inductance Lseries, a series resistance Rseries representing the ohmic losses in the coil, inductor capacitances Cs1 and Cs2, and substrate resistances Rs1 and Rs2. The values of these components can be derived using the ASITIC software [40].

Figure B.1. A Simplified pi model of an inductor

In CMOS processes, the silicon substrate has a relatively low resistivity, so eddy currents in the silicon can be considerable. Because an eddy current creates a magnetic field that opposes the applied magnetic field, its effect appears as a reduced net flux and thus a reduced inductance. Since different substrate structures have different resistivities, they affect the inductance differently. Any currents coupled into the substrate also increase the substrate noise because they change the substrate voltage.

Consequently, a metal Patterned Ground Shield (PGS) is placed between the inductor coil and the substrate [47]. In addition, strings of ground-substrate contacts short any induced substrate current to the system ground at regular intervals. The inductor characteristics then become independent of the substrate structure, and eddy currents in the substrate may also be reduced. Among the different patterns introduced and studied in the literature, the wide bar pattern shown in Figure B.2 avoids eddy current paths and is used in this work [48].

Figure B.2. A wide bar PGS

Using only the higher metal layers for the inductor and the lowest metal layer for the PGS keeps the inductor high above the PGS. Excluding the lower metal layers from the inductor also reduces Cs1 and Cs2. In addition, block-out masks can be applied during fabrication to keep the doping level under the spiral coil at a minimum in order to maximize Rs1 and Rs2 [49]. The effect of high frequencies on inductor characteristics has previously been studied in [50].
Using ASITIC [40], those effects are illustrated in Figure B.3 for the following inductor in a 1P7M2T 90nm process: a one-turn octagonal inductor with an external radius of 300µm and a width of 50µm. The two thick metal layers of the process (M6 and M7) are connected in parallel to reduce the series resistance Rseries. ASITIC simulation of the inductor used in the buck converter chip shows that at 3GHz it achieves an inductance of 320pH with Rseries = 260mΩ, Cs1 = Cs2 = 140fF, Rs1 = Rs2 = 280Ω and a quality factor of 20.

In Figure B.3, as the frequency increases, Rseries increases mainly due to skin and proximity effects. Near the frequency of maximum Q, Rseries starts to decrease rapidly. It is believed that this decrease is caused by coupling through the silicon substrate. It has been reported that adding a parallel combination of resistance and capacitance to the pi model can increase the accuracy of the inductor model, because coupling through the silicon substrate is dominated by resistive coupling at low frequencies and by capacitive coupling at high frequencies [50].

[Plot: Ls (pH), Rs (mOhm) and Q x 10 versus Fsw (GHz), logarithmic frequency axis]
Figure B.3. Effect of high frequency on an inductor characteristic

B.2 Capacitors

In CMOS there are a few different types of capacitors available, including MIM, fractal and MOSFET gate capacitors. MIM capacitors are manufactured using special metal layers and can therefore be accurately characterized, but they have a low capacitance density. Fractal capacitors are made of geometrically shaped regular metal layers. They have a higher capacitance density, but the capacitance varies with the fabrication process [51]. MOSFET gate capacitors have the highest capacitance density, but they are non-linear [52] and require a DC bias voltage to operate. In switch-mode power converters, capacitors are used as bulk energy storage devices.
In this work, an array of hundreds of NMOS devices in parallel is used to provide the high capacitance needed. The nonlinear behavior of the gate capacitance is not significant in power converter applications because the capacitance can be predicted from the working voltage. Using the procedure given in [17], the effect of gate voltage on gate capacitance is presented in Figure B.4 for a 1µm² NMOS device in 90nm CMOS technology.

[Plot: Capacitance (fF) versus Voltage (V), from -2 V to 2 V]
Figure B.4. Gate capacitance vs. gate voltage for an NMOS device

As shown in Figure B.4, for voltages higher than 0.75V the capacitance density is around $C_{ox}$ = 12fF/µm². The gate capacitance of one transistor can then be calculated using

$$C = W \cdot L \cdot C_{ox}$$

in which the product $W \cdot L$ represents the gate area of the transistor. To increase the gate area, transistors are usually designed with a length much greater than the minimum allowed by the technology. MOSFET-based capacitors have been studied in [52] using the distributed resistor and capacitor model shown in Figure B.5.

Figure B.5. Model of a MOSFET gate capacitor

The internal resistance of such a capacitor, commonly known as Equivalent Series Resistance (ESR), consists of two parts: the channel resistance $R_{ch}$ and the gate resistance $R_G$. Equation B.1 summarizes the relationship between ESR and transistor aspect ratio.

$$ESR = R_{ch} + R_G = K_9\,\frac{L}{W} + K_{10}\,\frac{W}{L} \tag{B.1}$$

While Equation B.1 gives the ESR of one device, in practice many gate capacitors are connected in parallel. The ESR given by Equation B.1 is then divided by the number of parallel devices in the capacitor structure to obtain the total resistance of the capacitor. Equation B.1 suggests that the ESR exhibits a minimum at a certain device aspect ratio. The minimum ESR is independent of the capacitance value, since it depends not on the size of the MOSFET capacitor but on its shape, i.e., its aspect ratio.
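The existence of an ESR minimum can be made concrete with a short calculation. The Python sketch below reads Equation B.1 as a channel term proportional to L/W plus a gate term proportional to W/L, and finds the aspect ratio that minimizes their sum. The coefficient values `K9 = 50.0` and `K10 = 0.5` (in ohms) are hypothetical placeholders picked only so that the example is easy to follow; the actual coefficients depend on the process and the bias voltage.

```python
import math


def esr(w_over_l, k9, k10):
    """Equation B.1 in terms of the aspect ratio r = W/L: K9 / r + K10 * r."""
    return k9 / w_over_l + k10 * w_over_l


def optimum_aspect_ratio(k9, k10):
    """Setting d(ESR)/dr = -K9 / r^2 + K10 to zero gives r = sqrt(K9 / K10)."""
    return math.sqrt(k9 / k10)


def minimum_esr(k9, k10):
    """ESR at the optimum aspect ratio: 2 * sqrt(K9 * K10)."""
    return 2.0 * math.sqrt(k9 * k10)


# Hypothetical coefficients in ohms, for illustration only.
K9, K10 = 50.0, 0.5

r_opt = optimum_aspect_ratio(K9, K10)
print(f"optimum W/L = {r_opt:.2f}, minimum ESR = {minimum_esr(K9, K10):.2f} ohm")
for r in (2.0, 5.0, r_opt, 20.0, 30.0):
    print(f"W/L = {r:5.1f} -> ESR = {esr(r, K9, K10):6.2f} ohm")
```

Neither the optimum ratio nor the minimum ESR contains the gate area W·L, which is why the minimum depends on the shape of the device and not on its capacitance.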
Equation B.1 is plotted in Figure B.6 for a 90nm CMOS design kit using MATLAB rather than Cadence, because the Cadence schematic simulation engine only considers Rch. In Cadence, a post-layout simulation based on extracted component values is needed for the effect of RG to be included. As shown in Figure B.6, a W/L of about 10 minimizes the ESR. A reduced ESR not only decreases the power loss in the capacitor, but also lowers the voltage ripple across it. The effect of high frequency on the capacitance and series resistance of gate capacitors in CMOS technology has been studied in [53], which shows no significant change at the frequencies of interest in this work.

[Plot: ESR (Ohm) versus transistor W/L ratio, for Vgs = 1.0 and Vgs = 0.5]
Figure B.6. ESR as a function of transistor aspect ratio W/L
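The relations in this section can be combined into a rough sizing estimate for a gate-capacitor array. The Python sketch below uses the capacitance density of about 12fF/µm² quoted above for gate voltages higher than 0.75V, the gate-area formula C = W·L·Cox, and the rule that the ESR of one device is divided by the number of parallel devices. The target capacitance (1 nF), the device dimensions (50µm by 5µm, giving W/L = 10) and the single-device ESR (10 Ω) are hypothetical values chosen only for illustration.

```python
import math

# Capacitance density quoted in the text for gate voltages above about 0.75 V.
C_OX_F_PER_UM2 = 12e-15


def device_capacitance(w_um, l_um, c_ox=C_OX_F_PER_UM2):
    """Gate capacitance of one device in farads: C = W * L * Cox."""
    return w_um * l_um * c_ox


def devices_needed(c_target, w_um, l_um):
    """Smallest number of parallel devices that reaches the target capacitance."""
    return math.ceil(c_target / device_capacitance(w_um, l_um))


def array_esr(esr_device, n_devices):
    """ESR of n identical devices in parallel."""
    return esr_device / n_devices


# Hypothetical design point, for illustration only.
C_TARGET = 1e-9          # 1 nF filter capacitor
W_UM, L_UM = 50.0, 5.0   # W/L = 10, near the ESR minimum reported above
ESR_DEVICE = 10.0        # assumed ESR of one device, in ohms

n = devices_needed(C_TARGET, W_UM, L_UM)
print(f"{n} devices, total C = {n * device_capacitance(W_UM, L_UM) * 1e9:.3f} nF, "
      f"array ESR = {array_esr(ESR_DEVICE, n) * 1e3:.1f} mohm")
```

Because the device count scales with the target capacitance while the single-device ESR is fixed by the aspect ratio, a larger array lowers the total ESR in this simple model.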