A 43mW SINGLE-CHANNEL 4GS/s 4-BIT FLASH ADC IN 0.18µm CMOS

by

Samad Sheikhaei

M.A.Sc., Sharif University of Technology, 1999
B.A.Sc., Sharif University of Technology, 1996

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

The Faculty of Graduate Studies
(Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

October 2008

© Samad Sheikhaei, 2008

ABSTRACT

The continued speed improvement of serial links and the appearance of new communication technologies, such as ultra-wideband (UWB), have introduced increasing demands on the speed and power specifications of high-speed low-to-medium resolution analog-to-digital converters (ADCs). While multi-channel ADCs can achieve high speeds, they often require extensive and costly post-fabrication calibration. A single-channel 4-bit flash ADC, suitable for the above-mentioned or similar applications and implemented entirely using current-mode logic (CML) blocks, is presented. CML implementation allows for high sampling rates, while typically providing low power consumption at high speeds. To improve the conversion rate, both the analog (comparator array) and the digital (encoder) parts of the ADC are fully pipelined. Furthermore, the logic functions in the encoder are reformulated to reduce wire crossings and delay and to equalize the wire lengths in the layout. To keep the design simple, inductors are avoided. As a result, a compact design with small wire parasitics is achieved. Moreover, some geometric layout techniques, including a common-centroid layout for the resistor ladder, are introduced to reduce the effect of mismatches and eliminate the need for digital calibration. The ADC is designed and fabricated in 0.18µm CMOS and operates at 4GS/s. It achieves an effective number of bits (ENOB) of 3.71 (3.14, 2.75) for a 10MHz (0.501GHz, 1.491GHz) signal sampled at 4GS/s (3GS/s, 3GS/s).
Differential and integral nonlinearity (DNL and INL) errors are within ±0.35LSB and ±0.26LSB, respectively. The ADC consumes 43mW from a 1.8V supply and occupies 0.06mm² of active area. Due to the use of CML circuits, the ADC achieves the highest speed reported for a single-channel 4-bit ADC in a 0.18µm CMOS technology. It also achieves the best power performance among the 4-bit ADCs with similar or higher speeds. The active area is also among the smallest reported.

In addition, in this thesis, the signal-to-noise-ratio (SNR) of an ADC is formulated in terms of its INL performance. The related formulas in the literature are not accurate for low-resolution ADCs, and they do not take the input waveform into account. Two standard waveforms, ramp and sinusoid, are considered here. The SNR formulas are derived and confirmed by simulation results.

TABLE OF CONTENTS

Abstract
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgments
Dedication
1 Introduction
    1.1 Motivation
    1.2 Objectives
    1.3 Contributions
        1.3.1 Implementation of the Complete ADC using CML Blocks, Achieving a High-Speed, Low-Power, and Compact Design
        1.3.2 Reformulation in the Encoder Function
        1.3.3 Introducing a Common-Centroid Layout for the Resistor Ladder
        1.3.4 Time-Domain Analysis of INL Effects on the SNR of ADCs
    1.4 Outline
2 Background
    2.1 High-Speed ADC Architectures
    2.2 Building Blocks of a Flash ADC
        2.2.1 Track-and-Hold
        2.2.2 Preamplifier
        2.2.3 Comparator
    2.3 Offset of the Comparator
        2.3.1 Offset Removal or Reduction
        2.3.2 Input Offset Storage
        2.3.3 Offset Averaging
        2.3.4 Digital Calibration
    2.4 Digital Encoder
        2.4.1 Sources of Errors in a Flash ADC
        2.4.2 Gray Coding
    2.5 Time-Interleaved ADCs
    2.6 Review of the Previous Work in the Area of 4-Bit ADCs in CMOS Technology
        2.6.1 A 68mW 1.356GS/s 4-Bit Flash ADC in 0.18µm CMOS
        2.6.2 A 1W 12GS/s 4-Bit ADC in 0.25µm CMOS using Eight Time-Interleaved Flash ADCs
        2.6.3 A 0.6W 4GS/s 4-Bit Flash ADC in 0.18µm CMOS
        2.6.4 A 225mW 10GS/s 4-Bit Flash ADC in 0.13µm CMOS
        2.6.5 A 2.5mW 1.25GS/s 4-Bit Flash ADC in 90nm CMOS
        2.6.6 A 40GHz-Bandwidth 4-Bit Time-Interleaved ADC using GaAs Photoconductive Sampling
        2.6.7 Performance Summary of the Previous Work
3 Flash ADC Design
    3.1 Introduction
    3.2 Comparator
        3.2.1 Preamplifier
        3.2.2 Large Signal Analysis of a Differential Pair Amplifier
        3.2.3 Input-Referred Offset Voltage of a Differential Pair Amplifier
        3.2.4 Resistor Averaging in the Preamplifier
        3.2.5 CML Latch
        3.2.6 Overdrive Recovery
    3.3 Encoder
        3.3.1 Encoder Design
        3.3.2 Pipelining
        3.3.3 CML Implementation
        3.3.4 Simulation Results
        3.3.5 Further Modifications in the Encoder
    3.4 Reformulation of the Encoder Function
    3.5 Differential Clocking in the ADC
        3.5.1 Effect of Amplitude/Phase Mismatch in Clock Signals
        3.5.2 Clocking of a Two-Channel Time-Interleaved ADC
    3.6 Resistor Ladder
    3.7 More Discussion on Gray Coding
        3.7.1 Effect on Metastability
        3.7.2 Effect on Bubble Errors
4 Chip Layout and Measurement Results
    4.1 Layout
        4.1.1 Distribution of the Input and Clock Signals to the Comparator Array
        4.1.2 Common-Centroid Layout in the Preamplifier
        4.1.3 Common-Centroid Layout for the Resistor Ladder
    4.2 Post-Layout Simulation Results
    4.3 Output Driver
    4.4 Test Setup
    4.5 Measurement Results
        4.5.1 Power Consumption
        4.5.2 DNL/INL Performance Measurements
        4.5.3 ENOB (SNDR) Measurements
        4.5.4 SFDR Measurements
        4.5.5 Output Waveforms
        4.5.6 Frequency Spectrum of the Output
        4.5.7 Bit-Error Rate Measurements
        4.5.8 Input Port and Clock Port Capacitance
        4.5.9 Propagation Delay of the ADC
        4.5.10 Measurements Summary
5 Time-Domain Analysis of INL Effects on the SNR of ADCs
    5.1 Introduction
    5.2 Transfer Characteristic of an ADC
    5.3 Noise Power Calculations
        5.3.1 Time-Domain Equations for Noise Power and SNR
        5.3.2 A Ramp Input Signal
        5.3.3 A Sinusoidal Input Signal
    5.4 Simulation Results
    5.5 Summary
6 Conclusions
    6.1 Scalability
        6.1.1 Newer CMOS Technologies with Lower Supply Voltages
        6.1.2 Multi-Channel Architectures
        6.1.3 Increasing the Speed; Speed and Power Trade-off
        6.1.4 Increasing the Resolution
    6.2 Limitations
        6.2.1 Limitations of the ADC Applications
        6.2.2 Limitations of the Scalability
        6.2.3 Limitations of the Contributions
    6.3 Future Work
        6.3.1 Enhancement to the Proposed Flash ADC
        6.3.2 Reconfigurable Circuit with Speed and Power Trade-off
        6.3.3 Using Inductors in the Preamplifier Stage to Increase Speed and Bandwidth
        6.3.4 Further Enhancements to the Proposed ADC
        6.3.5 Extending the Ideas to Other Circuits
References
Appendix A: Design and Test of a 1GHz Comparator in 0.35µm CMOS
    A.1 Introduction
    A.2 Block Diagram of the Proposed Comparator
    A.3 Circuit Blocks of the Proposed Comparator
    A.4 Slight Amplification by the Transmission Gate
    A.5 Offset of the Comparator
    A.6 Measurement Results
    A.7 Summary
Appendix B: ADC Performance Metrics
    B.1 Static Metrics
    B.2 Dynamic Metrics

LIST OF TABLES

Table 2.1: Performance summary of the previous work
Table 3.1: Comparison between the three implementations of Figure 3.17
Table 3.2: Comparison between CML implementation of the circuits in Figure 3.17
Table 3.3: Amplitude of the encoder input and output signals
Table 3.4: Binary codes versus Gray codes
Table 3.5: Resultant code after inserting a bubble error into a thermometer code N
Table 4.1: Summary of the measured performance and comparison with recently published work
Table 5.1: The NQ_sin / NQ_ramp ratio for different ADC resolutions
Table A.1: Sampling frequency measurement results for three sample chips
Table A.2: Measurement results averaged for three chips compared with [25]

LIST OF FIGURES

Figure 1.1: Multi-level serial data transmission [2]
Figure 2.1: Flash ADC architecture
Figure 2.2: Mismatches between sampling instances of adjacent comparators in a distributed T/H [30]
Figure 2.3: A preamplifier with reset switch [25]
Figure 2.4: Two latch stages of a 1.3GHz comparator [25]
Figure 2.5: Probability of achieving a good linearity (e.g., DNL<0.5LSB) versus offset standard deviation for a 6-bit flash ADC [29]
Figure 2.6: Input offset storage or “auto-zeroing” function [6]
Figure 2.7: Offset averaging resistors (R1) connecting adjacent preamplifiers [41]
Figure 2.8: Dummy (over-range) preamplifiers in a 4-bit flash ADC [42]
Figure 2.9: Trimming by differential current DAC [18]
Figure 2.10: (a) Nominal decision levels of comparators (b) Example of the actual decision levels. Highlighted comparators are selected during calibration [33][39].
Figure 2.11: Using three-input NAND gates to remove single bubbles [31]
Figure 2.12: Time-interleaved architecture for ADCs [47]
Figure 2.13: Error signals as a result of gain mismatch (for M = 4) [47]
Figure 2.14: 4-bit ADC reported in [50]
Figure 2.15: Amplifier and latch used in 4-bit ADC of [51]
Figure 2.16: Comparator block diagram of the 4-bit ADC in [18]
Figure 2.17: Circuit of the comparator core [18]
Figure 2.18: 10GS/s 4-bit flash ADC of [12]
Figure 2.19: 4-bit ADC reported in [54]
Figure 3.1: Block diagram of the ADC
Figure 3.2: Block diagram of the comparator
Figure 3.3: Timing diagram for the comparator operation
Figure 3.4: Circuit details of the comparator
Figure 3.5: Schematic of the preamplifier
Figure 3.6: Schematic of a differential pair for large signal analysis
Figure 3.7: Large signal I-V characteristic of a differential pair [58]
Figure 3.8: Circuit for calculation of input-referred offset voltage in a differential pair amplifier
Figure 3.9: Simulation results of the effect of the resistor averaging on input-referred offsets in the comparator array. In simulations, randomly generated offset voltages with zero mean and standard deviation of 1 LSB are applied to the comparators. Horizontal axes are the comparator numbers. In figures (a) to (c), dummy comparators are shown in gray color.
Figure 3.10: Architecture of a CML latch
Figure 3.11: Timing diagram for a CML latch
Figure 3.12: Simplified circuit of the CML latch, in the latch mode
Figure 3.13: Simplified schematic for calculating bandwidth of a differential pair amplifier
Figure 3.14: Simulated waveforms of the clock, inputs, and outputs of the comparator, showing overdrive recovery under the stressful test conditions.
Figure 3.15: Schematic of a simple 4-bit encoder
Figure 3.16: Pipelined structures with the worst case delays of
Figure 3.17: Three implementations of the encoder
Figure 3.18: Critical paths of the encoder circuits in Figure 3.17
Figure 3.19: Implementation of (a) AND/NAND gate (b) XOR gate and (c) Latch in CML
Figure 3.20: Bar graph comparison of the speed and power of the CML implementations of Figure 3.17. The encoder with no-pipelining is used as the base circuit for this comparison.
Figure 3.21: Simulated output waveforms of the four-stage pipelined encoder (Figure 3.17(c)) for a thermometer code corresponding to a ramp signal
Figure 3.22: Delay insertion to equalize the delays in the different signal paths of Figure 3.17(b)
Figure 3.23: An AND/NAND gate with a similar topology to an XOR gate
Figure 3.24: The encoder all implemented in XOR gates
Figure 3.25: Routing of the comparator outputs to the encoder
Figure 3.26: Encoder structure after reformulation and four-stage pipelining
Figure 3.27: A visual proof for reformulation of the encoder function
Figure 3.28: Timing diagram of the two-channel ADC
Figure 3.29: Feedthrough of input signal to resistor ladder through preamplifier [30]
Figure 3.30: Binary versus Gray counter
Figure 3.31: Effect of an error in B3 (a) in a binary code and (b) in a Gray code. The top waveform is related to the correct switching time. The bottom waveforms are related to earlier or later switching. The respective codes are shown below the waveforms. It is assumed that all other bits are switching at the right time.
Figure 4.1: ADC chip micrograph
Figure 4.2: Propagation delay estimation of the input and clock distribution across the comparator array using Elmore delay. The delay for node m is estimated as (1/2)RCm².
Figure 4.3: (a) Schematic of a differential pair, (b) common-centroid layout for (a),
Figure 4.4: A common-centroid layout for the resistor ladder.
Figure 4.5: Post-layout simulation results (ADC output codes vs. the sample numbers)
Figure 4.6: Output driver to drive the 50Ω input resistance of the oscilloscope
Figure 4.7: ADC test setup
Figure 4.8: Pie diagram of the power consumption in different blocks of the ADC
Figure 4.9: Measured DNL/INL for a 11.131MHz signal sampled at 3GHz
Figure 4.10: ENOB (SNDR) graph for a 10MHz input signal
Figure 4.11: ENOB (SNDR) graphs for 3 and 3.5GHz clock frequencies
Figure 4.12: SFDR measurement results
Figure 4.13: ADC output waveforms (horizontal axes are the sample numbers)
Figure 4.14: ADC output frequency spectrum for the waveforms of Figure 4.13
Figure 5.1: Operation of an ADC on a triangular waveform
Figure 5.2: The transfer characteristics of ADCs
Figure 5.3: Output waveforms for an ideal ADC with (a) a ramp and (b) a sinusoidal input signal
Figure 5.4: Comparison of the SNRs achieved using the derived formulas, time-domain and frequency-domain simulations, and Equation (5.2)
Figure 5.5: FFT of the simulated ADC output for a 123 Hz sinusoidal signal sampled at 1 kHz for (a) Frms = 0 (b) Frms = 0.39 LSB
Figure 6.1: A CML latch and the method to adjust its load resistance and tail current
Figure 6.2: Speed improvement in ADC using inductors in preamplifier
Figure A.1: Architecture of the proposed comparator
Figure A.2: The proposed comparator
Figure A.3: Charge injection effect in a full TG
Figure A.4: Amplification by a full TG
Figure A.5: The test setup
Figure B.1: Static ADC metrics [22]
LIST OF ABBREVIATIONS

ADC: Analog-to-Digital Converter
BER: Bit Error Rate
CML: Current-Mode Logic
CMOS: Complementary Metal Oxide Semiconductor
DAC: Digital-to-Analog Converter
DC: Direct Current
DLO: Decision-Level Offset
DNL: Differential Nonlinearity
ENOB: Effective Number of Bits
FoM: Figure of Merit
INL: Integral Nonlinearity
ISI: Inter-Symbol Interference
LSB: Least Significant Bit
MSB: Most Significant Bit
NMOS: Negative-Channel Metal-Oxide Semiconductor
PAM: Pulse Amplitude Modulation
PMOS: Positive-Channel Metal-Oxide Semiconductor
PVT: Process, Voltage, Temperature
SFDR: Spurious-Free Dynamic Range
SNDR: Signal-to-Noise-and-Distortion-Ratio
SNR: Signal-to-Noise-Ratio
T/H: Track-and-Hold
TG: Transmission Gate
TI: Time-Interleaved
UWB: Ultra-Wideband

ACKNOWLEDGMENTS

I would like to express my gratitude to my supervisors, Professors Shahriar Mirabbasi and André Ivanov, for their enthusiasm, guidance, and unconditional support. It was an honor to work with them, and I thank them deeply for giving me the opportunity to join their research groups.

I am thankful to my fellow Ph.D. students, Mehdi Alimadadi and Shahrzad Jalali Mazlouman, for all their support and the technical discussions that played a significant role in my research. Also, I would like to acknowledge Dr. Roberto Rosales for his technical assistance and willingness to help.

I wish to express my deepest appreciation to my parents, brothers, and sisters for their unconditional love and endless support. They have always been the greatest source of inspiration for me, and I dedicate this thesis to them.

This research was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC), the Bell University Laboratories (BUL) program, Micronet, and CMC Microsystems.
DEDICATION

To my family

1 INTRODUCTION

1.1 Motivation

High-speed analog-to-digital converters (ADCs) are the key building blocks in many applications, including high-data-rate serial links [1]–[3], ultra-wideband (UWB) systems [4], the read channels of magnetic and optical data storage devices [5], high-speed instrumentation [6][7], and wideband radar and optical communications [8]. A majority of these applications require 4 to 6 bits of resolution at multi-GHz conversion rates.

For high-speed low-to-medium resolution ADCs, the most appropriate architecture is flash. This architecture is widely used in ADCs with resolutions of 7 bits or less. In CMOS technology, for 6-bit resolution, conversion rates of 1GHz and beyond have been previously reported. For 4-bit resolution, multi-GHz rates are also achieved using time-interleaved architectures. It is important that such ADCs be implemented in a standard CMOS process for easy integration with other digital signal processing circuits. The main challenges in designing high-speed CMOS flash ADCs are optimizing the speed and power, static and dynamic offset reduction, calibration, and low supply voltage operation [9].

One application of high-speed low-to-medium resolution ADCs is in serial links. Because of their lower cost and lower power consumption, serial links are attracting more and more attention in wireline data transmission. PCI-Express [10] and Serial ATA [11] are two examples of multi-gigabit-per-second serial links. The conventional binary pulse amplitude modulation (2-PAM) cannot accommodate the high data rates of many modern applications due to the intrinsic bandwidth limitation of the transmission medium. To address this problem, multi-bit-per-symbol modulations (multi-level PAM) appear as an attractive solution [12]. As an example, for a given transfer rate, an 8-PAM modulation scheme reduces the symbol rate of the channel to one third of that of a conventional 2-PAM modulation.
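The one-third figure follows from each M-level PAM symbol carrying log2(M) bits. A minimal Python sketch of that arithmetic is given here; the function name and the 6 Gb/s bit rate are illustrative assumptions, not values taken from this thesis.

```python
import math

def symbol_rate(bit_rate_bps, pam_levels):
    """Symbol rate (symbols/s) needed to carry bit_rate_bps with M-level PAM.

    Each symbol of an M-level PAM scheme carries log2(M) bits.
    """
    bits_per_symbol = math.log2(pam_levels)
    return bit_rate_bps / bits_per_symbol

# Assumed example bit rate (hypothetical, for illustration only).
bit_rate = 6e9

rate_2pam = symbol_rate(bit_rate, 2)  # 1 bit per symbol
rate_8pam = symbol_rate(bit_rate, 8)  # 3 bits per symbol
ratio = rate_8pam / rate_2pam         # fraction of the 2-PAM symbol rate

print(f"2-PAM: {rate_2pam:.3g} symbols/s, 8-PAM: {rate_8pam:.3g} symbols/s")
```

For any bit rate, the 8-PAM to 2-PAM ratio is 1/log2(8) = 1/3, which is the reduction in channel symbol rate quoted in the text.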
With the lower symbol rate, the inter-symbol interference (ISI) in the channel is also reduced.

Figure 1.1(a) shows a basic block diagram of a multi-level serial data transmission between two chips. Chip A uses an N-bit digital-to-analog converter (DAC) to transmit the data across the channel and Chip B recovers the data using an N-bit ADC [2]. The serial link data rate is the same as the parallel data rate. In the case of an 8-PAM data transmission, the waveforms are shown in Figure 1.1(b). For the proper operation of the receiver, the analog signal should be sampled close to the center of each symbol. Alternatively, the received data can be oversampled, as shown in Figure 1.1(b), to determine the sample that is closest to the center of the transmitted symbol [2]. This example shows how achieving higher speeds in the ADC design translates to higher rates and/or more reliable data transmission in a serial link.

1.2 Objectives

This research is aimed at developing design techniques for flash ADCs with emphasis on high-speed and low-power operation, as compared to previous work. In order to integrate the ADC with other analog and digital circuits on a single chip, the circuit is designed in CMOS technology. The design is targeted for applications such as high-speed serial links and UWB that require 4 bits of resolution at multi-GHz speeds. However, the applicability of the techniques to higher resolution flash ADCs (such as 5- and 6-bit ADCs) is also considered. As the focus is on the circuit design aspects, a single-channel ADC is targeted to avoid the complexity of on-chip calibration and/or digital post-processing of the multi-channel architectures. The ADC is targeted for 0.18µm, as it was the mainstream CMOS technology at the beginning of this research. However, the circuit should be portable to smaller feature size CMOS technologies with lower supply voltages.
A speed of 4GS/s is considered, as it is more than two times faster than the fastest reported single-channel flash ADC in CMOS technology at the time this research started (a 1.6GS/s 6-bit flash ADC in 0.18µm [13]). Nonetheless, the methods to increase the speed, and the speed/power trade-off, should be investigated.

Figure 1.1: Multi-level serial data transmission [2]: (a) basic block diagram; (b) output waveforms for an 8-PAM serial link, sampled at a 3× speed

1.3 Contributions

1.3.1 Implementation of the Complete ADC using CML Blocks, Achieving a High-Speed, Low-Power, and Compact Design

In this thesis, the design and measurement results of a single-channel 4-bit flash ADC in 0.18µm CMOS are presented [14]−[17]. All the ADC sub-blocks, including the comparators and the encoder, are implemented using current-mode logic (CML) circuits. These CML circuits include a current-mode preamplifier, a CML latch, a CML AND/NAND gate, and a CML XOR gate. The similarity of the sub-blocks facilitates increased matching in the layout. Also, due to the CML implementation, the ADC has a number of features as follows. (1) All the signals in the circuit are differential and low swing. Differential operation results in higher immunity to the common-mode noise, while low-swing operation leads to lower noise generation. (2) As all the clocked transistors in the circuit are differential pairs, low-swing differential clock signals (as well as sinusoidal clocks) can be used. This low-swing operation is important, especially at high speeds. (3) All the stages of this ADC consist of differential pairs and none of the tail current sources are turned off. This minimizes the supply noise due to switching (current ripple less than 10%), which removes the need for on-chip supply-voltage decoupling capacitors.

In addition to the CML implementation, the comparator array and the encoder are fully pipelined. As a result, the flash ADC achieves a sampling rate of 4GS/s.
To the author’s knowledge, this is the fastest reported single-channel 4-bit ADC in a 0.18µm or an older CMOS technology. The ADC is also the most power efficient among the 4-bit ADCs with similar or higher speeds. Also, as inductors are not used, the ADC has one of the smallest reported active areas in those technologies. Inductors can be used in the comparators to improve their bandwidth; however, they are avoided here to keep the circuit simple and compact with lower wire parasitics, which allows for high-speed and/or low-power operation.

A Gray coding encoder is used to increase the immunity against bubble errors and metastability. As a result, complex digital schemes such as Wallace tree counting [18] or digital averaging [19] are avoided. The 4GS/s 4-bit flash ADC of [18] is implemented in 0.18µm CMOS and consumes ~0.6W, and the 4GS/s 6-bit flash ADC of [19] is implemented in 0.13µm CMOS and consumes 1W. It should be noted that the speed of 4GS/s in [18] is achieved using a single-channel comparator array; however, the encoder is time-interleaved. Those reported powers include that of the clock buffer, which is expected to be on the order of 1/3 of the total power. The power consumption of the proposed ADC is 43mW (with no clock buffer), which represents a significant reduction (~10×) in the power consumption compared to [18]. Part of this power saving is also due to the CML implementation, in which the power consumption of each block is dictated by a bias current. Therefore, the power does not increase with higher sampling rates.

1.3.2 Reformulation in the Encoder Function

High-speed operation in the encoder requires both circuit-level and layout-level considerations. An important challenge in the encoder layout is the connection of the comparator outputs to the encoder inputs. Long routings, different wiring lengths for different comparator outputs, and crossings of the wires adversely affect the encoder speed.
In this work, the expressions for converting the thermometer codes to Gray codes are reformulated in order to overcome these layout issues. The proposed reformulation [17] takes advantage of the special properties of thermometer codes. In the new formulas, only the outputs of the physically adjacent comparators in the layout are connected to the same gate in the encoder. Therefore, all long wires that connect the comparator outputs to the first gate level in the encoder are replaced with short wires of equal length. As a result, the numbers of long wires and wiring crossovers are greatly reduced. Without this reformulation, achieving the reported speed of 4GS/s is only possible by either increasing the drive power of the last latch stage in the comparator, or by adding an extra level of CML latches at the midpoint of the wires connecting the comparator outputs to the first gate level. Both of these solutions would result in higher complexity and also higher power consumption.

1.3.3 Introducing a Common-Centroid Layout for the Resistor Ladder

The resistor ladder is spread over the entire comparator array input to provide the reference voltages for all comparators. Therefore, resistors in the ladder are prone to systematic mismatches arising from fabrication process variations, and can potentially increase the decision-level offsets (DLOs; refer to Appendix B) of the ADC. A common-centroid layout is introduced in this work for the resistor ladder to reduce the effect of those mismatches [17]. Each resistor is first broken into a number of parallel segments such that the resistor ladder forms a grid of resistors. The resistor grid is then twisted to interleave the resistor segments symmetrically such that the structure ends up possessing a common center. This technique is more effective in flash ADCs with a larger area (e.g., those that include inductors in their design).
Moreover, it can be used in flash ADCs with higher resolutions, as they require a larger on-chip area for the resistor ladder, and are also more sensitive to the offsets of the reference voltages.

1.3.4 Time-Domain Analysis of INL Effects on the SNR of ADCs

An accurate formula is derived for the signal-to-noise ratio (SNR) of an ADC based on its integral nonlinearity (INL) performance [20]. Effects of differential nonlinearity (DNL) and INL errors on the ADC operation are well-studied in the literature, including the formulas that are given to estimate SNR in the presence of jitter and nonlinearity. Those formulas have reasonable accuracy for high-resolution ADCs (with more than ~7 bits of resolution). However, they are not accurate for ADCs with lower resolutions, such as a 4-bit flash ADC. In addition, other formulas are generic and do not take the input waveform into account. The type of input is definitely important for estimating the SNR in low-resolution ADCs. In this thesis, accurate SNR formulas are derived versus the INL performance for two different input waveforms: a ramp signal, as a good representative of a uniformly distributed input waveform; and a sinusoidal signal.

1.4 Outline

The thesis is organized as follows. Chapter 2 provides a background on flash ADC design, including building blocks of a flash ADC, offset removal/reduction in the comparators, and a review of the previous work in the area of 4-bit flash ADCs. Chapter 3 covers the design details of the proposed 4-bit flash ADC, including the block diagram, architecture and circuit design of the comparator and the encoder, pipelining and CML implementation, and reformulation in the encoder for speed improvement. Chapter 4 presents the layout techniques used in the design, the test setup, and the measurement results for the flash ADC.
Chapter 5 derives the formulas that describe the SNR of an ADC in terms of its INL performance, and Chapter 6 provides concluding remarks and suggestions for future work.

2 BACKGROUND

2.1 High-Speed ADC Architectures

Among the different ADC architectures, pipeline, folding, and flash architectures are the most popular candidates for a high-speed design [21][22]. Pipeline ADCs [23] are usually used for high resolutions. However, large latency is a drawback, and proper sampling at higher rates is challenging. Folding ADCs [24] consume less power and have lower latency as well as the possibility of operation at higher sampling rates. However, they suffer from the limited bandwidth of the analog path. Flash [2][5][8][9][21][25]–[29] is the architecture of choice for a high-speed ADC design, for applications that require 6 bits of resolution or lower. Interpolation is also an option in a flash ADC [26] that reduces the power consumption by removing some preamplifiers in the preamplifier array. However, as each preamplifier drives more than one comparator, it comes at the cost of a reduction in the speed, as the capacitive load for each preamplifier increases.

Multi-channel architectures, such as frequency- or time-interleaved systems, are alternative options in achieving very high-speed circuits. However, these parallel ADCs require extra circuit area and usually suffer from mismatches among the channels that call for costly calibration algorithms [6]–[8].

2.2 Building Blocks of a Flash ADC

A generic architecture for a flash ADC is shown in Figure 2.1. The input signal of the ADC is compared against evenly-spaced reference voltages generated by a resistor ladder. Comparators, including several amplification-and-latching stages, amplify the differences between the input signal and those reference voltages. They deliver the comparison results as an array of digital bits or a codeword to the encoder.
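The comparison step described above can be sketched behaviorally as follows. This is an idealized, single-ended model with no offsets or noise; the 0 V to 1.6 V reference range, the input value, and the function names are assumptions chosen for illustration and do not describe the differential circuit of this thesis.

```python
def reference_levels(n_bits, v_low, v_high):
    """Evenly spaced references for the 2**n_bits - 1 comparators (ideal ladder)."""
    lsb = (v_high - v_low) / 2 ** n_bits
    return [v_low + (k + 1) * lsb for k in range(2 ** n_bits - 1)]


def comparator_outputs(v_in, refs):
    """One bit per comparator: 1 where the input exceeds that reference."""
    return [1 if v_in > ref else 0 for ref in refs]


if __name__ == "__main__":
    refs = reference_levels(4, 0.0, 1.6)    # 15 references, 0.1 V apart (assumed range)
    codeword = comparator_outputs(0.55, refs)
    print(codeword)                         # ones up to the input level, zeros above it
    print(sum(codeword))                    # number of references below the input
```

For the assumed 0.55 V input, the five lowest comparators output 1 and the remaining ten output 0.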
Such a codeword is called a thermometer code, due to the thermometer-like appearance of 1’s and 0’s in it (see Figure 2.1). The encoder converts the thermometer code to a Gray or a binary code. Flash ADCs have a simple architecture. However, the circuitry required to achieve very high speeds at reasonable power consumption, and also the provisions required to overcome the comparator offsets, make their implementation challenging. In the following sections, the main building blocks of a flash ADC, including track-and-hold, preamplifier, comparator, and encoder, are explained. The main challenges, as well as sample state-of-the-art circuits, are presented for each block.

Figure 2.1: Flash ADC architecture

2.2.1 Track-and-Hold

Track-and-hold (T/H) is usually the first stage in an ADC. The T/H samples the analog input signal at predefined sampling times and holds the sample static for digitization by the rest of the ADC circuit. T/Hs in a flash ADC can be in the form of a single front-end T/H or an array of distributed T/Hs. A front-end T/H samples the signal for the overall comparator array, while distributed T/Hs sample the signal for each comparator [28]. Therefore, front-end T/Hs are more challenging to design, as they should have higher linearity, higher speed, and also higher drive capability in order to overcome the input capacitance of a large number of comparators [30]. Distributed T/Hs, on the other hand, are easier to design and they improve the conversion rate by pipelining the analog path. However, their performance degrades as a result of several mechanisms under Nyquist-rate conditions. Some of those mechanisms include capacitive feedthrough from the analog input signal to the resistor ladder, mismatches between the sampling instants of adjacent comparators at high frequencies, as shown in Figure 2.2, and a slew-rate limited input stage that changes the shape of the input signal [22][30].
Figure 2.2: Mismatches between sampling instants of adjacent comparators in a distributed T/H [30]

2.2.2 Preamplifier

High-speed comparators usually contain one or more stages of pre-amplification followed by several track-and-latch stages. The preamplifiers typically have a low gain and are used to increase the resolution and to minimize the effect of kickback noise. Kickback refers to the charge injection into the input when a track-and-latch stage goes from the track mode to the latch mode [31]. The differential difference amplifier structure shown in Figure 2.3 is a common circuit used as a preamplifier [25].

Figure 2.3: A preamplifier with reset switch [25]

Loads can be either diode-connected transistors [21][25] or resistors to attain higher bandwidths [13][29]. This structure provides a very high unity-gain bandwidth and reasonably high output swings. In the case of a large differential input voltage, the overdrive voltages of the input transistors are high and those transistors can go into the linear region. Recovery from such a state requires a long time, which slows down the preamplifier. In [25], a reset switch Msw is inserted between the two output nodes, as shown in Figure 2.3. This switch is intended to reset the preamplifier in each new clock cycle, to help it override its previous decision, which could be the result of a large overdrive voltage. While the T/H (the stage prior to the preamplifier) is in the hold mode, the reset switch is off. When the T/H goes to track mode, the reset switch is turned on to erase the residual voltage from the previous sample. This is referred to as overdrive recovery. Failing to recover from an overdrive would result in a dynamic offset in the comparator, which means that the comparator keeps a memory of its previous decision.

2.2.3 Comparator

Usually a single-stage, high-gain and high-speed comparator is quite power hungry.
In addition, the preamplifier stage also consumes more power to drive the large transistors of such a comparator. Therefore, a multistage cascaded comparator is preferred, in which the output voltage swing is increased stage by stage [32]. The main design challenges in a comparator block are speed, power consumption, resolution and, last but not least, offset cancellation.

Figure 2.4 shows a comparator that is designed to be used in a 6-bit 1.3GS/s ADC in 0.35µm CMOS [25]. This comparator has two latching stages. In the first latch, shown in Figure 2.4(a), when CLK is low, the differential output follows the differential input. During this period, the switch M7 is closed, which erases any memory from the previous decision, and the new input is sampled with a less than unity gain at the latch outputs. When CLK goes high, the sampled signal is further amplified and latched. The second latching stage, shown in Figure 2.4(b), provides rail-to-rail output voltages. During the reset mode (CLK = high), the outputs of the latch are reset through clocked transistors and the input is copied to the cross-connected inverters of M1-M3 and M2-M4. In the next half clock cycle (the regeneration mode), differential pair devices (M1, M2, and M5) steer the tail current from one side to the other, to speed up the regeneration process.

Figure 2.4: Two latch stages of a 1.3GHz comparator [25]: (a) the first latch; (b) the second latch

2.3 Offset of the Comparator

Fabrication process variations may introduce mismatches in the different parts of an integrated circuit. Those mismatches can change the comparator decision levels and result in DNL errors and, more importantly, can cause nonuniformity. So-called bubbles can also appear in the thermometer code, resulting in a degradation of ADC linearity and signal-to-noise-and-distortion-ratio (SNDR) [33].
Therefore, there is a limit for the offset voltage standard deviation of the comparator in order to guarantee that the design achieves a certain performance with a high probability [29][34]. Figure 2.5 demonstrates the probability of achieving a good linearity (e.g., DNL < 0.5LSB) versus offset standard deviation for a 6-bit flash ADC.

Offsets are generally divided into three categories [35]: systematic, dynamic and random. Systematic offsets appear as a result of deterministic variations in the fabrication process, such as an increase in the doping level of the n+ regions along a particular axis of the wafer. Some layout techniques such as common-centroid [36] can be used to minimize those offsets. Dynamic offsets occur due to the limited bandwidth of the ADC. They can arise, for example, from a failure of overdrive recovery in the comparator, which leaves a residual voltage from the previous sample. Random offsets are the result of random variations in the fabrication process that cause mismatches to be introduced in the size and threshold voltage of the transistors. Those mismatches are inversely proportional to the square root of the transistor area, as given by the following equations [35][37]:

$$\sigma(V_{th}) = \frac{A_{VTH}}{\sqrt{WL}} \qquad (2.1)$$

$$\sigma\!\left(\frac{\Delta W}{W}\right) = \frac{A_{W}}{\sqrt{WL}} \qquad (2.2)$$

where $W$, $L$, and $V_{th}$ are the width, length, and threshold voltage of the transistors, and $A_{VTH}$ and $A_{W}$ are process-dependent parameters.

Figure 2.5: Probability of achieving a good linearity (e.g., DNL < 0.5LSB) versus offset standard deviation for a 6-bit flash ADC [29]

2.3.1 Offset Removal or Reduction

Reducing or removing the comparator offset is a major challenge in designing flash ADCs. Besides using preamplifiers, various methods are introduced in the literature for removing/reducing the comparator offsets in flash ADCs. These methods fall into three categories: input offset storage, offset averaging, and digital calibration.
Offset averaging is the most popular technique for offset reduction in high-speed ADCs [8][9][13][23][25][28]. Input offset storage is a common method for offset reduction in ADCs that are not constantly working and have idle time periods in their operation [21][27]. Digital calibration is the ultimate way for accurate offset removal that improves the resolution at the cost of increased digital complexity of the system [23][38][39]. Sometimes this method is used in conjunction with one or both of the other analog offset reduction techniques.

2.3.2 Input Offset Storage

This technique measures the input offset voltage of the preamplifier by closing a unity-gain feedback loop around it and stores the measured offset on the capacitors that are added in series with the input ports [22]. This operation is sometimes referred to as the “auto-zeroing” function. Figure 2.6(a) shows a schematic of two preamplifiers in series that are part of a comparator [6]. The on-off timing for the switches for the proper operation of the auto-zeroing function is shown in Figure 2.6(b). This method can be used, as an example, in disk drive read channels that allow idle times during the repositioning of the head [21].

Figure 2.6: Input offset storage or “auto-zeroing” function [6]: (a) the circuit schematic; (b) on-off timing of the switches for proper operation

2.3.3 Offset Averaging

Invented by Kattmann and Barrow in 1991 [40], today this technique is used in almost all flash or folding-interpolating ADCs. In this method, as shown in Figure 2.7 [41], a resistor network is used to connect the outputs of the adjacent preamplifiers in the preamplifier array. This resistor network has no impact on the circuit operation when the devices are matched.
When mismatch occurs and causes error, the resistor network uses the average of many preamplifier outputs and produces a restoring force that pushes the errors toward zero and reduces DNL. The error correction factor (ECF) is defined as the percentage improvement in the DNL performance for a given $R_1/R_0$ ratio [40], where $R_1$ is the resistance of the averaging network and $R_0$ is the output resistance of the preamplifier array.

$$\mathrm{ECF} = \left[1 - \frac{R_x/2}{R_0 + R_x/2}\left(1 - \frac{R_0}{R_0 + R_x}\right)\right] \times 100, \qquad R_x = \frac{R_1 + \sqrt{R_1^2 + 4R_1R_0}}{2} \qquad (2.3)$$

In [41], the effect of averaging on the zero crossing points of a typical preamplifier array is studied using Monte Carlo simulations. According to the study in [13], as the ratio $R_1/R_0$ is lowered, DNL and INL errors decrease in proportion to $(R_1/R_0)^{3/4}$ and $(R_1/R_0)^{1/4}$, respectively. For values of $R_1/R_0$ smaller than two, the DNL becomes negligible and the INL remains as the main error.

Figure 2.7: Offset averaging resistors (R1) connecting adjacent preamplifiers [41]

The R0-R1 network forms a spatial filter with impulse response h(n) [25]

$$h(n) = h(0)\, b^{|n|}, \qquad b = e^{-\operatorname{acosh}\left(1 + \frac{R_1}{2R_0}\right)} \qquad (2.4)$$

where n represents the index of differential pairs in the array. The ratio $R_1/R_0$ determines the width of the impulse response. Averaging is optimum when the number of differential pairs in the unclipped linear region of their characteristic is larger than the impulse response width of the network [25][36].

Dummy (or, as sometimes called, over-range) preamplifiers are inserted at either end of the preamplifier array to remove the edge effects by simulating an infinite array. As an example in Figure 2.8, three dummy preamplifiers are added at each end of the preamplifier array in a 4-bit flash ADC. For achieving better performance, as shown in Figure 2.8, the outputs of the first and last dummy preamplifiers are cross connected.
As a result, all the preamplifiers in the array would have the same effective load impedance and, at the same time, a balanced number of preamplifiers are contributing to the preamplifier’s output [42].

Figure 2.8: Dummy (over-range) preamplifiers in a 4-bit flash ADC [42]

An alternative solution to the dummy preamplifiers is presented in [13]. In this solution, the values of the averaging resistors at the edges of the averaging network are changed to simulate the effect of an infinite preamplifier array.

2.3.4 Digital Calibration

Moving toward smaller feature sizes benefits analog and digital circuits at the same time. However, the overall performance of the analog circuits is compromised by trends such as reduced supply voltages. The idea of digitally-assisted analog circuits closes the gap between analog and digital circuits by handing over the analog precision requirements to a digital processor [43]. Three examples of digital calibration are presented here.

2.3.4.1 Calibration by offset compensation

Various offset compensation techniques are introduced in the literature. Among them, trimming by a current DAC is explained here, which is used in a 4GS/s 4-bit flash ADC implemented in 0.18µm CMOS [18][44]. As shown in Figure 2.9, a differential current DAC is connected to the preamplifier outputs. During calibration, the inputs of the preamplifier are shorted. The calibration engine begins the search by applying the entire DAC current to one side of the preamplifier. The current is then steered in 1 DAC LSB increments until the output of the comparator changes. This calibration scheme reduces not only the offset of the preamplifier, but also the input-referred offset of the next stages [18].

Figure 2.9: Trimming by differential current DAC [18]

2.3.4.2 Calibration incorporating redundancy

This calibration scheme allows accuracy to be achieved through the use of redundant components in the fabricated circuit.
As a result, analog performance is decoupled from the component matching. In this method, very large comparator offsets (several LSBs) can be tolerated, allowing small and fast comparators to be used in the ADC design. This also reduces the power consumption of the analog circuits. This technique is used in a 500MS/s 6-bit flash ADC implemented in 0.13µm CMOS [33][39]. In this method, instead of $2^N-1$ comparators, the ADC has a bank of $R\times(2^N-1)$ comparators, with R comparators assigned to each code. During calibration at power-on, a calibration algorithm searches the entire bank and assigns the most suitable comparators to each code. Figure 2.10 shows an example with R equal to 3. The drawback of this method is the reduction in the circuit speed as a result of the switches that route the signals of the selected comparators to the encoder.

Another example of redundant components is given in the 4-bit flash ADC of [18]. In this ADC, a redundant set of comparators is also implemented on the same chip. If the search for calibration (explained in Section 2.3.4.1) fails, then the redundant comparator is calibrated and used.

Figure 2.10: (a) Nominal decision levels of comparators (b) Example of the actual decision levels. Highlighted comparators are selected during calibration [33][39].

2.3.4.3 Digital offset averaging

This method is also based on the redundant elements in the analog part of the ADC, and has been used in a 4GS/s 6-bit flash ADC in 0.13µm CMOS [19]. Instead of 63 comparators, a bank of 255 comparators is utilized. These comparators are small and fast, with relatively large offsets. As a result, a large number of bubble errors appear in the thermometer code. An adder adds up the “1’s” in the thermometer code to achieve an 8-bit binary number. The two LSBs of the summation are later discarded due to the inaccuracies introduced by the offsets. This method is named digital offset averaging, and is comparable to analog offset resistor averaging.
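The ones-counting step described above can be illustrated with a short behavioral sketch. It only mirrors the arithmetic stated in the text (sum 255 comparator bits into an 8-bit count, then discard the two LSBs); the function name and the example bit patterns are assumptions, and the sketch says nothing about how the adder of [19] is built in hardware.

```python
def ones_count_encode(comparator_bits, discard_lsbs=2):
    """Sum the comparator outputs and drop the least significant bits of the sum."""
    count = sum(comparator_bits)      # 0..255 for a bank of 255 comparators
    return count >> discard_lsbs      # 6-bit output code when two LSBs are dropped


if __name__ == "__main__":
    # Ideal thermometer code: 130 ones followed by 125 zeros (assumed example).
    clean = [1] * 130 + [0] * 125

    # The same input seen through offset-shifted comparators: one comparator
    # below the transition reads 0 and one above it reads 1 (bubble errors).
    bubbled = list(clean)
    bubbled[128], bubbled[131] = 0, 1

    print(ones_count_encode(clean))    # 32
    print(ones_count_encode(bubbled))  # 32, since the count of ones is unchanged
```

Because the adder only counts ones, bubbles that move a 1 across the transition leave the sum unchanged, and small changes in the count are absorbed by the discarded LSBs.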
A major drawback of digital offset averaging is the significant power consumption, as all the comparators, including the redundant ones, are operating at the same time. In [19], the power consumption of the ADC is 1W.

2.4 Digital Encoder

The encoder circuit is the last stage of a flash ADC; it converts the thermometer code output of the comparator array into a binary code. Usually, the thermometer code is first converted to a 1-out-of-N code before coding into binary. Two main issues in the encoder design for a flash ADC are speed and error handling capability [45][46].

2.4.1 Sources of Errors in a Flash ADC

There are two types of errors in a flash ADC: metastability and bubble errors. Metastability in a comparator happens when the comparator output is neither logic ‘0’ nor logic ‘1’. As a result, if the comparator output is connected to different logic gates, this undefined output is interpreted by some gates as ‘1’ while others interpret it as ‘0’. This different interpretation results in significant errors in the ADC output. A bubble error happens when the thermometer code is corrupted by a zero among ones or vice versa (having more than one bubble error is also possible). Bubble errors result from three major sources. First, comparator offset voltages larger than 0.5 LSB may change the zero crossing points of two adjacent preamplifiers and result in a bubble error. Second, in ADCs with distributed T/H, different sampling times in different comparators may cause bubble errors. Finally, the variation of the propagation delay through the preamplifier stage at high input frequencies is another source of bubble errors [25]. The simplest method to suppress single bubble errors is using three-input NAND gates, instead of conventional two-input NAND gates, while converting the thermometer code to a 1-out-of-N code. This is shown in Figure 2.11 [31].
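A small behavioral sketch of this idea is given below. It is an assumption-laden illustration rather than the circuit of Figure 2.11: the gates are modeled as active-high AND functions instead of NAND gates, the thermometer code is listed from the lowest comparator upward, and the function names and the example code are hypothetical.

```python
def one_hot_2in(thermo):
    """Two-input detection: line i is active when t[i] = 1 and t[i+1] = 0."""
    t = list(thermo) + [0]                   # virtual 0 above the top comparator
    return [t[i] & (1 - t[i + 1]) for i in range(len(thermo))]


def one_hot_3in(thermo):
    """Three-input detection: line i additionally requires t[i-1] = 1."""
    t = [1] + list(thermo) + [0]             # virtual 1 below, virtual 0 above
    return [t[i] & t[i + 1] & (1 - t[i + 2]) for i in range(len(thermo))]


if __name__ == "__main__":
    # Thermometer code with a single bubble: an isolated 1 above the transition.
    bubbled = [1, 1, 1, 0, 1, 0, 0]
    print(one_hot_2in(bubbled))  # [0, 0, 1, 0, 1, 0, 0] -> two lines active
    print(one_hot_3in(bubbled))  # [0, 0, 1, 0, 0, 0, 0] -> one line active
```

With the two-input version, the isolated 1 activates a second line of the 1-out-of-N code; requiring the lower neighbor to be 1 as well removes it in this example.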
Figure 2.11: Using three-input NAND gates to remove single bubbles [31]

2.4.2 Gray Coding

A more efficient way to reduce the effect of the errors explained above is using Gray code as an intermediate step. The probability of metastable states can be lowered, as in a Gray encoder no signal is applied to more than one input, allowing the use of pipelining to increase the time for regeneration [22]. The effect of bubbles is reduced, because only one bit changes between adjacent Gray codes. Therefore, the accuracy of the Gray code degrades very gradually as more bubble errors appear in the thermometer code [22]. In Chapter 3 (Section 3.7) Gray coding is explained in more detail and its effects on metastability and bubble errors are investigated.

2.5 Time-Interleaved ADCs

A time-interleaved (TI) ADC system, as shown in Figure 2.12, is an effective way to achieve a high-sampling rate ADC with relatively slow circuits. However, mismatches, such as offset and gain mismatches among ADC channels, as well as timing skew of the clocks distributed to those ADCs, degrade the SNR of the overall TI system by introducing extra components in the output signal spectrum. For a TI system consisting of M parallel ADCs, each sampling at fs/M, these components include M pairs of line spectra centered at multiples of fs/M as a result of gain mismatches and/or timing skews, and M line spectra located at multiples of fs/M as a result of offset mismatches [47]−[49].

Figure 2.12: Time-interleaved architecture for ADCs [47]

As an example, gain mismatch error components in the output spectrum in the case of M = 4 are shown in Figure 2.13.

Figure 2.13: Error signals as a result of gain mismatch (for M = 4) [47]

2.6 Review of the Previous Work in the Area of 4-Bit ADCs in CMOS Technology

In this section, the results of a literature survey for 4-bit flash ADCs are presented.
Some of these designs are investigated in detail, including the techniques used to enhance the speed or to reduce the power consumption. A comparative table is provided at the end, to summarize the specifications of these ADCs (Appendix B of this thesis provides a brief description of the ADC performance metrics).

2.6.1 A 68mW 1.356GS/s 4-Bit Flash ADC in 0.18µm CMOS

A 1.356GS/s 4-bit ADC targeted for direct-spectrum code-division multiple-access ultra-wideband (DS-CDMA UWB) communications is presented in [50]. The ADC uses a fully-differential flash architecture. To achieve low power consumption and high conversion rate, the proposed converter is designed with a preamplifier array that consists of current-mode amplifiers (CMAs) followed by dual sense amplifiers (DSAs). The ADC architecture and the CMA and DSA circuits are shown in Figure 2.14. The dual sense amplifier senses both the voltage and the current differences of the preamplifier output signals. The ADC achieves an effective number of bits (ENOB) of 3.7 (3.35) at 30MHz (650MHz) input while sampling at 1.356GS/s. The current draw is 38mA from a power supply of 1.8V. The ADC is fabricated in a 0.18µm CMOS process and its active area is 0.35mm2.

Figure 2.14: 4-bit ADC reported in [50]: (a) ADC block diagram; (b) current-mode amplifier (CMA); (c) dual-sense amplifier (DSA)

2.6.2 A 1W 12GS/s 4-Bit ADC in 0.25µm CMOS using Eight Time-Interleaved Flash ADCs

A 12GS/s 4-bit ADC fabricated in a 0.25µm CMOS process is reported in [51]. The ADC is targeted for an equalized multi-level link. The speed is achieved by a TI architecture consisting of eight 1.5GS/s flash ADCs. Clocked differential amplifiers are used to sample the input, as shown in Figure 2.15(a). They are followed by high-speed comparators, shown in Figure 2.15(b), with current-summed offset cancellation that basically includes digitally-controlled current sources for calibration.
In Figure 2.15(a), when CLK is high (CLKB is low), M1, M2, and M3 are in the linear region, and the input is amplified. When CLK is low, the input pair and the loads are disabled and the output is held in high impedance. The input bandwidth is measured as 2.5GHz and the ENOB is reported as 3.34 at low-frequency inputs. The chip consumes 1W from a 2.5V supply, and the total chip area, including 128×4 bits of memory for storing output bits, is 2.8×3.8mm2.

Figure 2.15: Amplifier and latch used in 4-bit ADC of [51]: (a) clocked sampling amplifier; (b) offset cancelled latch

2.6.3 A 0.6W 4GS/s 4-Bit Flash ADC in 0.18µm CMOS

A 4-bit flash ADC implemented in 0.18µm digital CMOS is reported in [18][44] and achieves a sampling rate of 4GS/s. As shown in Figure 2.16, the comparator consists of a comparator core and two latch stages followed by a D flip-flop (DFF). High comparator speed is made possible through the use of on-chip differential inductors (32µm by 32µm wide) in the comparator core, shown in Figure 2.17, and the use of small fast devices. A combination of DAC trimming and comparator redundancy is used to reduce the DNL and INL errors. The ENOB is measured at 3.84 and 3.48 bits for a 100MHz input sampled at 3 and 4GS/s, respectively. A Wallace tree counter [52] is used for thermometer-to-binary conversion, for increased immunity to metastability and bubble errors. To ensure a proper operation of the counter at 4GS/s, the ADC uses two time-interleaved counters running at half of the comparator clock frequency (i.e., 2GHz). The ADC including the clock buffer consumes ~0.6W from 1.8V (for the analog part) and 2.1V to 2.5V (for the digital part) supplies. The input capacitance is 1.6pF.

Figure 2.16: Comparator block diagram of the 4-bit ADC in [18]

Figure 2.17: Circuit of the comparator core [18]

2.6.4 A 225mW 10GS/s 4-Bit Flash ADC in 0.13µm CMOS

A 10GS/s 4-bit flash ADC implemented in 0.13µm CMOS is reported in [12].
This ADC, paired with a current-steering DAC, is used in the design of advanced serial link transceivers. CML gates are used to alleviate the severe power supply bouncing. The block diagram of the ADC is shown in Figure 2.18(a). The active feedback amplifiers, CML, and the wave-pipelining technique [53] help achieve the ultimate 10GHz sampling rate. The ADC achieves an ENOB of 3.86 bits for a 1.11GHz input signal. The overall power consumption of the test chip is 420mW from a 1.2V supply, of which 225mW is consumed in the ADC. The area of the ADC is 0.1575mm². Similar to the ADC proposed in this thesis [14]-[17], this paper (published in November 2007) uses the concept of low-swing operation using CML blocks to enhance the circuit speed. An example of such CML gates is shown in Figure 2.18(b).

Figure 2.18: 10GS/s 4-bit flash ADC of [12]: (a) ADC architecture (comparator array of slices #1 to #15 driven by Vin and a differential CLK with intentional delays, followed by an OR array, a thermometer-to-Gray encoder, and CML latches that produce the Gray-code outputs), (b) a CML AND/NAND/OR/NOR gate used in the encoder

2.6.5 A 2.5mW 1.25GS/s 4-Bit Flash ADC in 90nm CMOS

In [54], a very low power 1.25GS/s 4-bit flash ADC in 90nm CMOS is presented. It achieves 3.7 ENOB from DC to Nyquist-rate input while consuming 2.5mW from a 1.2V supply, which results in an energy per conversion-step of 0.16pJ. To save power in this ADC, as shown in Figure 2.19(a), all the non-essential blocks of the flash ADC have been removed, including the T/H, preamplifiers, reference ladder, and bubble error correction. The comparators have built-in threshold levels, set by proper sizing of the input transistor pairs, and dynamic calibration by a binary-scaled array of variable capacitors, as shown in Figure 2.19(b). The outputs of the comparators are stored in Set-Reset latches and later converted into a 4-bit Gray code (which has intrinsic error-correcting properties) using a ROM-based encoder.
The word-line selection of the encoder is performed by 2-input NAND gates.

Figure 2.19: 4-bit ADC reported in [54]: (a) ADC architecture, (b) circuit of the comparator with built-in threshold voltages and dynamic calibration

2.6.6 A 40GHz-Bandwidth 4-Bit Time-Interleaved ADC using GaAs Photoconductive Sampling

GaAs photoconductive switches have been integrated with parallel TI 4-bit CMOS ADC channels to demonstrate sampling of electrical signals with tens of GHz bandwidth at low-to-medium resolution [42]. An experimental two-channel TI ADC achieves an ENOB of 3.5 bits for inputs up to 40GHz when tested at an optically-triggered sampling rate of 160MHz. The sampling rate was limited by the available optical source. Each ADC channel operates at a conversion rate of up to 640MHz, dissipates 70mW from a 2.5V supply, and occupies an area of 150µm×450µm in 0.25µm CMOS. The main drawback of photoconductive sampling is that it cannot be integrated with CMOS technology, which results in a higher implementation cost.

2.6.7 Performance Summary of the Previous Work

Table 2.1 summarizes the performance of the previous work discussed above. It also includes two other designs (the last two columns) that are used for comparison. In [55], a 5-bit flash ADC in 90nm CMOS is presented that uses a comparator core similar to Figure 2.17 (Section 2.6.3) to achieve a sampling rate of up to 4GS/s. In [19], a 6-bit flash ADC in 0.13µm CMOS is demonstrated that uses small transistors with large offsets to achieve a sampling rate of 4GS/s. A digital offset averaging technique (Section 2.3.4.3) is then used to reduce the effect of the bubble errors arising from those offsets.
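The energy per conversion-step quoted for [54] in Section 2.6.5 can be approximately reproduced with a short calculation. The sketch below assumes the commonly used definition P / (2^ENOB · fs); this thesis defines its own metrics in Appendix B, so the formula and the function name are assumptions made here for illustration only.

```python
def energy_per_conversion_step(power_w, fs_hz, enob_bits):
    """Assumed figure of merit: P / (2^ENOB * fs), in joules per conversion step."""
    return power_w / (2.0 ** enob_bits * fs_hz)


# Values taken from the text: [54] consumes 2.5mW at 1.25GS/s with 3.7 ENOB,
# and [50] consumes 68mW at 1.356GS/s with 3.7 ENOB (low-frequency input).
fom_54 = energy_per_conversion_step(2.5e-3, 1.25e9, 3.7)
fom_50 = energy_per_conversion_step(68e-3, 1.356e9, 3.7)

print(f"[54]: {fom_54 * 1e12:.2f} pJ per conversion step")
print(f"[50]: {fom_50 * 1e12:.2f} pJ per conversion step")
```

Under this assumed definition the result for [54] lands close to the 0.16pJ reported in the text; small differences can come from rounding of the ENOB or from the input frequency at which it is measured.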
Table 2.1: Performance summary of the previous work

| | [50] | [51] | [18] | [12] | [54] | [42] | [55] | [19] |
|---|---|---|---|---|---|---|---|---|
| Technology (CMOS) | 0.18µm | 0.25µm | 0.18µm | 0.13µm | 90nm | 0.25µm | 90nm | 0.13µm |
| Resolution (bits) | 4 | 4 | 4 | 4 | 4 | 4 | 5 | 6 |
| Sampling rate (GS/s) | 1.356 | 12 (8×1.5) | 4 (*) | 10 | 1.25 | n×0.64 (**) | Up to 4 | 4 |
| Supply (V) | 1.8 | 2.5 | 1.8 | 1.2 | 1.2 | 2.5 | 1.4 | 1.5 |
| Power (mW) | 68 | 1000 | 619 | 225 | 2.5 | n×70 | 132 | 990 |
| ENOB @ fin (Hz) | 3.7 @ 30M | 3.34 | 3.89 @ 10M | 3.86 @ 1.11G | 3.7 @ 0.62G | 3.5 | 4.28 @ 5M | No ENOB |
| Area (mm²) | 0.35 | - | 0.88 | 0.1575 | 0.033 | n×0.067 | 0.658 | 0.5 |
| Section | 2.6.1 | 2.6.2 | 2.6.3 | 2.6.4 | 2.6.5 | 2.6.6 | 2.6.7 | 2.6.7 |

(*) This design uses a single-channel comparator array and a two-channel time-interleaved encoder.
(**) This design has a 40GHz bandwidth. However, the sampling rate depends on the number of time-interleaved channels, n.

3 FLASH ADC DESIGN

3.1 Introduction

In this chapter, the design of a 4-bit flash ADC is presented, which is targeted for applications such as high-speed serial links and UWB. The emphasis of the design is on high-speed and low-power operation. The target technology is 0.18µm CMOS, with a 1.8V supply. However, the circuit should be portable to newer CMOS technologies with lower supply voltages. A single-channel ADC is preferred to avoid the complexity and cost of on-chip calibration to compensate for channel mismatches. A speed of 4GS/s is chosen to satisfy the above applications and also to beat the fastest reported single-channel ADC (at the beginning of this research) [13] by 2.5 times (4GS/s is still the highest speed reported for a single-channel ADC in 0.18µm or older CMOS technology). Nevertheless, methods to increase the speed further should be investigated. Although this ADC is designed to have a 4-bit resolution, the compatibility of the circuits and design techniques with flash ADCs of higher resolutions should be considered. With the above goals in mind, we decided to implement the circuits with simple CML blocks that only consist of resistors and NMOS transistors.
PMOS transistors are avoided because of their lower speed. Using CML circuits ensures high-speed operation, while the simplicity of the blocks allows for low-voltage operation. Simple CML circuits (with a maximum of three stacked transistors) are used in this ADC to make the design portable to newer technologies. Inductors can be used in the comparators to enhance the speed; however, they occupy a large on-chip area and usually have inaccurate models, which adds to the design challenges. To keep the circuits simple, inductors are not used. Therefore, the ADC occupies a small on-chip area, which improves the matching of its components. This compact layout also reduces the wire RC parasitics and results in an enhancement in the speed and/or power performance. Some techniques, such as common-centroid layout and resistor averaging, are used in the comparators to reduce their voltage offsets.

The block diagram of the proposed ADC is shown in Figure 3.1. The comparator array consists of 21 comparators, including 15 main comparators and 3 over-range comparators at each end of the array, and compares the input signal with the reference voltages generated by the resistor ladder (not shown in the figure) to produce a thermometer-coded version of the input signal. The encoder converts the thermometer code to the binary output using an intermediate Gray code conversion. No front-end track-and-hold (T/H) is used in this ADC. Instead, distributed sampling [56] is performed in the first latch of the comparator array. To reduce the effects of the comparator offsets, two stages of resistor averaging are used. For this purpose, three over-range (dummy) comparators are added on each side of the array.

This chapter is organized as follows. The architecture of the comparator, circuit details and design issues of the preamplifier and the CML latches, and the offset averaging technique used in the comparator array are described first.
Then, the architecture of the encoder, circuit details, pipelining and CML implementation of the encoder, and the reformulation used in the encoder are described. The chapter ends by describing the Gray coding and its effects on the metastability and bubble errors in a flash ADC.

Figure 3.1: Block diagram of the ADC

3.2 Comparator

The block diagram of the comparator is shown in Figure 3.2. The comparator consists of a preamplifier, a regenerative latch that also works as a sampling stage or distributed track-and-hold [56], and two additional cascaded latches that further amplify the sampled signal to achieve the differential low-swing levels at the comparator outputs. The timing diagram for the comparator operation is shown in Figure 3.3. The schematics of the comparator blocks are shown in Figure 3.4.

The preamplifier is not clocked. Therefore, the first latch receives a continuous (non-sampled) signal. In the first latch, when the clock is high, the circuit is in the track mode with a low amplifying gain. When the clock goes low, the circuit goes to the latch mode and the signal is sampled. The sampled signal is amplified due to the high gain of the positive feedback of the latch stage, and is then delivered to the next stage. The three latching stages operate in a fully pipelined manner, such that when one stage is in the latch mode, the next stage is in the track mode and vice versa. At the comparator outputs, a signal swing of at least ±0.4V is achieved, corresponding to a differential low-swing digital signal.

The sizing of the components in the different stages of the comparator is chosen to achieve high component matching in the preamplifier and in the first latching stage, while maintaining high amplifying gain in the last two stages. Details of the circuit design for the comparator are explained in the following subsections. Another example of a comparator design is presented in Appendix A of this thesis.
Figure 3.2: Block diagram of the comparator

Figure 3.3: Timing diagram for the comparator operation

Figure 3.4: Circuit details of the comparator: (a) preamplifier (labels recovered from this panel: 8/0.2, 1k, 4/0.4, 8/0.2, bias), (b) comparator's 1st latch, which also includes the distributed T/H, (c) comparator's 2nd latch, (d) comparator's 3rd latch

3.2.1 Preamplifier

The preamplifier, as shown in Figure 3.5, is a differential difference amplifier with resistive loads. In this circuit, the transconductance of the input transistors is

\[ g_m = \frac{2 I_D}{V_{OV}}, \qquad I_D = \frac{I_{tail}}{2} \tag{3.1} \]

\[ V_{OV} = V_{GS} - V_{th} \tag{3.2} \]

in which VOV is the overdrive voltage of those transistors. The gain and the output voltage of the preamplifier are given by

\[ \mathrm{Gain} = g_m R_D = \frac{2 R_D I_D}{V_{OV}} \tag{3.3} \]

\[ V_{out} = g_m R_D \left( (V_{in1} - V_{in2}) - (V_{ref1} - V_{ref2}) \right) \tag{3.4} \]

where Vin1 and Vin2 are the differential inputs and Vref1 and Vref2 are the differential reference voltages, generated by the resistor ladder. Equation (3.4) shows how the circuit operates as a difference amplifier. As long as the tail current sources remain on, the power consumption of the preamplifier is almost constant, and is given by

\[ P_{preamp} \cong 2 I_{tail} \cdot V_{DD}. \tag{3.5} \]

Figure 3.5: Schematic of the preamplifier

Using PMOS transistors instead of resistors in Figure 3.5 increases the gain of the preamplifier. However, the parasitic capacitance at the output nodes is increased and the bandwidth of the circuit is reduced [57]. Diode-connected transistors reduce the output resistance and increase the output parasitic capacitance. They result in a lower gain (due to the lower resistance) and a higher bandwidth (due to the lower output RC product). They also have the advantage of decoupling the preamplifier gain from the output common-mode voltage [25]. However, due to the use of PMOS transistors, their bandwidth is still smaller than that of resistive-load preamplifiers.

The order of input ports in Figure 3.5 is (Vin1, Vref1) and (Vref2, Vin2) for the two input differential pairs.
An alternative to this configuration, namely (Vin1, Vin2) and (Vref1, Vref2), also works, in which both input pairs take advantage of a balanced (differential) input signal. However, in the former configuration, if the input signals and the reference voltages have the same DC levels, then, at the zero crossing point of the preamplifier, Vin1 = Vref1 and Vin2 = Vref2. As a result, both differential pairs are in the middle of their linear region, and the preamplifier has higher linearity.

For the proper operation of the preamplifier, all the transistors should operate in the saturation mode. Therefore, the input and the reference voltages have a minimum and a maximum value. The upper and lower limits, respectively, keep the input differential pair and the tail current transistor in the saturation region. To keep a transistor in the saturation region, we should have $V_{DS} > V_{GS} - V_{th}$ or $V_{GD} < V_{th}$. As the minimum output voltage of the preamplifier is $V_{out(min)} = V_{DD} - 2 R_D I_{tail}$, we can calculate a maximum value for Vin as $V_{in(max)} = V_{DD} - 2 R_D I_{tail} + V_{th}$ to maintain the input differential pair in the saturation region. In addition, for the tail current transistor to remain in the saturation region, a minimum voltage can be calculated for Vin as $V_{in(min)} = V_{DS(sat)} + V_{th} + V_{OV}$. For the component values shown in Figure 3.4(a) and assuming Vth(NMOS) = 0.45V, VDS(sat) = 0.1V, and VOV ≅ 0.2V, Vin(max) and Vin(min) are calculated as 2.15V and 0.75V, respectively.

3.2.2 Large Signal Analysis of a Differential Pair Amplifier

The differential pair amplifier is the core of the preamplifier and the latch stages of the proposed comparator. For such an amplifier, shown in Figure 3.6, large-signal analysis results in [58]:

\[ I_{D\{1,2\}} = \frac{I_{tail}}{2} \pm \frac{\mu_n C_{ox}}{4} \frac{W}{L} V_{id} \sqrt{\frac{4 I_{tail}}{\mu_n C_{ox} (W/L)} - V_{id}^2} \tag{3.6} \]

Figure 3.7 shows a normalized plot of the drain currents given by (3.6).
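The large-signal characteristic in (3.6) can be evaluated numerically as a quick sanity check. The following is a minimal sketch, assuming a square-law device model; the lumped parameter k = µn·Cox·(W/L) and the example values are hypothetical and are not taken from the design. Beyond |Vid| = sqrt(2·Itail/k), where the square-root expression no longer describes the circuit, the sketch simply steers the whole tail current into one branch.

```python
import math


def diff_pair_currents(v_id, i_tail, k):
    """Drain currents (I_D1, I_D2) of a square-law differential pair.

    v_id   : differential input voltage in volts
    i_tail : tail current in amperes
    k      : mu_n * Cox * (W/L) in A/V^2 (hypothetical lumped parameter)
    """
    v_max = math.sqrt(2.0 * i_tail / k)  # input at which one branch turns off
    if v_id >= v_max:
        return i_tail, 0.0
    if v_id <= -v_max:
        return 0.0, i_tail
    delta = (k / 4.0) * v_id * math.sqrt(4.0 * i_tail / k - v_id ** 2)
    return i_tail / 2.0 + delta, i_tail / 2.0 - delta


# Hypothetical example values, for illustration only.
I_TAIL = 200e-6   # 200 uA
K = 5e-3          # 5 mA/V^2

for v in (-0.4, -0.1, 0.0, 0.1, 0.4):
    i1, i2 = diff_pair_currents(v, I_TAIL, K)
    print(f"Vid = {v:+.2f} V -> ID1 = {i1 * 1e6:6.1f} uA, ID2 = {i2 * 1e6:6.1f} uA")
```

The two currents always sum to the tail current, and the characteristic is close to linear only for small Vid, which is the region that matters for the zero-crossing behavior of the comparator.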
Figure 3.6: Schematic of a differential pair for large signal analysis

Figure 3.7: Large signal I-V characteristic of a differential pair [58]

For the input transistor pair to operate in the active region, the differential input voltage should be limited by [58]:

\[ V_{id} \le \sqrt{\frac{2 I_{tail}}{\mu_n C_{ox} (W/L)}} = \sqrt{2}\, \left( V_{OV} \right) \Big|_{V_{id}=0} \tag{3.7} \]

After reaching the maximum differential voltage given by (3.7), all the tail current is delivered to one branch. If the input goes beyond that voltage, as there is no more tail current available, Vs (the common source node voltage) increases, while the other input transistor is off.

The I-V characteristic is linear for near-zero input voltages. However, the linearity of the preamplifier in an ADC is not of prime importance. The important characteristic is the zero crossing point, as it determines the offset of the comparator. The linearity is also of interest in the presence of a resistor averaging network: a linear preamplifier averages out the offsets better than a nonlinear one.

3.2.3 Input-Referred Offset Voltage of a Differential Pair Amplifier

Any mismatch in the components of a differential pair amplifier results in an offset voltage at the amplifier outputs. As shown in Figure 3.8, this offset is usually translated to the amplifier inputs and is referred to as the input-referred offset voltage, denoted here by Vos. Mismatches in the values of the resistive loads and in the sizes and threshold voltages of the input transistor pair are the important contributors to this offset.
It can be proved that if all three sources of offset are present at the same time, the overall input-referred offset is given by [58]

\[ V_{os} \cong \Delta V_{th} - \frac{V_{OV}}{2} \left( \frac{\Delta R_D}{R_D} + \frac{\Delta (W/L)}{(W/L)} \right) \tag{3.8} \]

where

\[ R_D = \frac{R_{D1} + R_{D2}}{2} \tag{3.9} \]

\[ \Delta R_D = R_{D1} - R_{D2} \tag{3.10} \]

\[ (W/L) = \frac{(W/L)_1 + (W/L)_2}{2} \tag{3.11} \]

\[ \Delta (W/L) = (W/L)_1 - (W/L)_2 \tag{3.12} \]

\[ V_{th} = \frac{V_{th1} + V_{th2}}{2} \tag{3.13} \]

\[ \Delta V_{th} = V_{th1} - V_{th2} \tag{3.14} \]

In (3.8), the threshold voltage mismatch appears directly, whereas the relative mismatches in the load resistors and the transistor sizes are scaled by VOV/2. Hence, the threshold voltage mismatch has a more significant effect on the input-referred offset voltage than the mismatch in the transistor sizes and the load resistors.

Figure 3.8: Circuit for calculation of input-referred offset voltage in a differential pair amplifier

3.2.3.1 Effect of the Resistor Ladder on the Comparator Offset

Any mismatch in the resistors of the resistor ladder results in voltage offsets in the generated reference voltages. These voltage offsets also contribute to the total offsets of the ADC decision levels. It can be shown that these offsets have an effect similar to that of the threshold voltage mismatch, i.e., they add directly to the decision level offsets. Therefore, special circuit/layout techniques are required to keep the effect of such mismatches as small as possible. Furthermore, because of the parasitic capacitance of the preamplifier input pair transistors, there is a coupling capacitance between the input ports and the reference voltages. This capacitive coupling induces some noise on the reference voltage lines of the resistor ladder. These issues are revisited in Section 3.6, where the resistor ladder is discussed.

3.2.4 Resistor Averaging in the Preamplifier

In a multi-stage comparator, the input-referred offset of the comparator is given by the following equation [25]:

\[ \sigma_{VOS}^2 = \sigma_{VOS1}^2 + \frac{\sigma_{VOS2}^2}{A_{V1}^2} + \frac{\sigma_{VOS3}^2}{A_{V1}^2 A_{V2}^2} + \cdots \tag{3.15} \]

where σVOSn and AVn are the offset voltage standard deviation and the gain of stage n of the comparator, respectively.
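The way (3.15) discounts the offsets of later stages can be illustrated with a few lines of code. This is a minimal sketch; the function name and the stage values are hypothetical and are chosen only to show the effect of a high-gain stage, not to model the fabricated comparator.

```python
import math


def input_referred_offset_sigma(stage_sigmas, stage_gains):
    """Standard deviation of the input-referred offset per (3.15).

    stage_sigmas[n] is the offset sigma of stage n+1 (volts) and
    stage_gains[n] is the voltage gain of stage n+1. The gain of the
    last stage does not affect the result.
    """
    total_variance = 0.0
    gain_before_stage = 1.0  # product of the gains preceding the current stage
    for sigma, gain in zip(stage_sigmas, stage_gains):
        total_variance += (sigma / gain_before_stage) ** 2
        gain_before_stage *= gain
    return math.sqrt(total_variance)


# Hypothetical values: a low-gain preamplifier followed by two latches
# whose regeneration is modeled as a gain of 50 each.
sigmas = [10e-3, 20e-3, 20e-3]
gains = [2.0, 50.0, 50.0]

total = input_referred_offset_sigma(sigmas, gains)
print(f"input-referred offset sigma = {total * 1e3:.2f} mV")
```

With these assumed numbers, the third stage contributes almost nothing once it is divided by the gain of the two stages before it, which is the reasoning used in the text to limit the averaging to the early stages.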
In the proposed ADC, two layers of offset averaging are used: one in the preamplifier and one in the first latch. Due to the high gain of the first latch (because of the positive feedback), the offset voltage of the second latch has a negligible effect on the input-referred offset voltage. Therefore, no resistor averaging is used for the second latches in the comparator array.

Simulations are performed to evaluate the effect of preamplifier resistor averaging on the reduction of the ADC decision level offsets. Voltage offsets are generated using the MATLAB uniform random generator with a mean value of zero and a standard deviation (σoffset) of 1 LSB. These voltage offsets are incorporated into the structure of the preamplifiers by inserting them between the input voltage and the gate of the input transistors. Simulations are performed using Cadence DC analysis. By sweeping the input voltage over its full range and finding the voltages that produce zero crossing points at the comparator outputs, the resultant input-referred offsets are measured.

The simulation results are shown in Figure 3.9. Figure 3.9(a) shows the nominal decision levels (or threshold voltages) for all the 21 comparators when there is no offset. Dummy comparators are shown in gray. Figure 3.9(b) shows the randomly generated voltage offsets. Figure 3.9(c) shows how the ADC decision levels are affected by the offsets when there is no resistor averaging network. In fact, the values in Figure 3.9(c) are the sum of the values in Figure 3.9(a) and Figure 3.9(b). MATLAB calculations on the data in Figure 3.9(c) show that the offsets degrade the ENOB to 2.51 (out of 4). Figure 3.9(d) shows the decision levels for all the 15 main comparators in the ADC after resistor averaging, with a 200Ω resistance. The ENOB for Figure 3.9(d) is 3.65, which shows a significant enhancement in the ADC linearity.
Figure 3.9(e) to Figure 3.9(h) show the static INL and DNL performance of the ADC, before and after averaging. Before averaging, the INL and DNL errors are within ±1.67 and ±2.53 LSB, respectively. After averaging, these ranges are reduced to ±0.35 and ±0.23 LSB, which shows the effectiveness of the resistor averaging network. It is worth noting that good linearity for an ADC is considered as INL and DNL values less than ±1 and ±0.5 LSB, respectively [29].

Figure 3.9: Simulation results of the effect of the resistor averaging on input-referred offsets in the comparator array. In simulations, randomly generated offset voltages with zero mean and standard deviation of 1 LSB are applied to the comparators. Horizontal axes are the comparator numbers. In figures (a) to (c), dummy comparators are shown in gray.

3.2.5 CML Latch

3.2.5.1 Architecture and Modes of Operation

The architecture of a CML latch is shown in Figure 3.10. In this figure, M1 and M2 are matched, and M3 and M4 are also matched. Therefore, we have gm1 = gm2 and gm3 = gm4. The idealized timing diagram of the CML latch is shown in Figure 3.11. As shown in this figure, the CML latch has two modes of operation: the track mode and the latch mode.

Figure 3.10: Architecture of a CML latch

Figure 3.11: Timing diagram for a CML latch

In the first half of the clock cycle, when CLK goes high, the tail current is diverted to the input differential pair, and the latch goes to the track mode. In this mode, the input differential pair together with the resistive loads makes an amplifier that amplifies the input signals with a low gain and produces the output signals. For a small input voltage that keeps the input differential pair in the linear region, we have

\[ V_{out} = g_{m1} R_D V_{in} \tag{3.16} \]

and therefore,

\[ V_{D\{1,2\}} = V_{DD} - \frac{R_D I_{tail}}{2} \mp \frac{V_{out}}{2} = V_{DD} - \frac{R_D}{2} \left( I_{tail} \pm g_{m1} V_{in} \right). \tag{3.17} \]

These output signals are connected to the gates of the cross-coupled transistor pair, which is off in the track mode.
In the second half of the clock cycle, when CLK goes low and CLKB goes high, the tail current is switched to the cross-coupled transistors, and the latch goes to the latch (regeneration) mode. In this mode, the input differential pair is off and the input signals have no further effect on the output signals of the circuit. The current that passes through the cross-coupled transistor pair activates their positive feedback. The output voltage produced in the track mode works as an initial state for the regeneration process of the cross-coupled pair. With their positive feedback, they amplify the initial difference between the signals, and eventually all the tail current passes through only one of the transistors. In this case,

\[ V_{D1} = V_{DD}, \qquad V_{D2} = V_{DD} - R_D I_{tail} \tag{3.18} \]

and therefore,

\[ V_{out} = R_D I_{tail}. \tag{3.19} \]

For further clarification, the detailed operation of the positive feedback in the cross-coupled transistor pair is described here. Suppose that at the start of the latch mode, VD1 > VD2. These voltages appear on the gates of M4 and M3, respectively. Thus, ID4 is larger than ID3. If the conditions for the positive feedback are met (as will be discussed shortly), this higher ID4 results in an even larger voltage drop across its load resistor RD than before. As a result, VD2 (i.e., VG3) is lowered compared to VD1 (i.e., VG4). Therefore, ID3 decreases and ID4 increases. This process continues until ID3 vanishes and ID4 reaches Itail.

One can look at the cross-coupled pair as a differential pair that, together with the resistive loads, makes an amplifier. Here, the overdrive voltage of the transistors and, therefore, their transconductance, increases with time. As a result, an exponentially increasing gain versus time is achieved for the cross-coupled pair amplifier.

To obtain the time-domain equation for the output voltage of the latch, the circuit schematic is simplified as shown in Figure 3.12(a).
For small output voltages, the input voltage of the cross-coupled differential pair is small and we can assume that the amplifier is working in the linear region. In this region, a half-circuit small-signal model, as shown in Figure 3.12(b), can be used. The voltage-controlled current source in this model can be replaced with a negative resistor with the value of −1/gm3. This resistor in parallel with RD makes an equivalent resistor of

\[ R_{eq} = \frac{-R_D}{g_{m3} R_D - 1}. \tag{3.20} \]

To have positive feedback, Req should be a negative resistor. Therefore, we should have

\[ g_{m3} R_D > 1. \tag{3.21} \]

Now, we have an RC circuit with a negative time constant of τ = ReqCL, which has an exponentially increasing response when a small initial voltage is applied to it. For this RC circuit, the voltage waveform is given by

\[ V_{out} = V_{out}(0) \cdot e^{-t/\tau}. \tag{3.22} \]

The initial output voltage in the latch mode is equal to the final output voltage of the track mode. If we assume that there is enough time in the track mode for the input differential pair to generate the final output voltage on the output nodes, then

\[ V_{out}(0) = g_{m1} R_D V_{in}. \tag{3.23} \]

Therefore, the output voltages of the latch in the latch mode are

\[ V_{D\{3,4\}} = V_{DD} - \frac{R_D I_{tail}}{2} \mp \frac{V_{out}(0) \cdot e^{-t/\tau}}{2} \tag{3.24} \]

or

\[ V_{D\{3,4\}} = V_{DD} - \frac{R_D}{2} \left( I_{tail} \pm g_{m1} V_{in} \cdot e^{-t/(R_{eq} C_L)} \right). \tag{3.25} \]

This equation is valid as long as the output voltages are small enough to keep the cross-coupled differential pair in the linear region.

Since the tail current source is always on, the power consumption of a CML latch can be estimated as

\[ P_{latch} \cong V_{DD} I_{tail}. \tag{3.26} \]

Figure 3.12: Simplified circuit of the CML latch, in the latch mode: (a) circuit schematic, (b) simplified half-circuit small-signal equivalent model, in which the current source I = −gm3·Vout/2 in parallel with RD and CL is replaced by Req = RD || (−1/gm3) in parallel with CL

3.2.5.2 Sampling of the Second Stage

The first latch in each comparator works as a T/H for that comparator.
In the track mode, the amplifier of the first latch tracks the preamplifier output voltage. At the clock edge, when the latch tail current switches to the cross-coupled pair, the latch mode starts and the output signal of the latch is disconnected from its inputs. At this time, the input signal is sampled on the output capacitor of the latch. The positive feedback of the cross-coupled pair, as explained before, amplifies the latched signals and delivers them to the next latching stage for further amplification in the subsequent clock phases.

Unlike a single front-end T/H, this distributed T/H does not require high linearity. In fact, the sampled signal (which can be very small) works as the initial state for the latch (regeneration) mode, and therefore, it is very important for the latch to correctly sample the polarity of the input signal, as it determines the comparator's final decision. The absolute value of the sampled signal is not as important as its polarity, since the sampled signal will be amplified by the following latches. However, a sampled signal with a higher absolute value helps the latching stages of the comparator make their decisions faster, and helps prevent metastability in the comparator outputs.

Another issue associated with distributed sampling is that all T/Hs in the comparator array must sample the same point of the input signal. For this purpose, all the stages prior to the distributed T/H must have the same overall propagation delay. These stages include the preamplifier and the amplifier stage of the first latch. In order to avoid a bubble error in the thermometer code, the propagation delay variations of the different comparators should be less than the time needed for the input signal to change by 1 LSB. The fastest change for a sinusoidal input signal happens at the zero-volt crossing point. The slope of an input signal with amplitude a at this point is 2π·fin·a.
For an ADC with n-bit resolution and an input signal range of 2a, 1 LSB is (2a)/2^n. Therefore, the shortest time needed for the input signal to change by 1 LSB, which is the acceptable propagation delay variation (denoted by TPDV), is

\[ T_{PDV} = \frac{1}{\pi f_{in\text{-}max}\, 2^{n}} \tag{3.27} \]

where fin-max is the maximum input frequency. Calculations show that at the Nyquist rate (fin = 2GHz), a propagation delay variation of smaller than 10ps is tolerable for a 4-bit flash ADC. In a 6-bit flash ADC, the variations should be limited to 2.5ps, which shows the challenges of distributed sampling for such an ADC.

3.2.5.3 Bandwidth of the ADC

For an ADC with a single front-end T/H, the ADC bandwidth is defined by the bandwidth of the front-end T/H. However, in an ADC with distributed sampling, the ADC bandwidth is defined by the bandwidths of all the stages prior to the distributed T/H stage together with that of the T/H stage itself.

The simplified schematic diagram of Figure 3.13 can be used for bandwidth calculation in a differential pair amplifier. In this figure, the dominant pole is the output pole, which is characterized by the output resistance Rout and the output capacitance Cout, as

\[ BW = \frac{1}{2\pi \cdot R_{out} C_{out}}, \qquad R_{out} = R_D \parallel r_o, \qquad C_{out} = C_{load} \parallel C_p \tag{3.28} \]

where ro is the output resistance of the input transistors, Cload is the input capacitance of the next stage, and Cp is the parasitic capacitance of the output node, including the junction and wiring capacitances. Here, Cload || Cp denotes the two capacitances connected in parallel at the output node. The amplifier of Figure 3.13 has another high-frequency pole at its input port that is characterized by Rgate and Cgate. Cgate is on the order of Cload. However, Rgate is usually about two to three orders of magnitude smaller than Rout. The former is on the order of ohms, while the latter is on the order of hundreds of ohms to kilo-ohms.

Figure 3.13: Simplified schematic for calculating bandwidth of a differential pair amplifier

3.2.6 Overdrive Recovery

The comparators must be able to recover from overdrive (see Section 2.2.2); otherwise, their next decision is affected by their current state.
In this case, dynamic offset appears in the comparator. To test the overdrive recovery of the preamplifier and the comparator, the input signal is switched between ½ full-scale and −1 LSB (or −½ full-scale and 1 LSB) in consecutive clock cycles. For a large input signal of ½ full-scale, the high overdrive voltage of the input transistors in the preamplifier may push them into the triode region. In the next clock cycle, when the input changes to a 1 LSB signal of the opposite polarity, this small input signal should be able to recover those transistors from the triode region and override the preamplifier's previous decision. As a result, the comparator's output bit should change. This is the most stressful condition for the test of a comparator [35]. The waveforms of the proposed comparator, from post-layout simulation, after applying this input signal are shown in Figure 3.14. This figure confirms the proper operation of the comparator.

Figure 3.14: Simulated waveforms of the clock, inputs, and outputs of the comparator, showing overdrive recovery under the stressful test conditions.

3.3 Encoder

The encoder in a flash ADC is one of the speed bottlenecks of the system. Furthermore, the encoder should be able to properly handle comparator metastability and bubble errors in the thermometer code. Implementing an intermediate Gray-coding stage reduces the effect of metastability errors, as each bit of the thermometer code affects only one bit of the Gray code. In addition, as only one bit changes between adjacent Gray codes, the degradation in the accuracy of the Gray code is gradual as more bubbles appear in the thermometer code. Therefore, Gray coding reduces the bubble-error effects as well [22]. Section 3.7 at the end of this chapter provides more insight into the Gray coding and its effects on metastability and bubble errors in a flash ADC.

In the encoder presented in this section, a fully pipelined architecture is used to enhance the circuit speed.
To further increase the speed and to handle the low-swing output signals of the comparators, the encoder is implemented using CML blocks. In order to investigate the effect of pipelining, two-stage and four-stage pipelined encoders as well as an encoder with no pipelining are designed and their performances are analyzed. Compared to the encoder with no pipelining, circuit analysis predicts that the two- and four-stage pipelined encoders achieve speed improvements of 67% and 150%, respectively, at the cost of 28% and 94% more power consumption.

3.3.1 Encoder Design

The following equations show the relation between the digits of the thermometer-coded data (Tn), Gray-coded data (Gn), and binary-coded data (Bn) for a 4-bit encoder.

\[ G_3 = T_8 \tag{3.29} \]

\[ G_2 = T_4 \overline{T_{12}} \tag{3.30} \]

\[ G_1 = T_2 \overline{T_6} + T_{10} \overline{T_{14}} \tag{3.31} \]

\[ G_0 = T_1 \overline{T_3} + T_5 \overline{T_7} + T_9 \overline{T_{11}} + T_{13} \overline{T_{15}} \tag{3.32} \]

\[ B_3 = G_3 \tag{3.33} \]

\[ B_2 = G_2 \oplus B_3 \tag{3.34} \]

\[ B_1 = G_1 \oplus B_2 \tag{3.35} \]

\[ B_0 = G_0 \oplus B_1 \tag{3.36} \]

To simplify the implementation, the following equivalent expressions can be used for G1 and G0. As a result, the thermometer code to Gray code encoder can be implemented using only AND and NAND gates.

\[ G_1 = \overline{\overline{T_2 \overline{T_6}} \cdot \overline{T_{10} \overline{T_{14}}}} \tag{3.37} \]

\[ G_0 = \overline{\overline{T_1 \overline{T_3}} \cdot \overline{T_5 \overline{T_7}} \cdot \overline{T_9 \overline{T_{11}}} \cdot \overline{T_{13} \overline{T_{15}}}} \tag{3.38} \]

Figure 3.15 shows the schematic of a simple 4-bit encoder. Note that no gate is needed for inversion of the input thermometer code (e.g., T12 in Figure 3.15), since the encoder inputs (i.e., the outputs of the comparators) are differential signals; that is, both the inverted and non-inverted inputs are available.

Figure 3.15: Schematic of a simple 4-bit encoder

3.3.2 Pipelining

Pipelining is a general method that is used to enhance the operation speed of digital circuits [59]. However, this improvement usually comes at the cost of increased power, silicon area, and complexity. The basic concept of pipelining is to break the complete circuit down into a number of stages between which latches are inserted.
Before pipelining, the propagation delay of the complete circuit determines the circuit maximum operating frequency, while after pipelining the propagation delay of the slowest pipelined stage constrains the operating frequency [59]. Therefore, to implement an efficient pipelining scheme, it is desirable to have approximately the same delay in all stages. This is achieved in our case by implementing the encoder using CML gates.

One possible scheme to break down the encoder circuit of Figure 3.15 is shown in Figure 3.16(a). This implementation reduces the propagation delay of the circuit to three gate delays, while in the original circuit of Figure 3.15, the propagation delay is four gate delays. In Figure 3.16(a), many paths in the circuit have a delay of less than three gates. Taking advantage of this fact and applying another modification as shown in Figure 3.16(b) further decreases the propagation delay. The two encircled gates in Figure 3.16(b) can be relocated to the appropriate stage (as indicated in the figure) to reduce the worst case propagation delay of each stage. The final circuit is shown in Figure 3.17(b).

To evaluate the benefits of pipelining, three different implementations of the encoder are considered, as shown in Figure 3.17. They include an implementation with no pipelining in Figure 3.17(a), two-stage pipelining in Figure 3.17(b), and four-stage pipelining in Figure 3.17(c). These three implementations are compared in terms of speed and power. The results of the comparison are summarized in Table 3.1. In this table, TNAND, TAND, and TXOR are the respective gate propagation delays and TSH is the sum of the setup and hold times for each latch. Figure 3.18 shows the critical paths that are used to calculate the critical path delays (CPDs) listed in Table 3.1.
Here, the relationship between the encoder maximum operating frequency (fmax) and the CPD is as follows [59]:

$f_{max} = \frac{1}{2 \times CPD}$ (3.39)

Table 3.1 shows that pipelining has the potential to reduce the critical path delay and therefore increase the operating frequency. However, different gates may have different propagation delays, and a solid comparison in terms of speed cannot be offered at this stage.

Figure 3.16: Pipelined structures with the worst case delays of (a) three gates and (b) two gates per stage

Figure 3.17: Three implementations of the encoder: (a) no pipelining, (b) two-stage pipelining, (c) four-stage pipelining

Table 3.1: Comparison between the three implementations of Figure 3.17

No pipelining, Figure 3.17(a)
  Critical path delay: $\max(2T_{NAND} + T_{AND} + T_{XOR},\ 2T_{NAND} + 2T_{XOR},\ T_{AND} + 3T_{XOR}) + T_{SH}$
  Power: $8P_{NAND} + 3P_{AND} + 3P_{XOR} + 4P_{Latch}$

Two-stage pipelining, Figure 3.17(b)
  Critical path delay: $\max(2T_{NAND},\ T_{NAND} + T_{AND},\ T_{NAND} + T_{XOR},\ T_{AND} + T_{XOR},\ 2T_{XOR}) + T_{SH}$
  Power: $8P_{NAND} + 3P_{AND} + 3P_{XOR} + 9P_{Latch}$

Four-stage pipelining, Figure 3.17(c)
  Critical path delay: $\max(T_{NAND},\ T_{AND},\ T_{XOR}) + T_{SH}$
  Power: $8P_{NAND} + 3P_{AND} + 3P_{XOR} + 21P_{Latch}$

Figure 3.18: Critical paths of the encoder circuits in Figure 3.17: (a) no pipelining, (b) two-stage pipelining

3.3.3 CML Implementation

To further increase the speed, CML circuits are used to implement the encoder circuitry. CML circuits also allow for operating on the low-swing signals of the comparator outputs without converting them to full-swing digital signals. Figure 3.19 shows the implementation of an AND/NAND gate, an XOR gate, and a latch in CML.
The dummy transistor in the AND/NAND gate (Figure 3.19(a)) is not needed for the functionality of the circuit; it is added to increase the symmetry of the gate circuit by having three transistors in each path from VDD to ground, and to attain symmetric waveforms at the gate output ports [60].

The similarity in the structure of CML gates simplifies the delay and power calculations for the encoder, especially when a common bias voltage is used for all the circuit blocks. In this case, all the gates as well as all the latches consume almost the same amount of power, which is equal to the tail current multiplied by the supply voltage. In addition, the CML AND/NAND and XOR gates shown in Figure 3.19(a) and Figure 3.19(b) have almost the same propagation delays. This can be intuitively justified as follows: in both gates, the output node is charged through the same equivalent load resistance, and in both of them the discharge path is through three stacked NMOS transistors.

Table 3.2 compares the CML implementations of the different encoders of Figure 3.17. Tgate is the propagation delay of a gate, while Pgate is the power consumption of a gate or a latch (which are equal). Assuming TSH ≅ Tgate, the speed and power of the three encoder implementations are compared in Figure 3.20. The two-stage and four-stage pipelined encoders have 67% and 150% higher speed, while consuming 28% and 94% more power, compared to the encoder with no pipelining. Therefore, to achieve the highest speed, the encoder of Figure 3.17(c) is chosen for the proposed flash ADC.
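The percentages quoted above follow directly from the entries of Table 3.2 and equation (3.39). The short Python sketch below reproduces that arithmetic; the dictionary layout and function names are illustrative and not part of the design flow, and the only modeling assumption is the one stated in the text, TSH ≅ Tgate.

```python
# Sketch: normalized speed and power of the three CML encoders.
# Gate-delay and power counts are taken from Table 3.2.
# Assumption (from the text): T_SH is equal to one gate delay.
# All names below are illustrative.

ENCODERS = {
    "no pipelining": {"gate_delays": 4, "power_units": 18},
    "two-stage pipelining": {"gate_delays": 2, "power_units": 23},
    "four-stage pipelining": {"gate_delays": 1, "power_units": 35},
}


def critical_path_delay(gate_delays, t_gate=1.0, t_sh=1.0):
    """CPD = (number of gate delays) * T_gate + T_SH, in units of T_gate."""
    return gate_delays * t_gate + t_sh


def f_max(cpd):
    """Equation (3.39): f_max = 1 / (2 * CPD)."""
    return 1.0 / (2.0 * cpd)


def normalized_comparison(encoders, base="no pipelining"):
    """Return {name: (speed, power)} normalized to the base encoder."""
    base_speed = f_max(critical_path_delay(encoders[base]["gate_delays"]))
    base_power = encoders[base]["power_units"]
    result = {}
    for name, enc in encoders.items():
        speed = f_max(critical_path_delay(enc["gate_delays"])) / base_speed
        power = enc["power_units"] / base_power
        result[name] = (speed, power)
    return result


if __name__ == "__main__":
    for name, (speed, power) in normalized_comparison(ENCODERS).items():
        print(f"{name}: speed x{speed:.2f}, power x{power:.2f}")
```

With TSH ≅ Tgate, the critical path delays are 5, 3, and 2 gate delays, which gives speed ratios of 5/3 and 5/2 and power ratios of 23/18 and 35/18 relative to the encoder with no pipelining.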
Figure 3.19: Implementation of (a) AND/NAND gate, (b) XOR gate, and (c) latch in CML

Table 3.2: Comparison between CML implementation of the circuits in Figure 3.17

No pipelining, Figure 3.17(a)
  Critical path delay: $4T_{gate} + T_{SH}$
  Power: $18P_{gate}$

Two-stage pipelining, Figure 3.17(b)
  Critical path delay: $2T_{gate} + T_{SH}$
  Power: $23P_{gate}$

Four-stage pipelining, Figure 3.17(c)
  Critical path delay: $T_{gate} + T_{SH}$
  Power: $35P_{gate}$

Figure 3.20: Bar graph comparison of the normalized speed and normalized power of the CML implementations of Figure 3.17 (no pipelining, two-stage pipelining, and four-stage pipelining). The encoder with no pipelining is used as the base circuit for this comparison.

3.3.4 Simulation Results

Using Cadence Spectre, the selected encoder circuit (Figure 3.17(c)) is simulated in a 0.18µm CMOS technology with a 1.8V supply. A bias current of 100µA is used for all the gates and the latches. As shown in Table 3.3, low-swing differential signals are considered for both the input thermometer code and the input clock signal. Simulation results show that the swing of the differential output signals is equal to or greater than that of the input signals. Figure 3.21 shows the output waveforms of this encoder operating at 4GS/s when a thermometer code corresponding to a ramp signal is applied to the circuit. These waveforms are the results of post-layout simulation.
Table 3.3: Amplitude of the encoder input and output signals

Signals                               DC value   Differential swing
Input signals (thermometer code)      1.6V       ±0.4Vpp
Clock signal                          1V         ±0.5Vpp
Output signals (simulation results)   > 1.5V     > ±0.4Vpp

Figure 3.21: Simulated output waveforms (CLK, B3, B2, B1, and B0 in volts versus time in ns) of the four-stage pipelined encoder (Figure 3.17(c)) for a thermometer code corresponding to a ramp signal

3.3.5 Further Modifications in the Encoder

In this section, further modifications in the encoder are presented, which are not implemented in the fabricated chip. However, they can be considered as design options. The design option of Section 3.3.5.1 was not implemented, to keep the power consumption low. The design options of Sections 3.3.5.2 and 3.3.5.3 were not implemented, as they were developed after chip fabrication.

3.3.5.1 Equalizing the Delay of the Different Signal Paths in the Encoder

To equalize the delay of the different signal paths in each pipelined stage, a proper number of delay elements can be added to the circuit. As an example, for the two-stage pipelined encoder, this delay insertion is shown in Figure 3.22. The delay elements can be in the form of CML AND or CML XOR gates.

Figure 3.22: Delay insertion to equalize the delays in the different signal paths of Figure 3.17(b)

3.3.5.2 A Modification to the AND/NAND Gate

A modification to enhance symmetry in an AND/NAND gate is presented here [61]. The circuit schematic of the AND/NAND gate that was previously shown in Figure 3.19(a) has the advantage of symmetric waveforms for the two branches, because of the dummy transistor added to the circuit. However, from a layout perspective, it is still not as symmetric as an XOR gate.
One modification that ensures more symmetry in the layout is adding another dummy transistor with a grounded gate in parallel with the other dummy transistor (whose gate is connected to VDD), as shown in Figure 3.23. The drain of this added transistor is connected to the other branch, resembling the schematic of an XOR gate. In this way, with minor modifications, the symmetric XOR gate layout can be used for the AND/NAND gate.

Figure 3.23: An AND/NAND gate with a similar topology to an XOR gate

3.3.5.3 Implementation of the Complete Encoder Using XOR Gates Only

Because of the special format of the thermometer codes, XOR gates can be used instead of AND/NAND gates in the structure of the encoder, as shown in Figure 3.24. In this way, AND/NAND gates, which are not as symmetric in the layout as the XOR gates, can be avoided. Furthermore, the complete encoder is implemented using a single gate type, which adds to the reliability of the circuit, as any process, voltage, and temperature (PVT) variation affects all the sub-circuits in the same way.

To clarify how the replacement of the AND/NAND gates with the XOR gates works, an example is given here. An AND gate with $T_1$ and $\overline{T_3}$ as its inputs produces $T_1 \overline{T_3}$. Alternatively, an XOR gate with $T_1$ and $T_3$ as its inputs produces $T_1 \overline{T_3} + \overline{T_1} T_3$. However, if we assume $\overline{T_1}$ is 1 ($T_1$ is 0), then, due to the special format of a thermometer code, $T_3$ is 0. It means that $\overline{T_1} T_3$ is always 0. Therefore, the XOR gate produces $T_1 \overline{T_3}$, which is the same as the output of the AND gate. As a result, all AND/NAND gates in the first gate level of the encoder can simply be replaced with XOR gates to get the same logical function.

For the AND/NAND gates in the second and third gate levels, looking at equations (3.31) and (3.32) for G1 and G0 reveals that those gates are used to generate OR functions. As the two terms in the OR function of (3.31) cannot be true at the same time, an XOR gate can be used to generate the same functionality.
The replacement of the OR gates of (3.32) with XOR gates is explained in a similar way.

Figure 3.24: The encoder implemented entirely with XOR gates

3.4 Reformulation of the Encoder Function

High-speed operation in the encoder requires both circuit-level and layout-level considerations. An important challenge in the encoder layout is the connection of the comparator outputs to the encoder inputs. Comparators are laid out in narrow rectangles. As a result, the output ports of all comparators in the array are located close to each other. This strategy reduces the routing delay of the comparator outputs to the encoder. However, in the presence of the RC parasitics, the delays of the routing wires are still problematic. Figure 3.25(a) shows a typical signal routing from the comparator outputs to the encoder. The special placement sequence of the comparators shown in Figure 3.25(a) is chosen to achieve a circular network for resistor averaging [42]. Long routings, different wiring lengths, and crossings of the wires that appear in the layout, especially those for implementing G0 and G1, adversely affect the encoder speed.

To reduce the length of the wires and the number of wiring crossovers, we can reformulate G1 and G0 as follows (see (3.31) and (3.32) for the original equations):

$G_1 = T_2 \overline{T_{14}} \cdot \overline{T_6 \overline{T_{10}}}$ (3.40)
$G_0 = T_1 \overline{T_{15}} \cdot \overline{T_3 \overline{T_{13}}} + T_5 \overline{T_{11}} \cdot \overline{T_7 \overline{T_9}}$ (3.41)

This reformulation is achieved by taking advantage of the format of thermometer codes. As shown in Figure 3.25(b), using the reformulation, only the outputs of adjacent comparators are connected to the same logic gate of the encoder. Therefore, long interconnects between the comparator outputs and the encoder inputs are removed. Furthermore, as shown in this figure, after the encoder reformulation, those interconnects become equal in length, reducing the race effect.
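Because a 4-bit thermometer code has only 16 valid values, the equivalence of the original and reformulated Gray-code functions can be checked exhaustively. The Python sketch below does this under two stated assumptions: the thermometer bits are indexed T[1] to T[15] with T[k] = 1 for k ≤ N, and the complements in (3.40) and (3.41) are placed as implied by the proof of G1 given in this section. The function names are illustrative only.

```python
# Sketch: exhaustive check of the encoder equations over all 16 thermometer codes.
# Assumptions: T[k] = 1 for k <= n, and complements in (3.40)/(3.41) are placed
# as implied by the proof of the G1 reformulation. Names are illustrative.


def thermometer(n):
    """Thermometer code for level n (0..15); index 0 is unused padding."""
    return [0] + [1 if k <= n else 0 for k in range(1, 16)]


def inv(bit):
    """Logical complement of a single bit."""
    return 1 - bit


def gray_original(T):
    """Gray bits from the original equations (3.29) to (3.32)."""
    g3 = T[8]
    g2 = T[4] & inv(T[12])
    g1 = (T[2] & inv(T[6])) | (T[10] & inv(T[14]))
    g0 = ((T[1] & inv(T[3])) | (T[5] & inv(T[7]))
          | (T[9] & inv(T[11])) | (T[13] & inv(T[15])))
    return g3, g2, g1, g0


def gray_reformulated(T):
    """G1 and G0 from (3.40) and (3.41); G3 and G2 are unchanged."""
    g3 = T[8]
    g2 = T[4] & inv(T[12])
    g1 = (T[2] & inv(T[14])) & inv(T[6] & inv(T[10]))
    g0 = (((T[1] & inv(T[15])) & inv(T[3] & inv(T[13])))
          | ((T[5] & inv(T[11])) & inv(T[7] & inv(T[9]))))
    return g3, g2, g1, g0


def gray_to_binary(g3, g2, g1, g0):
    """Binary value from the Gray bits, following (3.33) to (3.36)."""
    b3 = g3
    b2 = g2 ^ b3
    b1 = g1 ^ b2
    b0 = g0 ^ b1
    return (b3 << 3) | (b2 << 2) | (b1 << 1) | b0


def check_all_codes():
    """True if both formulations agree and decode to n for every valid code."""
    for n in range(16):
        T = thermometer(n)
        if gray_original(T) != gray_reformulated(T):
            return False
        if gray_to_binary(*gray_original(T)) != n:
            return False
    return True


if __name__ == "__main__":
    print("reformulation valid for all codes:", check_all_codes())
```

Note that each gate in `gray_reformulated` combines only the index pairs (1, 15), (2, 14), (3, 13), (5, 11), (6, 10), and (7, 9), which are neighbors in the circular comparator placement. The check covers valid thermometer codes only; codes containing bubbles are outside its scope.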
From another viewpoint, the reformulation can reduce the power consumption, since in order to connect the comparator outputs to the encoder using the routing in Figure 3.25(a), either CML latches must be added in the middle of the interconnect wires, or the drive capability of the last latch in the comparator array must be increased. In both solutions, the overall power consumption increases.

Although reformulation removes the long wires in the first stage, as shown in Figure 3.25(b), a few long interconnects remain. For those wires, we can place CML pipelining latches at the wire midpoints to halve the worst-case wire lengths. Figure 3.26 shows the encoder structure after reformulation and four-stage pipelining.

Figure 3.25: Routing of the comparator outputs to the encoder (a) before and (b) after encoder logic reformulation.

Figure 3.26: Encoder structure after reformulation and four-stage pipelining (comparators placed in the order 1, 15, 2, 14, 3, 13, 4, 12, 5, 11, 6, 10, 7, 9, 8, with four levels of latches between the comparator outputs and the binary outputs B3 to B0)

A proof for the reformulation of G1 is given here:

$G_1 = T_2 \overline{T_6} + T_{10} \overline{T_{14}}$ (3.42)
$\quad = T_2 \overline{T_6} \cdot \overline{T_{14}} + T_{10} \overline{T_{14}} \cdot T_2$ (3.43)
$\quad = T_2 \overline{T_{14}} \cdot (\overline{T_6} + T_{10})$ (3.44)
$\quad = T_2 \overline{T_{14}} \cdot \overline{T_6 \overline{T_{10}}}$ (3.45)

In (3.43), the first and second terms are multiplied (logical AND) by the factors $\overline{T_{14}}$ and $T_2$, respectively. In the thermometer code, when $\overline{T_6}$ is 1, $\overline{T_{14}}$ is also 1, and when $T_{10}$ is 1, $T_2$ is also 1. Therefore, (3.42) and (3.43) are equivalent. A similar proof can be given for the reformulation of G0.

Another proof that also gives an intuition for the reformulation can be obtained by looking at the waveforms of G0 and G1 in Figure 3.27 (see also Section 3.7). In the original formulas, G1 is formulated by adding the two pulses shown at the top of Figure 3.27(a). An alternative implementation is achieved by looking at the G1 waveform as an AND function between the two pulses shown at the bottom of Figure 3.27(a). In this way, comparators No. 2 and 14 are connected to the same gate, and comparators No.
6 and 10 are also connected to the same gate. These are adjacent comparators in the circular array. For G0, similarly, we can look at its waveform as a combination of the four pulses shown at the top of Figure 3.27(b). Alternatively, it can be looked at as shown at the bottom of Figure 3.27(b).

Figure 3.27: A visual proof for reformulation of the encoder function: (a) reformulation of G1, (b) reformulation of G0

3.5 Differential Clocking in the ADC

3.5.1 Effect of Amplitude/Phase Mismatch in Clock Signals

The clock signals, i.e., CLK and CLKB, are connected to the comparators through differential pair transistors. As these two clocks are differential signals, their common-mode signal should only be a DC voltage. However, like any other differential signals, in practice, differential clocks are subject to amplitude and/or phase mismatch. Amplitude mismatch can arise, for example, from mismatch in the on-chip 50Ω input termination resistors. Moreover, any mismatch in the on-chip capacitive load of the two clock signals can cause amplitude and/or phase mismatch in the clocks. Different routing delays of the clock signals are another source of phase mismatch.

In general, amplitude/phase mismatch between the clock signals can cause severe degradation in the performance of a system. However, due to the differential clock signaling in the complete ADC (i.e., all the clock ports in the ADC sub-blocks are differential pair transistors), the ADC system is fairly immune to those mismatches. In differential clock signaling, any amplitude/phase mismatch between the clock signals translates to a non-DC common-mode signal for the clocks. Fortunately, differential pairs tend to attenuate the common-mode components of their input signals. Furthermore, any common-mode signal that appears at the outputs would be attenuated by the next differential pair stage.
Therefore, the mismatches in amplitude and phase of the differential clock signals are tolerated to some extent.

Compared to square clock signals, sinusoidal clocks are more tolerant to amplitude and phase mismatches. The reason is that any mismatch between the sinusoidal clocks would gradually decrease the effective amplitude of the differential (sinusoidal) signal, and increase the amplitude of the common-mode (sinusoidal) signal. In square clocks, mismatches may destroy the differential signal, because at some points, both clock signals can be low or high, resulting in an undefined state in a CML latch.

3.5.2 Clocking of a Two-Channel Time-Interleaved ADC

We can take advantage of the differential clocking in the proposed ADC, if the ADC is to be used in a two-channel time-interleaved (TI) ADC system. For a two-channel TI ADC, two clocks are required with 180° phase difference. However, as the proposed ADC operates with differential clocks, simply interchanging the CLK and CLKB signals would result in the required phase difference. Since the TI ADC is sampling on both rising and falling edges of the clock, this method is referred to as the double-edge sampling technique.

In addition to the clocking strategy, for a TI ADC system, we should either use a fast single front-end T/H, or increase the input bandwidth of the preamplifier and the first latch stage (that works as the distributed T/H). The rest of the ADC sub-blocks would not be affected by the increased input bandwidth, as they work with the half-rate sampled data.

Figure 3.28 shows the timing diagram, including the exact sampling times, for such a two-channel TI ADC. When one ADC is sampling the input signal, the other ADC is processing the previously sampled signal. It takes several clock cycles for the ADCs to finish their process. The binary outputs will be available on the ADCs' output ports with 180° (i.e., half a clock cycle) phase difference.
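The double-edge sampling idea can be illustrated with a small timing model. The Python sketch below is only an idealized illustration: it assumes a 50% duty cycle, no clock skew, and that channel A samples at the start of each clock period while channel B, with CLK and CLKB interchanged, samples half a period later. The 4GHz clock value and all function names are assumptions made for the example.

```python
# Sketch: ideal sampling instants of a two-channel time-interleaved ADC.
# Assumptions: 50% duty cycle, no skew; channel B uses swapped CLK/CLKB and
# therefore samples half a clock period after channel A. Names are illustrative.
from fractions import Fraction


def channel_sampling_instants(n_samples, clock_period, phase_fraction):
    """Instants k*T + phase_fraction*T for k = 0 .. n_samples-1."""
    return [clock_period * (k + phase_fraction) for k in range(n_samples)]


def interleaved_instants(n_samples, clock_period):
    """Merge the instants of both channels in time order."""
    ch_a = channel_sampling_instants(n_samples, clock_period, Fraction(0))
    ch_b = channel_sampling_instants(n_samples, clock_period, Fraction(1, 2))
    return sorted(ch_a + ch_b)


# Example value: a 4GHz clock, i.e. a period of 1/4 ns (assumed for illustration).
T_CLK_NS = Fraction(1, 4)
instants = interleaved_instants(4, T_CLK_NS)
spacings = {later - earlier for earlier, later in zip(instants, instants[1:])}

if __name__ == "__main__":
    print("sampling instants (ns):", [str(t) for t in instants])
    print("distinct spacings (ns):", [str(s) for s in spacings])
```

In this ideal model every pair of consecutive samples is separated by exactly half a clock period, so the aggregate sampling rate is twice the clock frequency while each channel still processes data at the clock rate.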
Figure 3.28: Timing diagram of the two-channel ADC

3.6 Resistor Ladder

The resistor ladder consists of a number of equal resistors. As long as the resistors are equal in value, independently of their absolute values, the resistor ladder produces evenly-spaced voltage references. However, there is a trade-off between the power consumption of the resistor ladder and its susceptibility to noise injection. Smaller resistors result in higher power consumption, while they reduce noise injection at the reference voltages.

Input signal feedthrough is one of the major noise sources for the resistor ladder. The simplified model shown in Figure 3.29 shows how the input signal can inject noise on a reference voltage through the parasitic capacitances of the input differential pairs of the preamplifier [30].

Figure 3.29: Feedthrough of input signal to resistor ladder through preamplifier [30]

The amplitude of the injected noise is proportional to the resistance of the reference node. The worst case would be the middle reference voltage, which has the highest resistance. To minimize the voltage induced on the reference voltages, the wiring from the resistor ladder to the preamplifier should be kept as short as possible. Otherwise, the wire resistance would add to the resistance of the nodes and degrade the quality of the reference voltage.

Thermal noise is another source of noise in the resistor ladder. Thermal noise in a resistor is given by the following equation [58]:

$v_n = \sqrt{4 k_B T R \Delta f}$ (3.46)

where kB is the Boltzmann constant (1.38×10⁻²³ J/K), T is the resistor absolute temperature in kelvins, R is the resistor value in ohms, and Δf is the bandwidth in hertz. For a resistor of 5 to 10Ω (typical of resistors in the ladder), a bandwidth of 2GHz (bandwidth of the proposed ADC), and assuming eight resistors in series as the worst case resistance in the array, vn is less than 52µV at room temperature. This value is far smaller than the LSB of the proposed 4-bit ADC, which is ~90mV.
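The numerical bound quoted above can be reproduced from (3.46). The Python sketch below evaluates the worst case described in the text (eight 10Ω resistors in series, 2GHz bandwidth); the room temperature of 300K and the function name are assumptions made for this illustration.

```python
# Sketch: worst-case thermal noise of the resistor ladder from equation (3.46).
# Assumptions: room temperature taken as 300 K; function name is illustrative.
import math

K_B = 1.38e-23  # Boltzmann constant in J/K, as given in the text


def thermal_noise_rms(resistance_ohm, bandwidth_hz, temperature_k=300.0):
    """RMS thermal noise voltage v_n = sqrt(4 * k_B * T * R * delta_f)."""
    return math.sqrt(4.0 * K_B * temperature_k * resistance_ohm * bandwidth_hz)


# Worst case from the text: eight 10-ohm resistors in series, 2 GHz bandwidth.
worst_case_resistance = 8 * 10.0
v_n = thermal_noise_rms(worst_case_resistance, 2e9)
lsb = 90e-3  # approximate LSB of the 4-bit ADC quoted in the text

if __name__ == "__main__":
    print(f"v_n = {v_n * 1e6:.1f} uV, v_n / LSB = {v_n / lsb:.1e}")
```

Under these assumptions the noise comes out slightly below 52µV, which is more than three orders of magnitude smaller than the ~90mV LSB.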
Therefore, the thermal noise of the resistor ladder is negligible.

3.7 More Discussion on Gray Coding

In this section, Gray coding and its effects on the metastability and bubble errors are discussed in more detail. Figure 3.30(a) and Figure 3.30(b) show waveforms for 4-bit binary and Gray counters, respectively. As shown in these figures, at some points, such as the switching time from code 7 to code 8, more than one bit changes in the binary code. However, in the Gray code, there is only a one-bit transition between consecutive codes. Table 3.4 shows the conversion from a decimal code (thermometer code) to the binary and Gray codes.

Figure 3.30: Binary versus Gray counter: (a) binary counter, (b) Gray counter

Table 3.4: Binary codes versus Gray codes

Decimal code   Binary code   Gray code
0              0000          0000
1              0001          0001
2              0010          0011
3              0011          0010
4              0100          0110
5              0101          0111
6              0110          0101
7              0111          0100
8              1000          1100
9              1001          1101
10             1010          1111
11             1011          1110
12             1100          1010
13             1101          1011
14             1110          1001
15             1111          1000

3.7.1 Effect on Metastability

Metastability, or the undefined state of a comparator, can cause errors in the ADC outputs. If the thermometer code is converted to a Gray code, a metastable state affects only one bit of the Gray code. This is shown in Figure 3.31(a) and Figure 3.31(b) for binary and Gray counters, respectively. In these figures, it is assumed that the switching of the third bit in the Gray or binary code happens earlier or later than it should, while all other bits switch at the right time. The resultant codes are shown in the figures as well. In a binary counter, the earlier or later switching time results in a large error in the output code, while in a Gray counter, the output code only changes by one as a result of such an error. Converting the thermometer code to the Gray code also gives the comparator outputs more time to resolve the metastability, while those signals are passing through the gates and pipelining latches in the encoder [22].
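The entries of Table 3.4 and the single-bit-transition property can be checked with a few lines of code. The Python sketch below uses the standard binary-reflected Gray code relation (the code value XORed with itself shifted right by one bit), which reproduces the table; the function names are illustrative.

```python
# Sketch: binary-reflected Gray code for a 4-bit counter (matches Table 3.4).
# Function names are illustrative.


def binary_to_gray(n):
    """Gray code of n: each Gray bit is the XOR of two adjacent binary bits."""
    return n ^ (n >> 1)


def gray_to_binary(g):
    """Inverse mapping: each binary bit is the XOR of all higher Gray bits."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def bits_changed(a, b):
    """Number of bit positions in which codes a and b differ."""
    return bin(a ^ b).count("1")


gray_table = [format(binary_to_gray(n), "04b") for n in range(16)]

if __name__ == "__main__":
    for n in range(15):
        print(n, "->", n + 1,
              "binary bits changed:", bits_changed(n, n + 1),
              "Gray bits changed:",
              bits_changed(binary_to_gray(n), binary_to_gray(n + 1)))
```

For the transition from code 7 to code 8, all four bits of the binary code change, whereas exactly one bit of the Gray code changes, as for every other pair of consecutive codes.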
Figure 3.31: Effect of an error in B3 (a) in a binary code and (b) in a Gray code. The top waveform is related to the correct switching time. The bottom waveforms are related to earlier or later switching. The respective codes are shown below the waveforms. It is assumed that all other bits are switching at the right time.

3.7.2 Effect on Bubble Errors

Bubble errors are another source of error in flash ADCs that may result in the corruption of the ADC output bits and the degradation of the ADC linearity performance. An effective method to increase the immunity against bubble errors is to convert the thermometer code to a Gray code as an intermediate step before binary encoding. To investigate this effect, a bubble error is introduced into a thermometer code close to the signal level. For this purpose, code N of a thermometer code is considered, which has a format of 00…0011…11 with the right N bits equal to 1 and the rest equal to 0. Two cases of bubble errors are considered here: (1) a bubble error ‘0’ is inserted at bit N-1 and (2) a bubble error ‘1’ is inserted at bit N+2. The results are reported in Table 3.5 for a Gray code encoder, using equations (3.29) to (3.32).

In the case of bubble errors, ‘1’s counter encoders (such as Wallace tree counters [52]) produce the best results. However, counter encoders are hardware-intensive and slow. To evaluate the effectiveness of the Gray code encoder, its results are compared with those from a ‘1’s counter encoder. As shown in the table, in all cases, the resultant code of the Gray encoder differs by only two from the code produced by a ‘1’s counter.

Table 3.5: Resultant code after inserting a bubble error into a thermometer code N: (1) a bubble error ‘0’ inserted at bit N-1, (2) a bubble error ‘1’ inserted at bit N+2. Results are reported for a Gray code encoder (shown by G) and a ‘1’s counter encoder (shown by C).

Decimal code N     0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
T(N-1) = 0    G    -   -   3   0   5   2   7   4   9   6  11   8  13  10  15  12
              C    -   -   1   2   3   4   5   6   7   8   9  10  11  12  13  14
T(N+2) = 1    G    3   0   5   2   7   4   9   6  11   8  13  10  15  12   -   -
              C    1   2   3   4   5   6   7   8   9  10  11  12  13  14   -   -

4 CHIP LAYOUT AND MEASUREMENT RESULTS

In this chapter, the layout techniques, the output buffer circuit, the test setup, and the measurement results for the flash ADC are discussed. A performance summary and a comparison with the state-of-the-art work are presented at the end.

4.1 Layout

The ADC was laid out for on-chip probing and was fabricated in a standard digital 1-poly 6-metal (1P6M) 0.18µm CMOS process. The chip micrograph and a close-up of the ADC core are shown in Figure 4.1. The complete ADC including the output buffers is laid out in an area of 0.06mm2. This small area facilitates achieving better component matching as well as higher speeds.

Common-centroid structures are used for all transistor and resistor pairs in the preamplifier and first latching stages of the comparator. Also, a common-centroid layout is utilized in the resistor ladder. Using these geometric layout techniques as well as the two-stage resistor averaging in the comparator, the effects of mismatches are reduced, and the use of digital calibration is avoided.

Comparators are laid out in narrow 8µm×102µm rectangles. As a result, the input ports of all comparators in the array are located close to each other within a span of 170µm. The same layout proximity applies to the clock, bias, and output ports of the comparators. The closeness of the ports preserves the integrity of the input and clock signals as distributed to the comparators, reduces the routing delay of the comparator outputs to the encoder, and also reduces the drop of the bias voltage as it is distributed across the comparator array.
Furthermore, the proximity of the comparators increases component matching among them, while bringing the output ports of the preamplifiers (and the first latches) close to each other, thereby increasing the effectiveness of the resistor averaging network.

High-speed input signals, including the input and clock signals, have 50Ω on-chip termination resistors to remove the reflected signal at high speeds. In order to reduce the capacitance of high-speed ports, the on-chip pads for those signals are made smaller (although this is more critical for output ports). An octagonal pad is used instead of a rectangular one (Figure 4.1), which resulted in a 30% reduction in the pad capacitance. Upon availability of precise probe positioners, smaller pads can also be used. The routing of all high-speed signals, including the input and clock signals and the output bits of the ADC, is carried out in such a way that the corresponding signals, such as CLK and CLKB, and B3 to B0, have the same wire length in the layout.

In the proposed ADC, the output buffers have their own power supply lines on the chip. Therefore, the current switching in the output buffers does not affect the ADC operation, which has almost no switching current. Also, this supply separation makes the power measurement possible for the ADC core. In practice, if the ADC is to be used as an embedded block, no output buffer is required. On the other hand, if the ADC is to be packaged as a stand-alone circuit, then, due to the high-speed changes in the ADC outputs, low-voltage differential signaling is required to send the output bits to the other parts of the system. Due to the differential operation of such output buffers, the switching noise produced on the power supply is small.

All the sub-circuits of the ADC, including the different stages in the comparators and the gates and latches in the encoder, are surrounded by a substrate contact shield to reduce the effect of substrate noise.
All the ground pins are connected together on-chip by a wide metal trace under the pads around the chip. The body of all NMOS transistors is connected to ground.

Figure 4.1: ADC chip micrograph

4.1.1 Distribution of the Input and Clock Signals to the Comparator Array

Distributed T/Hs suffer from skews in the clock and input signals distributed across the array [22]. To reduce this effect, a binary-tree distribution [62] (similar in concept to the H-tree clock distribution network) can be used to deliver the input signals to the preamplifiers and the clock signals to the first latch stages (which act as the T/H). Because of the narrow layout of the comparators, the input and clock ports are formed close to each other. In addition, there are only 21 comparators in the comparator array. Therefore, in this case, the input and clock signals are directly connected to the comparators. It is clear that the proximity of the input and clock ports is more essential in a 6-bit or higher-resolution flash ADC with distributed sampling.

Using the Elmore delay equation [63], the delay of the input and clock signals across the comparator array can be estimated. If the signal is applied to the middle comparator of the array, there are (N−1)/2 comparators on each side of the connection point. R is the resistance of each part of the wire and C is the parasitic capacitance of the input port of the preamplifier (or the parasitic capacitance of the clock port of the first latch, in the case of clock distribution). Based on Elmore, the propagation delay from the middle comparator to node M (shown in Figure 4.2), which is m nodes away from the middle, can be approximated as $\frac{1}{2} m^2 RC$. This results in a worst case delay of $T \cong \frac{1}{2} \left( \frac{N-1}{2} \right)^2 RC$ for the comparators on the two ends of the array in the layout. Note that the comparator array is folded for better implementation of the circular resistor averaging network.
Therefore, the comparators on the two ends in the layout are the comparators #11 and #21 (or #1).

Figure 4.2: Propagation delay estimation of the input and clock distribution across the comparator array using Elmore delay. The delay for node M is estimated as $\frac{1}{2} m^2 RC$.

4.1.2 Common-Centroid Layout in the Preamplifier

To increase matching, the common-centroid layout [61] is used for the input transistors, tail current transistors, and load resistors in the preamplifier (and the first latching stages). For a differential pair, shown in Figure 4.3(a), each transistor is divided into a number of fingers and the fingers are interleaved, as shown in Figure 4.3(b). Two dummy fingers are added on each end to reduce the edge effects and are connected to ground.

For the preamplifier, which includes two differential pairs as shown in Figure 4.3(c), the layout of Figure 4.3(b) can be used for each differential pair. Alternatively, one of the layouts introduced in Figure 4.3(d) and Figure 4.3(e) can be used, where each transistor is divided into several fingers, and a common-centroid layout is drawn for the four transistors as a whole. The difference between these two figures is that in Figure 4.3(d), the G1 and G2 gates, and also the G3 and G4 gates, do not have a common center, while G1+G3 (for which the drain nodes are connected together) and G2+G4 (for which the drains are also connected together) still have a common center. This symmetry is also important considering the preamplifier gain given by (3.4), which can be rewritten as follows.

$V_{out} = g_m R_D \left( (V_{G1} - V_{G4}) - (V_{G2} - V_{G3}) \right)$ (4.1)
$\quad = g_m R_D \left( (V_{G1} + V_{G3}) - (V_{G2} + V_{G4}) \right)$ (4.2)

In Figure 4.3(e), a common-centroid layout is shown that provides a common center for the G1 and G2 gates as well as the G3 and G4 gates, at the cost of a slightly larger layout area. Figure 4.3(f) shows the method of producing the layouts of Figure 4.3(d) and Figure 4.3(e).
As shown in this figure, if we start from node D1 and follow the line, we will end up at node D1 again, having passed all the nodes of the double differential pair. If the node names are recorded in the order they are encountered, we will get “D1, G1, S1, G2, D2, G4, S2, G3, and D1”. This list gives the order of the nodes in the left part of the layout in Figure 4.3(d). For the right part of this layout, we should start from node D2.

Figure 4.3: (a) Schematic of a differential pair, (b) common-centroid layout for (a), (c) schematic of the transistors in a differential difference amplifier, (d) and (e) two common-centroid layouts for (c), (f) the order of nodes used in drawing (d)

For simplicity, the wires connecting the nodes with the same name are not shown in the layouts of Figure 4.3(d) and Figure 4.3(e). Routing those wires makes the layout more complicated. To avoid this complexity, in the proposed ADC, the simple common-centroid layout of Figure 4.3(b) is used for each differential pair in the preamplifier. However, the layout structures introduced in Figure 4.3(d) and Figure 4.3(e) can be used for flash ADCs with higher resolutions (e.g., 6 bits and beyond), where matching is more critical.

4.1.3 Common-Centroid Layout for the Resistor Ladder

In addition to the comparators, the resistor ladder is also a source of decision-level offsets in a typical flash ADC. The resistor ladder can occupy a considerable amount of on-chip area, as it is connected to the input ports of all comparators in the array. As resistors in the ladder are spread over a wide area, they are prone to systematic mismatches arising from process variations. These mismatches produce offset in the voltage references and, as discussed in Section 3.2.3.1, any offset in the voltage references directly affects the decision levels of the ADC.
Therefore, besides using common-centroid structures for all transistor and resistor pairs in the comparators, a common-centroid layout is also introduced for the resistor ladder to further decrease the decision-level offsets of the ADC. As shown in Figure 4.4, each resistor is first broken into a number of parallel segments such that the resistor ladder forms a grid of resistors. The resistor grid is then twisted to interleave the resistor segments symmetrically such that the overall structure ends up possessing a common center. In the resulting common-centroid layout shown in Figure 4.4, for each resistor, only one of its parallel segments appears on any given row or column.

The technique that is used here can be extended to higher ADC resolutions, e.g., 6-bit flash ADCs, where the voltage offsets of the reference voltages generated by the resistor ladder become more significant.

Figure 4.4: A common-centroid layout for the resistor ladder (panels: breaking the resistors into parallel segments; interleaving to achieve a common-centroid layout, with the common center and the centroid lines marked).

It is worth noting that the resistor ladder should supply differential reference voltages for each comparator, e.g., the first and the last reference voltages are required for both the first and the last comparators in the comparator array. Using a circular array of comparators not only provides for an easier implementation of the resistor averaging network for the preamplifiers and the first level of latches, but also facilitates the folded structure for the comparators, as a result of which the comparators that use the same reference voltages (e.g., the first and the last comparator) stay close to each other. A folded layout for the resistor ladder [35] would provide such differential voltages for the comparator array.
However, in the common-centroid layout introduced here, all reference voltages are available on each row of the resistor ladder.

4.2 Post-Layout Simulation Results

The ADC was tested using a 4GHz clock and a low-speed input signal of 15.625MHz. The ADC was also tested near the Nyquist rate, with a high-speed input signal of 1.984375GHz (= 2GHz − 15.625MHz). Post-layout simulations resulted in ENOBs of 4.00 and 3.44, respectively. The reconstructed signals are shown in Figure 4.5.

Figure 4.5: Post-layout simulation results (ADC output codes vs. the sample numbers): (a) a 15.625MHz signal sampled at 4GHz, (b) a 1.984375GHz signal sampled at 4GHz

4.3 Output Driver

Due to the availability of high-speed measurement equipment, no decimation was performed on the outputs. For the ADC outputs to drive the 50Ω input resistance of the measurement equipment, the differential output voltages of the ADC are converted to single-ended output currents, using the circuit shown in Figure 4.6. The ADC output voltages are first buffered to be able to drive the large transistors in the last stage. The tail current for this stage is 1.5mA. The current mirror at the output of this stage multiplies the current by two and provides an output current with a peak amplitude of 3mA. This current produces a square waveform with a peak amplitude of 150mV on the 50Ω input resistance of the oscilloscope.

Figure 4.6: Output driver to drive the 50Ω input resistance of the oscilloscope

4.4 Test Setup

The test setup for the proposed ADC is shown in Figure 4.7.

Figure 4.7: ADC test setup

The DC supplies used for the ADC test are as follows. The ADC core and the output buffers are supplied by two separate 1.8V supplies (VDD and VDD-OUT). Two other supplies (Vref-high and Vref-low) are used as the reference voltages for supplying the resistor ladder. Their voltages are set to 1.75V and 0.8V, respectively.
Another supply (Vbias) is used to provide an externally adjustable bias voltage for the reference current source of the ADC. All the ADC internal bias currents are adjusted by this current source.

An RF signal generator produces a sinusoidal waveform as the input signal. The signal is converted to differential by a splitter. The differential signals are level shifted by (0.8 + 1.75) / 2 = 1.275V through two bias-tees and connected to the input ports of the ADC. The differential clock signals are generated by a differential square clock signal generator that is part of an error performance analyzer (Agilent 86130A), and are directly connected to the clock ports of the ADC.

To monitor and record the ADC output bits, a four-channel 40GS/s oscilloscope (Agilent Infiniium DSO81304A) is used. The ADC outputs are directly connected to the oscilloscope input ports. All those ports have 50Ω termination resistors. The oscilloscope is set to 4-channel simultaneous sampling at 20GS/s on each channel. It captures the ADC output waveforms and stores them in its memory. The oscilloscope is equipped with an Ethernet port that is used to transfer the stored data offline to a PC for further processing. A MATLAB file on the PC extracts the ADC output bits from the raw sampled data, which are the digitized ADC output waveforms.

To prevent leakage in the frequency domain (FFT operation), the sampled signal should be periodic. Thus, the input and clock signals should be synchronized. In addition, synchronizing the sampling time of the oscilloscope with the input and clock signals makes it easier to extract the ADC output bits from the sampled data. For this purpose, these three instruments (the oscilloscope and the input and clock signal generators) are synchronized through their 10MHz synchronization ports. In our test setup, the oscilloscope is set as the synchronization master, although either of the other two instruments could also fulfill this role.
As the differential clock generator had no synchronization port, another RF signal generator, which was equipped with a 10MHz synchronization port, was used to provide a clock input signal for the differential clock generator.

The extraction of the ADC output bits from the raw sampled data is as follows. For a 4GHz clock and the 20GS/s oscilloscope sampling rate, the output bits can be obtained by taking one sample out of every five samples. To find the exact starting point among the five possible choices, the MATLAB program chooses the starting point that gives the best ENOB. ENOB and SNDR are calculated using the FFT of the samples. First, the samples are converted to digital values using a comparison with the signal midpoint (similar to a 1-bit quantizer). The digital values of the different bits are binary weighted and added together to form a 4-bit code. The array of output codes is truncated for a proper FFT operation. The signal and noise powers are then calculated from the frequency spectrum of the array.

4.5 Measurement Results

4.5.1 Power Consumption

Due to the use of CML circuits in all blocks, the ADC power consumption is almost independent of the clock and input frequencies. The proposed ADC consumes 43mW from a 1.8V supply, which includes 21mW (~1/2) for the comparators, 8mW (~1/6) for the encoder, and 14mW (~1/3) for the resistor ladder, as shown by the pie diagram of Figure 4.8. In the proposed ADC, no clock buffer is used, i.e., the 43mW power does not include the clock buffer power consumption. It also does not include the power of the output buffers.

Figure 4.8: Pie diagram of the power consumption in different blocks of the ADC

The reason for not implementing a clock buffer in the proposed ADC is to provide the possibility of testing the ADC with different clock waveforms, such as low swing or full swing, and square or sinusoidal.
If a buffer is to be added to the ADC, its power consumption can be estimated using $P_{CLK} = C_{CLK} V_{CLK}^{2} F_{CLK}$, in which $P_{CLK}$, $C_{CLK}$, $V_{CLK}$, and $F_{CLK}$ respectively represent the clock power, the clock capacitance (i.e., 2×1.1pF), the voltage swing of the clock signal (i.e., a value between 0.4 and 1.8V), and the sampling rate of the ADC (i.e., 4GS/s). As a result, for clock voltage swings of 0.4, 1, and 1.8V, $P_{CLK}$ would be 1.4, 8.8, and 28.5mW, respectively. To reduce the clock power, we can take advantage of an LC oscillator that can produce low- or full-swing clocks with low power consumption [19].

The power consumption of the proposed ADC is substantially lower than that of the previously reported flash ADCs [18] and [19]. The reason is that in those ADCs, smaller transistors with higher speeds are utilized in the comparator. As a result, their comparators have relatively larger voltage offsets compared to the proposed ADC. Then, to remove the effect of the bubble errors that appear because of the large comparator offsets, complex digital circuits such as digital averaging [19] or Wallace-tree counting [18] are used in the encoder. As a result, the total power consumption is increased. A performance summary for [18] and [19] is given in Table 4.1 at the end of this chapter.

4.5.2 DNL/INL Performance Measurements

Several techniques are available for measuring the DNL/INL performance [64]. In these methods, usually the DNL performance is measured and the INL performance is obtained by integration. A method that is appropriate for low-speed tests is to apply a slow ramp signal as the input to the ADC. In this mode of operation, the ADC resembles a 4-bit counter. As the input signal is slow, each output code of the ADC appears repeatedly in the output before the code changes. If different codes are placed in different bins, at the end of the test time, the number of samples in each bin is proportional to the width of the respective code, from which the DNL of that code is obtained.
To have an appropriate measurement, an integer number of periods of the input signal should be considered. Another input waveform that can be used in either low-speed or high-speed tests is a sinusoidal signal. At the end of the test time, the number of samples in each bin should follow the probability distribution of the input signal amplitude. Any deviation from that distribution specifies the INL and DNL errors. For a sinusoidal signal with amplitude a, the amplitude distribution function is

$$p(x) = \frac{1}{\pi\sqrt{a^{2} - x^{2}}}. \tag{4.3}$$

This method is known as the code density or histogram test of ADCs [65] and is used here for the DNL measurements. As a differential ramp signal generator was not available at the time of chip test, a sinusoidal signal was used. Figure 4.9 shows the DNL/INL performance for an 11.131MHz input signal sampled at 3GHz. The DNL and INL lie in the range of −0.35 to 0.35LSB and −0.26 to 0.24LSB, respectively. Note that this DNL/INL performance is achieved without using digital calibration, which confirms the effectiveness of the layout techniques incorporated in this design.

Figure 4.9: Measured DNL/INL for an 11.131MHz signal sampled at 3GHz (horizontal axis: output code; vertical axis: DNL/INL in LSB)

4.5.3 ENOB (SNDR) Measurements

4.5.3.1 ENOB for a Constant Input Frequency

The relationship between ENOB and SNDR (in dB) is given by

$$\mathrm{ENOB} = \frac{\mathrm{SNDR} - 1.76}{6.02}. \tag{4.4}$$

ENOB measurements for the ADC are reported in two separate diagrams. First, a 10MHz input signal is applied to the ADC, and the clock frequency is changed. Figure 4.10 shows the ENOB (SNDR) performance versus the clock frequency. At 2GHz, an ENOB of 3.93 bits is achieved. At 4GHz (5GHz), ENOB drops to 3.71 (3.5) bits. Higher clock frequencies are also applicable to the ADC, due to the CML structure. At these frequencies, ENOB drops slowly. At 9GHz, the ADC still has an ENOB of 3.22 bits (not shown in the figure).
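As a hedged illustration of (4.4) and of the FFT-based SNDR calculation described in Section 4.4, the following Python sketch quantizes a coherently sampled full-scale sinusoid with an ideal 4-bit mid-rise quantizer and computes SNDR and ENOB from the spectrum. The record length, the number of input cycles, the phase offset, and the function names are assumptions chosen for illustration; this is not the MATLAB code used for the measurements.

```python
import numpy as np

def quantize_midrise(x, n_bits):
    """Ideal bipolar mid-rise quantizer for inputs normalized to [-1, 1]."""
    k = 2 ** (n_bits - 1)                        # steps per polarity
    index = np.clip(np.floor(x * k), -k, k - 1)  # step index, clipped at full scale
    return (index + 0.5) / k                     # analog value of the output code

def sndr_and_enob(samples, signal_bin):
    """Return (SNDR in dB, ENOB) from a coherently sampled record."""
    power = np.abs(np.fft.fft(samples)) ** 2
    n = len(samples)
    # The fundamental occupies one positive- and one negative-frequency bin.
    signal = power[signal_bin] + power[n - signal_bin]
    # Everything else except DC (bin 0) counts as noise plus distortion.
    noise_and_distortion = power[1:].sum() - signal
    sndr_db = 10 * np.log10(signal / noise_and_distortion)
    enob = (sndr_db - 1.76) / 6.02               # equation (4.4)
    return sndr_db, enob

# Assumed test conditions: 4096 samples holding exactly 127 input cycles,
# so the record is periodic and the FFT shows no leakage.
N_SAMPLES, N_CYCLES = 4096, 127
t = np.arange(N_SAMPLES)
vin = np.sin(2 * np.pi * N_CYCLES * t / N_SAMPLES + 0.1)
sndr_db, enob = sndr_and_enob(quantize_midrise(vin, 4), N_CYCLES)
print(f"SNDR = {sndr_db:.2f} dB, ENOB = {enob:.2f} bits")
```

Choosing an integer number of cycles per record mirrors the synchronization requirement of the test setup: without it, the fundamental would leak into neighboring bins and be counted as noise.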
Figure 4.10: ENOB (SNDR) graph for a 10MHz input signal (horizontal axis: clock frequency in GHz; vertical axes: ENOB in bits and SNDR in dB)

4.5.3.2 ENOB for a Constant Clock Frequency

ENOB is also reported for a constant sampling rate. Clock frequencies of 3 and 3.5GHz are applied to the ADC, and the input frequency is changed from low frequencies to the Nyquist rate. Figure 4.11 shows the measurement results for the ENOB performance versus the input frequency. For a 3GHz clock signal, an ENOB of 3.14 is achieved for a 0.501GHz input signal. At 1.491GHz input (close to the Nyquist frequency), ENOB is still above 2.75. As shown in the figure, the ENOB performance drops for high-speed input signals. This is due to two reasons. First, the ADC has a limited input bandwidth. Thus, as the input frequency increases, the input signal is attenuated in different stages of the ADC, so that instead of the full range of output codes, only a subset of them appears in the output. Furthermore, at high input frequencies, the performance of the internal sub-blocks of the ADC is degraded and the linearity of the ADC is decreased.

Figure 4.11: ENOB (SNDR) graphs for 3 and 3.5GHz clock frequencies (horizontal axis: input frequency in GHz; vertical axes: ENOB in bits and SNDR in dB)

4.5.4 SFDR Measurements

Figure 4.12(a) shows the spurious-free dynamic range (SFDR) performance versus the sampling rate for a constant input signal of 10MHz. Figure 4.12(b) shows the same performance versus the input frequency for constant sampling rates of 3 and 3.5GHz.
Figure 4.12: SFDR measurement results: (a) for a 10MHz input signal and different clock frequencies, (b) for 3 and 3.5GHz clock frequencies and different input frequencies

4.5.5 Output Waveforms

Figure 4.13 shows the ADC output waveform for different input and clock frequencies, including (a) a 0.01GHz signal sampled at 4GHz, (b) a 0.101GHz signal sampled at 3.5GHz, (c) a 0.301GHz signal sampled at 3GHz, and (d) a 1.491GHz signal sampled at 3GHz. Increasing the input frequency, as shown in the figures, reduces the range of the ADC output codes. This signal attenuation is due to the limited input bandwidth of the ADC. Adding a front-end T/H could be a solution to improve the performance at high input frequencies.

Figure 4.13: ADC output waveforms (horizontal axes are the sample numbers): (a) a 0.01GHz (10MHz) signal sampled at 4GHz, ENOB = 3.71; (b) a 0.101GHz signal sampled at 3.5GHz, ENOB = 3.35; (c) a 0.301GHz signal sampled at 3GHz, ENOB = 3.24; (d) a 1.491GHz signal sampled at 3GHz, ENOB = 2.75

4.5.6 Frequency Spectrum of the Output

Figure 4.14 shows the frequency spectrum of the ADC output waveforms shown in Figure 4.13. The main signal is distinguishable in all figures. Harmonics can be seen in Figure 4.14(b) and Figure 4.14(c). In Figure 4.14(a), the harmonics are too small to be seen on the figure, and in Figure 4.14(d), higher-order harmonics are filtered out, due to their high frequency.

Figure 4.14: ADC output frequency spectrum for the waveforms of Figure 4.13: (a) a 0.01GHz (10MHz) signal sampled at 4GHz, (b) a 0.101GHz signal sampled at 3.5GHz, (c) a 0.301GHz signal sampled at 3GHz, (d) a 1.491GHz signal sampled at 3GHz

4.5.7 Bit-Error Rate Measurements

To measure the bit-error rate (BER) of the ADC, a very low-frequency input signal is applied to the ADC over a long period of time and the output samples are stored. Any jump of more than one step in the output code, in increasing or decreasing order, is considered an error [35][44].
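The jump-counting criterion above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used for the reported measurements: the function names, the example record, and the choice to normalize by the number of consecutive-sample comparisons are all assumptions. It also assumes a slow input without wrap-around (e.g., a triangular or sinusoidal waveform), so that a legitimate transition never exceeds one code step.

```python
def count_code_jumps(codes, max_step=1):
    """Count output-code transitions larger than `max_step` LSBs."""
    return sum(
        1 for prev, curr in zip(codes, codes[1:]) if abs(curr - prev) > max_step
    )

def estimate_ber(codes, max_step=1):
    """Error rate = oversized jumps / number of consecutive-sample comparisons."""
    comparisons = len(codes) - 1
    if comparisons <= 0:
        raise ValueError("need at least two samples")
    return count_code_jumps(codes, max_step) / comparisons

# Hypothetical record from a slowly rising input with one glitch to code 7.
example_codes = [0, 0, 1, 1, 2, 3, 7, 4, 4, 5]
print(count_code_jumps(example_codes), estimate_ber(example_codes))
```

Note that a single corrupted sample produces two oversized jumps (into and out of the glitch), so a practical script might merge adjacent detections before reporting an error count.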
Alternatively, BER can be measured by applying a low-amplitude input signal around the MSB transition. The occurrences of bit errors can be identified with an XNOR operation on the MSB and MSB−1 bits [56]. BER must be measured over a long time, since the number of samples should be as large as $10^{10}$. In our case, the latter method was easier to use because of the limited memory of the oscilloscope. BER was measured as ~$10^{-7}$, which is larger than expected. This error can be attributed to converting the Gray code to the binary code in the encoder. Because of this conversion, the outputs of some gates are connected to the inputs of more than one gate. As a result, in the case of metastability, the output bits of the ADC are corrupted. A solution to this problem is to remove the Gray-to-binary converter in the encoder, which can be achieved by removing the XOR gates. The conversion can take place later in the digital domain using the system's embedded processor that uses the sampled data to analyze the original analog signal. This solution was verified and confirmed in a later chip fabrication. A BER of better than $10^{-10}$ is achieved for the converter with the Gray-to-binary conversion removed.

4.5.8 Input Port and Clock Port Capacitance

Since the input signal is directly connected to the input ports of all comparators, the ADC has a relatively large input capacitance, which is equal to 21× the capacitance of a single comparator plus the pad and wiring capacitance. Based on simulations, this input capacitance is 0.5pF. This value changes by less than 1% as the input voltage changes over the entire input voltage range. Simulations also show that the clock capacitance for the overall ADC circuit is 1.1pF. This capacitance value is the same for both the CLK and CLKB ports, which ensures the same propagation delay for these clock signals as they travel across the chip. As a result, the chance of clock skew in the system is reduced.
In comparison, the input capacitance of [18] is 1.6pF (clock capacitance is not reported).

4.5.9 Propagation Delay of the ADC

As the proposed ADC uses pipelining, its propagation delay is mainly determined by the clock rate of the system. In each comparator, there are three latches and one preamplifier. In the encoder, there are four latches. Each latch stage requires half a clock cycle to operate. Therefore, the overall ADC propagation delay is 3.5 clock cycles plus the propagation delay of the preamplifier (less than half a clock cycle). In comparison, the comparator array of [18] has a two-clock-cycle propagation delay (i.e., more than the comparator array in the proposed ADC). No latency is reported for the encoder part, though. However, as the Wallace-tree counter in the encoder is implemented using pipelined full-adders, multiple levels of latches are expected to be used in the encoder. As a result, the propagation delay of [18] is estimated to be equal to or greater than that of the proposed ADC. However, it is worth noting that if an ADC is to be designed for applications other than timing or gain control loops, the latency is not important [25]. A high-speed serial link is an example of such applications.

4.5.10 Measurements Summary

A measurement summary of the ADC is presented in Table 4.1 along with a comparison with the state-of-the-art single-channel flash ADCs with comparable speeds [18][19][55].

Table 4.1: Summary of the measured performance and comparison with recently published work

| Parameter | This work | [18] | [19] | [55] |
| --- | --- | --- | --- | --- |
| Technology | 0.18µm CMOS | 0.18µm CMOS | 0.13µm CMOS | 90nm CMOS |
| Resolution | 4 bits | 4 bits | 6 bits | 5 bits |
| Sampling rate | 4GS/s | 4GS/s (*) | 4GS/s | Up to 4GS/s |
| Supply | 1.8V | A: 1.8V; D: 2.1-2.5V | 1.5V | 1.4V |
| Input range | ±0.65V | ±0.46V | 0.6V | ±0.32V |
| Power (mW) | Total: 43 (excl. clock buffer); A: 14 (res. ladder) + 21 (comps); D: 8 (encoder) | Total: ~619 (incl. clock buffer); A: 89, D: ~530 | Total: 990 (incl. clock buffer); A: 90, D: 900 | Total: 132 (excl. clock buffer); A: 115, D: 17 |
| DNL/INL (LSB) | DNL: −0.35~0.35; INL: −0.26~0.24 (no calibration) | DNL: −0.14~0.15; INL: −0.24~0.20 (after calibration) | DNL: −0.23~0.91; INL: −0.98~1.2 | DNL: −0.83~0.93; INL: −0.89~0.88 (after calibration) |
| ENOB @ fs, fin | 3.71 @ 4G, 10M; 3.06 @ 3.5G, 0.501G; 3.14 @ 3G, 0.501G; 2.75 @ 3G, 1.491G | 3.89 @ 4G, 10M; 3.48 @ 4G, 0.1G; 3.47 @ 3.4G, 0.8G | ENOB/SNDR not reported. SFDR: 36dB @ 4G, 0.509G; 30dB @ 4G, 1.017G | 4.28 @ 4G, 5M; 3.63 @ 3.5G, 1G |
| ADC area | 0.06mm² (incl. res. ladder) | 0.88mm² (excl. res. ladder) | 0.5mm² | 0.658mm² (incl. res. ladder) |
| FoM (pJ/conv-step) @ fs, fin | 2.14 @ 3G, 1.491G | ~22 @ 3G, 1.3G | ENOB/SNDR not reported | ~4.5 @ 3G, 1G |
| Digital calibration | No | Yes | Digital averaging | Yes |

(*) This design uses a single-channel comparator array and a two-channel time-interleaved encoder.

The power consumption of the proposed ADC is 43mW, while [19] and [18] consume 990 and ~619mW, respectively. The power reported for these two ADCs includes that of the clock buffer, while the power reported for the proposed ADC does not. In [18], the power consumption of the digital part is proportional to the clock frequency. The value reported in this table, i.e., 530mW, is measured at a 4GHz clock.

Different ADC designs can be compared using the figure-of-merit (FoM) defined in [66] as

$$\mathrm{FoM} = \frac{\mathrm{Power}}{2^{\mathrm{ENOB}} \times 2 f_{in}}. \tag{4.5}$$

For $f_{in}$ = 1.491GHz at a 3GHz clock (the Nyquist condition), an FoM of 2.14pJ/conversion-step is achieved, which is at least 2× better than comparable designs. As no on-chip calibration circuit is used and inductors are avoided in this ADC, an active area as small as 0.06mm² is achieved for the ADC, including the output buffers. This circuit area is more than 10× smaller than that of [18].

5 TIME-DOMAIN ANALYSIS OF INL EFFECTS ON THE SNR OF ADCS

In this chapter, the SNR of an ADC is formulated based on its INL performance. INL, in turn, is expressed in terms of the decision level offsets (DLOs). The analysis is performed in the time domain.
Since the SNR depends on the input waveform, two standard waveforms are considered. The SNR is first calculated for a ramp waveform, which is a good representative for an input signal with uniformly distributed amplitude. Then, the SNR is calculated for a sinusoidal waveform, which is a common input signal for practical SNR measurements. The derived equation for the ramp input signal indicates that the contribution of the DLO noise to the total output noise can be estimated by the mean square of the DLOs. The derived formulas are confirmed by simulation results.

5.1 Introduction

Most ADCs sample the input signal at equally spaced time intervals and quantize the samples with equally spaced decision levels. These two properties are degraded by sampling jitter and DLOs, respectively. Figure 5.1(a) shows the operation of such an ADC on a triangular input waveform. The vertical solid lines are the sampling times, the horizontal solid lines are the decision levels, and the horizontal dashed lines are the analog representation of the output codes, commonly referred to as the quantization levels. The decision levels are used to choose the best output code. The uniform lattice of the ideal ADC shown in Figure 5.1(a) is disturbed in both dimensions in the presence of DLOs and sampling jitter, as demonstrated in Figure 5.1(b). The dashed lines remain unchanged, as these represent the output codes. Due to the resultant non-uniform lattice, the output waveform changes as compared to the waveform of Figure 5.1(a). Here, the ADC output waveform refers to the analog representation of the ADC output code.

Figure 5.1: Operation of an ADC on a triangular waveform: (a) an ADC with no DLOs and no sampling jitter, (b) an ADC with DLOs and sampling jitter, shown as shaded areas

In this chapter, the effect of DLOs on the SNR performance of an ADC is analyzed. Noise associated with the sampling jitter is beyond the scope of this analysis.
The reader is referred to [67] for an analysis of the effects of jitter on the ADC performance.

The SNR, DNL, and INL are figures of merit for ADC linearity. DNL and INL express the offsets for individual decision levels, while SNR gives a single number as a coarse evaluation of the ADC linearity. DNL and INL metrics may change with input frequency, input waveform, and clock frequency. However, for a specific set of DNL and INL values and a particular input waveform, SNR can be calculated based on the DNL and INL values. To simplify the discussion, here, the SNR of an ADC is expressed in terms of the DLOs. The DLOs, in turn, are formulated based on the DNL and INL values.

The effects of INL and DNL metrics on the operation of an ADC have been well studied and reported in the literature [67]−[70]. However, the literature appears scarce on formulas that relate SNR to INL and DNL. In [67], a formula is given to estimate SNR in the presence of jitter, non-linearity, and thermal noise, viz.,

$$\mathrm{SNR} = -20\log\left[\left(2\pi f_{in}\, t_{jitter\text{-}rms}\right)^{2} + \left(\frac{1+\varepsilon}{2^{n}}\right)^{2} + \left(\frac{V_{noise\text{-}rms}}{2^{n}}\right)^{2}\right]^{1/2} \tag{5.1}$$

where $f_{in}$ is the input signal frequency, $t_{jitter\text{-}rms}$ is the aperture jitter, $\varepsilon$ is the average DNL of the ADC, and $V_{noise\text{-}rms}$ is the thermal noise in LSBs. In the absence of jitter and thermal noise, (5.1) is simplified to

$$\mathrm{SNR} = 6.02\,n - 20\log(1+\varepsilon). \tag{5.2}$$

Equation (5.2) has a reasonable accuracy for high-resolution ADCs (e.g., n > 7). However, it is not accurate for ADCs with lower resolutions (e.g., n ≤ 7, as we will see in the simulation results in Section 5.4). In addition, this equation does not take the input waveform into account. Another formula is given in [69] that predicts the worst-case drop in the ENOB performance based on the peak-to-peak INL of the ADC. However, a formula that predicts the actual ENOB and/or SNDR is not proposed. Here, accurate SNR formulas are derived for different input waveforms.
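The step from (5.1) to (5.2) can be written out explicitly. The following is a short worked derivation, assuming the jitter and thermal-noise terms of (5.1) are set to zero and the logarithm is taken to base 10:

```latex
\begin{align*}
\mathrm{SNR}
  &= -20\log\left[\left(\frac{1+\varepsilon}{2^{n}}\right)^{2}\right]^{1/2}
   = -20\log\left(\frac{1+\varepsilon}{2^{n}}\right) \\
  &= 20\,n\log 2 - 20\log(1+\varepsilon)
   \approx 6.02\,n - 20\log(1+\varepsilon).
\end{align*}
```

Written this way, it is apparent that the only nonlinearity information retained is the single average value $\varepsilon$, and that no term depends on the amplitude distribution of the input.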
Simulation results are also presented that compare the accuracy of (5.2) with that of the derived formulas. To formulate the SNR, two common waveforms are selected: a ramp and a sinusoid. The ramp waveform is chosen, as it is a good representative for a signal with uniform amplitude distribution. A sinusoidal waveform is also chosen, as it is more commonly used for SNR measurements.

The remainder of this chapter is organized as follows. Section 5.2 presents the transfer characteristics of an ADC in the absence or presence of the DLOs, which will be used for noise calculations. Section 5.3 derives the output noise equations for a ramp and a sinusoidal input waveform. Section 5.4 validates the derived formulas against MATLAB simulation results, and Section 5.5 summarizes the chapter.

5.2 Transfer Characteristic of an ADC

To have a symmetric transfer characteristic and simplify the equations, a bipolar mid-rise (as opposed to mid-tread) model [71] is considered here for the ADC. Figure 5.2(a) and Figure 5.2(b) show the transfer characteristics for an ideal ADC (i.e., with no DLOs) and a non-ideal ADC (i.e., with DLOs), respectively. In Figure 5.2(a), the output (quantization) levels are denoted as $S_i$ and the decision levels are denoted as $\alpha_i$. There are $2^{n}$ quantization levels and $2^{n}-1$ decision levels. Without loss of generality, the amplitude of the input signal is normalized to 1 V, i.e., 2 V peak-to-peak. The output levels are also scaled so that the slope of the transfer characteristic is unity. As the output levels are only a representation of the output code, the output voltage range does not affect this discussion. $K = 2^{n}/2 = 2^{n-1}$ is defined as the number of steps in either the positive or the negative part of the transfer characteristic. All steps have equal widths and heights of $1/K$, which is equal to 1 LSB for the ADC. All the ADC input voltages in the range of $\alpha_{i-1}$ to $\alpha_i$ are mapped to $S_i$ in the ADC output, which is the mean value of $\alpha_{i-1}$ and $\alpha_i$.
$\alpha_i$ and $S_i$ are given by

$$\alpha_i = \frac{i}{K}, \quad -K \le i \le K \tag{5.3}$$

and

$$S_i = \frac{\alpha_{i-1} + \alpha_i}{2} = \frac{2i-1}{2K}, \quad -K+1 \le i \le K. \tag{5.4}$$

The two ends of the input range, −1 and 1, are also added as two pseudo-decision levels to the decision level array. The transfer function associated with the ideal ADC transfer characteristic of Figure 5.2(a) is denoted as $S(x)$.

Figure 5.2(b) shows the transfer characteristic of a non-ideal ADC. The DLOs change the step widths of the transfer characteristic, but maintain the step heights, as the quantization levels are not affected. In Figure 5.2(b), the decision levels, denoted as $\beta_i$, and the corresponding DLOs, denoted by $f_i$, are related to the ideal decision levels, $\alpha_i$, by

$$\beta_i = \alpha_i + f_i, \quad -K \le i \le K. \tag{5.5}$$

Here, we assume that the offsets are small enough to keep the order of the decision levels, i.e., the ADC has no missing code. Therefore,

$$\beta_i < \beta_j \quad \text{for } i < j. \tag{5.6}$$

For the endpoints, we have $\beta_{-K} = \alpha_{-K}$ and $\beta_{K} = \alpha_{K}$.

Figure 5.2: The transfer characteristics of ADCs: (a) an ideal ADC (i.e., with no DLOs), where $\alpha_i = i/K$ and 1 LSB $= 1/K$; (b) a non-ideal ADC (i.e., with DLOs), where $\beta_i = \alpha_i + f_i = (i + F_i)/K$

Normalized DLOs, named $F_i$, are defined as the DLOs in terms of the LSB of the ADC. Therefore, $F_i = K \cdot f_i$. In addition, INL and DNL values can be expressed in terms of the DLOs as follows:

$$\mathrm{INL}_i = F_i, \quad \mathrm{DNL}_i = F_i - F_{i-1}. \tag{5.7}$$

The transfer function associated with the non-ideal ADC transfer characteristic of Figure 5.2(b) is denoted as $SF(x)$.

For SNR calculations, a ramp or a sinusoidal signal with maximum amplitude, i.e., 2 Vp-p, and 0 V dc is applied to the ADC. Considering the transfer characteristic of the ideal ADC in Figure 5.2(a), the output waveforms of Figure 5.3(a) and Figure 5.3(b) are generated for the ramp and sinusoidal waveforms, respectively. For the sinusoidal waveform, because of symmetry, only half of a period is shown in Figure 5.3. These waveforms are used for noise calculations in Section 5.3. In Figure 5.3(b), $\alpha'_i$ is the corresponding $x$ value for $V_{out}(x) = \alpha_i$.
Therefore, $\alpha'_i = \sin^{-1}(\alpha_i)$. For a non-ideal ADC, $\alpha_i$ and $\alpha'_i$ in Figure 5.3 are replaced with $\beta_i$ and $\beta'_i$, respectively.

Figure 5.3: Output waveforms for an ideal ADC with (a) a ramp and (b) a sinusoidal input signal, where $\alpha'_i = \sin^{-1}\alpha_i$, $\alpha'_{K} = \pi/2$, and $\alpha'_{-K} = -\pi/2$

5.3 Noise Power Calculations

5.3.1 Time-Domain Equations for Noise Power and SNR

In practice, the SNR of an ADC is measured by applying a sinusoidal input waveform to the ADC circuit. The power spectrum of the ADC output reveals the signal power, $S$, and the noise power, $N$. Here, to formulate the noise, a time-domain approach is favored to simplify the calculations. Therefore, the signal power is calculated by integrating the square of the input signal over one period, $T$:

$$S = \frac{1}{T}\int_{0}^{T} V_{in}^{2}\,dx. \tag{5.8}$$

Since the input and output waveforms of the ADC are normalized to have the same amplitude, the noise power can be calculated based on the difference of those two waveforms,

$$N = \frac{1}{T}\int_{0}^{T} \left(V_{in} - V_{out}\right)^{2} dx. \tag{5.9}$$

Equations (5.8) and (5.9) are used for the signal power, noise power, and SNR calculations in the rest of this section.

Quantization noise, here, refers to the noise generated in the quantizing process of an ideal ADC. In a non-ideal ADC, where DLOs come into effect, the output noise is increased and some distortion may appear in the output as well. In such a case, the SNDR is sometimes used to describe the linearity performance of the ADC. Here, we consider that distortion as part of the output noise. Therefore, we use the general term SNR in this case as well. Here, the DLO noise refers to the part of the output noise (and distortion) that appears in the output because of the DLOs. The DLO noise, $N_F$, is obtained by subtracting the quantization noise, $N_Q$, from the total output noise in the presence of the DLOs, $N$, i.e., $N_F = N - N_Q$.
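The time-domain definitions (5.8) and (5.9), together with $N_F = N - N_Q$, can be evaluated numerically. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, the DLOs are drawn at random with an assumed spread of about 0.1 LSB rms, and the integrals are approximated by averages over a uniform grid covering one period of a full-scale sinusoid.

```python
import numpy as np

def decision_levels(n_bits, dlo=None):
    """Decision levels beta_i = alpha_i + f_i for i = -K..K, as in (5.3) and (5.5)."""
    k = 2 ** (n_bits - 1)
    beta = np.arange(-k, k + 1) / k              # ideal levels alpha_i = i / K
    if dlo is not None:
        beta[1:-1] += dlo                        # the two end levels stay fixed
    assert np.all(np.diff(beta) > 0)             # ordering assumption (5.6)
    return beta

def adc_output(vin, levels, n_bits):
    """Map each input to the quantization level S_i of the step it falls in."""
    k = 2 ** (n_bits - 1)
    step = np.searchsorted(levels[1:-1], vin, side="right")   # 0 .. 2K-1
    return (2 * (step - k) + 1) / (2 * k)        # S_i = (2i - 1) / (2K), (5.4)

def signal_and_noise(vin, vout):
    """Discrete approximations of (5.8) and (5.9) on a uniform grid."""
    return np.mean(vin ** 2), np.mean((vin - vout) ** 2)

N_BITS = 4
rng = np.random.default_rng(0)
# Assumed DLOs: Gaussian, ~0.1 LSB rms, converted from LSBs to volts (1 LSB = 1/K).
dlo = rng.normal(0.0, 0.1, 2 ** N_BITS - 1) / 2 ** (N_BITS - 1)

x = (np.arange(100_000) + 0.5) * 2 * np.pi / 100_000   # one period, midpoint grid
vin = np.sin(x)

s, n_q = signal_and_noise(vin, adc_output(vin, decision_levels(N_BITS), N_BITS))
_, n_total = signal_and_noise(vin, adc_output(vin, decision_levels(N_BITS, dlo), N_BITS))
n_f = n_total - n_q                               # DLO noise, N_F = N - N_Q
print(f"S = {s:.4f}, N_Q = {n_q:.3e}, N = {n_total:.3e}, N_F = {n_f:.3e}")
print(f"SNR = {10 * np.log10(s / n_total):.2f} dB")
```

Because the grid average only approximates the integrals, the printed values carry a small discretization error; the closed-form expressions derived in the following sections avoid it.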
5.3.2 A Ramp Input Signal

For an ideal ADC with transfer characteristic $S(x)$ and a ramp input, the signal power, $S_{ramp}$, quantization noise power, $N_{Q\_ramp}$, and output SNR, $\mathrm{SNR}_{Q\_ramp}$, are as follows.

$$S_{ramp} = \frac{1}{2}\int_{-1}^{1} x^{2}\,dx = \frac{1}{3} \tag{5.10}$$

$$N_{Q\_ramp} = \frac{1}{2}\int_{-1}^{1} \left(x - S(x)\right)^{2} dx = \frac{1}{3\cdot 2^{2n}} \tag{5.11}$$

$$\mathrm{SNR}_{Q\_ramp} = 6.02\,n \tag{5.12}$$

To take DLOs into account, Figure 5.2(b) can be used to calculate the total output noise,

$$N_{ramp} = \frac{1}{2}\int_{-1}^{1} \left(x - SF(x)\right)^{2} dx \tag{5.13}$$

where $SF(x)$ is the non-ideal ADC transfer characteristic. Breaking down the integral into the intervals $[\beta_{i-1}, \beta_i]$ results in

$$N_{ramp} = \frac{1}{2}\sum_{i=-K+1}^{K} \int_{\beta_{i-1}}^{\beta_i} \left(x - S_i\right)^{2} dx. \tag{5.14}$$

The quantization noise in (5.11) can also be rewritten as

$$N_{Q\_ramp} = \frac{1}{2}\sum_{i=-K+1}^{K} \int_{\alpha_{i-1}}^{\alpha_i} \left(x - S_i\right)^{2} dx. \tag{5.15}$$

The contribution of the DLO noise to the ADC noise performance, denoted by $N_{F\_ramp}$, is calculated by

$$N_{F\_ramp} = N_{ramp} - N_{Q\_ramp} = \frac{1}{2}\sum_{i=-K+1}^{K} \left[ \int_{\beta_{i-1}}^{\beta_i} \left(x - S_i\right)^{2} dx - \int_{\alpha_{i-1}}^{\alpha_i} \left(x - S_i\right)^{2} dx \right]. \tag{5.16}$$

Exchanging the integration limits gives

$$N_{F\_ramp} = \frac{1}{2}\sum_{i=-K+1}^{K} \left[ \int_{\alpha_i}^{\beta_i} \left(x - S_i\right)^{2} dx - \int_{\alpha_{i-1}}^{\beta_{i-1}} \left(x - S_i\right)^{2} dx \right]. \tag{5.17}$$

Replacing $i$ with $i+1$ in the second term of (5.17) and considering the fact that $\beta_{K} = \alpha_{K}$ and $\beta_{-K} = \alpha_{-K}$ results in

$$N_{F\_ramp} = \frac{1}{2}\sum_{i=-K+1}^{K-1} \int_{\alpha_i}^{\beta_i} \left( \left(x - S_i\right)^{2} - \left(x - S_{i+1}\right)^{2} \right) dx. \tag{5.18}$$

Factorization and substitution of $S_i$ and $S_{i+1}$ lead to

$$N_{F\_ramp} = \frac{1}{2K}\sum_{i=-K+1}^{K-1} \left(\beta_i - \alpha_i\right)^{2} = \frac{1}{2K}\sum_{i=-K+1}^{K-1} f_i^{2} = \frac{1}{2^{n}}\sum_{i} f_i^{2} \tag{5.19}$$

in which, for the purpose of brevity, the summation limits are removed in the last term, as the summation is performed over the entire range. Equation (5.19) can be written in terms of $f_{rms}$, the root mean square of the DLOs, as

$$N_{F\_ramp} = \left(1 - \frac{1}{2^{n}}\right)\cdot f_{rms}^{2} \tag{5.20}$$

where

$$f_{rms} = \sqrt{\frac{1}{2^{n}-1}\sum_{i} f_i^{2}}. \tag{5.21}$$

For large $n$ values, $N_{F\_ramp} \cong f_{rms}^{2}$.
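The ramp-input results can be checked numerically without any sampling error, because each integral in (5.14) has the exact value $\left[(\beta_i - S_i)^3 - (\beta_{i-1} - S_i)^3\right]/3$. The Python sketch below is an illustrative check only: the function name is hypothetical and the DLOs are random values assumed to lie within ±0.3 LSB, which keeps the decision levels in order as required by (5.6).

```python
import numpy as np

def ramp_noise(levels, n_bits):
    """Exact value of (1/2) * sum_i integral over [l[i-1], l[i]] of (x - S_i)^2."""
    k = 2 ** (n_bits - 1)
    s = (2 * np.arange(-k + 1, k + 1) - 1) / (2 * k)    # quantization levels S_i
    lo, hi = levels[:-1], levels[1:]
    return 0.5 * np.sum(((hi - s) ** 3 - (lo - s) ** 3) / 3)

N_BITS = 4
K = 2 ** (N_BITS - 1)
alpha = np.arange(-K, K + 1) / K                         # ideal decision levels
rng = np.random.default_rng(1)
f = rng.uniform(-0.3, 0.3, 2 * K - 1) / K                # assumed DLOs in volts
beta = alpha.copy()
beta[1:-1] += f                                          # end levels stay fixed

n_q = ramp_noise(alpha, N_BITS)                          # (5.15)
n_f = ramp_noise(beta, N_BITS) - n_q                     # (5.16)
f_rms_sq = np.sum(f ** 2) / (2 ** N_BITS - 1)            # square of (5.21)

print(n_q, 1 / (3 * 2 ** (2 * N_BITS)))                  # both sides of (5.11)
print(n_f, np.sum(f ** 2) / 2 ** N_BITS)                 # both sides of (5.19)
print(n_f, (1 - 2.0 ** -N_BITS) * f_rms_sq)              # both sides of (5.20)
```

Each printed pair should agree to within floating-point rounding, whatever offsets are drawn, which reflects the observation that only the mean square of the DLOs matters for a ramp input.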
Equation (5.20) indicates that for a ramp input waveform, any distribution of the offsets results in the same noise contribution, as long as the root mean square of the offsets remains unchanged. $N_{F\_ramp}$ in (5.19) can be rewritten using the normalized offsets, $F_i$, and then expressed in terms of the quantization noise:

\[ N_{F\_ramp} = \left(12\cdot\frac{1}{2^n}\sum_i F_i^2\right)\cdot N_{Q\_ramp} . \tag{5.22} \]

The total noise and the output SNR then become as follows:

\[ N_{ramp} = N_{Q\_ramp} + N_{F\_ramp} = \left\{1 + 12\cdot\frac{1}{2^n}\sum_i F_i^2\right\}\cdot N_{Q\_ramp} \tag{5.23} \]

\[ SNR_{ramp} = 6.02\,n - 10\log\left(1 + 12\cdot\frac{1}{2^n}\sum_i F_i^2\right) \tag{5.24} \]

\[ SNR_{ramp} = 6.02\,n - 10\log\left(1 + 12\cdot\left(1 - \frac{1}{2^n}\right)\cdot F_{rms}^2\right) \tag{5.25} \]

In the SNR formula of (5.25), the first term is the ideal ADC's SNR and the second term is the SNR reduction due to the DLO noise. Equation (5.23) indicates that both the quantization noise and the DLO noise contribute to the total noise, $N_{ramp}$. Clearly, for small DLOs, the total noise is dominated by the quantization noise and thus the SNR remains almost the same as the ideal ADC's SNR. For large DLOs, however, the DLO noise may dominate and the quantization noise may no longer determine the output SNR. In such a case, the SNR is limited by the offsets and not by the ADC resolution. This is shown by the following equations.

\[ N_{ramp}\big|_{\text{large DLOs}} \cong N_{F\_ramp} \tag{5.26} \]

\[ SNR_{ramp}\big|_{\text{large DLOs}} \cong -10\log\left(3\cdot\frac{1}{2^n}\sum_i f_i^2\right) \tag{5.27} \]

This discussion confirms that in the design of an ADC, the quantization noise should be the dominant noise, or at least should be comparable with other noise factors. Otherwise, the achievable SNR (and the effective resolution) is limited. Finally, the ENOB of the ADC is calculated as

\[ ENOB_{ramp} = \frac{SNR_{ramp}}{6.02} . \tag{5.28} \]

5.3.3 A Sinusoidal Input Signal

Similarly to the discussion in Section 5.3.2, we start with calculating the quantization noise, $N_{Q\_sin}$. Unlike a ramp signal, quantization noise calculation for a sinusoidal signal is not straightforward.
In fact, the quantization noise for the ramp signal, $N_{Q\_ramp}$, is a good approximation of $N_{Q\_sin}$ for high-resolution ADCs. The reason is that for such ADCs, the quantization errors become so small that the distribution of the input signal amplitude does not have a major effect on the total quantization noise. If such an approximation is used, we obtain [22]

\[ SNR_{Q\_sin} \cong 6.02\,n + 1.76 , \tag{5.29} \]

which is accurate especially for high-resolution ADCs. However, we are looking for formulas that are applicable to low-resolution ADCs as well. Furthermore, the method used to obtain $N_{Q\_sin}$ can also be used to calculate the total output noise, $N_{sin}$, in the presence of DLOs. For a sinusoidal waveform, the signal power is

\[ S_{sin} = \frac{1}{\pi}\int_{-\pi/2}^{\pi/2} \sin^2 x\,dx = \frac{1}{2} . \tag{5.30} \]

Figure 5.3(b) is used for quantization noise calculations. So,

\[ N_{Q\_sin} = \frac{2}{\pi}\int_{0}^{\pi/2} \left(\sin x - S(\sin x)\right)^2 dx . \tag{5.31} \]

Breaking down the integral in (5.31) into the intervals $[\alpha'_{i-1}, \alpha'_i]$ results in

\[ N_{Q\_sin} = \frac{2}{\pi}\sum_{i=1}^{K}\int_{\alpha'_{i-1}}^{\alpha'_i} \left(\sin x - S_i\right)^2 dx = \frac{2}{\pi}\sum_{i=1}^{K}\int_{\alpha'_{i-1}}^{\alpha'_i} \left(\sin^2 x - 2 S_i \sin x + S_i^2\right) dx . \tag{5.32} \]

After separating the integral terms,

\[ N_{Q\_sin} = \frac{1}{2} + \frac{2}{\pi\cdot K}\sum_{i=1}^{K} (2i-1)\left(\cos\alpha'_i - \cos\alpha'_{i-1}\right) + \frac{1}{2\cdot\pi\cdot K^2}\sum_{i=1}^{K} (2i-1)^2\left(\alpha'_i - \alpha'_{i-1}\right) . \tag{5.33} \]

Expanding, simplifying the summations, and substituting $(i/K)$ for $\alpha_i$ and $\sin^{-1}(i/K)$ for $\alpha'_i$ results in

\[ N_{Q\_sin} = \frac{1}{2} + \frac{2}{\pi\cdot K}\left(-1 - 2\cdot\sum_{i=1}^{K-1}\cos\alpha'_i\right) + \frac{1}{2\cdot\pi\cdot K^2}\left(-8\cdot\sum_{i=1}^{K-1} i\cdot\alpha'_i + (2K-1)^2\cdot\frac{\pi}{2}\right) \tag{5.34} \]

\[ = \frac{1}{2} + \left(1 - \frac{1}{2K}\right)^2 - \frac{2}{\pi\cdot K}\left(1 + 2\cdot\sum_{i=1}^{K-1}\left(\cos\alpha'_i + \alpha_i\cdot\alpha'_i\right)\right) \tag{5.35} \]

\[ = \frac{1}{2} + \left(1 - \frac{1}{2K}\right)^2 - \frac{2}{\pi\cdot K}\left(1 + 2\cdot\sum_{i=1}^{K-1}\left(\sqrt{1 - \left(\frac{i}{K}\right)^2} + \frac{i}{K}\cdot\sin^{-1}\left(\frac{i}{K}\right)\right)\right) . \tag{5.36} \]

Table 5.1 provides the $N_{Q\_sin} / N_{Q\_ramp}$ ratio for different ADC resolutions.
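The ratio in Table 5.1 can be evaluated directly from (5.11) and (5.36). The Python sketch below is an illustration, not the thesis's own code. It assumes $K = 2^{n-1}$, so that the $2K$ quantization intervals correspond to $2^n$ output levels, and the function names `nq_ramp`, `nq_sin`, and `ratio` are hypothetical.

```python
import math

def nq_ramp(n):
    """Quantization noise power for a ramp input, Equation (5.11)."""
    return 1.0 / (3.0 * 2 ** (2 * n))

def nq_sin(n):
    """Quantization noise power for a full-scale sinusoid, Equation (5.36).
    Assumes K = 2**(n - 1) decision levels per polarity."""
    k = 2 ** (n - 1)
    total = sum(
        math.sqrt(1.0 - (i / k) ** 2) + (i / k) * math.asin(i / k)
        for i in range(1, k)
    )
    return 0.5 + (1.0 - 1.0 / (2 * k)) ** 2 - 2.0 / (math.pi * k) * (1.0 + 2.0 * total)

def ratio(n):
    """N_Q_sin / N_Q_ramp for an n-bit ideal ADC."""
    return nq_sin(n) / nq_ramp(n)

if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        r = ratio(n)
        print(n, round(r, 3), round(10 * math.log10(r), 3))
```

The ratio is larger than 1 at low resolutions and approaches 1 as $n$ grows, which is why (5.29) is a good approximation only for high-resolution ADCs.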
A similar method can be used to calculate the total output noise, $N_{sin}$, viz.,

\[ N_{sin} = \frac{1}{2} + \left(1 - \frac{1}{2K}\right)^2 - \frac{2}{\pi\cdot K}\sum_{i=-K+1}^{K-1}\left(\cos\beta'_i + \alpha_i\cdot\beta'_i\right) \tag{5.37} \]

\[ = \frac{1}{2} + \left(1 - \frac{1}{2K}\right)^2 - \frac{2}{\pi\cdot K}\sum_{i=-K+1}^{K-1}\left(\sqrt{1 - \left(\frac{i + F_i}{K}\right)^2} + \frac{i}{K}\cdot\sin^{-1}\left(\frac{i + F_i}{K}\right)\right) . \tag{5.38} \]

Then, the SNR and ENOB of the ADC can be calculated as

\[ SNR_{sin} = 10\log\left(\frac{S_{sin}}{N_{sin}}\right) , \tag{5.39} \]

and

\[ ENOB_{sin} = \frac{SNR_{sin}\big|_{dB} - 1.76}{6.02} . \tag{5.40} \]

Table 5.1: The $N_{Q\_sin} / N_{Q\_ramp}$ ratio for different ADC resolutions

n        N_Q_sin / N_Q_ramp    10 log(N_Q_sin / N_Q_ramp) (dB)
3        1.183                 0.729
4        1.130                 0.529
5        1.092                 0.381
6        1.065                 0.273
→ ∞      1                     0

5.4 Simulation Results

Here, the formulas derived in Section 5.3 are validated against MATLAB simulations. To simulate a non-ideal ADC, a set of values is generated for the DLOs using MATLAB's Gaussian random generator function. The numbers are produced with zero mean and a variance of $\sigma_{F0}^2$. The decision levels are calculated and sorted to obtain a monotonic transfer characteristic. Using the sorted decision levels, the new DLOs are calculated. These DLOs have a variance of $\sigma_F^2$ that is less than or equal to $\sigma_{F0}^2$. In the next step, a ramp or sinusoidal waveform is applied to the ADC. To calculate the signal power, the noise power, and the SNR, either the time domain or the frequency domain can be used. In the time-domain approach, Equation (5.9) is used for noise power calculations, while in the frequency-domain approach, the noise power is calculated using the power spectrum of the difference between the input and output signals of the ADC.

In our specific experiment, the ADC is assumed to have a 6-bit resolution. Simulations were performed by sweeping $\sigma_{F0}$ from 0 to 0.5 LSB, and the SNR diagrams were plotted versus $F_{rms}$. Note that here, $F_{rms} = \sigma_F$, as mean($f$) = 0. Figure 5.4 shows the simulation results for both ramp and sinusoidal input waveforms.
For each waveform, graphs for the time- and frequency-domain techniques are plotted. The results based on the derived equations (Equations (5.25) and (5.39)), as well as Equation (5.2), are also included in this figure. Figure 5.4 shows that the time-domain simulations, the frequency-domain simulations, and the results from the derived formulas are in very good agreement ($error_{peak}$ = 0.0256 dB or 0.6% and $error_{rms}$ = 0.0114 dB or 0.26%).

It is worth noting that for a ramp input signal, the SNR depends only on $F_{rms}$, while for a sinusoidal signal, the SNR depends not only on $F_{rms}$, but also on the exact distribution of the DLOs among the decision levels. Therefore, if new sets of DLOs are used in each simulation run, the same diagram is obtained for the ramp signal, while different diagrams are obtained for the sinusoidal signal. Furthermore, for a sinusoidal signal, the SNR diagram versus $F_{rms}$ is not necessarily monotonic, i.e., a larger $F_{rms}$ can yield a higher SNR, as in the case of increasing $F_{rms}$ from 0.39 to 0.42 LSB in Figure 5.4.

To compare the results of (5.2) with the SNR of a sinusoidal signal, an offset of 1.76 dB is added to (5.2), and the result is plotted in Figure 5.4 using a dashed line. The plots in Figure 5.4 confirm that Equation (5.2) is not accurate ($error_{peak}$ = 1.16 dB or 30% and $error_{rms}$ = 0.615 dB or 15%).

In Figure 5.5, the FFT of the simulated ADC output for a 123 Hz sinusoidal signal sampled at 1 kHz is plotted for two different values of $F_{rms}$. For $F_{rms}$ = 0.39 LSB, the level of the output noise is higher than that for $F_{rms}$ = 0. This results in a ~4.5 dB reduction in the output SNR.
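The ramp-input part of this validation can be sketched in Python as follows. This is an illustration under stated assumptions, not the MATLAB code used in the thesis. The decision levels are perturbed by Gaussian offsets, clipped to the input range $[-1, 1]$ (an assumption added here so that the levels stay inside the ramp), and sorted. The time-domain noise is obtained by integrating (5.14) in closed form, and the result is compared with (5.25). All function names are hypothetical.

```python
import math
import random

def sorted_offsets(n, sigma_lsb, rng):
    """Return sorted decision levels and their offsets F_i in LSB.
    Ideal levels are alpha_i = i/K for i = -K+1 .. K-1, with K = 2**(n-1)."""
    k = 2 ** (n - 1)
    ideal = [i / k for i in range(-k + 1, k)]
    levels = sorted(
        min(1.0, max(-1.0, a + rng.gauss(0.0, sigma_lsb) / k)) for a in ideal
    )
    offsets_lsb = [(b - a) * k for a, b in zip(ideal, levels)]
    return levels, offsets_lsb

def ramp_snr_time_domain(n, levels):
    """SNR in dB from Equation (5.14), integrating (x - S_i)^2 on each interval."""
    k = 2 ** (n - 1)
    edges = [-1.0] + list(levels) + [1.0]
    noise = 0.0
    for j in range(2 * k):
        s = (2 * (j - k) + 1) / (2 * k)       # ideal output level of interval j
        lo, hi = edges[j], edges[j + 1]
        noise += ((hi - s) ** 3 - (lo - s) ** 3) / 3.0
    noise *= 0.5
    signal = 1.0 / 3.0                        # Equation (5.10)
    return 10 * math.log10(signal / noise)

def ramp_snr_formula(n, offsets_lsb):
    """SNR in dB from Equation (5.25); 6.02 n is written as 20 n log10(2)."""
    f_rms_sq = sum(f * f for f in offsets_lsb) / (2 ** n - 1)
    return 20 * n * math.log10(2) - 10 * math.log10(1 + 12 * (1 - 2 ** -n) * f_rms_sq)

if __name__ == "__main__":
    rng = random.Random(1)
    for sigma in (0.0, 0.1, 0.3, 0.5):
        levels, f = sorted_offsets(6, sigma, rng)
        print(sigma, ramp_snr_time_domain(6, levels), ramp_snr_formula(6, f))
```

For the ramp, the two columns agree to within floating-point error for any set of sorted offsets, which reflects the statement that the ramp SNR depends only on $F_{rms}$.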
Figure 5.4: Comparison of the SNRs achieved using the derived formulas, time-domain and frequency-domain simulations, and Equation (5.2). Axes: SNR (dB) versus $F_{rms}$ (LSB). Curves: ramp signal formula, Equation (5.25); ramp signal time-domain simulation; ramp signal frequency-domain simulation; Equation (5.2); sinusoidal signal formula, Equation (5.39); sinusoidal signal time-domain simulation; sinusoidal signal frequency-domain simulation; Equation (5.2) + 1.76 dB.

Figure 5.5: FFT of the simulated ADC output for a 123 Hz sinusoidal signal sampled at 1 kHz for (a) $F_{rms}$ = 0 and (b) $F_{rms}$ = 0.39 LSB

5.5 Summary

In this chapter, the effect of the DLOs on the ADC linearity is studied, and accurate formulas are introduced that calculate the SNR of the ADC in terms of its INL performance. The analysis is performed in the time domain. Two input waveforms are examined for SNR formulation: a ramp and a sinusoid. For a ramp waveform, the contribution of the DLOs to the total output noise power is estimated by the mean square of the DLOs. Simulation results in both the time and frequency domains confirm the accuracy of the derived formulas (error ≤ 0.6%), while they reveal that the formula presented in [67] produces up to 30% error in the SNR evaluations.

6 CONCLUSIONS

A low-power single-channel 4-bit flash ADC in 0.18µm CMOS is presented in this thesis. A speed of 4GS/s is achieved by utilizing CML circuits, pipelining the entire ADC including the comparators and the encoder, and reformulating the encoder function. Implementing the complete ADC using CML blocks, which allows all the signals in the analog and digital parts of the ADC to operate as low-swing differential signals, results in improvements in speed and power consumption.
In addition, due to the use of CML circuits, this ADC has a number of features, including:

(1) All the signals in this ADC, in both the analog part and the digital part, are differential, which ensures the immunity of the system to common-mode noise. This is important, especially when noise-generating digital processing circuits are implemented in the vicinity of the ADC core.

(2) All the signals in the circuit are low-swing signals. As a result, less noise is generated at the high-frequency operation of the circuit.

(3) As all the clocked transistors in the circuit are in differential pairs, low-swing clock signals are also applicable to the ADC. Simulations as well as measurements show that the complete ADC circuit can operate with a differential low-swing ‘sinusoidal’ clock.

(4) Because all the stages of this ADC consist of differential pairs, all the tail currents in the circuit are reused and none of the current sources are turned off. As a result, supply noise due to switching is minimized. Simulation results show that the ADC draws a DC current with less than 10% ripple. As a result, no supply decoupling capacitors are used on-chip, which saves further area.

Designed and fabricated in a digital 1P6M 0.18µm CMOS process, the ADC achieves a linearity of 3.71 ENOB for a 10MHz input signal sampled at 4GS/s and a linearity of 3.14 (2.75) ENOB for a 0.501GHz (1.491GHz) input signal sampled at 3GS/s. The ADC circuit consumes 43mW from a 1.8V supply (14mW in the resistor ladder, 21mW in the comparator array, and 8mW in the encoder). Due to the use of CML blocks, the power consumption is independent of the clock frequency. The active area of the ADC including the output buffers is 0.06mm². In comparison, a recently published work with similar specifications in speed and resolution [18] consumes around 0.6W (including clock buffer power consumption) while operating at 4GS/s.
The active area for the ADC of [18] is 0.88mm² (see Section 2.6.3 for more detail). For comparison with other published work, an FoM defined as Power/($2^{ENOB} \times 2 \times f_{in}$) [66] is used for $f_{in}$ = 1.491GHz with a 3GHz clock (the Nyquist condition). The proposed ADC achieves an FoM of 2.14pJ/conversion-step, which is at least 2× better than comparable designs.

6.1 Scalability

The scalability of this ADC design is investigated here in terms of its applicability to newer CMOS technologies, its use in multi-channel architectures, increasing the speed through a speed-power trade-off, and increasing the resolution.

6.1.1 Newer CMOS Technologies with Lower Supply Voltages

The ADC can be ported to smaller CMOS technologies to achieve even higher speeds. Since transistors are not stacked in this ADC and also due to the use of resistors instead of PMOS transistors [29], the circuit can operate with lower supply voltages. As an example, this ADC was ported to a standard 90nm CMOS technology. Simulation results showed that a speed of 8 to 10GS/s could be achieved.

6.1.2 Multi-Channel Architectures

This ADC can also be used in a multi-channel architecture to achieve very high speeds. As an example, a two-channel TI architecture is proposed in Section 3.5.2 using the double-edge sampling technique. To make operation at the Nyquist rate possible for the TI ADC, either a T/H must be added at the system front end, or the bandwidth of each flash ADC must be doubled to allow for proper operation of the distributed T/H at the Nyquist rate.

6.1.3 Increasing the Speed; Speed and Power Trade-off

Because of a continuous tail current, CML blocks consume constant power. However, this power can be traded off against speed, as explained here. The speed of a CML block is limited by the output pole, which is determined by the $R_{out} \times C_L$ product.
The output resistance $R_{out}$ includes the load resistance $R_D$ and the output resistance of the transistors, and the load capacitance $C_L$ includes the parasitic capacitance of the CML block output node and the input capacitance of its next stage. Reducing $R_D$ improves the speed. However, the resultant reduction in gain and output voltage swing needs to be compensated by increasing the tail current, which leads to higher power consumption. Note that speed improvement can also be achieved by reducing $C_L$. However, decreasing $C_L$ requires using smaller transistors, which would result in larger comparator offsets. Therefore, reducing $R_D$ is the preferred method.

6.1.4 Increasing the Resolution

The architecture of both the comparators and the encoder used in this 4-bit flash ADC can be ported to flash ADCs with higher resolutions, e.g., 5- or 6-bit flash ADCs. The CML structure allows for multi-GHz operation of each sub-block, while the geometric layout techniques used here allow for efficient connection of the increased number of ADC sub-blocks. The narrow layout of the comparators allows for efficient distribution of the input and clock signals as well as the bias voltage to the comparators. The resistor averaging network would also perform better when the output ports of the preamplifiers are close to each other.

For a 6-bit flash ADC that contains 63 main comparators, connecting the comparator outputs to the encoder gates, especially for the first gate level, is a big challenge in drawing the layout. Using the reformulation introduced here, the outputs of adjacent comparators go to the same gate. The resulting speed enhancement is even more pronounced than in the case of a 4-bit flash ADC.

On the other hand, in a 6-bit flash ADC, resistor mismatches in the resistor ladder are more significant. The effect of those mismatches can be decreased using the common-centroid layout introduced for the resistor ladder in this thesis.
However, in order to achieve a linearity that is appropriate for this higher ADC resolution, some digital calibration technique might be required. The current DAC trimming technique of [18] is a common technique that is used for comparator offset calibration.

6.2 Limitations

Limitations of this work in terms of the applications, the scalability, and the contributions are listed in this section.

6.2.1 Limitations of the ADC Applications

• Due to the use of CML, this ADC is not suitable for low clock rate applications or applications with relatively long periods of standby mode.
• If good performance at high-frequency inputs is required, a T/H should be added in front of the ADC.
• The ADC requires a differential clock rather than a single-ended one.
• Clock buffers must be added to the ADC to drive the 2×1.1pF clock capacitors.
• The ADC has a 0.5pF input capacitance that requires the addition of an input buffer for most applications.
• The ADC has low-swing differential outputs that are suitable for most high-speed applications. However, some circuitry is required to convert them to full-swing single-ended outputs for other applications.
• The differential input voltage range is ±0.65V. A variable-gain amplifier (VGA) may be required at the ADC input to use the full-scale input voltage. Also, a DC shift is required for the differential inputs.

6.2.2 Limitations of the Scalability

• In the CML blocks of this ADC, a maximum of three transistors are stacked. This allows for low-supply operation. However, adapting the design to new technologies could at some point be problematic due to the low supply voltage.
• Very high frequency ADC systems can be achieved by using this ADC in a multi-channel architecture. However, the complexity and cost of calibration and/or signal processing to overcome the mismatch effects could be very challenging.
• To improve the speed with the speed-power trade-off explained in Section 6.1.3, larger input transistors are required for the increased bias current. This creates a larger load capacitance for the previous stage and increases the $R_D C_L$ product, which limits the speed.
• To reduce power using the same trade-off, a larger $R_D$ and smaller bias currents can be used. In this case, noise could be a limiting factor.
• In increasing the resolution, extensive hardware is required for ADCs with 7-bit and higher resolutions, which results in a considerable on-chip implementation cost and power consumption.

6.2.3 Limitations of the Contributions

• CML increases the ADC speed. However, it suffers from static power consumption, which makes the ADC power-inefficient at low clock rates. Also, it requires power-down circuitry for the standby mode.
• Reformulation can enhance the signal routing from the comparator outputs to the encoder. Some long wires still remain in the next gate levels of the encoder; CML latches can be added at the middle of these wires to reduce the delay by half. However, in higher-resolution ADCs, these reduced wire delays could still be significant. It is worth noting that even though the reformulation is proposed for flash ADCs with a resistor averaging network, which require a circular ordering of the comparators, it can also be applied to other flash ADCs that do not use resistor averaging, since in such ADCs the order of the comparators is one of the design options.
• For the common-centroid layout in the resistor ladder, the complexity associated with the number of segments and the routing overhead, which increases quadratically with the resolution, could become a challenge. In such cases, the resistance of the routing wires should also be accounted for.
• The SNR versus INL formulas introduced in this thesis are limited to the ramp and sine input waveforms.
Also, for the sine waveform a closed-form formula is not offered, as each INL value is important in the exact SNR calculation.

6.3 Future Work

Future work can be divided into two groups: (1) enhancements to the proposed 4-bit flash ADC to improve its performance and/or extend its applications, and (2) using the ideas presented in this thesis in other flash ADCs and/or other analog circuit designs.

6.3.1 Enhancement to the Proposed Flash ADC

In Section 3.3.5, several enhancements are introduced for the encoder. These modifications include:

(1) In the encoder, some dummy gates can be added to different paths in order to equalize the delay of all paths. As a result, the timing performance of the encoder is improved.

(2) For the AND/NAND gates, the circuit of Figure 3.23 can be used. Using this circuit results in a more symmetric layout that is also very similar to the layout of an XOR gate.

(3) Because of the special properties of a thermometer code, all the AND/NAND gates in the encoder can be replaced with XOR gates, which, unlike an AND/NAND gate, have a symmetric topology. The resultant circuit is an all-XOR-gate encoder circuit.

Also, in order to keep the circuit simple, the proposed ADC is implemented with a distributed T/H. However, adding a single front-end T/H that samples the signal with the required speed (e.g., 4GS/s) and the required linearity (e.g., 5 to 6 bit T/H linearity for a 4-bit flash ADC) [25] would improve its performance at high frequencies. In addition, to improve the BER, either the conversion of the Gray code to binary code should be removed (as explained in Section 4.5.7) or it should be performed after the third level of latches, which gives the comparator outputs a higher chance of resolving the metastability.

6.3.2 Reconfigurable Circuit with Speed and Power Trade-off

As explained in Section 6.1.3, to trade off speed with power, $R_D$ and the tail current should be adjusted according to the speed/power requirements.
Changing the tail current is straightforward as all ADC sub-blocks share the same bias current. However, changing the resistance requires further considerations. A simple solution is to use several resistors in parallel that can be switched, as shown in Figure 6.1.

Figure 6.1: A CML latch and the method to adjust its load resistance and tail current

A digital control circuit turns the switches on and off according to the speed/power requirements. PMOS transistors can operate as the switches shown in the figure. They can be sized according to the resistor values in order to preserve the resistance ratios among the different branches. One drawback of this method is that the number of components connected to the circuit output node can increase the output capacitance. However, as the transistors are sized according to the resistor sizes, the added capacitance could be small. Moreover, this larger capacitance can be compensated with a smaller $R_D$ resistance and a larger tail current.

6.3.3 Using Inductors in the Preamplifier Stage to Increase Speed and Bandwidth

For further speed improvements, inductors can be added to the ADC. Similar to [18], as shown in Figure 6.2, inductors can be added in the preamplifier stage to enhance its gain-bandwidth product. This method is referred to as the inductor peaking technique.

Figure 6.2: Speed improvement in ADC using inductors in preamplifier

Inductors have the drawback of the large area required for on-chip implementation. In [18], differential inductors are used with a small area of 32µm × 32µm. However, a huge area is still required to implement the inductors for all comparators. An alternative to these passive inductors is active inductors such as those in [72]–[74]. Preliminary schematic simulation results in 90nm CMOS show that adding active inductors to the preamplifiers of the proposed ADC could enhance the sampling speed from 10GS/s to 13GS/s.
6.3.4 Further Enhancements to the Proposed ADC

As there are many current sources in the circuit, if a current reuse method [75] is used in the ADC, part of the consumed power can be recycled and the total power consumption can be reduced. As another enhancement to the design in order to reduce the power, wave-pipelining [53] can be used to remove the CML latches used in the encoder. However, removing those latches would increase the chance of metastability.

6.3.5 Extending the Ideas to Other Circuits

In addition to flash ADCs, the high-speed low-swing comparator introduced here can be utilized in other ADCs. For example, pipelined ADCs can be designed using this low-swing comparator. Other blocks of the ADC can also be designed using similar CML circuits, in order to operate the whole pipelined ADC circuit at multi-GHz rates. This comparator can also be used in a sigma-delta ADC. CML implementation of the other blocks in such an ADC would result in high-speed operation of the complete circuit.

The use of low-swing CML blocks can be extended to other mixed-signal blocks, such as a VCO or PLL, as well. All the delay elements in the VCO, the phase detector, and other logic blocks can be implemented in CML to allow operation at multi-GHz speeds.

REFERENCES

[1] M. Harwood et al., “A 12.5Gb/s serdes in 65nm CMOS using a baud-rate ADC with digital receiver equalization and clock recovery,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pp. 436–437, February 2007.
[2] D. J. Foley and M. P. Flynn, “A low-power 8-PAM serial transceiver in 0.5µm digital CMOS,” IEEE Journal of Solid-State Circuits, vol. 37, pp. 310–316, March 2002.
[3] C. K. Yang, V. Stojanovic, S. Mojtahedi, M. Horowitz, and W. Ellersick, “A serial-link transceiver based on 8-G samples/s A/D and D/A converters in 0.25µm CMOS,” IEEE Journal of Solid-State Circuits, vol. 36, pp. 293–301, November 2001.
[4] R. Thirugnanam, D. S. Ha, S. S.
Choi, “Design of a 4-bit 1.4GSamples/s low power folding ADC for DS-CDMA UWB,” IEEE International Conference on Ultra-Wideband (ICU), pp. 536–541, September 2005.
[5] T. Yamamoto, S.-I. Gotoh, T. Takahashi, K. Irie, K. Ohshima, and N. Mimura, “A mixed-signal 0.18-µm CMOS SoC for DVD systems with 432-MSample/s PRML read channel and 16-Mb embedded DRAM,” IEEE Journal of Solid-State Circuits, vol. 36, pp. 1785–1794, November 2001.
[6] K. Poulton, R. Neff, A. Muto, A. W. L. Burstein, and M. Heshami, “A 4 GSample/s 8b ADC in 0.35µm CMOS,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pp. 166–167, February 2002.
[7] K. Poulton et al., “A 20Gs/s 8b ADC with a 1Mb memory in 0.18µm CMOS,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pp. 318–319, February 2003.
[8] L. Y. Nathawad, R. Urata, B. A. Wooley, and D. A. B. Miller, “A 20GHz bandwidth, 4b photoconductive-sampling time-interleaved CMOS ADC,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pp. 320–321, February 2003.
[9] X. Jiang, Z. Wang, and M. F. Chang, “A 2GS/s 6b ADC in 0.18µm CMOS,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pp. 322–323, February 2003.
[10] R. Budruk, D. Anderson, T. Shanley, and J. Winkles, PCI Express System Architecture, Addison-Wesley, 2003.
[11] Serial ATA: High speed serialized AT attachment, Serial ATA Workgroup, APT Technologies, Inc., Santa Cruz, CA, revision 1.0, pp. 1–306, August 29, 2001.
[12] S. C. Liang, D. J. Huang, C. K. Ho, H. C. Hong, “10 GSamples/s, 4-bit, 1.2V, design-for-testability ADC and DAC in 0.13µm CMOS technology,” IEEE Asian Solid-State Circuits Conference, pp. 416–419, November 2007.
[13] P. Scholtens and M. Vertregt, “A 6-b 1.6-Gsample/s flash ADC in 0.18µm CMOS using averaging termination,” IEEE Journal of Solid-State Circuits, vol. 37, pp. 1599–1609, December 2002.
[14] S. Sheikhaei, S.
Mirabbasi, and A. Ivanov, “A 4-bit 5GS/s flash A/D converter in 0.18µm CMOS,” IEEE International Symposium on Circuits and Systems (ISCAS), pp. 6138–6141, May 2005.
[15] S. Sheikhaei, S. Mirabbasi, and A. Ivanov, “An encoder for a 5GS/s 4-bit flash ADC in 0.18µm CMOS,” IEEE Canadian Conference on Electrical and Computer Engineering, pp. 680–683, May 2005.
[16] S. Sheikhaei, S. Mirabbasi, and A. Ivanov, “A 0.18µm CMOS pipelined encoder for a 5GS/s 4-bit flash analogue-to-digital converter,” IEEE Canadian Journal of Electrical and Computer Engineering, vol. 30, pp. 183–187, Fall 2005.
[17] S. Sheikhaei, S. Mirabbasi, and A. Ivanov, “A 43mW single-channel 4GS/s 4-bit flash ADC in 0.18µm CMOS,” IEEE Custom Integrated Circuits Conference (CICC), pp. 333–336, September 2007.
[18] S. Park, Y. Palaskas, and M. P. Flynn, “A 4-GS/s 4-bit flash ADC in 0.18 µm CMOS,” IEEE Journal of Solid-State Circuits, vol. 42, pp. 1865–1872, September 2007.
[19] C. Paulus, H.M. Bluthgen, M. Low, E. Sicheneder, N. Briils, A. Courtois, M. Tiebout, and R. Thewes, “A 4GS/s 6b flash ADC in 0.13µm CMOS,” IEEE Symposium on VLSI Circuits Digest of Technical Papers, pp. 420–423, June 2004.
[20] S. Sheikhaei, S. Mirabbasi, and A. Ivanov, “Time-Domain Analysis of INL Effects on the SNR of ADCs,” submitted for review to IEEE Transactions on Circuits and Systems II (TCAS-II), July 2008.
[21] I. Mehr and D. Dalton, “A 500-MSample/s, 6-b Nyquist-rate ADC for disk-drive read channel application,” IEEE Journal of Solid-State Circuits, vol. 34, pp. 912–920, July 1999.
[22] B. Razavi, Principles of Data Conversion System Design, New York, IEEE Press, 1995.
[23] M. Amourah et al., “A 9b 165MS/s 1.8V pipelined ADC with all digital transistors amplifier,” IEEE Custom Integrated Circuits Conference (CICC), pp. 421–424, September 2003.
[24] Y. Li and E. S.-Sinencio, “A wide input bandwidth 7-bit 300-MSample/s folding and current mode interpolating ADC,” IEEE Journal of Solid-State Circuits, vol. 38, pp.
1405–1410, August 2003.
[25] M. Choi and A. Abidi, “A 6-bit 1.3-GSample/s flash ADC in 0.35µm CMOS,” IEEE Journal of Solid-State Circuits, vol. 36, pp. 1847–1858, December 2001.
[26] K. Sushihara, H. Kimura, Y. Okamoto, K. Nishimura, and A. Matsuzawa, “A 6-b 800-MSample/s CMOS A/D converter,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pp. 428–429, February 2000.
[27] K. Nagaraj et al., “A dual-mode 700-Msamples/s 6-bit 200-Msamples/s 7-bit A/D converter in a 0.25-µm digital CMOS process,” IEEE Journal of Solid-State Circuits, vol. 37, pp. 1760–1768, December 2000.
[28] G. Geelen, “A 6-b 1.1-GSample/s CMOS A/D converter,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pp. 128–129, February 2001.
[29] K. Uyttenhove and M. Steyaert, “A 1.8-V 6-bit 1.3-GHz flash ADC in 0.25µm CMOS,” IEEE Journal of Solid-State Circuits, vol. 38, pp. 1115–1122, July 2003.
[30] B. Razavi, “Design of sample-and-hold amplifiers for high-speed low-voltage A/D converters,” IEEE Custom Integrated Circuits Conference (CICC), pp. 59–66, May 1997.
[31] D. Johns and K. Martin, Analog Integrated Circuit Design, John Wiley, 1997.
[32] Y. Tamba and K. Yamakido, “A CMOS 6-b 500-MSample/s ADC for a hard disk-drive read channel,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pp. 324–325, February 1999.
[33] C. Donovan and M. P. Flynn, “A ‘digital’ 6-bit ADC in 0.25µm CMOS,” IEEE Journal of Solid-State Circuits, vol. 37, pp. 432–437, March 2002.
[34] K. Uyttenhove and M. Steyaert, “Speed-power-accuracy tradeoff in high-speed CMOS ADCs,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 49, pp. 280–287, April 2002.
[35] M. B. Choi, “A 6bit 1.3GSample/s A/D converter in 0.35µm CMOS,” Ph.D. dissertation, University of California, Los Angeles, CA, 2002.
[36] H. Pan, M. Segami, M. Choi, L. Cao, and A. A.
Abidi, “A 3.3V 12b 50MS/s A/D converter in 0.6µm CMOS with over 80dB SFDR,” IEEE Journal of Solid-State Circuits, vol. 35, pp. 1769–1780, December 2000.
[37] M. J. M. Pelgrom, H. P. Tuinhout, and M. Vertregt, “Transistor matching in analog CMOS applications,” International Electron Devices Meeting, pp. 915–918, December 1998.
[38] W. Ellersick, C.-K. K. Yang, V. Stojanovic, S. Modjtahedi, and M. A. Horowitz, “A serial-link transceiver based on 8 GSample/s A/D and D/A converters in 0.25µm CMOS,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pp. 58–59, February 2001.
[39] M. P. Flynn, C. Donovan, and L. Sattler, “Digital calibration incorporating redundancy of flash ADCs,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 50, pp. 205–213, May 2003.
[40] K. Kattmann and J. Barrow, “A technique for reducing differential nonlinearity errors in flash A/D converters,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pp. 170–171, February 1991.
[41] H. Pan and A. Abidi, “Spatial filtering in flash A/D converters,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 50, pp. 424–436, August 2003.
[42] L. Y. Nathawad, R. Urata, B. A. Wooley, and D. A. B. Miller, “A 40-GHz-bandwidth, 4-bit, time-interleaved A/D converter using photoconductive sampling,” IEEE Journal of Solid-State Circuits, vol. 38, pp. 2021–2030, December 2003.
[43] B. Murmann and B. Boser, “Digitally assisted analog integrated circuits,” ACM Queue, vol. 2, no. 1, pp. 65–71, March 2004.
[44] S. Park, Y. Palaskas, and M. P. Flynn, “A 4GS/s 4b flash ADC in 0.18µm CMOS,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pp. 570–571, February 2006.
[45] S. Tsukamoto, W. G. Schofield, and T. Endo, “A CMOS 6-b, 400-MSample/s ADC with error correction,” IEEE Journal of Solid-State Circuits, vol. 33, pp. 1939–1947, December 1998.
[46] C. Portmann and T. Meng, “Power-efficient metastability error reduction in CMOS flash A/D converters,” IEEE Journal of Solid-State Circuits, vol. 31, pp. 1132–1140, August 1996.
[47] N. Kurosawa, H. Kobayashi, K. Maruyama, H. Sugawara, and K. Kobayashi, “Explicit analysis of channel mismatch effects in time-interleaved ADC,” IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 48, pp. 261–271, March 2001.
[48] W. C. Black and D. A. Hodges, “Time interleaved converter arrays,” IEEE Journal of Solid-State Circuits, vol. 15, pp. 1022–1029, December 1980.
[49] B. Yu and W. C. Black, “A 900 MS/s 6b interleaved CMOS flash ADC,” IEEE Custom Integrated Circuits Conference (CICC), pp. 149–152, May 2001.
[50] J. H. Koo, Y. J. Kim, B. H. Park, S. S. Choi, S. I. Lim, and S. Kim, “A 4-bit 1.356 Gsps ADC for DS-CDMA UWB system,” IEEE Asian Solid-State Circuits Conference, pp. 339–342, November 2006.
[51] W. Ellersick, K. Yang, M. Horowitz, and W. Dally, “GAD: A 12GS/s CMOS 4-bit A/D converter for an equalized multilevel link,” IEEE Symposium on VLSI Circuits Digest of Technical Papers, pp. 49–52, June 1999.
[52] F. Kaess, R. Kanan, B. Hochet, and M. Declercq, “New encoding scheme for high-speed flash ADCs,” IEEE International Symposium on Circuits and Systems (ISCAS), pp. 878–882, June 1997.
[53] W. P. Burleson, M. Ciesielski, F. Klass, and W. Liu, “Wave-pipelining: a tutorial and research survey,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 6, no. 3, pp. 464–474, September 1998.
[54] G. Van der Plas, S. Decoutere, and S. Donnay, “A 0.16pJ/conversion-step 2.5mW 1.25GS/s 4b ADC in a 90nm digital CMOS process,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pp. 2310–2311, February 2006.
[55] S. Park, Y. Palaskas, A. Ravi, R. E. Bishop, and M. P. Flynn, “A 3.5GS/s 5-b flash ADC in 90nm CMOS,” IEEE Custom Integrated Circuit Conference (CICC), pp. 489–492, September 2006.
[56] A. G. W. Venes and R. J. van de Plassche, “An 80-MHz, 80-mW, 8-b CMOS folding A/D converter with distributed track-and-hold preprocessing,” IEEE Journal of Solid-State Circuits, vol. 31, pp. 1846–1853, December 1996.
[57] B. Razavi, Design of Analog CMOS Integrated Circuits, McGraw-Hill, New York, 2001.
[58] P. R. Gray, P. J. Hurst, S. H. Lewis, and R. G. Meyer, Analysis and Design of Analog Integrated Circuits, 4th ed. New York: John Wiley, 2001.
[59] J. L. Hennessy, D. A. Patterson, D. Goldberg, and K. Asanovic, Computer Architecture: A Quantitative Approach, 3rd edition, 2003.
[60] C. S. Vaucher, I. Ferencic, M. Locher, S. Sedvallson, U. Voegeli, and Z. Wang, “A family of low-power truly modular programmable dividers in standard 0.35µm CMOS technology,” IEEE Journal of Solid-State Circuits, vol. 35, no. 7, pp. 1039–1045, July 2000.
[61] A. Hastings, The Art of Analog Layout, Prentice Hall, Upper Saddle River, NJ, 2001.
[62] K. Yip, “Clock tree distribution,” IEEE Potentials, vol. 16, pp. 11–16, April/May 1997.
[63] J. Rabaey, Digital Integrated Circuits: A Design Perspective. Prentice-Hall, 1995.
[64] W. Kester, The Data Conversion Handbook, Analog Devices, Inc., Elsevier, New York, 2005.
[65] J. Doernberg, H. Lee, and D. Hodges, “Full-speed testing of A/D converters,” IEEE Journal of Solid-State Circuits, vol. 19, no. 6, pp. 820–827, December 1984.
[66] D. Draxelmayr, “A 6b 600MHz 10mW ADC array in digital 90nm CMOS,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pp. 264–265, February 2004.
[67] B. Brannon, Aperture Uncertainty and ADC System Performance, Analog Devices, Application note AN-501, September 2000.
[68] B. Brannon, “DNL and Some of its Effects on Converter Performance,” Wireless Design and Development, June 2001.
[69] R. van de Plassche, CMOS Integrated Analog-to-Digital and Digital-to-Analog Converters, Kluwer Academic Publishers, 2003.
[70] M. F. Wagdy and S. S.
Awad, “Determining ADC effective number of bits via histogram testing,” IEEE Transactions on Instrumentation and Measurement, vol. 40, pp. 770–772, August 1991.
[71] E. Farag and M. Elmasry, Mixed Signal VLSI Wireless Designs Circuits and Systems, Kluwer Academic Publishers, 2000.
[72] N. Krishnapura, M. Barazande-Pour, Q. Chaudhry, J. Khoury, K. Lakshmikumar, and A. Aggarwal, “A 5Gb/s NRZ transceiver with adaptive equalization for backplane transmission,” IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pp. 60–61, February 2005.
[73] E. Sackinger and W. Fischer, “A 3-GHz 32-dB CMOS limiting amplifier for SONET OC-48 receivers,” IEEE Journal of Solid-State Circuits, vol. 35, pp. 1884–1888, December 2000.
[74] Y.-S. M. Lee, S. Sheikhaei, and S. Mirabbasi, “A 10Gb/s active-inductor structure with peaking control in 90nm CMOS,” to be presented at the Asian Solid-State Circuit Conference (ASSCC), November 2008.
[75] C. Inui, I. C. H. Lai, and M. Fujishima, “60GHz CMOS current-reuse cascade amplifier,” IEEE Asia-Pacific Microwave Conference (APMC), pp. 793–796, December 2007.
[76] S. Sheikhaei, S. Mirabbasi, and A. Ivanov, “A 0.35µm CMOS comparator circuit for high-speed ADC applications,” IEEE International Symposium on Circuits and Systems (ISCAS), pp. 6134–6137, May 2005.

APPENDIX A: DESIGN AND TEST OF A 1GHz COMPARATOR IN 0.35µm CMOS

A.1 Introduction

In this appendix, a 1GHz clocked comparator is presented [76] as an example of high-speed comparator design. The comparator consists of a preamplifier and a latch stage followed by a dynamic latch that operates as an output sampler. The output sampler circuit consists of a full transmission gate (TG) and two inverters. Using this sampling stage reduces the power consumption of this high-speed comparator.
Simulations show that the charge injection of the TG adds constructively to the sampled signal value, amplifying the sampled signal with a modest gain of 1.15. Combined with the high gain of the inverters, this drives the sampled signals toward the rail voltages. The comparator is designed and fabricated in a 0.35µm standard digital CMOS technology. Measurement results show a sampling frequency of 1GHz with 16mV resolution for a 1V input signal range and 2mW power consumption from a 3.3V supply.

A.2 Block Diagram of the Proposed Comparator

A block diagram of this comparator is shown in Figure A.1. It consists of three main stages: a preamplifier, a latch, and an output sampler.

Figure A.1: Architecture of the proposed comparator

In high-speed ADCs, the conventional comparator architecture consists of several stages of preamplifiers and latches [21][25]. In such architectures, during the regeneration process in the last latching stage, two cross-connected inverters create positive feedback and generate the rail-to-rail voltages at the output of the comparator. During this process, a low-resistance path is formed from VDD to ground through the cross-connected inverters. As a result, the last latching stage usually consumes more power than the other stages. The proposed architecture reduces this power consumption by using an output sampler as the last latching stage, while maintaining the speed performance. A TG is used to implement the output sampler circuit. As explained in Section A.4, the proposed architecture takes advantage of the typically problematic charge injection of the transistors in the transmission gate. A test circuit based on the proposed architecture was fabricated in a 0.35µm standard digital CMOS technology and was successfully tested.
A.3 Circuit Blocks of the Proposed Comparator

The circuit-level schematics of the building blocks of the proposed comparator are shown in Figure A.2, including a preamplifier (Figure A.2(a)), a latch and output samplers (Figure A.2(b)), and sampling clock generators (Figure A.2(c)). Timing diagrams of the comparator clock and the output sampler clock are shown in Figure A.2(d).

Figure A.2: The proposed comparator: (a) preamplifier, (b) latch stage and output sampler, (c) sampling clock generators, (d) timing diagram

The detailed operation of each building block is described here. The preamplifier is shown in Figure A.2(a). The input differential pairs are PMOS rather than NMOS transistors. This is partly because the range of the input voltage is assumed to be below 1V. The output currents of the preamplifier are mirrored into the latch stage through transistors M5-M6 in Figure A.2(a) and M3-M4 in Figure A.2(b).

The latch stage (Figure A.2(b)) consists of a cross-coupled pair of NMOS and PMOS transistors, which are connected to ground through the clock-enabled transistor M5. When the clock (CLK) is low (i.e., CLKB is high), the latch is in its reset state (Figure A.2(d)). In this state, the latch output voltages are at the midpoint of the rails. This yields a faster regeneration time than starting from the rail voltages [31]. During the reset phase, the preamplifier translates the voltage difference between the inputs of the comparator into an unbalanced state in the latch stage. Then, in the evaluation phase (when CLK goes high), the latch stage is activated. Because of the positive feedback, this unbalanced state is amplified towards the rail voltages. At the end of this phase, there is still a gap between the latch outputs and the rail voltages. The outputs of the latch are sampled, and then the latch is reset through the reset switch, M8.
Deferring the sampling time of the output sampler to the end of the evaluation phase decouples the latch stage and its parasitic capacitance from the output sampler (refer to Figure A.2(b)). This decoupling allows for faster operation of the latch circuit at the beginning of its evaluation phase. Furthermore, at that moment it minimizes the kickback of the output sampler and the error due to the charge injection of the TG on the latch outputs.

The output sampling circuit is also shown in Figure A.2(b) and consists of a full TG and two inverter buffers. This configuration is in fact a dynamic latch. The combination of the input capacitance of the inverter gate and the output capacitance of the TG acts as a holding capacitor. The outputs are sampled at the end of the evaluation phase. A short pulse signal is needed as the sampling clock (Figure A.2(d)). The samples are amplified (and buffered) using the output inverters. The final output samples remain constant for the whole clock period, which relaxes the timing requirements for the following stages (e.g., the encoder).

Figure A.2(c) shows the output sampling clock generators. Two delay lines, each consisting of three inverters, delay the rising edge of the Samp signal and the falling edge of the SampB signal. As a result, the required sampling signals of Figure A.2(d) are generated.

A.4 Slight Amplification by the Transmission Gate

Typically, charge injection is a major problem when a TG is utilized as a sampling switch. However, in the proposed architecture, the voltage change due to charge injection adds constructively to the sampled signal and helps push the sampled signals towards the rail voltages. To clarify this property, consider the TG shown in Figure A.3. A full TG passes rail-to-rail voltages. The NMOS transistor acts as a closed switch for low-to-medium voltages, while the PMOS transistor operates as a low-resistance path for medium-to-high (near supply) voltages.
When the TG is in the track mode, i.e., the Samp signal is high, the Mn transistor acts as a closed switch for low input voltages. At the end of the track phase, Samp changes to low. This opens the Mn switch and causes half of its channel charge to be injected into the sampling capacitor. Since the channel charge is negative, it shifts the sampled signal towards lower voltages. A complementary process happens for Mp with high input voltages: its positive channel charge shifts the sampled signal towards higher voltages. Thus, the differential sampled signal is amplified. Simulations show a modest gain of 1.15, as depicted by the input-output characteristic of the TG in Figure A.4.

Figure A.3: Charge injection effect in a full TG: (a) for low-voltage input signals, (b) for high-voltage input signals

Figure A.4: Amplification by a full TG

The gain of the TG can be estimated using the following equations. In these equations, $\Delta V_{NMOS}$ and $\Delta V_{PMOS}$ are the voltage changes at the output of the TG due to the charge injection of the NMOS and PMOS transistors, respectively [31]. $C_H$ is the total capacitance at the output of the TG that acts as the holding capacitor. This capacitor has two parts, $C_{TG}$ and $C_{inv}$, which are the output capacitance of the TG and the input capacitance of the inverter, respectively. $G$ is the small-signal gain of the TG.

$$\Delta V_{NMOS} = -\frac{W_N L C_{ox}\,(V_{DD} - V_{in} - V_{TN})}{2 C_H} \tag{A.1}$$

$$\Delta V_{PMOS} = \frac{W_P L C_{ox}\,(V_{in} - V_{TP})}{2 C_H} \tag{A.2}$$

$$V_{\text{charge-inj}} = \Delta V_{NMOS} \ \text{or} \ \Delta V_{PMOS} \tag{A.3}$$

For $W_N = W_P = W$, both (A.1) and (A.2) have the same slope with respect to $V_{in}$, so we have

$$G = \frac{\partial V_{out}}{\partial V_{in}} = \frac{\partial (V_{in} + V_{\text{charge-inj}})}{\partial V_{in}} = 1 + \frac{W L C_{ox}}{2 C_H} \tag{A.4}$$

and, assuming widths of $W$ and $2W$ for the NMOS and PMOS transistors in the inverter,

$$C_H = C_{TG} + C_{inv} \cong \frac{1}{2} W L C_{ox} + (W + 2W) L C_{ox} = \frac{7}{2} W L C_{ox} \tag{A.5}$$

and

$$G \cong 1 + \frac{1}{7} = 1.14 \tag{A.6}$$

A.5 Offset of the Comparator

In a flash ADC, the offset of the comparators is an important issue because it degrades the accuracy of the ADC. Various methods are reported in the literature to reduce this offset, including input offset storage [21], offset averaging [25], and digital calibration [33].
The input offset storage method is used in ADCs that permit idle times (i.e., that do not operate continuously). Therefore, this method is not applicable to the potential applications of the proposed comparator. The offset averaging method is a popular technique that is commonly used in flash ADCs. The last method, digital calibration, comes at the expense of added complexity in the digital domain. However, it can be used for accurate offset removal in a flash ADC. The offset averaging and digital calibration methods are appropriate for use with the proposed comparator when the comparators are employed in a flash ADC. However, neither of these methods was implemented in the fabricated stand-alone comparator. The offset averaging method is not applicable to a single comparator, while the digital calibration method adds considerable complexity to the circuit of a single comparator. Although no offset cancellation technique was used in the fabricated comparator, the offset of the comparator was measured with a view to enhancing future designs.

A.6 Measurement Results

Figure A.5 shows the test setup of the chip. Reference voltages were generated using a DC power supply. A single signal generator was employed to generate synchronous clock and input signals. Attenuators were added to the input path for precise resolution and offset measurements. Table A.1 shows the frequency measurement results for three sample chips. Table A.2 shows the performance measurement results averaged for three chips, compared with the fastest reported comparator in 0.35µm CMOS [25]. The ratio of the comparator speed to its power consumption can be used as a performance metric of the comparator. From the data in the table, this Speed/Power metric is 500 GS/J for the proposed comparator versus 300 GS/J for the comparator in [25], an improvement of about 1.7 times.
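As a quick check of the Speed/Power metric discussed above, the following minimal sketch recomputes it from the sampling frequencies and power consumptions reported in Table A.2. The function name `speed_per_power_gs_per_j` is a hypothetical helper introduced only for illustration; it is not part of the measurement setup.

```python
def speed_per_power_gs_per_j(sample_rate_gsps: float, power_mw: float) -> float:
    """Return the sampling rate divided by the power consumption, in GS/J.

    (GS/s) / W = GS/J, so the power is first converted from mW to W.
    Hypothetical helper for illustration only.
    """
    if power_mw <= 0:
        raise ValueError("power must be positive")
    return sample_rate_gsps / (power_mw * 1e-3)


# Values taken from Table A.2.
proposed = speed_per_power_gs_per_j(1.0, 2.0)    # proposed comparator: 1.0 GS/s, 2 mW
reference = speed_per_power_gs_per_j(1.3, 4.3)   # comparator in [25]: 1.3 GS/s, 4.3 mW

print(f"proposed: {proposed:.0f} GS/J, [25]: {reference:.0f} GS/J, "
      f"ratio: {proposed / reference:.2f}")
```

The computed values are about 500 GS/J and 302 GS/J; the table rounds the latter to 300 GS/J.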
Figure A.5: The test setup

Table A.1: Sampling frequency measurement results for three sample chips

| Chip Number | Max Freq |
|---|---|
| #1 | 960MS/s |
| #2 | 1.00GS/s |
| #3 | 1.03GS/s |

Table A.2: Measurement results averaged for three chips compared with [25]

| Performance metrics | The proposed comparator | Comparator in [25] |
|---|---|---|
| Technology | 0.35µm CMOS | 0.35µm CMOS |
| Supply voltage | 3.3V | 3.3V |
| Input signal range | 1V | 1V |
| Resolution | 16mV = 6bit | 16mV = 6bit |
| Sampling frequency | 1.0GHz | 1.3GHz |
| Power consumption | 2mW | 4.3mW |
| Speed/Power | 500 GS/J | 300 GS/J |
| Input-referred offset voltage | 50mV | Cancelled by offset averaging |

A.7 Summary

A comparator architecture suitable for high-speed applications is presented. Using an output sampler consisting of a TG and two inverters as the last latching stage reduces the power consumption. The TG’s charge injection enhances the voltage level of the sampled signal. For proper operation of this architecture, a short-pulse sampling signal is required to activate the output sampling circuitry. A simple pulse generator, which constructs such short pulses from the comparator clock, is also presented. The comparator is fabricated in a standard 0.35µm CMOS process and consumes 2mW from a 3.3V supply while operating at 1GHz.

APPENDIX B: ADC PERFORMANCE METRICS

(Refer to [22] for more information.)

B.1 Static Metrics

Figure B.1: Static ADC metrics [22]

Decision level is the input voltage at which the output code changes.

Decision level offset (DLO) is the offset of a decision level with reference to the respective ideal decision level.

LSB voltage is the voltage that creates a unit change in the ADC output code. It is calculated as the input voltage range divided by the number of output codes (i.e., $2^n$).

Offset error is the vertical intercept of the straight line through the end points (line AB in Figure B.1).

Gain error is the deviation of the slope of line AB from its ideal value (usually unity).
Integral nonlinearity (INL) is the deviation of the input/output characteristic from a straight line passed through its end points (line AB), expressed in LSB.

Differential nonlinearity (DNL) is the deviation of the difference between consecutive decision levels from the ideal value of 1 LSB.

It is worth noting that good linearity for an ADC is considered as INL and DNL values less than ±1 and ±0.5 LSB, respectively [29]. INL > 0.5LSB or DNL > 1LSB can cause a missing code, i.e., a code that does not appear at the output.

B.2 Dynamic Metrics

Signal-to-noise-ratio (SNR) is the ratio of the signal power to the total noise power at the output (usually measured for a sinusoidal input).

Signal-to-noise-and-distortion-ratio (SNDR) is the ratio of the signal power to the total noise and harmonic power at the output, when the input is a sinusoid.

Spurious-free-dynamic-range (SFDR) is the ratio of the signal power to the power of the strongest spurious signal at the output, when the input is a sinusoid.

Effective number of bits (ENOB) is defined by the following equation, where $SNDR_P$ is the peak SNDR of the converter expressed in decibels.

$$ENOB = \frac{SNDR_P - 1.76}{6.02}$$
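The ENOB definition above can be sketched as a pair of small conversion helpers. This is an illustration only: the function names `enob_from_sndr` and `sndr_from_enob` are hypothetical, and the example inputs are the ideal 4-bit case and the ENOB of 3.71 reported in the abstract.

```python
def enob_from_sndr(sndr_db: float) -> float:
    """Effective number of bits for a peak SNDR given in dB."""
    return (sndr_db - 1.76) / 6.02


def sndr_from_enob(enob: float) -> float:
    """Inverse relation: peak SNDR in dB that corresponds to a given ENOB."""
    return 6.02 * enob + 1.76


# An ideal 4-bit converter (ENOB = 4) corresponds to this peak SNDR.
ideal_4bit_sndr = sndr_from_enob(4.0)

# The ENOB of 3.71 quoted in the abstract maps back to this peak SNDR.
reported_sndr = sndr_from_enob(3.71)

print(f"ideal 4-bit SNDR: {ideal_4bit_sndr:.2f} dB")
print(f"SNDR for ENOB 3.71: {reported_sndr:.2f} dB")
print(f"round trip ENOB: {enob_from_sndr(reported_sndr):.2f}")
```

Under this relation, an ideal 4-bit converter has a peak SNDR of 25.84 dB, and an ENOB of 3.71 corresponds to a peak SNDR of roughly 24.09 dB.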