Open Collections

UBC Theses and Dissertations

A hybrid phase-locked loop for clock and data recovery applications Jalali, Mohammad Sadegh 2010

A HYBRID PHASE-LOCKED LOOP FOR CLOCK AND DATA RECOVERY APPLICATIONS

by Mohammad Sadegh Jalali
B.Sc., The University of Tehran, 2008

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in The Faculty of Graduate Studies (Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)
August 2010
© Mohammad Sadegh Jalali, 2010

Abstract

Clock and data recovery (CDR) circuits are among the critical building blocks of wireline receivers. In these receivers, typically after an equalizer compensates for the adverse effects of the channel, the received signal is passed to the CDR block. The timing of the signal is first extracted (clock recovery), and then the actual data is recovered (data recovery). To recover the clock, phase-locked loops (PLLs) are usually used. The PLL output can track the phase and the frequency of its input. In this thesis, a hybrid PLL architecture is proposed. The PLL starts its operation using a binary phase/frequency detector (PFD) to achieve a fast lock and a wide tuning range. The operation is then automatically switched to a linear phase detector (PD) to achieve a low-jitter clock signal upon lock, and finally the bandwidth is decreased to reduce the output jitter further. Automatic switching of the operation from the binary to the linear PD is achieved by detecting the point at which the clock frequency crosses the data frequency. This PLL structure is particularly suitable for CDR applications, as its output is insensitive to continuous streams of identical bits. Also, a feedback-based technique is used in the charge pump (CP) to increase its swing. This is done by detecting the change in the drain-source voltages of the current source transistors of the CP and changing their gate-source voltages in a closed-loop feedback system to keep their currents constant. The PLL is designed and simulated in a 0.13 µm CMOS technology.
Post-layout simulations show that the PLL tunes from approximately 8.3 GHz to 9.6 GHz, consumes about 35 mW from a 1.2 V supply, and has a deterministic jitter of about 35 fs. The total random jitter of the designed PLL is about 0.1 unit interval (UI), or 11.7 ps at a clock frequency of 8.5 GHz. The worst-case lock time of the PLL is slightly less than 30 ns.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction
  1.1 Motivation
  1.2 Objectives
  1.3 Contributions
    1.3.1 Low- and High-Frequency Hybrid PFD for CDR Applications
    1.3.2 Fast Frequency Comparator for Switching the Operation Mode
    1.3.3 High-Swing Differential Charge Pump
  1.4 Outline
2 Phase-Locked Loops
  2.1 Introduction
  2.2 Basic PLL
    2.2.1 Introduction
    2.2.2 Mathematical Modelling
  2.3 Charge-Pump PLLs
    2.3.1 Introduction
    2.3.2 Mathematical Modeling
  2.4 PLL Noise
  2.5 Conclusion
3 Phase/Frequency Detector
  3.1 Introduction
  3.2 Hogge Phase Detector
  3.3 Alexander Phase Detector
  3.4 Half-Rate Phase Detectors
  3.5 Hybrid Phase Detector
    3.5.1 Introduction
    3.5.2 A Hybrid PFD for CDR Applications
  3.6 Conclusion
4 High-Speed Frequency Comparators
  4.1 Introduction
  4.2 Frequency Comparator Design Overview
  4.3 Design of the Control Block
    4.3.1 Leakage-Tolerant Frequency Comparator Design
    4.3.2 High-Speed Frequency Comparator Design
  4.4 Charge Pump Design
  4.5 Comparator Design
  4.6 Buffer Design
  4.7 Simulation Result
  4.8 Conclusion
5 Charge Pump
  5.1 Introduction
  5.2 Charge Pump
  5.3 Wide Swing Charge Pump
  5.4 Simulation Result
  5.5 Conclusion
6 Voltage-Controlled Oscillator
  6.1 Introduction
  6.2 Oscillators
  6.3 Voltage-Controlled Oscillators
  6.4 VCO Noise
  6.5 Designed VCO and Simulation Results
  6.6 Conclusion
7 Loop Filter Design and Stability
  7.1 Introduction
  7.2 Passive Loop Filters
  7.3 Loop Filter Design
  7.4 Conclusion
8 Simulation Results
  8.1 Introduction
  8.2 PLL Design
    8.2.1 Low-Frequency PLL
    8.2.2 High-Frequency PLL
  8.3 Design of the Buffer and the Initialization Circuitry
    8.3.1 Buffer Design
    8.3.2 Initializing Circuit Design
  8.4 Chip Layout
  8.5 Chip Results
  8.6 Conclusion
9 Conclusion
  9.1 Introduction
  9.2 Achievements
  9.3 Limitations
    9.3.1 High Power Consumption
    9.3.2 Turning Off the FD
    9.3.3 High Clock Frequency
  9.4 Future Work
    9.4.1 Using a More Advanced Technology
    9.4.2 Using the Fast Frequency Comparator
    9.4.3 Decreasing the Power Consumption of the Chip
    9.4.4 Decreasing the Chip Area by Changing the VCO
    9.4.5 Designing a Complete Receiver
Bibliography
Appendices
A Current-Mode Logic Gates
  A.1 Introduction
  A.2 CML Gate Design Techniques
  A.3 NAND
  A.4 XOR
  A.5 MUX
  A.6 Latch
  A.7 Fast Latch
  A.8 Conclusion

List of Tables

3.1 Operation of an Alexander PD
3.2 Operation of a hybrid PFD
5.1 Comparison of charge pump swing
8.1 Comparison of PLL results
A.1 Logic of a CML NAND gate

List of Figures

1.1 (a) Oversampling CDR (b) Phase tracking CDR
1.2 Basic CDR structure [26]
2.1 Basic PLL structure [22]
2.2 Output of the XOR PD for two different phase errors
2.3 Characteristic of an XOR PD
2.4 Structure of a CP PLL [35]
2.5 Structure of a simple CP [23]
2.6 A common PD [23]
2.7 Output of the PD of Figure 2.6
2.8 Characteristic of the PD of Figure 2.6
2.9 The two inputs used to enable linearization of a CP PLL [23]
2.10 Continuous data stream output of the PD of Figure 2.6
3.1 Characteristic of a (a) Linear PD (b) Binary PD
3.2 Hogge phase detector [50]
3.3 Hogge PD output when clock is (a) Early (b) On time (c) Late
3.4 Change of VCO frequency when clock is (a) Early (b) Late
3.5 Alexander phase detector [51]
3.6 Alexander PD output when clock is (a) Early (b) Late
3.7 Modified Alexander PD [53]
3.8 Half rate linear phase detector [129]
3.9 Half rate linear phase detector operation [129]
3.10 Half rate binary phase detector [48]
3.11 Half rate binary PD operation when clock is late [48]
3.12 A semi-hybrid PD characteristic [43]
3.13 A semi-hybrid phase detector [43]
3.14 A hybrid phase detector [31]
3.15 A hybrid PD for CDR applications
3.16 (a) Down (b) Up CML gates of the FD
3.17 Output of Hogge PD when clock lags data
3.18 High-frequency hybrid phase/frequency detector
4.1 Pull-in process of a PLL [21]
4.2 PLL locking process [38]
4.3 System overview of the frequency comparator
4.4 Leakage-tolerant frequency comparator design
4.5 Leakage-tolerant frequency comparator operation
4.6 High-speed frequency comparator design
4.7 High-speed frequency comparator operation
4.8 Frequency comparator charge pump
4.9 High speed comparator
4.10 High speed comparator with a strong reset
4.11 Converting comparator output to frequency comparator output
4.12 Circuit used to test the frequency comparators
4.13 Simulation results for the high-speed comparator
4.14 Simulation results for the leakage-tolerant comparator
5.1 Basic charge pump structure [23]
5.2 Low-ripple charge pump [70]
5.3 Rail-to-rail buffer [23]
5.4 Charge pump design with no buffer [77]
5.5 Charge pump with NMOS switches [77]
5.6 The proposed high swing charge pump
5.7 The proposed high swing charge pump operation
6.1 Positive feedback amplifier
6.2 VCO model [89]
6.3 (a) MOS varactor structure (b) MOS varactor characteristics [86]
6.4 MOS accumulation varactor characteristic [86]
6.5 Varactor based voltage-controlled oscillator [86]
6.6 Voltage controlled oscillator output
7.1 PLL loop filter circuit [23]
7.2 Phase margin of the linear PLL
7.3 Digital delay generator
8.1 Sample locking of the low frequency PLL
8.2 Jitter of the low frequency PLL
8.3 Minimum control voltage of the VCO in the low frequency PLL
8.4 Operation of the low frequency FD
8.5 Design of the Hybrid PLL
8.6 Sample lock of the high-frequency PLL
8.7 Operation of the frequency comparator
8.8 Locking of the PLL at a frequency of 8.7 GHz
8.9 Operation of the frequency comparator in Figure 8.8
8.10 Jitter of the PLL before bandwidth switching in Figure 8.8
8.11 Jitter of the PLL after bandwidth switching in Figure 8.8
8.12 Minimum locking frequency of the designed PLL
8.13 Operation of the frequency comparator in Figure 8.12
8.14 Operation of the PLL under a significant frequency difference
8.15 CML buffer [124]
8.16 CML buffer output
8.17 PLL layout
8.18 PLL chip
8.19 VCO output
8.20 Inductor with no filling
8.21 Inductor with RX filling only
8.22 Inductor with PC filling only
9.1 Typical DFE structure
9.2 A simple receiver [115]
A.1 Basic CMOS structure
A.2 Basic CML structure
A.3 Structure of a CML buffer
A.4 Transfer characteristic of a CML buffer [124]
A.5 CML NAND gate
A.6 CML NAND gate output
A.7 CML NAND gate output at 33 GHz
A.8 CML XOR gate
A.9 CML XOR gate output
A.10 CML MUX gate
A.11 CML MUX gate output
A.12 CML latch gate
A.13 CML latch gate output
A.14 Inductive peaking on CML gates
A.15 A latch gate with a feedforward path
A.16 Output of the fast latch gate

Acknowledgements

I would like to express my gratitude to my supervisor, Professor Shahriar Mirabbasi, for his enthusiasm, guidance, and unconditional support. It was an honor to work with him, and I thank him deeply for giving me the opportunity to join his research group. Shahriar has been a great friend and a sound lead for me and has impressed me with his patience and his great character. I have learned a lot from him and I am thankful to him for all his help.

I would also like to thank my committee members, Prof. Alireza Nojeh and Prof. Tor Aamodt, for reviewing my thesis and providing invaluable feedback, which improved the quality of the thesis. I am also thankful to my colleagues, Alireza Sharif Bakhtiar and Brian Cousins, for all their support and the technical discussions that played a significant role in my research. Neda Nouri helped me a lot during my chip fabrication, and Arash Zargaran-Yazd kindly proofread this thesis and provided very useful feedback. I am really thankful to them. Also, I would like to acknowledge Mr. Roozbeh Mehrabadi for his help with the CAD tools and Dr. Roberto Rosales for his technical assistance and willingness to help. Roberto also proofread part of this thesis and provided valuable feedback. I am thankful to him for that.

I would also like to extend my gratitude to my friends here in the SoC lab and at UBC for their great friendship, all the memories I share with them, and all the wonderful discussions during coffee breaks and weekends.
They made this whole experience much more pleasant for me, and I am thankful to them for this. I am especially thankful to Anil Kumar, Hooman Rashtian, Jack Shiah and Nima Sadeghi, whose friendship is highly appreciated.

Last but most certainly not least, I would like to extend my deepest appreciation to my wonderful parents and my lovely brother and sister for inspiring me with their enthusiasm, for believing in me and encouraging me, for their endless love and support, and for being around whenever I needed them the most. They have always been the greatest source of inspiration for me, and I dedicate this thesis to them.

This research is supported in part by the Intel Corporation University Research Program and the Natural Sciences and Engineering Research Council of Canada (NSERC). CAD tool support is provided by Canadian Microelectronics Corporation (CMC Microsystems).

Chapter 1
Introduction

1.1 Motivation

A phase-locked loop (PLL) is a system whose output frequency and phase track those of its input. This feature proves useful in a variety of applications. PLLs are used for clock generation in microprocessors [1], as they can generate an accurate high-frequency signal from a low-frequency reference input. They can also be used in FM demodulators [2], as they can detect any change in frequency. In some applications, PLLs are used to reduce the input jitter or to deskew the clock [3], as they filter any high-frequency noise (or signal) on the input. They are also the main component of frequency synthesizer circuits [4, 5].

PLLs are also widely used in serial (data) communication applications. Serial communication is the process of sending data sequentially from one device to another. This is in contrast to parallel (data) communication, where multiple bits of information are sent in parallel.
Previously, serial communication was only used in applications where the distance between the transmitter and the receiver was relatively long, as the wire cost would make parallel communication impractical. However, due to the recent increase in the speed of serial links [6], they are replacing their parallel counterparts in some short-distance applications [7]. One main reason for this migration, other than the cost of wires, is the cost of the chip, as chips with more input/output (IO) pins are more expensive. Also, serial links can be clocked considerably faster than their parallel counterparts, as they do not suffer from clock skew between channels or from channel crosstalk.

Serial communication protocols can be divided into two main categories, namely synchronous and asynchronous [8, 9]. In a synchronous serial transmission, a clock signal is used to synchronize the transmitter and the receiver. A serial peripheral interface (SPI) bus is a well-known example of a synchronous serial link. This master/slave-based, full-duplex, four-wire serial interface bus operates by synchronizing its transmission with the master clock [8]. Although this bus has a rather simple architecture, it is still relatively costly due to the number of wires and IO pins used. Asynchronous transmission, however, operates based on pre-defined agreements between the transmitter and the receiver on baud rate, number of bits per byte, number of start and stop bits, and flow control. The universal asynchronous receiver transmitter (UART) is widely used in asynchronous links [8]. Here, the receiver polls the input line for the start bits. As soon as they are received, the receiver adjusts its internal circuitry for data reception. The operation ends when the stop bits arrive. Thus, in this standard, only two wires and consequently two pins are needed, one for reception and one for transmission [10].
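The asynchronous framing described above can be illustrated with a short sketch. It assumes the common convention of one low start bit, eight data bits sent least significant bit first, no parity, and one high stop bit; these settings and the function names `uart_frame` and `uart_decode` are illustrative assumptions, not taken from this thesis.

```python
def uart_frame(byte, data_bits=8, stop_bits=1):
    """Return the line levels for one frame: start bit, data bits, stop bit(s).

    Assumed convention (not from the thesis): start bit is 0, data is sent
    least significant bit first, stop bits are 1, and there is no parity bit.
    """
    if not 0 <= byte < (1 << data_bits):
        raise ValueError("value does not fit in data_bits")
    data = [(byte >> i) & 1 for i in range(data_bits)]  # LSB first
    return [0] + data + [1] * stop_bits


def uart_decode(levels, data_bits=8, stop_bits=1):
    """Recover the value from one frame, checking the start and stop bits."""
    if levels[0] != 0:
        raise ValueError("missing start bit")
    stop = levels[1 + data_bits:1 + data_bits + stop_bits]
    if stop != [1] * stop_bits:
        raise ValueError("missing stop bit (framing error)")
    data = levels[1:1 + data_bits]
    return sum(bit << i for i, bit in enumerate(data))


frame = uart_frame(0x41)
print(frame)                    # line levels, start bit first
print(hex(uart_decode(frame)))  # value recovered by the receiver
```

Because both ends agree on the frame layout and baud rate in advance, the receiver needs no separate clock wire; it only has to detect the start bit and then sample at the agreed rate.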
In serial links, to minimize the cost and number of wires, typically only data bits are sent across the communication channel and the timing (clock) information is extracted at the receiver using a CDR block. In this thesis, we focus on a specific structure for such CDR circuits.

Due to the ever-increasing need for faster communication links, the adverse effects of the channel on the received data are becoming more pronounced. With state-of-the-art and emerging technologies, the internal circuitry can run at tens of Gb/s, far exceeding the bandwidth of typical channels [11]. Thus the channel can heavily distort the data. To solve this problem, some researchers suggest using more complicated channels with higher bandwidths [12, 13], which is a rather expensive solution, while others use circuit blocks that compensate for the effect of the channel, namely equalizers. Equalizers are blocks whose transfer function is ideally the inverse of that of the channel and thus cancel the effect of the channel. Some designers put the equalizer in the transmitter [14], some design their system such that the equalizer is in the receiver [15], and some split the task of equalization between the transmitter and the receiver [16].

After removing the effects of the channel in the receiver, the data has to be recovered. However, both the timing (clock) information and the data are contained in the same signal. To perform data recovery correctly, the timing information should first be extracted from the received signal. This is called clock recovery. In broadband receivers, equalizers are typically followed by clock and data recovery (CDR) circuits [17].

Clock and data recovery circuits are usually implemented using either oversampling or phase tracking. Oversampling-based CDR circuits [18] operate by taking multiple samples of the received signal. At least three samples per bit are needed.
Each sample is then compared with its neighboring samples to determine the position of the data edge. The correct data is finally extracted using a majority voting block [18]. Phase tracking CDR circuits operate by aligning the rising (or falling) edge of a locally generated reference clock to the rising edge of the data. The falling (rising) edge of the clock then samples the data in the middle of its eye diagram [footnote 1]. Since PLLs can track phase, they are the main component of phase tracking CDR circuits [20–27]. Figure 1.1 shows these two schemes.

[Figure 1.1: (a) Oversampling CDR (b) Phase tracking CDR]

Footnote 1: Reference [19] combines the two methods to achieve superior performance in a “semi-blind oversampling CDR” circuit.

Figure 1.2 shows the general structure of a phase tracking CDR circuit, where the decision circuit is simply a flip-flop [26].

[Figure 1.2: Basic CDR structure [26]. The input data Din drives both a decision circuit (flip-flop, D to Q), which produces Dout, and a clock recovery block (PLL), which produces the clock Clk for the decision circuit.]

One important challenge in the design of the above PLL is that it should be insensitive to a lack of data transitions, meaning that its frequency (and phase) should not change if a continuous stream of “0”s or “1”s is received [26, 27]. We will revisit this point in the PFD design chapter.

Finally, note that in this thesis, we assume that the format of the incoming data is non-return to zero (NRZ). For a return to zero (RZ) data format, the clock information is directly contained in the received data and thus clock recovery is typically done differently [28].

1.2 Objectives

This research is aimed at designing a PLL suitable for high-speed CDR applications. The PLL is intended to be designed in CMOS to enable its integration with the other circuits on a single chip. The design is targeted at high-speed serial links (multi-Gb/s links) that require a wide locking range, a fast acquisition time, and low jitter.

The PLL is designed and simulated in a 0.13 µm CMOS technology. A higher operating frequency can be obtained in more advanced technologies due to the increase in the transition frequency fT of transistors.

1.3 Contributions

1.3.1 Low- and High-Frequency Hybrid PFD for CDR Applications

Depending on the application, many stringent requirements are imposed on the PLL design. Some PLLs must lock quickly [29], some must cover a wide input range [30], and some must have low jitter once locked [29]. It is known that a PLL with a binary PD has a fast lock and a wide tuning range, while a PLL with a linear PD has a low output jitter [31]. Thus one can combine a linear and a binary PD into one hybrid PD [31]. The PLL operation starts using the binary PD to achieve a fast lock and a wide tuning range, and then, upon lock, switches to the linear PD to achieve a low-jitter output [31]. This is the approach taken in this thesis. However, we have designed two alternative hybrid PLLs, one for low-frequency and the other for high-frequency applications. Both hybrid phase detectors have frequency detection capability incorporated in them, making them hybrid phase-frequency detectors (PFDs). Both of these hybrid PFDs are designed for CDR applications. To the best of our knowledge, hybrid PFDs suitable for CDR applications have not been reported before.

1.3.2 Fast Frequency Comparator for Switching the Operation Mode

To switch from binary to linear operation, we need a fast mechanism to detect whether the clock frequency is equal to the data frequency, and a fast frequency comparator is ideal for this purpose. However, many of the frequency comparators reported in the literature are based on counters [30, 32], which are inherently slow. This thesis introduces a novel frequency comparison technique based on charging two capacitors.
This circuit has two inputs, whose frequencies are to be compared. A capacitor is assigned to each signal. Each capacitor is charged while its corresponding signal is high, and the voltages on the two capacitors are then compared to one another. The signal with the lower frequency charges its corresponding capacitor more. The capacitors are then discharged, and the cycle is repeated.

1.3.3  High-Swing Differential Charge Pump

The design of the charge pump (CP), the main building block of many PLLs, is based on two current sources that charge or discharge the loop filter capacitance when their corresponding switches are on. These two current sources should ideally supply equal currents [23]. However, as the loop filter voltage decreases (or increases), the drain-source voltage of the current source that is pulling the voltage down (or up) decreases. This subsequently decreases the output current of that current source. To fix this problem, we have introduced a feedback-based technique to keep the output current of the current source relatively constant [33]. The system works by detecting the decrease in the drain-source voltage of the current source and increasing its gate-source voltage to compensate for the current loss.

1.4  Outline

This thesis is organized as follows. In the next chapter, the basic operation of a PLL is discussed. The chapter reviews the background material needed for the rest of this thesis. In the third chapter, we review the design of hybrid phase/frequency detectors. The fourth chapter is dedicated to the design of a fast frequency comparator, responsible for switching the operation from the binary to the linear PD. In the fifth chapter, we review the design of a wide-swing charge pump. This is followed by the design of the VCO and the loop filter in the sixth and the seventh chapters. In Chapter 8, we report our simulation results and compare our work to a few other PLLs.
Chapter 9 concludes the thesis.

Chapter 2  Phase-Locked Loops

2.1  Introduction

A phase-locked loop (PLL) is a closed-loop feedback system whose output (generated by a voltage-controlled oscillator (VCO) or a current-controlled oscillator (ICO)) “tracks” the input signal in frequency as well as in phase [21]. This property has made PLLs extremely popular in a variety of applications. In fact, the first PLL was introduced by H. de Bellescize in 1932 [22], and decades later people are still doing research on this circuit. Probably the main reason for this wide popularity is the versatility of PLLs [22]. PLL applications range from clock generation in microcontrollers, to frequency multiplication and synthesis in transceivers, to skew and jitter reduction in purely digital blocks [22, 23].

In this chapter, the operation of a basic phase-locked loop (PLL) is presented. We start from the simplest PLL structure and build upon it. The role of the different blocks is discussed, and a few very simple circuits realizing each block are introduced. The main goal of this chapter is to give an overview of the system-level operation of each block and to discuss the dynamics of a simple PLL. To do this, each block is linearized and modelled in the s-domain. These models are then used to characterize the whole system. The detailed circuit-level implementation of the blocks is left to the other chapters. This chapter also defines the technical terms that are used in the rest of this thesis.

2.2  Basic PLL

2.2.1  Introduction

Figure 2.1 shows the structure of a basic PLL with the three blocks essential for PLL operation, namely the phase detector (PD, also known as a phase comparator), the loop filter (a low pass filter in many designs), and the voltage-controlled oscillator (VCO) (footnote 2) [22].
Figure 2.1: Basic PLL structure [22]

To understand the operation of the above PLL, the operation of each block should be discussed first. The phase detector compares the phase of the PLL input to that of the PLL output and generates a signal whose average is proportional to the phase difference between the two signals [21]. Calling the phase difference (phase error) between the input and the output θ_e, the ideal output of a linear PD (after averaging) is [23]:

\[ u_d(t) = K_{PD}\,\theta_e \tag{2.1} \]

where K_PD is the gain of the PD. Note that the role of the low pass filter (LPF) is to calculate the average of the PD output. An XOR gate can be used to realize a simple PD [34]. Consider the two diagrams shown in Figure 2.2, where PLL input and PLL output are the two inputs of the PD (as shown in Figure 2.1).

Footnote 2: A current-controlled oscillator (ICO) can also be used here.

Figure 2.2: Output of the XOR PD for two different phase errors

In the above figure, when the phase difference between the two inputs is zero, the average of the PD output is zero. As this phase difference begins to increase, so does the average of the PD output. However, if the phase difference exceeds π, the average of the PD output begins to decrease again. This leads to the input-output characteristic of an XOR PD shown in Figure 2.3 [23].

Figure 2.3: Characteristic of an XOR PD (V_out versus θ_e, plotted from -2π to 2π)

Oscillators are circuit blocks used to generate a periodic output signal whose frequency is fixed. Note that oscillators have no input signal, as they basically amplify the inherent noise of the circuit at the desired frequency [23]. A VCO is an oscillator whose output frequency changes linearly with an input control voltage (named V_cont in Figure 2.1). Ideally,
the output of the VCO can be written as [23]:

\[ \omega_{out} = \omega_0 + K_{VCO} V_{cont} \tag{2.2} \]

where K_VCO is the gain (or sensitivity) of the VCO, V_cont is the control voltage of the VCO, ω_out is the output frequency of the VCO, and ω_0 is the output frequency of the VCO when its input is zero (also known as the free running frequency).

Now, assume that a periodic signal whose frequency is close to the free running frequency of the PLL VCO is applied to the phase detector. The phase detector detects the phase difference between the PLL input and the PLL output and generates a signal whose average value is proportional to this phase difference. This average value is calculated in the low pass filter, whose output is then fed into the control voltage of the VCO. Since the oscillation frequency of the VCO changes with V_cont, this also changes the phase of the VCO output. The VCO output is finally fed back to the phase detector, and the process is repeated until the phases of the output and of the input are “aligned”. This alignment is called “phase locking”, and the PLL is said to be “locked” in this condition.

2.2.2  Mathematical Modelling

In the system shown in Figure 2.1, the output of the VCO is fed back to the input of the PD using negative feedback. Despite all the benefits of systems with feedback, they have the potential of becoming unstable if not designed properly. To avoid this, it is critical to linearize and model each block in the s-domain, and to find the phase margin of the system in MATLAB. The result can then be optimized in Cadence. As shown in Figure 2.3, the input-output characteristic of a simple PD, in its “useful” range of operation, is linear. Let us call the gain of the PD (which is the slope of the characteristic line) K_PD. Considering that the averaging operation is done in the low pass
filter with a well-known transfer function of:

\[ TF_{LPF} = \frac{1}{1 + s/\omega_{LPF}} \tag{2.3} \]

where ω_LPF is the 3-dB frequency of the filter, the combined s-domain transfer function of the low-pass filter and the PD is [23]:

\[ \frac{V_{cont}}{\theta_e} = \frac{K_{PD}}{1 + s/\omega_{LPF}} \tag{2.4} \]

To characterize the VCO, note that, as shown in Figure 2.3, the PD inputs are expressed in terms of phase and not frequency. Thus Equation 2.2 should be modified to represent the phase of the VCO. This is possible because the phase is the integral of the frequency [22]. Integrating both sides of Equation 2.2 and taking the Laplace transform, the VCO can be characterized as [22]:

\[ \frac{\theta_{out}}{V_{cont}} = \frac{K_{VCO}}{s} \tag{2.5} \]

Now, the open loop transfer function of the PLL is [23]:

\[ H(s)\big|_{open} = \frac{\theta_{out}}{\theta_{in}}\bigg|_{open} = \frac{K_{PD} K_{VCO}}{\left(1 + s/\omega_{LPF}\right) s} \tag{2.6} \]

It can be seen that the system is a type I system and has two poles, which means it is always stable.

2.3  Charge-Pump PLLs

2.3.1  Introduction

Although an XOR gate is a very simple PD, it has limited applications. The PD works well when there is only a phase difference between the PLL input and the free running VCO output, but its usefulness is rather limited if its two inputs have a frequency difference. The reason is the limited “acquisition range” of a PLL that uses this PD [23]. “Lock acquisition” is defined as the PLL's ability to move from an unlocked state to a locked one, and the acquisition range (footnote 3) is defined as the maximum initial frequency difference between the PLL input and the PLL (VCO) output for which lock acquisition is possible. The time required for a PLL to acquire lock is called the locking time. The locking time increases as the initial frequency difference between the two PD inputs increases. If this frequency difference exceeds the acquisition range, the PLL will never lock. To solve the problem of a limited acquisition range, the structure of the PD should be changed.
Modern PDs mostly provide two outputs (instead of one) to indicate whether the VCO frequency should increase or decrease for lock acquisition to take place [21–23, 34, 35]. Since the VCO has only one input, a circuit block called the charge pump (CP) is responsible for converting the two outputs (footnote 4) of the PD into one output, which is then used by the VCO. The structure of PLLs using CPs is shown in Figure 2.4 [35]. The structure of a basic charge pump is shown in Figure 2.5 [23], where the Up and Dn signals come from the “improved” PD. When the Up or the Dn signal is “1”, its corresponding switch conducts, charging or discharging the output capacitor respectively. This in turn changes the VCO frequency until the PLL is locked. At this point, both signals remain low, thus keeping the VCO frequency constant. In many designs, it is desirable to make sure that I_up = I_dn.

Footnote 3: Sometimes also referred to as the capture range or the pull-in range.

Footnote 4: PDs with four outputs also exist. They are usually called phase/frequency detectors (PFD), and will be discussed in Chapter 3.

Figure 2.4: Structure of a CP PLL [35]

Figure 2.5: Structure of a simple CP [23]

There are many phase detectors whose operation relies on the existence of a CP in the PLL design. Figure 2.6 shows one of these phase detectors [23]. In this PD, In1 is connected to the PLL input, while In2 is connected to the VCO output. Figure 2.7 shows the operation of this PD, assuming that the frequency of the PLL input is less than that of the clock. The outputs in the figure show that the VCO frequency should go down for locking to occur, which is expected. To get the characteristic of the above PD, let us define the PD output as the difference between the signals Up and Dn.
Now, note that the average output changes linearly as the phase difference between the two inputs changes. However, when the phase difference exceeds 360 degrees or -360 degrees, the PD cannot detect it any more, i.e. the output of the PD is the same for a phase difference of 20 degrees or 380 degrees. The characteristic of this PD is shown in Figure 2.8 [21]. Now that the basic operation of a CP PLL is known, the mathematical model for such a system is derived next.

Figure 2.6: A common PD [23]

2.3.2  Mathematical Modeling

Note the part highlighted in Figure 2.7. Here, we have assumed that In2 leads In1. If In2 lags In1, the outputs of the PD will be different. This exceptional case is intentionally shown to emphasize that the locking process of a PLL is a very non-linear phenomenon, especially if the PD is exposed to two inputs with unequal frequencies [23, 36], which decreases the accuracy of the modelling even further. Despite this, the modelling in this section assumes the simplest case, namely the one where the two inputs only have a fixed phase difference. Figure 2.9 shows this case [23]. Note that the bottom graph belongs to the case where the CP of Figure 2.5 is connected to the PD of Figure 2.6. When the signal Up is high, the current source I_up charges the output capacitance C_1, thus increasing the voltage of node Out in Figure 2.5, and when both CP inputs are zero, this voltage remains constant (footnote 5).

Figure 2.7: Output of the PD of Figure 2.6

Figure 2.8: Characteristic of the PD of Figure 2.6 (V_out versus θ_e, plotted from -4π to 4π)

We can now linearize the system, assuming that in the charge pump I_up = I_dn = I_P. Assuming that the two inputs have a phase difference of θ_e, the slope of the dotted line in Figure 2.9 is I_P θ_e / (2π C_1) [23], where C_1 is the capacitor shown in Figure 2.5.
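The slope quoted above can be checked with a short averaging argument. The following is only a sketch; the input period T_in and the unit step u(t) are symbols introduced here for illustration and do not appear in the original text.

\[
\begin{aligned}
t_{Up} &= \frac{\theta_e}{2\pi}\,T_{in}
  && \text{(time per input period during which } Up \text{ is high)} \\
\Delta V_{out} &= \frac{I_P\, t_{Up}}{C_1} = \frac{I_P\,\theta_e\,T_{in}}{2\pi C_1}
  && \text{(voltage step per period on } C_1\text{)} \\
\frac{\Delta V_{out}}{T_{in}} &= \frac{I_P\,\theta_e}{2\pi C_1}
  && \text{(average slope of } V_{out}\text{)}
\end{aligned}
\]

Treating the staircase as a ramp, a phase step \(\theta_e u(t)\) at the input, whose Laplace transform is \(\theta_e / s\), produces \(V_{out}(t) \approx \frac{I_P \theta_e}{2\pi C_1}\, t\, u(t)\), whose Laplace transform is \(\frac{I_P \theta_e}{2\pi C_1 s^2}\). Dividing the output transform by the input transform leaves a single integrator with gain \(I_P / (2\pi C_1)\). This continuous-time approximation is reasonable only when the loop dynamics are much slower than the input period.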
Footnote 5: In practice, various non-idealities create a ripple on the VCO control voltage, which contributes to the jitter of a locked PLL, as shown in [37].

Figure 2.9: The two inputs used to enable linearization of a CP PLL [23]

The transfer function of the PD and the CP can be written as [23]:

\[ \frac{V_{out}}{\theta_e}(s) = \frac{I_P}{2\pi C_1}\,\frac{1}{s} \tag{2.7} \]

Combining the above transfer function with that of the VCO yields:

\[ H(s)\big|_{open} = \frac{\theta_{out}}{\theta_{in}}\bigg|_{open} = \frac{I_P K_{VCO}}{2\pi C_1 s^2} \tag{2.8} \]

It can be seen that the above transfer function has two poles at zero, and thus the PLL is called a type II PLL. It can also be seen that an uncompensated type II PLL is unstable, since the two poles at the origin leave no phase margin. To fix this problem, a zero should be introduced into the transfer function. This is done by connecting a resistor, R_1, in series with C_1. It can be shown that the new transfer function is [23]:

\[ H(s)\big|_{open} = \frac{\theta_{out}}{\theta_{in}}\bigg|_{open} = \frac{I_P}{2\pi}\left(R_1 + \frac{1}{C_1 s}\right)\frac{K_{VCO}}{s} \tag{2.9} \]

The modified system has two poles and one zero, and can thus be stable. It also has a low pass response, and thus the PLL bandwidth can be defined for such a system [23]. Finally, to reduce the high-frequency ripple on the VCO control voltage, a capacitor C_2 is usually connected in parallel with the series combination of R_1 and C_1. It can be shown that the PLL response does not change significantly if C_2 is from one-fifth to one-tenth of C_1 [23].

Finally, note that the CP PLLs used in CDR applications should not be sensitive to the lack of a data transition. Let us assume that the PLL is locked and then observe the behavior of the PD of Figure 2.6 when a continuous stream of zeros is received. This is shown in Figure 2.10.

Figure 2.10: Continuous data stream output of the PD of Figure 2.6

In the figure, In1 is the received data, which we have assumed to be “1000001010”.
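The effect of the missing transitions can be illustrated with a small behavioural sketch in Python. It is not the simulation used in this work. It models the PD of Figure 2.6 as the usual three-state machine (Up, idle, Dn) driven by rising edges only, and it assumes that the clock has one rising edge per bit, that the bit preceding the pattern was “0”, and that the clock lags the data by a hypothetical 5% of a bit period so that no two edges coincide. The function names are illustrative.

```python
def rising_edges(bits, period=1.0, prev=0):
    """Times of the 0->1 transitions of an NRZ bit sequence (prev = bit before it)."""
    edges = []
    for k, b in enumerate(bits):
        if b == 1 and prev == 0:
            edges.append(k * period)
        prev = b
    return edges


def pfd_on_times(in1_edges, in2_edges, t_end):
    """Total time the Up and Dn outputs are high for a three-state PFD model.

    state = +1 means Up is high, -1 means Dn is high, 0 means both are low.
    A rising edge on In1 moves the state up, one on In2 moves it down.
    """
    events = sorted([(t, +1) for t in in1_edges] + [(t, -1) for t in in2_edges])
    state, t_prev, up, dn = 0, 0.0, 0.0, 0.0
    for t, step in events:
        if state == 1:
            up += t - t_prev
        elif state == -1:
            dn += t - t_prev
        state = max(-1, min(1, state + step))  # saturate at the two end states
        t_prev = t
    # account for the interval between the last edge and the end of the window
    if state == 1:
        up += t_end - t_prev
    elif state == -1:
        dn += t_end - t_prev
    return up, dn


data = [int(c) for c in "1000001010"]           # pattern assumed in the text
in1 = rising_edges(data)                         # data edges at t = 0, 6, 8
in2 = [k + 0.05 for k in range(len(data))]       # clock: one rising edge per bit (assumed lag)
up, dn = pfd_on_times(in1, in2, t_end=float(len(data)))
print(f"Up high for {up:.2f} bit periods, Dn high for {dn:.2f} bit periods")
```

Under these assumptions Dn is high for most of the ten-bit window while Up is high only briefly, so the charge pump would keep discharging the loop filter even though the clock and the data have the same rate.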
Clearly, the lack of a data transition has caused the VCO frequency to decrease, and as a result the PLL will lose lock. This limits the use of the PD of Figure 2.6 in CDR applications.

2.4  PLL Noise

In most applications, the PLL output is a buffered version of the VCO output. Neglecting the added noise of the buffer, the PLL output noise is equal to the VCO noise. The output of the VCO can have two types of noise, namely noise in amplitude and noise in phase. Noise in amplitude can be ignored, as the output amplitude can simply be changed or sliced if needed. However, noise in phase, whose effect usually appears as PLL jitter, should be taken into account during the design phase (footnote 6). Due to the nonlinear nature of PLLs, a detailed noise analysis of PLLs is rather cumbersome [21]. Here, however, we briefly review some of the basic concepts.

There are many noise sources in PLLs. VCO (phase) noise, the thermal noise of the loop filter resistance, transistor noise and transistor leakage all create noise in the PLL output and thus contribute to the PLL jitter. Also, the jitter of the PLL input can affect the noise of the PLL output. To illustrate this, let us discuss the effect of transistor leakage on the PLL jitter in more detail. Assume that ideally the control voltage of the VCO should be equal to V_DC, which means that the CP charges the loop filter until its voltage reaches V_DC. However, due to transistor leakage, this voltage drops, subsequently decreasing the VCO output frequency. This is then detected by the PD, and the loop filter is charged again to V_DC. This creates a ripple on the VCO control voltage. Let us now approximately model the control voltage of the VCO as:

\[ V_{cont} = V_{DC} + A_{ripple}\sin(\omega_{ripple} t) \tag{2.10} \]

where A_ripple and ω_ripple are the amplitude and the frequency of the ripple on the VCO control voltage, respectively.
Using Equation 2.2, the output frequency of the VCO can now be written as:

\[ \omega_{out} = \omega_0 + K_{VCO} V_{DC} + K_{VCO} A_{ripple}\sin(\omega_{ripple} t) \tag{2.11} \]

Footnote 6: Later, we will introduce the phase noise of a VCO. The PLL jitter is a time-domain representation of the phase noise of the PLL VCO.

It can now be seen that this noise modulates the VCO output frequency, which subsequently changes the output phase. This change is then translated into PLL jitter. Since the PLL has a low pass response, the high-frequency jitter on its input is filtered out [23]. Also, it has been shown that the lower the PLL bandwidth, the lower the PLL jitter [21], as a larger portion of the noise is filtered out. Measurement results in [38] and our own simulation results also confirm this. However, a low bandwidth increases the lock time.

2.5  Conclusion

Now that the basic operation of the PLL has been discussed, and all of the building blocks have been introduced, linearized and mathematically modelled, it is time to see how the blocks are actually designed. This is done in the next few chapters, starting with the PFD design in Chapter 3. Also, as mentioned in this chapter, many PDs use digital gates in their structure (footnote 7), and the PD designed in this work is no exception. Since digital gates are used frequently in the rest of this thesis (in both the PFD design (Chapter 3) and the frequency comparator design (Chapter 4)), their design is discussed in Appendix A.

Footnote 7: Mixers are probably the most famous fully analog phase detectors with no digital gates [39]. They basically multiply their two input signals by each other. Note that an XOR gate can also be viewed as a digital multiplier.

Chapter 3  Phase/Frequency Detector

3.1  Introduction

As we have already seen in the motivation section, PLLs are widely used in a variety of communication applications including clock and data recovery (CDR) circuits.
Many such systems impose stringent specifications on the PLL, which make its design a challenging task. For example, in some designs it is desired to have a PLL with a wide pull-in range (the frequency range within which the PLL will always become locked) and low output jitter. A PLL with a wide pull-in range is more immune to process and temperature variations, as locking can be achieved even when the difference between the voltage-controlled oscillator (VCO) frequency and the input frequency is large. When the PLL is locked, a low-jitter output is typically desired, so that the PLL meets the jitter restrictions of the application. However, there is a trade-off between a wide pull-in range and low output jitter [40]. The choice of the phase detector (PD) determines many of the PLL properties, and its design is discussed in this chapter.

Linear phase detectors have already been introduced in Chapter 2. The output of these phase detectors is linearly proportional to the phase difference between the two inputs. However, bang-bang (also known as binary or early-late) phase detectors are also often seen in the literature [41, 42]. The output of these phase detectors is either “0” or “1”, and it does not change linearly with the phase difference between the two inputs. Figure 3.1 shows the characteristics of a typical linear and a typical bang-bang phase detector [43], where θ_e is the phase difference between the two inputs.

Figure 3.1: Characteristic of a (a) Linear PD (b) Binary PD (V_out versus θ_e, plotted from -π to π)

It can be shown that using a linear phase detector in a PLL results in a low ripple on the VCO control line and thus a low jitter [129]. However, a PLL with a linear PD has a limited pull-in range and a long acquisition time. This is due to the fact that linear PDs do not respond well in the presence of a large frequency difference between their two inputs [43].
In comparison, a binary PD provides a wider pull-in range and a faster acquisition time, at the cost of a higher ripple on the VCO control line [43]. One way to solve the problem of a limited pull-in range in linear PLLs is to use a dual-loop PLL, where one loop locks the frequency and, once the frequency is within the lock range of the second loop, the other loop locks the phase. In these circuits, a lock detector (LD) circuit is usually responsible for switching the operation from one loop to the other. Note that both loops share the same VCO [44–46]. This is typically known as aided acquisition (footnote 8).

To increase the pull-in range, some designs use phase/frequency detectors (PFD). PFDs are circuits that combine frequency detection and phase detection capability in one circuit. Switching from frequency detection to phase detection is usually done automatically. The circuit introduced in Figure 2.6 is an example of a phase/frequency detector. Also, [48] and [49] introduce two other phase/frequency detector circuits.

Footnote 8: Aided acquisition using two loops and two VCOs has also been reported. [47] uses one loop for frequency acquisition. This loop also sets the voltage of the coarse tuning input of the VCO in the second loop. Phase locking is finally achieved in the second loop.

In clock and data recovery applications, phase detector circuits are also categorized in terms of the clock frequency with respect to the data frequency. Some designers propose structures in which the clock frequency is twice the data frequency (full rate) [50–53], while others propose structures in which the clock frequency is equal to the data frequency (half rate) [48, 54, 55, 129]. Recently, as the need for very high speed serial data communication has increased, designers have looked into using structures in which the clock frequency is half (quarter rate) [46, 130] or even a quarter [56] of the data frequency.
Note that in “partial-rate” CDR structures, multi-gigabit-per-second (Gbps) data can be recovered using a lower frequency VCO. This improves the noise (phase noise) of the VCO, which is directly proportional to the square of the VCO oscillation frequency [57]. Also, the operating frequency of the PD (PFD) decreases, which eases the design and, more importantly, reduces the overall power consumption.

In this chapter, several of the above-mentioned phase detectors suitable for CDR applications are first introduced. This is followed by a section pointing out possible improvements to the previously discussed PDs, which leads to the introduction of a hybrid phase/frequency detector circuit. Since this PFD is based on combining the Alexander and the Hogge PD, the design of these two circuits is discussed in more detail. The chapter is concluded by discussing the challenges in the design of a PLL that uses the new PFD.

3.2  Hogge Phase Detector

First introduced by Charles R. Hogge in 1985, this is one of the most famous linear phase detectors suitable for CDR applications [50]. The circuit is shown in Figure 3.2.

Figure 3.2: Hogge phase detector [50]

Note that in the above PD, the signal Data in is connected to the incoming data, while the signal Clock is connected to the VCO of the PLL (footnote 9). Also, like many other phase detectors, this one needs a charge pump to subtract the signal X from the signal Y. The loop filter then sees the output of this subtraction. This PD is simulated in MATLAB Simulink with ideal blocks. Figure 3.3 shows the simulation results.

Figure 3.3: Hogge PD output when clock is (a) Early (b) On time (c) Late

Footnote 9: We are assuming that the PLL is not fractional in this case and in the rest of the thesis. More information on fractional PLLs can be found in [58].

To explain the operation of this PD, we focus on Figure 3.3 (a). Without any loss of generality, we assume that the falling edge of the data arrives before the rising edge of the clock. The arrival of this edge sets the proportional output to a logical “1”. This output stays high until the rising edge of the clock arrives, at which time node B falls. This resets the proportional output, but also sets the reference output, as the voltage of node A is VDD from the previously received bit. Finally, node A also falls with the falling edge of the clock, resetting the reference output.

As can be seen in Figure 3.3, in case of a transition on the data, the reference output is always high from the rising edge to the falling edge of the clock, and thus its pulse width is fixed by the clock (half a clock period) and does not depend on the data timing. This is why it is called the “reference output”. However, the pulse width of the proportional output is linearly “proportional” to the arrival time of the falling (or rising) edge of the incoming data, making the pulse width of Y − X proportional to the phase difference between the clock and the data, thus giving this PD its linear characteristic. This is also why this output is called the “proportional output” of the PD. Finally, if there is no transition on the data, the voltages on the nodes A, B and Data in are all equal, thus making both the proportional and the reference outputs of the PD zero. This makes this PD insensitive to a lack of data transitions.

Now, assume that the clock is “early”, the case depicted in Figure 3.3 (a). It can be seen that the VCO frequency should decrease for locking to occur in the subsequent data cycles. This is better illustrated in Figure 3.4 (a), where the dotted line shows the desired change in the VCO output. Similarly, Figure 3.4 (b) shows that when the clock is “late”, the VCO frequency should increase for locking to occur in the subsequent data cycles.
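The behaviour described above can be reproduced with a small discrete-time sketch in Python. It is not the Simulink model used in this work. It assumes ideal flip flops with zero delay, a 50% duty-cycle clock, FF1 clocked on the rising edge and FF2 on the falling edge, and a time grid of n steps per bit. The function name and the parameter `offset` (the delay of the rising clock edge after a data transition, in steps) are hypothetical.

```python
def hogge_pulse_widths(bits, offset, n=100):
    """Return the total high time (in steps) of the proportional and reference outputs.

    bits   : NRZ data, one entry per bit, starting with 0 to avoid a start-up pulse
    offset : delay of the rising clock edge after a data transition, 0 < offset < n
    n      : simulation steps per bit; offset == n // 2 puts the edge mid-eye
    """
    b = a = 0                # outputs of FF1 (node B) and FF2 (node A)
    prop = ref = 0
    for t in range(len(bits) * n):
        d = bits[t // n]                        # current value of Data in
        if (t - offset) % n == 0:               # rising clock edge: FF1 samples the data
            b = d
        if (t - offset - n // 2) % n == 0:      # falling clock edge: FF2 samples node B
            a = b
        prop += d ^ b        # proportional output Y = Data in XOR B
        ref += b ^ a         # reference output    X = B XOR A
    return prop, ref


bits = [0, 1, 0, 0, 1, 1, 0, 1, 1, 1]           # five data transitions
for name, offset in (("early", 30), ("on time", 50), ("late", 70)):
    prop, ref = hogge_pulse_widths(bits, offset)
    print(f"clock {name:8s}: proportional={prop}, reference={ref}, net={prop - ref}")
```

With these assumptions the reference output contributes n/2 steps per data transition regardless of the clock phase, while the proportional output contributes `offset` steps, so the net value is negative for an early clock, zero for a centred clock and positive for a late clock. A run of identical bits contributes nothing to either output, which is the insensitivity to missing transitions noted above.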
Thus the reference output of the PD should be connected to the Dn input, and the proportional output of the PD should be connected to the Up input, of the CP of Figure 2.5.

Figure 3.4: Change of VCO frequency when clock is (a) Early (b) Late

3.3  Alexander Phase Detector

The Alexander phase detector [51] is one of the most well-known bang-bang PDs. Its structure is shown in Figure 3.5.

Figure 3.5: Alexander phase detector [51]

Binary PDs usually have two outputs, namely (clock) early and (clock) late. The value of these outputs depends only on the relative phases of the two PD inputs and is independent of the magnitude of the phase difference between them. Figure 3.6 shows the output of the PD. Note that in bang-bang phase detectors, one of the outputs is always high (even when the PLL is locked). Thus, unlike linear PLLs (PLLs that use linear phase detectors), in binary PLLs (PLLs that use binary phase detectors) the VCO frequency changes in each clock cycle, assuming that a transition on the data has occurred. This significantly deteriorates the jitter of the PLL. When the PLL is locked, the output of the PD alternates periodically between clock early and clock late (the period is closely related to the loop dynamics). Also, note that in a linear PLL both outputs become “1” at some time during the clock cycle, and the change in the VCO frequency occurs because of the difference in the pulse widths of the two outputs. In a bang-bang PLL, by contrast, one of the outputs stays at zero, so only one output drives the VCO for a full period of the clock. This significantly decreases the lock-in time of a binary PLL.
Figure 3.6: Alexander PD output when clock is (a) Early (b) Late

For an Alexander PD, it can be shown that in case of a continuous data stream both of the PD outputs remain low, making this PD suitable for CDR applications. As can be seen in Figure 3.6, the Alexander PD operates by comparing three samples, S1, S2 and S3. These three samples can have eight states, two of which are shown in the figure. Also, as can be seen in the figure, if a value of 011 for S1, S2 and S3 is associated with a “clock early” state, a value of 100 for S1, S2 and S3 represents the same state and is handled by the PD. Similarly, both values of 110 and 001 for S1, S2 and S3 represent a “clock late” state. This is shown in Table 3.1. If any of the other four cases occurs, both of the PD outputs remain low, which enables the PD to deal with a continuous data stream. However, note that a significant frequency difference between the clock and the incoming data can also result in the occurrence of these cases, and a conventional Alexander PD cannot deal with this.

S1  S2  S3  Clock is Early/Late  VCO frequency should
0   0   0   Not Applicable       Not Applicable
0   0   1   Late                 Increase
0   1   0   Not Applicable       Not Applicable
0   1   1   Early                Decrease
1   0   0   Early                Decrease
1   0   1   Not Applicable       Not Applicable
1   1   0   Late                 Increase
1   1   1   Not Applicable       Not Applicable

Table 3.1: Operation of an Alexander PD.

The Alexander PD is modified in [53] as shown in Figure 3.7. In this PD, the CP voltage changes only when the Reference output of the PD is high.

Figure 3.7: Modified Alexander PD [53]

To understand the advantages of the above PD, let us compare it to the PD of Figure 3.5.
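Returning briefly to Table 3.1, its decision logic can be written compactly with two XOR operations. The Python sketch below is inferred from the table only and is not a gate-level description of Figure 3.5. The function name is illustrative, and the mapping of each XOR to a physical output of the circuit is an assumption.

```python
def alexander_decision(s1, s2, s3):
    """Classify three consecutive samples according to Table 3.1.

    Returns "early", "late", or "none" (the "Not Applicable" rows).
    """
    early = s1 ^ s2          # S1 differs from S2
    late = s2 ^ s3           # S2 differs from S3
    if early and not late:
        return "early"       # rows 011 and 100: VCO frequency should decrease
    if late and not early:
        return "late"        # rows 001 and 110: VCO frequency should increase
    return "none"            # no transition (000, 111) or inconsistent samples (010, 101)


for s1 in (0, 1):
    for s2 in (0, 1):
        for s3 in (0, 1):
            print(s1, s2, s3, alexander_decision(s1, s2, s3))
```

In this formulation the 010 and 101 rows raise both XOR terms at once, so they are treated as carrying no usable phase information, which matches the “Not Applicable” entries of the table.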
In a conventional Alexander phase detector, FF3 is clocked with the falling edge of the Clock while FF4 is clocked with the rising edge of the Clock. This means that FF3 should have the data ready in half the period of the clock or, alternatively, that it should work at twice the clock frequency, which increases the power consumption (note that the same problem exists with the Hogge PD). The modified PD eliminates the need for a very high speed flip flop, thus saving power. Also, it uses three flip flops and only one XOR gate, which decreases the power consumption even more. It is also more robust, as the flip flops are not extensively loaded. However, only the falling edge of the data is used to sample the clock. Since this output is subsequently used to correct the VCO frequency, this frequency is not corrected regularly, which could increase the jitter.

3.4  Half-Rate Phase Detectors

As the need for multi-Gbps serial communication increases, designers have started using partial-rate phase detectors. Although more complicated, these structures have the benefit of allowing the VCO to work at a lower frequency. [129] introduces a half-rate linear phase detector, while [48] introduces a half-rate binary phase detector. Figure 3.8 shows a half-rate linear phase detector [129]. Note that this PD uses four latches and two XOR gates. Also, all the signals are fully differential. The operation of this phase detector is shown in Figure 3.9.

The operation of a half-rate binary phase detector is based on the use of double-edge-triggered flip flops (DETFFs), which are flip flops that sample the data on both edges of the clock [59]. The PD is shown in Figure 3.10 [48], where the subscripts Q and I for the clock refer to the quadrature and in-phase components of the clock [60]. The waveforms of Figure 3.11 illustrate the operation of this PD [48].
Note that in the figure, the S0 sample corresponds to a lack of a data transition (continuous data stream). This sample is discarded in the final stage, as the S− or S+ outputs of the bottom DETFF are sampled by DQ. The output of the PD is differential; it is positive if the clock is late and negative if it is early [48]. Although the S0 sample is discarded, the output of the PD can only be positive (increasing the VCO frequency) or negative (decreasing the VCO frequency). In other words, even without any data edges, the PD continues to change the VCO frequency. This is undesirable, as it increases the output jitter and, if not dealt with in the design stage, can make the PLL lose lock. To solve this problem, a small loop bandwidth is chosen [48], which increases the lock time of the PLL [21].

Figure 3.8: Half rate linear phase detector [129]

Figure 3.9: Half rate linear phase detector operation [129]

Figure 3.10: Half rate binary phase detector [48]

3.5  Hybrid Phase Detector

3.5.1  Introduction

As mentioned before, compared to a binary PD, a linear PD results in less CP activity, less ripple on the VCO control line and thus less jitter [129]. However, a linear PD suffers from a limited pull-in range and slow locking [40, 43]. A natural idea is therefore to have a PLL which starts its operation with a binary PD, thus taking advantage of both the increased pull-in range and the fast acquisition time. After the binary PD locks, the operation can be switched to a linear PD, which has a low output jitter. This is in fact done in [31] and, to a lesser extent, in [43].
In [43], a phase detector is introduced which is capable of changing its KPD, as shown in Figure 3.12 [43]. Note that the PD changes its characteristic from bang-bang to linear as Vg changes from 1.75 V to 1.65 V.

Figure 3.11: Half rate binary PD operation when clock is late [48]

The PD design is based on a ternary (three-state) D-flip flop [43, 61]. The PD is shown in Figure 3.13, where the signal Vg can be used to adjust the gain of the positive feedback in each latch. However, the PD is not immune to continuous data streams and cannot be used in CDR applications. For this reason, a more detailed discussion of it is out of the scope of this report. Interested readers are encouraged to consult [43, 61, 62] for further information on this phase detector.

In [31, 49] another hybrid phase-locked loop is introduced. In this PLL, the VCO has to provide four square pulses, each lagging the previous output by 90 degrees. Calling the VCO outputs φ1 to φ4, the PD is shown in Figure 3.14 [31], where Up1 and Dn1 are the outputs of the linear PD and Up2 and Dn2 are the outputs of the binary PD. The circuit is designed to ensure that only one phase detector is active at any time. An FD is added to the above PD to form a PFD. To explain the operation of the binary PD, note that it is only active if the phase difference between the clock and the input exceeds 45 degrees; otherwise the output of the two-XOR linear PD is selected [31]. The operation of this linear PD is similar to that of a single XOR PD, discussed in Chapter 2.

Figure 3.12: A semi-hybrid PD characteristic [43]

However, this PD is also sensitive to a continuous stream of zeros or ones, and cannot be used in CDR applications.
3.5.2  A Hybrid PFD for CDR Applications

We present a hybrid phase/frequency detector that is suitable for CDR applications, namely, one that is insensitive to continuous data streams. In this section, two different hybrid PFDs, for low-frequency and high-frequency applications, are introduced.

Low Frequency Hybrid PFD

The basic idea can be understood by looking at Figure 3.2 and Figure 3.5. By noting that the bottom branch of the Alexander PD is similar to the Hogge PD, and by adding only two more XOR gates to the well-known Alexander PD, a linear PD can also be obtained. This is the genesis of the proposed hybrid PD circuit shown in Figure 3.15, where all the signals are fully differential.

Figure 3.13: A semi-hybrid phase detector [43]

By comparing this PD to the Hogge PD, one can observe that instead of clocking FF1 and FF2 by the clock and the inverted clock, respectively, as is the case in a conventional Hogge PD, the proposed linear PD is clocked with the inverted clock and the clock. This clocking proves vital for correct operation: the conventional Alexander PD locks when the rising edge of the clock coincides with the rising edge of the data, while the conventional Hogge PD locks when the falling edge of the clock coincides with the rising edge of the data, so the two lock 180 degrees out of phase with respect to each other. The problem is resolved using the above-mentioned clock inversion.

As mentioned before, the PLL operation starts with the binary PD, and after the initial lock is achieved, the operation is switched to the linear PD. After the linear PLL locks, the system is ready to receive data. However, since no data can be interpreted before lock acquisition, we can make the assumption that before lock, continuous data streams cannot be present (many researchers, for example [63], also make this assumption). So now, although the linear PD should still be insensitive
to continuous data streams, this is no longer a requirement for the binary PD. With this assumption, let us revisit the "Not Applicable" rows of Table 3.1. Now only a significant frequency difference between the clock and the data can take the state of the PD into one of these rows, so handling them extends the phase detector into a phase/frequency detector. This is done in Table 3.2. Note that in the table, a "fast clock" is a clock whose frequency is significantly higher than that of the data, and a "slow clock" is a clock whose frequency is significantly lower than that of the data. Therefore, two more outputs can be added to the PD to take care of the fast or the slow clock. Since these two outputs enable PD operation under a significant frequency difference between the clock and the data, they are named FDdn (corresponding to a fast clock) and FDup (corresponding to a slow clock). It is these outputs that make the PD work as a PFD.

Figure 3.14: A hybrid phase detector [31]

Figure 3.15: A hybrid PD for CDR applications

S1  S2  S3  Clock is  VCO frequency should
0   0   0   Fast      Decrease
0   0   1   Late      Increase
0   1   0   Slow      Increase
0   1   1   Early     Decrease
1   0   0   Early     Decrease
1   0   1   Slow      Increase
1   1   0   Late      Increase
1   1   1   Fast      Decrease

Table 3.2: Operation of a hybrid PFD.

The binary PD outputs of Table 3.2 can now be simplified as follows:

\[
PD_{up} = S_2 \oplus S_3, \qquad PD_{dn} = S_1 \oplus S_2 \tag{3.1}
\]

which are the same as those of the Alexander PD. The FD outputs can be written as:

\[
FD_{up} = \bar{S_1} S_2 \bar{S_3} + S_1 \bar{S_2} S_3, \qquad FD_{dn} = \bar{S_1} \bar{S_2} \bar{S_3} + S_1 S_2 S_3 \tag{3.2}
\]

Thus, by adding two more gates to what was first introduced in Figure 3.15, a PFD is obtained.
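As a quick sanity check, Equations (3.1) and (3.2) can be evaluated for all eight sample combinations and compared against the "Clock is" column of Table 3.2. The following is only a minimal sketch: the function names are hypothetical, and the priority given to the FD outputs inside classify() is an illustrative assumption, not something stated in the text.

```python
# Table 3.2, keyed by (S1, S2, S3): state of the clock relative to the data.
TABLE_3_2 = {
    (0, 0, 0): "Fast",  (0, 0, 1): "Late",
    (0, 1, 0): "Slow",  (0, 1, 1): "Early",
    (1, 0, 0): "Early", (1, 0, 1): "Slow",
    (1, 1, 0): "Late",  (1, 1, 1): "Fast",
}


def hybrid_pfd(s1, s2, s3):
    """Return (pd_up, pd_dn, fd_up, fd_dn) for one sample triple."""
    pd_up = s2 ^ s3                           # Equation (3.1)
    pd_dn = s1 ^ s2                           # Equation (3.1)
    n1, n2, n3 = 1 - s1, 1 - s2, 1 - s3       # complemented samples
    fd_up = (n1 & s2 & n3) | (s1 & n2 & s3)   # Equation (3.2): 010 or 101
    fd_dn = (n1 & n2 & n3) | (s1 & s2 & s3)   # Equation (3.2): 000 or 111
    return pd_up, pd_dn, fd_up, fd_dn


def classify(s1, s2, s3):
    """Map the four outputs back to a row label of Table 3.2.

    Assumption: the FD outputs are checked first, because in the
    slow-clock rows both XOR outputs of Equation (3.1) are 1 at once.
    """
    pd_up, pd_dn, fd_up, fd_dn = hybrid_pfd(s1, s2, s3)
    if fd_up:
        return "Slow"
    if fd_dn:
        return "Fast"
    if pd_up:
        return "Late"
    return "Early"


def matches_table():
    """True if every row of Table 3.2 is reproduced by the equations."""
    return all(classify(*key) == label for key, label in TABLE_3_2.items())
```

Calling matches_table() walks all eight rows. In the two fast-clock rows both PD outputs are 0, so only the FD outputs act there.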
One of the advantages of the above expressions is that, since the FD output is separate from the PD output, the CP can be designed such that the current pumped into the loop filter due to a signal from the FD is larger than the current pumped due to a signal from the PD. Thus a faster lock can be achieved. Also, note that the operation mode switches smoothly from the FD to the bang-bang PD, while switching from the binary to the linear PD will be discussed later in Chapter 4.

The above FD gates suffer from a drawback. To understand this, let us first illustrate the CML gates used to realize Equation (3.2). This is done in Figure 3.16.

Figure 3.16: (a) Down (b) Up CML gates of the FD

In this figure, both gates are three-stack CML gates, which are relatively slow. This can be understood through the application of the well-known Elmore delay formula [64]. Thus the above FD is not suitable for high-frequency applications. Also, as mentioned before, FF3 in Figure 3.15 should operate at twice the clock frequency. Assuming a clock frequency of 10 GHz, the operating frequency of the flip flop needs to be 20 GHz. This is because it is clocked with the falling edge of the clock signal and its output is sampled on the rising edge of the clock, giving it only 50 ps (half of the 100 ps clock period) to stabilize its output. In high-frequency applications, this inevitably results in a small swing for S0. Now, considering that there is no time to reconstruct the output of FF3, FF4 is driven by the already weak S0 output, which, unless extensive power is consumed in FF3, might not be enough to turn the transistors in FF4 completely off or on. This drastically decreases the swing of S2.
What is even worse is that according to Equation (3.1) (the binary PD), Equation (3.2) (the FD) and Figure 3.15 (the linear PD), FF4 has more capacitive loading than the other flops in the circuit, as S2 appears in all four equations and is also used to generate the reference output of the linear PD. All of this makes the design of FF3 and FF4 extremely challenging. To make this PFD work with a 10 GHz clock, all the common techniques for increasing the speed of the flops should be used, namely inductive peaking, a feed-forward path, increased power consumption of the gate and careful sizing, which are also reviewed in the Appendix. Thus these flip flops are power hungry and consume a large area of the chip.

High-Frequency Hybrid PFD

Let us now revisit Table 3.2 with a different approach. Instead of using the second-to-last column to form the PFD, this time let us try to simplify the PFD using the last column of the table. This way, we combine the functionality of both the FD and the binary PD into one block. Using Karnaugh maps [65] to simplify the above truth table, it can be shown that:

\[
PFD_{up} = \bar{S_2} S_3 + S_2 \bar{S_3}, \qquad PFD_{dn} = \bar{S_2} \bar{S_3} + S_2 S_3 \tag{3.3}
\]

where the PFDup and PFDdn signals force the VCO frequency to increase and decrease, respectively. Note that the S1 term no longer appears in the above two equations. This means that we can get rid of FF2 to save power and area. Let us now see whether Equation (3.3) can be simplified further:

\[
\begin{aligned}
PFD_{up} &= \bar{S_2} S_3 + S_2 \bar{S_3} = S_2 \oplus S_3 \\
PFD_{dn} &= \bar{S_2} \bar{S_3} + S_2 S_3 = \overline{(S_2 + S_3)(\bar{S_2} + \bar{S_3})} = \overline{0 + S_2 \bar{S_3} + \bar{S_2} S_3 + 0} \\
&= \overline{S_2 \bar{S_3} + \bar{S_2} S_3} = \overline{S_2 \oplus S_3} = \overline{PFD_{up}}
\end{aligned} \tag{3.4}
\]

The significance of this result can be understood if one considers that, when using fully differential signals, inversion can simply be achieved by swapping the outputs, and no gates are needed.
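The simplification in Equations (3.3) and (3.4) can be checked exhaustively against the last column of Table 3.2. The sketch below is illustrative only; the function names are hypothetical, and the dictionary simply transcribes the "VCO frequency should" column (True meaning "Increase").

```python
from itertools import product

# Last column of Table 3.2: True where the VCO frequency should increase.
SHOULD_INCREASE = {
    (0, 0, 0): False, (0, 0, 1): True, (0, 1, 0): True, (0, 1, 1): False,
    (1, 0, 0): False, (1, 0, 1): True, (1, 1, 0): True, (1, 1, 1): False,
}


def pfd_up(s2, s3):
    """Equation (3.4): PFD_up reduces to a single XOR of S2 and S3."""
    return bool(s2 ^ s3)


def pfd_dn(s2, s3):
    """Sum-of-products form of PFD_dn from Equation (3.3)."""
    return bool(((1 - s2) & (1 - s3)) | (s2 & s3))


def check_simplification():
    """True if S1 is irrelevant and PFD_dn is the complement of PFD_up."""
    for s1, s2, s3 in product((0, 1), repeat=3):
        up, dn = pfd_up(s2, s3), pfd_dn(s2, s3)
        if up != SHOULD_INCREASE[(s1, s2, s3)]:
            return False
        if dn != (not up):
            return False
    return True
```

Because neither function takes S1 as an argument, a successful check also confirms that the last column of the table does not depend on S1.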
Using this, we are left with only one XOR gate, saving even more power. Also, compared to the low-frequency PFD, where S2 is used in four gates to generate the outputs of the FD and the binary PD, in the above high-frequency PFD S2 only has to drive one gate, which significantly reduces the loading on FF4.

Now that we are done simplifying the binary PFD, let us revisit the linear PD one more time. Although the swing of S2 has significantly improved, it is still less than that of S0, because the "low-swing" S0 signal is driving FF4. Furthermore, this swing difference is frequency dependent and gets worse at higher frequencies. This results in the swing of the reference output being smaller than that of the proportional output. Considering that these signals drive the switches corresponding to the bottom and the top current sources of the CP, it is important that these switches have equal resistances, as ideally we require equal currents to charge and discharge the loop filter capacitors. To solve this problem, a multiplexer is used whose select input is connected to the reference output of the PD and whose inputs are connected to ideal voltage sources. The MUX can be designed such that it switches its output with a minimum of about 400 mV peak-to-peak swing on the select input. Thus, although the voltage swing on the reference output is only about 500 mV (if the clock frequency is 10 GHz), the voltage sources can be designed such that the peak-to-peak swing on the Dn input of the CP matches that of the Up input.

The addition of the MUX can now be used to improve the operation of the linear PD. This is explained using Figure 3.17, which shows the outputs of a normal Hogge PD when the VCO output (labelled Clock in the figure) lags the incoming data.
Figure 3.17: Output of Hogge PD when clock lags data

It has already been shown that in a normal Hogge PD, the CP charges the loop filter during t1 and discharges it during t2. Thus when t1 is greater than t2, which is the case shown in Figure 3.17, the VCO frequency increases until t1 = t2, which denotes a locked state. (Note that upon lock, the clock period is ideally equal to the data period, which is the same as in the Hogge PD. Loosely speaking, the clock frequency is twice the data rate.) However, a faster lock can be achieved if the CP discharges the loop filter during λ, as λ < t2. This can be achieved by sending Prop to the output of the MUX when Ref is high. Note that upon lock, λ = t2, and thus the modified Hogge PD acts like a perfect linear phase detector. In the case of a continuous data stream, the reference output is low and the MUX selects "0" as its output, so the modified Hogge PD remains insensitive to a continuous data stream.

Figure 3.18 finally summarizes the design of the high-frequency PFD. Note that all signals are fully differential. Once again, it should be mentioned that no gate was used to invert the signals. In the figure, PDdn and PDup are the outputs of the linear PD, while PFD up/dn are the outputs of the binary PFD.

Figure 3.18: High-frequency hybrid phase/frequency detector

Finally, note that in the low-frequency version of this PFD, more current was pumped into the loop filter due to a signal from the FD, which could lead to a decrease in the lock time. Now, however, equal CP current is pumped into the loop filter due to a signal from the FD or the binary PD, as we have no means of ascertaining whether the PFD output is due to the activation of the FD or of the binary PD.
3.6  Conclusion

In this chapter, phase/frequency detectors are introduced. After reviewing the design of a few well-known PFDs, two hybrid PFDs suitable for CDR applications are introduced. A PFD suitable for low-frequency applications is designed first. This design is then extended into a PFD which can be used at higher frequencies. However, one point is still missing. Although the use of hybrid PDs can provide several benefits, the operation must somehow be switched from the binary PD to the linear PD. This problem was not addressed in this chapter. Chapter 4 introduces two circuits which can potentially be used to handle the automatic switching of the PLL operation.

Chapter 4
High-Speed Frequency Comparators

4.1  Introduction

After the introduction of a novel hybrid phase/frequency detector in Chapter 3, a method should be developed to switch the operation from the binary phase-locked loop to the linear phase-locked loop. Probably the first and the most inefficient method that comes to mind is to do this manually. This method is reliable, but too slow for many applications.

The second solution is to use a timer. The worst-case locking time of the bang-bang PD (corresponding to the largest frequency difference between the two inputs) should first be measured. Let us call this TBinaryLock. Then a timer should be designed to switch the operation when t − tStart = TBinaryLock, where tStart is the time at which the PLL operation starts. However, this method is clearly not optimal, as the switching time is set for the case where the frequency difference between the two inputs is maximum.

The third method is to differentiate the control voltage of the VCO and compare the result to zero. Switching is done when this derivative is almost zero. However, this method can also be problematic, as it loads the loop filter, which changes its characteristics.
This could necessitate a redesign of some of the blocks. More importantly, the reliability of this method is questionable, as it relies on predicting the very non-linear behavior of the PLL during lock. One of these non-linearities, which compromises the effectiveness of this method, is called cycle slip. This well-known effect takes place in many PLL designs [21, 66, 67]. To explain it, Figure 4.1 shows the pull-in process of a general PLL [21]. In this figure, ωVCO is the frequency of the PLL VCO (which corresponds to the VCO control voltage), TP is called the pull-in time, and ∆ω0 is the initial frequency difference between the clock and the data.

Figure 4.1: Pull-in process of a PLL [21]

In this figure, it can be seen that the increase of the VCO control voltage is not monotonic. The temporary decrease in the VCO voltage is a result of the cycle slip phenomenon. A complete explanation of this effect is rather long and is not needed here. To briefly explain the effect, let us assume that the frequency of the clock is smaller than that of the data, as is the case in Figure 4.1. Also, let us assume that the response of the PLL is slow (which corresponds to a small bandwidth for the PLL). Now, let us also assume that for the phase difference between the clock and the data to decrease, the PD should increase the VCO frequency. However, since the response of the PLL is slow, in the next clock cycle the frequency difference causes the phase difference to increase even more. This continues until the accumulated phase difference goes through a complete cycle. At this point, the phase difference seen by the PD suddenly decreases, causing the PD to think that the real phase difference is less than a clock cycle, while it is actually a bit greater than one clock cycle. Thus the wrong command is sent to the VCO.
Cycle slipping directly depends on the input-output characteristics of the PD, and occurs when the phase difference between the two PD inputs exceeds the maximum allowable range for that particular PD (for example 2π in Figure 2.8) [67]. Assuming that the slip causes the VCO control voltage to change direction, its slope changes from positive to negative, thus crossing zero. This could make the above method unreliable.

The method used in our design to switch from the binary to the linear PD takes advantage of one of the inherent properties of the PLL. As mentioned in [38, 68], the PLL operation starts with the frequency acquisition phase, during which the clock frequency approaches the data frequency, rapidly accumulating phase in the process. After the two frequencies are equal, the phase is corrected, and only then is complete locking achieved. Figure 4.2 [38] shows this behavior in more detail. This property of the PLL can be used to switch the operation from the bang-bang PD to the linear PD. In Figure 4.2, t1 appears to be an ideal point for the switching to occur. Thus a frequency comparator should be able to take care of the switching. However, a fast-locking PLL would require a very fast frequency comparator.

Many circuits that perform frequency comparison have been suggested in the literature, some of which work at lower frequencies [38]. The high-frequency comparators are mostly based on counters [30, 32], which have a slow response time. In this chapter, we introduce two different ways of designing a fast frequency comparator, one of which will later be used for switching the operation. Note that the power consumption of the frequency comparator is not a concern, as it can be switched off after it switches the operation, as will be explained later.

Figure 4.2: PLL locking process [38]
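The cycle-slip effect described earlier in this section can be illustrated numerically. The sketch below assumes an idealized PD that only observes the phase error modulo its input range, taken here as 2π following the example quoted from Figure 2.8. The function names and this simple wrapping model are assumptions for illustration, not the behavior of any specific PD in this thesis.

```python
import math


def observed_phase(true_phase_rad, pd_range_rad=2 * math.pi):
    """Phase error seen by an idealized PD with a periodic characteristic.

    Assumption: the PD cannot tell a phase error apart from the same
    error plus a whole number of cycles, so the true (non-negative)
    error is wrapped into [0, pd_range_rad).
    """
    return true_phase_rad % pd_range_rad


def slipped_cycles(true_phase_rad, pd_range_rad=2 * math.pi):
    """Number of whole cycles hidden from the PD for a given true error."""
    return int(true_phase_rad // pd_range_rad)
```

For a true error slightly larger than one cycle, for example 2π + 0.1 rad, observed_phase returns a value close to 0.1 rad while slipped_cycles returns 1, which corresponds to the situation where the PD believes the error is much smaller than it really is.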
4.2  Frequency Comparator Design Overview

Like most frequency comparators, this circuit has two inputs whose frequencies are to be compared with one another. It assigns two equal capacitors to the two signals. Each capacitor is charged only when its corresponding signal is "1". When both capacitors are completely charged, a signal declares it, at which moment the charging process is stopped, the voltages on the capacitors are compared to each other and the system is reset for another run. Since the signal with the higher frequency has a shorter period, its corresponding capacitor is charged for a shorter time and thus has less charge on it upon comparison. This fact is used to compare the frequencies of the two signals, and it allows frequency comparison to be done in a very short time.

Probably the greatest challenge faced in the design of a circuit implementing the above scenario is that the rising edges of the two signals can be completely random with respect to one another. This can create numerous errors if not dealt with in advance. All control signals should be carefully designed to ensure that, despite the inherent randomness of the inputs, charging and discharging of the capacitors and the comparison of their voltages take place at exactly the right time to avoid a false output.

Based on the above, the proposed comparator has four main blocks: a control block, whose function was already described; a charge pump (CP), which charges the two equal capacitors with two equal current sources and discharges them after the comparison; a fast comparator, which compares the voltages on the two CP capacitors; and an output stage, which buffers, amplifies and level shifts the output of the comparator. This is shown in Figure 4.3.
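The charging principle described above can be made concrete with a small worked example based on V = I·t/C. The sketch below is only an illustration: the current (100 µA), the capacitance (100 fF), the 50 % duty cycle and the function names are all assumed values and names, not figures taken from the design.

```python
def capacitor_voltage(freq_hz, current_a=100e-6, cap_f=100e-15, duty=0.5):
    """Voltage reached by a capacitor charged during one high phase.

    Assumptions: an ideal constant current source, an ideal capacitor,
    and a signal that is high for duty / freq_hz seconds per period.
    """
    t_high = duty / freq_hz            # time the signal spends at "1"
    return current_a * t_high / cap_f  # V = I * t / C


def faster_signal(f_data_hz, f_clk_hz):
    """Return which input has the higher frequency: less voltage wins."""
    v_data = capacitor_voltage(f_data_hz)
    v_clk = capacitor_voltage(f_clk_hz)
    if v_clk < v_data:
        return "clock"
    if v_data < v_clk:
        return "data"
    return "equal"
```

With these assumed values, a 5 GHz signal is high for 100 ps per period and charges its capacitor to about 0.1 V, while a 4 GHz signal is high for 125 ps and reaches about 0.125 V, so the lower voltage identifies the faster signal.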
Figure 4.3: System overview of the frequency comparator

In this figure, Din (data in) and Clk (clock) are the two signals whose frequencies are to be compared. They are converted by the control block to CPd and CPc, respectively, which charge the two capacitors of the charge pump, Cd (corresponding to the data) and Cc (corresponding to the clock). The outputs of the CP, namely Outdata (corresponding to the input data) and OutClk (corresponding to the clock), are then compared to each other. The output of the comparator, decision, is either 1 or -1, depending on which of the signals has the higher frequency. The CP and the comparator are then reset by the control block after the comparison. Note that all the signals are fully differential.

In our specific application, since the frequency of the data stays constant before the acquisition of lock, there is no need to constantly charge and discharge its corresponding capacitor, Cd, which simplifies the design of the control block and also speeds up the process of frequency comparison. We can charge Cd only once at the beginning of the operation, and then only charge and discharge Cc for each comparison. However, the leakage on Cd can cause inaccuracy, especially as the lock time of the PLL increases. The two frequency comparators mainly differ in the design of their control blocks. In this chapter, the design of both of these circuits is reviewed, but for the sake of generality, we use the version which discharges both Cd and Cc after each comparison in our final design.

Also note that, as shown in Chapter 3, after lock acquisition the frequency of the clock is twice that of the data. Thus a frequency divider (divide by 2) should be used on the Clk input before doing any frequency comparison. The output of this divider is named hClk.
4.3  Design of the Control Block

4.3.1  Leakage-Tolerant Frequency Comparator Design

Figure 4.4 shows the design of the control block of the frequency comparator in detail. All the other blocks are shown as boxes, and their design will be explained later in this chapter. This is our first design, where both Cc and Cd are discharged after each comparison. This circuit produces the outputs given in Figure 4.5. Note that we are assuming that the two inputs have a duty cycle of 50 %. The divide-by-2 circuit guarantees this for the clock input, but for the data input it must be ensured by the designer.

Figure 4.4: Leakage-tolerant frequency comparator design

The two capacitors in the CP are charged only when CPc or CPd is high. After both capacitors are fully charged by CPc and CPd, a signal denotes this, at which point the comparator compares the charges on the two capacitors. In Figure 4.5, note that the comparison is done when CompReset (CmpRs in the figure) is low. After the comparison is finished, a signal called "Done" is set, which discharges the capacitors. The process is then repeated. However, to avoid the capacitors being charged by half of a cycle of hClk or Din, qClk and qData should be reset at the right time, namely when hClk and Din, respectively, are low. The circuit implementing this is shown in Figure 4.4, where the signals ReadyClk and ReadyData (RdyC and RdyD in the figure) take care of this. Although the minimum frequency of operation of the proposed circuit depends on the rising edges of the two signals with respect to one another, for a 5 GHz data signal and a 10 GHz clock, on average one comparison is made every 600 ps.
The simulation results for this system will be presented at the end of this chapter.

Figure 4.5: Leakage-tolerant frequency comparator operation

4.3.2  High-Speed Frequency Comparator Design

A small modification to the circuit shown in Figure 4.4 provides us with a high-speed frequency comparator, which also has a simpler design and can be designed to burn less power (as it has fewer blocks). However, as mentioned before, its output is sensitive to capacitor leakage. Figure 4.6 shows this modified frequency comparator. Once again, all the signals are fully differential and all the gates are CML. This circuit has three inputs, namely Din, Clk and Reset, where the Reset input resets the system. This circuit produces only one pulse on CPd, which charges its corresponding capacitor. This capacitor then holds its value until the Reset signal discharges it. Note that this is regarded as a system reset. The control block here makes sure that the capacitor corresponding to CPd is initially charged with one full period of Din. This is important because this capacitor is not reset again during the operation, and thus it must be properly charged. The above circuit produces the outputs shown in Figure 4.7.

Figure 4.6: High-speed frequency comparator design

Figure 4.7: High-speed frequency comparator operation

Here, with a 5 GHz input and a 10 GHz clock, one comparison is made approximately every 300 ps. Both circuits have many gates and, since they have to operate fast, they consume significant power.
However, considering that they are turned off as soon as they switch the operation, this temporary power consumption can be tolerated. To turn the frequency comparator off, several solutions come to mind. One is to turn off the supply rails. However, this is not efficient, as the supply capacitance is considerable, which makes turning the circuit on again a time-consuming process. The second method, which is used in our design, can be understood by looking at the structure of the CML gates. The current source of the CML gates can easily be turned off by turning the reference current off, which subsequently turns the gate off. The details of the circuit doing this will be shown later in the thesis.

4.4  Charge Pump Design

After the generation of CPc and CPd by the first block, equal capacitors need to be charged with these two signals in a charge pump (CP). (In the next chapter, we will discuss another charge pump used in the PLL loop. Although similar, these two charge pumps have some fundamental differences, are used for different purposes in our design and should not be confused with each other.) When CPc or CPd is "1", a switch conducts current to the corresponding capacitor. Figure 4.8 shows the circuit which charges Cc, the capacitor associated with CPc. Note that if Done is high, this capacitor is discharged.

Figure 4.8: Frequency comparator charge pump

This CP circuit is similar to that introduced in [70], which will be discussed in detail in the next chapter. As can be seen in Figure 4.8, the CP has two branches. The main branch is used to charge the output capacitor, while the second branch ensures that the current source I1 always conducts and thus the voltage of node A is never VDD.
The reason this is important is that in the absence of the second branch, when CPc is "0", the switch is off and the current I1 has to be zero. This is only possible if the voltage of node A is equal to VDD, as I1 is a non-ideal source. The arrival of CPc would then momentarily connect the output node to VDD, producing an unwanted jump in the voltage of this node. Comparing the modified CP to that introduced in [70] reveals that our CP is missing a buffer. This does not introduce any unwanted jump on the capacitor output, since when CPc becomes "1", the voltage of the output node always starts increasing from zero. So when the current of I1 switches from the second branch into the main branch, the voltage of node A does not change at all. Thus this branch switching is not sensed by the current source, and no ripple is created.

The exact same circuit is used for CPd in the leakage-tolerant design. For the high-speed design, since we are not discharging Cd for each comparison, transistors M5 and M6 are connected to the signals which reset the whole system, instead of Done and its complement.

4.5  Comparator Design

Fast comparators are typically used in flash analog-to-digital converters (ADCs) [71]. In [72] and [73], two comparators are introduced. However, the use of PMOS transistors in their design slows them down, as PMOS transistors are inherently slower than NMOS transistors. This is because the mobility of holes is smaller than that of electrons [74]. In [75], a comparator fully based on NMOS transistors (with PMOS loads) is introduced. A comparator very similar to this is used in our design. However, due to the stringent speed requirements of the comparator in the high-speed frequency comparator design, the PMOS load transistors were replaced by resistors, which introduce less capacitance into the circuit. The comparator used in the high-speed design is shown in Figure 4.9.
Figure 4.9: High speed comparator

Another version of this comparator can be used in cases where speed is not the design bottleneck. This version has a stronger reset, but is a bit slower. It is used in the leakage-tolerant version of the frequency comparator and is shown in Figure 4.10. Here, note that when Reset is “1”, the comparator makes both its outputs almost equal to VDD. In the meantime, the voltage difference between In1 and In2 sets a small current difference in the two branches, which is then amplified when Reset becomes “0” by the positive feedback in the right branch, making one output “1” and the other “0”. An accuracy of about 2 mV at 1.5 GHz was obtained from the comparator.

Figure 4.10: High speed comparator with a strong reset

4.6  Buffer Design

The final stage of the comparator is the analog buffer and level shifter. Also, simple digital logic is needed to indicate whether the frequencies of the two signals have passed each other. Since the comparators are designed such that their outputs approach VDD during reset, an analog buffer and level shifter with a negative gain should be used. This way, the comparator outputs, which are signals that decrease from VDD and are reset to VDD, are converted to signals which increase from zero. The output of the buffer can then be used to drive digital CMOS gates. One more problem with the comparator is that when Resetb becomes “1” in Figure 4.9 and Figure 4.10, one output stays at VDD while the other output moves towards Gnd. However, the high operation frequency resets the comparator before its output voltage reaches Gnd.
So the voltages of nodes Vout and Voutb should be amplified, level shifted to zero and also buffered. A common-source (CS) amplifier [23] satisfies all of these requirements, and is thus used in the design. To decrease the loading on the comparator, the amplifier should also be designed such that its input capacitance is small. This increases the maximum operating frequency of the comparator. Since the required gain of the amplifier is only about two, it is easy to optimize the amplifier design for bandwidth and low input capacitance. Up to this point, all of the design has been fully differential. From the output of the comparator to the output of the frequency comparator, single-ended CMOS circuits are used to save power, as the effect of the circuit noise is negligible after the comparator output. The circuit used consists of two single-ended common-source amplifiers with current-source loads, one for each comparator output. Finally, note that in this design, we are not looking to determine which signal has the higher frequency; the only important point is to be able to determine the point where the frequency of the clock passes that of the data. This is done by using the output of the comparator. Say that if Vout − Voutb is high, then the frequency of the clock is higher than that of the data, and conversely, if Vout − Voutb is low, then the frequency of the data is higher. We thus have to determine the point at which Vout − Voutb changes from high to low, or from low to high. The single-ended logic implementing this circuit is shown in Figure 4.11. The gates used here are basic CMOS gates, introduced in [76].

Figure 4.11: Converting comparator output to frequency comparator output
4.7  Simulation Result

To show the operation of the two frequency comparators designed, they are simulated with the same inputs that they should work with in the final PLL design. The circuit used to test these two frequency comparators is shown in Figure 4.12.

Figure 4.12: Circuit used to test the frequency comparators

The VCO is also loaded with dummy loads to mimic the load on the VCO in the PLL. One input is connected to an ideal pulse source, to mimic the data input, and the other input is connected to the VCO to mimic the clock input. The tuning voltage of the VCO is then swept from zero to one volt. As will be seen in Chapter 6, this increases the VCO frequency from about 8 GHz to about 10 GHz. The frequency of the input pulse is about 4.5 GHz. Figure 4.13 shows the simulation results for the high-speed comparator. The simulation is done in IBM 0.13 µm CMOS technology with a VDD of 1.2 V. In Figure 4.13, starting from the top, the first plot is OutClk (red dotted line) and Outdata (blue solid line), the CP outputs; the second plot shows decision (red solid line) and decisionb (blue dotted line), the comparator outputs; the third plot is the differential output of the comparator before amplification and level shifting (red dotted line) and after that (blue solid line); and finally the fourth plot is Compout (red solid line) and Compoutb (blue dotted line), the outputs of the analog amplifier and level shifter. All these signals are depicted in Figure 4.11.

Figure 4.13: Simulation results for the high-speed comparator

The power consumption of this comparator is 46.6 mW, which is tolerable for our application, for the reasons already discussed above. One point about Figure 4.13 is that the outputs are totally incorrect until about 1 ns after the start of the simulation.
This has to do with the way the VCO is simulated. As will be explained in Chapter 6, a VCO is basically a positive feedback circuit, amplifying its own noise at a specific frequency. However, including circuit noise would make the transient simulation extremely time consuming. So, the VCO is usually simulated with an initial condition on one of its nodes, which gives the VCO the initial energy that it needs to start oscillating. After the start of the simulation, the amplitude of the VCO output begins to increase until it reaches its maximum [23], and then the VCO oscillates with a constant amplitude. The reason for the incorrect outputs is that at first, the amplitude of the VCO output is too small to be sensed by the CML gates. Figure 4.14 shows the simulation results for the leakage-tolerant comparator.

Figure 4.14: Simulation results for the leakage-tolerant comparator

In Figure 4.14, starting from the top, the first plot is OutClk (red dotted line) and Outdata (blue solid line), the CP outputs; the second plot shows decision (red solid line) and decisionb (blue dotted line), the comparator outputs; the third plot is the differential output of the comparator before amplification and level shifting (red dotted line) and after that (blue solid line); and finally the fourth plot is Compout (red solid line) and Compoutb (blue dotted line), the outputs of the analog amplifier and level shifter. Once again, all of these signals are depicted in Figure 4.11. The power consumption of this comparator is 41.2 mW.

4.8  Conclusion

In this chapter, a method to switch the operation from the binary to the linear PD is presented. This is then followed by the introduction of two frequency comparator circuits, neither of which is based on counters, and thus they have a significantly higher operation speed compared to prior work. This enables fast operation.
Of the two introduced frequency comparators, one is leakage-tolerant, while the other is high-speed. For the sake of generality, the leakage-tolerant design is chosen to be used in the final PLL design. Chapters 3 and 4 completed the PFD section of the PLL design. We also know from Chapter 2 that the PFD is connected to a charge pump, which translates the PFD output to a single-ended signal to be used by the VCO. In the next chapter, we will introduce some basic charge pump designs and then we will discuss the charge pump used in this thesis.

Chapter 5  Charge Pump

5.1  Introduction

Charge pumps are used as an interface circuit between the phase/frequency detectors and the loop filters, and they basically “translate” the PD or FD output to signals usable by the loop filter and the VCO. Typically, the charge pump output current is translated to a voltage by the loop filter, which in turn changes the frequency of the VCO. Depending on the PD or FD design, various CP structures have been proposed [77]. Besides the PFD design, another important factor affecting the locking range of PLLs is the output swing of the charge pump. To get the maximum tuning range from the VCO, its control voltage should be able to provide a large swing (ideally rail to rail). Since the control voltage is the output of the loop filter, which is typically connected to the output node of the charge pump, the CP output voltage characteristic directly affects this swing. The ability of the charge pump to provide a wide swing, ideally a rail-to-rail change, is therefore critical. In this section, after the introduction of a few widely used charge pump designs, we will introduce a feedback-based technique to increase the charge pump output swing.

5.2  Charge Pump

Simply put, a charge pump is a circuit that converts the output of the PD to a current which then flows through the loop filter. The basic block diagram of a generic CP was
shown in Chapter 2, but is repeated here for convenience. Figure 5.1 shows this CP [23]. When the Up signal is active, current I1 charges the output node and thus the voltage on the output capacitor increases, which in turn increases the frequency of the VCO. When the Dn signal is active, current I2 discharges the output node and decreases the output voltage, which in turn decreases the frequency of the VCO. Based on the discussion on the Hogge PD in Chapter 2, in a typical PLL, the reference signal is connected to the Dn input of the CP, while the Up input is connected to the proportional signal. The problem with the above CP is that when one switch is off, the voltage difference across its corresponding (non-ideal) current source becomes zero. For example, assume Up is zero. The voltage of node A becomes equal to VDD to make I1 equal to zero. When the Up signal arrives again, the switch connects node Out to VDD, creating an unwanted transient ripple.

Figure 5.1: Basic charge pump structure [23]

The circuit proposed in Figure 5.2 [70] solves this problem. Also, an attempt should be made to make the values of the (non-ideal) current sources I1 and I2 equal [78], which could be difficult if the voltage of the output node is not VDD/2. In this circuit, the inputs are differential, which is another advantage when compared to the circuit of Figure 5.1.

Figure 5.2: Low-ripple charge pump [70]

The main purpose of the buffer is to make the voltages of the nodes Out and OutL exactly equal, so that the value of the current source stays exactly the same before and after the branch switching. In the above CP, a rail-to-rail design for the buffer is preferred. In [23] and [79], a few rail-to-rail buffer structures are proposed. The buffer used is shown in Figure 5.3.
In this buffer, when the input level is close to VDD, transistors M1 and M2 are working, and when the input is close to zero, transistors M3 and M4 handle the signal. In the middle, all these transistors are in saturation, which increases the gain of the amplifier.

Figure 5.3: Rail-to-rail buffer [23]

Although more sophisticated structures which have a constant gain over the whole region of operation are proposed in the literature [80], since this amplifier is used as a buffer in our design, its non-constant gain is not problematic. The addition of the buffer has solved the unwanted ripples on the CP output, but has increased the power consumption of the CP. In [77] another CP is proposed which does not use a buffer. This is shown in Figure 5.4. However, in this figure, note that when the current switches from one branch to the other, the situation changes a bit, which changes the voltages of nodes A and B again. However, the ripple introduced into the loop filter in this case is small and can be tolerated in some applications.

Figure 5.4: Charge pump design with no buffer [77]

Both CPs introduced so far in Figure 5.4 and Figure 5.2 suffer from the inherent mismatch between NMOS and PMOS transistors in the switch. The circuit proposed in Figure 5.5 solves this problem [77, 81]. Finally, depending on the design specifications, other CP structures are proposed in the literature that minimize the mismatch between I1 and I2 [82], have a very low power consumption [83], or are optimized to work at very high frequencies [84]. CPs with fully differential outputs have also been proposed in the literature [85], which use VCOs with differential tuning voltages [86]. A detailed discussion on the above CPs is out of the scope of this thesis.
Now, assume that the above charge pumps are used with a Hogge PD. When the signal Up is “1” and the signal Dn is “0”, the current I1 flows into the output capacitor, thus raising the voltage of the output node. When Up and Dn are both “1”, ideally the voltage on the output capacitor should not change, as long as I1 and I2 are equal. When both Up and Dn are “0”, I1 and I2 flow through the left branch. Thus the current sources are always on, which eliminates the transient ripple.

Figure 5.5: Charge pump with NMOS switches [77]

Let us assume that at the beginning of the operation, the voltage on the output node is VDD/2. Also, assume that the clock is early, the case shown in Figure 3.3(a). The output capacitor is charged with I1 when the Y output of the PD is “1” and discharged with I2 when the X output of the PD is “1”. In the case of Figure 3.3(a), since the pulse width of X is greater than that of Y and since I1 = I2 in the CP, the capacitor is discharged more than it is charged, and as expected the loop filter voltage decreases to decrease the output frequency of the VCO. However, the problem with the above charge pump is that as the voltage on the output node decreases, so does the voltage of node B (and to a lesser extent the voltage of node A). Since the current sources are non-ideal, the decrease of the voltage of node B decreases I2. Eventually, although the capacitor is discharged for more time, it is discharged with less current, and the CP output settles on a value that is higher than expected. This limits the swing of the charge pump, which in turn limits the PLL pull-in range, a highly undesirable effect.

5.3  Wide Swing Charge Pump

One way to solve the problem with the limited swing of the charge pump is to detect the change of the voltage on nodes A and B (in Figure 5.2) and to try to keep the currents I1 and I2 constant.
We have done this in Figure 5.6 [33], where transistors M3 and M8 are responsible for sampling the voltages of nodes A and B (M3 senses node B and M8 senses node A, as described below).

Figure 5.6: The proposed high swing charge pump

Assume the same case described above; namely, consider that the output voltage is decreasing. This in turn decreases the voltage on node B. Now, however, the decrease in the voltage of node B is detected by M3. This decrease increases the gate-source voltage of the PMOS transistor M3, thus increasing its current. This current then flows through M4, which increases its gate-source voltage, which increases the gate-source voltage of M5, thus increasing its current. Therefore, although the decrease in the drain voltage of M5 decreases its current, the increase in its gate-source voltage increases the current, and if the design is done carefully, these two effects will balance each other, keeping I2 almost constant. On the other hand, the decrease in the voltage of node A decreases the gate-source voltage of M8, which decreases its current, which in turn decreases I1. So the increase in I1 due to the increase of its drain-source voltage is balanced out by the current decrease due to the decrease of the gate-source voltage of M6. A similar scenario takes place when the clock is late and the CP increases the voltage on the capacitor of the loop filter. Finally, it is important to design the CP such that the two mechanisms for changing the current of each current source balance each other out. In order to do this, consider the bottom branch once again. The amplifier can be modeled by a common-source amplifier with source degeneration, with M3 as the main amplifier.
The gain for such a configuration can be expressed as [23]:

$$A_v = \frac{-g_m R_D r_o}{R_D + R_S + r_o + g_m R_S r_o} \tag{5.1}$$

where gm is the transconductance of the transistor, RD models the drain resistance, RS is the resistance seen at the source and ro is the transistor output resistance. In Figure 5.6, we have

$$A_v = \frac{-g_{m3} r_{o3} \times 1/g_{m4}}{1/g_{m4} + r_{o2} + r_{o3} + g_{m3} r_{o2} r_{o3}} \tag{5.2}$$

The design should be done such that the above expression is equal to −1/(gm5 ro5), so that the two mechanisms cancel each other. After some manipulation, the result can be given as:

$$\left(\frac{W_4}{L_4}\right)^2 \times \frac{1}{L_2^2} = \frac{W_5^2}{L_5^4} \tag{5.3}$$

Similarly, for the top current source, we should have

$$\left(\frac{W_7}{L_7}\right)^2 \times \frac{1}{L_9^2} = \frac{W_6^2}{L_6^4} \tag{5.4}$$

It should be noted that several conditions should be satisfied in the CP design for proper operation. Firstly, the bottom and the top current sources should be equal. Secondly, Equations (5.3) and (5.4) should be satisfied. Thirdly, transistors M2, M3, and M4, and also M7, M8, and M9, should operate in saturation for as long as possible. Also, note that the current source transistors, M5 and M6, should not be minimum-length transistors, as this decreases their output resistance.

5.4  Simulation Result

The proposed charge pump is designed and simulated in 0.13 µm CMOS with a supply voltage of 1.2 V. Figure 5.7 compares the swing of the proposed CP to that of the CP in Figure 5.2 when the CPs are connected to a Hogge PD. Both CPs are tested with the same inputs and the same up and down currents. The two CPs also have the same input capacitance and drive the same load capacitance. For the Hogge PD, we are assuming a clock frequency of 5 GHz and a data frequency of 2.5 GHz. Table 5.1 summarizes the results and compares this work with a few other designs that are reported in the literature. Note that in all designs, Iup = Idn, and is arbitrarily chosen to be 800 µA.
The Up and Dn inputs are exactly the same in all cases (all CPs are connected to the exact same Hogge PD) and the circuits have all been designed to have an equal input capacitance. The designs of the charge pumps are also optimized in a 0.13 µm CMOS process with a supply of 1.2 V.

Figure 5.7: The proposed high swing charge pump operation

Table 5.1: Comparison of charge pump swing

Design        Minimum output   Maximum output   Power consumption   Publication
              voltage (mV)     voltage (V)      (mW)                year
This work     23.3             1.18             6.31                2010
Figure 5.4    409              0.85             3.7                 1999
Figure 5.5    411              1.02             3                   1999
Ref [70]      188              0.957            6.12                1992
Ref [78]      67.5             1.025            10.37               2009
Ref [82]      18               1.12             8.4                 2000
Ref [53]      264              0.888            3.78                2007
Ref [84]      260              0.833            3.24                2007

5.5  Conclusion

In this chapter, the basic challenges in the design of a charge pump were reviewed. The limited output swing of a widely used charge pump was discussed and a feedback-based solution to improve the output swing was proposed. Finally, a wide-swing charge pump was designed in a 0.13 µm CMOS technology, where simulations show up to ∼2.6× improvement in the output swing as compared to other CP designs available in the literature. In the next chapter, another block used in the PLL, namely the voltage-controlled oscillator, will be introduced.

Chapter 6  Voltage-Controlled Oscillator

6.1  Introduction

Oscillators are an important part of many electronic circuits. Applications range from clock generation in microprocessors to carrier synthesis in cellular telephones, requiring different oscillator topologies and performance parameters [23]. In PLL circuits, voltage-controlled oscillators (VCOs) convert the output of the charge pump (and the loop filter), which is usually in the form of a voltage or current, to frequency, which is then fed back to the PD/PFD block. In most applications, the buffered output of the VCO becomes the PLL output as well.
This signifies the important role of the VCO in the PLL design. This chapter starts by introducing oscillators. Voltage-controlled oscillators are then discussed and the VCO structure used in the PLL design is subsequently introduced. Also, the noise of VCOs is briefly introduced. This is then followed by simulation results, in which the operation of the designed voltage-controlled oscillator is demonstrated.

6.2  Oscillators

An oscillator is a circuit which produces a periodic output signal, usually in the form of a voltage, where no AC input is present [23]. Basically, the circuit operates by allowing its own noise to grow at a specific frequency to eventually become a periodic signal. It is known that a badly designed negative feedback circuit has the potential to oscillate [23, 79]. Many analog designs use frequency compensation to avoid this; oscillators, however, are closed-loop amplifiers intentionally designed to be unstable. Consider a unity-gain feedback system, with a closed-loop transfer function of:

$$H(s)\big|_{close} = \frac{V_{out}}{V_{in}}(s)\Big|_{close} = \frac{H(s)}{1 + H(s)} \tag{6.1}$$

where H(s) is the open-loop transfer function. Now consider a frequency, ω0, at which [86] (see footnote 13):

$$|H(j\omega_0)| \ge 1, \qquad \angle H(j\omega_0) = 180^\circ \tag{6.2}$$

At this frequency, and with a loop gain magnitude of exactly one, the denominator of Equation 6.1 is zero, making the value of the closed-loop transfer function approach infinity. Thus any small input at this frequency is significantly amplified. Considering the fact that the white noise of the circuit contains all frequencies [86], the component of this noise at ω0 is amplified until it eventually saturates the output. Thus at the output of the circuit, a periodic signal with an amplitude close to VDD and a frequency of exactly ω0 is obtained. In summary, three factors affect the circuit oscillation. The circuit topology dictates the frequency of oscillation.
The circuit noise provides a signal with this frequency and finally, the DC sources provide the power needed for oscillation to take place. In a sense, an oscillator can be viewed as a DC-to-AC power converter. This is the idea behind the design of a very important type of oscillator, known as the ring oscillator [23], which consists of a few gain stages in a loop. However, since a significant number of noise-generating elements (transistors) are present in each stage, these oscillators are generally quite noisy and are not used here [23].

Footnote 13: These two conditions are known as the Barkhausen criteria.

Another widely used type of oscillator is the LC oscillator. It is well known that an ideal LC circuit does not consume energy. So any energy given to the circuit will move between the magnetic field of the inductor and the electric field of the capacitor indefinitely. The frequency of this “oscillation” is [23]:

$$\omega = \frac{1}{\sqrt{LC}} \tag{6.3}$$

However, all inductors are lossy, which converts the system from an LC system to an RLC system. In an RLC circuit, each time the energy moves between the magnetic and the electric field, a fraction of it is dissipated in the resistor [87], and unless this energy is somehow returned to the system, the oscillation eventually dies out. Active devices (transistors) can be used for this purpose. Consider the circuit shown in Figure 6.1.

Figure 6.1: Positive feedback amplifier

It can be proven that Rin = −2/gm, where we are assuming that gm1 = gm2 = gm. Note that the resistance is negative, meaning that the circuit can generate AC energy. Now the LC circuit and the above positive feedback circuit can be modeled as shown in Figure 6.2 [88, 89]. In the circuit of Figure 6.2, oscillation can take place if Ra ≤ R. In order for it to start though,
an instantaneous energy should be given to the circuit, which can be the energy delivered upon connecting the voltage sources to the circuit. This initial oscillation is then retained indefinitely.

Figure 6.2: VCO model [89]

6.3  Voltage-Controlled Oscillators

Voltage-controlled oscillators (or current-controlled oscillators (ICOs)) are oscillators whose output frequency changes with voltage (or current). This is easily done in ring oscillators, as it is easy to control the delay of each stage with a voltage. However, the issue is a bit more complicated when it comes to LC oscillators. Going back to Equation 6.3, two solutions are possible to change the operating frequency of LC oscillators, namely changing the inductance and changing the capacitance. Changing the value of passive inductors is not easily possible. The designs in [57] and [90] do so by switching discrete inductors in and out of the circuit. However, doing only this results in a VCO whose range of output frequencies is not continuous. To solve this, capacitor tuning is used to make the range continuous. Also, passive inductors are generally bulky components, and having a few of them in the design takes a considerable amount of area. Another way to change the VCO frequency is to use active inductors. Active inductors are active circuits which show an inductive behavior in a certain frequency range. Their design is based on only resistors, capacitors and transistors [91], and thus they are significantly smaller than their passive counterparts; this is probably the main reason for their popularity. A few VCO structures with active inductor circuits are proposed in [92–94]. However, due to the realization of inductance using transistors (which are noise generators), these VCOs are usually quite noisy. They are also not very linear and have a limited output swing.
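As a small illustration of Equation 6.3, the following sketch estimates how much the tank capacitance must change to cover a given frequency range with a fixed inductor. The 8 GHz and 10 GHz endpoints are the approximate VCO limits quoted in Chapter 4; the 0.5 nH inductance is a purely hypothetical value chosen for the example and is not taken from this design, and the function names are likewise illustrative.

```python
import math


def lc_frequency_hz(inductance_h, capacitance_f):
    """Resonance frequency in Hz from Equation 6.3 (omega = 1/sqrt(LC))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def capacitance_for_frequency(inductance_h, frequency_hz):
    """Tank capacitance in farads that resonates with the inductor at frequency_hz."""
    omega = 2.0 * math.pi * frequency_hz
    return 1.0 / (inductance_h * omega ** 2)


L_TANK = 0.5e-9             # hypothetical inductance, not a value from the thesis
F_MIN, F_MAX = 8e9, 10e9    # approximate tuning limits quoted in Chapter 4

c_max = capacitance_for_frequency(L_TANK, F_MIN)  # lowest frequency needs the largest C
c_min = capacitance_for_frequency(L_TANK, F_MAX)  # highest frequency needs the smallest C

print(f"C at {F_MIN / 1e9:.0f} GHz: {c_max * 1e12:.3f} pF")
print(f"C at {F_MAX / 1e9:.0f} GHz: {c_min * 1e12:.3f} pF")
print(f"required Cmax/Cmin: {c_max / c_min:.4f}")
```

Whatever inductance is assumed, the required ratio Cmax/Cmin equals (fmax/fmin)^2, which is about 1.56 for these endpoints. This total includes any fixed parasitic capacitance, so the tunable part alone has to vary by a larger ratio.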
Finally, the most popular way to realize a VCO is to make the value of the capacitors change with voltage [23, 86, 88, 89]. Capacitors whose values change with voltage are called varactors. As a simple example, consider a reverse-biased diode. It can be shown that the capacitance of such a structure is [86, 95]:

$$C_{var} = \frac{C_0}{\left(1 + \frac{V_R}{\Phi_B}\right)^m} \tag{6.4}$$

where C0 is the zero-bias capacitance, VR is the reverse-bias voltage on the diode, ΦB is the built-in potential of the junction, and m is a constant typically between 0.3 and 0.4. However, diode-based varactors suffer from a low quality factor and have a high parasitic capacitance. They are also highly non-linear [86]. An alternative MOSFET-based varactor structure is shown in Figure 6.3(a) [86]. The characteristic of this varactor is shown in Figure 6.3(b). Note that this characteristic is not monotonic [86] (see footnote 14). To make it monotonic, the MOS transistor is usually placed in an n-well. This is then called an “accumulation-mode MOS varactor” [86]. The characteristic of this varactor is shown in Figure 6.4 [86].

Footnote 14: The physical reason for the non-monotonic behavior of the device is discussed in [96].

Figure 6.3: (a) MOS varactor structure (b) MOS varactor characteristics [86]

Figure 6.4: MOS accumulation varactor characteristic [86]

In our design, since we do not have access to a negative supply, obtaining a negative Vgs can be quite difficult. We are thus using the MOS varactor shown in Figure 6.3. The VCO used in our design has the conventional structure reported in [86]. Figure 6.5 illustrates this VCO. Note that to increase the tuning range of a VCO, the ratio of the parasitic capacitance to the varactor capacitance should be kept small. However, in our PLL design, since the VCO is connected to both the PFD and the frequency comparator, it is highly loaded.
To ensure a wide tuning range, the channel length of the varactor should be increased, which in turn increases the transistor noise [23]. Probably the easiest solution to this problem is to buffer the VCO output before connecting it to the PD. However, the introduction of the buffer into the PLL system introduces an inherent delay, which can be modeled as an $e^{-ts}$ factor in the transfer function of the system. This delay will subsequently deteriorate the system stability [97]. In this design, no buffer is used, and to get a wide range, large varactors are chosen, which make the VCO noisy.

Figure 6.5: Varactor based voltage-controlled oscillator [86]

6.4  VCO Noise

The VCO noise appears in two forms, namely noise in amplitude and noise in phase. Noise in amplitude is usually ignored, as its effect is usually negligible. However, noise in phase, known as phase noise, should be carefully dealt with and reduced as much as possible. The output of a (noisy) oscillator can be written as:

$$x(t) = A(t)\cos(\omega_c t + \Phi_n(t)) \tag{6.5}$$

where ωc is the main frequency and Φn(t) is the phase noise term. Phase noise is more formally defined as the ratio of the power contained in a unit bandwidth of the signal at an offset of ∆ω from ωc to the power of the main signal [98]. To model the phase noise, Leeson [99] models the VCO as a linear time-invariant (LTI) system, while Hajimiri [100] models it as a linear time-variant system. After some manipulation, Leeson's formula results in Equation 6.6 for the phase noise [57].
$$L(\Delta\omega) = 10\log\left\{\frac{2FkT}{P_{sig}}\left(1 + \left(\frac{\omega_0}{2Q\Delta\omega}\right)^2\right)\right\} \tag{6.6}$$

where F is the noise factor, Psig is the signal power, Q is the quality factor of the inductor, ∆ω is the frequency offset at which the phase noise is calculated, ω0 is the oscillation frequency, k is the Boltzmann constant and T is the temperature of operation. Finally, the effect of this phase noise, plus any other noise in the PLL, is observed at the output of the PLL as jitter.

6.5  Designed VCO and Simulation Results

The VCO shown in Figure 6.5 was designed, laid out and simulated in a 0.13 µm technology node with a VDD of 1.2 V. Note that since the VCO frequency is sensitive to parasitic capacitances, post-layout simulation results of the VCO are reported here. The VCO is fully loaded with the extracted models of all the other blocks. Figure 6.6 plots the output frequency of the VCO versus its control voltage.

Figure 6.6: Voltage controlled oscillator output

The single-ended oscillation amplitude of the VCO varies from 1.059 V to 1.06 V when the control voltage changes from 0 V to 1 V, and the VCO consumes 3.48 mW of power. The phase noise of the VCO is -104 dBc/Hz at an offset of 1 MHz.

6.6  Conclusion

In this chapter, voltage-controlled oscillators are introduced. This is then followed by a brief discussion on various mechanisms to change the oscillation frequency. Also, the phase noise of oscillators is briefly introduced. The chapter is concluded by plotting the simulation results of the designed VCO. In the next chapter, the PLL stability and the design of the loop filter will be discussed.

Chapter 7  Loop Filter Design and Stability

7.1  Introduction

After the design of all the PLL blocks, they should be connected together in a system. At this point, the loop filter should be designed so that the final PLL design satisfies all the required specifications.
Two general types of loop filters can be found in the literature, namely active and passive. Passive loop filters are more common, as they are simple and do not consume any power. They are also mainly based on capacitors and resistors [21,23–25]. However, active loop filters provide more design flexibility and consume less area, and thus they are used in some designs [101, 102]. In this design passive filters are used. In this chapter, several passive filters will be introduced. This is then followed by introducing the filter structure used in this design. Also, the trade-offs in the filter design will be discussed. We will then show the benefits that can be obtained by switching the PLL bandwidth after lock acquisition. The chapter is finally concluded by discussing the method used to design the filter.

7.2  Passive Loop Filters

Several passive filter designs can be found in the literature. Let us start our discussion by looking at Equation 2.9 again, where a resistor was placed in series with the capacitor to make the system stable. Although a PLL with this loop filter is operational, its output jitter is considerable, as there are jumps on the loop filter voltage due to charge injection and clock feedthrough [23]. To decrease this, a capacitor, C2, is connected in parallel with the series combination of C1 and R1. It is mentioned in [23] that if C2 is about one-fifth to one-tenth of C1, the frequency response of the PLL remains relatively unchanged. This is in fact the loop filter used here, and in many other designs [24, 30, 43, 63]. The structure of the loop filter is shown in Figure 7.1.

Figure 7.1: PLL loop filter circuit [23]

Depending on the design needs, some designers use different loop filter designs. [25] uses a resistor between the CP and the series combination of C1 and R1. [44] puts R1 in parallel with C1 and connects C2 in series with this parallel combination.
Other designs for the loop filter can also be found [21, 35].

7.3  Loop Filter Design

Designing the loop filter in MATLAB is inaccurate due to the approximations made in the modelling of each component. Also, binary PDs are more complicated to model and analyze than their linear counterparts [6]. In our design, due to the non-linearity introduced by switching, the inaccuracy of the mathematical models is even worse. To circumvent the problem, and considering that the final lock is obtained using the linear PD, only the PLL with a linear PD was analyzed in MATLAB. The loop filter was designed such that this system has a phase margin of about 65°. The Bode plot of this system is shown in Figure 7.2. To obtain this plot and referring to Equation 2.9, Ip is 150 µA, KVCO is 2 GHz, R1 is 4 kΩ and C1 is chosen to be 2.5 pF.

Figure 7.2: Phase margin of the linear PLL

In Cadence, however, this does not necessarily provide the optimum result for the PLL output. Although the PLL locks, it should still be optimized. In order to optimize the loop filter, it should be considered that increasing the PLL bandwidth decreases the lock time, but increases the jitter [38]. So one idea that comes to mind is to increase the bandwidth to achieve a fast lock, and then, after the linear PD locks, decrease it again to achieve a low jitter. This bandwidth switching idea is in fact not new, and has also been done in [103, 104]. However, note that as we are increasing the bandwidth, it is vital to ensure that the PLL still locks in all the process corners. In our design, we found that decreasing R1 from 4 kΩ to 1 kΩ drastically decreased the lock time without compromising stability. Decreasing this resistor further could make the PLL unstable in some corners. The calculated phase margin for the improved PLL is about 25°. Note that the PLL is still stable in all the corners (see footnote 15).
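The effect of R1 on the phase margin can be illustrated with a minimal sketch. It assumes the textbook second-order charge-pump PLL open-loop gain, L(s) = (Ip/2π)·(R1 + 1/(sC1))·(2π·KVCO/s), which may differ from Equation 2.9, and it assumes that KVCO is expressed in Hz/V (2 GHz/V). The function name and the model are illustrative only: C2, loop delay and the actual gain of the linear PD are ignored, so the absolute numbers are not expected to match the 65° and 25° quoted above. Only the trend (a smaller R1 gives a smaller phase margin) is of interest.

```python
import math

def phase_margin_deg(i_p, k_vco_hz_per_v, r1, c1):
    """Phase margin of an idealized second-order charge-pump PLL.

    Assumed open-loop gain (hypothetical, textbook model):
        L(s) = i_p * k_vco * (1 + s*r1*c1) / (s**2 * c1)
    With x = w * r1 * c1, |L| = a * sqrt(1 + x**2) / x**2, where
    a = i_p * k_vco * r1**2 * c1. Solving |L| = 1 gives a quadratic in x**2.
    The two poles at the origin contribute -180 degrees, so the phase
    margin equals the phase lead of the zero, atan(x), at crossover.
    """
    a = i_p * k_vco_hz_per_v * r1 ** 2 * c1
    x_sq = (a ** 2 + math.sqrt(a ** 4 + 4 * a ** 2)) / 2
    return math.degrees(math.atan(math.sqrt(x_sq)))

# Values quoted in the text; the Hz/V unit for KVCO is an assumption.
I_P = 150e-6      # charge-pump current, A
K_VCO = 2e9       # VCO gain, Hz/V (assumed unit)
C1 = 2.5e-12      # loop filter capacitor, F

pm_4k = phase_margin_deg(I_P, K_VCO, 4e3, C1)
pm_1k = phase_margin_deg(I_P, K_VCO, 1e3, C1)
print(f"R1 = 4 kOhm: phase margin = {pm_4k:.1f} deg (idealized model)")
print(f"R1 = 1 kOhm: phase margin = {pm_1k:.1f} deg (idealized model)")
```

In this simplified model the phase margin falls when R1 is reduced, which agrees qualitatively with the behaviour reported above.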
About 20 ns (see footnote 16) after switching occurs from the bang-bang to the linear PD, the PLL switches its bandwidth from approximately 1% to approximately 0.1% of the clock frequency. This decreases the jitter of the final PLL output. In this situation, the phase margin of the PLL increases to about 45°. To generate the 20 ns delay, both analog [105] and digital delay generators are possible. Due to its simplicity, a digital delay generator was used here. This is shown in Figure 7.3. Note that the value of the delay does not need to be accurate.

Figure 7.3: Digital delay generator

Finally, note that the bandwidth is switched by adding a capacitor in parallel with C1.

Footnote 15: We have also taken the effect of process variation into account.
Footnote 16: This is almost twice the worst-case lock time of the linear PD, as will be seen in the next chapter.

7.4  Conclusion

In this short chapter, the design of the loop filter is discussed, and the operation of our bandwidth switching circuit is explained. Now that we have discussed all of the blocks in the PLL, it is time to put them all in a closed loop system to make the PLL. This is done in the next chapter, where the design of the pad buffer can also be found.

Chapter 8  Simulation Results

8.1  Introduction

After discussing the design of each block, this chapter combines all of these blocks in a closed loop system to make a PLL. The PLL is then simulated, and the simulation results are compared to a few other works. In this chapter, the circuit level design of any other block used will also be explained. These blocks include the resetting circuitry and the buffer to drive the pads.

8.2  PLL Design

To demonstrate the effectiveness of the proposed hybrid PFDs, two PLLs were designed in 0.13 µm CMOS. The low frequency PLL (∼5 GHz clock) uses the low frequency PD/FD.
For this PLL, the CP is designed such that the current pumped into the loop filter due to the FD command is twice the current pumped due to the PD command. This increases the locking speed. The high frequency PLL (∼10 GHz clock) uses the high frequency PFD.

8.2.1  Low-Frequency PLL

The low frequency PLL is designed only as a proof-of-concept circuit for the PD/FD and it was not optimized. It also does not have the bandwidth switching capability, and in it, switching between the binary and the linear PD is done manually. All the circuits reported in the previous chapters (see footnote 17) are optimized for the high-frequency PLL. For the low frequency PLL, all these blocks are redesigned. Since we did not tape this design out, no buffer to drive the pads and no initializing circuitry are reported here. In this PLL, the VCO has a tuning range from 3.85 GHz to about 5.05 GHz. Figure 8.1 shows a sample lock when the input frequency is 2.32 GHz. The resulting clock frequency is 4.65 GHz.

Figure 8.1: Sample locking of the low frequency PLL

The PLL locks with a (deterministic) jitter of 2.2 ps, as shown in the eye diagram of Figure 8.2. It also consumes a power of 42.6 mW from a 1.2 V supply. The locking range of the PLL is approximately 1.1 GHz. To show this, Figure 8.3 shows the control voltage of the VCO of the PLL, where the input frequency is approximately 1.95 GHz. Also, to show the effectiveness of the FD, we have applied a signal whose frequency is much lower than that of the clock. Although the PLL will not lock, the FD should pull the control voltage of the VCO down as much as possible. This is in fact what happens in Figure 8.4.

Footnote 17: The reported CP was optimized for the low frequency PLL, as the effectiveness of the introduced feedback was our main focus. The CP was redesigned in the high-frequency design.

Figure 8.2: Jitter of the low frequency PLL

Figure 8.3: Minimum control voltage of the VCO in the low frequency PLL

Figure 8.4: Operation of the low frequency FD

8.2.2  High-Frequency PLL

PLL Design and Simulation

Figure 8.5 shows the system level view of the high-frequency hybrid PLL. The operation of this PLL starts with the hybrid PFD. Two MUX gates are used for selecting the outputs of the linear or the binary PD. The frequency comparator is the block responsible for making this selection. The delayed version of the frequency comparator output is used to turn the frequency comparator block off. This way, we can be sure that after the operation switching, the frequency comparator is not consuming any more power. Note that resetting the system also resets the frequency comparator output, whose delayed version completely turns the block back on. The outputs of the two MUX gates subsequently drive the charge pump, whose output is connected to the loop filter. The loop filter output is connected to the VCO control voltage, and the VCO output is fed back to the hybrid PFD to close the loop. A delayed version of the frequency comparator output takes care of bandwidth switching. Finally, note that except for the buffer and the initializing circuitry, the design of all other blocks has already been discussed.

Figure 8.5: Design of the Hybrid PLL

The above PLL was designed, laid out and simulated in IBM 0.13 µm technology, with a supply of 1.2 V. Three modes of operation can be seen in the simulation results, namely, binary PLL, linear PLL before bandwidth switching and finally, linear PLL after bandwidth switching.
In Figure 8.6 we have shown the output of our PLL when the data frequency is 4.54 GHz and the clock frequency is 9.1 GHz. In this figure, we have attempted to demonstrate the operation of the system.

Figure 8.6: Sample lock of the high-frequency PLL

In Figure 8.6, from the top, the first figure is the VCO control voltage; the second and the third figures are the up and the down commands of the CP respectively; the fourth figure is the output of the frequency comparator; the fifth figure shows the signals OutClk (solid red) and outdata (dashed blue) (which can be found in Figure 4.4); and finally the bottom figure is the comparator output after buffering and level shifting. In Figure 8.6, the following can be seen:
• For the first 6 ns, the binary PFD is active.
• The frequency comparator is responsible for generating OutClk and outdata. Since the frequency is decreasing, the value of OutClk is increasing until it exceeds Outdata, at which point switching takes place (Figure 8.7 shows this more clearly). The frequency comparator is then "turned off". This can be seen by noting that in the rest of the simulation, no signal is generated on OutClk and outdata. Finally, note that if the switching is done properly, the VCO control voltage should change its slope after switching, meaning that it should increase if it was decreasing, and vice versa. For example, in this figure, switching takes place as soon as the clock frequency drops below the data frequency. At this point, the linear PLL is faced with a situation where the clock frequency is slightly smaller than the data frequency, and thus it should increase this frequency for locking to take place.
• After this and for the rest of the operation, the linear PD takes charge of the operation; however, the bandwidth is still large.
• Finally, at around t=28 ns, the bandwidth is switched.
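The sequence of modes listed above can be summarized with a small behavioural sketch. It is not the circuit itself: the names Mode and mode_sequence, the sampled frequency ramp and the 22 ns delay are hypothetical, chosen only so that the switching instants roughly resemble those read from Figure 8.6 (binary operation for about 6 ns, bandwidth switching at about 28 ns). The sketch assumes that the operation switches when the clock frequency crosses the target frequency, and that the bandwidth is switched a fixed delay later.

```python
from enum import Enum

class Mode(Enum):
    BINARY = "binary PFD"
    LINEAR_WIDE = "linear PD, wide bandwidth"
    LINEAR_NARROW = "linear PD, narrow bandwidth"

def mode_sequence(times, f_clk, f_target, bw_delay):
    """Return the operating mode at each sample time.

    times    : sample instants (any consistent unit, e.g. ns)
    f_clk    : VCO frequency at each instant
    f_target : twice the data frequency, i.e. the frequency to lock to
    bw_delay : delay between PD switching and bandwidth switching
    """
    modes = []
    initial_err = None   # first non-zero frequency error seen
    t_switch = None      # instant at which the clock crossed the target
    for t, f in zip(times, f_clk):
        err = f - f_target
        if t_switch is None:
            if initial_err is None and err != 0:
                initial_err = err
            # A sign change (or zero) of the error marks the crossing.
            if initial_err is not None and err * initial_err <= 0:
                t_switch = t
        if t_switch is None:
            modes.append(Mode.BINARY)
        elif t - t_switch < bw_delay:
            modes.append(Mode.LINEAR_WIDE)
        else:
            modes.append(Mode.LINEAR_NARROW)
    return modes

# Hypothetical example: the clock ramps down from 9.6 GHz towards 9.1 GHz.
times_ns = list(range(0, 42, 2))
f_clk_ghz = [max(9.0, 9.6 - 0.1 * t) for t in times_ns]
for t, m in zip(times_ns, mode_sequence(times_ns, f_clk_ghz, 9.1, 22)):
    print(t, m.value)
```

Because the mode depends only on the crossing instant and the fixed delay, the delay value does not need to be accurate, which is consistent with the remark made for the digital delay generator.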
Note that the PLL after bandwidth switching is still able to track slight frequency changes in the input. However, if the frequency change is considerable, re-locking can be slow. In these cases, it might be a better idea to reset the PLL. Note that this PLL consumes 36.3 mW of power. Figure 8.7 shows the operation of the frequency comparator. More specifically, we have zoomed in to show OutClk and outdata in Figure 8.6. In Figure 8.7, once again from the top, the dotted blue line is the frequency of the input data multiplied by two and the solid red line is the VCO frequency; in the second figure, the solid red line is OutClk and the dotted blue line is Outdata; and the third figure is the output of the frequency comparator.

Figure 8.7: Operation of the frequency comparator

Let us now look at the outputs of the PLL at another frequency. This time, let the data frequency be 4.35 GHz. The resulting clock frequency would be 8.7 GHz. This is shown in Figure 8.8. To make this figure a bit smaller, we are only showing the control voltage of the VCO and the output of the frequency comparator. Although it is difficult to see in the control voltage, the bandwidth has switched at approximately t=34 ns. To show this, one more figure is added to Figure 8.8, which shows the frequency of data and clock. Thus in Figure 8.8, the top figure is the plot of the clock and of the data frequencies, where the dotted blue line is the input data frequency multiplied by two and the solid red line is the VCO frequency; the middle figure is the VCO control line; and the bottom figure is the output of the frequency comparator.
Figure 8.9 shows the operation of the frequency comparator. In the top figure, the dotted blue line is the frequency of the input data multiplied by two and the solid red line is the VCO frequency; in the second figure, the solid red line is OutClk and the dotted blue line is Outdata; and the third figure is the output of the frequency comparator. It can be seen both in this figure and in Figure 8.7 that the frequency comparator switches the operation a few nanoseconds after it is supposed to. The reason for this is the inherent delay of the internal blocks of the frequency comparator. Note that Figure 8.1 shows the case where manual switching is done at exactly the right time. However, in this case, the switching is fully automatic, and is a few nanoseconds behind.

Figure 8.8: Locking of the PLL at a frequency of 8.7 GHz.

Figure 8.9: Operation of the frequency comparator in Figure 8.8.

Now that we have seen how the PLL works, let us take a look at the PLL jitter before and after bandwidth switching. Note that in these simulations, the inherent transistor noise is ignored (this is why we are using the term "deterministic jitter"; see footnote 18). The inclusion of the transistor noise will clearly deteriorate the reported jitter, but here, at least some sense of the effect of the bandwidth switching can be obtained. Figure 8.10 and Figure 8.11 show the eye diagram of the PLL before and after bandwidth switching, respectively, for the lock of the PLL shown in Figure 8.8. As can be seen in these two figures, the deterministic jitter decreases from 113 fs to about 32 fs.

Footnote 18: This is because in time-based Spectre simulations, transistor noise is not included. Small signal AC analysis includes transistor noise, but it can only be used in linear time-invariant systems. Transistor noise can be included in simulations as custom noise sources connected to particular nodes. However, this was shown to introduce a considerable error in the simulation result [98, 106]. Other simulation-based works either do not report their total jitter [20] or only report the deterministic jitter [31].

Figure 8.10: Jitter of the PLL before bandwidth switching in Figure 8.8.

Figure 8.11: Jitter of the PLL after bandwidth switching in Figure 8.8.

To find the total random jitter, it should be noted that unlike other noise sources in a typical PLL, the VCO phase noise is high-pass filtered before it reaches the output. In many designs, the bandwidth of the PLL is chosen to be small [58], which increases the contribution of the VCO phase noise towards the total PLL jitter. Typically, the VCO phase noise is the main contributor to the PLL random jitter. To find the total random jitter, the transfer function of the system from a noise source inserted at the VCO output to the PLL output should first be found. Multiplying the VCO phase noise by this transfer function and integrating the result [86] gives the total random jitter. Using this approach, the total random jitter is 0.12 UI and 0.1 UI before and after bandwidth switching, respectively. Since in this design this noise is significantly larger than the deterministic noise, we are only focusing on the effect of the random jitter on the PLL jitter.

Now, let us decrease the input frequency to the minimum locking frequency. It can be seen in Figure 8.12 that the PLL can lock to a data frequency of 4.16 GHz, which makes the clock frequency 8.33 GHz. Since we have already seen the operation of the frequency comparator, in this figure we only show the VCO control voltage and the frequency comparator output (Figure 8.13 shows a zoomed-in view of the operation of the frequency comparator). In Figure 8.12, the top graph is the VCO control voltage, while the bottom graph is the frequency comparator output.
Figure 8.13 shows the operation of the frequency comparator. In the top figure, the dotted blue line is the frequency of the input data multiplied by two and the solid red line is the VCO frequency; in the second figure, the solid red line is OutClk and the dotted blue line is Outdata; and the third figure is the output of the frequency comparator.

Figure 8.12: Minimum locking frequency of the designed PLL

Figure 8.13: Operation of the frequency comparator in Figure 8.12

Note that the worst-case lock time is less than 30 ns (for the "high jitter" lock). Finally, let us apply a signal with a significantly lower frequency to see the operation of the FD part of the binary PFD. This is shown in Figure 8.14. Similar to Figure 8.4, we are expecting the VCO control voltage to reach its minimum value, but obviously, the PLL will not lock. In Figure 8.14, from the top, the first figure is the VCO control voltage; in the second figure, the solid red line is the down command of the CP while the dashed blue line is the up command of the CP; the third figure is the frequency comparator output, which, as expected, does not change the operation mode from binary to linear; the fourth figure is the up output of the linear PD; and the fifth figure is the down output of the linear PD. The momentary ripple on all these voltages is caused by a transition in the data input, as the input frequency is 1 GHz. The reason we have included the outputs of the linear PD is to show that they both stay at zero when a data transition is lacking, which means that the VCO frequency will not change, making the PLL suitable for CDR applications.

Figure 8.14: Operation of the PLL under a significant frequency difference
To summarize the results, the PLL has a locking range from 8.3 GHz to 9.6 GHz, consumes a power of 36 mW, has a deterministic and random jitter of about 35 fs and 11.7 ps respectively (at least in simulations), and its worst-case locking time is less than 30 ns.

Result Summary and Comparison

Table 8.1 summarizes our results and compares them to a few other PLLs (see footnote 19).

Table 8.1: Comparison of PLL results

Work        Technology (µm)   Lock range (GHz)   Jitter (ps)   Power (mW)   Lock time (ns)   Publication year
This work   0.13              8.3-9.6            11.7          36           30               2010
[20]        0.13              5.3-5.7            NA            NA           750              2008
[24]        0.35              0.62-0.93          12.5          NA           NA               2000
[43]        0.18              10-10.25           2             NA           ∼200             2004
[53]        0.18              5-6.25             11            120          NA               2007
[38]        0.18              5.27-5.6           NA            19.8         <10000           2010
[31]        0.13              0.05-0.36          9             NA           ∼900             2008
[30]        0.13              8.7-10.96          15.1          195          NA               2005
[107]       0.13              17.6-19.46         4.9           480          NA               2005
[108]       0.13              2.5-3.1            0.86          35           NA               2003
[129]       0.18              10-10.77           14.5          72           NA               2001
[109]       Si-Bipolar        2.25-2.35          7.7           100          NA               1993
[29]        0.5               0.75-0.8           7.5           NA           200              2003
[110]       0.15              0.07-0.04          200           NA           1000             2008
[111]       0.35              0.36-1.44          45            23           2700             2003

Footnote 19: In this table, NA stands for Not Available.

In the above table our obtained jitter is significantly smaller than that of other papers. The reason for this is the exclusion of the transistor noise from the simulations. Note that in this table, we have included some papers which are based on simulation results and some which are based on measurement results. Our superior lock time is due to the hybrid operation and the high initial bandwidth. Finally, note that our PLL suffers from one drawback. If the initial data frequency is very close to the clock frequency, locking is achieved before the frequency comparator has the time to charge its capacitors. So although the PLL without the frequency comparator has a locking range from 8.3 GHz to 9.9 GHz, the existence of the frequency comparator has decreased this range. In applications where non-optimum switching can be afforded, the tuning range of the PLL will increase. Also, we believe that the reported lock range is only limited by the VCO tuning range, and not by the PFD design. Using a more complicated VCO structure can lead to an increase in the PLL acquisition range.

8.3  Design of the Buffer and the Initialization Circuitry

To fabricate the above PLL, two more blocks are needed. In this section we discuss these two blocks.

8.3.1  Buffer Design

To see the output of the PLL on the oscilloscope, the VCO should be connected to the output pads. However, since the capacitance of these pads is considerable [112], the VCO frequency can drop significantly. What is even worse is that the input impedance of the oscilloscope is 50 Ω to ground. This changes the DC operating point of the VCO, which can even prevent it from oscillating. To avoid this, the VCO output should first be buffered. Also, in an SoC system, a PLL might have to drive various loads. To avoid redesigning the PLL each time the load changes, the VCO output has to be buffered first. However, this buffer cannot be in the PLL loop, as the introduced delay can risk stability. The buffer should be placed outside the loop, as shown in Figure 8.5. The design of a CML buffer is discussed in [124]. Basically, the important point in the buffer design is to ensure that it has a small input capacitance and a sufficient bandwidth. It should also be able to drive the pad capacitance in parallel with the input resistance of the oscilloscope. The only way to do this is to use a multiple-stage CML buffer. Two of these stages are shown in Figure 8.15.

Figure 8.15: CML buffer [124]

The designed buffer has six of the above stages to drop the drain resistance from 7 kΩ in the first stage to 50 Ω in the last one.
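To give a feel for the taper involved, the following minimal sketch computes the per-stage load resistances under the assumption of a uniform geometric taper between the two quoted end points (7 kΩ and 50 Ω over six stages). The uniform taper and the function name are assumptions made only for illustration; the actual per-stage values of the designed buffer are not given in the text. Note that the factor of two drawn between the two stages of Figure 8.15 would, if applied uniformly over six stages, end at roughly 219 Ω, so the average ratio between the quoted end points has to be larger than two.

```python
def tapered_loads(r_first, r_last, n_stages):
    """Load resistance of each stage for a uniform geometric taper.

    Hypothetical helper: assumes every stage scales its load resistance
    by the same ratio relative to the previous stage.
    Returns (list of resistances in ohms, per-stage ratio).
    """
    ratio = (r_first / r_last) ** (1.0 / (n_stages - 1))
    loads = [r_first / ratio ** k for k in range(n_stages)]
    return loads, ratio

loads, ratio = tapered_loads(7e3, 50.0, 6)
print(f"per-stage ratio: {ratio:.2f}")
for k, r in enumerate(loads, start=1):
    print(f"stage {k}: {r:.0f} ohm")
```

In a CML stage, the tail current and the transistor width would normally be scaled up by the same ratio to keep the output swing constant, as Figure 8.15 suggests for the two stages shown.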
Figure 8.16 shows the simulation result for the designed buffer, where the top figure is the VCO output and the bottom figure shows the buffer output. The buffer is loaded with 50 fF of capacitance and 50 Ω of resistance. It consumes 40.8 mW of power. The buffer, however, adds noise to the circuit, which deteriorates the PLL jitter measurement.

Figure 8.16: CML buffer output

8.3.2  Initializing Circuit Design

In all of the above simulations, the initial CP output voltage was chosen to be 600 mV, which is equal to VDD/2. This is chosen as the reference for all the measurements (lock time is especially affected by this choice). However, in practice, a circuit should be designed to take care of this. This can easily be done by designing a transmission gate which is connected to a DC voltage source whose value is 600 mV. When the Reset signal in Figure 8.5 is high, this switch is conducting, thus resetting the loop filter voltage.

8.4  Chip Layout

The above design was laid out in a CMOS 0.13 µm technology. Figure 8.17 shows this layout. Note that filling is not shown in this layout. The total area of the chip is 690 µm × 673 µm including the pads and 430 µm × 410 µm excluding them. The actual chip is shown in Figure 8.18.

Figure 8.17: PLL layout

8.5  Chip Results

During the chip testing we realized that the VCO only starts to oscillate when the supply voltage is increased to about 1.8 V. The output of the VCO when VDD is 1.8 V is shown in Figure 8.19. Also, the oscillation frequency is off by about 1 GHz. However, as can be seen from this figure, the VCO output is fully differential. The current drawn by the whole circuit is also very close to the value obtained in the simulation phase. This creates various problems for the circuit. Although the PLL is designed such that it is robust to a 10% variation in the supply voltage, it does not lock when VDD is 1.8 V.
Also, a gate-source voltage of 1.8 V exceeds the maximum tolerable Vgs value of the transistors, and this causes the chip to burn out after a few minutes of testing. Also, increasing VDD increases the current flowing through wires, necessitating the usage of thicker wires. To find the source of the problem, post-layout simulations on the VCO were performed over various process corners. It was observed that oscillation starts in each simulation without any problems. This persuaded us to consider the effects that are not modelled in Cadence simulations, of which the filling of the chip is the most important one. The CMOS technology node used in this work requires the designers to locally fill their chip with various layers, which are all the metal layers, the poly layer (PC) and the active layer (RX). A certain percentage of the area of any pre-defined window should be occupied by these layers [112]. Usually, the effect of these fillings on the circuit performance is negligible. However, in the case of inductors this is not the case. To be able to safely apply the inductor models provided by the foundry, it is recommended not to have RX within a distance of 50 µm from the inductor [112]. However, this usually results in the violation of the filling requirements. Our initial design was not approved for this very reason, and we were asked to satisfy the local densities. In order to do this, a dense filling around the inductor was done, which has drastically dropped the quality factor, thus increasing the loss of the inductor. The reason for this is the generation of eddy currents in the filling blocks. It is well known that if the electromagnetic field through a conductor changes, a current, known as eddy current, is generated in that conductor. This current then flows in a circular loop and wastes energy [113].

Figure 8.18: PLL chip

Figure 8.19: VCO output
Considering the very dense packing of the PC and the RX layers, this loss becomes significant. It was also recommended by IBM to avoid thick conducting metal loops around the inductor. Unfortunately, in the layout, we are routing the chip GND very close to the inductor, which drops the quality factor even more. To verify this, electromagnetic simulation was performed on the inductor with and without the various fillings and the GND loop (see footnote 20). To be able to individually see the effect of each problem, simulations were performed on each layer separately (also, simulations which include all the layers require too much memory). It was observed that the initial quality factor of the inductor at 10 GHz is about 18.6. This is shown in Figure 8.20. Note that the left figure shows the actual value of the inductance (in H), while the right figure shows its quality factor.

Footnote 20: At the time of the tape out, the substrate model for the current technology was unavailable, which made accurate simulations of the filling effect almost impossible.

Figure 8.20: Inductor with no filling

Including the RX layer in the momentum simulations drops the quality factor of the inductor to about 13.8. This is shown in Figure 8.21.

Figure 8.21: Inductor with RX filling only

Finally, including the PC layer in the momentum simulations drops the quality factor of the inductor to about 12.6. This is shown in Figure 8.22. Also, filling for the other metal layers drops the Q to about 15, and the existence of the large conductor close to the inductor also drops the Q to about 15. The effect of all these drops collectively kills the VCO oscillation, which is the case in our current design. This is also why increasing the VCO current causes the oscillation to start again, as this increase leads to an increase in the energy that the active circuitry in the VCO can provide. This problem could have also been avoided by generating the biasing current of the VCO off chip.

Figure 8.22: Inductor with PC filling only

8.6  Conclusion

In this chapter, the simulation results for both the low and the high-frequency PLL are presented. The low frequency PLL is only simulated to show the operation of the low frequency PFD and the enhanced CP. In the high-frequency PLL, however, both the frequency comparator and the bandwidth switching circuitry are implemented. In this chapter we have also shown our buffer design and the PLL layout, and have compared our work with a few other papers. In the next chapter, we will discuss the potential improvements to this work and conclude this thesis.

Chapter 9  Conclusion

9.1  Introduction

This chapter concludes the thesis. In this chapter, the key contributions of this work are once again briefly reviewed, the issue of design scalability is discussed, and the potential limitations of the current work are considered. Finally, this chapter is concluded by listing a few ideas that we currently have to improve the current work.

9.2  Achievements

In this work, two hybrid phase-locked loops to be used in clock and data recovery applications are introduced and designed. The high-frequency PLL is also fabricated in a 0.13 µm CMOS technology. In both designs, the PLL operation starts using the binary PFD, and once the binary PFD is locked, the operation is switched to the linear PD. In the high-frequency PLL, after the lock of the linear PD, the bandwidth is switched to decrease the PLL jitter. The low frequency PLL, designed only as a proof-of-concept circuit, has a locking range of about 1.1 GHz, a worst-case lock time of about 160 ns, and a jitter of 2.2 ps. It consumes a power of 42.6 mW from a 1.2 V supply. Note that in this PLL switching from the binary PFD to the linear PD is done manually. The high-frequency PLL has an acquisition range from 8.3 GHz to 9.6 GHz (with manual switching of the operation, the tuning range can be pushed up to 9.9 GHz) and the worst-case lock time of the PLL is about 30 ns. The power consumption of the PLL is 36 mW. The designed PLL has the following features:
• All the signals (except for the VCO control voltage), both in the digital and the analog part, are fully differential. This increases the noise immunity of the circuit.
• The switching noise of the digital blocks on the power lines is highly reduced due to the use of CML to realize the digital gates. These circuits are fully differential and introduce a negligible supply noise. Also, logic inversion in this logic family can be done without using a gate.
• Due to the differential nature of the design, simulations show that the amplitude of the inputs can be reduced to VDD/2 without any noticeable change in the PLL output. This can be particularly useful when the signal is equalized first.
• Due to the use of CML gates, the power consumption of the circuit is approximately independent of the input and the clock frequency.

9.3  Limitations

9.3.1  High Power Consumption

The power consumption of the PLL is relatively high. The reason for this is that although CML gates are beneficial in many aspects, they constantly consume static power. Compared to the power consumed by the digital gates in the circuit, the analog circuits have a negligible power consumption. In the next section, we will discuss a remedy for this.

9.3.2  Turning Off the FD

Due to the lack of time, the taped-out frequency comparator does not switch itself off. The feature which allows the frequency comparator to switch off was added after the chip tape out. Also, note that this added feature can make the layout of the frequency comparator rather complex.

9.3.3  High Clock Frequency

The design of the PFD is such that the clock frequency is twice the data frequency.
Also, the latches in the PFD must run at twice the clock speed. This limits the use of the designed PLL in high-frequency applications.

9.4  Future Work

9.4.1  Using a More Advanced Technology

Considering that designing a high-frequency LC VCO is rather straightforward [114], and that the operating frequency of the charge pump and loop filter is significantly lower than that of the VCO, the only factor limiting the maximum operating speed of the PLL is the speed of the PFD. Since CML gates are used in the PFD, the maximum operating speed of the PLL is limited by how fast the digital CML gates work. As mentioned in Appendix A, inductive peaking can be used to increase the speed of CML gates, but the resulting chip would occupy a large area. It is well known that as the gate length of transistors gets smaller, their operating speed increases [74]. Thus, the operating frequency of the designed PLL can be increased by using a more advanced technology.

9.4.2  Using the Fast Frequency Comparator

To switch the operation, two frequency comparators are introduced. Although the leakage-tolerant one is used in this design, it might be a good idea to use the fast frequency comparator instead. This has several advantages. The fast frequency comparator is easier to design than the leakage-tolerant comparator, and if optimized, it will consume less power (note that in Chapter 4, all the blocks were optimized for the leakage-tolerant frequency comparator). Probably the most important advantage, however, is that the PLL locking range over which the frequency comparator automatically switches the operation will increase (recall that the locking range of the PLL is 1.6 GHz with manual switching and 1.3 GHz when the frequency comparator is used for switching). A solution is still needed, however, to decrease the capacitor leakage to an acceptable level.
9.4.3  Decreasing the Power Consumption of the Chip

The high operating frequency of the chip is only possible because the gates in the hybrid PFD consume a significant amount of power. In fact, the majority of the chip power is consumed by the PFD block (footnote 21). Using inductive peaking on the gates would decrease this power but increase the chip area. In our PFD, all the flip-flops have an identical design, which is unnecessary. Also, the XOR gates have been overdesigned in order to increase the output swing of the PFD. Li and Green [115] face a similar power problem, as their PLL uses a conventional Alexander PD. To solve it, [115] models each gate in MATLAB. The model includes the delay of the gate (the gate bandwidth) and the basic gate operation. The bandwidth of the gates is then decreased to save power until the PD stops working. After the optimization of each gate in MATLAB, the PD is designed and optimized in Cadence. This way, the gates can be designed independently and optimized for power consumption. The same idea can be applied to our hybrid PFD to decrease power. Depending on the application, the power-hungry frequency comparator can also be replaced by a simple timer to switch the operation, which can save a significant amount of power. Finally, the buffer in the charge pump can be removed by using a different CP structure, but as can be seen in Chapter 5, this can lead to a smaller output swing for the CP.

Footnote 21: We are ignoring the power consumption of the buffers, as they are not part of the chip. Also, since the frequency comparator can be switched off easily, we are not including it here either.

9.4.4  Decreasing the Chip Area by Changing the VCO

We strongly believe that the locking range of the PFD far exceeds the obtained 2 GHz range. In this work, the locking range of the PLL was limited by the tuning range of the VCO.
Increasing the tuning range of an LC VCO can be done by increasing the size of its varactors. This, however, increases the phase noise, which subsequently increases the PLL jitter. A wide tuning range can also be obtained from a ring-based VCO, but ring VCOs are also noisy [52]. Active-inductor VCOs [116] also generally have a wide tuning range but a high phase noise. To obtain a VCO with both a wide tuning range and low phase noise, two solutions have been proposed. In [57], the inductance is changed to change the frequency; however, this solution leads to a bulky VCO. In [117], capacitor banks are used to increase the tuning range of the VCO. In this method, capacitors are switched in and out of the circuit to coarsely tune the frequency. Fine tuning is then achieved using a normal varactor. This method leads to a low-phase-noise VCO. What is even more beneficial, however, is that the wide frequency range is divided into smaller segments. This is crucial, since if the wide frequency range were covered by one segment, any ripple on the VCO control line would translate to significant output jitter. In fact, [117] uses this very idea to increase the acquisition range of their PLLs.

9.4.5  Designing a Complete Receiver

In the motivation section, it was mentioned that the effect of the channel should first be removed from the received signal before the data can be recovered. In fact, it was shown in [115] that the channel leaves the low-frequency part of the signal intact while seriously distorting its high-frequency part. This mainly creates inter-symbol interference (ISI) [118], where a symbol interferes with its subsequent symbols. Therefore, the previously received symbols act similarly to noise. However, there is one very important difference between ISI and noise. Unlike noise, which is a totally random process, the effect of ISI can actually be removed from the received signal, as the previously received symbols are already known.
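The point that ISI from already-decided symbols can be subtracted may be easier to see in a small numerical sketch. The Python example below is only an illustration under stated assumptions: the three channel taps, the noiseless NRZ (+1/-1) symbols, and the function names are hypothetical and are not taken from this thesis or from [115]. It compares a plain slicer with a slicer that first subtracts the ISI predicted from its own previous decisions.

```python
# Minimal sketch of ISI removal by decision feedback.
# Assumptions (not from the thesis): a noiseless channel with known,
# hypothetical taps, and NRZ symbols taking the values +1 and -1.

def channel(symbols, taps):
    """Causal FIR channel: r[k] = sum_i taps[i] * symbols[k - i]."""
    received = []
    for k in range(len(symbols)):
        acc = 0.0
        for i, h in enumerate(taps):
            if k - i >= 0:
                acc += h * symbols[k - i]
        received.append(acc)
    return received

def slicer(x):
    """Binary decision on a single sample."""
    return 1 if x >= 0 else -1

def detect_plain(received):
    """Decide each symbol directly, ignoring ISI."""
    return [slicer(r) for r in received]

def detect_with_feedback(received, taps):
    """Subtract the ISI of previously decided symbols before slicing."""
    decisions = []
    for k, r in enumerate(received):
        isi = 0.0
        for i in range(1, len(taps)):
            if k - i >= 0:
                isi += taps[i] * decisions[k - i]
        decisions.append(slicer(r - isi))
    return decisions

# Hypothetical channel whose post-cursor taps (0.7 + 0.5) exceed the main tap.
taps = [1.0, 0.7, 0.5]
sent = [1, -1, -1, 1, 1, -1, 1, -1, -1, -1, 1, 1] * 10
rx = channel(sent, taps)

plain_errors = sum(a != b for a, b in zip(detect_plain(rx), sent))
dfe_errors = sum(a != b for a, b in zip(detect_with_feedback(rx, taps), sent))
print("errors without feedback:", plain_errors)
print("errors with feedback:", dfe_errors)
```

In this idealized setting the feedback path cancels the post-cursor ISI exactly, because every earlier decision is correct. In a real receiver the taps are not known in advance and would have to be adapted, and a wrong decision can propagate through the feedback path.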
Equalizers are the circuits that remove ISI in this way. Many algorithms and structures for equalization have already been introduced [118]. Although most designers perform equalization on the analog signal [17], equalization after analog-to-digital conversion is gaining popularity [119]. Also, many equalizers, especially those implemented in the receiver, are decision feedback equalizers (DFEs) [115, 119]. A typical DFE is shown in Figure 9.1.

Figure 9.1: Typical DFE structure (block labels: Input, Z^-1 delay elements, weights a0 to an, adaptive algorithm to set each weighting element, Error, Output, Ref)

Considering that the delay elements are nothing more than flip-flops, it is possible to combine the flip-flops in the Alexander PD with the above DFE [115]. The resulting circuit then performs equalization and clock recovery at the same time. This is shown in Figure 9.2.

Figure 9.2: A simple receiver [115]

However, it was mentioned that the jitter of binary PLLs is significant. This could increase the bit error rate (BER) of the receiver. We believe that replacing the Alexander PD in the above structure with our developed PFD could solve this problem. Finally, note that since DFF1 is not highly loaded in our developed PFD, this combination will not impose a lower frequency of operation on the receiver.

Bibliography

[1] D. Chen, “Designing On-Chip Clock Generators,” IEEE Circuits and Devices Magazine, Vol. 8, No. 4, pp. 32-36, July 1992.
[2] W. Mohr, “Approximative threshold-calculation of a PLL FM demodulator with bandpass-limiter,” IEEE International Symposium on Circuits and Systems, pp. 1406-1409, May 1990.
[3] A. Maxim, “A 0.16-2.55 GHz CMOS active clock deskewing PLL using analog phase interpolation,” IEEE Journal of Solid-State Circuits, Vol. 40, No. 1, pp. 110-131, Jan. 2005.
[4] K. Efstathiou, G. Papadopoulos, and G.
Kalivas, “High speed frequency synthesizer based on PLL,” IEEE International Conference on Electronics, Circuits, and Systems, pp. 13-16, Oct. 1996.
[5] S. Jeon, H. Bang, S. Jung, D. Lee, and H. Lee, “Frequency Generation for Mobile RFID Reader,” European Microwave Integrated Circuits Conference, pp. 324-327, Jan. 2007.
[6] J. Lee, and B. Razavi, “Analyses and Modeling of Bang-Bang Clock and Data Recovery Circuits,” IEEE Journal of Solid-State Circuits, Vol. 39, No. 9, pp. 1571-1580, Sep. 2004.
[7] L. Jiuxing, A. Mamidala, A. Vishnu, and D. Panda, “Performance evaluation of InfiniBand with PCI Express,” Annual IEEE Symposium on High Performance Interconnects, pp. 13-19, Jan. 2005.
[8] K. V. Shibu, Introduction to Embedded Systems, McGraw-Hill, New York, first edition, 2009.
[9] M. A. Mazidi, and J. G. Mazidi, The 8051 Microcontroller and Embedded Systems, Prentice-Hall Inc., New Jersey, first edition, 2000.
[10] R. Kamal, Embedded Systems, Architecture, Programming and Design, McGraw-Hill, New York, second edition, 2008.
[11] V. Stojanovic, M. Horowitz, “Modeling and analysis of high-speed links,” IEEE Custom Integrated Circuits Conference, pp. 589-594, Sep. 2003.
[12] D. Hopkins, A. Chow, R. Bosnyak, B. Coates, J. Ebergen, S. Fairbanks, J. Gainsley, R. Ho, J. Lexau, F. Liu, T. Ono, J. Schauer, I. Sutherland, R. Drost, “Circuit Techniques to Enable 430 Gb/s/mm² Proximity Communication,” IEEE International Solid-State Circuits Conference, pp. 368-369, Feb. 2007.
[13] B. Casper, J. Jaussi, F. Mahony, M. Mansuri, K. Canagasaby, J. Kennedy, E. Yeung, R. Mooney, “A 20 Gb/s Forwarded Clock Transceiver in 90 nm CMOS,” IEEE International Solid-State Circuits Conference, pp. 263-264, Feb. 2006.
[14] R. Yuen, M. V. Ierssel, A. Sheikholeslami, W. Walker, and H. Tamura, “A 5 Gb/s Transmitter with Reflection Cancellation for Backplane Transceivers,” IEEE Custom Integrated Circuits Conference, pp. 413-416, Sep. 2006.
[15] D. Johns, and D.
Essig, “Integrated Circuits for Data Transmission Over Twisted-Pair Channels,” IEEE Journal of Solid-State Circuits, Vol. 32, No. 3, pp. 398-406, March 1997.
[16] T. Beukema, M. Sorna, K. Selander, S. Zier, B. Ji, P. Murfet, J. Mason, W. Rhee, H. Ainspan, B. Parker, and M. Beaker, “A 6.4 Gb/s CMOS SerDes Core with Feed-Forward and Decision-Feedback Equalization,” IEEE Journal of Solid-State Circuits, Vol. 40, No. 12, pp. 2633-2645, Dec. 2005.
[17] S. Gondi, and B. Razavi, “Equalization and Clock and Data Recovery Techniques for 10-Gb/s CMOS Serial Links,” IEEE Journal of Solid-State Circuits, Vol. 42, No. 9, pp. 1999-2011, Sep. 2007.
[18] K. Lee, S. Kim, G. Ahn, and D. K. Jeong, “A CMOS Serial Link for Fully Duplexed Data Communication,” IEEE Journal of Solid-State Circuits, Vol. 30, No. 4, pp. 353-364, April 1995.
[19] M. V. Ierssel, A. Sheikholeslami, H. Tamura, and W. W. Walker, “A 3.2 Gb/s CDR using semi-blind oversampling to achieve high jitter tolerance,” IEEE Journal of Solid-State Circuits, Vol. 42, No. 10, pp. 2224-2234, Oct. 2007.
[20] M. Assaad, and D. Cumming, “20 Gb/s Referenceless Quarter-rate PLL-based Clock and Data Recovery Circuit in 130 nm CMOS Technology,” IEEE International Conference on Mixed Design, pp. 147-150, June 2008.
[21] R. E. Best, Phase-Locked Loops, McGraw-Hill, New York, fifth edition, 2003.
[22] T. H. Lee, The Design of CMOS Radio-Frequency Integrated Circuits, Cambridge University Press, New York, second edition, 2004.
[23] B. Razavi, Design of Analog CMOS Integrated Circuits, McGraw-Hill, New York, first edition, 2001.
[24] H. Djahanshashi, and C. A. Salama, “Differential CMOS circuits for 622-MHz/933-MHz Clock and Data Recovery Applications,” IEEE Journal of Solid-State Circuits, Vol. 35, No. 6, pp. 847-855, June 2000.
[25] R. C. Walker, “Designing bang-bang PLLs for clock and data recovery in serial data transmission systems,” in Phase-Locking in High-Performance Systems, B. Razavi, Ed., IEEE Press, pp.
34-45, 2003.
[26] B. Razavi, “Challenges in the Design of High-Speed Clock and Data Recovery Circuits,” IEEE Comm. Mag., Vol. 40, pp. 94-101, Aug. 2002.
[27] T. Siang, M. Sulaiman, C. Teik, and M. Sachdev, “Design of High-Speed Clock and Data Recovery Circuits,” Analog Integrated Circuits and Signal Processing, Vol. 52, No. 1-2, pp. 15-23, Aug. 2007.
[28] S. B. Anand, and B. Razavi, “A 2.5-Gb/s Clock Recovery Circuit for NRZ Data in 0.4-µm CMOS Technology,” IEEE Journal of Solid-State Circuits, Vol. 36, No. 3, pp. 432-439, March 2001.
[29] R. Zhang, and G. S. Larue, “Clock and Data Recovery Circuits with Fast Acquisition and Low Jitter,” Biennial Symposium on University/Government/Industry Microelectronics, pp. 82-85, Sep. 2003.
[30] H. Lee, M. Hwang, B. Lee, Y. Kim, D. Oh, J. Kim, S. Lee, D. Jeong, and W. Kim, “A 1.2-V-Only 900-mW 10 Gb Ethernet Transceiver and XAUI Interface With Robust VCO Tuning Technique,” IEEE Journal of Solid-State Circuits, Vol. 40, No. 11, pp. 2418-2158, Nov. 2005.
[31] J. Li, and F. Yuan, “A new Hybrid Phase Detector for Reduced Lock Time and Timing Jitter of Phase-Locked Loops,” Analog Integrated Circuits and Signal Processing, Vol. 56, No. 3, pp. 233-240, Sep. 2008.
[32] V. Kratyuk, P. K. Hanumolu, U. K. Moon, and K. Mayaram, “Frequency detector for fast frequency lock of digital PLLs,” Electronics Letters, Vol. 43, pp. 1361-1365, Jan. 2007.
[33] M. S. Jalali, A. S. Bakhtiar, and S. Mirabbasi, “A Charge-Pump with a High Output Swing for PLL and CDR Applications,” IEEE NEWCAS Conference, pp. 297-300, June 2010.
[34] D. A. Johns, and K. Martin, Analog Integrated Circuit Design, John Wiley & Sons Inc., New York, first edition, 1997.
[35] F. M. Gardner, “Charge-Pump Phase-Lock Loops,” IEEE Transactions on Communications, Vol. 28, No. 11, pp. 1849-1858, Nov. 1980.
[36] F. M. Gardner, Phase-lock Techniques, John Wiley & Sons Inc., New York, third edition, 2005.
[37] B.
Razavi, RF Microelectronics, Prentice-Hall Inc., New York, first edition, 1998.
[38] W. Chiu, Y. Huang, and T. Lin, “A Dynamic Phase Error Compensation Technique for Fast-Locking Phase-Locked Loops,” IEEE Journal of Solid-State Circuits, Vol. 45, No. 6, pp. 1137-1149, June 2010.
[39] W. A. Davis, and K. Agarwal, Radio Frequency Circuit Design, John Wiley & Sons Inc., New York, first edition, 2001.
[40] H. Nosaka, E. Sano, K. Ishii, M. Ida, K. Kurishima, S. Yamahata, T. Shibata, H. Fukuyama, M. Yoneyama, T. Enoki, and M. Muraguchi, “A 39-45 Gbit/s Multi-Data-Rate Clock and Data Recovery Circuit With a Robust Lock Detector,” IEEE Journal of Solid-State Circuits, Vol. 39, No. 8, pp. 1361-1365, Aug. 2004.
[41] G. Georgiou, Y. Baeyens, Y. Chen, A. Gnauck, C. Gropper, P. Paschke, R. Pullela, M. Reinhold, C. Dorschky, J. Mattia, T. Mohrenfels, and C. Schulien, “Clock and data recovery IC for 40 Gb/s fiber-optic receiver,” IEEE Journal of Solid-State Circuits, Vol. 37, No. 9, pp. 1120-1124, Sep. 2002.
[42] R. Payne, P. Landman, B. Bhakta, S. Ramaswamy, S. Wu, J. Powers, M. Erdogan, A. Yee, R. Gu, L. Wu, Y. Xie, B. Parthasarathy, K. Brouse, W. Mohammed, K. Heragu, V. Gupta, L. Dyson, and W. Lee, “A 6.25-Gb/s Binary Transceiver in 0.13-µm CMOS for Serial Data Transmission,” IEEE Journal of Solid-State Circuits, Vol. 40, No. 12, pp. 2646-2657, Dec. 2005.
[43] X. Chen, and M. M. Green, “A CMOS 10 Gb/s Clock and Data Recovery Circuit with a Novel Adjustable K_PD Phase Detector,” IEEE International Symposium on Circuits and Systems, pp. 301-304, May 2004.
[44] S. Miyazawa, R. Horita, K. Hase, K. Kato and S. Kojima, “A BiCMOS PLL-Based Data Separator Circuit with High Stability and Accuracy,” IEEE Journal of Solid-State Circuits, Vol. 26, No. 2, pp. 116-121, Feb. 1991.
[45] A. Momtaz, J. Cao, M. Caresosa, A. Hairapetian, D. Chung, K. Vakilian, M. Green, W. Tan, K. Jen, I. Fujimori, and Y.
Cai, “A Fully Integrated SONET OC-48 Transceiver in Standard CMOS,” IEEE Journal of Solid-State Circuits, Vol. 36, No. 12, pp. 1964-1973, Dec. 2001.
[46] S. Byun, J. Lee, J. Shim, K. Kim, and H. Yu, “A 10-Gb/s CMOS CDR and DEMUX IC with a Quarter-Rate Linear Phase Detector,” IEEE Journal of Solid-State Circuits, Vol. 41, No. 11, pp. 2566-2576, Nov. 2006.
[47] G. Gutierrez, S. Kong, and B. Coy, “2.488 Gb/s Silicon Bipolar Clock and Data Recovery IC for SONET (OC-48),” IEEE Custom Integrated Circuits Conference, pp. 575-578, Sep. 1998.
[48] J. Savoj, and B. Razavi, “A 10-Gb/s CMOS Clock and Data Recovery Circuit With a Half-Rate Binary Phase/Frequency Detector,” IEEE Journal of Solid-State Circuits, Vol. 38, No. 1, pp. 13-21, Jan. 2003.
[49] J. Li, and F. Yuan, “A New Bang-Bang Phase/Frequency Detector for Fast Locking of Phase-Locked Loops,” International Midwest Symposium on Circuits and Systems, pp. 5-8, Aug. 2007.
[50] C. R. Hogge, “A Self-Correcting Clock Recovery Circuit,” IEEE Transactions on Electron Devices, Vol. ED-32, No. 12, Dec. 1985.
[51] J. D. Alexander, “Clock Recovery from Random Binary Data,” Electronics Letters, Vol. 11, No. 22, pp. 541-542, Oct. 1975.
[52] A. Rezayee, and K. Martin, “A 10-Gb/s Clock Recovery Circuit with Linear Phase Detector and Coupled Two-stage Ring Oscillator,” IEEE European Solid-State Circuits Conference, pp. 419-422, Sep. 2002.
[53] D. Rennie, and M. Sachdev, “A Novel Tri-State Binary Phase Detector,” IEEE International Symposium on Circuits and Systems, pp. 185-188, May 2007.
[54] M. Meghelli, A. V. Rylyakov, S. Zier, M. Sorna, and D. Friedman, “A 0.18-µm SiGe BiCMOS Receiver and Transmitter Chipset for SONET OC-768 Transmission Systems,” IEEE Journal of Solid-State Circuits, Vol. 38, No. 12, pp. 2147-2154, Dec. 2003.
[55] K. Fukuda, H. Yamashita, F. Yuki, G. Ono, R. Nemoto, E. Suzuki, T. Takemoto, and T.
Saito, “10 Gb/s Receiver with Track-and-Hold-Type Linear Phase Detector and Charge-Redistribution 1st-order ΣΔ modulator,” IEEE International Solid-State Circuits Conference, pp. 186-188, Feb. 2009.
[56] S. Song, S. M. Park, and H. Yoo, “A 4-Gb/s CMOS Clock and Data Recovery Circuit Using 1/8-Rate Clock Technique,” IEEE Journal of Solid-State Circuits, Vol. 38, No. 7, pp. 1213-1219, July 2003.
[57] B. Sadhu, and R. Harjani, “Capacitor Bank Design for Wide Tuning Range LC VCO: 850 MHz-7.1 GHz (157%),” to appear in IEEE International Symposium on Circuits and Systems, May 2010.
[58] K. J. Wang, A. Swaminathan, and I. Galton, “Spurious Tone Suppression Techniques Applied to a Wide-Bandwidth 2.4 GHz Fractional-N PLL,” IEEE Journal of Solid-State Circuits, Vol. 43, No. 12, pp. 2787-2797, Dec. 2008.
[59] A. Pottbacker, U. Langmann, and H. Schreiber, “A Si Bipolar Phase and Frequency Detector IC for Clock Extraction up to 8 Gb/s,” IEEE Journal of Solid-State Circuits, Vol. 27, No. 12, pp. 1747-1751, Dec. 1992.
[60] C. D. Ranter, and M. Steyaert, High Data Rate Transmitter Circuits, RF CMOS Design and Techniques for Design Automation, Kluwer Academic Publishers, Netherlands, first edition, 2003.
[61] X. Shou, and M. Green, “A Family of CMOS Latches with 3 Stable Operating Points,” IEEE International Symposium on Circuits and Systems, pp. 109-112, May 2001.
[62] L. B. Goldgeisser and M. M. Green, “On the topology and number of operating points of MOSFET circuits,” IEEE Transactions on Circuits and Systems-Part 1, Vol. 48, pp. 218-221, Feb. 2001.
[63] W. Wilson, U. Moon, K. Lakshmikumar, and L. Dai, “A CMOS Self-Calibrating Frequency Synthesizer,” IEEE Journal of Solid-State Circuits, Vol. 35, No. 10, pp. 1437-1444, Oct. 2000.
[64] W. Wolf, Modern VLSI Design: System-on-Chip Design, Prentice-Hall Inc., New York, third edition, 2002.
[65] S. Brown and Z.
Vranesic, Fundamentals of Digital Logic with Verilog Design, McGraw-Hill, New York, first edition, 2003.
[66] D. Dulk, “Digital PLL lock-detection circuit,” Electronics Letters, Vol. 24, No. 14, pp. 880-882, July 1988.
[67] L. De Vito, “A versatile clock recovery architecture and monolithic implementation,” in Monolithic Phase-Locked Loops and Clock Recovery Circuits: Theory and Design, B. Razavi, Ed., IEEE Press, New York, pp. 405-442, 1996.
[68] W. Chiu, Y. Huang, and T. Lin, “A 5 GHz Phase-Locked Loop Using Dynamic Phase-Error Compensation Technique for Fast Settling in 0.18-µm CMOS,” IEEE Symposium on VLSI Circuits, pp. 128-129, June 2009.
[69] A. Nizamani, “A novel Frequency Comparator: Applications in Frequency Meters and in Difference Clock for Generator Frequency Error Monitors,” IEEE Transactions on Instrumentation and Measurement, Vol. 45, pp. 320-323, Feb. 1996.
[70] I. Young, “A PLL Clock Generator with 5 to 110 MHz of Lock Range for Microprocessors,” IEEE Journal of Solid-State Circuits, Vol. 27, No. 11, pp. 1599-1607, Nov. 1992.
[71] Y. Lin, S. Chang, Y. Liu, C. Liu, and Guan-Ying Huang, “An Asynchronous Binary-Search ADC Architecture with a Reduced Comparator Count,” to appear in IEEE Transactions on Circuits and Systems-Part 1, 2010.
[72] M. J. Ym, F. Eynde, and W. Sansen, “A High-Speed CMOS Comparator with 8-b Resolution,” IEEE Journal of Solid-State Circuits, Vol. 27, No. 2, pp. 208-211, Feb. 1992.
[73] S. Sheikhaei, S. Mirabbasi, and A. Ivanov, “A 0.35 µm CMOS Comparator Circuit for High-Speed ADC Applications,” IEEE International Symposium on Circuits and Systems, pp. 6134-6137, May 2005.
[74] D. L. Pulfrey, Understanding Modern Transistors and Diodes, Cambridge University Press, New York, first edition, 2010.
[75] M. Choi, and A. A. Abidi, “A 6-b 1.3-GSample/s A/D converter in 0.35 µm CMOS,” IEEE Journal of Solid-State Circuits, Vol. 36, No. 12, pp. 1847-1858, Dec. 2001.
[76] K.
Martin, Digital Integrated Circuit Design, Oxford University Press Inc., New York, first edition, 2000.
[77] W. Rhee, “Design of high-performance CMOS charge pumps in phase-locked loops,” IEEE International Symposium on Circuits and Systems, pp. 545-548, May 1999.
[78] S. Han, J. Jin, and C. Mao, “A Full swing charge pump with zero phase offset,” Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics, pp. 298-301, Jan. 2009.
[79] P. R. Gray, P. J. Hurst, S. H. Lewis, and R. G. Meyer, Analysis and Design of Analog Integrated Circuits, John Wiley & Sons Inc., New York, fourth edition, 2001.
[80] R. Hogervorst, J. P. Tero, R. H. Eschauzier, and J. H. Huijsing, “A Compact Power-Efficient 3 V CMOS Rail-to-Rail Input/Output Operational Amplifier for VLSI Cell Libraries,” IEEE Journal of Solid-State Circuits, Vol. 29, No. 12, pp. 1505-1513, Dec. 1994.
[81] J. Maneatis, “Low-jitter and Process-Independent DLL and PLL Based on Self-Biased Techniques,” IEEE Journal of Solid-State Circuits, Vol. 31, No. 11, pp. 1723-1732, Nov. 1996.
[82] J. Shin, M. Keel, S. Lim, and S. Kim, “Charge pump with perfect current matching characteristics in phase locked loops,” Electronics Letters, Vol. 26, pp. 1907-1908, Nov. 2000.
[83] C. Klapf, A. Missoni, W. Pribyl, G. Holweg, and G. Hofer, “Analyses and Design of Low Power Clock Generators for RFID Tags,” Prime IEEE PhD Research in Microelectronics and Electronics, pp. 181-184, July 2008.
[84] C. Cao, Y. Ding, and K. O, “A 50 GHz Phase-Locked loop in 130-nm CMOS,” IEEE Journal of Solid-State Circuits, Vol. 42, pp. 1649-1656, Aug. 2007.
[85] M. Soyuer, J. Ewen, and H. Chuang, “A Fully Monolithic 1.25 GHz CMOS Frequency Synthesizer,” IEEE Symposium on VLSI Circuits, pp. 127-128, Jun. 1994.
[86] B. Razavi, Design of Integrated Circuits for Optical Communications, McGraw-Hill, New York, first edition, 2003.
[87] J. W. Nilsson, and S. A.
Riedel, Electric Circuits, Prentice-Hall Inc., New York, sixth edition, 2001.
[88] T. H. Lee, and A. Hajimiri, “Oscillator Phase Noise: A Tutorial,” IEEE Journal of Solid-State Circuits, Vol. 35, No. 3, pp. 326-336, Mar. 2000.
[89] D. Ham, and A. Hajimiri, “Concepts and Methods in Optimization of Integrated LC VCOs,” IEEE Journal of Solid-State Circuits, Vol. 36, No. 6, pp. 896-909, June 2001.
[90] B. Sadhu, J. Kim, and R. Harjani, “A CMOS 3.3-8.4 GHz Wide Tuning Range, Low Phase Noise LC VCO,” IEEE Custom Integrated Circuits Conference, pp. 559-562, Sep. 2009.
[91] A. Thanachayanont, “CMOS Transistor-Only Active Inductor For IF/RF Applications,” IEEE International Conference on Industrial Technology, pp. 1209-1212, March 2002.
[92] F. Yuan, “A fully Differential VCO Cell with Active Inductors for Gbps Serial Links,” Analog Integrated Circuits and Signal Processing, Vol. 47, pp. 213-223, Mar. 2006.
[93] J. Zhan, J. Duster, and K. Kornegay, “A 7.3-GHz, 55% Tuning Range Emitter Degenerated Active Inductor VCO,” IEEE Bipolar/BiCMOS Circuits and Technology Meeting, pp. 60-63, Oct. 2004.
[94] H. Hayashi, and M. Muraguchi, “A Novel Broad-Band MMIC VCO Using an Active Inductor,” Analog Integrated Circuits and Signal Processing, Vol. 20, pp. 103-109, Oct. 1999.
[95] U. K. Mishra, and J. Singh, Semiconductor Device Physics and Design, Springer, Dordrecht, first edition, 2008.
[96] B. Streetman, and Sanjay Banerjee, Solid State Electronic Devices, Prentice-Hall Inc., New York, fifth edition, 1999.
[97] B. C. Kuo, Automatic Control Systems, Prentice-Hall Inc., New York, sixth edition, 1991.
[98] B. Razavi, “A Study of Phase Noise in CMOS Oscillators,” IEEE Journal of Solid-State Circuits, Vol. 31, No. 3, pp. 331-343, March 1996.
[99] D. B. Leeson, “A Simple Model of Feedback Oscillator Noise Spectrum,” Proceedings of IEEE, Vol. 54, pp. 329-330, Feb. 1966.
[100] A. Hajimiri, and T. H.
Lee, “General Theory of Phase Noise in Electrical Oscillators,” IEEE Journal of Solid-State Circuits, Vol. 33, No. 2, pp. 179-194, Feb. 1998.
[101] A. Maxim, B. Scott, E. Schneider, M. Hagge, S. Chacko, and D. Stiurca, “A Low-Jitter 125-1250 MHz Process-Independent and Ripple-Poleless 0.18 µm CMOS PLL Based on a Sample-Reset Loop Filter,” IEEE Journal of Solid-State Circuits, Vol. 36, No. 11, pp. 1673-1683, Nov. 2001.
[102] K. Lin, L. Tee, and P. Gray, “A 1.4 GHz differential low noise CMOS frequency synthesizer using a wide band PLL architecture,” IEEE International Solid-State Circuits Conference, pp. 147-149, Feb. 2000.
[103] W. Chiu, Y. Chang, and T. Lin, “A 5.5-GHz 16-mW fast-locking frequency synthesizer in 0.18 µm CMOS,” IEEE Asian Solid-State Circuit Conference, pp. 456-459, Nov. 2007.
[104] C. Yang, and S. Liu, “Fast Switching Frequency Synthesizer with a discriminator-aided phase detector,” IEEE Journal of Solid-State Circuits, Vol. 35, No. 10, pp. 1445-1452, Oct. 2000.
[105] J. Buckwalter, and A. Hajimiri, “An active Analog Delay and the Delay Reference Loop,” IEEE Radio Frequency Integrated Circuits Symposium, pp. 17-20, May 2004.
[106] R. Rosales, Integrated Silicon Bipolar Wideband Frequency Modulation Circuits For High-Performance Analog Lightwave Transmission, PhD dissertation, University of British Columbia, 2003.
[107] J. Kim, J. K. Kim, B. Lee, N. Kim, D. Jeong, and W. Kim, “A 20 GHz Phase-Locked Loop for 40 Gb/s Serializing Transmitter in 0.13-µm CMOS,” IEEE Symposium on VLSI Circuits, pp. 144-147, June 2005.
[108] N. Dalt, and C. Sandner, “A Subpicosecond Jitter PLL for Clock Generation in 0.12-µm digital CMOS,” IEEE Journal of Solid-State Circuits, Vol. 38, No. 7, pp. 1275-1278, July 2003.
[109] M. Soyuer, “A Monolithic 2.3-Gb/s 100-mW Clock and Data Recovery Circuit in Silicon Bipolar Technology,” IEEE Journal of Solid-State Circuits, Vol. 28, No. 12, pp. 1310-1314, Dec. 1993.
[110] J. Roche, W. Rahadjandrabe, L. Zad, G.
Bracmard, and D. Fronte, “A PLL with Loop Bandwidth Enhancement for Low-Noise and Fast-Settling Clock Recovery,” IEEE International Conference on Electronics Circuits and Systems, pp. 802-805, Sep. 2008.
[111] K. Cheng, W. Yang, and C. Ying, “A Dual-Slope Phase Frequency Detector and Charge Pump Architecture to Achieve Fast Locking of Phase-Locked Loop,” IEEE Transactions on Circuits and Systems-Part 2, Vol. 50, No. 11, pp. 892-896, Nov. 2003.
[112] IBM Microelectronics Division, CMOS8RF Design Manual, IBM Technology Files, March 2009.
[113] P. Yue, and S. S. Wong, “On-chip spiral inductors with patterned ground shields for Si-based RF ICs,” IEEE Journal of Solid-State Circuits, Vol. 33, No. 5, pp. 743-752, May 1998.
[114] C. Cao, and K. O, “Millimeter-Wave Voltage-Controlled Oscillators in 0.13 µm CMOS Technology,” IEEE Journal of Solid-State Circuits, Vol. 41, No. 6, pp. 1297-1304, June 2006.
[115] L. Li, and M. Green, “Power Optimization of 11.75 Gb/s Combined Decision Feedback Equalizer and Clock Data Recovery Circuit in 0.18 µm CMOS,” to appear in IEEE Transactions on Circuits and Systems-Part 1, 2010.
[116] M. Mehrabian, A. Nabavi, and N. Rashidi, “A 4 to 7 GHz Ultra Wideband VCO with Tunable Active Inductor,” IEEE International Conference on Ultra-Wideband, pp. 21-24, Sep. 2008.
[117] J. Cho, H. Lee, K. Nah, and B. Park, “A 2-GHz wide band low phase noise voltage-controlled oscillator with on-chip LC tank,” IEEE Custom Integrated Circuits Conference, pp. 559-562, Sep. 2003.
[118] T. S. Rappaport, Wireless Communications: Principles and Practice, Prentice-Hall Inc., New Jersey, second edition, 2002.
[119] S. Sarvari, T. Tahmoureszadeh, A. Sheikholeslami, H. Tamura, and M. Kibune, “A 5 Gb/s Speculative DFE for 2x Blind ADC-based Receivers in 65-nm CMOS,” to appear in IEEE Symposium on VLSI Circuits, June 2010.
[120] D. J. Comer, and D. T.
Comer, Fundamentals of Electronic Circuit Design, John Wiley & Sons Inc., New York, first edition, 2003.
[121] D. A. Hodges, H. G. Jackson, and R. A. Saleh, Analysis and Design of Digital Integrated Circuits: In Deep Submicron Technology, McGraw-Hill, New York, third edition, 2004.
[122] L. Li, S. Raghavendran, and D. T. Comer, “CMOS Current Mode Logic Gates for High-Speed Applications,” 12th NASA Symposium on VLSI Design, Idaho, USA, Oct. 2005.
[123] A. S. Sedra, and K. C. Smith, Microelectronic Circuits, Oxford University Press, New York, fifth edition, 2004.
[124] P. Heydari, and R. Mohavavelu, “Design of Ultra High-Speed CMOS CML buffers and Latches,” IEEE Transactions on VLSI Systems, Vol. 12, pp. 1081-1093, Oct. 2004.
[125] T. O. Dickson, K. Yau, T. Chalvatzis, A. M. Mangan, E. Laskin, R. Beerkens, P. Westergaard, M. Tazlauanu, M. Yang, and S. P. Voinigescu, “The Invariance of Characteristic Current Densities in Nanoscale MOSFETs and Its Impact on Algorithmic Design Methodologies and Design Porting of Si(Ge) (Bi)CMOS High-Speed Building Blocks,” IEEE Journal of Solid-State Circuits, Vol. 41, No. 8, pp. 1830-1845, Aug. 2006.
[126] B. Razavi, “The Role of PLLs in Future Wireline Transmitters,” IEEE Transactions on Circuits and Systems-Part 1, Vol. 56, pp. 450-457, Feb. 2009.
[127] S. Sheikhaei, S. Mirabbasi, and A. Ivanov, “A 43 mW Single-Channel 4 GS/s 4-Bit Flash ADC in 0.18 µm CMOS,” IEEE Custom Integrated Circuit Conference, pp. 333-336, Sep. 2007.
[128] B. Razavi, Y. Ota, and R. Swartz, “Design Techniques for Low-Voltage High-Speed Digital Bipolar Circuits,” IEEE Journal of Solid-State Circuits, Vol. 29, No. 3, pp. 332-339, March 1994.
[129] J. Savoj, and B. Razavi, “A 10-Gb/s CMOS Clock and Data Recovery Circuit with a Half-Rate Linear Phase Detector,” IEEE Journal of Solid-State Circuits, Vol. 36, No. 5, pp. 761-767, May 2001.
[130] J. Lee, and B.
Razavi, “A 40-Gb/s CMOS Clock and Data Recovery Circuit in 0.18-µm CMOS Technology,” IEEE Journal of Solid-State Circuits, Vol. 38, No. 12, pp. 2181-2190, Dec. 2003.
[131] T. Chalvatzis, K. Yau, P. Schvan, M. Yang, and S. P. Voinigescu, “A 40-Gb/s Decision Circuit in 90-nm CMOS,” IEEE European Solid-State Circuits Conference, pp. 512-515, Sep. 2006.
[132] J. Kim, J. K. Kim, B. Lee, and D. Jeong, “Design Optimization of On-Chip Inductive Peaking Structures for 0.13-µm CMOS 40-Gb/s Transmitter Circuits,” IEEE Transactions on Circuits and Systems-Part 1, pp. 2544-2555, Dec. 2009.
[133] D. Kehrer, H. Wohlmuth, H. Knapp, M. Wurzer, and A. L. Scholtz, “40-Gb/s 2:1 Multiplexer and 1:2 Demultiplexer in 120-nm Standard CMOS,” IEEE Journal of Solid-State Circuits, Vol. 38, No. 11, pp. 1830-1837, Nov. 2003.
[134] S. S. Mohan, M. Hershenson, S. P. Boyd, and T. H. Lee, “Bandwidth Extension in CMOS with Optimized On-Chip Inductors,” IEEE Journal of Solid-State Circuits, Vol. 35, No. 3, pp. 346-355, Mar. 2000.
[135] S. Shekhar, J. Walling, and D. J. Allstot, “Bandwidth Extension Techniques for CMOS Amplifiers,” IEEE Journal of Solid-State Circuits, Vol. 41, No. 11, pp. 2424-2439, Nov. 2006.
[136] Y. M. Lee, and S. Mirabbasi, “Design of an Active-Inductor-Based Termination Circuit for High-Speed I/O,” IEEE International Symposium on Circuits and Systems, pp. 3061-3064, May 2008.

Appendix A  Current-Mode Logic Gates

A.1  Introduction

Complementary metal-oxide-semiconductor (CMOS) gates are often used in digital circuits due to their low power consumption, high noise margins, and small area [120]. Their static power consumption, which results from leakage currents, can be quite small, and they can provide a rail-to-rail output, which significantly increases their noise margin. Their operation is based on the existence of a pull-up path to VDD and a pull-down path to Gnd. A set of switches controls each path, as shown in Figure A.1.
[Figure A.1: Basic CMOS structure (pull-up path to VDD, pull-down path to Gnd, output capacitor Cout)]

As can be seen in Figure A.1, the gate operates by charging (corresponding to a logical “1” output) or discharging (corresponding to a logical “0” output) the output capacitor. When the pull-up path is connected, the capacitor is charged, while it is discharged when the pull-down path is connected. Clearly, the pull-up and pull-down paths should not both be on at the same time. This ensures that no resistive path exists from VDD to Gnd, and thus the gate consumes no static power (except for the power consumed due to the leakage currents). Also, if both paths are off at the same time, the output is left in a high-impedance state, which should also be avoided. However, when the gate output is changing from “0” to “1” or from “1” to “0”, there is a short period of time during which both switches are conducting, which causes a current pulse to flow from VDD to Gnd. The power consumed in this case is called dynamic (switching) power. It can be shown that this power is proportional to the average switching activity of the gate and the output capacitance [121].

There are, however, two major problems with the above gates. First, the continuous demand for faster circuits increases the switching activity of the gates, thus increasing their power consumption. Second, each switching event causes a noise spike on the supply lines. This is not a serious problem for digital circuits, as they have a very wide noise margin, but supply noise can seriously affect analog circuits [23]. This, together with the fact that the demand for faster circuits has resulted in an unacceptable increase in the power consumption of CMOS gates, has led designers to use a different design technique to realize digital gates in high-frequency mixed-signal circuits.
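As a rough illustration of the proportionality stated above, the following Python sketch evaluates the textbook first-order model P = activity × Cout × VDD² × f for CMOS switching power. The dependence on VDD² and on the clock frequency is an assumption taken from that standard model rather than from this appendix, and the function name and all numerical values are hypothetical.

```python
def cmos_dynamic_power(activity, c_out, vdd, f_clk):
    """First-order CMOS switching power in watts (assumed textbook model).

    activity: average switching activity (fraction of cycles with a transition)
    c_out:    output capacitance in farads
    vdd:      supply voltage in volts
    f_clk:    clock frequency in hertz
    """
    return activity * c_out * vdd ** 2 * f_clk


# Hypothetical values: 10 fF load, 1.2 V supply, switching activity of 0.5.
p_1ghz = cmos_dynamic_power(0.5, 10e-15, 1.2, 1e9)
p_10ghz = cmos_dynamic_power(0.5, 10e-15, 1.2, 10e9)

print(f"1 GHz:  {p_1ghz * 1e6:.2f} uW")
print(f"10 GHz: {p_10ghz * 1e6:.2f} uW")
```

Under this model the power grows linearly with the clock frequency, which corresponds to the first of the two problems noted above.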
For the reasons discussed above, current-mode logic (CML) gates are a popular logic family for accurate high-speed mixed-signal circuits. Their design, which is mostly differential, is based on steering a reference current between two branches, depending on the state of the input signals. This is shown more clearly in Figure A.2. Clearly, one of the two branches in Figure A.2 should always be conducting. This design was first implemented using bipolar transistors and was later extended to designs with MOS transistors [122]. These gates are called MOS CML (MCML) gates (footnote 22).

Footnote 22: Emitter Coupled Logic (ECL) gates, a buffered version of CML gates, are also widely used.

[Figure A.2: Basic CML structure (load resistors R to VDD, outputs Out and Outb, left and right branches steering the reference current IRef)]

As can be seen in Figure A.2, the current drawn from the supply in a CML design is always constant, which eliminates the supply noise that results from the switching of digital gates. Also, the fully differential design of this circuit reduces the effect of supply noise and makes the circuit more robust to other common-mode noise sources. The reason for the high speed of these circuits is discussed later in this appendix. However, note that these gates draw a constant current from the supply, thus consuming static power. As shown in [122], an MCML NOR gate is about 2.5 times faster than a conventional CMOS NOR gate, while consuming more static power (about 1.2 mW per gate).

The rest of this appendix presents a brief CML design strategy. The various MCML (CML for short) gates used in the rest of the thesis are also discussed here. For each gate, the power consumption and the input/output waveforms are reported. Note that all simulations are done in a 0.13 µm CMOS technology.

A.2  CML Gate Design Techniques

Replacing the boxes of Figure A.2 with NMOS transistors, and considering that the ideal current source in the figure can be realized by an NMOS transistor operating in the saturation region, the similarity between CML gates in digital design and differential amplifiers in analog design becomes apparent. The main difference is that a differential amplifier deals with the small-signal behavior of the transistors [23], while a CML gate deals with their large-signal behavior. This is because in CML gates the input voltage swing is a few hundred millivolts. However, the techniques applied to CML gate design are more analog than digital. In this section, to gain insight into the design of CML gates, the large-signal behavior of a differential amplifier is discussed (footnote 23).

Consider the differential pair (CML buffer) shown in Figure A.3, where transistor M3 acts as a current source. When Vin is considerably larger than Vinb, the tail current is steered to the left branch, creating a voltage drop across R1. Since no voltage drop exists across R2, Vout is equal to VDD and the output is viewed as a logical “1”. The large-signal characteristic of this circuit is shown in Figure A.4 [124]. In this figure, it is assumed that R1 = R2 = RD. As can be seen, the differential output swing of the gate is RD ISS. As shown in [23], the input common-mode voltage can decrease until M3 goes into the triode region. This occurs when the drain-source voltage of M3 falls below its overdrive voltage. To obtain the maximum input common-mode level, note that as the common mode of the inputs increases, so does the voltage at the sources of transistors M1 and M2. This keeps M3 in saturation, which keeps ISS constant. This in turn causes Vout and Voutb to stay unchanged. The input common-mode level can increase until transistors M1 and M2 enter the triode region.
Depending on the values of the elements, this point can be above VDD, in which case the maximum common-mode level of the input is VDD [23]:

\[ V_{GS,12} + (V_{GS,3} - V_{THn}) < V_{in,CM} < \min\left[ V_{DD},\; V_{DD} - R_D I_{SS}/2 + V_{THn} \right] \tag{A.1} \]

where VGS,12 is the gate-source voltage of transistors M1 and M2 and VTHn is their threshold voltage. For transistors M1 and M2 to be in saturation, the output should be between the following two limits [124]:

\[ V_{in,max} - V_{THn} < V_{out} < V_{DD} \tag{A.2} \]

where Vin,max is the maximum level of the input signals. Combining (A.1) and (A.2), the maximum level for the differential swing of the output should be [124]:

\[ R_D I_{SS} < V_{THn} \tag{A.3} \]

[Figure A.3: Structure of a CML buffer]

[Figure A.4: Transfer characteristic of a CML buffer [124]]

Footnote 23: The structure of a differential amplifier is identical to that of a CML buffer. Although not discussed here, the insights can be applied to other CML gates as well.

Note that a large value for RD increases the RC delay of the gate, thus reducing its maximum speed. In [125], it is shown that to get the minimum delay from a CML gate, the tail current density should be approximately 0.3 mA/µm, irrespective of the technology node. It is also shown in [125] that the gate delay changes by less than 10% when the tail current density varies between 0.15 mA/µm and 0.5 mA/µm. For a CML gate, the optimum performance is achieved when the swing of the driving stage is sufficient to completely switch the current in the driven stage. Consider a CML gate (stage 1) driving another CML gate (stage 2). For optimum performance, the following inequality should hold [124]:

\[ R_{D1} I_{SS1} > \sqrt{\frac{2 I_{SS2}}{\mu_n C_{ox} (W/L)_2}} \tag{A.4} \]

where all variables with subscript 1 refer to the driving stage, and all variables with subscript 2 refer to the driven stage.
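To make the constraints (A.1), (A.3) and (A.4) concrete, the following Python sketch evaluates them for one operating point. It is only a sketch: the function names and every numerical value are hypothetical, and the right-hand side of (A.4) is read here as a square root so that both sides of the inequality are voltages.

```python
import math


def input_common_mode_range(vdd, vthn, vgs12, vgs3, rd, iss):
    """Return (lower, upper) bounds on the input common-mode level per (A.1)."""
    lower = vgs12 + (vgs3 - vthn)
    upper = min(vdd, vdd - rd * iss / 2 + vthn)
    return lower, upper


def swing_keeps_saturation(rd, iss, vthn):
    """Check (A.3): the differential swing RD*ISS must stay below VTHn."""
    return rd * iss < vthn


def min_driving_swing(iss2, un_cox, w_over_l2):
    """Right-hand side of (A.4), assumed to be sqrt(2*ISS2 / (un*Cox*(W/L)2))."""
    return math.sqrt(2 * iss2 / (un_cox * w_over_l2))


# Hypothetical operating point (values are not taken from the thesis).
VDD, VTHN = 1.2, 0.4              # volts
VGS12, VGS3 = 0.6, 0.55           # volts
RD, ISS = 300.0, 1e-3             # ohms, amperes (same for both stages)
UN_COX, W_OVER_L2 = 400e-6, 80.0  # A/V^2, dimensionless

vcm_lo, vcm_hi = input_common_mode_range(VDD, VTHN, VGS12, VGS3, RD, ISS)
swing = RD * ISS
swing_ok = swing_keeps_saturation(RD, ISS, VTHN)
drive_ok = swing > min_driving_swing(ISS, UN_COX, W_OVER_L2)

print(f"Input common-mode range: {vcm_lo:.2f} V to {vcm_hi:.2f} V")
print(f"Swing {swing:.2f} V satisfies (A.3): {swing_ok}, (A.4): {drive_ok}")
```

For these assumed values the swing of 0.3 V lies between the minimum required by (A.4) and the maximum allowed by (A.3), which is the window a designer would aim for.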
(W/L)2 is the width-to-length ratio of M1 and M2 in the driven stage. Therefore, to design a CML gate, ISS is first chosen according to the power budget. RD is then obtained from (A.3). Knowing the current density of M3, its width is calculated. Finally, using (A.4), the (W/L) of M1 and M2 is calculated.

A.3  NAND

Figure A.5 shows the structure of a CML NAND gate.

[Figure A.5: CML NAND gate]

As can be seen in Figure A.5, when Vin1 − Vin1b and Vin2 − Vin2b are both logical “1”, transistors M1 and M3 conduct while M2 and M4 are off. This steers the current into the left branch, generating a voltage drop across R1. Thus, Vout = VDD − R1 I1. However, the current through R2 is zero and thus Voutb = VDD. So Vout − Voutb is a logical “0”. It can similarly be shown that in all other cases for Vin1 − Vin1b and Vin2 − Vin2b, the current is steered into the right branch instead, generating a logical “1” for Vout − Voutb. This is summarized in Table A.1. As can easily be seen from this table, the structure shown in Figure A.5 is a NAND gate.

Table A.1: Logic of a CML NAND gate
Vin1 − Vin1b    Vin2 − Vin2b    Vout − Voutb
0               0               1
0               1               1
1               0               1
1               1               0

Note that in Figure A.5, for the current to switch between the two branches, the required input voltage swing is much less than VDD. It is shown in [123] that in a differential circuit, the current completely switches from one branch to the other if the difference between the two inputs exceeds √2 Vov, where Vov is the overdrive voltage of the transistors. It is shown in [125, 131] that for a CML gate the minimum peak-to-peak input voltage swing is approximately 300 mV on each side in the 90-nm node and about 400 mV in the 130-nm node. The significance of this number is that for a normal CMOS gate, the switching of the output starts when the input passes approximately VDD/2 (footnote 24).
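The current steering summarized in Table A.1 can be sketched with a small behavioural model in Python. The model treats the transistors as ideal switches, which is a simplification of the circuit in Figure A.5, and the function name, load resistance and tail current are hypothetical values chosen only for illustration.

```python
def cml_nand(in1, in2, vdd=1.2, r_load=300.0, i_tail=1e-3):
    """Ideal-switch model of the CML NAND gate of Figure A.5.

    in1, in2: logic values of Vin1 - Vin1b and Vin2 - Vin2b (0 or 1).
    Returns (vout, voutb, logic), where logic is the value of Vout - Voutb.
    """
    # The tail current reaches the left branch only when both M1 and M3 conduct.
    left_branch_on = bool(in1) and bool(in2)
    drop = r_load * i_tail  # voltage drop across the conducting load resistor
    vout = vdd - drop if left_branch_on else vdd
    voutb = vdd if left_branch_on else vdd - drop
    return vout, voutb, int(vout > voutb)


# Reproduce the rows of Table A.1.
for a in (0, 1):
    for b in (0, 1):
        vout, voutb, logic = cml_nand(a, b)
        print(a, b, logic, f"(Vout={vout:.2f} V, Voutb={voutb:.2f} V)")
```

Because exactly one branch carries the tail current in every row, the supply current in this model is the same for all input combinations.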
For differential inputs, and considering that VDD is 1.2 V in the 0.13 µm CMOS technology used, each input should change by about 400 mV for switching to start, which is 0.33 of VDD. This enables CML gates to work at very high frequencies. In fact, [126] reports a CML flip-flop suitable for operation at a data rate of 80 Gb/s in 65-nm CMOS technology. Finally, note that a dummy NMOS transistor is usually placed in series with M2 in Figure A.5 to equalize the delay from all inputs to the output [127].

Footnote 24: Called the gate threshold voltage.

The above NAND gate consumes 1.8 mW. Figure A.6 shows the output of a CML NAND gate, where the solid line is the output and the dotted lines are the inputs. To obtain this figure, the output of the gate was loaded with an identical NAND gate. The frequency of one of the inputs is 10 GHz, while the frequency of the second input is chosen to be much lower.

[Figure A.6: CML NAND gate output]

The input frequency of the CML NAND gate was then increased to 33 GHz. The consumed power is still 1.8 mW, which shows that the power consumption of a CML gate is independent of the input frequency. Note that the differential output swing has decreased. Figure A.7 shows the gate output for this case. Note that to obtain an AND gate, the output wires are simply swapped: if Vout − Voutb in Figure A.5 is the output of a NAND gate, then Voutb − Vout is the output of an AND gate, and an inverter is not needed (footnote 25). In fact, inverters are rarely used in CML designs, which is another advantage of this logic family.

Footnote 25: This is true for all the other gates introduced here.

[Figure A.7: CML NAND gate output at 33 GHz]

A.4  XOR

Figure A.8 shows the structure of a CML XOR gate.
In this figure, if both inputs are a logical “0” or both are a logical “1”, the current is steered to the left branch, while it is steered to the right branch if the two inputs are different. One important point in the design of this gate is to ensure that all inputs see the same capacitance. To do this, the width of transistors M1 and M2 is made twice the width of the other transistors. Other structures for high-frequency CML XOR gates have also been proposed. [128] proposes a high-frequency XOR gate using bipolar junction transistors (BJTs). The gate has two differential inputs and a single-ended output. It also requires a reference voltage, which should be equal to the common-mode level of the input signals. [129] modifies [128] by replacing the BJTs with MOS transistors. Finally, [130] modifies both of these designs by eliminating the required reference voltage. However, none of these XOR gates is used in the current design, as the frequency of operation of the gate shown in Figure A.8 was sufficient for this work.

[Figure A.8: CML XOR gate]

The output waveform of the XOR gate, loaded with another XOR gate, is shown in Figure A.9, where the dotted lines are the inputs and the solid line is the output. The gate consumes 1.92 mW of power.

[Figure A.9: CML XOR gate output]

A.5  MUX

Figure A.10 shows the structure of a CML multiplexer (MUX) gate.

[Figure A.10: CML MUX gate]

In the above figure, if Sel − Selb is logical “1”, M1 is on, steering the current to the sources of transistors M3 and M4. The value of Vin2 − Vin2b then determines whether the current is steered to the right or to the left branch. Thus, if Sel − Selb is logical “1”, Vin2 − Vin2b controls the value of Vout − Voutb.
Similarly, if Sel − Selb is logical “0”, Vin1 − Vin1b controls the value of Vout − Voutb, which is the definition of a MUX gate. Figure A.11 shows the simulation result of the loaded MUX gate. In the top plot, the blue dash-dotted line is Vin1 − Vin1b, the red dashed line is Vin2 − Vin2b and the purple solid line is the output. The bottom plot is the select signal of the MUX gate. As can be seen, when Sel − Selb = 0, the output is equal to Vin1 − Vin1b, as expected. The gate consumes 1.56 mW of power.

[Figure A.11: CML MUX gate output]

A.6  Latch

Figure A.12 shows the structure of the latch gate used, where Vin and Vinb are the differential input and VClk and VClkb are the differential clock used for latching. In this figure, when VClk − VClkb is a logical “1”, transistor M5 conducts while transistor M6 is off, thus disabling the positive feedback formed by transistors M3 and M4. In this clock phase, Vin and Vinb create a voltage difference between Vout and Voutb. In the next clock phase, when VClk − VClkb is a logical “0”, transistor M6 is on. In this case, the current flows to transistors M3 and M4, turning the positive feedback on. This feedback subsequently “latches” the value of Vout and Voutb. Note that in this state, the gate operation becomes independent of the input signal, as transistor M5 is turned off.

[Figure A.12: CML latch gate]

Also, the latch of Figure A.12 can easily be extended to a resettable latch with a differential reset signal, reset − resetb. This is done by adding an NMOS transistor, whose gate is connected to the reset input, from Vout to Gnd, and a PMOS transistor, whose gate is connected to the resetb input, from Voutb to VDD. Note that these transistors should be large enough to be able to reset the latch.
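The two clock phases of the latch of Figure A.12 described above can be captured in a purely logical Python model. This is a behavioural sketch only: it ignores voltages, delays and the reset extension, and the class and method names are hypothetical.

```python
class CmlLatch:
    """Logic-level model of a level-sensitive latch (no analog effects)."""

    def __init__(self, q=0):
        self.q = q  # stored logic value of Vout - Voutb

    def step(self, clk, d):
        """Apply one clock phase and return the output.

        clk: logic value of VClk - VClkb
        d:   logic value of Vin - Vinb
        """
        if clk:
            # M5 conducts: the input pair sets the output (tracking phase).
            self.q = d
        # Otherwise M6 conducts: the cross-coupled pair M3/M4 holds the value.
        return self.q


latch = CmlLatch()
phases = [(1, 1), (0, 0), (0, 1), (1, 0), (0, 1)]  # (clk, d) pairs
trace = [latch.step(clk, d) for clk, d in phases]
print(trace)
```

In the trace, the output follows the data only in phases where the clock is a logical “1” and ignores input changes while the clock is a logical “0”.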
Figure A.13 shows the simulation result of the latch gate, loaded with another latch gate, where the dashed blue line is the clock, the dotted red line is the data, and the solid purple line is the output. Note that the two inputs are also labeled on the figure. The gate consumes 1.92 mW of power.

[Figure A.13: CML latch gate output]

A.7  Fast Latch

As will be shown later, a latch capable of operation at around 20 GHz is needed in the hybrid phase/frequency detector block. Due to the high capacitive loading on this latch, the latch introduced in Section 3.6 cannot be used. This makes it necessary to change the structure of the conventional CML latch to enable it to work at higher frequencies. Several designs for a high-speed latch have been proposed in the literature. [131] uses inductive peaking to extend the frequency of operation of the latch gates to about 32 GHz. [125, 132, 133] use the same method to improve the speed of CML MUX, latch and buffer gates. The same idea can also be used to increase the bandwidth of analog amplifier blocks [134, 135]. The basic idea is to place two inductors in series with the drain resistors, as shown in Figure A.14.

[Figure A.14: Inductive peaking on CML gates]

The above gate works by designing the inductor such that it resonates with the load capacitor, which increases the speed of operation of the gate. To justify this physically, note that the current through the inductor cannot change instantaneously. Thus, immediately after the state of the output changes, all the available current flows to the load capacitor, changing its voltage faster than usual. However, passive inductors are bulky components, and active inductors increase the power consumption, degrade linearity and limit the output swing [136].
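The speed-up from inductive peaking can be illustrated with a small-signal sketch in Python. It models one output node as a resistor R in series with an inductor L, all in parallel with the load capacitor C, and solves for the frequency at which the impedance magnitude falls to R/√2. This linear model is an assumption made for illustration (the gates above operate in large-signal mode), and the function name and component values are hypothetical.

```python
import math


def peaked_bandwidth(r, c, l_peak):
    """-3 dB bandwidth (rad/s) of (R + sL) in parallel with C.

    Solves |Z(jw)|^2 = R^2 / 2, which reduces to a quadratic in x = (w*R*C)^2:
    k^2 * x^2 + (1 - 2k - 2k^2) * x - 1 = 0, with k = L / (R^2 * C).
    """
    if l_peak == 0:
        return 1.0 / (r * c)  # plain RC load
    k = l_peak / (r * r * c)
    b = 1 - 2 * k - 2 * k * k
    x = (-b + math.sqrt(b * b + 4 * k * k)) / (2 * k * k)
    return math.sqrt(x) / (r * c)


# Hypothetical load: 300 ohm, 50 fF, and a 1.8 nH peaking inductor.
R, C, L_PEAK = 300.0, 50e-15, 1.8e-9

bw_rc = peaked_bandwidth(R, C, 0.0)
bw_peaked = peaked_bandwidth(R, C, L_PEAK)
ratio = bw_peaked / bw_rc

print(f"RC bandwidth:     {bw_rc / (2 * math.pi) / 1e9:.1f} GHz")
print(f"Peaked bandwidth: {bw_peaked / (2 * math.pi) / 1e9:.1f} GHz")
print(f"Improvement:      {ratio:.2f}x")
```

For the assumed values the inductor extends the bandwidth of the load by roughly 1.7 times without any additional static current, at the cost of the area of the inductor.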
Finally, note that other forms of peaking can also be used to improve speed, but they mostly occupy a lot of area. The second method is to use the latch gate introduced in [124]. However, this gate requires a reference voltage equal to the common-mode level of the input. It also occupies more area on chip. The third way to realize a fast latch is to use a feedforward path, as shown in Figure A.15 [126]. Note that in [126], inductive peaking is also used on the gate, which is not shown here.

[Figure A.15: A latch gate with a feedforward path]

Unlike the other gates, which also work at lower frequencies, this latch only works well if the clock frequency is high. The high-speed operation of this latch is made possible by the feedforward path added to the gate through transistors M7, M8 and M9. When Clk is high, the gate tracks the inputs and operates normally. However, when Clk is low, the gate latches the inputs through transistors M3 and M4, but also weakly tracks the inputs through transistors M7 and M8. This weak tracking causes the latch to change states very quickly, if needed, when Clk goes high again. Note that the weak tracking is only possible if I2 is much less than I1. Figure A.16 shows the simulation results of the loaded fast latch gate, where the dotted red line is the clock, the dashed blue line is the data, and the solid purple line is the output. The tracking mode of operation is indicated on the figure. Note that the clock frequency is 20 GHz and the gate has an almost full swing. The gate consumes 2.5 mW of power. In the actual PD, the increased loading on the latches results in a smaller output swing, since in the simulation above the latch is loaded only with another fast latch.

[Figure A.16: Output of the fast latch gate]
Finally, note that a flip-flop can easily be designed by connecting two latch gates in series and clocking them on opposite phases of the clock.

A.8  Conclusion

In this appendix, all the digital gates used throughout this thesis were introduced, and their simulated outputs were shown. All simulations were done in a 0.13 µm CMOS technology with a supply voltage of 1.2 V.
