# Embedded Test Circuits and Methodologies for Mixed-Signal ICs

by

Sassan Tabatabaei-Zavareh

M. Sc. (Electrical & Computer Engineering), The University of Calgary, 1994

B. Sc. (Electrical & Computer Engineering), Isfahan University of Technology, 1991

#### A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF

#### THE REQUIREMENTS FOR THE DEGREE OF

## **Doctor of Philosophy**

in

#### THE FACULTY OF GRADUATE STUDIES

(Department of Electrical and Computer Engineering)

We accept this thesis as conforming to the required standard

## The University of British Columbia

February 2000

© Sassan Tabatabaei-Zavareh, 2000

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Electrical & Computer Engineering

The University of British Columbia Vancouver, Canada

20, 2000 Date

# Abstract

The rapid pace of the integrated circuit industry towards more miniaturization is making system-on-chip (SOC) a reality. For practical implementations of SOC, however, the test issues of such devices must be addressed through the integration of design-for-testability (DFT), built-in self-test (BIST), and embedded test for embedded blocks, such as digital, memory, and mixed-signal circuits.

This thesis presents two novel embedded test solutions for mixed-signal circuits. The first one is a built-in current monitor (BICM) suitable for power supply current ($I_{DD}$) testing. The BICM includes a built-in current sensor (BICS) which provides high measurement sensitivity without introducing a large impedance in the $I_{DD}$ path. Although the BICS structure has been proposed before, the new circuit analysis and chip measurement results provide important insights into the BICS characteristics and design trade-offs. The BICM also includes a mixed-signal built-in current integrator (BICI) which generates a digital signature proportional to the average $I_{DD}$ ($\overline{I_{DD}}$). Two different circuits have been developed for the BICI: a single-phase and a double-phase BICI; the first is less accurate but requires less silicon area. These new BICI architectures offer an advantage over previously proposed circuits because they can perform integration over large time windows ($T > 1$ ms) while occupying a chip area equivalent to only a few hundred NAND gates. The BICM is compact, accurate (error < 2%), and insensitive to process and temperature variations.

The second embedded test circuit is designed for non-intrusive functional testing of high-speed clock-recovery units (CRU) and clock-synthesis units (CSU). To the author's knowledge, this new structure is the first circuit which can perform on-chip, single-shot jitter measurement with high resolution and precision without requiring element matching. The simulation and analysis predict a jitter measurement resolution of 10 ps and a precision of 11 ps in a 0.35 $\mu m$ CMOS technology under typical power supply and thermal noise conditions. Combined with a jitter generator block, it can test the intrinsic jitter and jitter transfer characteristics of CRUs and CSUs. The circuit is digital, partially synthesizable, and automatically placeable and routable. A novel gate delay model and analysis techniques, supported by simulation, are also introduced to evaluate the accuracy of the circuit.

# Contents

- Abstract
- Contents
- List of Tables
- List of Figures
- Claims of Originality
- Acknowledgements
- Dedication
- 1 Motivations and Overview
  - 1.1 Built-in current monitoring
    - 1.1.1 On-chip current sensing
    - 1.1.2 On-chip current averaging
  - 1.2 PLL testing
    - 1.2.1 Jitter testing
  - 1.3 Motivations and Contributions
- 2 Background And Survey
  - 2.1 IC Testing
    - 2.1.1 Why test integrated circuits?
    - 2.1.2 Test and diagnosis
    - 2.1.3 Functional versus structural testing
    - 2.1.4 Defects and faults
    - 2.1.5 Test generation
    - 2.1.6 Design for testability (DFT), built-in-self-test (BIST), and embedded test
    - 2.1.7 Digital circuit testing
  - 2.2 Analog and Mixed-Signal Testing
    - 2.2.1 History
    - 2.2.2 Testing analog circuits
    - 2.2.3 Analog and mixed-signal DFT
    - 2.2.4 Mixed-signal BIST
  - 2.3 Conclusions
- 3 Built-In Current Monitor for Testing Analog Circuit Blocks
  - 3.1 Built-In Current Sensor (BICS)
    - 3.1.1 BICS circuit
    - 3.1.2 BICS accuracy
    - 3.1.3 Calculating $Z_{BIC}$
  - 3.2 Single-phase BICI
    - 3.2.1 Basis
    - 3.2.2 Notation and definitions
    - 3.2.3 Circuit operation
    - 3.2.4 BICI two-point calibration
    - 3.2.5 Single-phase BICI accuracy
    - 3.2.6 Circuit implementation
  - 3.3 Simulation and Experimental Results
    - 3.3.1 BICS
    - 3.3.2 Single-phase BICI
  - 3.4 Conclusions
- 4 Double-Phase Built-In Integrator
  - 4.1 Introduction
  - 4.2 Double
    - 4.2.1
    - 4.2.2
    - 4.2.3
    - 4.2.4
    - 4.2.5
    - 4.2.6
    - 4.2.7
- 5 On-Chip Jitter Specification Testing of High-Performance PLLs
  - 5.1 Jitter Definitions
    - 5.1.1 PLL jitter specifications
  - 5.2 Jitter Measurement Circuit (JMC)
    - 5.2.1 State of the art in TDC design
    - 5.2.2 High-resolution TDC
    - 5.2.3 Notation and definitions
    - 5.2.4 Time quantizer
    - 5.2.5 Measurement range extension
    - 5.2.6 Calibration
    - 5.2.7 Automatic resolution adjustment
    - 5.2.8 Controlled load (CL) cell design
    - 5.2.9 Resolution adjustment control block
    - 5.2.10 TDC error sources
    - 5.2.11 Accuracy, Precision, and Resolution
  - 5.3 Jitter Generator
  - 5.4 Schemes for On-Chip Jitter Specification Testing
    - 5.4.1 Cycle-to-cycle jitter measurement
    - 5.4.2 Relative jitter measurement
  - 5.5 Implementation
  - 5.6 Simulation Results
    - 5.6.1 Jitter measurement circuit
    - 5.6.2 Accuracy estimation
  - 5.7 Conclusions
- 6 Summary and Conclusions
  - 6.1 Future Research
- Bibliography
- Appendix A BICS Frequency Response Analysis
- Appendix B N and $\overline{I}$ Relationship in the Single-Phase BICI
- Appendix C N and $\overline{I}$ Relationship in the Double-Phase BICI
- Appendix D TDC Calibration
  - D.1 Two-point Calibration
  - D.2 *n*-point Calibration Technique
- Appendix E Metastability window of a D flip-flop
- Appendix F Range Extender Block Analysis
- Appendix G Two-Parameter Model for $V_{dd}$-induced Gate Delay Variations
  - G.1 Test 1: Single Gate Delay Simulations
  - G.2 Test 2: Ring Oscillator Test
- Appendix H TDC Power Supply Noise Analysis

# **List of Tables**

| Table | Caption |
|-------|---------|
| 3.1 | Area overhead comparison of two current mirror matching techniques |
| 3.2 | Low frequency impedance and bandwidth of $Z_{BIC}$ |
| 3.3 | Single-phase BICI test signals |
| 3.4 | Simulation results for the average current measurements by single-phase BICI circuit |
| 4.1 | Double-phase BICI test signals |
| 4.2 | Simulations of average current measurements by double-phase BICI |
| 5.1 | $T_{dif}$ and $S_{V_{ctrl}}^{T_{dif}}$ for different styles of CL cells. Numbers 1 through 6 in the first column refer to the states $\vec{a} = 000001, 000011, 000111, 001111, 011111, 111111$ in the test bench circuit shown in Fig. 5.10, respectively |
| 5.2 | Specifications of the implemented CL cells |
| A.1 | Number of product terms in the coefficients of $N_{BIC}(s)$ and $D_{BIC}(s)$ |

# **List of Figures**

| Figure | Caption |
|--------|---------|
| 1.1 | CUT and the current sensor |
| 1.2 | $\overline{I_{DD}}$ dependence on the duration and beginning of the averaging window |
| 2.1 | Integrated circuit design and test flow |
| 2.2 | Integrated circuit test stages |
| 2.3 | Examples of faults due to local defects |
| 2.4 | Inductive fault analysis flow diagram |
| 3.1 | The proposed current monitor |
| 3.2 | The block diagram of the current sensor |
| 3.3 | Current sensor circuit |
| 3.4 | AC model of a MOS transistor |
| 3.5 | Single-phase BICI functional block diagram |
| 3.6 | Single-phase BICI circuit schematic |
| 3.7 | Timing diagram and different waveforms in the BICI circuit of Fig. 3.6 |
| 3.8 | The transistor-level schematic of the integrator circuit |
| 3.9 | Schematic of the comparator |
| 3.10 | $Z_{BIC}$ versus frequency |
| 3.11 | $I_{sense}/K_s I_{DD}$ transfer function for $I_{DD} = 1$ mA at the operating point ($K_s = 1/6$) |
| 3.12 | $V_{BIC}$ versus $I_{DD}$ DC characteristics |
| 3.13 | Measurement and simulation results for $ERR$ versus $I_{DD}$ considering the parasitic resistances in the circuit |
| 3.14 | Current waveforms SPK1, SPK2, and SQ for validating the operation of the single-phase BICI circuit |
| 4.1 | Double-phase BICI functional block diagram |
| 4.2 | Quantization residue feed forward technique used in the ADC of double-phase BICI |
| 4.3 | Current integrating circuit schematic |
| 4.4 | Timing diagram and different waveforms in the integrator circuit of Fig. 4.3 |
| 4.5 | Half-wave current integrating circuit schematic |
| 4.6 | The transistor-level schematic of the half-wave integrator circuit (HCI) |
| 4.7 | Schematic of the comparator |
| 4.8 | Current waveforms for validating the operation of the integrator circuit |
| 5.1 | (a) Measuring cycle-to-cycle or period jitter, (b) Measuring accumulative jitter using a reference clock, (c) Measuring relative jitter |
| 5.2 | Block diagram of the proposed jitter measurement circuit |
| 5.3 | Time digitization using a delay chain |
| 5.4 | Time digitization using differential delay technique |
| 5.5 | Block diagram of the proposed TDC circuit |
| 5.6 | Time digitization using two oscillator period difference method |
| 5.7 | Measurement range extension to $(2^k - 1)T_A$ |
| 5.8 | $KT_{ref}$ interval selection circuit |
| 5.9 | Automatic resolution adjustment circuit |
| 5.10 | CL cell evaluation test bench |
| 5.11 | Different CL cell styles: (a1, b1, c1, d1, e1) Circuits; (a2, b2, c2, d2, e2) simplified models |
| 5.12 | Circuit for checking the necessary condition that $T_A > T_B$ |
| 5.13 | Alternative circuit for checking the condition $T_A > T_B$ |
| 5.14 | Circuit to generate $\tau_{A1} + \tau_{A2} < T_d < T_B$ for TATB checker circuit |
| 5.15 | Algorithms for selecting $\vec{a}$ and $\vec{b}$ in uniform load CL method, (a) exhaustive search, (b) directed search |
| 5.16 | Exhaustive search algorithms for selecting $\vec{a}$ and $\vec{b}$ in the 'incremental step' CL cell method |
| 5.17 | Semi-exhaustive search algorithm for selecting $\vec{a}$ and $\vec{b}$ in the 'incremental step' CL cell method |
| 5.18 | Fast search algorithm for selecting $\vec{a}$ and $\vec{b}$ in the 'incremental step' CL cell method |
| 5.19 | Jitter generator circuit |
| 5.20 | Cycle-to-cycle jitter measurement |
| 5.21 | Relative jitter measurement |
| 5.22 | Top block-level schematic of the jitter measurement circuit |
| 5.23 | Implemented TQ circuit |
| 5.24 | TDC resolution adjustment simulation waveforms |
| 5.25 | TDC measurement error for resolution of 34.1 ps |
| E.1 | The test bench for estimating the metastability window of a D flip-flop |
| E.2 | Simulation results of the flip-flop output for five different values of $\tau_D$ |
| E.3 | Simulated clk-to-Q delay versus $\tau_D$ |
| G.1 | The test bench for validating the two-parameter model for $V_{dd}$-induced gate delay variations |
| G.2 | The ring oscillator test bench to validate the two-parameter model for $V_{dd}$-induced gate delay variations |
| G.3 | Ring oscillator period jitter from simulation and the two-parameter model for $V_{dd}$-induced gate delay variations, a) $f=1.1$ GHz, b) $f=350$ MHz, c) $f=50$ MHz, d) $f=5$ MHz |

# Claims of Originality

#### Chapter 3:

- A novel analytical method for obtaining the frequency response of a built-in current sensor (BICS). This analysis leads to closed-form equations for the frequency domain characteristics of the BICS.
- IC prototype measurement results for the BICS. These results lead to conclusions about the critical importance of the routing resistances in BICS layout design.
- A novel single-phase built-in current integrator (BICI). Both the current integration technique and the circuit implementation are new. This design is small enough to reside on-chip, provides a digital signature, and is robust against process variations.

#### Chapter 4:

- A novel *quantization residue feed forward* technique. This technique prevents the accumulation of ADC quantization noise, which significantly enhances the accuracy of the BICI. Using this technique leads to a new, double-phase BICI which provides high accuracy for any power supply current waveform.

#### Chapter 5:

- A novel time quantizer (TQ) structure. This TQ structure achieves high resolution and high accuracy simultaneously.
- New resolution adjustment (RA) techniques for time interval measurement. These techniques guarantee high resolution under process variations.

- A high-resolution time-to-digital converter (TDC). The TDC uses the designed TQ and resolution adjustment circuits to measure time intervals with high resolution and accuracy. The design of this circuit is compatible with a conventional digital design flow.
- A jitter measurement circuit (JMC). The JMC uses the designed TDC to perform single-shot jitter measurement with a resolution and accuracy of approximately 10 ps.
- A jitter generator circuit. This circuit performs digital phase modulation using only digital gates. It can be used for jitter tolerance testing of phase locked loops.
- A novel power supply noise-induced gate delay variation model. This model has only two fitting parameters and is used to show the immunity of the TQ to power supply noise.

# Acknowledgements

My special thanks go to Dr. André Ivanov, my supervisor, whose invaluable support and constant help, encouragement and guidance made this contribution possible. He has been a great source of inspiration and a role model to me, not only in a technical and educational sense but also in many aspects of life as a person.

Many thanks also to my supervisory committee, especially Dr. Mike Jackson, for reading my thesis carefully and giving me precious feedback, which resulted in a great improvement in both the style and content of this thesis.

I also thank Mr. Yong Luo and Mr. Zhurang Zhao for their help in testing the fabricated chips and writing VHDL code, and also for their feedback and interesting discussions about different circuits. My thanks also to Mr. Farinam Farahmand and Mr. Alireza Moshtaghi for writing the CAD tools which were used from time to time in the course of this research.

I'd like also to thank Micronet, PMC-Sierra for their financial support and CMC for providing design tools and access to fabrication facilities.

Throughout the years of my Ph.D., so many family and friends have been great sources of motivation and inspiration to me which enabled me to complete this work. Thousands of thanks to all of them specially my loving parents and my wonderful sisters, Nasrin and Nooshin. And, so many thanks to my fiancée, Mojgan, who understood, helped, and encouraged me during the intense and sometimes difficult months that I was busy writing this thesis.

## Sassan Tabatabaei-Zavareh

The University of British Columbia

February 2000

To my loving parents

# Chapter 1

# **Motivations and Overview**

Testing is an integral part of integrated circuit (IC) production which ensures the correct functionality of the final product shipped to the customer. Smaller geometry technologies, higher density [1], increased performance, and implementation of different sub-systems (analog, digital, memory, micro-electro-mechanical systems (MEMS), ...) on the same chip require new test methodologies. Without such methodologies, testing cost can become the major hurdle in the way of the future progress of the IC production industry [2].

This work is an effort to provide effective test solutions for testing widely used mixed-signal circuits, more specifically phase-locked loop (PLL) circuits. Two different strategies have been developed for testing embedded mixed-signal circuits: power supply current testing and jitter testing. The first scheme can be used to test any analog and mixed-signal circuit, whereas the second one is more appropriate for timing circuits such as clock recovery, clock synthesis, and clock skew compensation circuitry, which are often PLL based.

This chapter outlines the state of the art and motivations of the research reported in the rest of this thesis. Sec. 1.1 and 1.2 review the literature and the incentive for designing the on-chip current monitor and jitter test circuits, respectively. A background of IC testing, and analog and mixed-signal test methodologies, including the definitions of the major concepts, is given in Chapter 2 for readers who are interested or do not have sufficient background in mixed-signal IC testing.

## **1.1** Built-in current monitoring

The success of supply current monitoring in digital CMOS integrated circuits (IDDQ testing) [3, 4, 5] has prompted researchers to investigate the feasibility of monitoring supply current ( $I_{DD}$ ) in analog circuits as a testing methodology [6, 7, 8, 9]. The results of these investigations suggest that  $I_{DD}$  testing, although not sufficient by itself, can offer additional fault coverage for analog circuits.

Often,  $I_{DD}$  testing in analog circuits is based on measuring the average, rms, or the value of the current at specific times, and comparing these against a pre-defined tolerance range. If the measured value falls outside the associated fault-free tolerance range, the circuit is declared faulty [6, 7, 8]. Average  $I_{DD}$  is a convenient and attractive test signature because it is a compact indicator of the current value over time. It also serves as a means of power monitoring when the supply voltage is constant [10].
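The pass/fail decision described above can be sketched in a few lines of Python. This is only an illustration: the function names, the sample values, and the 90 µA to 110 µA tolerance band are hypothetical and are not taken from the thesis.

```python
def idd_average(samples):
    """Return the arithmetic mean of equally spaced I_DD samples (amperes)."""
    return sum(samples) / len(samples)


def passes_idd_test(samples, low, high):
    """Declare the circuit fault-free only if the average I_DD lies in [low, high]."""
    return low <= idd_average(samples) <= high


# Hypothetical fault-free tolerance band around a 100 uA nominal average.
LOW, HIGH = 90e-6, 110e-6

good_device = [98e-6, 101e-6, 103e-6, 99e-6]      # average is 100.25 uA
faulty_device = [140e-6, 150e-6, 145e-6, 149e-6]  # average is 146 uA

print(passes_idd_test(good_device, LOW, HIGH))    # True
print(passes_idd_test(faulty_device, LOW, HIGH))  # False
```

In practice the tolerance band would be derived from process and temperature variations of the fault-free circuit; here it is simply assumed.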

Due to the lack of resolution of off-chip current sensors, parasitic currents of pads, and also fault masking effects (*e.g.*, the  $I_{DD}$  of one block masking the faulty  $I_{DD}$  of another one), it is necessary to isolate the analog blocks in small groups, sense the supply current of each group using built-in current sensors (BICS), and average the associated  $I_{DD}$ 's on the chip [6].

## 1.1.1 On-chip current sensing

A number of different BICS's have been proposed for current monitoring. All these sensors insert a current sensing device [11, 12, 13, 14] or rely on the circuit intrinsic impedance in the $I_{DD}$ path [15], as shown in Fig. 1.1.

Figure 1.1: CUT and the current sensor.

A voltage drop across the sensing device ( $V_{BIC}$ ) results as $I_{DD}$ passes through it. Therefore, the circuit under test (CUT) supply voltage, $V_{CUT}$, is:

$$V_{CUT} = V_{DD} - V_{BIC} \tag{1.1}$$

Drawbacks of the  $V_{BIC}$  voltage drop include performance degradation of the CUT, *i.e.*, additional delay and noise in the circuit [16].

 $V_{BIC}$  can usually be expressed as:

$$V_{BIC} = V_{DC} + V_I = V_{DC} + f(I_{DD}) \tag{1.2}$$

where  $V_{DC}$  and  $V_I$  are the  $I_{DD}$ -independent and  $I_{DD}$ -dependent components of  $V_{BIC}$ , respectively, and f is the function relating  $V_I$  to  $I_{DD}$ .  $V_{DC}$  is often a DC voltage whose effect is the reduction of the CUT supply voltage. The  $V_I$  component of  $V_{BIC}$ , to a first order, can be modeled by an impedance ( $Z_{BIC}$ ) in the supply current path (as illustrated in Fig. 1.1). The impact of  $V_{DC}$  on the operation of the CUT can be minimized by choosing small values of  $V_{DC}$  and designing the CUT to operate with reduced supply voltage.  $Z_{BIC}$ , however, can cause an unacceptable degradation in the CUT performance due to the induced noise, *i.e.*,  $V_{BIC} = Z_{BIC}I_{DD}$ , resulting from  $I_{DD}$  variations.

Most current sensors proposed in the literature use a resistor or a MOS transistor biased in the triode region as the current sensing device. The voltage drop across the sensing device is used as the current value indicator. Such sensing devices trade sensitivity for impact on CUT performance. Designing for small values of $Z_{BIC}$ reduces the impact of the sensor on the CUT performance by reducing $V_{BIC}$ variations [17], but it also reduces the sensitivity of the sensor. In digital circuit IDDQ testing, high current sensor sensitivity is not required because the sensor only needs to distinguish two values of current which are typically at least one order of magnitude apart [4]. In analog circuits, however, both a high sensitivity and a low $Z_{BIC}$ value are required for acceptable performance.

The current sensor from [6] uses an NMOS transistor operating in the triode region for sensing $I_{DD}$ of embedded analog blocks. This sensor cannot provide a high sensitivity and a low $Z_{BIC}$ simultaneously. Even if the sensitivity requirement is relaxed, the sensor needs a very large NMOS transistor (width ~ 1 mm to 10 mm) to obtain an acceptably small $Z_{BIC}$ (e.g., $10\Omega$ to $1\Omega$ ). In [18], a low voltage current mirror based on an NMOS device and a buffer is used as the sensor. Although this design reduces $V_{DC}$, it has two problems: (i) $V_{DC}$ is process dependent and (ii) it does not reduce $Z_{BIC}$. The circuit proposed in [10] provides both a small $Z_{BIC}$ and a low process-insensitive $V_{DC}$. However, in [10] the $Z_{BIC}$ characteristics have not been analyzed. Such an analysis is essential to optimize the sensor's characteristics to achieve a low $Z_{BIC}$ in a wide frequency range while minimizing the sensor area.

#### **1.1.2 On-chip current averaging**

The current sensed by a BIC sensor has to be integrated over a time window to obtain the average  $I_{DD}$  current,  $\overline{I_{DD}}$  as shown below:

$$\overline{I_{DD}}(T,t_b) = \frac{1}{T} \int_{t_b}^{t_b+T} I_{DD}(t) dt \tag{1.3}$$

where T and  $t_b$  are the duration and the starting time of the averaging time window. Eqn. 1.3 indicates that by choosing a fixed averaging period T, averaging can be replaced with integration. In general,  $\overline{I_{DD}}$  depends on T and  $t_b$ ; for example, Fig. 1.2 shows how  $\overline{I_{DD}}$  varies for three different selections of T and  $t_b$  when  $I_{DD}$  is a sinusoid with offset. This requires timing circuitry to synchronize the averaging operation with  $I_{DD}$ . However, if T is chosen such that  $T \gg 1/BW_I$  where  $BW_I$  is the bandwidth of  $I_{DD}$  AC component, then  $\overline{I_{DD}} \sim I_{DD(DC)}$  where  $I_{DD(DC)}$  is the  $I_{DD}$  DC component which is independent of T or  $t_b$ .

Therefore, to avoid the need for synchronizing circuitry, averaging should be performed over a long period of time, e.g.,  $T = 100/BW_I$ . This condition results in long averaging windows in the range of a few tens of milliseconds for circuits such as audio frequency filters in which  $BW_I \sim 4 \, kHz$ .
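The dependence of the average on $T$ and $t_b$ can be checked numerically. The sketch below evaluates Eqn. 1.3 in closed form for a sinusoid with offset; the amplitudes, the 4 kHz frequency, and all names are illustrative assumptions, and the single tone frequency stands in for $BW_I$.

```python
import math


def average_idd(i_dc, i_ac, freq, t_b, window):
    """Closed-form average of i_dc + i_ac*sin(2*pi*freq*t) over [t_b, t_b + window]."""
    w = 2.0 * math.pi * freq
    return i_dc + i_ac * (math.cos(w * t_b) - math.cos(w * (t_b + window))) / (w * window)


# Hypothetical waveform: 100 uA DC component plus a 50 uA, 4 kHz sinusoid.
I_DC, I_AC, FREQ = 100e-6, 50e-6, 4e3
PERIOD = 1.0 / FREQ
START_TIMES = (0.0, PERIOD / 4, PERIOD / 2)

# Short window (half a period): the result depends strongly on t_b.
short_avg = [average_idd(I_DC, I_AC, FREQ, t_b, PERIOD / 2) for t_b in START_TIMES]

# Long window (100.5 periods): the result stays close to the DC component.
long_avg = [average_idd(I_DC, I_AC, FREQ, t_b, 100.5 * PERIOD) for t_b in START_TIMES]

for name, values in (("short", short_avg), ("long", long_avg)):
    print(name, [f"{v * 1e6:.2f} uA" for v in values])
```

For this waveform the deviation from the DC component is bounded by $I_{AC}/(\pi f T)$, which is why a window of roughly one hundred periods leaves an error well below 1% of the AC amplitude regardless of $t_b$.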



Figure 1.2:  $\overline{I_{DD}}$  dependence on the duration and beginning of the averaging window

A simple method of integrating  $I_{DD}$  is to pass it through a capacitor; the voltage stored across the capacitor at the end of the integration window will be proportional to  $\overline{I_{DD}}$  [10],

$$V_C = \frac{1}{C} \int_0^T I_{DD} dt = \frac{T}{C} \overline{I_{DD}} \tag{1.4}$$

Using this method may require a large capacitor. For example, assuming a supply voltage of 3.3 V, $\overline{I_{DD}} = 100 \ \mu A$, and $T = 1 \ ms$, a capacitor of about 30 nF would be required to ensure that $V_C < V_{DD}$ during integration, so that the electronics providing the current to the capacitor do not saturate. Such a large capacitor would obviously not be able to reside on-chip. In [19], a capacitor is connected between the $V_{SS}$ and $V_{DD}$ terminals of the CUT, and the power supply is switched periodically to estimate the CUT power consumption. This estimate also provides an indirect estimate of $\overline{I_{DD}}$. This method, however, does not measure $\overline{I_{DD}}$ directly and also requires a large capacitor. Therefore, with the capacitor placed off-chip, an on-chip bus or multiplexer is needed to direct the $I_{DD}$ from different analog blocks on the chip to an output pin. In addition to requiring an extra pin, this arrangement suffers from bus and pad parasitics, which reduce the $\overline{I_{DD}}$ measurement accuracy.
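The capacitor estimate in the example above follows directly from Eqn. 1.4 by requiring $V_C \le V_{DD}$ at the end of the window. A minimal sketch of that arithmetic, with a hypothetical function name, is:

```python
def min_integration_capacitor(i_avg, window, v_max):
    """Smallest C (farads) that keeps V_C = i_avg * window / C at or below v_max."""
    return i_avg * window / v_max


# Values from the example in the text: 100 uA average, 1 ms window, 3.3 V supply.
c_min = min_integration_capacitor(100e-6, 1e-3, 3.3)
print(f"{c_min * 1e9:.1f} nF")  # 30.3 nF
```

The required capacitance grows linearly with the window length, so the windows of a few tens of milliseconds discussed earlier would call for correspondingly larger values.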

## **1.2 PLL testing**

An example of a widely used class of circuits requiring functional testing is phase-locked loops (PLLs), since most structural test methods are either too intrusive (they affect the performance) or provide poor correlation to important PLL specifications such as jitter.

PLL testing has gained significant interest recently due to the widespread integration of PLLs in mixed-signal communications and data processing ICs. PLLs are used in timing applications such as clock synthesis, clock recovery, and clock skew compensation. In such applications, the single most important set of specifications for a PLL is its jitter characteristics, such as intrinsic jitter, jitter transfer function, and jitter tolerance [20]. In fact, jitter specifications are critical in most high-speed interfaces because of a limited available timing budget. Jitter testing, however, is an expensive test because it requires costly equipment and long test time [21].

A number of authors have tried to find cost-effective test methods for PLLs [22, 23, 8, 24, 25, 26, 27, 28, 29, 30, 31, 32]. The work in [22] suggests partial specification testing, such as lock range, lock time, and power supply current, for PLL testing. Dalmia et al. [8, 29] also show the viability of power supply current monitoring for PLL testing. The authors of [24, 25] propose methods for efficient fault simulation of PLLs and suggest lock frequency range measurement for PLL testing. Although a combination of these techniques seems to provide good fault coverage, it is difficult to correlate the test results to important jitter specifications, partly because simulating jitter for fault-free and faulty circuits is extremely difficult due to a lack of tools capable of simulating noise in non-linear dynamic circuits. Azias et al. [26] propose a reconfiguration technique for testing ring oscillator-based PLLs. This technique has the advantage of being compatible with digital test methods, but it requires reconfiguring sensitive parts of a PLL. Also, it exhibits the problem of unknown correlation between test results and functional specifications.

A practical solution to PLL testing is to directly measure the jitter characteristics of the PLLs on the chip. The following section reports major methods for on-chip jitter testing.

### **1.2.1** Jitter testing

Jitter testing often requires jitter measurement. Jitter measurement techniques are divided into two categories: frequency domain and time domain. Historically, spectrum-based techniques have been used more often because of the availability of high-frequency spectrum analyzers [33, 21]. The drawback of this method is that the amplitude distortion and noise degrade the measurement accuracy. Also, this method is not suitable for on-chip measurement because performing high-resolution spectrum analysis on the chip requires a significant amount of design time and silicon area.

In the time domain, jitter measurement can be performed using modulation domain analysis [34], histogram-based [21] or undersampling [21] techniques. In [30], a BIST circuit is proposed which is capable of measuring lock range and loop gain of a PLL in addition to performing a jitter test. Because of its statistical nature and dependence on bit error rate (BER), the application of the jitter testing approach in [30] is limited primarily to clock recovery PLLs. Also, high-resolution measurement requires generation of precise, high-resolution and stable delays. Two US patents in [27] and [28] outline on-chip jitter measurement techniques for PLL testing. The BIST circuits proposed in these patents are mixed-signal and their resolution is limited to one gate delay, which is inadequate for testing high-speed PLLs. In [31], a jitter transfer function measurement circuit is proposed for PLL testing; however, the jitter measurement circuit proposed does not have sufficient resolution for intrinsic jitter testing. Veillette et al. [32] propose an interesting method for generating jitter at the input of the PLL for jitter transfer testing. This method, however, requires reconfiguration of the feedback in the PLL loop, which could affect the performance of the loop.

Since most of the time domain techniques require some sort of time interval measurement, accurate on-chip jitter measurement necessitates high-resolution on-chip time measurement circuitry. For on-chip time interval measurement, a number of time-to-digital converter (TDC) circuits, used in physics experiments, have been proposed in the literature [35, 36, 37, 38, 39]. Such circuits, however, are mixed-signal, require custom design and layout, occupy a large area, do not provide high resolution, or rely heavily on matching of the elements. The idea of a controlled delay line used in TDCs is also used as an on-chip jitter measurement method in [27]. This circuit, however, is mixed-signal and suffers from the same limitations as most TDCs.
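To make the link between time interval measurement and jitter concrete, the following sketch quantizes a set of single-shot period measurements to a fixed resolution and derives two commonly used statistics from them. It is an assumption-laden illustration: the ideal rounding quantizer, the 10 ps step, the sample periods, and the choice of rms and peak-to-peak figures are all hypothetical and do not describe any of the cited circuits.

```python
import math


def quantize(interval_ps, resolution_ps):
    """Model an ideal time quantizer: round an interval to the nearest step."""
    return round(interval_ps / resolution_ps) * resolution_ps


def jitter_stats(periods_ps):
    """Return (rms, peak-to-peak) period jitter of a list of measured periods."""
    mean = sum(periods_ps) / len(periods_ps)
    rms = math.sqrt(sum((p - mean) ** 2 for p in periods_ps) / len(periods_ps))
    return rms, max(periods_ps) - min(periods_ps)


# Hypothetical single-shot period measurements of a nominal 10 ns clock, in ps.
true_periods = [10000, 10023, 9981, 10012, 9994, 10007, 9978, 10004]
measured = [quantize(p, 10) for p in true_periods]

rms, pk_pk = jitter_stats(measured)
print(measured)
print(f"rms = {rms:.1f} ps, peak-to-peak = {pk_pk} ps")
```

A coarser step, such as a full gate delay, would collapse most of these samples onto one or two codes, which illustrates why the measurement resolution limits the jitter that can be observed.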

# **1.3** Motivations and Contributions

Although power supply current testing is proposed as an attractive structural testing method for mixed-signal circuits, it has limited applicability due to the lack of a practical BICM scheme. Two issues have to be addressed in an effective BICM:

1. The current has to be sensed with high resolution without introducing large impedance in the $I_{DD}$ path.
2. The $I_{DD}$ signature has to be digital and accurate. In addition, it has to be generated without requiring large chip area.

These requirements led to the design of a BICM reported in Chapter 3. This BICM is capable of sensing  $I_{DD}$  without introducing significant impedance in the power supply current path. It averages the current and generates a digital signature. Since the circuits are compact and easy to design, they are suitable for BIST and DFT applications. The circuits can provide an rms error less than 2% if the  $I_{DD}$  contains significant AC components which are uncorrelated with the system clock.

As another contribution of this thesis, a double-phase built-in current integrator is reported in Chapter 4. This integrator occupies slightly larger area than the single-phase integrator introduced in Chapter 3, but it can measure the average current with an rms accuracy of 1% for any current waveform.

The design of an on-chip jitter test circuit was undertaken because the effectiveness of structural tests or non-jitter-based partial specification tests cannot be proven, and because existing external jitter testing methods are costly, if possible at all. At the start of this effort, there was no solution for high-precision on-chip jitter testing. Even today, the new high-precision on-chip jitter testing technique by Logic Vision, Inc. [30] is based on bit error rate, so its application is limited to clock-recovery PLLs. Another recent technique has been announced by Fluence Technology, Inc.<sup>1</sup>, but no information is available in the public domain about the specifics of the technique. For a practical and widely applicable jitter BIST solution, we set the following objectives for the jitter measurement circuit:

1. Single-shot jitter measurements (similar to those performed by external equipment). This is to make the test solution independent of the circuit under test
2. Compatibility with digital design flow
3. Ability to generate digital signatures
4. High resolution and precision (better than 10 ps in a $0.35\mu m$ technology)

<sup>1</sup>See the VCOBIST data sheets at http://www.opmaxx.com/products/products.htm

The search for on-chip jitter test solutions resulted in the novel jitter measurement and generation circuit for BIST of PLLs which is presented in Chapter 5. The circuit satisfies all the conditions for a practical BIST circuit mentioned in Sec. 2.2.4 as well as the objectives mentioned above. The measurement circuit is fully digital and automatically synthesizable, occupies an area equivalent to 1200 2-input NAND gates, provides a resolution of approximately 10 ps (~1/5 of a gate delay in a standard 0.35 $\mu m$ technology) and a precision of 11 ps, and generates a digital signature which can be read out by an inexpensive tester. The digital signature can be further analyzed by the tester to obtain the jitter characteristics.

In addition to the circuit structures, a number of design and analysis techniques have been employed in these designs which have wider applications. These techniques are:

1. The approximation technique used in Appendix A to obtain a closed form for the frequency response of the BICS
2. The noise feed-forward technique presented in Chapter 4, which is an effective method for quantization noise reduction in averaging analog-to-digital converters
3. The differential method used for jitter measurement, which provides power supply noise rejection. This property, proved in Appendix H, makes such an architecture a strong candidate for a multitude of high-precision timing circuits
4. The modeling of $V_{DD}$-induced gate delay variation presented in Appendix G. This model can be used for estimating the effect of power supply noise on digital circuits

# Chapter 2

# **Background And Survey**

Today's electronic integrated circuits are extremely complex and dense. New technologies allow multi-million transistors [1] and mixed-signal circuitry on the same chip. This progress has led to the design of integrated circuits possessing an unprecedented functionality with the expectation that this trend will continue. As the complexity of integrated circuits increases, IC manufacturers face many new challenges, one of which is developing effective test solutions for ICs.

This chapter provides the background, terminology, and principles often used in the field of IC testing. Sec. 2.1 reviews major concepts and the terminology used in IC testing. Sec. 2.2 reports the major issues encountered and techniques developed for testing mixed-signal circuits in more detail. Sec. 2.3 concludes this chapter by summarizing the important concepts and describing some active research areas in the field of mixed-signal testing.

# 2.1 IC Testing

The typical IC design flow starts with a concept which leads to a set of functionalities and specifications. After that, an appropriate technology is selected which is believed to yield the required performance. Using the appropriate technology models, the specifications are mapped to an implementable design, which is analyzed and simulated to ensure that the design satisfies all the requirements. Subsequently, some prototype ICs are fabricated, characterized and diagnosed thoroughly to determine if the device meets all its specifications under the given process variations. The design and/or fabrication process is corrected according to the results before the IC is sent for production (Fig. 2.1) [40]. In the last stage of production flow, each fabricated device undergoes a series of tests before it is shipped to the customers.

Some tests are performed on the wafer before cutting the dies (wafer test), and more extensive tests are run on the packaged devices (final test). Since time-to-market is a critical factor in the success of an electronic product, the test procedures are often devised in parallel with design verification and prototype characterization, and automatic test equipment (ATE) programs are developed before the first batch of devices is fabricated.

Figure 2.1: Integrated circuit design and test flow

This section reviews the major concepts in the field of IC testing. First, in Sec. 2.1.1, the fundamental question as to why ICs must be tested is addressed. Then, a definition of test and the difference between test and its close relative, diagnosis, is presented in Sec. 2.1.2. The major test disciplines, functional and structural testing, are defined and compared in Sec. 2.1.3. Often, regardless of the test discipline selected, the concepts of *defect* and *fault* are used to generate or evaluate different test methodologies. These concepts are reviewed in Sec. 2.1.4. They are also used in Sec. 2.1.5 to briefly describe the two general test generation methodologies, automatic and ad hoc. Next, in Sec. 2.1.6, design-for-testability (DFT), built-in self-test (BIST) and the embedded test concepts are defined. These techniques are used when it is difficult to generate or apply the test to the circuit under test. Finally, a very brief review of major test methodologies for digital circuit testing is presented in Sec. 2.1.7 since many of the basic ideas have also found their way into the mixed-signal test domain.

## 2.1.1 Why test integrated circuits?

If a device functions properly under the nominal variations of a fabrication line, why is there a need to test each device at the end of the line? The answer lies in the imperfections of the fabrication line. These imperfections result in some random failures at various stages of the line; *e.g.*, mask misalignments, missing or extra metal, polysilicon or oxide due to spot defects and cracks [40]. Some experts believe that these failures can be avoided by improving the fabrication line, and subsequently increasing the yield to about 100%. Assuming this opinion to be theoretically correct, the need for testing is still unavoidable because

- fabrication line improvement would require resources that may not be justified by the resulting increase in yield. This is especially true for devices with short product lifetimes;
- as fabrication line technology improves, industry demands smaller device geometries and more masking stages, which drive the yield down; hence the need for testing remains.

In reality, almost all IC manufacturers test their final products (in-house or by outsourcing to test houses) to bridge the gap between fabrication imperfections and the quality their customers expect [2].

In some applications such as aviation systems, military equipment, and deep space systems, where correct functionality can be life-critical, the test quality is the most important factor and the cost is a secondary concern. However, in commercial and consumer applications, the test choice depends on the trade-off between test quality and test cost. For today's complex chips the test cost can account for 30% of the total cost of the product [41], and may even surpass the cost of manufacturing in the near future [2]. As a result, industry has focused on finding efficient ways of testing circuits and systems to increase the quality-to-cost ratio.

In addition to quality assurance, testing is sometimes used as a critical tool to locate and diagnose failures and to take corrective measures where possible. However, there are distinctions between test and diagnosis which are described next.

#### 2.1.2 Test and diagnosis

In general, testing is the process of identifying a faulty device, while diagnosis is the procedure of locating the source of the malfunction. As shown in Fig. 2.1, during early production stages of a design, extensive diagnosis procedures are run on the samples of the device in order to locate the most frequent faults in the circuit and improve the design or the production line in order to achieve an acceptable yield. When the device is approved for mass production, each device undergoes two stages of electrical testing. The first one is called *wafer probe* and is performed prior to packaging (Fig. 2.2). At this stage thin probes are used to apply and monitor signals at various nodes of the circuit. Devices declared faulty at this stage are discarded in order to save the high cost associated with packaging a faulty device. Furthermore, there are a number of non-electrical test and quality control procedures that can be used at this stage such as optical, infrared, scanning microscopy, and thermal imaging analysis [42].



Figure 2.2: Integrated circuit test stages.

Since packaging can introduce some faults [40, Chapter 16], and also because of the difficulty in applying high-performance tests on the wafer, a second test, called *final test*, is run on the device. Since there is no direct access to the internal nodes of the device during the final test, the test must determine the status of the chip using only package pins.

In each electrical testing step (wafer probe and final test), different approaches to test development can be chosen. These are generally divided into two groups: functional and structural. These two approaches are described in the next section.

## 2.1.3 Functional versus structural testing

Functional testing is defined as testing the conformance of the functionality of a circuit to its specifications. Historically, this type of testing has been the method most commonly used, but as the functionality of chips increased, such testing became costly and in many cases impossible. For example, for a combinational logic circuit with 64 inputs and a testing speed of $10^8$ patterns per second, it can take 5849 years to test the complete functionality of the chip! In other circuits such as complex analog or mixed-signal circuits, functional testing would require a large amount of time and resources, translating into higher cost per device.
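The 5849-year figure can be reproduced by counting the $2^{64}$ input patterns of a 64-input combinational circuit and dividing by the test rate. The short sketch below does this, assuming a 365-day year; the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def exhaustive_test_years(num_inputs, patterns_per_second):
    """Years needed to apply every input pattern of a combinational circuit."""
    return 2 ** num_inputs / patterns_per_second / SECONDS_PER_YEAR


years = exhaustive_test_years(64, 1e8)
print(int(years))  # 5849
```

Each additional input doubles the test time, which is why exhaustive functional testing stops being practical long before 64 inputs.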

Another approach is to assume that a well-designed and analyzed circuit which has been thoroughly debugged, *i.e.*, all systematic or design errors have been eliminated prior to sending it for fabrication, will meet its specifications except when there is a fault in the circuit due to physical defects. Therefore, after identifying and modeling the possible faults in the circuit, the test procedure tries to determine their existence or absence. Should a fault be detected, the device is declared faulty and is discarded. This approach, called structural testing, is expected to achieve a more efficient outcome. But is this expectation reasonable? A supportive argument is that different functional specifications of a system are correlated, *i.e.*, if a defect exists in the circuit, it will most likely affect several circuit specifications. This notion implies that functional testing might be overkill, and detecting faults in the system would require less effort. This can be compared to data compression in which the key to compression is to transform highly correlated data to a set of uncorrelated data which carries the same information.

The next section provides the definition of defect and fault which are used to generate tests and also measure the quality of a test.

#### **2.1.4 Defects and faults**

A defect usually refers to physical failures in a circuit, while a fault is the dysfunctional electrical effect of the defect on the operation of the circuit. In a defect-oriented test methodology, to devise or evaluate a test strategy for a circuit, fault models based on the possible defects are first chosen, and then a test methodology is selected to detect the modeled faults. The faulty circuit is simulated to ascertain whether the selected test can detect the fault. The percentage of faults detected by a test is one measure of the test quality.

Figure 2.3: Examples of faults due to local defects

The effects of different defects are technology-dependent. Much research has focused on identifying and modeling defects in different technologies. For example, a missing oxide could cause a gate-substrate short in a CMOS transistor. Some of the common defects in most technologies include missing or extra oxide, metal, or polysilicon, and inaccuracy in the etching or doping processes. The electrical consequences of these defects include opens, shorts, as well as significant stray elements and large deviations in component values. Examples of shorts and opens due to spot defects are shown in Fig. 2.3. Hnatek [40] reviews possible physical defects and their sources at each stage of the fabrication process in detail. Faults are not always mapped directly to defects. In fact, sometimes faults are defined at higher levels of abstraction because of advantages such as the ease of test generation (*e.g.*, stuck-at fault model [43, Sec. 4.5]) and tight connection to performance parameters (*e.g.*, performance-dependent faults [44]).

### **2.1.5** Test generation

Fault models are often used to generate test strategies. To detect a fault, the fault should be activated by a set of inputs that cause a significant deviation in a circuit output parameter from its nominal value in the presence of the fault. To activate the fault, it should be both controllable, *i.e.*, the fault-free and faulty status of the area affected by the fault are significantly different, and observable, *i.e.*, the outputs reflect the change in the expected behavior of the faulty area. There are two methods used in test pattern generation:

1. Automatic test pattern generation (ATPG) algorithms. Examples are ATPG algorithms for combinational digital circuits [43, Chapter 6][45]

2. Ad hoc methods. These methods are applicable to specific circuits

The ATPG methods cannot be used easily for generating test patterns for some circuits, e.g., analog circuits, because the interaction of the signals at different nodes is too complex. An ad hoc approach used in these cases is to consider an input pattern and simulate the faulty circuit to determine whether the selected pattern can activate the fault. Fault simulation requires simulating the circuit under different faulty conditions and input signals. Simulations at low levels of abstraction (circuit level) for large circuits are computationally prohibitive. This problem is alleviated by modeling the circuit at higher levels of abstraction. Sensitivity analysis [46] has also been proposed as a test generation method. This method is regarded as a fast substitute for fault simulation because it can help to identify appropriate test stimuli in the space of possible stimuli. Functional testing is also an ad hoc method as it is circuit specific.
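The simulate-and-compare loop described above can be illustrated on a toy digital example. The sketch below uses the stuck-at fault model mentioned in Sec. 2.1.4 on a hypothetical three-input network, y = (a AND b) OR c, and reports the fraction of modeled faults that a given pattern set detects. The circuit, node names, and pattern sets are assumptions chosen for illustration only; analog fault simulation is far more involved.

```python
NODES = ("a", "b", "c", "n1", "y")
# Every node stuck at 0 and stuck at 1: ten modeled faults in total.
FAULTS = [(node, value) for node in NODES for value in (0, 1)]


def simulate(a, b, c, fault=None):
    """Evaluate y = (a AND b) OR c; `fault` is (node, stuck_value) or None."""
    def force(name, value):
        # Override the node value if this node carries the injected fault.
        return fault[1] if fault and fault[0] == name else value

    a, b, c = force("a", a), force("b", b), force("c", c)
    n1 = force("n1", a & b)
    return force("y", n1 | c)


def fault_coverage(patterns):
    """Fraction of modeled faults whose output differs from the fault-free one."""
    detected = {f for f in FAULTS
                if any(simulate(*p) != simulate(*p, fault=f) for p in patterns)}
    return len(detected) / len(FAULTS)


print(fault_coverage([(1, 1, 0)]))                                   # 0.4
print(fault_coverage([(1, 1, 0), (0, 1, 0), (1, 0, 0), (0, 0, 1)]))  # 1.0
```

In this example four patterns detect all ten modeled faults, whereas exhaustive functional testing of the same network would need all eight input combinations.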

Once the test input stimulus has been identified, the test is exercised on each chip by applying the stimulus to the circuit and observing the outputs with a tester. However, in some cases, applying a test to a circuit with an external tester does not suffice or is not possible; in such cases, other methods are used, such as design for testability (DFT), built-in self-test (BIST), and embedded test, which are explained next.

#### 2.1.6 Design for testability (DFT), built-in self-test (BIST), and embedded test

In many high-density devices it may be extremely difficult to generate the test pattern for a fault using only primary inputs and observing outputs. In such cases, additional circuitry or techniques can increase the testability of the circuit significantly. Many manufacturers, realizing the importance of testing and the cost involved, include these test circuits on their devices as a means of decreasing test expenses, and, as a result, cost per device.

There are two main categories of DFT methods: general purpose and ad hoc methods. DFT can mean adding some circuitry or modifying the CUT to increase the accessibility of internal nodes of a device, or can be as complete as a built-in self-test (BIST) scheme. Perhaps the best example of a standard DFT method is the scan chain in sequential digital circuits [43, Sec. 9.4]. In a BIST scheme, the device under test (DUT) is switched to test mode; after the time required for the on-chip test operation to complete, a flag indicates whether the DUT is faulty or not. BIST for digital technology has been well investigated and is widely used in cases such as memory testing. Some of the advantages of BIST are as follows:

- 1. It eliminates the need for expensive test equipment. As technology improves and circuits with higher speeds and increased functionality are designed, the test equipment has to be upgraded correspondingly, adding to the testing cost
- 2. A large number of chips can be tested simultaneously, resulting in cost reduction
- 3. Tests can be performed on-site repeatedly. Therefore, if needed, the customer can run a field test on the chip under the required conditions
- 4. It provides automatic and fast diagnosis which reduces time-to-market considerably [47]

BIST can be viewed as moving the tester into the chip. Inclusion of a full BIST in present day mixed-signal chips (containing analog macros, memory, MEMS and digital circuits) may not be practical due to the area overhead. An alternative approach is to move part of the tester to the chip and leave the rest to external equipment. Such an approach is called embedded test. Usually embedded test circuits are designed to perform very high speed tests which are difficult to perform externally because of the gap between external and internal bandwidth [2]. As chip complexity and performance increase and access to internal circuits and embedded blocks is reduced, embedded test becomes more and more necessary.

#### 2.1.7 Digital circuit testing

Digital circuit testing is a relatively mature field, and today, ATPG tools are able to generate efficient test patterns for digital circuits. The key to the success of these tools is the existence of efficient fault models, fault simulators for digital circuits and ATPG algorithms.

In a combinational digital circuit, any logic node can have one of two logical values. The status of a logic node depends on the inputs to the circuit. It can be assumed there is a fault in the circuit if a logic node does not change according to the design, or changes too late or too quickly. Consequently, stuck-at and delay fault models have been proposed for digital circuits [43, Chapter 4].

In a stuck-at fault model, it is assumed that a faulty logic node is stuck at one of the logic values (stuck-at-1 and stuck-at-0) and does not change when it is supposed to. By applying proper patterns to the inputs, a stuck-at-v (v = 0, 1) fault can be activated to  $\overline{v}$  (controllability) and observed as a discrepancy between obtained and expected outputs (observability). To detect a delay fault, at least two patterns are needed.
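The activation and observation conditions described above can be illustrated with a brute-force fault simulation. The following is a minimal sketch rather than a method taken from the cited references: the three-input circuit y = (a AND b) OR c, the node names, and the helper functions are hypothetical and chosen only for illustration.

```python
from itertools import product

def circuit(a, b, c, fault=None):
    """Evaluate y = (a AND b) OR c.

    `fault` is None (fault-free) or a tuple (node_name, stuck_value).
    The circuit and its node names are illustrative assumptions.
    """
    def node(name, value):
        # Override the node value when the injected fault targets this node.
        if fault is not None and fault[0] == name:
            return fault[1]
        return value

    a, b, c = node("a", a), node("b", b), node("c", c)
    n1 = node("n1", a & b)
    return node("y", n1 | c)

def detecting_patterns(fault):
    """Return every input pattern whose output differs from the fault-free one."""
    return [p for p in product((0, 1), repeat=3)
            if circuit(*p) != circuit(*p, fault=fault)]

# Internal node n1 stuck-at-0: a = b = 1 activates the fault (drives n1 to 1
# in the good circuit) and c = 0 lets the difference propagate to y.
print(detecting_patterns(("n1", 0)))
```

For a circuit this small, exhaustive enumeration is affordable; practical ATPG algorithms avoid it by reasoning about activation and propagation conditions directly.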

Sequential circuits contain memory elements. Therefore, the outputs depend not only on the circuit inputs, but also on the current state of the system, which is a function of previous inputs and states. Forcing the circuit to a known state and then applying a specific input is a challenging problem. Two alternate methods which are sometimes used to test sequential circuits are Scan Path and CrossCheck [48]. The idea of the scan path method is to divide the circuit into two parts in test mode; a combinational block and a serial shift register containing all the flip-flops in the circuit. Test patterns for the combinational part can be generated using ATPG. Entering a sequence of 1's and 0's from one end of the test shift register and reading it from the other end can determine whether any of the flip-flops in the chain are faulty. In the CrossCheck method some circuitry is added to each logic block of the device so that those blocks are accessible through external pins. Therefore, each logic block including flip flops can be accessed, initialized, and probed directly.
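The shift-register check used in the scan path method can be sketched as a simple behavioral model. This is an illustrative assumption rather than a description of any particular scan implementation: the chain length, the flush pattern, and the way a stuck flip-flop is modeled are all hypothetical.

```python
def shift_through_chain(bits_in, chain_length, stuck=None):
    """Shift `bits_in` through a chain of flip-flops and return the scan-out bits.

    `stuck` is None or (index, value), modeling a flip-flop that cannot change.
    """
    chain = [0] * chain_length
    bits_out = []
    for bit in bits_in:
        bits_out.append(chain[-1])      # scan-out is the last flip-flop
        chain = [bit] + chain[:-1]      # shift the chain by one position
        if stuck is not None:
            chain[stuck[0]] = stuck[1]  # the faulty flip-flop holds its value
    return bits_out

def chain_is_good(chain_length, stuck=None):
    """Flush a pattern containing both 0s and 1s and compare it at scan-out."""
    pattern = [0, 0, 1, 1] * 2
    stimulus = pattern + [0] * chain_length   # extra clocks to flush the chain
    observed = shift_through_chain(stimulus, chain_length, stuck)
    # The first `chain_length` outputs are the initial chain contents.
    return observed[chain_length:] == pattern

print(chain_is_good(4))                # fault-free chain
print(chain_is_good(4, stuck=(1, 1)))  # flip-flop 1 stuck at 1
```

The pattern contains both transitions so that a flip-flop stuck at either logic value corrupts at least one scan-out bit.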

Test strategies based on these models have proven sufficient for most applications, although there are defects which the voltage-based test strategies do not cover. Additional tests such as IDDQ testing are needed [49, 50] for these defects.

#### **IDDQ** testing

Using the quiescent supply current  $(I_{DDQ})$  as a testing parameter for CMOS circuits has become increasingly popular, mostly because it is simple and effective [3] [4]. Research shows that for some faults, the DUT draws significantly more (or less) current from its power supply than the nominal value. Therefore, many faults can be detected by monitoring  $I_{DD}$  for specific inputs. IDDQ testing is an example of massive observability because for each input pattern, a large number of faults can be detected by monitoring  $I_{DDQ}$ . In fact, some researchers claim that there are some types of defects that only IDDQ testing detects because they do not alter the functionality of the DUT, but the chip is still faulty because it draws an excessive amount of current [49]. This may lead to reliability problems.

In order to detect a defect by IDDQ testing, it should be activated by an input pattern. Currently, ATPG tools are able to generate IDDQ testing patterns for digital circuits. Although there is a conjecture that these techniques could also be used to test analog circuits, not much research has been done in this area.


# 2.2 Analog and Mixed-Signal Testing

The art of testing analog circuits, unlike its digital counterpart, is far from mature. Most test engineers use ad hoc approaches that do not represent a unified test generation methodology applicable to all analog circuits. Functional testing is still widely used to test these circuits, but as the technology of mixed-signal and analog VLSI circuits progresses, the cost of thorough functional testing becomes increasingly prohibitive. Currently, industry is facing serious issues with respect to testing analog and mixed-signal circuits, and it is calling for solutions to this matter, which in some cases has become a bottleneck for manufacturers. This section reviews the major developments in analog and mixed-signal testing to date.

Analog circuits are different from digital circuits in their operation. The input and output signals of an analog circuit are continuous waveforms, and since internal components do not work as switches, their values and characteristics become more important in the operation of the circuit. Digital DFT methods based on partitioning the circuit are often not applicable to analog circuits because of the negative impact of the DFT circuitry on circuit performance. Also, for many analog circuits, the outputs cannot be expressed in terms of the inputs and element values in a generic form that remains applicable under faulty conditions.

The lack of effective fault models in higher levels of abstraction is one of the most important problems in analog testing. Simulating or analyzing an analog circuit at the circuit level is lengthy and computationally expensive.

Proposed fault models in analog circuits are basically of two types:

1. Hard or catastrophic faults, including shorts and opens. Shorts can be capacitive or resistive. These faults can change the topology of the circuit, posing a problem in using topology methods for analog testing and diagnosis.

2. Soft or parametric faults, including some variations in element values for which some functional parameters of the circuit are affected significantly.

Catastrophic faults generally degrade the performance of the system significantly and can be detected by simple tests. For soft faults a decision has to be made regarding the amount of element value variation which is considered faulty. This is a difficult problem, especially when multiple fault models are included.

#### 2.2.1 History

Research on test and diagnosis of analog circuits started in the 1970's, almost one decade after digital testing had attracted attention. A number of reasons have been given for this late start [51]. Analog circuits were still largely discrete and relatively small; therefore, functional testing was possible. Consequently, there was no industrial motive to pursue research in this area. The other reason was the lack of any major breakthrough in the academic research [51].

Since the 1970's, there have been some important results achieved largely in the area of diagnosis of linear circuits. Today's modern analog circuits are mostly non-linear, or become non-linear under faulty conditions. Some researchers have attempted to extend the linear methods to non-linear problems with some degree of success. Some of the efforts in the area of analog fault diagnosis are as follows:

**Post measurement simulations or simulation after test (SAT):** Assuming a certain circuit connectivity, this method tries to solve for the element values using the voltages and currents of accessible nodes and branches, and thereby determine which component in the circuit is faulty. Berkowitz [52] initiated the subject of solvability of a network based on knowing the currents and voltages in a circuit. Trick et al. [53] and Navid et al. [54] provide some necessary and sufficient conditions for the solvability of a network and introduce algorithms for efficient element value computation. They investigate linear circuits using a single-frequency measurement. Navid et al. suggest their scheme could be used for non-linear circuits by linearizing them around the operating points. Since then, a number of papers on linear and non-linear fault diagnosis have followed, using methods such as multi-frequency measurements [55], element modulation [56], neural networks [57], and artificial intelligence-based techniques [51, Chapter 7]. Piecewise linear (PWL) modeling of non-linear elements is one of the approaches suggested for non-linear analog fault diagnosis [51].

Estimation methods are also a SAT category. These schemes use estimation algorithms, such as least-squares criteria, to estimate circuit element values by minimizing the error between nominal and measured values. Statistical methods have also been used to select test parameters [58].

**Simulation before test:** In this method, a fault dictionary (FD) is formed through determining the response of the circuit to a set of stimuli in the presence of some specified faults. The faults can be isolated by matching the measured values to the closest set of responses in the dictionary. Schreiber [59] proposes an algorithm based on state space analysis to design an efficient stimulus for the fault dictionary. The accuracy of this method depends on the accuracy of the fault dictionary. Depending on how the FD is formed, some processing of measured values may be necessary; for example, the transient response of the circuit may have to be estimated.
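The dictionary lookup step can be sketched as a nearest-neighbor match. The fault names, node voltages, and the choice of Euclidean distance below are illustrative assumptions and are not taken from [59]; a real dictionary would be filled from fault simulations of the actual circuit.

```python
import math

# Hypothetical fault dictionary: simulated voltages (V) at three accessible
# nodes for one stimulus, one entry per modeled fault.
FAULT_DICTIONARY = {
    "fault-free": (1.65, 0.80, 2.40),
    "R1 open":    (3.30, 0.00, 3.30),
    "R2 short":   (0.00, 0.80, 1.10),
    "C1 short":   (1.65, 0.00, 2.40),
}

def diagnose(measured):
    """Return the dictionary entry closest to the measured response."""
    return min(FAULT_DICTIONARY,
               key=lambda name: math.dist(FAULT_DICTIONARY[name], measured))

# A measurement with some noise, closest to the "C1 short" signature.
print(diagnose((1.60, 0.05, 2.35)))
```

Because the match is only as good as the stored signatures, two faults with nearly identical responses cannot be separated without adding stimuli or measurement nodes.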

Some research has focused on efficient stimulus design, by which a fault can be located with minimal measurement points and computation. Multi-frequency measurements [55] and element modulation techniques [56] provide examples of such efforts.

#### 2.2.2 Testing analog circuits

With the advent of analog VLSI circuits and mixed-signal technology, the field of test development for analog circuits gained significant momentum. In recent years much research has been devoted to exploring and solving the analog testing problems. This section reviews some of these efforts.

#### **Test optimization**

Every second of test on the tester adds to the cost of the device. Testing cost for some dense and complex ICs is estimated to account for up to 30% of the total manufacturing cost. Therefore it is imperative to minimize the testing time by optimizing the test strategy. Milor et al. [41] introduce an algorithm for this optimization. Using the statistics of defects and faults, and the time associated with each test, their algorithm tries to minimize the test time by selecting and ordering the best set of tests. Using this algorithm, only the specifications of the device that provide maximum fault coverage in minimal time are tested.
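The idea of selecting and ordering tests by coverage and test time can be sketched with a generic greedy heuristic. This is not the algorithm of Milor et al. [41], which also uses defect statistics; the test names, times, and fault sets below are hypothetical.

```python
def select_tests(tests):
    """Greedily order tests by newly covered faults per second of test time.

    `tests` maps a test name to (time_in_seconds, set_of_detected_faults).
    Tests that add no new coverage are dropped.
    """
    remaining = set().union(*(faults for _, faults in tests.values()))
    order = []
    while remaining:
        best = max(
            (name for name in tests if name not in order),
            key=lambda name: len(tests[name][1] & remaining) / tests[name][0],
        )
        order.append(best)
        remaining -= tests[best][1]
    return order

# Hypothetical test set: (test time in seconds, faults detected).
TESTS = {
    "dc_offset": (0.1, {"f1", "f2"}),
    "gain":      (0.5, {"f2", "f3", "f4"}),
    "thd":       (2.0, {"f3", "f4", "f5"}),
    "bandwidth": (1.0, {"f5"}),
}

order = select_tests(TESTS)
total_time = sum(TESTS[name][0] for name in order)
print(order, total_time)
```

In this example the slow distortion test is dropped because the remaining tests already cover its faults, which reduces the total test time without reducing the modeled fault coverage.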

#### **Fault simulation**

In the case of analog circuits, fault simulation is one method of test pattern generation. After forming a fault list, the circuit is simulated under faulty conditions to determine whether a specific parameter (test parameter) varies beyond its specified tolerance. The input signals used in fault simulation are chosen to make the test parameter observable. Analog circuit simulation is computationally intensive, and simulating a large circuit for a large number of faults can be prohibitively lengthy. A number of attempts have been made to model analog circuits at higher levels of abstraction to decrease the simulation time [60]. Nagi et al. [61] suggest a solution for linear circuits by transforming the circuit to the s-domain and then to the z-domain. This approach is limited because it applies only to linear circuits and assumes that the faulty circuit remains linear as well.

Macromodeling is another approach to analog circuit modeling in which the circuit is divided into behavioral blocks and then a circuit simulator such as SPICE or SPECTRE is used to simulate the system. This method decreases simulation time by lumping the effect of a number of circuit level elements into a behavioral block. These blocks should be selected such that the circuit level defects can be easily incorporated in them for fault simulation. Harvey et al. use a macromodeling technique in the fault simulation of a PLL circuit [22] and a current mode ADC [62]. Spinks, et al. [63] review some of the major works in this area.

#### Sensitivity analysis

One method used to determine an efficient set of test parameters is sensitivity analysis. In this method the sensitivity of each output parameter to each circuit element is used to select an optimum test set. The test set stimulus and test parameters are chosen for maximum observability and controllability of the modeled faults. Slamani and Kaminska [46] analyze this scheme and suggest an algorithm to optimize the test set for maximum coverage. Their study includes single and multiple fault model cases, and concludes that as the number of faults in the multiple fault model increases, testing approaches diagnosis. Although promising, sensitivity analysis is mostly applicable to linear or near-linear circuits.
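A numerical example may help to show how sensitivity guides stimulus selection. The sketch below is not the algorithm of [46]; it assumes a first-order RC low-pass filter with hypothetical element values and estimates the normalized sensitivity $(x/y)\,\partial y/\partial x$ of the gain magnitude to R by central differences.

```python
import math

def gain(r, c, freq):
    """Gain magnitude of a first-order RC low-pass filter (assumed example)."""
    w = 2 * math.pi * freq
    return 1 / math.sqrt(1 + (w * r * c) ** 2)

def normalized_sensitivity(func, params, name, rel_step=1e-6):
    """Central-difference estimate of (x / y) * dy/dx for parameter `name`."""
    x = params[name]
    up = dict(params, **{name: x * (1 + rel_step)})
    down = dict(params, **{name: x * (1 - rel_step)})
    dy_dx = (func(**up) - func(**down)) / (2 * x * rel_step)
    return x / func(**params) * dy_dx

R, C = 10e3, 1e-9                      # hypothetical element values
F_CUTOFF = 1 / (2 * math.pi * R * C)   # -3 dB frequency of the filter

for freq in (F_CUTOFF / 100, F_CUTOFF, 10 * F_CUTOFF):
    s = normalized_sensitivity(gain, {"r": R, "c": C, "freq": freq}, "r")
    print(f"f = {freq:10.1f} Hz  S_R = {s:+.4f}")
```

Well below the cutoff frequency the gain is almost insensitive to R, so a deviation in R is hard to observe there; a stimulus near or above the cutoff frequency makes the same deviation far more visible.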

#### DC and AC tests

Different parameters of an analog circuit can be tested. A DC test is an attractive option due to its simplicity. In a DC test, DC operating points of the circuit are tested for a set of DC inputs. This test generally offers good fault coverage for catastrophic faults and is suitable at the wafer test stage. Soma [64] suggests that some open circuit faults cannot be detected by a DC test. Also, capacitive faults escape detection in a DC test. Bishop et al. [65, 66] evaluate fault coverage of DC tests for two common classes of analog circuits, operational amplifiers and operational transconductance amplifiers (OTAs), and conclude that up to 80% of catastrophic faults in these circuits can be detected.

Sometimes AC tests are used to achieve higher fault coverage. In the AC case, single or multi-tone signals can be used. Multi-tone signals are especially useful in testing linear analog circuits such as filters because the sensitivities of output parameters vary with frequency. Using a multi-tone signal can help to detect more faults in one test.

#### **Inductive fault analysis (IFA)**

In forming a fault list for a CUT, it is important to select realistic faults to avoid testing for defects that are rare or nonexistent. Such a realistic fault list can be created by using the information on the process variations and possible physical defects. This list also depends on the layout of the circuit. This approach is viewed as more practical. Fig. 2.4 shows different stages of the inductive fault analysis test generation method. Currently some tools capable of inserting defects on the layout and generating a fault list [67] [68] exist. Although global defects due to some process failure do occur, mostly spot defects (local defects) are considered because the probability of global defects is low, and in general, detecting them is easy.

In a typical single-poly, double-metal CMOS process, some common spot defects include [40]:

1. short between wires;

2. open in wires;

3. pin holes in oxide, gate oxide, pn junctions;

4. missing contacts or vias; and,

5. extra contacts or vias.

The appropriate fault model for each defect depends on the condition of the defects and the processing technology. For example, a wire short can be low or high resistance depending on the amount of metal connecting the wires (Fig. 2.3). Defect statistics can also be used to optimize the test.

IFA has been the focus of some recent papers. It has been suggested as a means of evaluating test strategies such as functional testing [69]. Harvey, et al. use it to generate and evaluate different tests on circuits such as an ADC [62] and a PLL [22]. Sachdev [70] applies it to a class AB amplifier, and Xing [71] uses it to improve fault coverage for an air-bag control circuit. Sachdev in [72] gives an overview of IFA.



Figure 2.4: Inductive fault analysis flow diagram.

#### 2.2.3 Analog and mixed-signal DFT

Some DFT schemes have been proposed to improve the testability of analog circuits by increasing the observability of internal nodes of the circuit. Sunter [73] introduces a wideband analog test bus circuit which can be used for monitoring internal analog signals. Soma [74] exploits the serial structure of active filters and adds some switches to the circuit, making it possible to test individual internal blocks. Slamani and Kaminska [75] suggest using parameter-to-DC converters. These on-chip converters produce a DC voltage which is a function of an internal parameter, and a comparator determines whether this voltage is within the acceptable range. This information is stored in a shift register and can be used to test and diagnose the device. Wey [76] and Wurtz [77] propose BIST structures for analog circuits which are, in fact, analog shift registers (ASRs). Taps (inputs of different stages) of this shift register are connected to different nodes in the circuit, and by clocking it, the internal voltages of the circuit can be accessed. The authors of [76] also propose a current-based ASR which can alleviate the problem of limited voltage swing in a voltage-based ASR.

One of the problems with Slamani and Wey's methods is that the DFT (although the authors call it BIST) circuit can be comparable to the CUT in terms of chip area. In such cases, using these methods cannot be justified.

Another mixed-signal DFT method for solving problems with controllability and observability of embedded blocks is using an analog test bus. These efforts have led to a new standard: the IEEE 1149.4 mixed-signal test bus [78]. This standard is aimed at solving the problem of analog access and is also a means of extending the digital boundary scan standard IEEE 1149.1 to full board-level connectivity test as well as testing on-board discrete passive components. Some implementation issues regarding the standard have been addressed in [79, 80].

#### 2.2.4 Mixed-signal BIST

Submicron mixed analog and digital technologies have enabled IC manufacturers to achieve high levels of integration, making system-on-chip a reality. One of the consequences, however, is difficulty in testing embedded analog and mixed-signal blocks because of: *(i)* limited access to these blocks, and *(ii)* the long time and expensive equipment needed for specification-based testing. To solve the first challenge, on-chip DFT techniques such as internal buses have been proposed to increase the access to embedded blocks. Structural testing has been suggested to address the second problem by replacing expensive specification tests with less costly ones.

Although structural testing has been partially successful, specification testing is still seen as necessary in many cases. This is mostly due to the existence of soft faults, lack of proper fault models, and poor correlation between structural tests and specifications. To reduce the cost of functional testing, one solution is to use design for test (DFT) and built-in self-test (BIST) techniques to reduce the test time and also the cost of test equipment needed. A practical DFT or BIST method has to satisfy several conditions; it must

- 1. occupy small area in comparison with the CUT;
- 2. be easy to design (preferably automatically synthesizable);
- 3. be sufficiently accurate for targeted tests;
- 4. generate a signature which can be analyzed conveniently on-chip or off-chip, such as a string of digital numbers.

The analog and mixed-signal BIST schemes proposed in the literature can be divided into two groups: the general ones, which are applicable to a large class of circuits, and the more specific ones, which are used for limited classes of circuits such as active filters, switched-capacitor op-amps, ADC/DAC and PLL.

Roberts et al. proposed mixed-signal BIST (MADBIST) which is primarily based on generating single- and multi-tone sine waveforms on the chip using an over-sampling oscillator [81, 82]. They have applied their method for testing sigma-delta analog to digital converters [83] and a wireless communication system [84]. This method, although very attractive for on-chip signal generation, is limited to cases where spectrum-based testing is possible and also requires on-chip digital signal processing capability for response analysis. On-chip sine generation is also used in [85]. However, a large portion of circuits proposed in [85] are analog, and therefore prone to process variations and also occupy large chip area.

Another BIST approach proposed by Arabi et al. is oscillation-based BIST (O-BIST) [86, 87, 88, 89, 90]. This method is based on modifying the CUT or adding some amount of circuitry on the chip to turn the CUT into an oscillator in test mode. Since oscillation parameters such as frequency and amplitude are functions of CUT parameters, O-BIST can detect many faults. Although effective in some cases, O-BIST is basically a structural test because the specifications of the CUT are not tested directly. Therefore, it is difficult to correlate the test results with actual specifications. O-BIST has been used to test ADCs [86, 87], operational amplifiers [89], active filters [88] and even digital circuits [90].

More specific BIST structures have been designed for ADCs and DACs (*e.g.*, [91]) and PLLs. The ADC and DAC test scheme in [91] includes a significant amount of analog circuitry which makes it difficult to design and occupies large area. The details of PLL BIST schemes have been reviewed in Sec. 1.2.

# 2.3 Conclusions

One of the major problems facing industry is balancing the need for testing to ensure product quality and the cost of testing. Since the final goal of testing is ensuring the functionality of the final product, functional testing is perhaps the most intuitive test method. However, the fault-based structural test generation approach for digital testing proved to guarantee correct functionality while reducing the test cost. This success prompted a large amount of research to develop structural tests for mixed-signal circuits, resulting in some partially successful techniques. However, the complex nature of mixed-signal circuits hampers the efforts to develop efficient fault models, fault simulation tools, and automatic test generation methodologies. These problems, combined with the expensive external tests, have made many experts believe that functional BIST or embedded test is the best solution for testing high-performance mixed-signal circuits, at least until the mixed-signal fault modeling and simulation issues are resolved.

This chapter reviewed a number of existing mixed-signal BIST and DFT techniques available for widely used circuits such as amplifiers, filters, ADCs, DACs and PLLs. However, many of these techniques have limitations, such as impact on the CUT, insufficient accuracy, restriction to specific applications, and/or large area. Therefore, new cost-effective mixed-signal BIST and embedded test techniques are required to meet industry needs.

# Chapter 3

# Built-In Current Monitor for Testing Analog Circuit Blocks

In this chapter, a built-in current monitor (BICM) capable of sensing and averaging the supply current of embedded analog circuit blocks is presented. The current monitor has two main parts: a *built-in current sensor* (BICS) [92] and a *built-in current integrator* (BICI) [93], as shown in Fig. 3.1. The current sensor circuit has a structure similar to that in [10] and [94, Chapter 12]. It can have a small  $Z_{BIC}$  (~ 2 $\Omega$ ) for a medium-sized current sensing device ( $w = 72 \,\mu m/0.35 \,\mu m$ ). It also provides CUT supply voltage regulation. Such regulation can be exploited to decrease the sensitivity of the CUT to power supply variations [95]. By analyzing the  $Z_{BIC}$  characteristic of the sensor, we can optimize the design to minimize area and performance impact.

The BICI generates a digital signature proportional to  $\overline{I_{DD}}$ . The features of the BICI include:

1. The possibility of on-chip integration since (*i*) it uses small capacitors ($\sim$74 pF in total), and (*ii*) the circuit area increases only marginally (one flip-flop and 2 gates) for a twofold increase in T (in contrast to almost doubling the size if simple capacitive integration is used). Only one integrator is needed on the whole chip by multiplexing the current from a number of CUTs to the BICI (as illustrated in Fig. 3.1).

2. Suitability for BIST applications since it generates a digital signature which can be easily evaluated on chip by simple digital circuitry.

Figure 3.1: The proposed current monitor.

The BICI circuit presented in this chapter performs integration during one phase of a digital control signal, hence the name single-phase BICI. Chapter 4 presents a double-phase BICI circuit which performs integration during both phases of the control signal. The current monitor has been implemented using a standard 0.35 $\mu m$ CMOS technology. Analytical models, simulation results and some silicon measurements are also presented.

The organization of the chapter is as follows. The detailed operation and analytical model of the current sensor circuit are presented in Sec. 3.1. In Sec. 3.2, the current integrator circuit details are described. Sec. 3.3 reports the simulation and experimental measurement results. Sec. 3.4 provides further discussion on the proposed circuits, and draws some conclusions.

# **3.1 Built-In Current Sensor (BICS)**

#### 3.1.1 BICS circuit

A block diagram of the proposed current sensor circuit appears in Fig. 3.2. The main part of the circuit is a current mirror composed of transistors M12 and M13. The CUT supply current  $I_{DD}$  is sensed by M12 and mirrored onto M13 as a current  $I_{sense}$  given by:

$$I_{sense} = K_s I_{DD} \tag{3.1}$$

where  $K_s = \frac{w_{13}/l_{13}}{w_{12}/l_{12}}$  (w/l is the aspect ratio of the transistors).

The differential amplifier (DIFFAMP) amplifies the voltage difference between  $V_{BIC}$ and a reference  $V_{ref}$ . This difference, used in the negative feedback loop, stabilizes  $V_{BIC}$ , forcing  $V_{BIC}$  to be close to  $V_{ref}$ .

We assume that  $V_{ref}$  is supplied from an on-chip voltage source (e.g., a bandgap voltage source [95]).  $V_{ref}$  has to be small to allow a large portion of  $V_{DD}$  to be supplied to the CUT. However, choosing a very small value for  $V_{ref}$  could cause M12 to operate in the triode region, which would result in increased  $Z_{BIC}$  (see Sec. 3.1.3). In the circuit implementation shown in Fig. 3.3,  $V_{ref} = 0.1V_{DD}$ .



Figure 3.2: The block diagram of the current sensor.

DIFFAMP is a single stage differential amplifier instead of the two-stage op-amp used in [10] for the following two principal reasons:

- 1. The second stage of an op-amp provides additional gain and output drive. For our sensor application, sufficient gain can be achieved with only one stage (the gain is sufficient to yield a  $Z_{BIC} = 2\Omega$ ), and large output drive is not necessary because the amplifier is not driving a resistive load. Therefore, in addition to simplifying the design, eliminating this second stage saves area without inducing performance loss.
- 2. Using a two-stage op-amp in conjunction with M12 creates an amplifier with three poles in the frequency domain. Such a structure is difficult to stabilize.

The negative feedback also reduces  $Z_{BIC}$  since:

$$Z_{BIC} = \frac{r_{12}}{A_D}, \tag{3.2}$$

where  $r_{12}$  is M12's drain-source resistance and  $A_D$  is the differential amplifier gain (Sec. 3.1.3 gives justification for the latter claim).  $Z_{BIC}$ , however, increases at high frequencies giving rise to noise at  $V_{BIC}$  which appears as supply noise at the CUT ground terminal. Sec. 3.1.3 offers a bandwidth definition for the sensor.

#### **3.1.2 BICS accuracy**

The accuracy of the current mirror is indicated by

$$ERR = 1 - K_s I_{DD} / I_{sense}.$$

It is possible to achieve an error close to 1% by using layout matching techniques [96], assuming the effect of mismatch between $v_{ds12}$ and $v_{ds13}$ is negligible. This mismatch can safely be neglected if long channel transistors ($l > 5 \mu m$) are used, but such devices make M12 and M13 very large. Instead, we choose short channel transistors ($l = 0.35 \mu m$) and use an op-amp (OPAMP in Fig. 3.3) to ensure $v_{ds12} = v_{ds13}$ [10]. An advantage of this technique is that *ERR* will remain small even if M12 moves to its triode region of operation due to a large faulty $I_{DD}$. Also, the area overhead of the OPAMP can be less than that of using longer M12 and M13. For example, Table 3.1 compares the area overhead of the two techniques for the design illustrated in Fig. 3.3.

Another advantage of this technique is that as the width of M12 is increased to accommodate larger values of  $I_{DD}$ , the area of OPAMP remains the same. This results in a larger amount of area saved than when long channel transistors are used.
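As a worked illustration of Eq. 3.1 and the *ERR* definition, the sketch below evaluates the mirror ratio for the long channel sizes listed in Table 3.1. The supply current and the "measured" sense current are assumed values chosen only to show the arithmetic; they are not measurement results from this work.

```python
def mirror_ratio(w12, l12, w13, l13):
    """K_s = (w13 / l13) / (w12 / l12), as defined with Eq. 3.1."""
    return (w13 / l13) / (w12 / l12)

def mirror_error(k_s, i_dd, i_sense):
    """ERR = 1 - K_s * I_DD / I_sense."""
    return 1 - k_s * i_dd / i_sense

# Long channel sizes from Table 3.1 (in metres).
k_s = mirror_ratio(w12=500e-6, l12=5e-6, w13=80e-6, l13=5e-6)

i_dd = 10e-3               # assumed CUT supply current (A)
i_sense_ideal = k_s * i_dd # Eq. 3.1 for a perfectly matched mirror
i_sense_actual = 1.616e-3  # assumed sense current including mismatch (A)

print(f"K_s = {k_s:.3f}")
print(f"ideal I_sense = {i_sense_ideal * 1e3:.3f} mA")
print(f"ERR = {mirror_error(k_s, i_dd, i_sense_actual) * 100:.2f} %")
```

With these assumed numbers the sense current is 1% larger than its ideal value, which corresponds to the error level quoted above for a well matched layout.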

| ERR reduction technique | Approximate area |
|-------------------------|------------------|
| Long channel M12 and M13 $\left(\frac{w_{12}}{l_{12}} = \frac{500\mu}{5\mu}, \frac{w_{13}}{l_{13}} = \frac{80\mu}{5\mu}\right)$ | $9000\mu m^2$ |
| Using an op-amp for $V_{ds}$ matching | $4700\mu m^2$ |

Table 3.1: Area overhead comparison of two current mirror matching techniques

## **3.1.3** Calculating $Z_{BIC}$

Fig. 3.3 shows a transistor level diagram of the sensor circuit. $Z_{BIC}$ is obtained from a small signal analysis of the circuit. To calculate $Z_{BIC}$ in this circuit, the CUT is modeled with the current source $I_{DD}$ and the $C_{CUT}$ capacitor (effective capacitance at the CUT supply node). The capacitance is assumed to be 3 pF, which is the equivalent capacitance at the GND node of an example VCO circuit investigated in [8]. The differential amplifier (M1-M11), the sensing transistor (M12), and the $I_{DD}$ current source, in effect, form a two-stage operational amplifier (op-amp) connected in a unity-gain configuration [97, Chapter 5]. The compensation capacitor, $C_{comp}$, is used to achieve a 60° phase margin for sufficient stability.

To obtain  $Z_{BIC}(f)$ , a small signal analysis is performed as follows. From Fig. 3.3:

$$Z_{cz}(s) = Z_{BIC}(s) \,||\, Z_{I_{DD}}(s)$$

where  $Z_{cz}(s)$  and  $Z_{I_{DD}}(s)$  are the Laplace transforms of the closed-loop op-amp output impedance and the  $I_{DD}$  current source impedance, respectively. Since  $Z_{I_{DD}}(s) = \infty$ , it can be concluded that:

$$Z_{BIC}(s) = Z_{cz}(s)$$

We denote the Laplace transform of the open-loop gain and output impedance of the resulting op-amp by $H_{og}(s)$ and $Z_{oz}(s)$, respectively. Algebraic manipulation yields the following closed-loop output impedance of the unity gain connected op-amp [98]:

$$Z_{cz}(s) = \frac{V_{BIC}}{I_{BIC}} = \frac{Z_{oz}(s)}{1 + H_{og}(s)} \tag{3.3}$$



Figure 3.3: Current sensor circuit.

Figure 3.4: AC model of a MOS transistor.

To find an approximate closed-form expression for $Z_{BIC}$, we replaced each transistor with the small signal model shown in Fig. 3.4. From [97, Chapter 5], at a frequency of 0 Hz (DC), all the capacitances in the model can be replaced by open circuits, resulting in the following equations for $Z_{oz}(0)$ and $H_{og}(0)$:

$$Z_{oz}(0) = r_{d12} \quad \text{and} \quad H_{og}(0) = g_{m12}r_{d12}g_{m1}(r_{d8}||r_{d9}) \tag{3.4}$$

where  $g_{mi}$  and  $r_{di}$  denote the transconductance and drain-source resistance of the transistor Mi (i = 1, ..., 13). Substituting these values in Eqn. 3.3 yields the following for  $Z_{BIC}(0)$ :

$$Z_{BIC}(0) = \frac{r_{d12}}{1 + g_{m12}r_{d12}g_{m1}(r_{d8}||r_{d9})} \simeq \frac{1}{g_{m12}g_{m1}(r_{d8}||r_{d9})} = \frac{r_{12}}{A_D} \tag{3.5}$$

where  $A_D = g_{m1}(r_{d8}||r_{d9})$  is the differential amplifier gain, and  $r_{12} = 1/g_{m12}$  is M12's drain-source resistance. Note that  $g_{m12}$  decreases as M12 moves to the triode region. Therefore, M12 should operate in saturation region to ensure larger values of  $g_{m12}$  which, in turn, yield a smaller  $Z_{BIC}(0)$ .

To determine the characteristics of $Z_{BIC}$ at frequencies other than zero, we derive approximate expressions for the dominant zeros and poles of $Z_{BIC}(s)$, as they specify the frequency response of the current sensor. To do so, the symbolic toolbox of MATLAB [99] was used in conjunction with simulated operating point values for the transconductances, capacitances and resistances in the small signal model. Details of the technique used for this derivation can be found in Appendix A.

The result of the small signal analysis indicates that  $Z_{BIC}(s)$  has only one dominant zero given by:

$$z_d = -\frac{1}{(C_{comp} + C_{gs12})(r_{d8}||r_{d9})} \tag{3.6}$$

Since $Z_{BIC}(s)$ has one dominant zero, $|Z_{BIC}|$ increases at frequencies exceeding $|z_d|$. This implies that the sensor will have a frequency limit beyond which $|Z_{BIC}|$ will be too large to be acceptable. Practically, the sensor will have a limited acceptable bandwidth which depends on the CUT performance specifications. For example, assuming that a CUT can tolerate 25 mV of supply voltage noise ($|V_{BIC}|$ (ac amplitude) = 25 mV) and that the high frequency components of $I_{DD}$ have a maximum amplitude of 1 mA, the maximum acceptable value for $Z_{BIC}$ is $V_{BIC}/I_{DD} = 25\Omega$. We define the sensor bandwidth, $BW_{BIC}$, as the frequency range f = 0 to $f_R$, where $f_R$ is the frequency such that $|Z_{BIC}(s = j2\pi f_R)| = R_{MAX}$ and $R_{MAX}$ is a specified maximum acceptable value for the magnitude of the sensor impedance.

To obtain a closed-form expression for $BW_{BIC}$, we approximate $Z_{BIC}(s)$ by a single-zero system expression (with s = jf):

$$Z_{BIC}(jf) = Z_{BIC}(0)\left(1 + \frac{jf}{z_d}\right) \quad \Rightarrow \quad |Z_{BIC}(jf_R)| = Z_{BIC}(0)\left|1 + \frac{jf_R}{z_d}\right| \tag{3.7}$$

If  $Z_{BIC}(0) \ll R_{MAX}$  then  $z_d \ll f_R$ . Therefore, substituting  $z_d$  and  $Z_{BIC}(0)$  from Eqns. 3.5 and 3.6 in Eqn. 3.7 leads to the following simple expression for  $f_R$ :

$$f_R = R_{MAX} \frac{z_d}{Z_{BIC}(0)} = R_{MAX} \frac{g_{m1}g_{m12}}{C_{gs12} + C_{comp}} \tag{3.8}$$

From Eqn. 3.8, the sensor bandwidth is directly proportional to  $g_{m1}$  (and also  $g_{m2} = g_{m1}$ ) and  $g_{m12}$ . Various design techniques can be used to increase  $BW_{BIC}$ . A simple measure is to choose wider M1 and M2 transistors.
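The quality of the approximation leading from Eqn. 3.7 to Eqn. 3.8 can be checked numerically. The following is a minimal sketch that solves the single-zero model exactly for $f_R$ and compares the result with the approximate expression. The function names and the zero frequency of 0.6 MHz are assumptions chosen only for illustration; they are not design values.

```python
import math

def f_r_exact(z0, r_max, f_zero):
    """Solve z0 * |1 + j*f/f_zero| = r_max for f (single-zero model of Eqn. 3.7)."""
    if r_max <= z0:
        raise ValueError("r_max must exceed the DC impedance z0")
    return f_zero * math.sqrt((r_max / z0) ** 2 - 1.0)

def f_r_approx(z0, r_max, f_zero):
    """Approximation of Eqn. 3.8, intended for z0 << r_max."""
    return r_max * f_zero / z0

z0 = 2.7        # ohms, DC impedance of the first-order model (Table 3.2)
r_max = 25.0    # ohms, R_MAX from the example in the text
f_zero = 0.6e6  # Hz, magnitude of the dominant zero (assumed for illustration)

exact = f_r_exact(z0, r_max, f_zero)
approx = f_r_approx(z0, r_max, f_zero)
rel_diff = (approx - exact) / exact
print(f"exact: {exact / 1e6:.2f} MHz, approx: {approx / 1e6:.2f} MHz, "
      f"difference: {100 * rel_diff:.2f} %")
```

For a ratio $R_{MAX}/Z_{BIC}(0)$ of about 9, as in this example, the two values differ by less than 1%, which suggests that the simplification is adequate for design purposes.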

# **3.2** Single-phase BICI

Here, we propose a single-phase BICI circuit which performs integration only in one phase of a control signal  $\phi$  (when  $\phi$ =LOW) which controls the integration operation.

#### **3.2.1** Basis

For a feasible on-chip  $I_{DD}$  integration, the integrator circuit should not require a large capacitor. To avoid the use of a large capacitor, the circuit proposed here exploits the following two principles:

- Reducing the integration time, T, results in a proportional reduction in the size of the integrating capacitor (Eqn. 1.4).
- Integration over a specific time can be expressed as the summation of a series of integrations over shorter periods of time, *i.e.*,

$$\int_{0}^{T} I dt = \sum_{i=0}^{M-1} \int_{iT_{s}}^{(i+1)T_{s}} I dt \tag{3.9}$$

where  $T = MT_s$  and I is the BICI input current.

The functional block diagram in Fig. 3.5 illustrates the basis of this integrator circuit. In this circuit, the total integration window of duration T is divided into M smaller integration sub-windows of duration  $T_s$ , *i.e.*,  $T = MT_s$ . The 'short-time analog integrator' (STAI) integrates an input current I for the time  $T_s$  (the initial condition at the beginning of each integration sub-window must be 0V). By making  $T_s$  sufficiently short, an arbitrarily small capacitor C can be used to generate the voltage  $v_i(t)$  proportional to the integral of the input current:

$$v_i(t) = K_C \int_{iT_s}^t I(\tau) d\tau \tag{3.10}$$



Figure 3.5: Single-phase BICI functional block diagram

where  $K_C$  is a constant dependent on STAI design. The STAI output at the end of the *i*-th integration sub-window, denoted by  $V_i$ , is given by:

$$V_i = v_i(t)|_{t=(i+1)T_s} = K_C \int_{iT_s}^{(i+1)T_s} I(\tau) d\tau \tag{3.11}$$

$V_i$ is sampled and held by the S&H block. The ADC block converts $V_i$ to an n-bit digital number $N_i$ proportional to $V_i$, *i.e.*:

$$V_i = N_i V_\Delta + V_i^R \tag{3.12}$$

where $V_{\Delta}$ is the *ADC* quantization step and $0 < V_i^R < V_{\Delta}$ is the *ADC*'s quantization voltage error in the *i*-th integration sub-window. The *ACC* block adds all the $N_i$'s (i = 0, ..., M-1) to generate the number N, *i.e.*, $N = \sum_{i=0}^{M-1} N_i$. The following formally summarizes the current integrator functionality:

$$\begin{aligned}
N &= \sum_{i=0}^{M-1} N_{i} \\
&= \sum_{i=0}^{M-1} (V_{i} - V_{i}^{R}) / V_{\Delta} \\
&= (1/V_{\Delta}) \sum_{i=0}^{M-1} K_{C} \int_{iT_{s}}^{(i+1)T_{s}} I dt - (1/V_{\Delta}) \sum_{i=0}^{M-1} V_{i}^{R} \\
&= (K_{C}/V_{\Delta}) \int_{0}^{MT_{s}} I dt - V^{R}/V_{\Delta} \\
&= (K_{C}/V_{\Delta}) \int_{0}^{T} I dt - V^{R}/V_{\Delta}
\end{aligned} \tag{3.13}$$

where  $V^R = \sum_{i=0}^{M-1} V_i^R$ . The above is an idealized analysis. Actual limitations arising in practice are discussed later.

## **3.2.2** Notation and definitions

The notation and definitions used in describing the details of the single-phase BICI circuit and its operation are as follows:

- T: total integration time window duration
- $T_s$ : integration sub-window duration such that  $T = MT_s$
- M: the number of integration sub-windows in T
- $\phi_i$ : *i*-th (*i* = 0,..., *M* − 1) integration sub-window
- $T_{s(eff)}$ : "effective integration interval". Portion of $T_s$ in which integration is performed.
- $T_R = T_s - T_{s(eff)}$ : reset time. Portion of $T_s$ in which the circuit is reset for the next integration sub-window.
- $t_i$ : beginning time of the sub-window  $\phi_i$ .
- $t'_i = t_i + T_{s(eff)}$
- CLK: clock signal
- $\phi$ : signal with period  $T_s$  controlling the integration timing.
- $N_i$ : digital number proportional to integration of I in  $\phi_i$
- N: final number proportional to integration of I over time  $T(N = \sum_{i=0}^{M-1} N_i)$
- $Q_i$ : ADC quantization noise associated with $N_i$
- Q: total quantization noise over time T
- $R_i$ : reset noise associated with  $N_i$
- R: total reset noise over time T

#### **3.2.3** Circuit operation



Figure 3.6: Single-phase BICI circuit schematic

The single-phase BICI schematic is given in Fig. 3.6. Transistors M1 and M2 form a scaling current mirror to supply the current  $I_{sense}$  from the sensor to the integrating capacitor  $C_{int} = C_1 + C_2$ . The value of  $C_{int}$  is chosen such that M2 does not enter its triode region of operation during the integration over time  $T_s$ . For example, assuming that M2 enters its triode region at  $V_C = 2.5V$ ,  $T_s = 1 \,\mu s$ , and the maximum average current flowing into  $C_1$  during an integration sub-window,  $\overline{I_{T_s(max)}}$ , is 150  $\mu A$ , a 60 pF capacitor or larger would be required since  $C_{int} = \overline{I_{T_s(max)}}T_s/V_C = 60 \, pF$ .
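The capacitor sizing in the example above follows directly from $C_{int} = \overline{I_{T_s(max)}}T_s/V_C$. The short sketch below reproduces the 60 pF figure and, for comparison, evaluates the same expression for a single integration over the whole window T = 1 ms used later in Sec. 3.2.6. The function name is hypothetical.

```python
def min_integrating_capacitance(i_avg_max, t_window, v_c_max):
    """Smallest capacitance (farads) that keeps V_C <= v_c_max after t_window."""
    return i_avg_max * t_window / v_c_max

i_avg_max = 150e-6   # A, maximum average current in a sub-window
v_c_max = 2.5        # V, voltage at which M2 enters its triode region

c_sub_window = min_integrating_capacitance(i_avg_max, 1e-6, v_c_max)    # T_s = 1 us
c_full_window = min_integrating_capacitance(i_avg_max, 1e-3, v_c_max)   # T = 1 ms

print(f"sub-window integration:  {c_sub_window * 1e12:.0f} pF")
print(f"full-window integration: {c_full_window * 1e9:.0f} nF")
```

Dividing the 1 ms window into 1000 sub-windows reduces the required capacitance by the same factor of 1000, from 60 nF to 60 pF, which is the first principle stated in Sec. 3.2.1.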

A timing diagram of the circuit operation illustrating the significant waveforms is shown in Fig. 3.7. The input current I is assumed to be a sinusoid with DC offset.  $\phi$  is a signal with period  $T_s$  which controls the integration timing. Since  $T = MT_s$ , a complete integration requires M consecutive periods of  $\phi$ . In each sub-window  $\phi_i$ , two parallel operations are performed:

- 1. I is integrated over time  $T_{s(eff)}$  resulting in the voltage waveform  $V_C$ . In the subsequent time  $T_R$ , a part of the integration result is stored as the voltage  $V_i$  on  $C_3$  ( $V_S$  waveform).  $T_R$  is the *reset* time during which  $C_1$  is discharged and the control circuitry is reset (initialized)
- 2. Concurrently to the above, the ADC & ACC blocks digitize  $V_{i-1}$  stored on  $C_3$  at the end of the previous integration sub-window  $\phi_{i-1}$  and accumulate the result

These operations are repeated over M integration sub-windows to obtain the integral of I over the time  $MT_s$ .



Figure 3.7: Timing diagram and different waveforms in the BICI circuit of Fig. 3.6

#### Short-Time analog integrator (STAI) and S&H

The operations of the STAI and S&H blocks are tightly coupled, and thus both are described in this section.

From Figs. 3.6 and 3.7, in the interval  $\phi_i$ , switches S1 and S3 are closed for time  $T_{s(eff)}$ , while switches S2 and S4 are open. During such a time interval, I flows through  $C_1$  and  $C_2$ . The voltage at node C at the end of the *i*-th integration interval,  $V_C^i$ , is given by:

$$V_C^i = V_C(t_i') = \frac{1}{C_1 + C_2} \int_{t_i}^{t_i'} I dt + K_{21} V_{i-1} \tag{3.14}$$

where  $V_{i-1}$  is the initial condition on  $C_2$  at the beginning of  $\phi_i$ ,  $K_{21} = C_2/(C_1 + C_2)$ ,  $t_i = iT_s$  and  $t'_i = iT_s + T_{s(eff)} = (i+1)T_s - T_R$ . Note that we assume  $C_1$  is fully discharged at the end of each interval  $\phi_i$ .

For the subsequent time $T_R$ in $\phi_i$, switches S1 and S3 are opened, and switches S2 and S4 are closed. This effectively results in two functions: (i) $C_1$ is discharged through S2 to initialize $C_1$ for the next integration sub-window; (ii) the charge stored on $C_2$ at $t = t'_i$ is distributed between $C_2$ and $C_3$. Therefore, the voltage on node S at time $t_i'^{+} = t'_i + \epsilon$ ($\epsilon \rightarrow 0$), denoted by $V_i$, is:

$$V_i = \frac{C_2}{C_3 + C_2} V_C^i = K_{23} V_C^i \tag{3.15}$$

where  $K_{23} = C_2/(C_3 + C_2)$ . The waveforms  $V_S$  and  $V_A$  in Fig. 3.7 illustrate this operation for  $K_{23} = 0.5$ .

In the subsequent interval $\phi_{i+1}$, the switch S4 is opened causing $V_i$ to be held on $C_3$. Therefore, switches S3 and S4 in conjunction with $C_2$ and $C_3$ perform a sample and hold operation. The advantage of this circuit over more conventional sample and hold circuits is that it does not require analog buffers in front of the sampling switches S3 and S4. This simplifies the design significantly because there is no need to design wideband buffers with very small input DC offset voltages [97]. Not using an isolating buffer between $C_1$ and $C_2$ means that the charges on $C_2$ in $\phi_i$ affect the integration in $\phi_{i+1}$ (see Eqn. 3.14). However, as shown in Appendix B, this affects only the proportionality factor K of BICI characteristics (Eqn. 3.16) which is obtained through calibration.
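The effect of the charge left on $C_2$ can be seen by iterating Eqns. 3.14 and 3.15. The sketch below does so for a constant input current and compares the settled held voltage with the closed-form steady state $V = K_{23}\,q / [(C_1 + C_2)(1 - K_{23}K_{21})]$, where q is the charge integrated in one sub-window. The capacitor values are assumptions for illustration (chosen so that $C_1 + C_2 = 60\,pF$ and $K_{23} = 0.5$), and the function name is hypothetical.

```python
def stai_sample_hold(charges, c1, c2, c3):
    """Iterate Eqns. 3.14 and 3.15; charges[i] is the integral of I over sub-window i."""
    k21 = c2 / (c1 + c2)
    k23 = c2 / (c2 + c3)
    v_prev = 0.0   # V_{i-1}: voltage left on C2 after the previous charge sharing
    held = []
    for q in charges:
        v_c = q / (c1 + c2) + k21 * v_prev   # Eqn. 3.14
        v_prev = k23 * v_c                   # Eqn. 3.15
        held.append(v_prev)
    return held

c1, c2, c3 = 55e-12, 5e-12, 5e-12   # farads, assumed values
q = 100e-6 * 0.95e-6                # coulombs: 100 uA integrated over 0.95 us
held = stai_sample_hold([q] * 20, c1, c2, c3)

k21 = c2 / (c1 + c2)
k23 = c2 / (c2 + c3)
steady_state = k23 * q / ((c1 + c2) * (1.0 - k23 * k21))
print(f"first: {held[0]:.4f} V, settled: {held[-1]:.4f} V, closed form: {steady_state:.4f} V")
```

In this model the residual charge changes only the gain between the integrated charge and the held voltage, which is consistent with the statement that only the proportionality factor K is affected.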

#### ADC & accumulator (ACC)

During  $\phi_i$  (time  $t_i$  to  $t'_i$ ), the ADC and ACC blocks digitize the voltage  $V_{i-1}$  stored on  $C_3$  (node S) and add the converted voltage to the digital number accumulated using a counter. The single slope technique is used to perform the analog-to-digital conversion because it can be implemented with small area and provides good linearity.

When  $\phi$  goes from HIGH to LOW, the flip-flop FF2 is set (r\_rst signal turns LOW) causing the counter to start counting (incrementing by one at each CLK rising edge). The CLK period must be much less than  $T_{s(eff)}$ , e.g.,  $T_{s(eff)} = 20T_{CLK}$ . Concurrently, switches S5 and S7 are opened, and S6 is closed. This causes the constant current  $I_{ramp}$  to flow into  $C_4$ , generating a ramp voltage waveform with constant slope at node R. The comparator COMP output, r1, switches state when  $V_R \ge V_S = V_{i-1}$ . At this moment, FF1 is set. This, in turn, resets FF2, which stops the counter. At the end of such a cycle of operations, a number  $N_{i-1}$  proportional to  $V_{i-1}$  is added to the value already accumulated in the counter.

After stopping the counter, S6 is opened and switches S5 and S7 are closed to discharge  $C_3$  and  $C_4$ . Also, FF1 is reset to initialize FF1 and FF2 for conversion in the next integration sub-window.
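The single-slope conversion described above can be modelled in a few lines. The following is an idealized sketch that ignores the comparator and logic delays: the ramp rises by $(I_{ramp}/C_4)T_{CLK}$ per clock period, and the code is the number of whole steps below the held voltage, in line with Eqn. 3.12. $I_{ramp}$ and the clock period are the values given in Sec. 3.2.6; the value of $C_4$, the limit of 19 counts and the function name are assumptions for illustration.

```python
def single_slope_count(v_in, i_ramp, c4, t_clk, max_count):
    """Idealized single-slope ADC code for a held voltage v_in (no circuit delays)."""
    slope = i_ramp / c4           # ramp slope at node R, in V/s
    v_step = slope * t_clk        # ramp increase per clock period (quantization step)
    count = int(v_in // v_step)   # whole quantization steps below v_in
    return min(count, max_count)  # conversion must finish within the sub-window

i_ramp = 11.2e-6   # A, from Sec. 3.2.6
c4 = 8e-12         # F, assumed value
t_clk = 50e-9      # s, 20 MHz clock
max_count = 19     # clock periods available in T_s(eff) = 0.95 us

code_mid = single_slope_count(0.8, i_ramp, c4, t_clk, max_count)
code_sat = single_slope_count(2.0, i_ramp, c4, t_clk, max_count)
print(code_mid, code_sat)
```

With these assumed values the quantization step is about 70 mV, so a held voltage of 0.8 V yields a code of 11, while 2.0 V exceeds the ramp range and saturates at the maximum count.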

As shown in Appendix B, the relationship between the final counter state, N, and the average current over time T, is obtained from the following:

$$N = K \int_0^T I dt + O + E = KT\overline{I} + O + E \tag{3.16}$$

where K is a constant, O is an offset value, and E is a random number resulting from ADC quantization noise Q, and a reset error R, *i.e.*:

$$E = Q + R \tag{3.17}$$

Q and R are the result of the accumulation of quantization noises  $Q_i$ 's, and reset errors  $R_i$ 's in each integration sub-window, respectively:

$$Q = \sum_{i=0}^{M-1} Q_i \tag{3.18}$$

$$R = \sum_{i=0}^{M-1} R_i \tag{3.19}$$

The reset error factor is due to the reset time  $T_R$  in each integration sub-window during which I is not integrated. As shown in Appendix B, E has a Gaussian distribution with a zero mean and a variance given by:

$$\sigma_E^2 = M \left[ \frac{1}{12} + \left( \frac{K_M}{K_r T_{CLK}} \right)^2 \overline{I_{ac}^2} T_R^2 \right]$$

where  $K_M = \frac{K_{23}}{(C_1 + C_2)(1 - K_{23}K_{21})}$  and  $K_r = I_{ramp}/C_4$ .
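To give a feeling for the magnitude of E, the sketch below evaluates the variance expression above. The quantization-only case needs only M; the second evaluation adds the reset term using assumed values for the capacitors ($C_1 = 55\,pF$, $C_2 = C_3 = 5\,pF$, $C_4 = 8\,pF$) and a 50 µA sinusoidal ac component. These component values and the function name are illustrative assumptions, not design data.

```python
import math

def sigma_e(m, k_m, k_r, t_clk, i_ac_mean_square, t_r):
    """Standard deviation of E (in counts) from the variance expression above."""
    reset_term = (k_m / (k_r * t_clk)) ** 2 * i_ac_mean_square * t_r ** 2
    return math.sqrt(m * (1.0 / 12.0 + reset_term))

m = 1000
t_clk = 50e-9      # s
t_r = 50e-9        # s

# Assumed component values (illustration only).
c1, c2, c3, c4 = 55e-12, 5e-12, 5e-12, 8e-12
i_ramp = 11.2e-6
k21 = c2 / (c1 + c2)
k23 = c2 / (c2 + c3)
k_m = k23 / ((c1 + c2) * (1.0 - k23 * k21))
k_r = i_ramp / c4
i_ac_mean_square = (50e-6) ** 2 / 2.0   # mean square of a 50 uA amplitude sinusoid

sigma_q_only = sigma_e(m, k_m, k_r, t_clk, 0.0, t_r)
sigma_total = sigma_e(m, k_m, k_r, t_clk, i_ac_mean_square, t_r)
print(f"quantization only: {sigma_q_only:.2f} counts, with reset error: {sigma_total:.2f} counts")
```

Under these assumptions the standard deviation is on the order of ten counts, which is small compared with a final count N of several thousand.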

Sec. 3.2.4 describes how K and O can be obtained through calibration, while Sec. 3.2.5 discusses guidelines for minimizing E.

#### **3.2.4 BICI two-point calibration**

To calibrate the integrator circuit, two known current signals, $I_{c1}$ and $I_{c2}$, are applied to the integrator circuit. In each case, for given CLK, T, $T_s$, $T_R$ and $I_{ramp}$, the corresponding averages $N_{c1}$ and $N_{c2}$ are recorded by the counter:

$$N_{c1} = KT\overline{I_{c1}} + O + E_{c1} \tag{3.20}$$

$$N_{c2} = KT\overline{I_{c2}} + O + E_{c2} \tag{3.21}$$

The calibrating current signals,  $I_{c1}$  and  $I_{c2}$ , must be AC signals with a large average (e.g., more than 30% of the maximum average current the circuit is designed for  $(\overline{I}_{MAX})$ ). The large average current yields relatively large values for  $N_i$ 's. This reduces the ratio  $Q_i/N_i$ because the statistics of  $Q_i$  are independent of  $N_i$  (they are only a function of the ADC quantization step (refer to Appendix B)). The AC component reduces  $E_{c1}$  and  $E_{c2}$  by randomizing the  $Q_i$ 's and the reset errors  $R_i$ 's in each integration sub-window. The frequency of the AC components, denoted by  $f_{CAL}$ , should satisfy the relationship  $1/T < f_{CAL} < 1/T_s$  and be uncorrelated with  $\phi$ . Assuming negligible  $E_{c1}$  and  $E_{c2}$ , Eqns. 3.20 and 3.21 simplify to:

$$N_{c1} = KT\overline{I_{c1}} + O \tag{3.22}$$

$$N_{c2} = KT\overline{I_{c2}} + O \tag{3.23}$$

Computing K and O in the above completes the calibration process. Eqns. 3.16, 3.22 and 3.23 yield the following relationship between N and  $\overline{I}$ :

$$N = (N_{c2} - N_{c1})\left(\frac{\overline{I}}{\overline{I_{c2}} - \overline{I_{c1}}}\right) + \frac{N_{c1} - N_{c2}\overline{I_{c1}}/\overline{I_{c2}}}{1 - \overline{I_{c1}}/\overline{I_{c2}}} + E \tag{3.24}$$

The DC values of  $I_{c1}$  and  $I_{c2}$  signals must be precisely known but the frequency, amplitude and waveform of their AC components are not critical because the AC components only serve as an error randomizer to reduce the factor E. Any AC signal available on the chip could be used for this purpose as long as it satisfies the conditions mentioned above. The precise DC current  $\overline{I_{c1}}$  has to be generated on-chip or be supplied from off-chip.  $\overline{I_{c2}}$  can be generated using a properly matched current mirror  $\overline{I_{c2}} = n\overline{I_{c1}}$ .
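The two-point calibration amounts to fitting a straight line through two (average current, count) pairs and inverting it. The sketch below is a minimal illustration using the two calibration points reported later in Table 3.4 (SIN2 and TRG2) and the count reported there for signal DC1. The function names are hypothetical, and the error term E is neglected.

```python
def two_point_calibration(i_c1, n_c1, i_c2, n_c2):
    """Return (KT, O) of N = KT * I_avg + O from two calibration points (Eqns. 3.22, 3.23)."""
    kt = (n_c2 - n_c1) / (i_c2 - i_c1)
    offset = n_c1 - kt * i_c1
    return kt, offset

def estimate_average_current(n, kt, offset):
    """Invert N = KT * I_avg + O, neglecting the error term E."""
    return (n - offset) / kt

# Calibration points from Table 3.4, currents in microamperes.
kt, offset = two_point_calibration(70.0, 8737, 100.0, 11826)

# Counter value reported for signal DC1 (true average current 150 uA).
i_dc1 = estimate_average_current(16985, kt, offset)
print(f"KT = {kt:.2f} counts/uA, O = {offset:.1f} counts, DC1 estimate = {i_dc1:.1f} uA")
```

The estimate obtained for DC1 is 150.1 µA, which matches the corresponding entry in Table 3.4.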

In a BIST application, the current measurement result should be evaluated on-chip by comparing N with limit values  $N_l$  and  $N_h$  corresponding to the predetermined minimum and maximum average current thresholds  $I_l$  and  $I_h$ , respectively. Assuming  $\overline{I_{c2}} = 2\overline{I_{c1}}$ ,  $I_l = l\overline{I_{c2}}$  and  $I_h = m\overline{I_{c2}}$  where l and m are integers, from Eqn. 3.24, the calculation of  $N_l$ and  $N_h$  only requires simple integer multiplication and addition as shown below:

$$N_l = (2l-1)N_{c2} - (2l-2)N_{c1} \tag{3.25}$$

$$N_h = (2m-1)N_{c2} - (2m-2)N_{c1} \tag{3.26}$$

The above operation can be implemented using a shift register and an adder. Digital multipliers and adders, usually readily available on mixed-signal ICs, can also be used to calculate  $N_l$  and  $N_h$ . This is important, as each chip has to be calibrated and therefore standard  $N_l$  and  $N_h$  values cannot be downloaded from an external source (*e.g.*, tester).
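Eqns. 3.25 and 3.26 can be rearranged as $N_l = N_{c2} + 2(l-1)(N_{c2} - N_{c1})$, and similarly for $N_h$, which maps onto a subtraction, a small integer multiplication, a one-bit left shift and an addition. The sketch below illustrates this with integer arithmetic only; the calibration counts and the function names are hypothetical.

```python
def threshold_counts(n_c1, n_c2, l, m):
    """N_l and N_h of Eqns. 3.25 and 3.26, assuming I_c2 = 2 * I_c1."""
    delta = n_c2 - n_c1                     # counts corresponding to I_c1
    n_l = n_c2 + (((l - 1) * delta) << 1)   # (2l-1)*N_c2 - (2l-2)*N_c1
    n_h = n_c2 + (((m - 1) * delta) << 1)   # (2m-1)*N_c2 - (2m-2)*N_c1
    return n_l, n_h

def passes_current_test(n, n_l, n_h):
    """On-chip pass/fail decision: N must lie within [N_l, N_h]."""
    return n_l <= n <= n_h

# Hypothetical calibration counts for I_c1 and I_c2 = 2 * I_c1.
n_c1, n_c2 = 5000, 8500
n_l, n_h = threshold_counts(n_c1, n_c2, l=1, m=2)
print(n_l, n_h, passes_current_test(12000, n_l, n_h), passes_current_test(16000, n_l, n_h))
```

With l = 1 and m = 2 the thresholds are 8500 and 15500 counts, so a measured count of 12000 passes while 16000 fails.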

## 3.2.5 Single-phase BICI accuracy

To achieve a high measurement accuracy, sources of error have to be identified and their impact minimized by design. The factors affecting the measurement accuracy of the integrator circuit are as follows:

- 1. M1 and M2 should be long transistors to reduce the effect of $V_{ds}$ variations on I.
- 2. The M1/M2 mismatch, the $C_1$, $C_2$, $C_3$, $C_4$, and $I_{ramp}$ variations, and the propagation delays in the comparator, flip-flops and gates (denoted by $t_d$ in Fig. 3.7) affect the factors K and O in Eqn. 3.16; this is shown by the analysis in Appendix B. Using a two-point calibration scheme, K and O can be determined. Therefore, process and temperature variations affecting K and O will not affect measurement accuracy, *i.e.*, these can be accounted for via the calibration process explained in Sec. 3.2.4.
- 3. The clock feedthrough associated with the S3 and S4 switches can alter the charges on $C_1$, $C_2$ and $C_3$, thereby affecting measurement accuracy. Avoiding very small values for these capacitors and using complementary switches will minimize the clock feedthrough problem [97].
- 4. Power supply $(V_{DD})$ variation will not deteriorate the performance of the integrator circuit because it affects the value of $I_{ramp}$, the delay and the offset of the comparator, and the delays of the gates and flip-flops in the circuit, all of which are compensated for during calibration.
- 5. The error term E in Eqn. 3.16 will generally be the major source of accuracy degradation. E has two components: the quantization noise Q, and the reset error R that results because I is not integrated during the reset time $T_R$. R can be minimized by making $T_R/T_s \ll 1$. As for Q, increasing $T_s$ or increasing the CLK frequency will reduce this factor.

As discussed above, the integrator circuit accuracy is independent of process variation when using a two-point calibration scheme. This property makes the circuit especially suitable for integration on high volume mixed-signal chips.

#### **3.2.6** Circuit implementation

Assuming that the integrator circuit is to be used in an application where the maximum of  $\overline{I}$  is  $150 \,\mu A$  and the available CLK frequency is 20 MHz, the following parameters were chosen to design the integrator circuit:

- $T = 1 ms, T_s = 1 \mu s, M = 1000$
- $T_{s(eff)} = 0.95 \, \mu s, T_R = 50 \, ns$
- $I_{ramp} = 11.2 \, \mu A$
- Counter size: 18 bits



Figure 3.8: The transistor-level schematic of the integrator circuit

These parameters imply that  $C_1 = 60 \, pF$ . The transistor level schematics of the main circuit and the comparator are shown in Figs. 3.8 and 3.9. The component values and transistor sizes are shown for critical components. These values have been chosen to achieve reasonable integration accuracy (error < 2%) as explained in Sec. 3.2.5 and to reduce the circuit area. Since the delay and offset of the comparator are not critical, a simple two-stage comparator was chosen. It is possible to use faster comparator structures such as the one in [100, Chapter 26], but a larger circuit area would be required.
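As a rough consistency check on the parameters listed above, the sketch below estimates the largest count the accumulator can reach and the number of bits needed to hold it. It assumes an idealized conversion in which at most one count is added per full clock period within $T_{s(eff)}$, and ignores circuit delays; the variable names are illustrative.

```python
t_clk_ns = 50      # clock period for a 20 MHz CLK, in nanoseconds
t_s_eff_ns = 950   # effective integration interval, in nanoseconds
m = 1000           # number of integration sub-windows
counter_bits = 18

counts_per_window = t_s_eff_ns // t_clk_ns   # full clock periods in T_s(eff)
n_max = m * counts_per_window                # largest accumulated count
bits_needed = n_max.bit_length()             # bits required to represent n_max

print(f"max count per sub-window: {counts_per_window}")
print(f"max accumulated count:    {n_max}")
print(f"bits needed: {bits_needed} (counter provides {counter_bits})")
```

Under these assumptions the accumulated count stays below 19000, so the 18-bit counter cannot overflow within one integration window.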

 $\phi_a$  and  $\phi_b$  are non-overlapping control signals constructed from  $\phi$ . They are used instead of  $\phi$  and  $\overline{\phi}$  in Fig. 3.6 to avoid sample and hold errors [97].

The total area of the circuit is  $45500 \ (\mu m)^2$ . This is equivalent to the area of a digital circuit consisting of 250 NAND gates in a  $0.35 \ \mu m$  digital cell library.



Figure 3.9: Schematic of the comparator

# 3.3 Simulation and Experimental Results

# 3.3.1 BICS

The current sensor circuit has been simulated with SpectreS [101] and SpectreS Verilog from Cadence Design Systems.

The BIC sensor's main performance measure is its output impedance response versus frequency, *i.e.*, $Z_{BIC}(s = j2\pi f)$. Fig. 3.10 reports $Z_{BIC}$ vs. frequency from our approximate first-order analytical model, simulations, and IC measurements for the circuit illustrated in Fig. 3.3. The DC impedance, $Z_{BIC}(0)$, and the bandwidth for $R_{max} = 25\Omega$ are tabulated in Table 3.2.

|                                          | $Z_{BIC}(0)$ $(\Omega)$ | $BW_s$ (MHz) $(R_{max} = 25\Omega)$ |
|------------------------------------------|-------------------------|-------------------------------------|
| 1st-order analytical model               | 2.7                     | 5.7                                 |
| Simulation ( $C_{CUT} = 3 \text{ pF}$ )  | 2.7                     | 5.4                                 |
| Simulation ( $C_{CUT} = 30 \text{ pF}$ ) | 2.7                     | 5.3                                 |
| Measurement                              | 3                       | 5.3                                 |

Table 3.2: Low frequency impedance and bandwidth of  $Z_{BIC}$ 

The parameter values needed for the first-order model calculation are obtained from the operating point simulation. These parameters were found to be: $r_{ds8} = 700 \, k\Omega$, $r_{ds9} = 720 \, k\Omega$, $g_{m1} = 166 \, \mu \Omega^{-1}$, $g_{m12} = 6.1 \, m \Omega^{-1}$, $C_{gs12} = 88 \, fF$.
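As a quick numerical check, the sketch below evaluates the approximate form of Eqn. 3.5, $Z_{BIC}(0) \simeq 1/(g_{m12}A_D)$, with the operating-point values quoted above. The function names are illustrative. The exact form of Eqn. 3.5 is not evaluated because $r_{d12}$ is not among the quoted parameters.

```python
def parallel(r_a, r_b):
    """Resistance of two resistors in parallel."""
    return r_a * r_b / (r_a + r_b)

def z_bic_dc_approx(g_m1, g_m12, r_d8, r_d9):
    """Approximate DC sensor impedance, Z_BIC(0) = 1 / (g_m12 * A_D), from Eqn. 3.5."""
    a_d = g_m1 * parallel(r_d8, r_d9)   # differential amplifier gain A_D
    return 1.0 / (g_m12 * a_d), a_d

z0, a_d = z_bic_dc_approx(g_m1=166e-6, g_m12=6.1e-3, r_d8=700e3, r_d9=720e3)
print(f"A_D = {a_d:.1f}, Z_BIC(0) = {z0:.2f} ohm")
```

The result is about 2.8 Ω, which can be compared with the $Z_{BIC}(0)$ entries of 2.7 Ω to 3 Ω in Table 3.2.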

The high frequency poles, from measurement, are located at frequencies lower than expected from simulation. This is primarily because of the pad and oscilloscope probe capacitance. Repeating simulations for  $C_{CUT} = 30$  pF (shown in Fig. 3.10) confirms this hypothesis.



Figure 3.10:  $Z_{BIC}$  versus frequency

Plots of the simulated transfer function for  $I_{sense}/K_s I_{DD}$  appear in Fig. 3.11. This figure shows that even fairly high frequency components ( $f \sim 20 MHz$ ) of the current are sensed adequately (*i.e.*, without significant attenuation). However, since the high frequency components do not contribute to the average current (see Sec. 1.1.1), the limited bandwidth does not affect the accuracy of the BIC monitor.

Fig. 3.12 shows the simulated and measured DC characteristic of $V_{BIC}$ versus $I_{DD}$. As expected, for current values larger than a threshold value, $V_{BIC}$ increases rapidly with the current, because the differential amplifier DIFFAMP output saturates. Consequently, $v_{gs12}$ does not increase further as $I_{DD}$ is elevated. In other words, for large currents the negative feedback does not exist, which results in increased $Z_{BIC}$. Therefore, an upper limit for the dynamic range of this sensor with regard to $Z_{BIC}$ is $I_{th} = 10 \, mA$ from simulation and $I_{th} = 7.8 \, mA$ from chip measurement. The difference between the simulation and measurement results can be explained by the fact that the model file parameters used in simulation were extracted from a different fabrication run than the one in which the tested chip was manufactured. To achieve a larger dynamic range, a wider M12 should be chosen.

Figure 3.11: $I_{sense}/K_s I_{DD}$ transfer function for $I_{DD} = 1 mA$ at the operating point ( $K_s = 1/6$ )

Fig. 3.13 shows the simulated and measured current mirror gain error, $ERR = 1 - (K_s I_{DD}/I_{out})$, for different $I_{DD}$ levels. The dashed-dotted line shows the ideal value of ERR when $I_{out}$ is the drain current of M13 (in Fig. 3.3). As expected, this ERR value is very small (< 0.1%) even when M12 operates in the triode region ( $I_{DD} > 10 mA$ ). The dotted plot shows ERR when $I_{out} = I_{sense}$. In this case, ERR has a relatively large value (~ 2%) because of the leakage currents of transistor M13a in Fig. 3.3. However, as $I_{DD}$ increases, this leakage current becomes less significant and ERR decreases to less than 1.2% for $I_{DD} > 2mA$.

Figure 3.12: $V_{BIC}$ versus $I_{DD}$ DC characteristics

The chip measurement results, shown with circles, are significantly different from the simulation results. Upon investigation of the layout, it was discovered that this difference is due to the resistance of the metal lines and vias connecting the input $V_{BIC}$ node to the drain of M12. This resistance, denoted by $R_P$, is approximately 3 $\Omega$. Repeating the simulations with $R_P = 3\Omega$ yields the solid-line plot in Fig. 3.13, which matches the measurement results well. We therefore modified the layout to decrease this resistance to 0.5 $\Omega$. The dashed plot in Fig. 3.13 shows that this layout modification reduces the systematic mismatch to a maximum of 1.6%.





## 3.3.2 Single-phase BICI

Cadence SpectreS [101] and SpectreSVerilog simulators were used to simulate the BICI circuit described in Sec. 3.2.6. BSIM3 MOS models were used for analog and mixed-signal simulations.

Table 3.4 reports simulation results for the average current measured by the BICI for a set of 13 different current signals with different waveforms, frequencies, amplitudes, and offset values. This large set, described in Table 3.3, was used to test the performance of the circuit for a wide range of input signals.

| Name | Waveform              | Parameters                                              |
|------|-----------------------|---------------------------------------------------------|
| DC1  | DC                    | $I_{DC} = 150\mu A$                                     |
| DC2  | DC                    | $I_{DC} = 140\mu A$                                     |
| DC3  | DC                    | $I_{DC} = 100\mu A$                                     |
| DC4  | DC                    | $I_{DC} = 50\mu A$                                      |
| SIN1 | sinusoid              | f=3 MHz, DC offset=60 $\mu A$ , Amplitude=40 $\mu A$    |
| SIN2 | sinusoid              | f=100 kHz, DC offset=100 $\mu A$ , Amplitude=50 $\mu A$ |
| SIN3 | sinusoid              | f=5 kHz, DC offset=80 $\mu A$ , Amplitude=70 $\mu A$    |
| TRG1 | triangular            | f=3.57 MHz, DC offset=50 $\mu A$ , Amplitude=20 $\mu A$ |
| TRG2 | triangular            | f=20 kHz, DC offset=70 $\mu A$ , Amplitude=30 $\mu A$   |
| TRG3 | triangular            | f=2.5 kHz, DC offset=120 $\mu A$ , Amplitude=30 $\mu A$ |
| SPK1 | Shown in Fig. 3.14(a) |                                                         |
| SPK2 | Shown in Fig. 3.14(b) |                                                         |
| SQ   | Shown in Fig. 3.14(c) |                                                         |

Table 3.3: Single-phase BICI test signals

Signals SIN2 and TRG2 were used as calibration signals because they satisfy the conditions mentioned in Sec. 3.2.4. The second column in Table 3.4 reports the true average current for each signal and the fourth column indicates the average current estimated by the BICI circuit. The relative error of the estimated average current is reported in the last column. A relatively large error occurs when the average current is small and the current is constant for a large portion of its period (signals SPK1 and SPK2). This is due to the quantization noise Q because: (i) the $Q_i$'s become significant as the value of the current decreases; (ii) since current integration in each integration sub-window yields a constant voltage, $Q_i$ is constant for $i = 0, \ldots, M-1$, causing the error to accumulate over time. The same reasoning explains the relatively large errors for signals DC4 and SQ. Such inaccuracy, however, may not be a limiting factor in using this circuit for test purposes because the faulty current may be well below the lower threshold of the current tolerance band. For example, assume the fault-free current is 100 $\mu A$, and the current lower threshold is 50 $\mu A$. If the average current measured for a faulty circuit is 20 $\mu A$, even with 25% error for this value (the actual average current could be $25 \mu A < 50 \mu A$), the decision to discard this circuit is correct.
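The choice of SIN2 and TRG2 as calibration signals can be checked against two of the conditions of Sec. 3.2.4: the frequency window $1/T < f_{CAL} < 1/T_s$ and an average current above 30% of $\overline{I}_{MAX}$. The sketch below applies these two tests to the periodic signals of Table 3.3 with T = 1 ms, $T_s = 1\,\mu s$ and $\overline{I}_{MAX} = 150\,\mu A$. The function name is hypothetical, and the third condition (no correlation with $\phi$) is not modelled.

```python
def meets_calibration_conditions(f_ac, i_avg, total_time, t_s, i_avg_max, min_fraction=0.3):
    """Check 1/T < f_CAL < 1/T_s and a sufficiently large average current."""
    frequency_ok = 1.0 / total_time < f_ac < 1.0 / t_s
    average_ok = i_avg > min_fraction * i_avg_max
    return frequency_ok and average_ok

total_time, t_s, i_avg_max = 1e-3, 1e-6, 150e-6

# (frequency in Hz, DC offset in A) for the periodic signals of Table 3.3.
signals = {
    "SIN1": (3e6, 60e-6),
    "SIN2": (100e3, 100e-6),
    "SIN3": (5e3, 80e-6),
    "TRG1": (3.57e6, 50e-6),
    "TRG2": (20e3, 70e-6),
    "TRG3": (2.5e3, 120e-6),
}

suitable = sorted(
    name for name, (f_ac, i_avg) in signals.items()
    if meets_calibration_conditions(f_ac, i_avg, total_time, t_s, i_avg_max)
)
print(suitable)
```

SIN2 and TRG2 satisfy both tests, whereas SIN1 and TRG1 are excluded because their frequencies exceed $1/T_s = 1$ MHz.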

The signals SPK1 and SPK2 (Figs. 3.14*a* and 3.14*b*) are in fact similar to small-valued DC currents but with periodic short time current spikes in each period. Such signals model the supply current of circuits such as VCOs and digital circuit blocks.

The error is also significant for signal SIN1. The reason is that signal SIN1 is correlated with the  $\phi$  signal, causing both Q and R to accumulate over time.

In summary, simulations indicate that the BICI circuit provides < 2% error provided the current ac component is not correlated with  $\phi$ .



Figure 3.14: Current waveforms SPK1, SPK2, and SQ for validating the operation of the single-phase BICI circuit

| Waveform | True $\overline{I}$ $(\mu A)$ | N     | $\overline{I}_{estimated}$ $(\mu A)$ | $100(\Delta \overline{I}/\overline{I})$ |
|----------|-------------------------------|-------|--------------------------------------|-----------------------------------------|
| DC1      | 150                           | 16985 | 150.1                                | 0.1                                     |
| DC2      | 140                           | 15986 | 140.4                                | 0.3                                     |
| DC3      | 100                           | 11990 | 101.6                                | 1.6                                     |
| DC4      | 50                            | 6994  | 53.1                                 | 6.1                                     |
| SIN1     | 60                            | 7993  | 62.8                                 | 4.6                                     |
| SIN2     | 100                           | 11826 | 100                                  | 0 (Cal. point)                          |
| SIN3     | 80                            | 9688  | 79.2                                 | -1                                      |
| TRG1     | 50                            | 6710  | 47.2                                 | 0.6                                     |
| TRG2     | 70                            | 8737  | 70                                   | 0 (Cal. point)                          |
| TRG3     | 123                           | 14207 | 123.1                                | 0.1                                     |
| SPK1     | 18.3                          | 2999  | 14.3                                 | -22                                     |
| SPK2     | 33                            | 4388  | 27.7                                 | -16                                     |
| SQ       | 72.4                          | 9266  | 74.6                                 | 3                                       |

Table 3.4: Simulation results for the average current measurements by single-phase BICI circuit

# **3.4** Conclusions

We presented a built-in current (BIC) monitor suitable for power supply current  $(I_{DD})$  testing of analog circuit blocks. The BIC monitor senses the  $I_{DD}$  and generates a digital signature by averaging the current over a programmable time window.

The BIC monitor introduces a low impedance, $Z_{BIC}$ ($2.7\,\Omega$), in the power supply path of the CUT. $Z_{BIC}(f)$ has a bandwidth of about 5.3 MHz. The BIC sensor could be used in circuits where the CUT supply current has large DC and smaller ac components. This is the case in many analog circuits because (i) the biasing circuitry contributes a large DC current, and (ii) most embedded analog circuits do not drive large ac loads.

The BIC sensor could also be used in high speed submicron digital circuits by adding a bypass capacitor on the CUT connection to the sensor. Such a capacitor would reduce the maximum impedance of the sensor (as shown in Fig. 3.10), which would result in less power supply noise.

For BIST, a necessary functional block, besides the sensing block, is a measurement block. In this chapter, a measurement block was proposed that measures average current, *i.e.*, integrates current over a time window. The BICI requires a small area (equivalent to a circuit consisting of 250 NAND gates). By using any linear feedback shift register (LFSR) or a register already available on chip (*e.g.*, as part of digital BIST) to implement the counter needed for this circuit, a saving of 28% in area is possible. The BICI also yields a digital signature, which is convenient to interface to standard digital external or internal ATE and to standards such as IEEE 1149.1. Our circuit provides good accuracy (error < 2%) if the current levels are not too small relative to the maximum current for which the circuit has been designed (*e.g.*, $\overline{I}/\overline{I}_{MAX} > 0.3$). The digital signature makes the monitor suitable for BIST applications.

# Chapter 4

# **Double-Phase Built-In Integrator**

# 4.1 Introduction

As discussed in Sec. 1.1, averaging is an effective method for generating current signatures. In Sec. 3.2 a single-phase BICI circuit was presented that performs on-chip current integration. Although sufficient for some applications, the single-phase BICI might be inadequate for applications where higher measurement accuracy is required.

This chapter presents a double-phase BICI circuit which performs integration during both phases of the control signal (hence, the name 'double-phase'). This novel BICI architecture uses the same principles as those in Sec. 3.2.1. However, the circuit proposed here provides significantly more accurate measurements at the expense of a slightly larger area (approximately 20% more). The higher accuracy is achieved by: (*i*) using two integrators which operate at two complementary time intervals in each cycle of a control signal, hence eliminating reset error; and, (*ii*) feeding forward the quantization residues from each integration interval to the next, thereby preventing the accumulation of quantization noise. The double-phase BICI generates a digital signature proportional to  $\overline{I_{DD}}$ . Like the single-phase BICI, this circuit is also compact and can provide a worst-case error less than 1% for any  $I_{DD}$  waveform.

The organization of this chapter is as follows. In Sec. 4.2, the details of the double-phase BICI circuit are described. Sec. 4.3 reports the simulation results. Sec. 4.4 provides further discussion of the proposed circuit and concludes the chapter.

# 4.2 Double-phase BICI Circuit

#### **4.2.1** Basis

Figure 4.1: Double-phase BICI functional block diagram

The functional block diagram in Fig. 4.1 illustrates the basis of the double-phase BICI. In this circuit [93], the total integration window of duration T is divided into M smaller integration sub-windows of duration $T_s$, *i.e.*, $T = MT_s$. The input current is split into two components:

$$I = I_A + I_B$$

where  $I_A = \alpha I$  and  $I_B = (1 - \alpha)I$ .  $\alpha$  is a periodic function with period of  $T_s$ , whose one period is defined as:

$$\alpha = \begin{cases} 1 & 0 \le t < T_s/2 \\ 0 & T_s/2 \le t < T_s \end{cases}$$

 $I_A$  and  $I_B$  are integrated by the two half-wave current integrators HCI(A) and HCI(B) over the time T to generate two digital numbers,  $N_A$  and  $N_B$ , respectively, such that:

$$N_A = K_\Delta \int_0^T I_A dt + Q_A$$
$$N_B = K_\Delta \int_0^T I_B dt + Q_B$$

where  $Q_A$  and  $Q_B$  are HCI(A) and HCI(B) quantization errors. The digital adder sums  $N_A$  and  $N_B$  to obtain N which is proportional to the integration of I over T:

$$N = N_A + N_B = K_\Delta \int_0^T I dt + Q_A + Q_B$$

An HCI block generally functions according to the principles described in Sec. 3.2.1, but it uses a different ADC method than the one described in Sec. 3.2.1. The ADC presented in Sec. 3.2.3 generates a number $N_i$ such that $V_i = N_i V_{\Delta} + V_i^R$, where $V_{\Delta}$ is the quantization step of the ADC, and $V_i^R < V_{\Delta}$ is the ADC's quantization voltage error in the *i*-th integration sub-window. Therefore, from Eqn. 3.13, $V^R = \sum_{i=0}^{M-1} V_i^R$, which may result in significant quantization error. We propose a different ADC design to reduce $V^R$. The new ADC uses a quantization residue feed forward technique, which enhances the accuracy of the BICI by preventing the accumulation of the quantization errors $V_i^R$. In this technique, $V_{i-1}^R$, the residual voltage from digitization in the (i - 1)-th sub-window, is fed forward for digitization in the *i*-th sub-window. This is illustrated in Fig. 4.2 for an example ADC with seven quantization thresholds. Using this technique, the ADC converts $V_i^A = (V_i + V_{i-1}^R)$ to an n-bit digital number $N_i$, such that:

$$V_i^A = N_i V_\Delta + V_i^R \tag{4.1}$$

From Eqns. 3.11, 4.1, and the above discussion, the BICI's functionality is summarized as follows:

$$\begin{aligned}
K_{C} \int_{0}^{T} I\,dt &= \sum_{i=0}^{M-1} K_{C} \int_{iT_{s}}^{(i+1)T_{s}} I\,dt \\
&= N_{0}V_{\Delta} + \left(V_{0}^{R} + K_{C} \int_{T_{s}}^{2T_{s}} I\,dt\right) + \sum_{i=2}^{M-1} K_{C} \int_{iT_{s}}^{(i+1)T_{s}} I\,dt \\
&= N_{0}V_{\Delta} + N_{1}V_{\Delta} + \left(V_{1}^{R} + K_{C} \int_{2T_{s}}^{3T_{s}} I\,dt\right) + \sum_{i=3}^{M-1} K_{C} \int_{iT_{s}}^{(i+1)T_{s}} I\,dt \\
&\;\;\vdots \\
&= V_{\Delta} \sum_{i=0}^{M-1} N_{i} + V_{M-1}^{R}
\end{aligned} \tag{4.2}$$

From Eqn. 4.2, it is evident that  $V^R = V^R_{M-1}$ . The above is an idealized analysis. Actual limitations arising in practice are discussed later.
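The effect of the residue feed forward technique can be illustrated with a short numerical sketch. It is an idealized model, not the circuit itself: the function names are hypothetical, the sub-window voltages are made-up values, and voltages are expressed as integer millivolts so that the arithmetic is exact.

```python
def digitize_independently(v_samples, v_delta):
    """Quantize each sub-window on its own and discard the residue V_i^R."""
    return sum(v // v_delta for v in v_samples)


def digitize_with_feed_forward(v_samples, v_delta):
    """Carry the residue of sub-window i-1 into sub-window i (Eqn. 4.1)."""
    total = 0
    residue = 0  # residue before the first sub-window, assumed to be zero
    for v in v_samples:
        # V_i^A = V_i + V_{i-1}^R is split into N_i * V_delta + V_i^R
        n_i, residue = divmod(v + residue, v_delta)
        total += n_i
    return total, residue


# Hypothetical numbers: M = 1000 sub-windows, each integrating to 375 mV,
# digitized with a 100 mV quantization step.
samples_mv = [375] * 1000
print(digitize_independently(samples_mv, 100))      # 3000
print(digitize_with_feed_forward(samples_mv, 100))  # (3750, 0)
```

With these values the ideal result is 375000 / 100 = 3750 counts. Discarding the residue in every sub-window loses 750 counts, whereas with feed forward only the final residue $V_{M-1}^R$ remains, in agreement with Eqn. 4.2.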

# 4.2.2 Notation and definitions

The notations and definitions used in describing the integrator circuit and its operation are as follows:

- CLK: Clock signal
- $\phi$: Signal with period $T_s$ controlling the integration timing
- $\Phi$: Signal with period $T_s$ controlling the integration timing in a half-wave current integrator
- T: Total integration time window duration
- $T_s$: Integration sub-window duration such that $T = MT_s$
- M: The number of integration sub-windows in T
- $\phi_i$: *i*-th ($i = 0, \ldots, M-1$) integration sub-window
- $T_{s(eff)}$: "effective integration interval"; the portion of $T_s$ in which integration is performed
- $T_R = T_s - T_{s(eff)}$: "reset time"; the portion of $T_s$ in which the circuit is reset for the next integration sub-window
- $t_i$: Beginning of the period $\phi_i$
- $t'_i = t_i + T_{s(eff)}$
- $N_i$: The integer number proportional to integration of I in $T_i$
- N: Final number proportional to integration of I over time T ($N = \sum_{i=0}^{M-1} N_i$)
- Q: Total quantization noise over time T

Figure 4.2: Quantization residue feed forward technique used in the ADC of double-phase BICI

Any parameter with a (A) or (B) in its subscript refers to the corresponding parameter in HCI(A) or HCI(B), respectively.

# 4.2.3 Circuit operation



Figure 4.3: Current integrating circuit schematic

The BICI schematic and its important waveforms are given in Figs. 4.3 and 4.4, respectively. The DIVIDER divides the frequency of the clock by an even number, L, and generates the signal  $\phi$  with 50% duty cycle. The COUNTER increments at each rising edge of the clock if its **inhibit** input is LOW and keeps its value otherwise. For implementation reasons, we use two half-wave current integrators (HCI) in parallel. HCI(A) and HCI(B) integrate  $I_{\phi}$  and  $I_{\overline{\phi}}$ , respectively, where  $I_X$  ( $X = \phi$  or  $\overline{\phi}$ ) is defined as below:

$$I_X = \begin{cases} I & X = HIGH \\ 0 & X = LOW \end{cases}$$

HCI(A) integrates I only when  $\phi_i = LOW$  ( $\phi_{i(L)}$  interval). During  $\phi_{i(L)}$ , HCI(A) turns its **cnt\_ctrl** output LOW for a time proportional to the current integration in  $\phi_{(i-1)(L)}$ . This causes a number  $N_{i(A)}$  to be added to the counter value. Similarly, during  $\phi_{i(H)}$  a number  $N_{i(B)}$  proportional to the current integration during  $\phi_{(i-1)(H)}$  is added to the counter value. Since HCI(A) and HCI(B) perform integration during complementary times, the counter state at the end of the operation will be proportional to the integral of I over T.

#### 4.2.4 Half-wave current integrator (HCI)

The HCI operates based on principles described in Sec. 4.2.1. The HCI circuit schematic is illustrated in Fig. 4.5, while a timing diagram of its significant waveforms is shown in Fig. 4.4. The input current I is assumed to be a sinusoid with DC offset.  $\Phi$  is a signal with period  $T_s$  which controls the integration timing. Since  $T = MT_s$ , a complete half-wave integration requires M consecutive periods of  $\Phi$ . In the interval  $\Phi_{i(L)}$ , two parallel operations are performed:

1. *I* is integrated for the *effective integration time* interval $T_{s(eff)}$, resulting in the voltage waveform $V_C$
2. Concurrently, the ADC & ACC blocks digitize and accumulate the voltage $(V_{i-1} + V_{i-2}^{R})$ stored on $C_3$ at the end of the previous integration sub-window $\Phi_{i-1}$



Figure 4.4: Timing diagram and different waveforms in the integrator circuit of Fig. 4.3

During $\Phi_{i(H)}$, two parallel operations are also performed:

1. During the reset time $T_R$ (the duration of $\Phi_{i(H)}$), $C_1$ is discharged and the control circuitry is reset (initialized) for the next integration sub-window
2. A voltage, $V_i$, indicating the integration result is added to the voltage $V_{i-1}^R$ stored on $C_3$ ($V_S$ waveform). Therefore, at the end of $\phi_{i(H)}$, $V_{C3} = V_i + V_{i-1}^R$

These operations are repeated over M integration sub-windows to obtain the half-wave integral of I over the time  $MT_s$ . Further details are given next.

#### Short-time analog integrator (STAI) and S&H

Since the operations of the STAI and S&H blocks are tightly coupled, both are described in this section.

From Figs. 4.5 and 4.4, in the interval  $\Phi_{i(L)}$ , for time  $T_{s(eff)}$  switches S1, S3 and S5 are closed, and switches S2, S4 and S6 are open. During such a time interval, I flows through  $C_1$  and  $C_2$ . The voltage at node C, *i.e.*,  $V_C$ , at the end of  $\Phi_{i(L)}$ , *i.e.*,  $V_C^i$ , is given by:

$$V_C^i = V_C(t_i') = \frac{1}{C_1 + C_2} \int_{t_i}^{t_i'} I\,dt + K_{21} V_\alpha \tag{4.3}$$

where $V_{\alpha}$ is a constant voltage remaining on $C_2$ from the $\phi_{i-1}$ period, $K_{21} = C_2/(C_1 + C_2)$, $t_i = iT_s$ and $t'_i = iT_s + T_{s(eff)} = (i+1)T_s - T_R$. Note that we assume $C_1$ is fully discharged at the end of each interval $T_i$.

Figure 4.5: Half-wave current integrating circuit schematic

For the time $T_R$, *i.e.*, during $\Phi_{i(H)}$, switches S1, S3 and S5 are opened, and switches S2, S4 and S6 are closed. This effectively results in two functions: (i) $C_1$ is discharged through S2 to initialize $C_1$ for the next integration sub-window; (ii) the charge stored on $C_2$ at the end of $\Phi_{i(L)}$ is transferred to $C_3$. The charge transfer is accomplished as follows. A current starts flowing from $V_{DD}$ through $C_2$ and $C_3$ to $V_{ref1}$. As soon as the voltage on $C_2$ crosses 0 V, the comparator CMP1 switches state, causing the switch S6 to open and hold the voltage on $C_3$. This results in the charge $Q_{C2} = C_2 V_C^i$ being added to $C_3$ and held:

$$\begin{aligned}
V_{C3} &= V_{i-1}^{R} + K_{23}V_{C}^{i} \\
&= V_{i-1}^{R} + V_{i}
\end{aligned} \tag{4.4}$$

where  $K_{23} = C_2/C_3$ . The waveforms  $V_S$  and  $V_A$  in Fig. 4.4 illustrate this operation for  $K_{23} = 1/3$ .

#### ADC & accumulator

During $\Phi_{i(L)}$ (time $t_i$ to $t'_i$), the ADC and ACC blocks digitize the voltage $(V_{i-1}+V_{i-2}^R)$ stored on $C_3$ (node S) and add the converted voltage to the number accumulated in a counter. The single slope technique is used to perform the analog-to-digital conversion because it can be implemented with small area and provides good linearity. The operating principle of the ADC and ACC blocks is explained next.

When  $\Phi$  turns LOW, the flip-flop FF2 is set (r\_rst signal turns LOW) causing the counter to start counting (incrementing by one at each CLK rising edge). The CLK period must be smaller than  $T_{s(eff)}$ , e.g.,  $T_{s(eff)} = 10T_{CLK}$ . Concurrently, the switch S7 closes. This causes the constant current  $I_{ramp}$  to flow through  $C_3$  generating a ramp voltage waveform with constant slope at node R. The comparator CMP2 output, r1, switches state when  $V_R \geq V_{ref2}$ . At this moment, FF1 is set. This, in turn, resets FF2, which causes the counter to stop at the next clock edge. At the end of such a cycle of operations, a number  $N_{i-1}$  proportional to  $(V_{i-1} + V_{i-2}^R)$  is added to the previous value already accumulated in the counter.

After stopping the counter, S7 is opened which causes the voltage  $V_{i-1}^R$  to remain on  $C_3$ . Also, FF1 is reset to initialize FF1 and FF2 for conversion in the next integration sub-window.

As shown in Appendix C, the relationship between the final state of the counter, N, and the average current over time T for HCI(Y) (Y=A or B) is obtained as follows:

$$\begin{aligned}
N_{(Y)} &= K_{(Y)} \int_{0}^{T} I_{\Phi}\,dt + O_{(Y)} + E_{(Y)} \\
&= K_{(Y)} T \overline{I_{\Phi}} + O_{(Y)} + E_{(Y)}
\end{aligned} \tag{4.5}$$

where K is a constant factor, O is the offset value, and E is a random number resulting from ADC quantization noise Q.  $E_{(Y)}$  can be shown to have a uniform distribution in the range [0,1) with a mean of 0.5 and a variance  $\sigma_E^2 = \frac{1}{12}$  (see Appendix C). Sec. 4.2.5 describes how K and O can be obtained through calibration.

## 4.2.5 Double-phase BICI two-point calibration

From Fig. 4.3 and Eqn. 4.5, N is obtained below:

$$\begin{aligned}
N &= N_{(A)} + N_{(B)} \\
&= K_{(A)} \int_{0}^{T} I_{\phi}\,dt + O_{(A)} + E_{(A)} + K_{(B)} \int_{0}^{T} I_{\overline{\phi}}\,dt + O_{(B)} + E_{(B)}
\end{aligned} \tag{4.6}$$

Assuming  $K_{(A)} = K_{(B)} = K$ ,  $O_{(A)} = O_{(B)} = O$ , and  $E = E_{(A)} + E_{(B)}$ , the following relationship results:

$$N = KT\overline{I} + O + E \tag{4.7}$$

where 0 < E < 2 (from Appendix C). For large values of M, N is a large number, therefore E is negligible and can be ignored.

To calibrate the integrator circuit, two known DC current signals,  $I_{c1}$  and  $I_{c2}$ , are applied to the integrator circuit, and the calibration is performed as explained in Sec. 3.2.4. The precise DC current  $\overline{I_{c1}}$  has to either be generated on-chip or be supplied from off-chip.  $\overline{I_{c2}}$  can be generated using a properly matched current mirror  $\overline{I_{c2}} = n\overline{I_{c1}}$ .
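As a minimal sketch of the two-point calibration, assume it amounts to solving Eqn. 4.7 for the slope $KT$ and the offset $O$ from two known DC currents, with E neglected. The function names are hypothetical; the calibration points (DC3 and DC6) and the DC4 signature are taken from Table 4.2.

```python
def two_point_calibration(i1_ua, n1, i2_ua, n2):
    """Solve N = KT*I + O for KT (counts per uA) and O (counts), neglecting E."""
    kt = (n2 - n1) / (i2_ua - i1_ua)
    offset = n1 - kt * i1_ua
    return kt, offset


def estimate_current_ua(n, kt, offset):
    """Invert Eqn. 4.7 to recover the average current from a signature N."""
    return (n - offset) / kt


# Calibration points DC3 (25 uA, N = 4157) and DC6 (83.2 uA, N = 9634).
kt, offset = two_point_calibration(25.0, 4157, 83.2, 9634)

# Signature reported for DC4 (true average current 36 uA) in Table 4.2.
print(round(estimate_current_ua(5196, kt, offset), 1))  # 36.0
```

Under this assumption the estimate for DC4 agrees with the value listed in Table 4.2, which suggests that a linear fit through the two calibration points is at least consistent with the reported results.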

#### 4.2.6 BICI accuracy

To achieve a high measurement accuracy, sources of error must be identified and their impact minimized by design. Factors affecting the measurement accuracy of the integrator circuit are as follows:

1. As shown in Appendix C, $C_1$, $C_2$, $C_3$, and $I_{ramp}$ variations and the propagation delays in the comparators, flip-flops and the gates affect the factors K and O in Eqn. 4.6. K and O can be determined using a two-point calibration scheme (obtaining N for two known I values). Therefore, as in the single-phase BICI, the process, temperature and power supply variations affecting K and O will not affect measurement accuracy
2. The error due to clock feedthrough associated with the switches in the circuit can be minimized as described in Sec. 3.2.5
3. The error term E in Eqn. 4.6 is between 0 and 2, whereas the number N is much larger for large values of M. Therefore, E is negligible
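As a rough worked check of the last item, assume the calibration points DC3 and DC6 of Table 4.2 and neglect E at those points. The slope of Eqn. 4.7 and the resulting bound on the relative error for the smallest test current in that table ($10\,\mu A$) are then approximately:

$$KT \approx \frac{9634 - 4157}{(83.2 - 25)\,\mu A} \approx 94.1\ \mathrm{counts}/\mu A, \qquad \frac{E}{KT\,\overline{I}} < \frac{2}{94.1 \times 10} \approx 0.2\%$$

This is only an estimate under the stated assumptions; it indicates that the contribution of E stays well below the 1% error target even at the low end of the current range.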

## 4.2.7 Circuit implementation

Assuming that the integrator circuit is to be used in an application where the maximum average current in one period of $\phi$ is $120\,\mu A$ and the available CLK frequency is 20 MHz, the following parameters were chosen to design the integrator circuit:

- T = 1 ms, $T_s = 1\,\mu s$, M = 1000
- $T_{s(eff)} = 0.5\,\mu s$, $T_R = 0.5\,\mu s$
- $f_{CLK} = 20\,MHz$
- $I_{ramp} = 11.2\,\mu A$
- Counter size: 18 bits

These parameters imply  $C_{1(A)} = C_{1(B)} = 30 \, pF$ . The transistor level schematic of the main circuit and the comparator [100, Chapter 26] are shown in Figs. 4.6 and 4.7, respectively. The component values and transistor sizes are shown for critical components. These values have been chosen to achieve reasonable integration accuracy (error < 1%) as explained in Sec. 4.2.6, and to reduce the circuit area.
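The chosen timing parameters can be cross-checked with a few lines of arithmetic. The sketch below uses only the values listed above; the variable names are hypothetical, and the count bound relies on the statement in Sec. 4.2.3 that the counter increments at most once per rising clock edge.

```python
T = 1e-3          # total integration window (s)
T_S = 1e-6        # integration sub-window duration (s)
T_S_EFF = 0.5e-6  # effective integration interval (s)
F_CLK = 20e6      # clock frequency (Hz)
COUNTER_BITS = 18

m = round(T / T_S)                          # number of sub-windows M
divider_l = round(F_CLK * T_S)              # clock cycles per period of phi (L)
cycles_per_interval = round(F_CLK * T_S_EFF)  # T_s(eff) in units of T_CLK
max_count = round(F_CLK * T)                # at most one count per clock edge

print(m, divider_l, cycles_per_interval, max_count, max_count < 2**COUNTER_BITS)
# 1000 20 10 20000 True
```

The divider ratio comes out even (L = 20), $T_{s(eff)} = 10\,T_{CLK}$ matches the example given in Sec. 4.2.4, and the 18-bit counter cannot overflow within one integration window.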

In each HCI block, $\phi_a$ and $\phi_b$ are non-overlapping control signals constructed from $\phi$. They are used instead of $\phi$ and $\overline{\phi}$ in Fig. 4.5 to avoid sample-and-hold errors [97].

The total area of the circuit is 72500  $(\mu m)^2$ . This is equivalent to the area of a digital circuit consisting of 280 NAND gates in a 0.35  $\mu m$  digital cell library.







Figure 4.7: Schematic of the comparator

# 4.3 Simulation and Experimental Results

The Cadence Spectre Verilog mixed-signal simulator was used to simulate the integrator circuit described in Sec. 4.2.7. Table 4.2 reports the simulation results for the average current measured by the circuit for a set of 13 different current signals. The signals, described in Table 4.1, differ in waveform, frequency, amplitude, and offset in order to test the performance of the circuit over a wide range of inputs.

Signals DC3 and DC6 are used as calibration signals. The second column in Table 4.2 reports the true average current for each signal, while the fourth column indicates the average current estimated by the BICI. The relative error of the estimated average current is given in the last column. Simulations indicate that the BICI provides very good accuracy (< 1% error) for all the current signals chosen.

| Name | Waveform   | Parameters                                                        |
|------|------------|-------------------------------------------------------------------|
| DC1  | DC         | $I_{DC} = 10\,\mu A$                                              |
| DC2  | DC         | $I_{DC} = 17\,\mu A$                                              |
| DC3  | DC         | $I_{DC} = 25\,\mu A$                                              |
| DC4  | DC         | $I_{DC} = 36\,\mu A$                                              |
| DC5  | DC         | $I_{DC} = 53.4\,\mu A$                                            |
| DC6  | DC         | $I_{DC} = 83.2\,\mu A$                                            |
| DC7  | DC         | $I_{DC} = 105.6\,\mu A$                                           |
| DC8  | DC         | $I_{DC} = 122.4\,\mu A$                                           |
| SIN1 | sine       | f = 1 MHz, DC offset = 36 $\mu A$, amplitude = 15.5 $\mu A$       |
| SIN2 | sine       | f = 4.3 kHz, DC offset = 87.5 $\mu A$, amplitude = 20.6 $\mu A$   |
| SIN3 | sine       | f = 101 kHz, DC offset = 20 $\mu A$, amplitude = 10 $\mu A$       |
| TRG1 | triangular | f = 35.7 MHz, DC offset = 50 $\mu A$, amplitude = 20 $\mu A$      |
| TRG2 | triangular | f = 20 kHz, DC offset = 65 $\mu A$, amplitude = 45 $\mu A$        |
| TRG3 | triangular | f = 4 kHz, DC offset = 74 $\mu A$, amplitude = 30 $\mu A$         |
| SQ   | square     | Shown in Fig. 4.8                                                 |

Table 4.1: Double-phase BICI test signals

# 4.4 Conclusions

This chapter presented a mixed-signal built-in current integrator (BICI) used to perform current integration for the purpose of analog block testing. The BICI generates a digital signature by averaging the current over a programmable time window.



Figure 4.8: Current waveforms for validating the operation of the integrator circuit

The BICI requires a small area (equivalent to a circuit consisting of 280 NAND gates). By using any LFSR or register already available on chip (*e.g.*, as part of JTAG or digital BIST) to implement the counter needed for this circuit, a saving of 25% in area is possible. The BICI also yields a digital signature, which is convenient to interface to standard digital external or internal ATE and to standards such as IEEE 1149.1. Our circuit provides good accuracy (error < 1%). The monitor is suitable not only for current-based embedded test applications but also for on-chip power monitoring, because it is compact and generates a digital signature. In addition, the BICI generates digitized samples of $\overline{I_{DD}}$ during the integration window. Additional processing of these samples may be used to generate more complex signatures with potentially higher fault coverage.

| Waveform | True $\overline{I}$ | N     | $\overline{I}_{estimated}$ | $100(\Delta \overline{I}/\overline{I})$ |
|----------|---------------------|-------|----------------------------|-----------------------------------------|
|          | $(\mu A)$           |       | $(\mu A)$                  |                                         |
| DC1      | 10                  | 2740  | 9.9                        | 1                                       |
| DC2      | 17                  | 3403  | 17.0                       | 0                                       |
| DC3      | 25                  | 4157  | 25                         | 0 (Cal. point)                          |
| DC4      | 36                  | 5196  | 36                         | 0                                       |
| DC5      | 53.4                | 6825  | 53.4                       | 0                                       |
| DC6      | 83.2                | 9634  | 83.2                       | 0 (Cal. point)                          |
| DC7      | 121.7               | 13322 | 122.4                      | 0.6                                     |
| DC8      | 105.6               | 11778 | 106                        | 0.4                                     |
| SIN1     | 36                  | 5181  | 36                         | 0                                       |
| SIN2     | 88.67               | 10142 | 88.6                       | 0.07                                    |
| SIN3     | 20                  | 3690  | 20                         | 0                                       |
| TRG1     | 50.1                | 6505  | 50                         | 0.2                                     |
| TRG2     | 65                  | 7925  | 65                         | 0                                       |
| TRG3     | 74                  | 8763  | 73.9                       | 0.14                                    |
| SQ       | 22.63               | 3929  | 22.6                       | 0.1                                     |

Table 4.2: Simulations of average current measurements by double-phase BICI

# Chapter 5

# **On-Chip Jitter Specification Testing of High-Performance PLLs**

This chapter presents a jitter measurement and generation circuit for BIST of PLLs. The circuit satisfies all the conditions for a practical BIST circuit described in Sec. 2.2.4. The measurement circuit is fully digital and automatically synthesizable, occupies an area equivalent to 1200 2-input NAND gates, provides a resolution of approximately 10 ps (~1/5 of a gate delay in a standard 0.35 $\mu m$ technology), and generates a digital signature which can be read out by an inexpensive tester for further analysis to obtain the jitter characteristics.

The remainder of this chapter is organized as follows. In Sec. 5.1, various jitter specifications are defined. Sec. 5.2 describes the jitter measurement circuit. Sec. 5.3 details the jitter generator circuit. Sec. 5.4 outlines ways in which the jitter generation and measurement circuits can be used for testing various jitter specifications of PLLs. Sec. 5.5 contains the circuit implementation details. Sec. 5.6 reports some simulation results, and Sec. 5.7 concludes the chapter.

# 5.1 Jitter Definitions

The definition of jitter varies depending on the fields of application. In sequential circuits, *e.g.*, CPUs, jitter is defined as the variation of the clock period, known as *cycle-to-cycle* or *period jitter*. Such variation is best modeled as a frequency modulation of the clock signal. More formally we can write:

$$V_{FM}(t) = sgn\left[\sin\left(\int_0^t \frac{2\pi}{T_0 + T_J(t)}\,dt\right)\right] \tag{5.1}$$

where $V_{FM}(t)$ is the clock signal, $T_0$ is the average clock period, $T_J(t)$ is the frequency modulating jitter signal and sgn[x] is the sign function:

$$sgn[x] = \begin{cases} 1 & x > 0 \\ 0 & x \le 0 \end{cases} \tag{5.2}$$

Fig. 5.1(a) illustrates how the period jitter samples are collected by measuring the duration of each period of the signal IN1.

In serial communication applications, jitter is defined as the short-term variations of a digital signal's significant instants, *e.g.*, rising edges, from their ideal position in time [20]. Such jitter is often denoted as accumulative jitter and is described as a phase modulation of a clock signal. Formally:

$$V_{PM}(t) = sgn[\sin(\omega_0 t + \omega_0 \tau_J(t))] \tag{5.3}$$

where  $V_{PM}(t)$  is the jittered clock,  $\omega_0$  is the average angular frequency, and  $\tau_J(t)$  is the phase modulating jitter signal. In a clock synthesis circuit, where the absolute jitter is important, often a jitter-free (practically low-jitter) reference signal is used for jitter measurement. In such a case, the difference between the position of corresponding edges of the signal (IN1) relative to the reference clock (REF) indicates the jitter. Fig. 5.1(b) illustrates how accumulative jitter samples,  $\tau_{J(i)}$  for i = 1, ..., are collected. Sometimes, the relative jitter between two signals is of interest if neither of the two signals is a jitter-free signal, *e.g.*, in data recovery circuits. Fig. 5.1(c) shows how relative jitter between the edges of signal IN1 and IN2 is measured.



Figure 5.1: (a) Measuring cycle-to-cycle or period jitter, (b) Measuring accumulative jitter using a reference clock, (c) Measuring relative jitter

For both period and accumulative jitter measurements, M jitter samples, $T_{J(i)}$ or $\tau_{J(i)}$ ($i = 1, \ldots, M$), are collected to calculate jitter characteristics, such as rms, peak-to-peak, or frequency components. For example, the rms and peak-to-peak period jitter are obtained as:

$$T_{J(rms)} = \sqrt{\frac{1}{M} \sum_{i=1}^{M} T_{J(i)}^2}$$
$$T_{J(pk-to-pk)} = max(T_J) - min(T_J)$$
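The two definitions above translate directly into code. The following is a minimal sketch with hypothetical function names and made-up sample values; it assumes the samples are already jitter values, *i.e.*, deviations from the average period.

```python
import math


def rms_jitter(samples):
    """Root-mean-square of the M jitter samples T_J(i)."""
    return math.sqrt(sum(t * t for t in samples) / len(samples))


def peak_to_peak_jitter(samples):
    """Difference between the largest and the smallest jitter sample."""
    return max(samples) - min(samples)


# Hypothetical period-jitter samples in picoseconds.
samples_ps = [3.0, -4.0, 0.0, 0.0]
print(rms_jitter(samples_ps))           # 2.5
print(peak_to_peak_jitter(samples_ps))  # 7.0
```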

# 5.1.1 PLL jitter specifications

The important jitter specifications for PLLs used in digital communication interfaces are intrinsic jitter, jitter tolerance and jitter transfer. These specifications are given in standards for each application (*e.g.*, see [20] for SONET interfaces).

1. **Intrinsic jitter** is defined as the jitter at the output of the PLL when the input is jitter-free. This is often expressed in terms of the unit interval (UI), which is defined as the period of a signal with a frequency equal to the average frequency of the original signal. For example, in a 155.54 MHz SONET network application, 1 UI is 6.429 ns
2. **Jitter transfer** is defined as the ratio of the output jitter to the input jitter of the PLL as a function of frequency
3. **Jitter tolerance** is the peak-to-peak amplitude of the sinusoidal jitter applied to the input of the PLL which causes a 1 dB power penalty (in terms of bit error rate)

# 5.2 Jitter Measurement Circuit (JMC)

Here a digital circuit is presented which is capable of measuring jitter not only of PLLs, but of any signal with high resolution, as illustrated in Fig. 5.2. The core of this circuit is a high-resolution time-to-digital converter (TDC) which measures a time interval $T_d$:

$$T_d = t_{STOP} - t_{START} \tag{5.4}$$

where  $t_{STOP}$  and  $t_{START}$  are the time instances at which the rising edges of the STOP and START signals occur, respectively. In this circuit, the Edge Sampler block controlled by the ES Controller selects the appropriate START and STOP edges and passes them to the TDC. For measuring different jitter specifications, the Edge Sampler and the ES Controller have to be adapted accordingly while TDC remains the same. Examples of Edge Samplers (and their associated controllers) are given later in Sec. 5.4 for cycle-to-cycle and relative jitter measurements.



Figure 5.2: Block diagram of the proposed jitter measurement circuit

The TDC circuit details follow. However, before delving into the details of this circuit, the state of the art in TDC design is reviewed.

# 5.2.1 State of the art in TDC design

A classic method of measuring a time interval $T_d$ is to start a counter at the beginning of the interval and stop it when the interval ends. The resulting number in the counter will be proportional to $T_d$. The resolution of this method is the period of the clock controlling the counter. To measure the intrinsic jitter of a high-speed PLL (*e.g.*, a 155 MHz clock synthesis PLL), where a resolution in the range of 20 ps is required, a clock frequency of 50 GHz would be needed. Obviously, such a method is not suitable for on-chip high-resolution time measurement when the maximum clock frequency available is in the range of a few hundred MHz.
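The clock frequency quoted above follows directly from the required resolution. As a quick check, where the 200 MHz figure is only an assumed example of an on-chip clock in the range of a few hundred MHz:

$$f_{CLK} = \frac{1}{T_{res}} = \frac{1}{20\,\mathrm{ps}} = 50\,\mathrm{GHz}, \qquad T_{res}\Big|_{f_{CLK} = 200\,\mathrm{MHz}} = \frac{1}{200\,\mathrm{MHz}} = 5\,\mathrm{ns}$$

A counter clocked at such a frequency would therefore be more than two orders of magnitude too coarse for a 20 ps resolution.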

In [102] a TDC based on the use of a delay chain, as shown in Fig. 5.3, is presented. In this circuit, the outputs of the delay elements in the delay chain are set HIGH as the START rising edge travels through them. When the STOP rising edge arrives, only the flip-flops with a HIGH on their D inputs will have their outputs set HIGH. That is, the final flip-flop settings correspond to a snapshot of the delay chain at the time of the STOP rising edge. Therefore the number of set flip-flops indicates the number of delay elements (N) that the START edge travels through before the STOP edge arrives. Consequently,

$$T_d = t_{STOP} - t_{START} = NT_\Delta + T_C + T_Q$$

where  $T_{\Delta}$ , the quantization step, is the delay of each delay element,  $0 < T_Q < T_{\Delta}$  is the quantization error, and  $T_C$  is a constant offset delay due to set-up time of the DFFs and any delay difference in the paths of the START and STOP signals to the delay chain and the recording flip-flops. A delay locked loop (DLL) is used to calibrate the delay elements to a known delay  $T_{\Delta} = T_{ref}/M$ , where  $T_{ref}$  is the period of a reference clock and M is the number of delay elements. Such a calibration requires very good matching between all the delay elements in both the delay chain and the DLL. In [35] an alternative circuit is proposed to combine the delay chain and the DLL, hence obviating the need for element matching.
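The relation $T_d = NT_\Delta + T_C + T_Q$ can be illustrated with an idealized behavioral model of the delay chain. The sketch below works in integer picoseconds to avoid rounding effects; the reference period, the number of taps, and the function name are hypothetical values chosen for illustration, not taken from [102].

```python
def delay_chain_code(t_d_ps: int, t_delta_ps: int, t_c_ps: int = 0, taps: int = 64) -> int:
    """Idealized count of flip-flops set HIGH by a delay-chain TDC.

    Solves T_d = N*T_delta + T_C + T_Q for N with 0 <= T_Q < T_delta,
    clamped to the physical length of the chain.
    """
    n = (t_d_ps - t_c_ps) // t_delta_ps
    return max(0, min(taps, n))


T_REF_PS = 6400              # hypothetical reference clock period
M = 64                       # hypothetical number of delay elements
t_delta_ps = T_REF_PS // M   # DLL-calibrated step T_ref / M

n = delay_chain_code(1030, t_delta_ps)
t_q_ps = 1030 - n * t_delta_ps   # quantization error for T_C = 0
print(n, t_q_ps)
```

In this model the quantization error is simply the remainder left after removing N whole steps, which is why it is bounded by one step.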

Figure 5.3: Time digitization using a delay chain

In the schemes mentioned above, the DLL and the controlled delay elements are analog. Eliminating the DLL and using digital gates as delay elements makes the circuit fully digital. In that case, a two-point (instead of one-point) calibration scheme can be used to extract $T_{\Delta}$ and $T_{C}$. The trade-off is decreased accuracy due to the quantization error associated with the calibration reference inputs, as demonstrated in Appendix D.1 (Eqns. 5.35 and 5.32). The resolution ($T_{\Delta}$) of such methods without time interpolation is limited to one gate delay at best. In a 0.35 $\mu m$ CMOS technology, the smallest gate delay is approximately 50 ps, whereas a resolution and precision of about 20 ps is required for functional testing of high-speed PLLs with a 155 MHz center frequency [20]. Also, since this delay depends on process variations and temperature, the resolution in such schemes is not controllable. The authors of [36] propose the use of an array of DLLs to improve the measurement resolution, while those of [103] propose an RC delay line approach to increase the measurement resolution through time interpolation. Although resolutions in the range of 25 ps (rms) have been reported in these papers, the design of the circuit requires a great deal of care because of the need for a high degree of matching. Also, the design and layout of the DLL need careful attention due to the presence of significant power supply noise in large mixed-signal ICs.

In [37] a differential delay technique based on two delay chains is used (Fig. 5.4). One chain is composed of gates (each with a delay of $\tau_g$), while the other is made of latches (each with a delay of $\tau_l$). The latch $L_{i-1}$ in chain 2 samples the signal at the *i*-th tap of chain 1 ($D_i$) on the rising edge of the signal $F_i$. Since $F_1$ is delayed with respect to $D_1$, latch $L_0$ samples HIGH. Since $\tau_g < \tau_l$, $D_i$ approaches $F_i$ as time progresses until the edge on $F_i$ passes $D_i$ and $L_{i-1}$ samples a LOW for i = N. In this method, the time quantization step is the difference between the delays of the delay elements in the two delay chains:

$$T_{\Delta} = \tau_l - \tau_g \tag{5.5}$$



Figure 5.4: Time digitization using differential delay technique

Since gates and latches are very different structures,  $\tau_l$  and  $\tau_g$  may differ significantly, making it difficult to achieve high resolution (in the range of 20 ps or less).

All the schemes mentioned above require good matching of the elements in the delay chains, something which is difficult to achieve within an accuracy of 1% under typical process variations. As the time interval to be measured becomes longer, more elements must be added to the delay chains, making it even more difficult to ensure matching of the delays in the chains; when more elements are added, the elements have to be placed further apart and more routing delay has to be accounted for. Therefore, these schemes do not provide good TDC linearity. In addition, they do not lend themselves well to automatic place and route. Furthermore, the resolution is set by the process parameters on each chip and cannot be controlled or adjusted.

In the following section, we introduce a high-resolution TDC which provides good linearity, is automatically placeable and routable, and has an adjustable resolution.

### 5.2.2 High-resolution TDC

Fig. 5.5 illustrates the block diagram of the proposed TDC circuit. The Time Quantizer (TQ) block quantizes time with a quantization step of $T_{\Delta}$, which is set by the Resolution Adjustment (RA) block to a value less than a programmable threshold. This threshold is supplied to the circuit as a 16-bit digital number. Since the maximum time interval measurable by the TQ is limited, the Range Extender (RE) block is used to extend the capability of the TQ to measure longer time intervals. The Calibration Controller (CC) calibrates the TQ using a reference low-jitter clock to provide a precise estimate of $T_{\Delta}$. The TDC Controller (TC) controls the communication and sequence of operation of the different blocks. After resolution adjustment and TQ calibration, the TC instructs the Edge Sampler controller to pass jitter samples as time intervals to the TQ for measurement. The following sections describe the principle of high-resolution time measurement and report the details of each block.

Figure 5.5: Block diagram of the proposed TDC circuit

### 5.2.3 Notation and definitions

The notation and definitions used throughout the remainder of the chapter are listed next. Note that any variable denoted by t refers to an instant in time, T refers to a time interval, and  $\tau$  refers to a time delay associated with a physical structure in the circuit, *e.g.*, gates, routing, etc.

- $t_{START}$: The time when the START signal is set HIGH
- $t_{STOP}$: The time when the STOP signal is set HIGH
- $T_d = t_{STOP} - t_{START}$: The time interval to be measured
- clkA: The output signal of oscillator A (Osc-A)
- clkB: The output signal of oscillator B (Osc-B)
- $T_A$ ($T_B$): clkA (clkB) period
- $t_{X(i)}$: The time when the *i*-th rising edge of clkX (X = A or B) occurs
- $T_{\Delta} = T_A - T_B$: The time quantization step. This is also the resolution of the TDC.
- N: The final TQ number indicating the $T_d$ value
- $M_A$ ($M_B$): The output state of the k-bit counter CntrA (CntrB)

### 5.2.4 Time quantizer

The circuit proposed here uses a differential method to obtain high resolution. It relies on the difference between the periods of two oscillators for time quantization rather than on gate delay, thereby reducing the need for circuit matching. In fact, the circuit is made virtually insensitive to mismatches by using a period adjustment scheme discussed in Sec. 5.2.7. This makes the circuit fully synthesizable and automatically placeable and routable using standard digital circuit electronic design automation (EDA) tools. The operational principle of the TQ is as follows.

Assume $T_d$ as defined in Eqn. 5.4 is to be measured. Fig. 5.6(a) shows the schematic of the TQ circuit, which is the core of the jitter measurement scheme. It consists of two ring oscillators, one flip-flop, and one counter. The resolution of the scheme is dictated by the time quantization step, $T_{\Delta}$, obtained as:

$$T_{\Delta} = T_A - T_B \tag{5.6}$$

To conceptualize the operation of the TQ, assume that in a sample chip, $T_{\Delta}$ is a small interval of about 20 ps. Sec. 5.2.7 describes a circuit which automatically adjusts $T_{\Delta}$ to guarantee the required resolution under typical process and temperature variations.

The waveforms in Fig. 5.6(b) illustrate the operational principle of the circuit. Oscillators A and B ($T_B < T_A$) start oscillating at the rising edge of START and STOP, respectively. The counter MAIN starts counting at the STOP rising edge. The output of oscillator B, clkB, is sampled at the rising edge of clkA by the D flip-flop EOC\_DFF to set the end-of-conversion flag, EOC\_Flag. Assuming $T_d$ is larger than $T_\Delta$, EOC\_Flag will be LOW for the first cycle of clkB. However, for every cycle of clkB, the *i*-th rising edge of clkB approaches that of clkA by $T_\Delta$, until eventually the *N*-th rising edge of clkB precedes that of clkA by the setup time of EOC\_DFF, causing EOC\_Flag to be set HIGH:

$$t_{B(N)} = t_{A(N)} - \tau_{EOC} - T_Q \tag{5.7}$$

where  $\tau_{EOC}$  is the setup time of EOC\_DFF, and  $0 < T_Q < T_{\Delta}$  is the quantization error. At this time, the MAIN counter stops and the digital control circuitry is able to process the output of the counter, N, which indicates the value of  $T_d$ . The  $t_{A(N)}$  and  $t_{B(N)}$  are obtained as below:

$$t_{A(N)} = t_{START} + \tau_A + NT_A \tag{5.8}$$

$$t_{B(N)} = t_{STOP} + \tau_B + NT_B \tag{5.9}$$



Figure 5.6: Time digitization using two oscillator period difference method

where $\tau_A$ and $\tau_B$ are the delays from the START and STOP signals to the CLK and D inputs of EOC\_DFF, respectively. From Eqns. 5.7, 5.8 and 5.9:

$$t_{STOP} - t_{START} = N(T_A - T_B) - \tau_B + \tau_A - \tau_{EOC} - T_Q \tag{5.10}$$

Therefore:

$$NT_{\Delta} = T_d + T_C + T_Q + T_R \tag{5.11}$$

where $T_C = \tau_B - \tau_A + \tau_{EOC}$ is a constant offset time, and $T_R$ is a random error term due to the intrinsic jitter of the gates and flip-flops (refer to Sec. 5.2.10 for more details). The setting of EOC\_Flag is also used by the TDC controller to initiate processing of the data and to initialize the measurement of another time interval. Sec. 5.2.6 demonstrates how to estimate $T_{\Delta}$ and $T_C$ through a two-point calibration scheme, while Sec. 5.2.10 provides an analysis of the random error terms $T_Q$ and $T_R$ in Eqn. 5.11.
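The two-oscillator principle can be checked with an idealized edge-by-edge model built directly on Eqns. 5.7 to 5.9. The sketch below is a behavioral illustration only: it ignores the random term $T_R$ and the clkB duty cycle, works in integer picoseconds, and uses a hypothetical function name and hypothetical delay values.

```python
def time_quantizer(t_d, t_a, t_b, tau_a=0, tau_b=0, tau_eoc=0, max_cycles=1_000_000):
    """Return N, the first edge index at which clkB leads clkA by at least tau_eoc.

    Edge times follow Eqns. 5.8 and 5.9 with t_START = 0 and t_STOP = t_d.
    All arguments are integers in picoseconds.
    """
    if t_b >= t_a:
        raise ValueError("the model requires T_B < T_A")
    t_start, t_stop = 0, t_d
    for i in range(1, max_cycles + 1):
        t_a_i = t_start + tau_a + i * t_a   # i-th rising edge of clkA at EOC_DFF
        t_b_i = t_stop + tau_b + i * t_b    # i-th rising edge of clkB at EOC_DFF
        if t_b_i <= t_a_i - tau_eoc:        # setup time met: EOC_Flag goes HIGH
            return i
    raise RuntimeError("no end-of-conversion within max_cycles")


T_A, T_B = 4020, 4000            # hypothetical periods, T_delta = 20 ps
n = time_quantizer(1005, T_A, T_B)
t_q = n * (T_A - T_B) - 1005     # quantization error for T_C = 0
print(n, t_q)
```

With a nonzero offset, for instance `tau_a=10, tau_b=30, tau_eoc=20` (so $T_C = 40$ ps), the same model returns a count that satisfies $NT_\Delta = T_d + T_C + T_Q$.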

Completing one measurement requires some time, denoted by  $T_{meas}$ .  $T_{meas}$  depends on the value of the  $T_d$ . Assuming that the error terms in Eqn. 5.11 are negligible, then  $T_d + T_C = NT_{\Delta}$ . Since it takes N cycles of clkA to perform the measurement, the required measurement time is:

$$T_{meas} = NT_A = \frac{T_d + T_C}{T_\Delta} T_A$$

For example, if $T_C = 0$, measuring an interval of 1 ns ($T_d = 1$ ns) with a resolution of $T_{\Delta} = 20$ ps (N = 50) requires $50T_A$ of time. If $T_B = 4$ ns, the measurement time is approximately 200 ns.
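The measurement-time expression can be evaluated numerically as a quick check of this example. In the sketch below, $T_A$ is taken as $T_B + T_\Delta = 4.02$ ns, which is an inference from the quoted values rather than a number stated in the text; the function name is illustrative.

```python
def measurement_time_ps(t_d_ps: float, t_c_ps: float, t_delta_ps: float, t_a_ps: float) -> float:
    """T_meas = N * T_A with N = (T_d + T_C) / T_delta, error terms neglected."""
    n = (t_d_ps + t_c_ps) / t_delta_ps
    return n * t_a_ps


# T_d = 1 ns, T_C = 0, T_delta = 20 ps, T_A = T_B + T_delta = 4.02 ns (assumed)
t_meas_ps = measurement_time_ps(1000, 0, 20, 4020)
print(f"T_meas = {t_meas_ps / 1000:.0f} ns")
```

The result is within about 0.5% of the 200 ns quoted above, which uses $T_B$ in place of $T_A$.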

#### **EOC\_DFF** metastability

As shown in Fig. 5.6, EOC\_DFF is used to set EOC\_Flag, indicating that the measurement is complete when Eqn. 5.7 is satisfied. If for some value of $T_d$,

$$\tau_{EOC} - T_{mw}/2 < t_{A(N)} - t_{B(N)} < \tau_{EOC} + T_{mw}/2$$

where $T_{mw}$ is a small time interval, EOC\_DFF may exhibit metastable behavior. This means that EOC\_DFF might take significantly longer than $\tau_{clk-to-Q}$ to set its output HIGH. $T_{mw}$ is called the metastability window of EOC\_DFF. As shown in Appendix E, $T_{mw}$ is less than 0.01 ps for a flip-flop in a 0.35 $\mu m$ CMOS digital cell library. The excessive delay due to metastability can cause a logic error in synchronous circuits with asynchronous inputs [104, Sec. 3.11]. However, in the TQ, if this delay results in no decision after the N-th rising edge of clkB, the decision to end the measurement will be made by EOC\_DFF at the next rising edge of clkB, *i.e.*, on the (N + 1)-th edge. This is because the relative delay between the (N + 1)-th edges of clkA and clkB will increase by $T_{\Delta}$, which is larger than $T_{mw}$, and therefore no metastable behavior can occur on the successive edge. As discussed in Sec. 5.2.10, since $T_{mw}$ is less than 0.01 ps, it does not significantly affect the precision of the TQ.

EOC\_Flag is also used by the TDC Controller to control the sequence of operations in the TDC. Therefore, synchronizers must be used to ensure the reliable operation of the synchronous circuit in the TDC controller [104, Sec. 3.11]. In the implementation of TDC described in Sec. 5.5 a single flip-flop (TQEOC\_sync\_DFF in Fig. 5.22) is used for synchronization.

### 5.2.5 Measurement range extension

From the waveforms in Fig. 5.6(b), it can be concluded that if  $T_d > T_A - DT_B$  (D is the duty cycle of clkB), EOC\_DFF will sample a HIGH at the second rising edge of clkA, which signals an end-of-conversion erroneously. Also, if  $T_d < -T_C$ , the first rising edge of clkA will sample a HIGH regardless of the value of  $T_d$ . Therefore, the valid measurement range of  $T_d$  for this circuit is:

$$-T_C < T_d < T_A - DT_B \tag{5.12}$$
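Eqn. 5.12 can be expressed as a simple predicate. The sketch below is illustrative only; the periods, the duty cycle, and the offset are hypothetical values, and the function name is not from the thesis.

```python
def in_tq_range(t_d: float, t_a: float, t_b: float, t_c: float, duty: float = 0.5) -> bool:
    """Check Eqn. 5.12: -T_C < T_d < T_A - D*T_B (all times in the same unit)."""
    return -t_c < t_d < t_a - duty * t_b


# Hypothetical values in picoseconds: T_A = 4020, T_B = 4000, T_C = 40, D = 0.5
upper_ps = 4020 - 0.5 * 4000      # upper limit of the valid range
print(upper_ps, in_tq_range(1000, 4020, 4000, 40), in_tq_range(2500, 4020, 4000, 40))
```

With these numbers the TQ alone covers only about half of one oscillator period, which is the limitation the range extender addresses.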

To extend this range, a Range Extender (RE) circuit is used. The RE block ensures that the *i*-th rising edges of clkB and clkA are within the valid measurement range of the TQ (given in Eqn. 5.12) before allowing EOC\_DFF in the TQ to start sampling clkB at clkA rising edges. The RE block consists of three flip-flops, two counters and a k-bit comparator as shown in Fig. 5.7(a). The waveform diagram of the circuit is also given in Fig. 5.7(b). This block generates a flag signal (RE\_Flag) when $t_{A(i)} - t_{B(i)} < \tau_{A1}$, where $-T_C < \tau_{A1} < T_A - DT_B$. The details of the RE block operation follow.

The signal clkA is delayed by $\tau_{A1}$ and $\tau_{A1} + \tau_{A2}$, using the delay buffers Dbuf1 and Dbuf2, to generate clkA1 and clkA2, respectively. The two k-bit counters, CntrA and CntrB, which are initialized to 1 and 0, count the number of rising edges of clkA2 and clkB, respectively. Fig. 5.7(b) shows different waveforms in the circuit for $T_d = 2.6T_A$. As shown in Fig. 5.7(b), the number in CntrA (*i.e.*, $M_A$) remains larger than the number in CntrB (*i.e.*, $M_B$) as long as $t_{A2(i)} - t_{B(i)} > \tau_{A1} + \tau_{A2}$. When $t_{A2(i)} - t_{B(i)} < \tau_{A1} + \tau_{A2}$, $M_A$ becomes equal to $M_B$ for a short amount of time, causing the comparator to generate a pulse which becomes wider at the subsequent clkB rising edges. This pulse, when sufficiently wide, sets the RE\_Flag, allowing EOC\_DFF to start recording the relative position of the rising edges of clkA and clkB. Appendix F presents the operation of the RE block more rigorously from a mathematical viewpoint.

Figure 5.7: Measurement range extension to $(2^k - 1)T_A$

Since the RE block is asynchronous, a novel time diversity sampling technique is used to ensure valid sampling of the comparator output. On each rising edge of clkA2, the output of CntrA changes. Since the CLK-to-Q delay differs for each output bit of CntrA, the number $M_A$ can have a transient random value for a short period of time after the rising edge of clkA2. This random value, if equal to $M_B$, may result in a short pulse at the comparator output. Since this short pulse occurs after the rising edges of clkA1 and clkA2, it will not be sampled by Ext\_DFF1 and Ext\_DFF2. However, such random glitches may also occur after the rising edge of clkB due to CLK-to-Q delay differences in the output bits of CntrB. These glitches, if close to the rising edges of clkA2 (or clkA1), may be sampled by Ext\_DFF2 (or Ext\_DFF1). Choosing $\tau_{A2}$ such that it is larger than the maximum width of such glitches guarantees that Ext\_DFF1 and Ext\_DFF2 will not be set HIGH in the same cycle of clkA2 due to such glitches. As $t_{A2(i)} - t_{B(i)}$ becomes smaller, the cmp\_out pulse becomes wider until it is also sampled by Ext\_DFF1. At this time, the RE\_Flag is set HIGH. In summary, when Esamp1 and Esamp2 are both set HIGH, it is concluded that the cmp\_out pulse has been wider than any possible glitch due to the asynchronous nature of the RE block.

From the above discussion, $\tau_{A2}$ is chosen such that it exceeds the maximum glitch width. The glitch width can reasonably be assumed to be 50% of $\tau_{CLK-to-Q}$, where $\tau_{CLK-to-Q}$ is the worst-case CLK-to-Q delay for an output bit of CntrB. This assumes that the variation of $\tau_{CLK-to-Q}$, denoted by $\Delta \tau_{CLK-to-Q}$, is less than $0.25\tau_{CLK-to-Q}$ under process variations. The actual value of $\Delta \tau_{CLK-to-Q}$ can be obtained by running Monte Carlo analysis on the D flip-flops used in CntrB. $\tau_{A1}$ must be within the range in Eqn. 5.12. We chose $\tau_{A1} = -T_C + 3T_{setup}$ ($T_{setup}$ is the maximum setup time for EOC\_DFF) in a standard 0.35 $\mu m$ CMOS process since this value is well within the required range.

It is worthwhile to note that this RE circuit does not affect the precision or accuracy of the measurement because it only ensures the closeness of the *i*-th rising edges of clkA and clkB without interfering with the path of clkA and clkB signals to EOC\_DFF, which is the critical path for precision and accuracy.

From Fig. 5.7(b), the range extension achieved by this circuit is  $T_d < (2^k - 1)T_A$ . In this case, since  $T_d = 2.6T_A$ , both 3-bit and 2-bit counters for CntrA and CntrB can be used. This is also evident from the mathematical analysis of the RE given in Appendix F.
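The relation $T_d < (2^k - 1)T_A$ gives the smallest counter width that covers a given interval. The helper below is a minimal sketch of that calculation with a hypothetical function name; it reproduces the observation that a 2-bit counter already suffices for $T_d = 2.6T_A$.

```python
def min_counter_bits(t_d: float, t_a: float) -> int:
    """Smallest k such that T_d < (2**k - 1) * T_A."""
    if t_d < 0 or t_a <= 0:
        raise ValueError("expected T_d >= 0 and T_A > 0")
    k = 1
    while t_d >= (2 ** k - 1) * t_a:
        k += 1
    return k


# T_d expressed in units of T_A, as in the Fig. 5.7(b) example
k_needed = min_counter_bits(2.6, 1.0)
print(k_needed)
```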

### 5.2.6 Calibration

From Eqn. 5.11, the relationship between N and $T_d$ is linear. Therefore, knowing the values of $T_C$ and $T_{\Delta}$ suffices to estimate $T_d$ from the number N. To estimate $T_C$ and $T_{\Delta}$, two accurately known time intervals $T_{cal1}$ and $T_{cal2}$ (typically supplied from off-chip) are measured and the resulting TDC numbers, $N_{cal1}$ and $N_{cal2}$, are recorded:

$$\begin{aligned} N_{cal1}T_{\Delta} &= T_{cal1} + T_C + T_{Q1} + T_{R1} \\ N_{cal2}T_{\Delta} &= T_{cal2} + T_C + T_{Q2} + T_{R2} \end{aligned} \tag{5.13}$$

where $T_{Q1}$ and $T_{Q2}$ are the quantization errors, and $T_{R1}$ and $T_{R2}$ are the random errors associated with measuring $T_{cal1}$ and $T_{cal2}$, respectively. These two measurements are used in a two-point calibration scheme, described in Appendix D.1, to estimate $T_C$ and $T_{\Delta}$. This appendix also shows how the random terms $T_{Q1}$, $T_{Q2}$, $T_{R1}$ and $T_{R2}$ in Eqn. 5.13 result in $T_C$ and $T_{\Delta}$ estimation errors, denoted by $T_{Ce}$ and $T_{\Delta e}$. From Eqn. 4D, the standard deviation of $T_{\Delta e}$, $\sigma_{T_{\Delta e}}$, is less than 0.1% if $T_{cal1}$ and $T_{cal2}$ are chosen such that $N_{cal2} - N_{cal1} > 200$. However, Eqn. 7D shows that the standard deviation of $T_{Ce}$, $\sigma_{T_{Ce}}$, can be several $T_{\Delta}$'s, which results in measurement accuracy degradation. This inaccuracy can be decreased using the *n*-point calibration scheme described in Sec. D.2. In this calibration scheme, N is obtained for *n* calibration time intervals $T_{cal(i)} = 0, T_{ref}, \dots, (n-1)T_{ref}$ for $i = 1, \dots, n$. The correlation between the *n* measurements is used to shrink the range of $T_C$ variations and provide a more accurate estimate of $T_C$.



Figure 5.8:  $KT_{ref}$  interval selection circuit

Since a low-jitter reference clock is often available on the chip, for two-point or *n*-point calibration it is convenient to choose  $T_{cal1} = T_{ref}$ ,  $T_{cal2} = 2T_{ref}$ , ...,  $T_{caln} = nT_{ref}$ .
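With the error terms dropped, the two calibration equations of Eqn. 5.13 form a linear system in $T_\Delta$ and $T_C$. The sketch below solves that simplified system and then inverts Eqn. 5.11 for $T_d$. It is an illustration under that simplifying assumption and may differ in detail from the scheme of Appendix D.1; the function names, the reference period, and the counts are hypothetical.

```python
def two_point_calibration(t_cal1, n_cal1, t_cal2, n_cal2):
    """Solve N_cal*T_delta = T_cal + T_C for T_delta and T_C (error terms ignored)."""
    if n_cal2 == n_cal1:
        raise ValueError("the two calibration counts must differ")
    t_delta = (t_cal2 - t_cal1) / (n_cal2 - n_cal1)
    t_c = n_cal1 * t_delta - t_cal1
    return t_delta, t_c


def estimate_interval(n, t_delta, t_c):
    """Invert N*T_delta = T_d + T_C for T_d (error terms ignored)."""
    return n * t_delta - t_c


T_REF_PS = 6400                               # hypothetical reference period
# Hypothetical counts obtained for T_cal1 = T_ref and T_cal2 = 2*T_ref
t_delta_ps, t_c_ps = two_point_calibration(T_REF_PS, 322, 2 * T_REF_PS, 642)
t_d_ps = estimate_interval(52, t_delta_ps, t_c_ps)
print(t_delta_ps, t_c_ps, t_d_ps)
```

Here the count difference is 320, which satisfies the $N_{cal2} - N_{cal1} > 200$ condition mentioned above.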

A circuit that allows reliable generation of $KT_{ref}$ intervals is shown in Fig. 5.8(a). In this circuit, when Cal=0, the Ref signal is connected to the clk inputs of SP\_DFF and ST\_DFF. Since the D input of ST\_DFF is always HIGH, START is set HIGH at the first rising edge of the Ref signal. The STOP signal is always set HIGH one Ref cycle after SP\_In turns HIGH. Since the K\_DGen state machine block sets SP\_In to HIGH (K - 1) cycles after the rising edge of the Ref signal, a delay of K Ref cycles results between the edges of START and STOP. The waveforms in Fig. 5.8(b) illustrate the operation of the circuit for K = 0, 1 and 2.

A constant delay is introduced in the path of the calibration signals in this circuit. The same delay is present in the actual measurement, except for the term $\Delta \tau_{MUX1} - \Delta \tau_{MUX2}$, which represents the variation of the difference in propagation delays from the I0 and I1 inputs to the output in the multiplexers MUX1 and MUX2, respectively. If the mismatch is significant, the value of $T_C$ estimated during calibration will not be the same as the one that applies in actual measurements, resulting in additional error. Therefore, the Ref signal paths to the clk inputs of SP\_DFF and ST\_DFF must be matched to the IN1 and IN2 signal paths to the same inputs, respectively. Here, it is assumed that this matching is achieved, and therefore the term $\Delta \tau_{MUX1} - \Delta \tau_{MUX2}$ is negligible. This matching, however, is not required in a differential measurement method because in this method $T_C$ does not affect the measurement accuracy or precision (refer to Sec. 5.2.11 for details).

### 5.2.7 Automatic resolution adjustment

The circuit in Fig. 5.6 provides a high resolution, *i.e.*, a small $T_{\Delta}$, by generating a small difference in loop-around delay in the two oscillators A and B. This small delay can be achieved by using additional capacitive loading ($C_L$) at the output of a logic gate in either of the ring oscillators. However, any mismatch between the gate delays and interconnect wiring in the oscillators A and B also contributes to $T_{\Delta}$. This mismatch can cause a significant increase in $T_{\Delta}$, resulting in resolution degradation. The mismatch might also result in $T_B > T_A$, causing a measurement error.

To overcome the effects of mismatch, we use the resolution adjustment technique shown in Fig. 5.9. Using this technique,  $T_A$  and  $T_B$  are controlled digitally [105]. To do so, a series of digitally controllable capacitive loads are connected to a number of nodes in both oscillators A and B through switches. Each load and its associated switch form a controlled load (CL) cell. Turning ON the switch in a CL cell connected to a node in one of the oscillators adds some loading to the corresponding nodes, resulting in longer output oscillation periods. CL cells can be designed and added to the standard digital cell library of a technology to preserve the possibility of automatic place and route. The actual design considerations for the CL cells are addressed in Sec. 5.2.8. The details regarding the control of the oscillators output period are as follows.

Assume that some CL cells are activated. The calibration control circuit performs calibration using two known time intervals  $T_{ref}$  and  $2T_{ref}$ . The two resulting counts are then subtracted to yield the difference  $N_{\Delta} = N_2 - N_1$ . Assuming the measurement error is negligible,  $N_{\Delta}$  and  $T_{\Delta}$  are related by:

$$T_{ref} = N_{\Delta} T_{\Delta}$$

Since  $T_{ref}$  is constant, a larger  $N_{\Delta}$  indicates a smaller  $T_{\Delta}$ .

Assume that a resolution of $T_{th}$ is required. Then:

$$T_{ref} = N_{th}T_{th}$$



Figure 5.9: Automatic resolution adjustment circuit

where $N_{th}$ is the TDC output number associated with the resolution $T_{th}$. If, for a specific loading condition, $N_{\Delta}$ is smaller than the pre-determined threshold $N_{th}$, the switch in a CL cell is turned ON and $N_{\Delta}$ is obtained again. Turning ON a CL switch increases $T_A$ or $T_B$, allowing for a smaller $T_{\Delta} = T_A - T_B$. For example, assume $T_A = 3.255$ ns and $T_B = 3.205$ ns, so that $T_{\Delta} = 50$ ps. If activating a CL cell in oscillator B results in $T_B = 3.235$ ns, a better resolution of $T_{\Delta} = 20$ ps is obtained.
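The threshold test on $N_\Delta$ can be illustrated with the oscillator periods of the example above. The reference period and the threshold $T_{th}$ in the sketch below are hypothetical, the counts are idealized (no measurement error), and the function names are not from the thesis.

```python
def ideal_count_difference(t_ref_ps: float, t_delta_ps: float) -> float:
    """N_delta from T_ref = N_delta * T_delta, measurement error neglected."""
    return t_ref_ps / t_delta_ps


def needs_adjustment(n_delta: float, n_th: float) -> bool:
    """Another CL cell is switched ON while N_delta is below the threshold N_th."""
    return n_delta < n_th


T_REF_PS = 6400                                   # hypothetical reference period
n_th = ideal_count_difference(T_REF_PS, 25)       # hypothetical T_th = 25 ps

n_before = ideal_count_difference(T_REF_PS, 3255 - 3205)   # T_delta = 50 ps
n_after = ideal_count_difference(T_REF_PS, 3255 - 3235)    # T_delta = 20 ps
print(n_th, n_before, needs_adjustment(n_before, n_th))
print(n_after, needs_adjustment(n_after, n_th))
```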

The resolution adjustment (RA) control circuit searches for the vectors $\vec{a} = a_0 \dots a_{l-1}$ and $\vec{b} = b_0 \dots b_{l-1}$ such that the only CL cells activated (switch turned ON) are those that result in a $T_{\Delta} < T_{th}$ (or equivalently $N_{\Delta} > N_{th}$). The algorithm used by the RA block depends on how much delay each controlled load adds to $T_A$ or $T_B$ when it is activated. This algorithm is given in Sec. 5.2.9. Two different approaches to designing CL cells, the *uniform load* and the *incremental step load*, are described in detail below.

#### **Uniform load**

One method for resolution adjustment is to use the same cell for all the controlled loads $CL_i^A$, i = 1, ..., l and $CL_j^B$, j = 1, ..., l. The switch and the loading capacitor of each CL cell should be designed in such a way that when a cell is activated, the nominal added delay to $T_A$ or $T_B$ is less than $0.5T_{th}$. This can be done by choosing an appropriate size for the pass transistor implementing the switch and/or an appropriate capacitor value. As a result, as $\vec{a}$ ($\vec{b}$) goes from 0..0 to 1..1, $T_A$ ($T_B$) steps through l values which are $T_{step} = 0.5T_{th}$ apart. Therefore:

$$\begin{aligned} T_{\Delta} &= T_{A0} - T_{B0} + [ONE(\vec{a}) - ONE(\vec{b})]\,T_{step} \\ &= T_{A0} - T_{B0} + \frac{1}{2}[ONE(\vec{a}) - ONE(\vec{b})]\,T_{th} \end{aligned} \tag{5.14}$$

where  $T_{A0} = T_A|_{\vec{a}=0..0}$ ,  $T_{B0} = T_B|_{\vec{b}=0..0}$  and ONE(x) is a function yielding the number of '1's in a binary number x. If the initial difference between  $T_A$  and  $T_B$ , *i.e.*,  $(T_{A0} - T_{B0})$ , is in the range  $(-1/2(l-1)T_{th}, 1/2(l-1)T_{th})$  there exist two vectors  $\vec{a}$  and  $\vec{b}$  which will result in  $T_{\Delta} \leq T_{th}$ .

Eqn. 5.14 assumes an ideal case where $T_A$ and $T_B$ change with a constant step of $T_{step} = 0.5T_{th}$ as $ONE(\vec{a})$ and $ONE(\vec{b})$ increase. On a real chip it is difficult to guarantee the uniformity of the steps because the value of $T_{step}$ varies with process variations, and also because it is affected by turning ON the neighboring CL cells. Assume that $T_{step(l)} < T_{step} < T_{step(u)}$, where $T_{step(l)}$ and $T_{step(u)}$ are the lower and upper $3\sigma$ thresholds of the $T_{step}$ probability density function (PDF) obtained through Monte Carlo simulations of loaded ring oscillators. As long as $T_{step(u)} < T_{th}$, $\vec{a}$ and $\vec{b}$ that satisfy the resolution requirement can be found under process variations if:

$$-lT_{step(l)} < T_{A0} - T_{B0} < lT_{step(l)} \tag{5.15}$$

If $T_{step(l)}$ is small relative to $T_{th}$ (e.g., $0.2T_{th}$), larger values of l must be chosen to ensure valid resolution adjustment for large $T_{A0} - T_{B0}$. Using $3\sigma$ thresholds of the $T_{step}$ PDF for $T_{step(l)}$ and $T_{step(u)}$, 99% of the manufactured circuits are guaranteed to meet the required resolution, whereas using $6\sigma$ thresholds guarantees a successful resolution adjustment for 99.94% of the circuits.
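Eqn. 5.14 and the feasibility condition of Eqn. 5.15 can be explored with a small model of the uniform-load scheme. The sketch below evaluates $T_\Delta$ for given control vectors and then searches exhaustively over the number of active cells; this brute-force search is only an illustration and is not the RA algorithm of Sec. 5.2.9. The step size, the threshold, and the function names are hypothetical.

```python
def ones(bits: str) -> int:
    """ONE(x): number of '1's in a binary string."""
    return bits.count("1")


def t_delta_uniform(t_a0, t_b0, a, b, t_step):
    """Eqn. 5.14 for control vectors a and b given as binary strings."""
    return t_a0 - t_b0 + (ones(a) - ones(b)) * t_step


def find_setting(t_a0, t_b0, l, t_step, t_th):
    """Brute-force search for (ONE(a), ONE(b), T_delta) with 0 < T_delta <= T_th."""
    for n_a in range(l + 1):
        for n_b in range(l + 1):
            t_delta = t_a0 - t_b0 + (n_a - n_b) * t_step
            if 0 < t_delta <= t_th:
                return n_a, n_b, t_delta
    return None   # mismatch too large for l cells of this step size


# Integer picoseconds; T_step = 10 ps and T_th = 20 ps are hypothetical
print(t_delta_uniform(3255, 3205, "000000", "000111", 10))
print(find_setting(3255, 3205, l=6, t_step=10, t_th=20))
print(find_setting(3305, 3205, l=6, t_step=10, t_th=20))
```

The last call fails because the initial mismatch of 100 ps exceeds $lT_{step} = 60$ ps, in line with the bound of Eqn. 5.15.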

The maximum time for resolution adjustment,  $T_{adj(max)}$ , occurs when all 2(l + 1) combinations of  $\vec{a}$  and  $\vec{b}$  with different number of '1's in  $\vec{a}$  or  $\vec{b}$  have to be tried before the required resolution is achieved. Therefore:

$$T_{adj(max)} = \sum_{i=0}^{2l-1} T_{calib(i)}$$

where $T_{calib(i)}$ is the time needed for a two-point calibration for the *i*-th state of $\vec{a}$, $\vec{b}$. Since $T_{ref}$ and $2T_{ref}$ are used for calibration, and each measurement takes $NT_A$ as in the expression for $T_{meas}$ in Sec. 5.2.4:

$$T_{calib(i)} = \frac{3T_{ref} + 2T_C}{T_{\Delta(i)}}\,T_A$$

where  $T_{\Delta(i)}$  is the  $T_{\Delta}$  for the *i*-th selection of  $\vec{a}$  and  $\vec{b}$ .

For small values of  $T_{th}$ , a large number of CL loads might be needed. This is because the  $T_{step}$  must be small and a large l is required to guarantee successful resolution adjustment for possibly large  $T_{A0} - T_{B0}$ . In that case, the incremental step design (described next) is preferable.

#### **Incremental step load**

Another method for delay control is to design CL cells such that

$$\tau_{CL_{i+1}^{A}} = (1+\xi)\,\tau_{CL_{i}^{A}} \tag{5.16}$$

where $\tau_{CL_i^A}$ is the delay added to the total ring oscillator A loop delay when $CL_i^A$ is activated and $0 < \xi < 1$ is a constant. Such a design allows for different resolution adjustment steps. For example, if $\tau_{CL_1^A} = 8$ ps and $\xi = 0.5$, then $\tau_{CL_i^A} = 8$ ps, 12 ps, 18 ps, 27 ps, 40.5 ps, 60.75 ps for $i = 1, \ldots, 6$. Therefore, assuming $\tau_{CL_i^A} = \tau_{CL_i^B}$ for $i = 1, \ldots, l$, $T_{\Delta}$ can be adjusted by steps of $T_{step} = \pm \tau_{CL_1^A}, \ldots, \pm \tau_{CL_l^A}$. This method effectively provides different levels of coarse and fine resolution adjustment steps. This enables the circuit to achieve very fine resolutions (less than 5 ps in a 0.35 $\mu m$ CMOS process) while reducing the average adjustment time $T_{adj}$, using the binary-like search algorithms explained in Sec. 5.2.9.
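The geometric progression of the incremental steps can be reproduced with a few lines of code. The sketch below assumes the recurrence in which each cell adds $(1+\xi)$ times the delay of the previous one, starting from 8 ps with $\xi = 0.5$ as in the example; the function name is illustrative.

```python
def incremental_steps(tau_1: float, xi: float, l: int) -> list:
    """Delays added by CL cells 1..l when each is (1 + xi) times the previous one."""
    if not 0 < xi < 1:
        raise ValueError("xi must satisfy 0 < xi < 1")
    steps = [tau_1]
    for _ in range(l - 1):
        steps.append(steps[-1] * (1 + xi))
    return steps


steps_ps = incremental_steps(8.0, 0.5, 6)
print(steps_ps)
print(sum(steps_ps))   # total adjustment span available from one oscillator
```

Six cells therefore span roughly 166 ps of adjustment while keeping the finest step at 8 ps.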

To guarantee the resolution adjustment, the maximum of the smallest adjustment step should be less than $T_{th}$. The maximum of the smallest adjustment step, denoted by $\tau_{CL_{1(u)}^{A}}$, can be chosen as the upper $3\sigma$ threshold of the $\tau_{CL_1^A}$ PDF under process variations. $\xi$ has to be selected such that under process variations, for any $\vec{a}$, the difference between $T_{\Delta}$ for $\vec{a}$ and $\vec{a} + 1$ is less than $T_{th}$. Ideally, $\xi$ could be derived from the $3\sigma$ variation of $\tau_{CL_i^A}$ (i = 1, ..., l); however, since this derivation is cumbersome, an arbitrary value of $\xi = 0.5$ is chosen for the implemented circuit reported in Sec. 5.5.

### 5.2.8 Controlled load (CL) cell design

Figure 5.10: CL cell evaluation test bench

In this section, the advantages and disadvantages of different CL cell design styles are described. The evaluation test bench is the circuit shown in Fig. 5.10, where a ring oscillator is loaded by 6 CL cells (l = 6) and the output oscillation period, $T_{osc}$, is measured for $\vec{a} =$ 000001, 000011, 000111, 001111, 011111, and 111111. Then the difference between $T_{osc}$ and $T_0 = T_{osc}|_{\vec{a}=000000}$ is calculated. This difference is denoted by $T_{dif}$. A standard 0.35 $\mu m$ CMOS technology has been used for evaluation purposes.

Fig. 5.11 illustrates different design styles for CL cells and their simplified models.



Figure 5.11: Different CL cell styles: (a1, b1, c1, d1, e1) Circuits; (a2, b2, c2, d2, e2) simplified models

The important design considerations for CL cells are:

1. The cell area for achieving a unit of $T_{dif}$. This area should be minimized.
2. The sensitivity of $T_{dif}$ to control voltage ($V_{ctrl}$) variations:

$$S_{V_{ctrl}}^{T_{dif}} = \frac{\Delta T_{dif}}{T_{dif}} \frac{V_{ctrl}}{\Delta V_{ctrl}}$$

If this sensitivity is high,  $V_{ctrl}$  noise will add jitter to clkA and clkB when the CL cells are connected to oscillators A and B in Fig. 5.6. This, in turn, increases  $T_R$  in Eqn. 5.11, which translates into a loss of precision in the TDC.
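The sensitivity definition above is a ratio of relative changes, so it can be estimated from two simulated operating points. The sketch below shows the calculation; the numeric values are hypothetical and are not taken from the thesis tables.

```python
def sensitivity(t_dif: float, delta_t_dif: float, v_ctrl: float, delta_v_ctrl: float) -> float:
    """Relative change of T_dif divided by relative change of V_ctrl."""
    if t_dif == 0 or delta_v_ctrl == 0:
        raise ValueError("T_dif and delta V_ctrl must be nonzero")
    return (delta_t_dif / t_dif) * (v_ctrl / delta_v_ctrl)


# Hypothetical operating point: T_dif = 10 ps at V_ctrl = 3.3 V,
# shifting by 0.1 ps when V_ctrl changes by 33 mV (1 %)
s = sensitivity(10.0, 0.1, 3.3, 0.033)
print(round(s, 3))
```

A value near 1 means that a 1% disturbance on $V_{ctrl}$ produces roughly a 1% change in $T_{dif}$.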

In the following, $C_{gs(X)}$, $C_{gd(X)}$, $C_{gb(X)}$, $C_{db(X)}$ and $C_{sb(X)}$ denote the gate-source, gate-drain, gate-bulk, drain-bulk and source-bulk capacitances of the transistor $M_X$, respectively, where X is a transistor identifier.

The voltage-controlled NMOS capacitor design (style *a*) has been proposed in [105] to implement a controlled delay line. Although it provides a relatively large  $T_{dif}$  in a small area, the  $T_{dif}$  is quite sensitive to  $V_{ctrl}$  because the equivalent capacitive loading of the CL cell is a function of  $V_{ctrl}$ . Values for  $S_{V_{ctrl}}^{T_{dif}}$  are listed in Table 5.2.8 for the different CL cell styles for comparison.

In Fig. 5.11(b1), a capacitor is used as the load. A simple model for such a CL cell, shown in Fig. 5.11(b2), consists of an ideal switch S, the switch resistance $R_S$, the switch drain capacitance $C_{d(S)} = C_{db(S)} + C_{gd(S)}$, the switch source capacitance $C_{s(S)} = C_{sb(S)} + C_{gs(S)}$, and the load capacitance $C_L$. $R_S$ is in the range of a few tens of M$\Omega$ when the switch is OFF and a few k$\Omega$ when it is ON. As is evident in the model, $C_{d(S)}$ and $C_{s(S)}$ also load the oscillator. Since $C_{db(S)}$, $C_{gd(S)}$, $C_{sb(S)}$, and $C_{gs(S)}$ are functions of $V_{ctrl}$, this style has a high $T_{dif}$ sensitivity to $V_{ctrl}$. In fact, any design with the switch connected to the oscillator node suffers from this high sensitivity characteristic.

An alternate circuit and its simple model are shown in Fig. 5.11(c1) and (c2). In the model,  $C_{d(S)} = C_{db(S)} + C_{gd(S)}$. This design provides a low  $T_{dif}$  sensitivity to  $V_{ctrl}$  (see Table 5.2.8) because when the switch transistor  $M_S$  is ON, the impedance of  $C_{d(S)}$,  $Z_{d(S)} = 1/(2\pi f C_{d(S)}) \gg R_S$. Therefore, the  $C_{d(S)}$  variations do not affect the total loading of the CL cell significantly. Note also that  $R_S$  variation due to  $V_{ctrl}$  does not affect the capacitive loading of the cell noticeably. If the  $M_S$  area is large such that  $C_L \ll C_{d(S)}$  and  $Z_{d(S)}$  dominates (*i.e.*,  $Z_{d(S)} \ll R_S$), then the CL load variation due to  $V_{ctrl}$  variations is not significant because:

$$C_{L(con)} = C_L C_{d(S)} / (C_L + C_{d(S)}) \sim C_L$$

In this case, the effect of  $R_S$  is significantly diminished, which means that the CL load variations for ON and OFF states of switch  $M_S$  are small. This is a disadvantage when larger CL load variations are required. Therefore, special attention must be paid to switch size in this design. Finally, style (c) occupies a small area for a given load. However, the target technology has to permit fabrication of floating capacitors.
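The series-combination expression for $C_{L(con)}$ can be checked numerically. The short sketch below uses purely hypothetical capacitance values (10 fF for $C_L$, 200 fF for $C_{d(S)}$) to show that even a ±20% change in $C_{d(S)}$ moves the connected load by only a few percent when $C_L \ll C_{d(S)}$.

```python
# Sketch: series combination of the load capacitor and the switch drain
# capacitance. All capacitance values are hypothetical, chosen only so that
# C_L << C_d(S).

def series_capacitance(c_l, c_d):
    """C_L(con) = C_L * C_d / (C_L + C_d), both in the same unit."""
    return c_l * c_d / (c_l + c_d)


C_L = 10.0        # fF, hypothetical load capacitor
C_D_NOM = 200.0   # fF, hypothetical switch drain capacitance

nominal = series_capacitance(C_L, C_D_NOM)
low = series_capacitance(C_L, 0.8 * C_D_NOM)
high = series_capacitance(C_L, 1.2 * C_D_NOM)
relative_spread = (high - low) / nominal

print(f"C_L(con) = {nominal:.2f} fF, spread = {100 * relative_spread:.1f} %")
```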

Fig. 5.11(d1) shows a design similar to the one in Fig. 5.11(c1), with the difference that an NMOS gate capacitor is used instead of a parallel-plate capacitor. In the associated model, shown in Fig. 5.11(d2),  $C_{g(L)} = C_{gs(L)} + C_{gd(L)}$  and  $C_d = C_{sb(L)} + C_{db(L)} + C_{db(S)} + C_{gd(S)}$. The  $T_{dif}$  sensitivity to  $V_{ctrl}$  is only marginally greater than that of style (c). The style in Fig. 5.11(e) shows a good  $S_{V_{ctrl}}^{T_{dif}}$, but requires more area to achieve the same delay as style (d). In the model for style (e) shown in Fig. 5.11(e2),  $C_{d(L)} = C_{gs(L)} + C_{gd(L)}$  and  $C_d = C_{gb(L)} + C_{db(S)} + C_{gd(S)}$. In the prototype implementation, described in Sec. 5.5, style (d) is chosen because it provides 10 ps delay in the area of a single-drive NOT gate and it exhibits low  $S_{V_{ctrl}}^{T_{dif}}$.

### 5.2.9 Resolution adjustment control block

Using either 'uniform load' or 'incremental load' strategy before measuring  $N_{\Delta}$ , it is required to check the assumption that  $T_A > T_B$ . A TATB checker circuit is used to perform this check. Two possible methods for designing the TATB checker circuit are outlined below:

| $V_{ctrl}$ | 2.5 V | 2.6 V | 2.7 V | 2.8 V | 2.9 V | 3.0 V | 3.1 V | 3.2 V | 3.3 V | $\frac{\Delta T_{dif}}{\Delta V_{ctrl}}$ | $S_{V_{ctrl}}^{T_{dif}}$ |
|-------------------|-------|-------|-------|-------|-------|-------|-------|-------|--------|------------------------------------------|--------------------------|
| CL cell           | (ps)  | (ps)  | (ps)  | (ps)  | (ps)  | (ps)  | (ps)  | (ps)  | (ps)   | (ps/V)                                   |                          |
| a1                | 24.3  | 25.9  | 27.4  | 28.9  | 30.2  | 31.5  | 32.8  | 34.0  | 35.2   | 13.5                                     | 1.3                      |
| a2                | 75.7  | 80.3  | 84.7  | 89.0  | 93.4  | 97.7  | 101.6 | 105.8 | 109.5  | 42.2                                     | 1.3                      |
| a3                | 154.6 | 162.9 | 170.9 | 179.8 | 187.9 | 195.9 | 203.7 | 211.2 | 218.7  | 80.2                                     | 1.2                      |
| a4                | 269.1 | 284.0 | 299.1 | 314.2 | 328.5 | 342.7 | 356.2 | 369.4 | 382.1  | 141.2                                    | 1.2                      |
| a5                | 423.8 | 447.5 | 471.5 | 495.0 | 518.2 | 540.4 | 561.1 | 581.8 | 601.8  | 222.6                                    | 1.2                      |
| a6                | 636.4 | 672.7 | 708.7 | 743.5 | 777.7 | 811.1 | 843.4 | 875.1 | 905.2  | 336.0                                    | 1.2                      |
| b1                | 57.4  | 61.6  | 65.89 | 70.2  | 74.3  | 78.4  | 82.4  | 86.5  | 90.2   | 40.9                                     | 1.5                      |
| b2                | 142.8 | 153.2 | 163.6 | 173.9 | 184.3 | 194.7 | 204.7 | 214.6 | 224.3  | 101.9                                    | 1.5                      |
| b3                | 250.2 | 268.8 | 287.3 | 306.1 | 324.6 | 342.9 | 361.0 | 378.9 | 396.4  | 182.8                                    | 1.5                      |
| b4                | 387.5 | 417.3 | 446.9 | 476.5 | 506.4 | 535.6 | 564.8 | 593.8 | 622.2  | 293.4                                    | 1.6                      |
| b5                | 555.0 | 598.3 | 642.3 | 686.2 | 730.1 | 773.9 | 817.1 | 859.9 | 902.5  | 434.4                                    | 1.3                      |
| b6                | 771.5 | 834.7 | 898.6 | 962.8 | 1020  | 1090  | 1150  | 1210  | 1280   | 636.7                                    | 1.64                     |
| c1                | 34.8  | 34.8  | 34.9  | 34.9  | 34.9  | 35.0  | 35.0  | 35.0  | 35.0   | 0.308                                    | 0.029                    |
| c2                | 121.1 | 121.3 | 121.4 | 121.5 | 121.6 | 121.7 | 121.9 | 121.9 | 122.1  | 1.23                                     | 0.033                    |
| c3                | 262.4 | 262.8 | 263.1 | 263.4 | 263.8 | 264.0 | 264.2 | 264.3 | 264.6  | 2.77                                     | 0.034                    |
| c4                | 484.0 | 484.9 | 485.7 | 486.4 | 486.9 | 487.5 | 488.0 | 488.4 | 488.9  | 6.09                                     | 0.041                    |
| c5                | 789.4 | 791.0 | 792.4 | 793.5 | 794.5 | 795.4 | 796.3 | 797.0 | 797.84 | 10.55                                    | 0.044                    |
| c6                | 1248  | 1251  | 1253  | 1255  | 1257  | 1259  | 1260  | 1262  | 1263   | 18.35                                    | 0.048                    |
| d1                | 9.44  | 9.47  | 9.43  | 9.44  | 9.45  | 9.39  | 9.44  | 9.55  | 9.49   | 0.063                                    | 0.022                    |
| d2                | 33.19 | 33.25 | 33.35 | 33.53 | 33.59 | 33.67 | 33.69 | 33.73 | 33.80  | 0.755                                    | 0.073                    |
| d3                | 73.19 | 73.40 | 73.62 | 73.95 | 74.19 | 74.33 | 74.37 | 74.59 | 74.62  | 1.78                                     | 0.078                    |
| d4                | 130.3 | 131.1 | 131.5 | 131.9 | 132.3 | 132.8 | 133.1 | 133.4 | 133.7  | 4.23                                     | 0.104                    |
| d5                | 210.6 | 211.7 | 212.5 | 213.2 | 213.9 | 214.6 | 215.2 | 215.7 | 216.3  | 7.1                                      | 0.108                    |
| d6                | 316.9 | 318.7 | 320.6 | 321.6 | 323.0 | 324.0 | 325.1 | 326.2 | 327.1  | 12.72                                    | 0.128                    |
| e1                | 0.844 | 0.825 | 0.904 | 0.757 | 0.891 | 0.938 | 0.952 | 0.791 | 0.761  | -0.104                                   | -0.451                   |
| e2                | 3.33  | 3.27  | 3.30  | 3.26  | 3.35  | 3.16  | 3.47  | 3.19  | 3.31   | -0.303                                   | -0.302                   |
| e3                | 7.14  | 7.16  | 7.18  | 7.28  | 7.42  | 7.14  | 7.27  | 7.15  | 7.31   | 0.211                                    | 0.095                    |
| e4                | 15.72 | 15.83 | 15.68 | 15.73 | 15.71 | 15.73 | 15.75 | 15.82 | 15.77  | 0.594                                    | 0.124                    |
| e5                | 28.17 | 28.42 | 28.25 | 28.49 | 28.29 | 28.41 | 28.41 | 28.41 | 28.62  | 0.569                                    | 0.065                    |
| e6                | 49.38 | 49.47 | 49.56 | 49.40 | 49.46 | 49.47 | 49.56 | 49.53 | 49.48  | 0.123                                    | 0.008                    |

1. The circuit in Fig. 5.12 can be used to check the condition that  $T_A > T_B$ . A zero time interval ( $T_d = 0$ ) is applied as input to the TQ. As the waveforms of  $T_A$  and  $T_B$  illustrate, when  $T_A < T_B$ , DFF\_EOC samples LOW until the *i*-th rising edge of clkA matches that of clkB after  $((D-1)T_A+T_C)/T_\Delta$  cycles of clkA. However, DFF\_ERR1 samples a HIGH after  $T_C/T_\Delta$  cycles of clkA, *i.e.*, DFF\_ERR1 is set before DFF\_EOC. If  $T_A > T_B$ , the reverse occurs, *i.e.*, DFF\_EOC is set before DFF\_ERR1. Therefore, the RA control block can check the condition  $T_A < T_B$  by monitoring the two flags EOC\_FLAG and ERR1\_FLAG. When the condition  $T_A < T_B$  is being checked, the reset lines of both EOC\_DFF and ERR1\_DFF must be inactive. The RA control block sets these reset lines HIGH through the OR gate.

An important requirement for this circuit is ensuring that  $\lfloor ((D-1)T_A + T_C)/T_\Delta \rfloor \neq \lfloor T_C/T_\Delta \rfloor$  ( $\lfloor \rfloor$  means integer part), otherwise the two flags are set in the same cycle of clkA resulting in a decision deadlock which causes failure in checking the condition  $T_A > T_B$ . However, such a requirement is easily met by a typical design under typical process variations. For example, in a circuit implementation in a 0.35  $\mu m$  CMOS process, the minimum  $(D-1)T_A$  is 1.5 ns, maximum  $T_C$  is 0.4 ns, and maximum  $T_\Delta$  is 0.15 ns. Therefore, in the worst case:

$$\lfloor \frac{(D-1)T_A + T_C}{T_\Delta} \rfloor = 12$$

and

$$\lfloor T_C/T_\Delta \rfloor = 2$$
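The worst-case arithmetic above can be reproduced with a few lines of code. The sketch below simply evaluates the two integer parts for the quoted 0.35 µm figures; the function name is illustrative only.

```python
import math

# Sketch: check that the EOC and ERR1 flags are set in different clkA cycles,
# using the worst-case figures quoted in the text (all times in ns).

def flag_cycles(d_minus_1_t_a, t_c, t_delta):
    """Return (cycle in which DFF_EOC is set, cycle in which DFF_ERR1 is set)."""
    eoc_cycle = math.floor((d_minus_1_t_a + t_c) / t_delta)
    err1_cycle = math.floor(t_c / t_delta)
    return eoc_cycle, err1_cycle


eoc, err1 = flag_cycles(d_minus_1_t_a=1.5, t_c=0.4, t_delta=0.15)
print(eoc, err1, "deadlock" if eoc == err1 else "no deadlock")
```

Because the two cycle counts differ, the two flags cannot be set in the same clkA cycle for these values.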

Another design requirement is in regard to the first rising edges of clkA and clkB. Since setup and hold time for DFF\_ERR1 and DFF\_EOC could be different, DFF\_ERR1 and DFF\_EOC might be set HIGH simultaneously on the first or second edges of clkA and clkB, respectively. Such a case results in decision deadlock. The circuit proposed next solves this issue but requires more hardware.

2. From the waveforms in Fig. 5.13, if  $T_A < T_B$, CntrA will count faster than CntrB, which causes the difference  $M_A - M_B$  to increase rather than decrease as time progresses. Assuming  $T_d = 0$, the difference between the two counters is initially 0 or 1. Therefore,  $M_A - M_B > 2$  implies that  $T_A < T_B$. This method requires more hardware than the first method, but its operation does not depend on the values of  $T_A$,  $T_C$  and  $T_{\Delta}$  or on the setup and hold times of the flip-flops, as it does in the previous circuit. Fig. 5.13 depicts the TATB checker circuit. Cntr3A and Cntr3B are 3-bit counters initialized to 0 and 2, respectively. To save hardware, the three least significant bits of CntrA in the RE block could be used instead of Cntr3A. The 3-bit comparator 'cmp3' compares the outputs of Cntr3A and Cntr3B, denoted by M3A and M3B, respectively. The same technique used in the RE block is used here for the reliable detection of the event M3A > M3B + 2. Detecting such an event implies that clkB has been slower than clkA (*i.e.*,  $T_B > T_A$). In order to use the time diversity technique, the clkA1 and clkA2 signals from the RE block must be used. In addition, a  $T_d$  such that  $\tau_{A1} + \tau_{A2} < T_d < T_B$  has to be applied to the TQ, where  $\tau_{A1}$  and  $\tau_{A2}$  are as defined in Sec. 5.2.5. The circuit in Fig. 5.14 is used to generate an appropriate  $T_d$.

If M3A < M3B, it is deduced that  $T_B < T_A$  and EOC\_Flag is set before ERR\_Flag.
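The counter-based check can be illustrated with an idealized behavioural model. The sketch below assumes $T_d = 0$, ideal edges, and unbounded counters; it ignores the 3-bit wrap-around, the initial offset of Cntr3B, and the time-diversity sampling of the actual circuit. The function name and the `max_cycles` limit are hypothetical.

```python
# Sketch: idealized model of the counter-based TATB check with T_d = 0.
# Periods are integers in picoseconds so that edge times are exact.

def counter_check_ta_less_than_tb(t_a_ps, t_b_ps, max_cycles=10_000, threshold=2):
    """Return True as soon as M_A - M_B exceeds the threshold.

    M_A and M_B are the numbers of clkA and clkB rising edges seen so far.
    Returns False if the threshold is never exceeded within max_cycles.
    """
    for m_a in range(1, max_cycles + 1):
        now = m_a * t_a_ps        # time of the m_a-th rising edge of clkA
        m_b = now // t_b_ps       # clkB rising edges that have occurred by now
        if m_a - m_b > threshold:
            return True
    return False


print(counter_check_ta_less_than_tb(1000, 1100))  # clkA faster than clkB
print(counter_check_ta_less_than_tb(1100, 1000))  # clkA slower than clkB
```

In this model, the closer the two periods are, the more cycles are needed before the difference exceeds the threshold, so `max_cycles` must be chosen accordingly.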

Figure 5.12: Circuit for checking the necessary condition that  $T_A > T_B$

Figure 5.13: Alternative circuit for checking the condition  $T_A > T_B$

If the 'uniform load' approach is used for selecting the CL loads, one hardware efficient method to search for  $\vec{a}$  and  $\vec{b}$  is the exhaustive search. In this method, the RA block is composed of a 2*l*-bit state machine, RA-SM. *l* bits of the state machine are connected to  $a_0 \dots a_{l-1}$  and the other *l* bits to  $b_0 \dots b_{l-1}$. The RA-SM sequentially tests all the distinct combinations of  $\vec{a}$  and  $\vec{b}$:

$$\vec{a} \in \{0 \dots 000, 0 \dots 001, 0 \dots 011, 0 \dots 111, \dots, 1 \dots 1\}$$

$$\vec{b} \in \{0 \dots 000, 0 \dots 001, 0 \dots 011, 0 \dots 111, \dots, 1 \dots 1\}$$

The flowchart of the algorithm used to select  $\vec{a}$  and  $\vec{b}$  is given in Fig. 5.15(a). Note that distinct combinations for  $\vec{a}$  are sequences with different numbers of '1's in them. For example,  $\vec{a} = 110000$  and  $\vec{a} = 000011$  result in the same  $T_{\Delta}$  because all the CL cells are the same. If all the distinct combinations are tried and the required resolution is not obtained, the circuit sets a failure flag RAERR\_Flag. If the circuit has been designed to operate under typical process variations, setting this flag means that the BIST circuit is faulty.
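The set of distinct codes in the 'uniform load' case can be enumerated explicitly. The sketch below generates the thermometer-style codes for one control word and all pairs of codes for $\vec{a}$ and $\vec{b}$; it is only an illustration of the search space, not of the RA-SM hardware, and the function names are hypothetical.

```python
from itertools import product

# Sketch: enumerate the distinct control words for the 'uniform load' strategy.
# Because all CL cells are identical, only the number of '1's matters, so each
# l-bit word has l + 1 distinct settings.

def thermometer_codes(l):
    """Return the l + 1 codes 0...000, 0...001, 0...011, ..., 1...1."""
    return ["0" * (l - k) + "1" * k for k in range(l + 1)]


def uniform_load_combinations(l):
    """All distinct (a, b) pairs an exhaustive search would have to test."""
    codes = thermometer_codes(l)
    return list(product(codes, repeat=2))


print(thermometer_codes(3))
print(len(uniform_load_combinations(6)))
```

For *l*-bit control words this gives $(l+1)^2$ distinct pairs, which bounds the number of iterations of the exhaustive search in this model.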

The exhaustive search approach, though more efficient in terms of hardware, can result in long resolution adjustment time. To reduce this time, a heuristic can be added to the algorithm as shown in the flowchart of Fig. 5.15(b). The main difference here from the exhaustive algorithm in Fig. 5.15(a) is that given the condition  $T_A > T_B$ , only combinations of  $\vec{b}$  or  $\vec{a}$  are tested, but not both.

If the 'incremental step load' strategy is used, the search algorithms must be adjusted accordingly. The exhaustive test strategy now requires potentially testing all the combinations of  $\vec{a}$  and  $\vec{b}$ :

$$\vec{a} \in \{0 \dots 000, 0 \dots 001, 0 \dots 010, 0 \dots 011, \dots, 1 \dots 1\}$$

$$\vec{b} \in \{0 \dots 000, 0 \dots 001, 0 \dots 010, 0 \dots 011, \dots, 1 \dots 1\}$$

The flowchart of the algorithm is shown in Fig. 5.16. Again, the hardware implementation is simple because one 2*l*-bit counter can be used to generate  $\vec{b}$  or  $\vec{a}$. However, the resolution adjustment time can be very long because, in the worst case, a maximum of  $2^{2l}$  combinations might have to be checked.



Figure 5.15: Algorithms for selecting  $\vec{a}$  and  $\vec{b}$  in uniform load CL method, (a) exhaustive search, (b) directed search



Figure 5.16: Exhaustive search algorithms for selecting  $\vec{a}$  and  $\vec{b}$  in the 'incremental step' CL cell method



Figure 5.17: Semi-exhaustive search algorithm for selecting  $\vec{a}$  and  $\vec{b}$  in the 'incremental step' CL cell method



Figure 5.18: Fast search algorithm for selecting  $\vec{a}$  and  $\vec{b}$  in the 'incremental step' CL cell method


A semi-exhaustive search reduces the adjustment time with small modifications in hardware. The flowchart for this method is given in Fig. 5.17. The main difference from the exhaustive search is that either  $\vec{a}$  or  $\vec{b}$  is incremented during the search, but not both. The decision on which counter is incremented depends on whether  $T_A > T_B$  or  $T_A < T_B$. Therefore, the maximum number of combinations to be tested is  $2^l$.
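To give a feel for the difference in worst-case adjustment time, the sketch below tabulates the two bounds quoted above for a given word length *l*. The function name is illustrative only.

```python
# Sketch: worst-case number of (a, b) combinations tested in the
# 'incremental step load' strategy, using the bounds quoted in the text.

def worst_case_steps(l):
    """Return the worst-case step counts for l-bit control words."""
    return {
        "exhaustive": 2 ** (2 * l),   # both a and b are swept
        "semi_exhaustive": 2 ** l,    # only one of a or b is swept
    }


for word_length in (4, 6, 8):
    print(word_length, worst_case_steps(word_length))
```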

A fast search algorithm is depicted in the flowchart in Fig. 5.18. In this algorithm, if  $T_A < T_B$, only  $\vec{a}$  is adjusted because  $T_A$  must be increased until the difference between  $T_A$  and  $T_B$  satisfies the required resolution. Similarly, if  $T_A > T_B$, only  $\vec{b}$  is adjusted to increase  $T_B$  so that the required  $T_{\Delta}$  is achieved. In this algorithm, since the two oscillators A and B are similar, there is a high probability that  $T_A$  and  $T_B$  are close. Therefore, the first choice is  $\vec{a} = 0..0$  and  $\vec{b} = 0..0$. If  $N_{\Delta} < N_{th}$, the least significant bit of  $\vec{a}$  or  $\vec{b}$  (depending on whether  $T_A < T_B$  or  $T_A > T_B$) is set HIGH to increase  $T_A$  or  $T_B$  by the smallest amount possible. If the required resolution is still not achieved, the next bit of  $\vec{a}$  or  $\vec{b}$  is set HIGH and all other bits are set LOW. This is continued until setting the *i*-th bit HIGH implies that  $T_A$  or  $T_B$  has been increased too much. Then the *i*-th and (*i*-1)-th bits are set LOW and HIGH, respectively, and the process starts over by setting the 0-th bit. To illustrate the algorithm, assume that  $T_A > T_B$, *l* = 6 and the required resolution is achieved for  $\vec{b} = 001001$. The algorithm goes through the following sequence to find the required  $\vec{b}$:

000000, 000001, 000010, 000100, 001000, 010000, 001001

This is in contrast with the exhaustive and semi-exhaustive searches, which go through the following sequence:

000000, 000001, 000010, 000011, 000100, 000101, 000110, 000111, 001000, 001001

As can be seen from above, the fast algorithm finds the solution in 7 steps, while the exhaustive and semi-exhaustive search require 10 steps.
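The bit-walking procedure described above can be sketched in software. In the sketch below, `classify` is a hypothetical stand-in for the on-chip resolution check: it reports whether a code leaves the period increase too small (`"low"`), meets the required resolution (`"ok"`), or overshoots (`"high"`). Keeping the highest tested bit when no single bit overshoots is an assumption made here; the text does not describe that case.

```python
# Sketch: software model of the fast search over one l-bit control word.
# classify(code) is a hypothetical oracle returning "low", "ok" or "high".

def fast_search(classify, l):
    """Return (code that meets the resolution or None, list of codes tested)."""
    tried = []

    def test(code):
        tried.append(code)
        return classify(code)

    if test(0) == "ok":                 # first choice: all bits LOW
        return 0, tried
    base, limit = 0, l                  # bits below 'limit' are still free
    while limit > 0:
        overshoot = None
        for i in range(limit):          # walk a single HIGH bit upwards
            code = base | (1 << i)
            result = test(code)
            if result == "ok":
                return code, tried
            if result == "high":
                overshoot = i
                break
        if overshoot == 0:
            return None, tried          # even the smallest step is too much
        # Keep bit (i-1) HIGH; if nothing overshot, keep the top free bit
        # (assumption), then restart the walk from bit 0 below it.
        top = (limit if overshoot is None else overshoot) - 1
        base |= 1 << top
        limit = top
    return None, tried


def example_classifier(code, target=0b001001):
    """Hypothetical monotonic check: only 'target' meets the resolution."""
    if code == target:
        return "ok"
    return "low" if code < target else "high"


found, sequence = fast_search(example_classifier, 6)
print([format(code, "06b") for code in sequence])
```

With this hypothetical classifier the model visits the same seven codes as the example in the text, whereas a plain counter would step through all ten codes from 000000 to 001001.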

### 5.2.10 TDC error sources

From Eqn. 5.11, one of the major sources of measurement error is  $T_R$, which is due to different noise sources in the TQ (Fig. 5.6). In addition, the measurement accuracy (formally defined in Sec. 5.2.11) is degraded by delay variations in the circuit of Fig. 5.8, which cause random variation between the arrival of the rising edges of IN1 and IN2 and those of START and STOP, respectively. In this section, we identify these noise sources and show how the proposed TDC architecture can significantly reduce the effect of power supply noise on the rms value of  $T_R$.

To identify different noise sources, we obtain the relationship between N and the interval being measured by TDC,  $T_D = t_{IN2} - t_{IN1}$ , as follows. In Fig. 5.8:

$$t_{START} = t_{IN1} + \tau_{ST} \tag{5.17}$$

$$t_{STOP} = t_{IN2} + \tau_{SP} \tag{5.18}$$

where  $\tau_{ST} = \tau_{mux1} + \tau_{ST(clk-to-Q)}$  and  $\tau_{SP} = \tau_{mux2} + \tau_{SP(clk-to-Q)}$. Here,  $\tau_{ST(clk-to-Q)}$  and  $\tau_{SP(clk-to-Q)}$  are the clk-to-Q delays of ST\_DFF and SP\_DFF, respectively, and  $\tau_{mux1}$  and  $\tau_{mux2}$  are the delays from input I0 to the output in MUX1 and MUX2, respectively. Substituting Eqns. 5.17 and 5.18 in Eqn. 5.10 yields:

$$T_D = N(T_A - T_B) - \tau_{EOC} + \tau_A - \tau_B + \tau_{SP} - \tau_{ST} \tag{5.19}$$

From Eqn. 5.19, the different sources of error affecting  $T_R$  include:

- 1.  $\tau_{SP}$  and  $\tau_{ST}$ : the edge sampling flip-flops ST\_DFF and SP\_DFF, and multiplexers MUX1 and MUX2
- 2.  $\tau_{EOC}$ : DFF\_EOC flip-flop setup time and metastability window
- 3.  $\tau_A$ ,  $\tau_B$ ,  $T_A$  and  $T_B$ : ring oscillators jitter

The  $\tau_{ST}$ ,  $\tau_{SP}$  and  $\tau_{EOC}$  can be expressed as:

$$\tau_{ST} = \tau_{ST0} + \tau_{ST(e)}$$
  
$$\tau_{SP} = \tau_{SP0} + \tau_{SP(e)}$$
  
$$\tau_{EOC} = \tau_{EOC0} + \tau_{EOC(e)}$$

where  $\tau_{ST0}$,  $\tau_{SP0}$  and  $\tau_{EOC0}$  are the nominal values (under noise-free conditions) of  $\tau_{ST}$,  $\tau_{SP}$  and  $\tau_{EOC}$, respectively, and  $\tau_{ST(e)}$,  $\tau_{SP(e)}$  and  $\tau_{EOC(e)}$  are the jitter of the ST\_DFF and SP\_DFF clk-to-Q delays, and of the EOC\_DFF setup time and metastability window, respectively. The delay and setup time jitters are due to different sources such as thermal noise and power supply noise. The metastability error, however, is due to the inability of the DFF\_EOC to decide its output state if the delay between the arrival times of the signals at the clk and D inputs is very close to the setup time of the DFF\_EOC. Therefore:

$$\tau_{EOC(e)} = \tau_{EOC(en)} + \tau_{EOC(m)}$$

where  $\tau_{EOC(en)}$  is the variation of the EOC\_DFF setup time due to noise sources and  $\tau_{EOC(m)}$  is the error induced by the EOC\_DFF metastability window. The characterization of EOC\_DFF through simulation, reported in Appendix E, shows that this window is less than 0.01 ps. Therefore, for practical applications, this error is negligible in comparison with the TDC resolution of about 10 ps. The  $\tau_{ST0}$,  $\tau_{SP0}$  and  $\tau_{EOC0}$  will be accounted for during calibration because they only contribute to the offset  $T_C$  (refer to Sec. 5.2.6). However,  $\tau_{ST(e)}$,  $\tau_{SP(e)}$  and  $\tau_{EOC(e)}$  cause loss of precision. The inaccuracy due to the above three sources is given by:

$$T_{DFF(e)} = \tau_{SP(e)} - \tau_{ST(e)} + \tau_{EOC(e)}$$

The inaccuracy caused by the jitter in ring oscillators A and B used in the TQ, is analyzed next.

#### Jitter in ring oscillators A and B

The outputs of both ring oscillators A and B, the clkA and clkB signals, include some amount of jitter. In general, the rms jitter of a ring oscillator increases with the square root of the number of gates in its oscillating loop [106]. When a rising edge passes through the *i*-th gate of the loop, the gate output switches after a delay  $\tau_{g(i)}$:

$$\tau_{g(i)} = \tau_{g(i)0} + \tau_{g(i)e}$$

where  $\tau_{g(i)0}$  is the average gate propagation delay and  $\tau_{g(i)e}$  is the random variation of this value due to different noise sources. The period at the output of the oscillator A can be expressed as:

$$\begin{aligned}
T_{A} &= \sum_{i=0}^{2M_{A}-1} \tau_{g(i)}^{A} \\
&= \sum_{i=0}^{2M_{A}-1} \tau_{g(i)0}^{A} + \sum_{i=0}^{2M_{A}-1} \tau_{g(i)e}^{A} \\
&= T_{A0} + T_{Ae}
\end{aligned} \tag{5.20}$$

where  $M_A$  is the number of gates in the ring oscillator A,  $T_{A0}$  is the average period, and  $T_{Ae}$  is the period jitter of clkA:

$$T_{Ae} = \sum_{i=0}^{2M_A - 1} \tau_{g(i)e}^A \tag{5.21}$$

Assuming the  $\tau_{g(i)e}$  are independent, normally distributed random variables with standard deviation  $\sigma_g$  and mean 0, the standard deviation of  $T_{Ae}$  will be:

$$\sigma_{Ae} = \sqrt{2M_A}\sigma_g \tag{5.22}$$

Similarly,

$$\sigma_{Be} = \sqrt{2M_A}\sigma_g$$

where  $\sigma_{Be}$  is the standard deviation of clkB's period jitter. In a measurement sample, the TDC stops after N cycles of clkA and clkB. If the noise sources are independent, the jitter in each cycle will be independent of the jitter in any other cycle of either clkA or clkB. In such a case, the effect of the jitter on  $T_d$  is as follows:

$$T_d = N(T_{A0} - T_{B0}) + \sum_{j=0}^{N-1} (T_{A(j)e} - T_{B(j)e}) + T_Q$$

Therefore, the measurement error due to the jitter in the ring oscillators is:

$$T_{R1} = \sum_{j=0}^{N-1} (T_{A(j)e} - T_{B(j)e}) \tag{5.23}$$

The variance of  $T_{R1}$  is obtained as:

$$\sigma_{R1}^2 = N(\sigma_{Ae}^2 + \sigma_{Be}^2) = 4NM_A\sigma_g^2 \tag{5.24}$$
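The variance expression in Eqn. 5.24 can be checked with a small Monte Carlo experiment. The sketch below assumes, as in the derivation, independent zero-mean Gaussian gate delay errors with standard deviation $\sigma_g$ and two oscillators with the same number of gates; the numerical values of N, $M_A$ and $\sigma_g$ are hypothetical.

```python
import random
import statistics

# Sketch: Monte Carlo check of sigma_R1^2 = 4 * N * M_A * sigma_g^2 for
# independent Gaussian gate-delay errors. All numerical values are hypothetical.

def simulate_t_r1(n_cycles, m_a, sigma_g, rng):
    """One sample of T_R1: accumulated period-jitter difference over N cycles."""
    transitions_per_period = 2 * m_a      # upper limit of the sum in Eqn. 5.21
    total = 0.0
    for _ in range(n_cycles):
        t_ae = sum(rng.gauss(0.0, sigma_g) for _ in range(transitions_per_period))
        t_be = sum(rng.gauss(0.0, sigma_g) for _ in range(transitions_per_period))
        total += t_ae - t_be
    return total


def estimate_variance(n_cycles, m_a, sigma_g, trials=4000, seed=1):
    """Population variance of T_R1 over a number of simulated measurements."""
    rng = random.Random(seed)
    samples = [simulate_t_r1(n_cycles, m_a, sigma_g, rng) for _ in range(trials)]
    return statistics.pvariance(samples)


N_CYCLES, M_A, SIGMA_G = 5, 3, 0.5        # sigma_g in ps (hypothetical)
predicted = 4 * N_CYCLES * M_A * SIGMA_G ** 2
estimated = estimate_variance(N_CYCLES, M_A, SIGMA_G)
print(f"predicted = {predicted:.2f} ps^2, simulated = {estimated:.2f} ps^2")
```

The simulated variance should agree with the prediction to within the statistical scatter of the finite number of trials.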

Eqn. 5.24 is valid when the noise sources in the gates are independent, such as thermal noise. However, the noise sources due to power supply and substrate noise for each gate are not independent. These correlations are inherent to the structure of the TDC, making it resistant to such noise sources. The analysis in Appendix H shows that when measuring a time interval  $T_d$ , the inaccuracy of the TDC due to the  $V_{dd}$ -induced jitter in Oscillator A and B is:

$$E_{PS} = T_d\left[\kappa\, \overline{V_{dd(e)}(t_0, t_d)} - \frac{\gamma}{\tau_{A0}}\, \overline{V_{dd(e)}(t_d, t_{eoc})}\right] \tag{5.25}$$

where  $t_0$  is the time at which the START edge occurs;  $t_d = t_0 + T_d$;  $t_{eoc}$  is the time at which the measurement completes;  $\kappa$  and  $\gamma$  are two constants modeling the  $V_{dd}$-induced gate delay jitter (see Appendix G.1);  $V_{dd(e)}$  is the power supply noise term; and  $\overline{V_{dd(e)}(t_0, t_d)}$  and  $\overline{V_{dd(e)}(t_d, t_{eoc})}$  are the averages of  $V_{dd(e)}$  over the windows  $[t_0, t_d]$  and  $[t_d, t_{eoc}]$, respectively. Eqn. 5.25 indicates that  $E_{PS}$  is proportional to the time interval measured by the TQ,  $T_d$. This is to be expected because the larger the interval  $T_d$, the greater the number of switching events in the oscillator loops.  $E_{PS}$  is also proportional to the power supply noise averaged over a time window. Because noise averaging and differential noise rejection reduce the noise power, this circuit is capable of high-precision measurements. Sec. 5.6.2 shows that the rms value of  $E_{PS}$  is approximately 5.5 ps under typical conditions in a  $0.35\,\mu m$  CMOS implementation of the circuit.

## 5.2.11 Accuracy, Precision, and Resolution

Accuracy, precision and resolution are three important characteristics of a measurement circuit or device. This section evaluates these characteristics for the TDC circuit.

#### Definitions

The formal definitions of accuracy, precision and resolution are [107]:

1. Accuracy is the degree of exactness (closeness) of a measurement when compared to the expected (most probable) mean of the variable being measured. For example, if the value Y is measured for a variable with expected value of X, the measurement accuracy is:

$$A = 1 - \left|\frac{X - Y}{Y}\right| = 1 - \frac{\varepsilon}{Y} \tag{5.26}$$

where  $\varepsilon = X - Y$  is the absolute error. Since  $\varepsilon$  is mathematically more convenient to calculate and is directly mapped to accuracy, throughout this document we calculate  $\varepsilon$  as a measure of accuracy.

2. *Precision* is the measurement sample deviation relative to measurement mean. Therefore,

$$P = 1 - \left|\frac{Y - \overline{Y}}{\overline{Y}}\right|$$

where Y is a sample measurement and  $\overline{Y}$  is the measurement mean if a large number of measurements are performed on the variable to be measured. Precision is, in fact, an indicator of the consistency of the measurements taken by a measurement device. We define precision error as:

$$\eta = Y - \overline{Y} \tag{5.27}$$

Since  $\eta$  is mathematically more convenient to calculate and is directly mapped to precision, hereafter we will use  $\eta$  to analyze the precision of the TDC.

3. *Resolution* is defined as the smallest change in a measured variable to which a measurement device responds. Therefore, the resolution of the TDC circuit is  $T_{\Delta}$.

Using the above definitions, we can say that if a measurement is accurate it is also precise, but the reverse is not necessarily true. This is because some unknown variables affect all the measurements the same way, *e.g.*, measurement offset in the measurement device. These variables reduce the accuracy but not the precision. To illustrate this, assume that a measurement device measures the value Y for a variable with expected value of X such that

$$Y = X + \varepsilon \tag{5.28}$$

where  $\varepsilon$  is a random variable with a mean of  $m_{\varepsilon}$  and a standard deviation of  $\sigma_{\varepsilon}$ . Therefore, the rms error,  $\varepsilon_{rms}$ , is

$$\begin{aligned}
\varepsilon_{rms}^{2} &= E[\varepsilon^{2}] \\
&= E[(\varepsilon - \overline{\varepsilon} + \overline{\varepsilon})^{2}] \\
&= \sigma_{\varepsilon}^{2} + m_{\varepsilon}^{2}
\end{aligned} \tag{5.29}$$

where E[] represents the Expectation function [108].


From Eqns. 5.27 and 5.28 the precision error,  $\eta$ , for this example, is obtained as below:

$$\begin{aligned}
\eta_{rms}^{2} &= E[\eta^{2}] \\
&= E[(X + \varepsilon - (\overline{X + \varepsilon}))^{2}] \\
&= \sigma_{\varepsilon}^{2}
\end{aligned} \tag{5.30}$$

Comparing Eqn. 5.29 and Eqn. 5.30 shows that  $\eta_{rms}^2 \le \varepsilon_{rms}^2$; the difference is the  $m_{\varepsilon}^2$  term, which can be interpreted as the offset error of the measurement device.
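The relation between the rms error and the rms precision error can be illustrated numerically. The sketch below draws hypothetical error samples with a non-zero mean (an offset) and confirms that the mean-square error splits into a variance part and an offset part; the offset and spread values are arbitrary.

```python
import math
import random
import statistics

# Sketch: decompose the mean-square error of a set of measurements into the
# precision part (variance) and the offset part (squared mean).
# The offset of 3.0 and spread of 2.0 are arbitrary, hypothetical values.

def error_statistics(errors):
    """Return (eps_rms^2, offset m_eps, eta_rms^2) for a list of errors."""
    eps_rms_sq = sum(e * e for e in errors) / len(errors)
    offset = statistics.mean(errors)
    eta_rms_sq = statistics.pvariance(errors)
    return eps_rms_sq, offset, eta_rms_sq


rng = random.Random(0)
errors = [rng.gauss(3.0, 2.0) for _ in range(1000)]
eps_sq, m_eps, eta_sq = error_statistics(errors)
print(f"eps_rms^2 = {eps_sq:.3f}, eta_rms^2 + m^2 = {eta_sq + m_eps ** 2:.3f}")
```

Removing the offset, for instance by calibration or by differencing against a reference measurement, leaves only the variance term.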

#### **TDC accuracy and precision**

We analyze the TDC accuracy, precision and resolution for two different measurement methods: *absolute* and *differential*. In an absolute measurement of a time interval  $T_d$, the values measured by the TDC are used directly, whereas in a differential method, the differences of measurements from a reference measurement are analyzed. Consequently, the sets of meaningful quantities for the two methods are:

$$\begin{aligned}
\text{Absolute:}\quad & N_1, N_2, \ldots \\
\text{Differential:}\quad & N_1 - N_{ref}, N_2 - N_{ref}, \ldots
\end{aligned} \tag{5.31}$$

where  $N_{ref}$  is a number obtained for a reference measurement.

In an absolute measurement of a time interval  $T_d$, the TDC measurement error is (from Eqn. 8D):

$$\varepsilon_{abs} = NT_{\Delta e} + T_{Ce} + T_Q + T_R \tag{5.32}$$

The error terms  $T_{\Delta e}$  and  $T_{Ce}$  are due to calibration, but their values remain constant for all the measurement samples following calibration. In fact,  $T_{\Delta e}$  and  $T_{Ce}$  can be interpreted as offset error. Therefore, the TDC precision error is given as:

$$\eta_{abs} = T_Q + T_R \tag{5.33}$$

In the differential measurement method, the offset error can be eliminated through subtraction. Assume  $N_a$  and  $N_b$  correspond to the measurements of two time interval samples  $T_a$  and  $T_b$, respectively. Therefore:

$$T_{ab} = (N_a - N_b)T_\Delta + T_{Qa} - T_{Qb} + T_{Ra} - T_{Rb} \tag{5.34}$$

where  $T_{ab}$  is the measured difference between  $T_a$  and  $T_b$ . Therefore, in differential measurement, the error,  $\varepsilon_{diff}$ , and precision error,  $\eta_{diff}$ , are given as:

$$\varepsilon_{diff} = (N_a - N_b)T_{\Delta e} + T_{Qa} - T_{Qb} + T_{Ra} - T_{Rb} \tag{5.35}$$

$$\eta_{diff} = T_{Qa} - T_{Qb} + T_{Ra} - T_{Rb} \tag{5.36}$$

If  $T_{Ce}$  variations are significant compared to those of the other error sources, it can be concluded from Eqns. 5.32, 5.33, 5.35, and 5.36 that in differential measurement mode the error decreases (accuracy improves) because the constant  $T_{Ce}$  term disappears, but the precision error increases (precision degrades) due to the accumulating effects of additional independent random variables.

Using a two-point calibration scheme to calibrate the TDC, the improved accuracy in differential measurement over absolute measurement significantly outweighs the reduced precision. To illustrate this claim, consider an example in which, for two sample time intervals  $T_a$  and  $T_b$, the TDC generates two numbers  $N_a$  and  $N_b$. Assume that  $N_a \ll N_{cal2} - N_{cal1}$  and  $N_b \ll N_{cal2} - N_{cal1}$. Using the variances of  $T_{\Delta e}$  and  $T_{Ce}$  obtained in Sec. D.1, the rms errors in the absolute and differential measurement modes are:

$$\begin{aligned}
(\varepsilon_{abs(rms)})^{2} &= \frac{N_{a}^{2}}{(N_{cal2} - N_{cal1})^{2}} \left(\frac{T_{\Delta}^{2}}{6} + 2\sigma_{R}^{2}\right) + \frac{5T_{\Delta}^{2}}{12} + 5\sigma_{R}^{2} + \left(\frac{T_{\Delta}^{2}}{12} + \sigma_{R}^{2}\right) \\
&\simeq 6\left(\frac{T_{\Delta}^{2}}{12} + \sigma_{R}^{2}\right)
\end{aligned} \tag{5.37}$$

$$\begin{aligned}
(\varepsilon_{diff(rms)})^{2} &= \frac{(N_{b} - N_{a})^{2}}{(N_{cal2} - N_{cal1})^{2}} \left(\frac{T_{\Delta}^{2}}{6} + 2\sigma_{R}^{2}\right) + 2\left(\frac{T_{\Delta}^{2}}{12} + \sigma_{R}^{2}\right) \\
&\simeq 2\left(\frac{T_{\Delta}^{2}}{12} + \sigma_{R}^{2}\right)
\end{aligned} \tag{5.38}$$

The rms precision error in absolute and differential measurement modes are:

$$(\eta_{abs(rms)})^2 = \frac{T_{\Delta}^2}{12} + \sigma_R^2 \tag{5.39}$$

$$(\eta_{diff(rms)})^2 = 2\left(\frac{T_{\Delta}^2}{12} + \sigma_R^2\right) \tag{5.40}$$

Comparison of Eqns. 5.37 and 5.38 shows that using the differential measurement method decreases the rms error, while Eqns. 5.39 and 5.40 indicate that this method increases the rms precision error.
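The trade-off between the two measurement modes can be made concrete by evaluating Eqns. 5.37 to 5.40 for sample numbers. In the sketch below, the resolution, the noise level, the counts and the calibration span are all hypothetical values chosen so that the counts are much smaller than the calibration span.

```python
# Sketch: evaluate the squared rms error and precision error of the absolute
# and differential modes (Eqns. 5.37-5.40). All numerical inputs are
# hypothetical; times are in ps.

def squared_rms_errors(t_delta, sigma_r, n_a, n_b, n_cal_span):
    """Return the four squared rms quantities as a dictionary (ps^2)."""
    unit = t_delta ** 2 / 12 + sigma_r ** 2          # quantization + random noise
    cal = t_delta ** 2 / 6 + 2 * sigma_r ** 2        # calibration slope term
    eps_abs = ((n_a / n_cal_span) ** 2 * cal
               + 5 * t_delta ** 2 / 12 + 5 * sigma_r ** 2 + unit)
    eps_diff = ((n_b - n_a) / n_cal_span) ** 2 * cal + 2 * unit
    return {
        "eps_abs": eps_abs,
        "eps_diff": eps_diff,
        "eta_abs": unit,
        "eta_diff": 2 * unit,
    }


result = squared_rms_errors(t_delta=10.0, sigma_r=5.0, n_a=5, n_b=8, n_cal_span=1000)
for name, value in result.items():
    print(f"{name}: {value:.2f} ps^2")
```

For these inputs the squared error in differential mode is roughly one third of that in absolute mode, while the squared precision error doubles.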

# 5.3 Jitter Generator

For the jitter tolerance and jitter transfer test of CRUs, it is necessary to supply the CRU with a signal which has a known jitter. Here, a circuit is proposed which is capable of generating a controlled jittered signal out of a jitter-free clock signal. The circuit is shown in Fig. 5.19. The circuit is composed of a delay line, a multiplexer, and a sequence counter. Different taps of the delay line are multiplexed to the output J. The counter specifies which tap is multiplexed to the output at any clock edge. For example, with 8 taps and a 3-bit up/down counter, it is possible to generate a triangular-shaped jitter signal with the maximum peak-to-peak amplitude of  $8\tau_g$, where  $\tau_g$  is the delay of each delay element in the delay line. Using a counter with a count sequence which follows a sinusoidal pattern, the circuit will generate a signal with sinusoidal jitter. By designing a programmable counter, the circuit can generate different jitter signals according to the stored program in the counter. Such counters can be implemented as general state-machines.



Figure 5.19: Jitter generator circuit
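A behavioural model helps to visualize the tap selection. The sketch below assumes that tap *k* of the delay line adds a delay of $k\tau_g$ with *k* starting at 0 (the actual tap numbering in Fig. 5.19 may differ), and generates tap sequences for a triangular and a sinusoidal pattern. The function names, the clock period and the delay value are hypothetical.

```python
import math

# Sketch: behavioural model of the delay-line jitter generator.
# Assumption: tap k adds k * tau_g of delay, with k = 0 .. num_taps - 1.

def triangular_tap_sequence(num_taps, length):
    """Tap indices produced by an up/down counter bouncing between the ends."""
    if num_taps < 2:
        raise ValueError("at least two taps are needed")
    sequence, value, step = [], 0, 1
    for _ in range(length):
        sequence.append(value)
        if not 0 <= value + step <= num_taps - 1:
            step = -step                  # reverse direction at either end
        value += step
    return sequence


def sinusoidal_tap_sequence(num_taps, period, length):
    """Tap indices following a sampled sine with the given period in clocks."""
    mid = (num_taps - 1) / 2
    return [round(mid + mid * math.sin(2 * math.pi * k / period))
            for k in range(length)]


def jittered_edges(tap_sequence, t_clk, tau_g):
    """Output edge times: ideal clock edge plus the selected tap delay."""
    return [k * t_clk + tap * tau_g for k, tap in enumerate(tap_sequence)]


taps = triangular_tap_sequence(8, 16)
print(taps)
print(jittered_edges(taps[:4], t_clk=1000, tau_g=50))   # ps, hypothetical
```

A programmable counter, as mentioned in the text, corresponds here to supplying any other list of tap indices.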

# 5.4 Schemes for On-Chip Jitter Specification Testing

## 5.4.1 Cycle-to-cycle jitter measurement

Cycle-to-cycle or period jitter is defined as variations in the period of a signal. A histogram approach can provide statistics of such jitter. Using this approach, to measure period jitter, two consecutive rising (or falling) edges of the signal  $V_{in}$  are passed to the time measurement circuit as START and STOP signals by a control circuit (Fig. 5.20). After this is completed, the control circuit reads the N stored in the TQ counter and sends it to an external tester (possibly through a JTAG controller if it exists on the chip) or to an on-chip processing unit [109] for post-analysis. Concurrently, the test controller can pass two other consecutive edges of  $V_{in}$  to the TDC. This procedure can be repeated until a predetermined number of samples of the  $V_{in}$  periods are measured. Subsequently, the external tester (or on-chip processor) can form a histogram of the data and calculate the variance and peak-to-peak jitter. The tester does not have to be a high-speed or high-performance mixed-signal type because the data is digital and can be sent off-chip using a low-speed serial bus.
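The post-processing performed by the external tester or on-chip processor can be sketched as follows. The counts in the example are hypothetical, and the conversion from count to time uses only the resolution $T_{\Delta}$; a constant calibration offset would shift all periods equally and therefore does not affect the rms or peak-to-peak jitter.

```python
from collections import Counter
import statistics

# Sketch: histogram-based period jitter statistics from TDC output counts.
# The counts and the resolution value below are hypothetical.

def period_jitter_stats(counts, t_delta_ps):
    """Return (histogram of counts, rms jitter in ps, peak-to-peak jitter in ps)."""
    periods = [n * t_delta_ps for n in counts]
    histogram = Counter(counts)
    rms_jitter = statistics.pstdev(periods)
    peak_to_peak = max(periods) - min(periods)
    return histogram, rms_jitter, peak_to_peak


sample_counts = [100, 101, 99, 100, 102, 100, 98, 100]
hist, rms, p2p = period_jitter_stats(sample_counts, t_delta_ps=10)
for count in sorted(hist):
    print(count, "#" * hist[count])
print(f"rms = {rms:.2f} ps, peak-to-peak = {p2p} ps")
```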

If the information about the times at which the jitter samples are taken is supplied to the tester along with the jitter sample measurements, frequency components of the jitter can also be analyzed. This feature enables the JMC to perform full jitter standard compliance tests, if required. However, this feature is not fully detailed in this thesis.
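The post-analysis performed by the external tester or on-chip processor can be sketched as follows. This is a minimal illustration, assuming each measured period is simply the count N times a known resolution (any constant offset removed by calibration is ignored here); the function name and the returned dictionary keys are hypothetical.

```python
from collections import Counter
import math

def period_jitter_stats(counts, t_delta):
    """Summarise period jitter from TDC counts.

    counts  : integers N read from the TQ counter, one per measured period
    t_delta : TDC resolution (assumed known from calibration), same time unit as the result
    """
    periods = [n * t_delta for n in counts]
    mean = sum(periods) / len(periods)
    variance = sum((p - mean) ** 2 for p in periods) / len(periods)
    return {
        "histogram": Counter(counts),           # occurrences of each count value
        "mean": mean,                           # average period
        "rms": math.sqrt(variance),             # rms period jitter
        "peak_to_peak": max(periods) - min(periods),
    }

if __name__ == "__main__":
    stats = period_jitter_stats([100, 101, 99, 100], t_delta=1.0)
    print(stats)
```

Because the inputs are plain integers, the same computation can run on a low-cost digital tester, which is the point made in the text.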

## 5.4.2 Relative jitter measurement

In some applications such as serial communications, it is important to ensure that the relative displacement of corresponding edges of two signals, for example, IN1 and IN2, meets a



Figure 5.20: Cycle-to-cycle jitter measurement

given specification. Here, without loss of generality, we assume that the specification sheet requires that the corresponding edges of the signals IN1 and IN2 occur within a tight time window (*e.g.*, 0.001UI, where UI is the unit interval or period duration). Based on this assumption, we propose the circuit in Fig. 5.21 to measure the relative jitter between the edges of IN1 and IN2. In this circuit, $DFF_1$ samples an edge of IN1, and $DFF_2$ samples the edge of IN2 closest to the sampled IN1 edge. The delay element D2 ensures that the setup & hold time of $DFF_1$, *i.e.*, $\tau_{S\&H1}$, is met so that the $DFF_1$ output is set before an IN2 edge arrives. This condition is satisfied if

$$t_{IN2} > t_{IN1} - \tau_{D2} + \tau_{q1} + \tau_{s2}$$

where $\tau_{D2}$ is the delay of the D2 delay element, $t_{IN1}$ and $t_{IN2}$ are the time instants at which the sampled edges of IN1 and IN2 occur, $\tau_{q1}$ is the CLK-to-Q delay of $DFF_1$, and $\tau_{s2}$ is the setup time of $DFF_2$. Fig. 5.21(b) shows the timing diagram of the circuit for one positive and one negative value of $T_J = t_{IN2} - t_{IN1}$. The generated START and STOP signals are passed to the TDC to measure the time displacement. After the completion of a measurement, $DFF_1$ and $DFF_2$ are reset and are ready for the next sample. Note that the CLK-to-Q delays of the flip-flops $DFF_1$ and $DFF_2$ affect the time displacement being measured, since the measured $T_d$ is

$$T_d = T_J + \tau_{D2} + \tau_{q2} - \tau_{q1}$$

where $T_J = t_{IN2} - t_{IN1}$ is the actual time displacement between the IN2 and IN1 edges and $\tau_{q2}$ is the CLK-to-Q delay of $DFF_2$. Since $\tau_{D2}$, $\tau_{q2}$, and $\tau_{q1}$ are constant, they can be accounted for through calibration. In such a measurement scheme, the meaningful data is the variation of measurement values from one sample to the next. Jitter statistics can be obtained from a sufficient number of sample measurements.

In the scheme described, the measurement range is limited to approximately

$$-\tau_{D2} + \tau_{q1} + \tau_{s2} < T_J < T_{TDC} - \tau_{D2} + \tau_{q1} + \tau_{s2}$$

where $T_{TDC}$ is the maximum measurement range of the TDC. $\tau_{D2}$ cannot be more than 0.5UI; otherwise, the displacement between the IN1 edge and a non-adjacent edge of IN2 will be measured. This, however, is generally not a limitation for test purposes because the acceptable edge displacement variation is usually a small fraction of the unit interval (*e.g.*, 0.1UI).
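The two relations of this section, the measurable range of $T_J$ and the constant offset contained in $T_d$, can be checked numerically with a small sketch. The function names and the delay values in the example are hypothetical and serve only to illustrate the arithmetic; all times are in picoseconds.

```python
def displacement_range(tau_d2, tau_q1, tau_s2, t_tdc):
    """Approximate measurable range of T_J = t_IN2 - t_IN1.

    Lower bound: -tau_D2 + tau_q1 + tau_s2
    Upper bound: lower bound + T_TDC
    """
    lower = -tau_d2 + tau_q1 + tau_s2
    return lower, lower + t_tdc

def recover_displacement(t_d, tau_d2, tau_q1, tau_q2):
    """Invert T_d = T_J + tau_D2 + tau_q2 - tau_q1 to obtain T_J."""
    return t_d - (tau_d2 + tau_q2 - tau_q1)

if __name__ == "__main__":
    # Hypothetical delays (ps), chosen only for illustration.
    lo, hi = displacement_range(tau_d2=500, tau_q1=200, tau_s2=100, t_tdc=10000)
    print(lo, hi)
    print(recover_displacement(t_d=560, tau_d2=500, tau_q1=200, tau_q2=210))
```

In practice the constant offset need not be known term by term, since only the sample-to-sample variation of the measured values carries the jitter information.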

The relative jitter measurement scheme can also be used to perform a jitter tolerance limit test on clock recovery units (CRUs). For a jitter tolerance test, a signal with known jitter is applied to the CRU. The TDC measures the relative jitter between the input clock and the recovered clock. This relative jitter cannot exceed a certain threshold for jitter signals



Figure 5.21: Relative jitter measurement

specified in standards such as [20]. Excessive relative jitter indicates the inability of the CRU to meet its rated bit-error rate.

For production test purposes, we suggest testing the jitter tolerance at two different frequency/amplitude points: one frequency inside the loop bandwidth of the CRU and one frequency outside it. For each case, the maximum amplitude given in the standards should be selected. Performing tolerance tests at more points is also possible but requires more test time.

# 5.5 Implementation

The jitter measurement circuit proposed in Sec. 5.4.1 has been designed and implemented using a $0.35\,\mu m$ CMOS technology. The CL cells have been designed as standard cells to allow for automatic place and route. The uniform CL cell style has been used for implementing the resolution adjustment circuits. The rest of the cells used in the implementation have been taken from a standard digital cell library.

The top block level schematic of the jitter measurement circuit is shown in Fig. 5.22. It contains the following blocks:

- 1. TQ: Time Quantizer
- 2. **RE\_TATB:** Range extender and  $T_A > T_B$  condition checker
- 3. Main Counter and the DivBy2 circuit
- 4. **REEOC\_sync\_DFF**, **TQEOC\_sync\_DFF** and **ERR1\_sync\_DFF**: RE\_EOC, TQ\_EOC and ERR1\_flag synchronizer flip-flops

- 5. TATB Check Delay Gen: Generates a small time delay for checking  $T_A > T_B$
- 6. **Delay Generator:** Controls the selection of the delays needed for resolution adjustments, calibration and measurement.

The implementation details of each part of the circuit follow.

1. CL cells

A total of six different CL cells have been designed using style (d) in Fig. 5.11. The transistor sizes for each of these cells are given in Table 5.2. The Delay row of this table lists the additional delay obtained in the oscillator A or B loop by activating the cell. Such a selection of CL cells allows for $\pm 288\,ps$ or $\pm 8.5\%$ period mismatch between clkA and clkB (this is obtained when all the cells are activated). The last row lists the area of each cell. The height of all the cells is the same as the standard cell library height. The area of the smallest cell is equivalent to the area of a double-drive 2-input NAND gate.

| Parameter | $CL_0$ | $CL_1$ | $CL_2$ | $CL_3$ | $CL_4$ | $CL_5$ |
|-----------|--------|--------|--------|--------|--------|--------|
| $M_S \left[ w(\mu m)/l(\mu m) \right]$ | 4/0.35 | 4/0.35 | 4/0.35 | 4/0.35 | 4/0.35 | 4/0.35 |
| $M_L \left[ w(\mu m)/l(\mu m) \right]$ | 3/0.5 | 6/0.5 | 9/0.5 | 13.5/0.5 | 18/0.5 | 27/0.5 |
| Delay (ps) | 11 | 23 | 35 | 50 | 67 | 101 |
| Area (width ($\mu m$) × height ($\mu m$)) | 6.3×21 | 7.9×21 | 9.4×21 | 9.4×21 | 11×21 | 14×21 |

Table 5.2: Specifications of the implemented CL cells
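The range of delay adjustment offered by the six cells can be explored with a short sketch. It assumes, as a simplification, that the delays of activated cells add linearly and independently; the select-bit ordering and function names are hypothetical. Summing the rounded Delay entries of Table 5.2 gives 287 ps, close to the 288 ps quoted in the text.

```python
# Delay row of Table 5.2 (ps), for CL_0 .. CL_5.
CL_DELAYS_PS = [11, 23, 35, 50, 67, 101]

def added_delay_ps(select_bits):
    """Total added loop delay; select_bits[i] = 1 activates cell CL_i (assumed additive)."""
    return sum(d for d, b in zip(CL_DELAYS_PS, select_bits) if b)

def achievable_delays_ps():
    """All distinct delays reachable by activating any subset of the six cells."""
    return sorted({
        added_delay_ps([(code >> i) & 1 for i in range(6)])
        for code in range(64)
    })

if __name__ == "__main__":
    print(added_delay_ps([1, 1, 1, 1, 1, 1]))
    delays = achievable_delays_ps()
    print(len(delays), delays[:8])
```

Listing the achievable values in this way shows how finely the mismatch between the two oscillator periods can be trimmed with a given set of cells.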

#### 2. Time Quantizer (TQ)

Oscillators A and B in the TQ block each consist of 11 NAND gates and one AND gate.

Six taps for each oscillator are connected to six different CL cells. The outputs of oscillators A and B are directly connected to the clk and D inputs of DFF\_EOC. These outputs are buffered before being used in other control blocks which are less time sensitive. The output of DFF\_EOC is sampled and held by another flip-flop to ensure that the end-of-conversion signal, EOC\_Flag, can be observed by the control blocks operating with the system clock.

In addition to the Start and Stop inputs, two other inputs have been reserved for applying the StartCheck and StopCheck signals to oscillators A and B, respectively (see Fig. 5.23). These inputs are used to apply a $T_d$ (as defined in Sec. 5.2.9) for the purpose of checking the condition $T_A > T_B$. Without these inputs, an additional multiplexer would be required. However, when Start and Stop are applied to the TQ, the StartCheck and StopCheck signals must be inactive (HIGH), and vice versa. The main controller block ensures this condition using four control signals: MainSet, rbMain, CheckSet, and rbCheck.

The flip-flop TQEOC\_sync\_DFF has been used to synchronize the TQ\_EOC signal with the system clock, SCLK, in order to avoid sampling errors by the Main\_Controller block.

3. **RE\_TATB:** The implementation of this block closely follows the structure shown in Fig. 5.7(a) and 5.13(a) with k = 6,  $\tau_{A1} = 1.2 ns$  and  $\tau_{A1} = 0.4 ns$ . The flip-flop REEOC\_sync\_DFF synchronizes the RE\_EOC signal with the system clock, SCLK. This prevents errors in sampling by the Main Controller block.

The buffers Buf1 to Buf6 are not needed in the actual implementation, but are necessary to perform mixed-signal simulations. Without these, the signals RE\_EOC1 and ERR1\_flag1 become 'unknown' during simulation due to intentional setup & hold violations in the internal flip-flops of the RE\_TATB block. The propagation of the unknown states prohibits meaningful simulations. To overcome this problem, Buf1, Buf3, Buf4 and Buf6 are simulated as digital cells, and Buf2 and Buf5 are simulated as analog cells. This arrangement effectively translates the RE\_EOC1 and ERR1\_flag1 signals from the digital domain to the analog one and then back to digital. Since the 'unknown' digital state at the input of the analog block is interpreted as a voltage of 0 V, an 'unknown' digital state does not propagate to the rest of the circuits.

4. Main Counter and DivBy2 circuit: A 16-bit counter is used to count the number N. As shown in Fig. 5.6, clkA should drive the counter's clock input. However, the maximum operational frequency of the 16-bit counter is 250 MHz, whereas $f_{clkA} = 350\,MHz$. The divider circuit divides $f_{clkA}$ by two, enabling the counter to count the number of clkA edges. The state of the DivBy2 circuit recovers the bit lost due to the division as follows:

$$N = 2N_{cntr16} - \text{clkDiv2}$$

where  $N_{cntr16}$  is the state of the 16-bit counter and clkDiv2 is the state of the DivBy2 DFF.
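The reconstruction of N from the two stored values is simple enough to state as code. The sketch below only restates the relation above; the function name is hypothetical.

```python
def recover_count(n_cntr16, clk_div2):
    """Reconstruct N = 2 * N_cntr16 - clkDiv2.

    n_cntr16 : state of the 16-bit counter, clocked by clkA divided by two
    clk_div2 : state (0 or 1) of the DivBy2 flip-flop at the end of conversion
    """
    if clk_div2 not in (0, 1):
        raise ValueError("clk_div2 must be 0 or 1")
    if not 0 <= n_cntr16 < 2 ** 16:
        raise ValueError("n_cntr16 must fit in 16 bits")
    return 2 * n_cntr16 - clk_div2

if __name__ == "__main__":
    print(recover_count(50, 0), recover_count(50, 1))
```

With the divider in place the counter is clocked at 175 MHz, below its 250 MHz limit, while the DivBy2 state restores the least significant bit of N.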

- 5. Delay Generator: This block generates Start and Stop edges with $t_{Stop} - t_{Start} = T_{ref}$, $2T_{ref}$, and $(t_{IN2} - t_{IN1})$ for [SelD1, SelD0]=[01], [10], and [11], respectively. When [SelD1, SelD0]=[00], both Start and Stop are set HIGH and the StartCheck and StopCheck signals are activated to check for the condition $T_A > T_B$.
- 6. TATB Check Delay Generator: This block generates a delay of 1.8 nsec between the StartCheck and StopCheck edges in the $T_A > T_B$ check mode. The outputs of this block are set HIGH in other modes.

7. Main Controller: This controller monitors the outputs of all other blocks and generates the signals required for controlling the operation of the TDC. The TDC operation starts by loading a threshold $N_{thre}$ serially. The serial data is read through the SThre input while TestStart is HIGH. Then, the controller controls the Delay Generator while the TATB Check Delay Generator block performs resolution adjustment. After adjustment, calibration is performed and the TDC switches to measurement mode. In this mode, the Main Controller instructs the Delay Generator block to pass jitter samples to the TQ. Upon completion of each measurement, the data is sent off-chip serially through the DataOut output. The InputReady, MeasReady, and DataReady signals are used for handshaking between the external tester and the TDC.

## 5.6 Simulation Results

#### 5.6.1 Jitter measurement circuit

All the individual blocks in the JMC as well as the complete circuit were simulated under a variety of conditions to verify their functionality and performance. The TQ, Edge Sampler, RE, and RA blocks were verified through analog simulation because of timing and loading sensitivity. The controller and counter blocks were simulated as digital blocks. The Spectre and Verilog simulators were used for analog and digital simulations.

The complete circuit was simulated using the SpectreSVerilog mixed-signal simulator. For this simulation, the TQ block is considered as an analog block while the rest of the circuits are treated as digital cells. Analog high-level description language (ahdl) code was written to perform measurements during simulations and to emulate the external tester which handshakes with the JMC.

To test the full capability of the circuit, oscillator A was loaded with an additional capacitive load to model a mismatch of 35 ps between $T_A$ and $T_B$. In addition, $T_{ref} = 6\,ns$ and $N_{th} = 272$. Therefore, the required resolution is $T_{th} = T_{ref}/N_{th} = 22\,ps$. The waveforms in Fig. 5.24 show how the TDC successfully adjusts its resolution by controlling the $Bc[0:5] = \vec{b}$ taps to achieve a resolution better than $T_{th}$. In this case, a resolution of approximately 11 ps is achieved. The resolution adjustment took approximately 16.4 $\mu s$. The result of the last step in the resolution adjustment is also used for calibration.
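The threshold used in this simulation follows directly from the two programmed values. The sketch below reproduces the arithmetic and the pass condition for the adjusted resolution; the function names are hypothetical and all times are in picoseconds.

```python
def required_resolution_ps(t_ref_ps, n_th):
    """Threshold resolution T_th = T_ref / N_th, in picoseconds."""
    return t_ref_ps / n_th

def resolution_is_adequate(achieved_ps, t_ref_ps, n_th):
    """True when the adjusted resolution is finer than the threshold T_th."""
    return achieved_ps < required_resolution_ps(t_ref_ps, n_th)

if __name__ == "__main__":
    t_th = required_resolution_ps(6000.0, 272)   # T_ref = 6 ns, N_th = 272
    print(f"T_th = {t_th:.1f} ps")
    print(resolution_is_adequate(11.0, 6000.0, 272))
```

For the values quoted in the text, the achieved resolution of about 11 ps satisfies the threshold with a margin of roughly a factor of two.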

The rippling seen on waveforms  $T_A$  and  $T_B$  in Fig. 5.24 represents simulation artifacts caused by numerical coupling of the two oscillators A and B in the simulator. Tightening the accuracy parameters of the simulator mitigates the ripples, but increases the simulation time significantly.

A resolution of 34.1 ps was achieved in a separate simulation. In this case, a number of time intervals from  $T_d = 1 ns$  to 11 ns with a step of 200 ps were measured. Fig. 5.25 shows the difference between the simulated measured intervals and the expected values. The measurement rms error is 12.1 ps, which matches the rms quantization error estimated in Eqn. 5.40.

### 5.6.2 Accuracy estimation

In this section, the accuracy of the TDC is estimated assuming a typical condition on the chip based on the analysis in Secs. 5.2.10 and 5.2.6. These calculations assume that

- 1. power supply noise ($V_{dd(e)}$) is a high-frequency signal with a lower bandwidth of 50 MHz and rms value of 50 mV;
- 2. the thermal noise jitter for a NAND gate delay is approximately 25 fsec [106] ( $\sigma_g = 25$  fsec);
- 3. the thermal noise jitter in a flip-flop delay is negligible compared to its  $V_{dd}$ -induced noise;
- 4. the resolution  $(T_{\Delta})$  is 10 ps;
- 5.  $T_d = 1 ns;$
- 6. the constant offset  $T_C = 0$ .

To estimate the $V_{dd}$-induced noise, a number of sinusoidal noise components, $V_{dd(e)}$, with frequencies between 50 MHz and 2 GHz and eight phases between 0 and $2\pi$ have been selected, and the resulting jitter in the ring oscillators and flip-flops is simulated for each selection. The rms value of each jitter term is estimated by obtaining the rms value of the simulated jitters of that term for all the selections of $V_{dd(e)}$.

Based on the above assumptions, different jitter terms in Sec. 5.2.10 are estimated as follows:

1.  $\tau_{ST(e)}$  and  $\tau_{SP(e)}$ : The rms value of the sampling flip-flop clk-to-Q jitter is:

$$\sigma_{ST(e)} = \sigma_{SP(e)} = 3ps \tag{5.41}$$

2. $\tau_{EOC(e)}$: The rms value of the DFF\_EOC flip-flop jitter is:

$$\sigma_{EOC(e)} = 1.3ps \tag{5.42}$$

3. Oscillators A and B: From Eqn. 5.24, for M = 12, $N = T_d/T_{\Delta} = 100$, and $\sigma_g = 25$ fsec, the rms error due to thermal noise is $\sigma_{R1} = 1.73\,ps$.

Eqn. 5.25 is used to evaluate the $V_{dd}$-induced jitter term. The required $\kappa$ and $\gamma$ values are extracted from Table G.1.

$$\sigma_{PS} = 5.5 \, ps \tag{5.43}$$

Assuming different jitter terms are independent, the resulting inaccuracy is:

$$\sigma_R = \sqrt{\sigma_{EOC}^2 + \sigma_{ST}^2 + \sigma_{SP}^2 + \sigma_{PS}^2 + \sigma_{R1}^2} = 7.3 \, ps$$

Therefore, total rms error in differential measurement mode is:

$$\sigma_d = \sqrt{2\sigma_R^2 + T_\Delta^2/6} = 11 \, ps$$
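The error budget above can be verified with a few lines of arithmetic. The sketch below combines the five rms terms as a root-sum-square, as the independence assumption implies, and then applies the differential-mode expression with $T_{\Delta} = 10\,ps$; the helper name is hypothetical and all values are in picoseconds.

```python
import math

def root_sum_square(*terms):
    """Combine independent rms error terms."""
    return math.sqrt(sum(t * t for t in terms))

# rms jitter terms estimated in this section (ps)
SIGMA_EOC = 1.3    # DFF_EOC flip-flop
SIGMA_ST = 3.0     # Start sampling flip-flop
SIGMA_SP = 3.0     # Stop sampling flip-flop
SIGMA_PS = 5.5     # Vdd-induced oscillator jitter
SIGMA_R1 = 1.73    # thermal-noise oscillator jitter
T_DELTA = 10.0     # resolution

sigma_r = root_sum_square(SIGMA_EOC, SIGMA_ST, SIGMA_SP, SIGMA_PS, SIGMA_R1)
sigma_d = math.sqrt(2 * sigma_r ** 2 + T_DELTA ** 2 / 6)

if __name__ == "__main__":
    print(f"sigma_R = {sigma_r:.2f} ps, sigma_d = {sigma_d:.2f} ps")
```

The power-supply term dominates the budget, so reducing $\sigma_{PS}$ would have the largest effect on the overall accuracy.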

# 5.7 Conclusions

We have developed a high-resolution jitter measurement circuit and jitter generator block. All the circuits are digital and fit well in a digital ASIC design flow. The total area of the circuits in a $0.35\,\mu m$ technology is $450\,\mu m \times 500\,\mu m$, which is equivalent to 1200 double-drive 2-input NAND gates. The Main Controller was written as synthesizable VHDL code and the rest of the circuits were described at the schematic level. Automatic place & route was performed for all the circuits. Exhaustive simulations and analysis show that the jitter measurement circuit is capable of jitter measurement with resolution and accuracy on the order of 10 ps. The digital and compact nature of this TDC circuit makes it very attractive for BIST applications for testing high-speed serial communication interfaces, *e.g.*, clock and data recovery, timing circuits, and edge placement circuits. Since the TDC provides a very high-resolution time measurement capability, it is also suitable for use in the design of all-digital clock recovery and clock synthesis circuits.



Figure 5.22: Top block-level schematic of the jitter measurement circuit



Figure 5.23: Implemented TQ circuit



Figure 5.24: TDC resolution adjustment simulation waveforms





# Chapter 6

# **Summary and Conclusions**

The IC industry is undergoing a constant evolution. This has major implications on test. Notably, two important requirements emerge. These are

- 1. the requirement for block re-use; and,
- 2. the need for cost-effective high-performance test capability.

From the above, embedded test provides an attractive solution approach provided certain features are achieved. Such features include

- 1. compactness: *i.e.*, small area compared to the circuit under test (CUT);
- 2. design simplicity and robustness: *i.e.*, resistant to process variations, temperature and power supply variations;
- 3. digital output generation: *i.e.*, generate one or more digital signatures which can be sent off-chip at relatively low speed, *e.g.*, serially;
- 4. accuracy: measurement accuracy must be sufficient for the test;

- 5. calibration: *i.e.*, calibration-free, self-calibrating, or using signals readily available to the chip for calibration;
- 6. performance impact: *i.e.*, the impact on the CUT performance must be minimal.

Meeting all the requirements above is a challenging task, especially for functional BIST or embedded test of high-performance, high-speed circuits which require high resolution and high accuracy.

At the outset of this research only a few solutions with limited applicability were known to exist. In this research, two different embedded test methodologies for mixed-signal circuits were developed: (*i*) on-chip power supply current ($I_{DD}$) monitoring, and (*ii*) on-chip jitter testing. One important demonstration of this research is that designing feasible embedded test methods for testing high-performance mixed-signal circuits is possible. This demonstration counters the widely held opinion to the contrary. A reason for such capability is that the test circuits are often not required to test the full functionality of the CUT, which relaxes the requirements on the test circuits.

In regards to current monitoring, the main challenge in devising an effective on-chip $I_{DD}$ test scheme is designing a built-in current monitor (BICM) that has minimal impact on the analog CUT performance while maintaining good measurement sensitivity, and that also generates an on-chip signature in a relatively small silicon area. We met this challenge by designing a novel BICM structure. The BICM has two major parts: a current mirror-based built-in current sensor (BICS) and a single-phase built-in current integrator (BICI). The BICS structure had been reported previously, but here a novel, thorough performance analysis and actual chip fabrication and characterization were performed. Results confirm that the BICS impedance can be decreased to acceptable levels ($\sim 3\,\Omega$) while providing high sensitivity and accuracy by using only a single-stage feedback amplifier. Also, we found that in submicron CMOS technologies, even small routing resistances can cause significant error in current mirror operation. Our investigation shows that a 2% accuracy is achievable if special attention is paid to routing the current mirror connections. Moreover, the specific BICS impedance has a bandwidth of 5 MHz and the current mirror has a bandwidth of 120 MHz. These characteristics are sufficient for many practical purposes. The BICS is a relatively simple design that requires matching only two transistors in a current mirror.

With respect to the BICI, we designed a new integrator structure to meet the challenge of reducing the size of an integrator circuit with a long time constant. This structure employs a novel technique of breaking the long time constants in analog domain to shorter ones and transforming the remainder of the operations to digital domain. This results in a drastic saving in silicon area while providing the additional benefit of yielding a digital signature at the end. The implementation of this structure is simple because it contains only switches, capacitors, a comparator, and a small digital circuit. Also, a two-point calibration makes the circuit robust against process variations, temperature, and power supply variations. However, an off-chip or on-chip current source is needed for this calibration. The specific implementation of BICI yields an accuracy of 2% if the  $I_{DD}$  is not correlated with the integration control signal. The BICM can be used to test different analog blocks where  $I_{DD}$  contains AC components. Typically, this test is not conclusive but it can reduce test cost by weeding out many faulty devices earlier in the test.

The novel technique of 'quantization residue feed forward' was invented to achieve more integration accuracy for arbitrary  $I_{DD}$  waveforms. In this technique, instead of discarding quantization residues at each digitization cycle, they are stored and used in subsequent cycles. Employing this technique in combination with using two complementary integrators in parallel led to the design of a double-phase BICI. This structure is about 20% larger than the single-phase BICI, but it is accurate to within 1% regardless of the  $I_{DD}$  waveform. This BICI extends the use of BICM for  $I_{DD}$  testing to a larger number of analog circuits in comparison with single-phase BICI because of its higher accuracy which is independent of  $I_{DD}$  waveform.

In regards to jitter testing, our novel approach is the first circuit which can perform single-shot jitter measurement with accuracy and resolution in the 10 ps range, while satisfying all the requirements for a practical embedded test scheme. Prior to this research, no such circuit had been reported in the open literature. The circuit is composed of two parts: a jitter generator and a jitter measurement block.

The jitter generator occupies an area equivalent to 200 2-input NAND gates. It accepts a jitter-free signal and digitally modulates it to generate an output signal with controlled jitter. The amplitude and frequency of the jitter signal are programmable.

The jitter measurement circuit (JMC) uses a differential ring-oscillator technique to achieve high resolution. The resolution of this circuit is programmable; the minimum guaranteed resolution depends on process variations. A very important side product of the differential nature of the circuit is robustness against power supply noise, which results in a significant accuracy improvement. Since parts of the circuit are asynchronous, a new technique, time-diversity sampling, was developed to ensure valid sampling and correct operation. JMC calibration is practical and simple because the same signal which is used for clock synthesis is also used for calibration. A JMC prototype has been designed in a standard 0.35 $\mu m$ CMOS technology. Simulation and analysis predict a measurement accuracy of about 11 ps rms and a resolution of 10 ps for this implementation. The total area of the circuit is equivalent to approximately 1200 2-input NAND gates; less than 10% of this area is occupied by time-sensitive circuitry. The time-to-digital converter (TDC) circuit at the core of the JMC can measure a 1 ns time interval in about 400 ns. Therefore, the approximate test time for measuring 1000 jitter samples on a 155.52 MHz PLL (SONET OC-3 standard), including resolution adjustment and calibration time, is about 0.5 ms. This number of samples is sufficient for histogram-based testing.

The predicted resolution and accuracy of the JMC are sufficient for testing SONET OC-3 (155 MHz) and even OC-12 (622 MHz) signals because the measurement resolutions required for testing these signals are approximately 64 ps and 16 ps, respectively. Because of the single-shot measurement capability of the JMC, it can be used to analyze the jitter frequency components, thereby enabling frequency-dependent jitter standard compliance tests. This is an important additional advantage over previously reported JMCs.
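The test-time figure quoted above can be reproduced with a rough estimate. The sketch below assumes that each jitter sample costs the 400 ns conversion time quoted for a 1 ns interval and that the one-time setup equals the 16.4 $\mu s$ resolution adjustment reported in Sec. 5.6.1; both are simplifying assumptions, and the function name is hypothetical.

```python
def estimated_test_time_s(num_samples, time_per_sample_s, setup_time_s):
    """One-time setup (resolution adjustment and calibration) plus per-sample conversion time."""
    return setup_time_s + num_samples * time_per_sample_s

if __name__ == "__main__":
    t = estimated_test_time_s(
        num_samples=1000,
        time_per_sample_s=400e-9,   # assumed conversion time per sample
        setup_time_s=16.4e-6,       # assumed setup time, from the simulated adjustment
    )
    print(f"{t * 1e3:.2f} ms")
```

Under these assumptions the estimate is a little over 0.4 ms, consistent with the figure of about 0.5 ms given in the text.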

#### 6.1 Future Research

This research demonstrated the feasibility of designing high performance and robust embedded test circuits. More embedded test circuits are needed for testing future high-performance mixed-signal circuits such as high-speed ADCs, DACs, and analog equalizers.

Calibration is a very important aspect of any measurement circuit. The BICM and BICI in this work require accurate current sources for calibration. Some ICs may have such sources available on the chip. However, for the more general cases, new self-calibration techniques will be required, or calibration should use typically accurately-controlled power supply voltage and/or clock signals.

The 11 ps (rms) precision in the jitter test circuit is sufficient for testing 155.52 MHz and possibly 622.2 MHz clock recovery and synthesis circuits. However, new techniques capable of reducing the noise in the jitter measurement circuit would have to be devised to extend the ability of this circuit to testing 1.2 GHz circuits and beyond. Some proposals include using faster ring oscillators, using clean (low-jitter) clock signals in measurement and not just in calibration, and extending the differential structure to other parts of the circuits. Also, the use of CL cells requires the ability to add cells to digital cell libraries. This is increasingly difficult for many companies that use libraries supplied by vendors over which they have little control. To circumvent this problem, new fully digital delay control techniques with very fine delay control steps need to be developed. One proposal is to use loaded gates and multiplexers.

Another research direction is to use the TDC in the jitter measurement circuit to design all-digital, high-speed, low-jitter PLLs. In such applications, the TDC would replace the phase detector and loop filter. With proper design of the TDC and control circuitry, this could open the gateway for a new generation of high-speed all-digital timing circuits.

#### **Bibliography**

- [1] L. Geppert, "The 100-million transistor IC," *IEEE Spectrum*, vol. 36, pp. 22–24, July 1999.
- [2] Y. Zorian, "Testing the monster chip," *IEEE Spectrum*, vol. 36, pp. 54–60, July 1999.
- [3] J. M. Soden, C. F. Hawkins, R. K. Gulati, and W. Mao, "$I_{DDQ}$ testing: A review," *Journal of Electronic Testing: Theory and Applications*, vol. 3, pp. 5–15, December 1992.
- [4] S. McEuen, "IDDq benefits," *Proc. of IEEE VLSI Test Symposium*, pp. 285–290, 1991.
- [5] A. Gattiker and W. Maly, "Current signatures: Application," *Proc. of Int. Test Conf.*, pp. 156–165, 1997.
- [6] K. R. Eckersall, P. L. Wrighton, I. M. Bell, B. R. Bannister, and G. E. Taylor, "Testing mixed signal ASICs through the use of supply current monitoring," *Proc. of European Test Conf.*, pp. 385–391, 1993.

- [7] M. Robson and G. Russell, "Current monitoring technique for testing embedded analogue functions in mixed-signal ICs," *Electronics Letters*, vol. 32, pp. 796–798, April 1996.
- [8] M. Dalmia, S. Tabatabaei, and A. Ivanov, "Power supply current monitoring techniques for testing PLLs," *Proc. of Asian Test Symposium*, pp. 366–371, 1997.
- [9] Z. Wang, G. Gielen, and W. Sansen, "Testing of analog integrated circuits based on power-supply current monitoring and discrimination analysis," *Proc. of Asian Test Symposium*, pp. 126–131, 1994.
- [10] K. Arabi and B. Kaminska, "Design and realization of an accurate built-in current sensor for on-line power dissipation measurement and $I_{DDQ}$ testing," *Proc. Int. Test Conf.*, pp. 578–586, 1997.
- [11] T.-L. Shen, J. C. Daly, and J.-C. Lo, "A 2-ns time, 2-μm CMOS built-in current sensing circuit," *IEEE Journal of Solid-State Circuits*, vol. 28, pp. 72–77, January 1993.
- [12] R. Burgess, "Regenerative $V_{dd}$ monitored differential CMOS $I_{ddq}$ sensor," *Electronics Letters*, vol. 31, pp. 1660–1661, September 1995.
- [13] A. Rubio, E. Janssens, H. Casier, and J. Figueras, "A built-in quiescent current monitor for CMOS VLSI circuits," *Proc. of European Design and Test Conf.*, pp. 581–585, 1995.
- [14] E. Lupon, G. Gorriz, C. Martinez, and J. Figueras, "Compact BIC sensor for $I_{ddq}$ testing of CMOS circuits," *Electronics Letters*, vol. 29, pp. 772–774, April 1993.

- [15] Y. Maidon, Y. Deval, P. Fouillat, J. Tomas, and J. P. Dom, "On chip $i_{DDx}$ sensor," *Proc. of Int. Workshop on IDDQ Testing*, pp. 64–67, 1996.
- [16] S. M. Menon and Y. K. Malaiya, "Limitations of built-in current sensors (BICs) for $I_{DDQ}$ testing," *Proc. of Asian Test Symposium*, pp. 243–248, 1993.
- [17] S. M. Menon, Y. K. Malaiya, A. P. Jayasumana, and C. Q. Tong, "The effect of built-in current sensors (BICs) on operational and test performance," *Proc. of 7th Int. Conf. on VLSI Design*, pp. 187–180, 1994.
- [18] J. Ramírez-Angulo, "Low voltage current mirrors for built-in current sensors," *Proc. of Int. Symposium on Circuits and Systems*, pp. 529–532, 1994.
- [19] M. A. Ortega, J. Rius, and J. Figueras, "Test of CMOS circuits based on its energy consumption," *Proc. of Int. Workshop on $I_{DDQ}$ Testing*, pp. 36–40, 1996.
- [20] Bell Research Laboratories, "SONET transport systems: Common criteria network element architectural features," *GR-253-core, Issue 1*, pp. 5–81, December 1994.
- [21] W. Dalal and D. Rosenthal, "Measuring jitter of high speed data channels using undersampling technique," *Proc. of Int. Test Conf.*, pp. 814–818, 1998.
- [22] R. J. A. Harvey, E. M. J. G. Bruls, A. M. D. Richardson, and K. Baker, "Test evaluation for complex mixed-signal IC's by introducing layout dependent faults," *IEE Colloquium on Mixed Signal VLSI Test*, pp. 6/1–8, 1993.
- [23] L. Bonet, J. Ganger, C. Greaves, M. Pendleton, and D. Yatim, "Test features of the MC145472 ISDN U-transceiver," *Proc. of Int. Test Conf.*, pp. 68–79, 1990.

- [24] G. Devarayanadurg, P. Goteti, and M. Soma, "Hierarchy based statistical fault simulation of mixed-signal ICs," *Int. Test Conf.*, pp. 521–527, 1996.
- [25] P. Goteti, G. Devarayanadurg, and M. Soma, "DFT for embedded charge-pump PLL systems incorporating IEEE 1149.1," Proc. of Custom Integrated Circuits Conference, pp. 210–213, 1997.
- [26] F. Azias, R. Renovell, Y. Bertrand, A. Ivanov, and S. Tabatabaei, "A unified digital test technique for PLLs using re-configurable VCO," *Proc. of Int. Mixed-Signal Test Workshop*, 1999.
- [27] R. Kelkar, I. Novof, and S. D. Wyatt, "Integrated circuit chip having built-in self measurement for PLL jitter and phase error," US Patent #5663991 assigned to IBM Corp., Sept. 1997.
- [28] L. D. Smith and Norman E., "On-chip PLL phase and jitter self-test circuit," US Patent #5889435 assigned to Sun Microsystems Corp., March 1999.
- [29] K. Dalmia, A. Ivanov, B. Gerson, and C. Lapadat, "Built-in test scheme for a jitter tolerance test of a clock and data recovery unit," US Patent #5835501 assigned to PMC-Sierra Ltd., Nov. 1998.
- [30] S. Sunter and A. Roy, "BIST for phase-locked loops in digital applications," Proc. of Int. Test Conf., pp. 532–540, 1999.
- [31] B. Veillette and G. Roberts, "On-chip measurement of the jitter transfer function of charge-pump phase-locked loops," *Int. Test Conf.*, pp. 776–785, 1997.

- [32] B. Villette and G. Roberts, "Stimulus generation for built-in self-test of charge-pump phase-locked loops," *Int. Test Conf.*, pp. 698–707, 1998.
- [33] S. Blazo, "Jitter testing for high-speed telecommunications," *Evaluation Engineer*ing, vol. 31, pp. 76/80–84, April 1992.
- [34] D. Chu, "Phase digitizing sharpens timing measurements," *IEEE Spectrum*, vol. 25, pp. 28–32, July 1988.
- [35] Y. Arai, "A time digitizer CMOS gate-array with a 250 ps time resolution," *IEEE Journal of Solid-State Circuits*, vol. 31, pp. 212–220, February 1996.
- [36] J. Christiansen, "An integrated high resolution CMOS timing generator based on an array of delay locked loops," *IEEE Journal of Solid-State Circuits*, vol. 31, pp. 952–957, July 1996.
- [37] J. Kalisz, R. Szplet, J. Pasierbinski, and A. Poniecki, "Field-programmable-gate-array-based time-to-digital converter with 200-ps resolution," *IEEE Trans. on Instrumentation and Measurement*, vol. 46, pp. 51–55, February 1997.
- [38] P. Chen, S. Liu, and J. Wu, "A low power high accuracy CMOS time-to-digital converter," Proc. of Int. Symp. on Circuits and Systems, pp. 281–284, 1997.
- [39] M. B. Anderson and P. A. Atkinson, "BIST jitter tolerance measurement technique," US Patent #5793822 assigned to Symbios Ltd., August 1998.
- [40] E. R. Hnatek, Integrated Circuit Quality and Reliability. Marcel Dekker, Inc., 1995.

- [41] L. Milor and A. L. Sangiovanni-Vincentelli, "Minimizing production test time to detect faults in analog circuits," *IEEE Transaction On Computer-Aided Design of Integrated Circuits And Systems*, vol. 13, pp. 796–813, June 1994.
- [42] J. M. Soden and R. E. Anderson, "IC failure analysis: Techniques and tools for quality and reliability improvement," *Proceedings of the IEEE*, vol. 81, pp. 703–715, May 1993.
- [43] M. Abramovici, M. A. Breuer, and A. D. Friedman, *Digital Systems Testing and Testable Design*. Computer Science Press, 1990.
- [44] W. M. Lindermeir, H. E. Graeb, and K. J. Antreich, "Design based analog testing by characteristics observation inference," *Proc. of IEEE Int. Conf. on CAD*, pp. 620–626, 1995.
- [45] T. Kirkland and M. R. Mercer, "ATPG tutorial: Generating a test set," *IEEE Design & Test of Computers*, pp. 44–55, June 1988.
- [46] N. Ben Hamida and B. Kaminska, "Multiple fault analog circuit testing by sensitivity analysis," *Journal of Electronic Testing: Theory and Applications*, vol. 4, pp. 331–343, November 1993.
- [47] A. Ambler, M. B. Bassat, and L. Ungar, "Economics of diagnosis," Proc. of AutoScan, pp. 435–445, 1997.
- [48] S. Chandra, K. Pierce, G. Srinath, H. R. Sucar, and V. Kulkarni, "CrossCheck: An innovative testability solution," *IEEE Design & Test of Computers*, vol. 10, pp. 56–67, June 1993.

- [49] H. Hao and E. J. McCluskey, "On the modeling and testing of gate oxide shorts in logic gates," Proc. of Int. Workshop on Defect and Fault Tolerance on VLSI Systems, pp. 161–174, 1991.
- [50] J. M. Soden and C. F. Hawkins, "Electrical properties and detection methods for CMOS IC defects," Proc. of European Test Conf., pp. 159–167, 1989.
- [51] R. Liu, Testing and Diagnosis of Analog Circuits and Systems. Van Nostrand Reinhold, 1995.
- [52] R. S. Berkowitz, "Conditions for network-element-value solvability," IRE Transaction on Circuit Theory, vol. CT-9, pp. 19–24, 1962.
- [53] T. N. Trick, W. Mayeda, and A. A. Sakla, "Calculation of parameter values from node voltage measurement," *IEEE Transaction on Circuits and Systems*, vol. CAS-26, pp. 466–474, July 1979.
- [54] N. Navid and A. N. Wilson, "A theory and an algorithm for analog circuit fault diagnosis," *IEEE Transaction on Circuits and Systems*, vol. CAS-26, pp. 440–456, July 1979.
- [55] N. Sen and R. Saeks, "Fault diagnosis for linear systems via multi-frequency measurements," *IEEE Transactions on Circuits and Systems*, vol. 26, pp. 457–465, July 1979.
- [56] A. Walker, W. E. Alexander, and P. K. Lala, "Fault diagnosis in analog circuits using element modulation," *IEEE Design and Test of Computers*, pp. 19–31, March 1992.

- [57] R. Spina and S. Upadhyay, "Fault diagnosis of analog circuits using artificial neural networks as signature analysis," *Proc. of ASIC Conf. and Exhibit*, pp. 355–358, 1992.
- [58] S. Freeman, "Optimum fault isolation by statistical inference," *IEEE Transaction on Circuits and Systems*, vol. CAS-26, pp. 505–512, July 1979.
- [59] H. H. Schreiber, "Fault dictionary based upon stimulus design," *IEEE Transactions on Circuit and Systems*, vol. CAS-26, pp. 529–536, July 1979.
- [60] R. Voorakaranam, S. Chakrabarti, J. Hou, A. Gomes, S. Cherubal, and A. Chatterjee, "Hierarchical specification-driven analog fault modeling for efficient fault simulation and diagnosis," *Int. Test Conf.*, pp. 903–912, 1997.
- [61] N. Nagi, A Comprehensive Test Framework for Analog and Mixed-signal Circuits. PhD thesis, The University of Texas at Austin, December 1994.
- [62] R.J.A. Harvey, A.M.D. Richardson, and H.G. Kerhoff, "Defect oriented test development based on inductive fault analysis," *Proc. of Int. Mixed Signal Testing Workshop*, pp. 2–9, June 1995.
- [63] S. J. Spinks and I. M. Bell, "Analogue fault simulation," IEEE Colloquium on Mixed Mode Modeling and Simulation, pp. 911–915, November 1994.
- [64] M. Soma, "Fault coverage of DC parametric tests for embedded analog amplifiers," *Proc. of Int. Test Conf.*, pp. 566–573, 1993.
- [65] A. J. Bishop and A. Ivanov, "On the testability of CMOS feedback amplifiers," *Proc. of Workshop on Defect and Fault Tolerance in VLSI Systems*, pp. 65–73, 1994.

- [66] A. J. Bishop and A. Ivanov, "Fault simulation and testing of an OTA biquadratic filter," Proc. of Int. Symp. on Circuit and Systems, pp. 1764–1767, 1995.
- [67] F. J. Ferguson and J. P. Shen, "A CMOS fault extractor for inductive fault analysis," *IEEE Transactions on Computer-Aided Design*, vol. 7, pp. 1181–1194, November 1988.
- [68] A. Jee and F. J. Ferguson, "Carafe: An inductive fault analysis tool for CMOS VLSI circuits," Proc. of VLSI Test Symposium, pp. 92–98, 1993.
- [69] R. Valenzco and B. Martinet, "Physical fault injection: A suitable method for the evaluation of functional test efficiency," Proc. of Int. Workshop on Defect and Fault Tolerance in VLSI Systems, pp. 179–182, 1991.
- [70] M. Sachdev, "A realistic defect oriented testability methodology for analog circuits," Journal of Electronic Testing: Theory and Applications, pp. 265–276, 1995.
- [71] Y. Xing, "Defect-oriented testing of mixed-signal ICs: Some industrial experience," *Proc. of Int. Test Conf.*, pp. 678–687, 1998.
- [72] M. Sachdev and B. Atzema, "Industrial relevance of analog IFA: A fact or a fiction," *Proc. of Int. Test Conf.*, pp. 61–70, 1995.
- [73] S. Sunter, "A low cost 100 MHz analog test bus," *Proc. of VLSI Test Symp.*, pp. 60–65, 1995.
- [74] M. Soma, "A design-for-test methodology for active analog filters," *Proc. of Int. Test Conf.*, pp. 183–192, 1990.

- [75] M. Slamani and B. Kaminska, "T-BIST a built-in self-test for analog circuits based on parameter translation," *Proc. of Asian Test Symposium*, pp. 172–177, 1993.
- [76] C.-L. Wey, "Built-in self-test BIST structure for analog circuit fault diagnosis," *IEEE Transactions on Instrumentation and Measurement*, vol. 39, pp. 517–521, June 1990.
- [77] L. T. Wurtz, "Built-in self-test structure for mixed-mode circuits," *IEEE Transactions on Instrumentation and Measurement*, vol. 42, pp. 25–29, February 1993.
- [78] IEEE Standards, 1149.4-1999 Standard for Mixed-Signal Test Bus. IEEE Customer Services, 1999.
- [79] B. Vinnakota, Analog and Mixed-Signal Test. Prentice Hall, Inc., 1998.
- [80] K. Lofstrom, "A demonstration IC for the P1149.4 mixed-signal test standard," Proc. of Int. Test Conf., pp. 92–98, 1992.
- [81] A. K. Lu, G. W. Roberts, and D. A. Johns, "A high-quality analog oscillator using oversampling D/A conversion techniques," *IEEE Transaction on Circuits and Systems-II: Analog and Digital Signal Processing*, vol. 41, pp. 437–444, July 1994.
- [82] A. K. Lu and G. W. Roberts, "An analog multi-tone signal generator for built-in-self-test applications," *Proc. of Int. Test Conf.*, pp. 650–659, 1994.
- [83] M. Toner and G. W. Roberts, "A BIST scheme for a SNR gain tracking, and frequency response test of a sigma-delta ADC," *IEEE Trans. on Circuits and System-II: Analog and Digital Signal Processing*, vol. 42, pp. 1–15, January 1995.
- [84] B. Veillette and G. W. Roberts, "A built-in self-test strategy for wireless communication systems," *Proc. of Int. Test Conf.*, pp. 930–939, 1995.

- [85] A. Hatzopoulos, S. Siskos, and J. M. Kontoleon, "A complete scheme of built-in self-test (BIST) structure for fault diagnosis in analog circuits and systems," *IEEE Trans. on Instrumentation and Measurement*, vol. 42, pp. 689–694, June 1993.
- [86] K. Arabi and B. Kaminska, "Oscillation built-in test (OBIST) scheme for functional and structural testing of analog and mixed-signal integrated circuits," *Proc. of Int. Test Conf.*, pp. 786–795, 1997.
- [87] K. Arabi and B. Kaminska, "Efficient and accurate testing of analog-to-digital converters using oscillation-test method," *Proc. of European Design and Test Conference. (ED & TC)*, pp. 348–352, 1997.
- [88] K. Arabi and B. Kaminska, "Oscillation-test strategy for analog and mixed-signal integrated circuits," *Proc. of VLSI Test Symp.*, pp. 476–482, 1996.
- [89] K. Arabi and B. Kaminska, "Design for testability of embedded integrated operational amplifiers," *IEEE Journal of Solid-State Circuits*, vol. 33, pp. 573–581, April 1998.
- [90] K. Arabi, H. Ihs, C. Dufaza, and B. Kaminska, "Dynamic digital integrated circuit testing using oscillation-test method," *Electronics Letters*, vol. 34, pp. 762–764, April 1998.
- [91] K. Arabi, B. Kaminska, and J. Rzeszut, "BIST for D/A and A/D converters," *IEEE Design and Test of Computers*, pp. 40–49, Winter 1996.

- [92] S. Tabatabaei and A. Ivanov, "A built-in current sensor for testing analog circuit blocks," Proc. of Instrumentation and Measurement Technology Conf., pp. 1403–1408, 1999.
- [93] S. Tabatabaei and A. Ivanov, "A current integrator for BIST of mixed-signal ICs," Proc. of VLSI Test Symposium, pp. 311–318, 1999.
- [94] C. Toumazou, F. J. Lidgey, and D. G. Haigh, *Analogue IC Design: the current-mode approach*. IEE Circuits and System Series 2, 1990.
- [95] K. M. Ware, H. Lee, and C. G. Sodini, "A 200-MHz CMOS phase-locked loop with dual phase detectors," *IEEE Journal of Solid-State Circuits*, vol. 24, pp. 1560–1568, December 1989.
- [96] S. Milam and P. E. Allen, "Accurate two-transistor current mirrors," Proc. of 37th Midwest Symposium on Circuits & Systems, pp. 151–154, 1994.
- [97] R. Gregorian and G. C. Temes, Analog MOS integrated circuits for signal processing. John Wiley & Sons, Inc., 1986.
- [98] P. R. Gray and R. G. Meyer, Analysis and Design of Analog Integrated Circuits. John Wiley & Sons, Inc., 1993.
- [99] The MathWorks, Inc., MATLAB 4.2c Symbolic Toolbox. The MathWorks, Inc., 1994.
- [100] R. J. Baker, H. W. Li, and D. E. Boyce, CMOS Circuit Design, Layout, and Simulation. IEEE Press, 1998.
- [101] Cadence Design Systems, Spectre User Manual, 4.4.1. Cadence Design Systems, February 1997.

- [102] D. M. Santos, "A CMOS delay locked and sub-nanosecond time-to-digital converter chip," *IEEE Trans. on Nuclear Science*, vol. 43, pp. 1717–1719, June 1996.
- [103] M. Mota and J. Christiansen, "A high-resolution time interpolator based on a delay locked loop and an RC delay line," *IEEE Journal of Solid-State Circuits*, vol. 34, pp. 1360–1366, October 1999.
- [104] H. W. Johnson and M. Graham, High-Speed Digital Design: A Handbook of Black Magic. Prentice-Hall, Inc., 1993.
- [105] P. Andreani, F. Bigongiari, R. Roncella, R. Saletti, and P. Terreni, "A digitally controlled shunt capacitor CMOS delay line," *Analog Integrated Circuits and Signal Processing, An International Journal*, vol. 18, pp. 89–95, January 1999.
- [106] A. Hajimiri and T. H. Lee, *The Design of Low Noise Oscillators*. Kluwer Academic Publishers, 1999.
- [107] L. D. Jones and A. F. Chin, *Electronic Instruments and Measurements*. Prentice-Hall Inc., 1991.
- [108] Sheldon Ross, A First Course in Probability. Macmillan Publishing Co., 1976.
- [109] M. F. Toner and G. W. Roberts, "A BIST scheme for an SNR test of a sigma-delta ADC," Proc. of Int. Test Conf., pp. 805–814, 1993.

### **Appendix A**

# **BICS Frequency Response Analysis**

This appendix shows how the symbolic toolbox of MATLAB [99] has been used to determine the frequency response of the BICS circuit in Fig. 3.3.

First, the following steps are performed:

- 1. The nodal equations of the circuit ac model are written in terms of resistances, transconductances and capacitances.
- 2. The nodal equations are solved to obtain

$$Z_{BIC}(s) = \frac{V_{BIC}(s)}{I_{DD}(s)} = \frac{N_{BIC}(s)}{D_{BIC}(s)}$$

where  $N_{BIC}(s)$  and  $D_{BIC}(s)$  are the numerator and denominator of the  $Z_{BIC}(s)$ , respectively.

- 3. In $N_{BIC}(s) = a_0 + a_1 s + \dots + a_3 s^3$ and $D_{BIC}(s) = b_0 + b_1 s + \dots + b_4 s^4$, each of the coefficients $a_i$, $i = 0, \dots, 3$, and $b_j$, $j = 0, \dots, 4$, is expressed as a sum of products:

$$a_i = \sum_{l=1}^{L_i} PD_{il}$$

$$b_j = \sum_{l=1}^{K_j} PN_{jl}$$

where $L_i$, $K_j$ are the number of product terms in $a_i$, $b_j$, respectively, and $PD_{il}$, $PN_{jl}$ are the $l$-th product terms in $a_i$, $b_j$, respectively. The second column in Table A.1 reports the number of product terms in each of the coefficients (the $L_i$'s and $K_j$'s).

- 4. For each  $a_i$  and  $b_j$ , the magnitude of some product terms is negligible compared to that of others. To identify such terms, the operating point values of various resistances, transconductances and capacitances in the circuit's ac model are substituted in the expressions for each product term. These operating point values are obtained by circuit simulation. For the circuit in Fig. 3.3, the Spectre [101] simulator and BSIM3 MOS models were used.
- 5. For the coefficients $a_i$, $i = 0, 1, 2, 3$, and $b_j$, $j = 0, 1, 2, 3, 4$, only the significant product terms, those with a magnitude greater than 1/10 of the largest term in the same coefficient, are kept; the remaining terms are assumed negligible and are eliminated, *i.e.*:

$$\text{if } PD_{il} < PD_{i_{max}}/10 \text{ then } PD_{il} = 0$$

where  $PD_{i_{max}} = \max(PD_{i1}, PD_{i2}, ...)$ . The number of retained terms in each coefficient is given in the third column of Table A.1.
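The pruning rule of step 5 can be sketched in a few lines of Python. This is only an illustration: the function name `prune_terms`, the use of absolute values, and the sample term values are assumptions made here, not part of the original MATLAB procedure.

```python
def prune_terms(terms, ratio=0.1):
    """Keep only the product terms whose magnitude is at least
    `ratio` times the largest magnitude in the same coefficient."""
    largest = max(abs(t) for t in terms)
    return [t for t in terms if abs(t) >= ratio * largest]


# Hypothetical numerical values of the product terms of one coefficient,
# obtained after substituting operating-point values (made-up numbers).
product_terms = [1e-8, 5e-10, 2e-9, 3e-12]
significant = prune_terms(product_terms)
print(len(product_terms), "terms reduced to", len(significant))
```

The number of surviving terms corresponds to the third column of Table A.1 for the coefficient in question.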

The general forms of the resulting product terms for coefficients  $a_i$  and  $b_j$  are given in the fourth column of Table A.1, where the *i*, *j*, *k* and *l* subscripts indicate the indices of different transconductances, resistances and capacitances. Initially, we assume that  $N_{BIC}(s)$  has a root,  $s = z_d$ , at which  $a_3s^3$  and  $a_2s^2$  are negligible in comparison with  $a_0$ . Therefore:

$$z_d = -a_0/a_1 = -1/(r_{13}C_t)$$

| Coefficient | Total # of product terms | # of significant product terms | General form of significant product terms |
|-------------|--------------------------|--------------------------------|-------------------------------------------|
| $a_0$       | 10                       | 2                              | $g_{mi}g_{mj}/r_{dl}$                     |
| $a_1$       | 45                       | 2                              | $C_i g_{mi} g_{mj}$                       |
| $a_2$       | 59                       | 8                              | $C_i C_j g_{mi}$                          |
| $a_3$       | 22                       | 8                              | $C_i C_j C_l$                             |
| $b_0$       | 13                       | 1                              | $g_{mi}g_{mj}g_{ml}g_{mk}$                |
| $b_1$       | 107                      | 5                              | $g_{mi}g_{mj}g_{ml}C_k$                   |
| $b_2$       | 243                      | 2                              | $g_{mi}g_{mj}C_lC_k$                      |
| $b_3$       | 208                      | 8                              | $g_{mi}C_jC_lC_k$                         |
| $b_4$       | 59                       | 8                              | $C_i C_j C_l C_k$                         |

Table A.1: Number of product terms in the coefficients of  $N_{BIC}(s)$  and  $D_{BIC}(s)$ 

where  $r_{13} = r_1 || r_3$  and  $C_t = C_{comp} + C_{gs12}$ .

Now we verify that $a_3s^3$ and $a_2s^2$ are indeed much smaller than $a_0$ by showing that $a_3s^3/a_0 \ll 1$ and $a_2s^2/a_0 \ll 1$. To do so, note that the $C_i$'s, $C_j$'s and $C_k$'s are in the range of 0.1 pF to 0.55 pF, $r_{13}$ and the $r_i$'s are in the range of 0.7 $M\Omega$ to 1.3 $M\Omega$, and the $g_{mi}$'s are between $10^{-4}/\Omega$ and $1.3 \times 10^{-4}/\Omega$. Hence, the ratios of $a_3s^3$ and $a_2s^2$ to $a_0$ at $s = z_d$ evaluate to:

$$\begin{aligned} a_{3}s^{3}/a_{0} &\sim \frac{(C_{i}C_{j}C_{l})/(r_{13}^{3}C_{t}^{3})}{g_{mi}g_{mj}/r_{i}} \sim \frac{1}{g_{mi}g_{mj}r_{13}^{2}} \sim 10^{-4} \\ a_{2}s^{2}/a_{0} &\sim \frac{(g_{mi}C_{j}C_{l})/(r_{13}^{2}C_{t}^{2})}{g_{mi}g_{mj}/r_{i}} \sim \frac{1}{g_{mi}r_{13}} \sim 10^{-2} \end{aligned} \tag{1A}$$

The above approximation verifies the assumptions used to calculate  $s = z_d$ . Numerical solutions of  $N_{BIC}(s) = 0$  and  $D_{BIC}(s) = 0$  show that the rest of the poles and zeros of the circuit are located at frequencies at least 10 times higher than  $z_d$ . Therefore,  $z_d$ , rewritten below, is considered a dominant zero:

$$z_d = -\frac{1}{(C_{comp} + C_{gs12})(r_{d1}||r_{d3})} \tag{2A}$$
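As a numerical cross-check of the order-of-magnitude estimates in Eqn. 1A, the short sketch below evaluates $1/(g_m r)$ and $1/(g_m r)^2$ at the extremes of the parameter ranges quoted above. The helper name `ratio_bounds` is an assumption introduced here, and all resistances and transconductances are treated as interchangeable within their ranges, which is the same simplification used in Eqn. 1A.

```python
def ratio_bounds(g_m_range, r_range):
    """Return (min, max) of 1/(g_m*r) and of 1/(g_m*r)^2 over the given ranges."""
    gr_min = g_m_range[0] * r_range[0]   # smallest g_m * r product
    gr_max = g_m_range[1] * r_range[1]   # largest g_m * r product
    a2_over_a0 = (1.0 / gr_max, 1.0 / gr_min)
    a3_over_a0 = (1.0 / gr_max**2, 1.0 / gr_min**2)
    return a2_over_a0, a3_over_a0


# Ranges taken from the text: g_m in siemens, r in ohms.
a2_bounds, a3_bounds = ratio_bounds((1e-4, 1.3e-4), (0.7e6, 1.3e6))
print("a2*s^2/a0 between %.1e and %.1e" % a2_bounds)
print("a3*s^3/a0 between %.1e and %.1e" % a3_bounds)
```

Both ratios stay well below unity over the whole range, which supports neglecting $a_2s^2$ and $a_3s^3$ near $s = z_d$.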

#### **Appendix B**

# N and $\overline{I}$ Relationship in the Single-Phase BICI

This appendix contains the derivation of Eqn. 3.16:

$$N = KT\overline{I} + O + E \tag{1B}$$

In the circuit shown in Fig. 3.6, the counter is incremented by the value $N_i$ at the end of the integration sub-window $\phi_{i+1}$. $N_i$ corresponds to the number of clock (CLK) cycles that span the time it takes for $V_S$ to ramp from 0 to $V_i$ (the voltage stored on $C_4$). Since the ramp slope is $K_r = I_{ramp}/C_4$, the counter is enabled for a time $TC_i$ during the $(i + 1)$-th period of $\phi$:

$$TC_i = \frac{V_i}{K_r} + \tau$$

where $\tau$ is the delay between the time $V_S = V_i$ and the time the counter stops counting. $\tau$ is due to the comparator delay and input offset voltage, and the propagation delay from node r1 to $i_rst$. Therefore, $N_i$ can be obtained from Eqn. 2B.

$$\begin{aligned} N_{i} &= \frac{TC_{i}}{T_{CLK}} - Q_{i} \\ &= \frac{1}{K_{r}T_{CLK}}V_{i} + \frac{\tau}{T_{CLK}} - Q_{i} \end{aligned} \tag{2B}$$

where  $T_{CLK}$  is the period of CLK, and  $Q_i$  is the quantization noise. Assuming that the *I* and CLK signals are uncorrelated,  $Q_i$  can be considered as a random number uniformly distributed in the range (0,1) with a mean and variance given by:

$$m_{Q_i} = 0.5 \tag{3B}$$

$$\sigma_{Q_i}^2 = \frac{1}{12} \tag{4B}$$

Since  $N = \sum_{i=0}^{M-1} N_i$ , from (2B),

$$N = \sum_{i=0}^{M-1} \left( \frac{1}{K_r T_{CLK}} V_i + \frac{\tau}{T_{CLK}} - Q_i \right)$$
(5B)

$$= \frac{1}{K_r T_{CLK}} \sum_{i=0}^{M-1} (V_i) + \frac{M\tau}{T_{CLK}} - Q$$
(6B)

where  $Q = \sum_{i=0}^{M-1} Q_i$ . Since the  $Q_i$ 's are independent and M is generally large, from the central limit theorem Q has a Gaussian distribution with a mean and variance given by:

$$m_Q = 0.5M \tag{7B}$$

$$\sigma_Q^2 = \frac{M}{12} \tag{8B}$$
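The statistics in Eqns. 7B and 8B can be checked with a small Monte Carlo experiment. The sketch below is an illustration only: it models each $Q_i$ as an independent uniform sample, and the values of `M`, the trial count, and the seed are arbitrary choices made here.

```python
import random
import statistics


def simulate_q(m, trials, seed=0):
    """Draw `trials` samples of Q = sum of m independent uniform [0, 1) values."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(m)) for _ in range(trials)]


M = 200                                   # number of sub-windows (arbitrary)
samples = simulate_q(M, trials=2000)
mean_q = statistics.fmean(samples)
var_q = statistics.pvariance(samples)
print(f"mean of Q:     {mean_q:.2f} (Eqn. 7B predicts {0.5 * M:.2f})")
print(f"variance of Q: {var_q:.2f} (Eqn. 8B predicts {M / 12:.2f})")
```

The sample mean and variance should land close to $0.5M$ and $M/12$; the agreement tightens as the number of trials grows.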

To complete the derivation,  $V_M = \sum_{i=0}^{M-1} V_i$  is obtained from Eqns. 3.15 and 3.14:

$$\begin{aligned} V_{M} &= \sum_{i=0}^{M-1} V_{i} \\ &= K_{23} \sum_{i=0}^{M-1} V_{C}^{i} \\ &= K_{23} \sum_{i=0}^{M-1} \left(\frac{1}{C_{1} + C_{2}} \int_{t_{i}}^{t_{i}'} I dt + K_{21} V_{i-1}\right) \\ &= \frac{K_{23}}{C_{1} + C_{2}} \sum_{i=0}^{M-1} \int_{t_{i}}^{t_{i}'} I dt + K_{23} K_{21} \sum_{i=0}^{M-2} V_{i} \end{aligned} \tag{9B}$$

Replacing  $V_i$  and  $V_C^i$  from Eqns. 3.15 and 3.14 recursively in Eqn. 9B yields:

$$\begin{aligned} V_{M} &= \frac{K_{23}}{C_{1} + C_{2}} \sum_{i=0}^{M-1} \left[1 + (K_{23}K_{21}) + (K_{23}K_{21})^{2} + \dots + (K_{23}K_{21})^{M-i-1}\right] \int_{t_{i}}^{t_{i}'} I dt \\ &= \frac{K_{23}}{C_{1} + C_{2}} \sum_{i=0}^{M-1} \left(\frac{1 - (K_{23}K_{21})^{M-i}}{1 - K_{23}K_{21}}\right) \int_{t_{i}}^{t_{i}'} I dt \end{aligned} \tag{10B}$$

Generally, $C_1 \gg C_2$; therefore, $K_{23}K_{21} \ll 1$ and $[1 - (K_{23}K_{21})^{M-i}] \approx 1$ for $i = 0, \dots, M - 2$. Hence, Eqn. 10B simplifies to:

$$V_{M} = \frac{K_{23}}{(C_{1} + C_{2})(1 - K_{23}K_{21})} \left[\sum_{i=0}^{M-1} \left(\int_{t_{i}}^{t_{i}'} Idt\right) - K_{23}K_{21} \int_{t_{M-1}}^{t_{M-1}'} Idt\right]$$
(11B)

where the term $K_{23}K_{21}\int_{t_{M-1}}^{t'_{M-1}} Idt$ is the effect of the residual charge remaining on $C_2$ at the beginning of the $M$-th integration sub-window. This term can be ignored because (*i*) $(K_{23}K_{21}) \ll 1$ and (*ii*) the term $\int_{t_{M-1}}^{t'_{M-1}} Idt$ is the result of integration in only one integration sub-window, whereas $\sum_{i=0}^{M-1} (\int_{t_i}^{t'_i} Idt)$ is the integration result over a large number of sub-windows. This yields:

$$V_M = K_M \sum_{i=0}^{M-1} \int_{t_i}^{t_i'} I dt$$
 (12B)

where  $K_M = \frac{K_{23}}{(C_1+C_2)(1-K_{23}K_{21})}$ . Substituting  $t_i = iT_s$  and  $t'_i = (i+1)T_s - T_R$  into Eqn. 12B yields:

$$\begin{aligned} V_{M} &= K_{M} \sum_{i=0}^{M-1} \left( \int_{iT_{s}}^{(i+1)T_{s}} I dt - \int_{(i+1)T_{s}-T_{R}}^{(i+1)T_{s}} I dt \right) \\ &= K_{M} \int_{0}^{T} I dt - K_{M} R \end{aligned} \tag{13B}$$

where $R = \sum_{i=0}^{M-1} \int_{(i+1)T_s - T_R}^{(i+1)T_s} I dt$. The reset error, $R$, results from not integrating $I$ for a time $T_R$ in each sub-window. This is required to reset the circuit for integration in the subsequent sub-window. Denoting each term in $R$ as $R_i$, then:

$$R_{i} = \int_{(i+1)T_{s}-T_{R}}^{(i+1)T_{s}} I dt$$
(14B)

$T_R$ can be chosen such that $T_R/T_s \ll 1$. Assuming $1/T_R$ is much larger than the bandwidth of $I$, $I$ can be considered as approximately constant during the time $T_R$. Therefore, we can express $R_i$ as:

$$R_i = I(t_i'')T_R$$

where $t''_i = (i+1)T_s - T_R/2$ is the midpoint of the reset interval in Eqn. 14B. Since the time $t''_i$ is set by $\phi$, which is independent of $I$, $I(t''_i)$ can be considered as a random variable. Assuming $t''_i$ is uniformly distributed over one period of $I$, the mean and variance of $R_i$ are given by:

$$m_{R_i} = T_R \frac{1}{T_I} \int_0^{T_I} I(t) dt = \overline{I} T_R$$

$$\sigma_{R_i}^2 = T_R^2 \frac{1}{T_I} \int_0^{T_I} [I(t) - \overline{I}]^2 dt = \overline{I_{ac}^2} T_R^2$$

where  $T_I$  is the period of I and  $\overline{I_{ac}^2}$  is the ac power of the signal I. Assuming  $R_i$ 's to be mutually independent, then:

$$m_R = M \overline{I} T_R \tag{15B}$$

$$\sigma_R^2 = M \overline{I_{ac}^2} T_R^2 \tag{16B}$$

From Eqns. 6B, 13B:

$$N = \frac{K_M}{K_r T_{CLK}} \int_0^T I dt + \frac{M\tau}{T_{CLK}} - \frac{K_M R}{K_r T_{CLK}} - Q \tag{17B}$$

Substituting the mean and variance of $Q$ and $R$ from Eqns. 7B, 8B, 15B and 16B into Eqn. 17B yields:

$$N = K \int_0^T I dt + O - E \tag{18B}$$

where $K = K_M/(K_r T_{CLK})$, $O = M\tau/T_{CLK} - M(0.5 + K_M \overline{I}T_R/(K_r T_{CLK}))$, obtained by collecting the means of $Q$ and $K_M R/(K_r T_{CLK})$, and $E$ is a random variable with zero mean and variance:

$$\sigma_E^2 = M \left[ \frac{1}{12} + \left( \frac{K_M}{K_r T_{CLK}} \right)^2 \overline{I_{ac}^2} T_R^2 \right]$$
(19B)
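To get a feel for Eqn. 19B, the following sketch evaluates the gain $K$ and the standard deviation of $E$ for a set of component values. The function names and every numerical value below are hypothetical placeholders chosen for illustration; they are not taken from the fabricated circuit.

```python
import math


def single_phase_gain(k_m, k_r, t_clk):
    """K = K_M / (K_r * T_CLK), in counts per coulomb of integrated charge."""
    return k_m / (k_r * t_clk)


def sigma_e(m, gain, i_ac_power, t_r):
    """Standard deviation of E from Eqn. 19B."""
    return math.sqrt(m * (1.0 / 12.0 + gain**2 * i_ac_power * t_r**2))


# Hypothetical values: K_M in 1/F, K_r in V/s, T_CLK and T_R in seconds,
# ac power of I in A^2, and M sub-windows.
gain = single_phase_gain(k_m=1e11, k_r=1e6, t_clk=1e-8)
sigma_dc = sigma_e(m=1200, gain=gain, i_ac_power=0.0, t_r=1e-8)
sigma_ac = sigma_e(m=1200, gain=gain, i_ac_power=1e-8, t_r=1e-8)
print(f"K = {gain:.3e} counts/C")
print(f"sigma_E for a dc current:     {sigma_dc:.3f} counts")
print(f"sigma_E with an ac component: {sigma_ac:.3f} counts")
```

For a purely dc current the reset-error term vanishes and $\sigma_E$ reduces to $\sqrt{M/12}$, the quantization contribution alone.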

#### **Appendix C**

# N and $\overline{I}$ Relationship in the Double-Phase BICI

From Fig. 4.5, assume a half-wave integrator integrates the current I from  $t_i = iT$  to  $t'_i = iT + T/2$ . Therefore, the voltage  $v_{C_2}(t)$  across  $C_2$  is

$$v_{C2}(t) = \frac{1}{C_1 + C_2} \int_{t_i}^t I dt$$

The voltage  $v_{C2}(t)$  is sampled and held at  $t'_i = iT + T/2$  such that:

$$V_{C2}^{i} = v_{C2}(t)|_{t=t_{i}'} = \frac{1}{C_{1} + C_{2}} \int_{iT}^{iT + T/2} I dt$$
(1C)

From t = iT + T/2 to t = (i + 1)T, the charge on  $C_2$  is transferred into  $C_3$ . Therefore, the voltage across  $C_3$  at t = (i + 1)T is

$$V_{C3}^{i} = \frac{C_2}{C_3} V_{C2}^{i} + V_{i-1}^{R} + V_{CMP1}$$

where $V_{i-1}^R$ is the initial voltage on $C_3$ remaining from the previous digitization cycle and $V_{CMP1}$ is a constant voltage resulting from charge accumulation on $C_3$ due to comparator input voltage offset and delay. From $t = (i + 1)T$ to $t = (i + 1)T + T/2$, $V_{C3}^i$ is digitized by counting the number of clock cycles it takes to discharge $C_3$. The discharge is performed using current $I_{ramp}$ until $v_{C3}(t)$ becomes negative. Therefore:

$$C'_{3}(V_{C3}^{i} - V_{R}^{i}) = (N_{i}T_{clk} + \tau_{CMP2})I_{ramp}$$

where $\tau_{CMP2}$ is the delay due to the input voltage offset and switching delay of COMP2, $T_{clk}$ is the clock period, $C'_3$ is the effective capacitance being discharged, and $N_i$ is the number of cycles it takes to discharge $C'_3$. Note that two different values for $C_3$ are considered for the charge and discharge cycles to include the different parasitic capacitances that affect $C_3$ in these two cycles. The $N_i$'s ($i = 1, \dots, M$) are added to obtain the half-wave integration result for $I$, $N$, as follows:

$$\begin{aligned} N &= \sum_{i=1}^{M} N_{i} \\ &= \frac{M \tau_{CMP2}}{T_{clk}} + \frac{C_{3}V_{CMP1}}{T_{clk}I_{ramp}} + \frac{C'_{3}C_{2}}{C_{3}T_{clk}I_{ramp}} \sum_{i=0}^{M-1} V_{C2}^{i} + V_{R}^{M-1} \end{aligned} \tag{2C}$$

Substituting  $V_{C2}^i$  from Eqn. 1C into 2C yields:

$$N = K \sum_{i=0}^{M-1} \int_{iT}^{iT+T/2} I dt + O + Q$$

where  $Q = V_R^{M-1}$  is the quantization voltage error,  $K = \frac{C'_3 C_2}{C_3 T_{clk} I_{ramp}(C_1 + C_2)}$  is the proportionality coefficient, and  $O = (\frac{M \tau_{CMP2}}{T_{clk}} + \frac{C_3 V_{CMP1}}{T_{clk} I_{ramp}})$  is the offset.
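The expressions for $K$ and $O$ above are straightforward to evaluate numerically. The sketch below does so for a set of hypothetical component values; the function name and all numbers are assumptions made for illustration and do not describe the implemented circuit.

```python
import math


def double_phase_coefficients(c1, c2, c3, c3_eff, t_clk, i_ramp,
                              m, tau_cmp2, v_cmp1):
    """Return (K, O) of the double-phase BICI.

    K is in counts per coulomb; O is in counts.
    c3_eff stands for C'_3, the effective capacitance during discharge.
    """
    k = (c3_eff * c2) / (c3 * t_clk * i_ramp * (c1 + c2))
    offset = m * tau_cmp2 / t_clk + c3 * v_cmp1 / (t_clk * i_ramp)
    return k, offset


# Hypothetical values: capacitances in farads, times in seconds,
# ramp current in amperes, comparator offset voltage in volts.
K, O = double_phase_coefficients(
    c1=9e-12, c2=1e-12, c3=1e-12, c3_eff=1e-12,
    t_clk=1e-8, i_ramp=1e-6, m=100, tau_cmp2=1e-9, v_cmp1=0.01,
)
print(f"K = {K:.3e} counts/C, O = {O:.2f} counts")
```

With these placeholder values the offset is dominated by the accumulated comparator delay term $M\tau_{CMP2}/T_{clk}$.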

## **Appendix D**

# **TDC** Calibration

#### **D.1** Two-point Calibration

As shown in Sec. 5.2.6, the following relates  $T_d$ , the time interval to be measured, and N, the measurement made by the TDC:

$$NT_{\Delta} = T_d + T_C + T_Q + T_R \tag{1D}$$

In a two-point calibration scheme,  $T_{\Delta}$  and  $T_C$  are estimated by measuring two reference time intervals:

$$N_{cal1}T_{\Delta} = T_{cal1} + T_C + T_{Q1} + T_{R1}$$

$$N_{cal2}T_{\Delta} = T_{cal2} + T_C + T_{Q2} + T_{R2}$$
(2D)

Therefore,

$$T_{\Delta} = \frac{T_{cal2} - T_{cal1}}{N_{cal2} - N_{cal1}} + \frac{T_{Q2} - T_{Q1} + T_{R2} - T_{R1}}{N_{cal2} - N_{cal1}} = T_{\Delta 0} + T_{\Delta e}$$
(3D)


where  $T_{\Delta 0}$  is the estimated  $T_{\Delta}$  and  $T_{\Delta e}$  is the error associated with this estimation:

$$T_{\Delta 0} = \frac{T_{cal2} - T_{cal1}}{N_{cal2} - N_{cal1}}$$

$$T_{\Delta e} = \frac{T_{Q2} - T_{Q1} + T_{R2} - T_{R1}}{N_{cal2} - N_{cal1}}$$

$T_{Q2}$ and $T_{Q1}$ are two independent and uniformly distributed random variables in the range $[0, T_{\Delta})$ (with a mean of $T_{\Delta}/2$ and a variance of $T_{\Delta}^2/12$), and $T_{R1}$ and $T_{R2}$ are two independent normally distributed random variables with a mean of zero and a standard deviation of $\sigma_R$. Therefore, from Eqn. 3D, $T_{\Delta e}$ is a random variable with the following mean and variance:

$$m_{T_{\Delta e}} = 0$$

$$\sigma_{T_{\Delta e}}^{2} = \frac{T_{\Delta}^{2}}{6(N_{cal2} - N_{cal1})^{2}} + \frac{2\sigma_{R}^{2}}{(N_{cal2} - N_{cal1})^{2}}$$
(4D)

The larger the term  $(N_{cal2} - N_{cal1})$ , the less the error in estimating  $T_{\Delta}$ .

 $T_C$  is obtained as below:

$$T_{C} = \frac{T_{cal2}N_{cal1} - T_{cal1}N_{cal2}}{N_{cal2} - N_{cal1}} + \frac{(T_{Q2} + T_{R2})N_{cal1} - (T_{Q1} + T_{R1})N_{cal2}}{N_{cal2} - N_{cal1}} = T_{C0} + T_{Ce}$$

where  $T_{C0}$  is the  $T_C$  estimate and  $T_{Ce}$  is the random variable indicating the uncertainty in estimating  $T_C$ :

$$T_{C0} = \frac{T_{cal2}N_{cal1} - T_{cal1}N_{cal2}}{N_{cal2} - N_{cal1}}$$
(5D)

$$T_{Ce} = \frac{(T_{Q2} + T_{R2})N_{cal1} - (T_{Q1} + T_{R1})N_{cal2}}{N_{cal2} - N_{cal1}}$$
(6D)

The mean and variance of  $T_{Ce}$  are obtained as below:

$$m_{T_{Ce}} = -\frac{T_{\Delta}}{2}$$


$$\sigma_{T_{Ce}}^{2} = \left(\frac{T_{\Delta}^{2}}{12} + \sigma_{R}^{2}\right) \frac{1 + \left(N_{cal2}/N_{cal1}\right)^{2}}{(1 - N_{cal2}/N_{cal1})^{2}}$$
(7D)

Using the number N associated with measuring an interval  $T_d$  in Eqn. 1D, the  $T_d$  estimate given by the TDC,  $\hat{T}_d$ , is:

$$\hat{T}_d = NT_{\Delta 0} - T_{C0}$$

Therefore the measurement error is:

$$T_{de} = T_d - \hat{T}_d = NT_{\Delta e} + T_{Ce} + T_Q + T_R$$
 (8D)

The mean and variance of the measurement error are:

$$m_{T_{de}} = Nm_{T_{\Delta e}} + m_{T_{Ce}} + m_Q + m_R = 0 \tag{9D}$$

$$\begin{aligned} \sigma_{T_{de}}^2 &= N^2 \sigma_{T_{\Delta e}}^2 + \sigma_{T_{Ce}}^2 + \sigma_Q^2 + \sigma_R^2 \\ &= \frac{N^2 (T_{\Delta}^2/6 + 2\sigma_R^2)}{(N_{cal2} - N_{cal1})^2} + (T_{\Delta}^2/12 + \sigma_R^2) \frac{N_{cal2}^2 + N_{cal1}^2}{(N_{cal2} - N_{cal1})^2} + T_{\Delta}^2/12 + \sigma_R^2 \\ &= \left(1 + \frac{2N^2 + N_{cal2}^2 + N_{cal1}^2}{(N_{cal2} - N_{cal1})^2}\right)(T_{\Delta}^2/12 + \sigma_R^2) \end{aligned} \tag{10D}$$

Assuming $T_{cal2} = 2T_{cal1}$ and $T_{cal1} \gg T_C$, one can conclude that $N_{cal2} \simeq 2N_{cal1}$. Under this assumption, the variance of $T_{Ce}$ from Eqn. 7D is:

$$\sigma_{T_{Ce}}^2 = \frac{5T_{\Delta}^2}{12} + 5\sigma_R^2$$

Furthermore, assuming  $N \ll (N_{cal2} - N_{cal1})$ , the rms measurement error is  $\sigma_{T_{de}} = \sqrt{T_{\Delta}^2/2 + 6\sigma_R^2}$ . If  $\sigma_R$  is considered negligible, the rms measurement error due to quantization is obtained as  $T_{\Delta}/\sqrt{2}$  and the worst case error  $(3\sigma \text{ band})$  is  $\pm (3/\sqrt{2})T_{\Delta} \simeq \pm 2T_{\Delta}$ . It is noteworthy that in Eqn. 8D  $T_{Ce}$  and  $T_{\Delta e}$  are constant for all the actual measurements since they are a result of calibration, whereas,  $T_Q$  and  $T_R$  vary for each measurement sample.
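The two-point calibration of this section can be sketched as follows. The code simply evaluates $T_{\Delta 0}$ (Eqn. 3D), $T_{C0}$ (Eqn. 5D), the estimate $\hat{T}_d$, and the rms error of Eqn. 10D. The function names and the numerical values (times in picoseconds, a noise-free TDC with $T_\Delta = 100$ ps and $T_C = 300$ ps) are assumptions chosen for illustration.

```python
import math


def two_point_calibration(t_cal1, n_cal1, t_cal2, n_cal2):
    """Return the estimates (T_delta0, T_C0) from two reference intervals."""
    dn = n_cal2 - n_cal1
    t_delta0 = (t_cal2 - t_cal1) / dn
    t_c0 = (t_cal2 * n_cal1 - t_cal1 * n_cal2) / dn
    return t_delta0, t_c0


def estimate_interval(n, t_delta0, t_c0):
    """TDC estimate of the measured interval: N * T_delta0 - T_C0."""
    return n * t_delta0 - t_c0


def rms_error(n, n_cal1, n_cal2, t_delta, sigma_r=0.0):
    """RMS measurement error, i.e. the square root of Eqn. 10D."""
    dn2 = (n_cal2 - n_cal1) ** 2
    factor = 1 + (2 * n**2 + n_cal2**2 + n_cal1**2) / dn2
    return math.sqrt(factor * (t_delta**2 / 12 + sigma_r**2))


# Hypothetical noise-free readings: 10 ns -> 103 counts, 20 ns -> 203 counts.
t_delta0, t_c0 = two_point_calibration(10_000, 103, 20_000, 203)
t_d_hat = estimate_interval(53, t_delta0, t_c0)
print(f"T_delta0 = {t_delta0} ps, T_C0 = {t_c0} ps, T_d estimate = {t_d_hat} ps")

# With N_cal2 = 2 * N_cal1 and N much smaller than N_cal2 - N_cal1,
# the rms error approaches T_delta / sqrt(2).
print(rms_error(0, 100, 200, t_delta=1.0))
```

The last line reproduces the $T_\Delta/\sqrt{2}$ limit quoted above for negligible $\sigma_R$.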

#### **D.2** *n*-point Calibration Technique

One way to increase the measurement accuracy is to limit the variation range of the $T_C$ estimation error, $T_{Ce}$, in Eqn. 8D, which means reducing $\sigma_{T_{Ce}}$. This may be done using the $n$-point calibration technique described next. In this technique, $n$ accurately known time intervals are measured by the TDC. These time intervals are multiples of a reference interval: $T_{cal} = 0, T_{ref}, 2T_{ref}, \dots, (n-1)T_{ref}$. Assuming $T_R$ is negligible, the measurements can be expressed as below:

$$N_1 T_\Delta = T_C + T_{Q1} \tag{11D}$$

$$N_2 T_\Delta = T_{ref} + T_C + T_{Q2} \tag{12D}$$

$$N_3 T_\Delta = 2T_{ref} + T_C + T_{Q3} \tag{13D}$$

$$\vdots$$

$$N_n T_\Delta = (n-1)T_{ref} + T_C + T_{Qn} \tag{17D}$$

The objective of *n*-point calibration is to limit the range of $T_{Q1}$ variations, which in turn reduces $\sigma_{T_{Ce}}$.

In Eqns. 11D to 17D, $T_{ref}$ and $T_{\Delta}$ are assumed to be known ($T_{\Delta}$ can be estimated within a 0.1% accuracy using two of the chosen calibration points as discussed in Sec. D.1). Assume the following is defined:

$$N_{ref}T_{\Delta} = T_{ref} + T_{Qref} \tag{18D}$$

where  $N_{ref}$  is an integer and  $T_{Qref}$  is a time interval in the range  $[0, T_{\Delta})$ . Since  $T_{ref}$  and  $T_{\Delta}$  are accurately known,  $N_{ref}$  and  $T_{Qref}$  are also known from Eqn. 18D. Substituting  $T_{ref}$  and  $T_C$  from 18D and 11D, respectively, in Eqn. 12D yields:

$$N_2 T_{\Delta} - N_{ref} T_{\Delta} - N_1 T_{\Delta} = T_{Q2} - T_{Q1} - T_{Qref}$$
(19D)

The left side of Eqn. 19D is an integer multiple of $T_{\Delta}$. Therefore, the right side must also be an integer multiple of $T_{\Delta}$. Since $T_{Q1}$, $T_{Q2}$ and $T_{Qref}$ must all be in the range $[0, T_{\Delta})$, the right side lies in $(-2T_{\Delta}, T_{\Delta})$ and only the two following cases are possible:

CASE 1:

$$T_{Q2} - T_{Q1} - T_{Qref} = 0 \tag{20D}$$

$$T_{Q2} = T_{Q1} + T_{Qref} \tag{21D}$$

CASE 2:

$$T_{Q2} - T_{Q1} - T_{Qref} = -T_{\Delta} \tag{22D}$$

$$T_{Q2} = T_{Q1} + T_{Qref} - T_{\Delta} \tag{23D}$$

Since $T_{Q2} \in [0, T_{\Delta})$, Eqns. 21D and 23D result in two different ranges for $T_{Q1}$. The intersection of each of these ranges with $[0, T_{\Delta})$ results in a smaller range for $T_{Q1}$, as shown below:

CASE 1:

$$\begin{aligned} T_{Q1} &\in [-T_{Qref}, T_{\Delta} - T_{Qref}) \cap [0, T_{\Delta}) \\ &\in [0, T_{\Delta} - T_{Qref}) \end{aligned} \tag{24D}$$

and

$$N_2 = N_{ref} + N_1 \tag{25D}$$

CASE 2:

$$\begin{aligned} T_{Q1} &\in [T_{\Delta} - T_{Qref}, 2T_{\Delta} - T_{Qref}) \cap [0, T_{\Delta}) \\ &\in [T_{\Delta} - T_{Qref}, T_{\Delta}) \end{aligned} \tag{26D}$$

and

$$N_2 = N_{ref} + N_1 - 1 \tag{27D}$$

Therefore, if Eqn. 25D holds, the range of variation for  $T_{Q1}$  is limited to the range in Eqn. 24D. Similarly, the range in Eqn. 26D applies if Eqn. 27D holds.

The process of limiting the range of variations for  $T_{Q1}$  can be continued by adding another calibration point, *e.g.*,  $2T_{ref}$  (Eqn. 13D). Substituting  $T_{ref}$  and  $T_C$  from 18D and 11D, respectively, in Eqn. 13D results in:

$$(N_3 - 2N_{ref} - N_1)T_{\Delta} = T_{Q2} - T_{Q1} - 2T_{Qref} \tag{28D}$$

Using the same reasoning as before, three cases are possible.

CASE 3:

$$T_{Q2} - T_{Q1} - 2T_{Qref} = 0$$

$$T_{Q2} = T_{Q1} + 2T_{Qref} \tag{29D}$$

Therefore:

$$\begin{aligned} T_{Q1} &\in [-2T_{Qref}, T_{\Delta} - 2T_{Qref}) \cap [0, T_{\Delta}) \\ &\in [0, T_{\Delta} - 2T_{Qref}) \end{aligned} \tag{30D}$$

and

$$N_3 = 2N_{ref} + N_1 \tag{31D}$$

CASE 4:

$$T_{Q2} - T_{Q1} - 2T_{Qref} = -T_{\Delta}$$

$$T_{Q2} = T_{Q1} + 2T_{Qref} - T_{\Delta} \tag{32D}$$

Therefore:

$$T_{Q1} \in [T_{\Delta} - 2T_{Qref}, 2T_{\Delta} - 2T_{Qref}) \cap [0, T_{\Delta})$$

$$T_{Q1} \in \begin{cases} [0, 2T_{\Delta} - 2T_{Qref}) & \text{if } T_{\Delta} - 2T_{Qref} < 0 \\ [T_{\Delta} - 2T_{Qref}, T_{\Delta}) & \text{if } T_{\Delta} - 2T_{Qref} > 0 \end{cases} \tag{33D}$$

and

$$N_3 = 2N_{ref} + N_1 - 1 \tag{34D}$$

CASE 5:

$$T_{Q2} - T_{Q1} - 2T_{Qref} = -2T_{\Delta}$$

$$T_{Q2} = T_{Q1} + 2T_{Qref} - 2T_{\Delta} \tag{35D}$$

Therefore:

$$T_{Q1} \in [2T_{\Delta} - 2T_{Qref}, 3T_{\Delta} - 2T_{Qref}) \cap [0, T_{\Delta})$$

$$T_{Q1} \in \begin{cases} [0, 3T_{\Delta} - 2T_{Qref}) & \text{if } 2T_{\Delta} - 2T_{Qref} < 0 \\ [2T_{\Delta} - 2T_{Qref}, T_{\Delta}) & \text{if } 2T_{\Delta} - 2T_{Qref} > 0 \end{cases} \tag{36D}$$

and

$$N_3 = 2N_{ref} + N_1 - 2 \tag{37D}$$

If both  $T_{ref}$  and  $2T_{ref}$  are used, the intersections of CASE 1 or 2 with CASE 3, 4 or 5 will result in tighter bounds on  $T_{Q1}$  which yield a more accurate estimate of  $T_C$ .
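The interval bookkeeping of CASES 1 to 5 follows a single pattern, which can be sketched in Python. This is an illustrative sketch, not the procedure used for the prototype: the function names `tq1_interval` and `intersect`, the integer time units, and the numerical values are assumptions made for this example. For the calibration point $kT_{ref}$, the number of $T_{\Delta}$ wraps is taken as $m = kN_{ref} + N_1 - N_{k+1}$, which confines $T_{Q1}$ to $[mT_{\Delta} - kT_{Qref}, (m+1)T_{\Delta} - kT_{Qref}) \cap [0, T_{\Delta})$.

```python
def tq1_interval(k, n_obs, n1, n_ref, t_qref, t_delta):
    """Half-open interval [lo, hi) for T_Q1 implied by the calibration point k*T_ref.

    n_obs is the count measured with k*T_ref added (N_2 for k=1, N_3 for k=2).
    m is the number of T_delta wraps: m=0 gives CASE 1/3, m=1 CASE 2/4, m=2 CASE 5.
    """
    m = k * n_ref + n1 - n_obs
    lo = max(0, m * t_delta - k * t_qref)
    hi = min(t_delta, (m + 1) * t_delta - k * t_qref)
    return lo, hi


def intersect(a, b):
    """Intersection of two half-open intervals; lo >= hi means it is empty."""
    return max(a[0], b[0]), min(a[1], b[1])


# Illustrative values (assumed), in arbitrary integer time units.
T_DELTA, T_QREF, N_REF = 10, 3, 2      # so T_ref = N_REF*T_DELTA - T_QREF = 17
N1, N2, N3 = 4, 6, 7                   # counts for T_C, T_C + T_ref, T_C + 2*T_ref

range_1pt = tq1_interval(1, N2, N1, N_REF, T_QREF, T_DELTA)   # CASE 1
range_2pt = tq1_interval(2, N3, N1, N_REF, T_QREF, T_DELTA)   # CASE 4
tight = intersect(range_1pt, range_2pt)
print(range_1pt, range_2pt, tight)
```

The counts above were generated with $T_C = 35$, so that $T_{Q1} = 5$. The two calibration points narrow the range of $T_{Q1}$ from $[0, 10)$ to $[4, 7)$, which contains that value.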

#### **Appendix E**

#### Metastability window of a D flip-flop

This appendix provides an estimate of the metastability window for a D flip-flop in a 0.35  $\mu m$  digital cell library obtained through simulation.

Assuming the two signals Din and clkin are applied to the D and clk inputs of a flip-flop, respectively, we define  $\tau_D$  as:

$$\tau_D = t_{clkin} - t_{Din}$$

where  $t_{Din}$  and  $t_{clkin}$  indicate the times at which the rising edges of the Din and clkin signals occur. Denoting the flip-flop's setup time by  $\tau_{setup}$, we define the metastability window of a flip-flop as the time interval  $[-T_{mw}/2, T_{mw}/2]$  such that if

$$\tau_{setup} - T_{mw}/2 < \tau_D < \tau_{setup} + T_{mw}/2,$$

then the nominal clk-to-Q delay of the flip-flop  $(\tau_{clk-to-Q})$  is increased by an amount  $\tau_{mt}$ . We choose  $\tau_{mt} = 3$  ns [104] because this value is close to  $T_B$ , the oscillation period of the oscillator B in the prototype TDC implemented (details given in Sec. 5.5). Any delay exceeding this threshold may result in a  $T_{\Delta}$  measurement error, as explained in Sec. 5.2.4.



Figure E.1: The test bench for estimating the metastability window of a D flip-flop

The test bench shown in Fig. E.1(a) is used to estimate  $T_{mw}$. The buffers and the OR gate are used to model the inverters and the loading at the output of EOC\_DFF in the TQ of Fig. 5.6, respectively. The timing diagram for the input signals Din, clkin, and rb is shown in Fig. E.1(b). This circuit is simulated for different values of  $\tau_D$  while the Q output is monitored. The simulated DFF output for five values of  $\tau_D$, illustrated in Fig. E.2, shows that the DFF output switching delay increases as  $\tau_D$  approaches  $\tau_{setup}$. The plots of Fig. E.3 show  $\tau_{clk-to-Q}$  versus  $\tau_D$. These plots indicate that for  $\tau_D = 80.17$ ps the output remains LOW, whereas for  $\tau_D = 80.18$ ps the output switches HIGH. The additional clk-to-Q delay (compared to the nominal delay) for  $\tau_D = 80.18$ ps is approximately 460 ps, which is still less than the 3 ns threshold. This result suggests that the metastability window of the DFF is  $T_{mw} < (80.18 - 80.17)$ ps $= 0.01$ ps.
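A sweep of this kind can be automated with a bisection search on $\tau_D$. The Python sketch below is illustrative only: `bracket_switching_point` and the stand-in `toy_dff` are hypothetical names, and the toy model simply switches at an assumed 80.175 ps in place of a circuit simulation. Bisection is substituted here as one convenient search strategy; the text does not state how the simulated $\tau_D$ values were chosen.

```python
def bracket_switching_point(output_switches, lo, hi, tol):
    """Narrow [lo, hi] until hi - lo <= tol.

    output_switches(tau_d) must return False at lo (Q stays LOW)
    and True at hi (Q switches HIGH); this invariant is kept throughout.
    """
    if output_switches(lo) or not output_switches(hi):
        raise ValueError("initial bracket does not straddle the switching point")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if output_switches(mid):
            hi = mid
        else:
            lo = mid
    return lo, hi


def toy_dff(tau_d_ps):
    """Stand-in for a circuit simulation (assumed switching point: 80.175 ps)."""
    return tau_d_ps >= 80.175


lo_ps, hi_ps = bracket_switching_point(toy_dff, 80.0, 81.0, 0.01)
print(lo_ps, hi_ps)
```

The width of the final bracket bounds $T_{mw}$ only if the additional clk-to-Q delay at the upper end of the bracket is below $\tau_{mt}$, as is the case for the 460 ps reported above.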



Figure E.2: Simulation results of the flip-flop output for five different values of  $\tau_D$ 



Figure E.3: Simulated clk-to-Q delay versus  $\tau_D$ 

#### **Appendix F**

#### **Range Extender Block Analysis**

In Fig. 5.7(b), the numbers at the outputs of cntrA and cntrB versus time can be expressed as follows:

$$M_A(t) = \lfloor \frac{t}{T_A} \rfloor + 1 + M_A(0) \tag{1F}$$

$$M_B(t) = \lfloor \frac{t - T_D}{T_B} \rfloor + 1 + M_B(0) \tag{2F}$$

where  $M_A(0)$  and  $M_B(0)$  are the initial numbers in **cntrA** and **cntrB** at times t < 0, respectively,  $T_D$  is the delay between the first rising edge of clkA and that of clkB, and  $\lfloor X \rfloor$  represents the integer part of X. As time progresses, for some value of *i*, the *i*-th rising edge of clkB precedes that of clkA2, *i.e.*,  $t_{A2(i)} \ge t_{B(i)}$. At this time, the outputs of cntrA and cntrB become equal for a very short amount of time ( $t_{eq}$  in Fig. 5.7(b)). Therefore:

$$M_A(0) + \frac{t}{T_A} = N_i + 1 - \epsilon_A \tag{3F}$$

$$M_B(0) + \frac{t - T_D}{T_B} = N_i + \epsilon_B \tag{4F}$$

where  $N_i$  is the state of cntrA and cntrB, and  $\epsilon_A$  and  $\epsilon_B$  are two very small real numbers. Assuming  $\epsilon_A$  and  $\epsilon_B$  are almost zero at  $t_{eq}$, eliminating  $N_i$  from Eqn. 3F and Eqn. 4F yields:

$$M_A(0) + \frac{t}{T_A} = M_B(0) + \frac{t - T_D}{T_B} + 1 \tag{5F}$$

Therefore,

$$t = \frac{T_D}{T_\Delta} T_A + (M_A(0) - M_B(0) - 1) \frac{T_B}{T_\Delta} T_A \tag{6F}$$

The ratio $\frac{T_D}{T_{\Delta}}$ is the number of cycles it takes for the *i*-th rising edge of clkA2 and that of clkB to match. Therefore, $\frac{T_D}{T_{\Delta}}T_A$ is the total time needed for this edge matching to occur. Since the goal is to have these edges match after $\frac{T_D}{T_{\Delta}}T_A$, the term $(M_A(0) - M_B(0) - 1)\frac{T_B}{T_{\Delta}}T_A$ must be identically zero, *i.e.*,

$$M_A(0) = M_B(0) + 1 \tag{7F}$$

Eqn. 7F shows that **cntrA** must be initialized to a number corresponding to the initial state of **cntrB** plus one.
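The relationship between Eqns. 5F, 6F and 7F can be checked numerically. The Python sketch below is an illustration under stated assumptions: it takes $T_{\Delta} = T_A - T_B$, assumes the clkB term in Eqn. 5F is divided by $T_B$ (as in Eqn. 2F), and uses made-up oscillator periods. The function name `edge_match_time` is hypothetical.

```python
from fractions import Fraction


def edge_match_time(t_d, t_a, t_b, m_a0, m_b0):
    """Time t of Eqn. 6F, assuming T_delta = T_A - T_B."""
    t_delta = t_a - t_b
    return (t_d / t_delta) * t_a + (m_a0 - m_b0 - 1) * (t_b / t_delta) * t_a


# Illustrative values (assumed), in ns; Fractions keep the arithmetic exact.
T_A, T_B, T_D = Fraction(31, 10), Fraction(3), Fraction(1, 2)
M_B0 = 0
M_A0 = M_B0 + 1                      # initialization required by Eqn. 7F

t = edge_match_time(T_D, T_A, T_B, M_A0, M_B0)
cycles = T_D / (T_A - T_B)           # T_D / T_delta

# Both sides of Eqn. 5F, with the clkB term divided by T_B
lhs = M_A0 + t / T_A
rhs = M_B0 + (t - T_D) / T_B + 1
print(t, cycles, lhs == rhs)
```

With these values the edges match after $T_D/T_{\Delta} = 5$ cycles of clkA, *i.e.*, at $t = 15.5$ ns, and both sides of Eqn. 5F agree. Any other choice of $M_A(0)$ shifts $t$ by a multiple of $T_A T_B / T_{\Delta}$.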

### Appendix G

### **Two-Parameter Model for $V_{dd}$-induced Gate Delay Variations**

In this appendix, two different tests are described to validate the two-parameter noise model in Eqns. 3H and 4H. We wish to model the effect of power supply voltage variations  $(V_{dd(e)})$  on gate delay. We denote the resulting delay variation by  $\tau_e$.

#### G.1 Test 1: Single Gate Delay Simulations

Assume the propagation delay variation of a digital gate is given by:

$$\tau_e = \gamma V_e + \rho \tau_0 V_e \tag{1G}$$

where  $\tau_0$  is the static delay of the gate (assuming no power supply noise) and

$$V_e = \frac{1}{\tau_0} \int_{t=t_0}^{t_0+\tau_0} V_{dd(e)}(t)\, dt$$

where  $t_0$  is the time the input of the gate crosses the gate switching threshold. A first test to validate the model in Eqn. 1G is to assume the above model for delay variations of this digital gate, and obtain  $\gamma$  and  $\rho$  by fitting simulation results to the model. The discrepancies between the simulation results and the fitted model will indicate the model accuracy.

To perform the test, as shown in Fig. G.1, a digital gate is loaded with a capacitor to generate different static delays. For each case,  $\tau_e$  is measured through simulation for different values of  $V_e$ . For each value of  $V_e$ , the coefficients  $\gamma V_e$  and  $\rho V_e$  in Eqn. 1G are obtained by fitting the simulation data for different loading values to a straight line in MATLAB. Then, these coefficients are divided by  $V_e$  to estimate  $\gamma$  and  $\rho$  for each value of  $V_e$ . The average value of all these estimated coefficients is used as an estimate of  $\gamma$  and  $\rho$ , respectively.
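The fitting procedure can be sketched as follows. The fits reported here were done in MATLAB; the Python version below is only an illustrative equivalent, and the function names, the static delay values, and the synthetic data (generated from Eqn. 1G itself rather than from circuit simulation) are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_gamma_rho(tau0_values, ve_values, tau_e_rows):
    """Fit tau_e versus tau_0 for each V_e, then average gamma and rho.

    tau_e_rows[j][i] is the delay variation for ve_values[j] and tau0_values[i].
    """
    gammas, rhos = [], []
    for v_e, tau_e in zip(ve_values, tau_e_rows):
        # In Eqn. 1G the slope is rho*V_e and the intercept is gamma*V_e.
        slope, intercept = fit_line(tau0_values, tau_e)
        gammas.append(intercept / v_e)
        rhos.append(slope / v_e)
    return sum(gammas) / len(gammas), sum(rhos) / len(rhos)


# Synthetic data generated from Eqn. 1G (not simulation results).
GAMMA_TRUE, RHO_TRUE = -3.76, -0.166                    # ps/V and 1/V, illustrative
tau0_values = [100.0 + 20.0 * i for i in range(15)]     # static delays in ps (assumed)
ve_values = [-0.165, -0.0825, 0.0825, 0.165]            # V; nonzero, so the division is safe
tau_e_rows = [[GAMMA_TRUE * v + RHO_TRUE * t0 * v for t0 in tau0_values]
              for v in ve_values]

gamma_est, rho_est = estimate_gamma_rho(tau0_values, ve_values, tau_e_rows)
print(gamma_est, rho_est)
```

Because the synthetic data follow the model exactly, the fit recovers the generating coefficients. With simulated delays, the residual of each straight-line fit gives the model error.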



Figure G.1: The test bench for validating the two parameter model for  $V_{dd}$ -induced gate delay variations

| Parameter       | INVX1 (LOW-to-HIGH) | INVX2 (LOW-to-HIGH) | NANDX1 (LOW-to-HIGH) | NANDX2 (LOW-to-HIGH) | INVX1 (HIGH-to-LOW) | INVX2 (HIGH-to-LOW) | NANDX1 (HIGH-to-LOW) | NANDX2 (HIGH-to-LOW) |
|-----------------|--------|--------|--------|--------|--------|--------|--------|--------|
| $\gamma$ (ps/V) | -4.4   | -3.81  | -3.76  | -3.67  | -2.27  | -1.93  | -2.94  | -2.83  |
| $\rho$ (/V)     | -0.135 | -0.153 | -0.166 | -0.172 | -0.219 | -0.222 | -0.217 | -0.218 |
| rms error (%)   | 4.5    | 4.3    | 4.5    | 4.5    | 4.6    | 4.45   | 4.8    | 4.2    |

Table G.1:  $\gamma$  and  $\rho$  estimates and the resulting model errors for four different digital gates

The above test was performed on four different types of gates in a standard digital cell library for a  $0.35 \,\mu m$  CMOS technology: a single-drive inverter; a double-drive inverter; a single-drive 2-input NAND; and a double-drive 2-input NAND gate. The loading capacitances used were 0 to 70 fF in steps of 5 fF. Also,  $V_{dd(nominal)} = 3.3V$  and  $-165 \,mV < V_{dd(e)} < 165 \,mV$, which is equivalent to 10% peak-to-peak variations on  $V_{dd}$. Table G.1 lists the estimated values of  $\gamma$  and  $\rho$  for LOW-to-HIGH and HIGH-to-LOW input transitions for each type of gate. The rms error, given in the last row, is less than 5% in all cases, which indicates that the two-parameter model of Eqn. 1G is adequate to model gate delay variations due to power supply noise.

#### G.2 Test 2: Ring Oscillator Test

A ring oscillator is an efficient circuit to test gate delays. In this section, we validate the two-parameter gate delay variation model by comparing the simulation and modeling results for the output period and accumulative  $V_{dd}$-induced jitter.

Figure G.2: The ring oscillator test bench to validate the two-parameter model for  $V_{dd}$ -induced gate delay variations

The output period of a ring oscillator (Fig. G.2) is a summation of the LOW-to-HIGH and HIGH-to-LOW propagation delays of all the gates in the loop:

$$T = \sum_{i=1}^{2N} (\tau_{g(i)0} + \tau_{g(i)e})$$

where T is the output oscillation period,  $\tau_{g(i)0}$  and  $\tau_{g(i)e}$  are the *i*-th gate's static delay and delay variation, respectively. Therefore, the period jitter  $T_e$  is

$$T_e = \sum_{i=1}^{2N} \tau_{g(i)e}$$

Using the two-parameter model for gate delay variations yields the following for  $T_e$ :

$$T_e = \sum_{i=1}^{2N} \left[ \gamma \left( \frac{1}{\tau_{g(i)0}} \int_{t_i}^{t_i + \tau_{g(i)0}} V_{dd(e)} dt \right) + \rho \tau_{g(i)0} \left( \frac{1}{\tau_{g(i)0}} \int_{t_i}^{t_i + \tau_{g(i)0}} V_{dd(e)} dt \right) \right] \tag{2G}$$

Assuming  $\tau_{g(i)e} \ll \tau_{g(i)0}$ , and also that  $\tau_{g(1)0} = \tau_{g(2)0} = \ldots = \tau_{g(2N)0}$ , Eqn. 2G simplifies to:

$$T_{e} = \left(\frac{\gamma}{\tau_{g(1)0}} + \rho\right) \int_{t'_{i}}^{t'_{i}+T} V_{dd(e)} dt = \kappa \int_{t'_{i}}^{t'_{i}+T} V_{dd(e)} dt \tag{3G}$$

where  $\kappa$  is a constant and  $t'_i$  is the beginning time of the *i*-th period. Note that  $T_e$  in Eqn. 3G is independent of the number of gates N and the output loading of each gate. To validate Eqn. 3G, a ring oscillator consisting of 21 NAND gates in a 0.35  $\mu m$  CMOS digital cell library was simulated with a number of sinusoidal power supply noise waveforms of the form:

$$V_{dd(e)} = V_E \sin(2\pi f t) \tag{4G}$$

for  $V_E = 0.165V$, denoting  $\pm 5\%$  variations, and f = 5 MHz, 50 MHz, 150 MHz, 350 MHz, 650 MHz and 1.1 GHz. A wide range of frequencies has been chosen to validate the model over a large frequency range. Fig. G.3 shows that the derived model, with the parameters obtained previously, matches the simulation results within 2% for all the tested frequencies.

As a by-product of the above analysis, we obtain the peak-to-peak period jitter,  $T_{e(pp)}$, for sinusoidal noise. This gives insight into the jitter behavior of ring oscillators in the presence of power supply noise, which is especially important whenever ring oscillators are used for timing purposes such as in the TDC application. Substituting  $V_{dd(e)}$  from Eqn. 4G in 3G:

$$T_{e(pp)} = 4\kappa \frac{V_E}{\omega} \sin\left(\frac{\omega T}{2}\right)$$

where $\omega = 2\pi f$. The above shows that the upper bound on the jitter at the output of a ring oscillator, $4\kappa V_E/\omega$ in magnitude, decreases as the noise frequency increases.
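The peak-to-peak expression can be cross-checked against Eqn. 3G directly. In the Python sketch below, the values of $\kappa$ and of the oscillation period $T$ are assumed for illustration only, the function names are hypothetical, and absolute values are taken so that the peak-to-peak jitter is reported as a magnitude (this matters because the fitted $\gamma$ and $\rho$ are negative).

```python
import math


def jitter_pp_closed_form(kappa, v_e, f, period):
    """Magnitude of the peak-to-peak period jitter for sinusoidal supply noise."""
    omega = 2.0 * math.pi * f
    return 4.0 * abs(kappa) * v_e / omega * abs(math.sin(omega * period / 2.0))


def jitter_pp_swept(kappa, v_e, f, period, steps=2000):
    """Evaluate Eqn. 3G for many start times t' over one noise cycle; return max - min."""
    omega = 2.0 * math.pi * f
    values = []
    for n in range(steps):
        t0 = n / (steps * f)
        # Exact integral of V_E*sin(omega*t) from t0 to t0 + period
        integral = v_e / omega * (math.cos(omega * t0) - math.cos(omega * (t0 + period)))
        values.append(kappa * integral)
    return max(values) - min(values)


KAPPA = -0.2        # 1/V, assumed for illustration
V_E = 0.165         # V, as in Eqn. 4G
PERIOD = 3e-9       # s, assumed oscillation period

for f in (5e6, 50e6, 150e6, 350e6, 650e6, 1.1e9):
    print(f, jitter_pp_closed_form(KAPPA, V_E, f, PERIOD),
          jitter_pp_swept(KAPPA, V_E, f, PERIOD))
```

The two columns agree to within the resolution of the sweep. At low noise frequencies the jitter approaches $2|\kappa| V_E T$, since the noise is nearly constant over one oscillation period.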



Figure G.3: Ring oscillator period jitter from simulation and the two-parameter model for  $V_{dd}$ -induced gate delay variations: (a) f = 1.1 GHz, (b) f = 350 MHz, (c) f = 50 MHz, (d) f = 5 MHz

### **Appendix H**

### **TDC Power Supply Noise Analysis**

In this appendix, we analyze the effect of  $V_{dd}$ -induced jitter in TDC's Oscillators A and B on the accuracy of the TDC. We show that the differential nature of our TDC eliminates significant parts of the power supply noise.

We denote the error component due to power supply noise by  $E_{PS}$ . From Eqns. 5.23 and 5.21:

$$E_{PS} = \sum_{j=0}^{N-1} \left[ \sum_{i=0}^{2M-1} \left( \tau_{g(i,j)e}^{A} - \tau_{g(i,j)e}^{B} \right) \right] \tag{1H}$$

where  $\tau_{g(i,j)e}^{A}$  and  $\tau_{g(i,j)e}^{B}$  are the propagation delay jitter of the *i*-th gate of Oscillators A and B in the *j*-th period of clkA and clkB, respectively, and  $M = M_{A} = M_{B}$  is the number of gates in oscillators A and B. Defining the index $k = 2Mj + i$, the double summation in Eqn. 1H can be reduced to one summation:

$$E_{PS} = \sum_{k=0}^{2NM-1} (\tau_{g(k)e}^{A} - \tau_{g(k)e}^{B}) \tag{2H}$$

where  $\tau_{g(k)e}^{A}$  and  $\tau_{g(k)e}^{B}$  are the variations of gate delay for the k-th switching event in oscillators A and B, respectively.

To estimate the relationship between  $E_{PS}$  and power supply noise  $V_{dd(e)}$ , gate delay variations,  $\tau_{g(k)e}^{A}$  and  $\tau_{g(k)e}^{B}$ , are modeled as follows:

$$\tau_{g(k)e}^{A} = (\gamma + \rho \tau_{g(k)0}^{A}) V_{(k)e}^{A} \tag{3H}$$

$$\tau_{g(k)e}^{B} = (\gamma + \rho \tau_{g(k)0}^{B}) V_{(k)e}^{B} \tag{4H}$$

where  $\gamma$  and  $\rho$  are constants; and:

$$V_{(k)e}^{A} = \frac{1}{\tau_{g(k)0}^{A}} \int_{t=t_{k}^{A}}^{t_{k}^{A} + \tau_{g(k)0}^{A}} V_{dd(e)}(t)\, dt \tag{5H}$$

$$V_{(k)e}^{B} = \frac{1}{\tau_{g(k)0}^{B}} \int_{t=t_{k}^{B}}^{t_{k}^{B} + \tau_{g(k)0}^{B}} V_{dd(e)}(t)\, dt \tag{6H}$$

where  $t_k^A$  and  $t_k^B$  are the times at which the input switching thresholds are crossed in oscillators A and B, respectively, and  $V_{dd(e)}$  is the difference between the actual  $V_{dd}$  and the nominal noise-free value:  $V_{dd(e)} = V_{dd} - V_{dd(nominal)}$. Note that in this analysis the switching threshold is  $V_{dd}(t_k)/2$. Eqns. 3H and 4H model  $V_{dd}$-induced gate delay variation as a function of two  $V_{dd}$-dependent components, one independent of and the other proportional to the static delay of the gate. To validate this model, two different tests have been performed. Details are described in Appendix G.

From Fig. 5.7, for the interval t = 0 to  $t = T_d$, only oscillator A oscillates. Assume  $N_1$  switching events of oscillator A occur during this interval. We split our noise analysis into two parts, one due to switching events from  $t = t_0 = 0$  to  $t = t_d = T_d$, and the second due to switching events from  $t = t_d$  to  $t = t_{eoc} = T_{EOC}$, where  $t_{eoc}$  (EOC: end of conversion) is the time the TDC requires to terminate the measurement (the 2*MN*-th switching events in Osc-A and Osc-B occur at the same time). Therefore:

$$E_{PS} = E_{PS(a)} + E_{PS(b)} \tag{7H}$$

where:

$$E_{PS(a)} = \sum_{k=0}^{N_1-1} \tau_{g(k)e}^A \tag{8H}$$

$$E_{PS(b)} = \sum_{k=N_1}^{2MN-1} \tau_{g(k)e}^A - \sum_{k=0}^{2MN-1} \tau_{g(k)e}^B \tag{9H}$$

Assuming that the power supply noise in the  $[0, t_d]$  window is independent of the noise during interval  $[t_d, t_{eoc}]$ , the total rms noise is:

$$\sigma_{PS} = \sqrt{\sigma_{PS(a)}^2 + \sigma_{PS(b)}^2}$$

To determine  $E_{PS(a)}$ , we use the analysis technique explained in Sec. G.2:

$$E_{PS(a)} = \kappa \int_{t=0}^{T_d} V_{dd(e)}(t)\, dt \tag{10H}$$

where  $\kappa$  is a constant (see Sec. G.2). To obtain  $\sigma_{E_{PS(b)}}^2$, we substitute  $\tau_{g(k)e}^A$  and  $\tau_{g(k)e}^B$  from 3H and 4H into 9H to yield:

$$E_{PS(b)} = \Gamma_1 + \Gamma_2 \tag{11H}$$

where:

$$\Gamma_{1} = \gamma \left[\sum_{k=N_{1}}^{2MN-1} \frac{1}{\tau_{g(k)0}^{A}} \int_{t=t_{k}^{A}}^{t_{k}^{A}+\tau_{g(k)0}^{A}} V_{dd(e)}(t)\, dt - \sum_{k=0}^{2MN-1} \frac{1}{\tau_{g(k)0}^{B}} \int_{t=t_{k}^{B}}^{t_{k}^{B}+\tau_{g(k)0}^{B}} V_{dd(e)}(t)\, dt\right] \tag{12H}$$

$$\Gamma_{2} = \rho \left[\sum_{k=N_{1}}^{2MN-1} \int_{t=t_{k}^{A}}^{t_{k}^{A}+\tau_{g(k)0}^{A}} V_{dd(e)}(t)\, dt - \sum_{k=0}^{2MN-1} \int_{t=t_{k}^{B}}^{t_{k}^{B}+\tau_{g(k)0}^{B}} V_{dd(e)}(t)\, dt\right] \tag{13H}$$

Since  $t_{N_1}^A = t_0^B = t_d$ :

$$\sum_{k=N_1}^{2MN-1} \int_{t=t_k^A}^{t_k^A + \tau_{g(k)0}^A} V_{dd(e)}(t)\, dt = \sum_{k=0}^{2MN-1} \int_{t=t_k^B}^{t_k^B + \tau_{g(k)0}^B} V_{dd(e)}(t)\, dt = \int_{t=t_d}^{t_{eoc}} V_{dd(e)}(t)\, dt$$

Therefore:

$$\Gamma_2 = 0 \tag{14H}$$

Assume that all the gates in oscillator A have the same static delay, and similarly for the gates in oscillator B, *i.e.*,  $\tau_{g(k)0}^A = \tau_{A0}$  for k = 0, ..., 2MN - 1 and  $\tau_{g(k)0}^B = \tau_{B0}$  for k = 0, ..., 2MN - 1. From  $t = t_d$  to  $t = t_{eoc}$ , there are  $2MN - N_1$  and 2MN switching events in oscillators A and B, respectively:

$$\tau_{A0} = \frac{t_{eoc} - t_d}{2MN - N_1} \tag{15H}$$

$$\tau_{B0} = \frac{t_{eoc} - t_d}{2MN} \tag{16H}$$

where  $N_1$  is the number of switching events occurring in oscillator A from t = 0 to  $t = T_d$ . Therefore:

$$\begin{aligned} \Gamma_{1} &= -\gamma N_{1} \left(\frac{1}{t_{eoc} - t_{d}}\right) \int_{t=t_{d}}^{t_{eoc}} V_{dd(e)}(t)\, dt \\ &= -\gamma N_{1} \overline{V_{dd(e)}(t_{d}, t_{eoc})} \end{aligned} \tag{17H}$$

where  $\overline{V_{dd(e)}(t_d, t_{eoc})}$  is the average of  $V_{dd(e)}(t)$  over the time interval  $[t_d, t_{eoc}]$ .

From Eqns. 7H, 10H, 11H, 14H and 17H, the following relationship follows for  $E_{PS}$:

$$E_{PS} = \kappa \int_{t=0}^{T_d} V_{dd(e)}(t)\, dt - \gamma N_1 \overline{V_{dd(e)}(t_d, t_{eoc})} \tag{18H}$$

Since  $T_d = N_1 \tau_{A0}$:

$$E_{PS} = T_d\left[\kappa \overline{V_{dd(e)}(t_0, t_d)} - \frac{\gamma}{\tau_{A0}} \overline{V_{dd(e)}(t_d, t_{eoc})}\right] \tag{19H}$$
(19H)

where  $\overline{V_{dd(e)}(t_0, t_d)}$  is the average of  $V_{dd(e)}$  over the interval  $[t_0, t_d]$ . Eqn. 19H shows that the  $V_{dd}$ -induced inaccuracy in the ring oscillators is proportional to the time interval being measured, *i.e.*,  $T_d$ .
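Eqn. 19H can be evaluated numerically to see how much of the supply noise the differential measurement cancels. The Python sketch below is illustrative: it assumes $\kappa = \gamma/\tau_{A0} + \rho$ (Eqn. 3G applied to oscillator A), borrows $\gamma$ and $\rho$ from the NANDX1 LOW-to-HIGH column of Table G.1, and uses assumed values for $\tau_{A0}$, $T_d$ and the noise averages. The function name `tdc_supply_error` is hypothetical.

```python
def tdc_supply_error(t_d, gamma, rho, tau_a0, v_avg_before, v_avg_after):
    """E_PS from Eqn. 19H, taking kappa = gamma/tau_A0 + rho as in Eqn. 3G.

    v_avg_before: average V_dd(e) over [t_0, t_d]
    v_avg_after:  average V_dd(e) over [t_d, t_eoc]
    """
    kappa = gamma / tau_a0 + rho
    return t_d * (kappa * v_avg_before - (gamma / tau_a0) * v_avg_after)


GAMMA, RHO = -3.76, -0.166   # ps/V and 1/V (NANDX1, LOW-to-HIGH, Table G.1)
TAU_A0 = 100.0               # ps, assumed static gate delay
T_D = 1000.0                 # ps, assumed interval being measured

# Same average noise before and after t_d: the gamma terms cancel.
e_dc = tdc_supply_error(T_D, GAMMA, RHO, TAU_A0, 0.1, 0.1)
# Noise present only before t_d: nothing cancels.
e_early = tdc_supply_error(T_D, GAMMA, RHO, TAU_A0, 0.1, 0.0)
print(e_dc, e_early)
```

Under these assumptions, a supply offset that is the same over both intervals leaves only the $\rho$ term, $E_{PS} = T_d\,\rho\,\overline{V_{dd(e)}}$ (about -16.6 ps here), whereas noise confined to $[t_0, t_d]$ contributes the full $\kappa$ term (about -20.4 ps).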