# HIGH-SPEED OPTOELECTRONIC LINKS FOR DATACENTERS

by

### Abdelrahman Hesham Elsayed Ahmed

B.Sc., Alexandria University, Egypt, 2012

M.Sc., The American University in Cairo, Egypt, 2014

### A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF

#### THE REQUIREMENTS FOR THE DEGREE OF

#### DOCTOR OF PHILOSOPHY

in

#### THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES

(Electrical and Computer Engineering)

#### THE UNIVERSITY OF BRITISH COLUMBIA

(Vancouver)

March 2020

©Abdelrahman Hesham Elsayed Ahmed, 2020

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the dissertation entitled:

High-Speed Optoelectronic Links for Datacenters

submitted by Abdelrahman H. Ahmed in partial fulfillment of the requirements for

the degree of Doctor of Philosophy

in Electrical and Computer Engineering

#### **Examining Committee:**

Sudip Shekhar

Supervisor

Shahriar Mirabbasi

Supervisory Committee Member

Lukas Chrostowski

Supervisory Committee Member

Edmond Cretu

University Examiner

Joshua Folk

University Examiner

### Abstract

Optoelectronic (O/E) links are necessary for inter/intra-datacenter communication. Driven by the need to support higher data throughput, breakthroughs in Silicon-photonics and innovative circuit techniques are needed to enable efficient, compact, and low-cost links across a wide range of interconnect lengths.

For short-reach applications, where energy efficiency is a major concern, microring resonator (MRR)-based transmitters (TXs) promise low cost and dense wavelength multiplexing, making them candidates to replace their vertical-cavity surface-emitting laser (VCSEL)-based counterparts. This thesis analyzes MRR-based links from the perspectives of optical devices, circuits, and link budget, and compares them to VCSEL-based links.

On the receiver (RX) side, sensitivity enhancement is necessary to improve the link's energy efficiency. Owing to their multiplication gain, avalanche photodetectors (APDs) improve RX sensitivity, and when implemented monolithically with the RX, they reduce cost and parasitics. An RX with a noise-canceling active balun is presented; it operates as part of the APD bias-stabilization loop. The integrated 10 Gb/s O/E-RX achieves a measured sensitivity of -18.8 dBm at 0.57 pJ/b.
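The two figures of merit above follow from simple arithmetic. Energy efficiency is the RX power divided by the data rate. Using the 5.7 mW and 10 Gb/s listed for this receiver in the Preface gives 0.57 pJ/b. A sensitivity in dBm converts to absolute optical power as $P = 10^{P_\mathrm{dBm}/10}$ mW. The helper function names below are illustrative only. The sketch also does not state whether the quoted sensitivity is an average power or an OMA value.

```python
def energy_per_bit_pj(power_w: float, bitrate_bps: float) -> float:
    """Energy efficiency in pJ/b: consumed power divided by data rate."""
    return power_w / bitrate_bps * 1e12

def dbm_to_uw(p_dbm: float) -> float:
    """Convert an optical power in dBm (referenced to 1 mW) to microwatts."""
    return 10 ** (p_dbm / 10) * 1e3

# Values from the APD receiver publication listed in the Preface
power_w = 5.7e-3        # 5.7 mW
bitrate_bps = 10e9      # 10 Gb/s
sensitivity_dbm = -18.8

print(f"{energy_per_bit_pj(power_w, bitrate_bps):.2f} pJ/b")  # 0.57 pJ/b
print(f"{dbm_to_uw(sensitivity_dbm):.1f} uW")                 # 13.2 uW
```

A sensitivity of -18.8 dBm therefore corresponds to roughly 13 µW of optical power at the receiver input.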

A high-sensitivity, high-speed, and low-power RX demands solutions to the gain-bandwidth-power trade-offs. Accordingly, a current-mode receiver that eliminates the noisy and power-hungry front-end is proposed. The proposed design converts the single-ended PD current into differential currents and resolves the data using a current-based sense amplifier.

For long-reach applications, where spectral efficiency is critical, coherent O/E links rely on advanced modulation formats and dual-polarization, which impose stringent link requirements. The TX requires high bandwidth (BW), linearity, swing, and reliability, while the RX must minimize noise and total harmonic distortion (THD) across gain settings and frequencies.

A linear high-swing driver for a Mach-Zehnder modulator (MZM) is presented. The driver uses a voltage-breakdown enhancement technique to ensure reliability and a resistor-based capacitor-splitting technique to enhance BW. It achieves a $6\,V_{ppd}$ swing, 3.6% THD, and >40 GHz BW, enabling >0.5 Tb/s/wavelength operation.
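This sketch shows how line rate, constellation, and polarization relate. An $M$-point constellation carries $\log_2 M$ bits per symbol per polarization, so a dual-polarization (DP) link carries $2\log_2 M$ bits per symbol. The line rates below are taken from the List of Figures. The sketch assumes they are raw line rates and does not account for any coding overhead. The function name is hypothetical.

```python
import math

def symbol_rate_gbd(line_rate_gbps: float, constellation_size: int,
                    polarizations: int = 2) -> float:
    """Symbol rate (GBd) implied by a line rate, assuming log2(M) bits/symbol per polarization."""
    bits_per_symbol = math.log2(constellation_size) * polarizations
    return line_rate_gbps / bits_per_symbol

# Line rates and formats as listed in the List of Figures (Chapter 5)
cases = {
    "272 Gb/s DP-16QAM": (272, 16),
    "552 Gb/s DP-16QAM": (552, 16),
    "408 Gb/s DP-64QAM": (408, 64),
    "256 Gb/s DP-QPSK": (256, 4),
}
for name, (rate, m) in cases.items():
    print(f"{name}: {symbol_rate_gbd(rate, m):.0f} GBd")
```

For example, 552 Gb/s DP-16QAM carries 8 bits per symbol and so corresponds to 69 GBd. This shows why the driver BW and linearity, not only its swing, set the achievable rate per wavelength.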

An auto-reconfigurable transimpedance amplifier (TIA) that satisfies the stringent noise-linearity requirements is presented. Operating on a single sense voltage, it reduces base-resistor noise, gain peaking, and phase-margin and $f_T$ degradation. Techniques such as collaborative offset and DC current cancellation are also described. The RX achieves a gain of 75.5 dBΩ and an input-referred noise of 18.5 pA/$\sqrt{\mathrm{Hz}}$ at 42 GHz BW, enabling >0.5 Tb/s/wavelength operation.
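The TIA numbers above can be expressed in more familiar units. A gain in dBΩ is $20\log_{10}(|Z_T|/1\,\Omega)$, so 75.5 dBΩ is roughly 5.96 kΩ. Multiplying the input-referred noise density by the square root of the bandwidth gives an order-of-magnitude estimate of the RMS input noise current. This assumes a flat noise density and a brick-wall bandwidth, which a real TIA does not have, so the result is a rough check rather than a measured integrated noise. Both helper names are hypothetical.

```python
import math

def dbohm_to_ohm(gain_dbohm: float) -> float:
    """Transimpedance magnitude in ohms from a gain in dB-ohm (20*log10(|Z_T| / 1 ohm))."""
    return 10 ** (gain_dbohm / 20)

def integrated_noise_ua(irn_pa_per_rthz: float, bw_hz: float) -> float:
    """RMS input noise current in uA, assuming a flat noise density over a brick-wall BW."""
    return irn_pa_per_rthz * 1e-12 * math.sqrt(bw_hz) * 1e6

print(f"Z_T ~ {dbohm_to_ohm(75.5):.0f} ohm")                  # ~5957 ohm
print(f"i_n,rms ~ {integrated_noise_ua(18.5, 42e9):.2f} uA")  # ~3.79 uA
```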

# Lay Summary

Optoelectronic links are necessary for inter/intra-datacenter communication. Driven by the need to support higher data throughput, breakthroughs in photonics and circuits are needed to enable efficient, compact, and low-cost links across a wide range of interconnect lengths.

For short-reach applications where a multitude of links are used inside a datacenter, the energy efficiency of the link is the primary concern. This dissertation discusses the different transmitter and receiver designs to achieve that goal.

For long-reach applications, where link installation is expensive, maximizing the amount of data each link can carry is the main goal. Coherent optical links use advanced techniques to multiplex different data streams on the same link, increasing the link's data rate but imposing stringent requirements on the link components. This dissertation discusses various design approaches for these components to achieve the targeted performance.

### Preface

The content of this dissertation is mostly based on the publications listed below, which resulted from collaboration with other researchers, under the supervision of Professor Sudip Shekhar, and with partial support from Elenion Technologies, New York, NY, USA.

 A. H. Ahmed, A. Sharkia, B. Casper, S. Mirabbasi and S. Shekhar, "Silicon-Photonics Microring Links for Datacenters—Challenges and Opportunities," in IEEE Journal of Selected Topics in Quantum Electronics, vol. 22, no. 6, pp. 194-203, Nov.-Dec. 2016.

A. H. Ahmed and A. Sharkia worked on surveying the optical devices used in the analysis. A. H. Ahmed worked on the mathematical analyses and scripts with help from S. Shekhar. A. H. Ahmed wrote the paper with the help of the co-authors. S. Shekhar supervised the project.

S. Nayak, A. H. Ahmed, A. Sharkia, A. S. Ramani, S. Mirabbasi and S. Shekhar, "A 10-Gb/s -18.8 dBm Sensitivity 5.7 mW Fully-Integrated Optoelectronic Receiver With Avalanche Photodetector in 0.13-μm CMOS," IEEE Trans. Circuits Syst. I, vol. 66, no. 8, pp. 3162-3173, Aug. 2019.

S. Nayak worked on the APD design and the control loop. A. H. Ahmed worked on the high-speed path design. A. Sharkia worked on the voltage booster design. S. Nayak, A. H. Ahmed, A. S. Ramani, and S. Shekhar worked on measuring different parts of the system. S. Nayak and S. Shekhar wrote the paper with help from the co-authors. S. Shekhar supervised the project.

A. H. Ahmed et al., "A 6V Swing 3.6% THD >40GHz Driver with 4.5× Bandwidth Extension for a 272Gb/s Dual-Polarization 16-QAM Silicon Photonic Transmitter," IEEE Int. Solid-State Circuits Conf. (ISSCC), 2019, pp. 484-486.

A. H. Ahmed worked on the high-swing output stage design, with help from colleagues at Elenion. A. Elmoznine assisted with the pre-driver design, D. Lim assisted with system verification, K. Padmaraju, T. Huynh and J. Roman provided test support, Y. Ma, Y. Liu, R. Shi, M. Streshinsky, A. Novack and R. Ding provided the optical transmitter and packaging support, C. Williams and L. Vera provided layout support, and R. Younce and R. Sukkar provided system analysis support. A. H. Ahmed wrote the papers with the help of S. Shekhar and A. Rylyakov.

The current-mode receivers were developed by A. H. Ahmed, under the supervision of Professor S. Shekhar.

# **Table of Contents**

|       | Title |
|-------|-------|
|       | Abstract |
|       | Lay Summary |
|       | Preface |
|       | Table of Contents |
|       | List of Tables |
|       | List of Figures |
|       | List of Abbreviations |
|       | Acknowledgments |
|       | Dedication |
| 1     | Introduction |
| 1.1   | Communication inside a datacenter |
| 1.2   | Datacenter-to-datacenter communication |
| 1.3   | Dissertation contribution and organization |
| 2     | Short-reach optoelectronic links |
| 2.1   | Optoelectronic link: receiver |
| 2.1.1 | Photodiode |
| 2.1.2 | TIA and RX Front-End |
| 2.1.3 | Noise Analysis of a Two-Stage RX Front-End |
| 2.2   | Optoelectronic link: transmitter |
| 2.2.1 | MRR-based transmitter |
| 2.2.2 | VCSEL-Based transmitter |
| 2.3   | Link budgets and systems evaluation |
| 2.4   | Towards power-efficient Tb/s links |
| 2.5   | Summary |
| 3     | Avalanche photodetector-based receiver |
| 3.1   | Introduction and motivation |
| 3.2   | Avalanche photodetector |
| 3.2.1 | Responsivity and link budget |
| 3.3   | |
| 3.4   | System implementation and measurements |
| 3.5   | Summary |
| 4     | Current-mode receiver |
| 4.1   | Introduction and motivation |
| 4.2   | Proposed current-mode receiver |
| 4.3   | System implementation and measurements |
| 5     | Silicon-Photonic coherent transmitter |
| 5.1   | Introduction |
| 5.2   | Coherent Silicon-photonics transmitter |
| 5.3   | Mach-Zehnder modulator driver |
| 5.3.1 | Input buffer |
| 5.3.2 | Variable gain amplifier |
| 5.3.3 | Driver output stage |
| 5.4   | System implementation and measurements |
| 5.5   | Summary |
| 6     | Silicon-Photonic coherent receiver |
| 6.1   | Introduction |
| 6.2   | The proposed Auto-reconfigurable receiver |
| 6.2.1 | Auto-reconfigurable Transimpedance Amplifier |
| 6.2.2 | Collaborative offset and DC cancellation |
| 6.2.3 | Automatic/Manual gain control |
| 6.3   | System implementation and measurements |
| 6.4   | Summary |
| 7     | Conclusion and future work |
| 7.1   | Conclusion |
| 7.2   | Future work |
|       | Bibliography |

# **List of Tables**

| Table | Title |
|-------|-------|
| 2.1 | Best-in-class photodetectors |
| 2.2 | Best-in-class interferometric modulators |
| 2.3 | Best-in-class MRR modulators |
| 2.4 | Best-in-class CW lasers |
| 2.5 | Best-in-class MM VCSELs |
| 2.6 | MRR-based link characteristics |
| 2.7 | VCSEL-based link characteristics |
| 3.1 | Performance summary and comparison to 850 nm linear CMOS TIAs |
| 4.1 | Performance summary and comparison to state-of-the-art clocked TIAs |
| 5.1 | Performance summary of the high-swing linear driver and comparison to prior art |
| 5.2 | E/O TX assembly performance summary |
| 6.1 | Performance summary and comparison |
| 7.1 | PAM-4-current-mode receiver schematic performance summary and comparison to state-of-the-art |

# **List of Figures**

| Figure | Caption |
|--------|---------|
| 1.1 | Optical links for inter/intra-datacenter communication. |
| 1.2 | 25 Gb/s per lane I/O technology as a function of interconnect length for datacenter applications. |
| 1.3 | Coherent transceivers: from LiNbO<sub>3</sub> + III-V to all-Si/SiGe platform |
| 2.1 | A typical O/E RX consisting of a PD, TIA, MA(s) and an SA |
| 2.2 | RX sensitivity vs. input capacitance for different data rates |
| 2.3 | RX sensitivity vs. input capacitance for different values of $C_F$ |
| 2.4 | (a) MRR-based link. (b) MRR characteristics. (c) VCSEL-based link |
| 2.5 | MRR-based link efficiency degradation for CW laser WPE of 1.3% and 12% (no significant difference in performance between 10 m and 100 m SMF) |
| 2.6 | VCSEL-based link efficiency degradation for 10 m, 100 m, and 300 m MMF (a dispersion penalty of 0 dB is assumed for the 10 m and 100 m links, and 5 dB for the 300 m link) |
| 2.7 | Energy efficiency vs. RX sensitivity for TX, RX and the overall VCSEL-based and MRR-based links (100 m), excluding any clock and data recovery or clocking |
| 2.8 | Energy efficiency breakdown vs. data rate for VCSEL-based and MRR-based links (100 m), excluding any clock and data recovery or clocking |
| 2.9 | MRR-based link energy efficiency vs. the CW laser WPE |
| 2.10 | A wavelength division multiplexing system with MRR-based links |
| 3.1 | A 10 Gb/s APD O/E RX in 0.13-$\mu$m CMOS process |
| 3.2 | A differential amplifier to convert single-ended TIA output to differential signals for the MAs |
| 3.3 | A noise-canceling active balun to convert single-ended TIA output to differential signals for the MAs |
| 3.4 | Simulated noise referred to TIA input to compare differential amplifier and noise-canceling active balun when both have the same gain and BW |
| 3.5 | Die micrograph of the APD O/E RX in 0.13-$\mu$m CMOS |
| 3.6 | Power breakdown based on post-layout simulations (with measured power consumption of 5.52 mW) |
| 3.7 | Measured optical eye diagram at 10 Gb/s |
| 3.8 | Measured BER vs. RX sensitivity at 10 Gb/s |
| 4.1 | (a) $R_B/C_B$ combination used for single-ended to differential conversion. (b) $R_B/C_B$ in the proposed design. (c) $R_B/C_B$ in [101] |
| 4.2 | Full-rate current-mode receiver |
| 4.3 | Quarter-rate current-mode receiver |
| 4.4 | Proposed current-SA |
| 4.5 | Clock phases, and current-SA phases of operation |
| 4.6 | Proposed current-SA: Phase1-reset |
| 4.7 | Proposed current-SA: Phase2-integrating the input on the output capacitors |
| 4.8 | Proposed current-SA: Phase3-Decision, and regeneration phase |
| 4.9 | (a) The current-SA during phase 2 and (b) its simplified equivalent circuit |
| 4.10 | Current-mode receiver die micrograph |
| 4.11 | Current-mode receiver measurement setup |
| 4.12 | Current-mode receiver measurement: BER vs. input current |
| 5.1 | A Dual-Polarization (DP) QPSK/QAM coherent TX |
| 5.2 | The main building block of a coherent TX: a driver and a null-biased MZM |
| 5.3 | Null-biased MZM operation to obtain a PSK modulated signal |
| 5.4 | Mach-Zehnder modulator driver block diagram |
| 5.5 | Input buffer schematic |
| 5.6 | VGA schematic |
| 5.7 | (a) A conventional cascode output stage. (b) A voltage breakdown doubling (VBD) output stage. (c) VBD limitations. |
| 5.8 | (a) Main and auxiliary paths connect to the same driving point. (b) Resistor-based capacitor splitting. (c) BW extension due to resistor-based capacitor splitting. (d) BW extension ratio vs. r/R |
| 5.9 | Output stage design evolution, changes in the driver transfer function and THD enhancement: Voltage break-down doubler $\rightarrow$ Capacitor-splitting $\rightarrow$ emitter-degeneration $\rightarrow$ emitter-degeneration and pre-emphasis |
| 5.10 | Large signal analysis for the output stage: (a) Effect on the output based on matching the timing between the two paths. (b) Dynamic base currents |
| 5.11 | The complete output stage of the MZM driver |
| 5.12 | Die photos of the Silicon-photonic IC and driver chips alongside the flip-chipped assembly |
| 5.13 | Electrical single-ended S21 of the driver |
| 5.14 | Electrical single-ended S11 of the driver |
| 5.15 | E/O S21 of the flip-chipped assembly |
| 5.16 | (a) ROSNR test setup. (b) ROSNR performance of this work compared to a setup comprising a III-V driver and LiNbO<sub>3</sub> modulator |
| 5.17 | Breakdown of the power consumed in the driver |
| 5.18 | Constellation for 272 Gb/s DP-16QAM |
| 5.19 | Constellation for 256 Gb/s DP-QPSK |
| 5.20 | Optical modulation analyzer-recovered eye diagrams for (a) 272 Gb/s DP-16QAM, and (b) 256 Gb/s DP-QPSK |
| 5.21 | Constellation for 552 Gb/s DP-16QAM |
| 5.22 | Constellation for 408 Gb/s DP-64QAM |
| 6.1 | A dual-polarization QAM Silicon-photonic O/E RX |
| 6.2 | IRN, THD, and ROSNR vs. $I_{IN}$ for different RX designs |
| 6.3 | TIA design for improving THD across different frequencies for low-gain settings: $g_m$-control $\rightarrow$ $Q_1$ reconfigurability $\rightarrow$ $R_L$-control |
| 6.4 | $Q_1$-reconfigurability for optimizing linearity at large $I_{IN}$ and noise at small $I_{IN}$ |
| 6.5 | Block level diagram of a single channel RX along with the details for Collaborative offset and $I_{DC}$ cancellation (COIDCC) and the gain control |
| 6.6 | Critical signals generated from the control block to control the TIA and VGAs as $V_{GC}$ changes |
| 6.7 | Measured $Z_T$ at max. and min. gain |
| 6.8 | Measured IRN vs. $Z_T$ |
| 6.9 | Measured THD at different frequencies and inputs |
| 6.10 | Measured input DC voltage vs. the injected DC current |
| 6.11 | Die micrographs depicting the O/E RX: Silicon-photonic IC along with $2 \times 2$ RX ICs, and a zoom-in on the RX IC |
| 6.12 | Measured constellations for 528 Gb/s DP-16QAM O/E RX |
| 7.1 | Input currents and output voltages of an NRZ-current-mode receiver for different PAM-4 symbols |
| 7.2 | Input currents and output voltages of an NRZ-current-mode receiver for different PAM-4 symbols |

# **List of Abbreviations**

| Abbreviation | Definition |
|--------------|------------|
| ADC | Analog to digital converter |
| APD | Avalanche photodetector |
| BER | Bit error rate |
| BV | Breakdown voltage |
| BW | Bandwidth |
| CW | Continuous-wave |
| DAC | Digital to analog converter |
| DCOC | DC offset cancellation |
| DP | Dual-Polarization |
| DSP | Digital signal processing |
| ER | Extinction ratio |
| ESD | Electrostatic discharge |
| IDCC | $I_{DC}$ cancellation |
| IL | Insertion loss |
| IRN | Input-referred noise |
| LSB | Least significant bit |
| MA | Main amplifier |
| ML | Modulation loss |
| MM | Multi-Mode |
| MMF | Multi-Mode fiber |
| MRR | Microring resonator |
| MSB | Most significant bit |
| MZI | Mach-Zehnder interferometer |
| MZM | Mach-Zehnder modulators |
| NRZ | Non-return to zero |
| O/E | Optoelectronic |
| OMA | Optical modulation amplitude |
| OSNR | Optical signal-to-noise ratio |
| PAM | Pulse amplitude modulation |
| PBSR | Polarization beam splitter and rotator |
| PD | Photodetector |
| PSK | Phase shift keying |
| QAM | Quadrature-amplitude modulation |
| QPSK | Quadrature phase shift keying |
| ROSNR | Required optical signal-to-noise ratio |
| RX | Receiver |
| SA | Sense-amplifier |
| SiPh | Silicon-photonics |
| SM | Single-Mode |
| SMF | Single-Mode fiber |
| SNR | Signal-to-noise ratio |
| SP | Single-polarization |
| THD | Total harmonic distortion |
| TIA | Transimpedance amplifier |
| TX | Transmitter |
| VCSEL | Vertical-cavity surface-emitting laser |
| VGA | Variable gain amplifier |
| WPE | Wall-plug efficiency |

## Acknowledgments

I would like to extend my endless gratitude to all those who helped me through my PhD program. First, I would like to thank my PhD advisor and mentor, Professor Sudip Shekhar, for his guidance and hard work throughout this journey.

I would also like to thank my doctoral committee: Professors Shahriar Mirabbasi and Lukas Chrostowski. I thank Professors Nicolas Jaeger, Edmond Cretu, Joshua Folk, and Guangrui Xia for being part of my PhD examination committee.

I would like to show my gratitude to everyone at Elenion Technologies, and special thanks to Michael Hochberg, Alexander Rylyakov, and Leonardo Vera.

I am deeply thankful to my wonderful friends (A-Z): Abdelsalam Ahmed, Nour Seif, Ramy Tadros, Sherif Hussein, and Seif Ahmed for being there for me throughout the years.

Above all, a special thanks to my beloved family. Words cannot express how grateful I am to my mother, father, and sisters for their endless love, support, and patience.

Finally, I would like to thank the whole System-on-Chip (SoC) team, Professors, researchers, and administrative staff. They have been great people to work with, live with, and have fun with.

To my beloved family ...

# Chapter 1

# Introduction

Driven by rapidly progressing technologies such as entertainment platforms, smartphones, and the internet of things, global data storage and traffic have increased exponentially in the past decade [1]. Thanks to breakthroughs in photonics and electronic circuit techniques, optoelectronic (O/E) links have revolutionized inter- and intra-datacenter communication (Figure 1.1), enabling efficient, compact, and low-cost links across a wide range of applications and reaches. Inside a datacenter, where data is transported locally through thousands of communication links, highly efficient O/E links are essential. On the other hand, since communication between datacenters relies on fewer but more expensive optical links, maximizing the spectral utilization of each of those links is of utmost importance.

Figure 1.1: Optical links for inter/intra-datacenter communication.

### 1.1 Communication inside a datacenter<sup>1</sup>

The insatiable demand for data storage and communication has resulted in the rapid expansion of datacenters to massive warehouse proportions, necessitating optical interconnects between servers and racks that span distances from a few meters to a few kilometers [3]. A significant portion of the operating cost of a datacenter is attributed to its large electrical power consumption, and a notable portion of this power, in turn, is dissipated in the high-speed interconnects. Reducing the power consumption of the interconnects therefore leads to more efficient datacenters with lower operating costs and smaller carbon footprints. Several published works have studied the viability of building high-bandwidth (BW), Silicon-photonics-based optical interconnects, addressing the issue at the optical [4–6], electrical [7–9], and system levels [2, 10]. At the optical level, the development of area- and power-efficient, low-loss, low-cost, and CMOS-compatible components is rapidly underway. At the circuit level, highly sensitive receivers (RXs) and power-efficient transmitters (TXs) are two major areas of research. A system-level study is as important as the design of each block separately: by considering the link as a whole and carefully partitioning the roles of the different subsystems, the ever-increasing performance demands can be met without over-designing every single block in the system.

Figure 1.2 shows the scope of different I/O technologies for datacenter applications as a function of interconnect length at a data rate of 25 Gb/s. For short reaches (< 100 m), multi-mode (MM) vertical-cavity surface-emitting lasers (VCSELs) coupled to MM fibers (MMFs) are prevalent. Beyond 100 m, even the higher-BW OM4 MMFs suffer from significant loss. Due to modal and chromatic dispersion, along with mode partition noise from the MM VCSEL [10], a severe power penalty (> 5 dB) is predicted for a 300 m channel at a BER of $10^{-12}$ [11].

<sup>1</sup>© IEEE. Reprinted, with permission, from [2].

Figure 1.2: 25 Gb/s per lane I/O technology as a function of interconnect length for datacenter applications.

The reach of MMF may be extended to a few hundred meters using single-mode (SM) or quasi-SM VCSELs [12]. SM VCSELs are also being developed for use with SM fibers (SMFs) to extend the reach further [13]. Alternatively, a microring resonator (MRR)-based Silicon-photonic link presents itself as a strong candidate for these distances. Leveraging CMOS technology and its manufacturing capabilities, Silicon-photonic links with SMFs offer superior energy efficiency for medium-reach interconnects. Moreover, given the advances made in Silicon-photonics technology over the last decade, MRR-based links are expected to compete with, and possibly replace, VCSEL-based links even for lengths < 100 m.

### 1.2 Datacenter-to-datacenter communication

The installation of optical fibers for long-haul and metro links is expensive; therefore, maximizing the throughput on a given fiber is always desired. Even for datacenter connectivity, there is now a need to support sub-Tb/s data rates per wavelength on a fiber. Optical links supporting only intensity modulation and wavelength division multiplexing do not fully utilize all possible dimensions of data modulation. By modulating the electric field (amplitude and phase) rather than only the intensity, with formats such as quadrature-amplitude modulation (QAM), and by using both polarizations of the same wavelength (dual-polarization, DP), coherent optical links offer high spectral efficiency per wavelength [14]. By using a laser at the RX as a local oscillator (LO) for mixing, a coherent RX also offers better sensitivity than its direct-detection counterpart in an intensity-modulation link, which can extend the unrepeated transmission distance of a coherent link [14, 15].
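The gain in spectral efficiency can be made concrete with simple arithmetic: the raw line rate per wavelength is the symbol rate times the bits per symbol ($\log_2 M$ for an $M$-point constellation) times the number of polarizations. The sketch below is illustrative only; the function name and the 32 GBd symbol rate are hypothetical choices, and FEC and framing overheads are ignored.

```python
import math


def raw_line_rate_gbps(baud_gbd: float, constellation_size: int, polarizations: int = 1) -> float:
    """Raw line rate in Gb/s: symbol rate x log2(M) x number of polarizations.

    Ignores FEC and framing overhead; M must be a power of two.
    """
    if constellation_size < 2 or constellation_size & (constellation_size - 1):
        raise ValueError("constellation size must be a power of two >= 2")
    return baud_gbd * math.log2(constellation_size) * polarizations


# Hypothetical 32 GBd symbol rate; only the modulation dimensions change
for name, m, pol in [("NRZ", 2, 1), ("SP-QPSK", 4, 1), ("DP-QPSK", 4, 2), ("DP-16QAM", 16, 2)]:
    print(f"{name:9s} {raw_line_rate_gbps(32, m, pol):6.1f} Gb/s")
```

At the same symbol rate (and hence roughly the same optical bandwidth), DP-16QAM carries eight times the bits of NRZ, which is the per-wavelength advantage that motivates coherent links.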

Traditionally, high-performance coherent optical transceivers (Figure 1.3) have relied on discrete assemblies of expensive materials, such as LiNbO<sub>3</sub> Mach-Zehnder modulators (MZMs), and III-V drivers and RXs. Although such platforms deliver high performance in terms of BW, linearity, swing, and reliability, they are bulky and unsuitable for high-volume or low-cost applications. During the last decade, Silicon-photonics has emerged as a successful platform for high-volume intensity-modulation direct-detection transceivers [2, 16–18]. Silicon-photonics also offers the capability of integrating all the optical functions required for a coherent link, paving the way for an all-Si/SiGe platform for next-generation coherent transceivers.

Figure 1.3: Coherent transceivers: from LiNbO<sub>3</sub> + III-V to all-Si/SiGe platform

### 1.3 Dissertation contribution and organization

The objective of this dissertation is to explore O/E links for short- and long-reach applications and to provide solutions addressing their different requirements.

Chapter 2 presents an analysis of MRR-based links from the holistic perspective of optical devices, CMOS circuits, and system-level link budget and energy-efficiency simulations. Design considerations and trade-offs for the RX, TX, and the overall link are presented, and comparisons are made to the mainstream MM VCSEL-based links with MMFs. Moreover, research opportunities are highlighted for further improving the energy efficiency of single-channel and wavelength division multiplexing-based Silicon-photonics links.

In Chapter 3, an avalanche photodetector (APD)-based RX is presented. The fully integrated O/E RX utilizes the APD multiplication gain to enhance the overall sensitivity, and the monolithic implementation to allow low cost and reduced parasitics. With a noise-canceling active balun-based RX and an integrated APD bias stabilization control, the integrated O/E RX achieves a measured sensitivity of -18.8 dBm at 0.57 pJ/b.

In Chapter 4, a current-mode receiver that eliminates the noisy and power-hungry front-end is proposed as a solution to the gain-bandwidth-power trade-offs of conventional O/E RXs. The proposed design converts the single-ended photodetector (PD) current into differential currents and resolves the data using a current-based sense amplifier.

In Chapter 5, a linear high-swing MZM driver is presented. The driver uses a breakdown voltage enhancement technique to ensure reliability and a resistor-based capacitor-splitting technique to enhance BW. The driver achieves a 6 Vppd swing, 3.6% THD, and > 40 GHz BW, enabling 0.5 Tb/s/wavelength operation. The all-Si/SiGe design matches the required optical signal-to-noise ratio (ROSNR) performance of LiNbO<sub>3</sub> modulators with III-V drivers at 34 Gbaud.

In Chapter 6, an automatically reconfigurable transimpedance amplifier (TIA) that satisfies stringent noise-linearity requirements is proposed. Operating on a single sense voltage, the TIA mitigates base-resistance noise, gain peaking, and the degradation of phase margin and $f_T$. Collaborative offset and DC current cancellation is used to reduce nonlinearity and protect against current overdrive. The RX achieves a gain of 75.5 dB$\Omega$ and an input-referred noise of 18.5 pA/$\sqrt{Hz}$ at 42 GHz BW. The O/E RX achieves a 24 dB ROSNR at 50 Gbaud over a wide range of received signal power, enabling 528 Gb/s/wavelength.

In Chapter 7, conclusions and major contributions are presented, and the possibilities for extending the work further are described.

## Chapter 2

### Short-reach optoelectronic links<sup>1</sup>

This chapter discusses various design considerations for NRZ O/E links used within warehouse-size datacenters and other short-to-medium-reach (a few meters to a few kilometers) applications. The discussion evaluates the impact of each element of the O/E link (the interconnect, the O/E TX, and the RX) on the overall link performance.

Although not described herein, the discussion can be extended to PAM-4 O/E links. Different electro-optical TX techniques can be adapted to extend NRZ TX capability to PAM-4 signaling [19–21], while linear, rather than limiting, RX designs are required for PAM-4 links [22, 23].

Section 2.1 of the chapter discusses the various elements of a typical O/E RX, providing an in-depth noise analysis of the TIA architecture. The RX design trade-offs that limit the RX sensitivity are analyzed. Section 2.2 briefly presents an overview of MRR-based and VCSEL-based TXs and lists the best-in-class device performances reported to date.

<sup>1</sup>© IEEE. Reprinted, with permission, from [2].

With an understanding of the limits of RX sensitivity and TX trade-offs, and setting the baseline with the current best-in-class devices, Section 2.3 presents the link budget and energy efficiency calculations for MRR-based and VCSEL-based links for various lengths of the interconnect. It highlights the benefits of each approach, exposing the challenges associated with them that limit the overall system performance and setting the ground for future research.

Section 2.4 explores the opportunities of MRR-based links in realizing Tb/s aggregate throughput using wavelength division multiplexing and discusses the research challenges. Finally, the chapter is summarized in Section 2.5.

### 2.1 Optoelectronic link: receiver

An O/E RX, shown in Figure 2.1, converts the received modulated optical signal into an electrical signal, amplifies it, and prepares it to be processed by the RX digital core. A PD converts the optical signal into an electrical current. The PD is followed by a TIA and a main amplifier (MA) that amplify the electrical signal and convert it from a single-ended current to a differential voltage. A clocked sense amplifier (SA) after the MA re-times the signal and provides a rail-to-rail digital output, which can either be processed on the same chip or buffered and sent to another chip. The SA can be integrated with the O/E RX on the same chip along with clock recovery, or implemented separately. A replica TIA is often used to generate an automatic threshold control voltage ($V_{ATC}$), which helps in the single-ended to differential conversion.



Figure 2.1: A typical O/E RX consisting of a PD, TIA, MA(s) and an SA.

#### 2.1.1 Photodiode

The responsivity of a PD (R), defined as the ratio of the output electrical current to the input optical power, is essentially the gain of the first stage of the O/E RX, which means it directly affects the RX sensitivity and impacts the link budget and power consumption.
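Because the responsivity acts as the gain of the first stage, the photocurrent follows directly from $I = R \cdot P$. The sketch below converts an optical power in dBm to a photocurrent; the -10 dBm input is a hypothetical value, and the two responsivities are taken from the range listed in Table 2.1.

```python
def dbm_to_mw(p_dbm: float) -> float:
    """Convert optical power from dBm to mW."""
    return 10 ** (p_dbm / 10)


def photocurrent_ua(p_dbm: float, responsivity_a_per_w: float) -> float:
    """Photocurrent in uA = responsivity (A/W) x optical power (W)."""
    p_w = dbm_to_mw(p_dbm) * 1e-3
    return responsivity_a_per_w * p_w * 1e6


# Same hypothetical -10 dBm input, two responsivities from the range in Table 2.1
for r in (0.55, 1.0):
    print(f"R = {r:.2f} A/W -> {photocurrent_ua(-10, r):.1f} uA")
```

For the same photocurrent, a PD with 0.55 A/W therefore needs about $10\log_{10}(1/0.55) \approx 2.6$ dB more optical power than one with 1 A/W, a penalty that propagates directly into the link budget.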

The PD capacitance ($C_{PD}$) is the parasitic capacitance at the output of the PD ($C_{PD}$ is incorporated into the total capacitance $C_T$ and is not explicitly shown in Figure 2.1). $C_{PD}$ can be reduced by making the active region thicker and smaller in diameter, at the expense of a longer carrier transit time and tighter alignment tolerances [11]. In addition, any capacitance between the PD output and the TIA input, such as pads, electrostatic discharge (ESD) devices, packaging, and routing, increases the effective PD capacitance and thus degrades the RX BW. PDs can be either integrated, i.e., implemented on the same CMOS chip as the TIA and the rest of the RX front-end, or discrete, i.e., fabricated on a separate chip and connected to the RX front-end using wire bonding or flip-chip packaging. Monolithically integrated PDs are desirable since they do not require additional packaging steps and offer lower parasitic capacitances, owing to the absence of pads, ESD, and package parasitics between the PD and the TIA. Although this is a topic of active research, integrated PDs typically suffer from lower responsivities [24], because they are fabricated in CMOS processes that are not optimized for O/E devices. Integrated PDs in CMOS SOI processes with integrated Ge have better responsivities [12], but introducing Ge requires high-temperature processing and may degrade transistor performance [25]. Table 2.1 lists some state-of-the-art PDs operating at up to 25 Gb/s.
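The geometry trade-off can be illustrated with a first-order model: a parallel-plate capacitance $C = \varepsilon_0 \varepsilon_r A / t$ for a circular vertical PD, and a transit time $t / v_{sat}$. This is a sketch under stated assumptions, not a device model: the relative permittivity ($\varepsilon_r = 16$, Ge-like) and saturation velocity ($6 \times 10^6$ cm/s) are assumed values, fringing fields are ignored, and pad and ESD capacitances are excluded.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity (F/m)


def pd_capacitance_ff(diameter_um: float, thickness_um: float, eps_r: float = 16.0) -> float:
    """Parallel-plate estimate of a circular vertical PD's capacitance in fF (eps_r assumed)."""
    area_m2 = math.pi * (diameter_um * 1e-6 / 2) ** 2
    return EPS0 * eps_r * area_m2 / (thickness_um * 1e-6) * 1e15


def transit_time_ps(thickness_um: float, v_sat_m_per_s: float = 6e4) -> float:
    """Time in ps for a carrier at an assumed saturation velocity to cross the active region."""
    return thickness_um * 1e-6 / v_sat_m_per_s * 1e12


for d_um, t_um in [(20, 1.0), (20, 2.0), (10, 1.0)]:
    print(f"D = {d_um} um, t = {t_um} um: C = {pd_capacitance_ff(d_um, t_um):5.1f} fF, "
          f"transit = {transit_time_ps(t_um):4.1f} ps")
```

Doubling the thickness halves $C_{PD}$ but doubles the transit time, whereas halving the diameter quarters $C_{PD}$ without affecting transit time but shrinks the optical target that the fiber must hit.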

| Ref. | $C_{PD}$ (fF)     | R (A/W) | Data rate (Gb/s) | Wavelength (nm) |
|------|-------------------|---------|------------------|-----------------|
| [26] | 65                | 0.75    | 25               | 1300            |
| [27] | 80                | 0.55    | 25               | 850             |
| [28] | 200<sup>1</sup>   | 1       | 24               | 1550            |
| [29] | 10/80<sup>1</sup> | 0.6     | 25<sup>2</sup>   | 1550            |
| [30] | 20                | 0.8     | 25               | 1550            |
| [31] | 44                | 0.54    | 25               | 1545            |

Table 2.1: Best-in-class photodetectors

<sup>1</sup>Total capacitance, including PD, bonding pad, and ESD capacitances. <sup>2</sup>Reported BW in GHz.

#### 2.1.2 TIA and RX Front-End

A TIA is a gain stage used to convert the PD output current to a voltage signal that can be further processed by the RX. The TIA is required to have sufficient gain based on the sensitivity demands of the SA, appropriate –3 dB BW for the link data rate, and minimum noise contribution for achieving the bit error rate (BER) requirements. These characteristics are strongly coupled to one another and must be addressed carefully. Several architectures have been recently published [32–34] to relax the gain-BW trade-off in TIAs.

Figure 2.1 shows an RX front-end consisting of an inverter-based TIA followed by one or more differential MA gain stages. The inverter-based TIA architecture has been chosen due to its simplicity, good performance, and suitability for low-supply operation. It consists of a CMOS inverter with a voltage gain, $A_{inv}$, given by equation 2.1, and a feedback resistor, $R_F$, that provides negative feedback, reducing the dependence between gain and BW. $C_L$ includes the output capacitance of the first stage and the loading from the second stage. $C_F$ is the parasitic capacitance between the input and output nodes of the TIA's forward-path gain stage, which is amplified by the Miller effect. The mid-band transimpedance gain of the inverter-based TIA and the -3 dB BW at its input, $BW_i$, are given by equation 2.2 and equation 2.3, respectively.

$$A_{inv} = \left(g_{mn} + g_{mp}\right)\left(r_{dsn}||r_{dsp}\right) = G_m r_o \tag{2.1}$$

$$Gain_{TIA} = \frac{R_F A_{inv}}{1 + A_{inv}} \tag{2.2}$$

$$BW_i = \frac{1}{2\pi C_T \left(\frac{R_F}{1+A_{inv}}\right)} \tag{2.3}$$

Here, $g_{mn}(g_{mp})$ and $r_{dsn}(r_{dsp})$ represent the transconductance and output impedance of the NMOS (PMOS) transistor in the inverter, respectively. $C_T$ incorporates the input capacitance of the TIA. If the gain of the inverter stage is assumed to be sufficiently large, the overall mid-band gain of the TIA is approximately equal to $R_F$. Furthermore, the effect of increasing $R_F$ on $BW_i$ is suppressed by the $(1 + A_{inv})$ factor, which relaxes the gain-BW trade-off at the expense of power consumption. It is worth noting that increasing $A_{inv}$ beyond a certain value degrades the BW due to the concomitant increase in $C_T$ [35].
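Equations 2.1 to 2.3 can be evaluated directly to see the $(1 + A_{inv})$ bandwidth extension. The operating point below ($G_m$ = 40 mS, $r_o$ = 200 Ω, $R_F$ = 500 Ω, $C_T$ = 150 fF) is hypothetical, and like Eq. (2.3) the sketch ignores $C_L$ and $C_F$.

```python
import math


def inverter_tia(g_m: float, r_o: float, r_f: float, c_t: float):
    """Return (A_inv, mid-band transimpedance in ohm, input-pole BW in Hz) per Eqs. (2.1)-(2.3)."""
    a_inv = g_m * r_o                                      # Eq. (2.1): A_inv = G_m r_o
    gain = r_f * a_inv / (1 + a_inv)                       # Eq. (2.2)
    bw_i = 1 / (2 * math.pi * c_t * r_f / (1 + a_inv))     # Eq. (2.3)
    return a_inv, gain, bw_i


# Hypothetical design point
a_inv, gain, bw = inverter_tia(g_m=40e-3, r_o=200, r_f=500, c_t=150e-15)
print(f"A_inv = {a_inv:.1f}, gain = {gain:.0f} ohm, BW_i = {bw / 1e9:.1f} GHz")
```

With $A_{inv} = 8$, the transimpedance stays close to $R_F$ (about 444 Ω), while the input pole moves to roughly 19 GHz, nine times higher than the roughly 2.1 GHz that $R_F$ and $C_T$ alone would give.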

The expression for the overall -3 dB BW of the RX must include the transfer functions of the TIA and the following amplifiers. As the corresponding mathematical expressions are cumbersome when all the parasitics are incorporated, those effects have been captured through Cadence- and MATLAB-based co-simulation in this work. To obtain insight into the RX sensitivity, a detailed analysis of the noise contribution of the RX front-end is performed next.

#### 2.1.3 Noise Analysis of a Two-Stage RX Front-End

The RX sensitivity [36] and the link budget dictate the minimum amount of transmitted laser power to meet a specific BER. Since the laser power dominates the overall power consumption of an O/E link, any change in the RX sensitivity would impact the overall power consumption of the system. A simplified expression for RX sensitivity is given in equation 2.4.

$$i_{sens} = 2Qi_n^{rms} + i_{offset} + i_{sens,SA} \tag{2.4}$$

For a given BER, which sets Q ($Q \approx 7$ for a BER of $10^{-12}$), the RX sensitivity is limited by the input-referred noise ($i_n^{rms}$), the input-referred residual offset current ($i_{offset}$), and the input-referred sensitivity of the SA ($i_{sens,SA}$). Assuming an SA stage with ideal sensitivity and neglecting the offset, the overall RX sensitivity is dictated by the noise performance of the TIA and MA stages. To further simplify the analysis, the BW of the RX is assumed to be dominated by its first stage, with the MA stage having a much larger BW. If needed, multiple MA stages can be cascaded to relax the gain-BW trade-off [37] and maintain this assumption.
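The link between BER, Q, and Eq. (2.4) can be sketched numerically. The sketch assumes Gaussian noise and equiprobable NRZ bits, so that $BER = \tfrac{1}{2}\,\mathrm{erfc}(Q/\sqrt{2})$, and inverts this by bisection. The 2 µA rms noise and 0.8 A/W responsivity are hypothetical; the result is expressed as optical modulation amplitude (OMA), since $2Q i_n^{rms}$ is a peak-to-peak current.

```python
import math


def ber_from_q(q: float) -> float:
    """BER = 0.5 * erfc(Q / sqrt(2)), assuming Gaussian noise and equiprobable NRZ bits."""
    return 0.5 * math.erfc(q / math.sqrt(2))


def q_from_ber(ber: float, lo: float = 0.0, hi: float = 20.0) -> float:
    """Invert ber_from_q by bisection; BER falls monotonically as Q rises."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if ber_from_q(mid) > ber:
            lo = mid  # BER still too high: a larger Q is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sensitivity_oma_dbm(i_n_rms: float, ber: float, responsivity: float,
                        i_offset: float = 0.0, i_sens_sa: float = 0.0) -> float:
    """Eq. (2.4) in amperes, converted to optical modulation amplitude (OMA) in dBm."""
    i_sens = 2 * q_from_ber(ber) * i_n_rms + i_offset + i_sens_sa
    p_mw = i_sens / responsivity * 1e3
    return 10 * math.log10(p_mw)


print(f"Q = {q_from_ber(1e-12):.2f}")
# Hypothetical front-end: 2 uA rms input-referred noise, 0.8 A/W responsivity
print(f"Sensitivity = {sensitivity_oma_dbm(2e-6, 1e-12, 0.8):.1f} dBm OMA")
```

With zero offset and an ideal SA, halving $i_n^{rms}$ improves the sensitivity by exactly 3 dB, which is why the noise analysis below focuses on $i_n^{rms}$.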

To derive an expression for $i_n^{rms}$, the noise contribution of each element is referred to the output of the MA (input of the SA), taking into account the transfer functions that the noise passes through. The overall noise is then referred back to the input of the RX by dividing it by the mid-band gain of the RX front-end ($R_T$), given by equation 2.5. When calculating $i_n^{rms}$, the choice of noise integration bounds can significantly impact the accuracy of the results. Due to the low-pass response of the RX, noise components beyond the -3 dB BW of the RX are attenuated, and it is sufficient to integrate the noise up to twice the RX BW [38].

$$R_T = A_o \left( \frac{r_o - G_m r_o R_F}{1 + G_m r_o} \right) \tag{2.5}$$

Equations 2.6 and 2.7 describe the transfer functions of the noise from $R_F$ and $G_m$ to the MA input. The resulting noise at the input of the MA is then amplified to its output through the single-pole frequency response of the MA, while the voltage noise of the MA ($V_{n,MA}$) appears directly at its output. The input-referred noise, $i_n^{rms}$, is calculated using equation 2.8, where $C_P = C_T C_L + C_T C_F + C_L C_F$ is the sum of the pairwise products of $C_T$, $C_L$, and $C_F$, $A_o$ is the gain of the second stage, and $G = A_{inv} + 1$.

$$Z_1(s) = \frac{V_{OUT}}{i_{n,R_F}} = \frac{G_m r_o R_F + s R_F r_o C_T}{s^2 R_F r_o C_P + s \left(R_F C_T + r_o (C_T + C_L) + G R_F C_F\right) + G} \tag{2.6}$$

$$Z_2(s) = \frac{V_{OUT}}{i_{n,G_m}} = \frac{r_o + s r_o R_F (C_T + C_F)}{s^2 R_F r_o C_P + s \left(R_F C_T + r_o (C_T + C_L) + G R_F C_F\right) + G} \tag{2.7}$$

$$i_n^{rms} = \frac{\sqrt{\int_0^{2BW} \left( V_{n,MA}^2 + A_o^2 I_{n,R_F}^2 |Z_1(f)|^2 + A_o^2 I_{n,G_m}^2 |Z_2(f)|^2 \right) df}}{R_T} \tag{2.8}$$
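Equation 2.8 can be evaluated with a short numerical integration. This minimal sketch is not the Cadence/MATLAB co-simulation used in this work: it assumes white noise sources, $I_{n,R_F}^2 = 4kT/R_F$ and $I_{n,G_m}^2 = 4kT\gamma G_m$ with an assumed $\gamma = 1$, an assumed white MA output-noise density, and, as in Eq. (2.8), a flat MA gain $A_o$. The function name and the example operating point are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant (J/K)


def input_referred_noise(r_f, g_m, r_o, c_t, c_l, c_f, a_o, bw,
                         v_n_ma=0.0, gamma=1.0, temp=300.0, n_pts=20000):
    """Numerically evaluate Eq. (2.8) and return i_n^rms in A.

    v_n_ma: assumed white MA output-noise density (V/sqrt(Hz)).
    gamma: assumed MOS channel-noise coefficient.
    """
    g = 1 + g_m * r_o                          # G = A_inv + 1
    c_p = c_t * c_l + c_t * c_f + c_l * c_f    # pairwise capacitance products

    def den(s):                                # shared denominator of Eqs. (2.6) and (2.7)
        return s * s * r_f * r_o * c_p + s * (r_f * c_t + r_o * (c_t + c_l) + g * r_f * c_f) + g

    def z1(s):                                 # Eq. (2.6): R_F noise to TIA output
        return (g_m * r_o * r_f + s * r_f * r_o * c_t) / den(s)

    def z2(s):                                 # Eq. (2.7): G_m noise to TIA output
        return (r_o + s * r_o * r_f * (c_t + c_f)) / den(s)

    i2_rf = 4 * K_B * temp / r_f               # thermal noise of R_F (A^2/Hz)
    i2_gm = 4 * K_B * temp * gamma * g_m       # inverter channel noise (A^2/Hz)

    def out_psd(f):                            # noise PSD at the MA output (V^2/Hz)
        s = 2j * math.pi * f
        return v_n_ma ** 2 + a_o ** 2 * (i2_rf * abs(z1(s)) ** 2 + i2_gm * abs(z2(s)) ** 2)

    # Trapezoidal integration from 0 to 2*BW, as in Eq. (2.8)
    f_max = 2 * bw
    h = f_max / n_pts
    area = 0.5 * (out_psd(0.0) + out_psd(f_max)) + sum(out_psd(k * h) for k in range(1, n_pts))
    v_out_rms = math.sqrt(area * h)

    r_t = a_o * (r_o - g_m * r_o * r_f) / (1 + g_m * r_o)  # Eq. (2.5); negative (inverting)
    return v_out_rms / abs(r_t)


# Hypothetical operating point
i_n = input_referred_noise(r_f=500, g_m=40e-3, r_o=200, c_t=150e-15, c_l=20e-15,
                           c_f=10e-15, a_o=4, bw=18e9, v_n_ma=5e-9)
print(f"i_n,rms = {i_n * 1e6:.2f} uA")
```

Sweeping `c_t` and `c_f` in such a sketch, and re-optimizing `r_f` at each point, is one way to reproduce the qualitative trends discussed next; the magnitude of $R_T$ is used because Eq. (2.5) is negative for an inverting stage.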

Figure 2.2 plots the RX sensitivity as a function of the total input capacitance, $C_T$, for different data rates, obtained for a 65 nm CMOS process. To generate this plot, $C_F$ is assumed to be 10 fF and the PD responsivity 0.8 A/W. The optimum value of $R_F$ is calculated such that the RX sensitivity is maximized at each data rate and $C_T$ value. As expected, a reduction in $C_T$ enhances the RX sensitivity, which reduces the power consumption of the system. For example, reducing $C_T$ from 250 fF to 110 fF at 25 Gb/s improves the RX sensitivity by 3 dB. Consequently, the transmitted optical power can be reduced by 3 dB, leading to a 50% reduction in the power consumption of the laser. This underlines the importance of developing integrated PDs with high responsivity, as well as advanced packaging techniques and ESD solutions that minimize the parasitic capacitances at the RX input.

Figure 2.2: RX sensitivity vs. input capacitance for different data rates.

Moreover, for links operating at high data rates (> 20 Gb/s), reducing the input capacitance below a few tens of fF does not lead to substantial improvements in the RX sensitivity. Instead, $C_F$ limits the sensitivity for low $C_T$ values, and any reduction in $C_F$ significantly enhances the RX sensitivity, as shown in Figure 2.3. This, in turn, suggests that CMOS technology scaling can be leveraged to improve the RX sensitivity at high data rates: in successive CMOS technology nodes, the transistor parasitic capacitances decrease and the PMOS transconductance increases, allowing a smaller PMOS transistor in the inverter-based TIA. Finally, when both $C_T$ and $C_F$ are made small, the sensitivity is limited by $C_L$.

Figure 2.3: RX sensitivity vs. input capacitance for different values of $C_F$.

### 2.2 Optoelectronic link: transmitter

Next, we discuss the modulation of the electrical data at the TX, using either external modulation with Silicon-photonic devices or direct modulation of VCSELs. Owing to their superior power and area efficiency compared with Silicon-based interferometric modulators such as Mach–Zehnder interferometers (MZIs) [10], only Silicon-photonic TXs based on microring modulators are described. For completeness, Table 2.2 lists some of the best-in-class interferometric modulators, highlighting the significant trade-off between area, power consumed due to large driver-voltage requirements, insertion loss (IL), and extinction ratio (ER).

| Ref.     | ER (dB) | IL (dB) | $V_{\pi}L$ (V·cm) | Driver swing (Vpp) | Data rate (Gb/s) | Length (mm) | Power (mW)      |
|----------|---------|---------|-------------------|--------------------|------------------|-------------|-----------------|
| [39]     | 7.5     | 8       | 2.7               | 6.5                | 40               | 1           | 208<sup>1</sup> |
| [39]     | 7       | 15      | 2.7               | 4                  | 40               | 3.5         | 80<sup>1</sup>  |
| [40]     | 4.3     | 4.7     | 0.72              | 3.4                | 30               | 0.5         | 57.6            |
| [20, 41] | 9       | >3.12   | <2                | 2                  | 20               | 0.48        | 90              |
| [42]     | 9       | 22.5    | 1.28              | 0.5                | 26               | 2           | 3.8<sup>1</sup> |

Table 2.2: Best-in-class interferometric modulators

<sup>1</sup>Only includes the dynamic power consumed in the 50 $\Omega$ termination.
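The length-versus-drive trade-off in Table 2.2 follows from $V_\pi = (V_\pi L)/L$: a shorter interferometric modulator needs a larger half-wave voltage. The sketch below applies this to the table rows with exact $V_\pi L$ values (the [20, 41] row, which gives only a bound, is omitted). The ratio of driver swing to $V_\pi$ is shown only as a rough indicator; the actual modulation depth also depends on the bias point and on whether the arms are driven in push-pull, which the table does not specify.

```python
# (reference, V_pi*L in V*cm, length in mm, driver swing in Vpp), taken from Table 2.2
MODULATORS = [
    ("[39]", 2.7, 1.0, 6.5),
    ("[39]", 2.7, 3.5, 4.0),
    ("[40]", 0.72, 0.5, 3.4),
    ("[42]", 1.28, 2.0, 0.5),
]


def v_pi(v_pi_l_vcm: float, length_mm: float) -> float:
    """Half-wave voltage V_pi = (V_pi*L) / L, with L converted from mm to cm."""
    return v_pi_l_vcm / (length_mm / 10)


for ref, vpl, length, swing in MODULATORS:
    vp = v_pi(vpl, length)
    print(f"{ref}: L = {length} mm -> V_pi = {vp:.1f} V, swing/V_pi = {swing / vp:.2f}")
```

For the two [39] devices, lengthening the modulator from 1 mm to 3.5 mm lowers $V_\pi$ from 27 V to about 7.7 V, but the IL in Table 2.2 rises from 8 dB to 15 dB, which illustrates the area, drive, and loss trade-off noted above.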



Figure 2.4: (a) MRR-based link. (b) MRR characteristics. (c) VCSEL-based link.

#### 2.2.1 MRR-based transmitter

The optical output of an MRR-based TX, shown in Figure 2.4 (a), is generated by passing the light emitted from a continuous-wave (CW) laser through an MRR modulator. An MRR-based link promises small form factors, ease of integration with CMOS systems, and the support of wavelength division multiplexing for achieving high throughputs. However, as a system, it suffers from some limitations that need to be addressed to make its deployment practical.

#### **MRR** modulator

As shown in Figure 2.4 (b), an MRR modulator is a high-quality-factor band-stop (notch) filter whose notch wavelength shifts with the applied electrical voltage. If the laser emits continuous light at a wavelength $\lambda_o$, the ring should be designed, and tuned, to pass the light with minimum IL when a logic '1' voltage is applied, and to suppress the light when a logic '0' voltage is applied. The ER of an MRR is defined as the ratio of the transmitted optical powers in these two cases (their difference when expressed in dB). A higher ER translates into a higher SNR, which relaxes the required sensitivity of the RX. MRRs are very compact, and their small parasitic capacitances enable high-speed and energy-efficient switching.
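The ER/IL behavior described above can be sketched with a Lorentzian approximation of the notch, $T(\Delta\lambda) = 1 - (1 - T_{min}) / \left(1 + (2\Delta\lambda/\mathrm{FWHM})^2\right)$. This is an assumed simplified model, not the full ring transfer function: a '0' aligns the notch with $\lambda_o$, and a '1' shifts it by a voltage-dependent amount. The notch depth ($T_{min}$ = 0.05), FWHM (0.1 nm), and shifts are hypothetical.

```python
import math


def notch_transmission(detuning_nm: float, fwhm_nm: float, t_min: float) -> float:
    """Lorentzian approximation of an MRR through-port notch (linear power transmission)."""
    return 1 - (1 - t_min) / (1 + (2 * detuning_nm / fwhm_nm) ** 2)


def er_and_il(shift_nm: float, fwhm_nm: float, t_min: float):
    """'0': notch aligned with the laser; '1': notch shifted by shift_nm. Returns (ER, IL) in dB."""
    t0 = notch_transmission(0.0, fwhm_nm, t_min)
    t1 = notch_transmission(shift_nm, fwhm_nm, t_min)
    er_db = 10 * math.log10(t1 / t0)
    il_db = -10 * math.log10(t1)
    return er_db, il_db


for shift in (0.025, 0.05, 0.1):
    er, il = er_and_il(shift, fwhm_nm=0.1, t_min=0.05)
    print(f"shift = {shift} nm: ER = {er:.1f} dB, IL = {il:.1f} dB")
```

A larger resonance shift (more drive voltage) simultaneously raises the ER and lowers the IL of the '1' level, while a narrower notch reaches the same ER with less shift but makes the ring more sensitive to wavelength drift, as discussed next.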

Despite their benefits, the high selectivity of MRRs can be detrimental if the center wavelength drifts with process, voltage, and temperature variations. Therefore, thermal or electrical tuning circuits associated with MRR modulators are essential, adding power, area, and complexity overhead to the overall system. Table 2.3 summarizes some of the best MRRs reported, along with the techniques employed to tune their wavelength.

| Ref. | ER (dB)         | IL (dB) | Driver swing (Vpp) | Data rate (Gb/s) | Tuning method | Tuning range (nm) | Tuning power (mW) |
|------|-----------------|---------|--------------------|------------------|---------------|-------------------|-------------------|
| [43] | 11              | 0.9     | 4                  | 5                | Electrical    | 0.28              | 0.34              |
| [44] | 7               | -       | 2                  | 10               | Thermal       | 1.6               | 1.25              |
| [29] | 7-9<sup>1</sup> | -       | 1.95               | 20               | Thermal       | 1.5               | 7.1/nm            |
| [45] | 7               | 5       | 4.4                | 25               | Thermal       | 0.8               | -                 |

Table 2.3: Best-in-class MRR modulators

<sup>1</sup>7 for a wavelength division multiplexing system, and 9 for a single-wavelength system.
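The tuning overhead can be estimated by converting a temperature swing into a resonance drift and then into heater power and energy per bit. In this sketch, the 7.1 mW/nm efficiency, 1.5 nm range, and 20 Gb/s rate are taken from the [29] row of Table 2.3, while the drift coefficient of 0.08 nm/K is an assumed typical value for silicon rings that is not stated in this text. The model also ignores the fact that heaters can only red-shift the resonance, as well as thermal crosstalk between rings.

```python
def thermal_tuning_overhead(temp_swing_k: float, data_rate_gbps: float,
                            drift_nm_per_k: float = 0.08,       # assumed, not from Table 2.3
                            tuning_eff_mw_per_nm: float = 7.1,  # [29], Table 2.3
                            tuning_range_nm: float = 1.5):      # [29], Table 2.3
    """Return (drift in nm, heater power in mW, overhead in pJ/b, drift within tuning range?)."""
    drift_nm = temp_swing_k * drift_nm_per_k
    power_mw = drift_nm * tuning_eff_mw_per_nm
    energy_pj_per_bit = power_mw / data_rate_gbps  # mW / (Gb/s) = pJ/b
    return drift_nm, power_mw, energy_pj_per_bit, drift_nm <= tuning_range_nm


for swing_k in (10, 20):
    d, p, e, ok = thermal_tuning_overhead(swing_k, data_rate_gbps=20)
    print(f"dT = {swing_k} K: drift = {d:.2f} nm, heater = {p:.2f} mW, "
          f"{e:.3f} pJ/b, within range: {ok}")
```

Under these assumptions, a 20 K swing already exceeds the 1.5 nm tuning range and costs over 0.5 pJ/b, which shows why tuning power is a first-order term in the energy efficiency of MRR-based links.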

#### CW laser

A CW laser generates a single-wavelength light beam, whose purity is expressed in terms of its linewidth. Fabry-Perot and distributed-feedback lasers are the most commonly used lasers in telecommunications [38]. Available lasers often suffer from low power-conversion efficiencies. To function properly, they require wavelength stabilization systems [46–48] that compensate for wavelength drifts caused by effects such as temperature variations and aging.

When the power consumed by the wavelength stabilization system is included, the wall-plug efficiency (WPE) of commercially available lasers, defined as the ratio of the laser output power to its total electrical power consumption, is typically around 1% [49]. Since CW lasers are power-hungry, low-efficiency devices, they have a significant effect on the overall system power efficiency. Hence, for MRR-based links to be practical and energy-efficient, research should be directed towards enhancing the laser WPE.

Hybrid-integrated Silicon-photonic lasers employing distributed-Bragg-reflector topologies can improve the WPE. Combined with a power-efficient stabilization technique, the laser in [50] achieved a 12.2% waveguide-coupled WPE (the ratio of the laser power available in the Silicon waveguide to its total electrical power consumption). In Section 2.3, a 12.2% waveguide-coupled WPE is assumed for the CW laser in the link budget analysis. Table 2.4 lists some of the best-in-class CW lasers reported.
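Because the laser's electrical power is its optical output divided by its WPE, the WPE scales every optical milliwatt the link requires. The sketch below compares the 1.31% and 12.2% WPE figures from Table 2.4 for two hypothetical optical output levels; for the waveguide-coupled WPE of [50], the optical power should be read as the power in the Silicon waveguide.

```python
def laser_electrical_power_mw(p_opt_dbm: float, wpe: float) -> float:
    """Electrical power in mW = optical output power in mW / wall-plug efficiency."""
    return 10 ** (p_opt_dbm / 10) / wpe


for wpe in (0.0131, 0.122):  # WPE values from Table 2.4
    for p_dbm in (0.0, -3.0):  # hypothetical required optical powers
        print(f"WPE = {wpe:.1%}, P_opt = {p_dbm:+.0f} dBm -> "
              f"{laser_electrical_power_mw(p_dbm, wpe):.1f} mW")
```

At any fixed WPE, lowering the required optical power by 3 dB (for instance through the sensitivity improvement discussed in Section 2.1.3) roughly halves the laser's electrical power, while moving from 1.31% to 12.2% WPE cuts it by about 9.3 times.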

| Ref. | Optical power (mW) | λ (nm) | Linewidth | WPE (%)          | Peak optical power (mW) |
|------|--------------------|--------|-----------|------------------|-------------------------|
| [49] | 10                 | 1550   | 10 MHz    | 1.31<sup>1</sup> | 20                      |
| [50] | 9                  | 1550   | 0.22 pm   | 12.2<sup>2</sup> | 10                      |
| [51] | 6                  | 1556   | 0.46 nm   | 9.5              | -                       |
| [52] | 15                 | 1555   | -         | 7.6<sup>2</sup>  | -                       |
| [53] | 17.3               | 1566   | 0.22 pm   | 7.8              | 20                      |

Table 2.4: Best-in-class CW lasers

<sup>1</sup>The laser's efficiency is 16.7%, but it drops severely with stabilization. <sup>2</sup>Waveguide-coupled WPE.

### 2.2.2 VCSEL-Based transmitter

VCSELs are semiconductor lasers that can be directly modulated in an optical TX. Most popular VCSELs are MM, thereby requiring MMF interconnects. Their high performance and low fabrication cost have resulted in widespread use in high-speed optical interconnects [44, 54]. As seen in Figure 2.4 (c), VCSELs can be either wire-bonded or flip-chip connected to the CMOS TX chip, which provides both the bias and the data signals. An MMF couples the VCSEL's output directly to the PD on the RX side, and since PDs are typically wideband, no tuning circuits are needed to adjust the VCSEL's output wavelength. However, the loss and dispersion in MMFs limit their application to distances < 100 m in warehouse-size datacenters. Another disadvantage of MM VCSELs is the lack of support for wavelength division multiplexing; as an alternative, arrays of MM VCSELs are used in parallel to obtain higher aggregate data rates [55].

Table 2.5 lists some of the best-in-class MM VCSELs reported, comparing them in terms of WPE, data rate, and the output power. In Section 2.3, a 17% peak WPE is assumed for the VCSEL for link budget analysis.

| Ref. | WPE (%) | λ (nm) | ER (dB) | Data rate (Gb/s) | Output power (mW) | Bias current (mA) |
|------|---------|--------|---------|------------------|-------------------|-------------------|
| [25] | 17      | 850    | 7.3     | 18               | 3                 | 9                 |
| [56] | 11      | 850    | 5.1     | 15-25            | 0.73<sup>1</sup>  | 4.2               |

Table 2.5: Best-in-class MM VCSELs

<sup>1</sup>Average power.

## 2.3 Link budgets and systems evaluation

This section presents link budget analyses for both the MRR and VCSEL-based links in datacenter applications using the best efficiency figures reported for both CW lasers and VCSELs.

In an MRR-based system, an external CW laser source converts electrical power into optical power, $P_{TX}$, with a WPE of $WPE_{CW}$. The optical power is then coupled to the photonics chip, with coupling losses $P_{CW-CPL}$ and $P_{SM-CPL}$, and modulated by an MRR with an IL of $P_{MRR-IL}$. In a VCSEL-based link, on the other hand, the laser diode is directly modulated by a CMOS driver that is wire-bonded or flip-chip connected to the VCSEL chip. The VCSEL converts electrical power into optical power, $P_{TX}$, with an efficiency of $WPE_{VCSEL}$. In both cases, the modulated optical signal is then coupled to an optical fiber, with a coupling loss $P_{SM-CPL}$ or $P_{MM-CPL}$, which carries it to the RX side while introducing an attenuation $P_{OF-att}$ and, in the case of an MMF, a dispersion penalty $P_{OF-disp}$ as well. Finally, the optical signal is coupled, with a coupling loss $P_{SM-CPL}$ or $P_{MM-CPL}$, to the PD, which converts it to an electrical signal with a given responsivity. The link budgets of the MRR-based and VCSEL-based links are calculated using equations 2.9 and 2.10, respectively.

$$P_{RX}(dBm) = P_{TX} - P_{MRR-IL} - P_{CW-CPL} - 3P_{SM-CPL} - P_{OF-att} - P_{Pen} \tag{2.9}$$

$$P_{RX}(dBm) = P_{TX} - 2P_{MM-CPL} - P_{OF-att} - P_{OF-disp} - P_{Pen} \tag{2.10}$$

Figures 2.5 and 2.6 are graphical representations of the efficiency degradation from the TX to the RX in an MRR-based and a VCSEL-based link, respectively. The link budget analysis is performed at 25 Gb/s for 10 m and 100 m long optical fibers, with $WPE_{VCSEL}$ of 17% and 11%, and $WPE_{CW}$ of 12% and 1.3%. The link penalty ($P_{Pen}$) is calculated by adding the ER penalty [57], the crosstalk and intersymbol interference penalties, and the relative intensity noise penalty [58]. For a link with 7 dB ER, $P_{Pen}$ adds up to about 4.8 dB. Tables 2.6 and 2.7 summarize the numbers used in the link budget analysis. Recently reported values of $P_{SM-CPL}$ include 0.5 dB [59], 0.7 dB [60], 1.3 dB [61], and 2.8 dB [62]. In this work, a $P_{SM-CPL}$ of 1.3 dB is assumed [61]. The output of the CW laser is coupled to an SMF with a coupling loss, $P_{CW-CPL}$, of 2 dB [63].



Figure 2.5: MRR-based link efficiency degradation for CW Laser WPE of 1.3% and 12% (no significant difference in performance between 10 m and 100 m SMF).

In the MRR-based link, an SMF with 0.5 dB/km attenuation introduces 0.005 dB and 0.05 dB of loss for 10 m and 100 m lengths, respectively. These losses are negligible, and therefore the curves in Figure 2.5 almost overlap. However, for a VCSEL-based link using an OM4 MMF with 3.5 dB/km attenuation, the optical fiber introduces attenuation of 0.035 dB and 0.35 dB for 10 m and 100 m long fibers, respectively. Furthermore, over longer distances, an MMF causes significant modal and chromatic dispersion of the transmitted signal, and the associated power penalty, along with mode partition noise, can severely degrade the overall efficiency. For illustration purposes, Figure 2.6 also shows the efficiency degradation for a VCSEL link over a 300 m long MMF, assuming 5 dB of $P_{OF-disp}$ [11].



Figure 2.6: VCSEL-based link efficiency degradation for 10 m, 100 m, and 300 m MMF (a dispersion penalty of 0 dB is assumed for the 10 m and 100 m links, and 5 dB for the 300 m link).

| Laser<br>WPE | MRR<br>IL | Fiber loss<br>(1550nm) | Coupler loss | No. of couplers | Link penalties |
|--------------|-----------|------------------------|--------------|-----------------|-----------------|
| 1.3%,12%     | 1dB       | 0.5dB/km               | 1.3dB,2dB    | $4^{1}$         | 4.8dB           |

Table 2.6: MRR-based link characteristics

<sup>1</sup>Laser to SMF (2 dB). SMF to MRR, MRR to SMF, and SMF to PD (1.3 dB).

| VCSEL<br>WPE | Fiber loss<br>(850nm) | Coupler loss | No. of couplers | Link penalties |
|--------------|-----------------------|--------------|-----------------|-----------------|
| 11%,17%      | 3.5dB/km              | 1.1dB[25]    | 2               | 4.8dB           |

Table 2.7: VCSEL-based link characteristics

With a 3 dB margin assumed for each link [25], [64], the overall efficiency of a VCSEL-based link with a 17% WPE VCSEL and a 10 m MMF is calculated to be 1.7%. If it is replaced by an MRR-based link with a $WPE_{CW}$ of 12%, the overall efficiency is calculated to be 0.6%. The corresponding numbers for a 100 m long fiber are 1.6% and 0.6%, respectively (assuming no MMF dispersion penalty).
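The VCSEL-link figures above can be reproduced with a short calculation: sum the dB terms of equation 2.10 using the Table 2.7 values and the 3 dB margin, convert the total to a linear ratio, and scale the WPE by it. This is a minimal sketch that assumes "overall efficiency" means received optical power divided by laser electrical power and that no dispersion penalty applies; the function names are hypothetical.

```python
def db_to_lin(db):
    """Convert a power ratio in dB to a linear factor."""
    return 10 ** (db / 10)


def vcsel_link_loss_db(length_m, cpl_db=1.1, att_db_per_km=3.5,
                       disp_db=0.0, penalty_db=4.8, margin_db=3.0):
    """Sum of the loss terms of equation 2.10 plus the link margin.

    Defaults are the Table 2.7 values: two 1.1 dB MMF couplers, 3.5 dB/km
    OM4 attenuation at 850 nm, a 4.8 dB link penalty, and a 3 dB margin.
    """
    fiber_db = att_db_per_km * length_m / 1000.0
    return 2 * cpl_db + fiber_db + disp_db + penalty_db + margin_db


def overall_efficiency(wpe, loss_db):
    """Received optical power divided by laser electrical power."""
    return wpe / db_to_lin(loss_db)


for length_m in (10, 100):
    eff = overall_efficiency(0.17, vcsel_link_loss_db(length_m))
    print(f"17% WPE VCSEL, {length_m:>3} m OM4: {100 * eff:.2f} %")
```

This prints about 1.69% and 1.57%, which round to the 1.7% and 1.6% quoted above. Passing `disp_db=5.0` with `length_m=300` evaluates the 300 m case under the same assumptions.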

Figure 2.7 shows the RX, TX, and overall energy efficiency versus the RX sensitivity for 25 Gb/s MRR-based and VCSEL-based links with a 100 m long channel and the inverter-based TIA RX discussed in Section 2.1. Beyond the optical devices and interconnects, the energy efficiency includes the power consumption of the TX driver, MRR and CW laser tuning, and RX front-end. Power consumed in clock and data recovery and clocking is not included in this analysis.

Improving the RX sensitivity significantly improves the overall energy efficiency. For example, an improvement in RX sensitivity from -10 dBm to -13 dBm enhances the overall energy efficiency by $\approx 0.55$ pJ/b for the MRR-based link and by $\approx 0.15$ pJ/b for the VCSEL-based link. However, improving the RX sensitivity beyond -16 dBm requires a significant power overhead in the RX, as seen from the sharp knee in the efficiency plot, which degrades the overall link energy efficiency. An optimal energy efficiency of < 2 pJ/b can be realized for either link if an RX sensitivity of -11 to -16 dBm is achieved.

Figure 2.7: Energy efficiency vs. RX sensitivity for TX, RX and the overall VCSEL-based and MRR-based links (100 m), excluding any clock and data recovery or clocking.

The approach used to generate Figure 2.7 is adapted to find the optimal design points for the two links at different data rates. Figure 2.8 shows the energy efficiency breakdown for the two links at different data rates, where each link is assumed to operate at its optimal design point for energy efficiency. The performance of the optical devices is assumed to be fixed. A significant factor in the performance gap is the higher WPE of the VCSEL. The power consumption of the TX drivers is similar in both links; however, the VCSEL-based TX consumes more power at higher data rates because of the power needed for equalization. Also, the efficiency of the MRR modulator degrades, especially at lower data rates, due to its constant power consumption overhead, which depends on the tuning scheme used with the MRR [29, 43, 44]. A constant tuning power of 1 mW each [44] is assumed in Figure 2.8 for tuning the hybrid-Silicon laser and the MRR.
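A constant tuning power penalizes low data rates because its energy per bit scales as the inverse of the data rate (1 mW at 1 Gb/s is 1 pJ/b). A minimal sketch using the 2 mW total tuning power assumed for Figure 2.8; the sampled data rates are illustrative choices, not necessarily the exact points of the figure.

```python
def fixed_power_pj_per_bit(power_mw, rate_gbps):
    """Energy per bit of a constant, data-rate-independent power draw.

    1 mW / 1 Gb/s = 1e-3 J/s / 1e9 b/s = 1e-12 J/b = 1 pJ/b.
    """
    return power_mw / rate_gbps


# 1 mW for tuning the hybrid-Silicon laser + 1 mW for tuning the MRR [44]
TUNING_MW = 1.0 + 1.0

for rate_gbps in (10, 25, 50):  # illustrative data rates
    overhead = fixed_power_pj_per_bit(TUNING_MW, rate_gbps)
    print(f"{rate_gbps:>2} Gb/s: {overhead:.3f} pJ/b of tuning overhead")
```

The same 2 mW costs 0.2 pJ/b at 10 Gb/s but only 0.04 pJ/b at 50 Gb/s, which is why the MRR-based link loses ground at the low end of Figure 2.8.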



Figure 2.8: Energy efficiency breakdown vs. data rate for VCSEL-based and MRR-based links (100 m), excluding any clock and data recovery or clocking.



Figure 2.9: MRR-based link energy efficiency vs. the CW laser WPE.

As evident from Figures 2.7 and 2.8, the MRR-based link can be comparable to a VCSEL-based link in energy efficiency for short-to-medium reach. To bridge the remaining performance gap, $WPE_{CW}$ must be enhanced. Figure 2.9 plots the energy efficiency of the MRR-based link as a function of $WPE_{CW}$ and shows a dramatic reduction in energy consumption as $WPE_{CW}$ increases. Commercializing CW lasers with a $WPE_{CW}$ of 12% [50] or higher would therefore significantly improve the energy efficiency. Another approach to narrowing the performance gap lies in the design of highly sensitive RXs with better energy efficiency profiles. For example, if an RX with an energy efficiency knee at -22 dBm is used in the previously discussed links, the power performance of the two links would be almost identical for an RX sensitivity of -19 dBm, with an overall link energy efficiency of $\approx 1.2$ pJ/b. Additionally, as an MRR-based link needs more couplers than its VCSEL-based counterpart, more research should be directed towards reducing coupling losses.
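The trend in Figure 2.9 follows from the laser term alone: for a fixed RX sensitivity and link loss, the electrical laser power scales as $1/WPE_{CW}$. The sketch below computes only this laser contribution (not the driver, tuning, or RX power included in the figure). The -13 dBm sensitivity, the 10 dB total link loss, and the intermediate 5% WPE are hypothetical illustration values, not numbers taken from this chapter.

```python
def laser_energy_pj_per_bit(rx_sens_dbm, wpe, link_loss_db, rate_gbps):
    """Electrical laser energy per bit needed to deliver rx_sens_dbm at the RX."""
    p_rx_mw = 10 ** (rx_sens_dbm / 10)                   # required RX optical power
    p_laser_optical_mw = p_rx_mw * 10 ** (link_loss_db / 10)
    p_laser_electrical_mw = p_laser_optical_mw / wpe     # divide by wall-plug efficiency
    return p_laser_electrical_mw / rate_gbps             # mW / (Gb/s) = pJ/b


# Hypothetical operating point: -13 dBm sensitivity, 10 dB total loss, 25 Gb/s
for wpe in (0.013, 0.05, 0.12):
    e = laser_energy_pj_per_bit(-13.0, wpe, 10.0, 25.0)
    print(f"WPE_CW = {100 * wpe:4.1f} %: {e:.3f} pJ/b")
```

Going from 1.3% to 12% WPE cuts the laser energy per bit by the ratio of the two WPEs, about 9.2 times, independent of the assumed sensitivity and loss.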

In conclusion, for lengths beyond a few hundred meters, MRR-based links in datacenters are already very promising. For short-to-medium reach, their performance is becoming comparable to that of a VCSEL-based design in a single-channel implementation, with further enhancements in RX sensitivity, $WPE_{CW}$, and coupling technology needed to bridge the performance gap.

## 2.4 Towards power-efficient Tb/s links

The rising demand for data throughput in datacenters has been driving contemporary research on increasing the aggregate link speed. Following the IEEE 802.3 standard specifications, several 100 Gb/s links are already in production [55, 65, 66] as $4\times25$ Gb/s parallel links. A shortwave wavelength division multiplexing alliance has been established to multiplex different wavelengths near 850 nm from multiple VCSELs onto a single fiber pair [11, 67]; a 100 Gb/s link can then be implemented using $4\times25$ Gb/s shortwave wavelength division multiplexing. A 100 Gb/s PSM4 (parallel SM 4-lane) alliance has also been established [68] to support interconnects up to 500 m. In PSM4, the CW single-wavelength output of one high-power laser source with good WPE [10] could be split among four parallel Silicon-photonic channels. This technique can potentially enhance the overall energy efficiency of the link at the expense of a higher fiber count.

In order to approach Tb/s aggregate data throughput for medium-reach distances (<2 km) [64], multiple channels can be combined using PSM or wavelength division multiplexing. The high selectivity of MRRs facilitates the design of wavelength division multiplexing interconnects with very high aggregate data rates. In a wavelength division multiplexing link, shown in Figure 2.10, a comb CW laser source generates as many wavelengths as needed, from $\lambda_1$ to $\lambda_n$. The laser output is coupled to a waveguide on the Silicon-photonics chip, which passes through several MRRs, each tuned to a specific wavelength. The modulated optical signals are coupled to a single optical fiber that carries them to the RX side. On the RX side, the received optical signal is coupled to a waveguide that passes through a series of MRR filters, each tuned to select a specific wavelength. After the modulated optical signals are separated by wavelength, PDs detect each of the incoming signals. MRR-based links have already been shown to operate in excess of 25 Gb/s per wavelength [45, 69, 70], implying that a 25 Gb/s lane could be replicated as many times as needed to reach the aggregate goal. To relax the RX front-end design requirements for 100 Gb/s throughput and beyond, the multiplexing factor (number of wavelengths, n) can theoretically be increased to a large extent [71]. However, practical implementations introduce new challenges, such as the need for a comb laser with good WPE and sufficient output optical power for each wavelength.

Figure 2.10: A wavelength division multiplexing system with MRR-based links.

Published work on comb lasers shows the capability of generating 10 to 20 different wavelengths from a single comb laser with 0 to 7 dBm of output optical power per wavelength, and with conversion efficiencies of up to 22% [72, 73]. However, further research is needed on wavelength stabilization and CMOS-based control systems, which are essential for MRR-based wavelength division multiplexing links.

As the multiplexing factor increases, the complexity of the tuning circuits needed in an MRR-based wavelength division multiplexing link also increases dramatically [74]. Moreover, electrical or optical interchannel crosstalk limits the multiplexing factor. On the RX side, the frequency response of the MRR and the link budget allowance for the optical crosstalk penalty set the minimum channel spacing and hence the maximum multiplexing factor. As an example, [75] presented an MRR with a 12.4 nm free spectral range and reported that, for a crosstalk penalty of 0.76 dB, the maximum achievable multiplexing factor is 16. In scenarios where the comb laser and/or the RX filter limits the number of wavelengths, the aggregate throughput can still be increased by designing a wavelength division multiplexing link with the maximum possible multiplexing factor and then placing multiple such links in parallel. In both cases, the energy efficiency of the wavelength division multiplexing link would be a scaled version of that of a single-channel link, plus the wavelength division multiplexing crosstalk penalty. Placing multiple links in parallel has the disadvantage of increasing the number of laser sources and fibers. Finally, data rates per lane can also be increased to 50 Gb/s using NRZ or PAM-4 signaling. However, such high data rates introduce many challenges on both the electrical and optical sides that lead to significant energy efficiency degradation. For example, challenges for the optical devices include limited BW, a difficult trade-off between ER and IL, nonlinearity, and relative intensity noise. Significant research is currently being pursued to mitigate these challenges [76–79].
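The example from [75] can be turned into a quick sizing estimate. Assuming the 16 channels are spread evenly over one free spectral range, the implied channel spacing is FSR/16; with an assumed 25 Gb/s per wavelength and an assumed 1 Tb/s aggregate target, the number of parallel WDM links follows from a ceiling division. The per-lane rate and target are illustrative assumptions, not values fixed by this chapter.

```python
import math

FSR_NM = 12.4          # MRR free spectral range reported in [75]
N_MAX = 16             # max multiplexing factor at 0.76 dB crosstalk penalty [75]
LANE_GBPS = 25         # assumed per-wavelength data rate
TARGET_GBPS = 1000     # assumed aggregate target (1 Tb/s)

# Assumption: the N_MAX channels are spread evenly across one FSR
spacing_nm = FSR_NM / N_MAX
per_link_gbps = N_MAX * LANE_GBPS
links_needed = math.ceil(TARGET_GBPS / per_link_gbps)

print(f"implied channel spacing: {spacing_nm:.3f} nm")
print(f"one WDM link: {per_link_gbps} Gb/s")
print(f"parallel WDM links for {TARGET_GBPS} Gb/s: {links_needed}")
```

Under these assumptions a single 16-wavelength link carries 400 Gb/s, so three such links in parallel (with three comb lasers and fiber pairs) would be needed to exceed 1 Tb/s.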

## 2.5 Summary

Warehouse-scale datacenters demand short-to-medium reach optical interconnects spanning distances of up to a few kilometers. While short-reach implementations are currently dominated by MM VCSEL-based links, the higher loss and modal dispersion of MMFs make them unattractive from the perspective of link energy efficiency for medium reach. MRR-based Silicon-photonic links utilizing SMFs are a promising alternative. Even for shorter reach (< 100 m), MRR-based links can be a competitive technology.

Through a comprehensive overview of state-of-the-art optical devices and CMOS circuit-level simulations, this chapter presents a detailed power and noise analysis for both VCSEL-based and MRR-based links. The simulations estimate that the energy efficiency of the MRR-based link at 25 Gb/s is within 1 pJ/b of that of a VCSEL-based link at 100 m of interconnect length. To further bridge the performance gap in energy efficiency, there are several research opportunities at the device, circuit, and link levels. The WPE of CW lasers must be improved, as must the IL and ER of the MRRs. Integrated PDs are attractive due to their reduced parasitic capacitance, but such a reduction must not come at the cost of reduced responsivity or degraded transistor performance. Below a certain PD capacitance, the capacitance of the pads and the TIA limits the RX sensitivity. From the perspective of CMOS circuit design, alternative TIA topologies must be developed to improve the sensitivity of the RX. Finally, at the link level, reducing the coupling loss and improving packaging techniques will ease the link budget.

The small area and high selectivity of an MRR make it an ideal building block for wavelength division multiplexing transceivers that can achieve very high aggregate data rates for short-to-medium reach. Beyond solving the aforementioned challenges for a single lane, developing better comb lasers, RX filters, and tuning techniques will facilitate the application of Silicon-photonics wavelength division multiplexing links in datacenters for high-throughput medium-reach communication.

# **Chapter 3**

# Avalanche photodetector-based receiver<sup>1</sup>

Due to their high multiplication gain and responsivity, APDs can enhance the O/E RX sensitivity and hence improve the overall energy efficiency of an O/E link. Moreover, monolithic integration of the APD with the CMOS RX provides further advantages, such as low cost and extended BW. However, CMOS-compatible APDs require high bias voltages and are sensitive to variations in the operating conditions.

This chapter presents a CMOS RX with noise-canceling active balun as part of a fully integrated APD-based O/E RX with bias generation and stabilization.

The motivation for this work is discussed in Section 3.1, followed by the fundamentals of APD operation and its challenges in Section 3.2. Section 3.3 presents the high-frequency electrical path in detail. The implementation of the O/E RX, along with the measurement results, is discussed in Section 3.4. Finally, the chapter is summarized in Section 3.5.

<sup>1</sup>© IEEE. Reprinted, with permission, from [80].

## **3.1** Introduction and motivation

As the O/E transceiver market experiences significant growth, it is imperative to reduce the bill of materials and lower the cost of the transceivers. As most of these transceivers utilize an off-chip PD, a fully-monolithic implementation of the PD will reduce the cost for both the component and the associated packaging. From the link perspective, there are also requirements to increase the data rate from 10 Gb/s to 25 Gb/s and higher, and reduce the overall power consumption. A single-chip O/E RX with a monolithic PD and the associated CMOS circuits can significantly reduce the parasitics and ease the overall O/E RX design. Beyond the transceiver market, a fully-integrated O/E RX is also desirable for other applications in high-performance computing and sensors [81].

A significant majority of the optical interconnects in the existing datacenters today span a distance of less than 300 m, and operate at 850 nm with MMFs, VCSELs, and PDs [10]. In this project, we propose a fully-integrated, single-chip CMOS O/E RX incorporating a CMOS APD for 850 nm applications.

## **3.2** Avalanche photodetector

### 3.2.1 Responsivity and link budget

PDs convert light into an electrical current and are therefore used at the front-end of every O/E RX. The PD responsivity, R, defined as the ratio of the output current $I_{PD}$ to the input optical power $P_{in}$, is a measure of its gain, as given in equation 3.1 [38].

$$R = \frac{I_{PD}}{P_{in}} = \frac{\eta q}{hc/\lambda} \tag{3.1}$$

where $\eta$ is the quantum efficiency of the PD, $q$ is the elementary charge, $\lambda$ is the wavelength of the incident light, *h* is Planck's constant, and *c* is the speed of light in vacuum. The relation between the optical sensitivity, $P_{op-sen}$, of the O/E RX and the electrical sensitivity, $i_{sen}^{PP}$, of the electronic RX is given by equation 3.2 [38].

$$P_{op-sen} = \frac{i_{sen}^{PP}}{2R} \tag{3.2}$$

Higher responsivity of the PD, therefore, significantly improves the O/E RX optical sensitivity. Traditionally, external PDs implemented in expensive III-V technologies offer R of up to 1 A/W.

One approach to boost the responsivity is to leverage the avalanche effect [82] and design an APD. In an APD, the photoelectric effect first converts the incident photons into electrons, similar to regular PDs. Then, by applying a high reverse bias voltage, these electrons are accelerated to create impact ionization and generate many more carriers. This *avalanche* effect further increases the current gain, boosting the effective responsivity,  $R_{eff}$ , of the APD by a multiplication factor, *M*.
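Equation 3.1 sets a ceiling on the responsivity of a regular PD: even with unity quantum efficiency, $R = q\lambda/(hc)$, with $q$ the elementary charge, is only about 0.69 A/W at 850 nm. The sketch below evaluates this bound with the exact SI values of $q$, $h$, and $c$ and, assuming the 3.92 A/W listed for this work in Table 3.1 is the effective responsivity $R_{eff} = MR$, backs out the implied product $M\eta$. The helper names are hypothetical.

```python
Q_E = 1.602176634e-19   # elementary charge, C (exact SI value)
H = 6.62607015e-34      # Planck constant, J*s (exact SI value)
C0 = 299792458.0        # speed of light in vacuum, m/s (exact SI value)


def responsivity_a_per_w(eta, wavelength_m):
    """PD responsivity R = eta * q * lambda / (h * c), in A/W."""
    return eta * Q_E * wavelength_m / (H * C0)


r_max_850 = responsivity_a_per_w(1.0, 850e-9)
print(f"unity-QE responsivity at 850 nm: {r_max_850:.3f} A/W")

# Assumption: the 3.92 A/W in Table 3.1 is the effective responsivity M * R
m_times_eta = 3.92 / r_max_850
print(f"implied M * eta: {m_times_eta:.2f}")
```

A responsivity above the unity-QE bound is only possible with internal gain, so under this assumption the APD provides a multiplication of at least about 5.7.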

The multiplication gain of the APD has a major impact on the overall link budget. Consider a typical VCSEL and MMF link in a datacenter. The link budget given in equation 2.10 can be rewritten as equation 3.3, where the received power is set equal to the optical sensitivity of the APD-based O/E RX.

$$P_{RX}(dBm) = P_{TX} - 2P_{MM-CPL} - P_{OF-att} - P_{OF-disp} - P_{Pen} = 10\log_{10}\left(\frac{i_{sen}^{PP}}{2MR \cdot 1\,\mathrm{mW}}\right) \tag{3.3}$$

It is evident that using APDs and improving M for the APDs can significantly reduce the overall power consumption of the link. Conversely, for the same laser power, the design of the O/E RX can be considerably relaxed.
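Equation 3.3 makes the benefit of the multiplication gain explicit: for a fixed electrical sensitivity, the optical sensitivity improves by $10\log_{10}M$ dB. A minimal sketch, where the 20 µA peak-to-peak electrical sensitivity and the 0.5 A/W primary responsivity are hypothetical values chosen only for illustration:

```python
import math


def optical_sensitivity_dbm(i_pp_a, r_a_per_w, m=1.0):
    """Average optical sensitivity P = i_pp / (2 * M * R), expressed in dBm."""
    p_w = i_pp_a / (2.0 * m * r_a_per_w)
    return 10.0 * math.log10(p_w / 1e-3)


I_PP = 20e-6   # hypothetical electrical sensitivity of the electronic RX, A
R = 0.5        # hypothetical primary responsivity, A/W

for m in (1, 2, 5, 10):
    print(f"M = {m:>2}: {optical_sensitivity_dbm(I_PP, R, m):.2f} dBm")
```

Each doubling of $M$ buys about 3 dB of optical sensitivity, which can be spent either on lower laser power or on a more relaxed electronic RX.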

### **On-Chip PD vs. Off-Chip PD**

Traditionally, PDs are implemented in a separate process from the CMOS electronic RX to improve PD performance, namely the -3 dB BW of the frequency response and the responsivity. PDs are generally made in expensive technologies such as Ge [83], GaAs [84, 85], or InP-InGaAs [86] to enhance their performance.

However, connecting the external PD to a CMOS electronic RX using wirebonding assembly results in several issues: increase in manufacturing and packaging cost, possible decrease in yield, crosstalk between the bondwires degrading RX performance especially when implemented as arrays of PDs connecting multiple RXs, requirement for ESD devices, additional packaging parasitics degrading the sensitivity of the RX, etc. A flip-chip package reduces, but does not eliminate, parasitics and crosstalk.

To overcome the aforementioned problems of a discrete PD, a fully monolithic CMOS PD can be implemented by modifying the CMOS process [87]. However, adding Ge to the CMOS process to improve PD performance increases manufacturing cost and fabrication complexity, and is detrimental to the performance of CMOS transistors [88]. Moreover, a Ge-based APD has high optical absorption only in the 1.3 to 1.55 $\mu$m wavelength range and is therefore not suitable for 850 nm applications. Fully-integrated bulk CMOS PDs [89, 90] have low responsivity and are not very attractive for high-speed links.

#### Integrated CMOS APD

In order to reduce cost, manufacturing complexity, bondwire crosstalk, and parasitics, an APD is designed in the same bulk CMOS process as the electronic RX circuits. Due to monolithic integration, no external signal pads are needed ( $C_{pad} = 0$  fF), and the ESD requirement is also considerably reduced. On-chip APDs thus improve O/E RX sensitivity and BW and have inspired several recent research efforts in the design of on-chip CMOS APDs [82].

Despite the advantages of CMOS APDs, certain drawbacks have limited their practical use. For instance, APDs in a bulk CMOS process require bias voltages of up to 10 V [82]; this can be addressed using an on-chip voltage booster that generates the high APD bias voltage from the available nominal supply. Moreover, due to the thermal and shot noise contributions of the APD, it has to be operated at its SNR-optimal point, which is sensitive to the bias voltage and temperature [38, 91].

The system presented in this chapter, shown in Figure 3.1, consists of a high-speed data path that converts the received input optical signal to a differential voltage, an on-chip voltage booster to provide the needed APD bias voltage, and a precision bias control loop that senses the output level of the MAs and uses a search algorithm to maintain a steady APD BW and gain.

## **3.3 CMOS receiver**

This section discusses the details of the high-speed path of the 10 Gb/s APD O/E RX implemented in a 0.13-$\mu$m CMOS process, as shown in Figure 3.1. The CMOS APD is followed by a TIA, which converts the current into a voltage with gain $A_{TIA}$, and four stages of MAs for further voltage amplification. An offset cancellation loop consisting of a low-pass filter (LPF), an error amplifier, and a current source is also shown in Figure 3.1.

Figure 3.1: A 10 Gb/s APD O/E RX in 0.13-µm CMOS process.

The output of a PD is single-ended. As differential designs are preferred in an RX, a differential TIA driven by a PD and a dummy PD, which has no light incident on it, can be implemented [92]. Another popular topology uses a single-ended TIA followed by a differential amplifier whose other input is connected to a replica (dummy) TIA, as shown in Figure 3.2.

However, these methods cause mismatch in gain and phase of the differential output, resulting in asymmetric signals. Moreover, a dummy TIA also increases the power consumption of the RX.

Figure 3.2: A differential amplifier to convert single-ended TIA output to differential signals for the MAs.

In the differential amplifier of Figure 3.2, transistor M1 is connected to the TIA and M2 is connected to the dummy TIA for better matching. The transconductance ($g_m$) and the load ($R$) of transistors M1 and M2 are assumed to be matched. The output-referred noise, $\overline{V_{(o,n)}^2}$, and the input-referred noise, $\overline{V_{(i,n)}^2}$, of the differential amplifier can be calculated as in equations 3.4 and 3.5, respectively.

$$\overline{V_{(o,n)}^2} = 8KTR\left(1 + \gamma g_m R\right) \tag{3.4}$$

$$\overline{V_{(i,n)}^2} = \frac{8KT \left(1 + \gamma g_m R\right)}{g_m^2 R} \tag{3.5}$$

In this work, a single-ended inverter-based push-pull TIA was implemented, followed by a self-noise-canceling active balun that converts the single-ended TIA output to a differential signal (Figure 3.1). Figure 3.3 shows the active balun implementation, inspired by noise-canceling low-noise amplifiers [93]. $V_{in}$ represents the single-ended signal from the TIA, $R_s$ is the output resistance of the TIA, $V_B$ is the gate bias of the common-gate transistor M2, and $V_P$ and $V_N$ are the differential signal outputs.

Figure 3.3: A noise-canceling active balun to convert single-ended TIA output to differential signals for the MAs.

As shown in Figure 3.3, the TIA signal, $V_{in}$, is amplified in phase by M2 and out of phase by M1, so the two signals add up differentially at the output. The gain of the balun, $A_{Balun}$, follows equation 3.6.

$$A_{Balun} = \frac{g_{m1}R_1 + g_{m2}R_2}{1 + g_{m2}R_s} = g_{m1}R_1 \tag{3.6}$$

where $g_{m1}$ and $g_{m2}$ are the transconductances of transistors M1 and M2, respectively, and $R_1$ and $R_2$ are the loads seen by M1 and M2, which can be approximated as $1/g_{m4}$ and $1/g_{m3}$, respectively. For the outputs to have matched swings at $V_P$ and $V_N$, the gains of the two paths should be designed such that $g_{m1}R_1 = g_{m2}R_2$. The simplified form $A_{Balun} = g_{m1}R_1$ in equation 3.6 holds under this condition together with $R_s = 1/g_{m2}$, the noise-cancellation condition derived below.

The gate-referred equivalent noise of M2, $V_{n2}$, on the other hand, has two paths to the differential outputs. With M2 acting as a common-source device, $V_{n2}$ is inverted at node P, giving $V_{P,n2}$. $V_{n2}$ is also sensed in phase at node A, giving $V_{A,n2}$, and is then inverted by M1 to appear at node N as $V_{N,n2}$. $V_{P,n2}$ and $V_{N,n2}$ are given by equations 3.7 and 3.8, respectively.

$$V_{P,n2} = V_{n2} \left( \frac{-g_{m2}R_2}{1 + g_{m2}R_s} \right) \tag{3.7}$$

$$V_{N,n2} = V_{n2} \left( \frac{-g_{m1} R_1 g_{m2} R_s}{1 + g_{m2} R_s} \right) \tag{3.8}$$

For the differential design, where  $g_{m1}R_1 = g_{m2}R_2$ , if  $R_s = 1/g_{m2}$ , then the effective noise of *M*2 is canceled at the output, as per equation 3.9.

$$V_{out,n2} = V_{P,n2} - V_{N,n2} = V_{n2} \left( \frac{g_{m1} R_1 g_{m2} R_s - g_{m2} R_2}{1 + g_{m2} R_s} \right) \tag{3.9}$$
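Equations 3.6 to 3.8 can be checked numerically. The sketch evaluates the balun gain and the two paths of the M2 noise for hypothetical device values ($g_{m1} = 20$ mS, $R_1 = 250\ \Omega$, $g_{m2} = 10$ mS, $R_2 = 500\ \Omega$, so that $g_{m1}R_1 = g_{m2}R_2 = 5$) and forms the differential residue $V_{P,n2} - V_{N,n2}$ directly from equations 3.7 and 3.8. With $R_s = 1/g_{m2} = 100\ \Omega$ the residue vanishes and the gain equals $g_{m1}R_1$; with $R_s = 200\ \Omega$ it does not.

```python
def balun_gain(gm1, r1, gm2, r2, rs):
    """Differential balun gain, first form of equation 3.6."""
    return (gm1 * r1 + gm2 * r2) / (1 + gm2 * rs)


def m2_noise_outputs(gm1, r1, gm2, r2, rs, vn2=1.0):
    """M2 noise at P and N (equations 3.7 and 3.8) and their difference."""
    vp = vn2 * (-gm2 * r2) / (1 + gm2 * rs)
    vn = vn2 * (-gm1 * r1 * gm2 * rs) / (1 + gm2 * rs)
    return vp, vn, vp - vn


GM1, R1, GM2, R2 = 20e-3, 250.0, 10e-3, 500.0   # hypothetical, gm1*R1 = gm2*R2 = 5

for rs in (100.0, 200.0):                        # 100 ohm = 1/gm2 (cancellation point)
    vp, vn, diff = m2_noise_outputs(GM1, R1, GM2, R2, rs)
    gain = balun_gain(GM1, R1, GM2, R2, rs)
    print(f"Rs = {rs:5.1f} ohm: gain = {gain:.3f}, V_P = {vp:.3f}, "
          f"V_N = {vn:.3f}, differential M2 noise = {diff:.3f} (per unit V_n2)")
```

At the cancellation point, the M2 noise appears as a common-mode signal (equal at P and N), which the differential MAs reject.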

Considering noise from other sources, M1,  $R_1$  and  $R_2$ ,  $\left(\overline{V_{(o,n)}^2}\right)$  and  $\left(\overline{V_{(i,n)}^2}\right)$  can be calculated as in equations 3.10 and 3.11, respectively.

$$\overline{V_{(o,n)}^2} = 4KTR\left(2 + \gamma g_m R\right) \tag{3.10}$$

$$\overline{V_{(i,n)}^2} = \frac{8KT \left(1 + \gamma g_m R/2\right)}{g_m^2 R} \tag{3.11}$$

Comparing equation 3.11 with equation 3.5 shows that the input-referred noise of the active balun is smaller than that of a differential amplifier with a dummy TIA. Figure 3.4 compares the noise, referred to the TIA input, of the active balun and of a differential-amplifier-based implementation when both are designed and simulated for the same gain and BW. The total input-referred RMS noise of the TIA with the active balun and with the differential amplifier is 2.2 $\mu$A and 3.1 $\mu$A, respectively. Apart from minimizing the gain and delay mismatches associated with a dummy TIA, the active balun provides gain ($A_{Balun}$), thereby reducing the input-referred noise from the MAs.

Figure 3.4: Simulated noise referred to TIA input to compare differential amplifier and noise-canceling active balun when both have the same gain and BW.
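Dividing equation 3.11 by equation 3.5 shows how much the noise cancellation helps: with $x = \gamma g_m R$, the ratio of input-referred noise PSDs is $(1 + x/2)/(1 + x)$, which lies between 1/2 (large $x$, transistor noise dominant) and 1 (small $x$, resistor noise dominant). The sketch evaluates this ratio for a few illustrative values of $x$ and, separately, converts the simulated 2.2 µA and 3.1 µA RMS figures quoted above into a dB improvement. The two results describe different quantities (a single-stage PSD ratio versus the total TIA-referred noise) and are not expected to match.

```python
import math


def balun_to_diffamp_noise_ratio(x):
    """Input-referred noise PSD ratio, eq. 3.11 / eq. 3.5, with x = gamma * gm * R."""
    return (1 + x / 2) / (1 + x)


for x in (0.5, 2.0, 10.0):  # illustrative values of gamma * gm * R
    ratio = balun_to_diffamp_noise_ratio(x)
    print(f"x = {x:4.1f}: PSD ratio = {ratio:.3f} ({10 * math.log10(ratio):.2f} dB)")

# Simulated total input-referred RMS noise (Figure 3.4): 2.2 uA vs. 3.1 uA
improvement_db = 20 * math.log10(3.1 / 2.2)
print(f"simulated RMS noise improvement: {improvement_db:.2f} dB")
```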

The loads $R_1$ and $R_2$ are implemented as active shunt-peaking inductors for BW extension [37, 94, 95] at the expense of additional noise. The active balun is followed by four stages of MAs, each implemented as a differential amplifier employing shunt-peaking active inductors. The MAs are provided with two different power supplies: 1.24 V for the transistors and 1.54 V for the resistors of the active inductors. The amplified signal is finally buffered to the output using a 50 $\Omega$ driver for measurement purposes.



Figure 3.5: Die micrograph of the APD O/E RX in 0.13-$\mu$m CMOS.

## **3.4** System implementation and measurements

The APD O/E RX shown in Figure 3.1 is implemented as a proof-of-concept prototype. Figure 3.5 shows the die micrograph. The core area of the APD is 10 $\mu$m × 10 $\mu$m, and that of the TIA with the MAs, buffers, and offset cancellation loop is 170 $\mu$m × 140 $\mu$m (without pads).

For measurement purposes, the chip is wirebonded to a CQFP80 package and soldered onto a PCB. The power consumption for the TIA is 1.74 mW, and for the balun and MA stages is 3.96 mW. A more detailed breakdown of the power consumed in various RX circuits is shown in Figure 3.6 based on post-layout simulations.

The measured optical eye diagram of the APD O/E RX at $P_{avg} = -18.8$ dBm is shown in Figure 3.7 for a 10 Gb/s PRBS7 signal generated using a VCSEL with a BW rated for 25 Gb/s, at a 6 dB extinction ratio.

Figure 3.6: Power breakdown based on post-layout simulations (with measured power consumption of 5.52 mW).



Figure 3.7: Measured optical eye diagram at 10 Gb/s.

Table 3.1 provides a performance summary and comparison to prior-art CMOS linear (non-clocked) TIA-based 10 Gb/s RXs operating at 850 nm with a CMOS APD [96] and an n-well-based PD [97], along with state-of-the-art designs with external PDs [98–100]. At the time of publication and to the authors' knowledge, this work achieved the best sensitivity for a 10 Gb/s CMOS linear O/E RX at 850 nm. The design also compares favorably to the state-of-the-art in area, power, and energy efficiency. BER measurements are carried out using an SHF 11125 analyzer, and the measured BER versus received optical power is shown in Figure 3.8.



Figure 3.8: Measured BER vs. RX sensitivity at 10 Gb/s.

|                        | [98]              | [99]              | [100]     | [96]              | [97]              | This Work        |
|------------------------|-------------------|-------------------|-----------|-------------------|-------------------|------------------|
| CMOS Tech. (nm)        | 120               | 130               | 65        | 65                | 65                | 130              |
| Architecture           | TIA+MA            | TIA+MA            | TIA+MA    | APD+TIA           | Nwell-PD+TIA+MA   | APD+TIA+MA       |
| Gain (dBΩ)             | 81.1 <sup>1</sup> | 87                | 63.2      | 60                | 102               | 71               |
| BW (GHz)               | NA                | 6.6               | NA        | 6                 | 12.5/0.5          | NA/3.5           |
| <b>R</b> (A/W)         | NA                | 0.67              | 0.55      | NA                | NA                | 3.92             |
| Data rate (Gb/s)       | 10                | 10                | 10        | 10                | 9                 | 10               |
| BER                    | 1E-12             | 1E-12             | 1E-12     | 1E-12             | 1E-12             | 1E-12            |
| PRBS                   | 7                 | 7                 | 7         | 7                 | 15                | 7                |
| VDD (V)                | 1.0/1.7           | 1.8               | NA        | 1.2               | 1.0/1.2           | 1.24/1.54        |
| VDD-PD (V)             | 3.5               | 2.5               | NA        | 10.7              | 0.5               | 9.51             |
| Power (mW)             | 8 <sup>1</sup>    | 44 <sup>1,2</sup> | 68.2      | 13.7 <sup>1</sup> | 48 <sup>1,3</sup> | 5.7 <sup>1</sup> |
| Energy/bit (pJ/b)      | 0.8               | $3.52^2$          | 6.82      | 1.37              | 5.33 <sup>3</sup> | 0.57             |
| Area (mm<sup>2</sup>)  | 0.043             | NA                | $0.004^4$ | 0.024             | 0.23              | 0.024            |
| P <sub>avg</sub> (dBm) | -13.1             | -12.3 to -12.7    | -16.1     | -6.5              | -11.5             | -18.8            |
| OMA (dBm)              | -13.1             | -11.5             | -15.6     | NA                | NA                | -18              |

Table 3.1: Performance summary and comparison to 850 nm linear CMOS TIAs.

<sup>1</sup> Excluding 50Ω buffer, <sup>2</sup> At 12.5 Gb/s, <sup>3</sup> At 9 Gb/s, <sup>4</sup> Excluding offset cancellation
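Two derived quantities in Table 3.1 can be cross-checked. The energy per bit is the listed power divided by the data rate (1 mW at 1 Gb/s is 1 pJ/b), using the footnoted data rates for [99] and [97]. The OMA follows from the average power and the extinction ratio as $OMA = 2P_{avg}(ER-1)/(ER+1)$; assuming the 6 dB ER of the measurement VCSEL applies, -18.8 dBm average power corresponds to about -18 dBm OMA, consistent with the table. A minimal sketch with hypothetical helper names:

```python
import math

# (power in mW, data rate in Gb/s) from Table 3.1 and its footnotes
designs = {
    "[98]": (8.0, 10.0),
    "[99]": (44.0, 12.5),
    "[100]": (68.2, 10.0),
    "[96]": (13.7, 10.0),
    "[97]": (48.0, 9.0),
    "This work": (5.7, 10.0),
}


def energy_per_bit_pj(power_mw, rate_gbps):
    """mW / (Gb/s) = pJ/b."""
    return power_mw / rate_gbps


def oma_dbm(p_avg_dbm, er_db):
    """OMA = P1 - P0 = 2 * P_avg * (ER - 1) / (ER + 1), with ER = P1 / P0."""
    er = 10 ** (er_db / 10)
    p_avg_mw = 10 ** (p_avg_dbm / 10)
    return 10 * math.log10(2 * p_avg_mw * (er - 1) / (er + 1))


for name, (p_mw, rate) in designs.items():
    print(f"{name:>9}: {energy_per_bit_pj(p_mw, rate):.2f} pJ/b")

print(f"OMA at -18.8 dBm average, 6 dB ER: {oma_dbm(-18.8, 6.0):.1f} dBm")
```

All six energy-per-bit entries in the table are reproduced by this division.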

## 3.5 Summary

An APD-based O/E RX greatly relaxes the sensitivity requirements of the electronic RX because of the inherent avalanche gain of the APD. However, due to the high reverse bias requirement and temperature sensitivity of the APD, APD-based RXs have traditionally been implemented as multi-die solutions. This work proposes the first monolithic CMOS-based O/E RX with an APD. The fully-monolithic implementation further improves the BW at the input of the electronic RX by eliminating package parasitics. The electrical RX uses a noise-canceling active balun to enhance the overall sensitivity while providing single-ended to differential conversion. The APD O/E RX achieved the best-reported sensitivity (as of its publication date) among 850 nm linear CMOS TIAs at 10 Gb/s.

# **Chapter 4**

# **Current-mode receiver**

## 4.1 Introduction and motivation

As mentioned in Section 2.1, a high-speed, low-noise, and low-power RX requires better solutions to the gain-BW-power trade-offs that limit the state of the art. Conventional RXs usually follow the general architecture shown in Figure 2.1, with the TIA topology being the main difference between designs. Three common linear TIA topologies are a resistor [101], an amplifier with resistive shunt feedback [102], and a Gm-boosted regulated cascode [9]. Each of these techniques has its pros and cons. However, in all of them, the signal undergoes a series of current(I)/voltage(V) domain conversions until it reaches the SA, where a binary (0/1) decision is made.

Consider the example shown in Figure 2.1, assuming a one-stage MA follows the TIA before the SA. The received signal is in the 'I' domain and is converted into 'V' by the TIA. The MA stage can be viewed as a Gm stage followed by a TIA stage: the former converts the input 'V' into 'I', and the latter converts it back to 'V' before it is fed to the SA. The V-I-V conversions repeat for every MA preceding the SA. All the blocks preceding the SA must have a BW that supports the required data rate with minimal input-referred noise contribution, which demands a high power budget. This problem can be addressed by moving the decision-making block, the SA, to the front-end and eliminating the noisy and power-hungry blocks. With the growing interest in eliminating the power-hungry analog front-end, integrating RX front-ends have become more popular [32]. However, these designs mainly replaced the analog front-end with an S/H circuit while keeping the SA as is. Even though these implementations eliminate the linear front-end, the input signal still goes through a series of I-V-I-V conversions.

In this chapter, we propose a current-mode receiver that eliminates both the linear front-end and the Gm stage of the SA by merging the S/H circuit with the SA. By nature, the input signal to an O/E RX is single-ended, while an SA requires a differential input. In conventional RXs, the differential SA inputs are generated by the TIA/MA combination using dummies or baluns, as discussed in Section 3.3. The proposed design converts the single-ended PD current into differential currents that feed a current-based SA directly, resolving the data right at the input without any noisy and power-hungry intermediate stages.

## 4.2 **Proposed current-mode receiver**

The proposed design converts the single-ended PD current into a differential current through the $R_B/C_B$ "bias-tee-like" combination, as shown in Figure 4.1 (a). As shown in Figure 4.1 (b), a large value of $R_B$ ($R_B \gg Z_{IN}$) ensures that it acts as a DC current source that biases the PD with $I_B = I_{ON}/2$. $R_B$ is followed by a DC-blocking capacitor, $C_B$. After the $R_B/C_B$ combination, currents $I_P$ and $I_N$ are fully differential and ready to be processed by the SA. A large $R_B$ also minimizes its contribution to the input-referred noise of the RX. Similar to the offset-canceling feedback in conventional linear RX front-ends, $R_B/C_B$ sets the high-pass corner of the RX, which must be kept sufficiently low so as not to degrade the RX eye.

Figure 4.1: (a) $R_B/C_B$ combination used for single-ended to differential conversion. (b) $R_B/C_B$ in the proposed design. (c) $R_B/C_B$ in [101].
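The current split performed by $R_B/C_B$ can be illustrated with a minimal idealized sketch. It assumes $R_B$ carries exactly the average PD current ($I_B = I_{ON}/2$ for a DC-balanced 0/$I_{ON}$ pattern) and that $C_B$ passes the remainder; the high-pass corner, finite $R_B$, and parasitics are ignored, and the function name and example current are hypothetical.

```python
def bias_tee_split(pd_currents):
    """Idealized R_B/C_B split of a single-ended PD current (amps).

    R_B is modeled as an ideal DC source carrying the average PD current
    (I_B = I_ON/2 for a DC-balanced 0/I_ON pattern); C_B passes the remainder,
    which appears as equal and opposite currents I_P and I_N at the SA inputs.
    """
    i_b = sum(pd_currents) / len(pd_currents)  # DC current absorbed by R_B
    i_p = [i - i_b for i in pd_currents]       # AC current into the SA's P input
    i_n = [-i for i in i_p]                    # complementary current at the N input
    return i_b, i_p, i_n


# Hypothetical example: I_ON = 70 uA, balanced NRZ pattern 1, 0, 1, 1, 0, 0
i_on = 70e-6
bits = [1, 0, 1, 1, 0, 0]
i_b, i_p, i_n = bias_tee_split([b * i_on for b in bits])
print(f"I_B = {i_b * 1e6:.1f} uA")        # I_B = 35.0 uA
print([round(x * 1e6, 1) for x in i_p])   # +/-35 uA per bit
```

Each bit thus reaches the SA as a differential current of $\pm I_{ON}/2$, which is why no voltage-domain stage is needed before the decision.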

A similar $R_B/C_B$ combination is used in [101], but for a very different purpose. In [101], $R_B$ acts as the TIA, with a value less than $Z_{IN}$, converting the PD current into a voltage, as shown in Figure 4.1 (c). A pair of $R_B/C_B$ combinations then performs the single-ended to differential conversion, so that $V_{2P}$ and $V_{2N}$ are fully differential with a common mode, $V_{CM}$, set by the following stage.

In [101], the value of $R_B$ strongly dictates the gain and BW of the TIA, creating a tight trade-off between them. In practice, $R_B$ must be relatively small, usually a few hundred ohms, to achieve the targeted BW. Consequently, its large noise current appears directly at the input and degrades the sensitivity of the RX. Moreover, as the input changes, the PD bias is modulated, which could further degrade the performance of the RX.

The proposed design, shown in Figure 4.2, avoids I-V conversions altogether and handles the signal current directly at the SA for a low-power implementation. As the input to the SA in Figure 4.2 is a current rather than a voltage, the Gm stage of a conventional SA can be omitted. When the SA is enabled, the input current is integrated on $C_{PP}$ and $C_{PN}$ until either $V_P$ or $V_N$ reaches the threshold of the back-to-back inverters, enabling the positive feedback to resolve a one or a zero through regeneration.
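The integration step follows the capacitor relation $\Delta V = I\,t/C$. The sketch below estimates how far $V_P$ or $V_N$ moves during integration; the 35 µA signal current, 20 fF $C_{PP}$, and a one-UI integration window at 14 Gb/s are hypothetical values chosen only for illustration, and the function names are likewise illustrative.

```python
def integrated_swing(i_in, t_int, c_int):
    """Voltage change (V) on an integration capacitor: dV = I * t / C."""
    return i_in * t_int / c_int


def time_to_threshold(i_in, dv_needed, c_int):
    """Integration time (s) needed for the node to move by dv_needed."""
    return c_int * dv_needed / i_in


# Hypothetical values: 35 uA signal current, 20 fF on C_PP, one UI at 14 Gb/s.
ui = 1 / 14e9
dv = integrated_swing(35e-6, ui, 20e-15)
print(f"dV after 1 UI: {dv * 1e3:.1f} mV")  # dV after 1 UI: 125.0 mV
```

Halving the capacitance doubles the swing for the same current, which is the lever later used for gain control.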

Figure 4.2: Full-rate current-mode receiver.

A quarter-rate implementation of the current-mode receiver is shown in Figure 4.3, where four slices of the RX sample and recover the incoming serial data in a time-interleaved fashion. The switching matrix, along with the clock generation circuits, ensures a seamless transition between the four slices of the RX. $I_{DC,P}$, $I_{DC,N}$, $C_{PP}$, and $C_{PN}$ are used to set the SA gain and implement offset cancellation.

In Figure 4.3, for illustration purposes, the variable current sources ($I_{DC,P}$ and $I_{DC,N}$) and the time-interleaving switching matrix are shown as separate blocks. However, in the proposed implementation, shown in Figure 4.4, they are integrated within the SA. The proposed SA operates at a quarter of the data rate and is clocked with four clock phases, as shown in Figure 4.5. The four clock phases are generated by a quadrature current-mode logic (CML) divider, followed by a CML-to-CMOS converter, and are then buffered to the SA slices.
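The time-interleaving order can be sketched as a round-robin demultiplexer: slice $k$ (assumed here to be the one clocked by the $k\cdot 90^\circ$ phase) handles bits $k, k+4, k+8, \dots$, so each slice sees a new bit only every 4 UI. The function names are hypothetical, and the mapping of phases to slices is an assumption for illustration.

```python
def demux_quarter_rate(bits, n_slices=4):
    """Round-robin assignment of serial bits to time-interleaved SA slices.

    Slice k (assumed clocked by the k*90-degree phase) samples bits k, k+4, k+8, ...
    """
    return [bits[k::n_slices] for k in range(n_slices)]


def remux(slices):
    """Re-serialize the slice outputs to check the interleaving order."""
    out = []
    for group in zip(*slices):
        out.extend(group)
    return out


data_rate = 14e9
print(f"Per-slice clock: {data_rate / 4 / 1e9:.1f} GHz")  # Per-slice clock: 3.5 GHz
print(demux_quarter_rate([1, 0, 1, 1, 0, 0, 1, 0]))
```

Because each slice has a 4 UI period, part of that period can be spent resetting and integrating while the rest is left for regeneration.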

The SA phases of operation are depicted in Figure 4.5 along with Figures 4.6-4.8 and can be summarized as follows:



Figure 4.3: Quarter-rate current-mode receiver.



Figure 4.4: Proposed current-SA.



Figure 4.5: Clock phases, and current-SA phases of operation.

### Phase 1: 'Reset'

In the reset phase, shown in Figure 4.6, $M_{1P,N}-M_{5P,N}$ are turned off, disconnecting the slice from the input and disconnecting the back-to-back inverters $M_{7P,N}-M_{8P,N}$ from ground to avoid any static power consumption. Meanwhile, $C_{SP,N}$ are discharged through $M_{6P,N}$, while $C_{PP,N}$ are pre-charged to $V_{DD}$ through $M_{9P,N}$.

### Phase 2: 'Data sampling and integration'

In this phase, shown in Figure 4.7, the reset switches $M_{6P,N}-M_{9P,N}$ and $M_{10}$ are turned off, and the input is connected to the SA core through $M_{1P,N}-M_{2P,N}$. At the same time, $C_{SP,N}$ are connected to the core through $M_{5P,N}$. When connected, $C_{SP,N}$ provide a DC current that helps increase the SA gain with minimal added noise. The input currents are integrated on $C_{PP,N}$, and, depending on the current's polarity, a difference between $V_P$ and $V_N$ is established, bringing one of them to the onset of the SA threshold.

### Phase 3: 'Decision and regeneration'

In the last phase of an operation cycle, shown in Figure 4.8, as $\phi_{270}$ goes low, the input is disconnected from the SA and connected to the SA of the next slice. Simultaneously, $\phi_{90}$ goes high, enabling a current path through $M_{3P,N}-M_{4P,N}$ that allows the SA to make a decision and its outputs to reach rail-to-rail values. With this technique, the SA is allowed two unit intervals (UIs) to resolve each bit.

During the design phase, the resistor and capacitor values and the transistor sizes were chosen to achieve the best possible performance in terms of noise, BW, low cutoff frequency $(f_{lc})$, and dynamic range. The general considerations taken to achieve that target are presented in the rest of this section.

Figure 4.6: Proposed current-SA: Phase 1, reset.

Figure 4.7: Proposed current-SA: Phase 2, integrating the input on the output capacitors.

Figure 4.8: Proposed current-SA: Phase 3, decision and regeneration.

At the input, the $R_B/C_B$ combination converts the single-ended current into a differential one. The selection of their values is critical to the design, as they set the PD bias voltage and the $f_{lc}$ of the system. Ideally, $R_B$ should be as high as possible to reduce its noise current contribution. However, the PD bias voltage ($V_{PD} = V_{DD} - 2 I_B R_B$) must be high enough for the PD to have sufficient BW and responsivity. Therefore, given the maximum $V_{DD}$ a process can handle and the maximum expected DC current from the PD, the value of $R_B$ can be set. The value of $C_B$ is then selected through a trade-off between the BW and $f_{lc}$. The parasitics of $C_B$ add to the total input capacitance, which directly impacts the RX BW and sets an upper limit on $C_B$. As $f_{lc}$ is inversely proportional to $C_B$, $C_B$ should be sized to allow minimal drift during the longest run of consecutive identical digits in the targeted PRBS sequence at the lowest targeted data rate. As a rule of thumb, the maximum $f_{lc}$ can be obtained as the ratio between the targeted data rate and the number of bits in the targeted sequence [103].
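The sizing procedure can be walked through numerically. This sketch reads the bias relation as $V_{PD} = V_{DD} - 2 I_B R_B$ (one $R_B$ on each PD terminal, an interpretation of the figure), applies the data-rate/sequence-length rule of thumb for $f_{lc}$, and then assumes a first-order corner $f_{lc} \approx 1/(2\pi R_B C_B)$ valid for $R_B \gg Z_{IN}$. The supply, minimum PD bias, DC current, and data rate are hypothetical.

```python
import math

def max_rb(vdd, v_pd_min, i_dc_max):
    """Largest R_B that keeps V_PD = V_DD - 2*I_B*R_B >= v_pd_min."""
    return (vdd - v_pd_min) / (2 * i_dc_max)


def max_flc(data_rate, seq_bits):
    """Rule of thumb from the text: f_lc,max = data rate / sequence length."""
    return data_rate / seq_bits


def min_cb(rb, f_lc):
    """C_B for a first-order corner f_lc = 1/(2*pi*R_B*C_B) (assumed model)."""
    return 1 / (2 * math.pi * rb * f_lc)


# Hypothetical design point: 2.5 V supply, PD needs >= 1.5 V, up to 100 uA DC,
# PRBS7 (127 bits) at a lowest data rate of 10 Gb/s.
rb = max_rb(2.5, 1.5, 100e-6)
f_lc = max_flc(10e9, 2**7 - 1)
cb = min_cb(rb, f_lc)
print(f"R_B <= {rb / 1e3:.1f} kOhm, f_lc <= {f_lc / 1e6:.1f} MHz, C_B >= {cb * 1e15:.0f} fF")
```

A C_B of a few hundred femtofarads would add noticeable parasitics at the input, which illustrates the upper limit on $C_B$ mentioned above.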

During an operation cycle, the most critical phase is the sampling and integration phase. The performance of the RX during that phase determines the overall performance in terms of noise, BW, and dynamic range. To gain more insight into what dictates the noise and BW performance, the SA can be reduced during the sampling phase to the circuit shown in Figure 4.9 (a). A transistor, in this case, can be represented as a switch with a series on-resistance ($R_{ON}$) and infinite off-resistance, and the circuit can be represented by the model in Figure 4.9 (b).

Figure 4.9: (a) The current-SA during phase 2 and (b) its simplified equivalent circuit.

Ideally, the PD current should be completely routed to $C_{PP}$. However, along the way from the input to $C_{PP}$, the current sees several leakage paths, with $C_{PD}$ as the dominant one. Here, $C_{PD}$ represents the total capacitance at the RX input, including the PD capacitance, pad capacitance, and any parasitic capacitance from $C_B$. The ratio of the current lost in $C_{PD}$ to the current delivered to the SA equals the ratio of the SA's input impedance to $C_{PD}$'s impedance, which varies with frequency. This determines the -3 dB BW of the system, according to equation 4.1.

$$f_{3dB} = \frac{1}{2\pi R_{ON} C_{PD}} \tag{4.1}$$
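Equation 4.1 can be evaluated directly, together with the single-pole current division it implies. In this sketch, $R_{ON} = 500\ \Omega$ is a hypothetical switch resistance, 8 fF is the PD-EMU estimate, and 60 fF stands in for a hypothetical external PD; the model treats the SA input as purely $R_{ON}$.

```python
import math

def f3db(r_on, c_pd):
    """Eq. (4.1): -3 dB bandwidth set by the switch on-resistance and input capacitance."""
    return 1 / (2 * math.pi * r_on * c_pd)


def fraction_to_sa(f, r_on, c_pd):
    """Magnitude of the PD current fraction reaching the SA in the single-pole model."""
    w = 2 * math.pi * f
    return abs(1 / (1 + 1j * w * r_on * c_pd))


r_on = 500.0  # hypothetical switch on-resistance (ohms)
for c_pd in (8e-15, 60e-15):  # 8 fF PD-EMU estimate; 60 fF as a hypothetical external PD
    print(f"C_PD = {c_pd * 1e15:.0f} fF -> f_3dB = {f3db(r_on, c_pd) / 1e9:.1f} GHz")
```

The large drop in bandwidth with a wirebonded PD shows why $R_{ON}$ must be kept low when $C_{PD}$ grows.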

Moving on to the noise analysis, an ON switch can be viewed as a resistor with a parallel noise current source (for noise calculations, a Thévenin equivalent might be more convenient), and the RMS output noise voltage follows equation 4.2 [7].

$$V_{out}^{rms} = \sqrt{\frac{2kT}{C_{PP}}} \tag{4.2}$$

where $k$ is Boltzmann's constant and $T$ is the absolute temperature. Therefore, increasing $C_{PP,N}$ lowers the noise and enhances the sensitivity at the expense of the SA's gain and BW.
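Equation 4.2 shows the square-root dependence on capacitance: quartering the noise power requires four times the capacitance. The sketch below assumes $C_{PP} = C_{PN}$ and room temperature; the capacitor values swept are hypothetical bank settings.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant (J/K)

def ktc_noise_rms(c_pp, temp_k=300.0):
    """Eq. (4.2): differential RMS sampled noise sqrt(2kT/C_PP), assuming C_PP = C_PN."""
    return math.sqrt(2 * K_B * temp_k / c_pp)


for c in (10e-15, 20e-15, 40e-15, 80e-15):  # hypothetical capacitor-bank settings
    print(f"C_PP = {c * 1e15:.0f} fF -> {ktc_noise_rms(c) * 1e3:.3f} mV_rms")
```

Each doubling of $C_{PP}$ lowers the noise by only $\sqrt{2}$ but halves the integrated signal swing, so the gain penalty grows faster than the noise benefit.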

Finally, the input current to the current-mode receiver can vary due to a change in the received light intensity and/or the PD responsivity. Therefore, the RX gain should be adjustable to accommodate as wide a dynamic range as possible. The gain of an SA is directly coupled to the slew rate at which the output-node capacitance is discharged, and hence depends on the current strength and the capacitance value. To achieve this, the design uses capacitor banks for both $C_{PP,N}$ and $C_{SP,N}$ to control the capacitance and the current at the output nodes, respectively. Although the values of $C_{PP,N}$ and $C_{SP,N}$ are changed externally in the current implementation, a complete implementation requires a feedback loop that senses the SA output and trims their values.
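The capacitance side of the gain control can be sketched as a slew-rate lookup over a capacitor-bank code. The binary-weighted 3-bit bank, the 10 fF fixed part, the 2 fF unit, and the 35 µA node current are all hypothetical; the current contributed by the $C_{SP,N}$ bank is folded into the single node-current parameter.

```python
def cap_bank(code, c_fixed=10e-15, c_unit=2e-15, n_bits=3):
    """Hypothetical binary-weighted bank: C = C_fixed + code * C_unit."""
    if not 0 <= code < 2 ** n_bits:
        raise ValueError("code out of range")
    return c_fixed + code * c_unit


def slew_rate(i_node, code):
    """Slew rate (V/s) of an output node: dV/dt = I / C."""
    return i_node / cap_bank(code)


i_node = 35e-6  # hypothetical node current (A), including any C_SP contribution
for code in range(8):
    print(code, f"{slew_rate(i_node, code) / 1e9:.2f} V/ns")
```

A trimming loop of the kind described above would step this code until the SA output statistics look balanced.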

## 4.3 System implementation and measurements

The design was implemented in a TSMC 65 nm CMOS process, and Figure 4.10 shows the chip micrograph. Figure 4.11 shows a block diagram of the measurement setup. A Pulse Pattern Generator (PPG)/Error Detector (ED) pair was used to generate the input data and to check the output data for errors. A balun was used to generate fully differential clocks, and a programmable delay line was used to adjust the clock-to-data delay. The chip includes a serial to parallel interface (SPI) that feeds all the programmability bits used in system selection, offset cancellation, gain control, and termination resistance trimming.

The chip has two copies of the proposed design with different input options. The first uses an on-chip modulated current source with a total parasitic capacitance of 8 fF as a PD emulator (EMU). This is particularly useful, as it allows the system to be tested in an all-electrical setup. It also gives a general idea of how the system would work with a low-parasitic on-chip PD. After the system's functionality is verified using the PD-EMU, the second copy can be used with a wirebonded external PD (XPD) for a realistic test setup.

Ongoing measurements show that the system with the PD-EMU is functional up to 14 Gb/s and can achieve a BER of $10^{-12}$ with an input current of 70 $\mu$A, as shown in Figure 4.12. Table 4.1 compares the best results obtained to date for our design with recently published designs.



Figure 4.10: current-mode receiver die micrograph.



Figure 4.11: current-mode receiver measurement setup.



Figure 4.12: current-mode receiver measurement: BER vs. input current

|                    | This work           | [101]      | [104] | [7]   | [8]   |
|--------------------|---------------------|------------|-------|-------|-------|
| Technology         | 65 nm               | 130 nm SOI | 65 nm | 90 nm | 28 nm |
| Data rate (Gb/s)   | 14                  | 10         | 25    | 16    | 25    |
| Sensitivity (µA<sub>pp</sub>) | 70       | 6          | 86    | 284   | 25    |
| C<sub>PD</sub> (fF) | PD-EMU <sup>1</sup> | On-chip    | 60    | 440   | <25   |
| Efficiency (pJ/b)  | 0.27/2 <sup>2</sup> | 1.5        | 3.25  | 1.4   | 0.17  |

Table 4.1: Performance summary and comparison to state-of-the-art clocked TIAs.

<sup>1</sup>Estimated capacitance = 8 fF <sup>2</sup>Including clocking buffers and divider

# Chapter 5

# Silicon-Photonic coherent transmitter<sup>1</sup>

## 5.1 Introduction

The architecture of a coherent TX supporting DP-QPSK (or DP-QAM) signaling is shown in Figure 5.1. The Silicon-photonics IC consists of four MZMs, optical splitters and combiners, a polarization beam splitter and rotator (PBSR) [106, 107], and phase shifters. The Silicon-photonics TX is driven by four SiGe linear drivers. When adequately biased and driven, an operation described in detail later in this chapter, the TX generates a DP-QPSK optical output if the drivers' outputs are NRZ, and a DP-$M^2$-QAM output if the MZMs are driven by M-level PAM (PAM-M) signals.

In this chapter, we present a high-swing linear MZM driver. The driver utilizes several BW-extension circuit techniques, including a resistor-based capacitor splitting technique, to simultaneously achieve high BW (over 40 GHz), large swing (6 $V_{ppd}$), and low total harmonic distortion (THD) (3.6%) while being protected against breakdown-voltage (BV) limitations and mitigating reliability concerns in a 130 nm SiGe process. The driver also maintains a targeted swing and E/O BW using pre-emphasis control in its output stage and gain control in the pre-driver variable gain amplifiers (VGAs). Finally, the driver is co-packaged with a Silicon-photonics optical TX, enabling the same level of performance as LiNbO<sub>3</sub> MZMs with III-V drivers at 34 Gbaud.

Figure 5.1: A Dual-Polarization (DP) QPSK/QAM coherent TX

<sup>1</sup>© IEEE. Reprinted, with permission, from [105]

This chapter is organized as follows. A description of operation and requirements for the coherent TX is given in Section 5.2. The details of the MZM driver are presented in Section 5.3. The test benches used to evaluate the fabricated prototype, along with the experimental results, are summarized in Section 5.4. Finally, the chapter is concluded and summarized in Section 5.5.

## 5.2 Coherent Silicon-photonics transmitter

Figure 5.2 shows the basic building block of a coherent TX consisting of an electrical driver and an MZM. The optical power intensity at any bias point for the MZM is given by (5.1).

$$P_{out}(t) = \frac{\alpha P_{in}}{2} \left( 1 + \cos \pi \frac{V_{RF}(t) + V_{bias}}{V_{\pi}} \right),\tag{5.1}$$

where $V_{RF}(t)$ is the driver output signal, $V_{bias}$ is the voltage used to bias the MZM at the target operating point (of optical output power), $V_{\pi}$ is the half-wave voltage, $P_{in}$ is the input optical power to the MZM, and $\alpha$ accounts for the MZM insertion loss. Figure 5.3 shows the optical power intensity of the MZM as a function of the driver swing ($V_{RF}$).
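Equation 5.1 can be evaluated at the three bias points discussed in this section. The sketch uses a hypothetical $V_{\pi}$ of 7 V and an ideal device ($\alpha = 1$, infinite extinction); the function name is illustrative.

```python
import math

def mzm_power(v_rf, v_bias, v_pi, p_in=1.0, alpha=1.0):
    """Eq. (5.1): MZM output power for drive v_rf at bias v_bias."""
    return alpha * p_in / 2 * (1 + math.cos(math.pi * (v_rf + v_bias) / v_pi))


v_pi = 7.0  # hypothetical half-wave voltage (V)
for name, v_bias in (("peak", 0.0), ("quadrature", v_pi / 2), ("null", v_pi)):
    print(f"{name:>10}: P_out/P_in at V_RF = 0 -> {mzm_power(0.0, v_bias, v_pi):.3f}")
```

Quadrature sits at half of the maximum transmission, where the transfer curve is steepest and most linear, whereas null bias sits at zero transmission.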

In an intensity-modulation TX, the MZM is biased at the quadrature point (Q), the bias voltage at which the output optical intensity drops to half of its maximum. The MZM modulates the optical power intensity, sending high power for a logic-'1' and low power for a logic-'0' in NRZ transmission. The TX is optimized to deliver high launch power and a high extinction ratio, ideally with a swing of $V_{\pi}$.

On the other hand, in a coherent TX, the MZM is biased at the null bias point (N). Figure 5.3 also shows the optical field $(P \propto |E|^2)$ for the MZM. Equal but opposite swings at the null bias point lead to the optical output field changing its sign while maintaining the same magnitude, which enables phase modulation.

Figure 5.2: The main building block of a coherent TX - a driver and a null-biased MZM.

Figure 5.3: Null-biased MZM operation to obtain a PSK modulated signal.

In addition to the transfer functions, Figure 5.3 shows the constellation for an NRZ driving signal. When a common-mode signal drives both MZM arms, the electric field vectors at the combiner are $\pi$ out of phase, yielding an intensity null in the output optical power and a (0,0) point on the constellation map. However, when a differential signal representing a logic-'1' is applied to the MZM, one arm gets a positive phase shift that rotates its electric field vector anticlockwise, resulting in a non-zero optical signal and a point on the real axis of the constellation map. On the other hand, if a differential signal representing a logic-'0' is applied, the resulting output has the same intensity as that of the logic-'1', but with a $\pi$ phase shift. The operation of this configuration can therefore be represented by a binary phase shift keying (BPSK) constellation: driven with NRZ signals, the logic-'1' and logic-'0' have similar intensity levels, but the optical fields have 0 and $\pi$ phases, respectively. Moreover, the spectral efficiency of the TX can be increased by driving the modulator with a PAM-4 signal instead of NRZ, which results in four field levels along the real axis (two amplitudes for each of the two phases).
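The sign flip at null bias can be checked with the field form of an ideal MZM, $E \propto \cos\left(\pi (V_{RF} + V_{bias}) / (2V_{\pi})\right)$, whose square reproduces (5.1). The sketch assumes a hypothetical $V_{\pi}$ of 7 V, a $\pm V_{\pi}/2$ drive, and a polarity mapping chosen to give logic-'1' a 0 phase; that mapping is a convention, not taken from the hardware.

```python
import math

def mzm_field(v_rf, v_bias, v_pi, e_in=1.0, alpha=1.0):
    """Output optical field of an ideal MZM; its square reproduces Eq. (5.1)."""
    return e_in * math.sqrt(alpha) * math.cos(math.pi * (v_rf + v_bias) / (2 * v_pi))


v_pi = 7.0          # hypothetical half-wave voltage (V)
swing = v_pi / 2    # +/- V_pi/2 around null, i.e. a V_pi peak-to-peak drive
for bit in (1, 0):
    v_rf = -swing if bit else swing    # polarity chosen so that logic-'1' has 0 phase
    e = mzm_field(v_rf, v_pi, v_pi)    # null bias: V_bias = V_pi
    print(bit, f"field = {e:+.3f}", f"phase = {0 if e >= 0 else 180} deg")
```

Both symbols carry the same power, $|E|^2 = 0.5$, while their fields differ only in sign, which is the BPSK behavior described above.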

Figure 5.1 shows the schematic of the full coherent TX. It consists of four MZMs, each performing PSK, grouped into pairs. Each pair is configured as an IQ Mach-Zehnder interferometer (IQ-MZI), with a $\pi/2$ phase shift concatenated to one MZM to generate both *I*-phase and *Q*-phase constellation points. To represent the output of the TX mathematically, the electric field can be analyzed at different points along the optical TX. The electric fields at points 'A' and 'B' are represented in (5.2) and (5.3), respectively.

$$E_A(t) = E_I(t)\cos\left(\omega_{RF}t + \phi_I\right),\tag{5.2}$$

$$E_B(t) = E_Q(t) \sin\left(\omega_{RF}t + \phi_Q\right), \tag{5.3}$$

where  $\omega_{RF}$  is the angular frequency of the electric field,  $E_I (E_Q)$  and  $\phi_I (\phi_Q)$  are the electric field amplitude and phase at point 'A' ('B'), respectively.

At point 'C',  $E_A(t)$  and  $E_B(t)$  are combined and the resulting  $E_C(t)$  follows (5.4), where r and  $\theta$  are obtained according to (5.5) and (5.6), respectively.

$$E_C(t) = r(t)\cos\left(\omega_{RF}t + \theta(t)\right),\tag{5.4}$$

$$r(t) = \sqrt{E_I^2(t) + E_Q^2(t)},\tag{5.5}$$

$$\theta(t) = \tan^{-1}\left(\frac{E_Q(t)}{E_I(t)}\right),\tag{5.6}$$

A QAM signal is thus generated by combining the *I*-phase and *Q*-phase.
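Equations 5.5 and 5.6 can be sketched for the NRZ case, where $E_I, E_Q \in \{-1, +1\}$ yield the four QPSK points. The sketch uses `math.atan2` rather than a plain arctangent so the phase lands in the correct quadrant when $E_I < 0$; the normalized field values are illustrative.

```python
import math

def combine_iq(e_i, e_q):
    """Eqs. (5.5)-(5.6): magnitude and phase of the combined I/Q field.

    atan2 is used instead of a plain arctan so that the phase lands in the
    correct quadrant when E_I is negative.
    """
    return math.hypot(e_i, e_q), math.atan2(e_q, e_i)


# NRZ on both arms (E_I, E_Q in {-1, +1}) gives the four QPSK points.
for e_i in (1, -1):
    for e_q in (1, -1):
        r, theta = combine_iq(e_i, e_q)
        print(f"({e_i:+d}, {e_q:+d}) -> r = {r:.3f}, theta = {math.degrees(theta):+.0f} deg")
```

All four points share $r = \sqrt{2}$ and differ only in phase, at $\pm 45^\circ$ and $\pm 135^\circ$.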

A PBSR is then used to multiplex the signals from the upper and lower paths in the polarization domain. The output of such DP coherent optical TX can generally be represented using (5.7) and can be visualized via a constellation diagram.

$$E_{TX}(t) = \left[r(t)\cos\left(\omega_{RF}t + \theta(t)\right)\right]_{X\text{-pol}} + \left[r(t)\cos\left(\omega_{RF}t + \theta(t)\right)\right]_{Y\text{-pol}},\tag{5.7}$$

Thus, the TX outputs a DP-QPSK (or DP-4QAM) signal for NRZ-driven MZMs and a DP-16QAM signal for PAM-4-driven MZMs.
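The symbol count can be made concrete. The sketch builds the 16QAM grid from normalized PAM-4 levels on each quadrature and computes the raw dual-polarization line rate at the 34 Gbaud quoted earlier. Equally spaced levels are an idealization, since the sinusoidal MZM field would compress the outer levels, and the raw rate excludes FEC and framing overhead.

```python
import math
from itertools import product

PAM4_LEVELS = (-3, -1, 1, 3)  # normalized, idealized PAM-4 field levels per quadrature

def qam_constellation(levels):
    """All I/Q combinations of the per-quadrature levels (M x M points)."""
    return [complex(i, q) for i, q in product(levels, repeat=2)]


def dp_bits_per_symbol(points_per_pol):
    """Bits per dual-polarization symbol: log2(points) on each of X and Y."""
    return 2 * int(math.log2(points_per_pol))


const = qam_constellation(PAM4_LEVELS)
bits = dp_bits_per_symbol(len(const))
print(len(const), "points,", bits, "bits/symbol,", 34e9 * bits / 1e9, "Gb/s raw at 34 Gbaud")
```

With NRZ levels (-1, 1) instead, the same functions give 4 points and 4 bits per DP symbol, the DP-QPSK case.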

By inspecting Figure 5.1 and Figure 5.3, it is evident that any deviation from a  $\pi$  phase shift in the inner MZMs or  $\pi/2$  in the outer IQ-MZI would result in a distorted constellation, and hence, a degraded BER [108]. Therefore, a control loop ensuring proper biasing of the MZMs and IQ-MZI is implemented. Similar to bias controls in discrete quadrature modulators like LiNbO<sub>3</sub>, dithering is applied to set all the TX phase shifts to their desired points [109].

In a coherent link, it is desirable to maximize the launch power by maximizing the optical power intensity of the symbols. Maximum power intensity is achieved by driving the MZM across $2V_{\pi}$ of its characteristic. Driving the MZM with a voltage swing lower than $2V_{\pi}$ results in a launch power degradation, referred to as modulation loss (ML). However, because the optical field is a sinusoidal function of the drive voltage, spanning $2V_{\pi}$ means driving the MZM into a highly nonlinear region. As a compromise to keep the operation as linear as possible while transmitting high launch power, a $V_{\pi}$ driving swing around the null-bias point is usually targeted, resulting in $\approx 3$ dB of ML. In this work, the MZM electrodes are designed as $Z_0 = 44 \Omega$ transmission lines as a result of the trade-off between the driver power consumption, MZM BW, and phase-shift efficiency. The MZM is $\approx 3.5$ mm long with $V_{\pi}L = 2.44$ V.cm and is driven by a 6 V<sub>ppd</sub> driver. This swing spans close to $1V_{\pi}$ when the MZM is biased at the null for coherent operation and achieves an extinction ratio of 15.8 dB when the MZM is biased at quadrature for intensity-modulation operation. Achieving a full $1V_{\pi}$ modulation would require redesigning either the MZM or the driver output stage. On the MZM side, $V_{\pi}$ can be reduced by increasing the MZM length, which is usually limited by the available area on the SiPh chip and adds to the MZM loss. Alternatively, the phase-shift efficiency can be enhanced at the expense of a larger PN-junction capacitance, which lowers the characteristic impedance of the transmission line and increases the driver's power consumption. On the driver side, going beyond 6 $V_{ppd}$ in the used technology would raise reliability concerns, and a breakdown tripler would be needed. Following the same approach as the breakdown doubler, designing a tripler would require adding another auxiliary path and stacking more transistors in the output stage, at the expense of BW, headroom, and power consumption.
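The 3 dB figure follows from the null-biased transfer curve: with $V_{bias} = V_{\pi}$ in (5.1), $P/P_{max} = \sin^2\left(\pi V_{RF}/(2V_{\pi})\right)$, so a $\pm V_{\pi}/2$ drive reaches half of the maximum. The sketch also converts $V_{\pi}L$ and the ≈3.5 mm length into $V_{\pi}$. It treats the 6 V<sub>ppd</sub> swing as acting directly as $V_{RF}$, which is an assumption about how the differential drive maps onto the two arms.

```python
import math

def modulation_loss_db(swing_pp_over_vpi):
    """ML of an ideal null-biased MZM driven with +/- swing/2 (swing in units of V_pi).

    At null bias, P/P_max = sin^2(pi*V/(2*V_pi)); P_max is reached at V = +/-V_pi,
    i.e. a 2*V_pi peak-to-peak swing.
    """
    return 10 * math.log10(1 / math.sin(math.pi * swing_pp_over_vpi / 4) ** 2)


v_pi_l = 2.44     # V*cm, from the text
length_cm = 0.35  # ~3.5 mm, from the text
v_pi = v_pi_l / length_cm
print(f"V_pi ~ {v_pi:.2f} V")
for x in (1.0, 2.0, 6.0 / v_pi):  # last entry assumes the 6 V_ppd swing acts directly as V_RF
    print(f"swing = {x:.2f} V_pi -> ML = {modulation_loss_db(x):.2f} dB")
```

Under this assumption, the 6 V<sub>ppd</sub> drive covers roughly 0.86 $V_{\pi}$, consistent with the "close to $1V_{\pi}$" statement, at the cost of slightly more than 3 dB of ML.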

## 5.3 Mach-Zehnder modulator driver

In the presented work, the 44 $\Omega$ MZM is driven by a SiGe electrical driver, as shown in Figure 5.4. The high-speed path of the driver includes an input buffer, two VGAs, and the output stage. To support the high-speed path, the driver also includes a cross-coupled-quad proportional-to-absolute-temperature (PTAT) generator-based bandgap reference similar to [110], which provides reference currents to the rest of the driver blocks. Moreover, to minimize the impact of any mismatch between the differential paths, the driver includes an offset cancellation circuit. As shown in Figure 5.4, the DC levels of the driver outputs are extracted using an 8 kHz low-pass filter and then compared. Any mismatch is then corrected by a Gm-stage that draws unbalanced current from the input termination resistors through an NMOS current-steering pair. Furthermore, in the integrated solution, the driver's outputs are connected to the MZM and cannot be accessed for debugging or gain control purposes. Therefore, a peak detector (PKD) circuit [111], shown in Figure 5.4, is used to generate a signal strength indicator. To protect the driver against electrostatic discharge (ESD) events, ESD devices are added to all the pads. The ESD devices provide 500 V protection with $\approx 25$ fF capacitance for the high-speed pads and 2 kV protection with $\approx 95$ fF capacitance for the DC pads.



Figure 5.4: Mach-Zehnder modulator driver block diagram.

#### 5.3.1 Input buffer

In a coherent TX, the signal processed by the digital signal processing (DSP) engine and RF digital-to-analog converter (DAC) is fed to the MZM driver through the transceiver carrier in a 50 $\Omega$ environment. Therefore, the driver uses an emitter-follower-based input buffer along with 50 $\Omega$ input-side termination, as shown in Figure 5.5, to guarantee proper input matching, introduce the needed level shifting, and shield the input node from the parasitic capacitance of the following stages.

#### 5.3.2 Variable gain amplifier

To compensate for variations in the input signal level as well as the interconnect losses between the DSP and the driver, two Gilbert cell-based VGAs are used, as shown in Figure 5.6. Control voltages $VGA_P$ and $VGA_N$ drive $Q_3$-$Q_6$, which steer the signal currents, changing the current delivered to the load $R_L$ and hence controlling the gain of the amplifier. The VGAs provide 4.6 dB of continuous-time linear equalization [36] and 10 dB of gain range without significant changes to the BW or peaking.
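For intuition on the current steering, an idealized BJT Gilbert cell with the control voltage applied to the quad gives a gain proportional to $\tanh\left(V_{ctrl}/(2V_T)\right)$. This is the textbook large-signal relation and an assumption here: it ignores the CTLE, degeneration, and parasitics of the actual VGA. The sketch finds the control voltage for a 10 dB reduction from the asymptotic maximum; the function names are hypothetical.

```python
import math

V_T = 0.02585  # thermal voltage at ~300 K (V)

def gilbert_gain(v_ctrl, a_max=1.0):
    """Idealized Gilbert-cell gain: A = A_max * tanh(V_ctrl / (2 V_T))."""
    return a_max * math.tanh(v_ctrl / (2 * V_T))


def ctrl_for_gain_db(gain_db_below_max):
    """Differential control voltage giving a gain this many dB below A_max.

    A_max itself is only approached asymptotically, so 0 dB is not reachable.
    """
    g = 10 ** (-gain_db_below_max / 20)
    return 2 * V_T * math.atanh(g)


v10 = ctrl_for_gain_db(10.0)
print(f"V_ctrl for -10 dB: {v10 * 1e3:.1f} mV")
```

In this idealized model, only a few tens of millivolts of differential control span most of the gain range, which is why the control voltages must be well filtered.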



Figure 5.5: Input buffer schematic.



Figure 5.6: VGA schematic.

#### 5.3.3 Driver output stage

As mentioned in Section 5.2, an output swing of 6 V<sub>ppd</sub> is chosen for the output stage as a realistic value to provide the needed $V_{\pi}$ for the MZM. As the TX targets > 200 Gb/s DP-16QAM operation, the driver must meet several challenging specifications simultaneously.

For instance, a 6 $V_{ppd}$ output swing must be supported in our SiGe technology with a breakdown characterization voltage ($B_{VCEo}$) [112] of only 1.65 V. Simultaneously, a nominal electrical BW of 50 GHz and an E/O BW of 40 GHz are targeted, with sufficient slew rate for large-signal swings. Moreover, since the coherent optical link relies on pulse shaping and amplitude modulation to improve its spectral efficiency, the high-swing driver must also be highly linear, with THD < 6%. The driver's THD performance has a direct impact on the SNR, and standards often specify a THD requirement [113]. Note that prior art in high-speed large-swing drivers either used switching (non-linear) output stages for NRZ signaling in lower-cost BiCMOS processes [112, 114, 115] or reduced the $V_{ppd}$ for a linear high-speed design using a state-of-the-art process with high-$f_T$ MOSFETs and HBTs [116]. E/O measurements are also seldom reported.
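THD against the < 6% target can be estimated for any static nonlinearity driven by a sinusoid. The sketch uses a common definition (RMS of harmonics 2 to N divided by the fundamental amplitude), which may differ in detail from the one in [113], and computes the harmonics with a plain direct DFT over one period. The cubic test curve is hypothetical.

```python
import math

def harmonic_amplitudes(f, n_harm=9, n_samples=1024):
    """Amplitudes of harmonics 1..n_harm of f(sin(theta)) over one period (direct DFT)."""
    amps = []
    for k in range(1, n_harm + 1):
        re = im = 0.0
        for n in range(n_samples):
            theta = 2 * math.pi * n / n_samples
            y = f(math.sin(theta))
            re += y * math.cos(k * theta)
            im += y * math.sin(k * theta)
        amps.append(2 * math.hypot(re, im) / n_samples)
    return amps


def thd(f, n_harm=9):
    """THD = sqrt(sum of squared harmonics 2..n_harm) / fundamental amplitude."""
    a = harmonic_amplitudes(f, n_harm)
    return math.sqrt(sum(x * x for x in a[1:])) / a[0]


# Hypothetical weakly compressive/expansive stage: y = x + 0.1 x^3
print(f"THD = {100 * thd(lambda x: x + 0.1 * x ** 3):.2f} %")
```

For the cubic case, $\sin^3\theta = (3\sin\theta - \sin 3\theta)/4$, so the result can be checked by hand: the third harmonic is 0.025 against a fundamental of 1.075.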

Traditionally, MZM drivers used a cascode design in their output stage to support high swings. However, in a conventional cascode design, Figure 5.7 (a), the emitter voltage of the cascode device  $Q_2$  is almost constant, and the maximum output swing is limited by the collector-emitter breakdown voltage of  $Q_2$ . To break the  $B_{VCEo}$ -limit on the output swing and achieve 3  $V_{pp}$  ( $\approx 2B_{VCEo}$ ) swing, the conventional cascode design can be modified to divide the output swing between  $Q_1$  and  $Q_2$  using an auxiliary path as in [114], Figure 5.7 (b). Instead of driving the cascode base with a constant voltage, the auxiliary path drives  $Q_2$  with a scaled copy of the signal, ensuring that the output swing is divided between  $Q_1$  and  $Q_2$ , hence doubling the maximum achievable output swing.

Although adding the auxiliary path helps increase the maximum achievable swing, it adds more capacitance to the input node (P), as shown in Figure 5.7 (c). As a result, the voltage doubler design suffers a BW reduction compared to the conventional cascode design. Moreover, the MZM driver must deliver a high swing into a low impedance, requiring a large $I_{Q1}$ ($\geq V_{pp}/Z_o$). Consequently, $Q_1$ is sized up to handle that current and to operate near its peak $f_T$, leading to a large parasitic capacitance $C_T$ arising from the base-emitter capacitance ($C_{BE}$) of $Q_1$ and other parasitics. This, in turn, limits the BW at node (P). In addition, unlike the driver in [114], which mainly focused on NRZ operation, the driver in our work must address linearity to satisfy the coherent link requirements. Therefore, our design must primarily focus on the BW, linearity, and swing of the output stage while ensuring that the timing of the main and auxiliary paths is matched to avoid distorting the output. Note that small-signal gain/BW and linearity considerations reflect the overall performance even for a large-swing driver [112].
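The swing and current budget can be checked with back-of-the-envelope numbers. The sketch uses the 1.65 V $B_{VCEo}$, the 44 Ω electrode, and the 6 V<sub>ppd</sub> target from the text. It treats each single-ended output as carrying half of the differential swing and applies the $I_{Q1} \geq V_{pp}/Z_o$ bound as stated, ignoring any back-termination.

```python
BV_CEO = 1.65   # V, from the text
Z0 = 44.0       # ohm, MZM electrode impedance from the text
V_PPD = 6.0     # differential peak-to-peak target (V)

v_pp_side = V_PPD / 2          # each single-ended output carries half the differential swing
cascode_limit = BV_CEO         # conventional cascode: swing limited by Q2's BV_CEO
doubler_limit = 2 * BV_CEO     # VBD: swing shared by Q1 and Q2
i_q1_min = v_pp_side / Z0      # text's bound I_Q1 >= V_pp / Z_0

print(f"per-side swing {v_pp_side:.1f} Vpp, cascode limit ~{cascode_limit:.2f} V, "
      f"doubler limit ~{doubler_limit:.2f} V, I_Q1 >= {i_q1_min * 1e3:.1f} mA")
```

The 3 V<sub>pp</sub> per side exceeds the single-device limit but fits under the doubled limit, and the roughly 68 mA minimum bias explains why $Q_1$ ends up large and capacitive.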

Figure 5.7: (a) A conventional cascode output stage. (b) A voltage breakdown doubling (VBD) output stage. (c) VBD limitations.

To benefit from the voltage doubling technique while alleviating the BW limitation due to the added capacitance in the main and auxiliary paths, we present a resistor-based capacitor splitting technique, as shown in Figure 5.8. Inductor-based capacitor splitting techniques have been used before [37, 95] for gain-BW extension, whereas resistor-based capacitor splitting is usually not effective for gain-BW extension in conventional single-stage or multistage amplifiers.

In our two-path additive design, however, resistor-based capacitor splitting and shielding can extend the BW without affecting the overall gain of the signal path. The auxiliary path must nonetheless be modified to ensure that the reliability criteria are still met.

In Figure 5.8 (a), the inputs to both paths are connected to the same node, and, assuming $A_1$ and $A_2$ have sufficiently large BW, the system has a dominant pole at 1/RC, as can be seen in (5.8). Here, R is the load resistance of the stage that drives the two paths, and C is the total load capacitance looking into the two paths. The input capacitance of the main path is attributed to the $C_{BE}$ of its emitter-follower input stage, while the input capacitance of the auxiliary path is due to $C_{BE}$ and the Miller-amplified $C_{BC}$. After proper sizing, the input capacitances of the two paths are close in this design and are assumed to be C/2 each. Furthermore, the additive gain of both paths can be approximated by that of the main path, $A_1$, as the auxiliary path's gain is heavily degenerated by the output resistance of $Q_1$.

$$\frac{V_{OUT}}{V_{IN}} = (A_1 - A_2) \frac{-g_{mQP}R}{1 + sRC} \approx \frac{-A_1 g_{mQP}R}{1 + sRC},\tag{5.8}$$

where  $V_{IN}$  is the input voltage to the output stage (from the VGA),  $V_{OUT}$  is the output voltage, and  $g_{mQP}$  is the transconductance of the stage that drives the two paths.

If the auxiliary path capacitance is shielded between R - r and r, as shown in Figure 5.8 (b), the transfer function of the system changes as in (5.9).

$$\frac{V_{OUT}}{V_{IN}} = \frac{-A_1 g_{mQP} R \left(1 + \frac{s(R-r)rC}{2R}\right)}{\frac{s^2(R-r)rC^2}{4} + \frac{s(2R-r)C}{2} + 1}, \tag{5.9}$$

As per (5.9), resistor-based capacitor splitting creates a pole-zero pair at a higher frequency and, more importantly, pushes the dominant pole out without affecting the overall gain of the system, as shown in Figure 5.8 (c). As shown in Figure 5.8 (d), a higher split ratio (r/R) leads to a higher BW extension ratio (BWER), but it also reduces the signal into the auxiliary path, and this loss must be compensated within the auxiliary path. A split ratio of r/R = 0.5 leads to BWER = 1.5.
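The BWER values quoted above can be checked numerically from (5.9). The sketch below is only an illustration under stated assumptions: R and C are normalized to 1 (so frequency is in units of 1/RC), $A_1$ and $g_{mQP}$ are treated as frequency-independent, and the -3 dB point is found by bisection on $|V_{OUT}/V_{IN}|^2$ normalized to its DC value.

```python
import math

def h_mag_sq(w, split):
    """|V_OUT/V_IN|^2 from (5.9), normalized to DC, with R = C = 1 (assumption).

    split = r/R and s = j*w, so w is expressed in units of 1/(RC).
    """
    s = 1j * w
    r = split
    num = 1 + s * (1 - r) * r / 2
    den = s**2 * (1 - r) * r / 4 + s * (2 - r) / 2 + 1
    return abs(num / den) ** 2

def bwer(split, w_hi=10.0, iters=60):
    """-3 dB frequency of (5.9) relative to the unsplit case (1/RC), via bisection."""
    lo, hi = 0.0, w_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h_mag_sq(mid, split) > 0.5:
            lo = mid  # still above the -3 dB level: search higher
        else:
            hi = mid
    return 0.5 * (lo + hi)

for split in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"r/R = {split:.2f}  ->  BWER ~ {bwer(split):.2f}")
```

With r = 0 the expression collapses to the single pole of (5.8), and with r = R only the main-path C/2 remains at the driving node, so BWER approaches 2; intermediate splits fall in between.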



Figure 5.8: (a) Main and auxiliary paths connect to the same driving point. (b) Resistor-based capacitor splitting. (c) BW extension due to resistor-based capacitor splitting. (d) BW extension ratio vs. r/R.



Figure 5.9: Output stage design evolution, changes in the driver transfer function and THD enhancement: Voltage breakdown doubler  $\rightarrow$  Capacitor-splitting  $\rightarrow$  emitter-degeneration  $\rightarrow$  emitter-degeneration and pre-emphasis.

With the design targets and considerations described earlier, the rest of this section explains the design flow for the output stage; Figure 5.9 shows the design evolution. After the output differential pair ($Q_1$) and cascode transistors ($Q_2$) are sized to carry the current needed for the targeted large signal swing on the load, matching between the main and auxiliary paths is addressed. Since the output resistance of $Q_1$ acts as a large emitter degeneration for $Q_2$, the auxiliary path exhibits a higher BW than the main path. Therefore, $C_H$ and $R_H$ are used to slow down the auxiliary path to match the timing of the main path [114].

The inadequate BW of the voltage breakdown doubling (VBD) architecture is then enhanced by pushing the dominant pole out using resistor-based capacitor splitting. The split in this design results in BWER = 1.3 without any power, gain, or area overhead. A side effect of this technique is a drop in the auxiliary path gain, which is easily compensated by increasing $R_H$ and decreasing $C_H$. However, the practical BWER attainable from this technique is limited by the effect that increasing $R_H$ has on headroom, timing mismatch, and reliability. For instance, r and $R_H$ must be chosen carefully so that $Q_3$ stays in the active region. Moreover, to maintain the timing between both paths, an increase in $R_H$ must be accompanied by a reduction in $C_H$; eventually, the capacitance seen by $R_H$ is limited by the parasitic capacitance of $Q_2$. Finally, the breakdown voltage in the used technology is rated for open-base operation ($BV_{CEO} = 1.65$ V) and for a 100 $\Omega$ base-resistor connection ($BV_{CE|R=100} \simeq 2.1$ V). Low $R_H$ values therefore allow a margin to exceed $BV_{CEO}$ without sacrificing the reliability of $Q_2$. In this work, BWER is practically limited to 1.5 by the parasitic capacitance of $Q_2$; to allow a reasonable design margin, a split ratio resulting in BWER = 1.3 is chosen.

Next, resistor degeneration ($R_{E1}$) is implemented for $Q_1$. The degeneration increases linearity and BW because the effective transconductance of $Q_1$ ($g_{m1}$) and its base-emitter capacitance ($C_{BE1}$) are both reduced by a factor of $[1 + g_{m1}(R_{E1}/2)]$. This also reduces the gain, but the reduction is acceptable as it brings the overall gain closer to the targeted range. In the linear, high-swing output stage driving a low impedance, BW extension and linearity enhancement matter more than the resulting gain reduction, which can be compensated in the VGA stages, where the design constraints are more relaxed. To maintain timing, BW, and linearity matching between both paths, emitter degeneration ($R_{E3}$) is also implemented in the auxiliary path, with $R_{E3}$ chosen to satisfy (5.10), where $g_{m3}$ is the transconductance of the common-emitter transistor ($Q_3$) in the auxiliary path.

$$1 + g_{m1}\left(\frac{R_{E1}}{2}\right) = 1 + g_{m3}\left(\frac{R_{E3}}{2}\right),\tag{5.10}$$
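Since $g_m = I_C/V_T$ for a bipolar device, (5.10) reduces to $g_{m1}R_{E1} = g_{m3}R_{E3}$, i.e. $I_{C1}R_{E1} = I_{C3}R_{E3}$. The sketch below picks $R_{E3}$ from that condition; the bias currents and $R_{E1}$ value are hypothetical placeholders, not the values used in this design.

```python
V_T = 0.0259  # thermal voltage kT/q at about 300 K, in volts

def gm(i_c):
    """Small-signal transconductance of a BJT biased at collector current i_c (A)."""
    return i_c / V_T

def degeneration_factor(i_c, r_e):
    """1 + g_m * (R_E / 2), the quantity equated on both sides of (5.10)."""
    return 1 + gm(i_c) * r_e / 2

def matched_re3(i_c1, r_e1, i_c3):
    """R_E3 satisfying (5.10): g_m1 * R_E1 = g_m3 * R_E3."""
    return r_e1 * gm(i_c1) / gm(i_c3)

# Hypothetical bias points for illustration only
i_c1, r_e1, i_c3 = 20e-3, 10.0, 5e-3
r_e3 = matched_re3(i_c1, r_e1, i_c3)
print(f"R_E3 = {r_e3:.1f} ohm; factors: "
      f"{degeneration_factor(i_c1, r_e1):.3f} (main) vs {degeneration_factor(i_c3, r_e3):.3f} (aux)")
```

A more lightly biased $Q_3$ therefore needs proportionally more degeneration to keep the two paths' degeneration factors, and hence their relative BW and linearity, aligned.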

The degeneration increases the overall BWER to 2.3 and improves the simulated 1 GHz THD from 6.7% to 5.1%. The THD is simulated using a 1 GHz single-tone input signal and by observing the power levels of the fundamental and its harmonics at the output [117]. Finally, emitter degeneration at  $Q_5$  ( $R_{E5}$ ) is used to further improve the linearity for an overall THD of 3.6%, and pre-emphasis capacitor ( $C_{E5}$ ) is used to introduce a low-frequency zero and push the overall BWER to 4.54.
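The single-tone THD evaluation described above can be sketched as follows: drive a nonlinearity with one tone, take the DFT bins at the fundamental and its harmonics, and form the root-sum-square of the harmonic amplitudes over the fundamental. The `tanh` stage is a hypothetical stand-in for a compressive amplifier, and the record is assumed to contain an integer number of cycles so every harmonic falls on a single DFT bin.

```python
import cmath
import math

def harmonic_amplitude(samples, cycles, harmonic):
    """Single-sided amplitude of the DFT bin at harmonic * cycles.

    Assumes coherent sampling: the record holds an integer number of input cycles.
    """
    n = len(samples)
    k = harmonic * cycles
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
    return 2 * abs(acc) / n

def thd(samples, cycles, max_harmonic=9):
    """THD = sqrt(sum of squared harmonic amplitudes) / fundamental amplitude."""
    fund = harmonic_amplitude(samples, cycles, 1)
    harm_sq = sum(harmonic_amplitude(samples, cycles, h) ** 2
                  for h in range(2, max_harmonic + 1))
    return math.sqrt(harm_sq) / fund

# Hypothetical compressive stage (tanh) driven by a single tone
N, CYCLES = 1024, 4
tone = [math.sin(2 * math.pi * CYCLES * i / N) for i in range(N)]
out = [math.tanh(1.5 * x) for x in tone]
print(f"THD = {100 * thd(out, CYCLES):.2f} %")
```

Because amplitude squared is proportional to power, the same ratio can be formed directly from the harmonic power levels read off a simulated spectrum.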

Overall, the combination of these techniques extends the electrical BW from 11 GHz to 50 GHz and reduces the THD of the high-swing driver by 47%, from 6.7% to 3.6%. The tunable $C_{E5}$, along with the VGA de-emphasis, compensates for variations to ensure a minimum E/O BW of 40 GHz.

Even though small-signal analysis provides the intuition needed to understand and design the output stage, the large-signal analysis shown in Figure 5.10 provides further insight into the design. The timing of the main and auxiliary paths has to be matched to minimize the output distortion and maintain breakdown protection, as shown in Figure 5.10 (a). Since the auxiliary path has a higher BW than the main path, the signal traveling through the main path arrives at the output later than the auxiliary path's signal, and the output can experience severe distortion. Therefore, slowing down the auxiliary path using $C_H$ [114] and speeding up the main path using emitter degeneration $R_{E1}$, so that the two delays match, minimizes the output distortion.
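A first-order way to size $C_H$ for this timing match is to treat each path as a single pole, whose low-frequency group delay is $\tau = 1/(2\pi f_{3dB})$, and to let $R_H C_H$ supply the delay the faster auxiliary path is missing. This is only a sketch under that single-pole assumption; the path bandwidths and $R_H$ values below are hypothetical.

```python
import math

def tau_from_bw(f3db_hz):
    """Low-frequency group delay of a single-pole stage: 1 / (2*pi*f_3dB)."""
    return 1.0 / (2 * math.pi * f3db_hz)

def c_h_for_matching(f_main_hz, f_aux_hz, r_h_ohm):
    """C_H whose R_H*C_H delay makes up the auxiliary path's delay deficit."""
    extra = tau_from_bw(f_main_hz) - tau_from_bw(f_aux_hz)
    if extra < 0:
        raise ValueError("auxiliary path is already slower than the main path")
    return extra / r_h_ohm

# Hypothetical: main path 20 GHz, auxiliary path 60 GHz before adding R_H/C_H
for r_h in (50.0, 100.0, 200.0):
    c_h = c_h_for_matching(20e9, 60e9, r_h)
    print(f"R_H = {r_h:5.0f} ohm -> C_H = {c_h * 1e15:6.1f} fF")
```

The product $R_H C_H$ stays fixed, which is why raising $R_H$ (for example, to restore auxiliary-path gain after capacitor splitting) forces a smaller $C_H$ until the parasitic capacitance of $Q_2$ sets the floor.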

Another large-signal consideration is the dynamic base current of $Q_1$ and $Q_2$. Although their static base currents are low, large dynamic base currents are needed to charge and discharge their large parasitic capacitances ($C_{BE}$ and $C_{BC}$) [112]. Therefore, the driving stages must be designed to supply these currents. As shown in Figure 5.10 (b), the charge-discharge currents of $C_H$ oppose those of $C_{BE2}$ and $C_{BC2}$, which minimizes the overall dynamic current needed at node (N2). Moreover, degenerating $Q_1$ reduces its effective $C_{BE1}$, reducing the dynamic current needed at node (N1). Hence, $C_H$ and $R_{E1}$ relax the drive requirements of the stage that drives the two paths and enhance the speed of the overall system.
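The gap between static and dynamic base current follows from $i = C\,dv/dt$. For a sinusoid of peak amplitude $V_{pk}$ at frequency f, the peak charging current is $2\pi f C V_{pk}$. The sketch below compares that with a static base current $I_C/\beta$; all component values are hypothetical, and the optional `degeneration` argument simply divides the capacitance to mimic the effective $C_{BE1}$ reduction described above (a first-order assumption).

```python
import math

def peak_charging_current(c_farad, v_peak, f_hz, degeneration=1.0):
    """Peak i = C*dv/dt for v(t) = v_peak*sin(2*pi*f*t).

    `degeneration` divides C, approximating the [1 + g_m*R_E/2] reduction of C_BE.
    """
    return 2 * math.pi * f_hz * (c_farad / degeneration) * v_peak

# Hypothetical: 100 fF of parasitics, 0.5 V peak at 20 GHz; I_C = 20 mA with beta = 200
i_dyn = peak_charging_current(100e-15, 0.5, 20e9)
i_static = 20e-3 / 200
print(f"dynamic {i_dyn * 1e3:.2f} mA vs static {i_static * 1e3:.2f} mA")
print(f"with a 3x degeneration factor: {peak_charging_current(100e-15, 0.5, 20e9, 3.0) * 1e3:.2f} mA")
```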

The complete output stage combines all these steps in a differential fashion, as shown in Figure 5.11. It consists of a driving stage with a split load resistor, a main path providing all the needed gain, and an auxiliary path to double the maximum achievable swing with its input capacitance shielded by the resistor-based capacitor splitting. The output stage uses ground as the high supply and -5.5 V as the low supply. Drop diodes are used to adjust the DC operating point of each stage separately.





Figure 5.10: Large signal analysis for the output stage: (a) Effect on the output based on matching the timing between the two paths. (b) Dynamic base currents.



Figure 5.11: The complete output stage of the MZM driver.

# 5.4 System implementation and measurements

A prototype 2-channel driver chip is fabricated in a 130 nm SiGe BiCMOS process with a 250 GHz $f_T$. Two 2-channel driver dies and a Silicon-photonic IC are flip-chipped to a ceramic substrate and co-packaged, as shown in Figure 5.12. The driver is first tested electrically by wafer probing using a 40 GHz vector network analyzer. The measured electrical S21 BW exceeds 40 GHz, as shown in Figure 5.13, and S11 remains below -10 dB for frequencies up to 40 GHz, as shown in Figure 5.14. The E/O BW of the flip-chipped assembly is then measured using a 50 GHz optical vector network analyzer setup, showing a 40 GHz E/O BW, as shown in Figure 5.15.



Figure 5.12: Die photos of the Silicon-photonic IC and driver chips alongside the flip-chipped assembly.



Figure 5.13: Electrical single-ended S21 of the driver.



Figure 5.14: Electrical single-ended S11 of the driver.



Figure 5.15: E/O S21 of the flip-chipped assembly.

The ROSNR at the RX is an overall metric of coherent link performance. The test setup used to measure the ROSNR is shown in Figure 5.16 (a). For a constant optical power at the RX, set by the variable optical attenuator $VOA_2$, the noise power added to the TX output is controlled by $VOA_1$ until a target BER is reached at the RX; the OSNR at that point is the ROSNR. For this application, a BER of 3E–2 is targeted. As shown in Figure 5.16 (b), for the targeted BER and RX input optical power ranging from –25 to 0 dBm, the 34 Gbaud DP-16QAM performance of the presented TX matches that of a commercial fixture comprising a 32 Gbaud III-V linear driver and a 35 GHz LiNbO<sub>3</sub> modulator.
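The ROSNR search amounts to sweeping OSNR until the BER crosses the target. The sketch below mimics that sweep with bisection on an idealized back-to-back model, and none of its ingredients come from the measurement: AWGN only, the nearest-neighbour BER approximation for Gray-coded square 16QAM, and the common OSNR-to-SNR conversion $SNR = 2B_{ref}\,OSNR/(p R_s)$ with a 12.5 GHz reference bandwidth and p = 2 polarizations. The value it returns is therefore a theoretical floor, not the measured ROSNR.

```python
import math

B_REF_HZ = 12.5e9  # 0.1 nm OSNR reference bandwidth near 1550 nm (assumed convention)

def ber_16qam(snr_linear):
    """Approximate Gray-coded square 16QAM BER in AWGN; snr_linear = Es/N0."""
    return 0.375 * math.erfc(math.sqrt(snr_linear / 10))

def snr_from_osnr(osnr_linear, baud, polarizations=2):
    """Per-polarization SNR from OSNR: SNR = 2*B_ref*OSNR / (p*R_s)."""
    return 2 * B_REF_HZ * osnr_linear / (polarizations * baud)

def required_osnr_db(target_ber, baud, lo_db=0.0, hi_db=40.0, iters=60):
    """Bisect on OSNR (dB) until the modelled BER hits the target (like sweeping VOA_1)."""
    for _ in range(iters):
        mid = 0.5 * (lo_db + hi_db)
        ber = ber_16qam(snr_from_osnr(10 ** (mid / 10), baud))
        if ber > target_ber:
            lo_db = mid  # too noisy: more OSNR is needed
        else:
            hi_db = mid
    return 0.5 * (lo_db + hi_db)

print(f"ideal ROSNR for 34 Gbaud DP-16QAM at BER 3E-2: {required_osnr_db(3e-2, 34e9):.1f} dB")
```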

The 2-channel driver chip occupies an area of $1.6 \text{ mm}^2$. The output stage consumes 145 mA from a -5.5 V supply, while the pre-amplifiers, along with the rest of the blocks on the chip, consume 75 mA from a -3.3 V supply. Accordingly, the total measured power consumption is $\approx 1 \text{ W}$ per channel, and Figure 5.17 shows the power breakdown based on post-layout relative contributions.
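The ≈1 W figure follows directly from the two supply rails. The short check below assumes, as the per-channel wording implies, that both quoted currents are per channel.

```python
# Supply currents and rail magnitudes quoted in the text (assumed per channel)
output_stage_w = 145e-3 * 5.5   # about 0.7975 W from the -5.5 V rail
pre_amp_w = 75e-3 * 3.3         # about 0.2475 W from the -3.3 V rail
total_w = output_stage_w + pre_amp_w

print(f"output stage {output_stage_w:.3f} W + rest {pre_amp_w:.3f} W = {total_w:.3f} W "
      f"({100 * output_stage_w / total_w:.0f}% in the output stage)")
```

The high-swing output stage dominates the budget, consistent with its role in driving the MZM.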

The full E/O assembly of four drivers and dual-polarization IQ modulators is tested using an optical modulation analyzer [118], which visualizes the transmitted signal in a constellation format and constructs the TX output eye diagrams for the different phases and polarizations. A commercial coherent DSP and a commercial 8-bit 64 GS/s DAC provide the multilevel PRBS-31 signals needed for the different test conditions. The DAC is connected to the drivers through a board and cabling with a 14 GHz bandwidth, which constrains the link performance; this limitation is partially compensated by the VGA continuous-time linear equalization and by DSP equalization. Figure 5.18 shows the constellations for 34 Gbaud DP-16QAM, achieving an aggregate data rate of 272 Gb/s/$\lambda$ at a BER below 1E–3, and Figure 5.19 shows the constellations for 64 Gbaud DP-QPSK, achieving an aggregate data rate of 256 Gb/s/$\lambda$ at a BER below 1E–6. In both cases, the BER is well below the FEC threshold of 3E–2.

Figure 5.16: (a) ROSNR test setup. (b) ROSNR performance of this work compared to a setup comprising a III-V driver and a $LiNbO_3$ modulator.

Figure 5.17: Breakdown of the power consumed in the driver.

The optical modulation analyzer is used to recover the X-polarization (I-phase & Q-phase) and Y-polarization (I-phase & Q-phase) eye diagrams for the 272 Gb/s DP-16QAM and 256 Gb/s DP-QPSK cases as shown in Figure 5.20 (a) and (b), respectively.

To enable higher data rates and higher-order modulation schemes such as 64QAM, a state-of-the-art coherent DSP is used to test the E/O TX. Moreover, the overall linearity of the link is enhanced by reducing the driver's output swing to 2.4 V<sub>ppd</sub>, allowing PAM-4 and PAM-8 operation at high baud rates. As a result, however, the ML increases by 3 dB, and $P_{LO}$ is increased to maintain the same launch power. Consequently, 552 Gb/s/$\lambda$ is achieved with 69 Gbaud DP-16QAM at a BER of 3.2E–2, and 408 Gb/s/$\lambda$ is achieved with 34 Gbaud DP-64QAM at a BER of 9.6E–3, as shown in Figure 5.21 and Figure 5.22, respectively, allowing for post-FEC error-free operation.



Figure 5.18: Constellation for 272 Gb/s DP-16QAM.



Figure 5.19: Constellation for 256 Gb/s DP-QPSK.



Figure 5.20: Optical modulation analyzer-recovered eye diagrams for (a) 272 Gb/s DP-16QAM and (b) 256 Gb/s DP-QPSK.



Figure 5.21: Constellation for 552 Gb/s DP-16QAM.



Figure 5.22: Constellation for 408 Gb/s DP-64QAM.

The performance summary and comparison with state-of-the-art drivers are shown in Table 5.1. Using only 250 GHz-$f_T$ HBTs in the high-speed path, this is the only design to report an E/O BW of 40 GHz. Moreover, the design achieves the highest gain, 30 dB, while allowing a 10 dB dynamic range to accommodate changes in the DSP signal strength. Most of the high-swing drivers in Table 5.1 are non-linear switching amplifiers, whereas this design achieves high linearity, with 3.6% THD at 1 GHz. Table 5.2 summarizes the overall single-polarization (SP) and DP performance of the E/O TX assembly.

|                            | [114]           | [112]           | [115]           | [116]                                    | This work                               |
|----------------------------|-----------------|-----------------|-----------------|------------------------------------------|-----------------------------------------|
| Gbaud                      | 10              | 40              | 40              | 56/64                                    | 34/69/64                                |
| Signalling                 | NRZ             | NRZ             | NRZ             | PAM-8/PAM-4                              | PAM-8/PAM-4/NRZ                         |
| Driver type                | Switching       | Switching       | Switching       | Linear                                   | Linear                                  |
| Elec. BW (GHz)             | N/R             | 33.7            | N/R             | 57.5                                     | 40<sup>1</sup>                          |
| E/O BW (GHz)               | N/R             | N/R             | N/R             | N/R                                      | 40                                      |
| Gain (dB)                  | 23.6            | 13              | 26              | 14.5                                     | 20-30 (with VGA)                        |
| Supply<sup>2</sup> (V)     | 6.5             | 5.5             | 5               | 6                                        | 5.5                                     |
| Output (V<sub>ppd</sub>)   | 7.6             | 6               | 6               | 3.8/4.8                                  | 2.4/2.4/6                               |
| THD                        | N/A             | N/A             | N/A             | 6% (at 10 GHz and 6 V<sub>ppd</sub>)     | 3.6% (at 1 GHz and 6 V<sub>ppd</sub>)   |
| P<sub>DC</sub> (W)         | 3.7             | 1.35            | 1.92            | 0.82                                     | 1                                       |
| Area (mm<sup>2</sup>)      | 1.2             | 0.72            | 3               | 0.6                                      | 1.6                                     |
| BiCMOS Tech.               | 180nm           | 250nm           | 130nm           | 55nm                                     | 130nm                                   |
| $f_T$ (GHz)                | 120 (HBTs only) | 180 (HBTs only) | 200 (HBTs only) | 300 (HBTs and MOSFETs)                   | 250 (HBTs only)                         |

Table 5.1: Performance summary of the high-swing linear driver and comparison to prior art

<sup>1</sup>Measured on a 40 GHz Vector Network Analyzer

<sup>2</sup>Of the output stage

Table 5.2: E/O TX assembly performance summary.

| Modulation                        | SP-QPSK | DP-QPSK | SP-16QAM | DP-16QAM | DP-16QAM | DP-64QAM |
|-----------------------------------|---------|---------|----------|----------|----------|----------|
| Gbaud                             | 64      | 64      | 45       | 34       | 69       | 34       |
| Data rate (Gb/s)                  | 128     | 256     | 180      | 272      | 552      | 408      |
| Driver output (V <sub>ppd</sub> ) | 6       | 6       | 6        | 6        | 2.4      | 2.4      |
| # of driver modules               | 2       | 4       | 2        | 4        | 4        | 4        |
| P <sub>DC</sub> (W)               | 2       | 4       | 2        | 4        | 4        | 4        |
| Energy/bit (pJ/b)                 | 15.6    | 15.6    | 11.11    | 14.7     | 7.2      | 9.8      |
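The data-rate and energy-per-bit rows of Table 5.2 follow from baud rate × bits per symbol per polarization × number of polarizations, and from $P_{DC}$ divided by that rate. The sketch assumes the standard 2, 4, and 6 bits per symbol for QPSK, 16QAM, and 64QAM and counts the raw line rate without FEC overhead.

```python
# Table 5.2 columns: (label, Gbaud, bits/symbol/polarization, polarizations, P_DC in W)
COLUMNS = [
    ("SP-QPSK", 64, 2, 1, 2),
    ("DP-QPSK", 64, 2, 2, 4),
    ("SP-16QAM", 45, 4, 1, 2),
    ("DP-16QAM", 34, 4, 2, 4),
    ("DP-16QAM", 69, 4, 2, 4),
    ("DP-64QAM", 34, 6, 2, 4),
]

def line_rate_gbps(gbaud, bits, pols):
    """Raw line rate in Gb/s (no FEC overhead removed)."""
    return gbaud * bits * pols

def energy_pj_per_bit(p_dc_w, rate_gbps):
    """W / (Gb/s) = nJ/b, so multiply by 1e3 to get pJ/b."""
    return 1e3 * p_dc_w / rate_gbps

for label, gbaud, bits, pols, p_dc in COLUMNS:
    rate = line_rate_gbps(gbaud, bits, pols)
    print(f"{label:9s} {gbaud:3d} Gbaud -> {rate:4d} Gb/s, {energy_pj_per_bit(p_dc, rate):5.2f} pJ/b")
```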

# 5.5 Summary

In this chapter, we present a Silicon-photonic MZM-based optical TX and linear, high-swing SiGe drivers. The spectral efficiency of the optical TX is enhanced using advanced modulation schemes, such as QPSK and QAM, together with polarization multiplexing. To enable this, various circuit techniques are described that meet the stringent high-swing, BW, and linearity requirements of the MZM driver while retaining the high level of integration that Silicon photonics offers. The low-cost, compact, all-Si/SiGe design matches the ROSNR performance of LiNbO<sub>3</sub> modulators with III-V drivers at 34 Gbaud. To the best of our knowledge, this is the first published Silicon-photonic design to support DP-16QAM at data rates higher than 500 Gb/s/$\lambda$.
# Chapter 6

## Silicon-Photonic coherent receiver

### 6.1 Introduction

Coherent O/E RXs support high spectral efficiency by using varying-envelope modulation schemes, such as 16QAM, and DP multiplexing [14]. Prior art in linear TIAs has focused on increasing the transimpedance gain ($Z_T$) [119–122] and the –3 dB BW [120, 122], and on reducing the high-$Z_T$ input-referred noise (IRN) [119–122] and the 1 GHz THD [119–121]. However, to support data rates beyond 400 Gb/s/$\lambda$ and high-order modulation in links where the RX input signal current ($I_{IN}$) varies significantly with TX power, fiber length, and multiplexing/demultiplexing losses, link considerations impose far more stringent IRN-THD requirements: (1) low THD across a large range of $I_{IN}$; (2) low THD at 1 GHz and up to a frequency of BW/3; and (3) low IRN both at maximum $Z_T$ and at lower $Z_T$ (larger $I_{IN}$), to maintain SNR gains.

The O/E BW must be high enough to support the data rate, but not so large that channel crosstalk, IRN, and THD increase unnecessarily. For 66 Gbaud operation, a BW of 40 to 45 GHz is targeted.

Figure 6.1 shows a DP-QAM Silicon-photonic O/E RX. The received DP optical signal, $P_{RF}$, is demultiplexed into its X- and Y-polarizations using a PBSR. A 90° hybrid then down-converts the *I*-phase and *Q*-phase signals directly to baseband in an intradyne RX architecture ($\omega_{LO} \approx \omega_{RF}$). Inside the 90° hybrid, the received signal is split into two paths that are mixed with the LO in *I* and *Q* phase. Each PD receives an optical signal $E_{PD}(t) = E_{LO}(t) + E_{RF}(t)$, and the resulting current $I_{PD}$ follows (6.1).

$$I_{PD}(t) = \frac{R}{8} \left( P_{LO} + \sqrt{P_{RF}P_{LO}} \cos\left(\Phi(t)\right) \right), \tag{6.1}$$

where *R* is the PD responsivity,  $\Phi$  is the phase modulation component of the received signal, and  $P_{LO}$  and  $P_{RF}$  are the LO power and the received signal power, respectively. As can be seen in the second term of equation 6.1, the received input signal experiences an optical gain due to the mixing with  $P_{LO}$ . As a result, the sensitivity of the coherent RX is enhanced by increasing the LO power, and is superior to that of a direct-detect RX by a factor of  $2\sqrt{P_{LO}/P_{RF}}$  [119]. A detailed discussion on the design challenges for a coherent RX is provided in [119].
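To put numbers on (6.1), the sketch evaluates its DC term $R P_{LO}/8$, the beat-term amplitude $(R/8)\sqrt{P_{RF}P_{LO}}$, and the $2\sqrt{P_{LO}/P_{RF}}$ sensitivity factor quoted above. The responsivity and power levels are hypothetical. Note how the DC term grows linearly with $P_{LO}$; this is the current the IDCC loop of Section 6.2.2 has to sink.

```python
import math

def dbm_to_mw(p_dbm):
    """Convert optical power from dBm to mW."""
    return 10 ** (p_dbm / 10)

def pd_current_terms(resp_a_per_w, p_lo_mw, p_rf_mw):
    """DC term and beat-term amplitude of I_PD from (6.1), in mA for powers in mW."""
    dc = resp_a_per_w / 8 * p_lo_mw
    beat = resp_a_per_w / 8 * math.sqrt(p_rf_mw * p_lo_mw)
    return dc, beat

def coherent_gain(p_lo_dbm, p_rf_dbm):
    """Factor 2*sqrt(P_LO/P_RF) relative to a direct-detect RX, as quoted in the text."""
    return 2 * math.sqrt(dbm_to_mw(p_lo_dbm) / dbm_to_mw(p_rf_dbm))

# Hypothetical operating point: 1 A/W responsivity, 10 dBm LO, -20 dBm received signal
dc_ma, beat_ma = pd_current_terms(1.0, dbm_to_mw(10), dbm_to_mw(-20))
print(f"I_DC = {dc_ma:.3f} mA, beat amplitude = {beat_ma:.4f} mA, "
      f"gain factor = {coherent_gain(10, -20):.1f}")
```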

The RX high-swing linear output is digitized and processed via a commercial analog to digital converter (ADC) and a coherent DSP engine.

The O/E RX must support a wide range of $P_{RF}$ ($I_{IN} \propto P_{RF}$) and LO powers, as shown in Figure 6.2. At very low $I_{IN}$, the RX is noise-limited and does not meet the ROSNR target. Increasing $I_{IN}$ improves the SNR; however, to maintain a constant signal swing at the RX output for the ADC, a larger $I_{IN}$ must be accompanied by an appropriate $Z_T$ reduction. $Z_T$ reduction usually comes at the expense of a degraded IRN [119–122], which limits the ROSNR improvement and degrades the constellation. Increasing $I_{IN}$ further pushes the RX into the THD-limited regime. A low target ROSNR for the DP-16QAM constellation across a wide range of input signals can therefore only be met by making the RX reconfigurable, so that it achieves low noise and low THD across gain settings. To achieve this, the RX discussed in this chapter monitors its output level and automatically adjusts five different parameters to operate at the optimum point.

Figure 6.1: A dual-polarization QAM Silicon-photonic O/E RX.
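Holding the output swing constant fixes the product $I_{IN} Z_T$, so every decade of input current costs 20 dB of transimpedance. The sketch assumes a hypothetical 500 mVppd target swing at the ADC input.

```python
import math

def required_zt_dbohm(v_out_pp, i_in_pp):
    """Transimpedance (dB-ohm) mapping i_in_pp (A) onto a fixed output swing v_out_pp (V)."""
    return 20 * math.log10(v_out_pp / i_in_pp)

# Hypothetical 500 mVppd target swing
for i_in_ua in (50, 100, 500, 1000, 5000):
    print(f"I_IN = {i_in_ua:5d} uApp -> Z_T = {required_zt_dbohm(0.5, i_in_ua * 1e-6):5.1f} dBohm")
```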



Figure 6.2: IRN, THD, and ROSNR vs. *I*<sub>IN</sub> for different RX designs.

### 6.2 The proposed Auto-reconfigurable receiver

#### 6.2.1 Auto-reconfigurable Transimpedance Amplifier

The RX first stage is realized using a resistive-feedback TIA, in which a feedback resistor ($R_F$) is connected around a differential amplifier, and the TIA gain is approximately $R_F$. In [120–122], the value of $R_F$, and hence the gain of the first stage, is kept constant, and VGAs in subsequent stages realize the dynamic range. However, in systems where a wide dynamic range is needed, $R_F$ control is required to realize the low-gain end of the dynamic range. Figure 6.3 shows the required reconfiguration, and the design methodology is as follows:



Figure 6.3: TIA design for improving THD across different frequencies for low-gain settings:  $g_m$ -control  $\rightarrow Q_1$  reconfigurability  $\rightarrow R_L$ -control.

1. Although $R_F$ reduction alone can be used, it degrades the phase margin, leading to unwanted peaking in the output voltage (at 36 GHz in this design), which, in turn, degrades the high-frequency THD, as shown in Figure 6.3. The THD calculated from the system's large-signal response can be related to the peaking in its small-signal response, with the worst THD at one third of the peak frequency, where the third harmonic coincides with the peak.
2. To enhance the TIA phase margin, $R_F$ reduction is accompanied by a reduction in the forward-path gain: the $g_m$ of $Q_1$ is reduced by reducing $I_1$. The phase-margin enhancement due to this $g_m$-control reduces the peaking and lowers the mean THD (over frequencies up to BW/3) to 9.1%.

   However, $Q_1$ was sized in part to operate near its maximum $f_T$ at the original value of $I_1$ at high-gain (high $R_F$) settings. Therefore, at low-gain settings, a reduction in $I_1$ lowers the $f_T$ of $Q_1$, increases the delay around the feedback loop, and degrades the phase margin.
3. Thus, the $g_m$-control is augmented with $Q_1$ size-control to maintain high-$f_T$ operation across the dynamic range. Note that for an NPN, the impact of $Q_1$ size-control on $g_m$ is not significant. Adding $Q_1$ size-control fully realizes the benefits of the $g_m$ reduction and decreases the mean THD to 7.8%.

   Moreover, the PD-to-TIA connection acts as an LC network interacting with the TIA's input impedance ($Z_{IN}$). As $R_F$ changes, $Z_{IN}$ changes and unwanted peaking is created (at 16 GHz in this design), with the most peaking occurring at the lowest $R_F$ values.
4. In addition to the $g_m$-control, $R_L$-control is introduced to stabilize $Z_{IN}$, flatten the low-frequency peak, and improve the mean THD to 5.5%.

Moreover, enhancing the phase margin and reducing the unwanted peaking lowers the group-delay variation across gain settings, at frequencies up to 30 GHz, to less than 30 ps peak-to-peak.
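Steps 1 and 2 can be illustrated with the textbook two-pole model of a shunt-feedback TIA: a core amplifier with DC gain $A_0$ and a single pole $\tau_A$, input capacitance $C_{in}$, and feedback resistor $R_F$. Its closed-loop denominator is $s^2 R_F C_{in}\tau_A + s(R_F C_{in} + \tau_A) + (1 + A_0)$. This sketch ignores the PD-to-TIA LC network and the $f_T$ dependence of $Q_1$, and its component values are hypothetical; it only shows the trend that lowering $R_F$ raises Q (more peaking), while lowering $A_0$ (the $g_m$-control) pulls Q back down.

```python
import math

def tia_q(r_f, c_in, tau_a, a0):
    """Q of the closed-loop poles of a shunt-feedback TIA with a single-pole core amp.

    Denominator: s^2*R_F*C_in*tau_A + s*(R_F*C_in + tau_A) + (1 + A0).
    """
    tau_in = r_f * c_in
    return math.sqrt((1 + a0) * tau_in * tau_a) / (tau_in + tau_a)

def peaking_db(q):
    """Magnitude peaking of a second-order low-pass; zero when Q <= 1/sqrt(2)."""
    if q <= 1 / math.sqrt(2):
        return 0.0
    return 20 * math.log10(q / math.sqrt(1 - 1 / (4 * q * q)))

# Hypothetical values: C_in = 100 fF, core-amplifier time constant tau_A = 2 ps
C_IN, TAU_A = 100e-15, 2e-12
for r_f, a0 in ((5000, 10), (500, 10), (100, 10), (100, 3)):
    q = tia_q(r_f, C_IN, TAU_A, a0)
    print(f"R_F = {r_f:5d} ohm, A0 = {a0:2d}: Q = {q:.2f}, peaking = {peaking_db(q):.2f} dB")
```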

Besides enhancing the phase margin, reconfiguring the $Q_1$ size directly impacts the RX IRN and THD, as shown in Figure 6.4. At low $I_{IN}$ (high $Z_T$), RX performance is noise-limited. Since the parasitic base resistance of $Q_1$ ($r_b$) is a major noise contributor, it should be minimized by sizing up $Q_1$. However, at high $I_{IN}$ (low $Z_T$), RX performance is THD-limited, and the nonlinear parasitic capacitance ($C_{bc}$) of $Q_1$ contributes to the high-frequency THD, so sizing $Q_1$ down improves the high-frequency THD. These conflicting IRN-THD requirements are resolved by splitting $Q_1$, as shown in Figure 6.4. At low $Z_T$, the effective size is reduced to $Q_{1A}$ by turning $I_2$ off, which reduces $C_{bc}$ and lowers the high-frequency THD. As $Z_T$ increases, $I_2$ starts increasing, and $Q_{1B}$ is fully utilized at maximum TIA gain. As a result, $r_b$ is reduced, lowering the IRN. Figure 6.4 shows the simulated IRN and THD of a TIA with large, small, and reconfigurable $Q_1$ devices. The reconfigurable device approaches the IRN of the large device at maximum gain and the THD of the small device at minimum gain, hence covering a wide dynamic range without sacrificing either linearity or noise.

Figure 6.4: $Q_1$-reconfigurability for optimizing linearity at large $I_{IN}$ and noise at small $I_{IN}$.
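The size trade-off can be sketched with a first-order scaling assumption: paralleling N identical unit devices divides $r_b$ by N and multiplies $C_{bc}$ by N. The unit-device values below are hypothetical; the noise density is the thermal noise $\sqrt{4kTr_b}$ of the base resistance.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant (J/K)

def rb_noise_nv(r_b_ohm, temp_k=300.0):
    """Thermal voltage-noise density of the base resistance, sqrt(4kT*r_b), in nV/sqrt(Hz)."""
    return math.sqrt(4 * K_B * temp_k * r_b_ohm) * 1e9

# Hypothetical unit device: r_b = 100 ohm, C_bc = 5 fF.
# First-order assumption: N units in parallel give r_b / N and N * C_bc.
R_B_UNIT, C_BC_UNIT = 100.0, 5e-15
for n_units in (1, 2, 4):
    r_b = R_B_UNIT / n_units
    c_bc = C_BC_UNIT * n_units
    print(f"N = {n_units}: r_b = {r_b:5.1f} ohm, noise = {rb_noise_nv(r_b):.2f} nV/rtHz, "
          f"C_bc = {c_bc * 1e15:.0f} fF")
```

Enabling $Q_{1B}$ at high gain moves toward the low-noise end of this trade-off, while leaving only $Q_{1A}$ active at low gain moves toward the low-$C_{bc}$ end.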

Due to the dynamic changes in $g_m$ and $R_L$, the common-mode voltages in the TIA loop and at the TIA output can vary widely, potentially putting the RX at suboptimal bias points. DC loops are therefore added to the TIA and linked to the $g_m$ and $R_L$ controls to compensate for common-mode changes across gain settings.

#### 6.2.2 Collaborative offset and DC cancellation

In a coherent O/E RX, the received signal is mixed with the LO and then applied to the PD. Consequently, an increase in the LO power increases the input current to the TIA and enhances the RX sensitivity. However, it also increases the PD DC current ($I_{DC}$), a side product of the mixing. Thus, an $I_{DC}$ cancellation (IDCC) loop is essential at each input to retain the sensitivity enhancement and protect the RX from current overdrive. As shown in Figure 6.5, the IDCC compares each single-ended input common-mode voltage to a predetermined reference voltage ($V_{REF}$) and sinks any excess $I_{DC}$.

In addition, a DC offset cancellation (DCOC) loop is required to cancel any offset in the TIA or between the two IDCCs. The DCOC senses the difference between the P and N sides of the differential signal at its input and corrects the offset by drawing an unbalanced current at its output. In [119], the DCOC is connected to the TIA output; because the DCOC then drives an emitter-follower load, its gain can drop significantly, limiting its correction range. Moreover, any offset from the PDs and/or the IDCCs propagates through the TIA, degrading dynamic performance, including THD. A better approach is therefore to correct the TIA offset at its input. However, if the IDCC and DCOC both attempt to set the input common-mode voltages, they can contend with each other. Thus, a collaborative offset and $I_{DC}$ cancellation (COIDCC) loop is proposed, in which the output of the DCOC is scaled and added to the IDCC $V_{REF}$ to create separate, customized $V_{REF}$ values for the P and N paths. The COIDCC protects the first stage against high $I_{DC}$ and resolves offset at the very input to maintain high linearity.
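The COIDCC idea can be illustrated with a toy discrete-time model; it is not the circuit in Figure 6.5. Each input is modeled as a node whose voltage rises by a hypothetical resistance `r_in` times the net DC current, and each IDCC integrates the error to its own reference. The DCOC output is taken as a given value `v_dcoc` (in the real RX it comes from its own loop), scaled by a hypothetical `alpha`, and split symmetrically into the P and N references. Both loops settle, each sinking its own $I_{DC}$, and the inputs end up offset by exactly the correction the DCOC requested, so the two loops cooperate instead of fighting.

```python
def settle_idcc(i_dc, v_ref, r_in=50.0, v_bias=0.9, k=0.01, steps=200):
    """Toy IDCC loop for one input (hypothetical model and sign convention).

    The net DC current (i_dc - i_sink) raises the input voltage through r_in;
    the loop integrates (v_in - v_ref) into the sink current until v_in == v_ref.
    """
    i_sink = 0.0
    for _ in range(steps):
        v_in = v_bias + r_in * (i_dc - i_sink)
        i_sink += k * (v_in - v_ref)
    return i_sink, v_bias + r_in * (i_dc - i_sink)

def coidcc(i_dc_p, i_dc_n, v_ref, v_dcoc, alpha):
    """Scaled DCOC output splits V_REF into separate P and N references."""
    v_ref_p = v_ref + alpha * v_dcoc / 2
    v_ref_n = v_ref - alpha * v_dcoc / 2
    i_sink_p, v_p = settle_idcc(i_dc_p, v_ref_p)
    i_sink_n, v_n = settle_idcc(i_dc_n, v_ref_n)
    return i_sink_p, i_sink_n, v_p - v_n

# Hypothetical: mismatched PD DC currents and a 20 mV DCOC request scaled by 0.5
isp, isn, vdiff = coidcc(5e-3, 5.5e-3, 0.9, 0.02, 0.5)
print(f"sink P = {isp * 1e3:.2f} mA, sink N = {isn * 1e3:.2f} mA, V_P - V_N = {vdiff * 1e3:.1f} mV")
```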



Figure 6.5: Block level diagram of a single channel RX along with the details for Collaborative offset and  $I_{DC}$  cancellation (COIDCC) and the gain control.

#### 6.2.3 Automatic/Manual gain control

Figure 6.5 shows the block diagram of the RX. In addition to the reconfigurable TIA, the high-speed path consists of two current-steering VGAs similar to [119–122] and a 50 $\Omega$ output driver. The gains of the TIA and the VGAs can be adjusted manually by an externally applied voltage or automatically via an automatic gain control loop. In the automatic gain control mode, the output level is estimated by a peak detector circuit and compared to an externally applied voltage (OA) that sets the targeted output voltage swing; the result of this comparison is the gain control voltage ($V_{GC}$). Alternatively, in manual mode, $V_{GC}$ is applied externally and the automatic gain control loop is broken. In both cases, $V_{GC}$ is then processed in the control-generation block, which generates the TIA and VGA control signals.
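The automatic mode can be sketched as a discrete-time integrator of (OA minus peak-detector output) driving $V_{GC}$. This is a behavioral sketch with several assumptions: an ideal peak detector, a hypothetical gain law in which $Z_T$ (in dB$\Omega$) is linear in $V_{GC}$ over [0, 1] and spans the 75.5 dB$\Omega$ maximum and 35.5 dB range reported in Section 6.3, and an arbitrary loop gain `k`.

```python
ZT_MAX_DB = 75.5               # maximum Z_T reported in Section 6.3
ZT_MIN_DB = ZT_MAX_DB - 35.5   # bottom of the reported 35.5 dB range

def zt_ohm(v_gc):
    """Hypothetical gain law: Z_T in dB-ohm linear in V_GC over [0, 1]."""
    return 10 ** ((ZT_MIN_DB + v_gc * (ZT_MAX_DB - ZT_MIN_DB)) / 20)

def run_agc(i_in_pp, oa=0.5, k=0.2, steps=500):
    """Integrate (OA - peak) into V_GC, clamped to [0, 1]; return final V_GC and swing."""
    v_gc = 1.0  # start at maximum gain
    for _ in range(steps):
        peak = i_in_pp * zt_ohm(v_gc)  # ideal peak detector: output swing in volts
        v_gc = min(max(v_gc + k * (oa - peak), 0.0), 1.0)
    return v_gc, i_in_pp * zt_ohm(v_gc)

for i_in in (0.2e-3, 1e-3):
    v_gc, out = run_agc(i_in)
    print(f"I_IN = {i_in * 1e3:.1f} mApp -> V_GC = {v_gc:.3f}, output = {out:.3f} Vpp")
```

A larger input settles at a lower $V_{GC}$ and hence a lower $Z_T$, keeping the output at the swing set by OA.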

To get the best noise performance, the gain stages are controlled sequentially. The TIA gain is maximized first; then the VGAs start contributing to $Z_T$, with VGA1 starting slightly before VGA2. This permits the RX to benefit from any increase in the input signal without significantly increasing the RX noise contribution, hence enhancing the SNR. Figure 6.6 shows plots of the main control signals for the TIA and the VGAs. To generate these signals, $V_{GC}$ is fed into multiple control-generation building blocks (Figure 6.5), and the generated curves resemble the DC transfer characteristic of a differential amplifier.

Figure 6.6: Critical signals generated from the control block to control the TIA and VGAs as $V_{GC}$ changes.

The reference voltage  $(V_R)$ , tail current (I) and degeneration resistance  $(R_E)$  are used to control the slope and the start/end values of each curve separately. The control signals are provided as currents to the high-speed blocks where they are converted into control voltages.  $R_F$  is varied by controlling a shunt NFET, while  $R_L$  is varied by controlling a shunt PFET.
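Each control curve can be modeled as a differential-pair DC transfer, $I/2\,[1 + \tanh((V_{GC} - V_R)/w)]$, where $V_R$ sets the switching point, I the end value, and w the slope. For an undegenerated bipolar pair $w = 2V_T$, and emitter degeneration $R_E$ widens the transition; here w is simply a free parameter. The staggered $V_R$ values below are hypothetical and only reproduce the ordering described above, assuming that a rising $V_{GC}$ raises the gain: the TIA switches first, then VGA1, then VGA2.

```python
import math

def control_curve(v_gc, v_r, width, i_full=1.0):
    """Differential-pair-like DC transfer: i_full/2 * (1 + tanh((V_GC - V_R) / width))."""
    return 0.5 * i_full * (1 + math.tanh((v_gc - v_r) / width))

# Hypothetical staggering: TIA first, VGA1 slightly before VGA2
STAGES = {"TIA": 0.25, "VGA1": 0.55, "VGA2": 0.65}
WIDTH = 0.05

for v_gc in (0.0, 0.3, 0.6, 0.9):
    levels = {name: round(control_curve(v_gc, v_r, WIDTH), 2) for name, v_r in STAGES.items()}
    print(v_gc, levels)
```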

### 6.3 System implementation and measurements

The auto-reconfigurable RX was implemented in a 0.13 $\mu$m SiGe process, and the electrical RX is tested on-wafer. Figure 6.7 shows a maximum $Z_T$ of 75.5 dB$\Omega$ and a 35.5 dB $Z_T$ range with BW > 40 GHz across gain settings. Figure 6.8 shows a maximum-gain IRN of 18.5 pA/$\sqrt{Hz}$, with minimal IRN degradation as the gain drops to $\approx 0.5$ k$\Omega$, allowing the SNR to benefit from any increase in the received signal. The RX THD is measured at different input currents and frequencies: the 1 GHz THD remains low, and the worst-case THD at 10 GHz is still below 10%, as shown in Figure 6.9. As shown in Figure 6.10, thanks to the COIDCC, the TIA can handle up to 8.5 mA of measured $I_{DC}$ into each terminal.
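For a feel of the magnitudes, the sketch converts the reported $Z_T$ figures from dB$\Omega$ to ohms and integrates the maximum-gain IRN over the 42 GHz electrical BW listed in Table 6.1. The integration assumes a flat IRN over a brick-wall bandwidth, which is a simplification since real IRN spectra rise with frequency.

```python
import math

def dbohm_to_ohm(x_db):
    """Convert a transimpedance from dB-ohm to ohm."""
    return 10 ** (x_db / 20)

def integrated_noise_ua(irn_pa_rthz, bw_hz):
    """RMS input noise current (uA), assuming a flat IRN over a brick-wall bandwidth."""
    return irn_pa_rthz * 1e-12 * math.sqrt(bw_hz) * 1e6

zt_max = dbohm_to_ohm(75.5)          # maximum Z_T from Figure 6.7
zt_min = dbohm_to_ohm(75.5 - 35.5)   # bottom of the 35.5 dB Z_T range
print(f"Z_T spans {zt_min:.0f} ohm to {zt_max:.0f} ohm")
print(f"18.5 pA/rtHz over 42 GHz ~ {integrated_noise_ua(18.5, 42e9):.2f} uA rms")
```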

Electrical driver ICs, RX ICs, and Silicon-photonic transceivers are flip-chipped to a substrate and co-packaged. Figure 6.11 shows the O/E RX. Thanks to the auto-reconfigurable TIA, the O/E RX meets the target 24 dB ROSNR at 400 Gb/s/$\lambda$ for $P_{RF}$ > -23 dBm and up to $P_{RF} = 1$ dBm (limited by the test setup) at the commercial DSP FEC threshold of 1E-2, allowing for zero post-FEC errors. Figure 6.12 shows the constellations for 66 Gbaud DP-16QAM, achieving 528 Gb/s/$\lambda$ with zero post-FEC errors. Table 6.1 summarizes the performance and compares it to prior art.

Figure 6.7: Measured $Z_T$ at max. and min. gain.

Figure 6.8: Measured IRN vs. $Z_T$.

Figure 6.9: Measured THD at different frequencies and inputs.

Figure 6.10: Measured input DC voltage vs. the injected DC current.



Figure 6.11: Die micrographs depicting the O/E RX: Silicon-photonic IC along with  $2 \times 2$  RX ICs, and a zoom-in on the RX IC.



Figure 6.12: Measured constellations for 528Gb/s DP-16QAM O/E RX.

|                                              | This work       | [119]    | [120]            | [121]     | [122]     |
|----------------------------------------------|-----------------|----------|------------------|-----------|-----------|
| Baud rate (Gbaud)                            | 66              | 34       | 64               | 32/25     | 100/50    |
| Modulation                                   | DP-16QAM        | DP-16QAM | SP-QPSK          | NRZ/PAM-4 | NRZ/PAM-4 |
| O/E RX bit rate (Gb/s)                       | 528             | 272      | 128              | 32/50     | 100       |
| -3 dB Elec. BW (GHz)                         | 42              | 27       | 53               | 33        | 42        |
| Max $Z_T$ (dB$\Omega$)                       | 75.5            | 73       | 80               | 74        | 68.5      |
| $Z_T$ range (dB)                             | 35.5            | 43       | -                | 24        | 20        |
| Target ROSNR<sup>1</sup> (dB)                | 24              | N/R      |                  |           |           |
| IRN @ max $Z_T$ (pA/$\sqrt{Hz}$)             | 18.5            | 20       | 24.8             | 12.2      | 8         |
| IRN @ 500 $\Omega$ (pA/$\sqrt{Hz}$)          | 42              | 100      | 65               | 40        | N/R       |
| 1 GHz THD @ 3 mA<sub>ppd</sub> (%)           | 2.2<sup>2</sup> | N/R      | 2.55<sup>3</sup> | N/R       | N/R       |
| Avg.<sup>4</sup> THD @ 3 mA<sub>ppd</sub> (%) | 4.4            | N/R      |                  |           |           |
| Power/TIA (mW)                               | 275             | 313      | 277              | 218       | 150       |
| Technology (µm)                              | 0.13 SiGe       |          |                  |           |           |

Table 6.1: Performance summary and comparison.

<sup>1</sup>50 Gbaud DP-16QAM, <sup>2</sup>500 mVppd output, <sup>3</sup>600 mVppd output, <sup>4</sup>averaged over frequency.
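Several rows of Table 6.1 are linked by simple arithmetic. As an illustrative sketch (not the author's analysis flow), the Python snippet below recomputes the O/E RX bit rate as baud rate × bits per symbol × number of polarizations, converts $Z_T$ from dB$\Omega$ to ohms via $Z_T[\Omega] = 10^{Z_T[\text{dB}\Omega]/20}$, and evaluates THD with the common definition $\sqrt{\sum_{n\ge 2} A_n^2}/A_1$. The function names and the harmonic amplitudes in the THD example are hypothetical, not measured data.

```python
import math


def bit_rate_gbps(baud_gbaud, bits_per_symbol, polarizations=1):
    """Line rate = symbol rate x bits/symbol x number of polarizations."""
    return baud_gbaud * bits_per_symbol * polarizations


def dbohm_to_ohm(z_dbohm):
    """Convert a transimpedance from dB-ohm (20*log10 of ohms) to ohms."""
    return 10 ** (z_dbohm / 20)


def thd_percent(amplitudes):
    """THD in % from [fundamental, 2nd, 3rd, ...] harmonic amplitudes."""
    fundamental, *harmonics = amplitudes
    return 100 * math.sqrt(sum(a * a for a in harmonics)) / fundamental


# Bit rates from Table 6.1 (16QAM: 4 b/symbol, QPSK: 2 b/symbol).
print(bit_rate_gbps(66, 4, polarizations=2))  # this work, DP-16QAM -> 528
print(bit_rate_gbps(34, 4, polarizations=2))  # [119], DP-16QAM -> 272
print(bit_rate_gbps(64, 2, polarizations=1))  # [120], SP-QPSK -> 128

# Z_T: 75.5 dB-ohm maximum with a 35.5 dB tuning range.
z_max = dbohm_to_ohm(75.5)
z_min = dbohm_to_ohm(75.5 - 35.5)
print(f"Z_T max = {z_max:.0f} ohm, Z_T min = {z_min:.0f} ohm")

# Hypothetical harmonic amplitudes (illustration only, not measured data).
print(f"THD = {thd_percent([1.0, 0.03, 0.04]):.1f} %")
```

The recomputed bit rates match the table entries, and the 35.5 dB $Z_T$ range spans roughly 100 $\Omega$ (40 dB$\Omega$) to 5.96 k$\Omega$ (75.5 dB$\Omega$).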

### 6.4 Summary

The auto-reconfigurable TIA minimizes THD and noise across a wide range of input signal levels and frequencies to meet a target ROSNR. To the best of our knowledge, this is the first published work to report an auto-reconfigurable TIA that breaks the IRN-THD trade-off, to report high-frequency THD measurements, and to support >0.5 Tb/s/$\lambda$ DP-16QAM.

## Chapter 7

# **Conclusion and future work**

### 7.1 Conclusion

Silicon-photonics, along with innovative circuit techniques, has paved the way for O/E links to satisfy the ever-growing communication demands of different applications and reaches.

In warehouse-scale datacenters, where implementations are currently dominated by MM VCSEL-based links, the increasing loss and modal dispersion of MMFs at medium reach render them unattractive from the perspective of link energy efficiency. MRR-based Silicon-photonic links utilizing SMFs are a promising alternative. This dissertation presents a comprehensive overview of state-of-the-art optical devices and CMOS circuit-level simulations, along with a detailed power and noise analysis for both VCSEL- and MRR-based links. To realize the benefits of MRR-based links while closing their energy-efficiency gap to VCSEL-based links, several research opportunities at the device, circuit, and link levels are discussed.

On the RX side, an APD greatly enhances the sensitivity of the O/E RX, which in turn improves the energy efficiency of the O/E link. In this dissertation, we present an APD-based O/E RX achieving the best-reported sensitivity to date among 850 nm linear CMOS TIAs at 10 Gb/s. Moreover, from the perspective of CMOS circuit design, alternative TIA topologies must be developed to further improve RX sensitivity. Therefore, a current-mode RX that attempts to break the gain-BW-power trade-offs of conventional RX designs is proposed.

For long-reach communication, the spectral efficiency of the optical TX is enhanced using advanced modulation formats such as QPSK and QAM, together with polarization multiplexing. In this dissertation, we present a Silicon-photonic MZM-based optical TX with linear, high-swing SiGe drivers. The low-cost, compact, all-Si/SiGe design matches the ROSNR performance of LiNbO<sub>3</sub> modulators with III-V drivers at 34 Gbaud. To the best of our knowledge, this is the first published Silicon-photonic design to support DP-16QAM and DP-64QAM at data rates higher than 500 Gb/s/$\lambda$.

Finally, we present an O/E RX for long-reach applications. The RX uses an auto-reconfigurable TIA to minimize THD and noise across a wide range of input signal levels and frequencies to meet a target ROSNR. To the best of our knowledge, this is the first published work to report an auto-reconfigurable TIA that breaks the IRN-THD trade-off, to address high-frequency THD, and to support >0.5 Tb/s/$\lambda$ DP-16QAM.

### 7.2 Future work

As PAM-4 offers twice the throughput of NRZ at the same baud rate (2 bits instead of 1 bit per symbol), it is being adopted in new optical communication standards for datacenters. Handling a multi-level signal usually requires a linear TIA to preserve the amplitude information during the current-to-voltage (I-V) conversion. A current-mode receiver, discussed in Chapter 4, is a nonlinear RX that cannot be used on its own to detect a PAM-4 current signal. Therefore, as a continuation of the work presented in Chapter 4, we propose a PAM-4-current-mode receiver that retains the advantages of an NRZ-current-mode receiver while also detecting the information contained in the signal's amplitude.

Figure 7.1: Input currents and output voltages of an NRZ-current-mode receiver for different PAM-4 symbols.

In an NRZ-current-mode receiver, shown in Figure 4.2, the input current is integrated on $C_{P,N}$, changing $V_P$ and $V_N$ with a slew rate proportional to the input current amplitude.

As shown in Figure 7.1, the '11' and '10' input currents have the same polarity. Therefore, the output of an NRZ-current-mode receiver is the same in both cases, and the information about the second bit of the symbol is lost. However, a closer look at $V_N$ shows that it has a higher slew rate when the input is '11'. This difference in the slew rate of $V_N$ carries the information about the second bit of the symbol and can therefore be extracted by a slew-rate comparator.
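A first-order model makes the slew-rate argument explicit. Assuming an ideal integrating capacitance $C$ at the output node and an input current $I_{in}$ that is constant over the symbol (both simplifications introduced here for illustration), the time $t_{cross}$ for $V_N$ to slew by a fixed voltage step $\Delta V_{th}$, and the resulting time difference $\tau$ between the '11' and '10' symbols with currents $I_{11}$ and $I_{10}$ (hypothetical labels), are:

$$
\left|\frac{dV_N}{dt}\right| = \frac{|I_{in}|}{C}
\quad\Longrightarrow\quad
t_{cross} = \frac{C\,\Delta V_{th}}{|I_{in}|}
$$

$$
\tau = t_{cross}(I_{10}) - t_{cross}(I_{11}) = C\,\Delta V_{th}\left(\frac{1}{|I_{10}|} - \frac{1}{|I_{11}|}\right) > 0 \quad \text{for } |I_{11}| > |I_{10}|
$$

Under this model, $\tau$ scales with $C\,\Delta V_{th}$ and shrinks as the two current levels approach each other.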

The slew rates of two signals could be compared using a voltage comparator with a threshold ideally set at the maximum $\tau$, where $\tau$ is the time difference between the two signals at a certain voltage level. However, in this case, $\tau$ is small, on the order of 10 ps, which complicates the design of the voltage comparator. Therefore, in a practical implementation, $\tau$ should be increased to relax the comparator's design. We propose using a *time amplifier* to increase $\tau$ and simplify the voltage comparator. Time amplifiers have previously been used in phase-locked loops, for example in the time-to-digital converter of [123].
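To give a feel for the numbers, the sketch below evaluates the first-order ramp model $t_{cross} = C\,\Delta V_{th}/|I_{in}|$ for two symbol currents and then applies an ideal, linear time amplifier that multiplies the input time difference by a gain $A_T$. The capacitance, voltage step, currents, and gain are hypothetical values chosen only to place $\tau$ near the 10 ps mentioned above; a real time amplifier has a limited linear range and finite gain accuracy, which this sketch ignores.

```python
def crossing_time(i_in, c_int, dv_th):
    """Time for an ideal integrating node (|dV/dt| = |I|/C) to slew by dv_th."""
    return c_int * dv_th / abs(i_in)


def slew_time_difference(i_fast, i_slow, c_int, dv_th):
    """tau: how much later the smaller-current ramp crosses the same step."""
    return crossing_time(i_slow, c_int, dv_th) - crossing_time(i_fast, c_int, dv_th)


def time_amplifier(tau_in, gain):
    """Ideal linear time amplifier: output time difference = gain * input."""
    return gain * tau_in


# Hypothetical operating point (assumed values, not from the dissertation).
C_INT = 20e-15   # integrating capacitance, 20 fF
DV_TH = 30e-3    # voltage step used to compare the ramps, 30 mV
I_11 = 60e-6     # current for symbol '11', 60 uA
I_10 = 30e-6     # current for symbol '10', 30 uA
A_T = 8.0        # time-amplifier gain

tau = slew_time_difference(I_11, I_10, C_INT, DV_TH)
tau_amp = time_amplifier(tau, A_T)
print(f"tau = {tau * 1e12:.1f} ps, amplified tau = {tau_amp * 1e12:.1f} ps")
# prints: tau = 10.0 ps, amplified tau = 80.0 ps
```

Amplifying $\tau$ before the decision is what relaxes the comparator: in this example it only has to resolve 80 ps instead of 10 ps.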

The proposed PAM-4-current-mode receiver, shown in Figure 7.2, consists of two NRZ-current-mode receivers. One is connected to the PD through an $R_B/C_B$ combination, while the other is connected to a controllable $I_{DC}$ to generate a reference, $V_{ref}$, for the slew-rate comparator. The outputs of the first NRZ-current-mode receiver, $OUT_P$ and $OUT_N$, are fed to an SR latch to generate the MSB. At the same time, the slew rates of $OUT_P$ and $OUT_N$ are compared with that of $V_{ref}$ to determine the LSB. Depending on the symbol, the correct LSB is obtained by comparing either $OUT_P$ or $OUT_N$ with $V_{ref}$, and the MSB selects which of the two paths is used.

Figure 7.2: Proposed PAM-4-current-mode receiver.
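The MSB/LSB decision described above can be captured in a behavioral sketch. The Python model below assumes a natural-binary mapping of symbols to normalized currents ('00' → -3, '01' → -1, '10' → +1, '11' → +3); the text only establishes that '11' and '10' share a polarity and that '11' produces the faster slew. The model also collapses the two comparator paths ($OUT_P$ or $OUT_N$ against $V_{ref}$) into a single magnitude comparison whose interpretation is selected by the MSB, and it sets the reference current $I_{DC}$ midway between the inner and outer levels. These are modeling assumptions, not the circuit's actual implementation.

```python
# Assumed natural-binary mapping of PAM-4 symbols to normalized photocurrents.
LEVELS = {"00": -3.0, "01": -1.0, "10": 1.0, "11": 3.0}


def decide(i_in, i_dc, c_int=1.0):
    """Behavioral MSB/LSB decision of a hypothetical PAM-4 current-mode RX."""
    # MSB: the SR latch resolves the polarity of the integrated current.
    msb = 1 if i_in > 0 else 0
    # Slew rates of the signal ramp and of the I_DC reference ramp (dV/dt = I/C).
    slew_sig = abs(i_in) / c_int
    slew_ref = i_dc / c_int
    faster_than_ref = slew_sig > slew_ref
    # The MSB selects the comparison path: under the assumed mapping an outer
    # level ('11' or '00') slews faster than the reference, but it encodes
    # LSB = 1 only for a positive symbol.
    lsb = int(faster_than_ref) if msb == 1 else int(not faster_than_ref)
    return f"{msb}{lsb}"


I_DC = 2.0  # reference midway between the inner (1) and outer (3) magnitudes
for symbol, level in LEVELS.items():
    print(symbol, "->", decide(level, I_DC))
```

In this sketch, the MSB-dependent inversion of the slew-rate comparison plays the role of the path selection described in the text.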

As a proof of concept, the system has been verified at the schematic level in TSMC 65 nm CMOS technology, with the layout and testing plan in progress. Schematic simulations indicate that the system should operate at up to 30 Gb/s. Table 7.1 summarizes the simulated performance and compares it with recently published designs.

Table 7.1: PAM-4-current-mode receiver schematic performance summary and comparison to state-of-the-art.

|                    | This work   | [124]  | [122]        | [125]       |
|--------------------|-------------|--------|--------------|-------------|
| Technology         | 65nm        | 65nm   | 130nm BiCMOS | 16nm FinFET |
| Data rate (Gb/s)   | 30          | 40     | 56           | 100         |
| Sensitivity (µApp) | 40          | 37.8   | 19           | 42          |
| Efficiency (pJ/b)  | 1.07        | 2      | 1.5          | 0.6         |
| RX type            | Integrating | Linear | Linear       | Linear      |
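The efficiency figures in Table 7.1 can be turned into implied power and symbol-rate numbers with two unit identities: 1 pJ/b at 1 Gb/s equals 1 mW, and a PAM-$M$ signal carries $\log_2 M$ bits per symbol. The Python sketch below applies them to the table entries; the milliwatt values are derived here for illustration and are not figures reported by the cited works, whose measurement conditions may differ.

```python
import math


def power_mw(efficiency_pj_per_bit, rate_gbps):
    """Implied power: (1e-12 J/b) * (1e9 b/s) = 1e-3 W, so pJ/b x Gb/s = mW."""
    return efficiency_pj_per_bit * rate_gbps


def symbol_rate_gbaud(rate_gbps, levels):
    """Symbol rate of a PAM-M signal carrying log2(M) bits per symbol."""
    return rate_gbps / math.log2(levels)


# (data rate in Gb/s, efficiency in pJ/b) as listed in Table 7.1.
TABLE_7_1 = {
    "This work": (30, 1.07),
    "[124]": (40, 2.0),
    "[122]": (56, 1.5),
    "[125]": (100, 0.6),
}

for name, (rate, eff) in TABLE_7_1.items():
    print(f"{name}: {power_mw(eff, rate):.1f} mW implied")

# Assuming the 30 Gb/s of this work is a PAM-4 line rate: 15 Gbaud.
print(symbol_rate_gbaud(30, 4))
```

For instance, 1.07 pJ/b at 30 Gb/s corresponds to roughly 32 mW, and a 30 Gb/s PAM-4 stream runs at 15 Gbaud, half the symbol rate an NRZ stream would need.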

## **Bibliography**

- [1] S. K. Korotky, "Semi-empirical description and projections of internet traffic trends using a hyperbolic compound annual growth rate", *Bell Labs Technical Journal*, vol. 18, no. 3, pp. 5–21, Dec. 2013.
- [2] A. H. Ahmed et al., "Silicon-Photonics Microring Links for Datacenters—Challenges and Opportunities", *IEEE J. Sel. Top. Quantum Electron.*, vol. 22, no. 6, pp. 194–203, Nov. 2016.
- [3] Y. A. Vlasov, "Silicon CMOS-integrated nano-photonics for computer and data communications beyond 100G", *IEEE Commun. Mag.*, vol. 50, no. 2, pp. 67–72, Feb. 2012.
- [4] H. Jayatilleka et al., "Crosstalk in SOI Microring Resonator-Based Filters", *J. Lightw. Technol.*, vol. 34, no. 12, pp. 2886–2896, June 2016.
- [5] H. Jayatilleka et al., "Automatic wavelength tuning of series-coupled vernier racetrack resonators on SOI", in *Opt. Fiber Commun. Conf.*, Mar. 2016, pp. 1–3.
- [6] N. Y. Li et al., "High-performance 850 nm VCSEL and photodetector arrays for 25 Gb/s parallel optical interconnects", in *Opt. Fiber Commun. Conf.*, Mar. 2010, pp. 1–3.
- [7] S. Palermo, A. Emami-Neyestanak, and M. Horowitz, "A 90 nm CMOS 16 Gb/s Transceiver for Optical Interconnects", *IEEE J. Solid-State Circuits*, vol. 43, no. 5, pp. 1235–1246, May 2008.
- [8] S. Saeedi et al., "A 25 Gb/s 3D-Integrated CMOS/Silicon-Photonic Receiver for Low-Power High-Sensitivity Optical Communication", *J. Lightw. Technol.*, vol. 34, no. 12, pp. 2924–2933, June 2016.
- [9] S. M. Park and H.-J. Yoo, "1.25-Gb/s regulated cascode CMOS transimpedance amplifier for Gigabit Ethernet applications", *IEEE J. Solid-State Circuits*, vol. 39, no. 1, pp. 112–121, Jan. 2004.
- [10] D. Mahgerefteh et al., "Techno-Economic Comparison of Silicon Photonics and Multimode VCSELs", *J. Lightw. Technol.*, vol. 34, no. 2, pp. 233–242, Jan. 2016.
- [11] J. A. Tatum et al., "VCSEL-Based Interconnects for Current and Future Data Centers", *J. Lightw. Technol.*, vol. 33, no. 4, pp. 727–732, Feb. 2015.
- [12] P. Moser et al., "85-fJ Dissipated Energy Per Bit at 30 Gb/s Across 500-m Multimode Fiber Using 850-nm VCSELs", *IEEE Photon. Technol. Lett.*, vol. 25, no. 16, pp. 1638–1641, Aug. 2013.
- [13] D. M. Kuchta et al., "Error-free 56 Gb/s NRZ modulation of a 1530 nm VCSEL link", in *IEEE Eur. Conf. Opt. Commun.*, Sep. 2015, pp. 1–3.
- [14] K. Kikuchi, "Fundamentals of Coherent Optical Fiber Communications", *J. Lightw. Technol.*, vol. 34, no. 1, pp. 157–179, Jan. 2016.
- [15] M. G. Ahmed, T. N. Huynh, C. Williams, Y. Wang, P. K. Hanumolu, and A. Rylyakov, "34 GBd linear transimpedance amplifier for 200 Gb/s DP-16-QAM optical coherent receivers", *IEEE J. Solid-State Circuits*, vol. 54, no. 3, pp. 834–844, Mar. 2019.
- [16] S. Shekhar et al., "Silicon electronics-photonics integrated circuits for datacenters", in *IEEE Compound Semiconductor IC Symp.*, Nov. 2016, pp. 1–4.
- [17] C. Li et al., "A 3D-Integrated 56 Gb/s NRZ/PAM4 Reconfigurable Segmented Mach-Zehnder Modulator-Based Si-Photonics Transmitter", in *IEEE BiCMOS and Compound Semiconductor Integrated Circuits and Technology Symp.*, Oct. 2018, pp. 32–35.
- [18] H. Li et al., "A 112 Gb/s PAM4 Linear TIA with 0.96 pJ/bit Energy Efficiency in 28 nm CMOS", in *IEEE European Solid-State Circuits Conf.*, Sep. 2018, pp. 238–241.
- [19] M. S. Hai, M. M. P. Fard, and O. Liboiron-Ladouceur, "A Ring-Based 25 Gb/s DAC-Less PAM-4 Modulator", *IEEE J. Sel. Top. Quantum Electron.*, vol. 22, no. 6, pp. 123–130, Nov. 2016.
- [20] X. Wu et al., "A 20 Gb/s NRZ/PAM-4 1V transmitter in 40nm CMOS driving a Si-photonic modulator in 0.13μm CMOS", in *Proc. IEEE Int. Solid-State Circuits Conf.*, Feb. 2013, pp. 128–129.
- [21] J. Hwang et al., "A 64Gb/s 2.29pJ/b PAM-4 VCSEL Transmitter With 3-Tap Asymmetric FFE in 65nm CMOS", in *IEEE Symp. VLSI Circuits*, June 2019, pp. C268–C269.
- [22] K. R. Lakshmikumar et al., "A Process and Temperature Insensitive CMOS Linear TIA for 100 Gb/s/λ PAM-4 Optical Links", *IEEE J. Solid-State Circuits*, vol. 54, no. 11, pp. 3180–3190, Nov. 2019.
- [23] K. Zheng et al., "An Inverter-Based Analog Front-End for a 56-Gb/s PAM-4 Wireline Transceiver in 16-nm CMOS", *IEEE Solid-State Circuits Letters*, vol. 1, no. 12, pp. 249–252, Dec. 2018.
- [24] C. Sun et al., "A Monolithically-Integrated Chip-to-Chip Optical Link in Bulk CMOS", *IEEE J. Solid-State Circuits*, vol. 50, no. 4, pp. 828–844, Apr. 2015.
- [25] I. A. Young et al., "Optical I/O Technology for Tera-Scale Computing", *IEEE J. Solid-State Circuits*, vol. 45, no. 1, pp. 235–248, Jan. 2010.
- [26] G. Kalogerakis et al., "A quad 25Gb/s 270mW TIA in 0.13µm BiCMOS with <0.15dB crosstalk penalty", in *Proc. IEEE Int. Solid-State Circuits Conf.*, Feb. 2013, pp. 116–117.
- [27] N. Y. Li et al., "High-performance 850 nm VCSEL and photodetector arrays for 25 Gb/s parallel optical interconnects", in *Proc. Opt. Fiber Commun. Conf.*, Mar. 2010, pp. 1–3.
- [28] M. H. Nazari and A. Emami-Neyestanak, "A 24-Gb/s Double-Sampling Receiver for Ultra-Low-Power Optical Communication", *IEEE J. Solid-State Circuits*, vol. 48, no. 2, pp. 344–357, Feb. 2013.
- [29] M. Rakowski et al., "A 4×20Gb/s WDM ring-based hybrid CMOS silicon photonics transceiver", in *IEEE Int. Solid-State Circuits Conf.*, Feb. 2015, pp. 1–3.
- [30] J. F. Buckwalter et al., "A Monolithic 25-Gb/s Transceiver With Photonic Ring Modulators and Ge Detectors in a 130-nm CMOS SOI Process", *IEEE J. Solid-State Circuits*, vol. 47, no. 6, pp. 1309–1322, June 2012.
- [31] R. Ding et al., "A silicon platform for high-speed photonics systems", in *Proc. Opt. Fiber Commun. Conf.*, Mar. 2012, pp. 1–3.
- [32] A. Emami-Neyestanak et al., "A 1.6 Gb/s, 3 mW CMOS receiver for optical communication", in *IEEE Symp. VLSI Circuits*, June 2002, pp. 84–87.
- [33] S. Saeedi and A. Emami, "A 25Gb/s 170μW/Gb/s optical receiver in 28nm CMOS for chip-to-chip optical communication", in *IEEE Radio Freq. Integr. Circuits Symp.*, June 2014, pp. 283–286.
- [34] S. Huang and W. Chen, "A 25 Gb/s, -10.8 dBm input sensitivity, PD-bandwidth tolerant CMOS optical receiver", in *IEEE Symp. VLSI Circuits*, June 2015, pp. C120–C121.
- [35] E. Säckinger, "On the Noise Optimum of FET Broadband Transimpedance Amplifiers", *IEEE Trans. Circuits Syst. I Regul. Pap.*, vol. 59, no. 12, pp. 2881–2889, Dec. 2012.
- [36] S. Shekhar et al., "Design considerations for low-power receiver front-end in high-speed data links", in *IEEE Cust. Integr. Circuits Conf.*, Sep. 2013, pp. 1–8.
- [37] S. Shekhar, J. S. Walling, and D. J. Allstot, "Bandwidth Extension Techniques for CMOS Amplifiers", *IEEE J. Solid-State Circuits*, vol. 41, no. 11, pp. 2424–2439, Nov. 2006.
- [38] E. Säckinger, *Broadband circuits for optical fiber communication*, Hoboken, NJ, USA: John Wiley & Sons, Inc., 2005.
- [39] D. J. Thomson et al., "High contrast 40 Gbit/s optical modulation in silicon", *Opt. Express*, vol. 19, no. 12, pp. 11507–11516, June 2011.
- [40] D. Patel et al., "High-speed compact silicon photonic Michelson interferometric modulator", *Opt. Express*, vol. 22, no. 22, pp. 26788–26802, Nov. 2014.
- [41] M. Webster et al., "Silicon photonic modulator based on a MOS-capacitor and a CMOS driver", in *Proc. IEEE Compound Semicond. Integr. Circuit Symp.*, Oct. 2014, pp. 1–4.
- [42] R. Ding et al., "Ultra-low-power carrier-depletion Mach-Zehnder silicon optical modulator", *Opt. Express*, vol. 20, no. 7, pp. 7081–7087, Mar. 2012.
- [43] C. Li et al., "Silicon Photonic Transceiver Circuits With Microring Resonator Bias-Based Wavelength Stabilization in 65 nm CMOS", *IEEE J. Solid-State Circuits*, vol. 49, no. 6, pp. 1419–1436, June 2014.
- [44] F. Y. Liu et al., "10-Gbps, 5.3-mW Optical Transmitter and Receiver Circuits in 40-nm CMOS", *IEEE J. Solid-State Circuits*, vol. 47, no. 9, pp. 2049–2067, Sep. 2012.
- [45] H. Li et al., "A 25 Gb/s 4.4V-swing AC-coupled Si-photonic microring transmitter with 2-tap asymmetric FFE and dynamic thermal tuning in 65nm CMOS", in *Proc. IEEE Int. Solid-State Circuits Conf.*, Feb. 2015, pp. 410–412.
- [46] R. Yu et al., "High-precision in-situ wavelength stabilization and monitoring of tunable lasers using AWG and PD arrays", in 2011 Optical Fiber Communication Conference and Exposition and the National Fiber Optic Engineers Conference, Mar. 2011, pp. 1–3.
- [47] M. Lu et al., "A highly integrated optical phase-locked loop for laser wavelength stabilization", in *IEEE Photonics Conf.*, Sep. 2012, pp. 844–845.
- [48] S. Qhumayo, R. M. Manuel, and M. Grobler, "Wavelength and power stabilization of a three wavelength Erbium doped fiber laser using a nonlinear optical loop mirror", in *AFRICON*, Sep. 2015, pp. 1–4.
- [49] Sumitomo Electric Industries, Ltd., "SLT5411/SLT5413 series."
- [50] J. Lee et al., "12.2% waveguide-coupled wall plug efficiency in single mode external-cavity tunable Si/III–V hybrid laser", in *IEEE Opt. Interconnects Conf.*, Apr. 2015, pp. 142–143.
- [51] A. Zilkie et al., "Power-efficient III-V/silicon external cavity DBR lasers", *Opt. Express*, vol. 20, no. 21, pp. 23456–23462, Oct. 2012.
- [52] S. Tanaka et al., "High-output-power single-wavelength silicon hybrid laser using precise flip-chip bonding technology", *Opt. Express*, vol. 20, no. 27, pp. 28057–28067, Dec. 2012.
- [53] J. Lee et al., "High power and widely tunable Si hybrid external-cavity laser for power efficient Si photonics WDM links", *Opt. Express*, vol. 22, no. 7, pp. 7678–7685, Apr. 2014.
- [54] K. Iga, "Surface-emitting laser-its birth and generation of new optoelectronics field", *IEEE J. Sel. Top. Quantum Electron.*, vol. 6, no. 6, pp. 1201–1215, Nov. 2000.
- [55] J. Jiang et al., "100Gb/s ethernet chipsets in 65nm CMOS technology", in *IEEE Int. Solid-State Circuits Conf.*, Feb. 2013, pp. 120–121.
- [56] J. Proesel, C. Schow, and A. Rylyakov, "25 Gb/s 3.6 pJ/b and 15 Gb/s 1.37 pJ/b VCSEL-based optical links in 90nm CMOS", in *Proc. IEEE Int. Solid-State Circuits Conf.*, Feb. 2012, pp. 418–420.
- [57] C. Politi, D. Klonidis, and M. J. O'Mahony, "Waveband converters based on four-wave mixing in SOAs", *J. Lightw. Technol.*, vol. 24, no. 3, pp. 1203–1217, Mar. 2006.
- [58] Maxim Integrated, "Impact of transmitter RIN on optical link performance.", Application note: HFAN-9.1.0 Rev.1; 04/08.
- [59] S. McNab, N. Moll, and Y. Vlasov, "Ultra-low loss photonic integrated circuit with membrane-type photonic crystal waveguides", *Opt. Express*, vol. 11, no. 22, pp. 2927–2939, Nov. 2003.
- [60] T. Tsuchizawa et al., "Spot-size converters for rib-type silicon photonic wire waveguides", in *IEEE Int. Conf. Gr. IV Photonics*, Sep. 2008, pp. 200–202.
- [61] T. Barwicz et al., "An O-band metamaterial converter interfacing standard optical fibers to silicon nanophotonic waveguides", in *Opt. Fiber Commun. Conf.*, Mar. 2015, pp. 1–3.
- [62] P. P. Absil et al., "Silicon photonics integrated circuits: a manufacturing platform for high density, low power optical I/O's", *Opt. Express*, vol. 23, no. 7, pp. 9369–9378, Apr. 2015.
- [63] J. Yao et al., "A CMOS-compatible low back reflection grating coupler for on-chip laser sources integration", in *Opt. Fiber Commun. Conf.*, Mar. 2016, pp. 1–3.
- [64] N. Ophir et al., "Silicon Photonic Microring Links for High-Bandwidth-Density, Low-Power Chip I/O", *IEEE Micro*, vol. 33, no. 1, pp. 54–67, Jan. 2013.
- [65] H. Won et al., "A 0.87 W transceiver IC for 100 gigabit Ethernet in 40nm CMOS", *IEEE J. Solid-State Circuits*, vol. 50, no. 2, pp. 399–413, Feb. 2015.
- [66] T. Shibasaki, "4 × 25.78 Gb/s retimer ICs for optical links in 0.13 μm SiGe BiCMOS", in *Proc. IEEE Int. Solid-State Circuits Conf.*, Feb. 2015, pp. 1–3.
- [67] "SWDM [Online]", Available: https://www.swdm.org.
- [68] "PSM4 [Online]", Available: https://www.psm4.org.
- [69] J. Li et al., "A 25-Gb/s monolithic optical transmitter with micro-ring modulator in 130-nm SoI CMOS", *IEEE Photon. Technol. Lett.*, vol. 25, no. 19, pp. 1901–1903, Oct. 2013.
- [70] Y. Chen, "A 25Gb/s hybrid integrated silicon photonic transceiver in 28nm CMOS and SOI", in *Proc. IEEE Int. Solid-State Circuits Conf.*, Feb. 2015, pp. 402–404.
- [71] M. Georgas, "Addressing link-level design tradeoffs for integrated photonic interconnects", in *Proc. IEEE Custom Integr. Circuits Conf.*, Sep. 2011, pp. 1–8.
- [72] D. Livshits, "High efficiency diode comb-laser for DWDM optical interconnects", in *Proc. IEEE Opt. Interconnects Conf.*, May 2014, pp. 83–84.
- [73] C. Chen, "A comb laser-driven DWDM silicon photonic transmitter with microring modulator for optical interconnect", in *Proc. Lasers Electro-Opt. Conf.*, May 2015, pp. 1–2.
- [74] H. Jayatilleka, "Automatic wavelength tuning of series-coupled Vernier racetrack resonators on SOI", in *Proc. Lasers Electro-Opt. Conf.*, Mar. 2016.
- [75] H. Jayatilleka et al., "Crosstalk in SOI microring resonator-based filters", *J. Lightw. Technol.*, vol. 34, no. 12, pp. 2886–2896, June 2016.
- [76] D. Kuchta, "A 55 Gb/s directly modulated 850nm VCSEL-based optical link", in *Proc. IEEE Photon. Conf.*, Sep. 2012, pp. 1–2.
- [77] C. Xiong, "A monolithic 56 Gb/s CMOS integrated nanophotonic PAM-4 transmitter", in *Proc. IEEE Opt. Interconnects Conf.*, Apr. 2015, pp. 16–17.
- [78] G. Denoyer, "Hybrid silicon photonic circuits and transceiver for 56 Gb/s NRZ 2.2 km transmission over single mode fiber", in *Proc. IEEE Eur. Conf. Opt. Commun.*, Sep. 2014, pp. 1–3.
- [79] A. Roshan-Zamir, "A 40 Gb/s PAM4 silicon microring resonator modulator transmitter in 65nm CMOS", in *Proc. IEEE Opt. Interconnects Conf.*, May 2016, pp. 1–2.
- [80] S. Nayak, A. H. Ahmed, A. Sharkia, A. S. Ramani, S. Mirabbasi, and S. Shekhar, "A 10-Gb/s -18.8 dBm Sensitivity 5.7 mW Fully-Integrated Optoelectronic Receiver With Avalanche Photodetector in 0.13-μm CMOS", *IEEE Trans. Circuits Syst. I Regul. Pap.*, vol. 66, no. 8, pp. 3162–3173, Aug. 2019.
- [81] A. Tait et al., "Feedback control for microring weight banks", *Opt. Express*, vol. 26, no. 20, pp. 26422–26443, Oct. 2018.
- [82] M.-J. Lee and W.-Y. Choi, "A silicon avalanche photodetector fabricated with standard CMOS technology with over 1 THz gain-bandwidth product", *Opt. Express*, vol. 26, no. 20, pp. 26422–26443, Oct. 2010.
- [83] N. Duan et al., "310 GHz gain-bandwidth product Ge/Si avalanche photodetector for 1550 nm light detection", *Opt. Express*, vol. 20, no. 10, pp. 11031–11036, May 2012.
- [84] J. Choi et al., "A monolithic GaAs receiver for optical interconnect systems", *IEEE J. Solid-State Circuits*, vol. 29, no. 3, pp. 328–331, Mar. 1994.
- [85] C. Takano et al., "Monolithic integration of 5-Gb/s optical receiver block for short distance communication", *IEEE J. Solid-State Circuits*, vol. 27, no. 10, pp. 1431–1433, Oct. 1994.
- [86] J. Jang et al., "Long-wavelength In<sub>0.53</sub>Ga<sub>0.47</sub>As metamorphic p-i-n photodiodes on GaAs substrates", *IEEE Photon. Technol. Lett.*, vol. 13, no. 2, pp. 151–153, Feb. 2001.
- [87] J. Bowers et al., "High-gain high-sensitivity resonant Ge/Si APD photodetectors", in *Proc. SPIE*, May 2010, vol. 7660, p. 76603H.
- [88] J. Wang and S. Lee, "Ge-photodetectors for Si-based optoelectronic integration", *Sensors*, vol. 11, no. 1, pp. 696–718, May 2011.
- [89] F. Tavernier and M. S. J. Steyaert, "High-speed optical receivers with integrated photodiode in 130 nm CMOS", *IEEE J. Solid-State Circuits*, vol. 44, no. 10, pp. 2856–2867, Oct. 2009.
- [90] T. S. C. Kao, F. A. Musa and A. C. Carusone, "A 5-Gbit/s CMOS optical receiver with integrated spatially modulated light detector and equalization", *IEEE Trans. Circuits Syst. I Reg. Papers*, vol. 57, no. 11, pp. 2844–2857, Nov. 2010.
- [91] G. P. Agrawal, *Fiber-Optic Communication Systems*, Hoboken, NJ, USA: John Wiley & Sons, Inc., 2002.
- [92] J. Youn et al., "High-speed CMOS integrated optical receiver with an avalanche photodetector", *IEEE Photon. Technol. Lett.*, vol. 21, no. 20, pp. 1553–1555, Oct. 2009.
- [93] F. Bruccoleri, E. A. M. Klumperink, and B. Nauta, "Wide-band CMOS low-noise amplifier exploiting thermal noise canceling", *IEEE J. Solid-State Circuits*, vol. 39, no. 2, pp. 275–282, Feb. 2004.
- [94] E. Säckinger and W. C. Fischer, "A 3-GHz 32-dB CMOS limiting amplifier for SONET OC-48 receivers", *IEEE J. Solid-State Circuits*, vol. 35, no. 12, pp. 1884–1888, Dec. 2000.
- [95] J. Walling et al., "Wideband CMOS amplifier design: time-domain considerations", *IEEE Trans. Circuits Syst. I Reg. Papers*, vol. 55, no. 7, pp. 1781–1793, Aug. 2008.
- [96] H.-Y. Jung, J.-M. Lee and W.-Y. Choi, "A high-speed CMOS integrated optical receiver with an under-damped TIA", *IEEE Photon. Technol. Lett.*, vol. 27, no. 13, pp. 1367–1370, July 2015.
- [97] Q. Pan et al., "An 18-Gb/s fully integrated optical receiver with adaptive cascaded equalizer", *IEEE J. Quantum Electron.*, vol. 22, no. 6, Nov./Dec. 2016.
- [98] D. Guckenberger et al., "1 V 10 mW 10 Gb/s CMOS optical receiver front-end", in *IEEE Radio Freq. Integr. Circuit Symp. Dig. Papers*, June 2005, pp. 309–312.
- [99] C. Schow et al., "A single-chip CMOS-based parallel optical transceiver capable of 240-Gb/s bidirectional data rates", *J. Lightw. Technol.*, vol. 27, no. 7, pp. 915–929, Apr. 2009.
- [100] H. Morita et al., "A 12 × 5 two-dimensional optical I/O array for 600 Gb/s chip-to-chip interconnect in 65 nm CMOS", in *IEEE Int. Solid-State Circuits Conf.*, Feb. 2014, pp. 140–141.
- [101] D. Kucharski et al., "10Gb/s 15mW optical receiver with integrated Germanium photodetector and hybrid inductor peaking in 0.13μm SOI CMOS technology", in *IEEE Int. Solid-State Circuits Conf.*, Feb. 2010, pp. 360–361.
- [102] M. G. Ahmed et al., "A 12-Gb/s -16.8-dBm OMA sensitivity 23-mW optical receiver in 65-nm CMOS", *IEEE J. Solid-State Circuits*, vol. 53, no. 2, pp. 445–457, Feb. 2018.
- [103] Tektronix, "D.C. Blocks, a Trap for the Unwary When Using Long Patterns.".
- [104] T. Takemoto et al., "A 25-to-28 Gb/s High-Sensitivity (-9.7 dBm) 65 nm CMOS Optical Receiver for Board-to-Board Interconnects", *IEEE J. Solid-State Circuits*, vol. 49, no. 10, pp. 2259–2276, Oct. 2014.
- [105] A. H. Ahmed et al., "A 6V swing 3.6% THD >40GHz driver with 4.5× bandwidth extension for a 272Gb/s dual-polarization 16-QAM silicon photonic transmitter", in *IEEE Int. Solid-State Circuits Conf.*, Feb. 2019, pp. 484–486.
- [106] W. D. Sacher et al., "Polarization rotator-splitters in standard active silicon photonics platforms", *Opt. Express*, vol. 22, no. 4, pp. 3777–3786, Feb. 2014.
- [107] H. Guan et al., "Ultracompact silicon-on-insulator polarization rotator for polarization-diversified circuits", *Opt. Lett.*, vol. 39, no. 16, pp. 4703–4706, Aug. 2014.
- [108] H. G. Choi et al., "Modulation-Format-Free Bias Control Technique for MZ Modulator Based on Differential Phasor Monitor", in *Opt. Fiber Commun. Conf.*, Mar. 2011.
- [109] P. S. Cho, J. B. Khurgin, and I. Shpantzer, "Closed-Loop Bias Control of Optical Quadrature Modulator", *IEEE Photon. Technol. Lett.*, vol. 18, no. 21, pp. 2209–2211, Nov. 2006.
- [110] G. C. M. Meijer, "A New Configuration for Temperature Transducers and Bandgap References", in *IEEE European Solid-State Circuits Conf.*, Sep. 1978, pp. 142–145.
- [111] J. W. M. Rogers, D. Rahn, and C. Plett, "A study of digital and analog automatic-amplitude control circuitry for voltage-controlled oscillators", *IEEE J. Solid-State Circuits*, vol. 38, no. 2, pp. 352–356, Feb. 2003.
- [112] C. Knochenhauer, J. C. Scheytt, and F. Ellinger, "A compact, low-power 40-Gbit/s modulator driver with 6-V differential output swing in 0.25-μm SiGe BiCMOS", *IEEE J. Solid-State Circuits*, vol. 46, no. 5, pp. 1137–1146, May 2011.
- [113] "OIF-CFP2-ACO-01.0 [Online].", Available: https://www.oiforum.com/wp-content/uploads/2019/01/OIF-CFP2-ACO-01.0.pdf.
- [114] S. Mandegaran and A. Hajimiri, "A Breakdown Voltage Multiplier for High Voltage Swing Drivers", *IEEE J. Solid-State Circuits*, vol. 42, no. 2, pp. 302–312, Feb. 2007.
- [115] L. Vera and J. R. Long, "A 40-Gb/s SiGe-BiCMOS MZM driver with 6-V p-p output and on-chip digital calibration", *IEEE J. Solid-State Circuits*, vol. 52, no. 2, pp. 460–471, Feb. 2017.
- [116] A. Zandieh et al., "Linear large-swing push-pull SiGe BiCMOS drivers for silicon photonics modulators", *IEEE Trans. Microw. Theory Techn.*, vol. 65, no. 12, pp. 5355–5366, Dec. 2017.
- [117] "Basic Total Harmonic Distortion Measurement [Online].", Available: https://www.microsemi.com/document-portal/doc_download/134813-an30basic-total-harmonic-distortion-thd-measurement.
- [118] "Tektronix 45 GHz Optical Modulation Analyzer [Online].", Available: https://www.tek.com/optical-modulation-analyzer/om4245.
- [119] M. Ahmed et al., "34-GBd Linear Transimpedance Amplifier for 200-Gb/s DP-16-QAM Optical Coherent Receivers", *IEEE J. Solid-State Circuits*, vol. 54, no. 3, pp. 834–844, Mar. 2019.
- [120] A. Awny et al., "A dual 64Gbaud 10kΩ 5% THD linear differential transimpedance amplifier with automatic gain control in 0.13μm BiCMOS technology for optical fiber coherent receivers", in *IEEE Int. Solid-State Circuits Conf.*, Feb. 2016.
- [121] A. Awny et al., "A Linear Differential Transimpedance Amplifier for 100-Gb/s Integrated Coherent Optical Fiber Receivers", *IEEE Trans. Microw. Theory Techn.*, vol. 66, no. 2, pp. 973–986, 2018.
- [122] I. G. López et al., "A 60 GHz bandwidth differential linear TIA in 130 nm SiGe:C BiCMOS with <5.5 pA/√Hz", in *IEEE Bipolar/BiCMOS Circuits and Technology Meeting*, Oct. 2017, pp. 114–117.
- [123] A. Elkholy et al., "A 3.7 mW Low-Noise Wide-Bandwidth 4.5 GHz Digital Fractional-N PLL Using Time Amplifier-Based TDC", *IEEE J. Solid-State Circuits*, vol. 50, no. 4, pp. 867–881, Apr. 2015.
- [124] S. Facchin et al., "A 20Gbaud/s PAM-4 65nm CMOS optical receiver using 3D solenoid based bandwidth enhancement", in *IEEE Int. Midwest Symp. Circuits and Systems*, Aug. 2017, pp. 723–726.
- [125] K. Lakshmikumar et al., "A process and temperature insensitive CMOS linear TIA for 100 Gbps/λ PAM-4 optical links", in *IEEE Cust. Integr. Circuits Conf.*, Apr. 2018, pp. 1–4.