Open Collections

UBC Theses and Dissertations

Synchronizer analysis and design tool : an application to automatic differentiation Reiher, Justin James 2020

Full Text

Synchronizer Analysis and Design Tool: an Application to Automatic Differentiation

by

Justin James Reiher

BCS, University of British Columbia, 2017
B.Eng, University of Victoria, 2011
Dipl.T, British Columbia Institute of Technology, 2008

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Science

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES

(Computer Science)

The University of British Columbia
(Vancouver)

January 2020

© Justin James Reiher, 2020

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled:

Synchronizer Analysis and Design Tool: an Application to Automatic Differentiation

submitted by Justin James Reiher in partial fulfillment of the requirements for the degree of Master of Science in Computer Science.

Examining Committee:

Mark R. Greenstreet, Computer Science
Supervisor

Ian M. Mitchell, Computer Science
Supervisory Committee Member

Abstract

In 2007, Yang and Greenstreet presented an algorithm that enables the computation of synchronizer failure probabilities, even when these probabilities are extremely small. Their approach gives a single probability number for the synchronizer but does not explain how the circuit details within the synchronizer contribute to the final result. We present an extension of their algorithm that connects the time-to-voltage gain of a synchronizer to the propagation of metastability through synchronizer circuits. This allows the designer to see which circuit features help synchronizer performance and which do not. There exists abundant folklore about what helps or hinders multistage synchronizer performance. We use our analysis to examine and explain such synchronizer folklore and draw novel conclusions. A brief error analysis is presented to provide evidence that the machinery used to compute our new measure of synchronizer effectiveness is accurate.
The tools are exercised to objectively evaluate a handful of synchronizer designs used in industry and to compare their effectiveness against one another.

Lay Summary

Synchronizers are circuits which coordinate the transfer of information between two processors that run on different schedules. It is well known that synchronization can fail because the circuit gets stuck deciding whether or not a data transfer should occur. These events are very rare, which makes them difficult to analyze. The tools developed in this work provide a framework to evaluate how those rare behaviours manifest, and they deliver novel methods to systematically break down synchronization at the sub-circuit level. As a consequence of this framework, circuit designers now have a concrete, succinct and fair way to objectively compare synchronizer designs.

Preface

This thesis contains the research I conducted in UBC's Integrated Systems Design Lab under the guidance of Mark Greenstreet. Portions of the work presented in Chapter 3, Chapter 5 and Chapter 6 are published as Explaining Metastability in Real Synchronizers, Justin Reiher, Mark R. Greenstreet and Ian W. Jones (2018) [42].

Mark R. Greenstreet and I jointly derived the mathematics, while Mark R. Greenstreet and Ian W. Jones from Oracle Labs provided guidance on which experiments to conduct in [42]. In addition to being the main contributor to the derivations, I implemented the code that produces all results reported in [42] and in this thesis. All figures in this thesis are original and do not appear in any publications.

An earlier version of the Massachusetts Institute of Technology (MIT) Virtual Source (MVS) transistor model from Chapter 5 was presented as Combining the MVS compact MOSFET model with automatic differentiation for flexible circuit analysis, Justin Reiher, Mark R. Greenstreet (2018) [41].
I performed the simulations and used code from [39] to build our version of the MVS transistor model.

The MVS transistor model as described in Chapter 5 is used in work published as Finding all DC operating points using interval arithmetic based verification algorithms, Itrat A. Akhter, Justin Reiher, and Mark R. Greenstreet (2019) [1]. My contribution to [1] is the MVS transistor model, for which I provided support. Itrat Akhter and Mark Greenstreet performed the analysis found in [1].

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgments
1 Introduction
1.1 The Synchronization Problem
1.1.1 Saddle Problems
1.2 Synchronizer Metastability Analysis Workflow
1.3 Contributions & Thesis Statement
2 Related Work
2.1 Transistor Modeling
2.1.1 BSIM and PTM Model Cards
2.1.2 Compact Models
2.2 Circuit Simulation
2.3 Automatic Differentiation
2.4 Metastability Analysis and Synchronizer Designs
2.4.1 Metastability
2.4.2 MTBF and Metastability Analysis
2.4.3 The Nested Bisection Algorithm
2.4.4 Synchronizers
3 Synchronizer Metastability Analysis Tools
3.1 Linear Small Signal Model Construction
3.1.1 Constructing the “Perfectly” Metastable Trajectory
3.1.2 Small Signal Linear Mapping
3.2 Computing $\beta(t)$
3.3 Sensitivity Analysis $u(t)^T$
3.4 Synchronizer Gain $g(t)$ and Instantaneous Gain $\lambda(t)$
3.5 Failure Probability and Mean Time Between Failures (MTBF)
3.6 Computing $T_w$
3.6.1 Obtaining $T_w$ for a Passgate Latch
3.7 Component Gain Decomposition Tool
4 Automatic Differentiation Analysis
4.1 Numerical Differentiating Comparisons
4.1.1 Example Function $f(t,w)$
4.1.2 Approximating $\frac{\partial}{\partial w} f(t,w)$
4.2 Linear Switched Dynamics Synchronizer
4.2.1 The Dynamics of the Toy Synchronizer
4.2.2 Derivation of the Gain of the Toy Synchronizer
4.2.3 The Results
4.3 Derivative of Solution to an Iterative Algorithm
4.3.1 Fixed Point Algorithm $f(x,y) = 0$
4.3.2 Derivative of $V_S$ in MVS Model
5 Transistor and Circuit Modeling
5.1 Transistor Modeling Basics
5.1.1 Transistor Construction
5.1.2 IV Characteristics
5.1.3 Transistor Effects
5.1.4 Device Capacitance
5.2 MVS Model
5.2.1 MVS Model Deficiencies
5.3 Simplified Enz, Krummenacher and Vittoz (EKV) Model
5.3.1 EKV Model Fitting
5.3.2 EKV Capacitance Model
5.3.3 EKV Model Deficiencies
6 Passgate Synchronizer and Variants: A Running Example
6.1 The Passgate Latch & Synchronizer
6.1.1 The Passgate Latch
6.1.2 The Two Flip-Flop Passgate Synchronizer
6.2 The Passgate Latch Synchronizer Benchmark
6.2.1 Nested Bisection Trajectories
6.2.2 Analysis Results
6.3 Passgate Synchronizer Variants
6.3.1 Synchronizer with Simulated Scan Circuitry
6.3.2 Synchronizer Kicking
6.3.3 Synchronizer with Offset
6.4 Investigating the Synchronizer with Offset
6.4.1 The Offset Coupling Inverter and Passgate
6.4.2 The Master2 Cross-Coupled Loop
6.4.3 The Slave2 Cross-Coupled Loop
6.4.4 Summary of Results
6.4.5 Offset Investigation Remarks
7 Conclusion and Future Work
7.1 Summary
7.2 Future Work
7.3 Other Applications
Bibliography
A Supporting Materials
A.1 Derivative of Matrix $A^{-1}$
A.2 Derivation of $\dot{g}(t)$
A.3 Tensor-Vector Product

List of Tables

Table 4.1 Switched dynamics toy synchronizer equations
Table 4.2 Derivation of toy synchronizer gain $\beta(t)$ with initial conditions $x_0(0) = x_1(0) = -1$
Table 5.1 Junction capacitance list of parameters
Table 6.1 Summary of offset and benchmark gain contributions

List of Figures

Figure 1.1 42 years of microprocessor trend [46]
Figure 1.2 The synchronization problem
Figure 1.3 One-bit synchronizer as a saddle problem
Figure 1.4 Analysis tool work flow
Figure 2.1 Synchronizer outcomes
Figure 2.2 Synchronizer output branch
Figure 2.3 Nested bisection algorithm illustration
Figure 3.1 Synchronization timing analysis and failure example
Figure 3.2 Nested bisection trajectories for passgate latch
Figure 3.3 $\lambda(t)$ for passgate latch
Figure 3.4 Linear fit line and plot of $\log g(t)$ vs. $t$ for passgate latch
Figure 3.5 Linear fit line and plot of $\log \Delta V(t)$ vs. $t$ for passgate latch
Figure 4.1 Error analysis with respect to $\Delta$
Figure 4.2 Approximating methods for $\frac{\partial}{\partial w} f(t,w)$
Figure 4.3 Approximating methods error $\log_{10} \left| \frac{\partial}{\partial w} f(t,w) - \frac{\partial}{\partial w} \hat{f}(t,w) \right|$
Figure 4.4 Two latch synchronizer
Figure 4.5 Two latch toy synchronizer simulation results
Figure 4.6 Two latch toy synchronizer $\beta(t)$ simulation results
Figure 4.7 Two latch toy synchronizer $\beta(t)$ error: $\log_{10} |\beta(t) - \hat{\beta}(t)|$
Figure 4.8 Relative error analysis $\log_{10} \frac{\|\beta(t) - \hat{\beta}(t)\|}{\|\beta(t)\|}$ as $t \to \infty$, i.e. $\beta(t)$ exponentially growing
Figure 4.9 $I_{ds} - I_{ds0} = 0$ assumption for an NMOS device in the MVS model and Jacobian elements for a sinusoidal input signal
Figure 5.1 PMOS and NMOS simplified device construction
Figure 5.2 PMOS and NMOS device symbols and $I_{ds}$ current
Figure 5.3 PMOS experiment test setup for the Predictive Technology Model (PTM) 45nm High Performance (HP) model card
Figure 5.4 PMOS $V_{ds}$ sweep for $V_{gs} = [-1, 0]$ V in increments of 0.1 V
Figure 5.5 PMOS $V_{gs}$ sweep for $V_{ds} = [-1.0, 0]$ V in increments of 0.1 V
Figure 5.6 NMOS parasitic capacitors (not shown: $C_{gb}$)
Figure 5.7 Linear gate capacitance results
Figure 5.8 Gate capacitance experimental setup
Figure 5.9 MVS n-device transistor with virtual source and drain
Figure 5.10 MVS fit for NMOS IV characteristic sweeping $V_{ds}$, bubbles are Simulation Program with Integrated Circuit Emphasis (SPICE) data points
Figure 5.11 MVS fit for NMOS IV characteristic sweeping $V_{gs}$, bubbles are SPICE data points
Figure 5.12 Three ring oscillator circuit with Miller capacitance
Figure 5.13 Three stage ring oscillator MVS model comparison with SPICE simulation
Figure 5.14 EKV fit for NMOS IV characteristic sweeping $V_{ds}$, bubbles are SPICE data points
Figure 5.15 EKV fit for NMOS IV characteristic sweeping $V_{gs}$, bubbles are SPICE data points
Figure 5.16 EKV fit for PMOS IV characteristic sweeping $V_{ds}$, bubbles are SPICE data points
Figure 5.17 EKV fit for PMOS IV characteristic sweeping $V_{gs}$, bubbles are SPICE data points
Figure 5.18 Three stage ring oscillator EKV model comparison with SPICE simulation
Figure 6.1 The passgate latch
Figure 6.2 Passgate
Figure 6.3 Cross-coupled pair
Figure 6.4 Inverter DC transfer function
Figure 6.5 Cross-coupled pair phase space with the attractors marked in green and the unstable saddle in red
Figure 6.6 Cross-coupled pair as amplifier circuit
Figure 6.7 Two flip-flop passgate synchronizer
Figure 6.8 Digital behaviour and propagation of a two flip-flop passgate synchronizer
Figure 6.9 Metastability behaviour and propagation of a two flip-flop passgate synchronizer
Figure 6.10 Passgate synchronizer latch outputs from nested bisection runs
Figure 6.11 Passgate synchronizer latch outputs for “perfectly” metastable trajectory
Figure 6.12 Estimate of $\log_{10} \|\hat{\beta}(t)\|_2$ as the nested bisection simulates forward in time
Figure 6.13 Comparing numerically estimated $\hat{\beta}(t)$ vs. $\beta(t)$ via AD
Figure 6.14 $u(t)^T$ highlighting which latch is in play in the first flip-flop for all time $t$ during metastability
Figure 6.15 $u(t)^T$ highlighting which latch is in play in the second flip-flop for all time $t$ during metastability
Figure 6.16 Instantaneous gain for passgate synchronizer with p:n ratio of 1:1
Figure 6.17 Non-homogeneous term $\rho(t)$ for passgate synchronizer with p:n ratio of 1:1
Figure 6.18 Instantaneous gain comparison between the passgate synchronizer without and with scan loading
Figure 6.19 Log-ratio of the gain between the scan loaded vs. benchmark: $\log_{10}\left(\frac{g_{Scan}(t)}{g_{Benchmark}(t)}\right)$
Figure 6.20 Nested bisection simulation runs for synchronizer with kick
Figure 6.21 Synchronizer kicking results
Figure 6.22 Instantaneous gain comparison between kicked and non-kicked synchronizer
Figure 6.23 Log-ratio of the gain between the kick design vs. benchmark: $\log_{10}\left(\frac{g_{Kick}(t)}{g_{Benchmark}(t)}\right)$
Figure 6.24 Nested bisection trajectories for offset synchronizer
Figure 6.25 $u(t)^T$ highlighting which latch is in play in the first flip-flop for all time $t$ during metastability
Figure 6.26 $u(t)^T$ highlighting which latch is in play in the second flip-flop for all time $t$ during metastability
Figure 6.27 Instantaneous gain comparison between offset and non-offset synchronizer
Figure 6.28 Log-ratio of the gain between the offset design vs. benchmark: $\log_{10}\left(\frac{g_{Offset}(t)}{g_{Benchmark}(t)}\right)$
Figure 6.29 3 flip-flop passgate synchronizer
Figure 6.30 Three flip-flop offset vs. benchmark log-ratio: $\log_{10}\left(\frac{g_{Offset}(t)}{g_{Benchmark}(t)}\right)$
Figure 6.31 Three flip-flop component analysis: inv9
Figure 6.32 Three flip-flop component analysis: pg6
Figure 6.33 Three flip-flop component analysis: pg6-NMOS
Figure 6.34 Three flip-flop component analysis: pg6-PMOS
Figure 6.35 Common gate amplifier circuit with NMOS
Figure 6.36 Comparison of effective capacitance on node y2
Figure 6.37 Three flip-flop component analysis: inv7
Figure 6.38 Three flip-flop component analysis: inv8
Figure 6.39 Three flip-flop component analysis: pg5
Figure 6.40 Three flip-flop component analysis: inv10
Figure 6.41 Three flip-flop component analysis: inv11
Figure 6.42 Three flip-flop component analysis: pg7
Glossary

AD: Automatic Differentiation
BSIM: Berkeley Short-channel Insulated gate field effect transistors Model
CAD: Computer Aided Design
CD: Centered Differencing
CDC: Clock Domain Crossing
CMG: Common Multi-Gate
CMOS: Complementary Metal Oxide Semiconductor
CPU: Central Processing Unit
DIBL: Drain Induced Barrier Lowering
EKV: Enz, Krummenacher and Vittoz's model
HP: High Performance
IVP: Initial Value Problem
MIT: Massachusetts Institute of Technology
MOS: Metal Oxide Semiconductor
MOSFET: Metal Oxide Semiconductor Field Effect Transistor
MTBF: Mean Time Between Failures
MVS: MIT Virtual Source
ODE: Ordinary Differential Equation
PCHIP: Piecewise Cubic Hermite Interpolating Polynomial
PTM: Predictive Technology Model
SPICE: Simulation Program with Integrated Circuit Emphasis
VLSI: Very Large Scale Integration

Acknowledgments

Thank you to Mark Greenstreet for reminding both of us on several occasions that we are computer scientists. Our conversations about the details of semiconductor physics, electronics and countless other interesting subjects keep the research fun, engaging and meaningful.

A thank you to Ian Jones, formerly at Oracle Labs, who helped clarify aspects of synchronizer designs, always striving to explain their behavior in simple terms.

A special thank you is in order to my loving wife Madeleine, who has encouraged me in this academic career and continues to support my pursuit of academia instead of lofty financial gains. It is thanks to her that I am here and able to work with the fantastic group of people at UBC.

Chapter 1 Introduction

Modern micro-electronic designs often have many separate computational blocks running in parallel. For a variety of engineering reasons, each computational block typically operates with its own localized clock, creating a clock domain. Everything from consumer-grade electronics such as cell phones to the servers running cloud services is designed with multiple clock domains.
In a typical server, each Central Processing Unit (CPU) can have several hundred to over a thousand clock domain crossings.

Figure 1.1 shows key trends in microprocessor designs. For the first 30 years, clock frequencies increased along a clearly exponential path. Around 2003, clock frequencies hit a plateau, and since then the number of cores per CPU chip has been increasing. New design challenges arise in multi-core, multi-clock-domain designs. Sending information between clock domains, called a Clock Domain Crossing (CDC), requires synchronization. Reliable synchronizers are critical to the correct execution of computational blocks.

The distinctive failure mode of a synchronizer is metastability: the output of a synchronizer can be logically undefined, or can change an arbitrarily long time after the clock edge that sampled the synchronizer's input. This can then cause failures in downstream logic circuits that depend on the output of the synchronizer. The probability of such failures can be made very small, but it cannot be driven all the way to 0. The metastable behavior of a synchronizer can be characterized as a saddle problem whose failure probability is described by an exponential law. The failure of a properly designed synchronizer is very rare. However, because of the number of synchronizers and the frequency of synchronization events in modern CPU designs, a thorough understanding of synchronizer behavior is increasingly important for reliable CPU design.

Figure 1.1: 42 years of microprocessor trend [46]

This thesis introduces and develops a new tool to characterize and quantify the performance of synchronizers when they are metastable. By deriving a time-varying differential equation that describes metastable synchronizer trajectories, the tool gives the designer new insight into, and quantifiable explanations for, synchronizer behavior and failure probability.
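The exponential law mentioned above is often quoted in a textbook form, $\mathrm{MTBF} = e^{t_r/\tau} / (T_w f_c f_d)$, where $t_r$ is the resolution time allowed before the output is used, $\tau$ is the regeneration time constant, $T_w$ is the metastability window, and $f_c$ and $f_d$ are the clock frequency and the rate of input transitions. The thesis develops its own failure-probability analysis in Chapter 3; the Python sketch below is only a rough orientation that assumes this textbook form and uses hypothetical numbers. It also illustrates why the number of clock domain crossings matters: assuming independent failures, failure rates add, so a chip with many synchronizers has a proportionally shorter MTBF.

```python
import math

def synchronizer_mtbf(t_r, tau, t_w, f_clk, f_data):
    """Textbook MTBF estimate in seconds for a single synchronizer (assumed form).

    t_r    -- resolution time allowed before the output is used [s]
    tau    -- regeneration time constant of the latch [s]
    t_w    -- metastability window [s]
    f_clk  -- sampling clock frequency [Hz]
    f_data -- rate of asynchronous input transitions [Hz]
    """
    return math.exp(t_r / tau) / (t_w * f_clk * f_data)

def system_mtbf(mtbfs):
    """Independent failure sources: failure rates add, so MTBFs combine harmonically."""
    return 1.0 / sum(1.0 / m for m in mtbfs)

SECONDS_PER_YEAR = 3.15e7

# Hypothetical numbers chosen only for illustration, not taken from the thesis.
one = synchronizer_mtbf(t_r=1e-9, tau=20e-12, t_w=20e-12, f_clk=1e9, f_data=1e8)
chip = system_mtbf([one] * 1000)   # e.g. a CPU with 1000 clock domain crossings
print(f"single synchronizer: {one / SECONDS_PER_YEAR:.3g} years")
print(f"1000 synchronizers:  {chip / SECONDS_PER_YEAR:.3g} years")
```

Adding one $\tau$ of resolution time multiplies the MTBF by $e$, while $T_w$, $f_c$ and $f_d$ enter only linearly, which is why extra resolution time is such an effective lever.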
My implementation of this tool makes extensive use of Automatic Differentiation (AD). A benefit of this approach is that we can readily compute quantities related to the probability of synchronization failure. I present a workflow which can quantitatively compare the performance of synchronizer designs. I explore and quantify a passgate synchronizer design and demonstrate the merits and pitfalls of proposed design modifications which aim to improve metastability resolution.

1.1 The Synchronization Problem

Synchronizers are circuits which coordinate the transfer of information across processing blocks operating in different clock domains. The digital behavior of a synchronizer is to accept or reject a synchronization request that occurs in a given clock period. The circuit designer's goal is to design a circuit which achieves synchronization with a very large Mean Time Between Failures (MTBF) due to metastability.

Let's consider a very simple one-bit synchronizer with the signals shown in Figure 1.2. Let our synchronizer's output and input bits both initially be 0. A transition to 1 on the synchronizer's input indicates that another clock domain is offering a data transfer. The output of the synchronizer should transition to 1 synchronously with respect to the synchronizer's clock to let the local computational block know that a transfer is ready. We assume that the circuitry using the output of the synchronizer requires the output to change before time $t_{crit}$ and to remain logically constant (1 or 0) over the interval $t \in [t_{crit}, t_{end}]$; otherwise a failure occurs.

Suppose that a transfer request may occur at any time in the interval $t \in [0, t_{end}]$. This gives rise to three distinct scenarios. In the first scenario, a request occurs at a time $t_{hi}$ early enough for the synchronizer's output bit to change to 1 before $t_{crit}$, and we accept the transfer.
In the second scenario, a request arrives at a late time $t_{lo}$ and does not change the output bit in the interval $t \in [t_{crit}, t_{end}]$, i.e. the request is rejected for the current synchronization period. The third scenario requires us to consider circuit-level models. Standard circuit models are based on ordinary differential equations whose derivative function is $C^1$ or smoother. As a consequence, the output of the synchronizer must be a continuous function of the input signals and of time. Therefore, there exist request times between $t_{hi}$ and $t_{lo}$ for which the output has an intermediate value (neither logically 0 nor logically 1) at some $t \ge t_{crit}$, leading to a failure. This is the basic argument for why a perfect synchronizer cannot exist [34].

Figure 1.2: The synchronization problem

1.1.1 Saddle Problems

Synchronizers have behavior which belongs to the class of problems called saddle problems. Figure 1.3 illustrates that our one-bit synchronizer has two basins of attraction. If the input transition occurs at or before $t_{hi}$, the synchronizer's output goes high, which is one of its basins of attraction. Similarly, input transitions that occur at or after $t_{lo}$ cause the synchronizer's output to go low, which is the other basin of attraction. Because the synchronizer's output changes from low to high (or high to low) in a continuous fashion, there must exist a separator between the two basins of attraction.

A synchronizer's separator (or saddle) is an unstable equilibrium, hence the name metastable. Because a synchronizer circuit has an unstable equilibrium on the separator, for any synchronizer design we can, in theory, find an input transition which causes the synchronizer's output trajectory to remain on the separator for an indefinite amount of time.
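To make the continuity argument and the saddle behaviour concrete, here is a minimal Python sketch on a hypothetical one-dimensional latch, $\dot v = (v - v^3)/\tau$, whose saddle is at $v = 0$ and whose two basins of attraction are around $v = +1$ ("1") and $v = -1$ ("0"). The mapping from request arrival time to latch state at the clock edge (`initial_state`, including a made-up offset so the separator is not trivially at the clock edge) is an assumption for illustration only, not the thesis's circuit model. The search is plain bisection on the arrival time, not Yang and Greenstreet's nested bisection algorithm. Because the closed-form solution is continuous in the arrival time, bisection always finds arrival times straddling the separator, and the resolution time grows like $\tau \ln(1/|v_0|)$ as the starting state approaches the saddle.

```python
import math

# Hypothetical toy latch in normalized units: dv/dt = (v - v**3) / TAU.
# v = 0 is the unstable saddle; v = +1 and v = -1 are the "1" and "0" basins.
TAU = 1.0
T_CLK = 0.5     # assumed clock edge at which the request is sampled
OFFSET = 0.2    # made-up circuit asymmetry, so the separator is not at T_CLK

def initial_state(t_in):
    """Latch state at the clock edge for a request arriving at t_in (assumed model)."""
    return math.tanh((T_CLK - t_in) / 0.1) - OFFSET

def latch_state(v0, t):
    """Exact solution of dv/dt = (v - v**3)/TAU with v(0) = v0, for t >= 0."""
    g = math.exp(t / TAU)
    return v0 * g / math.sqrt(1.0 - v0 * v0 + (v0 * g) ** 2)

def resolution_time(v0, v_logic=0.9):
    """Time for |v| to reach v_logic; about TAU*ln(1/|v0|) for small v0."""
    w0 = v0 * v0
    if w0 == 0.0:
        return math.inf              # exactly on the saddle: never resolves
    if w0 >= v_logic ** 2:
        return 0.0                   # already a valid logic level
    return 0.5 * TAU * math.log(v_logic**2 * (1.0 - w0) / ((1.0 - v_logic**2) * w0))

def bisect_arrival(t_hi, t_lo, t_crit, iters=60):
    """Shrink [t_hi, t_lo], keeping the output high at t_hi and low at t_lo at t_crit."""
    for _ in range(iters):
        t_mid = 0.5 * (t_hi + t_lo)
        if latch_state(initial_state(t_mid), t_crit) > 0.0:
            t_hi = t_mid
        else:
            t_lo = t_mid
    return t_hi, t_lo

t_hi, t_lo = bisect_arrival(0.0, 1.0, t_crit=20.0)
print(f"separator near t_in = {t_hi:.12f}")
for t_in in (t_hi - 1e-3, t_hi - 1e-6, t_hi - 1e-9):
    t_res = resolution_time(initial_state(t_in))
    print(f"t_in = {t_in:.12f}  resolves after {t_res:.2f} (units of TAU)")
```

Each factor of 1000 closer to the separator in arrival time adds roughly $\tau \ln 1000 \approx 6.9\tau$ to the resolution time, so an input can always be found that keeps the output undecided past any fixed $t_{crit}$.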
A synchronizer whose output is on the separator is metastable, and the side of the separator from which the trajectory leaves determines the basin of attraction in which it settles.

The dotted lines in Figure 1.3 indicate that, in practice, there is a set of input transitions and a corresponding region around the separator which lead to metastability failures. In practice, we also treat an output signal which is sufficiently close to one of the basins of attraction as a logical 0 or 1. A synchronization failure occurs if, by $t_{crit}$, the synchronizer's output is still sufficiently far from both basins of attraction.

Figure 1.3: One-bit synchronizer as a saddle problem

Synchronizers are generally designed with multiple stages. The ideas of our one-bit synchronizer still apply; however, a multi-stage synchronizer gives rise to a separator which is described by time-varying dynamics. The details and implications are covered in this thesis. Designers often reason about the time-varying nature of synchronizer metastability as “moving” from stage $n-1$ to stage $n$, and describe the synchronizer as metastable if any stage is behaving in a metastable fashion. Because most real synchronizers are multi-staged, with high-dimensional state spaces and time-varying saddles, visualizing the time evolution of metastable behavior is difficult. In this thesis I provide a succinct way to both summarize and visualize their behavior.

1.2 Synchronizer Metastability Analysis Workflow

A high-level overview of the workflow is shown in Figure 1.4. The highlighted green boxes in Figure 1.4 are the components to which I have made contributions.
The recipe for using the tool developed in this thesis is as follows:
• Pick a transistor model (details in Chapter 5)
• Design a synchronizer (example design in Chapter 6)
• Run the nested bisection algorithm (details in Chapter 2)
• Run the synchronizer analysis tool (details in Chapter 3)
• Examine the results (example in Chapter 6)

Figure 1.4: Analysis tool work flow

1.3 Contributions & Thesis Statement

In this thesis I implement Yang and Greenstreet's nested bisection algorithm [53] using Yan's MSPICE framework (MATLAB). The perturbation analysis techniques developed in this thesis are implemented using the automatic differentiation (AD) package in IntLAB [44]. To model deep-submicron Complementary Metal Oxide Semiconductor (CMOS) circuits, I adapted the Massachusetts Institute of Technology (MIT) Virtual Source (MVS) model [39] to work with AD and developed a simplified Enz, Krummenacher and Vittoz (EKV) model [14]. My thesis statement is:

“I present a novel method to construct a simple, time-varying linear model of synchronizer dynamics from the non-linear circuit model. This enables new ways to examine, explain, characterize, and explore metastable behaviour and the impact of design choices on synchronizer robustness.”

This thesis is supported by four contributions.

1. My main contribution is a method to automatically create a linear small-signal perturbation model that describes trajectories which lie near the separator. This linear model gives insight into the circuit's performance and failure probability. I have implemented this method as a new tool that provides the designer with insight into the time-varying nature of the circuit's behavior as the trajectory evolves along the separator.

2. There is an enormous body of work which describes transistors at varying levels of detail. Different models focus on different aspects of semiconductor physics. My contribution is to provide a transistor model which is simple but reasonably accurate.
This allows me to use my new analysis techniques without overly compromising accuracy and to obtain plausible results in a timely manner. The model I have developed in this thesis is empirically fit to Simulation Program with Integrated Circuit Emphasis (SPICE) data and borrows from underlying physical principles where needed to increase accuracy. Many transistor models are broken into operating regions with conditional statements, which are problematic for AD packages. My contribution is to provide a model which is AD friendly by construction, simple enough to be easily used in experimental code, and yet realistic enough to capture the key properties of state-of-the-art designs.

3. Using this tool on a standard synchronizer design, I evaluate “improvements” that have been suggested by designers, and thereby find some unexpected behaviors of real synchronizer designs.

4. I also contribute to the error analysis and implementation of AD methods through iterative algorithms such as numerical integration. I develop concrete examples with which I test the accuracy of my AD-based methods against straightforward numerical approximation methods, and I draw conclusions about the merits of using AD.

The thesis begins with background information on metastability analysis, circuit simulation and transistor modeling in Chapter 2. Chapter 3 provides the details of the mathematics behind the new tool developed for metastability and failure probability analysis in synchronizers. Chapter 4 outlines experiments that provide evidence that using AD to implement the mathematics of Chapter 3 gives numerically accurate results.

A brief survey of transistor modeling and the different effects that are considered is given in Chapter 5, which also introduces two compact transistor models that have been evaluated in synchronizer simulations.
Chapter 6 goes through the analysis of a passgate synchronizer in detail to demonstrate how the new tool is applied. Chapter 7 concludes the thesis with a summary of the work presented, a list of near-future work, and where I believe the techniques can be further applied.

Chapter 2
Related Work

The novel work that I develop in this thesis builds on many years of previous study in metastability, circuit simulators, transistor models and automatic differentiation. Metastable failures in synchronizers are a well-known and well-studied phenomenon dating back to work by Kinniment and Edwards [25] and Chaney and Molnar [5] in the 1970s. In that same time frame, SPICE [38] was being developed as a tool to simulate circuit designs. Meanwhile, short-channel Metal Oxide Semiconductor (MOS) effects [47] were being studied for use with Computer Aided Design (CAD) tools. AD was also being developed in that time frame to automatically compute Jacobians and gradients, and to produce automatic error analysis of numerically computed solutions of Ordinary Differential Equations (ODEs) [40] [23].

This chapter is a brief overview of the work in the fields of transistor modeling, circuit simulation, automatic differentiation, metastability analysis and synchronizer design.

2.1 Transistor Modeling

A plethora of work exists to model the behavior of MOS devices, including many textbooks (e.g. [13, 16, 18, 51]). Our focus is on CMOS models for use in Very Large Scale Integration (VLSI). Modern technology processes require proper modeling of short-channel effects such as velocity saturation [47] [37]. The Berkeley Short-channel Insulated Gate Field Effect Transistor Model (BSIM) has evolved to include many transistor effects and is generally considered to be the benchmark for MOS behavior. Compact models such as the MVS [39] and EKV [14][11][12] models seek to simplify MOS behavior to a handful of equations. By no means is this an exhaustive list.
The work in this thesis does not require a full understanding of all the details of semiconductor physics, but enough knowledge is needed to build a reasonable compact model.

2.1.1 BSIM and PTM Model Cards

The BSIM models [17] are the foundation of the SPICE family of simulators such as ngSPICE [49], HSPICE [19] and LTSPICE [10]. The BSIM group has developed models for bulk/substrate/well MOS devices, which are the ones considered in this thesis. More recently, the BSIM group has developed models for FinFET and multi-gate MOS constructions [8], which are used in today's microprocessor designs [22].

The parameters used to simulate circuit designs with the BSIM models are often proprietary, which can be an impediment to public-domain research. Researchers at Arizona State University provide open-source model cards called the Predictive Technology Model (PTM) [21], which operate with the BSIM models. These PTM model cards are used as the basis for collecting "ground truth" MOS device data to build our compact model in Chapter 5.

2.1.2 Compact Models

The BSIM models describe the behavior of MOS devices quite accurately. However, their many parameters and equations make them less intuitively understandable. For this reason, compact MOS models which trade off accuracy for simplicity are also an area of active research. MIT researchers have developed one such model, called the MVS model [39][24][50][29]. Their compact model comes with routines to automatically fit MOS device data supplied by the user. The model is semi-empirical, and describes the behavior of MOS devices in a piecewise fashion. Another compact model is the EKV model [14], which aims to describe all regions of operation of a MOS device with a single set of governing equations that applies to all operating modes.
Both build their models on physical properties, so that the behavior can be explained through physics while maintaining simplicity.

2.2 Circuit Simulation

There are a number of circuit simulators which designers use to simulate their circuit designs. ngSPICE [49] is an open-source SPICE program that is actively being developed. Analog Devices Inc. provides LTSPICE [10] for free, which includes a graphical user interface. A licensed SPICE program often used in industry is HSPICE [19]. All of these SPICE programs provide means to simulate circuits with the BSIM models. Yan's work in [52] provides a MATLAB [32] version of SPICE called MSPICE. MSPICE is the base framework I use in this thesis, and I have extended it to include the analysis tools described in this thesis. LTSPICE and ngSPICE by default use a multi-step Gear's integrator, whereas MSPICE by default uses MATLAB's ode45 integrator. MSPICE has the advantage of exposing all the internal structures, which makes it easy to manipulate and to add functionality. Future work could include the addition of these tools to ngSPICE.

2.3 Automatic Differentiation

In this work, we use automatic differentiation to automatically obtain derivatives from the transistor models. This technique for computing derivatives, gradients, Jacobians and Hessians is not new. Rall has published a set of lecture notes on using AD in a number of different ways [40]. Details on an early implementation of AD for computer programs can be found in [23], which traces the ideas of recursive derivative definitions as far back as the 1930s. Rump notes in [45] that "the method was found and forgotten several times, starting in the 1950s."

In this thesis, Rump's licensed IntLAB package [44], which implements automatic differentiation for MATLAB, is used. Another MATLAB AD package, ADMAT, developed by Coleman and Xu, is described in [6].
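For readers unfamiliar with how forward-mode AD works, the following minimal sketch propagates a value together with its derivative using "dual numbers". It is written in Python, is not taken from IntLAB or ADMAT, and uses an EKV-style smooth interpolation $F(v) = \ln^2(1 + e^{v/2})$ purely as an illustrative expression; the names `Dual`, `ekv_interp` and `derivative` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Forward-mode AD value val + der*eps, where eps**2 == 0."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # The product rule is carried along with the value.
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def d_exp(x: Dual) -> Dual:
    e = math.exp(x.val)
    return Dual(e, e * x.der)


def d_log(x: Dual) -> Dual:
    return Dual(math.log(x.val), x.der / x.val)


def ekv_interp(v):
    """Illustrative smooth, branch-free interpolation F(v) = ln(1 + e^(v/2))^2."""
    s = d_log(1.0 + d_exp(0.5 * v))
    return s * s


def derivative(fn, x: float) -> float:
    """Seed der = 1 to obtain dfn/dx at x in a single forward pass."""
    return fn(Dual(x, 1.0)).der


if __name__ == "__main__":
    for v in (-10.0, 0.0, 10.0):
        print(v, derivative(ekv_interp, v))
```

Because `ekv_interp` contains no operating-region conditionals, the same code path, and hence a smooth derivative, applies at every bias point. This is the property that makes a single-equation model convenient for AD.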
More recent work in automatic differentiation includes a package for the scientific computing language Julia [43]. Fries [15] is developing an AD framework that also supports stochastic models and variables. This work could potentially be used to model the stochastic nature of asynchronous data signals.

The research by Eberhard and Bischof in [9] describes how to compute the derivative of numerically integrated solutions with respect to parameters. Their method of recasting this problem as a new ODE is adopted in this thesis to compute the derivative of numerically obtained trajectories with respect to various parameters, i.e.:

$$\frac{d}{dt}\left(\frac{dx}{dp^T}\right) = \frac{\partial f}{\partial x^T}\nabla x + \frac{\partial f}{\partial p^T}, \qquad \nabla x(t = t_0) = \nabla x_0 \tag{2.1}$$

where $\nabla x = dx/dp^T$ is the sensitivity of the trajectory to the parameters $p$.

2.4 Metastability Analysis and Synchronizer Designs

Marino [31] and Mendler & Stroup [34] argue that resolving synchronizer metastability in a bounded time without error is physically impossible. Their arguments are based on the continuous nature of the circuit models. Intuitively, these continuity arguments suggest a bisection method for finding metastable trajectories. Pick some time $t_{crit}$ at which the synchronizer is expected to have logically defined voltage levels. Start by finding an input $t_{hi}$ which leads to the circuit's output being high at $t_{crit}$, then find an input $t_{lo}$ which leads to the circuit's output being low at $t_{crit}$. Somewhere between these two there exists an input $t_{in}$ which leads to a metastable trajectory that is neither high nor low at time $t_{crit}$.

2.4.1 Metastability

We now describe metastability and how it pertains to a synchronizer. Consider the scenario depicted in Figure 2.1, with two CPUs, A and B, where CPU A wants to send information to CPU B. CPU A sends a signal to CPU B's synchronizer to indicate that CPU A has presented data for CPU B to consume. There are three possible outcomes from CPU B's synchronizer.
The clock shown in Figure 2.1 is the clock for CPU B, and we refer to the first rising edge of the clock in this figure as clock edge 1.

• The first outcome (in red) is that CPU A's signal arrives at CPU B's synchronizer at a time $t_{hi}$ well before CPU B's synchronizer's clock edge 1. CPU B's synchronizer outputs a high signal before clock edge 2, which triggers CPU B to prepare to receive data.
• The second outcome (in blue) is that CPU A's signal arrives at CPU B's synchronizer at a time $t_{lo}$ well after CPU B's synchronizer's clock edge 1. The synchronizer only acknowledges CPU A's request on clock edge 2, which triggers CPU B to prepare to receive data on clock edge 3. Everything happens as in the first scenario, but delayed by one clock cycle.
• The last outcome lumps all other possible outcomes into one category. By the continuity arguments made by Marino [31] and Mendler & Stroup [34], assuming that the synchronizer is modeled by an ODE whose derivative function is at least $C^1$ continuous, there must exist transition times $t_{in}$ for which the synchronizer's output can be anywhere between high and low at time $t_{crit}$. This argument can be extended to propagate these non-digital values to any signal that depends on the synchronizer output, either in the current clock cycle or an arbitrary number of clock cycles later. For any signal that depends on the value of the synchronizer's output, there are transition times $t_{in}$ that lead to ill-defined values for the downstream signal.

The last outcome is what is called a synchronization failure due to metastability. This is problematic if the synchronizer's result branches to multiple places, as shown in Figure 2.2. For example, consider the case where the output of a synchronizer determines the next value of a CPU's program counter, e.g. whether the next instruction is fetched from the current program, or whether the CPU jumps to an interrupt handler.
Then some of the bits of the program counter may be set as if the synchronizer had resolved low, others as if it had resolved high, and some bits could be at invalid levels for a digital signal. This inconsistency can lead to data corruption, a system crash or other unexpected behavior. The possibility of inconsistent interpretations of the synchronizer's output allows CPU B in our scenario to reach states that cannot be explained by a purely digital model.

Figure 2.1: Synchronizer outcomes
Figure 2.2: Synchronizer output branch

2.4.2 MTBF and Metastability Analysis

Intuitively, the mean-value argument from Section 2.4.1 above suggests that one could find times $t_{hi}$ and $t_{lo}$ for which the synchronizer resolves high and low, respectively. Then, an algorithm could apply bisection to find a narrow interval that contains the actual metastability failures of the synchronizer. Implementing such a bisection approach using simulators such as ngSPICE or HSPICE is hopeless in practice because numerical errors preclude distinguishing the behaviors of trajectories with nearly identical initial conditions. To achieve failure probabilities that are acceptable in real designs, the gap between $t_{hi}$ and $t_{lo}$ must be several orders of magnitude smaller than the resolution of the double-precision floating-point representation. Yang & Greenstreet [53] [54] demonstrate an algorithm of "bisection with restarts", which has been implemented in the MSPICE framework used in this thesis and in the BlendICS MetaACE [35] tool.

Early analysis of synchronizers and metastability focused on a single latch. Couranz & Wann [7] and Kinniment & Wood [26] explored this and obtained an analytical model where:

$$P\{\text{failure}\} = T_w f_c e^{-t_{crit}/\tau} \tag{2.2}$$

$P\{\text{failure}\}$ is the probability of a synchronizer failure for a single transition of the synchronizer's input.
$T_w$ is the "input window" in which the synchronizer is "vulnerable", $f_c$ is the clock frequency, and $\tau$ is the metastability resolution time constant of the bistable element in the latch, e.g. the cross-coupled pair of inverters (see Chapter 6). Equation 2.2 assumes that data transitions are equally likely to occur at any time within the clock period $P_{clk} = 1/f_c$, i.e. the data transitions are drawn from a uniform distribution over the clock period $P_{clk}$.

Typically, the designer's concern is the MTBF, the Mean Time Between Failures. For a single synchronizer, this is the mean time between synchronization events divided by the probability of a synchronization event failing. The rate of data transitions is written as $f_{data}$, and the probability of an individual synchronization event leading to failure is given by Equation 2.2. This leads to Equation 2.3:

$$\mathrm{MTBF} = \frac{e^{t_{crit}/\tau}}{f_c f_{data} T_w} \tag{2.3}$$

Beer et al. [4] outline a history of MTBF models for synchronizers and their multi-stage nature. They all follow a form similar to Equation 2.2. Yang and Greenstreet's nested bisection algorithm addresses the computation of the MTBF for a particular synchronizer design directly by reporting $\Delta t_{in}$, the width of the interval of input transition times for which failures can occur. By the assumption that input transition times are uniformly distributed over the clock period,

$$\mathrm{MTBF} = \frac{P_{clk}}{\Delta t_{in} f_{data}} \tag{2.4}$$

since $\Delta t_{in}/P_{clk}$ is the probability of an input transition occurring within this small fraction of the clock period $P_{clk}$. The algorithm on its own, however, does not explain what causes a synchronizer to have a particular MTBF. This thesis extends those tools to provide such an analysis; the extension was first presented in [42].

2.4.3 The Nested Bisection Algorithm

In order to perform the synchronizer analysis, it is necessary to be able to simulate and produce metastable trajectories. Yang & Greenstreet's nested bisection algorithm [53] is a key contribution to the simulation and computation of metastable trajectories in synchronizers.
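To make Equations 2.2 to 2.4 concrete, the short Python sketch below evaluates them and inverts Equation 2.4 to find the failure window $\Delta t_{in}$ that a target MTBF requires. All numerical values are hypothetical and chosen only for illustration; the function names are likewise illustrative.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def p_failure(t_w: float, f_c: float, t_crit: float, tau: float) -> float:
    """Equation 2.2: failure probability for a single input transition."""
    return t_w * f_c * math.exp(-t_crit / tau)


def mtbf_single_latch(t_w, f_c, f_data, t_crit, tau) -> float:
    """Equation 2.3: MTBF = e^(t_crit/tau) / (f_c * f_data * T_w)."""
    return math.exp(t_crit / tau) / (f_c * f_data * t_w)


def mtbf_from_window(p_clk: float, dt_in: float, f_data: float) -> float:
    """Equation 2.4: MTBF = P_clk / (dt_in * f_data)."""
    return p_clk / (dt_in * f_data)


def window_for_mtbf(p_clk: float, f_data: float, mtbf_seconds: float) -> float:
    """Invert Equation 2.4: failure window dt_in needed for a target MTBF."""
    return p_clk / (mtbf_seconds * f_data)


if __name__ == "__main__":
    # Hypothetical design point: 1 GHz clock, 1 MHz data rate.
    p_clk, f_data = 1e-9, 1e6
    target = 1e6 * SECONDS_PER_YEAR  # one million years, in seconds
    print("required dt_in [s]:", window_for_mtbf(p_clk, f_data, target))
```

The three formulas agree when $\Delta t_{in} = T_w e^{-t_{crit}/\tau}$, which is why reporting $\Delta t_{in}$ directly, as nested bisection does, is enough to obtain the MTBF.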
Designers typically want an MTBF on the order of millions of years to ensure that synchronization failures are sufficiently rare, even in designs with thousands of synchronizers. Note that a million-year per-synchronizer MTBF is typical for servers and other designs with high reliability/availability requirements. Other designs (such as consumer products like cell phones) have much lower standards, with MTBFs of a few thousand years per synchronizer. However, given the exponential relationship between resolution time and failure probability, the issues remain largely the same.

Prior to the work in [53], it was not possible to simulate metastable trajectories corresponding to such MTBFs with SPICE. SPICE cannot place a data transition with enough precision and, at the same time, simulate the corresponding metastable behavior of the synchronizer, because it cannot simulate a correspondingly accurate trajectory for an input that occurs within the necessary interval. An MTBF on the order of millions of years corresponds to an input time window $\Delta t_{in}$ on the order of $10^{-25}$ to $10^{-30}$.

The nested bisection algorithm solves this problem and functions in the following manner:

• Find an input transition time $t_{hi}$ for which the synchronizer settles high, and another time $t_{lo}$ for which the synchronizer settles low.
• To accelerate the search, we do an M-way bisection instead of a simple bisection. Because of vectorization, simulating M trajectories as one vectorized system in MATLAB costs relatively little compared to simulating M trajectories individually. We space M input transition times $t_{in,i}$ evenly between $t_{hi}$ and $t_{lo}$.
• Launch M simultaneous synchronizer simulations, where each synchronizer trajectory corresponds to a particular input transition time $t_{in,i}$. Let the kth run of M simulations be called the kth epoch.
We write $T_k$ to denote the start time of the kth epoch.
• The user provides a function, end_epoch(v), that returns true when the voltages in voltage vector v indicate that all simulations have settled and/or resolved. For example, a simple criterion is to test that all voltages are within $V_{tol}$ of $V_{dd}$ or gnd.
• Search for the last synchronizer simulation trajectory that settles high and the first one that settles low. To make the classification of the trajectories robust, the second-to-last trajectory that settled high, $\vec V_H(t)$, and the second trajectory that settled low, $\vec V_L(t)$, are stored. We do this because, if the first trajectory that settles low (or the last to settle high) happens to be very close to the separator, its classification may be the result of numerical artifacts. Selecting the trajectories that are once removed gives the algorithm some margin for error at the expense of a slightly longer run time (see Figure 2.3).
• At the end of epoch k, the algorithm needs a way to determine how far the overall simulation can advance in time. The goal is to determine a time interval $t \in [T_k, t_{lin}]$ such that all M simulation trajectories over that interval can be approximated by a linear model. For each time point $t_j$, corresponding to the jth time point in epoch k, we take the synchronizer voltage state vectors $\vec V_i(t_j)$ across all M simulations and fit them to a cubic polynomial. If the synchronizer has N nodes, then the state for M simulations at time $t_j$ is a vector of MN values. At each time step, we calculate the $\ell_2$ distance between this vector as determined by the integrator and its approximation as determined by the cubic polynomial fit for that time.
We find the first time point for which this discrepancy is greater than some user-specified tolerance, $\varepsilon$. The epoch ends at the previous time point.
• The saved trajectories that settled high and low are used to set the initial conditions for the next set of simulations, for epoch k+1, beginning at time $T_{k+1} = t_{lin}$. The initial conditions for the following epoch are affine combinations of $\vec V_H(t_{lin})$ and $\vec V_L(t_{lin})$, i.e. $(1-\alpha)\vec V_H(t_{lin}) + \alpha\vec V_L(t_{lin})$, where the M values of $\alpha \in [0,1]$ are evenly spaced.
• The relative time window $\Delta t_{in}$ is scaled by the fraction by which the bracket between the saved trajectories has shrunk.
• The input data transitions $t_{hi}$ and $t_{lo}$ are updated to correspond to the trajectories $\vec V_H(t)$ and $\vec V_L(t)$. The entire process is repeated until a user-specified end condition is reached, for example, when the desired relative time window $\Delta t_{in}$ is achieved.

Figure 2.3: Nested bisection algorithm illustration

Because the simulations are performed using MATLAB's ode45 integrator, we can restart simulations and impose stopping conditions to speed up simulation times. The linearity test via a cubic fit is a result of legacy test code which has been carried through without presenting any problems. There is certainly room for a more rigorous analysis and likely a cleaner test.

Another observation is that even though the input data transitions are indistinguishable using floating-point arithmetic, the synchronizer states at each epoch boundary, $T_k$, clearly distinguish trajectories that settle high from those that settle low. This is because these states include the voltages on all nodes, including those that are most relevant to resolving metastability at time $T_k$.

I also observed that progress is slow until the input data transitions $t_{hi}$ and $t_{lo}$ are narrowly placed. Once $t_{hi}$ and $t_{lo}$ are close to each other, the nested bisection algorithm progresses much faster. Figure 2.3 illustrates a sample of the nested bisection algorithm from one epoch to the next.
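The epoch-boundary steps listed above (the cubic-fit linearity test, the restart from affine combinations of $\vec V_H$ and $\vec V_L$, and the window update) can be sketched in Python with NumPy as follows. This is not the MSPICE code. The array layout `V[time, trajectory, node]`, fitting each node's voltage as a cubic in the trajectory's bracket position $\alpha$, and all function names are assumptions made for this illustration.

```python
import numpy as np


def linear_epoch_end(V: np.ndarray, alphas: np.ndarray, eps: float) -> int:
    """Index of the last time point before the cubic-fit test fails.

    V has shape (T, M, N): T time points, M trajectories, N circuit nodes.
    alphas (shape (M,)) is each trajectory's position inside the bracket;
    fitting against alphas is an assumption of this sketch (needs M > 4).
    Returns -1 if even the first time point fails the test.
    """
    T = V.shape[0]
    A = np.vander(alphas, 4)  # cubic basis [a^3, a^2, a, 1], shape (M, 4)
    for j in range(T):
        Y = V[j]  # shape (M, N): one least-squares fit per node
        coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
        resid = np.linalg.norm(A @ coef - Y)  # l2 distance over all M*N values
        if resid > eps:
            return j - 1
    return T - 1


def restart_states(v_hi: np.ndarray, v_lo: np.ndarray, M: int):
    """Next-epoch initial conditions (1 - a) V_H + a V_L, a evenly spaced in [0, 1]."""
    alphas = np.linspace(0.0, 1.0, M)
    states = (1.0 - alphas)[:, None] * v_hi[None, :] + alphas[:, None] * v_lo[None, :]
    return alphas, states


def shrink_window(dt_in: float, a_hi: float, a_lo: float) -> float:
    """Scale the relative window by the fraction of the bracket that is kept."""
    return dt_in * abs(a_lo - a_hi)
```

With M = 4 a cubic interpolates the four states exactly, so the residual is always zero; the test only has discriminating power for M > 4.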
It is therefore possible to generate trajectories corresponding to arbitrarily small relative input time windows $\Delta t_{in}$, overcoming the need to specify the input transition time to an accuracy finer than machine epsilon.

2.4.4 Synchronizers

Designers have focused on designing synchronizers with a large MTBF to ensure reliable computation. Zhou et al. [56] propose a synchronizer design which they claim is robust under supply voltage variation and efficient at metastability resolution. Beer and Ginosar [3] propose eleven ways to improve metastability resolution in synchronizer designs, one of which is tested in this thesis. I demonstrate that it is advisable to avoid putting loads on critical nodes, an observation made in Beer and Ginosar's paper [3]. Yang et al. [55] performed a survey and comparison of synchronizer designs and propose a new design. The pseudo-NMOS design in [55] is shown to have a larger MTBF than previous synchronizer designs. More recently, Li et al. [28] have proposed that boosting the supply voltage when the synchronizer is metastable improves metastability resolution.

Apart from Yang et al., where failure probabilities were computed using the nested bisection algorithm, the ideas in these papers tend to revolve around folklore and intuition, which often do not capture the entire picture, as is demonstrated in this thesis.

Chapter 3
Synchronizer Metastability Analysis Tools

The details found in this chapter provide the foundation for the main contributions of this thesis. The chapter outlines the mathematics and details of synchronizer analysis in the metastable regime and explicitly derives the MTBF of the synchronizer.

Equation 2.2 from Chapter 2 gives a widely used formula for modelling synchronizer failure probability. The input window $T_w$ and resolution constant $\tau$ for a synchronizer are terms that have been either empirically fit from observation or derived for a single latch.
The MTBF of a multi-stage synchronizer is not explicitly derived from the underlying circuit dynamics.

In this chapter, we build on the small-signal, linear model that is the foundation of the nested bisection algorithm [53]. In particular, we show that there is a scalar quantity, $\lambda(t)$, that models how the synchronizer converts small changes in its input transition time into large changes in its output voltage at a later time, $t_{crit}$. We show that the synchronizer's time-to-voltage gain satisfies a first-order, linear, inhomogeneous ODE with $\lambda(t)$ as its time-varying coefficient. The vector-valued function $\beta(t)$ gives the time-to-voltage transfer of the synchronizer at time $t$; in other words, $\beta(t)$ describes the sensitivity of the voltage state at time $t$ to the input transition time, $t_{in}$. This analysis provides a quantitative way to explain why a synchronizer has a particular failure probability. We can quantify the impact of particular sub-circuits of the synchronizer and identify which sub-circuits are critical at what times. As a bonus, we can calculate the values of parameters such as $T_w$ and $\tau$ from Equation 2.2 and connect these values to the dynamics of the synchronizer circuit.

Consider a one-latch synchronizer that is transparent when the clock signal is low and opaque when it is high. There are four time points of importance in the analysis of a synchronizer. Figure 3.1 illustrates an example where the times $t_{clk}$, $t_{in}$, $t_{eola}$ and $t_{crit}$ are shown for an event that leads to a failed synchronization in our one-latch example, i.e. the output changes at some time after $t_{crit}$.

Figure 3.1: Synchronization timing analysis and failure example

• The time $t_{clk}$ is the time at which the clock transition makes the first stage of our synchronizer transition from being transparent to opaque. The period of the clock is $P_{clk}$. In the following, we describe other times, e.g. $t_{in}$, $t_{eola}$ and $t_{crit}$, relative to $t_{clk}$. In other words, $t_{clk} = 0$ unless otherwise stated.
• The time $t_{in}$ is the time at which the data transition occurs at the input of the synchronizer.
This time is assumed to be uniformly distributed over the clock period $P_{clk}$.
• A designer picks a time $t_{crit}$ by which the synchronizer's output needs to be logically defined, i.e. clearly high or clearly low; otherwise a synchronization failure has occurred. The time $t_{crit}$ is relative to the transition of the clock.
• For trajectories that are metastable, the time $t_{eola}$ is the time at which our linear analysis ends. From time $t_{eola}$ to $t_{crit}$ and beyond, within a single period $P_{clk}$, non-linear large-signal behavior dictates the outcome of the synchronizer. The time $t_{eola}$ is selected such that the synchronizer's dynamics still have valid linear small-signal behaviour, but $\vec V(t_{eola})$ has separated enough from the saddle to be clearly headed towards one of the basins of attraction.

To compute the failure probability of a synchronizer, we must identify what conditions constitute a "failure". Throughout this thesis, we consider any change of the logical values of the synchronizer outputs between $t_{crit}$ and the next rising edge of the clock to be a failure. Furthermore, if the synchronizer has multiple outputs, e.g. $Q$ and $\overline{Q}$, then we require these outputs to be logically consistent, e.g. $Q = \neg\overline{Q}$, during this interval; otherwise, the behaviour is a failure. Conversely, we place no constraints on the synchronizer outputs during the interval from a rising edge of the clock until $t_{crit}$.

There are three main pieces needed to derive the MTBF of a synchronizer. The first piece is to compute $\beta(t) = \partial\vec V(t)/\partial t_{in}$, which represents how the voltage state $\vec V(t)$ of the synchronizer is affected by a change in $t_{in}$. Informally, $\beta(t)$ summarizes the cumulative progress the synchronizer has made at time $t$ towards resolving metastability. The units of $\beta(t)$ are [V/s].

The second piece requires a designer to identify what components of the synchronizer state are important to metastability resolution at time $t_{eola}$.
For example, latches often involve two inverting gates connected in a cycle to form a state-holding element (see, e.g., Figure 6.1 or Figure 6.3). If $x(t)$ and $y(t)$ denote the voltages on the outputs of two such gates at time $t$, then $x(t) - y(t)$ could be such an indicator of progress away from metastability. From the designer's selected measure $\tilde u^T$, a normalized, unitless vector $u(t)^T$ is computed such that at the end of our linear analysis, $\tilde u^T\vec V(t_{eola})$ computes the designer's measure of importance to metastability resolution, e.g. $\tilde u^T\vec V(t_{eola}) = \frac{1}{\sqrt{2}}\left(x(t_{eola}) - y(t_{eola})\right)$. For times $t \in [t_{in}, t_{eola}]$, $u(t)^T$ is the component of the synchronizer state at time $t$ such that perturbations of the state $\vec V(t)$ are co-linear with $\tilde u^T$ at time $t_{eola}$. In effect, $u(t)^T$ tells the designer what will happen from time $t$ to $t_{eola}$.

Lastly, the synchronizer's time-to-voltage gain, a scalar quantity measured in units of [V/s], represents how effective the synchronizer is at resolving metastability at time $t$, and is given by Equation 3.1:

$$g(t) = u(t)^T\beta(t) \tag{3.1}$$

In Section 3.4, we show that $g(t)$ satisfies a time-varying, first-order, inhomogeneous differential equation in a single variable:

$$\frac{d}{dt}g(t) = \lambda(t)g(t) + \rho(t) \tag{3.2}$$

where $\lambda(t)$ describes the instantaneous gain of the synchronizer and $\rho(t)$ describes the initial time-to-voltage conversion of the synchronizer. For a typical synchronizer, $\rho(t)$ rapidly approaches 0 after the initial clock event that samples $d_{in}$. We write $t_{homo}$ to indicate a time after which $\rho(t)$ is effectively 0, and note that Equation 3.2 becomes a homogeneous, linear, time-varying differential equation for $t > t_{homo}$.

In Section 3.6, we apply this analysis to a single latch and show how the quantities $\tau$ and $T_w$ in Equation 2.2 can be obtained from our model. Section 3.7 shows how our analysis can be applied on a component-by-component basis to quantify how each transistor (or each inverter, etc.)
of a synchronizer impacts the performance.

The tools developed in this chapter are exercised in Chapter 6 on a passgate synchronizer design to illustrate the impact of design variants and to quantify how they affect synchronizer performance. We develop a summarizing figure of merit, $\lambda(t)$, called the instantaneous gain of the synchronizer, which is used to compare the design variants in Chapter 6.

3.1 Linear Small Signal Model Construction

The nested bisection algorithm simulates M trajectories simultaneously and computes metastable trajectories forward in time. Once the algorithm has produced trajectories up to the desired time $t_{crit}$, we collect those trajectories and create a single trajectory which we use for the remainder of the analysis.

Working backwards from time $t_{crit}$, we identify a time $t_{eola}$ such that there is no ambiguity about how the synchronizer will resolve for $t \in [t_{eola}, t_{crit}]$, but where the trajectories have not diverged enough to violate our linearity assumption. Therefore, for $t \in [t_{in}, t_{eola}]$, we can construct a single trajectory which describes the synchronizer as a linear small-signal perturbation model by computing $\beta(t)$, the derivative of the synchronizer state with respect to the input transition time, for all $t \in [t_{in}, t_{eola}]$.

We noted in Section 2.4.3 that simulating M circuit trajectories in the nested bisection algorithm accelerates our search. With N defined as the number of circuit node variables per circuit, each time step of the nested bisection algorithm is an operation $\mathbb{R}^{MN} \to \mathbb{R}^{MN}$. Computing the Jacobian for each of the M circuit models at each time point is an operation $\mathbb{R}^{MN} \to \mathbb{R}^{MN \times MN}$. For this reason, we restrict the linearization of the synchronizer to a single trajectory rather than computing the Jacobian at each time step for all M circuit trajectories of the nested bisection algorithm. Doing so would be an $O(N)$ increase per circuit in computational complexity, i.e.
the complexity would go from $O(MN)$ to $O(MN^2)$ at each time step of simulation. We note that, for simplicity, the $O(MN)$ and $O(MN^2)$ estimates ignore the time for computing $C^{-1}$.

Constructing the metastable small-signal model of the synchronizer is broken into two parts. The first is to construct the "perfectly" metastable trajectory for the desired time $t_{crit}$. The second is to construct a linear mapping along the "perfectly" metastable trajectory. This mapping is then used to obtain a linear small-signal perturbation model for all points in time $t \in [t_{in}, t_{eola}]$ for use in subsequent sections.

3.1.1 Constructing the "Perfectly" Metastable Trajectory

Each epoch of the nested bisection algorithm computes a pair of trajectories that bracket the "perfectly" metastable trajectory. We write $t_k$ to denote the starting time of the kth epoch, and note that $t_{k+1}$ is the end time of that epoch. Nested bisection computes M trajectories and selects two that bracket the separator between trajectories that settle high and those that settle low: we write $\vec V_{H,k}$ to denote the trajectory that will ultimately settle high, and $\vec V_{L,k}$ to denote the trajectory that will ultimately settle low. These trajectories have associated values of the $t_{in}$ parameter that we refer to as $t_{in,H,k}$ and $t_{in,L,k}$, respectively.

For $1 \le k \le K$, where K is the total number of epochs computed, the initial points of trajectories $\vec V_{H,k}$ and $\vec V_{L,k}$ are affine combinations of the end points from epoch k−1. In particular, we define $\alpha_{H,k}$ and $\alpha_{L,k}$ such that:

$$\begin{aligned}
\vec V_{H,k}(t_k) &= \alpha_{H,k}\vec V_{H,k-1}(t_k) + (1-\alpha_{H,k})\vec V_{L,k-1}(t_k)\\
\vec V_{L,k}(t_k) &= \alpha_{L,k}\vec V_{H,k-1}(t_k) + (1-\alpha_{L,k})\vec V_{L,k-1}(t_k)
\end{aligned} \tag{3.3}$$

and note that $t_{in,H,k} = \alpha_{H,k}t_{in,H,k-1} + (1-\alpha_{H,k})t_{in,L,k-1}$ and $t_{in,L,k} = \alpha_{L,k}t_{in,H,k-1} + (1-\alpha_{L,k})t_{in,L,k-1}$.
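The epoch-to-epoch update of Equation 3.3, together with the matching update of $t_{in,H,k}$ and $t_{in,L,k}$, can be sketched in Python as below. Since the bracket quickly becomes narrower than double precision can resolve, this sketch tracks $t_{in}$ with Python's exact `fractions.Fraction` type while the states stay in floating point. That is an illustrative choice of ours, not the thesis's MATLAB implementation, and `next_epoch` is a hypothetical name.

```python
from fractions import Fraction

import numpy as np


def next_epoch(v_hi_prev, v_lo_prev, tin_hi_prev, tin_lo_prev, a_hi, a_lo):
    """One epoch-to-epoch step of Equation 3.3, applied to states and to t_in.

    v_*_prev   : float state vectors at the epoch boundary t_k (numpy arrays)
    tin_*_prev : input transition times as exact Fractions
    a_hi, a_lo : affine weights (any value Fraction() accepts)
    """
    a_hi, a_lo = Fraction(a_hi), Fraction(a_lo)
    # States are ordinary floats, so use float copies of the weights here.
    v_hi = float(a_hi) * v_hi_prev + (1.0 - float(a_hi)) * v_lo_prev
    v_lo = float(a_lo) * v_hi_prev + (1.0 - float(a_lo)) * v_lo_prev
    # t_in uses exact rational arithmetic, so the bracket can shrink far below
    # double-precision resolution and still be tracked.
    tin_hi = a_hi * tin_hi_prev + (1 - a_hi) * tin_lo_prev
    tin_lo = a_lo * tin_hi_prev + (1 - a_lo) * tin_lo_prev
    return v_hi, v_lo, tin_hi, tin_lo


if __name__ == "__main__":
    vh, vl = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    th, tl = Fraction(0), Fraction(1, 10**10)  # hypothetical 100 ps bracket
    for _ in range(30):
        vh, vl, th, tl = next_epoch(vh, vl, th, tl, Fraction(4, 7), Fraction(3, 7))
    print(float(abs(th - tl)), float(th) == float(tl))
```

Each step shrinks the bracket by $|\alpha_{H,k} - \alpha_{L,k}|$; after 30 steps with weights 4/7 and 3/7 the two transition times differ by about $4 \times 10^{-36}$ s yet round to the same double, which mirrors the floating-point limitation the nested bisection algorithm is designed to work around.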
In the final epoch, k = K, we find trajectories $\vec V_{H,K}$ and $\vec V_{L,K}$ which just meet the user-defined settling criteria at time $t_{crit}$.

The key idea of the nested bisection algorithm is to find trajectories that bound the metastable failures and are close enough to each other that small-signal, linear analysis is applicable. In the final epoch, these trajectories diverge to the settled-high trajectory, $\vec V_{H,K}$, and the settled-low trajectory, $\vec V_{L,K}$. Clearly, these involve large-signal, non-linear behaviour. We write $t_{eola}$ to denote the time at which the small-signal linear analysis ends; the remainder of the analysis relies on large-signal, non-linear methods. We now describe how we determine a value for $t_{eola}$. Let

$$\Delta V(t) = \tilde u^T\left(\vec V_{H,K}(t) - \vec V_{L,K}(t)\right) \tag{3.4}$$

$\Delta V(t)$ quantifies the amount of separation between the settles-high and settles-low trajectories. The user defines a threshold, $\Delta V_{eola}$, such that if $\Delta V(t) < \Delta V_{eola}$, then small-signal, linear methods are applicable. Once $\Delta V(t) \ge \Delta V_{eola}$, non-linear methods should be used. In particular, we define

$$t_{eola} = \min\{\, t : \Delta V(t) \ge \Delta V_{eola} \,\} \tag{3.5}$$

Finally, we find $\alpha_{H,K}$ and $\alpha_{L,K}$ such that Equation 3.3 is satisfied for $\vec V_{H,K}(t_{eola})$ and $\vec V_{L,K}(t_{eola})$.

We now define the "perfectly" metastable trajectory, $\vec V_{K,meta}(t)$, as the one that bisects $\vec V_{H,K}$ and $\vec V_{L,K}$. We do not actually need it to lie exactly on the separator between trajectories that would eventually settle high and those that would eventually settle low. We just need a continuous trajectory that spans the entire analysis (up until $t_{crit}$) and lies in the region where our linearization of the model applies. To this end, we pick the trajectory that is halfway between $\vec V_{H,K}$ and $\vec V_{L,K}$ in the final epoch and derive its predecessors in earlier epochs as shown below.
Working back from $k = K$ to $k = 0$, we get:

$$\begin{aligned}\tilde\alpha_K &= \tfrac{1}{2}\left(\alpha_{H,K} + \alpha_{L,K}\right)\\ \tilde\alpha_k &= \tilde\alpha_{k+1}\,\alpha_{H,k+1} + (1-\tilde\alpha_{k+1})\,\alpha_{L,k+1}, \qquad 0 \le k < K\\ \vec V_{k,meta}(t) &= \tilde\alpha_k\,\vec V_{H,k}(t) + (1-\tilde\alpha_k)\,\vec V_{L,k}(t)\end{aligned} \tag{3.6}$$

Given these interpolations, $\vec V_{k,meta}$ for each epoch $k$, we use MATLAB's Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) function to create a continuous function from the time points of the combined epochs. This constructed trajectory, $\vec V_{meta}(t)$, is the “perfectly” metastable trajectory that we use in the remainder of the analysis.

We note that a real synchronizer has a window of times $t \in [t_{crit}, t_{\phi_{next}}]$ in which a change on the synchronizer's outputs would result in a failure. The time $t_{\phi_{next}}$ is the latest time at which a change to the synchronizer's outputs would have an effect on the circuit's behaviour within the given clock period $P_{clk}$. In other words, metastability resolution that occurs after $t_{\phi_{next}}$ happens late enough that it no longer affects the outcome. Characterizing the full window in which such trajectories could exist would be a more rigorous analysis, but in this thesis we consider only the first occurrence of failure at time $t_{crit}$. Furthermore, $\Delta V_{eola}$ and $t_{eola}$ are parameters to the analysis, but, as will be shown in Section 3.6, they have a range of valid choices and ultimately cancel each other out when computing the MTBF. The current implementation requires the user to provide $\Delta V_{eola}$; however, $\Delta V_{eola}$ acts as another linearity test. We believe that the linearity test used for the nested bisection algorithm can also be used to determine $t_{eola}$, which would further simplify and automate the analysis for the user.

3.1.2 Small Signal Linear Mapping

A trajectory $\vec V(t)$ is the solution to the differential equation $\frac{d}{dt}\vec V(t) = f(\vec V(t), t_{in}) = C(\vec V(t), t_{in})^{-1}\, I(\vec V(t), t_{in})$. The Jacobian, $J_f(t) = \frac{\partial f(\vec V(t), t_{in})}{\partial \vec V(t)}$ for all points in time $t \in [t_{in}, t_{eola}]$, is used as a small-signal linear mapping to determine how $f$ would change as a result of perturbing the voltage nodes $\vec V(t)$.
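To illustrate what "small-signal linear mapping" means in practice, the following Python sketch builds a hypothetical two-node model $f(\vec V) = C^{-1} I(\vec V)$ with a constant capacitance matrix (so the $\partial C/\partial \vec V$ term of Equation 3.7 vanishes and $J_f = C^{-1}\,\partial I/\partial \vec V$) and checks that $J_f$ predicts $f(\vec V + \delta\vec V) - f(\vec V)$ for a small $\delta\vec V$. The current function `I_nodes`, the matrix `C`, the omission of $t_{in}$, and the use of centered differences (rather than the AD used in the thesis) are all assumptions made only for this illustration.

```python
import math

# Hypothetical constant 2x2 capacitance matrix (so dC/dV = 0 in Equation 3.7).
C = [[2.0, 0.5], [0.5, 1.5]]


def solve2(A, b):
    """Solve the 2x2 linear system A x = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]


def I_nodes(V):
    """Hypothetical currents of a cross-coupled pair, metastable at (0.5, 0.5)."""
    return [math.tanh(4.0 * (0.5 - V[1])) - (V[0] - 0.5),
            math.tanh(4.0 * (0.5 - V[0])) - (V[1] - 0.5)]


def f(V):
    """dV/dt = C^{-1} I(V); the t_in dependence is omitted in this toy."""
    return solve2(C, I_nodes(V))


def jacobian_cd(func, V, h=1e-6):
    """Centered-difference estimate of J_f, used here only as a check."""
    n = len(V)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        Vp, Vm = list(V), list(V)
        Vp[j] += h
        Vm[j] -= h
        fp, fm = func(Vp), func(Vm)
        for i in range(n):
            J[i][j] = (fp[i] - fm[i]) / (2.0 * h)
    return J


V_meta = [0.5, 0.5]
J = jacobian_cd(f, V_meta)
# Small-signal use of J_f: f(V + dV) - f(V) is approximately J_f dV.
dV = [1e-4, -1e-4]
V_pert = [v + d for v, d in zip(V_meta, dV)]
df_true = [a - b for a, b in zip(f(V_pert), f(V_meta))]
df_lin = [sum(J[i][j] * dV[j] for j in range(2)) for i in range(2)]
print(J)
print(df_true, df_lin)
```

For this toy, $J_f = C^{-1}\,\partial I/\partial \vec V$ can also be worked out by hand as $\frac{1}{2.75}\begin{pmatrix}0.5 & -5.5\\ -7.5 & 0\end{pmatrix}$, which the centered-difference estimate reproduces.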
Notice that if we also want to know how $f$ would change as a result of perturbing the time $t_{in}$ at which the input transition occurs, we need $\frac{\partial f(t)}{\partial t_{in}}$. The Jacobian of $f$ and $\frac{\partial f(t)}{\partial t_{in}}$ are obtained automatically by applying AD along the trajectory $\vec V(t)$. Equation 3.7 shows the derivation of the Jacobian and of $\frac{\partial f(t)}{\partial t_{in}}$ on the time interval $t \in [t_{in}, t_{eola}]$. The derivative of the inverse of a matrix is shown in Appendix A, Section A.1, for reference.

Note that $C(\vec V(t), t_{in})$ is a non-linear capacitance matrix with respect to the inputs $\vec V(t)$ and $t_{in}$. As a consequence, when we compute $\frac{\partial C(\vec V(t), t_{in})}{\partial \vec V(t)}$, an intermediate part of the Jacobian matrix, the result is an $N \times N \times N$ tensor. In order to compute the Jacobian $J_f(t)$, we need to compute $\frac{\partial C(\vec V(t), t_{in})}{\partial \vec V(t)} \otimes \left(C(\vec V(t), t_{in})^{-1} I(\vec V(t), t_{in})\right)$. The result of $C(\vec V(t), t_{in})^{-1} I(\vec V(t), t_{in})$ is a vector of size $N \times 1$, so we are left with a tensor-vector product. We use $\otimes$ to denote this tensor-vector product and give our definition in Section A.3.

Observe that the synchronizer's input signal transitions from gnd to $V_{dd}$ (or from $V_{dd}$ to gnd) within some small time window around $t_{in}$ and then remains constant for the remainder of the synchronization simulation. This means that $\frac{\partial f(t)}{\partial t_{in}}$ should go to zero for $t \gg t_{in}$.

The following definitions are used in the derivation of Equation 3.7:
• $\vec V(t)$ is the vector of voltages in the synchronizer, in units of [V]
• $t_{in}$ is the time at which the input signal transition occurs, in units of [s]
• Let $I_{nodes} = I(\vec V(t), t_{in})$ be the current at each node of the synchronizer, in units of [A]. Currents flowing out of a node are positive and currents flowing into a node are negative
• Let $C = C(\vec V(t), t_{in})$ be the capacitance matrix for the synchronizer, in units of [F]
• Let $f = \frac{d}{dt}\vec V(t) = f(\vec V(t), t_{in}) = C^{-1} I_{nodes}$ be the derivative function for the synchronizer, in units of [V/s]
• In order for $f$ to be AD friendly, it is assumed in this thesis that $f \in C^\infty$, i.e. $f$ is infinitely differentiable (see Chapter 5 for details on $f$)
• $J_f(t)$ is in units of [1/s]
• $\frac{\partial f(t)}{\partial t_{in}}$ is in units of [V/s²]

$$\begin{aligned}\frac{\partial f(t)}{\partial \vec V(t)} = J_f(t) &= -C^{-1}\left(\frac{\partial C}{\partial \vec V(t)} \otimes \left(C^{-1} I_{nodes}\right)\right) + C^{-1}\frac{\partial I_{nodes}}{\partial \vec V(t)}\\ \frac{\partial f(t)}{\partial t_{in}} &= -C^{-1}\frac{\partial C}{\partial t_{in}}\,C^{-1} I_{nodes} + C^{-1}\frac{\partial I_{nodes}}{\partial t_{in}}\end{aligned} \tag{3.7}$$

One remark about $J_f(t)$ and $\frac{\partial f(t)}{\partial t_{in}}$ is in order. $\vec V(t)$ includes source voltages such as the clock, power supply, ground and input source signals, in addition to the voltage nodes of the synchronizer (i.e. the state variables of the ODE model). So when I refer to $J_f(t)$, this really is the Jacobian with respect to the state variables; likewise for $\frac{\partial f(t)}{\partial t_{in}}$. Implementing these derivatives requires being able to separate the sources from the internal nodes.

3.2 Computing $\beta(t)$

Section 3.1 provides all the pieces needed to analyze synchronizers under metastability. We define a new term $\beta(t) = \frac{\partial \vec V(t)}{\partial t_{in}}$. $\beta(t)$ tells us how all the voltage nodes of the synchronizer change for a change in the input transition time $t_{in}$. Thus a perturbation of $t_{in}$, the time at which the input signal changes, translates into a change in the synchronizer's voltage state; in other words, we are regarding the synchronizer as a time-to-voltage converter. A designer wants $\beta(t)$ to be as large as possible, because the goal is to design a circuit that is highly sensitive to $t_{in}$.

An intuitive way to think about this is a small thought experiment in which the synchronizer is metastable. If $\beta(t)$ were small, perturbing $t_{in}$ would result in only a small change on the voltage nodes of the synchronizer. A large change in $t_{in}$ would then be required for the synchronizer to appreciably change its voltage state and, by extension, to get the synchronizer out of metastability.
Conversely, a very large value of $\beta(t)$ means that a small perturbation to $t_{in}$ translates into large changes in the synchronizer's voltage state.

How do we compute $\beta(t)$? From the nested bisection algorithm we can construct a “perfectly” metastable trajectory up to time $t_{eola}$; that trajectory is denoted $\vec V_{meta}(t)$. However, we are interested in $\frac{\partial \vec V_{meta}(t)}{\partial t_{in}}$, and $\vec V_{meta}(t)$ is computed via numerical integration of $f$; for realistic circuit models, there is no closed-form solution or function that can be directly used to compute $\beta(t)$. What we can compute is $\frac{\partial f(t)}{\partial t_{in}}$ and $J_f(t)$. We therefore pose the computation of $\beta(t)$ as the Initial Value Problem (IVP) shown in Equation 3.8.

$$\dot\beta(t) = J_f(t)\,\beta(t) + \frac{\partial f(t)}{\partial t_{in}}, \qquad \beta(0) = 0 \tag{3.8}$$

Just as for $\vec V_{meta}(t)$, we solve for $\beta(t)$ by numerically integrating Equation 3.8. Note that $\beta(t)$ could be computed in conjunction with $J_f(t)$ and $\frac{\partial f(t)}{\partial t_{in}}$. We do not do this because $J_f(t)$ is used in multiple places in the overall analysis and is the time-consuming quantity to compute; therefore $J_f(t)$ and $\frac{\partial f(t)}{\partial t_{in}}$ are pre-computed. Note also that $\beta(t)$ can be estimated by numerical differencing on each nested bisection epoch $k$ as the nested bisection algorithm makes progress in finding metastable trajectories to the desired $t_{crit}$. Equation 3.9 shows how this estimate is obtained; we use it as a quantity to compare against our solution of Equation 3.8.

$$\hat\beta(t) = \frac{\vec V_{H,k}(t) - \vec V_{L,k}(t)}{t_{in,H,k} - t_{in,L,k}}, \qquad t \in [t_k, t_{k+1}] \tag{3.9}$$

3.3 Sensitivity Analysis $u(t)^T$

Section 3.2 outlines how to compute $\beta(t)$, the sensitivity of all the voltage nodes at time $t$ to the input signal transition time $t_{in}$. Our main concern is how a perturbation to the voltage nodes of the synchronizer at time $t_{in}$ will be reflected in the nodes of interest at time $t_{eola}$.
We define a sensitivity matrix $S(t)$ as:

$$S(t) = \frac{\partial \vec V(t)}{\partial \vec V(0)}, \qquad S(0) = I \tag{3.10}$$

Equation 3.10 answers the question of how a change in $\vec V(0)$ changes $\vec V(t)$ for all time. To generalize, let the sensitivity matrix $S(t_1, t_2)$ describe how $\vec V(t_2)$ changes as a result of perturbations that occur at time $t_1$. In other words, $S(t_1, t_2)$ is the small-signal model of the circuit. We can solve for $S(t_1, t_2)$ in the same way that we solve for $\beta(t)$:

$$\frac{d}{dt}S(t_1, t) = J_f(t)\,S(t_1, t)$$

or in integral form:

$$S(t_1, t) = S(t_1, t_1) + \int_{t_1}^{t} J_f(w)\,S(t_1, w)\,dw, \qquad S(t_1, t_1) = I \tag{3.11}$$

What we want is $S(t, t_{eola})$ for all times $t \in [t_{in}, t_{eola}]$. However, computing it from Equation 3.11 would require a separate integration for every starting time $t_i$ along the “perfectly” metastable trajectory, which would be prohibitively expensive. Instead, we use $S(t_1, t_2)^{-1} = S(t_2, t_1)$, which reverses the roles of $t_1$ and $t_2$. Observe that this reversal is the same as looking backwards in time from $t_2$ to $t_1$. Thus, understanding how a perturbation at $\vec V_{meta}(t)$ perturbs $\vec V_{meta}(t_{eola})$ requires knowledge of the future. Because we have already computed the “perfectly” metastable trajectory up to $t_{eola}$, we can indeed integrate backwards in time starting from $t_{eola}$. Equation 3.12 shows the derivation for $S(t, t_{eola})$:

$$\begin{aligned}\frac{d}{dt}S(t, t_{eola}) &= \frac{d}{dt}\left[S(t_{eola}, t)^{-1}\right]\\ &= -S(t_{eola}, t)^{-1}\left[\frac{d}{dt}S(t_{eola}, t)\right]S(t_{eola}, t)^{-1}\\ &= -S(t_{eola}, t)^{-1}\,J_f(t)\,S(t_{eola}, t)\,S(t_{eola}, t)^{-1} \qquad \text{from Equation 3.11}\\ &= -S(t, t_{eola})\,J_f(t)\end{aligned}$$

in integral form this is:

$$S(t, t_{eola}) = I - \int_{t_{eola}}^{t} S(w, t_{eola})\,J_f(w)\,dw \tag{3.12}$$

$S(t, t_{eola})$ is an $N \times N$ matrix, where $N$ is the number of circuit nodes. Solving for all elements $S(t, t_{eola})_{i,j}$ at each time step is overkill for what the designer is interested in. We can dramatically speed up the computation if we project $S(t, t_{eola})$ onto the nodes of interest. The designer chooses which nodes to investigate and creates a unit row vector $\tilde u^T$, which is right-multiplied by $S(t, t_{eola})$.
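A small numerical check of Equations 3.11 and 3.12 may help. The Python sketch below assumes a hypothetical time-varying $2 \times 2$ Jacobian `J_f(t)`, integrates the forward equation for $S(t_1, t)$ from $t_1$ to $t_{eola}$ and the backward equation for $S(t, t_{eola})$ from $t_{eola}$ down to $t_1$ with fixed-step RK4 (a stand-in for the thesis's integrator); both should give the same matrix $S(t_1, t_{eola})$. It also integrates only the projected row $\tilde u^T S(t, t_{eola})$, which obeys the same backward equation at a fraction of the cost.

```python
import math


def J_f(t):
    """Hypothetical time-varying 2x2 Jacobian along a metastable trajectory."""
    return [[-1.0, -3.0], [-2.5, -1.2 + 0.3 * math.sin(t)]]


def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def neg(A):
    return [[-x for x in row] for row in A]


def add_scaled(A, B, s):
    """Elementwise A + s * B."""
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def rk4(F, Y, t0, t1, steps):
    """Fixed-step RK4 for the matrix ODE dY/dt = F(t, Y); t1 < t0 integrates backward."""
    h = (t1 - t0) / steps
    for i in range(steps):
        t = t0 + i * h
        k1 = F(t, Y)
        k2 = F(t + h / 2, add_scaled(Y, k1, h / 2))
        k3 = F(t + h / 2, add_scaled(Y, k2, h / 2))
        k4 = F(t + h, add_scaled(Y, k3, h))
        Y = [[y + h / 6 * (a + 2 * b + 2 * c + d)
              for y, a, b, c, d in zip(ry, r1, r2, r3, r4)]
             for ry, r1, r2, r3, r4 in zip(Y, k1, k2, k3, k4)]
    return Y


I2 = [[1.0, 0.0], [0.0, 1.0]]
t1, t_eola, steps = 0.0, 2.0, 2000

# Eq. 3.11, forward: d/dt S(t1, t) = J_f(t) S(t1, t), S(t1, t1) = I.
S_fwd = rk4(lambda t, S: matmul(J_f(t), S), I2, t1, t_eola, steps)

# Eq. 3.12, backward: d/dt S(t, t_eola) = -S(t, t_eola) J_f(t), S(t_eola, t_eola) = I.
# Evaluated at t = t1 this is the same matrix S(t1, t_eola).
S_bwd = rk4(lambda t, S: neg(matmul(S, J_f(t))), I2, t_eola, t1, steps)

# Projected version: the 1 x N row u~^T S(t, t_eola) obeys the same backward ODE,
# costing O(N^2) per right-hand side instead of O(N^3) for the full matrix.
u_tilde = [[1 / math.sqrt(2), -1 / math.sqrt(2)]]
w_bwd = rk4(lambda t, w: neg(matmul(w, J_f(t))), u_tilde, t_eola, t1, steps)

print(S_fwd)
print(S_bwd)
print(w_bwd, matmul(u_tilde, S_fwd))
```

A single backward pass yields the projected sensitivity at every intermediate time on the grid, which is what makes the backward formulation attractive compared with restarting Equation 3.11 from each $t_i$.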
The vector $\tilde u^T$ may select the difference of two nodes, the sum of several nodes, or a single node; the designer has the flexibility to choose the measure of interest with this vector. To prevent numerical overflow, the resulting vector is normalized, and so we define $u(t, t_{eola})^T$ in Equation 3.13:

$$u(t, t_{eola})^T = \frac{\tilde u^T S(t, t_{eola})}{\lVert \tilde u^T S(t, t_{eola}) \rVert} \tag{3.13}$$

In short form we write $u(t)^T$ for $u(t, t_{eola})^T$. To compute $u(t)^T$ efficiently, Equation 3.14 puts all the pieces together:

$$\begin{aligned} w(t)^T &= \tilde u^T S(t, t_{eola}), \qquad w(t_{eola})^T = \tilde u^T\\ \frac{d}{dt}w(t)^T &= \frac{d}{dt}\,\tilde u^T S(t, t_{eola}) = \tilde u^T\,\frac{d}{dt}S(t, t_{eola})\\ &= -\tilde u^T S(t, t_{eola})\,J_f(t)\\ &= -w(t)^T J_f(t)\end{aligned} \tag{3.14}$$

Solving the IVP above via numerical integration, we get:

$$u(t)^T = \frac{w(t)^T}{\lVert w(t) \rVert} \tag{3.15}$$

In practice, $u(t)^T$ provides a magnifying glass that identifies which nodes are important, and at which times, for resolving metastability by $t_{eola}$. Let's explore how this can be used.

3.4 Synchronizer Gain $g(t)$ and Instantaneous Gain $\lambda(t)$

We now have all the tools necessary to derive a precise formulation for the gain of the synchronizer, $g(t)$, and to summarize the effectiveness of a synchronizer. We developed $\beta(t)$, which measures the sensitivity of the synchronizer's voltage nodes to the input signal transition time $t_{in}$. Next, we developed a mechanism, $u(t)^T$, that shows how perturbations at time $t$ manifest themselves at time $t_{eola}$. Combining these two ideas, the gain of the synchronizer is:

$$g(t) = u(t)^T \beta(t) \tag{3.16}$$

The designer hopes that $u(t)^T$ and $\beta(t)$ align exactly, i.e. $\frac{u(t)^T\beta(t)}{\lVert u(t) \rVert\,\lVert \beta(t) \rVert} = 1$. We can think of $g(t)$ as the amount of useful gain the synchronizer develops as a result of $\beta(t)$. The synchronizer gain $g(t)$ still describes a time-to-voltage conversion: $\beta(t)$ is in units of [V/s] and $u(t)^T$ is unitless, so $g(t)$ is also in units of [V/s].

The last piece of the puzzle is to determine how $g(t)$ changes in time, $\frac{d}{dt}g(t) = \dot g(t)$.
The derivation of $\dot g(t)$ is quite involved, so the main points are highlighted in Equation 3.17; the full details are found in Appendix A, Section A.2.

$$\begin{aligned}\dot g(t) &= \frac{d}{dt}\left(u(t)^T \beta(t)\right) = \frac{d}{dt}\left(\frac{\tilde u^T S(t, t_{eola})}{\lVert \tilde u^T S(t, t_{eola}) \rVert}\,\beta(t)\right)\\ &= \frac{d}{dt}\left(\frac{1}{\lVert \tilde u^T S(t, t_{eola}) \rVert}\right)\tilde u^T S(t, t_{eola})\,\beta(t) + \frac{d}{dt}\left(\tilde u^T S(t, t_{eola})\right)\frac{\beta(t)}{\lVert \tilde u^T S(t, t_{eola}) \rVert} + \frac{\tilde u^T S(t, t_{eola})}{\lVert \tilde u^T S(t, t_{eola}) \rVert}\,\frac{d}{dt}\beta(t)\\ &= u(t)^T J_f(t)\,u(t)\,g(t) - u(t)^T J_f(t)\,\beta(t) + u(t)^T\left(J_f(t)\,\beta(t) + \frac{\partial f(t)}{\partial t_{in}}\right)\\ &= u(t)^T J_f(t)\,u(t)\,g(t) + u(t)^T\frac{\partial f(t)}{\partial t_{in}}\end{aligned} \tag{3.17}$$

Let $\lambda(t) = u(t)^T J_f(t)\,u(t)$ and $\rho(t) = u(t)^T\frac{\partial f(t)}{\partial t_{in}}$, which are time-varying scalar quantities. We then obtain the time-varying differential equation:

$$\dot g(t) = \lambda(t)\,g(t) + \rho(t) \tag{3.18}$$

The $\rho(t)$ term is transient, and when $t \gg t_{in}$, $\dot g(t) \approx \lambda(t)\,g(t)$. Choosing $t_1$ such that $\rho(t) \approx 0$ for all $t \ge t_1$ allows us to integrate $\dot g(t)$ and obtain a solution of the form:

$$g(t) \approx g(t_1)\exp\left(\int_{t_1}^{t}\lambda(s)\,ds\right) \tag{3.19}$$

This approximate solution shows the familiar emergence of the exponential behaviour of synchronizer failure probability. By considering the rate $\lambda(t)$ in the exponent, we can describe how the gain is helping metastability resolution at every point in time. We call $\lambda(t)$ the synchronizer's instantaneous gain; it is the term of interest in analyzing synchronizer designs.

$\lambda(t)$ gives a global picture of the effectiveness of the synchronizer, and it is desirable for this term to be large and positive. If $\lambda(t)$ becomes negative, the synchronizer is harming metastability resolution, and if it is zero the synchronizer is “stalled”, in other words not doing anything useful.

In practice, concerns about $\lambda(t)$ (should) dominate synchronizer design and analysis. The $\rho(t)$ term quantifies the necessary initial “time-to-voltage” conversion at the input of the synchronizer.
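The chain from Equation 3.8 to Equation 3.19 can be exercised end to end on a toy problem. The Python sketch below assumes a hypothetical constant Jacobian and a hypothetical pulse for $\partial f/\partial t_{in}$ near $t_{in} = 0.5$; it integrates $\beta(t)$ forward (Equation 3.8) and $w(t)$ backward (Equation 3.14) with fixed-step RK4, forms $u(t)$, $g(t)$ and $\lambda(t)$, and compares $g(t_{eola})/g(t_1)$ with $\exp\left(\int_{t_1}^{t_{eola}}\lambda(s)\,ds\right)$ once $\rho(t)$ has died out. It illustrates the equations only and is not the thesis implementation.

```python
import math

J = [[-1.0, -3.0], [-2.5, -1.2]]     # hypothetical constant Jacobian J_f
JT = [list(col) for col in zip(*J)]  # its transpose
b = [1.0, 0.0]                        # hypothetical direction of df/dt_in


def df_dtin(t):
    """Hypothetical df/dt_in: a short pulse near t_in = 0.5, so rho(t) is transient."""
    pulse = math.exp(-((t - 0.5) / 0.05) ** 2)
    return [bi * pulse for bi in b]


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * c for a, c in zip(x, y))


def rk4_path(F, y0, ts):
    """RK4 for y' = F(t, y); returns y at every time in ts (ts may be decreasing)."""
    ys = [y0]
    for t0, t1 in zip(ts, ts[1:]):
        h, y = t1 - t0, ys[-1]
        k1 = F(t0, y)
        k2 = F(t0 + h / 2, [a + h / 2 * k for a, k in zip(y, k1)])
        k3 = F(t0 + h / 2, [a + h / 2 * k for a, k in zip(y, k2)])
        k4 = F(t1, [a + h * k for a, k in zip(y, k3)])
        ys.append([a + h / 6 * (p + 2 * q + 2 * r + s)
                   for a, p, q, r, s in zip(y, k1, k2, k3, k4)])
    return ys


n = 5000
ts = [5.0 * i / n for i in range(n + 1)]   # t in [0, t_eola] with t_eola = 5

# Eq. 3.8, forward: beta' = J beta + df/dt_in, beta(0) = 0.
beta = rk4_path(lambda t, y: [p + q for p, q in zip(matvec(J, y), df_dtin(t))],
                [0.0, 0.0], ts)

# Eq. 3.14, backward: (w^T)' = -w^T J, i.e. w' = -J^T w, with w(t_eola) = u~.
u_tilde = [1 / math.sqrt(2), -1 / math.sqrt(2)]
w = rk4_path(lambda t, y: [-v for v in matvec(JT, y)], u_tilde, ts[::-1])[::-1]

u = [[wi / math.sqrt(dot(wk, wk)) for wi in wk] for wk in w]   # Eq. 3.15
g = [dot(uk, bk) for uk, bk in zip(u, beta)]                    # Eq. 3.16
lam = [dot(uk, matvec(J, uk)) for uk in u]                      # lambda(t) = u^T J u

# Eq. 3.19: once rho(t) ~ 0 (here from t1 = 1.5), g(t) ~ g(t1) exp(int lambda).
i1 = 1500
h = ts[1] - ts[0]
integral = sum((lam[k] + lam[k + 1]) / 2 * h for k in range(i1, n))
print(g[-1] / g[i1], math.exp(integral), lam[0])
```

For a constant Jacobian, $u(t)$ settles onto the dominant left eigenvector of $J$ away from $t_{eola}$, so $\lambda(t)$ approaches the dominant eigenvalue, $(-2.2 + \sqrt{30.04})/2 \approx 1.64$ for this hypothetical matrix.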
Thus, questions about the impact of “edge sharpening” and other changes to the input circuitry of the synchronizer can be analyzed quantitatively by considering $\rho(t)$.

3.5 Failure Probability and MTBF

To determine the failure probability of a synchronizer, we determine the width of the interval of $t_{in}$ values for which the synchronizer does not resolve by time $t_{crit}$. We write $\Delta t_{in}$ to denote the width of this interval. Assuming that transitions on $d_{in}$ are uniformly distributed over the clock period, $P_{clk}$, the probability of failure for a single synchronization event is:

$$P\{\text{failure}\} = \frac{\Delta t_{in}}{P_{clk}} \tag{3.20}$$

Intervals of time at $t_{in}$ correspond to intervals in the voltage state at time $t_{eola}$, and we get

$$\Delta V_{eola} = g(t_{eola})\,\Delta t_{in} \;\Rightarrow\; \Delta t_{in} = \frac{\Delta V_{eola}}{g(t_{eola})} \;\Rightarrow\; P\{\text{failure}\} = \frac{\Delta V_{eola}}{g(t_{eola})\,P_{clk}} \tag{3.21}$$

More precisely, these intervals of time at $t_{in}$ correspond to intervals in the separation between the settle-high and settle-low voltages from Equation 3.4 such that $\Delta V(t_{eola}) < \Delta V_{eola}$. The width of the corresponding window for $t_{in}$ is $\Delta V_{eola}/g(t_{eola})$. Recall Equation 3.18:

$$\frac{d}{dt}g(t) = \lambda(t)\,g(t) + \rho(t)$$

As noted earlier, $\rho(t)$ goes rapidly to zero after the clock edge that initially samples $d_{in}$. Let $t_{homo}$ be a time such that $\rho(t) \approx 0$ for $t \ge t_{homo}$ (more precisely, $\rho(t) \ll \lambda(t)\,g(t)$). Solving Equation 3.18 for $t \ge t_{homo}$, we get:

$$g(t) \approx g(t_{homo})\exp\left(\int_{t_{homo}}^{t}\lambda(s)\,ds\right) \tag{3.22}$$

Substituting this approximation into Equation 3.21, we get:

$$P\{\text{failure}\} = \frac{\Delta V_{eola}}{g(t_{homo})\,P_{clk}}\exp\left(-\int_{t_{homo}}^{t}\lambda(s)\,ds\right) \tag{3.23}$$

We note that $g(t_{homo})$ is a property of the synchronizer circuit, independent of $t$; $g(t_{homo})$ models the initial time-to-voltage sensitivity of the synchronizer to changes of $t_{in}$. Likewise, $\Delta V_{eola}$ is a property of the circuit; it describes the divergence from metastability required to ensure that the synchronizer settles to digitally defined values by time $t_{crit}$. $P_{clk}$ is an operating condition for the synchronizer.
Finally, $\exp\left(-\int_{t_{homo}}^{t}\lambda(s)\,ds\right)$ gives the exponentially decreasing probability of failure with increased time for synchronization.

If the average rate of transitions on $d_{in}$ is $f_d$, the failure rate is $P\{\text{failure}\}\,f_d$. Thus,

$$\text{MTBF} = \frac{g(t_{homo})}{\Delta V_{eola}}\,\frac{P_{clk}}{f_d}\exp\left(\int_{t_{homo}}^{t}\lambda(s)\,ds\right) \tag{3.24}$$

3.6 Computing $T_w$

For a one-latch synchronizer, we can derive the quantities $T_w$ and $\tau$ from Equation 2.2. To demonstrate this, we derive $\tau$ and $T_w$ using the passgate latch, which will be covered in Chapter 6. From Equation 2.2 and Equation 3.21 we get:

$$T_w = \frac{\Delta V_{eola}}{g(t_{eola})}\exp\left(t_{crit}/\tau\right) \tag{3.25}$$

We assume that $t_{eola}$ is sufficiently far from $t_{clk}$ (otherwise the synchronizer is being misused) that $\rho(t) \approx 0$ and $\lambda(t)$ is nearly constant for $t$ leading up to $t_{eola}$. As shown in Figure 3.4, we can find a line of best fit for $\log g(t)$ to obtain:

$$g(t) \approx g_0\exp(\lambda_0 t) \tag{3.26}$$

Let $\tau = 1/\lambda_0$. In the region where $\lambda(t)$ is nearly constant, we can approximate $\Delta V(t)$ by:

$$\Delta V(t) \approx V_0\exp(\lambda_0 t) \tag{3.27}$$

where $V_0 = \Delta V_{eola}\exp(-\lambda_0 t_{eola})$. Equivalently, $\Delta V_{eola} = V_0\exp(\lambda_0 t_{eola})$. Substituting the approximation for $g(t_{eola})$ from Equation 3.26 and the approximation for $\Delta V_{eola}$ into Equation 3.25 yields:

$$T_w = \frac{V_0}{g_0}\exp\left(t_{crit}/\tau\right) \tag{3.28}$$

We define

$$\Delta V_{crit} = V_0\exp\left(t_{crit}/\tau\right) \tag{3.29}$$

Note that $\Delta V_{crit}$ is the extrapolation of $\Delta V(t)$ to time $t_{crit}$. We obtain Equation 2.2:

$$P\{\text{failure}\} \approx \frac{T_w}{P_{clk}}\exp\left(-t_{crit}/\tau\right), \qquad \tau = 1/\lambda_0, \qquad T_w = \frac{\Delta V_{crit}}{g_0} \tag{3.30}$$

where $\lambda_0$ is the “steady-state” value of $\lambda(t)$, $\Delta V_{crit}$ is from Equation 3.29, and $g_0$ is from Equation 3.26.

This derivation has a nice graphical interpretation. We can plot $\log g(t)$ vs. $t$, which, for $t$ greater than a few gate delays, will be linear by our approximation to $g(t)$ in Equation 3.26. We find the line of best fit for the data on the time interval $t \in [t_{lin}, t_{eola}]$, where $t_{lin} \gg t_{clk}$. The slope of the best-fit line is $1/\tau = \lambda_0$, and its intercept at $t = 0$ gives $g_0$ (as $\log g_0$ on the logarithmic axis). Similarly, we can plot $\log \Delta V(t)$ vs. $t$ and find the line of best fit on the same time interval.
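The fitting procedure can be sketched in a few lines of Python. The data below are synthetic, generated from the passgate-latch values reported in Section 3.6.1 ($\lambda_0$, $g_0$, $\Delta V_{crit}$, $t_{crit}$) rather than taken from a simulation, and the sample window standing in for $[t_{lin}, t_{eola}]$ is hypothetical; with real data, `g_samples` and `dV_samples` would be $g(t)$ and $\Delta V(t)$ from the analysis.

```python
import math


def fit_log_linear(ts, ys):
    """Least-squares line through (t, log y); returns (slope, exp(intercept at t = 0))."""
    n = len(ts)
    logs = [math.log(y) for y in ys]
    t_mean, l_mean = sum(ts) / n, sum(logs) / n
    slope = (sum((t - t_mean) * (l - l_mean) for t, l in zip(ts, logs))
             / sum((t - t_mean) ** 2 for t in ts))
    return slope, math.exp(l_mean - slope * t_mean)


# Synthetic data built from the values reported in Section 3.6.1.
lam0_true, g0_true = 14e10, 2.09e10          # [1/s], [V/s]
t_crit, dV_crit_true = 1.93e-10, 2.845       # [s], [V]
V0_true = dV_crit_true * math.exp(-lam0_true * t_crit)
ts = [5e-11 + i * 1e-12 for i in range(100)]  # hypothetical window [t_lin, t_eola]
g_samples = [g0_true * math.exp(lam0_true * t) for t in ts]
dV_samples = [V0_true * math.exp(lam0_true * t) for t in ts]

lam0, g0 = fit_log_linear(ts, g_samples)      # Eq. 3.26: slope lambda0, intercept g0
lam0_dv, V0 = fit_log_linear(ts, dV_samples)  # Eq. 3.27: slope should also be ~lambda0
tau = 1.0 / lam0
dV_crit = V0 * math.exp(t_crit / tau)         # Eq. 3.29: extrapolate the fit to t_crit
T_w = dV_crit / g0                            # Eq. 3.30
print(tau, g0, lam0_dv, dV_crit, T_w)
```

With noise-free synthetic data the fit recovers the inputs exactly, so $T_w = 2.845 / (2.09 \times 10^{10}) \approx 1.36 \times 10^{-10}$ s, matching the arithmetic in Section 3.6.1.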
We find $\Delta V_{crit}$ as the value of this line of best fit at time $t_{crit}$. We give an example of this approach in Section 3.6.1 below.

3.6.1 Obtaining $T_w$ for a Passgate Latch

With the above derivation, I now demonstrate how to obtain $T_w$ for the single passgate latch synchronizer shown in Figure 6.1. The passgate latch is described in much more detail in Chapter 6. In this example I select $t_{crit} = 1.93 \times 10^{-10}$ s and $\tilde u^T = \frac{1}{\sqrt{2}}(y_0 - x_0)$. The resulting nested bisection trajectories are shown in Figure 3.2, and the resulting plot of $\lambda(t)$ in Figure 3.3.

Figure 3.2: Nested bisection trajectories for passgate latch

We observe in Figure 3.3 that for times $t > 5 \times 10^{-11}$ s, $\lambda(t)$ indeed remains nearly constant at a value of $\lambda_0 = 14 \times 10^{10}\ \text{s}^{-1}$. Figure 3.4 shows $\log g(t)$ vs. $t$, the linear best-fit line, and the intercept $g_0 = 2.09 \times 10^{10}$ V/s. Figure 3.5 shows $\log \Delta V(t)$ vs. $t$, the linear best-fit line, and its value at $t_{crit}$, $\Delta V_{crit} = 2.845$ V. Finally, we compute $T_w = \Delta V_{crit}/g_0 = 1.36 \times 10^{-10}$ s for the passgate latch and this particular value of $t_{crit}$. Note that computing $\Delta V(t)$ in Figure 3.5 for times beyond $t_{eola}$ involves several approximations because of the large-signal, non-linear behaviour. On the other hand, we see in Figure 3.5 that we have a lot of freedom in picking $\Delta V_{eola}$ as our criterion to determine $t_{eola}$ before the linearity assumption fails. For similar reasons, we do not compute $g(t)$ beyond $t_{eola}$ in Figure 3.4.

Figure 3.3: $\lambda(t)$ for passgate latch

3.7 Component Gain Decomposition Tool

The function $u(t)^T$ developed in Section 3.3 gives a precise mathematical formulation of the notion that metastability is a time-varying quantity which “moves” from one latch stage to the next during the operation of the synchronizer. Metastability arises in latch $i+1$ because it becomes opaque before latch $i$ fully leaves metastability, thus landing latch $i+1$ in a metastable voltage state.
The term $u(t)^T$ points to the nodes of importance that are currently contributing to the synchronizer's ability to resolve by time $t_{eola}$. Here, we build upon the derivations of $g(t)$, $\lambda(t)$ and $u(t)^T$ to show how one can isolate sub-circuits and components of the synchronizer and determine how those parts contribute to the overall instantaneous gain.

Figure 3.4: Linear fit line and plot of $\log g(t)$ vs. $t$ for passgate latch

As described in Chapter 5, our ODE models for synchronizers are derived using modified nodal analysis of the synchronizer circuit. This allows us to decompose the Jacobian $J_f(t)$ into the sum of contributions from each of the devices:

$$J_f(t) = \sum_{d \in devices} J_d(t) \tag{3.31}$$

Figure 3.5: Linear fit line and plot of $\log \Delta V(t)$ vs. $t$ for passgate latch

Therefore,

$$\lambda(t) = u(t)^T J_f(t)\,u(t) = u(t)^T\left(\sum_{d \in devices} J_d(t)\right)u(t) = \sum_{d \in devices} u(t)^T J_d(t)\,u(t) \tag{3.32}$$

If we have some set of “interesting” devices, e.g. a particular transistor, or the transistors forming a particular inverter, passgate or other structure, we get:

$$\lambda_{dev}(t) = \sum_{d \in dev} u(t)^T J_d(t)\,u(t) \tag{3.33}$$

where $dev$ is the set of interesting devices.

Define $t_{homo}$ as in Section 3.5: $\rho(t) \approx 0$ for $t \ge t_{homo}$. Then $\dot g(t) \approx \lambda(t)\,g(t)$, which implies:

$$\begin{aligned} g(t) &= g(t_{homo})\exp\left(\int_{t_{homo}}^{t}\lambda(s)\,ds\right), \qquad \lambda(s) = \sum_{d \in devices}\lambda_d(s)\\ g(t) &= g(t_{homo})\prod_{d \in devices}\exp\left(\int_{t_{homo}}^{t}\lambda_d(s)\,ds\right)\\ g_d(t) &= \exp\left(\int_{t_{homo}}^{t}\lambda_d(s)\,ds\right)\\ g(t) &= g(t_{homo})\prod_{d \in devices} g_d(t)\end{aligned} \tag{3.34}$$

We show in Chapter 6 how $\lambda_d(t)$ can be very helpful for understanding how particular circuit modifications impact synchronizer performance.

Note that for “most” components in the synchronizer (i.e. anything not in the first latch), $\lambda_d(t)$ is close to 0 for values of $t$ in the neighborhood of the initial clock transition. Thus we can safely pick any reasonable value of $t_{homo}$ after the first clock transition and characterize the impact of devices after the first stage of the synchronizer. However, analysis of the first stage is more complicated because the contribution of $\rho(t)$ cannot be neglected.
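Because the decomposition in Equations 3.31 to 3.34 is linear, per-device gains multiply to the total gain. The Python sketch below checks this with hypothetical constant device Jacobians `J_d` (two "inverters" and a "passgate") and a hypothetical unit vector $u(t)$ sampled on a uniform grid starting at $t_{homo} = 0$; the trapezoidal rule stands in for whatever quadrature an implementation would use, and none of the numbers come from the thesis.

```python
import math

# Hypothetical per-device Jacobian contributions J_d (held constant for brevity);
# by Equation 3.31 their sum is J_f.
J_d = {
    "inv_a": [[-0.5, -2.0], [0.0, -0.1]],
    "inv_b": [[-0.1, 0.0], [-2.0, -0.5]],
    "passgate": [[-0.4, 0.3], [0.3, -0.4]],
}
J_total = [[sum(J[i][j] for J in J_d.values()) for j in range(2)] for i in range(2)]


def quad(u, A):
    """Quadratic form u^T A u."""
    n = len(u)
    return sum(u[i] * A[i][j] * u[j] for i in range(n) for j in range(n))


def trapz(ys, h):
    """Trapezoidal rule on a uniform grid with spacing h."""
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))


# Hypothetical unit vector u(t) rotating from node x0 toward (x0 - y0)/sqrt(2),
# sampled for t in [t_homo, t_homo + 3] with t_homo = 0.
h = 0.01
ts = [h * k for k in range(301)]
us = [[math.cos(math.pi / 12 * t), -math.sin(math.pi / 12 * t)] for t in ts]

lam_d = {d: [quad(u, J) for u in us] for d, J in J_d.items()}   # Eq. 3.33, one device each
lam = [quad(u, J_total) for u in us]                             # Eq. 3.32
g_d = {d: math.exp(trapz(v, h)) for d, v in lam_d.items()}       # g_d(t) of Eq. 3.34
g_ratio = math.exp(trapz(lam, h))                                # g(t) / g(t_homo)
prod = math.prod(g_d.values())
print(g_d, g_ratio, prod)
```

In this toy, $u^T J_{passgate}\,u < 0$ for every sampled $u(t)$, so the passgate's $g_d$ is below 1: by this measure it hinders resolution, while the two inverters supply the regenerative gain.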
The case studies in this thesis do not require the decomposition of components in the first stage.

Chapter 4
Automatic Differentiation Analysis

Differentiation plays a key role in the synchronizer analysis methods developed in this thesis. This chapter examines and compares the three most common approaches to differentiation in scientific computing: symbolic differentiation, numerical differencing, and automatic differentiation. Section 4.1 and Section 4.2 present quantitative comparisons based on simple “toy” models, from which we conclude that automatic differentiation provides a combination of numerical accuracy and practical implementation that makes it our choice for synchronizer analysis. Section 4.3 examines issues that arise when using automatic differentiation in combination with numerical solutions to systems of non-linear equations, which are part of our synchronizer analysis when used with realistic transistor models. We present our solutions to these challenges.

Symbolic differentiation starts with the formula for a function and uses standard methods from calculus to derive a formula for the derivative of the function with respect to some specified variable or variables. An appealing property of symbolic methods is that they are mathematically exact. Software packages such as Maple [30], Mathematica [20] and MATLAB's symbolic toolbox [33] are examples where symbolic mathematics and differentiation are used. For our purposes, the critical limitation of symbolic approaches is that we must have a formula to differentiate. Synchronizer analysis involves the solutions of non-linear differential equations. Typically, such equations do not have a closed-form solution; thus there is no explicit formula available for subsequent differentiation. Fortunately, this saves us from going off on a tangent about whether symbolic methods are tractable for our work: they are simply inapplicable.

At the other extreme is approximating the derivative by finite differencing.
The simplest approach uses the approximation:

$$\frac{d}{dx}f(x) \approx \frac{f(x+h) - f(x)}{h} \tag{4.1}$$

for some $h$ sufficiently small, but not zero. The numerator of Equation 4.1 is easily shown to have an error term of order $h^2$, so the derivative estimate itself has an error of order $h$. A simple improvement is to take a symmetric difference

$$\frac{d}{dx}f(x) \approx \frac{f\left(x+\frac{h}{2}\right) - f\left(x-\frac{h}{2}\right)}{h} \tag{4.2}$$

whose numerator has an error of order $h^3$, giving a derivative estimate with an error of order $h^2$. Better approximations can be achieved by evaluating $f$ at more points, fitting a polynomial to the results, and differentiating the polynomial at the desired value of $x$. All of these formulations suffer from a trade-off: if $h$ is too large, then higher-order derivatives of $f$ corrupt the estimate of the first derivative; if $h$ is too small, then we end up taking a small difference of two large, nearly equal numbers (e.g. in the numerators of Equations 4.1 and 4.2). As a result, the estimate of $\frac{d}{dx}f(x)$ is inherently less accurate than the computation of $f(x)$ itself. We refer to these methods as being numerically approximate.

Automatic differentiation (AD, also known as “source code differentiation”) provides a third approach. The basic idea is to compute both the value and the derivative of each term in an expression. Let $x$ be a variable, and $e$, $e_1$ and $e_2$ be expressions. The observations are that

$$\begin{aligned}\frac{d}{dx}x &= 1\\ \frac{d}{dx}e &= 0, \quad \text{if } x \text{ does not appear in } e\\ \frac{d}{dx}(e_1 + e_2) &= \frac{d}{dx}e_1 + \frac{d}{dx}e_2\\ \frac{d}{dx}(e_1 \ast e_2) &= e_1\frac{d}{dx}e_2 + e_2\frac{d}{dx}e_1\\ \frac{d}{dx}\sin(e) &= \cos(e)\frac{d}{dx}e\end{aligned} \tag{4.3}$$

and so on for other arithmetic operators and standard math library functions. In many programming languages, AD can be implemented by operator overloading. The value of an expression, $e$, is now a tuple: the value and its derivative. If derivatives are taken with respect to several variables, then AD computes the value of $e$ and a gradient vector. The number of floating-point operations to compute the derivative is at most the number of operations required to evaluate $e$, times the number of derivatives requested, times five (because five operations are needed for division).
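The rules in Equation 4.3 translate directly into operator overloading. The following is a minimal forward-mode sketch in Python using dual numbers, written for illustration; it is not the AD tool used in this thesis, the class name `Dual` is hypothetical, and only `+`, `-`, `*`, `/`, `sin` and `tanh` are implemented. It differentiates the toy function of Section 4.1.1 with respect to $w$ and compares the result with the closed-form derivative.

```python
import math


class Dual:
    """Forward-mode AD value: the tuple (value, derivative) of Equation 4.3."""

    def __init__(self, val, der=0.0):
        self.val, self.der = float(val), float(der)

    @staticmethod
    def lift(x):
        # Constants do not depend on x, so their derivative is 0.
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.val - o.val, self.der - o.der)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        o = Dual.lift(other)
        return Dual(self.val * o.val, self.val * o.der + o.val * self.der)

    __rmul__ = __mul__

    def __truediv__(self, other):
        o = Dual.lift(other)
        return Dual(self.val / o.val,
                    (self.der * o.val - self.val * o.der) / (o.val * o.val))

    def __rtruediv__(self, other):
        return Dual.lift(other) / self


def sin(e):
    return Dual(math.sin(e.val), math.cos(e.val) * e.der)


def tanh(e):
    th = math.tanh(e.val)
    return Dual(th, (1.0 - th * th) * e.der)


# Toy function of Section 4.1.1 at t = 2: f(t, w) = tanh(w t + a w^2 t).
a, t = 0.65, 2.0
w = Dual(1.19, 1.0)                  # seed: dw/dw = 1
f = tanh(w * t + a * w * w * t)
exact = (t + 2 * a * 1.19 * t) * (1.0 - math.tanh(1.19 * t + a * 1.19 ** 2 * t) ** 2)
print(f.val, f.der, exact)
```

Each overloaded operation does a constant amount of extra work, which is the source of the bounded overhead described above; the result carries no truncation error, only round-off.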
Thus AD is reasonably efficient.

For our purposes, the biggest strength of AD is that, unlike numerical differencing, AD does not introduce any approximation errors beyond floating-point round-off errors and the errors introduced by any approximations in evaluating the original expression (such as discretization when approximating an integral). We say that AD is numerically accurate.

For the reasons described above, AD is the workhorse for differentiation in this thesis. That noted, numerical differencing also had an important role in this research: code testing. We often had to do careful derivations to put the problem into a form that minimized the overhead of AD. Doing so produced several big-O efficiency improvements. To check our approach, we compared the results of our optimized implementations with brute-force numerical differencing. There are many confounding issues in doing such checks, including higher-order terms in numerical differencing, round-off errors, and the approximation errors of the numerical integrator. To better understand the accuracy issues for AD and numerical differencing, I developed a model for a “toy” synchronizer. While this model does not correspond to a VLSI circuit, I designed it to “act like a synchronizer” and to admit closed-form solutions. This provides an exact reference that I use for comparing AD with numerical differencing in Section 4.1 and Section 4.2.

In Section 4.3 we address one more implementation issue with AD: root-solving algorithms. A naïve implementation of AD does not play nicely with iterative algorithms, because “if” is not differentiable. Section 4.3 describes how I adapted the root solver in the MVS model to work with AD.

4.1 Numerical Differentiating Comparisons

In the synchronizer analysis of Chapter 3, we require the derivative of $\vec V(t, t_{in})$ with respect to $t_{in}$. We obtain $\vec V(t, t_{in})$ by numerical integration of our derivative function $\frac{d}{dt}\vec V(t, t_{in})$.
In an attempt to verify our approach, we numerically estimated the derivative $\frac{\partial \vec V(t, t_{in})}{\partial t_{in}}$ through brute-force numerical differencing. We expected a close match between our method and brute-force numerical differencing. However, we observed a non-trivial difference between the two approximating methods that appeared to be independent of the perturbation size used for $t_{in}$ over a wide range of small perturbations. Our prejudice is that the AD results are more accurate, but let's find out.

Because there is no known closed-form solution for $\vec V(t, t_{in})$ for realistic models, I develop a toy example function $f(t,w)$ with known derivatives $\frac{d}{dt}f(t,w)$ and $\frac{\partial}{\partial w}f(t,w)$, and outline five different ways to numerically approximate $\frac{\partial}{\partial w}f(t,w)$. The investigation aims to uncover and separate the sources of error introduced by numerical differencing, automatic differentiation and numerical integration. The five methods of approximating the derivative $\frac{\partial}{\partial w}f(t,w)$ are as follows:

• Method 1: If we assume that we have a formula for the function $f(t,w)$, then we can compute numerically accurate results by making the parameter $w$ an AD variable and computing the tuple $\left(f(t,w), \frac{\partial}{\partial w}f(t,w)\right)$.
• Method 2: Still assuming that we have the function $f(t,w)$, we can compute numerically approximate results by a centered difference (CD):

$$\frac{\partial f(t,w)}{\partial w} \approx \frac{f(t,w+\Delta) - f(t,w-\Delta)}{2\Delta} \tag{4.4}$$

The question arises of what value of $\Delta$ to use in order to minimize the error. If $\Delta$ is too large, then the higher-order terms present in the approximation dominate the error. On the other hand, if $\Delta$ is too small, then round-off errors are amplified and dominate the error. In practice, picking $\Delta$ to minimize the error is often a best guess.
• Method 3: If we now assume that we only have the derivative function $\frac{d}{dt}f(t,w)$, then we can compute numerically approximate results by numerically integrating $\frac{d}{dt}f(t,w)$ for $w \pm \Delta$, followed by a centered difference:

$$\begin{aligned} f(t,w+\Delta) &= \int_0^t \frac{d}{dr}f(r,w+\Delta)\,dr\\ f(t,w-\Delta) &= \int_0^t \frac{d}{dr}f(r,w-\Delta)\,dr\\ \frac{\partial f(t,w)}{\partial w} &\approx \frac{f(t,w+\Delta) - f(t,w-\Delta)}{2\Delta}\end{aligned} \tag{4.5}$$

Once again, picking $\Delta$ is not obvious. Furthermore, numerical integration may produce $f(t,w\pm\Delta)$ to only about 6 or 7 digits of accuracy; the centered difference then subtracts two such nearly equal values and divides by $2\Delta$, which, on top of its own $O(\Delta^2)$ truncation error, reduces the accuracy by approximately another 3 digits. This approximation to the derivative is thus likely only 3 or 4 digits accurate. Higher-order differencing methods can reduce the errors introduced, but the example illustrates that numerically approximating derivatives compounds errors. Approximating further derivatives from already-approximated derivatives is therefore a dubious way to obtain higher derivatives.
• Method 4: Another approach is to numerically integrate $\frac{\partial}{\partial w}\frac{d}{dt}f(t,w)$ to get a numerically approximate result for $\frac{\partial}{\partial w}f(t,w)$. We assume that differentiation and integration can be interchanged as in Equation 4.6; for circuit models this is reasonable.

$$\frac{\partial}{\partial w}\int_0^t \frac{d}{dr}f(r,w)\,dr = \int_0^t \frac{\partial}{\partial w}\frac{d}{dr}f(r,w)\,dr \tag{4.6}$$

Since we know the formula for $\frac{d}{dt}f(t,w)$, we can write down a mathematically exact formula for $\frac{\partial}{\partial w}\frac{d}{dt}f(t,w)$, which we then numerically integrate. This approach has the advantage that computing $\frac{\partial}{\partial w}\frac{d}{dt}f(t,w)$ is numerically accurate, so the only errors introduced are those produced by numerical integration.
• Method 5: Suppose that $\frac{d}{dt}f(t,w)$ is complicated and involves many secondary functions, such as those found in transistor models; then writing down $\frac{\partial}{\partial w}\frac{d}{dt}f(t,w)$ by hand may be difficult. Our last method is to compute a numerically approximate result as described in Method 4, but with $\frac{\partial}{\partial w}\frac{d}{dt}f(t,w)$ computed via AD.
We note that obtaining the tuple $\left(\frac{d}{dt}f(t,w), \frac{\partial}{\partial w}\frac{d}{dt}f(t,w)\right)$ via AD is numerically accurate; the value we get for $\frac{\partial}{\partial w}f(t,w)$ is numerically approximate only because of the numerical integration.

4.1.1 Example Function $f(t,w)$

The typical scenario, as in the synchronizer analysis, is that we have a derivative function $\frac{d}{dt}f$ but no closed-form solution for $f$. In order to evaluate all the approximating techniques, I want to compare the results against a known closed-form solution of $\frac{d}{dt}f(t,w)$ with time-varying dynamics. Let $f(t,w)$, $\frac{d}{dt}f(t,w)$, $\frac{\partial}{\partial w}f(t,w)$ and $\frac{\partial}{\partial w}\frac{d}{dt}f(t,w)$ be:

$$\begin{aligned} f(t,w) &= \tanh\left(wt + aw^2t\right)\\ \frac{d}{dt}f(t,w) &= \left(w + aw^2\right)\operatorname{sech}^2\left(wt + aw^2t\right)\\ \frac{\partial}{\partial w}f(t,w) &= \left(t + 2awt\right)\operatorname{sech}^2\left(wt + aw^2t\right)\\ \frac{\partial}{\partial w}\frac{d}{dt}f(t,w) &= (1 + 2aw)\operatorname{sech}^2\left(wt + aw^2t\right) - 2\left(w + aw^2\right)\operatorname{sech}^2\left(wt + aw^2t\right)\tanh\left(wt + aw^2t\right)(t + 2awt)\end{aligned} \tag{4.7}$$

In the above example, if we let $\dot x = \frac{d}{dt}f(t,w)$ and $x = f(t,w)$, then using the identity $\operatorname{sech}^2 z = 1 - \tanh^2 z$ we get a non-linear differential equation of the form:

$$\begin{aligned}\dot x &= \left(w + aw^2\right)\left(1 - \tanh^2\left(wt + aw^2t\right)\right) = \left(w + aw^2\right)\left(1 - x^2\right)\\ \frac{\partial}{\partial w}\dot x &= (1 + 2aw)\left(1 - x^2\right) - 2\left(w + aw^2\right)x\,\frac{\partial x}{\partial w}\end{aligned} \tag{4.8}$$

4.1.2 Approximating $\frac{\partial}{\partial w}f(t,w)$

To begin evaluating our approximations to $\frac{\partial}{\partial w}f(t,w)$, we arbitrarily choose values for the parameters in Equation 4.7 and Equation 4.8 as follows:

$$w = 1.19, \qquad a = 0.65, \qquad t \in [0, 10] \tag{4.9}$$

The numerical integration is performed with MATLAB's ode45 integrator, using ‘RelTol’ = ‘AbsTol’ = $1 \times 10^{-6}$.

We start by addressing the issue of picking $\Delta$ for Methods 2 and 3. In order to give the numerical methods the best results, we investigate the approximation error of Methods 2 and 3 against the numerically accurate computation of $\frac{\partial}{\partial w}f(t,w)$. We summarize $\frac{\partial}{\partial w}f(t,w)$ by computing $\left\lVert\frac{\partial}{\partial w}f(t,w)\right\rVert_2$, and we compare the approximating methods against $\left\lVert\frac{\partial}{\partial w}f(t,w)\right\rVert_2$ for different values of $\Delta$ to characterize the error. Figure 4.1 shows the error as we vary $\Delta$.
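The Method 2 versus Method 4 comparison can be reproduced in miniature for Equations 4.7 and 4.8. The Python sketch below is a simplified stand-in for the MATLAB experiments, resting on stated assumptions: it uses the parameters of Equation 4.9 but evaluates at the single time $t = 1$, and it replaces ode45 with fixed-step RK4, so its error values will not match Figure 4.1. It only shows the qualitative trade-off in $\Delta$ for Method 2, and that integrating the sensitivity equation (Equation 4.8) avoids differencing altogether.

```python
import math

w0, a = 1.19, 0.65   # Equation 4.9
T = 1.0              # single evaluation time chosen for this sketch


def f(t, w):
    return math.tanh(w * t + a * w * w * t)


def df_dw_exact(t, w):
    """Closed form from Equation 4.7."""
    z = w * t + a * w * w * t
    return (t + 2 * a * w * t) * (1.0 - math.tanh(z) ** 2)


def method2(t, w, delta):
    """Centered difference on the closed form (Equation 4.4)."""
    return (f(t, w + delta) - f(t, w - delta)) / (2 * delta)


def method4(t_end, w, steps=1000):
    """Integrate x' = c(1 - x^2) with its sensitivity s = dx/dw (Equation 4.8)
    by fixed-step RK4 (standing in for ode45), from x(0) = s(0) = 0."""
    c, dc = w + a * w * w, 1 + 2 * a * w

    def rhs(x, s):
        return c * (1 - x * x), dc * (1 - x * x) - 2 * c * x * s

    x = s = 0.0
    h = t_end / steps
    for _ in range(steps):
        k1 = rhs(x, s)
        k2 = rhs(x + h / 2 * k1[0], s + h / 2 * k1[1])
        k3 = rhs(x + h / 2 * k2[0], s + h / 2 * k2[1])
        k4 = rhs(x + h * k3[0], s + h * k3[1])
        x += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        s += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return s


exact = df_dw_exact(T, w0)
for delta in (1e-1, 1e-3, 1e-5, 1e-7, 1e-9, 1e-11):
    err = abs(method2(T, w0, delta) - exact)
    print(f"Method 2, delta={delta:.0e}: error {err:.2e}")
print(f"Method 4 (RK4, 1000 steps): error {abs(method4(T, w0) - exact):.2e}")
```

Large $\Delta$ values are dominated by truncation error and very small ones by round-off, while the sensitivity-equation result is limited only by the integrator's accuracy.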
We conclude from Figure 4.1 that $\Delta$ should be about $1\times10^{-5}$ to minimize the error when approximating $\frac{\partial}{\partial w}f(t,w)$ from $f(t,w)$ directly. When approximating it from numerically integrated trajectories of $\frac{d}{dt}f(t,w)$, any value between $1\times10^{-10}$ and $1\times10^{-6}$ works. We therefore pick $\Delta = 1\times10^{-6}$.

Figure 4.1: Error analysis with respect to ∆

With $\Delta$ selected, we proceed to evaluate the other three methods. Figure 4.2 shows all five approximating methods together with the numerically accurate computation of $\frac{\partial}{\partial w}f(t,w)$ from Equation 4.7. No obvious discrepancy appears in Figure 4.2, so we examine the error over time in Figure 4.3. I make several observations about Figure 4.3:

Figure 4.2: Approximating methods for ∂f(t,w)/∂w

• Overall, integrating $\frac{\partial}{\partial w}\frac{d}{dt}f(t,w)$ outperforms centered differencing of the numerically integrated trajectories $\frac{d}{dt}f(t,w\pm\Delta)$ for different values of $\Delta$. The advantage is not large enough, however, for AD to outperform the centered-differencing approach at every time point.
• The errors of Method 3 are not strongly sensitive to the choice of $\Delta$. Figure 4.3 shows the errors for $\Delta = 10^{-3}$, $\Delta = 10^{-6}$ and $\Delta = 10^{-9}$, and the differences are minor. This suggests that choosing a good $\Delta$ may not be overly difficult. On the other hand, Method 3 is unlikely to match Method 5 simply by tuning $\Delta$.
• Methods 4 and 5 produce the same results, i.e., they are indistinguishable from each other. The hand-derived $\frac{\partial}{\partial w}\frac{d}{dt}f(t,w)$ and its AD computation produce numerically equivalent results. Even though the two methods are evaluated with two separate calls to ode45, the sequences of time points and state values are identical for both calls.
• If the function $f(t,w)$ itself is available, a centered difference is a better approximating method for $\frac{\partial}{\partial w}f(t,w)$: centered differencing of $f(t,w\pm\Delta)$ directly outperforms Methods 3 through 5.
This further supports our hypothesis that the integrator is the largest source of error.
• Computing $\frac{\partial}{\partial w}f(t,w)$ via AD is numerically equivalent to the hand-written implementation of $\frac{\partial}{\partial w}f(t,w)$. The two results agree to within a few units in the last place, i.e., relative errors below $10^{-15}$, which is much more accurate than centered differencing. Neither AD nor the hand-written derivative can be said to be better than the other, so either can serve as the reference, and no error plot is included for this comparison. For consistency, the hand-written version of $\frac{\partial}{\partial w}f(t,w)$ is used.

Figure 4.3: Approximating methods error $\log_{10}\left|\frac{\partial}{\partial w}f(t,w) - \frac{\partial}{\partial w}\hat{f}(t,w)\right|$

The experiments performed here are preliminary. Changing the simulation time span yields varying error results. Our experiments show that, overall, our AD methods outperform centered differencing. We also note that ode45 produces more time points at the tail end of the simulation for Method 3. The errors observed in Methods 3 through 5 are sensitive to the settings of 'RelTol' and 'AbsTol'. Furthermore, as we tighten 'RelTol', the gap between AD and centered differencing shrinks; however, AD always maintains an advantage. These anomalies and unexplained behaviors emerged over the course of running the experiments, and further exploratory experiments are needed to understand them better. On the other hand, these experiments consistently support our initial belief that the AD approach is more accurate than numerical differencing. They also support the view that integrator error is the dominant error in the approaches that use integration.

4.2 Linear Switched Dynamics Synchronizer

The example developed in Section 4.1 does not have a saddle like the one in our synchronizers.
In this section, I present a set of switched linear equations whose behavior qualitatively corresponds to the two-latch synchronizer shown in Figure 4.4. The system is broken into 4 phases. It has two state variables, $x_0$ and $x_1$, and two output variables, $q_0$ and $q_1$. The outputs $q_0(t) = \tanh(a_2 x_0(t))$ and $q_1(t) = \tanh(a_2 x_1(t))$ bound the output of the toy synchronizer to $[-1, 1]$, whereas $x_0$ and $x_1$ are unbounded. To avoid unreasonably large values of $x_0$ and $x_1$, time is restricted to $t \in [0, 1]$. Table 4.1 gives the dynamics, solution and imposed conditions for each phase. In phase $\phi_0$, a signal transition occurs at time $t_{in}$ according to a function $d_{in}(t, t_{in})$. As in a real synchronizer, $t_{in}$ can drive the dynamics so that $q_1$ tends to 0, $-1$ or $+1$ as $t \to \infty$. In this toy example, $q_1 = 0$ corresponds to a synchronizer that is forever metastable.

Real synchronizers are built as a chain of latches. A latch is described as being either transparent or opaque. When a latch is opaque, its input has no effect on its output. Conversely, when a latch is transparent, its input has a direct effect on its output. As a latch transitions from transparent to opaque, the influence of the input on the output also changes. We can therefore think of the phases as modeling the behavior of the latches in a synchronizer.

In the terminology of latches:
• Phase $\phi_0$ simulates latch 0 as transparent and captures the input signal $d_{in}(t, t_{in})$.
• Phase $\phi_1$ simulates both latch 0 and latch 1 as opaque; this phase models latch 0's transitional period. The transition can leave output $q_0$ anywhere in $[-1, 1]$. This transitional period captures the possibility of latch 0 becoming metastable, depending on latch 0's internal state at the time of the phase switch.
• Phase $\phi_2$ simulates latch 1 as transparent and receiving signal $x_0$ from latch 0.
The signal $x_0$ of latch 0 is kept constant in this phase, but in a real synchronizer this would not be the case.
• Finally, phase $\phi_3$ simulates latch 1's transitional period, at which point the switched dynamics end.

Mathematically, the phases can be described as follows. Phase $\phi_0$ is a tracking phase for $x_0$, which asymptotically approaches the value of $d_{in}(t, t_{in})$, while $x_1$ stays constant at its initial value. In phase $\phi_1$, $x_0$ grows exponentially, positively or negatively depending on the sign of $x_0$ at the switch from phase $\phi_0$ to $\phi_1$. This drives $q_0$ to either $+1$ or $-1$ in the limit. If $x_0 = 0$ at the phase switch, then $q_0 = 0$. Variable $x_1$ remains constant.

In phase $\phi_2$, $x_0$ stays constant at its value from the switch from phase $\phi_1$ to $\phi_2$, and $x_1$ asymptotically approaches (i.e., converges to) the value of $x_0$. Finally, in phase $\phi_3$, $x_0$ stays constant and $x_1$ grows exponentially, driving $q_1$ to $+1$ or $-1$ depending on the sign of $x_1$ at the switch from phase $\phi_2$ to $\phi_3$. If $x_1 = 0$ at the start of phase $\phi_3$, then $q_1 = 0$ forever. The switched dynamics described here capture the behavior of a saddle system, cast in the terminology of a synchronizer.

Figure 4.4: Two latch synchronizer

4.2.1 The Dynamics of the Toy Synchronizer

The conditions imposed on the constants $C_k$ maintain continuity. Notice that the condition on $C_1$ is a function of $t_{in}$, which cascades through the rest of the switched dynamics. The parameters $a_0$, $a_1$ and $a_2$ dictate how fast the signals change in the switched dynamics model.
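As shown in Table 4.1, the simple dynamics of the toy synchronizer allow us to solve for $x(t)$ in closed form.

Table 4.1: Switched dynamics toy synchronizer equations

Phase $\phi_0$ ($0 \le t \le t_{\phi_1}$), with input $d_{in}(t,t_{in}) = -1$ for $t \le t_{in}$ and $d_{in}(t,t_{in}) = +1$ for $t > t_{in}$:

$$
\begin{aligned}
\dot{x}_0(t) &= a_0\left(d_{in}(t,t_{in}) - x_0(t)\right), & x_0(t) &= \begin{cases} C_0 e^{-a_0 t} - 1, & t < t_{in}\\ C_1 e^{-a_0 t} + 1, & t \ge t_{in}\end{cases}\\
\dot{x}_1(t) &= 0, & x_1(t) &= C_2\\
C_0 &= x_0(0) + 1, & x_0(t_{in}) &= C_0 e^{-a_0 t_{in}} - 1\\
C_1 &= \frac{x_0(t_{in}) - 1}{e^{-a_0 t_{in}}}, & C_2 &= x_1(0)
\end{aligned}
$$

Phase $\phi_1$ ($t_{\phi_1} < t \le t_{\phi_2}$):

$$
\begin{aligned}
\dot{x}_0(t) &= a_1 x_0(t), & x_0(t) &= C_3 e^{a_1 t}, & C_3 &= \frac{x_0(t_{\phi_1})}{e^{a_1 t_{\phi_1}}}\\
\dot{x}_1(t) &= 0, & x_1(t) &= C_4, & C_4 &= x_1(t_{\phi_1})
\end{aligned}
$$

Phase $\phi_2$ ($t_{\phi_2} < t \le t_{\phi_3}$):

$$
\begin{aligned}
\dot{x}_0(t) &= 0, & x_0(t) &= C_5, & C_5 &= x_0(t_{\phi_2})\\
\dot{x}_1(t) &= a_0\left(x_0(t) - x_1(t)\right), & x_1(t) &= C_5 + C_6 e^{-a_0 t}, & C_6 &= \frac{x_1(t_{\phi_2}) - C_5}{e^{-a_0 t_{\phi_2}}}
\end{aligned}
$$

Phase $\phi_3$ ($t_{\phi_3} < t$):

$$
\begin{aligned}
\dot{x}_0(t) &= 0, & x_0(t) &= C_7, & C_7 &= x_0(t_{\phi_3})\\
\dot{x}_1(t) &= a_1 x_1(t), & x_1(t) &= C_8 e^{a_1 t}, & C_8 &= \frac{x_1(t_{\phi_3})}{e^{a_1 t_{\phi_3}}}
\end{aligned}
$$

4.2.2 Derivation of the Gain of the Toy Synchronizer

Chapter 3 describes the gain of a synchronizer as a measure of how the state of the synchronizer changes in response to a change in when its input signal transition occurs, i.e., $\beta(t) = \frac{\partial \vec{V}(t)}{\partial t_{in}}$. For this toy synchronizer, $\beta(t)$ can be solved in closed form, as summarized in Table 4.2. The derivation uses the initial conditions $x_0(0) = x_1(0) = -1$. Recall that $C_1$ is a function of $t_{in}$, so the coefficients in Table 4.2 would change with the value of $x_0(t_{in})$. In this derivation $x_0(t_{in}) = -1$, which means that $\beta(t_{in}) = \begin{bmatrix}-2a_0\\ 0\end{bmatrix}$.

Table 4.2: Derivation of toy synchronizer gain β(t) with initial conditions x0(0) = x1(0) = −1

Phase $\phi_0$ ($0 \le t < t_{\phi_1}$):

$$\beta_0(t) = \begin{cases} 0, & t < t_{in}\\ -2a_0 e^{a_0(t_{in} - t)}, & t \ge t_{in}\end{cases}, \qquad \beta_1(t) = 0$$

Phase $\phi_1$ ($t_{\phi_1} \le t < t_{\phi_2}$):

$$\beta_0(t) = -2a_0 \exp\left(a_0(t_{in} - t_{\phi_1}) + a_1(t - t_{\phi_1})\right), \qquad \beta_1(t) = 0$$

Phase $\phi_2$ ($t_{\phi_2} \le t < t_{\phi_3}$):

$$
\begin{aligned}
\beta_0(t) &= -2a_0 \exp\left(a_0(t_{in} - t_{\phi_1}) + a_1(t_{\phi_2} - t_{\phi_1})\right)\\
\beta_1(t) &= \beta_0(t) + 2a_0 \exp\left(a_0(t_{in} - t_{\phi_1} + t_{\phi_2} - t) + a_1(t_{\phi_2} - t_{\phi_1})\right)
\end{aligned}
$$

Phase $\phi_3$ ($t_{\phi_3} \le t$):

$$
\begin{aligned}
\beta_0(t) &= \beta_0(t_{\phi_3}^-)\\
\beta_1(t) &= \left(\beta_0(t) + 2a_0 \exp\left(a_0(t_{in} - t_{\phi_1} + t_{\phi_2} - t_{\phi_3}) + a_1(t_{\phi_2} - t_{\phi_1})\right)\right) e^{a_1(t - t_{\phi_3})}
\end{aligned}
$$

4.2.3 The Results

As in Section 4.1, we evaluate our AD approach against a centered-difference numerical approximation of $\beta(t)$.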
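The closed forms can first be cross-checked against each other without any integrator. The Python sketch below is illustrative only; the thesis's experiments use MATLAB. It codes $x(t)$ from Table 4.1 and $\beta(t)$ from Table 4.2 for $x_0(0) = x_1(0) = -1$, using the parameter values of Equation 4.10 and assuming $0 < t_{in} < t_{\phi_1}$. It then compares $\beta(t)$ with a centered difference of the closed-form $x$ with respect to $t_{in}$. Because no integrator is involved, this checks the derivation in Table 4.2 rather than the AD machinery. The function names are hypothetical.

```python
import math

A0 = A1 = 6.0                  # a0 = a1 = 6 (Equation 4.10)
T1, T2, T3 = 0.25, 0.50, 0.75  # phase switch times t_phi1, t_phi2, t_phi3


def x_closed(t, tin):
    """(x0, x1) from Table 4.1 with x0(0) = x1(0) = -1; assumes 0 < tin < T1."""
    def x0_phi0(s):
        # tracking phase: x0 sits at -1 until the input switches, then tracks +1
        return -1.0 if s < tin else 1.0 - 2.0 * math.exp(-A0 * (s - tin))

    if t <= T1:
        return x0_phi0(t), -1.0
    x0_t1 = x0_phi0(T1)
    if t <= T2:                               # phase phi1: x0 grows exponentially
        return x0_t1 * math.exp(A1 * (t - T1)), -1.0
    x0_t2 = x0_t1 * math.exp(A1 * (T2 - T1))

    def x1_phi2(s):                           # phase phi2: x1 tracks the frozen x0
        return x0_t2 + (-1.0 - x0_t2) * math.exp(-A0 * (s - T2))

    if t <= T3:
        return x0_t2, x1_phi2(t)
    return x0_t2, x1_phi2(T3) * math.exp(A1 * (t - T3))  # phase phi3: x1 grows


def beta_closed(t, tin):
    """beta(t) = d(x0, x1)/d tin from Table 4.2 (same assumptions)."""
    if t <= T1:
        return (0.0 if t < tin else -2 * A0 * math.exp(A0 * (tin - t))), 0.0
    if t <= T2:
        return -2 * A0 * math.exp(A0 * (tin - T1) + A1 * (t - T1)), 0.0
    b0 = -2 * A0 * math.exp(A0 * (tin - T1) + A1 * (T2 - T1))

    def b1_phi2(s):
        return b0 + 2 * A0 * math.exp(A0 * (tin - T1 + T2 - s) + A1 * (T2 - T1))

    if t <= T3:
        return b0, b1_phi2(t)
    return b0, b1_phi2(T3) * math.exp(A1 * (t - T3))


def beta_fd(t, tin, delta):
    """Centered difference of the closed-form solution with respect to tin."""
    (p0, p1), (m0, m1) = x_closed(t, tin + delta), x_closed(t, tin - delta)
    return (p0 - m0) / (2 * delta), (p1 - m1) / (2 * delta)


TIN = 0.12336  # value reported in Equation 4.10
for t in (0.2, 0.4, 0.6, 0.9):
    print(t, beta_closed(t, TIN), beta_fd(t, TIN, 1e-6))
```

The sample times are chosen away from $t_{in}$ because the centered difference straddles the input discontinuity there.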
Given the piecewise equation $f(t, t_{in}) = \dot{x}(t, t_{in})$ from Table 4.1, we can compute the Jacobian $J_f(t) = \frac{\partial f(t, t_{in})}{\partial x}$ via AD and solve the differential equation $\dot{\beta}(t) = J_f(t)\beta(t)$ with initial value $\beta(0) = 0$. Therefore $\beta(t) = \int_0^t J_f(s)\beta(s)\,ds$. We can also numerically approximate $\beta(t) \approx \frac{x(t, t_{in}+\Delta) - x(t, t_{in}-\Delta)}{2\Delta}$ by numerically integrating $\dot{x}(t, t_{in}\pm\Delta)$. The parameters used for the toy synchronizer are as follows:

$$t_{in} \approx 0.12336, \quad t_{\phi_1} = 0.25, \quad t_{\phi_2} = 0.50, \quad t_{\phi_3} = 0.75, \quad a_0 = a_1 = a_2 = 6 \tag{4.10}$$

Numerical integration is performed using MATLAB's ode45 integrator with 'RelTol' = 'AbsTol' = $1\times10^{-6}$. Figure 4.5 shows the resulting trajectories for $x_0$, $x_1$, $q_0$ and $q_1$. A simple bisection on $t_{in}$ is performed until $|q_1(t=1)| \le 1\times10^{-2}$. The final values are $t_{hi} \approx 0.12333$ and $t_{lo} \approx 0.12338$, which translates to $\Delta \approx 2.5\times10^{-5}$.

Figure 4.6 shows the numerically accurate derived solution, the solution integrated via AD and the numerically approximate result for $\beta(t)$. Figure 4.7 shows the absolute error between the numerically accurate solution and the two approximations. The maximum absolute error, $\max_t \log_{10}\left\|\beta(t) - \hat{\beta}(t)\right\|_2$, is about 0.7 for the centered-difference approximation and about $-5$ for the AD method. The AD method's absolute error is about an order of magnitude larger than the tolerance we set for MATLAB's ode45. Even so, the centered-difference approximation is several orders of magnitude worse than the AD method. I also want to observe what happens as $t \to \infty$. I simulated the toy synchronizer on the time interval $t \in [0, 100]$ to let $\beta(t)$ grow exponentially. Because $\log_{10}\|\beta(t)\|$ grows linearly, I compute the relative error between $\beta(t)$ and the approximation $\hat{\beta}(t)$. Figure 4.8 shows the relative error of $\hat{\beta}(t)$ with respect to the numerically accurate derivation of $\beta(t)$.
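The variational approach can be sketched in Python for this toy synchronizer. The dynamics in Table 4.1 are linear within each phase, so the per-phase Jacobians $J_f$ are written out by hand here instead of being computed by AD as in the thesis's tool. That substitution, the fixed-step RK4 integrator and all names in the code are assumptions of the sketch. The input transition at $t_{in}$ does not depend on $x$, so it does not appear in $J_f$. Instead, the sketch starts the integration at $t_{in}$ from $\beta(t_{in}) = [-2a_0, 0]^T$, derived in Section 4.2.2. It restarts RK4 at each phase switch so that no step straddles a discontinuity, and compares the result with Table 4.2.

```python
import math

A0 = A1 = 6.0                              # a0 = a1 = 6 (Equation 4.10)
T1, T2, T3 = 0.25, 0.50, 0.75              # phase switch times
TIN = 0.12336                              # input transition time (Equation 4.10)


def jacobian(t):
    """Hand-written J_f = d(xdot)/dx for the phase containing t (Table 4.1)."""
    if t < T1:                             # phi0: x0' = a0 (d_in - x0), x1' = 0
        return ((-A0, 0.0), (0.0, 0.0))
    if t < T2:                             # phi1: x0' = a1 x0, x1' = 0
        return ((A1, 0.0), (0.0, 0.0))
    if t < T3:                             # phi2: x0' = 0, x1' = a0 (x0 - x1)
        return ((0.0, 0.0), (A0, -A0))
    return ((0.0, 0.0), (0.0, A1))         # phi3: x0' = 0, x1' = a1 x1


def integrate_beta(t_end, steps_per_unit=4000):
    """RK4 on beta' = J_f(t) beta from t = TIN, restarted at every phase switch."""
    beta = (-2 * A0, 0.0)                  # beta(t_in) from Section 4.2.2
    edges = [TIN] + [s for s in (T1, T2, T3) if s < t_end] + [t_end]
    for t0, t1 in zip(edges[:-1], edges[1:]):
        J = jacobian(0.5 * (t0 + t1))      # J_f is constant on this segment

        def f(b):
            return (J[0][0] * b[0] + J[0][1] * b[1], J[1][0] * b[0] + J[1][1] * b[1])

        n = max(1, round((t1 - t0) * steps_per_unit))
        h = (t1 - t0) / n
        for _ in range(n):
            k1 = f(beta)
            k2 = f((beta[0] + 0.5 * h * k1[0], beta[1] + 0.5 * h * k1[1]))
            k3 = f((beta[0] + 0.5 * h * k2[0], beta[1] + 0.5 * h * k2[1]))
            k4 = f((beta[0] + h * k3[0], beta[1] + h * k3[1]))
            beta = tuple(b + h / 6 * (c1 + 2 * c2 + 2 * c3 + c4)
                         for b, c1, c2, c3, c4 in zip(beta, k1, k2, k3, k4))
    return beta


def beta_closed(t):
    """Closed-form beta(t) from Table 4.2, valid for t > TIN."""
    if t <= T1:
        return -2 * A0 * math.exp(A0 * (TIN - t)), 0.0
    if t <= T2:
        return -2 * A0 * math.exp(A0 * (TIN - T1) + A1 * (t - T1)), 0.0
    b0 = -2 * A0 * math.exp(A0 * (TIN - T1) + A1 * (T2 - T1))

    def b1(s):
        return b0 + 2 * A0 * math.exp(A0 * (TIN - T1 + T2 - s) + A1 * (T2 - T1))

    return (b0, b1(t)) if t <= T3 else (b0, b1(T3) * math.exp(A1 * (t - T3)))


for t in (0.2, 0.6, 1.0):
    num, ref = integrate_beta(t), beta_closed(t)
    rel = math.hypot(num[0] - ref[0], num[1] - ref[1]) / math.hypot(*ref)
    print(f"t={t}: RK4 {num}  Table 4.2 {ref}  relative error {rel:.1e}")
```

Splitting the integration at the switch times is a design choice of this sketch. With an adaptive integrator that steps across the switches, the error near each discontinuity depends on how the step-size control reacts to it.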
I also show $\log_{10}\|\beta(t)\|$ in Figure 4.8 as a reference for the magnitude of $\beta(t)$.

The relative error of our AD method grows slowly, but it remains several orders of magnitude smaller than that of the centered-difference approximation. We also observe that $\log_{10}\|\beta(t)\|$ reaches about 250. Our AD method achieves a relative error of about $1\times10^{-4}$, while the numerical approximation has a relative error between about $10^{-1.5}$ and $10^{-1}$ throughout.

Figure 4.5: Two latch toy synchronizer simulation results

Figure 4.6: Two latch toy synchronizer β(t) simulation results

The experiments in Sections 4.1 and 4.2 support our initial belief that the AD methods of computing derivatives from Chapter 3 are more accurate than brute-force numerical approximation methods. This is not a proof, and we note that higher-order differencing schemes may produce results comparable to our AD methods. Preliminary tests of our analysis tool from Chapter 3, using the non-linear transistor models described in Chapter 5, showed that a higher-order numerical differencing scheme produced results between the centered-differencing scheme explored here and our AD method. We conclude from the experiments in this chapter that our AD method outperforms the centered-difference approach.

Figure 4.7: Two latch toy synchronizer β(t) error: $\log_{10}|\beta(t) - \hat{\beta}(t)|$

Figure 4.8: Relative error analysis $\log_{10}\frac{\|\beta(t) - \hat{\beta}(t)\|}{\|\beta(t)\|}$ as $t \to \infty$, i.e., with $\beta(t)$ growing exponentially

4.3 Derivative of Solution to an Iterative Algorithm

The MVS model described in Chapter 5 includes an iterative solver to obtain quantities internal to the model. This section presents a solution to the problem of computing the derivative(s) of the result(s) obtained from an iterative solver with respect to its input(s).

Iterative algorithms are problematic for AD because they involve a series of conditional steps.
An if-cond-then-else construction is not differentiable at the point(s) where cond changes. AD is often applied while ignoring such details, either because the result is correct in most places or because the derivatives of the then and else terms match at the point(s) where cond changes. Iterative algorithms approximate fixed points; conceptually, the algorithm produces the limit of an infinite number of iterations. If each iteration involves non-differentiable terms such as if, then there can be an infinite number of such discontinuities, and a direct application of AD is meaningless.

For the MVS iterative solver in particular, we want to compute $\frac{\partial I_{ds}}{\partial \vec{V}_{tx}}$. Here $\vec{V}_{tx}$ are the terminal voltages of the transistor in question, and $I_{ds}$ is the current flowing from its drain to its source. The problem is that $I_{ds}$ depends on the values that the iterative solver obtains for the virtual source and drain, which are variables internal to the MVS model.

The concept can be generalized by considering a function $z = f(x, y)$ for which we have some way of finding $y$ such that $f(x, y) = 0$. We then want to find $\frac{\partial y}{\partial x}$. This section presents our method and demonstrates how it applies to the MVS model covered in Chapter 5.

4.3.1 Fixed Point Algorithm f(x,y) = 0

Under several assumptions, we present a way to compute $\frac{\partial y}{\partial x}$ for a fixed-point algorithm that, for a given input $x$, finds $y$ such that $f(x, y) = 0$. A full treatment of AD for fixed-point algorithms is beyond the scope of this thesis. We assume that $f: \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^m$, i.e., $y$ and the range of $f$ have the same dimension. This restriction holds for our non-linear solver for circuit model evaluation. Thus $\frac{\partial y}{\partial x} \in \mathbb{R}^{m \times n}$. We also assume that there is some open neighborhood $X \subseteq \mathbb{R}^n$ of $x$ such that:

$$\forall x \in X.\ \exists y \in \mathbb{R}^m.\ f(x, y) = 0 \tag{4.11}$$

Let $g: \mathbb{R}^n \to \mathbb{R}^m$ be a differentiable function such that $\forall x \in X.$
$f(x, g(x)) = 0$. The total derivative of $f$ with respect to $x$ is then:

$$\frac{d}{dx} f(x, g(x)) = \left[\frac{\partial}{\partial x} f(x, y)\right]_{y=g(x)} + \left[\frac{\partial}{\partial y} f(x, y)\right]_{y=g(x)} \frac{\partial}{\partial x} g(x) \tag{4.12}$$

Because $f(x, g(x)) = 0$ for all $x \in X$, Equation 4.12 becomes:

$$
\begin{aligned}
\frac{d}{dx} f(x, g(x)) &= 0\\
\left[\frac{\partial}{\partial y} f(x, y)\right]_{y=g(x)} \frac{\partial}{\partial x} g(x) &= -\left[\frac{\partial}{\partial x} f(x, y)\right]_{y=g(x)}
\end{aligned}
\tag{4.13}
$$

We assume that the Jacobian

$$\left[\frac{\partial}{\partial y} f(x, y)\right]_{y=g(x)} \tag{4.14}$$

is non-singular. Multiplying both sides of Equation 4.13 by the inverse of this Jacobian yields:

$$\frac{\partial}{\partial x} g(x) = -\left[\frac{\partial}{\partial y} f(x, y)\right]^{-1}_{y=g(x)} \left[\frac{\partial}{\partial x} f(x, y)\right]_{y=g(x)} \tag{4.15}$$

This gives us a way to compute $\frac{\partial y}{\partial x}$ under our assumptions.

4.3.2 Derivative of VS in MVS Model

The MVS model described in Chapter 5, Section 5.2, has an iterative solver in its transistor model. More details on the MVS model can be found in Section 5.2; here I focus on the iterative solver that computes the virtual source and drain of the model. These internal nodes dictate the current $I_{ds}$ flowing in the transistor (refer to Figure 5.9). $I_{ds}$ in turn depends on $\vec{V}_{tx}$, the bias voltages applied to the terminal nodes of the transistor, i.e., $\vec{V}_{tx} = [\,\mathrm{drain}\ \ \mathrm{gate}\ \ \mathrm{source}\ \ \mathrm{body}\,]$. The iterative solver within the MVS model takes $\vec{V}_{tx}$ and an initial guess $I_0$ (a guess for the current flowing from the source to the drain) and finds a solution for the internal virtual source and drain. The MVS implementation computes $I_0$ from the user-supplied $\vec{V}_{tx}$. We define a new function $I_{ds0}(\vec{V}_{tx}, I_0)$, which computes the virtual source and drain quantities from the supplied guess $I_0$ without the iterative solver. From those results it then computes the resulting $I_{ds}$. Let

$$f(\vec{V}_{tx}, I_0) = I_0 - I_{ds0}(\vec{V}_{tx}, I_0) \tag{4.16}$$

To compute the Jacobian of $I_{ds}$, i.e., $\frac{\partial I_{ds}}{\partial \vec{V}_{tx}}$, we first compute $I_{ds}$ as described by the MVS model with input $\vec{V}_{tx}$. We then compute $f(\vec{V}_{tx}, I_{ds})$, where $I_{ds}$ is the current just computed by the original solver. Because of the way $I_{ds0}$ is defined, computing $I_{ds0}(\vec{V}_{tx}, I_{ds})$ effectively computes a new $\tilde{I}_{ds}$ with one more round of the original iterative solver.
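The procedure above can be sketched on a hypothetical scalar problem: converge with the original iteration, then differentiate the residual rather than the iterations. In the Python sketch below, the update map `h` is invented for illustration and is not the MVS model. A minimal dual-number class (forward-mode AD) supplies the partial derivatives of the residual $f(x, y) = y - h(x, y)$. Equation 4.15, a scalar division here and a linear solve in general, then gives $\frac{dy}{dx}$, which is compared with a centered difference of the converged solver.

```python
import math


class Dual:
    """Forward-mode AD number val + dot*eps with eps**2 = 0 (minimal sketch)."""

    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    @staticmethod
    def lift(v):
        return v if isinstance(v, Dual) else Dual(v)

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.dot + o.dot)

    __radd__ = __add__

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.val - o.val, self.dot - o.dot)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        o = Dual.lift(other)
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)

    __rmul__ = __mul__


def cos(z):
    """cos for floats and Duals."""
    if isinstance(z, Dual):
        return Dual(math.cos(z.val), -math.sin(z.val) * z.dot)
    return math.cos(z)


def h(x, y):
    """Hypothetical solver update y <- h(x, y); a stand-in, not the MVS equations."""
    return 0.5 * cos(x * y) + 0.1 * x


def solve(x, y=0.0, tol=1e-14, max_iter=500):
    """Plain fixed-point iteration; its branches are never differentiated."""
    for _ in range(max_iter):
        y_next = h(x, y)
        if abs(y_next - y) <= tol:
            return y_next
        y = y_next
    raise RuntimeError("fixed-point iteration did not converge")


def residual(x, y):
    """f(x, y) = y - h(x, y), the analogue of Equation 4.16."""
    return y - h(x, y)


def dy_dx(x):
    """Equation 4.15 in one dimension: dy/dx = -(df/dy)^-1 df/dx at y = g(x)."""
    y = solve(x)
    f_x = residual(Dual(x, 1.0), y).dot   # partial f / partial x with y held fixed
    f_y = residual(x, Dual(y, 1.0)).dot   # partial f / partial y with x held fixed
    return -f_x / f_y


x0, delta = 1.3, 1e-6
ift = dy_dx(x0)
fd = (solve(x0 + delta) - solve(x0 - delta)) / (2 * delta)
print(f"residual={residual(x0, solve(x0)):.1e}  IFT={ift:.10f}  FD={fd:.10f}")
```

Only the converged value of $y$ enters the derivative; the loop and its convergence test are never differentiated. For vector-valued $y$, the division becomes a linear solve with the Jacobian of Equation 4.14.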
The assumption is that $I_{ds} - I_{ds0}(\vec{V}_{tx}, I_{ds})$ is near zero. If we let $x = \vec{V}_{tx}$ and $y = I_{ds}$ and follow the steps in Section 4.3.1, then $\frac{\partial y}{\partial x}$ is the desired Jacobian $\frac{\partial I_{ds}}{\partial \vec{V}_{tx}}$. Figure 4.9 validates the near-zero assumption by testing an NMOS device with a sinusoidal input: the computed residual is on the order of $10^{-14}$. Figure 4.9 also shows the computed Jacobian elements for the transistor.

Figure 4.9: $I_{ds} - I_{ds0} = 0$ assumption for an NMOS device in the MVS model, and Jacobian elements for a sinusoidal input signal

Chapter 5
Transistor and Circuit Modeling

At the heart of any circuit simulator are the models that describe how the devices behave. Of importance in this thesis are the device models for Metal Oxide Semiconductor Field Effect Transistors (MOSFETs), often abbreviated as MOS devices. There are two kinds of MOS devices: PMOS and NMOS.

SPICE is generally regarded as the gold standard for VLSI circuit simulation. SPICE was first released in the 1970s [38] and has been updated many times. These updates include frequent releases of new and revised device models. As transistors have gotten smaller, a growing number of effects and parameters have been added to model the physical and observed behavior of MOS devices correctly.

Today, SPICE models are generally based on the BSIM models [17]. In this thesis, I exclusively use the BSIM4 model, which has on the order of 100 parameters. Other BSIM models, such as the BSIM Common Multi-Gate (CMG) models used to simulate FinFET technology [8], are not discussed in this thesis. Despite the accuracy the BSIM4 model offers, its code base is so large that it is cumbersome to use with AD packages.

For this reason, this chapter briefly introduces the main ideas behind short-channel transistor modeling and outlines two different model implementations: the MVS model [24][50] and a model developed by C. C. Enz, F.
Krummenacher and E. A. Vittoz, known as the EKV model [14]. These MOS models are used to simulate synchronizer behavior in subsequent chapters.

The MVS model has approximately 20 parameters derived from physical principles and is known as a source-referenced model. The simplified EKV model I develop in this thesis has 6 parameters and is known as a body-referenced model. This chapter describes how the model parameters are fit and how each model's capacitance model is derived. Readers who are comfortable with transistor models, or who do not wish to dive into the details of MOS modeling, may skip this chapter. The simulations in later chapters use these models, but an understanding of their inner workings is not required. It is sufficient to know two things. First, the current flowing out of a circuit node for a particular device is a function of the transistor's terminal voltages, i.e., $I_{ds}(V_d, V_g, V_s, V_b)$. Second, each device has an associated capacitance matrix that models the capacitance between each pair of device terminal nodes. The compact models have been vectorized for use in MATLAB and made fully differentiable for all combinations of node voltages, so that they work with AD.

5.1 Transistor Modeling Basics

Before diving into the MVS or EKV transistor models, we can discuss transistors in a general way. The models are specific implementations of the following details, but their underlying principles are similar. The discussion of modeling considers only short-channel MOSFET devices. The models are fit to empirical data gathered from LTSPICE [10] and ngSPICE [49]. This section outlines the main effects in short-channel MOSFET devices as background for the implementations of the MVS and EKV models.

5.1.1 Transistor Construction

This section is a brief and simplified summary of transistor construction.
This is a deep field in itself, but a picture of how a transistor is constructed helps in understanding the effects being modeled. A MOS device has 4 terminals: the drain, gate, source and body. The gate of a transistor is separated from the rest of the transistor by a thin insulating layer. For simplicity, we assume that no DC current flows into the gate terminal, i.e., we neglect quantum tunnelling through the gate insulator. The area directly underneath the gate is called the channel. The gate terminal is the control that determines how open or closed the channel is, and current flows between the source and drain through the channel.

MOS devices come in two types, PMOS and NMOS, shown in Figure 5.1. The difference between the two is the doping of the source, drain and channel. Doping in this context means introducing impurities into the silicon, which change the conducting characteristics of the silicon in that region. N-doping introduces extra electrons into the conduction band, which allow a flow of current. Conversely, P-doping results in an absence of electrons in the valence band. These "missing" electrons are called "holes", and they move around the semiconductor crystal structure as if they were positively charged. In an NMOS device, the channel is p-doped and the source and drain terminals are n-doped; the reverse holds for a PMOS device. When the gate terminal of an NMOS device is biased positively, it attracts a layer of electrons (negative charge) that sits under the gate oxide. Depending on the source and drain bias voltages, current flows either from the drain to the source or from the source to the drain. This current is called $I_{ds}$. A positive current is defined as current flowing from the drain to the source, and a negative current flows the other way (see Figure 5.2).
A PMOS device behaves similarly, except that the gate is negatively biased to attract a layer of holes (positive charge) under the gate oxide.

Figure 5.1: PMOS and NMOS simplified device construction

The NMOS and PMOS devices sit on top of what is called the bulk, substrate or well, which is p-doped for NMOS devices and n-doped for PMOS devices. The fourth and last terminal of a MOSFET, the body terminal, connects to this substrate. In all subsequent discussions, the substrate is connected to gnd for NMOS devices and to the supply voltage $V_{dd}$ for PMOS devices, as is common for digital circuits.

Figure 5.2: PMOS and NMOS device symbols and $I_{ds}$ current

The geometry of a transistor dictates how much current can flow in the channel and how large its associated parasitic capacitances are. The "process size" refers to the minimum feature size of the process, so a 45 nm process means that the smallest feature in that process is 45 nm. Throughout this thesis, the transistor gate length is assumed to equal the process size (e.g., 45 nm) unless otherwise specified.

The transistor width, however, is varied to change the parasitic capacitance and the amount of current the device can provide. For fitting purposes, a 10:1 ratio is used, i.e., the width is 10× the length of the transistor.

For SPICE simulations, MOS device parameters are provided via a model card. The model card parameters come from the process by which the devices are made and include the values needed to simulate the device accurately based on observed measurements. These model cards are generally proprietary, and using them in the development of microelectronic circuits requires non-disclosure agreements. Thankfully, researchers at Arizona State University have produced a set of open-source models, called the PTM [21] model cards.
The PTM model cards are used to generate simulated "measured" transistor data with LTSPICE [10] and ngSPICE [49]. Transistor model fitting in this thesis is based on the PTM 45 nm High Performance (HP) model card; many others are available.

5.1.2 IV Characteristics

To characterize a function for the current $I_{ds}$ flowing in a transistor, a number of tests are performed to obtain current-voltage (IV) characteristic curves. These measured data points are then used to find the transistor model parameters that best describe those curves. Two kinds of IV characteristic curves are important to the behavior of MOS devices.

The first shows how the channel current $I_{ds}$ changes as a function of the drain-source voltage $V_{ds} = V_d - V_s$ while $V_{gs} = V_g - V_s$ is held constant. The second shows how $I_{ds}$ changes as a function of the gate-source voltage $V_{gs} = V_g - V_s$ while $V_{ds}$ is held constant.

For demonstration, the process is described for a PMOS device with the PTM 45 nm HP model card. Classic textbooks such as [51] generally describe the process for an NMOS device. The experimental setup is shown in Figure 5.3. Three voltage sources are placed for $V_g$, $V_d$ and $V_s$. The body terminal $V_b$ is connected to $V_s = V_{dd}$ for a PMOS device.

Figure 5.3: PMOS experiment test setup for the PTM 45nm HP model card

The 45 nm HP process is a 1.0 V process, which means that the supply voltage $V_{dd}$ is 1.0 V. The 1.0 V supply voltage is used throughout subsequent simulations. For the PMOS device, the source terminal is connected to $V_{dd}$. An NMOS experiment uses the same test setup, but the source terminal is connected to gnd. Eleven values of $V_g$ are chosen, from 0 V to 1 V in increments of 0.1 V. Notice that $V_{gs}$ is in fact negative and goes from $-1.0$ V to 0 V. $V_d$ is swept from 0 V to 1 V in increments of 0.01 V for each value of $V_{gs}$.
Again, note that $V_{ds}$ sweeps from $-1.0$ V to 0 V, so $I_{ds}$ is negative, as shown in Figure 5.4.

Figure 5.4: PMOS $V_{ds}$ sweep for $V_{gs} = [-1, 0]$ V in increments of 0.1 V

The other IV characteristic curve is obtained by keeping $V_d$ constant and varying $V_g$, for various values of $V_d$. Figure 5.5 shows the SPICE simulation results for sweeping $V_g$ in 0.01 V increments, for $V_d$ from 1.0 V to 0 V in 0.1 V steps.

The SPICE simulation data from Figure 5.4 and Figure 5.5 are saved, formatted and used as the dataset for model fitting.

Figure 5.5: PMOS $V_{gs}$ sweep for $V_{ds} = [-1.0, 0]$ V in increments of 0.1 V

5.1.3 Transistor Effects

In simple terms, transistors are devices that control the current flowing from drain to source via the gate. Well-known physical properties arising from short-channel transistor construction are incorporated into state-of-the-art transistor models. Introducing these parameters gives the model fitter more degrees of freedom to describe the IV characteristics discussed in Section 5.1.2. This section discusses the main effects but is by no means exhaustive; the goal is to understand the main effects that contribute to the observations from SPICE.

Velocity Saturation

Velocity saturation is the first main point to understand in short-channel transistor models. Velocity saturation means that the electric field applied at the drain or source node of the transistor causes the charge carriers to reach their maximum speed in the channel. When this occurs, the literature [37] says that the semiconductor is velocity saturated. This has implications for the IV characteristics shown in Section 5.1.2. It is not a bad assumption that the charge carriers are always velocity saturated in short-channel devices.
However, some accuracy is lost when the electric field across the drain and source is sufficiently low.

Body Effect, Threshold Voltage and Pinchoff Voltage

The threshold voltage of a transistor is the minimum gate voltage that allows current to flow from the drain to the source. The pinchoff voltage is the voltage at which the transistor transitions from the linear region to the saturation region. The body effect is the effect of the bulk/substrate/well/body on the effective threshold voltage of a transistor, and has the following form [51]:

$$V_{th} = V_{th0} + \gamma\left(\sqrt{\phi_s + (V_s - V_b)\times \mathrm{type}} - \sqrt{\phi_s}\right) \tag{5.1}$$

The terms in Equation 5.1 are $V_{th0}$, a fit parameter describing the threshold voltage in V; $\phi_s$, the surface potential, a physical property of the device in V; and $\gamma$, the body-effect coefficient in $\sqrt{\mathrm{V}}$. NMOS devices have a type parameter of 1, while PMOS devices have a type of $-1$ because their body is connected to $V_{dd}$.

Note that the body effect does not affect the model for circuits such as inverters, where the sources of NMOS devices are connected to gnd and the sources of PMOS devices are connected to $V_{dd}$. On the other hand, our analysis tool shows that the body effect has an impact on circuits such as the passgates in a passgate latch, where the source terminals of the passgate transistors are connected to the outputs of inverters. An inverter output can be at an intermediate value between gnd and $V_{dd}$, especially when a latch is metastable.

Drain Induced Barrier Lowering (DIBL)

Drain Induced Barrier Lowering (DIBL) is a significant effect in short-channel devices: the bias voltage on the drain node influences the transistor's effective threshold voltage. Interestingly, without this term the synchronizers considered in this thesis would have very high failure rates.
DIBL is essential to ensure that the inverters of a metastable latch are biased into their high-gain regions rather than being "stuck" with all transistors in their cut-off regions.

Channel Length Modulation

Channel length modulation models how the reverse bias of the drain-to-channel junction creates a depletion region that effectively reduces the length of the channel. This increases the conductance of the device.

5.1.4 Device Capacitance

The dominant capacitance of a MOS device is the parallel-plate capacitor created by the geometry of the transistor. $C_{ox}$ is the capacitance between the gate and the channel through the insulating gate oxide, measured as capacitance per unit area of the device [F/m²]. The total parallel-plate capacitance is distributed to the drain, source and gate nodes according to the voltages on the terminals.

This distribution gives rise to the Miller effect, which is due to the capacitance between the gate and the drain/source nodes. Finally, the junction capacitances at the source and drain also contribute to the overall capacitance of the MOS device. To obtain the capacitances directly from the transistor model, we need to compute:

$$C_{ij} = \begin{cases} -\dfrac{\partial Q_i}{\partial V_j}, & i \ne j\\[6pt] \dfrac{\partial Q_i}{\partial V_j}, & i = j\end{cases} \tag{5.2}$$

If the transistor model provides a charge model, the capacitance matrix can be obtained automatically using automatic differentiation. The full capacitance matrix for an individual device is then:

$$C_{Tx} = \begin{bmatrix} C_{dd} & C_{dg} & C_{ds} & C_{db}\\ C_{gd} & C_{gg} & C_{gs} & C_{gb}\\ C_{sd} & C_{sg} & C_{ss} & C_{sb}\\ C_{bd} & C_{bg} & C_{bs} & C_{bb}\end{bmatrix} \tag{5.3}$$

where the relevant elements $C_{xy}$ are labeled in Figure 5.6. For simplicity, the matrix in Equation 5.3 is assumed to be symmetric¹, i.e., the off-diagonal terms satisfy $C_{xy} = C_{yx}$. Our model thus requires only 6 unique capacitance values for each device. Furthermore, it is reasonable to assume $C_{ds} = 0$ for the purposes of the work presented in this thesis.

¹ SPICE models can generate non-symmetric $C_{Tx}$.
Although this can lead to non-physical behaviours (e.g. violation of charge conservation), for many circuits these models can be a "best fit" for mapping the distributed structure of an actual transistor onto a four-terminal device.

Figure 5.6: NMOS parasitic capacitors (Cgb not shown)

Cox Model

For both models considered in this thesis, an accurate Cox is crucial for proper simulation. Nominally, Cox is defined as:

$$C_{ox} = \frac{\varepsilon_0\,\varepsilon_r}{t_{ox}} \qquad (5.4)$$

where ε0 = 8.854 × 10⁻¹² F/m is the vacuum permittivity, εr is the relative permittivity of the gate oxide material and tox is the oxide thickness. From the model card provided for the PTM 45nm HP process, εr = 3.9 and tox = 1.25 × 10⁻⁹ m.

Alternatively, one can extract Cox by setting up a small-signal experiment in SPICE (experimental setup shown in Figure 5.8) and sweeping an AC voltage source applied to the gate. Using the 45nm HP PTM model card, this was done for transistor gate lengths of 32nm, 45nm, 60nm, 90nm and 120nm with a fixed width of 450nm. The results in Figure 5.7 show a linear relationship and demonstrate that approximating the gate capacitance as a parallel-plate capacitor is a reasonable assumption.

Comparing the results obtained from the SPICE experiment with the parallel-plate capacitance yields a difference of ≈ 2.71%. For both the MVS and EKV models, Cox is used as the base unit of capacitance from which the gate capacitance is derived.

Figure 5.7: Linear gate capacitance results
Figure 5.8: Gate capacitance experimental setup

Junction Capacitance

Neither the MVS nor the EKV model has an explicit junction capacitance model. To model it, I extracted the equations for computing junction capacitance from the BSIM4 model manual [17]. Because the 45nm HP model card is available to us, all the parameters needed to compute the junction capacitance can be found. The junction capacitance is the capacitance that exists between the source and body or between the drain and body, i.e.
Csb or Cdb.

The junction capacitance model shown in Equation 5.5 is used to improve the accuracy of the simulations. Table 5.1 lists all the parameters needed to compute the junction capacitance for the MOS model; the names of the parameters and how they are derived are not of significant importance here. Note that the junction capacitance accounts for a circuit's ability to reach voltage levels slightly above Vdd or below gnd.

Although the BSIM4 model for computing the junction capacitance is comprehensive, a simpler diode model with fewer parameters is likely to perform just as well; this simplification is left as future work.

$$
\begin{aligned}
C_{jbd} &= \begin{cases} C_{jd}(T)\left(1 - \dfrac{V_{bd}}{P_{bd}(T)}\right)^{-M_{jd}}, & V_{bd} < 0 \\[6pt] C_{jd}(T)\left(1 + M_{jd}\,\dfrac{V_{bd}}{P_{bd}(T)}\right), & \text{otherwise} \end{cases} \\
C_{jd}(T) &= C_{jd}(T_{nom})\left(1 + T_{cj}\,(T - T_{nom})\right) \\
P_{bd}(T) &= P_{bd}(T_{nom}) - T_{pb}\,(T - T_{nom}) \\
C_{jbdsw} &= \begin{cases} C_{jswd}(T)\left(1 - \dfrac{V_{bd}}{P_{bswd}(T)}\right)^{-M_{jswd}}, & V_{bd} < 0 \\[6pt] C_{jswd}(T)\left(1 + M_{jswd}\,\dfrac{V_{bd}}{P_{bswd}(T)}\right), & \text{otherwise} \end{cases} \\
C_{jswd}(T) &= C_{jswd}(T_{nom})\left(1 + T_{Cjsw}\,(T - T_{nom})\right) \\
P_{bswd}(T) &= P_{bswd}(T_{nom}) - T_{pbsw}\,(T - T_{nom}) \\
C_{jbdswg} &= \begin{cases} C_{jswgd}(T)\left(1 - \dfrac{V_{bd}}{P_{bswgd}(T)}\right)^{-M_{jswgd}}, & V_{bd} < 0 \\[6pt] C_{jswgd}(T)\left(1 + M_{jswgd}\,\dfrac{V_{bd}}{P_{bswgd}(T)}\right), & \text{otherwise} \end{cases} \\
C_{jswgd}(T) &= C_{jswgd}(T_{nom})\left(1 + T_{Cjswg}\,(T - T_{nom})\right) \\
P_{bswgd}(T) &= P_{bswgd}(T_{nom}) - T_{pbswg}\,(T - T_{nom}) \\
C_{bd} &= A_d\,C_{jbd} + P_d\,C_{jbdsw} + W_{active}\,C_{jbdswg} \\
A_d &= W_{active} \times L_{active} \\
P_d &= X_j \times (W_{active} + L_{active}) \\
L_{active} &= L + X_L - 2L_{int} \\
W_{active} &= W + X_W - W_{int}
\end{aligned}
\qquad (5.5)
$$

5.2 MVS Model

The main idea behind the MVS model [24][50][29] is to create internal virtual nodes for the source and drain of each device, whose voltages are determined by the current that would need to flow for the device to be in equilibrium. An iterative algorithm solves for the virtual source and drain, starting from an initial guess in which the virtual nodes are at the same potentials as the voltages actually applied to the drain and source nodes. Solving for the virtual source and drain dictates the current that must be flowing in the device (see Figure 5.9).

A semi-empirical model is used to capture all the effects described in Section 5.1.3.
The main effects discussed in the previous section are derived from physical principles.

Table 5.1: Junction capacitance parameters

Parameter | Description | 45nm HP model card
Tnom | Temperature for parameters | 27 °C
Wint | Channel-width offset parameter | 5.0 × 10⁻⁹ m
Lint | Channel-length offset parameter | 3.75 × 10⁻⁹ m
XL | Channel length offset due to mask/etch effect | −20 × 10⁻⁹ m
XW | Channel width offset due to mask/etch effect | 0.0 m
Xj | Source/drain junction depth | 1.4 × 10⁻⁸ m
Cjd(Tnom) | Bottom junction capacitance per unit area at zero bias | 5 × 10⁻⁴ F/m²
Mjd | Bottom junction capacitance grading coefficient | 0.5 (unitless)
Cjswd(Tnom) | Isolation-edge sidewall junction capacitance per unit length | 5 × 10⁻¹⁰ F/m
Mjswd | Isolation-edge sidewall junction capacitance grading coefficient | 0.33 (unitless)
Cjswgd(Tnom) | Gate-edge sidewall junction capacitance per unit length | 5 × 10⁻¹⁰ F/m
Mjswgd | Gate-edge sidewall junction capacitance grading coefficient | 0.33 (unitless)
Pbd(Tnom) | Bottom junction built-in potential | 1.0 V
Tpb | Temperature coefficient of Pbd | 0.005 V/K
Tcj | Temperature coefficient of Cjd | 0.001 K⁻¹
Pbswd(Tnom) | Isolation-edge sidewall junction built-in potential | 1.0 V
Tpbsw | Temperature coefficient of Pbswd | 0.005 V/K
TCjsw | Temperature coefficient of Cjswd | 0.001 K⁻¹
Pbswgd(Tnom) | Gate-edge sidewall junction built-in potential | 1.0 V
Tpbswg | Temperature coefficient of Pbswgd | 0.005 V/K
TCjswg | Temperature coefficient of Cjswgd | 0.001 K⁻¹

The MVS model is source-referenced: it assumes that the drain node is at a higher potential than the source node; otherwise, the source and drain nodes are swapped and the direction of the current is flipped. The model also calculates the charge in the device, so the node capacitances can be calculated directly with automatic differentiation by applying Equation 5.2, with the junction capacitance of Equation 5.5 added. The body of work in [39] includes software and instructions for fitting measured transistor data to the model parameters. Figure 5.10 and Figure 5.11 show the results of fitting LTSPICE data to the MVS model.
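Because the MVS model exposes terminal charges, the capacitances of Equation 5.2 can be obtained with forward-mode automatic differentiation: one evaluation per terminal voltage, with that voltage's derivative seeded to 1. The sketch below is a minimal, self-contained illustration using a hand-written dual-number class and a hypothetical linear charge model built from pairwise capacitances; it is not the MVS charge model or the AD machinery used in this thesis, and all names and values are invented for the example. It also shows that when charges depend only on voltage differences, each diagonal entry equals the sum of the off-diagonal entries in its row.

```python
class Dual:
    """Forward-mode AD number val + der * eps, with eps**2 = 0."""

    def __init__(self, val, der=0.0):
        self.val, self.der = float(val), float(der)

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.val - other.val, self.der - other.der)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        other = Dual.lift(other)
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def toy_charges(v, cpair):
    """Hypothetical charge model Q_i = sum_j c_ij (V_i - V_j); not the MVS charges."""
    n = len(v)
    return [sum((cpair[i][j] * (v[i] - v[j]) for j in range(n) if j != i), Dual(0.0))
            for i in range(n)]


def capacitance_matrix(charge_fn, volts):
    """Equation 5.2 via forward-mode AD: one seeded evaluation per terminal voltage."""
    n = len(volts)
    cap = [[0.0] * n for _ in range(n)]
    for j in range(n):
        seeded = [Dual(volts[k], 1.0 if k == j else 0.0) for k in range(n)]
        charges = charge_fn(seeded)
        for i in range(n):
            dq = charges[i].der  # dQ_i / dV_j
            cap[i][j] = dq if i == j else -dq
    return cap


# Terminal order (d, g, s, b); hypothetical pairwise capacitances in fF, with Cds = 0.
CPAIR = [
    [0.00, 0.10, 0.00, 0.05],
    [0.10, 0.00, 0.30, 0.02],
    [0.00, 0.30, 0.00, 0.05],
    [0.05, 0.02, 0.05, 0.00],
]
C = capacitance_matrix(lambda v: toy_charges(v, CPAIR), [0.8, 1.0, 0.0, 0.0])
```

A real charge model would need to be written only with operations the Dual class supports (here +, - and *); functions such as sqrt or exp would each need their own derivative rule.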
For the MVS fits in Figure 5.10 and Figure 5.11, the default Cox value used in the parameter fitting had to be replaced with the value extracted from the PTM 45nm HP model card to obtain a good fit.

Figure 5.9: MVS n-device transistor with virtual source and drain

I used the three-stage ring oscillator circuit shown in Figure 5.12 to evaluate how well the fit parameters and capacitance model perform. Figure 5.13 compares the results obtained using SPICE and the MVS model. The MVS model, with its capacitance model extracted using AD, yields results that agree closely with those obtained using SPICE. This small experiment gives us confidence in the model's ability to simulate synchronizers properly.

Note, however, that this test has a potential weakness as a check of the capacitance model. The three-stage ring oscillator does not exhibit any body effect because the circuit is made purely of inverters. A circuit such as a C-element oscillator [48] (micropipeline ring) would be a more comprehensive check.

Figure 5.10: MVS fit for NMOS IV characteristic sweeping Vds; bubbles are SPICE data points

5.2.1 MVS Model Deficiencies

Although the MVS model is nice and compact, it has some deficiencies that require workarounds. The first is that, because the model is source-referenced and requires Vds to be a positive quantity, difficulties arise when Vds ≈ 0. A physically realistic model should be infinitely differentiable everywhere. Experiments in [39] demonstrate that the MVS model is at least C² at Vds = 0, but not C∞, due to its piecewise definition at Vds = 0. A practical problem arose because the iterative solver in the MATLAB implementation provided by the MVS model authors [39] checks Vds while iteratively updating its estimates of the virtual source and drain. When Vds is close to zero, intermediate steps of the solver can flip the sign of the virtual source and drain, which caused the model evaluation to fail when it attempted to take the log of negative quantities.
A correction to the code from [39] was applied to mitigate this problem.

The iterative solver also introduces challenges for use with AD. While the iterative solver is finding the virtual source and drain, we suspend the use of AD. We then need a meaningful way to reintroduce the AD functionality after the virtual source and drain have been finalized. The special care required to preserve meaningful derivatives makes the implementation less flexible. Computing the derivative of the current with respect to the drain or source nodes requires new methods, because the current is a function of the solution for the virtual source and drain. To compute ∂Ids/∂Vx, where x ranges over the terminal voltages, a new technique is employed, which is discussed in Chapter 4, Section 4.3.

Figure 5.11: MVS fit for NMOS IV characteristic sweeping Vgs; bubbles are SPICE data points
Figure 5.12: Three-stage ring oscillator circuit with Miller capacitance

Finally, although the MVS model is nicely compact and produces accurate results, as demonstrated in Figure 5.13, Figure 5.10 and Figure 5.11, its piecewise definition is problematic with AD: an if statement is not differentiable. The MATLAB code has been revised to eliminate all but one of the conditional statements in the model (outside of the root solver, which was handled separately as described above). The remaining if arises from the requirement that Vs ≤ Vd (for an NMOS device), a consequence of the MVS model being source-referenced. These concerns of simplicity, efficiency and differentiability motivated examining the EKV model, as described in the next section.

Figure 5.13: Three-stage ring oscillator MVS model comparison with SPICE simulation
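The two ideas above can be sketched together under stated assumptions: a fixed-point solve for the current that is consistent with virtual source and drain nodes, run with no derivative tracking, followed by ∂I/∂Vd recovered afterwards by applying the implicit function theorem to the solver's residual. The intrinsic current law, the series-resistance form of the virtual nodes and every parameter value are hypothetical stand-ins rather than the MVS equations, and this is one standard way to reintroduce derivatives, not necessarily the technique of Chapter 4, Section 4.3.

```python
import math

K, VT, VDSAT = 5e-4, 0.4, 0.1   # hypothetical intrinsic-law parameters
RS, RD = 100.0, 100.0           # hypothetical series resistances [ohm]


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def intrinsic(vgs_i, vds_i):
    """Toy intrinsic current I = f(Vgs_i, Vds_i); a stand-in for the model core."""
    return K * softplus(vgs_i - VT) ** 2 * math.tanh(vds_i / VDSAT)


def intrinsic_partials(vgs_i, vds_i):
    """(df/dVgs_i, df/dVds_i) for the toy law above."""
    sp, th = softplus(vgs_i - VT), math.tanh(vds_i / VDSAT)
    return (K * 2.0 * sp * sigmoid(vgs_i - VT) * th,
            K * sp ** 2 * (1.0 - th * th) / VDSAT)


def solve_current(vg, vd, vs, tol=1e-18, max_iter=500):
    """Fixed-point solve with no derivative tracking.

    Virtual source vs + RS*I and virtual drain vd - RD*I start at the applied
    terminal voltages (I = 0) and are updated until the current is consistent.
    """
    current = 0.0
    for _ in range(max_iter):
        vgs_i = vg - (vs + RS * current)
        # Guard: keep the iterate inside the source-referenced Vds >= 0 domain.
        vds_i = max((vd - RD * current) - (vs + RS * current), 0.0)
        new_current = intrinsic(vgs_i, vds_i)
        if abs(new_current - current) < tol:
            return new_current
        current = new_current
    return current


def dcurrent_dvd(vg, vd, vs):
    """Implicit function theorem on g(I, Vd) = I - f(Vgs_i(I), Vds_i(I, Vd)) = 0.

    dI/dVd = -g_Vd / g_I = f_d / (1 + RS*f_g + (RS + RD)*f_d),
    evaluated once, after the solver has converged.
    """
    i = solve_current(vg, vd, vs)
    f_g, f_d = intrinsic_partials(vg - vs - RS * i, vd - vs - (RS + RD) * i)
    return f_d / (1.0 + RS * f_g + (RS + RD) * f_d)
```

Comparing dcurrent_dvd with a central finite difference of solve_current is a cheap sanity check. The appeal of the implicit form is that it only needs the model's partial derivatives at the converged solution, not derivatives propagated through every solver iteration.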
Model evaluation also dominated the running time of the analysis algorithms of Chapter 3, a further reason to seek a simpler model.

5.3 Simplified EKV Model

The EKV model, named after its original creators, Enz, Krummenacher, and Vittoz, is a body-referenced model that has gone through many revisions over the past 25 years [14][11][13][12]. Like the BSIM and MVS models, the EKV model is physics-based. By taking a body-referenced approach, the symmetry of the source and drain is built into the formulas of the model. For the work in this thesis, we are primarily interested in a model that is numerically tractable and produces results close enough to the physics-based model to enable meaningful analysis of circuit design trade-offs. Thus, we derive a simplified, empirical version of the model. The form of our model is motivated by the underlying physics, e.g. using exponentials to reflect Boltzmann distributions, but the choice of parameters is purely empirical: it is based on curve fitting to get a "good enough" fit.

In this section I introduce a simplified EKV model with a few parameters that lump together many of the physical parameters described in Section 5.1.3. Like the MVS model, the EKV model is compact and has a handful of parameters. The main idea behind the EKV model is that the current flowing in the channel of a transistor is the difference between a forward current and a backward (reverse) current. This simple idea is attractive because the functions that compute the forward and backward currents are continuous and infinitely differentiable, i.e. C∞. Because the EKV model is body-referenced, it is symmetric with respect to the source and drain, so there is no case split for Vds when the source voltage is higher than the drain voltage.
In that scenario, the backward current simply overtakes the forward current and Ids becomes negative.

I start by writing out our proposed formula for computing Ids:

$$
\begin{aligned}
u &= \alpha\left\{V_g + \beta V_d - V_s - \left(V_{th0} + \gamma\left(\sqrt{\phi_f + (V_s - V_b)\times\mathrm{type}} - \sqrt{\phi_f}\right)\right)\right\} \\
v &= \alpha\left\{V_g + \beta V_s - V_d - \left(V_{th0} + \gamma\left(\sqrt{\phi_f + (V_s - V_b)\times\mathrm{type}} - \sqrt{\phi_f}\right)\right)\right\} \\
I_{ds} &= W I_0\left(\log\left[1 + e^{u}\right] - \log\left[1 + e^{v}\right]\right)
\end{aligned}
\qquad (5.6)
$$

A few terms familiar from Section 5.1.3 are present, such as the body effect term from Equation 5.1 and W, the width of the transistor. That leaves α, a term to capture velocity saturation; β, a term to capture DIBL; and I0, a scaling term that gives the optimizer some flexibility to scale the output as it sees fit. Unlike in the MVS model, the units associated with these quantities are not derived from physical properties of the device.

5.3.1 EKV Model Fitting

The fitting algorithm that finds the parameters of the EKV model uses gradient descent (Equation 5.7), with the step size computed by the Barzilai-Borwein formula shown in Equation 5.9. I allow for a weighted sum of the error, with weights wi, because otherwise the fit concentrates the error in regions where the current is relatively small; the weighting provides a mechanism to distribute the error more evenly. Automatic differentiation can easily be used to compute the gradient with respect to the parameters; however, it is nearly as easy, and more illuminating, to compute the derivatives manually, as shown in Equation 5.8.
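Equation 5.6 translates almost directly into code. The sketch below writes log[1 + exp(·)] as a numerically stable softplus so that large forward or backward arguments do not overflow. The parameter values are hypothetical and unfitted, and the function and dictionary names are invented for this illustration.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def ekv_ids(vg, vd, vs, vb, p, width, device_type=1):
    """Simplified EKV drain current, following the structure of Equation 5.6.

    p holds the fit parameters i0, alpha, beta, vth0, gamma and phi_f.
    """
    phi_m = math.sqrt(p["phi_f"] + (vs - vb) * device_type) - math.sqrt(p["phi_f"])
    vth = p["vth0"] + p["gamma"] * phi_m                 # body-effect threshold
    u = p["alpha"] * (vg + p["beta"] * vd - vs - vth)    # forward argument
    v = p["alpha"] * (vg + p["beta"] * vs - vd - vth)    # backward argument
    return width * p["i0"] * (softplus(u) - softplus(v))


# Hypothetical, unfitted values for illustration only.
PARAMS = {"i0": 100.0, "alpha": 20.0, "beta": 0.1, "vth0": 0.45, "gamma": 0.2, "phi_f": 0.8}
WIDTH = 450e-9

on_current = ekv_ids(1.0, 1.0, 0.0, 0.0, PARAMS, WIDTH)
zero_bias = ekv_ids(1.0, 0.5, 0.5, 0.0, PARAMS, WIDTH)   # Vd = Vs gives exactly zero current
```

In this sketch, swapping Vd and Vs exactly negates Ids when gamma = 0. With gamma ≠ 0, the body-effect term as written in Equation 5.6 is evaluated at Vs, so swapping the terminals also changes that term.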
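The tabulated Ids from Section 5.1.2 is taken as ground truth; the objective is thus to find parameters that minimize the error between the EKV model and the tabulated Ids from SPICE.

$$
\begin{aligned}
\nabla_{Params} I_{ds} &= \frac{\partial I_{ds}}{\partial Params} \\
\nabla_{Params} Err &= \sum_{i=1}^{n} 2\,\frac{\partial I_{ds,i}}{\partial Params}\, w_i \left(I_{ds,i} - I_{ds,i}^{meas}\right) \\
Params_{i+1} &= Params_i - \gamma_{step}\,\nabla_{Params} Err
\end{aligned}
\qquad (5.7)
$$

Let

$$
\begin{aligned}
\phi_M &= \sqrt{\phi_f + (V_s - V_b)\times\mathrm{type}} - \sqrt{\phi_f}, \qquad V_{mod} = \gamma\,\phi_M \\
u &= \alpha\left(V_g + \beta V_d - V_s - (V_{th0} + V_{mod})\right) \\
v &= \alpha\left(V_g + \beta V_s - V_d - (V_{th0} + V_{mod})\right)
\end{aligned}
$$

Then

$$
\frac{\partial I_{ds}}{\partial Params} =
\begin{bmatrix}
\partial I_{ds}/\partial I_0 \\ \partial I_{ds}/\partial \alpha \\ \partial I_{ds}/\partial \beta \\ \partial I_{ds}/\partial V_{th0} \\ \partial I_{ds}/\partial \gamma \\ \partial I_{ds}/\partial \phi_f
\end{bmatrix}
=
\begin{bmatrix}
W\left(\log\left[1 + e^{u}\right] - \log\left[1 + e^{v}\right]\right) \\
W I_0\left(\dfrac{u\,e^{u}}{\alpha(1 + e^{u})} - \dfrac{v\,e^{v}}{\alpha(1 + e^{v})}\right) \\
W I_0\left(\alpha V_d\,\dfrac{e^{u}}{1 + e^{u}} - \alpha V_s\,\dfrac{e^{v}}{1 + e^{v}}\right) \\
W I_0\left(-\alpha\,\dfrac{e^{u}}{1 + e^{u}} + \alpha\,\dfrac{e^{v}}{1 + e^{v}}\right) \\
W I_0\left(-\alpha\phi_M\,\dfrac{e^{u}}{1 + e^{u}} + \alpha\phi_M\,\dfrac{e^{v}}{1 + e^{v}}\right) \\
W I_0\left(-\alpha\gamma\,\dfrac{e^{u}}{1 + e^{u}} + \alpha\gamma\,\dfrac{e^{v}}{1 + e^{v}}\right)\left(\dfrac{1}{2\sqrt{\phi_f + (V_s - V_b)\times\mathrm{type}}} - \dfrac{1}{2\sqrt{\phi_f}}\right)
\end{bmatrix}
\qquad (5.8)
$$

$$
\gamma_{step} = \frac{\left|\left(Params_{i+1} - Params_i\right)^{T}\left(\dfrac{\partial I_{ds,i+1}}{\partial Params} - \dfrac{\partial I_{ds,i}}{\partial Params}\right)\right|}{\left\|\dfrac{\partial I_{ds,i+1}}{\partial Params} - \dfrac{\partial I_{ds,i}}{\partial Params}\right\|^{2}}
\qquad (5.9)
$$

Using gradient descent as shown in Equation 5.7 results in the fits shown in Figure 5.14 and Figure 5.15. The resulting fit is not as good as that of Section 5.2, but the simplicity of the body-referenced model makes it very attractive, and it captures the behaviour well enough to explore synchronizer designs. The PMOS device fits better in this EKV model than the NMOS device.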
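Hand-derived gradients such as Equation 5.8 are easy to get subtly wrong, so a cheap safeguard is to compare them against central finite differences of Equation 5.6 before running the optimizer. The sketch below does this for all six parameters. The parameter values and bias point are hypothetical, W is normalized to 1, e^x/(1 + e^x) is computed as a stable logistic function, and the helper names are invented for this example.

```python
import math

NAMES = ("i0", "alpha", "beta", "vth0", "gamma", "phi_f")


def sigmoid(x):
    """e^x / (1 + e^x), computed without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softplus(x):
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def terms(p, vg, vd, vs, vb, dev_type):
    sq = math.sqrt(p["phi_f"] + (vs - vb) * dev_type)
    phi_m = sq - math.sqrt(p["phi_f"])
    vth = p["vth0"] + p["gamma"] * phi_m
    u = p["alpha"] * (vg + p["beta"] * vd - vs - vth)
    v = p["alpha"] * (vg + p["beta"] * vs - vd - vth)
    return sq, phi_m, u, v


def ids(p, vg, vd, vs, vb, width=1.0, dev_type=1):
    _, _, u, v = terms(p, vg, vd, vs, vb, dev_type)
    return width * p["i0"] * (softplus(u) - softplus(v))


def grad_ids(p, vg, vd, vs, vb, width=1.0, dev_type=1):
    """Analytic gradient, one entry per row of Equation 5.8."""
    sq, phi_m, u, v = terms(p, vg, vd, vs, vb, dev_type)
    a, g, wi = p["alpha"], p["gamma"], width * p["i0"]
    su, sv = sigmoid(u), sigmoid(v)
    dphi = 1.0 / (2.0 * sq) - 1.0 / (2.0 * math.sqrt(p["phi_f"]))
    return {
        "i0": width * (softplus(u) - softplus(v)),
        "alpha": wi * (u * su / a - v * sv / a),
        "beta": wi * (a * vd * su - a * vs * sv),
        "vth0": wi * (-a * su + a * sv),
        "gamma": wi * (-a * phi_m * su + a * phi_m * sv),
        "phi_f": wi * (-a * g * su + a * g * sv) * dphi,
    }


def finite_difference_grad(p, *bias, rel_step=1e-6):
    """Central differences of ids with respect to each parameter."""
    out = {}
    for name in NAMES:
        h = rel_step * max(1.0, abs(p[name]))
        hi, lo = dict(p), dict(p)
        hi[name] += h
        lo[name] -= h
        out[name] = (ids(hi, *bias) - ids(lo, *bias)) / (2.0 * h)
    return out


# Hypothetical parameters and bias point (Vg, Vd, Vs, Vb) for the check.
P = {"i0": 1e-4, "alpha": 10.0, "beta": 0.1, "vth0": 0.4, "gamma": 0.2, "phi_f": 0.8}
BIAS = (0.9, 0.6, 0.2, 0.0)
analytic = grad_ids(P, *BIAS)
numeric = finite_difference_grad(P, *BIAS)
```

The bias point deliberately puts Vs above Vb so that the body-effect rows (γ and φf) are exercised; with Vs = Vb those two rows would vanish and the check would say nothing about them.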
The PMOS fits are shown in Figure 5.16 and Figure 5.17.

Figure 5.14: EKV fit for NMOS IV characteristic sweeping Vds; bubbles are SPICE data points
Figure 5.15: EKV fit for NMOS IV characteristic sweeping Vgs; bubbles are SPICE data points
Figure 5.16: EKV fit for PMOS IV characteristic sweeping Vds; bubbles are SPICE data points
Figure 5.17: EKV fit for PMOS IV characteristic sweeping Vgs; bubbles are SPICE data points

5.3.2 EKV Capacitance Model

This EKV model uses the capacitance model from [11], which distributes Cox, computed from the geometry of the transistor, in proportion to the normalized currents if and ir, as in Equation 5.10.

Using the capacitance computed with Equation 5.10 together with the junction capacitance from Equation 5.5 yields too little capacitance when compared against the three-stage ring oscillator result. Through trial and error, the shape factor n in Equation 5.10 is set to 1.3 for the PTM 45nm HP model, and the overall capacitance computed from Equation 5.10 and Equation 5.5 is scaled up by a factor of 2.2, which produces results similar in accuracy to those shown in Figure 5.13. Figure 5.18 demonstrates that it is possible to get reasonable results compared to SPICE by distributing Cox as described in Equation 5.10.

$$
\begin{aligned}
i_f &= \log\left(1 + e^{u}\right), \qquad i_r = \log\left(1 + e^{v}\right) \\
q_f &= \frac{\sqrt{1 + 4 i_f} - 1}{2}, \qquad q_r = \frac{\sqrt{1 + 4 i_r} - 1}{2} \\
c_s &= \frac{1}{3}\,\frac{q_f\,(2q_f + 4q_r + 3)}{(q_f + q_r + 1)^2} \\
c_d &= \frac{1}{3}\,\frac{q_r\,(2q_r + 4q_f + 3)}{(q_r + q_f + 1)^2} \\
C_{gs} &= C_{ox}\,c_s, \qquad C_{gd} = C_{ox}\,c_d \\
C_{gb} &= \frac{n - 1}{n}\left(C_{ox} - C_{gs} - C_{gd}\right)
\end{aligned}
\qquad (5.10)
$$

5.3.3 EKV Model Deficiencies

My EKV model lumps the physical behaviours of certain transistor effects into single parameters without any physical explanation, unlike the BSIM, MVS, and published EKV models. The model makes no attempt to explain the parameters; it simply provides them as a means to obtain a better fit.

The fit for the NMOS device is noticeably worse than for the PMOS device at small values of Vds and Vgs: the NMOS model computes currents that are smaller than they should be.
This means that, in those regions, my NMOS devices will be slower to respond or have worse than expected drive strength.

The limitation of always assuming a velocity-saturated model is apparent at low Vds for the NMOS devices. We believe that a near-term improvement can be made by adopting a velocity saturation term that takes the field strength into account.

Figure 5.18: Three-stage ring oscillator EKV model comparison with SPICE simulation

On the other hand, the simulation speed-up we observed informally when using the simpler model is attractive for analysis. In the interest of obtaining simulation results faster, results that are likely to be within 20% are considered good enough to demonstrate the effectiveness of the new tools in this thesis.

Chapter 6
Passgate Synchronizer and Variants: A Running Example

Chapter 3 developed the mathematics and tools to analyze synchronizer designs, and Chapter 5 described the underlying models used to simulate circuits. In this chapter, we turn our attention to a particular design to see how the models and tools produce results that enable designers to understand their synchronizer's behaviour, and thereby optimize their designs in ways that could not be done with previous analysis tools. The synchronizers analyzed in this chapter consist of chains of two or three flip-flops based on passgate latches. The methods presented are readily applicable to synchronizers with more or fewer latches, and to other design topologies that are not a linear chain, such as the wagging-based synchronizer design by Alshaikh et al. [2]. An "edge"-triggered flip-flop can be constructed by connecting two latches that are enabled and disabled by the clock signal and its logical inverse.
By enabling latches on opposite polarities of the clock, the latches in a chain alternate between transparent and opaque, as described in Section 4.2.

The simplified EKV model from Chapter 5 is used to perform all of the simulations; the analysis method itself does not depend on this choice of model. The simplified EKV model can be evaluated much faster than the more accurate MVS model when running the tools. The results from Chapter 5 give us confidence that when a designer switches to a more accurate model, the analysis becomes correspondingly more accurate but, in most cases, will not lead to qualitatively different conclusions.

We start with the passgate latch synchronizer because it is a design that is widely used in practice. The goal is to establish a baseline design that we can use as a benchmark to compare different synchronizer designs and variants. In this chapter, the passgate latch and passgate synchronizer designs are described, with explanations of their digital and continuous behaviours. We then cover in detail the results obtained by applying the tools from Chapter 3 to the passgate synchronizer design, and demonstrate how to interpret g(t), u(t)^T and λ(t) as measures that establish benchmark characteristics. With a benchmark design to compare against, I investigate three passgate synchronizer design variants: one with scan circuitry, the kicked synchronizer and the offset synchronizer. The kicked and offset synchronizers are variants that aim to improve metastability resolution; I explore their impact on metastability resolution relative to the benchmark design.

The analysis tools lead us to a new discovery.
The passgate, often thought of as a passive device, can be exploited as a small-signal amplifier for a brief period of time during metastability resolution in a passgate synchronizer. This subtle behaviour was discovered using the new tools at our disposal.

6.1 The Passgate Latch & Synchronizer

This section describes the components and functionality of a passgate latch and outlines how composing four passgate latches yields a two flip-flop passgate synchronizer. No new results are presented in this section; it is included to keep the thesis self-contained. We present the building blocks of a passgate latch by first describing their digital abstraction, followed by an intuition for their analog, continuous behaviour. This section is intended as a reference for when I discuss various parts of the passgate synchronizer in the following sections.

6.1.1 The Passgate Latch

The passgate latch, shown in Figure 6.1, is made of two passgates, a cross-coupled pair of inverters, and a buffer inverter that minimizes the loading on the nodes of the cross-coupled pair when the latch is connected to additional circuitry. The passgate latch uses the clock and its logical inverse to turn the passgates on and off, which produces the transparent and opaque phases of the latch. We describe how the latch and each of its components function in three steps:

• By describing the component's digital abstraction
• By giving an intuitive analog argument and its implications
• And finally, by providing some key insights into the detailed non-linear behaviour of transistors

Figure 6.1: The passgate latch (nodes din, x0, y0, z0; passgates pg0 and pg1; inverters inv0 to inv3; controlled by Clock and its complement)

Passgates

A passgate is a device with two signals, x and y, and a control input, enable. The CMOS implementation consists of two transistors, an NMOS and a PMOS, as shown in Figure 6.2, with control signals enable and $\overline{enable}$. In the passgate latch, the enable and $\overline{enable}$ ports are mapped to the clock signal and its logical inverse, $\overline{clock}$. Let us first consider the passgate as a digital device. When enable is true, the passgate acts like a closed switch connecting x to y; conversely, when enable is false, the passgate acts like an open switch. In the passgate latch of Figure 6.1, when the clock signal is high, pg0 connects nodes din and x0; when the clock signal is low, pg1 connects nodes z0 and x0. The passgates in the passgate latch thus work together as a multiplexer that drives node x0 either to the signal at node z0 or to din.

Figure 6.2: Passgate (terminals x and y; control inputs enable and $\overline{enable}$)

Designers construct passgates with two transistors, as shown in Figure 6.2, because NMOS devices are effective at conveying 0s (low levels) but not 1s (high levels), and vice versa for PMOS devices. To see why, consider just the NMOS device from Figure 6.2. For the NMOS device to conduct well, the gate voltage on the enable node needs to be at least a threshold voltage above the source node. When a high level is conveyed from x to y, y acts as the source, so with enable = Vdd node y can only be pulled up to roughly Vdd minus the threshold voltage; conveying a full x = Vdd to node y would require the enable node to be driven to Vdd plus the threshold voltage. This is not practical. Fortunately, PMOS devices behave in a complementary fashion, so the NMOS and PMOS devices together form a practical passgate.

In more detail, a transistor can be thought of as a variable resistor whose resistance is controlled by the transistor's gate signal, here enable. The resulting resistance also depends on the signals at x and y, i.e. the source and drain nodes, as described above. Thus the passgate is a non-linear device. Section 6.4 describes how the non-linearity of the passgate can lead it to behave as a small-signal amplifier.
The passgate's non-linearity leads us to a surprising outcome when comparing passgate synchronizer variants.

Inverters & Cross-Coupled Pairs

A cross-coupled pair of inverters is simply two inverters connected in a loop, as shown in Figure 6.3. From a digital point of view, if x = 1 then y = 0, and conversely, if x = 0 then y = 1. These two states are stable, so the cross-coupled pair is a digital circuit that can hold state.

Figure 6.3: Cross-coupled pair

The continuous model of an inverter defines a DC transfer function, as shown in Figure 6.4. The transfer function gives the voltage level that the inverter output asymptotically approaches for a fixed input voltage. In practice, after a few nanoseconds, the output will have settled to well within the tolerances of any practical measurement.

Designers want digital logic gates to be restoring circuits. In other words, the output of a digital logic gate should be closer to gnd or Vdd than its inputs. For the inverter, this means that the slope of the DC transfer function should be negative everywhere and close to 0 near the points where x = 0 and y = 1, or where x = 1 and y = 0. A designer also wants a high-gain region, where the slope is much less than −1, when transitioning between these two states. These design requirements suggest the negated sigmoid shape depicted in Figure 6.4.

When we superimpose the DC transfer functions of Figure 6.4 for the two inverters of the cross-coupled pair, we get Figure 6.5. The points where the two curves intersect are the equilibrium points of the circuit. The equilibria at x = 0, y = 1 and at x = 1, y = 0 are familiar from the digital description of the circuit. These two equilibria are stable because we assume that at these points |slope| < 1 for the DC transfer functions of the inverters. There is, however, a third equilibrium at x = y ≈ 0.47 V for the inverter that we simulated. This is a metastable equilibrium.

Figure 6.4: Inverter DC transfer function
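The equilibrium picture can be reproduced with a toy inverter. The sketch below assumes a negated-sigmoid transfer function with hypothetical supply, switching point and steepness (not the simulated inverter). It finds the symmetric equilibrium x = y by bisection and the two stable equilibria by iterating around the loop, then checks stability by linearizing the loop dx/dt = f(y) − x, dy/dt = f(x) − y (time in units of RC), whose eigenvalues are −1 ± √(f′(x)f′(y)). All function names are invented for the example.

```python
import math

VDD, VM, K = 1.0, 0.47, 20.0  # hypothetical supply, switching point and steepness


def inverter(v):
    """Toy negated-sigmoid DC transfer function standing in for Figure 6.4."""
    return VDD / (1.0 + math.exp(K * (v - VM)))


def inverter_slope(v):
    """Derivative of the toy transfer function (never positive)."""
    f = inverter(v)
    return -K * f * (1.0 - f / VDD)


def metastable_point(lo=0.0, hi=VDD, tol=1e-12):
    """Bisection on x - f(x), which is increasing: the equilibrium with x = y."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid - inverter(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def stable_equilibrium(x, iters=100):
    """Iterate around the loop x -> y = f(x) -> x = f(y) until it settles."""
    for _ in range(iters):
        x = inverter(inverter(x))
    return x, inverter(x)


def loop_eigenvalues(x, y):
    """Eigenvalues (units of 1/RC) of the loop linearized at (x, y).

    Both slopes are non-positive, so their product is non-negative and the
    eigenvalues -1 +/- sqrt(f'(x) f'(y)) are real.
    """
    root = math.sqrt(inverter_slope(x) * inverter_slope(y))
    return -1.0 - root, -1.0 + root


xm = metastable_point()
high = stable_equilibrium(0.9)
low = stable_equilibrium(0.1)
```

At the symmetric equilibrium |f′| > 1, so one eigenvalue is positive (unstable), while at the two rail equilibria the loop gain is far below 1 and both eigenvalues are negative.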
At the metastable equilibrium, the DC transfer function has |slope| > 1, which makes the equilibrium unstable: small perturbations about the metastable equilibrium cause nodes x and y to diverge and settle to one of the two stable equilibria.

We can analyze the inverter circuit at the metastable equilibrium depicted in Figure 6.5 and describe its behaviour as a linear small-signal system. Each inverter of the cross-coupled pair can be thought of as an amplifier with gain G < −1 driving an RC circuit, as shown in Figure 6.6. For this small-signal analysis, we write vms(x) to denote the voltage on x minus the voltage of the balance point, i.e. we shift our voltage reference to the metastable voltage. Similarly, we define vms(y) as the voltage on y minus the metastable voltage. Therefore we have:

$$
\begin{aligned}
\frac{d}{dt} v_{ms}(x) &= \frac{1}{RC}\left(G\,v_{ms}(y) - v_{ms}(x)\right) \\
\frac{d}{dt} v_{ms}(y) &= \frac{1}{RC}\left(G\,v_{ms}(x) - v_{ms}(y)\right)
\end{aligned}
\qquad (6.1)
$$

The eigenvalues of the system are (−1 ± G)/RC. Because G < −1, the eigenvalue (−1 + G)/RC is negative and corresponds to the decay of the "common mode" component of the state (i.e. x + y) as time evolves. The eigenvalue (−1 − G)/RC is positive and gives rise to the divergence from metastability: the differential component of the state (i.e. x − y) increases exponentially with time. This same exponential divergence is the one used to describe the failure probability of a synchronizer. Here the "synchronizer" is a single latch; keeping this in mind, the synchronizer designer's goal is to maximize this positive eigenvalue, which maximizes the rate of divergence from the metastable equilibrium and minimizes the failure probability.

Figure 6.5: Cross-coupled pair phase space with the attractors marked in green and the unstable saddle in red
Figure 6.6: Cross-coupled pair as amplifier circuit

Figure 6.5 gives a succinct picture of the dynamics of a cross-coupled pair of inverters: the green marks illustrate the attractors of the system and the red mark illustrates the unstable saddle at the point x = y. We can change the shape of the curves by changing the design of the inverters, but there must exist, between the two stable attractors, a point where the transfer functions of the inverters intersect, and it is at this intersection that metastability arises.

6.1.2 The Two Flip-Flop Passgate Synchronizer

A two flip-flop passgate synchronizer is constructed by connecting four passgate latches in a chain, as shown in Figure 6.7. To make the chain of latches a synchronizer, the clock signals clk and clkB are connected to the latches in an alternating pattern. With this alternating phasing of the clocks, the first latch of each flip-flop, often called the "master", is transparent when the clock is low and opaque when the clock is high. Conversely, the second latch, often called the "slave", is transparent when the clock is high and opaque when the clock is low. Thus, when the clock makes a low-to-high transition, the master latch retains the value of din from just before the clock transition, and the slave propagates this value to the output, y1, just after the clock transition. This sampling of the input and updating of the output on the rising edge of the clock makes the flip-flop "positive edge triggered". When the chain is used as a synchronizer, we observe that the latches can exhibit metastable behaviour when they are opaque. Thus, the master latch can be metastable when the clock is high, and the slave can be metastable when the clock is low. The metastability "moves" from one latch to the next when the clock transitions. In our analysis, this notion of metastability moving is quantified by the function u(t)^T.

The digital behaviour of the passgate synchronizer, shown in Figure 6.8, illustrates how a data transition that occurs well before the first rising edge of the clock at t1 propagates through the synchronizer to the output at y3 within two clock cycles.
In Figure 6.8, for the period of time between t0 and t1, master1 is transparent, slave1 is opaque, and master1's output y0 tracks the synchronizer's input at din. When the clock signal transitions at time t1, slave1 becomes transparent while master1 becomes opaque. On the time interval t ∈ [t1, t2], slave1's output y1 tracks the output of master1, y0. After the falling edge of the clock at time t2, slave1 becomes opaque and master2 becomes transparent, and the signal propagation continues. We can think of the synchronizer as a shift register in which, at each clock transition, the input signal shifts down the chain by one latch.

Figure 6.7: Two flip-flop passgate synchronizer
Figure 6.8: Digital behaviour and propagation of a two flip-flop passgate synchronizer (waveforms Clk, din, y0, y1, y2 and y3 over times t0 to t3)
Figure 6.9: Metastability behaviour and propagation of a two flip-flop passgate synchronizer (same signals and times)

Figure 6.9 considers the continuous nature of the underlying electronics. If we allow din to transition in the neighbourhood of t1, i.e. while the clock is also transitioning, then master1's passgate may become opaque at a point in time when nodes x0, y0 and z0 happen to be at voltage levels very close to the cross-coupled pair's metastable point x = y. We have already established that for t ∈ [t1, t2], slave1's output y1 tracks master1's output y0; hence, if node y0 is at or near the metastable point, slave1's output y1 will correspondingly track y0. Observe that if x0 ≈ y0 at time t2, then it is possible that, as slave1 becomes opaque, its output y1 causes master2's output y2 to head towards the metastable point. The ideas behind the digital propagation of the input signal down the chain of latches thus also apply to metastable behaviour.

The difference is that the probability that x0 ≈ y0 at time t2, given that x0 ≈ y0 at time t1, decreases exponentially in time.
Intuitively, we can think of metastability in a synchronizer as time-varying, "moving" from stage to stage; but for y1 to exhibit metastable behaviour, y0 must have been metastable long enough that at time t2 node y1 ≈ x2, and so on. The remaining sections in this chapter illustrate, quantify and explain the manifestation of metastability in the passgate synchronizer design and explore its impact on design variants.

6.2 The Passgate Latch Synchronizer Benchmark

This section explains the details of analyzing the passgate synchronizer described in Section 6.1 using the tools developed in Chapter 3, in order to establish a baseline benchmark. The benchmark will be used to compare design variants in the subsequent sections.

The key definitions established in Chapter 3 are summarized below:

• tcrit is the time by which the outputs of the synchronizer must have stable digital values.
• teola is the time at which the small-signal linear analysis ends. For t > teola, the synchronizer has large-signal, non-linear behaviours that are distinct for the settle-low and settle-high outcomes.
• ∆tin is the width of the time window for which the synchronizer output is not guaranteed to settle by tcrit.
• In the analysis of the passgate synchronizer, we define tsettle as the time at which all signals in the synchronizer have settled to within 10% of the voltage rails. The conditions that define tsettle are used to accelerate the nested bisection algorithm, but tsettle itself plays no role in the analysis.

To establish our benchmark design, we start by running Yang & Greenstreet's nested bisection algorithm [53] on the passgate synchronizer design shown in Figure 6.7. The resulting trajectories, shown in Figure 6.10, are collected as described in Section 3.1 to create a single "perfectly" metastable trajectory up to time teola. A small-signal linear model of the synchronizer describes all simulated trajectories from the time of the data transition, tin, to the time teola.
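The bisection at the heart of the nested bisection algorithm, together with the tsettle-style early exit described above, can be sketched on a toy latch. This is only the innermost bisection over the input time, not Yang & Greenstreet's full nested algorithm. The cross-coupled tanh model, its time constant and gain, the initial imbalance per unit input-time offset, and the metastable input time T_STAR are all hypothetical; only tcrit = 7.5 × 10⁻¹⁰ s and the 10% settling margin come from the text.

```python
import math

TAU = 1e-11         # hypothetical regeneration time constant [s]
GAIN = 3.0          # hypothetical small-signal inverter gain magnitude
VDD = 1.0
T_CRIT = 7.5e-10    # critical time used in the text [s]
T_STAR = 1.234e-10  # hypothetical "perfectly" metastable input time [s]
SLEW = 1e9          # hypothetical initial imbalance per unit input-time offset [V/s]


def inverter(v):
    """Toy inverter centred on VDD/2 with small-signal gain -GAIN there."""
    return 0.5 * VDD - 0.5 * VDD * math.tanh(2.0 * GAIN * (v - 0.5 * VDD) / VDD)


def settled(volts, margin=0.10):
    """tsettle condition: every node within 10% of a rail."""
    return all(v <= margin * VDD or v >= (1.0 - margin) * VDD for v in volts)


def simulate(t_in, dt=TAU / 20):
    """Forward-Euler toy cross-coupled pair; returns (x at exit, exit time)."""
    offset = SLEW * (t_in - T_STAR)
    x, y = 0.5 * VDD + offset, 0.5 * VDD - offset
    t = 0.0
    while t < T_CRIT:
        if settled((x, y)):
            break  # in this toy model the outcome can no longer change
        dx = (inverter(y) - x) / TAU
        dy = (inverter(x) - y) / TAU
        x, y = x + dt * dx, y + dt * dy
        t += dt
    return x, t


def settles_high(t_in):
    return simulate(t_in)[0] > 0.5 * VDD


def bisect_input_time(lo, hi, tol=1e-16):
    """Shrink [lo, hi] around the input time that separates the two outcomes."""
    lo_high = settles_high(lo)
    if lo_high == settles_high(hi):
        raise ValueError("bracket must straddle the metastable input time")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if settles_high(mid) == lo_high:
            lo = mid
        else:
            hi = mid
    return lo, hi


lo, hi = bisect_input_time(0.0, 2e-10)
```

The trajectories launched from the final bracket [lo, hi] linger near the metastable point the longest before settling, which is why trajectories like these are collected to approximate the "perfectly" metastable trajectory.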
The small-signal linear model is used to compute g(t) and λ(t), which are derived from β(t) and u(t)^T. We use λ(t) to observe the overall performance characteristics of the synchronizer and take the resulting curve as our benchmark measure. Figure 6.14 and Figure 6.15 demonstrate how u(t)^T serves as our magnifying glass. It shows which latch, and which of that latch's internal nodes, matters most for metastability resolution at any given time, which illustrates the time-varying nature of metastability in a synchronizer.

6.2.1 Nested Bisection Trajectories

To begin the analysis of the passgate synchronizer, we run the nested bisection algorithm to find the “perfectly” metastable trajectories that just meet our selected critical time t_crit = 7.5 × 10^-10 s. Unless otherwise stated, all our examples use this timing criterion. Our benchmark uses a transistor sizing scheme in which all inverters and passgates in the design have a 1:1 p-to-n ratio. Figure 6.10 shows the results obtained from the nested bisection algorithm up to at least time t_crit and plots the latch output nodes y0, y1, y2 and y3. The “perfectly” metastable trajectory can be visualized as the path that lies between the trajectories that settle high and those that settle low. Figure 6.11 shows the resulting “perfectly” metastable trajectory up to time t_eola, computed from the collection of nested bisection trajectories.

[Figure 6.10: Passgate synchronizer latch outputs from nested bisection runs]

In Section 3.2 we claimed that β(t) can be computed by numerical differencing as the nested bisection algorithm progresses. Because β(t) is an N × 1 vector, we report it as log10 ||β(t)||_2. Figure 6.12 summarizes the numerically differentiated estimate β̂(t) from the K epochs of the nested bisection algorithm, computed using Equation 6.2.
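Equation 6.2, with Δ = (t_in,H,k − t_in,L,k)/M and β̂(t) = (V_H,k(t) − V_L,k(t))/(3Δ), is a plain finite difference. The summary plotted in Figure 6.12 is its 2-norm on a log scale. The NumPy sketch below assumes that both trajectories have already been resampled onto a common time grid and stored as (N nodes × T samples) arrays. The helper names are hypothetical. The usage example builds synthetic trajectories that are exactly linear in t_in and sets M = 3 only so that the estimate can be checked against the known sensitivity. The value of M used in the thesis is not stated in this section.

```python
import numpy as np

def beta_hat(v_high, v_low, t_in_high, t_in_low, m):
    """Hypothetical helper: finite-difference estimate of beta(t) = dV(t)/dt_in
    in the form of Equation 6.2.

    v_high, v_low: (N, T) node voltages of the settle-high and settle-low trajectories
    of one bisection epoch, sampled on the same time grid; m is the divisor M.
    """
    delta = (t_in_high - t_in_low) / m
    return (v_high - v_low) / (3.0 * delta)

def log10_norm(beta):
    """Per-sample summary log10 ||beta(t)||_2 of an (N, T) sensitivity history."""
    return np.log10(np.linalg.norm(beta, axis=0))

# Synthetic check: V(t; t_in) = a(t) * (t_in - t_meta), so the true sensitivity is a(t).
t = np.linspace(0.0, 5e-10, 6)
a = np.array([1.0, -1.0, 0.5, -0.5])[:, None] * np.exp(t / 50e-12)
t_meta = 1.0e-10
t_in_high, t_in_low = t_meta + 1e-15, t_meta - 2e-15
v_high = a * (t_in_high - t_meta)
v_low = a * (t_in_low - t_meta)

b = beta_hat(v_high, v_low, t_in_high, t_in_low, m=3)
print(log10_norm(b))
```

Because the difference is taken between two full simulated trajectories, any error in classifying or aligning them feeds straight into β̂(t). This is one reason a direct (AD) computation of β(t) is attractive.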
Note that Equation 6.2 divides by 3Δ. For robust trajectory classification, as discussed in Section 2.4.3, the trajectories V_H,k(t) and V_L,k(t) at epoch k are in fact 3Δ apart.

$$\Delta = \frac{t_{in,H,k} - t_{in,L,k}}{M}, \qquad \hat{\beta}(t) = \frac{\vec{V}_{H,k}(t) - \vec{V}_{L,k}(t)}{3\Delta} \qquad (6.2)$$

[Figure 6.11: Passgate synchronizer latch outputs for the “perfectly” metastable trajectory]

6.2.2 Analysis Results

With the “perfectly” metastable trajectory V_meta(t), we can now compute β(t), u(t)^T and λ(t). Our choice of ũ^T is guided by the observations in Figure 6.10 and Figure 6.11. Node y3 begins to track master2's output node y2 as slave2 becomes transparent after the second rising edge of the clock. In other words, as metastability resolves in master2, slave2 follows. For this reason, I select ũ^T = (1/√2)(x2 − y2), the difference across master2's cross-coupled pair. This is an example of how the designer states the intended behaviour of the circuit to set up the analysis. By the construction of u(t)^T, this choice of ũ^T implies that at t_eola:

$$u(t_{eola})^T \vec{V}(t_{eola}) = \frac{1}{\sqrt{2}}\left(x_2(t_{eola}) - y_2(t_{eola})\right) \qquad (6.3)$$

[Figure 6.12: Estimate of log10 ||β̂(t)||_2 as the nested bisection simulates forward in time]

With ũ^T selected, we continue the analysis by computing β(t) as outlined in Section 3.2. Figure 6.13 superimposes two summaries of β(t): the one obtained with Equation 6.2 as the nested bisection algorithm progresses, and the result of our AD method. Qualitatively, both exhibit the expected exponential growth and have similar characteristic shapes, but there is otherwise a clear divergence between the two. This divergence motivated the experiments and discussion in Section 4.1 and Section 4.2. The main observation in Figure 6.13 is that log10 ||β(t)||_2 computed via AD grows more slowly than the estimate obtained by numerical differentiation.
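Equation 6.3 and the role of u(t)^T can be made concrete with the sketch below. It first builds ũ from the node ordering of the state vector. It then obtains a time-varying unit weighting u(t) by integrating the adjoint equation dw/dt = −J(t)^T w backward from w(t_eola) = ũ and normalizing w(t). This is one standard way to obtain a direction whose inner product with a perturbation at time t equals that perturbation's projection onto ũ at t_eola, because w(t)^T δV(t) is conserved along the linearized dynamics dδV/dt = J(t) δV. It is an illustration under that assumption, not a transcription of Equations 3.14 and 3.15. The node list NODES, the latch grouping LATCHES, the Jacobian callback and all function names are hypothetical, and explicit Euler is used only for brevity. latch_weights shows one plausible way to turn u(t) into per-latch weights similar to those plotted in Figure 6.14 and Figure 6.15.

```python
import numpy as np

# Hypothetical state ordering for the two flip-flop synchronizer (12 latch nodes).
NODES = ["x0", "y0", "z0", "x1", "y1", "z1", "x2", "y2", "z2", "x3", "y3", "z3"]
LATCHES = {"master1": ["x0", "y0", "z0"], "slave1": ["x1", "y1", "z1"],
           "master2": ["x2", "y2", "z2"], "slave2": ["x3", "y3", "z3"]}

def u_tilde(nodes, pos="x2", neg="y2"):
    """Unit vector selecting (pos - neg) / sqrt(2), e.g. (x2 - y2) / sqrt(2)."""
    u = np.zeros(len(nodes))
    u[nodes.index(pos)] = 1.0 / np.sqrt(2.0)
    u[nodes.index(neg)] = -1.0 / np.sqrt(2.0)
    return u

def adjoint_directions(jac, t_grid, u_end):
    """Integrate dw/dt = -J(t)^T w backward from w(t_grid[-1]) = u_end (explicit Euler).

    Returns u(t_k) = w(t_k) / ||w(t_k)|| for every grid point (shape (T, N)) and ||w(t_k)||.
    """
    w = np.array(u_end, dtype=float)
    us = np.empty((len(t_grid), w.size))
    norms = np.empty(len(t_grid))
    norms[-1] = np.linalg.norm(w)
    us[-1] = w / norms[-1]
    for k in range(len(t_grid) - 1, 0, -1):
        h = t_grid[k] - t_grid[k - 1]
        w = w + h * (jac(t_grid[k]).T @ w)   # one backward step from t_k to t_{k-1}
        norms[k - 1] = np.linalg.norm(w)
        us[k - 1] = w / norms[k - 1]
    return us, norms

def latch_weights(us, nodes=NODES, latches=LATCHES):
    """Share of the unit vector's squared weight on each latch at each time sample."""
    return {name: sum(us[:, nodes.index(n)] ** 2 for n in members)
            for name, members in latches.items()}

# Usage with a constant, diagonal (hypothetical) Jacobian in which y2 regenerates fastest.
rates = np.zeros(len(NODES))
rates[NODES.index("x2")], rates[NODES.index("y2")] = 5e9, 1e10
t_grid = np.linspace(0.0, 1e-9, 20001)
us, norms = adjoint_directions(lambda t: np.diag(rates), t_grid, u_tilde(NODES))
weights = latch_weights(us)
print(us[0][[NODES.index("x2"), NODES.index("y2")]], weights["master2"][0])
```

With a realistic, clock-dependent Jacobian, the per-latch weights would shift from one latch to the next around the clock edges. That shift is the quantitative sense in which metastability "moves".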
Section 4.1 and Section 4.2 describe experiments comparing the two methods. Their results, presented in Chapter 4, support the conclusion that our AD method of computing β(t) is the more accurate of the two.

[Figure 6.13: Comparing the numerical estimate β̂(t) with β(t) computed via AD]

Prior descriptions of metastability in synchronizers often discuss metastability “moving” from one stage to the next. They usually neglect the circuitry between stages, except for a remark that it somehow contributes to T_w. Our analysis instead views the synchronizer as a time-varying linear system. The term u(t)^T identifies which nodes in the synchronizer are most significant at time t. It shows clearly which nodes matter most for metastability resolution by time t_eola, and how those nodes change over time. Thus, u(t)^T demonstrates that at some times the cross-coupled pair of a particular latch is the most critical subcircuit of the synchronizer, while at other times u(t)^T may “point” to another latch. For t near a clock edge, u(t)^T quantifies the impact of the circuitry that couples latches. Figure 6.14 and Figure 6.15 show this time-varying behaviour, with u(t)^T highlighting which nodes of each latch in the passgate synchronizer matter at which time.

Notice in Figure 6.14 that the weight of u(t)^T moves to slave1 only as slave1 becomes opaque. Before that, effectively all of the weight is on master1, because any perturbation on master1 directly influences slave1 and not the other way around. Figure 6.15 shows that slave2 never carries significant weight, because metastability resolves in master2 while slave2 is still transparent.

[Figure 6.14: u(t)^T highlighting which latch is in play in the first flip-flop for all time t during metastability]

u(t)^T gives designers a tool that focuses attention on the parts of the circuit that matter at a particular time t.
This is not to say that a change to any given stage leaves the overall performance unaffected. Rather, u(t)^T acts as a magnifying glass that highlights the order in which nodes matter, which may influence how a designer approaches design improvements. In other words, u(t)^T highlights which nodes of the circuit, at the present time, are most significant in determining the synchronization outcome at time t_crit. When the clock signal is close to 0 or 1, u(t)^T identifies the nodes in the cross-coupled pair of inverters of a particular latch as the most significant. Informally, we can say that the metastability is located at that latch. In the same manner, u(t)^T tracks how metastability propagates through the coupling circuitry and into the next latch when the clock changes. The big advantage of our formulation is that u(t)^T gives a clear, quantitative description of what it means for metastability to “move”.

[Figure 6.15: u(t)^T highlighting which latch is in play in the second flip-flop for all time t during metastability]

Finally, Figure 6.16 shows the instantaneous gain λ(t) of the synchronizer. The clock is plotted as a time reference that reminds the designer what is happening to the circuit state V_meta(t) at a particular time t while reading this new figure of merit. The most obvious feature of Figure 6.16 is that the gain of the synchronizer drops dramatically during the clock transitions. Figure 6.17 shows that in the neighbourhood of the first clock transition, i.e. for t close to t_clk, ρ(t) is responsible for the initial “time-to-voltage” conversion that puts the synchronizer into metastability. The contribution of ρ(t) does not appear in Figure 6.16, but Figure 6.17 shows that it is positive during that time. The next clock transition shows how transferring metastability from one stage to the next degrades performance.
At this transition the instantaneous gain λ(t) does not go to zero, but the performance profile tells us that the design is not particularly efficient at transferring metastability from stage to stage. Synchronizer designers have appreciated that the coupling logic is less effective than the cross-coupled pairs of inverters at resolving metastability. For example, some have suggested that the logic delay of the coupling inverters and passgates should be added to T_w. We are not aware of any prior work that quantifies the impact of the coupling logic.

[Figure 6.16: Instantaneous gain for the passgate synchronizer with a p:n ratio of 1:1]

At this point one might ask why we want an “efficient” transfer of metastability from one stage to the next. From Equation 3.23 we have:

$$P\{\mathrm{failure}\} = \frac{\Delta V_{eola}}{g(t_{homo})\,P_{clk}} \exp\left(-\int_{t_{homo}}^{t} \lambda(s)\,ds\right)$$

Thus, larger values of λ(t) lead to smaller probabilities of synchronization failure. The dips in λ(t) show a decrease in the rate of divergence from the metastable condition at the clock edges. If the clock period is sufficiently long, such dips have little overall impact. At higher clock frequencies, however, these losses of gain can be seriously detrimental and must be considered to obtain a robust synchronizer.

[Figure 6.17: Non-homogeneous term ρ(t) for the passgate synchronizer with a p:n ratio of 1:1]

We use g(t) and λ(t) as the characteristics of the benchmark design against which the variants are compared.

6.3 Passgate Synchronizer Variants

As metastability in synchronizers has become better understood over the years, designers have proposed improvements to synchronizer designs that aim to reduce failures due to metastability. Examples can be found in [28][56][3]. Without tools to quantify synchronizer failure probabilities, these “enhancements” have remained largely untested.

In this section I explore three design variants and compare them against the unmodified benchmark design of Figure 6.7.
The first design variant investigates the impact of scan circuitry on the benchmark synchronizer. The second investigates whether applying a large perturbation to one of the nodes of a cross-coupled pair of inverters can reduce the failure probability. Lastly, I investigate the impact of modifying the coupling circuitry between stages within the synchronizer.

For each design variant, I compare its instantaneous gain λ(t) with that of the benchmark design and compute the log-ratio of the gains:

$$\log_{10}\left(\frac{g_{Variant}(t)}{g_{Benchmark}(t)}\right)$$

The log-ratio gives a clear comparison between the benchmark and a particular variant and identifies which design has the lower failure probability. If the log-ratio is positive, the variant outperforms the benchmark. If it is negative, the benchmark outperforms the variant. If it is zero, the two perform identically.

6.3.1 Synchronizer with Simulated Scan Circuitry

To improve the testability of clock-domain-crossing circuits, synchronizers are usually included in test scan chains. At the circuit level, this involves adding multiplexors, i.e. passgates, that select whether the latches are configured in “scan mode” or “mission mode” (i.e. normal operation). We simulate the extra capacitive load by adding the capacitance equivalent of two inverters to nodes y1 and y3 of the schematic in Figure 6.7. We expect the added scan circuitry to degrade the synchronizer's instantaneous gain profile λ(t), because adding capacitance to a node means that more current is required to change its voltage.

Figure 6.18 compares the instantaneous gains λ(t) of the benchmark and the scan-loaded synchronizer. As expected, the scan-loaded synchronizer performs worse, and Figure 6.19 shows that its failure probability is greater than that of the unmodified benchmark design.
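The sketch below shows how the log-ratio and Equation 3.23 follow from sampled λ(t) curves. It assumes, consistently with Equation 3.23 as quoted above, that g(t) = g(t_homo)·exp(∫_{t_homo}^{t} λ(s) ds), with t_homo taken as the first sample. Under that assumption, and with equal g(t_homo), P_variant/P_benchmark = 10^(−log-ratio), so a positive log-ratio means a lower failure probability. The λ curves, the Gaussian-shaped clock-edge dips and all function names are hypothetical, and the running integral uses the trapezoidal rule.

```python
import numpy as np

def running_integral(y, t):
    """Trapezoidal running integral of y(t) from t[0]; same length as t."""
    out = np.zeros(len(t))
    out[1:] = np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(t))
    return out

def log10_gain(lam, t, g_homo):
    """log10 g(t), assuming g(t) = g_homo * exp(integral of lam from t[0] = t_homo to t)."""
    return np.log10(g_homo) + running_integral(lam, t) / np.log(10.0)

def failure_probability(lam, t, g_homo, dv_eola, p_clk):
    """Equation 3.23 evaluated at the last sample of t."""
    return dv_eola / (g_homo * p_clk) * np.exp(-running_integral(lam, t)[-1])

def log_ratio(lam_var, lam_bench, t, g_homo_var=1.0, g_homo_bench=1.0):
    """log10(g_variant(t) / g_benchmark(t)); a positive value favours the variant."""
    return log10_gain(lam_var, t, g_homo_var) - log10_gain(lam_bench, t, g_homo_bench)

# Hypothetical curves: a constant regeneration rate with gain dips at two clock edges.
t = np.linspace(0.0, 7.5e-10, 7501)
lam_bench = np.full_like(t, 5e10)
for edge in (2.5e-10, 5.0e-10):
    lam_bench -= 4e10 * np.exp(-((t - edge) / 10e-12) ** 2)
lam_var = lam_bench - 2e10 * np.exp(-((t - 4.4e-10) / 10e-12) ** 2)  # one extra dip

lr = log_ratio(lam_var, lam_bench, t)
p_b = failure_probability(lam_bench, t, 1.0, 0.1, 1e-9)
p_v = failure_probability(lam_var, t, 1.0, 0.1, 1e-9)
print(f"final log-ratio {lr[-1]:+.3f}, P_variant / P_benchmark = {p_v / p_b:.3f}")
```

Working in log10 keeps the comparison readable even when the gains themselves span many orders of magnitude.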
The scan-loading experiment gives an intuitive feel for how to read the instantaneous gain summary λ(t) of a synchronizer design. It also suggests that careful placement of scan circuitry may preserve synchronizer performance, which [3] proposes as a synchronizer improvement technique.

[Figure 6.18: Instantaneous gain comparison between the passgate synchronizer without and with scan loading]
[Figure 6.19: log-ratio of the gain between the scan-loaded and benchmark designs, log10(g_Scan(t)/g_Benchmark(t))]

6.3.2 Synchronizer Kicking

The first bit of folklore that we test is the claim that synchronizer performance can be improved by applying a major perturbation during the settling time to “knock” the synchronizer out of metastability. The rationale is that, if the synchronizer is metastable, a large perturbation applied to one of the nodes of the cross-coupled pair will drive it out of metastability, thus “solving” the metastability failure mode. I call this idea kicking the synchronizer. An investigation of kicking a synchronizer was presented in [42].

The perturbation is applied as part of the simulation, in the middle of the clock period during which master1 is opaque and slave1 is transparent. Specifically, I stop the integration of the circuit at the time the perturbation is to be applied (i.e. in the middle of the clock period), modify the value of x1 (and correspondingly z1, which would be affected in the same way), and then resume integration. This analysis ignores the extra capacitance that a real “kick” circuit would introduce, so I call this a “magic kick”. This idealization clearly favours the kicking approach. In this example, the kick occurs at t_kick ≈ 4.4 × 10^-10 s.
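The “magic kick” procedure (stop the integration at t_kick, modify the state, resume) and its interaction with bisection can be reproduced on a one-dimensional toy model. In the sketch below, the imbalance v = x1 − y1 of a single latch obeys dv/dt = v/TAU and is integrated with explicit Euler. The kick adds a fixed amount to v, and bisection searches for the initial imbalance that is still balanced at t_crit. The constants and function names are hypothetical; only t_kick ≈ 4.4 × 10^-10 s and t_crit = 7.5 × 10^-10 s are taken from the text. The bisection ends up choosing an initial imbalance whose trajectory reaches −KICK just before t_kick, so that the kick restores balance. This mirrors the behaviour of the nested bisection trajectories described next.

```python
# Toy model (hypothetical values): imbalance v = x1 - y1 of one latch, dv/dt = v / TAU.
TAU = 20e-12      # regeneration time constant [s]
T_KICK = 4.4e-10  # kick time used in the text [s]
T_CRIT = 7.5e-10  # critical time used in the text [s]
KICK = 0.3        # size of the idealized kick added to the imbalance [V]
H = 1e-13         # nominal integration step [s]

def integrate(v, t0, t1):
    """Explicit-Euler integration of dv/dt = v / TAU from t0 to t1."""
    n = max(1, round((t1 - t0) / H))
    step = (t1 - t0) / n
    for _ in range(n):
        v += step * v / TAU
    return v

def run_with_kick(v_start):
    """Stop at T_KICK, apply the 'magic kick' (no extra capacitance), resume to T_CRIT."""
    v = integrate(v_start, 0.0, T_KICK)
    v += KICK
    return integrate(v, T_KICK, T_CRIT)

def metastable_start(lo=-1.0, hi=1.0, iters=60):
    """Bisection on the initial imbalance for the run that is still balanced at T_CRIT."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if run_with_kick(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

v_star = metastable_start()
print(f"initial imbalance {v_star:.3e} V, imbalance just before the kick "
      f"{integrate(v_star, 0.0, T_KICK):+.4f} V")
```

The kick does not remove the metastable trajectory. It only moves it, and the latch must travel far from balance before t_kick for the kick to land it back at balance.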
Figure 6.20 shows the trajectories produced by the nested bisection algorithm for the kicked synchronizer. Figure 6.21 shows the resulting “perfectly” metastable trajectory, which quantifies the metastability failures of the kicked synchronizer. In this trajectory, the first latch diverges from its balance point before the kick in precisely such a way that the kick returns the latch to nearly perfect balance. In essence, the nested bisection algorithm “anticipates” the kick and places the synchronizer in a state such that, after t_kick, the circuit state is back where V_meta(t) would have been had the kick not occurred.

[Figure 6.20: Nested bisection simulation runs for the synchronizer with kick]

Figure 6.21 examines the voltage nodes x1 and y1 of the cross-coupled pair of inverters in slave1 to better observe the effect of this large perturbation. To evaluate the effectiveness of the kicked synchronizer, it is compared against the benchmark design. I make several observations from these plots:

• A synchronizer must be able to output either a logical 1 or 0 at t_crit. Perturbing the synchronizer at time t_kick with the goal of forcing a deterministic outcome therefore defeats the intended function of a synchronizer.
• Because the synchronizer must be able to output either 0 or 1 at t_crit, by continuity there must be a metastable trajectory, for any synchronizer design with or without the kick, that produces any value between 0 and 1 at time t_crit. Hence the nested bisection algorithm will find a trajectory that remains metastable across the kick.
• Kicking the synchronizer then becomes an obviously bad idea. For the synchronizer to be metastable across the kick, it must nearly resolve before the kick so that the kick knocks it back into metastability, which is exactly the opposite of what is intended.

[Figure 6.21: Synchronizer kicking results]
[Figure 6.22: Instantaneous gain comparison between the kicked and non-kicked synchronizer]
[Figure 6.23: log-ratio of the gain between the kick design and the benchmark, log10(g_Kick(t)/g_Benchmark(t))]

The comparison of the instantaneous gain profiles of the kicked and benchmark designs in Figure 6.22 immediately shows that kicking the synchronizer is a bad idea. There is a large gain penalty at the time of the kick that harms the synchronizer's performance. Figure 6.23 shows that the benchmark has the better gain, which implies that the kicked synchronizer has a smaller MTBF. Figure 6.22 and Figure 6.23 also show that the kicked variant has a smaller gain penalty than the benchmark at the following clock edge. However, this later gain advantage does not make up for the penalty incurred by the “magic kick”.

6.3.3 Synchronizer with Offset

Another idea that designers have contemplated is to introduce an offset in the coupling inverter between synchronizer stages. Here, I introduce an offset by changing the p:n ratio of the coupling inverter between nodes x0 and y0, i.e. the inverter that connects master1 and slave1 of the first flip-flop, while keeping the total transistor width unchanged. We assume that this roughly preserves the overall capacitance. The rationale is that the offset shifts the metastable point at the output of the coupling inverter away from the nominal metastable level. As a result, the signal on node y0 must swing further to maintain metastability as slave1 transitions from transparent to opaque.
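How far the p:n ratio moves the coupling inverter's switching point can be estimated with the textbook long-channel (square-law) model. With both devices saturated at V_in = V_out = V_M, equating the drain currents gives V_M = (V_tn + r(V_DD − |V_tp|))/(1 + r), where r = sqrt(k_p/k_n). The sketch assumes equal channel lengths, so r depends only on the width ratio and the mobility ratio. V_DD, the thresholds and μn/μp are hypothetical values rather than the process used in this thesis, and short-channel effects are ignored. In this model V_M depends only on W_p/W_n, not on W_p + W_n. Keeping the total width fixed, as in the experiment, therefore moves the switching point while leaving the total transistor width unchanged.

```python
import math

# Hypothetical long-channel parameters; not the process used in the thesis.
VDD = 1.0                   # supply voltage [V]
VTN, VTP_ABS = 0.35, 0.35   # NMOS threshold and |PMOS threshold| [V]
MOBILITY_RATIO = 2.0        # mu_n / mu_p

def switching_threshold(wp_over_wn):
    """Square-law inverter switching threshold V_M for a p:n width ratio (equal L)."""
    r = math.sqrt(wp_over_wn / MOBILITY_RATIO)   # r = sqrt(k_p / k_n)
    return (VTN + r * (VDD - VTP_ABS)) / (1.0 + r)

def split_width(total, wp_over_wn):
    """Split a fixed total width W_p + W_n = total according to the p:n ratio."""
    wn = total / (1.0 + wp_over_wn)
    return wp_over_wn * wn, wn

for ratio in (0.5, 1.0, 2.0, 4.0):
    wp, wn = split_width(2.0, ratio)
    print(f"p:n = {ratio:3.1f}  Wp = {wp:.3f}  Wn = {wn:.3f}  "
          f"V_M = {switching_threshold(ratio):.3f} V")
```

A larger p:n ratio raises V_M. Real devices also change their capacitance with operating point, so a fixed total width does not guarantee a fixed load.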
The intent is that the offset reduces the probability that metastability propagates through to the next stage. Our hypothesis is that this idea will instead hurt the synchronizer's performance, for reasons similar to those observed with the kicked synchronizer: the latch whose state is diverging from its balance point experiences a decrease in gain. The nested bisection trajectories in Figure 6.24 show the effects of introducing the offset in the coupling inverter between master1 and slave1. As a result, the output y1 of slave1 is nearly at gnd, which illustrates how dramatically the offset has shifted the metastable voltage level at the output of that stage. Despite appearing to sit at gnd, however, this signal is not stable. To see this, Figure 6.25 shows that node x0, which drives node y0, is metastable. Node y0 is the input to slave1, and it too is metastable, but at a higher voltage level because of the offset introduced by the inverter. Because of this higher voltage, x1 is closer to Vdd and drives node y1 to near gnd through slave1's output inverter, which has no offset. Thus x0's metastability propagates to the output y1 in a way that makes y1 appear to be near a logical 0. This illustrates another potential trap for designers: a signal may appear to have logically defined behaviour but is in fact subject to change at any moment as metastability resolves.

[Figure 6.24: Nested bisection trajectories for the offset synchronizer]
[Figure 6.25: u(t)^T highlighting which latch is in play in the first flip-flop for all time t during metastability]

The instantaneous gain profiles in Figure 6.27 and the log-gain ratio in Figure 6.28 produce unexpected results. Figure 6.28 shows that the offset design has a brief but strong advantage over the benchmark when slave1 transitions from transparent to opaque.
We also observe that the offset synchronizer outperforms the benchmark on the interval between the first rising and falling edges of the clock. Furthermore, the offset design gains another small advantage at the last rising edge. This suggests that introducing the offset has changed the circuit in ways that were not anticipated. We investigate this design variant in greater detail in the following section using the decomposition tool developed in Section 3.7.

[Figure 6.26: u(t)^T highlighting which latch is in play in the second flip-flop for all time t during metastability]
[Figure 6.27: Instantaneous gain comparison between the offset and non-offset synchronizer]
[Figure 6.28: log-ratio of the gain between the offset design and the benchmark, log10(g_Offset(t)/g_Benchmark(t))]

6.4 Investigating the Synchronizer with Offset

The results of the previous section suggest that adding an offset to the coupling inverter slightly improves the two flip-flop synchronizer. Figure 6.28 shows a significant gain contribution in the offset synchronizer relative to the benchmark when slave1 becomes opaque and master2 becomes transparent (i.e. at the falling edge of the clock). Figure 6.27 does not provide an obvious explanation for the difference between the two designs, so we use the decomposition tool of Section 3.7 to investigate further.

The two flip-flop design muddles the gain decomposition comparison between the offset and benchmark synchronizers. We noted previously that ρ(t) is responsible for launching the synchronizer into metastability, i.e. for initially putting master1 into metastability. In Section 6.3.3, I changed the p:n ratio of the inverter that connects master1 to slave1, which is adjacent to the latch that ρ(t) launches into metastability.
To eliminate the possibility that ρ(t) produces different startup conditions in the two designs, we analyze the three flip-flop passgate synchronizer shown in Figure 6.29 and introduce the offset in the coupling inverter inv9, which connects master2 and slave2. This fully isolates the differences between the two designs by removing any differences introduced at the initial onset of metastability. Figure 6.30 shows that the gains of the offset and benchmark designs are identical up to the point where master2 becomes the subcircuit that matters for metastability resolution. The characteristics in Figure 6.30 and Figure 6.28 are also very similar. This first qualitative comparison suggests that introducing an offset has consistent effects across the two designs.

[Figure 6.29: 3 flip-flop passgate synchronizer (schematic of latches Master1, Slave1, Master2, Slave2, Master3 and Slave3, with passgates pg0 to pg11, inverters inv0 to inv18, and nodes x0 to x5, y0 to y5 and z0 to z5)]

For each component, I show the following for both the benchmark and offset synchronizers:

• The voltages at the component's input and output nodes, together with the clock signal.
• The importance of those nodes to the overall circuit, as given by u(t)^T.
• The gain contribution of the component, as described in Section 3.7.
• The log-ratio comparison log10(g_Offset(t)/g_Benchmark(t)).

In each subplot I superimpose u(t)^T for the nodes in question to show how much influence those nodes have on metastability resolution in the circuit as a whole.

By considering the component-wise contributions to the gain, we can see how each part of the synchronizer contributes differently to the overall gain of the two designs. We observed many second-order effects due to the change in operating points caused by the offset and by the non-linearities of the circuits themselves. Some of these increase the gain of the offset design relative to the benchmark, while others decrease it. The combined effect is a slight increase in total gain that cannot be attributed to any single cause. Below we describe the impact of the offset on each affected inverter and passgate.

[Figure 6.30: Three flip-flop offset vs. benchmark log-ratio, log10(g_Offset(t)/g_Benchmark(t))]

6.4.1 The Offset Coupling Inverter and Passgate

The immediate observation from Figure 6.30 (and similarly from Figure 6.28) is a sharp rise in the effectiveness of the offset design, immediately followed by a sharp drop that settles to a net advantage for the offset design. We home in on this behaviour by starting with the component that has changed: inv9. Figure 6.31 shows a sharp rise in gain at the clock edge of interest, followed by a sharp fall. As seen in Figure 6.30, the total gain contribution of inv9 in the offset synchronizer settles to a value below that of the benchmark. Thus we must look at other components to find the cause of the offset synchronizer's advantage. We also note that u(t)^T weights inv9's output y2 only for a brief period of time.

[Figure 6.31: Three flip-flop component analysis: inv9]

We next investigate pg6, the passgate that connects master2 and slave2. Figure 6.32 shows that this passgate contributes a clear advantage to the offset design relative to the benchmark. Node x3 matters throughout the time between clock edges.
As with inv9, node y2 receives a small but non-zero weight at the rising clock edge around t = 12.5 × 10^-10 s. However, no appreciable change in gain is observed.

[Figure 6.32: Three flip-flop component analysis: pg6]

We can further decompose pg6 into its individual transistors to determine which one is responsible for the improvement. Figure 6.33 and Figure 6.34 show that the PMOS device contributes negatively and insignificantly in both the offset and benchmark designs (the difference between them is on the order of 10^-3). The NMOS device accounts for the gain observed in Figure 6.32. This happens because, while the clock signal is changing, the circuit is briefly configured as the common-gate amplifier shown in Figure 6.35. In more detail, node y2 drops earlier in the offset design. This increases the gate-to-source voltage of the NMOS device in pg6 and thus increases its transconductance gm. In other words, the drain-to-source current of this transistor is more sensitive to the voltage on node y2 in the offset design than in the benchmark. This is how the passgate acts as an active device and makes a positive contribution to the overall gain in the offset design but not in the benchmark.

[Figure 6.33: Three flip-flop component analysis: pg6 NMOS]
[Figure 6.34: Three flip-flop component analysis: pg6 PMOS]
[Figure 6.35: Common-gate amplifier circuit with NMOS]

6.4.2 The Master2 Cross-Coupled Loop

The plots for inv7 (Figure 6.37) show a higher instantaneous gain for the offset design than for the benchmark while the master2 latch is opaque. We conjectured that the offset design has less total capacitance on node y2 than the benchmark, even though the total transistor width of inv9 is the same in both designs. Figure 6.36 confirms this conjecture.
We suspect that this is a consequence of the non-linear gate capacitance of the MOSFETs, but further experiments are needed to test this hypothesis. At the clock edge where slave2 goes opaque, we see a sharp loss of gain for inverter inv7, because the master2 latch exits metastability earlier in the offset design than in the benchmark. The combined effect of these two phenomena is that inv7 has a lower overall gain in the offset design than in the benchmark.

[Figure 6.36: Comparison of effective capacitance on node y2]
[Figure 6.37: Three flip-flop component analysis: inv7]

Figure 6.38 and Figure 6.39 show the contributions of inv8 and pg5. Both favour the offset synchronizer over the benchmark, and the differences can be attributed to their u(t) functions. Inverter inv8 makes a positive contribution to the overall gain, and u(t) shows that inv8 remains relevant slightly longer in the offset design. Conversely, passgate pg5 reduces the gain, and its u(t) term shows that it is relevant for slightly less time than in the benchmark design. We have not yet found a circuit-design-friendly explanation for why the u(t) functions differ: for now, they just do.

[Figure 6.38: Three flip-flop component analysis: inv8]

6.4.3 The Slave2 Cross-Coupled Loop

Finally, we investigate the components of the slave2 cross-coupled loop, inv10, inv11 and pg7, with the corresponding plots in Figure 6.40, Figure 6.41 and Figure 6.42. All three components have a net negative gain contribution in the offset synchronizer at the clock transition. Of these, inv10 contributes most to the offset synchronizer's lower gain relative to the benchmark.

6.4.4 Summary of Results

Table 6.1 summarizes our results.
Taken together, inv9 (the offset coupling inverter), pg6, and inv8 and pg5 of the master2 cross-coupled loop contribute positively to the offset design at the falling edge of the clock. In contrast, inv7 and all the components of the slave2 cross-coupled loop contribute negatively. The net result is a wild variation in gain between the offset and benchmark designs at the clock transition, which settles to a net gain favouring the offset design. The net gain observed for the two flip-flop design in Figure 6.28 is a factor of roughly 10^0.02 over the benchmark. This is small compared with the scan-loaded synchronizer (a performance decrease of over 10^3.5) and the kicked design (a performance decrease of about 10^0.6).

[Figure 6.39: Three flip-flop component analysis: pg5]
[Figure 6.40: Three flip-flop component analysis: inv10]
[Figure 6.41: Three flip-flop component analysis: inv11]
[Figure 6.42: Three flip-flop component analysis: pg7]

Because of the difference in effective capacitance on node y2 shown in Figure 6.36, the offset design accumulates gain over the clock period during which the master stage associated with the offset coupling inverter is metastable. Figure 6.28 and Figure 6.30 show that the offset design comes out only marginally ahead, despite the significant gain from the passgate acting as an amplifier at the falling clock transition. We believe that if the clock period were shortened sufficiently, the offset design would perform worse than the benchmark. Conversely, a longer clock period would enlarge the offset design's improvement.

Table 6.1: Summary of offset and benchmark gain contributions

Stage     Component                     log10(g_Offset(t_eola)/g_Benchmark(t_eola))
master2   inv7                          -0.0381
master2   inv8                           0.0738
master2   pg5                            0.0457
master2   inv9                          -0.0922
slave2    pg6                            0.1470
slave2    inv10                         -0.0623
slave2    inv11                         -0.0381
slave2    pg7                           -0.0339
          Σ Components                   0.0019
          Overall 3 flip-flop design     0.0269
          Difference                    -0.025
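The Σ Components and Difference rows of Table 6.1 follow from simple arithmetic on the per-component entries. This arithmetic treats the per-component log-ratios as additive, as the table's Σ row does. The short script below reproduces those two rows and converts each log-ratio into a multiplicative gain factor. The dictionary layout and variable names are only an illustration.

```python
# Per-component log10(g_Offset(t_eola) / g_Benchmark(t_eola)) values from Table 6.1.
components = {
    "inv7": -0.0381, "inv8": 0.0738, "pg5": 0.0457, "inv9": -0.0922,   # master2 rows
    "pg6": 0.1470, "inv10": -0.0623, "inv11": -0.0381, "pg7": -0.0339,  # slave2 rows
}
overall = 0.0269  # "Overall 3 flip-flop design" row

total = sum(components.values())
difference = total - overall

# Rank components from most helpful to most harmful for the offset design.
for name, value in sorted(components.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:6s} {value:+.4f}  (gain factor {10 ** value:.4f})")
print(f"sum of components {total:+.4f}, overall {overall:+.4f}, difference {difference:+.4f}")
```

Converting to factors makes the scale clear: the largest single contribution, from pg6, is a factor of 10^0.147 (about 1.4), while the components nearly cancel in total.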
The sum of the per-component log-gains (0.0019) does not quite match the overall difference between the two designs (0.0269). This suggests looking deeper into the analysis and the circuits for an explanation. Alternatively, perhaps we should simply change the clock period to obtain a clear distinction between the two designs!

In the course of this analysis, I made two distinct discoveries. The first is that, although designers generally think of passgates as passive devices, introducing the offset makes the passgate a small-signal amplifier for a brief period of time. The second concerns our assumption that introducing the offset in inv9 (and in the coupling inverter of master1 in the two flip-flop design) while keeping the total transistor width fixed would preserve the overall capacitance of the circuit. In reality, this assumption was violated, which brought to light subtle secondary effects that we had not considered. Our tool allowed us to discover and explain the difference in the passgate's behaviour between the two designs, and it exposed that our underlying assumption about capacitance was incorrect.

We have shown how our new tool can be used systematically to explain the big-picture behaviour of the synchronizer. However, other aspects of Figure 6.30 remain a mystery and motivate future investigation. One such unexplained anomaly appears in Figure 6.30 at the last rising edge of the clock, where the offset design appears to gain a small step in performance that we have not yet explained. Given what we have already discovered, we believe these subtle behaviours are further secondary effects of the non-linear capacitance model we employ.

6.4.5 Offset Investigation Remarks

The tools developed in Chapter 3 have been fully exercised to uncover why the offset produces the observed behaviour in the passgate synchronizer at the clock transition. The reason is subtle and, to my knowledge, has not been argued before.
The improvement of the offset design over the benchmark is repeatable but small. This leads me to believe that the benchmark synchronizer, with all transistors sized at a p:n ratio of 1:1, is potentially not optimal at its current operating condition. The way this new set of analysis tools presents the information allowed us to narrow down systematically the reason for the major discrepancy between the benchmark and offset designs, which led to a new discovery: a circuit component that designers often perceive as passive (in this case, the passgate) behaves as an active element for a brief moment. With some further investigation of textbook circuit designs, this discovery explained the observed behaviour. We also discovered that a small difference in the effective capacitance on the node between the cross-coupled loop and the offset coupling inverter accounts for the accumulation of gain in the offset design relative to the benchmark. We believe that this small accumulation of gain is key to the offset design outperforming the benchmark.

Recall that in the kicked synchronizer variant shown in Figure 6.23, we observed an increase in the log-gain ratio at the last rising edge of the clock (not enough to overcome the harm done by the kick). I believe that this boost in gain at the last clock transition is also due to the passgate acting as an amplifier: as the synchronizer recovers from the kick, its output voltage is returning to the value needed to maintain metastability, so at the clock transition the small-signal amplifier present in the circuit provides a boost.

My analysis tool has called into question common intuitive arguments that designers make about synchronizers and metastability. Some of these intuitive notions have been confirmed, while others have been debunked with an accompanying explanation.
The conclusion that can be drawn is that the set of tools outlined in Chapter 3 is useful for synchronizer analysis.

Our analysis has not accounted for all the outstanding anomalies. In Figure 6.30 and Figure 6.28 we observe a small improvement in the offset design at the last clock transition that we have not accounted for. We also assumed that keeping the total transistor width constant in both designs would yield the same capacitance matrix for the two designs; Figure 6.36 shows that this assumption is violated. We acknowledge that the full capacitance model described in Section 5.3.2 is non-linear and may account for the other unexplained discrepancies between the designs.

Further analysis is required to understand the trade-offs of sizing transistors differently. However, our new tools not only provide a mathematical model to quantify the synchronizer's failure probability, but can also be used to quantify performance characteristics and compare design ideas. This has led to new discoveries about synchronizer behaviour and provides a foundation for synchronizer optimization.

Chapter 7
Conclusion and Future Work

7.1 Summary

This thesis has presented novel methods for the analysis of synchronizer circuits. We started with Yang and Greenstreet's nested bisection algorithm, which they used to compute failure probabilities [53] and produce failure trajectories [54]. In Chapter 3, we showed (see Equation 3.8) how to compute $\beta(t) = \frac{\partial}{\partial t_{in}}\vec{V}(t)$, where $t_{in}$ is the time of an input transition of the synchronizer and $\vec{V}(t)$ is the synchronizer state at time $t$. We then presented a computation for $u(t)$, a unit vector such that perturbations of the synchronizer state in direction $u(t)$ at time $t$ map to perturbations in the direction of synchronizer resolution at the end of the synchronization process (see Equation 3.14 and Equation 3.15).
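
The roles of $\beta(t)$ and $u(t)$ can be illustrated on a toy two-state linear system whose Jacobian $J$ is constant and diagonal, so that the transition matrix $S(t, t_{eola}) = e^{J(t_{eola}-t)}$ has a closed form. This is a minimal sketch under stated assumptions, not the thesis implementation (which works on full circuit models): the rates `A_RATE` and `B_RATE`, the deadline `T_EOLA`, the direction `U_TILDE` ($\tilde{u}$) and the initial `BETA0` are all hypothetical, and $\partial f/\partial t_{in}$ is assumed to be zero after the input transition, so $\rho(t) = 0$. The sketch computes the cumulative gain $g(t) = u(t)^T\beta(t)$ (Equation 3.16) and checks numerically that $\dot{g}(t) = \lambda(t)\,g(t)$ with $\lambda(t) = u(t)^T J u(t)$, as derived in Appendix A.2.

```python
import math

# Toy system dx/dt = J x with a constant, diagonal Jacobian J = diag(A_RATE, B_RATE).
# All numbers are hypothetical: one unstable (regenerative) mode and one stable mode.
A_RATE, B_RATE = 2.0, -1.0
T_EOLA = 3.0          # hypothetical deadline t_eola
U_TILDE = (0.6, 0.8)  # hypothetical unit vector along the resolution direction at t_eola
BETA0 = (1.0, 1.0)    # hypothetical beta(0), just after the input transition at t = 0


def beta(t):
    """beta(t) = exp(J t) beta(0): solves d/dt beta = J beta when df/dt_in = 0."""
    return (math.exp(A_RATE * t) * BETA0[0], math.exp(B_RATE * t) * BETA0[1])


def u(t):
    """u(t) = S(t, t_eola)^T u_tilde / ||u_tilde^T S(t, t_eola)||_2, S = exp(J (t_eola - t))."""
    dt = T_EOLA - t
    w = (math.exp(A_RATE * dt) * U_TILDE[0], math.exp(B_RATE * dt) * U_TILDE[1])
    norm = math.hypot(w[0], w[1])
    return (w[0] / norm, w[1] / norm)


def g(t):
    """Cumulative gain g(t) = u(t)^T beta(t)."""
    ut, bt = u(t), beta(t)
    return ut[0] * bt[0] + ut[1] * bt[1]


def lam(t):
    """Instantaneous gain lambda(t) = u(t)^T J u(t) for the diagonal J."""
    ut = u(t)
    return A_RATE * ut[0] ** 2 + B_RATE * ut[1] ** 2


# A centered-difference estimate of dg/dt should match lambda(t) * g(t) when rho = 0.
h = 1e-5
for t in (0.5, 1.5, 2.5):
    dg_dt = (g(t + h) - g(t - h)) / (2 * h)
    print(f"t = {t}: dg/dt ~ {dg_dt:.6f}, lambda * g = {lam(t) * g(t):.6f}")
```

In this toy case $g(t)$ grows exactly as fast as the unstable mode dominates $u(t)$, which is the intuition behind calling $\lambda(t)$ an instantaneous gain.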
Intuitively, $\beta(t)$ connects input transitions to the circuit state at time $t$, and $u(t)$ connects the circuit's state at time $t$ to the final state at the deadline for synchronization. From these, we derive a notion of cumulative gain, $g(t) = u(t)^T\beta(t)$, that describes how much "progress" the circuit has made in resolving metastability at time $t$. Crucially, $g(t)$ is a scalar, and we derive a simple, first-order, linear ODE for $g(t)$ (see Equation 3.16):

$$\frac{d}{dt} g(t) = \lambda(t)\, g(t) + \rho(t)$$

where $\lambda(t)$ is the "instantaneous gain" of the synchronizer, and the inhomogeneous component $\rho(t)$ describes the initial time-to-voltage conversion of the synchronizer. In terms of the quantities above, $\lambda(t) = u(t)^T J_f(t)\, u(t)$ and $\rho(t) = u(t)^T \frac{\partial f(t)}{\partial t_{in}}$, where $J_f$ is the Jacobian of the circuit's time-derivative function $f$ (see Appendix A.2). For real-world synchronizers, $\rho(t)$ rapidly converges to zero after the clock edge that samples the data input.

The formulation for $g(t)$ enables a detailed comparison of synchronizer designs. Whereas the methods from [53, 54] allow a designer to compare failure probabilities (and thus MTBFs) of synchronizer designs, $g(t)$ lets a designer see when in the synchronization process the differences arise. This added information can provide valuable insight into the trade-offs in the circuit designs.

Furthermore, the standard modified-nodal-analysis approach to circuit modeling naturally lends itself to decomposing the time-derivative function for the circuit state, and the Jacobian of this derivative, into a sum of contributions from individual components, e.g. transistors, inverters, and logic gates. As described in Section 3.7, this allows our tool to identify both where performance differences arise (i.e. in which sub-circuit) and when. In Chapter 6, I analyzed several synchronizer designs and showed how the new methods can be used to understand their performance characteristics.

To apply the methods from Chapter 3 to the circuits in Chapter 6, we addressed two main issues. Our implementation makes extensive use of automatic differentiation (AD).
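
To illustrate what AD computes when it is applied to a numerical integrator, the sketch below pushes a forward-mode dual number through a forward-Euler integration of the scalar ODE $\dot{x} = -w x$ and compares the resulting $\partial x(T)/\partial w$ with a centered difference. The `Dual` class, `euler` function, model and step sizes are hypothetical choices for this example, not the AD tooling or the test models used in the thesis. Forward-mode AD differentiates the discretized integrator itself, so it reproduces the exact derivative of the Euler map $x_N = x_0(1 - h w)^N$ up to round-off, whereas the centered difference adds truncation and cancellation error of its own.

```python
import math


class Dual:
    """Minimal forward-mode AD number val + der*eps with eps**2 = 0 (illustrative only)."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    @staticmethod
    def _lift(other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule: (a + a' eps)(b + b' eps) = ab + (a' b + a b') eps.
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)

    __rmul__ = __mul__

    def __neg__(self):
        return Dual(-self.val, -self.der)


def euler(w, x0=1.0, t_end=1.0, steps=100):
    """Forward-Euler integration of dx/dt = -w x; accepts a float or a Dual for w."""
    h = t_end / steps
    x = x0
    for _ in range(steps):
        x = x + h * (-w * x)
    return x


W0, STEPS, T_END = 2.0, 100, 1.0
H = T_END / STEPS

# AD: seed dw/dw = 1 and read the derivative off the result.
ad = euler(Dual(W0, 1.0), t_end=T_END, steps=STEPS).der

# Centered difference on the integrator output.
delta = 1e-6
cd = (euler(W0 + delta, t_end=T_END, steps=STEPS)
      - euler(W0 - delta, t_end=T_END, steps=STEPS)) / (2 * delta)

# Exact derivative of the discrete Euler map (x0 = 1) and of the exact solution exp(-w T).
exact_discrete = -STEPS * H * (1 - H * W0) ** (STEPS - 1)
exact_continuous = -T_END * math.exp(-W0 * T_END)

print(f"AD error vs Euler map:  {abs(ad - exact_discrete):.2e}")
print(f"CD error vs Euler map:  {abs(cd - exact_discrete):.2e}")
print(f"discretization gap:     {abs(exact_discrete - exact_continuous):.2e}")
```

Both AD and CD estimate the derivative of the discretized integrator rather than of the exact ODE solution, so the discretization gap printed last is shared by both methods.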
Chapter 4 compares the accuracy of AD with numerical approximation by centered differencing (CD) for a simple model with a fixed-point attractor and for a system with switched dynamics that behaves like a synchronizer. Unlike real-world circuit models, both models in Chapter 4 were formulated to allow closed-form solutions. For both models, AD was more accurate overall than CD, with a modest advantage for the system with a fixed-point attractor and a clear advantage for the "synchronizer". While this is not a rigorous proof, these examples strongly suggest that AD is advantageous for problems like the ones in this dissertation, where we differentiate the result of a numerical approximation algorithm (in this case, integration) with respect to one or more of its parameters. Further investigation is warranted, but the current results justify our use of AD.

Implementing our synchronization analysis methods requires circuit models that are amenable to AD. To that end, in Chapter 5, I implemented an AD-friendly version of the MVS MOSFET model [24]. In the process, I replaced if-statements with algebraically equivalent, differentiable formulas, and I fixed an error in the MVS root-solver that determines the virtual source and drain voltages. Not surprisingly, model-evaluation time dominated the run-time of the metastability analysis, so I developed a simplified EKV model that provides a tolerable approximation of the more detailed MVS or BSIM models while enabling much faster execution of my metastability analysis algorithms.

I tested my approach to metastability analysis on real synchronizer designs in Chapter 6. In particular, I examined a simple passgate-based synchronizer along with three variants: adding a scan chain, "kicking" the synchronizer, and adding an offset between stages.
Using the functions computed by my algorithms, in particular $\beta(t)$, $u(t)$, and $g(t)$, I showed how my metastability analysis can be used to compare the designs and explain their differences. The analysis also revealed some unexpected behaviours. Most notably, my tool showed how the passgate coupling two latches can be an active device, providing gain as a common-gate amplifier.

7.2 Future Work

Over the course of developing the transistor models, implementing the nested bisection algorithm, and investigating AD methods, I noted many improvements to be made in the near future.

• The current implementation of the nested bisection algorithm expects the user to specify a parameter to test the linearity assumption and a $\Delta V$ term to establish $t_{eola}$. These are two tests for linearity, and it should be possible to use the same condition for both.
• In Chapter 4, we performed some preliminary experiments to determine the trade-off between numerically integrating $\int_{t_0}^{t} \frac{\partial}{\partial w}\frac{d}{dr} f(r,w)\,dr$ to estimate $\frac{\partial}{\partial w} f(t,w)$ and numerically differencing the trajectories produced by $\int_{t_0}^{t} \frac{d}{dr} f(r, w \pm \Delta)\,dr$. AD appears to be more accurate, but some anomalies need further investigation. We also only compared against a centered-difference approximation and did not consider higher-order numerical differencing methods.
• Chapter 4 develops a solution to compute $\frac{\partial y}{\partial x}$ via AD, where $y$ comes from a fixpoint algorithm that solves $f(x,y) = 0$. The described solution makes assumptions about $f$ and $y$. We could relax those assumptions and derive a more general treatment of computing $\frac{\partial y}{\partial x}$ for fixpoint algorithms.
• Section 4.3 provides preliminary groundwork for an end-to-end error analysis of the nested bisection algorithm. Although this error analysis is not covered in this thesis, it is likely to be beneficial for other applications that involve multi-dimensional saddle-point problems.
• The simplified EKV model in Chapter 5 can be further improved.
Classical derivations of velocity-saturation models (e.g. [16]) consider a "vertical" term related to $V_{gs}$ and a "horizontal" term related to $V_{ds}$, with the horizontal term being more significant. We realize in retrospect that our model captures the vertical term but not the horizontal one. We expect that a simple change to the model would improve the accuracy of our simplified EKV model without increasing its complexity. Furthermore, our model for attributing gate capacitance among the source, drain, and body is ad hoc, and we expect that improvements are possible here as well. Another option to explore is a correction term of the form $\log\left(\sum_i \cdots\right)$, summing over the three operating regions, to obtain a better fit as the transistor moves from the cut-off region, through the linear region, to the velocity-saturated region. With AD incorporated into the fitting routine, the gradient computation for such a term is likely to be easier to read. The end result would still be a single equation for $I_{ds}$, in the same form as Equation 5.6.
• The current implementation of the junction capacitance is taken from the BSIM manual. However, a simple junction-diode model with a few parameters would likely simplify the computation greatly.
• Designs such as the voltage-boosted synchronizer [28] and the robust synchronizer [56] are concerned with the power consumption of the synchronizer while it is metastable. Bounding the total power consumption may be an important constraint. An extension to my tools could keep track of the net current flowing from nodes connected to $V_{dd}$, which serves as a proxy for the power consumption of the design in question.
• This thesis ignores process and temperature variation in micro-electronics design, which is known to have a large impact on jamb-latch designs. Methods that take process variation into account should be explored.
• In this thesis, I only consider the two and three flip-flop passgate latch synchronizers.
Preliminary work has started on investigating the jamb latch. Additional synchronizer designs, such as the StrongARM latch [27, 36], are the next logical designs to explore. Other designs of interest for comparison are the voltage-boosted synchronizer of Li et al. [28], the pseudo-NMOS synchronizer of Yang et al. [55], and the robust synchronizer of Zhou et al. [56]. The designs proposed by these researchers revolve around circuit modifications of the passgate, jamb, and StrongARM latch designs. Using the tools in this thesis to perform a thorough quantitative analysis of these designs would be of interest to the community.
• We have begun investigating ways to leverage AD to apply gradient-based optimization automatically to transistor sizing in the two flip-flop passgate synchronizer design, maximizing its gain $g(t)$ by computing $\frac{\partial \beta(t)}{\partial w}$, where $w$ is the vector of transistor widths. Initial results look promising.

7.3 Other Applications

I believe the methodology of selecting a non-linear model, simulating trajectories of interest, automatically developing a linear small-signal perturbation model, and summarizing performance metrics is applicable to many fields other than circuit analysis. One field of particular interest to me is control theory and robotics. Model predictive control and adaptive control schemes in robotics are candidate use cases for this design methodology. In synchronizers, the performance criterion is MTBF or resolution time; in robotics, it could be time-to-reach, position variance, or velocity tracking, among many others. Circuit designers use SPICE to perform circuit analysis; developing a similar analytical framework for robotics is an area where I believe contributions can be made.

Another equally interesting application would be a framework that automatically generates correct scientific computing code for the desired mathematics. AD is very powerful when set up properly.
However, as implemented in common scientific computing frameworks such as MATLAB, it is easy to make indexing errors or produce confounding results with mixed units. A language that automatically translates mathematical equations into scientific computing code would go a long way towards accelerating the development of new analysis techniques and reducing human error when implementing AD objects for the intended mathematical operations.

Bibliography

[1] I. A. Akhter, J. Reiher, and M. R. Greenstreet. Finding all DC operating points using interval arithmetic based verification algorithms. In 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1595–1598, March 2019. doi:10.23919/DATE.2019.8714966.

[2] M. Alshaikh, D. Kinniment, and A. Yakovlev. A synchronizer design based on wagging. In 2010 International Conference on Microelectronics, pages 415–418, Dec 2010. doi:10.1109/ICM.2010.5696176.

[3] S. Beer and R. Ginosar. Eleven ways to boost your synchronizer. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 23(6):1040–1049, June 2015. ISSN 1063-8210. doi:10.1109/TVLSI.2014.2331331.

[4] S. Beer, J. Cox, T. Chaney, and D. M. Zar. MTBF bounds for multistage synchronizers. In 2013 IEEE 19th International Symposium on Asynchronous Circuits and Systems, pages 158–165, May 2013. doi:10.1109/ASYNC.2013.18.

[5] T. J. Chaney and C. E. Molnar. Anomalous behavior of synchronizer and arbiter circuits. IEEE Transactions on Computers, C-22(4):421–422, April 1973. ISSN 0018-9340. doi:10.1109/T-C.1973.223730.

[6] T. Coleman and W. Xu. Automatic Differentiation in MATLAB Using ADMAT with Applications. Society for Industrial and Applied Mathematics, Philadelphia, PA, 2016. doi:10.1137/1.9781611974362. URL https://epubs.siam.org/doi/abs/10.1137/1.9781611974362.

[7] G. R. Couranz and D. F. Wann. Theoretical and experimental behavior of synchronizers operating in the metastable region. IEEE Transactions on Computers, C-24(6):604–616, June 1975. ISSN 0018-9340. doi:10.1109/T-C.1975.224273.

[8] M. V. Dunga, C.-H. Lin, A. M. Niknejad, and C. Hu. BSIM-CMG: A Compact Model for Multi-Gate Transistors, pages 113–153. Springer US, Boston, MA, 2008. ISBN 978-0-387-71752-4. doi:10.1007/978-0-387-71752-4_3. URL https://doi.org/10.1007/978-0-387-71752-4_3.

[9] P. Eberhard and C. Bischof. Automatic differentiation of numerical integration algorithms. Mathematics of Computation, 68(226):717–731, 1999. ISSN 0025-5718, 1088-6842. URL http://www.jstor.org/stable/2585052.

[10] L. M. Engelhardt. LTspice, version XVII. Linear Technology Corporation (now part of Analog Devices), 2019. URL https://www.analog.com/en/design-center/design-tools-and-calculators/ltspice-simulator.html.

[11] C. Enz. An MOS transistor model for RF IC design valid in all regions of operation. IEEE Transactions on Microwave Theory and Techniques, 50(1):342–359, Jan 2002. ISSN 0018-9480. doi:10.1109/22.981286.

[12] C. C. Enz. A short story of the EKV MOS transistor model. IEEE Solid-State Circuits Society Newsletter, 13(3):24–30, Summer 2008. ISSN 1098-4232. doi:10.1109/N-SSC.2008.4785778.

[13] C. C. Enz and E. A. Vittoz. Charge-based MOS transistor modeling: the EKV model for low-power and RF IC design. 2006. URL http://infoscience.epfl.ch/record/149582.

[14] C. C. Enz, F. Krummenacher, and E. A. Vittoz. An analytical MOS transistor model valid in all regions of operation and dedicated to low-voltage and low-current applications. Special issue of the Analog Integrated Circuits and Signal Processing Journal on Low-Voltage and Low-Power Design, 8:83–114, 1995. URL http://infoscience.epfl.ch/record/149574.

[15] C. P. Fries. Stochastic automatic differentiation: automatic differentiation for Monte-Carlo simulations. Quantitative Finance, 19(6):1043–1059, 2019. doi:10.1080/14697688.2018.1556398. URL https://doi.org/10.1080/14697688.2018.1556398.

[16] P. R. Gray and R. G. Meyer. Analysis and Design of Analog Integrated Circuits. John Wiley & Sons, Inc., New York, NY, USA, 2nd edition, 1990. ISBN 0471874930.

[17] BSIM Group. Berkeley short-channel IGFET model (BSIM) group, Feb 2017. URL https://bsim.berkeley.edu/models.

[18] D. Hodges, H. G. Jackson, and R. Saleh. Analysis and Design of Digital Integrated Circuits: In Deep Submicron Technology. Jan 2004.

[19] HSPICE: the gold standard for accurate circuit simulations. Synopsys Inc., Mountain View, California, 2019. URL https://www.synopsys.com/verification/ams-verification/hspice.html.

[20] Wolfram Research, Inc. Mathematica, Version 12.0. Champaign, IL, USA, 2019. URL https://www.wolfram.com/mathematica/.

[21] Nanoscale Integration and Modeling Group. Predictive technology model, June 2011. URL http://ptm.asu.edu/.

[22] D. James. Intel Ivy Bridge unveiled: the first commercial tri-gate, high-k, metal-gate CPU. In Proceedings of the IEEE 2012 Custom Integrated Circuits Conference, pages 1–4, Sep. 2012. doi:10.1109/CICC.2012.6330644.

[23] G. Kedem. Automatic differentiation of computer programs. ACM Trans. Math. Softw., 6(2):150–165, June 1980. ISSN 0098-3500. doi:10.1145/355887.355890. URL http://doi.acm.org/10.1145/355887.355890.

[24] A. Khakifirooz, O. M. Nayfeh, and D. Antoniadis. A simple semiempirical short-channel MOSFET current–voltage model continuous across all regions of operation and employing only physical parameters. IEEE Transactions on Electron Devices, 56(8):1674–1680, Aug 2009. ISSN 0018-9383. doi:10.1109/TED.2009.2024022.

[25] D. J. Kinniment and D. B. G. Edwards. Circuit technology in a large computer system. Radio and Electronic Engineer, 43(7):435–441, July 1973. ISSN 0033-7722. doi:10.1049/ree.1973.0068.

[26] D. J. Kinniment and J. V. Woods. Synchronisation and arbitration circuits in digital systems. Proceedings of the Institution of Electrical Engineers, 123(10):961–966, October 1976. ISSN 0020-3270. doi:10.1049/piee.1976.0212.

[27] T. Kobayashi, K. Nogami, T. Shirotori, Y. Fujimoto, and O. Watanabe. A current-mode latch sense amplifier and a static power saving input buffer for low-power architecture. In 1992 Symposium on VLSI Circuits Digest of Technical Papers, pages 28–29, June 1992. doi:10.1109/VLSIC.1992.229252.

[28] Y. Li, P. I.-J. Chuang, A. Kennings, and M. Sachdev. Voltage-boosted synchronizers. In Proceedings of the 25th Edition on Great Lakes Symposium on VLSI, GLSVLSI '15, pages 307–312, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3474-7. doi:10.1145/2742060.2742075. URL http://doi.acm.org/10.1145/2742060.2742075.

[29] M. S. Lundstrom and D. A. Antoniadis. Compact models and the physics of nanoscale FETs. IEEE Transactions on Electron Devices, 61(2):225–233, Feb 2014. ISSN 0018-9383. doi:10.1109/TED.2013.2283253.

[30] Maple 2019. Maplesoft, Waterloo, ON, Canada, 2019. URL https://www.maplesoft.com/.

[31] L. R. Marino. General theory of metastable operation. IEEE Transactions on Computers, C-30(2):107–115, Feb 1981. ISSN 0018-9340. doi:10.1109/TC.1981.6312173.

[32] MATLAB, version 9.3.0.713579 (R2017b). The MathWorks Inc., Natick, Massachusetts, 2017. URL https://www.mathworks.com/products/matlab.html.

[33] MATLAB. Symbolic Math Toolbox. MathWorks, Natick, Massachusetts, 2017b. URL https://www.mathworks.com/products/symbolic.html.

[34] M. Mendler and T. Stroup. Newtonian arbiters cannot be proven correct. Formal Methods in System Design, 3(3):233–257, Dec 1993. ISSN 1572-8102. doi:10.1007/BF01384075. URL https://doi.org/10.1007/BF01384075.

[35] MetaACE. BlendICS Inc., St. Louis, Missouri, 2016. URL http://blendics.com.

[36] J. Montanaro, R. T. Witek, K. Anne, A. J. Black, E. M. Cooper, D. W. Dobberpuhl, P. M. Donahue, J. Eno, W. Hoeppner, D. Kruckemyer, T. H. Lee, P. C. M. Lin, L. Madden, D. Murray, M. H. Pearce, S. Santhanam, K. J. Snyder, R. Stephany, and S. C. Thierauf. A 160-MHz, 32-b, 0.5-W CMOS RISC microprocessor. IEEE Journal of Solid-State Circuits, 31(11):1703–1714, Nov 1996. ISSN 1558-173X. doi:10.1109/JSSC.1996.542315.

[37] W. Müller and I. Eisele. Velocity saturation in short channel field effect transistors. Solid State Communications, 34(6):447–449, 1980. ISSN 0038-1098. doi:10.1016/0038-1098(80)90648-1. URL http://www.sciencedirect.com/science/article/pii/0038109880906481.

[38] L. W. Nagel and D. Pederson. SPICE (simulation program with integrated circuit emphasis). Technical Report UCB/ERL M382, EECS Department, University of California, Berkeley, Apr 1973. URL http://www2.eecs.berkeley.edu/Pubs/TechRpts/1973/22871.html.

[39] S. Rakheja and D. Antoniadis. MVS nanotransistor model (silicon), Dec 2015. URL https://nanohub.org/publications/15/4.

[40] L. B. Rall. Automatic Differentiation: Techniques and Applications, volume 120 of Lecture Notes in Computer Science. Springer, 1981. ISBN 3-540-10861-0. doi:10.1007/3-540-10861-0. URL https://doi.org/10.1007/3-540-10861-0.

[41] J. Reiher and M. R. Greenstreet. Combining the MVS compact MOSFET model with automatic differentiation for flexible circuit analysis. FAC Workshop, May 2018.

[42] J. Reiher, M. R. Greenstreet, and I. W. Jones. Explaining metastability in real synchronizers. In 2018 24th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), pages 59–67, May 2018. doi:10.1109/ASYNC.2018.00024.

[43] J. Revels, M. Lubin, and T. Papamarkou. Forward-mode automatic differentiation in Julia. CoRR, abs/1607.07892, 2016. URL http://arxiv.org/abs/1607.07892.

[44] S. M. Rump. INTLAB - INTerval LABoratory, pages 77–104. Springer Netherlands, Dordrecht, 1999. ISBN 978-94-017-1247-7. doi:10.1007/978-94-017-1247-7_7. URL https://doi.org/10.1007/978-94-017-1247-7_7.

[45] S. M. Rump. Verification methods: Rigorous results using floating-point arithmetic. Acta Numerica, 19:287–449, 2010. doi:10.1017/S096249291000005X.

[46] K. Rupp. 42 years of microprocessor trend data. https://github.com/karlrupp/microprocessor-trend-data, 2018.

[47] D. Smith and J. Linvill. An accurate short-channel IGFET model for computer-aided circuit design. In 1971 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, volume XIV, pages 40–41, Feb 1971. doi:10.1109/ISSCC.1971.1154896.

[48] I. E. Sutherland. Micropipelines. Commun. ACM, 32(6):720–738, June 1989. ISSN 0001-0782. doi:10.1145/63526.63532. URL http://doi.acm.org/10.1145/63526.63532.

[49] H. Vogt, M. Hendrix, and P. Nenzi. ngSPICE - version 30. 2019. URL http://ngspice.sourceforge.net/.

[50] L. Wei, O. Mysore, and D. Antoniadis. Virtual-source-based self-consistent current and charge FET models: From ballistic to drift-diffusion velocity-saturation operation. IEEE Transactions on Electron Devices, 59(5):1263–1271, May 2012. ISSN 0018-9383. doi:10.1109/TED.2012.2186968.

[51] N. Weste and D. Harris. CMOS VLSI Design: A Circuits and Systems Perspective. Addison-Wesley Publishing Company, USA, 3rd edition, 2004. ISBN 0321149017, 9780321149015.

[52] C. Yan. Reachability Analysis Based Circuit-Level Formal Verification. PhD thesis, The University of British Columbia, 2011.

[53] S. Yang and M. Greenstreet. Computing synchronizer failure probabilities. In 2007 Design, Automation & Test in Europe Conference & Exhibition, pages 1–6, April 2007. doi:10.1109/DATE.2007.364487.

[54] S. Yang and M. R. Greenstreet. Simulating improbable events. In Proceedings of the 44th Annual Design Automation Conference, DAC '07, pages 154–157, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-627-1. doi:10.1145/1278480.1278518. URL http://doi.acm.org/10.1145/1278480.1278518.

[55] S. Yang, I. W. Jones, and M. R. Greenstreet. Synchronizer performance in deep sub-micron technology. In 2011 17th IEEE International Symposium on Asynchronous Circuits and Systems, pages 33–42, April 2011. doi:10.1109/ASYNC.2011.19.

[56] J. Zhou, D. Kinniment, G. Russell, and A. Yakovlev. A robust synchronizer. Volume 2006, 2 pp., April 2006. ISBN 0-7695-2533-4. doi:10.1109/ISVLSI.2006.12.

Appendix A
Supporting Materials

A.1 Derivative of Matrix $A^{-1}$

Let $A = A(t)$ be a differentiable, non-singular matrix. Then the derivative $\frac{d}{dt}A^{-1}$ is given by Equation A.1:

$$\begin{aligned}
I &= A A^{-1}\\
0 &= \frac{d}{dt}\left(A A^{-1}\right) = \left(\frac{d}{dt}A\right)A^{-1} + A\left(\frac{d}{dt}A^{-1}\right)\\
-A\left(\frac{d}{dt}A^{-1}\right) &= \left(\frac{d}{dt}A\right)A^{-1}\\
\frac{d}{dt}A^{-1} &= -A^{-1}\left(\frac{d}{dt}A\right)A^{-1}
\end{aligned}\tag{A.1}$$

A.2 Derivation of $\dot{g}(t)$

This section gives the full derivation of $\dot{g}(t)$. The longest piece is the derivative of $\frac{1}{\|\tilde{u}^T S(t,t_{eola})\|_2}$. Each piece is derived individually, and the pieces are then combined into the full result. The derivation uses $S(t,t_{eola}) = S(t_{eola},t)^{-1}$, $\frac{d}{dt}S(t_{eola},t) = J_f(t)\,S(t_{eola},t)$, and Equation A.1:

$$\begin{aligned}
\frac{d}{dt} S(t,t_{eola}) &= \frac{d}{dt} S(t_{eola},t)^{-1}\\
&= -S(t_{eola},t)^{-1}\left(\frac{d}{dt}S(t_{eola},t)\right)S(t_{eola},t)^{-1}\\
&= -S(t_{eola},t)^{-1} J_f(t)\, S(t_{eola},t)\, S(t_{eola},t)^{-1}\\
&= -S(t,t_{eola})\, J_f(t)
\end{aligned}\tag{A.2}$$

$$\frac{d}{dt}\left(\tilde{u}^T S(t,t_{eola})\right) = \tilde{u}^T \frac{d}{dt}S(t,t_{eola}) = -\tilde{u}^T S(t,t_{eola})\, J_f(t)\tag{A.3}$$

$$\begin{aligned}
\frac{d}{dt}\left(\frac{1}{\|\tilde{u}^T S(t,t_{eola})\|_2}\right)
&= \frac{-1}{\|\tilde{u}^T S(t,t_{eola})\|_2^2}\,\frac{d}{dt}\|\tilde{u}^T S(t,t_{eola})\|_2\\
&= \frac{-1}{\|\tilde{u}^T S(t,t_{eola})\|_2^2}\,\frac{d}{dt}\sqrt{\tilde{u}^T S(t,t_{eola})\left(\tilde{u}^T S(t,t_{eola})\right)^T}\\
&= \frac{-1}{\|\tilde{u}^T S(t,t_{eola})\|_2^2}\,\frac{\frac{d}{dt}\left(\tilde{u}^T S(t,t_{eola})\left(\tilde{u}^T S(t,t_{eola})\right)^T\right)}{2\sqrt{\tilde{u}^T S(t,t_{eola})\left(\tilde{u}^T S(t,t_{eola})\right)^T}}\\
&= \frac{-1}{2\|\tilde{u}^T S(t,t_{eola})\|_2^3}\,\frac{d}{dt}\left(\tilde{u}^T S(t,t_{eola})\, S(t,t_{eola})^T \tilde{u}\right)\\
&= \frac{-1}{2\|\tilde{u}^T S(t,t_{eola})\|_2^3}\left(\tilde{u}^T \frac{d}{dt}S(t,t_{eola})\, S(t,t_{eola})^T \tilde{u} + \tilde{u}^T S(t,t_{eola}) \left(\frac{d}{dt}S(t,t_{eola})\right)^T \tilde{u}\right)\\
&= \frac{-1}{2\|\tilde{u}^T S(t,t_{eola})\|_2^3}\left(-\tilde{u}^T S(t,t_{eola}) J_f(t) S(t,t_{eola})^T \tilde{u} - \tilde{u}^T S(t,t_{eola}) \left(S(t,t_{eola}) J_f(t)\right)^T \tilde{u}\right)\\
&= \frac{-1}{2\|\tilde{u}^T S(t,t_{eola})\|_2^3}\left(-\tilde{u}^T S(t,t_{eola}) J_f(t) S(t,t_{eola})^T \tilde{u} - \tilde{u}^T S(t,t_{eola}) J_f(t)^T S(t,t_{eola})^T \tilde{u}\right)\\
&= \frac{\tilde{u}^T S(t,t_{eola}) J_f(t) S(t,t_{eola})^T \tilde{u} + \tilde{u}^T S(t,t_{eola}) J_f(t)^T S(t,t_{eola})^T \tilde{u}}{2\|\tilde{u}^T S(t,t_{eola})\|_2^3}
\end{aligned}\tag{A.4}$$

$$\frac{d}{dt}\beta(t) = J_f(t)\,\beta(t) + \frac{\partial f(t)}{\partial t_{in}}\tag{A.5}$$

$$u(t)^T = \frac{\tilde{u}^T S(t,t_{eola})}{\|\tilde{u}^T S(t,t_{eola})\|_2}, \qquad u(t) = \frac{\left(\tilde{u}^T S(t,t_{eola})\right)^T}{\|\tilde{u}^T S(t,t_{eola})\|_2} = \frac{S(t,t_{eola})^T \tilde{u}}{\|\tilde{u}^T S(t,t_{eola})\|_2}\tag{A.6}$$

$$g(t) = u(t)^T \beta(t)\tag{A.7}$$

$$\begin{aligned}
\dot{g}(t) &= \frac{d}{dt}\left(\frac{\tilde{u}^T S(t,t_{eola})}{\|\tilde{u}^T S(t,t_{eola})\|_2}\,\beta(t)\right)\\
&= \frac{\frac{d}{dt}\left(\tilde{u}^T S(t,t_{eola})\right)\beta(t)}{\|\tilde{u}^T S(t,t_{eola})\|_2} + \tilde{u}^T S(t,t_{eola})\,\frac{d}{dt}\left(\frac{1}{\|\tilde{u}^T S(t,t_{eola})\|_2}\right)\beta(t) + \frac{\tilde{u}^T S(t,t_{eola})}{\|\tilde{u}^T S(t,t_{eola})\|_2}\,\frac{d}{dt}\beta(t)\\
&= \frac{-\tilde{u}^T S(t,t_{eola}) J_f(t)\,\beta(t)}{\|\tilde{u}^T S(t,t_{eola})\|_2} + \tilde{u}^T S(t,t_{eola})\,\frac{\tilde{u}^T S(t,t_{eola}) J_f(t) S(t,t_{eola})^T \tilde{u} + \tilde{u}^T S(t,t_{eola}) J_f(t)^T S(t,t_{eola})^T \tilde{u}}{2\|\tilde{u}^T S(t,t_{eola})\|_2^3}\,\beta(t)\\
&\qquad + \frac{\tilde{u}^T S(t,t_{eola})}{\|\tilde{u}^T S(t,t_{eola})\|_2}\left(J_f(t)\,\beta(t) + \frac{\partial f(t)}{\partial t_{in}}\right)\\
&= -u(t)^T J_f(t)\,\beta(t) + u(t)^T\left(\frac{u(t)^T J_f(t)\, u(t) + u(t)^T J_f(t)^T u(t)}{2}\right)\beta(t) + u(t)^T\left(J_f(t)\,\beta(t) + \frac{\partial f(t)}{\partial t_{in}}\right)\\
&= u(t)^T\left(u(t)^T J_f(t)\, u(t)\,\beta(t) + \frac{\partial f(t)}{\partial t_{in}}\right)
\end{aligned}$$

The last step uses $u(t)^T J_f(t)^T u(t) = u(t)^T J_f(t)\, u(t)$ (a scalar equals its transpose), so the middle term reduces to $u(t)^T\left(u(t)^T J_f(t)\, u(t)\right)\beta(t)$, while the terms $-u(t)^T J_f(t)\,\beta(t)$ and $+u(t)^T J_f(t)\,\beta(t)$ cancel. Note that $u(t)^T J_f(t)\, u(t)$ is a scalar; then:

$$\begin{aligned}
\dot{g}(t) &= u(t)^T J_f(t)\, u(t)\; u(t)^T \beta(t) + u(t)^T \frac{\partial f(t)}{\partial t_{in}}\\
&= u(t)^T J_f(t)\, u(t)\; g(t) + u(t)^T \frac{\partial f(t)}{\partial t_{in}}
\end{aligned}\tag{A.8}$$

A.3 Tensor-Vector Product

This section defines the tensor-vector product, which we denote by the operator $\otimes$. Consider an $N \times N \times K$ tensor $A$ and an $N \times 1$ vector $\vec{b}$. Each slice $A_i$ of $A$ is an $N \times N$ matrix, so:

$$A = \begin{bmatrix} A_1 & A_2 & \cdots & A_K \end{bmatrix}, \qquad A \otimes \vec{b} = \begin{bmatrix} A_1 \vec{b} & A_2 \vec{b} & \cdots & A_K \vec{b} \end{bmatrix}\tag{A.9}$$

Each product $A_i \vec{b}$ is an $N \times 1$ vector; therefore, $A \otimes \vec{b}$ is an $N \times K$ matrix.
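
The tensor-vector product of Equation A.9 can be sketched in plain Python by storing the tensor as a list of its $K$ slices $A_i$ (each an $N \times N$ nested list) and multiplying each slice by $\vec{b}$; the $i$-th product becomes column $i$ of the $N \times K$ result. The storage layout and the function names `matvec` and `tensor_vector_product` are illustrative assumptions, not the data structures of the thesis implementation.

```python
def matvec(m, b):
    """Multiply an N x N matrix (list of rows) by an N-vector."""
    return [sum(m_rc * b_c for m_rc, b_c in zip(row, b)) for row in m]


def tensor_vector_product(slices, b):
    """A (x) b for an N x N x K tensor given as its K slices A_1..A_K.

    Returns an N x K matrix (list of rows) whose column i is A_i b.
    """
    columns = [matvec(a_i, b) for a_i in slices]  # K columns, each of length N
    n = len(b)
    return [[col[r] for col in columns] for r in range(n)]


# Example with N = 2 and K = 3.
A = [
    [[1, 0], [0, 1]],  # A_1: identity
    [[0, 1], [1, 0]],  # A_2: swaps the two entries
    [[2, 0], [0, 3]],  # A_3: diag(2, 3)
]
b = [5, 7]
print(tensor_vector_product(A, b))  # [[5, 7, 10], [7, 5, 21]]
```

The result has $N = 2$ rows and $K = 3$ columns, matching the shape stated for $A \otimes \vec{b}$.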
