UBC Theses and Dissertations



A novel high resolution delay locked loop Saghafi, Ardeshir 2005


A NOVEL HIGH RESOLUTION DELAY LOCKED LOOP

by

ARDESHIR SAGHAFI

B.Sc., The University of Science and Technology, Tehran, Iran, 1989

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE

in

THE FACULTY OF GRADUATE STUDIES

(Electrical & Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA

July 2005

© Ardeshir Saghafi, 2005

Abstract

With the rapid advances in semiconductor technology, modern digital systems operating at GHz frequencies have been successfully developed for many years. As chips grow progressively bigger and the number of logic gates and the operating frequency increase, clock skew becomes increasingly important in ensuring the proper functioning of VLSI chips. With a synchronous methodology, it is impossible to increase the clock speed further without reducing the on-chip clock skew.

Phase Locked Loops (PLLs) and Delay Locked Loops (DLLs) have been widely adopted to solve the clock skew problem. In recent years, DLLs have been widely used for clock alignment because of their lower phase-error accumulation and faster locking time. In this thesis, a novel high resolution DLL with a resolution of less than 10 ps is proposed. It combines the coarse and fine delay lines into an efficient hybrid delay line and consequently saves power and area.

Table of Contents

Abstract
Table of Contents
List of Figures
Acknowledgment

Chapter 1 Introduction
1.1 Clock skew
1.2 Delay Locked Loop
1.3 DLL vs. PLL
1.4 Applications
1.4.1 Clock distribution
1.4.2 SDRAM
1.4.3 Time-to-Digital converter (TDC)
1.4.4 Automatic Test Equipment (ATE)
1.4.5 Clock synthesis
1.4.6 Clock and data recovery (CDR)
Chapter 2 Background
2.1 Analog DLL
2.2 Digital DLL
2.3 Double loop DLL
2.4 Synchronous Mirror Delay (SMD)
2.5 Register controlled DLL (RDLL)
2.6 Vernier Delay Locked Loop (VDLL)

Chapter 3 Design of proposed DLL
3.1 Block diagram
3.2 DLL modules description
3.2.1 Vernier delay line
3.2.2 Vernier delay line controller
3.2.3 High resolution phase detector
3.2.4 Lock detector

Chapter 4 Analysis of proposed DLL
4.1 Testbench
4.2 Initial lock
4.3 Lock re-entry
4.3.1 Lock re-entry (case 1)
4.3.2 Lock re-entry (case 2)
4.4 Gate count of vernier unit delay
4.5 Resolution of the proposed DLL
4.6 Limitations of the proposed DLL

Chapter 5 Conclusion

Bibliography
Appendix A Design VHDL code
Appendix B Synthesis result

List of Figures

Figure 1.1 Possible hold violation due to clock skew
Figure 1.2 Possible setup violation due to clock skew
Figure 1.3 Typical DLL block diagram
Figure 1.4 Typical PLL block diagram
Figure 1.5 SDRAM output timing with and without a DLL
Figure 1.6 Block diagram of the laser range finder [101]
Figure 2.1 Conventional analog DLL
Figure 2.2 Analog DLL with duty-cycle correction
Figure 2.3 Analog multiphase DLL
Figure 2.4 Digital DLL block diagram
Figure 2.5 Dual loop DLL
Figure 2.6 Conventional SMD
Figure 2.7 Timing diagram of a conventional SMD
Figure 2.8 Block diagram of direct SMD
Figure 2.9 Register Controlled DLL (RDLL)
Figure 2.10 Core circuit in RDLL
Figure 2.11 Core circuit in an RSDLL
Figure 2.12 Block diagram of a vernier delay line [73]
Figure 2.13 Schematic of vernier delay line [73]
Figure 3.1 Block diagram of proposed DLL
Figure 3.2 Circuit and timing diagram of a conventional unit delay
Figure 3.3 Circuit and timing diagram of a symmetrical unit delay
Figure 3.4 CMOS NAND gate
Figure 3.5 Phase Detector block
Figure 3.6 Lock Detector block
Figure 3.7 Controller block
Figure 3.8 Vernier delay line block
Figure 3.9 SAR DLL block diagram [33]
Figure 3.10 Flowchart for weighing sequence
Figure 3.11 Proposed unit delay circuit
Figure 3.12 State diagram of controller block
Figure 3.13 Shift registers in controller block
Figure 3.14 Phase detector in [50]
Figure 3.15 Proposed high resolution phase detector
Figure 3.16 Phase detector waveforms
Figure 3.17 Lock detector circuit
Figure 4.1 Initial lock mode waveform for a leading input clock
Figure 4.2 Initial lock mode waveform for a leading input clock (zoomed in)
Figure 4.3 Initial lock mode waveform for a leading output clock
Figure 4.4 Initial lock mode waveform for a leading output clock (zoomed in)
Figure 4.5 Lock re-entry mode waveform for small phase error
Figure 4.6 Lock re-entry mode waveform for a leading input clock
Figure 4.7 Lock re-entry mode waveform for a leading input clock (zoomed in)
Figure 4.8 Introduced glitch waveform for a leading input clock
Figure 4.9 Lock re-entry mode waveform for a leading output clock
Figure 4.10 Lock re-entry mode waveform for a leading output clock (zoomed in)
Figure 4.11 Introduced glitch waveform for a leading output clock

Acknowledgments

I would like to express my deepest gratitude to my academic and research advisor, Dr. Andre Ivanov, for his guidance and constant support in helping me conduct and complete this work.

My wife has also been supportive, not just tolerant, of my return to graduate school. She is as pleased as I am that my dissertation is finished. She knows that I am grateful to her for her continuous support, but I take this opportunity to acknowledge my debt to her publicly.
Chapter 1 Introduction

This chapter introduces the research topic of this thesis. It includes a brief review of the DLL circuit and a comparison with the Phase Locked Loop, and it describes the different applications in which the DLL is used.

1.1 Clock skew

As silicon fabrication technology advances, more logic can be packed on a die, and as a result chips become progressively bigger. As the number of logic gates and the chip operating frequency increase, clock skew becomes increasingly important in ensuring the proper functioning of VLSI chips. With a synchronous communication protocol on and off the chip, it is impractical to increase the communication clock speed further without reducing the on-chip clock skew. In a synchronous design, the clock period determines the time available for any operation between two flip-flops, and any uncertainty, such as skew or jitter, reduces this time.

Clock skew is caused by different RC delays of the clock interconnect along different clock signal paths, by different clock buffer delays due to on-chip process and temperature variations, and by power supply differences caused by voltage drop on the power rails.

The clock skew problem can also arise in other situations. For example, the input clock driver of any chip introduces an uncertain time delay between the internal and external clocks. As a result, the internal clocks in a multi-chip system become asynchronous, and problems occur when data are transferred between chips.

Clock skew can lead to both setup and hold time violations. Consider the circuit in Figure 1.1(a), where the clock is routed in the same direction as the data path. Delays in the clock path cause skewed versions of the system clock to arrive at the two flip-flops. If the skew δ2 of clk2 relative to clk1 is greater than the clock-to-Q delay of FF1 plus the logic delay, minus the hold time of FF2, a hold time violation will occur.
As shown in Figure 1.1(b), FF2 then samples the wrong data. This can be prevented either by adding delay to the data path from FF1 to FF2 (which increases the cycle time and is not preferred) or by reducing the clock skew.

Figure 1.1 Possible hold violation due to clock skew. (a) Circuit with the clock routed in the direction of data flow; (b) timing diagram.

If the clock signal is routed in the direction opposite to the data flow, as shown in Figure 1.2(a), clock skew will not cause a hold time violation. However, a setup time violation can occur, since clk2 might arrive earlier than clk1, as shown in Figure 1.2(b). The clock cycle then has to be increased to prevent this violation, which also harms system performance.

Figure 1.2 Possible setup violation due to clock skew. (a) Circuit with the clock routed opposite to the data flow; (b) timing diagram.

1.2 Delay Locked Loop

To reduce clock skew, the clock distribution network should be designed with care. In addition, circuits such as Phase Locked Loops (PLLs) and Delay Locked Loops (DLLs) may be needed at several critical places in the clock distribution structure to reduce the total clock skew.

A basic DLL consists of a phase detector (PD), also called a phase comparator (PC), a variable delay line, and a controller that converts the PD's output into a control signal for the delay line, as shown in Figure 1.3(a). A basic DLL detects the phase error between the input clock and its output clock and adjusts the total delay of the variable delay line to a multiple of the input clock period. It introduces enough delay (Td) that the rising edge of the output clock coincides with the next rising edge of the input clock, as shown in Figure 1.3(b).
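The locking behaviour described above can be illustrated with a small behavioural model. This is a sketch under simplifying assumptions, not the design proposed in this thesis: the delay line is assumed to consist of identical unit delays of `unit_delay_ps`, the loop is assumed to start inside its capture range so that the phase error is simply `period_ps - Td`, and the controller makes a binary up/down decision each cycle. The function name and parameters are hypothetical.

```python
def lock_bang_bang_dll(period_ps: int, unit_delay_ps: int, start_taps: int,
                       max_cycles: int = 1000) -> tuple[int, int, int]:
    """Behavioural DLL model (hypothetical): add or remove one unit delay per
    cycle until the delayed edge lands within one unit delay of the next
    input clock edge.

    Returns (cycles_to_lock, taps, total_delay_ps).
    """
    taps = start_taps
    for cycle in range(max_cycles):
        td_ps = taps * unit_delay_ps          # total delay Td of the line
        error_ps = period_ps - td_ps          # > 0: output edge early; < 0: late
        if abs(error_ps) < unit_delay_ps:     # residual error below one step: locked
            return cycle, taps, td_ps
        taps += 1 if error_ps > 0 else -1     # phase detector's up/down decision
    raise RuntimeError("loop did not lock within max_cycles")


# 1000 ps input period, 30 ps unit delays, starting with too much delay.
print(lock_bang_bang_dll(1000, 30, 40))  # (6, 34, 1020)
```

In the usage example the loop settles with a residual error of 20 ps, which is smaller than one unit delay but not zero: in a delay line of this kind the achievable alignment is set by the unit delay.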
Figure 1.3 Typical DLL block diagram. (a) Block diagram (external reference clock, phase detector, low pass filter, clock buffer, point of use); (b) DLL input and output clocks during the lock time and after the DLL is locked.

The correct timing of a synchronous circuit depends on the relative positions of the clock edges, which are affected by clock skew and jitter but not by a delay of exactly one clock period. The added delay of one input clock period therefore has no negative impact on the functionality of systems that use DLL circuits.

The output clock of a standard DLL has the same frequency as its input clock, so DLLs are generally not used for clock synthesis; PLLs are widely used for clock synthesis and multiplication. Although some applications use DLLs for clock synthesis [45], [53] and [107], this is not common.

DLL and PLL circuits are feedback circuits. They generally require several clock cycles to achieve lock, so they cannot be turned off in standby mode and therefore consume considerable standby power. For this reason, they cannot be used in clock deskewing applications that require low standby power consumption.

The Synchronous Mirror Delay (SMD) and Clock Synchronized Delay (CSD) circuits were developed for applications requiring low standby currents [52], [88] and [94]. These circuits have no feedback, so their lock-in time is significantly shorter than that of DLLs or PLLs. They can be switched off during standby mode, and when power is restored they lock in only two or three clock cycles, which is negligible for most applications.

1.3 DLL vs. PLL

When choosing between a PLL and a DLL for a particular application, the differences between their architectures need to be understood. The oscillator used in the PLL inherently introduces instability and accumulates phase errors (Figure 1.4).
This in turn degrades the performance of the PLL when it compensates for the delay of the clock distribution network. In contrast, the unconditionally stable DLL architecture does not accumulate phase errors [20], [100] and [103]. For this reason, the DLL architecture is widely used for delay compensation and clock conditioning.

Figure 1.4 Typical PLL block diagram (external reference clock, phase frequency detector, low pass filter, voltage controlled oscillator, clock tree, point of use).

The DLL's closed loop transfer function has only one pole (a first order system) [56] and [57], so it is inherently stable. A PLL's closed loop transfer function, on the other hand, has two or three poles, so stability is a major issue that must be addressed during design. Normally, zeros have to be added to a PLL's transfer function to stabilize the circuit [103].

Because a DLL is a first order system, jitter on its input clock propagates through it and can affect the performance of the system. A PLL filters out input jitter, so it is the better choice for applications with a high-jitter input. In a clock distribution system, the main clock is generated by a quartz crystal oscillator, which does not introduce a significant amount of jitter. Therefore, DLL circuits are generally used for de-skewing [33], [66], [68] and [98].

The main disadvantage of a conventional DLL compared to a PLL is its limited phase capture range [37]. At a given operating clock frequency, a DLL can delay its input clock only by an amount bounded by a minimum and a maximum delay. As a consequence, the designer must take extra care to prevent the loop from trying to lock to a delay outside these limits. To extend the operating range, the number of delay cells or the gain of the delay line (in an analog DLL) has to be increased. This not only consumes additional power but also increases the jitter caused by supply noise.
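Both properties discussed in this section, first-order stability and a bounded capture range, show up in a simple discrete-time model. The sketch below assumes (hypothetically) that each cycle the controller corrects the delay by a fraction `gain` of the phase error, so that inside the delay range the error obeys e[n+1] = (1 - gain) e[n]. This single-pole recursion converges for 0 < gain < 2. The delay is clamped to [d_min, d_max] to model the finite delay line; all names and values are illustrative.

```python
def simulate_first_order_dll(period, d_min, d_max, gain, d0, cycles=50):
    """First-order DLL model (hypothetical parameters, times in ps).

    While the delay stays inside [d_min, d_max], the phase error
    e = period - td follows e[n+1] = (1 - gain) * e[n].
    Returns (final_delay, final_error).
    """
    td = d0
    for _ in range(cycles):
        error = period - td
        td = min(max(td + gain * error, d_min), d_max)  # finite delay range
    return td, period - td


# Period inside the delay range: the error decays geometrically towards zero.
print(simulate_first_order_dll(1000.0, 500.0, 2000.0, 0.5, 600.0))
# Period longer than d_max: the delay saturates and the loop never locks.
print(simulate_first_order_dll(3000.0, 500.0, 2000.0, 0.5, 600.0))  # (2000.0, 1000.0)
```

The second call shows the capture-range limitation: the loop itself is stable, but the required delay lies outside what the delay line can provide. Setting `gain` above 2 makes |1 - gain| > 1, and the model diverges even inside the range.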
1.4 Applications

DLLs are used in many different applications, as described in the following subsections.

1.4.1 Clock distribution

As previously mentioned, DLLs are mainly used in clock distribution circuits that do not require clock synthesis or multiplication. Because these systems operate at a fixed clock frequency, a DLL's narrow capture range is not an issue [35], [82], [33], [66], [68], [98] and [102].

1.4.2 SDRAM

In synchronous DRAM (SDRAM), the output data strobe (DQS) should be locked to the data outputs (DQ outputs) for high-speed performance. The clock-access and output-hold times of conventional DRAM designs are determined by the delays of internal circuits such as the clock input and output buffers. Variations in temperature and process change the access times and reduce the size of the valid data window. Several publications describe how a DLL can optimize and stabilize the clock-access and output-hold times [26], [47], [65], [70], [72], [73], [74], [77], [79], [85], [87], [90], [93], [94] and [104]. An internal DLL can be used to adjust the time difference between the output and input clock signals in SDRAMs (Figure 1.5).

Figure 1.5 SDRAM output timing with and without a DLL. (a) Without DLL: tAC = Td(max), tOH = Td(min); (b) with DLL. Both panels show Clk, DQ (data out), and the valid data window.

In Double Data Rate (DDR) SDRAM [1], [17], [21], [32], [44], [46], [50], [51] and [75], where read/write accesses occur on both the rising and falling edges of the clock, clock synchronization is critical and is required for both clock edges. A symmetrical DLL is used for this application. The term "symmetrical" means that the delay line used in the DLL has the same delay for high-to-low and low-to-high transitions.
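The need for a symmetrical delay line can be quantified. Assume (hypothetically) a chain of identical non-inverting stages with low-to-high delay t_plh and high-to-low delay t_phl. Every rising edge is then delayed by N·t_plh and every falling edge by N·t_phl, so the high time of the clock changes by N·(t_phl - t_plh). The sketch below computes the resulting output duty cycle; the numbers are illustrative, not measured data.

```python
def output_duty_cycle(period_ps, duty_in, n_stages, t_plh_ps, t_phl_ps):
    """Duty cycle at the end of a delay line of identical non-inverting stages."""
    high_in_ps = duty_in * period_ps
    # Rising edges are delayed by n*t_plh and falling edges by n*t_phl,
    # so the high pulse widens (or narrows) by n*(t_phl - t_plh).
    high_out_ps = high_in_ps + n_stages * (t_phl_ps - t_plh_ps)
    # The pulse width cannot be negative or longer than the period.
    high_out_ps = min(max(high_out_ps, 0.0), period_ps)
    return high_out_ps / period_ps


# Hypothetical 2500 ps clock period, 50 stages, 20 ps mismatch per stage.
print(output_duty_cycle(2500, 0.5, 50, 100, 120))  # 0.9
# A symmetrical stage (t_plh == t_phl) preserves the 50% duty cycle.
print(output_duty_cycle(2500, 0.5, 50, 110, 110))  # 0.5
```

With the asymmetric stages, the falling edge has moved by 1000 ps relative to the rising edge. A DDR interface that samples on both edges would see a badly distorted falling-edge timing, which is why a symmetrical delay line is used.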
1.4.3 Time-to-Digital converter (TDC)

High-resolution time-to-digital converters (TDCs) are used in a number of measurement systems, such as time-of-flight (TOF) particle detectors, laser range finders (Figure 1.6), and logic analyzers. Laser range finding is used in many industrial applications, for example measuring the dimensions of ship blocks in shipyards, inspecting oil levels in large tanks, and robot vision [4], [22], [23], [36], [54], [62], [69], [81], [83], [95], [96], [97], [101] and [106].

Figure 1.6 Block diagram of the laser range finder [101] (laser diode transmitter, target, amplifier and timing discriminator, time interval measurement (DLL), distance result).

Modern TOF systems used in particle physics experiments require TDCs with a resolution below 1 ns. A distance measurement accuracy of 2-3 cm corresponds to 100-200 ps of measurement time. A high-resolution measurement can be obtained by using a logic buffer delay as the time unit, with a DLL stabilizing the buffer delay against process, temperature, and power supply variations. The delay line is used in a closed loop controlled by the DLL, and the time resolution is limited by the delay of each unit cell in the delay line.

1.4.4 Automatic Test Equipment (ATE)

General purpose Automatic Test Equipment (ATE) requires fast devices, high tester bandwidth, high data rates, and high timing accuracy. At the heart of an ATE is the timing event generation circuitry, which generates control signals for different parts of the ATE [99]. DLLs have been widely used in ATE to achieve the required precision and to eliminate the effects of process, voltage, and temperature (PVT) variations on the time base generator.

1.4.5 Clock synthesis

PLLs have been used successfully to create tapped ring oscillators for clock synthesis. A PLL's delay elements have two dependent variables controlled by the feedback system: the frequency and the phase.
A DLL, however, has only a single dependent variable controlled by the feedback loop: the phase. A PLL integrates the error of all its noise sources, whereas a DLL integrates only the noise sources that cause jitter, such as power supply noise or thermal noise, and only over one delay period. Because it is a first order system, a DLL therefore does not accumulate noise. This is a desirable characteristic for any high performance clock generator [2], [3], [6], [7], [8], [9], [11], [12], [15], [18], [27], [28], [30], [39], [40], [41], [42], [45], [48], [64], [67], [80], [88], [103] and [105].

1.4.6 Clock and data recovery (CDR)

Clock and data recovery is a mechanism that allows a receiver to extract the clock from an incoming data stream; the recovered clock can then be used to extract the incoming data. The receiver extracts the clock embedded in the data stream in order to transmit data back to the source. Both DLLs and PLLs can be used in clock and data recovery circuits, although DLLs are rarely used in CDR circuits [14], [19], [25], [55], [61] and [91].

Chapter 2 Background

In this chapter, we provide an overview of the different DLL types and discuss the advantages and disadvantages of each. An extensive literature overview of the different types of DLLs, covering papers from 1993 to 2005, is included.

2.1 Analog DLL

Analog DLLs were first used in clock distribution applications [10] and [13]. A conventional analog DLL consists of four main blocks: a voltage controlled delay line (VCDL), a charge pump, a low pass filter, and a phase detector, as shown in Figure 2.1.

Figure 2.1 Conventional analog DLL (reference clock RefClk driving the VCDL).

The input reference clock drives the delay line, which is composed of cascaded variable delay buffers. The output clock drives the loop phase detector.
The output of the phase detector is integrated by the charge pump and the loop filter capacitor to generate the loop control voltage. The negative feedback of the loop drives the control voltage to a value that ideally forces zero phase error between the output clock and the reference clock.

The simple design of the DLL offers many advantages compared to PLLs based on a Voltage Controlled Oscillator (VCO). Because of frequency acquisition constraints, a PLL usually uses a specific type of phase detector, the state-machine based phase frequency detector (PFD). In contrast, a DLL's phase detector can easily be implemented using bang-bang control [109]. This means that the loop control signal can simply be a binary up or down signal rather than a signal proportional to the magnitude of the phase error.

Additionally, since DLLs do not use a VCO, phase errors induced by supply or substrate noise do not accumulate over many clock cycles [108]. This improved noise immunity is the main reason for the increased use of DLLs in applications that do not require clock synthesis [16], [19], [34] and [105].

An analog DLL is a relatively complex analog circuit that requires a process-specific implementation. It is difficult to reuse the same design in a different technology, which makes the analog DLL a non-portable architecture. For example, an analog DLL designed for a 0.35 µm CMOS technology cannot practically be migrated to a 0.18 µm technology, because major changes to the layout of the design are required.

The output clock's duty cycle changes as the clock passes through many delay cells, because the propagation delay of each unit cell in the delay line is not the same for low-to-high and high-to-low input transitions. Even if the duty cycle of the reference clock is 50% at the input, the output duty cycle may be significantly different.
A conventional solution is to attach duty-cycle correction circuits to all clock output drivers, which adds area and increases jitter.

An all-analog multiphase DLL is proposed in [34]. It achieves both wide-range operation and low jitter. The proposed DLL has the same benefits as a conventional analog DLL, such as jitter cancellation and multiphase clock generation. It also uses a dual controlled delay cell to correct the duty-cycle problem, as shown in Figure 2.2.

Figure 2.2 Analog DLL with duty-cycle correction (VCDL driven by the reference clock; two phase detector, charge pump, and low pass filter paths generating the control voltages Vcp and Vduty).

A second phase detector compares the inverted input clock with the inverted output clock and generates the control signal Vduty, as shown in Figure 2.2. Vduty fine-tunes the cell current ratio and thereby aligns the falling edges of the reference clock and the output clock. In this way, the DLL maintains the reference clock's duty cycle.

A quadrature phase mixing DLL, which completely eliminates the limited capture range of conventional analog DLLs, was proposed in [104] and [105] (Figure 2.3). This approach is based on the fact that quadrature clocks (clocks phase-shifted by 90 degrees) can be generated for a given clock frequency. The quadrature clocks are input to a phase mixer, which can produce a clock whose phase spans the complete 0-360 degree interval. This approach thus overcomes the limited phase range problem of the conventional DLL.

Figure 2.3 Analog multiphase DLL (reference clock, divide-by-2 generation of 0° and 90° clocks, phase detector, charge pump).

2.2 Digital DLL

Both analog and digital DLLs have been used for clock alignment applications [35], [82], [33], [66], [68], [98] and [102]. An analog DLL generally provides better jitter performance at the expense of greater complexity.
Although the digital DLL uses more area and power than the analog DLL, its greater simplicity and lower minimum supply voltage make it very attractive for many clock alignment applications.

Digital DLLs are characterized by their use of digital delay lines, which are typically made from simple digital circuit elements (Figure 2.4). This simplicity makes it possible to design a portable digital DLL that can easily be adapted to different technologies. Additionally, because the phase information in a digital DLL is stored as a digital state, digital DLLs can provide very fast timing recovery after being placed in standby mode. However, conventional digital DLLs provide only moderate phase resolution and jitter performance [1], [21], [32], [48], [49], [71], [74], [76], [78], [92] and [94].

Figure 2.4 Digital DLL block diagram (external reference clock, phase detector, right/left shift register, demultiplexer).

Another benefit of digital DLLs is their ability to operate at lower supply voltages than analog DLLs. Because analog DLLs require saturated current sources, they run into minimum-voltage problems as the supply voltage decreases. Digital DLLs, on the other hand, require only enough voltage to ensure the proper operation of their digital gates.

Digital DLLs also exploit the power savings of supply voltage scaling better than analog DLLs. The power consumption of an analog DLL is the sum of the static power consumed by the constant current sources in the circuit and the dynamic power CV²f (where C is the switched capacitance, V is the supply voltage, and f is the frequency). The power consumption of a digital DLL, on the other hand, is determined primarily by the dynamic CV²f power, which decreases quadratically with the supply voltage.
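The difference in supply-voltage scaling can be illustrated numerically. The sketch below assumes, for illustration only, that the static power of the analog DLL's current sources can be approximated as I_bias·V and that both designs switch the same capacitance C at frequency f. All numbers are hypothetical and are not taken from any of the cited designs.

```python
def dynamic_power(c_farads, v_volts, f_hz):
    """Dynamic switching power C * V^2 * f, in watts."""
    return c_farads * v_volts ** 2 * f_hz


def analog_dll_power(c_farads, v_volts, f_hz, i_bias_amps):
    """Static current-source power (approximated as I_bias * V) plus dynamic power."""
    return i_bias_amps * v_volts + dynamic_power(c_farads, v_volts, f_hz)


def digital_dll_power(c_farads, v_volts, f_hz):
    """Digital DLL power, dominated by dynamic C * V^2 * f power."""
    return dynamic_power(c_farads, v_volts, f_hz)


C, F, I_BIAS = 2e-12, 500e6, 1e-3  # hypothetical: 2 pF, 500 MHz, 1 mA bias
digital_ratio = digital_dll_power(C, 1.2, F) / digital_dll_power(C, 1.8, F)
analog_ratio = analog_dll_power(C, 1.2, F, I_BIAS) / analog_dll_power(C, 1.8, F, I_BIAS)
print(f"digital: {digital_ratio:.3f}, analog: {analog_ratio:.3f}")  # digital: 0.444, analog: 0.524
```

Scaling the supply from 1.8 V to 1.2 V cuts the purely dynamic power to (1.2/1.8)² ≈ 0.444 of its original value. The static term scales only linearly, so the analog total shrinks less.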
The delay elements can be implemented with almost any circuit block, but because the phase resolution of the delay line is determined by the propagation delay of each unit cell, delay elements with minimal delay are generally preferred. The delay line of a conventional digital DLL uses inverters, since they provide the shortest delay of any CMOS digital gate. Because an inverter inverts its input, the delay line is tapped only at every other inverter (two inverters in series form a unit cell), so that the output taps are not inverted but only shifted by the total propagation delay of the two inverters.

Although conventional delay lines are attractive for their simplicity, DLLs based on such conventional delay elements suffer from several significant limitations. First, the delay line provides fairly coarse resolution. For example, a delay line with inverter pairs as unit cells provides a minimum phase step corresponding to two inverter delays. Such coarse phase resolution is not sufficient for many clock alignment applications.

Second, conventional delay lines deliver only a limited phase range. To cover at least one full cycle of phase, the delay line length and unit cell delays are chosen to provide at least 360 degrees of phase under the fastest process, voltage, and temperature (PVT) conditions and at the minimum operating frequency. Consequently, covering this range requires a long delay line, which occupies more silicon area and dissipates additional power. Additionally, because inverters have a poor power supply rejection ratio (PSRR), jitter induced by power supply noise accumulates as the signal propagates through the delay line. As a result, the signals from later taps in the delay line carry more jitter than those from earlier taps.
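The sizing rule above can be written as a short calculation. This sketch assumes hypothetical fast- and slow-corner inverter delays. The number of two-inverter cells N must satisfy N · 2t_inv,fast ≥ T_max, where T_max = 1/f_min is the longest period to be covered. The coarsest phase step is 2t_inv at the slow corner. Integer picoseconds are used so that the ceiling is exact.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def size_inverter_delay_line(t_max_ps: int, t_inv_fast_ps: int, t_inv_slow_ps: int):
    """Return (number of two-inverter unit cells, worst-case phase step in ps).

    t_max_ps is the longest clock period to be covered (1 / minimum frequency).
    """
    unit_fast_ps = 2 * t_inv_fast_ps  # shortest unit-cell delay (fast PVT corner)
    unit_slow_ps = 2 * t_inv_slow_ps  # longest unit-cell delay = coarsest step (slow corner)
    # 360 degrees must be covered even when every cell is at its fastest.
    n_cells = ceil_div(t_max_ps, unit_fast_ps)
    return n_cells, unit_slow_ps


# Hypothetical: f_min = 100 MHz (T_max = 10 000 ps), t_inv = 25 ps fast, 50 ps slow.
print(size_inverter_delay_line(10_000, 25, 50))  # (200, 100)
```

Even in this modest example the line needs 200 cells, and each cell adds area, power, and supply-noise-induced jitter. Meanwhile the resolution at the slow corner is only 100 ps. These are the two limitations described above.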
2.3 Double loop DLL

The key parameters in DLL design are locking time, power consumption, jitter, and phase error, which depend on the choice of delay elements and loop control method. The phase adjustment is done through either a variable delay line or a tapped delay line. The tapped delay line is used for digital control, where the locking characteristics are less sensitive to switching noise and crosstalk. The variable delay line, on the other hand, is used to reduce the static phase error, since its delay changes gradually. Therefore, the logical approach to obtaining a DLL with both fast locking and low phase error is to combine these two methods. This is called a dual loop DLL, sometimes referred to as a semi-digital DLL [26], [29], [31], [38], [45], [58], [60], [63], [84] and [89]. The locking procedure is done in two steps, coarse tuning and fine tuning, which are performed in the digital and analog domains, respectively (Figure 2.5). The dual loop DLL can be used in applications with a low power standby mode. Recovery from standby mode to regular operation is then almost immediate, because the digital information is retained during standby and the position of the output tap in the delay line is known at startup.

Figure 2.5 Dual loop DLL (external clock, tapped delay line with multiplexers, digital phase detector and digital control block for coarse tuning; PFD, charge pump, loop filter, and analog delay for fine tuning; clock buffer).

After the system powers up, the coarse tuning mechanism starts. Normally, the middle tap of the delay line is selected first, and the output clock is compared to the input reference clock by a digital phase detector. Depending on which clock is leading and which is lagging, the output of the phase detector shifts the selected tap right or left. Finally, the tap with the minimum phase error relative to the reference clock is selected.
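The coarse/fine procedure can be sketched behaviourally. The model below illustrates it under stated assumptions and does not describe any specific circuit in the cited papers. The tapped line is assumed to have `n_taps` identical taps of `tap_ps`. Coarse tuning starts at the middle tap and moves one tap per cycle until the residual error falls inside a lock window one tap wide. The fine loop then freezes the tap and drives a hypothetical analog delay in [0, fine_range_ps] with a first-order update. The fine range is chosen larger than one tap so that it covers the lock window.

```python
def dual_loop_lock(period_ps, tap_ps, n_taps, fine_range_ps,
                   fine_gain=0.5, fine_cycles=100):
    """Return (selected_tap, fine_delay_ps, coarse_cycles) for a dual-loop DLL model."""
    tap = n_taps // 2          # start from the middle tap
    coarse_cycles = 0
    # Coarse (digital) loop: shift the tap right or left one step per cycle.
    while True:
        error_ps = period_ps - tap * tap_ps
        if 0 <= error_ps < tap_ps:      # inside the lock window: hand over to the analog loop
            break
        tap += 1 if error_ps > 0 else -1
        if not 0 <= tap < n_taps:
            raise ValueError("period outside the capture range of the tapped line")
        coarse_cycles += 1
    # Fine (analog) loop: the tap is frozen; only the fine delay moves.
    fine_ps = 0.0
    for _ in range(fine_cycles):
        error_ps = period_ps - (tap * tap_ps + fine_ps)
        fine_ps = min(max(fine_ps + fine_gain * error_ps, 0.0), fine_range_ps)
    return tap, fine_ps, coarse_cycles


# Hypothetical 1234 ps period, 32 taps of 100 ps, 120 ps fine range.
tap, fine, steps = dual_loop_lock(1234, 100, 32, 120.0)
print(tap, round(fine, 3), steps)  # 12 34.0 4
```

The coarse loop reaches tap 12 (1200 ps) in four shifts, and the fine loop supplies the remaining 34 ps. This division of labour gives the dual loop DLL both fast locking and a low static phase error.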
Once the coarse tuning phase has been completed, the digital block is disabled to avoid unwanted phase jitter, and the shift registers in the controller block hold their positions. The analog control part is then enabled to reduce the remaining phase error. This handover is performed by a lock window mechanism. If the internal clock is outside the window, the digital block is enabled. Once the internal clock enters the lock window, the analog block is enabled and the digital block is disabled. The range of the analog part must be large enough to cover the lock-detecting window. The analog control block consists of a PFD, a charge pump, and a loop filter, and the analog loop operates in the same way as a conventional analog DLL.

2.4 Synchronous Mirror Delay (SMD)

Conventional PLL and DLL circuits are feedback systems that require many clock cycles to achieve lock. Therefore, they cannot be turned off, and they are not used in clock-skew suppression applications that require low standby currents, for example in cell phones. Synchronous Mirror Delay (SMD) and Clock Synchronized Delay (CSD) circuits, on the other hand, are non-feedback systems that can achieve lock in only two clock cycles [52], [88] and [94]. These circuits can therefore be disabled in standby mode, and they lock to the reference clock in just two clock cycles when normal operation resumes.

A conventional SMD circuit, shown in Figure 2.6, consists of an input buffer with delay $d_1$, a clock driver with delay $d_2$, a replica delay line (a dummy input buffer plus a dummy clock driver, with total delay $t_{replica} = d_1 + d_2$), and two delay lines arranged in parallel (a delay-measurement line and a variable-delay line). When the circuit is activated, the first clock signal propagates through the input buffer, the replica delay line, and then the delay-measurement line for a time $t_{CK} - t_{replica}$, until the second clock signal comes out of the input buffer. This delay time, $t_{CK} - t_{replica}$, determines the length of the variable-delay line. The second clock signal propagates through the variable-delay line and comes out of the clock driver. The resulting total delay time, shown in Figure 2.7, is

$$d_1 + t_{replica} + (t_{CK} - t_{replica}) + (t_{CK} - t_{replica}) + d_2 = 2t_{CK}$$

In this manner, no feedback circuitry is used, and clock skew is eliminated within two clock cycles. The simple structure of the SMD circuit also reduces design effort [52], [88] and [94].

Figure 2.6 Conventional SMD (input buffer, replica delay, delay-measurement line, variable-delay line, clock driver $d_2$, internal clock line; measured interval $t_V = t_{CK} - (d_1 + d_2)$).

Figure 2.7 Timing diagram of a conventional SMD

Despite their advantages, SMD circuits are not widely used, because their dummy clock driver must match the actual clock driver circuit, which is only known after placement and routing. Therefore, they are used in devices whose clock driver circuits can be fixed during the circuit design stage, e.g., memory elements [94].

Furthermore, process, power supply voltage, and temperature (PVT) variations cause a difference between the original clock driver circuit and the dummy clock driver circuit. This delay difference increases the phase error, which cannot be compensated for during operation because the SMD circuit has no feedback mechanism.

A direct-skew-detect synchronous mirror delay (direct SMD) achieves clock-skew suppression in only two clock cycles [43] and [52]. It can be used for application-specific integrated circuits (ASICs) with undefined clock paths, as shown in Figure 2.8. The direct SMD circuit detects both the clock skew and the clock cycle by using a direct-skew detector and clock suppression circuitry. The direct SMD circuit does not use a dummy clock driver circuit.
Therefore, it does not experience the problems described above for a conventional SMD circuit.

Figure 2.8 Block diagram of direct SMD

2.5 Register-Controlled DLL (RDLL)

The RDLL belongs to the digital DLL family and is widely used in high-speed synchronous DRAM (SDRAM) applications [17], [51], [85] and [90]. In an SDRAM, the output data strobe (DQS) should be locked to the data outputs. To optimize and stabilize clock-access and output times, an internal RDLL is used in the SDRAM chip to adjust the time difference between the output and input clock signals.

The RDLL consists of a tapped delay line, a shift register, a phase detector, and a replica (dummy) input buffer [85]. The dummy input buffer is placed in the feedback path to match the delay of the input clock buffer. The phase detector (PD) compares the relative timing of the edges of the input clock and the feedback clock signal, which comes through the tapped delay line. The shift register controls the point at which the incoming external clock enters the delay line, as shown in Figure 2.9.

Figure 2.9 Register Controlled DLL (RDLL)

The outputs of the phase detector, shift-right and shift-left, control the shift register. In the conventional RDLL, only one bit of the shift register output is high, while the other bits are zero. This single bit selects the point of entry for CLKIn in the delay line. When the rising edge of the input clock is within the resolution (one unit delay) of the output clock's rising edge, both PD outputs, shift-right and shift-left, are low and the loop is locked, as shown in Figure 2.10.
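The lock procedure of a conventional RDLL can be illustrated with a small behavioural model. This is only a sketch under simplifying assumptions, not the circuit of [85]: the phase detector is reduced to comparing a hypothetical target delay with the delay of the selected tap, tap k is assumed to pass through k + 1 identical unit delays, and the function name `rdll_lock` and the 70 ps unit delay are illustrative.

```python
def rdll_lock(target_ps, unit_ps, n_stages, start_tap=0):
    """Return (locked_tap, decisions) for a linear register-controlled DLL.

    Illustrative model only: tap k is assumed to add (k + 1) unit delays,
    and lock means the remaining error is smaller than one unit delay.
    """
    # One-hot shift register: exactly one '1' selects the delay-line tap.
    reg = [0] * n_stages
    reg[start_tap] = 1
    for decisions in range(n_stages + 1):
        tap = reg.index(1)
        error_ps = target_ps - (tap + 1) * unit_ps
        if abs(error_ps) < unit_ps:
            # Both phase-detector outputs low: the loop is locked.
            return tap, decisions
        if error_ps > 0:
            # Too little delay: move the '1' one place toward a longer tap.
            if tap == n_stages - 1:
                break
            reg = [0] + reg[:-1]
        else:
            # Too much delay: move the '1' one place toward a shorter tap.
            if tap == 0:
                break
            reg = reg[1:] + [0]
    raise ValueError("target delay is outside the range of the delay line")


# Example: 128 stages of 70 ps each, starting from the centre tap.
print(rdll_lock(5000, 70, 128, start_tap=63))   # (70, 7)
```

Each call of the loop corresponds to one phase-detector decision, so the returned decision count shows directly why the starting tap matters for lock time.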
Figure 2.10 Core circuit in RDLL

The resolution of the RDLL is determined by the size of the unit delay used in the delay line, and the locking range is determined by the number of delay stages in the delay line. Since the DLL inserts a delay between CLKIn and CLKOut so that the output clock changes simultaneously with the next rising edge of the input clock, the minimum operating frequency at which the RDLL can lock is the reciprocal of the total delay of the line, $F_{min} = 1/(T_d \cdot N)$, where $T_d$ is the delay of one unit delay and $N$ is the number of unit delays in the delay line. Adding more delay stages increases the locking range of the RDLL at the cost of increased chip area and power consumption [17], [51], [85] and [90].

The conventional RDLL uses an AND gate (a NAND gate followed by an inverter) as the unit-delay stage. The problem with using a NAND + inverter as the basic delay element is that the propagation delay through the unit delay for a high-to-low transition is not equal to the delay of a low-to-high transition, i.e., $t_{PHL} \neq t_{PLH}$. If the difference between $t_{PHL}$ and $t_{PLH}$ is 20 ps, for example, then the total skew of the falling edge through 50 stages is 1 ns. Because of this skew, the input clock's duty cycle is not preserved when the clock propagates through the delay line.

A Register-Controlled Symmetrical DLL (RSDLL) is proposed in [51], which can be used in duty-cycle-sensitive applications. For example, it meets the requirements of double-data-rate (DDR) SDRAM, in which read/write accesses occur on both the rising and falling edges of the clock. In the RSDLL, a modified symmetrical delay element is used, with a NAND gate in place of the inverter (two NAND gates per delay stage).
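The two relations in this section, the minimum lock frequency $F_{min} = 1/(T_d \cdot N)$ and the accumulation of the $t_{PHL}$/$t_{PLH}$ mismatch along the line, can be checked numerically. The sketch below is illustrative: the helper names are hypothetical, and the individual values of 60 ps and 80 ps for $t_{PLH}$ and $t_{PHL}$ are assumed; only their 20 ps difference and the 50 stages come from the text.

```python
def min_lock_frequency_mhz(unit_delay_ps, n_stages):
    """F_min = 1 / (T_d * N); with T_d in ps, 1e6 / (T_d * N) is in MHz."""
    return 1e6 / (unit_delay_ps * n_stages)


def output_high_time_ps(high_time_ps, t_plh_ps, t_phl_ps, n_stages):
    """High-pulse width after n identical stages.

    The rising edge is delayed by n * t_PLH and the falling edge by n * t_PHL,
    so the pulse widens (or narrows) by n * (t_PHL - t_PLH).
    """
    return high_time_ps + n_stages * (t_phl_ps - t_plh_ps)


period_ps = 5000                                  # 200 MHz input clock, 50% duty cycle
high_ps = output_high_time_ps(2500, 60, 80, 50)   # assumed 60/80 ps: 20 ps mismatch
print(min_lock_frequency_mhz(50, 200))            # 100.0
print(high_ps, high_ps / period_ps)               # 3500 0.7
```

With a symmetrical stage ($t_{PLH} = t_{PHL}$) the same call returns the original 2500 ps high time, which is the property the two-NAND RSDLL stage is designed to guarantee.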
Figure 2.11 Core circuit in an RSDLL

This symmetrical unit delay guarantees that $t_{PHL} = t_{PLH}$ independently of process variations, since when one NAND gate switches from HIGH to LOW, the other switches from LOW to HIGH. The schematic of a symmetrical DLL is shown in Figure 2.11.

2.6 Vernier Delay Locked Loop (VDLL)

The Vernier principle is based on the Vernier caliper [83]. The tool measures the length of an object placed between its two jaws. On the side of the jaws, an indicator mark shows the distance between the jaws on a scale. Since the indicator usually falls between two tick marks, additional accuracy is obtained by subdividing the distance between tick marks.

An additional scale is included next to the indicator, which has ten divisions in a distance equal to nine divisions on the main scale. Because of this mismatch, it is possible to measure a subdivision of the primary scale ten times smaller than the distance between tick marks.

Based on this concept, a delay line with $N = 10$ delay elements can be designed to have a total delay of $H = 9$ clock periods, i.e., $N D = H T$. The minimum achievable time step is then $\Delta = T/N = D/H$, where $T$ is the period of the input clock and $D$ is the delay of each delay element.

This technique was introduced and implemented for time-to-digital converters (TDCs) [36], [70], [83] and [99]. A TDC digitizes a time interval and has many potential applications in high-energy and nuclear physics experiments.

In a conventional digital DLL, the quantization error is equal to the propagation delay of each unit in the delay line. In a 0.35 µm CMOS technology, the propagation delay of an inverter gate is about 40 ps. Thus, a unit delay consisting of two inverters in series presents a delay of 80 ps.
For a GHz operating frequency, the 80 ps quantization error accounts for 8% of the 1 ns clock period, an error that affects the functionality of a synchronous system. The Vernier technique is used to reduce this error in [5], [24], and [86].

A modified version of the Register-Controlled DLL (RDLL) that relies on the Vernier concept is proposed in [73]. It consists of two RDLLs in series. The first RDLL performs the coarse delay adjustment, with a 200 ps quantization error. The second RDLL, with a 40 ps quantization error, performs fine tuning.

The coarse RDLL uses the conventional delay line, where each unit delay consists of a NAND gate and an inverter in series. The fine RDLL uses a different configuration, composed of two delay elements with delay times of $t_d$ and $1.2t_d$, where $t_d$ is the unit delay time of the conventional delay element, as shown in Figure 2.12.

The delay elements are arranged in two parallel lines, a main delay line and a sub-delay line, which are connected by switches SW0 to SW4. In Figure 2.12, only one of the switches can be closed at any time. For example, if SW0 is closed, the delay line generates $5t_d$; similarly, if SW1 is closed, the delay line generates $5.2t_d$. Thus, this arrangement can generate a $0.2t_d$ delay step, which is considerably smaller than the step of a conventional delay line.

Figure 2.12 Block diagram of a Vernier delay line [73]

In Figure 2.13, the main and sub-delay lines are connected by switches SW0 to SW4. The fan-out of the main delay line is one, while that of the sub-delay line is two. Hence, the delay of each sub-delay line element exceeds that of a main delay line element. This delay difference becomes the unit delay time of the Vernier line, which is equal to its quantization error, as shown in Figure 2.13.
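The Vernier arithmetic of this section can be reproduced with exact fractions. The first function applies the caliper relation $N D = H T$, $\Delta = T/N = D/H$. The second models the two-line arrangement of Figure 2.12 under an inferred topology (not stated explicitly in the text): closing switch SWk is assumed to route the signal through k sub-line elements of $1.2t_d$ and $5-k$ main-line elements of $t_d$, which reproduces the $5t_d$ and $5.2t_d$ values given above. The value $t_d = 200$ ps is chosen because it makes the $0.2t_d$ step equal to the 40 ps fine step quoted for [73]; the function names are hypothetical.

```python
from fractions import Fraction


def caliper_vernier(period, n_elements, h_periods):
    """Return (D, step) for N elements spanning H clock periods (N*D = H*T).

    The resolution is step = T/N, which also equals D/H.
    """
    element_delay = Fraction(h_periods * period, n_elements)
    step = Fraction(period, n_elements)
    return element_delay, step


def two_line_delay(k, td, n_main=5, sub_ratio=Fraction(6, 5)):
    """Delay with switch SWk closed (k = 0 .. n_main - 1).

    Assumed topology: k sub-line elements of 1.2*td followed by
    (n_main - k) main-line elements of td.
    """
    if not 0 <= k < n_main:
        raise ValueError("switch index out of range")
    return k * sub_ratio * td + (n_main - k) * td


d, step = caliper_vernier(1000, 10, 9)     # T = 1000 ps, N = 10, H = 9
print(d, step, d / 9)                      # 900 100 100
print([int(two_line_delay(k, 200)) for k in range(5)])
# [1000, 1040, 1080, 1120, 1160]: a 40 ps (0.2 td) step
```

Using `Fraction` keeps the 1.2 ratio exact, so the 40 ps step comes out as an integer rather than a rounded floating-point value.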
Figure 2.13 Schematic of Vernier delay line [73] (sub-delay line: $t_d + \Delta$, fan-out 2; main delay line: $t_d$, fan-out 1)

Chapter 3 Design of proposed DLL

This chapter presents the block diagram of the proposed circuit and a detailed circuit description of each module in the block diagram. The logic design is described thoroughly; the simulation results are covered in the next chapter. The design goal is to increase the resolution of the DLL to less than 10 ps and to reduce the area (gate count) of the Vernier delay line by at least 10%. The power consumption is also reduced as a result of the gate reduction in the Vernier delay line. This design achieves a resolution of less than 10 ps, an area reduction of 15%, and an operating frequency of up to 200 MHz.

3.1 Block diagram

The block diagram consists of four modules: the phase detector, the lock detector, the Vernier delay line, and the controller, as shown in Figure 3.1.

Figure 3.1 Block diagram of proposed DLL

The input clock is connected to two modules, the phase detector and the Vernier delay line. The Vernier delay line propagates the input clock and provides N output taps, where N is the number of unit delays in the delay line.

In order to lock the output clock to the input clock for all input frequencies, the delay of the delay line should be greater than the period of the minimum operating frequency. For example, if a DLL's locking range is 100 MHz to 200 MHz, the delay line must be able to delay the input clock by 10 ns, so that the input clock is delayed by 10 ns when it exits the last output tap. If the delay of each unit is, for example, 50 ps, then the delay line needs 200 unit delays. Therefore, lowering the minimum operating frequency requires more unit delays, which leads to more area and power consumption.
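The sizing rule above, that the line must span one period of the lowest operating frequency, i.e. $N \ge 1/(f_{min} \, t_{unit})$, reduces to a one-line calculation. The helper name is hypothetical, the frequency is assumed to be an integer number of MHz, and exact fractions are used so that the ceiling is not affected by floating-point rounding.

```python
import math
from fractions import Fraction


def required_unit_delays(f_min_mhz, unit_delay_ps):
    """Smallest N with N * unit_delay >= one period at the minimum frequency."""
    period_ps = Fraction(10**6, f_min_mhz)   # 1 / f, with f in MHz and t in ps
    return math.ceil(period_ps / unit_delay_ps)


print(required_unit_delays(100, 50))   # 200 unit delays for a 10 ns line
print(required_unit_delays(200, 50))   # 100: a higher f_min needs a shorter line
```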
The delay of each unit depends on the number of cascaded gates in each unit and on the technology in which the circuit is implemented. The conventional unit delay consists of one NAND gate and one inverter in series, which generates a delay of approximately 70 ps in 0.18 µm technology. The same unit cell implemented in 0.35 µm technology generates approximately 100 ps. These delay estimates are based on commercial libraries.

The conventional unit delay has a drawback: its propagation delay is not symmetrical, so the total delay for the rising edge of the input signal is not the same as for the falling edge. Therefore, an input clock with a 50% duty cycle can result in a square wave that no longer has a 50% duty cycle, as shown in Figure 3.2. This asymmetry can cause problems in Double-Data-Rate DRAMs, where read/write accesses can occur on both the rising and falling edges of the clock [1], [17], [21], [32], [44], [46], [50] and [51].

Figure 3.2 Circuit and timing diagram of a conventional unit delay ($t_1 \neq t_2$)

The proposed DLL uses a unit delay consisting of two basic NAND gates in series. This configuration eliminates the asymmetry of the conventional unit delay. The total propagation delay $t_1 = T_{PHL} + T_{PLH}$ for the rising edge of the input is equal to $t_2 = T_{PLH} + T_{PHL}$ for the falling edge of the same input clock, where $T_{PHL}$ and $T_{PLH}$ are the high-to-low and low-to-high delays of the NAND gate, respectively, as shown in Figure 3.3. Therefore, the duty cycle of the input clock is preserved throughout the delay line.

Figure 3.3 Circuit and timing diagram of a symmetrical unit delay ($t_1 = t_2$)

The Vernier delay line is controlled by a finite state machine, referred to simply as the controller. There are two modes of operation, coarse and fine. A system reset signal initiates the coarse tuning mode.
In this mode, the phase detector compares the output clock signal from the center tap with the reference input clock. If the positive edge of the input reference clock is leading, the controller shifts the output tap to the left and the total delay decreases. On the other hand, if the positive edge of the input reference clock is lagging, the controller shifts the output tap to the right and the total effective delay increases.

The controller enters the fine tuning mode when the positive edges of the input reference clock and of the delay-line output tap are less than one unit delay apart. Therefore, the delay of each Vernier unit determines the resolution of the coarse tuning mode. In the fine tuning mode, each shift to the left or right is a fraction of a coarse tuning step. This enhanced resolution determines the final resolution of the system and sets the maximum phase jitter.

The phase detector compares the input reference clock with the output tap signal of the delay line. The resolution of the DLL depends not only on the fine resolution of each Vernier unit delay but also on the resolution of the phase detector. In this design, the phase detector's resolution is determined by the differential delay of a two-input NAND gate.

Generally, in CMOS gates the propagation delays from the different input ports to the output port are not the same. For example, in a NAND gate, input A, which is connected to NMOS transistor T1, has a smaller propagation delay than input B, which is connected to NMOS transistor T2, because the capacitive load on the drain of T2 is larger than that on T1, as shown in Figure 3.4. For a two-input NAND gate in CMOS 0.18 µm technology, this difference is less than 10 ps and varies with the load and the input signal transition time (slew rate).

Figure 3.4 CMOS NAND gate

The phase detector block has three outputs: increase_delay, decrease_delay, and controller_clk, as shown in Figure 3.5.
At any time during the coarse and fine tuning modes, one of the increase_delay or decrease_delay outputs is active, and controller_clk is used to synchronize the controller with the phase detector, so any shift to the right or left is performed on the positive edge of the controller_clk output.

Figure 3.5 Phase Detector block (inputs: dll_clk_input, dll_clk_output, reset; outputs: increase_delay, decrease_delay, register_clk)

When the interval between the positive edges of the output clock and the input reference clock is within the resolution of the DLL, the DLL is in lock mode. All of the phase detector's outputs are disabled and the controller stays in standby mode. The lock detector block indicates when the DLL is in lock mode: its output goes high when the DLL is locked, as shown in Figure 3.6.

Figure 3.6 Lock Detector block (inputs: increase_delay, decrease_delay, dll_clock_input, reset; output: lock indicator)

The controller block is a finite state machine (FSM) that controls the delay line, as shown in Figures 3.7 and 3.8. It controls the coarse and fine tuning modes and also provides the mechanism to resume lock mode when the input clock's frequency or phase changes rapidly. The system reset pulse initializes the DLL, and the controller block goes into reset mode when the system is powered up. The detailed state diagram is shown in Figure 3.12.

Figure 3.7 Controller block (inputs: reset, increase_delay, decrease_delay, register_clk; outputs: fine_control, fine_control_inv, coarse_control)

Figure 3.8 Vernier delay line block (inputs: fine_control, fine_control_inv, coarse_control, delay_line_input; output: delay_line_output)

The delay line has 128 output taps controlled by the controller block. During initialization, the center tap is selected as the output tap. The register-input bus is hardwired to the hex value "00000000000000008000000000000000", which means that all the register-input bits except bit 63 are tied to logic zero.
During system power-up, the input load signal is asserted to logic one. Consequently, this value is loaded into a 128-bit shift register. After reset, the center tap, corresponding to output control bit 63 of the controller's shift register, is selected as the delay-line output tap.

It is possible to load the shift register with any other value, so any output tap in the delay line can be selected. The center tap is, however, the best choice, because it gives the maximum dynamic range for both right and left shifts, so lock can be achieved in the shortest time. In addition to speed, choosing the center tap as the initial output tap leaves the maximum number of unit delays in both directions, so the controller's output selector does not reach the boundary taps before entering lock mode.

At any time, only one bit of the shift register is active, selecting an output tap of the delay line. In this design, all the unit delays are identical and exhibit the same amount of delay. A linear approach has been selected to achieve lock in the DLL designed in this thesis: the controller shifts the output tap to the right or left one step at a time, so the skew between the output clock and the input reference clock is gradually reduced to its minimum, which is less than the resolution of the fine tuning delay units.

A Successive Approximation Register Delay Locked Loop (SARDLL) is proposed in [33], which uses a counter instead of a shift register. Its delay line is also binary-weighted and no longer consists of delay units with equal delay times. The N-bit control word from the up/down counter determines whether the input clock goes through or bypasses each delay stage, as shown in Figure 3.9.

Figure 3.9 SARDLL block diagram [33]
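The hardwired load value and the cost of the linear search can be checked with a short sketch. It decodes the 128-bit load value 00000000000000008000000000000000 to confirm that only bit 63 is set, counts the worst-case number of single-tap moves from that center tap, and, for comparison, models the binary-weighted SARDLL of Figure 3.9 as an MSB-first successive-approximation search. That search model is an assumption based on the name: the phase comparator is reduced to the test `trial <= target` against a hypothetical ideal setting `target`, and all function names are hypothetical.

```python
def selected_tap(load_value_hex, width=128):
    """Index of the single '1' in a one-hot shift-register load value."""
    value = int(load_value_hex, 16)
    if value == 0 or value & (value - 1) or value.bit_length() > width:
        raise ValueError("load value must be one-hot and fit in the register")
    return value.bit_length() - 1


def linear_search_decisions(start_tap, target_tap):
    """Linear controller: the single '1' moves one tap per decision."""
    return abs(target_tap - start_tap)


def sar_search(target, n_bits):
    """MSB-first successive approximation on a binary-weighted delay line.

    Returns (control_word, decisions). The test trial <= target stands in
    for the phase comparator reporting that the output clock still leads.
    """
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)   # tentatively switch in the 2**bit stage
        if trial <= target:         # output still leads: keep the bit
            code = trial
    return code, n_bits             # exactly one decision per bit


start = selected_tap("00000000000000008000000000000000")
worst_linear = max(linear_search_decisions(start, t) for t in range(128))
print(start, worst_linear, sar_search(5, 7))   # 63 64 (5, 7)
```

Under these assumptions the linear controller needs up to 64 decisions from the center tap, while a 7-bit successive approximation always uses 7; the linear scheme trades lock time for the simpler one-hot register and equal-delay units.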
For faster locking, a binary search algorithm is incorporated into the SARDLL. This algorithm reduces the search effort and shortens the lock time. The flowchart in Figure 3.10 shows how the algorithm works for a three-bit control word. At the beginning, the most significant bit (MSB) of the controller output is set to one and all the other bits are set to zero. A phase comparator examines whether the output clock leads the input clock. If it does, the MSB remains high; if not, it is set low and held constant. In this way the MSB is determined, and the process is repeated for each following bit until the least significant bit (LSB) is determined. As a result, the DLL can lock quickly.

Figure 3.10 Flowchart for weighing sequence

A conventional linear approach has been implemented in this thesis. Devising the best algorithm to shorten the lock time is an independent topic that can be explored in future research.

3.2 DLL modules description

In this section, all the modules of the proposed DLL are explained in detail. First, the circuit and all of its components are described. Then, the functionality and operation of each block are investigated in detail.

3.2.1 Vernier delay line

The delay line consists of N unit delays in a chain configuration. In this design N = 128, which establishes an approximate minimum operating frequency of 100 MHz based on the target specification. More unit delays are needed to lower the minimum operating frequency. Each unit delay consists of five dual-input NAND gates; therefore, a total of 640 NAND gates are used in this Vernier delay line.

In clock distribution applications, the clock frequency is fixed, so the minimum value of N is calculated for that frequency, which automatically leads to minimal power and area consumption.
In clock recovery applications, the DLL operates over a range of frequencies, so the value of N is determined by the lowest frequency component in the incoming data.

The output ports of all 128 delay units in the Vernier delay line are connected to a single-bit bus. This bus is the output of the DLL and is fed back to the phase detector block for phase comparison. If none of the tri-state output buffers in the Vernier delay line is enabled, the DLL output floats and is neither low nor high.

To prevent the DLL output from floating, a small tri-state buffer is connected to the DLL output. The input and enable ports of this buffer are tied to logic high, so its output holds the DLL output at a weak high ('1') value. Because of the weak drive capability of this small buffer, a low output from any one of the 128 buffers overrides the weak high value and pulls the DLL output down to logic '0'.

Each unit delay consists of five NAND gates and one tri-state buffer. U1 and U2 form the fine unit delay, and U3 and U4 form the coarse unit delay. U5 acts as a switch controlled by the fine_control input. The coarse_control input is connected to the enable port of the buffer gate (U6) and determines whether the unit_delay_out port is connected to the output of U4 or is in a high-impedance state, as shown in Figure 3.11.

Figure 3.11 Proposed unit delay circuit

The clk_output ports of all N unit delays are tied together to form a one-bit tri-state bus. A single tri-state buffer with weak output drive holds this bus at a weak high level, which guarantees that the bus never floats.

The fine and coarse delay units are each constructed from two NAND gates in series, forming a symmetrical delay line.
The propagation delay is the same for both rising and falling edges, so the duty cycle is preserved along the line.

The output ports of U2 and U4 each drive two other inputs, so their fan-out is two, and both U2 and U4 use the A1 port as the delay input. As a result, U2 and U4 introduce the same amount of propagation delay. The difference between the fine and coarse unit delays is that fine_input is connected to port A2 of U1, whereas coarse_input is connected to port A1 of U3. In a NAND gate, the propagation delays from the A1 and A2 ports to the output Z are not the same. The Vernier technique exploits this inherent characteristic of the NAND gate and uses the differential delay between the two inputs to achieve a fine step resolution.

In the DLLs proposed in [47], [59], [51], [85] and [90], the input clock is connected to all the unit delays in the delay chain, so there are N taps, where N is the number of unit delays in the delay line. This large fan-out requires a clock driver, which is large in area and consumes extra power. It also introduces an extra delay that has to be compensated for with another dummy clock driver in the feedback path.

In this design, the input clock is connected to only two NAND gates in the first unit delay, so no clock driver is needed. This eliminates the phase shift between the input reference clock and the output clock caused by delay mismatch between the clock driver and the dummy clock driver.

Each unit delay contains a total of five dual-input NAND gates and one tri-state gate, which is fewer than the six dual-input NAND gates and six inverters used in the previously described digital Vernier DLL circuit [73], shown in Figure 2.13.

The coarse_input and fine_input ports of the first unit delay are tied to the input reference clock port. This is the entry port for both the fine and coarse chains; from this point, the reference clock propagates through two separate fine and coarse delay chains.
The fine_output, coarse_output and vernier_output ports of the last unit delay in the chain are not connected to any net.

3.2.2 Vernier delay line controller

The controller block consists of a finite state machine (FSM) and two shift registers that control the DLL operation. All the timing control for the delay line originates in the controller block. It determines which output tap in the delay line is connected to the DLL output and whether the DLL is in coarse or fine mode.

Figure 3.12 State diagram of controller block

The finite state machine has four states: IDLE, INCREMENT, DECREMENT, and FINE, as shown in Figure 3.12. The FSM remains in the IDLE state while reset is asserted. The initial coarse_load_data value is loaded into the coarse shift register when reset is asserted. This value determines which output tap is selected as the output of the Vernier delay line; the default value of "00000000000000008000000000000000" selects the center tap.

The register_clock, increase_delay and decrease_delay signals are generated by the phase detector block. The register_clock input is used to clock the shift registers. The increase_delay and decrease_delay signals determine whether a right shift or a left shift is performed, as shown in Figure 3.13.

Figure 3.13 Shift registers in controller block

Depending on whether the increase_delay or decrease_delay signal is asserted, the state machine moves to the INCREMENT or DECREMENT state, respectively.
The state machine stays in the DECREMENT state as long as decrease_delay is asserted and moves to the FINE state when increase_delay is asserted for the first time. The state machine stays in the INCREMENT state as long as increase_delay is asserted and moves to the DECREMENT state when decrease_delay is asserted for the first time; it then moves to the FINE state on the next clock in which increase_delay is asserted.

Therefore, regardless of whether it passes through the DECREMENT or the INCREMENT state, the state machine ends up in the FINE state, where the coarse shift register is disabled and the fine shift register's output determines the amount of incremental fine delay needed for the DLL to lock its output clock to the input reference clock. The DLL stays in lock mode as long as the input clock phase is steady and the phase difference between the output and input reference clocks is within the resolution of the phase detector.

If the input clock's frequency or phase changes at any time, the DLL exits lock mode. If the output clock's rising edge leads the input clock's rising edge, increase_delay is asserted. On the other hand, if the input clock's rising edge leads the output clock's rising edge, decrease_delay is asserted. In either case, register_clock is enabled. The state machine stays in the FINE state, and the fine shift register shifts left or right depending on whether decrease_delay or increase_delay is asserted.

For example, suppose the resolution of the Vernier delay line is 10 ps and the fine shift register holds the hex value "00000000010000000000000000000000" (bit 88 set) when the DLL is in lock mode. The fine shift register can be shifted to the left until its most significant bit becomes "1", which requires 39 clock cycles; the delay of the delay line is then decreased by 390 ps.
On the other hand, the shift register can be shifted to the right until its least significant bit becomes "1", which requires 88 clock cycles; the delay of the delay line is then increased by 880 ps. Therefore, if the phase error between the output clock's rising edge and the input clock's rising edge is within this window, the state machine stays in the FINE state and lock mode is achieved.

If the phase error is not within this window, the state machine moves to either INCREMENT or DECREMENT, depending on whether an increase or a decrease in delay is required. At this point, the fine shift register is reset to "0" and disabled. The coarse shift register, which controls the coarse delay line, is enabled, and each shift to the right or left increases or decreases the delay by one coarse unit delay (the delay of two NAND gates in series). The state machine finally moves back into the FINE state when the phase error is less than the coarse unit delay, and the fine incremental delay then reduces the phase error to less than the Vernier resolution.

To lower the power consumption of this DLL, only register_clock is used as the clock of the controller module. Therefore, while the DLL is in lock mode, both increase_delay and decrease_delay are deasserted and register_clock is not enabled. The controller module has 128 flip-flops in each of the coarse and fine shift registers, so turning off the clock to the shift registers when both are disabled lowers the power consumption. A flip-flop consumes power whenever it is clocked, regardless of whether its D input changes, so disabling a clock when it is not required saves power in digital circuits.

The Vernier delay line consists of 128 unit delays; therefore, there are 128 flip-flops in each of the coarse and fine shift registers. The finite state machine has four states, which are encoded in two bits and therefore require two flip-flops.
In total, there are 258 flip-flops in the controller module, so clock gating (disabling a clock when it is not required) saves power when the DLL is in lock mode.

3.2.3 High-resolution phase detector

The phase detector in a DLL detects the phase error between the output and input reference clocks. The resolution of a Vernier DLL depends not only on the Vernier concept used in the delay line, but also on how its phase detector is designed. The minimum phase error that can be detected by the phase detector is defined as the phase detector's resolution. This resolution depends on many factors, including the design methodology and the CMOS technology used for chip fabrication.

A high-resolution phase detector is proposed in [50], where the delay of a buffer determines the resolution. A resolution of 70 ps is achieved when it is implemented in 0.18 µm technology. The phase detector has three outputs, Shift_Left, Shift_Right, and Clk, as shown in Figure 3.14. When the rising edge of the input clock is within one unit delay (the delay of U4) of the rising edge of the output clock, both Shift_Right and Shift_Left go low and Clk is turned off.

A divide-by-two circuit is included in the phase detector, so the phase detector waits at least two clock cycles before making another decision (generating a high on either Shift_Right or Shift_Left). This provides enough time for the shift register in the design of [50] to operate and for its output waveform to stabilize, but it also increases the lock time, because a decision can only be made every two input clock cycles.

Figure 3.14 Phase detector in [50].

A modified version of the high-resolution phase detector of [50], which significantly improves the resolution, is proposed in this thesis. The Vernier methodology is applied in this design, which effectively reduces the amount of delay between the D inputs of U1 and U2.
As explained previously, the delays from the two inputs of an AND gate to its output are not the same.

This delay difference is exploited in the Vernier delay line to achieve a very small fine incremental unit delay, and the same concept is used in the high-resolution phase detector proposed in this thesis. The schematic of this phase detector is shown in Figure 3.15. U7 and U8 introduce the same delay because both gates are driven through pin A1 of the AND gate. The U3 gate introduces slightly more delay, because its A2 pin is used as the input. In the 0.18 µm technology library used for simulation and synthesis, the difference between the delays from the two inputs to the output of an AND gate is less than 10 ps.

Figure 3.15 Proposed high resolution phase detector

The decrease_delay and increase_delay signals are ORed to generate register_clk. The resolution of a phase detector is defined as the minimum detectable phase error between its two inputs. If the phase error is within the resolution of the phase detector, decrease_delay, increase_delay and register_clk stay low.

The OR gate (U6) also delays register_clk relative to increase_delay or decrease_delay, which guarantees the required setup time for the flip-flops in the controller that are driven by register_clk. In a flip-flop, the data should not change within the setup-and-hold window around the clock edge; otherwise, the output is unpredictable and can enter a metastable (unstable) condition.

If the output clock leads the input clock by a margin greater than the resolution (the delay difference between the A1-to-output and A2-to-output paths of the AND gate), the Q pins of U1 and U2 both go high, resulting in a high on the increase_delay output.

On the other hand, if the input clock leads the output clock by a margin greater than the resolution, the Q pins of U1 and U2 both go low (their inverted outputs go high), resulting in a high on the decrease_delay output.
In either case, register_clk goes high and generates the required clock edge for the logic in the controller module, as shown in Figure 3.16.

Figure 3.16 Phase detector waveforms (U1/Q, U2/Q, increase_delay, decrease_delay, and register_clk, for an output clock leading the DLL input clock and for an input clock leading the DLL output clock)

If neither case applies, the input and output clocks are within the resolution of the phase detector. In this case, the Q pin of U1 goes high and the Q pin of U2 goes low, resulting in a low on both the increase_delay and decrease_delay outputs. This happens when the output locks to the input and the DLL is in lock mode.

The divide-by-two logic (U3 and U7 in Figure 3.13) is not used in the proposed high-resolution phase detector. The delay of the Vernier delay line increases or decreases by a small differential amount equal to the resolution of the delay line. Therefore, the delay line can stabilize before the next decision is taken on the next edge of the input clocks, and decisions do not need to be restricted to every other clock cycle. This reduces the time the DLL needs to achieve lock mode. Lock mode is detected by the lock detector module, which is described in the next section.

3.2.4 Lock detector
The lock detector, shown in Figure 3.17, is a very simple circuit that outputs a high when the DLL is in lock mode. If both increase_delay and decrease_delay are low on the falling edge of the DLL input clock, the lock_indicator output goes high to indicate that the DLL is now in lock mode. The DLL input clock is used instead of register_clk because register_clk is off once the DLL goes into lock mode. It therefore cannot clock in the low values of decrease_delay and increase_delay.
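The phase-detector decisions and the lock detector described above can be summarized in a small behavioral sketch. This is a minimal Python model under stated assumptions, not the gate-level design. The phase error is treated as one signed number that is positive when the output clock leads. The flip-flops U1 and U2 are modelled as comparing that error against thresholds of -resolution/2 and +resolution/2, so the dead zone is as wide as the A1/A2 delay difference. The default 4 ps resolution is the simulated value reported in Chapter 4. The names `phase_detector`, `lock_indicator`, and `PhaseDetectorOutputs` are hypothetical, and setup timing, metastability, and the OR-gate delay are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseDetectorOutputs:
    increase_delay: bool
    decrease_delay: bool
    register_clk: bool


def phase_detector(phase_error_ps: float,
                   resolution_ps: float = 4.0) -> PhaseDetectorOutputs:
    """Decision made by the phase detector on one input-clock edge.

    phase_error_ps > 0 means the DLL output clock leads the input clock.
    Assumption: U1 and U2 compare the same phase error against thresholds
    of -resolution/2 and +resolution/2 (the Vernier A1/A2 delay difference).
    """
    half = resolution_ps / 2.0
    u1_q = phase_error_ps > -half
    u2_q = phase_error_ps > half
    increase = u1_q and u2_q                 # both Q high: output leads
    decrease = (not u1_q) and (not u2_q)     # both Q low: input leads
    # register_clk is the OR of the two requests (gate U6).
    return PhaseDetectorOutputs(increase, decrease, increase or decrease)


def lock_indicator(pd: PhaseDetectorOutputs) -> bool:
    """Value latched by the lock detector on the falling input-clock edge."""
    return not pd.increase_delay and not pd.decrease_delay


print(phase_detector(6.0))
# PhaseDetectorOutputs(increase_delay=True, decrease_delay=False, register_clk=True)
print(lock_indicator(phase_detector(2.0)))  # True: U1/Q high, U2/Q low
```

Because a decision is produced on every input-clock edge, the model has no divide-by-two state. This mirrors the choice made in the proposed detector.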
Figure 3.17 Lock detector circuit (inputs increase_delay, decrease_delay, and the DLL input clock; a D flip-flop drives lock_indicator)

Chapter 4 Analysis of proposed DLL
This chapter analyzes the simulation results, describes the testbench, and demonstrates how the DLL achieves lock mode. The coarse and fine phases of the locking process are investigated and illustrated in the captured waveforms.

4.1 Testbench
A simple testbench instantiates the DLL design and the clock and reset generators. It also introduces a glitch in the clock to examine how the DLL re-enters lock mode when its input clock phase changes abruptly. The lock_indicator signal is monitored, and whenever it goes high, the DLL has entered lock mode. The target resolution is less than 10 ps for the operating frequency range of 100 MHz to 200 MHz.

To verify that the DLL can recover from abrupt input phase changes, a glitch is imposed on the input clock source after a set period of time. This drives the DLL out of lock, and the DLL mechanism guarantees recovery: after some time, the DLL locks to the input signal again. The time the DLL takes to lock depends on the input fluctuations, the DLL architecture, the length of the delay line, and the algorithm used in the controller module. The worst-case value is defined as the lock recovery period.

The DLL described in this thesis is in lock mode when the controller's state machine is in the FINE state and both the increase_delay and decrease_delay signals are inactive. Depending on the imposed glitch, lock can be regained within the FINE state, provided the glitch is small enough to be absorbed by the fine delay line. Any larger variation forces the state machine to enter the INCREMENT or DECREMENT state. From there it later re-enters the FINE state and finally regains lock status.
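The thesis does not list its testbench source. As a minimal sketch, the glitch stimulus described above can be expressed as a list of input-clock rising-edge times, assuming the glitch is an abrupt phase step applied to every edge after a chosen time. The function `input_clock_edges` and its parameters are hypothetical. The 100 MHz frequency lies within the stated operating range, and the 240 ns / 500 ps values mirror the case-2 experiment of Section 4.3.2.

```python
from typing import List, Optional


def input_clock_edges(freq_mhz: float, n_edges: int,
                      glitch_at_ns: Optional[float] = None,
                      glitch_ps: float = 0.0) -> List[float]:
    """Rising-edge times, in ps, of an ideal input clock with one phase step.

    Every edge at or after glitch_at_ns is shifted by glitch_ps (positive =
    later). This models the abrupt input phase change used to knock the DLL
    out of lock.
    """
    period_ps = 1e6 / freq_mhz              # e.g. 100 MHz -> 10000 ps
    edges = []
    for k in range(n_edges):
        t = k * period_ps
        if glitch_at_ns is not None and t >= glitch_at_ns * 1e3:
            t += glitch_ps                  # phase step applied from here on
        edges.append(t)
    return edges


edges = input_clock_edges(100.0, 30, glitch_at_ns=240.0, glitch_ps=500.0)
print(edges[23], edges[24])  # 230000.0 240500.0
```

A phase step, rather than a single short pulse, is used here because it produces a lasting phase error that the DLL must track. In a real testbench the same idea would be written in the HDL of the design.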
The testbench is configured for six different cases that together cover all the operational modes of the DLL. The first two cases verify the general locking process after power-up and reset, for both a leading and a lagging input clock with respect to the output clock. The other four cases verify the lock re-entry process when a glitch is applied to the input clock. These four cases arise from two glitch sizes and the relative position of the input and output clocks (leading or lagging). The following sections detail all the cases. All the waveforms are included, and each case describes the operation of the phase detector and the controller to clarify the DLL's operating mechanism.

4.2 Initial lock
After power-up and reset, either the increase_delay or the decrease_delay output of the phase detector goes high, depending on the polarity of the phase error. If the input clock leads the output clock, decrease_delay is enabled; if the output clock leads the input clock, increase_delay is enabled. If the output clock is in phase with the input clock, both increase_delay and decrease_delay (the phase detector outputs to the controller module) are disabled.

If decrease_delay is enabled, the state machine transits to the DECREMENT state. In this state, the coarse shift register shifts one unit to the left on every clock, which decreases the total delay by one unit. At some point the output clock starts leading the input clock. This means that the coarse action is complete, and the state machine transits to the FINE state. In this state, the fine shift register shifts to the right, and on every cycle the total delay of the delay line increases by a small incremental value. As described in the previous chapter, this incremental value is 4 ps for the NAND gate used in this design.
Finally, the output clock is within the DLL resolution (4 ps) of the input clock, and increase_delay is disabled. The lock_indicator signal goes high, indicating that the DLL is locked. The captured waveforms are shown in Figures 4.1 and 4.2; for clarity, only the related signals are captured. After the DLL locks, the phase error between the output and input clocks is 2 ps while the LOCK_INDICATOR signal is high, as shown in Figure 4.2.

Figure 4.1 Initial lock mode waveform for a leading input clock (signals: RESET, LOCK_INDICATOR, DLL_CLOCK_OUTPUT, DLL_CLOCK_INPUT, REGISTER_CLOCK, DECREASE_DELAY, INCREASE_DELAY, NEXTSTATE, STATE; state sequence IDLE, DECREMENT, FINE)

Figure 4.2 Initial lock mode waveform for a leading input clock (zoomed in)

On the other hand, if increase_delay is enabled, the state machine transits to the INCREMENT state. In this state, the coarse shift register shifts one unit to the right on every clock edge, which increases the total delay by one unit delay. At some point, the input clock starts leading the output clock and decrease_delay is asserted, which means that the coarse action is complete. The state machine then moves to the DECREMENT state and, after one clock cycle, enters the FINE state, as shown in Figure 4.3. This sequence is needed because the fine delay line output tap is initially set to the first tap, the leftmost position of the chain, so the fine delay can only be increased.
Therefore, going to the DECREMENT state makes the output clock lead the input clock again, but this time the phase error is less than one unit delay. In the FINE state the delay then increases incrementally until the phase error is within the resolution and lock is achieved. After the DLL locks, the phase error between the output and input clocks is 2 ps, as shown in Figure 4.4.

Figure 4.3 Initial lock mode waveform for a leading output clock (same signals as Figure 4.1; state sequence IDLE, INCREMENT, DECREMENT, FINE)

Figure 4.4 Initial lock mode waveform for a leading output clock (zoomed in)

4.3 Lock re-entry
Phase variations on the input clock due to jitter and glitches introduce a phase error, which in turn causes the DLL to exit lock mode. This initiates a re-entry process, after which the DLL resumes its lock status. Depending on the amount of phase error, the state machine can stay in the FINE state or move to the INCREMENT or DECREMENT state. The following sections explain these two cases in detail.

4.3.1 Lock re-entry (case 1)
If the phase error is within the dynamic range of the fine delay line, the DLL re-enters lock mode while the state machine stays in the FINE state.
The dynamic range of the fine delay line is the range over which its delay can be increased or decreased without reaching the limit in either direction. The total delay of the fine delay line is $N \times T_f$, where $N$ is the number of fine delay units in the chain and $T_f$ is the delay of each fine unit. In this design $N = 128$ and $T_f = 4$ ps, so the total fine delay is 512 ps. The 4 ps value is the difference between the two input-to-output delays of a 2-input NAND gate in the library.

For example, suppose the fine delay line's output is the middle tap of the chain while the DLL is locked. The fine delay can then be increased or decreased by half of the total fine delay, or 256 ps. Any input phase error of less than 256 ps is therefore compensated, and lock is resumed while the state machine stays in the FINE state. The simulation result is shown in Figure 4.5. The INCREASE_DELAY signal goes high for one clock, which increases the total delay by one fine unit delay (4 ps) and compensates for the added 6 ps input phase error. The remaining 2 ps is within the 4 ps resolution of the phase detector, and the DLL is locked.

Figure 4.5 Lock re-entry mode waveform for small phase error (the state remains FINE throughout)

4.3.2 Lock re-entry (case 2)
The phase error cannot be corrected by the fine action alone if it is larger than the dynamic range of the fine delay line.
For example, if the fine delay's output tap is at the center of the fine delay line while the DLL is locked, any phase error greater than half of the fine delay line (256 ps) cannot be corrected while the state machine is in the FINE state.

Consider first a phase error in which the input clock leads the output clock. decrease_delay is enabled, and the fine delay line output tap shifts to the left until it reaches the first tap of the fine delay line. At this point the state machine moves to the DECREMENT state and the coarse action is enabled. On every clock, the total delay of the DLL is decremented by one unit delay, or 78 ps in the simulation. At a certain point, decrease_delay is deasserted and increase_delay is enabled. The state machine then moves to the FINE state and finally achieves lock.

Figure 4.6 shows that the DLL originally locks at 220 ns. A 500 ps glitch is applied at 240 ns, and the DLL locks again at 550 ns. The final phase when the DLL locks again is shown in Figure 4.7. The 500 ps glitch is large enough that the DLL exits lock mode and cannot re-lock within the dynamic range of the fine delay line, as it does in lock re-entry (case 1). The introduced glitch is shown in Figure 4.8.

Figure 4.6 Lock re-entry mode waveform for a leading input clock
Figure 4.7 Lock re-entry mode waveform for a leading input clock (zoomed in)

Figure 4.8 Introduced glitch waveform for a leading input clock

Now consider a phase error in which the output clock leads the input clock. increase_delay is enabled, and the fine delay line output tap shifts to the right until it reaches the last tap of the fine delay line. At this point, the state machine moves to the INCREMENT state and the coarse action is enabled. On every clock, the total delay of the DLL is incremented by one unit delay, or 78 ps in the simulation. At a certain point, increase_delay is deasserted and decrease_delay is enabled. The state machine then moves to the FINE state and finally achieves lock.

Figure 4.9 shows that the DLL originally locks at 250 ns. A 2 ns glitch is applied at 300 ns, and the DLL locks again at 1315 ns. The final phase when the DLL locks again is shown in Figure 4.10. The 2 ns input phase shift, introduced as a glitch, causes the DLL to exit lock mode, and the LOCK_INDICATOR signal goes low, as shown in Figure 4.11.
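The locking sequences of Sections 4.2 and 4.3 can be reproduced with a small cycle-based sketch of the coarse/fine controller. The 78 ps coarse unit, the 4 ps fine step, and the 128 fine taps come from the simulations above. Everything else is an assumption made for illustration:

- The names `pd`, `next_state`, `run`, and `target_ps` are hypothetical.
- `target_ps` is the total delay at which the output aligns with the input, and a glitch is modelled as a step change of `target_ps`.
- The fine tap keeps its position while the coarse delay changes.
- INCREMENT falls back to DECREMENT only when the fine tap is still at the first tap. This rule was inferred from the initial-lock and case-2 descriptions and is not the thesis controller.

```python
from typing import List, Tuple

# Values taken from the simulations described in this chapter.
COARSE_PS = 78.0      # coarse unit delay
FINE_PS = 4.0         # fine (Vernier) step
N_FINE = 128          # fine taps 0 .. 127
RESOLUTION_PS = 4.0   # phase-detector dead-zone width


def pd(error_ps: float) -> Tuple[bool, bool]:
    """(increase_delay, decrease_delay); error_ps > 0 means the output leads."""
    half = RESOLUTION_PS / 2.0
    return error_ps > half, error_ps <= -half


def next_state(state: str, coarse: int, fine: int,
               inc: bool, dec: bool) -> Tuple[str, int, int]:
    """One controller decision per input-clock cycle (hypothetical FSM)."""
    if state == "IDLE":
        if inc:
            return "INCREMENT", coarse, fine
        return ("DECREMENT" if dec else "FINE"), coarse, fine
    if state == "DECREMENT":
        if dec:
            return state, max(coarse - 1, 0), fine   # one coarse unit less
        return "FINE", coarse, fine                  # output leads again
    if state == "INCREMENT":
        if inc:
            return state, coarse + 1, fine           # one coarse unit more
        if dec and fine == 0:
            # Fine tap can only grow: step back one coarse unit first.
            return "DECREMENT", coarse, fine
        return "FINE", coarse, fine
    # FINE: move the fine tap one Vernier step, or hand over to coarse action.
    if inc:
        if fine < N_FINE - 1:
            return "FINE", coarse, fine + 1
        return "INCREMENT", coarse, fine
    if dec:
        if fine > 0:
            return "FINE", coarse, fine - 1
        return "DECREMENT", coarse, fine
    return "FINE", coarse, fine


def run(target_ps: float, coarse: int = 0, fine: int = 0,
        state: str = "IDLE", max_cycles: int = 2000):
    """Simulate until lock; return (cycles, residual_ps, coarse, fine, states)."""
    visited: List[str] = [state]
    for cycle in range(max_cycles):
        error = target_ps - (coarse * COARSE_PS + fine * FINE_PS)
        inc, dec = pd(error)
        if state == "FINE" and not inc and not dec:
            return cycle, error, coarse, fine, visited
        state, coarse, fine = next_state(state, coarse, fine, inc, dec)
        if state != visited[-1]:
            visited.append(state)
    raise RuntimeError("no lock within max_cycles")


# Initial lock with a leading output clock (compare Figure 4.3).
print(run(1000.0)[4])  # ['IDLE', 'INCREMENT', 'DECREMENT', 'FINE']
# Case 1: locked at the middle fine tap, then a 6 ps phase error.
print(run(1198.0, coarse=12, fine=64, state="FINE")[:2])  # (1, 2.0)
```

With these assumptions, a 500 ps error with the input leading produces the sequence FINE, DECREMENT, FINE, and a 2 ns error with the output leading produces FINE, INCREMENT, FINE. Both match the sequences described in Section 4.3.2.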
Figure 4.9 Lock re-entry mode waveform for a leading output clock

Figure 4.10 Lock re-entry mode waveform for a leading output clock (zoomed in)

Figure 4.11 Introduced glitch waveform for a leading output clock

4.4 Gate count of the Vernier unit delay
The proposed Vernier unit delay was mapped to a commercial 0.18 µm library, where its total cell area is about 96 basic cells. The previously published unit delay [73] was mapped to the same library, where its total cell area is about 122 basic cells. Therefore, the proposed unit delay saves about 20% of the gate count when implemented in the same library.
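The "about 20%" figure follows directly from the two cell counts. The short calculation below checks it and shows how the saving scales with the number of unit delays in a line. The function names are hypothetical, and the 200-unit line length in the usage line is an illustrative value, not a figure from this thesis.

```python
def gate_count_saving(proposed_cells: float, previous_cells: float) -> float:
    """Fractional cell-area saving of one proposed unit delay vs. the previous one."""
    return (previous_cells - proposed_cells) / previous_cells


def cells_saved(n_units: int, proposed_cells: int, previous_cells: int) -> int:
    """Basic cells saved in a delay line built from n_units identical unit delays."""
    return n_units * (previous_cells - proposed_cells)


if __name__ == "__main__":
    # Cell counts reported above for the 0.18 um library mapping.
    print(f"saving per unit delay: {gate_count_saving(96, 122):.1%}")   # 21.3%
    # n_units = 200 is illustrative only.
    print(f"cells saved in a 200-unit line: {cells_saved(200, 96, 122)}")  # 5200
```

The per-unit saving is (122 - 96) / 122 ≈ 21.3%, which is consistent with the rounded "about 20%" quoted in the text.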
The gate count reduction is significant, considering that hundreds of unit delay blocks are needed in a typical delay line.

The static power consumption of a circuit is due to leakage current and is proportional to the gate count. Therefore, the static power consumption of the delay line is reduced by about 20%. The dynamic power consumption depends not only on the gate count but also on the rate at which each gate toggles in the circuit. The toggle rate is a function of the logic and the operating frequency. The dynamic power consumption can be measured using dynamic test vectors generated during functional simulation.

Foundries (fabs) provide practical formulas to estimate dynamic power consumption. As a general guideline, dynamic power consumption increases in proportion to the gate count. Based on this rule of thumb, the proposed delay line is estimated to save about 20% of the dynamic power as well.

4.5 Resolution of the proposed DLL
The proposed Vernier unit delay is based on the delay difference between the two input-to-output paths of a dual-input NAND gate. For a NAND gate in the commercial 0.18 µm library, this difference is less than 10 ps in functional simulation (4 ps). The previously published unit delay [73] is based on the delay difference of a NAND gate with different fanout loads. Its resolution was one fifth of the delay of each unit block. Given a unit delay block delay of 100 ps, this is about 20 ps if implemented in the same 0.18 µm library.

Therefore, the proposed design at least doubles the resolution of the delay line (a 100% improvement, taking the 10 ps upper bound). A finer resolution reduces the phase error between the output and input clocks of a DLL. The cycle-to-cycle jitter is also reduced, because the output clock delay changes by a smaller step between two consecutive clock edges.
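The resolution comparison in this section can be written out as a short worked calculation. Here $t_{A1}$ and $t_{A2}$ denote the input-to-output delays of the dual-input NAND gate through its two input pins; this notation is introduced only for this example. $T_{\mathrm{unit}} = 100$ ps is the unit-block delay assumed for [73]. The 10 ps upper bound gives the factor of two stated in the text. The simulated 4 ps step gives a larger ratio, which should be read only as a simulation-based estimate.

```latex
\begin{aligned}
\Delta_{[73]} &= \frac{T_{\mathrm{unit}}}{5} = \frac{100\,\mathrm{ps}}{5} = 20\,\mathrm{ps},\\
\Delta_{\mathrm{prop}} &= t_{A2} - t_{A1} \approx 4\,\mathrm{ps} < 10\,\mathrm{ps},\\
\frac{\Delta_{[73]}}{10\,\mathrm{ps}} &= 2,
\qquad
\frac{\Delta_{[73]}}{\Delta_{\mathrm{prop}}} \approx \frac{20\,\mathrm{ps}}{4\,\mathrm{ps}} = 5.
\end{aligned}
```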
4.6 Limitations of the proposed DLL
The main limitation of the proposed DLL is that, depending on the phase error between the input and output clocks, it can take up to 128 clock cycles to lock, which is relatively slow. For example, suppose the first output tap of the fine delay line is selected while the DLL is locked, and a 512 ps input glitch causes the output clock to lead the input clock. The DLL then needs 128 input clock cycles to lock again. The resolution of the fine delay line is 4 ps, so the delay of the whole delay line can increase by only 4 ps per clock cycle, and 512 ps / 4 ps = 128 cycles are required. This example is the worst case, and the DLL normally locks in a shorter time. This thesis mainly concentrates on improving the DLL's resolution. Further research could improve the lock time, for example by devising efficient algorithms that shorten the lock period [33].

Chapter 5 Conclusion
Phase-locked loops (PLLs) and delay-locked loops (DLLs) have been widely adopted to solve the clock skew problem. In recent years, DLLs have been widely used for clock alignment due to their lower phase-error accumulation and faster locking time [35], [82]. DLLs are also used in many other applications, such as clock synthesis [2], [3], [6], clock recovery [14], [19], [25], SDRAM controllers [26], [47], [65], automatic test equipment (ATE) [99], and time-to-digital converters (TDC) [4], [22], [23].

The first DLLs were analog and mainly used for clock distribution applications [10], [13]. A conventional analog DLL consists of four main blocks: a voltage-controlled delay line (VCDL), a charge pump, a low-pass filter, and a phase detector. The simple design of the DLL offers many advantages compared to VCO-based PLLs.
However, an analog DLL is still a relatively complex analog circuit that requires a process-specific implementation, which makes it very difficult to reuse the same design in a different technology. An analog DLL is essentially a non-portable architecture, because porting a design from one technology to another requires major layout changes.

Digital DLLs are characterized by their use of digital delay lines, which are typically built from simple digital circuit elements. This simplicity makes it possible to design a portable digital DLL that can easily be adapted to different technologies. A digital DLL uses more area and power than an analog DLL. Even so, its greater simplicity and lower minimum supply voltage make it very attractive for many applications.

The Register Delay Locked Loop (RDLL) belongs to the digital DLL family and is widely used in high-speed synchronous DRAM (SDRAM) applications [17], [51], [85], [90]. The RDLL consists of a tapped delay line, a shift register, a phase detector, and a replica input buffer (dummy) [85].

The Synchronous Mirror Delay (SMD) and Clock Synchronized Delay (CSD) circuits are non-feedback systems that can achieve lock in only two clock cycles [52], [88], [94]. Therefore, these circuits can be disabled in standby mode and can lock to the reference clock in just two clock cycles when normal operation resumes.

The latest DLLs use the Vernier principle, based on the Vernier caliper [83]. The Vernier technique in the proposed design relies on a characteristic of the NAND gate. It uses the delay difference between the two input-to-output paths of a dual-input NAND gate to achieve a fine step resolution. The previous technique [73] was based on the delay difference of a NAND gate with different fanout loads. The analysis in the previous chapter shows that the new technique in the proposed design doubles the resolution of the DLL.
This thesis introduced a novel architecture for a high-resolution Vernier DLL with a resolution of less than 10 ps. It combines the coarse and fine unit delay blocks into one unit delay block in a way that effectively reduces the area of the delay line. This reduction is significant given the number of unit delay blocks required in a typical delay line. The smaller delay line, together with the integration of the fine and coarse controllers, reduces the DLL's power consumption. The analysis in the previous chapter shows that the proposed unit delay block reduces the gate count of the delay line by 20%. It also estimates that the total power consumed by the delay line is reduced by approximately 20%.

A testbench was written for all the different cases and covers all the operational modes of the DLL. The first two cases verify the general locking process after power-up and reset, for both a leading and a lagging input clock with respect to the output clock. The other four cases verify the lock re-entry process when a glitch is applied to the input clock.

This thesis uses a linear control algorithm to achieve lock: the controller linearly increases or decreases the total delay of the delay line. For faster locking, a binary search algorithm is incorporated into the SAR DLL of [33], which reduces the search effort and speeds up locking. Various lock mechanisms could be explored to shorten the lock time of the DLL, and this is one of the topics for future research.

Bibliography
[1] T.Hamamoto, K.Furutani, T.Kubo, S.Kawasaki, H.Iga, T.Kono, Y.Konishi, T.Yoshihara, "A 667-Mb/s Operating Digital DLL Architecture for 512-Mb DDR SDRAM," IEEE J. Solid-State Circuits, vol. 39, NO.1, pp. 194-206, Jan 2004.
[2] C.C.Chung, C.Y.Lee, "A New DLL-Based Approach for All-Digital Multiphase Clock Generation," IEEE J. Solid-State Circuits, vol. 39, NO.3, pp. 469-471, Mar 2004.
[3] R.F.Rad, A.Nguyen, J.M.Tran, T.Greer, J.Poulton, W.J.Dally, J.H.Edmondson, R.Senthinathan, R.Rathi, M.E.Lee, H.T.Ng, "A 33-mW 8-Gb/s CMOS Clock Multiplier and CDR for Highly Integrated I/Os," IEEE J. Solid-State Circuits, vol. 39, NO.9, pp. 1553-1561, Sept 2004.
[4] C.S.Hwang, P.Chen, H.W.Tsao, "A High-Precision Time-to-Digital Converter Using a Two-Level Conversion Scheme," IEEE Transactions on Nuclear Science, vol 51, NO.4, pp. 1349-1352, Aug 2004.
[5] A.H.Chan, G.W.Roberts, "A Jitter characterization system using a component-invariant Vernier delay line," Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol 12, NO.1, pp. 79-95, Jan 2004.
[6] C.S.Hwang, P.Chen, H.W.Tsao, "A wide-range and fast-locking clock synthesizer IP based on delay-locked-loop," ISCAS 2004, Proceedings of the 2004 International Symposium on, Vol.1, May 2004, pp.352-361.
[7] K.Kim, N.Park, T.Kim, "An unlimited lock range DLL for clock generator," ISCAS 2004, Proceedings of the 2004 International Symposium on, Vol.1, May 2004, pp.352-361.
[8] K.Cheng, Y.Lo, W.Fang, S.Hung, "A mixed-mode delay-locked loop for wide-range operation and multiphase clock generation," System-on-Chip for Real-Time Applications, 2003 Proceedings, Jul 2003, pp.90-93.
[9] A.Suzuki, S.Kawahito, D.Miyazaki, M.Furuta, "A digitally skew correctable multi-phase clock generator using a master-slave DLL," ISCAS 03, Proceedings of the 2003 International Symposium on, Vol.1, May 2003, pp. 105-108.
[10] K.Taesung, K.Beomsup, "Phase interpolator using delay locked loop," Mixed-Signal Design, 2003, Southwest Symposium on, Feb 2003, pp.76-80.
[11] Z.Jingcheng, D.Qingjin, T.Kawasniewski, "A -107dBc, 10KHz Carrier offset 2GHz DLL-based frequency synthesizer," Custom Integrated Circuits Conference, 2003, Proceedings of the IEEE 2003, Sept 2003, pp.301-304.
[12] G.Manganaro, S.Kwak, S.Bugeja, "A dual 10b 200MSPS pipeline D/A converter with DLL-based clock synthesizer," Custom Integrated Circuits Conference, 2003, Proceedings of the IEEE 2003, Sept 2003, pp.301-304.
[13] H.Chang, C.Sun, S.Liu, "A low-jitter and precise multiphase delay-locked loop using shifted averaging VCDL," in ISSCC 2003 Dig. Tech. Papers, Vol.1, 2003, pp. 434-505.
[14] W.Rhee, H.Ainspan, S.Rylov, A.Rylyakov, M.Beakes, D.Friedman, S.Gowada, M.Soyuer, "A 10-Gb/s CMOS clock and data recovery circuit using a secondary delay-locked-loop," Custom Integrated Circuits Conference, 2003, Proceedings of the IEEE 2003, Sept 2003, pp.81-84.
[15] G.Wei, J.Stonick, D.Weinlader, J.Sonntag, S.Searles, "A 500MHz MP/DLL Clock Generator for a 5Gb/s Backplane Transceiver in 0.25 µm CMOS," in ISSCC 2003 Dig. Tech. Papers, Vol.1, 2003, pp. 464-465.
[16] S.J.Kim, S.H.Hong, J.K.Wee, J.H.Ahn, J.Y.Chung, "A low Jitter, fast recoverable, fully analog DLL using tracking ADC for high speed and low stand-by power DDR I/O interface," VLSI Circuits, 2003, Digest of Technical Papers, 2003 Symposium on, June 2003, pp. 285-286.
[17] J.T.Kwak, C.K.Kwon, K.W.Kim, S.H.Lee, J.S.Kih, "A low cost high performance register-controlled digital DLL for 1 Gbps ×32 DDR SDRAM," VLSI Circuits, 2003, Digest of Technical Papers, 2003 Symposium on, June 2003, pp. 283-284.
[18] K.H.Cheng, Y.L.Lo, W.F.Yu, S.Y.Hung, "A mixed-mode delay-locked loop for wide-range operation and multiphase clock generation," System-on-Chip for Real-Time Applications, 2003, Proceedings, The 3rd IEEE International Workshop on, Jul 2003, pp. 90-93.
[19] Z.Mao, T.H.Szymanski, "A 4Gb/s CMOS fully-differential analog dual delay-locked loop clock/data recovery circuit," Electronics, Circuit and Systems, 2003, ICECS 2003, Proceedings of the 2003 10th IEEE International Conference on, Vol.2, Dec 2003, pp. 559-562.
[20] M.E.Lee, W.J.Dally, T.Greer, H.T.Ng, R.F.Rad, J.Poulton, R.Senthinathan, "Jitter Transfer Characteristics of Delay-Locked Loops, Theories and Design Techniques," IEEE J. Solid-State Circuits, vol. 38, NO.4, pp. 614-621, Apr 2003.
[21] T.Matano, Y.Takai, T.Takahashi, Y.Sakito, I.Fujii, Y.Takaishi, H.Fujisawa, S.Kubouchi, S.Narui, K.Arai, M.Morino, M.Nakamura, S.Miyatake, T.Sekiguchi, K.Koyama, "A 1-Gb/s/pin 512-Mb DDRII SDRAM Using a Digital DLL and a Slew-Rate-Controlled Output Buffer," IEEE J. Solid-State Circuits, vol. 38, NO.5, pp. 762-768, May 2003.
[22] S.Tabatabaei, A.Ivanov, "Embedded Timing Analysis: A SOC Infrastructure," IEEE Design & Test Of Computers, vol. 19, NO.3, pp. 24-36, June 2002.
[23] S.Tabatabaei, A.Ivanov, "An embedded core for Sub-Picosecond timing measurements," Test Conference, 2002, Proceedings of ITC International, pp. 129-137, Oct 2002.
[24] A.H.Chan, G.W.Roberts, "A deep sub-micron timing measurement circuit using a single-stage Vernier delay line," Custom Integrated Circuits Conference, 2002, Proceedings of the IEEE 2002, May 2002, pp.77-80.
[25] X.Millard, F.Devisch, M.Kuijk, "A 900-Mb/s CMOS Data Recovery DLL using Half-Frequency Clock," IEEE J. Solid-State Circuits, vol. 37, NO.6, pp. 711-715, June 2002.
[26] S.J.Kim, S.H.Hong, J.K.Wee, J.H.Cho, P.S.Lee, J.H.Ahn, J.Y.Chung, "A Low-Jitter Wide-Range Slew-Calibrated Dual-Loop DLL Using Antifuse Circuitry for High-Speed DRAM," IEEE J. Solid-State Circuits, vol. 37, NO.6, pp. 726-734, June 2002.
[27] R.F.Rad, W.Dally, H.T.Ng, R.Senthinathan, M.E.Lee, R.Rathi, J.Poulton, "A Low-Power Multiplying DLL for Low-Jitter Multigigahertz Clock Generation in Highly Integrated Digital Chips," IEEE J. Solid-State Circuits, vol. 37, NO.12, pp. 1804-1812, Dec 2002.
[28] C.Kim, I.C.Hwang, S.M.Kang, "A Low-Power Small-Area +/-7.28-ps-Jitter 1GHz DLL-Based Clock Generator," IEEE J. Solid-State Circuits, vol. 37, NO.11, pp. 1414-1420, Nov 2002.
[29] Y.J.Jung, S.W.Lee, D.Shim, W.Kim, C.Kim, S.I.Cho, "A Dual-Loop Delay-Locked Loop Using Multiple Voltage-Controlled Delay Lines," IEEE J. Solid-State Circuits, vol. 36, NO.5, pp. 784-791, May 2001.
[30] D.J.Foley, M.Flynn, "CMOS DLL-Based 2-V 3.2-ps Jitter 1-GHz Clock Synthesizer and Temperature-Compensated Tunable Oscillator," IEEE J. Solid-State Circuits, vol. 36, NO.3, pp. 417-423, Mar 2001.
[31] G.K.Dehng, J.W.Lyn, S.I.Liu, "A Fast-Lock Mixed-Mode DLL Using a 2-b SAR algorithm," IEEE J. Solid-State Circuits, vol. 36, NO.10, pp. 1464-1471, Oct 2001.
[32] J.B.Lee, K.H.Kim, C.Yoo, S.Lee, O.G.Na, C.Y.Lee, H.Y.Song, J.S.Lee, Z.H.Lee, K.W.Yeom, H.J.Chung, I.W.Seo, M.S.Chae, Y.H.Choi, S.I.Cho, "Digitally-Controlled DLL and I/O Circuits for 500Mb/S/Pin x16 DDR SDRAM," ISSCC Dig. Tech. Papers, Feb 2001, pp.68-70.
[33] G.K.Dehng, J.M.Hsu, C.Y.Yang, S.I.Liu, "Clock-Deskew Buffer Using a SAR-Controlled Delay-Locked Loop," IEEE J. Solid-State Circuits, vol. 35, pp. 1128-1136, Aug 2000.
[34] Y.Moon, J.Choi, K.Lee, D.K.Jeong, and M.K.Kim, "An All-Analog Multiphase Delay-Locked Loop Using a Replica Delay Line for Wide-Range Operation and Low-Jitter Performance," IEEE J. Solid-State Circuits, vol.35, pp. 377-384, Mar 2000.
[35] H.Lee, H.Q.Nguyen, D.W.Potter, "Design Self-Synchronized Clock Distribution Networks In An SOC ASIC Using DLL With Remote Clock Feedback," ASIC/SOC Conference, 2000, Proceedings, 13th Annual IEEE International, Sept 2000, pp.248-252.
[36] P.Dudek, S.Szczepanski, J.V.Hatfield, "A High-Resolution CMOS Time-to-Digital Converter Utilizing a Vernier Delay Line," Solid-State Circuits, IEEE Transactions on, vol 35, NO.2, pp. 240-247, Feb 2000.
[37] K.Minami, M.Mizuno, H.Yamaguchi, T.Nakano, Y.Matsushima, Y.Sumi, T.Sato, H.Yamashida, M.Yamashina, "A 1GHz Portable Digital Delay-Locked Loop with infinite Phase Capture Ranges," ISSCC Dig. Tech. Papers, Feb 2000, pp.350-351.
Y.J.Jung, S.W.Lee, D.Shim, W.Kim, C.H.Kim, S.I.Cho, "A low Jitter Dual Loop DLL using Multiple VCDLs with a Duty Cycle Corrector," VLSI Circuits, 2000, Digest of Technical Papers, 2000 Symposium on, pp. 50-51.

D.J.Foley, M.P.Flynn, "CMOS DLL Based 2V, 3.2ps Jitter, 1GHz Clock Synthesizer and Temperature Compensated Tunable Oscillator," Custom Integrated Circuits Conference, 2000, Proceedings of the IEEE 2000, pp. 371-374, May 2000.

S.S.Hwang, K.M.Joo, H.J.Park, J.W.Kim, P.Chung, "A DLL based 10-320 MHz Clock Synchronizer," ISCAS 2000, Proceedings of the 2000 International Symposium on, vol. 1, pp. 265-268, May 2000.

D.J.Foley, M.P.Flynn, "A 3.3V, 1.6GHz, Low-Jitter, Self-Correcting DLL Based Clock Synthesizer in 0.5µm CMOS," ISCAS 2000, Proceedings of the 2000 International Symposium on, vol. 1, pp. 249-252, May 2000.

G.Chien, P.R.Gray, "A 900-MHz Local Oscillator Using a DLL-Based Frequency Multiplier Technique for PCS applications," IEEE J. Solid-State Circuits, vol. 35, no. 12, pp. 1996-1999, Dec 2000.

J.H.Lee, S.H.Han, H.J.Yoo, "A 330MHz Low-Jitter and Fast-Locking Direct Skew Compensation DLL," ISSCC Dig. Tech. Papers, pp. 352-353, Feb 2000.

S.Kuge, T.Kato, K.Furutani, S.Kikuda, K.Mitsui, T.Hamamoto, J.Setogawa, K.Hamade, Y.Komiya, S.Kawasaki, T.Kono, T.Amano, T.Kubo, M.Haraguchi, Y.Nakaoka, M.Akiyama, Y.Konishi, H.Ozaki, T.Yoshihara, "A 0.18-µm 256-Mb DDR-SDRAM with Low-Cost Post-Mold Tuning Method for DLL Replica," IEEE J. Solid-State Circuits, vol. 35, no. 11, pp. 1680-1689, Nov 2000.

S.S.Hwang, "Dual-Loop DLL-based clock synthesizer," Electronics Letters, vol. 36, no. 14, pp. 1173-1174, Jul 2000.

T.Hamamoto, S.Kawasaki, K.Furutani, K.Yasuda, Y.Konishi, "A skew and jitter suppressed DLL architecture for high frequency DDR SDRAMs," VLSI Circuits, 2000, Digest of Technical Papers, 2000 Symposium on, pp. 76-77, Mar 2000.
J.J.Kim, S.B.Lee, T.S.Jung, C.H.Kim, S.I.Cho, B.Kim, "A Low-Jitter Mixed-Mode DLL for High-Speed DRAM Applications," IEEE J. Solid-State Circuits, vol. 35, no. 10, pp. 1430-1436, Oct 2000.

C.S.Hwang, W.C.Chung, C.Y.Wang, H.W.Tsao, S.I.Liu, "A 2V Clock Synthesizer using Digital Delay-Locked Loop," ASIC, 2000, Proceedings, 2000 IEEE Asia-Pacific Conference on, pp. 91-94, Aug 2000.

S.Eto, H.Akita, K.Isobe, K.Tsuchida, H.Toda, T.Seki, "A 333MHz, 20mW, 18ps Resolution Digital DLL using Current-Controlled Delay with Parallel Variable Resistor DAC (PVR-DAC)," ASIC, 2000, Proceedings, 2000 IEEE Asia-Pacific Conference on, pp. 349-350, Aug 2000.

H.Yoon, G.Cha, C.Yoo, N.J.Kim, K.Y.Kim, C.H.Lee, K.N.Lim, K.Lee, J.Y.Jeon, T.S.Jung, H.Jeong, T.Y.Chung, K.Kim, S.I.Cho, "A 2.5-V, 333-Mb/s/pin, 1Gbit, Double-Data-Rate Synchronous DRAM," IEEE J. Solid-State Circuits, vol. 34, no. 11, pp. 1589-1599, Nov 1999.

F.Lin, J.Miller, A.Schoenfeld, M.Ma, R.J.Baker, "A Register-Controlled Symmetrical DLL for Double-Data-Rate DRAM," IEEE J. Solid-State Circuits, vol. 34, pp. 565-568, Apr 1999.

T.Saeki, K.Minami, H.Yoshida, H.Suzuki, "A Direct-Skew-Detect Synchronous Mirror Delay for Application-Specific Integrated Circuits," IEEE J. Solid-State Circuits, vol. 34, pp. 372-379, Mar 1999.

W.Rhee, A.Ali, "An On-Chip Phase compensation technique in fractional-N frequency synthesis," ISCAS 1999, Proceedings of the 1999 International Symposium on, vol. 3, pp. 363-366, June 1999.

A.Mantyniemi, T.Rahkonen, J.Kostamovaara, "A High Resolution digital CMOS Time-To-Digital converter based on nested Delay Locked Loops," ISCAS 1999, Proceedings of the 1999 International Symposium on, vol. 2, pp. 537-540, June 1999.

S.Nagavarapu, J.Yan, E.K.F.Lee, R.L.Geiger, "An asynchronous data recovery/retransmission technique with foreground DLL calibration," ISCAS 1999, Proceedings of the 1999 International Symposium on, vol. 6, pp. 354-357, June 1999.
R.L.Aguiar, D.M.Santos, "Simulation and modeling of digital Delay Locked Loops," Circuits and Systems, 1999, 42nd Midwest Symposium on, vol. 2, pp. 843-846, Aug 1999.

R.L.Aguiar, D.M.Santos, "Modeling Charge-Pump Delay Locked Loops," ICECS 1999, The 6th IEEE International Conference on, vol. 2, pp. 823-826, Sept 1999.

S.H.Han, J.H.Lee, H.J.Yoo, "A fast lock-on time Mixed Mode DLL with 10ps jitter," VLSI and CAD, 1999, ICVC 1999, The 6th IEEE International Conference on, pp. 564-565, Oct 1999.

M.Miyazaki, K.Ishibashi, "A 3-Cycle lock time Delay-Locked Loop with a parallel phase detector for low power mobile systems," ASICs, 1999, AP-ASIC 1999, The First IEEE Asia Pacific Conference on, pp. 396-399, Aug 1999.

Y.S.Song, J.K.Kang, "A Delay-Locked Loop circuit with Mixed-Mode tuning," ASICs, 1999, AP-ASIC 1999, The First IEEE Asia Pacific Conference on, pp. 347-350, Aug 1999.

P.D.Capofreddi, C.D.Baringer, J.F.Jenson, M.J.W.Rodwell, W.P.Posey, M.W.Yung, Y.M.Xie, "A Clock and Data recovery IC for communications and radar applications," Design of Mixed-Mode Integrated Circuits and Applications, 1999, Third International Workshop on, pp. 88-90, Jul 1999.

T.Toifl, R.Vari, P.Moreira, A.Marchioro, "4-Channel Rad-Hard Delay Generation ASIC with 1ns Timing Resolution for LHC," Nuclear Science, IEEE Transactions on, vol. 46, no. 3, pp. 423-427, June 1999.

J.Park, Y.Koo, W.Kim, "A Semi-Digital Delay-Locked Loop for clock skew minimization," VLSI Design, 1999, Proceedings of 12th International Conference on, pp. 584-588, Jan 1999.

A.Balatsos, D.Lewis, "Low-Skew clock generator with dynamic impedance and delay matching," ISSCC Dig. Tech. Papers, pp. 182-183, Feb 1999.

L.Paris, J.Benzreba, P.Demone, M.Dunn, L.Falkenhagen, P.Gillingham, I.Harrison, W.He, D.Macdonald, M.Macintosh, B.Millar, K.Wu, H.J.Oh, J.Stender, V.Chen, J.Wu, "A 800MB/s 72Mb SLDRAM with digitally calibrated DLL," ISSCC Dig. Tech. Papers, pp. 414-415, Feb 1999.
Y.Moon, D.K.Jeong, "A 1Gbps transceiver with Receiver-End deskewing capability using Non-Uniform Tracked Oversampling and a 250-750 MHz Four-Phase DLL," 1999 Symposium on VLSI Circuits, Dig. Tech. Papers, pp. 47-48.

F.Mu, A.Edman, C.Svensson, "Digital Multiphase Clock/Pattern Generator," IEEE J. Solid-State Circuits, vol. 34, no. 2, pp. 182-191, Feb 1999.

S.I.Liu, J.H.Lee, H.W.Tsao, "Low-Power Clock-Deskew Buffer for High-Speed Digital Circuits," IEEE J. Solid-State Circuits, vol. 34, no. 4, pp. 554-558, Apr 1999.

M.Mota, J.Christiansen, "A High-Resolution Time Interpolator Based on a Delay Locked Loop and an RC Delay Line," IEEE J. Solid-State Circuits, vol. 34, no. 10, pp. 1360-1366, Oct 1999.

Y.Nakase, Y.Morooka, D.J.Perlman, D.J.Kolar, J.M.Choi, H.J.Shin, T.Yoshimura, N.Watanabe, Y.Matsuda, M.Kumanoya, M.Yamada, "Source-Synchronization and Timing Vernier Techniques for 1.2-GB/s SLDRAM interface," IEEE J. Solid-State Circuits, vol. 34, no. 4, pp. 494-501, Apr 1999.

B.W.Garlepp, K.S.Donnelly, J.Kim, P.S.Chau, J.L.Zerbe, C.Huang, C.V.Tran, C.L.Portmann, D.Stark, Y.F.Chan, T.H.Lee, M.A.Horowitz, "A Portable Digital DLL for High-Speed CMOS Interface Circuits," IEEE J. Solid-State Circuits, vol. 34, no. 5, pp. 632-644, May 1999.

C.Kim, H.K.Kyung, W.P.Jeong, J.S.Kim, B.S.Moon, J.W.Chai, S.M.Yim, J.H.Choi, K.H.Han, C.J.Park, H.S.Hwang, H.Choi, S.B.Cho, C.L.Portmann, S.I.Cho, "A 2.5-V, 72-Mbit, 2.0-GByte/s Packet-Based DRAM with a 1.0-Gbps/pin Interface," IEEE J. Solid-State Circuits, vol. 34, no. 5, pp. 645-652, May 1999.

S.Eto, M.Matsumiya, M.Takita, Y.Ishii, T.Nakamura, K.Kawabata, H.Kano, A.Kitamoto, T.Ikeda, T.Koga, M.Higashiro, Y.Serizawa, K.Itabashi, O.Tsuboi, Y.Yokoyama, M.Taguchi, "A 1Gb SDRAM with ground level precharged bitline and non-boosted 2.1V word line," IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1697-1702, Nov 1998.
M.Hasegawa, M.Nakamura, S.Narui, S.Ohkuma, Y.Kawase, H.Endoh, S.Miyatake, T.Akiba, K.Kawakita, M.Yoshida, S.Yamada, T.Sekiguchi, I.Asano, Y.Tadaki, R.Nagai, S.Miyako, K.Kajigaya, M.Horiguchi, Y.Nakagome, "A 256Mb SDRAM with subthreshold leakage current suppression," ISSCC 1998 Dig. Tech. Papers, pp. 80-81, Feb 1998.

C.H.Kim, J.H.Lee, J.B.Lee, B.S.Kim, C.S.Park, S.B.Lee, S.Y.Lee, C.W.Park, J.G.Roh, H.S.Nam, D.G.Kim, D.Y.Lee, T.S.Jung, H.Yoon, S.I.Cho, "A 64-Mbit, 640-MByte/s bidirectional data strobed, Double-Data-Rate SDRAM with a 40mW DLL for a 256-MByte memory system," IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1703-1710, Nov 1998.

B.S.Kim, L.S.Kim, "100 MHz all-digital Delay-Locked Loop for low power application," Electronics Letters, vol. 34, no. 18, pp. 1739-1740, Sept 1998.

S.J.Jang, S.H.Han, C.S.Kim, Y.H.Jun, H.J.Yoo, "A compact ring delay line for high speed synchronous DRAM," VLSI Circuits, 1998, Digest of Technical Papers, 1998 Symposium on, pp. 60-61.

B.W.Garlepp, K.S.Donnelly, J.Kim, P.S.Chau, J.L.Zerbe, C.Huang, C.V.Tran, C.L.Portmann, D.Stark, Y.F.Chan, T.H.Lee, M.A.Horowitz, "A portable digital DLL architecture for CMOS interface circuits," VLSI Circuits, 1998, Digest of Technical Papers, 1998 Symposium on, pp. 214-215.

T.Yoshimura, Y.Nakase, N.Watanabe, Y.Morooka, Y.Matsuda, M.Kumanoya, H.Hamano, "A Delay-Locked Loop and 90-degree phase shifter for 800Mbps Double Data Rate memories," VLSI Circuits, 1998, Digest of Technical Papers, 1998 Symposium on, pp. 66-67.

D.Birru, "A novel Delay-Locked Loop based CMOS clock multiplier," IEEE Transactions on Consumer Electronics, vol. 44, no. 4, pp. 1319-1322, Nov 1998.

M.Mota, J.Christiansen, "A four channel, self-calibrating, high resolution, Time To Digital Converter," Electronics, Circuits and Systems, 1998 IEEE International Conference on, vol. 1, pp. 409-412, Sept 1998.
R.L.Aguiar, D.M.Santos, "Wide-Area clock distribution using controlled delay lines," Electronics, Circuits and Systems, 1998 IEEE International Conference on, vol. 2, pp. 63-66, Sept 1998.

M.S.Gorbics, J.Kelly, K.M.Roberts, R.L.Sumner, "A High Resolution Multihit Time to Digital Converter Integrated Circuit," IEEE Transactions on Nuclear Science, vol. 44, pp. 379-384, June 1997.

S.Sidiropoulos, M.Horowitz, "A Semidigital Dual Delay-Locked Loop," IEEE J. Solid-State Circuits, vol. 32, pp. 1683-1692, Nov 1997.

A.Hatakeyama, H.Mochizuki, T.Aikawa, M.Takita, Y.Ishii, H.Tsuboi, S.Y.Fujioka, S.Yamaguchi, M.Koga, Y.Serizawa, K.Nishimura, K.Kawabata, Y.Okajima, M.Kawano, H.Kojima, K.Mizutani, T.Anezaki, M.Hasegawa, M.Taguchi, "A 256-Mb SDRAM using a register-controlled digital DLL," IEEE J. Solid-State Circuits, vol. 32, pp. 1728-1732, Nov 1997.

G.C.Moyer, M.Clements, W.Liu, T.Schaffer, R.K.Cavin, "The Delay Vernier pattern generation technique," IEEE J. Solid-State Circuits, vol. 32, no. 4, pp. 551-562, Apr 1997.

K.Gotoh, S.Wakayama, M.Saito, J.Ogawa, H.Tamura, Y.Okajima, M.Taguchi, "All-Digital Multi-Phase Delay Locked Loop for internal timing generation in embedded and/or high speed DRAMs," VLSI Circuits, 1997, Digest of Technical Papers, 1997 Symposium on, pp. 107-108.

T.Saeki, H.Nakamura, J.Shimizu, "A 10ps jitter 2 clock cycle lock time CMOS digital clock generator based on an interleaved synchronous mirror delay scheme," VLSI Circuits, 1997, Digest of Technical Papers, 1997 Symposium on, pp. 109-110.

[89] S.Sidiropoulos, M.Horowitz, "A Semi-Digital DLL with unlimited phase shift capability and 0.08-400MHz operating range," ISSCC Dig. Tech. Papers, pp. 332-333, Feb 1997.
[90] A.Hatakeyama, H.Mochizuki, T.Aikawa, M.Takita, Y.Ishii, H.Tsuboi, S.Fujioka, S.Yamaguchi, M.Koga, Y.Serizawa, K.Nishimura, K.Kawabata, Y.Okajima, M.Kawano, H.Kojima, K.Mizutani, T.Anezaki, M.Hasegawa, M.Taguchi, "A 256Mb SDRAM using a Register-Controlled Digital DLL," ISSCC Dig. Tech. Papers, pp. 72-73, Feb 1997.

[91] S.Gogaert, M.Steyaert, "A skew tolerant CMOS level-based ATM data-recovery system without PLL topology," Custom Integrated Circuits Conference, 1997, Proceedings of the IEEE 1997, pp. 453-456, Sept 1997.

[92] B.S.Kim, L.S.Kim, "A low power 100MHz All Digital Delay-Locked Loop," ISCAS 1997, Proceedings of the 1997 International Symposium on, vol. 1, pp. 1820-1823, May 1997.

[93] V.Lines, M.A.Scido, C.Mar, A.Achyuthan, "High speed circuit techniques in a 150MHz 64M SDRAM," Memory Technology, Design and Testing, 1997, Proceedings International Workshop on, pp. 8-11, Aug 1997.

[94] T.Saeki, Y.Nakaoka, M.Fujita, A.Tanaka, K.Nagata, K.Sakakibara, T.Matano, Y.Hoshino, K.Miyano, S.Isa, E.Kakehashi, J.Drynan, M.Komuro, T.Fukase, H.Iwasaki, J.Sekine, M.Igeta, N.Nakanishi, T.Itani, K.Yoshida, H.Yoshina, S.Hashimoto, T.Yshii, M.Ichinose, T.Imura, M.Uziie, K.Koyama, Y.Fukuzo, T.Okuda, "A 2.5 ns clock access 250 MHz 256 Mb SDRAM with synchronous mirror delay," ISSCC 1996 Dig. Tech. Papers, pp. 374-375, Feb 1996.

[95] A.Chau, D.Deusschere, S.Dow, J.Flasck, M.E.Levi, F.Kristen, E.Su, "A Multi-Channel Time-to-Digital converter chip for drift chamber readout," Nuclear Science, IEEE Transactions on, vol. 43, no. 3, pp. 1720-1724, June 1996.

[96] D.M.Santos, S.F.Dow, M.E.Levi, "A CMOS Delay-Locked Loop and Sub-Nanosecond Time-to-Digital converter chip," Nuclear Science, IEEE Transactions on, vol. 43, no. 3, pp. 1717-1719, June 1996.

[97] J.Christiansen, "An Integrated High Resolution CMOS Timing Generator Based on an Array of Delay Locked Loops," IEEE J. Solid-State Circuits, vol. 31, no. 7, pp. 952-957, Jul 1996.
[98] S.Tanoi, T.Tanabe, K.Takahashi, S.Miyamoto, M.Uesugi, "A 250-622 MHz Deskew and Jitter-Suppressed Clock Buffer Using Two-Loop Architecture," IEEE J. Solid-State Circuits, vol. 31, no. 4, pp. 487-493, Apr 1996.

[99] J.Chapman, J.Currin, S.Payne, "A Low-Cost High-Performance CMOS Timing Vernier for ATE," Test Conference, 1995, Proceedings International, pp. 459-468, Oct 1995.

[100] R.F.Ormondroyd, "The acquisition performance of Delay-Locked Loops in noise," Radio Receivers and Associated Systems, pp. 192-197, Sept 1995.

[101] E.R.Ruotsalainen, T.Rahkonen, J.Kostamovaara, "A Low-Power CMOS Time-to-Digital converter," IEEE J. Solid-State Circuits, vol. 30, no. 9, pp. 984-990, Sept 1995.

[102] H.Sutoh, K.Yamakoshi, M.Ino, "Circuit technique for Skew-Free Clock distribution," Custom Integrated Circuits Conference, 1995, Proceedings of the IEEE 1995, pp. 163-166, Sept 1995.

[103] B.Kim, T.C.Weigandt, P.R.Gray, "PLL/DLL system noise analysis for Low-Jitter Clock synthesizer design," ISCAS 1994, Proceedings of the 1994 International Symposium on, vol. 4, pp. 31-34, June 1994.

[104] T.Lee, "A 2.5 V CMOS delay-locked loop for an 18 Mbit, 500 MB/s DRAM," IEEE J. Solid-State Circuits, vol. 29, pp. 1491-1496, Dec 1994.

[105] M.Izzard, "Analog versus digital control of a clock synchronizer for a 3 Gb/s data with 3.0 V differential ECL," in Dig. Tech. Papers 1994 Symp. VLSI Circuits, pp. 39-40, June 1994.

[106] C.Ljuslin, J.Christiansen, A.Marchioro, O.Klingsheim, "An integrated 16 channel CMOS Time-to-Digital converter," Nuclear Science, IEEE Transactions on, vol. 41, no. 4, pp. 1104-1108, Aug 1994.

[107] A.Waizman, "A Delay Line Loop for frequency synthesis of De-Skewed Clock," ISSCC Dig. Tech. Papers, pp. 298-299, Feb 1994.

[108] T.Kuroda, T.Fujita, S.Mita, T.Mori, K.Matsuo, M.Kakumu, T.Sakurai, "Substrate noise influence on circuit performance in variable threshold-voltage scheme," IEEE J. Solid-State Circuits, vol. 29, no. 3, pp. 309-312, Mar 1994.
[109] M.Ramezani, C.A.T.Salama, "An improved Bang-Bang phase detector for clock and data recovery applications," ISCAS, vol. 1, no. 3, pp. 715-718, 1994.

Appendix A
Design VHDL code

library ieee;
use ieee.std_logic_1164.all;
-- library vst_n18_sc_tsm_c4_wc;
-- use vst_n18_sc_tsm_c4_wc.components.all;
-- library tpz973gtc;
-- use tpz973gtc.components.all;

entity vernier_unit_delay is
  port (
    coarse_control   : in  std_logic := '0';   -- control line for the coarse chain
    fine_control     : in  std_logic := '0';   -- control line for the fine chain
    fine_control_inv : in  std_logic := '0';   -- inverted version of fine_control
    coarse_input     : in  std_logic := '0';   -- input to coarse chain
    fine_input       : in  std_logic := '0';   -- input to fine chain
    vernier_input    : in  std_logic := '0';   -- input from previous stage
    coarse_output    : out std_logic := '0';   -- output of coarse chain
    fine_output      : out std_logic := '0';   -- output of fine chain
    vernier_output   : out std_logic := '0';   -- output to next stage
    clk_output       : out std_logic := '0');  -- output of vernier unit
end vernier_unit_delay;

architecture structural of vernier_unit_delay is

  signal A, B, coarse_output_int, fine_output_int : std_logic := '0';
  signal logic_one : std_logic := '1';

  component NAN2D0
    port (
      Z  : out STD_LOGIC;
      A1 : in  STD_LOGIC;
      A2 : in  STD_LOGIC);
  end component;

  component BUFTD1
    port (
      Z   : out STD_LOGIC;
      A   : in  STD_LOGIC;
      ENB : in  STD_LOGIC);
  end component;

  component NAN2M1D1
    port (
      Z  : out STD_LOGIC;
      A1 : in  STD_LOGIC;
      A2 : in  STD_LOGIC);
  end component;

begin  -- structural

  U1: NAN2D0   port map (A, logic_one, fine_input);
  U2: NAN2D0   port map (fine_output_int, A, logic_one);
  U3: NAN2D0   port map (B, coarse_input, vernier_input);
  U4: NAN2D0   port map (coarse_output_int, B, fine_control_inv);
  U5: NAN2M1D1 port map (vernier_output, fine_output_int, fine_control);
  U6: BUFTD1   port map (clk_output, coarse_output_int, coarse_control);

  fine_output   <= fine_output_int;
  coarse_output <= coarse_output_int;
  logic_one     <= '1';

end structural;

library ieee;
use ieee.std_logic_1164.all;

entity vernier_delay_line is
  generic (
    N : integer := 128
    );  -- number of delay elements
  port (
    delay_line_output : out std_logic := '0';
    delay_line_input  : in  std_logic := '0';
    fine_control      : in  std_logic_vector(N-1 downto 0) := (others => '0');
    fine_control_inv  : in  std_logic_vector(N-1 downto 0) := (others => '0');
    coarse_control    : in  std_logic_vector(N-1 downto 0) := (others => '0'));
end vernier_delay_line;

architecture structural of vernier_delay_line is

  signal fine, coarse, vernier : std_logic_vector(N-1 downto 1) := (others => '0');
  signal logic_one : std_logic := '1';

  component vernier_unit_delay
    port (
      coarse_control   : in  std_logic;   -- control line for the coarse chain
      fine_control     : in  std_logic;   -- control line for the fine chain
      fine_control_inv : in  std_logic;   -- inverted version of fine_control
      coarse_input     : in  std_logic;   -- input to coarse chain
      fine_input       : in  std_logic;   -- input to fine chain
      vernier_input    : in  std_logic;   -- input from previous stage
      coarse_output    : out std_logic;   -- output of coarse chain
      fine_output      : out std_logic;   -- output of fine chain
      vernier_output   : out std_logic;   -- output to next stage
      clk_output       : out std_logic);  -- output of vernier unit
  end component;

begin  -- structural

  chain: for i in 0 to N-1 generate

    -- unit 0: end of the line, chain outputs unused
    last_unit: if (i = 0) generate
      D1: vernier_unit_delay port map (
        coarse_control(0), fine_control(0), fine_control_inv(0),
        coarse(1), fine(1), vernier(1),
        open, open, open, delay_line_output);
    end generate last_unit;

    middle_units: if (i > 0 and i < N-1) generate
      D1: vernier_unit_delay port map (
        coarse_control(i), fine_control(i), fine_control_inv(i),
        coarse(i+1), fine(i+1), vernier(i+1),
        coarse(i), fine(i), vernier(i), delay_line_output);
    end generate middle_units;

    -- unit N-1: both chains are driven by the delay-line input
    first_unit: if (i = N-1) generate
      D1: vernier_unit_delay port map (
        coarse_control(N-1), fine_control(N-1), fine_control_inv(N-1),
        delay_line_input, delay_line_input, logic_one,
        coarse(N-1), fine(N-1), vernier(N-1), delay_line_output);
    end generate first_unit;

  end generate chain;

  logic_one         <= '1';
  delay_line_output <= 'H';  -- weak pull-up on the shared tristate output; should be commented out for synthesis

end structural;
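Reading the port maps above, an edge entering the line travels down the fine chain (U1 and U2 of each unit) until it reaches the single stage m whose fine_control bit is '1'. There U5 passes it to U3 of stage m-1, while stage m's own coarse output is held high because its fine_control_inv bit is '0'. From stage m-1 onward the edge follows the coarse chain (U3 and U4) down to the stage j whose coarse_control bit is '0', assuming BUFTD1's ENB input is active low (consistent with the single '0' that vernier_controller keeps in coarse_control). The first-order sketch below follows from that reading. The symbols t_f and t_c (per-stage fine and coarse delays), t_v (the U5 hand-over) and t_o (the output buffer) are assumed, not values from the thesis, and wiring delay is ignored:

```latex
\begin{aligned}
D(m, j) &= (N - m)\,t_f + t_v + (m - j)\,t_c + t_o, \qquad 0 \le j < m \le N-1,\\
D(m-1, j) - D(m, j) &= t_f - t_c,\\
D(m, j-1) - D(m, j) &= t_c.
\end{aligned}
```

Under this model, moving the fine '1' by one stage changes the delay by the small difference t_f - t_c, while moving the coarse tap changes it by a full t_c. Both chains use the same NAN2D0 cell but enter it on different pins (A2 then A1 in the fine chain, A1 then A1 in the coarse chain), which is one plausible source of that small difference. Since increase_delay shifts both registers toward bit 0, the scheme presumably relies on t_f being slightly larger than t_c. This is an inference from the netlist, not a statement from the thesis text.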
library ieee;
use ieee.std_logic_1164.all;

entity vernier_controller is
  generic (
    N : integer := 128);  -- number of delay elements
  port (
    reset            : in  std_logic := '0';
    register_clock   : in  std_logic := '0';
    increase_delay   : in  std_logic := '0';
    decrease_delay   : in  std_logic := '0';
    coarse_control   : out std_logic_vector(N-1 downto 0) := (others => '0');
    fine_control     : out std_logic_vector(N-1 downto 0) := (others => '0');
    fine_control_inv : out std_logic_vector(N-1 downto 0) := (others => '0'));
end vernier_controller;

architecture behavior of vernier_controller is

  signal coarse_load_data   : std_logic_vector(N-1 downto 0) := (others => '0');
  signal fine_load_data     : std_logic_vector(N-1 downto 0) := (others => '0');
  signal fine_control_int   : std_logic_vector(N-1 downto 0) := (others => '0');
  signal coarse_control_int : std_logic_vector(N-1 downto 0) := (others => '0');
  signal coarse_enable      : std_logic := '0';
  signal fine_enable        : std_logic := '0';
  signal logic_zero         : std_logic := '0';
  signal logic_one          : std_logic := '1';

  type state_type is (IDLE, INCREMENT, DECREMENT, FINE);
  signal next_state : state_type;
  signal state      : state_type;

begin

  -- next-state logic (combinational, so it must be sensitive to state)
  process (state, increase_delay, decrease_delay, fine_control_int)
  begin
    case state is
      when IDLE =>
        if increase_delay = '1' then
          next_state <= INCREMENT;
        elsif decrease_delay = '1' then
          next_state <= DECREMENT;
        else
          next_state <= IDLE;
        end if;
      when INCREMENT =>
        if decrease_delay = '1' then
          next_state <= DECREMENT;
        else
          next_state <= INCREMENT;
        end if;
      when DECREMENT =>
        if increase_delay = '1' then
          next_state <= FINE;
        else
          next_state <= DECREMENT;
        end if;
      when FINE =>
        if increase_delay = '1' and fine_control_int(0) = '1' then
          next_state <= INCREMENT;
        elsif decrease_delay = '1' and fine_control_int(N-2) = '1' then
          next_state <= DECREMENT;
        else
          next_state <= FINE;
        end if;
    end case;
  end process;

  -- state register with active-low asynchronous reset
  process (reset, register_clock)
  begin
    if reset = '0' then
      state <= IDLE;
    elsif (register_clock'event and register_clock = '1') then
      state <= next_state;
    end if;
  end process;

  -- coarse shift register: the single '0' selects the output tap
  process (reset, register_clock)
  begin
    if reset = '0' then
      coarse_control_int <= coarse_load_data;
    elsif (register_clock'event and register_clock = '1') then
      if increase_delay = '1' and coarse_enable = '1' then
        -- shift toward bit 0; the MSB is refilled with '1'
        right_shift: for i in 0 to N-2 loop
          coarse_control_int(i) <= coarse_control_int(i+1);
        end loop;
        coarse_control_int(N-1) <= logic_one;
      elsif decrease_delay = '1' and coarse_enable = '1' then
        -- shift toward the MSB; bit 0 is refilled with '1'
        left_shift: for i in N-1 downto 1 loop
          coarse_control_int(i) <= coarse_control_int(i-1);
        end loop;
        coarse_control_int(0) <= logic_one;
      end if;
    end if;
  end process;

  -- fine shift register: the single '1' selects the fine-to-coarse hand-over stage
  process (reset, register_clock)
  begin
    if reset = '0' then
      fine_control_int <= fine_load_data;
    elsif (register_clock'event and register_clock = '1') then
      if increase_delay = '1' and fine_enable = '1' then
        -- shift toward bit 0; the MSB is refilled with '0'
        right_shift: for i in 0 to N-2 loop
          fine_control_int(i) <= fine_control_int(i+1);
        end loop;
        fine_control_int(N-1) <= logic_zero;
      elsif decrease_delay = '1' and fine_enable = '1' then
        -- shift toward the MSB; bit 0 is refilled with '0'
        left_shift: for i in N-1 downto 1 loop
          fine_control_int(i) <= fine_control_int(i-1);
        end loop;
        fine_control_int(0) <= logic_zero;
      end if;
    end if;
  end process;

  logic_zero <= '0';
  logic_one  <= '1';

  -- reset codes (written for N = 128): a single '0' near the middle of the
  -- coarse register, a single '1' in the MSB of the fine register
  coarse_load_data <= x"FFFFFFFFFFFFFFFF7FFFFFFFFFFFFFFF";
  fine_load_data   <= x"80000000000000000000000000000000";

  coarse_enable <= '1' when ((state = INCREMENT or state = DECREMENT) and (next_state /= FINE)) else '0';
  fine_enable   <= '1' when (state = FINE) else '0';

  fine_control_inv <= not fine_control_int;
  fine_control     <= fine_control_int;
  coarse_control   <= coarse_control_int;

end behavior;

library ieee;
use ieee.std_logic_1164.all;

entity high_resoloution_phase_detector is
  port (
    dll_clock_output : in  std_logic := '0';   -- DLL's output clock
    dll_clock_input  : in  std_logic := '0';   -- input clock to DLL
    reset            : in  std_logic := '0';   -- reset input
    register_clock   : out std_logic := '0';   -- clock for shift register
    decrease_delay   : out std_logic := '0';   -- shift-left output
    increase_delay   : out std_logic := '0');  -- shift-right output
end high_resoloution_phase_detector;

architecture structural of high_resoloution_phase_detector is

  signal A, B, C, D, E, decrease_delay_int, increase_delay_int : std_logic := '0';
  signal F, G, H, I, dll_clock_input_int, reg_clk_1, reg_clk_2, reg_clk_3 : std_logic := '0';
  signal logic_one : std_logic := '1';

  component BUFBD4
    port (
      Z : out STD_LOGIC;
      A : in  STD_LOGIC);
  end component;

  component BUFBD16
    port (
      Z : out STD_LOGIC;
      A : in  STD_LOGIC);
  end component;

  component BUFBD32
    port (
      Z : out STD_LOGIC;
      A : in  STD_LOGIC);
  end component;

  component DFFRPB1
    port (
      Q  : out STD_LOGIC;
      QB : out STD_LOGIC;
      CK : in  STD_LOGIC;
      D  : in  STD_LOGIC;
      RB : in  STD_LOGIC);
  end component;

  component AND3D1
    port (
      Z  : out STD_LOGIC;
      A1 : in  STD_LOGIC;
      A2 : in  STD_LOGIC;
      A3 : in  STD_LOGIC);
  end component;

  component AND2D1
    port (
      Z  : out STD_LOGIC;
      A1 : in  STD_LOGIC;
      A2 : in  STD_LOGIC);
  end component;

  component OR2D1
    port (
      Z  : out STD_LOGIC;
      A1 : in  STD_LOGIC;
      A2 : in  STD_LOGIC);
  end component;

begin  -- structural

  U1:  DFFRPB1 port map (E, F, dll_clock_input_int, B, reset);
  U2:  DFFRPB1 port map (G, H, dll_clock_input_int, A, reset);
  U3:  AND2D1  port map (A, logic_one, dll_clock_output);
  U4:  AND3D1  port map (increase_delay_int, E, G, dll_clock_input_int);
  U5:  AND3D1  port map (decrease_delay_int, F, H, dll_clock_input_int);
  U6:  OR2D1   port map (reg_clk_1, increase_delay_int, decrease_delay_int);
  U7:  AND2D1  port map (B, dll_clock_output, logic_one);
  U8:  AND2D1  port map (dll_clock_input_int, dll_clock_input, logic_one);
  U9:  BUFBD4  port map (reg_clk_2, reg_clk_1);
  U10: BUFBD16 port map (reg_clk_3, reg_clk_2);
  U11: BUFBD32 port map (register_clock, reg_clk_3);

  logic_one      <= '1';
  decrease_delay <= decrease_delay_int;
  increase_delay <= increase_delay_int;

end structural;

library ieee;
use ieee.std_logic_1164.all;

entity lock_detector is
  port (
    lock_indicator  : out std_logic := '0';   -- high when DLL is locked
    dll_clock_input : in  std_logic := '0';   -- input clock to DLL
    reset           : in  std_logic := '0';   -- reset input
    decrease_delay  : in  std_logic := '0';   -- shift-left output
    increase_delay  : in  std_logic := '0');  -- shift-right output
end lock_detector;

architecture behavior of lock_detector is
begin

  -- locked when neither shift request is active at the falling input edge
  process (reset, dll_clock_input)
  begin
    if reset = '0' then
      lock_indicator <= '0';
    elsif dll_clock_input'event and dll_clock_input = '0' then
      lock_indicator <= not (increase_delay or decrease_delay);
    end if;
  end process;

end behavior;

library ieee;
use ieee.std_logic_1164.all;

entity vernier_dll is
  generic (
    N : integer := 128);
  port (
    dll_clock_input  : in  std_logic := '0';
    dll_clock_output : out std_logic := '0';
    lock_indicator   : out std_logic := '0';
    reset            : in  std_logic := '0');
end vernier_dll;

architecture structural of vernier_dll is

  signal register_clock       : std_logic := '0';
  signal fine_control         : std_logic_vector(N-1 downto 0) := (others => '0');
  signal fine_control_inv     : std_logic_vector(N-1 downto 0) := (others => '0');
  signal coarse_control       : std_logic_vector(N-1 downto 0) := (others => '0');
  signal decrease_delay       : std_logic := '0';
  signal increase_delay       : std_logic := '0';
  signal dll_clock_output_int : std_logic := '0';
  signal dll_clock_input_int  : std_logic := '0';
  signal delay_line_output    : std_logic := '0';
  signal logic_zero           : std_logic := '0';
  signal reset_int            : std_logic := '0';
  signal lock_indicator_int   : std_logic := '0';
  signal dll_clock_output_pad : std_logic := '0';

  component PDCH3DGZ
    port (
      CLK : in  std_logic;
      CP  : out std_logic);
  end component;

  component PDD24DGZ
    port (
      I   : in    std_logic;
      OEN : in    std_logic;
      PAD : inout std_logic;
      C   : out   std_logic);
  end component;

  component PDIDGZ
    port (
      PAD : in  std_logic;
      C   : out std_logic);
  end component;

  component PDO02CDG
    port (
      I   : in  std_logic;
      PAD : out std_logic);
  end component;

  component vernier_delay_line
    generic (
      N : integer);  -- number of delay elements
    port (
      delay_line_output : out std_logic;
      delay_line_input  : in  std_logic;
      fine_control      : in  std_logic_vector(N-1 downto 0);
      fine_control_inv  : in  std_logic_vector(N-1 downto 0);
      coarse_control    : in  std_logic_vector(N-1 downto 0));
  end component;

  component high_resoloution_phase_detector
    port (
      dll_clock_output : in  std_logic;   -- DLL's output clock
      dll_clock_input  : in  std_logic;   -- input clock to DLL
      reset            : in  std_logic;   -- reset input
      register_clock   : out std_logic;   -- clock for shift register
      decrease_delay   : out std_logic;   -- shift-left output
      increase_delay   : out std_logic);  -- shift-right output
  end component;

  component vernier_controller is
    generic (
      N : integer);  -- number of delay elements
    port (
      reset            : in  std_logic;
      register_clock   : in  std_logic;
      increase_delay   : in  std_logic;
      decrease_delay   : in  std_logic;
      coarse_control   : out std_logic_vector(N-1 downto 0);
      fine_control     : out std_logic_vector(N-1 downto 0);
      fine_control_inv : out std_logic_vector(N-1 downto 0));
  end component;

  component lock_detector is
    port (
      lock_indicator  : out std_logic;
      dll_clock_input : in  std_logic;
      reset           : in  std_logic;
      decrease_delay  : in  std_logic;
      increase_delay  : in  std_logic);
  end component;

begin  -- structural

  u1: vernier_delay_line
    generic map (N => 128)
    port map (delay_line_output, dll_clock_input_int,
              fine_control, fine_control_inv, coarse_control);

  u2: high_resoloution_phase_detector
    port map (dll_clock_output_int, dll_clock_input_int, reset_int,
              register_clock, decrease_delay, increase_delay);

  u3: vernier_controller
    generic map (N => 128)
    port map (reset_int, register_clock, increase_delay, decrease_delay,
              coarse_control, fine_control, fine_control_inv);

  u4: lock_detector
    port map (lock_indicator_int, dll_clock_input_int, reset_int,
              decrease_delay, increase_delay);

  -- I/O pads
  u5: PDD24DGZ port map (delay_line_output, logic_zero, dll_clock_output_pad, dll_clock_output_int);
  u6: PDO02CDG port map (lock_indicator_int, lock_indicator);
  u7: PDIDGZ   port map (reset, reset_int);
  u8: PDCH3DGZ port map (dll_clock_input, dll_clock_input_int);

  logic_zero       <= '0';
  dll_clock_output <= dll_clock_output_pad;

end structural;

library ieee;
use ieee.std_logic_1164.all;
library vst_n18_sc_tsm_c4_typ;
use vst_n18_sc_tsm_c4_typ.components.all;
library tpz973gtc;
use tpz973gtc.components.all;

entity vernier_testbench is
  generic (
    N : integer := 128);
end vernier_testbench;

architecture behavior of vernier_testbench is

  signal jitter_l         : std_logic := '1';
  signal jitter_h         : std_logic := '0';
  signal clock1_input     : std_logic := '0';
  signal clock2_input     : std_logic := '0';
  signal clock_enable     : std_logic := '1';
  signal dll_clock_input  : std_logic := '0';
  signal dll_clock_output : std_logic := '0';
  signal lock_indicator   : std_logic := '0';
  signal reset            : std_logic := '0';

  component vernier_dll
    generic (
      N : integer);
    port (
      dll_clock_input  : in  std_logic;
      dll_clock_output : out std_logic;
      lock_indicator   : out std_logic;
      reset            : in  std_logic);
  end component;

begin

  u1: vernier_dll
    generic map (N => 128)
    port map (dll_clock_input, dll_clock_output, lock_indicator, reset);

  -- clock 1: 8.2 ns period
  process
  begin
    clock1_input <= '0';
    wait for 4100 ps;
    clock1_input <= '1';
    wait for 4100 ps;
  end process;

  -- clock 2: 7.8 ns period
  process
  begin
    clock2_input <= '1';
    wait for 3900 ps;
    clock2_input <= '0';
    wait for 3900 ps;
  end process;

  -- selects clock 1 for 291.2 ns, then clock 2 for 2.91 us, repeating
  process
  begin
    clock_enable <= '1';
    wait for 291200 ps;
    clock_enable <= '0';
    wait for 2910000 ps;
  end process;

  -- 13 low-going 200 ps pulses, one every 8.2 ns
  process
  begin
    jitter_l <= '1';
    wait for 291100 ps;
    for k in 1 to 12 loop
      jitter_l <= '0';
      wait for 200 ps;
      jitter_l <= '1';
      wait for 8000 ps;
    end loop;
    jitter_l <= '0';
    wait for 200 ps;
    jitter_l <= '1';
  end process;

  -- 13 high-going 200 ps pulses, one every 8.2 ns
  process
  begin
    jitter_h <= '0';
    wait for 295200 ps;
    for k in 1 to 12 loop
      jitter_h <= '1';
      wait for 200 ps;
      jitter_h <= '0';
      wait for 8000 ps;
    end loop;
    jitter_h <= '1';
    wait for 200 ps;
    jitter_h <= '0';
  end process;

  dll_clock_input <= (((clock1_input and jitter_l) or jitter_h) and clock_enable)
                     or (clock2_input and (not clock_enable));

  -- active-low reset for the first 10 ns
  process
  begin
    reset <= '0';
    wait for 10000 ps;
    reset <= '1';
    wait;
  end process;

end behavior;

configuration vernier_rtl of vernier_testbench is
  for behavior
    for u1 : vernier_dll
      use entity work.vernier_dll(structural);
      for structural
        for u1 : vernier_delay_line
          use entity work.vernier_delay_line(structural);
        end for;
        for u2 : high_resoloution_phase_detector
          use entity work.high_resoloution_phase_detector(structural);
        end for;
        for u3 : vernier_controller
          use entity work.vernier_controller(behavior);
        end for;
        for u4 : lock_detector
          use entity work.lock_detector(behavior);
        end for;
      end for;
    end for;
  end for;
end vernier_rtl;

Appendix B
Synthesis result

****************************************
Report: cell
Design : vernier_unit_delay
Version: V-2004.06-SP1
Date   : Mon Jun 27 12:29:56 2005
****************************************

Attributes:
    b - black box (unknown)
    h - hierarchical
    n - noncombinational
    r - removable
    u - contains unmapped logic

Cell    Reference   Library                 Area        Attributes
--------------------------------------------------------------------
U1      NAN2D0      vst_n18_sc_tsm_c4_wc    12.197000
U2      NAN2D0      vst_n18_sc_tsm_c4_wc    12.197000
U3      NAN2D0      vst_n18_sc_tsm_c4_wc    12.197000
U4      NAN2D0      vst_n18_sc_tsm_c4_wc    12.197000
U5      NAN2M1D1    vst_n18_sc_tsm_c4_wc    16.261999
U6      BUFTD1      vst_n18_sc_tsm_c4_wc    28.459000   n
--------------------------------------------------------------------
Total 6 cells                               93.508995

****************************************
Report: area
Design : vernier_unit_delay
Version: V-2004.06-SP1
Date   : Thu Jun 23 14:31:56 2005
****************************************

Library(s) Used:
    vst_n18_sc_tsm_c4_wc (File: /CMC/kits/cmosp18/synopsys/2004/syn/vst_n18_sc_tsm_c4_wc.db)

Number of ports:         10
Number of nets:          13
Number of cells:          6
Number of references:     3

Combinational area:      93.508995
Noncombinational area:    0.000000
Net Interconnect area:   undefined (Wire load has zero net area)

Total cell area:         93.508995
Total area:              undefined

****************************************
Report: area
Design : vernier_delay_line
Version: V-2004.06-SP1
Date   : Thu Jun 23 14:34:45 2005
****************************************

Library(s) Used:
    vst_n18_sc_tsm_c4_wc (File: /CMC/kits/cmosp18/synopsys/2004/syn/vst_n18_sc_tsm_c4_wc.db)

Number of ports:         386
Number of nets:          767
Number of cells:         128
Number of references:      1

Combinational area:      11969.153320
Noncombinational area:       0.000000
Net Interconnect area:   undefined (Wire load has zero net area)

Total cell area:         11969.151367
Total area:              undefined

****************************************
Report: area
Design : high_resoloution_phase_detector
Version: V-2004.06-SP1
Date   : Thu Jun 23 14:39:20 2005
****************************************

Library(s) Used:
    vst_n18_sc_tsm_c4_wc (File: /CMC/kits/cmosp18/synopsys/2004/syn/vst_n18_sc_tsm_c4_wc.db)

Number of ports:          6
Number of nets:          17
Number of cells:         11
Number of references:     7

Combinational area:      760.265991
Noncombinational area:   154.492004
Net Interconnect area:   undefined (Wire load has zero net area)

Total cell area:         914.757996
Total area:              undefined

****************************************
Report: area
Design : vernier_controller
Version: V-2004.06-SP1
Date   : Thu Jun 23 14:50:12 2005
****************************************

Library(s) Used:
    vst_n18_sc_tsm_c4_wc (File: /CMC/kits/cmosp18/synopsys/2004/syn/vst_n18_sc_tsm_c4_wc.db)

Number of ports:         388
Number of nets:          825
Number of cells:         567
Number of references:     19

Combinational area:       6244.804199
Noncombinational area:   25637.822266
Net Interconnect area:   undefined (Wire load has zero net area)

Total cell area:         31882.552734
Total area:              undefined

****************************************
Report: area
Design :
lock_detector Version: V-2004.06-SP1 Date : Thu Jun 23 14:42:05 2005 **************************************** Library(s) Used: vst_nl8_sc_tsm_c4_wc (File: /CMC/kits/cmospl8/synopsys/2004/syn/vst_nl8_sc_tsm_c4_wc.db) Number of ports: Number of nets: Number of cells: Number of references:  5 6 2 2  Combinational area: 12.197000 Noncombinational area: 77.246002 Net Interconnect area: undefined (Wire load has zero net area) Total cell area: Total area:  89.443001 undefined  ****************************************  Report: area Design : vernier_dll Version: V-2004.06-SP1 Date : Thu Jun 23 15:09:01 2005 **************************************** Library(s) Used: vstnl 8_sc_tsm_c4_wc (File: /CMC/kits/cmosp 18/synopsys/2004/syn/vst_nl 8_sc_tsm_c4_wc.db) tpz973gwc (File: /CMC/kits/cmosp 18/synopsys/2004/syn/tpz973gwc.db) Number of ports: Number of nets: Number of cells: Number of references:  4 397 8 8  Combinational area: 65985.875000 Noncombinational area: 25869.562500 Net Interconnect area: undefined (Wire load has zero net area) Total cell area: Total area:  91855.906250 undefined 98  ****************************************  Report: cell Design : vernierdll Version: V-2004.06-SP1 Date : Thu Jun 23 15:10:29 2005 **************************************** Attributes: b - black box (unknown) h - hierarchical n - noncombinational p - parameterized r - removable u - contains unmapped logic Cell  Reference  ul  vernier_delay_line  u2 u3 u4 u5 u6 u7 u8  Library  Area Attributes  11969.151367 h, n, p highresoloutionphasedetector 914.757996 h,n vernier_controller 31882.552734 h, n, p lockdetector 89.443001 h, n PDD24DGZ tpz973gwc 9400.000000 n PDO02CDG tpz973gwc 9400.000000 PDIDGZ tpz973gwc 9400.000000 PDCH3DGZ tpz973gwc 18800.000000  Total 8 cells HDL Parameter Information: ul - N=>128 u3 - N => 128  91855.906250  
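The hierarchical reports can be cross-checked by hand. The sums below were added for illustration and are not part of the Design Compiler output. Areas are in the library's own area units, which the reports do not name. The check assumes that vernier_delay_line is built from 128 copies of vernier_unit_delay. Its report (128 cells, 1 reference) and the parameter N => 128 both suggest this.

$$
\begin{aligned}
A_{\text{unit}}  &= 4(12.197000) + 16.261999 + 28.459000 = 93.508999 \approx 93.508995 \\
A_{\text{line}}  &= 128 \times 93.508995 \approx 11969.151 \\
A_{\text{core}}  &= 11969.151 + 914.758 + 31882.553 + 89.443 \approx 44855.905 \\
A_{\text{pads}}  &= 3(9400) + 18800 = 47000 \\
A_{\text{total}} &= A_{\text{core}} + A_{\text{pads}} \approx 91855.905 \approx 91855.906250
\end{aligned}
$$

On these numbers:

- The four tpz973gwc I/O pads account for roughly 51% of the total cell area.
- Within the core, vernier_controller takes about 71% of the area and the delay line about 27%.
- Of the controller's 31882.6 units, 25637.8 are noncombinational.

The small mismatches in the last digits are rounding in the tool's reported values.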
