UBC Theses and Dissertations

A novel high resolution delay locked loop Saghafi, Ardeshir 2005

A NOVEL HIGH RESOLUTION DELAY LOCKED LOOP

by

ARDESHIR SAGHAFI

B.Sc., The University of Science and Technology, Tehran, Iran, 1989

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES (Electrical & Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA

July 2005

© Ardeshir Saghafi, 2005

Abstract

With the rapid advances in semiconductor technology, modern digital systems operating at GHz frequencies have been successfully developed for many years. As chips grow larger and the number of logic gates and the operating frequencies increase, clock skew becomes increasingly important in ensuring the proper functioning of VLSI chips. With a synchronous methodology, it is impossible to increase the clock speed further without reducing the clock skew on the chip. Phase Locked Loops (PLLs) and Delay Locked Loops (DLLs) have been widely adopted to solve the clock skew problem. In recent years, DLLs have been widely used for clock alignment due to their lower phase-error accumulation and faster locking time. In this thesis a novel high resolution DLL with a resolution of less than 10 ps is proposed, which combines the coarse and fine delay lines into an efficient hybrid delay line. Consequently, it saves power and area.

Table of Contents

Abstract
Table of Contents
List of Figures
Acknowledgment
CHAPTER 1 Introduction
1.1 Clock skew
1.2 Delay Locked Loop
1.3 DLL vs. PLL
1.4 Applications
1.4.1 Clock distribution
1.4.2 SDRAM
1.4.3 Time-to-Digital converter (TDC)
1.4.4 Automatic Test Equipment (ATE)
1.4.5 Clock synthesis
1.4.6 Clock and data recovery (CDR)
CHAPTER 2 Background
2.1 Analog DLL
2.2 Digital DLL
2.3 Double loop DLL
2.4 Synchronous Mirror Delay (SMD)
2.5 Register controlled DLL (RDLL)
2.6 Vernier Delay Locked Loop (VDLL)
CHAPTER 3 Design of proposed DLL
3.1 Block diagram
3.2 DLL modules description
3.2.1 Vernier delay line
3.2.2 Vernier delay line controller
3.2.3 High resolution phase detector
3.2.4 Lock detector
CHAPTER 4 Analysis of proposed DLL
4.1 Testbench
4.2 Initial lock
4.3 Lock re-entry
4.3.1 Lock re-entry (case 1)
4.3.2 Lock re-entry (case 2)
4.4 Gate count of vernier unit delay
4.5 Resolution of the proposed DLL
4.6 Limitations of the proposed DLL
CHAPTER 5 Conclusion
Bibliography
Appendix A Design VHDL code
Appendix B Synthesis result

List of Figures

Figure 1.1 Possible hold violation due to clock skew
Figure 1.2 Possible setup violation due to clock skew
Figure 1.3 Typical DLL block diagram
Figure 1.4 Typical PLL block diagram
Figure 1.5 SDRAM output timing with and without a DLL
Figure 1.6 Block diagram of the laser range finder [101]
Figure 2.1 Conventional analog DLL
Figure 2.2 Analog DLL with duty-cycle correction
Figure 2.3 Analog multiphase DLL
Figure 2.4 Digital DLL block diagram
Figure 2.5 Dual loop DLL
Figure 2.6 Conventional SMD
Figure 2.7 Timing diagram of a conventional SMD
Figure 2.8 Block diagram of direct SMD
Figure 2.9 Register Controlled DLL (RDLL)
Figure 2.10 Core circuit in RDLL
Figure 2.11 Core circuit in a RSDLL
Figure 2.12 Block diagram of a vernier delay line [73]
Figure 2.13 Schematic of vernier delay line [73]
Figure 3.1 Block diagram of proposed DLL
Figure 3.2 Circuit and timing diagram of a conventional unit delay
Figure 3.3 Circuit and timing diagram of a symmetrical unit delay
Figure 3.4 CMOS NAND gate
Figure 3.5 Phase Detector block
Figure 3.6 Lock Detector block
Figure 3.7 Controller block
Figure 3.8 Vernier delay line block
Figure 3.9 SAR DLL block diagram [33]
Figure 3.10 Flowchart for weighing sequence
Figure 3.11 Proposed unit delay circuit
Figure 3.12 State diagram of controller block
Figure 3.13 Shift registers in controller block
Figure 3.14 Phase detector in [50]
Figure 3.15 Proposed high resolution phase detector
Figure 3.16 Phase detector waveforms
Figure 3.17 Lock detector circuit
Figure 4.1 Initial lock mode waveform for a leading input clock
Figure 4.2 Initial lock mode waveform for a leading input clock (zoomed in)
Figure 4.3 Initial lock mode waveform for a leading output clock
Figure 4.4 Initial lock mode waveform for a leading output clock (zoomed in)
Figure 4.5 Lock re-entry mode waveform for small phase error
Figure 4.6 Lock re-entry mode waveform for a leading input clock
Figure 4.7 Lock re-entry mode waveform for a leading input clock (zoomed in)
Figure 4.8 Introduced glitch waveform for a leading input clock
Figure 4.9 Lock re-entry mode waveform for a leading output clock
Figure 4.10 Lock re-entry mode waveform for a leading output clock (zoomed in)
Figure 4.11 Introduced glitch waveform for a leading output clock

ACKNOWLEDGMENTS

I would like to express my deepest gratitude to my academic and research advisor Dr. Andre Ivanov for his guidance and constant support in helping me to conduct and complete this work. Also my wife has been supportive, not just tolerant, of my return to graduate school. She is as pleased as I am that my dissertation is finished. She knows that I am grateful to her for continuous support, but I take this opportunity for a public acknowledgment of my debt to her.

Chapter 1 Introduction

This chapter introduces the research topic of this thesis. It includes a quick review of the DLL circuit and a comparison with the Phase Locked Loop (PLL). The chapter also describes the different applications in which the DLL is used.

1.1 Clock skew

As silicon fabrication technology develops, more logic can be packed on a die, and as a result chips grow progressively larger. The number of logic gates and the chip operating frequencies increase, and clock skew becomes increasingly important in ensuring the proper functioning of VLSI chips. With a synchronous communication protocol on and off the chip, it is impractical to increase the communication clock speed further without reducing the clock skew on the chip. In a synchronous design, the clock period determines the time available for any operation between two flip-flops. Any uncertainty such as skew or jitter reduces this period.
Clock skew is caused by differences in the RC delay of clock interconnections along different clock signal paths, by differences in clock buffer delays due to process and temperature variations across the same chip, and by power supply differences caused by voltage drop on the power rails.

The clock skew problem can also exist in other situations. For example, the input clock driver in any chip introduces an uncertain time delay between the external and internal clocks. As a result, the internal clocks in a multi-chip system become asynchronous, and problems occur when data is transferred between chips.

Clock skew can lead to both setup and hold time violations. Consider the circuit in Figure 1.1(a), where the clock is routed in the direction of the data path. Delays in the clock path lead to skewed versions of the system clock arriving at the two flip-flops. If δ2 is greater than the sum of the clock-to-Q delay of FF1, the logic delay, and the setup time of FF2, then a hold time violation will occur. As shown in Figure 1.1(b), FF2 samples the wrong data. This can be prevented by adding delay to the data path from FF1 to FF2 (which increases the cycle time and is not preferred) or by reducing the clock skew.

Figure 1.1 Possible hold violation due to clock skew: (a) circuit, (b) timing diagram

If the clock signal is routed in the opposite direction to the data flow, as shown in Figure 1.2(a), then clock skew will not cause a hold time violation. However, a setup time violation can occur, since clk2 might arrive earlier than clk1, as shown in Figure 1.2(b). The clock cycle has to be increased in order to prevent this violation, which also harms system performance.

Figure 1.2 Possible setup violation due to clock skew: (a) circuit, (b) timing diagram

1.2 Delay Locked Loop

To reduce clock skew, the clock distribution network should be designed with care.
In addition, circuits such as Phase Locked Loops (PLLs) and Delay Locked Loops (DLLs) may be needed at several critical places in the clock distribution structure to reduce the total clock skew.

A basic DLL consists of a phase detector (PD) or phase comparator (PC) block, a variable delay line, and a controller that converts the PD's output to a control signal for the delay line, as shown in Figure 1.3(a). A basic DLL detects the phase error between the input clock and its output clock and adjusts the total delay of the variable delay line to a multiple of the input clock period. It introduces enough delay (Td) that the rising edge of the output clock coincides with the next rising edge of the input clock, as shown in Figure 1.3(b).

Figure 1.3 Typical DLL block diagram: (a) block diagram, (b) input and output clocks before and after lock

The correct timing of a synchronous circuit relies on clock edges and is affected by clock skew and jitter, so the one-period delay introduced by the DLL has no negative impact on the functionality of systems that use DLL circuits.

The output clock frequency of a standard DLL is the same as that of the input clock, so DLLs are generally not used for clock synthesis. PLLs are widely used for synthesis and clock multiplication. While some applications use DLLs for clock synthesis, this is not common [45], [53] and [107].

DLL and PLL circuits are feedback circuits. They generally require several clock cycles to achieve lock, resulting in a large standby power consumption. These circuits cannot be used in clock deskewing applications that require low standby power consumption. In other words, they cannot be turned off in standby mode because of their slow locking operation.
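The locking behaviour of the basic DLL described in this section can be illustrated with a short behavioural sketch. It is not taken from the thesis: the function name, the one-step-per-cycle update, and the numeric values are assumptions chosen only to show how a feedback loop takes many cycles to pull the delay to one clock period.

```python
def lock_dll(period_ps, initial_delay_ps, step_ps, max_cycles=10_000):
    """Behavioural sketch (hypothetical): adjust the delay-line delay until
    the output edge lines up with the next input edge (delay == one period)."""
    delay = initial_delay_ps
    for cycle in range(max_cycles):
        phase_error = period_ps - delay      # > 0 means the output edge is early
        if abs(phase_error) < step_ps:       # within one delay step: locked
            return delay, cycle
        # The phase detector reports only a direction; the controller
        # moves the delay by one step per clock cycle (an assumption).
        delay += step_ps if phase_error > 0 else -step_ps
    raise RuntimeError("DLL did not lock within max_cycles")


# Illustrative values only: 1 GHz clock, 400 ps starting delay, 10 ps steps.
delay, cycles = lock_dll(period_ps=1000, initial_delay_ps=400, step_ps=10)
print(delay, cycles)
```

With these illustrative numbers the loop needs 60 cycles to lock, which is the kind of lock time that makes it costly to power a feedback-based circuit down in standby.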
The Synchronous Mirror Delay (SMD) and Clock Synchronized Delay (CSD) circuits were developed for applications requiring low standby currents [52], [88] and [94]. These circuits have no feedback, so their lock-in time is significantly shorter than that of DLLs or PLLs. In standby mode they can be switched off. When power is restored, they need only two or three clock cycles to lock, which is negligible for most applications.

1.3 DLL vs. PLL

When choosing between a PLL and a DLL for a particular application, the differences in their architectures need to be understood. The oscillator used in a PLL inherently introduces instability and accumulation of phase errors (Figure 1.4). This in turn degrades the performance of the PLL when compensating for the delay of the clock distribution network. On the other hand, the unconditionally stable DLL architecture does not accumulate phase errors [20], [100], [103]. For this reason, the DLL architecture is widely used for delay compensation and clock conditioning.

Figure 1.4 Typical PLL block diagram

The DLL's closed loop transfer function has only one pole (a first order system) [56] and [57]. Therefore, it is naturally a stable system. A PLL's closed loop transfer function, on the other hand, has two or three poles. Stability is therefore a major issue and needs to be addressed during design. Normally, zeroes must be added to a PLL's transfer function in order to stabilize the PLL circuit [103].

The input clock's jitter propagates through a DLL circuit (a first order system) and can affect the performance of the system. A PLL filters out the jitter, so it is the best choice for applications with a high-jitter input.
In a clock distribution system, the main clock is generated by a quartz crystal oscillator, which does not introduce a significant amount of jitter. Therefore, the DLL circuit is generally used for de-skewing purposes [33], [66], [68] and [98].

The main disadvantage of a conventional DLL compared to a PLL is its limited phase capture range [37]. At a given operating clock frequency, a DLL can delay its input clock by an amount bounded by a minimum and a maximum delay. As a consequence, the designer must take extra care to prevent the loop from trying to lock to a delay outside these limits. To extend the operating range, the number of delay cells or the gain of the delay line (in an analog DLL) should be increased. This not only consumes additional power, but also causes more supply-induced jitter.

1.4 Applications

DLLs are used in many different applications, as described in the following subsections.

1.4.1 Clock distribution

As previously mentioned, a DLL is mainly used in clock distribution circuits that do not require clock synthesis or multiplication. Because these systems operate at a fixed clock frequency, a DLL's narrow capture range is not an issue [35], [82], [33], [66], [68], [98] and [102].

1.4.2 SDRAM

In synchronous DRAM, the output data strobe (DQS) should be locked to the data outputs (DQ outputs) for high-speed performance. The clock-access and output-hold times of conventional DRAM designs are determined by the delay time of internal circuits such as clock input and output buffers. Variations in temperature and process change access times and reduce the size of the valid data window. Several publications describe how a DLL can optimize and stabilize clock-access and output-hold times [26], [47], [65], [70], [72], [73], [74], [77], [79], [85], [87], [90], [93], [94] and [104]. An internal DLL can be used to adjust the time difference between the output and input clock signals in SDRAMs (Figure 1.5).
Figure 1.5 SDRAM output timing (a) without a DLL, where tAC = Td(max) and tOH = Td(min), and (b) with a DLL

In Double Data Rate synchronous DRAM [1], [17], [21], [32], [44], [46], [50], [51] and [75], where read/write accesses can occur on both rising and falling edges of the clock, clock synchronization is critical and is required for both clock edges. A symmetrical DLL is used for this application. The term "symmetrical" means that the delay line used in the DLL has the same delay for a high-to-low and a low-to-high logic transition.

1.4.3 Time-to-Digital converter (TDC)

High-resolution time-to-digital converters (TDCs) are used in a number of measurement systems such as time-of-flight (TOF) particle detectors, laser range finders (Figure 1.6), and logic analyzers. Laser range-finding is used in many industrial applications, for example measuring the dimensions of ship blocks in shipyards, inspecting the oil level in large tanks, and robot vision [4], [22], [23], [36], [54], [62], [69], [81], [83], [95], [96], [97], [101] and [106].

Figure 1.6 Block diagram of the laser range finder [101]

Modern TOF systems used in particle physics experiments require TDCs with a resolution below 1 ns. A distance measurement accuracy of 2-3 cm corresponds to 100-200 ps of measurement time. A high-resolution measurement can be obtained by using a logic buffer delay as the time unit, with a DLL stabilizing the buffer delay against process variations, temperature, and power supply changes. The delay line is used in a closed loop controlled by the DLL. The time resolution is limited to the delay of each unit cell in the delay line.
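The relationship between timing resolution and distance accuracy quoted above can be checked with a short calculation. This is a sketch under stated assumptions rather than the thesis's own derivation: it assumes a round-trip laser measurement (the pulse travels to the target and back) at the vacuum speed of light, and the function name is hypothetical.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def distance_resolution_cm(time_resolution_ps):
    """Distance step that corresponds to a given time step, assuming the
    measured interval covers the round trip to the target and back."""
    t_s = time_resolution_ps * 1e-12
    return C_M_PER_S * t_s / 2 * 100  # halve for round trip, convert m to cm


for t_ps in (100, 200):
    print(f"{t_ps} ps -> {distance_resolution_cm(t_ps):.2f} cm")
```

Under these assumptions, 100 ps and 200 ps correspond to roughly 1.5 cm and 3 cm, the same order of magnitude as the centimetre-level accuracy cited in the text.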
1.4.4 Automatic Test Equipment (ATE)

General purpose Automatic Test Equipment (ATE) requires fast devices, high tester bandwidth, high data rates, and high timing accuracy. At the heart of ATE is the timing event generation circuitry, which generates control signals for different parts of the ATE [99]. DLLs have been widely used in ATE to achieve the required precision and to cancel the process, voltage, and temperature (PVT) variations that affect the time base generator.

1.4.5 Clock synthesis

PLLs have been used successfully in creating tapped ring oscillators for clock synthesis. A PLL's delay elements have two dependent variables controlled by the feedback system, the frequency and the phase. A DLL, however, has only a single dependent variable controlled by the feedback loop, the phase. A PLL integrates the error of all its noise sources, but a DLL integrates only the noise sources that cause jitter, such as power supply noise or thermal noise. This happens over only one delay period, so a DLL does not accumulate noise, because it is a first order system. This is a desirable characteristic for every high performance clock generator [2], [3], [6], [7], [8], [9], [11], [12], [15], [18], [27], [28], [30], [39], [40], [41], [42], [45], [48], [64], [67], [80], [88], [103] and [105].

1.4.6 Clock and data recovery (CDR)

Clock and data recovery is a mechanism that allows a receiver to extract the clock from an incoming data stream; the recovered clock can then be used to extract the incoming data. The receiver also uses the clock embedded in the data stream to transmit data back to the source. Both Delay Locked Loops (DLLs) and Phase Locked Loops (PLLs) can be used in clock and data recovery circuits. DLLs are rarely used in CDR circuits [14], [19], [25], [55], [61] and [91].

Chapter 2 Background

In this chapter, we provide an overview of different DLL types. The advantages and disadvantages of each DLL type are discussed.
An extensive literature overview of the different types of DLLs is included, covering papers from 1993 to 2005.

2.1 Analog DLL

Analog DLLs were first used in clock distribution applications [10] and [13]. A conventional analog DLL consists of four main blocks: a voltage controlled delay line (VCDL), a charge pump, a low pass filter, and a phase detector, as shown in Figure 2.1.

Figure 2.1 Conventional analog DLL

The input reference clock drives the delay line, which is composed of cascaded variable delay buffers. The output clock drives the loop phase detector. The output of the phase detector is integrated by the charge pump and the loop filter capacitor to generate a loop control voltage. The loop's negative feedback drives the control voltage to a value that ideally forces a zero phase error between the output clock and the reference clock.

The simple design of the DLL offers many advantages over Voltage Controlled Oscillator (VCO) based PLLs. Due to frequency acquisition constraints, a PLL usually uses a specific type of phase detector, the state-machine based phase frequency detector (PFD). In contrast, a DLL's phase detector can easily be implemented using bang-bang control [109]. This means that the control signal of the loop can simply be a binary up or down signal rather than being proportional to the magnitude of the phase error. Additionally, since DLLs do not use a VCO, phase errors induced by supply or substrate noise do not accumulate over many clock cycles [108]. This improved noise immunity is the main reason for the increased use of DLLs in applications that do not require clock synthesis [16], [19], [34] and [105].

An analog DLL is a relatively complex analog circuit requiring a process-specific implementation. It is difficult to reuse the same design in a different technology, making the analog DLL a non-portable architecture.
For example, if an analog DLL is designed for 0.35 µm CMOS technology, it is not practical to migrate it to 0.18 µm technology, as major changes in the layout of the design are required.

The output clock's duty cycle changes as it passes through many delay cells. The reason is that the propagation delay of each unit cell in the delay line is not the same for low-to-high and high-to-low input transitions, so even if the duty cycle of the reference clock is 50% at the input, the output duty cycle may be significantly different. A conventional solution is to attach duty-cycle correction circuits to all clock output drivers, which adds area and increases jitter.

An all-analog multiphase DLL is proposed in [34]. It achieves both wide range operation and low jitter performance. The proposed DLL has the same benefits as a conventional analog DLL, such as jitter cancelling and multiphase clock generation. It also uses a dual controlled delay cell to correct the duty-cycle problem, as shown in Figure 2.2.

Figure 2.2 Analog DLL with duty-cycle correction

A second phase detector compares the inverted clock input with the inverted clock output and generates a control signal Vduty, as shown in Figure 2.2. It fine-tunes the cell current ratio and thereby aligns the falling edges of the reference clock and the output clock. In this way, it maintains the reference clock's duty cycle.

A quadrature phase mixing DLL was proposed in [104] and [105], which completely eliminates the limited capture range deficiency of conventional analog DLLs (Figure 2.3). This approach is based on the fact that quadrature clocks (90 degree phase shifted clocks) can be generated for a given clock frequency.
The quadrature clocks are input to a phase mixer, which can produce a clock whose phase can span the complete 0-360 degree phase interval. This approach reduces the limited phase range problem of the conventional DLL.

Figure 2.3 Analog multiphase DLL

2.2 Digital DLL

Both analog and digital DLLs have been used for clock alignment applications [35], [82], [33], [66], [68], [98] and [102]. An analog DLL generally provides better jitter performance at the expense of greater complexity. Although the digital DLL uses more area and power than the analog DLL, its greater simplicity and lower minimum required power supply voltage make it very attractive for many clock alignment applications.

Digital DLLs are characterized by their use of digital delay lines. They are typically made from simple digital circuit elements (Figure 2.4). This simplicity helps in designing a portable digital DLL that can easily be adapted to different technologies. Additionally, because phase information in a digital DLL is stored as a digital state, digital DLLs can provide very fast timing recovery after being placed in standby mode. However, conventional digital DLLs provide only moderate phase resolution and jitter performance [1], [21], [32], [48], [49], [71], [74], [76], [78], [92] and [94].

Figure 2.4 Digital DLL block diagram

Another benefit of digital DLLs is their ability to operate at lower voltages than analog DLLs. Because analog DLLs require saturated current sources, they run into minimum voltage problems as the supply voltage decreases. Digital DLLs, on the other hand, only require enough voltage to ensure the proper operation of their digital gate elements. Digital DLLs also exploit the power saving benefits of power supply scaling better than analog DLLs.
The power consumption of an analog DLL is the sum of the static power consumed by the constant current sources in the circuit and the dynamic power $CV^2f$ (where C is capacitance, V is the supply voltage, and f is frequency). The power consumption of a digital DLL, on the other hand, is determined primarily by the $CV^2f$ power, which decreases quadratically with supply voltage.

The delay elements can be implemented with almost any circuit block, but because the phase resolution of the delay line is determined by the propagation delay of each unit cell, delay elements that provide minimal delay are generally preferred. The delay line of a conventional digital DLL uses inverters, since they provide the shortest delay of any CMOS digital gate. Because an inverter inverts its input, the delay line is tapped only at every other inverter (two inverters in series form a unit cell), so that the output taps are not inverted and are only shifted by the total propagation delay of the two inverters.

Although conventional delay lines are attractive for their simplicity, DLLs based on such conventional delay elements suffer from several significant limitations. First, the delay line provides fairly coarse resolution. For example, a delay line with inverters as unit cells provides a minimum phase step corresponding to two inverter delays. Such coarse phase resolution is not enough for many clock alignment applications.

Second, conventional delay lines deliver only a limited phase range. In order to cover at least one full cycle of phase, the delay line length and unit cell delays are adjusted to provide at least 360 degrees of phase under the fastest process, voltage, and temperature (PVT) conditions and at the minimum operating frequency. Consequently, covering this range requires a long delay line, which occupies more silicon area and dissipates additional power.
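The sizing rule stated above (at least 360 degrees of phase at the fastest PVT corner and the minimum operating frequency) can be turned into a quick estimate of the delay line length. The following is a minimal sketch; the function name and the numeric values are hypothetical and are not taken from the thesis.

```python
import math


def stages_for_full_cycle(f_min_hz, unit_delay_fast_ps):
    """Number of unit cells needed so that the whole line spans one clock
    period at the lowest frequency, using the shortest (fast-corner) cell delay."""
    period_ps = 1e12 / f_min_hz
    return math.ceil(period_ps / unit_delay_fast_ps)


# Illustrative values only: 100 MHz minimum clock, 80 ps fast-corner unit delay.
print(stages_for_full_cycle(100e6, 80))
```

The estimate makes the trade-off visible: a finer unit delay or a lower minimum frequency increases the cell count in direct proportion, and with it the area and power of the line.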
Additionally, because inverters offer a poor power supply rejection ratio (PSRR), jitter induced by power supply noise accumulates as the signal propagates through the delay line. As a result, the signals from later taps in the delay line carry more jitter than those from earlier taps.

2.3 Double loop DLL

The key parameters in DLL design are locking time, power consumption, jitter, and phase error, which depend on the choice of proper delay elements and loop control methods. The phase adjustment is done through a variable delay line or a tapped delay line. The tapped delay line is used for digital control, where the locking characteristics are less sensitive to switching noise and cross talk. The variable delay line, on the other hand, is used for reducing the static phase error, where the delay changes gradually. Therefore, the logical approach to obtaining a DLL with fast locking and a low phase error is to combine these two methods. This is called a dual loop DLL, sometimes referred to as a semi-digital DLL [26], [29], [31], [38], [45], [58], [60], [63], [84] and [89]. The locking procedure is done in two steps, coarse tuning and fine tuning, which are performed in the digital and analog domains, respectively (Figure 2.5). The dual loop DLL can be used in low power standby mode applications. Recovery from standby mode to regular operational mode is almost immediate, because the digital information is kept during standby and the position of the output tap in the delay line is known at startup.

Figure 2.5 Dual loop DLL

After powering up the system, the coarse tuning mechanism starts.
Normally the middle tap in the delay line is selected, and the output clock is compared to the input reference clock by a digital phase detector. Depending on which clock is leading and which is lagging, the output of the phase detector shifts the selected tap right or left. Finally, the tap with the minimum delay relative to the reference clock is selected. At that point, the coarse tuning phase is complete.

To avoid unwanted phase jitter, the digital block is disabled and the shift registers in the controller block hold their positions. The analog control part is enabled to reduce the phase error. This function is performed by a lock window mechanism. If the internal clock is outside the window, the digital block is enabled. Once the internal clock enters the lock window range, the analog block is enabled and the digital block is disabled. The range of the analog part must be large enough to cover the lock-detecting window. The analog control block consists of a PFD, a charge pump, and a loop filter. The operation of the analog loop is the same as that of the conventional analog DLL.

2.4 Synchronous Mirror Delay (SMD)

Conventional PLL and DLL circuits are feedback systems that require many clock cycles to achieve lock. Therefore they cannot be turned off and are not used in clock-skew suppression applications requiring low standby currents, for example in a cell phone. Synchronous Mirror Delay (SMD) and Clock Synchronized Delay (CSD) circuits, on the other hand, are non-feedback systems that can achieve lock in only two clock cycles [52], [88] and [94]. Therefore, these circuits can be disabled in standby mode, and they can lock to the reference clock in just two clock cycles when the operation mode is resumed.
A conventional SMD circuit, as shown in Figure 2.6, consists of an input buffer with delay $d_1$, a clock driver with delay $d_2$, a replica delay line (a dummy input buffer plus a dummy clock driver with total delay $t_{replica} = d_1 + d_2$), and two delay lines (a delay-measurement line and a variable-delay line arranged in parallel). When the circuit is activated, the first clock signal propagates through the input buffer, the replica delay line, and the delay-measurement line, where it travels for a time $[t_{CK} - t_{replica}]$ until the second clock signal comes out of the input buffer. The delay time $[t_{CK} - t_{replica}]$ determines the length of the variable-delay line. The second signal propagates through the variable-delay line and comes out of the clock driver. The resulting total delay time is $d_1 + t_{replica} + [t_{CK} - t_{replica}] + [t_{CK} - t_{replica}] + d_2 = 2t_{CK}$ (Figure 2.7). In this manner, no feedback circuitry is used and clock skew is eliminated within two clock cycles. The simple structure of the SMD circuit also reduces design effort [52], [88] and [94].

Figure 2.6 Conventional SMD (measured delay $t_V = t_{CK} - (d_1 + d_2)$)

Figure 2.7 Timing diagram of a conventional SMD

Despite their advantages, SMD circuits are not widely used, because they need a dummy clock driver circuit that is modelled on the actual clock driver circuits, which are known only after the placement and routing phases. Therefore, they are used for devices in which the clock driver circuits can be fixed during the circuit design stage, e.g., memory elements [94]. Furthermore, a difference between the original clock driver circuit and the dummy clock driver circuit exists due to process, power supply voltage, and temperature (PVT) variations.
This delay difference increases the phase error, which cannot be compensated for during operation because an SMD circuit has no feedback mechanism.

A direct-skew-detect synchronous mirror delay (direct SMD) achieves clock-skew suppression in only two clock cycles [43] and [52]. It can be used for application-specific integrated circuits (ASICs) with undefined clock paths, as shown in Figure 2.8. The direct SMD circuit detects both the clock skew and the clock cycle by using a direct-skew detector and clock suppression circuitry. The direct SMD circuit does not use a dummy clock driver circuit. Therefore, it does not experience the problems described above for a conventional SMD circuit.

Figure 2.8 Block diagram of direct SMD

2.5 Register controlled DLL (RDLL)

The RDLL belongs to the digital DLL family and is widely used in high speed synchronous DRAM (SDRAM) applications [17], [51], [85] and [90]. In an SDRAM, the output data strobe (DQS) should be locked to the data outputs. To optimize and stabilize clock-access and output times, an internal RDLL is used in an SDRAM memory chip to adjust the time difference between the output and input clock signals.

The RDLL consists of a tapped delay line, a shift register, a phase detector, and a replica (dummy) input buffer [85]. The replica input buffer is used in the feedback path to match the delay of the input clock buffer. The phase detector (PD) compares the relative timing of the edges of the input clock and the feedback clock signal, which comes through the tapped delay line. The shift register controls the point of entry in the delay line for the incoming external clock, as shown in Figure 2.9.
Figure 2.9 Register Controlled DLL (RDLL)

The outputs of the phase detector, shift-right and shift-left, are used to control the shift register. In the conventional RDLL, only one bit of the shift register output is high, while the other bits are zero. The single bit is used to select a point of entry for CLKIn in the delay line. When the rising edge of the input clock is within the resolution of the output clock, both outputs of the PD, shift-right and shift-left, are low and the loop is locked as shown in Figure 2.10.

Figure 2.10 Core circuit in RDLL

The resolution of the RDLL is determined by the size of the unit delay used in the delay line. The locking range is determined by the number of delay stages used in the delay line. Since the DLL circuit inserts a delay time between CLKIn and CLKOut, making the output clock change simultaneously with the next rising edge of the input clock, the minimum operating frequency to which the RDLL can lock is the reciprocal of the product of the number of stages in the delay line and the delay per stage (F_min = 1/(T_d × N), where T_d is the delay of one unit delay and N is the number of unit delays in the delay line). Adding more delay stages increases the locking range of the RDLL at the cost of increased chip area and power consumption [17], [51], [85] and [90].

The conventional RDLL uses an AND gate as the unit-delay stage (NAND + inverter). The problem with using a NAND + inverter as the basic delay element is that the propagation delay through the unit delay for a high-to-low transition is not equal to the delay of a low-to-high transition, i.e., t_PHL is not equal to t_PLH. If the difference between t_PHL and t_PLH is 20 ps, for example, then the total skew of the falling edge through 50 stages is 1 ns.
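The two relations just stated can be checked numerically. The following is a minimal sketch, not part of the original design; the function names are hypothetical, and the 50 ps unit delay, 200-stage line, and 90 ps / 70 ps edge delays are illustrative values chosen only so that the edge mismatch is the 20 ps used in the text.

```python
def rdll_min_frequency_mhz(unit_delay_ps, num_stages):
    """F_min = 1 / (T_d * N), returned in MHz with T_d given in picoseconds."""
    total_delay_ps = unit_delay_ps * num_stages
    # 1 / (1 ps) is 1e12 Hz, i.e. 1e6 MHz.
    return 1e6 / total_delay_ps


def accumulated_edge_skew_ps(tphl_ps, tplh_ps, num_stages):
    """Rising/falling edge skew accumulated over num_stages asymmetric stages."""
    return abs(tphl_ps - tplh_ps) * num_stages


# Illustrative (assumed) numbers: 50 ps per stage, 200 stages.
f_min = rdll_min_frequency_mhz(50, 200)
# 20 ps mismatch per stage over 50 stages, as in the text.
skew = accumulated_edge_skew_ps(90, 70, 50)
```

With these inputs the line delay is 10 ns, so the lowest lockable frequency is 100 MHz, and the accumulated skew is 1000 ps, i.e. the 1 ns quoted above.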
Because of this skew, the input clock's duty cycle is not preserved when the clock propagates through the delay line. A Register-Controlled Symmetrical DLL (RSDLL) is proposed in [51], which can be used for duty-cycle sensitive applications. For example, it meets the requirements of double-data-rate (DDR) SDRAM, in which read/write accesses occur on both rising and falling edges of the clock. In the RSDLL, a modified symmetrical delay element is used, with a NAND gate instead of an inverter (two NAND gates per delay stage).

Figure 2.11 Core circuit in an RSDLL

This symmetrical unit delay guarantees that t_PHL = t_PLH independently of process variations, since when one NAND switches from HIGH to LOW, the other switches from LOW to HIGH. The schematic for a symmetrical DLL is shown in Figure 2.11.

2.6 Vernier Delay Locked Loop (VDLL)

The Vernier principle is based on the Vernier caliper [83]. The tool measures the length of an object placed between its two jaws. On the sides of the jaws, an indicator mark shows the distance between the jaws on a scale. Since the indicator usually falls between two tick marks, additional accuracy is obtained by dividing the distance between tick marks. An additional scale is included next to the indicator, which has ten divisions in a distance equal to nine divisions on the main scale. Because of this mismatch it is possible to measure a subdivision of the primary scale ten times smaller than the distance between tick marks.

Based on this concept, a delay line with N = 10 delay elements can be designed to have a total delay of H = 9 clock periods. The minimum achievable time step is T/N = D/H, where T is the period of the input clock and D is the delay of each delay element. This technique was introduced and implemented for a time-to-digital converter (TDC) [36], [70], [83] and [99].
A TDC is mainly used to digitize time, which has many potential applications in high-energy and nuclear physics experiments.

In a conventional digital DLL, the quantization error is equal to the propagation delay of each unit in the delay line. In a 0.35 µm CMOS technology, the propagation delay of an inverter gate is about 40 ps. Thus, a unit delay consisting of two inverters in series presents a delay of 80 ps. For a GHz operating frequency, the 80 ps quantization error accounts for 8% of the 1 ns clock period, an error that affects the functionality of a synchronous system. The Vernier technique is implemented to reduce this error in [5], [24], and [86].

A modified version of the Register-Controlled DLL (RDLL) is proposed in [73], which relies on the Vernier concept. It consists of two RDLLs in series. The first RDLL performs the coarse delay adjustment, with a 200 ps quantization error. The second RDLL, with a 40 ps quantization error, performs fine-tuning.

The coarse RDLL uses the conventional delay line, where each unit delay consists of a NAND gate and an inverter in a series configuration. The fine RDLL uses a different configuration, composed of two delay elements that have delay times of td and 1.2 td, where td is the unit delay time of the conventional delay element as shown in Figure 2.12.

The delay lines are arranged as parallel main and sub-delay lines and are serially connected by switches SW0 to SW4. In Figure 2.12, only one of the switches can be closed at any time. For example, if SW0 is closed, the delay line generates 5 td. Similarly, if SW1 is closed, the delay line generates 5.2 td. Thus, this delay arrangement can generate a 0.2 td delay step, which is considerably smaller than that of the conventional delay.
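The switch arithmetic above can be sketched as follows. This is an illustrative model only: it assumes that closing switch SWk routes the clock through k sub-line elements (1.2 td each) and then through the remaining main-line elements (td each), which reproduces the 5 td and 5.2 td values given in the text. The function name and parameters are hypothetical.

```python
from fractions import Fraction


def vernier_line_delay(closed_switch, main_stages=5, sub_ratio=Fraction(6, 5)):
    """Total delay, in units of td, when switch SW<closed_switch> is closed.

    Assumed path: closed_switch sub-line elements of 1.2 td each, followed
    by the remaining main-line elements of td each.
    """
    if not 0 <= closed_switch < main_stages:
        raise ValueError("switch index out of range")
    sub_delay = closed_switch * sub_ratio
    main_delay = main_stages - closed_switch
    return sub_delay + main_delay


# Delays for SW0..SW4 and the step between neighbouring settings.
delays = [vernier_line_delay(k) for k in range(5)]
steps = [b - a for a, b in zip(delays, delays[1:])]
```

Exact fractions are used so that every step comes out as exactly 1/5 td, the 0.2 td step described above, without floating-point rounding.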
Figure 2.12 Block diagram of a Vernier delay line [73]

In Figure 2.13, the main and sub-delay lines are connected by switches SW0 to SW4. The fan-out of the main delay line is one, while that of the sub-delay line is two. Hence, the delay of the sub-delay line exceeds that of the main delay line. This delay difference becomes the unit delay time of the delay line, which is equal to the quantization error as shown in Figure 2.13.

Figure 2.13 Schematic of Vernier delay line [73]

Chapter 3 Design of proposed DLL

This chapter covers the block diagram of the proposed circuit and detailed circuit explanations of each module in the block diagram. The logic design is described thoroughly. The simulation results are covered in the next chapter. The design goal is to increase the resolution of the DLL to less than 10 ps, as well as to reduce the area (gate size) of the Vernier delay line in the DLL by a minimum of 10%. The power consumption is also reduced as a result of the gate reduction in the Vernier delay line. A resolution of less than 10 ps, an area reduction of 15% and an operating frequency of up to 200 MHz are achieved in this design.

3.1 Block diagram

The block diagram consists of four modules: phase detector, lock detector, Vernier delay line and controller, as shown in Figure 3.1.

Figure 3.1 Block diagram of proposed DLL

The input clock is connected to two modules, the phase detector and the Vernier delay line.
The Vernier delay line propagates the input clock and provides N output taps, where N is the number of unit delays in the delay line.

In order to lock the output clock to the input clock for all input frequencies, the delay of the delay line should be greater than the period of the minimum operating frequency. For example, if a DLL's locking range is 100 MHz to 200 MHz, then the delay line must be able to delay the input clock by 10 ns. Therefore, the input clock is delayed by 10 ns when it exits from the last output tap. If the delay of each unit is, for example, 50 ps, then the delay line needs 200 unit delays. Therefore, to reduce the minimum operating frequency, more unit delays are required, which leads to more area and power consumption.

The delay of each unit depends on the number of cascaded gates in each unit and the technology in which the circuit is implemented. The conventional unit delay consists of one NAND gate and one inverter in series, which in 0.18 µm technology generates a delay of approximately 70 ps. The same unit cell implemented in 0.35 µm technology generates approximately 100 ps. The delay estimates are based on commercial libraries.

The conventional unit delay has a drawback: its propagation delay is not symmetrical, so the total delay for the rising edge of the input signal is not the same as for the falling edge. Therefore, an input clock with a 50% duty cycle can result in a square wave that no longer has a 50% duty cycle, as shown in Figure 3.2. This non-symmetrical behaviour can cause problems in Double-Data-Rate DRAMs, where read/write access can occur on both rising and falling edges of the clock [1], [17], [21], [32], [44], [46], [50] and [51].

Figure 3.2 Circuit and timing diagram of a conventional unit delay (t1 ≠ t2)

The proposed DLL utilizes a unit delay consisting of two basic NAND gates in series.
This configuration eliminates the non-symmetrical characteristic of a conventional unit delay. The total propagation delay t1 (T_PHL + T_PLH) for the input rising edge is equal to t2 (T_PLH + T_PHL) for the falling edge of the same input clock. T_PHL and T_PLH are the high-to-low and low-to-high delays of the NAND gate, respectively, as shown in Figure 3.3. Therefore, the duty cycle of the input clock is preserved throughout the delay line.

Figure 3.3 Circuit and timing diagram of a symmetrical unit delay (t1 = t2)

The Vernier delay line is controlled by a finite state machine, or simply a controller. There are two modes of operation, coarse and fine. A system reset signal initiates the coarse tuning mode. In this mode, the phase detector compares the output clock signal from the center tap with the reference input clock. If the positive edge of the input reference clock is leading, then the controller shifts the output tap to the left and the total delay decreases. On the other hand, if the positive edge of the input reference clock is lagging, then the controller shifts the output tap to the right and the total effective delay increases.

The controller enters the fine tuning mode when the positive edge of the input reference clock and the output tap of the delay line are less than a unit delay apart. Therefore, the delay of each Vernier unit determines the resolution of the coarse tuning mode. In the fine tuning mode, each unit shift to the left or right is a fraction of the corresponding shift in the coarse tuning mode. This enhanced resolution determines the final resolution of the system and sets the maximum phase jitter.

The phase detector compares the input clock reference with the output tap signal of the delay line. The resolution of the DLL depends not only on the fine resolution of each Vernier unit delay but also on the resolution of the phase detector.
In this design, the phase detector's resolution is determined by the differential delay of a two-input NAND gate.

Generally, in CMOS gates the propagation delays from the input ports to the output port are not the same. For example, in a NAND gate, input A, which is connected to NMOS transistor T1, has a smaller propagation delay than input B, which is connected to NMOS transistor T2, because the capacitive load on the drain of T2 is greater than that of T1 as shown in Figure 3.4. This difference for a two-input NAND gate in 0.18 µm CMOS technology is less than 10 ps and varies with load and input signal transition time (slew rate).

Figure 3.4 CMOS NAND gate

The phase detector block has three outputs: increase_delay, decrease_delay, and controller_clk (labelled register_clk in Figure 3.5). At any time during the coarse and fine tuning modes, one of the increase_delay or decrease_delay outputs is active, and controller_clk is used to synchronize the controller with the phase detector, so any shift to the right or left is performed on the positive edge of the controller_clk output.

Figure 3.5 Phase Detector block

When the interval between the positive edge of the output clock and the input reference clock is within the resolution of the DLL, the DLL is in lock mode. All of the phase detector's outputs are disabled and the controller stays in standby mode. The lock detector block indicates when the DLL is in lock mode; its output goes high when the DLL is locked, as shown in Figure 3.6.

Figure 3.6 Lock Detector block

The controller block is a finite state machine (FSM) controlling the delay line as shown in Figures 3.7 and 3.8. It controls the coarse and fine tuning modes. It also provides the mechanism to resume the lock mode when the input clock's frequency or phase changes rapidly.
The system reset pulse initializes the DLL, and the controller block goes into reset mode when the system is powered up. The detailed flowchart is shown in Figure 3.12.

Figure 3.7 Controller block

Figure 3.8 Vernier delay line block

The delay line has 128 output taps controlled by the controller block. During initialization the center tap is selected as the output tap. The register-input bus is hardwired to a hex value of "00000000000000008000000000000000", which means that all the register-input bits except bit 63 are tied to logic zero. During system power up, the input load signal is asserted to logic one. Consequently, this number is loaded into a 128-bit shift register. After reset, the center tap corresponding to output control bit 63 of the controller's shift register is selected as the delay line output tap.

It is possible to load the shift register with any other number, so any output tap in the delay line can be selected. The center tap is however the best choice, because it gives the maximum dynamic range for both right and left shifts, so the lock mode can be achieved in the fastest time. In addition to speed, choosing the center tap as the initial output tap leaves a maximum number of unit delays in both directions. Therefore, the controller output selector does not reach the boundary taps before entering the lock mode.

At any time, only one bit of the shift register is active, selecting an output tap of the delay line. In this design all the unit delays are the same and exhibit the same amount of delay. A linear approach has been selected to achieve the lock mode in the design of the DLL in this thesis.
Therefore, the controller linearly shifts the output tap to the right or left one step at a time, so the skew between the output clock and the input reference clock is gradually reduced to the minimum, which is less than the resolution of the fine tuning delay units.

A Successive Approximation Register Delay Locked Loop (SARDLL) is proposed in [33], which uses a counter instead of a shift register. Also, its delay line is designed in a binary-weighted manner and no longer consists of delay units with equal delay time. The N-bit control word from the up/down counter determines whether the input clock goes through each delay stage or bypasses it, as shown in Figure 3.9.

Figure 3.9 SARDLL block diagram [33]

For faster lock time, the binary search algorithm is incorporated into the SARDLL. This algorithm reduces the searching effort and speeds up the locking process. The flowchart in Figure 3.10 demonstrates how this algorithm works for a three-bit control word. In the beginning, the most significant bit (MSB) of the controller output is set to one, and all the other bits are set to zero. A phase comparator examines whether the output clock leads the input clock or not. If it does, the MSB remains high. If not, it is set to low and held constant. In this way the MSB is determined, and the process is repeated for each following bit until the least significant bit (LSB) is determined. In this way, the DLL can be locked quickly.

Figure 3.10 Flowchart for weighing sequence

A conventional linear approach has been implemented in this thesis. Devising the best algorithm to speed up the lock time is an independent topic which can be explored in future research projects.

3.2 DLL modules description

In this section, all the modules of the proposed DLL are explained in detail.
First, the circuit and all of its components are described. Then, the functionality and operation of each block are investigated in detail.

3.2.1 Vernier delay line

The delay line consists of N unit delays in a chain configuration. In this design N = 128, which establishes an approximate minimum operating frequency of 100 MHz based on the target specification. More unit delays are needed to lower the minimum operating frequency. Each unit delay consists of five dual-input NAND gates. Therefore, a total of 640 NAND gates are used for this Vernier delay line.

In clock distribution applications, the clock frequency is fixed, so the minimum value of N is calculated for that frequency, automatically leading to minimal power and area consumption. In clock recovery applications, the DLL operates over a range of frequencies, so the value of N is determined by the lowest frequency component in the incoming data.

The output ports of all 128 delay units comprising the Vernier delay line are connected to a single-bit bus. This single-bit bus is the output of the DLL and is fed back to the phase detector block for phase comparison. If none of the tri-state output buffers in the Vernier delay line is enabled, then the DLL output floats, taking neither a low nor a high value.

In order to prevent the DLL output from floating, a small tri-state buffer is connected to the DLL output. The input and enable ports of this buffer are tied to logic high, so its output holds the DLL output at a weak high '1' value. Due to the weak drive capability of this small buffer, a low output at any one of the 128 buffers overrides this weak high value and the DLL output is pulled down to the '0' logic value.

Each unit delay consists of five NAND gates and one tri-state buffer gate. U1 and U2 form the fine unit delay, and U3 and U4 form the coarse unit delay. U5 acts as a switch controlled by the fine_control input.
The coarse_control input is connected to the enable port of the buffer gate (U6) and determines whether the unit_delay_out port is connected to the output of U4 or is in a high-impedance state, as shown in Figure 3.11.

Figure 3.11 Proposed unit delay circuit

The clk_output ports of all N unit delays are tied to each other and form a one-bit tri-state bus. A single tri-state buffer with weak output drive holds this bus at a weak high level, which guarantees that this single-bit bus never floats.

The fine and coarse delay units are each constructed from two NAND gates in series, forming a symmetrical delay line. The propagation delay is the same for both rising and falling edges, so the duty cycle is preserved along the line. Each output port of U2 and U4 is connected to two other inputs, so the fan-out is two, and both U2 and U4 use the A1 port for the delay input. As a result, both U2 and U4 introduce the same amount of propagation delay.

The difference between the fine and coarse unit delays is that fine_input is connected to port A2 of U1, whereas coarse_input is connected to port A1 of U3. In a NAND gate, the propagation delays from the A1 and A2 ports to output Z are not the same. The Vernier technique is based on this inherent characteristic of the NAND gate and uses this differential delay between the two inputs to achieve a fine step resolution.

In the DLLs proposed in [47], [59], [51], [85] and [90], the input clock is connected to all the unit delays in the delay chain, so there are N taps, where N is the number of unit delays in the delay line. This large fan-out requires a clock driver, which is large in area and consumes extra power. It also introduces an extra delay that has to be compensated for with another dummy clock driver in the feedback path.
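The pin-to-pin argument for the fine and coarse unit delays can be made concrete with a small calculation. This is a sketch under assumptions: the 35 ps and 43 ps NAND pin delays are invented for illustration (the text only states that the difference is below 10 ps in 0.18 µm technology), and the function name is hypothetical.

```python
def unit_delays_ps(t_a1_ps, t_a2_ps):
    """Return (coarse, fine, step) delays for one unit, in picoseconds.

    coarse: U3 entered through pin A1, then U4 through pin A1.
    fine:   U1 entered through pin A2, then U2 through pin A1.
    step:   the differential delay that the Vernier technique exploits.
    """
    coarse = t_a1_ps + t_a1_ps
    fine = t_a2_ps + t_a1_ps
    return coarse, fine, fine - coarse


# Assumed pin delays: A1 -> Z is 35 ps, A2 -> Z is 43 ps.
coarse_ps, fine_ps, step_ps = unit_delays_ps(35, 43)
```

Because the second gate in each pair is driven through the same pin, the step reduces to the difference between the A2 and A1 delays of a single NAND gate, here 8 ps.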
In this design, the input clock is connected to only two NAND gates in the first unit delay, so there is no need for a clock driver. This eliminates the phase shift between the input reference clock and the output clock caused by delay mismatch between the clock driver and the dummy clock driver.

Each unit delay contains a total of five dual-input NAND gates and one tri-state gate, which is fewer than the six dual-input NAND gates and six inverters used in the previously described digital Vernier DLL circuit [73] shown in Figure 2.13.

The coarse_input and fine_input ports of the first unit delay are tied to the input reference clk port. This is the entry port for both the fine and coarse chains, and from this point the reference clock propagates through two separate fine and coarse delay chains. The fine_output, coarse_output and vernier_output ports in the last unit delay of the delay chain are not connected to any net.

3.2.2 Vernier delay line controller

The controller block consists of a finite state machine (FSM) and two shift registers that control the DLL operation. All the timing control for the delay line originates in the controller block. It determines which output tap in the delay line is connected to the DLL output and whether the DLL is in coarse or fine mode.

Figure 3.12 State diagram of controller block

The finite state machine has four states: IDLE, INCREMENT, DECREMENT, and FINE, as shown in Figure 3.12. The FSM remains in the IDLE state while reset is asserted. The initial coarse_load_data value is loaded into the coarse shift register when reset is asserted. This value determines which output tap is selected as the output of the Vernier delay line. The default value of "00000000000000008000000000000000" selects the center tap.
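The relationship between a tap index and the one-hot load value can be sketched as follows. The helper names are hypothetical; the only facts taken from the text are the 128-bit register width and the default load value that selects the center tap (bit 63).

```python
REGISTER_WIDTH = 128  # one bit per delay-line tap


def one_hot_word(tap, width=REGISTER_WIDTH):
    """Return an integer with only bit `tap` set."""
    if not 0 <= tap < width:
        raise ValueError("tap index out of range")
    return 1 << tap


def to_hex(word, width=REGISTER_WIDTH):
    """Format a register value as zero-padded upper-case hex."""
    return format(word, "0{}X".format(width // 4))


def selected_tap(word):
    """Return the index of the single set bit of a one-hot word."""
    if word <= 0 or word & (word - 1):
        raise ValueError("word is not one-hot")
    return word.bit_length() - 1


center = one_hot_word(63)
center_hex = to_hex(center)
```

Formatting the word for tap 63 yields the 32-digit string quoted above, and decoding that string with `selected_tap(int(center_hex, 16))` returns 63.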
The register_clock, increase_delay and decrease_delay signals are generated by the phase detector block. The input register_clock signal is used to clock the shift register. The increase_delay and decrease_delay signals determine whether the shift is to the right or to the left, as shown in Figure 3.13.

Figure 3.13 Shift registers in controller block

Depending on whether the increase_delay or decrease_delay signal is asserted, the state machine moves to the INCREMENT or DECREMENT state, respectively. The state machine stays in the DECREMENT state as long as decrease_delay is asserted and moves to the FINE state when increase_delay is asserted for the first time.

The state machine stays in the INCREMENT state as long as increase_delay is asserted and moves to the DECREMENT state when decrease_delay is asserted for the first time. Subsequently, it moves to the FINE state in the next clock when increase_delay is asserted.

Therefore, regardless of whether it is in the DECREMENT or INCREMENT state, the state machine ends up in the FINE state, where the coarse shift register is disabled and the fine shift register's output determines the amount of incremental fine delay needed for the DLL to lock its output clock to the input reference clock.

The DLL stays in lock mode for as long as the input clock phase is steady and the phase difference between the output and input reference clocks is within the resolution of the phase detector. If the input clock's frequency or phase changes at any time, then the DLL exits the lock mode. If the output clock's rising edge leads the input clock's rising edge, then increase_delay is asserted.
On the other hand, if the input clock's rising edge leads the output clock's rising edge, then decrease_delay is asserted. In either case, the register_clock is enabled. The state machine stays in the FINE state and the fine shift register shifts left or right depending on whether decrease_delay or increase_delay is asserted.

For example, suppose the resolution of the Vernier delay line is 10 ps and the fine shift register holds the hex value "00000000010000000000000000000000" when the DLL is in lock mode. The fine shift register can be shifted to the left until its most significant bit becomes "1", which requires 39 clock cycles. The delay of the delay line is then decreased by 390 ps. On the other hand, the shift register can be shifted to the right until its least significant bit becomes "1", which requires 88 clock cycles, and the delay of the delay line is increased by 880 ps. Therefore, if the phase error between the output clock's rising edge and the input clock's rising edge is within this window, then the state machine stays in the FINE state and lock mode is achieved.

If the phase error is not within this window, then the state machine moves to either INCREMENT or DECREMENT, depending on whether an increase or decrease in the line delay is required. At this point, the fine shift register resets to "0" and is disabled. The coarse shift register, which controls the coarse delay line, is enabled, and each shift to the right or left increases or decreases the delay by an amount equal to the coarse unit delay (the delay of two NAND gates in a row). The state machine finally moves into the FINE state when the phase error is less than the coarse unit delay, and then the fine incremental delay can reduce the phase error to less than the Vernier resolution.

In order to lower the power consumption of this DLL, only register_clock is used as the clock to the controller module.
Therefore, while the DLL is in lock mode, both increase_delay and decrease_delay are deasserted and register_clock is not enabled. The controller module has 128 flip-flops in each of the coarse and fine shift registers, so turning off the clock to the shift registers when both are disabled lowers the power consumption. A flip-flop consumes power whenever it is clocked, regardless of whether its D input changes. Disabling a clock when it is not required saves power in digital circuits.

The Vernier delay line consists of 128 unit delays. Therefore, there are 128 flip-flops in each of the coarse and fine shift registers. The finite state machine has four independent states. Two flip-flops are required to hold the two bits that encode these four states. In total, there are 258 flip-flops in the controller module, so clock gating (disabling a clock when it is not required) saves power when the DLL is in lock mode.

3.2.3 High-resolution phase detector

The phase detector in the DLL detects the phase error between the output and input reference clocks. The resolution of a Vernier DLL depends not only on the Vernier concept utilized in the delay line, but also on how its phase detector is designed. The minimum phase error that can be detected by the phase detector is defined as the phase detector's resolution. The resolution of a phase detector depends on many factors, including the design methodology and the CMOS technology used in chip fabrication.

A high-resolution phase detector is proposed in [50], where the delay of a buffer determines the resolution. A resolution of 70 ps is achieved when it is implemented in 0.18 µm technology. The phase detector has three outputs: Shift_Left, Shift_Right, and Clk, as shown in Figure 3.14. When the rising edge of the input clock is within one unit delay (the delay of U4) of the rising edge of the output clock, both outputs of the phase detector, Shift_Right and Shift_Left, go low and Clk is turned off.
A divide-by-two is included in the phase detector, so the phase detector waits at least two clock cycles before making another decision and generating a high on either Shift_Right or Shift_Left. This provides enough time for the shift register in the design proposed in [50] to operate and for its output waveform to stabilize. On the other hand, it increases the lock time, because a decision is made only once every two input clock cycles.

Figure 3.14 Phase detector in [50]

A modified version of the high-resolution phase detector of [50] is proposed in this thesis, which can significantly improve resolution. The Vernier methodology is implemented in this design, which effectively reduces the amount of delay between the D inputs of U1 and U2.

As explained previously, the delay from each of the two inputs to the output of the AND gate is not the same. This delay difference is exploited in the Vernier delay line to achieve a very small fine incremental unit delay. The same concept is used in the high-resolution phase detector proposed in this thesis.

The schematic of this phase detector is shown in Figure 3.15. U7 and U8 introduce the same delay because both gates are connected through pin A1 of the AND gate. The U3 gate introduces slightly more delay, because the A2 pin is used as its input. The 0.18 µm technology library used for simulation and synthesis introduces less than 10 ps of delay difference between the two inputs and the output of an AND gate.

Figure 3.15 Proposed high resolution phase detector

The decrease_delay and increase_delay signals are ORed to generate register_clk. The resolution of a phase detector is defined as the minimum detectable phase error between its two inputs. If the phase error is within the resolution of the phase detector, then decrease_delay, increase_delay and register_clk stay low.
The OR gate (U6) also delays register_clk relative to increase_delay and decrease_delay, which guarantees the required setup time for the flip-flops in the controller clocked by register_clk. In a flip-flop the data should not change within the setup and hold time window around the clock edge; otherwise the output is not predictable and can enter a metastable (unstable) condition.

If the output clock leads the input clock by a margin greater than the resolution, a delay difference is created between the paths from the A1 and A2 input pins to the output pin of the AND gate. Then, the Q pins of U1 and U2 go high, resulting in a high on the increase_delay output. On the other hand, if the input clock leads the output clock by a margin greater than the resolution, then the Q pins of U1 and U2 go low (/Q goes high for both U1 and U2), resulting in a high on the decrease_delay output. In either case, register_clk goes high and generates the required clock edge for the logic in the controller module as shown in Figure 3.16.

Figure 3.16 Phase detector waveforms

If neither of the two cases exists, the input and output clocks are within the resolution of the phase detector. In this case, the Q pin of U1 goes high and the Q pin of U2 goes low, resulting in a low on both the increase_delay and decrease_delay outputs. This happens when the output locks to the input and the DLL is in lock mode.

The divide-by-two logic (U3 and U7 in Figure 3.14) is not used in the proposed high-resolution phase detector. The delay of the Vernier delay line increases or decreases by a small differential amount equal to the resolution of the delay line. Therefore, the delay line can stabilize before the next decision is taken on the next edge of the input clock, and there is no need to wait for every other clock.
This reduces the time required for the DLL to achieve lock mode. Lock mode is detected by the lock detector module, which is described in the next section.

3.2.4 Lock detector

The lock detector is a very simple circuit that outputs a high when the DLL is in lock mode, as shown in Figure 3.17. If both increase_delay and decrease_delay are low on the falling edge of the DLL input clock, then the output lock_indicator goes high to indicate that the DLL is now in lock mode. The DLL input clock is used instead of register_clk because, when the DLL enters lock mode, register_clk is off and cannot clock in the low values on decrease_delay and increase_delay.

[Figure 3.17 Lock detector circuit. Inputs: increase_delay, decrease_delay and the DLL input clock; output: lock_indicator.]

Chapter 4 Analysis of proposed DLL

This chapter analyzes the simulation results, describes the testbench, and demonstrates how the DLL achieves lock mode. The coarse and fine phases of the locking process are investigated and illustrated in the captured waveforms.

4.1 Testbench

A simple testbench instantiates the DLL design and the clock and reset generators. It also introduces a glitch on the clock in order to examine how the DLL re-enters lock mode when its input clock phase changes abruptly. The lock_indicator signal is monitored; whenever it goes high, the DLL has entered lock mode. The target resolution is less than 10 ps for the operating frequency range of 100 MHz to 200 MHz. To verify that the DLL can recover from abrupt input phase changes, a glitch is imposed on the input clock source after a set period of time. This drives the DLL out of lock, after which the DLL mechanism guarantees recovery. After some time, the DLL locks to the input signal.
The time it takes for the DLL to lock depends on the input fluctuations, the DLL architecture, the length of the delay line, and the algorithm used in the controller module; the worst-case time is defined as the lock recovery period.

The DLL described in this thesis is in lock mode when the controller's state machine is in the FINE state and both the increase_delay and decrease_delay signals are inactive. Depending on the imposed glitch, lock mode can be regained within the FINE state, provided that the glitch is smaller than the unit delay. Any variation larger than the unit delay forces the state machine into the INCREMENT or DECREMENT state, from which it later re-enters the FINE state and finally enables the DLL to regain lock.

The testbench is configured for six different cases and exhaustively covers all the operational modes of the DLL. The first two cases verify the general locking process after power-up and reset, considering both a leading and a lagging input clock with reference to the output clock. The other four cases verify the lock re-entry process when a glitch is applied to the input clock. The four cases are distinguished by the amount of glitch and by the relative position of the input and output clocks (leading or lagging). The following sections detail all the cases. All the waveforms are included, and a description of the phase detector and controller operation for every case clarifies the DLL's operating mechanism.

4.2 Initial lock

After power-up and reset, either the phase detector's increase_delay or its decrease_delay output goes high, depending on the polarity of the phase error. If the input clock leads the output clock, then decrease_delay is enabled. On the other hand, if the output clock leads the input clock, then increase_delay is enabled.
In the case where the output clock is in phase with the input clock, both increase_delay and decrease_delay (the phase detector outputs to the controller module) are disabled.

If decrease_delay is enabled, then the state machine transits to the DECREMENT state. In this state, at every clock the coarse shift register shifts one unit to the left, which decreases the total delay by one unit. At some point the output clock starts leading the input clock, which means that the coarse action is complete, and the state machine transits to the FINE state. In this state, the fine shift register shifts to the right, and at every cycle the total delay of the delay line increases by an incremental value. As described in the previous chapter, the incremental value is very small: 4 ps for the NAND gate used in this design. Finally, the output clock is within the DLL resolution (4 ps) of the input clock, and increase_delay is disabled. The lock_indicator signal goes high, which indicates that the DLL is locked. The captured waveforms are shown in Figures 4.1 and 4.2. For clarity, only the related signals are captured. The phase error between the output and input clocks is 2 ps after the DLL locks, where the LOCK_INDICATOR signal is high, as shown in Figure 4.2.
[Figure 4.1 Initial lock mode waveform for a leading input clock. Signals shown: RESET, LOCK_INDICATOR, DLL_CLOCK_OUTPUT, DLL_CLOCK_INPUT, REGISTER_CLOCK, DECREASE_DELAY, INCREASE_DELAY, NEXTSTATE and STATE; the state sequence is IDLE, DECREMENT, FINE.]

[Figure 4.2 Initial lock mode waveform for a leading input clock (zoomed in). Same signals, with STATE in FINE.]

On the other hand, if increase_delay is enabled, then the state machine transits to the INCREMENT state. In this state, at every clock edge the coarse shift register shifts one unit to the right, which increases the total delay by one unit delay. At some point, the input clock starts leading the output clock and decrease_delay is asserted, which means that the coarse action is complete. The state machine then moves to the DECREMENT state and, after one clock cycle, enters the FINE state, as shown in Figure 4.3. The reason for this sequence is that the fine delay line output tap is initially set to the first tap, the leftmost tap position of the chain, so the fine delay can only be increased. Therefore, by going to the DECREMENT state, the output clock leads the input clock again, but this time the phase error is less than one unit delay. In the FINE state, the delay increases incrementally until the phase error falls within the phase detector resolution and lock is achieved. The phase error between the output and input clocks is 2 ps after the DLL locks, as shown in Figure 4.4.
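The two initial-lock sequences described in this section can be traced with a small behavioural simulation. It is a sketch under stated assumptions rather than the controller RTL of this thesis: the state names follow the text, but the function names, the signed phase-error variable lag_ps, the 4 ps detector window and the one-action-per-clock update order are illustrative choices. The 78 ps coarse unit delay is the value reported for the simulation in Section 4.3.2.

```python
COARSE_PS = 78  # coarse unit delay (simulation value from Section 4.3.2)
FINE_PS = 4     # fine unit delay
RES_PS = 4      # assumed phase detector resolution


def detect(lag_ps):
    """Return (increase_delay, decrease_delay) for a signed phase error.

    lag_ps > 0: the input clock leads the output clock (too much delay).
    lag_ps < 0: the output clock leads the input clock (too little delay).
    """
    increase_delay = lag_ps < -RES_PS
    decrease_delay = lag_ps > RES_PS
    return increase_delay, decrease_delay


def simulate_initial_lock(lag_ps, max_cycles=1000):
    """Step a simplified controller until lock; return (cycles, lag, trace)."""
    state = "IDLE"
    trace = [state]
    for cycle in range(max_cycles):
        inc, dec = detect(lag_ps)
        if state == "IDLE":
            state = "DECREMENT" if dec else "INCREMENT" if inc else "FINE"
        elif state == "DECREMENT":
            if dec:
                lag_ps -= COARSE_PS   # remove one coarse unit of delay
            else:
                state = "FINE"        # output now leads: coarse action done
        elif state == "INCREMENT":
            if inc:
                lag_ps += COARSE_PS   # add one coarse unit of delay
            else:
                state = "DECREMENT"   # step back so that FINE can add delay
        else:  # FINE: the fine tap starts at the first tap, so only add delay
            if inc:
                lag_ps += FINE_PS
            elif not dec:
                return cycle, lag_ps, trace   # both outputs low: locked
            else:
                raise RuntimeError("fine delay cannot be decreased here")
        if state != trace[-1]:
            trace.append(state)
    raise RuntimeError("no lock within max_cycles")


if __name__ == "__main__":
    print(simulate_initial_lock(500))    # input clock leading
    print(simulate_initial_lock(-500))   # output clock leading
```

With these assumptions, a leading input clock passes through IDLE, DECREMENT and FINE, while a leading output clock passes through IDLE, INCREMENT, DECREMENT and FINE, matching the sequences described for Figures 4.1 and 4.3. The cycle counts produced by the sketch are not meant to reproduce the simulated waveforms.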
[Figure 4.3 Initial lock mode waveform for a leading output clock. Signals shown: RESET, LOCK_INDICATOR, DLL_CLOCK_OUTPUT, DLL_CLOCK_INPUT, REGISTER_CLOCK, DECREASE_DELAY, INCREASE_DELAY, NEXTSTATE and STATE; the state sequence is IDLE, INCREMENT, DECREMENT, FINE.]

[Figure 4.4 Initial lock mode waveform for a leading output clock (zoomed in). Same signals, with STATE in FINE.]

4.3 Lock re-entry

Phase variations on the input clock due to jitter and glitches introduce phase error, which in turn causes the DLL to exit lock mode. This initiates a re-entry process, after which the DLL resumes its lock status. Depending on the amount of phase error, the state machine can stay in the FINE state or move to the INCREMENT or DECREMENT state. The following sections explain these two possible cases in detail.

4.3.1 Lock re-entry (case 1)

If the phase error is within the dynamic range of the fine delay line, then the DLL re-enters lock mode and the state machine stays in the FINE state. The dynamic range of the fine delay line is the range over which its delay can be increased or decreased without reaching the limit in either direction. The total delay of the fine delay line is (N × Tf), where N is the number of fine delay units in the chain and Tf is the delay of each fine unit. In this design N is 128 and the delay of each fine unit is 4 ps, which gives a total fine delay of 512 ps. The 4 ps is the difference between the two input-to-output delays of a 2-input NAND gate in the library.
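The dynamic range arithmetic can be written out as a short calculation. The sketch below uses the values stated above (N = 128, Tf = 4 ps); the function name and the convention that tap index 0 is the first (leftmost) tap are assumptions made for illustration.

```python
def fine_dynamic_range(num_units=128, unit_delay_ps=4, tap_index=64):
    """Return (total_ps, increase_range_ps, decrease_range_ps).

    tap_index is the hypothetical position of the selected output tap,
    counted from 0 at the first tap of the fine delay line.
    """
    total_ps = num_units * unit_delay_ps                        # N * Tf
    increase_range_ps = (num_units - tap_index) * unit_delay_ps  # room to add delay
    decrease_range_ps = tap_index * unit_delay_ps                # room to remove delay
    return total_ps, increase_range_ps, decrease_range_ps


if __name__ == "__main__":
    print(fine_dynamic_range())             # tap in the middle of the chain
    print(fine_dynamic_range(tap_index=0))  # tap at the first position
```

With the tap in the middle of the chain the fine delay can move 256 ps in either direction; with the tap at the first position it can only be increased, by up to 512 ps.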
For example, if in lock mode the fine delay line's output is the middle tap of the chain, then the fine delay can be increased or decreased by half of the total delay of the fine delay line, or 256 ps. Any input phase error smaller than 256 ps is then compensated, and lock mode is resumed while the state machine is still in the FINE state. The simulation result is shown in Figure 4.5. The INCREASE_DELAY signal goes high for one clock, which increases the total delay by one fine unit delay, or 4 ps, and compensates for the added 6 ps input phase error. The phase error is then within the 4 ps resolution of the phase detector and the DLL is locked.

[Figure 4.5 Lock re-entry mode waveform for small phase error. Signals shown: RESET, LOCK_INDICATOR, DLL_CLOCK_OUTPUT, DLL_CLOCK_INPUT, REGISTER_CLOCK, DECREASE_DELAY, INCREASE_DELAY and STATE; STATE remains FINE.]

4.3.2 Lock re-entry (case 2)

The phase error cannot be corrected by fine action if the amount of error is larger than the dynamic range of the fine delay line. For example, if in lock mode the fine delay's output tap is in the center of the fine delay line, then any phase error greater than half of the fine delay line, or 256 ps, cannot be corrected while the state machine is in the FINE state.

A phase error is generated if the input clock leads the output clock. Then decrease_delay is enabled, and the fine delay line output tap shifts to the left until it reaches the first tap of the fine delay line. At this point the state machine moves to the DECREMENT state and coarse action is enabled.
At every clock, the total delay of the DLL is decremented by one unit delay, or 78 ps in the simulation. At a certain point, decrease_delay is deasserted and increase_delay is enabled. Then the state machine moves to the FINE state and finally achieves lock mode. Figure 4.6 shows that the DLL originally locks at time 220 ns. A 500 ps glitch is applied at time 240 ns and the DLL locks again at time 550 ns. The final phase when the DLL locks again is shown in Figure 4.7. The 500 ps glitch is large enough to make the DLL exit lock mode and to prevent it from relocking within the dynamic range of the fine delay line, as described for lock re-entry (case 1). The introduced glitch is shown in Figure 4.8.

Figure 4.6 Lock re-entry mode waveform for a leading input clock

[Figure 4.7 Lock re-entry mode waveform for a leading input clock (zoomed in). Signals shown: RESET, LOCK_INDICATOR, DLL_CLOCK_OUTPUT, DLL_CLOCK_INPUT, REGISTER_CLOCK, DECREASE_DELAY, INCREASE_DELAY and STATE; STATE is FINE.]
[Figure 4.8 Introduced glitch waveform for a leading input clock. Signals shown: RESET, LOCK_INDICATOR, DLL_CLOCK_OUTPUT, DLL_CLOCK_INPUT, REGISTER_CLOCK, DECREASE_DELAY, INCREASE_DELAY and NEXTSTATE.]

A phase error is generated if the output clock leads the input clock. Then increase_delay is enabled, and the fine delay line output tap shifts to the right until it reaches the last tap of the fine delay line. At this point, the state machine moves to the INCREMENT state and coarse action is enabled. At every clock, the total delay of the DLL is incremented by one unit delay, or 78 ps in the simulation. At a certain point, increase_delay is deasserted and decrease_delay is enabled. Then the state machine moves to the FINE state and finally achieves lock mode. Figure 4.9 shows that the DLL originally locks at time 250 ns. A 2 ns glitch is applied at time 300 ns and the DLL locks again at time 1315 ns. The final phase when the DLL locks again is shown in Figure 4.10. The 2 ns input phase shift is introduced as a glitch, which causes the DLL to exit lock mode and the LOCK_INDICATOR signal to go low, as shown in Figure 4.11.

[Figure 4.9 Lock re-entry mode waveform for a leading output clock. Signals shown: RESET, LOCK_INDICATOR, DLL_CLOCK_OUTPUT, DLL_CLOCK_INPUT, REGISTER_CLOCK, DECREASE_DELAY, INCREASE_DELAY, NEXTSTATE and STATE.]
[Figure 4.10 Lock re-entry mode waveform for a leading output clock (zoomed in). Signals shown: RESET, LOCK_INDICATOR, DLL_CLOCK_OUTPUT, DLL_CLOCK_INPUT, REGISTER_CLOCK, DECREASE_DELAY, INCREASE_DELAY, NEXTSTATE and STATE.]

[Figure 4.11 Introduced glitch waveform for a leading output clock. Signals shown: RESET, LOCK_INDICATOR, DLL_CLOCK_OUTPUT, DLL_CLOCK_INPUT, REGISTER_CLOCK, DECREASE_DELAY, INCREASE_DELAY and NEXTSTATE.]

4.4 Gate count of the vernier unit delay

The proposed vernier unit delay line was mapped to a commercial 0.18 µm library. The total cell area is about 96 basic cells. The previously published unit delay [73] was also mapped to the same library, and its total cell area is about 122 basic cells. Therefore, the proposed unit delay saves about 20% of the gate count when implemented in the same library. The gate count reduction is significant considering that hundreds of unit delay blocks are needed in a typical delay line.

The static power consumption of a circuit is due to leakage current and is proportional to the gate count. Therefore, the static power consumption of the delay line is reduced by about 20%. The dynamic power consumption of the circuit depends not only on the gate size but also on the rate at which each gate toggles. The toggle rate is a function of the logic and the operating frequency. The dynamic power consumption can be measured using the dynamic test vectors generated during functional simulation. Fabs provide practical formulas to estimate the dynamic power consumption.
The general guideline is that dynamic power consumption increases proportionally with the gate count. Based on this rule of thumb, the proposed delay line realizes a dynamic power saving of about 20%.

4.5 Resolution of the proposed DLL

The proposed vernier unit delay is based on the difference between the two input-to-output delays of a dual-input NAND gate. For a NAND gate in the 0.18 µm commercial library, this difference is measured at less than 10 ps in the functional simulation (4 ps). The previously published unit delay [73] is based on the delay difference of a NAND gate with different fanout loads. The achieved resolution was one fifth of the delay of each unit block, i.e., about 20 ps if implemented in the same 0.18 µm library, considering that the delay of each unit delay block is 100 ps. Therefore, the proposed design offers a 100% improvement in the resolution of the delay line. The higher resolution reduces the phase error between the output and input clocks of a DLL. At the same time, the cycle-to-cycle jitter is also reduced, because the output clock can be delayed by a smaller unit between two consecutive clock edges.

4.6 Limitations of the proposed DLL

The main limitation of the proposed DLL is that, depending on the phase error between the input and output clocks, it can take up to 128 clock cycles for the DLL to lock, which is considered relatively slow. For example, if the first output tap of the fine delay line is selected while the DLL is locked, then a 512 ps glitch at the input that causes the output clock to lead the input clock requires 128 input clock cycles before the DLL can lock again. The resolution of the fine delay line is 4 ps, so at every clock cycle the delay of the whole delay line is increased by 4 ps; therefore 128 input clock cycles are required to lock. This example is the worst case, and normally the DLL locks in a shorter time.
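The worst-case figure follows directly from the linear fine search, and a short calculation makes the dependence on the clock frequency explicit. The sketch below is illustrative only: the function name is hypothetical, and converting cycles to time at 100 MHz and 200 MHz (the testbench operating range) is an added step, not a result reported in this thesis.

```python
import math


def worst_case_fine_lock(glitch_ps=512, fine_step_ps=4, clock_mhz=100):
    """Return (cycles, time_ns) for a purely linear fine search.

    One fine step of fine_step_ps is assumed to be applied per input
    clock cycle, as in the worst-case example in the text.
    """
    cycles = math.ceil(glitch_ps / fine_step_ps)  # steps needed to absorb the glitch
    period_ns = 1000 / clock_mhz                  # input clock period in ns
    return cycles, cycles * period_ns


if __name__ == "__main__":
    for freq in (100, 200):
        print(freq, "MHz:", worst_case_fine_lock(clock_mhz=freq))
```

Under these assumptions the 128 cycles correspond to roughly 1.28 µs at 100 MHz and 0.64 µs at 200 MHz, which is why a search that takes fewer steps than a linear one would shorten the lock time.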
The thesis mainly concentrates on improving the resolution of a DLL. Further research could improve the lock time, for example by devising efficient algorithms that shorten the lock time [33].

Chapter 5 Conclusion

Phase-locked loops (PLLs) and delay-locked loops (DLLs) have been widely adopted to solve the clock skew problem. In recent years, DLLs have been widely used for clock alignment due to their lower phase-error accumulation and faster locking time [35], [82]. DLLs are also used in many other applications, such as clock synthesis [2], [3], [6], clock recovery [14], [19], [25], SDRAM controllers [26], [47], [65], automatic test equipment (ATE) [99] and time-to-digital converters (TDCs) [4], [22], [23].

The first DLLs were analog and mainly used for clock distribution applications [10], [13]. A conventional analog DLL consists of four main blocks: a voltage-controlled delay line (VCDL), a charge pump, a low-pass filter, and a phase detector. The simple design of the DLL offers many advantages when compared to VCO-based PLLs. However, it is still a relatively complex analog circuit that requires process-specific implementation, which makes it very difficult to reuse the same design in a different technology. In essence, an analog DLL is a non-portable architecture, because major changes in the layout are required to port a design from one technology to another.

Digital DLLs are characterized by their use of digital delay lines. They are typically made from simple digital circuit elements. This simplicity helps to design a portable digital DLL that can easily be adapted to different technologies. Although the digital DLL uses more area and power than the analog DLL, its greater simplicity and lower minimum required power supply voltage make it very attractive for many applications.
The Register Delay Locked Loop (RDLL) belongs to the digital DLL family and is widely used in high-speed synchronous DRAM (SDRAM) applications [17], [51], [85], [90]. The RDLL consists of a tapped delay line, a shift register, a phase detector, and a replica input buffer dummy [85].

The Synchronous Mirror Delay (SMD) and Clock Synchronized Delay (CSD) circuits are non-feedback systems that can achieve lock in only two clock cycles [52], [88], [94]. Therefore, these circuits can be disabled in standby mode and can lock to the reference clock in just two clock cycles when operation resumes.

The latest DLLs use the Vernier principle, based on the Vernier caliper tool [83]. The Vernier technique implemented in the proposed design is based on a characteristic of the NAND gate: it uses the difference between the two input-to-output delays of a dual-input NAND gate to achieve a fine step resolution. The previous technique [73] was based on the delay difference of a NAND gate with different fanout loads. The analysis in the previous chapter shows that the resolution of the DLL is doubled by the new technique implemented in the proposed design.

This thesis introduced a novel architecture for a high-resolution Vernier DLL with a resolution of less than 10 ps. It combines the coarse and fine unit delay blocks into one unit delay block in a way that effectively reduces the area of the delay line. This reduction is significant when taking into account the number of unit delay blocks required in a typical delay line. The combination of a smaller delay line and the integration of the fine and coarse controllers reduces the DLL power consumption. The analysis in the previous chapter shows that a 20% gate count reduction in the delay line is achieved by using the proposed unit delay block. It also shows that the total power consumed by the delay line is reduced by approximately 20%.
A testbench was written that exhaustively covers all the operational modes of the DLL. The first two cases verify the general locking process after power-up and reset, considering both a leading and a lagging input clock with reference to the output clock. The other four cases verify the lock re-entry process when a glitch is applied to the input clock.

A linear control algorithm is used in this thesis to achieve lock mode. The controller linearly increases or decreases the total delay of the delay line. For faster lock time, a binary search algorithm is incorporated into the SARDLL [33]. This algorithm reduces the searching effort and speeds up the locking process. Various lock mechanisms can be explored in order to shorten the lock time of the DLL. This can be considered one of the future research topics.

Bibliography

[1] T.Hamamoto, K.Furutani, T.Kubo, S.Kawasaki, H.Iga, T.Kono, Y.Konishi, T.Yoshihara, "A 667-Mb/s Operating Digital DLL Architecture for 512-Mb DDR SDRAM," IEEE J. Solid-State Circuits, vol. 39, NO.1, pp. 194-206, Jan 2004.
[2] C.C.Chung, C.Y.Lee, "A New DLL-Based Approach for All-Digital Multiphase Clock Generation," IEEE J. Solid-State Circuits, vol. 39, NO.3, pp. 469-471, Mar 2004.
[3] R.F.Rad, A.Nguyen, J.M.Tran, T.Greer, J.Poulton, W.J.Dally, J.H.Edmondson, R.Senthinathan, R.Rathi, M.E.Lee, H.T.Ng, "A 33-mW 8-Gb/s CMOS Clock Multiplier and CDR for Highly Integrated I/Os," IEEE J. Solid-State Circuits, vol. 39, NO.9, pp. 1553-1561, Sept 2004.
[4] C.S.Hwang, P.Chen, H.W.Tsao, "A High-Precision Time-to-Digital Converter Using a Two-Level Conversion Scheme," IEEE Transactions on Nuclear Science, vol. 51, NO.4, pp. 1349-1352, Aug 2004.
[5] A.H.Chan, G.W.Roberts, "A Jitter characterization system using a component-invariant Vernier delay line," Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 12, NO.1, pp. 79-95, Jan 2004.
[6] C.S.Hwang, P.Chen, H.W.Tsao, " A wide-range and fast-locking clock synthesizer IP based on delay-locked-loop," ISCAS 2004, Proceedings of the 2004 Interna-tional Symposium on, Vol.1 May 2004, pp.352-361. [7] K.Kim, N.Park, T.Kim, "An unlimited lock range D L L for clock generator," ISCAS 2004, Proceedings of the 2004 International Symposium on, Vol.1 May, 2004,pp.352-361. [8] K.Cheng, Y L o , WFang, S.Hung, " A mixed-mode delay-locked loop for wide-range operation and multiphase clock generation,"System-on-chip for Real-Time Applications, 2003 Proceedings, Jul 2003, pp.90-93. [9] A.Suzuki, S.Kawahito, D.Miyazaki, M.Furuta, " A digitally skew correctable multi-phase clock generator using a master-slave D L L , " ISCAS 03, Proceedings of the 2003 International Symposium on, Vol.1 May 2003, pp. 105-108. [10] K.Taesung, K.Beomsup, "Phase interpolator using delay locked loop," Mixed-Sig-nal Design, 2003, Southwest Symposium on, Feb 2003, pp.76-80. 68 ZJingcheng, D.Qingjin, T.Kawasniewski "A-107dBe, lOKHz Carrier offset 2-GHz DLL-based frequency synthesizer," Custom Integrated Circuits Conference, 2003, Proceedings of the IEEE 2003, Sept 2003, pp.301-304. GManganaro, S.Kwak, S.Bugeja " A dual 10b 200MSPS pipeline D/A converter with DLL-based clock synthesizer," Custom Integrated Circuits Conference, 2003, Proceedings of the IEEE 2003, Sept 2003, pp.301-304. H.Chang, C.Sun, S.Liu, " A low-jitter and precise multiphase delay-locked loop using shifted averaging V C D L , " in ISSCC 2003 Dig. Tech. Papers, Vol.1, 2003, pp. 434-505. W.Pvhee, H.Ainspan, S.Rylov, A.Rylyakov, M.Beakes, D.Friedman, S.Gowada, M.Soyuer, " A 10-Gb/s CMOS clock and data recovery circuit using a secondary delay-locked-loop," Custom Integrated Circuits Conference, 2003, Proceedings of the IEEE 2003, Sept 2003, pp.81-84. GWei, J.Stonick, D.Weinlader, J.Sonntag, S.Searles " A 500MHz M P / D L L Clock Generator for a 5Gb/s Backplane Transceiver in 0.25 CMOS," in ISSCC 2003 Dig. Tech. 
Papers, Vol.1, 2003, pp. 464-465. S.J.Kim, S.H.Hong, J.K.Wee, J.H.Ahn, J.Y.Chung, " A low Jitter, fast recoverable, fully analog D L L using tracking A D C for high speed and low stand-by power DDR I/O interface," VLSI Circuits, 2003, Digest of Technical Papers, 2003 Sym-posium on, June 2003, pp. 285-286. J.T.Kwak, C.K.Kwon, K.W.Kim, S.H.Lee, J.S.Kih, " A low cost high performance register-controlled digital D L L for 1 Gbps/spl times/32 DDR S D R A M , " VLSI Cir-cuits, 2003, Digest of Technical Papers, 2003 Symposium on, June 2003, pp. 283-284. K.H.Cheng, Y.L.Lo, W.F.Yu, S.Y.Hung, " A mixed-mode delay-locked loop for wide-range operation and multiphase clock generation," System-on-Chip for Real-Time Applications, 2003, Proceedings, The 3rd IEEE International Workshop on, Jul 2003, pp. 90-93. Z.Mao, T.H.Szymansli, " A 4Gb/s CMOS fully-differential analog dual delay-locked loop clock/data recovery circuit," Electronics, Circuit and Systems, 2003, ICECS 2003, Proceedings of the 2003 10th IEEE International Conference on, Vol.2, Dec 2003, pp. 559-562. 69 M.E.Lee, W.J.Dally, T.Greer, H.T.Ng, R.F.Rad, J.Poulton, R.Senthinathan, "Jitter Transfer Characteristics of Delay-Locked Loops, Theories and Design Tech-niques," IEEE J. Solid-State Circuits, vol. 38, NO.4, pp. 614-621, Apr 2003. T.Matano, Y.Takai, T.Takahashi, Y.Sakito, I.Fujii, Y.Takaishi, H.Fujisawa, S.Kubouchi, S.Narui, K.Arai, M.Morino, M.Nakamura, S.Miyatake, T.Sekiguchi, K.Koyama, " A 1-Gb/s/pin 512-Mb DDRII S D R A M Using a Digital D L L and a Slew-Rate-Controlled Output Buffer," IEEEJ. Solid-State Circuits, vol. 38, NO.5, pp. 762-768, May 2003. S.Tabatabaei, A.Ivanov, "Embedded Timing Analysis: A SOC Infrastructure," IEEE Design & Test Of Computers, vol. 19, NO.3, pp. 24-36, June 2002. S.Tabatabaei, A.Ivanov, " A n embedded core for Sub-Picosecond timing measure-ments,"^? Conference, 2002, Proceedings of ITC International, pp. 129-137, Oct 2002. 
Appendix A Design VHDL code

library ieee;
use ieee.std_logic_1164.all;
-- library vst_n18_sc_tsm_c4_wc;
-- use vst_n18_sc_tsm_c4_wc.components.all;
-- library tpz973gtc;
-- use tpz973gtc.components.all;

entity vernier_unit_delay is
  port (
    coarse_control   : in  std_logic := '0';  -- control line for the coarse chain
    fine_control     : in  std_logic := '0';  -- control line for the fine chain
    fine_control_inv : in  std_logic := '0';  -- inverted version of fine_control
    coarse_input     : in  std_logic := '0';  -- input to coarse chain
    fine_input       : in  std_logic := '0';  -- input to fine chain
    vernier_input    : in  std_logic := '0';  -- input from previous stage
    coarse_output    : out std_logic := '0';  -- output of coarse chain
    fine_output      : out std_logic := '0';  -- output of fine chain
    vernier_output   : out std_logic := '0';  -- output to next stage
    clk_output       : out std_logic := '0'); -- output of vernier unit
end vernier_unit_delay;

architecture structural of vernier_unit_delay is
  signal A, B, coarse_output_int, fine_output_int : std_logic := '0';
  signal logic_one : std_logic := '1';

  component NAN2D0
    port (
      Z  : out STD_LOGIC;
      A1 : in  STD_LOGIC;
      A2 : in  STD_LOGIC);
  end component;

  component BUFTD1
    port (
      Z   : out STD_LOGIC;
      A   : in  STD_LOGIC;
      ENB : in  STD_LOGIC);
  end component;

  component NAN2M1D1
    port (
      Z  : out STD_LOGIC;
      A1 : in  STD_LOGIC;
      A2 : in  STD_LOGIC);
  end component;

begin -- structural
  U1: NAN2D0   port map (A, logic_one, fine_input);
  U2: NAN2D0   port map (fine_output_int, A, logic_one);
  U3: NAN2D0   port map (B, coarse_input, vernier_input);
  U4: NAN2D0   port map (coarse_output_int, B, fine_control_inv);
  U5: NAN2M1D1 port map (vernier_output, fine_output_int, fine_control);
  U6: BUFTD1   port map (clk_output, coarse_output_int, coarse_control);

  fine_output   <= fine_output_int;
  coarse_output <= coarse_output_int;
  logic_one     <= '1';
end structural;

library ieee;
use ieee.std_logic_1164.all;

entity vernier_delay_line is
  generic (
    N : integer := 128); -- number of delay elements
  port (
    delay_line_output : out std_logic := '0';
    delay_line_input  : in  std_logic := '0';
    fine_control      : in  std_logic_vector(N-1 downto 0) := (others => '0');
    fine_control_inv  : in  std_logic_vector(N-1 downto 0) := (others => '0');
    coarse_control    : in  std_logic_vector(N-1 downto 0) := (others => '0'));
end vernier_delay_line;

architecture structural of vernier_delay_line is
  signal fine, coarse, vernier : std_logic_vector(N-1 downto 1) := (others => '0');
  signal logic_one : std_logic := '1';

  component vernier_unit_delay
    port (
      coarse_control   : in  std_logic;  -- control line for the coarse chain
      fine_control     : in  std_logic;  -- control line for the fine chain
      fine_control_inv : in  std_logic;  -- inverted version of fine_control
      coarse_input     : in  std_logic;  -- input to coarse chain
      fine_input       : in  std_logic;  -- input to fine chain
      vernier_input    : in  std_logic;  -- input from previous stage
      coarse_output    : out std_logic;  -- output of coarse chain
      fine_output      : out std_logic;  -- output of fine chain
      vernier_output   : out std_logic;  -- output to next stage
      clk_output       : out std_logic); -- output of vernier unit
  end component;

begin -- structural
  chain: for i in 0 to N-1 generate

    last_unit: if (i = 0) generate
      D1 : vernier_unit_delay
        port map (coarse_control(0), fine_control(0), fine_control_inv(0),
                  coarse(1), fine(1), vernier(1),
                  open, open, open, delay_line_output);
    end generate last_unit;

    middle_units: if (i > 0 and i < N-1) generate
      D1 : vernier_unit_delay
        port map (coarse_control(i), fine_control(i), fine_control_inv(i),
                  coarse(i+1), fine(i+1), vernier(i+1),
                  coarse(i), fine(i), vernier(i), delay_line_output);
    end generate middle_units;

    first_unit: if (i = N-1) generate
      D1 : vernier_unit_delay
        port map (coarse_control(N-1), fine_control(N-1), fine_control_inv(N-1),
                  delay_line_input, delay_line_input, logic_one,
                  coarse(N-1), fine(N-1), vernier(N-1), delay_line_output);
    end generate first_unit;

  end generate chain;

  logic_one <= '1';
  delay_line_output <= 'H'; -- Should be commented for synthesis
end structural;

library ieee;
use ieee.std_logic_1164.all;

entity vernier_controller is
  generic (
    N : integer := 128); -- number of delay elements
  port (
    reset            : in  std_logic := '0';
    register_clock   : in  std_logic := '0';
    increase_delay   : in  std_logic := '0';
    decrease_delay   : in  std_logic := '0';
    coarse_control   : out std_logic_vector(N-1 downto 0) := (others => '0');
    fine_control     : out std_logic_vector(N-1 downto 0) := (others => '0');
    fine_control_inv : out std_logic_vector(N-1 downto 0) := (others => '0'));
end vernier_controller;

architecture behavior of vernier_controller is
  signal coarse_load_data   : std_logic_vector(N-1 downto 0) := (others => '0');
  signal fine_load_data     : std_logic_vector(N-1 downto 0) := (others => '0');
  signal fine_control_int   : std_logic_vector(N-1 downto 0) := (others => '0');
  signal coarse_control_int : std_logic_vector(N-1 downto 0) := (others => '0');
  signal coarse_enable      : std_logic := '0';
  signal fine_enable        : std_logic := '0';
  signal logic_zero         : std_logic := '0';
  signal logic_one          : std_logic := '1';

  type state_type is (IDLE, INCREMENT, DECREMENT, FINE);
  signal nextstate : state_type;
  signal state     : state_type;

begin

  -- Next-state logic. The sensitivity list names state, which the case
  -- statement reads, so the process is re-evaluated on every state change.
  process (state, increase_delay, decrease_delay, fine_control_int)
  begin
    case state is
      when IDLE =>
        if increase_delay = '1' then
          nextstate <= INCREMENT;
        elsif decrease_delay = '1' then
          nextstate <= DECREMENT;
        else
          nextstate <= IDLE;
        end if;
      when INCREMENT =>
        if decrease_delay = '1' then
          nextstate <= DECREMENT;
        else
          nextstate <= INCREMENT;
        end if;
      when DECREMENT =>
        if increase_delay = '1' then
          nextstate <= FINE;
        else
          nextstate <= DECREMENT;
        end if;
      when FINE =>
        if increase_delay = '1' and fine_control_int(0) = '1' then
          nextstate <= INCREMENT;
        elsif decrease_delay = '1' and fine_control_int(N-2) = '1' then
          nextstate <= DECREMENT;
        else
          nextstate <= FINE;
        end if;
    end case;
  end process;

  -- State register (active-low asynchronous reset)
  process (reset, register_clock)
  begin
    if reset = '0' then
      state <= IDLE;
    elsif (register_clock'event and register_clock = '1') then
      state <= nextstate;
    end if;
  end process;

  -- Coarse control shift register
  process (reset, register_clock)
  begin
    if reset = '0' then
      coarse_control_int <= coarse_load_data;
    elsif (register_clock'event and register_clock = '1') then
      if increase_delay = '1' and coarse_enable = '1' then
        rightshift: for i in 0 to N-2 loop
          coarse_control_int(i) <= coarse_control_int(i+1);
        end loop;
        coarse_control_int(N-1) <= logic_one;
      elsif decrease_delay = '1' and coarse_enable = '1' then
        leftshift: for i in N-1 downto 2 loop
          coarse_control_int(i) <= coarse_control_int(i-1);
        end loop;
        coarse_control_int(0) <= logic_one;
      end if;
    end if;
  end process;

  -- Fine control shift register
  process (reset, register_clock)
  begin
    if reset = '0' then
      fine_control_int <= fine_load_data;
    elsif (register_clock'event and register_clock = '1') then
      if increase_delay = '1' and fine_enable = '1' then
        rightshift: for i in 0 to N-2 loop
          fine_control_int(i) <= fine_control_int(i+1);
        end loop;
        fine_control_int(N-1) <= logic_zero;
      elsif decrease_delay = '1' and fine_enable = '1' then
        leftshift: for i in N-1 downto 2 loop
          fine_control_int(i) <= fine_control_int(i-1);
        end loop;
        fine_control_int(0) <= logic_zero;
      end if;
    end if;
  end process;

  logic_zero <= '0';
  logic_one  <= '1';
  coarse_load_data <= x"FFFFFFFFFFFFFFFF7FFFFFFFFFFFFFFF";
  fine_load_data   <= x"80000000000000000000000000000000";
  coarse_enable <= '1' when ((state = INCREMENT or state = DECREMENT)
                             and (nextstate /= FINE)) else '0';
  fine_enable   <= '1' when (state = FINE) else '0';
  fine_control_inv <= not fine_control_int;
  fine_control     <= fine_control_int;
  coarse_control   <= coarse_control_int;
end behavior;

library ieee;
use ieee.std_logic_1164.all;

entity high_resoloution_phase_detector is
  port (
    dll_clock_output : in  std_logic := '0';  -- DLL's output clock
    dll_clock_input  : in  std_logic := '0';  -- Input clock to DLL
    reset            : in  std_logic := '0';  -- reset input
    register_clock   : out std_logic := '0';  -- Clock for shift register
    decrease_delay   : out std_logic := '0';  -- shift-left output
    increase_delay   : out std_logic := '0'); -- shift-right output
end high_resoloution_phase_detector;

architecture structural of high_resoloution_phase_detector is
  signal A, B, C, D, E, decrease_delay_int, increase_delay_int : std_logic := '0';
  signal F, G, H, I, dll_clock_input_int, reg_clk_1, reg_clk_2, reg_clk_3 : std_logic := '0';
  signal logic_one : std_logic := '1';

  component BUFBD4
    port (
      Z : out STD_LOGIC;
      A : in  STD_LOGIC);
  end component;

  component BUFBD16
    port (
      Z : out STD_LOGIC;
      A : in  STD_LOGIC);
  end component;

  component BUFBD32
    port (
      Z : out STD_LOGIC;
      A : in  STD_LOGIC);
  end component;

  component DFFRPB1
    port (
      Q  : out STD_LOGIC;
      QB : out STD_LOGIC;
      CK : in  STD_LOGIC;
      D  : in  STD_LOGIC;
      RB : in  STD_LOGIC);
  end component;

  component AND3D1
    port (
      Z  : out STD_LOGIC;
      A1 : in  STD_LOGIC;
      A2 : in  STD_LOGIC;
      A3 : in  STD_LOGIC);
  end component;

  component AND2D1
    port (
      Z  : out STD_LOGIC;
      A1 : in  STD_LOGIC;
      A2 : in  STD_LOGIC);
  end component;

  component OR2D1
    port (
      Z  : out STD_LOGIC;
      A1 : in  STD_LOGIC;
      A2 : in  STD_LOGIC);
  end component;

begin -- structural
  U1:  DFFRPB1 port map (E, F, dll_clock_input_int, B, reset);
  U2:  DFFRPB1 port map (G, H, dll_clock_input_int, A, reset);
  U3:  AND2D1  port map (A, logic_one, dll_clock_output);
  U4:  AND3D1  port map (increase_delay_int, E, G, dll_clock_input_int);
  U5:  AND3D1  port map (decrease_delay_int, F, H, dll_clock_input_int);
  U6:  OR2D1   port map (reg_clk_1, increase_delay_int, decrease_delay_int);
  U7:  AND2D1  port map (B, dll_clock_output, logic_one);
  U8:  AND2D1  port map (dll_clock_input_int, dll_clock_input, logic_one);
  U9:  BUFBD4  port map (reg_clk_2, reg_clk_1);
  U10: BUFBD16 port map (reg_clk_3, reg_clk_2);
  U11: BUFBD32 port map (register_clock, reg_clk_3);

  logic_one      <= '1';
  decrease_delay <= decrease_delay_int;
  increase_delay <= increase_delay_int;
end structural;

library ieee;
use ieee.std_logic_1164.all;

entity lock_detector is
  port (
    lock_indicator  : out std_logic := '0';  -- high when DLL is locked
    dll_clock_input : in  std_logic := '0';  -- Input clock to DLL
    reset           : in  std_logic := '0';  -- reset input
    decrease_delay  : in  std_logic := '0';  -- shift-left output
    increase_delay  : in  std_logic := '0'); -- shift-right output
end lock_detector;

architecture behavior of lock_detector is
begin
  process (reset, dll_clock_input)
  begin
    if reset = '0' then
      lock_indicator <= '0';
    elsif dll_clock_input'event and dll_clock_input = '0' then
      lock_indicator <= not (increase_delay or decrease_delay);
    end if;
  end process;
end behavior;

library ieee;
use ieee.std_logic_1164.all;

entity vernier_dll is
  generic (
    N : integer := 128);
  port (
    dll_clock_input  : in  std_logic := '0';
    dll_clock_output : out std_logic := '0';
    lock_indicator   : out std_logic := '0';
    reset            : in  std_logic := '0');
end vernier_dll;

architecture structural of vernier_dll is
  signal register_clock       : std_logic := '0';
  signal fine_control         : std_logic_vector(N-1 downto 0) := (others => '0');
  signal fine_control_inv     : std_logic_vector(N-1 downto 0) := (others => '0');
  signal coarse_control       : std_logic_vector(N-1 downto 0) := (others => '0');
  signal decrease_delay       : std_logic := '0';
  signal increase_delay       : std_logic := '0';
  signal dll_clock_output_int : std_logic := '0';
  signal dll_clock_input_int  : std_logic := '0';
  signal delay_line_output    : std_logic := '0';
  signal logic_zero           : std_logic := '0';
  signal reset_int            : std_logic := '0';
  signal lock_indicator_int   : std_logic := '0';
  signal dll_clock_output_pad : std_logic := '0';

  component PDCH3DGZ
    port (
      CLK : in  std_logic;
      CP  : out std_logic);
  end component;

  component PDD24DGZ
    port (
      I   : in    std_logic;
      OEN : in    std_logic;
      PAD : inout std_logic;
      C   : out   std_logic);
  end component;

  component PDIDGZ
    port (
      PAD : in  std_logic;
      C   : out std_logic);
  end component;

  component PDO02CDG
    port (
      I   : in  std_logic;
      PAD : out std_logic);
  end component;

  component vernier_delay_line
    generic (
      N : integer); -- number of delay elements
    port (
      delay_line_output : out std_logic;
      delay_line_input  : in  std_logic;
      fine_control      : in  std_logic_vector(N-1 downto 0);
      fine_control_inv  : in  std_logic_vector(N-1 downto 0);
      coarse_control    : in  std_logic_vector(N-1 downto 0));
  end component;

  component high_resoloution_phase_detector
    port (
      dll_clock_output : in  std_logic;  -- DLL's output clock
      dll_clock_input  : in  std_logic;  -- Input clock to DLL
      reset            : in  std_logic;  -- reset input
      register_clock   : out std_logic;  -- Clock for shift register
      decrease_delay   : out std_logic;  -- shift-left output
      increase_delay   : out std_logic); -- shift-right output
  end component;

  component vernier_controller is
    generic (
      N : integer); -- number of delay elements
    port (
      reset            : in  std_logic;
      register_clock   : in  std_logic;
      increase_delay   : in  std_logic;
      decrease_delay   : in  std_logic;
      coarse_control   : out std_logic_vector(N-1 downto 0);
      fine_control     : out std_logic_vector(N-1 downto 0);
      fine_control_inv : out std_logic_vector(N-1 downto 0));
  end component;

  component lock_detector is
    port (
      lock_indicator  : out std_logic;
      dll_clock_input : in  std_logic;
      reset           : in  std_logic;
      decrease_delay  : in  std_logic;
      increase_delay  : in  std_logic);
  end component;

begin -- structural
  u1 : vernier_delay_line
    generic map (N => 128)
    port map (delay_line_output, dll_clock_input_int,
              fine_control, fine_control_inv, coarse_control);
  u2 : high_resoloution_phase_detector
    port map (dll_clock_output_int, dll_clock_input_int, reset_int,
              register_clock, decrease_delay, increase_delay);
  u3 : vernier_controller
    generic map (N => 128)
    port map (reset_int, register_clock, increase_delay, decrease_delay,
              coarse_control, fine_control, fine_control_inv);
  u4 : lock_detector
    port map (lock_indicator_int, dll_clock_input_int, reset_int,
              decrease_delay, increase_delay);
  u5 : PDD24DGZ port map (delay_line_output, logic_zero,
                          dll_clock_output_pad, dll_clock_output_int);
  u6 : PDO02CDG port map (lock_indicator_int, lock_indicator);
  u7 : PDIDGZ   port map (reset, reset_int);
  u8 : PDCH3DGZ port map (dll_clock_input, dll_clock_input_int);

  logic_zero       <= '0';
  dll_clock_output <= dll_clock_output_pad;
end structural;

library ieee;
use ieee.std_logic_1164.all;
library vst_n18_sc_tsm_c4_typ;
use vst_n18_sc_tsm_c4_typ.components.all;
library tpz973gtc;
use tpz973gtc.components.all;

entity vernier_testbench is
  generic (
    N : integer := 128);
end vernier_testbench;

architecture behavior of vernier_testbench is
  signal jitterl          : std_logic := '1';
  signal jitterh          : std_logic := '0';
  signal clock1_input     : std_logic := '0';
  signal clock2_input     : std_logic := '0';
  signal clock_enable     : std_logic := '1';
  signal dll_clock_input  : std_logic := '0';
  signal dll_clock_output : std_logic := '0';
  signal lock_indicator   : std_logic := '0';
  signal reset            : std_logic := '0';

  component vernier_dll
    generic (
      N : integer);
    port (
      dll_clock_input  : in  std_logic;
      dll_clock_output : out std_logic;
      lock_indicator   : out std_logic;
      reset            : in  std_logic);
  end component;

begin
  u1 : vernier_dll
    generic map (N => 128)
    port map (dll_clock_input, dll_clock_output, lock_indicator, reset);

  -- First input clock, 8200 ps period
  process
  begin
    clock1_input <= '0'; wait for 4100 ps;
    clock1_input <= '1'; wait for 4100 ps;
  end process;

  -- Second input clock, 7800 ps period
  process
  begin
    clock2_input <= '1'; wait for 3900 ps;
    clock2_input <= '0'; wait for 3900 ps;
  end process;

  -- Selects clock1_input while high, clock2_input while low
  process
  begin
    clock_enable <= '1'; wait for 291200 ps;
    clock_enable <= '0'; wait for 2910000 ps;
  end process;

  -- Low-going 200 ps jitter pulses, ANDed with clock1_input
  process
  begin
    jitterl <= '1'; wait for 291100 ps;
    jitterl <= '0'; wait for 200 ps; jitterl <= '1'; wait for 8000 ps;
    jitterl <= '0'; wait for 200 ps; jitterl <= '1'; wait for 8000 ps;
    jitterl <= '0'; wait for 200 ps; jitterl <= '1'; wait for 8000 ps;
    jitterl <= '0'; wait for 200 ps; jitterl <= '1'; wait for 8000 ps;
    jitterl <= '0'; wait for 200 ps; jitterl <= '1'; wait for 8000 ps;
    jitterl <= '0'; wait for 200 ps; jitterl <= '1'; wait for 8000 ps;
    jitterl <= '0'; wait for 200 ps; jitterl <= '1'; wait for 8000 ps;
    jitterl <= '0'; wait for 200 ps; jitterl <= '1'; wait for 8000 ps;
    jitterl <= '0'; wait for 200 ps; jitterl <= '1'; wait for 8000 ps;
    jitterl <= '0'; wait for 200 ps; jitterl <= '1'; wait for 8000 ps;
    jitterl <= '0'; wait for 200 ps; jitterl <= '1'; wait for 8000 ps;
    jitterl <= '0'; wait for 200 ps; jitterl <= '1'; wait for 8000 ps;
    jitterl <= '0'; wait for 200 ps; jitterl <= '1';
  end process;

  -- High-going 200 ps jitter pulses, ORed into the input clock
  process
  begin
    jitterh <= '0'; wait for 295200 ps; jitterh <= '1'; wait for 200 ps;
    jitterh <= '0'; wait for 8000 ps; jitterh <= '1'; wait for 200 ps;
    jitterh <= '0'; wait for 8000 ps; jitterh <= '1'; wait for 200 ps;
    jitterh <= '0'; wait for 8000 ps; jitterh <= '1'; wait for 200 ps;
    jitterh <= '0'; wait for 8000 ps; jitterh <= '1'; wait for 200 ps;
    jitterh <= '0'; wait for 8000 ps; jitterh <= '1'; wait for 200 ps;
    jitterh <= '0'; wait for 8000 ps; jitterh <= '1'; wait for 200 ps;
jitterh <= '0'; wait for 8000 ps; jitterh <=T; wait for 200 ps; jitterh <= '0'; wait for 8000 ps; jitterh <=']'; wait for 200 ps; jitterh <= '0'; wait for 8000 ps; jitterh <=T; wait for 200 ps; jitterh <= '0'; wait for 8000 ps; jitterh <= '1'; wait for 200 ps; jitterh <= '0'; wait for 8000 ps; jitterh <= T; wait for 200 ps; jitterh <= '0'; wait for 8000 ps; jitterh <= '1'; wait for 200 ps; jitterh <= '0'; end process; dllclockinput <= (((clock 1 input and jitterl) or jitterh) and clock_enable) or (clock2_input and (not clockenable)); process begin reset <= '0'; wait for 10000 ps; reset <= T; wait; end process; end behavior; configuration vernierrtl of vernierjestbench is 93 for behavior for ul : vernierdll use entity work.vernierdll(structural); for structural forul: vernier_delay_line use entity work.vernierdelayline(structural); end for; for u2: high_resoIoution_phase_detector use entity work.high_resoloution_phase_detector(structural); end for; foru3: vernier_controller use entity work.vernier_controller(behavior); end for; for u4: lockdetector use entity work.lock_detector(behavior); end for; end for; end for; end for; end vernier rtl Appendix B Synthesis result Report: cell Design : vernier_unit_delay Version: V-2004.06-SP1 Date : Mon Jun 27 12:29:56 2005 Attributes: b- black box (unknown) h- hierarchical n - noncombinational r - removable u - contains unmapped logic Cell Reference Library Area Attributes UI NAN2D0 vst nl8 sc tsm c4 wc 12.197000 U2 NAN2D0 vst nl8 sc tsm c4 wc 12.197000 U3 NAN2D0 vst nl8 sc tsm c4 wc 12.197000 U4 NAN2D0 vst nl8 sc tsm c4 wc 12.197000 U5 NAN2M1D1 vst nl8 sc tsm c4 wc 16.261999 U6 BUFTD1 vst nl8 sc tsm c4 wc 28.459000 n Total 6 cells 93.508995 Report: area Design : vernier_unit_delay Version: V-2004.06-SP1 Date : Thu Jun 23 14:31:56 2005 Library(s) Used: vst_n 18_sc_tsm_c4_wc (File: /CMC/kits/cmosp 18/synopsys/2004/syn/vst_nl 8_sc_tsm_c4_wc.db) Number of ports: 10 Number of nets: 13 Number of cells: 6 Number of references: 3 
Combinational area:       93.508995
Noncombinational area:    0.000000
Net Interconnect area:    undefined (Wire load has zero net area)
Total cell area:          93.508995
Total area:               undefined

****************************************
Report : area
Design : vernier_delay_line
Version: V-2004.06-SP1
Date   : Thu Jun 23 14:34:45 2005
****************************************

Library(s) Used:
    vst_n18_sc_tsm_c4_wc (File: /CMC/kits/cmosp18/synopsys/2004/syn/vst_n18_sc_tsm_c4_wc.db)

Number of ports:          386
Number of nets:           767
Number of cells:          128
Number of references:       1
Combinational area:       11969.153320
Noncombinational area:    0.000000
Net Interconnect area:    undefined (Wire load has zero net area)
Total cell area:          11969.151367
Total area:               undefined

****************************************
Report : area
Design : high_resoloution_phase_detector
Version: V-2004.06-SP1
Date   : Thu Jun 23 14:39:20 2005
****************************************

Library(s) Used:
    vst_n18_sc_tsm_c4_wc (File: /CMC/kits/cmosp18/synopsys/2004/syn/vst_n18_sc_tsm_c4_wc.db)

Number of ports:            6
Number of nets:            17
Number of cells:           11
Number of references:       7
Combinational area:       760.265991
Noncombinational area:    154.492004
Net Interconnect area:    undefined (Wire load has zero net area)
Total cell area:          914.757996
Total area:               undefined

****************************************
Report : area
Design : vernier_controller
Version: V-2004.06-SP1
Date   : Thu Jun 23 14:50:12 2005
****************************************

Library(s) Used:
    vst_n18_sc_tsm_c4_wc (File: /CMC/kits/cmosp18/synopsys/2004/syn/vst_n18_sc_tsm_c4_wc.db)

Number of ports:          388
Number of nets:           825
Number of cells:          567
Number of references:      19
Combinational area:       6244.804199
Noncombinational area:    25637.822266
Net Interconnect area:    undefined (Wire load has zero net area)
Total cell area:          31882.552734
Total area:               undefined

****************************************
Report : area
Design : lock_detector
Version: V-2004.06-SP1
Date   : Thu Jun 23 14:42:05 2005
****************************************

Library(s) Used:
    vst_n18_sc_tsm_c4_wc (File: /CMC/kits/cmosp18/synopsys/2004/syn/vst_n18_sc_tsm_c4_wc.db)

Number of ports:            5
Number of nets:             6
Number of cells:            2
Number of references:       2
Combinational area:       12.197000
Noncombinational area:    77.246002
Net Interconnect area:    undefined (Wire load has zero net area)
Total cell area:          89.443001
Total area:               undefined

****************************************
Report : area
Design : vernier_dll
Version: V-2004.06-SP1
Date   : Thu Jun 23 15:09:01 2005
****************************************

Library(s) Used:
    vst_n18_sc_tsm_c4_wc (File: /CMC/kits/cmosp18/synopsys/2004/syn/vst_n18_sc_tsm_c4_wc.db)
    tpz973gwc (File: /CMC/kits/cmosp18/synopsys/2004/syn/tpz973gwc.db)

Number of ports:            4
Number of nets:           397
Number of cells:            8
Number of references:       8
Combinational area:       65985.875000
Noncombinational area:    25869.562500
Net Interconnect area:    undefined (Wire load has zero net area)
Total cell area:          91855.906250
Total area:               undefined

****************************************
Report : cell
Design : vernier_dll
Version: V-2004.06-SP1
Date   : Thu Jun 23 15:10:29 2005
****************************************

Attributes:
    b - black box (unknown)
    h - hierarchical
    n - noncombinational
    p - parameterized
    r - removable
    u - contains unmapped logic

Cell    Reference                          Library      Area            Attributes
u1      vernier_delay_line                              11969.151367    h, n, p
u2      high_resoloution_phase_detector                 914.757996      h, n
u3      vernier_controller                              31882.552734    h, n, p
u4      lock_detector                                   89.443001       h, n
u5      PDD24DGZ                           tpz973gwc    9400.000000     n
u6      PDO02CDG                           tpz973gwc    9400.000000
u7      PDIDGZ                             tpz973gwc    9400.000000
u8      PDCH3DGZ                           tpz973gwc    18800.000000
Total 8 cells                                           91855.906250

HDL Parameter Information:
    u1 - N => 128
    u3 - N => 128
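
As a sanity check on the vernier_dll cell report, the per-cell areas can be summed and compared with the reported total. The short Python sketch below is not part of the original design flow; the name `area_summary` and the grouping of the tpz973gwc references as I/O pad cells are assumptions made for illustration. The area values are copied from the report, in the library's area units.

```python
# Areas copied from the vernier_dll cell report, keyed by reference name.
CELL_AREAS = {
    "vernier_delay_line": 11969.151367,
    "high_resoloution_phase_detector": 914.757996,
    "vernier_controller": 31882.552734,
    "lock_detector": 89.443001,
    "PDD24DGZ": 9400.0,
    "PDO02CDG": 9400.0,
    "PDIDGZ": 9400.0,
    "PDCH3DGZ": 18800.0,
}

# "Total 8 cells" line of the same report.
REPORTED_TOTAL = 91855.906250

# Assumption: references drawn from the tpz973gwc library are I/O pad cells.
PAD_REFERENCES = {"PDD24DGZ", "PDO02CDG", "PDIDGZ", "PDCH3DGZ"}


def area_summary(areas, pad_refs):
    """Return (total, pad, core) area for a mapping of reference -> area."""
    total = sum(areas.values())
    pad = sum(area for ref, area in areas.items() if ref in pad_refs)
    return total, pad, total - pad


total, pad, core = area_summary(CELL_AREAS, PAD_REFERENCES)
print(f"summed total : {total:.6f}")
print(f"reported     : {REPORTED_TOTAL:.6f}")
print(f"pad cells    : {pad:.6f} ({100.0 * pad / total:.1f}% of total)")
print(f"core blocks  : {core:.6f}")
```

Under that assumption, the four tpz973gwc cells contribute roughly half of the total cell area, so the delay line, phase detector, controller and lock detector together occupy the remainder.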
