Equalizing Filter Design for Cross-talk Cancellation by Jihong Ren B. Sc., Huazhong University of Science and Technology, 1995 M . Eng., Huazhong University of Science and Technology, 1998 M . Sc., The University of British Columbia, 2000 A THESIS SUBMITTED IN PARTIAL F U L F I L L M E N T OF T H E REQUIREMENTS FOR T H E D E G R E E OF Master of Science in T H E F A C U L T Y OF G R A D U A T E STUDIES (Department of Computer Science) we accept this thesis as conforming to the required standard The University of British Columbia June 2002 © Jihong Ren, 2002 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department The University of British Columbia Vancouver, Canada DE-6 (2/88) Abstract \ As interconnect line width and spacing decreases and operating clock rate increases, in-terconnect has become a bottleneck in developing high-speed integrated circuits, multichip modules, printed circuit boards, and systems. With small line spacing, mutual capacitance and inductance approach the level of self-capacitance and inductance, and can severely de-grade signal integrity. The well-known equalizing filter method can significantly improve signal integrity. This thesis explores the effectiveness of equalizing niters in cross-talk can-cellation for high-speed, off-chip buses. It demonstrates that linear programming provides effective methods for designing cross-talk canceling equalizing filters that greatly increase the bandwidth of high-speed digital buses. ii Contents Abstract " Contents iii List of Tables vi List of Figures vii Acknowledgments ix 1 Introduction 1 1.1 M o t i v a t i o n 1 1.2 M e t h o d and Proposed Sys t em Structure 2 1.3 Cont r ibut ions 4 1.4 Thes is Ou t l ine 5 2 Background 6 2.1 Transmiss ion channel l imitat ions 7 2.2 E q u a l i z a t i o n 8 2.2.1 D e s i g n M e t h o d s 10 i i i 2.2.2 Application of equalizing filters in cross-talk cancellation for the local telephone subscriber loop 12 2.3 Summary 13 3 Coupled Distributed RLC Interconnect Model 14 3.1 Interconnect Model 14 3.2 Bus parameters and Simulation results 18 4 Linear Equalizing Filter Design 20 4.1 Measurements of filter performance 20 4.2 Preliminaries 23 4.2.1 Convolution 23 4.2.2 Matrix Representations of Convolution 25 4.3 Least Squares Optimization Method with Pseudo-random Input 30 4.3.1 Motivation 30 4.3.2 Least Square problem formulation 30 4.3.3 An example 37 4.4 Linear Programming Method with Worst-case Input 38 4.4.1 Motivation 39 4.4.2 Linear Programming Problem formulation 44 4.4.3 Smoothing filter 47 4.4.4 An example 48 4.5 Testing results: Comparison of LSQ method and LP method 49 4.5.1 Worst-case input sequence 49 4.5.2 Indirect coupling 54 4.5.3 Over-fitting 56 iv 4.5.4 Minimum bit time 58 4.6 Time-variant Linear FIR Filter 62 4.7 Optimized Smoothing Filter 64 4.8 Summary 65 5 Predictor-Corrector Algorithm with Model Reduction 67 5.1 Mehrotra's predictor-corrector algorithm 68 5.2 Implementation Details 71 5.2.1 Starting and Stopping 71 5.2.2 Solving the linear systems 72 5.3 Ill-conditioning and Model Reduction 72 5.4 Results 74 6 Conclusions and Future Work 77 6.1 Future work 78 Bibliography 81 v List of Tables 4.1 Performance of equalizing niters with different sizes for a bus 32-bits wide and 5 cm long 59 4.2 Performance of equalizing filters with different sizes for buses 32-bits wide. A l l filters designed using the L P method 61 4.3 Performance of different smoothing filters with 8 x 3 equalizing filters de-signed by the L P method at 300 ps 65 5.1 linprogO iteration display 75 5.2 Iteration display of our approach: Mehrotra interior-point method with model reduction 75 vi List of Figures 1.1 Proposed transmission network structure 3 2.1 A coupled microstrip transmission line 7 2.2 Simple lumped model for two coupled interconnects 7 2.3 Block diagram of an equalized transmission channel (from [3]) 9 2.4 Simplified model for full-duplex transmission over a linear multi-input/multi-output channel 12 3.1 Analytical solution from equation 3.16 vs. Spice simulation results 19 4.1 An illustrative eye diagram 21 4.2 Example of a data eye 22 4.3 Predistorted signal: equalizing filter output 39 4.4 Examples of output signal for 32-bit interconnect network 40 4.5 Eye-diagrams for a 32-bit interconnect network 41 4.6 Frobenius norm of the bus impulse response 46 4.7 System with smoothing filter at the receiver end 48 4.8 Example of output signals for systems with and without the equalizing filter designed 50 vii 4.9 Pseudo-random test: eye diagrams for systems with and without the equal-izing filter designed 51 4.10 Worst-case test vs. Pseudo-random test 53 4.11 Worst-case performance of different equalizing filters designed with the LP method 54 4.12 Indirect coupling between non-adjacent lines 55 4.13 Eye diagram for system with 8 x 16 equalizing filters designed by the LP method. Grey traces indicate high signal transmitted. Black traces indicate low signal transmitted 56 4.14 Magnitude of overshoot increases with the size of the equalizing filter de-signed with the LP method 57 4.15 The convolution procedure of the time-variant FIR filter 63 viii Acknowledgments First of all, I would like to thank my supervisor Dr. Mark Greenstreet. This thesis would not have been possible without his inspiration, extensive support , patience and encouragement. I also would like to thank my husband, Rui L i , for his consistent support. JlHONG REN The University of British Columbia June 2002 ix Chapter 1 Introduction 1.1 Motivation Advances in digital integrated circuit (IC) fabrication technology have resulted in an ex-ponential growth for the speed and integration levels of ICs. With more and more circuits placed on each die, high-performance systems require larger and larger I/O bandwidth. This demand has been addressed by increasing the number of high-speed signals and the per-pin interconnection bandwidth. Although the number of I/Os has increased from 16 ~ 24 pins in the 1970s, to several hundred pins per IC now [18], this growth is being rapidly out-paced by the bandwidth demands. To continue to improve overall system performance, the per-pin interconnection bandwidth must scale with the speed and integration level of ICs. However, without new approaches, we will soon reach the limit set by the intrinsic properties of copper lines. The number of I/Os increases by 12% per year, half of which is due to the increase in chip perimeter and half of which is due to the increase in pin density. On chip, both the number of devices and clock rates have increased at 50-60% per year, creating a growing 1 bandwidth gap. Higher bit-rates and pin densities have come to a point that interconnections are no longer well-behaved short interconnections. With the decreasing cross sectional ar-eas of interconnections, the line resistance per unit length has increased to a point that long interconnections can no longer be considered lossless. Resistive effects are particularly se-vere at high bit-rates because of both the high frequency roll-off of RC transmission lines and the increase of resistance with frequency due to the skin effect. To achieve maximum packing density, designers attempt to place signal lines as close to each other as possible. This introduces problems of electromagnetic coupling (cross-talk) which are exacerbated by high data rates. Cross-talk has become a critical issue in interconnect performance and hence overall system performance. Traditionally, cross-talk is reduced by carefully control-ling line geometry and arranging circuits to decrease the coupled line length. Moreover, signaling conventions that are less susceptible to coupled energy can be used. These meth-ods reduce cross-talk in a somewhat ad-hoc way. For example, as a rule-of-thumb, a ratio of two-to-one for line spacing against line width is commonly used, based on the assumption that cross-talk decreases monotonically with the increase in line spacing. However, this simple assumption can fail for high bit-rate design. The relationship between line spacing and line width is non-linear, and a two-to-one ratio between width and spacing may actu-ally result in higher coupled energy than smaller line spacing [11][20]. Furthermore, while these methods might reduce the amount of cross-talk, the problem of cross-talk still exists. New approaches in cross-talk reduction are needed. 1.2 Method and Proposed System Structure Equalizing niters have been used effectively for cross-talk cancellation in acoustic applica-tions such as telephone line subscriber system [1][6][7]. Recently, they have been used to compensate for the frequency-dependent attenuation of transmission lines [2]. 2 filter filter filter Bus Receiver filter Filter Network Figure 1.1: Proposed transmission network structure. This thesis explores the effectiveness of equalizing filters in cross-talk cancellation for high-bandwidth, digital communication. The proposed system structure is depicted in figure 1.1. In this transmission system, an equalizing filter is assigned to each wire of the bus. Each filter takes the input signals on a wire and its adjacent wires as its inputs, and outputs a predistorted signal onto the wire. For a fc-bit bus, the filter system can be viewed as a k x k network. Cross-talk is eliminated if the filter network is designed in a way that the concatenation of the filter network and the bus has frequency response in the form of a diagonal matrix. Several optimal filter design strategies are explored, such as the linear programming method and the least-squares method. Matlab simulation results show that the resulting 3 filters dramatically reduce cross-talk and substantially increase the maximum bandwidth that can be achieved by buses on PC boards. Thus, the equalizing filter method is promising for cross-talk cancellation and merits further investigation. 1.3 Contributions This thesis demonstrates that linear programming models provide effective methods for designing cross-talk canceling equalizing filters that greatly increase the bandwidth of high speed digital buses on printed circuit boards. The following are the major contributions supporting this thesis: • Equalizing filter design for high speed digital buses can be formulated as a least squares optimization problem, using a I2 metric for optimality. This metric ensures the quality of the received signal "on average". • The Zoo metric corresponds to the traditional eye height measurement of signal in-tegrity and guarantees worst-case performance. The filter design problem for Zoo optimality can be formulated as a linear programming problem. • An evaluation of the linear programming and least squares methods for a variety of filter configurations shows that both offer a dramatic increase in bandwidth when compared with a bus with no filter or with transmitter pre-emphasis without cross-talk cancellation. Furthermore, the filters designed for the Z ^ optimality criterion using linear programming significantly outperform their counterparts designed by traditional, least-squares method, when evaluated for digital data transmission. • To evaluate these methods, I implemented them both using Matlab. In doing so, I found that Matlab optimization package does not always converge for the linear 4 programming problems presented in this thesis. Therefore, I implemented an interior-point method with a model reduction technique that successfully solves the linear programming problems encountered. 1.4 Thesis Outline In this thesis, Chapter 2 introduces the equalizing filter technique and its existing applica-tions. Chapter 3 describes a coupled distributed R L C model for transmission lines. Based on this model, Chapter 4 discusses various techniques, such as least squares and linear programming, that I explored to design optimal linear FIR equalizing niters. Chapter 5 is devoted to Mehrotra's interior point method with a model reduction technique that is used to solve our particular linear programming problem introduced in Chapter 4. 5 Chapter 2 Background Computer system performance is often limited by communication bandwidths between chips and between subsystems. A typical signaling system consists of a transmitter, a chan-nel, and a receiver. The transmitter encodes digital information as analogue waveforms on the transmission channel, such as a circuit board trace. On the other end of the transmission channel, the receiver samples and quantizes the signal to recover the original digital infor-mation. Although we often think of transmission channels such as wires as being ideal by having zero resistance, capacitance and inductance, real wires are not ideal but rather par-asitic circuit elements whose geometry affects their electrical properties. Moreover, with small line spacing, inductive and capacitive cross-talk can severely degrade signal integrity. With the growth in integration levels, the interconnect line width and spacing decreases, and interconnect has become a bottleneck in high-speed digital designs. This chapter first discusses the channel characteristics, particularly PC board traces. I then provide background on the equalizing filter technique and an overview of its related, existing applications. 6 h Figure 2.1: A coupled microstrip transmission line. r line 1 line 2 c g r c g Figure 2.2: Simple lumped model for two coupled interconnects 2.1 Transmission channel limitations Transmission channels, such as PC board traces and coaxial or twisted-pair cables, have limited bandwidths that are determined by their physical characteristics: the size and con-struction of their conductor and shield, and the dielectric material. In this thesis, I am particularly interested in high-speed interconnect on PC boards. Thus, the following dis-cussion focuses on PC board traces. Figure 2.1 shows typical microstrip interconnections. A simple lumped model for two coupled interconnects is shown in figure 2.2. The resistance per unit length of a trace is given by the conductance of the trace ma-terial (typically copper) divided by the cross-sectional area of the trace. The cross-sectional area is the product of the width of the trace and its thickness. The width is determined by the design. The thickness is specified when the board is manufactured: thickness is specified in ounces of copper per square yard. A board with 1 oz copper has a conductor thickness 7 of roughly 35 microns. More accurate models consider the skin effect: at high frequencies, currents flow closer to the surface of the trace, resulting in a frequency-dependent increase in the series resistance [10][3]. The capacitance per unit length (c) and the inductance per unit length (I) of a mi-crostrip trace are determined by many factors including its width and height and its separa-tion from the ground plane. Electric and magnetic fields between adjacent traces lead to the coupling capacitance, c m , and the mutual inductance, lm, respectively. For PC board traces, the loss in transmission is primarily due to the series resistive component of the copper (r). Because of this loss, without a special transmission scheme, off-chip signaling on long wires, even with good current-mode signaling methods, is limited to about 1GHz [2]. Full-swing unterminated signaling methods that are used in most digital systems have even lower limits. With narrow wires and smaller line spacing, the coupling inductance and capacitance between adjacent lines approach the level of self-inductance and capacitance. In high speed circuits, because of fast signal rise times, coupling effects are severe and have become a primary concern for present and future high-speed high-density circuit design. Besides the resistive properties of the line, the coupling effects further limit the maximum bit-rate at which data can be transmitted correctly. 2.2 Equalization An ideal transmission channel would in all cases deliver the near end signal Vi„(t) from the driver without distortion to the far end receiver, i.e. vout(t) = v-m{t — tf), where td is the propagation delay across the channel. Thus, an ideal channel would have the transfer function e~iwtdI, where j = y/^T and I is the identity matrix. If an equalizing filter has a transfer function that equals the inverse of the transfer function of the channel, the concate-nation of the equalizer and the channel has a flat frequency and phase response. This is the 8 Transmit ter G(s) Equa l i ze r H(s ) C h a n n e l F igu re 2.3: B l o c k d iagram o f an equal ized t ransmiss ion channel ( f rom [3]). equal iza t ion technique w i d e l y used to act ively compensate for the channel transfer func-t ion. C h a n n e l equal iza t ion can be performed at the transmitter end, as shown i n figure 2.3, preceding the actual channel driver. Transmitters that u t i l ize equa l i z ing filters are ca l led pre-distort ing transmitters. T h e equa l i z ing filter can also be incorporated into the receiver, ca l l ed receiver equal iza t ion . It can also be split between the two ends. • Pre-dis tor t ing Transmitters Pre-dis tor t ing transmitters integrate equa l i z ing filters, c o m m o n l y rea l ized as finite impulse response ( F I R ) d ig i ta l filters. W h i l e infinite impu l se response (IIR) [9] fil-ters can be more f lexible than F I R , they are general ly not used for h igh data rate t ransmiss ion because o f the diff icul ty o f ca lcula t ing the I IR recurrence (i.e. feed-back) at very h igh rates. T h e inputs to the equa l i z ing F I R filters are the present and past transmitted symbols . The output o f the F I R filter is a we igh ted s u m o f these symbol s . T h e length o f the filter depends on the number o f symbol s that affect the response o f the channel to the current s y m b o l . T h e filter coefficients depend on the channel characteristics. Pre-dis tor t ing transmitters were first used by Pou l ton et a l . [2] i n a serial channel over copper wires at 4 G b / s to reduce in te rsymbol interference caused by frequency depen-9 dent attenuation o f the channel . Later, other groups [4][17] used the same technique to design high-speed serial l ink transceivers. F I R equa l i z ing filters bui l t into trans-mitters are easy to implement at very h igh speed because o f the ava i lab i l i ty o f trans-mit ted symbols at the transmitter end. Furthermore, because the transmitted symbol s are either Is or Os, mul t ip l i ca t ion w i t h the filter coefficients is easy. F o r example , i n [2], a five-tap F I R filter is implemented w i t h d ig i ta l adders, and a d ig i ta l - to-analog converter ( D A C ) is used to generate pre-distorted pulses. H ow e ve r , because trans-mitters general ly don ' t have informat ion o f received signals , F I R filter coefficients are obtained either by characterizat ion o f channel properties i n advance [2] [4], or by adaptive implementa t ion w i t h feedback informat ion f rom the receiver end [17]. • Rece ive r E q u a l i z a t i o n Rece ive r equal iza t ion can be real ized either w i t h analog filters preceding the analog-to-digi ta l converter ( A D C ) or w i t h d ig i ta l filters f o l l o w i n g the A D C . T h e latter one is the usual technique because d ig i ta l filters are easy to implemen t and adapt. M o r e o v e r , more complex and non-l inear filters can be implemented . However , it is w e l l - k n o w n that receiver equal iza t ion amplif ies h igh frequency noise [8]. Fur thermore , h is tor i -ca l ly , h igh speed A D C technology is behind h igh speed D A C technology. Therefore, pre-distort ing transmitters are c o m m o n l y used i n h igh speed t ransmiss ion systems that run at G H z speed. Recent ly , H o r o w i t z ' s group rea l ized 8-Gsamples /s A D C i n 0.25 / i i n C M O S , w h i c h makes h igh speed l inks w i t h equal iza t ion at the receiver end possible [19]. 2.2.1 Design Methods T h e f o l l o w i n g are two methods that are currently used to design equa l i z ing filters. 10 Zero - fo rc ing method T h e transfer funct ion H o f the channel can be der ived f rom mode ls established for each part icular channel ( reviewed i n [18]). T h e frequency response o f the channel and also the desired frequency response o f the equa l i z ing filter is then calcula ted at each frequency point. T h i s set o f discrete points is used to obtain a discrete impulse response funct ion us ing inverse Four i e r transform. T h e f o l l o w i n g two steps are used to obtain a more manageable impulse response funct ion. - W i n d o w i n g : h(n) = hd{n)W(n) where hd(n) is the desired impu l se response and W(n) is the w i n d o w i n g funct ion. T h i s step is needed to obta in a filter w i t h a finite number o f taps. - D e l a y i n g : h(n) is shifted to the right unt i l the samples are a l l indexed by a non-negative integer to obtain a causal filter. In practice, large w i n d o w s must be used to obtain effective equa l i z ing filters. A c -cord ing ly , many researchers have turned to us ing op t imiza t ion methods to obtain g o o d approximate equa l i z ing filters. T h i s is the approach that I take i n this thesis. Leas t Squares M i n i m i z a t i o n W i t h an idea l t ransmiss ion channel , the received s ignal is a de layed vers ion o f the transmitted s ignal . U s i n g least squares m i n i m i z a t i o n , the equa l i z ing filter design p rob l em is equivalent to the p rob lem o f des igning equa l i z ing filters to determine the values for the filter coefficients that m i n i m i z e the I2 n o r m o f the difference between the received s ignal and the delayed vers ion o f the transmitted s igna l . T h i s method is used in op t ima l pre-emphasis equa l i z ing filter des ign i n [2] [19] to b u i l d serial l inks that operate at over 1 Gigab i t s per second. A l s o it is w i d e l y used to des ign equalizers for telephone subscriber systems [1][6][7]. 11 G(t) P(t) near-end transmitter channel filter a(t) P(t) transmitter filter H(t) far-end channel b(t) n(t) R(t) receiver filter Figure 2.4: S i m p l i f i e d m o d e l for fu l l -duplex t ransmiss ion over a l inear mul t i - i npu t /mul t i -output channel . H(t),G(t), P(t), R(t) are the impulse responses o f the far-end channel , near-end channel , transmitter filter and receive filter respectively. 2.2.2 Application of equalizing filters in cross-talk cancellation for the local telephone subscriber loop E q u a l i z i n g filters are used to reduce in te rsymbol interference caused by the characterist ics o f a s ingle channel [2][4][17][19]. U n t i l now, no w o r k has been reported on the appl ica t ion o f equa l i z ing filters i n cross-talk cancel la t ion for h igh speed buses that run at m u l t i - G b / s . A l o n g w i t h the l i m i t e d bandwid th o f t ransmission channels, cross-talk is another c r i t i ca l p rob l em that l imi t s the m a x i m u m data rate that can be achieved by h igh density w i d e buses. L o c a l telephone subscriber loops have the same p rob lem. B u n d l e s o f twis ted copper wires are used i n l o c a l telephone subscriber loops. Because o f the c lose p h y s i c a l p rox imi ty , cross-talk interference f rom ne ighbour ing channels is one o f the major l imi ta t ions on the max-i m u m data rate that can be achieved over the loops [7]. M u l t i c h a n n e l equa l iza t ion can effectively suppress both near- and far-end cross-talk [6] [7]. In these papers, a cable o f twisted pairs that is terminated at a s ingle p h y s i c a l loca -t ion is treated as a s ingle mul t i - input /mul t i -output channel . Cross- ta lk is then character ized by off-diagonal components o f the matr ix impulse response o f the channel . T h e mul t i chan-nel adaptive F I R equalizers , the transmitter and the receiver process the entire vector o f 12 inputs and outputs (see figure 2.4). Rather than direct ly d i agona l i z ing the sys tem trans-fer funct ion matr ix , the mul t i channe l equalizers are designed to m i n i m i z e the I2 n o r m o f the difference between the received s ignal and the transmitted wave fo rm. In S a l z ' s w o r k [16], the m i n i m u m mean square error ( M M S E ) l inear equal izer for the N x N channel is comple te ly specif ied, assuming uncorrelated data and whi te noise. Later , H o n i g et a l . [6] general ized Sa l z ' s w o r k by assuming correlated data symbols , pulse ampl i tude modu la t ion ( P A M ) signals and co lo red noise. 2.3 Summary T h e equal iza t ion technique has been successfully used to compensate for resistive effects o f t ransmiss ion l ines [2][4] [17]. W i t h this technique and careful ly chosen s igna l ing methods, m u l t i - G b / s serial l i nks have been built . Equa l i za t i on is also c o m m o n l y used i n telephone subscriber systems to cance l near-end and far-end cross-talk [7][1][6]. In this thesis, I explored the effectiveness o f the equal iza t ion technique i n cross-talk cance l la t ion for h igh -speed, off -chip buses. Moreove r , besides the least squares op t imiza t ion technique that is c o m m o n l y used to des ign equa l i z ing filters, this thesis is the first w o r k that formulates the op t ima l equa l i z ing filter design p rob lem into a l inear p r o g r a m m i n g p r o b l e m for h igh speed d ig i ta l buses. 13 Chapter 3 Coupled Distributed RLC Interconnect Model 3.1 Interconnect Model A n e lec t r ica l m o d e l o f a un i fo rm transmission l ine has inductance / , resistance r, capac i -tance c and para l le l conductance g, a l l per unit length. T h e term g models the effects o f current leakage and is prac t ica l ly zero for most d ig i ta l t ransmiss ion on integrated c i rcu i t and printed c i rcu i t boards. W e w o u l d l i ke our system be able to operate at bi t rate greater than 2 Gbi ts / sec . A s s u m i n g that the rise and fa l l t imes are 10% o f the bit t ime, edges have an e lec t r ica l length o f A e = R i s e t ime (ps) /Delay (ps/cm) = 50 (ps)/33 (ps/cm) = 1.51 c m , where 33 p s / c m is the speed o f l ight i n a vacuum. T h e propagation delay o f signals t ravel ing i n other med ia such as a P C B trace is larger [10], and thus the corresponding e lec t r ica l length w o u l d be even smaller. F o r example , the c o m m o n F R - 4 printed c i rcu i t board mater ia l has a d ie lect r ic constant o f about 4.5 and propagation delay about 71 ps /cm. T h e e lec t r ica l length o f a bi t at 14 2Gbi ts / sec is 0.7 c m . A s a rule o f thumb, distr ibuted models should be used w h e n the w i re length is greater than or equal to A e / 6 . T h u s the c r i t i ca l d imens ion separating l u m p e d f rom distr ibuted systems for pr inted c i rcui t board is 0.117 c m . T h e wi re lengths we consider here are i n the range o f 2 ~ 5 0 c m . Thus a distributed m o d e l is needed to correct ly m o d e l the behavior o f this system at mu l t i g iga bit/sec data rate. A s s u m i n g the T E M mode o f wave propagat ion, for a lossy mul t iconductor system o f n wires , w e have nxn inductance mat r ix L, capacitance matr ix C and resistance matr ix R, where Lij, Cij is the mutua l inductance and c o u p l i n g capacitance between l ine i and j respectively. F o r s imp l i c i t y , the f o l l o w i n g assumptions are made: • C o u p l i n g between l ines is entirely due to mutual inductance and mutua l capacitance. There is no conductance between wires o f the bus or between wires o f the bus and ground. O n l y coup l ing between adjacent l ines are taken into account. W e ignore direct c o u p l i n g between wires o f the bus that are not adjacent. • E v e r y w i re is assumed to have the same characteristics. • W i r e s are assumed to be arranged around a cy l inde r so that every wi re is the same as others. W i t h the above assumptions, the L and R matrices are shown below. T h e capacitance mat r ix C has the same structure as L. L = lm ••• 0 . . . 0 lm. 0 0 R = r 0 0 . . . 0 0 r 0 . . . 0 0 0 r . . . 0 0 0 0 . . . r (3.1) 15 The behavior of this distributed system can be described by the following partial differential equation, where voltage vector v and current vector i are both functions of position x and time t. f 1 = - ° a i , <"> ox ot Taking the Fourier transformation of these equations yields: DV — = -(R + jcoL)l (3.4) — = -jcoCV (3.5) where V is the Fourier transform of v, X is the Fourier transform of i, and j = v—1. Differentiating equation 3.4 with respect to x and substituting equation 3.5 into the result gives 0 = (jooRC - u2LC)V (3.6) Let M = juRC -up-LC. Let T be a diagonalizing matrix for M, i.e., T~lMT is the diag-onal matrix G whose diagonal elements are the eigenvalues of M . Rewriting equation 3.6 with T yields: T~1-^ = {T~l MT){T~lV) (3.7) Let V T = T~lV and G = T~lMT, we get d2VT = GVT (3.8) 9a;2 This differential equation has the general solution VT = A i e ^ x + A2e~^x (3.9) 16 F o r a bus w i t h non-zero resistive and capaci t ive or induct ive components , the elements o f G and \[G are c o m p l e x numbers. C o m b i n i n g equation 3.9 w i t h the def in i t ion o f V r y ie lds : V = T{Aie^x + A2e~^x) (3.10) A s s u m i n g a l l source ends are terminated w i t h an impedance o f Z{n and the load ends are left open, w e have the f o l l o w i n g boundary condi t ions . T(x=length) — 0 (3.11) 7 . =I{x=o) (3-12) Vm - V( s =o) C o m b i n e d w i t h equat ion 3.4 and 3.10, the first boundary cond i t ion g iven above y ie lds : Ai = A2e-2^lens'h (3.13) F r o m equation 3.10, we k n o w that: V ( x = 0 ) =T(Al+A2) (3.14) Thus , equation 3.12 yie lds : V i n - T ( A ! + A 2 ) = -Zin{R + juLy'TiAi - A2)\/G (3.15) Equat ions 3.13, 3.15 y i e l d the final so lu t ion V = T(Hie^x + H2e-^x)Vin (3.16) w i t h H2 = [e-2^-lens,h + I-T-1Zin(R + juL)-1T(e-2^-len& (3.17) Hi = e-2^-lens'hH2 where I is the identi ty matr ix . N o t e that A\ = H\Vin, and A2 = H2X>in. Thus , the frequency response o f the bus is : H = T{Hie^-lensth + H2e-^4ensth) (3.18) 17 with Hi,H2 defined in equation 3.17. The inverse Fourier transform yields the impulse response of the bus which is used extensively in the next chapter. Note that the frequency response of the bus is a square matrix at each frequency. The impulse response of the bus is also a square matrix at each time sample. Entry at time t denotes the response on wire j at time t given an impulse input on wire i at time 0. 3.2 Bus parameters and Simulation results I validated the model derived above by comparing its prediction with Spice simulations. Figure 3.1a shows the solution of equation 3.16 using Matlab and figure 3.1b shows spice simulation results. The parameters used in both simulation are: bus width = 3, length = 5 cm, r = 0.066 ohm/cm, c = 0.8 pF/cm, I = 3.99 nH/cm, lmjl = 0.31, cm/c = 0.23, Vdd = 5.0 V, bit time = 500 ps, tr = 10% *bit time = 50 ps. These parameters correspond to microstrip lines 34.5 //m thick (1 oz copper), 75 fj,m wide with 75 /itn separation between lines, running above a ground plane with a dielectric thickness of 100 /xm, and a dielectric constant of er 4.5. The bus parameters are computed using formulas given in [10]. 18 2 bO 1 Input pulse on line 1 Output on line 1 Ml r Output on line 2,3 1000 2000 3000 4000 Time (ps) 5000 6000 7000 Input pulse on line 1 EJ Output on line 1 I 1 i! i , - ' Output on line 2,3 3n 4n 5n Time (lin) (TIME) Figure 3.1: A n a l y t i c a l so lut ion f rom equation 3.16 (upper panel) vs. S p i c e s imula t ion results ( lower panel) o f 3-bit bus. A l l l ines are quiet except l ine 1. 19 Chapter 4 Linear Equalizing Filter Design In this chapter, I present techniques for the des ign o f l inear equa l i z ing filters. I first i n -troduce the idea o f a data eye and its use to quantify filter performance. T h e next section defines notations that s imp l i fy the mathemat ical presentation o f l inear equa l i z ing filter de-s ign. T h e n , I introduce the least squares ( L S Q ) method and the l inear p r o g r a m m i n g ( L P ) method, f o l l o w e d by test results. F ina l l y , based on the l inear F I R filter designs, t ime-variant F I R filter des ign and op t ima l smooth ing filter design are discussed. 4.1 Measurements of filter performance T h e effects o f dis tor t ion and noises are often i l lustrated us ing eye diagrams. A n i l lustrat ive eye d iagram is shown i n figure 4.1. It is ca l l ed eye d iagram because o f its shape. D u r i n g sample interval , s ignal is either dis t inct ly h igh or d is t inct ly low. It must not go through the center o f the eye. T h i s a l lows the receiver to unambiguous ly determine the value o f the bit that was transmitted. T h e s ignal can change between sampl ing intervals. I also restrict h o w h igh (or l o w ) the s ignal may go, otherwise, w i t h sca l ing any eye open ing can be made 20 low target eye width, w Sample Interval Interval Figure 4.1: A n i l lustrat ive eye d iagram. arbi trari ly large. E y e height height is denned as height = min(/iu n d e r, 2 • target - hover) (4.1) where /iUnder and / i o v e r are defined in figure 4.1. T h e eye height and w i d t h are often used as an ind ica t ion o f s ignal integrity. F igu re 4.2 shows h o w a data eye is fo rmed by ove r l ay ing a s ignal wave fo rm over mul t ip le cycles . T h e eye w id th , w i n figure 4.1, is the t ime that the separation between h igh-go ing and l o w - g o i n g signals is greater than zero. In practice, the receiver w i l l attempt to sample the s ignal near the moment o f the widest eye opening. D u e to uncertainties i n the t i m i n g o f the transmitter and receiver and i n the delay o f the interconnect, the actual s amp l ing may occur at some t ime other than this ideal . T h e eye-width gives an ind ica t ion o f the robustness o f the interface to these t i m i n g uncertainties. In this thesis, the effectiveness o f a filter is quantif ied i n the three f o l l o w i n g ways : • eye height o f the output s ignal g iven a pseudo-random input sequence. 21 4 0 0 8 0 0 1200 1600 2 0 0 0 2 4 0 0 t ime (ps) 0 20 4 0 60 80 t ime (ps) F igu re 4 .2: E x a m p l e o f a data eye. U p p e r panel shows a r andom s ignal . Its cor responding eye d iagram is shown i n the lower panel . 22 • eye height of the output signal given the worst-case input sequence. • the smallest bit time (or highest bit rate) at which the eye height of output signals is greater than a specified amount, e.g. 50% of the nominal signal level and the eye width is greater than another specified amount, e.g. 25% of the bit time. 4.2 Prel iminaries By defining some notation up-front, the presentation of the filter design methods can be more succinct and direct. The responses of filters and buses are naturally written as con-volutions while linear and least squares problems are naturally formulated with matrices. Here I define some notation to show the connection between various convolutions and their corresponding matrix representations. Let v € W1 be a vector of size n. The n components are v(0)... v(n — 1). I'll write \v\ to denote the size of v, \\v\\ to denote the I2 norm of v, and ||w||fc to denote the Ik norm of v. Some matrix abbreviations used below are: = The n x n identity matrix Omxn = The mxn matrix of zeros (4-2) T n = The n x n matrix where T(i, n — i) = 1 4.2.1 Convolution Linear Convolution: Let v\ and vi be two vectors. The linear convolution of v\ and i>2 is the vector of size \v\ \ + \v2\ defined below: ( « i x v 2 ) ( i ) = E i = o « i C ? > 2 ( i - j ) (4-3) Linear convolution is commutative and associative. 23 Circular Convolution: L e t v\ and V2 be two vectors i n R n . L e t v\ <g> vi G K™ denote the c i rcu la r convo lu t ion o f v\ and V2: (wi®u2)(«) = E"=owi0>2((*-j)modn) (4-4) C i r c u l a r convo lu t ion is commuta t ive and associative. L e t v be a vector and n be an integer w i t h \v\ < n. T h e zero-extension o f v pads v w i t h zero elements to produce a vector o f size n : extendo(u,n) = Zero-extens ion is a l inear operator: e x t e n d 0 ( u , n ) = (n—|n|) x 1 (4.5) (4.6) L°(n-|i;|)x|i;| L e t extendo(|u|, n ) be the left matr ix on the right hand side o f the equat ion. L i n e a r convolu t ion can be expressed as c i rcu la r convolu t ion o f zero-extended vec-tors: v\ x v2 = extendo(wi, |vi| + |«2|) ® extendo(u2, bi| + |u2|) (4.7) Block Linear Convolution: L e t M € M m k x k be a matr ix . W e can th ink o f M as a c o l u m n o f m matrices: M = M0 M i (4.8) Mm-l where each o f the M j is a k x k matr ix . T h e b l o c k l inear convo lu t ion o f M G fl£mfcxfc a n d W G W"kxk is denned s imi l a r l y as l inear convolu t ion : (MxW)(i) = X^.o M^W:I (4.9) 24 T h e b l o c k l inear convolu t ion o f matr ix M 6 Wnkxk and vector V G R w k is defined s i m i -lar ly. Block Circular Convolution: T h e b lock c i rcu la r convolu t ion o f M and W, where M, W G jjjmfc x k. M ® W(i) = MjW^) m o d m (4.10) 3=0 B l o c k c i rcu la r convolu t ion is associative. It is commuta t ive i f the product o f the sub-matrices is commuta t ive , for example , i f the sub-matrices are a l l symmet r i c or a l l c i rculant (circulant matrices are defined i n sec 4.2.2 be low) . E x t e n d i n g the extendo operator to b lock matrices, let M G M m k x k be a matr ix , and let n > m. extendo (M,n) = M 0fc*(n—m) X k (4.11) Z e r o extension on b l o c k matrices is a.linear operator just as it is for vectors. B l o c k l inear convolu t ion can be expressed as b lock c i rcu la r convo lu t ion o f zero-extended matrices: M x W = extendo ( M , m + w) <g> extendo(W,m + w) (4.12) where M G R m f c x f c , W € R w k x k . 4.2.2 Matrix Representations of Convolution In this section, I w i l l first present matr ix representations for l inear convo lu t ion , then extend it to b lock l inear convolu t ion . L e t v G M™ be a vector. L e t G R n x n be the circulant matrix [5] generated by v: v®(i,j) = v{(i - j) mod\v\) (4-13) 25 T h e fo rm o f this c i rculant matr ix is depicted be low: v{0) v(n-l) v(n-2) . . . v(l) v{l) v(0) v(n-l) . . . v(2) v(2) v(l) v(0) ... «(3) (4.14) [v{n-l) v(n-2) v{n - 3) . . . v(0) L e t v\ and v2 be two vectors o f the same size. Equat ions 4.4 and 4.13 y i e l d : Vi <g> V2 = vf V2 Furthermore, ifv\,V2,...,Vk are a l l vectors o f the same size, then ui <8> v2 ® . . . ® Vk-\ <8> Vk — vf vf . . . vk (4.15) (4.16) N o t e that matr ix mul t ip l i ca t ion o f ci rculant matrices is commuta t ive and associative, just l ike the corresponding convolu t ion . L e t M G ]R n x m be a matr ix , and let row(i, M) G W1 be the vector such that row(»,Af)(j) = M(i,j) L i k e w i s e , let COl(j, M) G IR" be the vector such that (4.17) COl(j,M)(«) = M(i,j) C o n v o l u t i o n can be expressed w i t h a l l arguments represented as matrices: (4.18) (4.19) vi®V2 = COl(0,ufuf) e q u i v a l e n t l y (ui<g>u2)® = vf vf U s i n g equation 4.7, l inear convolu t ion can be expressed us ing mat r ix mu l t ip l i ca t ion : v i x v 2 = extendo (ui, h i + |«2|)® extendo («2>i I + 1^1) (4.20) 26 Define vXn 6 Wixn as the matrix given by extend0(v,n)<; (4.21) The form of this matrix is depicted below: v(0j vXn — where m = \v\. The linear convolution of v\,... ,Vk can be written as « 1 X f l 2 X . . . X t ) t _ 1 X » ) . = vXn v%n --.V^iVk (4.22) (4.23) (4.24) = COl(0,nt!^Xr where n = E?=i N The matrix representation for linear convolution described above can be extended to block linear convolution. Let M 6 Wnkxk be a matrix. As described in the previous section, the matrix M can be regarded as a column of m submatrices of dimension k x k each. The block circulant matrix generated by M is M0 Mm-i . . . Mi Mi M Q . . . M i M m _ x M m _ 2 . . . M 0 (4.25) 27 F o r those w h o prefer formulas to el l ipses: A f ® ( i , j ) = M ( ( i d i v A . ) _ ( i d i v f c ) ) m o d m ( ? m o d A ; , j m o d A;) L e t M,W e R m k x k be matrices. Equat ions 4 .10 and 4.25 y i e l d : (4.26) M®W = M®W (4.27) U s i n g equat ion 4 .12, b lock l inear convolu t ion o f M G R m k x k and W G M w k x k can be expressed us ing matr ix mul t ip l i ca t ion : MxW = extend 0 (M ,n )®extend 0 (W>) = M x " extended, n ) (4.28) = co\{0...k-l,Mx*WXn) where MXn G R n k x k is defined as extend0{M,n)®, n = m + w, and C O l ( 0 . . . A : , M) is defined i n the obvious manner. B l o c k l inear convolu t ion o f M G Wnkxk w i t h v G Wk can also be expressed as matr ix mul t ip l i ca t ion : M xv = MXn extendo(w,n) (4.29) where n = m + w. Def ine the f o l l o w i n g operators: • V = blocks, AT) G R m k x k creates c i rculant b locks f rom vector v G Rmk. V(i,j) = v(k * ( idiv/c) + ((i — j) m o d k)) T h e f o r m o f this matr ix is depicted be low: v(0,...,k- 1)® (4.30) V = v(k,...,2k- If v({m - l)k,... ,mk - 1)^ (4.31) 28 • vec2cir(t;, fc, n) converts a vector v G R to a ci rculant matr ix : yec2cir(u, fc, n) = extendo (Wock(-y, fc), n)® (4.32) Define u x " G ygnkxnk a s ^ e matr ix g iven by vec2cirfa, fc, n). T h i s mat r ix has the same fo rm as » x " (see equation 4.14), except that now each b l o c k is a ci rculant mat r ix o f size fc x fc. N o t i c e that vXn is a b lock c i rculant mat r ix and k extendo (v,n) = col(0, vXn) W i t h these operators, it is straightforward to see that for M G Wnkxk and v G R w k , Mxv = c o l ( 0 , M x " u x " ) (4.33) where n = m + w. T h e b l o c k l inear convolu t ion o f two vectors v\ G Wnik and v2 G Wn'lk is defined as: Vlxkv2 = co\(0,vfnvfn) (4.34) w h e r e n = m\+m2. T h e b l o c k l inear convolu t ion o f v\ G Wnik,v2 G Wn2k,...,vi G Rm'k can be wri t ten as V\ Xk V2 Xk . . . Xk Vi_\ Xk Vi = v?*vZ" . . . vf\ Vl ' (4.35) = coi(o,nU« t x") where n = EU"»i ( 4 - 3 6 ) 29 4.3 Least Squares Optimization Method with Pseudo-random Input 4.3.1 Motivation As discussed in the previous section, an ideal bus would in all cases deliver the near end signal without distortion to the far end receiver, with some amount of delay. Thus, we know that in the ideal case, the expected output signal would be simply a delayed version of the input signal. The goal of filter design is to find a set of filter coefficients that make the output signal as close to this ideal output signal as possible. Following the example of [6] [7], I use RMS error (l2 metric) in this section as a measure of the distance of the filter output from the ideal, delayed signal. In this case, filter design can be formalized as a least square optimization problem. In section 4.4, I use worst-case difference between a signal and the target as a measure of distance (1^ metric) and show that the resulting filter design problem is an instance of linear programming. 4.3.2 Least Square problem formulation • Input Consider a bus with & b U S wires. Let n j n p u t denote the length of the input training sequence in bit times. Thus, an input is a function that gives a value, +1 or -1, for each wire w € [0, k^us — 1] at each time t G [ 0 , n j n p u t — 1]. This function can be represented by a vector, input, with input(A;|:)USt + w) denoting the value of the wth wire at time t. Because filter coefficients are given in tap times, oversampling is needed to convert the input from a sequence in bit times to a sequence in tap times. Let input € fl^input^bus 5 e a vector and r be a positive integer. The oversample op-erator, oversample(input, r, fcbus) computes in e R r n input f e bus which is the r times 30 oversample of input: in(i) = input(&bus * (idiv (A;busr)) + (i mod A;bus)) The oversample operator is linear. In particular, oversample(njnput, r, £; b u s) is a ma-trix with: oversample(ninPUt, r, kbus)(i,j) = I 1 if (i div (A:busr)) = {j div r) and (i mod kbus) = (j mod A; b u s) 0 otherwise Thus, oversample(input, r, A; b u s) = oversample(n inpUt, r, kbus) • input Define input0 r u s € K r n i " P u t f c b u s as the vector given by oversample(input, r, A: b u s). The form of this vector is depicted below: input0 input: i n P u t f c b u s - i Repeats A; b u s — 1 more times i n P u t fc b u s ' n P U t * b u s + l i n P U t 2fc b u s - l Repeats kbliS — 1 more times inpuL. L _ i 1 ' t i n p u t " ' b u s 1 inpuf (4.37) 31 • Buses In m u c h the same manner as above, the impulse response o f a bus w i t h kbus wires is a c o l u m n o f kbus x kbus matrices w i t h each matr ix g i v i n g the response cor responding to a part icular delay. L e t nbus denote the length o f the bus impu l se response i n tap t imes. T h e bus impulse response can be represented by a fct>Usnbus x ^bus mat r ix , bus where bus(A;bus* + wout, Wj„) is the response o f the wout output w i re o f the bus after a delay o f t tap t imes to the Wjn input wi re . L e t in be the vector for the input o f the bus i n tap t ime. in = input r L e t in t £ M f c b u s denote the value o f the input at tap t ime t: mt(i) = m(kbust + i e [0, kb{iS - 1] L i k e w i s e , let bust 6 M f c b u s x f c b u s denote the bus impulse response at t ime t: bust(i,j) = bus(kbust + i,j), i,j e [0,kbus - 1} L e t output £ I g ^ i n p u t + n b u s ^ b u s be the vector for the output o f the bus and let OUtputt be the output at tap t ime t: output = E!=o bust-iin* ( 4 - 3 8 ) E q u a t i o n 4.38 has the fo rm o f a b lock l inear convolu t ion . Thus , output = co l (0 ,bus X n i n x " b u s ) (4.39) where n = nbus + rn\nput. Moreove r , i n this thesis, for s impl i c i ty , I assume that wires are arranged around a cy l inder (see chapter 3). T h i s means that a l l wi res have the same characteristics, and bus* is a circulant matr ix . L e t h £ W1^^ he the 32 vector whose tkbUS, . . . , (< + l)A;bU S — 1 components are the first c o l u m n o f b u S j . Tha t is, bus = block(h) b u s X n = h x " b u s T h u s the ouput s ignal o f the bus g iven input i n bit t ime is: output = co l (0 ,h x " b u s ( inpu t ° ' b u s ) x " b u s ) (4.40) where n = ribus + ?"™input-• Filter In figure 1.1, a filter is depicted for each wi re o f the bus. Because a l l wires have the same characteristics, I assume that every filter is the same. F o r a fcbus-bit bus, this filter system can also be v i e w e d as a kbus x kbus input/output network. Thus , s im i l a r to the bus, the input/output re la t ionship o f the filter sys tem w i t h 7% taps can be expressed as: filterOutput = c o l ( 0 , F x - b u s ( i n p u t 0 " " ) x"" u s) (4.41) where F € M n f i r fcbus is the filter coefficient vector o f the filter for w i r e 0 o f the bus. It 33 has the f o r m depicted be low: F = MO) /i(0) / f c b u s - l ( ° ) /o(l) / l ( l ) (4.42) [fkbus-i{mr -1)_ where fi(t) denotes the contr ibut ion o f the input on wi re 0 at t ime 0 to the filter output for w i r e i at t ime t. Because the bus is symmetr ic , I restrict m y attention to symmet r i c filters. Tha t is , i n the F vector depicted above, fiti) = Abu,-*0') f o r 3 e [ ° » n f i r - 1]. * e [1, kbus - 1] M o r e o v e r , inputs on wires far away produce very l i t t le cross-talk. Therefore, it may be pract ical to force the filter coefficients for these wires to zero to s i m p l i fy the i m -plementat ion o f the filter. In this thesis, filters w i t h various sizes are investigated. F i l t e r size is defined as filter length x filter width. A n\\r x k\\r filter contains k\\r sets o f nfj r filter coefficients for inputs on wi re i tse l f and the k\\f — 1 nearest wires i n both 34 direc t ion . Its filter coefficient vector fir e R f e f i r 7 l f i r is depicted be low: fir = firo(O) firi(O) f i r f c f i r - i ( 0 ) f i r 0 ( l ) [ f i rA:,i r-l(nfir - 1)_ Define filterExtend(nfir, kfir, kbus) € M^^x^n,-,, a s m e m a t r i x depic ted be low: ' M i r 0(*b US o,r f c f i r _! 0 (fcbus-2fc f i r+l)xfc f i r o,r f c f i r _! o (fcbus-2/cfir + l)xA;fir o,r f e f i r _ x (4.43) where 0, Ffc f i r _i denotes the hor izonta l concatenation o f a c o l u m n vector 0 ( f c f i r _ 1 ) x l w i t h a matr ix r f c f i r _ i . Operator filterExtend(fir, fcfir, kbus) transforms fir to the fu l l filter coefficient vector F i n equation 4.42. filterExtend(fir, fcfir, kbus) = filterExtend(nfir, fcfir, A; b u s ) • fir (4.44) 35 Denote fir' as the vector given by filterExtend(fir, kfir, A;bus)> which equals the full filter coefficient vector F depicted previously. Thus, with equation 4.41, the output signal of the filter system with 7 % x k\\r filters can be expressed as: where n = nf\r + rn\npui. • Target signal Let T be the target signal which is a delayed version of the input signal. I considered two ways to approximate the expected delay. - LC delay: lengthy/lc. - approximate the delay by determining the peak of the Frobenius norm of the bus impulse response. The second one is more accurate because the effect of resistance is also taken into account, especially for long buses where RC delay dominates LC delay. In this thesis, all results are obtained with the second method. Let D d € M " x n be the following matrix: filterOutput = col(0, (firj£")x»bu* (input0""" ) x * b u s ) (4.45) where d is the approximated delay in tap time and n = rrijnput + d The target signal T is given by T = D d • input .0;' (4.46) 36 • Output signal With above analysis, it is straightforward to express the output signal of the system with 7ifjr x fcfjr filters in figure 1.1 using matrix multiplication. Let the vector output G R n f c b u s represent the output of the system in tap time: output = c o l ( 0 , h x " b u s ( t i r ^ ) x > s ( i n p u t ° * b u s ) x > s ) = col(0, h x * b u s (input0*""5) x > s (fir*™) x" b u s) = h X n b u s (input° r b u s ) x " b u s extendo ( r i f i r , kbus, n)filterExtend(n f i r , k\„, A ; b u s ) f i r (4.47) where n = nmpuXr + n f i r + n b u s . Let G = h X n b u s ( inpu t O r b u s ) X n b u s ex tend 0 (n f ^ (4.48) Then output = G • fir (4.49) • Least Squares Problem With equations 4.46 and 4.49, the least squares problem is: min | | T - G - f i r | | fireiR nfir' cfir Given G and T, I used QR decomposition (i.e. the backslash command in Matlab) to find the vector fir that minimizes the least square error of the over-determined system G • fir = T. 4.3.3 An example To show the effectiveness of the equalizing filter approach in cross-talk cancellation, con-sider a 32-bit bus with length 5 cm. The electrical parameters of the bus are: r = 0.066 37 Q / c m , c = 0.8 p F / c m , I = 3.99 n H / c m , Z m /Z = 0.31, cm/c = 0.23. F i l t e r des ign parameters are: 7 % = 8, / % = 2, taps per bit = 4, bi t t ime = 400 ps, length o f the t ra in ing sequence, rijnput = 200 bits. F o r this part icular example , i n equation 4.46 and 4.49, • B u s w id th fcbus = 32. Leng th o f the bus impulse response, r i 5 U S is set to be 16 taps. Thus , h is a vector o f length: nbus * &bus = 512. • input is a vector o f length: A;bUS * i^nput — 6400. • fir e K 1 6 . • G is a mat r ix o f s ize 25600 x 16. Pseudo- random input sequences are used as test sequence. B y c o m p a r i n g the eye opening w i t h and wi thout the filter designed, w e get an ind ica t ion o f the effectiveness o f the filter. F igu re 4.3 shows the predistorted input s ignal on w i r e 1. T h e waveforms i n fig-ure 4.4 c lear ly show that the 8 x 2 filter greatly reduces the overshoot and undershoot, w h i c h is also shown by the eye diagrams i n figure 4.5. W i t h the 8 x 2 F I R filter designed, the eye height is increased f rom 31% to 82%. T h i s tells us that equa l i z ing filters are a very p r o m i s i n g method i n cross-talk cancel la t ion for h igh speed buses. M o r e thorough testing results are presented i n sect ion 4.5. 4.4 Linear Programming Method with Worst-case Input In this section, a l inear p rog ramming method is in t roduced w i t h the assumpt ion that we can solve the formula ted l inear p rog ramming p rob lem. 38 Input signal on wire 1 Predistorted signal on wire 1 Figure 4.3: Predistorted signal: equalizing filter output 4.4.1 Motivation Although the least squares optimization method works and the FIR filter described greatly improves the eye height of signals transmitted, this method has several shortcomings. • The filter designed by the LSQ optimization method greatly depends upon the pseudo-random input pattern used as training sequence. To get a good filter design, a long training sequence must be used, which makes the speed of the filter design very slow for wide buses as occur frequently in practice. • The design objective is to transmit the bits without error. It is assumed that as long as a bit satisfies the eye specification, it will be received correctly. Thus, getting some bits that already satisfy the eye specification closer to the target signal doesn't matter. It's the worst-case pattern that determines the eye height. Thus, the li metric doesn't 39 S y s t e m without filters — Input s igna l o n wire 1 Output s igna l on wire 1 Figure 4.4: Examples of output signal for 32-bit interconnect network with (lower panel) and without (upper panel) 8x2 equalizing filters designed with the LSQ method. 4 0 Eye diagram for system without filters (eye height 29%, eye width 75%) t(ps) Eye diagram for system with 8*2 filters (eye height 82%, eye width 75%) t(ps) Figure 4.5: Eye-diagrams for a 32-bit interconnect network with (lower panel) and without (upper panel) 8x2 equalizing filters designed with the LSQ method. Grey traces indicate high signal transmitted. Black traces indicate low signal transmitted. 41 strictly correspond to eye height, the metric defined in equation 4.1. For example, it is possible that for a training sequence, some filter coefficient set produces very small RMS but the output signal has 1 bad trace. It is that 1 bad trace which determines the eye height. Certainly, we can reformulate the same problem into a linear pro-gramming problem, such that for a given training sequence (pseudo-random input), the / Q O metric is minimized. However, in order to guarantee worst-case performance, ideally, all possible input combinations should be part of the training sequence. This is obviously not practical. It turns out that for a given set of filter coefficients, the worst-case input pattern can be figured out and thus the worst-case eye height can be computed. This section is devoted to this method that minimizes over all possible inputs. First, I show that the search space for the metric is convex even when more general, non-linear filters are considered. Formalize an eye height specification as a sequence of tuples: E = {UQ, aQ), (ui,oti),..., (um-i,am-i) A filter F satisfies E if and only if for every input every output satisfies: Vt G [0...N- l] ,Vi G [ 0 . . . m - l],Vu G [ui,ui+1], ((0lltput(tr + d) > ai) A (input(t) = +1)) (4-50) V((0Utput( i r + d) < -a^ A (input(i) = -1)) where r is the number of taps per bit, d is the expected delay of the bus, output = h X n b u s F( input) and F is the filter function. Let F \= E denote filter F satisfies eye E. Let F i and F2 be two filters that satisfy some eye opening constraint E. That is, F\ f= E and F2 f= E. Let F3 = OFi + {1-9)F2, where 9 G [0,1]. Because the bus, h x " b u \ is linear, a system with F3 produces output signals that are the same linear combination of what is 42 produced by systems w i t h F\ and F3. It then fo l lows f rom equation 4 .50 that F3 |= E. T h u s the space o f filters that satisfy eye opening constraint E is convex. T h e objective is to send - 1 , 1 signals d o w n the bus as c lear ly as poss ib le . In this system, every w i r e o f the bus has the same configurat ion and thus is interchangeable. Thus , the o r ig ina l objective is the same as t ry ing to send d o w n 1 on wi re 1 as c lear ly as possible w i t h the worst-case disturbances f rom other wires and preceding and f o l l o w i n g bits. T h e output s ignal on wi re 1 for the current bi t is s i m p l y a summat ion o f the effect on w i re 1 at the current bit f rom • the input s ignal on wi re 1 for the current bit, w h i c h is the s ignal expected to come through i f there is no disturbance. • the input s ignal on wi re 1 at other bit t imes and also the input signals on other wires for the current bit and other bits, w h i c h produce disturbances on the first w i r e at the current bit. Thus , the op t imiza t ion p rob lem can be restated as the f o l l o w i n g : G i v e n that the current bi t input on the first w i r e is 1, f ind the best set o f filter coefficients that makes the output s ignal o n w i r e 1 at the current bi t as c lose to 1 as poss ib le for the worst-case input sequence w h i c h produces the largest disturbances on wi re 1 f rom other bit t imes and other wires . m i n s subject to undisturbed + disturbances — s < 1 (4.51) undisturbed — disturbances + s > 1 43 4.4.2 Linear Programming Problem formulation I n o w focus on the pract ical case where the filter is l inear and F I R , and show that the des ign p rob l em is an instance o f l inear p rog ramming . T h e goa l remains to send d o w n 1 a long wi re 1 as c lear ly as possible . A quantif ied vers ion o f this goa l is : for the worst case input sequence w i t h 1 at the current bit, the output s ignal at some g iven s a m p l i n g t ime is as close to 1 as possible . That is , at this s ampl ing point, the eye height is as h igh as possible . A reasonable s ampl ing point is d, the delay o f the bus. Equa t ion 4.51 shows that to formulate the L P p r o b l e m for the equa l i z ing filter design, we need to k n o w the undis turbed output at the sampl ing point and the largest total disturbances at the s amp l ing point . L e t in be the input sequence that is 1 bit l o n g and on l y the bit on the first w i r e is 1: { 1 i f * = 1 (4.52) 0 otherwise Because the w h o l e system is l inear and circulant , the response to this pulse input in gives us a l l the in format ion w e need to compute the output for the worst-case disturbances. F r o m sect ion 4 .3 , for the system w i t h 7% x F I R filters, we k n o w that the cor responding output is g iven by G • fir, where G as g iven by equation 4.48: G = h x " b u s ( i n O r b u s ) x " b u s e x t e n d 0 ^ where n = r + n\\r + nbus is the length o f the response i n tap t ime. Different rows o f G • fir represent the response on some wi re at some tap t ime. T h e cont r ibut ion o f the bit f rom equation 4.52 to the output at the sampl ing t ime is: undisturbed — row(<i • fcbus, G) • fi r (4.53) Responses f rom other wires and responses f rom the first w i r e ar is ing f rom earl ier and later bits are the disturbances. F o r example , the disturbance on wi re 1 at the s amp l ing t ime 44 caused by input on the second wi re i bit t imes earlier, is the same as the disturbance on w i r e 2 at the sampl ing t ime f rom the input on the first w i r e i bit t imes earlier. M o r e o v e r , it is the same as the response o f the o r ig ina l pulse input f rom equation 4.52 observed on w i r e 2 at the tap t ime that is i bit t imes later than the sampl ing t ime. These are due to the l ineari ty and symmet ry o f the system. Thus , disturbance (j, i) = row((ir + d) • kbus + j, G) • fir (4.54) where disturbanceis the disturbance on the first w i r e at the samp l ing t ime g iven an input o f 1 on the j t h w i r e i bit t imes earlier. I f the disturbances f rom other bi t t imes and other wires are a l l posi t ive , w e get the largest total disturbance and hence the worst-case disturbances. L e t d(j, i) denote the worst-case, posi t ive disturbance on wi re 1 at the sam-p l i n g t ime f rom the input on j t h w i r e i bit times earlier. N o t i n g that each input to the filter is either +1 or - 1 , the f o l l o w i n g inequal i ty constraints compute the absolute value funct ion needed to obtain d(j, i): d(j, i) > row((ir + d) • kbus + j, G) • fir (4.55) d(j,i) > -row((ir + d) • kbus+j,G) -fir Because the cost funct ion is posi t ive monoton ic i n each o f the d(j, i), either the first con -straint or the second constraint is tight at the op t ima l point . L e t be the mat r ix that contains a l l the rows i n G that matter. T h e total number o f rows are [n/r\. To calculate the total disturbances f rom other bit t imes and wires , ideal ly , an inf in i te ly l o n g history should be considered because o f the infini te impu l se response o f the bus. T h i s is not pract ical . N o t i c e that most o f the energy o f the impu l se response expands over about 6 t imes L C delay o f the bus (see figure 4.6). F o r the part icular bus mode l presented i n this thesis, the L C delay is about 250 ps. A l l the results presented here 45 peak 0 2 0 0 4 0 0 6 0 0 8 0 0 1000 1200 1400 1600 t ( p s ) Figu re 4.6: Frobenius n o r m o f the bus impulse response. are obta ined w i t h : n b u s = 10 • length • Vic/(tap time) Moreove r , notice that the bus impulse response does not rise immed ia t e ly to the peak. T h i s means that not on ly the history bits affect the output o f the current bit but also a few future bits. A m o n g the [n/rj rows, there are [d/r\ future bits, 1 current bi t and the rest are history bits. F o r j G [ 0 A U S - e (1 - [d/r\, [n/r\ - [d/r\] r o w ( ( i - 1 + [d/r])kbus + j , G f ) = row((ir + d) • k b u s + j , G ) (4.56) T h u s row([d/r\ • kb[iS + 1, G*) • fir gives the undisturbed output. Here is the equa l i z ing filter des ign p rob l em as a l inear p rog ramming p rob lem i n fir, d, s: 46 mins Gt -Gt \n/r\ '[n/rj row(|d/rJ • khus + l,Gt) l l x [ n / r j -row([d/r\ • kbus + 1, Gt) -llx [ n / r i 0|n/rjxl -1 -1 fir u2[n/rjxl d < 1 s -1 (4.57) 4.4.3 Smoothing filter It was found that if we average a few taps of the current output bit and use that as the objective function of the LP problem, the eye height obtained is better than simply asking the optimizer to bring one tap of the current output bit as close to 1 as possible. However, a corresponding smoothing filter is needed at the receiver end in order to get the desired output signal. Fortunately, such averaging behavior is typical of the input circuits on real chips [10]. The new system structure is shown in figure 4.7. Smoothing filters will be further examined in section 4.7. Assuming we are averaging over 3 taps (a more sophisticated strategy will be dis-cussed in section 4.7), define a smoothing operator: 1/3 1/3 1/3 0 0 . . . 0 0 1/3 1/3 1/3 0 . . . 0 0 0 1/3 1/3 1/3 . . . 0 v (4.58) 0 0 0 0 0 . . . 1/3 smoothfa) = 47 Transmit ter E q u a l i z i n g filter B U S S m o o t h i n g filter R e c e i v e r F igu re 4.7: Sys tem w i t h smooth ing filter at the receiver end. W i t h the smooth ing operator, now G = smooth h x " b u s ( in° r b u s ) X n b u s extend 0(nfj r , kbus, n)filterExtend(nfjr, /%, kbus) (4.59) Moreove r , the delay o f the bus might not be the best s amp l ing point . It was found that the best s ampl ing point depended on the filter size and the bit t ime. F o r example , at 300 ps bit t ime, 8 x 3 equa l i z ing filters designed w i t h 1 tap extra delay i n addi t ion to the bus delay give the highest eye height (81%) among a l l 8 x 3 filters. It is also better than the system wi thout the smooth ing filter (74%). In the rest o f this thesis, a l l testing results are obtained w i t h the extra delay var ied to g ive the best eye height. 4.4.4 An example T h e f o l l o w i n g example shows the effectiveness o f the equa l i z ing filter approach ( L P method) i n cross-talk cancel la t ion. I use the same bus parameters and filter s ize as the example g iven i n sect ion 4 .3 .3 . A pseudo-random test sequence is used. F o r this part icular example , the L P p rob l em formulated has the f o l l o w i n g properties: • fir G M 1 6 . • number o f disturbance variables, |d|: 223. Thus , total number o f variables is 240. 48 • number of constraints: 448. From figures 4.5 and 4.9, note that the eye-height for the filter designed by the LP method (1^ norm) is slightly higher than that for the LSQ filter (<!2 norm), 84% vs. 82%. As expected, optimizing for eye-height produces greater actual eye-height than the "average case" optimization of the LSQ method. The eye width for LP is significantly smaller than that for LSQ, 50% vs. 75%. This is expected because the LP filter is optimized for eye-height at a specific sampling point, whereas the LSQ objective function considers the entire waveform. Section 4.5 presents further comparisons. The speed of FIR filter design with the LP method largely depends on the size of the LP problem formulated. Thus it depends on how many bits (number of disturbances) are used to design the filter and the size of the filter. The number of disturbances is determined by the length of the bus impulse response in bit time. The smaller the bit time, the larger the LP problem. For an 8 x 2 filter design at 400 ps, on a Linux box with a 800MHz Pentium III CPU and 256MB memory, it finishes within a few seconds. Based on this method, I investigated other variations of linear FIR filters, such as time-variant linear FIR filters and other types of smoothing filters. 4.5 Testing results: Comparison of L S Q method and L P method 4.5.1 Worst-case input sequence In section 4.3.3 and 4.4.4, pseudo-random input sequences were used to measure the eye opening (eye height and eye width). By comparing the eye opening with and without the filter designed, we get an indication of the effectiveness of the filter. A shortcoming of using pseudo-random input sequences as testing sequence is the result varies a lot from time to time if the input sequence is not long enough. But simulation with a very long input 49 System without filters — Input signal on wire 1 Output signal on wire 1 Figure 4.8: Example of output signals for systems with (lower panel) and without (upper panel) the equalizing filter designed with the LP method. 50 Eye diagram for system without filters (eye height 29%, eye width 75%) 2 . CD s > 0 100 200 300 400 500 600 700 800 t(ps) Eye diagram for system with 8*2 filters (eye height 84%, eye width 50%) 1.5, 0 100 200 300 400 500 600 700 800 t(ps) Figure 4.9: Pseudo-random test: eye diagrams for systems with (lower panel) and without (upper panel) the equalizing filter designed with the LP method. Grey traces indicate high signal transmitted. Black traces indicate low signal transmitted. 51 sequence takes a l o n g t ime. Inspired by the L P filter design procedure, I used the worst-case input sequence for each filter instead o f pseudo-random input sequence as the testing sequence. S ince the total disturbance f rom other bi t t imes and other wires is the largest for the worst-case input, the eye opening is the smallest among a l l input sequences and hence the most representative. F o r a g iven set o f filter coefficients, the worst-case input sequence input w i t h length \n/r\ (where n is defined i n equation 4.48, r is the number o f taps per bit) can be found by: • for every w i r e j and every bit i, calculate the resul t ing disturbance on wi re 1 at a g iven sampl ing t ime. I f the filter coefficients are obtained w i t h the L P method, the sampl ing t ime is the same as what was used i n the L P filter des ign. F o r an arbitrary set o f filter coefficients, the sample point is not defined i n advance. Instead, I consider every tap t ime as a possible sample point and select the one w i t h the best eye height as the sample point . A c c o r d i n g l y , the worst-case input sequence is determined by finding the worst-case input for each possible s ampl ing t ime and concatenat ing these sequences together. • input(l, [n/r\ — [d/r\)= 1. W e are l o o k i n g for the largest negative disturbances w h e n 1 is sent. Nega t ion o f this input sequence is also a worst-case input sequence. • I f disturbance< 0 , input(j, i) = 1. Otherwise input(_7, i) = —1, for j £ [l,kbus],i £ [1, [ n / r j ] . In this section, a l l testing results are obtained w i t h input sequences that are con -catenations o f the worst-case input sequence and pseudo-random input sequences, unless otherwise indicated. 52 Eye diagram for system with 8*3 filters (eye height 80%, eye width 50%) 2 1 i 1 1 1 — i , _ 2 1 1 ' • 1 1 i I 0 100 200 300 400 500 600 t(ps) Eye diagram for system with 8*3 filters (eye height 92%, eye width 50%) 2 I 1 1 1 1 1 1 _2 1 1 1 1 1 i I 0 100 200 300 400 500 600 t(ps) Figure 4.10: Worst-case test (upper panel) vs. Pseudo-random test (lower panel): eye dia-grams for systems with 8 x 3 equalizing filters designed with the LP method. Grey traces indicate high signal transmitted. Black traces indicate low signal transmitted. 53 100.0% 8 0 . 0 % M 6 0 . 0 % s >> 4 0 . 0 % 2 0 . 0 % 0 .0% O • A* • • • • • • 5 10 Filter Width 15 20 Figure 4.11: Worst-case performance of different equalizing filters designed with the LP method. Only equalizing filters with eye width greater than 25% are shown. Simulation parameters are: r = 0.066 O/cm, c = 0.8 pF/cm, / = 3.99 nH/cm, lm/l = 0.31, c m / c = 0.23. Filter design parameters are: taps per bit = 4, bit time = 300 ps. Figure 4.10 upper panel shows an eye diagram obtained with such a testing se-quence. Comparing with the eye diagram shown on the lower panel, it is noticed that the worst-case input sequence happens rarely and there is a significant difference between worst-case eye height 80% and random eye height 92%. This gives a possibility that if a certain amount of bit error is tolerated by using some error correcting code strategy, the maximum bit rate could be further improved. 4.5.2 Indirect coupling For simplicity in the distributed coupled RLC model, I only considered capacitive and in-ductive coupling between adjacent lines. Thus, originally I thought that it should be enough to equalize one line when only information on this line itself and its two nearest neighbours 54 1 •< [ Cm = i = r<zxxxx>4 Lm J • h Cm = r = 1 *W*AA* 1 |_|"fl i r Cm =1= ai—V\*A*W—i Lm Cm =1= r^vCCCC^n Lm 1 i Cm =1= r^Cvvw^n Lm ^ • [ Cm =1= }=ccccco>=l Lm 1 M [ Figure 4.12: Indirect coupling between non-adjacent lines are used by the filter design. For a three-bit interconnect, this method considers all wires and therefore does give the optimal result. However, for a bus with more than three lines, if we consider more lines instead of just adjacent lines (increase the filter width), better cross-talk cancellation can be achieved. Figure 4.11 shows the performance of filters with different sizes at 300 ps bit time. Compared with an 8 x 2 filter, the 8 x 4 filter has a much greater eye opening. This indicates that although the direct coupling between non-adjacent lines is weak and ignored in the interconnect network model, the indirect coupling between non-adjacent lines is strong and shouldn't be ignored in the equalizing filter design. It is also shown in figure 4.11 that for the bus considered this trend nears its asymptote when filter width is larger than 4. The indirect coupling between non-adjacent lines is illustrated in figure 4.12. From figure 4.12, a pattern of transfer function in the frequency domain was conjectured. That is, V\,out{k) = Vl,out(k) _ 2 Vl,out{k) __ V2M(k) ~aV3,in(k) ~ a V4,in(k) If this pattern existed, a simple filter considering all lines could be designed. Unfortunately, since we are considering far end noise cancellation, which is not only a function of input voltage but also a function of distance x, this speculated pattern does not occur in practice. 55 eye diagram for system with 8*16 filters (eye height 99%, eye width 0%) t(ps) Figure 4.13: Eye diagram for system with 8 x 16 equalizing filters designed by the LP method. Grey traces indicate high signal transmitted. Black traces indicate low signal transmitted. 4.5.3 Over-fitting Compared with the LSQ method, the LP method is fast and guarantees worst-case per-formance. However, it has an over-fitting problem. Figure 4.14 shows the trend that the magnitude of overshoot increases with the size of equalizing filter designed. It suggests that with more degrees of freedom, the optimizer tends to put more energy into the filter in order to get higher eye height at the sampling time, which results in much greater overshoot. For some inputs, the output signal changes abruptly but close to the target right at the sam-pling time, resulting in a larger eye height but also a much smaller eye width. This effect is clearly shown in figure 4.13, which shows an eye diagram obtained with 8 x 16 filter designed with the LP method. The eye height of the diagram is the largest among all filters with length 8, but it barely has an eye opening, and has an enormous amount of overshoot. 56 1000000 > o 100000 10000 1000 100 10 1 100 200 300 N u m b e r o f fi l ter coefficients o o u > o 1000000 100000 10000 1000 100 10 1 X X. 5 10 15 Fi l t e r W i d t h •4 taps • 8 taps A12 taps X16 taps 20 Figure 4.14: M a g n i t u d e o f overshoot increases w i t h the size o f the equa l i z ing filter designed w i t h the L P method. U p p e r panel shows this trend w i t h the number o f filter coefficients as x axis . L o w e r panel shows this trend w i t h the filter w id th as x axis , g iven filter length. A l l s imulat ions are done w i t h the same parameters as i n figure 4 .11 . 57 T h e lower panel o f figure 4.14 shows that for a g iven filter length, magni tude o f overshoot increases w i t h the filter w id th . Whereas , for filter widths less than 6, the magni tude o f the overshoot doesn ' t f o l l o w this trend w i t h the filter length. Thus , the filter w i d t h p lays a more c r i t i ca l role than filter length i n the over-fi t t ing p rob lem. T h i s c o u l d be exp la ined by the fact that wires further away produce less disturbances on the w i re than history bits o n the w i re itself. T h u s w i t h a longer filter, the op t imize r c o u l d easi ly push the eye height up wi thout putt ing more energy into the filter. W h e n the filter length is l o n g enough to cover most s ignif icant por t ion o f the bus impulse response, this trend reaches its l i m i t . A s w e can see f rom figure 4 .11 , at 300 ps bit t ime, filters w i t h length 8 taps already do a very g o o d j o b i n cross-talk cancel la t ion. C o m p a r e d w i t h 12 tap filters, 16 tap filters don ' t s ignif icant ly improve the eye height. M o r e o v e r , for the bus considered, the improvement o f eye height by increas ing filter w id th also stops w h e n the filter w id t h is more than 4 for the bus cons idered (see 4.11) . So , very l o n g and w i d e filters w o n ' t b r ing more benefit, yet the filter becomes more and more compl ica ted and expensive. In this sense, over-fi t t ing p rob lem o f L P method m a y not be a serious p r o b l e m i n practice. Fur thermore , instead o f t ry ing to b r ing 1 tap as c lose to 1 as poss ible , the L P method can be easi ly formulated to b r ing 2 taps as close to 1 as poss ible . B y d o i n g this, sharp tran-sitions are avoided hence the amount o f overshoot is decreased. In pract ice, this method works i n decreasing the severity o f the over-fi t t ing p rob lem o f the L P method w h e n des ign-ing large filters. 4.5.4 Minimum bit time To s imp l i fy des ign and yet achieve reasonable cross-talk cancel la t ion , an important question is how many l ines away should be considered when des igning the equa l i z ing filter. In other 58 Taps Width Minimum bit time (ps) L P L S Q 4 1 679 684 4 2 338 575 4 3 332 551 4 4 298 390 4 6 298 390 4 8 240 390 4 16 228 400 8 1 679 684 8 2 240 343 8 3 234 343 8 4 228 343 8 6 200 343 8 8 200 343 12 1 679 684 12 2 234 297 12 3 200 230 12 4 200 228 16 1 679 684 16 2 234 297 16 3 200 230 0 0 740 Table 4 .1 : Performance o f equa l i z ing filters w i t h different sizes for a bus 32-bits w i d e and 5 c m long . words , what 's an appropriate w id th for the filter. A n o t h e r quest ion is h o w many taps (filter length) should w e use. O b v i o u s l y , the longer the filter, the better the noise cancel la t ion , but the more expensive the design. Table 4.1 shows s imula t ion results w i t h the filter length and w i d t h var ied . To eval -uate the performance o f an equa l i z ing filter, the m a x i m u m operat ing frequency ( m i n i m u m bit t ime) at w h i c h the height o f the eye is around 5 0 % and eye w i d t h is over 2 5 % is used. In these s imula t ions , design parameters are the same as i n figure 4.10. Table 4.1 shows that: 59 • E q u a l i z i n g filters designed w i t h both the L P method and the L S Q method effectively i m p r o v e the m a x i m u m bit rate o f the bus. • E q u a l i z i n g filters designed w i t h the L P method have better performance than equal-i z i n g filters designed w i t h the L S Q method for every conf igurat ion considered. A s discussed further below, the advantage for the L P method is most p ronounced for w i d e filters. • A n 8 x 2 filter is a good cho ice i n terms o f performance and cost. A l t h o u g h A n 8 x 3 filter does improve the eye height at lower bit rate (see figure 4.11) , it has s imi l a r m i n i m u m bit t ime as the 8 x 2 filter. N o t e that w id th = 1 is separate pre-emphasis for each l ine . W i t h w i d t h = 1, L S Q and L P have s im i l a r performance. Because the focus o f this w o r k is on cross- talk cancel la t ion , high-frequency attenuation caused by sk in effect is not bui l t i n the bus m o d e l . Because o f this, the performance o f the system without filter (wid th = 0) and systems w i t h pre-emphasis filters (wid th = 1) are s imi l a r (740 ps vs. 679 ps). W i t h cross-talk cancel la t ion (wid th > 1), the performance o f the bus is greatly i m p r o v e d (2.7 to 3.4 t imes higher bit rate than independent pre-emphasis) . W i t h w id th > 1, L P is s ignif icant ly better than L S Q . T h i s might not be a comple te ly fair compar i son because a l l the L P results were obtained w i t h an addi t ional smooth ing filter at the receiver end whereas the L S Q results were obtained wi thout any smooth ing filters. T h e L S Q method can be easi ly appl ied to the system w i t h a smooth ing filter at the receiver end. However , the L S Q method w i t h smooth ing is no better than L S Q w i t h increased number o f taps. F o r example , 8 tap filters designed by the L S Q method w i t h 3 tap smooth ing filter is no better than 12 tap filters designed by the L S Q method wi thout any smooth ing filter. Table 4.1 shows that the performance o f the L P method is better than this upper bound o f the performance o f the L S Q method w i t h 60 Taps Width Minimum bit time (ps) 20 cm bus 50 cm bus 4 1 2354 6627 4 2 1419 2510 4 3 1341 2446 4 4 1264 2432 4 6 1264 2432 4 8 1108 2432 8 1 2295 6005 8 2 1030 2393 8 3 932 2198 8 4 874 2198 8 6 835 2042 12 1 2295 6005 12 2 932 2315 12 3 797 2042 12 4 777 1963 0 0 2734 6807 Table 4.2: Performance of equalizing filters with different sizes for buses 32-bits wide. All filters designed using the LP method. smoo'thing filter. It appears that LP and LSQ are comparably well suited for designing filters for independent pre-emphasis. However, LP is much better for cross-talk cancellation. The table also shows that cross-talk cancellation is essential to obtain high bit rates with wide buses. Table 4.2 shows simulation results with the filter length and width varied for buses 20 cm long and 50 cm long. Again, the performance of equalizing filters are quantified with the minimum bit time at which the height of the eye is around 50% and eye width is over 25%. All filters are designed with the LP method. 61 4.6 Time-variant Linear FIR Filter A bit t ime consists o f several tap t imes. In most examples presented previous ly , there are 4 taps per bit. A m o n g them, the first tap and the last tap are bi t- t ransi t ion taps. T h e other two are stable taps. T h e receiver samples stable taps and is insensi t ive to the input value at bi t- transi t ion taps. T h e F I R filter designed above has no in format ion about bit t ransit ion. It treats every tap the same, no matter whether it is a bi t- transi t ion tap or a stable tap. W h a t i f we treat them differently and let the filter have the knowledge o f bi t t ransi t ion? T h i s leads to the des ign o f t ime-variant l inear F I R filter. T h e idea is to assign a set o f filter coefficients for each tap per bit. F o r example , for a 8 x 3 F I R filter and 4 taps per bit, the t ime-variant l inear F I R filter contains 4 sets o f 8 x 3 filter coefficients. B y d o i n g this, the op t imize r can differentiate transit ion taps f rom stable taps, and assign different filter coefficients to their cor responding filter. T h e way this t ime-variant filter works is i l lustrated i n figure 4 .15. In the figure, different sets o f filter coefficients are indicated by different l ine style. T h i s p r o b l e m can be formulated into a l inear p r o g r a m m i n g p r o b l e m s i m i l a r l y as the o r ig ina l s imple l inear F I R filter design. T h e on ly difference between these two l inear pro-g r a m m i n g p rob l em formula t ion is the wa y that filter output is ca lcula ted . L e t f i r i , . . . , fir r denote the r set o f filter coefficient vectors. T h e t ime-variant l inear F I R filter coefficient vector: F = f i r 2 f i r r (4.60) 62 x x x x x x x x x x x x x x x x x If f \ f f \ f flf f l f f l f 1-1 I I I 1 I I I x x x x x x x x x x x x x x x x x I I I I I I I I I I I 1 I I I I I T I x x x x F I R 2 F I R 3 F I R 4 F igu re 4 .15: T h e convolu t ion procedure o f the t ime-variant F I R filter. Different sets o f filter coefficients are indicated by different l ine style. F o r a g iven input sequence input i n bi t t ime, the filter output is: filterOutput = shuffle in in in in (4.61) t0>Sx*bus where in = (input r ) x " w i t h n = \(nmputr + n f i r + n b u s ) / r ] • r, and shuffle e b u s xrnkbus j s t n e f 0 i i o w m g m a t r i x : shuffle(j, i) = < 1 i f (i d iv r) = (j m o d n/r) and (i m o d r) — (j d iv n/r) (4.62) 0 otherwise T h e correctness o f the t ime-variant F I R filter designed is checked by adding a set o f equal i ty constraints w h i c h specify that a l l four sets o f filter coefficients are equal . Af te r 63 adding the equality constraints, this design gives the same set of filter coefficients as the simple FIR filter designed in section 4.4, and the same objective value. So the time-variant FIR filter designed should be at least as good as the simple FIR filter designed in section 4.4. For example, in the case of 4 taps per bit, and same design parameters as for figure 4.10, time-variant 8x3 FIR filter has worst-case eye height 86% (vs. 80% for simple 8 x 3 FIR filter). The improvement of the eye height tells us it does help to assign different filter coefficients to different taps. However the benefit might not be large enough to justify any extra cost in an implementation. 4.7 Optimized Smoothing Filter The system structure shown in figure 4.7 naturally leads to the topic of optimized smoothing filter design. In previous sections, a smoothing filter which simply averages over 3 taps was used. Test results show that this is a good choice. However it is not optimal. For example, it is observed from those eye diagrams that weights assigned to those 3 taps shouldn't be the same. The first tap and last tap are closer to tap transition and should have smaller values. The middle tap contributes more to the eye height and should be assigned a larger weight. Moreover, there is no reason to limit the window size of the smoothing filter to only 3 taps. The system where the coefficients of both the equalizing filter and the smoothing filter are taken as variables is not linear because output values of the bus depend on the product of the coefficients of the two filters. Thus, a LP formulation is no longer possible in this case. The following simple strategy addresses this problem. Given a set of smoothing filter coefficients, a set of optimal equalizing filter coefficients and the resulting optimal objective value can be easily computed with the LP method presented previously. Thus the LP method can be treated as a function with smoothing filter coefficients as its variables and the objective value s as its return value. Then a general optimizer (in particular, finincon() 64 Filters Smoothing eye height at (tapsxwidth) Filter bit time 300 ps X X 7 % 8 x 3 N o smooth ing filter 7 4 % s imple averaging 8 x 3 over 3 data points 8 1 % 8 x 3 op t imized 3 tap w i n d o w 8 6 % 8 x 3 op t imized 9 tap w i n d o w 9 5 . 5 % Table 4 .3 : Performance o f different smooth ing filters w i t h 8 x 3 equa l i z ing filters designed by the L P method at 300 ps. p rov ided by M a t l a b O p t i m i z a t i o n T o o l b o x ) can be used to m i n i m i z e this funct ion on the variable space o f smooth ing filter coefficients. T h i s strategy doesn' t guarantee finding the op t ima l so lu t ion , actual ly it doesn' t even give an ind ica t ion about h o w c lose w e are to the op t ima l point . H o w e v e r it does f ind better smooth ing filter coefficients and equa l i z ing filter coefficients i n terms o f decreasing the objective value s and larger eye height (see table 4.3). Table 4.3 shows the effectiveness o f this approach. In this example , the bus is a 32-bit bus w i t h a length o f 5 c m . T h e electr ical parameters o f the bus are: r = 0.066 fi/cm, c = 0.8 p F / c m , / = 3.99 n H / c m , lm/l = 0 .31, c m / c = 0.23. Other s imula t ion parameters are: taps per bi t = 4, 7 % = 8, = 3. 4.8 S u m m a r y In this chapter, I descr ibed two methods o f l inear equa l i z ing filter des ign: the L S Q method corresponding to h metr ic and the L P method corresponding to /<» metr ic . T h e s imu la -t ion results presented i n section 4.5 demonstrate that the L P method outperforms the L S Q method and provides effective methods for des igning cross-talk cance l ing equa l i z ing filters that greatly increase the bandwid th o f high-speed d ig i ta l buses. B a s e d on the l inear F I R filter designs, t ime-variant F I R filter design and o p t i m i z e d smooth ing filter des ign were 65 presented. 66 Chapter 5 Predictor-Corrector Algorithm with Model Reduction As shown in chapter 4, an optimal filter design problem can be formulated into the following LP problem: mm{Fy\By<d} (5.1) In chapter 4, the LP method is introduced based on the assumption that the LP problem formulated could be solved. It turns out that the linprog() routine provided by Matlab does not converge when more than 7 bits are used to design the filter. In the case of large-scale problems, linprogO implements LIPSOL (Linear Interior Point Solver [21]), which is a variant of Mehrotra's predictor-corrector algorithm, a primal-dual interior-point method. It is known that when approaching the optimal solution, the system gets more and more ill-conditioned, which may eventually lead to non-convergence. In this chapter, I describe an approach that I implemented to overcome the ill-conditioning problem and can be used to solve problems where B is too large to be given explicitly. This approach employs Mehrotra's predictor-corrector algorithm along with a model reduction technique. 67 Sec t ion 5.1 introduces Mehro t ra ' s predictor-corrector a lgor i thm. T h e n , several ma-j o r issues i n implementa t ion are discussed. Sec t ion 5.3 is devoted to the m o d e l reduct ion technique that I used to overcome the p rob lem o f i l l - cond i t i on ing . A l t h o u g h the method in t roduced here is used to solve the l inear filter des ign (wi th B e x p l i c i t l y g iven) , it c o u l d easi ly be adapted to solve l inear programs w i t h B g iven i m p l i c i t l y , by us ing an iterative solver instead o f C h o l e s k y factorizat ion to solve the l inear systems encountered. 5.1 Mehrotra's predictor-corrector algorithm P r i m a l - d u a l inter ior point methods outperform the s imp lex method o n many larger prob-lems and per form better than other interior point methods [14]. A m o n g many general a l -gor i thmic approaches, the most effective one i n practice has proven to be the p r ima l -dua l infeasible- inter ior-point approach, i nc lud ing a number o f variants and enhancements such as Mehro t ra ' s predictor-corrector technique [13]. T h e M a t l a b funct ion linprog() i m p l e -ments an variant o f this a lgor i thm i n the case o f large scale problems . C o n s i d e r the L P p rob l em i n standard fo rm: m i n {cTx\Ax = b,x>0} (5.2) where A G R m x n , w h i c h determines the sizes o f other vectors i nvo lved . T h e dual p rob l em for equat ion 5.2 is m a x {bT\\ATX + s = c,s > 0} (5.3) It is w e l l k n o w n that p r ima l -dua l solutions o f equat ion 5.2 and 5.3 are character ized by K a r u s h - K u h n - T u c k e r condi t ions [14]: AT\ + s-c F{x,X,s)= Ax-b XSe = 0 (5.4) 68 where X = diag(xi,x2, ...,xn) S = diag(si,s2,...,sn) e = (1,1,-.., 1) T T h e sys tem o f equations 5.4 can be so lved by app ly ing N e w t o n ' s method and ca r ry ing out a l inear search to enforce the non-negat ivi ty constraints on x and s. Unfor tunately , often w e can on ly take a s m a l l step before the non-negativi ty constraints get v io la ted . Therefore, the pure N e w t o n ' s method w i t h l inear search converges very s l o w l y i n this case. Rather than s o l v i n g the sys tem o f equations 5.4, p r ima l -dua l interior point methods int roduce the concept o f a central path. T h e central path is parameterized by a scalar r , and consists o f a set o f points that are solutions o f the f o l l o w i n g l inear system for T > 0: A T \ + s - c i l F(xT, XT,sr) — Ax - b 0 XSe re {x,s) > 0 T h e role o f r is to enforce that a l l the complementar i ty products have the same values for a l l indices . Hence , the central path keeps iterates b iased towards the inter ior o f the nonnegative orthant (x, s) > 0 . A s r approaches 0, the so lu t ion o f the l inear sys tem 5.5 approaches the op t ima l solut ion (x*, A * , s*) w h i c h is the so lu t ion o f the l inear sys tem 5.4. In practice, r is defined as the product o f a centering parameter a and a complementar i ty gap fx, where a € [0,1] and /J, = xTs/n. Mehro t r a ' s predictor-corrector a lgor i thm [13][14] implements the basic ideas de-scr ibed above w i t h extra second-order correct ion. It consists o f three major steps. 69 Given an initial point (x°, X°,s°) with (x°, s°) > 0. Fork = 0,1,2,... Predictor step: At this step, it computes the pure Newton (affine-scaling) direction (Axa^, AX°ff, L\saff) by solving: 0 AT I L\xaff -rc A 0 0 L\Xaff = - n (5.6) Sk 0 Xk -XkSke where r c = ATXk + sk — c, = Axk — b are the residuals in primal and dual feasibility respectively. Adaptive approach to compute centering parameter a: This parameter is calculated in terms of the complementarity gap at the current point and the complementarity gap after a hypothetical step in the affine scaling direction is taken. The step size in the affine scaling direction is calculated by o ^ = m i n ( l , min -xk/8xf) iff o4=min(l , min -ski/5sf) (5.7) in i-.5SyJ<o Vaff =(x + alffL\xaff)T(s + adaffL\saff)h Then set the centering parameter to a = ( / i a j / / i ) 3 . Thus, the centering parameter is small when good progress can be made in the affine direction and large when the affine direction produce little improvement and more centrality is needed. This is chosen to trade off between the twin goals of reducing \x and improving centrality. A corrector step: solves the following equations to get a corrected, centered step direction. It is essentially a step based on the Taylor series expansion of the complementarity equations [15]. 70 0 AT I Axk A 0 0 AXk — Sk 0 Xk Ask -Tc. - n -XkSke - Diag(Axaff)Diag{Asaff)e + a\xe (5.8) C o m p u t e step size s imi l a r ly as i n equations 5.7, and update (xk+1, Xk+l, sk+1) = (xk, Xk,sk) + a(Axk, A A f c , Ask). 5.2 Implementation Details W i t h above f ramework, we s t i l l need to specify the f o l l o w i n g : (1) the in i t i a l poin t and the stopping cr i ter ia; (2) how to solve l inear systems 5.6 and 5.8. 5.2.1 Starting and Stopping T h e starting point select ion is based on [12]. M a t l a b lsqr() solver solves the system Ax = b and makes an in i t i a l estimate to the p r i m a l variable x, denoted as x. T h e n define £ i = m a x (— m i n i < j < n a T j , 100, || b || / 1 0 0 ) and £2 = 1 + || c ||, where || • | | i denotes L\ no rm. T h e n , for each j = 1 , . . . , n, set xQ- = m a x (x, (1) and s ° = m a x (£2, x'j). A t last compute A 0 g iven (x°,s°). Stopp ing cr i ter ion is the standard one [21]: I cTx - bTX I error + + < tol (5.9) m a x (1, || b ||) m a x (1, || c ||) m a x (1, | cTx |, | bTX |) where r^, rc are the residuals i n p r i m a l and dual feasibi l i ty. In this implementa t ion , tol is 1 0 " 6 by default. 71 5.2.2 Solving the linear systems T h e specia l structure o f the left-hand-side matr ix i n 5.6 and 5.8 a l lows us to reformulate them as systems w i t h posi t ive definite matrices [14]. F o r example , 5.6 can be reformulated into the f o l l o w i n g system: AD2ATAX = -rb + A{-S-lXrc + x-a^S-le) As = —rc — ATAX (5-10) Ax = -x + t7fiS-1e - S-lXAs w i t h D = S~ll2X1l2 and As = —s - X'^-SAx. C h o l e s k y factor izat ion is used to solve the first equat ion i n this l inear system. T h e n A s and A A are obtained. I f w e don ' t k n o w anything about A exp l i c i t l y except its d imensions , iterative methods can be used to solve this l inear system. 5.3 Ill-conditioning and Model Reduction E a c h iteration o f the predictor-corrector a lgor i thm involves s o l v i n g l inear systems whose left-hand-side mat r ix is the same as AD2AT i n equation 5.10. A s w e approach the op t ima l point , either Xj or Sj (for each j = 1 , . . . , n ) decreases to zero. T h u s the elements o f the d iagona l mat r ix D take on both huge and t iny values. F o r this reason, i l l cond i t ion ing often occurs dur ing the final stages o f the predictor-corrector a lgor i thm. In practice, this prevented linprogO f rom converging to a point that satisfied the error tolerance for the l inear programs ar is ing i n filter design us ing the L P formula t ion presented i n chapter 4. M y implementa t ion uses two techniques to handle this i l l cond i t ion ing : hopp ing to the op t ima l point and m o d e l reduct ion. • H o p over to the op t ima l 72 If we are really close to the optimal, the L P solver can hop over to the nearest vertex on the polytope boundary and check if it is optimal. The simplest thing to do is to set the smallest n — m components of a; to 0 and solve Ax = b for the remaining components of x. As we know that x variables in the primal form reflect the marginal costs of corresponding constraints in the dual form, Xj = 0 means the jth constraint in the dual form is non-essential (the optimal solution is not affected by perturbations of this constraint). So we solve the remaining part of ATX = c for A. Then we can get values for the slack variables s. After we get the solution, we need to check for its feasibility and optimality (i.e. make sure all non-negativity constraints on x and slack variables s are satisfied). I've tried this for linear FIR filter design problems. The solutions I got had negative x and s values. This suggests that although we are close to the optimal (system gets ill-conditioned), we are not close enough to be able to identify the optimal vertex. • Model Reduction Although we are not close enough to be able to hop over to the optimal solution di-rectly, we know that we are in a region close to the optimal vertex. In this region, we should be able to identify some of the non-essential constraints (although not all of them). Since x variables of the primal form reflect the marginal costs of correspond-ing constraints of the dual form, the XjS corresponding to these ready-to-identify non-essential constraints are very small and contribute to the ill-conditioning of the linear system. After we identify them, by setting those XjS to 0 and ignoring their corresponding constraints in the dual form, we reduce the original L P problem to a smaller size problem and with a smaller condition number. Then we solve the smaller problem as before. If the total error gets smaller than the tolerance, we stop. If sys-73 tem gets i l l - cond i t ioned , w e do m o d e l reduct ion again. W e keep d o i n g this t i l l either the total error is smal le r than the tolerance or a l l n — m non-essential constraints are identif ied. - E m p i r i c a l cr i ter ion for Xj be ing sma l l : I f Xj <\\ x || / ( 4 0 v / n ) , set Xj to 0 and label j t h constraint o f the dual f o r m as non-essential . T h e cho ice o f 1/40 was s i m p l y to make sure that Xj was re la t ively s m a l l . - W h e n to do m o d e l reduct ion? In this implementa t ion , when the l inear system gets i l l - cond i t i oned , m o d e l re-duc t ion w i l l be done. cond() p rov ided by M a t l a b is used to calculate the c o n d i -t ion number o f the matr ix . A default threshold 1 0 7 is used to determine whether m o d e l reduct ion is needed. In the case that we don ' t have the exp l i c i t f o r m o f H (r ight-hand-side matr ix) avai lable, w e have no w a y to calculate its cond i t ion number. H o w e v e r the relative residual o f iterative solver minres() is a g o o d i n -d ica t ion o f how i l l - cond i t ioned the system is. It appears that for a fixed number o f iterations, the more the system gets i l l - cond i t ioned , the larger the relative res idual is . - M o d e l reduct ion guesses are verif ied at the end by p lugg ing the final so lu t ion to the o r ig ina l p rob l em and check ing op t imal i ty requirements. 5.4 Results Suppose 12 bits are used to design an 8 x 3 l inear F I R filter for a 32-bit bus at 4 0 0 ps bit t ime. T h i s l inear F I R filter design p rob lem can be formulated as a L P p rob l em w i t h 768 inequal i ty constraints and 408 variables. It is naturally i n dual fo rm. In this sect ion, w e 74 Residuals: Primal Dual Duality Total Infeasibility Infeasibility Gap Relative Ax - b ATy + z - f T x z Error Iter 0: 2.74e+03 9.95e+01 3.96e+05 1.00e+03 Iter 1: 2.13e-09 5.82e+01 1.46e+05 4.11e+01 Iter 2: 5.15e-08 4.83e-02 4.39e+02 1.00e-00 Iter 3: l.lle-07 5.10e-04 3.64e+00 9.99e-01 Iter 4: 6.41e-04 1.87e-04 1.02e+00 4.77e-01 Iter 5: 1.27e-02 3.72e-05 3.40e-01 2.26e-01 Exiting: One or more of the residuals, duality gap, or total relative error has grown 100000 times greater than its minimum value so far: the dual ap-pears to be infeasible (and the primal unbounded). (The primal residual < TolFun=1.00e-08.) Table 5.1: UnprogO iteration display Residuals: Primal Dual Duality Total # Infeasibility Infeasibility Gap Relative constraints Ax - b ATy + z - f T X Z Error reduced IterO 0.02 10.8843 194.059 3.5854 Iter 1 0.0004 0.21769 7.7587 1.0515 Iter 2 8e-06 0.050589 1.871 0.9993 Iter 3 1.8542e-07 0.014951 0.5932 0.59672 Iter 4 8.1055e-08 0.0048963 0.2152 0.21635 Iter 5 4.5586e-08 0.00016507 0.089084 0.089123 Iter 6 8.3459e-09 7.076le-05 0.037025 0.037042 Iter 7 1.457e-09 1.9964e-05 0.011225 0.01123 Iter 8 4.3047e-10 7.1626e-06 0.0042428 0.0042445 Iter 9 1.4528e-10 3.0168e-06 0.0018633 0.001864 IterO 0.00028457 7.0108e-07 0.00055431 0.00083904 17 IterO 0.00054624 1.8242e-07 0.00020812 0.00075441 26 IterO 0.00015359 6.6702e-08 3.7658e-05 0.00019127 39 IterO 0.00026999 1.4576e-08 7.4835e-06 0.00027748 78 IterO 4.4736e-05 4.2838e-09 1.603e-06 4.634e-05 93 IterO 2.7846e-05 1.6497e-10 6.0702e-08 2.7906e-05 75 IterO 1.121 le-05 3.362e-12 1.2376e-09 1.1212e-05 13 Iter 1 2.2422e-07 6.7534e-14 2.2472e-ll 2.2424e-07 Table 5.2: Iteration display of our approach: Mehrotra interior-point method with model reduction. 75 show the results obtained w i t h linprog() and our approach. T h e linprog() i terat ion d isp lay is shown i n table 5.1. T h e linprogO routine doesn't converge on this p rob l em! Table 5.2 shows the i teration d isp lay o f the method presented i n this chapter. T h i s method does converge! There are 7 m o d e l reductions a long the way. T h e p rob lem size was reduced by 17, 26, 39, 78, 93 , 75 and 13 respect ively at each t ime. Al toge ther 341 constraints were reduced. 76 Chapter 6 Conclusions and Future Work T h i s thesis explores the effectiveness o f equa l i z ing filters i n cross-talk cance l la t ion for h i g h -speed, off-chip buses. It demonstrates that l inear p r o g r a m m i n g provides effective methods for des ign ing cross-talk cance l ing equa l i z ing filters that greatly increase the bandwid th o f h igh-speed d ig i ta l buses. F o r 5 c m long 32-bit w i d e P C B buses that are c lose ly spaced (75 fim w i d t h and 75 / j m separation), w i t h s imple fu l l - sw ing voltage s igna l ing method, system w i t h 8 x 2 equa l i z ing filters can operate at 4 . 1 G H z . Wi thou t equa l i z ing filters, such buses can on ly operate at 1 . 3 G H z , 1 . 4 7 G H z w i t h pre-emphasis but no cross-talk cancel la t ion . In this thesis, a coup led distributed R L C interconnect m o d e l is first constructed and val idated. B a s e d on the bus mode l , the first technique used to des ign equa l i z ing fil-ters is the least squares method. The least squares method produces equa l i z ing filters that greatly improve s ignal integrity and m i n i m u m bit rate at w h i c h signals can be received correctly. N e x t , because the w h o l e t ransmiss ion network is linear, the equa l i z ing filter de-s ign p rob l em can be formulated into a l inear p r o g r a m m i n g p rob l em w h i c h uses the Loo metr ic cor responding to the tradit ional eye height measurement o f s ignal integrity. E q u a l -i z i n g filters designed w i t h the l inear p rog ramming approach have better performance than 77 the filters designed with the least squares method (see section 4.5). Another advantage of the linear programming method over the least squares method is that it does not depend on pseudo-random sequences and guarantees worst-case performance. The simulation re-sults presented in chapter 4 show that the equalization technique is a promising method in cross-talk cancellation for high-speed buses. Moreover, scaling trends of the VLSI technology favor this approach. Long buses cost more and support lower data rates. The cost of the bus justifies added circuitry on the chip. The lower data rate provide more time for the filtering operations. Furthermore, improvements in chip fabrication are producing smaller and faster circuits for implementing the filter while buses remain big and slow. This also contributes to the favorability of adding more sophisticated equalizing filters. 6.1 Future work This work has demonstrated the effectiveness of equalization technique in cross-talk can-cellation for high-speed PCB buses by simulation. A natural and necessary work in the future is circuit design, PCB fabrication and test. Besides this, based on these results, in the future, the following ideas should be explored to further improve the performance of the equalizing filter design: • Figure 4.6 shows the impulse response of the bus considered in this thesis. The source end of the bus is terminated with resistors whose resistance equals the approximated input impedance of the bus: However the bus traces are not perfect LC lines. Its resistance, mutual inductance and coupling capacitance makes the input impedance calculation inaccurate. This is 78 shown i n figure 4.6 by the reflections o f the impulse response. Because o f the l i m i t i n filter length, these reflections degrade the performance o f the equa l i z ing filters. O p -t imiza t ion routines, such as M a t l a b ' s fininunc can be used to m i n i m i z e the ampl i tude o f reflections by tuning the input resistor. W i t h current A D C / D A C technology, more taps per bit can be imp lemen ted at over 1GHz bi t rate. It was observed that by increasing the number o f taps per bit, the performance o f equa l i z ing filters for longer buses can be i m p r o v e d substantially. T h i s issue should be investigated more thoroughly and systemat ical ly i n the future. skin effect Because the pr imary goal o f this thesis is cross-talk cancel la t ion , high-frequency at-tenuation caused by sk in effect was not taken into account i n the bus m o d e l . H o w -ever, sk in effect is one o f the major components that l i m i t the of f -chip s igna l ing above 1GHz [2]. It has been shown that pre-emphasis for serial l i nks greatly improves their bandwid th [2]. In the future, sk in effect should be incorporated into the bus m o d e l . T h e method presented i n this thesis can be used di rect ly w i t h the bus models that inc lude the sk in effect. Differential signaling Different ia l s igna l ing is c o m m o n l y used to achieve h i g h speed s igna l ing and improve s ignal integrity. T h e down-s ide o f the differential s igna l ing method is that it doubles the number o f wires . Is the differential s igna l ing method inherent ly the best one g iven 2k wires for t ransmit t ing k s ignals? Moreove r , can we i m p r o v e performance by adding m wires to a A>bit bus to transmit k s ignals? T h e l inear p r o g r a m m i n g method for equa l i z ing filter design can be adapted to answer these questions. Multi-level signaling 79 M u l t i - l e v e l s igna l ing has been a great deal o f recent interest. It uses mu l t i p l e voltage levels and hence has lower fundamental frequencies than the s imp le b inary s igna l ing at the same data rate. M a n y off-chip c o m m u n i c a t i o n l inks e m p l o y mul t i - l eve l s igna l -i ng to achieve higher performance w i t h l im i t ed bandwidth . M e t h o d s presented i n this thesis can be easi ly adapted to design equa l i z ing filters for mu l t i - l eve l s igna l ing . • Non-linear filter design Because filters are relat ively narrow, look-up tables may be a s imp le and pract ical i m -plementat ion. B y exp lo i t ing the symmetr ies o f the bus, these tables can be made quite compact . There is no reason that the table entries must correspond to l inear c o m b i n a -tions o f the inputs. Thus , non-l inear filters may be easy to implemen t and they may be able to handle the apparently rare worst-case input patterns more effectively. I started to explore this idea. F o r example , each entry o f a l ook-up table can correspond to the filter output for a part icular type o f input. Input types can be d i v i d e d accord ing to the number o f transitions. G i v e n an input type on a w i re at the current bit, input types on its adjacent wires and preceding bit t imes are constrained. T h e l inear equa l i z ing filter des ign suggests long history and few future bits need to be cons idered i n order to get a good filter. T h e cha in effect o f the input type constraints makes the non-l inear filter des ign imprac t i ca l to find the worst-case input. U s i n g more sophist icated op t imiza t ion method may solve this and is a topic for future work . 80 Bibliography [1] P.M. Crespo and M.L. Honig. Pole-zero decision feedback equalization with a rapidly converging adaptive IIR algorithm. IEEE Journal of Selected Areas in Communica-tions, 9:817-829, 1991. [2] WJ. Dally and J.W. Poulton. Transmitter equalization for 4-GBPs signaling. IEEE Micro, 1:48-56, 1997. [3] WJ. Dally and J.W. Poulton. Digital Systems Engineering. Cambridge University Press, 1998. [4] A. Fiedler, R. Mactaggart, J. Welch, and S. Krishnan. A 1.0625Gbps transceiver with 2x-oversampling and transmit signal pre-emphasis. In Proc. ofISSCC97, pages 238-239,1997. [5] NJ . Higham. Accuracy and Stability of Numerical Algorithms. SIAM, 1996. [6] M.L. Honig, P. Crespo, and K. Steiglitz. Suppression of near- and far-end crosstalk by linear pre- and pose-filtering. IEEE Journal of Selected Areas in Communications, 10:614-629, 1992. [7] M.L. Honig, K. Steiglitz, and B. Gopinath. Multichannel signal procesing for data communications in the presence of crosstalk. IEEE Transactions on Communications, 38:551-558, 1990. [8] M . Horowitz, C. Ken, and S. Sidiropoulos. High speed electrical signalling: Overview and limitations. IEEE Micro, 18:12-24, 1998. [9] L. Jackson. Digital Filters and Signal Processing. Kluwer Academic Publishers, 1996. [10] H. Johnson and M. Graham. High-Speed Digital Design: A Handbook of Black Magic. Prentice Hall, 1993. 81 [11] L . L u and V . U n g v i c h i a n . Cross ta lk versus interl ine space i n ul t ra h igh speed d ig i ta l P C B s . In IEEE International Symposium on EMC, pages 6 2 9 - 6 3 4 , 1998. [12] I. J . L u s t i g , R . E . Mars t en , and D . F . Shanno. O n imp lemen t ing mehrotra 's predictor-corrector inter ior point method for l inear p rog ramming . S1AM Journal on Optimiza-tion, 2 :435 -449 , 1992. [13] S. Mehro t r a . O n the implementa t ion o f a p r ima l -dua l inter ior point method. SIAM Journal on Optimization, 2 : 5 7 5 - 6 0 1 , 1992. [14] J . N o c e d a l and S. W r i g h t . Numerical Optimization, pages 3 9 5 ^ - 1 7 . Spr inger Series i n Operat ions Research , Spr inger Press, 1999. [15] V . R i c o - R a m i r e z and A . W . Westerberg. Interior point methods on the so lu t ion o f condi t iona l models . Techn ica l report, Carneg ie M e l l o n Univers i ty , 1997. [16] J . S a l z . D i g i t a l t ransmission over cross-coupled l inear channels . Bell System Technol-ogy Journal, 64 :1147 -1159 , 1985. [17] V . S to janovic , G . G i n i s , and M . A . H o r o w i t z . Transmi t pre-emphasis for high-speed t ime-d iv i s ion -mul t ip l exed ser ia l - l ink transceiver. IEEE Transactions on Communica-tions, 38 :551 -558 , 2001 . [18] S . K . Tewksbury . Microelectronic Systems Interconnections. I E E E Press, 1995. [19] C K . Y a n g , V . Sto janovic , S. Modj tahed i , M . A . H o r o w i t z , and W . F . E l l e r s i c k . A ser ial-l i n k transceiver based on 8-GSamples / s A / D and D / A converters i n 0 . 2 5 - ^ m C M O S . In IEEE International Conference on Communications, pages 1934-1939 , 2002. [20] T . S . Y e o , C . S . N g , M . S . L e o n g , and P S . K o o i . Interline c o u p l i n g o f ul tra-high-speed pulse propagat ion on P C B . IEEE Transactions on Electromagnetic Compatibility, 35(3) :401-404 , A u g u s t 1993. [21] Y . Z h a n g . S o l v i n g large-scale l inear programs by interior-point methods under the M A T L A B environment . Techn ica l Repor t T R 9 6 - 0 1 , U n i v e r s i t y o f M a r y l a n d , Ju ly 1995. 82
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Equalizing filter design for cross-talk cancellation
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Equalizing filter design for cross-talk cancellation Ren, Jihong 2002
pdf
Page Metadata
Item Metadata
Title | Equalizing filter design for cross-talk cancellation |
Creator |
Ren, Jihong |
Date Issued | 2002 |
Description | As interconnect line width and spacing decreases and operating clock rate increases, interconnect has become a bottleneck in developing high-speed integrated circuits, multichip modules, printed circuit boards, and systems. With small line spacing, mutual capacitance and inductance approach the level of self-capacitance and inductance, and can severely degrade signal integrity. The well-known equalizing filter method can significantly improve signal integrity. This thesis explores the effectiveness of equalizing niters in cross-talk cancellation for high-speed, off-chip buses. It demonstrates that linear programming provides effective methods for designing cross-talk canceling equalizing filters that greatly increase the bandwidth of high-speed digital buses. |
Extent | 3156859 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
FileFormat | application/pdf |
Language | eng |
Date Available | 2009-09-30 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
IsShownAt | 10.14288/1.0051448 |
URI | http://hdl.handle.net/2429/13450 |
Degree |
Master of Science - MSc |
Program |
Computer Science |
Affiliation |
Science, Faculty of Computer Science, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2002-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-ubc_2002-0543.pdf [ 3.01MB ]
- Metadata
- JSON: 831-1.0051448.json
- JSON-LD: 831-1.0051448-ld.json
- RDF/XML (Pretty): 831-1.0051448-rdf.xml
- RDF/JSON: 831-1.0051448-rdf.json
- Turtle: 831-1.0051448-turtle.txt
- N-Triples: 831-1.0051448-rdf-ntriples.txt
- Original Record: 831-1.0051448-source.json
- Full Text
- 831-1.0051448-fulltext.txt
- Citation
- 831-1.0051448.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0051448/manifest