UBC Theses and Dissertations

An integrated low-cost functional tester for CMOS logic Low, William 1993


AN INTEGRATED LOW-COST FUNCTIONAL TESTER FOR CMOS LOGIC

By

William Low

B. A. Sc., University of British Columbia, Canada, 1991

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE

in

THE FACULTY OF GRADUATE STUDIES
ELECTRICAL ENGINEERING

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
1993
© William Low, 1993

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

(Signature)

Department of Electrical Engineering
The University of British Columbia
Vancouver, Canada

Date: Dec

DE-6 (2/88)

Abstract

This thesis focuses on improving the quality of tests performed by low-cost testers for Very Large Scale Integration (VLSI) chips. The testing of timing parameters becomes increasingly important with higher performance technology. Circuits that operate correctly at low speeds may fail at higher speeds because of timing problems. Most low-end test systems lack the performance required to conduct the timing diagnostics needed to perform more advanced testing of modern VLSI chips.

Areas of improvement in low-cost test systems include increasing vector memory, test speed, or I/O timing and wave formatting capability. Increasing I/O timing and wave formatting capability provides a good compromise of improved tester functionality at a reasonable cost.
These features provide considerably more improvement in test quality than simply increasing the raw test speed (or test pattern rate).

A test strategy was developed that exploits this improved timing and I/O waveform formatting capability to generate short high-speed clock bursts which can be used to perform testing at speeds equivalent to a much higher rate than the tester's pattern rate. This strategy can be used to test many digital designs. However, using this strategy for "high-speed" testing requires more effort in test pattern development and more test vectors than an equivalent test performed on a high-speed tester. This approach can also be used in existing high-end testers that already have the required timing and waveform formatting capabilities.

A functional tester system for CMOS logic was developed that provides these improved timing and I/O formatting features. This system integrates most of the tester circuits into modular functional tester chips (FTCs) which are used in parallel to form a tester. Vector encoding combined with an internal lookup table was used to reduce the memory bandwidth required to support the increased functionality.

A single-channel FTC was designed and implemented on a 3.8mm x 2.9mm die and requires only 40 pins, which gives considerable allowance for future increases in memory width and channels. Measurements from earlier prototypes of the FTC waveform formatter give worst case timing resolutions of 1ns and maximum rise-fall skews of 1-2ns. Worst case skews between devices were on the order of 1ns.
These results are very reasonable considering the relatively conservative standard cell design using 1.2µm CMOS.

Table of Contents

Abstract
List of Tables
List of Figures
Acknowledgements
1 Introduction
  1.1 Background on VLSI Testing
    1.1.1 Design for Testability
    1.1.2 IC Testers
  1.2 Thesis Objective and Organization
2 Functional Testing and System Specifications
  2.1 Functional Testing
    2.1.1 Waveform Driving and Sensing
    2.1.2 An Example
  2.2 Functional Tester System Specifications
    2.2.1 Limited High-Speed Testing
3 Architectures and System Design
  3.1 Tester Architectures
    3.1.1 Shared Resource Architecture
    3.1.2 Tester Per-Pin Architecture
  3.2 Exploiting VLSI Technology
    3.2.1 Systems Without Wave Formatting
    3.2.2 Systems With Wave Formatting
  3.3 System Design
    3.3.1 Functional Tester System Overview
    3.3.2 System Operation
    3.3.3 Functional Tester Chip
    3.3.4 Board Layout
    3.3.5 Pin Count and Vector Depth
    3.3.6 Speed and Bandwidth
4 Functional Tester Chip
  4.1 Overview
  4.2 Control
  4.3 Registers
  4.4 Memory Interface
  4.5 Clock Module
  4.6 Test Data Generation
    4.6.1 Format Memory
    4.6.2 Instruction Encoding and Decoding
    4.6.3 Pipelining
  4.7 Waveform Formatting and Timing
    4.7.1 Timing Generation
    4.7.2 Propagation Delay Sensitivities
    4.7.3 Formatting
    4.7.4 Dead-Zones
    4.7.5 Skew
  4.8 Input Comparison
  4.9 Pin Electronics
5 Implementation and Results
  5.1 Implementation
    5.1.1 FTC Area
  5.2 Performance
    5.2.1 Timing Resolution
    5.2.2 Timing Skew
6 Conclusions and Future Work
  6.1 Conclusions
  6.2 Future Work
Bibliography
A FTC Pin Description
B WFC Measurements
C FTC Implementation Details
  C.1 FTC Circuit Documentation
  C.2 FTC Timing
  C.3 FTC Circuit Schematics

List of Tables

2.1 Example test vectors for testing flip-flop
2.2 Example timing parameters for flip-flop
3.1 Characteristics of VLSI technologies
4.1 FTC Modes
4.2 Address space of FTC registers
4.3 Format memory word fields
4.4 Test data stored per byte for scheme A instruction distributions
4.5 Test data stored per byte for scheme B instruction distributions
4.6 Pulse generator 0 triggering conditions
4.7 Pulse generator 1 triggering conditions
4.8 Leading edge skew
4.9 Trailing edge skew
5.1 Bidirectional pads available from CMC's CMOS4S cell library
5.2 Simulated FTC edge placement times in ns (normalized)
5.3 Average WFC edge placement times for a rising edge at t1 in ns (normalized)
5.4 Rise-fall skew measurements
5.5 Rise-fall skew calculations
A.1 FTC pinout
B.1 Rising edge at t1
B.2 Falling edge at t1
B.3 Rising edge at t2
B.4 Falling edge at t2
C.1 Memory map of address space registers in the caregmod module
C.2 Input signals for chmod
C.3 Output signals for chmod
C.4 Control signals generated by instruction decoder
C.5 Control signals for executing instructions
C.6 Address decoding in modebits module
C.7 Truth table for generating SP0.D in pulsegen0 module
C.8 Karnaugh map for generating SP0.D in pulsegen0 module
C.9 Truth table for generating WP0.D in pulsegen0 module
C.10 Karnaugh map for generating WP0.D in pulsegen0 module
C.11 Truth table for generating SP1.D in pulsegen1 module
C.12 Karnaugh map for generating SP1.D in pulsegen1 module
C.13 Truth table for generating WP1.D in pulsegen1 module
C.14 Karnaugh map for generating WP1.D in pulsegen1 module

List of Figures

2.1 Waveform formats
2.2 Comparison methods
2.3 Flip-flop test example
2.4 Flip-flop test waveforms
2.5 Flip-flop formatted test waveforms
2.6 Generating clock waveforms
2.7 State machine model of static synchronous circuit
3.1 Functional block diagram of a generic tester
3.2 Generic channel electronics in systems without waveform formatting
3.3 MacTester channel multiplexing
3.4 Block diagram of DGR Architecture
3.5 Test system architecture
3.6 Pin driver/receiver placement
4.1 Block diagram of FTC
4.2 Address space example
4.3 Vector encoding
4.4 FTC pipeline
4.5 Functional block diagram of timing generator
4.6 Fine vernier implementations
4.7 Functional block diagram of waveform formatting circuit
4.8 Dead-zones
4.9 Functional block diagram of input comparison circuit
4.10 Comparison result output encoding
5.1 FTC chip layout
5.2 Delay line macrocell layout
5.3 Timing measurement locations
5.4 Simulated and measured delays
5.5 Interchannel skew measured for rising and falling edges at t1
5.6 Deskewed placement times for rising edges at t1
5.7 Rise-fall skew at t1
5.8 Rise-fall skew at t2
5.9 Rise-fall deskew circuit
C.1 Timing diagram showing the fetching of instructions to the decoding stage

Acknowledgements

I would like to take this opportunity to sincerely thank my supervisor, Dr. André Ivanov, for his guidance, support, patience, and encouragement throughout the course of this work. I must also thank Yuejian Wu, one of my supervisor's Ph.D. students, for originally suggesting this research project to my supervisor and for his many enlightening discussions on VLSI testing. I would like to thank Dr. Rick Hobson for taking the time to show me the operation of the IMS tester and for his thoughts on IC testers. I would also like to thank Jeffery Chow, Barry Tsuji, Peter Bonek, Robert Lam, Oswaldo Antezana, and Andrew Bishop for their contributions of various resources, and helpful discussions and suggestions. Special thanks goes to Dave Gagne for his help and patience in suggesting and providing solutions to my VLSI problems. The work reported here would not be possible without the facilities and resources for integrated circuit fabrication generously provided by the Canadian Microelectronics Corporation and Northern Telecom Limited respectively. I would like to thank the many faculty, staff, and students who made my experience at UBC interesting and pleasant.
Finally, I thank my family for their constant support and encouragement over the past two years.

Chapter 1
Introduction

Improvements in semiconductor technology, transistor densities, and manufacturing yields are providing more powerful Very Large Scale Integration (VLSI) technology at lower cost. Coupled with advances in computer-aided design (CAD) tools, the technology to produce fairly complex integrated circuits (ICs) has become more accessible. As a result, chipsets and Application Specific Integrated Circuits (ASICs) are being used more frequently to implement circuits that once required discrete components. Despite these improvements, VLSI design and fabrication remain an imperfect process. Consequently, testing continues to be an important step in the design and manufacture of VLSI chips that is necessary to ensure a working final product.

1.1 Background on VLSI Testing

The purpose of VLSI testing is to detect faulty chips. Faults in a VLSI chip may arise from many sources. For example, fabrication defects and contamination may cause faults such as shorts or opens, excessive leakage currents, etc. [Director90]. Packaging and subsequent handling may introduce additional faults.

Unfortunately, an IC containing faults may not reveal its defective nature very easily. In the past, ICs were much simpler and it was feasible to verify that a chip was good simply by exhaustively testing that it performed its specified function. This strategy, known as behavioral [Levitt92] (or functional) testing, does not scale well with the higher pin counts and integration densities of modern ICs [Levitt92]. Instead of exhaustively testing a chip as a single functional unit, exploiting the structure of the circuit elements within the chip can lead to significantly shorter pseudoexhaustive tests that are equivalent to the exhaustive tests [Abramovici90, page 305].
The detection of faults in a VLSI chip is accomplished by stimulating the chip inputs so that potentially faulty circuits inside the chip produce a result at the chip's outputs that indicates whether the circuit is operating correctly. Therefore, controllability and observability of a chip's internal circuit nodes are important for making a chip testable [Abramovici90, pages 343-358]. One of the main difficulties with VLSI testing is that nodes deep within a chip are not so easily controlled and observed from the available I/O pins. Much computer time is needed to generate test patterns (or test vectors) that could stimulate and observe these difficult to reach nodes to achieve a high coverage of potential faults. These problems continue to grow as researchers strive to achieve higher speeds and integration densities. Allowing defective parts to pass can be very expensive. It is often cited that the cost of finding a defective component increases by a factor of 10 for each level of assembly, i.e., from device to board level, system level, and field installation [Levitt92]. The recognition of these difficulties has caused IC designers to more seriously consider test issues and design for testability techniques [Williams83].

1.1.1 Design for Testability

Design for testability (DFT) techniques make an IC more testable by improving its controllability and/or observability. Today, many VLSI chips employ design for testability techniques to facilitate testing (e.g., Intel 80386 [Gelsinger87], Motorola 68340 [Bishop90], etc.). There are many DFT schemes available to a designer [Williams83, Levitt92]. Two of the more established techniques are scan-path design [Abramovici90, pages 358-382] [Eichelberger78] and built-in self-test (BIST) [Abramovici90, pages 457-539] [McCluskey81].

Scan design is a technique used to simplify the testing of sequential circuits.
This method uses special latches or flip-flops that provide an additional set of data and clock (or control) inputs. Essentially, these special memory elements provide "dual input ports." One set of inputs is used for normal circuit design. The other set of inputs is used to connect the memory elements into one long shift register (or scan chain). During normal operation, the regular latch inputs are used and the circuit performs its normally specified function. During test mode, the alternate latch inputs are used and the memory elements act as a shift register. Any internal state can be initialized by shifting the appropriate state bits into the memory elements. Similarly, any internal state can be observed by shifting the state bits out of the memory elements. Thus, complete controllability and observability is achieved. The problem of testing a sequential circuit is reduced to one of testing combinational logic and a shift register. The costs associated with this scheme are in the area required for the additional circuits, routing, and pins. In addition, the performance of the flip-flop may be degraded. The serial nature of this test strategy results in a relatively slow test.

The idea behind BIST is to provide additional circuits in a design that are capable of testing the design from within the chip. The term BIST generally includes a wide variety of test strategies that satisfy this criterion. The BIST circuits generate the test stimulus and evaluate the response of the circuit under test. Being inside the chip, these test circuits have the advantage of being better able to control and observe nodes that are difficult to reach from outside. Typically, pseudorandom techniques are used to efficiently generate test stimulus and to compress the test results [Bardell87]. Generally, a BIST test is performed by providing a specified number of clock cycles followed by a go/no-go comparison of the test result.
Testing performed under BIST is generally very fast. The main cost associated with BIST is the increased area.

CrossCheck [Chandra93] is a recently introduced ASIC DFT strategy based on a proprietary technology. This scheme employs an embedded test matrix to provide massive controllability and observability. Besides the limited availability of this technology due to licensing, a drawback associated with CrossCheck is the area overhead [Levitt92]. For gate bound designs, the test structures could occupy 25% of the core area while routing bound designs may experience an overhead of 12% to 20% [Chandra93].

There are many DFT strategies available. In practice, a design may employ a combination of formal and application specific DFT methods, e.g., Intel 80386 [Gelsinger87].

1.1.2 IC Testers

Testing an integrated circuit involves three major steps: i) the generation of suitable test patterns, ii) the application of these patterns to the device under test (DUT), and iii) the analysis of the test response from the DUT. The first step, test pattern generation, is usually CPU intensive and is dependent on the DFT strategy. These test patterns are usually prepared ahead of time, although some test systems provide algorithmic pattern generators for special applications such as memory testing [Kikuchi89].

The last two steps, test pattern application and test response evaluation, involve the use of an IC tester. Electrical tests applied to a DUT could be classified into parametric (or DC) tests and functional (or AC) tests [Feugate88]. Parametric tests measure the analog quantities (voltages and currents) at the pins of the DUT. Typically, these tests measure leakage currents, output voltage levels, and check for shorts and opens at the pins.
Parametric tests usually take place at fairly low speeds because of the relatively long settling times of the precision instruments involved and the slow switches used to connect these instruments to the DUT pins.

In the context of IC testers, the term functional test refers to the high-speed operating mode of a tester. It does not imply anything about the test methodology mentioned earlier. To avoid confusion, we will refer to the functional test methodology as behavioral testing. Functional tests verify the correct operation of a DUT. This involves driving the DUT inputs with the appropriate logic stimulus and comparing its outputs with the expected response. Preferably, test patterns are applied at the DUT's normal operating speed, which allows testing of timing critical parameters such as propagation delay, setup/hold times, and maximum frequency.

The set of tests to be performed on the DUT is usually described using a test language (TL). Support for generating TL instructions ranges from highly automated, that is, mostly generated by software (e.g., [Organ91], [Walter88]), to manual writing of fairly low-level code. ATLAS, an IEEE standard test language [ATLAS84], has been published, but it has not been widely adopted for IC testers.

VLSI testers are available with various capabilities and performance. The specific requirements for a test system depend much on its intended use. For example, in a manufacturing environment, test throughput is an important criterion in addition to other requirements such as timing and input/output (I/O) capability. In [Bassett90], IBM (International Business Machines Inc.) reports producing a tester with a parametric measurement unit (PMU) dedicated to each pin for the purpose of increasing test throughput.
The requirements for a design-prototyping environment are less demanding; however, test quality is an important issue in both environments.

With the increasing speeds and densities of VLSI chips, the testers that are capable of testing these high performance devices are becoming increasingly expensive and complex. High-end IC testers are large multi-million dollar machines with per-pin costs of $6000-$8000 [LaBuda90] [Bassett90]. Unfortunately, these high-priced testers that are needed to thoroughly test most VLSI chips are economical only for high volume users such as semiconductor manufacturers. Users without access to these testers could either rely on the chip manufacturer to perform the required tests, or use a more affordable, but inferior, lower-end test system. Contracting the manufacturer to perform production testing is fine, but it is less practical for design debugging and diagnosis where a high degree of interaction is needed with a tester.

Low-end IC testers typically cut costs by providing smaller pattern memories, lower test speeds, and minimal timing and I/O capability. A long test can usually be partitioned into several smaller tests that would fit into a small pattern memory. The reduced throughput because of pattern memory reloading is usually not a problem for prototype diagnostics. However, the remaining deficiencies degrade the test quality.

1.2 Thesis Objective and Organization

This thesis focuses on improving the quality of functional tests provided by low-cost VLSI test systems. We show that increasing timing and I/O capability provides greater improvements in test quality than simply increasing the test speed. To exploit these capabilities, a strategy to perform limited high-speed testing was developed. A simple tester architecture that supports these enhancements was designed.
A key component of this test system, the integrated functional tester chip, was designed and implemented.

The remainder of this thesis is organized as follows. Chapter 2 provides more detailed background on functional testing using IC testers. The requirements for improved test quality in a low-cost test environment are developed. We examine some of the capabilities and limitations of this functional tester system. In Chapter 3, we review conventional tester architectures and related work in this area. An architecture for the functional tester system is developed. The requirements of the Functional Tester Chip (FTC) are given. Chapter 4 gives an overview of the design and implementation of the FTC. Simulated and experimental results for part of the FTC implementation are presented in Chapter 5. Conclusions and directions for future work are given in Chapter 6.

Chapter 2
Functional Testing and System Specifications

2.1 Functional Testing

The characteristics of the patterns required to test a DUT depend on the test strategy and the DFT features used. For example, a chip that uses scan design typically has most test activity confined to a few pins on the device [Tsui87], whereas a device using BIST may only require a number of clock cycles to operate the test followed by a "go/no-go" comparison to check the results. In order to test a device, a functional tester provides an interface consisting of a number of channels (or pins) that connect to the DUT. Through these channels, test waveforms are applied and the DUT's responses are sensed.

2.1.1 Waveform Driving and Sensing

Waveforms to be generated by a tester are usually specified in segments of a given test period (or test cycle time). The length of the test period is programmable and usually remains fixed for the duration of a test. The speed of a tester is the frequency corresponding to the shortest test period that could be generated by the tester.
Since transitions in the applied waveform may not necessarily fall on test period boundaries, the waveforms generated for each test period are usually formatted [Feugate88] to allow more flexibility in edge placement. The logic levels of formatted waveforms can be programmed to change at programmable time intervals following the start of the test period.

[Figure 2.1: Waveform formats. a) Return low; b) Return high; c) Return complement; d) No return (retains value at end of previous period); e) Some combined waveforms. Each panel marks the times t1 and t2 within a test period and the vector value (1 or 0) for that period.]

Typical output formats provided by functional testers are shown in Figure 2.1a-d and described in the following [Feugate88]:

a) Return Low (RL): During the interval t1 to t2, the channel is driven with the logic level specified for the current test period. Outside this interval, the channel is driven with a low logic level.

b) Return High (RH): During the interval t1 to t2, the channel is driven with the logic level specified for the current test period. Outside this interval, the channel is driven with a high logic level.

c) Return Complement (RC): During the interval t1 to t2, the channel is driven with the logic level specified for the current test period. Outside this interval, the channel is driven with the opposite logic level.

d) No Return (NR): The channel is driven with the logic level at the end of the previous test period up to time t1. From t1 to the end of the current test period, the channel is driven with the logic level specified for the current test period. The t2 timing specification is not used.
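The four formats described above can be summarized in a short sketch. The Python below is only an illustration and is not part of the tester design: the function name `format_period`, the `(start, end, level)` segment representation, and the choice of nanoseconds as the time unit are assumptions introduced here.

```python
# Illustrative model of the RL, RH, RC, and NR output formats.
# Names and data layout are hypothetical; times are assumed to be in ns.

FORMATS = ("RL", "RH", "RC", "NR")


def format_period(fmt, value, prev_level, t1, t2, period):
    """Return the drive levels for one test period.

    The result is a list of (start, end, level) segments covering
    [0, period). `value` is the vector value for this period and
    `prev_level` is the level at the end of the previous period
    (only used by NR). `t2` is ignored for NR.
    """
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    if value not in (0, 1) or prev_level not in (0, 1):
        raise ValueError("logic levels must be 0 or 1")

    if fmt == "NR":
        if not 0 <= t1 <= period:
            raise ValueError("t1 must lie within the test period")
        # Hold the previous level until t1, then drive the new value.
        segments = [(0, t1, prev_level), (t1, period, value)]
    else:
        if not 0 <= t1 <= t2 <= period:
            raise ValueError("need 0 <= t1 <= t2 <= period")
        # Level driven outside the t1..t2 interval.
        idle = {"RL": 0, "RH": 1, "RC": 1 - value}[fmt]
        segments = [(0, t1, idle), (t1, t2, value), (t2, period, idle)]

    # Drop zero-length segments (for example when t1 == 0).
    return [(s, e, lvl) for (s, e, lvl) in segments if e > s]


if __name__ == "__main__":
    for fmt in FORMATS:
        print(fmt, format_period(fmt, 1, 0, 20, 60, 100))
```

In this sketch a tester without waveform formatting corresponds to calling `format_period("NR", value, prev_level, 0, None, period)` for every period, which yields a single segment per period.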
Figure 2.1e shows how formatted waveform segments are used together to form the desired waveform. The test period must be chosen so that it partitions the desired waveform into segments containing one or two transitions that can be reproduced by the waveform formatter. Low-end testers usually provide only the NR format. A tester without waveform formatting essentially provides NR with t1 set at the beginning of the test period.

Input comparison is performed using edge or window strobing (or comparison) [Feugate88]. Figure 2.2 illustrates the difference between the two comparison methods. Edge strobing samples the DUT output at one time instant (t1) within a test period while window strobing tests the DUT output over a time interval (t1 to t2) or window. As shown with the third comparison in Figure 2.2, edge strobing is more vulnerable to producing erroneous comparisons with unstable output signals. Window comparison is usually provided in higher-end testers because of the additional resources required to specify and generate the window. Generally, only one logic level could be checked per test period because only one edge or window strobe is available. Moreover, only one vector value is associated with each test period for output driving or input comparison.

[Figure 2.2: Comparison methods. a) Window comparison; b) Edge comparison.]

There are generally a number of constraints governing the relationship between the test period and the placement of t1 and t2. These constraints arise from hardware limitations and implementation issues that are tester dependent.
For example, one fundamental limitation of input window comparison is that any change of the expected value while the window is active will cause a comparison error.

2.1.2 An Example

As an example of functional testing, consider testing the flip-flop shown in Figure 2.3a. (The set and reset signals were omitted to simplify the example.) A test to determine whether the flip-flop could latch a logic 1 might consist of the test vectors given in Table 2.1. Each row under the vector column is a test vector. The test period column gives the period during which a test vector is active. The output vectors drive the flip-flop inputs while the compare vectors check the flip-flop outputs.

[Figure 2.3: Flip-flop test example. a) Truth table for flip-flop to be tested; b) Example flip-flop use.]

Test      Output     Compare
Period    clk   D    Q          Description
0         0     0    X          initialize with 0
1         1     0    0
2         0     1    0          load 1 (with 0 starting state)
3         1     1    1
4         0     1    1          load 1 (with 1 starting state)
5         1     1    1

Table 2.1: Example test vectors for testing flip-flop.

Before these test vectors could be used, more timing information about the Q output is needed. The difficulty lies in determining when to test Q for the latched D input. Q could not be checked at the rising edge of clk because of propagation delay. Checking Q just before the end of the clk period may pass most flip-flops, but the resulting test criterion is not stringent enough to guarantee the operation of common circuit configurations of the type shown in Figure 2.3b. Ideally, the test waveforms should adequately stress the timing requirements of the DUT.

Table 2.2 gives the timing parameters needed for a more comprehensive test of the flip-flop. Figure 2.4 gives the more detailed set of test waveforms derived using these timing parameters. The actual flip-flop output is given by Qo. The required flip-flop output is given by Qc. For clarity, the minimum clk pulse width was not used.
In using a slower clk speed, the waveforms also illustrate the situation that arises when a tester is too slow to test a device at its maximum speed.

Description                  Parameter   Typical (ns)
Propagation delay            tpd         31
Minimum clock pulse width    tw          21
Setup time                   tsu         14
Hold time                    th          0

Table 2.2: Example timing parameters for flip-flop (see footnote 1).

Footnote 1: Based on SN74HCT74 Dual D-Type Positive-Edge-Triggered Flip-Flops with Clear and Preset by Texas Instruments.

The timing used for the D input is designed to test the specified setup and hold times of the flip-flop. To test the setup time, the value to be latched is present at the D input only for a time of tsu. Although the hold time is zero, the D input could not change reliably at the same time instant as the rising edge of clk. To ensure that erroneous operation does not arise from the slight skew between D and clk, the D transition used to test the hold time takes place shortly after the rising clk edge. Testing the Q output takes place after the specified propagation delay, tpd. The actual propagation delay is probably less than this value. A guard band is needed between comparison regions of differing logic levels to allow the comparison hardware time to settle on the new value.

[Figure 2.4: Flip-flop test waveforms, showing clk, D, Qo, and Qc.]

The test waveforms in Figure 2.4 could be specified using the appropriate output formats and input comparisons defined in Figures 2.1 and 2.2 respectively. The first step involves choosing a test period. Setting the test period equal to the clk period would be ideal for generating the clk and D signals as shown in Figure 2.5a. However, performing the input comparison for Q in test period 1 presents a problem because both the high and low logic levels could not be checked in one test period. A solution to this problem is given in Figure 2.5b.
By choosing a test period equal to the clk phase, the applied waveform is still divided into segments containing only one or two transitions, but the output comparison now checks only one logic level per test cycle. Once the test period is established, the appropriate formats, timing, and vector values to be used in each test period could be chosen.

[Figure 2.5: Flip-flop formatted test waveforms. a) Test period equals clk period; b) Test period equals clk phase. Each panel lists the format and value used for clk and D in every test period and marks where the Qc comparison window is open.]

A complete test of a device may involve several functional tests because the tester may not have the resources (e.g., vector memory, timing generators, etc.) to complete the test in one operation. Each additional functional test requires reinitialization of some part of the tester. In a prototyping environment, the additional time required to reinitialize the tester is generally not important. However, in a manufacturing environment, this extra time reduces test throughput and leads to lost revenue. In [Bassett90], IBM reports using a tester with a 64M pattern memory just to avoid vector reloading when testing their level-sensitive scan-design (LSSD) devices.

2.2 Functional Tester System Specifications

Clock rate is a frequently quoted performance figure for IC testers. However, as shown in the example in the previous section, this figure alone is insufficient as a measure of test capability. Other factors, such as waveform formatting and comparison methods, are also important.
In developing the specifications for a more effective low-cost functional test system, we want to concentrate on the features that provide the most test capability while maintaining reasonable cost. There are three main parameters to consider:

1. Vector Memory - The main advantage offered by increasing the vector memory size is the accommodation of longer tests without memory reloading. However, long tests can usually be partitioned into several shorter tests. As long as a workable amount of memory is available, increasing this amount provides little gain in test capability. Large memories also increase the size of the test system and require more support hardware to implement.

2. Tester Speed - Increasing the speed of a tester provides some improvement in test capability, mainly in the form of higher edge resolution and increased test speed. The difficulty with this approach is that the entire test system must operate at a higher speed. The improvement in edge resolution does not seem to justify this additional cost. For example, a tester with a 40MHz test frequency can provide a 25ns test period, which is still fairly coarse for detailed timing tests. The memory system supporting this tester probably requires an access time of less than 25ns, which requires either expensive static RAMs or memory interleaving schemes.

3. I/O Timing and Formatting - Efforts to improve I/O timing and formatting yield the most improvement in test capability at a modest cost. As shown in the example in Section 2.1.2, I/O timing and formatting flexibility is essential for performing detailed timing tests. Section 2.2.1 elaborates on some of the possibilities of this test system. The main costs of this feature are the increased complexity and the additional memory width needed to encode the desired formatted waveforms. Increased complexity can be accommodated in a VLSI chip.
However, the increase in memory width reduces the potential channel density.

Providing a high degree of I/O timing and formatting flexibility seems to give the best enhancement in test capability at a reasonable cost. The specific requirements of the functional tester system (FTS) established here are:

• output waveform formatting (RC, RL, RH, and NR)
• input window comparison
• format and timing changes on-the-fly
• timing resolution should allow a wide selection of edge placements (perhaps a resolution of 1-3ns)

Other specifications include a reasonable memory size of approximately 16k vectors/channel, support for 48-64 channels, and a test speed of 5MHz. We now examine the capabilities and limitations of this test system based on these specifications.

2.2.1 Limited High-Speed Testing

Clock Signals

One of the most basic signals generated by a tester is a clock waveform. As shown in Figure 2.6, there are several ways in which a tester could generate clock signals. In Figure 2.6a, the clock signal is generated using a NR format. This example is representative of the capabilities of a low-cost tester with limited output formats and timing flexibility. Even if the tester allows delaying the edges by a fixed time, as shown in Figure 2.6b, the frequency of this clock is still only half the tester frequency because of the format used.

Once waveform formatting is provided, considerably more edge placement flexibility is available and clock speeds from 1 to 1.5 times the test frequency could be generated as shown in Figures 2.6c and d. Providing waveform formatting alone allows 50% higher speed tests to be performed. From a diagnostics point of view, the purpose of using a higher frequency clock is to reveal faults arising from clock skew, propagation delay, and other timing critical factors.
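The clock rates quoted for Figures 2.6a to d can be checked with a little arithmetic. The sketch below is only an illustration: it assumes that one full clock cycle consumes two output transitions, so that the clock frequency is half the number of transitions placed in a test period times the test frequency. The function name and the 5MHz constant (taken from the specifications above) are chosen for this example and are not part of the tester design.

```python
def clock_frequency_hz(test_frequency_hz, transitions_per_period):
    """Clock frequency obtainable from a given number of output transitions.

    Assumption for this sketch: one clock cycle needs two transitions
    (one rising edge and one falling edge).
    """
    return test_frequency_hz * transitions_per_period / 2


TEST_FREQUENCY_HZ = 5e6  # 5MHz test speed from the specifications

# 1 transition per period corresponds to the NR case of Figure 2.6a,
# 2 and 3 transitions to the formatted cases of Figures 2.6c and 2.6d.
for transitions in (1, 2, 3):
    f = clock_frequency_hz(TEST_FREQUENCY_HZ, transitions)
    ratio = f / TEST_FREQUENCY_HZ
    print(f"{transitions} transition(s) per period: "
          f"{f / 1e6:.1f}MHz clock ({ratio:.1f} times test frequency)")
```

Under these assumptions the three cases give 2.5MHz, 5MHz, and 7.5MHz clocks. The step from 5MHz to 7.5MHz is the 50% gain mentioned above, and its value lies in exposing the timing faults just described.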
These problems may not appear during low-speed testing when there is adequate time between clock edges to allow correct circuit operation.

Strategy for Limited High-Speed Testing

So far, 50% duty cycles have been assumed for the clock signals. If symmetric clocks are not required, it is possible to generate short bursts of high frequency clock cycles by exploiting on-the-fly format and timing changes as shown in Figure 2.6e. Each pair of high frequency clock pulses is followed by one slow clock cycle. The frequency of the fast clock cycles is determined by the edge placement resolution and the minimum pulse width that could be generated by the tester. The slow clock cycle has a frequency of approximately half the tester speed.

Figure 2.6: Generating clock waveforms. a) 0.5 times test frequency. b) 0.5 times test frequency. c) 1 times test frequency. d) 1.5 times test frequency. e) Asymmetric clock. f) Desired comparison. g) Actual comparison.

Generating these high-speed “bursty” clock and data signals enables a tester to perform potentially more stringent circuit tests. For example, consider testing a static synchronous ASIC design. Static synchronous designs are based on edge-sensitive, single-phase clocking schemes. All storage elements in the design are sensitive to either the rising or falling edge (not both) of a common clock signal. The clock period is longer than the settling time of the circuit. If the clock were to stop, the circuit would remain in its current state as long as power were maintained. An excellent description of this common design methodology is given in [Naish88].

The operation of a static synchronous circuit could be abstractly represented using a state machine as shown in Figure 2.7. A state transition could occur during each active clock edge.
A functional test for this circuit would require testing that each transition takes place correctly at the circuit’s operating speed. Since the circuit is static (i.e., its state is retained when the clock is stopped), the timing of each state transition is independent of other state transitions. The question to be answered through testing is: “From a given starting state, could a correct state transition occur using a particular clock period?”

Figure 2.7: State machine model of static synchronous circuit. a) Block diagram of circuit shown as state machine. b) State diagram of circuit showing fast (F) and slow (S) state transitions due to applied test vectors.

Figure 2.7b shows the state transition that occurs when the circuit is stimulated with bursty high-speed test patterns and clocking. Although one third of the transitions take place under a slow clock, additional test passes could be used to cover the slowly clocked transitions from the initial test. The key requirement of this scheme is that the circuit remain static. However, it may be possible to use this technique on dynamic circuits provided that the slow clock meets the minimal speed required for the circuit to retain its state.

Limitations

A limitation of many test systems, including the FTS, is the disparity between the capability of the output formatting and input comparison. In the FTS, up to three output transitions (1.5 clock cycles) could be generated per test period while only one input comparison could be performed. This problem arises because more timing resources are required to perform input comparison than are required to generate output waveforms. Using window comparison, six timing edges are required
to test the three output transitions (edge strobing requires three timing edges).

Fortunately, most tests do not require three comparisons per test period. If the three transitions are used to generate 1.5 clock cycles and one result is generated per clock cycle, then only two comparisons are required per test period as shown in Figure 2.6f. This requirement still exceeds the capability of the input comparison by one comparison. Discarding the comparison of the slow clock cycle results still leaves the problem of having two comparisons on every second test period.

Figure 2.6g gives a reasonable compromise to this difficulty. We allow full flexibility in the comparison of the first result from the fast clock. The result from the second fast clock period is compared after one phase of the slow clock. The result from this second compare is valid because the timing stress comes from the application of short clock periods. The correct circuit state should be retained as long as the correct results are latched by the circuit.

This test strategy can be used in many digital designs, but more effort is needed to develop suitable test patterns to exploit the high-speed clock bursts. A larger set of test vectors, compared to conventional testing, is also needed because multiple test passes are required to produce one “equivalent” high-speed test pass. The DUT operating under this irregular clocking will have different levels of power dissipation and current consumption from a DUT operating under continuous high-speed. Therefore, problems arising from these sources may not be detected using this test approach. Despite these difficulties, this test strategy at least allows a low-cost test system to perform some limited high-speed testing that would otherwise require a much more expensive test system.

Chapter 3
Architectures and System Design

The previous chapter reviewed functional testing and defined some common I/O formats provided by testers.
The desired specifications of the FTS, a low-cost test system, were given and the capabilities and limitations of this system were examined. In this chapter, we examine the architecture of an IC tester in more detail and we review some related work. We then present the system design of the FTS.

3.1 Tester Architectures

An IC functional tester is essentially a specialized parallel processing system with channels ranging from a few dozen to hundreds or even over a thousand (e.g., Hitachi [Kikuchi89], NTT LSI [Sudo87]). During each test period, a tester must apply data to the DUT pins or the DUT pin outputs must be sensed. It is this basic activity that a tester architecture must support with the appropriate organization and allocation of resources. Even with our modest tester requirements of 64 channels operating at 5MHz, the data rate required to support these operations is fairly high:

4 bits/channel (1) × 5MHz × 64 channels = 320Mbits/s

This figure does not include the additional bandwidth that may be required to store test results. Testers with sophisticated formatting, 250MHz operating speeds, and 512 pins clearly require data rates well into the high gigabit/second ranges. These high data rates are achieved through parallelism. The test vectors are stored in memory associated with the tester channels. During testing these vectors are delivered to the channels in parallel at high speeds.

(1) Each vector value is assumed to require 4 bits to encode. The actual system uses 8 bits to encode one vector value in the worst case (see Section 4.6.2).

Figure 3.1 shows a functional block diagram of a generic IC tester. Generally, an IC tester consists of the following key systems:

1. System Controller: This system controls and coordinates the overall tester operation. A general purpose microprocessor is usually responsible for low-speed operations such as the loading of test vectors, system configuration, and interface functions.
This microprocessor also controls a sequencer, which is a high-speed controller used during functional testing to provide the control signals at the speeds needed by the tester channels. Essentially the sequencer provides the addresses of the test vectors to be used in the current test cycle. In more sophisticated systems, the sequencer may be programmed to perform looping, decision branching, or other higher level functions. The microprocessor has little responsibility over the real-time test operations other than initiating and stopping the test.

2. Pin Electronics: The term pin electronics refers to the formatting, driver, and receiver circuits common to each tester channel. The interface electronics block shown in Figure 3.1 contains the driver/receiver circuits that interface with the DUT pins. More advanced pin electronics have programmable high and low voltage levels for output driving and input threshold comparison [Parker87]. In more expensive test systems, active loading may be used during input comparison to source or sink programmable load currents. Since these programmable voltages and currents cannot be rapidly changed amidst testing, they are fixed for the duration of a test. Depending on the tester architecture, other additional circuits common to each tester channel may be designated as part of the pin electronics.

3. Timing Generators: The timing generators provide the timing edges needed by the formatting electronics to define waveforms for I/O purposes. The sequencer may also use a timing generator to adjust the test period. More advanced timing generators can change their timing output on-the-fly [Feugate88].

4. Memory System: This system provides storage for test patterns, timing specifications (used by the timing generators), format specifications (used by the formatting electronics), and comparison results.
Large memory systems may require multiple banks of memory devices. Interleaving or other schemes may be required to provide higher access speeds.

The actual organization of these systems (especially 2-4) depends largely on the architecture and implementation of the tester. Tester architectures can generally be classified into shared resource [Feugate88] architectures and tester per-pin [Bisset83] architectures.

Figure 3.1: Functional block diagram of a generic tester.

3.1.1 Shared Resource Architecture

The shared resource architecture attempts to optimize the cost of a tester system through resource sharing. Usually timing generator signals are shared either by fixed groups of channels or dynamically distributed with a crossbar switching network. I/O channels cannot be shared, but they could be designed to be splittable into separate input and output channels. This feature provides the opportunity to more efficiently allocate I/O resources as a single bi-directional channel, or as separate input and output channels. Since bi-directional signals are often buses, further savings could be obtained by specifying the I/O direction for a group of channels, instead of on an individual basis. For example, I/O direction control in the HP E1451/1452 pattern module is given for groups of eight channels [HP E1451/E1452 91]. Some manual wiring is usually needed to prepare a fixture board that interfaces the appropriate tester channels to the DUT because the functionality of the channels could be quite inhomogeneous. To minimize cost, a shared resource tester system is assembled with the minimal resources that would meet all its anticipated test requirements. As test needs grow, hardware is added to the test system to meet these demands.
This architecture is popular for low-end testers because of its lower cost.

3.1.2 Tester Per-Pin Architecture

The tester per-pin architecture uses the opposite approach of the shared resource architecture. This architecture attempts to provide uniform functionality between all channels by allocating dedicated resources to each channel. Early studies of this architecture suggested that this approach is justified for its improved flexibility and productivity [Bisset83]. For example, test program development on such a system is simpler because resource (e.g., timing generators) allocation is no longer an issue. Hardware is also somewhat simpler because the complex crossbar switches used for resource sharing are not required, which reduces cost and simplifies the calibration of internal timing paths. However, the net cost of tester per-pin systems is still higher than that of shared resource systems because of the large number of replicated channel components. This architecture provides a great deal of flexibility that makes it attractive or even required in a manufacturing environment [Chang87]. The high cost associated with this architecture means that its use is mainly restricted to high-end test systems (e.g., IBM [Gruodis88], NTT [Sudo87]). Efforts have also been made towards lower cost hybrid systems (e.g., Texas Instruments [Fehr92]) that attempt to combine the best of the two approaches.

3.2 Exploiting VLSI Technology

To meet the specifications required by leading edge IC testers, manufacturers rely on high performance technologies such as emitter-coupled logic (ECL) to implement the high speed subsystems that are mainly comprised of the pin electronics. However, these technologies have low integration densities (see Table 3.1 from [Huber91]). Compensation for low integration densities is provided with aggressive packaging technologies (e.g.
multi-chip modules, hybrid) which reduce size and improve signal distribution.

Technology   Speed     Density   Power
CMOS         50 MHz    Highest   Lowest
TTL          100 MHz   Medium    Low
BiCMOS       50 MHz    High      Low
ECL          >1 GHz    Low       Highest
GaAs         >3 GHz    Lowest    Medium

Table 3.1: Characteristics of VLSI technologies.

In the past, pin electronics were custom designed by the tester manufacturer because commercial parts were not available for these special purpose functions. For similar reasons, this approach is still used in today's high-end systems. However, for less demanding mid-range and lower end systems, VLSI technology could be exploited to produce compact test systems. Research at Tektronix has produced a pair of CMOS chips that provide output formatting and input comparison with active loading [Branson89]. On-chip timing generators and DACs make these ICs fairly self-contained. Recently, Credence has reported using CMOS technology to integrate both input and output functions into a single device [Lesmeister91]. While this implementation does not provide dynamic loading or high/low voltage thresholds for I/O, the technology to integrate these functions is clearly available. A number of commercial pin electronics chipsets have also become available that provide active loading, pin driving, and comparison functions [Goodenough90]. These chipsets are lower density implementations that are more general purpose than the custom devices mentioned above.

CMOS technology is emerging as an attractive alternative where extremely high performance is not required. CMOS is attractive for its high density and low cost, but it is more sensitive to temperature and voltage variations than other technologies. Once compensation is provided for these problems [Chapman92], CMOS technology appears to be a serious contender for use in tester implementation [Branson89] [Lesmeister91].

Recently efforts have also been made at universities to exploit CMOS VLSI technology in the design of IC testers.
Industrial efforts mentioned earlier mainly emphasize highly functional, but lower channel density pin-electronics chipsets. Typically, one or more ICs are needed to support one tester channel. On the other hand, directions taken by university research emphasize a higher degree of system integration with less functionality. Typically, these systems integrate many functional tester channels into a single chip. Several chips are used in parallel to form a test system. The result of this research is several compact functional tester systems. The functionality provided by these integrated testers is quite different, but they could be classified into systems without waveform formatting [Miyamoto87] [Butner87] [McKenzie92] and systems with waveform formatting [Gasbarro89]. The architecture and features of these systems are briefly discussed in the next two sections.

3.2.1 Systems Without Wave Formatting

Test systems without waveform formatting are the simplest and can be built most compactly. In these systems, each test vector bit typically needs to be specified with a minimum of two bits: i) the output value or the expected DUT response, and ii) the I/O direction. Using external vector memory with this approach requires 3 device pins per channel (2 pins for i) and ii), and a single I/O pin for the channel itself). Figure 3.2a shows a block diagram of this channel. As shown in Figure 3.2b, several channels could be integrated into each chip. (Control signals were not shown to simplify the diagram.) Several chips are used in parallel to form a tester with the desired number of channels. Outputs are usually formatted using NR with t1 fixed. Edge strobing is usually provided
for input comparison.

Figure 3.2: Generic channel electronics in systems without waveform formatting. a) Functional block diagram of single channel. b) Functional block diagram of channel system.

Figure 3.2a shows two data pins supporting one I/O channel. In multichannel chips, the number of pins required to support each channel could be reduced by multiplexing the data inputs. Multiplexing is used in the MacTester [McKenzie89] [McKenzie92] to reduce the pin requirements to a manageable level (3 pins per 2 channels) for implementation using Xilinx XC3020 PLDs. (See Figure 3.3.) Each XC3020 provides 22 I/O channels. The main disadvantage of pin multiplexing is that several bus cycles are needed to transfer the data required for one test cycle. For the MacTester, the speed penalty incurred for multiplexing was acceptable because it was primarily designed to be sequenced by the host computer in real-time (on-line testing), which is already very slow. When operating independently of the host computer (off-line mode), 13 state machine cycles are required to process each test vector. This gives a vector rate on the order of 1MHz. The key advantage offered by the MacTester system is the high degree of programmability. The main disadvantage of this system is its slow operating speed.

Figure 3.3: MacTester channel multiplexing.

Miyamoto [Miyamoto87] used a different approach to reduce pin-count without compromising speed in their Data Generator/Receiver (DGR) tester chip. They integrate the vector memory directly in the DGR chip, which eliminates the large number of data pins normally required for I/O interfacing. Figure 3.4 shows the basic DGR architecture. A narrow external data bus (e.g., 8-bits) is used to initialize the relatively wide vector memory inside the DGR chip.
During testing, test vectors could be internally transferred between memory and pin electronics without width constraints. Similar to the MacTester, two bits are used to encode each test vector bit. The advantages of this architecture include: i) a higher operating speed because of the built-in RAM, and ii) a reduced pin count which yields compact packaging. The main disadvantage of this approach is the limited size of the on-chip vector memory. A prototype sixteen channel DGR was implemented with 256 vectors/channel in an 84-pin device.

Figure 3.4: Block diagram of DGR Architecture.

3.2.2 Systems With Wave Formatting

Gasbarro expands on Miyamoto’s approach by adding waveform formatting (NR, RH, RL, RC, and RT (2)), on-chip timing generators, input window comparison and hardware decompression to allow more vectors to be stored in an embedded memory [Gasbarro89]. Formats and timing are per-pin programmable, but they remain fixed for the duration of a test. This restriction reduces the number of bits needed to encode each test vector bit to four. The four data bits specify the test vector bit, the expected input bit, the I/O direction, and the relevance of the comparison result. Compression ratios on the order of 2.5:1 were achieved. (This figure accounts for the area occupied by the decompression hardware.) Compressed test vectors are stored using internal DRAM (dynamic random access memory). Sixteen channels were integrated into one 68-pin chip.

(2) The Return Tri-state (RT) format is similar to RH or RL (see Figure 2.1) except the logic level outside the interval t1 to t2 is tri-state (high-impedance).

3.3 System Design

In some ways, our tester requirements are similar to the systems discussed in the previous section. It is desirable to integrate many channels into one VLSI device, which facilitates the construction of a compact tester and helps to reduce cost.
Having a compact tester also makes it easier to maintain signal integrity across the tester-DUT interface because I/O driver pins could be located very near the DUT.

Our test system requires the I/O formatting functionality of the commercial pin-electronics chipsets with the additional system integration and density of the university developed testers. Besides implementation issues, one of the problems that needs to be addressed is the high bandwidth required to support on-the-fly formatting, timing and I/O changes. The remainder of this chapter presents the design of our functional tester system.

3.3.1 Functional Tester System Overview

The proposed architecture of the tester system is shown in Figure 3.5. The tester provides channels implemented using functional tester chips (FTCs). Four identical channels are integrated into each FTC. The test vectors required by a FTC are stored in its own vector memory in an encoded format. Higher test speeds may be achieved using an embedded memory to store the test vectors, but at this stage external memory is used to simplify the design. The FTCs are located as close to the DUT as possible to reduce transmission line effects.

The overall activity of the tester is controlled and monitored by the host computer through an interface provided by the tester system controller (TSC). The address space of the FTCs and the associated vector memories are accessed through the TSC. The TSC is also responsible for generating the control signals which specify the operating mode for the FTCs. For an IBM AT ISA (industry standard architecture) bus interface, a programmable logic device (PLD) should suffice for implementing the TSC.

3.3.2 System Operation

Timing specifications for a functional test are first written using a high-level test language which is compiled into a low-level instruction encoding for execution by the individual channels.
For initial prototyping, the encoded instructions could be manually derived (analogous to machine language programming). Timing measurements from the channels may need to be collected for calibration purposes. The FTCs are initialized using the calibration data to correctly set up the required formats and timing. Testing could then begin. Depending on the configuration (or implementation) of the TSC, the host computer could either wait for an interrupt or poll to determine the tester status. Testing could also be halted at any time by the host computer. The time required to initialize the test system depends on the speed of the interface between the host computer and the functional tester system.

Figure 3.5: Test system architecture (control, address, and data signals).

3.3.3 Functional Tester Chip

The FTC is the interface between the test system and the DUT. Each FTC contains the waveform formatters, input comparators, and timing generators needed for the four channels. In addition, the FTC includes a sequencer which controls and manages access to the external vector memory. Except for the sequencer, resources are dedicated to each channel. In this situation, using a per-pin approach is simpler because fewer external signals and chips are needed.

The FTCs mainly operate independently; however, some rudimentary communication with the TSC, and possibly with other FTCs, is needed. For example, a FTC may need to signal that testing has stopped because an error was detected or all the test vectors have been applied. Some control pins are also needed to initiate testing and to select other FTC operating modes. Access to internal registers can be provided by the memory interface. Chapter 4 discusses the design of the FTC in greater detail.

3.3.4 Board Layout

A standard method of placing pin drivers/receivers for an IC tester is shown in Figure 3.6.
This layout allows the maximum number of drivers to connect with the DUT while maintaining relatively constant distances. Maintaining constant driver to DUT distances is important for consistent signal propagation across the drivers.

In large test systems, the channel electronics (e.g., waveform formatting, pattern generators, etc.) are located further from the pin drivers because of their size and numbers. Special efforts are made to propagate signals without distortion between the channel electronics and pin drivers because of the relatively large distances that separate them.

In our integrated tester, the functions of the pin electronics and the waveform formatter are combined into one chip. This allows all the pertinent circuitry to be located very near the DUT. The signal distribution problem is simplified and the resulting test system is very compact.

Figure 3.6: Pin driver/receiver placement.

Our 64 channel tester system requires 16 ICs (4 channels per IC). Each IC has a footprint of 26mm × 26mm. If each IC requires a spacing of 5mm for routing purposes, the circle of FTCs will have a radius of:

$$\text{radius} = \frac{(\text{\# of chips}) \times (\text{footprint length} + 2 \times \text{spacing})}{2\pi} = \frac{16 \times (26\,\text{mm} + 2 \times 5\,\text{mm})}{2\pi} = 9.17\,\text{cm}$$

Therefore, a DUT to pin driver distance of 9.17cm could be achieved. Guidelines given in [Cascade91] suggest that the transition time of a signal should be greater than twice the time it takes to propagate down its length of stripline to maintain good signal integrity without severe ringing and other undesirable effects. This relationship is given in equation (3.1) below.

$$L \le \frac{T_r \times v_p}{2} \qquad (3.1)$$

where
L = physical stripline length
Tr = transition time
vp = wave velocity, with a relative permittivity of approximately 4.9 for FR4 printed circuit board (PCB) material:

$$v_p = \frac{c}{\sqrt{\varepsilon_r}} = \frac{3 \times 10^{8}\,\text{m/s}}{\sqrt{4.9}} = 1.355 \times 10^{10}\,\text{cm/s}$$

Using equation (3.1) with L=9.17cm shows that the proposed layout can be used for transition times of 1.35ns or more.
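The board layout arithmetic above can be reproduced with a short script. This is only a sketch for checking the numbers: the function names are invented for this example, the speed of light is rounded to 3 × 10^8 m/s as in the text, and equation (3.1) is taken in the form "transition time at least twice the one-way propagation time".

```python
import math


def ring_radius_mm(num_chips, footprint_mm, spacing_mm):
    """Radius of a circle whose circumference holds all chips side by side."""
    circumference_mm = num_chips * (footprint_mm + 2 * spacing_mm)
    return circumference_mm / (2 * math.pi)


def wave_velocity_cm_per_s(relative_permittivity):
    """Propagation velocity in the board material, c / sqrt(er)."""
    c_cm_per_s = 3e8 * 100  # speed of light, rounded as in the text
    return c_cm_per_s / math.sqrt(relative_permittivity)


def min_transition_time_s(length_cm, velocity_cm_per_s):
    """Smallest transition time allowed by equation (3.1)."""
    return 2 * length_cm / velocity_cm_per_s


radius_cm = ring_radius_mm(num_chips=16, footprint_mm=26, spacing_mm=5) / 10
vp = wave_velocity_cm_per_s(4.9)  # FR4
tr = min_transition_time_s(radius_cm, vp)

print(f"radius = {radius_cm:.2f}cm")
print(f"vp     = {vp:.3e}cm/s")
print(f"Tr    >= {tr * 1e9:.2f}ns")
```

Under these assumptions the script reproduces the 9.17cm radius, the 1.355 × 10^10 cm/s wave velocity, and the 1.35ns minimum transition time.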
This result should be more than adequate for our design.

3.3.5 Pin Count and Vector Depth

The pin count (or the number of channels) and vector depth are two parameters that generally scale well in terms of size and communication requirements. The main issues in increasing pin count deal with density and the maintenance of minimum driver-DUT distances. Some of the ways to increase the pin count for our test system are:

1. Increasing the size of the circle allows more pin drivers to fit while also increasing the driver-DUT distances. This approach may also require a larger printed circuit board (PCB).

2. Using smaller IC packaging will allow more pin drivers to fit around the same radius circle.

3. The footprint of the FTC could be significantly reduced by mounting it on a daughter card which is in turn mounted on the motherboard. This approach essentially reduces the footprint of the FTC to the thickness of a PCB plus the area required for routing.

There are limits to this scalability as high pin-count testers require more exotic packaging and circuit board technology to achieve the high density requirements. For our requirements of 48-64 pins, ordinary PCB and packaging technology should suffice.

The depth of the vector memory is another parameter that scales well. Currently, the tester memory system calls for one static random-access memory (RAM) chip. Relatively inexpensive static RAMs with densities in the 64-256k bit range should be adequate for our system. Using a single memory chip keeps the memory system small and simple. Multiple devices could be used to increase the vector depth, but a small memory system is sufficient to test the functionality of initial prototypes.

3.3.6 Speed and Bandwidth

Test systems that support on-the-fly format and timing changes require significantly more bits to encode test vectors than other simpler systems. The extra bits specify the formatting and timing information required during each test period.
Since each channel is likely to use only a small subset of the possible formats and timings, it is possible to reduce the width of a vector bit encoding by storing only an index to an on-chip table which in turn specifies the actual format and timing. This approach substantially reduces the bits needed to encode each test vector value, but at a slight reduction in flexibility.

While the width of the encoding is reduced in this scheme, it is still significantly more than the two to four bits used in the other integrated test systems. We restrict the width of the encoded vector values to a maximum of 8 bits. The relatively wide memory bus and high memory bandwidth required per channel make it impractical to integrate many channels into one FTC while maintaining a low pin count. Using a shared bus, it is possible to integrate four channels into one FTC at the cost of reducing the potential test speed by a factor of four. Alternatively, internal vector memory could be used to eliminate FTC I/O bandwidth and pin requirements, but the chip area would increase significantly and the size of the vector memory would be limited. Even systems with a two- or four-bit encoding use either multiplexing or embedded memory to achieve their high densities.

A tester must be fast enough to generate the desired waveforms and perform the required comparisons. For chips containing dynamic circuits, the tester must also meet a minimum "keep-alive" speed. The main constraint on test speed is imposed by the memory system. A 20 MHz operating speed requires a memory access time of less than 50 ns. There are well-known techniques for improving the access speed of a memory system, such as interleaving [Rau79]. These schemes should work well because test vector memory access is highly sequential.
However, these techniques increase the size, complexity, and cost of the memory system.

Chapter 4
Functional Tester Chip

This chapter discusses the design of the main components of the functional tester chip. Implementation details and schematics are given in Appendix C.

4.1 Overview

Figure 4.1 shows a block diagram of the Functional Tester Chip (FTC). The FTC consists of the following modules: control, registers, memory interface, clock, and channels. Each channel contains an instruction decoder, a format table, a waveform generator, and an input comparator. Although the original design supported four channels per FTC, the number of channels used in the actual prototype chip was reduced to one because of limited fabrication resources. The functionality of the FTC can still be evaluated with this one channel. The remaining sections of this chapter discuss the functional blocks of the FTC in more detail.

4.2 Control

The operation of the FTC is essentially controlled by three pins: halt and mode<1:0>. The mode and halt signals are intended to be connected in parallel to all the FTC chips. The halt signal behaves as an open-drain active-low signal and allows external devices to stop the FTC. While in the halt state, the tester channels are tristated. The values present at the mode pins determine the operating mode of the FTC. Changes between operating modes are synchronized with the clk input (falling edge). Table 4.1 gives the decoding of these mode bits. Possible future uses of the unused mode (01) include a "keep-alive" mode for maintaining the state of a dynamic circuit during pattern reload.
Figure 4.1: Block diagram of FTC.

Mode<1:0>   Mode Description
00          Calibration Mode
01          Unused
10          Program Mode
11          Test Mode

Table 4.1: FTC Modes.

The operation of the FTC under these modes is described in the following:

1. Test Mode: In this mode, the FTC is actively fetching and executing instructions from the vector memory. The FTC could leave test mode and enter a halt state for one of three reasons: i) end of instructions, ii) comparison error, or iii) external halt signal. Once the FTC enters the halt state, it can only recover by leaving test mode.
2. Program Mode: While in this mode, the tester channels are inactive and tristated. The FTC operates as a memory, allowing read/write operations on internal registers through its memory interface.
3. Calibration Mode: This mode facilitates calibration operations on the FTC. Tester channels are active in this mode, but test pattern data are not fetched from the pattern memory. Instead, the host computer directly writes data into the channel buffers. The contents of the channel buffers are repeatedly executed.

The behavior of the FTC under various error conditions depends on the state of the system bits, which are discussed in the next section.

4.3 Registers

A memory map of the FTC registers and status bits is given in Table 4.2. The system bits specify the behavior of the FTC during test mode. The tclk period is programmable from 4 to 259 times the clk period. The address wrap bits enable the vectors to continue at the wrap address. Otherwise, testing stops because a halt condition is reached. The stop on compare error bits enable a compare error to trigger a halt condition. The 3-bit synch select register selects the signals that would be multiplexed out of the synch pin for synchronization purposes. The address end and compare error status bits are set to indicate the cause of the halt condition.
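The mode decoding of Table 4.1 and the halt conditions described above can be summarized in a short behavioral sketch. This is an illustration only, not the FTC logic itself: the names `Mode`, `decode_mode`, and `halt_causes` are hypothetical, and the external halt pin (active-low on the chip) is represented here simply as a boolean that is true when halt is asserted.

```python
from enum import Enum

class Mode(Enum):
    """Operating modes selected by the mode<1:0> pins (Table 4.1)."""
    CALIBRATION = 0b00
    UNUSED = 0b01
    PROGRAM = 0b10
    TEST = 0b11

def decode_mode(mode_bits):
    """Map the two mode pins onto an operating mode."""
    return Mode(mode_bits & 0b11)

def halt_causes(at_end_address, compare_error, external_halt,
                wrap_enabled, stop_on_compare_error):
    """Return the reasons a channel would halt in test mode.

    An empty set means execution continues. The end address only causes a
    halt when the address wrap bit is clear, and a compare error only causes
    a halt when the stop on compare error bit is set.
    """
    causes = set()
    if external_halt:
        causes.add("external halt")
    if at_end_address and not wrap_enabled:
        causes.add("address end")
    if compare_error and stop_on_compare_error:
        causes.add("compare error")
    return causes

print(decode_mode(0b11))
print(halt_causes(True, True, False, wrap_enabled=False, stop_on_compare_error=True))
```

The returned labels mirror the address end and compare error status bits that record the cause of a halt.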
The functions of the remaining registers are described in the related sections.

4.4 Memory Interface

Access to the external vector memory is multiplexed between the four channels. The address space of each channel is defined by the start, end, and wrap addresses. Figure 4.2 illustrates the use of these registers. Vector decoding begins at the start address and continues sequentially until the end address is reached (arrow a). From the end address, decoding continues at the wrap address (arrow b) if wrap execution mode is enabled for the channel (i.e., the wrap bit is set). Otherwise, the FTC enters a halt state after the instruction at the end address has been executed. Under the wrap operating mode, execution continues until termination by a halt condition.

This organization allows initialization waveforms to be placed between the start and end addresses. Since the address space of each channel is completely configurable, the available pattern memory can be more efficiently distributed between the four channels.
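The start/end/wrap sequencing described above can be sketched as a small address generator. This is a behavioral illustration under stated assumptions, not the FTC's address counter: the function name is hypothetical, and the wrap address is assumed to lie between the start and end addresses.

```python
from itertools import islice

def vector_addresses(start, end, wrap, wrap_enabled):
    """Yield the vector memory addresses one channel decodes, in order.

    Assumes start <= wrap <= end. Decoding runs sequentially from start to
    end; from end it jumps to wrap if the wrap bit is set, otherwise it
    stops after the instruction at the end address.
    """
    addr = start
    while True:
        yield addr
        if addr == end:
            if not wrap_enabled:
                return      # halt after the instruction at the end address
            addr = wrap     # arrow b in Figure 4.2
        else:
            addr += 1       # arrow a in Figure 4.2

# Without wrap the channel runs once through its address space.
print(list(vector_addresses(0, 5, 3, wrap_enabled=False)))
# With wrap, addresses 0-2 run once and 3-5 repeat until a halt occurs.
print(list(islice(vector_addresses(0, 5, 3, wrap_enabled=True), 10)))
```

In the wrapped case the generator never terminates on its own, which matches the statement that execution continues until a halt condition ends it.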
Two channels on the same FTC that are producing the same waveforms could even share the same address space.

Address <10:6>  Address <5:0>  Description
00000                          System Space
                000000         tclk period
                000001         address wrap (0-4) / stop on compare error (0-4)
                000010         synch output select (5 bits)
                0001cc         buffer of channel cc during calibration mode
                001000         LSB of channel 0 error count
                001001         MSB of channel 0 error count
                001010         LSB of channel 1 error count
                001011         MSB of channel 1 error count
                001100         LSB of channel 2 error count
                001101         MSB of channel 2 error count
                001110         LSB of channel 3 error count
                001111         MSB of channel 3 error count
                010000         address end status (0-4) / compare error status (0-4)
00001                          Channel Address Space Bounds
                0cc000         LSB of channel cc wrap address
                0cc001         MSB of channel cc wrap address
                0cc010         LSB of channel cc end address
                0cc011         MSB of channel cc end address
                0cc100         LSB of channel cc start address
                0cc101         MSB of channel cc start address
00010                          Format Memory
                ccffbb         Addresses byte bb of format ff of channel cc.
                               Currently only <10:6>=00010 is used.
                               Byte ordering: 0 is LSB; 3 is MSB.

Table 4.2: Address space of FTC registers. (LSB = least significant byte; MSB = most significant byte)

Encoded vectors are read sequentially in round-robin fashion into a separate buffer for each channel. External memory is accessed only when a buffer is empty. Buffering allows idle periods in the bus cycle to be potentially used for other purposes. However, it is difficult to reliably exploit these idle periods because they are highly dependent on the encoding stream. Depending on the waveforms to be generated, these idle periods may not exist at all.

Figure 4.2: Address space example.

4.5 Clock Module

The clock module derives the test clock (tclk) and other clock related timing signals from the master clock (clk) input. The tclk period is programmable from 4 to 259 times the clk period. This provides coarse adjustment of the test period.
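As a small worked example of the coarse adjustment, the sketch below converts between a desired test period and a value for the tclk period register. The text only gives the 4 to 259 range; the mapping used here (an 8-bit register holding the multiplier minus 4) is an assumption made for illustration, and both function names are hypothetical.

```python
def tclk_register_value(clk_period_ns, test_period_ns):
    """Register value for a test period that is a multiple of the clk period.

    ASSUMPTION: the 8-bit register stores (multiplier - 4), which would give
    the documented range of 4 to 259 clk periods.
    """
    multiplier = round(test_period_ns / clk_period_ns)
    if not 4 <= multiplier <= 259:
        raise ValueError("test period must be 4 to 259 clk periods")
    return multiplier - 4

def tclk_period_ns(register_value, clk_period_ns):
    """Test period produced by a register value under the same assumption."""
    if not 0 <= register_value <= 255:
        raise ValueError("register value must fit in 8 bits")
    return (register_value + 4) * clk_period_ns

# A 25 MHz master clock (40 ns period) and a 400 ns test period.
value = tclk_register_value(40.0, 400.0)
print(value, tclk_period_ns(value, 40.0))
```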
Fine adjustment of the test period could be provided by changing the clk frequency externally. The test period is fixed for the duration of a test.

4.6 Test Data Generation

Test data must be provided during each test period to the formatting, timing, and comparison circuits. The generation of this test data is the responsibility of the format memory and instruction decoder.

4.6.1 Format Memory

The format memory specifies the set of formats and timings to be used for the current test. An index is associated with each format memory that identifies the active memory word. The data in the memory word is used in combination with the incoming test data bits to generate the formatted test data or comparison data. The contents and function of the fields in the format memory word are given in Table 4.3.

Position  Field Width  Field Function
1..0      2            Format (NR=11, RL=10, RH=01, RC=00).
2         1            I/O direction (Input=0 / Output=1).
3         1            Vector value for waveform formatting during input comparison.
4         1            Select alternate data source (Data generator=0 / Other=1).
Width Timing Generator
5         1            Delay line tap enable.
11..6     5            Delay line tap select (0 to 31).
12        1            Phase delay (phase delay=1 / no phase delay=0).
18..13    6            Synchronous delay counter (0 to 63 clk cycles).
Edge Timing Generator
19        1            Delay line tap enable.
24..20    5            Delay line tap select (0 to 31).
25        1            Phase delay (phase delay=1 / no phase delay=0).
31..26    6            Synchronous delay counter (0 to 63 clk cycles).

Table 4.3: Format memory word fields.

4.6.2 Instruction Encoding and Decoding

Since the timing, formatting, and I/O data are stored in the on-chip format memory, the simplest encoding scheme that allows on-the-fly format¹ changes is given in Figure 4.3a. This encoding simply specifies the test datum and the table entry used to format the datum.
Therefore, each test vector bit requires one byte to encode.

¹ To simplify the text, the term format will henceforth imply format, timing, and I/O direction.

Using the slightly more sophisticated encoding (scheme A) shown in Figure 4.3b, the available pattern memory could be more efficiently used. The FTC recognizes three 8-bit instructions: vector sequence, format, and repeat. The desired waveform is described using these instructions. The vector sequence instruction specifies 7 bits of test data. The format instruction specifies the table entry to use for formatting the test data. Starting with the data bit included in the format instruction, this new format is used on all subsequent test data until it is changed by another format instruction. The 5-bit format index allows up to 32 formats per channel. The repeat instruction repeats the last data bit from 1 to 64 times.

Figure 4.3: Vector encoding: a) minimal encoding; b) instruction encoding scheme A; c) instruction encoding scheme B. (Note: d = test data bit; i = I/O direction.)

The number of test bits stored per byte depends on the distribution of instructions used to encode a particular test waveform. Table 4.4 gives the test data per byte for some instruction distributions using encoding scheme A. Case I is a test containing mainly vector sequences. This case is representative of a test with few changes in I/O direction and/or fairly simple timing requirements. Case II is a test containing primarily format instructions. Having a high number of format changes indicates that the channel is either I/O intensive or requires complex timing. Case III is unlikely to represent a real instruction distribution except when used in conjunction with an algorithmic pattern generator in the FTC (not implemented).
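To make scheme A concrete, the following is a minimal software sketch of a decoder for the three instructions, together with the test-data-per-byte estimate used for the instruction distributions. It is not the FTC's decoder. The bit layout is an assumption based on the description of Figure 4.3b (bit 7 = 0 for a vector sequence with data in bits 6..0; bits 7..6 = 10 for a format instruction with the data bit in bit 5 and the format index in bits 4..0; bits 7..6 = 11 for a repeat with the count in bits 5..0). Two further assumptions are that vector sequence data are emitted most significant bit first and that the repeat field stores the count minus one. All function names are hypothetical.

```python
def decode_scheme_a(stream, initial_format=0):
    """Expand scheme A instruction bytes into (data_bit, format_index) pairs,
    one pair per test period. Bit layout and ordering are assumptions."""
    fmt = initial_format
    last = 0
    out = []
    for byte in stream:
        if byte & 0x80 == 0:
            # Vector sequence: seven data bits, assumed MSB (bit 6) first.
            for pos in range(6, -1, -1):
                last = (byte >> pos) & 1
                out.append((last, fmt))
        elif byte & 0x40 == 0:
            # Format: new table entry, applied starting with the included bit.
            fmt = byte & 0x1F
            last = (byte >> 5) & 1
            out.append((last, fmt))
        else:
            # Repeat: last data bit repeated 1 to 64 times (field + 1, assumed).
            count = (byte & 0x3F) + 1
            out.extend([(last, fmt)] * count)
    return out

def data_per_byte(vector_frac, format_frac, repeat_frac, bits_per_vector=7):
    """Average test data bits per instruction byte for a given instruction mix."""
    return vector_frac * bits_per_vector + format_frac * 1 + repeat_frac * 64

# Format 3 with data bit 1, then the sequence 1010101, then three repeats.
decoded = decode_scheme_a([0b10100011, 0b01010101, 0b11000010])
print([d for d, _ in decoded])
print(round(data_per_byte(0.9, 0.1, 0.0), 2), round(data_per_byte(0.1, 0.9, 0.0), 2))
```

Here three instruction bytes expand to eleven test periods, which illustrates why the density depends so strongly on the instruction mix.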
This encoding scheme is fairly optimal for channels that operate strictly as inputs or outputs.

Encoding scheme B, shown in Figure 4.3c, handles I/O channels better than scheme A and yields fairly good encoding for strictly input or output channels. This scheme is similar to the first one in providing the same three instruction types. However, the I/O direction associated with a test data bit is also given. Therefore, the I/O direction is no longer part of the internal format table. The vector sequence instruction now encodes only three test data bits and the format instruction can only choose from a selection of 16 formats.

Table 4.5 summarizes the test data per byte of instruction distributions using this scheme. Again, Case I is a test containing mainly vector sequences. However, this case covers I/O intensive tests as well as tests with simple timing requirements. Case II contains primarily format instructions, which indicates that complex timing is required.

                   Instruction Distribution (%)
Instruction        I         II        III
Vector Seq. (7)    90-100    10-0      0
Format (1)         10-0      90-100    0
Repeat (64)        0         0         100
Test Data/Byte     6.4-7     1.6-1     64

Table 4.4: Test data stored per byte for scheme A instruction distributions.

                   Instruction Distribution (%)
Instruction        I         II        III
Vector Seq. (3)    90-100    10-0      0
Format (1)         10-0      90-100    0
Repeat (64)        0         0         100
Test Data/Byte     2.8-3     1.2-1     64

Table 4.5: Test data stored per byte for scheme B instruction distributions.

4.6.3 Pipelining

During each test period, one instruction for each channel is read. Fetch, decoding, and execution are done using a 3-stage pipeline as shown in Figure 4.4. The execution step, which involves generating the specified waveforms, occurs simultaneously for the four channels and lasts for four memory cycles. The format instruction requires the highest memory bandwidth because it encodes only one test data bit per byte.
Therefore, the memory interface must satisfy this worst-case situation and operate at 4x MHz to support a test frequency of x MHz.

Figure 4.4: FTC pipeline.

4.7 Waveform Formatting and Timing

4.7.1 Timing Generation

The timing generators need to produce timing edges at discrete intervals over the given test period. A functional block diagram of the timing generator is shown in Figure 4.5. Each test period (T_tclk) consists of 4 to 259 system clock periods (T_clk). Timing edges are placed relative to the rising or falling edge of clk. A counter and a clk phase delay circuit are used to determine from which edge of clk to trigger the fine vernier. Therefore, the fine vernier must provide enough delay to span one clk phase.

Figure 4.5: Functional block diagram of timing generator.

One traditional approach to implementing the fine delay vernier involves using a ramp generator, a digital to analog converter (DAC), and a comparator as shown in Figure 4.6a. The voltage level of the linear ramp is compared with the output of the precision DAC to produce a delayed edge. This circuit is difficult to design using CMOS VLSI because high-resistance current sources and high precision DACs are required [Chapman92]. Furthermore, the relatively long settling times of DACs (100 ns to 100 µs) make this approach unsuitable for on-the-fly timing changes at test speeds.

Figure 4.6: Fine vernier implementations: a) analog vernier; b) digital vernier.

Figure 4.6b shows the implementation of the fine vernier used in the timing generator. The edge generated by the coarse vernier passes through the variable length delay line configured to place the timing edge at the desired offset from the rising or falling edge of the master clock.
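The division of labor between the coarse vernier (counter and clock phase) and the fine vernier (delay line tap) can be illustrated with a short sketch that splits a desired edge offset into the three timing fields of the format memory word. This is an illustration under stated assumptions rather than the calibration procedure used for the FTC: the function name is hypothetical, the 0.63 ns tap delay and the 0 to 31 tap range are the figures quoted for the fine vernier in this section, and the fixed multiplexor delay is ignored as if it had been calibrated out.

```python
def split_edge_time(offset_ns, clk_period_ns, tap_delay_ns=0.63, max_tap=31):
    """Split an edge offset into (clk cycles, phase bit, delay line tap).

    Returns the three settings plus the offset actually produced.
    ASSUMPTION: fixed multiplexor delay is treated as calibrated out.
    """
    half_period = clk_period_ns / 2
    cycles, remainder = divmod(offset_ns, clk_period_ns)
    cycles = int(cycles)
    if not 0 <= cycles <= 63:
        raise ValueError("synchronous delay counter covers 0 to 63 clk cycles")
    # Coarse vernier: pick the clock edge (rising or falling) to trigger from.
    phase = 1 if remainder >= half_period else 0
    remainder -= phase * half_period
    # Fine vernier: nearest delay line tap within the remaining clk phase.
    tap = min(max_tap, round(remainder / tap_delay_ns))
    placed = cycles * clk_period_ns + phase * half_period + tap * tap_delay_ns
    return cycles, phase, tap, placed

# A 65 ns offset with a 25 MHz master clock (40 ns period).
print(split_edge_time(65.0, 40.0))
```

The difference between the requested and placed offsets is bounded by half a tap delay whenever the remainder falls inside the delay line range.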
The basic delay element, T_D, is a CMOS buffer which is essentially a pair of inverters. The number of buffers used in each successive delay segment is binary ratioed, i.e., twice the number used in the previous segment. This implementation is compact and gives a resolution of one buffer delay. The buffer delay element determines to a large extent the resolution of the waveform formatter.

The range of delays generated by the fine vernier is given by:

$$T = T_{mux} + n\,T_D \quad \text{where } 0 \le n \le 31$$

For the buffer cell used, $T_D \approx 0.63$ ns, which yields a range of 19.5 ns for $31\,T_D$. This delay range supports a 25 MHz master clock. The propagation delay through the six multiplexors, $T_{mux}$, amounts to approximately 4 ns. Bypassing the first five multiplexors allows an edge to be propagated within 1 ns.

4.7.2 Propagation Delay Sensitivities

Since we are relying on the propagation delays of CMOS gates to generate timing edges, it is important to examine the sensitivity of these delays to process, supply voltage, and temperature variations. Process variations lead to fixed differences in propagation delay which could be accounted for through calibration. Temperature and power supply variations are dynamic in nature and therefore require active compensation.

CMOS circuits primarily drive capacitive loads because of their high impedance inputs. Therefore, the propagation delay of these circuits is inversely proportional to the drive current supplied by the MOSFETs that comprise the CMOS circuit. The drain-source current, $I_{DS}$, of a MOSFET operating in saturation (or pinch-off) is given by (4.1). This equation contains both temperature and voltage rail sensitive parameters.

$$I_{DS} = \frac{\mu C_{ox}}{2}\left(\frac{W}{L}\right)(V_{GS} - V_T)^2 \tag{4.1}$$

Mobility (both electrons and holes), $\mu$, was experimentally measured to have a temperature dependence of approximately $T^{-3/2}$ due to lattice scattering (in the temperature range of interest) [Pulfrey89].
The threshold voltage, $V_T$, is also temperature sensitive and decreases approximately 2 mV per °C increase [Sedra87]. However, the dependency of $I_{DS}$ on temperature is dominated by $\mu$. The gate-source voltage, $V_{GS}$, depends primarily on the supply rail. These characteristics of CMOS circuits must be considered in the design of the timing and formatting circuits.

To get an idea of the severity of these problems, we need to examine data showing the propagation delay sensitivities of CMC's CMOS4S library to supply voltage and temperature variations. Unfortunately, this information is not available. However, propagation delay curves specified by VLSI Technology for their 1.0 µm CMOS library are available and should give an indication of the expected variations² [VLSI89]:

• Temperature: +0.003x per °C
• Voltage: -0.019x per 100 mV

² These factors were linearly interpolated from curves published by VLSI Technology.

For example, an increase of 1 °C increases the propagation delay by 0.3%, while an increase of 100 mV decreases the propagation delay by 1.9%.

These sensitivities result in a fairly small, but cumulative, skew in the delay line. Early taps in the delay line experience only a small skew with respect to the expected delay, but the skew experienced by the later taps may accumulate beyond the resolution of the delay line. For our 20 ns delay line, the 0.63 ns steps are 3.15% of the total delay length. Therefore, variations of as little as 200 mV (a 3.8% change in delay) could shift a timing edge at the end of the delay line by one step.

On-chip temperature regulation and phase-locked loop methods that compensate for these drifts in propagation delay are given in [Chapman92]. In [Lesmeister91], a phase-locked loop was used for stabilizing timing edges against temperature variations.
These techniques are suitable for compensating slow drifts in temperature and voltage.

Compensating for temperature and voltage variations requires considerable effort in simulation, chip characterization, custom layout, and analog design. These efforts are beyond the scope of this project due to limited time and resources. Furthermore, it is not certain that compensation is needed without actual measurements. For example, the timing generator implemented by Gasbarro using a 2 µm CMOS process did not use temperature or voltage compensation circuits, but still had fairly good timing stability [Gasbarro90]:

• Temperature: 20 ps per °C per delay step.
• Voltage: 70 ps per Volt per delay step.

In [Branson89], maximum thermal drifts of 1.6 ns were reported for pin electronics operating at 100 MHz. It may be possible to ignore thermal effects on the FTC because of its relatively low operating frequency.

4.7.3 Formatting

The waveform formatter provides NR, RH, RL, and RC formats. Except for the NR format, two timing edges are required to specify the formats. Figure 4.7 shows a block diagram of the formatting circuit. The waveform formatter uses a negative edge sensitive flip-flop to synchronize transitions in the formatted output.

Figure 4.7: Functional block diagram of waveform formatting circuit.

The edge timing generator (TG) causes the flip-flop to latch the test data at t1, which places the leading pulse edge. Depending on the format, the width TG triggers either the P0 or P1 pulse generator to clear or set the flip-flop at t2, which places the trailing edge. The width TG is disabled for the NR format since only the leading edge is required. P0 or P1 may also initialize the flip-flop to a set or clear state at the start of the test period. The format and test datum to be used in the next test period are made available to the pulse generators to allow this initialization.
The triggering conditions for P0 and P1 are given in Tables 4.6 and 4.7 respectively. For example, the RC format requires the flip-flop cleared if the test datum is 1. Flip-flop initialization is not required for the NR format.

Next Test Data Bit   Next Format   Clear Flip-flop
v                    RC            Yes if v = 1, otherwise No
v                    RH            No
v                    RL            Yes
v                    NR            No

Table 4.6: Pulse generator 0 triggering conditions.

Next Test Data Bit   Next Format   Set Flip-flop
v                    RC            Yes if v = 0, otherwise No
v                    RH            Yes
v                    RL            No
v                    NR            No

Table 4.7: Pulse generator 1 triggering conditions.

4.7.4 Dead-Zones

Dead-zones (or dead-bands) are regions in the test period where edges cannot be placed even though the region is larger than the edge resolution of the tester. These dead-zones arise from design and hardware limitations. For example, the timing generators and formatting circuits take time to initialize and reset. Dead-zones result if this time exceeds the resolution of the timing generators. A common technique to reduce or eliminate the dead-zone involves interleaving (or multiplexing) between two timing generators and waveform formatting circuits. Each circuit then has one test period for recovery and initialization. At the cost of redundancy and increased complexity, dead-zones caused by initialization time are essentially eliminated. Interleaving was not used in the formatting and timing circuits, for simplicity and reduced area.

The dead-zones that occur in the FTC arise primarily from insufficient recovery time between edges. The set (S) and reset (R) signals of the flip-flop are asynchronous level sensitive inputs. Since these inputs have priority over the clock and data inputs, it is desirable to minimize the duration of the reset and set pulses. The pulse durations are reduced using a feedback connection from the flip-flop output (not shown) which switches off the pulse and resets the pulse generator when the flip-flop reaches the desired low or high state.
By keeping the duration of the pulse short, waveform edges could be placed closer to the test cycle boundaries without interfering with the next test cycle. Since the pulses cannot be infinitesimally short, some interference is expected.

The relationship between these timing edges is shown in Figure 4.8. Simulations show that the pulses are approximately 6.1 ns in duration and take approximately 2.3 ns from the triggering edge to activate. The dead-zone at the beginning of the test period is actually less than 8.4 ns because the start of the test period is determined by the time of the earliest possible edge. This earliest possible edge is also triggered by tclk, but it is delayed by the propagation delay through the coarse and fine vernier in the edge TG. This delay amounts to 5.3 ns (4.3 ns + 1 ns), which leaves a 3.1 ns dead-zone at the start of the test period. The dead-zone after the trailing edge is primarily caused by the pulse width of the pulse generators and amounts to approximately 6.1 ns.

Figure 4.8: Dead-zones (all times in ns): a) dead-zone at beginning of test period; b) dead-zone after trailing edge.

4.7.5 Skew

The waveforms generated by the FTC will exhibit some skew relative to the input clock. This skew is not a serious problem as long as it is consistent amongst the channels. A more important concern is the skew caused by the inconsistent propagation delay of rising and falling edges through logic gates, i.e., rise-fall skew. From Figure 4.7 we see that sources of rise-fall skew are the timing generators, the pulse generators, the flip-flop, and the I/O pad. Since the same tclk and clk edges are used, these signals do not contribute to rise-fall skew.
Similarly, the timing generators do not add to the rise-fall skew since the flip-flop and pulse generators are both active on the falling edges. The calculated skew contributions of the remaining modules to the leading and trailing edges of a formatted waveform are summarized in Tables 4.8 and 4.9 respectively. These calculations show that the rise-fall skew is fairly small. In Chapter 5, we examine the rise-fall skew measured from a fabricated waveform formatter and discuss ways to reduce this skew.

Module             LH Delay (ns)   HL Delay (ns)
Flip-flop (clk)    1.96            2.26
I/O Pad            4.33            3.76
Total Delays       6.29            6.02
Total Skew         0.27

Table 4.8: Leading edge skew.

Module                          LH Delay (ns)   HL Delay (ns)
Flip-flop (Set/Reset)           1.00            0.96
Pulse generators (Set/Reset)    2.56            2.34
I/O Pad                         4.33            3.76
Total Delays                    7.89            7.06
Total Skew                      0.83

Table 4.9: Trailing edge skew.

4.8 Input Comparison

Figure 4.9 gives a functional block diagram of the input comparison circuit. During input comparison, the waveform formatter is used to generate formatted waveforms that define the compare window. When this waveform is high, the compare window opens, which sensitizes an error latch to the result of the comparison between the DUT output (pad input) and the expected value. Any detected discrepancy produces an error. Although strobe sampling is not provided, it could be approximated using a very narrow window.

Figure 4.9: Functional block diagram of input comparison circuit.

Skew in the formatted waveform causes part of the output of the previous test period to be present for a brief duration after the I/O transition. If this output happens to be high, the comparison window may be inadvertently opened and cause an erroneous comparison.
To avoid this problem, the window define circuit is edge sensitive: it opens the compare window on a rising edge and closes the compare window on a falling edge.

Similarly, skew in the formatted waveform may leave the comparison window open while the next comparison datum is loaded at the beginning of the test period. This transition in the comparison data is almost certain to cause an error to be recorded. To avoid this problem, the vector synchronization circuit allows comparison data to change only when the compare window is closed. Hspice simulations suggest that pulses on the order of 1.5 ns could be detected.

Comparison results are not stored in the current design because of I/O and chip area limitations. However, this function could be easily provided using a simple encoding scheme such as the one shown in Figure 4.10. Comparison results are returned as binary data (e.g., 1=correct; 0=error). The result sequence encoding describes 7 bits of comparison results while the repeated result encoding describes a string of 1 to 32 ones or zeros. Although this feature requires relatively little I/O (a maximum of 1/8 the bandwidth of instruction memory), an additional memory bus is needed to provide the required bandwidth.

Figure 4.10: Comparison result output encoding (result sequence and repeated result). (Note: r = comparison result.)

Instead of storing the comparison results, a counter with an overflow bit is used to log the number of successful comparisons before a failure. This counter could be examined after a test to determine in which test cycle the first error occurred.

4.9 Pin Electronics

Pin electronics in high-end testers usually have programmable voltage supplies for output driving and input high-low threshold sensing. Dynamic loading is provided to facilitate testing under various load conditions.
For initial prototyping, we can use standard CMOS I/O pads to drive and sense the DUT pins. The short distances between the FTC and the DUT should allow even CMOS pads to directly drive the DUT inputs without secondary buffers. However, we are probably limited to testing only CMOS devices because of the limited drive current available from the pad. The chosen pad can provide 4.2 mA of drive current and has a rise-fall time on the order of 1 ns.

Chapter 5
Implementation and Results

5.1 Implementation

The Canadian Microelectronics Corporation (CMC) provides two CMOS technologies for implementing the FTC: a 3.0 µm process [CMOS3DLM] and a 1.2 µm process [CMOS4SV2]. The 1.2 µm technology was chosen for several reasons. Its smaller feature size allows the design to be implemented with a substantially smaller die. A 3.0 µm implementation would require a die size of over twice that required by a 1.2 µm implementation. More importantly, critical properties of the timing generators and waveform formatter are speed dependent. Using the higher speed 1.2 µm technology increases the timing resolution and generally reduces the skew. Better performance could be obtained using a CMOS4S standard cell design without resorting to a full custom design effort.

5.1.1 FTC Area

Even when using the higher density technology, a four-channel FTC required an area of 5733 µm x 5122 µm (core area: 4632 µm x 4072 µm). The control circuits were implemented with approximately 2200 gates while each channel required approximately 4200 gates. A four-channel FTC therefore requires approximately 19k gates. This implementation was scaled down for initial prototyping, due to the limited fabrication resources of the CMC. An outline of the prototype FTC layout is shown in Figure 5.1. The number of channels integrated into the FTC was reduced to one, which decreased the size of the chip substantially without changing the functions of the FTC that need to be tested.
Figure 5.1: FTC chip layout.

Reducing the FTC to one channel left the chip I/O bound; that is, its area is determined by the number of pads on the perimeter of the layout. Further area reduction could be accomplished by using either narrower or fewer pads. Since the FTC directly interfaces with the DUT, the pads that give the best performance should be chosen. Table 5.1 summarizes the performance of the available bidirectional pads [CMOS4SV2]. Compared to the other available pads, the 4 mA wide pads gave the best combination of rise/fall times, propagation delay, and skew.
Therefore, the wide family of pads was retained in the layout.

                                 Wide                            Narrow
Parameter                        4mA            8mA              4mA            8mA
Rise time (tr)                   1.0 + 0.3CL    1.0 + 0.15CL     5.0 + 0.34CL   6.0 + 0.36CL
Fall time (tf)                   1.0 + 0.3CL    0.5 + 0.16CL     7.2 + 0.21CL   6.0 + 0.036CL
Low-high prop. delay (tPLH)      2.9 + 0.11CL   4.8 + 0.067CL    4.3 + 0.15CL   6.0 + 0.073CL
High-low prop. delay (tPHL)      2.2 + 0.12CL   2.7 + 0.080CL    6.2 + 0.1CL    5.2 + 0.060CL

Table 5.1: Bidirectional pads available from CMC's CMOS4S cell library.

The number of pads was decreased by reducing the width of the address bus to 10 bits and by using the corner locations for supply and ground pads. This reduced layout is still slightly I/O bound and occupies an area of 3873.8 µm x 2931.4 µm (39% of the area required by the four channel version). Most of the design was implemented with standard cells and automatically placed and routed. The single-channel FTC requires only 40 pins, which gives considerable allowance for expanding the memory width and the number of channels. A description of the FTC pins is given in Appendix A.

The delay lines used to implement the fine vernier in the timing generators were placed in the lower right corner of the FTC layout (see Figure 5.1). These circuits were laid out by hand to provide more consistent delays. Figure 5.2 shows the layout of the delay line macrocell. The multiplexors were placed adjacent to each other to minimize the zero delay skew. The smaller delay elements were placed closest to the selection multiplexors to reduce the effect of wire delays. This circuit occupies an area of 2921 µm x 174.8 µm.

5.2 Performance

Since the waveform formatting circuit and the associated timing generators are such a critical part of the FTC, these circuits were fabricated to provide an indication of the expected performance. Four out of the five manufactured chips were functional. The measured results given in this chapter were obtained from these four chips.
Unfortunately, results from the actual FTC are unavailable because it has not yet returned from fabrication.

The resolution and skew performance of these waveform formatting chips (WFCs) were characterized by measuring the four edge placement parameters shown in Figure 5.3: i) the rising edge placement time at t1, ii) the falling edge placement time at t1, iii) the rising edge placement time at t2, and iv) the falling edge placement time at t2. We are mainly interested in characterizing the asynchronous delay generated by the fine vernier. The fine vernier allows the t1 and t2 edges to be placed over the range of one clock phase. The actual range is slightly delayed after the clock edge due to skew. From these measurements the resolution and skew were determined.

Figure 5.2: Delay line macrocell layout.

5.2.1 Timing Resolution

Timing resolution is the size of the interval between successive positions at which an edge can be placed. Table 5.2 shows the simulated edge placement delays of the FTC. The change in delay (Δ delay) column gives the difference between the current delay and the previous delay. The absolute delays marked with an asterisk (*) are the results obtained from Silos simulation. These simulated results were used to compute the other values in the table. The delay values were normalized to eliminate the skew offset from the reference edge. Timing skew is an important issue that is discussed in the next section.

From this table, the resolution is determined to be 1.2 ns, the largest Δ delay. These results show that most of the Δ delays fall in the 0.5 ns to 0.8 ns range. The outliers with delays of 0.2 ns and 1.2 ns are caused by skew, mainly in the delay elements. For example, consider the transition from seven to eight delay elements.
The delay line implementation shown in Figure 4.6b shows that a seven element delay involves (1+4) = 5 high-low (HL) transitions and 2 low-high (LH) transitions, while the eight element delay involves eight LH transitions. The buffer delay elements propagate an HL transition with 0.58 ns delay while an LH transition is propagated with 0.51 ns delay. Therefore, seven delay elements provide a delay of (5 x 0.58) + (2 x 0.51) = 3.92 ns while eight delay elements provide a delay of only 8 x 0.51 = 4.08 ns (i.e., an additional 0.16 ns). The cause of the 1.2 ns Δ delay is similar.

Figure 5.3: Timing measurement locations.

tap  abs.(ns)  Δ(ns)    tap  abs.(ns)  Δ(ns)    tap  abs.(ns)  Δ(ns)    tap  abs.(ns)  Δ(ns)
0    0.0*      -        8    4.54*     0.2      16   10.10*    1.2      24   14.64     0.2
1    0.63*     0.6      9    5.17      0.6      17   10.72     0.6      25   15.26     0.6
2    1.17*     0.5      10   5.71      0.5      18   11.27     0.5      26   15.81     0.5
3    1.80      0.5      11   6.34      0.6      19   11.90     0.6      27   16.44     0.6
4    2.56*     0.8      12   7.10      0.8      20   12.66     0.8      28   17.20     0.8
5    3.19      0.6      13   7.73      0.6      21   13.28     0.6      29   17.82     0.6
6    3.73      0.5      14   8.27      0.5      22   13.83     0.5      30   18.37     0.5
7    4.36      0.6      15   8.90      0.6      23   14.46     0.6      31   19.00     0.6

Table 5.2: Simulated FTC edge placement times in ns (normalized).

Table 5.3 lists the average edge placement times obtained from the WFCs for a rising edge at t1. Again, the absolute delays marked with an asterisk (*) were measured from the WFCs while the unmarked delays were derived from the measurements. Test stimuli for the WFC were generated with an HP 8180A Data Generator. The results were measured using an HP 54600A 100 MHz oscilloscope (with a 13 pF load from the probe). The output edges were very stable with no observable jitter. Measurement errors are estimated to be ±0.1 ns.

An initial observation from comparing Tables 5.2 and 5.3 is that the measured 15 ns delay range is substantially less than the simulated 19 ns delay range. This result is not totally surprising considering that timing characteristics can be quite different due to process variations.
However, it is quite possible that the propagation delays used in the simulation models are overly pessimistic. The resolution given by the largest Δ delay is 1 ns.

A second observation is that the WFCs appear to exhibit a greater variation in Δ delay when compared to the simulated results. Closer examination of the measurements shows that the variations originate mainly from non-linearities in the two shorter delay elements. These non-linearities are more easily seen when the data is plotted as shown in Figure 5.4. The individual chip measurements were plotted separately so that variations between chips are more apparent. Process variations are the main cause of the deviations in delay between devices. Most of the measurements fall on a fairly straight line except for those from the one element delay. Examining the layout of the WFC showed that the single buffer delay element was ideally placed between its adjacent multiplexors while the other larger delay elements were placed further apart. (Figure 4.6b gives a block diagram of the delay lines.) Therefore, the non-linearity may be caused by the disparity between the additional delay introduced by the longer routing of larger delay elements and the shorter delays caused by minimal routing of the single delay element.

tap  abs.(ns)  Δ(ns)    tap  abs.(ns)  Δ(ns)    tap  abs.(ns)  Δ(ns)    tap  abs.(ns)  Δ(ns)
0    0.0*      -        8    3.78*     0.3      16   7.60*     0.3      24   11.38     0.3
1    0.23*     0.2      9    4.01      0.2      17   7.83      0.2      25   11.61     0.2
2    1.25*     1.0      10   5.02      1.0      18   8.85      1.0      26   12.62     1.0
3    1.48      0.2      11   5.26      0.2      19   9.08      0.2      27   12.85     0.2
4    2.00*     0.5      12   5.78      0.5      20   9.60      0.5      28   13.38     0.5
5    2.23      0.2      13   6.01      0.2      21   9.83      0.2      29   13.61     0.2
6    3.25      1.0      14   7.03      1.0      22   10.85     1.0      30   14.63     1.0
7    3.48      0.2      15   7.26      0.2      23   11.08     0.2      31   14.86     0.2

Table 5.3: Average WFC edge placement times for a rising edge at t1 in ns (normalized).
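The way the resolution figure is read off a table of edge placement times can be sketched in a few lines. This is an illustrative sketch rather than the procedure actually used for the measurements: the function names are ours, and the data are the normalized absolute delays for taps 0 to 8 of Table 5.3.

```python
def tap_increments(abs_delays_ns):
    """Difference between each absolute delay and the previous one (the delta column)."""
    return [round(b - a, 2) for a, b in zip(abs_delays_ns, abs_delays_ns[1:])]


def resolution(abs_delays_ns):
    """Worst-case resolution: the largest step between successive taps."""
    return max(tap_increments(abs_delays_ns))


# Normalized rising-edge placement times at t1 for taps 0 to 8 (Table 5.3), in ns.
measured_ns = [0.0, 0.23, 1.25, 1.48, 2.00, 2.23, 3.25, 3.48, 3.78]

print("increments:", tap_increments(measured_ns))
print("resolution:", resolution(measured_ns), "ns")
```

The largest increment, at the step from tap 1 to tap 2, is what sets the resolution of about 1 ns; the much smaller increments of the single-element delay are the non-linearity discussed above.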
In some cases, the delay of the single buffer is barely measurable.

A third observation is that the WFC measurements are significantly more linear than the simulated data. This result indicates that the skew in the buffer delay elements is less severe than modeled in the CMOS4S library. Comparison with timing measurements of the other three edge placement parameters (see Appendix B) gives similar results.

Resolution provides a measure of the interval at which a tester can place distinct waveform edges. The amount of resolution required is highly dependent on the application. Usually, a test is scaled to adapt to the available resolution and other tester parameters. While a fine resolution is desirable, it may not be justified if a less costly system with coarser resolution is adequate. Measurements from the WFCs give a resolution of 1 ns. Resolutions of 0.5 ns would be achievable if the non-linearities in the shorter delay elements could be eliminated.

Figure 5.4: Simulated and measured delays. (Measured times are for rising edge at t1.)

5.2.2 Timing Skew

So far, in the discussion of timing resolution, the channel outputs have been considered independently. In practice, however, ICs are tested with groups of waveforms. Once more than one signal is used, the issue of skew becomes important. Skew refers to the deviation (sometimes called phase difference) between the actual and desired edge placement times. There are two types of skew that we are concerned with: i) interchannel skew, and ii) rise-fall skew. Generally, the presence of skew degrades the accuracy with which edges can be placed. Skew cannot be completely eliminated because the environment in which the electrical signals operate cannot be perfectly duplicated between channels.
However, the goal is to minimize skew.

Interchannel Skew

Interchannel skew arises from a phase difference between edges generated by different channels, either from the same device or from different devices. Figure 5.5 shows a plot of the typical interchannel skew measured from the WFCs at edge t1. The delay times are measured relative to a fixed reference edge (an edge of the input clock). The rising edges are graphed with the heavier lines while the falling edges are graphed with the lighter lines.

From this plot, the maximum interchannel skew obtained is 1.2 ns for both rising and falling edges. Several calibration techniques are available that could be used to correct for this skew. One method of skew compensation is to introduce an additional delay line at either the timing generator output or the waveform formatter output [Healy85, Dahl87]. Faster edges are delayed so that they arrive at the same time as the slower edges. The net effect is an upward shifting of the graphs to minimize skew. Figure 5.6 shows how this deskewing technique can be used to reduce most of the skew to 0.3 ns or less. (The offset from 0 ns for absolute delay was uniformly reduced to 1 ns so that the maximum skew could be plotted on the same graph in larger scale.) This deskewing scheme is commonly used in shared resource tester architectures [Healy85].

An alternative deskewing scheme uses the timing generators already present in each channel to compensate for the skew [Catalano83] [Healy85]. The idea is to choose the appropriate delay that minimizes skew for a given edge placement time. This scheme is only effective in the regions where the delay ranges for the different channels overlap. The edges of the delay ranges are not usable, i.e., they are dead-zones. By using sufficiently wide delay ranges, these dead-zones could be located outside the active region of the current test cycle. Multiplexing is probably required to eliminate these dead-zones.
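The first deskewing method, in which faster edges are delayed until they line up with the slowest edge, can be sketched as follows. This is a minimal illustration and not the calibration circuit itself: the function name and the 500 ps step of the additional delay line are assumptions made for the example, and the input values are the tap 16 rising-edge measurements at t1 from Table B.1, converted to picoseconds so that the arithmetic stays in integers.

```python
def deskew_by_delay(edge_times_ps, step_ps):
    """Delay the faster channels toward the slowest one.

    edge_times_ps: measured edge placement time of each channel, in picoseconds.
    step_ps: resolution of the (hypothetical) deskew delay line, in picoseconds.
    Returns (added_delay_ps, residual_skew_ps).
    """
    slowest = max(edge_times_ps)
    # Largest whole number of delay steps that does not overshoot the slowest edge.
    added = [((slowest - t) // step_ps) * step_ps for t in edge_times_ps]
    corrected = [t + a for t, a in zip(edge_times_ps, added)]
    return added, max(corrected) - min(corrected)


# Rising edge at t1, tap 16, chips 0 to 3 (Table B.1), converted from ns to ps.
tap16_ps = [25000, 25600, 24800, 24400]
initial_skew_ps = max(tap16_ps) - min(tap16_ps)
added_ps, residual_ps = deskew_by_delay(tap16_ps, step_ps=500)
print("initial skew:", initial_skew_ps, "ps")
print("added delays:", added_ps, "ps")
print("residual skew:", residual_ps, "ps")
```

With this assumed step size the residual skew is bounded by the step of the compensating delay line, which is why the resolution of that delay line determines how much of the skew can be removed.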
Figure 5.5: Interchannel skew measured for rising and falling edges at t1.

Rise-fall Skew

CMOS logic circuits usually do not propagate both rising and falling edges with the same delay. This difference in propagation delay is the cause of rise-fall skew. Rise-fall skew primarily degrades the accuracy with which edges can be placed.

The measured rise-fall skews at t1 and t2 are summarized in Table 5.4. The calculated skews are shown in Table 5.5. The magnitudes of the measured skews are fairly close to the calculated skews and the relative positions of the edges are correct. The waveform formatter implemented in the WFC could not produce all four formats because of a design error. The revised waveform formatter in the FTC may have a slightly longer propagation delay for t1 edges, but the overall rise-fall skew is reduced. The overall propagation delay is reduced because of the 4 mA wide pads. The effect of these skews is more clearly seen when the actual delay ranges for t1 and t2 are plotted as shown in Figures 5.7 and 5.8 respectively.

Figure 5.6: Deskewed placement times for rising edges at t1.

Rise-fall skew can be reduced using custom transistor sizing and/or subsequent deskewing. Designing the waveform formatter circuit with minimum rise-fall skew involves appropriately sizing its transistors for a given capacitive load.
This problem requires a considerable amount of effort. Even sizing a pair of inverter chains (consisting of two and three gates) for the purpose of generating complementary clocks with minimal skew and pulse distortion requires as many as 100 brute force SPICE simulations [Argade89], depending on the desired precision. In comparison, the waveform formatter circuit is significantly larger with 40 gates (160 transistors), and the presence of feedback further complicates optimization. Application specific optimization methods could significantly reduce the effort (e.g., [Argade89]), but these methods need to be developed.

       Rise-fall skew at t1 (ns)      Rise-fall skew at t2 (ns)
       chip                           chip
Tap    1     2     3     4            1     2     3     4
0      2.0   1.6   1.4   1.6          0.8   0.6   0.8   0.8
1      1.0   0.9   1.1   1.5          0.1   0.0   0.6   0.6
2      1.7   1.8   1.6   1.6          0.7   0.7   0.6   0.6
4      1.7   1.6   1.5   1.5          0.5   0.5   0.5   0.4
8      1.9   1.9   1.5   1.5          0.8   0.9   0.4   0.7
16     2.1   2.1   2.0   1.6          0.8   0.7   0.9   1.1
       Mean: 1.61                     Mean: 0.63
       Standard Deviation: 0.31       Standard Deviation: 0.24
       Variance: 0.09                 Variance: 0.06

Table 5.4: Rise-fall skew (rising edge - falling edge) measurements.

                  WFC                            FTC
                  t1             t2              t1             t2
Component         rise    fall   rise    fall    rise    fall   rise    fall
Wave formatter    3.06    2.89   2.12    1.95    1.96    2.26   3.56    3.30
I/O pad           6.95    5.98   6.95    5.98    4.33    3.76   4.33    3.76
Total Delay       10.01   8.87   9.07    7.93    6.29    6.02   7.89    7.06
Total Skew        1.14           1.14            0.27           0.83

Table 5.5: Rise-fall skew calculations (all times in ns).

Deskewing rise-fall skew using calibration circuits requires more sophisticated methods than those used for interchannel skew because we need to selectively adjust rising and falling edges at t1 and t2. Figure 5.9 shows a block diagram of the waveform formatting module including the deskewing circuits. The waveform formatter is essentially a memory element with synchronous load, set, and reset. The latch signal generates either a rising or falling edge at t1 depending on the vector data.
The set signal generates a rising edge at t2 while the reset signal generates a falling edge at t2. Additional control signals (not shown) determine whether the set or reset inputs are active. These control signals are set up ahead of time and do not affect the skew. The timing of the latch signal is controlled by the edge timing generator ($T_e$) while the timing of the set and reset signals is controlled by the width timing generator ($T_w$).

Figure 5.7: Rise-fall skew at t1.

Figure 5.8: Rise-fall skew at t2.

Figure 5.9: Rise-fall deskew circuit.

The placement of the four edges A, B, C, and D is given by the following equations:

$$
\begin{aligned}
A &= T_e + \tau_3 + T_{r1} \\
B &= T_e + \tau_4 + T_{f1} \\
C &= T_w + \tau_2 + \tau_4 + T_{f2} \\
D &= T_w + \tau_1 + \tau_3 + T_{r2}
\end{aligned}
$$

where $T_{ri}$ and $T_{fi}$ are the respective rising and falling edge propagation delays at $t_i$ (for i = 1 or 2). Differences between the $T_{ri}$ and $T_{fi}$ delay values cause rise-fall skew at $t_i$. To deskew the edges at t1 and t2, we want:

$$
\begin{aligned}
A &= B \\
T_e + \tau_3 + T_{r1} &= T_e + \tau_4 + T_{f1} \\
\tau_3 + T_{r1} &= \tau_4 + T_{f1}
\end{aligned}
\qquad (5.1)
$$

and,

$$
\begin{aligned}
C &= D \\
T_w + \tau_2 + \tau_4 + T_{f2} &= T_w + \tau_1 + \tau_3 + T_{r2} \\
\tau_2 + \tau_4 + T_{f2} &= \tau_1 + \tau_3 + T_{r2}
\end{aligned}
\qquad (5.2)
$$

These equations show that any differences between $T_{ri}$ and $T_{fi}$ can be deskewed by an appropriate choice of the $\tau_1$, $\tau_2$, $\tau_3$, and $\tau_4$ delay values.

If we can assume that $T_{r1} > T_{f1}$ and $T_{r2} > T_{f2}$ (as in our case), $\tau_2$ and $\tau_3$ are no longer needed.
Therefore, equations (5.1) and (5.2) can be reduced to equations (5.3) and (5.4) as follows:

$$T_{r1} - T_{f1} = \tau_4 \qquad (5.3)$$

$$T_{r2} - T_{f2} = \tau_4 - \tau_1 \qquad (5.4)$$

In practice, the degree of deskewing is limited by the resolution of the delay lines $\tau_1$ to $\tau_4$. The resolution of the existing delay lines could be enhanced by using inverters, which have a propagation delay of 0.26 ns. Further increases in resolution could be achieved by varying the capacitive loading on the inverters [Branson89].

Chapter 6

Conclusions and Future Work

6.1 Conclusions

This thesis focuses on improving the test quality of low-end VLSI functional tester systems. Three potential areas of improvement were considered: memory size, tester speed, and I/O capability (timing and formatting). Increasing the memory size has little effect on test quality. Simply increasing the test speed (test pattern rate) does not provide sufficient gain in test quality compared to cost. We found that the best improvement in test quality can be obtained by improving the timing and I/O wave formatting capability of a tester.

A test strategy was developed which exploits this improved timing and I/O waveform formatting capability to generate short high-speed clock bursts. These high-speed clock bursts can be used to perform testing at speeds equivalent to a much higher rate than the tester's pattern rate. This strategy can be used to test many digital designs. However, using this strategy for "high-speed" testing requires more effort in test pattern development and more test vectors than an equivalent test performed on a high-speed tester. This approach can also be used in existing high-end testers that already have the required timing and waveform formatting capabilities.

A functional tester system for CMOS logic was developed that provides these timing and waveform formatting enhancements.
The simplicity of this system was achieved by integrating most of the required circuits into modular functional tester chips (FTCs) that are used in parallel to form a tester. Vector encoding combined with an internal lookup table was used to reduce the memory bandwidth required to support the increased functionality.

A single-channel FTC was designed and implemented. The chip has a die size of 3.8 mm x 2.9 mm and requires only 40 pins, which gives considerable allowance for future increases in memory width and channels. Measurements from earlier prototypes of the FTC waveform formatter give worst case timing resolutions of 1 ns and maximum rise-fall skews of 1 to 2 ns. Worst case skews between devices were on the order of 1 ns. These results are very reasonable considering the relatively conservative standard cell design using 1.2 µm CMOS. Improved performance is expected from a more customized and aggressive design.

6.2 Future Work

The work presented in this thesis represents an initial step towards implementing the functional tester system (FTS). More work is needed to produce a final version of the FTC. Software will also be needed to configure and control the FTS. Once completed, the FTS could be used to provide a platform from which to explore VLSI testing. Some directions for future work are given below.

The effectiveness of the high speed "burst" testing strategy depends on the implementation of the device to be tested. More work is needed to formally quantify the effectiveness of this strategy for different circuit types in terms of the effort required for test vector development and the increase in the number of test vectors. It would be useful to examine the capabilities and limitations of this approach with some more concrete example circuits.
The industrial benchmark circuits from MCNC (Microelectronics Center of North Carolina) and ISCAS (International Symposium on Circuits and Systems) may be a useful source.

The test vectors in the FTS were stored using an encoding scheme to reduce the bandwidth required between the FTC and the external memory. It would be useful to develop a more efficient encoding by examining instruction distributions for test vectors. It may be possible to use data from other test systems, but there is some machine dependency in the actual test patterns.

Many aspects of the existing FTC implementation could be enhanced with more customized design. Accuracy could be improved by including skew calibration and compensation for thermal and/or voltage drifts, which would provide more stable timing. Improved timing resolution and performance may be achieved using the BiCMOS technology and the analog cells that are now available from CMC. Custom pads could be designed that support programmable high/low voltage thresholds and drive currents for the purposes of I/O and dynamic loading. This would allow the FTC to function in a variety of environments.

Bibliography

[Abramovici90] M. Abramovici, M. A. Breuer, and A. D. Friedman, “Digital Systems Testing and Testable Design,” AT&T Bell Laboratories and W. H. Freeman and Company, New York, 1990.
[Argade89] P. V. Argade, “Sizing an Inverter with a Precise Delay: Generation of Complementary Signals with Minimal Skew and Pulsewidth Distortion in CMOS,” IEEE Transactions on Computer-Aided Design, Vol. 8, No. 1, pp. 33-40, Jan. 1989.
[ATLAS84] IEEE ATLAS Committee, IEEE Standard ATLAS Test Language (IEEE Std. 416-1984), IEEE, New York, 1984.
[Bardell87] P. H. Bardell, W. H. McAnney, and J. Savir, “Built-In Test for VLSI: Pseudorandom Techniques,” John Wiley and Sons, New York, 1987.
[Bassett90] R. W. Bassett, B. J. Butkus, S. L. Dingle, M. R. Faucher, P. S. Gillis, J. H. Panner, J. G. Petrovick (Jr.), and D. L.
Wheater, “Low-Cost Testing of High-Density Logic Components,” IEEE Design and Test of Computers, pp. 15-28, 1990.
[Bishop90] P. E. Bishop, C. T. Glover, “Testability Considerations in the Design of the MC68340 Integrated Processing Unit,” International Test Conference, pp. 337-345, 1990.
[Bisset83] S. Bisset, “The Development of a Tester-per-pin VLSI Test System Architecture,” International Test Conference, pp. 151-155, 1983.
[Branson89] C. W. Branson, “Integrated Pin Electronics for a VLSI Test System,” IEEE Transactions on Industrial Electronics, Vol. 36, No. 2, pp. 185-191, May 1989.
[Butner87] S. E. Butner, “Evaluation of a prototype VLSI tester,” INTEGRATION, the VLSI journal, No. 5, pp. 275-288, 1987.
[Cadence89] Cadence Design Systems Users Manual, 1989.
[Cascade91] Cascade Microtech, “High Speed Digital Microprobing Principles and Applications,” 1991.
[Catalano83] M. Catalano, IL Feldman, R. Krutiansky, R. Swan, “Individual Signal Path Calibration for Maximum Timing Accuracy in a High Pincount VLSI Test System,” International Test Conference, pp. 188-192, 1983.
[Chandra93] S. Chandra, K. Pierce, 0. Srinath, H. R. Sucar, and V. Kulkarni, “CrossCheck: An Innovative Testability Solution,” IEEE Design and Test of Computers, pp. 56-68, June 1993.
[Chang87] Y. E. Chang, D. E. Hoffman, A. J. Gruodis, and J. E. Dickel, “A 250-MHz Advanced Test Systems,” International Test Conference, pp. 68-75, 1987.
[Chapman92] J. Chapman, “High-Performance CMOS-Based VLSI Testers: Timing Control and Compensation,” International Test Conference, pp. 59-67, 1992.
[CMOS3DLM] “CMOS3 DLM Cell Library,” Canadian Microelectronics Corporation, 1989.
[CMOS4SV2] J. Mowchenko, and T. Monson, “The CMOS4S Standard Cell Library Version 2.0,” Canadian Microelectronics Corporation, 1991.
[Dahl87] M. Dahl, “Closed-Loop Error Correction: A Unique Approach to Test System Calibration,” International Test Conference, pp. 772-778, 1987.
[Director90] S. W. Director, W. Maly, and A. J.
Strojwas, “VLSI Design for Manufacturing: Yield Enhancement,” Kluwer Academic Publishers, New York, 1990.
[Eichelberger78] E. B. Eichelberger, and T. W. Williams, “A Logic Design Structure for LSI Testability,” Journal of Design Automation, Fault Tolerant Computing, Vol. 2, No. 2, pp. 165-178, 1978.
[Fehr92] G. Fehr, “Timing-Per-Pin Flexibility at Shared-Resource Cost,” International Test Conference, pp. 431-438, 1992.
[Feugate88] R. J. Feugate (Jr.), and S. M. McIntyre, “Introduction to VLSI Testing,” Prentice Hall, Englewood Cliffs NJ, 1988.
[Gasbarro89] J. A. Gasbarro, and M. A. Horowitz, “Integrated Pin Electronics for VLSI Functional Testers,” IEEE Journal of Solid-State Circuits, Vol. 24, No. 2, pp. 331-337, April 1989.
[Gasbarro90] J. A. Gasbarro, “An Architecture for High-Performance Single-Chip VLSI Testers,” Ph.D. Thesis, Xerox Palo Alto Research Center, 1990.
[Gelsinger87] P. Gelsinger, “Design and Test of the 80386,” IEEE Design and Test of Computers, pp. 42-50, June 1987.
[Goodenough90] F. Goodenough, “Analog ICs Target Tester Pin-Electronics,” Electronic Design, pp. 91-96, June 14, 1990.
[Gruodis88] A. J. Gruodis, and D. E. Hoffman, “250-MHz Advanced Test Systems,” IEEE Design and Test of Computers, pp. 24-35, April 1988.
[Healy85] J. Healy, and G. Ure, “A Method of Reducing ATE System Error Components and Guaranteeing Subnanosecond Measurement Accuracies,” International Test Conference, pp. 191-202, 1985.
[Huber91] J. P. Huber, and M. W. Rosneck, “Successful ASIC Design the First Time Through,” Van Nostrand Reinhold, New York, 1991.
[HP E1451/E1452 91] HP E1451/E1452 20MHz Pattern I/O Modules Hardware Manual, 1991.
[Kikuchi89] S. Kikuchi, Y. Hayashi, T. Matsumoto, R. Yoshino, and R. Takagi, “A 250-MHz Shared-Resource VLSI Test System with High Pin Count and Memory Test Capability,” International Test Conference, pp. 558-566, 1989.
[LaBuda90] V. P. LaBuda, and R.
Youngblood, “DFT Standards Allow Optimized Tester Configuration to Reduce Cost of Test,” Proceedings of the Third Annual IEEE ASIC Seminar and Exhibit, pp. P13-7.1-P13-7.4, 1990.
[Lesmeister91] G. Lesmeister, “A Densely Integrated High Performance CMOS Tester,” International Test Conference, pp. 426-429, 1991.
[Levitt92] M. E. Levitt, “ASIC Testing Upgraded,” IEEE Spectrum, pp. 26-29, May 1992.
[McCluskey81] E. J. McCluskey, and S. Bozorgui-Nesbat, “Design for Autonomous Test,” IEEE Transactions on Computers, Vol. C-30, No. 11, pp. 866-874, 1981.
[McKenzie89] N. McKenzie, “UW VLSI Chip Tester,” University of Washington, Dept. of Computer Science and Engineering, TR# 89-12-01, 1989.
[McKenzie92] N. McKenzie, L. McMurchie, and C. Ebeling, “The UW MacTester: A Low-Cost Functional Tester for Interactive Testing and Debugging,” University of Washington, Dept. of Computer Science and Engineering, TR# 92-10-08, 1992.
[Miyamoto87] J. Miyamoto, and M. A. Horowitz, “A Single Chip LSI High Speed Functional Tester,” IEEE Journal of Solid-State Circuits, Vol. 22, No. 5, pp. 820-828, Oct. 1987.
[Naish88] P. Naish, and P. Bishop, “Designing ASICs,” John Wiley and Sons, New York, 1988.
[Organ91] D. Organ, “The enVision Timing Resolver,” International Test Conference, pp. 1004-1008, 1991.
[Parker87] K. P. Parker, “Integrating Design and Test: Using CAE Tools for ATE Programming,” Computer Society Press of the IEEE, Washington D.C., 1987.
[Pulfrey89] D. L. Pulfrey, and N. G. Tarr, “Introduction to Microelectronic Devices,” Prentice Hall, pp. 46-48, 1989.
[Rau79] B. R. Rau, “Program behavior and the performance of interleaved memories,” IEEE Transactions on Computers, Vol. C-28, pp. 191-199, March 1979.
[Sedra87] A. S. Sedra, and K. C. Smith, “Microelectronic Circuits 2nd Edition,” Holt, Rinehart and Winston, pp. 343, 1987.
[Sudo87] T. Sudo, A. Yoshii, T. Tamama, N. Narumi, and Y. Sakagawa, “ULTIMATE: A 500-MHz VLSI Test System with High Timing Accuracy,” International Test Conference, pp.
206-213, 1987.
[Tsui87] F. F. Tsui, “LSI/VLSI Testability Design,” McGraw-Hill, New York, 1987.
[VLSI89] VLSI Technology, Inc., “1.0 Micron CMOS VGT300 Portable Library - Rev. 2.0,” pp. vi-vii, 1989.
[Walter88] A. Walter, Y. Kleinman, L. Edelshteyn, and J. Gartner, “An Expert Test Program Generation System for Per-pin Testers,” International Test Conference, pp. 665-668, 1988.
[Williams83] T. W. Williams, and K. P. Parker, “Design for Testability - A Survey,” Proceedings of the IEEE, Vol. 71, pp. 98-112, Jan. 1983.

Appendix A

FTC Pin Description

A description of the FTC signals is given in Table A.1.

Table A.1: FTC pinout.

Name        I/O   Description
reset       I     This is an asynchronous active-high reset signal for the FTC chip. The reset signal must be asserted for two clock cycles for proper operation.
clk         I     This is a single-phase clock input for the FTC chip.
Data<7:0>   I/O   This is an 8-bit bidirectional data bus. During test or calibration mode this bus operates as an input. During program mode this bus is bidirectional.
Addr<9:0>   I/O   This is a 10-bit bidirectional address bus. During test mode this bus generates the addresses from which to fetch instructions. During program or calibration mode this bus accepts addresses to access internal FTC memory elements.
cs          I     This is an active-high chip select signal. This signal must be active before read/write operations on internal FTC memory elements can be initiated by the memiordb and memiowrb signals.
memiordb    I     This is an active-low read signal for FTC internal registers.
memiowrb    I     This is an active-low write signal for FTC internal registers.
mode<1:0>   I     These pins are used to control the current operating mode of the FTC. Mode changes are latched on the falling edge of clk.
memcs       I     This input gives external chip select signals for the memory.
MEMCS       O     This output provides the chip select control signal for the vector memory. During test mode this output is always asserted.
During test mode this output is always asserted.[During program and calibrate mode?]rwb I This input gives external read/write memory operations.RWB 0 This output provides the read/write control signal for thevector memory. During test mode this signal always specifythe read operation. [During program and calibrate mode?]SYNC 0 This output provides programmable access to certain internalcontrol signals which could be used for synchronization ofother test equipment.Fv I/O This pin provides the channel output which connects to theDUT.HALTB I/O This signal behaves as an open drain output and should beresistively pulled high. It is active when the signal is low.Table A.1: FTC pinout.Appendix BWFC MeasurementsTables B.1 to B.4 summarizes the WFC measurements. The average rising and faffing edge timesfor the input clock is 3.4ns and 3.Ons respectively. The average rising and faffing edge times for theWFC measurements is 3.7ns and 2.4ns respectively.Delay Delay in ns for Chip # Mean Std. SampleTap 0 1 2 3 Dev. Var.bypass 13.9 14.2 13.7 13.6 13.85 0.23 0.070 17.8 17.8 16.8 17.0 17.35 0.46 0.281 17.6 17.9 17.5 17.3 17.58 0.22 0.062 18.7 19.1 18.4 18.2 18.60 0.34 0.154 19.4 19.9 19.1 19.0 19.35 0.35 0.168 21.2 21.8 20.9 20.6 21.13 0.44 0.2616 25.0 25.6 24.8 24.4 24.95 0.43 0.25Table B.1: Rising edge at t1.Delay Delay in ns for Chip # Mean Std. SampleTap 0 1 2 3 Dev. Var.bypass 11.3 11.8 11.4 11.3 11.45 0.21 0.060 15.8 16.2 15.4 15.4 15.70 0.33 0.151 16.6 17.0 16.4 15.8 16.45 0.43 0.252 17.0 17.3 16.8 16.6 16.93 0.26 0.094 17.7 18.3 17.6 17.5 17.78 0.31 0.138 19.3 19.9 19.4 19.1 19.43 0.29 0.1216 22.9 23.5 22.8 22.8 23.00 0.29 0.11Table B.2: Failing edge at t1.7677Appendix B. WFC MeasurementsDelay Delay (ns) for Chip # Mean Std. 
Sample0 17.6 17.8 17.4 17.0 17.45 0.30 0.121 17.7 18.2 17.4 17.2 17.63 0.38 0.192 18.6 19.2 18.4 18.2 18.60 0.37 0.194 19.8 20.4 19.5 19.3 19.75 0.42 0.238 21.8 22.4 21.4 21.3 21.73 0.43 0.2516 25.8 26.5 25.5 25.3 25.78 0.45 0.28Table B .3: Rising edge at t2.Delay Delay in ns for Chip # Mean Std. Samplemm0 16.8 17.2 16.6 16.2 16.70 0.36 0.171 17.6 1.8.2 16.8 16.6 17.30 0.64 0.552 17.9 1.8.5 17.8 17.6 17.95 0.34 0.154 19.3 19.9 19.0 18.9 19.28 0.39 0.208 21.0 21.5 21.0 20.6 21.03 0.32 0.1416 25.0 25.8 24.6 24.2 24.90 0.59 0.47Table B.4: Falling edge at t2.Appendix CFTC Implementation DetailsC.1 FTC Circuit DocumentationacquireThe acquire module provides the input window comparison function. This module activates when achannel switches to input mode (fmt<2>=O). The compare value and the DUT value is comparedby the XOR gate 163. The result of this comparison is used to set the error bit (EB) when thecomparison window is open. The comparison window opens when fv is high. The opening andclosing of the comparison window is synchronized to the edges of fv. As long as the comparisonremains error free (EB.Q=O) and the channel is in input mode, the compare counter (cmpcnt)increments on every tclkb cycle.The comparison vector changes on the falling edge of tclkb. This change does not occur onthe test cycle boundary because of skew between tclkb and the test cycle. The “cv synchronizer”synchronizes comparison vector changes to the actual test period boundary. Comparison windowsthat extend beyond the current test cycle retains the same comparison vector in the next cycle asin the current cycle. That is, the comparison vector (cv) will not be changed. Since cv changesoccur on the test cycle, comparisons should not occur near the test period boundaries without aguard-band. Otherwise errors are guaranteed to be reported.addregcThe addregc (address register counter) is a 13-bit loadable ripple counter. 
The lower 8 bits or the upper 5 bits can be read through the DOUT<7:0> bus using rdb<0>=0 and rdb<1>=0 respectively. DOUT is tristated when rdb<1:0> = 11.

Data is loaded from the ain<12:0> bus when (loaden or enb) = 1. The enb signal also selects one of two signals for clocking the counters. Consequently, the enb signal should select the next clock signal only when the next clock is in the high state. This avoids inadvertently generating a falling edge which would trigger the counter. The counter increments when counten is high. The counten signal should not change on the active edge of the counter.

adecmod

The adecmod (address decoder module) decodes the address space of the chip into blocks of 32 addresses each. As shown in the schematic, the address space is not fully decoded: only bits 6 and 7 are used to identify a block.

aregmod

In a multichannel implementation, this schematic would contain one channel address register module (caregmod) for each channel. The decoders (I18 and I19) select which caregmod to read or write.

caregmod

The caregmod (channel address register module) schematic has the registers that contain the current address to be fetched, the restart address (lower bound) and the ending address (upper bound). The restart and ending addresses are included inside the address range.

The address is incremented on the falling edge of the incb signal. Disabling countenb prevents the address from incrementing. Disabling wrapenb prevents the current address from wrapping to the restart address after it reaches the ending address. Asserting the aoutenb signal will tristate the AOUT<12:0> address bus. The tm_enb (test mode enable) signal must be asserted before the current address is incremented.

Memory Map

The decoders DR and DW select the registers to read and write according to the memory map given in Table C.1.

addr<2:0>   Register accessed
000         restart address <7:0>
001         restart address <12:8>
010         ending address <7:0>
011         ending address <12:8>
100         current address <7:0> *
101         current address <12:8> *
110         unused
111         unused
* current address can only be written/read when tm_enb = 1.

Table C.1: Memory map of address space registers in the caregmod module.

END Signal

END signals the controller to terminate test mode at the end of the current test cycle. This signal is used to stop testing at the end of a channel's instruction sequence (instead of wrapping). Since instructions are pipelined and execute in a variable number of cycles, the last instruction will not be executed until a variable number of cycles after it has been fetched.

The END GENERATOR is the circuit responsible for generating the END signal at the correct time to stop testing after the last instruction in the channel's address space has been executed. This circuit is enabled when wrapenb is disabled.

A low on I78.Q enables I69 to generate END when creadyb is high (i.e., the channel finished executing the last instruction and is ready for the next one). The flip-flops I62, I66 and I78 are used to delay the low signal generated by CMP.EQB, which indicates that the end address is reached, until the last instruction has actually completed execution in the pipeline.
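The address sequencing described for caregmod can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the circuit itself: the class and field names are hypothetical, the control inputs are modelled as plain booleans rather than active-low signals, and the address is assumed to hold at the ending address once wrapping is disabled. The pipeline delay that the END GENERATOR adds before END is raised is deliberately left out.

```python
from dataclasses import dataclass

ADDR_MASK = 0x1FFF  # 13-bit address, matching the width of AOUT<12:0>


@dataclass
class ChannelAddressRegister:
    """Hypothetical behavioural model of one caregmod instance."""
    restart: int                 # lower bound (inclusive)
    ending: int                  # upper bound (inclusive)
    current: int                 # next address to fetch
    count_enabled: bool = True   # abstracts countenb being enabled
    wrap_enabled: bool = True    # abstracts wrapenb being enabled
    end_reached: bool = False    # simplified stand-in for the end-address compare

    def step(self) -> int:
        """Return the address to fetch, then advance the register."""
        fetched = self.current
        if not self.count_enabled:
            return fetched
        if self.current == self.ending:
            if self.wrap_enabled:
                self.current = self.restart
            else:
                # Assumption: the address holds here and an end flag is raised.
                self.end_reached = True
        else:
            self.current = (self.current + 1) & ADDR_MASK
        return fetched


looping = ChannelAddressRegister(restart=2, ending=4, current=2)
print([looping.step() for _ in range(7)])
```

With wrapping enabled the model cycles through the inclusive range from the restart address to the ending address; with wrapping disabled it stops advancing at the ending address and sets the end flag.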

chmod

The chmod (channel module) contains the vector decoder and format memory (vectmod), the waveform formatter (wfmt2) and the input comparison (acquire) circuits.

Signals

Signal Name                             Description
rst<1:0>, sreset                        These are the reset (active high/low) and the start reset signals.
enb                                     This activates the channel module.
rdb, wrb, addr<3:0>, din<7:0>           These pins provide the interface to the format memory.
ac_rdb<1:0>                             These signals are used to read the acquire counter.
ivectin<7:0>                            These signals are for the instruction bus.
c_latck<1:0>, tck<1:0>, loaden<1:0>     These are clocking signals.
padin_v                                 This pin provides the pad input used for input comparison.
fv_en                                   This pin provides the pad enable signal used to activate the input comparison.

Table C.2: Input signals for chmod.

Signal Name   Description
V             unformatted vector.
FVENABLE      enable/disable output.
FV            formatted output.
CREADYB       channel ready, signal to fetch next instruction.
DOUT          data out for read access to format memory.

Table C.3: Output signals for chmod.

chselmod

This module generates the following timing signals for pipelining the memory fetch:

1. CHSELB<3:0> (channel select): This signal is an active low pulse.
2. CECHSEL_PDEL<3:0> (channel enabled channel select with phase delay, i.e., delayed 1 phase from CHSELB): This signal is basically a phase delayed version of CHSELB<3:0> that is enabled by CHENB<3:0>.
3. CHENB<3:0> (channel enable): This signal is generated by latching the creadyb<3:0> (channel ready) signal using the rising edge of CHSELB<3:0>.

Four versions of these three signals are generated, one for each channel. The CHSELB<3:0> and CECHSEL_PDEL<3:0> signals are asserted once per tclkb (test clock) period. Consequently, the CHENB<3:0> signal can change at most once per tclkb period.

Counter Reset Circuit

This circuit ensures that CHSELB<3:0> and CECHSEL_PDEL<3:0> are asserted once per tclkb period. (The tclk signal may range from 4-255 clk cycles.)
Once the 2-bit counter reaches 11, further counting is disabled using I36 until tclkb is 1. I37 is needed to disable I36 for the first cycle.

Channel Status Register

This circuit stores the status (CREADY) of each channel.

2-bit counter

This two-bit counter is decoded to generate CHSEL<3:0> and CECHSEL_PDEL<3:0>.

Counter Decoder and Output Sync

This circuit decodes the 2-bit counter and provides the appropriate output synchronization.

chsmod

This schematic contains one chmod for each channel in the chip. The decoders select which channel to perform the read/write operation on. The I42 FF is used to override the FV_EN signal when the channel is inactive.

clkmod

The clock module contains all the circuits that generate timing related signals.

cmp13

This is a 13-bit comparator with equality indicated by an active low output.

cmpcnt

This 12-bit counter has an overflow bit to indicate that a counter overflow has occurred. This counter is reset by sreset, which is asserted at the start of each test.

cntrl

This module generates internal control signals for the various operating modes and monitors the internal and external signals for termination conditions. Mode changes (program, calibration, and test) are synchronized to the falling edge of clk.

Test Mode

Once test mode is entered, it can be terminated by several causes: i) an external halt, ii) a mode change, iii) an end of instruction sequence, or iv) a comparison error. If the termination signal (I62) is received more than one clk cycle prior to the end of the tclk cycle, test mode will be terminated at the end of the current tclk cycle. Otherwise, testing will stop at the end of the next tclk cycle.

Test mode stop causes the clock to stop, which unasserts tmclk_enb on the next falling edge of clk. This action causes tm_enb to be unasserted after stopped is asserted.
The aendb termination signal should be timed to cause testing to stop when the ending instruction has completed execution.

Termination caused by compare errors (signaled by cmp_okb) always causes testing to stop at the end of the next test cycle, because the termination signal is always received at the beginning of the test cycle following the erroneous test cycle. Termination due to internal causes will generate an external halt signal.

delaycnt

This 6-bit ripple counter provides synchronous delay for placing waveform edges.

dlyln

This variable-length delay line allows lengths from 0 to 31 gate delays. Bypass of the delay line is also provided by setting del<0>==0.

fmem

The format memory is 31 bits wide. The current implementation has 4 words. Data is written into this memory 8 bits at a time, but is read 32 bits at once. For external (off-chip) reading, the multiplexor tree selects which byte to output. During test mode (i.e. tm_enb==0) reading is always enabled.

fmemmod

The “format memory index” points to the format last specified by the format instruction. At the start of a test, this register is reset to point at format zero, the default format. This five-bit register could index up to 32 formats although only 4 formats are provided in this prototype.

The outputs of the format memory are partially buffered by I12. The unbuffered outputs are latched elsewhere (directly into counters located in the waveform formatting circuits).

The FMT<30:0> bus provides the current format data while the NFMT<18,1:0> bus provides format data that will be used during the next test period. Certain wave formatting circuits need this information ahead of time.

The fmt<2> bit is reset to disable output before the start of test.

fmword

This is a 31-bit word for the format memory.

gatedel

This provides the minimal delay element for the delay line. (The delay line is actually laid out so this schematic is only used for simulation purposes.
Changing this schematic will not change the actual implementation.)

edgedel

The edgedel (edge delay) module generates a falling edge that latches the current test vector for driving the output. This module comprises three major components:

1. delaycnt (delay counter): A synchronous counter which subdivides the test period (tclk periods) into a number of clk periods.
2. one-shot pulse generator: The dffrbs and the combinational logic form an active low pulse generator that is triggerable once per tclk period. This pulse generator is triggered synchronously by WFSTART and clk. The pulse duration is either the low or high phase of clk depending on del<5>. The tclkb signal asynchronously resets the pulse generator on its falling edge.
3. dlyln (delay line): A delay line formed using buffer (buf) cells as gate delays.

More timing details about this module are given in the hand drawn timing diagrams.

idecdel

The idecdel (instruction decode) module decodes the instruction specified by the 2-bit opcode (opcode<1:0>). The operations performed by these instructions are executed by combinations of the four control signals given in Table C.4.

Control Signal   Purpose
WRIDX_ENB        - used to load format index
                 - indicates a format instruction
SLD_ENB          - used to load the shift register
SCLD_ENB         - enables the shifting of the shift register
                 - indicates a vector instruction
RLD_ENB          - used to load the counter

Table C.4: Control signals generated by instruction decoder.

Table C.5 shows how these four control signals are used to implement the functionality of the four instructions.

          WRIDX_ENB   SLD_ENB   SCLD_ENB   RLD_ENB
Vector    1           0         0          0
Format    0           0         1          1
Repeat    1           1         1          0
Note: 0=active 1=inactive (i.e. active-low)

Table C.5: Control signals for executing instructions.

When a channel is not ready to decode a new instruction, cready becomes low, which causes the four control signals to synchronously become unasserted (high).
The inactive control signals allow the various controlled modules to complete their operations. Once an instruction completes execution, its channel is ready for the next instruction and cready becomes high.

The latching edges for the four control signals are:

tclk falling edge:      WRIDX_ENB
loadenb falling edge:   SLD_ENB, SCLD_ENB, and RLD_ENB

iobufmod

The I/O buffer module provides buffering for input instructions. Actually, this level of buffering is only needed in designs that exploit the idle periods in the instruction stream. This module is used only for input buffering in the current design.

latreg13

This 13-bit register with output enable is used for static address bound registers. The memory I/O module contains the address registers and the I/O buffers. The I30 and MUX allow the iobufmod to be selectively written during calibration mode. During test mode, the write signals are internally generated.

modebits

This module contains the bits for the following three modes: i) stop on compare error, ii) stop on address end, and iii) sync signal output. The address decoding that is also done in this module is listed in Table C.6.

pulsegen0

The pulsegen0 (pulse generator active low pulse) circuit generates an active low pulse used for clearing the vector flip-flop. This pulse is generated at the beginning of the test period (e.g. RL) and/or at the t1 edge (the width time). In the following description, these pulses will be referred to as the START PULSE and the WIDTH PULSE respectively.

START PULSE

The start pulse is generated only for the RL format and the RC format with V=1. Since the start pulse occurs right at the beginning of the test period (it is triggered by tclkb→0), the D input data must be set up before tclkb→0. These inputs (nfmt<1:0> and nv) are determined by looking ahead to the next test period. The start pulse is activated when SP0.Q is 1. Table C.7 gives the truth table for generating SP0.D.
This table was reduced using the Karnaugh map shown in Table C.8. Equation C.1 gives the actual logic function that was implemented.

10:6     5:0      Description
00000             System Space
         000000   tclk period [W/O]
         000001   address wrap (4) / stop on compare error (4) [W/O]
         000010   sync. output select (3-bit) [W/O]
         0001xx   Channel buffers during calibrate mode [W/O]
         010000   Addr. end status (4) / comp. error status (4) [R/O]
         001000   LSB ch#0 error count [R/O]
         001001   MSB ch#0 error count [R/O]
         001010   LSB ch#1 error count [R/O]
         001011   MSB ch#1 error count [R/O]
         001100   LSB ch#2 error count [R/O]
         001101   MSB ch#2 error count [R/O]
         001110   LSB ch#3 error count [R/O]
         001111   MSB ch#3 error count [R/O]
00001             Input Address Space Bounds
         0xx000   LSB restart address for ch#xx
         0xx001   MSB restart address for ch#xx
         0xx010   LSB end address for ch#xx
         0xx011   MSB end address for ch#xx
         0xx100   LSB starting address for ch#xx
         0xx101   MSB starting address for ch#xx
00010             Format Memory
to                Currently using only block 00010
10000
         ccffbb   cc = channel number
                  ff = format number
                  bb = byte number (byte ordering: 3 2 1 0)

Table C.6: Address decoding in modebits module.

nv   nfmt<1:0>   SP0.D
0    00          0
0    01          0
0    10 (RL)     1
0    11          0
1    00 (RC)     1
1    01          0
1    10 (RL)     1
1    11          0

Table C.7: Truth table for generating SP0.D in pulsegen0 module.

            nfmt<1:0>
            00   01   11   10
nv = 0      0    0    0    1
nv = 1      1    0    0    1

Table C.8: Karnaugh map for generating SP0.D in pulsegen0 module.

\[
\mathrm{SP0.D}
= \overline{\mathrm{nfmt}\langle 0\rangle + \overline{\mathrm{nfmt}\langle 1\rangle}\cdot\overline{\mathrm{nv}}}
= \overline{\mathrm{nfmt}\langle 0\rangle + \overline{\mathrm{nfmt}\langle 1\rangle + \mathrm{nv}}}
\qquad \text{(C.1)}
\]

The SP0 flip-flop is active in the 1 state (since this causes PULSE_RB→0). The SP0 FF is reset when fvb == 1 (i.e. the vector FF is 0).

WIDTH PULSE

The width pulse is generated only for the RL format and the RC format with V=1. If these conditions are met, the width pulse is generated when wpin goes low. The width pulse is activated when WP0.Q is 1. This depends on the combinational logic at the D input. Table C.9 gives the truth table for generating WP0.D. This table was reduced using the Karnaugh map shown in Table C.10.
Equation C.2 gives the actual logic function that was implemented.

v    fmt<1:0>    WP0.D
0    00          0
0    01          0
0    10 (RL)     1
0    11          0
1    00 (RC)     1
1    01          0
1    10 (RL)     1
1    11          0

Table C.9: Truth table for generating WP0.D in pulsegen0 module.

            fmt<1:0>
            00   01   11   10
v = 0       0    0    0    1
v = 1       1    0    0    1

Table C.10: Karnaugh map for generating WP0.D in pulsegen0 module.

\[
\mathrm{WP0.D}
= \overline{\mathrm{fmt}\langle 0\rangle + \overline{\mathrm{fmt}\langle 1\rangle}\cdot\overline{\mathrm{v}}}
= \overline{\mathrm{fmt}\langle 0\rangle + \overline{\mathrm{fmt}\langle 1\rangle + \mathrm{v}}}
\qquad \text{(C.2)}
\]

The WP0 flip-flop is active in the 1 state (since this causes PULSE_RB→0). The WP0 FF is reset when fvb == 1 (i.e. the vector FF is 0).

pulsegen1

The pulsegen1 (pulse generator active high pulse) circuit generates an active high pulse used for setting the vector flip-flop. This pulse is generated at the beginning of the test period (e.g. RH) and/or at the t1 edge (the width time). In the following description, these pulses will be referred to as the START PULSE and the WIDTH PULSE respectively.

START PULSE

The start pulse is generated only for the RH format and the RC format with V=0. Since the start pulse occurs right at the beginning of the test period (it is triggered by tclkb→0), the D input data must be set up before tclkb→0. These inputs (nfmt<1:0> and nv) are determined by looking ahead to the next test period. The start pulse is activated when SP1.Q is 0. Table C.11 gives the truth table for generating SP1.D. This table was reduced using the Karnaugh map shown in Table C.12. Equation C.3 gives the actual logic function that was implemented.

nv   nfmt<1:0>   SP1.D
0    00 (RC)     0
0    01 (RH)     0
0    10          1
0    11          1
1    00          1
1    01 (RH)     0
1    10          1
1    11          1

Table C.11: Truth table for generating SP1.D in pulsegen1 module.

            nfmt<1:0>
            00   01   11   10
nv = 0      0    0    1    1
nv = 1      1    0    1    1

Table C.12: Karnaugh map for generating SP1.D in pulsegen1 module.

\[
\mathrm{SP1.D}
= \mathrm{nfmt}\langle 1\rangle + \mathrm{nv}\cdot\overline{\mathrm{nfmt}\langle 0\rangle}
= \overline{\overline{\mathrm{nfmt}\langle 1\rangle}\cdot\overline{\mathrm{nv}\cdot\overline{\mathrm{nfmt}\langle 0\rangle}}}
\qquad \text{(C.3)}
\]

The SP1 flip-flop is active in the 0 state (since this causes PULSEJB→1). The SP1 FF is reset when fvb == 0 (i.e., the vector FF is 1).
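As a cross-check on the start-pulse logic, the truth tables for SP0.D (Table C.7) and SP1.D (Table C.11) can be compared exhaustively against the minimal expressions read off the corresponding Karnaugh maps. The sketch below is illustrative only: the function and dictionary names are hypothetical, and the placement of the complement bars in the expressions is an assumption inferred from the truth tables, since the printed overbars are not legible in this copy of the text.

```python
# Truth-table rows keyed by (nv, nfmt<1>, nfmt<0>), values are the D inputs.
# Rows are copied from Table C.7 (SP0.D) and Table C.11 (SP1.D).
SP0_D_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 0, (0, 1, 0): 1, (0, 1, 1): 0,
    (1, 0, 0): 1, (1, 0, 1): 0, (1, 1, 0): 1, (1, 1, 1): 0,
}
SP1_D_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 0, (0, 1, 0): 1, (0, 1, 1): 1,
    (1, 0, 0): 1, (1, 0, 1): 0, (1, 1, 0): 1, (1, 1, 1): 1,
}


def sp0_d(nv: int, nfmt1: int, nfmt0: int) -> int:
    """Assumed NOR-style form: NOT(nfmt<0> OR NOT(nfmt<1> OR nv))."""
    return int(not (nfmt0 or not (nfmt1 or nv)))


def sp1_d(nv: int, nfmt1: int, nfmt0: int) -> int:
    """Assumed sum-of-products form: nfmt<1> OR (nv AND NOT nfmt<0>)."""
    return int(nfmt1 or (nv and not nfmt0))


def matches(fn, table) -> bool:
    """True when fn reproduces every row of the truth table."""
    return all(fn(*inputs) == out for inputs, out in table.items())


print(matches(sp0_d, SP0_D_TABLE), matches(sp1_d, SP1_D_TABLE))
```

Because the width-pulse tables (C.9 and C.13) list the same output columns with v and fmt<1:0> in place of nv and nfmt<1:0>, the same two functions cover WP0.D and WP1.D as well.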

WIDTH PULSE

The width pulse is generated only for the RH format and the RC format with V=0. If these conditions are met, the width pulse is generated when wpin goes low. The width pulse is activated when WP1.Q is 0. This depends on the combinational logic at the D input. Table C.13 gives the truth table for generating WP1.D. This table was reduced using the Karnaugh map shown in Table C.14. Equation C.4 gives the actual logic function that was implemented.

v    fmt<1:0>    WP1.D
0    00 (RC)     0
0    01 (RH)     0
0    10          1
0    11          1
1    00          1
1    01 (RH)     0
1    10          1
1    11          1

Table C.13: Truth table for generating WP1.D in pulsegen1 module.

            fmt<1:0>
            00   01   11   10
v = 0       0    0    1    1
v = 1       1    0    1    1

Table C.14: Karnaugh map for generating WP1.D in pulsegen1 module.

\[
\mathrm{WP1.D}
= \mathrm{fmt}\langle 1\rangle + \mathrm{v}\cdot\overline{\mathrm{fmt}\langle 0\rangle}
= \overline{\overline{\mathrm{fmt}\langle 1\rangle}\cdot\left(\overline{\mathrm{v}} + \mathrm{fmt}\langle 0\rangle\right)}
\qquad \text{(C.4)}
\]

The WP1 flip-flop is active in the 0 state (since this causes PULSEJ?B→1). The WP1 FF is reset when fvb == 0 (i.e. the vector FF is 1). When resetb or sresetb is low, PULSES also goes low to reset the VFF.

rscnt

The repeat/shift counter keeps track of the number of cycles required by an instruction. Multi-cycle instructions must load the appropriate count to postpone decoding of the next instruction via creadyb. Single cycle instructions do not need to initialize this counter. The vector instruction generates scld_enb==0, which causes d<5:0>=111001 regardless of din<5:0>. The repeat instruction simply loads the desired number of repetitions into the counter.

sdisrbit

This shift register bit has the following truth table:

[Truth table for the shift register bit: not legible in the source text.]

shifter

Test data bits must go through this 7-bit shift register to be output to the wave formatter. This 7-bit shift register is primarily used to perform parallel to serial conversion on the vector sequence instruction.
The multiplexor, I27, allows copying the test data bit from bit 5 of the format instruction into the MSB of the SR for immediate output. The NV output gives the vector that would be used in the next test period.

* D has the same decoding as the second part of the table.

sync

This module multiplexes several signals to the SYNC output.

sel<2:0>   SYNC
000        creadyb
001        cmp_okb
010        v (unformatted vector)
011        fv_en (formatted vector I/O direction)
100        tclkb
101        loadenb
110        sresetb
111        tm_enb

tcbl

The tcbl (tester chip block level) schematic contains the core blocks of the tester chip.

tcil

The tcil (tester chip interface level) schematic adds some of the interface related blocks to the tcbl.

tclkgen2

The tclkgen2 (test clock generator 2) module generates the tclk, tclkb, loadenb and stopped signals. Since tclk is synchronously generated from clk, its period is an integral number of clk periods. The low phase of tclk is always two clk periods, while the high phase of tclk can vary from 2-259 clk periods (specified by the value loaded into the tclk high-phase register). When tclkgen2 is inactive, tclk and loadenb are high while tclkb is low. The tm_enb signal is used to activate tclkgen2 and should be asserted on the rising edge of clk. The first tclk period begins one clk phase after tm_enb is asserted. When tm_enb is unasserted, tclkgen2 remains active until the current tclk period is over. The stopped signal gives the current status of tclkgen2.

Clean Stop Circuit

Upon receiving a disable signal, this circuit causes the clock generator to stop at the end of the current test period. This feature is intended to support cleanly stopping and restarting of testing. The reset signal stops the clock generator immediately.

tcpl

The tcpl (tester chip pad level) schematic adds the pads to the chip.

tmdel

This circuit provides the appropriate synchronous delay in the test mode signal to properly start up the pipeline.
This module also generates the start reset signal (sresetb).

ts

The ts (tester system) module provides a system level simulation of the tester chip. A ROM module, I1, is used to store the encoded vectors used for simulations. This ROM module was produced using a ROM generator.

vectmod

This module contains the instruction latch (I4), instruction decoder (I36), the format memory module (I25), a 7-bit shift register (I63) and a counter (I3). These components generate the vector and formatting data required by the wave formatting module.

The CREADYB signal indicates whether the channel is ready for more instructions. This signal is used for instructions that could provide vectors for more than one test cycle. In these cases, the counter (I3) is used to determine when new data is required. The instruction decoder (I36, idecdel) generates the control signals required to execute the various instructions.

vectsel

This circuit selects the appropriate test data source depending on the format chosen.

wfmt2

This waveform formatter module generates a formatted waveform given the desired format and test data value.

widthdel

The widthdel (width delay) module generates an active low pulse used to reset the flip-flop after the test vector has been applied for an appropriate duration (“width”). The operation of this module is similar to that of the edgedel except that the D input of I82 provides a means for disabling the pulse (in edgedel the D input is tied high).

C.2 FTC Timing

Figure C.1 gives a timing diagram showing the fetching of instructions to the decoding stage. The signals shown in gray are used in the four channel version.
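The interleaving of per-channel fetches on the shared address bus in a four channel version can be pictured with a very small model. This is a rough sketch under stated assumptions rather than a description of the actual pipeline: the function name and arguments are hypothetical, readiness is abstracted to one boolean per channel per tclkb period, each channel is assumed to be offered one fetch slot per period in channel order, and all phase delays and latching edges are ignored.

```python
def interleaved_fetches(start_addrs, ready_by_period):
    """Return the (channel, address) pairs that appear on the shared bus.

    start_addrs:      initial fetch address for each channel.
    ready_by_period:  one list of booleans per tclkb period; entry ch is
                      True when channel ch may fetch in that period.
    """
    addrs = list(start_addrs)
    bus = []
    for ready in ready_by_period:
        # Assumption: channels are selected one after another, once per period.
        for ch, is_ready in enumerate(ready):
            if is_ready:
                bus.append((ch, addrs[ch]))
                addrs[ch] += 1  # this channel's own address advances
    return bus


# Two periods; channel 1 is busy (not ready) during the second one.
trace = interleaved_fetches(
    [0, 100, 200, 300],
    [[True, True, True, True], [True, False, True, True]],
)
for channel, address in trace:
    print(channel, address)
```

A channel that is still executing a multi-cycle instruction simply skips its slot, so its address does not advance while the other channels continue fetching.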

[Figure C.1: timing diagram. Signals shown: clk, tm_enb, addr<12:0>, din<7:0>, dout<7:0>, ch_selb (channels 0 to 3), tclkb, loadenb, D<7:0>, WRIDX_ENB, FDATLAT_ENB, SLD_ENB, SCLD_ENB, RLD_ENB. Annotations: "control signals latched for use", "control signal valid".]

Figure C.1: Timing diagram showing the fetching of instructions to the decoding stage.

C.3 FTC Circuit Schematics

This section contains the schematics corresponding to the circuit documentation provided in Section C.1.

[Circuit schematics: drawings not available in this text version.]
