UBC Theses and Dissertations

Testability infrastructure for Systems-on-Chip Nahvi, Mohsen 2004-12-31

TESTABILITY INFRASTRUCTURE FOR SYSTEMS-ON-CHIP

by

MOHSEN NAHVI

B.Sc. University of Manchester Institute of Science and Technology, 1989
M.Sc. University of Manchester Institute of Science and Technology, 1990

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES (ELECTRICAL AND COMPUTER ENGINEERING)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
June 2004
© Mohsen Nahvi, 2004

Abstract

Relying on external automatic test equipment (ATE) resources is insufficient for the new paradigm of billion-transistor core-based System-on-Chip (SoC) designs. Embedded testers that take over some functionality of these ATEs are increasingly deemed essential. To achieve high-quality test and reduce cost, these embedded infrastructures need to perform deterministic tests and exploit the advantages of automatic test pattern generation (ATPG) test vector sets. This thesis proposes an embedded testing infrastructure that leverages the potential of classical embedded testing in the form of Built-in Self-Test (BIST). However, unlike BIST, the methodology of this thesis is based on the conventional scan/ATPG approach. This novel methodology partitions test resources to embed the test application and test results analysis on-chip while keeping the ATPG test vector files off-chip. The proposed infrastructure was implemented on silicon, and experimental area and test time results are reported. Using the methodology of this thesis, a high-quality deterministic test, with reduced overall test time through ideal multi-site testing, can be achieved.

Modular, flexible, and systematic test architectures are also deemed essential in SoC tests. The conventional testing paradigm requires a direct connection between a tester and the circuit under test (CUT).
This arrangement undermines the modularity in the test architecture by tightly coupling its elements. This thesis proposes to de-couple test data processing and communication to lower test cost. To that end, a novel systematic and indirect test architecture that is based on network-oriented protocols is proposed. In this new architecture, test stimuli and expected results for digital cores are formatted into new protocols and then encapsulated into packets. These packets are augmented with control and address bits allowing them to be autonomously transmitted to their destination through a switching infrastructure. Finally, embedded autonomous blocks at each core are used for applying the test and comparing the results. In this way, the methodology of this thesis facilitates test cycle automation and eliminates the need for control lines. This results in better utilisation of available resources. A first implementation of this new architecture and its area and test time impact are presented.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acronyms
Acknowledgements
Chapter 1 Introduction
1.1 Current Test Strategies
1.2 Research Goals
1.2.1 Dedicated Autonomous Scan-based Testing
1.2.2 Test Network-on-Chip
1.3 Contributions
1.4 Thesis Organization
Chapter 2 Background and Motivations
2.1 System-on-Chip Design Methodology
2.2 Reuse Paradigm
2.3 Test Challenges
2.3.1 Test Challenges in DSM Technology
2.3.2 Test Challenges in SoC
2.4 SoC Test Trends
2.4.1 Test Reuse
2.4.2 Systematic Test Architecture
2.4.3 Test Resources Partitioning
2.4.4 Multi-site Testing
2.4.5 Test Data Compression
2.4.6 Embedded Testing
2.5 Current Test Architectures
2.5.1 Standard Wrapper
2.5.2 Built-in Self Test
2.5.3 ATE-based Test Architectures
Chapter 3 Dedicated Autonomous Scan-Based Testing
3.1 Introduction
3.2 Components of Scan-based Testing
3.3 DAST Concept
3.4 Implementation
3.4.1 EAS & EARA Compilers
3.4.2 EAS & EARA Hardware
3.5 Experimental Procedure
3.6 Results
3.7 Summary and Conclusions
Chapter 4 Network-Oriented Indirect and Modular Architecture for Test
4.1 Introduction
4.2 NIMA Concept
4.3 Physical Layer
4.4 Network Layer
4.4.1 Packet Format
4.4.2 Switches
4.4.3 Dynamic Addressing Mechanism
4.4.4 Routing in the Switches
4.5 Application Layer
4.6 Implementation
4.6.1 Physical Layer
4.6.2 Network Layer
4.6.3 Application Layer
4.7 Experimental Results
4.7.1 Area and Power Overhead
4.7.2 NIMA's Test Time
4.8 Summary and Conclusions
Chapter 5 Conclusions
5.1 Summary
5.2 Future Work
Bibliography
Appendix A: DAST Test Time Models
Appendix B: Theoretical Test Time Models for Serial ATE-based Testing

List of Tables

Table 2-1: ITRS Prediction for Number of Transistors on a Chip [4]
Table 3-1: EAS Area and Power for the UBC_SoC Benchmark Cores
Table 3-2: EARA Area and Power for the UBC_SoC Benchmark Cores
Table 3-3: EAS Area and Power for ITC'02 SoC Benchmark Modules
Table 3-4: EARA Area and Power for ITC'02 SoC Benchmark Modules
Table 3-5:
Simulated DAST Test Time (clock cycles) and its Test Time Models Prediction Values for Cores of UBC_SoC
Table 3-6: Predicted Test Time (clock cycles) for ITC'02 SoC Benchmarks Modules in DAST Methodology
Table 4-1: NIMA Simulated Test Time and its Test Time Model Prediction Values for Cores of UBC_SoC (in clock cycles)
Table 4-2: Predicted NIMA Test Time for ITC'02 SoC Benchmarks Modules (in clock cycles)

List of Figures

Figure 1-1: An example of a simplified SoC test architecture
Figure 1-2: DAST concept
Figure 1-3: NIMA concept
Figure 2-1: Example of a System-on-Chip [5]
Figure 2-2: Potential design complexity and designer productivity [9]
Figure 2-3: Design development and test flow for (a) System-on-Board, and (b) System-on-Chip [15]
Figure 2-4: A generic conceptual test architecture [15]
Figure 2-5: Block level overview of a P1500 wrapper [45]
Figure 2-6: Conceptual view of the required P1500 wrapper architecture [45]
Figure 2-7: Block diagram of a typical BIST arrangement
Figure 3-1: Block diagrams of (a) D-type flip-flop, and (b) Scan D-type flip-flop
Figure 3-2: Generic scan test waveforms
Figure 3-3: Concept of DAST
Figure 3-4: DAST design flow
Figure 3-5: EAS compiler algorithm
Figure 3-6: EARA compiler algorithm
Figure 3-7: Hardware implementation of DAST components
Figure 3-8: EAS block Algorithmic State Machine
Figure 3-9: EARA block Algorithmic State Machine
Figure 3-10: UBC_SoC Benchmark
Figure 3-11: P1500 BSR and adjusted EAS and EARA areas for modules of ITC'02 Benchmarks
Figure 4-1: Conceptual representation of NIMA
Figure 4-2: An example showing function of the IM block
Figure 4-3: The conceptual 3-layer model in NIMA
Figure 4-4: NIMA packet format
Figure 4-5: Black box diagram of a switch in NIMA with four output channels
Figure 4-6:
Occupancy of test-stimuli sub-channels in NIMA switches
Figure 4-7: Occupancy of test-results sub-channels in NIMA switches
Figure 4-8: A typical payload array in sub-channels with invalid last bits marked by "X"
Figure 4-9: Address spaces for NIMA's Network Layer
Figure 4-10: Interconnect architecture for the switch fabric in UBC_SoC
Figure 4-11: Block diagram of a switch in the NIMA implementation

Acronyms

ASIC  Application-Specific Integrated Circuit
ASM  Algorithmic State Machine
ATE  Automatic Test Equipment
ATPG  Automatic Test Pattern Generation
BIST  Built-in Self-Test
BSR  Boundary Scan Register
CAS  Core Access Switch
CTL  Core Test Language
CUT  Circuit under Test
DAST  Dedicated Autonomous Scan-based Testing
DFT  Design for Testability
DSM  Deep Sub-Micron
EARA  Embedded Autonomous Results Analyzer
EAS  Embedded Autonomous Sequencer
EDA  Electronic Design Automation
FSM  Finite State Machine
GALS  Globally Asynchronous and Locally Synchronous
IC  Integrated Circuit
I-IP  Infrastructure IP
ILP  Integer Linear Programming
IM  Interface Matching
I/O  Input/Output
IP  Intellectual Property
ITRS  International Technology Roadmap for Semiconductors
LAN  Local Area Network
LAS  Logical Address Space
LFSR  Linear Feedback Shift Register
LSB  Least Significant Bit
NIMA  Network-oriented Indirect and Modular Architecture
NoC  Network on Chip
PAS  Physical Address Space
PI  Primary Input
PLL  Phase Locked Loop
PO  Primary Output
SC  Scan Chain
SE  Scan-cell Element
SECT  Standard for Embedded Core Test
SI  Scan Input
SO  Scan Output
SoC  System-on-Chip
TAM  Test Access Mechanism
TAP  Test Access Port
TLM  TAP Link Module
TP  Test Pattern
TS  Test Select
UDL  User-Defined Logic
VLSI  Very Large Scale Integrated Circuit
WAN  Wide Area Network
WBR  Wrapper Boundary Register
WIR  Wrapper Instruction Register
WPC  Wrapper Parallel Control
WPI  Wrapper Parallel
Input
WPO  Wrapper Parallel Output
WSC  Wrapper Serial Control
WSI  Wrapper Serial Input
WSO  Wrapper Serial Output

Acknowledgements

I am very grateful to my supervisor, Professor Andre Ivanov, for his friendship, support, guidance, and encouragement in the course of my graduate studies. I have been truly fortunate to have the pleasure of working with him for the past few years while conducting the research of this thesis. Andre has contributed a great deal to the manner in which I have conducted the work of this thesis, and to the development of my ability to write technical papers and be articulate when presenting my work. I thank him most sincerely for all his contributions to my professional life.

My sincere thanks go to my dear wife, Elham. Her love and support have always been unconditional and heart warming. She never ceases to amaze me with her ability to be so loving and caring. She has contributed so much to the successful completion of my graduate studies through her constant support, kindness, and love. Thank you Elham.

I am also grateful to Professor Resve Saleh and Dr. Steve Wilton who have helped and encouraged me as my supervisory committee. My thanks are also extended to the many great people working in the System-on-Chip (SoC) lab, and, in particular, to Roozbeh Mehrabadi and Victor Aken'Ova for their readiness to help in any way. I am also grateful to Doris Metcalf, and Shahab and Ameneh Ghoreishi.

Finally, I would like to thank Canadian Microelectronics Corporation (CMC), Micronet, PMC-Sierra, Genum Corporation, and the University of British Columbia for their financial support.

To Elham, for her love and kindness.

Chapter 1 Introduction

The semiconductor industry has witnessed an astonishing rate of growth in the past 40 years. The fundamental element contributing to this growth has been the continuous reduction in the size of the transistors on a chip.
This trend has been doubling the number of transistors on an integrated circuit (IC) every eighteen months or so, a well-known trend referred to as Moore's law. According to the 2003 edition of the International Technology Roadmap for Semiconductors (ITRS), this rate of growth could be sustained for at least fifteen more years, resulting in even more complex very large scale integrated (VLSI) circuits [1].

However, random failures in the process of fabricating an IC cannot be avoided and these failures result in defective chip circuitry. Hence, it is essential to test semiconductor ICs after fabrication. This post fabrication test is intended to guarantee, with a high probability, the absence of possible defects in the chips. In this way, the post fabrication test verifies a chip's reliability before it is used in a system [2][3].

ICs are tested in different stages (sometimes referred to as test insertions). Based on the intended goals, test insertions can target the parametric, temperature stressing, logical, functional, and timing integrity of the integrated circuit under test (CUT). Testing VLSI circuits can be a very expensive and difficult process and will grow in its cost and complexity as ICs follow Moore's law [4]. Design for testability (DFT) techniques are, hence, essential to facilitate the testing of VLSI chips and reduce test cost [2][4]. In addition to certain design guidelines, which enhance the testability of a CUT, DFT techniques generally consist of added hardware or circuitry that improves observability and controllability of internal nodes of the CUT.

The focus of this thesis is on DFT techniques for digital circuits. In particular, this thesis proposes two effective and novel testability infrastructures for testing embedded digital blocks of system-on-chip (SoC) designs.
These unique infrastructures combine the advantages of the current SoC test solutions in innovative ways in order to accommodate different trends and requirements in addressing the SoC test challenges. The solutions developed in this thesis lower test cost by partitioning the test resources, performing embedded deterministic test, and enabling ideal multi-site parallel testing. Moreover, when compared to existing solutions, the proposed techniques described in this thesis exhibit modular, systematic, and flexible architectures that result in improved quality, productivity, and diagnostic capability of SoC post fabrication test. Since the focus of this thesis is on DFT techniques for digital blocks of SoC designs, unless explicitly mentioned otherwise, this thesis uses the terms test and DFT as applied to digital components of an SoC design.

Section 1.1 of this chapter outlines the current test strategies. Section 1.2 motivates and presents the research goals of this thesis. Section 1.3 outlines the contribution of this thesis in terms of its original and novel work. Finally, Section 1.4 provides the organization of the rest of this thesis.

1.1 Current Test Strategies

There are two distinctive testing strategies for digital circuits: external testing of a chip using automatic test equipment (ATE); and embedded self-testing in the form of logic Built-in Self-Test (BIST).

In ATE-based testing, automatic test pattern generation (ATPG) tools are used to create deterministic test vector sets. Using the ATPG test vector sets, the ATE applies the test stimuli to the CUT and collects the test results to compare them with the expected values. Effectively, in the ATE-based testing strategy, the communication and the application of test data are closely coupled.
In other words, the ATE acts as both a test source and a sink, and it is assumed that the tester establishes a data path between itself and a core such that the tester has direct control of features of the core's DFT. Using this data path, the tester applies the test data to the core before it collects and observes the results. Figure 1-1 shows an example of a simplified test arrangement with six data lines (also referred to as a test access mechanism, or TAM) and two control lines. In Figure 1-1, the data lines are grouped into two TAMs such that the tester can have direct control of the scan chains of each core, while minimising test time.

Figure 1-1: An example of a simplified SoC test architecture. (Diagram labels: data-in lines, control lines, data-out lines, wrapped cores, scan chain with 8 flip-flops, TAM 1 with one data line, TAM 2 with two data lines.)

In contrast to the ATE-based testing strategy, typical logic BIST techniques rely on pseudo-random test patterns. In a typical BIST arrangement, embedded blocks generate the test stimuli and apply them to the CUT. Test results are compacted into signatures and these signatures are compared to the expected ones. In effect, the BIST strategy can be considered as the opposite of the ATE-based strategy, as both the source and the sink of the test architecture are on-chip.

1.2 Research Goals

1.2.1 Embedded Deterministic Scan-based Testing

One goal of this thesis is the development of a new methodology for embedded deterministic testing of digital cores in SoC designs. In this thesis, this novel method is referred to as dedicated autonomous scan-based testing (DAST). With DAST, the goal is to leverage the potential of logic BIST in SoC test, without paying the potential penalties of BIST.
To that end, the DAST methodology uses embedded test resources that use ATPG test stimuli and expected results, as illustrated in Figure 1-2. Effectively, in the DAST methodology, DFT-tester resources are partitioned, keeping the ATPG test vector sets off-chip while embedding the control and the observation functions at the cores. As such, DAST is a novel embedded deterministic testing infrastructure, where both the test stimuli and the expected test results of the ATPG test vectors are used for on-chip ATPG-grade testing. In the unique methodology of DAST, the embedded resources require the communication of ATPG-based test stimuli and expected results through global interconnects that act as a communication link. This communication link can be any generic TAM or the novel on-chip network-oriented switching fabric developed in this thesis as outlined in Section 1.2.2.

Figure 1-2: DAST concept. (Diagram labels: ATPG-based test vector sets, Control & Observe blocks with Pass/Fail outputs, SoC.)

The partitioning of the test resources in DAST lowers DFT-tester cost. This, in turn, reduces total production costs. However, unlike conventional BIST, the partitioning of the test resources in DAST does not require any changes to embedded cores and only requires minimal modification of the ATPG test flow. In this way, DAST results in flexible and modular test architectures that facilitate the reuse of test resources and achieve a highly productive test scheme.

DAST benefits from the advantages of both BIST and ATPG-based testing as, with DAST, ATPG test vector sets are used alongside embedded blocks. Hence, DAST enables testing to be performed at-speed and deterministically, ensuring high test quality with minimal test time.
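The partitioning described above can be illustrated with a small behavioural model. The following Python sketch is an illustration only and is not the hardware of this thesis: the function name run_deterministic_test, the two-input example core, and the injected fault are all hypothetical. It assumes that stimuli and expected responses arrive from off-chip as ATPG vector pairs, and that the embedded block applies each stimulus and compares the response bit by bit, recording the position of every mismatch.

```python
from typing import Callable, List, Sequence, Tuple

Bits = Tuple[int, ...]


def run_deterministic_test(
    cut: Callable[[Bits], Bits],
    vectors: Sequence[Tuple[Bits, Bits]],
) -> Tuple[bool, List[Tuple[int, int]]]:
    """Apply each stimulus to the CUT model and compare bit by bit.

    Returns (passed, mismatches); each mismatch is (pattern index, bit index).
    """
    mismatches: List[Tuple[int, int]] = []
    for p, (stimulus, expected) in enumerate(vectors):
        response = cut(stimulus)
        for b, (got, want) in enumerate(zip(response, expected)):
            if got != want:
                mismatches.append((p, b))
    return (not mismatches, mismatches)


# Hypothetical two-input core with outputs (a AND b, a XOR b).
def good_core(s: Bits) -> Bits:
    a, b = s
    return (a & b, a ^ b)


# Same core with its AND output stuck at 0 (an assumed defect).
def faulty_core(s: Bits) -> Bits:
    a, b = s
    return (0, a ^ b)


# Stand-in for an off-chip ATPG vector file: (stimulus, expected response).
VECTORS = [((a, b), good_core((a, b))) for a in (0, 1) for b in (0, 1)]

print(run_deterministic_test(good_core, VECTORS))
print(run_deterministic_test(faulty_core, VECTORS))
```

With the fault-free model the call returns (True, []); with the faulty model it returns (False, [(3, 0)]), that is, pattern 3, output bit 0.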
Moreover, contrary to the BIST methodology, the diagnostic information in DAST is not lost, as no compaction is implemented in the form of a signature.

Because both the test stimuli and the expected test results are sent to the cores, DAST, theoretically, achieves an ideal multi-site testing strategy for ATPG-based testing. Hence, given unlimited power for the external test resources, a tester with a number of test channels equal to the number of test pins of one chip can test an unlimited number of chips in parallel. This ideal case is achieved as the same data set is sent to all chips, enabling sharing of test channels between all the chips.

1.2.2 Test Network-on-Chip

As a second objective, this thesis proposes and develops the architecture of a novel on-chip test network. In this thesis, this architecture is referred to as a Network-oriented, Indirect and Modular test Architecture (NIMA). In NIMA, the tester is assumed to lack direct control over a core's DFT and is de-coupled from the TAM, and the core and its wrapper. This de-coupling facilitates partitioning of an external tester into its external and embedded components, and enhances the modularity and scalability of the test architecture. In addition, NIMA presents the concept of a highly hierarchical, modular, and flexible test architecture that can address high-productivity requirements of testing SoC designs and, ultimately, help in lowering test cost. Figure 1-3 illustrates combined NIMA and DAST concepts as a block diagram.

In NIMA, test data, comprised of test stimuli and test results, is formatted into a new protocol and augmented with control bits. In this way, control lines are eliminated in NIMA and the TAM comprises only test signal lines that communicate both the test data and the control bits.
Hence, using NIMA, the effective test time is reduced, as the TAM lines are used more efficiently.

Figure 1-3: Combined NIMA and DAST concepts. (Diagram labels: test stimuli and expected results in packet format; local sources (off-chip); remote sources (off-chip); sources (on-chip); transmission; Interface Matching (IM); switching fabric; cores with Pass/Fail outputs; SoC.)

The reformatted test data and the control bits in NIMA are then encapsulated into packets such that the test data can be forwarded to its core autonomously. This autonomous behaviour of the NIMA architecture serves well in automating test cycles, a requirement of highly productive test. Moreover, as no interaction from the tester is required in this step, a low-cost test template is achieved.

Finally, at a core, embedded autonomous blocks decode NIMA's packets and retrieve the test data. These embedded blocks apply the test stimuli to the core and compare the responses with expected results, identifying any error. This process can be handled using the DAST approach. In using autonomous embedded blocks, NIMA achieves the main requirements of SoC testing. More specifically, NIMA serves as a methodology that enables high-quality at-speed testing.

Moreover, an important system-level design challenge identified in [4] is due to system complexity. It is predicted that this challenge will force a focus on communication rather than computation in the next ten years. It is stated in [4] that: "At 65 nm and below, communication architectures and protocols for on-chip functional processing units will require significant change from today's approaches.
As it becomes impossible to move signals across a large die within one clock cycle or in a power-effective manner, or to run control and dataflow processes at the same clock rate, the likely result is a shift to asynchronous (or, globally asynchronous and locally synchronous (GALS)) design style. In such a regime, islands of self-timed functionality communicate via network-oriented protocols."

Therefore, an underlying assumption of this thesis is that future SoC designs will include a switching fabric coupled with network-oriented protocols, for core interconnect. Hence, this thesis proposes the first methodology and test architecture that enables the reuse of such a fabric for testing cores.

Evidently, the concept of having a network-on-chip (NoC) that uses network-centric protocols and a switching fabric for data communication on a chip is very new. Most relevant works are geared towards the problem of core interconnection [5][6][7][8]. The motivations for many of these works are the non-scalability of global wire delays and their effect on global synchronization, the degradation of bus electrical performance with every attached unit, and the need for special error control mechanisms because of the unreliable transmission medium.

Based on the open literature, no work on the subject of test architectures that use network-oriented approaches has been reported to date, with the possible exception of [9] and [10]. Reference [9] studies the impact of reusing an NoC for testing core-based systems, and reports the results of such reuse on test time based on two proposed scheduling mechanisms. The work in [10] studies the effect of power consumption on the test time and test scheduling when an NoC is reused. Hence, the studies in [9] and [10] focus on the general concept of test data packetization and its impact on test time, and they do not address the methodology and the implementation of the test architecture required for NoC reuse.
Therefore, this thesis is the first to propose, develop, and empirically validate the methodology and implementation of a test network on-chip that can use its own switching fabric or reuse other networks on-chip as they become available.

1.3 Contributions

In short, the principal contributions of this thesis are listed here.

• The DAST methodology of Chapter 3 is developed as a novel on-chip deterministic tester. In addition, DAST is developed such that it partitions the functionality of external testers into two distinctive operations of data delivery and control/observation. This partitioning enables embedding of the latter functionality at cores of an SoC design.

• The algorithms of Section 3.4.1 are developed to enable the full use of ATPG in the embedded testing methodology of DAST. The use of ATPG results in deterministic high-quality embedded testing in DAST, and the algorithms of Section 3.4.1 can be integrated into the design flow of a core to enhance its productivity.

• An efficient and simple implementation of DAST hardware is presented in Section 3.4.2. This implementation enables automatic and flexible DAST hardware development, as it uses a hardware description language with generic parameters that can be set for any core. Moreover, the implementation avoids design iterations, as it only requires the interface information of cores and does not alter cores or their functionality.

• DAST is characterized in terms of its test time models and these models are presented in Section 3.5.

• A novel on-chip test network is proposed and developed in Chapter 4. The NIMA methodology of Chapter 4 is developed to enable a hierarchical systematic test architecture that can reuse an on-chip switching fabric with network-oriented protocols.

• NIMA is developed as a 3-layer communication network to enable modularity and scalability of test architectures.
In this way, the design of NIMA facilitates test resource partitioning.

• A novel dynamic addressing mechanism to simplify the design of NIMA's network layer is presented in Section 4.4.3.

• A simple and efficient implementation of the NIMA network layer is presented in Section 4.6.2, while the DAST implementation is altered to work as the application layer of NIMA. To facilitate NIMA's flexibility and the automation of the integration of NIMA in an SoC design flow, this implementation uses a hardware description language and generic values for different parameters of NIMA's design.

• NIMA is characterized in terms of its test time models of Section 4.7.2.

• The methodologies of NIMA and DAST are empirically validated to fit into a typical ASIC design flow, and NIMA and DAST are implemented on silicon. This validation supports the claim that these methodologies can easily be integrated in a design flow.

1.4 Thesis Organization

Chapter 2 of this thesis, after the introduction of the SoC design methodology, reviews the test challenges in SoC designs. It then discusses the test trends and solutions in addressing these test challenges. Finally, Chapter 2 reviews the current test architectures in detail.

Chapter 3 reviews the concept of scan-based testing and identifies the components of scan-based testing, i.e., data delivery and control/observation. Chapter 3 then presents the concept of DAST in detail, provides the design flow, and presents an implementation for DAST. It also presents the experimental results when applying the DAST methodology to a number of benchmark circuits.

In Chapter 4, further motivation for NIMA is presented. Chapter 4 also provides NIMA's concept and architecture, and suggests an implementation of NIMA where DAST is integrated into the architecture.
It also presents the experimental results of using the NIMA methodology in a number of benchmark circuits.

Finally, Chapter 5 concludes the thesis by summarizing the results and thesis contributions and provides direction for future work.

Chapter 2 Background and Motivations

This chapter reviews the background and outlines the test challenges in SoC designs. The required test trends to address SoC test challenges are then discussed. Finally, current test architectures are reviewed and the research work of this thesis is motivated.

2.1 System-on-Chip Design Methodology

Table 2-1, an extract from [4], compares the number of transistors on a chip in the year 2001 to that predicted for the year 2016.

Table 2-1: ITRS Prediction for Number of Transistors on a Chip [4]

Year                                                                2001     2016
High-volume microprocessors: functions per chip at introduction      193    6,184
  (million transistors)
Application-specific ICs (ASIC): maximum functions per chip at       714   16,326
  production (million transistors)

Using the large number of transistors available on a chip, designers have already managed to place an entire electronic system on a single chip. These are referred to as systems-on-chip or system chips for short. An example of an SoC is illustrated in Figure 2-1, where different embedded functional blocks (or embedded cores) of the system are conceptually shown [11]. As the number of transistors increases, more complex systems, utilising hundreds of embedded cores, will be placed on a single chip.

Apart from the availability of large transistor counts, there are other reasons behind this trend. One important reason is the market demand for portable products with increasing functionality [11][12], and another important reason is the need for a lower product cost [11][13].

Figure 2-1: Example of a System-on-Chip [11].
The justification behind the first reason is that SoC designs are faster, more reliable, and require less power per function compared to systems on a board. This holds true as most of the communication is on-chip and, hence, there is no need for large and power-hungry drivers at the chip terminals [14]. In addition, the smaller footprint is more suitable for portable, low-weight, and low-cost products.

The rationale behind the comparatively lower cost of products using SoC chips is the fact that smaller and more compact products require fewer packages. Products with fewer packages, when compared to those with more packages, involve a comparatively simpler design process and integration. Hence, the overall product cost can be reduced significantly. In addition, a simpler design process results in faster introduction of the product into the market and, hence, will generate more profit.

2.2 Reuse Paradigm

Figure 2-2, adapted from [15], illustrates potential design complexity and designer productivity. In Figure 2-2, the horizontal axis represents the year and the left-hand vertical axis shows the number of logic transistors per chip, in millions. The right-hand vertical axis represents a measure of designer productivity, expressed as the number of transistors, in thousands, that each designer can design and verify per month. According to Figure 2-2, designers' productivity grows at a rate of 21% each year. However, the number of logic transistors per chip follows Moore's law and increases at a rate of 58% each year. Owing to these different rates of growth, the gap between the productivity of designers and the number of available transistors is widening very rapidly.

Figure 2-2: Potential design complexity and designer productivity [15]. (Plot axes: year; logic transistors per chip, in millions; productivity, in thousands of transistors per designer per month. Curves: complexity growth rate and productivity growth rate.)

It is becoming increasingly difficult to utilise all the transistors available on the chip effectively and still meet the time-to-market constraints. Nowadays, the very short life cycle of products compounds the problem further, as demand for a product might diminish before designers have a chance to introduce their version of the product [15][16][17][18].

To overcome the productivity gap and meet the stringent time-to-market constraints, it is not enough to put more manpower on the job. In fact, by virtue of the law of diminishing returns, it is not economical. Hence, the only viable solution put forward so far is for designers to reuse in-house or externally acquired pre-designed/verified intellectual properties (IP) as embedded cores in SoC designs [4][12][13][17][18][19][20]. Effectively, SoC designs need to be application-specific integrated circuits (ASIC) that maximize reuse of IPs to improve design productivity.

The concept of reuse is not new to the semiconductor industry. In the typical design flow of systems-on-board, pre-fabricated ICs are reused. However, the difference in the case of SoC design methodology is the fact that cores to be used in SoC designs are not fabricated before they are used. Instead, cores in SoC design methodology come in some form of hardware description. Based on the method of their hardware description, there are three main categories for cores, i.e., soft, firm, and hard cores [18]. Soft cores are those functional blocks that are described in any synthesizable hardware description language such as VHDL or Verilog. Firm cores are typically gate-level descriptions that are targeted for a specific fabrication technology and can be optimized for speed, area,
Firm cores are typically gate-level descriptions that are targeted for a specific fabrication technology and can be optimized for speed, area, power, etc. Finally, hard cores use the least flexible description method, and are the layout description (usually in a standard industry format such as GDSII) of optimized firm cores.

2.3 Test Challenges

In order to satisfy the reliability requirements of fabricated ICs, it is imperative to test these chips before they are shipped out to potential customers. However, there are many challenges in testing SoC chips, and many experts believe that testing SoC chips will be the bottleneck of future designs, if issues of DFT for SoCs are not addressed [4][21][22][23]. These challenges are the result of both the SoC design methodology itself [4][20][21][24][25][26], and the deep sub-micron (DSM) technology used to fabricate SoC chips [24][27][28][29]. This section summarizes these challenges.

2.3.1 Test Challenges in DSM Technology

The DSM fabrication process, shrinking chip geometries, the continuity of Moore's law, aggressive time-to-market and time-to-volume requirements, and finally, the need for lower total cost are all contributing to the test challenges in the DSM era. Based on different test requirements, some of the important test issues due to the DSM technology can be categorized into the following five groups [4][27][28][29][30].

High-quality Test

Very high-quality test, which achieves high fault coverage, is essential in manufacturing highly reliable chips. To guarantee high fault coverage, in addition to the stuck-at fault models, new fault models are needed to facilitate the detection of static and dynamic defects in the DSM fabrication processes [4][29][30].
Therefore, at-speed testing, using transition and path delay fault models in conjunction with scan chain(s), is becoming a requirement of chip testing [31]. However, in many cases, the performance of external testers lags behind the performance of the chips they are intended to test. In particular, external testers cannot operate at the high frequency required for at-speed testing of many CUTs and have insufficient accuracy for pin-to-pin timing requirements [31]. Hence, these testers lack the necessary accuracy and resources to handle at-speed testing and other new fault models. Therefore, new DFT methodologies and testing techniques are needed to facilitate at-speed testing. In addition, these new DFT methodologies and testing techniques should maintain the high quality of test and be able to use a variety of fault models.

Low-cost Test

With the almost exponential downward trend in IC fabrication cost, the cost of testing a chip will soon equal the cost of fabricating it, if new test and DFT methods are not devised [4]. Today's monolithic testers are not designed to take advantage of the DFT used within a chip. Moreover, these testers are designed to perform a variety of different tests and to support a broad range of ICs [27]. However, this results in very expensive testers with features that may not be needed or used for many CUTs or test insertions. It also requires companies to make a large capital investment to purchase these multi-purpose testers. In addition, owing to the rapid advancement of ICs, this large capital investment will be outdated within a few years, adding to the test cost. Thus, an important challenge facing the test community is finding solutions to reduce test cost through new DFT techniques, redesign of testers towards low-cost, reduced-functionality testers, and design of DFT-aware testers that take advantage of CUT DFT infrastructure.
Low-volume Test Data

To maintain high fault coverage, for complex designs, a larger test data volume is inevitable. In turn, larger test data sets require more memory in testers. In addition, with larger test data sets and a limited number of tester channels, test time will increase. These problems are exacerbated by the slower rate of increase in the number of a chip's primary input/output (I/O) pins when compared to that of the number of transistors on a chip. Therefore, reduction in the test data volume to lower test time by either reducing the test data set or increasing the effective bandwidth is a problem in need of a solution. Note that using low-cost testers can mitigate the problems associated with test data volume as, in these cases, the contribution of test time to the overall cost is reduced.

High-productivity Test

To lower the cost and keep the time-to-market and the time-to-volume of products to their minimum, the DFT insertion, the test program development, and the test process should be hierarchical and automated as much as possible [29]. Hence, DFT techniques and test resources need to be modular and flexible to facilitate a hierarchical and automated test flow. Moreover, DFT circuitry should be seamless with the normal design flow and, thus, the DFT insertion process must avoid design iteration.

Test for Diagnosis

Yield improvements of new chips are essential in lowering production costs and achieving the time-to-volume [32]. Diagnostic information helps in the rapid yield ramp-up. Thus, the test flow needs to include fault diagnosis as one of its objectives in order to avoid expensive off-line and special diagnosis testing processes. To this end, test resources and DFT techniques are needed to facilitate real-time yield analysis [28].

2.3.2 Test Challenges in SoC

The IP- or core-based design methodology is creating a new style of design with its own characteristics, requirements, benefits, and challenges.
One particular challenge is how to test these core-based systems. Testing of SoC designs differs from system-on-board testing. Figure 2-3, adopted from [21], illustrates the differences in the design development and the test flow between systems-on-chip and systems-on-board. In a system-on-board, each component is designed, verified, fabricated, and finally tested. The system integrator uses these tested components, and performs testing of the board and the system with the assumption that the components are fault-free. However, as illustrated in Figure 2-3, in the SoC design methodology, cores are designed and verified, but are not pre-fabricated. Hence, these cores cannot be tested before their integration into the SoC. Therefore, the system integrator is responsible for all post-manufacturing tests of both the cores and the system that uses these cores.

Figure 2-3: Design development and test flow for (a) System-on-Board, and (b) System-on-Chip [21].

The different test flows as explained above and illustrated in Figure 2-3, as well as having embedded cores with limited accessibility, result in unique challenges in SoC test. Moreover, stringent time-to-market and time-to-volume requirements, and the need for lower total cost also contribute to SoC test challenges. These challenges can be summarised as follows:

•  Effective communication mechanisms for the test flow between different parties in the design of the SoC (note that the design of an SoC can span multiple groups in different companies which can be located in many different geographical locations);

•  Accessibility of embedded cores from the primary input/output (I/O) of the chip;

•  Testing different types of cores with compatible methodologies;

•  Testing the user-defined logic (UDL) and the interconnects;

•  Seamless integration of cores' DFT into the system DFT (modularity of the system test architecture);

•  Having a test architecture allowing new cores to be integrated into the chip without incurring core design changes (scalability of the system test architecture);

•  Designing the system DFT such that the chip will be reusable hierarchically (scalability of the system architecture).

2.4 SoC Test Trends

Test challenges as discussed in Section 2.3 call for new and advanced test methodologies for core-based SoCs. A few of the most important test trends that, according to the literature, help in addressing some or all of the challenges described in Section 2.3, are outlined in this section.

2.4.1 Test Reuse

The integration of IP blocks into an SoC results in a non-linear complexity growth for DFT and manufacturing test [4]. Hence, it is costly and inefficient to assign the design of embedded core DFTs and generation of test vectors to the system designer. It is more practical to reuse the pre-designed DFT schemes of the cores and their test vectors [4][18][21][24][25][33][34]. Core-based design with reuse methodology requires a new business model. In this new model, core-providers design and develop the IPs and core-users integrate them in an SoC environment.
Hence, core-providers are responsible for the DFT techniques used in the core and for providing all the information to core-users. In this modular model, core-users treat individual core test programs as distinct components and integrate/schedule these components into a system test program with limited knowledge of the core's internal detail [21]. Separation of the tasks between core-providers and core-users introduces new challenges in the entire design flow. Hence, modularity and scalability of the test architecture are deemed essential to enhance the rapid development of designs and to avoid costly iterations between core-providers and core-users.

2.4.2 Systematic Test Architecture

To address core-based SoC testing challenges and enable test access to embedded cores, a more structured, systematic, and hierarchical approach than the traditional DFT is required [4]. Zorian et al. [21] proposed a generic test architecture consisting of three components: Source/Sink, Wrapper, and Test Access Mechanism (TAM). Figure 2-4 illustrates this generic conceptual test architecture.

Figure 2-4: A generic conceptual test architecture [21].

In the model of Figure 2-4, the source applies the test stimuli to the core under test and the sink collects the test results and analyses them against the expected responses. In other words, the source stores and controls whereas the sink stores and observes in the process of testing a core. Independently, the source and the sink can either be on- or off-chip. In addition, in the model of Figure 2-4, the TAM is the physical mechanism that connects the source and the sink with the core for communicating test data and control signals.
TAM design determines how efficiently information is received and transmitted from and to the outside world, and thus it affects, in part, the total test time, the complexity of test flow, and the test cost in general. Finally, the test wrapper is a shell around a core, which provides an interface between the core and its surroundings, isolates the core for test purposes, and helps in testing the interconnects. The wrapper is intended to enable "plug-and-play" cores such that cores, acquired from different providers, can be integrated into an SoC without any modifications.

As part of a wide variety of methods in addressing SoC test challenges, new control and observation mechanisms are required in SoC test [4]. Moreover, to facilitate a hierarchical test architecture, it is proposed that the test architecture needs to be separated from its behaviour [4]. In other words, it is proposed that the separation of the computation from the communication components of a test architecture is essential to address SoC test challenges. Finally, unification of the design and the test architecture flow is believed to be inevitable in SoC designs.

2.4.3 Test Resources Partitioning

With the growing complexity of ICs, more features have historically been added to external testers. This trend has transformed automatic test equipment (ATE) into expensive monolithic testers that conform to the idea of one-solution-fits-all [28][30]. In addition, many existing ATE are not designed to leverage the DFT features of the chips they test. DFT-aware testers are considered an important cost reduction factor in SoC test. In addition, the partitioning of ATE resources is emerging as an important solution to address test challenges in SoC. In this new model, the functionality of testers is broken into different segments and distributed between different resources.
For example, testing digital components, memory, and analog blocks of an SoC is performed in different ways and with different resources. Trade-offs of test resource partitioning and features of low-cost testers are still the subject of debate and extensive research. However, it has been argued and shown that this division helps in creating much needed hierarchy, modularity, flexibility, and scalability in the test flow and the test resources, and ultimately, it helps to achieve higher test quality and reduce test cost [27][28][30][35][36]. As an example, it is known that almost every chip has phase-locked loop (PLL) circuits for its clocking circuitry. Migrating the clocking circuitry of the ATE to chips and reusing the on-chip PLL not only reduces ATE cost but also enables at-speed testing. As another example, reference [28] presents an economic study of using testers with reduced features in the test flow. In this study, the benefits of using lower cost testers in the early stages of the test flow are compared with the penalties of escaping certain defects due to the lower capabilities of these low-cost and reduced-feature testers. It is shown in [28] that, even in extreme cases, using low-cost testers results in a lower overall chip cost.

2.4.4 Multi-site Testing

Multi-site testing, or parallel testing, refers to concurrent testing of multiple similar chips on a single tester. Multi-site testing reduces overall test time and test cost. However, with the current ATE architecture, pin densities on the testers cannot easily increase to enable multi-site testing. One solution is to reduce tester pin functionality, enabling more pins per tester [28][35]. This solution is the direct outcome of the test resource partitioning as described in Section 2.4.3. Therefore, multi-site testing is one of the main driving forces behind low-cost, low-feature, and DFT-aware testers [28].
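The effect of pin count on multi-site testing can be made concrete with a small back-of-the-envelope sketch. The function names and all numbers below are hypothetical and chosen only for illustration; the model assumes ideal multi-site testing, in which every insertion takes the same time regardless of how many sites are populated.

```python
import math

def sites_per_tester(tester_channels: int, pins_per_chip: int) -> int:
    """Number of identical chips one tester can contact at once."""
    return tester_channels // pins_per_chip

def batch_test_time(n_chips: int, seconds_per_insertion: float, sites: int) -> float:
    """Total time when `sites` chips are tested concurrently per insertion."""
    if sites < 1:
        raise ValueError("tester has too few channels for even one chip")
    return math.ceil(n_chips / sites) * seconds_per_insertion

# Hypothetical tester and chips, not taken from the thesis.
CHANNELS = 512
wide = sites_per_tester(CHANNELS, pins_per_chip=64)    # 8 sites
narrow = sites_per_tester(CHANNELS, pins_per_chip=8)   # 64 sites

print(batch_test_time(1000, 2.0, wide))    # 250.0 seconds
print(batch_test_time(1000, 2.0, narrow))  # 32.0 seconds
```

Under these assumptions, cutting the test pins needed per chip from 64 to 8 raises the site count eightfold and shortens the batch test time accordingly.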
In addition, new DFT methodologies are required to achieve higher degrees of parallelism in testing. These methodologies should significantly reduce the number of required test pins per IC, and, hence, increase the number of ICs that can be tested on a tester. Another way to achieve testing parallelism in SoC designs is to simultaneously test multiple cores using new DFT techniques [37]. However, one limiting factor in such a solution is the power rating of the IC under test. This rating may limit the number of cores that can be tested concurrently and, hence, limit the effectiveness of the solution.

2.4.5 Test Data Compression

Compressing the test data is a clear way to address some of the challenges detailed in Section 2.3. There are many techniques in this domain [38][39][40][41][42][43]. The majority of them target Built-in Self-Test (BIST) as the primary method of the IC test. However, there are newer published techniques that use automatic test pattern generation (ATPG) test data sets alongside embedded blocks [44][45][46].

2.4.6 Embedded Testing

One important challenge for testing core-based SoC designs is accessing embedded cores from the system's primary inputs/outputs (I/O). This access proves challenging owing to the limited number of primary I/O pins. According to the ITRS prediction, the maximum number of pins on the chips is to increase from about 3,070 in the year 2001 to 4,420 by the year 2016 [4]. However, the ratio of power/ground pins to the total pins on the chip can typically vary between 2:3 and 1:2 [4]. Moreover, with more complex IPs, there will be more test pins on each core. Hence, the combined effect is a shortage of available pins for testing in future chips. Multiplexing signal and test pins can mitigate this problem.
However, increasingly there are high-speed signal pins on a chip that cannot be shared and, hence, the shortage of test pins to access embedded cores will still be a problem needing to be addressed.

In addition, testing core-based SoCs requires a level of complexity that makes relying on external automatic test equipment (ATE) insufficient [4]. As previously mentioned, the cost of ATE rises as the complexity of ICs increases. Moreover, ATEs can quickly become outdated compared to the chips that they test, and, for example, often lack the necessary resolution and accuracy to test new devices effectively and perform at-speed testing of these devices [4][23].

The limited number of chip I/Os, high tester cost, the need for at-speed testing, and the typically lagging technology of external testers call for specialized embedded support infrastructure blocks, suited for embedded testing [4][23]. Test resources, hence, need to be partitioned into their internal and external components such that embedded test components take over some functionality of the external ATE resources [4][47]. These embedded solutions are referred to as infrastructure IPs (I-IP) [32][48]. I-IP blocks do not add to the main functionality of the chip. Rather, they facilitate post-fabrication test, and can enhance the chip's lifetime reliability [49]. It is imperative that such embedded solutions perform deterministic tests, use the benefits of automatic test pattern generation (ATPG), and work in harmony with their specialized off-chip counterparts [4][29][44][50].

2.5 Current Test Architectures

This section overviews current test architectures. Section 2.5.1 reviews the standard wrapper being developed by the IEEE P1500 Working Group. Section 2.5.2 describes the general concept of Built-in Self-Test (BIST) architectures.
Finally, Section 2.5.3 provides a comprehensive survey of other test architectures where external test equipment is used as the test source and the test sink.

2.5.1 Standard Wrapper

The IEEE P1500 Working Group is working towards a Standard for Embedded Core Test (SECT), to allow the automatic identification and configuration of testability features in integrated circuits containing embedded cores [51]. Towards that end, the P1500 aims at standardising a scalable architecture in the form of a wrapper around a core and defines a Core Test Language (CTL). However, other components of the test architecture, i.e., source, sink, and TAM will not be standardized. The scalable architecture, as illustrated in Figure 2-5, will provide a mandatory serial port and an optional parallel interface for testing a core.

Figure 2-5: Block level overview of a P1500 wrapper [51]. (The figure shows a wrapped core with the standardized, required Wrapper Serial Port, WSP, comprising the Wrapper Serial Input WSI, Wrapper Serial Output WSO, and Wrapper Serial Control WSC, and the user-defined, optional Wrapper Parallel Port, WPP, including the Wrapper Parallel Control WPC and Wrapper Parallel Output WPO.)

In addition, the standard will provide a set of instructions enabling different modes, mandatory and optional, for the wrapper architecture. These instructions will be loaded into a wrapper instruction register (WIR) and control wrapper boundary registers (WBR) in front of the core's functional input/outputs. Figure 2-6 illustrates a conceptual view of the required P1500 wrapper architecture [51]. A core-provider will use the standard language to communicate to the core-user the internal, external, and pattern information for every test mode of a core. A core-user, however, is responsible for designing the source, sink, TAM, and the overall system DFT.
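The role of the WIR in selecting which register sits on the serial path between WSI and WSO can be pictured with a toy model. This is only a sketch under stated assumptions: the class name `ToyWrapper` and the instruction names `"BYPASS"` and `"BOUNDARY"` are hypothetical and are not the mnemonics or behaviour defined by the P1500 proposal; capture and update operations are not modelled.

```python
class ToyWrapper:
    """Toy serial wrapper: the instruction held in the WIR selects either a
    1-bit bypass register or the boundary register as the WSI-to-WSO path."""

    INSTRUCTIONS = ("BYPASS", "BOUNDARY")  # hypothetical names

    def __init__(self, n_boundary_cells: int):
        self.wir = "BYPASS"
        self.bypass = [0]
        self.wbr = [0] * n_boundary_cells

    def load_instruction(self, name: str) -> None:
        if name not in self.INSTRUCTIONS:
            raise ValueError(f"unknown instruction: {name}")
        self.wir = name

    def shift(self, wsi_bit: int) -> int:
        """Shift one bit in from WSI and return the bit that leaves on WSO."""
        reg = self.bypass if self.wir == "BYPASS" else self.wbr
        wso_bit = reg[-1]
        reg[:] = [wsi_bit] + reg[:-1]
        return wso_bit


wrapper = ToyWrapper(n_boundary_cells=3)

# In bypass mode a bit reaches WSO after a single shift.
bypass_out = [wrapper.shift(b) for b in (1, 0, 1)]

# In boundary mode the same stream first has to fill the boundary cells.
wrapper.load_instruction("BOUNDARY")
boundary_out = [wrapper.shift(b) for b in (1, 1, 0)]
```

In this model the bypass path adds one cycle of latency while the boundary path adds one cycle per cell, which illustrates why a short bypass register is useful when a core that is not under test sits on a shared serial path.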
Figure 2-6: Conceptual view of the required P1500 wrapper architecture [51].

2.5.2 Built-in Self Test

Embedded testing architectures in the form of Built-in Self-Test (BIST) have been used to reduce test cost [2][52][53][54][55]. BIST facilitates in-field and at-speed testing and, hence, improves the quality of test. Moreover, BIST lowers the test cost by partitioning the test resources as outlined in Section 2.4.3. Memory BIST is now widely used to test the memory components of ICs. Moreover, using analog BIST results in reduced-function and low-cost external testers, and can potentially achieve better test quality. Finally, logic BIST is gaining more interest as a solution that can address many of the test challenges explained in Section 2.3. This thesis uses BIST to refer to logic BIST, as the focus of this thesis's work is the development of new and novel DFT techniques for digital cores of SoC designs.

A block diagram of a typical BIST arrangement is illustrated in Figure 2-7. In a typical BIST arrangement, test patterns are either stored or generated on chip and are local to the CUT. These patterns are applied to the CUT under the control of a BIST controller. The outputs of the CUT are compacted during the entire process of test data application. Finally, the compacted output response is compared to reference signatures at the end of the test process, and a pass/fail signal is generated. The dominant technique in logic BIST is based on the STUMPS architecture [38][56], which uses multiple scan chains as the underlying DFT technique in the CUT (for an introduction to scan chains, see Section 3.2). Hence, this thesis assumes the use of scan chains in BIST.
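The flow just described (pattern source, CUT, response compaction, signature comparison) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the STUMPS architecture: the 4-bit width, the tap positions, the seed, and the two `*_cut` functions are hypothetical, and the CUT is modelled as a purely combinational function with no scan chains.

```python
def shift_with_feedback(state, taps, width):
    """One shift-left step; the XOR of the tapped bits enters at bit 0."""
    feedback = 0
    for t in taps:
        feedback ^= (state >> t) & 1
    return ((state << 1) | feedback) & ((1 << width) - 1)

def lfsr_patterns(seed, taps, width, count):
    """Test pattern source: yield `count` successive LFSR states."""
    state = seed
    for _ in range(count):
        yield state
        state = shift_with_feedback(state, taps, width)

def compact(responses, taps, width):
    """Output response compaction: fold all responses into one signature."""
    signature = 0
    for r in responses:
        signature = shift_with_feedback(signature, taps, width) ^ r
    return signature

WIDTH, TAPS, SEED, COUNT = 4, (3, 2), 0b1001, 15  # hypothetical parameters

def good_cut(x):
    """Hypothetical fault-free CUT: adds the two 2-bit halves of x."""
    return ((x & 0b11) + (x >> 2)) & 0xF

def faulty_cut(x):
    """The same CUT with output bit 0 stuck at 1."""
    return good_cut(x) | 0b0001

def signature_of(cut):
    patterns = lfsr_patterns(SEED, TAPS, WIDTH, COUNT)
    return compact((cut(p) for p in patterns), TAPS, WIDTH)

# The reference signature comes from simulating the fault-free circuit.
REFERENCE = signature_of(good_cut)

def bist_pass(cut):
    """Pass/fail decision made once, at the end of the test."""
    return signature_of(cut) == REFERENCE

print(bist_pass(good_cut), bist_pass(faulty_cut))
```

Note that the pass/fail decision is only available after all patterns have been applied, and that a single signature carries no information about which pattern exposed the fault.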
Figure 2-7: Block diagram of a typical BIST arrangement.

BIST can simplify the integration of a core's DFT into a system DFT in SoC designs and, hence, enable near-seamless reuse of test resources. In these techniques, both the test source and the sink are local to the CUT and, thus, the TAM amounts to local wires between BIST circuitry and the core. This local communication model simplifies the problem of test data transmission. However, on-chip storage of the entire deterministic test vectors, for each core, is not cost-effective or even practical in many cases. To eliminate the need for on-chip storage of test data, typically, linear feedback shift registers (LFSR) are used to generate pseudo-random patterns as the stimuli in the BIST arrangement. These test vectors are then applied to the CUT, and, in the simplest configuration of BIST, the test results are compacted and the final signature is compared to the signature of expected results. One problem with such pseudo-random pattern BIST is the potentially lower fault coverage, compared to that resulting from deterministic patterns obtained via automatic test pattern generation (ATPG). This lower fault coverage is mainly due to the linear dependency of the generated random patterns and the fact that some faults, such as path delay faults, are resistant to random patterns.

The potentially lower fault coverage associated with a BIST methodology can be mitigated, although not solved, in different ways. One method for smaller circuits is the introduction of a phase shifter between the LFSR and the CUT's scan chains [56]. This phase shifter reduces the dependency of the bits applied to the scan chains. Another method consists of inserting additional test points in the design [2][57][58].
However, test point insertion requires modification of the design and is therefore not desirable.

In BIST, phase shifters and test points cannot typically guarantee the same high fault coverage achievable using ATPG-based vectors. Other techniques have been proposed that target the pseudo-random pattern generator to mitigate the problem of low coverage [38]. These techniques include weighted pseudo-random sequences [59] and multiple-polynomial LFSR with reseeding [60]. However, the complexity and cost of these solutions can be high, making them undesirable in many cases.

Another strategy to address the potential low fault coverage in BIST is known as mixed-mode BIST. Mixed-mode BIST uses deterministic test vectors as well as pseudo-random vectors [61]. As one example, the fault coverage can be increased by applying top-up test vectors from an external tester. However, it has been shown that for large industrial designs, the ATPG top-up pattern volume is 25-65% of a full ATPG test approach [62]. In another study, this ratio is reported to be as high as 70% [63]. This large ratio defeats many reasons for using BIST, and is not acceptable in many cases. Other techniques in mixed-mode BIST include bit-flipping [64], bit-fixing [65], weighted random pattern generation [66], and Sequence Generating Logic [67]. In this latter work, no external deterministic pattern is applied. Rather, a block is placed between the LFSR and the CUT that changes the output of the LFSR into deterministic patterns. The potential problem with these techniques is the large area required for the BIST circuit. Reference [67] reports BIST hardware equal to about 5-15% of the total CUT area, for circuits of up to 100K gates that use 10,000 test patterns.
It is, however, possible to trade off test quality, test time, and BIST area overhead [67], if lower test quality or longer test time is acceptable.

Compacting test responses and using a signature analyzer at the output of the CUT result in additional challenges for BIST techniques. Diagnostics in BIST is more difficult than in ATE-based testing as, in BIST, most of the information needed for diagnosis is lost during compaction of the test results into signatures. Therefore, diagnosis is only possible with multiple test runs that add to the test time and the complexity of the test process. Moreover, using signature analysis in BIST can mask some faults, owing to the aliasing inherent in signature analyzer compaction schemes. Another problem with signature analyzers, and consequently in BIST, is "zero-ing out", where the history of the circuit response is lost [2].

Another drawback of logic BIST techniques is their very long test time requirement. In ATPG-based testing, the test vectors are generated and then compacted to be as few as possible. Moreover, the CUT's output response is compared with the expected response for each test vector, resulting in shorter test time for faulty chips. However, in BIST, owing to the random nature of the entire test vector set (or at least the initial set of test vectors in the more elaborate versions of BIST), a large number of test vectors is needed for achieving an equivalent level of fault coverage. In addition, the comparison with the expected reference signatures is only possible at the end of the test process or, at best, at the end of intermediate sections of the test process [68][69]. Hence, the result of the test is only available after application of multiple test vectors in addition to the vector that can identify a fault, resulting in even longer test time for faulty chips. The study in [62] shows that, typically, the test time is two to three times longer than the test time using ATPG.

Using pseudo-random patterns in BIST results in additional challenges. These patterns can cause parts of the CUT to generate responses that are unpredictable ("X") in the BIST simulation, and no signature can be defined for these cases. It is imperative to prevent the propagation of these unknown ("X") values to the compaction circuit, as the signature will be corrupted otherwise. Moreover, it is necessary to prevent bus conflicts during the application of the pseudo-random patterns. Finally, BIST using pseudo-random patterns can cause a large power dissipation, as these patterns may result in large circuit switching [70][71].

As was presented in this section, logic BIST is a powerful test methodology that can address many problems facing the test community for testing core-based SoCs. However, there are many challenges in the path of making logic BIST a viable test solution. These challenges were summarized in this section and some of the proposed improvements presented. Additional problems and challenges in BIST include making the design BIST-ready, automating BIST insertion, and integrating BIST into the overall design flow with minimal impact [62]. This thesis, as explained in Section 1.2.1 and detailed in Chapter 3, builds on the strengths of BIST methodology, solves some of its potential shortcomings, and, hence, presents an alternative solution in addressing some of the different SoC test challenges described in Section 2.3.

2.5.3 ATE-based Test Architectures

ATE-based test architectures differ from BIST in that both the source and the sink are an integral part of an external tester in the form of an ATE. This thesis uses the TAM arrangement in these architectures as a classifier. In SoC test architectures, test stimuli and test results are transported through the TAM.
Hence, the TAM can be viewed as the communication link in the architecture. In addition to the TAM, extra control lines are used to properly set up the TAM and core wrappers. Hence, there are two categories of lines in generic SoC test architectures: 1) data lines (referred to as the TAM in this work) and 2) control lines. Dedicated wires or existing functional interconnects can be used for the data lines [72][73][74][75]. While in [72], [73], and [74] the cores are modified such that each core has a transparent mode for testing, [75] uses the processor bus for test data transport.

There are many proposed ATE-based test architectures in the literature. Based on the connection method between the chip pins and the core terminals, these can all be grouped into three main categories: multiplexer-, serial-, and bus-based connections. Most of these architectures suggest the use of a serial control mechanism to properly set up the TAM and core wrappers.

In the first category, multiplexers are used to allow test access to the cores. The simplest method in this category is to multiplex the test pins to the primary I/Os such that a direct path is established during test [76]. A second method modifies the cores such that each core has a transparent mode for testing [72][73]. A more recent third method provides a transparent path based on modelling the TAM design as an Integer Linear Programming (ILP) problem, to minimise the overall test time and area overhead [74].

A number of test architectures in the serial-based category use the established IEEE 1149.1 standard [26][77][78]. Whetsel, in [79], uses a hierarchical structure by introducing a Tap Link Module (TLM). An improvement on the TLM is presented in [80], where the Test Access Port (TAP) of the 1149.1 standard is kept unchanged from its original form, and hence, simpler TLM controls are designed.
A number of different variations of the bus-based connection schemes have been reported. Varma et al. [33] suggest a structured architecture based on separate data and control buses. In their work, provision has also been made for using several such buses with different widths. To simplify the control mechanism in the TAM architecture and provide scalability of the architecture through hierarchy, a multilevel bus structure connected in a tree topology has been suggested [81]. Marinissen et al. suggested the TestRail architecture [22], where cores are connected in a daisy chain configuration and buses (or Rails) can have different widths, fan-in, and fan-out. In TestRail, each core can be bypassed if needed to access the next one in line, and control is achieved via a serial connection.

In the bus-based category, different methods are suggested for accessing the cores from the buses. Core Access Switches (CAS) select P signals out of N bits of a bus and use the TestRail topology for the buses [82]. Whetsel has suggested an addressable architecture in [83]. In this latter architecture, each core is given an addressable Test Port, which can be serially assigned its appropriate address to provide an intelligent distributed control mechanism for connecting cores to buses. Finally, a time division multiplexing technique has been suggested in [84]. In the latter, using configurable and dedicated arbiters, cores autonomously assume control of the bus.

Additional issues in the bus-based connection scheme are test architecture optimization and test scheduling. In order to minimize the test time and the number of required test pins for a given set of constraints, many studies have been reported that suggest different heuristics to optimally design test wrappers and assign the cores to the TAMs [85][86][87][88][89][90].
ATE-based test architectures benefit from a high test quality, as ATPG test vector sets are used. However, in the above test architectures, no distinction is made between the communication and the application of test data. The close coupling between the communication and the application of test data works against many of the test trends detailed in Section 2.4. Therefore, novel test architectures are needed that, while utilising the quality of ATPG-based testing, conform to the test trends of Section 2.4 to a high degree. This thesis, as outlined in Section 1.2.2 and explained in Chapter 4, presents a methodology that helps in achieving systematic test architectures that benefit from test reuse, resource partitioning, and multi-site and embedded testing.

Chapter 3: Dedicated Autonomous Scan-Based Testing

3.1 Introduction

As discussed in Section 2.4.6, embedded testing is one of the most important trends in dealing with the challenges of testing core-based SoC designs. Test resources need to be partitioned to move part of the tester's functionality onto a chip. This chapter presents the concept, and provides the implementation and experimental results, of a Dedicated Autonomous Scan-based Testing (DAST) methodology [91][92] for testing embedded digital cores. The novelty in DAST is that core test stimuli and expected results, generated by ATPG, are pre-processed into a new test-data protocol. In doing so, the entire control sequence of testing is transferred from the external ATE to a dedicated Embedded Autonomous Sequencer (EAS) block, associated with single or multiple cores on an SoC. Moreover, Embedded Autonomous Results Analyzer (EARA) blocks [92] are used in synchronism with EAS blocks to deterministically analyze the test results and to compare them with expected results. Thus, using DAST, a simple data transmitter can be used in place of an external ATE.
Hence, the complexity of the test flow can be significantly reduced and rendered much more scalable and portable between technology generations.

In essence, DAST divides the ATE functionality into its test data communication and test data control/observation components, and migrates the latter away from the ATE to place it on-chip at the periphery of the core in the EAS and EARA. Therefore, the former can be reduced to an inexpensive unintelligent block. Since DAST is based on the conventional scan/ATPG approach, it does not imply any compromise in terms of test quality (coverage and fault models), ease of use, and broad applicability. Moreover, as the EAS and EARA blocks are external to a core, the impact on the design and the design flow is also minimal. Finally, as the test is performed on chip, at-speed test is feasible with minimal cost.

3.2 Components of Scan-based Testing

One of the most popular structural DFT techniques is scan design. Scan design increases the controllability and observability of the internal nodes of sequential circuits. In essence, scan design transforms the difficult task of sequential circuit test into the easier task of combinational circuit test. To achieve this transformation, regular D-type flip-flops in a sequential circuit, as shown in Figure 3-1a, are replaced with scan flip-flops, as shown in Figure 3-1b. When the sequential circuit is in test mode, the Test Select (TS) signals of the multiplexers are asserted to select the Scan In (SI) as the input of the scan flip-flops (scan cells). In this way, flip-flops are connected in long chains to form shift registers. Using the created shift registers, all the scan cells can be set to desired states in the test mode. Similarly, the state of the internal flip-flops and nodes can be captured by the scan registers and shifted out serially, thereby providing increased observability [2][3].
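As a rough illustration of the shift and capture behaviour described above, the following Python sketch models a single scan chain as a list of bits. The class name `ScanChain` and its methods are illustrative assumptions and do not come from the thesis; the model ignores timing and the combinational logic between scan cells.

```python
class ScanChain:
    """Behavioural model of one scan chain (illustrative, not from the thesis)."""

    def __init__(self, length):
        # cells[0] is nearest the scan input (SI); cells[-1] drives the scan output (SO).
        self.cells = [0] * length

    def shift(self, bits):
        """TS = 1: shift bits in at SI, one per clock; return the bits seen at SO."""
        shifted_out = []
        for bit in bits:
            shifted_out.append(self.cells[-1])
            self.cells = [bit] + self.cells[:-1]
        return shifted_out

    def capture(self, d_values):
        """TS = 0: a single clock loads every cell from its functional D input."""
        if len(d_values) != len(self.cells):
            raise ValueError("one D value is needed per scan cell")
        self.cells = list(d_values)


chain = ScanChain(3)
chain.shift([1, 1, 0])              # controllability: set the internal state serially
state_after_load = list(chain.cells)
chain.capture([1, 1, 0])            # assumed circuit response latched into the cells
response = chain.shift([0, 0, 0])   # observability: read the response out serially
```

In this sketch the values passed to `capture` stand in for whatever the combinational logic would produce; a fuller model would compute them from the loaded state and the primary inputs.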
Figure 3-1: Block diagrams of (a) D-type flip-flop, and (b) Scan D-type flip-flop.

By nature, scan-based testing is a repetitive procedure. The steps in scan testing a core can be summarised by the generic waveforms in Figure 3-2. In Figure 3-2, the signal labels PI, SI, CLK, ST_PO, and ST_SO denote primary input values, scan input values, test clock, strobe primary outputs, and strobe scan outputs, respectively. In addition, TS represents the value of the test select signal.

Figure 3-2: Generic scan test waveforms (the scan shift repeats for the number of scan cells; the whole sequence repeats for the number of test patterns).

From Figure 3-2, a scan test procedure can be conceptually divided into two functions: a) data delivery, and b) control/observation. For a given test pattern, data delivery refers to the following steps:

•  Load the test vector into the scan chain flip-flops through the scan inputs (SI);
•  Apply and maintain non-scan portions of the test vector at the primary inputs (PI) of a core;
•  Set TS to logic 1 and apply new values to PI.

For the same given test pattern, control/observation refers to the following steps/actions:

•  Assert CLK to load the test vector into the scan chain;
•  Assert ST_SO to test for values at the scan outputs (SO);
•  Assert ST_PO on two different occasions to check consistency at the primary outputs (PO).

In the traditional methodology, the ATE is responsible for transforming raw test vector data into the scan sequence waveforms of Figure 3-2. This thesis, however, introduces a simple hierarchy whereby the complex ATE scan functionality is replaced by a simple off-chip test-data compiler.
As described in Section 3.3, the compiler only needs to marginally transform (into a new protocol) the raw test data from ATPG, embedding the capability to generate the complex scan sequences in the dedicated EAS and EARA blocks associated with specific cores.

3.3 DAST Concept

External ATEs need to perform both functions of data delivery and control/observation. This is one of the reasons for increasingly expensive and complex test flows and equipment. Given today's ample resources in terms of silicon, this thesis takes advantage of the repetitive nature of scan testing by incorporating EAS and EARA blocks dedicated to each core on an SoC. The EAS's function is to accept simple binary test data and to transform it to produce the specific waveforms required for applying the scan pattern to the specific target core. The EARA's function is to work in synchrony with the EAS block and compare the incoming expected test results to the test results at the output of the core. This essentially amounts to separating the two functions of the off-chip ATE, as described in Section 3.2, and moving the control and observation onto the chip. That is, DAST replaces a hierarchically flat system by one with a simple hierarchy.

Similar to the case of ATE-based testing, to reduce the amount of memory needed in the off-chip block, the DAST methodology allows for the transformed test vectors to be compressed before sending them to the embedded blocks. The compressed test sets can then be decompressed on-chip at the EAS and EARA blocks.

Using the DAST methodology, both test stimuli and expected test results are sent to a core via a TAM. Thus, as far as scan testing is concerned, it is possible to replace the ATE by an unintelligent serial interface circuit block. This latter block is simply required to transmit test data at a designated clock rate to the embedded blocks of DAST.
Upon receiving test data, an EAS block applies the test data to a core and generates the test clock. Simultaneously, and in parallel to the EAS operation, the EARA block collects the test results from the scan output and primary output pins of the core and compares them with the incoming expected test results. On the first occurrence of an inconsistency between the expected and actual results, a "sticky" no-go signal is generated by the EARA block, marking the detection of a fault. This idea is illustrated in Figure 3-3.

Figure 3-3: Concept of DAST as a block diagram (on the SoC: Embedded Autonomous Sequencer (EAS), embedded core, and Embedded Autonomous Results Analyzer (EARA) with a Go/No-go output; off-chip: a data transmitter holding the test stimuli file and the expected test results file, connected through a narrow-bandwidth TAM).

The design flow incorporating the DAST methodology is shown in Figure 3-4. This flow differs only partially from a flow using conventional ATPG/Scan. As in a conventional ATPG/Scan test flow, scan chains are assumed to be inserted automatically and/or manually before the test vector and expected test result files are generated by the ATPG. The function of the EARA must be fully deterministic. Hence, no unknown values, i.e., "X" values, can be allowed in the test stimuli and expected test results, as otherwise two lines are needed to encode the 3-valued data bits. In the specific case of this thesis, without loss of generality, all "X" values in the test vector file are replaced with the logic value zero. This modified test vector file is used to apply the test vectors to a gate-level model of the circuit, and the real expected results are collected and substituted for the "X" values in the expected test results file.
Using the EAS and EARA compilers, the test vector and the expected test results files are subsequently compiled into the test data and the expected test result protocols of the EAS and EARA.

Figure 3-4: DAST design flow (core RTL model; synthesis and scan insertion tool; core gate-level model; ATPG test vectors and expected test results; replacement of "X" values in the test vectors; simulation and extraction of real expected results; modified ATPG test vectors and expected test results; EAS compiler producing the test stimuli file; EARA compiler producing the expected test results file).

Both the EAS and EARA logic are generated in RTL VHDL or Verilog and placed outside the core under test. Therefore, EAS and EARA are generic, flexible, and non-intrusive to a core under test, and should be viewed as test soft IP. As a result, the DAST design is adaptable to the parameter changes of a core and scales with the fabrication technology. The EAS and EARA logic depends only on a core's number of primary inputs and outputs, the number and depth of its scan chains, and the specific type of scan cells used to implement the chains. It does not depend on a core's functionality. Both the EAS and EARA blocks also include bypass circuitry to allow the SoC's normal functional mode.

For the test flow, only a simple data transmitter is required to send the test stimuli and the expected result data onto the chip, as shown in Figure 3-3. The test control, waveform generation, and comparison to the real results are all done automatically by the EAS and EARA blocks.

3.4 Implementation

Assuming a simple serial off-chip connection for each EAS and EARA block, the DAST methodology was implemented to enable a simple comparison of the performance and cost of the DAST methodology with those of the classical one. No specific TAM architecture was assumed for the development of DAST.
However, a direct connection, as well as NIMA (as introduced in Section 1.2.2 and explained in Chapter 4), was used for implementation and verification.

3.4.1 EAS & EARA Compilers

The modification of the ATPG-generated file, as described in Section 3.3, is core-dependent. This modification takes from a few seconds for simple cores to several minutes for more complex ones. After the modification of the ATPG-generated file, EAS and EARA compilers are needed to convert the modified ATPG-generated file into the test stimuli file (EAS file) and the expected test results file (EARA file), as shown in Figure 3-4. Two C programs were developed that take as inputs the modified ATPG test vectors and expected test results. These programs then insert three-bit op-codes at the beginning of each section of the test program to convert the test vectors and expected results into the EAS and EARA protocols, respectively. The time taken for this protocol conversion is very short: less than a few seconds on Sun Blade 100 machines for cores of up to 100K gates. The following op-codes are used as simple instructions in the EAS and EARA:

•  Shift-PI-BSR: shift into primary input boundary scan register;
•  Shift-SC: shift into scan chains;
•  Shift-SC-BSR: shift into scan input boundary scan register;
•  Assert-Clk: assert the test clock;
•  Shift-PO-BSR: shift into primary output boundary scan register;
•  Shift-SO-BSR: shift into scan output boundary scan register.

The algorithm for the EAS compiler is shown in Figure 3-5. Based on the waveforms of Figure 3-2 and the intended operation of EAS blocks, the algorithm, in line 2, identifies seven action groups, i.e., Pattern boundary, PI data, Clocking, SI data, PO data, SC data, and SO data.
In lines 35, 39, 42, 46, 49, and 55, the necessary data for primary input (PI), clocking information, scan input (SI), primary output (PO), scan chain (SC) elements, and scan out (SO) are saved by the algorithm in temporary variables. In addition, proper flags for each action group are set at the same time.

Figure 3-5: EAS compiler algorithm.
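The op-code framing that the compilers perform can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the thesis names the six op-codes and states that they are three bits wide, but the binary values in `OPCODES`, the helper names `frame` and `decode`, and the example payload lengths below are hypothetical.

```python
# Hypothetical 3-bit encodings: the op-code names come from the text,
# but their binary values are assumed here for illustration only.
OPCODES = {
    "Shift-PI-BSR": "001",
    "Shift-SC": "010",
    "Shift-SC-BSR": "011",
    "Assert-Clk": "100",
    "Shift-PO-BSR": "101",
    "Shift-SO-BSR": "110",
}


def frame(op, payload=""):
    """Prefix a binary payload string with the 3-bit op-code of its section."""
    if set(payload) - {"0", "1"}:
        raise ValueError("payload must contain only 0 and 1 (no X values)")
    return OPCODES[op] + payload


def decode(stream, payload_len):
    """Split a bitstream into (op, payload) pairs using fixed per-op payload lengths."""
    names = {code: name for name, code in OPCODES.items()}
    pos, segments = 0, []
    while pos < len(stream):
        op = names[stream[pos:pos + 3]]
        pos += 3
        n = payload_len.get(op, 0)
        segments.append((op, stream[pos:pos + n]))
        pos += n
    return segments


# Assumed toy core: 4 primary inputs and 2 scan inputs.
stream = frame("Shift-PI-BSR", "1011") + frame("Shift-SC-BSR", "01") + frame("Assert-Clk")
segments = decode(stream, {"Shift-PI-BSR": 4, "Shift-SC-BSR": 2})
```

Because the payload lengths are fixed by the core's parameters, a decoder of this kind needs no length fields in the stream, which is consistent with the statement that the embedded logic depends only on the numbers of primary inputs, primary outputs, and scan chains.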
As shown in lines 4 to 34 of the EAS compiler algorithm, based on the previously set flags, the EAS file is updated in the case of the Pattern boundary action group. In lines 5 to 8, the op-code for primary input data together with the PI data is added to the EAS file. In addition, in lines 9 to 15, data for the scan cycle is identified and added to the EAS file. In lines 16 to 20, the op-code and data for scan inputs are added. Finally, lines 21 to 34 identify the number of zero-padding bits needed to maintain the synchronization between the EAS and EARA blocks. These lines add the padding bits based on the difference between the PO's length and those of the PI and SC.

The algorithm for the EARA compiler is shown in Figure 3-6. Similarly to the case of the EAS compiler, and based on the waveforms of Figure 3-2 and the intended operation of EARA blocks, the algorithm, in line 2, identifies seven action groups. These action groups are: Pattern boundary, PI data, Clocking, SI data, PO data, SC data, and SO data. In addition, in lines 59, 62, 65, 68, 72, and 76, the necessary data for PI, clocking information, SI, PO, SC elements, and SO are saved by the algorithm in temporary variables, and proper flags are set. Finally, in lines 4 to 58, the EARA file is updated. In this latter section, the algorithm, in lines 6 to 10, adds the op-code for PI data to the EARA file. Since no action is required in the EARA block for this period, 0 is added to the EARA file to indicate this fact. Lines 11 to 20 handle the initialization part of the test vector, when the EARA block is mainly dormant and, hence, is only required to keep its synchronization with the EAS block. Lines 21 to 28 add the op-code and data for SO.
Finally, in addition to updating the EARA file for PO data, lines 29 to 58 identify and add the zero-padding bits required in the EARA block to maintain the synchrony between the EARA and EAS blocks.

Figure 3-6: EARA compiler algorithm.

3.4.2 EAS & EARA Hardware

The hardware implementations of both the EAS and EARA blocks are illustrated in Figure 3-7. These hardware implementations are extremely simple and compact, and require minimal design effort for any given core. Moreover, the modularity of the EAS and EARA design allows for their easy automation.

Figure 3-7: Hardware implementation of DAST components: (a) EAS block; (b) EARA block.

As shown in Figure 3-7a, in the EAS block, the incoming data is captured in a shift register until its finite state machine (FSM) block can decode the op-codes. Test data destined for the primary inputs, PI, are then sent to the appropriate boundary scan registers (BSR), and the correct shift/capture signals are asserted by the FSM. The same is true for data destined for the scan inputs, SI, of the core. Upon receipt of the Assert-Clk op-code, the EAS toggles the core's test clock. Similarly, in the case of the EARA block, as shown in Figure 3-7b, the incoming data is captured in a shift register until its FSM block can decode the op-codes.
Expected test results for the primary outputs are then sent to the PO-BSR and appropriate shift/capture signals are asserted by the FSM. The expected test results of the scan outputs, however, are always captured in the SO-BSR. Using XOR gates, the outputs of both the PO- and SO-BSR are always compared to a core's primary output and scan output values, respectively. However, using AND gates, the results of the comparison are gated by the ST_PO and ST_SO signals, respectively. The outputs of the AND gates then act as enable signals to two flip-flops such that, if a mismatch is detected, the Go/No-Go signal is asserted high.

The Algorithmic State Machine (ASM) for the EAS's FSM block is given in Figure 3-8 as pseudo-code. After receiving the op-code in line 4, and based on the received op-code, the state machine of Figure 3-8 enters one of five different states. These states are: Shift_PI_BSR, Shift_SC, Shift_SC_BSR, Assert_Clk, and Shift_PO_BSR. In these states, the ASM generates the necessary signals for the correct operation of the EAS block in applying the test stimuli to the primary inputs and scan inputs, as described in Section 3.2. In line 18, appropriate signals are asserted to synchronize the operation of the EARA block with the EAS block. Finally, in lines 27 and 28, the state machine halts its operation until the EARA state machine calls for resumption of the process. This handshaking is needed to account for the different lengths of the scan-in and scan-out cycles of any given core.

 1. if valid data is present then
 2.   while valid data and not end_of_op-code then
 3.     while not end {
 4.       get the op-code;
 5.     }
 6.     case op-code of
 7.       Shift-PI_BSR:
 8.         while not end {
 9.           shift to the primary input registers;
10.         }
11.         assert BSR update signals;
12.       Shift-SC:
13.         while not end {
14.           shift to the scan chain;
15.           if ready to assert the clock then
16.             assert the clock;
17.         }
18.         assert signal to synch with EARA;
19.       Shift-SC_BSR:
20.         while not end {
21.           shift to the scan chain input registers;
22.         }
23.         assert BSR update signals;
24.       Assert-Clk:
25.         assert the test clock;
26.       Shift-PO_BSR:
27.         call for EARA;
28.         wait for call from EARA;
29.     end case;
30.   end while valid data and not end_of_op-code;
31. else wait;

Figure 3-8: EAS block Algorithmic State Machine.

The Algorithmic State Machine (ASM) for the EARA's FSM block is given in Figure 3-9 as pseudo-code. Here, after receiving the op-code in line 3, and based on the received op-code, the state machine of Figure 3-9 enters one of two different states, i.e., Shift_PO_BSR and Shift_SO_BSR. In these states, the ASM generates the necessary signals for the correct operation of the EARA block to compare the incoming expected test results to the core's primary and scan outputs. In line 9, the state machine halts its operation and sends calling signals to the EAS block before waiting for handshaking signals from the EAS ASM. Upon receiving the handshake signals from the EAS block, the ASM asserts the primary output strobe signals in line 11. Finally, in line 13, the state machine halts its operation a second time until the EAS state machine calls for the resumption of the process. In this case, signals from the EAS block are used for comparing the core's scan outputs with the expected results.

 1. if synch EARA is present then
 2.   if valid data is present and not ignore valid data then
 3.     get the op-code;
 4.     case op-code of
 5.       Shift-PO_BSR:
 6.         while not end {
 7.           shift to the primary output registers;
 8.         }
 9.         call for EAS;
10.         wait for call from EAS;
11.         assert PO strobe signals;
12.       Shift-SO_BSR:
13.         wait for signal from EAS for synchronization;
14.     end case;
15.   end if;
16.   else {
17.     wait for valid data;
18.   }
19. else {
20.   wait for synch EARA;
21. }

Figure 3-9: EARA block Algorithmic State Machine.

3.5 Experimental Procedure

A quantitative comparison of DAST performance with existing solutions is not realistic, as DAST is the first solution that combines ATPG-based testing and BIST. Moreover, published works in this area mainly use proprietary designs and do not provide their results for a set of benchmark circuits that can be compared easily. Therefore, this thesis attempts to set a precedent and reports the key characteristics of its developed work on a number of benchmark circuits. Moreover, the results of the work of this thesis are compared with lower-bound cases, where it is assumed that ATPG-based testing does not require any on-chip circuitry and that it results in the minimum testing time. This section describes an SoC benchmark, UBC_SoC, developed for this work, and then provides details of the experimental procedure followed to evaluate the effectiveness of the DAST methodology when applied to both the UBC_SoC and the ITC'02 SoC Benchmarks¹.

Experimenting with and validating DAST by compiling empirical test time data and area and power consumption trade-offs requires benchmark SoC circuits. One requirement for such a benchmark is that it be described in RTL code, so that scan chain(s) can be inserted, the test vectors generated, and the complete DAST flow implemented. Since benchmarks described in RTL for SoC studies are essentially non-existent at this time, UBC_SoC was generated. UBC_SoC consists of a series of benchmark circuits from the ITC99 Benchmarks [93], for which the RTL codes are readily available. UBC_SoC was created using the Politecnico di Torino circuits (I99T) of the ITC99 Benchmarks by changing the signal types from bit or integer to std_logic.
UBC_SoC includes four cores: one instance of b10 with a single scan chain; one instance of b10 with three scan chains; one instance of b15 with one scan chain; and one instance of b15 with two scan chains. Figure 3-10 illustrates the UBC_SoC benchmark. The cores of UBC_SoC were subsequently wrapped using the guidelines of the IEEE P1500 standard, as discussed in Section 2.5.1. Using Synopsys Design Compiler™, the circuits included in UBC_SoC were first synthesized in TSMC's 0.18 µm technology. Following this latter step, Synopsys Test Compiler™ was used to insert scan chain(s) in each circuit of UBC_SoC and generate the ATPG test stimuli and expected test results files of Figure 3-4. EAS and EARA compilers were then used to generate the EAS and EARA files, as discussed in Section 3.4.1.

¹ ITC'02 SoC Test Benchmarks are a set of circuits intended to help the research community with the objective comparison of methods and tools for modular testing of core-based SoCs [97].

Figure 3-10: UBC_SoC Benchmark (four P1500-wrapped cores, each with its scan chain(s) and clock).

For each of the constituent cores of UBC_SoC, corresponding EAS and EARA blocks, as discussed in Section 3.4.2, were developed in VHDL. The gate-level representation of each core and its dedicated EAS and EARA RTL codes were then combined into a top VHDL module, and these modules were added as core instances in the UBC_SoC benchmark. In addition, to simplify the task at hand, the same clock frequency was used for both the cores and their EAS and EARA blocks.

Separate test-benches were also developed in VHDL. These test-benches were used to simulate the application of test vectors to each core in the SoC and to predict the expected test time.
Given that op-codes consist of three bits and that, typically, for each test pattern, both EAS and EARA sequence through several states, as illustrated in Figure 3-2 and described in Section 3.2, DAST test time models can be developed (see Appendix A). These test time models are defined for three different cases given by:

$$T_{M1\_DAST} = (9 + 2\,PI + SI \times SE) + TP\,(23 + 2\,PI + SI + SI \times SE) \quad \text{iff} \quad PO \le PI + 3 \tag{3-1}$$

$$T_{M2\_DAST} = (9 + 2\,PI + SI \times SE) + TP\,(20 + PI + PO + SI + SI \times SE) \quad \text{iff} \quad PI + SI + 6 \ge PO > PI + 3 \tag{3-2}$$

$$T_{M3\_DAST} = (9 + 2\,PI + SI \times SE) + TP\,(14 + 2\,PO + SI \times SE) \quad \text{iff} \quad PO > PI + SI + 6 \tag{3-3}$$

where PI, SI, and PO are the numbers of primary inputs, scan inputs, and primary outputs as defined in Section 3.2. Moreover, TP is the total number of test patterns, SE is the maximum number of scan cells in the scan chain(s), and $T_{M1\_DAST}$, $T_{M2\_DAST}$, and $T_{M3\_DAST}$ are the test time models for DAST, in terms of clock cycles.

Only limited information is available for the ITC'02 SoC Benchmark circuits. In the second phase of the experiment, the models given in (3-1), (3-2), and (3-3) were used to predict the test time performance of DAST on the ITC'02 SoC Benchmarks, for which the test vector files are unavailable. In addition, to compare DAST test time to that of a serial connection in a conventional external ATPG-based methodology, theoretical lower bound test-time models for the latter were developed (see Appendix B). Underlying assumptions of the latter models are that test data is applied to the core serially and that the results are observed serially. These models are given by:

$$T_{S1} = (2\,PI + SI \times SE) + TP\,(2\,PI + SI + SI \times SE) \quad \text{iff} \quad PO \le PI \tag{3-4}$$

$$T_{S2} = (2\,PI + SI \times SE) + TP\,(PI + PO + SI + SI \times SE) \quad \text{iff} \quad PI + SI \ge PO > PI \tag{3-5}$$

$$T_{S3} = (2\,PI + SI \times SE) + TP\,(2\,PO + SI \times SE) \quad \text{iff} \quad PO > PI + SI \tag{3-6}$$

where $T_{S1}$, $T_{S2}$, and $T_{S3}$ are in clock cycles.
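The two families of test time models above can be sketched in a few lines of Python. This is only an illustrative transcription of Equations (3-1) to (3-6); the function names `dast_test_time` and `serial_lower_bound` are hypothetical and do not come from the thesis, and the case boundaries are taken as inclusive on the lower case (the adjacent cases give the same value at each boundary, so this choice does not affect the result).

```python
def dast_test_time(pi, si, po, se, tp):
    """DAST test time in clock cycles, following Equations (3-1) to (3-3).

    pi, si, po: numbers of primary inputs, scan inputs, primary outputs
    se: maximum number of scan cells in the scan chain(s)
    tp: total number of test patterns
    """
    shift = si * se                      # the SI x SE term shared by all cases
    setup = 9 + 2 * pi + shift           # first bracket of each DAST model
    if po <= pi + 3:                     # case of Equation (3-1)
        per_pattern = 23 + 2 * pi + si + shift
    elif po <= pi + si + 6:              # case of Equation (3-2)
        per_pattern = 20 + pi + po + si + shift
    else:                                # case of Equation (3-3)
        per_pattern = 14 + 2 * po + shift
    return setup + tp * per_pattern


def serial_lower_bound(pi, si, po, se, tp):
    """Serial external-tester lower bound in clock cycles, Equations (3-4) to (3-6)."""
    shift = si * se
    setup = 2 * pi + shift               # first bracket of each serial model
    if po <= pi:                         # case of Equation (3-4)
        per_pattern = 2 * pi + si + shift
    elif po <= pi + si:                  # case of Equation (3-5)
        per_pattern = pi + po + si + shift
    else:                                # case of Equation (3-6)
        per_pattern = 2 * po + shift
    return setup + tp * per_pattern


if __name__ == "__main__":
    # b10 with one scan chain: PI=13, SI=1, PO=6, SE=17, TP=52
    print(dast_test_time(13, 1, 6, 17, 52), serial_lower_bound(13, 1, 6, 17, 52))
```

For the b10 core with a single scan chain (PI = 13, SI = 1, PO = 6, SE = 17, TP = 52), the sketch evaluates to 3536 and 2331 clock cycles, which are the model values reported for that core in Table 3-5.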
For each constituent core of the UBC_SoC benchmark, the corresponding EAS and EARA blocks were synthesized in TSMC's 0.18 µm technology and area and power data were collected. The data were collected using the Area and Power Report tools of Synopsys' Design Compiler™.

Moreover, using the parameters given for the cores (modules) in the ITC'02 SoC Benchmarks, corresponding EAS and EARA blocks were designed. These EAS and EARA blocks were synthesized as described earlier and their area requirements and power consumption were collected with Design Compiler™. Finally, using the DAST test time models given in Equations (3-1), (3-2), and (3-3), DAST test times for the ITC'02 Benchmark modules were compiled.

3.6 Results

Table 3-1 reports total area and power for components of EAS blocks for the cores of the UBC_SoC Benchmark. In Table 3-1, b10_1SC, b10_3SC, b15_1SC, and b15_2SC represent cores of the UBC_SoC Benchmark, as shown in Figure 3-10, and refer to b10 with one scan chain, b10 with three scan chains, b15 with one scan chain, and b15 with two scan chains, respectively. In addition, in Table 3-1, PI_BSR, SI_BSR, Buffer_in, and FSM refer to the primary input BSR, scan input BSR, buffer input, and FSM blocks of the EAS blocks, as shown in Figure 3-7a. The data for the EAS block, excluding the mandatory components of the IEEE P1500 wrapper, appears in the Adj_EAS column. Here, for the different cores under consideration, the EAS blocks amount to a total area equal to approximately 300 to 350 2-input NAND gates.
Table 3-1: EAS Area and Power for the UBC_SoC Benchmark Cores (area in µm², power in µW). PI_BSR, SI_BSR, Buffer_in, and FSM are the hardware components of EAS; Adj_EAS is the adjusted EAS total.

Circuit    PI_BSR          SI_BSR          Buffer_in       FSM             Adj_EAS
           Area    Power   Area    Power   Area    Power   Area    Power   Area    Power
b10_1SC    2439    56      195     4       220     9       2765    51      2984    60
b10_3SC    2439    56      569     13      220     9       2695    49      2915    59
b15_1SC    7155    164     195     4       220     9       3399    60      3618    70
b15_2SC    7155    164     382     9       220     9       3362    61      3582    70

Table 3-2 reports total area and power for components of EARA blocks for the cores of the UBC_SoC Benchmark. In Table 3-2, PO_BSR, SO_BSR, Buffer_in, and FSM refer to the primary output BSR, scan output BSR, buffer input, and FSM blocks of the EARA blocks, as shown in Figure 3-7b. As with the EAS blocks, the data for the EARA blocks, excluding the mandatory components of the IEEE P1500 wrapper, is given in the Adj_EARA column. Here, for different cores, the EARA blocks amount to a total area equalling that of 200 to 250 2-input NAND gates. Thus, per core, the total additional area for the DAST methodology in the UBC_SoC is minimal, as it amounts to the equivalent of about 500-600 2-input NAND gates.

Table 3-2: EARA Area and Power for the UBC_SoC Benchmark Cores (area in µm², power in µW). PO_BSR, SO_BSR, Buffer_in, and FSM are the hardware components of EARA; Adj_EARA is the adjusted EARA total.

Circuit    PO_BSR          SO_BSR          Buffer_in       FSM             Adj_EARA
           Area    Power   Area    Power   Area    Power   Area    Power   Area    Power
b10_1SC    919     19      236     5       220     9       1638    39      1858    48
b10_3SC    919     19      459     12      220     9       1638    39      1858    48
b15_1SC    9314    205     236     5       220     9       2224    49      2443    58
b15_2SC    9314    205     350     9       220     9       2224    49      2443    58

Table 3-3 reports total area and power for components of EAS blocks for cores (modules) of the ITC'02 SoC Benchmarks.
The notation used in Table 3-3 for components of the EAS blocks is identical to that used in Table 3-1. Here, for the different benchmarks under consideration, the EAS blocks amount to a total area equal to approximately 350 to 450 2-input NAND gates and are consistent with those for the UBC_SoC cores.

Table 3-3: EAS Area and Power for ITC'02 SoC Benchmark Modules (area in µm², power in µW). PI_BSR, SI_BSR, Buffer_in, and FSM are the hardware components of EAS; Adj_EAS is the adjusted EAS total.

ITC'02 core    PI_BSR           SI_BSR          Buffer_in       FSM             Adj_EAS
               Area     Power   Area    Power   Area    Power   Area    Power   Area    Power
d281_m5        40335    925     1130    26      220     9       3582    62      3801    71
d695_m5        7155     164     6033    138     220     9       3655    63      3875    73
d695_m6        11676    268     3021    69      220     9       3541    61      3761    71
d695_m9        6594     151     6033    138     220     9       3639    63      3858    72
f2126_m1       67184    1539    1504    35      220     9       4037    68      4257    78
f2126_m2       16018    367     3021    69      220     9       4005    68      4224    78
f2126_m3       5651     130     195     4       220     9       3439    60      3659    70
g1023_m1       26235    601     2626    61      220     9       3696    64      3915    74
g1023_m2       41644    955     382     9       220     9       3370    60      3590    69
g1023_m4       27370    627     756     17      220     9       3541    62      3761    71
g1023_m10      56809    1301    195     4       220     9       3342    60      3561    70
h953_m1        21137    484     756     17      220     9       3744    64      3964    73
h953_m2        12799    294     382     9       220     9       3452    62      3671    71
h953_m8        6594     151     1504    35      220     9       3704    63      3923    73
p34392_m1      2813     65      195     4       220     9       3537    63      3757    72
p93791_m12     68144    1560    8668    199     220     9       4265    71      4484    80
p93791_m19     101478   2325    8278    190     220     9       4371    74      4590    83
t512505_m7     23024    527     195     4       220     9       3537    63      3757    72
t512505_m8     19999    458     569     13      220     9       3830    67      4049    76
t512505_m9     15457    354     195     4       220     9       3464    63      3683    72
t512505_m14    141898   3246    195     4       220     9       3724    66      3944    75
t512505_m15    76600    1755    195     4       220     9       3358    60      3578    69
t512505_m16    160510   3674    195     4       220     9       3578    63      3797    72
t512505_m23    18690    428     382     9       220     9       3875    68      4094    78
t512505_m24    34306    787     195     4       220     9       3724    66      3944    75
t512505_m31    39762    912     5265    121     220     9       4387    74      4606    84
u226_m7        18299    419     3769    87      220     9       3679    64      3899    73
Table 3-4 reports total area and power for components of EARA blocks for cores (modules) of the ITC'02 SoC Benchmarks. The notation used in Table 3-4 for components of the EARA blocks is identical to that used in Table 3-2. Here, for different benchmarks, the EARA blocks amount to a total area equalling that of 250 to 300 2-input NAND gates and are consistent with the results of the UBC_SoC cores. Thus, per core, the total additional area for the DAST methodology is minimal, as it amounts to the equivalent of about 600-800 2-input NAND gates.

Table 3-4: EARA Area and Power for ITC'02 SoC Benchmark Modules (area in µm², power in µW). PO_BSR, SO_BSR, Buffer_in, and FSM are the hardware components of EARA; Adj_EARA is the adjusted EARA total.

ITC'02 core    PO_BSR           SO_BSR          Buffer_in       FSM             Adj_EARA
               Area     Power   Area    Power   Area    Power   Area    Power   Area    Power
d281_m5        30016    654     785     20      220     9       2464    37      2683    46
d695_m5        40335    883     3744    95      220     9       2561    39      2781    48
d695_m6        20218    443     1915    49      220     9       2431    37      2651    46
d695_m9        42559    933     3744    96      220     9       2537    39      2756    48
f2126_m1       69550    1529    1012    26      220     9       2643    41      2862    51
f2126_m2       18409    404     1915    49      220     9       2374    37      2594    46
f2126_m3       2769     61      207     5       220     9       1976    31      2195    40
g1023_m1       36062    792     1695    43      220     9       2513    39      2732    48
g1023_m2       28358    616     329     9       220     9       2439    37      2659    46
g1023_m4       20633    452     549     14      220     9       2399    37      2618    46
g1023_m10      50023    1088    207     5       220     9       2610    39      2830    48
h953_m1        20218    443     549     14      220     9       2431    37      2651    46
h953_m2        11900    259     329     9       220     9       2285    35      2504    44
h953_m8        9176     202     1012    26      220     9       2228    35      2447    44
p34392_m1      12563    273     207     5       220     9       2301    35      2521    44
p93791_m12     10717    236     5395    136     220     9       2260    35      2480    44
p93791_m19     57463    1249    5167    130     220     9       2626    39      2846    48
t512505_m7     4858     107     207     5       220     9       2122    33      2342    42
t512505_m8     19527    427     443     12      220     9       2374    37      2594    46
t512505_m9     16108    354     207     5       220     9       2301    35      2521    44
t512505_m14    50544    1100    207     5       220     9       2659    39      2878    48
t512505_m15    17441    384     207     5       220     9       2391    37      2610    46
t512505_m16    117642   2577    207     5       220     9       2700    41      2919    51
t512505_m23    16364    359     329     9       220     9       2326    35      2545    44
t512505_m24    17031    375     207     5       220     9       2350    37      2569    46
t512505_m31    42006    921     3281    83      220     9       2630    39      2850    48
u226_m7        8489     187     2403    62      220     9       2204    35      2423    44

The relatively small area and power requirements of the DAST components show that the embedded deterministic testing of the DAST methodology is achievable with only a small area overhead in addition to the BSR area overhead of the P1500 wrapper. In addition, these small area requirements indicate the flexibility and scalability of the DAST methodology, as the area overhead of the EAS and EARA blocks appears to be essentially circuit independent. In other words, from the data of Table 3-3, the PI_BSR and SI_BSR blocks are the largest blocks of the EAS blocks. Similarly, from the data of Table 3-4, the PO_BSR and SO_BSR blocks are the largest blocks of the EARA blocks. However, given that cores are wrapped by the IEEE P1500 standard wrapper, these BSRs are part of the standard wrapper and can be reused in the EAS and EARA blocks. Therefore, the area requirements of DAST appear to be relatively constant for different circuits, indicating that DAST is scalable in terms of its area and power requirements.

To compare the DAST area overhead with that of IEEE P1500, the Adj_EAS, Adj_EARA, and IEEE P1500 BSR areas for the ITC'02 Benchmark modules are plotted in Figure 3-11. As is evident from Figure 3-11, the total area overhead of DAST (the added top two sections of each column in Figure 3-11) is in many cases a small fraction of the P1500 BSR area (the bottom section of each column in Figure 3-11).
Given that the P1500 wrapper is now considered a required DFT infrastructure for testing embedded cores with external testers, the results show that DAST provides a deterministic embedded test infrastructure with only minimal area overhead when compared to ATE-based testing.

Figure 3-11: P1500 BSR and adjusted EAS and EARA areas for modules of ITC'02 Benchmarks. (Stacked column chart titled "Adjusted EAS, EARA, and P1500 BSR Area for Modules of ITC'02 SoC Benchmarks"; vertical axis: Area (µm²), 0 to 300000; horizontal axis: Core Type; series: P1500 BSR Area, Adjusted EAS Area, Adjusted EARA Area.)

The test time, using the DAST methodology, for the UBC_SoC cores is tabulated in Table 3-5. In this table, TP, SE, PI, PO, SI, and T_DAST denote the number of test patterns, the maximum number of flip-flops in the scan chain(s), the number of functional inputs, the number of functional outputs, the number of scan inputs, and the simulated DAST test time (clock cycles), respectively. In the same table, T_S denotes the theoretical lower bound test time for a serial connection in a conventional external ATE-based approach, as given by Equations (3-4), (3-5), and (3-6).
Table 3-5: Simulated DAST Test Time (clock cycles) and its Test Time Model Prediction Values for Cores of UBC_SoC. T_S is the external tester time model, T_DAST the simulated DAST test time, and T_M_DAST the DAST time model, all in clock cycles; % Error = 100*(T_M_DAST - T_DAST)/T_DAST.

Circuit    SI   PI   PO   SE    TP    T_S      T_DAST   T_M_DAST   % Error
b10_1SC    1    13   6    17    52    2331     3492     3536       1.3%
b10_3SC    3    13   6    6     52    2488     3670     3693       0.6%
b15_1SC    1    38   70   449   556   328009   335780   335802     0.0%
b15_2SC    2    38   70   225   537   317356   324867   324883     0.0%

Moreover, Table 3-5 also includes T_M_DAST obtained from the DAST time models given by Equations (3-1), (3-2), and (3-3). From Table 3-5, these test-time models closely follow the actual simulated test times, with an error of at most 1.3%.

As previously stated in Section 3.5, this thesis reports experimental results for a suite of benchmark circuits in order to provide a basis for quantitative comparison of future work in this area. In addition, since quantitative comparison with other works in this field is not practical, this thesis compares the results of its work with lower bound cases in ATPG-based external testing. This comparison is important because other works in the field target lower bound values for ATPG-based external testing. Therefore, Table 3-6 reports the estimated DAST test time based on Equations (3-1), (3-2), and (3-3) for the ITC'02 SoC Benchmark modules, and the percentage overhead compared to the lower bound conventional external ATE-based approach. Note that in Table 3-6, the bi-directional pins of a module are not listed separately. Instead, these pins are counted in both the primary inputs and the primary outputs. From Table 3-6, the increase in test time with DAST is small: it is below 2% for 22 of the 27 modules, and at most 4.5%, when compared to a lower bound serial connection in a conventional external ATE-based approach.
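As a worked check of how the overhead figures are obtained, consider module d281_m5 of Table 3-6 (SI = 6, PI = 214, PO = 228, SE = 32, TP = 118). Since PO > PI + SI + 6 = 226, Equation (3-3) applies for DAST, and since PO > PI + SI = 220, Equation (3-6) applies for the serial lower bound. The arithmetic below is only an illustration of the published models applied to one row:

```latex
\begin{align*}
T_{M3\_DAST} &= (9 + 2\cdot 214 + 6\cdot 32) + 118\,(14 + 2\cdot 228 + 6\cdot 32)
              = 629 + 118\cdot 662 = 78745 \\
T_{S3}       &= (2\cdot 214 + 6\cdot 32) + 118\,(2\cdot 228 + 6\cdot 32)
              = 620 + 118\cdot 648 = 77084 \\
\text{overhead} &= 100\cdot\frac{78745 - 77084}{77084} \approx 2.2\%
\end{align*}
```

Both test times and the 2.2% overhead agree with the d281_m5 row of Table 3-6.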
Table 3-6: Predicted Test Time (clock cycles) for ITC'02 SoC Benchmark Modules in the DAST Methodology. T_S is the external tester time model and T_M_DAST the DAST time model, both in clock cycles; % Overhead = 100*(T_M_DAST - T_S)/T_S.

ITC'02 core    SI   PI    PO    SE     TP     T_S         T_M_DAST    % Overhead
d281_m5        6    214   228   32     118    77084       78745       2.2%
d695_m5        32   38    304   45     110    226796      228345      0.7%
d695_m6        16   62    152   41     234    225420      228705      1.5%
d695_m9        32   35    320   54     12     30214       30391       0.6%
f2126_m1       8    356   529   1000   334    3034084     3038769     0.2%
f2126_m2       16   85    139   319    422    2276478     2282395     0.3%
f2126_m3       1    30    20    452    103    53351       55729       4.5%
g1023_m1       14   139   273   43     134    154712      156597      1.2%
g1023_m2       2    221   215   84     74     45898       47609       3.7%
g1023_m4       4    145   155   54     268    141474      145235      2.7%
g1023_m10      1    301   377   13     29     22858       23273       1.8%
h953_m1        4    112   152   348    341    579952      584735      0.8%
h953_m2        2    68    89    327    9      8278        8413        1.6%
h953_m8        8    35    69    189    305    504832      509111      0.8%
p34392_m1      1    15    94    806    210    209576      212525      1.4%
p93791_m12     46   361   80    93     391    1977986     1986988     0.5%
p93791_m19     44   538   437   100    210    1164676     1169515     0.4%
t512505_m7     1    122   36    514    608    462230      476223      3.0%
t512505_m8     3    106   147   1473   1025   4835456     4849815     0.3%
t512505_m9     1    82    122   530    195    151624      154363      1.8%
t512505_m14    1    751   381   1225   278    761111      767514      0.8%
t512505_m15    1    406   132   386    151    182247      185729      1.9%
t512505_m16    1    850   897   154    370    722614      727803      0.7%
t512505_m23    2    99    124   1372   532    1594686     1602143     0.5%
t512505_m24    1    182   129   1669   429    874619      884495      1.1%
t512505_m31    28   211   316   1550   3370   148431662   148478851   0.0%
u226_m7        20   97    64    54     76     99618       101375      1.8%

3.7 Summary and Conclusions

Relying on external ATE resources is insufficient for the new paradigm of
billion-transistor core-based designs. The timing requirements needed for at-speed testing of many cores are beyond the capacity of external resources. Therefore, embedded testers that take over some functionality of external ATEs are increasingly deemed essential. However, to achieve high-quality test and reduce product cost, these embedded blocks need to perform deterministic tests and exploit the benefits of ATPG test vector sets.

Scan test sequence generation follows a simple protocol. A scan test sequence generation procedure can be viewed as consisting of two distinct parts: one of test data delivery and the other of test data control and observation. The test data control and observation, i.e., the generation of the specific scan vector sequencing, tends to require substantial ATE resources for at-speed test and automated test application and flow. It has been shown that test resource partitioning lowers test cost. This chapter presented a methodology that essentially ports some of the external tester's capabilities onto the chip. This methodology is referred to as Dedicated Autonomous Scan-based Testing (DAST).

The implementation of DAST requires introducing a level of hierarchy between the ATPG and the embedded core under test. In DAST, a simple compilation of the raw test data is performed off-chip to append a few op-codes to the test patterns. The data and associated op-codes are then transferred onto the chip and interpreted by an Embedded Autonomous Sequencer (EAS) and an Embedded Autonomous Results Analyzer (EARA). With an area overhead equivalent to about 600-800 2-input NAND gates, DAST enables on-chip comparison of test results for an SoC designed with an ATPG/scan-based DFT methodology. Embedded results evaluation is enabled by sending the expected test results, in addition to the test stimuli, while EARA operates in synchrony with EAS.
DAST components were implemented in TSMC 0.18 µm technology for different cores of the ITC'02 Benchmarks and their area requirements were reported. In addition, and for comparison purposes, test time models were developed for the DAST scheme and for the corresponding serial connection that one would expect to find in a conventional external ATE-based methodology. Per chip, the test time in DAST is marginally longer than the test time of comparable serial ATE-based testing. However, multi-site testing in DAST is ideal. In other words, theoretically, an unlimited number of chips can be tested concurrently on a single tester, as the expected test results are sent to the chip and the test comparison is performed on chip. Effectively, based on the total number of similar chips tested, test time is reduced significantly using the DAST methodology when compared to typical ATE-based testing.

Clearly, there are trade-offs in the case of DAST with respect to the area overhead, test time, and reduced functionality of the external tester. Overall, the advantage of DAST is that ATPG/scan-based testing can be performed with minimal external components, providing a basis for SoC testing that is cost-effective, flexible, and portable. Moreover, the partitioning of test resources in DAST facilitates a hierarchical test flow suited for automation of the entire test cycle. In addition, since the DAST scheme is based on the conventional scan/ATPG approach, DAST benefits from the same high test quality, ease of use, and broad applicability associated with the conventional ATE-based scan testing methodology. Finally, by combining embedded testing with ATPG, DAST enables at-speed and in-field testing that is based on ATPG test vector sets and is deterministic.
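The multi-site argument in the summary above can be made concrete with a small sketch. It rests on two simplifying assumptions that are not stated in the thesis: the conventional flow is taken to test one chip at a time, and concurrent DAST sites are taken to share the tester with no per-site overhead. The function names `cycles_per_chip` and `break_even_sites` are hypothetical.

```python
import math


def cycles_per_chip(test_cycles, sites=1):
    """Average tester clock cycles spent per chip when `sites` identical
    chips are tested concurrently (idealized: no per-site overhead)."""
    if sites < 1:
        raise ValueError("sites must be at least 1")
    return test_cycles / sites


def break_even_sites(t_dast, t_serial):
    """Smallest number of concurrent DAST sites for which the per-chip DAST
    time drops below the single-site serial lower bound t_serial."""
    # Need t_dast / n < t_serial, i.e. n > t_dast / t_serial.
    return math.floor(t_dast / t_serial) + 1


if __name__ == "__main__":
    # Module d281_m5 of Table 3-6: T_M_DAST = 78745, T_S = 77084 cycles.
    print(break_even_sites(78745, 77084))   # sites needed to beat serial testing
    print(cycles_per_chip(78745, sites=4))  # per-chip cycles with four sites
```

Under these assumptions, the small per-chip overhead of DAST is recovered as soon as two chips are tested concurrently, and the per-chip cost keeps falling in proportion to the number of sites.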
Chapter 4

Network-Oriented Indirect and Modular Architecture for Test

4.1 Introduction

In the ATE-based test architectures that are generally in use today, one implicit assumption is that a single tester, acting as both the source and the sink, is in direct control of the embedded cores and their DFT. While this assumption results in simple test protocols in many cases, it undermines the modularity of the generic test architecture by tightly coupling the elements of the test architecture, i.e., the source, sink, wrapper, and the TAM. Such a tester uses the total test bandwidth, where the bandwidth is defined to be the sum of the products of test pins and their maximum frequency of operation. In addition, based on the test requirements of every core, the system integrator divides and fixes the bandwidth between different cores of the SoC.

For ATE-based test architecture templates, the CUT is physically connected to the tester through a test-head and, to maintain timing requirements, physical proximity between the chip and the tester is required. In the example illustrated in Figure 1-1, there are eight test pins, of which six pins are used for the test data and two pins are used for the test control. This pin arrangement is based on: (i) the number of test patterns for each core, assuming identical frequency of operation; and (ii) the total given number of test pins for the chip. In addition, the data lines are divided into two TAM groups, as shown in Figure 1-1, and the wrappers around the cores are designed to match the number of the core's internal scan chains to the width of the TAM. The arrangement of Figure 1-1 requires a tester with eight test pins. If such a tester is unavailable in a test insertion, the TAMs need to be changed to accommodate the available tester's channels.
Moreover, if after the design or in a subsequent design, any of the cores needs to be connected to a TAM width different from the widths given in Figure 1-1, the wrapper for that core would need to be changed.

As an additional example of the close coupling between the elements of the test arrangement of Figure 1-1, the tester is forced to operate at a minimum frequency, as dictated by the different delays in the forward and return paths of the TAMs. Here, the forward path refers to the data lines from the tester to the cores and the return path refers to the data lines from the cores to the tester.

From the above arguments, the diminished modularity in tightly coupled test architectures leads to reduced flexibility in modifying the test architecture elements, as the modification of any part of the architecture requires subsequent changes in the other parts. For example, the addition of one extra core on a TAM can change the timing characteristics of the TAM and, hence, the operating frequency of the tester. By the same argument, the restricted modularity also results in reduced flexibility with regard to implementing or integrating new schemes in the methodology. As chips become more complex, high flexibility for implementing new schemes in the test methodology is key for keeping the overall test cost low and productivity high (see Section 2.3). Test architectures and methodologies with such flexibility are considered to scale well with changes in the test design and flow, and are referred to as scalable in this thesis.

Moreover, with a tightly coupled test model, testing requires close proximity of a chip and the tester. Hence, in-field testing, i.e., testing the chip when in its target system, or remote-access testing, is virtually impossible or extremely costly. In addition, multiple testers cannot test a chip simultaneously.
A multiple-tester arrangement is key if test resource pooling is required between multiple sites or companies to reduce test cost. Furthermore, using a multiple-tester arrangement can be cost-effective when core test speeds vary significantly. In such a scenario, it is more cost-effective to reserve the high-speed tester's channels for fast cores and use low-speed channels for lower speed cores. A multiple-tester arrangement is in direct contrast to a test architecture where operating at the lowest frequency is the fundamental mechanism used to resolve the disparity between the speeds of the tester, I/O pins, the TAM, and the cores. Volkerink et al. in [35] have shown the cost benefit of bandwidth matching between a tester and cores.

As a first step towards developing solutions that address the issues discussed above, this work proposes a Network-oriented, Indirect and Modular Architecture (NIMA), where different testers can connect to a common switching fabric and send test data to the cores under test. NIMA is considered to be a special communication network that consists of hardware and software that allow the transportation of test stimuli and expected results from multiple sources to multiple cores. In addition, a switching fabric for the test architecture is proposed in this thesis. However, the NIMA architectural design is such that it can be migrated into a template where the common switching fabric used for core interconnections is also used for testing the SoC.

The indirect methodology of NIMA breaks the coupling between the core, the TAM, and the tester by de-coupling test-data processing from its communication. In the terminology of this thesis, test-data processing refers to the functional behaviour of the source/sink and wrapper, whereas test-data communication refers to the interaction between the source/sink and the wrapper.
Hence, NIMA alleviates the previously discussed problems associated with tightly coupled test models. NIMA provides the basis for modular test design and programming. NIMA also enables single or multiple testers to send test data over a local area network (LAN) or a wide area network (WAN), such as the Internet, to an SoC in-field in order to test the chip in its target system. Finally, the control mechanism of the test architecture is incorporated in the packets of NIMA. Hence, compared to typical test architectures that require separate data and control lines, fewer test pins are required when using NIMA.

4.2 NIMA Concept

The key concept in NIMA is establishing an indirect digital communication path, from multiple sources to multiple destinations, through a switching fabric [94] [95]. Testing ICs using the NIMA scheme requires that test stimuli and expected test results for cores are first compiled into new formats and then encapsulated into packets. These packets are subsequently augmented with control and address bits such that they can autonomously be transmitted to their destination through a switching fabric. Owing to the indirect nature of the connection, embedded autonomous blocks at each core are responsible for applying the test to the core and comparing the test results. To simplify the requirements of these embedded blocks, this thesis assumes that: 1) the packets can arrive at their destination cores with varying delays; 2) the packets arrive in their original order; and 3) no packet is lost in the communication link.

The proposed NIMA is illustrated in Figure 4-1 as a block diagram, where IP cores, a switching fabric, on- and off-chip sources, Embedded Autonomous Sequencers (EAS), Embedded Autonomous Results Analysers (EARA), an interface matching (IM) block, and an optional transmission system are shown.
² The indirect property of NIMA refers to the tester's lack of direct control over a core's DFT.

Figure 4-1: Conceptual representation of NIMA. (Block diagram: local and remote off-chip sources, the latter through a transmission system, send test stimuli and expected results in packet format to the Interface Matching (IM) block, which feeds the switching fabric of the SoC; on-chip sources also connect to the fabric; each core has an EAS and an EARA, and each EARA produces a Pass/Fail output.)

In NIMA, test stimuli and expected results are encapsulated into packets. Hence, dedicated blocks at each core, indicated as EAS blocks in Figure 4-1, extract test stimuli from the incoming packets and apply them to their respective core. Moreover, in synchronism with the application of the test vectors by the EAS blocks, dedicated EARA blocks compare the test results at the output of their respective core to the expected results within the incoming packets.

As illustrated in Figure 4-1, the connection between off-chip sources and the IM block can be direct and/or through a transmission system such as a WAN or a LAN. The IM block in Figure 4-1 receives test packets from different test sources with varying bandwidths and matches these bandwidths to those of the SoC, where the bandwidth is defined to be the product of the frequency and the number of channels. As an example, consider the case in Figure 4-2, where four different sources send test data to the SoC. These four sources have five, three, two, and one channels, respectively. Moreover, these sources can send data at maximum frequencies of 50, 100, 100, and 150 MHz per channel, respectively (note that there are no control lines associated with these sources, as control signals are embedded in the incoming packets).
In addition, in Figure 4-2 three different groups of test input pins to the SoC can be identified. These three groups have 2, 2, and 1 channels with maximum frequencies of 200, 100, and 150 MHz per channel, respectively. According to the sequence of the incoming packets, the IM block matches the total incoming bandwidth of 900 MHz to the total SoC bandwidth of 750 MHz.

Figure 4-2: An example showing the function of the IM block. (Four sources with 5 channels at 50 MHz, 3 at 100 MHz, 2 at 100 MHz, and 1 at 150 MHz, one of them connected through a transmission system, feed the Interface Matching block, which drives the switching fabric of the SoC through pin groups of 2 channels at 200 MHz, 2 at 100 MHz, and 1 at 150 MHz.)

To help with the partitioning of test resources, NIMA can be regarded as a special communication network, consisting of hardware and software that allow transportation of test stimuli and expected results from multiple sources to multiple cores. Regarding NIMA as a special communication network makes it possible to draw on the cumulative knowledge of different communication networks and to apply this knowledge to the specific case of an on-chip test network. To promote the modularity and simplicity of the design tasks, a layered architecture with a formal interface between each layer is the most established method of dividing the functions implemented by communication networks [96]. Since the communication tasks involved in SoC testing are not as complex as those in other communication networks, this thesis uses a 3-layer model for NIMA that consists of a Physical Layer, a Network Layer, and an Application Layer, as shown in Figure 4-3. In this model, the tasks in one element of the architecture deal directly with tasks in another element within the same layer through a virtual link, such that modifications in other layers do not alter the protocol within a layer.
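Returning briefly to the bandwidth-matching example of Figure 4-2, the 900 MHz and 750 MHz totals follow directly from the definition of bandwidth as the product of channel count and per-channel frequency, summed over all groups. The short sketch below only reproduces that arithmetic; the function name `total_bandwidth` and the list-of-tuples representation are illustrative assumptions, and nothing here models how the IM block itself is implemented.

```python
def total_bandwidth(groups):
    """Sum of channels x per-channel frequency (in MHz) over all groups.

    `groups` is a list of (channels, mhz_per_channel) tuples.
    """
    return sum(channels * mhz for channels, mhz in groups)


# Values taken from the Figure 4-2 example in the text.
SOURCES = [(5, 50), (3, 100), (2, 100), (1, 150)]   # four off-chip sources
SOC_PIN_GROUPS = [(2, 200), (2, 100), (1, 150)]     # three SoC test input groups

if __name__ == "__main__":
    incoming = total_bandwidth(SOURCES)          # 250 + 300 + 200 + 150
    available = total_bandwidth(SOC_PIN_GROUPS)  # 400 + 200 + 150
    print(incoming, available)
```

In this example the sources can collectively offer more bandwidth than the SoC pins accept, which is the mismatch the IM block is described as resolving.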
The physical connection, however, is only in the vertical direction of the model, except in the Physical Layer, where there is no virtual link but only a physical link.

[Figure 4-3: The conceptual 3-layer model in NIMA.]

4.3 Physical Layer

The Physical Layer encompasses the actual interconnection medium. For contemporary ICs, the physical medium consists of the metal wires routed between the SoC blocks. This layer specifies the voltage level of the signals on the wires, the timing of the signal events, the signalling techniques, and other physical properties of the link, such as protection measures against cross talk. Finally, the Physical Layer presents the data as a stream of 1's and 0's to the Network Layer in the present design of NIMA. This thesis assumes that the reliability of the physical link can be maintained in a chip, and hence no error detection mechanism is implemented. In addition, this thesis assumes a synchronous transmission in NIMA where a separate clock is provided next to the data bit stream. However, it is possible to encode the clock in the data and later extract it in the switching fabric [96].

Although electrical signals over metal wires are the dominant physical link in today's chips, it does not mean that NIMA cannot use a completely different physical connection, such as guided or unguided electromagnetic waves, in the future. One benefit of a layered approach in NIMA's design is the possibility of utilising new approaches with minimal design effort as new techniques become available and when their uses are justified.

4.4 Network Layer

The Network Layer dictates the details regarding how data is transmitted across the network, the switching technologies used, and the network topology implemented. The design of the Network Layer in NIMA is greatly influenced by the packet format used.
Hence, this section starts by describing the design of the packet format. Following the design of the packets, the design of switches in the fabric, the addressing mechanism, and the routing strategy in the Network Layer of NIMA are provided.

4.4.1 Packet Format

The most fundamental design issue is the packet format. The packet format in NIMA is illustrated in Figure 4-4.

[Figure 4-4: NIMA packet format. Fields, in order, with their size symbols: Sync Word (S), Data Length (L_D), Address Length (L_A), Address (A), Data (D).]

Here, a brief description of each field is provided.

Sync Word Field: The Sync Word signals the beginning of the packet to a switch. The starting point of a packet within the incoming bit stream needs to be identified for the switches, as the communication link in NIMA can be idle for any length of time. The predefined pattern in Sync Word identifies a new packet to the switches.

Data Length Field: Packets in NIMA can have varying data length to accommodate different requirements of cores in terms of test data length. The Data Length field identifies the length of the embedded data in the packets to the switches, and hence enables the switches to switch entire packets to the proper output channel.

Address Length Field: As discussed later in Section 4.4.3, a novel dynamic addressing mechanism is used to promote flexibility and scalability in seamless integration of hierarchical SoCs. A variable length Address field is used to achieve the dynamic addressing mechanism of Section 4.4.3. The Address Length defines the length of the Address field and, in effect, acts as a counter of the number of locations in the Address field.

Address Field: The Address field holds a variable length of address bits. The details about this field are provided later in Section 4.4.3.
Data Field: The Data field holds a variable number of data bits (alternatively referred to as the payload in this thesis). The payload includes test data as well as other related patterns for the Application Layer.

The sizes of the fields in the NIMA packet format, in terms of number of bits, are denoted by S, L_D, L_A, A, and D as shown in Figure 4-4. For the Address field, A represents the number of locations in this field, where each location contains n bits. This results in switches having $2^n$ output channels. In addition, the first four fields, identified in Figure 4-4 with a different colour/shading from the last field, together constitute the header of a packet. In the header of the packets, the relative positions of the Data Length and Address Length fields have been chosen arbitrarily, as shown in Figure 4-4.

4.4.2 Switches

Switches in NIMA have one input and $2^n$ output channels. To promote the scalability of the NIMA design, each channel in the switch is of width 2k and is comprised of test-stimuli and test-results sub-channels. The black box diagram of a switch in NIMA, with n=2, is shown in Figure 4-5.

[Figure 4-5: Black box diagram of a switch in NIMA with four output channels. Recoverable labels: test-stimuli sub-channel; test-results sub-channel.]

For switches in NIMA, the least significant bit (LSB) of the test-stimuli sub-channel is denoted as the primary line and only this primary line follows the format of NIMA packets. For this reason, the packet header is only defined in the primary line, denoted by Line 0 in Figure 4-6. As illustrated in Figure 4-6 and Figure 4-7, with the header only defined in the primary line of the test-stimuli sub-channel, part of the switch channel bandwidth is reserved and thus not used.
However, this arrangement results in a scalable architecture where the value of k can range from one to its maximum available value without any change in the design of switches or NIMA's Network Layer protocol. That is, the same generic code for the switches can be used to instantiate the NIMA network for a particular design.

[Figure 4-6: Occupancy of test-stimuli sub-channels in NIMA switches. Recoverable content: Line 0 carries the header (Sync Word, Data Length, Address Length, Address) followed by the payload; Line 1 to Line (k-1) carry reserved bits, including the V and C flag bits, followed by the payload.]

[Figure 4-7: Occupancy of test-results sub-channels in NIMA switches. Recoverable content: Line 0 to Line (k-1) each carry reserved bits followed by the payload.]

Line 1 to Line (k-1) in both the test-stimuli and the test-results sub-channels can hold either valid or invalid payload in a given packet. The V flag bits shown in Figure 4-6 help the tasks in the Application Layer of NIMA to identify lines with valid payload, where a line with valid payload is identified with logic 1. Note that, in terms of having valid or invalid payload data, the status of corresponding lines in the test-stimuli and the test-results sub-channels is identical. Hence, for example, if Line 3 to Line (k-1) do not carry valid payload data in the test-stimuli sub-channels, the same lines do not carry a valid payload in the test-results sub-channels. In addition, note that if a packet is present on a switch's output channel, Line 0 in both sub-channels carries valid payload data.
In addition to payload validity, the tasks in the Application Layer require further payload information. As shown in Figure 4-8, the payload is divided into an array of k x D bits for some applications. However, certain payloads may not occupy the trailing bits of this array in all the k bits of the output sub-channels. Figure 4-8 shows an example for a design with k=6. In this case, a payload of length 81 bits is divided into an array of 14x6 bits, where an X denotes invalid bits in the array. As shown in Figure 4-6, C flag bits can be programmed in Line 1 to Line (k-1) of the test-stimuli sub-channel to identify the invalid bits in the array.

[Figure 4-8: A typical payload array in sub-channels with invalid last bits marked by "X". Recoverable labels: array dimensions of D bits by k bits.]

However, the C flag requires only $\lceil \log_2 k \rceil$ bits and is used as a binary number, |C|, in the Application Layer. This number only shows the status of the trailing bits in Line 1 to Line (k-1) of the sub-channels, as Line 0 in the sub-channels will always have a valid trailing bit when a packet is present. Thus, the number of lines with an invalid trailing bit is equal to k - 1 - |C|. For the case shown in Figure 4-8, the C flag will consist of three bits and is equal to 010, indicating 6-1-2=3 lines having invalid trailing bits.

In addition, switches in NIMA can calculate the end of the packets before the last bit of the packet leaves the switch. Thus, these switches can instantly look for the next incoming packet after the last bit of the previous packet. Therefore, packets in NIMA can be pipelined one after another without any gaps between them.
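The C flag arithmetic described above can be illustrated with a short Python sketch. It is not taken from the thesis: the function name `payload_array_flags` is hypothetical, all k lines are assumed to carry valid payload, and the C flag is assumed to be an unsigned binary number written most significant bit first.

```python
def payload_array_flags(payload_bits: int, k: int):
    """Return (bits per line, invalid trailing bits, C flag string).

    Assumption: the payload is spread over k lines of equal depth and only
    the trailing bit of some lines other than Line 0 can be invalid.
    """
    if payload_bits < 1 or k < 1:
        raise ValueError("payload_bits and k must be positive")
    depth = -(-payload_bits // k)           # ceil(payload_bits / k)
    invalid = depth * k - payload_bits      # always between 0 and k-1
    c_width = (k - 1).bit_length()          # equals ceil(log2(k)) for k >= 1
    c_value = k - 1 - invalid               # |C|, so invalid = k - 1 - |C|
    c_flag = format(c_value, f"0{c_width}b") if c_width else ""
    return depth, invalid, c_flag


# Example of Figure 4-8: 81 payload bits on k = 6 lines
print(payload_array_flags(81, 6))
```

For the 81-bit example with k=6 this reproduces the 14x6 array, the three invalid trailing bits, and the C flag value 010 quoted in the text.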
4.4.3 Dynamic Addressing Mechanism

Using the packet structure of Figure 4-4 and the packet format as explained in Section 4.4.1, the number of bits in the Address field is $A < n \times 2^{L_A}$. Since the Logical Address Space (LAS) in NIMA is $LAS = 2^{A}$, it follows that $LAS = 2^{n \times 2^{L_A}}$. If the values of n and L_A are chosen appropriately, the LAS can be very large, and hence enable the system designer to use partial addressing without significant disadvantages. This partial addressing enables seamless integration of new cores as well as design of hierarchical SoCs. Using variable length addressing, the entire LAS is essentially divided into $2^{L_A}$ hierarchical levels, each comprised of $2^n$ pages. As an example, assume n=2 and L_A=10. For this case, $LAS = 2^{2048}$ and the LAS is divided into 1024 hierarchical levels, each of 4 pages.

The value of A, the length of the Address field, need not be constant and can be chosen such that $N < 2^{n \times A}$, where N is the total number of cores to be accessed for testing. As an example, consider a system with 49 cores. For this example, the minimum value of A is three, as $49 < 2^{2 \times 3}$. Thus, based on the configuration of the network, all the cores can be addressed with a minimum of three levels of switches. Now, assume that a later revision of the design requires the use of 73 cores instead of 49. For this case, the minimum value of A is four, as $73 < 2^{2 \times 4}$. Similarly, and based on the configuration of the network, all the cores for this new revision can be addressed with a minimum of four levels of switches. Hence, a configurable and modular architecture is provided, where the size of the network scales and grows according to the number of the cores without any design change.
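The sizing rules above can be checked with a minimal Python sketch. The function names are hypothetical, and the sketch simply applies the inequality $N < 2^{n \times A}$ and the level/page split stated in the text.

```python
def min_address_locations(num_cores: int, n: int) -> int:
    """Smallest A (address locations of n bits each) with num_cores < 2**(n*A)."""
    if num_cores < 1 or n < 1:
        raise ValueError("num_cores and n must be positive")
    a = 1
    while num_cores >= 2 ** (n * a):
        a += 1
    return a


def logical_address_space(n: int, l_a: int):
    """Return (hierarchical levels, pages per level, exponent e with LAS = 2**e)."""
    levels = 2 ** l_a
    pages_per_level = 2 ** n
    return levels, pages_per_level, n * levels


print(min_address_locations(49, 2))   # 49 cores, n = 2
print(min_address_locations(73, 2))   # 73 cores, n = 2
print(logical_address_space(2, 10))   # n = 2, L_A = 10
```

With n=2 the sketch gives A=3 for 49 cores and A=4 for 73 cores, and for L_A=10 it gives 1024 levels of 4 pages with an exponent of 2048, matching the worked examples in the text.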
The Physical Address Space (PAS) is defined as the part of the LAS that is actually being used. The PAS can vary in size and it can be chosen to be the minimum size required. When new cores in later iterations/revisions are added to the SoC and if the PAS is all assigned, new levels of addresses can be introduced to increase the size of the PAS. Thus, new cores can be accommodated with no design modification of the Network Layer. This yields the advantage of a scalable TAM architecture between SoC design versions, referred to here as design-version scalability.

Moreover, if an existing SoC is used as an embedded core in a subsequent SoC, the PAS of the earlier SoC is assigned to the lowest part of the subsequent SoC's LAS. The PAS of the subsequent SoC is then continued from the next available hierarchical level of the $2^n$ pages. This creates another level of scalability: multi-level scalability. Again, using the design-version scalability feature, new cores in the subsequent SoC are assigned addresses in the PAS. Design-version scalability and multi-level scalability are illustrated conceptually in Figure 4-9.

[Figure 4-9: Address spaces for NIMA's Network Layer. Recoverable labels: NIMA's Logical Address Space (LAS), not shown to scale; example of existing SoC PAS, which expands as new cores are added (design-version scalability); example of subsequent SoC PAS (multi-level scalability), which expands as required by additional cores (design-version scalability).]

It is possible to connect each first-level switch to a primary I/O pin of the system chip. This effectively divides the entire chip space/address into subsections with uncorrelated networks that can be individually addressed and accessed.
Hence, the Effective Logical Addressing Space, $LAS_{Eff}$, for an n-dimensional architecture can be sized to be $LAS_{Eff} = m \times 2^{n \times 2^{L_A}}$, where m is the number of chip primary I/Os. Note that, in general, $m > 2k$, where 2k is the width of the switch channels, as defined in Figure 4-5.

A feature of the proposed dynamic addressing mechanism is the limiting of the latency of packets in the switches. To illustrate this, consider an alternative Network Layer where a fixed address-length addressing mechanism is used. For such an example, assume that the packets use B bits in their address field. A switch in this hypothetical Network Layer can only decode the address, before routing the packets, when all the bits in the address field have been received. Thus, the latency introduced by each switch in this case is proportional to B. In the case of NIMA, an address length pointer is used and hence a switch can decode the address field as soon as a minimum number of address bits are received, i.e., n bits in the proposed architecture (see Section 4.4.4). Hence, the latency introduced by a switch in NIMA is proportional to $L_A + n$. In order to be able to address more than five hundred cores in the case of the fixed address-length architecture, B needs to be greater than nine. However, in the case of NIMA, $L_A + n$ can be as low as five, where n=2 and L_A=3, and hence $500 < 2^{n \times 2^{L_A}}$.

Moreover, the smaller latency in NIMA results in the need for smaller buffering when compared to a fixed address-length addressing mechanism. In addition, the used address bits in each NIMA switch can be taken out of the packet and discarded. This progressively shortens the length of the packets, and hence the introduced latency, as packets move in the network.
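As a rough numerical illustration of the latency argument, the following Python sketch computes the number of bits a NIMA switch waits for before decoding ($L_A + n$, as stated above) and the address bits still carried after each hop when used address bits are discarded. The function names are hypothetical, and the sketch assumes each switch consumes exactly one n-bit address location.

```python
def decode_wait_bits(l_a: int, n: int) -> int:
    """Bits a switch needs before decoding: the address length pointer plus n bits."""
    return l_a + n


def remaining_address_bits(locations: int, n: int, hops: int) -> int:
    """Address bits left in a packet after `hops` switches stripped n bits each."""
    return n * max(locations - hops, 0)


n, l_a = 2, 3
print(decode_wait_bits(l_a, n))                 # per-switch decode wait for n=2, L_A=3
print(500 < 2 ** (n * 2 ** l_a))                # more than 500 cores are addressable
print([remaining_address_bits(3, n, h) for h in range(4)])
```

For n=2 and L_A=3 the decode wait is five bits while the address space still exceeds five hundred cores, and a three-location address shrinks by two bits at every switch.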
4.4.4 Routing in the Switches (see footnote 3)

To eliminate the need for maintaining routing information in the switches, the packets in NIMA are routed using source routing, i.e., the route is predefined and hardwired. As described in Section 4.4.1, the Sync Word in the primary line identifies the beginning of an incoming packet to the switches. The switches in NIMA save $S + L_D + L_A$ bits and the first n bits in the Address field of a packet, as defined in Figure 4-4. The switches then use the first n bits in the Address field of the packets to decode the destination of the packets and identify the output channel to which the packet will be forwarded. As soon as the output channel is determined, the packets are routed to that channel. This ensures a minimal number of buffers and a maximum delay of $S + L_D + L_A + n$ clock cycles in the switches. This technique is referred to as cut-through routing [96]. In cut-through routing, packets are forwarded to the router output as soon as the destination is known, without waiting for the tail of the packet to arrive. This technique reduces the latency of the network and reduces the amount of required hardware in the form of buffers in the routers.

Footnote 3: In this thesis, the Network Layer is based on virtual indirect connections provided by switches. That is to say, to simplify NIMA, packets are sent in a particular order and are required to reach their destination in that order. Designs that are more elaborate will include out-of-order packet reception, which is not considered in this work. Future studies can also look into the effects of out-of-order packet reception in terms of design complexities and cores' fault coverage in cases where packets are not put back in their original order at the cores.
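The routing decision described above can be modelled functionally in Python. This is a sketch under stated assumptions, not the thesis's VHDL design: the sync pattern `SYNC` is a made-up value, the field widths are those reported for the implementation (S=6, L_D=10, L_A=6, n=2), lengths are assumed to be unsigned binary numbers with the most significant bit first, only the primary line (k=1) is modelled, and a switch is assumed to be the last one when a single address location remains. The model works on whole bit strings, so it shows which channel is selected and what is forwarded, not the cycle-level cut-through timing.

```python
S, L_D, L_A, N = 6, 10, 6, 2   # field widths in bits (implementation values)
SYNC = "101101"                # hypothetical S-bit sync pattern


def build_packet(address, data):
    """Serialise Sync Word | Data Length | Address Length | Address | Data."""
    if len(data) >= 2 ** L_D or len(address) >= 2 ** L_A:
        raise ValueError("length does not fit its header field")
    if any(not 0 <= a < 2 ** N for a in address):
        raise ValueError("each address location must fit in n bits")
    addr_bits = "".join(format(a, f"0{N}b") for a in address)
    return (SYNC + format(len(data), f"0{L_D}b")
            + format(len(address), f"0{L_A}b") + addr_bits + data)


def switch_route(packet):
    """Return (output channel, bits forwarded on that channel)."""
    if not packet.startswith(SYNC):
        raise ValueError("no sync word at start of packet")
    pos = S
    data_len = int(packet[pos:pos + L_D], 2); pos += L_D
    addr_len = int(packet[pos:pos + L_A], 2); pos += L_A
    if addr_len < 1:
        raise ValueError("packet carries no address")
    channel = int(packet[pos:pos + N], 2); pos += N      # first n address bits
    rest_addr = packet[pos:pos + N * (addr_len - 1)]
    pos += len(rest_addr)
    data = packet[pos:pos + data_len]
    if addr_len == 1:                                    # assumed last-switch rule
        return channel, data                             # payload only
    header = SYNC + format(data_len, f"0{L_D}b") + format(addr_len - 1, f"0{L_A}b")
    return channel, header + rest_addr + data            # used address bits dropped


pkt = build_packet([0, 1], "1011")
first_channel, forwarded = switch_route(pkt)
last_channel, payload = switch_route(forwarded)
print(first_channel, last_channel, payload)
```

In this usage example a packet addressed through channel 0 of the first switch and channel 1 of the second is shortened by one address location at the first hop and reduced to its payload at the last switch.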
4.5 Application Layer

In NIMA's Application Layer, the data is comprised of test vectors, test results, and DFT and wrapper control signals that have to be converted to/from packets. The NIMA architecture allows arbitrary bit widths for switch sub-channels, k, such that packets in a sub-channel can be blocks of k-dimensional bit arrays instead of a one-dimensional bit stream (note $m > 2k$, where m is the total number of primary test pins). As a result, the cores that require the use of the test network must have the capability of scaling the packet payload into the bit width suitable for their application. Moreover, for packets wider than one bit, handshaking ready signals are required to indicate which bits are valid, as the data may not completely fill the entire payload array. This ready signal is essential to maintain a constant interface protocol between the Network and the Application Layer. In effect, the information in the V and C flags as described in Section 4.4.2 can be used for the ready signal.

Since cores on an SoC are assumed to possibly have different origins, a standardised test interface or wrapper at each core is critical. Moreover, a standard language for the description of core test programs and DFT is also essential. As described in Section 2.5.1, one such standard, referred to as P1500 [51], is currently under development by the IEEE and is used in this thesis. Specific P1500 instructions are required to support full scan test as well as any other DFT strategies. Furthermore, a mechanism is needed to generate the P1500 control signals in the correct sequence at the core site. The mechanism needed to generate the P1500 control signals and the test programs based on Core Test Language (CTL) [51] fall in NIMA's Application Layer. For cores with a P1500 wrapper, a set of P1500 wrapper control flags is devised in the Application Layer of NIMA.
These control flags enhance the flexibility and scalability of NIMA such that NIMA remains independent of future modifications or improvements to the P1500 standard, as long as the interface between the Application Layer and the Network Layer is maintained. In the design of this thesis, these control flags are based on a set of wrapper control bits embedded in the payload, which results in an additional hierarchy in the message that is to be packetized for each core. These bits serve as instructions for the P1500 control mechanism. Hence, any modifications to the P1500 wrapper will result in the modification of these bits or their interpretation in the Application Layer.

As shown in Figure 4-1, blocks of EAS and EARA from the DAST methodology of Chapter 3 are other Application Layer tasks. EAS blocks are responsible for receiving test stimuli and applying them according to the instruction embedded in the payload. These instructions detail the data bits that must be applied to the pins of the core under test, and the timing at which they are applied. EARA blocks, on the other hand, receive the expected results and compare them to the output of the core. Using internal signalling, EARA blocks operate in synchrony with the EAS blocks to ensure the integrity of the test process.

Another task in the Application Layer is decompression of the incoming test data if compression is used on the original test data. Compression and decompression of test data can help toward reducing test time, and hence test cost, by reducing the required memory in the external tester and the number of bits needed for communication. There are many different compression and decompression techniques [40], and if used, their tasks fall within the Application Layer in NIMA.

Scheduling is another Application Layer task.
There are many proposed schemes for SoC test scheduling in the literature [85][86][87][88][90], and the results of these works can be used for scheduling in NIMA. The problem definition in NIMA scheduling is similar to that of the above works: determine when to send test packets for each core such that, while keeping within the bounds of any given constraints, the overall test time and the hardware complexity are minimised. For instance, the scheduler must prevent conflicts in the network resources and, at the same time, minimise the test time. During the test, activating all core DFT simultaneously may result in power dissipation that exceeds the chip junction or package heat tolerance. Hence, based on a given power budget, the scheduler needs to control the activity of the DFTs [98]. Moreover, the test data from the NIMA network may arrive at faster rates than can be consumed by a core. Thus, a buffering scheme is required to ensure the core gets the data only when it is ready to accept it. Otherwise, the integrity of the test will not be maintained. To prevent buffer overflow, constraints must be applied to the time intervals between consecutive packets destined for the cores.

4.6 Implementation

To validate the concept, NIMA was implemented on the UBC_SoC platform of Section 3.5 and a chip was fabricated. The design and implementation flow follows a typical digital ASIC design and fabrication flow that uses electronic design automation (EDA) tools. This section provides the details of the implementation for the three layers of NIMA's architecture.

4.6.1 Physical Layer

For the Physical Layer, the traditional metal interconnect was used. Automatic tools, as part of a standard ASIC design flow, routed the wires in the implementation of NIMA.
The voltage on the wire is expected to swing between 0 V and 1.8 V, which is a typical range for the targeted 0.18 µm technology. The analysis suggested that the capacitive coupling between routed wires would not be severe enough to cause erroneous signal transitions. Therefore, no additional hardware for cross-talk protection was implemented.

4.6.2 Network Layer

To mirror a real SoC with embedded cores, it was assumed that there is no direct access to the cores in UBC_SoC from its primary inputs/outputs (I/Os). Moreover, it was assumed that only three test pins were available in the platform. Based on these assumptions, the switching fabric was designed such that cores can be accessed through a maximum of two levels of switches. Figure 4-10 illustrates the interconnect architecture used for the switching fabric in UBC_SoC.

[Figure 4-10: Interconnect architecture for the switch fabric in UBC_SoC. Recoverable labels: two switches; channels 00, 01, and 10; outputs to b10 with one scan chain, b10 with three scan chains, b15 with one scan chain, and b15 with two scan chains; SoC.]

Figure 4-11 illustrates a block diagram of the switches in the NIMA implementation. In Figure 4-11, incoming packets are first buffered. Subsequently, appropriate logic informs the finite state machine (FSM) of the detection of the sync word that, in turn, indicates the beginning of a packet. After the detection of a packet, the FSM provides the necessary signals for the Packet End block such that it calculates the length of the packet. The FSM also provides the necessary signals for the Last Switch Detection block.
Based on the signal from the Last Switch Detection block, if the switch is a last switch, i.e., the switch that is directly connected to cores, the switch only outputs the payload and not the entire packet. As detailed in Section 4.4.2, based on the valid or invalid status of each link in the output channels and the trailing bits in the payload array, necessary information is extracted from the packets to generate the Ready signal, as the processes in the Application Layer need the Ready signal to interpret the payload. Finally, the FSM provides the necessary switching signals for the De-Mux block to route the packet, or the payload in the case of last switches, to the appropriate output channel.

[Figure 4-11: Block diagram of a switch in the NIMA implementation. Recoverable labels: test-stimuli sub-channel; test-results sub-channel; channel width 2k; Control Out.]

The switches in the Network Layer, which process the packet headers and eventually route packets, were written in VHDL RTL and synthesised with the following parameters: S=6, L_D=10, L_A=6, n=2, and k=1. The switches buffer the incoming data for a maximum of $S + L_D + L_A$ bits in shift registers. In the present implementation, this amounts to 22 bits. After detecting an incoming packet from a Sync Word in the primary line, switches use the first two bits in the Address field to decode the destination of the packet and identify the output channel to which the packet will be forwarded. In the present implementation, after the decision on the routing is made, the switch deletes these two bits from the Address field, updates the Address Length field, and passes the packets to the appropriate output channel.

The Ready signal is of width $\lceil \log_2 k \rceil + k$ bits and is the concatenation of three parts. The LSB is a general flag showing if a valid packet is present on a switch's output channel.
The next k-1 bits are a direct copy of the V flag bits in the test-stimuli sub-channel, as shown in Figure 4-6. The remaining bits are a copy of the C flag bits, as shown in Figure 4-6 and explained in Section 4.4.2.

4.6.3 Application Layer

The implementation of DAST, as given in Section 3.4, was used with minor modification to interface to the Network Layer of NIMA. The required modifications were mainly in the FSM blocks of Figure 3-7a and Figure 3-7b such that the control signals to these FSMs could be driven from the control signals of a switch, as shown in Figure 4-11.

Moreover, pre-processing C programs were developed to construct the packets of NIMA's Network Layer, and to encapsulate EAS and EARA data files into the packet payloads. The pre-processing can be executed off-line with minimal computational effort. These programs accept as inputs the binary files of Test Stimuli and Expected Test Results, as illustrated in Figure 3-4, and packetize the data in these files according to the test-stimuli and test-results sub-channel definitions given in Figure 4-6 and Figure 4-7. This packetizing task is performed by identifying sections of the data in the Test Stimuli and the Test Results files that can appear in the payload section of the packets such that the maximum length of the payload field is not violated and the Application Layer data is not corrupted. Subsequently, these programs add the appropriate packet header information such that the switches in the NIMA Network Layer can autonomously route the payload to the appropriate core.

The scheduling of the packets is trivial, as three test pins are used in the present implementation of NIMA. Using three test pins and assuming the same frequency for different layers of the test architecture implies that the rate of incoming data is always smaller than the rate of data that can be used at the cores.
NIMA implementations with more test pins can use more sophisticated scheduling techniques.

4.7 Experimental Results

This section analyses and reports the area and the power requirements for the current implementation of switches in the Network Layer. In addition, to provide an understanding of the area and the power requirements for the current implementation of the NIMA Application Layer, it reports these for a selection of ITC'02 Benchmark Circuits modules. Finally, Section 4.7.2 presents the test times for cores of UBC_SoC. Using these empirical values, a test time model for the current implementation of NIMA is developed. This model is then used to predict the test times for a selection of ITC'02 Benchmark modules.

4.7.1 Area and Power Overhead

Using Synopsys Design Compiler, the area and power requirements for the current implementation of the switches in the Network Layer are 12656 µm² and 3026 µW, respectively. Therefore, the area overhead of the NIMA Network Layer is equivalent to about 1250 2-input NAND gates per switch. The modifications to the design of DAST components to interface with NIMA switches are minimal. Hence, the area and power for components of EAS and EARA blocks for cores (modules) of the ITC'02 SoC Benchmarks are identical to those provided in Table 3-3 and Table 3-4, respectively. Thus, the total additional area for the present NIMA Application Layer is minimal, as it amounts to the equivalent of about 600-800 2-input NAND gates.

4.7.2 NIMA's Test Time

The test time models given in Equations (3-1), (3-2), and (3-3) can be used to predict the required total data bits for EAS and EARA in NIMA (denoted as D_DM in this thesis).
This holds true, as the models are equal to the number of clock cycles needed to apply all the data bits and, hence, are equal to the total data bits for EAS and EARA (note that the sizes of EAS and EARA data files are equal). In NIMA, all the data bits for EAS and EARA blocks are encapsulated in the payloads. Hence, the total number of packets needed to accommodate all EAS and EARA data bits can be modelled as Packets such that

$Packets = \gamma \lceil D_{DM} / D \rceil$    (4-1)

where $\gamma > 1$ is an empirical calibrating coefficient (see footnote 4), D_DM denotes EAS and EARA data bits, and D represents the size of the payload in bits. Finally, the test time model for any core in the present implementation of NIMA is denoted by T_MN and is given by the following expression

$T_{MN} = Packets \times (22 + A) + D_{DM}$    (4-2)

where A denotes the number of the address bits used for the core (for example, using Figure 4-10, A=4 for b10 with three and one scan chains and A=2 for b15 blocks). In addition, in Equation (4-2) the first part is equal to the total header bits in all the packets and D_DM represents the total payload data for the core.

Footnote 4: Due to their structure, the EAS (or Test Stimuli) and EARA (or Expected Test Results) data files may not necessarily divide into complete D-bit segments. In many cases, these segments, which represent the payloads, will be shorter than the maximum D bits allowed. The calibrating coefficient, γ, in Equation (4-1) is, hence, used to compensate for this fact.

For each core of UBC_SoC, Table 4-1 reports the simulated NIMA test time, T_N, for the implementation of UBC_SoC with D=1000. In addition, Table 4-1 reports the value of T_MN as given by Equation (4-2), where γ=2 and D=1000.
Moreover, to compare NIMA test time to that of a serial connection in conventional test architectures, T_S is calculated using Equations (3-4), (3-5), and (3-6) and reported in Table 4-1. Finally, the last column in Table 4-1 reports the percentage error between the experimental NIMA test time, T_N, and the time model, T_MN, as given in Equation (4-2).

Table 4-1: NIMA Simulated Test Time and its Test Time Model Prediction Values for Cores of UBC_SoC (in clock cycles)

Columns: SI, PI, PO, SE, TP = circuit characteristics; T_S = External Tester Time Model (Cycles); T_N = NIMA Test Time (Cycles); DDM = Total Test Data Bits; T_MN = NIMA Time Model (Cycles); % Error = % Error Between NIMA Test Time & NIMA Time Model, 100·(T_MN − T_N)/T_N.

| Circuit | SI | PI | PO | SE | TP | T_S | T_N | DDM | T_MN | % Error |
| b10_1SC | 1 | 13 | 7 | 17 | 49 | 2199 | 3618 | 3335 | 3543 | -2.1% |
| b10_3SC | 3 | 13 | 6 | 6 | 52 | 2488 | 3714 | 3693 | 3901 | 5.0% |
| b15_1SC | 1 | 38 | 71 | 449 | 1065 | 629940 | 686945 | 644859 | 675819 | -1.6% |
| b15_2SC | 2 | 38 | 72 | 224 | 1057 | 626268 | 672202 | 641075 | 671891 | 0.0% |

Assuming no control lines are required in the conventional external test architectures, Table 4-2 reports the estimated NIMA test time based on the model captured by Equation (4-2) for the ITC'02 SoC Benchmark modules and the percentage overhead compared to conventional external test architectures. T_MN in Table 4-2 is reported using the maximum value of D=1023, assuming A=2 for every core, and using the same value for γ as used in Table 4-1. Note that in Table 4-2, the bi-directional pins of a module are not listed separately. Instead, these pins are counted in both the primary inputs and outputs.
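The time model of Equations (4-1) and (4-2) can be checked numerically. The following is a minimal sketch, not part of the thesis implementation: the function names are hypothetical, and Equation (4-1) is assumed to round DDM/D up to a whole number of payloads before scaling by γ, a reading that reproduces the T_MN values reported for UBC_SoC with γ=2 and D=1000.

```python
def nima_packets(ddm_bits: int, payload_bits: int, gamma: int = 2) -> int:
    """Equation (4-1), assumed here to be gamma * ceil(DDM / D)."""
    full_payloads = (ddm_bits + payload_bits - 1) // payload_bits  # integer ceiling
    return gamma * full_payloads


def nima_test_time(ddm_bits: int, addr_bits: int, payload_bits: int,
                   gamma: int = 2) -> int:
    """Equation (4-2): header bits of all packets plus the payload bits (DDM)."""
    packets = nima_packets(ddm_bits, payload_bits, gamma)
    return packets * (22 + addr_bits) + ddm_bits


def percent_diff(model: float, reference: float) -> float:
    """Relative difference 100 * (model - reference) / reference."""
    return 100.0 * (model - reference) / reference


# (circuit, DDM, address bits A, simulated T_N) for the UBC_SoC cores, D = 1000
UBC_SOC_CORES = [
    ("b10_1SC", 3335, 4, 3618),
    ("b10_3SC", 3693, 4, 3714),
    ("b15_1SC", 644859, 2, 686945),
    ("b15_2SC", 641075, 2, 672202),
]

if __name__ == "__main__":
    for name, ddm, addr_bits, t_n in UBC_SOC_CORES:
        t_mn = nima_test_time(ddm, addr_bits, payload_bits=1000)
        print(f"{name}: T_MN = {t_mn}, error = {percent_diff(t_mn, t_n):.1f}%")
```

The same two functions, called with D=1023 and A=2, give the model values used for the ITC'02 modules; for example, `nima_test_time(78745, 2, 1023)` evaluates to 82441 for d281_m5.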
Table 4-2: Predicted NIMA Test Time for ITC'02 SoC Benchmarks Modules (in clock cycles)

Columns: SI, PI, PO, SE, TP = circuit characteristics; T_S = External Tester Time Model (Cycles); DDM = Total Test Data Bits; T_MN = NIMA Time Model (Cycles); % Overhead = % Overhead Between NIMA Time Model & External Tester Time Model, 100·(T_MN − T_S)/T_S.

| ITC'02 core | SI | PI | PO | SE | TP | T_S | DDM | T_MN | % Overhead |
| d281_m5 | 6 | 214 | 228 | 32 | 118 | 77084 | 78745 | 82441 | 6.9% |
| d695_m5 | 32 | 38 | 304 | 45 | 110 | 226796 | 228345 | 239097 | 5.4% |
| d695_m6 | 16 | 62 | 152 | 41 | 234 | 225420 | 228705 | 239457 | 6.2% |
| d695_m9 | 32 | 35 | 320 | 54 | 12 | 30214 | 30391 | 31831 | 5.4% |
| f2126_m1 | 8 | 356 | 529 | 1000 | 334 | 3034084 | 3038769 | 3181377 | 4.9% |
| f2126_m2 | 16 | 85 | 139 | 319 | 422 | 2276478 | 2282395 | 2389531 | 5.0% |
| f2126_m3 | 1 | 30 | 20 | 452 | 103 | 53351 | 55729 | 58369 | 9.4% |
| g1023_m1 | 14 | 139 | 273 | 43 | 134 | 154712 | 156597 | 163989 | 6.0% |
| g1023_m2 | 2 | 221 | 215 | 84 | 74 | 45898 | 47609 | 49865 | 8.6% |
| g1023_m4 | 4 | 145 | 155 | 54 | 268 | 141474 | 145235 | 152051 | 7.5% |
| g1023_m10 | 1 | 301 | 377 | 13 | 29 | 22858 | 23273 | 24377 | 6.6% |
| h953_m1 | 4 | 112 | 152 | 348 | 341 | 579952 | 584735 | 612191 | 5.6% |
| h953_m2 | 2 | 68 | 89 | 327 | 9 | 8278 | 8413 | 8845 | 6.8% |
| h953_m8 | 8 | 35 | 69 | 189 | 305 | 504832 | 509111 | 533015 | 5.6% |
| p34392_m1 | 1 | 15 | 94 | 806 | 210 | 209576 | 212525 | 222509 | 6.2% |
| p93791_m12 | 46 | 361 | 80 | 93 | 391 | 1977986 | 1986988 | 2080252 | 5.2% |
| p93791_m19 | 44 | 538 | 437 | 100 | 210 | 1164676 | 1169515 | 1224427 | 5.1% |
| t512505_m7 | 1 | 122 | 36 | 514 | 608 | 462230 | 476223 | 498591 | 7.9% |
| t512505_m8 | 3 | 106 | 147 | 1473 | 1025 | 4835456 | 4849815 | 5077383 | 5.0% |
| t512505_m9 | 1 | 82 | 122 | 530 | 195 | 151624 | 154363 | 161611 | 6.6% |
| t512505_m14 | 1 | 751 | 381 | 1225 | 278 | 761111 | 767514 | 803562 | 5.6% |
| t512505_m15 | 1 | 406 | 132 | 386 | 151 | 182247 | 185729 | 194465 | 6.7% |
| t512505_m16 | 1 | 850 | 897 | 154 | 370 | 722614 | 727803 | 761979 | 5.4% |
| t512505_m23 | 2 | 99 | 124 | 1372 | 532 | 1594686 | 1602143 | 1677359 | 5.2% |
| t512505_m24 | 1 | 182 | 129 | 1669 | 429 | 874619 | 884495 | 926015 | 5.9% |
| t512505_m31 | 28 | 211 | 316 | 1550 | 3370 | 148431662 | 148478851 | 155445619 | 4.7% |
| u226_m7 | 20 | 97 | 64 | 54 | 76 | 99618 | 101375 | 106175 | 6.6% |

In addition to the expected NIMA test time, Table 4-2 reports the percentage overhead of NIMA test time when compared to lower bound conventional
external test architecture. Using lower bound values for conventional ATE-based testing methods compares NIMA to many other test strategies reported in the literature, such as [85][86][88][90], as these works propose different methods of achieving the lowest possible test time in ATE-based testing. From Table 4-2, and assuming no control lines are required for the conventional serial connection, the increase in test time with NIMA is always less than 10% when compared to a lower bound serial connection in a conventional external tester approach. Moreover, the results reported in Table 4-2 are independent of the result reported in [9] that reusing an NoC allows comparable test times with techniques that use exclusive TAMs. As shown in [9], with the assumption of having an NoC, the large number of the chip's functional I/O pins can be used for post-fabrication test, resulting in low test time with no extra test pins.

4.8 Summary and Conclusions

In the current test architectures, no distinction is made between the communication and the processing of test data. In addition, an underlying assumption is the existence of a physical link that enables a single tester to directly control a core's DFT elements. While this assumption generally results in simple test protocols, this chapter identified a number of problems in terms of reduced modularity and flexibility, characteristic of such a direct-access template. As one solution, this chapter proposed the concept of Network-oriented, Indirect and Modular Architecture (NIMA) for testing core-based SoCs. In NIMA, different testers can connect to a common switching fabric and send test data to the core(s) under test. The indirect methodology of NIMA breaks the coupling between the core, the TAM, and the tester.
That is, NIMA de-couples test-data processing and communication. In doing so, NIMA alleviates the problems associated with tightly coupled test architectures, and facilitates test resources partitioning for lowering test cost.

NIMA is developed such that test stimuli and expected results for cores are first compiled into new formats and then encapsulated into packets. These packets are subsequently augmented with control and address bits such that they are autonomously transmitted to their destination through a switching fabric. In this way, NIMA makes better use of the available resources, and eliminates the need for control lines in test architectures. Owing to the indirect nature of the connection in NIMA, embedded autonomous blocks at each core are used for applying the test pattern to a core and comparing the test results with the expected results. The autonomous test data communication, application, and comparison in NIMA facilitate automation of the entire test cycle, resulting in lower test cost.

This chapter also presented an implementation of NIMA on a simple SoC to validate its underlying concept. For each layer of NIMA's architecture, it provided the detailed design parameters used in the implementation. Finally, experimental results for a simple SoC and ITC'02 Benchmark Circuits were presented. It was observed that, in general, switches in NIMA require an area equal to about 1250 2-input NAND gates, and that, per switch, the Application Layer in NIMA adds an area equivalent to about 600-800 2-input NAND gates. In addition, the results predict that test time in NIMA increases by less than
However, as the control mechanism is integrated in NIMA's packets, NIMA eliminates the need for control lines and, hence, requires fewer test pins when compared to current test architectures. Hence, assuming the same number of test pins for NIMA and the current test architectures, the test time in NIMA will be lower than the test time in other test architectures. In addition, multi-site testing is ideal in NIMA when DAST is used in the application layer. Hence, the test time is significantly reduced.

Chapter 5

Conclusions

5.1 Summary

Testing Very Large Scale Integration (VLSI) circuits can be an expensive and difficult process. Design for testability (DFT) techniques that facilitate testing of VLSI chips and reduce test cost are essential. With the paradigm shift towards core-based system design, new challenges arise. These new challenges call for new DFT methodologies. The common practice of relying solely on external automatic test equipment (ATE) resources is insufficient for the billion-transistor core-based systems. Moreover, typical logic Built-in Self-Test (BIST) techniques use pseudo-random patterns. However, one major challenge with using pseudo-random patterns is the potentially lower fault coverage, compared to that resulting from deterministic patterns obtained via automatic test pattern generation (ATPG). In addition, BIST test time is generally two to three times longer than that of ATPG-based testing. This longer test time is the result of the use of pseudo-random patterns and the compaction of the test responses in BIST. Additionally, compacting the test responses and using a signature analyzer at the output of the circuit under test (CUT) result in other drawbacks of BIST techniques, such as loss of diagnostic information, aliasing, and "zero-ing out".
Therefore, embedded testers in the form of deterministic infrastructure intellectual property (I-IP) that take over some functionality of external ATEs are increasingly deemed essential. These I-IPs facilitate post-fabrication testing, but do not add to the functionality of the chip. In other words, embedded deterministic I-IP is an enabling technology that facilitates ATPG-grade, in-field, and at-speed testing and, hence, improves the quality of test.

To leverage the benefits of embedded testing without the potential shortcomings of Built-in Self-Test (BIST) techniques, this thesis described a novel embedded test architecture with test resources that requires the communication of ATPG test data through on-chip global interconnects. This novel technique is referred to as Dedicated Autonomous Scan-based Testing (DAST). DAST is an improvement over BIST in two major ways. Firstly, it improves test time and test coverage with minimal test data, as it relies on ATPG test data and avoids compacting the test results. Secondly, DAST results in no loss of diagnostic information.

In addition, scan test sequence generation follows a simple protocol. A scan test sequence generation procedure was shown to consist of two distinct parts: one of test data delivery and the other of test data control and observation. The test data control and observation, i.e., the generation of the specific scan vector sequencing, tends to use much of the ATE resources. Therefore, DAST reduces test cost because it partitions the test resources and ports expensive test data control and observation responses from the ATE onto the test chip. The implementation of DAST requires introducing a level of hierarchy between the ATPG and the embedded core under test. In DAST, a simple compilation of the raw test data is performed off-chip to append a few op-codes to the test stimuli and expected outputs.
The data and associated op-codes are then transferred onto the chip and interpreted by an Embedded Autonomous Sequencer (EAS) and an Embedded Autonomous Results Analyzer (EARA) local to each core.

This thesis validated the DAST concept and, through physical implementation, showed that with an area overhead equivalent to about 600-800 2-input NAND gates, DAST enables embedded deterministic test of SoC chips. It was shown that, per chip, the test time in DAST is marginally longer than the test time for comparable ATE-based testing. However, DAST enables an ideal multi-site testing strategy. In this ideal multi-site testing strategy, theoretically, an unlimited number of chips can be tested concurrently on a single tester. This is feasible, as test responses in DAST are sent alongside the test stimuli, enabling these data to be sent to similar chips concurrently. Therefore, effectively, based on the total number of similar chips tested, test time is reduced significantly using the DAST methodology when compared to typical ATE-based testing.

In addition, this thesis categorised commonly-used test architectures into three groups based on the connection scheme between the chip pins and the core terminals. It was observed that, in general, the current common test practices assume the existence of a physical link that enables a single tester to directly control a core's DFT elements. While this assumption typically results in simple test protocols, a number of problems with such a direct-access template were identified. These problems include: reduced flexibility; the need for close proximity between the CUT and the tester in order to maintain timing requirements; impracticality or high cost of in-field or remote-access testing; and the impracticality of using multiple testers to test an SoC. These problems are in addition to the reduced modularity in the generic test architecture.
However, in order to address many new challenges, SoC test architectures require a more structured, systematic, modular, and hierarchical approach than the traditional DFT. Furthermore, the International Technology Roadmap predicts that the system complexity in chips using 65 nm technology and beyond necessitates a focus on communication rather than computation. Based on this prediction, this thesis proposes that future chips include a switching fabric coupled with network-oriented protocols for connecting cores together. This thesis also proposes that new methodologies and test architectures, which allow reuse of this switching fabric for test purposes, will likely be needed to lower the test cost.

The issues outlined above motivated the development of the concept of Network-oriented, Indirect and Modular Architecture (NIMA) for testing core-based SoCs. In this scheme, different testers can connect to a common switching fabric and send test data to the core(s) under test. The indirect methodology of NIMA decouples the core, the test access mechanism (TAM), and the tester. That is, NIMA handles test-data processing and communication separately. In doing so, NIMA alleviates the previously outlined problems associated with tightly coupled test architectures. Furthermore, the de-coupling of test-data processing and communication facilitates partitioning of external test resources into new embedded blocks and external low-cost and low-functionality components. This partitioning lowers tester cost and enhances the test productivity through the introduction of a hierarchical, modular, and flexible test architecture.

NIMA is developed such that test stimuli and expected results for cores are first compiled into new formats and then encapsulated into packets. These packets are subsequently augmented with control and address bits such that they are autonomously transmitted to their destination through a switching fabric.
Owing to the indirect nature of the connection, embedded autonomous blocks at each core are used for applying the test pattern to a core and comparing the test results with the expected results.

In this thesis, the NIMA test architecture is considered as a special communication network consisting of hardware and software that allow the transportation of the test stimuli and expected results from multiple sources to multiple cores. It is predicted that regarding NIMA as a special communication network helps in using the accumulated knowledge in different communication networks. As one example, in this thesis, a 3-layered model consisting of Physical Layer, Network Layer, and Application Layer for NIMA was developed.

This thesis presented an implementation of NIMA on a simple core-based design to validate its underlying concept. For each layer of NIMA's architecture, the detailed design parameters used in the implementation were presented. Finally, experimental results for a simple design and ITC'02 Benchmark Circuits were provided. It was observed that, in general, switches in NIMA require an area equal to about 1250 2-input NAND gates, and that the Application Layer in NIMA adds an area equivalent to about 600-800 2-input NAND gates. In addition, the results predict that test time in NIMA increases by a maximum of about 10% when compared to a lower bound test time in conventional test architectures. This increase in test time was predicted under the assumption that comparable conventional test architectures do not need or use any control TAM lines.

In summary, NIMA provides the basis for embedded, cost-effective, scalable, modular, and flexible test design and programming with small area overhead. NIMA, by integrating the control mechanism in the packets, eliminates the need for control lines and, hence, requires fewer test pins when compared to current test architectures.
Therefore, assuming the same number of test pins for NIMA and the current test architectures, the test time in NIMA will be lower than the test time in other test architectures. Furthermore, NIMA facilitates the remote test of an SoC by single or multiple testers. NIMA also enables the transmission of the test data to an SoC deployed in the field when it is desired to test and monitor a chip in its target system. Finally, and equally important, NIMA contributes towards the development of new test architectures that benefit from the reuse of a network-on-chip (NoC) interconnect template.

5.2 Future Work

The concepts of DAST and NIMA for testing core-based SoC designs were validated in this thesis. The implementations of these methodologies were given for a serial connection to off-chip resources. An interesting direction for future research would be the implementation and characterization of DAST and NIMA with multi-bit communication channels. Of particular interest in this case is the possibility and effect of a dynamically changing channel width to reduce idle time and, hence, lower test time.

Future work should also investigate other switching techniques in the Network Layer of NIMA. Investigation needs to focus on switches that can be used in an NoC environment where cores connect to each other through these switches. Further research on improving the Network Layer design and implementation can investigate redundancy in routing and out-of-order packet delivery. Out-of-order delivery can be handled in two ways in the Application Layer. In the first option, the packets will be put back into their original order. This option could result in a potentially large buffering mechanism that may not be acceptable. In the second option, the packets are used as they are received out-of-order.
Investigations need to determine the impact of this latter scheme on the fault coverage and operation of the embedded blocks.

NIMA was developed with the assumption of a single frequency for the cores and the test architecture. Investigation is required to develop and characterize a more realistic case of having several clock-domains within the cores and throughout the chip. In addition, scheduling techniques that can handle a multiple clock-domain regime are also needed. Moreover, NIMA presents the opportunity to be easily integrated into a globally asynchronous and locally synchronous (GALS) methodology with significant benefits. Hence, more research work is required to develop NIMA to work in a chip based on the GALS concept.

Finally, investigation into drastically different techniques, such as wireless connections, in the Physical Layer and their impact on the overall performance of SoC testing constitutes other envisaged future work.

Bibliography

[1] The International Technology Roadmap for Semiconductors, 2003 Edition, http://public.itrs.net.
[2] M.L. Bushnell and V.D. Agrawal, "Essentials of Electronic Testing, for Digital, Memory & Mixed-signal VLSI Circuits", Kluwer Academic Publishers, 2000, ISBN 0-7923-799-1-8.
[3] M. Abramovici, M.A. Breuer, and A.D. Friedman, "Digital Systems Testing and Testable Design", Computer Science Press, 1990.
[4] The International Technology Roadmap for Semiconductors, 2001 Edition, http://public.itrc.itrs.net.
[5] P.G. and A. Greiner, "A Generic Architecture for On-Chip Packet-Switched Interconnections", Proc. Design, Automation and Test in Europe, 2000, pp. 250-256.
[6] W.J. Dally and B. Towles, "Route Packets, Not Wires: On-Chip Interconnection Networks", Proc. Design Automation Conference, 2001, pp. 684-689.
[7] L. Benini and G. De Micheli, "Networks on Chips: A New SoC Paradigm", IEEE Computers, January 2002, pp. 70-78.
[8] P.P. Pande, C. Grecu, A. Ivanov, and R. Saleh, "Design of a Switch for Network on Chip Applications", Proc. IEEE International Symposium on Circuits and Systems, 2003, pp. 217-220.
[9] E. Cota et al., "The Impact of NoC Reuse on the Testing of Core-Based Systems", Proc. IEEE VLSI Test Symposium, 2003, pp. 128-133.
[10] E. Cota, L. Carro, F. Wagner, and M. Lubaszewski, "Power-aware NoC Reuse on the Testing of Core-based Systems", Proc. IEEE International Test Conference, 2003, pp. 612-621.
[11] E.J. Marinissen, Y. Zorian, R. Kapur, T. Taylor, and L. Whetsel, "Towards a Standard for Embedded Core Test: An Example", Proc. IEEE International Test Conference, 1999, pp. 616-627.
[12] M. Bernbaum and H. Sachs, "How VSIA Answers the SoC Dilemma", IEEE Computers, June 1999, pp. 42-49.
[13] E.J. Marinissen and Y. Zorian, "Challenges in Testing Core-Based System ICs", IEEE Communication Magazine, June 1999, pp. 104-109.
[14] D. Chin, "Executing System on a Chip: Requirements for a Successful SoC Implementation", IEEE Electron Devices Meeting, 1998, IEDM98-3 - 8.
[15] The International Technology Roadmap for Semiconductors, 1999 Edition, http://public.itrs.net.
[16] A. Benso et al., "HD2BIST: Architectural Framework for BIST Scheduling, Data Patterns delivering & Diagnosis in SoCs", Proc. IEEE International Test Conference, 2000, pp. 892-901.
[17] D. Gajski et al., "Essential Issues for IP Reuse", Proc. IEEE ASP-DAC, 2000, pp. 37-42.
[18] Y. Zorian, "Test Requirements for Embedded Core-Based Systems and IEEE P1500", Proc. IEEE International Test Conference, 1997, pp. 191-199.
[19] R.K. Gupta and Y. Zorian, "Introducing Core-Based System Design", IEEE Design & Test of Computers, December 1997, 14(4), pp. 15-25.
[20] W. Ke and K. Truong, "Design With Testability for a Platform-Based SoC Design Methodology", Proc. IEEE AP-ASIC, 1999, pp. 307-310.
[21] Y. Zorian, E.J. Marinissen, and S.
Dey, "Testing Embedded-Core Based System Chips", Proc. IEEE International Test Conference, 1998, pp. 130-143.
[22] E.J. Marinissen et al., "A Structured & Scalable Mechanism for Test Access to Embedded Reusable Cores", Proc. IEEE International Test Conference, 1998, pp. 284-293.
[23] Y. Zorian, "Testing the Monster Chip", IEEE Spectrum, July 1999, pp. 22-24.
[24] Y. Zorian, "System-Chip Test Strategies", Proc. Design Automation Conference, 1998, pp. 752-757.
[25] R. Chandramouli and S. Pateras, "Testing Systems on a Chip", IEEE Spectrum, Nov. 1996, pp. 42-47.
[26] L. Whetsel, "Core Test Connectivity Communication & Control", Proc. IEEE International Test Conference, 1998, pp. 303-312.
[27] R. Kapur, R. Chandramouli, and T.W. Williams, "Strategies for Low-Cost Test", IEEE Design & Test of Computers, November-December 2001, pp. 47-54.
[28] J. Bedsole, R. Raina, Al Crouch, and M.S. Abadir, "Very Low Cost Testers: Opportunities and Challenges", IEEE Design & Test of Computers, September-October 2001, pp. 60-69.
[29] J. Rajski, "DFT for High-Quality Low Cost Manufacturing Test", IEEE Asian Test Symposium, 2001, pp. 3-8.
[30] D & T Roundtable, "Test Resource Partitioning", IEEE Design & Test of Computers, July-September 2000, pp. 126-132.
[31] S. Pateras, "Achieving At-Speed Structural Test", IEEE Design & Test of Computers, September-October 2003, pp. 26-33.
[32] Y. Zorian, "Embedded Memory Test & Repair: Infrastructure IP for SOC Yield", Proc. IEEE International Test Conference, 2002, pp. 340-349.
[33] P. Varma and S. Bhatia, "A Structured Test Re-Use Methodology for Core-Based System Chips", Proc. IEEE International Test Conference, 1998, pp. 294-302.
[34] P. Harrod, "Testing Reusable IP- A Case Study", Proc. IEEE International Test Conference, 1999, pp. 493-498.
[35] E.H. Volkerink, A. Khoche, J. Rivoir, and K.D.
Hilliges, "Modern Test Techniques: Tradeoffs, Synergies, and Scalable Benefits", Journal of Electronic Testing: Theory and Applications, vol. 19, 2003, pp. 125-135.
[36] A. Chandra and K. Chakrabarty, "Test Resource Partitioning for SOCs", IEEE Design & Test of Computers, September-October 2001, pp. 80-91.
[37] K. Arabi, "Logic BIST and Scan Test Techniques for Multiple Identical Blocks", Proc. IEEE VLSI Test Symposium, 2002, pp. 60-65.
[38] D. Das and N.A. Touba, "Reducing Test Data Volume Using External/LBIST Hybrid Test Patterns", Proc. IEEE International Test Conference, 2000, pp. 115-122.
[39] V. Iyengar, K. Chakrabarty, and B.T. Murray, "Deterministic Built-in Pattern Generation for Sequential Circuits", Journal of Electronic Testing: Theory and Application, vol. 15, August-October 1999, pp. 97-115.
[40] A. Chandra and K. Chakrabarty, "System-on-a-Chip Test-Data Compression and Decompression Architectures Based on Golomb Codes", IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 20, No. 3, March 2001, pp. 355-368.
[41] H.G. Liang, S. Hellebrand, and H.J. Wunderlich, "Two-Dimensional Test Data Compression for Scan-Based Deterministic BIST", Proc. IEEE International Test Conference, 2001, pp. 894-902.
[42] I. Bayraktaroglu and A. Orailoglu, "Test Volume and Application Time Reduction", Proc. Design Automation Conference, 2001, pp. 151-155.
[43] P.T. Gonciari, B.M. Al-Hashimi, and N. Nicolici, "Improving Compression Ratio, Area Overhead, and Test Application Time for System-on-a-Chip Test Data Compression/Decompression", Proc. Design, Automation and Test in Europe, 2002, pp. 604-611.
[44] J. Rajski, M. Kassab, N. Mukherjee, and N. Tamarapalli, "Embedded Deterministic Test for Low-Cost Manufacturing", IEEE Design & Test of Computers, September-October 2003, pp. 58-66.
[45] J. Rajski et al., "Embedded Deterministic Test for Low Cost Manufacturing Test", Proc.
IEEE International Test Conference, 2002, pp. 301-310.
[46] T. Hiraide et al., "BIST-Aided Scan Test - A New Method for Test Cost Reduction", Proc. IEEE VLSI Test Symposium, 2003, pp. 359-364.
[47] B. Nadeau-Dostie, "Design For At-speed Test, Diagnosis, and Measurement", Kluwer Academic Publishers, 2000.
[48] Y. Zorian, "What is Infrastructure IP?", IEEE Design & Test of Computers, May-June 2002, pp. 5-7.
[49] Y. Zorian, "Embedded Infrastructure IP for SOC Yield Improvement", Proc. Design Automation Conference, 2002, pp. 709-712.
[50] Y. Zorian, "Guest Editor's Introduction: What Is Infrastructure IP?", IEEE Design & Test of Computers, Infrastructure IP Special Issue, vol. 19, Issue 3, May-June 2002, pp. 3-5.
[51] IEEE P1500 Standard for Embedded Core Test (SECT), http://grouper.ieee.org/groups/1500.
[52] E.J. McCluskey, "Built-in Self-Test Techniques", IEEE Design & Test of Computers, vol. 2, No. 2, Apr. 1985, pp. 21-28.
[53] P.H. Bardell, W.H. McAnney, and J. Savir, "Built-in Test for VLSI", John Wiley & Sons Inc., 1987, ISBN: 0-471-62463-2.
[54] V.D. Agrawal, C.R. Kime, and K.K. Saluja, "A Tutorial on Built-in Self-Test, Part 1: Principles", IEEE Design & Test of Computers, March 1993, pp. 73-82.
[55] V.D. Agrawal, C.R. Kime, and K.K. Saluja, "A Tutorial on Built-in Self-Test, Part 2: Applications", IEEE Design & Test of Computers, June 1993, pp. 69-77.
[56] J. Rajski, N. Tamarapalli, and J. Tyszer, "Automated Synthesis of Phase Shifters for Built-in-Self-Test Application", IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 19, No. 10, October 2000, pp. 1175-1188.
[57] N.A. Touba and E.J. McCluskey, "Test Point Insertion based on Path Tracing", Proc. IEEE VLSI Test Symposium, 1996, pp. 2-8.
[58] N. Tamarapalli and J. Rajski, "Constructive Multi-Phase Test Point Insertion for Scan-Based BIST", Proc. IEEE International Test Conference, 1996, pp. 649-658.
[59] F.
Muradali, V.K. Agrawal, and B. Nadeau-Dostie, "A New Procedure for Weighted Random Built-In-Self-Test", Proc. IEEE International Test Conference, 1990, pp. 660-668.
[60] S. Venkataraman, J. Rajski, S. Hellebrand, and S. Tarnick, "An Efficient BIST Scheme Based on Reseeding of Multiple Polynomial Linear Feedback Shift Registers", Proc. International Conference on Computer-Aided Design, 1993, pp. 572-577.
[61] K. Chakrabarty, B.T. Murray, and V. Iyengar, "Deterministic Built-in Test Pattern Generation for High-Performance Circuits Using Twisted-Ring Counters", IEEE Trans. on VLSI Systems, vol. 8, No. 5, October 2000, pp. 633-636.
[62] G. Hetherington et al., "Logic BIST for Large Industrial Designs: Real Issues and Case Studies", Proc. IEEE International Test Conference, 1999, pp. 358-367.
[63] R.W. Bassett et al., "Low-Cost Testing of High-Density Logic Components", IEEE Design & Test of Computers, 1990, pp. 15-28.
[64] H.J. Wunderlich and G. Kiefer, "Bit-Flipping BIST", Proc. International Conference on Computer-Aided Design, 1996, pp. 337-343.
[65] N.A. Touba and E.J. McCluskey, "Altering a Pseudo-Random Bit Sequence for Scan Based BIST", Proc. IEEE International Test Conference, 1996, pp. 167-175.
[66] A. Jas, C.V. Krishna, and N.A. Touba, "Hybrid BIST Based on Weighted Pseudo-Random Testing: A New Test Resource Partitioning Scheme", Proc. IEEE VLSI Test Symposium, 2001, pp. 2-8.
[67] G. Kiefer, H. Vranken, E.J. Marinissen, and H.J. Wunderlich, "Application of Deterministic Logic BIST on Industrial Circuits", Proc. IEEE International Test Conference, 2000, pp. 105-114.
[68] Y. Wu and A. Ivanov, "Single-Reference Multiple Intermediate Signature (SREMIS) Analysis for BIST", IEEE Trans. on Computers, vol. 44, Issue 6, June 1995, pp. 817-825.
[69] Y. Wu and A. Ivanov, "Minimal Hardware Multiple Signature Analysis for BIST", Proc. IEEE VLSI Test Symposium, 1993, pp. 17-20.
[70] S. Wang and S.K.
Gupta, "DS-LFSR: a BIST TPG for Low Switching Activity", Proc. IEEE Trans., vol. 21, Issue 7, July 2002, pp. 842-851.
[71] S. Gerstendorfer and H.J. Wunderlich, "Minimized Power Consumption for Scan-Based BIST", Proc. IEEE International Test Conference, 1999, pp. 77-84.
[72] I. Ghosh, S. Dey, and N.K. Jha, "A Fast & Low Cost Testing Technique for Core-Based System-on-Chip", Proc. Design Automation Conference, 1998, pp. 542-547.
[73] I. Ghosh, N.K. Jha, and S. Dey, "A Low Overhead Design for Testability & Test Generation Technique for Core-Based Systems", Proc. IEEE International Test Conference, 1999, pp. 50-59.
[74] M. Nourani and C. Papachristou, "An ILP Formulation to Optimize Test Access Mechanism in System-on-Chip Testing", Proc. IEEE International Test Conference, 2000, pp. 902-1000.
[75] B. Mathewson, "Core Provider's Test Experience", Presentation at IEEE P1500 Working Group Meeting, June 1998, CA, U.S.A., http://grouper.ieee.org/groups/1500/pastmeetings.html#dac98.
[76] V. Immaneni and S. Raman, "Direct Access Test Scheme-Design of Block and Core Cells for Embedded ASICS", Proc. IEEE International Test Conference, 1990, pp. 488-492.
[77] N.A. Touba and B. Pouya, "Testing Embedded Cores using Partial Isolation Rings", Proc. IEEE VLSI Test Symposium, 1997, pp. 10-16.
[78] B. Pouya and N.A. Touba, "Modifying User-Defined Logic for Test Access to Embedded Cores", Proc. IEEE International Test Conference, 1997, pp. 60-68.
[79] L. Whetsel, "An IEEE 1149.1 Based Test Access Architecture for ICs with Embedded Cores", Proc. IEEE International Test Conference, 1997, pp. 69-78.
[80] D. Bhattacharya, "Hierarchical Test Access Architecture for Embedded Cores in an Integrated Circuit", Proc. IEEE VLSI Test Symposium, 1998, pp. 8-14.
[81] A. Benso et al., "HD2BIST: Architectural Framework for BIST Scheduling, Data Patterns delivering & Diagnosis in SoCs", Proc. IEEE International Test Conference, 2000, pp. 892-901.
[82] M. Benabdenbi and W. Maroufi, "CAS-Bus: A Scalable and Reconfigurable Test Access Mechanism for Systems on a Chip", Proc. Design, Automation and Test in Europe, 2000, pp. 141-145.
[83] L. Whetsel, "Addressable Test Ports, an Approach to Testing Embedded Cores", Proc. IEEE International Test Conference, 1999, pp. 1055-1064.
[84] Z.S. Ebadi and A. Ivanov, "Time Domain Multiplexed TAM: Implementation and Comparison", Proc. Design, Automation and Test in Europe, 2003, pp. 732-737.
[85] V. Iyengar and K. Chakrabarty, "System-on-a-Chip Test Scheduling With Precedence Relationships, Pre-emption, and Power Constraints", IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 21, No. 9, Sept. 2002, pp. 1088-1094.
[86] E. Larsson and Z. Peng, "An Integrated System-On-Chip Test Framework", Proc. Design, Automation, and Test in Europe, 2001, pp. 139-144.
[87] Y. Huang et al., "Optimal Core Wrapper Width Selection and SOC Test Scheduling Based on 3-D Bin Packing Algorithm", Proc. IEEE International Test Conference, 2002, pp. 74-82.
[88] S.K. Goel and E.J. Marinissen, "Effective and Efficient Test Architecture Design for SoCs", Proc. IEEE International Test Conference, 2002, pp. 529-538.
[89] V. Iyengar, K. Chakrabarty, and E.J. Marinissen, "Test Wrapper and Test Access Mechanism Co-Optimization for System-on-Chip", Journal of Electronic Testing: Theory and Applications, vol. 18, 2002, pp. 211-230.
[90] W. Zou, S.M. Reddy, I. Pomeranz, and Y. Huang, "SoC Test Scheduling Using Simulated Annealing", Proc. IEEE VLSI Test Symposium, 2003, pp. 325-331.
[91] M. Nahvi, A. Ivanov, and R. Saleh, "Dedicated Autonomous Scan-Based Testing (DAST) for Embedded Cores", Proc. IEEE International Test Conference, 2002, pp. 1176-1183.
[92] M. Nahvi and A. Ivanov, "An Embedded Autonomous Scan-Based Results Analyzer (EARA) for SoC Cores", Proc. IEEE VLSI Test Symposium, 2003, pp. 293-298.
[93] ITC99 Benchmarks: http://www.cerc.utexas.edu/itc99-benchmarks/bench.html.
[94] M. Nahvi and A. Ivanov, "A Packet Switching Communication-Based Test Access Mechanism for System Chips", Proc. IEEE European Test Workshop, 2001, pp. 81-86.
[95] M. Nahvi and A. Ivanov, "Indirect Test Architecture for SoC Testing", IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 23, Issue 7, July 2004, pp. 1128-1142.
[96] W. Stallings, "Data & Computer Communications - Sixth Edition", Prentice-Hall Inc., ISBN 0-13-084370-9, 2000.
[97] ITC'02 Benchmarks: http://www.extra.research.philips.com/itc02socbenchm.
[98] R.M. Chou, K.K. Saluja, and V.D. Agrawal, "Scheduling Tests for VLSI Systems Under Power Constraints", IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 5, No. 2, June 1997, pp. 175-185.

Appendix A: DAST Test Time Models

This appendix derives Equations (3-1), (3-2), and (3-3):

$$T_{M1\_DAST} = (9 + 2PI + SI \times SE) + TP(23 + 2PI + SI + SI \times SE) \quad \text{iff } PO < PI + 3 \tag{A-1}$$

$$T_{M2\_DAST} = (9 + 2PI + SI \times SE) + TP(20 + PI + PO + SI + SI \times SE) \quad \text{iff } PI + SI + 6 > PO > PI + 3 \tag{A-2}$$

$$T_{M3\_DAST} = (9 + 2PI + SI \times SE) + TP(14 + 2PO + SI \times SE) \quad \text{iff } PO > PI + SI + 6 \tag{A-3}$$

where $PI$, $SI$, and $PO$ are the numbers of primary inputs, scan inputs, and primary outputs as defined in Section 3.2. Moreover, $TP$ is the total number of test patterns, $SE$ is the maximum number of scan cells in the scan chain(s), and $T_{M1\_DAST}$, $T_{M2\_DAST}$, and $T_{M3\_DAST}$ are the test time models for DAST, in terms of clock cycles.
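As a rough illustration of how the three DAST models are selected and evaluated, the following Python sketch picks the applicable equation from PO, PI, and SI. The function name `dast_test_time` and the example parameter values are assumptions made for illustration only; they do not come from the thesis. The boundary cases PO = PI+3 and PO = PI+SI+6, which the stated conditions leave open, are assigned here to the lower model; this is harmless because adjacent models evaluate to the same value at those points.

```python
def dast_test_time(pi: int, si: int, po: int, se: int, tp: int) -> int:
    """DAST test time in clock cycles, per Equations (A-1) to (A-3).

    pi, si, po: numbers of primary inputs, scan inputs, primary outputs.
    se: maximum number of scan cells in the scan chain(s).
    tp: total number of test patterns.
    """
    sc = si * se                      # SC = SI x SE
    init = 9 + 2 * pi + sc            # initialization term shared by all models
    if po <= pi + 3:
        per_pattern = 23 + 2 * pi + si + sc       # (A-1)
    elif po <= pi + si + 6:
        per_pattern = 20 + pi + po + si + sc      # (A-2)
    else:
        per_pattern = 14 + 2 * po + sc            # (A-3)
    return init + tp * per_pattern
```

For a hypothetical core with PI = 8, SI = 2, PO = 4, SE = 100, and TP = 50, the first model applies and `dast_test_time(8, 2, 4, 100, 50)` evaluates to 12275 clock cycles.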
Given the algorithms for the EAS and EARA compilers in Figure 3-5 and Figure 3-6, careful investigation of typical test data sets reveals that, for a typical test pattern, EAS blocks receive $D_{EAS\_TP}$ bits of data:

$$D_{EAS\_TP} = 2(D_{PI}) + D_{SI} + 3(D_{Assert\_clk}) + D_{SC} + D_{Sync} \tag{A-4}$$

where $D_{PI}$, $D_{SI}$, and $D_{SC}$ are the intended data bits for the primary inputs, the scan inputs, and the scan chain(s), respectively. Moreover, $D_{Assert\_clk}$ represents the data bits of the Assert-Clk op-code, as used in the EAS compiler of Section 3.4.1. Finally, $D_{Sync}$ denotes the number of bits needed for synchronizing DAST's EAS and EARA blocks.

Given the typical generic scan test waveforms of Figure 3-2, and based on lines 24 and 28 in the algorithm of Figure 3-5 and lines 34, 37, and 49 in the algorithm of Figure 3-6, the number of bits for synchronization, i.e., $D_{Sync}$, can be obtained from:

$$D_{Sync} = ((D_{PO} - (D_{PI} + D_{SI})) + 1) + ((D_{PO} - D_{PI}) + 1) \tag{A-5}$$

where $D_{PO}$ denotes the expected test data bits for the primary outputs. Note that $D_{Sync}$ comprises two terms and that each of these terms is always $\geq 1$, as the number of synchronization bits is always a positive integer.

As explained in Section 3.4.1, three bits are used for the Assert-Clk op-code, and both the EAS and the EARA compilers add three-bit op-codes to the beginning of each section of the test program. Therefore, the blocks of data for PI, SI, and the scan chains each have a three-bit op-code added to them by the EAS compiler. Hence, $D_{Assert\_clk} = 3$, $D_{PI} = PI + 3$, $D_{SI} = SI + 3$, and $D_{SC} = SC + 3$ (note that $SC = SI \times SE$). However, $D_{PO}$ is simply equal to the number of primary outputs, i.e., $PO$, as the synchronization is only needed for the number of primary outputs.
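The per-pattern bit count of Equations (A-4) and (A-5) can be sketched in Python as follows. This is a minimal sketch, not the thesis implementation: the function names are hypothetical, and the lower bound on each synchronization term is read here as a clamp at 1, which is an interpretation of the remark that the number of synchronization bits is always a positive integer.

```python
def sync_bits(pi: int, si: int, po: int) -> int:
    """D_Sync of Equation (A-5), with each term clamped at 1 (assumed reading)."""
    d_pi, d_si, d_po = pi + 3, si + 3, po     # op-code bits added to PI and SI blocks
    first = max(1, (d_po - (d_pi + d_si)) + 1)
    second = max(1, (d_po - d_pi) + 1)
    return first + second


def eas_bits_per_pattern(pi: int, si: int, po: int, se: int) -> int:
    """D_EAS_TP of Equation (A-4): bits received by the EAS block per test pattern."""
    d_assert_clk = 3                          # three-bit Assert-Clk op-code
    d_pi, d_si = pi + 3, si + 3
    d_sc = si * se + 3                        # SC = SI x SE, plus op-code
    return 2 * d_pi + d_si + 3 * d_assert_clk + d_sc + sync_bits(pi, si, po)
```

With the hypothetical values PI = 8, SI = 2, PO = 4, and SE = 100, both synchronization terms take their minimum value, and the count agrees with the per-pattern factor 23 + 2PI + SI + SI×SE of the first DAST model.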
Replacing Equation (A-5) in Equation (A-4), and substituting the known values, yields:

$$D_{EAS\_TP} = 2(PI + 3) + (SI + 3) + 3(3) + (3 + SI \times SE) + ((PO - (PI + SI + 6)) + 1) + ((PO - (PI + 3)) + 1) \tag{A-6}$$

If one clock cycle is needed for each bit in the data blocks, the test time in DAST is equal to the total number of bits received by the EAS block. Given that the EAS block receives initialization data for the primary inputs and the scan chain(s) before receiving the entire test pattern, the DAST test time can be modeled as:

$$2(D_{PI}) + D_{SC} + TP(D_{EAS\_TP}) \tag{A-7}$$

where the first two terms represent the initialization bits and the last term denotes the total test pattern bits.

The last two terms of Equation (A-6) are conditional terms and, as explained for Equation (A-5), are always $\geq 1$. Hence, to simplify Equation (A-6), three conditions can be identified as follows: $PO < PI + 3$; $PI + SI + 6 > PO > PI + 3$; and $PO > PI + SI + 6$. Using these conditions, replacing Equation (A-6) in Equation (A-7), and simplifying yields the DAST test time models:

$$T_{M1\_DAST} = (9 + 2PI + SI \times SE) + TP(23 + 2PI + SI + SI \times SE) \quad \text{iff } PO < PI + 3 \tag{A-8}$$

$$T_{M2\_DAST} = (9 + 2PI + SI \times SE) + TP(20 + PI + PO + SI + SI \times SE) \quad \text{iff } PI + SI + 6 > PO > PI + 3 \tag{A-9}$$

$$T_{M3\_DAST} = (9 + 2PI + SI \times SE) + TP(14 + 2PO + SI \times SE) \quad \text{iff } PO > PI + SI + 6 \tag{A-10}$$

Appendix B: Theoretical Test Time Models for Serial ATE-based Testing

This appendix derives Equations (3-4), (3-5), and (3-6):

$$T_{S1} = (2PI + SI \times SE) + TP(2PI + SI + SI \times SE) \quad \text{iff } PO < PI \tag{B-1}$$

$$T_{S2} = (2PI + SI \times SE) + TP(PI + PO + SI + SI \times SE) \quad \text{iff } PI + SI > PO > PI \tag{B-2}$$

$$T_{S3} = (2PI + SI \times SE) + TP(2PO + SI \times SE) \quad \text{iff } PO > PI + SI \tag{B-3}$$

where $PI$, $SI$, and $PO$ are the numbers of primary inputs, scan inputs, and primary outputs as defined in Section 3.2.
Moreover, $TP$ is the total number of test patterns, $SE$ is the maximum number of scan cells in the scan chain, and $T_{S1}$, $T_{S2}$, and $T_{S3}$ are the test time models of a serial ATE-based testing scheme, in clock cycles.

For lower bound analysis, assume that no time is required to set up the TAM and the wrapper of the CUT, and that one clock cycle is needed for each bit in the test data blocks. Careful investigation of the typical generic scan test waveforms of Figure 3-2 and of the test data sets reveals that, for a typical test pattern in a serial ATE-based testing scheme, the test time, i.e., $T_{TP}$, is:

$$T_{TP} = 2(D_{PI}) + D_{SI} + D_{SC} + (D_{PO} - (D_{PI} + D_{SI})) + (D_{PO} - D_{PI}) \tag{B-4}$$

where $D_{PI}$, $D_{SI}$, $D_{PO}$, and $D_{SC}$ are the intended data bits for the primary inputs, the scan inputs, the primary outputs, and the scan chain(s), respectively. Moreover, the last two terms are never negative: each counts only when its difference is positive and is otherwise zero. These two terms represent the differences between the number of test stimuli bits and the number of test response bits, and hence denote the waiting time of the tester between finishing the application of the test stimuli and receiving all the test response bits.

Given that the CUT receives initialization data for the primary inputs and the scan chain(s) before receiving the entire test pattern, the test time for serial ATE-based testing, i.e., $T_S$, can be modeled as:

$$T_S = 2(D_{PI}) + D_{SC} + TP(T_{TP}) \tag{B-5}$$

where the first two terms denote the initialization bits and the last term represents the total test pattern bits.

The lower bound values of $D_{PI}$, $D_{SI}$, $D_{PO}$, and $D_{SC}$ are simply equal to $PI$, $SI$, $PO$, and $SC = SI \times SE$, respectively.
Replacing these values and Equation (B-4) in Equation (B-5) yields:

$$T_S = (2PI + SI \times SE) + TP(2PI + SI + SI \times SE + (PO - (PI + SI)) + (PO - PI)) \tag{B-6}$$

The last two terms of Equation (B-4), which also appear in Equation (B-6), are conditional. To simplify Equation (B-6), three conditions can be identified as follows: $PO < PI$; $PI + SI > PO > PI$; and $PO > PI + SI$. Using these conditions, and after simplification, Equation (B-6) yields the test time models of a serial ATE-based testing scheme:

$$T_{S1} = (2PI + SI \times SE) + TP(2PI + SI + SI \times SE) \quad \text{iff } PO < PI \tag{B-7}$$

$$T_{S2} = (2PI + SI \times SE) + TP(PI + PO + SI + SI \times SE) \quad \text{iff } PI + SI > PO > PI \tag{B-8}$$

$$T_{S3} = (2PI + SI \times SE) + TP(2PO + SI \times SE) \quad \text{iff } PO > PI + SI \tag{B-9}$$
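The serial ATE-based lower bound can be sketched in Python by treating the two conditional waiting terms as differences clamped at zero, which reproduces the three closed-form models case by case. The function name `serial_ate_test_time` and the example values are hypothetical, and the clamp is an assumed reading of the conditional terms rather than a statement taken from the thesis.

```python
def serial_ate_test_time(pi: int, si: int, po: int, se: int, tp: int) -> int:
    """Lower-bound test time in clock cycles for serial ATE-based testing."""
    sc = si * se                              # SC = SI x SE
    # Tester waiting time: response bits exceeding the stimuli bits,
    # counted only when the difference is positive (assumed clamp at 0).
    wait_after_scan_in = max(0, po - (pi + si))
    wait_after_pi = max(0, po - pi)
    per_pattern = 2 * pi + si + sc + wait_after_scan_in + wait_after_pi
    init = 2 * pi + sc                        # initialization bits
    return init + tp * per_pattern
```

For the hypothetical values PI = 8, SI = 2, SE = 100, and TP = 50, the function gives 11116 cycles for PO = 4 (first model), 11166 for PO = 9 (second model), and 12216 for PO = 20 (third model).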
