UBC Theses and Dissertations

Time-driven test methodologies for embedded SRAMs. Wang, Baosheng, 2005.

TIME-DRIVEN TEST METHODOLOGIES FOR EMBEDDED SRAMS

BAOSHENG WANG
B.A.Sc., Beihang University, China, 1997
M.A.Sc., Beijing Tsinghua University, China, 2000

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
in
THE FACULTY OF GRADUATE STUDIES
ELECTRICAL AND COMPUTER ENGINEERING
THE UNIVERSITY OF BRITISH COLUMBIA
© Baosheng Wang, 2005

Abstract

According to the International Technology Roadmap for Semiconductors 2003 (ITRS'03), by 2013 over 90% of the total System-on-a-Chip (SoC) area will be occupied by memories, e.g., SRAMs. The increasingly dense embedded SRAMs (e-SRAMs) are more prone to manufacturing defects and field reliability problems because they are subject to aggressive design rules. On the one hand, this reduces memory and SoC yield, increasingly making redundancy necessary; on the other hand, it poses significant test challenges, particularly the test time required to achieve acceptable test quality. This thesis focuses on reducing the test time of e-SRAMs, both for a single memory and for multiple memories.

In practice, the test time for a single e-SRAM consists of the time for testing Data Retention Faults (DRFs) and the time for testing all other faults, referred to here as non-DRFs. By tightly coupling the test for coupling faults (one class of non-DRFs) with hard repair, the non-DRF test time is reduced by up to a factor of two compared with performing the test and repair activities independently, and this reduction is achieved without negatively impacting defect coverage. Depending on the memory size, the DRF test time is reduced either by a Design-for-Test (DFT) technique referred to as the Pre-Discharge Write Test Mode (PDWTM) or by reusing the read and write operation time already spent accessing other cells.
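The idea of reusing access time for DRF tests can be made concrete with a rough calculation. The sketch below is a simplification under stated assumptions, not the thesis's method: it assumes one operation per test clock, that two consecutive March elements sweep the addresses in the same order (so each word is revisited roughly one full sweep later), a 100 MHz test clock, and a required retention delay of 100 ms, the order of magnitude cited for Pause Tests in Sec. 1.2. The function names are hypothetical.

```python
def revisit_interval_s(n_words: int, ops_per_word: int, t_clk_s: float) -> float:
    """Approximate time before a March sweep returns to the same word.

    Assumes one operation per clock cycle and that the previous and next
    March elements sweep the address space in the same order.
    """
    return n_words * ops_per_word * t_clk_s


def residual_pause_s(n_words: int, ops_per_word: int, t_clk_s: float,
                     t_required_s: float) -> float:
    """Explicit delay still needed after reusing the natural revisit time."""
    return max(0.0, t_required_s - revisit_interval_s(n_words, ops_per_word, t_clk_s))


if __name__ == "__main__":
    T_CLK = 10e-9   # assumed 100 MHz test clock
    T_REQ = 100e-3  # assumed 100 ms required retention delay
    for n in (2**10, 2**20, 2**23):
        print(n, residual_pause_s(n, 2, T_CLK, T_REQ))
```

Under these assumptions a small memory still needs almost the whole delay, which is where a DFT approach such as PDWTM is attractive, while a large enough memory hides the delay entirely behind ordinary accesses.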
Furthermore, by considering two deciding factors, (i) the trade-off between yield gain and redundancy area overhead and (ii) the delay time required for DRF tests, any e-SRAM can be assigned to one of four groups. Named after their repair features and DRF test methods, the groups are NSRDF e-SRAMs (No or Soft Repair, with DFT techniques for retention faults), SRDE (Soft Repair, with reduced DElay time for retention faults), HRDE (Hard Repair, with reduced DElay time for retention faults) and HRDF (Hard Repair, with DFT techniques for retention faults). Accordingly, four customized March test algorithms, generated from a comparison algorithm, are selected, one for each group. With the proposed customized algorithms, the test time of any single e-SRAM can be reduced by up to a factor of two compared with applying a universal March algorithm.

Moreover, the thesis also targets the test time of multiple distributed small e-SRAMs. To the best of our knowledge, the widely used solutions for testing and diagnosing such memories apply serial memory interfacing techniques, i.e., unidirectional and bidirectional serial interfaces. The approach in this thesis speeds up these solutions by replacing the global serial response analyzers with parallel local response analyzers. The serial fault masking and defect-rate-dependent diagnosis inherent in unidirectional/bidirectional serial interfaces are overcome by designing a pair of serial-to-parallel and parallel-to-serial converters. When more e-SRAMs are tested in parallel, the accumulated test power may exceed the power limit, so power-constrained test scheduling is required.
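A toy model can show why a power limit stretches the total test time. The sketch below is a generic greedy list scheduler written only for illustration; it is not the scheduling algorithm used in the thesis, and the slot-based time model, power values and class names are assumptions. Each memory test is described as a list of (duration, power) phases: a single phase at peak power behaves like a rectangle, while describing the pause of a DRF test as a zero-power phase lets another test overlap it.

```python
from dataclasses import dataclass


@dataclass
class MemoryTest:
    name: str
    phases: list  # [(duration_in_slots, power), ...] in execution order (assumed model)

    @property
    def length(self) -> int:
        return sum(duration for duration, _ in self.phases)

    def power_at(self, k: int) -> float:
        """Power drawn k slots after this test starts."""
        for duration, power in self.phases:
            if k < duration:
                return power
            k -= duration
        return 0.0


def greedy_schedule(tests, p_max):
    """Start each test (longest first) at the earliest slot where the summed
    power never exceeds p_max; return (start slots, total test time in slots)."""
    usage, starts = {}, {}
    for t in sorted(tests, key=lambda t: -t.length):
        if max(p for _, p in t.phases) > p_max:
            raise ValueError(f"{t.name} alone exceeds the power budget")
        s = 0
        while any(usage.get(s + k, 0.0) + t.power_at(k) > p_max for k in range(t.length)):
            s += 1
        for k in range(t.length):
            usage[s + k] = usage.get(s + k, 0.0) + t.power_at(k)
        starts[t.name] = s
    return starts, max(starts[t.name] + t.length for t in tests)


if __name__ == "__main__":
    rect = [MemoryTest("M1", [(10, 1.0)]), MemoryTest("M2", [(10, 1.0)])]
    pause = [(4, 1.0), (4, 0.0), (2, 1.0)]  # active, pause (~0 power), active
    aware = [MemoryTest("M1", pause), MemoryTest("M2", pause)]
    print(greedy_schedule(rect, 1.0))   # ({'M1': 0, 'M2': 10}, 20)
    print(greedy_schedule(aware, 1.0))  # ({'M1': 0, 'M2': 4}, 14)
```

With a budget that admits only one active test at a time, the rectangle description serializes the two tests, whereas the phased description lets the second test run during the first one's pause.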
To further reduce the test time during such scheduling, this thesis proposes a "retention-aware" test power model to replace the "single-rectangle" model typically used for SoC cores.

Table of Contents
Abstract
Table of Contents
List of Tables
List of Figures
Acronyms
Acknowledgements
Co-Authorship Statements
Chapter 1 Introduction
1.1 Motivation
1.2 Related Work
1.3 Thesis Organization
1.4 Contributions
1.5 References
Chapter 2 Simplifying Coupling Faults Test and Pause Test for DRFs for an e-SRAM
2.1 Introduction
2.2 Defect Analysis of IFA Functional Fault Models
2.2.1 Background
2.2.2 The Simplified FFM2 Fault Class
2.2.3 Data Retention Fault (DRF)
2.3 March 5N Test Algorithm
2.3.1 Related Work
2.3.2 The Proposed March 5N Test Algorithm
2.4 Validation of the Proposed Test Algorithm
2.4.1 Fault Injections
2.4.2 Validation Results
2.4.3 Comparison with March 9N with Pause Test
2.5 Summary
2.6 References
Chapter 3 Fast Detection of Data Retention Faults and Other SRAM Cell Open Defects
3.1 Introduction
3.2 Open Defects in an SRAM Cell
3.2.1 Background
3.2.2 Faulty Access NMOS (OC8, OC9, OC10)
3.2.3 Faulty Pull-Down NMOS (OC3, OC4, OC7)
3.2.4 Faulty PMOS (OC1, OC2, OC5)
3.2.5 Power Node Opens (OC11, OC12)
3.3 Pre-Discharge Write Test Mode (PDWTM)
3.3.1 The Concept
3.3.2 Detection of PMOS Open Defects (PODs)
3.3.3 At-Speed Test Capability
3.4 Implementations
3.4.1 A PDWMarch 9N Algorithm
3.4.2 An NWRMarch 11N Algorithm with No Write Recovery
3.5 Experimental Results
3.5.1 SRAM Models
3.5.2 Input Patterns and Validation Results
3.6 Discussions
3.6.1 Test Time, Defect Coverage and Detection Capability
3.6.2 Design Efforts and Area/Performance Penalties
3.6.3 Separate DRF Test Using the PDWTM
3.6.4 Selections between PDWMarch and NWRMarch
3.6.5 Limitations and Future Work
3.7 Summary
3.8 References
Chapter 4 A Time-efficient Customizable March Test Algorithm for an e-SRAM
4.1 Introduction
4.2 Test Concepts
4.2.1 Background
4.2.2 Classifications of e-SRAMs for Testing
4.2.3 e-SRAM Non-DRF Tests Using Redundancy Features
4.2.4 e-SRAM Tests to Detect DRFs
4.3 Test Time Evaluations
4.3.1 A Case Study
4.3.2 IO Number Effects
4.3.3 Technology Trends
4.3.4 March Algorithm Complexity Impacts
4.4 Summary
4.5 References
Chapter 5 A Time-efficient Test Architecture for Multiple Distributed Small e-SRAMs
5.1 Introduction
5.2 Review of Previous Work
5.3 Design-for-Test (DFT) Techniques
5.3.1 Architectural Overview
5.3.2 Designs for Serial to Parallel Converter (SPC)
5.3.3 Designs for Local Response Analyzers (LRAs)
5.3.4 Designs for Testing Data Retention Faults
5.4 Evaluations
5.4.1 Defect Coverage Analysis
5.4.2 Test Time Comparisons
5.4.3 Area Overhead Estimations
5.4.4 A Case Study
5.5 Summary and Future Work
5.6 References
Chapter 6 A Fast Diagnosis Scheme for Multiple Distributed Small e-SRAMs
6.1 Introduction
6.2 Review of Previous Work
6.3 The Fast Diagnosis Scheme
6.3.1 Architectural Overview
6.3.2 Serial to Parallel Converter (SPC)
6.3.3 Parallel to Serial Converter (PSC)
6.3.4 Diagnosis of Data Retention Faults
6.4 Evaluations
6.4.1 Diagnosis Coverage Analysis
6.4.2 Diagnosis Time Comparison
6.4.3 Area Overhead Estimations
6.5 Summary
6.6 References
Chapter 7 A Retention-aware Test Power Model for an e-SRAM
7.1 Introduction
7.2 A Retention-Aware Test Power Model
7.3 Test Time Evaluations for BISTed e-SRAMs
7.3.1 A Case Study
7.3.2 Impacts of Non-delay Cycle Divisions
7.3.3 Memory Test Algorithm Complexity Impacts on Test Time Reduction
7.3.4 Memory Capacity Impact on Test Time Reduction
7.4 Quantification of Test Time Reduction in an SoC
7.5 Summary
7.6 References
Chapter 8 Conclusions
8.1 Conclusions
8.2 Future Work
8.2.1 Short-term Goals
8.2.2 Long-term Goals
8.3 References

List of Tables
Table 2-1 Faults injected into the 64K x 32-bit SRAM under test
Table 2-2 Detected fault cells
Table 2-3 Comparison results
Table 3-1 Voltage levels of bit lines and memory cell storage nodes in a fault-free cell
Table 3-2 Voltage levels of bit lines and memory cell storage nodes in a cell with a faulty PMOS
Table 3-3 Voltage levels of bit lines and memory cell storage nodes in a cell with an open defect in Vcc
Table 3-4 Transistor sizes of memory cells
Table 3-5 Open defect detection capabilities of March 9N, Pause Test and PDWTM
Table 3-6 DRF test time comparisons
Table 4-1 Classifying e-SRAMs with 8-bit IO
Table 5-1 Comparisons of test time reduction and additional area overhead relative to [4-5]
Table 7-1 Example: e-SRAMs under test
Table 7-2 Test time reduction for non-delay cycle divisions
Table 7-3 Test time reduction for different test complexities (time)
Table 7-4 Test time reduction for different memory capacities
Table 7-5 Test time reduction factor evaluation within an SoC

List of Figures
Figure 2-1 Four-cell memory configuration
Figure 2-2 One DRF model with a resistive open in the GND/pull-down transistor
Figure 2-3 The compact March 9N with Pause Test
Figure 2-4 The proposed March 5N test algorithm
Figure 2-5 Coupling fault injections
Figure 3-1 Opens within a 6T SRAM cell
Figure 3-2 Static Noise Margin for an SRAM cell
Figure 3-3 The expanded March 9N test with Pause Test
Figure 3-4 The PDWMarch 9N
Figure 3-5 Memory control circuit modification for PDWMarch 9N
Figure 3-6 The NWRMarch 11N
Figure 3-7 Circuit design of the NWRTM
Figure 3-8 The SRAM simulation circuit
Figure 3-9 Detection of opens for a fault-free cell (a), OC1 (b), OC2 (c), OC5 (d) and OC11 (e) when applying a Pause Test
Figure 3-10 (I) NWRMarch 11N validation results for a fault-free cell (a), OC1 (b), OC2 (c), OC5 (d) and OC11 (e)
Figure 3-10 (II) PDWMarch 9N validation results for a fault-free cell (a), OC1 (b), OC2 (c), OC5 (d) and OC11 (e)
Figure 3-11 Separate DRF test using PDWTM
Figure 3-12 Symmetric March G test algorithm
Figure 3-13 NWRMarch G (a) / PDWMarch G (b) based on symmetric March G
Figure 4-1 DRF test time reduction
Figure 4-2 Classifying e-SRAMs with redundancy
Figure 4-3 Four-cell memory configuration
Figure 4-4 A typical 6T SRAM cell
Figure 4-5 The March 9N test with Pause Test
Figure 4-6 March 5N test algorithm
Figure 4-7 PDWMarch 6N test algorithm
Figure 4-8 NWRMarch 7N test algorithm
Figure 4-9 Summary of test algorithm selections
Figure 4-10 A flow chart of test time evaluation by applying NWRTM
Figure 4-11 e-SRAM test time reductions
Figure 4-12 Evaluating test time reduction considering IO numbers
Figure 4-13 Evaluating test time reduction considering technology node trends
Figure 4-14 Evaluating test time reduction considering March algorithm complexity impacts
Figure 5-1 The test architecture in [4, 5]
Figure 5-2 Memory with serial-data-path connections in the BIST mode
Figure 5-3 The proposed test architecture
Figure 5-4 Designs for pattern delivery and SPC
Figure 5-5 A proposed LRA with a fan-in of 64
Figure 5-6 A typical 6T SRAM cell and its pre-charge control circuits for PDWTM
Figure 6-1 The diagnosis scheme in [7-8]
Figure 6-2 Memory with bidirectional serial connections in the BISD mode
Figure 6-3 The proposed diagnosis scheme
Figure 6-4 Designs for pattern delivery and SPC
Figure 6-5 Design for a general PSC
Figure 6-6 A typical 6T SRAM cell and its pre-charge control circuits for PDWTM
Figure 7-1 Simple "single-rectangle" test power model for e-SRAMs
Figure 7-2 The "retention-aware" test power model
Figure 7-3 Test scheduling with (a) the "single-rectangle" and (b) the "retention-aware" test power model using the test scheduling algorithms in [8]
Figure 7-4 Test time reduction factor assuming N replicas of the M1-M4 e-SRAM block
Figure 8-1 A resistive bridge between an asymmetric memory cell and a symmetric one

Acronyms
SoC: System-on-a-Chip
IP: Intellectual Property
ITRS: International Technology Roadmap for Semiconductors
SRAM: Static Random Access Memory
e-SRAM: Embedded Static Random Access Memory
DRF: Data Retention Fault
non-DRF: All e-SRAM faults except DRFs
DFT: Design-for-Test
IDDX: Both the quiescent power supply current test method (IDDQ) and the transient power supply current test method (IDDT)
BIST: Built-In Self-Test
PDWTM: Pre-Discharge Write Test Mode
IFA: Inductive Fault Analysis
FFM1: Functional Fault Model involving a single memory cell
FFM2: Functional Fault Model involving two memory cells
TD: Test Delay
TDH: Test Delay to detect whether a cell can keep a HIGH level after some time
TDL: Test Delay to detect whether a cell can keep a LOW level after some time
TDF: Final Test Delay to detect Data Retention Faults
TDA: Test Delay to Access other memory cells for a cell under test
TP: Test clock Period
TNDF: New Final Test Delay
Toh: Test time overhead due to the use of DFT techniques
NWRTM: No Write Recovery Test Mode
WL: Word Line
BL: Bit Line
NWRC: No Write Recovery Cycle
OPG: Overall Production Gain
SPC: Serial to Parallel Converter
PSC: Parallel to Serial Converter
LRA: Local Response Analyzer
MISR: Multiple Input Shift Register
BISD: Built-In Self-Diagnosis
CF: Coupling Fault
6T: 6-Transistor
PDW: Pre-Discharge Write
SF: Strong Fault
WF: Weak Fault
UF: Undetected Fault

Acknowledgements
It has been an extremely pleasant experience to study for my PhD with Dr. Andre Ivanov at the University of British Columbia. Throughout my stay with the SoC Research Group, Dr. Andre Ivanov has provided continuous support and valuable advice vital for the completion of this research. His efforts and involvement in this research are greatly appreciated. I have received expert advice from Dr. Yervant Zorian at Virage Logic Corporation and Dr. Yuejian Wu at Nortel Networks. Their useful comments helped me to focus the research direction.

Throughout this research, a team of brilliant researchers has been remarkable in offering insightful opinions and generous assistance. This unforgettable team is sincerely honored and treasured: Josh Yang, Partha Pande, Cristian Grecu, Andy Kuo and Derek Ho. Without their continuous motivation and support, the completion of this research would not have been possible. Moreover, all members of the SoC Research Group at UBC are deeply cherished for their constant sharing of wisdom and sense of humor.

Last but not least, this thesis is dedicated to my family: my wife Yun (April) Li and my brothers Baohu Wang, Baochun Wang and Baozheng Wang, who have been extraordinarily supportive throughout my years at UBC. Their encouragement is one of the strongest motivations for the pursuit of my PhD.

This research is supported by PMC-Sierra, Canadian Microelectronics Corporation, the Natural Sciences and Engineering Research Council of Canada (NSERC), Micronet R&D, and Gennum Corporation.

Co-Authorship Statements
During the completion of this thesis, I contributed almost all of the effort required, such as literature surveys, problem refinement, solution development, performance evaluation and comparison, and systematic summarization. Besides my supervisor, I have three co-authors: Dr. Yervant Zorian, Dr. Yuejian Wu and Mr. Josh Yang. Their contributions to this thesis mainly consist of reviewing the papers, providing comments and helping to improve the papers. Moreover, Mr. Josh Yang helped initiate the proposed PDWTM technique.

Chapter 1 Introduction

1.1 Motivation

Recently, the System-on-a-Chip (SoC) paradigm has been associated with a trend from logic-dominant chips to memory-dominant ones.
An increasing number of memories, e.g., Static Random Access Memories (SRAMs), are embedded into emerging SoCs. For instance, 30% of the Alpha 21264 microprocessor and 60% of the StrongARM Reduced Instruction Set Computer (RISC) processor are devoted to cache memory, which is mainly constructed from embedded SRAMs (e-SRAMs) [1]. The International Technology Roadmap for Semiconductors 2003 (ITRS'03) even predicts that diverse memories will occupy over 90% of the total chip area by 2013 [2].

The complexity of e-SRAMs continues to increase in size and speed, causing two major implementation problems. According to the Poisson yield model [3], component yield decreases exponentially with component area (Y = exp(-A·D), where A is the area and D the defect density). Since the total SoC yield is the product of the component yields, the component with the largest area dominates the SoC yield. As e-SRAMs occupy more and more SoC area, they will become one of the significant yield limiters for SoCs. Compared with other embedded logic cores, e-SRAMs are more prone to manufacturing defects and field reliability problems because they are subject to aggressive design rules, which makes their dominance of SoC yield even more pronounced. As a result, efficient e-SRAM test and repair become mandatory. In this thesis, we focus on e-SRAM test.

Since the IOs or ports of e-SRAMs are not accessible to external users, it is impossible to test them externally. Moreover, with current technologies some defects are speed-related, so testing e-SRAMs at full speed improves test coverage and therefore yield. For these two reasons, Built-In Self-Test (BIST) is currently the only practical solution for e-SRAMs. Generally, BIST circuitry includes test pattern generation circuits, test address generation circuits and control circuits.
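The Poisson yield argument earlier in this section can be checked numerically. The sketch assumes the Poisson model Y = exp(-A·D) with a uniform defect density and independent defects, so that SoC yield is the product of component yields; the die split (90% memory, echoing the ITRS'03 projection) and the defect density of 0.5 defects/cm^2 are invented for illustration and are not data from the thesis.

```python
import math


def poisson_yield(area_cm2: float, defect_density_per_cm2: float) -> float:
    """Poisson yield model: Y = exp(-A * D)."""
    return math.exp(-area_cm2 * defect_density_per_cm2)


def soc_yield(areas_cm2, defect_density_per_cm2: float) -> float:
    """SoC yield as the product of component yields (independent defects assumed)."""
    y = 1.0
    for area in areas_cm2:
        y *= poisson_yield(area, defect_density_per_cm2)
    return y


if __name__ == "__main__":
    D = 0.5                              # assumed defects per cm^2
    memory_area, logic_area = 0.9, 0.1   # assumed split of a 1 cm^2 die
    print(f"memory yield {poisson_yield(memory_area, D):.3f}")
    print(f"logic yield  {poisson_yield(logic_area, D):.3f}")
    print(f"SoC yield    {soc_yield([memory_area, logic_area], D):.3f}")
```

With these numbers the memory yield (about 0.64) is far below the logic yield (about 0.95), so the memory sets the SoC yield of about 0.61.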
During BIST execution, several factors, e.g., test coverage, test time, test area overhead and test power, need to be traded off to reach an optimal balance between yield and test cost. The higher the test coverage, the higher the yield; on the other hand, the higher the test coverage, the higher the test cost in terms of test time, area overhead and power. The order in which these factors are considered when seeking an optimal BIST trade-off depends on the selected test algorithm. Currently, the March test algorithm [4] is widely used because it is easy to implement. With a March test, both the test time and the area overhead grow with e-SRAM capacity. However, the test time grows much faster than the test area overhead, because the area of the BIST pattern generation and control circuitry does not vary with e-SRAM capacity. For example, if the e-SRAM capacity doubles, the test time also doubles, whereas the BIST pattern generation and control circuits remain unchanged and the test address generator register grows by only one bit. Test power is determined by the number of e-SRAMs tested concurrently.

Previously, e-SRAMs had small capacities and densities and low speeds. At that time, as long as test coverage was guaranteed, test engineers minimized the test area overhead as much as possible and treated both test time and test power as less important. In other words, when quantifying an optimal trade-off, test area overhead was considered first and test time second. As demanding applications continue to drive technology upgrades, e-SRAMs become more numerous, larger, denser, faster and architecturally more complex. As a result, test time, test area overhead and test power must all increase to achieve acceptable test coverage and yield. As explained above, the total test time grows much faster than the other factors.
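The scaling claim above can be illustrated with two small helper functions. They are a sketch assuming a kN March algorithm (k operations per word), one operation per test clock and a binary address counter; the function names and the 10 ns clock are hypothetical.

```python
def march_test_time_s(n_words: int, k: int, t_clk_s: float) -> float:
    """Test time of a kN March algorithm, assuming one operation per clock cycle."""
    return k * n_words * t_clk_s


def address_counter_bits(n_words: int) -> int:
    """Width of a binary address counter that can reach every word."""
    return max(1, (n_words - 1).bit_length())


if __name__ == "__main__":
    for n in (2**16, 2**17):  # doubling the capacity
        print(n, march_test_time_s(n, 10, 10e-9), address_counter_bits(n))
```

Doubling the number of words doubles the test time of a fixed kN algorithm but adds only one bit to the address counter, while the pattern generator and controller are unchanged.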
Therefore, the previous order of priority for obtaining an optimal BIST trade-off needs to change. The new order of priority is: first, test time; second, test power; and third, test area overhead. Accordingly, this thesis focuses on time-driven test methodologies that apply to both single and multiple e-SRAMs, especially multiple distributed small e-SRAMs.

1.2 Related Work

The nature of SRAM testing differs from that of logic testing, since an SRAM is effectively a mixed-signal device whose faulty behavior is often analog. This is because most e-SRAM sense amplifiers are differential. Defect-based fault models are therefore more realistic and attractive. Conceptually, the e-SRAM fault set can be divided into Data Retention Faults (DRFs) and all other faults, e.g., coupling faults and stuck-at faults, referred to as non-DRFs in this thesis. Correspondingly, the test time for a single e-SRAM consists of the time for testing DRFs and the time for testing non-DRFs. Previous work on reducing the non-DRF test time under defect-based fault models mainly focuses on parallel algorithms, serial interfacing techniques [5-6], partition tests [7-8] or new memory cell structures [9-10]. In [5-6], unidirectional or bidirectional serial interfaces that involve the memory cells are used to deliver patterns to, and collect responses from, multiple small distributed e-SRAMs in order to minimize test area overhead. In [7-8], large memories are first divided into several smaller so-called segments, which are then tested in parallel. In [9-10], high test coverage and short test time are achieved simultaneously by replacing two-ended SRAMs with single-ended ones, i.e., SRAMs with a single bit line.
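To make the March-test discussion concrete, the sketch below runs March C-, a standard 10N algorithm from the literature, {⇕(w0); ⇑(r0,w1); ⇑(r1,w0); ⇓(r0,w1); ⇓(r1,w0); ⇕(r0)}, on a toy bit-oriented memory with an injected stuck-at fault. March C- serves only as a familiar stand-in; it is not the March 5N, PDWMarch or NWRMarch algorithms of this thesis, and the memory model and function names are hypothetical. Counting operations also shows the linear 10N growth in test time.

```python
from __future__ import annotations

# March C-: {any(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); any(r0)}
MARCH_C_MINUS = [
    ("any", ["w0"]),
    ("up", ["r0", "w1"]),
    ("up", ["r1", "w0"]),
    ("down", ["r0", "w1"]),
    ("down", ["r1", "w0"]),
    ("any", ["r0"]),
]


class ToyMemory:
    """Bit-oriented memory; `stuck_at` maps an address to its forced value."""

    def __init__(self, n_words: int, stuck_at: dict[int, int] | None = None):
        self.cells = [0] * n_words
        self.stuck_at = stuck_at or {}

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = self.stuck_at.get(addr, value)

    def read(self, addr: int) -> int:
        return self.stuck_at.get(addr, self.cells[addr])


def run_march(mem: ToyMemory, algorithm=MARCH_C_MINUS) -> tuple[set[int], int]:
    """Apply a March algorithm; return (failing addresses, operations performed)."""
    n = len(mem.cells)
    failing, ops = set(), 0
    for order, element in algorithm:
        # "any" order is realized as ascending here
        addresses = range(n - 1, -1, -1) if order == "down" else range(n)
        for addr in addresses:
            for op in element:
                ops += 1
                expected = int(op[1])
                if op[0] == "w":
                    mem.write(addr, expected)
                elif mem.read(addr) != expected:
                    failing.add(addr)
    return failing, ops


if __name__ == "__main__":
    print(run_march(ToyMemory(16)))          # (set(), 160)
    print(run_march(ToyMemory(16, {5: 1})))  # ({5}, 160)
```

Coupling faults and DRFs would need richer cell models (aggressor/victim pairs, time-dependent leakage); this sketch only shows the March mechanics.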
In practice, DRFs are usually detected by performing a read operation after a predetermined delay (typically of the order of 100 ms [11]), commonly referred to as a Pause Test [11]. Such tests tend to be relatively time-consuming. Previous work on reducing the DRF test time mainly focuses on removing the delay time from the test flow altogether by applying various Design-for-Test (DFT) techniques, e.g., the weak write test mode [12] and IDDX solutions [13-14].

However, the previously proposed algorithms and methodologies only partly alleviate the test time challenges for e-SRAMs: given their test requirements, their test time is not minimal, so their time efficiency is not optimal. Their major shortcomings can be summarized as follows:
(1) With current solutions, testing the non-DRFs of a single e-SRAM, especially coupling faults, is time-inefficient, because the interdependency between test and redundancy (repair) activities is generally not considered.
(2) Existing solutions (including DFT techniques) for detecting DRFs cannot achieve the best test time for all e-SRAM sizes.
(3) Current universal test algorithms are generally ineffective for e-SRAMs spanning a wide range of capacities, since they test DRFs and non-DRFs separately.
(4) The architectures for testing and/or diagnosing multiple small distributed e-SRAMs in [5-6] are not time-efficient, because they deliver patterns and collect responses serially.
(5) Because it does not account for the DRF test, the current "single-rectangle" power model used during power-constrained test scheduling of e-SRAMs, e.g., the one in [15], is overly pessimistic in test time. This is because the test power during the delay phase of a DRF test can be negligible compared with that during the other test phases.
Such pessimism is especially pronounced for e-SRAMs that are too small to be repaired.

1.3 Thesis Organization

The format of this thesis departs from the traditional format. It follows what UBC refers to as a "manuscript-based" thesis: essentially a collection of manuscripts that have been reviewed by experts in the field and published in conference proceedings and/or archival journals. This compilation of papers is augmented by an introductory and a concluding chapter. Because of this format, the reader will find some redundancy in the presentation of the material. In more detail, the thesis is organized as follows.

Chapter 1 introduces the thesis by presenting the motivation and a survey of previous work, while Chapter 8 concludes and presents possible directions for future work.

Chapter 2 addresses major shortcomings (1) and (2) listed in Sec. 1.2. First, a simplified coupling-fault test methodology is proposed to reduce the coupling-fault test time of a single e-SRAM. By assuming that the coupling faults are bidirectional, i.e., both cells affect each other equally, this methodology exploits the interdependency between the bidirectional coupling-fault test and hard repair schemes. Another algorithm-based technique, the Simplified Pause Test (SPT) for DRFs, is also discussed in Chapter 2. Combining these two methodologies, a new March 5N algorithm derived from a March 9N with DRF test is proposed, validated and evaluated.

Chapter 3 addresses major shortcoming (2) listed in Sec. 1.2 by describing a time-efficient DFT technique referred to as the Pre-Discharge Write Test Mode (PDWTM). This technique significantly accelerates the testing of open defects within an e-SRAM cell, including those causing DRFs.
The chapter starts from the concepts and a preliminary analysis; then two viable implementation methods, including their control circuitry designs and simulation-based validations, are explained in detail. The subsequent discussion not only shows the advantages of PDWTM and gives criteria for selecting an implementation method in different cases, but also points out possible directions for future improvement.

Chapter 4 addresses major shortcoming (3) listed in Sec. 1.2 by presenting a methodology to design a time-efficient customizable algorithm for any single e-SRAM. The classification methodology is described first, and a case study is presented to evaluate its performance. Several parameters involved in the proposed methodology are then discussed to show that the test time reduction holds in general. These parameters include the e-SRAM IO number, the technology node and the complexity of the comparable March algorithm.

Chapter 5 addresses part of major shortcoming (4) listed in Sec. 1.2 by proposing a time-efficient test architecture for multiple small distributed e-SRAMs. The main improvements over the referenced solutions are replacing the time-consuming global serial response analyzer with parallel local response analyzers and applying the PDWTM technique for DRF testing.

Chapter 6 proposes a time-efficient diagnosis scheme for multiple distributed small e-SRAMs, again addressing major shortcoming (4) in Sec. 1.2. The main improvements over the referenced solutions are replacing the unidirectional and bidirectional serial interfaces with a pair of serial-to-parallel and parallel-to-serial converters and using the PDWTM technique for DRF diagnosis. The former replacement avoids the serial fault masking and defect-rate-dependent diagnosis problems of the existing schemes.
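The serial-to-parallel and parallel-to-serial converters mentioned for Chapters 5 and 6 are, at their core, shift registers. The sketch below models generic bit-level conversion, assuming MSB-first shifting and a fixed word width; it only illustrates the conversion itself, not the thesis's circuit designs, and the function names are hypothetical.

```python
def serial_to_parallel(bits: list[int], width: int) -> list[int]:
    """Shift bits (MSB first) into a width-bit register, emitting a word every `width` shifts."""
    mask = (1 << width) - 1
    words, reg, count = [], 0, 0
    for bit in bits:
        reg = ((reg << 1) | (bit & 1)) & mask
        count += 1
        if count == width:
            words.append(reg)
            count = 0
    return words


def parallel_to_serial(words: list[int], width: int) -> list[int]:
    """Load each word in parallel, then shift it out MSB first."""
    return [(word >> i) & 1 for word in words for i in range(width - 1, -1, -1)]


if __name__ == "__main__":
    stream = parallel_to_serial([0b1010, 0b0011], width=4)
    print(stream)                               # [1, 0, 1, 0, 0, 0, 1, 1]
    print(serial_to_parallel(stream, width=4))  # [10, 3]
```

Trailing bits that do not fill a whole word are dropped by this sketch; a hardware converter would instead hold them until the next shift completes a word.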
Chapter 7 proposes a "retention-aware" test power model for power-constrained test scheduling on multiple e - S R A M s to address the major shortcoming (5) listed i n Sec. 1.2. The concept is  6  explained i n detail and several case studies are discussed ranging from multiple BISTed eS R A M s within a pure e - S R A M environment to e - S R A M s within a S o C environment.  Finally, Chapter 8 concludes and provides some suggestions for future research direction.  1.4 Contributions The principal knowledge contributions o f this thesis are summarized below, essentially following a chapter by chapter organization: 1. In Chapter 2, we propose a simplified coupling fault test and a simplified Pause Test for detecting retention faults. Firstly, the interdependency o f the selected hard repair scheme and the coupling fault testing is demonstrated, where those coupling faults are assumed bi-directional. B y applying a single addressing sequence, rather than both an increasing and a decreasing addressing sequences i n the traditional approaches, this interdependency helps to reduce the test time o f e - S R A M coupling faults by a factor o f up to two. However, this methodology cannot be applied to e - S R A M qualified for soft repair scheme or without repair scheme. Secondly, the simplified Pause Test methodology speeds up the D R F s from an algorithm-based point o f view. 2. In Chapter 3, we propose the Pre-discharge Write Test M o d e ( P D W T M ) to accelerate the detection o f the DRF-related and other opens by modifying the e - S R A M s with D F T circuitries. B y implementing it straightforwardly, P D W T M is capable o f detecting all opens i n embedded S R A M s with a zero-time penalty and at-speed test capability although it may consume more test power. This at-speed test capability is very attractive for testing e - S R A M s with speed-related defects i n very deep submicron technology. 3. 
Chapter 4 contains the proposal o f a customized time-efficient e - S R A M March test algorithm which involves the memory size information. This algorithm effectively 7  combines the advantages o f the methodologies developed i n Chapters 2 and 3. W i t h the consideration o f the interdependency between the test o f D R F s and non-DRFs, an eS R A M is categorized according to its test and repair features. The algorithm for each category is proved to be the most time-efficient while still providing high test coverage. 4. Chapter 5 develops test architecture for multiple distributed small e - S R A M s to improve test time o f the referred test architectures. This is achieved b y replacing the global serial response comparator with the parallel small-size local response analyzer. 5. Chapter 6 is a proposal to enhance the diagnosis time o f the referred  diagnosis  architecture for multiple distributed small e - S R A M s . This is done through replacing the unidirectional and bi-directional serial interfaces with a pair o f serial-to-parallel and parallel-to-serial converters. The improvement overcomes the serial-fault masking and defect-rate dependent diagnosis problems. 6.  Chapter 7 builds a "retention-aware" test power model for power-constrained test scheduling o f the multiple e - S R A M s within a S o C . W i t h the invented "retention-aware" test power model, scheduling the e - S R A M testing properly yields D R F coverage with zero test overhead. This is achieved b y taking advantage o f the D R F delay cycles.  1.5 References [1] S. Manne, A . Klauser, and D . Grunwald, "Pipeline gating: speculation control for energy reduction", Proceedings of the 25th annual international symposium on computer architecture, 1998, pp. 32-141. [2] International Technology Roadmap for Semiconductors, 2003 Edition: "System Drivers", pp. 6-8, 2003. [3] M . L . Bushnell and V . D . 
Agrawal, "Essentials of Electronic Testing for Digital, Memory, & Mixed-Signal VLSI Circuits", Kluwer Academic Publishers, 2000.
[4] A. J. van de Goor, "Using march tests to test SRAMs", IEEE Design & Test of Computers, Vol. 10, No. 1, pp. 8-14, 1993.
[5] W. B. Jone, D. C. Huang, S. C. Wu, and K. J. Lee, "An efficient BIST method for distributed small buffers", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 10, No. 4, pp. 512-515, 2002.
[6] D. C. Huang and W. B. Jone, "A parallel built-in self-diagnostic method for embedded memory arrays", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 21, No. 4, pp. 449-465, 2002.
[7] F. Karimi, S. Irrinki, T. Crosby and F. Lombardi, "A parallel approach for testing multiport static random access memories", Proceedings of IEEE International Workshop on Memory Technology, Design and Testing, 2001, pp. 73-81.
[8] J. C. Lee, Y. S. Kang and S. H. Kang, "A parallel test algorithm for pattern sensitive faults in semiconductor random access memories", Proceedings of IEEE ISCAS '97, 1997, Vol. 4, pp. 2721-2724.
[9] C. F. Wu, C. C. Wang, R. T. Hwang, and C. H. Kao, "Design of single-ended SRAM with high test coverage and short test time", Proceedings of 1998 IEEE International Symposium on Circuits and Systems (ISCAS '98), 1998, Vol. 2, pp. 292-295.
[10] C. F. Wu, C. C. Wang, R. T. Hwang, and C. H. Kao, "Single-ended SRAM with high test coverage and short test time", IEEE Journal of Solid-State Circuits, Vol. 35, No. 1, pp. 114-118, 2000.
[11] A. J. van de Goor, "Testing Semiconductor Memories, Theory and Practice", ComTex Publishing, Gouda, the Netherlands, 1998, http://cardit.et.tudelft.nl/~vdgoor.
[12] A. Meixner and J. Banik, "Weak write test mode: an SRAM cell stability design for test technique", Proceedings of ITC, 1996, pp. 309-318.
[13] V.
H. Champac, J. Castillejos, and J. Figueras, "IDDQ testing of opens in CMOS SRAMs", Proceedings of 16th IEEE VTS, 1998, pp. 106-111.
[14] D. H. Yoon, H. S. Kim and S. H. Kang, "Dynamic power supply current testing for open defects in CMOS SRAMs", Electronics and Telecommunications Research Institute Journal, Vol. 23, No. 2, pp. 77-84, 2001.
[15] C. W. Wang, J. R. Huang, Y. F. Lin, K. L. Cheng, C. T. Huang, et al., "Test scheduling of BISTed memory cores for SoC", Proceedings of the 11th Asian Test Symposium, 2002, pp. 356-361.

Chapter 2 Simplifying Coupling Faults Test and Pause Test for DRFs for an e-SRAM¹

2.1 Introduction
The System-on-Chip (SoC) paradigm is associated with a trend from logic-dominant chips to memory-dominant ones. According to the ITRS documents [1], by 2013 over 90% of chip area will be occupied by memories, e.g., embedded SRAMs (e-SRAMs). Increasingly dense, high-capacity e-SRAMs are more prone to faults, which not only reduces memory and SoC yield but also poses large test time challenges.
The nature of memory testing differs from that of logic testing, since a memory is closer to a mixed-signal device whose faulty behavior is often analog in nature. However, traditional functional memory fault models [2] typically assume digital behavior. Thus, the coverage of memory tests that target traditional functional fault models is inherently difficult to relate to yield. Inductive Fault Analysis (IFA) [3] has been proposed to extract memory faults that correspond to potential defects within memories [4-5]. Fault models built through IFA (here simply called IFA functional fault models) relate more easily to yield. However, the test time challenges caused by ever higher memory capacities remain.
To reduce test time, in this chapter we base our approach on the IFA functional fault models translated from defect simulations, but we simplify the IFA Functional Fault Models involving Two cells (FFM2s) and reduce the test time of Data Retention Faults by further analyzing the defect causes of these functional faults.
Another aspect we consider here is memory redundancy. To reduce memory yield loss, a number of different redundancy techniques are generally applied [6]. According to the performance trade-offs in [6], row or column redundancy, or their combination, is considered more applicable to high-capacity memories, while word redundancy is usually deemed more effective for low-capacity memories. In practice, test set development is separated from the development of the redundancy circuitry and the associated repair algorithms. Our premise is that greater efficiency, i.e., reduced test time, can be achieved by coupling these activities more tightly and by considering both the redundancy techniques and the simplified FFM2s.
The remainder of this chapter is organized as follows. In Sec. 2.2, we revisit the IFA functional fault models with a view to simplifying them, by reanalyzing the coupling fault and Data Retention Fault sets. A new March 5N test algorithm based on these simplified models and on row/column redundancy is presented in Sec. 2.3. Simulation results from fault injections are presented and discussed in Sec. 2.4. Finally, Sec. 2.5 summarizes the chapter.

¹ A version of this chapter has been published. B. Wang, J. Yang and A. Ivanov, "On the Reduction of Test Time of Embedded SRAMs", Proceedings of the 2003 IEEE International Workshop on Memory Technology, Design, and Testing (MTDT2003), 2003, pp. 47-52.
2.2 Defect Analysis of IFA Functional Fault Models
2.2.1 Background
The adoption of specific fault models dictates the choice of applicable test algorithms. The accuracy of test quality assessments therefore depends directly on the fault models' ability to represent the faulty behaviours caused by physical defects. Currently, there are two major methodologies for building SRAM functional fault models: one is functional-based and the other is defect-based. In the former, various faulty functional behaviours are assumed regardless of how likely they are to be caused by a physical defect. In the latter, the layout and circuit schematic are considered, and Inductive Fault Analysis (IFA) is used to establish relationships between physical defects and their electrical and logical implications, as well as the statistical likelihood of such faulty behaviours.
One attractive feature of functional fault models is that they tend to be simple to develop and formulate. However, their practical use is often jeopardized by the fact that the corresponding fault lists and test sets grow unwieldy. For example, a test length of $(3N^2 + 2N) \times 2^N$ is required to cover NPSFs (Neighbourhood Pattern Sensitive Faults), where N is the number of addresses [7]. As a result, "unimportant" or "unrealistic" functional faults must be pruned from the more comprehensive sets. The difficulty then lies in deciding which functional faults are important enough to retain, i.e., which ones correlate with final quality and yield.
IFA functional fault models have been proposed as an alternative that addresses the relationship between functional faults and yield. In [4], realistic functional faults translated from defect simulations under several assumptions are presented, and the evaluation results are expressed in terms of defect level and yield.
However, the overall test time, from fault translation to final manufacturing test, is still unacceptable, not only because of the lengthy IFA flow but also because of increased memory capacity, as in [4]. Thus, the IFA flow makes memory testing more realistic but does not solve the test time challenge, especially for current and future high-capacity e-SRAMs.
In this chapter, functional fault models derived from an IFA flow are applied but simplified to reduce test time, while still assuming that at most one defect exists within a cell at any one time and that the cell is designed symmetrically. Within the IFA-based functional fault models, FFM2s and Data Retention Faults are revisited from a defect-level perspective.

2.2.2 The Simplified FFM2 Fault Class
FFM2 denotes the functional faults involving two memory cells. Most FFM2s are caused by bridge defects between two cells. Other defects, such as OC10, SW1, BC2, BC3 and OW defined in [4], can also lead to FFM2s; however, as already verified in [4], these defects are also detected by the Functional Fault Models involving a Single cell (FFM1s). Thus, FFM2s can be simplified from the ones based on faulty behaviours, e.g., the five types described in [4], into the following three types based on defect location, without reducing defect coverage (we refer to the four-cell configuration in Figure 2-1 to illustrate these fault types):
[Figure 2-1 Four-cell memory configuration: C1 and C3 on word line WL1, C2 and C4 on word line WL2; C1 and C2 on bit lines BL1 and /BL1, C3 and C4 on bit lines BL2 and /BL2]
•  Row Coupling Faults (RCFs): due to potential bridges on the same row, the logic states of the two faulty cells (i.e., C1 and C3, or C2 and C4 in Figure 2-1) are always inverted with respect to each other.
•  Column Coupling Faults (CCFs): due to potential bridges on the same column, the contents of the two cells (i.e., C1 and C2, or C3 and C4 in Figure 2-1) are coupled to the same value.
•  Diagonal Coupling Faults (DCFs): due to potential bridges on the same diagonal, the corresponding memory cells (i.e., C1 and C4, or C2 and C3 in Figure 2-1) always retain the same logic value.
After row (column) redundancy is activated by fuse repair, a whole defective row (column) of the memory cell array is replaced by a redundant element. Hence, only the redundant rows (columns), instead of the defective ones, are accessed subsequently. Normally, both faulty cells subject to any of the above three coupling faults, the aggressor and the victim, need to be detected and repaired, because both are assumed to be defective. However, once a redundant row (column) replaces the row (column) of the aggressor or the victim cell, the coupling between the two no longer matters, because the replaced cell ceases to be accessed under the applied redundancy mechanism. In other words, if no other defect affects such a cell pair, detecting and repairing either cell of a bridged pair is enough for yield improvement. For example, if C1 and C2 in Figure 2-1 are faulty only because they are coupled, the yield achieved by detecting and repairing C1 or C2 alone is the same as that achieved by detecting and repairing both, since the remaining cell becomes fault-free once the other is repaired. In this fashion, not only is test time shortened, but repair resource requirements are also reduced. Both factors reduce test cost.
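The three simplified FFM2 types depend only on the relative positions of the two bridged cells. The Python sketch below classifies a bridge between two neighbouring cells, given as (row, column) coordinates, and reports the faulty relation stated above: inverted values for RCFs, equal values for CCFs and DCFs. The coordinate convention and the function name `classify_bridge` are illustrative assumptions, not part of the original method, and only neighbouring cells as in Figure 2-1 are handled.

```python
from typing import Tuple

Cell = Tuple[int, int]  # hypothetical (row, column) coordinates of a cell


def classify_bridge(a: Cell, b: Cell) -> Tuple[str, str]:
    """Return (fault type, relation) for a bridge between two neighbouring cells.

    Relation is "inverted" for row coupling and "same" for column and
    diagonal coupling, following the simplified FFM2 classes of Sec. 2.2.2.
    """
    d_row, d_col = abs(a[0] - b[0]), abs(a[1] - b[1])
    if (d_row, d_col) == (0, 1):
        return "RCF", "inverted"  # same row, e.g., C1 and C3
    if (d_row, d_col) == (1, 0):
        return "CCF", "same"      # same column, e.g., C1 and C2
    if (d_row, d_col) == (1, 1):
        return "DCF", "same"      # same diagonal, e.g., C1 and C4
    raise ValueError("cells are not neighbours in a four-cell configuration")


# Figure 2-1: C1 and C3 on WL1 (row 0), C2 and C4 on WL2 (row 1)
C1, C3, C2, C4 = (0, 0), (0, 1), (1, 0), (1, 1)
for pair in ((C1, C3), (C1, C2), (C2, C3)):
    print(classify_bridge(*pair))
```

Run as is, this prints ('RCF', 'inverted'), ('CCF', 'same') and ('DCF', 'same'), one per line.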
2.2.3 Data Retention Fault (DRF)
Another type of memory cell fault, the Data Retention Fault (DRF), occurs when a memory cell fails to retain a previously stored logic value after some time. It is difficult to simulate with logic-level simulators. In practice, this type of fault can be detected by performing a read operation after a delay (TD).
To reduce the TD needed for testing DRFs, several Design-for-Test (DFT) techniques have been proposed, e.g., [8-10]. However, since these entail extra circuitry, they incur various area penalties. In this chapter, not only is the delay time (TD) for detecting such faults reduced by the proposed test algorithm described later, but this reduction is also quantified in terms of various parameters, such as the specifics of the memory under test, the test cycle, and the specific read sequence used while testing DRFs.
In general, two different DRFs may occur:
•  Cell inability to retain a logic low after a specific time, referred to as TDL
•  Cell inability to retain a logic high after a specific time, referred to as TDH
DRFs can be caused by a source/drain/gate open of the pull-up transistor of the defective cell or by a defective Vcc/GND path. Figure 2-2 shows one of the DRF models: a resistive open defect in the GND/pull-down transistor path. With this defect, the cell fails to retain a logic low.
Usually, the final TD (TDF) is the maximum of TDL and TDH, in order to cover all potential defects. However, because the bit line is pre-charged high before a read operation and node T in Figure 2-2 is pulled high by charge sharing with the larger bit-line capacitance, reading a zero from a cell with the type of defect shown in Figure 2-2 flips its content to logic one.
This DRF circuit-level behavior corresponds to that of a Read Destructive Fault (RDF) or a Deceptive Read Destructive Fault (DRDF) in [4]. Therefore, this kind of defect is detected if both RDFs and DRDFs have already been targeted, and the corresponding TDF can then be prescribed by TDH alone. A shorter TDH reduces the DRF test time.
[Figure 2-2 One DRF model with a resistive open in the GND/pull-down transistor (word line WL, bit lines BL and BLb)]
Moreover, TD can be further reduced in the proposed test algorithm, discussed in detail in Sec. 2.3. This reduction is possible because of the time that necessarily separates operations on different cells. If the numbers of row/column address bits and the number of operations within each March element are denoted by X/Y and M respectively, the new final delay time (TNDF) is defined as

$$\mathrm{TNDF} = \mathrm{TDF} - \mathrm{TDA} \qquad (1)$$

where TDA is the re-access time of a particular memory cell under test. Depending on the address sequencing used during test, TDA can be quantified as follows (the period of the test clock is denoted by TP):
i) Assuming cells are accessed row by row,
$$\mathrm{TDA} = M \cdot \mathrm{TP} \cdot (2^X - 1) \cdot 2^Y \qquad (2)$$
ii) Assuming cells are accessed column by column,
$$\mathrm{TDA} = M \cdot \mathrm{TP} \cdot (2^X - 1) \qquad (3)$$
In summary, by combining equations (1), (2) and (3), TNDF can be chosen as follows:
$$\mathrm{TNDF} = \begin{cases} \mathrm{TDF} - M \cdot \mathrm{TP} \cdot (2^X - 1) \cdot 2^Y & \text{March algorithm with row-by-row address sequencing} \\ \mathrm{TDF} - M \cdot \mathrm{TP} \cdot (2^X - 1) & \text{March algorithm with column-by-column address sequencing} \\ \mathrm{TDF} & \text{other test algorithms} \end{cases}$$
For all three cases, if RDFs and DRDFs in [4] have already been tested, then TDF = TDH; otherwise TDF = max(TDL, TDH).
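As a worked check of equations (1) to (3), the Python sketch below computes TDF, TDA and TNDF in integer nanoseconds. Clamping TNDF at zero is our reading of the text (a negative remainder means no explicit pause is needed), and the 10 ns clock period in the usage lines is an assumed value, since the 128K-word example that follows does not state TP. The function names are hypothetical.

```python
def tdf_ns(tdl_ns: int, tdh_ns: int, rdf_drdf_tested: bool) -> int:
    """TDF = TDH if RDFs and DRDFs are already tested, otherwise max(TDL, TDH)."""
    return tdh_ns if rdf_drdf_tested else max(tdl_ns, tdh_ns)


def tda_ns(m_ops: int, tp_ns: int, x_bits: int, y_bits: int, sequencing: str) -> int:
    """Re-access time TDA: Eq. (2) for row-by-row, Eq. (3) for column-by-column."""
    if sequencing == "row":
        return m_ops * tp_ns * (2 ** x_bits - 1) * 2 ** y_bits
    if sequencing == "column":
        return m_ops * tp_ns * (2 ** x_bits - 1)
    return 0  # other test algorithms: no re-access credit, so TNDF = TDF


def tndf_ns(tdf: int, tda: int) -> int:
    """Eq. (1), clamped at zero (assumption: a negative value means no pause is needed)."""
    return max(0, tdf - tda)


# 128K words, X = 9, Y = 8, M = 1, TDF = 1 ms; TP = 10 ns is an assumed clock period
row = tndf_ns(1_000_000, tda_ns(1, 10, 9, 8, "row"))
col = tndf_ns(1_000_000, tda_ns(1, 10, 9, 8, "column"))
print(row, col)
```

With these assumptions, row-by-row sequencing gives TDA = 1,308,160 ns and hence TNDF = 0, whereas column-by-column sequencing only saves 5,110 ns and leaves TNDF = 994,890 ns.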
Furthermore, the formulation shows that the DRF test time can be reduced even further, because TNDF can be nullified: no explicit delay is needed once TDA ≥ TDF. For example, assuming TDF = 1 ms, TNDF = 0 when testing a memory of 128K words with 9 row address bits and 8 column address bits using row-by-row March address sequencing and single-operation March elements (M = 1), provided that TP ≥ 1 ms / ((2^9 - 1) x 2^8) ≈ 7.6 ns.

2.3 March 5N Test Algorithm
2.3.1 Related Work
Figure 2-3 The compact March 9N with Pause test: {⇕(W0); Delay; ⇑(R0, W1); Delay; ⇑(R1, W0); ⇓(R0, W1); ⇓(R1, W0)}
Currently, March 9N [11] is a popular memory test algorithm. Following [12], inserting two delays between its March elements gives March 9N additional DRF coverage; the resulting March 9N with Pause test is shown in Figure 2-3. In the descriptions of the memory test algorithms, "⇑" means that the address sequencing is increasing during the test, "⇓" means that it is decreasing, "⇕" means that either order can be used, and "Delay" is the necessary TNDF discussed in Sec. 2.2.

2.3.2 The Proposed March 5N Test Algorithm
According to the defect analysis of the IFA functional faults in Sec. 2.2, when row/column redundancy is used, repairing only one of two cells that are faulty because of a bridge between them achieves the same yield as repairing both. When such faulty cells are written, their final contents are determined only by the cell with the higher address when the write address sequencing is increasing, and only by the cell with the lower address when it is decreasing. Hence, only the testing of FFM2s is sensitive to the address sequencing: one half of the potential FFM2 cells is detected with the increasing address sequencing and the other half with the decreasing one (this is verified in Sec. 2.4). Thus, either address sequencing, coupled with row/column redundancy activation, is adequate for meeting the yield goal.
This reduces the test time significantly. The proposed March 5N test algorithms are shown in Figure 2-4 (a) and (b); either of them, with its respective address sequencing, achieves the same yield as [11] when coupled with row/column redundancy.
Figure 2-4 The proposed March 5N test algorithm: (a) {⇕(W0); Delay; ⇑(R0, W1); Delay; ⇑(R1, W0)}; (b) {⇕(W0); Delay; ⇓(R0, W1); Delay; ⇓(R1, W0)}
As Figures 2-3 and 2-4 show, the proposed March 5N test algorithms reduce the number of operations per cell from 9 to 5, i.e., to just over half of that required by the March 9N with Pause test. The third March element, combining R1 and W0, is retained to detect potential faults within the pre-charge and column decoder circuits.

2.4 Validation of the Proposed Test Algorithm
2.4.1 Fault Injections
    // Writing cells
    case (Address)
      // Injecting RCFs: address difference between aggressor and victim cells is ONE
      aggressor cell address: victim cell = ~(aggressor cell);
      // Injecting CCFs: address difference between aggressor and victim cells is FOUR
      aggressor cell address: victim cell = aggressor cell;
      // Injecting DCFs: address difference between aggressor and victim cells is FIVE
      aggressor cell address: victim cell = aggressor cell;
    endcase
Figure 2-5 Coupling faults injections
To verify the test algorithms, faults were injected into the Embedded Memory under Test (EMUT) of [13], which consists of a 65536 (64K) x 32 bit SRAM core with 14 row address bits and 2 column address bits.
To cover all cell defects, all potential FFM1 functional faults translated from defect-based simulations, except DRFs, were injected for validation.
All the simplified FFM2s in this chapter were also injected into the EMUT of [13], according to the four-cell configuration in Figure 2-1 and the HDL program in Figure 2-5. The faults and their addresses, selected according to Figure 2-1, are shown in Table 2-1. DRFs were not injected into the SRAM, but they can be detected by reading the cells after TNDF, according to the defect analysis in Sec. 2.2.

Table 2-1 Faults injected into the 64K x 32 bit SRAM under test
Fault | Number of faults | Fault addresses (hex)
Stuck-at Fault (SAF) | 4 | 00, 03, fd, ff
Transition Fault (TF) | 4 | 20, 22, 30, 33
Stuck-Open Fault (SOF) | 4 | 25, 26, 34, 36
Read Destructive Fault (RDF) | 4 | 50, 53, 59, 5a
Deceptive Read Destructive Fault (DRDF) | 4 | 60, 63, 69, 6a
Incorrect Read Fault (IRF) | 4 | 70, 73, 79, 7a
Random Read Fault (RRF) | 4 | 80, 82, 90, 93
Undefined State Fault (USF) | 4 | 86, 88, 95, 96
Coupling Fault (CF): RCFs | 2 | 06/07 (aggressor/victim)
Coupling Fault (CF): CCFs | 4 | 04/08, 10/14 (aggressor/victim)
Coupling Fault (CF): DCFs | 4 | 0a/0f, 16/1b (aggressor/victim)
Total | 42 |

2.4.2 Validation Results
The proposed test algorithm was validated in an HDL simulation environment by injecting all FFM1 functional faults except DRFs, together with the proposed simplified FFM2s, and testing the EMUT with the new March 5N. Table 2-2 lists the detected faulty cell addresses obtained when the proposed test algorithms were simulated with the injected faults.
From Table 2-2, every injected fault under the initial assumptions is detected with either address sequencing, in the sense that at least one cell of each coupled pair is flagged; only which cell of a coupled pair is detected depends on the address ordering. If row/column redundancy is activated, using one of the two address sequencings rather than both suffices to meet the yield improvement goals.
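The address-order dependence of CF detection can be reproduced with a small behavioural model. The Python sketch below assumes a bit-oriented memory in which each same-value bridge (CCF or DCF) is bidirectional, i.e., a write to either cell forces both cells to the written value; pauses are omitted because no DRFs are modelled. Under these assumptions, the increasing-order March 5N flags only the higher address of each pair and the decreasing-order version only the lower one, which matches the CCF and DCF entries of Table 2-2. RCFs are deliberately left out, because which of their cells is flagged depends on modelling details (e.g., how the inverted coupling is triggered) that the text does not specify. The function name and the 32-cell memory size are illustrative.

```python
from typing import Dict, List, Set, Tuple


def march_5n_failures(n_cells: int, bridges: List[Tuple[int, int]], ascending: bool) -> Set[int]:
    """Run March 5N (pauses omitted) on a 1-bit-per-address memory with
    bidirectional same-value bridges; return the addresses whose reads mismatch."""
    mem = [0] * n_cells
    partner: Dict[int, int] = {}
    for a, b in bridges:
        partner[a], partner[b] = b, a

    def write(addr: int, value: int) -> None:
        mem[addr] = value
        if addr in partner:  # the bridge forces the partner cell to the same value
            mem[partner[addr]] = value

    for addr in range(n_cells):  # first element W0 (order irrelevant for same-value bridges)
        write(addr, 0)

    order = range(n_cells) if ascending else range(n_cells - 1, -1, -1)
    failing: Set[int] = set()
    for expected, new_value in ((0, 1), (1, 0)):  # (R0, W1) then (R1, W0)
        for addr in order:
            if mem[addr] != expected:
                failing.add(addr)
            write(addr, new_value)
    return failing


# CCF and DCF aggressor/victim pairs of Table 2-1 (hex addresses)
pairs = [(0x04, 0x08), (0x10, 0x14), (0x0A, 0x0F), (0x16, 0x1B)]
up = march_5n_failures(32, pairs, ascending=True)     # Figure 2-4 (a)
down = march_5n_failures(32, pairs, ascending=False)  # Figure 2-4 (b)
print([hex(a) for a in sorted(up)])
print([hex(a) for a in sorted(down)])
```

In this model, the increasing order flags 0x8, 0xf, 0x14 and 0x1b, and the decreasing order flags 0x4, 0xa, 0x10 and 0x16, so each sequencing catches exactly one cell of every bridged pair.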
Table 2-2 Detected faulty cells (hex addresses)
Fault | Using algorithm in Figure 2-4 (a) | Using algorithm in Figure 2-4 (b)
SAF | 00, 03, fd, ff | 00, 03, fd, ff
TF | 20, 22, 30, 33 | 20, 22, 30, 33
SOF | 25, 26, 34, 36 | 25, 26, 34, 36
RDF | 50, 53, 59, 5a | 50, 53, 59, 5a
DRDF | 60, 63, 69, 6a | 60, 63, 69, 6a
IRF | 70, 73, 79, 7a | 70, 73, 79, 7a
RRF | 80, 82, 90, 93 | 80, 82, 90, 93
USF | 86, 88, 95, 96 | 86, 88, 95, 96
CF | 07, 08, 0f, 14, 1b | 04, 06, 0a, 10, 16
Total | 37 | 37

2.4.3 Comparison with March 9N with Pause test
The test time of the March 9N with Pause test in Figure 2-3 can be quantified as $9 \cdot 2^N \cdot \mathrm{TP} + 2 \cdot \mathrm{TNDF}$. Using either of the test algorithms in Figure 2-4 (a) or (b), the test time is reduced to $5 \cdot 2^N \cdot \mathrm{TP} + 2 \cdot \mathrm{TNDF}$, where N is the number of memory address bits (N = X + Y).
In [13], the 65536 (64K) x 32 bit SRAM with X = 14 row address bits and Y = 2 column address bits is tested using a 10 ns test clock period TP and row-wise test address sequencing. TDH and TDL are 0.5 ms and 0.8 ms respectively, according to the SPICE simulation in [12]. Note that TDH and TDL are usually on the order of 100 ms; the values shown here were selected for ease of comparison only. Using the formulas in Sec. 2.2 and above, the test times of the two test algorithms are compared in Table 2-3. The table uses TNDF = max(TDL, TDH) = 0.8 ms for March 9N with Pause, while for March 5N TNDF = 0, because TDA = 1 x 10 ns x (2^14 - 1) x 2^2 ≈ 0.655 ms exceeds TDF = TDH = 0.5 ms.

Table 2-3 Comparison results
Parameter | March 9N with Pause | March 5N
TP (ns) / X / Y / N / M | 10 / 14 / 2 / 16 / 1 | 10 / 14 / 2 / 16 / 1
TDH / TDL (ms) | 0.5 / 0.8 | 0.5 / 0.8
TNDF (ms) | 0.8 | 0
Test time (ms) | 7.50 | 3.28

From Table 2-3, the test time of the proposed March 5N is about 56% shorter than that of the March 9N with Pause test (3.28 ms versus 7.50 ms). In summary, the proposed test algorithm is more time-efficient than the March 9N with Pause test. Importantly, this speedup is achieved without compromising fault and defect coverage.
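The entries of Table 2-3 follow directly from the two test-time expressions. The Python sketch below encodes each algorithm as a list of March elements and "Delay" markers, counts operations per cell and pauses, and evaluates ops x 2^N x TP + delays x TNDF. Only the operation and delay counts matter for this calculation, so the address orders are not modelled; the helper name `march_test_time_ns` is hypothetical.

```python
from typing import List, Union

Element = Union[List[str], str]  # a March element (list of operations) or the marker "Delay"


def march_test_time_ns(algorithm: List[Element], n_bits: int, tp_ns: int, tndf_ns: int) -> int:
    """Test time = (operations per cell) * 2^N * TP + (number of delays) * TNDF."""
    ops_per_cell = sum(len(e) for e in algorithm if isinstance(e, list))
    delays = sum(1 for e in algorithm if e == "Delay")
    return ops_per_cell * 2 ** n_bits * tp_ns + delays * tndf_ns


march_9n_pause = [["W0"], "Delay", ["R0", "W1"], "Delay", ["R1", "W0"], ["R0", "W1"], ["R1", "W0"]]
march_5n = [["W0"], "Delay", ["R0", "W1"], "Delay", ["R1", "W0"]]

t9 = march_test_time_ns(march_9n_pause, 16, 10, 800_000)  # TNDF = 0.8 ms (Table 2-3)
t5 = march_test_time_ns(march_5n, 16, 10, 0)              # TNDF = 0 (Table 2-3)
print(t9 / 1e6, t5 / 1e6, f"{1 - t5 / t9:.0%}")
```

This gives about 7.50 ms and 3.28 ms, i.e., a reduction of about 56%. Note that the gain over the 5/9 operation ratio comes entirely from eliminating the two explicit pauses.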
2.5 Summary
Although an IFA-based test flow relates test results to yield and defect levels better than methodologies based on traditional functional tests do, the test application time challenge remains, especially as memory capacity increases. Based on the IFA-based fault models and further defect analysis, we simplified the FFM2s, which, coupled with row/column redundancy, can be used to reduce test time. The predetermined delay time for DRFs is also formulated so that memories can be tested more efficiently.
By injecting and simulating the IFA functional faults in Table 2-1, we found that only Coupling Fault detection depends on the memory address sequencing. The simulation results show that the proposed March 5N test algorithms are time-efficient when row/column redundancy is considered. For example, for the 65536 (64K) x 32 bit e-SRAM in [13], the test time is about 56% shorter than that of the March 9N with Pause test.

2.6 References
[1] A. Allan, D. Edenfeld, W. H. Joyner, A. B. Kahng, M. Rogers and Y. Zorian, "2001 Technology Roadmap for Semiconductors", IEEE Computer, Vol. 35, No. 1, pp. 42-53, 2002.
[2] A. J. van de Goor, "Testing Semiconductor Memories, Theory and Practice", ComTex Publishing, Gouda, the Netherlands, 1998. Web: http://cardit.et.tudelft.nl/~vdgoor.
[3] J. P. Shen, et al., "Inductive fault analysis of MOS integrated circuits", IEEE Design and Test of Computers, Vol. 2, No. 6, pp. 13-26, 1985.
[4] S. Hamdioui and A. J. van de Goor, "An experimental analysis of spot defects in SRAMs: realistic fault models and tests", Proceedings of the Ninth Asian Test Symposium (ATS2000), 2000, pp. 131-138.
[5] T. M. Mak, D. Bhattacharya, C. Prunty, B. Rogers, N.
Ramadan, et al., "Cache RAM Inductive Fault Analysis with Fab Defect Modeling", Proceedings of International Test Conference, 1998, pp. 862-871.
[6] E. Rondey, Y. Tellier, S. Borri, "A silicon-based yield gain evaluation methodology for embedded-SRAMs with different redundancy scenarios", Proceedings of the Eighth IEEE International On-Line Testing Workshop, 2002, pp. 251-255.
[7] D. C. Kang and S. B. Cho, "An efficient built-in self-test algorithm for neighborhood pattern sensitive faults in high-density memories", Proceedings of the 4th Korea-Russia Int'l Symp. on Science and Tech., 2000, Vol. 2, pp. 218-223.
[8] J. Castillejos and V. H. Champac, "A forced-voltage technique to test data retention faults in CMOS SRAM by IDDQ testing", Proceedings of the 40th Midwest Symposium on Circuits and Systems, 1997, Vol. 1, pp. 433-436.
[9] V. H. Champac, V. Avendano and M. Linares, "Bit line sensing strategy for testing for data retention faults in CMOS SRAMs", Electronics Letters, Vol. 36, No. 14, pp. 1182-1183, 2000.
[10] A. Meixner and J. Banik, "Weak write test mode: an SRAM cell stability design for test technique", Proceedings of International Test Conference, 1996, pp. 309-318.
[11] R. Dekker, F. Beenker and L. Thijssen, "A realistic fault model and test algorithms for static random access memories", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 9, No. 6, pp. 567-572, 1990.
[12] Y. Zorian, et al., US Patent 5381419 (Jan. 1995).
[13] B. Wang and J. Yang, "SRAM Design and Optimization", University of British Columbia EECE 579 Homework 3, 2002.

Chapter 3 Fast Detection of Data Retention Faults and Other SRAM Cell Open Defects²

3.1 Introduction
Testing SRAMs is different from testing logic circuits, since SRAMs are closer to mixed-signal devices whose faulty behaviors are often analog in nature.
Thus, defect coverage provides a better estimate of overall test quality than fault coverage [1]. Test engineers typically seek to maximize defect coverage by applying different test algorithms, such as March tests [2].
In the most popular 6T (six-transistor) SRAM cells, two categories of open defects are undetectable under the normal read/write operations of March tests. The first category includes opens that cause Data Retention Faults (DRFs). Detecting these DRFs is known to be time-consuming, mainly because a pause on the order of hundreds of milliseconds before reading the memory under test is typically required [3]. The second category of open defects usually causes SRAM reliability degradation even though no faulty logical behavior emerges during March tests [4]. Due to submicron effects, some failures caused by opens with various resistance values are frequency dependent and thus require an at-speed test [5].
Since testing for DRFs is a very time-consuming process, much research has been devoted to reducing DRF test time. The technique presented in [6] counts the time required for a March algorithm to march through the entire memory space as part of the pause time of a DRF test. As a result, the pause time is reduced to some extent. Although this technique can effectively eliminate the pause time for SRAMs with very large capacities, this benefit cannot be achieved for SRAMs that are not so large, especially the small SRAMs often found in large numbers (e.g., hundreds) in System-on-Chip (SoC) platforms.

² A version of this chapter has been accepted. J. Yang, B. Wang, Y. Wu and A. Ivanov, "Fast Detection of Data Retention Faults and Other SRAM Cell Open Defects", to appear in the IEEE Transactions on Computer Aided Design (TCAD) of Integrated Circuits and Systems.
By applying simple quiescent/transient power supply current monitoring techniques, e.g., [7-14], most open defects can be detected without introducing additional delay time into the test sequences. However, current monitoring by itself is usually a fairly slow process, and its effectiveness in future deep sub-micron technologies remains to be seen. Alternatively, by introducing hardware modifications in addition to special test algorithms [15-17], special "write disturb" schemes on each column I/O have been designed to distinguish defective cells from good ones. Besides their high design effort and overhead, these techniques must be applied at low speed. Another technique, called the PFET-Test mode (PTEST), in which a weak NFET connects each bit line to GND, detects DRFs when the cells are read at a lower speed [18-19]. The technique presented in [20] detects DRFs by reading memories multiple times with the bit-line pre-charge circuitry disabled. In summary, the techniques presented in [18-20] rely on dedicated read operations, performed either at low speed or repeatedly with the bit-line pre-charge disabled. As a result, all of them lead to a long test time.
This chapter proposes a novel technique that we refer to as the Pre-Discharge Write Test Mode (PDWTM). Unlike previous solutions, which must be conducted at low speed, the proposed PDWTM can be performed at speed. Moreover, it can easily be merged with conventional March tests. In addition to DRFs, PDWTM detects other open defects in SRAM cells. Furthermore, the proposed solution requires little extra design effort and imposes negligible hardware and performance penalties.
The open defect models used in [7-20] consider symmetric and asymmetric defects only on the source or drain of the faulty PMOS.
Under a single-defect assumption, where at most one defect may exist within a single cell at a time, our examination of all possible operations on faulty cells reveals that open defects on the gate side of a faulty MOS device must also be considered, since these defects are not detectable by March tests but would cause reliability issues. Thus, when evaluating PDWTM, we use a comprehensive defect model that considers all potential open defects.
The remainder of this chapter is organized as follows. In Sec. 3.2, the faulty behaviors observed under all possible operations in the presence of an open defect are analyzed. PDWTM is presented in Sec. 3.3. In Sec. 3.4, a March 9N algorithm is used as an example to illustrate how to merge PDWTM into a March algorithm, with two implementation examples. To validate the proposed technique at the circuit level, two extreme cases of SRAM cells were designed according to the design considerations in [21] using a 0.18 µm technology; these designs and the simulation results are presented in Sec. 3.5. The advantages and limitations of PDWTM are discussed in Sec. 3.6, and some application requirements are discussed in Sec. 3.7. Finally, Sec. 3.8 summarizes the chapter.

3.2 Open Defects in an SRAM Cell
3.2.1 Background
In order to define all potential opens in a cell, the circuit schematic of a typical 6T SRAM cell is shown in Figure 3-1. In this diagram, each branch is labeled with a potential resistive open defect using the notation OCx and OCxc, where x denotes the node number. Due to the symmetric structure of the memory cell, opens at locations OCx and OCxc show complementary faulty behavior [22]. Therefore, only the faults denoted by OCx need to be considered when examining possible operations in the presence of an open.
A general write driver and pre-charge circuits are also shown in Figure 3-1, where WL denotes the word line, PRE is an active-low control signal for pre-charging the bit lines high, and WR0 and WR1 are active-high write control signals for writing a 0 or a 1 into the cell, respectively. According to their locations, the opens are classified into four categories: (i) faulty access NMOS opens, i.e., OC8, OC9 and OC10; (ii) faulty pull-down NMOS opens, i.e., OC3, OC4 and OC7; (iii) faulty PMOS opens, i.e., OC1, OC2 and OC5; (iv) power node opens, i.e., OC11 and OC12.

To describe the faulty behaviors of the open defects at the circuit level, the following terminology is defined³:

• Good cell: a cell is considered good, or defect-free, if its sense amplifier always returns the expected result.
• Strong fault (SF): a fault is considered a strong fault if its faulty behavior can be easily detected by sense amplifiers. In other words, a strong faulty behavior occurs when the state of the cell is incorrectly changed or cannot be changed, or when the sense amplifier returns an incorrect result.
• Weak fault (WF): a fault is considered a weak fault if its faulty behavior may or may not be observed by a read operation. Its detection depends on the memory design, e.g., the sensitivity of the sense amplifiers, and on the test conditions.
• Undetected fault (UF): a fault is considered undetectable if it cannot be activated or its faulty behavior cannot be observed by a read operation. In such cases, the faulty behavior never emerges from the defective cell, since the sense amplifier always outputs correct logic values.
• "W0/1" and "R0/1" denote a write and a read of 0/1, respectively.

³ These faults are defined based on designs without a sense amplifier. If an actual sense amplifier is added, a WF can only be detected if the sensitivity of the sense amplifier is good enough.

[Figure 3-1: Opens within a 6T SRAM cell]

3.2.2 Faulty Access NMOS (OC8, OC9, OC10)

For a defective cell with OC8, OC9 or OC10, it is impossible for a W0 to flip the cell from 1 to 0, because node A can never be pulled down when the resistance of these opens is sufficiently large. Consequently, a strong fault occurs during a subsequent R0 cycle. These defects are easily detectable with any March algorithm. When the resistance of the opens is small, however, they may not fail the conventional March tests (see Sec. 3.5).

If OC10 in the cell is a full open, it may cause the access NMOS transistor to be stuck-on. Such an OC10 can be easily detected, because the content of the defective cell would be altered when writing a different value into the other cells on the same column.

3.2.3 Faulty Pull-Down NMOS (OC3, OC4, OC7)

The voltage level of BL in this faulty cell can never be pulled down in a read cycle when the resistance of OC3, OC4 or OC7 is sufficiently large and the bit lines are pre-charged high. As a result, a strong fault occurs in any R0 cycle, since the voltage level of BL will be higher than that of BLb after WL is turned on. These defects are also easily detectable with March algorithms. However, they may become undetectable by March algorithms if their resistance is too small.

If OC7 is a full open, it is likely to cause the pull-down NMOS to become stuck-on. In this case, the cell behaves as a stuck-at-0, which is easily detectable.

3.2.4 Faulty PMOS (OC1, OC2, OC5)

For an SRAM cell, the pull-up PMOS is usually designed to be strong enough to retain the logic value in the cell indefinitely without refresh.
When the resistance of OC1 or OC2 becomes sufficiently large, the current through the PMOS may not be able to compensate for the leakage current of the pull-down NMOS, and the defective cell can no longer retain its logic value indefinitely. However, it is still capable of performing correct read and write operations provided that there is no long delay between the operations. The faulty behavior of such defects is often referred to as Data Retention Faults (DRFs), and they are undetectable by March algorithms. To detect a DRF, the most common practice is to write a cell and wait for a few hundred milliseconds before reading the cell for evaluation [23]. Even with the long wait cycle, the minimum detectable resistance is often greater than 100 GΩ [24]. When OC5 is a full open, which leaves the pull-up PMOS stuck-on, the defective cell will fail at W0. However, under some conditions, OC5 may cause no faulty behavior at all during normal read/write operations, but its existence will lead to reliability problems [4].

3.2.5 Power Node Opens (OC11, OC12)

Similar to defective PMOS cells, a faulty cell due to OC11 cannot retain its logic high value indefinitely, but it is still capable of performing correct read and write operations if there is no long delay between the operations. When performing a W0, bit line BL discharges node A and bit line BLb charges node B to Vcc - Vtn, where Vtn is the threshold voltage of the access NMOS. During a read cycle, the pull-down NMOS at node A discharges BL, while bit line BLb charges, or refreshes, the charge at node B. Therefore, as long as the defective cell is accessed often enough, the cell appears to be defect-free and thus escapes any typical March test. However, if the defective cell is left idle for a long period, node B may gradually lose its charge and cause a read operation to fail [16].
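To see why only very large pull-up opens turn into DRFs, a first-order DC balance can help: the node holding a 1 is fed through the defective PMOS and drained by the off-state leakage of the opposite pull-down NMOS. The sketch below is illustrative only. The leakage current and trip point are hypothetical values, not numbers from this thesis, and the model ignores transient, temperature and process effects.

```python
"""First-order DC estimate of the smallest pull-up open that causes a DRF.

Illustrative assumptions (not from the thesis): the storage node holding a 1
is fed by the pull-up PMOS in series with an open of resistance r_open and is
drained by a constant leakage current i_leak; the node settles where the IR
drop across the open equals what the leakage demands.
"""


def retained_voltage(vcc: float, r_open: float, i_leak: float) -> float:
    """Steady-state node voltage: Vcc minus the IR drop across the open."""
    return max(vcc - i_leak * r_open, 0.0)


def min_drf_resistance(vcc: float, v_trip: float, i_leak: float) -> float:
    """Smallest open resistance for which the node settles below v_trip."""
    return (vcc - v_trip) / i_leak


if __name__ == "__main__":
    vcc = 1.8        # supply of the 0.18 um model used in this chapter
    v_trip = 0.9     # hypothetical inverter trip point (Vcc / 2)
    i_leak = 10e-12  # hypothetical 10 pA pull-down leakage
    r_min = min_drf_resistance(vcc, v_trip, i_leak)
    print(f"minimum DRF resistance ~ {r_min / 1e9:.0f} GOhm")
```

With these hypothetical values the threshold lands near 90 GΩ, the same order of magnitude as the figure cited from [24]; a smaller leakage current pushes the threshold higher still.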
In the case of OC12, there is no low-resistance path to ground available to pull down the bit line during a read cycle, so the observability of the faulty behavior through a read operation may depend on the memory design, e.g., the sensitivity of the sense amplifier, and on the test conditions; i.e., a weak fault would occur during a read operation.

In summary, the above analysis shows that only OC1, OC2, OC5 and OC11 remain undetectable by typical March algorithms. Because these undetectable opens are all related to the pull-up PMOS components, we refer to such open defects as "PMOS Open Defects", or PODs for simplicity. In the rest of this chapter, we focus on the special techniques required to test the faults caused by any one of the PODs.

Moreover, in concept, "NMOS Open Defects (NODs)" can also cause DRFs, i.e., a cell with an NOD will not maintain a constant logic zero after some time. However, the read operation usually destroys this logic zero due to the faulty NMOS. In contrast, the read operation can maintain a logic one even if the cell has a POD. The essential reason for this asymmetry is that the bit lines are pre-charged high in typical March algorithms.

3.3 Pre-Discharge Write Test Mode (PDWTM)

This section presents the basic concept of the PDWTM and discusses its coverage of the open defects. In addition, it discusses its at-speed capability. Its implementations and its merging with March algorithms are discussed in Sec. 3.4.

3.3.1 The Concept

Before each normal read or write operation, both bit lines BL and BLb are pre-charged high, i.e., to Vcc (see Figure 3-1). During a read operation, the pull-down NMOS device in a selected cell discharges its bit line to create a voltage difference between BL and BLb. This difference is then interpreted by a sense amplifier. In normal memory operations, a write also begins with pre-charged high bit lines.
To write a 1 (W1), a powerful write driver pulls bit line BLb down to a strong ground (GND) and leaves BL floating high. The GND on BLb forces node B to 0, thus causing the PMOS to charge node A. Once a certain threshold voltage difference between nodes A and B is established, the cross-coupled inverters in the cell amplify the difference and eventually yield a stable state where node A is 1 and node B is 0, even after the memory cell is de-selected. Similarly, a write driver pulls down bit line BL for a write 0 operation, or W0.

The concept of the PDWTM is very simple. Instead of having a write operation begin with pre-charged high bit lines, the PDWTM uses a special write operation that begins with one or both bit lines pre-discharged low. This special write operation is capable of writing correct values into a good cell but is unable to write a cell with PMOS open defects (PODs). This inability to write a defective cell can easily be detected by a subsequent read operation. For easy reference, we name this special write operation PDW.

The PDW works as follows. We assume that a cell stores an initial value of 0, with node A being 0 and node B being 1. In addition, we assume that the bit lines are pre-discharged to the GND level and that we would like to perform a W1 on the cell. Before the W1 begins, due to the pre-discharge, both bit lines are at a floating GND level, which means that the voltage level is at GND but not driven by any active source. When the W1 begins, node B is pulled down to a strong GND by a powerful write driver when the word line (WL) is turned on. Initially, node A floats at the GND level, since its pull-down NMOS is turned off by the strong GND at node B and its bit line BL has been pre-discharged low.
For a good cell, the PMOS device at node A can easily charge the small capacitance of node A, with the access NMOS limiting the influence of bit line BL. Once a certain threshold voltage difference between nodes A and B is exceeded, the cross-coupled inverters in the cell act as an amplifier to quickly boost the difference and eventually cause the cell to flip its value, i.e., a successful W1. If the PMOS at node A is defective due to an open, e.g., OC1, both nodes A and B remain low, since there is no PMOS to charge node A and node B is pulled down by the write driver. Once the access NMOS devices are turned off, node A remains low and node B is pulled high by its PMOS device. As a result, the defective cell retains its old value. In other words, the W1 fails for the faulty cell.

3.3.2 Detection of PMOS Open Defects (PODs)

As discussed in Sec. 3.2, the NMOS-related open defects would cause a March test to fail. However, none of the PMOS open defects (PODs) is detectable by conventional March tests. This section discusses the detection of the PODs in more detail.

This section assumes that each special write (PDW) is followed by a read operation R1 for test response evaluation. For completeness, we also include a description of the operation of a good cell.

(i) Good Cell

Table 3-1 shows the voltage levels of the bit lines and memory cell storage nodes in a fault-free cell.

Table 3-1 Voltage levels of bit lines and memory cell storage nodes in a fault-free cell
Operation   Phase     BL         BLb            A         B
-           Initial   0 (f)      0 (f)          0         Vcc
W1          WL on     0 (f)      0              >0        0
W1          WL off    0 (f)      0 (f)          → Vcc     0
W1          WR        Vcc        Vcc            Vcc       0
R1          WL on     Vcc (f)    < Vcc - Vtn    Vcc       0

Initially, the bit lines are at the pre-discharged low level, shown as "0 (f)" in Table 3-1 for "floating" at 0 (GND).
In addition, the memory cell is assumed to store an initial value of 0, with node A being 0 and node B being Vcc. After the word line (WL) is turned on, node B is pulled down to 0 by a write driver and node A is charged by its PMOS to a level greater than 0, shown as ">0" in Table 3-1. Once a certain threshold voltage difference is reached between nodes A and B, the latch mechanism of the cell continues to amplify this difference even after the WL is turned off. This is the reason for the notation "→ Vcc" for node A with WL off in Table 3-1, indicating that node A continues to be pulled up even after WL is off. In Table 3-1, node A is pulled up to Vcc by the end of the write recovery, or WR (defined later), portion of the W1 operation. In fact, as discussed later, there are usually many clock cycles available for the PMOS to charge node A in a March test, since the March test does not access the same cell again after the PDW until the test next marches to this cell. The write recovery (WR) in Table 3-1 is a normal pre-charge-high operation at the end of a write that gets the bit lines ready for the next read operation R1. As shown in Table 3-1, the W1 in a good cell succeeds and the subsequent R1 returns a correct value.

(ii) Detection of OC1, OC2 and OC5

Table 3-2 shows the voltage levels of the bit lines and memory storage nodes in a cell made defective by OC1, OC2 or OC5, shown in Figure 3-1. As shown in Table 3-2, when the WL is turned on during the W1 cycle, node A remains at 0 if the resistance of the open defect is too large for the PMOS to charge it. In the meantime, node B stays slightly above 0, because the PMOS transistor at node B is turned on by the low voltage at node A. Once the WL is turned off, node A remains at 0 and node B is charged back to Vcc by its PMOS.
As a result, the defective cell fails to perform the W1, and the failure is detected by the subsequent R1.

Table 3-2 Voltage levels of bit lines and memory cell storage nodes in a faulty PMOS defective cell
Operation   Phase     BL             BLb        A    B
-           Initial   0 (f)          0 (f)      0    Vcc
W1          WL on     0 (f)          0          0    >0
W1          WL off    0 (f)          0 (f)      0    → Vcc
W1          WR        Vcc            Vcc        0    Vcc
R1          WL on     < Vcc - Vtn    Vcc (f)    0    Vcc

(iii) Detection of OC11

Table 3-3 lists the voltage levels of the bit lines and memory cell storage nodes in the faulty cell with an open defect at the Vcc node.

Although the power supply to the cell is abnormally resistive or even disconnected due to defect OC11, the memory cell is able to perform a normal write. This is because the normal write operation has both bit lines pre-charged high. However, the voltage at node B cannot reach Vcc after WL is turned off. Instead, because the pull-up PMOS is powerless, node B can only reach Vcc - Vtn through the bit line pre-charge.

Table 3-3 Voltage levels of bit lines and memory cell storage nodes in the cell with an open defect in Vcc
Operation   Phase     BL        BLb      A       B
-           Initial   0 (f)     0 (f)    0       Vcc - Vtn
W1          WL on     0 (f)     0        0       >0
W1          WL off    0 (f)     0 (f)    0       >0
W1          WR        Vcc       Vcc      0       >0
R1          WL on     BL < BLb           A < B

When the WL is turned on during the W1, node A remains at 0 due to the pre-discharged BL. Although node B is pulled down by a write driver, its voltage level is always greater than 0 due to the floating power node and the 0 value at node A, which keeps the PMOS at node B on. Since node A is always at a lower voltage level than node B, the pre-charged high bit lines during the following read operation help amplify this difference. As shown later in the chapter, depending on the duration of the read cycle, the subsequent R1 may or may not fail.

In summary, the PDW operation that begins with pre-discharged low bit lines is capable of detecting all PMOS open defects in a cell.
3.3.3 At-Speed Test Capability

As pointed out in Sec. 3.3.1, for a good cell, as long as a small voltage difference is established between nodes A and B while the word line is turned on, the cross-coupled inverters in the cell continue to amplify the difference even after the WL is turned off, eventually causing the memory cell to flip its value. This is because the capacitance at node A or B when the word line is off is much smaller than when the word line is on. Whether the PDWTM is capable of working at speed depends on two factors: (1) whether a voltage difference greater than the noise margin plus the cell offset can be established, so that the difference is amplified even after the WL is turned off; and (2) whether enough time is available for a good cell to flip its value before the WL is turned on again for a subsequent operation. The latter condition is guaranteed for March algorithms that contain a "read-modify-write", such as (R1 W0) or (R0 W1) (see Figure 3-3 for an example). After writing a cell, the algorithm does not access the same cell again until it has marched through the entire memory space and come back to the same cell. In practice, the memory space is typically much larger than one word. Marching through the entire memory space thus takes many clock cycles and provides enough time for a good cell to flip its value after a pre-discharge write (PDW). For March tests that do not contain a "read-modify-write", this condition can also be met, as validated by the experimental results for at-speed tests in Sec. 3.5.

The condition that a small voltage difference can be established, such that a good cell amplifies the difference even after the WL is turned off, is also always met, for the following reasons.
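The time-availability argument can be checked by counting cycles. The sketch below assumes one memory operation per clock cycle and two consecutive March elements that both march in ascending address order; it reports the shortest interval, over all addresses, between a cell's last access in one element and its first access in the next. The function names are illustrative, not part of any BIST tool.

```python
def op_stream(n_words: int, elements: list[list[str]]) -> list[tuple[int, int, str]]:
    """Flatten ascending March elements into (element, address, op) tuples,
    one tuple per clock cycle."""
    stream = []
    for elem_idx, ops in enumerate(elements):
        for addr in range(n_words):
            for op in ops:
                stream.append((elem_idx, addr, op))
    return stream


def min_revisit_gap(n_words: int, elements: list[list[str]], first: int) -> int:
    """Smallest number of cycles between the last access to an address in
    element `first` and its first access in element `first + 1`."""
    last_in_first: dict[int, int] = {}
    first_in_next: dict[int, int] = {}
    for cycle, (elem, addr, _op) in enumerate(op_stream(n_words, elements)):
        if elem == first:
            last_in_first[addr] = cycle
        elif elem == first + 1 and addr not in first_in_next:
            first_in_next[addr] = cycle
    return min(first_in_next[a] - last_in_first[a] for a in range(n_words))


if __name__ == "__main__":
    rmw = [["R0", "W1"], ["R1", "W0"]]  # two ascending read-modify-write elements
    for n in (1, 16, 1024):
        print(n, "words ->", min_revisit_gap(n, rmw, 0), "cycles")
```

For two ascending read-modify-write elements the gap is 2N - 1 cycles for every address: a single-word memory would leave only one cycle, while any realistic memory leaves thousands.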
First, the word line stays on relatively long during a write operation, because this duration is usually determined by read operations. During a read cycle, a word line must be kept on long enough for a memory cell to sufficiently discharge a heavily loaded bit line. Second, a very small voltage difference is sufficient to cause a good cell to flip, due to the cell symmetry and the nature of the latch mechanism. For example, if we assume that this difference must be greater than 5% of Vcc, 90 mV would suffice if Vcc = 1.8 V for a 0.18 µm technology. During a W1 with bit lines pre-discharged low, it is not difficult for the PMOS to charge node A to such a level, since the capacitance of node A is extremely small, with the access NMOS limiting the influence of bit line BL. Our SPICE simulations (reported later) on both low-power and high-speed memory cells revealed that such levels were easily achievable.

For a defective cell with a POD, running the PDWTM at-speed usually improves its detection. For example, consider a scenario where the PMOS at node A has a partial open defect with a reasonably high resistance. If the WL is kept on sufficiently long during the PDW, the defective PMOS might be able to slowly charge node A high. However, if the test is conducted at-speed, the same resistance in the defective PMOS may leave too little time to charge node A at all, thus causing the write to fail. This conjecture has been verified by our SPICE simulations, reported later in Table 3-5.

Two factors, namely the SRAM design style and noise, might affect the at-speed test capability of the PDWTM. Currently, the bit lines of SRAMs, including e-SRAMs, are usually pre-charged after the normal read and write operations.
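The at-speed argument can be made concrete with a first-order RC picture: during the PDW, node A starts at GND and is charged through the pull-up PMOS plus any series open, and the write succeeds only if node A crosses the 5%-of-Vcc (90 mV) threshold before the word line turns off. This sketch deliberately ignores the access NMOS, the bit line and the latch's regenerative feedback, and its resistance and capacitance values are hypothetical; it illustrates the trend only, not the SPICE results of Table 3-5. The two word-line windows (10 ns at low speed, 2.5 ns at speed for the low-power cell) follow from the clock budgets in Sec. 3.5.2.

```python
import math


def charge_time(r_total: float, c_node: float, frac: float) -> float:
    """Time for an RC node charged from 0 V toward Vcc to reach frac * Vcc."""
    return -r_total * c_node * math.log(1.0 - frac)


def min_detectable_open(t_wl: float, r_pmos: float, c_node: float,
                        frac: float = 0.05) -> float:
    """Smallest series open resistance that keeps node A below frac * Vcc for
    the whole word-line-on window t_wl, so that the PDW fails."""
    r_total = t_wl / (c_node * -math.log(1.0 - frac))
    return max(r_total - r_pmos, 0.0)


if __name__ == "__main__":
    R_PMOS = 20e3     # hypothetical on-resistance of the pull-up PMOS (ohm)
    C_NODE = 2e-15    # hypothetical effective capacitance of node A (F)
    for label, t_wl in (("low speed", 10e-9), ("at-speed, Cell_LP", 2.5e-9)):
        r_min = min_detectable_open(t_wl, R_PMOS, C_NODE)
        print(f"{label}: opens above ~{r_min / 1e6:.1f} MOhm make the PDW fail")
```

Because the threshold resistance scales linearly with the word-line window in this model, shrinking the window from 10 ns to 2.5 ns lowers the smallest open that makes the PDW fail by roughly a factor of four.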
However, in some asynchronous high-speed commodity SRAM designs, the bit line pre-charge of the normal read and write operations occurs at the beginning of the cycle. As a result, if the PDWTM is implemented in the straightforward way, extra time is required at the end of the cycle immediately preceding the cycle under test to discharge the bit lines and enter the PDWTM. In that case, the PDWTM cannot run at speed.

The effects of noise on running the PDWTM at speed can be analyzed based on Figure 3-2.

[Figure 3-2: Static Noise Margin for an SRAM cell]

At the beginning of the PDW, ignoring noise, the transition starts from the origin, since both nodes A and B are at GND. If noise moves the starting point above the Vs line in Figure 3-2, e.g., to point n, the cell will not flip for a W1; otherwise, it will flip correctly. Fortunately, the starting point is usually below the Vs line in Figure 3-2, because node A is at a weak GND while node B is at a strong GND during the W1. As the voltage difference between node A and node B increases, the cell flips even more quickly.⁴

⁴ This is to acknowledge Dr. Res Saleh, who suggested adding these extra discussions.

3.4 Implementations

We showed in Sec. 3.3 that a defective cell will fail a PDW while a good cell will succeed. Such a pass/fail decision is the same as that used in March algorithms. Consequently, the proposed PDWTM can be easily merged with any March algorithm.

There exist many ways to implement the PDWTM. This chapter provides two examples. One is a straightforward implementation of the PDWTM, which pre-discharges the bit lines low during the memory operation right before a PDW. The other uses the No Write Recovery Test Mode (NWRTM) [25] to set up the floating low bit lines using a write cycle before a PDW.
As mentioned earlier, the proposed PDWTM can be merged into any March algorithm. As an example, we use the extended March 9N with Pause test [26-27] shown in Figure 3-3. In Figure 3-3, "⇑" represents an increasing address sequence during the test, while "⇓" represents a decreasing address sequence. "Delay" denotes a specific time required for DRF detection. Moreover, "0" and "1" represent the test patterns and their complementary values, i.e., "5" and "A" of the checkerboard test patterns in [28], instead of only 0 and 1. To better illustrate the PDWTM, we divide the March 9N with Pause test algorithm into two stages: the first stage is a conventional March 9N and the second is a Pause test targeting the DRFs.

[Figure 3-3: The extended March 9N test with Pause test (a March 9N stage followed by a Pause Test stage containing Delay elements)]

3.4.1 A PDWMarch 9N Algorithm

This section presents a straightforward implementation of the PDWTM in the conventional March 9N. The algorithm, referred to as PDWMarch 9N, is shown in Figure 3-4.

[Figure 3-4: The PDWMarch 9N]

In Figure 3-4, R0^pr0 and R1^pr0 represent read 0 and read 1 operations, respectively, that pre-discharge the bit lines low at the end of the read, as opposed to pre-charging the bit lines high as in normal read operations.

With the R0^pr0 and R1^pr0 operations in the PDWMarch 9N, the write operations that immediately follow the R0^pr0 and R1^pr0 naturally become PDWs. Compared to the original March 9N algorithm shown in Figure 3-3, aside from eliminating the time-consuming pause test, the PDWMarch 9N simply replaces two of the R0 and R1 operations of the March 9N with R0^pr0 and R1^pr0, respectively.
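The transformation from March 9N to PDWMarch 9N can be expressed as a small rewrite over a list of March elements. Because Figures 3-3 and 3-4 did not survive extraction, this sketch assumes the common 9N element list ⇕(W0) ⇑(R0 W1) ⇑(R1 W0) ⇓(R0 W1) ⇓(R1 W0) and, as a hypothetical choice, converts the reads of the first (R0 W1) and (R1 W0) elements so that one PDW of each polarity results; the elements actually chosen in Figure 3-4 may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    order: str               # "up", "down" or "any" address order
    ops: tuple[str, ...]     # operations applied to each address


# Assumed 9N element list; the exact list of Figure 3-3 may differ.
MARCH_9N = (
    Element("any", ("W0",)),
    Element("up", ("R0", "W1")),
    Element("up", ("R1", "W0")),
    Element("down", ("R0", "W1")),
    Element("down", ("R1", "W0")),
)


def ops_per_word(alg: tuple[Element, ...]) -> int:
    """Operations applied to each word, i.e., the k in a kN algorithm."""
    return sum(len(e.ops) for e in alg)


def to_pdw(alg: tuple[Element, ...], element_indices: set[int]) -> tuple[Element, ...]:
    """Replace each read in the selected elements by its pre-discharge variant
    (R0 -> R0^pr0, R1 -> R1^pr0) so that the following write becomes a PDW."""
    out = []
    for i, e in enumerate(alg):
        if i in element_indices:
            ops = tuple(op + "^pr0" if op.startswith("R") else op for op in e.ops)
            out.append(Element(e.order, ops))
        else:
            out.append(e)
    return tuple(out)


PDW_9N = to_pdw(MARCH_9N, {1, 2})  # hypothetical choice of elements

if __name__ == "__main__":
    for e in PDW_9N:
        print(e.order, e.ops)
    print("ops per word:", ops_per_word(MARCH_9N), "->", ops_per_word(PDW_9N))
```

The operation count per word stays at 9, which is the property behind the claim that the PDWTM adds no test cycles.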
Since R0^pr0 and R1^pr0 take exactly the same amount of time to execute as the original R0 and R1, the PDWMarch 9N algorithm achieves full DRF coverage with zero DRF test time: it adds no extra test cycles to the conventional March 9N test and can also be applied at-speed.

To realize R0^pr0 and R1^pr0, the pre-charge control circuit of the memory must be modified. In the original memory, we define the signal PRE to be active low, enabling a pre-charge cycle that pre-charges the bit lines high, and define WR0 and WR1 to be active-high signals enabling the W0 and W1 drivers, respectively (see Figure 3-1). The modification required to implement R0^pr0 and R1^pr0 is shown in Figure 3-5.

The circuit shown in Figure 3-5 works as follows. During an R0^pr0 or R1^pr0 cycle, we set PDWTM = 1. After reading a cell, the memory's original control circuit generates an active-low pre-charge pulse PRE_in. Since PDWTM = 1, the normal pre-charge-high signal PRE is disabled. Instead, this pre-charge pulse forces both WR0 = 1 and WR1 = 1, so that the write drivers are reused to pre-discharge both bit lines low. During a following write operation, W0 or W1, PDWTM = 0. In this case, the pre-charge pulse PRE_in after the write operation turns on the normal pre-charge-high circuitry to pre-charge the bit lines high, ready for a subsequent read operation. PDWTM = 0 for all normal R0 and R1 operations as well.

[Figure 3-5: Memory control circuit modification for PDWMarch 9N]

As shown in Figure 3-5, the memory modification requires the addition of 5 gates. Therefore, the modified circuit imposes an extra gate delay on the pre-charge-high circuitry and on the two write drivers compared to the original circuit. Besides the memory modification described above, the PDWMarch 9N also requires a BIST controller to provide the mode signal PDWTM.
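The control behavior described above can be captured as a truth-level model. Figure 3-5 is not reproduced here, so the gate decomposition below is one hypothetical realization that matches the described behavior and happens to use five gates (one inverter, one AND, three ORs); the actual netlist may differ. Signals are 0/1 integers, with PRE and PRE_in active low and WR0/WR1 active high.

```python
def pdwtm_control(pre_in: int, wr0_in: int, wr1_in: int,
                  pdwtm: int) -> tuple[int, int, int]:
    """Return (PRE, WR0, WR1) driven to the column.

    Hypothetical five-gate realization:
      n_pre = NOT PRE_in          (inverter)
      pulse = n_pre AND PDWTM     (AND: pre-charge pulse seen during PDWTM)
      PRE   = PRE_in OR PDWTM     (OR: block pre-charge-high while PDWTM = 1)
      WR0   = WR0_in OR pulse     (OR: reuse the W0 driver to discharge BL)
      WR1   = WR1_in OR pulse     (OR: reuse the W1 driver to discharge BLb)
    """
    n_pre = 1 - pre_in
    pulse = n_pre & pdwtm
    pre = pre_in | pdwtm
    wr0 = wr0_in | pulse
    wr1 = wr1_in | pulse
    return pre, wr0, wr1


if __name__ == "__main__":
    print("PRE_in WR0_in WR1_in PDWTM -> PRE WR0 WR1")
    for pdwtm in (0, 1):
        for pre_in, wr0_in, wr1_in in ((0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 0, 1)):
            print(f"  {pre_in}      {wr0_in}      {wr1_in}     {pdwtm}   ->",
                  *pdwtm_control(pre_in, wr0_in, wr1_in, pdwtm))
```

With PDWTM = 1, the active-low pulse on PRE_in no longer reaches PRE; it instead asserts both write drivers, which is the pre-discharge-low behavior required by R0^pr0 and R1^pr0. With PDWTM = 0 the original signals pass through unchanged.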
3.4.2 An NWRMarch 11N Algorithm with No Write Recovery

This section presents the second implementation of the PDWTM. It uses the No Write Recovery Write (NWRW) operations in [25], where the bit line(s) are discharged during either an existing write cycle or an extra write cycle before a PDW. Regarding memory modification, this implementation requires only a single additional gate. To present the NWRTM, we define the following terms.

• Write Recovery (WR): WR is a bit line pre-charge-high operation after a write. The purpose of the WR is to prepare the bit lines for a possible read following the write.
• No Write Recovery Write (NWRW): NWRW is a write operation with No Write Recovery (NWR), i.e., the bit lines retain the values written after an NWRW operation completes.

Using the NWRW, the PDWTM yields another implementation that we refer to as the No Write Recovery March Test Algorithm (NWRMarch 11N), shown in Figure 3-6. In Figure 3-6, "NW0/NW1" represents writing 0/1 with write recovery disabled. The parts involving the NWRTM are referred to as NWRTM0 and NWRTM1, respectively.

As shown in Figure 3-6, the NWRMarch 11N eliminates the time-consuming pause test used in the conventional March test shown in Figure 3-3. In its place, one NW0 and one NW1 cycle are added. The purpose of NW0 (NW1) is to force bit line BL (BLb) low; because write recovery is disabled, that bit line remains at a floating 0 after the NW0 (NW1) completes. In the case of NW0, the W1 that follows right after becomes a PDW. During the W1 operation, BL stays at a floating 0 due to the NW0, while BLb is pulled down to a strong GND by the write driver, which corresponds to the same scenario as discussed in Sec. 3.3 for the PDW. The operation of a W0 following an NW1 is analogous, with the roles of BL and BLb exchanged.
[Figure 3-6: The NWRMarch 11N, with the NWRTM0 and NWRTM1 portions marked]

The memory modification required to implement NW0 and NW1 is quite simple and is shown in Figure 3-7.

[Figure 3-7: Circuit design of the NWRTM]

During an NW0 or NW1 operation, signal PDWTM = 1, which disables the pre-charge-high circuitry. As a result, the write recovery is disabled, causing bit line BL to float at 0 after NW0 and bit line BLb to float at 0 after NW1. In terms of performance penalties, the modification imposes an extra gate delay only on the pre-charge-high circuitry compared to the original one. Again, a BIST controller must provide the mode signal PDWTM in this case.

3.5 Experimental Results

To validate the proposed solutions, models of twelve defective cells were created, each corresponding to one of the twelve possible open defects discussed in Sec. 3.2. These models were created for a 0.18 µm technology using the Salicide 1P6M 1.8V SPICE model [29]. Each open defect is modeled by a single resistance ranging in value from 1 kΩ to 1000 GΩ on a logarithmic scale, similar to the approach adopted in [4].

In this experiment, the bit line voltage difference expected for sensing a correct value in a good cell is assumed to be 10% of Vcc, while the boundary bit line voltage difference between SFs and WFs is 7.5% of Vcc. In other words, when reading a defective cell, if the bit line voltage difference is greater than 7.5% of Vcc but its polarity is opposite to what is expected for a good cell, the fault is considered an SF. Otherwise, if the difference is less than 7.5% of Vcc, it is considered a WF.

3.5.1 SRAM Models

For simplicity, but without loss of generality, the SRAM simulation model shown in Figure 3-8 was developed.
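The read-classification rule and the resistance sweep above can be written as two small helpers. The text leaves one band unspecified (a correct-polarity difference between 7.5% and 10% of Vcc, or a wrong-polarity difference of exactly 7.5%); the sketch labels that band "unclassified" as an assumption. The sweep helper assumes one resistance value per decade, since the number of points per decade is not stated.

```python
import math


def classify_read(delta_v: float, expected_sign: int, vcc: float = 1.8) -> str:
    """Classify one read of a defective cell.

    delta_v       -- sensed bit line difference V(BL) - V(BLb), in volts
    expected_sign -- +1 or -1, the sign a good cell would produce
    """
    good = 0.10 * vcc    # difference assumed for a correct read of a good cell
    edge = 0.075 * vcc   # boundary between strong and weak faults
    magnitude = abs(delta_v)
    correct_polarity = delta_v * expected_sign > 0
    if correct_polarity and magnitude >= good:
        return "pass"
    if not correct_polarity and magnitude > edge:
        return "SF"
    if magnitude < edge:
        return "WF"
    return "unclassified"  # band not specified in the text (assumption)


def log_sweep(r_min: float = 1e3, r_max: float = 1e12,
              points_per_decade: int = 1) -> list[float]:
    """Open resistances from r_min to r_max (ohms) on a logarithmic grid."""
    decades = round(math.log10(r_max / r_min))
    n = decades * points_per_decade
    return [r_min * 10 ** (k / points_per_decade) for k in range(n + 1)]


if __name__ == "__main__":
    for dv in (0.20, -0.20, 0.05, 0.15):
        print(f"dV = {dv:+.2f} V -> {classify_read(dv, +1)}")
    print([f"{r:.0e}" for r in log_sweep()])
```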
This model includes one memory column I/O, i.e., one memory cell, two pre-charge PMOS devices (PRE assumed to be active low), two equivalent bit line loads, and two simplified write control NMOS gates. Both WR0 and WR1 are low during read cycles. In write cycles, WR0 (WR1) is high and WR1 (WR0) is low to write data 0 (1) into the cell.

To run the circuit-level simulations, the pre-charge PMOS devices and write control NMOS devices are assumed to be of the same size, i.e., 10/0.18 µm (width/length), and the bit line capacitance is assumed to be 1 pF. We use the traditional methodology in [21] to design two extreme cases for evaluating the detection of all the defects.

The design considerations for an SRAM cell stated in [21] require that a cell meet the following conditions: (i) Nondestructive Read Condition; (ii) Write Condition; (iii) Data Retention Condition; (iv) Power Dissipation Condition. To meet conditions (ii) and (iii), the PMOS devices in a cell should be as weak as possible to enhance the write condition (ii), but also strong enough to hold data in the data retention test (iii). Therefore, we chose the minimum size of 0.22 µm/0.18 µm to meet the cell design considerations and the minimum layout area requirements.

[Figure 3-8: The SRAM simulation circuit (pre-charge PMOS and write control NMOS sized 10/0.18, a 1 pF load on each of BL and BLb; sizes given as width/length in µm)]

For condition (iv), there is a trade-off between high-speed and low-power objectives. To validate the applicability of our algorithm to different design styles, we implemented a high-speed cell and a low-power cell for our evaluations.
To meet condition (i), the techniques reported in [30-31] were used to simulate the Static Noise Margin (SNM) in a read operation under the following simulation parameters and variations (corners): (i) 10% variation on supply voltage (i.e., 1.62-1.98 V); (ii) temperature effects (0-100 °C); (iii) MOS models (SS, SF, TT, FS, FF)⁵; (iv) SNM > 0.18 V (i.e., 10% of the power supply).

⁵ These two letters denote transistor process corners, where S means "Slow", F means "Fast" and T means "Typical".

Table 3-4 Transistor sizes of memory cells (width/length, µm)
Transistor        Cell_LP      Cell_HS
Pull-up PMOS      0.22/0.18    0.22/0.18
Pull-down NMOS    0.22/0.18    0.405/0.18
Access NMOS       0.22/0.29    0.22/0.18

According to these design considerations, Table 3-4 specifies the transistor sizes of the low-power cell (Cell_LP) and the high-speed cell (Cell_HS). The memory control circuits used in the simulation are shown in Figures 3-5 and 3-7.

3.5.2 Input Patterns and Validation Results

In our experiments, SPICE simulations were performed at a low speed and at "full speed". In the low-speed simulation, the clock period is set to 20 ns, of which 10 ns is for a read or write with the WL turned on and 10 ns is used for pre-charge. The same clock period is used for both the low-power and the high-speed cells. For the at-speed simulation, the clock period is determined by the fastest possible read operation, because reading is usually slower than writing. Due to the performance difference, the duration for which the WL is on differs between the low-power and high-speed cells. However, the same pre-charge period, i.e., 1.5 ns, is used for both. As a result, a clock period of 4 ns is used for the low-power cell and a clock period of 3.3 ns is used for the high-speed cell in the at-speed simulation.

In the experiments, three sets of input patterns were used. The first set is for the traditional March 9N test and a pause test.
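The clock budgets above reduce to simple arithmetic: the word-line-on window is the clock period minus the pre-charge window. The sketch below only restates the stated periods and pre-charge times and derives the implied WL-on windows and clock frequencies; the helper name is illustrative.

```python
def wl_on_ns(period_ns: float, precharge_ns: float) -> float:
    """Word-line-on window: clock period minus the pre-charge window."""
    return period_ns - precharge_ns


TIMING_NS = {  # (clock period, pre-charge window) as stated in Sec. 3.5.2
    "low speed, both cells": (20.0, 10.0),
    "at-speed, Cell_LP": (4.0, 1.5),
    "at-speed, Cell_HS": (3.3, 1.5),
}

if __name__ == "__main__":
    for name, (period, pre) in TIMING_NS.items():
        print(f"{name:22s} WL on {wl_on_ns(period, pre):4.1f} ns, "
              f"clock {1e3 / period:6.1f} MHz")
```

The derived windows (10 ns, 2.5 ns and 1.8 ns) show how much less time a defective PMOS has to charge its node during an at-speed PDW than during the low-speed test.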
The waveforms of the validation results using the pause test are shown in Figure 3-9. The second and third input patterns are for the NWRMarch 11N and the PDWMarch 9N, respectively. Figures 3-10 (I) and (II) show the waveforms of the validation results using the NWRMarch 11N and the PDWMarch 9N, respectively. In Figures 3-10 (I) and (II), the memory operations used in the experiments are illustrated at the bottom of each figure. For comparison, the waveforms for a defect-free cell are shown in Figures 3-9 (a), 3-10 (I)(a) and 3-10 (II)(a). Figures 3-9 (b), 3-9 (c), 3-10 (I)/(II)(b) and 3-10 (I)/(II)(c) show that the pause test and the NWRMarch 11N/PDWMarch 9N can all detect OC1 and OC2. However, only the NWRMarch 11N/PDWMarch 9N can detect both defects at speed. In the pause test simulation, the memory cell loses its value after some delay, and an incorrect 0 is returned in the subsequent R1 cycle. In the NWRMarch 11N/PDWMarch 9N, the fault is triggered during a W1 cycle and detected in the following R1 cycle. As shown in Figures 3-9 (d) and (e), the pause test cannot detect OC5 or OC11. In comparison, the NWRMarch 11N and the PDWMarch 9N are both capable of detecting these defects at speed and at low speed, as illustrated in Figures 3-10 (I)/(II)(d) and 3-10 (I)/(II)(e). The pause test is unable to detect OC5 because the defect is a resistive open: the voltage level of the defective PMOS gate is always equal to that of node B regardless of its resistance value. As a result, during the pause test, this faulty cell does not behave as a data retention fault and thus will not be detected by the pause test. This has also been confirmed in [4]. If the resistance of this open were infinite, the pause test might or might not detect this defect, depending on the voltage conditions at the faulty PMOS gate node, since this node would then be floating.
In the case of OC11, Figure 3-9 (e) shows that the voltage at node B is always low and never exceeds that of node A during the pause. Due to charge sharing between the bit lines and the memory cell's latch, this faulty cell returns to the 1 state while WL is on in the following read cycle. It therefore still returns the correct value 1 in the R1 cycle regardless of the pause duration, even though it effectively causes a data retention fault. This is shown in the portion of Figure 3-9 (e) that corresponds to the R1 operation. In addition to the PMOS open defects (PODs), simulations were also conducted for all twelve possible open defects shown in Figure 3-1. The simulation results are listed in Table 3-5 in terms of fault type and detectability for both low-speed and at-speed tests. The fault types in Table 3-5 were defined in Sec. 3.2, and detectability is quantified by the minimum open resistance that the applied tests can detect. Since the detailed implementation of the WWTM in [17] is not available in the open literature, its detectabilities cannot be listed in Table 3-5 for comparison. Moreover, since opens other than the PODs can be detected by typical March algorithms, the pause test's detection capability in terms of resistance values is not listed for them in Table 3-5. As Table 3-5 shows, the proposed PDWMarch 9N and NWRMarch 11N are able to detect all twelve defects, while March 9N plus the pause test can detect only ten of the twelve. Moreover, in all low-speed cases, the minimum detectable resistance values are equal to or less than those of March 9N plus the pause test, and in many cases our new algorithms achieve a much lower minimum detectable resistance. The same can be claimed for the at-speed test cases as well.
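The low-speed claims above can be checked mechanically against the values listed in Table 3-5. The sketch below encodes only the low-speed columns and treats "Undetectable" as infinite resistance. As an assumption, it reads the two pause-test numbers given for OC1 and OC2 as Cell_LP/Cell_HS low-speed values. The dictionary and function names are illustrative.

```python
import math

INF = math.inf  # "Undetectable" entries of Table 3-5

# Low-speed minimum detectable open resistance in ohms, as (Cell_LP, Cell_HS).
# Baseline: March 9N, or the pause test for OC1/OC2 (assumed to be LP/HS
# low-speed values, since the table gives only two numbers for it).
BASELINE = {
    "OC1": (1000e9, 100e9), "OC2": (100e9, 100e9), "OC3": (0.1e6, 0.1e6),
    "OC4": (100e3, 10e3), "OC5": (INF, INF), "OC6": (100e6, 100e6),
    "OC7": (1e9, 0.1e9), "OC8": (100e3, 100e3), "OC9": (100e3, 100e3),
    "OC10": (100e6, 100e6), "OC11": (INF, INF), "OC12": (1e6, 1e6),
}
PDWTM = {
    "OC1": (100e3, 100e3), "OC2": (1e6, 0.1e6), "OC3": (0.1e6, 0.1e6),
    "OC4": (100e3, 10e3), "OC5": (1e9, 10e9), "OC6": (100e6, 100e6),
    "OC7": (0.1e9, 0.1e9), "OC8": (10e3, 10e3), "OC9": (10e3, 10e3),
    "OC10": (100e6, 100e6), "OC11": (10e6, 10e6), "OC12": (1e6, 1e6),
}


def detected(table):
    """Defects with a finite minimum detectable resistance for both cells."""
    return {d for d, vals in table.items() if all(math.isfinite(v) for v in vals)}


def never_worse(candidate, baseline):
    """True if the candidate never needs a larger resistance than the baseline."""
    return all(c <= b
               for defect, vals in candidate.items()
               for c, b in zip(vals, baseline[defect]))


if __name__ == "__main__":
    print("PDWTM detects", len(detected(PDWTM)), "of", len(PDWTM))
    print("9N + pause detects", len(detected(BASELINE)), "of", len(BASELINE))
    print("PDWTM never worse at low speed:", never_worse(PDWTM, BASELINE))
```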
To study the sensitivity of the PDWTM to the assumptions made earlier, further simulations were conducted. The results show that moving the dividing line that discriminates SFs from WFs away from 7.5% of Vcc does not change the fault types shown in Table 3-5. In addition, increasing the bit-line capacitive loading does not change the fault types either; however, it does change the detection capability to some degree. For example, when the bit-line loading was increased from 1 pF to 10 pF, the detectability of OC2 changed from 1 MΩ to 0.1 MΩ, and that of OC7 changed from 0.1 GΩ to 10 GΩ. The detectabilities of all the other defects did not change.

[Figure 3-9 Detections of opens for a fault-free cell (a), OC1 (b), OC2 (c), OC5 (d) and OC11 (e) when applying a pause test (node voltages versus time; each panel applies a 250 ms delay followed by an R1 read)]

In summary, the PDWMarch 9N and the NWRMarch 11N can detect all twelve open defects, and their detection capability in terms of resistance value is better than that of March 9N plus the pause test for SRAM cells designed according to the design considerations in [21].

[Figure 3-10 (I) NWRMarch 11N validation results for a fault-free cell (a), OC1 (b), OC2 (c), OC5 (d) and OC11 (e) (node A voltage versus time, at-speed test of the low-power cell with no write-recovery cycle; operation sequence: Initialize, NW0, W1, R1, NW1, W0, R0, NW0, W1, R1)]

[Figure 3-10 (II) PDWMarch 9N validation results for a fault-free cell (a), OC1 (b), OC2 (c), OC5 (d) and OC11 (e) (node A voltage versus time, at-speed test of the low-power cell using W0; R0^pr0; W1; R1^pr0)]

Table 3-5 Open defects detection capabilities of March 9N, Pause test and PDWTM
(detection capability = minimum detectable open resistance, given as LP(low-speed, at-speed) / HS(low-speed, at-speed))

Open defect   Test algorithm   Detection capability
OC1           9N               Undetectable
              Pause            1000 / 100 (GΩ)
              PDWTM            (100, 10) / (100, 100) (kΩ)
OC2           9N               Undetectable
              Pause            100 / 100 (GΩ)
              PDWTM            (1, 0.1) / (0.1, 0.1) (MΩ)
OC3           9N               (0.1, 1) / (0.1, 0.1) (MΩ)
              PDWTM            (0.1, 0.1) / (0.1, 0.1) (MΩ)
OC4           9N               (100, 1000) / (10, 100) (kΩ)
              PDWTM            (100, 100) / (10, 10) (kΩ)
OC5           9N               Undetectable
              Pause            Undetectable
              PDWTM            (1, 0.01) / (10, 10) (GΩ)
OC6           9N               (100, 10) / (100, 100) (MΩ)
              PDWTM            (100, 1) / (100, 100) (MΩ)
OC7           9N               (1, 0.1) / (0.1, 0.1) (GΩ)
              PDWTM            (0.1, 0.01) / (0.1, 0.1) (GΩ)
OC8           9N               (100, 100) / (100, 100) (kΩ)
              PDWTM            (10, 10) / (10, 10) (kΩ)
OC9           9N               (100, 100) / (100, 100) (kΩ)
              PDWTM            (10, 10) / (10, 10) (kΩ)
OC10          9N               (100, 10) / (100, 10) (MΩ)
              PDWTM            (100, 0.1) / (100, 100) (MΩ)
OC11          9N               Undetectable
              Pause            Undetectable
              PDWTM            (10, 1) / (10, 10) (MΩ)
OC12          9N               (1, 10000) / (1, 1) (MΩ)
              PDWTM            (1, 10) / (1, 1) (MΩ)

Footnotes to Table 3-5 (fault types): 6 SF for all cells during low-speed test and for HS cells during at-speed test; WF for LP cells during at-speed test. 7 WF for all cells. 8 WF for all cells.

3.6 Discussions
3.6.1 Test Time, Defect Coverage and Detection Capability
We used the same assumptions as in [17] for the test time comparison, i.e., we assumed a memory array with 128 word lines, a 250 ms pause time for each pause during the pause test and a clock period of 50 ns. We also assume that the selected test algorithm is March 9N.

Table 3-6 DRF test time comparisons
Test                  Test patterns                                      Incremental test time
Pause test            W0; Delay; R0; W1; Delay; R1                       500 ms
WWTM                  W0; R0; W1; R1                                     26 µs
IDDQ                  W0; R0; W1; R1                                     26 µs
PDWTM: NWRMarch 11N   NW0; NW1                                           < 13 µs
PDWTM: PDWMarch 9N    Replace pre-charge high with pre-discharge low     0 µs
                      during some read cycles

Table 3-6 shows the test time for several different test methods. Because the methods differ only in the detection of defects that cause DRFs, only the test time required for detecting these defects is listed.
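The incremental times in Table 3-6 follow from simple arithmetic under the assumptions taken from [17]. In this sketch, the helper `incremental_time` is hypothetical. It counts extra clock cycles per address over 128 word lines at 50 ns plus any 250 ms pauses, and it counts only the two pauses for the pause test.

```python
# Incremental DRF test time for the schemes of Table 3-6, using the
# assumptions borrowed from [17]: 128 word lines, 50 ns clock, 250 ms pauses.
WORD_LINES = 128
CLOCK_PERIOD_S = 50e-9
PAUSE_S = 250e-3


def incremental_time(extra_cycles_per_address: int = 0, pauses: int = 0) -> float:
    """Seconds added on top of March 9N: extra clock cycles plus pause delays."""
    cycles = extra_cycles_per_address * WORD_LINES
    return cycles * CLOCK_PERIOD_S + pauses * PAUSE_S


SCHEMES = {
    "Pause test": incremental_time(pauses=2),                     # two 250 ms delays dominate
    "WWTM / IDDQ": incremental_time(extra_cycles_per_address=4),  # W0 R0 W1 R1
    "NWRMarch 11N": incremental_time(extra_cycles_per_address=2), # NW0, NW1
    "PDWMarch 9N": incremental_time(),                            # no extra cycles
}

if __name__ == "__main__":
    for name, seconds in SCHEMES.items():
        print(f"{name:>12}: {seconds * 1e6:12.1f} us")
```

With these inputs the sketch gives 500 ms, 25.6 µs (about 26 µs) and 12.8 µs (under 13 µs), matching the table entries.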
For a typical pause test with a 250 ms delay, the test time is about 500 ms because two such delays are required (see Table 3-6). The test time for both the Weak Write Test Mode (WWTM) in [17] and the IDDQ test schemes in [7-8, 11-13] is about 26 µs, due to the 4×128 test cycles and the extra time required for exercising these schemes. Assuming the PDWTM is conducted at the same low speed as the WWTM in [17], the incremental test time resulting from the two extra NWRW cycles at each address is less than 13 µs in the NWRMarch 11N. The incremental test time for the DRFs using the PDWMarch 9N is zero, as explained in Sec. 3.4. In fact, since the PDWTM is capable of running at speed, the incremental test time of the NWRMarch 11N can be significantly less than 13 µs.

3.6.2 Design Efforts and Area / Performance Penalties
The memory modifications required for the NWRMarch and the PDWMarch are shown in Figures 3-4 and 3-6, respectively. Implementing the NWRMarch requires only a single OR gate to disable the pre-charge high circuitry during an NWRW cycle. The additional hardware requirement for the PDWMarch is five gates. The additional design effort for these modifications is therefore nearly negligible, and so are the area penalties due to the PDWTM. In terms of performance overhead, the NWRMarch imposes a single gate delay on the pre-charge high circuitry, while the PDWMarch imposes a single gate delay on the pre-charge high circuitry as well as on the write drivers. In comparison, previous DFT schemes require not only a modification of the memory control logic for entering a special test mode, such as the WWTM in [17], but also special DFT circuitry for each and every bit line.
For example, a weak RAM write (WRW) circuit is added to each bit line in the WWTM in [17], and a current sensor is required in the IDDQ tests in [7-8, 11-13]. Modifying all the bit lines clearly requires significantly more design effort and incurs higher area/performance penalties. As pointed out in [17], the design effort for the WRW can amount to roughly two engineer-months.

3.6.3 Separate DRF Test Using the PDWTM
The PDWTM concept has been shown to merge easily with a conventional March algorithm. However, if desired, a separate DRF test can also be derived from the PDWTM and applied after a conventional March test to enhance fault coverage. Using the PDWTM concept, the test would look like that shown in Figure 3-11. This provides an alternative solution when an existing March test cannot be disturbed or modified.

[Figure 3-11 Separate DRF test using PDWTM: (a) separate DRF test based on PDWMarch, with March elements (R0^pr0 W1), (R1^pr0 W0), (R0); (b) separate DRF test based on NWRMarch, with March elements (NW0 W1), (R1 NW1 W0), (R0)]

3.6.4 Selections between PDWMarch and NWRMarch
As stated in Sec. 3.4, the PDWTM concept can be applied to any March algorithm. The PDWMarch 9N and the NWRMarch 11N are two implementation examples obtained by applying the PDWTM concept to the March 9N algorithm. For easy reference, we simply call these two merged algorithms PDWMarch and NWRMarch, respectively. Both algorithms achieve full coverage of the PMOS open defects (PODs). However, the PDWMarch achieves such coverage without additional cycles, whereas the NWRMarch, using the NWR technique, requires additional cycles for NW0 and NW1.
However, when applied to some other algorithms, the NWR technique can also achieve DRF coverage without additional cycles. One such algorithm is the Symmetric March G algorithm shown in Figure 3-12 [2]. Figure 3-13(a) shows a Symmetric NWRMarch G derived using the NWR technique. Comparing Figures 3-12 and 3-13(a) reveals that the derived Symmetric NWRMarch G achieves full POD coverage without any extra cycle.

For comparison, Figure 3-13(b) shows a Symmetric PDWMarch G derived by applying the PDWTM concept to the original Symmetric March G in a straightforward way. In Figure 3-13(b), W0^pr0 and W1^pr0 represent a write 0 and a write 1 operation, respectively, that pre-discharge the bit lines low at the end of the operation. With W0^pr0 and W1^pr0, the write operations that immediately follow naturally become the special pre-discharge writes (PDWs).

[Figure 3-12 Symmetric March G test algorithm: (W0); (R0 W1 R1 W0 W1); (R1 W0 R0 W1); (R1 W0 W1 W0); (R0 W1 R1 W0); Delay; (R0 W1 R1); Delay; (R1 W0 R0)]

[Figure 3-13 NWRMarch G (a) / PDWMarch G (b) based on Symmetric March G: (a) (W0); (R0 W1 R1 NW0 W1); (R1 W0 R0 W1); (R1 W0 NW1 W0); (R0 W1 R1 W0); (R0 W1 R1); (R1 W0 R0); (b) (W0); (R0 W1 R1 W0^pr0 W1); (R1 W0 R0 W1); (R1 W0 W1^pr0 W0); (R0 W1 R1 W0); (R0 W1 R1); (R1 W0 R0)]

In summary, the straightforward implementation of the PDWTM can always achieve full coverage of the PODs without any extra cycle. When the NWR technique is used, it may or may not require extra cycles, depending on the algorithm to which the technique is applied. On the other hand, as discussed in Sec. 3.4, the implementation using the NWR technique always uses less hardware and imposes smaller performance penalties than the straightforward implementation of the PDWTM.
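One way to see the "no extra cycle" argument is to count operations per cell in the two algorithms. The sketch below is a minimal March-notation counter written for this comparison, not a tool from the thesis. The element lists are a reading of Figures 3-12 and 3-13(a), with address orders omitted because they do not affect the count.

```python
import re

# Each March element is a space-separated list of operations; "Delay" marks a
# DRF pause. Address orders are omitted: they do not change the operation count.
SYMMETRIC_MARCH_G = [
    "W0", "R0 W1 R1 W0 W1", "R1 W0 R0 W1", "R1 W0 W1 W0", "R0 W1 R1 W0",
    "Delay", "R0 W1 R1", "Delay", "R1 W0 R0",
]
NWR_MARCH_G = [
    "W0", "R0 W1 R1 NW0 W1", "R1 W0 R0 W1", "R1 W0 NW1 W0", "R0 W1 R1 W0",
    "R0 W1 R1", "R1 W0 R0",
]

# Reads, writes, no-write-recovery writes, and optional ^pr0 (pre-discharge) suffix.
OP = re.compile(r"^(N?W[01]|R[01])(\^pr0)?$")


def complexity(elements):
    """Return (operations per cell, number of delay elements)."""
    ops = delays = 0
    for element in elements:
        if element == "Delay":
            delays += 1
            continue
        for token in element.split():
            if not OP.match(token):
                raise ValueError(f"unknown operation {token!r}")
            ops += 1
    return ops, delays


if __name__ == "__main__":
    print("Symmetric March G:", complexity(SYMMETRIC_MARCH_G))
    print("NWRMarch G:       ", complexity(NWR_MARCH_G))
```

Both lists come to 24 operations per cell; the NWR version simply drops the two delay elements.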
In addition, if R0^pr0 and R1^pr0 must be used because of the choice of algorithm, e.g., the March 9N, readers should be aware that R0^pr0 and R1^pr0 assume that sufficient time is available to pre-discharge the bit lines from Vcc to GND after a read. This assumption can often be met by synchronous memories, especially embedded ones. For memories that do not satisfy it, making R0^pr0 and R1^pr0 work at speed might be challenging. In comparison, the implementation using the NWR technique does not impose this assumption, since NW0 and NW1 are essentially write operations with the subsequent pre-charge high disabled. As a result, NW0 and NW1 should always work at the same speed as the normal W0 and W1.

In the PDWMarch algorithm, both bit lines are discharged low before each PDW. At the end of the PDW, both bit lines must be charged back to Vcc. In comparison, a normal write requires only one bit line to be charged or discharged. The simultaneous charge or discharge of both bit lines for the PDW can cause additional power dissipation. In the NWRMarch algorithm, each bit line is discharged in a separate clock cycle, but both are charged back to Vcc at the same time. Therefore, the NWRMarch would consume less power than the PDWMarch, but still more than a general March test algorithm.

In conclusion, the optimal implementation of the PDWTM depends on various parameters, namely the SRAM design style, the hardware overhead, the performance impact and the choice of March algorithm.

3.6.5 Limitations and Future Work
The smaller the minimum detectable open resistance, the better the detection capability. Compared to the pause test, the most popular test for DRFs, the PDWTM and the derived algorithms achieve much better detection capability.
For example, the PDWTM is able to detect resistive opens between 10 kΩ and 1 MΩ for OC1 and OC2, while the pause test cannot detect anything less than 100 GΩ (see Table 3-5 and [29]). In addition, the PDWTM is also able to detect other opens (e.g., OC5 and OC11) that are undetectable by the pause test and traditional March tests. According to [32], a defect is considered an open when its resistance amounts to a couple of MΩ. However, the minimum detectable resistance for some of these opens can be as high as 10 MΩ or even 10 GΩ when the PDWTM is applied, so further improvement is desired.

As shown in Sec. 3.5, the proposed PDWTM has been validated based on the memory cell design criteria in [21] in a 0.18 µm technology across all process and operating-condition corners. In addition, the PDWTM has also been validated in a 0.13 µm technology, although only at a typical operating condition due to technology-file limitations. The transistor sizes (width/length, in µm) of this memory cell are as follows: pull-up PMOS, 0.16/0.12; pull-down NMOS, 0.23/0.12; access NMOS, 0.17/0.14. Whether the PDWTM applies successfully to future scaled technologies is still under investigation.

In general, the proposed PDW consumes more power than a normal write because both bit lines are pre-charged simultaneously at the end of the PDW. In the straightforward implementation of the PDW, additional power is also dissipated because both bit lines are pre-discharged at the same time before a PDW cycle. In fact, pre-discharging both bit lines before a PDW is not necessary from a fault coverage point of view. As a further improvement, one could pre-discharge only one bit line, selected according to the value to be written during the PDW. For example, if the PDW is a W1, bit line BL must be pre-discharged before the write operation.
On the other hand, if the PDW is a W0, only bit line BLb needs to be pre-discharged before the write. However, such a power improvement would make the pre-discharge control circuitry more complex.

In some high-speed applications where the bit lines are pre-charged to Vcc/2, the analog differential sense amplifier outputs the data quickly during a normal read, since the two bit lines move in opposite directions. However, pre-charging the bit lines to Vcc/2 can potentially make the cell unstable during a read. To improve read stability and the cell read noise margin, special voltages or sizes are designed for these NMOS transistors. Between a typical test mode and the PDWTM, only the write cycles differ; the read cycles are the same. Because the bit lines are pre-charged to Vcc/2, committing a normal write may require adjusting the memory device sizes. These sizes may fall outside the range in which the PDWTM applies, since this DFT technique was designed and validated for a typical e-SRAM design, e.g., one whose bit lines are pre-charged high.

Another type of high-speed e-SRAM pre-charges the bit lines to Vcc - Vt. For this pre-charge technique, simulations show no difference between the cases with bit lines pre-charged to Vcc and those with bit lines pre-charged to Vcc - Vt.

3.7 Summary
Testing for data retention faults (DRFs) and other open defects in SRAM cells is difficult and time-consuming. Many design-for-test (DFT) techniques have been developed to deal with the long test time required for DRFs [7-20]. These techniques, however, all impose substantial hardware requirements, performance penalties and design efforts. Furthermore, they can only be applied at low speed and may not be able to detect all SRAM open defects.
Based on a comprehensive defect model, this chapter proposed a novel DFT solution referred to as the Pre-Discharge Write Test Mode (PDWTM). Our SPICE simulations have demonstrated that the PDWTM is capable of detecting all open defects in SRAM cells and of running at speed. When the PDWTM is applied to existing March algorithms, its ability to achieve full coverage of DRFs and other SRAM cell open defects has also been demonstrated. Two example implementations of the PDWTM have been provided. The added hardware, performance penalties and design effort are negligible.

3.8 References
[1] A. Jee, "Defect-oriented analysis of memory BIST tests", Proceedings of the 2002 IEEE International Workshop on Memory Technology, Design and Testing (MTDT 2002), 2002, pp. 7-11.
[2] A. J. van de Goor, "Using march tests to test SRAMs", IEEE Design & Test of Computers, Vol. 10, Issue 1, pp. 8-14, 1993.
[3] R. Rajsuman, "An algorithm and design to test random access memories", Proceedings of International Symposium on Circuits and Systems (ISCAS), 1992, Vol. 1, pp. 439-442.
[4] T. M. Mark, D. Bhattacharya, C. Prunty, B. Roeder, N. Ramadan, J. Ferguson and J. Yu, "Cache RAM inductive fault analysis with fab defect modeling", Proceedings of International Test Conference (ITC), 1998, pp. 862-871.
[5] T. J. Powell, W. T. Cheng, J. Rayhawk, O. Samman, P. Policke and S. Lai, "BIST for Deep Submicron ASIC Memories with High Performance Application", Proceedings of International Test Conference (ITC), 2003, Vol. 1, pp. 386-392.
[6] B. Wang, J. Yang and A. Ivanov, "Reducing test time of embedded SRAMs", Proceedings of the 2003 IEEE International Workshop on Memory Technology, Design and Testing (MTDT 2003), 2003, pp. 47-52.
[7] V. H. Champac, J. Castillejos and J. Figueras, "IDDQ testing of opens in CMOS SRAMs", Proceedings of 16th IEEE VLSI Test Symposium (VTS), 1998, pp. 106-111.
[8] J. Castillejos and V. H. Champac, "A forced-voltage technique to test data retention faults in CMOS SRAM by IDDQ testing", Proceedings of the 40th Midwest Symposium on Circuits and Systems, 1997, Vol. 1, pp. 433-436.
[9] D. H. Yoon, H. S. Kim and S. H. Kang, "Dynamic power supply current test for CMOS SRAM", Proceedings of the 6th International Conference on VLSI and CAD, 1999, pp. 399-402.
[10] D. H. Yoon, H. S. Kim and S. Kang, "Dynamic power supply current testing for open defects in CMOS SRAMs", Electronics and Telecommunications Research Institute (ETRI) Journal, Vol. 23, No. 2, pp. 77-84, 2001.
[11] C. Koo, T. Toms, J. Jelemensky, E. Carter and P. Smith, "High reliability CMOS SRAM with built-in soft defect detection", Digest of Technical Papers, Symposium on VLSI Circuits, 1989, pp. 75-76.
[12] C. Kuo, T. Toms, B. T. Neel, J. Jelemensky, E. A. Carter and P. Smith, "Soft-defect detection (SDD) technique for a high-reliability CMOS SRAM", IEEE Journal of Solid-State Circuits, Vol. 25, Issue 1, pp. 61-67, 1990.
[13] J. Liu and R. Makki, "Power supply current detectability of SRAM defects", Proceedings of the Fourth Asian Test Symposium, 1995, pp. 367-373.
[14] S. A. Kumar, R. Z. Makki and D. M. Binkley, "IDDT Testing of Embedded CMOS SRAMs", Proceedings of Design, Automation and Test in Europe Conference and Exhibition, 2002, p. 1117.
[15] V. H. Champac, V. Avendano and M. Linares, "Bit line sensing strategy for testing for data retention faults in CMOS SRAMs", Electronics Letters, Vol. 36, Issue 14, pp. 1182-1183, 2000.
[16] V. H. Champac and V. Avendano, "Test of data retention faults in CMOS SRAMs using special DFT circuitries", IEE Proceedings - Circuits, Devices and Systems, 2004, Vol. 151, Issue 2, pp. 78-82.
[17] A. Meixner and J. Banik, "Weak write test mode: an SRAM cell stability design for test technique", Proceedings of International Test Conference, 1996, pp. 309-318.
[18] J. Brauch and J. Fleischman, "Design of cache test hardware on the HP PA8500", Proceedings of International Test Conference (ITC), 1997, pp. 286-293.
[19] R. Aitken, N. Dogra, D. Gandhi and S. Becker, "Redundancy, repair, and test features of a 90nm embedded SRAM generator", Proceedings of the 18th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT'03), 2003, pp. 467-474.
[20] R. D. Adams et al., U.S. Patent 6,681,350 (Jan. 20, 2004).
[21] K. Anami, M. Yoshimoto, H. Shinohara, Y. Hirata and T. Nakano, "Design Consideration of a Static Memory Cell", IEEE Journal of Solid-State Circuits, Vol. SC-18, No. 4, pp. 414-418, 1983.
[22] S. Hamdioui and A. J. van de Goor, "An experimental analysis of spot defects in SRAMs: realistic fault models and tests", Proceedings of the Ninth Asian Test Symposium (ATS 2000), 2000, pp. 131-138.
[23] R. Dekker, F. Beenker and L. Thijssen, "A realistic fault model and test algorithms for static random access memories", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 9, Issue 6, pp. 567-572, 1990.
[24] A. J. van de Goor, Testing Semiconductor Memories: Theory and Practice, John Wiley & Sons, New York, 1996.
[25] J. Yang, B. Wang and A. Ivanov, "Open Defects Detection within 6T SRAM Cells using a No Write Recovery Test Mode", Proceedings of the 17th International Conference on VLSI Design, 2004, pp. 493-498.
[26] R. Dekker, F. Beenker and L. Thijssen, "Fault modeling and test algorithm development for static random access memories", Proceedings of International Test Conference, 1988, pp. 343-352.
[27] Y. Zorian et al., US Patent 5,381,419 (Jan. 1995).
[28] B. Wang and J. Yang, "SRAM Design and Optimization", University of British Columbia EECE 579 Homework 3, 2002.
[29] Taiwan Semiconductor Manufacturing Company Limited, www.tsmc.com
[30] E. Seevinck, F. J. List and J. Lohstroh, "Static-Noise Margin Analysis of MOS SRAM Cells", IEEE Journal of Solid-State Circuits, Vol. 22, Issue 5, pp. 748-754, 1987.
[31] M. Lee, W. I. Sze and C. M. Wu, "Static Noise Margin and Soft-Error Rate Simulations for Thin Film Transistor Cell Stability in a 4 Mbit SRAM Design", Proceedings of 1995 IEEE International Symposium on Circuits and Systems (ISCAS '95), 1995, Vol. 2, pp. 937-940.
[32] R. R. Montanes, J. P. de Gyvez and P. Volf, "Resistance characterization for weak open defects", IEEE Design & Test of Computers, Vol. 19, Issue 5, pp. 18-26, 2002.

Chapter 4 A Time-efficient Customizable March Test Algorithm for an eSRAM^9

4.1 Introduction
The System-on-a-Chip (SoC) paradigm is associated with a trend from logic-dominant chips to memory-dominant ones. An increasing number of memories with various capacities, e.g., SRAMs, are embedded into emerging SoCs. Increasingly dense embedded SRAMs (e-SRAMs) are more prone to faults. This not only reduces memory and SoC yield, thereby increasingly making redundancy a must, but also poses large test challenges, in particular the overall time for testing both data retention faults (DRFs) and non-DRFs. Due to the mixed-signal nature of SRAMs, defect-based algorithms can achieve higher defect coverage than functional ones [1-2].
Previous work on reducing the non-DRF test time of e-SRAMs under defect-based fault models mainly focuses on parallel test algorithms, e.g., those from [3], or on smart test algorithms that consider post-test redundancy schemes, such as those in [4]. However, parallel test algorithms are limited in reducing the test time of multiple memories under increasingly tight power constraints. Moreover, applying them alone is not effective at reducing test time when the capacities of the multiple SRAMs vary considerably.

^9 A version of this chapter has been published. B. Wang, J. Yang, J. Cicalo, A. Ivanov and Y. Zorian, "Reducing Embedded SRAM Test Time under Redundancy Constraints", Proceedings of the VTS'04, Apr. 2004, pp. 237-242.

By more tightly coupling the test and repair activities and capabilities in [4], the FFM2 faults mapped from defects, as described in [2], are simplified, and hard repair schemes are taken into account, greatly reducing test time without negatively impacting defect coverage. Normally, e-SRAMs of different capacities are augmented by different redundancy techniques according to the Overall Production Gain (OPG) [5] factor, which quantifies the trade-off between yield gain and redundancy area overhead. The algorithm described in [4] can only be considered optimal under specific assumptions, namely that hard repair schemes are used. Unfortunately, it is not effective for e-SRAMs with soft repair schemes or without redundancy (non-redundancy). According to the OPG quantifications reported in [5], soft repair schemes are usually deemed more effective for memories of no more than 4 Mb density in 180 nm technology. In practice, redundancy schemes are not used for very small e-SRAMs due to cost considerations.
Previous defect-based work on reducing the DRF test time focuses on shortening the DRF delay time through test algorithms [4] or on removing it completely from the test flow by applying various design-for-test (DFT) techniques [6-11]. In [4], it is pointed out that this delay time can be effectively eliminated for large-capacity SRAMs by ordering the test and test control sequences in specific ways. However, such delay typically cannot be eliminated when testing a large number of small-capacity SRAMs, such as those typically found on SoC platforms. Conversely, as the memory capacity grows, the test time consumed by the extra DFT cycles in [6-11] becomes longer than the DRF delay time they remove, and thus becomes a penalty. Overall, the effectiveness of the above techniques for reducing either part of the e-SRAM test time depends on the memory capacity, and no current universal algorithm can test e-SRAMs of any capacity with a time-efficient strategy. In this chapter, with the goal of reducing the test time of embedded SRAMs, we consider the OPG quantification and the DRF test delay time as the two deciding factors for testing the non-DRFs and the DRFs. Based on this consideration, e-SRAMs are categorized into four groups: with No or Soft Repair & DFT for retention faults (NSRDF), with Soft Repair & DElay for retention faults (SRDE), with Hard Repair & DElay for retention faults (HRDE), and with Hard Repair & DFT for retention faults (HRDF). Accordingly, we select four corresponding March test algorithms, generated from a comparison algorithm, by combining the advantages of the methodology proposed in [4] and the DFT technique from [10-11], since these two concepts currently achieve the highest test time reduction rates and can be applied jointly to any March test algorithm.
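The two deciding factors can be illustrated as a small decision procedure. This is a hypothetical sketch, not the chapter's published method. It assumes that hard repair is chosen when OPGhr exceeds OPGsr (the rule stated in Sec. 4.2.1) and that DFT is preferred for DRFs only while its extra cycles cost less than the delay they remove. The `ESram` class and its fields are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class ESram:
    """Hypothetical per-memory inputs; field names are illustrative only."""
    name: str
    has_redundancy: bool
    opg_hard: float      # OPG with hard (row/column) repair
    opg_soft: float      # OPG with soft (word) repair
    dft_extra_s: float   # test time of the extra DFT cycles for DRFs
    drf_delay_s: float   # DRF delay that the DFT cycles would remove


def repair_scheme(m: ESram) -> str:
    """Hard repair only when it yields the higher OPG; none without redundancy."""
    if not m.has_redundancy:
        return "none"
    return "hard" if m.opg_hard > m.opg_soft else "soft"


def drf_method(m: ESram) -> str:
    """Use DFT only while its extra cycles cost less than the delay removed."""
    return "dft" if m.dft_extra_s < m.drf_delay_s else "delay"


GROUPS = {
    ("none", "dft"): "NSRDF",
    ("soft", "dft"): "NSRDF",
    ("soft", "delay"): "SRDE",
    ("hard", "delay"): "HRDE",
    ("hard", "dft"): "HRDF",
}


def categorize(m: ESram) -> str:
    key = (repair_scheme(m), drf_method(m))
    if key not in GROUPS:
        # No group is named for memories without redundancy that keep the delay.
        raise ValueError(f"{m.name}: {key} is not one of the four groups")
    return GROUPS[key]


if __name__ == "__main__":
    tiny = ESram("tiny", False, 0.0, 0.0, dft_extra_s=1e-6, drf_delay_s=0.25)
    large = ESram("large", True, 0.9, 0.8, dft_extra_s=1e-3, drf_delay_s=0.0)
    print(tiny.name, categorize(tiny))
    print(large.name, categorize(large))
```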
The remainder of this chapter is organized as follows. In Sec. 4.2, the methodology for categorizing e-SRAMs by considering both the OPG and the delay time for DRF tests is described in detail. In Sec. 4.3, test time evaluations in terms of memory capacity, number of I/Os, technology scaling trends and March algorithm complexity are presented and discussed. Finally, Sec. 4.4 summarizes the chapter.

4.2 Test Concepts

4.2.1 Background

Although functional fault models tend to be easy to develop and formulate, it is difficult to obtain test results from these models that properly reflect yield. Defect-based fault models, e.g., those created using Inductive Fault Analysis (IFA) [12] (IFA fault models), are more attractive in terms of yield and defect level results because of their realistic nature: IFA faults are translated directly from physical defects. Hence, in this chapter, all the functional faults in [2] are revisited from a defect point of view using IFA.

DRF testing is traditionally done by performing a read operation after a certain predetermined delay, denoted by TDF. According to [4], the TDF applied in the test algorithms can be reduced. This reduction comes from the inherent delay before the same memory cell is re-accessed in an actual test, and is quantified as the Test Delay Access (TDA) time. The quantification process is shown in Figure 4-1.
[Figure 4-1 DRF test time reduction: timeline comparing the conventional flow (disable and wait TDF) with the proposed flow, in which TDF = TNDF + TDA. Fast column address sequencing: the wordline reopens after accessing the other $(2^X - 1)\cdot 2^Y$ cells. Fast row address sequencing: the wordline reopens after accessing the other $(2^X - 1)$ cells. X: number of row address bits; Y: number of column address bits; TP: test clock period (ns).]

The new TDF (TNDF) in the test flow of [4] can be quantified as TDF − TDA, where TDA depends on many factors, e.g., memory architecture, test clock period, test sequence, and the number of operations within each March element. Two typical access methods, fast column address sequencing and fast row address sequencing, are shown in Figure 4-1. In the proposed test algorithms, the DRF test delay time is the TNDF obtained with the fast column address sequencing method rather than the TDF. For the other March algorithms referenced in this chapter, TDF is still used instead of TNDF, so that they are kept in their original form.

A number of different redundancy techniques are generally applied to reduce memory yield loss. For a single e-SRAM, either a hard or a soft repair methodology, rather than a combination of the two, is selected. Usually, hard repair methodologies, e.g., row or column redundancy or their combination, are considered more applicable to high-capacity memories, whereas soft repair, e.g., word redundancy, is usually deemed more effective for low-capacity memories. The appropriate redundancy technique, shown in Figure 4-2, is usually selected according to the trade-off between final yield and redundancy area overhead, e.g., the OPG in [5]. If the OPGhr (OPG using hard repair) value is higher than the OPGsr (OPG using soft repair) value, row/column redundancy is used; otherwise, word redundancy is selected.
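As a minimal sketch of the TNDF quantification in Figure 4-1 (fast column address sequencing), the Python snippet below computes TDA and TNDF. It assumes, as in the evaluation flow of Sec. 4.3.1.2, that X and Y are the numbers of row and column address bits. Clamping TNDF at zero when TDA already exceeds TDF is our assumption, not a statement from [4].

```python
def tda_fast_column(x_bits: int, y_bits: int, tp: float) -> float:
    # Fast column sequencing: a wordline is reopened only after the other
    # (2**X - 1) * 2**Y cells have been accessed, one per test clock.
    return tp * (2 ** x_bits - 1) * 2 ** y_bits


def tndf(tdf: float, x_bits: int, y_bits: int, tp: float) -> float:
    # TNDF = TDF - TDA; clamped at zero (assumption) when the inherent
    # re-access delay is already longer than the required pause.
    return max(0.0, tdf - tda_fast_column(x_bits, y_bits, tp))


if __name__ == "__main__":
    TP, TDF = 50e-9, 100e-3  # case-study values: 50 ns clock, 100 ms pause
    for x, y in [(9, 8), (10, 10), (11, 10)]:
        print(f"X+Y={x + y}: TNDF = {tndf(TDF, x, y, TP) * 1e3:.3f} ms")
```

For X = 9 and Y = 8 (a 128 KB memory with 8-bit IO), this gives TNDF ≈ 93.46 ms, so only about 6.5 ms of the 100 ms pause is hidden by normal accesses; the hidden share grows with capacity.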
[Figure 4-2 Classify e-SRAMs with redundancy. Part proposed in [5]: redundancy scheme selection among no repair, soft repair and hard repair. Part proposed in this chapter: selection between PDWMarch and NWRMarch, followed by classification into NSRDF, SRDE, HRDF and HRDE. Notes: OPGsr: OPG with soft repair scheme; OPGhr: OPG with hard repair scheme; Toh: overhead test time due to DFT; TNDF: quantified delay time for DRF test; PDWMarch: March merged with Pre-Discharge Write; NWRMarch: March merged with No Write Recovery; NSRDF: No/Soft Repair & DFT; SRDE: Soft Repair & Delay; HRDF: Hard Repair & DFT; HRDE: Hard Repair & Delay.]

4.2.2 Classifications of e-SRAMs for Testing

Based on the OPG in [5] and the delay time of DRF tests, e-SRAMs with redundancy schemes in SoCs can be divided into four categories for test and repair. The proposed division process is also shown in Figure 4-2.

In [5], as long as the OPGsr value is higher than the OPGhr value, word redundancy is selected as the repair technique instead of row/column redundancy. In this chapter, row/column redundancy rather than word redundancy is applied even if the OPGsr value is slightly higher than the OPGhr value. A user-defined factor k, chosen from trade-offs among cost-related elements, is used as the adjustment factor. The reason for this criterion is that, according to [4], more test time can be saved under a row/column redundancy scheme. For DRF testing, either a certain delay time (TNDF) or a DFT technique that circumvents the delay requirement has to be used. If the necessary delay TNDF exceeds Toh (the test time overhead due to extra DFT cycles), DFT techniques are applied; otherwise, only the delay time is used. Note, however, that the Toh value depends on the DFT implementation method. Thus, the DFT implementation method has to be selected before deciding between DFT and TNDF for DRFs.
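The two decisions above can be sketched in Python as follows. This is an illustrative sketch, not the author's tool: the OPG values would come from a yield model such as [5], k is the user-defined adjustment factor (100.98% in the case study of Sec. 4.3), and memories without redundancy are mapped to NSRDF following the rule for non-repairable memories stated later in this section.

```python
def select_repair(opg_sr: float, opg_hr: float, k: float = 1.0098) -> str:
    # Row/column (hard) repair is kept unless the soft-repair OPG exceeds
    # the hard-repair OPG by at least the adjustment factor k.
    return "soft" if opg_sr >= k * opg_hr else "hard"


def classify(repair: str, tndf: float, toh: float) -> str:
    # Non-redundant memories are grouped with the soft-repair ones (NSRDF).
    if repair == "none":
        return "NSRDF"
    # PDWTM pays off only if the remaining pause costs more than its overhead.
    use_dft = tndf > toh
    if repair == "soft":
        return "NSRDF" if use_dft else "SRDE"
    return "HRDF" if use_dft else "HRDE"


if __name__ == "__main__":
    # Hypothetical inputs: OPG ratio 0.95, TNDF = 47.6 ms, Toh = 52.4 ms.
    print(classify(select_repair(0.95, 1.0), 0.0476, 0.0524))
```

Keeping the two decisions in separate functions mirrors Figure 4-2: the redundancy choice is made first, then the DRF strategy is chosen by comparing TNDF with Toh.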
It is observed from [5] that row/column redundancy schemes are applied to larger-capacity e-SRAMs. On the contrary, smaller-capacity memories incur a larger TNDF. As a result, we classify memories with redundancy schemes into four categories: SRDF e-SRAMs (with Soft Repair & DFT for retention faults); SRDE e-SRAMs (with Soft Repair & DElay for retention faults); HRDF e-SRAMs (with Hard Repair & DFT for retention faults); and HRDE e-SRAMs (with Hard Repair & DElay for retention faults).

For e-SRAMs without redundancy schemes, we simply treat them as part of the "very-small-capacity" category since (i) in reality, their capacities are smaller than those of any memories with redundancy schemes due to cost considerations; and (ii) for all existing test time reduction techniques, there is no difference between them and those with soft repair. Therefore, to include these non-repairable memories, we rename this category NSRDF (with No or Soft Repair & DFT for retention faults).

Note that the selection between the two implementation methods of PDWTM, i.e., PDWMarch and NWRMarch, has to be made before deciding whether to choose the DFT technique or simply use a simplified Pause test. This is because the TNDF value is related to the implementation method of PDWTM. The selection between PDWMarch and NWRMarch is discussed in Chapter 3.

4.2.3 e-SRAM Non-DRF Tests Using Redundancy Features

In [2], FFM2 denotes functional faults involving two memory cells. Most of these faults are caused by bridge defects between two cells. If the four-cell configuration in Figure 4-3 is used and the faults are grouped by physical defect location rather than by faulty behaviour, these FFM2s can be simplified from the five categories in [2] into three types: Row Coupling Faults (RCFs), Column Coupling Faults (CCFs) and Diagonal Coupling Faults (DCFs).
This simplification follows from the fact that the defects causing these faults can be considered to lie between rows, columns and diagonals.

[Figure 4-3 Four-cell memory configuration: cells on wordline WL1 and bit-line pairs BL1, /BL1 and BL2, /BL2.]

Since these coupling faults are caused by physical bridge defects, they behave as a cell pair with bidirectional impact: an aggressor and a victim. The aggressor and victim cells are named according to the access sequence: usually, the first cell accessed is referred to as the victim, since the next accessed cell of the pair couples the first one. Clearly, the victim cell becomes a good cell as soon as the aggressor disappears, e.g., is repaired. Under row/column redundancy mechanisms, only the redundant rows/columns, not the faulty ones, are accessed. Thus, if a faulty cell pair is detected using one of the two address sequences (increasing or decreasing) and repaired using row/column redundancy, the other cell (the victim) automatically becomes a good cell. In other words, as shown in [4], detecting with one address sequence and repairing any faulty cell of a bridged pair with hard repair techniques achieves the same yield as performing both address sequences on the pair. This can be validated through defect injection followed by fault detection [4].

Clearly, using one address sequence to achieve the same defect coverage as any defect-based March algorithm reduces the non-DRF test time by a factor of up to 2. However, this cannot be applied to soft repair or non-redundancy schemes, since neither can achieve the same defect coverage as hard repair. This is because, with soft repair, both the repaired cell and the faulty cell are accessed at the same time, e.g., [13].
4.2.4 e-SRAM Tests to Detect DRFs

In this chapter, when TNDF is longer than Toh, we eliminate TNDF at the cost of Toh by choosing an existing low-penalty DFT, PDWTM [10-11], to reduce the test time of DRFs.

Like the methodology in [8-9], PDWTM creates a special write cycle that distinguishes a good cell from a faulty cell subject to a DRF caused by an open defect on the pull-up PMOS. In this section, a typical 6T SRAM cell with storage node A and complementary storage node B, shown in Figure 4-4, is used to illustrate the differences between the specially designed write cycle and the normal write cycle.

[Figure 4-4 A typical 6T SRAM cell (wordline WL, supply Vcc, ground GND).]

During the normal W1 cycle, node B is pulled down by bit line BLb, which the write control logic drives to strong GND, and node A is pulled up by charge sharing with the floating bit line BL, which has already been pre-charged to VCC. Here, strong GND means that the node is driven to the GND voltage level by other sources. Owing to the latch mechanism of the memory cell, the cell flips its value from ZERO to ONE as long as node B is pulled to a sufficiently low voltage. In the algorithms from [8-9], by setting bit lines BL and BLb to a given voltage between VCC and GND during the write operation, i.e., while the access NMOS is on, a good cell fails to flip while a faulty cell flips. Similarly, PDWTM sets BL and BLb to weak GND and strong GND, respectively. Here, weak GND means that the node voltage is at GND but no source drives the node. This produces the opposite result: a good cell succeeds in flipping its logic value while a faulty cell fails to do so. For a good cell, writing a ONE poses no problem because node B is pulled down by bit line BLb and the cell flips to ONE through the latch mechanism.
However, a faulty cell subject to a DRF fails to flip because the voltage of node A never exceeds that of node B. The voltage of node A remains at GND since (i) lacking the PMOS, i.e., the path to the supply rail, node A cannot be pulled high no matter how low the node B voltage falls, so the latch in this faulty cell malfunctions; and (ii) there is no charge sharing with bit line BL, because BL is set to weak GND. Since GND is the lowest achievable voltage and node A stays at GND, the voltage of node A never exceeds that of node B, and the faulty cell fails to flip. As a result, DRFs are detected under PDWTM.

From [10-11], PDWTM can be merged into any March test either by adding two extra No Write Recovery Cycles (NWRCs) just before its normal write cycles or by replacing the pre-charge phase with a pre-discharge phase in the normal write cycles. The merge works because PDWTM shares the same write operation mechanism, i.e., faulty cells fail to flip during the test. In other words, PDWTM can be merged with any typical March test algorithm without additional test patterns, an advantage other DFT techniques do not share. Hence, among all existing DFT techniques, PDWTM is the best in terms of DRF test time reduction.

4.3 Test Time Evaluations

Since the concepts in Sec. 4.2.3 and 4.2.4 have been validated in [4] and [10-11], respectively, this section evaluates the test time of memories of all capacities to show the advantages of the proposed classification.

4.3.1 A Case Study

4.3.1.1 Test Algorithm Generations

Since the test concepts in Sec. 4.2 are based on March algorithms, we select a typical defect-based test algorithm with high defect coverage, March 9N with Pause test [14], shown in Figure 4-5, to generate the March test algorithms for each memory category in this study.
Named according to their complexities, these time-efficient algorithms and their selection summary are shown in Figure 4-6 (a) or (b), Figure 4-7 (a) or (b), Figure 4-8 (a) or (b), Figure 3-4, Figure 3-6, and Figure 4-9, respectively, where "⇑" means the address sequence is increasing during the test and "⇓" means it is decreasing. In addition, "Delay" is the necessary TDF for Figure 4-5 and the TNDF for the others, as discussed in Sec. 4.2, and "Nw0/Nw1" represents writing 0/1 in the NWRTM. Either the (a) or the (b) variant of Figure 4-6, Figure 4-7 and Figure 4-8, which differ only in address sequencing, can be used.

[Figure 4-5 The March 9N test with Pause test.]

[Figure 4-6 March 5N test algorithm, variants (a) and (b): the March elements (w0), (r0, w1) and (r1, w0), separated by two Delay elements.]

[Figure 4-7 PDWMarch 6N test algorithm, variants (a) and (b).]

[Figure 4-8 NWRMarch 7N test algorithm, variants (a) and (b): the March elements (Nw1, w0), (r0, Nw0, w1) and (r1, w0).]

OPGsr/OPGhr  Redundancy  PDWTM impl.  TNDF − Toh  DRF test strategy  Classification  Complexity
>= 1.01      Soft        PDWMarch     > 0         PDWTM              NSRDF           9N
                                      <= 0        TNDF               SRDE            9N
                         NWRMarch     > 0         PDWTM              NSRDF           11N
                                      <= 0        TNDF               SRDE            9N
< 1.01       Hard        PDWMarch     > 0         PDWTM              HRDF            6N
                                      <= 0        TNDF               HRDE            5N
                         NWRMarch     > 0         PDWTM              HRDF            7N
                                      <= 0        TNDF               HRDE            5N

Figure 4-9 Summary of test algorithm selections

4.3.1.2 Test Time Quantifications

The number of row address bits X is assumed to be equal to the number of column address bits Y (even total number of address bits) or one more than Y (odd total) in order to minimize the physical layout area.
Moreover, in this section, the test clock period (TP), TDF, and the number of IOs are assumed to be 50 ns (the same as in [8]), 100 ms, and 8 bits, respectively. The technology used is 180nm.

According to Figure 4-2, we assume that word redundancy is used in e-SRAMs with densities below 2 Mb, since (OPGsr − OPGhr)/OPGsr is 0.98% for 2.5 Mb in [5]. Under the 8-bit IO assumption above, e-SRAMs smaller than 256 KB use word redundancy (k is assumed to be 100.98% in this study) and the others use row/column redundancy, assuming 180nm technology. Based on the concepts in Sec. 4.2 and the test algorithms above, we evaluate the memory test time for the four proposed test algorithms with the flow chart shown in Figure 4-10, implementing PDWTM with NWR.

Start: memory under test. Define Y, TP and TDF (IO = 8), with X = Y or X = Y + 1.
  $\mathrm{TNDF} = \mathrm{TDF} - \mathrm{TP}\cdot(2^X - 1)\cdot 2^Y$,  $\mathrm{Toh} = \mathrm{TP}\cdot 2^{X+Y}$
  Total test time for March 9N with Pause test: $9\cdot 2^{X+Y}\cdot\mathrm{TP} + 2\cdot\mathrm{TDF}$
  Soft repair, TNDF > Toh: March 11N, total test time $11\cdot 2^{X+Y}\cdot\mathrm{TP}$
  Soft repair, TNDF <= Toh: March 9N, total test time $9\cdot 2^{X+Y}\cdot\mathrm{TP} + 2\cdot\mathrm{TNDF}$
  Hard repair, TNDF > Toh: March 7N, total test time $7\cdot 2^{X+Y}\cdot\mathrm{TP}$
  Hard repair, TNDF <= Toh: March 5N, total test time $5\cdot 2^{X+Y}\cdot\mathrm{TP} + 2\cdot\mathrm{TNDF}$
  Test time reduction percentage: $(1 - T_{total}(xN)/T_{total}(9N)) \times 100$, where x = 5, 7, 9, 11
Figure 4-10 A flow chart of test time evaluation by applying NWRTM

The reduction percentages in terms of memory capacity, compared with March 9N with Pause test, are shown in Figure 4-11, and the corresponding classifications are shown in Table 4-1. As Figure 4-11 and Table 4-1 show, in this case study, the different time-efficient test algorithms reduce the test time of e-SRAMs of various capacities by 44.4% or more.
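The evaluation flow of Figure 4-10 can be reproduced with a short Python sketch under the case-study assumptions (TP = 50 ns, TDF = 100 ms, 8-bit IO, word redundancy below 256 KB, i.e., X + Y < 18). Clamping TNDF at zero for very large memories is our assumption; with it, the computed reductions agree with Table 4-1 to within one percentage point.

```python
TP = 50e-9    # test clock period (s)
TDF = 100e-3  # conventional DRF pause (s)


def split_address(n_bits: int) -> tuple:
    # X = Y for an even number of address bits, X = Y + 1 for an odd one.
    y = n_bits // 2
    return n_bits - y, y


def evaluate(n_bits: int, tp: float = TP, tdf: float = TDF,
             soft_limit_bits: int = 18) -> tuple:
    x, y = split_address(n_bits)
    words = 2 ** n_bits
    tndf = max(0.0, tdf - tp * (2 ** x - 1) * 2 ** y)  # clamped (assumption)
    toh = tp * words                                      # DFT overhead
    baseline = 9 * words * tp + 2 * tdf                   # March 9N with Pause
    totals = {
        "5N": 5 * words * tp + 2 * tndf,  # HRDE
        "7N": 7 * words * tp,             # HRDF (NWRMarch)
        "9N": 9 * words * tp + 2 * tndf,  # SRDE
        "11N": 11 * words * tp,           # NSRDF (NWRMarch)
    }
    reductions = {name: 100 * (1 - t / baseline) for name, t in totals.items()}
    if n_bits < soft_limit_bits:  # word (soft) redundancy below 256 KB
        category, alg = ("NSRDF", "11N") if tndf > toh else ("SRDE", "9N")
    else:                         # row/column (hard) redundancy
        category, alg = ("HRDF", "7N") if tndf > toh else ("HRDE", "5N")
    return category, alg, reductions


if __name__ == "__main__":
    for n in range(17, 22):  # 128 KB ... 2 MB with 8-bit IO
        category, alg, red = evaluate(n)
        print(f"{2 ** n // 1024:5d} KB  {category:5s} {alg:>3s}  "
              f"reduction = {red[alg]:.1f} %")
```

The loop selects NSRDF for 128 KB, HRDF for 256 KB and 512 KB, and HRDE for 1 MB and 2 MB, matching the switch point where TNDF falls below Toh.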
[Figure 4-11 e-SRAMs test time reductions: test time reduction (%) vs. memory capacity (X+Y) for the 5N, 7N and 11N algorithms.]

Table 4-1 Classifying e-SRAMs under 8-bit IO
Capacity          128KB   256KB   512KB   1MB     2MB
5N (%)            15      24      36      47      54
7N (%)            82      71      57      45      36
11N (%)           72      54      33      14      0
e-SRAM category   NSRDF   HRDF    HRDF    HRDE    HRDE

4.3.2 IO Number Effects

The number of IOs of e-SRAMs may vary depending on the application. Following the evaluation steps above and keeping all other assumptions except the number of IOs, the test time reduction evaluations for e-SRAMs with 4, 8, 16 and 32-bit IOs are shown in Figure 4-12.

Figure 4-12 shows that, for commonly used IO widths, the proposed test algorithms achieve a total test time reduction of more than 44 percent for e-SRAMs of all capacities in 180nm technology.

[Figure 4-12 Evaluating test time reduction considering IO numbers: test time reduction vs. memory capacity (X+Y).]

4.3.3 Technology Trends

On the one hand, both the yield and the redundancy area overhead are related to the entire physical layout area. Thus, if the total memory area is kept the same, the OPG does not change. Therefore, the capacity boundary between memories suited to word redundancy and those suited to row/column redundancy doubles with each technology node, since the number of transistors in a fixed layout area doubles per technology node.

[Figure 4-13 Evaluating test time reduction considering technology node trends: test time reduction vs. memory capacity (X+Y) for 180nm, 130nm, 90nm, 65nm, 45nm, 32nm and 22nm.]

On the other hand, the on-chip built-in self-test (BIST) clock period (TP) also shrinks with each technology node.
According to the on-chip clock frequency in Table 4a of the ITRS'99 document [15], Table 4c of the ITRS'01 document [16] and Tables 4c and 4d of the ITRS'03 document [17], the on-chip clock period is predicted to decrease by the following factors per technology step, starting with the 180nm-to-130nm step and ending with the 32nm-to-22nm step: 1.403, 2.37, 1.690, 1.710, 1.681, and 1.486. The TP reduction ratios are assumed to follow the same trend as the on-chip clock. Keeping the other assumptions, the evaluation results for varying technology nodes are shown in Figure 4-13.

Figure 4-13 shows that, compared with March 9N with Pause test, the proposed test algorithms achieve more than a 44 percent reduction of the overall test time for all memories without loss of test quality. This trend holds at least down to the 22nm technology node.

4.3.4 March Algorithm Complexity Impacts

The March algorithm used for e-SRAM testing may vary depending on the defect coverage requirements. Following the evaluation steps above and keeping the other case-study assumptions except the comparison March algorithm, the test time reduction evaluations for March 9N with Pause test [4], March 26N and March 42N [18] are shown in Figure 4-14.

Figure 4-14 shows that the higher the complexity of the comparison March algorithm, the larger the total test time reduction, e.g., more than 48%, when the memory capacity is large enough, e.g., for memories with row/column redundancy. It can also be derived from the test time quantification procedure that the minimum reduction ratio approaches 50% if the comparison March algorithm is complex enough, e.g., March 1000N. Let $c_x$ denote the complexity of the March algorithm under comparison; the proposed customizable March algorithm complexity for HRDE e-SRAMs would then be $(c_x/2 + 1)$. As a result, its test time reduction ratio R is

$$R = \lim_{c_x \to \infty}\left(1 - \frac{c_x/2 + 1}{c_x}\right) \times 100 = \lim_{c_x \to \infty}\left(50 - \frac{100}{c_x}\right) = 50.$$

[Figure 4-14 Evaluating test time reduction considering March algorithm complexity impacts: test time reduction vs. memory capacity (X+Y) for the three comparison algorithms.]

4.4 Summary

More and more embedded SRAMs of various capacities occupy a growing percentage of the area of current and future SoCs. Increasingly dense e-SRAMs present a challenge in terms of test time.

There are no universal time-efficient test algorithms. The parallel test algorithm proposed in [3], the smart test algorithm that considers post-test redundancy schemes in [4], and the DFT-based test algorithms in [6-11] are not suitable for reducing the overall test time of e-SRAMs of all capacities. By reviewing the test time reduction techniques under defect-based fault models, the OPG in [5] and the new delay time (TNDF) for DRF tests in [4] are selected to classify e-SRAMs into four categories, and these memories are tested with different time-efficient test algorithms in order to reduce their overall test time as much as possible. The test time reduction evaluations show that, compared with the March 9N with Pause test used in this chapter, the classification achieves test time reductions of up to a factor of two for e-SRAMs of all capacities and different IO numbers, at least down to 22nm technology. When the comparison March algorithm becomes complex enough, the minimum test time reduction ratio approaches 50% for memories with row/column redundancy and for the ever larger capacities predicted in the ITRS documents.

4.5 References

[1] A. J. van de Goor, "Testing Semiconductor Memories, Theory and Practice", ComTex Publishing, Gouda, the Netherlands, 1998, http://cardit.et.tudelft.nl/~vdgoor.
[2] S. Hamdioui and A. J. van de Goor, "An experimental analysis of spot defects in SRAMs: realistic fault models and tests", Proceedings of the Ninth Asian Test Symposium (ATS2000), 2000, pp. 131-138.
[3] D. C. Huang and W. B. Jone, "A parallel transparent BIST method for embedded memory arrays by tolerating redundant operations", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 21, No. 5, pp. 617-628, 2002.
[4] B. Wang, J. Yang, and A. Ivanov, "Reducing test time of embedded SRAMs", Proceedings of the 2003 IEEE International Workshop on Memory Technology, Design and Testing (MTDT'03), 2003, pp. 47-52.
[5] E. Rondey, Y. Tellier, and S. Borri, "A silicon-based yield gain evaluation methodology for embedded-SRAMs with different redundancy scenarios", Proceedings of the Eighth IEEE International On-Line Testing Workshop, 2002, pp. 251-255.
[6] V. H. Champac, J. Castillejos, and J. Figueras, "IDDQ testing of opens in CMOS SRAMs", Proceedings of the 16th IEEE VLSI Test Symposium, 1998, pp. 106-111.
[7] D. H. Yoon, H. S. Kim, and S. H. Kang, "Dynamic Power Supply Current Testing for Open Defects in CMOS SRAMs", Electronics and Telecommunication Research Institute (ETRI) Journal, Vol. 23, No. 2, pp. 77-84, 2001.
[8] A. Meixner and J. Banik, "Weak write test mode: an SRAM cell stability design for test technique", Proceedings of the International Test Conference, 1996, pp. 309-318.
[9] V. H. Champac, V. Avendano, and M. Linares, "Bit line sensing strategy for testing for data retention faults in CMOS SRAMs", Electronics Letters, Vol. 36, Issue 14, pp. 1182-1183, 2000.
[10] J. Yang, B. Wang, and A. Ivanov, "Open Defects Detection within 6T SRAM Cells using a No Write Recovery Test Mode", Proceedings of the International Conference on VLSI Design 2004, 2004, pp. 493-498.
[11] J. Yang, B. Wang, Y. Wu, and A. Ivanov, "Fast Detection of Data Retention Faults and Other SRAM Cell Open Defects", to appear in IEEE Transactions on Computer-Aided Design (TCAD) of Integrated Circuits and Systems.
[12] J. P. Shen, et al., "Inductive fault analysis of MOS integrated circuits", IEEE Design and Test of Computers, Vol. 2, No. 6, pp. 13-26, 1985.
[13] V. Schober, S. Paul, and O. Picot, "Memory built-in self-repair using redundant words", Proceedings of the International Test Conference, 2001, pp. 995-1001.
[14] R. Dekker, F. Beenker, and L. Thijssen, "A realistic fault model and test algorithms for static random access memories", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 9, Issue 6, pp. 567-572, 1990.
[15] International Technology Roadmap for Semiconductors, 1999 Edition: "Overall Technology Roadmap Characteristics and Glossary", pp. 14-16, 1999.
[16] International Technology Roadmap for Semiconductors, 2001 Edition: "Executive Summary", pp. 44-48, 2001.
[17] International Technology Roadmap for Semiconductors, 2003 Edition: "Executive Summary", pp. 53-54, 2003.
[18] A. J. van de Goor and B. Smith, "The automatic generation of March tests", Records of the IEEE International Workshop on Memory Technology, Design and Testing, 1994, pp. 86-91.

Chapter 5 A Time-efficient Test Architecture for Multiple Distributed Small e-SRAMs¹⁰

5.1 Introduction

The System-on-a-Chip (SoC) paradigm is associated with a trend from logic-dominant chips to memory-dominant ones [1]. One reason for this is that an increasingly large number of small SRAMs (e.g., those too small to apply redundancy because of the area overhead constraint [2]) are widely embedded into emerging SoCs to buffer data between different computational components. For example, up to 121 small SRAMs are designed into the networking SoC IEX2000 [3].
As they become increasingly dense, these small e-SRAMs are becoming more prone to defects and faults. This not only reduces memory and SoC yield but also poses significant test challenges.

Difficulties with testing distributed small e-SRAMs lie in the following: (i) External testers are increasingly incapable of testing deeply embedded memories because of limited external observability and controllability. At-speed tests are also generally infeasible due to limited timing accuracy when the memories are accessed externally. Built-in self-test (BIST) appears to be the only known cost-effective solution to these problems. (ii) For large numbers of relatively small embedded SRAMs, the use of separate BIST controllers/circuitry can result in unacceptable test area overhead. (iii) The spatial distribution of several small e-SRAMs makes routing the wires required for delivering patterns and analyzing responses problematic, especially if a single BIST controller is to be shared by many SRAMs to keep overhead low. (iv) Though every e-SRAM may be relatively small, their possibly large accumulated capacity may cause long test time if testing them in parallel is not allowed. Moreover, the test time is dominated by the time for testing Data Retention Faults (DRFs), which, in practice, are tested by performing a read operation after a predetermined delay (e.g., 100 ms). In other words, the total test time for these small e-SRAMs is at least a couple of hundred milliseconds, regardless of memory size/capacity and testing methodology, e.g., parallel or sequential test.

¹⁰ A version of this chapter has been published. B. Wang, Y. Wu and A. Ivanov, "Designs for Reducing Test Time of Distributed Small Embedded SRAMs", Proceedings of IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT'04), Oct. 2004, pp. 120-128.
To overcome the above challenges, previous work mainly focuses on developing test architectures that support parallel BIST with a limited number of control and data signals [4-9]. In [4-9], concurrent testing of such small distributed e-SRAMs allows a dramatic reduction in total test time. The serial interface used in [4-8] for delivering test patterns and analyzing test responses simplifies the routing from the BIST controller to the memories under test. Moreover, the single shared BIST controller proposed in [4-8] minimizes the test area overhead without negatively affecting defect and fault coverage. The architecture in [9] also supports diagnosis. However, the serial nature of the response analyzer and the corresponding complex test algorithms in [4-8] result in long test time. The architecture in [9] has a separate data background generator and control signal generator associated with each memory, which is generally not feasible for testing multiple distributed small e-SRAMs because of the routing problem and the area overhead penalty. More importantly, none of the previous work considers DRF testing, which dominates the time for testing small e-SRAMs. As a result, the defect coverage is compromised, and the test time reduction attributed to the proposed architectures is therefore overestimated.

In this chapter, we propose new designs that reduce the total test time for distributed small embedded SRAMs while maintaining acceptable control signal routing complexity and area overhead. Our designs are based on those proposed in [4-5]. We use a modified version of a high-performance, local, parallel equality comparator [10] to replace the serial one used in [4-5].
We combine the proposed architecture with an effective design-for-test (DFT) technique known as the "Pre-Discharge Write Test Mode" (PDWTM) [11-12], which tests DRFs without incurring extra delay time. Together, these improvements yield high defect coverage, low area overhead, and short test time for testing multiple small distributed e-SRAMs.

The remainder of this chapter is organized as follows. In Sec. 5.2, we briefly review the test architecture in [4-5]. The detailed designs for reducing the test time of distributed small SRAMs are proposed in Sec. 5.3. The test evaluations, i.e., defect coverage analysis, test time comparison, and area overhead estimation, are discussed in Sec. 5.4. Finally, Sec. 5.5 summarizes the chapter.

5.2 Review of Previous Work

We begin by briefly reviewing the basics of the test architecture developed in [4-5] for testing distributed small e-SRAMs. To achieve low area overhead, a single shared BIST controller, including an address counter, a data background generator, and a control signal generator, is used. To solve the routing problem, a serial interface is designed to deliver patterns and responses. A Multiple Input Shift Register (MISR) is used to support test parallelism. To protect the MISR from being corrupted by the undefined initial data coming out of the memories, the BIST controller uses an enable line to disable the MISR during initialization. The test architecture is shown in Figure 5-1.

[Figure 5-1 The test architecture in [4, 5]: a BIST controller (address counter, control generator, data background generator) drives e-SRAM 1, e-SRAM 2 and e-SRAM 3 through serial interfaces, whose outputs feed a Multiple Input Shift Register (MISR).]

The RSMarch algorithm used in [4-5] is based on the March C- algorithm [13].
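For reference, March C- is commonly written as {⇕(w0); ⇑(r0,w1); ⇑(r1,w0); ⇓(r0,w1); ⇓(r1,w0); ⇕(r0)}, a 10N test. The behavioural Python sketch below applies it to a bit-oriented memory model with optional stuck-at faults; the memory model and the fault injection are hypothetical illustrations, not the RSMarch implementation of [4-5].

```python
from typing import Dict, List, Optional, Tuple

# Each March element: (address order, operations applied at every address).
MARCH_C_MINUS: List[Tuple[str, List[str]]] = [
    ("any", ["w0"]),
    ("up", ["r0", "w1"]),
    ("up", ["r1", "w0"]),
    ("down", ["r0", "w1"]),
    ("down", ["r1", "w0"]),
    ("any", ["r0"]),
]


class Memory:
    """Bit-oriented memory; stuck_at maps an address to its forced value."""

    def __init__(self, n: int, stuck_at: Optional[Dict[int, int]] = None):
        self.cells = [0] * n
        self.stuck_at = stuck_at or {}

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = self.stuck_at.get(addr, value)

    def read(self, addr: int) -> int:
        return self.stuck_at.get(addr, self.cells[addr])


def complexity(march: List[Tuple[str, List[str]]]) -> int:
    """Operations per address, i.e. the 'N' factor (10 for March C-)."""
    return sum(len(ops) for _, ops in march)


def run_march(mem: Memory, n: int, march=MARCH_C_MINUS) -> List[int]:
    """Apply a March test and return the addresses whose reads mismatched."""
    failing: List[int] = []
    for order, ops in march:
        # The "any" order is realized as increasing order in this sketch.
        addrs = range(n - 1, -1, -1) if order == "down" else range(n)
        for addr in addrs:
            for op in ops:
                value = int(op[1])
                if op[0] == "w":
                    mem.write(addr, value)
                elif mem.read(addr) != value and addr not in failing:
                    failing.append(addr)
    return failing


if __name__ == "__main__":
    print(complexity(MARCH_C_MINUS))
    print(run_march(Memory(16), 16))
    print(run_march(Memory(16, {5: 1, 9: 0}), 16))
```

A fault-free 16-bit memory yields no failing addresses, whereas injecting a stuck-at-1 at address 5 and a stuck-at-0 at address 9 reports [5, 9].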
In addition to the faults covered by March C- with a single data background, RS March also covers all intra-word and column decoder faults.

The serial interfacing in [4-5] utilizes a design from [14-15] that requires only two lines per small e-SRAM for test data application and observation, instead of the parallel lines needed for its full IO bandwidth. An example of this serial interfacing technique is shown in Figure 5-2.

[Figure 5-2 Memory with serial-data-path connections in the BIST mode (labels: memory cells i and i+1, sense amplifier, write driver/transparent latch, multiplexer, normal data inputs, outputs i and i+1; connections from the previous output or serial input, and to the next test data input or serial output).]

With such a serial interface, delivering a single pattern takes c cycles, where c is the memory's IO number (word width), and so does analyzing a test response. For a wide memory with a relatively large c, this serial testing scheme results in long test times. Moreover, the authors of [4-5] do not consider the testing of data retention faults (DRFs). Not considering DRFs limits defect and fault coverage and also overstates the test time reduction claimed for their architecture, since the time required for testing DRFs usually dominates the total test time of a small e-SRAM.

5.3 Design-for-test (DFT) Techniques

5.3.1 Architectural Overview

To improve the test time of the architecture in [4-5] without imposing excessive area overhead, we propose a test architecture with serial delivery of test patterns and parallel analysis of test responses. The patterns serially delivered to the e-SRAMs are applied to these memories through Serial to Parallel Converters (SPCs). In order not to worsen the test signal routing problem, we design a parallel response analyzer that resides locally to each memory under test. In other words, in our scheme, each e-SRAM possesses its own Local Response Analyzer (LRA).
Each LRA checks the test responses from its corresponding e-SRAM in parallel and outputs a single-bit signal for each read operation. These signals from all e-SRAMs are sent to a global MISR. The global BIST controller is designed based on the largest (i.e., largest capacity) and widest (i.e., largest IO number) e-SRAM.

A time-efficient method to test DRFs is the "Pre-Discharge Write Test Mode" (PDWTM) [11-12]. We adopt this method here as well by implementing it with the NWR March method. Since this implementation only requires a single control gate for all the e-SRAMs to disable bit-line pre-charge during DRF testing, only one extra global line is added to the control generator to enable DRF testing for all e-SRAMs. RS March, used in [4-5], covers all intra-word and column decoder faults. For a fair comparison, we use the March CW algorithm [16] for the proposed test architecture. March CW is essentially a parallel March C- algorithm with additional march elements for testing intra-word coupling and column decoder faults. The resulting test architecture is shown in Figure 5-3.

The proposed architecture works as follows. Before each March element begins, a test pattern is serially delivered to all SPCs local to the e-SRAMs. Once the pattern delivery is complete, the controller conducts a full March element before providing a new pattern. Similarly to [4-5], each pattern is written to each address only once for the largest memory or memories. For smaller ones, however, the same pattern could be written to each address multiple times as the addresses wrap around. Note that the test responses obtained from a smaller e-SRAM change as soon as the addresses wrap around for the first time, due to the read-modify-write operations of March C-.
Knowing when the addresses wrap around requires memory size information. To reduce hardware complexity, we choose not to store such information. Instead, we employ an LRA as a space compactor that compresses each word-wide test response obtained from a memory into a single-bit output, which then feeds a Global MISR for time compaction.

[Figure 5-3 The proposed test architecture: a BIST controller (address counter, control generator, data background generator) serially delivers patterns to a Serial to Parallel Converter (SPC) local to each of e-SRAM 1 to e-SRAM 3; each e-SRAM has its own Local Response Analyzer (LRA), and the LRA outputs feed a Global Multiple Input Shift Register (Global MISR).]

At the end of a BIST run, the content of the MISR is shifted out and compared against a golden signature to determine whether any memory has failed.

5.3.2 Designs for Serial to Parallel Converter (SPC)

In the proposed test architecture, the SPCs receive the patterns delivered from the Data Background Generator of the BIST controller and apply them to the corresponding e-SRAMs in parallel, so that their test responses can be analyzed in parallel. If the serial pattern from the Data Background Generator is shifted from the least significant bit (LSB) to the most significant bit (MSB), and the SPC also converts the patterns from the LSB to the MSB, different SPCs will be needed for the widest e-SRAM and for any other one. For example, if both pattern delivery and conversion are performed from LSB to MSB and the Data Background Generator delivers a Test Vector denoted TV[c-1:0], the converted pattern for the widest e-SRAM is TV[c-1:0], while that for any other e-SRAM is TV[c-1:c-c'] (where c and c' are the IO numbers of the widest e-SRAM and of a narrower one, respectively).
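The ordering problem can be reproduced with a small behavioral model of the shift chains, which also shows the MSB-first fix described next. Here a test vector is a Python list with tv[0] as the LSB, and an SPC is modeled simply as a shift register that keeps the last `width` bits clocked into it; the function names are hypothetical, and the DFF-level design and timing of Figure 5-4 are not modeled.

```python
from typing import List


def spc_capture(stream: List[int], width: int) -> List[int]:
    """Shift `stream` serially into a `width`-bit register; only the last `width` bits remain."""
    reg: List[int] = []
    for bit in stream:
        reg.append(bit)
        if len(reg) > width:
            reg.pop(0)          # the oldest bit falls off the end of the chain
    return reg                  # reg[0] is the earliest bit still held


def deliver(tv: List[int], width: int, msb_first: bool) -> List[int]:
    """Word applied to a `width`-bit e-SRAM (index 0 = LSB) when tv (tv[0] = LSB) is shifted in."""
    if msb_first:
        reg = spc_capture(list(reversed(tv)), width)   # TV[c-1] first, TV[0] last
        return list(reversed(reg))                     # earliest held bit is the MSB
    return spc_capture(list(tv), width)                # TV[0] first; earliest held bit is the LSB


tv = [1, 0, 1, 1]                        # TV[3:0] = 1101 for the widest e-SRAM, c = 4
print(deliver(tv, 3, msb_first=False))   # [0, 1, 1] = TV[3:1], not the expected TV[2:0]
print(deliver(tv, 3, msb_first=True))    # [1, 0, 1] = TV[2:0]
```

With LSB-first shifting, the narrower SPC ends up holding the top c' bits of the vector; with MSB-first shifting, the bits still inside every SPC are always the low-order ones, so one delivery scheme serves all widths.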
This is because the first (c-c') bits of the test pattern are always shifted out of the narrower SPCs and lost. However, the patterns expected at the narrower e-SRAMs are TV[c'-1:0]. This mismatch may reduce test coverage.

[Figure 5-4 Designs for pattern delivery and SPC: (a) a shift-clocked D flip-flop (DFF) chain for c = 4 (TV[3:0]); (b) the corresponding chain for c' = 3 (TV[2:0]).]

To prevent the potential coverage loss, we design the pattern delivery and conversion for all memories as follows: the serial pattern is shifted from the MSB to the LSB, and its corresponding SPC converts the pattern from the MSB to the LSB. Accordingly, the Data Background Generator is also modified to shift patterns out from the MSB to the LSB. With this design, all the patterns are correctly delivered in parallel to every small e-SRAM under test. Figure 5-4 (a) and (b) show the design for an example with two co-existing small e-SRAMs, where the widest e-SRAM has c = 4 IOs and the narrower one has c' = 3 IOs. Note that a new test pattern is delivered to the SPCs only once, just before each March element begins.

5.3.3 Designs for Local Response Analyzers (LRAs)

In environments that include multiple distributed small e-SRAMs, the LRA designs should minimize area while allowing large fan-ins to accommodate e-SRAMs with large numbers of IOs. Based on these two requirements, we slightly modify the equality comparator designs in [10] by removing one of the output inverters. An example with a fan-in of 64 is shown in Figure 5-5.

[Figure 5-5 A proposed LRA with a fan-in of 64]

Because of the Global MISR, the LRA actually acts as a space compactor with multiple inputs and a single output.
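Functionally, the LRA reduces each c-bit read to one bit (0 when every A(i) equals B(i), 1 otherwise), and the Global MISR compacts these bits over time into a signature. A minimal behavioral sketch follows; the 4-bit register width, the feedback polynomial x^4 + x + 1, the Galois-style MISR structure, and the example bit streams are assumptions for illustration, not values taken from this design.

```python
from typing import List

WIDTH = 4        # assumed Global MISR width (at least one input per e-SRAM)
TAPS = 0b0011    # assumed feedback polynomial x^4 + x + 1, Galois form


def lra(read_word: int, expected_word: int) -> int:
    """Space compaction: 0 if every bit matches (A(i) == B(i) for all i), else 1."""
    return 0 if read_word == expected_word else 1


def misr_step(state: int, inputs: int) -> int:
    """One clock: shift left, apply feedback if a 1 leaves the MSB, XOR in the parallel inputs."""
    msb = (state >> (WIDTH - 1)) & 1
    state = (state << 1) & ((1 << WIDTH) - 1)
    if msb:
        state ^= TAPS
    return state ^ inputs


def signature(lra_bits: List[int], seed: int = 0) -> int:
    """lra_bits[t] has bit i set when the LRA of e-SRAM i reports a mismatch in read cycle t."""
    state = seed
    for bits in lra_bits:
        state = misr_step(state, bits)
    return state


# Hypothetical reference stream: e-SRAM 2 mismatches deterministically after address wrap-around.
golden = [0b000, 0b000, 0b100, 0b100]
faulty = [0b000, 0b010, 0b100, 0b100]   # e-SRAM 1 additionally fails in read cycle 1
print(lra(0b1011, 0b1011), lra(0b1011, 0b1001))   # 0 1
print(signature(golden), signature(faulty))       # 12 4
```

Because this MISR is linear and its state update is invertible, a single mismatch bit in one cycle always changes the final signature; several errors can still alias, which is the usual trade-off of time compaction.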
In application, its A terminals are connected to the memory outputs, and its B terminals are connected either to the SPC or simply to logic 0s. When A(i) = B(i) for all i, the LRA outputs a 0; otherwise, it outputs a 1. From Figure 5-5, an LRA with a fan-in of 64 has (4 + 4 × 64) = 260 transistors, i.e., about four transistors per input. In other words, the LRA area per bit of fan-in is comparable to the area of a 6T memory cell.

5.3.4 Designs for Testing Data Retention Faults

To test DRFs, we adopt a previously proposed low-penalty DFT technique from [11-12], named "Pre-Discharge Write Test Mode" (PDWTM), and implement it with the NWR March method.

Like the methodology in [18-19], PDWTM creates a special write cycle that distinguishes a good cell from a cell with a DRF caused by an open defect on the pull-up PMOS. A typical 6T SRAM cell with storage node A and complementary storage node B, shown in Figure 5-6, is used to illustrate the differences between the specifically designed "No Write Recovery Cycle" (NWRC) and a normal write cycle. The bit-line pre-charge circuit of PDWTM is also shown in Figure 5-6, where the signal PDWTM is used to switch between the NWRC and a normal write cycle.

[Figure 5-6 A typical 6T SRAM cell and its pre-charge control circuits for PDWTM (signals: WL, BL, BLb, Vcc, GND, PRE, PDWTM)]

Referring to Figure 5-6, during a normal w1 cycle, node B is pulled down by the bit line BLb, which is driven to "true" GND by the write control logic, and node A is pulled up through charge sharing with the floating bit line BL, which has already been pre-charged to VCC. Here, "true" GND means that the node is driven to the GND voltage level by other sources. Due to the latch mechanism of the memory cell, the cell flips its value from "ZERO" to "ONE" as long as the voltage level of node B is pulled sufficiently low.
In the test algorithms from [18-19], the bit lines BL and BLb are set to a given voltage level between VCC and GND during the write operation, i.e., while the access NMOS is "on", so that a good cell fails to flip while a faulty cell does. PDWTM likewise controls the bit-line voltages during the write, but sets BL and BLb to "float" GND and "true" GND, respectively. Here, "float" GND means that the node voltage level is at GND but no source drives the node. This produces the opposite result: a good cell succeeds in flipping its logic value, while a faulty cell fails to do so. For a good cell, writing a "ONE" poses no problem, because node B can be pulled down by the bit line BLb and the cell flips to "ONE" due to the latch mechanism. However, a faulty cell subject to a DRF fails to flip, because the voltage level of node A never exceeds that of node B. The voltage level of node A always remains at GND because (i) lacking the PMOS, i.e., the path to the supply rail, node A cannot be pulled to "ONE" regardless of how low the node B voltage falls, so the latch in this faulty cell malfunctions; and (ii) there is no charge sharing with bit line BL, because it is set to "float" GND. Since GND is the lowest achievable voltage level and node A remains at GND, node A never exceeds node B, and the faulty cell fails to flip. As a result, DRFs are detected under PDWTM.

From [11-12], PDWTM can be merged with any March test by simply adding two extra NWRCs just before the normal write cycles, since PDWTM shares the same write mechanism, i.e., faulty cells fail to flip during the test. In other words, PDWTM has the advantage of being mergeable with any typical March test algorithm without incurring additional test patterns. Other DFT techniques do not share this advantage.
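At the logic level, the behavior described above can be summarized with a toy cell model: a normal write always succeeds because the pre-charged bit line supplies the charge, whereas an NWRC write of ONE succeeds only if node A has a working pull-up path. The sketch below assumes that an open pull-up on node A blocks only Nw1 and, symmetrically, that an open pull-up on node B blocks only Nw0; the class and function names are hypothetical, and analog effects such as partial node voltages are ignored.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    value: int = 0
    open_pullup_a: bool = False  # open pull-up PMOS on node A (DRF when storing ONE)
    open_pullup_b: bool = False  # assumed symmetric defect on node B (DRF when storing ZERO)


def normal_write(cell: Cell, value: int) -> None:
    # BL/BLb pre-charged: charge sharing raises the rising node, so even a defective cell flips.
    cell.value = value


def nwrc_write(cell: Cell, value: int) -> None:
    # No Write Recovery Cycle: the undriven bit line sits at "float" GND, so the rising
    # node can only be pulled up by the cell's own PMOS; without it the cell does not flip.
    if value != cell.value:
        if (value == 1 and cell.open_pullup_a) or (value == 0 and cell.open_pullup_b):
            return
    cell.value = value


def detect_drf_one(cell: Cell) -> bool:
    """Sequence w0, Nw1, r1: return True if the read reveals a fault."""
    normal_write(cell, 0)
    nwrc_write(cell, 1)
    return cell.value != 1


print(detect_drf_one(Cell()))                    # False: the good cell flips under NWRC
print(detect_drf_one(Cell(open_pullup_a=True)))  # True: the DRF cell stays at ZERO
```

The same defective cell passes an immediate read after a normal w1, which is why, without PDWTM, DRFs are only exposed by waiting out a retention delay.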
Hence, PDWTM is the best in terms of DRF test time among all existing DFT techniques.

5.4 Evaluations

For evaluation, we compare the performance of the proposed test architecture, i.e., its defect coverage, test time, and area overhead, with that of [4-5].

5.4.1 Defect Coverage Analysis

From [11-12], the selection of the PDWTM technique greatly improves the defect coverage, not only because it allows the detection of defects causing DRFs but also because it allows the detection of defects that do not cause faulty logical behavior but may cause reliability problems.

Since March CW is applied based on the largest/widest e-SRAM, redundant read/write operations run on the smaller/narrower ones. With the proposed test architecture, e-SRAMs with the same capacity undergo the same read/write operations regardless of their IO numbers, whereas e-SRAMs with different capacities undergo different ones.

To analyze the effects of these redundant operations on defect/fault coverage, two cases are considered in this chapter:

• Case I: If the address space of the largest e-SRAM covers the address space of a smaller e-SRAM (for example, when every e-SRAM has a capacity of 2^m words, where m is its number of address bits), the smaller e-SRAMs are read and written more than once during each march element. Since each write is conducted after a read operation, and because of the Global MISR, the redundant read/write operations on smaller memories do not compromise their defect coverage at all.

• Case II: If the former space cannot cover the latter, e.g., due to a partially decoded address space, we can always select a different subset of the largest e-SRAM's address bus lines to map onto the address space of the smaller e-SRAM. In other words, Case II can always be converted into Case I.
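Case I can be illustrated with a short address-trace model: a shared counter sweeps the largest memory's n addresses, and a smaller 2^m-word e-SRAM simply decodes the low-order m bits. The sketch below (hypothetical names; a single (r0, w1) element on a memory initialized to 0) shows that every small-memory address is visited the same number of times, that each visit reads before it writes, and that the responses after the first wrap-around change deterministically, which is why the LRA outputs are compacted by the Global MISR rather than compared against a fixed value. Case II's address-bit remapping is not modeled.

```python
from typing import List, Tuple


def visits_per_address(n_largest: int, n_small: int) -> List[int]:
    """How often each small-memory address is hit while the shared counter sweeps n_largest."""
    counts = [0] * n_small
    for addr in range(n_largest):
        counts[addr % n_small] += 1   # low-order address bits only (power-of-two sizes)
    return counts


def element_trace(n_largest: int, n_small: int,
                  ops: Tuple[str, ...] = ("r0", "w1")) -> List[Tuple[int, int]]:
    """Run one march element on the small memory; return (address, value read) for each read."""
    mem = [0] * n_small
    reads: List[Tuple[int, int]] = []
    for addr in range(n_largest):
        small_addr = addr % n_small
        for op in ops:
            if op[0] == "w":
                mem[small_addr] = int(op[1])
            else:
                reads.append((small_addr, mem[small_addr]))
    return reads


print(visits_per_address(8, 4))   # [2, 2, 2, 2]
print(element_trace(8, 4))        # first pass reads 0s, second pass (after wrap-around) reads 1s
```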
Therefore, these redundant operations on the smaller e-SRAMs have no negative effect on the defect/fault coverage, regardless of their capacities.

In summary, the proposed test architecture increases the defect coverage of multiple small e-SRAM tests compared with that of [4-5] because of the applied PDWTM technique.

5.4.2 Test Time Comparisons

Since the test of DRFs is not considered in [4-5], the reported test time reduction can be considered optimistic. With our proposed test architecture, the test time is much less than that in [4-5], especially when the DRF test is considered.

The RS March test algorithm can be considered a fully serial version of March C-, except that it covers all intra-word and column decoder faults. Therefore, assuming the largest/widest e-SRAM under test has a capacity of n words and an IO number of c, the test time of the RS March test algorithm, including the data backgrounds, for the test architecture in [4-5] is

$$T_{[4\text{-}5]} = 10nct \qquad (1)$$

where $t$ is the test clock period (ns).

Without the DRF test, the test time of the selected March CW test algorithm for our proposed test architecture can be calculated as

$$T_{proposed} = \left\{(10n + 5c) + (5n + 3c)\lceil \log_2 c \rceil\right\} \times t \qquad (2)$$

where $(10n + 5c)$ is the complexity of a parallel March C- algorithm with our test architecture, and $(5n + 3c)\lceil \log_2 c \rceil$ represents the complexity of the march elements added in March CW for detecting intra-word and column decoder faults.

The test time reduction achieved with the proposed architecture is given by

$$R = \frac{T_{[4\text{-}5]}}{T_{proposed}} = \frac{10nc}{(10 + 5\lceil \log_2 c \rceil)\,n + (5 + 3\lceil \log_2 c \rceil)\,c} \qquad (3)$$

Although it is not obvious, the reduction factor R is greater than 1 in practice. For example, if both n and c are equal to 4, R is 1.29. R increases as c and n increase.
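Eqs. (1) to (3), together with the DRF-inclusive ratio of Eq. (4) below, are easy to evaluate numerically. The sketch assumes that $\lceil \log_2 c \rceil$ is computed as `(c - 1).bit_length()` for c ≥ 1 and that each of the two retention delays is 100 ms, as stated in the text; the function names are illustrative.

```python
def ceil_log2(c: int) -> int:
    """Ceiling of log2(c) for c >= 1."""
    return (c - 1).bit_length()


def t_prior(n: int, c: int, t: float) -> float:
    """Eq. (1): RS March through the serial interface of [4-5]."""
    return 10 * n * c * t


def t_proposed(n: int, c: int, t: float) -> float:
    """Eq. (2): March CW with serial delivery and parallel (LRA) analysis."""
    k = ceil_log2(c)
    return ((10 * n + 5 * c) + (5 * n + 3 * c) * k) * t


def reduction(n: int, c: int) -> float:
    """Eq. (3); the clock period t cancels."""
    return t_prior(n, c, 1) / t_proposed(n, c, 1)


def reduction_with_drf(n: int, c: int, t: float, delay_ns: float = 100e6) -> float:
    """Eq. (4): two 100 ms retention delays for [4-5], two NWRC passes (Nw0/Nw1) for PDWTM."""
    return (t_prior(n, c, t) + 4 * n * c * t + 2 * delay_ns) / (t_proposed(n, c, t) + 2 * n * t)


for n, c in [(4, 4), (32, 16), (32, 32), (512, 100)]:
    print(n, c, round(reduction(n, c), 2))
print(round(reduction_with_drf(512, 100, t=10)))
```

The first three pairs reproduce the values 1.29, 4.16, and 5.82 quoted in this section. For the case-study parameters of Sec. 5.4.4 (n = 512, c = 100, t = 10 ns), `reduction` gives about 20, and `reduction_with_drf` evaluates to roughly 777, the same order of magnitude as the approximately 750x reported in Table 5-1.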
For example, for c = 16 and n = 32, R = 4.16; for c = 32 and n = 32, R = 5.82. More examples and discussion of the reduction factor can be found in Sec. 5.4.4. If the testing of DRFs is considered (DRFs are generally tested through a predetermined delay, e.g., 100 ms), the traditional extra test time for DRFs includes 4 units of extra complexity (i.e., w0/r0, w1/r1) and 200 ms of delay time, while only 2 units of extra test complexity (i.e., Nw0/Nw1 in [11-12]) are needed for the proposed test architecture. In this case, the test time ratio R can be calculated as shown in Equation (4):

$$R = \frac{T_{[4\text{-}5]} + 4nct + 2 \times 100 \times 10^{6}}{T_{proposed} + 2nt} \qquad (4)$$

where $t$, $T_{[4\text{-}5]}$, and $T_{proposed}$ are in ns.

From Equation (4), the reduction ratio due to the proposed test architecture can be extremely high when the DRF test is included.

5.4.3 Area Overhead Estimations

Usually, the area overhead for testing small e-SRAMs can be estimated from the transistor counts and from the number of global interfacing wires.

Compared with the designs in [4-5], the proposed test architecture adds only one extra interfacing wire, for the PDWTM control line. In the proposed test architecture, the area of the BIST controller is comparable to that in [4-5]. Since the SPCs and the Global MISR are also comparable in area to the shift registers and the MISR in [4-5], the only extra area contributor relative to [4-5] in the proposed design is the LRA.

From Sec. 5.3.3, the LRA area of the i-th e-SRAM with $c_i$ IOs can be considered equivalent to the area of $c_i$ memory cells. Therefore, the total area overhead extra to [4-5] is

$$A_{extra} = \frac{\sum_i c_i}{\sum_i (n_i \times c_i)} \times 100\% \qquad (5)$$

where $n_i$ is the capacity of the i-th small e-SRAM. Since the IO number of the e-SRAMs is in
Since the IO number o f the e - S R A M s is i n  practice not extremely large compared with the number o f their capacities, this extra area overhead can be acceptable even neglectable. 98  5.4.4 A Case Study To quantitatively investigate the test time reduction and area overhead extra to [4-5] due to our proposed test architecture/methodology, we consider the benchmark e - S R A M s used i n [4]. Therefore, the corresponding parameters can be assigned as follows:  n = 512 (the largest capacity o f e - S R A M s under test is 512 words); c = 100 (the largest IO number o f e - S R A M s under test is 100). Moreover, we assume the test clock period is 10ns predicted by the ITRS documents [21] for the recent on-chip B I S T clock period. W i t h the assumptions above, the comparison results can be summarized i n Table 5-1. Table 5-1 Comparisons of test time reduction and area overhead extra to [4-5] R Aextra  «0.1%  Interface W i r i n g Overhead w/o D R F test  with D R F test  20x  750x  1  From Table 5-1, our test architecture can reduce the test time o f those benchmark e - S R A M s in [4-5] at least at a factor o f 20x. If D R F s test is considered in both [4-5] and this chapter, the test time ratio for those e - S R A M s under test i n [4-5] could be extremely high, i.e., 750x.  5.5 Summary and Future Work One o f the trends for memory dominance i n S o C is that a large number o f distributed small eS R A M s are embedded into the latter as buffers. The major challenge o f testing these memories is not only the test area overhead i n terms o f test circuits and wiring for pattern delivery and response analysis, but also the test time under the high defect coverage requirements.  99  The test architecture i n [4-5] enables test execution parallelism for all small distributed eS R A M s . However, the serial mode for delivering the patterns and analyzing response seriously hurts the test time efficiency. 
More importantly, the neglect of DRF testing in [4-5] not only reduces the defect coverage but also overstates the test time reduction, because testing DRFs is time-consuming.

This chapter improves the test architecture in [4-5] by including DRF tests and revising the response analysis method. We adopt a previously proposed DFT technique, referred to as "PDWTM", to cover the DRFs with the least test time. The parallel but local response analysis methodology greatly reduces the total test time with reasonable area overhead. Compared with [4-5], the evaluation results from a case study indicate that the test time is reduced by a factor of approximately twenty, while the area penalty extra to [4-5] remains negligible, i.e., much less than 0.1% of the total memory area for test circuits, plus one interface wire. Since the patterns are still delivered in a serial mode, our future work may aim to further reduce the test time of distributed small e-SRAMs by improving the pattern delivery method without adversely affecting signal routing. Moreover, the LRA will become slow for very high fan-ins due to the large accumulated capacitance on its output node. Improving the LRA's speed will also be part of our future work.

5.6 References

[1] Y. Zorian and S. Shoukourian, "Embedded-memory test and repair: infrastructure IP for SoC yield", IEEE Design & Test of Computers, Vol. 20, No. 3, pp. 58-66, 2003.
[2] B. Wang, J. Yang, J. Cicalo, A. Ivanov and Y. Zorian, "Reducing Embedded SRAM Test Time under Redundancy Constraints", Proceedings of the 22nd IEEE VLSI Test Symposium (VTS'04), 2004, pp. 237-242.
[3] A. Bommireddy, J. Khare, S. Shaikh, and S. Su, "Test and debug of networking SoCs - a case study", Proceedings of the 18th IEEE VLSI Test Symposium (VTS), 2000, pp. 121-126.
[4] W. B. Jone, D. C. Huang, S. C. Wu, and K. J.
Lee, "An efficient BIST method for small buffers", Proceedings of the 17th IEEE VLSI Test Symposium, 1999, pp. 246-251.
[5] W. B. Jone, D. C. Huang, S. C. Wu, and K. J. Lee, "An efficient BIST method for distributed small buffers", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 10, No. 4, pp. 512-515, 2002.
[6] D. C. Huang and W. B. Jone, "A parallel transparent BIST method for embedded memory arrays by tolerating redundant operations", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 21, No. 5, pp. 617-628, 2002.
[7] W. B. Jone, D. C. Huang and S. R. Das, "An efficient BIST method for non-traditional faults of embedded memory arrays", Proceedings of the 19th IEEE Instrumentation and Measurement Technology Conference (IMTC/2002), 2002, Vol. 1, pp. 601-606.
[8] W. B. Jone, D. C. Huang and S. R. Das, "An efficient BIST method for non-traditional faults of embedded memory arrays", IEEE Transactions on Instrumentation and Measurement, Vol. 52, No. 5, pp. 1381-1390, 2003.
[9] C. W. Wang, R. S. Tzeng, C. F. Wu, C. T. Huang, C. W. Wu, S. Y. Huang, S. H. Lin, and H. P. Wang, "A built-in self-test and self-diagnosis scheme for heterogeneous SRAM clusters", Proceedings of the 10th Asian Test Symposium, 2001, pp. 103-108.
[10] C. C. Wang, P. M. Lee, C. F. Wu and H. L. Wu, "High Fan-in Dynamic CMOS Comparators with Low Transistor Count", IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, Vol. 50, No. 9, pp. 1216-1220, 2003.
[11] J. Yang, B. Wang, and A. Ivanov, "Open Defects Detection within 6T SRAM Cells using a No Write Recovery Test Mode", Proceedings of the International Conference on VLSI Design 2004, 2004, pp. 493-498.
[12] J. Yang, B. Wang, Y. Wu and A.
Ivanov, "Fast Detection of Data Retention Faults and Other SRAM Cell Open Defects", to appear in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD).
[13] M. Marinescu, "Simple and Efficient Algorithms for Functional RAM Testing", Proceedings of the IEEE International Test Conference, 1982, pp. 236-239.
[14] B. Nadeau-Dostie, A. Silburt and V. K. Agarwal, "A serial interfacing technique for built-in and external testing of embedded memories", Proceedings of the IEEE 1989 Custom Integrated Circuits Conference, 1989, pp. 22.2/1-22.2/5.
[15] B. Nadeau-Dostie, A. Silburt and V. K. Agarwal, "Serial interfacing for embedded-memory testing", IEEE Design & Test of Computers, Vol. 7, No. 2, pp. 52-63, 1990.
[16] C. F. Wu, C. T. Huang and C. W. Wu, "RAMSES: a fast memory fault simulator", Proceedings of the International Symposium on Defect and Fault Tolerance in VLSI Systems, 1999, pp. 165-173.
[17] V. H. Champac, J. Castillejos, and J. Figueras, "IDDQ testing of opens in CMOS SRAMs", Proceedings of the 16th IEEE VLSI Test Symposium, 1998, pp. 106-111.
[18] D. H. Yoon, H. S. Kim, and S. H. Kang, "Dynamic Power Supply Current Testing for Open Defects in CMOS SRAMs", Electronics and Telecommunications Research Institute (ETRI) Journal, Vol. 23, No. 2, pp. 77-84, 2001.
[19] A. Meixner and J. Banik, "Weak write test mode: an SRAM cell stability design for test technique", Proceedings of the International Test Conference, 1996, pp. 309-318.
[20] V. H. Champac, V. Avendano, and M. Linares, "Bit line sensing strategy for testing for data retention faults in CMOS SRAMs", Electronics Letters, Vol. 36, Issue 14, pp. 1182-1183, 2000.
[21] International Technology Roadmap for Semiconductors, 2001 Edition: "Executive Summary", pp. 44-48, 2001.

Chapter 6 A Fast Diagnosis Scheme for Multiple Distributed Small eSRAMs¹¹

¹¹ A version of this chapter has been published: B. Wang, Y. Wu and A. Ivanov, "A Fast Diagnosis Scheme for Distributed Small Embedded SRAMs", Proceedings of DATE'2005, March 2005, pp. 852-857.

6.1 Introduction

Currently, one of the System-on-a-Chip (SoC) paradigms is associated with a trend in which an increasingly large number of small SRAMs are embedded for buffering data between different computational components [1]. As embedded SRAMs (e-SRAMs) become increasingly dense, the ability to diagnose faults becomes more important. Diagnosis of such memories is important not only for locating the faulty cells so that repair can be done to improve the production yield, but also for debugging the memory circuits for process improvement during the product development stage [2].

Difficulties with the diagnosis of distributed small e-SRAMs lie in the following: (i) External testers become increasingly incapable of diagnosing deeply embedded memories with limited external observability and controllability; Built-in Self-Diagnosis (BISD) appears to be the only known cost-effective solution for such problems. (ii) For large numbers of relatively small e-SRAMs, the use of separate BISD controllers/circuitry can amount to unacceptable test area overhead. (iii) The spatial distribution of several small e-SRAMs makes routing the wires required for delivering patterns and analyzing responses problematic, especially if a single BISD controller is shared by many SRAMs to keep the overhead low. (iv) Their diagnosis time is dominated by the time for diagnosing Data Retention Faults (DRFs), which, in practice, are diagnosed by performing a read operation after a predetermined delay (e.g., 100 ms) [3].
In other words, the total diagnosis time for these small e-SRAMs is at least a couple of hundred milliseconds, regardless of the memory size/capacity and the diagnosis methodology, e.g., parallel or sequential diagnosis.

To overcome the above challenges, previous work mainly focuses on developing diagnosis architectures that support parallel BISD with a single shared BISD controller [4-8]. Parallel diagnosis of distributed small e-SRAMs minimizes the diagnosis area overhead without negatively affecting the diagnosis coverage, while allowing a dramatic reduction in the total diagnosis time. However, the scheme in [4] only supports multiple small e-SRAMs of the same size, which is usually impractical in a real SoC. The architecture in [5-6] has a separate general data background generator and control signal generator associated with each memory; this scheme is generally not feasible for diagnosing multiple distributed small e-SRAMs due to the routing and area penalties. The bi-directional serial interface used in [7-8] for delivering patterns and collecting responses not only simplifies the routing from the BISD controller to the memories under diagnosis, but also solves the serial fault masking problem of the single-directional serial interface used in [9-10]. Unfortunately, a March element with the bi-directional serial interface in [7-8] can detect at most one fault. Thus, the diagnosis time depends on the defect rate, which results in long diagnosis times even under a reasonable defect rate. More importantly, all the previous work fails to consider DRFs, which dominate the diagnosis time of small e-SRAMs. As a result, the diagnosis coverage is compromised and the diagnosis time reduction is overstated.
This chapter proposes a new diagnosis scheme targeting total diagnosis time reduction for distributed small embedded SRAMs while maintaining acceptable control signal routing complexity and area overhead. Our designs are based on those proposed in [7-8]. For each e-SRAM, we use a pair consisting of a Serial-to-Parallel Converter (SPC) and a Parallel-to-Serial Converter (PSC) to replace the bi-directional serial interface in [7, 8]. This avoids both serial fault masking and defect-rate-dependent diagnosis. We combine the proposed scheme with an effective design-for-test (DFT) technique known as "Pre-Discharge Write Test Mode" (PDWTM) [11-12] for diagnosing DRFs without incurring any extra delay time. Together, these improvements yield high coverage with a relatively short execution time and low area overhead for diagnosing multiple, small distributed e-SRAMs.

The remainder of this chapter is organized as follows. In Sec. 6.2, we briefly review the diagnosis architecture in [7-8]. The detailed designs for reducing the diagnosis time of distributed small SRAMs are proposed in Sec. 6.3. The diagnosis evaluations, i.e., diagnosis coverage analysis, diagnosis time comparison, and diagnosis area overhead estimation, are discussed in Sec. 6.4. Finally, Sec. 6.5 summarizes the chapter.

6.2 Review of Previous Work

We begin by briefly reviewing the basics of the diagnosis architecture developed in [7-8] for diagnosing distributed small e-SRAMs. To achieve low area overhead, a single shared BISD controller is adopted, which includes an address trigger to enable the address generators located locally to each memory, a data background generator, and a control signal generator. The memory address generators are local to each memory to simplify routing.
To overcome the data routing challenge and the serial fault masking problem arising from the single-directional serial interface [9, 10], a bi-directional serial interface is designed to deliver patterns and collect responses. These responses are routed back to the controller and compared with the expected values, bit by bit, for each memory. Once a defective cell has been detected, it can be replaced with a spare cell if one is available. This diagnosis scheme is shown in Figure 6-1.

The DiagRSMarch algorithm used in [7-8] is based on the March C- algorithm [13], but can detect all the faults covered by the March CW algorithm [14], which extends March C- by considering multiple data backgrounds. The bi-directional serial interface in [7-8] improves on the single-directional serial scan structures in [9-10] during data application and observation, such that no fault can be masked by a preceding fault and all the faulty cells can be correctly identified. An example of this bi-directional interface technique is shown in Figure 6-2.

With such a serial interface, at most one fault can be detected in each March element. In other words, the total diagnosis process depends on the defect rate. For a general manufacturing process with a reasonable defect rate, this diagnosis scheme results in long diagnosis times. Moreover, the authors of [7-8] do not consider the diagnosis of DRFs. Not considering DRFs limits the diagnosis coverage and also overstates the diagnosis time reduction achieved by their proposed architecture, since the time required for diagnosing DRFs usually dominates the total diagnosis time of a small e-SRAM.
Figure 6-1 The diagnosis scheme in [7-8]

Figure 6-2 Memory with bi-directional serial connections in the BISD mode

6.3 The Fast Diagnosis Scheme

6.3.1 Architectural Overview

To improve on the architecture of [7-8] by shortening diagnosis time without imposing excessive area overhead, we propose a diagnosis scheme based on serial delivery but parallel application of patterns, combined with serial analysis of responses. The patterns serially delivered to the e-SRAMs are applied to these memories through the Serial-to-Parallel Converters (SPCs). The responses from each e-SRAM are routed back to the BISD controller serially through a Parallel-to-Serial Converter (PSC). The comparator array compares these responses with the expected values, bit by bit, for each e-SRAM. Once a defective cell has been detected, the diagnosis information, e.g., the faulty address and the applied data background, is registered for on-chip repair or shifted out for off-line analysis. To avoid worsening the diagnosis signal routing problem, both the SPC and the PSC are placed locally at the memory under diagnosis; in other words, each e-SRAM has its own SPC and PSC. As in [7-8], the address generator of each e-SRAM is also local to that memory to save test address routing area.
The global BISD controller is designed based on the largest (i.e., largest capacity) and the widest (largest IO number) e-SRAM(s).

A time-efficient method for diagnosing DRFs is the "Pre-Discharge Write Test Mode" (PDWTM) of [11-12]. We adopt this method here as well, implementing it with the NWR March method. Since this implementation requires only a single control gate to disable bit-line pre-charge in an e-SRAM during DRF diagnosis, a single PDWTM signal is routed to all the memories. This signal is added to the control generator to enable DRF diagnosis for all e-SRAMs.

For a fair comparison, we use the March CW algorithm of [14] for the proposed diagnosis scheme. Based on the above discussion, the proposed diagnosis scheme is shown in Figure 6-3.

Figure 6-3 The proposed diagnosis scheme

The proposed scheme works as follows. Before each March element begins, a test pattern is serially delivered to all the SPCs local to the e-SRAMs. Once pattern delivery is complete, the controller triggers the local address generators to conduct a full March element before a new test pattern is provided. During the read phase of each March element, once the memory responses have been captured by the PSC, they are shifted back to the BISD controller bit by bit while the memory is in an idle or no-op mode. If a memory has no idle or no-op mode, it is placed in read mode and the read data are ignored.
Since the PSC shifting path does not involve the memories, there is no fault masking effect. The comparator array in the central controller compares these responses with the expected values bit by bit. Once a defective cell is found, the diagnosis information, e.g., failing addresses and data background, is either registered for on-chip repair or scanned out for off-line analysis.

As in [7-8], a pattern is written to each address only once for the largest memory or memories. For smaller ones, however, the same pattern may be written to each address multiple times as the addresses wrap around. Note that the responses obtained from a smaller e-SRAM change as soon as the addresses wrap around for the first time, due to the read-modify-write operations of March C-. Knowing when the addresses wrap around requires memory size information. As in [7-8], this chapter stores this information in the BISD controller, so that the comparison in the BISD controller can tolerate these redundant read/write operations.

6.3.2 Serial to Parallel Converter (SPC)

In the proposed diagnosis scheme, the SPCs receive the patterns delivered by the Data Background Generator of the BISD controller and apply these patterns to the corresponding e-SRAMs in parallel.

If the serial pattern from the Data Background Generator is shifted from the least significant bit (LSB) to the most significant bit (MSB), and the SPC also converts the pattern from the LSB to the MSB, different types of SPCs are needed.
For example, if both pattern delivery and conversion are performed from LSB to MSB and the Data Background Generator delivers a Diagnosis Pattern DP[c-1:0], the converted patterns for the widest e-SRAM and a narrower one are DP[c-1:0] and DP[c-1:c-c'], respectively, where c and c' are the IO numbers of the widest e-SRAM and the narrower one. This is because the first (c-c') bits of the pattern are shifted out of the narrower e-SRAM's SPC and lost. However, the pattern expected at the narrower e-SRAM is DP[c'-1:0]. This mismatch may reduce coverage.

To prevent this potential coverage loss, we design pattern delivery and conversion for all memories as follows: the serial pattern is shifted from the MSB to the LSB, and the corresponding SPC converts the pattern from the MSB to the LSB. Accordingly, the output of the Data Background Generator is also modified to proceed from the MSB to the LSB. With this design, all patterns are correctly delivered in parallel to every small e-SRAM under diagnosis.

Figure 6-4 Designs for pattern delivery and SPC: (a) for c = 4; (b) for c = 3

A design example of two co-existing small e-SRAMs with c = 4 and c' = 3 is shown in Figure 6-4 (a) and (b), respectively; that is, the widest e-SRAM has an IO number of 4 and the narrower one has an IO number of 3. Note that a new diagnosis pattern is delivered to the SPC only once, just before each March element begins.

6.3.3 Parallel to Serial Converter (PSC)

The proposed PSCs collect diagnosis responses from each e-SRAM in parallel and convert them into serial form. These serialized Diagnosis Response (DR) sequences are shifted back to the BISD controller from the LSB to the MSB.
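The SPC bit-ordering argument of Sec. 6.3.2 and the LSB-first PSC serialization can be checked with a small behavioral model. It is a sketch under stated assumptions: patterns and responses are lists of bits indexed from the LSB, one bit moves per shift clock, and `spc_load`, `psc_serialize` and `failing_bits` are hypothetical names rather than blocks of the original design.

```python
def spc_load(dp, width, msb_first):
    """Model a width-bit SPC shift register fed with the c-bit pattern dp.

    dp[i] holds DP[i] (LSB at index 0). The result reg[j] is the bit applied
    to data input j of the memory after all c shift clocks.
    """
    reg = [0] * width
    if msb_first:
        # DP[c-1] is shifted in first: bits enter at the LSB end and move
        # toward the MSB, so the first (c - width) bits fall off the MSB end
        for bit in reversed(dp):
            reg = [bit] + reg[:-1]
    else:
        # DP[0] is shifted in first: bits enter at the MSB end and move
        # toward the LSB, so the first (c - width) bits fall off the LSB end
        for bit in dp:
            reg = reg[1:] + [bit]
    return reg


def psc_serialize(dr):
    """Capture dr in parallel (scan_en = 0), then shift it out LSB first (scan_en = 1)."""
    q = list(dr)                   # capture cycle: one scan DFF per memory output
    stream = []
    for _ in range(len(q)):
        stream.append(q[0])        # the LSB flip-flop drives the serial output
        q = q[1:] + [0]            # shift cycle: each DFF loads its neighbour's value
    return stream


def failing_bits(dr, expected):
    """Bit-by-bit comparison of the serialized response, as in the comparator array."""
    stream = psc_serialize(dr)
    return [j for j, (got, exp) in enumerate(zip(stream, expected)) if got != exp]


dp = [1, 0, 1, 1]                          # DP[3:0] = 1101, so c = 4
print(spc_load(dp, 3, msb_first=False))    # [0, 1, 1]: DP[3:1], wrong for a 3-bit memory
print(spc_load(dp, 3, msb_first=True))     # [1, 0, 1]: DP[2:0], as intended
print(failing_bits([1, 1, 1], [1, 0, 1]))  # [1]: bit 1 of the response is wrong
```

For the 4-bit memory both orderings deliver the full pattern; only the narrower memory exposes the difference, which is why the MSB-to-LSB convention is used for all SPCs.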
Although the responses of each e-SRAM are analyzed serially, the response sequences from all the memories are analyzed in parallel.

Since all the PSCs are independent of each other, they can be designed identically for every e-SRAM, i.e., shifting from LSB to MSB. To separate the memory outputs from the shifting components, scan-type DFFs are adopted. An example PSC is shown in Figure 6-5.

Figure 6-5 Design for a general PSC

In Figure 6-5, the diagnosis responses are first captured into c' registers in parallel. The memory then enters an idle mode while the responses are shifted back to the BISD controller for evaluation. If a memory has no idle mode, it can be placed in read mode, with the read data ignored, during the PSC shift operation. Since the memory does not interfere with the PSC shift operation, these extra read operations have no negative impact on diagnosis coverage. However, an extra scan_en signal is required to select between capturing the memory test response and shifting (serializing) the captured response. Note that serializing through the PSC while the memory is idle does not compromise at-speed diagnosis coverage. This is because, in the read-modify-write operations used in March C-, e.g., (r0, w1), the only signals that change after r0 and before w1 are the read/write enable (WEN) and the data inputs. As long as WEN and the data inputs do not change until the last PSC shift operation, the shift operation does not reduce the at-speed coverage of the WEN decoding and data input circuitry.

6.3.4 Diagnosis of Data Retention Faults

To diagnose DRFs, we adopt a previously proposed low-penalty DFT technique from [11-12], referred to as the "Pre-Discharge Write Test Mode" (PDWTM).
Like the methodology in [15-16], PDWTM creates a special write cycle that distinguishes a good cell from a faulty cell subject to a DRF caused by an open defect on the pull-up PMOS. A typical 6T SRAM cell with storage node A and complementary storage node B, shown in Figure 6-6, is used to illustrate the differences between the specifically designed "No Write Recovery Cycle" (NWRC) and a normal write cycle. The bit-line pre-charge circuit of NWRTM is also shown in Figure 6-6, where the signal PDWTM switches between the NWRC and a normal write cycle. In Figure 6-6, during a normal w1 cycle, node B is pulled down by bit line BLb, which is driven to "true" GND by the write control logic, and node A is pulled up by its pull-up PMOS. Here, "true" GND means that the node is driven to the GND voltage level by an active device. Due to the latch mechanism of the memory cell, the cell flips its value from "ZERO" to "ONE" once the voltage difference between nodes A and B reaches a threshold.

In [15-16], bit lines BL and BLb are set to a given voltage level between Vcc and GND during the write operation, so that a good cell fails to flip while a faulty cell flips. Similarly, we set the voltage levels of BL and BLb to "float" GND and "true" GND, respectively. Here, "float" GND means that the node voltage is at GND but not actively driven by any device. This produces the opposite result: a good cell succeeds in flipping its logic value while a faulty cell fails to do so. In the above example, a good cell has no problem writing a "ONE", because node B is pulled down by bit line BLb and the cell flips to "ONE" due to the latch mechanism. However, a faulty cell subject to a DRF fails to flip because the voltage level of node A never exceeds that of node B.
The voltage level of node A always remains at GND because (i) without the PMOS, i.e., without a path to the supply rail, node A cannot be pulled to "ONE" no matter how low the node B voltage falls, so the latch in this faulty cell malfunctions; and (ii) there is no charge sharing with bit line BL, because BL is set at "float" GND. Since GND is the lowest achievable voltage level, node A remains at GND. Consequently, the voltage level of node A never exceeds that of node B and the faulty cell fails to flip. Therefore, DRFs are detected under PDWTM.

Figure 6-6 A typical 6T SRAM cell and its pre-charge control circuits for PDWTM

Like a normal write operation, a PDWTM write operation successfully writes a good cell; unlike a normal write, it may fail to write a defective cell that causes DRFs. Therefore, PDWTM can be merged with any March test by simply adding two extra NWRCs just before the normal write [11-12]. Other DFT techniques do not share this advantage. Hence, PDWTM achieves the shortest DRF test time among existing DFT techniques.

6.4 Evaluations

6.4.1 Diagnosis Coverage Analysis

Compared with the diagnosis scheme of [7-8], the proposed scheme simply replaces the bi-directional serial interface with an SPC/PSC pair; all other components of [7-8] are preserved. In terms of diagnosis coverage, the March CW algorithm adopted in this chapter provides the same coverage as the serialized March C- used in [7-8]. However, due to the use of PDWTM, the proposed scheme additionally covers DRFs. As a result, the proposed scheme achieves higher diagnosis coverage than [7-8], because it can also diagnose DRFs and other defects that do not cause faulty logical behavior but may cause reliability problems.
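The DRF coverage gain comes from the NWRC behavior described in Sec. 6.3.4, which can be reduced to a logic-level sketch. It abstracts away all voltages and rests on two stated assumptions: a normal write always succeeds (an open pull-up does not stop an ordinary write, which is why the defect shows up only as a retention failure), and an NWRC write succeeds only if the pull-up PMOS of the node that must rise is intact. `Cell6T` and `nwrc_detects` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Cell6T:
    """Logic-level sketch of a 6T SRAM cell; state is the value at storage node A."""
    state: int = 0
    # Written values whose rising node lacks its pull-up PMOS (open defect):
    # {1} means node A's pull-up is open, {0} means node B's pull-up is open.
    open_pullup: set = field(default_factory=set)

    def write(self, value, nwrc=False):
        if nwrc and value != self.state and value in self.open_pullup:
            # NWRC: both bit lines sit at GND ("true" on the side being pulled
            # down, "float" on the other), so the rising node can only be pulled
            # up by its own PMOS; with that PMOS open the cell does not flip.
            return
        # Normal write (assumed to succeed even on a defective cell), or an NWRC
        # write on a cell whose relevant pull-up is intact: the latch flips.
        self.state = value

    def read(self):
        return self.state


def nwrc_detects(cell, value):
    """Normal-write the complement, NWRC-write `value`, read back; True flags a DRF."""
    cell.write(1 - value)
    cell.write(value, nwrc=True)
    return cell.read() != value


good, faulty = Cell6T(), Cell6T(open_pullup={1})
print(nwrc_detects(good, 1), nwrc_detects(faulty, 1))   # False True
```

In this model a normal write of 1 into the faulty cell still succeeds, so only the NWRC write separates the two cells.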
6.4.2 Diagnosis Time Comparison

Since DRF diagnosis is not considered in [7-8], the diagnosis time reduction reported there can be considered optimistic. With the proposed scheme, the diagnosis time is much lower than that of [7-8].

The DiagRSMarch algorithm of [7-8] is based on the right-shift RSMarch, with extra March elements that include both left-shift operations and checkerboard patterns. Therefore, assuming the largest/widest e-SRAM under diagnosis has a capacity of n words and an IO number of c, the diagnosis time of the DiagRSMarch algorithm of [7-8], without DRF diagnosis, is

$$T_{[7-8]} = 17knct + 9nct = (17k + 9)\,nct \qquad (1)$$

where t is the diagnosis clock period (ns) and k is the number of iterations of the M1 element required in [7-8].

Without DRF diagnosis, the diagnosis time of the selected March CW algorithm for our proposed scheme is

$$T_{proposed} = \left\{\left(5n + 5c + 5n(c+1)\right) + \left(3n + 3c + 2n(c+1)\right)\lceil \log_2 c \rceil\right\} t \qquad (2)$$

where (5n + 5c + 5n(c+1)) is the complexity of a parallel March C- algorithm with our diagnosis scheme, and (3n + 3c + 2n(c+1))⌈log2 c⌉ is the complexity of the March elements added in March CW for detecting intra-word and column decoder faults.

The diagnosis time reduction achieved with the proposed scheme is therefore

$$R = \frac{T_{[7-8]}}{T_{proposed}} = \frac{(17k + 9)\,nc}{(10n + 5c + 5nc) + (5n + 3c + 2nc)\lceil \log_2 c \rceil} \qquad (3)$$

Although this is not obvious from Eq. (3), the reduction factor R always exceeds one in practice, because the iteration number k is always much larger than one.

If DRFs are considered, the extra diagnosis time for DRFs with the architecture of [7-8] includes 8k units of extra complexity (i.e., (w0/r0)R+L and (w1/r1)R+L) and a 200 ms delay time.
In comparison, the proposed scheme requires only 2 units of extra test complexity (i.e., Nw0/Nw1 in [11-12]) for DRF diagnosis. In this case, the diagnosis time ratio R is given by Equation (4):

$$R = \frac{T_{[7-8]} + 8knct + 2 \times 10^{8}}{T_{proposed} + (2n + 2c)\,t} \qquad (4)$$

where t, T_{[7-8]} and T_{proposed} are in ns.

From Eq. (4), the reduction ratio of the proposed scheme can be very high when DRF diagnosis is included.

To quantify the diagnosis time reduction, we use the case study in [17] as the benchmark e-SRAMs, where n = 512, c = 100 and t = 10 ns. We assume that 1% of the memory cells are defective and that all four defect types of [8] occur with equal likelihood. From [8], the maximum total number of faults for each of the benchmark e-SRAMs of [17] is 256. Since the M1 element of [7-8] can cover 75% of those faults and each iteration of the M1 element can identify at most two faults, the minimum iteration number is k = 256 × 0.75 / 2 = 96. Under these assumptions, the diagnosis time reduction factor R without DRFs is at least 84. If DRFs are considered, R for the e-SRAMs of [17] is at least 145.

6.4.3 Area Overhead Estimations

The area overhead of the proposed scheme is evaluated in terms of the number of transistors required to implement it and the number of global wires. Compared with the designs of [7-8], the proposed scheme adds only one extra global wire, which controls the PSC.

In terms of transistor count, the bi-directional serial interface of [7-8] consists of a set of 4:1 multiplexers and latches. In the proposed scheme, an SPC and a PSC together require two sets of shift registers and two sets of 2:1 multiplexers: one for selecting between normal and test inputs, and the other for the scan DFFs in the PSC.
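Returning to the case study of Sec. 6.4.2, Equations (1) to (3) can be evaluated directly. The sketch below assumes the case-study values n = 512, c = 100, t = 10 ns and k = 256 × 0.75 / 2; the function names are hypothetical.

```python
from math import ceil, log2


def t_prior(n, c, k, t):
    """Eq. (1): DiagRSMarch diagnosis time of [7-8] without DRFs."""
    return (17 * k + 9) * n * c * t


def t_proposed(n, c, t):
    """Eq. (2): March CW diagnosis time with the SPC/PSC scheme, without DRFs."""
    parallel_march_c_minus = 5 * n + 5 * c + 5 * n * (c + 1)
    added_elements = (3 * n + 3 * c + 2 * n * (c + 1)) * ceil(log2(c))
    return (parallel_march_c_minus + added_elements) * t


n, c, t = 512, 100, 10          # words, IO number, clock period in ns
k = 256 * 0.75 / 2              # minimum number of M1 iterations
R = t_prior(n, c, k, t) / t_proposed(n, c, t)   # Eq. (3)
print(f"T_[7-8] = {t_prior(n, c, k, t) / 1e6:.1f} ms, "
      f"T_proposed = {t_proposed(n, c, t) / 1e6:.2f} ms, R = {R:.1f}")
```

With ⌈log2 100⌉ = 7, this gives about 840 ms for [7-8] against about 10 ms for the proposed scheme, i.e., R of about 84.2, consistent with the "at least 84" stated above.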
In terms of transistor count, we find that a D flip-flop is equivalent to two 6T SRAM cells, while a latch is equivalent to one 6T SRAM cell. Therefore, the area overhead in excess of [7-8] is three 6T SRAM cells per bit. This extra area is negligible in practice. For example, for the benchmark e-SRAMs of [17], the extra area relative to the scheme of [7-8] is around 1.8% of the total cell area.

6.5 Summary

The major challenge in diagnosing distributed small e-SRAMs lies not only in the diagnosis area overhead, in terms of diagnosis circuits and wire routing, but also in the diagnosis time under high coverage requirements. This chapter presented a significant improvement on the diagnosis architecture described in [7-8]. By replacing the bi-directional serial interface used in [7-8], the proposed scheme greatly reduces diagnosis time. By adopting a previously proposed DFT technique known as "PDWTM" to detect data retention faults, the proposed scheme achieves better coverage. Compared with [7-8], the evaluation results indicate that, under a reasonable 1% defect rate, the diagnosis time is reduced by a factor of at least 84, at the cost of 1.8% extra total cell area relative to [7-8].

6.6 References

[1] A. Bommireddy, J. Khare, S. Shaikh and S. T. Su, "Test and debug of networking SoCs - a case study", Proceedings of the 18th IEEE VLSI Test Symposium (VTS00), 2000, pp. 121-126.
[2] C. Selva, C. Torelli, D. Rimondi, R. Zappa, S. Corbani, et al., "A programmable built-in self-diagnosis for embedded SRAM", Proceedings of the 2004 International Workshop on Memory Technology, Design and Testing (MTDT04), 2004, pp. 84-89.
[3] B. Wang, J. Yang, J. Cicarlo, A. Ivanov and Y. Zorian, "Reducing embedded SRAM test time under redundancy constraints", Proceedings of the 22nd IEEE VLSI Test Symposium (VTS04), 2004, pp. 237-242.
[4] L. M. Deng, R. S. Tzeng, C. F. Wu, C. T. Huang, C. W. Wu, et al., "A parallel built-in self-diagnosis scheme for embedded memory", Proceedings of the 2004 IEEE International Workshop on Memory Technology, Design, and Testing (MTDT04), 2004, pp. 65-69.
[5] C. W. Wang, R. S. Tzeng, C. F. Wu, C. T. Huang, C. W. Wu, et al., "A built-in self-test and self-diagnosis scheme for heterogeneous SRAM clusters", Proceedings of the 10th Asian Test Symposium, 2001, pp. 103-108.
[6] R. C. Aitken, "A modular wrapper enabling high speed BIST and repair for small wide memories", Proceedings of the International Test Conference (ITC04), 2004, pp. 997-1005.
[7] D. C. Huang, W. B. Jone, and S. R. Das, "A parallel built-in self-diagnostic method for embedded memory buffers", Proceedings of the International Conference on VLSI Design, 2001, pp. 397-402.
[8] D. C. Huang and W. B. Jone, "A parallel built-in self-diagnostic method for embedded memory arrays", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 21, Issue 4, pp. 449-465, 2002.
[9] B. Nadeau-Dostie, B. Silburt, and V. K. Agarwal, "A serial interfacing technique for built-in and external testing of embedded memories", Proceedings of the IEEE 1989 Custom Integrated Circuits Conference, 1989, pp. 22.2/1-22.2/5.
[10] B. Nadeau-Dostie, B. Silburt, and V. K. Agarwal, "Serial interfacing for embedded-memory testing", IEEE Design & Test of Computers, Vol. 7, No. 2, pp. 52-63, 1990.
[11] J. Yang, B. Wang and A. Ivanov, "Open defects detection within 6T SRAM cells using a No Write Recovery Test Mode", Proceedings of the International Conference on VLSI Design 2004, 2004, pp. 493-498.
[12] J. Yang, B. Wang, Y. Wu and A. Ivanov, "Fast Detection of Data Retention Faults and Other SRAM Cell Open Defects", to appear in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD).
[13] M. Marinescu, "Simple and efficient algorithms for functional RAM testing", Proceedings of the IEEE International Test Conference, 1982, pp. 236-239.
[14] C. F. Wu, C. T. Huang and C. W. Wu, "RAMSES: a fast memory fault simulator", Proceedings of the International Symposium on Defect and Fault Tolerance in VLSI Systems, 1999, pp. 165-173.
[15] A. Meixner and J. Banik, "Weak write test mode: an SRAM cell stability design for test technique", Proceedings of the International Test Conference, 1996, pp. 309-318.
[16] V. H. Champac, V. Avendano, and M. Linares, "Bit line sensing strategy for testing for data retention faults in CMOS SRAMs", Electronics Letters, Vol. 36, Issue 14, pp. 1182-1183, 2000.
[17] B. Wang, Y. Wu and A. Ivanov, "Designs for Reducing Test Time of Distributed Small Embedded SRAMs", Proceedings of the IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, 2004, pp. 120-128.

Chapter 7 A Retention-aware Test Power Model for an e-SRAM

7.1 Introduction

Advances in semiconductor technology have enabled concurrent testing of embedded cores, e.g., embedded SRAMs (e-SRAMs), in a System-on-a-Chip (SoC) [1-10] as an effective approach to reducing total test time. Power consumption in test mode is known to be higher than in mission mode [6]. As a result, when a large number of e-SRAMs are tested simultaneously, the total power dissipation during test can exceed power limits, causing device damage, yield loss, and unacceptably high defect levels. To exploit test parallelism for maximal test time reduction without violating the test power constraints, efficient test scheduling algorithms are needed, and the test power model is a key component of any power-constrained test scheduling.
Traditionally, the test power model for e-SRAMs is identical to that used for SoC cores made of random logic [6-10]. This model simply assumes the power consumption during test to be constant. Consequently, a memory test can be represented by a single rectangle, as shown in Figure 7-1, where the rectangle's width denotes the test time and its height denotes the test power. Though simple, this model is overly pessimistic for e-SRAM testing in which Data Retention Faults (DRFs) are included as target faults. Even though Design-for-Test (DFT) techniques exist for DRFs [11-15], they often come with high hardware/performance overhead and design effort. As a result, the most common practice for DRF testing today is still to use two separate delay cycles in a test, during which the memories under test perform no operation and thus consume "zero" or negligible power [16-17]. In this chapter, we take these two separate "zero"-power delay cycles into consideration and propose a "retention-aware" test power model for e-SRAMs. Our new model uses three rectangles, instead of one, to describe the power consumption during e-SRAM testing more accurately. This model is simple and clearly separates the test power for detecting DRFs from that for other faults. Compared with the previous power model, the new model provides more freedom for optimizing the overall e-SRAM test schedule.

Figure 7-1 Simple "single-rectangle" test power model for e-SRAMs (panels: a general test algorithm with a delay between its March elements; the corresponding test power model, test power vs. test time)

A version of this chapter has been published. B. Wang, J. Yang, Y. Wu and A. Ivanov, "A Retention-Aware Test Power Model for Embedded SRAM", Proceedings of the ASP-DAC 2005, pp. 1180-1183, Jan. 18-21, 2005.
To evaluate the effect of the new model on test time reduction, we use the same scheduling algorithms presented in [8], but with the traditional power model replaced by the retention-aware model. The new model is evaluated under various assumptions and test conditions. As expected, our evaluation results show that the greatest test time reduction occurs when the delays required for testing DRFs dominate the total e-SRAM test time. Furthermore, we show that the test time reduction obtained with the new model is more pronounced for an SoC containing a dominant number of small e-SRAMs.

7.2 A Retention-Aware Test Power Model

DRFs are an important type of memory fault. They occur when a memory cell fails to retain a stored logic value after a certain time. Since DRFs are difficult to detect with normal memory write/read operations, they are traditionally tested by performing a read operation after a pre-determined delay, denoted by TDF. In practice, TDF is typically on the order of 100 milliseconds [18].

When a memory is placed into such a delay phase for testing DRFs, it performs no read or write operation. During the delay portion of a memory test such as that shown in Figure 7-1, the power dissipated by the memory is negligible compared with the power for the rest of the memory test and for the tests of other SoC cores. Hence, representing a memory's test power by a "single rectangle" is overly conservative in practice, and a retention-aware test power model is needed to describe e-SRAM test power more accurately.

Testing DRFs usually requires two delay cycles. During the first delay cycle, the memory holds a specified data pattern; during the second, it holds the complementary pattern.
To reflect the two delay cycles in a memory test, we propose restricting the power to two levels: zero and a specified maximum value. Zero power dissipation is assumed during the two delay cycles, and the maximum dissipation is assumed for the rest of the memory test. Figure 7-2 illustrates the proposed test power model for a general e-SRAM test algorithm.

Figure 7-2 The "retention-aware" test power model (legend: delay cycles for DRFs; non-delay cycles during tests; axes: test power vs. test time; non-delay portions A, B and C)

In Figure 7-2, the entire memory test session is divided into three non-delay sub-sessions labeled A, B, and C. T1 and T2 in Figure 7-2 represent the delays separating the non-DRF test cycles, i.e., between A and B and between B and C, respectively. Generally, T1 and T2 are identical and equal to TDF. However, to study the effect of T1 and T2 on the test time reduction obtained with the new model, we allow T1 and T2 to differ from each other and to exceed TDF as a result of the applied test scheduling algorithm. The durations of sub-sessions A, B, and C need not be identical either.

TDF depends on many factors, e.g., design techniques and manufacturing technology. In addition, various DFT techniques can affect TDF; in [17], for example, the authors show how to reduce TDF without modifying the memory.

7.3 Test Time Evaluations for BISTed e-SRAMs

For simplicity, we assume for now that all SoC cores under test are e-SRAMs with Built-in Self-Test (BIST). Later, we discuss the case where an SoC contains different percentages of e-SRAMs and other types of embedded cores. To compare the test times obtained with different test power models under various assumptions and test conditions, we use the test scheduling algorithms of [8] for the evaluations.
The test time reduction factor R is defined as the ratio of the test time obtained with the traditional single-rectangle model to the test time obtained with the new retention-aware model.

7.3.1 A Case Study

In this subsection, a test case with four e-SRAMs is defined to demonstrate the test time reduction. We then extend this evaluation to a large number of e-SRAMs by replicating the four-e-SRAM block multiple times.

For both the retention-aware model and the traditional model, the total test time of a single e-SRAM is the sum of the delay cycles, i.e., 2 × TDF, and the non-delay cycles of the memory test algorithm. For simplicity, we assume the general test algorithm of Figure 7-2 with three equal-duration non-delay sub-cycles A, B and C; non-equal durations are examined in the next subsection. We also assume TDF = 100 ms and a system test power limit of 1000 mW. Finally, we consider four e-SRAMs with the test times and test powers reported in Table 7-1. As shown in Figure 7-3 (b), our retention-aware model reduces the total test time from 201 + 400 = 601 ms to 201 + 2 × (100 - 1/3) = (400 + 1/3) ms. In other words, a reduction factor R ≈ 1.5x is achieved.

Table 7-1 Example: e-SRAMs under test
e-SRAM    Total test time (ms) = (A+B+C) + 2 × TDF    Test power (mW)
M1        1 + 2 × 100                                 600
M2        1 + 2 × 100                                 400
M3        200 + 2 × 100                               300
M4        200 + 2 × 100                               700

Figure 7-3 Test scheduling with (a) "single-rectangle" and (b) "retention-aware" test power model using the test scheduling algorithms in [8]

To determine the effectiveness of the new model when a larger number of e-SRAMs exist, we simply replicate memories M1-M4 N times.
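The Figure 7-3 (b) schedule can be checked against the retention-aware model with a short script. This is not the scheduling algorithm of [8]; it only verifies one hand-placed schedule consistent with the figure labels, assuming that M3 and M4 run their A, B and C portions together while M1 and M2 run theirs at the start of the M3/M4 delay cycles (and after M3/M4 finish). Exact fractions keep the 1/3 ms durations exact; all names are hypothetical.

```python
from fractions import Fraction

TDF = Fraction(100)   # ms, as assumed in Sec. 7.3.1
P_MAX = 1000          # mW, system test power limit


def abc(starts, width, power):
    """The three non-delay rectangles (A, B, C) of one memory as (start, end, power)."""
    return [(s, s + width, power) for s in starts]


def delays_ok(segs):
    """T1 (A to B) and T2 (B to C) must each be at least TDF."""
    return all(nxt[0] - cur[1] >= TDF for cur, nxt in zip(segs, segs[1:]))


def peak_power(segs):
    """Power is piecewise constant, so checking every rectangle start is enough."""
    return max(sum(p for s, e, p in segs if s <= t < e) for t, _, _ in segs)


big, small = Fraction(200, 3), Fraction(1, 3)     # A = B = C for M3/M4 and M1/M2
m3 = abc([0, big + TDF, 2 * big + 2 * TDF], big, 300)
m4 = abc([0, big + TDF, 2 * big + 2 * TDF], big, 700)
# M1/M2 rectangles start when M3/M4 enter a delay cycle (or finish)
m1 = abc([big, 2 * big + TDF, 3 * big + 2 * TDF], small, 600)
m2 = abc([big, 2 * big + TDF, 3 * big + 2 * TDF], small, 400)

segs = m1 + m2 + m3 + m4
makespan = max(e for _, e, _ in segs)
print(all(delays_ok(m) for m in (m1, m2, m3, m4)), peak_power(segs), makespan,
      float(Fraction(601) / makespan))
```

Every gap is at least TDF, the peak power is exactly 1000 mW, and the makespan is 1201/3 ms (400 + 1/3 ms), so R = 601 / (400 + 1/3) ≈ 1.5, matching the value above.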
Figure 7-4 illustrates the reduction factor achieved with our model as N increases. From Figure 7-4, a maximum reduction factor of 3x is achieved when N = 3, because all the delay cycles are filled with non-delay cycles once N reaches 3. We denote the value of N at which the maximum reduction is achieved as N_sat and the corresponding reduction factor as R_sat.

Figure 7-4 Test time reduction factor assuming N replicas of the M1-M4 e-SRAMs block (reduction factor vs. N, for N = 1 to 5)

7.3.2 Impacts of Non-delay Cycle Divisions

In the power modeling of the preceding case study, the widths of the non-delay cycles were assumed to be equal for simplicity. In reality, this may not be the case. Here, we revisit the case study without the equal-duration assumption. The relative divisions we assume, corresponding to A:B:C, are 1:1:1, 1:3:2, and 1:3:4; we refer to these as Divisions 1, 2 and 3. To further explore the impact of the division order, we split Division 3 into three variants (3a, 3b, 3c), corresponding to A:B:C = 1:3:4, 4:1:3, and 3:4:1.

The maximum test time reduction factors R_sat and the corresponding N_sat obtained for these five divisions of the non-delay cycles after test scheduling are shown in Table 7-2.

Table 7-2 Test time reduction for non-delay cycle divisions
Division method    N_sat    R_sat (x)
1                  3        3.0
2                  4        3.0
3a                 5        3.0
3b                 5        3.0
3c                 4        3.0

From Table 7-2, N_sat for Divisions 2 and 3 clearly differs from that of the equal-duration case. However, the saturation value R_sat is the same for all test cases, regardless of how the total non-delay time is divided. In other words, the ratio of the three non-delay cycles, A:B:C, has no impact on R_sat.

7.3.3 Memory Test Algorithm Complexity Impacts on Test Time Reduction

This section studies the effect of test algorithm complexity on the test time reduction obtained with the new power model.

Table 7-3 Test time reduction for different test complexity (time)
Algorithm    e-SRAMs    Total test time (ms) = (A+B+C) + 2 × TDF    N_sat    R_sat (x)
Simple       M1/M2      0.1 + 2 × 100                               16       20.8
             M3/M4      20 + 2 × 100
Nominal      M1/M2      1 + 2 × 100                                 3        2.99
             M3/M4      200 + 2 × 100
Complex      M1/M2      10 + 2 × 100                                2        1.20
             M3/M4      2000 + 2 × 100

For ease of discussion, we refer to the case reported in Sec. 7.3.1 as the nominal algorithm. Based on the nominal case, we introduce two other cases with different time complexities:
• Simple algorithm: the test time of the three non-delay cycles (A+B+C) is 1/10 of that of the nominal algorithm.
• Complex algorithm: the test time of the three non-delay cycles (A+B+C) is 10x that of the nominal algorithm.

As shown in Table 7-3, the more complex the test algorithm, the smaller N_sat and R_sat. This trend is expected, since the relative duration of TDF decreases with respect to the time of the non-delay cycles.

7.3.4 Memory Capacity Impact on Test Time Reduction

In this subsection, we revisit our case study with e-SRAMs of different capacities, again using a simple notation:
• Small-capacity e-SRAMs: the time of the delay cycles, i.e., 2 × TDF, is much greater than the time of the non-delay cycles, e.g., TDF = 100 × (A+B+C).
• Medium-capacity e-SRAMs: the test times of the non-delay cycles and of the delay cycles are comparable, e.g., 2 × TDF = (A+B+C).
• Large-capacity e-SRAMs: the test time of the non-delay cycles is much greater than that of the delay cycles, e.g., 100 × TDF = (A+B+C).

Except for the time of the non-delay cycles, we keep the assumptions made in Sec. 7.3.1.
We considered six kinds of test cases (for example, small-small denotes that both the M1/M2 and M3/M4 pairs are small-capacity e-SRAMs). These are reported in Table 7-4.

As shown in Table 7-4, different memory capacities yield different R_sat and N_sat. The reduction is greatest for the small-small case. However, when testing a large number of e-SRAMs with long test times, e.g., non-delay cycles on the order of 100 times their TDF, the reduction that our test power model achieves is much smaller.

Table 7-4 Test time reduction for different memory capacities
Test case | Test time M1/M2 (ms) | Test time M3/M4 (ms) | N_sat | R_sat (x)
Small-Small | 201 | 201 | 151 | 200
Small-Medium | 201 | 400 | 3 | 1.50
Small-Large | 201 | 10200 | 2 | 1.04
Medium-Medium | 400 | 400 | 3 | 2.00
Medium-Large | 400 | 10200 | 2 | 1.04
Large-Large | 10200 | 10200 | 1 | 1.02

7.4 Quantification of Test Time Reduction in a SoC

During test scheduling under power constraints, the maximum instantaneous power consumption, P_m, is predetermined and assumed to be constant. At any given instant, the total power consumption is simply the sum of the power dissipation of each core being tested at that time. To minimize the total test time, T_total, without violating the system power constraint P_m, the objective of test scheduling is to fit the test power models of all cores into a single rectangular box. The height of the box is limited by P_m, and the width of the box, i.e., the total test time, is to be minimized.

Assuming the traditional "single-rectangle" test power model for all cores, a lower bound on the total test time, T_total-single, is:

$$T_{total\text{-}single} = \frac{\sum_{i \in \text{SoC cores}} P_i T_i}{P_m} = \frac{\sum_{i \in \text{e-SRAMs}} P_i T_i + \sum_{i \in \text{non-e-SRAM cores}} P_i T_i}{P_m} \qquad (1)$$

where P_i and T_i represent the test power consumption and the test time of core i.
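Equation 1 is an area argument: the total area Σ P_i T_i of all cores divided by the box height P_m. The minimal sketch below evaluates it, assuming each core is described only by (P_i, T_i) and an e-SRAM flag. The optional `retention_aware` switch applies the substitution T_i → T_i − 2×TDF for e-SRAMs that the retention-aware model makes in the next step. The helper `table_7_1` is hypothetical and simply rebuilds the Table 7-1 memories with the A+B+C times of the simple, nominal and complex algorithms of Table 7-3.

```python
TDF = 100.0   # ms
P_M = 1000.0  # system power limit P_m (mW)


def area_bound(cores, p_m=P_M, retention_aware=False, tdf=TDF):
    """Lower bound on total test time: sum of P_i*T_i over all cores, divided by P_m.

    cores: list of (P_i in mW, T_i in ms, is_esram). With retention_aware=True the
    2*TDF delay cycles of each e-SRAM are treated as zero-power time.
    """
    area = 0.0
    for p, t, is_esram in cores:
        if retention_aware and is_esram:
            t -= 2 * tdf
        area += p * t
    return area / p_m


def table_7_1(non_delay_12, non_delay_34):
    """M1-M4 of Table 7-1 with the given A+B+C times (ms) for the M1/M2 and M3/M4 pairs."""
    return [
        (600, non_delay_12 + 2 * TDF, True),
        (400, non_delay_12 + 2 * TDF, True),
        (300, non_delay_34 + 2 * TDF, True),
        (700, non_delay_34 + 2 * TDF, True),
    ]


cases = {"simple": (0.1, 20), "nominal": (1, 200), "complex": (10, 2000)}
for name, (x12, x34) in cases.items():
    cores = table_7_1(x12, x34)
    single = area_bound(cores)
    retention = area_bound(cores, retention_aware=True)
    print(f"{name:>7}: {single:.1f} ms vs {retention:.1f} ms, ratio {single / retention:.2f}")
```

For the nominal memories the single-rectangle bound is exactly the 601 ms scheduled in the case study. The retention-aware bound (201 ms) ignores the ordering of delay and non-delay cycles, so a single copy of M1-M4 cannot reach it; still, the bound ratios (about 20.9, 2.99 and 1.20) lie close to the R_sat values in Table 7-3.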
Next, we replace the traditional test power model used for e-SRAMs in Equation 1 with the "retention-aware" model, while continuing to use the traditional model for other types of cores. As a result, a corresponding lower bound on the total test time achievable with our new model, T_total-retention, can be quantified as follows:

$$T_{total\text{-}retention} = \frac{\sum_{i \in \text{e-SRAMs}} P_i \left(T_i - 2 \times TDF\right) + \sum_{i \in \text{non-e-SRAM cores}} P_i T_i}{P_m} \qquad (2)$$

The ratio of T_total-single to T_total-retention provides an estimate of the maximum time reduction factor that can be achieved using our new model in the presence of non-SRAM cores. This is shown as Equation (3):

$$R_{sat} = \frac{T_{total\text{-}single}}{T_{total\text{-}retention}} = \frac{\sum_{i \in \text{SoC cores}} P_i T_i}{\sum_{i \in \text{SoC cores}} P_i T_i - \sum_{i \in \text{e-SRAMs}} P_i \times 2 \times TDF} \qquad (3)$$

Assuming all the SoC cores consume the same power P and can be tested in the same time T, we define the following two parameters to evaluate R_sat in a SoC environment:
• Test Time Ratio R_T (%) = 2×TDF / T × 100
• Number Ratio R_n (%) = (number of e-SRAMs / number of all SoC cores) × 100

Under these assumptions, P cancels in Equation (3), which reduces to R_sat = 1 / (1 − R_T × R_n), with R_T and R_n expressed as fractions.

Table 7-5 reveals that the "retention-aware" model can lead to a very significant test time reduction if e-SRAMs dominate the total number of embedded cores in a SoC and TDF dominates the test time of the e-SRAMs.

Table 7-5 Test time reduction factor R_sat within a SoC (rows: R_n in %; columns: R_T in %)
R_n \ R_T | 0.1 | 10 | 50 | 90 | 99 | 99.9
0 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
10 | 1.00 | 1.01 | 1.05 | 1.10 | 1.11 | 1.11
50 | 1.00 | 1.05 | 1.33 | 1.82 | 1.98 | 2.00
90 | 1.00 | 1.10 | 1.82 | 5.26 | 9.17 | 9.91
100 | 1.00 | 1.11 | 2.00 | 10.00 | 100 | 1000

7.5 Summary

Parallel testing of multiple cores often leads to a dramatic reduction of test time. However, high power dissipation during parallel test limits the degree of such parallelism. Efficient test scheduling is therefore usually required to minimize test time under power constraints.
However, the effectiveness of such a test scheduling algorithm is limited by the accuracy of the test power models it uses.

Traditionally, test power modeling treats e-SRAMs the same as other embedded random logic cores and represents their test power using the "single-rectangle" model. This chapter showed that this model is overly conservative for e-SRAMs because of the "zero"-power delay cycles used to detect Data Retention Faults (DRFs). By taking advantage of these "zero"-power delay cycles, we proposed a novel "retention-aware" test power model. Our model is simple but more accurate with regard to test power and test time predictions.

When using the new model in a test scheduling algorithm [8], we demonstrated very significant test time reductions compared with the traditional model for e-SRAMs that use simple test algorithms and/or have long wait durations for DRF detection. DFT techniques that detect DRFs without imposing delay cycles often come with some overhead. This chapter demonstrated that such DFT techniques, with their various overheads, are not necessary if there is a large number of SoC cores (including e-SRAMs), as predicted by the ITRS. In other words, scheduling e-SRAM testing properly, by taking advantage of the delay cycles, yields DRF coverage with zero overhead.

7.6 References

[1] E. Larsson, J. Pouget, and Z. Peng, "Defect-aware SoC Test Scheduling", Proceedings of IEEE VLSI Test Symposium, 2004, pp. 359-364.
[2] W. B. Jone, D. C. Huang, S. C. Wu and K. J. Lee, "An efficient BIST method for distributed small buffers", IEEE Transactions on VLSI Systems, Vol. 10, Issue 4, pp. 512-515, 2002.
[3] W. B. Jone, D. C. Huang, S. C. Wu and K. J. Lee, "An efficient BIST method for small buffers", Proceedings of IEEE VLSI Test Symposium, 1999, pp. 246-251.
[4] D. C. Huang and W. Jone, "A parallel transparent BIST method for embedded memory arrays by tolerating redundant operations", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 21, No. 5, pp. 617-628, 2002.
[5] W. B. Jone, D. C. Huang, and S. R. Das, "An efficient BIST method for non-traditional faults of embedded memory arrays", Proceedings of the 19th IEEE Instrumentation and Measurement Technology Conference (IMTC/2002), 2003, Vol. 1, pp. 601-606.
[6] Y. Zorian, "A Distributed BIST Control Scheme for Complex VLSI Devices", Digest of Papers, Eleventh Annual 1993 IEEE VLSI Test Symposium, pp. 4-9, 1993.
[7] C. W. Wang, R. S. Tzeng, C. F. Wu, C. W. Wu, et al., "A built-in self-test and self-diagnosis scheme for heterogeneous SRAM clusters", Proceedings of 10th Asian Test Symposium, 2001, pp. 103-108.
[8] C. W. Wang, J. R. Huang, Y. F. Lin, K. L. Chung, and C. W. Wu, "Test scheduling of BISTed memory cores for SoC", Proceedings of the 11th Asian Test Symposium, 2002, pp. 356-361.
[9] J. Chin and M. Nourani, "Power-time tradeoff in test scheduling for SoCs", Proceedings of the 21st International Conference on Computer Design, 2003, pp. 548-553.
[10] Y. Xia, M. Jeske, and B. Wang, "Using a distributed rectangle bin-packing approach for core-based SoC test scheduling with power constraints", Proceedings of International Conference on Computer Aided Design, 2003, pp. 100-105.
[11] V. H. Champac, J. Castlli and J. Figures, "IDDQ testing of opens in CMOS SRAMs", Proceedings of 16th IEEE VLSI Test Symposium, 1998, pp. 106-111.
[12] D. H. Yoon, "Dynamic Power Supply Current Testing for Open Defects in CMOS SRAMs", Electronics and Telecommunications Research Institute (ETRI) Journal, Vol. 23, No. 2, pp. 77-84, 2001.
[13] A. Meixner, J. Binker, "Weak write test mode: an SRAM cell stability design for test technique", Proceedings of International Test Conference, 1996, pp. 309-318.
[14] V. H. Champac, V. Endano and M. Lineas, M. Linares, "Bit line sensing strategy for testing for data retention faults in CMOS SRAMs", Electronics Letters, Vol. 36, Issue 14, pp. 1182-1183, 2000.
[15] J. Yang, B. Wang, and A. Ivanov, "Open Defects Detection within 6T SRAM Cells using a No Write Recovery Test Mode", Proceedings of International Conference on VLSI Design 2004, 2004, pp. 493-498.
[16] R. Dekker et al., "A realistic fault model and test algorithm for static random access memories", IEEE Transactions on Computer-Aided Design, Vol. C-9, No. 6, pp. 567-572, 1990.
[17] B. Wang, J. Yang and S. Yang, "Reducing Test Time of Embedded SRAMs", Proceedings of the 2003 IEEE International Workshop on Memory Technology, Design, and Testing (MTDT 2003), 2003, pp. 47-52.
[18] Z. Al-Ars and A. J. Van De Goor, "Soft faults and the importance of stresses in memory testing", Proceedings of Design Automation and Test in Europe (DATE 2004), 2004, Vol. 2, pp. 1084-1089.

Chapter 8 Conclusions

8.1 Conclusions

Currently, SoCs are becoming strongly memory dominated, with memories occupying up to 70% of the total chip area. According to the predictions in the ITRS'03 documents, embedded memories, e.g., e-SRAMs, are significant yield limiters in SoCs. Furthermore, since memory arrays are usually the densest physical structures and are made with the smallest geometry features available in the process, e-SRAMs are more prone to manufacturing defects and field reliability problems. This situation not only demands post-test redundancy but also poses significant test challenges, particularly the test time required to achieve acceptable test quality.

The first part of this research focuses on reducing the test time of a single e-SRAM.
In practice, an e-SRAM fault set includes DRFs and non-DRFs, and its test time consists of the time for testing each of them. Due to the mixed-signal nature of e-SRAM testing, defect-based fault models are more realistic and attractive than function-based ones. To improve memory yield, redundancy techniques, i.e., hard repair and soft repair, are generally applied. When modeled from defect locations rather than faulty behaviors, coupling faults can be assumed to be bi-directional (coupling faults caused by environmental effects or soft errors could be unidirectional), and can thus be detected by both addressing sequences, i.e., the increasing and the decreasing one. By more tightly coupling these test and repair techniques, the non-DRF test time of a single e-SRAM can be reduced by up to a factor of two without negatively impacting defect coverage. This reduction is achieved by applying a single addressing sequence. This works because, once one cell of a coupled pair is detected using one of the two addressing sequences and repaired using hard repair techniques, the other cell automatically becomes a good cell. Based on e-SRAM capacities, we reduce DRF test time either by using a DFT technique referred to as PDWTM or by counting the time that necessarily separates operations on different cells toward the DRF delay time. The PDWTM technique reduces DRF test time by completely removing the delay from the test flow. With the goal of reducing test time for SRAMs of various capacities, we use two deciding factors, namely the trade-off between yield gain and redundancy area overhead, and the delay time of DRF tests, to categorize e-SRAMs into four groups: NSRDF (No or Soft Repair & DFT for retention faults), SRDE (Soft Repair & DElay for retention faults), HRDE (Hard Repair & DElay for retention faults), and HRDF (Hard Repair & DFT for retention faults).
Accordingly, we propose four corresponding March tests, generated by a comparison algorithm, that combine the advantages of the above coupling fault testing methodology and the PDWTM technique. Simulations show that the test time can always be reduced by up to a factor of two for a single SRAM, regardless of the specific March algorithm.

The second part of this research focuses on reducing the test time of multiple distributed small e-SRAMs. This is achieved by improving the existing test/diagnosis architectures and the test power model used for power-constrained test scheduling. To the best of our knowledge, the widely used solution for testing and diagnosing such e-SRAMs applies serial memory interfacing techniques, e.g., unidirectional and bidirectional serial interfaces. This technique minimizes test area overhead by including the memory cells themselves in the serial paths. The approach in this dissertation reduces the test time of this solution by applying both the PDWTM technique for DRFs and parallel Local Response Analyzers instead of global serial response analyzers. These reduction techniques still maintain acceptable control signal routing complexity and area overhead. The serial fault masking effect and the time-consuming, defect-rate-dependent diagnosis of unidirectional/bidirectional serial interfaces are overcome by designing a pair of serial-to-parallel and parallel-to-serial converters. To further reduce test time in the power-constrained scheduling of the concurrent test of these e-SRAMs, this work proposes a "retention-aware" test power model to replace the "single-rectangle" model typically used for SoC cores. With the proposed model, e-SRAM DRFs can still be covered without either extra delay cycles or DFT techniques.
This is because, during SoC test scheduling, all the delay cycles needed for DRFs can be filled by other non-delay cycles.

8.2 Future Work

8.2.1 Short-term Goals

In this thesis, coupling faults are assumed to behave as bi-directional pairs. However, the following two factors may invalidate this assumption. In modeling coupling faults, all memory cells are assumed to be symmetric at both the circuit level and the physical level. As a result, defect-based coupling faults always behave bi-directionally, whether the bridges are of low or high resistance. The first factor, therefore, is that memory cells are sometimes designed to be asymmetric, either intentionally [1-2] or accidentally. The asymmetric e-SRAM cells in [1-2] are designed mainly to lower power consumption, e.g., gate leakage. Accidental asymmetry in e-SRAM cells could be due to variations in both cell sizes and layouts. In such cases, the coupling faults will be unidirectional, and the aggressor-victim relationship is determined by the conditions at the two shorted nodes, e.g., node capacitance.

Secondly, if coupling faults are caused not by bridge defects but by other factors, e.g., cross-talk, their faulty behavior will not be bi-directional either. The direction of the faulty behavior depends on the severity of these effects at the two shorted nodes: the cell with the more severely affected node usually becomes the victim, and the other becomes the aggressor. To model the unidirectional coupling faults caused by the first factor, the first part of the short-term future work would focus on cells with accidental asymmetry. Since cells with intentional asymmetry usually have more than six transistors, their overall fault models (including the coupling fault models) will differ from those for symmetric 6T SRAM cells.
As a result, cases like those in [1-2] are outside the scope of this thesis, which is restricted to 6T SRAM cells. In Figure 8-1, a coupling fault between an accidentally asymmetric cell and a symmetric cell is modeled with a resistive bridge, where the asymmetry is modeled as an extra capacitor at the true node.

Figure 8-1 A resistive bridge between an asymmetric memory cell and a symmetric one

When the resistor R between nodes A0 and A1 in Figure 8-1 has a low resistance value, e.g., 10 ohms, this coupling fault is generally static [3]. Due to the extra capacitor C at node A0, this coupling fault will be unidirectional for some range of values of R and C. Thus, the goal of this work is to find (i) the threshold resistance of R for a given capacitance of C, and (ii) the threshold capacitance of C for a given resistance of R, at which the coupling fault changes from bi-directional to unidirectional. The first aims to characterize the detectability of our algorithm in terms of the resistance R for given asymmetric cells. The second aims to find how much asymmetry, in terms of the capacitance C, can be tolerated while our algorithm still applies to given faulty cells. Modeling the unidirectional coupling faults caused by the second factor involves fully investigating their causes other than physical defects; the threshold values of this factor will also be identified. After that, extra tests might need to be added to cover the coupling faults that are not bi-directional due to either factor. Moreover, those extra test sets would not only be identified but also minimized to reduce test time.

The third part of the short-term future work will develop an efficient power-constrained scheduling program that applies the proposed "retention-aware" test power model.
The purpose of this development is to quantify the overhead that the "retention-aware" model adds to the test scheduling program. The best-known power-constrained test scheduling for multiple e-SRAMs will be investigated. This test scheduling program will then be modified to use the proposed test power model. Finally, the corresponding overhead will be evaluated.

Detecting Data Retention Faults is very time-consuming in very deep submicron (VDSM) technologies. Our proposed cost-effective DFT technique, PDWTM, has been fully validated in TSMC 0.18 µm technology and partly in UMC 0.13 µm technology. It is worthwhile to verify its validity in more advanced technologies, e.g., 90 nm or even 65 nm, because one of the most significant competing solutions, the Programmable Weak Write Test Mode (PWWTM), has been proven effective down to 65 nm technology [4-5]. Therefore, the fourth part of the short-term future work will validate PDWTM in 90 nm technology and possibly in 65 nm technology, depending on the availability of technology resources.

8.2.2 Long-term Goals

The long-term future work will focus on reducing the test time of multiple general e-SRAMs. According to the ITRS'03 predictions, the e-SRAM scenario within a SoC will vary; in other words, e-SRAM sizes, physical locations, speeds, etc. will differ. For example, some SoCs have several very large e-SRAMs running at different speeds, e.g., with megaword capacities and a wide range of operating frequencies. Moreover, in some other SoCs, HRDE and NSRDF e-SRAMs may co-exist. In such cases, testing becomes very complex, and so does developing test time reduction methodologies.
This is because exploring redundancy strategies that achieve an optimal trade-off between yield gain and redundancy area overhead will be very difficult.

In this future work, we will investigate redundancy strategy selection for multiple e-SRAMs co-existing within a SoC. The grouping methods used for a single e-SRAM in this thesis might be modified to handle the new redundancy situations. Based on the new classifications, more time-efficient test solutions will be proposed for each category.

8.3 References

[1] Y. J. Chang, F. Lai, and C. L. Yang, "Zero-aware asymmetric SRAM cell for reducing cache power in writing zero", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 12, Issue 8, pp. 827-836, 2004.
[2] N. Azizi and F. N. Najm, "An asymmetric SRAM cell to lower gate leakage", Proceedings of 5th International Symposium on Quality Electronic Design, 2004, pp. 534-539.
[3] M. Azimane, A. Majhi, G. Gronthoud, M. Lousberg, S. Eichenberger, et al., "A new algorithm for dynamic faults detection in RAMs", Proceedings of VLSI Test Symposium (VTS), 2005, pp. 177-182.
[4] D. M. Wu, M. Lin, M. Reddy, T. Jaber, A. Sabbavarapu, et al., "An optimized DFT and test pattern generation strategy for an Intel high performance microprocessor", Proceedings of International Test Conference (ITC), 2004, pp. 38-47.
[5] K. Zhang, U. Bhattacharya, Z. C., F. Hamzaoglu, D. Murry, et al., "SRAM design on 65-nm CMOS technology with dynamic sleep transistor for leakage reduction", IEEE Journal of Solid-State Circuits (JSSC), Vol. 40, No. 4, pp. 895-901, 2005.
