BIDIRECTIONAL SEQUENTIAL DECODING by Kaiping Li M. A, Sc., Northern Jiao-Tong University, Beijing, P. R. of China, B. A. Sc., 1986 Northern Jiao-Tong University, Beijing, P. R. of China, 1983 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES DEPARTMENT OF ELECTRICAL ENGINEERING We accept this thesis as conforming t a reds The UNIVERSITY OF BRITISH COLUMBIA June 1994 © KAIPING LI, 1994 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. (Signature) Department of Ci The University of British Columbia Vancouver, Canada Date DE-6 (2188) Abstract The main drawback of sequential decoding is the variability of its decoding effort which could cause decoding erasures. We propose and analyze in this dissertation efficient bidirectional sequential decoding (B SD) techniques to alleviate this drawback, In the proposed BSD, two decoders are used; one is called forward decoder (PD), and is used to search the tree from forward direction; while the other is called backward decoder (BD), and is used for the backward search of the tree. Forward decoding and backward decoding are performed simultaneously, and stop somewhere in the tree. In one class of BSD, to which we refer as BSD-merge, decoding stops whenever FD and BD merge at a common encoder state some where in the tree. In the other class of BSD, that is BSD no-merge, no common encoder state is required, and decoding stops when FD meets BD. Different BSD algorithms based on the stack algorithm are constructed; namely Algorithm TAmeet which belongs to the class of B SD-no-merge, Algorithm TAmerge and Algorithm Timerge which belong to BSD-merge, and finally Algorithm HTTmerge which is a hybrid version of BSD-merge and BSD-no-merge. The relationships between backward coding and forward coding are examined in detail. Good convolutional codes, with memory m ranging from 2 to 25, suitable for bidirectional decoding, found through extensive computer search, are provided. These codes possess the same distance properties from both forward and backward directions. It is found by analysis and computer simulations that the distribution of the total number of computations per decoded block of the proposed BSD is still Pareto, as that of unidirectional sequential decoding (USD). However, the advantage of the proposed 11 BSD appears as an increase in the Pareto exponent, and hence as a decrease in the computational variability and erasure probability. More specifically, we prove by using the random coding approach that the Pareto exponent of BSD using Algorithm TAmeet is asymptotically twice that of USD, and conjecture that this also applies to Algorithm TAmerge. On the other hand, it is found that the computational cutoff rate remains unchanged, but the use of our BSD reduces the average number of computations per decoded bit. Using the random coding approach, we show that the error performance of BSD merge is asymptotically the same as that of USD. Moreover, we show that the bit error probability of Algorithm TAmeet satisfies the random coding bound for block codes. Computer simulations are provided to confirm the analytical findings. The use of BSD-merge substantially reduces the computational variability of con ventional sequential decoding without compromising the error performance, However, BSD does not completely eliminate the erasure problem. We therefore combine our BSD idea in conjunction with the multiple stack algorithm (MSA), which is an erasure-free decoding algorithm. It is shown through analysis and computer simulations that the new bidirectional multiple stack algorithm (BMSA) offers substantial advantages over the MSA in terms of computational effort, memory requirements and error performance. The BMSA appears as an attractive alternative to the Viterbi algorithm (VA) where low error probabilities and high decoding speeds are required. 111 Contents Abstract List of Tables vii List of Figures ix List of Symbols xii Acknowledgment Chapter 1 xvii Introduction 1 1.1 Outline of the Dissertation 8 1.2 Claim of Contributions of this Dissertation Chapter 2 . 9 Convolutional Coding and Sequential Decoding 11 2.1 Convolutional Coding 11 2.2 Sequential Decoding 16 2.3 Backward Coding/Decoding 20 2.4 Search for Good Symmetric Codes 27 Bidirectional Sequential Decoding Algorithms 33 3.1 Algorithm TAmeet, a BSD-no-merge 34 3.2 Algorithm TAmerge, a BSD-merge 41 3.3 Coarsening on Algorithm TAmerge 50 Chapter 3 iv Chapter 4 Computational Performance of BSD 55 4.1 A Simple Approach to Computational Distribution 55 4.2 Upper Bounds to Computational Distribution of Algorithm TAmeet 58 4.3 Extension to Other BSD Algorithms 64 4.4 Lower Bound to Computational Distribution 66 4.5 Average Number of Computations 68 4.6 Numerical and Simulation Results 71 Error Performance of BSD 91 5.1 Error Performance of USD 91 5.2 Error Performance of BSD-merge 93 5.3 Error Performance of Algorithm TAmeet 5,4 Numerical and Simulation Results 110 Bidirectional Multiple Stack Algorithm 115 6.1 The Multiple Stack Algorithm (MSA) 116 6.2 Bidirectional Multiple Stack Algorithm (BMSA) 122 6.3 Computational Properties of the BMSA 125 6.4 Error Performance of the BMSA 135 6.4.1 Effect of Parameter Z 1 136 6.4.2 Effects of Parameters T and Z 138 Comparison with Viterbi Decoding 138 Chapter 5 Chapter 6 6.5 V 109 . . . . Chapter 7 Conclusions and Suggestions for Further Research . . . References 141 145 Appendix B Proof of Theorem 4.4 Appendix C List of Acronyms . 150 154 vi List of Tables Table 2.1 Symmetric bidirectional optimum distance profile (SBODP) codes 28 ,.,,.,.,.,..,.,,...,,.,...,,,., Table 2.2 Distance spectra of SBODP codes Table 2.3 Symmetric almost bidirectional optimum distance profile 29 ................. (SABODP) codes 31 Table 2,4 Distance spectra of SABODP codes 32 Table 4.1 Comparison of average computations for L and p Table 4.2 = = 8000 75 = 200, Clim = 4000 0.0594 81 Comparison of average computations for L and p Table 4.4 = 400, Clim 0.0409 Comparison of average computations for L and p Table 4.3 = = = 400, Clim 4000 0.0289. 83 Comparison of computational and error performances of different m with the new definition of one computation Table 5.1 Error performance comparison of different algorithms for the case Pr Table 5.2 = 1.1, i.e., R < 112 Rcomp Error performance comparison of different algorithms for the case Pr Table 5.3 88 = 0.68, i.e., R > Rcomp ................. Error performance comparison of Algorithm TAmeet for different block length L vii (z = 113 1) 114 Table 5.4 Error performance comparison of Algorithm TAmerge ( = 1) for different block length L 114 Table 6.1 Average number of computations as a function of Z 1 134 Table 6.2 Average number of computations as a function of T Table 6.3 Average number of computations as a function of Cijm Table 6,4 Error performance as a function of parameter T 138 Table 6.5 Error performance as a function of parameter Z 138 vu . . . . . . . 135 135 List of Figures Figure 1.1 General digital communication system 1 Figure 1.2 Binary symmetric channel 3 Figure 2.1 Convolutional encoder for the (3, 1, 2) code Figure 2.2 Forward code trellis diagram for the encoder of Figure 2.1. Figure 2.3 Forward code tree diagram of the encoder of Figure 2.1. Figure 2.4 Backward code trellis diagram for the encoder of Figure 2.1 Figure 2.5 Backward code tree diagram of the encoder of Figure 2.1,. Figure 3.1 Illustration of the stopping rules of Algorithm TAmeet. Figure 3.2 Illustration of the overlapping in BSD-merge........... 42 Figure 3.3 Example of opposing paths passing each other Figure 3,4 Typical impossible event in Algorithm TAmerge......... 44 Figure 3.5 Illustration of a typical impossible event in Figure 3.4...... 45 Figure 3.6 Illustration of the possible event Figure 3.7 Illustration of another impossible event 46 Figure 3.8 Distinct forward and backward paths in the FS and BS 47 Figure 3.9 Merging test in Algorithm TTmerge 52 Figure 4.1 (a) Correct path metric with forward decoding. (b) Correct path 12 . . . . . . . . 13 14 21 22 35 43 ..........,..... 45 metric with backward decoding. (c) Correct path metric with bidirectional decoding 57 Figure 4.2 Distribution of the total number of computations per decoded block for the case L = ix 400 and p = 0.0409, i.e., Pr = 1.1. . . 73 Figure 4,3 Distribution of the total number of computations per decoded block for different codes using the same parameters as in Figure 4.2 Figure 4.4 77 Distribution of the total number of computations per decoded block for a systematic ODP code (Pr Figure 4.5 78 = 200 and p = 0.0594, i.e., Pr = 0.68. . = 400 and p = 0.0289, i.e., Pr = 1.5. Probability mass of merging and meeting when code rate . . 82 < 84 Rcomp Figure 4.8 80 Distribution of the total number of computations per decoded block for the case L Figure 4.7 1.1) Disthbution of the total number of computations per decoded block for the case L Figure 4.6 = Probability mass of merging and meeting when code rate > 86 Rcomp Figure 4.9 Distribution of the overlapping length 87 Figure 4.10 Distribution of the total number of computations for different m with the new definition of one computation 89 Figure 5.1 Example of error events with a correct merging encoder state. 94 Figure 5.2 Example of error events with an incorrect merging encoder state 94 Figure 5.3 Illustration of the merging error event 97 Figure 6.1 Illustration of the multiple stack algorithm Figure 6.2 Empirical computational distribution for the final decision of the MSA 117 120 x Figure 6.3 The first stack saving as a function of P 1 Figure 6,4 Empirical computational distribution for the final decision 128 using Z 1 as a parameter Figure 6.5 130 Empirical computational distribution for the final decision using T as a parameter Figure 6.6 131 Empirical computational distributions for the final decision using Z as a parameter Figure 6.7 132 Bit error probability of the BMSA and MSA as a function of 1 Z Figure 6.8 137 Comparison on error performance for different algorithms. xi . 140 List of Symbols Definition Lrl Smallest integer not less than [TJ Largest integer not greater than r Defined by R = r (o)/o 0 E Substack spacing 7 Fano metric One branch metric drop Pr Defined by R = Indicator function ad free + C CM8A Total number of paths of weight dfree + 2 max [mm (c[, c_)] Total number of computations needed to decode one branch using the BMSA CSD Total number of computations needed to decode one branch using BSD CISA Total number of computations needed to decode one branch using the MSA -’ 5 C’ Total number of computations needed to decode one branch using USD Ccrir Critical number of computations Channel capacity xii Cdjree+j Total number of nonzero information bits on all weight dfree + i paths Cf D 5 Total number of computations required to decode a block of L branches by any BSD algorithm Cf, Cf Total number of computations required by forward and backward decoders to decode a whole block of L branches, respectively C, Cf_i Total number of computations needed by forward and backward de coders to decode x or (L — x) branches without searching beyond the boundary at level x, respectively AF ‘iB ‘L—z Total number of computations required by forward and backward decoders to correctly decode x or (L — x) branches, respectively Ci The i-th branch computations CL Total number of computations needed to decode a block of L branches by using Algorithm TAmeet CL Total number of computations required to decode a block of L branches by using Algorithm TAmerge with the assumption of correct decoding TAmerge CL Total number of computations required to decode a block of L branches by using Algorithm TAmerge CfM Total number of computations required to decode a block of L branches by using the BMSA Total number of computations required to decode a block of L branches by using the MSA XIII Ciim Computational limit Total number of computations required to correctly decode the first A branches using USD D Unit delay d Distance profile dfree Free distance d[], d[.] Column distance function of forward and backward codes, respec tively Incorrect path subset generated from level i E[1 Expectation function () 0 E Gallager function Smallest concave function E() G(D), Gr(D) Transfer function matrix of forward and backward codes, respectively H() Entropy function I Identity matrix K Code constraint length k Number of bits per unit time at the input of an encoder L Block length (including a known tail) I Meeting point ‘F lB The farthest level reached by forward and backward decoders, re spectively xiv FTOP’ 1 1 BTOP The length of top path in FS and BS, respectively 1 Merging point M[] Path metric function WI Total encode memory m Encode memory order mh Number of matching information symbols in the merging test Nb Total number of bit errors given Smg incorrect n Number of bits per unit time at the output of an encoder nb(r) Expected number of bit errors caused by an incorrect path diverging at level r 1 + l — L at the end of decoding. Pj The first (primary) stack overflow probability Pb Bit error probability Block error probability Per Erasure probability p Crossover probability of BSC R Code rate (flats per channel symbol) r Code rate (bits per channel symbol) Rcomp Computational cutoff rate 5 SF, S Current number of forward and backward stacks, respectively 1 S Encoder state i xv Smg Common encoder state at 10 The first (primary) stack size saving ratio x(i) Correct path from root node to level i x(t) Incorrect path which diverges from the correct path at level i and remerges at level (i + t) Incorrect path which diverges from the correct path at level i and stretches out t branches from the correct node i The set of all possible incorrect paths which diverge from the correct path at level i X(i;j) The set of all incorrect paths which diverge and remerge with the correct path at level i and (L - j) Total number of elements in .(i;j) Z 1 Z ZM = 2 z pMsA = 2ZM,i = 2,3,•• ZpM The i-th forward or backward stack size in the BMSA Zj The i-th stack size in the MSA xvi Acknowledgment I would like to thank my wife, Libing, for her constant encouragement, support, and understanding during these years. I am also greatly indebted to my research supervisor, Dr. Samir Kallel, for his continual encouragement and highly constructive comments. I wish to express my sincere thanks for his guidance throughout this research. I would like to thank Mr. Dimitrios P. Bouras, Mr. Siavash Massoumi, Mr. Longxiang Dai, and Mr. Weimin Sun for their helpful discussions. I would also like to thank Dr. Samir Kallel, Mr. Siavash Massoumi, and Mr. Dimitrios P. Bouras for their help with transcript changes after I joined OKI Telecom. Finally, I would like to thank Mrs. Vicki Waidro-Snider and Mr. Peter Howard for their invaluable help in perfecting the use of the English language throughout this text. xvii Chapter 1 Introduction Digital communication deals primarily with the process of transmitting digital infor mation from one point to another through a channel which is subject to noise disturbances. This implies that the information received at the destination (prior to processing) is not al ways exactly the same as transmitted. Shannon [11 showed in his famous coding theorem that digital information can be transmitted with arbitrarily low error probability provided that the data rate is smaller than the channel capacity. Since then, numerous coding and decoding techniques have been proposed in the attempt to approach Shannon’s promise of reliable communication near the channel capacity. The increasing demand for efficient and reliable digital communications over noisy channels has led to the use of sophisticated coding and decoding techniques for the purpose of error correction. These techniques, called forward error correction (FEC) are useful, especially with power-limited systems. r Source Destination Source Encoder Channel Encoder H-H_::r Mo Demodulator Figure 1.1 General digital communication system. 1 Chapter 1 Introduction A general block diagram of a digital communication system is given in Figure 1.1. An arbitrary source generates a signal which is then transformed by a source encoder into a redundancy-free digital information sequence. If the source is analog, analog-to-digital (A/D) conversion is performed at the source encoder. Generally, source and source encoder are treated as a digital source which emits an equal likely binary sequence. The information sequence then enters the channel encoder which adds redundancy to combat the channel noise. The outputs from the channel encoder, referred to as codewords, are then mapped into a waveform signal by the modulator for transmission over the waveform channel, The waveform channel may represent a telephone line, a wireless radio link, a space communication link, etc. The demodulator processes the received waveform signal and produces an output that may be discrete (quantized) or continuous (unquantized). The output of the demodulator is called the received sequence. The channel decoder, based on the received sequence and the known structure of the channel encoder has to select the best possible transmitted sequence. The purpose of the channel encoder and decoder is to allow the digital information sequence to be efficiently and reliably reproduced at the output of the channel decoder. The decoded sequence then enters the source decoder, which reproduces a signal that is delivered to the user. If the source signal is continuous, this also involves digital-to-analog (D/A) conversion. To focus attention on the channel encoder and channel decoder, the modula tor/demodulator (modem) and the waveform channel are combined into a coding channel. In this dissertation, a discrete memoryless channel (DMC) is assumed, which constitutes the simplest class of coding channel models and is defined as follows. The input is a sequence of elements from a finite alphabet, and the output is a sequence of elements 2 Chapter 1 Introduction from the same or different alphabet. Moreover, each element in the output sequence is statistically dependent only on the element in the corresponding position of the input sequence and is determined by a fixed conditional probability assignment. For example, the binary symmetric channel (BSC), which is shown in Figure 1.2, is a DMC with binary input and output alphabets, where each digit at the channel input is reproduced correctly at the channel output with some fixed probability (1 - p) and is altered by noise into the opposite digit with probability p. Input alphabet 0 Output alphabet 0 1 1 ‘-p Figure 1.2 Binary symmetric channel. A multitude of methods on coding and decoding techniques which permit low error probability in digital communications have been proposed and investigated. Since excel lent introductory literature into coding/decoding theory is available [2—8], the following survey will therefore be limited to matters relevant to this thesis, that is convolutional encoding with probabilistic decoding. Supporting literature is found in the references. A convolutional code is described by three integers, n, k, and m. At each unit time, the encoder accepts k-bit of the information sequence and produces a codeword (encoded sequence) of n-bits. The integer m is a parameter known as the code memory length. An important characteristic of an (n, k, m) convolutional code is that the encoder has memory 3 Chapter — 1 Introduction the n-bits emitted by the convolutional encoding procedure is not only a function of the k-bit input, but is also a function of its previous m inputs. Viterbi decoding [9] and sequential decoding’ [11] are the two main probabilistic decoding methods for convolutional codes. The Viterbi algorithm (VA) is an optimum maximum likelihood decoding scheme, but its optimality is obtained at a large decoding effort. The number of computations performed per information digit is constant, but increases exponentially with the code memory length. In practice, the VA can thus be employed only with short memory codes. Historically, sequential decoding was proposed in 1957 by Wozencraft and Reiffen [11] as the first practical decoding procedure for convolutional codes. In sequential decoding, the tree-like structure of the convolutional code is used in a step-by-step search of the most likely transmitted sequence. As long as the data rate does not exceed a quantity called computational cutoff rate , 01 which is less than the channel capacity, R the average number of computations required to decode one information digit is small and independent of the code memory length. The error probability of sequential decoding decreases exponentially with the code memory length [12], and it is lower bounded by that of the VA. As m —* co, however, both decoding techniques yield the same error performance. In addition, the decoding complexity of sequential decoding is practically insensitive to the code memory. Con sequently, convolutional encoding with sequential decoding is at present one of the best known methods to achieve an arbitrarily low error probability over a DMC [7, 8, 13, 14]. In this dissertation, sequential decoding is used in a narrow sense. That is, it only includes the metric-first and the depth-first algorithms although the breadth-first algorithm can be categorized as sequential decoding in a wide sense [10]. 4 Chapter 1 Introduction Convolutional coding with sequential decoding has been used in various systems such as satellite communications, deep space communications, etc. There are two principle sequential decoding algorithms: the Fano algorithm [13] and the stack algorithm proposed independently by Zigangirov [15] and Jelinek [16]. Re gardless of the algorithm, a major problem with sequential decoding is the computational variability. As a consequence of this variability, the decoding effort for a given received sequence may occasionally exceed the physical limitations of the decoder, leading in evitably to buffer overflows and infonnation erasures. Thus, while the undetected error probability of sequential decoding can be made arbitrarily small, the erasure probability or buffer overflow remains substantial [7, 8, 17]. Although the disadvantage of over flows or erasures of sequential decoding may be useful in systems with retransmission capabilities, it is desirable to make the erasure probability as small as possible without compromising the error performance in order to achieve higher throughput and reliabil ity. In addition, there are many other cases where erasures are undesirable. For example, in real-time systems with long path delays, erasures can cause complete data loss, In this case, the main contribution of errors comes from the buffer overflow. Thus, it is very important to reduce the erasure probability. The prime objective of this research is tailored at alleviating the computational variability, and thus minimizing erasures in sequential decoding. In the past, several modifications of sequential decoding to reduce the erasure probability have been reported and implemented. Excessively long tree searches can be terminated early by placing a backsearch limit on the Fano and the stack algorithms, 5 Chapter 1 Introduction thereby lowering the erasure probability [14]. A similar effort is achieved with the stack algorithm by deleting the path with the lowest metric from the stack whenever the stack fills up [15, 16]. Although these methods accelerate decoding and reduce the erasure probability, the resulting increase in the error probability is found to be substantial [18]. Haccoun and Ferguson [19] proposed a class of generalized (or multiple path) stack algorithms which extends several paths simultaneously from the top of the stack. It was shown that the variability of the computational effort is reduced, but at a cost of a large average number of computations compared to the ordinary stack algorithm. In order to eliminate erasures entirely, Chevillat and Costello [20, 21] introduced an interesting multiple stack algorithm (MSA). It is a modification of the stack algorithm and uses several stacks to accommodate the problem of stack overflow. However, it is found that the resulting increase in stack and buffer storage is substantial. Fomey [22] in 1967 suggested that sequential decoding can be started from the end of a block which is terminated by a known tail. He proposed a bidirectional decoding scheme which uses a forward decoder and a backward decoder to search a terminated tree from initial and final encoder states, respectively. Decoding with Forney’s scheme stops whenever either the forward or the backward decoder reaches the end of its tree. Computer simulations revealed that this scheme does not offer much improvement in terms of computational performance. The idea of bidirectional search has also been used for computing the free distance of convolutional codes [23, 24] and distance spectrum of trellis codes [25]. Recently, a bidirectional decoding search has been applied to the decoding of block codes [26]. 6 Chapter 1 Introduction Most recently, Haccoun and Beizile [27, 28] investigated bidirectional breadth-first M algorithm for the decoding of convolutional codes. Although erasures are avoided by the nature of the breadth-first M algorithm used, the loss of the correct path problem, which is inherent to the M algorithm, limits the performance of this scheme. We propose and analyze in this dissertation efficient bidirectional sequential decoding (BSD) techniques [29]. It is shown that the proposed BSD techniques substantially alleviate the drawbacks of conventional or unidirectional sequential decoding (USD). In the proposed BSD, two decoders are used, like in Forney’s scheme; one is called forward decoder (FD), and is used to search the tree from forward direction; the other is called backward decoder (BD), and is used for the backward search of the tree. Forward decoding and backward decoding are performed simultaneously. However, unlike Forney’s scheme, decoding in the proposed BSD techniques stops somewhere in the tree, In one class of BSD, which we refer to as BSD-merge, decoding stops whenever FD and BD merge at a common encoder state somewhere in the tree. In the other class of BSD, that is BSD-no-merge, decoding stops when FD meets BD. Different BSD algorithms based on the stack algorithm are constructed; namely Algorithm TAmeet which belongs to the class of B SD-no-merge, Algorithm TAmerge and Algorithm TTmerge which belong to B SD-merge, and Algorithm HTTmerge which is a hybrid version of BSD-merge and BSD-no-merge. Finally, to eliminate erasures entirely, we combine Algorithm TTmerge in conjunction with the MSA of Chevillat and Costello [20, 18, 21], to obtain a more efficient and reliable erasure-free algorithm, which we call bidirectional multiple stack algorithm (BMSA). 7 Chapter 1 Introduction 1.1 Outline of the Dissertation Chapter 2 is a brief summary of the basic elements of convolutional coding and sequential decoding. The idea and basic properties of backward coding/decoding are also presented in this chapter. Moreover, good convolutional codes suitable for bidirectional decoding, which are found through computer search, are listed in this chapter. In Chapter 3, the proposed BSD algorithms are described and their fundamental properties are investigated. Furthermore, our proposed BSD algorithms are compared with the existing BSD algorithms which can be classified under the category of BSD no-merge in a wide sense. A theoretical analysis of the computational performance of the proposed BSD algo rithms is given in Chapter 4, together with extensive computer simulation results. It is found, both by theoretical analysis as well as computer simulations, that the distribution of the total number of computations per decoded block by any of the proposed BSD algorithm is still Pareto, as that of USD. However, the advantage of the proposed BSD appears as an increase in the Pareto exponent, and hence as a decrease in the compu tational variability. More specifically, we prove by using the random coding approach that the Pareto exponent of BSD using Algorithm TAmeet is asymptotically twice that of USD, and conjecture that this also applies to Algorithm TAmerge. On the other hand, it is found that the computational cutoff rate remains unchanged, but the use of the proposed BSD reduces the average number of computations per decoded bit. This reduction in computational variability and average number of computations translates into a substan tial decrease in the erasure probability (or buffer overflow problem), which is the main 8 Chapter 1 Introduction drawback of sequential decoding. The error performance of the proposed BSD techniques is explored in Chapter 5 by both theoretical analysis and computer simulations. It is found that BSD-merge has asymptotically the same error performance as USD, while the bit error probability of Algorithm TAmeet follows the random coding bound for block codes. In Chapter 6, an erasure-free BSD technique based on the MSA [18, 211, which we call bidirectional multiple stack algorithm (BMSA), is proposed and analyzed. It is found that the BMSA performs better than the conventional MSA, in terms of memory requirements, computational effort and error performance. Chapter 7 contains final conclusions and suggestions for future research, Appendix A contains a rigorous proof of Theorem 4.4 which suggests that under the assumption of correct decoding the computational cutoff rate of BSD is the same as that of USD. Appendix B lists acronyms used in this dissertation. 1.2 Claim of Contributions of this Dissertation The major contributions of this research are summarized below. The properties of backward coding/decoding are examined in great detail. More specifically, the relationships between a backward code and the corresponding forward code are given through Theorems 2.1 and 2.2, Corollaries 2.1 and 2.2 and Properties 2.1 and 2.2. Good convolutional codes suitable for bidirectional decoding found through computer search are provided. 9 Chapter 1 Introduction Efficient BSD techniques that alleviate the drawbacks of conventional sequential decoding are proposed and examined by analysis and extensive computer simulations. The main theoretical contributions in the analysis of the proposed BSD techniques are as follow: 1, A computational upper bound for Algorithm TAmeet is found for a specific timeinvariant convolutional code (Theorem 4.1). 2. A computational upper bound for Algorithm TAmeet is discovered for an ensemble of trellis codes (Theorem 4.2). It is conjectured that this bound also applies to Algorithm TAmerge. 3. A computational lower bound is obtained for any code and any type of BSD (Theorem 4.3). 4. The computational cutoff rate of BSD is found to be the same as that of USD (Theorem 4.4). 5. The error performance of BSD-merge is proved to be asymptotically the same as that of USD (Theorems 5.1 to 5.3, Corollaries 5.1 and 5.2). 6. The bit error probability of Algorithm TAmeet is found to satisfy the random coding bound for block codes. An erasure-free algorithm, the BMSA, is proposed and analyzed. Through computer simulations and analysis, we demonstrate that the new BMSA performs better that the conventional MSA. 10 Chapter 2 Convolutional Coding and Sequential Decoding 2.1 Convolutional Coding An (n, k, m) convolutional encoder can be implemented with k shift registers, n modulo-2 adders and a commutator that scans the output of the adders. The length of each of the k shift registers is less than or equal to the encoder memory order m [81,2 The connections of the n adders to the k shift registers define the code. This is generally specified with a so-called k x n transfer function matrix G(D) [8], given by G(D) = g(D) (2D) 1 g g ( 2 D) (2D) 2 g ... ... g(D) g(D) 2 ’(D) 3 g g’(D) (D) 2 g where g (D) is a polynomial of degree ... (2.1) g(D) m that indicates the connection of the j-th adder to the i-th shift register. For example, Figure 2.1 represents a (3, 1, 2) encoder withG(D) = ,l+D+D [1+D,l+D ] 2 . Encoding is performed k-bit at a time. Each k-bit to be encoded is fed to the encoder from the left side, and the content of the k shift registers is shifted one position to the right. The modulo-2 adders are then sampled in sequence by the commutator, producing n output coded bits. For convenience, we refer to this as forward encoding. Defining fl max [deg g3)(D)] as the total encoder memory [8] and the state i=1 1<j<n of the encoder as its k shift registers’ contents, there are a total of 2 different possible 2 The memory order m is related with the constraint length K [7], which is defined as the maximum number of present and past encoder input symbols that influence any output symbol, i.e., K = m + 1. 11 Chapter 2 Convolutional Coding and Sequential Decoding U vi V Figure 2.1 Convolutional encoder for the (3, 1, 2) code. states. Then, the operations of the encoder can be described using a state diagram, where the n output bits are a function of the state and the k input bits [8]. Operations of a convolutional encoder can also be described by a trellis or a tree diagram [8]. Figures 2.2 and 2.3 represent six-level trellis and tree diagrams, respectively, for the (3, 1, 2) encoder of Figure 2.1, corresponding to information sequence of length 4 bits with a zero tail of length 2. The upper branch leaving each state in both the trellis and tree diagrams in Figures 2.2 and 2.3 represents input one, while the lower branch represents input zero. Starting from the initial state So, each path in the trellis or tree diagram corresponds to a possible encoded sequence as generated by the encoder. For example, the highlighted path on the forward code tree of Figure 2.3 corresponds to information sequence (110100) and code sequence (111, 010, 110, 100, 101, 011). 12 Chapter 2 Convolutional Coding and Sequential Decoding 0 1 2 3 4 5 6 Time units Figure 2.2 Forward code trellis diagram for the encoder of Figure 2.1. 13 Chapter 2 Convolutional Coding and Sequential Decoding 001 110 011 001 [110 011 000 100 101 011 010 110 011 000 000 111 010 110 — 011 100 r1ioi 011 000 101 011 001 110 011 101 —1 111 010 110 011 000 V 0 111 101 100 101 011 011 000 000 000 010 110 011 111 101 r— 000 011 000 101 011 1_000 000 000 000 Figure 2.3 Forward code tree diagram of the encoder of Figure 2.1. 14 Chapter 2 Convolutional Coding and Sequential Decoding The code rate of an (n, k, m) convolutional code is r = k/n information bits per coded bit or channel symbol. The code rate can also be expressed as R where u = k, 2 in (u)/n, nats per channel symbol We have thus R = (2.2) r in 2. The error correcting capabilities of a convolutional code are intimately related with the memory m and code rate r, The larger is m and/or the smaller is r, the more powerful is the code. Lowering the rate of the code involves a higher bandwidth expansion for transmission, which might be problematic in certain applications. It is therefore desirable to increase m as much as possible. The two main probabilistic decoding techniques for convolutional codes are Viterbi decoding [9] and sequential decoding [11]. Given a received sequence of channel symbols and a code structure, a Viterbi decoder explores the whole code trellis in order to decide which is the most probable path or code sequence to have been transmitted. Viterbi algorithm (VA) is thus an optimum decoding technique in the maximum likelihood sense. However, due to the amount of computations needed for decoding, which grows exponentially with the code memory m, the VA is practically limited to short memory codes (typically m 7) [7, 8]. A sequential decoder on the other hand attempts to make the best decision by only exploring a small portion of the code tree (or trellis). Sequential decoding is thus suboptimum, but can be used with long memory convolutional codes. Unfortunately, sequential decoding has serious drawbacks, and our efforts in this thesis are oriented towards the goal of alleviating some of these drawbacks. 15 Chapter 2 Convolutional Coding and Sequential Decoding 2.2 Sequential Decoding In a system using convolutional coding and sequential decoding, information is usually transmitted in blocks (L — m branches), and each block is terminated by a known tail (usually m zero branches). Sequential decoding is a sub-optimum tree search procedure that only explores a fraction of the encoded tree in order to determine the most likely transmitted path for a given received sequence. The basic idea of sequential decoding is as follow, Assuming a DMC, starting from the root node, a sequential decoder explores the code tree, one branch at a time and uses the log-likelihood function or Fano metric [13] given by 1 P (Yilxi)R P(y) where x is the i-th channel input symbol (1 symbol, and P(y) i (23) nL), y, is the corresponding received P(yjIx)q(x) is the a priori probability for the received symbol = ) y with q(xj) being the distribution of channel input symbol x . The total metric for a 1 path of length I branches is then ft (2.4) i=l and a sequential decoder, based on the stack algorithm, will always attempt to search and extend the path having the largest cumulative metric. At the end of the tree, the path with the highest metric is accepted as the decoded path. Thus, while this decoder is moving forward into the tree, it occasionally realizes that it is not following the best current path, and backs up to follow a more promising one. This backtracking ability [30], while essential for achieving good error performance, is also the main cause of the 16 Chapter 2 Convolutional Coding and Sequential Decoding serious problems in sequential decoding. Since here, decoding is performed along only the forward direction, it shall be referred to as unidirectional sequential decoding (USD). Since the stack algorithm is much simpler to describe and analyze than the Fano algorithm, we will focus on the stack algorithm in this dissertation. In this algorithm, a stack is used to store the already searched paths in decreasing order of their metric values. The top of the stack is the path with the largest accumulated metric among the paths in the stack, and will be extended one level further into the tree. The stack is reordered after each extension, so that the top always contains the current best or most likely path. Decoding stops whenever the top path reaches the end of the tree. For ease of reference, we refer to this algorithm as unidirectional stack algorithm and list it in the following. Unidirectional Stack Algorithm: Step 0: Place the initial root node of the code tree in the stack. Step J. Compute the metrics of all successors of the top node in the stack, and enter the new nodes in the stack, ordered with respect to their cumulative metrics. Step 2: Remove from the stack the node whose successors were just inserted. Step 3: If the top node in the stack is the final node of the tree, stop. Otherwise, return to step 1. The major problem of the above algorithm is the reordering of the stack after each extension (Step 1) of the top path. To overcome this difficulty, Jelinek [16] proposed a quantized version of the algorithm in which the paths are placed at random into substacks according to their metric values. All paths whose metric values are within a certain range 17 Chapter 2 Convolutional Coding and Sequential Decoding are stored in a substack. That is a path of metric F is inserted into substack HzS.F<(H+1)A, H if (2.5) where \ is the substack spacing in metric value. In this quantized stack, the search for the top path reduces to the search for the highest non-empty substack. The path that is to be extended is taken at random or usually in a first-in last-out mode. In our computer simulation, the first-in last-out mode was used which is suggested by Johannesson [31] to be near-optimal. The decoding effort of USD is usually measured by the distribution of the number of computations per either decoded bit or decoded block. In this thesis, the number of computations per block is used. A computation is defined as one hypothesis of a node to be on the correct path. 3 The lower bound given by Jacobs and Berlekamp [30] together with the upper bounds given by Savage [321, Falconer [33] and Jelinek [34] have shown that the computational distribution of USD is essentially Pareto. Specifically, define as the total number of computations required to decode the first A branches of the code tree using sequential decoding. With the assumption of correct decoding, Jacobs and Berlekamp [30] showed that for all codes, any DMC and any USD algorithm, CK satisfies P(CKsD > N) > N exp {_o(v’iT) }, (2.6) where 0 (/17) is an asymptotically unimportant term, 4 and the Paretian exponent ct is defined by the following parametric equation R = ()/c. 0 E (2.7) A computation in the stack algorithm is an extension of the top node, while in the Fano algorithm, it is usually defined to be a forward move [5]. 0(b) represents that the value can be upper-bounded by Abs, where A is a finite constant for all indexes i. 18 Chapter 2 Convolutional Coding and Sequential Decoding The function Eo(a) in (2.7) is the smallest concave function greater than or equal to the Gallager function E (cx) [5]. Using the random coding technique over an ensemble of 0 trellis codes [35], Savage, Falconer and Jelinek showed that for a DMC, the first-branch computations C 1 of sequential decoding is upper-bounded as 1 P(C where p < Pr, which is defined as R = N) AN, (2,8) 5 and A is a finite constant independent Eo(pr)/pr, of N [36]. Moreover, it was noticed that asymptotically, the lower bound gives an accurate description of the computational distribution for good codes and practical sequential decoding algorithms [311. The computational performance of USD with a specific convolutional code depends on the decoding algorithm employed and the distance properties of the forward code. In particular, optimum forward sequential decoding performance requires rapid initial column distance growth of the forward code for fast decoding and minimum erasure probability, and a large free distance (dfree) for minimum error probability [37, 38]. Specifically, for a code with rate R and a BSC with a crossover probability p satisfying R<1n2+pln where H(p) = —plnp — (1 — p) ln (1 — ‘—p —H(p), (2.9) p) is the entropy function, the distribution 1 > N) for a specific time-invariant convolutional code is upper-bounded by [38] P(C 1 P(C where o-, 5 and N) <• exp{—id[log N] + log N}, are parameters which of N, and are as defined in [38]. pr = a if the channel is a BSC, but Pr a in genera’. 19 are (2.10) functions of p and R, but independent Chapter 2 Convolutional Coding and Sequential Decoding The distance profile d of a forward code is defined as the column distance function (CDF) over only the first constraint length (i.e., d , d 0 [d , 1 “, dm1) [8]. Since the distance profile determines the initial column distance growth of the forward code, and is easier to compute than the entire CDF, it is usually used instead of the entire CDF as a criterion for selecting codes for use with sequential decoding. A code is said to have a d superior to a d’ of another forward code of the same memory order m, when there is some i m such that dj{j: (2.11) A forward code is said to have an optimum distance profile (ODP) [8] if its d is superior to that of any other forward code of the same memory order. 23 Backward Coding/Decoding Because data is transmitted with a known (e.g. zero) tail in a block mode, all paths depart from and terminate at the same state S. A path in the trellis or tree diagram can thus be viewed in reverse as starting from the final state S 0 and terminating at the initial state So. Every such path also corresponds to an encoded sequence. When viewed as starting from the final state, the distinct paths in the tree or trellis define a code which we refer to as the backward code. From the tree (trellis) of the original forward code, one can generate the backward tree (trellis) for the backward code. Figures 2.4 and 2.5 show, respectively, the backward trellis and tree for the backward code obtained from the original forward code (3, 1, 2) of Figure 2.1. Note that a backward code sequence is obtained by simply reversing the original forward code sequence. In addition, the 20 Chapter 2 Convolutional Coding and Sequential Decoding 0 1 2 3 4 5 6 Time units Figure 2,4 Backward code trellis diagram for the encoder of Figure 2.1. corresponding information sequences are reversals of each other, as well. For example, the highlighted path on the 21 Chapter 2 Convolutional Coding and Sequential Decoding 100 010 111 100 1 010 111 000 001 111 011 101 010 111 000 000 110 011 010 111 001 111 000 000 000 100 010 111 011 V 010 111 000 110 0 001 101 111 000 000 111 101 000 011 010 111 110 101 111 000 110 101 111 000 000 000 000 000 Figure 2.5 Backward code tree diagram of the encoder of Figure 2.1. 22 Chapter 2 Convolutional Coding and Sequential Decoding forward code tree of Figure 2,3 corresponds to the information sequence (1101) and the code sequence (111, 010, 110, 100, 101, 011). Their equivalent reversed information and code sequences are (1011) and (110, 101, 001, 011, 010, 111), respectively (see the highlighted path on the backward tree of Figure 2.5). It can be readily shown that the transfer function matrix Gr(D) of an (n, k, m) backward convolutional code can be expressed in terms of the transfer function matrix of the forward code [39] by Gr(D) = m D g(D’) g’)(D_l) g(D’) g(D_’) g(D’) ,, 1_1) , (2.12) ::‘ g (D’) g’(D’) (D’) .0 As an example, consider the encoder of Figure 2.1. The transfer function matrix of its corresponding backward encoder is Gr(D) = [1 + D + D ,1 +D 2 ,D+D 2 ]. 2 It appears from the above discussion that, given a convolutional encoder, one can perform backward encoding by feeding the information bits to be encoded from the right side of the registers, and set the registers to shift left. A forward (n, k, m) convolutional code is said to be invertible if there exists an n x k matrix G(D) such that G(D)G’(D) for some 6 = (2.13) 6 ID 0, where I is the k x k identity matrix [8]. For an (n, 1, m) convolutional code, this means that there exists a feedforward inverse G(D) of delay S (S 0) if and only if [40] GCD [g(1)(D), g (D),’ 2 23 ‘ ‘ , g)(D)] = , 6 D (2.14) Chapter 2 Convolutional Coding and Sequential Decoding where GCD denotes the greatest common divisor. For an (n, k, m) convolutional code with k > 1, let A ( D), i = 1, 2,..., (), be the determinants of the () distinct k x k submatrixes of the transfer function matrix G(D). Then a feedforward inverse G’ (D) of delay 6 exists if and only if GCD[z(D) : i for some 6 l,2,•••, = ()] (2.15) = 0 [401. Also, (2.14) or (2.15) is a necessary and sufficient condition for a code to be noncatastrophic. We now show that a backward convolutional code is invertible provided that the forward code is invertible. Lemma 2.1: For an invertible (n, 1, m) convolutional code, the delay S in (2.14) is always no greater than m, since deg [g(i)(D)] mfor all i. Lemma 2.2: For an invertible (n, k, m) convolutional code with k (2.15) is always no greater than km, since deg[A(D)] > 1, the delay S in km for all i. Theorem 2,1: A backward code corresponding to an invertible (n, 1, m) convolutional code with delay S is invertible and its delay is equal to (m — 5). Proof: From (2.14), we can write GCD [g() (D’), g(fl_1) (D’), . . . , g(’) (D’)] = D, (2.16) and GCD [Drng(n) (D’),. . . , Dmg(l) (D—’)] = (2.17) From (2.12), we can write (2.17) as , GCD [g.’(D), g c(D)] 1 (D),’’’ g 2 24 = (2.18) Chapter 2 Convolutional Coding and Sequential Decoding From Lemma 2.1, we know that m -6 Q.E.D. 0. Theorem 2.2: A backward code corresponding to an invertible (n, k, m) convolutional code (Ic> 1) with delay 6 is also invertible and its delay is equal to (km — 6). Proof According to (2.12), we can write GCD[Lri(D) i = 1,2,..., ()] = GCD[Dkm/(D_1) : i 1,2,•., ()]. (2.19) From (2.15), we can write (D_1) : i 1 GCD[A = 1,2,••, ()] , 6 D (2.20) = Dkm, (2,21) = Therefore, from (2.19) and (2.20) we can write GCD[Lri(D) : i and from Lemma 2.2, we know km — ()] = 6 Q.E.D. 0. According to the above discussion we can write the following. Corollary 2.1: A backward convolutional code is noncatastrophic if and only if its forward code is noncatastrophic. Corollary 2.2: Decoding of a forward invertible convolutional code can be performed using the corresponding backward code, This will be called backward decoding as opposed to the usual forward decoding. Property 2.1: The distance spectrum [8] of a backward convolutional code is identical to that of the corresponding forward code. This is because the Hamming weight of any path on the forward code tree or trellis is the same as the corresponding path on the backward code tree or trellis. As a consequence of this property, when maximum 25 Chapter 2 Convolutional Coding and Sequential Decoding likelihood decoding is used, the error performance of a backward code is the same as that of the corresponding forward code. Property 2.2: The distance profile of a backward code is usually differentfrom that of the corresponding forward code. This is rather obvious and can be seen from the forward and backward code trees of Figures 2.3 and 2.5. For this example, the first coefficient of the distance profile of the forward code is equal to 3; whereas for its corresponding backward code, it is equal to 2. As a consequence of this property, backward sequential decoding is in general not as efficient as its corresponding forward sequential decoding when a forward ODP code is used. With our bidirectional sequential algorithms to be discussed later, decoding is per formed in both forward and backward directions. It is thus desirable to use codes that possess the same good distance profile in both directions (see Theorem 4.1 in Chapter 4). A sufficient condition for a forward code to have the same distance profile as its corre sponding backward code is symmetry. A code is said to be symmetric or reversible [41, 42j if the code generators of the forward code are the same as those of the corresponding backward code. For example, the code with generators g’)(D) = = 2 and g 1+D+D (D) 2 2 is symmetric. Moreover, since this code has an optimum distance profile [43j 1+D we call it a symmetric bidirectional optimum distance profile code. A code is said to have a symmetric bidirectional optimum distance profile (SBODP) if it is symmetric and its distance profile is optimum. Unfortunately, most known ODP codes do not possess the property of symmetry. However, one can search for new symmetric codes that have a good distance profile. 26 Chapter 2 Convolutional Coding and Sequential Decoding 24 Search for Good Symmetric Codes Using a computer search procedure [44, 45], we have found good symmethc (2, 1, m) non-systematic codes with m ranging from 2 to 25. In our computer search, we first attempt to find a symmetric code with the same distance profile as the corresponding known ODP code. If more than one code is found, we select the one with the highest dfree and the minimal number of paths of weight dfree. Of course, we considered only noncatastrophic codes. Results are given in Table 2.1, Also included in the table are the free distances of known ODP codes. Distance spectra of the found SBODP codes are listed in Table 2.2. In Table 2.2, adiree+j denotes the number of paths of weight dfree + , while Cdf +i denotes the total number of nonzero information bits on all these paths. From Table 2.1, one can immediately notice that SBODP codes do not exist for some values of m, which are marked by dash signs. Moreover, as it can be seen from Table 2.1, to find a SBODP code, usually dfree had to be sacrificed. In the following, some new codes which slightly compromise the distance profile for a better dfree are presented. 27 Chapter 2 Convolutional Coding and Sequential Decoding Table 2.1 Symmetric bidirectional optimum distance profile (SBODP) codes 6 m 2 dfree dfree of ODP 5 5 5 7 3 4 5 6 7 — — 23 31 6 — — — 127 165 — 8 6 7 8 10 10 — — — — — — 9 1157 1731 12 12 10 2251 3003 9 14 11 5247 7125 12 14 — — 13 27537 37275 12 16 14 56777 77735 12 17 15 134277 176435 16 18 16 222511 304443 14 19 17 475417 741571 16 20 18 1346477 1762635 18 21 — — — 20 5734323 6261675 18 22 21 12022373 15744405 20 24 22 31065423 23340731 21 24 23 44407043 61070111 18 26 24 176153077 134442235 24 26 25 221446737 373544611 24 27 8 12 19 6 2 G The m = — 12 15 22 2 SBODP code is the known ODP code found by Johannesson [43] and is listed here for completeness. 28 Chapter 2 Convolutional Coding and Sequential Decoding Table 2.2 Distance spectra of SBODP codes (adjr+ji = m [cdfr+j = 0, 1,2, 0, 1,2, 1 4 23 31 6 (1,0,4,0,22,0,124,0,682,0) [1,0,10,0,96,0,778,0,5616,0] 6 127 165 8 (1,0,5,0,35,0,187,0,1074,0) [2,0,15,0,188,0,1275,0,9350,0] 9 1157 1731 12 (8,0,16,0,159,0,741,0,5027,0) [32,0,78,0,1150,0,6390,0,52470,0] 10 2251 3003 9 (1,0,1,1,,1,18,21,27,110,256) [1,0,5,4,9,88,123,186,796,2060] 11 5247 7125 12 (3,0,10,0,33,0,216,0,1216,0) [10,0,42,0,228,0,1702,0,11248,0] 13 27537 37275 12 (1,0,2,0,11,0,54,0376,0) [2,0,6,0,68,0,352,0,3278,0] 14 56777 77735 12 (2,0,0,0,10,0,16,0,208,0) [4,0,0,0,46,0118,01420,0] 15 134277 176435 16 (13,0,0,0,166,0,96,0,3651,0) [60,0,0,0, 13 30,0,734,0,37790,0] 16 222511 304443 14 (2,0,3,0,7,0,59,0,237,0) [5,0,13,0,36,0,455,0,1966,0] 17 475417 741571 16 (2,0,1,0,33,0,97,0,751,0) [6,0,7,0,220,0,753,0,7386,0] 18 1346477 1762635 18 (1,0,12,0,51,0,302,0,1801,0) [5,0,70,0,365,0,2720,0,18953,0] 20 5734323 6261675 18 (1,0,1,0,12,0,89,0,474,0) [5,0,6,0,92,0,782,0,5034,0] 21 12022373 15744405 20 (3,0,8,0,36,0,196,0,1246,0) [24,0,56,0,288,0,1782,0,14312,0] 22 31065423 23340731 21 (1,5,7,18,30,74,175,341,863,2097) [5,36,37,168,298,670, 1721,3686,9883,253661 23 44407043 61070111 18 (1,0,2,0,4,0,19,0,54,0,) [1,0,10,0,20,0,116,0,406,0] 24 176153077 134442235 24 (2,3,26,42,93,173) [16,13,234,350,884,1769] 25 221446737 373544611 24 (4,0,17,0,78,0,508,0,2780,0) [26,0,135,0,756,0,5540,0,34644,0] 29 Chapter 2 Convolutional Coding and Sequential Decoding In the event that no SBODP code can be found, we then slightly decrease the distance profile to find a good code (large dfree and good distance profile). The same method can be applied when the dfree of a SBODP code is inferior to that of the known ODP code. We shall refer to this symmetric non-ODP code as an almost BODP (SABODP) code, The computer search results are listed in Table 2.3. The dash sign in Table 2.3 indicates that no code with a better dfree is found after slightly decreasing the distance profile. Distance spectra of the found SABODP codes are listed in Table 2.4. The search for SABODP codes is conducted as follows. We first examine all codes obtained through possible modifications of the last four coefficients of the distance profile of the corresponding ODP code. Then, among these codes, we select the one with the highest dfree and the best distance profile. The last four coefficients of the distance profile of the obtained codes are given in the last column of Table 2.3. When the distance profile is slightly decreased, a code with a better dfree may be found. For example, dfree of the SBODP code of memory m = 23 is equal to 18. When only the last coefficient of the distance profile is decreased by one, a SABODP code with dfree = 24 is found. Notice that for any (2, 1, m) systematic code to be symmetric, the second adder of the encoder should be connected to only the last stage of the shift register. This would result in a very poor code in terms of both the free distance and the distance profile. 30 Chapter 2 Convolutional Coding and Sequential Decoding Table 2.3 Symmetric almost bidirectional optimum distance profile (SABODP) codes m 2 3 4 G’ — 13 , — dm — 15 2,3,3,3 dfree — 6 dfree of ODP 5 6 7 — — — 57 75 3,4,4,4 8 — — — — 7 247 345 4,5,5,5 10 10 8 507 705 5,5,5,6 10 12 — — — 2617 3615 6,6,6,6 — — — 5 6 9 10 11 — — 12 — 8 10 12 14 14 12 16767 12345 6,7,7,7 14 15 13 24433 33045 6,7,7,7 14 16 14 47263 63271 7,7,8,8 16 17 — — — — 16 232357 367131 8,8,8,8 16 19 17 723705 507627 8,8,8,8 18 20 18 1167671 1712517 8,8,9,9 19 21 19 2714477 3744635 9,9,9,9 20 22 20 5267447 7117325 9,9,9,9 20 22 21 17242231 11444257 9,9,9,10 22 24 22 32427561 21675053 9,10,10,10 22 24 23 55231643 61346255 10,10,10,10 24 26 - - - - - - - - 15 24 25 31 18 26 27 Chapter 2 Convolutional Coding and Sequential Decoding Table 2.4 Distance spectra of SABODP codes ,,+,i = 1 (ad 0, 1,2, 0, 1,2, m G’ 3 13 15 6 (2,0,10,0,49,0,241,0,1185,0) [4,0,38,0,277,0,1806,0,11063,0] 5 57 75 8 (3,0,12,0,70,0,397,0,2223,0) [8,0,46,0,400,0,2925,0,20446,0] 7 247 345 10 (4,0,19,0,95,0,539,0,3111,0) [10,0,86,0,649,0,4400,0,30287,0] 8 507 705 10 (2,0,10,0,51,0,287,0,1560,0) [4,0,36,0,289,0,2118,0,13984,0] 10 2617 3615 12 (2,0,15,0,71,0,383,0,2333,0) [6,0,65,0,494,0,3053,0,22486,0] 12 16767 12345 14 (2,0,20,0,93,0,612,0,3283,0) [6,0,111,0,701,0,5456,0,35447,0] 13 24433 33045 14 (3,0,10,0,47,0,280,0,1687,0) [11,0,52,0,323,0,2478,0,17675,0] 14 47263 63271 16 (6,0,27,0,140,0,808,0,4745,0) [34,0,151,0,1120,0,7520,0,52014,0] 16 232357 367131 16 (1,0,2,0,38,0,239,0,1131,0) [10,0,6,0,246,0,1917,0,11114,0] 17 723705 507627 18 (1,0,29,0,119,0,578,0,3433,0) [3,0,182,0,953,0,5816,0,39833,0] 18 1167671 1712517 19 (4,7,1,25,74,138,353,953,2090,5351) [20,58,3,182,612,1242,3369,10278,23840,66680] 19 2714477 3744635 20 (5,0,45,0,160,0,851,0,5184,0) [24,0,319,0,1440,8775,0,61394,0] 20 5267447 7117325 20 (1,0,19,0,101,0,423,0,2492,0) [4,0,143,0,884,0,4295,0,29122,0] 21 17242231 11444257 22 (9,0,58,0,275,0,1290,0,7381,0) [49,0,488,0,2695,14502,0,94513,0] 22 32427561 21675053 22 (1,0,44,0,128,0,793,0,4094,0) [11,0,368,0,1078,0,8648,0,50150,0] 23 55231643 61346255 24 (14,0,60,0,348,0,1959,0,10600,0) [108,0,592,0,3602,0,24395,0,145826,0] djree [Cdf,,+ji = 32 Chapter 3 Bidirectional Sequential Decoding Algorithms In this chapter, BSD algorithms in which the code tree is explored from both forward and backward directions are presented. Operations of all the proposed algorithms are essentially the same, except for their stopping rules. In our BSD algorithms, two separate stacks are used. One is used for the forward search of the tree and is called the forward stack (FS), while the other is used for the backward search and is called the backward stack (BS). Starting from the root nodes of the forward and backward trees (trellises), respectively, forward and backward search operations are performed alternatively according to the usual stack algorithm. Define the total number of computations needed to decode one block by any BSD algorithm as the sum of the computations performed by forward and backward decoders. The proposed BSD algorithms can be classified into two categories: BSD-no-merge and BSD-merge. In BSD-merge, decoding stops whenever a path in one stack (PS or BS) merges with a path in the other stack. A path in one stack is said to merge (or agree) with a path in the other stack if the two paths share the same encoder state at a common tree level. In the class of BSD-no-merge, forward and backward portions of the decoded path do not have to agree with each other. In the following, Algorithm TAmeet which belongs to the class of BSD-no-merge is first presented and analyzed. Next, two algorithms, Algorithm TAmerge and Algorithm Timerge, which belong to the class of BSD-merge, are presented and analyzed. Finally, 33 Chapter 3 Bidirectional Sequential Decoding Algorithms Algorithm HTTmerge, which is a hybrid scheme of B SD-merge and BSD-no-merge, is presented. For the convenience of describing the following algorithms, let FTQP 1 denote the level of the top node in the FS, and IF the farthest level 7 reached by the forward decoder, Similarly, let iBTOP denote the level of the top node in the BS, and ‘B the farthest level reached by the backward decoder. 3.1 Algorithm TAmeet, a BSD-no-merge Algorithm TAmeet stops decoding whenever a path in one of the two stacks meets a path in the opposite direction, and these two meeting paths do not have to agree with each other. The level of the forward tree at the meeting point is denoted by 1. As a consequence, the portion of the tree explored by forward decoder does not overlap with the portion explored by backward decoder, as illustrated by Figure 3.1. Algorithm TAmeet is as follow. Algorithm TAmeet: Step 0: Place the root nodes of forward and backward trees in the FS and BS, respectively. Step 1. Compute the metrics of all successors of the top node in the FS, and enter the new nodes in the FS, ordered with respect to their cumulative metrics. Step 2.’ Remove from the FS the node whose successors were just inserted. Step 3: Check if 1 F+ 8T0p = 1 L. If yes, stop decoding and go to step 7. Otherwise, go to the next step. The farthest level is not necessarily equal to the level of the top node because the decoder allows backtracking. 34 Chapter 3 Bidirectional Sequential Decoding Algorithms Stop decoding if one of the nodes is at the top of FS or BS Meeting point 1 ® Extended node 0 + Unextended node Forward tree Backward tree root node root node 1 L branches Figure 3.1 Illustration of the stopping rules of Algorithm TAmeet. Step 4: Compute the metrics of all successors of the top node in the BS, and enter the new nodes in the BS, ordered with respect to their cumulative metrics. Step 5: Remove from the BS the node whose successors were just inserted. Step 6: Check if FT 1 P 0 +1 B L. If yes, stop decoding and go to step 10. Otherwise, return to step 1. 8 in the BS is accepted as the backward portion of the decoded Step 7: The top path path. The forward portion of the decoded path is selected from the FS, whose end node is at level IF, (select the one with the highest metric value if there is more than one path at level IF). If mm ([m/2] , iBo) = B’rop’ 1 retreat m — BTOP 1 branches from the end node of the forward portion of the decoded path, then take the information 8 Defined as the path with the top node as its end node, 35 Chapter 3 Bidirectional Sequential Decoding Algorithms symbols corresponding to the rest of the path as the decoded symbols, and go to step 13. Otherwise go to the next step. Step 8: If mm ([m/21, iF) = F 1 retreat m — F 1 branches from the end node of the backward portion of the decoded path, then take the information symbols corresponding to the rest of the path as the decoded symbols, and go to step 13. Otherwise go to the next step. Step 9: Retreat the last rrn/21 and [m/2J branches from the forward and backward portions of the decoded path, respectively. Take the information symbols corresponding to the rest of the two paths as the decoded symbols, and go to step 13. Step JO: The top path in the FS is accepted as the forward portion of the decoded path. The backward portion of the decoded path is selected from the BS, whose end node is at level lB (select the one with the highest metric value if there is more than one path at level IB). If mm ([rn/2j = 1 FTOP’ , retreat m — FTQP 1 branches from the end node of the backward portion of the decoded path, then take the information symbols corresponding to the rest of the path as the decoded symbols, and go to step 13. Otherwise go to the next step. Step 11: If mm 1, 1 2 (Em/ B) = B 1 retreat m — B 1 branches from the end node of the forward portion of the decoded path, then take the information symbols corresponding to the rest of the path as the decoded symbols, and go to step 13. Otherwise go to the next step. Step 12: Retreat the last [rn/2J and 1 2 Em/ branches from the forward and backward portions of the decoded path, respectively. Take the information symbols corresponding 36 Chapter 3 BidirectionaL Sequential Decoding Algorithms to the rest of the two paths as the decoded symbols. Step 13: Output the decoded information symbols. When the channel is not too noisy, the forward and backward portions of the decoded path may actually merge (agree) with each other since a sequential decoder always follows the most likely path. Should this not be the case, a valid path (not necessarily the correct one) can be assembled by retreating rn branches from forward and backward portions of the decoded path, as described above. Note that in the absence of a burst in a block, the meeting point I would most of the time fall near the middle of the tree. However, should there be a single burst error within a block, it is expected that the meeting point would occur near the middle of the burst. Thus by retreating about m/2 branches from forward and backward meeting paths, the burst error may be totally eliminated if its length is less than m. The meeting point 1 at the end of decoding in Algorithm TAmeet is equal to decoding is stopped at step 6, or (L — BTOP) 1 FTQP 1 if decoding is stopped at step 3. Note that forward and backward decoders always meet somewhere in the tree, i.e. 1 < 1 < L — 1. Let x be an arbitrary level in the forward code tree, and imagine cutting the tree into two separate subtrees at this level. Let C denote the number of computations needed by the forward decoder to decode x branches without searching beyond level x of the forward tree, i.e., when the top node in the PS reaches depth x for the first time. This is equivalent to decoding an x branch-tree without a terminating tail. Similarly, let C_ denote the number of computations needed by the backward decoder to decode an (L x) — branch-tree without searching beyond level (L 37 — x) of the backward tree. Clearly, for any Chapter 3 Bidirectional Sequential Decoding Algorithms DMC and any fixed arbitrary level x, random variables C and Cf are independent because the forward and backward search areas are disjoint. Let us assume that the amount of time to hypothesize (extend) one node is the same for both the forward and the backward decoders. Let CL denote the total number of computations needed to decode one block by Algorithm TAmeet, By the nature of Algorithm TAmeet, we can write the following property. Property 3.1: Let I be the meeting point, then L — — 1 Cf, J 2C( and C_ F 1 C 1 + 1 and 2CE_ 1 + 1, Cf_ if decoding is stopped at step 6 if decoding is stopped at step 3 (3 J) ProofS According to Algorithm TAmeet, if decoding is stopped at step 6, the forward decoder must have extended the same number of nodes as the backward decoder. if decoding is stopped at step 3, however, the forward decoder must have extended one more node than the backward decoder. Let us first assume that decoding is stopped at step 6. By the definition of the meeting point I (see Figure 3.1), we know that which means CL = 2C(. Moreover, since 1 B = L — 1 BTOP’ 1 Now, suppose that decoding is stopped at step 3. In this case, CL BTOP = 1 FTQP = 1 1, then C_ 1 > Cf. = 1 + 1 since 2C_ L—l, where the additional computation is due to the last extension by the forward decoder. Moreover, since 1 F = 1 FTOP’ 1 then Cf > Cf_ 1 + 1. Q.E.D. Corollary 3.]: Let I be the meeting point, then CL since CL — 1 < 2Cf and CL — — 1 1 <2 mm (cf, 1 cf_ ) , 1 according to Property 3.]. 2C_ 38 (3.2) Chapter 3 Bidirectional Sequential Decoding Algorithms Obviously, CL = 2C( = 1 if and only if both top nodes in the FS and BS 2Cf_ are at forward tree level I at the end of decoding, i.e., both top nodes are at the meeting point I. It is worth noticing that the minus one in (3.2) can actually be ignored and thus C’L is essentially equal to 2 mm (Of’, cf_ ). 1 Now, we introduce an important property which is the foundation in our analysis of the computational performance of the proposed BSD in Chapter 4. Property 3.2: Let integer i E [1, L — — 1], then 1 <2 max {min(Cf. cP_j]. (3.3) Proof First of all, note that there must exist a meeting point in Algorithm TAmeet, i.e., I < 1 < L — 1. By Corollary 3.1, one can write — 1 <2mm (c(,o_). (3.4) Moreover, we can write max [min(CfCf_j)] since 1 < 1 < L — mmn(Cf,Cf_j), (3,5) Q.E.D. 1. In fact, it is easy to see that CL is virtually equal to 2 max [mm (of, C_)] . This approximation, however, is not necessary for the proof of the upper bounds on the computational distribution of Algorithm TAmeet in Chapter 4. It is interesting to point out that Forney’s scheme [22] belongs to BSD-no-merge since forward and backward decoders do not need to agree with each other, and its Note that CL mm (cr, ce,) for all possible i. 39 Chapter 3 Bidirectional Sequential Decoding Algorithms total number of computations is equal to 2mm (Ci, of), where Cf and of denote the number of computations needed by forward and backward decoders to decode the whole block, respectively. Let us now compare Algorithm TAmeet with the bidirectional M algorithm [28], It is obvious that they both belong to the same category of BSD-no-merge since merging between the forward and backward portions of the decoded path is not required in both algorithms. The main difference, however, is in the nature of the search algorithm employed in the forward and backward decoders. In the bidirectional M algorithm the number of computations per decoded branch is constant, whereas in Algorithm TAmeet that number is variable since backtracking is allowed, As a consequence, erasure free decoding is possible in the bidirectional M algorithm, while Algorithm TAmeet can not guarantee it, On the other hand, the bidirectional M algorithm suffers the correct path loss problem. Another difference is how they handle a burst. In Algorithm TAmeet, forward and backward decoders search into forward and backward trees alternatively no matter how many or where the bursts are located. At the end of decoding, both decoders in Algorithm TAmeet retreat about m I 2 branches. In the bidirectional M algorithm, however, in order for the first decoder encountering a burst to wait for the other decoder’s help from the opposite direction, a burst detection method is required. Moreover, only the bad decoder retreats m branches at the end of decoding. The BER of the bidirectional M algorithm degrades with increasing block length L [281, while in Algorithm TAmeet, the BER improves as L increases (see Chapter 5 for analysis and simulation results). 40 Chapter 3 Bidirectional Sequential Decoding Algorithms 32 Algorithm TAmerge, a BSD-merge In BSD-merge, decoding does not stop whenever two opposing paths meet each other as in the algorithm described above. Instead, a merging test is applied to opposing paths and decoding stops only when the test is successful. We present in the following Algorithm TAmerge which belongs to the class of BSD-merge, Algorithm TAmerge works in the same way as Algorithm TAmeet, except that at the meeting point I decoding is not stopped, unless a pair of forward and backward paths merge with each other. Instead, decoding is continued until a pair of merged forward and backward paths is found. Let I denote the level of the forward tree when this happens. Obviously, here the portion of the tree searched by forward decoder does overlap with that portion searched by backward decoder, as illustrated by Figure 3.2. More specifically, Algorithm TAmerge is as follow. Algorithm TA merge: Steps 0 to 2 are exactly the same as in Algorithm TAmeet. Step 3: If 1 BTOP +iF to L — BTQP 1 > L, check the top path in the BS with all paths with depths equal in the FS. If one (or more) merging path is found, stop decoding. Among all merging paths, select the one with the highest cumulative metric as the decoded path. Otherwise, go to the next step. Steps 4 and 5 are exactly the same as in Algorithm TAmeet. Step 6: If 1 FTQP +lB> L, check the top path in the FS with all paths with depths equal to L — Frop 1 in the BS. If one (or more) merging path is found, stop decoding. Among 41 Chapter 3 Bidirectional Sequential Decoding Algorithms Merging point 10 1 — Forward tree root node Correct path • -o-- — Incorrect path Backward tree root node 0- — \— 1 \ \\ \ K o Q- 0 0— \ — \ \ L branches Figure 3.2 Illustration of the overlapping in BSD-merge. all merging paths, select the one with the highest cumulative metric as the decoded path. Otherwise, go to step 1. It is worth pointing out that all nodes in one stack (FS or BS) can be easily linked with respect to their depths. Hence, in the merging test (steps 3 and 6), it is easy to find all those nodes which should be tested with the top node in the opposite direction. During the merging test at steps 3 and 6, the top path in either the FS or BS is only tested with end nodes of paths in the opposite direction, which are at the same level with the tested top path. A natural question that may arise is whether two opposing paths may pass each other or not since intermediate nodes are not being examined during the merging test. For example, Figure 3.3 illustrates such a situation where two opposing 42 Chapter 3 Bidirectional Sequential Decoding Algorithms paths may have common encoder states at some of their intermediate nodes, We now show that the events shown in this figure can not take place in Algorithm TAmerge. • common encoder state (C) (d) Figure 3.3 Example of opposing paths passing each other. First of all, events like (c) and (d) in Figure 3.3 can not happen unless event (a) or (b) occurs in the first place. This is because both forward and backward decoders must reach their predecessors before any further extensions. We are going to show that events (a) and (b) in Figure 3.3 can not occur in algorithm TAmerge. Clearly, events (a) and (b) are the same type of event. Hence, only event (a), which is shown in Figure 3.4 for more detail, will be considered in the following. 43 Chapter 3 Bidirectional Sequential Decoding Algorithms level level x FOE]’ ‘mi L) U B] Figure 3.4 Typical impossible event in Algorithm TAmerge. Figure 3.4 implies that the backward path with end node BO merges with the forward path with its intermediate node FO. Moreover, EQ and BO are the only common nodes between the forward and backward paths. As Figure 3.4 indicates, (Ii, 12, , Im) denotes the common encoder state. Moreover, the encoder state of node B] is either (12,J3, , J,O) or (12,13,’’ B]. Similarly, let , ‘m, 1). Let denote the last bit of the encoder state denote the last bit of the encoder state F]. Since FO is the intermediate state of the forward path, all successors of node FO must have been reached, and FO is consequently removed from the FS (steps 1 and 2 in Algorithm TAmerge). Let F] and F]’ be the successors of FO as illustrated in Figure 3.4. Since FO and BO are the only common nodes between forward and backward paths, Fl and B] are common nodes, i.e. = Otherwise, nodes Fl’ and B] will share a common encoder state, as indicated in Figure 3.5. Now, let us assume that F] is still in the FS, which means F] has not been extended 44 Chapter 3 Bidirectional Sequential Decoding Algorithms F] level x • common encoder state Figure 3.5 Illustration of a typical impossible event in Figure 3.4. Stop decoding when B] becomes the top node in the BS. levelx÷1 = 10 • common encoder state Figure 3.6 Illustration of the possible event. yet before the backward path reached node BO. According to step 3 in Algorithm TAmerge, however, decoding must have been stopped at the time when B] is at the top of the BS. Thus, BO does not have any chance to be reached because B] must be at the top of the BS before BO can be reached. This is illustrated in Figure 3.6, which clearly shows that end node F] of the forward path merges with end node B] of the backward path. Next, let us assume that F] is not in the FS, which means F] has already been extended. As shown in Figure 3.7, let P2 and F2’ denote the successors of F], and assume that F2 and B2 have the same encoder state. Clearly, Figure 3.7 shows the same 45 Chapter 3 Bidirectional Sequential Decoding Algorithms type of event as that in Figure 3.3 (d) since there is more than one common encoder state. Based on our discussion above, this event can not occur before an event like that in Figure 3.3 (a) or (b) happens. Following this same line of reasoning, we get the following property. level x . common encoder state Figure 3.7 Illustration of another impossible event. Property 3.3: in Algorithm TAmerge, forward and backward paths in the FS and BS can only merge at their end nodes. That is merging paths can not overlap each other. Since the decoded path is a merged path, forward and backward portions of the decoded path will never overlap in Algorithm TAmerge. Furthermore, we can conclude that overlapped forward and backward paths in Algorithm TAmerge do not have the same encoder state at any of their common tree levels because otherwise Property 3.3 would be breached. Two forward and backward paths are said to be totally distinct as long as they do not overlap or do not share the same encoder state at any common tree level in their overlapping portion. Figure 3,8 illustrates all possible types of paths in the FS and in the BS. In Figure 3.8 (a), forward and backward paths do not have a chance to meet. 46 Chapter 3 Bidirectional Sequential Decoding Algorithms In Figure 3.8 (b), merged forward and backward paths do not overlap, while in Figure 3.8 (c) forward and backward paths do not share any common encoder state in their overlapping area. Thus, we can write the following property. (a) (b) Forward tree level !F Xp Backward tree level 1 B (c) Figure 3.8 Distinct forward and backward paths in the FS and BS. Property 3.4: in Algorithm TAmerge, all paths in the FS are totally distinct from all paths in the BS, Like in Algorithm TAmeet, forward and backward decoders cannot reach their terminal nodes in Algorithm TAmerge because all successors of the root nodes of forward 47 Chapter 3 Bidirectional Sequential Decoding Algorithms and backward trees were reached at the beginning. Thus, forward and backward decoders must merge somewhere in the tree, i.e., 1 < 10 < L — 1. In the following analysis of Algorithm TAmerge, correct decoding is assumed, which is a good approximation, especially for long memory codes. For an arbitrary forward tree level x, let us define C as the number of computations required to correctly decode the first x branches by forward decoder. Similarly, define as the number of computations required to correctly decode the last (L-x) branches by backward decoder. Unlike Algorithm TAmeet, forward decoder may extend incorrect paths beyond level x before it finally decodes all x branches. Backward decoder may also extend incorrect paths beyond level (L-x) before it finally decodes all (L-x) branches. Now, let CL denote the total number of computations needed by Algorithm TAmerge with the assumption of correct decoding. By analogy to Algorithm TAmeet, we can claim the following properties. Property 3.5: Let 10 be the merging point of Algorithm TAmerge, then I 20( and Cf. Gf’, if decoding is stopped at step 6 +l and C(CL 10 2C_ 1 + 1, if decoding is stopped at step 3 (3.6) Proof Similar to Algorithm TAmeet, if decoding is stopped at step 6, forward decoder must have extended the same number of nodes as backward decoder. If decoding is stopped at step 3, however, forward decoder must have extended one more node than backward decoder. Without loss of generality, let us assume that decoding is stopped by step 6. Thus, the top node at the FS is at level 1 when decoding is stopped, and the forward decoded path up to this top node is correct. 48 Hence, the forward Chapter 3 Bidirectional Sequential Decoding Algorithms decoder must have performed exactly C( computations. Furthermore, the algorithm does not require that the backward decoder totally decodes the last (L Of — l) branches, i.e., Of’. Q.E.D. Coro1lay 3.2: Let 1 be the merging point in Algorithm TAmerge, then —1 <2mm (C(,Cf_ ), 1 (3.7) according to Property 3.5. Similar to the discussion of Corollary 3.1, CL is almost equal to 2 (or, Property 3,6: Let integer i CL — e [1, L — 1], then 1 <2 max [rnin(O[, Of_)]. Proof’ Same as the proof of Property 3.2. (3.8) Q.E.D. Similar to Algorithm TAmeet, one can easily conclude that CL is practically equal to 2 max [mm (Of’, c)]. Strictly speaking, the number of computations without the assumption of correct decoding is less than or equal to that with the assumption. We now examine the merging test (step 3 and step 6) of Algorithm TAmerge in more detail. In Algorithm TAmerge, decoding stops whenever a top path either in the FS or in the BS merges with any path from the opposite direction. Clearly, the number of paths being tested in the merging test is a random variable. This variable is small under good channel conditions and increases as the channel gets noisier. Suppose that the top path in the BS merges with a forward path FP which is not at the top of the FS (step 3). The end node of FP is at the merging point 1 according to Property 3.3. As a consequence of 49 Chapter 3 Bidirectional Sequential Decoding Algorithms this merging, forward decoder does not need to extend all forward paths whose metrics are higher than M[FP] to decode the forward portion of the decoded path FP. In other words, the merging test at step 3 effectively reduces the number of computations needed by forward decoder to decode PP. Similarly, the merging test at step 6 may reduce the number of computations needed by backward decoder if the top path in the FS merges with an opposite path that is not at the top of the BS. Hence, the multiple examinations of the top path in one stack with all paths in the opposite stack results in a reduction in computational variability since the overall number of computations per decoded block is reduced. The parallel to this is the multiple path extensions in the generalized stack algorithms of Haccoun and Ferguson [19], which allows a reduction in computational variability at a cost of an increase in average number of computations per decoded block. Note that in Algorithm TAmerge, the average number of computations is not increased, but is even reduced, as it will be shown in Chapter 4, 33 Coarsening on Algorithm TAmerge In order to efficiently perform the merging test in Algorithm TAmerge, nodes have to be linked together with respect to their depths. However, this introduces the drawback of extra processing time and memory needed for storage in forward and backward decoders. In the following, we provide two modified algorithms, quite similar to Algorithm TAmerge but without the aforementioned drawback. These two algorithms work best with quantized stacks. The modification is done in the merging test of Algorithm TAmerge (steps 3 and 6). Here, the merging test is only performed among all nodes in the highest non-empty 50 Chapter 3 Bidirectional Sequential Decoding Algorithms substacks in the FS and BS instead of testing the top node in each stack with all possible merging nodes in the opposite direction. As a consequence, however, two opposing paths may pass each other because forward and backward portions of the merged path may not coexist in the highest non-empty substacks at the same time. in other words, forward and backward portions of the merged path may overlap, as illustrated in Figure 3.3. In order to reduce the possibility of two opposite paths passing each other, not only the endpoints of the two possible merging paths should be tested but also other possible merging points in the overlapping area as will be described later. Coarsening I (Algorithm TTmerge): Steps 0 to 2 are the same as in Algorithm TAmeet, except the top node is now any node in the non-empty highest substack, Step 3: If F 1 + 1 B > L, check all paths in the non-empty highest substack in the BS with all paths in the non-empty highest substack in the FS. If there is an overlap between forward and backward paths that are being tested, check all encoder states in the overlapping area. If one or more merging paths is found, stop decoding. Among all merging paths, select the one with the highest cumulative metric as the decoded path. Otherwise, go to the next step. Steps 4 and 5 are the same as in Algorithm TAmeet, except the top node is now any node in the non-empty highest substack. Step 6: If F 1 + 1 B > L, check all paths in the non-empty highest substack in the FS with all paths in the non-empty highest substack in the BS. If there is an overlap between the forward and the backward paths being tested, check all encoder states in 51 Chapter 3 Bidirectional Sequential Decoding Algorithms possible merging nodes (a) path inFS BS to L (b) Figure 3.9 Merging test in Algorithm TTmerge, the overlapping area. If one or more merging paths is found, stop decoding. Among all merging paths, select the one with the highest cumulative metric as the decoded path. Otherwise, go to the next step. Step 7: Check if the node to be extended in the highest non-empty substack in the FS is a terminal node in the forward tree. Check if the node to be extended in the highest non-empty substack in the BS is a terminal node in the backward tree. If either one is a terminal node, stop decoding and that top path is accepted as the decoded path. In the event that both top paths in the PS and BS are terminal nodes in the forward and backward trees, respectively, the path with the highest cumulative metric is selected as the decoded path. Otherwise, go to step 1. 52 Chapter 3 Bidirectional Sequential Decoding Algorithms If two opposing paths pass each other and one of them reaches the end of its corresponding tree, clearly the decoding should be stopped. Therefore, step 7 is needed in Algorithm TTmerge. However, step 7 can be eliminated if we treat the forward and backward root nodes as connected with zero paths. For example, if the top node in the FS is at level L, it can be viewed as a merging node with the backward root node. The merging test in Algorithm TT’merge is illustrated in Figure 3.8. In Figure 3.8 (a), the forward and backward paths do not overlap and thus only the end points are tested. However in Figure 3.8 (b), forward and backward paths do overlap over to branches. In this case all intermediate nodes are examined. This involves t +1 comparisons which 0 can be done by simply performing one exclusive-OR operation between the last m+to information symbols of the tested paths, and checking if there exits m consecutive zeros, In case the depth of one of the tested paths is smaller than (m + to), the shorter one can be viewed as one rooted by a zero path of any length. It is interesting to notice that the number of nodes in the highest forward and backward substacks varies according to the channel conditions. When the channel is quiet, only the correct node is likely to float in the highest substack. As the channel condition degrades, the number of nodes in the highest substacks tends to increase. This means that the merging test involves a variable number of examinations, which depends on the channel conditions. Coarsening 2 (Algorithm HTTmerge): We now describe Algorithm HTTmerge which is a hybrid version of Algorithm TTmerge and Algorithm TAmeet. Algorithm HTTmerge is exactly the same as Algorithm 53 Chapter 3 Bidirectional Sequential Decoding Algorithms Tlmerge except that the number of information symbols required to be the same in the merging test of Algorithm TTmerge (steps 3 and 6), denoted m, is less than is that if two opposing paths agree on their last they would eventually agree on all m mj < m m, The idea branches, it is anticipated that last branches if decoding is continued. Of course this may not happen all the time, especially if mh is relatively small, and this premature stopping may introduce additional decoding errors. If mh=m, Algorithm HTTmerge becomes equivalent to Algorithm TTmerge. In the other extreme, if mt=O, it becomes equivalent to Algorithm TAmeet. Clearly, Algorithm HTTmerge reduces the amount of computations by compromising the error performance since the number of matching information symbols mh is less than m. Hence, Algorithm HTTmerge can provide a trade-off between computational effort and error performance by controlling the number of matching information symbols in the merging test. 54 Chapter 4 Computational Performance of BSD In this chapter, we analyze the computational performance of the proposed BSD algorithms. It is assumed that the channel is a DMC. Also, BSD using the stack algorithm is assumed. However, all results in this chapter can be applied to arbitrary BSD sequential denote the total number of computations performed by any BSD decoders. Let algorithm for decoding one block of L branches (including a known tail), where one computation is defined as in Chapter 3, i.e., the extension of one node from either forward or backward direction. We are mainly concerned with the distribution of Cf SD, i.e., P (Cf SD > N). In the following, we first quickly demonstrate the property of the computational distribution by a simple approach. Then, Algorithm TAmeet is analyzed in detail. More specifically, upper bounds on the computational distribution are given by Theorem 4.1 (for a specific time-invariant convolutional code) and by Theorem 4.2 (for an ensemble of trellis codes). Moreover, we conjecture that Theorem 4.2 can be applied to Algorithm TAmerge. A lower bound on the computational distribution for any code and any type of BSD is also derived. The average number of computations and the computational cutoff rate of BSD are then discussed. Finally, extensive computer simulation results are provided, confirming our theoretical analysis. 4.1 A Simple Approach to Computational Distribution Before giving the general derivation, we first quickly show interesting results on the computational distribution of BSD when the code rate is below the computational 55 Chapter 4 Computational Performance of BSD cutoff rate Rcomp. Let us define a dip [36, 46] on the correct path as the largest metric difference between two consecutive non-adjacent breakout nodes (see Figure 4.1 (a)). Notice that the dip is almost the same whether decoding is performed from forward or backward direction because the metric of the backward correct path is the mirror image of its corresponding metric of the forward correct path (see Figure 4.1 (a) and (b)). Alternatively, the dip can be viewed as a “noise burst” [30], It is admitted that the Pareto distribution P (CysD > N) N—Pr is caused by the following two opposing effects: (i) the probability of the dip size (burst length) y decreases exponentially with ; (ii) the number of computations necessary to decode the correct path increases exponentially with ‘y [5, 15, 30, 32, 36, 46, 47]. All these suggest the inherent inability of USD to deal effectively with a large dip (long burst). These difficulties arise from the essential nature of USD algorithms, i.e., searching only from one direction, and not from any fundamental limitations on convolutional codes or from any basic difficulties arising from the dip (burst) [30]. In the following, we demonstrate that the proposed BSD relieves the inefficiency of USD by attacking the dip (burst) form both forward and backward directions. Suppose that the code rate is below Rcomp, experimental evidence [48] indicates that with a very high probability only one important dip exists in a block. Moreover, it is easy to see that C’ is an increasing function of x, while G_ is a decreasing function of x. Hence, according to Property 3.2, C/’ 1 cf_ in Algorithm TAmeet, where I is the meeting point. Similarly, in Algorithm TAmerge, Cr C_ where 1 is the merging point. Thus, Algorithm TAmeet and Algorithm TAmerge optimally break every received block into two portions that require roughly the same number of computations 56 Chapter 4 Computational Performance of BSD o Nonbreakout node • Breakout node Forward tree root node (a) Backward tree root node DIP 1 (b) Backward tree root node Forward tree root node Optimum meeting (merging) point (c) Figure 4.1 (a) Correct path metric with forward decoding. (b) Correct path metric with backward decoding. (c) Correct path metric with bidirectional decoding. for decoding. Also, portions of the decoded forward and backward paths in the FS and BS will never overlap in Algorithm TAmeet and Algorithm TAmerge. In summary, the proposed BSD algorithms attack the dip (burst) from both sides, and halt somewhere near the middle of the dip. Hence, with the assumption of correct 57 Chapter 4 Computational Performance of BSD decoding, the meeting (or merging) event can be viewed as the breaking of the dip in an almost optimal way, i.e., into two almost equal parts (see Figure 4.1 (c)). It seems plausible that the required amount of computations by the proposed BSD is approximately the amount required by a USD to move through two dips (bursts) of half the original size. This is why our BSD needs substantially less computations. Therefore, if USD needs N cx exp -y computations for any given correct path dip ‘-y, our BSD asymptotically needs about twice of exp cx /T computations for the same dip. If this characterization is valid, we can conclude that p(CBSD > N) P(Cy8D > (N) ) 2 Pr. 2 2 N (4.1) Note the factor two in the exponent of the distribution (4.1). 4.2 Upper Bounds to Computational Distribution of Algorithm TAmeet The computational effort to decode a given block depends on the strategy of the algorithm, the transmitted and the received sequences. For the convenience of analysis, define a random variable C as C 2 max [mm (Of, Ce_)j. We first derive upper bounds to the distribution function of the random variable C, then we will apply them to CL of Algorithm TAmeet. Lemma 4.1: P(C> N) < P(cf > N/2)F(Cf_ where the summation is over all possible values of x, i.e., 1 < x < L 58 (4.2) N/2), — 1. Chapter 4 Computational Performance of BSD ProofS By the definition of the random variable C, one can immediately write P(C> N) N/2) P(max{rnin[C,C_]} p (min[C, C_] (4.3) N/2), where the summation is over all possible values of x, i.e., 1 <x < L — 1. The inequality in (4.3) is obvious since the summation includes the maximization point. Because random variables C’ and Cf_i are independent for any fixed arbitrary x, one can write (4.2) [49]. Q.E.D. Before we continue the derivation of the upper bound, we need the following lemma. Lemma 4.2: The distribution P(CK’’ > N) of USD to decode A branches in afinite A branch code tree (or infinite code tree) is upper bounded by P(GKSD N) 1 AP(C — 1), (4,4) where C 1 denotes the number of computations peiformed by USD in order to decode the first branch in an infinite code tree. Proof’ For an infinite code tree, according to the union bound, we have [20] P(CK >N) <P(Oc t N_A) N_A) =AP(C _ 1 l) (4.5) where CK’ denotes the number of computations required to decode the first A infor mation branches in an infinite tree, and Cj is the number of computations by USD in 59 Chapter 4 Computational Performance of BSD order to decode the t-th branch. For an infinite code tree, C = 01 since the number of computations performed by USD to decode one branch is the same for all branches. For a finite code tree, we have P(CK8D N) 8D > N) 1 P(CK AP C>——1 (4,6) because CK Q.E.D Lemma 4.2 confirms that there exists no essential difference between P (cS) and P(C 1 N) N) as Jacobs and Berlekamp suggested in [30]. From now on, we let of C denote the number of computations required to decode the first branch in the B and 1 infinite forward and backward code trees, respectively. Theorem 4.1: For a specific time-invariant convolutional (i.e., linear trellis) code t! with rate R and a BSC with a crossover probability p satisfying (2.9), the computational distribution of Algorithm TAmeet is upper bounded by 1 P(CL > N) <A . + d[log N — exp{—t(d[log N — log (uL)] 1og (uL)]) + 21og N} where A 1 is a finite constant independent of N. Here , q, (4.7) d[x], c1[x], ndF and are the same as defined by Chevillat and Costello [38]. The superscripts F and B denote the corresponding forward and backward codes of Q, respectively. Proof’ Using Chevillat and Costello’s upper bound, which is given by (2.10) for a specific time-invariant convolutional code [381, we can express P(cf — <a n .exp{_itd[logu 60 ( — of q1og as (Y — i)} (4.8) Chapter 4 Computational Performance of BSD and Cj 8 as N B 1 2(L — exp{—dB 1og 1) < — x) 1N 2(L x) — 1) + — (LN_X) _1)}. 2 ( log (4.9) Thus, according to Lemmas 4.1 and 4.2, we can write P(C N) d [iogu .exP{_(d[1ogu •(L—x)•ndF [( 1)]) + log (2(L x) x) Notice that the CDF is a non-decreasing function and log (a a 2 and — )]+ _i)] }. 1) > log (a) — (4.10) 1 when 2. We can thus write u P(C (_ N) <2 x (L + d [iogu 2 — = —-- •2 mu . — exP{_(d{logu x).ndF (2(L 1]) + 1og - X)) [x. (L — mu x)] 1---- log (2uL)} + d[1og N 1 exp{—t(d[1og N A — — ()_ ] [() ( - F “z .exp{—(d [1ogN . logy (2uL)]) + 2log N} 1og (2uL)] + d[1og N — According to Property 3.2, we know that CL 1og (2uL)]) + 2q1og N}. — 1 < C, (4.11) Hence, P(CL > N) Q.E.D. P(C> N). Theorem 4.1 shows that rapid column-distance growth of both forward and backward codes are essential to make BSD efficient. Using a computer search procedure, we have provided in Chapter 2 SBODP and SABODP codes which have the same rapid column distance growth rate from both forward and backward directions. 61 Chapter 4 Computational Performance of BSD However, if the convolutional code used is optimum from one direction but is very bad from the other direction, one can expect that the distribution of computations using BSD falls close to that using USD from the optimum direction, as the above bound indicates. A systematic ODP code is one typical example. Therefore, systematic codes are in general not suitable for BSD algorithms. Corollaiy 4.]: For a symmetric convolutional code (e.g., a SBODP or SABODP code), Theorem 4.1 can be written as <A 2 P(CL > N) 1 (ndj exp {2(—d[1og N . since d[x] = d[x] d[x] and d’ = — 1og (2uL)] + 1og N)}, (4.12) Notice the factor two in the = exponent of the distribution (4.12). Furthermore, the above bound (Theorem 4.1) can also be shown to be related to random-coding results, For an ensemble of trellis codes, d[x] < N/2 and ndB < = d[x] = n x/2, N/2 [38]. Therefore, for an ensemble of trellis codes, Theorem 4.1 can be expressed as P(CL> N) <2 2i [x (L . x)]. — 2 (N) . exp{—2(n/2 — ) log N + n1og (2uL)} ()2 where 2 and A J, J, = = 1 (2uL). A = ” N 2 A (au. n — 2q)/(2lnu) . exp {—2(n/2 — )log N} (4.13) — 1, which is the same as that of USD [38]. Moreover, are functions of the channel and code rate but independent of N. The above arguments are more clearly demonstrated by the following theorem. Notice that Lemmas 4.1 and 4.2 can be applied to an ensemble of codes. 62 Chapter 4 Computational Performance of BSD Theorem 4.2: Let Pr be defined by R 0 (Pr) / Pr E = On any DMC and for large N, the computational distribution of Algorithm TAmeet for an ensemble of trellis codes, P(CL > N), is upper bounded by P(CL > N) (4.14) , 2 N 3 A where P <Pr is the Pareto exponent of USD and A 3 is a finite constant independent ofN. Proof According to Lemma 4.1, 4.2 and (2.8), one obtains — N) <A2x(L_x)(_ ‘Y[2(Lx) = 2 N [x(L A )]1+P(i — i] — L— x)_P 2 (4.15) in (4.15) for N sufficiently large such that N> 2L, the terms in the last two brackets are very small and can be bounded by a constant. Therefore, P(C> N) <A , 2 N 3 (4.16) where 3 A = A 2 [x(L [x(L — — x)J’ (i x)]1+P(l P1 2 <2 ( A +p+p /4) 2 From Property 3.2, P(C — + [x(L we know that CL — 1 2x) ) — — 2(L— x)] _P p x)] [i + 2 (4.17) . 1 x)] < C. Hence, P(CL > N) < Q.E.D. N). From our computer simulation evidence (see section 4.6), we suggest that Theorem 4.2 can be applied to good specific time-invariant convolutional codes. 63 The reason Chapter is as follows. 4 Computational Performance of BSD Experimental results [48) have demonstrated that inequality (2.7) can actually be used as an approximate expression for the computational distribution with good specific time-invariant convolutional codes. Moreover, Anderson [50, 51] showed that good specific time-invariant convolutional codes follow the random code properties quite closely. In other words, good specific time-invariant convolutional codes have pseudorandom characters although they are not random. 4.3 Extension to Other BSD Agorithms In this section, we conjecture that the same asymptotic random coding upper bound (Theorem 4.2) also applies to Algorithm TAmerge. Algorithm TAmerge is similar to Algorithm TAmeet except for the stopping rules. More specifically, Property 3.2 of Algorithm TAmeet and Property 3.6 of Algorithm TAmerge are basically the same. By analogy to the analysis of Algorithm TAmeet, we can write P(GmeTe where Amerge 0 > N) P(GL > N) P (min[G, cf] > N/2), (4.18) is the true number of computations of Algorithm TAmerge, i.e. without the assumption of correct decoding. Hence, the only question is that for any fixed arbitrary x whether C and are independent or not. As defined in Chapter 3, for an arbitrary forward tree level x, random variables O and Of are the number of computations required to correctly decode x and (L-x) branches by forward and backward decoders, respectively. Furthermore, unlike Algorithm TAmeet, both forward and backward decoders may extend incorrect paths beyond level x before the end of decoding. Thus, some incorrect nodes searched by forward decoder may also be visited by backward decoder. 64 Chapter 4 Computational Performance of BSD Let Df’ be the forward incorrect path subset generated from forward tree level i, Similarly, let D denote the backward incorrect path subset generated from backward tree level j. According to the operation of sequential decoding, C is a function of the metrics of the correct path up to level x and metrics of incorrect paths in the subsets Dr, 1 =1, 2, ..., x, [31, 46, 52J. Thus, one can write = f(M[xF(h)], M[F]), where h e [1, x] and F E U Dr. (4.19) Here, XF(h) is the forward correct path up to forward tree level h, and *F is a forward incorrect path. Similarly, one can write = f(M[xB(h)],M[B]), where he [1,L — x] and B C UDP. Here, xB(h) is the backward correct path up to backward tree level h, and (4.20) $cB is a backward incorrect path. Strictly speaking, on a random code tree, C and C_ are not truiy independent, because an incorrect forward path *F may share some common nodes with an incorrect backward path xB. Fortunately, the probability of this event gets smaller as the code memory m increases. 10 Hence, by neglecting the above event, we can consider C’ and cf essentially independent, and thus conjecture that Theorem 4.2 also applies to Algorithm TAmerge. In Algorithm TTmerge, forward and backward merged paths may overlap each other, and hence the analysis of the computational distribution is even more difficult (if not impossible). However, as our computer simulation results (see Figure 4.9 in section 4.6) 10 The probability of this event is in the same order of the probability of error, which is shown in Chapter 5 to decrease exponentially with m. 65 Chapter 4 Computational Performance of BSD suggest, the overlapping area is insignificant, especially when code rate R Thus, it seems reasonable to suggest that the computational performance of Algorithm TTmerge can be approximated by that of Algorithm TAmeet if the code rate R Finally, it is easy to see that the computational distribution of Algorithm HTTmerge is upper bounded by that of Algorithm TTmerge and lower bounded by that of Algorithm TAmeet. Clearly, Algorithm HTTmerge is closer to Algorithm TTmerge when the number of required matching information symbols mh is close to the code memory m. Otherwise, it is more like Algorithm TAmeet. This phenomenon is clearly observed in the computer simulations (see Figure 4.2 in section 4.6). 4.4 Lower Bound to Computational Distribution Following Jacobs and Berlekamp [30], we show that with the assumption of correct decoding no algorithm of BSD type can have a computational distribution better than °. 2 N Theorem 4.3: For all convolutional (or trellis) codes, any DMC, and with the as sumption of correct decoding, the distribution P (Cf > N) of any BSD algorithm is lower bounded by p(CBSD> N) 22aN_2exp{_O(/i)}, (4.21) where a is defined in parametric equation (2.7). Proof Let us separate every received block into two almost equal parts, i.e. forward tree search operations will not go beyond forward tree level [(L tree search will not go beyond backward tree level F(L 66 — — m)/2j and backward m)/21 by the help of a genie as Chapter 4 Computational Performance of BSD defined by Jacobs and Berlekamp [30]. Notice that there are m branches in the codeword tree separating the portions considered by the genie-aided forward and backward decoders, allowing them to choose codeword paths independently. Let Cm)/ J and 2 denote the total number of computations required before first correctly decoding the [(L — rn)/2J branches by the idealized forward decoder and the [(L — m)/2] branches by the idealized backward decoder, respectively. Obviously under the assumption of correct decoding, > 2mm _m)/2J0RL_m)/21] for any BSD algorithm. Therefore, we can write P (2mm > N) = j,C_m)/ 2 [C_m)/ l ] (C_mIj > > N) N/2, C_m)/ l > N/2). 2 (4.22) By the assistance of a genie, forward and backward search areas do not overlap, i.e. events (C_m)I j > N/2) and (0_m)/21 > N/2) are independent for any DMC. 2 Therefore, we finally get BsD 0 P( > N) j > N/2) 2 (Cm)/ { (*) a exp =2 aexp a N_ [-Oi . p (Cm)/ l 2 > N/2) (vci)] [_o(/iT)]. Q.E.D. (4.23) Combining the lower bound (Theorem 4.3) and the upper bound (Theorem 4.2), we demonstrated that the computational distribution of the proposed BSD is Pareto, and its Pareto exponent is between 2c and 2p, where p < Pr is the Pareto exponent of USD. Since the Pareto exponent of USD is approximately c or Pr, we conclude that the Pareto exponent of BSD is approximately twice that of USD, 67 Chapter 4 Computational Performance of BSD 4.5 Average Number of Computations In this section, we examine the average number of computations of BSD and show that it is smaller than that of USD. The computational cutoff rate of BSD is then discussed, We define the average number of computations per decoded branch, denoted by Gb, as the total number of computations used to correctly decode one block divided by the block length L. Using Jacobs and Berlekamp’s approximation [301 for the distribution of CKSD, which is given by P(CKSD > N) AN, (4.24) 2 N, (r_1L) (4.25) we can approximate the distribution of BSD as > N) where we assume L >> m, and ignore the difference between — L(L — m)/2j and m)/21 since both tenns are almost equal to L I 2 when L is large. The total number of computations per decoded block of BSD can be any integer in the interval [L, M] where M is the maximum number of computations in BSD. Thus, BSD ECb — = 1 BSD 7L NP(Cf N [ (cf = N) SD> N) — p (cf > N)] M-l M (N + l)P(C > N) = N=L-1 NP(Cf > N) — N=L 68 Chapter 4 Computational Performance of BSD M-l =P(cfsD>N)+P(cfsD>L_l) SD — In (4.26), P(C > L — i) = (4.26) M). > 1 and p(cfS) > M) = 0. Therefore, using (4.25), (4.26) becomes M-l E[C] =l+yzP(Cf8D>N) M—l 1 + 22a_2 L > (4.27) ’. 2 N Moreover, similar to [30], we can write M M-l — L j 2 , a (4.28) lnM—lnL, Finally, using (4.28), we obtain E [csD] . + L 1 L jl—2a L2(1_a)l a .i 1 - — (4.29) L(1nM—lnL), 22 11+2 Similarly, one can get the average number of computations per decoded branch for USD as E C-° 1 + L’), f H1+1nMi—1nL, — a 1 a=1’ (4.30) where M 1 is the maximum number of computations in USD. From (4.29) and (4.30), it can be seen that E [c°] is smaller than E for any a. However, the difference becomes less significant when a is much larger than 69 Chapter 4 Computational Performance of BSD one, This is because the average computation for both USD and the proposed BSD is essentially one computation per branch when a >1. However, it is expected that higher order moments of the computation will be significantly improved by the proposed BSD even for high a. Let us now discuss the computational cutoff rate. The computational cutoff rate of sequential decoding is defined as the supremum of rates for which the average computation per branch is bounded [301. It is well known that with the assumption of correct decoding the computational cutoff rate Rcomp of USD is equal to E (]) [30, 53]. 0 Rcomp is a truly critical channel parameter, and with the assumption of correct decoding the average number of computations per decoded branch of USD is unbounded if the code rate R > R 0 [30, 53]. In the following theorem, it is shown that the computational cutoff rate for BSD with the assumption of correct decoding is the same as that of USD. Theorem 4.4: Consider an infinite sequence of convolutional (or trellis) codes and any DMC. For code j, let R denote its rate and m its memory. Suppose each code has n symbols per branch in its codeword tree, and R 1 Code {} R for all i, for some R > 01 R , is used with block length L 1 (including a known tail to terminate the encoder in a known state) for BSD. Equiprobable codewords and arbitrary sequential decoder are assumed. Under the assumption of correct decoding,” (fL for any i and some 0 < eo —* oc and L (i + 1, then the expected number of computations per branch satisfies E [CsD] u oo (4.3]) This assumption is equivalent to defining the number of computations needed to be infinite if BSD makes decoding error. 70 Chapter 4 Computational Performance of BSD Proof’ See Appendix A. Q.E.D. As a simple approach, from (4.29) one can actually show that E [GSD] L — cc and a < 1. However, it is —* co when important to notice that the above computational cutoff rate is based on the assumption of correct decoding. As suggested by Anderson [51], this assumption is unrealistic and in fact there is no cutoff rate phenomenon for sequential decoding as long as Fe > 0. Nevertheless, the cutoff rate is a bound such that above this rate the average computation will increase exponentially. From (4.29) and (4.30), one can easily see that when a < 1, i.e. above the cutoff rate, E [CD] is much smaller than E [C ] although it is also much larger than one. Hence, the useful range 1 of code rates with the proposed BSD is increased compared to that with USD according to Anderson’s definition on a “useful decoder” [51]. 46 Numerical and Simulation Results Computer simulations have been performed in order to verify the above theoretical results. Non-systematic rate r = 1 / 2 (bits per channel symbol) convolutional codes were used. All algorithms were run over a BSC with a transition probability p under strictly identical conditions (i.e., using the same code and noise sequences), in order to make them comparable. In our computer simulations, metrics were scaled into integers. In Algorithm TAmeet, Algorithm TAmerge and Algorithm HTTmerge, the substack spacing was chosen to be equal to one, i.e., using unquantized stacks. Different substack spacing values, namely L = I and zi = 7 were chosen in Algorithm Tlmerge. Notice here that there may exist more than one path in the highest non-empty substack even when z I. 71 Chapter 4 Computational Performance of BSD Figure 4.2 shows the distributions of the total number of computations per decoded block of both USD and BSD algorithms for the case p Pareto exponent of memory m = Pr = 1.1 (i.e., R 0 = = 0.0409, corresponding to a 0.52 bits per channel symbol). A SBODP code 23 was used, and 200,000 blocks each of length L = 400 bits (377 information bits) were simulated for each algorithm. The Pareto approximations for USD (stack algorithm z = 1) and BSD (Algorithm TAmerge) are also indicated on the figure. As expected, the use of BSD results in a significant reduction of the computational variability of sequential decoding. More specifically, the slope of the straight line that approximates the tail of the distribution of Algorithm TAmerge is equal to 2.16, which is more than twice that of USD. This is in agreement with our derived upper bound on the computational distribution (Theorem 4.2), Moreover, it also agrees with the lower bound (Theorem 4.3), i.e., less than 2 cr. It can be noticed from Figure 4.2 that the slopes of the computational distributions of all BSD-merge algorithms are similar to that of Algorithm TAmeet, which suggests that the additional computations associated with the merging test are asymptotically unimportant as compared to the computations of Algorithm TAmeet when the code rate R < As Forney [36] suggested, Figure 4.2 shows that an unidirectional quantized stack algorithm (z2i=7) increases the computational effort compared to an unidirectional unquan tized stack algorithm (z=1). However, Figure 4.2 shows that the tail of the computational distribution of Algorithm TTmerge improves when the quantized stack algorithm (=7) is used. This is due to the fact that if the substack spacing L is increased, more paths will be tested in the merging test so that the chance of merging gets enlarged. 72 II 1 Cf2 CD Q. C CD CD . t-) CD — C C C 00 C C C C C C 00 C C (J 0, 0 0 0 Probability P(CL> N) 0 0 0 C) CD Chapter 4 Computational Performance of BSD For comparison purposes, the computational distribution of Fomey’s scheme [22] built on the stack algorithm is also depicted in Figure 4.2. As we pointed out in Chapter 3, the total number of computations of Forney’s scheme is equal to 2 mm (Ci, CL . 9 Hence, the minimum number of computations with this scheme is equal to twice that of USD. Clearly, as indicated in the figure, Fomey’s scheme does not provide any significant advantage over conventional USD. This is mainly because the dip size in the correct path metric is essentially the same whether decoding is performed from forward or backward directions (see Figure 4,1). Thus, the number of computations, which is mainly determined by the dip size of correct path metric, is basically the same for both forward and backward decoders. Table 4.1 shows the average number of computations per decoded branch of different algorithms. Clearly, the average number of computations is reduced by using the proposed BSD. Table 4.1 also shows the number of erasure blocks (undecoded blocks) with a maximum allowed computation Cijm = 8000. These results suggest that our BSD can significantly alleviate the erasure or overflow problem, which is the main obstacle in the application of sequential decoding. Furthermore, Table 4.1 shows that with an increase in computations of about 3% with Algorithm TAmerge compared to Algorithm TAmeet, decoding errors (18.3% decoded blocks were in error after decoding by Algorithm TAmeet) were practically eliminated. 12 Thus, with the use of the merging test in BSD, decoding errors can be significantly reduced. For Algorithm HTTmerge, Figure 4.2 and Table 4.1 suggest that there exists a substantial computational gain by just letting m 2 = (m — 1) in the merging test. Thus, Algorithm HTTmerge can easily provide For a more detailed analysis and computer simulation results on the error perfomiance of the proposed BSD, refer to Chapter 6. 74 Chapter 4 Computational Performance of BSD Table 4.1 Comparison of average computations for L = 400, Ciim = 8000 and p = 0.0409. SBODP Code m = 23 runs Pr’’, L=400, 2x 10 5 Ave. comp. per branch Erasure blocks Error blocks USD (unquantized stack algorithm_z = 1) 1.621 1324 0 USD (quantized stack algorithm L = 7) 1.776 1743 0 Algorithm TAmeet (z2i = 1) 1.264 7 36614 Algorithm TAmerge_(L = 1) 1.303 21 1 Algorithm rrmerge_( = 1) 1.390 84 1 Algorithm TTmerge_(z = 7) 1.393 59 0 Algorithm HTTmerge 1, L = 1) (mh = m 1.333 39 1 Algorithm HTT’merge 2, = 1) (mh = m 1.319 31 3 — — a trade-off between computational effort and error performance by adjusting the number of matching information symbols m. Figure 4.3 compares the computational distributions of Algorithm TAmerge for ODP, SBODP and SABODP codes all of memory in Figure 4.2, m = 23, using the same parameters as It can be seen from this figure that the distribution of the number of computations per decoded block using a SBODP code is better than that with an ODP code. However, the distribution of the number of computations per decoded block using a SABODP code is almost identical to that with a SBODP code. With a SABODP code, no errors were found after decoding because its free distance (equals to 24) is much larger than the free distance of the SBODP code (equals to 18). These simulation results 75 Chapter 4 Computational Performance of BSD are clearly in agreement with our theoretical observations. Figure 4,4 shows the distributions of the total number of computations per decoded block of different algorithms using a systematic ODP code (m = 23). As expected, this figure shows that the computational distribution of BSD is almost the same as that of USD. This is due to the fact that the backward code of an ODP forward code is too poor and hence the backward decoder can hardly ever help in decoding. Figure 4.4 also demonstrates that the number of computations needed in Forney’s scheme is essentially equal to twice that of USD. Figure 4.5 shows the computational distributions of the total number of computations per decoded block of both USD and our BSD algorithms for the case p corresponding to a Pareto exponent Pr symbol). = 0.68 (i.e., R 0 = = 0.0594, 0.44 bits per channel The same code as in Figure 4.2 was used again, and 10,000 blocks each of length L = 200 bits were simulated for each algorithm. The Pareto approximations of these distributions are also indicated on the figure. In Figure 4.5, the computational distribution of Algorithm TAmeet (a BSD-no-merge) clearly violates the lower bound given by Theorem 4.3 because there were too many decoding errors (44.1% decoded blocks were in error as illustrated in Table 4.2),13 However, no errors were noticed after decoding by Algorithm TTmerge, while only one block was in error after decoding by Algorithm TAmerge. The slope of the straight line that approximates the tail of the computational distribution of Algorithm TAmerge is equal to 0.89, which is more than twice that of USD. Again, it agrees with our upper and lower bounds. Moreover, Notice thai the lower bound in Theorem 4.3 is based on the assumption of correct decoding. 76 Chapter 4 Computational Performance of BSD 0 A c-) • 10 400 800 1200 1600 2000 4000 N Figure 4.3 Distribution of the total number of computations per decoded block for different codes using the same parameters as in Figure 4.2. 77 8000 Chapter 4 Computational Performance of BSD 10 0 A c-) L 8000 N Figure 4.4 Distribution of the total number of computations per decoded block for a systematic ODP code (Pr 1.1). 78 Chapter 4 Computational Performance of BSD the slopes that approximate the computational distributions of Algorithm TTmerge and Algorithm TAmerge are almost the same. Table 4.2 compares the average number of computations for different algorithms for the case p = 0.0594. The number of erasure blocks and the number of error blocks are also shown in the table. It can be seen that the number of erasure blocks is substantially reduced by using the proposed BSD. Also, by using the merging test, almost all errors associated with Algorithm TAmeet were avoided, Furthermore, by comparing Table 4.1 and Table 4.2, one can notice that the improvements in the average number of computations of our BSD compared to that of USD are more significant when the Pareto exponent Pr is less than one. As anticipated, the useful range of code rates [51] with our BSD is increased as compared to the case with USD. These observations agree with our suggestions made in section 4.5. Figure 4.6 shows the distributions of the total number of computations per decoded block of both USD and our BSD algorithms for the case p a Pareto exponent Pr = 1.5 (i.e., Rcomp = = 0.0289, corresponding to 0.58 bits per channel symbol), The same code as in Figure 4.2 was used, and 5,000,000 blocks each of length L = 400 bits were run for each algorithm. Again, the slopes that approximate the computational distributions of the different BSD algorithms are quite similar. Once again, these simulation results are in agreement with our analytical results. Table 4.3 compares the average number of computations for different algorithms for the case p = 0 .0289. The average number of computations, the number of erasure blocks and the number of error blocks are also shown in Table 4.3. It can be noticed that BSD-merge eliminates all decoding errors associated with Algorithm TAmeet (4.7% 79 Chapter 4 Computational Performance of BSD 10 0 A c) 10 10 4000 iv Figure 4.5 Distribution of the total number of computations per decoded block for the case L = 200 and p = 0.0594, i.e., Pr 0.68. 80 Chapter 4 Computational Performance of BSD Table 4.2 Comparison of average computations for L SBODP Code m = 23 4 runs , L=200, i0 68 Pr=° USD (unquantized stack algorithm_ = 1) = 200, Cijm = 4000 and p = 0.0594. Ave. Coffi per branch Erasure blocks Error blocks 4561 1175 0 USD (quantized stack algorithm_ = 7) 5.106 1382 0 Algorithm TAmeet (z = 1) 1,541 0 4406 Algorithm TAmerge_(L = 1) 2.284 166 1 Algorithm TTmerge_(z = 1) 2.799 293 0 Algorithm TTmerge_( = 7) 2.894 306 0 blocks). Moreover, the additional number of computations due to the merging test is only 0.5% with Algorithm TAmerge compared to Algorithm TAmeet. All results show that the additional computations due to the merging test are worth the reward and those computations can be ignored when the code rate R <Reomp. As indicated in the table, the average number of computations of USD and our BSD are quite comparable here. However, due to the reduction in computational variability, as shown in Figure 4.6, it is expected that higher order moments will be smaller with the proposed BSD than with USD. Figure 4.7 shows the empirical probability masses of merging and meeting points in BSD for the two cases Pr = 1.1 and Pr = 1.5. It can be seen that statistically, there is essentially no difference between the meeting and merging points. This implies that the overlapping portion between forward and backward search areas when the merging test 81 00 CD CD 0 . fl CE]. 00 0 II II CD 0 Q. CD 0 C C C 0 C C 0 C 00 0 C — o o Probability P(CL> N) — o C0 — I C C Chapter 4 Computational Performance of BSD Table 4.3 Comparison of average computations for L = 400, Clim = 4000 and p = 0.0289. SBODP Code m = 23 , L=400, 5x 106 runs 5 Pr=L Ave. comp. per branch Erasure blocks Error blocks USD (unquantized stack algorithm_L = 1) 1.139 575 0 USD (quantized stack algorithm __7) 1.167 4336 0 Algorithm TAmeet (z = 1) 1,107 5 233321 Algorithm TAmerge_(L = 1) 1.113 11 0 Algorithm TTmerge_( = 1) 1.122 42 0 Algorithm TTmerge_(z = 7) 1.137 15 0 is used is negligible when the code rate R <R ,, Moreover, Figure 4.7 also shows 01 that the probability mass of the merging (meeting) point is concentrated towards the middle of the block. That is to suggest that when the channel is relatively quiet (i.e., Pr > 1), the merging (or meeting) point is more likely to be near L / 2, which means that the computational lower bound on the computational distribution (Theorem 4.3) is quite accurate in this case. Figure 4.8 shows the empirical probability masses of the merging points for Algorithm TAmerge and Algorithm TTmerge and that of the meeting point of Algorithm TAmeet, for the case Pr = 0.68. As can be seen, the probability mass of the merging point with Algorithm TAmerge is quite similar to that with Algorithm TTmerge. However, unlike the above case (i.e., R < Rcomp), here the probability mass of the meeting point starts to differ from that of the merging point. This suggests that the additional number of 83 Chapter 4 Computational Performance of BSD 0.07 0,06 0.05 —. 0.04 -.S 0.03 0.02 0.01 0 0 100 200 300 x Figure 4.7 Probability mass of merging and meeting when code rate 84 400 Chapter 4 Computational Performance of BSD computations needed for merging from the first meeting point increases as the code rate increases beyond Rcomp. Figure 4.9 shows the empirical distributions of overlapping lengths in Algorithm TAmerge and Algorithm TTmerge, which is defined as = F 1 +1 B — L at the end of decoding. O, represents the portion of the tree over which forward search and backward search overlap each other. In Figure 4.9, the block length L=200 for the case Pr and L=400 for Pr = 1.1 and Pr = = 0.68, 1,5. As expected, the distribution of the overlapping length decays faster as the channel becomes less noisy. Figure 4.9 suggests that the distribution of the overlapping length decays in an exponential fashion. This is because O, is essentially the sum of the deepest depths reached by forward and backward decoders, and it is known that the distribution of the depth x of an incorrect path explored by a sequential decoder decreases exponentially with x [4, 38]. Finally, since in BSD two separate tree search processors can work in parallel, it seems more reasonable to define a computation in BSD as hypotheses (extension) of two nodes, one on the forward direction and the other one on the backward direction. Clearly, all analyses in this chapter still hold except that “N” should be replaced by “2N”. As a result of this definition, our computer simulation results should be shifted horizontally to the left by a factor of two. Figure 4.10 compares the distributions of Algorithm TTmerge (L SABODP codes (m = = 1) for different 12, 16, 23) using the same parameters as in Figure 4.2, but with the new definition of one computation with BSD. The number of blocks used for each simulation was 200,000 each of length L = 400. The maximum number of computations allowed to decode one block was set to 4000. The computational distributions of USD 85 Chapter 4 ComputationaL Performance of BSD 0.035 0.03 0.025 0.02 0.015 0.01 0.005 0 0 50 100 150 x Figure 4.8 Probability mass of merging and meeting when code rate > . 01 R 86 200 00 CD CD C CD C 0 CD Tj C C Probability P(O > X) 0 Cl) C C) 0 n Chapter 4 Computational Performance of BSD Table 4.4 Comparison of computational and error performances of different m with the new definition of one computation. SABODP Codes 5 runs , L=400, 2x i0 1 Pr Ave. comp. per branch Erasure blocks Error blocks USD (unquantized stack algorithm_A = 1, m = 12) 1.571 2761 100 USD (quantized stack algorithm A = 1, m = 16) 1.564 2793 11 USD (quantized stack algorithm A = 1, m = 23) 1.530 2595 0 Algorithm Tlmerge (A = 1, m = 12) 0.710 42 383 Algorithm TTmerge (A = 1, m = 16) 0.707 73 27 Algorithm Tlmerge (A = 1, m = 23) 0.700 77 0 are also shown in Figure 4.10. Moreover, the number of erasure blocks, the number of error blocks, and the average number of computations, (using the new definition of one computation with BSD) are shown in Table 4,4, Similar to USD, the number of computations of BSD is essentially not related with m. However, the error performance is improved with increasing m. In summary we have demonstrated that the use of the proposed BSD can reduce both the computational variability and average number of computations of sequential decoding. It should be pointed out at this stage that in our comparison of BSD-merge algorithms with USD, we did not take into account the additional processing time needed to perform the merging test in BSD-merge. However, one can use extra processors to perform this task that would not slow down the node extension operations of sequential decoding. Therefore, it seems that the additional complexity due to the merging test 88 Chapter 4 Computational Performance of BSD 0 10 Unidirectional Stack Algorithm (A=l) \ “ ‘% ‘\ —1 10 V \\____ \\ ‘c ‘ A -2 10 Algorithm TTmerge (A=l) — — — — %... —c• \%., \___ -3 10 SABODP Codes m=23 m=16 m = 12 — — — — — — %. 10200 400 600 800 1000 2000 N Figure 4.10 Distribution of the total number of computations for different m with the new definition of one computation. 89 4000 Chapter 4 Computational Performance of BSD is unimportant compared to the significant improvements in computational performance obtained by using the proposed BSD. 90 Chapter 5 Error Performance of BSD In this chapter, the error performance of the proposed BSD algorithms is analyzed through the random coding argument. We assume that the channel is a DMC and BSD is based on the stack algorithm. However, all results obtained in this chapter can also be applied to arbitrary BSD sequential decoders, It is obvious that the error performance of any BSD is lower bounded by that of Viterbi decoding since sequential decoding is suboptimum. Thus, we only need to develop upper bounds on the error performance of the proposed BSD. First, the basic elements of the error performance of USD are summarized. Then, we deal with the probability of decoding error of B SD-merge (Algorithm TAmerge and Algorithm ‘TTmerge). Finally, the error performance of BSD-no-merge (Algorithm TAmeet) is discussed. It is found that for an ensemble of linear trellis codes the error performance of BSD-merge is asymptotically the same as that of USD, while the bit error probability of Algorithm TAmeet satisfies the random coding bound for block codes. 5.1 Error Performance of USD The basic results of the error performance of USD are summarized in the following. Let x(j) denote the correct path from root node to level j. Let x (t) denote an incorrect path which diverges from the correct path x at level i and remerges at level (i + t). A proto-error event as defined by Forney [36] is an event for which some incorrect path x(t) has a metric M[x(t)j at the point of convergence with the correct path x 91 Chapter 5 Error Performance of BSD such that M[x(t)] rninM[x(j)j. (5.1) .2 t By definition, no error can occur without a proto-error event whereas a proto-error event may not necessarily lead to a decoding error for a unidirectional unquantized stack algorithm [36]. Furthermore, for an incorrect path to remerge with the correct path, t must be no less than the code constraint length K. For an ensemble of linear trellis codes, the block error probability p8D, and the bit error rate with a unidirectional unquantized stack algorithm can be bounded by [6, 12, 36] [1_U pUSD < 0 o(1)/R]2G R (1 — E)Rcomp (52) )IR] 5 [l__E( e I2 (1 , — e)Rcomp R (1 — and pU9D < b I 0 , [l_U en’o( ) 6 , (1 R — (1 — e)Rcomp )Rcomp R (1 — — where is a positive number, R is the code rate as defined in (2.2), E (S) is the Gallager 0 function [5], Ci,, is the channel capacity, and for Rcomp < R < C,,, S e [0, 1] is the solution of R= and for 0 R Rcomp, S = (l—e)E ( 0 8) (5.4) 1. Now, define a proto-error event with a bias B as the event for which some incorrect path x(t) has a metric M[x(t)] at the point of convergence with the correct path x such that M[x(t)j minM[x(j)] 92 — B, (5.5) Chapter 5 Effor Performance of BSD where B is a positive number. Thus, for the unidirectional quantized stack algorithm with a substack spacing = , no error can occur without a proto-error event with a bias B z. Hence, it can be shown that the error performance of the unidirectional quantized stack algorithm is asymptotically the same as that of an unquantized one but with an insignificant increase factor of A 6 = ) in (5.2) and (5.3) [6, 36]. 6 e’(’ 5.2 Error Performance of BSD-merge In this section, the error performance of Algorithm TAmerge using unquantized stacks is analyzed in detail. The results are then extended to other BSD algorithms. Although the following analysis assumes unquantized stacks, it can be easily generalized to the case of quantized stacks, in a similar way as for USD [6, 36]. Let Smg denote the common encoder state of forward and backward portions of the merged path at the merging point 1. Obviously, Smg can be either correct or incorrect. Figures 5.1 and 5,2 show some typical situations in BSD where the common encoder state Smg is correct and incorrect, respectively. As indicated in Figure 5.1, there is always a possibility that more than one forward (backward) path may merge with more than one path in the opposite direction at encoder state Smg. Should this occurs, Algorithm TAmerge always choose the one (perhaps not the correct one) with the highest cumulative metric. In any case, the encoder state Smg at the merging point 10 in Figure 5.1 is assumed to be the correct one. In Figure 5.2, however, Smg is an incorrect encoder state. In the following, we divide the derivation of the block and bit error probabilities into three steps to prove that the error performance of Algorithm TAmerge is asymptotically the same as that of USD. In the first step, we show that if the encoder state Smg is correct, decoding 93 Chapter 5 Enor Performance of BSD S mg Backward correct path Forward correct path Merging point 10 Forward incorrect paths • Backward incorrect paths Correct encoder state Figure 5.1 Example of error events with a correct merging encoder state. Forward correct path Backward tree level Forward tree level i i Backward correct path S mg Backward incorrect paths Forward incorrect paths Merging point 10 • Correct encoder state Incorrect encoder state Figure 5.2 Example of error events with an incorrect merging encoder state. errors can be upper-bounded by the proto-error event with a bias ), where ) is the largest possible one branch metric drop. In step 2, we investigate the probability that the encoder state Smg is incorrect. Finally, the overall error performance is derived in step 3. Before investigating the above situations in detail, we need the following lemmas. Consider an incorrect path (t) which diverges from the correct path x at level i and 94 Chapter stretches out t 5 Error Performance of BSD branches from the correct node i. First of all, an incorrect path will be extended by a sequential decoder using an unquantized stack algorithm only if [6, 36] rninM[x(j)]. M[$c(t)] (5.6) Thus, one can immediately write the following lemma. Lemma 5,1: An incorrect path 5c(t) (either in forward decoding or in backward decoding), which diverged from the correct path at level i and remained unmerged up to level (i + t), may be on the top of the stack only M[*(t)]> where (i + mm f M[x(j)] (5.7,) h) denotes the farthest level of the correct path reached by the unquantized stack algorithm. Furthermore, notice that for any path present at any position in the stack, the parent node of that path must have been at the top of the stack at an earlier time. Thus, we can write the following lemma. Lemma 5.2: An incorrect path j(t) (either in forward decoding or in backward decoding), which diverged from the correct path at level i and remained unmerged up to level (i + t), could be in the stack only if > where (i + mm iji+h M[x(j)] — ). (‘5.8) h) denotes the farthest level of the correct path reached by the unquantized stack algorithm. We now proceed with the derivation of the error probability of Algorithm TAmerge. 95 Chapter 5 Error Performance of BSD Step 1: Analyze the error performance of Algorithm TAmerge given that the encoder state Smg is correct. According to the operations of Algorithm TAmerge, merging can occur at either step 3 or step 6. Without loss of generality, we suppose that decoding is stopped at step 3. This means that the end node of the backward portion of the decoded path at level l is at the top of the BS, whereas the end node of the forward portion of the decoded path could be at any position in the FS. As indicated in Figure 5.1, decoding error in this case is clearly upper-bounded by proto-error events, More specifically, let us suppose that there exists a forward incorrect path that merges with a backward incorrect path which is at the top of the BS. Clearly, decoding errors in the backward portion of the decoded path belong to proto-error events without a bias. Furthermore, according to Lemma 5.2, the metric of the forward incorrect path in the FS can not be lower than the minimum cumulative metric of the forward correct path minus \. Hence, the decoding error event near the merging point 1 made by the forward decoder is bounded by the proto-error event with a bias B = \. Clearly, any other decoding errors (if any) in the forward decoder can be upper-bounded by the proto-error event without a bias. In summary, all error events in Algorithm TAmerge when the encoder state Smg is correct can be upper-bounded by the proto-error events with a bias B = A. Thus, we can write the following lemma. Lemma 5.3: if in the merging test the encoder state Smg is correct, decoding error events in Algorithm TAmerge can be upper-bounded by the proto-error events with a bias \, which is asymptotically the same as that of USD. Step 2.’ Investigate the probability that the encoder state Smg is incorrect. Since the 96 Chapter 5 Error Performance of BSD encoder state Smg is incorrect, a decoding error must exist around the merging point 1 (see Figure 5.2). Let us define this decoding error event between the last diverging points of the correct forward and backward paths as the merging error event (see Figure 5.3). Clearly, only one such event can occur within each decoded block. Let pMerge denote the probability of the merging error event. Obviously, outside the merging error event, everything is the same as in USD, as illustrated in Figure 5,2. Correct path Incorrect path 1 Forward tree level i+h + Backward tree level i S mg Forward tree level i Merging point 1= i+t 1 Backward tree level D 2 i.e. backward tree level j +t Figure 5,3 Illustration of the merging error event. In Figure 5.3, we assume that forward and backward incorrect paths diverge from the correct path at forward tree level i and backward tree level j, respectively. As indicated in the figure, tj is the length of the forward portion of the decoded path from level i, and t 2 is the length of the backward portion of the decoded path from level j. Let the ensemble of all possible forward incorrect paths which diverge from the correct path at 97 Chapter 5 Error Performance of BSD forward tree level i be denoted by X(i). Similarly, let X(j) denote the ensemble of all possible backward incorrect paths which diverge from the correct path at backward tree level]. Let *F,(t1) denote a forward incorrect path in X(i) whose end node is at forward tree level (i + t ). Analogously, let 1 in (j) t +t 2 = B,(t2) whose end node is at backward tree level L — j — i and the merging point 10 = (j i+t 1 denote a backward incorrect path + t ). 2 = L — According to Property 3.3, (j ) since the forward 2 +t and backward portions of the decoded path wifi never overlap. Let (i hi) denote the + farthest depth of the forward correct path reached by forward decoder, Similarly, let + (j ) denote the farthest depth of the backward correct path reached by the backward 2 h decoder. Obviously, h 1 + h 2 1 +t t 2 according to Property 3.3. Clearly, not all paths in X.(i) and all paths in i) and j) X(j) can pairwise merge. Let .(i; j) contain which are pairwise merging paths. It is obvious that 1(i; j) can be viewed as the set of all incorrect paths which diverge and remerge with the correct path at forward tree level i and L —j. Define 1(i; j) as the total number of pairwise merging incorrect paths contained in the set .(i; j). Hence, t + t 2 I (i;j) (u — l)uL_i_i_K = (u — l)utl+t2_K [6]. Now, let 1 pMere(j be the probability of the merging error event as shown in Figure 5.3. parameters i, t , t 1 2 are given, parameterj is also determined since j = L — > K, and 2 , 1 h , ) 2 th Note that if (i + tl ). 2 +t We will demonstrate that the upper bound on P(i, t ,h 1 ,t 1 ,h 2 ) is only related with 2 ) after averaging over the ensemble of linear trellis codes. 2 1 +t (t ) and (h 2 1 +h Lemmas 5.1 and 5.2 are all that we need to determine the upper bound on the probability of the merging error event, and the method used in our derivation follows 98 Chapter 5 Error Performance of BSD closely that of Viterbi and Omura [6]. Also, it is important to notice that conditions in Lemmas 5A and 5.2 are only necessary but not sufficient. Lemma 5,4: Given i, t , h 1 , t 1 , h 2 , the probability of the merging error event in 2 Algorithm TAmerge is upper-bounded by 2 j+h i+hi 1 2 pMere( ,t h 1 h , ) < 2 > P(yx) e_M [xp*r)1 eM[,j(t1)]eM[*B,j(t2)] x ( (F,(il);B, ) )eX(i;j) L I2 wherej = L e_M[(T)1 — (i+ti (5.9) I ). 2 +t Proof’ Without loss of generality, we assume that decoding is stopped at step 3 in Algorithm TAmerge. That is merging occurs between a top path xB(t2) in the BS and a path xF(t1) which could be at any position in the FS. According to Lemmas 5.1 and 5.2, we can write M[F,(t1)] mm — 1 :<r z+hi M[xF(Ti)]+) 9F(i,tl,hl) O (5J0) 0. (5.11) and M[*B,(t2)] Moreover, 5cF,j(tl) and (*F,(t) XB,f(12)) — XB,,(t2) mm )] 2 M[xB(r 9B(j,t2,h2) must share the same encoder state Smg at level 10, i.e., e X(i;j). Similar to [6, 36], we can write pMerge( ,h 1 t , 12, h 1 ) < P(yx)q(y), 2 99 (5.12) Chapter 5 Error Performance of BSD where the received code vector 14 runs over all symbols between forward tree level i and backward tree level j, and the indicator function q5(y) is defined as (1, if gF(i,tl,hl) 0 and gB(j,tl,h2) 0 for some (*F,(t);kB,(t2)) E X(i;j) q(y) 1 0, (5J3) otherwise. As in [6, 36], if for a given y, (y) = 0 1, then for u exp {.gF(i, t , hi)}.exp {agB(i, t 1 , h)} l,for some (kF,(t1); B,(t2)) C X(i;j). 2 (5.14) Hence for this y 1 exp{u .gF(i,tl,hl)} .exp{ .gB(j,t2,h2)}] 6 = (y), a, 0. (ti ) ;xB,, (1 ))EX(s;j) 2 (5.15) while if (y) = 0, (5.15) holds trivially. Also, we can write [36, 6] i+hi exp —uS{ mm M[xp(Tl)]} exp —uSMxF(T)], (5.16) exp—aSMxB(T)]. (5.17) r=s and 2 j+h exp—crS{ mm M[XB(T2)]} < j+h ,r 2 T=) Substituting (5.16) and (5.17) into (5.15), and combining (5.12) and (5.15) yield Q.E.D. (5.9). 15 As in [54, 35, 6], define the exponent functions fi() = In P(y) q(x) [] , (5.18) 14 Notation is simplified if we do not specify the dimensions of vectors; they are either implicit or specifically designated after each equation. Note fi(o8) = —Ec(, 8)—.78R, f2(J, 8) = —Ecj(r, 6) and f(o, 8) = —E (o, 8)+8R, where Ec(, 8), Eci(o, 6) 1 and Ez(cr, 8) are defined by Viterbi and Omura [6 pp. 358]. 100 Chapter 5 Effor Performance of BSD f2(a, 6) P(){ q(x) = in [)] q(x’) [P(Y)] , (5.19) and (5.20) By applying Holder inequality [4] to the above exponent functions, we can write efl(A) ef2(M <exp —{(1 — exp —(1 6)Eo{6/(1 — [/(1 0 )E — — )] 6)] + u6E [(1 0 (5.21) = — cr)/cr]} = 6cSie, (5.22) and <exp —6E [(1 0 )/j — = 6je()äR, (5.23) where 8 c and S are as defined in [6 pp.359]. Now, we are ready to show that the upper bound on P(i,t ,2 1 ,t h 1 h , ) is only 2 afunction oft 1 +t 2 = L—j —i K and h 1 +h 1 +t 2 < t . 2 Lemma 5.5: Given parameters i, tj, h , t 1 , and h 2 , the ensemble average probability 2 of the merging error event in Algorithm TAmerge is upper-bounded by pMergc( where u>0,6 ,t h 1 h , ) 2 ti, 2 1 +12 [0,1],t (enR > 1 K,h — S(h1+h2)S(t1+t2)e_6nKR, 3 l) 2 1,h Proof’ According to Lemma 5.4 and letting S 1, and 1 h +h 2 e ti +t. [0, 1] as in [4, 6, 36, 54], we can write pMerge( 2 i+h i+hi , 1 t , 12, h 1 h ) 2 P(yx) M[xF(T)1 5 eT 101 (5.24) e_0SM[xB(T)1 Chapter 5 Error Performance of BSD S ( eM[F,j(t1)1 x eM[Bj(t2)1 x ( 3 );*n ) )X(i;.j) i ( (*Q 2 2 i+h 1 i+h < e’X P(ylx) X(i;j) > x {eM[*Fi(u1)1 X eM[(t2)]} < — l)Su(t1+t2_k 2 j+h 1 i+h x P(yx) e_L13M[xF(T)i x {eaM{*Fi(ul)1} (t2)i} 3 X {e’M[ Obviously, (5.25) is independent of indexes i and the sake of simplicity, let i = j = j (5.25) . due to the ensemble average. For 0 in above summations, and omit subscripts i and j in (5,25). Also, note that there is no difference in the ensemble average of the forward and backward codes. Thus pMcrgc (i, t 1h ,t 1 ,h 2 ) e’(u 2 — l)Su(tt2_I)6 1 h x P(yjx) x {eM[F(l 2 h e_M[xF(1’)1 )i}5{ =e(u — eJM[B(t ) 2 ] 1 )Su(t12_K)sT(hi, 2 )T(h t 1 t , ), 2 (5.26) where T(h, t) Y x(h:) S ( x 1 q[(tj)jeUMt 102 , (5.27) Chapter 5 Error Performance of BSD and where x(h ) and *(t) are codeword segments of h 1 1 and t1 branches, respectively. Notice the similarity between equation (5.27) and equation (6.2.9) in Viterbi and Omura [6]. Hence, we can write T(hL, t) as T(h — — fexpn{(h expn{(t —t)[f ( 1 u6) +6R] +tf (u,6)}, 2 (,6) 6R] + hf 3 h)[f (u,6)}, 2 — — t t h 528 Substitute (5.21) through (5.23) into (5.28), we obtain T(h,t) < (5.29) flhSfltze_ThtiöR Finally, substitute (5.29) into (5.26), and note u enR by the definition of the code rate = in (2.2), to obtain (5.24). Q.E.D. In the following, we show that for an ensemble of linear trellis codes pMerge and pMerge have the same exponential exponent as pUSD and Pb’D but with an insignificant increase factor of e/(1) at most. Theorem 5.]: For an ensemble of linear trellis codes, the probability of the merging error event is upper-bounded by pMerge e < — —nKR (A ) rl7e e0(S), 7 A V — — Rcomp “comp R C ’ 1 (530) where 7 A is a finite constant, and for Reomp (enR = R R while for 0 R Rcomp, S = C, 6 (6) 0 E 1. 103 — i) (5.31) [0, 1] is the solution of (5.32) Chapter 5 Error Performance of BSD Proof The probability of the merging error event without any condition is given by pMerge = (5.33) ,h ,t ,hi 2 1 Vi,2 According to Lemma 5.5, given parameters i, t , h, t 1 , and h 2 , the ensemble average 2 probability of the merging error event is upper-bounded by pMerge( where u A 6 ,h 1 t t, h 1 ) < e 2 0,6 C [0,lj,t 1 +t 2 (e7lR fl(h1+h2)fl(t1+L2)_nKR 6 i) — (5.34) 1 +t 1 > l,h > 1, and 1 K,h h +h 2 <t . Moreover, 2 one can write [6] l,6j <1, Thus letting R = if = and R< 1 (6)/6 where 8 C [0, 1], 1 0 E h = = (6) 0 E (5.35) 1 and t = K, we can bound (5.34) by pMerge( , 1 t K (emR eA 68 ,t 1 h ,h 2 ) 2 < eTf (e — i) — e_SnKR 6 l) (5,36) As in [6], choose 6 = 1 if B < E (1) = Rcomp. Substituting (5.36) into (5.33), proves 0 the above theorem. Note that in (5.30) A 7 = A 6 (e Q.E.D. — 1) if that of quantizing the stack in USD with z = = 7 has the similar effect to \. Thus, A \. Therefore, by comparing (5.30) with (5.2) of USD, one can conclude that the additional error if any due to the merging error event has the same exponent as USD. Step 3: Now, we are ready to show that the overall error performance of Algorithm TAmerge is asymptotically the same as that of USD. 104 Chapter 5 Error Performance of BSD Theorem 5.2: For an ensemble of linear trellis codes, the probability of error with Algorithm TAmerge is upper-bounded by pTAmerge PsD + 6 <A pT9e, (537) Proof’ Let A, B and B’ be some events where B’ is the complement of event B. Thus, we can write P(A) = P(A, B) + P(A, B’) = P(AB)P(B) + P(AIB’)P(B’). (5.38) Now let A be the decoding error event and B the event that the encoder state Smg is correct, Then P(AIB’) 1 and we can write P(A) = P(AIB)P(B) + P(B’) < P(AIB) + P(B’). Combining (5.39) and Lemma 5.3, we complete the proof. (5.39) Q.E.D. Corollary 5,1: For an ensemble of linear trellis codes, the probability of error with Algorithm TAmerge is upper-bounded by pTAmerge PJSD. 6 <2A (5.40) Corollary 5.1 suggests that the error probability of Algorithm TAmerge is at most twice that of a quantized USD with = 105 Chapter 5 Error Perfomiance of BSD Theorem 5.3: Given that the encoder state Smg is incorrect, the bit error probability of Algorithm TAmerge follows the same upper-bound as that of a unidirectional quantized stack algorithm with = Proof According to the definition in [6], the bit error probability is the expected number of bit errors in a given sequence of received bits normalized by the total number of bits in the sequence. Thus for a trellis code with block length L and log 2 u information bits per branch, TAmerge b 13 (Smg . is incorrect) = E[Nbl (L m)log u 2 (5.41) — where Nb is the total number of bit errors in the L-branch code sequence given that the encoder state Smg is incorrect. Now, let nb(T) denote the expected number of bit errors caused by an incorrect path diverging at forward tree level T. As indicated in Figure 5.2, Nb can be divided into three parts i.e., i—rn E[Nbj L—m E[n(r)] + E[nb(i, t ,h 1 ,t 1 ,h 2 )] + 2 E[nb(r)], (5.42) where the first and third terms correspond to the total number of bit errors outside the merging error event, and the second term represents the number of bit errors due to the merging error event with parameters i, tj, hj, t , h. The inequality in (5.42) follows 2 from the fact that bit error sequences outside the merging error event may overlap [6]. As in [6], we can write , 1 E[nb(i, t , 12, h 1 h )] 2 (t + t 2 — K + 1) log 2 (u)P(i, t, h , 12, h 1 ), 2 106 (5.43) Chapter 5 Error Performance of BSD since (t 2 1 + t — K + 1) log 2 (u) is the number of bit errors due to the merging error event. According to Lemma 5.5, E[nb(i, t ,h 1 ,t 1 ,h 2 )] is only related with parameters 2 t t + t 2 and 1 +h h , where t 2 h K and 0 < h t. Moreover, since pMere( h) follows the same upper bound as that of the proto-error event with a bias \, we can write L—m E[Nb] where o 0 arid 6 e e° (5.44) E[nb(T)J [0, 1]. Substitute (5.44) into (5.41), we can write L—m pTAmer9e(g i incorrect) 2u (L -_rn)log > E[nb(T)j. (5.45) Notice that aside from the factor e, the right side of (5,45) is the upper bound of the bit error probability of a unidirectional unquantized stack algorithm [6 pp. 312j. As discussed earlier, e°’ has the same effect as using a unidirectional quantized stack algorithm with = \. Q.E.D. Similar to the proof of Theorem 5.2, one can write P(A) = P(AIB)P(B) + P(AIB’)P(B’) = P(AB) + P(B’) [P(AIB’) — P(AIB)], (5.46) where B still denotes the event that the encoder state Smg is correct. Now, let P(A) be the bit error probability with Algorithm TAmerge. Then, P (A B) is the bit error rate of Algorithm TAmerge given that the encoder state Smg is correct, and P(AIB’) the bit error rate given that Smg is incorrect. In (5.46), P(B’) is the probability that Smg is incorrect, which is upper-bounded by Theorem 5.1. Furthermore, According to Lemma 5.3, P(AB) follows the same upper 107 Chapter 5 Error Performance of BSD bound as for a unidirectional quantized stack algorithm with = ).. Similarly, according to Theorem 5.3, P(AB’) also follows the same upper bound as for a unidirectional quantized stack algorithm with = ). Although P(AIB) P(AIB’) in general, we can still conclude that the second term in (5.46) compared to the first term is asymptotically unimportant. Thus, the following corollary follows. Corollary 5.2: For an ensemble of linear trellis codes, the bit error probability of Algorithm TAmerge is asymptotically upper-bounded by pTAmerge PD. 6 <A (5.47) According to Corollaries 5.1 and 5.2, one can conclude that the error performance of Algorithm TAmerge is asymptotically the same as that of USD. More accurately, the error performance of Algorithm TAmerge using unquantized stacks is similar to that of a unidirectional quantized stack algorithm with = The error performance of Algorithm TTmerge using unquantized stacks is better than that of Algorithm TAmerge since the metric of its merged path is higher than or equal to the merged path metric in Algorithm TAmerge. This is because the merging test in Algorithm TTmerge is only performed among paths in the highest forward and backward substacks. The error performance of Algorithm HTTmerge is close to that of Algorithm TTmerge when the number of required matching information symbols mh is close to the code memory length m. On the other hand, when that number is zero (or close to zero), Algorithm HTTmerge behaves like Algorithm TAmeet. Thus, one can say that the error performance of Algorithm HTTmerge, is upper-bounded by that of Algorithm 108 Chapter 5 Error Performance of BSD TAmeet and lower bounded by that of Algorithm TTmerge. Furthermore, the number of computations needed by Algorithm HTTmerge decreases as mh decreases. Hence, Algorithm HTTmerge can easily provide a trade-off between the error performance and computational effort by controlling the number of matching information symbols m during the merging test. 53 Error Performance of Agorthm TAmeet According to Property 2.1 in Chapter 2, the distance spectra of the forward and backward codes of a specific time-invariant convolutional code are exactly the same. Suppose we let forward and backward decoders decode to the ends of their trees independently. Then, the error performance of the backward decoder is expected to be almost the same as that of the forward decoder. In Algorithm TAmeet, however, decoding stops whenever the search areas of forward and backward decoders meet. In other words, the searching areas of the forward and backward decoders will never overlap. The benefits are that the computational effort is reduced to its minimum among our proposed BSD algorithms and the extra merging test which might be time consuming is not required. However, the disadvantage is that the decoded forward and backward paths may not merge (agree) with each other. Thus, Algorithm TAmeet may make additional decoding errors near its meeting point. Hence, the error performance of Algorithm TAmeet is obviously worse than that of any other BSD algorithms with a merging test. In Algorithm TAmeet, both decoded forward and backward paths retreat about m /2 branches from their end nodes.’ 6 Thus, the length of the retreated branches, m / 2, can 16 Here, we ignore the Diophantine constraint, i.e., treat m 12 as a integer. 109 Chapter 5 Error Performance of BSD be viewed as the length of the backsearch limit for the last decoded information symbols by forward and backward decoders. For the rest of the decoded information symbols, the backsearch limit is clearly greater than m / 2. Hence, the decoded information symbols near the beginning and end of each block are more reliable than those near the meeting point. Thus, we can upper-bound the bit error probability of Algorithm TAmeet by that of USD with a backsearch limit equal to [m/2j. Clearly, the above upper bound indicates that the BER of Algorithm TAmeet is essentially not related with the block length L. Furthermore, the above argument suggests that a long block length is preferred in order to reduce the actual BER with Algorithm TAmeet. For an ensemble of linear trellis codes, Zigangirov [55] provided an upper bound on the bit error probability of USD with backsearch limit r that is greater than the constraint length K. When the backsearch limit 7 is equal to or less than the constraint length K, it is known [12, 56, 57] that the probability of error satisfies the random coding bound for block codes. Therefore, we can conclude that the random block coding exponent applies to the bit error probability of Algorithm TAmeet, and thus we can write pTAmeet < 8 A [m/2J n[Eo(pr)—prR] (5.48) From the above discussion, one can see that the only reason the error performance of Algorithm TAmeet is worse than that of USD is because it does not require the decoded forward and backward paths to merge with each other. 5.4 Numerical and Simulation Results in order to verify the analysis results, lengthy computer simulations were conducted. 110 Chapter 5 Error Performance of BSD Since decoding errors rarely occur with a long memory code, a short memory (m = 10) SABODP code was used. Similar to Chapter 4, the code rate was chosen to be equal to 0.5, a BSC channel was used, and metrics were scaled into integers in the simulations. Moreover, all algorithms were run under strictly identical conditions (i.e., using the same code and noise sequences) in order to make the results comparable. Results are shown in Tables 5.1 and 5.2 for Pr = 1.1 and Pr = 0.68, respectively. In these simulations, the one branch metric drop ) = 16 and 14 for Pr = 1.1 and Pr 0.68, respectively. 50,000 blocks of length L = = 400 were simulated for each algorithm in Table 5,1, while 5,000 blocks of length L = 200 were simulated for each algorithm in Table 5.2. In addition, the computational limit for each algorithm was selected large enough to make the erasure probability negligible. As expected, Tables 5.1 and 5.2 show that the error perfonnance (both bit and block error rate) of Algorithm TAmerge is similar to that of USD. More precisely, the BER of Algorithm TAmerge with unquantized stacks is very close to that of a unidirectional quantized stack algorithm with a sub stack spacing ).. Moreover, as Corollary 5.1 suggested, the block error probability of Algorithm TAmerge with unquantized stacks is upper-bounded by twice that of unidirectional quantized stack algorithm with z = Tables 5.1 and 5.2 suggest that the error performance of Algorithm Timerge with = 1 is very close to that of a unidirectional unquantized stack algorithm. Table 5.1 also shows that the error performance of Algorithm TTmerge with a small A, i.e., A < A, is slightly better than that of Algorithm TAmerge using unquantized stacks. One can also notice from Tables 5.1 and 5.2 that the error performance of Algorithm TT’merge with A = A is close to that of Algorithm TAmerge with A = 1. Moreover, when m = (m 111 — 1) Chapter 5 Error Performance of BSD Table 5.1 Error performance comparison of different algorithms for the case Pr = 1,1, i.e., R < Rcomp SABODP Code m=10 Pr=” ,L=400,5 x 1 runs BER Error blocks Erasure blocks Max. comp. Ave. comp./br. USD (unquantized stack algorithm_ = 1) 1.86e-4 261 0 163045 1.824 USD (quantized stack algorithm_ = 16) 5.73e-4 649 4 2e5 3.383 Algorithm TAmeet ( = 1) 8,09e-3 14552 0 9278 1.274 Algorithm TAmerge (z = 1) 6,08e-4 962 0 9364 1.310 Algorithm TTmerge_( = 1) 1.39e-4 340 0 17567 1.397 Algorithm TTmerge_(zS. = 7) 2.79e-4 496 0 16877 1.403 Algorithm TTmerge_( = 16) 8.07e-4 1188 0 23383 1.876 Algorithm HTlnierge (nih = rn—i,_z = 1) 3.52e-4 672 0 9562 1.336 Algorithm HTfmerge (rnh = 1) = m —2, 6.57e-4 1204 0 9562 1,322 and = 1, the error performance of Algorithm HTTmerge is similar to that of Algorithm TTmerge. The trade-off between the error performance and computational effort that can be provided by Algorithm HTTmerge is clearly apparent in Tables 5.1 and 5.2. The average and maximum number of computations, and the number of blocks remained undecoded (erasure blocks) with different algorithms are listed in Tables 5.1 and 5.2. As it can be seen, the maximum number of computations and the average number of computations necessary to decode a block using the proposed BSD are much less than those needed by USD. Moreover, the simulation results suggest that a large 112 Chapter 5 Error Performance of BSD Table 5.2 Error performance comparison of different algorithms for the case Pr = 0.68, i.e., R > R 0 SABODP Code m=10 runs pr=0.68,L200,5X BER Error blocks Erasure blocks Max. comp, Avg. comp./br. USD (unquantized stack algorithm_L=1) l,99e-2 413 4 2e5 18.434 USD (quantized stack algorithm_with z=l4) 3.88e-2 770 6 2e5 32.5 14 Algorithm TAmeet ( = 1) 4.88e-2 3047 0 2545 1.559 Algorithm TAmerge_(A = 1) l.58e-2 591 0 3095 1.898 Algorithm Tlmerge_(z=1) 9.27e-3 300 3 2e4 3.026 Algorithm Tlmerge_(LS=14) 2.91e-2 871 0 19107 4.144 Algorithm HTlmerge (mh = m 1,_L=1) 1.39e-2 515 0 6758 2.108 Algorithm HTTmerge (mh = m 2,_ti=l) l.70e-2 716 0 4258 1.938 - in Algorithm TTmerge does not benefit both the error performance and average computations. Thus, a small value of L seems to be a better choice for the overall performance of Algorithm TTmerge as well as Algorithm HTTmerge. Finally, we examine the effect of block length L on the error performance of the proposed BSD algorithms. Simulation results of the error performance of Algorithm TAmeet and Algorithm TAmerge are shown in Table 5.3 and Table 5.4, respectively. It can be noticed from Table 5.3 that the BER of Algorithm TAmeet improves slightly as L increases, which confirms the findings in section 5.3. The BER of all algorithms with a merging test is not sensitive to L. However, as expected, the block error probability 113 Chapter 5 Effor Performance of BSD Table 5.3 Error performance comparison of Algorithm TAmeet (L\ = I) for different block length L SABODP Code m=10 L=l00 L=200 L=t300 L=400 L=500 Bit error rate (BER) l.03e-2 9,84e-3 8.90e-3 8.09e-3 7.56e-3 #of error blocks 8869 11944 13482 14552 15348 # of erasure blocks 0 0 0 0 0 Max. #ofcomp. 583 2805 5021 9278 11200 Ave. comp. / branch 1.142 1.202 1.244 1.274 1,302 pr=l.1,5X10 I’UflS Table 5.4 Error performance comparison of Algorithm TAmerge (z = 1) for different block length L SABODP Code m=10 pr1.l,5X10 fllflS L=100 L=200 L=300 L=400 L=500 Bit error rate (BER) 3.07e-4 4.62e-4 5.50e-4 6.08e-4 6.52e-4 # of error blocks 151 409 664 962 1256 # of erasure blocks 0 0 0 0 0 Max. #ofcomp. 1133 3074 6465 9364 11362 Ave. comp. / branch 1.189 1,247 1.283 1.310 1.335 increases with increasing L, for all the algorithms under study. Tables 5.3 and 5.4 indicate that Algorithm TAmerge and Algorithm TAmeet have basically the same maximum and average number of computations when the code rate is lower than Rcomp. This argument is especially true when L is not too small. Thus, the additional computations introduced by the merging test are asymptotically insignificant, and are worth the improvements in error performance. Moreover, the maximum and average computations is much smaller than those of USD for all values of L. In other words, the number of erasure blocks with the proposed BSD is much smaller than those of USD under the same maximum number of computations. 114 Chapter 6 Bidirectional Multiple Stack Algorithm The main disadvantage of sequential decoding algorithms is their inherent overflow problem. The number of computations in sequential decoding is a random variable with a Pareto distribution. Hence, there is always a small fraction of blocks which require an infeasible number of computations and thus cannot be completely decoded. This ratio of undecodable blocks is often called the erasure probability Per. Several known techniques have been proposed to alleviate the overflow problem of sequential decoding [14—16, 18, 19, 58J, In Chapter 3, we proposed BSD algorithms which were shown to significantly reduce the erasure probability. The proposed BSD algorithms improve the computational performance by approximately doubling the Pareto exponent of conventional sequential decoding techniques. However, the decoding effort is still a random variable with a Pareto distribution, and the overflow or erasure problem is still not eliminated. Chevillat and Costello [18] proposed an interesting decoding algorithm, called multi ple stack algorithm (MSA), that allows erasure-free decoding. The basic idea is to make use of higher order stacks for blocks that require an excessive number of computations to be decoded. However, to achieve a good error performance with the MSA, a large amount of memory is needed. In this chapter we propose to modify the MSA to accommodate bidirectional tree search [29], as described in Chapter 3. We refer to this decoding algorithm as bidirectional multiple stack algorithm (BMSA). It is found by analysis as well as computer simulations 115 Chapter 6 Bidirectional Multiple Stack Algorithm that the use of bidirectional decoding with the MSA improves the performance in terms of computational effort, memory requirements and decoding errors. First, the basic elements of the conventional MSA are reviewed briefly. The BMSA is then discussed in detail and its performance is evaluated by analysis and computer simulations. Finally, different decoding algorithms, including the MSA, the new BMSA and the Viterbi algorithm, are compared on the basis of their memory requirements, bit error performances and computational efforts. 61 The Multiple Stack Algorithm (MSA) The multiple stack algorithm (MSA) eliminates the overflow problem entirely at the expense of a substantial increase in stack and buffer storage requirements. It is a modification of the stack algorithm which employs several stacks to alleviate the problem of stack overflow. The basic functions of the MSA are illustrated in Figure 6.1. Initially, the MSA decoder operates as a conventional stack algorithm. If a terminal node in the tree is reached before the first (primary) stack with size Zfvls is filled, decoding is complete, and the decoded sequence is identical to that of a conventional stack algorithm. However, if the first stack is filled and an extensive search is needed, the top T nodes of the first stack are transferred to the second stack and decoding resumes using only the T transferred nodes. If the end of the tree is reached before this stack is full, the terminal node is accepted as a tentative decision and is stored in a tentative register. The decoder clears the second stack, returns to the first stack, and attempts to find another tentative decision, The new tentative decision is compared to the preceding one, and the better one is retained and stored in the register. The process of transferring T nodes to another stack 116 Chapter 6 Bidirectional Multiple Stack Algorithm is not limited to the second stack. Additional stacks are formed as needed. If a stack fills up before the end of decoding, T nodes are transferred to a higher order (secondary) stack until a tentative decision is made. For simplicity, we assume that all secondary stacks have the same stack size, i.e., ZM = ZMSA, i 2, 3,.... The MSA terminates = decoding when it reaches the end of the tree in the first stack, or if a predetermined computational limit Cijm is reached while the search is in progress. In both cases, the best tentative decision is accepted as the final decision. I Top T nodes Transfer if 1full Transfer if Z2 if full 1 Second Stack Z2 Transfer if Z3 if full TopTnodes I Third Stack Z3 First Stack Zi Figure 6.1 Illustration of the multiple stack algorithm. The major difference between the single stack algorithm and the multi-stack algorithm is well described by their names. The single stack algorithm stops the decoding process when the stack is full and erases the undecoded block. The MSA continues decoding in secondary stacks to reach a decision. When a tentative decision is made in one of the secondary stacks, and the computational limit Gum has not been reached, the MSA 117 Chapter 6 Bidirectional Multiple Stack Algorithm returns to previous stacks to refine the tentative decision. Decoding stops either after satisfactorily achieving first-stack decoding or by reaching the computational limit. In [18], the block error probability of the MSA, pMSA PeUSD + is upper-bounded by USD 1 p (6.1) where PY is the block error probability of the single stack algorithm with an infinite stack size, and pf-’ is the probability of the first stack overflow. 1 P It is shown in [18] that \Pr / 9 (Z A 1 — 1) , (6,2) where A 9 is a constant and Pr is the Pareto exponent of USD as defined in Chapter 2. Similarly, the bit error probability of the MSA can be upper-bounded by pjfrISA <pUSD + (6.3) As pointed out by Viterbi and Omura [6], the block error probability is not a reasonable performance measure, and ultimately the most useful measure is the bit error probability. Moreover, as suggested in Chapter 5, the bit error probability of sequential decoding, including our BSD, is independent of the block length L. Hence, in this chapter, the bit error probability is the criteria used to measure the error performance of various algorithms. According to (6.3), in order to achieve a low bit error probability with the MSA, both pJSD and P 5 must be very small. It is trivial to make pS) very small by employing a long memory convolutional code, The objective is then to make P-”-’ as small as 118 Chapter 6 Bidirectional Multiple Stack Algorithm the desirable error performance. However, to make pf SD very small, say comparable D with a long memory code, the size of the first stack must be chosen very large. 9 to PbU As indicated in (6,2), pSD only decreases in a Pareto function with an increase in the first stack size ZtsA, This means that a substantial first stack size ZsA is needed to make pf9D relatively small. As it will shown later, the use of the proposed BSD is an alternative to reduce the first stack overflow probability for a given first stack size. Chevillat and Costello [181 investigated the distribution of block computations for the first tentative decision of the MSA. They showed that it is Pareto before the first stack is filled, decreasing exponentially thereafter, They also analyzed the computational distribution for the final decision and gave the following upper bound: N-, 9 (A < pfJSD P(Cp > N) 0, where N < N ZN > Cim, — 1 Cm (6.4) cj’’ is the total number of computations required for the final decision and p < Pr’ However, no experimental results on the final computational distribution are reported in the literature. Since the number of computations for the final decision represents the overall computational performance of the decoder, in the rest of this dissertation, the computational distribution for the final decision will be the focal point. In all computer simulations reported in this chapter, antipodal signaling over an additive white Gaussian noise (AWGN) channel with hard decision at the output of a coherent demodulator is assumed. Figure 6.2 shows empirical results of the computational distribution for the final decision of the MSA with parameters L A7, Ciim = 0 2000, Eb / N The code used is an m = = 4.58 dB, corresponding to Pr = = 128, T = 3, ZM 1, and ZfL’5’A = 7 SABODP code of rate r = 1 / 2 (see Table 2.3). 119 = 11 4L, 5L, 6L. Chapter 6 Bidirectional Multiple Stack Algorithm 10 0 A -, 10 10 128 256 384 640 768 512 1536 2000 N Figure 6.2 Empirical computational distribution for the final decision of the MSA. As indicated by (6,4), it can be seen from the figure that the computational distribution for the final decision can be divided into three different segments. Before the first stack becomes full (i.e., N Zf’), the computational effort follows a Pareto distribution as in the case of a conventional infinite stack size algorithm. After the first stack fills up (i.e., Zj’lsA < N < Gum), the distribution quickly reaches a computational floor, 120 Chapter 6 Bidirectional Multiple Stack Algorithm Finally, after the computational limit (i.e. N> Ciim) is reached, it becomes zero. The behavior of the first and last segments of the final computational distribution are obvious. The behavior of the second segment (i.e., ZSA < N Gum) can be explained as follows. Whenever the first stack fills up, the algorithm transfers T nodes to the second stack. The transferred nodes may either include the correct one, or they can all be incorrect nodes, In the first case, the correct node is amongst the transferred nodes in the secondary stacks. As described before, the decoding process is stopped only if a tentative decision is reached in the first stack, or if the computational limit is reached. Since the current decoding path is only accepted as a tentative decision, the MSA will not stop decoding even after the correct final node is located in a secondary stack. Let’s assume that the correct final node is in the i-th (i>]) stack. The MSA empties the i-th stack and returns to the (i — ])-th stack. Since the correct node is removed from the i-th stack, there is no correct node in any of the stacks. 17 It is therefore unlikely to return to the first stack and reach a tentative decision, In other words, with a high probability, decoding will not stop until the computational limit is reached. Next, assume that all the transferred T nodes are incorrect. Since there is no correct node in the second stack, it will take many computations to find one tentative decision and it is therefore very likely to require higher order stacks, Assume that the tentative decision is made in the i-th (1>]) stack. The MSA empties the i-th stack and returns to the (i — ])-th stack and tries to find another decision. After many computations, if the computational limit is not yet reached, the MSA will finally return to the first stack. However, the MSA can only extend T nodes before the first stack fills up again. This 17 Here, we ignore the fact that an incorrect path may merge with the correct path since this only occurs with a small probability. 121 Chapter 6 Bidirectional Multiple Stack Algorithm implies that there is a very small chance that a tentative decision can be reached within the T extensions in the first stack. If the first stack fills up again, another T nodes will be transferred to the second stack. Hence, if the correct node is not transferred into the second stack, during the first time, it may be transferred into the second stack during the second, third, etc. times. In summary, whenever the first stack fills up, the MSA will not, most of the time, stop the decoding process until the computational limit is reached. 6.2 Bidirectional Multiple Stack Algorithm (BMSA) A bidirectional multiple stack algorithm (BMSA) is constructed in this section, which combines the ideas of BSD and the use of multiple stacks. Any of the BSD algorithms described in Chapter 3 can be used. However, Algorithm TTmerge is chosen due to its ease of implementation and good error and computational performances. The BMSA based on Algorithm TTmerge consists of a number of forward and backward finite-size memory stacks. We assume that the pairs of forward and backward stacks are of the same size, denoted by ZPM. Initially, decoding with the BMSA is exactly the same as with Algorithm Timerge. If decoding is terminated before the two stacks fill up, the decoded sequence is the same as that with Algorithm TTmerge, However, should one of the stacks (FS or BS) fill up, like in the conventional MSA, the T top nodes in that stack are transferred to a secondary stack, and decoding resumes using only the T transferred nodes. Once a merged path is found, 18 that path is taken as a tentative decision, and is stored in a register. The secondary stack is then cleared, and decoding continues on the primary stacks. If, however, one of the stacks fills up For simplicity and ease of implementation, the merging test is only conducted between nodes of top non-empty substacks of the current pair of forward and backward stacks. 122 Chapter 6 Bidirectional Multiple Stack Algorithm before a decision is reached, the T top nodes in that stack are transferred to yet another secondary stack, and decoding resumes once again using only the T transferred nodes. This process is continued, that is secondary stacks are used as needed. Once a tentative decision is found in a given pair of forward and backward stacks, these two stacks are cleared, and decoding resumes on the predecessor stacks in the attempt to refine that tentative decision. Decoding is stopped whenever a decision is reached in the pair of primary stacks or when a predetermined computation limit Cijm is reached. In any event, the best tentative decision is accepted as the decoded path. A detailed description of the BMSA is given below, Let SF denote the current number of the forward stack, and SB the current number of the backward stack. The steps are as follow: Step 0: Place the root nodes of forward and backward trees in the 1st FS and the 1st BS, respectively. Set SF = 5 B = I. Step 1: Compute the metrics of all successors of any node from the highest nonempty substack in the SFth FS, remove this node from the SFth FS, and enter the new nodes in their proper substack in the Spth FS. Step 2: If 1 B > L, check all paths in the highest non-empty substack in the F + 1 SBth BS with all paths in the highest non-empty substack in the SFth FS. if one or more merging paths is found, select the one, including the one already in the tentative register if any, with the highest cumulative metric as a tentative decision and go to step 13. Otherwise, go to the next step. Step 3: If the SFth FS is full, transfer T nodes from the highest non-empty substack to the (SF 19 + 1)th FS,’ 9 and set SF = SF + 1. Go to the next step. If the highest non-empty substack contains less than T nodes, pick the rest from the next inghest non-empty substack, 123 Chapter 6 Bidirectional Multiple Stack Algorithm Step 4: Check if the node to be extended in the highest non-empty substack in the SBth BS is a terminal node of the backward tree. If yes, compare that path with the one in the tentative register, if any, select the better one, and go to the next step, Otherwise, go to step 6. Step 5: If B 5 = 1, stop decoding and take the path in the tentative register as the decoded path. Otherwise, go to step 15. Step 6: If the number of computations is equal to Cjjm, stop decoding and take the path in the tentative register as the decoded path. Otherwise, go the next step. Step 7: Compute the metrics of all successors of any node from the highest nonempty substack in the SBth BS, remove this node from the stack and enter the new nodes in their proper substack in the SBth BS. Step 8: If B F + 1 1 L, check all paths in the highest non-empty substack in the SFth FS with all paths in the highest non-empty substack in the SBth BS. If one or more merging paths is found, select the one, including the one afready in the tentative register, if any, with the highest cumulative metric as a tentative decision and go to step 16. Otherwise, go to the next step. Step 9: If the SBth BS is full, transfer T nodes from the highest non-empty substack to the (SB + J)th BS, and set SB = B + 5 1. Go to the next step. Step 10: Check if the node to be extended in the highest non-empty substack in the SFth FS is a terminal node of the forward tree. If yes, compare that path with the one in the tentative register, if any, select the better one, and go to the next step. Otherwise, go to step 12. 124 Chapter 6 Bidirectional Multiple Stack Algorithm Step 11: If 5 F 1, stop decoding and take the path in the tentative register as the = decoded path. Otherwise, go to step 18. Step 12: If the number of computations is equal to Ciim, stop decoding and take the path in the tentative register as the decoded path. Otherwise, go to step 1, Step 13: If SF = 1 and SB = 1, stop decoding and take the path in the tentative register as the decoded path. Otherwise, go to the next step. Step 14: If SF > 1, empty the SFth FS, set SF SF — 1, and go to the next step. Otherwise, remove the merged node from the 1st FS, and go to the next step. Step 15: If SB > 1, empty the SBth BS, set SB SB = — 1, and go to step 6. Otherwise, remove the merged node from the 1st BS, and go to step 6. Step 16: If SF = 1 and 5 B = 1, stop decoding and take the path in the tentative register as the decoded path. Otherwise, go to the next step. Step 17: If 5 B > 1, empty the SBth BS, set SB = B 5 — 1, and go to the next step. Otherwise, remove the merged node from the 1st BS, and go to the next step. Step 18: If SF> 1, empty the SFth PS, set 5 F = F 5 — 1, and go to step 12. Otherwise, remove the merged node from the 1st PS, and go to step 12. 6.3 Computational Properties of the BMSA Define the total number of computations Cf MSA needed to decode one block by the BMSA as the sum of the computations performed by forward and backward decoders. Let Z 1 = 2ZpM Zf’, and Z = 2ZpM = Z/’ for i=2,3, . As in the ZpMs should be made large enough so that only the MSA, the size of the 1st stack 4 very noisy blocks, which are potential erasures in conventional BSD, use higher order 125 Chapter 6 Bidirectional Multiple Stack Algorithm forward and backward stacks. On the other hand, the size of the secondary stacks ZM9A should be made substantially small for the BMSA to quickly advance through forward and backward trees, and find a reasonably good tentative decision. For T1, at most L pairs of stacks are formed before reaching the first tentative decision. Once a tentative decision is made, the decoder returns to previous stacks to refine the tentative decision. The BMSA is therefore erasure-free if one or more tentative decisions are made before reaching the computational limit Cijm. In the worst case scenario, where forward and backward decoders do not merge before the final forward or backward node is reached, the number of computations needed by the BMSA to guarantee erasure-free decoding would be twice the critical number of computations for the MSA, Ccrii, which is given by [18] L—rn—1 ( Gcrii = i) + 2m. (6.5) Therefore, to ensure erasure-free decoding using the BMSA with T=], we must have Gum > 2 X Gcrit. (6.6) Although the above criteria for selecting Cijm is extremely conservative, it offers an easy and safe design rule. Clearly, the computational properties of the first tentative decision of the BMSA are similar to those of the MSA. Thus, following [18] and using the approximation on the computational distribution of Algorithm TTmerge (see Chapter 4), one can write Property 6.]: The first stack overflow probability for the BMSA, Pj”, is given by the erasure probability of Algorithm TTmerge with primary stack size Z . That is, 1 pSD = P(cf8D > 1 z — 126 i) o(Z 1 A — Pr, 2 i)_ (6,7) Chapter 6 Bidirectional Multiple Stack Algorithm where A 10 is a constant independent of Zi, and Pr is the Pareto exponent. From (6.7) and (6.2), it can be noticed that B9D 1 p is much smaller than p9D for a given Z . In other words, to achieve a given overflow probability, the required Zj with 1 the BMSA is much smaller than that with the MSA, To quantify this reduction in stack ZsA(P Eb/No), and 1 , ZjBM(P 0 , size, let 1 Eb/N denote respectively, the stack sizes ) needed with the MSA and the BMSA to achieve a certain first stack overflow probability Pj. at a given signal to noise ratio Eb I N . The first (primary) stack size saving ratio 0 ° 2 can be expressed by s ‘MSAD 12 IT’.T\ - zPMSA(P,Eb/No)’ (68) which can be simplified to /P 9 (A ) 1 o/Pi) 1 (A 2 — = ( A rp_1/2Pr 1 11 A , 12 P’ (6,9) where A 11 is a constant that depends on the channel condition and the decoder used, which can be determined by simulation. For comparison purpose, from now on, we let A 11 = 1. Figure 6.3 shows a plot of S 3 as a function of P 1 for different values of Pareto exponent Pr. probability As shown in this figure, the smaller the required first stack overflow Pj is, the greater is the saving in Zj. Moreover, the saving in the primary stack size increases as the Pareto exponent (or EbIN ) decreases. For instance, when Pr 0 20 The first stack size saving ratio represents approximately the saving in the total memory size since the first stack is much larger than the secondary stacks. 127 Chapter 6 Bidirectional Multiple Stack Algorithm 350 300 250 200 150 100 50 0 4 io- 102 iü- 1 P Figure 6.3 The first stack saving as a function of Pj. = 1 (E,IN=4.58 dB), and P 1 For the same Pj. S = = 18 for Pr , the primary stack size saving is about 32 times. 3 i0 = 1.2 (Eb/N05.O dB). As for the total number of computations for the final decision of the BMSA, following [18], we can write the following property. 128 Chapter 6 Bidirectional Multiple Stack Algorithm Property 6.2: The final computational distribution of the BMSA is upper-bounded by oN 1 (A , 2 P(CfMSA > N) < 1 0, where the Pareto exponent p < Pr’ and Z 1 = N 1 <Z 1 N 0 lim elsewhere, — (6j ZBMsA. 2 The computational distribution of the final decision for the BMSA reaches a floor beyond a certain threshold like that of the conventional MSA. However, since pD << pSD for a given primary stack size Z , when compared to the MSA, the computational 1 floor is substantially reduced with the BMSA. In order to verify the above properties, computer simulations were performed using 50,000 blocks each of length L = 128. The signal to noise ratio Eb / N 0 was fixed to 4.58 dB, corresponding to a Pareto exponent of Pr non-systematic rate r = I / 2 memory m = 1. The code used is a SABODP, 7 code (see Table 2.3). The stacks in both the MSA and the BMSA were quantized with = 7. Figure 6.4 shows the results of the final computational distributions of the MSA and 1 with parameters T the BMSA for different values of the first stack size Z and Ciim = = 3, Z = 11, 2000. The Pareto approximations are plotted as solid straight lines on the figure. As it can be seen from this figure, the simulation results are in total agreement with the analytical properties discussed above. Figure 6.5 shows the results of the final computational distributions of the MSA and 1 the BMSA for different values of T. The different parameters used are: Z = 6L, Z = 1] and m2000. 11 This figure suggests that the first stack overflow probability and the final C number of computations are essentially independent of parameter T, as it was predicted by analysis (see (6.7) and (6.10)). 129 Chapter 6 Bidirectional Multiple Stack Algorithm 0 10 L _ _ :: 10 MSA // - — 10 - \ I I Zi = 640 Z1=768 0 —1 A Z1=512 — BMSA . -2 Pareto Approximation 10 -3 256 128 640 768 512 384 1536 2000 N Figure 6.4 Empirical computational distribution for the final decision using Z 1 as a parameter. Figure 6.6 shows the results of the final computational distributions of the MSA and the BMSA for various values of Z. The different parameters used are: Cijm and 4500 for Z = 11, 21 and 31, respectively, T = = 2000, 3300 3, and Z = 6L. Clim is increased with 1 Z in order to guarantee erasure-free decoding. This figure shows that parameter Z does not affect either the first stack overflow probability or the final number of computations, 130 Chapter 6 Bidirectional Multiple Stack Algorithm 10 0 10_i A 2 io 10 -3 128 256 384 640 768 512 1536 2000 N Figure 6.5 Empirical computational distribution for the final decision using T as a parameter. as it was predicted by analysis (see (6.7) and (6.10)). According to the above analysis and computer simulation results, the primary stack size Zj is the dominating factor in the final computational effort for both the MSA and the BMSA, while Ciim is mainly determined by Ccrit to guarantee erasure-free decoding. The average number of computations of the proposed BMSA is now examined, 131 Chapter 6 Bidirectional Multiple Stack Algorithm 0 A I 128 256 768 384 1536 2304 4500 N Figure 6.6 Empirical computational distributions for the final decision using Z as a parameter. Following the derivation of the average number of computations of the proposed BSD in Chapter 4, we get the following. Property 6.3: The average number of computations per branch of the BMSA is upper- 132 Chapter 6 Bidirectional Multiple Stack Algorithm bounded by <A + E [CBMSA] 12 { 0 A, (2p-1)L LL [ — 12 ) 1P — 1 (Z — 1 )1_2P] p [1n(Zi—1)—1n(J—1)j, — where the Pareto exponent p < Pr (Oil) 12 is a constant given by and A pBSDIC ‘S 1 + 12 A , p= Zi) hm (6.12) — L Proof Similar to the analysis for the proposed BSD in Chapter 4, the average number of computations per branch can be expressed by C:m —1 E[CM9A] = P(CfM.9A > N), 1 + (6,13) N=L where P(CfM’A > N) is upper-bounded by (6.10). Substituting (6.10) into (6.13), yields 1 Z E[GM5] oN 1 A + 2 <j pBSDf ‘S1im 1 L N=L 10 A = where A 12 = 1 ’° 2 N NL < = f { +1 —1 1 z N+, 12 A (6,14) N=L 1 + PiBSD(Ciim z, ) 1 —z — )/L. Moreover, since 1 Z °dx 2 x L-1 — 1)’ — 1 (Z — i)1_2P] P 2 (6.15) ln(Zi—1)—ln(L—1), — and substituting (6.15) into (6.14) the above property is proved. 133 Q.E.D. Chapter 6 Bidirectional Multiple Stack Algorithm Similarly, for the MSA, the average number of computations per branch can be bounded by E [cM] 13 <A + (p-)L [(L 1 (Z — — — i)1_P], # —l)—ln(L—1)], 1 {ln(Z (6.16) p=l where Al? is a constant and is defined as below pUsDi 13 A 1 + urn — Z 1) (6.17) By comparing (6.11) and (6.16), one can notice that under the same parameters, the average number of computations with the BMSA is smaller than that with the MSA, Under the same Cijm, in order to reduce the average number of computations, the primary 1 should be increased. This indicates a clear trade-off between the required stack size Z memory and the computational performance for both the MSA and the BMSA. However, the BMSA offers a substantial improvement in terms of computational performance and memory requirements compared to the MSA, as it was shown above. Moreover, one can notice from (6.11) and (6.16) that the average number of computations is less sensitive to Ciim for the BMSA as compared to the MSA. Table 6.1 shows some empirical results of the average number of computations for different values of Z . As predicted, in all cases, the use of the BMSA results in a 1 significant reduction of the average number of computations. Table 6.1 Average number of computations as a function of Z 1 SABODP (m = 7), L =128, 5x10 4 runs Eb/No = 4.58 dB, T = 3, Z = 11, Ciim = 2000 1 Z = 512 1 Z = 640 Zj = 768 MSA 2.512 2.347 2.256 BMSA 1.576 1,504 1.478 134 Chapter 6 Bidirectional Multiple Stack Algorithm Table 6.2 shows some empirical results of the average number of computations for different values of T, These results indicate that parameter T is not related with the average number of computations. This can also be seen from the analytical expressions given by (6.11) and (6.16). Table 6.2 Average number of computations as a function of T SABODP (m = 7), L = 128, 5x i0 runs E/NO=4.58dB,Zjt768,Z=ll,Cjj= T1 T=2 T=3 MSA 2.256 2.256 2.256 BMSA 1.475 1.477 1.478 2000 Table 6.3 shows some empirical results on the average number of computations as a function of Cjjm. The results confirm that the average number of computations is less sensitive to Cijm in the case of the BMSA compared to the MSA. Table 6.3 Average number of computations as a function of Cijm SABODP (m = 7), L = 128, 5x runs Eb/No=4.S8dB,Z]=768,T=3 Cjjm = 2000 (Z=11) Ciim 3300 (Z=21) Ciim = 4500 (Z=31) MSA 2,256 2.803 3.310 BMSA 1.478 1.522 1.555 6.4 Error Performance of the BMSA The error performance of the BMSA can be derived in the same way as for the conventional MSA [18]. One can hence write the following property. Property O.4: The bit error probability of the BMSA is upper-bounded by pBMSA <pBSD 135 + (6.18) Chapter 6 Bidirectional Multiple Stack Algorithm where pSD is the bit error probability of BSD using a single pair of infinite size FS and BS. The effect of parameters , T and Z on the bit error probability of the BMSA is 1 Z now provided. 6.4.1 Effect of Parameter Zj Based on the results in Chapter 5, pSD of BSD-merge is asymptotically the same as that of USD. As it was found in the previous section, for the same primary stack size Z , the first stack overflow probability 1 pBSD of the BMSA is much smaller than that of the conventional MSA. For example, as shown in Figure 6.4, for Z 1 BSD 1 p = 768, is around 5 x i0, whereas Pfr’ is only about 5 x 10—2. Hence, by employing the BMSA, the error performance can be substantially improved without increasing the primary stack size Zi. Figure 6.7 shows the bit error probability curves of the MSA and the BMSA 1 for the two values Lb/No as a function of Z 5.55 dB (Pr = = 4.58 dB (Pr 1) and Lb/No 1.5), which are drawn in dotted and dashed lines, respectively. = The parameters used in the decoding are the same as those used for Figure 6.4. As it can be seen from the figure, the error performance of both the MSA and the BMSA improves as Zj increases, up to a certain floor which is determined by the error performance limit of the employed code, that is obtained p8D pBSD pJ. of USD. For reference, we ) of B SD-merge or 9 through computer simulations using Algorithm TTmerge with unlimited sizes of FS and BS. Results are plotted on Figure 6.7 as straight lines. The error floor at Lb/No = 4.58 dB is around 3 x i0, and at Lb/No 136 = 5.55 dB it is around lO, Chapter 6 Bidirectional Multiple Stack Algorithm With the BMSA, the error floor is practically reached with Z 1 Eb/No = 4.58 dB and Eb/No = 640 and Z 1 5.55 dB, respecfively. For Eb/No = 640, we can get an approximation of BSD 1 p = 512 for 4.58 dB and Zj = which is about 7 x i0 from Figure 6.5. To achieve this same value with the MSA, using the Pareto approximation on Figure 6,5, 1 is about 5356, which is 8 times more than that of the BMSA. the required Z 10_I 10 -2 ‘ ‘ ‘4 ‘ %. ‘ \ ‘ ‘4 l:J.:1 10 -3 ‘4 ‘4. —---. -4 ‘4% 44 4‘%4%4 10 -4 MSA E BMSA Error Floor (Pr_ 1) Error Floor r 1.5) - -5 10121 256 384 512 640 768 Zi Figure 6.7 Bit error probability of the BMSA and MSA as a function of Zi. 137 Chapter 6 Bidirectional Multiple Stack Algorithm 6.4.2 Effects of Parameters T and Z As suggested by Chevillat and Costello [18], T and Z have no significant effects on the error performance of the MSA. Similar arguments also hold for the BMSA. Tables 6.4 and 6.5 show the error performance of both the BMSA and the MSA as a function of T and Z with the same parameters used for Figures 6,5 and 6.6, respectively. As expected, the error performance is independent of parameters T and Z. Thus, T and Z can be used to improve memory requirements and computational performance. Table 6.4 Error performance as a function of parameter T. SABODP code m = 7, L = 128 runs 4 5xl0 Eb/No=4.58 dB MSA BMSA T=l T=2 T=3 T=l T=2 T=3 BER l.88e-2 1.86e-2 l.84e-2 3.96e-3 3.89e-3 3.88e-3 #of error blocks 3503 3480 3464 1809 1803 1802 Table 6.5 Error performance as a function of parameter Z. SABODP code m = 7, L = 128 runs 4 5xlO =4.58 dB 0 EbIN MSA BMSA Z=ll Z=2l Z=31 Z=ll Z=21 Z=31 BER 1.84e-2 l.81e-2 l.80e-2 3,88e-3 3.81e-3 3.81e-3 # of error blocks 3464 3457 3454 1802 1796 1795 6.5 Comparison with Viterbi Decoding We now compare the proposed BMSA with the optimum Viterbi algorithm (VA). Figure 6.8 shows the BER of the various algorithms found through computer simulations. The codes and parameters used are as follow: 138 Chapter 6 Bidirectional Multiple Stack Algorithm • A rate 1/2 memory m • A rate 1/2 memory m = 7 optimum free distance code [7] was used with VA. = 7 SABODP code was used with both the MSA and the BMSA. The different parameters used with both the MSA and the BMSA are: L 128, T • = 3, Z = = 11, Z 1 = 640, and Cijm = 2000. Another rate 1/2 memory m = different parameters used are: L 10 SABODP code was used with the BMSA. The = 128, T = 3, Z = 11, Z 1 = 1024, and Gum = 2300. As expected, the BER performance of the BMSA is better than that of the conven tional MSA. For example, for the same code and the range of Eb/No shown in Figure 6.8, the BMSA is about 0.5 dB better than the MSA. The VA using the optimum rate 1/2 code performs better than the BMSA for the same code memory m. However, as the memory m of the code used increases, the error performance of the BMSA improves. For example, with the m = 10 code, the error performance of the BMSA is about 0.4 dB better than that of the VA at a BER = j5, This improvement in BER can be even made larger by using a longer memory code. For the BMSA, the average number of computations per decoded bit is typically less than 2 (see Table 6.1). This is much smaller than that of the VA, which requires 2” computations per bit. However, as pointed out by Ma [21], a node extension is executed much faster with the VA than with a sequential decoder (typically 10 times faster), Nevertheless, the overall decoding speed with the BMSA remains faster than with the VA, More importantly, the decoding complexity with the BMSA is virtually insensitive to the code memory m. Moreover, it was found by Ma [21] that as the code memory m increases, the MSA tends to require less memory for decoding than with the VA. Furthermore, we have shown that the proposed BMSA not only performs better than the MSA, but also requires less 139 C CD CD CD < CD CD d C.D tic _f CD C C- • C C C - $.- tic C C) C CD C.’) tic 2 C C C CD C CD -f C q CD -— ‘ CD tic C-DC C CCD C) tIC CD -f C CD CD -t CD C — z- r-,) C ) C) C tTi Ltt BER C 0 CD C Chapter 7 Conclusions and Suggestions for Further Research In a system using convolutional coding and sequential decoding, data is transmitted in blocks, and each block is terminated by a known tail. A conventional sequential decoder searches the tree from the forward direction, starting from initial encoder state (initial root of the tree), up to the final encoder state (end root of the tree). The main drawback of sequential decoding is the variability of its decoding effort which could cause decoding erasures. In order to alleviate this drawback, efficient bidirectional sequential decoding (BSD) techniques, in which the tree is simultaneously searched from forward and backward directions, were proposed and analyzed in this dissertation. The relationships between backward coding and forward coding have been examined in detail. Good convolutional codes, with memory m ranging from 2 to 25, suitable for bidirectional decoding have been found through computer search. These codes possess the same distance properties from both forward and backward directions. In the proposed BSD, two decoders are used; one is called forward decoder (FD), and is used to search the tree from forward direction; while the other is called backward decoder (BD), and is used for the backward search of the tree. Forward decoding and backward decoding are performed simultaneously, and stop somewhere in the tree. In one class of BSD, which are referred to as BSD-merge, decoding stops whenever FD and BD reach a common encoder state somewhere in the tree. In the other class of BSD, that is BSD-no-merge, no common encoder state is required, and decoding stops when 141 Chapter 7 Conclusions and Suggestions for Further Research FD meets BD. Different BSD algorithms were constructed based on the stack algorithm; Algorithm TAmeet which belongs to the class of B SD-no-merge, Algorithm TAmerge and Algorithm TTmerge which belong to BSD-merge, and finally Algorithm HTTmerge which is a hybrid version of BSD-merge and BSD-no-merge. The computational performance of the proposed BSD algorithms has been analyzed in detail. It was found that the distribution of the total number of computations per decoded block is still Pareto, as that of unidirectional sequential decoding (USD). However, the advantage of the proposed BSD appears as an increase in the Pareto exponent, and hence as a decrease in the computational variability. This results a decrease in the erasure probability. More specifically, it is proved by using the random coding approach that the Pareto exponent of BSD using Algorithm TAmeet is asymptotically twice that of USD. It is conjectured that this also applies to Algorithm TAmerge. On the other hand, it was found that the computational cutoff rate of sequential decoding remains unchanged, but the use of BSD reduces the average number of computations per decoded block. Extensive computer simulations were performed which confirmed our theoretical analysis. Based on the random coding approach, the error performance of BSD-merge is shown to be asymptotically the same as that of USD. Moreover, the bit error probability of Algorithm TAmeet is found to satisfy the random coding bound for block codes. Computer simulations were conducted and found to agree with the analytical findings. Among all the proposed BSD algorithms, Algorithm TAmeet is the simplest one in terms of implementation. However, its error performance is relatively poor compared to that of USD. Both Algorithm TAmerge and Algorithm TTmerge offer good computational 142 Chapter 7 Conclusions and Suggestions for Further Research and error performances. The merging test is more demanding in Algorithm TAmerge than in Algorithm TTmerge, and therefore, Algorithm TTmerge may be preferred in practice. Algorithm HTlmerge is very useful as it can provide a tradeoff between computational effort and error performance. None of the above four schemes can completely eliminate the erasure problem. Therefore, the idea of BSD was combined with the MSA, and an efficient erasure-free bidirectional MSA (BMSA) was constructed, Through analysis and computer simulations, it is shown that the new BMSA offers substantial advantages over the MSA in terms of computational effort, memory requirements and error performance. The BMSA appears as an attractive alternative to the VA where low error probabilities and high decoding speeds are required. Some interesting problems for future research are: 1. The implementation aspects of the proposed BSD algorithms need to be investigated. One interesting systolic search was recently proposed for the stack algorithm [59]. It would be nice to see how and if this idea can be adapted to the BSD algorithms. 2. BSD can be used for the decoding of tree-like block codes. Of course one needs to examine the distance properties of these tree-like block codes from the backward direction. Perhaps good block codes suitable for BSD need to be designed. 3. Recently sequential decoding was successfully applied to trellis codes [58], and good long memory trellis codes were found. Similarly, the proposed BSD algorithms can be applied to trellis codes. Our analysis is general and applies to both convolutional and trellis codes. However, one needs to find good symmetric trellis codes that 143 Chapter 7 Conclusions and Suggestions for Further Research possess good distance properties from both forward and backward directions. Note that the rate 1/2 SBODP or SABODP codes are suitable for Gray mapped QPSK modulation since in this case the Euclidean and Hamming distances are equivalent. 4. It would be of interest to see how well the proposed BSD algorithms perform for resolving intersymbol interference (1ST) channels, compared to conventional sequential decoding [60], 5. Finally, the BSD idea can be combined in conjunction with the buffer looking algorithm which is another erasure-free decoding method proposed recently by Wang [58]. A study of this approach is also of interest. 144 References [1] C. E. Shannon, “A Mathematical Theory of Communication,” Bell Syst. Techn. J., vol. 27, July and Oct. 1948. [2] J. M. Wozencraft and 1. M. Jacobs, Principles of Communication Engineering. New York: Wiley, 1965. [3] W. W. Peterson and E. J. Weldon, Error-Correcting Codes. Cambridge, Mass.: MIT Press, 2nd ed., 1972. [4] F, Jelinek, Probabilistic Information Theory. McGraw-Hill Book Company, 1968. [5] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968. [6] A. J. Viterbi and J. K. Omura, Principles of Digital Communication and Coding. McGraw-Hill Book Company, 1979. [7] V. K. Bhargava, D, Haccoun, R. Matyas, and P. Nuspi, Digital Communication by Satellite. New York: Wiley, 1981, [8] S. Lin and D. J. Costello, Jr., Error Control Coding: Fundamentals and Applications. NJ: Prentice-Hall: Englewood Cliffs, 1983. [9] A. J. Viterbi, “Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm,” IEEE Trans. on Inform. Theory, vol. IT- 13, pp. 260—269, April 1967. [10] J. B. Anderson and S. Mohan, “Sequential Coding Algorithms: A Survey and Cost Analysis,” IEEE Trans. on Comm., vol. COM-32, pp. 169—176, Feb. 1984. [11] J. M. Wozencraft and B. Reiffen, Sequential Decoding. Cambridge, Mass.: MIT Press, 1961. [12] H. L. Yudkin, “Channel State Testing in Information Decoding,” Sc.D. Diss., Dept. Elec. Eng., MJ.T., Cambridge, 1964. [13] R. M. Fano, “A Heuristic Discussion of Probabilistic Decoding,” IEEE Trans. on Inform. Theory, vol. IT-9, pp. 64—73, April 1963. 145 [14] G. D. Forney, Jr. and E. K. Bower, “A High-Speed Sequential Decoder: Prototype Design and Test,” IEEE Trans. Commun. Technol., vol. COM-19, pp. 82 1—835, Oct. 1971. [15] K. S. Zigangirov, “Some Sequential Decoding Procedures,” Probi. Peredach. Inform., vol. 2, no. 4, pp. 13—15, 1966. [16] F. Jelinek, “Fast Sequential Decoding Using a Stack,” JBMJ. Res. Develop., vol. 13, pp. 675—685, Nov. 1969. [17] J, W. Layland, and W. A. Lushbaugh, “A Flexible High-Speed Sequential Decoder for Deep Space Channels,” IEEE Trans. Commun. Technol., vol. COM-19, pp. 813— 820, Oct. 1971. [181 P. R. Chevillat and D. J. Costello, Jr., “A Multiple Stack Algorithm for Erasurefree Decoding of Convolutional Codes,” IEEE Trans. on Comm., vol. COM-25, pp. 1460— 1470, Dec. 1977. [19] D. Haccoun and M. J. Ferguson, “Generalized Stack Algorithms for Decoding Convolutional Codes,” iEEE Trans. on Inform. Theory, vol. IT-21, pp. 638—651, Nov. 1975. [20] P. R. Chevillat, “Fast Sequential Decoding and a New Complete Decoding Algorithm,” Ph.D. Diss., Dept. Elec. Eng., Illinois Institute of Technology, Chicago, IL, May 1976. [21] H. H. Ma, “The Multiple Stack Algorithm Implemented on a Zilog Z-80 Micro computer,” IEEE Trans. on Comm., vol. COM-28, pp. 1876—1882, Nov. 1980, [22] G. D. Forney, Jr., “Final Report on a Coding System Design for Advanced Solar Missions,” Rep. NAS2 -3637, Contract NASA CR73167, NASA Ames Res. Center, Moffett Field, CA, 1967. [23] L. R. Bahi, C. D. Cullum, W. Frazer, and F. Jelinek, “An Efficient Algorithm for Computing Free Distance,” IEEE Trans. on Inform. Theory, vol. IT-18, pp. 437— 439, May 1972. [24] K. J. Larsen, “Comments on ‘An Efficient Algorithm for Computing Free Distance’,” IEEE Trans. on Inform. Theory, vol. IT-19, pp. 577—579, July 1973. 146 [251 M. Rouanne and D. J. Costello, Jr., “An Algorithm for Computing the Distance Spectrum of Trellis Codes,” IEEE J. on Selected Areas in Commun., vol. 7, pp. 929— 940, Aug. 1989, [26] F. Hemmati, “Bidirectional Trellis Decoding,” 1990 IEEE Book of Abstracts of Information Theory Symposium, San Diego, p. 107, Jan. 1990. [27] D. Haccoun and J. Belzile, “Bidirectional Algorithms for the Decoding of Convo lutional Codes,” 1990 IEEE Book of Abstracts of Information Theory Symposium, San Diego, p. 177, Jan. 1990. [28] J. Belzile and D. Haccoun, “Bidirectional Breadth-First Algorithms for the Decoding of Convolutional Codes,” IEEE Trans. on Comm., vol. 41, pp. 370—380, Feb. 1993, [29] K. Li and S. Kallel, “Bidirectional Sequential Decoding Algorithms,” IEEE Symposium on Information Theory, in San Antonio, TX, USA, Jan. 1993. [30] I. M. Jacobs and E. R. Berlekamp, “A Lower Bound to the Distribution of Computation for Sequential Decoding,” IEEE Trans. on Inform. Theory, vol. IT13, pp. 167—174, April 1967. [31] R. Johannesson, “On the Distribution of Computation for Sequential Decoding Using the Stack Algorithm,” IEEE Trans. on Inform. Theory, vol. IT-25, pp. 323—33 1, May 1979. [32] J. E. Savage, “Sequential Decoding the Computation Problem,” Bell Syst. Techn. J., vol. 45, no. 1, pp. 149—175, 1966. — [33] D. D. Falconer, “An Upper Bound on the Distribution of Computation for Sequential Decoding with Rate Above Rcom,” M.i.T.-RLE Quart. Progress Rept., pp. 174—179, April 1966. [34] F. Jelinek, “An Upper Bound on Moments of Sequential Decoding Effort,” IEEE Trans. Inform. Theory, vol. IT-iS, pp. 140—149, January 1969. [35] 0. D. Forney, Jr., “Convolutional Codes. II. Maximum-Likelihood Decoding,” Inform. Contr., vol. 25, pp. 222—266, 1974. [36] 0. D. Forney, Jr., “Convolutional Codes. III. Sequential Decoding,” inform. Contr., vol. 25, pp. 267—297, 1974. 147 [37] P. R. Chevillat and D. J, Costello, Jr., “Distance and Computation in Sequential Decoding,” IEEE Trans. on Commun., vol. COM-24, pp. 440—447, April 1976. [38] P. R. Chevillat and D. J. Costello, Jr., “An Analysis of Sequential Decoding for Specific Time-Invariant Convolutional Codes,” IEEE Trans. on Inform. Theory, vol. IT-24, pp. 443—451, July 1978. [39] K. Li and S. Kallel, “Bidirectional Sequential Decoding for Convolutional Codes,” 1991 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing in Victoria, B.C., Canada, May 1991. [40] J. L. Massey and M. K. Sam, “Inverses of Linear Sequential Circuits,” IEEE Trans. on Comp., vol. C-17, pp. 330—337, April 1968. [41] J. L. Massey, “Reversible Codes,” Inform. and Control, vol. 7, pp. 369—380, 1964. [42] J. P. Robinson, “Reversible Convolutional Codes,” IEEE Trans. on Inform. Theoty, pp. 609—610, July 1968. [43] R. Johannesson, “Robustly-Optimal Rate One-Half Binary Covolutional Codes,” IEEE Trans. on Inform. Theory, vol. IT-21, pp. 464—468, July 1975, [44] R. Johannesson and E. Paaske, “Further Results on Binary Covolutional Codes with an Optimum Distance Profile,” IEEE Trans. on Inform. Theory, vol. IT-24, pp. 264—268, March 1978. [45] M. Cedervall and R. Johannesson, “A Fast Algorithm for Computing Distance Spectrum of Convolutional Codes,” IEEE Trans. on lnform. Theory, vol. 35, pp. 1146—1159, Nov. 1989. [46] D. Haccoun, “A Markov Chain Analysis of the Sequential Decoding Metric,” IEEE Trans. on Inform. Theory, vol. IT-26, pp. 109—113, January 1980. [47] J. B. Anderson and S. Mohan, Source and Channel Coding.’ an Algorithmic Approach. Kluwer Academic Publishers, 1991. [48] J. K. L. Jordan, “The Performance of Sequential Decoding in Conjunction with Efficient Modulation,” IEEE Trans. Commun. Technol., vol. COM-14, pp. 283—297, June 1966. [49] A. Papoulis, Probability, Random Variables, and Stochastic Processes. McGrawHill Book Company, 1965. 148 [50] J. B. Anderson, “Limited Search Trellis Decoding of Convolutional Codes,” IEEE Trans. on Inform. Theory, vol. IT-35, pp. 944—955, Sept. 1989. [51] J. B. Anderson, “Sequential Decoding Based on an Error Criterion,” IEEE Trans. on Inform. Theory, vol. IT-38, pp. 987—1001, May 1992. [52] D. Haccoun, “A Branching Process Analysis of the Average Number of Computa tions of Stack Algorithm,” IEEE Trans. on Inform. Theory, vol. IT-30, pp. 497—508, May 1984. [53] E. Arikan, “An Upper Bound on the Cutoff Rate of Sequential Decoding,” IEEE Trans. on Inform. Theory, vol. IT-34, pp. 55—63, January 1988. [54] F. Jelinek, “Upper Bounds on Sequential Decoding Performance Parameters,” IEEE Trans. Inform. Theory, vol. IT-20, pp. 227—239, March 1974. [55] K. S. Zigangirov, “On the Error Probability of Sequential Decoding on the BSC,” IEEE Trans. inform. Theory, vol. IT-18, pp. 199—202, January 1972. [56] K. S. Zigangirov, “Algorithm of Sequential Decoding in which the Error Probability Increases in Agreement with the Random Coding Bound,” Probi. Peredach. Inform., vol. 4, pp. 83—85, 1968. [57] K. S. Zigangirov, “Procedures of Sequential Decoding,” Coding and Complexity C1SM Courses and Lectures No. 216, Springer-Verlag, Wien, New York, pp. 109— 130, 1975. - [58] F. Q. Wang, “Efficient Sequential Decoding of Trellis Codes,” PhD. Diss., Dept. Elec. Eng., University of Notre Dame, Notre Dame, Indiana, Dec. 1992, [59] C. Y. Chang and K. Yao, “Systolic array architecture for the sequential stack decoding algorithm,” Advanced algorithms and architectures for signal processing, vol. 696, pp. 196—203, 1986. [60] F. Xiong, A. Zerilc and E. Shwedyk, “Sequential Sequence Estimation for Channels with Intersymbol Interference of Finite or Infinite Length,” IEEE Trans. Commun., vol. 38, pp. 795—804, June 1990. [61] B. Hajek, “Hitting-time and Occupation-time Bounds Implied by Drift Analysis with Applications,” Advances in Applied Probability, vol. 14, pp. 502—525, 1982. 149 Appendix A Proof of Theorem 4.4 In the proof of Theorem 4.4, we need a lemma (see Lemma 2.2 of [61] for a more general result). 7> Lemma Ad: Suppose a real-valued random variable V satisfies E[V] IV < M almost surely,for some constants E [eV] In particular, for , = and M. Then for any 0 and 0, > —TI( + M) + eflM, (Ad) (1 /M) in (1 + /M), E [e] <(1 + /M)[1 — ln(1 + /M)] <1. Proof The proof follows from noting that e” (A2) (TI/k! 1 — V + = [e1M — Q.E.D. (1 + TIM)]. Proof of Theorem 4.4: Similar to the proof of the computational lower bound in Theorem 4.3, separate every received block into two almost equal parts, i.e., lev els [1, [(L — m)/2J] for forward tree search operations and [[(L — m)/2j + m, L] for backward tree search. Here, L and m denote any L and m for any code with the assumption of correct decoding, Cf SD > mm j. Also j, CRL_m)/21] where 2 [C_m)/ and C_m)/ l defined in the proof of Theorem 4.3, are independent ran 2 dom variables. Now, let l be an arbitrary integer of multiple n and less than [(L define U [(L — m)rI/2l?lj. For I j — m)/2J . n, and U, define NJ to be the set of nodes in the 150 forward tree at segment j1, that are descendants of the correct end node of segment 1)1. Given that the end node of the segment (j l)l, — (j — is correctly decoded, let cf be the number of distinct nodes in NJ that are hypothesized (extended) by the idealized forward decoder before correctly decoding the segment jl. Then c$. Define a similar quantity, n, to be the number of nodes in NJ that are at least as likely to be the correct nodes at segment j1,. Then, because the codewords are a priori equiprobable and correct decoding is assumed, one can use Lemma 3.1 in [53]21 to show that E[cr] E[n]. (A3) Lemma 3.1 of [53] assumes that the stack algorithm is used. However, its proof remains valid for any sequential decoder for which one can determine which of two nodes at the same level is hypothesized (extended) first, by considering only the received message up to that level. Now let K denote the DMC channel. For every choose a finite l(K, R,6 , 0 Q) Q > 0 and 0 < o 1, one can always sufficiently large to satisfy the condition of Lemma 6.1 in [53], and thus [exp (l(R — )/8) —2] 1 R > 32Q(1 + ‘)n’. (A.4) Therefore, using Lemma 6.1 in [53] into (A.3), we can write E[cf] _E[n2] exp [l(R > — R)/8] l6Ql(1 + + 1, 21 1 <j <U. (A.5) The quantities defined by Arikan in [53j do not count the correct node itself (and thus equal one less than the quantities defined here), but clearly the result still holds 151 Let = 16Ql(1 + E)n’, JI = 1, and M = X where Xj denote the size of the channel input alphabet. Then, E o < < cr < exp (Rl), 0 < exp (Rl,) X. Let = .uo E [c’j Since cr, 1 <j < — < cr 1 and — ILOI < M since exp (Rl), and for distinct codewords, (1/M) In (1 + l/M). By Lemma A.l, E[exp (_ii(cr where 8 = (1 + l/M)[l > — <8, — ln(l + l/M)], and 8 < (A.6) 1. U are identically distributed, independent random valuables, one can use Chemoff bound and get P{(F) o} E[exP (-(cro))] = {E[exp (-(cr - ))] } (A.7) Now choose Lo(K, R,E , 0 Q) so that: (1) L 0 > 4(1 + e )l/n and 1 (A.8) (2) L 0 SupposeL 0 and L L > 2l(l +o)(l —1n2/lnO)/n. (1 + e’)m. Then j 2 P(C_m)/ > uo) p ( Uo) = 1 > 152 cr > Uo) — 1/2. (A.9) In the same way, one can show ) > 1/2, 0 21 > Ui P(C_m.)/ (A.1O) FC Since C[(Llfl)/2j are independent, we have and (BC j(L—mj)/21 E[C’’] = IFC BC E [mm CL(Lm.)/2j, G1(L.m)/2])] /L > /L 0 (1/2)(1/2)U >Q. Q.E.D,(A.11) 153 Appendix B List of Acronyms AID Analogy-to-Digital AWGN Additive White Gaussian Noise BD Backward Decoder BMSA Bidirectional Multiple Stack Algorithm BODP Bidirectional Optimum Distance Profile BS Backward Stack BSC Binary Symmetric Channel BSD Bidirectional Sequential Decoding BSD-merge Bidirectional Sequential Decoding With a Merging Test BSD-no-merge Bidirectional Sequential Decoding Without a Merging Test CDF Column Distance Function DIA Digital-to-Analogy DMC Discrete Memoryless Channel Eb/No Energy Per Bit to Noise Ratio FD Forward Decoder FEC Forward Error Correction FS Forward Stack GCD Greatest Common Divisor ISI Intersymbol Interference 154 modem Modulator/Demodulator MSA Multiple Stack Algorithm ODP Optimum Distance Profile QPSK Quaternary Phase Shift Key SABODP Symmetric Almost Bidirectional Optimum Distance Profile SBODP Symmetric Bidirectional Optimum Distance Profile USD Unidirectional Sequential Decoding VA Viterbi Algorithm 155
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Bidirectional sequential decoding
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Bidirectional sequential decoding Li, Kaiping 1994
pdf
Page Metadata
Item Metadata
Title | Bidirectional sequential decoding |
Creator |
Li, Kaiping |
Date Issued | 1994 |
Description | The main drawback of sequential decoding is the variability of its decoding effort which could cause decoding erasures. We propose and analyze in this dissertation efficient bidirectional sequential decoding (B SD) techniques to alleviate this drawback, In the proposed BSD, two decoders are used; one is called forward decoder (PD), and is used to search the tree from forward direction; while the other is called backward decoder (BD), and is used for the backward search of the tree. Forward decoding and backward decoding are performed simultaneously, and stop somewhere in the tree. In one class of BSD, to which we refer as BSD-merge, decoding stops whenever FD and BD merge at a common encoder state some where in the tree. In the other class of BSD, that is BSD no-merge, no common encoder state is required, and decoding stops when FD meets BD. Different BSD algorithms based on the stack algorithm are constructed; namely Algorithm TAmeet which belongs to the class of BSD-no-merge, Algorithm TAmerge and Algorithm Timerge which belong to BSD-merge, and finally Algorithm HTTmerge which is a hybrid version of BSD-merge and BSD-no-merge. The relationships between backward coding and forward coding are examined in detail. Good convolutional codes, with memory m ranging from 2 to 25, suitable for bidirectional decoding, found through extensive computer search, are provided. These codes possess the same distance properties from both forward and backward directions. It is found by analysis and computer simulations that the distribution of the total number of computations per decoded block of the proposed BSD is still Pareto, as that of unidirectional sequential decoding (USD). However, the advantage of the proposed BSD appears as an increase in the Pareto exponent, and hence as a decrease in the computational variability and erasure probability. More specifically, we prove by using the random coding approach that the Pareto exponent of BSD using Algorithm TAmeet is asymptotically twice that of USD, and conjecture that this also applies to Algorithm TAmerge. On the other hand, it is found that the computational cutoff rate remains unchanged, but the use of our BSD reduces the average number of computations per decoded bit. Using the random coding approach, we show that the error performance of BSD merge is asymptotically the same as that of USD. Moreover, we show that the bit error probability of Algorithm TAmeet satisfies the random coding bound for block codes. Computer simulations are provided to confirm the analytical findings. The use of BSD-merge substantially reduces the computational variability of con ventional sequential decoding without compromising the error performance, However, BSD does not completely eliminate the erasure problem. We therefore combine our BSD idea in conjunction with the multiple stack algorithm (MSA), which is an erasure-free decoding algorithm. It is shown through analysis and computer simulations that the new bidirectional multiple stack algorithm (BMSA) offers substantial advantages over the MSA in terms of computational effort, memory requirements and error performance. The BMSA appears as an attractive alternative to the Viterbi algorithm (VA) where low error probabilities and high decoding speeds are required. |
Extent | 3605157 bytes |
Genre |
Thesis/Dissertation |
Type |
Text |
File Format | application/pdf |
Language | eng |
Date Available | 2009-04-14 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0065074 |
URI | http://hdl.handle.net/2429/7040 |
Degree |
Doctor of Philosophy - PhD |
Program |
Electrical and Computer Engineering |
Affiliation |
Applied Science, Faculty of Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 1994-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- 831-ubc_1994-953578.pdf [ 3.44MB ]
- Metadata
- JSON: 831-1.0065074.json
- JSON-LD: 831-1.0065074-ld.json
- RDF/XML (Pretty): 831-1.0065074-rdf.xml
- RDF/JSON: 831-1.0065074-rdf.json
- Turtle: 831-1.0065074-turtle.txt
- N-Triples: 831-1.0065074-rdf-ntriples.txt
- Original Record: 831-1.0065074-source.json
- Full Text
- 831-1.0065074-fulltext.txt
- Citation
- 831-1.0065074.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0065074/manifest