On the Construction, Dimensionality, and Decoding of Linear Block Code Trellises

by

Alan D. Kot
B.A.Sc. (Hons.), The University of British Columbia, 1981
M.A.Sc., The University of British Columbia, 1987

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES (Department of Electrical Engineering)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
December 1992
© Alan D. Kot, 1992

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Abstract

In principle, the coding gain of a digital communications system can be improved through the use of longer error control codes and soft-decision maximum-likelihood (ML) decoding. Unfortunately, the implementation of such techniques is limited by the computational requirements of the decoder. In this thesis, we consider several aspects of trellis-based ML, and 'near-ML', soft-decision decoding of general linear block codes.

ML decoding is feasible for reasonably compact trellises using the Viterbi algorithm. For general linear block codes we present simple methods for trellis construction that are based on the methods of Wolf and Massey. It is confirmed that the trellises so constructed are minimal, and an improvement of Muder's lower bound on the maximum trellis dimension is found. It is shown that when equivalent codes are constructed by permutations of the symbol positions, the resulting trellis dimensions are fixed near either end, while in the central portion of the trellis the dimensions vary between Wolf's upper bound on the maximum dimension and a new lower bound on the minimum dimension. This lower bound and the improved bound on the maximum trellis dimension are exact for maximum distance separable codes. These bounds indicate that only codes (and their duals) that have a smallest minimum distance min(d_min, d_min^⊥) significantly less than the corresponding Singleton bound can possibly have a compact trellis.

For trellises that are impractically large for decoding by the Viterbi algorithm, we consider near-ML decoding using partial searches of a code trellis or tree. Despite the fact that this approach is suboptimal in terms of the probability of error, it has the potential for an improved trade-off between coding gain and computational effort. The decoding problem faced by a partial trellis search is described in an alternative manner to Massey's variable-length code model. As part of this approach, we specialize Massey's use of 'random-tails' in order to exploit a priori knowledge of the code and information-symbol distribution used.
As a result there are some minor changes to the usual metric, but only in atypical situations, such as the use of unequal a priori information-symbol probabilities, non-linear codes, and non-symmetric, non-binary discrete memoryless channels.

We introduce a simple and efficient (linear time) method for the selection of the best metrics from among a number of contenders in a breadth-first search. This method can provide a sorted set of survivor metrics during M algorithm decoding of binary linear block codes using only M comparisons in the worst case. The application of the method to the decoding of convolutional codes is also discussed.

We present a method for soft-decision decoding of general linear block codes that we refer to as Reconfigurable Trellis (RT) Decoding. RT decoding carries out a partial search on a different and more easily searched trellis (or tree) for the code. The trellis used for decoding is dependent upon the reliability of the received data, so that it is determined 'on-the-fly'. Only a portion of the reconfigured trellis needs to be constructed, as guided by the partial search. While RT decoding requires some computational overhead prior to beginning each search, the search effort itself can be very significantly reduced, so that overall an improvement can be achieved in the trade-off between coding gain and computational effort.

With respect to the performance analysis of suboptimal soft-decision decoding schemes, we extend the uniform error property of ML decoding of linear codes on a binary-input symmetric-output memoryless channel to include the more general case of partial trellis searches. For RT-M decoding of general linear block codes the uniform error property is used together with a novel channel model and a very simple search technique to estimate the increase in the probability of error over that of ML decoding.
This approach can yield accurate results while avoiding the complexity of considering numerous soft-decision output values, and without requiring detailed knowledge of the code structure.

Table of Contents

Abstract
List of Figures    vi
List of Tables    viii
Acknowledgements    ix
Chapter 1  Introduction    1
  1.1 The Decoding Problem    2
  1.2 Hard-Decisions, Soft-Decisions, and the Discarding of Likelihood Information    4
  1.3 Organization of the Thesis    9
Chapter 2  Trellises for Linear Block Codes    10
  2.1 Code Trees and Trellises    12
  2.2 Review of Trellis Construction Methods    14
    Wolf's Trellis Construction Method    14
    Massey's Trellis Construction Method    18
    Forney's Trellis Construction Method    23
  2.3 Minimal Trellises    26
  2.4 Simplified Trellis Construction Methods    27
    Method 1    27
    Method 2    36
    Method 3    39
    Discussion    40
  2.5 Bounds on Trellis Dimensions    41
    Discussion    48
Chapter 3  Trellis Decoding with Pruning    49
  3.1 Tree/Trellis Decoding Algorithms    51
  3.2 A Description of Partial Trellis Searches    52
  3.3 Decoders Constrained by Pruning    56
    A General Decoding Metric and its use in Trellis Decoding    56
    Discussion    64
    Breadth-First Decoding    66
  3.4 Contender Sifting    68
    Breadth-First Contender Sifting for Binary Linear Block Codes    70
    Rate 1/n Binary Convolutional Codes    79
    Discussion    81
Chapter 4  Reconfigurable Trellis Decoding    83
  4.1 Soft-Decision Decoders for General Linear Block Codes    85
  4.2 The RT Decoding Concept    88
  4.3 The Uniform Error Property for Pruning Decoders    97
  4.4 The Pruning Impact    98
  4.5 Approximate Analysis of Pruning Impact for the RT-M Algorithm    100
    The Order Statistic Channel Model    100
    Approximate Pruning Impact of the RT-M Algorithm    107
    Estimated WER of the RT-M Algorithm    112
  4.6 Computational Effort of RT-M Decoding    115
    The Number of Operations for RT-M Decoding    115
    Comparison to ML Decoding    125
    Comparison to M Algorithm Decoding    127
  4.7 Discussion    137
Chapter 5  Conclusion    141
  5.1 Summary of Contributions    141
    Trellis Construction    141
    Trellis Dimensionality    142
    A General Decoding Metric and its use in Trellis Decoding    142
    Contender Sifting    143
    Reconfigurable Trellis Decoding    144
    Analysis of Error Probability of Pruning Decoders    145
  5.2 Suggestions for Further Research    147
Appendix A  Sorting Methods for Contender Sifting    148
  A.1 Modified Insertion-Sort    148
  A.2 Modified Merge-Sort    149
Appendix B  Proof of Theorem 4.1    150
References    153

List of Figures

Figure 1.1   Simplified Diagram of a Block Coded Communication System    3
Figure 1.2   Some Block Decoder Strategies    5
Figure 2.1   Step 1 of Wolf's Trellis Construction Method for a Binary (5,3) Code    17
Figure 2.2   Step 2 of Wolf's Trellis Construction Method for a Binary (5,3) Code    17
Figure 2.3   Trellis obtained using Massey's Trellis Construction Method for a (5,3) Code    20
Figure 2.4   An Alternate Trellis for the (5,3) Code    20
Figure 2.5   Trellis obtained using Forney's Trellis Construction Method for a (5,3) Code    25
Figure 2.6   Reduction of Parity Check Matrix to Standard Form    29
Figure 2.7   Trellis for a (7,4) Hamming Code Generated using Method 1    35
Figure 2.8   Trellis for a (7,4) Hamming Code Generated using Method 2    38
Figure 2.9   Bounding Envelopes of the Trellis Dimensions    47
Figure 3.1   Representation of Pruning Decoder Operation as Acting on Sets of Codewords    55
Figure 3.2   Representation of Pruning Decoder Operation as Acting on Sets of Heads    55
Figure 3.3   Normalized Number of Comparisons for Sifting M from 2M Contenders    71
Figure 3.4   Contender Formation and Sifting    73
Figure 3.5   Contender-Subset Formation and Contender Merge-Sifting    74
Figure 3.6   A Rate 1/2 Convolutional Encoder    79
Figure 4.1   An Example of the M Algorithm Decoding a (7,4) Code    93
Figure 4.2   An Example of the RT-M Algorithm Decoding a (7,4) Code    93
Figure 4.3   Word Error Rate using the M Algorithm, RT-M Algorithm and the Viterbi Algorithm    96
Figure 4.4   A Symmetric Pairing of Outputs from a Binary-Input Symmetric-Output Channel    101
Figure 4.5   A Sequence of Symmetric Transition Probabilities    103
Figure 4.6   Reordered Transition Probabilities    103
Figure 4.7   The Order Statistic Channel Model    103
Figure 4.8   Examples of the Sort-Function for Different Sample Sizes    106
Figure 4.9   Head Likelihood-Distance versus Depth for the RT-M Algorithm with a Systematic Code    111
Figure 4.10  Word Error Rate of the RT-M Algorithm, Simulation and Estimation Results    114
Figure 4.11  Number of Operations vs. Coding Gain: (32,16,8) Code on an AWGN Channel    131
Figure 4.12  Number of Operations vs. Coding Gain: (32,21,6) Code on an AWGN Channel    132
Figure 4.13  Number of Operations vs. Coding Gain: (64,51,6) Code on an AWGN Channel    133
Figure 4.14  Number of Operations vs. Coding Gain: (32,16,8) Code on a Rayleigh Faded Channel    134
Figure 4.15  Number of Operations vs. Coding Gain: (32,21,6) Code on a Rayleigh Faded Channel    135
Figure 4.16  Number of Operations vs. Coding Gain: (64,51,6) Code on a Rayleigh Faded Channel    136

List of Tables

Table 4.1   Upper Bounds on the Number of Metric Operations    124
Table 4.2   Upper Bounds on the Number of Binary-Vector Operations    124
Table 4.3   Metric-Pair Operation Counts for Decoding the Extended Binary Golay (24,12) Code    126

Acknowledgements

I wish to sincerely thank my research supervisor, Professor Cyril Leung, for his continual interest and highly constructive criticism.

I would also like to thank the following people who kindly provided me with preprints of their work: J.B. Anderson, J.T. Coffey and R.M. Goodman, F. Hemmati, and J.L. Massey.

Finally, I am grateful for financial support received from a Natural Sciences and Engineering Research Council (NSERC) Postgraduate Scholarship, a British Columbia Science Council G.R.E.A.T. Scholarship, a British Columbia Advanced Systems Institute Scholarship, a University of British Columbia Supplemental Scholarship, and from NSERC Grant OGP-1731.

Chapter 1
Introduction

"Never discard information prematurely..."
A.J. Viterbi [1]

The reliability and efficiency of digital communications systems can be significantly improved through the use of error control coding techniques. Although error control coding is one of several potential techniques, including various modulation, diversity (time, space, frequency, etc.), and retransmission schemes, these techniques are complementary.
Communications systems can employ some combination of these techniques to trade off reliability against various costs such as bandwidth, delay, transmitted power, and hardware requirements.

In implementations of error control coding schemes, the complexity of the decoder is typically the limiting constraint. However, improvements in decoding algorithms, in concert with the increasing performance and decreasing costs of VLSI circuit technology, facilitate the utilization of more powerful error control codes. The work presented in this thesis focuses on the decoding 'bottleneck', and we consider efficient approaches to attain optimal and near-optimal decoding of general linear block codes. By 'optimal', we mean that a decoder's probability of error is that of a maximum-likelihood (ML) decoder. The motivation in considering 'near-optimal' decoders is to obtain a significant decrease in decoding effort while incurring a negligible or a tolerable increase in probability of error. As well, the motivation for considering general linear block codes is to have a single method accommodate several applications, so that improved economies-of-scale may be realized.

1.1 The Decoding Problem

Since we are interested in general codes, it is useful to consider the results of information theory in guiding us towards useful codes. We recall the fact that improved performance is generally obtained for longer codes. This is a direct result of the channel coding theorem, which in part states that the expected probability of message error is upper bounded by [2]

    P̄_e ≤ e^{-n E_r(R)},    (1.1)

where n is the block length, R is the code rate, and E_r(R) is the random coding exponent. This result applies to the ensemble of all randomly selected codes, thus guaranteeing that codes exist that satisfy the bound of (1.1). For rates R below the capacity of the channel, E_r(R) > 0, and thus a longer block length will improve ensemble code performance.

Unfortunately, the desirable error control power of the longer codes comes mainly at a cost of greater code complexity. This has constrained implementations to moderate block lengths. The objective is then not only to develop efficient decoding algorithms but to explore the trade-off between the decoding effort and the probability of error. Different decoding algorithms can have substantial differences in error probability and in decoding effort when decoding the same code. If a desired error probability is specified, one should seek the combination of code and decoder that minimizes the decoding effort. In the absence of a performance specification, one should seek decoders that are good in the sense of attaining low error probability with low decoding effort, when operating on a specific code.

[Figure 1.1: Simplified Diagram of a Block Coded Communication System. The message m of k q-ary information symbols enters the (n,k) block encoder, which produces a codeword c of n q-ary channel symbols; the modulator maps the codeword to a continuous-time waveform for the channel, and the receiver produces the codeword estimate.]

A simplified diagram of a communication system that employs error control coding is shown in Figure 1.1. The (n, k) block encoder accepts k q-ary information symbols at a time, and forms a codeword c = (c_1, c_2, ..., c_n) which is fed to the modulator. The modulator maps each codeword into a continuous-time waveform, which is transmitted through the channel where it is subject to degradation. The task of the receiver is to estimate from the channel output which codeword was transmitted.
We take as the criterion of optimality the probability of codeword error. Ideally a maximum a posteriori (MAP) estimate of the transmitted codeword is formed, as this minimizes the codeword error probability [3]. A brute force method of achieving this is to compute an a posteriori probability for each codeword, and then select the largest. Since there are q^k codewords this brute force method is usually impractical. To decode more efficiently, a closer examination of the problem is required.

1.2 Hard-Decisions, Soft-Decisions, and the Discarding of Likelihood Information

Figure 1.2 shows a system similar to Figure 1.1, except that four types of decoding schemes are shown. For practical decoding the receiver processing is split into several sub-blocks, as shown, with most of the blocks operating on discrete-time data. [Footnote 1] The purpose of Figure 1.2 is to display the salient differences between the various decoders in terms of where information is discarded, either by quantization or by intentional omission. In Figure 1.2, blocks that quantize or otherwise discard information are shown shaded.

The demodulator delivers vectors of data to be processed by the decoder. The n vectors of demodulator output r_i are denoted as r = (r_1, r_2, ..., r_n). The demodulator output may include channel state information during each symbol interval, which can improve performance. [Footnote 2]

Consider the vector of demodulator output r_i for the i-th symbol. Let the demodulator output be processed to yield symbol likelihoods, which can always be done without losing any relevant information [5]. This maps an r_i into q symbol likelihoods (f(r_i | 0), f(r_i | 1), ..., f(r_i | q − 1)). Each of the four decoder strategies shown in Figure 1.2 is shown to have a likelihood computation block which computes, for each symbol period, q symbol likelihoods. These q likelihoods per symbol period are ideally all passed to the next stage of the decoder. Decoder strategies differ significantly in the manner in which they handle these likelihoods, as discussed below.

[Footnote 1: In practice, some of the blocks may not be explicitly separate as shown here (e.g. the likelihood computation block) since their function may be handled implicitly by the demodulator or decoder.]

[Footnote 2: A common example of channel state information is the estimation of the received signal level on a Rayleigh fading channel. For channel state information we assume that the channel and modulation are of the SOCFES (State Observable Channel with Freely Evolving State) type, described in [4]. The requirements for a SOCFES are that the observable channel states be independent of the modulated symbols and that the probability distribution of the demodulator output for a symbol period depends only on the hypothesized transmitted symbol and the observed channel state.]
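Before turning to the four decoder strategies of Figure 1.2, the brute-force computation mentioned above can be made concrete with a short sketch. The Python fragment below is an illustration only (the code, channel model, and numerical values are assumptions made for this example, not material from the thesis): it computes per-symbol likelihoods for BPSK on an AWGN channel and then scores every one of the q^k codewords, which is exactly the exhaustive approach that becomes impractical for long codes. The generator matrix used is that of the (5,3) code that serves as the running example in Chapter 2.

```python
import itertools

# Illustrative sketch (not from the thesis): brute-force soft-decision ML
# decoding of a small binary (n, k) code over an AWGN channel with BPSK.

G = [[1, 0, 1, 0, 0],   # generator matrix of the (5,3) example code
     [0, 1, 1, 0, 1],
     [0, 0, 0, 1, 1]]
n, k = 5, 3
sigma2 = 0.5            # assumed noise variance

def encode(info):
    """Multiply the information vector by G over GF(2)."""
    return [sum(info[j] * G[j][i] for j in range(k)) % 2 for i in range(n)]

def log_likelihood(r, c):
    """Sum of per-symbol Gaussian log-likelihoods, log f(r | c)."""
    return sum(-(ri - (1.0 - 2.0 * ci)) ** 2 / (2.0 * sigma2)
               for ri, ci in zip(r, c))

def ml_decode(r):
    """Exhaustive ML decoding: score all q^k codewords, keep the best."""
    return max((encode(list(info))
                for info in itertools.product([0, 1], repeat=k)),
               key=lambda c: log_likelihood(r, c))

r = [0.9, -1.1, -0.2, 0.8, -0.7]     # unquantized demodulator outputs
print(ml_decode(r))
```

With q = 2 and k = 3 only eight codewords are scored; for the longer codes that motivate this thesis the q^k term grows hopelessly, which is why the trellis and tree searches of Chapters 2 to 4 matter.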
[Figure 1.2: Some Block Decoder Strategies. The four receivers shown share a likelihood computation block producing q symbol likelihoods per symbol period; they are the unquantized soft-decision decoder, the quantized soft-decision decoder (likelihood quantizer), the errors/erasures decoder (select maximum symbol likelihood, then error-threshold comparison to flag erasures), and the hard-decision decoder (select maximum symbol likelihood only). Blocks with inherent performance losses due to quantization or ignoring some data are shown shaded. Dark shaded lines indicate vector(s) of data transferred per symbol period. The block error probabilities for the unquantized, errors/erasures, and hard-decision decoders satisfy P_UQ ≤ P_EE ≤ P_HD; for typical quantized soft-decision decoders, P_UQ ≤ P_Q ≤ P_EE ≤ P_HD.]

A common simplification that is made to MAP decoding is to assume equally likely codewords, in which case the MAP decoding rule simplifies to a maximum likelihood (ML) decoding rule [3]. Under the assumption that the channel is memoryless [Footnote 3], the ML decoding rule chooses the codeword c that maximizes

    f(r | c) = ∏_{i=1}^{n} f(r_i | c_i)    (1.2)

where each term in the product is the likelihood of the channel output r_i given the hypothesized symbol c_i.

The unquantized soft-decision decoder shown in Figure 1.2 processes all of the exact (unquantized) symbol likelihoods so that it can potentially deliver an ML decision. Let P_UQ denote the probability that the decoder output word is in error.

The quantized soft-decision decoder processes symbol likelihoods after quantization, a process that has an inherently degrading effect on performance. While the effects of quantization can be made negligible, in a practical implementation where the goal is to minimize the number of quantization levels there will be some budgeted loss in performance, so that P_Q > P_UQ. If we view the likelihood quantizer block in Figure 1.2 as being rather versatile, in the sense that in addition to quantization it can perform scaling (i.e. multiplication by a constant), selection of a maximum, and make comparisons against a threshold, then we can view errors/erasures and hard-decision decoding as special cases of the quantized soft-decision decoder. The errors/erasures strategy is formed by having the quantizer select, per each symbol period, the symbol element that has the largest likelihood if the latter is above a certain threshold; otherwise all q symbol likelihoods are set to zero. This action can be separated into first selecting the maximum symbol likelihood, which is then followed by thresholding to 'flag' erasures, as shown in Figure 1.2. In this way we can see that information is discarded at both steps. The probability of error for the errors/erasures decoder is denoted as P_EE. Typically the errors/erasures strategy discards more likelihood information than the quantized soft-decision decoder, so that P_EE is typically greater than P_Q.

[Footnote 3: Or, that the channel can be made memoryless by incorporating channel state information (see previous footnote).]
(Note however that in cases where one uses a very poor quantizer, the errors/erasures decoder can outperform the quantized soft-decision decoder.) Finally, the hard-decision strategy is formed by having the quantizer select, per each symbol period, the symbol that has the largest likelihood and normalizing this likelihood to some standard value, then setting all other symbol element likelihoods to zero. This can be viewed as tossing out the erasure information of the errors/erasures decoder, as shown in Figure 1.2, so that P_HD > P_EE.

Comparing the block error probability of the four decoding strategies shown in Figure 1.2 we have P_UQ < P_EE < P_HD, and typically P_UQ < P_Q < P_EE < P_HD. The decrease in performance is concomitant with less likelihood information being passed to the final decoding block. It is hoped that the decrease in performance arising from discarding likelihood information prior to the final decoding block will be offset if there are significant savings in computational effort. The approach taken in this thesis, however, is to attempt to achieve substantial computational savings without discarding significant likelihood information prior to the final decoding block. We take the approach that the soft-decision decoder itself should be the judge of when to discard likelihood information. The motivation for this is twofold. First, retaining all of the likelihood information for use by the soft-decision decoder can avoid an irrecoverable loss of coding gain. Second, the soft-decision decoder may be able to exploit the soft-decisions to simplify its decoding task.

The loss of coding gain incurred by hard-decision decoding on the additive white Gaussian noise (AWGN) channel with binary antipodal signaling is approximately 2 dB, for the ensemble of randomly constructed block codes [Footnote 4] [3]. If we consider specific codes instead of randomly constructed codes there are some other results available. For a code with guaranteed hard-decision error correcting capability of

    t = ⌊(d_min − 1)/2⌋    (1.3)

Chase [6] has shown that as the SNR is increased, the use of ML soft-decision decoding will effectively increase the error correcting capability to d_min − 1. This corresponds to an approximate doubling of the error correcting power of the code given by (1.3). In terms of dB gain, the asymptotic improvement is 3 dB. At practical signal to noise ratios the gain obtained is closer to 2 dB.

More significant gains can be realized using soft-decision decoding on some other channels. Of considerable importance is an AWGN channel with Rayleigh faded signal amplitude and uniformly distributed phase. Such a channel arises in mobile radio propagation at VHF/UHF frequencies in urban areas [7]. Rayleigh amplitude fading was also considered by Chase, with binary antipodal signaling. It was shown that the gain can be significantly larger than 3 dB, and that the gain will increase further with SNR, since ML decoding results in a change in asymptotic behaviour from a (t + 1)-th order to a d_min-th order diversity system.

[Footnote 4: As the SNR is decreased, the loss can be shown to approach 10 log10(π/2) ≈ 1.96 dB.]
Simulation results [6] for a (24,12) Golay code show a gain of approximately 6 dB relative to hard-decision decoding, at a bit error rate (BER) of 10^-4.

While we have reviewed that the fundamental motivation for using soft-decision decoding is that it can potentially exploit all of the available likelihood information, and in doing so attain the theoretical minimum probability of error, we have not yet suggested how this might be accomplished. The approach taken in this thesis is to employ tree and trellis representations of general linear block codes, which enable the decoder to easily utilize soft-decision metrics. Soft-decision decoding of convolutional codes has long exploited trees and trellises. ML decoding of linear block codes has also exploited trellises (e.g. [8]) using the Viterbi algorithm (VA), provided the trellis is reasonably compact. It is therefore of interest to know how to compute (and/or bound) the trellis dimensions. As well, if the trellis is so large that a full search using the VA is impractical, a partial tree or trellis search could be considered. In this thesis we consider several aspects of general linear block code trellises, relating to their construction, dimensionality, and decoding via both complete and partial searches.

1.3 Organization of the Thesis

In Chapter 2 we discuss trellis representations of general linear block codes. Improved methods for constructing complete or partial trellises are presented, and new bounds on the trellis dimensions are found.

In Chapter 3 we consider the case where a full trellis search is impractical and investigate some aspects of partial trellis searches. We find an appropriate metric for discarding paths. We also present a new method to efficiently sort path metrics.

In Chapter 4 we present a class of decoding techniques that exploit soft-decisions, not only for the inherent coding gain, but to reduce the decoding effort. In doing so the techniques completely restructure the tree or trellis search.

In Chapter 5 we conclude with a summary of contributions made and suggest some topics for further research.

Chapter 2
Trellises for Linear Block Codes

"The trellis is the foundation for channel coding in the sense that the trellis associated with the code is the primary determiner of the necessary effort for ML decoding."
J.L. Massey [9]

Trellis representations of convolutional codes appear extensively in the literature. Originally, Forney pointed out that a convolutional code state diagram can be represented by a "trellis" [10][11]. With such a representation, ML decoding could then be viewed as a shortest-path problem, and it became clear that the Viterbi algorithm (VA) is optimal, not only asymptotically optimal as discussed in [12]. [Footnote 5] The powerful and versatile combination of the VA with convolutional code type trellises has since been successfully applied in many communications applications, including maximum-likelihood sequence estimation in intersymbol interference and the decoding of combined coding and modulation schemes [10]-[14].

[Footnote 5: The optimality of the VA was also shown by Omura in [13].]

In contrast, trellis representations of block codes have been discussed much less frequently, with only a few papers appearing within twenty years of the introduction of the convolutional code trellis. Of these, the introduction of linear block code trellises appears in [15][8][9].
Recently, in [16] and [17], trellis construction and the trellis state-space dimensions were re-examined. In this chapter, we continue this examination of trellis construction and dimensionality for general linear block codes.

In Section 2.1 we review some basic definitions related to trellises and their dimensions.

In Section 2.2 we review three methods for trellis construction and dimensional computations introduced by Wolf [8], Massey [9], and Forney [16]. It is shown that Massey's and Wolf's trellises are isomorphic [Footnote 6] and the isomorphism is derived.

In Section 2.3 we discuss minimal trellises (trellises with a minimal number of states) and we utilize some of Muder's analysis in [17] to show that trellises generated using Wolf's or Massey's method yield minimal trellises. This complements Muder's result that Forney's method yields a minimal trellis [17].

In Section 2.4 three simplified trellis construction methods are presented for constructing minimal trellises of general linear block codes. Also, a method for computing the trellis dimensions is derived that is an alternative to the methods of [9] and [16].

In Section 2.5, we improve on a bound on the trellis dimensions due to Muder [17] and discuss some other properties of the trellis dimensions. In particular, it is shown that when equivalent codes are constructed by permutations of the symbol positions the resulting trellis dimensions are fixed near either end, while in the central portion of the trellis the dimensions vary between an attainable upper bound and a lower bound.

[Footnote 6: A trellis is isomorphic to some other trellis if it differs only in the labeling of states at each depth.]

2.1 Code Trees and Trellises

In the communications literature, a tree or a trellis is commonly described as a graph that represents each codeword of a code by a distinct path, with all paths originating at a single root node. Each path is composed of branches, and a trellis differs from a tree in that its branches merge as the trellis is traversed. Each branch may be assigned a number of codeword symbols. For convolutional codes each branch usually represents the n channel symbols output during each shift of k information bits into a rate R = k/n convolutional encoder. For block codes we usually associate one channel symbol per branch although this need not be the case [9][16][17]. Punctured convolutional [18][19] or block [20] codes are other examples where the resultant tree or trellis may have a non-constant number of channel symbols per branch.

We now describe trellises more formally. Most of this discussion is a summary of some definitions from Muder [17], with some alteration of notation.

Consider an edge-labeled directed graph T = (V, A, E) consisting of a set of vertices V and edges E, with each edge labeled by elements from an alphabet A. If V can be partitioned into a union of disjoint subsets

    V = V_0 ∪ V_1 ∪ ... ∪ V_n,    (2.1)

such that every edge that begins at a vertex in V_l ends at a vertex in V_{l+1}, then T is called a trellis of length n. The vertices V_l of a trellis are commonly called the states (or nodes) at depth (or level) l, and the edges are commonly called branches.

A codeword c ∈ C is usually represented in T by a path consisting of n branches labeled by c_1, c_2, ..., c_n from V_0 to V_n. As mentioned earlier, one may group branches to form multi-symbol branches.

An example of a trellis, under Muder's definition, is the usual code tree for C.
We will denote the code tree for C as T̃. The tree T̃ is an example of a proper trellis, where a trellis is proper if V_0 has a single state (the root), every state belongs to some length-n path, and no two branches have the same initial state and label. For linear codes it is sufficient to limit our discussion to proper trellises [17].

The set of states at depth l in T̃ corresponds to the set of heads of length l. We introduce the term head to refer to a length-l initial portion of a codeword, i.e. if c = (c_1, c_2, ..., c_l, ..., c_n) ∈ C then a head is c_h = (c_1, c_2, ..., c_l). We use Muder's terminology of referring to the tail as the length n − l final portion of a codeword, denoted c_t, so that when given a head c_h, if

    (c_h, c_t) ∈ C    (2.2)

then c_t is called a tail of c_h.

The number of states at each depth of the trellis is of interest, since this largely determines the ML decoding effort of both convolutional and block codes [9]. As in [17], let S_l denote the number of states at depth l of a trellis of C, and let S = max_l S_l, s_l = log_q S_l, and s = log_q S. We refer to s_l as the dimension of the trellis at depth l, and s simply as the maximum trellis dimension. [Footnote 7]

[Footnote 7: We note that in [17] these terms refer to minimal trellises, which will be discussed in Section 2.3. Our present use of these terms to refer to general trellises will not cause confusion, since at present we need not distinguish between minimal and non-minimal trellises. Moreover, it will become apparent in Sections 2.3 and 2.4 that all trellises considered in this thesis are in fact minimal.]

2.2 Review of Trellis Construction Methods

Trellis representations of convolutional codes arise naturally from the fact that the convolutional encoder is a finite state machine. To construct a convolutional code trellis one can assign trellis states to be a vector corresponding to the contents of the encoder shift register [10][11], and the branches are labeled with the corresponding encoded symbols.

Constructing trellises for general linear block codes is less obvious than for the case of convolutional codes. In this section we review trellis constructions for general linear block codes due to Wolf [8], Massey [9], and Forney [16], with emphasis on the first two methods since they form the basis for the simplified trellis construction methods discussed in Section 2.4.

2.2.1 Wolf's Trellis Construction Method

Trellis representations of linear block codes appear to have been first introduced by Bahl et al. [15], who showed that linear block codes may be represented by a trellis and examined a decoding algorithm that minimizes the symbol error rate. Wolf [8] later discussed trellis construction in more detail and emphasized that ML block decoding can be achieved using the Viterbi algorithm.

Wolf's trellis construction method for a general linear block code is based on the parity check matrix. [Footnote 8] We show how the trellis for a code can be generated by first describing how any particular codeword is represented. Let H denote the code's parity check matrix and let h_i denote the i-th column vector of H. The generator matrix G corresponding to H contains rows that span a k-dimensional subspace of GF(q)^n, and this subspace corresponds to the code.

[Footnote 8: Wolf also discussed that a trellis for a cyclic block code can be generated in a manner similar to that for a convolutional code, in that one can take as the state the contents of the encoder shift register.]
Recall that the row space of H is an orthogonal complement to the row space of G, so that

    H G^T = 0.    (2.3)

Equivalently, any codeword c ∈ C satisfies

    H c^T = 0.    (2.4)

The above expression can also be written as

    ∑_{i=1}^{n} h_i c_i = 0    (2.5)

where c_i is the i-th symbol in c. Equations (2.4) and (2.5) state that the syndrome of a codeword is zero.

Consider the sequence defined by

    σ_0 = 0,  σ_1 = σ_0 + h_1 c_1,  σ_2 = σ_1 + h_2 c_2,  ...,  σ_n = σ_{n-1} + h_n c_n    (2.6)

and observe that σ_n is a recursive expression describing the formation of the syndrome. A path through a trellis for a particular codeword can be traced as follows. Let the root node of the trellis correspond to σ_0 = 0, the all-zero (n − k)-tuple. Let states at successive depths in the trellis be given by σ_1, σ_2, ..., σ_n. Thus the beginning and final states will be σ_0 = 0 = σ_n. In principle, one can think of the trellis as being generated by considering the paths traced by all codewords. Each sequence of states in a path has a one-to-one correspondence with a valid codeword, since at least one of the symbols in a codeword differentiates it from all other codewords, resulting in at least one of the states in the sequence (2.6) for each codeword being different.

The formation of a state can be more concisely described using a head c_h of length l and the matrix H_h, which consists of the l initial columns of H. A state at depth l can then be written as

    σ_l = H_h c_h^T.    (2.7)

The trellis may be formed by tracing the sequence of all states, generated by all heads c_h in (2.7), for each subsequent depth. Wolf's trellis construction consists of two steps. In the first step all heads are used for all depths in (2.7), including those heads that do not belong to the code. This adding together of all possible symbol-weighted combinations of the columns of H yields a state-space diagram, called the unexpurgated trellis, which represents all q^n uncoded sequences. In the second step Wolf then expurgates all paths that do not lead to the zero state at depth n, which then leaves a trellis representing the code.

As a simple example consider the binary (5,3) code with parity check matrix given by

    H = [ 1 1 1 0 0 ]
        [ 0 1 0 1 1 ]    (2.8)

The unexpurgated trellis is shown in Figure 2.1, and the final code trellis is shown in Figure 2.2.

With respect to the trellis dimensions, in [15][8] it was noted that since the state vector has length n − k, the number of states at any depth is at most q^{n−k}. In other words, the maximum trellis dimension s is upper bounded by n − k. In the example, the maximum trellis dimension meets the upper bound of n − k = 2.

[Figure 2.1: Step 1 of Wolf's Trellis Construction Method for a Binary (5,3) Code. In this step the unexpurgated trellis is generated. A horizontal branch corresponds to a '0' channel bit.]

[Figure 2.2: Step 2 of Wolf's Trellis Construction Method for a Binary (5,3) Code. The code trellis is obtained from the unexpurgated trellis by removing all branches that do not lead to the all-zero state at depth n.]
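The two-step procedure is easy to trace by machine. The following sketch (an illustration written for this summary, not code from the thesis) evaluates the partial-syndrome states of (2.7) for every binary head of the (5,3) example, and then performs the expurgation by keeping only heads of words whose full syndrome is zero, reproducing the state sets of Figures 2.1 and 2.2.

```python
# Illustrative sketch (not from the thesis): Wolf's two-step construction
# for the binary (5,3) example code, with states as partial syndromes (2.7).
from itertools import product

H = [[1, 1, 1, 0, 0],
     [0, 1, 0, 1, 1]]          # parity check matrix (2.8)
n = len(H[0])

def state(head):
    """Partial syndrome sigma_l = H_h c_h^T for a head of length l."""
    return tuple(sum(H[r][i] * ci for i, ci in enumerate(head)) % 2
                 for r in range(len(H)))

# Step 1: unexpurgated trellis -- states reached by *all* q^l heads.
unexpurgated = [{state(head) for head in product([0, 1], repeat=l)}
                for l in range(n + 1)]

# Step 2: expurgation -- keep only heads that belong to words with zero
# syndrome at depth n, i.e. heads of codewords.
codewords = [c for c in product([0, 1], repeat=n) if state(c) == (0, 0)]
expurgated = [{state(c[:l]) for c in codewords} for l in range(n + 1)]

for l in range(n + 1):
    print(l, sorted(unexpurgated[l]), sorted(expurgated[l]))
```

Printing the set sizes gives 1, 2, 4, 2, 2, 1 states at depths 0 to 5 for the expurgated trellis, i.e. the dimensions 0, 1, 2, 1, 1, 0 quoted later for this code, while the Step 1 unexpurgated trellis saturates at the full q^{n−k} = 4 states.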
2.2.2 Massey's Trellis Construction Method

In [9] Massey discusses a trellis construction that utilizes the code's generator matrix. We will demonstrate the trellis construction method using the code of the previous example, which is also used in [9]. A generator matrix corresponding to (2.8) is

    G = [ 1 0 1 0 0 ]
        [ 0 1 1 0 1 ]
        [ 0 0 0 1 1 ]    (2.9)

where we have labeled the information bit positions 1, 2 and 4 as i_1, i_2, i_3 and the parity bit positions 3 and 5 as p_1 and p_2, respectively.

We may concisely summarize Massey's state assignment method using our earlier definitions of the head and tail of a codeword. A state is assigned to be the vector of parity bits in the tail, as determined by the information bits in the head. Hence, a state assignment at each depth of the trellis can be written as a matrix equation using the appropriate submatrices of G. Let us assume that G has been reduced to row echelon form, so that all non-pivot elements in the information positions [Footnote 9] have been eliminated. Let P denote a matrix formed from G by extracting all the columns of G that correspond to the parity symbols. For our example,

    P = [ 1 0 ]
        [ 1 1 ]
        [ 0 1 ]    (2.10)

Let P_tj denote the submatrix obtained from P by retaining only the first j rows, and then of these, retaining only those columns that correspond to parity bits that are in the tail. Massey's state assignment method for a head of length l can then be expressed as

    σ_l = i_h P_tj    (2.11)

where i_h is the vector of information bits in the head, with j indicating the number of information bits in i_h. For our example, at depth 0 we have i_h = 0, j = 0, and so σ_0 = 0. At depth 1, i_h = (i_1), j = 1, and P_t1 = (1 0), so that states are assigned using

    σ_1 = (i_1)(1 0).    (2.12)

Similarly, at depth 2 we have i_h = (i_1 i_2), j = 2, and

    σ_2 = (i_1 i_2) [ 1 0 ]
                    [ 1 1 ].    (2.13)

At depth 3, i_h = (i_1 i_2), j = 2, and a single parity bit remains in the tail, so that

    σ_3 = (i_1 i_2) [ 0 ]
                    [ 1 ].    (2.14)

Notice that σ_3 is a state vector of length 1, while σ_1 and σ_2 were of length 2. For convenience in drawing the trellis, we can append leading zeros to any state to form a constant-length state vector of length n − k. At depth 4 we have

    σ_4 = (i_1 i_2 i_3) [ 0 ]
                        [ 1 ]
                        [ 1 ].    (2.15)

Finally, at depth 5, σ_5 = 0. Figure 2.3 shows the trellis formed using the above equations, with leading zeros appended to state vectors of length less than n − k. Note that while the state assignment equations give row vectors, we will transpose them and use column vectors in indicating the states on trellis diagrams.

Massey gives a more compact trellis than that of Figure 2.3, which is shown in Figure 2.4. The more compact trellis can be found from Figure 2.3 by grouping the symbols at depths 2 and 3. (In the example in [9], Massey finds the trellis of Figure 2.4 by first grouping the symbols and then forming state equations similar to (2.12)-(2.15).) We will concentrate on single-symbol-per-branch trellises (as in [17]) since this allows us to easily compare different trellises for the same code.

[Footnote 9: Throughout this thesis, 'the' information positions are those independently specified positions found by reducing the generator matrix from left to right.]

[Figure 2.3: Trellis obtained using Massey's Trellis Construction Method for a (5,3) Code]

[Figure 2.4: An Alternate Trellis for the (5,3) Code. This version is identical to Figure 2.3 except for a grouping of two symbols per branch for the transition from depth 1 to 2.]
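The state equations (2.12)-(2.15) are mechanical enough to script. The sketch below is an illustration written under the notation of this section (the helper names are mine, not Massey's): it builds P from the reduced generator matrix and evaluates σ_l = i_h P_tj for every head of the (5,3) code, so the state sets of Figure 2.3 can be listed directly.

```python
# Illustrative sketch (not from the thesis): Massey's state assignment
# sigma_l = i_h * P_tj for the (5,3) example code.
from itertools import product

G = [[1, 0, 1, 0, 0],
     [0, 1, 1, 0, 1],
     [0, 0, 0, 1, 1]]          # reduced generator matrix (2.9)
info_pos = [0, 1, 3]           # information positions i1, i2, i3 (0-based)
parity_pos = [2, 4]            # parity positions p1, p2 (0-based)
P = [[G[r][c] for c in parity_pos] for r in range(len(G))]   # as in (2.10)

def massey_state(info_bits, depth):
    """Parity bits in the tail, as determined by info bits in the head."""
    j = sum(1 for p in info_pos if p < depth)        # info bits in the head
    tail_cols = [c for c, p in enumerate(parity_pos) if p >= depth]
    return tuple(sum(info_bits[r] * P[r][c] for r in range(j)) % 2
                 for c in tail_cols)

n = len(G[0])
for depth in range(n + 1):
    states = {massey_state(bits, depth) for bits in product([0, 1], repeat=3)}
    print(depth, sorted(states))
```

The distinct states per depth come out as 1, 2, 4, 2, 2, 1, matching the dimensions listed in (2.25) below and, up to the relabeling derived next, the partial-syndrome states of Wolf's construction.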
Comparison of the example of Massey's trellis in Figure 2.3 with the example of Wolf's trellis in Figure 2.2 reveals that they are identical. To explain why this is so, first note that in Massey's construction we began by reducing G to row echelon form to completely eliminate all non-pivot elements in the information positions. The remaining columns specify the generation of the parity symbols. Form an H matrix corresponding to G. [Footnote 10] In our example the generator matrix of (2.9) will give the corresponding parity check matrix of (2.8). Now, recall that in Wolf's trellis construction method the state assignment equation (2.7) is

    σ_l^(W) = H_h c_h^T    (2.16)

where here we have used the superscript (W) to indicate that this is Wolf's state assignment. Since we require σ_n = 0 we must have

    σ_l^(W) + H_t c_t^T = 0    (2.17)

where H_t is the tail portion of H (i.e. the last n − l columns of H). Note that there will exist a codeword tail, descending from this state, that has zeros in its information symbol positions. Let c'_t denote such a tail. Then, the state σ_l^(W) in (2.16) will also satisfy

    σ_l^(W) + H_t c'_t^T = 0.    (2.18)

The second term in (2.18) is precisely the vector of parity symbols in the tail corresponding to information symbols in the head. [Footnote 11] Consequently, we can write

    H_t c'_t^T = σ_l^(M)T    (2.19)

where the (M)T superscript of σ_l indicates that the right hand side of (2.19) is equivalent to Massey's state assignment, transposed (and extended with zeros so that it has length n − k). Hence, using (2.19) in (2.18) we have that

    σ_l^(W) + σ_l^(M)T = 0    (2.20)

or

    σ_l^(M)T = −σ_l^(W).    (2.21)

For the binary case this simplifies to

    σ_l^(M)T = σ_l^(W)    (2.22)

and the state assignments of Massey's and Wolf's methods are equal (except for the transpose and the extension of Massey's state vector with zeros). For the general nonbinary case the isomorphism between Massey's and Wolf's state assignments is given by (2.21).

[Footnote 10: This can be done as follows. First, since G has been reduced to row echelon form, and has rank k, we can extract k columns that will form an identity matrix I_k. This leaves n − k columns that form the matrix P. From the usual discussion of systematic linear block codes, a matrix of the form [I_k | P] is orthogonal to [−P^T | I_{n−k}]. We can form a parity check matrix for any non-systematic code by first extracting the parity columns from G to form P, then forming −P^T and I_{n−k}, and finally forming H by placing the n − k columns of I_{n−k} into the parity positions (in order), and similarly placing the k columns of −P^T into the information positions.]

[Footnote 11: To see that this is so, first consider that c'_t has nonzero symbols only in the parity positions of the tail, and these symbols are determined by the information symbols in the head. Next, consider that H_t has (by construction) columns in the parity positions that form an identity matrix, so that the product H_t c'_t^T will simply extract out the parity symbols of c'_t.]
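For the binary example, the equality (2.22) can be checked mechanically. The sketch below is an illustration only (not thesis code): it computes Wolf's partial-syndrome state and Massey's parity-bits-in-the-tail state, padded with leading zeros to length n − k as described above, for every codeword head of the (5,3) code and asserts that they coincide at every depth.

```python
# Illustrative sketch (not from the thesis): numerical check of (2.22) for
# the binary (5,3) example code.
from itertools import product

G = [[1, 0, 1, 0, 0], [0, 1, 1, 0, 1], [0, 0, 0, 1, 1]]
H = [[1, 1, 1, 0, 0], [0, 1, 0, 1, 1]]
info_pos, parity_pos = [0, 1, 3], [2, 4]

def wolf_state(head):
    return tuple(sum(H[r][i] * ci for i, ci in enumerate(head)) % 2
                 for r in range(2))

def massey_state(info, depth):
    """Massey's state, padded with zeros to length n - k."""
    j = sum(p < depth for p in info_pos)
    sig = [0, 0]
    for r in range(j):
        for c, p in enumerate(parity_pos):
            if p >= depth:                     # parity bit still in the tail
                sig[c] = (sig[c] + info[r] * G[r][p]) % 2
    return tuple(sig)

for info in product([0, 1], repeat=3):
    cw = [sum(info[j] * G[j][i] for j in range(3)) % 2 for i in range(5)]
    for depth in range(6):
        assert wolf_state(cw[:depth]) == massey_state(info, depth)
print("Wolf and Massey state assignments agree, as in (2.22).")
```

The assertion passes for all eight codewords and all six depths, which is the content of (2.22) for this code.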
With regard to the trellis dimensions, it is discussed in [9] that the maximum number of states at any depth of the trellis will be

    S = q^{max_j [rank(P_tj)]}.    (2.23)

For example, from (2.12)-(2.15), the largest-rank matrix occurs for depth 2, with

    P_t2 = [ 1 0 ]
           [ 1 1 ]    (2.24)

which has rank 2, indicating that the maximum number of states is 2^2 = 4. Also, since the dimension of the trellis at each depth is the rank of the corresponding P_tj, we have for our example the following dimensions:

    depth   0  1  2  3  4  5
    s_l     0  1  2  1  1  0    (2.25)

In closing, we emphasize that the main difference between the methods of Massey and Wolf is that Massey directly calculates the allowed states, instead of generating a trellis representing all uncoded sequences and then expurgating tails that do not reach the toor. [Footnote 12]

[Footnote 12: In [9] the state σ_n = 0 is called the toor, as it is the root node of the trellis viewed from the reverse direction.]

2.2.3 Forney's Trellis Construction Method

In [16] Forney's construction of linear block code trellises utilizes a trellis oriented generator matrix. We briefly review his approach below, drawing upon Forney's terminology but with some alterations.

Consider the generator matrix G for an (n, k) linear block code C, and an associated trellis T. The states at depths 0, 1, ..., n will be indexed by l. [Footnote 13] Choose some depth l at which we divide the code's symbols into the past and the future. By the past we mean symbols before a node at depth l, while the future corresponds to symbols after depth l. [Footnote 14] Consider the subcode C_p consisting of all the codewords that are all zero in the future. Now, in [16] the span of a vector refers to the range of coordinates between the first and last nonzero coordinates, inclusive. Hence, the span of C_p is in the past. This past subcode C_p may be specified by a number K_p of generator matrix rows (generators). Similarly, we define the subcode C_f consisting of all the codewords that are all zero in the past, i.e. their span is in the future, and may be specified by K_f generators. Now, the codewords of C can be specified by k generators, of which we can account for K_p + K_f using the past and future subcode generators. This leaves a need for K_s = k − K_p − K_f generators which necessarily span parts of both the past and future. [Footnote 15] We will refer to this code as the spanning code, C_s.

One may calculate the trellis dimensions by first reducing G to exhibit generators for C_f and C_p, then counting K_f and K_p at each depth and using K_s = k − K_f − K_p. Alternatively, after reducing G one can find K_s by inspecting the matrix to count the number of generators belonging to C_s at the depth. In other words, K_s will be the number of generators that cover the depth.

[Footnote 13: We have used the term depth, instead of position as used in [16], to refer to the indexing of nodes through the trellis in order to avoid confusion with our use of position to index the n codeword coordinates.]

[Footnote 14: Forney's terms past and future correspond to the terms head and tail, respectively.]

[Footnote 15: The dimension K_s is Forney's notation for the state-space dimension at some depth, where elsewhere we have used s_l to denote the state-space dimension at depth l.]
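This bookkeeping is easy to automate. The following sketch (an illustration only; the helper names are mine, not Forney's or the thesis') computes K_p, K_f and K_s = k − K_p − K_f directly from the definitions of the past and future subcodes, by taking the dimension of the set of codewords that are zero on one side of each depth.

```python
# Illustrative sketch (not from the thesis): Forney's dimension calculation
# K_s(l) = k - K_p(l) - K_f(l) for a small binary code, by brute force.
from itertools import product

G = [[1, 0, 1, 0, 0],
     [0, 1, 1, 0, 1],
     [0, 0, 0, 1, 1]]
n, k = len(G[0]), len(G)

def codewords():
    for info in product([0, 1], repeat=k):
        yield tuple(sum(info[j] * G[j][i] for j in range(k)) % 2
                    for i in range(n))

def dim(words):
    """Rank over GF(2) of a set of words (simple Gaussian elimination)."""
    rows = [list(w) for w in words if any(w)]
    rank, col = 0, 0
    while rows and col < n:
        pivot = next((r for r in rows if r[col]), None)
        if pivot is not None:
            rows.remove(pivot)
            rows = [[(x + y) % 2 for x, y in zip(r, pivot)] if r[col] else r
                    for r in rows]
            rank += 1
        col += 1
    return rank

for l in range(n + 1):
    past   = [c for c in codewords() if not any(c[l:])]   # zero in the future
    future = [c for c in codewords() if not any(c[:l])]   # zero in the past
    Kp, Kf = dim(past), dim(future)
    print(l, Kp, Kf, k - Kp - Kf)
```

Run on the (5,3) code this prints exactly the K_p, K_f and K_s rows of the table in the example that follows.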
The spanning code generators are shown as being split into their head and tail portions.Forney's method labels states at a depth with all the codewords generated by the heads of thespanning code, Gsh.For our example, we obtain the following state assignment equations, by reading off GI: foreach depth from (2.26)ao^0 (2.29)= (ii)(1)^ (2.30)0-9 (2.31)24Chapter 2. Trellises for Linear Block Codes^ 2.2 Review of Trellis Construction MethodscY3 = (i2)( 1 1) (2.32)0-4 = (i3)( 0 1) (2.33)o-5 = 0 (2.34)These equations generate the trellis shown in Figure 2.5.0^2^3^4^5Figure 2.5. Trellis obtained using Forney's Trellis Construction Method for a (5,3) Code25Chapter 2. Trellises for Linear Block Codes^ 2.3 Minimal Trellises2.3 Minimal TrellisesWe are interested in trellises that have the fewest number of states. In [17], Muder definesa trellis T for a code C to be minimal if for every trellis T' for C the number of states at eachdepth is minimal, i.e.(2.35)Muder showed that minimal trellises exist for every linear block code as well as determining theconditions for minimality. The approach was to begin with a tree / and then view trellis states asrepresenting heads in T that satisfy a certain equivalence relation. For a trellis to be minimal, itis necessary and sufficient that a state at depth I be assigned to those heads Ch of the tree / thatsatisfy the equivalence relationC r•-'tC lh (2.36)where —1 indicates that ch and Cih share the same set of tails. The equivalence classes resultingfrom the partitioning of heads by at depth 1 form the set of states V1 for a minimal trellis. Allminimal trellises for a given code are isomorphic [17, Theorem 1].In [17], Muder used this minimality condition (i.e. that minimal trellises have states assignedaccording to the equivalence classes) to show that Forney's construction method yields a minimaltrellis. Here, we use Muder's minimality condition to prove the following proposition.Proposition 2.1: Wolf's method [8] for constructing trellises for general linear block codes, andMassey's method [9], yield minimal trellises.Proof: Since we have shown earlier that Massey's trellis construction method results in a trellisthat is isomorphic to the trellis generated by Wolf's method, it is sufficient to show that Wolf'smethod results in a minimal trellis.26Chapter 2. Trellises for Linear Block Codes^ 2.4 Simplified Trellis Construction MethodsIf an (Ti — 0—tuple ci is to be a tail of ch then we must have from HcT = 0 thatHhchT^HictT.^ (2.37)Equation (2.37) implies that for two heads Ch and cth to share a tail we require thatHhchT = Hyla.^ (2.38)Also, heads that satisfy (2.38) and (2.37) will share all of their tails. In other words, heads thatsatisfy (2.38) and (2.37) will belong to an equivalence class defined by —t . From Muder's conditionthat a minimal trellis can be constructed by assigning a state to those heads in a —t equivalenceclass, we can construct a minimal trellis by assigning a state to those heads that satisfy (2.38)and (2.37). However, this is precisely the state assignment definition (2.7) used in Wolf's trellisconstruction method.2.4 Simplified Trellis Construction MethodsWe now describe three simplified methods of constructing a minimal trellis for general linearblock codes. The methods avoid the generation/expurgation of heads using Wolf's unexpurgatedtrellis, and also avoid matrix multiplication at each state computation as in Massey's and Forney'smethods. 
Some other features also contribute to their simplicity, which will be summarized at the end of this section.

2.4.1 Method 1

Method 1 assigns each state to be a vector of parity check symbols, just as in [15][8]. The resulting trellis will be isomorphic to these trellises, and so will be a minimal trellis. Our presentation of the trellis construction method begins with finding an alternative method for assessing the trellis dimensions. This method of assessing the trellis dimensions will give identical results to Massey's and Forney's methods since all of these trellises are minimal.

The parity check matrix type trellises of [15][8] assign as a state to a head the vector (partial syndrome) formed by summing columns of H weighted by the corresponding head symbols. To find the trellis dimensions we will consider the dimension of the column space of partitions of H, where a partition of H consists of head and tail submatrices. The head submatrix H_h contains the first l columns of H and the tail submatrix contains the remaining n − l columns.

Elementary row operations on H will not change the solution to H c^T = 0, so that we are free to construct trellises from such transformed parity check matrices. Although any set of elementary row operations on H will preserve the code, here we prefer to put H into a standard form, denoted H_std, for reasons that will be discussed later. The actions to be performed on H to reduce it to standard form are summarized in Figure 2.6. First, Step 1 uses elementary row operations to reduce H to lower trapezoidal form, with the elimination proceeding from right to left and bottom to top. Since the rank of H is n − k this reduction will yield n − k tail pivots, whose column positions are indicated in Figure 2.6 by downward pointing arrows along the top of the matrix. For any head/tail partition of H, this step will identify tail pivots with corresponding columns that can be taken as basis vectors for the column space of H_t. (This follows from the fact that pivots found by row reduction also correspond to the pivot locations found if column reduction was used [12].) The H matrix after Step 1 will be denoted H_1. The second step is to use elementary row operations, except row exchanges, to find n − k head pivots, by proceeding from left to right and from top to bottom. The resulting matrix is the desired H_std, and the column positions of the head pivots are indicated in Figure 2.6 by upward pointing arrows along the bottom of the matrix. For any partition of H the head pivots identify those columns that may be taken as basis vectors for the column space of H_h.

[Figure 2.6: Reduction of Parity Check Matrix to Standard Form. Step 1: row reduction from bottom right to top left yields the lower trapezoidal form H_1. Step 2: row reduction from top left to bottom right, without row exchanges, yields H_std. The *'s represent the row and column positions of the pivots; downward arrows along the top mark the tail pivots and upward arrows along the bottom mark the head pivots.]

Now consider the formation of the trellis using H_std. Any codeword satisfies (2.5), or equivalently

    H_h c_h^T + H_t c_t^T = 0    (2.39)

where, in order to simplify the notation, we have used H for H_std. We may rewrite (2.39) as

    σ_l + H_t c_t^T = 0    (2.40)

where σ_l is the state for c_h at depth l. Consider an extension from depth l to depth l + 1, with the new state given by

    σ_{l+1} = σ_l + c_{l+1} h_{l+1}.    (2.41)
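A short sketch may make the pivot bookkeeping concrete. The Python fragment below is my own illustration (the function name and the exact reduced matrix it returns are assumptions, not the thesis's H_std): it performs Step 1 right-to-left and Step 2 left-to-right on a copy of a binary H and reports the tail-pivot and head-pivot column sets, which are the only quantities used in the remainder of this section.

```python
# Illustrative sketch (not from the thesis): reduce a binary parity check
# matrix as in Figure 2.6 and report the head/tail pivot columns.

def to_standard_form(H):
    H = [row[:] for row in H]                 # work on a copy
    m, n = len(H), len(H[0])

    # Step 1: right-to-left, bottom-to-top reduction -> tail pivots.
    tail_pivots, row = [], m - 1
    for col in range(n - 1, -1, -1):
        pivot = next((r for r in range(row, -1, -1) if H[r][col]), None)
        if pivot is None:
            continue
        H[row], H[pivot] = H[pivot], H[row]
        for r in range(m):
            if r != row and H[r][col]:
                H[r] = [(a + b) % 2 for a, b in zip(H[r], H[row])]
        tail_pivots.append(col)
        row -= 1

    # Step 2: left-to-right reduction without row exchanges -> head pivots.
    head_pivots, used = [], set()
    for col in range(n):
        pivot = next((r for r in range(m) if r not in used and H[r][col]),
                     None)
        if pivot is None:
            continue
        for r in range(m):
            if r != pivot and H[r][col]:
                H[r] = [(a + b) % 2 for a, b in zip(H[r], H[pivot])]
        head_pivots.append(col)
        used.add(pivot)
    return H, head_pivots, sorted(tail_pivots)

H = [[1, 1, 1, 1, 0, 0, 0],       # the (7,4) Hamming example used below
     [0, 0, 1, 1, 1, 1, 0],
     [0, 1, 0, 1, 1, 0, 1]]
Hstd, Ih, It = to_standard_form(H)
print(Ih, It)                     # head and tail pivot columns (0-based)
```

For the (7,4) Hamming matrix used in the example below this prints head pivots [0, 1, 2] and tail pivots [3, 5, 6] (0-based), i.e. positions 1-3 and 4, 6, 7 in the notation of the text. The reduced matrix returned may differ from the thesis's H_std in inessential details of which non-pivot entries are cleared; the pivot column sets do not depend on those details.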
Proposition 2.2: Let I_h and I_t denote the sets of symbol positions at which head and tail pivots, respectively, are found in H. Then the increase in trellis state-space dimension after an extension from depth l to l + 1 is

    s_{l+1} − s_l =   0   if l+1 ∉ I_h and l+1 ∉ I_t
                      1   if l+1 ∈ I_h and l+1 ∉ I_t
                     −1   if l+1 ∉ I_h and l+1 ∈ I_t
                      0   if l+1 ∈ I_h and l+1 ∈ I_t    (2.42)

Proof: Let H_h^(ext) denote H_h after extension of the heads to depth l + 1. Similarly, let H_t^(ext) denote H_t after extension. During extension, the column h_{l+1} is transferred from H_t to H_h^(ext). For any state at depth l + 1, we must have from (2.40) that

    σ_{l+1} + H_t^(ext) c_t^(ext)T = 0    (2.43)

where c_t^(ext) is the codeword tail for c after extension of the head to depth l + 1. Since (2.43) expresses a linear dependence condition, it must hold that the state-space of the heads at depth l + 1 is spanned by the columns of H_t^(ext). We consider the four cases in (2.42) in turn.

Case (i): l+1 ∉ I_h, l+1 ∉ I_t. With l+1 ∉ I_h, the state-space dimension cannot be increased, since h_{l+1} is not linearly independent of H_h. Also, with l+1 ∉ I_t the columns of H_t^(ext) span the same space as those of H_t, so that any state in existence at depth l will also satisfy (2.43). Hence s_{l+1} − s_l = 0.

Case (ii): l+1 ∈ I_h, l+1 ∉ I_t. With l+1 ∉ I_t we have from Case (i) that all states in existence at depth l will also satisfy (2.43). With l+1 ∈ I_h, the state-space dimension will increase by one, provided all such states satisfy (2.43). This will be true since the transferral of h_{l+1} from H_t to H_h^(ext) does not change the span of H_t^(ext); thus the inclusion of h_{l+1} in H_h^(ext) adds a dimension without violating (2.43).

Case (iii): l+1 ∉ I_h, l+1 ∈ I_t. As in Case (i), with l+1 ∉ I_h the state-space dimension cannot be increased. Assume that the state-space dimension remains constant. This would require that H_t^(ext) span the same space as H_t. However, this is not the case since h_{l+1} is not in the span of H_t^(ext), so that the state-space dimension must be reduced by one.

Case (iv): l+1 ∈ I_h, l+1 ∈ I_t. Here the effect of the transferral of h_{l+1} may be viewed as increasing the dimension of the extended state-space due to the fact that h_{l+1} is a basis vector for H_h^(ext), followed by a decrease in dimension due to the fact that h_{l+1}, while being in the span of H_h^(ext), is not in the span of H_t^(ext). Hence the result is no net change in the state-space dimensionality.

Evaluating the trellis dimensions from a parity check matrix in standard form using Proposition 2.2 is particularly simple. One needs to simply increment the dimension by one for each head pivot and similarly decrement it for each tail pivot. As an example, we return to our (5,3) code example, with its parity check matrix

    H = [ 1 1 1 0 0 ]
        [ 0 1 0 1 1 ]    (2.44)

which happens to already be in standard form. The head pivots are at positions 1 and 2 and the tail pivots are at positions 3 and 5 (indicated in the original figure by up and down arrows, respectively). Beginning at depth 0 with s_0 = 0 and working to depth 5, we increment s_l for each 'up' arrow and decrement it for each 'down' arrow, to obtain

    depth   0  1  2  3  4  5
    s_l     0  1  2  1  1  0    (2.45)

For the purpose of finding the trellis dimensions, it is sufficient to have H in standard form, but it is not necessary. All that is required to use the increment/decrement rule of Proposition 2.2 is to have reduced the parity check matrix to expose the head and tail pivots.
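Combined with the pivot-finding sketch given earlier, Proposition 2.2 turns dimension computation into a single pass over the symbol positions. The fragment below is an illustration (not thesis code) of that pass, and reproduces (2.45) for the (5,3) example.

```python
# Illustrative sketch (not from the thesis): trellis dimensions via the
# increment/decrement rule of Proposition 2.2.

def trellis_dimensions(n, head_pivots, tail_pivots):
    """s_l for l = 0..n, given 0-based head/tail pivot column sets."""
    dims, s = [0], 0
    for pos in range(n):                      # extending depth l -> l+1
        s += (pos in head_pivots) - (pos in tail_pivots)
        dims.append(s)
    return dims

# (5,3) example: head pivots at positions 1, 2 and tail pivots at 3, 5
# (1-based), i.e. columns {0, 1} and {2, 4} here.
print(trellis_dimensions(5, {0, 1}, {2, 4}))   # -> [0, 1, 2, 1, 1, 0]
```

The same two pivot sets could be produced by the `to_standard_form` sketch above, so the dimension profile of a code's minimal trellis follows from a single matrix reduction.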
By viewing H as the generator matrix for C⊥ (the dual code to C), the reduction of H to expose the head and tail pivots yields a trellis oriented generator matrix for C⊥. Moreover, since Forney showed that the state-space dimensions for C and its dual C⊥ are identical [16], we can compute the trellis dimensions beginning with either G or H, then reducing (which finds a matrix that can be considered either a trellis oriented generator matrix or a parity check matrix with head/tail pivots exposed), and finally computing the trellis dimensions either by counting the number of spanning generators or by incrementing/decrementing a dimension 'counter' according to the positions of the head/tail pivots. We comment that the increment/decrement method of Proposition 2.2 is somewhat simpler, since at each depth one needs only to 'bump' a counter after inspection of 2 items (the up and down arrows that indicate the pivot positions), as opposed to inspecting k items (the rows of G) and counting up all those that meet a test of whether they belong to the spanning code.

Trellis Construction: Method 1

A minimal trellis can be constructed in a breadth-first manner and without the codeword expurgation used by Wolf [8], as described below. The parity check matrix to be used can be H_1 or H_std, or any H matrix that has been row reduced from 'right to left' to find tail pivots. We will refer to a position at which there is a tail pivot in the reduced matrix as a constraint position.

Trellis Construction (Method 1)

1. Initialization: Set l = 0, σ_0 = 0, V_0 = {σ_0}, so that at depth 0 the set of states V_0 consists only of a root.

2. Constraint Position Test: If the position l + 1 ∉ I_t, the extension is unconstrained; perform step 3. If the position l + 1 ∈ I_t, the extension is constrained; perform step 4.

3. Unconstrained Extension: For all states σ_l in V_l, perform

    σ^j_{l+1} = σ_l + a_j h_{l+1},   j = 0, 1, ..., q − 1        (2.46)

and label each branch from σ_l to σ^j_{l+1} with its associated symbol a_j. (This extends each state at depth l with q branches to states at depth l + 1.) Perform step 5.

4. Constrained Extension: For each state σ_l in V_l: Let a_c denote the value of the state vector element which corresponds to the row of the tail pivot at position l + 1. Let a'_c = −a_c denote the additive inverse of a_c. Set

    σ_{l+1} = σ_l + a'_c h_{l+1}        (2.47)

and label the branch from σ_l to σ_{l+1} with a'_c. (This extends each state at depth l with only a single branch.)

5. Termination Test: If l = n − 1 the trellis is complete. Otherwise increment l and perform step 2.

Using parity check matrices that have been reduced to lower trapezoidal form, such as H_1 or H_std, offers a minor advantage in that the state extensions at each constraint position become slightly easier to construct. A pointer initialized to 0 and incremented at each constraint position can be used to pick off a_c from the state σ_l. The resulting trellis will have the characteristic that at each successive constraint position the remaining leading element of the state vector is disallowed (i.e. must be zero). Using parity check matrices that have been reduced to standard form allows the trellis dimensions to be computed prior to trellis construction.
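The following sketch follows the five steps of Method 1 for the binary case (q = 2). It is only an illustration under stated assumptions: the parity check matrix is assumed to have its tail pivots exposed (each row's rightmost 1 in a distinct column, as for H_1 or H_std), and the data structures and function name are ours, not the thesis's.

```python
import numpy as np

def build_trellis_method1(Hred):
    """Method 1 trellis construction, binary case (q = 2).
    Returns, for each depth, a dict mapping state -> list of (symbol, next_state).
    States are partial syndromes stored as tuples."""
    H = np.array(Hred, dtype=int) % 2
    m, n = H.shape
    # constraint position (1-based) -> index of the state element to cancel
    tail_row = {int(np.flatnonzero(H[i])[-1]) + 1: i for i in range(m)}

    sections, states = [], {tuple([0] * m)}            # V_0 = {root}
    for pos in range(1, n + 1):                        # extend depth pos-1 -> pos
        h = H[:, pos - 1]
        branches, nxt = {}, set()
        for s in states:
            sv = np.array(s)
            if pos not in tail_row:                    # unconstrained: both bits
                symbols = (0, 1)
            else:                                      # constrained: the single bit that
                symbols = (int(sv[tail_row[pos]]),)    # cancels the pivot element
                                                       # (for binary codes, -a = a)
            branches[s] = []
            for a in symbols:
                t = tuple((sv + a * h) % 2)
                branches[s].append((a, t))
                nxt.add(t)
        sections.append(branches)
        states = nxt
    return sections

# (7,4) Hamming parity check matrix used as the example in the text:
H74 = [[1, 1, 1, 1, 0, 0, 0],
       [0, 0, 1, 1, 1, 1, 0],
       [0, 1, 0, 1, 1, 0, 1]]
print([len(sec) for sec in build_trellis_method1(H74)])
# state counts 1, 2, 4, 8, 4, 4, 2 at depths 0..6, i.e. 2^{s_l}
```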
As an example, consider a Hamming (7,4) code specified by the parity check matrix

    H = [ 1 1 1 1 0 0 0 ]
        [ 0 0 1 1 1 1 0 ]
        [ 0 1 0 1 1 0 1 ]

which can be seen to have been obtained by interchanging positions 1 and 5 of a systematic parity check matrix. In this case, the matrix is already in standard form; the head pivots are at positions 1, 2, and 3, the tail pivots at positions 4, 6, and 7, and the resulting trellis dimensions are

    depth:  0  1  2  3  4  5  6  7
    s_l:    0  1  2  3  2  2  1  0

The corresponding trellis appears in Figure 2.7, which shows the leading ('most significant') bits of the state vector being successively disallowed at each successive constraint depth.

Figure 2.7 Trellis for a (7,4) Hamming Code Generated using Method 1. The trellis was generated from a parity check matrix in lower trapezoidal form, which results in the most significant bits of the state vector being successively disallowed at each constraint depth.

2.4.2 Method 2

Method 2 is a refinement of Method 1, in that the state extensions are even simpler to compute. This simplification of the state computations occurs at constraint positions.

For Method 2, the parity check matrix is reduced to row echelon form, so that each pivot is the only nonzero element in its column. We again require that the reduction of H be carried out from right to left and from bottom to top. The resulting form will be similar to that of H_1 as shown in Figure 2.6, except that complete reduction is used to eliminate all nonpivot elements. Let the resulting matrix be denoted H_2. For the (7,4) example parity check matrix above, we obtain H_2 as

    H_2 = [ 1 1 1 1 0 0 0 ]
          [ 1 1 0 0 1 1 0 ]
          [ 1 0 1 0 1 0 1 ]        (2.48)

In the following algorithm, steps 1 and 2 are slightly modified versions of the respective steps in Method 1, and steps 3 and 5 are identical to the respective steps in Method 1. The main change of Method 2 with respect to Method 1 is in step 4.

Trellis Construction (Method 2)

1. Initialization: Set l = 0, σ_0 = 0, V_0 = {σ_0}, so that at depth 0 the set of states V_0 consists only of a root. Initialize a counter i_p to 0. (The counter i_p will be used to point to successive elements of the state vector.)

2. Constraint Position Test: If the position l + 1 ∉ I_t, the extension is unconstrained; perform step 3. If the position l + 1 ∈ I_t, the extension is constrained; increment i_p and perform step 4.

3. Unconstrained Extension: For all states σ_l in V_l, perform

    σ^j_{l+1} = σ_l + a_j h_{l+1},   j = 0, 1, ..., q − 1        (2.49)

and label each branch from σ_l to σ^j_{l+1} with its associated symbol a_j. (This extends each state at depth l with q branches to states at depth l + 1.) Perform step 5.

4. Constrained Extension: For each state σ_l in V_l: Form σ_{l+1} by setting the i_p-th element of σ_l to zero. Let a_c denote the value of the i_p-th element in the state vector σ_l, and let a'_c = −a_c denote the additive inverse of a_c. Label the branch from σ_l to σ_{l+1} with a'_c. (This extends each state at depth l with only a single branch.)

5. Termination Test: If l = n − 1 the trellis is complete. Otherwise increment l and perform step 2.

Method 2 makes explicit use of a counter (i_p) to point to successive elements of the state vector at each constraint depth. More importantly, the row echelon form of H_2 is exploited to simplify the next-state formation at a constraint depth. The state vector element pointed to by i_p is simply set to zero. Figure 2.8 shows the resulting trellis.
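As an illustration of the reduction to H_2 described above, the following sketch completes the right-to-left reduction over GF(2) so that each tail pivot becomes the only nonzero entry in its column. It is a minimal sketch, assuming the input is already in the tail-pivot-exposed form used by Method 1; the helper name is ours. It reproduces (2.48) from the (7,4) example.

```python
import numpy as np

def to_modified_row_echelon(Hred):
    """Complete right-to-left GF(2) reduction of a tail-pivot-exposed matrix."""
    H2 = np.array(Hred, dtype=int) % 2
    m, _ = H2.shape
    for i in range(m):
        piv = np.flatnonzero(H2[i])[-1]        # this row's tail pivot column
        for j in range(m):                     # clear the pivot column elsewhere
            if j != i and H2[j, piv]:
                H2[j] ^= H2[i]
    return H2

H74 = [[1, 1, 1, 1, 0, 0, 0],
       [0, 0, 1, 1, 1, 1, 0],
       [0, 1, 0, 1, 1, 0, 1]]
print(to_modified_row_echelon(H74))
# expected, matching (2.48):
# [[1 1 1 1 0 0 0]
#  [1 1 0 0 1 1 0]
#  [1 0 1 0 1 0 1]]
```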
Figure 2.8 Trellis for a (7,4) Hamming Code Generated using Method 2. The trellis was generated from a parity check matrix in a modified row echelon form, which facilitates the next-state computation at constraint positions. At each constraint position it can be seen that the next-state is formed from the present state by simply setting the remaining most significant bit of the state vector to zero.

2.4.3 Method 3

Method 3 is a simple modification of Massey's trellis construction method for non-systematic linear block codes [9]. Method 3 is a simplification of the method of [9] in that it avoids matrix multiplication to extend each state to the next depth. As in [9], we begin by reducing G to row echelon form, to completely eliminate all non-pivot elements. As discussed earlier, let P denote the matrix formed by extracting the n − k columns of G that correspond to the parity positions. Also, let p_m denote the m-th row of P, and let p^t_m denote the tail of the m-th row of P (i.e. p^t_m consists of the elements in the m-th row of P that correspond to positions in the codeword tail). In the following algorithm, steps 1 and 2 are similar to the respective steps in Method 2, while step 5 is identical. The main differences from Method 2 are in steps 3 and 4.

Trellis Construction (Method 3)

1. Initialization: Set l = 0, σ_0 = 0, V_0 = {σ_0}, so that at depth 0 the set of states V_0 consists only of the root. Initialize an information-symbol counter i_I to zero. Initialize a parity-symbol counter i_p to zero.

2. Constraint Position Test: If the position l + 1 ∉ I_t, the extension is unconstrained; increment i_I and perform step 3. If the position l + 1 ∈ I_t, the extension is constrained; increment i_p and perform step 4.

3. Unconstrained Extension: For all states σ_l in V_l, perform

    σ^j_{l+1} = σ_l + a_j p^t_{i_I},   j = 0, 1, ..., q − 1        (2.50)

and label each branch from σ_l to σ^j_{l+1} with its associated symbol a_j. Here p^t_{i_I} is the tail portion (i.e. the last n − k − i_p elements) of the i_I-th row of P. (This step extends each state at depth l with q branches to states at depth l + 1.) Perform step 5.

4. Constrained Extension: For each state σ_l in V_l: Form σ_{l+1} by setting the i_p-th element of σ_l to zero. Label the branch from σ_l to σ_{l+1} with the i_p-th element of σ_l. (This extends each state at depth l with only a single branch.)

5. Termination: If l = n − 1 the trellis is complete. Otherwise increment l and perform step 2.

2.4.4 Discussion

Method 1 utilizes any form of the parity check matrix that has been row reduced from 'right to left' to find tail pivots. In particular, Method 1 can be used with a parity check matrix in 'standard' form,16 which facilitates the computation of the trellis dimensions as well as the trellis construction.

Method 2 is a specialized version of Method 1, and uses the parity check matrix reduced to modified row echelon form. While Method 2 does not directly provide the trellis dimensions, it greatly simplifies the next-state formation at constraint positions. Indeed, there need not be any modification made to state vectors when extending them to constraint positions, if we agree to mask off (ignore) each successive leading bit of the state vector as indicated by the counter i_p.
This can be done since each state at a constraint depth is known to have a zero element in the position indicated by the counter i_p.

Method 3 utilizes the generator matrix reduced to row echelon form, and its state assignment requires similar effort to that of Method 2. Branch labeling at constraint positions is perhaps simplest in Method 3, since the branch label is simply taken to be the value of a particular element of the state vector, while in Method 2 the label is the additive inverse of the same. However, this minor advantage does not exist for binary codes.

16 The 'standard form' parity check matrix H_std is equivalent to Forney's 'trellis oriented generator matrix', except for a specified ordering of the rows to have the matrix in lower trapezoidal form.

We emphasize that each of Methods 1, 2, and 3 shares the following principal advantages over previous methods. They avoid the generation of an unexpurgated trellis (followed by expurgation of invalid tails) as used in [8] for general linear block codes. They also avoid matrix multiplication at each extension to form the trellis, as used in [9] for non-systematic linear block codes, or as used in [16] for general linear block codes.

The simplified trellis construction methods can also be used to construct and search partial trellises. As will be discussed in subsequent chapters, partial trellis or tree searches are useful where the full trellis is so large that the storage, generation, or searching of the entire trellis is infeasible. The simplified trellis construction methods presented here should be useful in the construction of partial trellises 'on the fly', as guided by the search algorithm, so that only the portion of the trellis that needs to be explored is generated.

2.5 Bounds on Trellis Dimensions

Since the number of states in the trellis affects the ML decoding effort, it is of interest not only to construct a minimal trellis but also to have simple bounds on the number of trellis states attainable for different code parameters (n, k, d_min) [17]. A complication arises in that the trellis state-space dimensions can be altered by a permutation of symbol positions, as pointed out by Massey [9] and others, including Bahl et al. [15], Matis and Modestino [22], Muder [17], and Kasami et al. [23]. Bahl et al. [15] and Wolf [8] established a simple upper bound on the maximum trellis dimension. Muder [17] introduced some lower bounds on the maximum trellis dimension of a minimal trellis. Herein we improve on Muder's bound for general linear block codes. As well, we show that the variability of trellis dimensions due to symbol position permutation is limited to a central region of the trellis. The dimensions in the fixed region are determined, and we give a simple lower bound on the minimum trellis dimension in the variable dimension region.

In [15][8] it was noted that a parity check matrix based trellis will have s ≤ n − k, which follows from the fact that there are n − k symbols in the state vector. Also, since there are q^k codewords, or using the fact that the state-space dimensions of the dual code C⊥ (with k parity check symbols) are identical to those of C [16], we have also that s ≤ k.
These upper bounds are summarized as

    s ≤ min(k, n − k).        (2.51)

The first new result of this section is the following proposition.

Proposition 2.3: For any linear block code C the state-space dimensions at the d consecutive depths at either end of the trellis are

    s_l = l        for l ∈ {0, 1, 2, ..., d − 1}
    s_l = n − l    for l ∈ {n − (d − 1), ..., n − 1, n}        (2.52)

for any symbol position permutation, where

    d = min(d_min, d⊥_min)        (2.53)

is the smaller of the minimum distances of the code C or its dual C⊥.

Proof: We first show that s_l = l for l ∈ {0, 1, 2, ..., d − 1}. For this, from Proposition 2.2 it is sufficient to show that l ∈ I_h and l ∉ I_t for l ∈ {1, 2, ..., d − 1}. The fact that any d_min − 1 columns of H are linearly independent implies that there will be d_min − 1 head pivots in positions 1, 2, ..., d_min − 1, so we have l ∈ I_h for l ∈ {1, 2, ..., d_min − 1}. The minimum weight of any row in H is d⊥_min, since H = G⊥. The earliest that a tail pivot can possibly occur is for the case where a minimum weight row has all of its nonzero symbols bunched together in the earliest positions. This corresponds to a weight d⊥_min row having its d⊥_min nonzero symbols in the first d⊥_min positions. Consequently, the earliest that a tail pivot can occur is at depth d⊥_min, so that l ∉ I_t for l ∈ {1, 2, ..., d⊥_min − 1}. This fact, plus the fact that l ∈ I_h for l ∈ {1, 2, ..., d_min − 1}, implies that l ∈ I_h and l ∉ I_t for l ∈ {1, 2, ..., d − 1}, where d = min(d_min, d⊥_min). Finally, to establish that s_l = n − l for l ∈ {n − (d − 1), ..., n − 1, n}, we observe that the above arguments hold when viewing the parity check matrix in the reverse direction from depth n.

Proposition 2.3 provides us with a lower bound on the maximum trellis dimension as follows.

Corollary 2.3.1: For any linear block code the maximum trellis dimension satisfies

    s ≥ min(d_min − 1, d⊥_min − 1)        (2.54)

or equivalently,

    s ≥ d − 1.        (2.55)

Proof: For a code and its dual with smallest minimum distance d, Proposition 2.3 indicates that the trellis dimension at depth d − 1 will always be d − 1, so that the maximum trellis dimension is at least this value.

Corollary 2.3.2: If C is a maximum distance separable (MDS) code, then

    s = min(k, n − k).        (2.56)

Proof: For an MDS code d_min − 1 = n − k, so that any n − k codeword positions are parity positions and any k positions form an information set. (In other words, any k codeword positions can be independently specified, and the remainder are forced by the constraints of the code.) Now, viewing H as the generator matrix for C⊥, we have that any n − k positions can be independently specified and hence any k positions form a parity set, i.e. any k columns of H⊥ will be linearly independent. Hence d⊥_min − 1 = k and C⊥ is MDS. So, using d_min − 1 = n − k and d⊥_min − 1 = k in Corollary 2.3.1 we have s ≥ min(k, n − k). This, combined with the fact that s ≤ min(k, n − k), establishes (2.56).

That MDS codes have a maximum trellis dimension given by (2.56) was shown by Muder [17] by bounding the dimensions of the past and future subcodes. Using the same method Muder obtained the following lower bounds on s,

    s ≥ min(k, n − k − 2Δ)
    s ≥ min(n − k, k − 2Δ⊥)        (2.57)

where

    Δ = n − k − (d_min − 1)        (2.58)

and

    Δ⊥ = k − (d⊥_min − 1).        (2.59)

To see that Corollary 2.3.1 improves on (2.57), we simplify (2.57) as follows

    s ≥ min(k, n − k − 2Δ, n − k, k − 2Δ⊥)
      = min(n − k − 2Δ, k − 2Δ⊥)
      = min(d_min − 1 − Δ, d⊥_min − 1 − Δ⊥)        (2.60)

where we have used the fact that Δ ≥ 0 and that Δ⊥ ≥ 0.
Comparing the equivalent form (2.60) of Muder's bounds (2.57) to Corollary 2.3.1 (2.54) reveals our improvement, in that the terms Δ and Δ⊥ have been eliminated.

For example, the Golay (24,12,8) code has Δ = Δ⊥ = 5, and Muder's bound of (2.57) yields s ≥ 2, while Proposition 2.3 gives s ≥ 7. This is significantly closer to the known result for the smallest s possible for this code, which is 9 [17].

Proposition 2.3 indicates that the trellis dimensions for the first d − 1 consecutive depths at either end of the trellis do not vary regardless of symbol position permutation. We will refer to these depths, l < d and l > n − d, as fixed dimension regions. Between these fixed dimension regions the trellis dimensions may vary with symbol position permutation, and we refer to these depths, d ≤ l ≤ n − d, as the variable dimension region. We now lower bound the minimum dimension of the variable dimension region,

    s' = min_{d ≤ l ≤ n − d} s_l.        (2.61)

Proposition 2.4: For any linear block code and any symbol position permutation,

    s' ≥ max[0, min(d_min − 1 − Δ, d⊥_min − 1 − Δ⊥)].        (2.62)

Proof: Consider an (n, k, d_min) code C. Either we will have d = d_min or d = d⊥_min. First, assume that d = d_min. From Proposition 2.3 we have that the trellis dimension is d_min − 1 at depth d_min − 1, independent of any permutation of symbol positions. From Proposition 2.2 recall that a decrease in dimension is possible only at a tail pivot, of which there are n − k. Now d_min − 1 of these are confined to consecutive positions at the end of the trellis, so there are Δ = n − k − (d_min − 1) tail pivots which may change positions for equivalent codes obtained by symbol position permutation. Hence, a lower bound on the smallest state-space dimension results if all Δ of these tail pivots immediately follow position d_min − 1, before any remaining head pivots, giving s' ≥ d_min − 1 − Δ. Now assume that d = d⊥_min, so that we consider the code C⊥, with minimum distance d⊥_min and k tail pivots. Repeating the argument used above for the code C, we have that s_l = d⊥_min − 1 at depth d⊥_min − 1, and s' ≥ d⊥_min − 1 − Δ⊥. These two cases and the fact that s' ≥ 0 give Proposition 2.4.

Note that Proposition 2.4 may also be used to show that MDS codes have s = min(k, n − k), since Δ = Δ⊥ = 0 and the lower bound on the minimum state-space dimension (2.62) will agree with the upper bound of (2.51).

In light of Proposition 2.4 we revisit Muder's lower bound on s given by (2.57) and simplified in (2.60). Note that this lower bound on the maximum dimension is equal to Proposition 2.4's lower bound on the minimum dimension. For example, the (24,12,8) extended Golay code has Δ = Δ⊥ = 5, with (2.60) indicating that the maximum dimension is s ≥ 2, while (2.62) indicates that the minimum dimension in the variable dimension region is s' ≥ 2.

We now return to the upper bound s ≤ min(k, n − k) and note that a systematic ordering of symbol positions will result in dimensions meeting this bound. Using the definitions of Δ and Δ⊥ this is equivalent to

    s = min(d_min − 1 + Δ, d⊥_min − 1 + Δ⊥)   (systematic form).        (2.63)

Combining this upper bound on s_l with Proposition 2.4, we have the following upper and lower bounds on the trellis dimensions in the variable dimension region,

    max[0, min(d_min − 1 − Δ, d⊥_min − 1 − Δ⊥)] ≤ s_l ≤ min(d_min − 1 + Δ, d⊥_min − 1 + Δ⊥),   d ≤ l ≤ n − d.        (2.64)
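The bounds above are easy to evaluate numerically. The short sketch below simply re-evaluates (2.51), (2.55), (2.58), (2.59) and (2.62) for the (24,12,8) extended Golay code quoted in this section and for an MDS example; the helper name and the Reed-Solomon parameters are our own illustrative assumptions, not results from the thesis.

```python
def dimension_bounds(n, k, dmin, dmin_dual):
    """Evaluate the trellis-dimension bounds of this section:
    returns (upper bound (2.51), lower bound on max dim (2.55),
             lower bound on min dim in the variable region (2.62))."""
    d      = min(dmin, dmin_dual)
    delta  = (n - k) - (dmin - 1)          # Delta, (2.58)
    deltaD = k - (dmin_dual - 1)           # Delta-perp, (2.59)
    s_upper  = min(k, n - k)                                    # (2.51)
    s_lower  = d - 1                                             # (2.55)
    sp_lower = max(0, min(dmin - 1 - delta,
                          dmin_dual - 1 - deltaD))               # (2.62)
    return s_upper, s_lower, sp_lower

# (24,12,8) extended Golay code (self-dual, so the dual minimum distance is 8):
print(dimension_bounds(24, 12, 8, 8))   # (12, 7, 2)
# A (7,3) Reed-Solomon code is MDS (dmin = 5, dual dmin = 4); all bounds coincide:
print(dimension_bounds(7, 3, 5, 4))     # (3, 3, 3) -> s fixed at min(k, n-k)
```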
For example, consider a code and its dual with smallest minimum distance d and corresponding Δ, denoted Δ_d, and assume that Δ_d < d − 1. Then from (2.64) the range in trellis dimensions in the variable dimension region will be ±Δ_d, with respect to the fixed trellis dimension of d − 1 at depths d − 1 from either end of the trellis. This variability is represented in Figure 2.9, which shows the bounding envelope of the trellis dimensions for all symbol position permutations.

Figure 2.9 Bounding Envelopes of the Trellis Dimensions. The heavy lines indicate upper and lower bounds on the trellis dimensions (state-space dimension versus depth). Near either end of the trellis, the dark shading indicates the fixed dimension regions, which extend to depths d − 1 and n − (d − 1). The light shading indicates the variable dimension region, of height ±Δ_d about d − 1. Examples shown are for Δ_d < d − 1.

The bounds indicate that a substantial decrease in trellis dimension is achievable only for codes that have a large Δ_d, or equivalently a low minimum distance d relative to the corresponding Singleton bound, d_min ≤ n − k + 1 or d⊥_min ≤ k + 1. For example, codes with low d, such as binary BCH codes for increasing n, may have a maximum trellis dimension that is considerably smaller than n − k. Indeed, in [23], Kasami et al. have recently found such trellises for some binary BCH codes. However, for codes with d relatively close to the Singleton bound, (2.64) shows that a large decrease in state-space dimension is not possible. These results, in addition to those of Muder, strengthen support for the conjecture [17] that when d_min is maximized for a given n and k, then s is close to its maximum possible value.

2.5.1 Discussion

Beginning with Bahl et al. [15] and Wolf [8], the advantage of trellis decoding over brute force correlation decoding was demonstrated for high rate codes, using the q^(n−k) upper bound on the number of states. In [15][9][22] and elsewhere, the fact that the trellis dimensions are affected by symbol position permutations was observed. In [9] it was emphasized that this property should be exploited to simplify decoding. However, as discussed in [9] and [22], the variability of the trellis dimensions for general codes was not explicitly known. From Muder's lower bounds on the maximum trellis dimension and from the improved and new bounds developed here, the dependence of the trellis dimensions on the specific code has been made more explicit. Lower bounds on the maximum trellis dimension and on the minimum trellis dimension (in the variable dimension region) are given in terms of the code parameters n, k, d_min, and d⊥_min. In particular, the lower bound on the minimum trellis dimension appears to be the first such bound developed. This bound can be used to demonstrate that some codes will not have a small number of states relative to the worst case of min(q^k, q^(n−k)). For example, the trellis dimensions of MDS codes are completely unaffected by symbol position permutation, and remain at their worst case values. The bounds indicate that only codes (and their duals) that have a smallest minimum distance d = min(d_min, d⊥_min) significantly less than the corresponding Singleton bound can possibly have a small number of states relative to the worst case of min(q^k, q^(n−k)).

Chapter 3
Trellis Decoding with Pruning

What defeats us ... is the "curse of dimensionality".
R. Bellman [24]

MAXIMUM-LIKELIHOOD decoding using the Viterbi algorithm rapidly becomes impractical as the trellis dimensions increase.
This exponential growth of the number of trellis states is an example of Bellman's "curse of dimensionality" [24][25] in dynamic programming. With larger trellises, we are thus motivated to find strategies to save decoding effort.

As discussed in Chapter 1, the trellis search can be reduced by judiciously discarding paths based on soft-decision metrics; this avoids an increase in error probability due to coarse quantization or hard-decision decoding, which effectively discard likelihood information at the outset for all paths. However, some increase in error probability will be incurred if paths are discarded from the search without assurance that they are not the ML path. We will refer to such action as pruning. While decoding with pruning will result in an increased probability of error relative to ML decoding, this increase will be tolerable (or perhaps negligible) when the pruning decoder is well designed.

Two aspects of decoding with pruning are discussed in this chapter. The first is the choice and use of an appropriate metric for pruning on a tree or trellis. The second is the task of contender sifting. Contender sifting refers to the selection of some number of the best metrics from among a larger number of competing metrics. A method for contender sifting is introduced that is considerably more efficient than schemes based on sorting or selection.

In Section 3.1 we briefly review the main types of tree and trellis decoding algorithms. In Section 3.2 we describe partial trellis searches, and define some terminology for use in later sections.

In Section 3.3 we find an appropriate metric for pruning on a tree or trellis. The approach taken is somewhat different from [26] in its description of the problem and in its modeling of the decoder's knowledge of branch likelihoods in the unexplored part of the tree or trellis. It is shown that the decoder should discard heads in the obvious manner (by first exploiting merges in the trellis and then pruning the 'worst' heads first as indicated by the metric) and that in some circumstances the metric should be altered from that usually used. Such situations include unequal a priori information-symbol probabilities, non-linear codes, and non-symmetric, non-binary discrete-memoryless-channels (DMCs).

In Section 3.4 we discuss some previous approaches to contender sifting and present a new contender sifting method. The new method can sift out a sorted set of the best M of 2M metrics using a worst-case number of comparisons that is linear in M. This might seem surprising, since it is well known that comparison-based sorting of n items requires O(n log2 n) comparisons [27]. The efficiency of the contender sifting method is attained by exploiting an ordering that is inherent in the metrics. Specific applications to the decoding of binary linear block codes and binary linear convolutional codes are presented.

3.1 Tree/Trellis Decoding Algorithms

Full searches of a code tree consider all paths through the tree and then select the ML path. The VA was introduced for decoding convolutional codes [12] and it exploits the trellis structure [11]. Its decoding efficiency arises from the observation that of any heads that share the same tail(s), only the single best of these heads needs be considered further into the trellis. The VA extends heads to the next depth of the trellis, and then retains only the single best head for each state.
In this way, the VA discards any path as soon as it has proven to be inferior to some other path. This yields the same decoder output as if all paths through the trellis were considered. The VA has also been used in several other applications [11][28] besides decoding of convolutional codes, including the decoding of linear block codes, since they may be represented by a trellis [15][8][9][16].

Several other search algorithms have been used that may partially explore the tree or trellis. They are usually classified as breadth-first, depth-first, and metric-first type searches [29].

The VA and the M algorithm [30][31] are examples of breadth-first searches. M algorithm decoding on a trellis proceeds just as the VA at each depth, except that only the best M states are retained instead of the full trellis width.17 This pruning used by the M algorithm will result in an increased probability of error relative to the VA. Some other breadth-first algorithms are similar to the M algorithm except that the number of heads retained is dynamic; a head is retained until its metric differs from the current 'best' head by a certain amount [32]. This approach can lower the average computational effort at the expense of a variable decoding delay.

17 For use with very long sequences of convolutionally encoded data, the path storage length is truncated to L symbols, and the algorithm is then referred to as the (M,L) algorithm.

An example of a depth-first search is the Fano algorithm [33], which pursues a single head into the tree until its metric falls below an adaptive threshold. When a candidate head fails in this manner, the algorithm backtracks into the tree to follow another head. The Fano algorithm thus makes a variable number of attempts at following an acceptable head.

The third class of decoding algorithms are the metric-first approaches, or stack decoding algorithms. Stack decoding algorithms maintain lists of candidate heads of various lengths ranked according to their metrics. In the Zigangirov-Jelinek stack algorithm [34][35] the search is carried out by extending the highest ranked head to form new heads, then merging these metrics with those already ranked. There are several variants of the stack algorithms [29], which in general attempt to reduce the variability of the decoding effort and the probability of erasing the correct path.

3.2 A Description of Partial Trellis Searches

Pruning decoders begin operation at the root node of the trellis, with no heads as yet explored. The search begins by extending heads from the root node and computing their metrics. Using these heads and metrics the decoder then decides which heads to retain for possible further consideration, and which heads to discard. The heads that are compared are referred to as contenders. Those contenders that are retained are referred to as survivors. The action of selecting the contenders to be retained is referred to as contender sifting. The discarded heads will be referred to either as defeated or pruned, and the distinction between these is as follows. A defeated head is one that is discarded from further consideration due to the fact that a test of its metric and state is sufficient to prove that it will have a poorer path metric relative to at least one other path.
A pruned head is one that is also discarded from further consideration, but the test of its metric and state used to decide to discard it is insufficient to prove that it will have a poorer path metric relative to at least one other path.

The decoding search is performed in a sequence of stages, as shown in Figures 3.1 and 3.2. Each stage consists of the following steps:

1. Extension of some or all of the previous stage's survivors, to form contenders.
2. Updating of the metrics for the contenders.
3. Determination of the heads to be defeated.
4. Determination of the heads to be pruned.

It will be useful to view the decoder as acting on sets of heads or on corresponding sets of codewords, as described below.

Since each head corresponds to one or more codewords, we may view the decoder as acting on the set of all q^k codewords, as shown in Figure 3.1. At the input to the first stage, all q^k codewords are contenders, and by the end of the stage some may have been defeated and some others may have been pruned. The survivors of stage 1 become the contenders for stage 2, and the process is repeated. Thus, the decoder progressively reduces the number of survivors stage-by-stage, with a single survivor at the end of the final stage being the decoder's output.

Figure 3.2 represents the decoder as acting on sets of heads, instead of acting on sets of codewords as in Figure 3.1. Some or all of the survivors of a previous stage are extended to form contenders. These contenders are then considered by the decoder for possible elimination.

In Figure 3.1, the set of contender codewords for stage m is denoted S_m. S_Dm and S_Pm denote the sets of defeated and pruned codewords, respectively, at stage m. These sets of codewords are represented by corresponding sets of heads in Figure 3.2, and are denoted S^h_m, S^h_Dm, and S^h_Pm, respectively.

The breadth-first, depth-first, and metric-first decoding algorithms discussed in §3.1 can be described by this model of partial trellis searches as follows. In a breadth-first decoder, each stage begins with extending all survivors, typically one branch further into the trellis, to form contenders of equal length. Contenders may be defeated and pruned at each stage, and there are m_final = n stages in total. In contrast, the depth-first and metric-first algorithms do not extend all survivors at each stage. For example, the stack algorithm or the Fano algorithm extend only a single survivor per stage, and the total number of stages m_final is a random variable. We can view such backtracking searches as not pruning at any stage except the final stage.

Figure 3.1 Representation of Pruning Decoder Operation as Acting on Sets of Codewords. The decoder begins with the set of all q^k codewords and progressively reduces the number of surviving codewords stage by stage (stage 1, stage 2, ..., stage m_final), with a single final survivor. S_m is the set of contender codewords for stage m. S_Dm and S_Pm are the sets of defeated and pruned codewords, respectively, at stage m.

Figure 3.2 Representation of Pruning Decoder Operation as Acting on Sets of Heads. The shaded region within each stage m represents the formation of the contenders S^h_m from the previous stage's survivors.
The defeated heads S^h_Dm and the pruned heads S^h_Pm are chosen from the contenders S^h_m. The set S^h_Dm consists of all heads whose descendants are the set of codewords S_Dm in Figure 3.1. Similarly, S^h_Pm corresponds to the set of pruned codewords S_Pm in Figure 3.1.

3.3 Decoders Constrained by Pruning

3.3.1 A General Decoding Metric and its use in Trellis Decoding

In this section we assume that we are given some survivor extension method; i.e. we are given the decoder's method for extending the set of survivors from the previous stage. We find a decoding metric and show how it should be used in trellis decoding in order to minimize the probability of error, given the fact that the decoder is constrained to use pruning. We do not completely specify how to carry out the pruning in terms of, for example, the number of heads to prune; that and the choice of the survivor extension method can be separated from the choice of an appropriate metric.

The discussion in this section is divided into five steps labeled (i)-(v), with the following organization. Step (i) expresses the probability of error in terms of discarding codewords from a list of all codewords, as illustrated in Figure 3.1. Step (ii) incorporates into the expression for the probability of error the fact that multiple codewords are discarded per head, as illustrated in Figure 3.2. Step (iii) discusses the constraint on the minimization of the probability of error imposed by the fact that the codeword likelihood information is incomplete (since only part of the trellis has been explored). We use a model of the decoder's knowledge of the branch-likelihoods on the unexplored tails that specializes Massey's 'random-tail' approach [26]. Step (iv) discusses the constraint that the pruning of heads is carried out stage-by-stage. Finally, Step (v) finds an equivalent form of the decoding metric that involves only quantities computed over the explored portion of the trellis.

(i) Probability of Error

Since the action of the decoder is to discard codewords in multiple stages, as shown in Figure 3.1, it is natural to express the probability of error as the sum of contributions from each stage. First, the probability of correct decoding is

    Pr(correct) = ∫_R Pr(correct | r) f(r) dr        (3.1)

where Pr(correct | r) = Pr(c_out = c_transmitted | r), with c_out being the codeword released by the decoder. We also use the notation Pr(c | r) to denote the probability that the codeword c was transmitted given that r was received. Now, using (3.1) and the fact that Σ_{c ∈ C} Pr(c | r) = 1, we may write

    Pr(error) = 1 − Pr(correct)
              = ∫_R f(r) dr − ∫_R Pr(c_out | r) f(r) dr
              = ∫_R Σ_{c ∈ C} Pr(c | r) f(r) dr − ∫_R Pr(c_out | r) f(r) dr
              = ∫_R Σ_{c ∈ C, c ≠ c_out} Pr(c | r) f(r) dr.        (3.2)

Equation (3.2) expresses the probability of error as the expectation, over all possible demodulator outputs r ∈ R, of the probability that any unchosen codeword given r is correct.

All of the codewords appearing in the sum in (3.2) are by definition either defeated or pruned. Hence we may split the sum in (3.2) to give

    Pr(error) = ∫_R [ Σ_{c ∈ S_D} Pr(c | r) + Σ_{c ∈ S_P} Pr(c | r) ] f(r) dr.        (3.3)

The integral of the terms in the first sum in (3.3) is the contribution to the error probability from decoding trials in which a defeated codeword was in fact correct. Similarly, the integral of the terms in the second sum in (3.3) is the contribution to the error probability from decoding trials in which a pruned codeword was correct.
To minimize (3.3), it is sufficient to minimize its integrand for each r, which can be written as

    Σ_{c ∈ C} [ I(c ∈ S_D) + I(c ∈ S_P) ] Pr(c | r) f(r)        (3.4)

where I(·) is unity if its argument is true and is zero otherwise. As well, since the decoder discards disjoint sets of codewords at each stage, we can rewrite (3.4) as

    Σ_{m=1}^{m_final} Σ_{c ∈ C} [ I(c ∈ S_Dm) + I(c ∈ S_Pm) ] Pr(c | r) f(r).        (3.5)

(ii) Multiple Codewords Discarded per Head

When the decoder discards a head it consequently discards all of the codewords that descend from the head, so that the decoder is not free to choose individual codewords to discard in minimizing (3.5).18 Hence, the set of discarded codewords S_Pm + S_Dm in (3.5) consists of all descendant codewords of the pruned heads S^h_Pm and of the defeated heads S^h_Dm. To incorporate this fact into (3.5) we first note that for any codeword,

    Pr(c | r) f(r) = f(r | c) Pr(c)
                   = f(r^h, r^t | c^h, c^t) Pr(c^h, c^t)
                   = f(r^h | c^h, c^t) f(r^t | r^h, c^h, c^t) Pr(c^h) Pr(c^t | c^h)
                   = f(r^h | c^h) f(r^t | c^t) Pr(c^h) Pr(c^t | c^h)        (3.6)

where in the final step we have used the fact that the channel is assumed to be memoryless.19

18 See also Figure 3.2.
19 Also, in (3.6), Pr(c^h) = Pr(c_1) Pr(c_2 | c_1) ··· Pr(c_l | c_{l−1}, ..., c_2, c_1) and Pr(c^t | c^h) = Pr(c_{l+1} | c_l, ..., c_2, c_1) Pr(c_{l+2} | c_{l+1}, ..., c_2, c_1) ··· Pr(c_n | c_{n−1}, ..., c_2, c_1).

Using (3.6) in (3.5), and since the discarded codewords at stage m are chosen from the descendants of S^h_m, we can replace Σ_{c ∈ C} in (3.5) by Σ_{c^h ∈ S^h_m} Σ_{∀ c^t of c^h}, where the last sum generates all tails c^t of a head c^h, to obtain

    Σ_{m=1}^{m_final} Σ_{c^h ∈ S^h_m} [ I(c^h ∈ S^h_Dm) + I(c^h ∈ S^h_Pm) ] f(r^h | c^h) Pr(c^h) Σ_{∀ c^t of c^h} f(r^t | c^t) Pr(c^t | c^h).        (3.7)

Equation (3.7) makes explicit the fact that the discard decisions are made at the heads and that all descendant codewords of each head are consequently discarded.

(iii) Optimization Constraint: Incomplete Codeword Likelihood Information

Minimization of (3.7) depends on what codeword likelihood information is available; some of the likelihood information will not be available due to the fact that the search is incomplete. The codeword likelihood information available at any stage can be considered to be split into two portions, with the first portion consisting of branch-likelihoods on the explored head, and the second portion consisting of less specific branch-likelihood information available on the unexplored tail.

To clarify this, let us first consider (3.6), and rewrite it as a product of terms that correspond to the head and tail, viz.

    [ f(r^h | c^h) Pr(c^h) ] [ f(r^t | c^t) Pr(c^t | c^h) ].        (3.8)

We have labeled the first term in square brackets in (3.8) as a function g1(r^h, c^h), and have labeled the second term as a function g2(r^t, c^t, c^h).

For an explored head c^h the decoder has computed f(r^h | c^h) Pr(c^h). In other words, for an explored head c^h the decoder has available the function g1(r^h, c^h) precisely.

For an unexplored tail c^t the decoder has not computed f(r^t | c^t) Pr(c^t | c^h). In other words, the decoder does not have available g2(r^t, c^t, c^h). We need to consider how the decoder should act given that it does not have the function g2(r^t, c^t, c^h) available precisely. We consider two cases. The first case is where the head c^h can be classified as defeated.
In this case the competing heads at a node share all their tails, so the decoder can defeat c^h without needing to know g2(r^t, c^t, c^h). This is precisely the basis used for discarding codewords in the Viterbi algorithm. The second case is where the head c^h cannot be classified as defeated. In this case the decoder is constrained to estimate the function g2(r^t, c^t, c^h).

The estimation of the function g2(r^t, c^t, c^h) depends on what information is available to the decoder with regard to the unexplored tail. For this we have to assume (and justify) a decoder model. In other words, we need to state what a priori and a posteriori information regarding an unexplored tail should be assumed available to the decoder. We also need to state how the decoder may be constrained in utilizing this information.

We will use a model of sequential decoding that is essentially that used in [26]. The decoder is considered to have no specific knowledge of the code symbol on an unexplored branch, but as a head is extended to explore the branch the decoder gains knowledge of the code symbol. (This code symbol is then used to look up the appropriate symbol metric for use in updating the head metric.)

To consider how the decoder will estimate g2(r^t, c^t, c^h) = f(r^t | c^t) Pr(c^t | c^h), we first recall that

    f(r^t | c^t) = Π_{i=l+1}^{n} f(r_i | c_i).        (3.9)

We assume that the decoder has computed (f(r_i | 0), f(r_i | 1), ..., f(r_i | q − 1)) for every symbol position i. In other words, we assume that the decoder has computed the symbol metrics for every symbol position. However, on an unexplored branch the decoder has (as yet) effectively no specific information as to which likelihood to use. This corresponds to an uncertainty as to which likelihood will be assigned to an unexplored branch. This uncertainty can be quantified and used to estimate f(r^t | c^t) as follows. Let us assume that the decoder has, in advance, utilized knowledge of the code and the a priori information-symbol probabilities to compute the probabilities {Q_i(j)}, j = 0, 1, ..., q − 1, of the occurrence of the channel-symbol j for position i. (For example, it is common to use linear codes with equiprobable information-symbols, in which case Q_i(j) = 1/q for j = 0, 1, ..., q − 1 and i = 1, 2, ..., n.) On an unexplored branch at position i, the decoder is then taken to be constrained to model symbol j as occurring according to the probability assignment Q_i(j).20 Consequently, the decoder will not be able to distinguish between different unexplored tails of the same length, since the information available (namely, {Q_i(j)} and {f(r_i | j)} for j = 0, 1, ..., q − 1 and i = l + 1, ..., n) is the same for any of them. Hence the decoder's estimate of f(r^t | c^t) is actually only a function of r^t (and the probability assignment Q_i(j)). Accordingly, we let f_0(r^t) denote the decoder's estimate of f(r^t | c^t).

20 This is essentially a specialization of the 'random-tail' approach used in [26]. We will later discuss the significance of using Q_i(j) versus other distributions.

We now show that, as a consequence of this model, f_0(r^t) can be interpreted and computed in a simple way. Consider the pdf of r^t, which is given by

    f(r^t) = Σ_{c^t} f(r^t | c^t) Pr(c^t)        (3.10)

where the sum includes every tail c^t of length n − l. Using f_0(r^t) as the decoder's estimate of f(r^t | c^t) in (3.10), we obtain an approximation to f(r^t) as

    Σ_{c^t} f_0(r^t) Pr(c^t) = f_0(r^t).        (3.11)

This shows that f_0(r^t) can be interpreted as the approximate pdf of r^t. Accordingly, we compute f_0(r^t) as

    f_0(r^t) = Π_{i=l+1}^{n} Σ_{j=0}^{q−1} f(r_i | j) Q_i(j).        (3.12)
The contribution to (3.7) if a head is discarded is

    f(r^h | c^h) Pr(c^h) Σ_{∀ c^t of c^h} f(r^t | c^t) Pr(c^t | c^h).        (3.13)

As discussed above, the decoder has computed f(r^h | c^h) Pr(c^h), and can estimate f(r^t | c^t) as f_0(r^t). Using this in (3.13) we obtain

    f(r^h | c^h) Pr(c^h) Σ_{∀ c^t of c^h} f_0(r^t) Pr(c^t | c^h) = f(r^h | c^h) Pr(c^h) f_0(r^t)        (3.14)

as the expected contribution to the error probability if a head is pruned.

For later convenience, we choose to write

    f_0(r^t) = Π_{i=l+1}^{n} f_0(r_i)        (3.15)

with each term in the product defined as

    f_0(r_i) = Σ_{j=0}^{q−1} f(r_i | j) Q_i(j).        (3.16)

(iv) Optimization Constraint: Stage by Stage Decoding

Here we consider that the decoder must discard heads stage by stage, as illustrated in Figure 3.2. The decoder is constrained to decide at each stage which heads to discard, i.e. the decoder is constrained to consider each stage independently of future stages. This corresponds to the decoder not being able to predict which heads will be discarded in future stages. In considering which heads to discard from S^h_m, the decoder should first discard those heads that can be classified as defeated, since (as discussed earlier) a defeated head's descendants will contribute less to (3.7) than its competitor's descendants. Next, in considering which heads to discard from those remaining (i.e. S^h_m − S^h_Dm), the decoder can minimize its expected contribution to (3.7) by using (3.14) as the expected contribution for pruning a head. In summary, the decoder minimizes

    Σ_{c^h ∈ S^h_Pm} f(r^h | c^h) Pr(c^h) f_0(r^t)        (3.17)

at each successive stage m, where S^h_Pm ⊂ S^h_m − S^h_Dm.

(v) An Equivalent Decoding Metric

The above results indicate that the heads to be pruned at each stage are those whose pdf f(r^h | c^h) Pr(c^h) weighted by f_0(r^t) is lowest. The effect of the weighting by f_0(r^t) is that if this 'expected tail' is quite likely, then further consideration of the head is favoured, since the unexplored branches occur frequently and hence may contribute significantly to the error probability. On the other hand, if this 'expected tail' is unlikely, then if the head is also unlikely then discarding the head will not significantly contribute to the error probability.

Instead of minimizing (3.17), we may equivalently minimize (3.17) divided by any positive quantity that is independent of each c^h. Suppose we choose to divide (3.17) by f_0(r^h) f_0(r^t), where f_0(r^h) is defined similarly as done for f_0(r^t) in (3.15), viz.

    f_0(r^h) = Π_{i=1}^{l} f_0(r_i) = Π_{i=1}^{l} Σ_{j=0}^{q−1} f(r_i | j) Q_i(j).        (3.18)

The decoder can now aim to minimize

    Σ_{c^h ∈ S^h_Pm} f(r^h | c^h) Pr(c^h) / f_0(r^h)        (3.19)

at each successive stage m, where S^h_Pm ⊂ S^h_m − S^h_Dm. From (3.19) the metric is now

    f(r^h | c^h) Pr(c^h) / f_0(r^h).        (3.20)

The advantage of (3.19) over (3.17) is that it only involves quantities computed over the explored portion of the trellis. Each head likelihood weighting has been normalized by f_0(r^h) f_0(r^t), which results in a weighting term involving only information available on the explored branches. Consequently, the decoder fits the usual definition of sequential decoding, i.e. decoding that sequentially examines branches of the trellis so that at any stage the decoder's choice of which heads to discard and to extend does not depend upon received branches deeper in the trellis. This definition of sequential decoding is equivalent to that of Jacobs and Berlekamp [36], except for our explicit allowance of trellis decoding in addition to tree decoding. The decoder is sequential in that the computation of its metric does not depend on received branches deeper in the trellis, although we have utilized some knowledge of the unexplored branch likelihoods in the metric derivation.
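To make the normalization in (3.16)-(3.20) concrete, the following sketch evaluates the metric (3.20) in the log domain for a head on a discrete memoryless channel, given per-symbol likelihoods f(r_i | j) and precomputed channel-symbol probabilities Q_i(j). It is only an illustrative sketch: the binary example values, the priors, and the function names are our own assumptions, not the thesis's.

```python
import math

def f0(sym_liks, Q):
    """f_0(r_i) of (3.16): the prior-weighted average symbol likelihood."""
    return sum(sym_liks[j] * Q[j] for j in range(len(Q)))

def log_head_metric(head, liks, Qs, log_prior_head=0.0):
    """log of the pruning metric (3.20) for an explored head.

    head : code symbols c_1 .. c_l
    liks : liks[i][j] = f(r_{i+1} | j), the symbol likelihoods at position i+1
    Qs   : Qs[i][j]   = Q_{i+1}(j), the channel-symbol probabilities
    log_prior_head : log Pr(c^h); zero is adequate when codewords are
                     equiprobable and equal-length heads are being compared
    """
    m = log_prior_head
    for i, c in enumerate(head):
        m += math.log(liks[i][c]) - math.log(f0(liks[i], Qs[i]))
    return m

# Binary example: two positions, equiprobable symbols, made-up likelihood
# pairs (f(r_i|0), f(r_i|1)) from some soft-output demodulator.
liks = [(0.8, 0.1), (0.3, 0.4)]
Qs   = [(0.5, 0.5), (0.5, 0.5)]
for head in ([0, 0], [0, 1], [1, 1]):
    print(head, round(log_head_metric(head, liks, Qs), 3))
```

For equal-length heads with equiprobable symbols, the f_0 term is common to all contenders, consistent with the simplification made for breadth-first decoding in the next subsection.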
Thisdefinition of sequential decoding is equivalent to that of Jacobs and Berlekamp [36], except for ourexplicit allowance of trellis decoding in addition to tree decoding. The decoder is sequential in thatthe computation of its metric does not depend on received branches deeper in the trellis, althoughwe have utilized some knowledge of the unexplored branch likelihoods in the metric derivation.3.3.2 DiscussionEqn. (3.19) simply states that the decoder should, during each stage, first update the contendermetrics using (3.20), then discard those contenders that can be defeated, and then prune only thepoorest of the remaining contenders. In other words, the decoder should discard contenders ina rather obvious manner, by first exploiting merges in the trellis and then by pruning contenders'worst-first'. However, the metric (3.20) differs slightly from that used previously, as we nextdescribe.In [26] and the related discussion in [37], sequential decoding is modeled as being equivalentto the decoding of variable length codes that have been appended with randomly coded tails. The64Chapter 3. Trellis Decoding with Pruning^ 3.3 Decoders Constrained by Pruningsequential decoder's knowledge of the unexplored codeword tails is thus modeled as if randomlychosen code symbols were used. In modeling the decoder's knowledge of the unexplored codewordtails, our approach specializes the use of random code tails as in [26][37], by assuming thatthe decoder has precomputed the channel-symbol probabilities that are determined by the codeand information-symbol distribution used. This results in some minor differences between our"normalizing" term f0(7.,) given by (3.16) and the corresponding term denoted Po(y) in [261137].Specifically, from [261[37, pg 45],P0(y ) = Pr( Yi j)Qrandom (i) (3.21)where yi is a DMC output and where Qrandom(j) is a random-coding probability assignment for achannel-symbol to take on the value j. From (3.16)fofrz) = Pri I j) (2.(j) (3.22)where Q(j) is the channel-symbol probability distribution for channel symbols at depth i takingon the value j, for the specific code and the specific information-symbol distribution used. Thetwo distributions, Qrand„,,(j) and Qz(j), differ in two minor respects. First, 0 random (i) is nota function of the symbol position, while Q(j) accommodates differing probability assignmentsfor different positions. An example of where the two distributions will differ in this regard isthe case of unequal information-symbol probabilities. Second, 0 random (i) is a 'random-coding'probability distribution in the sense that it is usually taken [37, pg 451121 to be the maximizinginput-distribution in the computation of Ro [3]. ()random (j) will then be determined only by thechannel. This differs from Q(j) which is determined, not by the channel, but by the specific codeand information-symbol distribution used. However, for the typical case of a symmetric DMC [2](or a binary-input DMC [371), we will have Qrandom(j) 1 lq, = 0,1, ... ,q — 1, which will equalQ,(j) for the (also typical) case of linear codes with equiprobable codewords. Atypical decoding65Chapter 3. 
situations that involve non-linear codes, or non-symmetric, non-binary-input DMCs, or unequal a priori information-symbol probabilities, can result in differences between these two distributions. For most coding applications the metric (3.20) is identical to that used in [26][37].

3.3.3 Breadth-First Decoding

Here we consider breadth-first decoding with equiprobable codewords, for which the metric (3.20) can be simplified and put into a more convenient form. This breadth-first metric will be used extensively in the next chapter.

In breadth-first decoding, heads are compared for pruning when they are of equal length. In minimizing (3.19) the decoder can then equivalently minimize

    Σ_{c^h ∈ S^h_Pm} f(r^h | c^h)        (3.23)

at each stage, since for equiprobable codewords and equal-length heads we can ignore the common terms Pr(c^h) and f_0(r^h).

To choose heads that will minimize (3.23), we can also compare heads in the following manner. First, a head metric f(r^h | c^h) can be compared to other head metrics by favouring to prune first those heads that have the largest 1/f(r^h | c^h). As well, we can multiply this by any positive factor that is independent of c^h. If we choose to multiply by f(r^h | y^h), where y is the vector of hard-decisions, we obtain an alternative metric

    f(r^h | y^h) / f(r^h | c^h).        (3.24)

Using the usual factoring of the pdfs and taking logarithms, we obtain

    Σ_{i=1}^{l} log [ f(r_i | y_i) / f(r_i | c_i) ].        (3.25)

It will be useful to define the symbol likelihood-distance as

    D(y_i, c_i) = | log [ f(r_i | y_i) / f(r_i | c_i) ] |.        (3.26)

(In (3.26) we have included the absolute value signs only so that D(y_i, c_i) = D(c_i, y_i).) Note that D(y_i, c_i) ≥ 0, and that equality is attained for c_i = y_i. Using (3.26) we see that (3.25) can be written as

    D(y^h, c^h) = Σ_{i=1}^{l} D(y_i, c_i)        (3.27)

where D(y^h, c^h) will be referred to as the head likelihood-distance. For heads of length n we obtain a codeword likelihood-distance D(y, c).

Where quantization is used, the symbol likelihood-distance is as defined in (3.26), but with the pdf f(r_i | ·) replaced by the probability Pr(Q(r_i) | ·), where Q(r_i) is the discrete random variable resulting from the quantization of r_i. For the case of hard-decision quantization on a binary-input symmetric-output memoryless channel to form a BSC, it is easily shown that the codeword likelihood-distance D(y, c) reduces to a form that is equivalent to the Hamming distance d_H(y, c).

From the above discussion, we have that the decoder should prune those heads with the largest likelihood-distance D(y^h, c^h), after defeating any heads that merge at a node in the trellis. From the above discussion on quantization, it is easy to see that this rule of first pruning the heads with the largest likelihood-distance is a generalization of the path deletion axiom [38], which states that whenever heads of the same length are being considered for pruning on a BSC, the one with the greatest Hamming distance to the received path is pruned first.
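As a concrete illustration of this breadth-first metric, the sketch below computes the symbol likelihood-distance (3.26) and the head likelihood-distance (3.27) for soft received values. The BPSK mapping, the AWGN channel with variance sigma^2, and the function names are our own illustrative assumptions; the thesis states the metric for a general memoryless channel.

```python
import math

def symbol_likelihood_distance(r, c, sigma=1.0):
    """D(y_i, c_i) of (3.26) for BPSK (bit b -> signal 1 - 2b) on AWGN;
    y_i is the hard decision on the received value r."""
    def lik(bit):
        s = 1.0 - 2.0 * bit
        return math.exp(-(r - s) ** 2 / (2.0 * sigma ** 2))
    y = 0 if r >= 0 else 1                      # hard decision
    return abs(math.log(lik(y) / lik(c)))       # zero when c equals y

def head_likelihood_distance(r_head, c_head, sigma=1.0):
    """D(y^h, c^h) of (3.27): sum of per-symbol likelihood-distances."""
    return sum(symbol_likelihood_distance(r, c, sigma)
               for r, c in zip(r_head, c_head))

# Hard decisions for r = [0.9, -0.2, 1.4] are y = [0, 1, 0]:
print(head_likelihood_distance([0.9, -0.2, 1.4], [0, 1, 0]))  # 0.0 (agrees with y)
print(head_likelihood_distance([0.9, -0.2, 1.4], [0, 0, 0]))  # 0.4 (one soft disagreement)
```

As noted in the text, under hard-decision quantization on a BSC the per-symbol penalty becomes a constant for each disagreeing position, so this ordering of heads coincides with ordering by Hamming distance to the hard-decision vector.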
3.4 Contender Sifting

For partial trellis searches to be efficient, it is implicit that there be an efficient method for contender sifting. Contender sifting refers to the selection of some number of the best contender metrics from among a larger number of contender metrics. The most common approach is to sort the contenders according to their metrics, and then retain those with the best metric. Alternatively, selection of the best (without regard to order within the best set) can be used. As pointed out in [29][39], the computational effort associated with contender sifting is significant, so that it severely limits the number of survivors that can be retained. In the stack algorithm [35] and its variants (e.g. [40]), it is common to reduce the computational effort of contender sifting by using approximate sorting; the contender metrics are sorted into quantized bins (buckets). The sorting is approximate since the bin contents are not sorted (i.e. the sorting is not a complete bucket-sort [41]). In breadth-first decoding such as the M algorithm, most approaches have used either sorting [42][39][43] or selection [39][44]. We will concentrate on breadth-first decoders, and in particular the M algorithm, since it offers a decoding effort that is virtually constant, in contrast to metric-first searches [29].

We now illustrate the computational effort of various contender-sifting methods for the M algorithm. Much of our discussion follows that in [39]. Let N_c(i, n) denote the number of comparisons used in sifting out the best i from among n contenders. For typical decoding applications, such as rate 1/2 binary convolutional codes or binary linear block codes, it is appropriate to consider N_c(M, 2M). Figure 3.3 plots N_c(M, 2M) normalized by M, versus M, for various contender sifting methods. The specific methods shown in Figure 3.3 use insertion-sorting, merge-sorting, and Hadian and Sobel's selection algorithm [27].

The insertion-sort is particularly easy to implement but is suitable only for a very small number of inputs. Insertion sorting can be suitably modified to output only the best M of the 2M contenders. This simple modification and the calculation of the worst-case number of comparisons are discussed in Appendix A. The worst-case number of comparisons normalized by M is, from Appendix A,

    N_c(M, 2M)/M ≤ (3M − 1)/2        (3.28)

which is plotted in Figure 3.3.

Merge-sorting is considerably more efficient than insertion-sorting. The worst-case number of comparisons for merge-sorting is O(n lg n), where lg denotes log2 (versus O(n^2) for insertion-sorting), and merge-sorting is an asymptotically optimum comparison-based sort [27][41][45]. Merge-sorting can be suitably modified to output only the best M of the 2M contenders. This simple modification and the calculation of the worst-case number of comparisons, for M being a power of 2, are discussed in Appendix A, from which we obtain

    N_c(M, 2M)/M ≤ 2 lg M − 1 + 2/M.        (3.29)

Hadian and Sobel's selection algorithm uses

    min{ M + (M − 1)⌈lg(M + 2)⌉,  M − 1 + M⌈lg(M + 1)⌉ }        (3.30)

comparisons in the worst case to find the M-th best of the 2M [27, pg. 214]. Moreover, (3.30) gives the worst-case number of comparisons to partition the 2M contenders into the best and worst M, since any comparison-based selection (order-statistic) algorithm has carried out a sufficient number of comparisons to partition the input set into best and worst subsets [27][45]. For M a power of 2, the first term in (3.30) is the smaller,21 and this term normalized by M is

    N_c(M, 2M)/M = 1 + (1 − 1/M)⌈lg(M + 2)⌉,   M a power of 2        (3.31)

which is shown in Figure 3.3. Hence, Hadian and Sobel's selection algorithm takes O(M lg M) comparisons in the worst case, or about one-half the number of comparisons of merge-sort.

21 That the first term in (3.30) is smaller is verified as follows. For M = 2^i, i > 0, ⌈lg(2^i + 2)⌉ = ⌈lg(2^i + 1)⌉. Letting this quantity be denoted by A, we can write the first term in (3.30) as M + (M − 1)A and the second term as M − 1 + MA. The second term minus the first is A − 1, which is greater than or equal to zero for i > 0.
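The per-survivor comparison counts behind Figure 3.3 can be re-evaluated directly from (3.28), (3.29) and (3.31); the short sketch below does only that and adds the constant value claimed later for the proposed merge-sifting method. It is a numerical check under the stated power-of-2 assumption, not a new analysis, and the function name is ours.

```python
from math import ceil, log2

def normalized_comparisons(M):
    """Worst-case comparisons per survivor, N_c(M,2M)/M, for M a power of 2."""
    insertion = (3 * M - 1) / 2                         # (3.28)
    merge     = 2 * log2(M) - 1 + 2 / M                 # (3.29)
    selection = 1 + (1 - 1 / M) * ceil(log2(M + 2))     # (3.31), Hadian and Sobel
    proposed  = 1.0                                     # contender merge-sifting, (3.32)
    return insertion, merge, selection, proposed

for M in (4, 16, 64, 256):
    print(M, [round(x, 2) for x in normalized_comparisons(M)])
```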
The second term minus the first isA — I which is greater than or equal to zero for i > 0.69Chapter 3. Trellis Decoding with Pruning^ 3.4 Contender Siftingwhich is shown in Figure 3.3. Hence, Hadian and Sobel's selection algorithm takes 0(M lg M) =0(.1 lg 12) comparisons in the worst-case, or about one-half the number of comparisons of merge-sort.Also shown in Figure 3.3 are the number of comparisons for the best known selection algorithmsfor small M [27, Table 1, pg 2151. For asymptotically large M, selection schemes exist whichrequire a worst-case number of comparisons asymptotic to 10.86M [46] and 6M [47].In the following sections we introduce a contender sifting method for breadth-first decoding thatcan retain a sorted set of M survivors with a worst-case number of comparisons that is linear in M,for any size of M. As will be described, the proposed method exploits an inherent ordering of thecontender metrics. For decoding binary linear block codes, the proposed contender sifting methodhas a worst-case number of comparisons as shown as the lowest dotted line in Figure 3.3. For easeof description, the method will be presented for the M algorithm operating on a tree for a binarylinear block code. Extensions of the contender sifting method for use in decoding convolutionalcodes, and other cases are then discussed.3.4.1 Breadth-First Contender Sifting for Binary Linear Block CodesConsider a code tree for a binary linear block code, with one channel-bit per branch22. Forease of illustration we assume that likelihood-distance is used as the decoding metric.A typical stage in the operation of the M algorithm is illustrated in Figure 3.4. The decoderbegins with a set S of M survivors and extends each survivor to the next depth to form a set Cof 2M contenders, as shown in Figure 3.4(a) for the case of an information-bit. (The case of a22^Throughout this thesis, we make the natural assumption that there are no all zero columns in the code's generator matrix, so that there are no'wasted' symbol positions that carry no information.70Chapter 3. Trellis Decoding with Pruning 3.4 Contender SiftingFigure 3.3 Normalized Number of Comparisons for Sifting M from 2M ContendersThe number of comparisons Ne(M, 2M) (normalized by M) sufficient to sift out the best M of 2M contenders isplotted against M for various contender sifting methods. Simple modifications to insertion-sortingand merge-sorting were used to provide only the best M from 2M contenders. Comparison-basedselection (order-statistic) methods, such as Hadian and Sobel's shown here, inherently partition the2M contenders into the best M and the worst M. Also shown are the best known results for smallM (M = 2, 3, 4, 5). The dotted lines correspond to the proposed contender merge-sifting method.71Chapter 3. Trellis Decoding with Pruning^ 3.4 Contender Siftingconstraint-bit will be discussed later.) Standard contender sifting schemes operate on the list of2M contender metrics shown in Figure 3.4(b), with the result being a list of M survivors (whichmay be either sorted or unsorted).The efficiency of the proposed contender sifting method arises from a particular organization ofthe contenders within each decoding stage. Figure 3.5 illustrates a reorganization of the contendermetrics into two contender-subsets. The first subset, labeled Co in Figure 3.5, consists of the Mcontender metrics that were formed by extending each survivor with the '0' bit. 
The second subset, labeled C1, consists of the other M contenders that were formed by extending each survivor with the '1' bit. Of paramount importance is that each contender-subset is a sorted set, as shown in Figure 3.5. The ordering within the contender-subsets requires no sorting effort — the ordering is a consequence of our method for forming the contender subsets from an ordered set of survivors. For example, in Figure 3.5, since bit '0' happens to be the hard-decision, each contender's metric in C0 is identical to those in S (due to the fact that the likelihood-distance for the hard-decision bit is zero), and thus the ordering inherent in S is preserved in C0. In C1, each contender metric is formed by addition of an identical likelihood-distance to each survivor metric, so that the ranking inherent in S is also preserved in C1. This organization of contenders enables the decoder to maintain a sorted list by merging the two sorted contender-subsets. Merging of two ordered sets of size M is very simple compared to complete sorting of a single set of size 2M [27], as the merging can be accomplished with 2M − 1 comparisons. In fact, since we seek only the best M of the 2M, the decoder can accomplish this in M comparisons by partially merging the two contender-subsets until the required number M have been found. This partial merging can be done in M comparisons, so we have

N_c(M, 2M) ≤ M    (3.32)

which is the lowest dashed line in Figure 3.3.

Figure 3.4 Contender Formation and Sifting. (a) Formation of the Contender Metrics. (b) Contender Sifting using Sorting. The formation of the contender metrics is shown in (a) for extension of the survivors at depth i − 1 with an information-bit at position i. Only the branch likelihood-distances for position i are shown. The M = 8 survivor metrics at depth i − 1 are updated to form 2M = 16 contender metrics at depth i. In (b), the 2M contender metrics are shown in a list, which is then processed by the decoder to retain the best M. For simplicity we show complete sorting of the contenders, which is then followed by retention of the best half.

Figure 3.5 Contender-Subset Formation and Contender Merge-Sifting. (a) Formation of the Contender Metrics. (b) Contender Merge-Sifting, using Contender-Subsets and partial merging. In (a), during the extension of the survivors, the set of contender metrics is partitioned into two contender-subsets according to whether the extended bit is a '0' or a '1'. In (a), the '1' extensions are indicated with darker lines than the '0' extensions (we have assumed, without loss of generality, that the '0' bit is the hard-decision). In (b), these contender-subsets are shown as lists labeled C0 and C1. This choice of contender-subsets results in each subset being an ordered set, so that partial merging can be used instead of sorting.
The result is an ordered list of M survivors.74Chapter 3. Trellis Decoding with Pruning^ 3.4 Contender SiftingThe above discussion outlines the main principle of the proposed contender sifting method,which we shall refer to as contender merge-sifting (CMS). As will be discussed shortly, the basiccontender merge-sifting method can be applied for constraint-symbols, as well as for other decodingapplications.Decoding Systematic Binary Linear Block CodesThe basic contender merge-sifting method discussed above arms us with a technique that issufficient to implement M algorithm decoding of systematic binary linear block codes.For codes in systematic form the M algorithm need only sift contenders where the number ofcontenders exceeds M. Let /m denote the first depth where the number of contenders exceedsM. Contender sifting will then occur at depths 1114 through k. The M algorithm has no need tosift contenders for the final n — k depths, since the final 71 - k depths are all constraint depths(where each extension of M survivors results in Al contenders). To implement the M algorithmwe can utilize contender merge-sifting as follows. For the depths 1 through 1m — 1, we usecontender merge-sifting with complete merging of the contender-subsets, to maintain an sorted setof 21, 22, , 21A1-1 survivors at the depths 1, 2.... , /Ai — 1, respectively. This ensures that theset of survivors at depth 1A4 — 1 will be sorted, so that merge-sifting can be used at subsequentdepths. At depths 1M through k, the decoder uses contender merge-sifting with partial merging ofthe contender-subsets, as shown in Figure 3.5, to maintain a sorted set of M survivors. At depthsk + 1 through 71, no contender sifting is needed. The output codeword is chosen as the best amongthe Al final survivors.The total number of metric comparisons used for the CMS implementation of the M algorithm,for this systematic code case, can be upper bounded as follows. For the initial depths 1, 2, /A/ —1where the number of contenders is less than Al, the number of comparisons to maintain the ordered75Chapter 3. Trellis Decoding with Pruning^ 3.4 Contender Siftingset of 21,^survivors using complete merging is at most23where the depth 1m is given byIm_lE (2' - i) 2/m - im -1z=i(3.33)/m = [10g2 (M)^L (3.34)For the depths /m through k, the maintenance of the ordered set of Al survivors can be done usingpartial merging of the contender subsets, with M comparisons for each of these depths, so that114(15' — 1N1 +1)^ (3.35)comparisons can accomplish the contender-sifting for these depths. The total number of comparisonsis (3.33) plus (3.35), plus M —1 comparisons to choose the best metric at depth ii, givingN, < M (k + 2 — /m) + 21m — /m — 2.^ (3.36)A simpler upper bound on N, can be obtained by replacing the number of comparisons in thepreceding analysis for the depths 2,3, ....L1 — 1, with M. Now, as in the preceding analysis, atthe first depth we count one comparison, and to this we add the upper bound of M comparisons foreach of the depths 2, 3, ... ,k, and lastly add in the M — 1 final comparisons at depth it, to give< 1 + M(k —1) + (M —1)= Mk.^ (3.37)23^The reason that (3.33) is an upper bound is that we have assumed that each of the first 1Af — 1 positions is an information position. This is anatural assumption since typically M is small, and for M < 2dm,n —I each of the first /Ai — 1 positions are certainly information-positions.76Chapter 3. 
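To make the above procedure concrete, the following sketch (Python; variable names are illustrative and the routine is a simplified rendering, not the thesis implementation) shows the partial-merging step of contender merge-sifting at an information-position, assuming the likelihood-distance metric and, without loss of generality, that the '0' extension is the hard-decision.

def contender_merge_sift(survivors, d_one, M):
    # survivors: list of (metric, head) pairs sorted by increasing head
    #            likelihood-distance (best first); each head is a list of bits.
    # d_one:     branch likelihood-distance added by extending with a '1'
    #            (the '0' extension is the hard-decision, so it adds zero).
    # The two contender-subsets are sorted automatically: C0 repeats the
    # survivor metrics and C1 adds the same constant to each of them.
    c0 = [(m, head + [0]) for (m, head) in survivors]
    c1 = [(m + d_one, head + [1]) for (m, head) in survivors]

    best, i, j = [], 0, 0
    # Partial merge: each retained contender costs at most one comparison,
    # so at most M comparisons are used, as in (3.32).
    while len(best) < M and i < len(c0) and j < len(c1):
        if c0[i][0] <= c1[j][0]:
            best.append(c0[i]); i += 1
        else:
            best.append(c1[j]); j += 1
    while len(best) < M and i < len(c0):   # one subset exhausted early
        best.append(c0[i]); i += 1
    while len(best) < M and j < len(c1):
        best.append(c1[j]); j += 1
    return best                            # sorted list of the best M survivors

At constraint-positions before the last information-position the two subsets can hold unequal numbers of contenders and are completely merged rather than partially merged, as discussed next for non-systematic codes.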
Trellis Decoding with Pruning^ 3.4 Contender SiftingDecoding Non-Systematic Binary Linear Block CodesFor decoding non-systematic codes the M algorithm again carries out contender sifting onlyat the information positions. To implement the M algorithm the decoder can utilize contendermerge-sifting in a similar manner as above, except that it maintains the sorted survivor set untilthe last information-position. Let the last information-position be denoted as K. To maintain thesorted survivor sets at constraint-positions before K, contender merge-sifting works in a fashionsimilar to that shown in Figure 3.5, in that the contender-subsets are defined in the same way,but the difference being that there are no longer M contenders in each subset. The number ofcontenders in each subset may vary from 0 to M, with a total of M contenders shared betweenthem. Since in this case we want to retain all M contenders, the decoder should completely mergethe contender-subsets.The number of comparisons used in contender merge-sifting at a constraint-position can beupper-bounded as follows. We use the same merging method as was used for the information-positions (where there were equal numbers of contenders in each contender-subset). The largestnumber of comparisons will occur for the case where there is only a single contender in one subset,say in C1, and its metric is higher than any of the Al — 1 metrics in Co. Thus, the largest numberof comparisons will be Al — 1. This will account for an additional (K — k)(M — 1) comparisonsover that of the systematic case, given by (3.36), so that we haveAre < M(K 2 — /m ) 21m — (K — k)— lm — 2.^(3.38)For any linear block code with minimum distance 4,„„ the maximum value that Ar can have is77 — (din,„ —1)24, so that an upper bound on (3.38) isAre < M(71 + 3 —^z — 1M ) 2 1M —^— k^n ) 1A,/ 324^This follows from the fact that any d,,, — ] symbols of a linear block code can be chosen as panty symbols.77Chapter 3. Trellis Decoding with Pruning^ 3.4 Contender Sifting^= M(k + A + 2 — /m) + 21m — 1m — 2 — A^ (3.39)where A A 72 - k +1 — dmi„ is the amount by which clnii„ is less than the Singleton bound.Alternatively, a simpler (but looser) upper bound can be formed by adding in the extra(K — k)(M — 1) comparisons (arising from the non-systematicity) to the upper bound for thesystematic case (3.37), to obtain^< Mk + (K — k)(M — 1).^ (3.40)Finally, using the fact that K < n — (dmi„ — 1) in (3.40) givesN, < Mk + (M — 1)[n, — k — (dmin —1)]= Mk + (M — 1)A^ (3.41)as a simple upper bound on the number of comparisons to decode any binary linear block codeusing contender merge-sifting to implement the M algorithm.Let us now return to Figure 3.3, and consider the worst case number of comparisons used incontender merge-sifting in decoding any binary linear block code. From our earlier discussion ondecoding systematic codes, we had that there were at most M comparisons at any depth. From thediscussion on decoding non-systematic codes, we had that there were at most M — 1 comparisonsat any constraint position. Hence, the maximum number of comparisons at any depth is upperbounded by M, so that the lowest dotted line in Figure 3.3 is an upper-bound on the normalizednumber of comparisons at any depth, for M algorithm decoding of any binary linear block code.78Chapter 3. Trellis Decoding with Pruning^ 3.4 Contender Sifting3.4.2 Rate 1/n Binary Convolutional CodesHere we show how contender merge-sifting can be applied to breadth-first tree decoding of rate1/n binary convolutional codes. 
We will first consider rate 1/2 binary convolutional codes generatedby an encoder of the form shown in Figure 3.6.In Figure 3.6, each information-bit is shifted into the encoder shift register to produce twochannel-bits. The shift register connections to the two sets of adders are specified by the generatorpolynomials 91(x) = 910 + gii x gimxm and 92(x) = 920 + 921 + • • + g27ix"' . Theconvolutional code trellis will have .s = 2' states, wherem = max { deg [gi (x), g2 (x)] 1 (3.42)is referred to as the constraint length of the code [48]. For example, constraint length 6 (64 state)codes are widely used in practice, with VLSI Viterbi algorithm decoders available.Figure 3.6 A Rate 1/2 Convolutional Encoder79Chapter 3. Trellis Decoding with Pruning^ 3.4 Contender SiftingA rate 1/2 binary convolutional code tree for the encoder in Figure 3.6 is almost invariablydrawn with 2 channel-bits per branch. An alternate way to view the tree is to redraw each 2—bitbranch as a pair of consecutive single-bit branches. In order to implement the M algorithm usingcontender merge-sifting, we first consider how the M algorithm's operation on the original tree canbe viewed on the new tree. On the original tree the M algorithm prunes after each 2—bit branch25.This is equivalent, on the single-bit per branch tree, to not pruning after the first bit of the pair,and then pruning after the second bit. This can be accomplished using contender merge-sifting, asfollows. Using the first bit of the pair, the decoder forms contender-subsets and merges them toretain all of the 2M contenders as survivors. At the second bit of the pair, contender-subsets areformed and partially merged to retain the best M contenders as survivors.The number of comparisons used in this sifting procedure is the sum of the 2M— 1 comparisonsfor the complete merging, plus the M comparisons in the worst case for partial merging, givingIV, <2M —1 + M= 3M —1 . (3.43)The number of comparisons normalized by M is then NciM < 3 — 1/M which is shown inFigure 3.3. It can be seen that CMS is very efficient compared to the other schemes, for all butthe smallest M.The above procedure generalizes for rate 1/n codes as follows. First, the n-bit per branchtrellis is viewed as a concatenation of n single-bit branches. At the first bit, complete merging ofthe two contender-subsets of size M is used to retain 2M sorted contenders. At each of the nextn — 2 bits, complete merging of contender-subsets is again used, to retain 2M sorted contenders.25^For simplification, we assume that the decoder depth is not near either end of the tree or trellis, where no pruning is required. For the initialdepths of the tree or trellis (where the number of contenders is less than M)the decoder can use complete merging as discussed for the binary blockcode case. For the final depths of the tree or trellis (where there are only constraint positions) no sifting is required.80Chapter 3. Trellis Decoding with Pruning^ 3.4 Contender SiftingAt the final bit (of the n) partial merging of the contender-subsets is used to retain only the bestM as survivors. The number of comparisons used in this n-step sifting procedure is then< (2M — 1) + (n — 2)(2M — 1) + M= (n — 1)(2M — 1) M= M (2n — 1) — (a — 1)^ (3.44)to maintain a sorted survivor list at each depth of the original n—bit per branch trellis.3.4.3 DiscussionIt should be clear that CMS is not restricted to linear tree codes. 
The essential requirement touse CMS is that the contender metrics be partitioned into sorted subsets, so that merging can beused. This in turn requires branches at the same depth of the tree to share a pool of branch metrics.Non-linear codes can have an unequal number of contenders in each contender-subset, which issimilar to the case already considered for constraint-positions of linear codes.We emphasize that although we have used the head likelihood-distance as the decoding metric,the use of CMS is not restricted to this metric. Any metric can be used provided that the contendermetrics can be partitioned into sorted subsets. As stated above, this requires branches at the samedepth of the tree to share a pool of branch metrics. For example, it is common to use the metricslog [f (r 0)] and log [f (r I 1)] when decoding binary block codes, and each branch at depth i ofthe tree utilizes one of these two metrics.With regard to trellis decoding, we comment that if so desired, the merging of contenders attrellis nodes can be used with contender merge-sifting. Competition among heads that merge at thetrellis nodes will reduce the number of entries in the contender-subsets, and this does not destroythe ordered property of the each contender-subset.81Chapter 3. Trellis Decoding with Pruning^ 3.4 Contender SiftingContender merge-sifting is readily applicable to breadth-first bidirectional decoding of block orconvolutional codes. In bidirectional searches, a block (or convolutional code frame ) of symbolsis searched from both ends of a trellis, or from two code trees [4911501151].Finally, we comment that CMS is applicable to the important class of punctured convolutionalcodes. Perhaps the most common example of such a code is that of puncturing (i.e. not transmitting)certain bits in a rate 1/2 code, which generates a code of rate greater than 1/2 [18]. Puncturedconvolutional codes enable the use of one encoder and decoder (with very minor alterations) for arange of code rates. This is especially useful in automatic repeat-request (ARQ) systems employingincremental redundancy [20], where rate-compatible punctured convolutional codes are attractive[19]. In such applications CMS can be easily utilized, since at a punctured bit the relative metricsof the contenders will be unchanged. Consequently, the formation and merging of the contendersubsets is simplified and will require no metric comparisons.82Chapter 4Reconfigurable Trellis DecodingTo find an efficient technique which, given a received word, can generate a set ofcodewords that will with high probability contain that word which is most likely: Thisis the central problem.GD. Forney [5]THE efficiency of partial trellis searches arises from the judicious selection of a small set ofsurvivors. If a sufficient number of survivors are wisely chosen, the increase in probabilityof error over that of a ML decoder will be small. In this chapter we introduce a class of techniquesreferred to as Reconfigurable Trellis Decoding (RTD), or RT decoding, that can retain relativelyfew survivors at the cost of a negligible increase in error probability.RT decoding is not simply a partial trellis search algorithm. In fact it can utilize well knownpartial tree or trellis search algorithms. The essence of RT decoding is to carry out the search usinga different trellis for the code. The trellis used for decoding is dependent upon the reliability of83Chapter 4. Reconfigurable Trellis Decodingthe received data, so that it is determined 'on-the-fly'. 
The trellis used is a 'reconfiguration' ofthe transmitted code trellis, in that it corresponds to a code with its symbol positions reorderedaccording to their reliabilities.In Section 4.1 we review non-trellis based and trellis based soft-decision decoders for linearblock codes. In Section 4.2 we explain the concept of RT decoding in detail. The number ofsurvivors required to attain 'near-ML' performance will depend on the channel and the code. Onsoft-decision channels such as the AWGN channel with binary PSK, there can be a significantreduction in the number of survivors required. An even larger reduction is attained for an erasurechannel, where c/n„„ — 1 or fewer erasures can always be corrected for any linear block code withonly a single survivor.In Section 4.3 we extend the Uniform Error Property [52] of ML decoding of linear codes on abinary-input symmetric-output memoryless channel to the more general case of pruning decoding.This result is used in later sections.In Section 4.4 we introduce the pruning impact, which is the increase of the probability oferror of a pruning decoder with respect to a ML decoder. An upper bound on the pruning impactis found for use in later sections.In Section 4.5 the pruning impact of RT decoding using the M algorithm is estimated. An orderstatistic channel (OS C) model is introduced and used to facilitate the assessment of the pruningimpact. The pruning impact of the RT-M algorithm is found using a simple search procedure thatis independent of the detailed structure of the code. Examples are given to compare the estimatedword error rate of the RT-M algorithm to simulation results.In Section 4.6 the computational effort of RT-M decoding is discussed. Tables of formulas are84Chapter 4. Reconfigurable Trellis Decoding^4.1 Soft-Decision Decoders for General Linear Block Codespresented that give the number of metric operations (comparisons, additions) and the number ofbinary-vector operations used in carrying out the search. Plots of the number of operations againstcoding gain are presented for some example codes on an AWGN channel with binary PSK and ona Rayleigh faded AWGN channel with binary NCFSK.In Section 4.7 we review some of the main results of the previous sections and discuss RTdecoding in terms of its resiliency to bursts of errors. Some other aspects of RT decoding arebriefly reviewed, including its decoding delay and its computational overhead.4.1 Soft-Decision Decoders for General Linear Block CodesThe majority of soft-decision decoding schemes for block codes do not utilize trees or trellises.Here we discuss some of the main schemes that have been proposed for ML decoding, or 'near-ML'decoding, of linear block codes. One of the earliest ML soft-decision decoding algorithms is Wagnerdecoding [53], which is applicable only to the simple (n, n — 1) single parity check codes. TheWagner decoding algorithm consists simply of inverting the least reliable bit if the initial check onparity was not met. Note that hard-decision decoding of this code cannot correct any errors, whilesoft-decision Wagner decoding on the AWGN channel with binary PSK will approach d„„„ —1 1bit error correction with increasing SNR [6].In 1966, Forney introduced Generalized Minimum Distance (GMD) decoding [5][10], whichprovided a way to utilize an errors and erasures decoder for soft-decision decoding. In GMD thereceived word is modified by erasing a number of the least reliable symbols, and it is then fed toan errors and erasures decoder to generate a candidate codeword. 
Several candidate codewords aregenerated (at most about [dm,„/2]) by erasing a successively larger number of the least reliablesymbols. This procedure is also referred to as successive erasures decoding (SED) [54]. The85Chapter 4. Reconfigurable Trellis Decoding^4.1 Soft-Decision Decoders for General Linear Block Codes'generalized distance' is used in a stopping rule to halt the generation of candidate codewords,and can thus lower the average decoding effort. Various modifications have been suggested to theGeneralized Distance metric and stopping rule, including [54][551156].In 1972, Chase [6] introduced a method that utilizes a hard-decision decoder to generatecandidate codewords. A candidate codeword is generated by first perturbing some bits of thereceived word and then feeding it to a hard-decision decoder. By using a number of suchperturbations and subsequent decodings, the resultant set of candidate codewords are then comparedusing a soft-decision metric. The main perturbation schemes discussed in [6] alter the least reliablebits. Since the perturbation of the received word amounts to 'guessing' the errors, the techniqueis sometimes referred to as 'Chasing' [57].There are many other examples of soft-decision decoding schemes that similarly require eithera hard-decision decoding algorithm [58][59][60], or that detailed structure of the code be exploitede.g. [61][62][63][64][65].A class of decoding schemes that can in principle be applied to any linear block code, andwithout the need for an errors/erasures decoder or a hard-decision decoder, is information setdecoding. The central concept of information set decoding is as follows. If we can find a set ofk independently specified symbols (an information set) that is error free, then the remaining n — ksymbols (a parity set) will be correctly determined by the constraints of the code. Since any errorpattern that occurs in these n — k parity positions will not alter the decoded output, these errorsare said to be 'trapped'. A number of information sets are utilized to attempt to trap a specifiednumber of errors. These information sets can be precomputed, or selected during decoding by somedeterministic or random method. Specialized versions of information set decoding (e.g. [66][67])generate a restricted number of information sets by carrying out automorphic (code-preserving)86Chapter 4. Reconfigurable Trellis Decoding^4.1 Soft-Decision Decoders for General Linear Block Codespermutations of the symbols. To consider the decoding effort for general linear^k) block codes,let N (n, k, 1) denote the minimum number of information sets required to trap t errors. It is known[681169] thatN(n,k,t) > [^n — k[ n — k — 1^In—k+t+1 I^I I.n — 1^I- 71 - + 1^1(4.1)A more readily computed, albeit looser, lower bound is [69]n)N(n, k,t) > (k).^ (4.2)n—tIn (4.2), the numerator corresponds to the number of t—tuples to be trapped, and the denominatorcorresponds to the number of t—tuples which can be trapped by each information set. Now supposethat for soft-decision decoding that we wish only to trap up to dmi„ — 1 errors, so that the decodercan attain the asymptotic ML error probability [6] [70]. Setting t = dni,„ — 1 in (4.2) yields(4.3)From this we see that for codes with 41,1 — 1 approaching the Singleton bound of 71. - k that thedenominator of (4.3) approaches unity, so that the ability of any information set to trap multiplet—tuples is entirely lost. 
This bodes poorly for soft-decision decoding of large minimum distancecodes using pure information set decoding.An alternative to pure information set decoding is to use one or more information sets and searchthrough some error patterns, e.g. [71]. To further reduce the number of information sets considered,one can guide the choice of information sets by utilizing the soft-decision bit reliabilities [69][72].As will be discussed later, RT decoding utilizes this approach and combines it with trellis decoding.Trellis based soft-decision decoding of general linear block codes has received relatively littleattention in the literature. The trellis constructions of Wolf [8], Massey [9], or Forney [16], can(dm,n-1)N(n, k, dmi„ — 1) >^n k )•drn,„-187Chapter 4. Reconfigurable Trellis Decoding^4.1 Soft-Decision Decoders for General Linear Block Codesbe used with the Viterbi algorithm to attain ML decoding. For some codes with specializedstructure, particularly efficient trellis based decoders can be developed [16]. For general linearblock codes, a possible benefit of using trellis or tree representations is that decoding algorithmsoriginally conceived for decoding convolutional codes may be applicable to decoding block codes.The various breath-first, depth-first, and metric-first search strategies for convolutional codes, e.g.[29][30][331[3511401[73][741, may be useful in decoding block codes. However, few results havebeen published on the application of such methods to decoding block codes. Matis and Modestino[22] proposed the use of a modified M algorithm to decode a linear block code trellis. In thisscheme the number of trellis branches searched is reduced by using a limit of M survivors, and,for a specified number of the most reliable information positions, by only extending branches thatagree with the hard-decisions. This idea was later applied to a Rayleigh faded AWGN channelin [75]. In using convolutional codes the length of the code is sometimes truncated in order tostrictly limit the number of errors incurred when the correct path is lost [52]. This is naturallyachieved in using block codes. The finite length of block codes and truncated convolutional codes(which are effectively block codes) can also be exploited in bidirectional searches of the trellis[49] or tree [50][5114.2 The RT Decoding ConceptAs discussed in Chapter 3, in a partial search of a trellis the decoder utilizes the currentlygathered likelihood information to guide its subsequent exploration. The concept of ReconfigurableTrellis Decoding is to combine partial trellis searching with the fact that we may exploit alternativetrellis representations of the code based on symbol position permutations. The decoder operates onan equivalent-code trellis that has its symbol positions in an order advantageous to decoding. After88Chapter 4. Reconfigurable Trellis Decoding^ 4.2 The RT Decoding Conceptdecoding of the equivalent-code the symbols are simply restored to their original order. The RTdecoder aims to exploit reliable symbols by repositioning them to improve the search efficiency.Efficiency in partial trellis searches is attained by exploring heads that are most likely to bepart of the correct (transmitted) path, or equivalently, by discarding those heads that are unlikely tobelong to the correct path. At each decoding stage the decoder compares head metrics to discardthose heads that are presently ranked as unlikely. 
Increased search efficiency would be attainedif the rank order of head metrics rapidly converged with depth to their final values. In otherwords, efficient pruning would be facilitated if the influence of the metrics of the unexplored tailswas insignificant. Since reliable symbol-positions have one symbol-hypothesis that is much morelikely than its alternatives, and since unreliable symbol-positions have little distinction betweenalternative symbol-hypotheses, reconfiguring the symbol positions in a most reliable symbol first(MRSF) manner should increase the rate of convergence with depth of the rank order of headmetrics. In other words, using MRSF ordering should enable the head exploration to rapidly gatherand utilize the most significant branch-likelihood information.The above heuristic argument is perhaps reasonable for tree decoding, but it is less plausiblein the case of a trellis. With a trellis, symbol position reordering may vary the trellis dimensions,as discussed in Chapter 2. If the code had a compact trellis then symbol position reordering maysignificantly increase the trellis dimensions and the search effort saved by using such a compactrepresentation may be lost. We remark that as discussed in Chapter 2, for a linear block code tohave a compact trellis requires a smallest minimum distance d -= mm(dim, , dim.) significantlyless than the corresponding Singleton bound. For codes with a large d relative to the correspondingSingleton bound the impact on the trellis dimensionality due to symbol position reordering will besmall. For example, MDS codes will have no alteration of trellis dimensions due to symbol position89Chapter 4. Reconfigurable Trellis Decoding^ 4.2 The RT Decoding Conceptreordering, so that any reordering can be used without a dimensional penalty.With partial searches the question arises as to whether the search should use a trellis or a tree.With partial searches of large dimensional trellises we are constrained in practice to retain onlysome manageable number of survivors, which will be small in relation to the total number of states.It has been found (for example, see [32]) that it is unlikely that such survivors merge, so that trellisdecoding offers little reduction in the number of survivors. As well, with partial trellis searchesof large trellises it is not feasible to store the entire trellis, so that it is necessary to compute thecontender states. This results in the need to find matching states amongst a sizeable number ofsurvivors, which can involve significant effort [32]. For these reasons, we are inclined to use treedecoding for large dimensional codes.Despite the above practical reasons favouring tree decoding over trellis decoding, we prefer touse the designation Reconfigurable Trellis Decoding since it is clear that the technique can alsobe utilized on the less structured tree-representation of the code. We often use the abbreviationRT decoding, which can conveniently have either interpretation, with the particular meaning beingclear from the context. In cases where possible confusion may arise, the appropriate full formwill be used.As mentioned above it may not be feasible to store the entire trellis or tree so that the contenderstates should be computed — only the explored portion of the trellis or tree is generated. To dothis, one can use the simplified trellis construction methods discussed in Chapter 2, except thatstates are found only for those heads that are extended. 
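As a rough illustration of computing states only for explored heads, the following sketch assumes (as in the Wolf-type construction underlying the 'Method 2' scheme) that the state of a head of length i is the partial syndrome formed from the first i columns of the parity check matrix; the function name and the state representation are illustrative assumptions only.

def extend_state(state, bit, h_column):
    # One on-the-fly state update for an extended head.  Appending a '0'
    # leaves the partial syndrome unchanged, so no work is needed; appending
    # a '1' XORs the current column of H into the state.  States are computed
    # only for heads that the partial search actually extends.
    if bit == 0:
        return state
    return tuple(s ^ h for s, h in zip(state, h_column))

Only the contenders retained by the sifting step need their states computed, which is the saving noted above.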
For use with RT decoding we note that thereordering of the symbol positions implies a corresponding reordering of the columns of the paritycheck matrix H and the generator matrix G. In this chapter, we will use the 'Method 2' simplifiedtrellis construction (pg. 32) and reduce H to row echelon form prior to beginning the search.90Chapter 4. Reconfigurable Trellis Decoding^ 4.2 The RT Decoding ConceptAn Example. We now present examples of decoding using RTD and decoding using theoriginal trellis for a Hamming (7,4) code. With this small code our intent is only to illustrate thebasic mechanics of RTD.Assume that the transmitter sends the codewordCtransmitted =-(1^1^1^1^1^1^1) (4.4)and that the original trellis is as shown in Figure 4.1. At the receiver, assume that the resultingsymbol likelihood-distances are(0.1^0.3 0.4^1.6 0.8^1.2 0.2)^ (4.5)(where, since the code is binary, we need only show the single nonzero likelihood-distance for eachsymbol position). As well, we assume that the vector of hard-decisions isy = ( 0 1 1 1 1 1 0).^ (4.6)A hard decision decoder would decode to the codeword nearest in Hamming distance to y, whichis cout,H D =--( 0 0 1^1^1^1 0).For simplicity we assume that decoding is performed using the M algorithm with M=1, so thatat each depth only a single survivor is retained. Figure 4.1 shows the original code trellis withthe explored heads indicated by dark branches, and with the explored head's likelihood-distancesindicated at each depth. In this example the decoded codeword is coo =( 0 1 1 0 1 0 0)which, like the hard-decision decoder, is in error. For RT decoding we first reorder the bit positionsaccording to their reliability. The most-reliable-symbol-first order of likelihood-distances is( 1.6^1.2^0.8^0.4^0.3^0.2^0.1). (4.7)91Chapter 4. Reconfigurable Trellis Decoding^ 4.2 The RT Decoding ConceptThe original parity check matrix is shown below along with the new parity check matrix formedby correspondingly reordering the columns. Also shown is the reduced reordered matrix, where thearrows at the top of the reduced matrix indicate the constraint positions.1 1 1 0 0 0 0 1 1 0 1 1 1 0 00[ 10 1 1 1 1010 1^1[11 1 0 0110 1^0[11 0 1 10100 1 0 1 1 0 1 1^0 1 0 1 1 0 0^1 1 0 1 0 1Figure 4.2 shows the resulting code trellis with the explored heads indicated by dark branches.The RTD output codeword iscout,RTD =(1 1 1 1 1 1^ (4.8)which agrees with ctr a It S Mitt e d•92( 0 )1 1(0 \(0\10/0)110/( 1 \(01(1iio)(1 \/Chapter 4. Reconfigurable Trellis Decoding^ 4.2 The RT Decoding Concept0^7^2^3^4 5^6^70.3 16 28 2 8VII I AA -TIP"'6^2.4CTAtTf0.1Figure 4.1 An Example of the M Algorithm Decoding a (7,4) CodeThe dark branches indicate the explored trellis using the M algorithm with M-=1. The output codeword asdetermined by the lowest likelihood-distance of the explored heads is cow =( 0 1 1 0 1 0 0).(o)/0\01(7)'00^1^2^3^4^5^6^7VIA1 6 0 30 0Wit:IWAFFPAir lAY0.0^2.800Figure 4.2 An Example of the RT-M Algorithm Decoding a (7,4) CodeThe output codeword using the RT-M algorithm with M.1 is Co„t,RTD =(1 1 1 1 1 1 1).93Chapter 4. Reconfigurable Trellis Decoding^ 4.2 The RT Decoding ConceptThis simple example also illustrates a key property of RT decoding. Consider the reorderedvector of likelihood-distances (4.7) and note that since a smaller likelihood-distance implies a largerprobability of error, we see that RT decoding will tend to collect channel errors into bursts in thetail of the trellis, regardless of the original error distribution. 
This can result in 'trapping' manyerrors in the tail (i.e. many errors will fall in the constraint positions in the tail), so that fewsurvivors are required.The number of errors that will be trapped by RT decoding depends on the nature of the channel.For example, if we are fortunate enough to have a channel that is extremely well approximated byan erasure channel, then RTD will collect all erasures in the tail. In the case of an MDS code onan erasure channel, if there are n — k or fewer erasures they will all be in the final n — k positionsof the reconfigured trellis, and these final n — k = dm,„ — 1 positions will be constraint positions— thus we can ensure that ML decoding can be attained by retaining only one survivor, regardlessof the size of the code. Similarly, for non-MDS codes, we can be assured to correct any patternof clni,„ — 1 errors with only one survivor.In fairness, the erasure channel represents the best possible case for RT decoding. It is easy tosee that in the case of the BSC, there is no advantage to trellis reconfiguration. However, we areinterested in more typical soft-decision channels, where we can gain some improvement in errorprobability compared to hard-decision decoding while at the same time being able to save trellissearching effort via RT decoding.94Chapter 4. Reconfigurable Trellis Decoding^ 4.2 The RT Decoding ConceptAn Example of Decoding on an AWGN ChannelHere we present some simulation results for the binary (24,12) extended Golay code decodedusing the M-algorithm, with and without reconfiguration, and the Viterbi algorithm. Figure 4.3shows the codeword error rate (WER) versus signal-to-noise ratio (SNR26) in dB for binary phase-shift-keying (BPSK) on an additive white Gaussian noise (AWGN) channel. It is easy to show thatthe likelihood-distance for each bit is proportional to HI, where r, is the output of the demodulatorfor the ith bit.The simulation results shown in Figure 4.3 demonstrate a considerable reduction in the numberof survivors when using RT decoding. For example, compared to hard-decision decoding, the Malgorithm with 16 survivors performs slightly better, while the RT-M algorithm with only onesurvivor outperforms both. The RT-M algorithm with 8 survivors attains virtually the same WERas the M algorithm with 128 survivors. To be within 0.25 dB of the Viterbi algorithm at a WER of10-2, the RT-M algorithm needs about 8 survivors, while the M algorithm needs about 128. Thiscompares with the smallest s (the maximum trellis dimension) attainable for the (24,12) code [17]of 9, which corresponds to 512 states.2726^Here the SNR is Eb/No, i.e. the ratio of the energy per information-bit to the one-sided power-spectral-density (PSI)) of the AWGN.27^As discussed in 1171, trellises with smaller dimensions can be drawn if one is willing to group multiple bits per trellis branch. In (16] a trellisfor the Golay (24,12) code is constructed which facilitates an efficient decoding algorithm for this particular code. Later we will present a moredetailed comparison of these approaches.95Chapter 4. Reconfigurable Trellis Decoding^ 4.2 The RT Decoding ConceptSNR (dB)Figure 4.3 Word Error Rate using the M Algorithm, RT-M Algorithm and the Viterbi AlgorithmThe word error rate (WER) is plotted vs. the SNR for the (24,12) Golay code onan AWGN channel with BPSK. The solid and dotted curves show the M algorithm and the RT-Malgorithm word error rates, respectively. The error bars show 95% confidence intervals.96Chapter 4. 
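Before turning to the performance analysis, a highly simplified sketch of the pre-search overhead of RT decoding may be useful. It assumes BPSK on an AWGN channel (so that bit reliability can be ranked by |r_i|, as noted above), uses NumPy for the binary-matrix work, and takes each pivot as a rightmost remaining '1' so that the constraint positions of the reconfigured code tend to fall among the least reliable positions; the names and the pivot rule are illustrative assumptions rather than the exact procedure used in the thesis.

import numpy as np

def rt_overhead(received, H):
    # Pre-search overhead of RT decoding (sketch).
    #   received : real channel outputs r_i; with BPSK on an AWGN channel the
    #              bit likelihood-distance is proportional to |r_i|.
    #   H        : (n-k) x n binary parity check matrix, assumed full row rank.
    # Returns the most-reliable-symbol-first permutation, the column-permuted
    # and row-reduced parity check matrix, and the constraint (pivot) positions
    # of the reconfigured code.
    r = np.asarray(received, dtype=float)
    perm = np.argsort(-np.abs(r))               # most reliable position first
    Hp = np.array(H, dtype=np.uint8)[:, perm]   # reorder the columns of H

    constraint_positions = []
    remaining_rows = list(range(Hp.shape[0]))
    while remaining_rows:
        # Pivot: among the remaining rows, take the one whose rightmost '1'
        # lies furthest to the right; that column is a constraint position.
        row = max(remaining_rows, key=lambda q: np.flatnonzero(Hp[q])[-1])
        col = int(np.flatnonzero(Hp[row])[-1])
        constraint_positions.append(col)
        remaining_rows.remove(row)
        for other in range(Hp.shape[0]):        # clear the pivot column elsewhere
            if other != row and Hp[other, col]:
                Hp[other] ^= Hp[row]
    return perm, Hp, sorted(constraint_positions)

The search then proceeds on the reconfigured tree exactly as in Chapter 3 (for example with the M algorithm and contender merge-sifting), and the decoded codeword is returned to the transmitted order by inverting perm, e.g. by indexing with np.argsort(perm).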
Reconfigurable Trellis Decoding^4.3 The Uniform Error Property for Pruning Decoders4.3 The Uniform Error Property for Pruning DecodersIn this section we discuss how the uniform error property of ML decoding of linear codes onan binary-input symmetric-output memoryless channel [52] can be extended to the more generalcase of pruning decoding.A binary-input symmetric-output memoryless channel [52] has inputs labeled 0 and 1 and hasan output that is symmetric in the sense thatf(ri 0) = f(—ri 1) (4.9)for all values of rz. For example, the AWGN channel, the BSC, and any other symmetricallyquantized AWGN channel can be shown to satisfy (4•9)28 [52]. This symmetry can be used toshow that the uniform error property [52] holds for linear codes with ML decoding, i.e. thatPe,M L = Pe,M Lje, , any CI- E C (4.10)where Pe,A L is the error probability using ML decoding, and Pe,mLie, denotes the error probabilitygiven that a specific codeword c) is transmitted. The uniform error property allows one to evaluatethe probability of error for an ML decoder by assuming that any single codeword is sent, which forconvenience can be taken to be the all zero codeword co. One would expect that the uniform errorproperty should hold for pruning decoders. This generalization of (4.10) is stated in the followingtheorem, which is proved in Appendix B.Theorem 4.1 The error probability Pe of a pruning decoder, when decoding linear codes on a treeor a minimal trellis and assuming a binary-input symmetric-output memoryless channel, satisfies Pe — Pek, , any ci E C^ (4.11)28^A simple transfomiation of the output may be required so that (4.9) is satisfied. For example, suppose an AWGN channel is used with thesignal constellation points for 0 and 1 at r = 0 and r = A, respectively. To satisfy (4.9), one needs only to shift r so that the signal points occurat ±A/2.97Chapter 4. Reconfigurable Trellis Decoding^4.3 The Uniform Error Property for Pruning Decoderswhere Pe le, denotes the error probability given that codeword c is transmitted. The pruning decoderis assumed to adhere to the decoding rule of minimizing (3.19) at each decoding stage, and in doingso it is assumed to treat each codeword as equiprobable.4.4 The Pruning ImpactWhen decoding using pruning it is desirable to attain "near-ML performance". To quantify thiswe define the pruning impact, PI, to be the increase in probability of error of a pruning decoderwith respect to a ML decoder. Introducing the pruning impact will also facilitate the analysis,compared to using the error probability directly.In any decoder, an error occurs when the correct path (CP) is not released as output; whichin the case of a pruning decoder is due to the CP being pruned, or due to the most likely path ofthose remaining being chosen instead of the CP. The error probability can be written asPe = I Pr( CP not chosen I r)f(r) dr= f Pr( CP pruned U ML path CP I r) f(r) dr.^(4.12)Forming a union bound on the error causing events in (4.12), we obtainP, < I [Pr( CP pruned I r) + Pr( ML path CP I 0] f(r) dr.^(4.13)Let Pe,A I L denote the error probability of an ML decoder. We see that (4.13) isP, 0 correspond to the larger likelihoodof f(r, 0) and f(—r, 10).It will be convenient to work with conditional transition probabilities. Conditioning on theoutput being either p, or —pz, and given that 0 was sent, the crossover probability isPr(—p 1 0,^=^0) f(lpil 10)f (- Pi 10) f (pi 1 0) + f(-p I 0)_ ^f(Pi 1) i(Pi 0)+PPi I 1)(4.23)100Chapter 4. 
Reconfigurable Trellis Decoding^ 4.5 Approximate Analysis of PruningImpact for the RT-M AlgorithmFigure 4.4 A Symmetric Pairing of Outputs from a Binary-Input Symmetric-Output Channelwhere we have used the fact that f (— p 10) = f (pi 1). Similarly, the crossover probability giventhat 1 was sent and that pi or — p, occurred, isf(pd IPd I 1) Pr(ei I 1' led)^1) +f(Pi11) (4.24) f(pill)+ f(pi 10)and (4.23) and (4.24) are equal. We will denote this symmetric crossover probability simply as pi.Now consider a sequence of 71 outputs from a binary-input symmetric-output memorylesschannel. Any sequence of n outputs belonging to(4.25)can be represented by the transition diagram shown in Figure 4.5, where the crossover probabilitiesare labeled by( P1^P2^• ' • Pn )•^ (4.26)Larger conditional probabilities are shown as darker arrows, to emphasize that the channel reliabilityvaries over the sequence of p bits.101Chapter 4. Reconfigurable Trellis Decoding^ 45 Approximate Analysis of PruningImpact for the RT-M AlgorithmA RT decoder views the outputs of Figure 4.5 as a reordered sequence. An example is shownin Figure 4.6, where the reordering puts the most reliable bits first. This corresponds to reorderingthe bits according to their crossover probabilities. In Figure 4.6 the ranked crossover probabilitiesare denoted as pz,„, with the subscript i : n indicating the ith smallest order statistic from a sampleof size n. A channel formed by reordering blocks of n bits into most-reliable-bit-first order willbe referred to as an Order Statistic Channel (OSC).On an OSC the crossover probability for any particular position will vary as different sequencesof 71 outputs are observed. However, we propose to model each conditional transition probability bya single 'fixed' transition probability. A representative set of transition probabilities are taken to bethe expected transition probabilities for each reordered position. The resulting model is referred toas the Order Statistic Channel Model (OSC Model), and is shown in Figure 4.7, where the 'fixed'transition probabilities are denoted as 152,7.The justification for the OSC model stems from an asymptotic property of order statistics;specifically, that as n is increased, an OS is an asymptotic estimator of a quantile of the parentdistribution [761177]. Our direct interest is not in estimating a quantile, but rather we are interestedin the property that the variance of an OS is reduced as n increases [76][77]. For RT decoding weare interested in modelling the effect of OS reordering on the transition probabilities fp,}7 1. Aswill be discussed, the effect of the OS reordering will be to decrease the variability of these transitionprobabilities. As n is increased, the varying transition probabilities for a reordered position will beincreasingly well represented by the expected transition probabilities used to form the OSC model.Let Fx (x) denote the distribution of the r.v. X that is used to rank the symbol positions, withcorresponding pdf fx(x). The it r.v.s X1, X2,^,X are assumed independent and identically102.11■■011110.,,^P2:nrifg•,!r....aeeaaeoaeeegpcox*n,133.'nChapter 4. Reconfigurable Trellis Decoding^ 45 Approximate Analysis of PruningImpact for the RT-M AlgorithmFigure 4.5 A Sequence of Symmetric Transition ProbabilitiesThe reception of n outputs belonging to ( ±pi 1p2 • • • ±p,..,) from a binary-inputsymmetric-output channel are represented here by symmetric transition probabilities. 
Toemphasize that the bit reliabilities vary, larger transition probabilities are shown as darker arrows.Figure 4.6 Reordered Transition ProbabilitiesThe symbol position permutation used by RTD forms a channel consisting of reordered transition probabilities. Herethe ith crossover probability is labeled as pi.„ to indicate that it is the ith order statistic from a sample of size n. As inFigure 4.5 larger transition probabilities are shown with darker arrows, which displays the most reliable bit reordering.•■111111110.-: Figure 4.7 The Order Statistic Channel ModelThe OSC Model represents the channel viewed by the RT decoder as having fixedtransition probabilities, which are the expected transition probabilities for eachreordered bit. The expected crossover probabilities are denoted as P1 n, P.) n,^pn n •103Chapter 4. Reconfigurable Trellis Decoding^ 4.5 Approximate Analysis of PruningImpact for the RT-M Algorithmdistributed (i.i.d.), which is consistent with our assumption of a memoryless channel. A standardresult [76] is that the pdf for the Oh smallest OS drawn from n samples of a parent i.i.d. r.v. withdistribution Fx (x) isfxh.„(x) h ( hi) Fxh-1 (x)[1 FX(x)]7" fX(x)^,1 < h < n.^(4.27)Following the approach used in [77] the pdf of the OS given by (4.27) can be written as the productof the pdf fx(x) of the parent distribution with another function Wh,„(x) given byWh:n(x) = h(hl) F_t-1 (x)11 — Fx(x)rh , 1 32, so that the M=64 WER value should be reasonably close to Pe d u L. 30 The error barsfrom the simulation of M=64 are incorporated into the "Estimated RT-M" plot.The estimated WER exceeds the simulation results by less than about 30% for a SNR of 2 dB.This excess corresponds to a small dB loss (< 0.25 dB), which can be estimated from Figure 4.3(pg. 96). It can also be seen that this excess decreases for higher SNR. To account for this effect,let us review the sources of excess31 in the estimate of the pruning impact, namely, the use of:the use of the systematic code to upper bound the pruning impact,ii the upper bound on the pruning impact (Eqn. (4.22)).For increasing SNR, the errors that occur in a block are increasingly confined to the poorestranked bits. For example, the number of errors occurring outside of the dm,„ — 1 poorest rankedbits will decrease with increasing SNR. Consider the situation where the SNR has increased suchthat effectively all of the errors that occur are confined to the dm,„ — 1 poorest ranked bits. In this30^This can also be verified by comparison with the lower bound on the VA shown in Figure 4.3.31^The accuracy of the estimate is also affected by the use of the OSC model, which may contribute positively or negatively to the excess. Wehave previously discussed that this error diminishes as n is increased.112Chapter 4. Reconfigurable Trellis Decoding^ 4.5 Approximate Analysis of PruningImpact for the RT-M Algorithmsituation, decoding the systematic code will give effectively the same error probability as decodingany equivalent code, since both codes have the same distance properties for the positions wherethe errors do occur. This is due to the fact that all equivalent codes will have the last dm,„ — 1bits as constraint bits. This accounts for a decreasing excess in the estimate of the pruning impactdue to item (i) as the SNR is increased.The increasing confinement of errors to the poorest ranked positions as the SNR is increasedwill also imply that the excess due to item (ii) will decrease. This can be seen as follows. 
Recall that the upper bound on the pruning impact is (4.16)

P_I ≤ Pr(CP pruned)    (4.39)

and the excess of this upper bound is (4.17)

Pr(CP pruned ∩ CP ≠ ML path).    (4.40)

However, with M fixed and the SNR increasing, the increasing confinement of the errors to the d_min − 1 poorest positions implies that the pruning decoder WER will approach that of ML decoding.32 This implies that Pr(CP pruned) becomes small in relation to P_{e,ML} = Pr(CP ≠ ML path), which in turn implies that (4.40) decreases in relation to P_{e,ML}.

32 Recall that if all errors were confined to constraint positions the decoder can attain ML decoding with only M=1. (See pg 94.)

Figure 4.10 Word Error Rate of the RT-M Algorithm, Simulation and Estimation Results
The word error rate is plotted vs. the maximum number of survivors for the (24,12) Golay code on an AWGN channel with BPSK. Curves for SNRs of 2, 3, and 4 dB compare the estimated RT-M word error rate with the RT-M simulation results.

4.6 Computational Effort of RT-M Decoding

In discussing the computational effort of RT decoding we will assume that the contender merge-sifting (CMS) method of §3.4 is used to implement the M algorithm. We will make comparisons of this RT-M algorithm with both ML decoding and near-ML decoding. Specifically, RT-M is compared to:

i The most efficient soft-decision ML decoders for a specific code, namely the decoders of Vardy and Be'ery [78], Snyders and Be'ery [64], Forney [16], and Conway and Sloane [62], for the binary extended Golay (24,12) code.
ii The M algorithm operating on various binary linear block codes.

As will be discussed, case (i) provides a comparison against highly specialized decoders, while case (ii) provides a comparison against decoding of general linear block codes.

4.6.1 The Number of Operations for RT-M Decoding

Here we consider the number of metric operations (such as additions and comparisons) and the non-metric operations (such as the binary-vector operations involved in carrying out partial searches) carried out by RT-M decoding. We assume that binary linear block codes are used.

Number of Metric Comparisons, N_c

In §3.4 upper bounds were found on the worst-case number of metric comparisons to implement the M algorithm using contender merge-sifting, for binary linear block codes. For RT-M decoding the decoder will be decoding all codes equivalent to the original transmitted code, so that an appropriate upper bound from §3.4 is (3.39). Equation (3.39) upper bounds the worst-case number of metric comparisons for the worst-case equivalent-code, which has its last information-symbol at
Equation (3.39) isN< M(k + + 2 — /m ) 21m — 111/ — 2 — A^(4.41)where 1m = [10g2 (M)] + 1 is the first depth where the number of contenders exceeds M, and= ii - k +1 — cl„„„ is the amount by which dini„ is less than the Singleton bound.As well, a looser but simpler upper bound is given by (3.41)Are < Mk + (M — 1)A.^ (4.42)Number of Metric Additions, NaThe worst case number of metric additions in the M algorithm implemented using CMS can beupper bounded as follows. For k — lm information-positions where M survivors are extended toform 2M contenders, there will be exactly M contenders in each of the contender-subsets. Assume,without loss of generality, that the contender-subset Co corresponds to the hard-decision, for everydepth. Since the contender-subset Co has metrics that are identical to those of the previous survivors,no additions are required here. The M contender metrics in C1 can be computed with M additions,for each of these k —1m depths. For each of the n — k constraint-positions, the worst-case numberof additions occurs when all of the Al contenders fall in C1, again using Al additions. For theinitial depths 1,2,...,1m —1 of the tree before there are M contenders, we can upperbound thenumber of additions as M, at each of these depths. Consequently, a simple upper bound on theworst-case number of metric additions is Al additions for each of the 71 depths, givingNo < Mn.^ (4.43)116E[Na] <21m —1 + M(k — 1m) + 72-114m ( n k2m) 2 — 1 .4 1 14 (4.45)Chapter 4. Reconfigurable Trellis Decoding^ 4.6 Computational Effort of RT-M DecodingAn upper bound on the expected number of additions can be obtained by more carefully accountingfor the number of additions in the initial depths of the tree, and by upper bounding the expectednumber of additions at constraint positions. In the depths less than or equal to lm, the number ofadditions is the sum of the number of entries in C1, which is at mostE 22_1 _ 1. (4.44)z=iFor each constraint position, we have found that the worst case number of additions is M, and theexpected number of additions is M12, as follows. First, recall that in a binary linear block codethe number of '0' bits and '1' bits is identical for each symbol position, so that with equiprobableinformation-bits the parity bits will also be equally likely to be a '0' or a F. In other words, ata constraint-position the bit is just as likely to be a '0' as it is to be a 'I'. As discussed earlier,extensions using all '0' bits or all 'I' bits require zero and M additions, respectively — so thatwith the constraint bits being '0' or '1' with equal frequency we have that the expected number ofadditions to form M contenders at each constraint depth is then simply M12. Using (4.44) for thenumber of additions in the initial 1M depths, plus M additions for the each of the remaining k—lminformation positions (at most), plus M/2 expected additions for the 7i - k constraint positions,we obtainSymbol Position RankingThe ranking of the received symbols could be accomplished using a comparison-based sortingalgorithm. For small values of ii the associated computational effort may be acceptable. However,117Chapter 4. Reconfigurable Trellis Decoding^ 4.6 Jomputational Effort of RT-M Decodingmuch more efficient sorting schemes can be employed to exploit the nature of the distribution ofthe input sequence, as described below.Recall that the n values to be sorted are assumed to be i.i.d. 
Symbol Position Ranking

The ranking of the received symbols could be accomplished using a comparison-based sorting algorithm. For small values of n the associated computational effort may be acceptable. However, much more efficient sorting schemes can be employed to exploit the nature of the distribution of the input sequence, as described below.

Recall that the n values to be sorted are assumed to be i.i.d. with common distribution F_X(x). We can divide the domain of the distribution to form n quantization regions such that the probability of X falling in any region is 1/n. This quantization transforms the continuous r.v. X into a uniformly distributed discrete r.v. Sorting of uniformly distributed inputs can be accomplished using a bucket-sort in O(n) expected time [45]. The efficiency of the bucket-sort stems from splitting the sorting procedure into a quantization stage (which categorizes inputs into buckets) and a sorting stage that sorts each bucket's contents. The efficiency arises with uniformly distributed inputs since the expected number of entries per bucket is unity; hence few buckets will require sorting, with little sorting effort expected per bucket.

The sorting effort can be further reduced by using approximate sorting, in which we dispense with sorting the bucket contents. Such approximate sorting should be sufficient for RT decoding as n increases, for the following reason. Consider that the approximate sorting can be viewed as quantizing the decoding metric to n levels. Since it is well known that soft-decision decoding can be accomplished on AWGN channels with a very small dB loss using only a small number b of bits for quantization (3 bits are typically used) [69], one would expect that providing roughly 2^b buckets should be sufficient. For n > 2^b, the number of buckets will exceed the number of quantization levels known to achieve good soft-decision decoding performance, so approximate bucket sorting using n buckets should be sufficient. For n >> 2^b it should then also be sufficient to have fewer than n buckets. With only a few bits being devoted to the metric, the metrics can be assigned to their appropriate buckets in a worst-case time of precisely n steps (and using zero comparisons) by using the binary metric values as a direct address to index the array of buckets.
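A minimal sketch of this approximate ranking is given below (Python, illustrative only). It assumes a simple uniform b-bit quantizer over the observed range of the reliabilities rather than the quantile-based partition described above; positions are written into directly addressed buckets with no comparisons, and reading the buckets in order yields an approximate ranking from least to most reliable.

    def approximate_rank(reliabilities, b=3):
        """Approximately rank symbol positions from least to most reliable
        using direct-addressed buckets (no comparisons), i.e. a bucket sort
        with the per-bucket sorting stage omitted."""
        n_buckets = 2 ** b
        lo, hi = min(reliabilities), max(reliabilities)
        scale = (n_buckets - 1) / (hi - lo) if hi > lo else 0.0
        buckets = [[] for _ in range(n_buckets)]
        for pos, r in enumerate(reliabilities):
            buckets[int((r - lo) * scale)].append(pos)   # direct addressing
        # Concatenate buckets: positions in low-reliability buckets come first.
        return [pos for bucket in buckets for pos in bucket]

    # Example: |received value| taken as the reliability of each BPSK symbol.
    r = [0.9, -0.1, 1.4, 0.3, -1.2, 0.05, 0.7, -0.8]
    print(approximate_rank([abs(x) for x in r]))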
Number of Binary-Vector Operations for Code Matrix Reduction, N_r

Here we consider the number of binary-vector operations needed to reduce the parity-check matrix H or the generator matrix G, for use with the 'Method 2' or 'Method 3' trellis generation schemes, respectively, discussed in Chapter 2.

Using elementary row operations, the reduction of the (n-k) x n matrix H can be done using of the order of (n-k)^2 n bit operations. Since these bit operations are simple bit-by-bit XOR or zero-testing operations, they are ideally suited to implementation in digital hardware. As well, it is natural to process the matrix using operations on n-bit vectors, rather than bit by bit. The key operations in the matrix reduction are (1) the identification of the rightmost 1, to find a pivot from among the remaining non-pivot rows, and (2) the addition of this row to the other rows that share a 1 in this pivot position. We assume that we have a length-n binary-vector processing-block that can find the rightmost 1 in a row. We also assume that we have a length-n binary-vector processing-block that performs the vector-XOR on another row if it has a one in the pivot position. The worst-case number of such binary-vector operations is obtained as follows. At the first step in the reduction there will be at most n-k tests to find the rightmost 1 as the first pivot. This is followed by n-k-1 operations of the XOR block to eliminate 1's in the non-pivot rows. At subsequent steps, there are at most n-k-1, n-k-2, ..., 2, 1 tests to find pivots, with n-k-1 XOR-block operations for each pivot row. In total, then, there are

    \sum_{i=1}^{n-k} i = \frac{(n-k)(n-k+1)}{2}                              (4.46)

tests for pivots and

    (n-k)(n-k-1)                                                             (4.47)

XOR-block operations. Hence the total number of length-n binary-vector operations is upper-bounded by

    N_r \le \frac{1}{2}(n-k)(n-k+1) + (n-k)(n-k-1)
          = \frac{3}{2}(n-k)(n-k-\tfrac{1}{3}) \le \frac{3}{2}(n-k)^2.       (4.48)

The number of length-n binary-vector operations for reducing the generator matrix G instead of H can be similarly obtained as

    N_r \le \frac{3}{2}k^2.                                                  (4.49)

For codes with rate R > 1/2, which is typically the case, it is simpler to reduce H for use in Method 2, rather than reduce G and use Method 3.
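A software analogue of this reduction is sketched below (Python, illustrative only; it stands in for the two hardware processing-blocks assumed above). Rows are held as integers acting as length-n bit vectors, pivots are chosen as rightmost 1's, and the two kinds of vector operations are counted so that the totals can be compared with (4.46)-(4.48). A full-row-rank H is assumed, and the particular pivot-selection rule (take the remaining row whose rightmost 1 is furthest to the right) is one possible choice, not the thesis's prescription.

    def reduce_parity_check(H_rows):
        """Reduce an (n-k) x n parity-check matrix (rows as Python ints,
        bit i = column i, so the highest set bit is the rightmost 1) so that
        each row has a distinct rightmost 1.  Returns the reduced rows, the
        pivot (constraint) columns, and the operation counts."""
        rows = list(H_rows)
        m = len(rows)                            # m = n - k
        pivot_cols, tests, xors = [], 0, 0
        remaining = list(range(m))
        for _ in range(m):
            # (1) find-rightmost-1 test applied to every remaining non-pivot row.
            best, best_col = None, -1
            for i in remaining:
                tests += 1
                col = rows[i].bit_length() - 1   # index of the rightmost 1
                if col > best_col:
                    best, best_col = i, col
            pivot_cols.append(best_col)
            remaining.remove(best)
            # (2) vector-XOR the pivot row into every other row with a 1 there.
            for j in range(m):
                if j != best and (rows[j] >> best_col) & 1:
                    rows[j] ^= rows[best]
                    xors += 1
        return rows, pivot_cols, tests, xors

    def bits_to_int(bits):                       # bits[0] is the leftmost column
        return sum(b << i for i, b in enumerate(bits))

    # Example: a (7,4) Hamming-code parity-check matrix, one integer per row.
    H = [bits_to_int([1, 1, 0, 1, 1, 0, 0]),
         bits_to_int([1, 0, 1, 1, 0, 1, 0]),
         bits_to_int([0, 1, 1, 1, 0, 0, 1])]
    print(reduce_parity_check(H))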
Number of Binary-Vector Operations for Tree Searching, N_s

Here we account for the number of binary-vector operations used in carrying out the partial tree search. The search can utilize states, as in the trellis generation schemes of Methods 2 and 3, or the search can dispense with maintaining states, as will be discussed shortly.

First we consider the number of binary-vector operations for the Method 2 and Method 3 trellis generation schemes. Both of these methods will have an identical number of binary-vector operations, except for the reduction of the code matrix. As discussed in the previous section, typical codes have a rate R > 1/2 and will require fewer operations for matrix reduction using Method 2. Below we consider the worst-case and expected number of binary-vector operations.

In Method 2, for extensions using the zero bit the new state is unchanged, so that no vector-XOR operation is required. We thus need only count the extensions using a '1'. However, not all extensions using a one need be counted, for two reasons. First, recall from the 'Method 2' trellis construction method that extensions at constraint-positions need not update the state, since the extended state will automatically have a zero in a specific bit. (This bit is indicated by a counter that starts at 0 and is incremented for each constraint position, so that at each constraint position the leading remaining bit of each state is set to zero.) Hence the decoder can simply ignore this bit for all states at the constraint-depth (and all subsequent depths). Second, of the head extensions using a '1' at each information-position, the decoder need only compute the new state for those contenders that are retained by contender merge-sifting. However, we will ignore this saving and will form a worst-case upper bound by counting the number of extensions using a '1' at information positions. For the initial l_M depths the decoder will perform

    2^{l_M} - 1                                                              (4.50)

vector-XOR operations to update the contender states. For each of the remaining k - l_M information-positions, there are M extensions using a '1' (although not all of these extensions will be kept). Hence, the number of binary-vector operations is upper-bounded by

    N_s \le M(k - l_M) + 2^{l_M} - 1.                                        (4.51)

A simpler upper bound is given by counting M state extensions at each of the information positions, giving

    N_s \le Mk.                                                              (4.52)

An upper bound on the expected number of state computations can be found by reconsidering the above worst-case calculations at information positions at depths greater than or equal to l_M. Consider that with equally likely information-bits, and a symmetric channel, we should not favour the hard-decision being a '1' over it being a '0'. Thus, for depth l_M and later, a case of all of C_1 being retained is just as likely as a case of all of C_0 being retained; in other words, it is equally likely that M or 0 states are computed. Similarly, other numbers of contenders being retained from the contender-subsets (e.g. (M/2, M/2) or (1, M-1), etc.) occur without bias towards either a '0' or a '1'. Hence the average number of state computations at an information-bit is M/2. We then have

    E[N_s] \le \frac{M}{2}(k - l_M) + 2^{l_M} - 1.                           (4.53)

Finally, we consider an alternate method of carrying out the partial search that dispenses with calculating states. In the method discussed above, we have generated a partial tree by simply ignoring merges in a partial trellis; the states were used only as a means of keeping track of the encoding of heads. One can avoid retaining state information for each survivor (and instead retain just the head symbols) by using 're-encoding', as follows. At an information-position a head is extended with all symbols of the code alphabet. At a constraint-position, in order that the head be part of a codeword, the extended symbol must satisfy one of the parity-check equations (due to the fact that the parity-check matrix has been reduced to row echelon form). The appropriate parity-check equation is simply the i_p-th row of the reduced H matrix, where i_p is the constraint-position counter introduced earlier. Thus, each head is extended and 're-encoded' by ensuring that each constraint symbol satisfies its appropriate parity-check equation, as sketched following this subsection.

An upper bound on the worst-case number of binary-vector operations used in re-encoding can be formed by assuming that at each constraint-position we have M contenders, so that there are at most M(n-k) binary-vector operations, since as each head is extended there will be n-k constraints to be met. This approach can also be used with the reduced G matrix, where we simply re-encode each parity symbol as required, using the appropriate column of G. This will also result in a worst-case upper bound of M(n-k) binary-vector operations.

This 'no-state' (or re-encoding) approach can be roughly compared to the state-based approach of Methods 2 and 3, as follows. Methods 2 and 3 were shown to have a number of binary-vector operations upper-bounded by Mk, whereas the re-encoding approach has a worst-case upper bound of M(n-k). Hence, re-encoding without maintaining states will be advantageous for high-rate codes.
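A minimal sketch of the re-encoding extension step is given below (Python, illustrative only; the names are not the thesis's). It assumes the parity-check matrix has already been reduced as described above and is supplied as a mapping from each constraint position to the parity-check row whose rightmost 1 lies at that position, so that the forced bit depends only on the head.

    def bits_to_int(bits):                     # bits[0] is the leftmost position
        return sum(b << i for i, b in enumerate(bits))

    def extend_head(head, constraint_rows):
        """Extend a partial codeword ('head') by one bit using re-encoding.
        head: list of bits for depths 0..len(head)-1.
        constraint_rows: dict mapping a constraint position to its parity-check
        row (an int whose rightmost 1 lies at that position).
        Returns the list of admissible one-bit extensions."""
        depth = len(head)
        row = constraint_rows.get(depth)
        if row is None:
            # Information position: both extensions are admissible.
            return [head + [0], head + [1]]
        # Constraint position: the new bit is forced by the parity-check
        # equation, whose only term beyond the head is the current position.
        forced = 0
        for j, bit in enumerate(head):
            forced ^= bit & ((row >> j) & 1)
        return [head + [forced]]

    # Example with a (7,4) Hamming-code H; constraint positions are 4, 5, 6.
    H_rows = [bits_to_int([1, 1, 0, 1, 1, 0, 0]),
              bits_to_int([1, 0, 1, 1, 0, 1, 0]),
              bits_to_int([0, 1, 1, 1, 0, 0, 1])]
    constraint_rows = {4: H_rows[0], 5: H_rows[1], 6: H_rows[2]}
    print(extend_head([1, 0, 1, 1], constraint_rows))   # -> [[1, 0, 1, 1, 0]]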
Summary of Upper Bounds on the Number of Operations

Tables 4.1 and 4.2 summarize the most useful of the upper bounds found above on the worst-case and expected number of operations. From the detailed bounds in the tables we see that each of N_c, N_a, and N_s is loosely upper-bounded by Mn. Tables 4.1 and 4.2 will be used in the following sections.

Table 4.1  Upper Bounds on the Number of Metric Operations
    N_c (comparisons), worst case:  M(k + \Delta + 2 - l_M) + 2^{l_M} - l_M - 2 - \Delta   (4.41),
        or more loosely  Mk + (M-1)\Delta   (4.42).
    N_a (additions), worst case:  Mn   (4.43);
        expected case:  2^{l_M} - 1 + M(k - l_M) + \frac{M(n-k)}{2}   (4.45).
The upper bounds in the table are for decoding any binary linear block code using the M algorithm. Contender merge-sifting (§3.4) is assumed to be used to implement the M algorithm.

Table 4.2  Upper Bounds on the Number of Binary-Vector Operations
    N_r (matrix reduction), worst case:  \frac{3}{2}(n-k)^2 using H   (4.48),  or  \frac{3}{2}k^2 using G   (4.49).
    N_s (tree search with states), worst case:  M(k - l_M) + 2^{l_M} - 1   (4.51);
        expected case:  \frac{M}{2}(k - l_M) + 2^{l_M} - 1   (4.53).

... (A.4), which can be easily verified to have the solution

    n_c(n) = n \lg n - (n - 1),                                              (A.5)

where lg denotes log_2. The total number of comparisons for contender sifting is then

    N(M, 2M) = 2[M \lg M - (M - 1)] + M = 2M \lg M - (M - 2).                (A.6)

Normalizing by M, we obtain

    \frac{N(M, 2M)}{M} = 2 \lg M - 1 + \frac{2}{M}.                          (A.7)
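The closed form (A.5) can be spot-checked numerically. The sketch below is illustrative only and assumes that (A.4), which is not reproduced here, is the two-way merge recurrence n_c(1) = 0, n_c(n) = 2 n_c(n/2) + (n - 1); under that assumption it verifies (A.5)-(A.7) for small powers of two.

    import math

    def n_c(n):
        """Comparison count assumed here for (A.4):
        n_c(1) = 0, n_c(n) = 2*n_c(n/2) + (n - 1), for n a power of two."""
        return 0 if n == 1 else 2 * n_c(n // 2) + (n - 1)

    for m in range(1, 8):
        M = 2 ** m
        assert n_c(M) == M * math.log2(M) - (M - 1)           # (A.5)
        N = 2 * n_c(M) + M                                    # (A.6), first form
        assert N == 2 * M * math.log2(M) - (M - 2)            # (A.6), second form
        assert N / M == 2 * math.log2(M) - 1 + 2 / M          # (A.7)
    print("(A.5)-(A.7) verified for M = 2, 4, ..., 128")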
Appendix B: Proof of Theorem 4.1

To prove the uniform error property for pruning decoders given by Theorem 4.1 we first write

    P_{e|c_j} = \int \Pr(\text{error} \mid r, c_j \text{ sent}) f(r \mid c_j \text{ sent}) \, dr      (B.1)

and

    P_{e|c_k} = \int \Pr(\text{error} \mid r', c_k \text{ sent}) f(r' \mid c_k \text{ sent}) \, dr'.   (B.2)

To show that (B.1) and (B.2) are equal, it is sufficient to show that for each r there is a corresponding r' such that

    f(r' \mid c_k \text{ sent}) = f(r \mid c_j \text{ sent})                 (B.3)

and

    \Pr(\text{error} \mid r', c_k \text{ sent}) = \Pr(\text{error} \mid r, c_j \text{ sent}).          (B.4)

Now f(r | c_j sent) = \prod_{i=1}^{n} f(r_i | c_{ji}) and f(r' | c_k sent) = \prod_{i=1}^{n} f(r'_i | c_{ki}). By choosing

    r'_i = r_i   if c_{ki} = c_{ji},      r'_i = -r_i   if c_{ki} \ne c_{ji},        (B.5)

it is easy to see that (B.3) will be satisfied, since for each position where c_j and c_k agree we have

    f(r'_i \mid c_{ki}) = f(r_i \mid c_{ki}) = f(r_i \mid c_{ji})            (B.6)

and for each position where c_j and c_k disagree we have, using the symmetry of the binary-input symmetric-output channel (i.e. f(r_i | 0) = f(-r_i | 1)), that

    f(r'_i \mid c_{ki}) = f(-r_i \mid c_{ki}) = f(r_i \mid c_{ji}).          (B.7)

To show that (B.4) holds, we first let \hat{c}_j denote the output codeword when c_j was transmitted and r was received, and similarly we let \hat{c}_k denote the output codeword when c_k was transmitted and r' was received. Also, we let e = c_j + \hat{c}_j denote the error vector between the transmitted codeword c_j and the output codeword \hat{c}_j, and similarly we let e' = c_k + \hat{c}_k denote the error vector between the transmitted codeword c_k and the output codeword \hat{c}_k. To show that (B.4) holds, it is sufficient to show that e = e'.

If e is to equal e', we need c_j + \hat{c}_j = c_k + \hat{c}_k, or equivalently we need c_j + c_k = \hat{c}_j + \hat{c}_k. Define z = c_j + c_k. We then need only to show that the output codewords also differ by z. For this it is sufficient to show that, at any decoding stage, each head c^h retained by the decoder using r has a corresponding head c'^h retained by the decoder using r', where c^h + c'^h = z^h. We will show this by showing that each head that enters any node \sigma in the trellis when r is processed has a counterpart that enters a node \sigma' when r' is processed, where the corresponding heads differ by z^h, and where the corresponding heads have identical metrics. In other words, we aim to show that the decoder encounters the same set of metrics at a node \sigma when processing r as it does at \sigma' when processing r', with the heads for these 'equal-metric counterparts' differing by z^h.

Recall that the decoding rule is based on the metric of (3.19),

    \frac{f(r^h \mid c^h) \Pr(c^h)}{f_0(r^h)},                               (B.8)

evaluated over the candidate heads c^h. Since the heads entering any given node have equal lengths, and the decoder is taken to assume equiprobable codewords, for the purpose of comparing metrics at a depth we can simply use f(r^h | c^h).

Consider a head metric f(r^h | c^h). For a head metric using r' to equal this we need

    f(r'^h \mid c'^h) = f(r^h \mid c^h).                                     (B.9)

Recall (from (B.5) and the definition of z) that we have chosen r' to be the negative of r in those positions where z is nonzero. Using this fact and the symmetry of the binary-input symmetric-output channel, for (B.9) to hold we will need c'^h to differ from c^h in those positions where z^h is nonzero. In other words we must have c^h + c'^h = z^h, which is part of what we needed to show.

All that remains is to show that all the heads that merge at a node \sigma (when processing r) have their equal-metric counterparts (when processing r') at a node \sigma'. From Chapter 2 (see Proposition 2.1 on pg. 26 and (2.7)), each head that merges at a node \sigma in a minimal trellis satisfies

    \sigma = H^h c^{hT}.                                                     (B.10)

Now, since any head c'^h that is an equal-metric counterpart to c^h satisfies c'^h = c^h + z^h, its state is

    \sigma' = H^h c'^{hT} = H^h (c^{hT} + z^{hT}) = \sigma + H^h z^{hT}.     (B.11)

Now either H^h z^{hT} is nonzero or it is zero. In the former case, all the heads that merge at \sigma when r is processed have their equal-metric counterparts merge at some other node \sigma', as required. In the latter case, we have a similar situation except that \sigma' = \sigma.

Using any minimal-trellis state-assignment method besides that used above [8] will yield an isomorphic trellis [17]. However, this relabeling of trellis states will not alter our result, since the state labels are immaterial to the head metrics and to the partitioning of the heads into states. Finally, we comment that the above proof holds for symmetrically quantized outputs, under the assumption that metric ties are randomly resolved, as is assumed in the proof of the uniform error property for ML decoding in [52].
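The construction (B.5) can be illustrated numerically. The sketch below (Python, illustrative only) takes BPSK over an AWGN channel as a particular binary-input symmetric-output channel, with the assumed mapping 0 -> +1 and 1 -> -1; it forms r' from r by negating the positions where two binary words disagree and confirms the equal-likelihood property (B.3).

    import math, random

    def awgn_likelihood(r, c, sigma=1.0):
        """f(r | c) for BPSK over AWGN: bit 0 -> +1, bit 1 -> -1 (assumed mapping)."""
        f = 1.0
        for ri, ci in zip(r, c):
            s = 1.0 if ci == 0 else -1.0
            f *= math.exp(-(ri - s) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)
        return f

    # Two binary words standing in for the codewords c_j and c_k.
    c_j = [0, 1, 1, 0, 1, 0, 0]
    c_k = [1, 1, 0, 0, 0, 1, 0]
    r = [random.gauss(1.0 if b == 0 else -1.0, 1.0) for b in c_j]   # c_j transmitted
    # (B.5): r'_i = r_i where the words agree, r'_i = -r_i where they differ.
    r_prime = [ri if cj == ck else -ri for ri, cj, ck in zip(r, c_j, c_k)]
    assert math.isclose(awgn_likelihood(r_prime, c_k), awgn_likelihood(r, c_j))
    print("f(r'|c_k) = f(r|c_j):", awgn_likelihood(r, c_j))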
Geist, "Punctured convolutional codes of rate (n-1)/n andsimplified maximum likelihood decoding," IEEE Trans. Inform. Theory, vol. IT-25, pp. 97—100, Apr. 1979.[19] J. Hagenauer, "Rate compatible punctured convolutional codes (RCPC-codes) and theirapplication," IEEE Trans. Commun., vol. 36, pp. 389-400, Apr. 1988.[20] D. M. Mandelbaum, "An adaptive-feedback coding scheme employing incremental redun-dancy," IEEE Trans. Inform. Theory, vol. IT-20, pp. 388-389, May 1974.[21] G. Strang, Linear Algebra and its Applications. Academic Press, 1976.[22] K. R. Matis and J. W. Modestino, "Reduced-search soft-decision trellis decoding of linearblock codes," IEEE Trans. Inform. Theory, vol. IT-28, pp. 349-355, Mar. 1982.123] T. Kasami, T. Takata, T. Fujiwara, and S. Lin, "On complexity of trellis structure of linearblock codes," in IEEE Int. Symposium on Information Theory, June 1991.[24] R. Bellman, Adaptive Control Processes: A Guided Tour. Princeton University Press, 1961.[25] R. Bellman, Applied Dynamic Programming. Princeton University Press, 1962.[26] J. L. Massey, "Variable-length codes and the Fano metric," IEEE Trans. Inform. Theory,vol. IT-18, pp. 196-198, Jan. 1972.[27] D. E. Knuth, The Art of Computer Programming: Volume III, Sorting and Searching. Addison-Wesley, 1973.[28] R. Blahut, Digital Transmission of Information. Addison-Wesley, 1990.[29] J. B. Anderson and S. Mohan, "Sequential decoding algorithms: A survey and cost analysis,"IEEE Trans. Commun., vol. COM-32, pp. 169-176, Feb. 1984.[30] F. Jelinek and J. B. Anderson, "Instrumentable tree encoding of information sources," IEEETrans. Inform. Theory, pp. 118-119, Jan. 1971.[31] J. B. Anderson and J. B. Bodie, "Tree encoding of speech," IEEE Trans. Inform. Theory,vol. IT-21, pp. 379-387, July 1975.[32] S. J. Simmons, "Breadth-first trellis decoding with adaptive effort," IEEE Trans. Commun.,vol. COM-38, pp. 3-12, Jan. 1990.[33] R. M. Fano, "A heuristic discussion of probabilistic decoding," IEEE Trans. Inform. Theory,vol. IT-19, pp. 64-74, Apr. 1963.[34] K. Zigangirov, "Some sequential decoding procedures," Probl. Peredach. Inform., vol. 2,pp. 13-25, 1966.[35] F. Jefinek, "A fast sequential decoding algorithm using a stack," IBM J. Res. Dev., vol. 13,pp. 675-685, Nov. 1969.154References[36] I. M. Jacobs and E. R. Berlekamp, "A lower bound to the distribution of computation forsequential decoding," IEEE Trans. Inform. Theory, vol. IT-13, pp. 167-174, Apr. 1967.[37] J. L. Massey, "Error bounds for tree codes, trellis codes, and convolutional codes withencoding and decoding procedures," In Coding and Complexity, G. Longo (Ed.) Springer-Verlag, 1976.[38] J. B. Anderson, "Limited search trellis decoding of convolutional codes," IEEE Trans. Inform.Theory, vol. IT-35, pp. 944-955, Sept. 1989.[39] T. Hashimoto, "A list-type reduced constraint generalizaton of the Viterbi algorithm," IEEETrans. Inform. Theory, vol. IT-33, pp. 866-876, Nov. 1987.[40] D. Haccoun and M. J. Ferguson, "Generalized stack algorithms for decoding convolutionalcodes," IEEE Trans. Inform. Theory, vol. IT-21, pp. 638-651, Nov. 1975.[41] A. Aho, J. Hoperoft, and J. Ullman, Data Structures and Algorithms. Addison-Wesley, 1982.[42] S. Mohan and A. K. Sood, "A multiprocessor architecture for the (M,L)-algorithm suitablefor VLSI implementation," IEEE Trans. Commun., vol. COM-34, pp. 1218-1224, Dec. 1986.[43] S. J. 
Simmons, "A bitonic-sorter based VLSI implementation of the M algorithm," in IEEEPacific Rim Conference on Communications, Computers and Signal Processing, (Victoria,B.C., Canada), pp. 337-340, June 1989.[44] S. J. Simmons, "A nonsorting VLSI structure for implementing the (M,L) algorithm," IEEEJ. Selected Areas Commun., vol. JSAC-6, pp. 538-546, Apr. 1988.[45] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms. McGraw-Hill,1990.[46] M. Blum, R. Floyd, V. Pratt, R. Rivest, and R. Tarjan, "Time bounds for selection," J. Comp.Sys. Sci., vol. 7, pp. 448-461, 1973.[47] A. Schtinhage, M. Paterson, and N. Pippenger, "Finding the median," J. Comp. Sys. Sci.,vol. 13, pp. 184-199, 1976.[48] R. E. Blahut, Theory and Practice of Error Control Codes. Addison-Wesley, 1983.[49] F. Hemmati, "Bidirectional decoding of linear block codes," in IEEE Sympos. Inf. Theory,1990.[50] D. Haccoun and G. Belzile, "Bidirectional algorithms for the decoding of convolutionalcodes," in IEEE Inform. Theory Symposium, p. 177, 1990.[51] K. Li and S. Kallel, "Bidirectional sequential decoding for convolutional codes," in IEEEPacific Rim Conf. on Communications, Computers, and Signal Processing, pp. 200-204, 1991.[52] A. J. Viterbi and J. Omura, Principles of Digital Communication and Coding. McGraw-Hill,1979.155References[53] R. Silverman and M. Balser, "Coding for constant-data-rate systems," IRE Trans. Inform.Theory, vol. PG IT-4, pp. 50-63, 1954.[54] G. Einarsson and C. Sundberg, "A note on soft decision decoding with successive erasures,"IEEE Trans. Inform. Theory, vol. IT-22, pp. 88-96, Jan. 1976.[55] C. Yu and D. Costello, "Generalized minimum distance decoding algorithms for Q-ary outputchannels," IEEE Trans. Inform. Theory, vol. IT-26, pp. 238-243, Mar. 1980.[56] D. Taipale and M. Pursley, "An improvement to generalized-minimum-distance decoding,"IEEE Trans. Inform. Theory, vol. IT-37, pp. 167-172, Jan. 1991.[57] E. Berlekamp, "The technology of error-correcting codes," Proc. IEEE, vol. 68, pp. 564-593,May 1980.[58] E. Weldon, "Decoding binary block codes on Q-ary output channels," IEEE Trans. Inform.Theory, vol. IT-17, pp. 713-718, Nov. 1971.[59] S. Wainberg and J. Wolf, "Algebraic decoding of block codes over a q-asy input, Q-ary outputchannel, Q>q," Information and Control, vol. 22, pp. 232-247, Nov. 1973.[60] C. Hackett, "An efficient algorithm for soft-decision decoding of the (24,12) extended Golaycode," IEEE Trans. Commun., vol. COM-29, pp. 909-911, June 1981.[61] J. Massey, Threshold Decoding. MIT Press, 1963.[62] J. Conway and N. Sloane, "Soft decoding techniques for codes and lattices, including theGolay code and the Leech lattice," IEEE Trans. Inform. Theory, vol. IT-32, pp. 41-56, Jan.1986.[63] Y. Be'ery and J. Snyders, "Optimal soft decision block decoders based on fast Hadamardtransform," IEEE Inf. Theory, vol. IT-32, pp. 355-364, May 1986.[64] J. Snyders and Y. Be'ery, "Maximum likelihood soft decoding of binary block codes anddecoders for Golay codes," IEEE Trans. Inform. Theory, vol. IT-35, pp. 963-975, Sept. 1989.[65] J. Snyders, "Reduced lists of error patterns for maximum likelihood soft decoding," IEEETrans. Inform. Theory, vol. IT-37, pp. 1194-1200, July 1991.[66] E. Prange, "The use of information sets in decoding cyclic codes," IRE Trans. Inform. Theory,vol. IT-8, pp. s5—s9, 1962.[67] F. MacWilliams, "Permutation decoding of systematic codes," Bell Syst. Tech. J., vol. 43,pp. 485-505, 1964.[68] J. Schtinheim, "On coverings," Pac. J. Math, vol. 14, pp. 
[69] G. Clark and J. B. Cain, Error-Correction Coding for Digital Communications. Plenum, 1981.
[70] J. Coffey and R. M. Goodman, "The complexity of information set decoding," IEEE Trans. Inform. Theory, vol. IT-36, pp. 1031-1037, Sept. 1990.
[71] T. Kasami, "A decoding procedure for multiple-error-correcting cyclic codes," IEEE Trans. Inform. Theory, vol. IT-10, pp. 134-138, 1964.
[72] B. Dorsch, "A decoding algorithm for binary block codes and J-ary output channels," IEEE Trans. Inform. Theory, vol. IT-20, pp. 391-394, May 1974.
[73] C. Lin and J. Anderson, "M-algorithm decoding of channel convolutional codes," in Conf. on Information Science and Systems, (Princeton University), pp. 362-366, 1986.
[74] G. Pottie and D. Taylor, "A comparison of reduced complexity decoding algorithms for trellis codes," IEEE J. Selected Areas Commun., vol. COM-38, pp. 981-991, July 1990.
[75] T. Matsumoto, "Trellis decoding of linear block codes in digital mobile radio," in IEEE Vehic. Tech. Conf., pp. 6-11, 1988.
[76] H. A. David, Order Statistics. Wiley, 1981.
[77] J. Sanie, K. D. Donohue, and N. M. Bilgutay, "Order statistic filters as postdetection processors," IEEE Trans. Acoust. Speech Sig. Proc., vol. 38, pp. 1722-1731, Oct. 1990.
[78] A. Vardy and Y. Be'ery, "More efficient decoding of the Golay codes," IEEE Trans. Inform. Theory, vol. IT-37, pp. 667-672, May 1991.
[79] J. Proakis, Digital Communications. McGraw-Hill, 2nd ed., 1983.
[80] R. Clarke, "A statistical theory of mobile radio reception," Bell Syst. Tech. J., vol. 47, pp. 957-1000, 1968.
[81] A. Kot and C. Leung, "Optimal partial decision combining in diversity systems," IEEE Trans. Commun., vol. COM-38, pp. 981-991, July 1990.
[82] B. Radosavljevic, E. Arikan, and B. Hajek, "Sequential decoding of low-density parity-check codes by adaptive reordering of parity checks," IEEE Trans. Inform. Theory, vol. IT-38, pp. 1833-1839, Nov. 1992.
[83] R. Gallager, Low-Density Parity-Check Codes. MIT Press, 1963.
[84] R. Blahut, Fast Algorithms for Digital Signal Processing. Addison-Wesley, 1985.